In addition, it combines several ingredients, which form the basis of the modern practice of random forests.
A Relational Model of Data for Large Shared Data Banks
Written by E. F. Codd in 1970, this paper was a breakthrough in relational database systems.
This paper explores the feasibility of building a hybrid system.
This paper outlines the S4 architecture in detail and describes various applications, including real-life deployments, to show that the S4 design is surprisingly flexible and lends itself to running on large clusters built with commodity hardware.
Dremel: Interactive Analysis of Web-Scale Datasets
This paper describes the architecture and implementation of Dremel, a scalable, interactive ad-hoc query system for analysis of read-only nested data, and explains how it complements MapReduce-based computing.
Codd was the first to conceive of the relational model for database management.
Map-Reduce for Machine Learning on Multicore
The paper focuses on developing a general and exact technique for parallel programming of a large class of machine learning algorithms for multicore processors.

The Chubby lock service for loosely-coupled distributed systems
Chubby is a distributed lock service; it handles many of the hard parts of building distributed systems and provides its users with a familiar interface (writing files, taking a lock, file permissions). The paper describes it, focusing on the API rather than the implementation details.

Proponents of parallel databases argue that the strong emphasis on performance and efficiency of parallel databases makes them well-suited to perform such analysis. Others argue that MapReduce-based systems are better suited due to their superior scalability, fault tolerance, and flexibility to handle unstructured data.

F1: A Distributed SQL Database That Scales
F1 is a distributed relational database system built at Google to support the AdWords business. F1 is a hybrid database that combines high availability, the scalability of NoSQL systems like Bigtable, and the consistency and usability of traditional SQL databases.

Finding a needle in Haystack: Facebook’s photo storage
This paper describes Haystack, an object storage system optimized for Facebook’s Photos application. Facebook currently stores over 260 billion images, which translates to over 20 petabytes of data.

Chukwa: A large-scale monitoring system
This paper describes the design and initial implementation of Chukwa, a data collection system for monitoring and analyzing large distributed systems. Chukwa is built on top of Hadoop, an open source distributed filesystem and MapReduce implementation, and inherits Hadoop’s scalability and robustness.