How Does Hive Compare to HBase

7/28/2019 How Does Hive Compare to HBase


How does Hive compare to HBase?


I'm interested in finding out how the recently released Hive (http://mirror.facebook.com/facebook/hive/hadoop-0.17/) compares to HBase in terms of performance. The SQL-like interface used by Hive is very much preferable to the HBase API we have implemented.

Tags: hadoop, hbase, hive

asked Aug 23 '08 by mrhahn

5 Answers

Accepted answer

It's hard to find much about Hive, but I found this snippet on the Hive site (http://wiki.apache.org/hadoop/Hive) that leans heavily in favor of HBase:

"Hive is based on Hadoop which is a batch processing system. Accordingly, this system does not and cannot promise low latencies on queries. The paradigm here is strictly of submitting jobs and being notified when the jobs are completed as opposed to real time queries. As a result it should not be compared with systems like Oracle where analysis is done on a significantly smaller amount of data but the analysis proceeds much more iteratively with the response times between iterations being less than a few minutes. For Hive queries response times for even the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run into hours."

Since HBase and HyperTable are all about performance (being modeled on Google's BigTable), they sound like they would certainly be much faster than Hive, at the cost of functionality and a higher learning curve (e.g., they don't have joins or the SQL-like syntax).
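To make the batch-latency point concrete, here is a minimal Python sketch of the map/shuffle/reduce pattern that a Hive query such as `SELECT page, COUNT(*) FROM logs GROUP BY page` was executed as on Hadoop at the time. It is an in-memory illustration, not Hive's code; the `logs` table and its rows are hypothetical. Every phase touches every input row, and in a real Hadoop job the phases run as separately scheduled tasks with intermediate data written to disk, which accounts for much of the minutes-long latency the quote describes. A key-value store like HBase can instead answer a single-key lookup without scanning.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample of a "logs" table: (page, user) rows.
ROWS: List[Tuple[str, str]] = [
    ("home", "u1"),
    ("about", "u2"),
    ("home", "u3"),
    ("home", "u1"),
]


def map_phase(rows: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, int]]:
    """Emit (page, 1) for every input row; note that every row is read."""
    for page, _user in rows:
        yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group the emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the grouped values for each key."""
    return {key: sum(values) for key, values in groups.items()}


def count_by_page(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Roughly: SELECT page, COUNT(*) FROM logs GROUP BY page."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(ROWS))  # {'home': 3, 'about': 1}
```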

answered Aug 30 '08 by Chris Bunch

From one perspective, Hive consists of five main components: a SQL-like grammar and parser, a query planner, a query execution engine, a metadata repository, and a columnar storage layout. Its primary focus is data warehouse-style analytical workloads, so low-latency retrieval of values by key is not necessary.

HBase has its own metadata repository and columnar storage layout. It is possible to author HiveQL queries over HBase tables, allowing HBase to take advantage of Hive's grammar and parser, query planner, and query execution engine. See http://wiki.apache.org/hadoop/Hive/HBaseIntegration for more details.
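On the Hive side, that integration declares a table `STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'` whose `hbase.columns.mapping` SerDe property (for example `:key,cf1:val`) says which HBase column feeds each Hive column. The Python sketch below only models what such a mapping means when HBase rows are flattened into Hive-style rows; it does not talk to Hive or HBase, and the table contents are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical HBase table: row key -> {"family:qualifier": value}.
HBaseTable = Dict[str, Dict[str, str]]


def parse_mapping(mapping: str) -> List[str]:
    """Split a mapping like ':key,cf1:val' into one entry per Hive column."""
    return [entry.strip() for entry in mapping.split(",")]


def to_hive_rows(table: HBaseTable, mapping: str) -> List[Tuple[Optional[str], ...]]:
    """Flatten HBase rows into Hive-style tuples following the mapping.

    ':key' selects the row key; 'family:qualifier' selects a cell.
    Cells missing from a row come back as None (Hive would show NULL).
    Cells not named in the mapping are ignored.
    """
    columns = parse_mapping(mapping)
    rows = []
    for row_key in sorted(table):  # rows are visited in row-key order
        cells = table[row_key]
        rows.append(tuple(row_key if col == ":key" else cells.get(col) for col in columns))
    return rows


if __name__ == "__main__":
    table: HBaseTable = {
        "row2": {"cf1:val": "b"},
        "row1": {"cf1:val": "a", "cf1:extra": "ignored"},
    }
    print(to_hive_rows(table, ":key,cf1:val"))  # [('row1', 'a'), ('row2', 'b')]
```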

answered Jun 4 '10 by Jeff Hammerbacher


Hive is an analytics tool. Just like Pig, it was designed for ad hoc batch processing of potentially enormous amounts of data by leveraging MapReduce. Think terabytes. Imagine trying to do that in a relational database...

HBase is a column-based key-value store based on BigTable. You can't do queries per se, though you can run MapReduce jobs over HBase. Its primary use case is fetching rows by key, or scanning ranges of rows. A major feature is being able to have data locality when scanning across ranges of row keys for a 'family' of columns.
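The access pattern described here (get a row by key, or scan a contiguous range of row keys, optionally restricted to one column family) can be sketched with a sorted key list. This is a toy Python model, not the HBase client API: it keeps only the ordering of row keys, not HBase's on-disk storage per column family, and the row keys and column names in the example are hypothetical. As with an HBase scan, the start row is inclusive and the stop row exclusive.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class SortedRowStore:
    """Toy model of an HBase-style table: rows kept sorted by row key.

    Columns are named "family:qualifier". Only the key ordering is modeled,
    not the on-disk layout per column family.
    """

    def __init__(self) -> None:
        self._keys: List[str] = []                  # sorted row keys
        self._rows: Dict[str, Dict[str, str]] = {}  # row key -> cells

    def put(self, row_key: str, column: str, value: str) -> None:
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)  # keep keys sorted on insert
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key: str) -> Optional[Dict[str, str]]:
        """Point lookup by row key."""
        return self._rows.get(row_key)

    def scan(
        self, start_row: str, stop_row: str, family: Optional[str] = None
    ) -> Iterator[Tuple[str, Dict[str, str]]]:
        """Yield rows with start_row <= key < stop_row, in key order."""
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        for key in self._keys[lo:hi]:
            cells = self._rows[key]
            if family is not None:
                cells = {c: v for c, v in cells.items() if c.split(":", 1)[0] == family}
            if cells:  # skip rows with no cells in the requested family
                yield key, cells


if __name__ == "__main__":
    store = SortedRowStore()
    store.put("user#001", "info:name", "ann")
    store.put("user#002", "info:name", "bob")
    store.put("user#002", "stats:visits", "7")
    store.put("user#010", "info:name", "cy")
    print(list(store.scan("user#001", "user#003", family="info")))
```

Because the keys are sorted, a range scan is a binary search followed by a sequential walk, which is why row-key design matters so much in this kind of store.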

edited Oct 7 '11 by Bolo
answered Jun 25 '09 by Tim


To my humble knowledge, Hive is more comparable to Pig. Hive is SQL-like and Pig is script-based. Hive seems to be more complicated, with query optimization and execution engines, and it requires the end user to specify schema parameters (partitions, etc.). Both are intended to process text files or SequenceFiles.
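The partition parameters mentioned above refer to Hive's partitioned tables, which are stored as one directory per partition value, named `column=value`. The plain-Python sketch below shows why declaring them pays off: a filter on the partition column lets the engine skip whole directories. It does not touch HDFS or call Hive; the table root `/warehouse/logs` and the column `dt` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, List


def partition_path(table_root: str, partition: Dict[str, str]) -> str:
    """Build a Hive-style partition directory: one key=value level per column."""
    levels = [f"{column}={value}" for column, value in partition.items()]
    return str(PurePosixPath(table_root, *levels))


def prune(paths: List[str], column: str, value: str) -> List[str]:
    """Keep only partition directories matching column=value.

    A query filtering on a partition column only has to read these
    directories; the other partitions never need to be opened.
    """
    wanted = f"{column}={value}"
    return [p for p in paths if wanted in PurePosixPath(p).parts]


if __name__ == "__main__":
    paths = [partition_path("/warehouse/logs", {"dt": d}) for d in ("2010-06-05", "2010-06-06")]
    print(prune(paths, "dt", "2010-06-06"))  # ['/warehouse/logs/dt=2010-06-06']
```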

HBase is for storing and retrieving key-value data... you can scan or filter on those key-value pairs (rows). You cannot do queries on (key, value) rows.
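To illustrate the distinction drawn here: a "filter" is just a predicate applied to rows as they are scanned in key order, with no joins, aggregation, or secondary index involved. HBase evaluates its filters on the server side, but the sketch below is plain Python over a hypothetical dict-based table, not the HBase client API, and its handling of rows that lack the column is a choice made for this sketch.

```python
from typing import Callable, Dict, Iterator, Tuple

Row = Dict[str, str]  # "family:qualifier" -> value


def filtered_scan(
    table: Dict[str, Row], column: str, predicate: Callable[[str], bool]
) -> Iterator[Tuple[str, Row]]:
    """Walk every row in row-key order and yield those whose cell passes predicate.

    There is no index on the value, so every row is visited; rows that lack
    the column are skipped.
    """
    for row_key in sorted(table):
        value = table[row_key].get(column)
        if value is not None and predicate(value):
            yield row_key, table[row_key]


if __name__ == "__main__":
    table = {"r2": {"cf:age": "40"}, "r1": {"cf:age": "20"}, "r3": {"cf:name": "x"}}
    print([key for key, _ in filtered_scan(table, "cf:age", lambda v: int(v) > 30)])  # ['r2']
```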

answered Jun 6 '10 by haijin


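As of the most recent Hive releases, a lot has changed that requires a small update, as Hive and HBase are now integrated (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration). What this means is that Hive can be used as a query layer over an HBase datastore. Now, if people are looking for alternative HBase interfaces, Pig also offers a really nice way of loading and storing HBase data (http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html). Additionally, it looks like Cloudera Impala may offer substantially faster Hive-based queries on top of HBase (http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/). They claim up to 45x faster queries than traditional Hive setups.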


The next-gen databases

I'm learning traditional relational databases (with PostgreSQL) and, doing some research, I've come across some new types of databases: CouchDB, Drizzle, and Scalaris, to name a few. What are going to be the next database technologies to deal with?

Tags: sql, database, nosql, non-relational-database

edited Apr 30 '12 by Jakub Konecki
asked Nov 12 '08 at 2:02 by Randin

Could someone please update this question to refer to "databases" instead of "SQL"? Rick Nov 12 '08 at 2:05

Even though randin is using the term SQL incorrectly, I think that change would be against the spirit of peer editing. Bill Karwin Nov 12 '08 at 2:29

too late.. sorry Bill. Feel free to roll back my edit if you feel strongly. I made my change before you posted your comment. I think rephrasing it the way I did is both educational to the OP and more useful to the community. SquareCog Nov 12 '08 at 2:31

Well, it's good to be correct. A tech writer friend of mine said, "You can't get the right answers unless you ask the right questions." Bill Karwin Nov 12 '08 at 2:36

Ah, sorry about the misleading question; my knowledge of SQL and databases was non-existent when I asked the question. Randin Mar 16 '09 at 2:53




I would say next-gen database, not next-gen SQL.

SQL is a language for querying and manipulating relational databases. SQL is dictated by an international standard. While the standard is revised, it seems to always work within the relational database paradigm.

Here are a few new data storage technologies that are getting attention currently:

CouchDB is a non-relational database. They call it a document-oriented database.

Amazon SimpleDB is also a non-relational database, accessed in a distributed manner through a web service. Amazon also has a distributed key-value store called Dynamo, which powers some of its S3 services.

Dynomite and Kai are open source solutions inspired by Amazon Dynamo.

BigTable is a proprietary data storage solution used by Google, and implemented using their Google File System technology. Google's MapReduce framework uses BigTable.

Hadoop is an open-source technology inspired by Google's MapReduce, and serving a similar need, to distribute the work of very large-scale data stores.

Scalaris is a distributed transactional key/value store. Also not relational, and does not use SQL. It's a research project from the Zuse Institute in Berlin, Germany.

RDF is a standard for storing semantic data, in which data and metadata are interchangeable. It has its own query language, SPARQL, which resembles SQL superficially but is actually totally different.

Vertica is a highly scalable column-oriented analytic database designed for distributed (grid) architecture. It does claim to be relational and SQL-compliant. It can be used through Amazon's Elastic Compute Cloud.

Greenplum is a high-scale data warehousing DBMS, which implements both MapReduce and SQL.

XML isn't a DBMS at all; it's an interchange format. But some DBMS products work with data in XML format.

ODBMS, or object databases, are for managing complex data. There don't seem to be any dominant ODBMS products in the mainstream, perhaps because of a lack of standardization. Standard SQL is gradually gaining some OO features (e.g. extensible data types and tables).

Drizzle is a relational database, drawing a lot of its code from MySQL. It includes various architectural changes designed to manage data in a scalable "cloud computing" system architecture. Presumably it will continue to use standard SQL with some MySQL enhancements.

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store, developed at Facebook by one of the authors of Amazon Dynamo, and contributed to the Apache project.

Project Voldemort is a non-relational, distributed, key-value storage system. It is used at LinkedIn.com.

Berkeley DB deserves some mention too. It's not "next-gen" because it dates back to the early 1990s. It's a popular key-value store that is easy to embed in a variety of applications. The technology is currently owned by Oracle Corp.

Also see this nice article by Richard Jones: "Anti-RDBMS: A list of distributed key-value stores" (http://www.metabrew.com/article/anti-rdbms-a-list-of-distributed-key-value-stores/). He goes into more detail describing some of these technologies.

Relational databases have weaknesses, to be sure. People have been arguing that they don't handle all data modeling requirements since the day the relational model was first introduced.

Year after year, researchers come up with new ways of managing data to satisfy special requirements: either requirements to handle data relationships that don't fit into the relational model, or else requirements of high-scale volume or speed that demand data processing be done on distributed collections of servers, instead of central database servers.
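Several of the stores listed above spread data across a collection of servers. Amazon's Dynamo paper describes doing this with consistent hashing and virtual nodes, and Dynamo-inspired systems such as Cassandra use the same idea. The following is a minimal Python sketch of the technique, not code from any of these projects; the node names, the MD5-based ring position, and eight virtual nodes per server are arbitrary choices for illustration.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


def ring_position(label: str) -> int:
    """Map a string to a stable 64-bit position on the ring (first 8 bytes of MD5)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Assign keys to nodes; each node owns several virtual positions."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 8) -> None:
        self.vnodes = vnodes
        self._ring: List[Tuple[int, str]] = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Several positions per node spread its share of keys around the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def node_for(self, key: str) -> str:
        """Return the node at the first position clockwise from the key's hash."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]  # wrap around past the end


if __name__ == "__main__":
    ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
    print(ring.node_for("user:42"))
```

The design choice that matters: when a server is added, only the keys whose nearest clockwise position now belongs to the new server move, and they move to that server. With a naive `hash(key) % n` scheme, changing `n` would remap most keys.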

Even though these advanced technologies do great things to solve the specialized problems they were designed for, relational databases are still a good general-purpose solution for most business needs. SQL isn't going away.

I've written an article in php|Architect magazine about the innovation of non-relational databases, and data modeling in relational vs. non-relational databases: http://www.phparch.com/magazine/2010-2/september/

edited Mar 5 at 14:40 by Emil
answered Nov 12 '08 by Bill Karwin

Hey Bill, we do tend to answer the same questions a lot.. your answer here is thorough enough that I don't feel writing my own would be of much use. Want to add some info about Vertica et al, and Greenplum and friends, to make it more complete? SquareCog Nov 12 '08 at 2:56

oh and XML and Object databases.. I always forget about those.. SquareCog Nov 12 '08 at 3:14

Thank you Bill for the thorough answer; I'll just stick with PostgreSQL for the time being. Randin Nov 12 '08 at 10:50

PostgreSQL is a fine choice for RDBMS. Have fun! Bill Karwin Nov 12 '08 at 16:46

Hey, thanks for the list! :) hasen j Feb 22 '09 at 16:06


I'm missing graph databases in the answers so far. A graph or network of objects is common in programming and can be useful in databases as well. It can handle semi-structured and interconnected information in an efficient way. Among the areas where graph databases have gained a lot of interest are the semantic web and bioinformatics. RDF was mentioned, and it is in fact a language that represents a graph. Here are some pointers to what's happening in the graph database area:
ey-db/index.htmlhttp://project-voldemort.com/http://incubator.apache.org/cassandra/



- Graphs - a better database abstraction
- Graphd, the backend of Freebase
- Neo4j open source graph database engine
- AllegroGraph RDFstore
- Graphdb abstraction layer for bioinformatics
- Graphdb behind Directed Edge recommendation engine

I'm part of the Neo4j project, which is written in Java but has bindings to Python, Ruby and Scala as well. Some people use it with Clojure or Groovy/Grails. There is also a GUI tool evolving.

answered Mar 26 '09 by nawroth (edited Oct 8 '09 at 19:11)

How about db4o.com? It's an object database, but it's designed around managing object graphs. Norman H Mar 16 '11 at 1:53

Object databases (OODB) are different from graph databases. Simply put, a graphdb won't tie your data directly to your object model. In a graphdb, relationships are first-class citizens, while you'd have to implement that on your own in an OODB. In a graphdb you can have different object types represent different views on the same data. Graphdbs typically support things like finding shortest paths and the like. nawroth Mar 16 '11 at 12:05

Cool, thanks for the clarification! Norman H Mar 17 '11 at 13:47
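To make the "relationships are first-class, shortest paths are built in" point from the comments above concrete, here is a minimal sketch in plain Python. It is not Neo4j's API or any graph database's query language: relationships are stored as explicit edges in an adjacency dict, and a breadth-first search finds a path with the fewest hops. The graph and node names are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the nodes on a fewest-hops path from start to goal, or None.

    `graph` maps each node to an iterable of neighbor nodes; edges are
    followed in the direction they are stored (directed graph).
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we first reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the predecessor chain back to the start node.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "knows" relationships between people.
knows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["erin"],
}

print(shortest_path(knows, "alice", "erin"))
```

Because BFS explores nodes in order of hop count, the first time it reaches the goal it has found a shortest path; a graph database does this kind of traversal for you instead of requiring joins or hand-written object navigation.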


Might not be the best place to answer with this, but I'd like to share this taxonomy of the NoSQL world created by Steve Yen (find it at http://dl.dropbox.com/u/2075876/nosql-steve-yen.pdf):

(1) key-value cache: memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta
(2) key-value store: keyspace, flare, schema-free, RAMCloud
(3) eventually consistent key-value store: dynamo, voldemort, Dynomite, SubRecord, MongoDb, Dovetaildb
(4) ordered key-value store: tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
(5) data structures server: redis
(6) tuple store: gigaspaces, coord, apache river
(7) object database: ZopeDB, db4o, Shoal
(8) document store: CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris
(9) wide columnar store: BigTable, Hbase, Cassandra, Hypertable, KAI, OpenNep

answered Mar 19 '11 by Paolo Bozzola


For a look into what academic research is being done in the area of next-gen databases, take a look at this: http://www.thethirdmanifesto.com/

In regard to the SQL language as a proper implementation of the relational model, I quote from Wikipedia: "SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features."

http://en.wikipedia.org/wiki/Relational_model (referenced in the section "SQL and the relational model" on March 28, 2010)

answered Mar 28 '10 by Norman H
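The quote does not list which SQL features deviate. Two commonly cited ones are that SQL tables and query results are bags (they can hold duplicate rows) whereas a relation is a set, and that NULL comparisons follow three-valued logic. A minimal sketch using Python's built-in `sqlite3` module shows both; SQLite is just a convenient SQL engine here, and the `visits` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No primary key, so SQL will store the same row more than once.
conn.execute("CREATE TABLE visits (city TEXT)")
conn.executemany("INSERT INTO visits VALUES (?)", [("Oslo",), ("Oslo",), ("Rome",)])

# A plain SELECT returns a bag: duplicate rows survive.
bag = [row[0] for row in conn.execute("SELECT city FROM visits ORDER BY city")]

# A relation is a set; in SQL you have to ask for that with DISTINCT.
relation = [row[0] for row in conn.execute(
    "SELECT DISTINCT city FROM visits ORDER BY city")]

# NULL = NULL is neither true nor false in SQL; it evaluates to NULL (None).
null_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]
conn.close()

print(bag)
print(relation)
print(null_eq)
```

Declaring a key on every table and avoiding NULLs is one way to stay closer to the relational model while still using SQL, which is roughly what "if one does not use certain SQL features" suggests.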


Not to be pedantic, but I would like to point out that at least CouchDB isn't SQL-based. And I would hope that the next-gen SQL would make SQL a lot less... fugly and non-intuitive.

answered Nov 12 '08 at 2:05 by Jason Baker

A friend of mine said, "It's supposed to be hard to read! It's called code for a reason!" :-) Bill Karwin Nov 12 '08 at 2:30

My brain is broken; I like SQL. Look at it long enough and it grows on you :) Robert Gould Nov 12 '08 at 2:35


There are special databases for XML, like MarkLogic and Berkeley DB XML. They can index XML documents, and one can query them with XQuery. I expect JSON databases too; maybe they already exist. I did some googling but couldn't find one.

answered Mar 22 '09 by tuinstoel

There are a few that provide a JSON interface to the data. Terrastore is one example. quikchange Jul 7 '10 at 18:35


SQL has been around since the early 1970s so I don't think that it's going to go away any time soon.

Maybe the 'new(-ish) SQL' will be OQL (see http://en.wikipedia.org/wiki/ODBMS).

answered Nov 12 '08 by Christopher Edwards


I've also heard about NimbusDB by Jim Starkey. Jim Starkey is the man who created InterBase, who worked on Vulcan (a Firebird fork), and who was at the beginning of Falcon for MySQL.


Hadoop examples?

I'm examining Hadoop as a possible tool with which to do some log analysis. I want to analyze several kinds of statistics in one run. Each line of my log files has all sorts of potentially useful information that I'd like to aggregate. I'd like to get all sorts of data out of the logs in a single Hadoop run, but the example Hadoop programs I see online all seem to total exactly one thing. This may be because every single example Hadoop program I can find just does word counts. Can I use Hadoop to solve two or more problems at once?

Are there other Hadoop examples, or a Hadoop tutorial out there, that don't solve the word count problem?

Tagged: hadoop

asked Apr 9 '09 by Brandon Yarbrough (edited Mar 9 at 15:15 by j0k)

As a general comment, I do seem to notice that Hadoop doesn't have a lot of examples floating around. Not sure why that is. John Feminella Apr 9 '09 at 20:24




One of the best resources that I have found to get started is Cloudera. They are a startup company comprised mainly of ex-Google and ex-Yahoo employees. On their page there is a training section with lessons on the different technologies (http://www.cloudera.com/hadoop-training). I found that very useful in playing with straight Hadoop, Pig and Hive. They have a virtual machine that you can download that has everything configured and some examples that help you get coding. All of that is free in the training section. The only thing that I couldn't find is a tutorial on HBase; I have been looking for one for a while. Best of luck.

answered May 7 '09 by Ryan H

Tom White's second edition has info on HBase, Pig and Hive. C-x C-t Feb 18 '11 at 10:01


I'm finishing up a tutorial on processing Wikipedia pageview log files, several parts of which compute multiple metrics in one pass (sum of pageviews, trend over the last 24 hours, running regressions, etc.). The code is here: http://github.com/datawrangling/trendingtopics/tree/master

The Hadoop code mostly uses a mix of Python streaming and Hive with the Cloudera distro on EC2.

answered Jun 30 '09 by Pete Skomoroch

I loved your tute, Pete, especially the overview you gave at Hadoop World. Awesome stuff! mat kelcey Oct 20 '09 at 5:06
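The single-pass, multi-metric approach from the answer above (and asked about in the question) can be sketched by having the mapper emit one record per metric, with the metric name folded into the key, so one reduce step aggregates every metric at once. This is a minimal plain-Python simulation of the map, shuffle/sort and reduce phases, not code from the trendingtopics repository; the log format (`timestamp status bytes url`) and the metric names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def mapper(line):
    """Emit (key, value) pairs for several metrics from one log line.

    Assumed (hypothetical) line format: "<timestamp> <status> <bytes> <url>".
    The metric name is part of the key, so one job computes all metrics.
    """
    _timestamp, status, nbytes, url = line.split()
    yield ("requests", "total"), 1
    yield ("status", status), 1
    yield ("bytes_by_url", url), int(nbytes)


def reducer(key, values):
    """Every metric here is a sum, so a single reducer handles them all."""
    return key, sum(values)


def run_job(lines):
    # Map phase: every line produces several (key, value) pairs.
    pairs = [pair for line in lines for pair in mapper(line)]
    # Shuffle/sort phase: Hadoop sorts by key before reducing.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return dict(
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=itemgetter(0))
    )


log = [
    "2009-06-30T10:00 200 512 /index.html",
    "2009-06-30T10:01 404 128 /missing",
    "2009-06-30T10:02 200 256 /index.html",
]
for key, value in sorted(run_job(log).items()):
    print(key, value)
```

With Hadoop streaming, the mapper would instead print tab-separated key/value lines to stdout and the reducer would read the key-sorted lines from stdin; the in-memory version just keeps the sketch self-contained. Metrics that need something other than a sum would need the reducer to dispatch on the metric name.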


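Here are two examples using Cascading (an API over Hadoop):

- A simple log parser: https://github.com/cwensel/cascading.samples/blob/d166b1276a3e6256356fa700ff7cf1b4333940db/logparser/src/java/logparser/Main.java
- One that calculates the arrival rate of requests: https://github.com/cwensel/cascading.samples/blob/d166b1276a3e6256356fa700ff7cf1b4333940db/loganalysis/src/java/loganalysis/Main.java

You can start with the second and just keep adding metrics.

Cascading project site: http://www.cascading.org/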

answered Apr 10 '09 by cwensel (edited May 1 '11 at 20:07)
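The arrival-rate idea in the answer above can be sketched without Cascading: truncate each request's timestamp to the minute, count requests per bucket, then derive more metrics from the same counts. This is a minimal plain-Python sketch, not the Cascading API or the linked sample's code; the `YYYY-MM-DD HH:MM:SS` timestamp format and the sample data are assumptions.

```python
from collections import Counter
from datetime import datetime


def arrivals_per_minute(timestamps):
    """Count requests per one-minute bucket.

    `timestamps` are strings in an assumed "YYYY-MM-DD HH:MM:SS" format.
    """
    counts = Counter()
    for ts in timestamps:
        moment = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[moment.replace(second=0)] += 1  # truncate to the minute
    return counts


def summarize(counts):
    """Derive extra metrics from the same counts: peak minute and mean rate.

    Minutes with no requests never appear in `counts`, so the mean is
    over active minutes only.
    """
    peak_minute, peak = max(counts.items(), key=lambda kv: kv[1])
    mean = sum(counts.values()) / len(counts)
    return peak_minute, peak, mean


stamps = [
    "2009-04-10 12:00:05",
    "2009-04-10 12:00:40",
    "2009-04-10 12:01:10",
    "2009-04-10 12:02:59",
    "2009-04-10 12:02:01",
    "2009-04-10 12:02:30",
]
counts = arrivals_per_minute(stamps)
print(summarize(counts))
```

"Keep adding metrics" then amounts to adding more functions over the same per-minute counts rather than re-reading the logs.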


You can refer to Tom White's Hadoop book for more examples and use cases: http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449389732/

answered Oct 19 '10 by Pavan Yara


With the normal Map/Reduce paradigm, you typically solve one problem at a time. In the Map step you typically perform some transformation or denormalization; in the Reduce step you often aggregate the map outputs.

If you want to answer multiple questions about your data, the best way to do it in Hadoop is to write multiple jobs, or a sequence of jobs that read the previous step's outputs.

There are several higher-level abstraction languages or APIs (Pig, Hive, Cascading) that simplify some of this work for you, allowing you to write more traditional procedural or SQL-style code that, under the covers, just creates a sequence of Hadoop jobs.

answered Apr 23 '09 by Ilya Haykinson
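The "sequence of jobs that read the previous step's outputs" pattern from the answer above can be sketched in plain Python with a tiny in-memory map/reduce helper: the first job counts hits per URL, and the second job takes those outputs as its input and ranks them. The `map_reduce` helper, the `<client> <url>` line format and the sample data are all hypothetical; on a real cluster each stage would be a separate Hadoop job writing to, and reading from, HDFS.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run one in-memory map/reduce 'job' and return the reducer outputs."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Job 1: hits per URL.
def count_mapper(line):
    yield line.split()[1], 1  # assumed format: "<client> <url>"


def count_reducer(url, ones):
    yield url, sum(ones)


# Job 2: reads job 1's output and ranks URLs. Sending everything to a
# single key means one reducer sees all counts, which only suits small data.
def rank_mapper(pair):
    url, hits = pair
    yield "all", (hits, url)


def rank_reducer(_key, pairs):
    for hits, url in sorted(pairs, key=lambda p: (-p[0], p[1])):
        yield url, hits


log = ["10.0.0.1 /c", "10.0.0.2 /b", "10.0.0.1 /c",
       "10.0.0.3 /a", "10.0.0.2 /c", "10.0.0.3 /b"]
hits = map_reduce(log, count_mapper, count_reducer)
ranking = map_reduce(hits, rank_mapper, rank_reducer)
print(ranking)
```

This chaining is roughly what Pig, Hive and Cascading generate for you when a query needs more than one pass.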


There was a course taught by Jimmy Lin at the University of Maryland. He developed the Cloud9 package as a training tool. It contains several examples.

Cloud9 documentation and source: http://www.umiacs.umd.edu/~jimmylin/cloud9/docs/

answered Jun 8 '09 at 18:29 by user119381

Amazon has a new service based on Hadoop. It's a great way to get started, and they have some nice examples: http://aws.amazon.com/elasticmapreduce/

answered Apr 10 '09 by Maurice Flanagan

You can also follow the Cloudera blog. They recently posted a really good article about Apache log analysis with Pig: http://www.cloudera.com/blog/2009/06/17/analyzing-apache-logs-with-pig/

answered Jul 8 '09 by Romain Rigaux

As the author of said article, I want to point out that it was written more from a "getting familiar with Pig" perspective than a "doing log parsing in Hadoop" perspective. There are more efficient and less verbose ways to do those things. But yeah, Pig is nice for this sort of stuff at large scale. SquareCog Nov 25 '09 at 1:04


Have you looked at the wiki (http://wiki.apache.org/hadoop/)? You could try looking through the software in the contrib section, though the code for those will probably be hard to learn from. Looking over the page, they seem to have a link to an external tutorial.

answered Apr 9 '09 by fuzzy-waffle


I'm sure you've solved your problem by now, but for those who still get redirected here from Google searching for examples, here is an excellent blog with hundreds of lines of working code: http://sujitpal.blogspot.com/

edited Nov 30 '09 at 15:12, answered Nov 27 '09 by alex

There are several examples using Ruby under Hadoop streaming in the wukong library (http://github.com/infochimps/wukong). (Disclaimer: I am an author of same.) Besides the now-standard wordcount example, there's PageRank and a couple of simple graph-manipulation scripts.

answered Apr 21 '09 at 19:18 by mrflip

For your given example, I would recommend the following implementation:

In the map step, you walk through the log line by line. In each line, you separate the relevant fields from each other (something like split(), I guess) and emit a key-value pair for each bit of information on every line.

So if your log has a format like this:

(Timestamp) (A) (B) (C)

123 4 5 6

789 1 2 3

You could emit (A,4),(B,5),(C,6) for the first line and so forth for the other lines.

Now you can even have parallel reducers! Each reducer collects the bits for a given category. You can tweak your Hadoop app so that one reducer gets all "A"s and another one gets all "B"s.

The Reduce itself is like the typical word-count ;-)
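A minimal single-process sketch of the scheme described above, assuming whitespace-separated lines with a leading timestamp followed by the A, B and C fields, as in the sample log. The names `map_line`, `shuffle` and `reduce_category` are hypothetical: they only simulate in plain Python what Hadoop's map, shuffle/sort and reduce phases would do across machines, and the reduce here simply sums each category's values, word-count style.

```python
from collections import defaultdict

# Hypothetical column names matching the example log layout (Timestamp) (A) (B) (C).
CATEGORIES = ("A", "B", "C")

def map_line(line):
    """Map step: split one log line and emit a (category, value) pair per field."""
    _timestamp, *values = line.split()
    for category, value in zip(CATEGORIES, values):
        yield category, int(value)

def shuffle(pairs):
    """Simulates Hadoop's shuffle: group all values by key, so one reducer sees one category."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_category(key, values):
    """Reduce step: aggregate one category's values (a word-count-style sum here)."""
    return key, sum(values)

log = ["123 4 5 6", "789 1 2 3"]
pairs = (pair for line in log for pair in map_line(line))
results = dict(reduce_category(k, vs) for k, vs in shuffle(pairs).items())
print(results)  # {'A': 5, 'B': 7, 'C': 9}
```

Because every value for a given key reaches exactly one `reduce_category` call, and no call depends on another, the categories can be reduced in parallel, which is what lets one reducer take all the "A"s and another all the "B"s.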

answered Feb 2 '11 by Peter Wippermann

Apache has released a set of examples. You can find them at:

http://svn.apache.org/repos/asf/hadoop/common/trunk/mapreduce/src/examples/org/apache/hadoop/examples/

answered Jul 13 '11 by Adrian Mouat

Two tools that can give a good starting point for solving the problem the Hadoop way are Pig (already mentioned, with a link, above) and Mahout (machine-learning libraries).

Regarding Mahout, you can read IBM's articles, which give a very good introduction to what you can do "easily" with it:

http://www.ibm.com/developerworks/java/library/j-mahout/
http://www.ibm.com/developerworks/java/library/j-mahout-scaling/

It gives you the next things (clustering, categorization...) that you would like to do with the accounting data you get from the likes of Pig or from the hand-written MapReduce code that you will write.

edited Jan 31 '12 at 13:06, answered Jan 31 '12 by Guy

Ilya said it well: folks usually write one job per task, because the output from the mappers and reducers is very specific to the result you're after.

Also, at larger scale, jobs take longer, and you'll usually run different jobs at different frequencies (and on different subsets of your data). Finally, it's a lot more maintainable.

We've been spoiled by Hive for syslog and app log analysis. It might get you closer to the lightweight, ad-hoc queries that would let you produce multiple results really quickly:

http://help.papertrailapp.com/kb/analytics/log-analytics-with-hadoop-and-hive

Passing multiple functions to a SELECT clause would probably accomplish what you're after, but you still may need a temporary table.

Parallelizing the Reduce in MapReduce


I understand how Map is easily parallelizable: each computer/CPU can just operate on a small portion of the array.

Is Reduce/foldl parallelizable? It seems like each computation depends on the previous one. Is it only parallelizable for certain types of functions?

multithreading optimization map multicore reduce

asked Nov 30 '08 by Claudiu

Give us some clues: what platform or programming language are you talking about? This doesn't sound like MPI. And what's a "foldl"? Die in Sente Dec 1 '08 at 0:17

foldl is a left fold, or a fold with a left-associative operator: folding [1,2,3,4] with + would yield (((1 + 2) + 3) + 4) Frank Shearar Apr 7 '10 at 7:47



If the underlying operation of your reduction is associative*, you can play with the order of operations and with locality. You therefore often have a tree-like structure in the 'gather' phase, so you can do it in several passes, in logarithmic time:

    a   b   c   d
     \ /     \ /
    (a+b)   (c+d)
        \   /
    ((a+b)+(c+d))

instead of (((a+b)+c)+d)

If your operation is also commutative, further optimizations are possible, since you can gather in a different order (this may be important for data alignment when the operations are vector operations, for example).

[*] This refers to the mathematical operation you actually want, not to its implementation on concrete types such as floats, of course: floating-point addition is not exactly associative.
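To make the difference in grouping concrete, here is a small sketch; the helper name `tree_reduce` is ours, not from any library. It combines neighbours pairwise, level by level, the way the diagram above does, and compares that with a left fold (`functools.reduce`, i.e. foldl). Rendering the operation as a string shows the grouping each strategy produces, and the last line illustrates the footnote's caveat about floats. It runs sequentially; in a real system the pairs within one level could be combined on different workers.

```python
from functools import reduce

def tree_reduce(op, items):
    """Combine neighbouring pairs level by level; only requires op to be associative.
    n items need about log2(n) levels, and the pairs within a level are independent."""
    level = list(items)
    if not level:
        raise ValueError("tree_reduce of empty sequence")
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an odd element out is carried up to the next level unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]

def show(a, b):
    """Render the grouping as text instead of computing a value."""
    return f"({a}+{b})"

print(tree_reduce(show, "abcd"))  # ((a+b)+(c+d))
print(reduce(show, "abcd"))       # (((a+b)+c)+d)

# With exact integer addition, both groupings give the same answer.
print(tree_reduce(lambda a, b: a + b, range(1, 9)), reduce(lambda a, b: a + b, range(1, 9)))  # 36 36

# With floats, regrouping can change the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```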

edited Dec 1 '08 at 14:07, answered Nov 30 '08 by Piotr Lesnicki

Do you mean "associative" rather than "commutative"? Patrick McElhaney Dec 1 '08 at 2:38

You're right, thanks, I meant associative; corrected! But in fact it also helps if the operation is commutative, so that you can gather your chunks in any order (you do that for data alignment issues, for example). Piotr Lesnicki Dec 1 '08 at 14:02
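A sketch of the commutativity point, using only the standard library: the input is split into chunks, each chunk is reduced independently on a thread pool, and the partial results are combined in whatever order the workers finish (`concurrent.futures.as_completed`). That is only safe because the operation is associative (chunking is allowed) and commutative (completion order does not matter). The function name `parallel_reduce`, the chunk size and the worker count are illustrative assumptions; for CPU-bound work in CPython, a `ProcessPoolExecutor` would give real parallelism where threads do not.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from functools import reduce
import operator

def parallel_reduce(op, items, chunk_size=4, workers=4):
    """Reduce each chunk concurrently, then fold the partials in completion order.
    Correct only if op is both associative and commutative."""
    items = list(items)
    if not items:
        raise ValueError("parallel_reduce of empty sequence")
    chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(reduce, op, chunk) for chunk in chunks]
        # Partial results arrive in whatever order the chunks happen to finish.
        partials = (f.result() for f in as_completed(futures))
        return reduce(op, partials)

print(parallel_reduce(operator.add, range(1, 101)))  # 5050
```

With a non-commutative operation such as string concatenation, the same code could stitch the pieces together in a scrambled order, which is why commutativity matters once chunks are gathered in any order.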


Yes, if the operator is associative. For example, you can parallelise summing a list of numbers:

step 1: 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8
step 2: 3 + 7 + 11 + 15
step 3: 10 + 26
step 4: 36

This works because (a+b)+c = a+(b+c), i.e. the order in which the additions are per
