BIG
DATA
WHY NOW ?
World’s information totaled over
2 Zetabytes
That’s 2 Trillion Gigabytes
By 2020, this number will be
35 Trillion ZB
“world’s data is doubling every 1.2 years”
“80% of this data is unstructured”
5 V
Money
2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2013
Today
Apache Cassandra
Apache Hadoop
Amazon Dynamo
Big Table
Map Reduce
Google File System
Dremel
Impala
Spanner ?
Analytics
(Hadoop)
Realtime
(“NoSql”)
TH
E E
CO
SY
STEM
Hadoop Ecosystem
Apache Hadoop is an open-source software
framework that supports running applications on
large clusters of commodity hardware.
Replication
Fault Tolerant
Commodity Hardware
Map Reduce
Map Reduce
Word Count
World's largest biometric identity platform
2,00,00,00,00,000 Biometric Matches
2 PB Data
Hadoop Stack
This is just the Beginning of “Big Data Revolution”
This is just the Beginning of “Big Data Revolution”
@sameersaw at twitter
Images
Raymond Bryson Marius B IntelFreePress License Pedro Moura Pinheiro