Seeing the Big Picture Through Big Data
Onur Karadeli
Mustafa Murat Sever
March 2016
C1 - Public
Agenda
• What is Big Data?
• Use Cases
• Apache Hadoop Ecosystem
• Q&A
Big Data?
What is Big Data?
Big Data is growing (Google Trends)
Definition of Big Data
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.
Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying and information privacy.
– "Big data", Wikipedia
What is Big?
The 3 V's
• Volume
• Velocity
• Variety
Volume
• 40% growth per year
• 50 zettabytes by 2020
Ref: Where-is-your-data-FINAL-5a
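A 40% annual rate compounds. Assuming it holds steadily (a simplification; the slide gives no base year or starting volume, so V_0 below is only a placeholder), the volume after t years, the doubling time, and the growth over a decade work out as:

```latex
V(t) = V_0 \cdot 1.4^{t}
\qquad
t_{\text{double}} = \frac{\ln 2}{\ln 1.4} \approx \frac{0.693}{0.336} \approx 2.06\ \text{years}
\qquad
\frac{V(t+10)}{V(t)} = 1.4^{10} \approx 28.9
```

In other words, at 40% per year the amount of stored data roughly doubles every two years and grows about 29-fold per decade.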
Volume Scalability
Single TV
Multi TVs
Velocity
Ref: http://wersm.com/how-much-data-is-generated-every-minute-on-social-media/
Velocity – Real-time triggers
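A real-time trigger reacts to each event as it arrives instead of waiting for a batch job. A minimal sketch, assuming a hypothetical rule such as "alert when 3 events (say, payments on one card) arrive within 10 seconds": keep only the timestamps that fall inside a sliding window and fire whenever the window holds enough of them. The class name and thresholds are illustrative, not taken from any particular streaming product.

```python
from collections import deque


class SlidingWindowTrigger:
    """Hypothetical trigger: fires when at least `threshold` events fall inside the last `window_seconds`."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.timestamps = deque()  # event times currently inside the window, oldest first

    def observe(self, ts):
        """Record one event at time `ts` (timestamps must arrive in non-decreasing order).

        Returns True if the trigger fires on this event.
        """
        self.timestamps.append(ts)
        # Evict events that are no longer inside the window (ts - window, ts].
        while self.timestamps and self.timestamps[0] <= ts - self.window:
            self.timestamps.popleft()
        return len(self.timestamps) >= self.threshold


# Hypothetical rule: 3 events within 10 seconds raise an alert.
trigger = SlidingWindowTrigger(threshold=3, window_seconds=10)
events = [0, 4, 8, 20, 25, 26, 27]
fired = [t for t in events if trigger.observe(t)]
print(fired)  # → [8, 26, 27]
```

Each event costs amortized constant work because every timestamp is appended and evicted at most once, which is what lets this kind of check keep up with a high-velocity stream.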
Variety
Ref: Relational Solutions
Additional V's
• Veracity
• Variability
• Visualization
• Value
Ref: http://blog.sqlauthority.com
Where to Use Big Data
It’s a Big Family
Human Beings as a Big Data Source
Every moment, new data...
They are smart now
Health
https://www.youtube.com/watch?v=Lyv0_GIGSbY
Some professions will disappear
• News reporters
• Sonographers
• Lawyers
• Sports reporters
• Phlebotomists
• Compliance officers/workers
• Wall Street reporters
• Radiologists
• Bill collectors
• Journalists
• Psychotherapists
• Meeting/event planners
• Authors
• Counselors
• Fitness coaches
• Psychologists
• Cost estimators
• Cryptographers
• Military planners
• Accountants
• Financial planners/advisors
• Logisticians
• Dietitians
• Tax advisors
• Interpreters/translators
• Nutritionists
• Customer service reps
• Auditors
• Doctors
• Teachers
* By Thomas Frey (Senior Futurist @ Da Vinci Institute)
Not Only Humans...
Not Only Humans: Connected Cows!
New opportunities for tech companies and new brands
Social Data
Visualization is important
https://www.youtube.com/watch?v=ujcrJZRSGkg
Just Music
Not rocket science, but...
Discover Weekly Data Flow
Not rocket science, but...
Implicit Matrix Factorization
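The slide names the technique without showing it. As a hedged sketch (not Spotify's actual pipeline), implicit-feedback matrix factorization in the formulation of Hu, Koren and Volinsky (2008) turns play counts r_ui into a binary preference p_ui (1 if user u played item i, else 0) and a confidence c_ui = 1 + alpha * r_ui. It then learns a user vector x_u and an item vector y_i that minimize sum over all (u, i) of c_ui * (p_ui - x_u . y_i)^2 + lambda * (sum of squared vector norms), counting played and unplayed pairs alike. Alternating least squares (ALS) fixes one side and solves a small weighted ridge regression for every vector on the other side. The play counts, alpha, lambda and the number of factors below are made-up illustration values.

```python
import random


def solve(a, b):
    """Solve the small dense system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def als_step(fixed, counts, alpha, lam, k):
    """Re-solve one side's vectors with the other side (`fixed`) held constant.

    For each row of `counts`: p = 1 if count > 0 else 0, c = 1 + alpha * count, and the
    new vector x solves (sum_i c_i y_i y_i^T + lam * I) x = sum_i c_i p_i y_i.
    """
    new = []
    for row in counts:
        a = [[lam if i == j else 0.0 for j in range(k)] for i in range(k)]
        b = [0.0] * k
        for y, r in zip(fixed, row):
            c = 1.0 + alpha * r
            p = 1.0 if r > 0 else 0.0
            for i in range(k):
                b[i] += c * p * y[i]
                for j in range(k):
                    a[i][j] += c * y[i] * y[j]
        new.append(solve(a, b))
    return new


def objective(users, items, counts, alpha, lam):
    """Confidence-weighted squared error over ALL (user, item) pairs plus L2 regularization."""
    total = 0.0
    for x, row in zip(users, counts):
        for y, r in zip(items, row):
            pred = sum(xi * yi for xi, yi in zip(x, y))
            p = 1.0 if r > 0 else 0.0
            total += (1.0 + alpha * r) * (p - pred) ** 2
    return total + lam * sum(v * v for vec in users + items for v in vec)


def train(counts, k=3, alpha=2.0, lam=0.1, iters=30, seed=0):
    rng = random.Random(seed)
    items = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in counts[0]]
    by_item = [list(col) for col in zip(*counts)]  # transpose: one row per item
    history = []
    for _ in range(iters):
        users = als_step(items, counts, alpha, lam, k)   # fix items, solve users
        items = als_step(users, by_item, alpha, lam, k)  # fix users, solve items
        history.append(objective(users, items, counts, alpha, lam))
    return users, items, history


def score(users, items, u, i):
    """Predicted preference of user u for item i (dot product of the factor vectors)."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Hypothetical play counts: users 0-1 play tracks 0-1, users 2-3 play tracks 2-3.
plays = [
    [5, 3, 0, 0],
    [2, 4, 0, 0],
    [0, 0, 3, 6],
    [0, 0, 1, 2],
]
users, items, history = train(plays)
for u in range(len(plays)):
    print(u, [round(score(users, items, u, i), 2) for i in range(len(plays[0]))])
```

Each ALS half-step solves its subproblem exactly, so the objective never increases from one iteration to the next. Unplayed tracks that nonetheless score highly for a user are the recommendation candidates; the pure-Python solver here is only adequate for toy sizes.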
Other trending use cases
Apache Hadoop
Apache Hadoop
• A family of open-source Apache projects and sub-projects.
• Core projects:
  HDFS: Hadoop Distributed File System
  MapReduce: distributed data processing
  ...
• Hadoop is not a database.
• Move computation to data!
• Now: 32% of all enterprises use Apache Hadoop.
Apache Hadoop History
• 2003: Google File System paper
• 2006: Hadoop subproject created
• 2008: Sort record: running on a 910-node cluster, Hadoop sorted one terabyte in 209 seconds
• 2009: Yahoo runs 17 clusters with 24,000 machines
• 2011: Facebook, LinkedIn, eBay and IBM collectively contribute 200,000 lines of code
Ref: https://en.wikipedia.org/wiki/Apache_Hadoop
Apache Hadoop Base Components & Enablers
Ref: http://synerzip.com - Innovation – It’s in our DNA
BI & Visualization example
Ref: http://forums.bsdinsight.com/articles/?page=4
Hadoop Platforms
Software for managing the Hadoop ecosystem and platform.
Examples:
• Cloudera
• Hortonworks
• IBM
• Pivotal
HDFS File Storage Architecture
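HDFS stores a file as a sequence of large fixed-size blocks spread across DataNodes, and keeps several copies of each block. The sketch below only does the bookkeeping arithmetic. It assumes the Hadoop 2.x default block size of 128 MB (`dfs.blocksize` = 134,217,728 bytes; older releases used 64 MB) and the default replication factor of 3 (`dfs.replication`); both are configurable per cluster and per file, and the function names are hypothetical.

```python
MIB = 1024 * 1024
BLOCK_SIZE = 128 * MIB  # assumed dfs.blocksize (128 MB default in Hadoop 2.x)
REPLICATION = 3         # assumed dfs.replication (default 3)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file of `file_size` bytes is cut into; only the last may be smaller."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed cluster-wide: every block is stored `replication` times.

    A partial last block only occupies its actual size on disk, so no padding is added.
    """
    return file_size * replication


blocks = split_into_blocks(300 * MIB)
print([b // MIB for b in blocks])            # → [128, 128, 44]
print(raw_storage(300 * MIB) // MIB, "MiB")  # → 900 MiB
```

Large blocks keep the NameNode's metadata small (one entry per block, not per megabyte) and let each map task stream a long contiguous chunk from disk.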
Hadoop Topology
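Topology matters because HDFS places replicas with racks in mind. For the common replication factor of three, the HDFS architecture guide describes the default policy as one replica on the writer's node (when the writer is a DataNode) and the other two on two different nodes of a single remote rack. A toy version of that choice follows; the cluster layout is hypothetical, and it assumes the writer is a DataNode and that some other rack has at least two nodes.

```python
import random


def place_replicas(topology, writer, rng):
    """Choose 3 DataNodes for one block, following HDFS's default rack-aware policy.

    topology: dict rack -> list of DataNode names (hypothetical cluster layout)
    writer:   the DataNode writing the block
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node, so the first copy needs no network hop.
    first = writer
    # Replicas 2 and 3: two different nodes on one remote rack. Losing a whole rack
    # never loses every copy, and the write pipeline crosses racks only once.
    candidates = [r for r, nodes in topology.items() if r != rack_of[writer] and len(nodes) >= 2]
    remote_rack = rng.choice(candidates)
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
    "rack3": ["dn7", "dn8"],
}
replicas = place_replicas(topology, "dn2", random.Random(42))
print(replicas)
```

The real NameNode also weighs free space and node load when picking among candidates; this sketch picks uniformly at random.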
Task Management
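Task management is where "move computation to data" pays off: the scheduler tries to start each map task on a node that already stores a replica of the task's input block, and only falls back to a node that must fetch the block over the network. The greedy two-pass function below is a simplified, hypothetical illustration of that preference; Hadoop's real schedulers are more elaborate (for example, they prefer a rack-local node before an off-rack one).

```python
def schedule(tasks, free_slots):
    """Assign map tasks to nodes, preferring nodes that already hold the task's input block.

    tasks:      dict task_id -> set of nodes holding a replica of its input block
    free_slots: dict node -> number of free task slots
    Returns dict task_id -> (node, "data-local" | "remote").
    """
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    plan = {}
    pending = []
    # Pass 1: run each task where its data already lives, if a slot is free there.
    for task, replica_nodes in tasks.items():
        local = [n for n in sorted(replica_nodes) if slots.get(n, 0) > 0]
        if local:
            slots[local[0]] -= 1
            plan[task] = (local[0], "data-local")
        else:
            pending.append(task)
    # Pass 2: leftover tasks go to any free node and must pull their block over the network.
    for task in pending:
        node = next((n for n in sorted(slots) if slots[n] > 0), None)
        if node is None:
            break  # no capacity left; remaining tasks wait for a later round
        slots[node] -= 1
        plan[task] = (node, "remote")
    return plan


tasks = {"m1": {"dn1", "dn2"}, "m2": {"dn1", "dn3"}, "m3": {"dn1"}}
free = {"dn1": 1, "dn2": 1, "dn3": 1}
plan = schedule(tasks, free)
print(plan)
```

Note the greedy miss: `m1` takes `dn1` even though `dn2` would also have been local, which forces `m3` to run remotely. Production schedulers use extra techniques, such as briefly waiting for a local slot, to reduce exactly this kind of miss.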
Map & Reduce
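The classic illustration of Map & Reduce is word count. The single-process sketch below mimics the three stages: the mapper emits `(word, 1)` pairs, the shuffle groups the pairs by key (in Hadoop the framework performs this step between the two phases), and the reducer sums each group. In a real cluster many mappers run in parallel on the nodes holding the input splits; the function names here are illustrative, not Hadoop APIs.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle/sort: group all emitted values by key, sorted by key as Hadoop does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


lines = ["Big data is big", "Move computation to data"]
print(word_count(lines))
# → {'big': 2, 'computation': 1, 'data': 2, 'is': 1, 'move': 1, 'to': 1}
```

Because mappers only see their own lines and reducers only see one key's values, both phases parallelize naturally; the shuffle is the only step that moves data between nodes.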
The best Big Data team should have...
• Data Hygienists – for clean data
• Data Explorers – discover data to use
• Business Solution Architects – combine data for a use case
• Data Scientists – for the right model
• Campaign Experts – for the best benefit
* From HBR: https://hbr.org/2013/07/five-roles-you-need-on-your-bi
Data Scientist Skills
Thank you