
Meeting #1. Game|Changers. Data Mining Track.


Non-commercial education only. Corresponding information belongs to its respective owners, including EMC, IBM, Microsoft, Oracle, Gartner, etc.

Big Data Concepts & Practice

Vladimir Suvorov [email protected]

EMC & DataScienceSquad.com


About myself


© 2012 IBM Corporation February 16, 2013

Why Big Data

How We Got Here


In 2005 there were 1.3 billion RFID tags in circulation… by the end of 2011, this was about 30 billion and growing even faster.


An increasingly sensor-enabled and instrumented business environment generates HUGE volumes of data with MACHINE SPEED characteristics…

1 BILLION lines of code

EACH engine generating 10 TB every 30 minutes!


350B Transactions/Year

Meter Reads every 15 min.

3.65B meter reads/day

120M meter reads/month
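The meter figures can be related with a rough back-of-the-envelope calculation, assuming every meter is read once every 15 minutes; the resulting meter count is an inference, not a number stated on the slide.

```latex
\frac{24 \times 60\ \text{min/day}}{15\ \text{min/read}} = 96\ \text{reads per meter per day},
\qquad
\frac{3.65 \times 10^{9}\ \text{reads/day}}{96\ \text{reads/(meter}\cdot\text{day)}} \approx 3.8 \times 10^{7}\ \text{meters}
```

In other words, 3.65B reads per day corresponds to roughly 38 million meters reporting at 15-minute intervals.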


In August 2010, Adam Savage of “MythBusters” took a photo of his vehicle with his smartphone. He then posted the photo to his Twitter account with the phrase “Off to work.”

Because the photo was taken with his smartphone, the image contained geotag metadata revealing exactly where it was taken.

By simply taking and posting a photo, Savage revealed the exact location of his home, the vehicle he drives, and the time he leaves for work.
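How does a photo “reveal” a location? Geotagged images typically carry EXIF GPS tags: GPSLatitude and GPSLongitude are each stored as three rational numbers (degrees, minutes, seconds), and GPSLatitudeRef ("N"/"S") and GPSLongitudeRef ("E"/"W") give the hemisphere. A minimal sketch of turning such values into map coordinates follows; reading the tags out of a JPEG needs an EXIF library and is not shown, and the function name and sample coordinates are hypothetical.

```python
from typing import Sequence


def dms_to_decimal(dms: Sequence[float], ref: str) -> float:
    """Convert an EXIF-style (degrees, minutes, seconds) triple to decimal degrees.

    `ref` is the hemisphere letter stored next to the value:
    "N"/"S" for latitude, "E"/"W" for longitude. South and West are negative.
    """
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    return -value if ref.upper() in ("S", "W") else value


if __name__ == "__main__":
    # Hypothetical values as they might appear in a photo's GPS tags.
    lat = dms_to_decimal((37, 46, 30.0), "N")   # about 37.775
    lon = dms_to_decimal((122, 25, 9.0), "W")   # about -122.419
    print(round(lat, 4), round(lon, 4))
```

Two numbers like these are enough to pin a photo to a street address on any online map, which is why posting unstripped images can leak a home location.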


The Social Layer in an Instrumented Interconnected World

• 2+ billion people on the Web by end of 2011

• 30 billion RFID tags today (1.3B in 2005)

• 4.6 billion camera phones worldwide

• 100s of millions of GPS-enabled devices sold annually

• 76 million smart meters in 2009… 200M by 2014

• 12+ TBs of tweet data every day

• 25+ TBs of log data every day

• ? TBs of data every day


Twitter Tweets per Second Record Breakers of 2011


Extract Intent, Life Events, Micro Segmentation

[Figure: example tweets from Jo Jobs, Tina Mu, Tom Sit and Pauline, classified as Attributes (Name, Birthday, Family); Not Relevant - Noise; Monetizable Intent; Relocation; Location; Wishful Thinking; SPAMbots]


Extracting insight from an immense volume, variety and velocity of data, in context, beyond what was previously possible

Big Data includes any of the following characteristics:

• Variety: manage the complexity of data in many different structures, ranging from relational, to logs, to raw text

• Velocity: streaming data and large-volume data movement

• Volume: scale from terabytes to petabytes (1K TBs) to zettabytes (1B TBs)


Bigger and Bigger Volumes of Data

• Retailers collect click-stream data from Web site interactions and loyalty card data

– This traditional POS information is used by retailers for shopping basket analysis, inventory replenishment, and more

– But this data is now also being provided to suppliers for customer buying analysis

• Healthcare has traditionally been dominated by paper-based systems, but this information is being digitized

• Science is increasingly dominated by big science initiatives

– Large-scale experiments generate over 15 PB of data a year, which cannot be stored within a single data center and is sent out to laboratories

• Financial services are seeing larger and larger volumes through smaller trading sizes, increased market volatility, and technological improvements in automated and algorithmic trading

• Improved instrument and sensor technology

– The Large Synoptic Survey Telescope’s gigapixel camera generates 6PB+ of image data per year; the oil and gas industry is another example


The Big Data Conundrum

[Figure: data AVAILABLE to an organization vs. data an organization can PROCESS]

• The percentage of available data an enterprise can analyze is decreasing relative to the total amount of data available to it

Quite simply, this means that as enterprises we are getting “more naive” about our business over time.

We don’t know what we could already know…


Why Not All of Big Data Before: Didn’t have the Tools?


Applications for Big Data Analytics

• Homeland Security

• Finance

• Smarter Healthcare

• Multi-channel sales

• Telecom

• Manufacturing

• Traffic Control

• Trading Analytics

• Fraud and Risk

• Log Analysis

• Search Quality

• Retail: Churn, NBO (next best offer)


Most Requested Uses of Big Data

• Log Analytics & Storage

• Smart Grid / Smarter Utilities

• RFID Tracking & Analytics

• Fraud / Risk Management & Modeling

• 360° View of the Customer

• Warehouse Extension

• Email / Call Center Transcript Analysis

• Call Detail Record Analysis

• …and more


What companies & analysts think of Big Data


Gartner & McKinsey


Hype Cycle of Big Data


Priority matrix


Key vision

• Predictive modeling is gaining momentum with property and casualty (P&C) companies, which are using it to support claims analysis, CRM, risk management, pricing and actuarial workflows, quoting, and underwriting.

• Social content is the fastest-growing category of new content in the enterprise and will eventually attain 20% market penetration.

• Gartner reports that 45% of sales management teams identify sales analytics as a priority to help them understand sales performance, market conditions and opportunities.

• Over 80% of Web Analytics solutions are delivered via Software-as-a-Service (SaaS).


Big Data deliverables by McKinsey



Intel


Intel Big Data Cluster Example

| Application | Big Data | Algorithms | Compute Style |
|---|---|---|---|
| Scientific study (e.g. earthquake study) | Ground model | Earthquake simulation, thermal conduction, … | HPC |
| Internet library search | Historic web snapshots | Data mining | MapReduce |
| Virtual world analysis | Virtual world database | Data mining | TBD |
| Language translation | Text corpora, audio archives, … | Speech recognition, machine translation, text-to-speech, … | MapReduce & HPC |
| Video search | Video data | Object/gesture identification, face recognition, … | MapReduce |

There has been more video uploaded to YouTube in the last 2 months than if ABC, NBC, and CBS had been airing content 24/7/365 continuously since 1948. - Gartner


Example Motivating Application: Online Processing of Archival Video

• Research project: develop a context recognition system that is 90% accurate over 90% of your day

• Leverage a combination of low- and high-rate sensing for perception

• Federate many sensors for improved perception

• Big Data: terabytes of archived video from many egocentric cameras

• Example query 1: “Where did I leave my briefcase?”

– Sequential search through all video streams [Parallel Camera]

• Example query 2: “Now that I’ve found my briefcase, track it”

– Cross-cutting search among related video streams [Parallel Time]

[Figure: Big Data Cluster]


Oracle


Big Data Use Cases

| Industry | Today’s Challenge | New Data | What’s Possible |
|---|---|---|---|
| Healthcare | Expensive office visits | Remote patient monitoring | Preventive care, reduced hospitalization |
| Manufacturing | In-person support | Product sensors | Automated diagnosis, support |
| Location-Based Services | Based on home zip code | Real-time location data | Geo-advertising, traffic, local search |
| Public Sector | Standardized services | Citizen surveys | Tailored services, cost reductions |
| Retail | One-size-fits-all marketing | Social media | Sentiment analysis, segmentation |


What’s in Big Data for Public Sector

• Operational efficiency and productivity

• Fraud detection and prevention

• Close tax gaps

• Value for money for citizens

• Prevent crime waves

• Customize actions based on population segments

• Public utilities to reduce consumption

• Produce safety from farm to fork


Microsoft


New opportunities

• Increases ad revenue by processing 3.5 billion events per day

– Massive Volumes: processes 464 billion rows per quarter, with average query time under 10 seconds

• Measures and ranks online user influence by processing 3 billion signals per day

– Cloud Connectivity: connects across 15 social networks via the cloud for data and API access

• Improves investigation time by analyzing a large volume and variety of data

– Real-Time Insight: cut investigation time from 2 years to 15 days


Microsoft’s Approach to Big Data


A Holistic Big Data Solution from Microsoft


Data Scientist Job


Sexy Job of Data Scientist

Tom Davenport, who is teaching an executive program in Big Data and analytics at Harvard University, said some data scientists are earning annual salaries as high as $300,000, which is “pretty good for somebody that doesn't have anyone else working for them.” Davenport also said such workers are motivated by the problems and opportunities data provides.


What EMC Thinks of Data Scientists


Job evolution


What Forbes Thinks of Data Scientists


Data Science Courses


Course Modules and Navigation Icons

Introduction and Course Agenda

Data Science and Big Data Analytics

1. Introduction to Big Data Analytics

2. Data Analytics Lifecycle + Lab

3. Review of Basic Data Analytics Methods Using R + Labs

4. Advanced Analytics - Theory & Methods + Labs

5. Advanced Analytics - Technology & Tools + Labs

6. The Endgame, or Putting it All Together + Final Lab


Topics: Data Science and Big Data Analytics Course

• Introduction to Big Data Analytics + Data Analytics Lifecycle

– Big Data Overview; State of the Practice in Analytics; The Data Scientist; Big Data Analytics in Industry Verticals; Data Analytics Lifecycle

• Review of Basic Data Analytic Methods Using R

– Using R to Look at Data: Introduction to R; Analyzing and Exploring the Data; Statistics for Model Building and Evaluation

• Advanced Analytics – Theory and Methods

– K-means Clustering; Association Rules; Linear Regression; Logistic Regression; Naive Bayesian Classifier; Decision Trees; Time Series Analysis; Text Analysis

• Advanced Analytics – Technology and Tools

– Analytics for Unstructured Data (MapReduce and Hadoop); The Hadoop Ecosystem; In-database Analytics: SQL Essentials; Advanced SQL and MADlib for In-database Analytics

• The Endgame, or Putting it All Together + Final Lab on Big Data Analytics

– Operationalizing an Analytics Project; Creating the Final Deliverables; Data Visualization Techniques; Final Lab: Application of the Data Analytics Lifecycle to a Big Data Analytics Challenge


Hadoop


Top companies need Hadoop


What is Hadoop and Where did it start?

• Created by Doug Cutting (formerly of Yahoo!, now at Cloudera)

– HDFS (storage) & MapReduce (compute)

– Inspired by Google’s MapReduce and Google File System (GFS) papers

• Much of the initial work on Hadoop was done by Yahoo!

• It is now a top-level Apache project backed by a large open-source development community


What is Hadoop?

• Storage & compute in one framework

• Open-source project of the Apache Software Foundation

• Written in Java

Two core components:

• HDFS: storage in the Hadoop Distributed File System

• MapReduce: compute via the MapReduce distributed processing platform
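To make the storage half concrete: HDFS stores a file as a sequence of fixed-size blocks, and each block is replicated on several DataNodes. The sketch below assumes a 128 MB block size (the Hadoop 2.x default for `dfs.blocksize`; Hadoop 1.x used 64 MB) and a replication factor of 3 (the usual `dfs.replication` default). The placement is simple round-robin, not HDFS's real rack-aware policy, and all function and node names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # bytes; assumed Hadoop 2.x default block size
REPLICATION = 3                 # assumed default replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the sizes of the blocks a file of `file_size` bytes is cut into.

    Every block is full-sized except possibly the last one.
    """
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplified illustration only; real HDFS placement is rack-aware.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block_id: [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for block_id in range(num_blocks)
    }


if __name__ == "__main__":
    # A hypothetical 300 MB file on a hypothetical 4-node cluster.
    sizes = split_into_blocks(300 * 1024 * 1024)
    print([s // (1024 * 1024) for s in sizes])  # [128, 128, 44]
    for block, where in place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, where)
```

Because every block lives on several machines, losing one DataNode does not lose data, and MapReduce tasks can be scheduled on whichever node already holds a copy of the block.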


Hadoop cluster architecture


MapReduce example
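The slide's diagram did not survive extraction. As a stand-in, here is the classic word-count example written as plain Python functions that mimic the three stages (map, shuffle/group by key, reduce) in a single process. This is not the Hadoop API (Hadoop jobs are typically written in Java against its `Mapper` and `Reducer` classes), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

On a real cluster each mapper runs next to one HDFS block, the shuffle moves data over the network so that all values for a key reach the same reducer, and reducers run in parallel on disjoint key ranges.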


Hadoop versions


Hadoop Wave Report

“EMC Greenplum is the first mover in Hadoop appliances. EMC Greenplum [is] the first EDW vendor to provide a full-featured enterprise-grade Hadoop appliance and roll out an appliance family that integrates its Hadoop, EDW, and data integration in a single rack. It provides its own open source Hadoop distribution software, integrates EMC’s strong storage product portfolio in its appliances, and has an extensive professional services force of EMC technical consultants and data scientists with Hadoop expertise.”


Hadoop Players Today


Get Started With Hadoop Today

• Hadoop Architecture Services

– POC planning and deployment

– Installation and best practices

– Educate the team

• Greenplum Analytics Labs

– Leverage the expertise of Greenplum’s Data Scientists

– Packaged solutions that produce business value and actionable results

– Accelerate Hadoop capabilities on your data with your analysts

• Establish a strategic vision

– Roadmap for Hadoop and unified analytics

Data Scientists & Hadoop Architecture teams deliver customer success


The Greenplum Unified Analytics Platform


NoSQL


Definition from nosql-database.org: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge amount of data, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.


NoSQL

http://nosql-database.org/

• Non-relational

• Scalability

– Vertically: add more resources (CPU, memory, storage) to a single server

– Horizontally: add more servers and spread the data across them

• Collection of structures: hashtables, maps, dictionaries

• No predefined schema

• No join operations

• CAP rather than ACID

– CAP: Consistency, Availability and Partition tolerance (a distributed system cannot guarantee all three at once!)

– ACID: Atomicity, Consistency, Isolation and Durability


Advantages of NoSQL

• Cheap, easy to implement

• Data are replicated and can be partitioned

• Easy to distribute

• Don't require a schema

• Can scale up and down

• Quickly process large amounts of data

• Relax the data consistency requirement (CAP)

• Can handle web-scale data, whereas Relational DBs cannot
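The replication and partitioning advantages rest on spreading keys across machines. A minimal sketch, assuming simple hash-modulo partitioning with each key also copied to the next node in the list; the function and node names are hypothetical, and MD5 is used only as a stable, process-independent hash. Real systems (Cassandra, for example) use consistent hashing on a ring instead.

```python
import hashlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash.

    Python's built-in hash() is randomized per process for strings, so it is
    avoided here: every node must compute the same partition for a key.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def replica_nodes(key: str, nodes: List[str], replication: int = 2) -> List[str]:
    """Primary node for the key, followed by the next (replication - 1) nodes."""
    start = partition_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


if __name__ == "__main__":
    cluster = ["node-a", "node-b", "node-c"]
    for k in ["user:1", "user:2", "user:3"]:
        print(k, replica_nodes(k, cluster))
```

With plain modulo hashing, changing the number of nodes remaps most keys; that reshuffling cost is the main reason production stores prefer consistent hashing.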


Disadvantages of NoSQL

• New and sometimes buggy

• Data is generally duplicated, with the potential for inconsistency

• No standardized schema

• No standard format for queries

• No standard language

• Difficult to impose complicated structures

• Depend on the application layer to enforce data integrity

• No guarantee of support

• Too many options: which one (or ones) to pick?


NoSQL Options

Key-Value Stores

• This is technology you know, love and use all the time

– A hashmap, for example

• Put(key, value)

• value = Get(key)

• Examples

– Redis (my favorite!!): in-memory store

– Memcached

– and 100s more
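The Put/Get interface can be sketched as a thin wrapper over a Python dict. This is a toy, single-process illustration with hypothetical class and method names; Redis exposes the same idea through its `SET` and `GET` commands, adding networking, persistence and expiry on top.

```python
from typing import Dict, Generic, Optional, TypeVar

K = TypeVar("K")
V = TypeVar("V")
_MISSING = object()  # sentinel so that a stored None is not mistaken for "absent"


class KeyValueStore(Generic[K, V]):
    """Toy in-memory key-value store: the whole API is put/get/delete by key.

    Values are opaque to the store; there is no way to query by their contents.
    """

    def __init__(self) -> None:
        self._data: Dict[K, V] = {}

    def put(self, key: K, value: V) -> None:
        """Insert a new key or overwrite the existing value."""
        self._data[key] = value

    def get(self, key: K) -> Optional[V]:
        """Return the stored value, or None if the key is absent."""
        return self._data.get(key)

    def delete(self, key: K) -> bool:
        """Remove the key; return True if it was present."""
        return self._data.pop(key, _MISSING) is not _MISSING


if __name__ == "__main__":
    store: KeyValueStore[str, str] = KeyValueStore()
    store.put("session:42", "alice")
    print(store.get("session:42"))  # alice
```

The simplicity of this contract is what makes key-value stores easy to partition: each operation touches exactly one key, so it can be routed to exactly one node.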


Column Stores

• Not to be confused with the relational-DB version of this

– Sybase IQ etc.

• Multi-dimensional map

• Not all entries are relevant each time

– Column families

• Examples

– Cassandra

– HBase

– Amazon SimpleDB
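The “multi-dimensional map” idea can be sketched as nested dictionaries, loosely following the HBase/Bigtable data model (row key, column family, column qualifier). Class and method names are hypothetical, and the versioning by timestamp that real column stores keep is left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class ColumnFamilyTable:
    """Toy sparse table in the spirit of the HBase/Bigtable data model.

    Layout: row key -> column family -> column qualifier -> value.
    Column families are declared up front; qualifiers inside a family are
    not, and each row stores only the columns actually written to it.
    """

    def __init__(self, families: Iterable[str]) -> None:
        self.families = set(families)
        self.rows: Dict[str, Dict[str, Dict[str, str]]] = defaultdict(
            lambda: defaultdict(dict))

    def put(self, row: str, family: str, qualifier: str, value: str) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][family][qualifier] = value

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        # .get() on the defaultdicts never creates empty entries
        return self.rows.get(row, {}).get(family, {}).get(qualifier)

    def get_family(self, row: str, family: str) -> Dict[str, str]:
        """Read one column family of a row without touching the others."""
        return dict(self.rows.get(row, {}).get(family, {}))


if __name__ == "__main__":
    t = ColumnFamilyTable(["profile", "activity"])
    t.put("user1", "profile", "name", "Ann")
    t.put("user1", "activity", "last_login", "2013-02-16")
    t.put("user2", "profile", "name", "Bob")  # user2 has no "activity" columns
    print(t.get_family("user1", "profile"))   # {'name': 'Ann'}
```

Reading a single family per request is the “not all entries are relevant each time” point: real column stores keep each family physically together, so a query for profile data never reads activity data.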


Document Stores

• Key-document stores

– The document can be seen as the value, so you can consider this a superset of key-value

• The big difference is that in document stores one can also query on the document itself, i.e. the document portion is structured (not just a blob of data)

• Examples

– MongoDB

– CouchDB
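To illustrate the difference from a key-value store, the toy collection below supports both a whole-document lookup by key and a query on fields inside the documents, using dotted paths such as "address.city". The dotted-path convention resembles MongoDB's query syntax, but this is not the MongoDB API; all names are hypothetical.

```python
from typing import Any, Dict, List

_MISSING = object()  # marks a path that does not exist in a document


def get_path(doc: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as "address.city" into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


class DocumentCollection:
    """Toy document store: values are structured documents keyed by _id."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc_id: str, doc: Dict[str, Any]) -> None:
        self._docs[doc_id] = {"_id": doc_id, **doc}

    def get(self, doc_id: str) -> Dict[str, Any]:
        """Key-value style access: fetch the whole document by key."""
        return self._docs[doc_id]

    def find(self, criteria: Dict[str, Any]) -> List[Dict[str, Any]]:
        """Document-store style access: match on fields inside the documents."""
        return [d for d in self._docs.values()
                if all(get_path(d, p) == v for p, v in criteria.items())]


if __name__ == "__main__":
    users = DocumentCollection()
    users.insert("u1", {"name": "Ann", "address": {"city": "Moscow"}})
    # Documents need not share a schema: u2 has an extra "tags" field.
    users.insert("u2", {"name": "Bob", "address": {"city": "Boston"}, "tags": ["vip"]})
    print([d["_id"] for d in users.find({"address.city": "Moscow"})])  # ['u1']
```

This linear scan only shows the semantics; real document stores build secondary indexes on such fields so queries do not read every document.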


Graph Stores

• Use a graph structure

– Labeled, directed, attributed multi-graph

• Label for each edge

• Directed edges

• Multiple attributes per node

• Multiple edges between nodes

– Relational DBs can model graphs, but following an edge requires a join, which is expensive

• Example: Neo4j

– http://www.infoq.com/articles/graph-nosql-neo4j
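A toy labeled, directed, attributed multigraph held as adjacency lists. It illustrates the point about joins: following an edge is a direct lookup on the node's own edge list rather than a join over an edge table. This is not the Neo4j API (Neo4j is queried with its Cypher language); class and method names are hypothetical.

```python
from collections import defaultdict, deque
from typing import Any, Dict, List, Set, Tuple


class PropertyGraph:
    """Toy labeled, directed, attributed multigraph."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Dict[str, Any]] = {}  # node id -> attributes
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, **attrs: Any) -> None:
        self.nodes[node_id] = attrs

    def add_edge(self, src: str, label: str, dst: str) -> None:
        # Parallel edges are allowed (multigraph), so there is no duplicate check.
        self.out_edges[src].append((label, dst))

    def neighbors(self, node_id: str, label: str) -> List[str]:
        """Targets of the outgoing edges of `node_id` that carry `label`."""
        return [dst for lbl, dst in self.out_edges.get(node_id, []) if lbl == label]

    def reachable(self, start: str, label: str) -> Set[str]:
        """All nodes reachable from `start` via edges with `label` (breadth-first)."""
        seen: Set[str] = set()
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in self.neighbors(current, label):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


if __name__ == "__main__":
    g = PropertyGraph()
    for person in ["ann", "bob", "eve"]:
        g.add_node(person, kind="person")
    g.add_edge("ann", "KNOWS", "bob")
    g.add_edge("bob", "KNOWS", "eve")
    g.add_edge("ann", "WORKS_WITH", "bob")  # second, differently labeled edge, same pair
    print(sorted(g.reachable("ann", "KNOWS")))  # ['bob', 'eve']
```

In a relational schema the same “friends of friends” traversal would need one self-join of the edge table per hop, which is the cost the slide refers to.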
