1 Global Marketing Confidential REMINDER Check in on the COLLABORATE mobile app #C14LV 207:Surviving and thriving in the big data revolution Guy Harrison Executive Director, R&D Information Management Group Dell Software

Thriving and surviving the Big Data revolution


Presentation on Big Data given at Collaborate 2014 #c14lv


1 Global Marketing Confidential

REMINDER: Check in on the COLLABORATE mobile app

#C14LV

207: Surviving and thriving in the big data revolution

Guy Harrison

Executive Director, R&D

Information Management Group

Dell Software

207: Surviving and thriving in the big data revolution

Guy Harrison

Executive Director, R&D, Information Management Group

3 Software Group

Introductions

Web: guyharrison.net

Email: guy.harrison@software.dell.com

Twitter: @guyharrison

Google Plus: https://www.google.com/+GuyHarrison1

4 Software Group

5 Software Group

6 Software Group

7 Software Group

8 Software Group

Dell and Quest – a brief history

9 Software Group

But Seriously

10 Software Group

What is Big Data?

11 Software Group

Three or Four “V”s

Volume: Terabytes, Petabytes, Exabytes, Zettabytes

Variety: Structured, Unstructured, Human Generated, Machine Generated

Velocity: User populations x Transaction rates x Machine data

Value: Competitive or Collective advantage

12 Software Group

Instead - the industrial Revolution of data

13 Software Group

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Data means more

1993:
• Generated internally
• Key to operational efficiency

2013:
• Generated externally
• Key to competitive advantage
• Source of product innovation
• Changing our lives

22 Software Group

Big Data is the culmination of cloud, social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail?

25 Software Group

Prevalence of Showrooming

[Chart: percentage of shoppers who showroom, by retail category from Consumer Electronics to Home Improvement; axis 0 to 70 percent]

Source: Gartner Research G00249458, “Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable”, published 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

33 Software Group

34 Software Group

Why showrooming?

Reasons: Selection, Stock, Faster, Cheaper

Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization

35 Software Group

It’s not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There’s a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Don’t Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

Amazon softcover: $45.99

Amazon Kindle: $39.99

Say “screw you bookseller” to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

“Siri, call me an ambulance.” → “From now on, I’ll call you ‘An Ambulance’. OK?”

“I want to jump off a bridge.” → “I found 14 bridges nearby.”

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram, top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, the input is split across many parallel Map tasks whose outputs are combined by a single Reduce step]
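To make the diagram concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, using word counting as the example job. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical, and a real cluster runs mappers and reducers as separate tasks on many nodes.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine every value emitted for one key."""
    return key, sum(values)


def map_reduce(lines: list[str]) -> dict[str, int]:
    # On a cluster, each line (or block of lines) would go to a separate mapper task.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    # Keys are handed to the reducer in sorted order, as Hadoop does within a partition.
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(map_reduce(["big data big value", "data means more"]))
```

The shuffle is the step the framework provides for free; user code only supplies the mapper and reducer.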

68 Software Group

Multi-stage Map-Reduce

[Diagram: parallel MAPPERs read data from HDFS in a SCAN/SORT stage, a second set of MAPPERs performs an AGGREGATE stage, and a REDUCE stage returns the results to the Client]
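Multi-stage processing is commonly expressed by chaining passes: the output of one map/reduce pass becomes the input of the next. The sketch below runs two hypothetical stages in one process, first counting visits per site from "user site" log lines and then ranking the sites; `run_stage` and the mapper/reducer names are illustrative assumptions, not Hadoop classes.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable, Iterable


def run_stage(records: Iterable, mapper: Callable, reducer: Callable) -> list[tuple]:
    """One map/shuffle/reduce pass; its output can feed the next stage."""
    groups: dict = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: list[tuple] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count visits per site from "user site" log lines.
def count_mapper(line: str):
    _user, site = line.split()
    yield site, 1


def count_reducer(site: str, ones: list[int]):
    yield site, sum(ones)


# Stage 2: route every total to one key so a single reducer can rank them.
def rank_mapper(record: tuple[str, int]):
    site, count = record
    yield "all", (count, site)


def make_top_reducer(n: int):
    def top_reducer(_key: str, values: list[tuple[int, str]]):
        # Highest count first; ties broken alphabetically by site name.
        for count, site in sorted(values, key=lambda v: (-v[0], v[1]))[:n]:
            yield site, count
    return top_reducer


log = ["dick google", "jane google", "dick ebay", "jane facebook", "dick google"]
totals = run_stage(log, count_mapper, count_reducer)
top_two = run_stage(totals, rank_mapper, make_top_reducer(2))
print(top_two)  # [('google', 3), ('ebay', 1)]
```

Sending everything to a single key in stage 2 is fine for a small result set, but it funnels all data through one reducer, so it only suits the final, already-aggregated stage.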

69 Software Group

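Schema on Read vs Schema on Write

Schema on Write: data is extracted, cleansed, normalized, aggregated, transformed and loaded into a Data Warehouse before anyone codes against it, analyses it and utilizes it.

Schema on Read: data is loaded into Hadoop as-is; cleansing is done in code at analysis time, and the results are then utilized.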
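The difference can be sketched in a few lines. In this hypothetical example (the `Visit` record and the CSV layout are assumptions), the schema-on-write loader rejects the load when a row does not fit, while the schema-on-read path stores the raw lines untouched and applies the structure only when the data is queried.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class Visit:
    name: str
    site: str
    counter: int


RAW = "Dick,Ebay,507018\nJane,Google,716426\nbad line without fields\n"


def load_schema_on_write(raw: str) -> list[Visit]:
    """Schema on write: validate and convert every row before storing; a bad row stops the load."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        if len(fields) != 3 or not fields[2].isdigit():
            raise ValueError(f"rejected at load time: {fields!r}")
        rows.append(Visit(fields[0], fields[1], int(fields[2])))
    return rows


def store_raw(raw: str) -> list[str]:
    """Schema on read, step 1: keep the lines exactly as they arrived."""
    return raw.splitlines()


def read_with_schema(lines: list[str]) -> list[Visit]:
    """Schema on read, step 2: interpret lines only at query time, skipping rows that do not fit."""
    parsed = []
    for fields in csv.reader(lines):
        if len(fields) == 3 and fields[2].isdigit():
            parsed.append(Visit(fields[0], fields[1], int(fields[2])))
    return parsed


stored = store_raw(RAW)
print(len(stored), "raw lines stored")
print(read_with_schema(stored))
try:
    load_schema_on_write(RAW)
except ValueError as err:
    print(err)
```

The trade-off shows up directly: schema on read accepts everything cheaply but pushes cleansing into every query, while schema on write pays the cleansing cost once, up front.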

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• Map Reduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a HADOOP CLIENT (Java, Pig, Hive) submits jobs to the JOB TRACKER, which coordinates MAP REDUCE (distributed processing), while the NAME NODE and SECONDARY NAME NODE manage HDFS (distributed storage); every worker node runs both a DATA NODE and a TASK TRACKER]

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a HADOOP CLIENT (Java, Pig, Hive) submits work to the RESOURCE MANAGER; each worker runs a NODE MANAGER that hosts CONTAINERs, and the job's APPLICATION MASTER runs in one of those containers]

77 Software Group

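Tez¹

¹ Hindi for “fast”

[Diagram: without Tez, the work runs as Job 1, Job 2 and Job 3, each a MAP and REDUCE pass that reads from and writes back to HDFS; with Tez, the same MAP and REDUCE steps run as a single Job 1 that goes from HDFS to HDFS once]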

78 Software Group

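HBase

A real-time database built on Hadoop

[Diagram comparing Oracle with HBase. Oracle: tables live in Datafiles on Disks managed by ASM, cached in the Buffer Cache, with changes passing through the Log Buffer to Redo. HBase: tables live in HFiles on Disks managed by HDFS, with recent writes held in the MemStore and changes recorded in the Write-Ahead (WA) Log]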
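A toy model of the HBase side of the diagram may help: every write is appended to the write-ahead log, then applied to the in-memory MemStore, and when the MemStore grows past a limit it is flushed to a new immutable, sorted HFile. This is a hedged sketch, not HBase code: `MiniRegion` and its `flush_threshold` are hypothetical, and real HBase also rolls the log after flushes, compacts HFiles, and uses block indexes and Bloom filters for reads.

```python
from __future__ import annotations

import bisect
from dataclasses import dataclass, field


@dataclass
class MiniRegion:
    """Hypothetical toy model of an HBase-style write path; not HBase's API."""
    flush_threshold: int = 3
    wal: list[tuple[str, str]] = field(default_factory=list)           # append-only change log
    memstore: dict[str, str] = field(default_factory=dict)             # recent writes, in memory
    hfiles: list[list[tuple[str, str]]] = field(default_factory=list)  # immutable sorted files, oldest first

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))  # 1. record the change in the log first
        self.memstore[row_key] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the MemStore as a new sorted, immutable file and start an empty one.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key: str) -> str | None:
        if row_key in self.memstore:           # newest data is in memory
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):    # then the newest file first
            keys = [key for key, _ in hfile]
            i = bisect.bisect_left(keys, row_key)  # files are sorted, so binary search works
            if i < len(keys) and keys[i] == row_key:
                return hfile[i][1]
        return None


region = MiniRegion(flush_threshold=2)
region.put("dick", "v1")
region.put("jane", "v1")   # second distinct key triggers a flush
region.put("dick", "v2")   # newer value, still in the MemStore
print(region.get("dick"), region.get("jane"), len(region.hfiles), len(region.wal))
```

The same idea underlies Oracle's redo: log first for durability, then update memory, and write the bulk data files lazily.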

79 Software Group

HBase Data Model

Flat (one row per name and site):

Name   Site              Counter
Dick   Ebay              507018
Dick   Google            690414
Jane   Google            716426
Dick   Facebook          723649
Jane   Facebook          643261
Jane   ILoveLarry.com    856767
Dick   MadBillFans.com   675230

Normalized (relational):

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name, one column per site; each row stores only the columns it has):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
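The pivot from the flat rows to HBase-style wide rows can be sketched with plain dictionaries. The `site:` prefix imitates HBase's `family:qualifier` column naming, but the family name `site` and the helper `to_wide_rows` are hypothetical; the point is that each row key stores only the columns it actually has.

```python
from __future__ import annotations

from collections import defaultdict

# Flat rows from the slide: (name, site, counter).
visits = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot to one row per name with one column per site, like an HBase row.
    Columns a row has no value for are simply absent (sparse storage)."""
    table: dict[str, dict[str, int]] = defaultdict(dict)
    for name, site, counter in rows:
        table[name][f"site:{site}"] = counter
    return dict(table)


wide = to_wide_rows(visits)
print(wide["Jane"])
```

Unlike the normalized design, reading everything about one name is a single row lookup with no joins, at the cost of repeating column names in every row.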

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: Hive turns SQL into Java Map Reduce jobs that produce the RESULTS]

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala, MapR Drill, Aster, Greenplum (Pivotal HD), ParAccel, Hadapt

Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

[Diagram: FLUME streams WebLogs into HDFS; SQOOP transfers the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (YARN and EC2 also shown as resource layers)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL), alongside Hive (SQL) and Map Reduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only)

Exadata: $4,911 per TB

Hadoop: $750 per TB
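The chart's two figures imply a ratio of roughly 6.5 to 1. The arithmetic below reproduces it and, purely as an illustration, scales both prices to a hypothetical 100 TB of data (hardware only, as on the slide).

```python
# Hardware-only cost per terabyte, as read off the slide's chart (USD/TB).
exadata_per_tb = 4911
hadoop_per_tb = 750

ratio = exadata_per_tb / hadoop_per_tb
print(f"Exadata hardware costs about {ratio:.1f}x as much per TB as Hadoop")

# Purely illustrative: hardware cost for a hypothetical 100 TB of data.
capacity_tb = 100
for platform, per_tb in [("Exadata", exadata_per_tb), ("Hadoop", hadoop_per_tb)]:
    print(f"{platform}: ${per_tb * capacity_tb:,}")
```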

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6-core CPUs per node (216 cores total)
  – 12x2 TB HDD per node (216 spindles, 432 TB raw)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
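The rack totals follow from multiplying out the per-node figures. The sketch below does that arithmetic and also shows, as an assumption, what HDFS's default replication factor of 3 would leave as usable capacity; actual usable space depends on configuration and other overheads.

```python
# Per-node figures from the slide.
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6        # two 6-core CPUs
disks_per_node = 12
tb_per_disk = 2

total_ram_gb = nodes * ram_gb_per_node
total_cores = nodes * cores_per_node
total_spindles = nodes * disks_per_node
raw_tb = total_spindles * tb_per_disk

# Assumption: HDFS default replication factor of 3; real usable space is lower still
# once temporary and non-HDFS space is set aside.
replication = 3
usable_tb = raw_tb / replication

print(total_ram_gb, total_cores, total_spindles, raw_tb, usable_tb)
```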

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics, AKA Data Science
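One common collective-intelligence technique is “people who bought this also bought” recommendation, where the crowd's past baskets stand in for any understanding of the products themselves. The sketch below is a minimal co-occurrence recommender over hypothetical baskets; it is one illustrative approach, not how any particular product implements recommendations.

```python
from __future__ import annotations

from collections import Counter
from itertools import permutations


def build_cooccurrence(baskets: list[set[str]]) -> dict[str, Counter]:
    """Count how often each ordered pair of items appears in the same basket."""
    co: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            co.setdefault(a, Counter())[b] += 1
    return co


def recommend(co: dict[str, Counter], item: str, n: int = 2) -> list[str]:
    """Items most often bought together with `item` by other customers."""
    return [other for other, _ in co.get(item, Counter()).most_common(n)]


baskets = [
    {"tv", "hdmi cable", "wall mount"},
    {"tv", "hdmi cable"},
    {"tv", "soundbar"},
    {"hdmi cable", "soundbar"},
]
co = build_cooccurrence(baskets)
print(recommend(co, "tv", n=1))
```

No one told the program that TVs need cables; the pattern emerges from what many customers did.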

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points (x axis 0 to 120, y axis -20 to 120) with a fitted trend line f(x) = 0.971521231456065 x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
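The trend line on the chart is the kind of output ordinary least squares produces. A minimal pure-Python fit is sketched below; the sample observations are hypothetical, not the chart's data, so the fitted coefficients will differ from the slide's.

```python
from __future__ import annotations


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [0, 20, 40, 60, 80, 100]
ys = [2, 18, 41, 59, 79, 98]   # hypothetical observations
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.3f} x + {intercept:.3f}")
# Extrapolating beyond the observed range is the "predictive" step.
print(f"prediction at x=120: {slope * 120 + intercept:.1f}")
```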

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
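As a concrete instance of classification, spam detection is often introduced with naive Bayes. The sketch below is a minimal multinomial naive Bayes with add-one smoothing, trained on four made-up messages; the function names and data are hypothetical and far too small for real use.

```python
from __future__ import annotations

import math
from collections import Counter


def train(messages: list[tuple[str, str]]):
    """messages: (text, label) pairs. Returns per-label word counts and label counts."""
    word_counts: dict[str, Counter] = {}
    label_counts: Counter = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts


def classify(text: str, word_counts: dict[str, Counter], label_counts: Counter) -> str:
    """Multinomial naive Bayes with add-one (Laplace) smoothing, scored in log space."""
    vocab: set[str] = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, counts in word_counts.items():
        score = math.log(label_counts[label] / total)       # prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)    # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win cash now", "spam"),
    ("cheap cash offer", "spam"),
    ("meeting notes attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(training)
print(classify("cash offer now", *model))
print(classify("notes for tomorrow", *model))
```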

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
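k-means is one widely used clustering algorithm: it alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. In the minimal sketch below, each point is a shopping basket described by two hypothetical, pre-scaled features, and the first k points are used as starting centres to keep the result deterministic (real implementations usually seed randomly or with k-means++).

```python
from __future__ import annotations

import math

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 10) -> tuple[list[Point], list[int]]:
    """Plain k-means (Lloyd's algorithm) seeded with the first k points."""
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, assignment


baskets = [(1, 2), (1.5, 1.8), (1, 0.6), (8, 8), (9, 11), (8, 9)]
centres, labels = kmeans(baskets, k=2)
print(labels)
print(centres)
```

No labels were supplied; the two groups emerge from the data alone, which is what separates clustering from classification.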

121 Software Group

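Supervised Machine Learning

[Diagram: Raw Data from Existing Business is cleaned and split into a Training Set and a Validation Set; the training set produces a Candidate Model, which is validated against the validation set and promoted to a Production Model; New Data from New Business is fed to the Production Model to make a Prediction]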
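The workflow in the diagram can be walked through with a deliberately tiny model. In this hedged sketch the data (months since last purchase, and whether the customer churned), the threshold model and the `MIN_VALIDATION_ACCURACY` promotion bar are all hypothetical; what matters is the shape of the process: split, train a candidate, validate, promote, predict on new data.

```python
from __future__ import annotations

import random

# Existing business: (months since last purchase, churned?) (hypothetical data).
history = [(1, False), (2, False), (3, False), (4, False), (5, False), (2, False),
           (7, True), (8, True), (9, True), (10, True), (11, True), (12, True)]

MIN_VALIDATION_ACCURACY = 0.8  # hypothetical bar for promoting a candidate to production


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle the labelled rows, then hold back a share of them for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def predict(threshold, months):
    return months > threshold


def accuracy(threshold, rows):
    return sum(predict(threshold, x) == label for x, label in rows) / len(rows)


def train_threshold(training):
    """Candidate model: the threshold (from the training values) with the best training accuracy."""
    return max(sorted({x for x, _ in training}), key=lambda t: accuracy(t, training))


training_set, validation_set = split(history)
candidate = train_threshold(training_set)
validation_accuracy = accuracy(candidate, validation_set)
production = candidate if validation_accuracy >= MIN_VALIDATION_ACCURACY else None

if production is None:
    print(f"candidate rejected (validation accuracy {validation_accuracy:.2f})")
else:
    # New business: score a customer the model has never seen.
    print("churn risk at 6 months since last purchase:", predict(production, 6))
```

Holding back the validation set is what guards against a candidate that merely memorised its training data.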

122 Software Group

LinkedIn InMaps (inmaps.linkedin.com)

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search Optimization
• Recommendation Systems
• Security: vulnerability, penetration detection
• Fraud Detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Change Data Capture reads the Redo-logs and publishes changes through a JMS Queue to a Hadoop Poster, which delivers them as a batched HDFS file copy, as audit change data, or as HBase real-time replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products shown: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

#C14LV

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 2: Thriving and surviving the Big Data revolution

207Surviving and thriving in the big data revolution

Guy Harrison

Executive Director RampDInformation management group

3 Software Group

Introductions

Web guyharrisonnet Email guyharrisonsoftwaredellcom Twitter guyharrisonGoogle Plus httpswwwgooglecom+GuyHarrison1

4 Software Group

5 Software Group

6 Software Group

7 Software Group

8 Software Group

Dell and Quest ndash a brief history

>
>

9 Software Group

But Seriously

10 Software Group

What is Big Data

11 Software Group

Three or Four ldquoVrdquos

VolumeTerabytesPetabytesExabytesZetabytes

VarietyStructuredUnstructuredHuman GeneratedMachine Generated

VelocityUser populations xTransaction rates xMachine data

Value Competitive or Collective advantage

12 Software Group

Instead - the industrial Revolution of data

13 Software Group

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

Dell and Quest - a brief history

But Seriously

What is Big Data?

Three or Four "V"s
• Volume: terabytes, petabytes, exabytes, zettabytes
• Variety: structured, unstructured, human generated, machine generated
• Velocity: user populations × transaction rates × machine data
• Value: competitive or collective advantage

Instead - the industrial revolution of data

Data means more
• 1993: generated internally; key to operational efficiency
• 2013: generated externally; key to competitive advantage; source of product innovation; changing our lives

Big Data is the culmination of cloud, social and mobile

Not all upside

Will Big Data kill retail?

Prevalence of Showrooming
[Chart: prevalence of showrooming by category (Consumer Electronics, Home Improvement); x-axis: percent, 0 to 70]
Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013

Some novel defences

Web analytics for retail

Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

Why showrooming
• Reasons: selection, stock, faster, cheaper
• Defences: dynamic pricing, predictive ordering, assortment optimization, predictive recommendations, personalization

It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages

There's a similar story in every industry
Web, transport, power grid, dating, retail, security, finance, government, science, healthcare, insurance, telecom, advertising

The Revolution is not over yet

Willy Bowman
Nationality: German
"Don't mention the WAR"

Buying choices: Oracle Performance Survival Guide
• Amazon softcover: $45.99
• Amazon Kindle: $39.99
Say "screw you, bookseller" to buy the Kindle version

Data Input

Siri
"Siri, call me an ambulance." → "From now on, I'll call you 'An Ambulance'. OK?"
"I want to jump off a bridge." → "I found 14 bridges nearby."

Brain Control

Muze

The instrumented human
• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse, temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

The instrumented world

All of which accelerates what we call Big Data

Big Database technologies

Pioneers of Big Data

Google Software Architecture
[Diagram: layers, from top: Google Applications; MapReduce and BigTable; Google File System (GFS)]

Map Reduce
[Diagram: from Start, many Map tasks run in parallel; their output feeds a Reduce step]

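As a rough illustration of the map/reduce pattern in the diagram, here is a minimal single-process Python sketch. It is an in-memory simulation under our own assumptions, not Hadoop's Java API, and the function names are hypothetical: map calls emit (key, value) pairs, a shuffle step groups them by key, and reduce calls combine each group. Word count is used as the example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # On a cluster each map call could run on a different node; here they run in sequence.
    mapped = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(k, vs) for k, vs in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data means more"]))
```

On a real cluster the map calls run next to the input blocks and the framework performs the shuffle over the network; the logic of each step is unchanged.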
Multi-stage Map-Reduce
[Diagram: Mapper tasks read from HDFS (scan/sort stage), a second set of Mappers aggregates their output (aggregate stage), and a Reduce step returns the result to the Client]

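A multi-stage job simply feeds the output of one map/reduce stage into the next. The Python sketch below is a hypothetical, in-memory illustration (not Hadoop code): stage 1 counts visits per site from "user site" log lines, and stage 2 routes every total to a single key so one reducer can rank the sites. The log format and the top-2 cut-off are assumptions for the example.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """Run one MapReduce stage: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # reducers receive keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count visits per site from "user site" log lines.
def visits_mapper(line):
    _user, site = line.split()
    yield site, 1


def sum_reducer(site, ones):
    yield site, sum(ones)


# Stage 2: send every (site, total) to one key so a single reducer can rank them.
def top_mapper(pair):
    site, total = pair
    yield "all", (total, site)


def top_reducer(_key, totals, n=2):
    for total, site in sorted(totals, reverse=True)[:n]:
        yield site, total


def top_sites(log_lines):
    per_site = run_stage(log_lines, visits_mapper, sum_reducer)  # stage 1 output...
    return run_stage(per_site, top_mapper, top_reducer)          # ...is stage 2 input
```

In Hadoop 1.0 each stage would be a separate job that writes its intermediate result to HDFS before the next job reads it.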
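Schema on Read vs Schema on Write
[Diagram: Schema on Write - data is extracted, cleansed, normalized, aggregated and transformed, then loaded into the Data Warehouse before code is written to analyse and utilize it. Schema on Read - data is loaded into Hadoop as-is; code then cleanses and analyses it at read time before it is utilized.]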

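The contrast can be sketched in a few lines of Python. In this hypothetical example (the JSON record layout and field names are assumptions), the schema-on-write loader validates and converts each record before storing it and rejects anything that does not fit, while the schema-on-read loader stores raw lines untouched and the reader decides how to interpret them at query time.

```python
import json

RAW = [
    '{"name": "Dick", "site": "Ebay", "visits": "12"}',
    '{"name": "Jane", "site": "Google"}',  # missing "visits" field
    "not json at all",                     # malformed record
]


def load_schema_on_write(raw_lines):
    """Validate and convert every record before storing it; bad rows are rejected at load time."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            table.append((rec["name"], rec["site"], int(rec["visits"])))
        except (ValueError, KeyError):  # JSONDecodeError is a ValueError subclass
            rejected.append(line)
    return table, rejected


def load_schema_on_read(raw_lines):
    """Store the raw lines untouched; structure is applied later by whoever reads them."""
    return list(raw_lines)


def read_visits(stored_lines):
    """Apply a schema at query time: parse, default missing values, skip what cannot be parsed."""
    for line in stored_lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue
        yield rec.get("name"), rec.get("site"), int(rec.get("visits", 0))
```

The trade-off is visible in the results: the warehouse-style load keeps only the clean row, while the Hadoop-style load keeps everything and lets each query choose its own defaults.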
Hadoop Open Source Map-Reduce Stack

Hadoop at Yahoo
Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

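Dividing the cluster totals by the node count gives a feel for a typical node. This assumes the nodes were identical, which the slide does not state, and uses decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB).

```python
NODES = 4_000
DISK_TB = 16 * 1_000  # 16 PB, taking 1 PB = 1,000 TB
RAM_GB = 64 * 1_000   # 64 TB of RAM, taking 1 TB = 1,000 GB
CORES = 32_000


def per_node_averages():
    """Average resources per node, assuming every node is identical."""
    return {
        "disk_tb": DISK_TB / NODES,
        "ram_gb": RAM_GB / NODES,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    for name, value in per_node_averages().items():
        print(f"{name}: {value:g}")
```

That works out to roughly 4 TB of disk, 16 GB of RAM and 8 cores per node: commodity servers rather than large machines.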
Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

Hadoop 1.0 Architecture
[Diagram: a Hadoop client (Java, Pig, Hive) submits work to MapReduce (distributed processing), coordinated by the Job Tracker, and stores data in HDFS (distributed storage), coordinated by the Name Node and a Secondary Name Node; each worker node runs a Data Node and a Task Tracker]

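To make the Name Node / Data Node split concrete, here is a toy Python model of the metadata a name node keeps: which data nodes hold a replica of each block of a file. The 64 MB block size is an illustrative assumption and the function name is hypothetical; the replication factor of 3 is the HDFS default. Real HDFS uses rack-aware replica placement, whereas this sketch simply rotates through the data nodes.

```python
from itertools import cycle

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size in bytes (illustrative)
REPLICATION = 3                # HDFS's default replication factor


def plan_blocks(file_size, datanodes):
    """Toy name-node metadata: map each block index of a file to the data nodes holding a replica.

    Real HDFS uses rack-aware placement; this sketch simply rotates through the data nodes.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least as many data nodes as replicas")
    n_blocks = -(-file_size // BLOCK_SIZE)  # ceiling division: a partial last block still counts
    nodes = cycle(datanodes)
    return {block: [next(nodes) for _ in range(REPLICATION)] for block in range(n_blocks)}


if __name__ == "__main__":
    for block, replicas in plan_blocks(150 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, replicas)
```

Because each block has several replicas on different nodes, the Job Tracker can schedule a map task on any node that already holds the block, and losing one data node does not lose data.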
Hadoop 2.0 YARN (Yet Another Resource Negotiator)
[Diagram: the Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; an Application Master coordinates each application, and Node Managers run its work in Containers]

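Tez¹
[Diagram: without Tez, a query runs as chained MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing back to HDFS; with Tez, the same Map and Reduce steps run as a single job (Job 1), touching HDFS only at the start and end]
¹ Hindi for "fast"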

HBase
A real-time database built on Hadoop
[Diagram: HBase compared with Oracle. Oracle: tables in the Buffer Cache, Datafiles, and Redo logs written via the Log Buffer, on ASM disks. HBase: tables in the MemStore, HFiles, and the write-ahead (WA) log, on HDFS disks]

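The Oracle analogy (redo log and WA log, buffer cache and MemStore, datafiles and HFiles) maps onto HBase's write path: an edit is first appended to the write-ahead log, then applied to the in-memory MemStore, and the MemStore is periodically flushed to a new immutable, sorted HFile. The class below is a hypothetical in-memory model of that sequence, not the HBase client API. The flush threshold is counted in rows purely for illustration; HBase itself flushes by MemStore size.

```python
class ToyRegion:
    """Hypothetical in-memory model of an HBase-style write path (not the HBase API)."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log: every edit, in arrival order
        self.memstore = {}     # recent edits, held in memory
        self.hfiles = []       # flushed, immutable, sorted "files"
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))  # 1. log first, so the edit survives a crash
        self.memstore[row] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted, immutable file and clear it."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row):
        """Check memory first, then files from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row:
                    return value
        return None
```

Because flushed files are never modified, reads must consult the MemStore and then the files newest-first; real HBase later merges files through compaction.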
HBase Data Model

Relational, one row per name/site pair:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase, one wide, sparse row per name:
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

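The move from the normalized tables to HBase's wide rows is essentially a pivot: each (row key, column, value) triple becomes one cell in a per-key row, and each row stores only the columns it actually has. A minimal Python sketch using the name/site counters from this slide (plain dictionaries standing in for HBase rows, not the HBase API):

```python
from collections import defaultdict

# (name, site, counter) triples from the slide
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse row per key.

    Each row holds only the columns it has, so Dick and Jane can carry
    different column sets, as HBase rows can.
    """
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)


if __name__ == "__main__":
    for key, columns in to_wide_rows(ROWS).items():
        print(key, columns)
```

Reading everything about one person then becomes a single-row lookup instead of a three-table join.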
Hive

[Diagram: a SQL query is translated into Java (MapReduce) code, which runs to produce the results]

Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

Pig
[Diagram: a Pig Latin script compared with the equivalent SQL or HiveQL]

Flume and SQOOP
[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]

Berkeley Data Analytics Stack (BDAS)
• Mesos - heterogeneous cluster manager (shown alongside YARN and EC2)
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• Shark (SQL), alongside Hive (SQL) on MapReduce
• BlinkDB
• MLbase/MLlib - machine learning

Meanwhile back at the Death Star

Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

Economies
Exadata vs Hadoop $/TB (hardware only)
[Chart: Exadata $4,911 per TB; Hadoop $750 per TB]

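The chart's per-terabyte figures imply a hardware price difference of roughly 6.5×. A small Python calculation makes that concrete, assuming cost scales linearly with capacity and that the figures are US dollars; the 100 TB size is an arbitrary example.

```python
EXADATA_USD_PER_TB = 4_911  # hardware only, from the chart
HADOOP_USD_PER_TB = 750     # hardware only, from the chart


def hardware_cost(terabytes):
    """Hardware cost for a given capacity, assuming cost scales linearly with TB."""
    return {
        "exadata": terabytes * EXADATA_USD_PER_TB,
        "hadoop": terabytes * HADOOP_USD_PER_TB,
    }


def price_ratio():
    """How many times more Exadata costs per TB than Hadoop."""
    return EXADATA_USD_PER_TB / HADOOP_USD_PER_TB


if __name__ == "__main__":
    print(hardware_cost(100))  # 100 TB is an arbitrary example size
    print(round(price_ratio(), 1))
```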
Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 × 6-core CPUs per node (216 cores total)
  - 12 × 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html

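Multiplying the per-node specification out to a full rack gives the totals below (raw disk capacity, before HDFS replication). The constants are taken from the slide; the function name is only illustrative.

```python
SERVERS = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE, CORES_PER_CPU = 2, 6
DISKS_PER_NODE, TB_PER_DISK = 12, 2


def rack_totals():
    """Full-rack totals implied by the per-node specification."""
    spindles = SERVERS * DISKS_PER_NODE
    return {
        "ram_gb": SERVERS * RAM_GB_PER_NODE,
        "cores": SERVERS * CPUS_PER_NODE * CORES_PER_CPU,
        "spindles": spindles,
        "raw_tb": spindles * TB_PER_DISK,
    }


if __name__ == "__main__":
    print(rack_totals())
```

With HDFS's default three-way replication, usable capacity would be roughly a third of the raw figure.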
Big Data Appliance Software
• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

Generating competitive advantage through "Big Data analytics"
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

Collective Intelligence

Google Flu Trends

Collective Intelligence outsmarts Artificial Intelligence

Artificial Intelligence Strikes Back

Watson is big data AI

Predictive Analytics
[Chart: scatter plot (x-axis 0 to 120, y-axis -20 to 120) with a fitted line, f(x) = 0.971521231456065x + 0.71906459527154]
• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART

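The fitted line on the slide is the kind of output ordinary least-squares linear regression produces. The chart's data points are not recoverable, so the Python sketch below uses made-up points; it fits y = slope·x + intercept and then extrapolates, which is what "predictive" means here.

```python
def fit_line(xs, ys):
    """Ordinary least squares: return (slope, intercept) of the best-fit line y = slope*x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two matching x and y values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Extrapolate with a fitted (slope, intercept) model."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # illustrative points on y = 2x + 1
    print(model, predict(model, 10))
```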
Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value

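As one simple example of a classification model, here is a nearest-centroid classifier in Python; the slide names no algorithm, so this technique is our choice. The spam features (number of links, share of capital letters) and the training data are hypothetical. Training averages the feature vectors for each label, and classification picks the label whose average is closest to the new data point.

```python
from math import dist


def train_centroids(examples):
    """Nearest-centroid training: average the feature vectors of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids, features):
    """Predict the label whose centroid is closest (Euclidean distance) to the new data point."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features: (links in message, share of capital letters)
TRAINING = [([8, 0.6], "spam"), ([10, 0.4], "spam"), ([0, 0.05], "ham"), ([1, 0.1], "ham")]

if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(classify(model, [7, 0.5]), classify(model, [0, 0.0]))
```

In practice features are usually scaled first so that one large-valued feature (here, link count) does not dominate the distance.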
Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis

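k-means is one common way to group data without predefined classes (the slide does not name an algorithm). The Python sketch below implements Lloyd's version: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The starting centroids are passed in explicitly so the result is reproducible; the sample points are made up.

```python
from math import dist


def kmeans(points, initial_centroids, max_iterations=20):
    """Lloyd's k-means: alternately assign points to the nearest centroid and
    move each centroid to the mean of its assigned points, until nothing changes."""
    centroids = [tuple(c) for c in initial_centroids]
    clusters = [[] for _ in centroids]
    for _ in range(max_iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if updated == centroids:  # converged: assignments no longer move the centroids
            break
        centroids = updated
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
    print(kmeans(pts, [(0, 0), (10, 10)]))
```

For basket analysis the points would be customers or baskets described by purchase features, and each resulting cluster a group of similar shoppers.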
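Supervised Machine Learning
[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; the production model takes new data from new business and produces predictions]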

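The workflow in the diagram (split, train a candidate, validate, promote) can be sketched in Python with a deliberately simple model. Everything here is a hypothetical illustration: rows are (monthly logins, churned) pairs, the candidate model is a single login threshold ("churns if logins < threshold"), and the 0.8 accuracy bar for promotion to production is an assumed value.

```python
import random


def split(data, validation_fraction=0.25, seed=42):
    """Shuffle reproducibly, then hold out part of the data as the validation set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def accuracy(threshold, rows):
    """Share of rows where 'churns if logins < threshold' matches the label."""
    return sum((logins < threshold) == churned for logins, churned in rows) / len(rows)


def train(rows):
    """Candidate model: the login threshold that best separates churners in the training set."""
    candidates = sorted({logins for logins, _ in rows}) + [max(x for x, _ in rows) + 1]
    return max(candidates, key=lambda t: accuracy(t, rows))


def build_production_model(data, min_accuracy=0.8):
    """Train on the training set; promote only if the validation score clears the bar."""
    training, validation = split(data)
    candidate = train(training)
    score = accuracy(candidate, validation)
    return (candidate, score) if score >= min_accuracy else (None, score)
```

Scoring the candidate on rows it never saw during training is the point of the validation set: a model that only memorises the training data will fail the bar instead of reaching production.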
Unsupervised learning
[Image: inmaps.linkedin.com]

Big Data Analytics
Data Science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

Data Scientists to the rescue

Kitenga Analytics Suite

Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx

SharePlex® for Hadoop
[Diagram: SharePlex performs change data capture from redo logs and sends changes through a JMS queue to a Hadoop poster, which supports batched HDFS file copy, audit of change data, and real-time replication to HBase]

Toad BI Suite

Dell's offering was not complete…
Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga
• Business Intelligence: TOAD BI
• Server and Storage
To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

Dell acquires StatSoft
Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga, STATISTICA
• Business Intelligence: TOAD BI
• Server and Storage
Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

Data Visualization

Live scoring - integration into operational systems

Industry and cross-industry packaged solutions

For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data analytics
• Where is the data?
  - Start collecting now

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 7: Thriving and surviving the Big Data revolution

7 Software Group

8 Software Group

Dell and Quest – a brief history

9 Software Group

But Seriously

10 Software Group

What is Big Data?

11 Software Group

Three or four "V"s

• Volume: terabytes, petabytes, exabytes, zettabytes
• Variety: structured, unstructured, human-generated, machine-generated
• Velocity: user populations × transaction rates × machine data
• Value: competitive or collective advantage

12 Software Group

Instead - the industrial Revolution of data

21 Software Group

Data means more

• 1993: generated internally; key to operational efficiency
• 2013: generated externally; key to competitive advantage; source of product innovation; changing our lives

22 Software Group

Big Data is the culmination of cloud, social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail?

25 Software Group

Prevalence of Showrooming

[Chart: showrooming prevalence by product category, including Consumer Electronics and Home Improvement; axis: percent, 0 to 70]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013.

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

34 Software Group

Why showrooming?

Reasons: selection, stock, faster, cheaper

Defences: dynamic pricing, predictive ordering, assortment optimization, predictive recommendations, personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

42 Software Group

Willy Bowman

Nationality: German

Don't Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

• Amazon softcover: $45.99
• Amazon Kindle: $39.99

Say "screw you, bookseller" to buy the Kindle version

45 Software Group

Data Input

46 Software Group

Siri

"Siri, call me an ambulance."
"From now on, I'll call you 'An Ambulance'. OK?"

"I want to jump off a bridge."
"I found 14 bridges nearby."

50 Software Group

Brain Control

53 Software Group

Muze

56 Software Group

The instrumented human

• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Microphone/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

66 Software Group

Google Software Architecture

[Diagram: Google Applications run on MapReduce and BigTable, which are built on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: a job starts many Map tasks in parallel; their output feeds a Reduce step]
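The map, shuffle and reduce phases can be sketched in a few lines of single-process Python. This is an illustrative word count, not Hadoop's Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster runs many mappers and reducers in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
    # {'big': 2, 'data': 2, 'ideas': 2, 'beats': 1}
```

Because each mapper only sees its own lines and each reducer only sees one key's values, both phases can be spread across machines; only the shuffle needs to move data between them.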

68 Software Group

Multi-stage Map-Reduce

[Diagram: data is read from HDFS by a first stage of mappers (scan/sort), whose output passes to a second stage of mappers (aggregate) and then to a reducer that returns results to the client]

69 Software Group

Schema on Read vs Schema on Write

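• Schema on write (data warehouse): extract the data, transform it (cleanse, normalize, aggregate, code), load it into the data warehouse, then analyse and utilize it.
• Schema on read (Hadoop): load the raw data into Hadoop first, then cleanse, code and analyse it at read time before utilizing it.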
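To make the contrast concrete, here is a minimal schema-on-read sketch in Python (an illustration, not a Hadoop API). Raw records are stored exactly as they arrive, and a schema, here a hypothetical `VISIT_SCHEMA` of field names and type converters, is applied only when the data is read. Malformed lines stay in storage and are simply skipped at read time instead of being rejected at load time.

```python
from typing import Callable, Dict, Iterator, List, Tuple

# Hypothetical schema, applied at read time: (field name, type converter).
VISIT_SCHEMA: List[Tuple[str, Callable[[str], object]]] = [
    ("name", str),
    ("site", str),
    ("counter", int),
]

# "Load" step: raw lines are stored as-is, with no validation or transformation.
raw_store: List[str] = [
    "Dick,Ebay,507018",
    "Jane,Google,716426",
    "garbage line",  # kept in storage; only filtered out when read
]


def read_with_schema(lines: List[str],
                     schema: List[Tuple[str, Callable[[str], object]]]
                     ) -> Iterator[Dict[str, object]]:
    """Parse each raw line against the schema, skipping lines that do not fit."""
    for line in lines:
        fields = line.split(",")
        if len(fields) != len(schema):
            continue  # wrong number of fields for this schema
        try:
            yield {name: convert(value)
                   for (name, convert), value in zip(schema, fields)}
        except ValueError:
            continue  # a field could not be converted to the expected type


if __name__ == "__main__":
    for record in read_with_schema(raw_store, VISIT_SCHEMA):
        print(record)
```

A different analysis could read the same `raw_store` with a different schema, which is the main practical attraction of deferring the schema.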

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
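Dividing the cluster totals by the node count gives the average resources per node. This is a back-of-the-envelope calculation in decimal units; the slide does not say whether all nodes were identical.

```latex
\frac{16\ \text{PB}}{4000\ \text{nodes}} = 4\ \text{TB disk per node}, \qquad
\frac{64\ \text{TB}}{4000\ \text{nodes}} = 16\ \text{GB RAM per node}, \qquad
\frac{32000\ \text{cores}}{4000\ \text{nodes}} = 8\ \text{cores per node}
```

In other words, the scale comes from the number of nodes rather than from unusually large individual servers.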

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) submits jobs to the JobTracker, which coordinates MapReduce (distributed processing), while the NameNode, with a Secondary NameNode, manages HDFS (distributed storage); every worker node runs both a DataNode and a TaskTracker]

76 Software Group

Hadoop 2.0 YARN

[Diagram: the Hadoop client (Java, Pig, Hive) submits work to the ResourceManager, which allocates containers on NodeManagers; an ApplicationMaster running in a container coordinates each application]

YARN: Yet Another Resource Negotiator

77 Software Group

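Tez¹

[Diagram: a pipeline run as three separate MapReduce jobs, each writing its output to HDFS before the next job reads it, compared with the same map and reduce steps running as a single Tez job]

¹ Hindi for "fast"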

78 Software Group

HBase

A real-time database built on Hadoop

[Diagram comparing Oracle and HBase storage layers: Oracle buffer cache ↔ HBase MemStore; Oracle log buffer and redo ↔ HBase write-ahead log (WAL); Oracle datafiles ↔ HBase HFiles; ASM-managed disks ↔ HDFS]
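The HBase side of that comparison follows a log-structured write path: each change is appended to the write-ahead log for durability, buffered in the in-memory MemStore, and flushed to an immutable, sorted HFile when the MemStore fills. The Python below is a deliberately simplified, single-process model of that idea. The class name `ToyRegionStore`, the tiny flush threshold and the linear-scan read path are illustrative assumptions, not HBase internals; real HBase also truncates the WAL after flushes, compacts HFiles, splits regions and replays the WAL after a crash.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegionStore:
    """Simplified LSM-style write path: WAL -> MemStore -> sorted immutable files."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []           # append-only log (never truncated here)
        self.memstore: Dict[str, str] = {}             # in-memory buffer of recent writes
        self.hfiles: List[List[Tuple[str, str]]] = []  # sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))  # 1. log the change first
        self.memstore[row_key] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new sorted, immutable file and clear it."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key: str) -> Optional[str]:
        """Check the MemStore first, then files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


if __name__ == "__main__":
    store = ToyRegionStore(flush_threshold=2)
    store.put("dick", "v1")
    store.put("jane", "v1")   # reaches the threshold, so the MemStore is flushed
    store.put("dick", "v2")   # newer value lives in the MemStore
    print(store.get("dick"), store.get("jane"), len(store.hfiles))
```

Appending to a log and sorting in memory keeps writes sequential, which is why this design suits the high write rates HBase targets.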

79 Software Group

HBase Data Model

Flat data (one row per name and site):

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) design:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase design (one wide, sparse row per name; each row holds only the site columns it needs):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
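The move from the flat rows to HBase's wide rows can be sketched in Python by treating each HBase row as a map from column name to value, keyed by row key. This is an in-memory illustration only: the function name `to_wide_rows` is hypothetical, and HBase column families and timestamps are omitted. It shows why rows can be sparse, since Dick and Jane each store only the site columns they actually have.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (Name, Site, Counter) rows from the slide.
visits: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Pivot (row key, column, value) triples into one sparse row per key."""
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # each site becomes a column in that person's row
    return dict(table)


if __name__ == "__main__":
    wide = to_wide_rows(visits)
    print(sorted(wide["Jane"]))  # Jane's row has no Ebay or MadBillFans.com column
```

Reading everything about one person is then a single row lookup rather than a three-table join, at the cost of giving up the relational schema.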

80 Software Group

Hive

82 Software Group

[Diagram: Hive turns SQL into Java MapReduce code, which runs on the cluster and returns results]
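As an illustration of that translation, a query such as `SELECT site, SUM(counter) FROM visits GROUP BY site` (the `visits` table is hypothetical) maps naturally onto one MapReduce pass: the map step emits `(site, counter)` pairs keyed on the GROUP BY column, and the reduce step applies the aggregate to each group. The Python below simulates that plan in a single process; it is a sketch of the idea, not the code Hive actually generates.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_sum(rows: Iterable[Tuple[str, str, int]]) -> Dict[str, int]:
    """Simulate SELECT site, SUM(counter) FROM visits GROUP BY site."""
    # Map: project each (name, site, counter) row to a (GROUP BY key, value) pair.
    mapped = ((site, counter) for _name, site, counter in rows)

    # Shuffle: collect all values for the same key together.
    groups: Dict[str, List[int]] = defaultdict(list)
    for site, counter in mapped:
        groups[site].append(counter)

    # Reduce: apply the aggregate function (SUM) to each group.
    return {site: sum(counters) for site, counters in groups.items()}


if __name__ == "__main__":
    visits = [("Dick", "Google", 690414), ("Jane", "Google", 716426),
              ("Jane", "Facebook", 643261)]
    print(group_by_sum(visits))
```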

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Comparison: a Pig Latin script alongside the equivalent SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP loads relational tables such as CUSTOMERS and PRODUCTS from an RDBMS into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Cluster resources: Mesos (heterogeneous cluster manager), YARN, EC2
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL)
• BlinkDB
• Also shown: MapReduce, Hive (SQL)

87 Software Group

Meanwhile, back at the Death Star

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only):
• Exadata: $4,911 per TB
• Hadoop: $750 per TB
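Taking the two hardware-only figures at face value, the per-terabyte gap is roughly six and a half times. The 100 TB case below is an arbitrary example size chosen for illustration, not a figure from the slide.

```latex
\frac{\$4911/\text{TB}}{\$750/\text{TB}} \approx 6.5, \qquad
100\ \text{TB}:\ \$491{,}100\ (\text{Exadata})\ \text{vs}\ \$75{,}000\ (\text{Hadoop})
```

Hardware is only part of total cost; software licences, administration and the skills needed to run each platform are not included in these numbers.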

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB raw)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to the data centre
• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine learning: programs that evolve with "experience"
• Collective intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive analytics: programs that extrapolate from existing data into the future
• Big Data analytics, AKA data science

96 Software Group

Collective Intelligence

105 Software Group

Google Flu Trends

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

112 Software Group

Artificial Intelligence Strikes back

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154; x-axis 0 to 120, y-axis -20 to 120]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
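A trend line like the one on the chart is what ordinary least-squares linear regression produces. The sketch below fits f(x) = slope·x + intercept with the closed-form formulas (slope = covariance / variance). The sample points in the usage block are made up for illustration and are not the data behind the slide, so the fitted coefficients will differ from those shown.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0.0, 20.0, 40.0, 60.0, 80.0]
    ys = [1.0, 21.0, 39.0, 59.0, 80.0]  # illustrative, roughly linear data
    a, b = fit_line(xs, ys)
    print(f"f(x) = {a:.4f}x + {b:.4f}; f(100) = {predict(a, b, 100.0):.1f}")
```

The extrapolation to x = 100 is the "predictive" part: it assumes the linear relationship continues beyond the observed data, which is exactly the assumption to check before trusting the forecast.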

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
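One of the simplest classifiers is a nearest-centroid model: learn the average feature vector of each labelled class, then assign new data to the class whose average is closest. The features in the usage block (counts of suspicious words and of links per message) and the labels are hypothetical, chosen to echo the spam-detection example; production spam filters use far richer models.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(samples: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Average the feature vectors of each class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(total / counts[label] for total in totals)
            for label, totals in sums.items()}


def classify(centroids: Dict[str, Vector], features: Vector) -> str:
    """Return the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # (suspicious words, links) per message, with known labels.
    training = [((8.0, 5.0), "spam"), ((6.0, 4.0), "spam"),
                ((0.0, 1.0), "ham"), ((1.0, 0.0), "ham")]
    model = train_centroids(training)
    print(classify(model, (7.0, 3.0)), classify(model, (0.0, 0.0)))
```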

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
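k-means is one common clustering algorithm (the slide does not name a specific one). The sketch below clusters 2-D points by alternating two steps: assign each point to its nearest centre, then move each centre to the mean of its assigned points, stopping when the centres no longer move. Starting from the first k points is a simplifying assumption; real implementations usually use random or k-means++ initialisation and run several restarts.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int,
           iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Cluster 2-D points into k groups; returns (centres, cluster index per point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres: List[Point] = list(points[:k])  # naive initialisation: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        assignment = [min(range(k), key=lambda c: math.dist(p, centres[c]))
                      for p in points]
        # Update step: move each centre to the mean of its assigned points.
        new_centres: List[Point] = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre in place
        if new_centres == centres:
            break  # converged: centres stopped moving
        centres = new_centres
    return centres, assignment


if __name__ == "__main__":
    baskets = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(kmeans(baskets, 2))
```

For basket analysis, each point would instead be a vector describing a shopping basket or customer, and the resulting clusters are groups that nobody labelled in advance.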

121 Software Group

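Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; a candidate model built from the training set is validated against the validation set; the resulting production model is applied to new data from new business to make predictions]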
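The train, validate, then predict workflow can be sketched with a deliberately simple model: a single threshold on one numeric feature. The churn-style data, the 75/25 split, the fixed random seed and the 0.8 accuracy cut-off are illustrative assumptions, not recommendations; the point is the sequence of steps, not the model.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, label: 1 = churned, 0 = stayed)


def split(data: Sequence[Example], train_fraction: float,
          seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle cleaned data and split it into a training set and a validation set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold: float, examples: Sequence[Example]) -> float:
    """Fraction of examples where the rule 'x >= threshold means label 1' is right."""
    correct = sum(1 for x, label in examples if int(x >= threshold) == label)
    return correct / len(examples)


def train_threshold(training: Sequence[Example]) -> float:
    """Build the candidate model: the training value that best separates the labels."""
    return max((x for x, _ in training), key=lambda t: accuracy(t, training))


def predict(threshold: float, x: float) -> int:
    """Apply the production model to new data."""
    return int(x >= threshold)


if __name__ == "__main__":
    # Hypothetical existing-business data: months since last purchase -> churned?
    history = [(float(m), int(m >= 6)) for m in range(1, 13)]
    training, validation = split(history, 0.75)
    candidate = train_threshold(training)
    if accuracy(candidate, validation) >= 0.8:  # validate before promoting
        print("promoted threshold:", candidate, "prediction for 7 months:", predict(candidate, 7.0))
    else:
        print("candidate rejected; gather more data or try another model")
```

Holding back a validation set is what guards against a model that merely memorises its training data; only a candidate that performs well on data it has never seen is promoted to production.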

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps, inmaps.linkedin.com]

124 Software Group

Big Data Analytics

Data Science

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium-sized businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: changes captured from redo logs (change data capture) flow to a JMS queue or a Hadoop poster, which delivers them either as batched HDFS file copies of audit/change data or as real-time replication into HBase]

130 Software Group

Toad BI Suite

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 8: Thriving and surviving the Big Data revolution

8 Software Group

Dell and Quest ndash a brief history

>
>

9 Software Group

But Seriously

10 Software Group

What is Big Data

11 Software Group

Three or Four ldquoVrdquos

VolumeTerabytesPetabytesExabytesZetabytes

VarietyStructuredUnstructuredHuman GeneratedMachine Generated

VelocityUser populations xTransaction rates xMachine data

Value Competitive or Collective advantage

12 Software Group

Instead - the industrial Revolution of data

13 Software Group

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 9: Thriving and surviving the Big Data revolution

9 Software Group

But Seriously

10 Software Group

What is Big Data?

11 Software Group

Three or Four “V”s

Volume: Terabytes, Petabytes, Exabytes, Zettabytes

Variety: Structured, Unstructured, Human Generated, Machine Generated

Velocity: User populations × Transaction rates × Machine data

Value: Competitive or Collective advantage

12 Software Group

Instead: the industrial revolution of data

13 Software Group

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Data means more

1993: Generated internally; key to operational efficiency

2013: Generated externally; key to competitive advantage; source of product innovation; changing our lives

22 Software Group

Big Data is the culmination of cloud, social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail?

25 Software Group

Prevalence of Showrooming

[Chart: showrooming prevalence (%) for Consumer Electronics and Home Improvement; scale 0-70%]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization

• In-store offers

• Customer entertainment

• Checkout anywhere

• Relationship management

• Customer analytics

33 Software Group

34 Software Group

Why showrooming?

Selection, Stock, Faster, Cheaper

Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization

35 Software Group

It’s not enough to lay out products on tables

• Online has significant advantages

• Retailers can only survive by embracing online and emulating online practices:
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection

• Only big data analytics can provide these advantages

36 Software Group

There’s a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

Security

Finance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality: German

Don’t Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

Amazon softcover $45.99

Amazon Kindle $39.99

Say “screw you bookseller” to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

“Siri, call me an ambulance”

From now on I’ll call you ‘An Ambulance’. OK?

“I want to jump off a bridge”

I found 14 bridges nearby.

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS

• Storage

• Pulse, temperature monitor

• Silent alarms

• Pedometer, sleep monitoring

• Compass

• Camera

• Mic/earphones

• Heads-up display

• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

Google Applications

Map Reduce | BigTable

Google File System (GFS)

67 Software Group

Map Reduce

[Diagram: Start → many Map tasks running in parallel → Reduce]
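The Map → Reduce flow in the diagram can be illustrated with the classic word-count example. This is a minimal single-process sketch, not Hadoop's Java API: the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and the mappers run sequentially here where a real cluster would run them in parallel on different nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between Map and Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values that share one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # Each line could go to a different mapper in parallel; here they run one after another.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data beats opinions"]))
```

The key design point is that mappers and reducers share nothing except the keys, which is what lets the framework spread them across thousands of machines.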

68 Software Group

Multi-stage Map-Reduce

[Diagram: HDFS → 8 MAPPERs (SCAN/SORT) → 4 MAPPERs (AGGREGATE) → REDUCE → Client]
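A multi-stage pipeline like the one pictured can be sketched as functions whose outputs feed the next stage. This is an illustrative sketch under assumptions: the input format (`"site,bytes"` lines) and the stage names are hypothetical, and each stage runs in one process rather than as a distributed job.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

Record = Tuple[str, int]


def stage1_scan_sort(lines: Iterable[str]) -> List[Record]:
    """Stage 1 (scan/sort): parse raw "site,bytes" lines and sort them by key."""
    pairs = []
    for line in lines:
        site, nbytes = line.split(",")
        pairs.append((site.strip(), int(nbytes)))
    return sorted(pairs, key=itemgetter(0))


def stage2_aggregate(sorted_pairs: List[Record]) -> List[Record]:
    """Stage 2 (aggregate): sum values per key; input must already be sorted by key."""
    return [(site, sum(n for _, n in group))
            for site, group in groupby(sorted_pairs, key=itemgetter(0))]


def final_reduce(totals: List[Record], top: int = 1) -> List[Record]:
    """Final reduce: return the heaviest keys to the client."""
    return sorted(totals, key=itemgetter(1), reverse=True)[:top]


if __name__ == "__main__":
    log = ["google,300", "ebay,100", "google,200", "facebook,400"]
    print(final_reduce(stage2_aggregate(stage1_scan_sort(log)), top=2))
```

Sorting in stage 1 is what makes stage 2 cheap: `groupby` only merges adjacent records, just as a reducer only sees one key's values at a time.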

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform, Load → Data Warehouse; then Cleanse, Normalize, Aggregate, Code, Analyse, Utilize

Schema on Read: Data → Load → Hadoop; then Cleanse, Code, Analyse, Utilize
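The difference between the two approaches is when the structure is enforced. The sketch below assumes a hypothetical three-field CSV record (`name,site,counter`): the schema-on-write loader rejects a malformed row at load time, while the schema-on-read loader stores everything raw and applies the schema only when a query reads the data.

```python
import csv
import io
from typing import Dict, List

RAW = "dick,ebay,507018\njane,google,716426\nbad line\n"
SCHEMA = ("name", "site", "counter")


def _to_row(fields: List[str]) -> Dict[str, object]:
    return {"name": fields[0], "site": fields[1], "counter": int(fields[2])}


def load_schema_on_write(raw: str) -> List[Dict[str, object]]:
    """Warehouse style: validate and type every row at load time; bad rows fail the load."""
    rows = []
    for fields in csv.reader(io.StringIO(raw)):
        if len(fields) != len(SCHEMA):
            raise ValueError(f"row rejected at load time: {fields!r}")
        rows.append(_to_row(fields))
    return rows


def load_schema_on_read(raw: str) -> List[str]:
    """Hadoop style: store the raw lines untouched; no parsing at load time."""
    return raw.splitlines()


def read_with_schema(lines: List[str]) -> List[Dict[str, object]]:
    """Apply the schema only when the data is read; skip rows that do not fit."""
    return [_to_row(f) for f in csv.reader(lines) if len(f) == len(SCHEMA)]


if __name__ == "__main__":
    print(read_with_schema(load_schema_on_read(RAW)))
```

Schema on read makes loading fast and flexible, but every reader has to cope with messy data; schema on write pays that cost once, up front.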

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes

• 16 PB disk

• 64 TB of RAM

• 32,000 cores
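Dividing the cluster totals by the node count gives a feel for a typical node. This assumes identical nodes and decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB); the real cluster may well have mixed hardware.

```python
# Cluster totals from the slide; decimal units assumed.
NODES = 4_000
DISK_TB = 16 * 1_000   # 16 PB
RAM_GB = 64 * 1_000    # 64 TB
CORES = 32_000

# Average per-node resources, assuming homogeneous nodes.
disk_per_node_tb = DISK_TB / NODES
ram_per_node_gb = RAM_GB / NODES
cores_per_node = CORES / NODES

if __name__ == "__main__":
    print(f"{disk_per_node_tb:.0f} TB disk, {ram_per_node_gb:.0f} GB RAM, "
          f"{cores_per_node:.0f} cores per node")
```

That works out to roughly 4 TB of disk, 16 GB of RAM and 8 cores per node: commodity servers, not big iron.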

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)

• Map Reduce / YARN

• HBase (Database)

• ZooKeeper (Locking)

• SQOOP (RDBMS loader)

• Hive (Query)

• Pig (Scripting)

• Flume (Log Loader)

• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

Hadoop Client (Java, Pig, Hive)

Map Reduce (distributed processing): Job Tracker, plus a Task Tracker on each worker node

HDFS (distributed storage): Name Node and Secondary Name Node, plus a Data Node on each worker node

[Diagram: a cluster of worker nodes, each running a Data Node and a Task Tracker]
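The Name Node's main job is bookkeeping: which blocks make up a file and which Data Nodes hold each replica. The toy sketch below assumes 64 MB blocks (the Hadoop 1.x default) and a replication factor of 3 (the HDFS default); real HDFS placement is rack-aware, whereas this sketch simply rotates through the node list. `place_blocks` is a hypothetical name, not a Hadoop API.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 64   # Hadoop 1.x default block size
REPLICATION = 3      # HDFS default replication factor


def place_blocks(file_size_mb: int, datanodes: List[str]) -> Dict[int, List[str]]:
    """Toy Name Node: split a file into blocks and pick REPLICATION distinct Data Nodes per block.

    Real HDFS uses rack-aware placement; this just rotates through the node list.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least as many Data Nodes as replicas")
    num_blocks = (file_size_mb + BLOCK_SIZE_MB - 1) // BLOCK_SIZE_MB   # ceiling division
    n = len(datanodes)
    block_map: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        start = block_id % n
        block_map[block_id] = [datanodes[(start + r) % n] for r in range(REPLICATION)]
    return block_map


if __name__ == "__main__":
    for block, nodes in place_blocks(200, ["dn0", "dn1", "dn2", "dn3"]).items():
        print(block, nodes)
```

Because every block lives on three nodes, Task Trackers can usually be scheduled on a node that already holds the data, and losing one node loses no data.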

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram: Hadoop Client (Java, Pig, Hive) → Resource Manager → Node Managers hosting Containers and an Application Master]
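The core idea of YARN is that a central Resource Manager hands out containers (slices of memory and CPU) on Node Managers, and applications such as Map Reduce run inside them. The sketch below is a toy under stated assumptions: it tracks memory only and uses first-fit placement, whereas real YARN schedulers (e.g. Capacity or Fair) are far richer. The class names and container-ID format are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)


class ResourceManager:
    """Toy Resource Manager: grants container requests first-fit across Node Managers."""

    def __init__(self, nodes: List[NodeManager]):
        self.nodes = nodes

    def allocate(self, app: str, memory_mb: int) -> Optional[str]:
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                container_id = f"{app}-c{len(node.containers)}@{node.name}"
                node.containers.append(container_id)
                return container_id
        return None  # no capacity left; a real scheduler would queue the request


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
    print(rm.allocate("app1", 1024))  # first container, e.g. for the Application Master
    print(rm.allocate("app1", 3072))  # task containers follow
```

Separating resource allocation from job logic is what lets engines other than Map Reduce share the same cluster.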

77 Software Group

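Tez¹

¹ Hindi for “fast”

[Diagram: without Tez, a query runs as Job 1 → Job 2 → Job 3, each a MAP → REDUCE pass that reads from and writes back to HDFS; with Tez, the same MAP and REDUCE steps run as a single Job 1 between HDFS input and output]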
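The diagram's point is the cost of materializing intermediate results in HDFS between chained jobs. The sketch below counts writes to a stand-in file system under both execution styles; `FakeHDFS` and the three sample stages are hypothetical, and no Tez API is used.

```python
from typing import Callable, Dict, List

Stage = Callable[[List[int]], List[int]]


class FakeHDFS:
    """Stand-in for HDFS that just stores datasets and counts writes."""

    def __init__(self) -> None:
        self.files: Dict[str, List[int]] = {}
        self.writes = 0

    def write(self, path: str, data: List[int]) -> None:
        self.files[path] = list(data)
        self.writes += 1

    def read(self, path: str) -> List[int]:
        return list(self.files[path])


def run_as_chained_jobs(hdfs: FakeHDFS, stages: List[Stage], src: str) -> List[int]:
    """Classic Map Reduce: each stage is a separate job that reads from and writes to HDFS."""
    path = src
    for i, stage in enumerate(stages):
        out = f"/tmp/job{i + 1}"
        hdfs.write(out, stage(hdfs.read(path)))
        path = out
    return hdfs.read(path)


def run_as_dag(hdfs: FakeHDFS, stages: List[Stage], src: str) -> List[int]:
    """DAG style: stages hand data straight to each other; only the final result is written."""
    data = hdfs.read(src)
    for stage in stages:
        data = stage(data)
    hdfs.write("/tmp/dag_out", data)
    return data


STAGES: List[Stage] = [
    lambda xs: [x * 2 for x in xs],       # map: double every value
    lambda xs: [x for x in xs if x > 2],  # filter
    lambda xs: [sum(xs)],                 # reduce: total
]
```

Both styles compute the same answer; the DAG version simply skips the intermediate round trips to disk, which is where much of the speed-up comes from.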

78 Software Group

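HBase

A real-time database built on Hadoop

[Diagram comparing storage architectures. Oracle: Table → Buffer Cache → Datafiles on ASM/Disks, with changes recorded via the Log Buffer in Redo. HBase: Table → MemStore → HFiles on HDFS/Disks, with changes recorded in the Write-Ahead (WA) Log]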
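The HBase side of the diagram can be sketched as a tiny write path: append to the write-ahead log, update the in-memory MemStore, and flush the MemStore to an immutable, sorted HFile once it grows. This is a simplified illustration assuming a row-count flush threshold; real HBase flushes on size, keeps versions per cell, and compacts HFiles in the background. `ToyRegionStore` is a hypothetical name.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyRegionStore:
    """Simplified HBase-style write path: WAL append, MemStore insert, flush to sorted HFiles."""

    def __init__(self, flush_threshold: int = 3):
        self.wal: List[Tuple[str, str]] = []           # append-only log used for crash recovery
        self.memstore: Dict[str, str] = {}             # in-memory buffer of recent writes
        self.hfiles: List[List[Tuple[str, str]]] = []  # flushed files, each sorted by row key
        self.flush_threshold = flush_threshold

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # 1. log the change durably first
        self.memstore[row] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the MemStore out as a new sorted, immutable file and start a fresh one.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row: str) -> Optional[str]:
        if row in self.memstore:             # newest data is in memory
            return self.memstore[row]
        for hfile in reversed(self.hfiles):  # then search files from newest to oldest
            keys = [k for k, _ in hfile]
            i = bisect.bisect_left(keys, row)
            if i < len(keys) and keys[i] == row:
                return hfile[i][1]
        return None
```

The parallel with Oracle is close: the WAL plays the role of redo, and the MemStore plays the role of the buffer cache, but flushed HFiles are never updated in place.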

79 Software Group

HBase Data Model

Flat table (one row per name/site pair):

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational design:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase design (one wide, sparse row per name; each site is a column):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
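Going from the normalized tables to the HBase layout is a join followed by a pivot: each site becomes a column, and each name keeps only the columns it actually has. The sketch below uses the slide's data with plain Python dicts standing in for HBase rows; `to_wide_rows` is a hypothetical helper, not an HBase client call.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

NAMES = {1: "Dick", 2: "Jane"}
SITES = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
VISITS: List[Tuple[int, int, int]] = [  # (NameId, SiteId, Counter)
    (1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
    (2, 3, 643261), (2, 4, 856767), (1, 5, 675230),
]


def to_wide_rows(visits: List[Tuple[int, int, int]]) -> Dict[str, Dict[str, int]]:
    """Join the normalized tables and pivot into one sparse row per name (HBase style)."""
    rows: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name_id, site_id, counter in visits:
        # The site name becomes the column; absent sites simply have no cell.
        rows[NAMES[name_id]][SITES[site_id]] = counter
    return dict(rows)


if __name__ == "__main__":
    for name, columns in to_wide_rows(VISITS).items():
        print(name, columns)
```

Reading everything about one person becomes a single-row lookup, at the cost of each row having a different set of columns.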

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
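The SQL → Java → Results flow reflects Hive translating a query into Map Reduce work. As a rough illustration only (not Hive's actual generated code), a query such as `SELECT site, SUM(counter) FROM visits GROUP BY site` maps naturally onto a mapper that keys each row by the GROUP BY column and a reducer that applies the aggregate. The table and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (name, site, counter)

VISITS: List[Row] = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]


def mapper(row: Row) -> Tuple[str, int]:
    _name, site, counter = row
    return site, counter  # the GROUP BY column becomes the key


def reducer(site: str, counters: List[int]) -> Tuple[str, int]:
    return site, sum(counters)  # SUM() is applied once per key


def group_by_sum(rows: Iterable[Row]) -> Dict[str, int]:
    shuffled: Dict[str, List[int]] = defaultdict(list)
    for key, value in map(mapper, rows):
        shuffled[key].append(value)
    return dict(reducer(k, v) for k, v in shuffled.items())


if __name__ == "__main__":
    print(group_by_sum(VISITS))
```

Writing the SQL is far shorter than writing the Java by hand, which is why Hive made Hadoop accessible to SQL-literate analysts and DBAs.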

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala

• MapR Drill

• Aster

• Greenplum (Pivotal HD)

• ParAccel

• Hadapt

• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or Hive QL]

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP moves CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Mesos – heterogeneous cluster manager (alongside YARN and EC2)

Tachyon – in-memory file system

Spark – memory-optimized distributed execution

Spark Streaming

MLbase / MLlib – machine learning

Map Reduce

Shark (SQL), Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop $/TB (hardware only): Exadata $4,911, Hadoop $750]
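The two bars translate into a simple ratio. Using the chart's hardware-only $/TB figures (and nothing else: no software, licensing or operations costs), a quick calculation for an illustrative 100 TB system:

```python
# Hardware-only cost per TB, read from the chart.
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750

ratio = EXADATA_PER_TB / HADOOP_PER_TB  # how many times more per TB


def hardware_cost(tb: float, per_tb: float) -> float:
    """Hardware cost for a given capacity at a flat $/TB rate."""
    return tb * per_tb


if __name__ == "__main__":
    print(f"Exadata costs {ratio:.1f}x more per TB")
    print(f"100 TB: Exadata ${hardware_cost(100, EXADATA_PER_TB):,.0f}, "
          f"Hadoop ${hardware_cost(100, HADOOP_PER_TB):,.0f}")
```

On these numbers, Exadata storage is about 6.5 times the price per terabyte.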

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6-core CPU per node (216 cores total)
  – 12x2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
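The appliance totals follow from multiplying the per-node figures by the 18 servers. A quick check, taking the per-node specification exactly as listed on the slide (2 TB disks are the slide's figure, not independently verified):

```python
# Per-node specification as listed on the slide.
SERVERS = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE, CORES_PER_CPU = 2, 6
DISKS_PER_NODE, TB_PER_DISK = 12, 2

# Rack-level totals.
totals = {
    "ram_gb": SERVERS * RAM_GB_PER_NODE,
    "cores": SERVERS * CPUS_PER_NODE * CORES_PER_CPU,
    "spindles": SERVERS * DISKS_PER_NODE,
    "raw_tb": SERVERS * DISKS_PER_NODE * TB_PER_DISK,
}

if __name__ == "__main__":
    for key, value in totals.items():
        print(key, value)
```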

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

Machine Learning: Programs that evolve with “experience”

Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent

Predictive Analytics: Programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted regression line; x axis 0 to 120, y axis -20 to 120; f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression

• Non-linear (curve fit)

• Multivariate

• Time series

• Logistic regression

• CART
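The fitted line on the chart, of the form f(x) = slope·x + intercept, is what ordinary least squares produces. A minimal from-scratch sketch (no libraries; the sample data below is made up, not the chart's data):

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate: apply the fitted line to a new x."""
    return slope * x + intercept


if __name__ == "__main__":
    m, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(f"f(x) = {m}x + {b}; f(10) = {predict(m, b, 10)}")
```

The other techniques in the list (curve fitting, multivariate and logistic regression, CART) follow the same pattern: fit a model to historical data, then apply it to new inputs.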

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
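Classification learns from labelled examples and then assigns a label to new data. One of the simplest classifiers is nearest centroid, used here purely for brevity (it is not the only or the best choice). The two features for the spam example (link count, exclamation-mark count) are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def train_centroids(examples: List[Tuple[Vector, str]]) -> Dict[str, List[float]]:
    """Average the feature vectors of each labelled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(centroids: Dict[str, List[float]], features: Vector) -> str:
    """Assign new data to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features: (number of links, number of exclamation marks)
    training = [((8, 5), "spam"), ((10, 7), "spam"), ((0, 1), "ham"), ((1, 0), "ham")]
    model = train_centroids(training)
    print(classify(model, (7, 4)), classify(model, (0, 0)))
```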

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
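Clustering finds groups without any labels. A common example is k-means (Lloyd's algorithm), sketched below for 2-D points. Two simplifying assumptions: the first k points serve as the initial centres, and a fixed number of iterations replaces a convergence test; production implementations use smarter seeding such as k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means with the first k points as initial centres."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres: List[Point] = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points]
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return centres, labels


if __name__ == "__main__":
    pts = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (0.5, 1.2), (8.5, 9)]
    print(kmeans(pts, k=2))
```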

121 Software Group

Supervised Machine Learning

[Diagram: Raw Data (Existing Business) → Clean → Training Set → Model → Candidate Model → Validate against Validation Set → Production Model; New Data (New Business) → Production Model → Prediction]
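The workflow in the diagram (split historical data, train a candidate, validate it on held-out data, promote it to production, then score new data) can be sketched end to end. Assumptions: a 1-nearest-neighbour model stands in for whatever algorithm you would actually use, the 25% validation split and `PROMOTE_THRESHOLD` are hypothetical choices, and the data is synthetic.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]
Model = Callable[[Sequence[float]], str]

PROMOTE_THRESHOLD = 0.9  # hypothetical minimum validation accuracy


def split(data: List[Example], validation_fraction: float = 0.25, seed: int = 42):
    """Shuffle labelled historical data and hold part of it back for validation."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set: List[Example]) -> Model:
    """Candidate model: 1-nearest-neighbour simply remembers the training set."""
    def model(features: Sequence[float]) -> str:
        return min(training_set, key=lambda ex: math.dist(ex[0], features))[1]
    return model


def accuracy(model: Model, validation_set: List[Example]) -> float:
    hits = sum(model(features) == label for features, label in validation_set)
    return hits / len(validation_set)


DATA: List[Example] = ([((i, i), "churn") for i in range(4)]
                       + [((10 + i, 10 + i), "stay") for i in range(4)])

if __name__ == "__main__":
    train_set, val_set = split(DATA)
    candidate = train(train_set)
    if accuracy(candidate, val_set) >= PROMOTE_THRESHOLD:
        production = candidate  # promote the candidate
        print(production((12, 11)))  # prediction for new business
```

Validating on data the model never saw during training is the step that guards against a model that merely memorizes the past.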

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps, inmaps.linkedin.com]

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search optimization

• Recommendation systems

• Security: vulnerability, penetration detection

• Fraud detection

• CRM: churn, defaults

• Medical: risk analysis, diagnosis, prognosis

• Game optimization

• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Redo logs → Change Data Capture → JMS Queue → Hadoop Poster → batched HDFS file copy (audit/change data) or HBase real-time replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell’s offering was not complete…

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: STATISTICA, Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 11: Thriving and surviving the Big Data revolution

11 Software Group

Three or Four “V”s

Volume: terabytes, petabytes, exabytes, zettabytes

Variety: structured, unstructured, human-generated, machine-generated

Velocity: user populations × transaction rates × machine data

Value: competitive or collective advantage

12 Software Group

Instead: the industrial revolution of data


21 Software Group

Data means more

1993: generated internally; key to operational efficiency

2013: generated externally; key to competitive advantage; source of product innovation; changing our lives

22 Software Group

Big Data is the culmination of cloud, social, and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail?

25 Software Group

Prevalence of Showrooming

[Chart: prevalence of showrooming (%) for Consumer Electronics and Home Improvement]

Source: Gartner Research G00249458, “Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable,” published 12 February 2013.


30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics


34 Software Group

Why showrooming?

Selection • Stock • Faster • Cheaper

Defences: dynamic pricing • predictive ordering • assortment optimization • predictive recommendations • personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet


42 Software Group

Willy Bowman

Nationality: German

Don't Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

• Amazon softcover: $45.99
• Amazon Kindle: $39.99

Say “screw you, bookseller” to buy the Kindle version


45 Software Group

Data Input

46 Software Group

Siri

“Siri, call me an ambulance.”

From now on, I'll call you ‘An Ambulance’. OK?

“I want to jump off a bridge.”

I found 14 bridges nearby.


50 Software Group

Brain Control


53 Software Group

Muze


56 Software Group

The instrumented human

• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse/temp monitor
• Silent alarms
• Pedometer/sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data


66 Software Group

Google Software Architecture

[Diagram: Google Applications run on MapReduce and BigTable, which in turn run on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from a start point, work fans out to many parallel Map tasks whose outputs are combined by a Reduce step]
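The map/shuffle/reduce flow in the diagram can be sketched in plain Python. This is a minimal, single-process illustration and not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, and the classic word-count job is assumed as the workload.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # On a cluster each map call could run on a different node; here they run in sequence.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

For example, `word_count(["big data big ideas", "data beats opinions"])` counts `big` and `data` twice each. The parallelism comes from the fact that every `map_phase` call is independent, and every `reduce_phase` call only needs the values for its own key.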

68 Software Group

Multi-stage Map-Reduce

[Diagram: data read from HDFS passes through a first stage of mappers (scan/sort), then a second stage of mappers (aggregate), then a reduce step that returns results to the client]
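A multi-stage pipeline chains jobs so that one job's output becomes the next job's input. The sketch below is hypothetical and runs in one process: the generic `run_job` helper is an assumption, not a Hadoop API, and it chains a counting stage with a ranking stage. On a real cluster the intermediate results would be written to, and read back from, HDFS between stages.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[Any, Any]
Mapper = Callable[[Any], Iterator[Pair]]
Reducer = Callable[[Any, List[Any]], Iterator[Pair]]


def run_job(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> List[Pair]:
    """Run one map-reduce job in memory: map, group values by key, reduce in key order."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Pair] = []
    for key in sorted(groups):  # reducers receive keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1 (hypothetical): count visits per site from "user,site" log lines.
def visits_mapper(line: str) -> Iterator[Pair]:
    _user, site = line.split(",")
    yield site, 1


def visits_reducer(site: str, ones: List[int]) -> Iterator[Pair]:
    yield site, sum(ones)


# Stage 2 (hypothetical): re-key by negative count so the sort puts busiest sites first.
def rank_mapper(pair: Pair) -> Iterator[Pair]:
    site, total = pair
    yield -total, site


def rank_reducer(neg_total: int, sites: List[str]) -> Iterator[Pair]:
    for site in sorted(sites):
        yield site, -neg_total


def site_ranking(lines: Iterable[str]) -> List[Pair]:
    stage1 = run_job(lines, visits_mapper, visits_reducer)  # output of stage 1...
    return run_job(stage1, rank_mapper, rank_reducer)       # ...is the input of stage 2
```

The design choice worth noting is that the second stage needs no special machinery: re-keying the first stage's output is enough to get a different grouping and sort order.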

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → ETL code: extract, transform (cleanse, normalize, aggregate), load → Data Warehouse → Analyse → Utilize

Schema on Read: Data → Load into Hadoop → Code: cleanse, analyse → Utilize
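The difference between the two approaches is *when* structure is imposed. The sketch below is a hypothetical illustration, not how any particular warehouse or Hadoop tool works: the schema format, the function names and the sample lines are all assumptions, and one sample line is deliberately malformed. Schema on write rejects the whole load up front; schema on read stores the raw lines and applies the schema only when a query runs.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

# Hypothetical schema: (column name, converter) pairs for "name,site,counter" lines.
Schema = Sequence[Tuple[str, Callable[[str], object]]]
VISITS_SCHEMA: Schema = [("name", str), ("site", str), ("counter", int)]

# Hypothetical sample data; the third line is deliberately malformed.
RAW_LOG = (
    "dick,google,690414\n"
    "jane,google,716426\n"
    "dick,facebook,not-a-number\n"
    "jane,facebook,643261\n"
)


def apply_schema(fields: Sequence[str], schema: Schema) -> Dict[str, object]:
    """Turn raw fields into a typed record; raises ValueError if they do not fit."""
    if len(fields) != len(schema):
        raise ValueError(f"expected {len(schema)} fields, got {len(fields)}")
    return {name: convert(value) for (name, convert), value in zip(schema, fields)}


def load_schema_on_write(text: str, schema: Schema) -> List[Dict[str, object]]:
    """Schema on write: every row is parsed and validated before anything is stored."""
    return [apply_schema(row, schema) for row in csv.reader(io.StringIO(text))]


def load_schema_on_read(text: str) -> List[str]:
    """Schema on read: store the raw lines untouched; no parsing at load time."""
    return text.splitlines()


def read_with_schema(raw_lines: List[str], schema: Schema) -> Iterator[Dict[str, object]]:
    """Apply the schema only when the data is read, skipping rows that do not fit."""
    for row in csv.reader(raw_lines):
        try:
            yield apply_schema(row, schema)
        except ValueError:
            continue  # a real job might log or quarantine the bad row instead


def total_by_site(raw_lines: List[str]) -> Dict[str, int]:
    """A query that interprets the raw lines through VISITS_SCHEMA at read time."""
    totals: Dict[str, int] = {}
    for record in read_with_schema(raw_lines, VISITS_SCHEMA):
        site = str(record["site"])
        totals[site] = totals.get(site, 0) + int(record["counter"])
    return totals
```

Here `load_schema_on_write(RAW_LOG, VISITS_SCHEMA)` raises `ValueError` on the bad line, while `total_by_site(load_schema_on_read(RAW_LOG))` still answers the query from the three usable lines.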

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores


74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses MapReduce (distributed processing), coordinated by a Job Tracker, and HDFS (distributed storage), coordinated by a Name Node and a Secondary Name Node; every worker node runs both a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0 YARN

Yet Another Resource Negotiator

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager, which allocates containers on Node Managers; an Application Master coordinates each job]

77 Software Group

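Tez (Hindi for “fast”)

[Diagram: without Tez, a pipeline runs as separate MapReduce jobs (Job 1, Job 2, Job 3) with HDFS reads and writes between them; with Tez, the same map and reduce steps run as a single job (Job 1)]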

78 Software Group

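HBase

A real-time database built on Hadoop

[Diagram: HBase storage components alongside their Oracle counterparts: buffer cache ↔ MemStore; datafiles ↔ HFiles; log buffer and redo ↔ write-ahead log (WAL); ASM disks ↔ HDFS disks; tables sit on top of both]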
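To make the write path concrete, here is a toy, single-process model. It is an assumption-laden sketch: the class `ToyRegion` and its flush threshold are hypothetical and not part of the HBase API. Writes are appended to a write-ahead log and buffered in a MemStore, which is flushed to an immutable, sorted "HFile" when it grows too large; reads check the MemStore first, then HFiles from newest to oldest. Real HBase adds region servers, column families, compaction and much more.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Hypothetical, heavily simplified model of the HBase write and read paths."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.wal: List[Tuple[str, str]] = []            # append-only log of (row key, value)
        self.memstore: Dict[str, str] = {}              # recent writes, held in memory
        self.hfiles: List[List[Tuple[str, str]]] = []   # flushed files, each sorted by key

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. log first so the write could be replayed after a crash
        self.memstore[key] = value     # 2. then update the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new sorted, immutable HFile and start a fresh one."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):  # newest file holds the latest value
            keys = [k for k, _ in hfile]
            i = bisect.bisect_left(keys, key)  # binary search within the sorted file
            if i < len(hfile) and hfile[i][0] == key:
                return hfile[i][1]
        return None
```

The Oracle analogy on the slide holds up in the sketch: the WAL plays the role of redo, and the MemStore plays the role of the buffer cache, except that HBase never updates an HFile in place.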

79 Software Group

HBase Data Model

Relational (one row per name/site pair):

Name   Site              Counter
Dick   Ebay              507018
Dick   Google            690414
Jane   Google            716426
Dick   Facebook          723649
Jane   Facebook          643261
Jane   ILoveLarry.com    856767
Dick   MadBillFans.com   675230

Normalized:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name; each row holds only the columns it needs):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649    ...              675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261    ...              856767
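The wide-row layout can be reproduced with plain Python dictionaries. This sketch assumes a dict of dicts is a fair stand-in for an HBase table; it is not the HBase client API. The row key is the name, each site becomes a column, and a row stores only the columns it actually has. Real HBase also groups columns into column families, which the sketch omits.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Rows from the slide's relational table: (name, site, counter).
VISITS: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Pivot (name, site, counter) tuples into one sparse, wide row per name.

    Each site becomes a column in that name's row, so different rows can have
    different columns, and a missing column simply takes up no space.
    """
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in visits:
        table[name][site] = counter
    return dict(table)
```

With this layout, reading everything about Dick is a single row lookup, `to_wide_rows(VISITS)["Dick"]`, rather than a join across the three normalized tables.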

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
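Hive turns SQL into Java MapReduce jobs. As a rough, hypothetical illustration of that translation (not the plan Hive actually generates), here is how a simple `GROUP BY` aggregate maps onto a mapper and a reducer. The table and column names in the comment are assumptions borrowed from the HBase data model example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical query, shown only for reference:
#   SELECT site, SUM(counter) FROM visits GROUP BY site;

Row = Tuple[str, str, int]  # (name, site, counter)


def group_by_mapper(row: Row) -> Iterator[Tuple[str, int]]:
    """Map: the GROUP BY column becomes the key, the aggregated column the value."""
    _name, site, counter = row
    yield site, counter


def sum_reducer(site: str, counters: List[int]) -> Tuple[str, int]:
    """Reduce: apply the aggregate function (SUM) to all values for one key."""
    return site, sum(counters)


def sum_counter_by_site(rows: List[Row]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for row in rows:
        for key, value in group_by_mapper(row):
            groups[key].append(value)  # the "shuffle": collect values per key
    return dict(sum_reducer(site, values) for site, values in sorted(groups.items()))
```

Other aggregates follow the same pattern: `COUNT` would emit `1` per row, and `MAX` would replace `sum` with `max` in the reducer.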

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP moves RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (YARN and EC2 also shown)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: machine learning
• MapReduce
• Shark (SQL), Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile back at the Death Star


89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only): Exadata $4,911; Hadoop $750
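The two figures imply that Exadata hardware costs roughly 6.5 times as much per terabyte. The snippet below just redoes that arithmetic and, as a hypothetical example, prices 100 TB at each rate. It assumes cost scales linearly with capacity, which real hardware quotes rarely do exactly.

```python
# Hardware-only cost per terabyte in US dollars, as quoted on the slide.
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750


def hardware_cost(terabytes: float, dollars_per_tb: float) -> float:
    """Cost of `terabytes` of capacity at a flat per-TB price (linear-scaling assumption)."""
    return terabytes * dollars_per_tb


# How many times more Exadata costs per TB than Hadoop: 4911 / 750 = 6.548.
ratio = EXADATA_PER_TB / HADOOP_PER_TB

if __name__ == "__main__":
    capacity_tb = 100  # hypothetical workload size
    print(f"Exadata: ${hardware_cost(capacity_tb, EXADATA_PER_TB):,.0f}")  # $491,100
    print(f"Hadoop:  ${hardware_cost(capacity_tb, HADOOP_PER_TB):,.0f}")   # $75,000
    print(f"Ratio:   {ratio:.1f}x")                                         # 6.5x
```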

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 × 6-core CPUs per node (216 cores total)
  - 12 × 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence


105 Software Group

Google Flu Trends


107 Software Group

Collective Intelligence outsmarts Artificial Intelligence


112 Software Group

Artificial Intelligence Strikes back


117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
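A trend line like the one on the slide is what an ordinary least-squares fit produces. The sketch below shows that fit for a single variable, which is one standard way to compute it; the slide's underlying data points are not recoverable, so the function names and any sample data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a new x: the 'predictive' part."""
    return slope * x + intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns `(2.0, 1.0)`, and `predict(2.0, 1.0, 10)` extrapolates to `21.0`. The other techniques in the list (multivariate, non-linear, logistic) generalize the same idea of choosing parameters that minimize error on known data.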

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
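The slide does not name an algorithm, so as an illustration only, here is one of the simplest classifiers: nearest centroid. It learns the average feature vector of each labelled class, then assigns new data to the closest one. The spam features below (number of links, fraction of capital letters) are hypothetical, and a real model would normally scale features first.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(examples: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Training: average the feature vectors of each labelled class into a centroid."""
    grouped: Dict[str, List[Vector]] = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(vectors) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def classify(centroids: Dict[str, Vector], features: Vector) -> str:
    """Prediction: assign new data to the class with the nearest centroid (Euclidean)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features per email: (number of links, fraction of capital letters).
    training = [((8.0, 0.6), "spam"), ((10.0, 0.5), "spam"),
                ((1.0, 0.1), "ham"), ((0.0, 0.05), "ham")]
    model = train_centroids(training)
    print(classify(model, (7.0, 0.4)))  # spam
```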

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
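One common clustering algorithm is k-means; the slide does not say which algorithm it has in mind, so treat this as an illustrative choice. The sketch groups 2-D points into `k` clusters with no labels at all, alternating an assignment step and an update step. The points, the fixed iteration count and the seeded initialization are assumptions for the example.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Group points into k clusters with Lloyd's k-means; returns a cluster index per point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # start from k of the input points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assignment
```

For basket analysis the "points" would instead be feature vectors describing each customer's purchases; the algorithm itself does not change.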

121 Software Group

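Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and promoted to the production model; the production model turns new data from new business into predictions]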
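The stages in the diagram can be strung together in a few functions. This is a minimal sketch under several assumptions that are not from the slide: the model is a one-variable least-squares line, 25% of the cleaned data is held back for validation, and a candidate is promoted only if its mean absolute error on the validation set is below a caller-supplied threshold. All function names are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[float, float]  # (feature, target)
Model = Callable[[float], float]


def split(data: Sequence[Example], validation_fraction: float = 0.25,
          seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle the cleaned data and hold back a fraction as the validation set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train(training_set: Sequence[Example]) -> Model:
    """Build a candidate model: here, an ordinary least-squares line."""
    n = len(training_set)
    mx = sum(x for x, _ in training_set) / n
    my = sum(y for _, y in training_set) / n
    sxx = sum((x - mx) ** 2 for x, _ in training_set)
    slope = sum((x - mx) * (y - my) for x, y in training_set) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mean_absolute_error(model: Model, validation_set: Sequence[Example]) -> float:
    return sum(abs(model(x) - y) for x, y in validation_set) / len(validation_set)


def build_production_model(raw: Sequence[Tuple[Optional[float], Optional[float]]],
                           max_error: float) -> Optional[Model]:
    """Clean, split, train a candidate, validate it, then promote it or reject it."""
    cleaned = [(x, y) for x, y in raw if x is not None and y is not None]  # "Clean" step
    training_set, validation_set = split(cleaned)
    candidate = train(training_set)
    if mean_absolute_error(candidate, validation_set) <= max_error:
        return candidate  # becomes the production model that scores new data
    return None
```

The key design point is that the validation set is never used for training, so its error is an honest estimate of how the model will do on new business.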

122 Software Group

Unsupervised learning

Example: LinkedIn InMaps (inmaps.linkedin.com)


124 Software Group

Big Data Analytics

Data Science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: SharePlex captures changes from redo logs (change data capture) and posts them through a JMS queue or Hadoop poster as batched HDFS file copies, audit change data, or real-time HBase replication]

130 Software Group

Toad BI Suite


132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI analytics solutions:
Data Integration • Database Management • Advanced Analytics • Business Intelligence • Server and Storage

Dell offerings: Server and Storage • TOAD & SharePlex • TOAD BI • Boomi • Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions:
Data Integration • Database Management • Advanced Analytics • Business Intelligence • Server and Storage

Dell offerings: STATISTICA • Server and Storage • TOAD & SharePlex • TOAD BI • Boomi • Kitenga

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring: integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 13: Thriving and surviving the Big Data revolution

13 Software Group

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud, social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

[Chart: prevalence of showrooming (%) by category: Consumer Electronics, Home Improvement]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

33 Software Group

34 Software Group

Why showrooming

Why shoppers showroom: selection, stock, faster, cheaper

Defences: dynamic pricing, predictive ordering, assortment optimization, predictive recommendations, personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality: German

Don't Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

Amazon softcover: $45.99

Amazon Kindle: $39.99

Say "screw you bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

"Siri, call me an ambulance."

From now on, I'll call you 'An Ambulance'. OK?

"I want to jump off a bridge."

I found 14 bridges nearby.

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google applications run on MapReduce and BigTable, which sit on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: a job starts many parallel Map tasks whose outputs feed a Reduce step]
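The Map/Reduce pattern in the diagram can be illustrated with a minimal, single-process Python sketch. It is not Hadoop's API: `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical names, and the classic word-count job stands in for whatever work the mappers actually do.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key, as the framework does between Map and Reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine every value emitted for one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop each input split could go to a different mapper in parallel;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data big value", "data means more"]))
```

Running it prints `{'big': 2, 'data': 2, 'value': 1, 'means': 1, 'more': 1}`. Because each mapper only sees its own line and each reducer only sees one key, both phases can be spread across many machines.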

68 Software Group

Multi-stage Map-Reduce

[Diagram: data is read from HDFS by a first stage of mappers (scan/sort), passed to a second stage of mappers (aggregate), then reduced and returned to the client]
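Chaining stages means the output of one job becomes the input of the next. The sketch below assumes a hypothetical two-stage job (total bytes per site, then a top-N ranking) rather than the exact pipeline in the diagram; `run_stage`, `top_sites` and the `"site,bytes"` log format are illustrative only.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]
Mapper = Callable[[Pair], Iterable[Pair]]
Reducer = Callable[[Any, List[Any]], Iterable[Pair]]


def run_stage(records: Iterable[Pair], mapper: Mapper, reducer: Reducer) -> List[Pair]:
    """One MapReduce stage: map every record, group by key, reduce each group in key order."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Pair] = []
    for key in sorted(groups):  # keys within one stage must be mutually comparable
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1 (hypothetical): total bytes per site from "site,bytes" log lines.
def parse_line(record: Pair) -> Iterable[Pair]:
    site, size = record[1].split(",")
    yield site, int(size)


def sum_sizes(site: Any, sizes: List[Any]) -> Iterable[Pair]:
    yield site, sum(sizes)


# Stage 2 (hypothetical): route every total to one key so a single reducer can rank them.
def to_single_key(record: Pair) -> Iterable[Pair]:
    site, total = record
    yield "all", (total, site)


def make_top_n(n: int) -> Reducer:
    def top_n(_key: Any, totals: List[Any]) -> Iterable[Pair]:
        for total, site in sorted(totals, reverse=True)[:n]:
            yield site, total
    return top_n


def top_sites(log_lines: Iterable[str], n: int) -> List[Pair]:
    stage1 = run_stage(((None, line) for line in log_lines), parse_line, sum_sizes)
    # The output of stage 1 is fed straight into stage 2, as in a chained job.
    return run_stage(stage1, to_single_key, make_top_n(n))


if __name__ == "__main__":
    print(top_sites(["a,10", "b,5", "a,7", "c,20"], 2))
```

In Hadoop 1.0 each stage would be a separate job writing its output back to HDFS before the next one starts.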

69 Software Group

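Schema on Read vs Schema on Write

Schema on Write (data warehouse): extract, transform (code, cleanse, normalize, aggregate) and load the data into the warehouse, then analyse and utilize it.

Schema on Read (Hadoop): load the raw data into Hadoop as-is, then code, cleanse and analyse it at read time, and utilize the results.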
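The difference is where the schema is enforced. In this minimal sketch, `SCHEMA` and the JSON-lines input are hypothetical: the schema-on-write loader rejects a malformed row before anything is stored, while the schema-on-read path stores raw text and only interprets it when a query runs.

```python
import json
from typing import Dict, List

# Hypothetical target schema for the example: field name -> type to convert to.
SCHEMA = {"user": str, "clicks": int}


def load_schema_on_write(raw_lines: List[str]) -> List[Dict[str, object]]:
    """Warehouse style: parse, validate and convert every row before it is stored."""
    table = []
    for line in raw_lines:
        record = json.loads(line)
        missing = [name for name in SCHEMA if name not in record]
        if missing:
            # A bad row stops the load; nothing reaches the table until it is fixed.
            raise ValueError(f"row rejected at load time, missing {missing}")
        table.append({name: kind(record[name]) for name, kind in SCHEMA.items()})
    return table


def store_raw(raw_lines: List[str]) -> List[str]:
    """Hadoop style: keep the raw text exactly as it arrived; no parsing at load time."""
    return list(raw_lines)


def total_on_read(stored: List[str], field: str) -> int:
    """Apply a schema only at query time: parse each line and use the field where present."""
    total = 0
    for line in stored:
        record = json.loads(line)
        if field in record:
            total += int(record[field])
    return total
```

Loading is cheap and never fails on the read path, but every query pays the parsing cost and must decide for itself how to treat incomplete rows.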

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
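Dividing the cluster totals by the node count gives a rough picture of a typical node. This sketch assumes the nodes are roughly uniform and that decimal units apply (1 PB = 1,000 TB, 1 TB = 1,000 GB); the slide states neither.

```python
# Cluster totals as quoted on the slide.
NODES = 4_000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000


def per_node_averages() -> dict:
    """Average resources per node, assuming decimal units and uniform nodes."""
    return {
        "disk_tb": DISK_PB * 1_000 / NODES,
        "ram_gb": RAM_TB * 1_000 / NODES,
        "cores": CORES / NODES,
    }


if __name__ == "__main__":
    print(per_node_averages())  # {'disk_tb': 4.0, 'ram_gb': 16.0, 'cores': 8.0}
```

In other words, the cluster is built from many modest commodity machines rather than a few large servers.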

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• Sqoop (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses MapReduce (distributed processing), coordinated by the JobTracker, and HDFS (distributed storage), coordinated by the NameNode and Secondary NameNode; each worker node runs a DataNode and a TaskTracker]
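On the HDFS side, the NameNode keeps a map from each file's blocks to the DataNodes holding their replicas. This minimal sketch assumes the Hadoop 1.x defaults of 64 MB blocks and three replicas, and uses plain round-robin placement; real HDFS placement is rack-aware. `place_blocks` is a hypothetical helper, not a Hadoop API.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 64  # Hadoop 1.x default HDFS block size
REPLICATION = 3     # HDFS default replication factor


def place_blocks(file_size_mb: int, datanodes: List[str],
                 block_size_mb: int = BLOCK_SIZE_MB,
                 replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Return a block -> DataNodes map like the one a NameNode keeps for one file.

    Placement is simple round-robin; real HDFS placement is rack-aware.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    for block, nodes in place_blocks(200, ["dn1", "dn2", "dn3", "dn4"]).items():
        print(block, nodes)
```

Because the same machines also run TaskTrackers, the JobTracker can try to schedule each map task on a node that already holds a copy of its input block.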

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Resource Manager; Node Managers on the worker nodes host containers, one of which runs the job's Application Master]
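The core idea of YARN is that the Resource Manager only hands out containers; the job's Application Master, itself running in a container, then requests more containers for its tasks. This toy sketch uses a first-fit allocator on memory alone; the class names mirror the diagram but are hypothetical, and YARN's real schedulers (for example, capacity or fair scheduling) are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: List[str] = field(default_factory=list)


class ResourceManager:
    """Toy first-fit allocator: grants a container on the first node with enough free memory."""

    def __init__(self, nodes: List[NodeManager]) -> None:
        self.nodes = nodes

    def allocate(self, owner: str, memory_mb: int) -> Optional[str]:
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                node.containers.append(owner)
                return node.name
        return None  # no capacity now; the request would have to wait


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 2048), NodeManager("nm2", 4096)])
    # The client's first request starts the job's Application Master in a container...
    print(rm.allocate("app1:master", 1024))  # nm1
    # ...which then asks for containers for its own tasks.
    print(rm.allocate("app1:task", 2048))    # nm2
    print(rm.allocate("app1:task", 1024))    # nm1
    print(rm.allocate("app1:task", 4096))    # None
```

Separating resource allocation from job management is what lets non-MapReduce engines share the same cluster under YARN.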

77 Software Group

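Tez (Hindi for "fast")

[Diagram: three chained MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing to HDFS, compared with a single Tez job that runs the same map and reduce steps without intermediate HDFS writes]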

78 Software Group

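HBase

A real-time database built on Hadoop

[Diagram comparing Oracle and HBase storage: Oracle tables use a buffer cache and a redo log buffer, backed by datafiles and redo logs on ASM disks; HBase tables use a MemStore and a write-ahead (WA) log, backed by HFiles on HDFS disks]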
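The comparison maps redo to the write-ahead log, the buffer cache to the MemStore, and datafiles to HFiles. A toy model of that write path follows, assuming a tiny, hypothetical flush threshold; `TinyRegion` is not an HBase class, and real HBase also truncates the log after a flush and compacts HFiles, which this sketch omits.

```python
from typing import Dict, List, Optional, Tuple


class TinyRegion:
    """Toy HBase-style write path: WAL append, MemStore buffer, flush to sorted immutable HFiles."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []            # write-ahead log (durability)
        self.memstore: Dict[str, str] = {}              # in-memory buffer of recent writes
        self.hfiles: List[List[Tuple[str, str]]] = []   # flushed, sorted, immutable files
        self.flush_threshold = flush_threshold

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # log first so the write can be replayed after a crash
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row: str) -> Optional[str]:
        if row in self.memstore:             # newest data is still in memory
            return self.memstore[row]
        for hfile in reversed(self.hfiles):  # otherwise search newest file to oldest
            for key, value in hfile:
                if key == row:
                    return value
        return None
```

Writes only ever append to the log and update memory, which is why this design sustains high write rates on an append-only file system such as HDFS.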

79 Software Group

HBase Data Model

Flat table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized (relational) tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name; each row holds only the columns it needs):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
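The move from the flat table to HBase's wide, sparse rows is essentially a pivot. This sketch uses the slide's rows; the column-family name `site` is a hypothetical choice, since HBase columns are addressed as `family:qualifier` but the slide does not name a family.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (name, site, counter) rows from the flat table on the slide.
ROWS: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

FAMILY = "site"  # hypothetical column-family name


def to_wide_rows(rows: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Pivot to one row per name; each row stores only the columns it actually has."""
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in rows:
        table[name][f"{FAMILY}:{site}"] = counter
    return dict(table)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(ROWS).items():
        print(row_key, columns)
```

No join is needed to read all of a person's counters: a single row-key lookup returns them, and absent columns simply take no space.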

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL is translated into Java (MapReduce) jobs, which produce the results]
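To make the SQL-to-MapReduce translation concrete, this sketch answers one `GROUP BY` query twice: once with a SQL engine and once as hand-written map and reduce steps. SQLite (Python's built-in `sqlite3`) stands in for Hive here, and the `visits` table is hypothetical; Hive's actual generated plans are more elaborate.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, int]

# Sample (name, site, counter) rows reused from the HBase data-model slide.
ROWS: List[Row] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
]

QUERY = "SELECT site, SUM(counter) FROM visits GROUP BY site"


def run_sql(rows: List[Row]) -> Dict[str, int]:
    """Answer the query with a SQL engine (SQLite stands in for Hive here)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE visits (name TEXT, site TEXT, counter INTEGER)")
        conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
        return dict(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


def run_as_map_reduce(rows: List[Row]) -> Dict[str, int]:
    """The same query as map and reduce steps, conceptually similar to what Hive generates."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for _name, site, counter in rows:  # map: the GROUP BY column becomes the key
        groups[site].append(counter)
    return {site: sum(values) for site, values in groups.items()}  # reduce: SUM per key


if __name__ == "__main__":
    print(run_sql(ROWS) == run_as_map_reduce(ROWS))
```

The point of Hive is that analysts write only the one-line query; the map and reduce code is generated for them.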

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP moves tables such as CUSTOMERS and PRODUCTS between an RDBMS and HDFS]
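On the Sqoop side, an import is parallelized by splitting the table on a key range, one range per mapper (Sqoop exposes this through its `--split-by` and `--num-mappers` options). The sketch below imitates that idea with SQLite and in-memory CSV "part" files; `import_table` and the `customers` table are hypothetical, and the Flume side is not modelled.

```python
import csv
import io
import sqlite3
from typing import Dict


def import_table(conn: sqlite3.Connection, table: str, split_by: str,
                 num_mappers: int = 4) -> Dict[str, str]:
    """Split a table on a numeric key range and export each range as one CSV 'part' file.

    `table` and `split_by` are interpolated into SQL, so they must be trusted identifiers.
    """
    lo, hi = conn.execute(f"SELECT MIN({split_by}), MAX({split_by}) FROM {table}").fetchone()
    if lo is None:
        return {}  # empty table: nothing to import
    # Integer range boundaries; the last one is hi + 1 so the maximum key is included.
    bounds = [lo + (hi - lo + 1) * m // num_mappers for m in range(num_mappers + 1)]
    parts: Dict[str, str] = {}
    for m in range(num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {split_by} >= ? AND {split_by} < ? ORDER BY {split_by}",
            (bounds[m], bounds[m + 1]),
        ).fetchall()
        buffer = io.StringIO()
        csv.writer(buffer, lineterminator="\n").writerows(rows)
        parts[f"part-m-{m:05d}"] = buffer.getvalue()
    return parts


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    db.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"c{i}") for i in range(1, 11)])
    for name, text in import_table(db, "customers", "id").items():
        print(name, text.splitlines())
```

Range splitting works well only when keys are evenly distributed; a skewed key range leaves some mappers with most of the rows.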

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos - heterogeneous cluster manager (alongside YARN and EC2)
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• On top of Spark: Spark Streaming, Shark (SQL), BlinkDB, MLbase/MLlib (machine learning)
• Hadoop MapReduce and Hive (SQL) also run in the stack

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911; Hadoop $750

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPUs per node (216 cores total)
  - 12x2 TB HDD per node (216 spindles, 432 TB raw)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
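Multiplying out the per-node figures is a quick sanity check on the rack totals: 216 spindles of 2 TB each come to 432 TB of raw disk. The constants below are taken from the slide; `rack_totals` is a hypothetical helper.

```python
# Per-node figures quoted on the slide (Oracle Big Data Appliance).
SERVERS = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6  # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2


def rack_totals(servers: int = SERVERS) -> dict:
    """Multiply per-node figures out to whole-rack totals (raw capacity, before replication)."""
    return {
        "ram_gb": servers * RAM_GB_PER_NODE,
        "cores": servers * CORES_PER_NODE,
        "spindles": servers * DISKS_PER_NODE,
        "raw_tb": servers * DISKS_PER_NODE * TB_PER_DISK,
    }


if __name__ == "__main__":
    print(rack_totals())  # {'ram_gb': 864, 'cores': 216, 'spindles': 216, 'raw_tb': 432}
```

At HDFS's default replication factor of three, only about a third of that raw capacity would hold distinct data.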

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

Machine Learning: programs that evolve with "experience"

Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted linear trendline]

$f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
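The trendline equation on the chart is the kind of output an ordinary least-squares fit produces. This sketch shows the closed-form calculation for simple linear regression; the chart's underlying data points are not recoverable, so any data used with it is made up, and `fit_line`/`predict` are hypothetical names.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept, predict(slope, intercept, 10))  # 2.0 1.0 21.0
```

The other techniques in the list (curve fitting, multivariate and time-series models, logistic regression, CART) generalize the same idea: learn parameters from historical data, then apply them to new inputs.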

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Examples: spam detection, churn risk, customer value
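One of the simplest classifiers is a nearest-centroid model: learn the average feature vector of each known class, then label new data by whichever average it is closest to. In this sketch the churn-risk features (support calls, months since last purchase) are hypothetical, and production systems would typically use richer models such as logistic regression or decision trees.

```python
from collections import defaultdict
from math import dist
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def train_centroids(examples: Sequence[Tuple[Point, str]]) -> Dict[str, Point]:
    """Learn one centroid (mean feature vector) per class label."""
    grouped: Dict[str, List[Point]] = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(axis) / len(points) for axis in zip(*points))
        for label, points in grouped.items()
    }


def classify(centroids: Dict[str, Point], features: Point) -> str:
    """Assign the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features: (support calls, months since last purchase).
    history = [((5.0, 10.0), "churn"), ((6.0, 12.0), "churn"),
               ((1.0, 1.0), "stay"), ((0.0, 2.0), "stay")]
    model = train_centroids(history)
    print(classify(model, (4.0, 9.0)), classify(model, (1.0, 2.0)))  # churn stay
```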

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
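k-means is a common way to group data without predefined labels: pick k starting centres, assign each point to its nearest centre, move each centre to the mean of its points, and repeat. It is shown here only as one example of clustering; the fixed iteration count and seed are arbitrary choices for the sketch.

```python
import random
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid_of(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty group of points."""
    return tuple(sum(axis) / len(points) for axis in zip(*points))


def k_means(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Plain k-means (Lloyd's algorithm); returns a cluster index for each input point."""
    rng = random.Random(seed)
    centroids = list(rng.sample(list(points), k))  # start from k distinct input positions
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members (keep it if it has none).
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = centroid_of(members)
    return labels


if __name__ == "__main__":
    baskets = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
    print(k_means(baskets, 2))
```

The cluster numbers themselves are arbitrary; what matters is which points end up sharing a number.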

121 Software Group

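Supervised Machine Learning

[Diagram: raw data from the existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; new data from new business is fed to the production model to produce predictions]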
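The workflow in the diagram can be sketched end to end: split cleaned data into training and validation sets, fit a candidate model on the training set, and promote it to production only if it scores well on the held-back data. A one-feature threshold rule (a "decision stump") is swapped in here as the simplest possible model; the 30% validation share and 0.8 accuracy bar are arbitrary assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)


def split(data: Sequence[Example], validation_fraction: float = 0.3,
          seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle, then hold back a fraction of the cleaned data as the validation set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def accuracy(threshold: float, rows: Sequence[Example]) -> float:
    """Share of rows where the rule 'x >= threshold -> label 1' is correct."""
    return sum((x >= threshold) == bool(y) for x, y in rows) / len(rows)


def train_stump(training: Sequence[Example]) -> float:
    """Candidate model: the threshold (from the training values) with the best training accuracy."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))


def build_production_model(data: Sequence[Example],
                           min_accuracy: float = 0.8) -> Optional[float]:
    training, validation = split(data)
    candidate = train_stump(training)
    # Promote the candidate only if it also performs well on data it never saw.
    if accuracy(candidate, validation) >= min_accuracy:
        return candidate
    return None


if __name__ == "__main__":
    history = [(float(x), int(x >= 50)) for x in range(100)]
    model = build_production_model(history)
    if model is not None:
        print("predict 75.0 ->", int(75.0 >= model))
```

Validating on held-back data guards against a model that merely memorizes its training set.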

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps (inmaps.linkedin.com)]

123 Software Group

124 Software Group

Big Data Analytics (Data Science)

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: change data capture reads Oracle redo logs and delivers changes to Hadoop via a JMS queue and Hadoop poster, as batched HDFS file copies, audit change data, or real-time replication into HBase]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI and analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, Toad & SharePlex, Toad BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI and analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, Toad & SharePlex, Toad BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring - integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• Which technologies will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  - A good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - Sqoop
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207: Surviving and thriving in the big data revolution
  • 207: Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest - a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • It's not enough to lay out products on tables
  • There's a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through "Big Data analytics"
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell's offering was not complete…
  • Dell acquires StatSoft
  • Slide 134
  • Data Visualization
  • Live scoring - integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app. We appreciate your feedback and insight
Page 14: Thriving and surviving the Big Data revolution

14 Software Group

15 Software Group

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates


Data means more

1993:
• Generated internally
• Key to operational efficiency

2013:
• Generated externally
• Key to competitive advantage
• Source of product innovation
• Changing our lives

Big Data is the culmination of cloud, social and mobile

Not all upside

Will Big Data kill retail?

Prevalence of Showrooming

[Chart: prevalence of showrooming (%) in Consumer Electronics and Home Improvement]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013.

Some novel defences

Web analytics for retail

Connected Store
• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

Why showrooming?
• Selection
• Stock
• Faster
• Cheaper

Defences
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization

It's not enough to lay out products on tables
• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection
• Only big data analytics can provide these advantages

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

The Revolution is not over yet

Willy Bowman
Nationality: German
Don't Mention the WAR

Buying choices: Oracle Performance Survival Guide
• Amazon softcover: $45.99
• Amazon Kindle: $39.99
Say "screw you bookseller" to buy Kindle version

Data Input

Siri
User: "Siri, call me an ambulance."
Siri: "From now on, I'll call you 'An Ambulance'. OK?"
User: "I want to jump off a bridge."
Siri: "I found 14 bridges nearby."

Brain Control

Muze

The instrumented human
• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

The instrumented world

All of which accelerates what we call Big Data

Big Database technologies

Pioneers of Big Data

Google Software Architecture (layers, top to bottom)
• Google Applications
• MapReduce, BigTable
• Google File System (GFS)

Map Reduce

[Diagram: from Start, many Map tasks run in parallel and their output feeds a Reduce stage]
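The following is a minimal single-process sketch of the map, shuffle and reduce flow, using word count as the example job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and this is not the Hadoop API. On a real cluster each mapper would run in parallel over its own input split, and the framework would perform the shuffle/sort between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all the values that share one key."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line stands in for the input split one mapper would process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big deal", "data beats opinions"]))
    # {'beats': 1, 'big': 2, 'data': 2, 'deal': 1, 'opinions': 1}
```

The mapper only emits independent (key, value) pairs, and each reducer only sees the values for its own keys. Because of that, both phases can be spread across many machines, which is what lets the pattern scale.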

Multi-stage Map-Reduce

[Diagram: a client job reads data from HDFS into parallel MAPPER tasks (scan/sort), passes their output to a second set of MAPPER tasks (aggregate), and a final REDUCE returns the result to the client]
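In a multi-stage job, the output of one map/reduce pass becomes the input of the next. The sketch below chains two passes in a single process: the first counts hits per URL, and the second ranks the totals. All names (`run_stage`, `hits_mapper`, `make_top_reducer`) and the log format are hypothetical illustrations, not Hadoop classes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[object, object]


def run_stage(records: Iterable, mapper: Callable, reducer: Callable) -> List[Pair]:
    """One map -> shuffle/sort -> reduce pass, run in a single process."""
    groups: Dict[object, list] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Pair] = []
    for key in sorted(groups):
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count hits per URL from raw "user url" log lines.
def hits_mapper(line: str):
    _user, url = line.split()
    yield url, 1


def sum_reducer(url, counts):
    yield url, sum(counts)


# Stage 2: put every (url, hits) pair under one key so a single
# reducer sees all the totals and can pick the most popular URLs.
def top_mapper(pair):
    yield "all", pair


def make_top_reducer(n: int):
    def top_reducer(_key, pairs):
        ranked = sorted(pairs, key=lambda p: (-p[1], p[0]))
        for url, hits in ranked[:n]:
            yield url, hits
    return top_reducer


if __name__ == "__main__":
    logs = ["dick /home", "jane /home", "dick /cart", "jane /home", "bob /cart", "bob /faq"]
    stage1 = run_stage(logs, hits_mapper, sum_reducer)
    stage2 = run_stage(stage1, top_mapper, make_top_reducer(2))
    print(stage2)  # [('/home', 3), ('/cart', 2)]
```

On a classic Hadoop cluster, each stage would be a separate job, with the intermediate result written to HDFS and read back before the next stage starts.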

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform (cleanse, normalize, aggregate), Load → Data Warehouse → Code → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Code (cleanse) → Analyse → Utilize
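The sketch below illustrates schema on read. Raw records are stored exactly as they arrive, and a column layout and cleansing rule are applied only when a particular query reads them. Under schema on write, the same rows would have to be cleansed and typed during ETL, before loading. The log contents, column names and helper functions are hypothetical.

```python
import csv
import io
from typing import Dict, Iterator, List

# Raw data is loaded as-is: no upfront cleansing or table design.
RAW_LOG = """2013-04-08,dick,ebay,3
2013-04-08,jane,google,oops
2013-04-09,dick,google,5
"""


def read_with_schema(raw: str, columns: List[str]) -> Iterator[Dict[str, str]]:
    """Apply a schema only at query time (schema on read)."""
    for row in csv.reader(io.StringIO(raw)):
        if len(row) == len(columns):
            yield dict(zip(columns, row))


def total_visits(raw: str) -> Dict[str, int]:
    """One query's interpretation: cleanse while reading, skip bad counts."""
    totals: Dict[str, int] = {}
    for rec in read_with_schema(raw, ["day", "name", "site", "visits"]):
        try:
            visits = int(rec["visits"])
        except ValueError:
            continue  # dirty value: this particular query chooses to ignore it
        totals[rec["name"]] = totals.get(rec["name"], 0) + visits
    return totals


if __name__ == "__main__":
    print(total_visits(RAW_LOG))  # {'dick': 8}
```

A different query could read the same raw file with another schema or cleansing rule, without reloading anything.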

Hadoop Open Source Map-Reduce Stack

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
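Dividing the cluster totals by the node count gives the average resources per node. This assumes identical nodes and decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB):

$$
\frac{16\ \text{PB}}{4{,}000} = 4\ \text{TB disk per node},\qquad
\frac{64\ \text{TB}}{4{,}000} = 16\ \text{GB RAM per node},\qquad
\frac{32{,}000\ \text{cores}}{4{,}000} = 8\ \text{cores per node}
$$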

Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the JobTracker, which coordinates MapReduce (distributed processing), while the NameNode and Secondary NameNode manage HDFS (distributed storage); every worker node runs both a DataNode and a TaskTracker]

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; a Node Manager on each worker hosts Containers, and an Application Master coordinates each application's containers]

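Tez¹

[Diagram: without Tez, a pipeline of MAP and REDUCE stages runs as separate jobs (Job 1, Job 2, Job 3) that write to and read from HDFS between jobs; with Tez, the same stages run as a single job (Job 1) between one HDFS read and one HDFS write]

¹ Hindi for "fast"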

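HBase: a real-time database built on Hadoop

[Diagram comparing Oracle and HBase storage. Oracle: tables are cached in the Buffer Cache and stored in Datafiles, changes pass through the Log Buffer to Redo, and both sit on disks managed by ASM. HBase: tables are cached in the MemStore and stored in HFiles, changes are written to a write-ahead (WA) log, and both sit on disks managed by HDFS.]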
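Below is a toy model of the HBase side of the diagram, assuming a single region and no column families, timestamps or compactions. It is a sketch, not the HBase API; `ToyRegion` and its methods are hypothetical. Every write is logged first and then applied to the in-memory MemStore. When the MemStore fills, it is flushed to an immutable, key-sorted "HFile". Reads check the MemStore before the HFiles, newest first.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Toy model of the HBase write path (illustration only)."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold
        self.wal: List[Tuple[str, str]] = []           # stands in for the WA log on HDFS
        self.memstore: Dict[str, str] = {}             # in-memory, most recent writes
        self.hfiles: List[List[Tuple[str, str]]] = []  # immutable sorted files, newest last

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))  # log first, for crash recovery
        self.memstore[row_key] = value     # then apply in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the MemStore out as a new sorted file and start a fresh one.
        # (A real system would also retire the log entries this flush covers.)
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, row_key: str) -> Optional[str]:
        if row_key in self.memstore:         # newest data first
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):  # then newest file to oldest
            for key, value in hfile:
                if key == row_key:
                    return value
        return None


if __name__ == "__main__":
    region = ToyRegion(flush_threshold=2)
    region.put("dick", "v1")
    region.put("jane", "v1")  # second entry triggers a flush to the first HFile
    region.put("dick", "v2")  # newer value lives in the MemStore
    print(region.get("dick"), region.get("jane"), len(region.hfiles))  # v2 v1 1
```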

HBase Data Model

One row per visit:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name, one column per site; each row stores only the columns it uses):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
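The move from the normalized tables to HBase-style wide rows is a pivot. This sketch uses the names, sites and counters from the slide and represents each row as a Python dict, so a row holds only the columns it actually has. It ignores HBase column families and timestamps, and the function `to_wide_rows` is a hypothetical helper, not an HBase call.

```python
from typing import Dict, List, Tuple

NAMES = {1: "Dick", 2: "Jane"}
SITES = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
# (NameId, SiteId, Counter) rows of the normalized link table
VISITS: List[Tuple[int, int, int]] = [
    (1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
    (2, 3, 643261), (2, 4, 856767), (1, 5, 675230),
]


def to_wide_rows(visits: List[Tuple[int, int, int]]) -> Dict[int, Dict[str, object]]:
    """Pivot (NameId, SiteId, Counter) triples into one sparse row per name."""
    rows: Dict[int, Dict[str, object]] = {}
    for name_id, site_id, counter in visits:
        # The row key is the name id; each site visited becomes its own column.
        row = rows.setdefault(name_id, {"Name": NAMES[name_id]})
        row[SITES[site_id]] = counter
    return rows


if __name__ == "__main__":
    for row_id, columns in to_wide_rows(VISITS).items():
        print(row_id, columns)
```

Fetching everything about one person is then a single row lookup instead of a three-table join.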

Hive

[Diagram: SQL → Java → Results]
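Hive turns SQL into Java MapReduce jobs. As a rough hand-written analogue (not Hive's generated code), the map step of a query such as `SELECT site, SUM(counter) FROM visits GROUP BY site` emits the GROUP BY column as the key, and the reduce step applies the aggregate. The `visits` table and its columns are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, int]  # (name, site, counter): a hypothetical "visits" table


def group_by_sum(rows: Iterable[Row]) -> Dict[str, int]:
    """Rough equivalent of: SELECT site, SUM(counter) FROM visits GROUP BY site."""
    # Map: project each row to (group-by key, value to aggregate).
    mapped = ((site, counter) for _name, site, counter in rows)
    # Shuffle: bring together all values for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for site, counter in mapped:
        groups[site].append(counter)
    # Reduce: apply the aggregate function to each group.
    return {site: sum(values) for site, values in groups.items()}


if __name__ == "__main__":
    visits = [("Dick", "Google", 690414), ("Jane", "Google", 716426), ("Dick", "Ebay", 507018)]
    print(group_by_sum(visits))  # {'Google': 1406840, 'Ebay': 507018}
```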

Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP moves CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

Berkeley Data Analytic Stack (BDAS)
• Mesos: heterogeneous cluster manager (YARN and EC2 also appear at this layer)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL), alongside Hive (SQL) on MapReduce
• BlinkDB

Meanwhile back at the Death Star

Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911; Hadoop $750]
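The chart's hardware-only figures put the gap at roughly a factor of six and a half per terabyte. For a hypothetical 100 TB of data:

$$
\frac{\$4{,}911}{\$750} \approx 6.5,\qquad
100 \times \$4{,}911 = \$491{,}100\ \text{(Exadata)},\qquad
100 \times \$750 = \$75{,}000\ \text{(Hadoop)}
$$

As the chart notes, these are hardware costs only; software, support and operating costs are not included.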

Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 × 6-core CPUs per node (216 cores total)
  - 12 × 2 TB HDD per node (216 spindles, 432 TB raw)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
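Multiplying out the per-node figures is a quick sanity check on the rack totals. With 12 × 2 TB drives on each of 18 nodes, raw capacity comes to 432 TB. The `NodeSpec` and `rack_totals` names are hypothetical helpers for this sketch.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware figures as listed on the slide."""
    ram_gb: int
    cpus: int
    cores_per_cpu: int
    disks: int
    disk_tb: int


def rack_totals(nodes: int, spec: NodeSpec) -> Dict[str, int]:
    """Scale per-node figures up to whole-rack totals (raw, before replication)."""
    spindles = nodes * spec.disks
    return {
        "ram_gb": nodes * spec.ram_gb,
        "cores": nodes * spec.cpus * spec.cores_per_cpu,
        "spindles": spindles,
        "raw_disk_tb": spindles * spec.disk_tb,
    }


if __name__ == "__main__":
    bda = NodeSpec(ram_gb=48, cpus=2, cores_per_cpu=6, disks=12, disk_tb=2)
    print(rack_totals(18, bda))
    # {'ram_gb': 864, 'cores': 216, 'spindles': 216, 'raw_disk_tb': 432}
```

HDFS's default replication factor is 3, so the space available for unique data would be roughly a third of the raw figure.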

Big Data Appliance Software
• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

Generating competitive advantage through "Big Data analytics" (AKA Data Science)
• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future

Collective Intelligence

Google Flu Trends

Collective Intelligence outsmarts Artificial Intelligence

Artificial Intelligence Strikes back

Watson is big data AI

Predictive Analytics

[Chart: scatter plot with fitted trend line $f(x) = 0.9715x + 0.7191$]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
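Ordinary least squares produces a trend line of the same form as the one on the slide, y = slope · x + intercept. The sketch below uses made-up points rather than the slide's data, and `fit_line` and `predict` are hypothetical helpers.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Made-up points lying exactly on y = 2x + 1.
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept, predict(slope, intercept, 10))  # 2.0 1.0 21.0
```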

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
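A nearest-centroid classifier is one simple way to build such a model. It learns an average feature vector for each known class and assigns new data to the closest one. This is a swapped-in illustrative technique, not one named on the slide. The spam features (say, link count and exclamation marks) and the helpers `train_centroids` and `classify` are hypothetical.

```python
from math import dist
from typing import Dict, List, Sequence


def train_centroids(features: List[Sequence[float]], labels: List[str]) -> Dict[str, List[float]]:
    """Learn one mean feature vector (centroid) per class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    """Assign new data to the class whose centroid is closest (Euclidean)."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))


if __name__ == "__main__":
    # Hypothetical features: [links in message, exclamation marks]
    model = train_centroids([[8, 5], [10, 7], [0, 0], [1, 1]], ["spam", "spam", "ham", "ham"])
    print(classify(model, [7, 4]), classify(model, [0, 1]))  # spam ham
```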

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
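k-means is one common clustering technique, used here as an illustrative choice since the slide does not name an algorithm. It groups points with no labels at all by repeatedly assigning each point to its nearest centre and moving each centre to the mean of its members. The two-dimensional points (say, spend per visit and visits per month) are made up, and `kmeans` is a hypothetical helper.

```python
import random
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20, seed: int = 0) -> List[int]:
    """Lloyd's k-means: return a cluster index for each point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(centroids[c], p)) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(axis) / len(members) for axis in zip(*members)]
    return assignment


if __name__ == "__main__":
    customers = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
    print(kmeans(customers, k=2))  # two groups of three; label numbering is arbitrary
```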

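Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set builds a candidate model, which is validated against the validation set before becoming the production model; the production model turns new data from new business into predictions]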
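The same workflow can be shown in miniature: split labelled history, train a candidate, validate it, and only then use it on new data. The "model" is a single cut-off on one feature so the steps stay visible. The churn data, the 0.9 acceptance bar and the helpers `split`, `accuracy` and `train_threshold` are all hypothetical.

```python
import random
from typing import List, Sequence, Tuple

# (feature value, label): hypothetical data, e.g. support calls and "did churn"
Example = Tuple[float, bool]


def split(data: Sequence[Example], validation_fraction: float, seed: int = 42):
    """Shuffle labelled historical data into a training set and a validation set."""
    rows: List[Example] = list(data)
    random.Random(seed).shuffle(rows)
    n_validation = round(len(rows) * validation_fraction)
    cut = len(rows) - n_validation
    return rows[:cut], rows[cut:]


def accuracy(threshold: float, rows: Sequence[Example]) -> float:
    """Fraction of rows where the rule 'value >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


def train_threshold(training: Sequence[Example]) -> float:
    """Build a candidate model: the cut-off that best fits the training set."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))


if __name__ == "__main__":
    # Existing business: 30 customers whose outcome is already known.
    history = [(float(calls), calls >= 5) for calls in range(10) for _ in range(3)]
    training, validation = split(history, validation_fraction=0.3)
    candidate = train_threshold(training)
    score = accuracy(candidate, validation)  # validate before production
    if score >= 0.9:                          # hypothetical acceptance bar
        new_data = [1.0, 7.0]                 # new business, outcome unknown
        print([x >= candidate for x in new_data])
    else:
        print(f"candidate rejected (validation accuracy {score:.2f})")
```

Holding back a validation set means the model is judged on data it never saw during training. That is the point of the separate "Validate" step before a model reaches production.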

Unsupervised learning

[Image: LinkedIn InMaps, inmaps.linkedin.com]

Big Data Analytics / Data Science
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

Data Scientists to the rescue

Kitenga Analytics Suite

Toad for Hadoop
http://www.toadworld.com/products/toad-for-hadoop/default.aspx

SharePlex® for Hadoop

[Diagram: changes are read from redo logs (change data capture) and passed through a JMS queue to a Hadoop poster, which delivers them as batched HDFS file copies, audit/change data, or real-time HBase replication]

Toad BI Suite

Dell's offering was not complete…

Key components to build end-to-end BI & analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: Toad & SharePlex, Toad BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

Dell acquires StatSoft

Key components to build end-to-end BI & analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: Toad & SharePlex, Toad BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

Data Visualization

Live scoring: integration into operational systems

Industry and cross-industry packaged solutions

For your business
• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 16: Thriving and surviving the Big Data revolution

16 Software Group

17 Software Group

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
Data means more

1993
• Generated internally
• Key to operational efficiency

2013
• Generated externally
• Key to competitive advantage
• Source of product innovation
• Changing our lives

Big Data is the culmination of cloud, social and mobile

Not all upside

Will Big Data kill retail?

Prevalence of Showrooming

[Chart: showrooming prevalence (%) by retail category, including Consumer Electronics and Home Improvement]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013

Some novel defences

Web analytics for retail

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

Why showrooming?

• Selection
• Stock
• Faster
• Cheaper

Defences

• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages

There's a similar story in every industry

Web, transport, power grid, dating, retail, security, finance, government, science, healthcare, insurance, telecom, advertising

The Revolution is not over yet

Willy Bowman

Nationality: German

"Don't mention the war"

Buying choices: Oracle Performance Survival Guide

• Amazon softcover: $45.99
• Amazon Kindle: $39.99
• Say "screw you bookseller" to buy the Kindle version

Data Input

Siri

"Siri, call me an ambulance."
"From now on, I'll call you 'An Ambulance'. OK?"

"I want to jump off a bridge."
"I found 14 bridges nearby."

Brain Control

Muze

The instrumented human

• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

The instrumented world

All of which accelerates what we call Big Data

Big Database technologies

Pioneers of Big Data

Google Software Architecture

• Google Applications
• Map Reduce, BigTable
• Google File System (GFS)

Map Reduce

[Diagram: Start → many parallel Map tasks → Reduce]
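The map/shuffle/reduce flow in the diagram can be sketched in a few lines of plain Python. This is a single-process illustration of the idea, not Hadoop's API: the function names and the word-count task are assumptions chosen for clarity, and in a real cluster each map call would run in parallel on the node holding its block of data.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, keys in sorted order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine every value that shares a key."""
    return key, sum(values)


def map_reduce(lines: list[str]) -> dict[str, int]:
    # Hadoop would run the map calls in parallel; here they run sequentially.
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data means more"]))
```

The useful property is that map calls never talk to each other and each reduce call sees only one key, which is what lets the framework spread both phases across many machines.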

Multi-stage Map-Reduce

[Diagram: mappers read from HDFS and scan/sort; a second set of mappers aggregates; a reduce step returns the result to the client]
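A multi-stage job feeds the output of one MapReduce pass into the next. The hypothetical sketch below (plain Python, not Hadoop code) reuses the Name/Site/Counter sample rows from the HBase data model slide: stage 1 sums the counter per site, and stage 2 routes every total to a single reducer that keeps the top two. In Hadoop the intermediate result would be written to HDFS between stages; here it is just a Python list.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One MapReduce pass: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # reducers receive their keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1 (hypothetical): total the counter per site from "name,site,counter" lines.
def parse_line(line):
    _name, site, counter = line.split(",")
    yield site, int(counter)


def sum_counters(site, counters):
    yield site, sum(counters)


# Stage 2 (hypothetical): send every (site, total) to one reducer, which ranks them.
def to_single_key(pair):
    yield "all", pair


def top_two(_key, pairs):
    yield from sorted(pairs, key=lambda pair: pair[1], reverse=True)[:2]


LINES = [
    "Dick,Ebay,507018", "Dick,Google,690414", "Jane,Google,716426",
    "Dick,Facebook,723649", "Jane,Facebook,643261",
    "Jane,ILoveLarry.com,856767", "Dick,MadBillFans.com,675230",
]

if __name__ == "__main__":
    stage1 = run_stage(LINES, parse_line, sum_counters)
    print(run_stage(stage1, to_single_key, top_two))
```

Funnelling everything to one key in stage 2 is fine for a handful of sites but would bottleneck on one reducer at scale; that trade-off is one reason engines such as Tez try to avoid materializing every intermediate stage.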

Schema on Read vs Schema on Write

• Schema on write (data warehouse): extract, cleanse, normalize, transform and load the data into the warehouse; only then code, aggregate, analyse and utilize it
• Schema on read (Hadoop): load the raw data into Hadoop first; then code, cleanse, analyse and utilize it
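The difference between the two approaches is when structure is imposed. In the hypothetical sketch below (the JSON event records and their field names are assumptions), schema on write validates and converts every record before storing it, so bad records are rejected at load time; schema on read keeps the raw text and interprets it only when a query asks for a field, so bad records surface as nulls at query time.

```python
import json
from typing import Optional

RAW_EVENTS = [
    '{"user": "dick", "site": "Google", "ms": 120}',
    '{"user": "jane", "site": "Facebook"}',  # missing "ms"
    'not even json',                         # malformed record
]


def schema_on_write(lines: list[str]) -> list[dict]:
    """Warehouse style: validate and type every row at load time; reject bad rows."""
    table = []
    for line in lines:
        try:
            row = json.loads(line)
            table.append({"user": str(row["user"]),
                          "site": str(row["site"]),
                          "ms": int(row["ms"])})
        except (ValueError, KeyError):  # JSONDecodeError is a ValueError
            continue  # rejected before it ever reaches the warehouse
    return table


def schema_on_read(lines: list[str], field: str) -> list[Optional[object]]:
    """Hadoop style: keep raw lines; interpret them only when a query runs."""
    values = []
    for line in lines:
        try:
            values.append(json.loads(line).get(field))
        except ValueError:
            values.append(None)  # unparseable rows become nulls at query time
    return values


if __name__ == "__main__":
    print(schema_on_write(RAW_EVENTS))
    print(schema_on_read(RAW_EVENTS, "site"))
```

Schema on write pays the cleansing cost once and gives fast, trustworthy queries; schema on read makes loading trivial and keeps data the warehouse would have discarded, at the cost of doing the interpretation in every query.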

Hadoop Open Source Map-Reduce Stack

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
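Dividing the cluster totals by the node count gives a feel for a typical node in that cluster. The sketch assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and identical nodes, which real clusters rarely are exactly.

```python
# Cluster totals from the slide, converted to TB / GB (decimal units assumed).
CLUSTER = {"nodes": 4_000, "disk_tb": 16_000, "ram_gb": 64_000, "cores": 32_000}


def per_node(cluster: dict) -> dict:
    """Average resources per node, assuming an even spread across nodes."""
    n = cluster["nodes"]
    return {
        "disk_tb": cluster["disk_tb"] / n,
        "ram_gb": cluster["ram_gb"] / n,
        "cores": cluster["cores"] / n,
    }


if __name__ == "__main__":
    print(per_node(CLUSTER))
```

The result (about 4 TB of disk, 16 GB of RAM and 8 cores per node) illustrates the Hadoop design point: many modest commodity machines rather than a few large ones.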

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (database)
• ZooKeeper (locking)
• Sqoop (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

Hadoop 1.0 Architecture

• Hadoop client (Java, Pig, Hive)
• MapReduce (distributed processing): a Job Tracker plus a Task Tracker on every worker node
• HDFS (distributed storage): a Name Node and a Secondary Name Node, plus a Data Node on every worker node

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

• Hadoop client (Java, Pig, Hive)
• Resource Manager
• Application Master
• Node Managers, each running containers

Tez¹

[Diagram: processing run as three chained MapReduce jobs (Job 1, Job 2, Job 3) that read from and write to HDFS between jobs, versus the same processing run by Tez as a single job]

¹ Hindi for "fast"

HBase

A real-time database built on Hadoop

[Diagram: Oracle architecture (tables, buffer cache, log buffer, redo, datafiles on ASM disks) alongside its HBase counterpart (tables, MemStore, write-ahead log, HFiles on HDFS disks)]

HBase Data Model

Relational, denormalized:
Name | Site | Counter
Dick | Ebay | 507018
Dick | Google | 690414
Jane | Google | 716426
Dick | Facebook | 723649
Jane | Facebook | 643261
Jane | ILoveLarry.com | 856767
Dick | MadBillFans.com | 675230

Relational, normalized:
NameId | Name
1 | Dick
2 | Jane

SiteId | SiteName
1 | Ebay
2 | Google
3 | Facebook
4 | ILoveLarry.com
5 | MadBillFans.com

NameId | SiteId | Counter
1 | 1 | 507018
1 | 2 | 690414
2 | 2 | 716426
1 | 3 | 723649
2 | 3 | 643261
2 | 4 | 856767
1 | 5 | 675230

HBase, one wide row per name, holding only the columns that row needs:
Id | Name | Ebay | Google | Facebook | (other columns) | MadBillFans.com
1 | Dick | 507018 | 690414 | 723649 | … | 675230

Id | Name | Google | Facebook | (other columns) | ILoveLarry.com
2 | Jane | 716426 | 643261 | … | 856767
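One way to see the HBase model is as a sparse map from row key to {column: value}. The sketch below pivots the slide's normalized tables into that shape using plain Python dictionaries; the helper name is hypothetical, and it deliberately ignores real HBase details such as column families, cell timestamps and byte-array keys.

```python
from collections import defaultdict

NAMES = {1: "Dick", 2: "Jane"}
SITES = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
COUNTERS = [(1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
            (2, 3, 643261), (2, 4, 856767), (1, 5, 675230)]


def to_wide_rows(names, sites, counters):
    """Pivot normalized (name, site, counter) rows into one sparse row per name id."""
    rows = defaultdict(dict)
    for name_id, site_id, counter in counters:
        row = rows[name_id]
        row["Name"] = names[name_id]
        row[sites[site_id]] = counter  # the column name is data, not schema
    return dict(rows)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(NAMES, SITES, COUNTERS).items():
        print(row_key, columns)
```

Because Jane never visited Ebay, her row simply has no Ebay column; nothing is stored for it, which is why wide, sparse tables with thousands of possible columns are cheap in HBase.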

Hive

[Diagram: SQL → Java → Results]
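The slide's point is that Hive turns SQL into Java MapReduce code and hands back the results. As a rough illustration only (the table name `visits` and its columns are assumptions, and real Hive generates and schedules actual MapReduce jobs rather than Python), a GROUP BY aggregate corresponds to a map step that emits the grouping key and a reduce step that sums per key:

```python
from collections import defaultdict

# Hypothetical HiveQL; "visits" and its columns are illustrative names only.
QUERY = "SELECT site, SUM(counter) FROM visits GROUP BY site"

VISITS = [  # (name, site, counter) rows, using figures from the HBase slide
    ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
]


def group_by_sum(rows):
    """Conceptually what a GROUP BY ... SUM(...) query asks MapReduce to do."""
    # Map: emit (GROUP BY key, value to aggregate) for each row.
    mapped = [(site, counter) for _name, site, counter in rows]
    # Shuffle + reduce: add up every value that shares a key.
    totals = defaultdict(int)
    for site, counter in mapped:
        totals[site] += counter
    return dict(totals)


if __name__ == "__main__":
    print(QUERY)
    print(group_by_sum(VISITS))
```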

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

Flume and Sqoop

[Diagram: Flume loads web logs into HDFS; Sqoop loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS]

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (YARN and EC2 also shown)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
  – Spark Streaming
  – MLbase, MLlib: machine learning
  – Shark (SQL)
  – BlinkDB
• Hadoop alongside: MapReduce, Hive (SQL)

Meanwhile back at the Death Star

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

Economies

Exadata vs Hadoop, $/TB (hardware only):
• Exadata: $4,911 per TB
• Hadoop: $750 per TB
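Scaling the slide's per-terabyte figures to a concrete capacity makes the gap easier to picture. This is a naive linear estimate under stated assumptions: hardware only, identical $/TB at any size, and an illustrative 100 TB data set; software licences, power, space and staff are ignored.

```python
# Hardware-only cost per terabyte, as quoted on the slide (USD).
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform: str, terabytes: float) -> float:
    """Naive linear estimate: ignores software, licences, power and staff."""
    return COST_PER_TB[platform] * terabytes


def cost_ratio() -> float:
    """How many times more Exadata hardware costs per TB than Hadoop hardware."""
    return COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]


if __name__ == "__main__":
    for platform in COST_PER_TB:
        print(platform, hardware_cost(platform, 100))
    print(f"Exadata costs about {cost_ratio():.1f}x more per TB")
```

At 100 TB this gives roughly $491,100 versus $75,000, a factor of about 6.5; raw $/TB is not the whole story, since the two platforms differ in replication overhead, performance and what comes bundled.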

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2×6-core CPUs per node (216 cores total)
  – 12×2 TB HDD per node (216 spindles, 432 TB raw)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to the data centre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
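Multiplying out the per-node figures is a quick consistency check on the totals: 216 spindles of 2 TB each give 432 TB of raw disk. The sketch assumes all 18 nodes are identical and reports raw capacity only; HDFS replication would reduce the space actually available for data.

```python
NODES = 18
# Per-node specification from the slide.
PER_NODE = {"ram_gb": 48, "cpus": 2, "cores_per_cpu": 6, "disks": 12, "disk_tb": 2}


def appliance_totals(nodes: int, spec: dict) -> dict:
    """Rack-level totals for identical nodes (raw, before HDFS replication)."""
    spindles = nodes * spec["disks"]
    return {
        "ram_gb": nodes * spec["ram_gb"],
        "cores": nodes * spec["cpus"] * spec["cores_per_cpu"],
        "spindles": spindles,
        "raw_disk_tb": spindles * spec["disk_tb"],
    }


if __name__ == "__main__":
    print(appliance_totals(NODES, PER_NODE))
```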

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

Generating competitive advantage through "Big Data analytics"

• Machine learning: programs that evolve with "experience"
• Collective intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive analytics: programs that extrapolate from existing data into the future
• Big Data analytics: AKA data science

Collective Intelligence

Google Flu Trends

Collective Intelligence outsmarts Artificial Intelligence

Artificial Intelligence Strikes back

Watson is big data AI

Predictive Analytics

[Chart: scatter plot (both axes 0 to 120) with fitted trend line f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
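The trend line on the chart is a linear regression. An ordinary least-squares fit can be written directly from the textbook formulas, slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope·x̄. The data in the usage below is synthetic (it lies exactly on y = 2x + 1) because the slide's underlying points are not available.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)  # zero if every x is identical
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Extrapolate with a fitted (slope, intercept) model."""
    slope, intercept = model
    return slope * x + intercept


if __name__ == "__main__":
    model = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
    print(model, predict(model, 10))
```

Predicting at x = 10, outside the observed range, is exactly the "extrapolate from existing data into the future" step; its reliability depends on the trend continuing.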

Classification

• Create a model that identifies/classifies new data
• Examples: spam detection, churn risk, customer value
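As a minimal illustration of building a model that classifies new data, the sketch below uses a nearest-centroid classifier, one of the simplest techniques and not necessarily what any product in this deck uses. The churn features (support calls per month, months as a customer) and the labelled examples are hypothetical.

```python
from math import dist

# Hypothetical labelled examples: ((support calls/month, months as customer), label)
EXAMPLES = [((5, 2), "churn"), ((6, 3), "churn"), ((4, 1), "churn"),
            ((0, 24), "stay"), ((1, 36), "stay"), ((1, 30), "stay")]


def train_centroids(examples):
    """Model = the mean feature vector of each label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = train_centroids(EXAMPLES)
    print(classify(model, (5, 4)), classify(model, (0, 28)))
```

In practice features on very different scales (calls vs. months) would usually be normalized first so that one feature does not dominate the distance.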

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
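k-means is a common way to group data without predefined classes. This naive sketch initializes the centroids from the first k points and runs a fixed number of iterations; the two per-customer basket features are hypothetical, and production code would typically use random or k-means++ initialization and stop when assignments no longer change.

```python
from math import dist

# Hypothetical customers described by two basket features (arbitrary units).
POINTS = [(1, 1), (1.5, 2), (1, 2), (8, 8), (9, 8.5), (8.5, 9)]


def k_means(points, k, iterations=10):
    """Plain k-means: assign to nearest centroid, then move centroids to the mean."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, clusters


if __name__ == "__main__":
    centroids, clusters = k_means(POINTS, 2)
    print(centroids)
    print(clusters)
```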

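Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set builds a candidate model, which is validated against the validation set to become the production model; the production model scores new data from new business to produce predictions]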
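The pipeline in the diagram (clean, split into training and validation sets, build a candidate model, validate it, then score new data) can be sketched end to end. The data is synthetic and the model is a one-feature threshold ("decision stump"), chosen only to keep the example short; the function names are hypothetical.

```python
import random

# Synthetic labelled records: (x, label); None marks a missing value.
RAW = [(x, x >= 50) for x in range(0, 100, 5)] + [(None, True), (15, None)]


def clean(rows):
    """'Clean': drop records with missing values."""
    return [row for row in rows if None not in row]


def split(rows, train_percent=70, seed=42):
    """Shuffle reproducibly, then cut into a training set and a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]


def train_stump(training):
    """Candidate model: the threshold on x that classifies most training rows correctly."""
    best_threshold, best_correct = None, -1
    for threshold, _label in training:
        correct = sum((x >= threshold) == label for x, label in training)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return best_threshold


def accuracy(threshold, rows):
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


if __name__ == "__main__":
    training, validation = split(clean(RAW))
    model = train_stump(training)
    print("validation accuracy:", accuracy(model, validation))
    print([x >= model for x in (12, 88)])  # production: score new data
```

The validation score, not the training score, is what decides whether a candidate is promoted to production, because a model can fit its own training data perfectly and still misjudge unseen cases near the boundary.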

Unsupervised learning

[Image: LinkedIn InMaps, inmaps.linkedin.com]

Big Data Analytics

Data science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

Data Scientists to the rescue

Kitenga Analytics Suite

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

SharePlex® for Hadoop

[Diagram: change data capture reads the redo logs; changes pass through a JMS queue to a Hadoop poster, which supports batched HDFS file copy, audit of change data, and real-time replication into HBase]
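The core of change data capture is replaying an ordered stream of row changes against a target. The sketch below is a generic illustration, not SharePlex's implementation: the `Change` record, its `scn` ordering field (loosely modelled on Oracle's system change number) and the dictionary standing in for an HBase table are all assumptions. It also keeps an append-only audit list, echoing the "audit change data" path on the slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    scn: int                    # commit order (hypothetical, SCN-like)
    op: str                     # "INSERT", "UPDATE" or "DELETE"
    key: str                    # primary key / row key
    row: Optional[dict] = None  # full new row; None for deletes


def apply_changes(target: dict, changes: list, audit: list) -> dict:
    """Replay changes in commit order against a key-value target table."""
    for change in sorted(changes, key=lambda c: c.scn):
        audit.append(change)  # append-only history of every change
        if change.op in ("INSERT", "UPDATE"):
            target[change.key] = dict(change.row)  # whole-row replace in this sketch
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return target


# Deliberately out of order, as changes might arrive from a queue.
CHANGES = [
    Change(2, "UPDATE", "c1", {"name": "Dick", "tier": "gold"}),
    Change(1, "INSERT", "c1", {"name": "Dick", "tier": "silver"}),
    Change(3, "INSERT", "c2", {"name": "Jane", "tier": "silver"}),
    Change(4, "DELETE", "c2"),
]

if __name__ == "__main__":
    audit_log = []
    print(apply_changes({}, CHANGES, audit_log))
    print([c.scn for c in audit_log])
```

Ordering by commit position is what makes the replica converge to the source state; real CDC tools also have to handle changes that carry only modified columns, transaction boundaries and restarts.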

Toad BI Suite

Dell's offering was not complete…

Key components to build end-to-end BI/analytics solutions: data integration, database management, advanced analytics, business intelligence, server and storage

Dell products: Toad & SharePlex, Toad BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions: data integration, database management, advanced analytics, business intelligence, server and storage

Dell products: Toad & SharePlex, Toad BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

Data Visualization

Live scoring: integration into operational systems

Industry and cross-industry packaged solutions

For your business

• How could data and algorithms transform your business?
• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – Sqoop
  – Hive
  – Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 18: Thriving and surviving the Big Data revolution

18 Software Group

19 Software Group

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 19: Thriving and surviving the Big Data revolution

19 Software Group

20 Software Group

21 Software Group

Data means more

1993: generated internally; key to operational efficiency

2013: generated externally; key to competitive advantage; source of product innovation; changing our lives

22 Software Group

Big Data is the culmination of cloud, social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail?

25 Software Group

Prevalence of Showrooming

[Chart: prevalence of showrooming (percent) for Consumer Electronics and Home Improvement]

Source: Gartner Research G00249458, "Survey Analysis: Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not Unbeatable", published 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization

• In-store offers

• Customer entertainment

• Checkout anywhere

• Relationship management

• Customer analytics

33 Software Group

34 Software Group

Why showrooming?

Reasons: Selection, Stock, Faster, Cheaper

Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages

• Retailers can only survive by embracing online and emulating online practices:
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection

• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Don't Mention the WAR

43 Software Group

Buying choices

Oracle Performance Survival Guide

Amazon softcover: $45.99

Amazon Kindle: $39.99

Say "screw you, bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

"Siri, call me an ambulance."

From now on, I'll call you 'An Ambulance'. OK?

"I want to jump off a bridge."

I found 14 bridges nearby.

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS

• Storage

• Pulse/temperature monitor

• Silent alarms

• Pedometer/sleep monitoring

• Compass

• Camera

• Mic/earphones

• Heads-up display

• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google Applications run on Map Reduce and BigTable, which sit on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, many Map tasks run in parallel and feed a single Reduce step]
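The Map Reduce diagram can be made concrete with the classic word-count job. The Python below is a minimal single-process sketch. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and this is not the Hadoop API. On a real cluster, the framework runs many mappers and reducers in parallel and performs the shuffle across the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, 1) for every word; each line could go to a different mapper."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values for each key; for word count, a simple sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data is big", "data means more"]))
```

The shuffle step carries the design choice. Because all values for one key reach the same reducer, each reducer can work independently of the others.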

68 Software Group

Multi-stage Map-Reduce

[Diagram: data flows from HDFS through parallel mappers (scan/sort), then a second tier of mappers (aggregate), then a reduce step that returns results to the client]
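A multi-stage job chains passes so that one stage's output becomes the next stage's input. The sketch below is illustrative only and makes two assumptions: a hypothetical `run_stage` helper and a toy web log of (user, site) visits. Stage 1 counts visits per (user, site). Stage 2 re-keys those results by user and counts distinct sites. Everything runs in one process and is meant only to show the data flow.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_stage(records: Iterable[Any],
              mapper: Callable[[Any], Iterable[Pair]],
              reducer: Callable[[Any, List[Any]], Pair]) -> List[Pair]:
    """One map-reduce pass: map each record, group values by key, reduce groups in key order."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Stage 1: count visits per (user, site).
def visit_mapper(visit: Tuple[str, str]) -> List[Pair]:
    return [(visit, 1)]


def sum_reducer(key: Any, values: List[int]) -> Pair:
    return key, sum(values)


# Stage 2: re-key stage 1 output by user and count the distinct sites.
def user_mapper(pair: Pair) -> List[Pair]:
    (user, site), _count = pair
    return [(user, site)]


def distinct_reducer(user: str, sites: List[str]) -> Pair:
    return user, len(set(sites))


visits = [("Dick", "Ebay"), ("Dick", "Google"), ("Dick", "Google"), ("Jane", "Facebook")]
stage1 = run_stage(visits, visit_mapper, sum_reducer)
stage2 = run_stage(stage1, user_mapper, distinct_reducer)
print(stage1)
print(stage2)
```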

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: extract data, transform it (cleanse, normalize, aggregate) and load it into a Data Warehouse, then code, analyse and utilize

Schema on Read: load raw data into Hadoop, then cleanse, code, analyse and utilize
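The difference comes down to when the schema is applied. The Python below is a minimal sketch of both approaches. The `date,name,site` record layout and all function names are hypothetical. Schema on write validates every line at load time and rejects bad input. Schema on read stores the raw lines and applies the parser only when a query runs.

```python
from typing import Callable, Dict, Iterable, List, Optional

Row = Dict[str, str]


def parse_visit(line: str) -> Optional[Row]:
    """Hypothetical schema: date,name,site. Returns None when a line does not fit it."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != 3 or not all(fields):
        return None
    return {"date": fields[0], "name": fields[1], "site": fields[2]}


def load_schema_on_write(lines: Iterable[str]) -> List[Row]:
    """Schema on write: every line is parsed and validated before it is stored."""
    table: List[Row] = []
    for line in lines:
        row = parse_visit(line)
        if row is None:
            raise ValueError(f"rejected at load time: {line!r}")
        table.append(row)
    return table


def load_schema_on_read(lines: Iterable[str]) -> List[str]:
    """Schema on read: raw lines are stored untouched."""
    return list(lines)


def query_schema_on_read(raw: Iterable[str],
                         parse: Callable[[str], Optional[Row]],
                         predicate: Callable[[Row], bool]) -> List[Row]:
    """The schema is applied only at query time; lines that do not parse are skipped."""
    results: List[Row] = []
    for line in raw:
        row = parse(line)
        if row is not None and predicate(row):
            results.append(row)
    return results


raw_lines = ["2013-02-12,Dick,Google", "not a visit", "2013-02-13,Jane,Facebook"]
store = load_schema_on_read(raw_lines)
print(query_schema_on_read(store, parse_visit, lambda r: r["site"] == "Google"))
```

The trade-off is visible here. Loading is trivial on the read side, but every query has to pay for parsing and decide what to do with malformed lines.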

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4000 nodes

• 16 PB disk

• 64 TB of RAM

• 32000 cores
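To get a feel for what each machine in that cluster looks like, divide the totals by the node count. The quick calculation below assumes decimal units (1 PB = 1000 TB) and hardware spread evenly across nodes. Real clusters are usually mixed, so these are averages only.

```python
# Cluster totals from the slide.
NODES = 4000
DISK_PB = 16
RAM_TB = 64
CORES = 32000


def per_node(total: float, nodes: int = NODES) -> float:
    """Average share of a cluster-wide total per node (assumes even distribution)."""
    return total / nodes


disk_tb_per_node = per_node(DISK_PB * 1000)  # decimal units: 1 PB = 1000 TB
ram_gb_per_node = per_node(RAM_TB * 1000)    # 1 TB = 1000 GB
cores_per_node = per_node(CORES)

print(f"{disk_tb_per_node} TB disk, {ram_gb_per_node} GB RAM, {cores_per_node} cores per node")
```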

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)

• Map Reduce / YARN

• HBase (Database)

• ZooKeeper (Locking)

• SQOOP (RDBMS loader)

• Hive (Query)

• Pig (Scripting)

• Flume (Log Loader)

• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses Map Reduce (distributed processing), coordinated by the Job Tracker, and HDFS (distributed storage), coordinated by the Name Node and Secondary Name Node; each worker node runs a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; each Node Manager runs containers, one of which hosts the Application Master]

77 Software Group

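Tez¹

¹ Hindi for "fast"

[Diagram: a chain of Map and Reduce stages run as three separate jobs (Job 1, Job 2, Job 3) that read from and write to HDFS, compared with the same stages run as a single Tez job (Job 1)]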

78 Software Group

HBase

A real-time database built on Hadoop

[Diagram: Oracle's buffer cache, redo log buffer, redo logs and datafiles (on ASM disks) compared with HBase's MemStore, write-ahead log (WA Log) and HFiles (on HDFS disks)]

79 Software Group

HBase Data Model

Denormalized:

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide, sparse row per name):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
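The move from the denormalized table to HBase-style wide rows is essentially a pivot. In the Python sketch below, a dict of dicts stands in for an HBase table; this is not the HBase client API, and column families, versions and timestamps are omitted. The function name `to_wide_rows` is hypothetical. Each name becomes one row that stores only the site columns it actually has.

```python
from typing import Dict, Iterable, Tuple

Visit = Tuple[str, str, int]  # (name, site, counter)


def to_wide_rows(visits: Iterable[Visit]) -> Dict[str, Dict[str, int]]:
    """Pivot (name, site, counter) tuples into one sparse row per name.

    Each row stores only the columns (sites) that actually have a value,
    so Dick's and Jane's rows end up with different column sets.
    """
    table: Dict[str, Dict[str, int]] = {}
    for name, site, counter in visits:
        table.setdefault(name, {})[site] = counter
    return table


visits = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]
for name, columns in to_wide_rows(visits).items():
    print(name, columns)
```

Missing cells cost nothing in this layout because they are never stored. That is why a sparse, wide model suits data where each row uses a different subset of many possible columns.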

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala

• MapR Drill

• Aster

• Greenplum (Pivotal HD)

• ParAccel

• Hadapt

• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin vs. SQL or HiveQL

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (alongside YARN and EC2)

• Tachyon: in-memory file system

• Spark: memory-optimized distributed execution

• Spark Streaming

• MLbase / MLlib: machine learning

• Shark (SQL), alongside Map Reduce and Hive (SQL)

• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop $/TB (hardware only): Exadata $4,911; Hadoop $750]
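The $/TB figures translate directly into budgets. The sketch below assumes costs scale linearly with capacity and uses a hypothetical 100 TB data set; it covers hardware only, as on the chart. Software licences, power and administration are not included.

```python
# Hardware-only cost per terabyte, in US dollars, as shown on the chart.
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform: str, terabytes: float) -> float:
    """Hardware-only cost of `terabytes` on `platform`, assuming linear scaling."""
    return COST_PER_TB[platform] * terabytes


def cost_ratio() -> float:
    """How many times more Exadata costs than Hadoop per TB."""
    return COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]


for platform in COST_PER_TB:
    print(platform, hardware_cost(platform, 100))
print(round(cost_ratio(), 2))
```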

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6-core CPU per node (216 cores total)
  – 12x2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"

• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

• Predictive Analytics: programs that extrapolate from existing data into the future

• Big Data Analytics, AKA Data Science
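One of the simplest ways to harness crowd inputs is a "customers who bought this also bought" recommender. It is built purely from co-occurrence counts in other people's shopping baskets. The Python below is a minimal, illustrative sketch of that technique; the basket data and function names are hypothetical. Production recommenders use more refined scoring, such as normalizing for item popularity.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set


def co_occurrence(baskets: Iterable[Set[str]]) -> Dict[str, Counter]:
    """For every item, count how often each other item appears in the same basket."""
    counts: Dict[str, Counter] = {}
    for basket in baskets:
        for a, b in permutations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def recommend(counts: Dict[str, Counter], item: str, k: int = 2) -> List[str]:
    """'Customers who bought this also bought...': the k most frequent companions."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]


baskets = [{"tv", "hdmi cable"}, {"tv", "hdmi cable", "soundbar"},
           {"tv", "soundbar"}, {"tv", "hdmi cable"}]
counts = co_occurrence(baskets)
print(recommend(counts, "tv"))
```

No one programs in the knowledge that TV buyers need cables. The apparent intelligence comes entirely from aggregated customer behaviour.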

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot (x axis 0 to 120, y axis -20 to 120) with a fitted linear trend line $f(x) = 0.971521231456065\,x + 0.71906459527154$]

• Linear regression

• Non-linear (curve fit)

• Multivariate

• Time series

• Logistic regression

• CART
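A trend line like the one on the chart is usually produced by ordinary least squares, the simplest form of linear regression. The Python below is a minimal sketch; `fit_line` and `predict` are hypothetical names, not a library API. It computes the slope and intercept from the means of x and y. It then extrapolates with the fitted line, which is the core idea of predictive analytics.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate with the fitted line."""
    return slope * x + intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept, predict(slope, intercept, 10))
# Using the coefficients of the chart's trend line to forecast x = 100:
print(predict(0.971521231456065, 0.71906459527154, 100))
```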

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Examples: spam detection, churn risk, customer value
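A very small classifier shows the idea of "create a model, then classify new data". The sketch below uses a nearest-centroid classifier, chosen here only because it is short; the slide does not name a specific algorithm. The two spam features (link count and "!" count) and the training data are hypothetical. The model is the average feature vector of each labelled class, and new data gets the label of the closest average. It requires Python 3.8+ for `math.dist`.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def train_centroids(examples: Sequence[Tuple[Vector, str]]) -> Dict[str, Vector]:
    """Model = the average feature vector (centroid) of each labelled class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(total / counts[label] for total in acc) for label, acc in sums.items()}


def classify(centroids: Dict[str, Vector], features: Vector) -> str:
    """Assign new data to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical features per e-mail: (number of links, number of "!" characters).
training = [((8, 5), "spam"), ((10, 7), "spam"), ((0, 0), "ham"), ((1, 1), "ham")]
model = train_centroids(training)
print(classify(model, (7, 4)), classify(model, (1, 0)))
```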

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
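k-means is one common clustering technique, used here as an illustration; the slide does not prescribe an algorithm. It groups points without any labels. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members. The sketch below is minimal: it works on 2-D points and seeds the centroids naively with the first k points. Both are simplifying assumptions; real implementations use smarter seeding such as k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> List[int]:
    """Return a cluster index for each point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids: List[Point] = [points[i] for i in range(k)]  # naive deterministic seeding
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels


points = [(1, 1), (9, 9), (1.5, 2), (8, 8), (1, 0.5), (9, 8.5)]
print(kmeans(points, 2))
```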

121 Software Group

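Supervised Machine Learning

[Diagram: raw data is cleaned and split into a training set and a validation set; a candidate model is built from the training set and checked against the validation set; the validated production model is applied to new data from new and existing business to produce predictions]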
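The supervised workflow (clean, split, train a candidate, validate, promote) can be sketched end to end in a few functions. This sketch makes several assumptions:

- The "model" is a one-feature threshold rule (a decision stump), chosen for brevity.
- The churn data and its feature (e.g. number of support calls) are hypothetical.
- The 0.8 accuracy bar is an arbitrary example value.

A fixed random seed keeps the train/validation split reproducible.

```python
import random
from typing import List, Optional, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, label: 1 = churned, 0 = stayed)


def accuracy(threshold: float, examples: Sequence[Example]) -> float:
    """Share of examples where 'predict 1 if x >= threshold' matches the label."""
    correct = sum(1 for x, y in examples if (x >= threshold) == bool(y))
    return correct / len(examples)


def split(data: Sequence[Example], validation_fraction: float = 0.25,
          seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle reproducibly, then hold back a validation set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_stump(training: Sequence[Example]) -> float:
    """Candidate model: the threshold (a midpoint between observed values) that best fits training data."""
    values = sorted({x for x, _ in training})
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    midpoints = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(midpoints, key=lambda t: accuracy(t, training))


def build_production_model(raw: Sequence[Tuple[Optional[float], int]],
                           min_accuracy: float = 0.8) -> float:
    clean = [(x, y) for x, y in raw if x is not None]  # clean: drop incomplete records
    training, validation = split(clean)
    candidate = train_stump(training)
    if accuracy(candidate, validation) < min_accuracy:
        raise RuntimeError("candidate model failed validation")
    return candidate  # promoted to production


raw = [(float(x), 0) for x in range(10)] + [(float(x), 1) for x in range(20, 30)] + [(None, 1)]
threshold = build_production_model(raw)
print(threshold, [int(calls >= threshold) for calls in (3.0, 25.0)])  # score new data
```

The held-back validation set is what makes the promotion decision meaningful. Accuracy measured on the training data alone would overstate how well the model handles new business.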

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps network visualization (inmaps.linkedin.com)]

123 Software Group

124 Software Group

Big Data Analytics (Data Science)

• Search optimization

• Recommendation systems

• Security: vulnerability, penetration detection

• Fraud detection

• CRM: churn, defaults

• Medical: risk analysis, diagnosis, prognosis

• Game optimization

• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: SharePlex reads redo logs (change data capture) and delivers changes to Hadoop through a JMS queue and Hadoop poster, as batched HDFS file copies, audit change data, or HBase real-time replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Offerings: Boomi, TOAD & SharePlex, Kitenga, TOAD BI, Server and Storage

In order to address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Offerings: Boomi, TOAD & SharePlex, Kitenga, STATISTICA, TOAD BI, Server and Storage

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 20: Thriving and surviving the Big Data revolution

20 Software Group

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 21: Thriving and surviving the Big Data revolution

21 Software Group

Generated internally

Key to operational efficiency

1993

Generated externally

Key to competitive advantage

Source of product innovation

Changing our lives

2013

Data means more

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

33 Software Group

34 Software Group

Why showrooming?

Online advantages: Selection, Stock, Faster, Cheaper

Defences: Dynamic pricing, Predictive ordering, Assortment optimization, Predictive recommendations, Personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality: German

Don't Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

Amazon softcover: $45.99

Amazon Kindle: $39.99

Say "screw you bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

User: "Siri, call me an ambulance"
Siri: "From now on, I'll call you 'An Ambulance'. OK?"

User: "I want to jump off a bridge"
Siri: "I found 14 bridges nearby"

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muse

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

Google Applications
MapReduce, BigTable
Google File System (GFS)

67 Software Group

Map Reduce

Diagram: Start → many Map tasks running in parallel → Reduce
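The map/shuffle/reduce flow in the diagram can be sketched in a few lines of Python. This is a minimal, single-process illustration assuming the classic word-count task; the function names are hypothetical and this is not Hadoop's API. In Hadoop the map calls would run in parallel on many nodes and the shuffle would move data across the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # Sequential stand-in for many parallel mappers feeding one reduce step.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data big deal", "data means more"]))
    # {'big': 2, 'data': 2, 'deal': 1, 'means': 1, 'more': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be split across machines without coordination, which is what makes the model scale.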

68 Software Group

Multi-stage Map-Reduce

Diagram: HDFS → mappers (scan/sort) → mappers (aggregate) → reduce → client
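A multi-stage job simply feeds the output of one MapReduce stage into the next. The sketch below assumes a hypothetical "user site" log format: stage 1 counts visits per site, and stage 2 routes every count to a single key so one reducer can rank them. It is a single-process illustration, not how Hadoop chains jobs internally.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One MapReduce stage: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Stage 1: count visits per site. Input lines are assumed to look like "user site".
def count_mapper(line):
    _user, site = line.split()
    yield site, 1


def count_reducer(site, ones):
    yield site, sum(ones)


# Stage 2: send every (site, count) pair to one key so a single reducer can rank them.
def rank_mapper(pair):
    yield "all", pair


def make_rank_reducer(n):
    def rank_reducer(_key, pairs):
        # Highest count first; ties broken alphabetically so output is deterministic.
        yield from sorted(pairs, key=lambda p: (-p[1], p[0]))[:n]
    return rank_reducer


def top_sites(lines, n):
    counts = run_stage(lines, count_mapper, count_reducer)       # stage 1 output...
    return run_stage(counts, rank_mapper, make_rank_reducer(n))  # ...is stage 2 input


if __name__ == "__main__":
    log = ["dick google", "jane google", "dick ebay",
           "jane facebook", "dick facebook", "jane google"]
    print(top_sites(log, 2))  # [('google', 3), ('facebook', 2)]
```

Funnelling everything to one key in stage 2 is only reasonable because stage 1 has already shrunk the data to one row per site.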

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform (cleanse, normalize, aggregate), Load → Data Warehouse → Code → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Code, Cleanse → Analyse → Utilize
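The difference between the two approaches is *when* structure is imposed. The sketch below is illustrative only: the CSV sample, the `Sale` record and both function names are hypothetical. The schema-on-write loader validates and rejects rows before storing anything; the schema-on-read query keeps the raw text and interprets it only when asked a question.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sale:
    store: str
    amount: float


# Hypothetical raw input with one malformed amount on purpose.
RAW = "store,amount\nNYC,10.5\nLA,oops\nNYC,4.5\n"


def load_schema_on_write(raw):
    """Warehouse style: parse and validate every row at load time.
    Rows that do not fit the schema are rejected before anything is stored."""
    stored, rejected = [], 0
    for rec in csv.DictReader(io.StringIO(raw)):
        try:
            stored.append(Sale(rec["store"], float(rec["amount"])))
        except ValueError:
            rejected += 1
    return stored, rejected


def query_schema_on_read(raw, store):
    """Hadoop style: the raw text is kept as-is and the query applies structure.
    This particular query chooses to skip values it cannot parse."""
    total = 0.0
    for rec in csv.DictReader(io.StringIO(raw)):
        if rec["store"] != store:
            continue
        try:
            total += float(rec["amount"])
        except ValueError:
            continue
    return total


if __name__ == "__main__":
    rows, bad = load_schema_on_write(RAW)
    print(len(rows), bad)                     # 2 1
    print(query_schema_on_read(RAW, "NYC"))   # 15.0
```

Schema on write pays the cleansing cost once, up front; schema on read defers it, so each query decides how to treat messy data.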

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

Diagram: a Hadoop client (Java, Pig, Hive) submits work to MapReduce (distributed processing), coordinated by a JobTracker, and stores data in HDFS (distributed storage), coordinated by a NameNode and a Secondary NameNode. Each worker node runs both a DataNode and a TaskTracker.

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

Diagram: a Hadoop client (Java, Pig, Hive) submits jobs to the Resource Manager. Each node runs a Node Manager that hosts containers; one container runs the job's Application Master.

77 Software Group

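Tez¹

Diagram: three chained MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing back to HDFS, compared with the same map and reduce steps run as a single Tez job (Job 1) between one HDFS read and one HDFS write.

¹ Hindi for "fast"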

78 Software Group

HBase

A real-time database built on Hadoop

Diagram: Oracle storage (buffer cache and redo log buffer in memory; datafiles and redo logs on ASM disks) compared with HBase (MemStore in memory; HFiles and the write-ahead log (WAL) stored in HDFS on disks).
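The HBase side of the diagram follows a log-then-memory-then-flush write path. Below is a deliberately tiny, in-memory sketch of that idea; the class and method names are hypothetical, there is no HDFS involved, and real HBase adds region servers, compactions, block indexes and WAL archiving. It only shows why the WAL exists (crash recovery) and why reads check the MemStore before the flushed files.

```python
from typing import Optional


class TinyRegionStore:
    """Hypothetical in-memory model of an HBase-style write path:
    append to a write-ahead log, update the MemStore, flush to sorted immutable files."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: list[tuple[str, str]] = []           # stands in for the WAL kept on HDFS
        self.memstore: dict[str, str] = {}             # recent writes held in memory
        self.hfiles: list[list[tuple[str, str]]] = []  # flushed sorted files, oldest first
        self.flush_threshold = flush_threshold

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # 1. log the edit first so it survives a crash
        self.memstore[row] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. write the MemStore out as a new sorted, immutable "HFile"
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()
        self.wal.clear()  # simplification: flushed edits no longer need replaying

    def get(self, row: str) -> Optional[str]:
        if row in self.memstore:             # newest data lives in memory
            return self.memstore[row]
        for hfile in reversed(self.hfiles):  # then search newest file first
            for key, value in hfile:
                if key == row:
                    return value
        return None

    def recover(self) -> None:
        """Rebuild the MemStore by replaying the WAL after a simulated crash."""
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value
```

The roles loosely mirror the Oracle side of the slide: the WAL plays the part of the redo log, the MemStore the buffer cache, and HFiles the datafiles.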

79 Software Group

HBase Data Model

Denormalized (flat) table:

Name | Site | Counter
Dick | Ebay | 507018
Dick | Google | 690414
Jane | Google | 716426
Dick | Facebook | 723649
Jane | Facebook | 643261
Jane | ILoveLarry.com | 856767
Dick | MadBillFans.com | 675230

Normalized (relational) tables:

NameId | Name
1 | Dick
2 | Jane

SiteId | SiteName
1 | Ebay
2 | Google
3 | Facebook
4 | ILoveLarry.com
5 | MadBillFans.com

NameId | SiteId | Counter
1 | 1 | 507018
1 | 2 | 690414
2 | 2 | 716426
1 | 3 | 723649
2 | 3 | 643261
2 | 4 | 856767
1 | 5 | 675230

HBase (one wide, sparse row per name; each row holds only the columns it uses):

Id | Name | Ebay | Google | Facebook | (other columns) | MadBillFans.com
1 | Dick | 507018 | 690414 | 723649 | | 675230

Id | Name | Google | Facebook | (other columns) | ILoveLarry.com
2 | Jane | 716426 | 643261 | | 856767
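Going from the flat table to the HBase layout is a pivot: the row key stays as the row, and each distinct site becomes a column in that row. The sketch below uses the slide's data; the function names are hypothetical, and it ignores HBase column families and timestamps (in real HBase a column would be qualified by a family, e.g. something like `site:Ebay`).

```python
from collections import defaultdict

# (Name, Site, Counter) rows copied from the flat table on the slide.
FLAT_ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse wide row per key.
    Each site becomes a column name; a row stores only the columns it actually has."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)


def get_cell(table, row_key, column):
    """Look up one cell; a column a row never had simply returns None (nothing is stored)."""
    return table.get(row_key, {}).get(column)


if __name__ == "__main__":
    wide = to_wide_rows(FLAT_ROWS)
    print(wide["Dick"])
    print(get_cell(wide, "Jane", "Ebay"))  # None
```

The payoff is that all of one user's data sits in a single row that can be fetched by key, with no join against the name and site lookup tables.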

80 Software Group

Hive

81 Software Group

82 Software Group

Diagram: SQL → Java → Results

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin compared with SQL or HiveQL

85 Software Group

Flume and SQOOP

Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (alongside YARN and EC2)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL), alongside Hadoop MapReduce and Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Chart: Exadata vs Hadoop cost per TB (hardware only): Exadata $4,911/TB, Hadoop $750/TB
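A quick calculation puts the chart's two figures in proportion. The per-TB prices come from the slide; the 100 TB capacity is a hypothetical example, and the comparison covers hardware only (no software licences, support or replication overhead).

```python
EXADATA_USD_PER_TB = 4911  # hardware only, from the chart
HADOOP_USD_PER_TB = 750    # hardware only, from the chart


def hardware_cost(capacity_tb, usd_per_tb):
    """Hardware cost for a given capacity at a flat per-TB price."""
    return capacity_tb * usd_per_tb


def cost_ratio():
    """How many times more Exadata costs per TB than Hadoop hardware."""
    return EXADATA_USD_PER_TB / HADOOP_USD_PER_TB


if __name__ == "__main__":
    capacity = 100  # hypothetical capacity in TB
    print(hardware_cost(capacity, EXADATA_USD_PER_TB))  # 491100
    print(hardware_cost(capacity, HADOOP_USD_PER_TB))   # 75000
    print(round(cost_ratio(), 1))                       # 6.5
```

On these numbers Exadata hardware costs roughly 6.5 times as much per TB.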

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
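Multiplying out the per-node figures gives the cluster totals. The hardware numbers are taken from the slide; the replication factor of 3 is an assumption (it is HDFS's usual default), used only to show a rough upper bound on how much unique data the raw disk could hold.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6   # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2
HDFS_REPLICATION = 3     # assumption: HDFS default replication factor


def appliance_totals():
    """Cluster-wide totals derived from the per-node specification."""
    spindles = NODES * DISKS_PER_NODE
    raw_tb = spindles * TB_PER_DISK
    return {
        "ram_gb": NODES * RAM_GB_PER_NODE,
        "cores": NODES * CORES_PER_NODE,
        "spindles": spindles,
        "raw_tb": raw_tb,
        # Rough upper bound on unique data if every block is stored HDFS_REPLICATION times.
        "usable_tb_after_replication": raw_tb / HDFS_REPLICATION,
    }


if __name__ == "__main__":
    for name, value in appliance_totals().items():
        print(name, value)
```

With 18 nodes × 12 disks × 2 TB, raw capacity comes to 432 TB, or about 144 TB of unique data at three-way replication (before file-system and temporary-space overheads).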

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

Machine Learning: programs that evolve with "experience"

Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: scatter plot with a fitted regression line, $f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
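The fitted line on the chart is the kind of result ordinary least-squares linear regression produces. Below is a minimal, dependency-free sketch; the function names and the four sample points are hypothetical, chosen so the fit is exact and easy to check. It is not necessarily the tool used to draw the chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate: apply the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [1, 3, 5, 7]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept)               # 2.0 1.0
    print(predict(slope, intercept, 10))  # 21.0
```

The slope is the covariance of x and y divided by the variance of x; predictive use is simply evaluating the line at x values beyond the observed data.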

119 Software Group

Classification

• Create a model that identifies/classifies new data
• For example: spam detection, churn risk, customer value
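One of the simplest classifiers is nearest centroid: average the feature vectors of each known class, then give a new record the label of the closest average. The sketch below assumes hypothetical churn-risk features (support calls per month, months since last purchase) and made-up training data; it only illustrates the train-then-classify idea. Production systems would typically use richer models such as logistic regression or CART.

```python
from math import dist


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features: (support calls per month, months since last purchase)
    training = [((5, 10), "churn"), ((6, 12), "churn"),
                ((1, 1), "stay"), ((0, 2), "stay")]
    model = train_centroids(training)
    print(classify(model, (4, 9)))  # churn
    print(classify(model, (1, 0)))  # stay
```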

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
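k-means is one widely used clustering algorithm. It alternates between assigning each point to its nearest centre and moving each centre to the mean of its points. The sketch below is illustrative: the function names and the two-feature sample points (think of something like basket size and visit frequency) are hypothetical. Basket analysis in practice often uses association rules rather than k-means.

```python
import random
from math import dist


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda i: dist(p, centers[i])) for p in points]


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: no labels are needed, groups emerge from the data."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        labels = assign(points, centers)
        new_centers = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Move the center to the mean of its members, axis by axis.
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's old center
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (2, 1), (8, 8), (9, 8.5), (8.5, 9)]
    centers, labels = k_means(pts, k=2)
    print(labels)
```

The result depends on the starting centres, which is why the sketch fixes a random seed; real tools usually run several restarts.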

121 Software Group

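Supervised Machine Learning

Diagram: raw data from existing business is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is validated against the validation set and promoted to the production model. New data from new business is fed to the production model to produce a prediction.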
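The pipeline in the diagram (split, train a candidate, validate, promote, predict) can be sketched end to end with a one-feature decision stump as the model. Everything here is hypothetical: the function names, the 0.9 promotion threshold and the synthetic labelled data. The point is the workflow, not the model.

```python
import random


def split(rows, validation_fraction=0.25, seed=0):
    """Shuffle labelled rows, then hold out a validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def predict(threshold, x):
    return 1 if x >= threshold else 0


def accuracy(rows, threshold):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)


def train_threshold(training):
    """Candidate model: the cut point on x that best separates label 1 from label 0."""
    xs = sorted({x for x, _ in training})
    # Try midpoints between neighbouring values, plus "always 1" and "always 0".
    candidates = [float("-inf")] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [float("inf")]
    return max(candidates, key=lambda t: accuracy(training, t))


def run_pipeline(labelled, new_data, min_accuracy=0.9):
    training, validation = split(labelled)
    threshold = train_threshold(training)       # candidate model
    val_acc = accuracy(validation, threshold)   # validate before promoting
    if val_acc < min_accuracy:
        raise RuntimeError(f"candidate rejected: validation accuracy {val_acc:.2f}")
    # Promoted production model scores new data.
    return threshold, val_acc, [predict(threshold, x) for x in new_data]


if __name__ == "__main__":
    labelled = [(x, 0) for x in range(6)] + [(x, 1) for x in range(10, 16)]
    print(run_pipeline(labelled, [1, 12]))
```

Holding the validation set out of training is what makes the accuracy check an honest estimate of how the model will behave on new business.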

122 Software Group

Unsupervised learning

Example: LinkedIn InMaps (inmaps.linkedin.com)

123 Software Group

124 Software Group

Big Data Analytics

Data Science applications:

• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram: redo logs → change data capture → JMS queue → Hadoop poster, which delivers batched HDFS file copies, audit change data, and HBase real-time replication.

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, and now STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics skills
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 22: Thriving and surviving the Big Data revolution

22 Software Group

Big Data is the culmination of cloud social and mobile

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 23: Thriving and surviving the Big Data revolution

23 Software Group

Not all upside

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human
• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

(Diagram: Google Applications run on MapReduce and BigTable, which sit on the Google File System (GFS))

67 Software Group

Map Reduce

(Diagram: from Start, many Map tasks run in parallel and feed into Reduce)
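The data flow in the diagram can be shown with a minimal, single-process Python sketch of the classic word-count job. Each input string stands in for one input split. The map function emits (word, 1) pairs, a shuffle step groups those pairs by key, and the reduce function sums each group. The function names are hypothetical. A real Hadoop job spreads these phases across many nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(text):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in text.split()]

def shuffle(pairs):
    """Shuffle: group every emitted value by its key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all the values emitted for one key."""
    return key, sum(values)

def map_reduce(splits):
    # Each string stands in for one input split handled by its own mapper.
    mapped = chain.from_iterable(map_phase(text) for text in splits)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data beats opinions", "big wins"]))
```

Because the mappers never share state, each one can run on the node that already holds its split. Only the shuffle moves data across the network.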

68 Software Group

Multi-stage Map-Reduce

(Diagram: HDFS → mappers (scan, sort) → mappers (aggregate) → reduce → client)
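A multi-stage job feeds the output of one stage's reducers into the next stage's mappers. The sketch below is a hypothetical single-process illustration and does not use Hadoop's API. Stage 1 counts hits per site from made-up "name site" log lines. Stage 2 then re-keys those counts under one key so that a single reducer can rank the top sites.

```python
from collections import defaultdict

# Hypothetical raw log lines: "<name> <site>".
LOG = ["dick google", "jane google", "dick ebay", "jane facebook", "dick facebook", "dick google"]

def run_stage(records, mapper, reducer):
    """One map -> shuffle -> reduce stage; returns the reducers' combined output."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # reducers see keys in sorted order
        output.extend(reducer(key, groups[key]))
    return output

# Stage 1: count hits per site.
def hits_mapper(line):
    _name, site = line.split()
    yield site, 1

def hits_reducer(site, ones):
    yield site, sum(ones)

# Stage 2: send every (site, count) pair to one key so a single reducer can rank them.
def rank_mapper(site_count):
    yield "all", site_count

def top_n_reducer(_key, site_counts, n=2):
    ranked = sorted(site_counts, key=lambda sc: (-sc[1], sc[0]))
    yield from ranked[:n]

if __name__ == "__main__":
    stage1 = run_stage(LOG, hits_mapper, hits_reducer)  # in classic MapReduce this lands in HDFS
    stage2 = run_stage(stage1, rank_mapper, top_n_reducer)
    print(stage1)
    print(stage2)
```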

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform (cleanse, normalize, aggregate), Load → Data Warehouse → Code → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Code (cleanse) → Analyse → Utilize
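The two approaches differ mainly in when structure is imposed on the data. In the hypothetical sketch below, the schema-on-write loader validates and type-converts every record before storing it and rejects anything that doesn't fit. The schema-on-read query keeps the raw lines as they are and applies a field list only when the query runs. The JSON events and field names are made up for illustration.

```python
import json

# Hypothetical raw events as they might arrive in a log.
RAW_EVENTS = [
    '{"user": "dick", "site": "google", "ms": "120"}',
    '{"user": "jane", "site": "facebook"}',
    'not even json',
]

def load_schema_on_write(raw_lines):
    """Schema on write: validate and convert every record at load time; reject misfits."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            rec = json.loads(line)
            table.append({"user": str(rec["user"]), "site": str(rec["site"]), "ms": int(rec["ms"])})
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return table, rejected

def query_schema_on_read(raw_lines, fields):
    """Schema on read: store lines untouched; project the wanted fields only at query time."""
    for line in raw_lines:
        try:
            rec = json.loads(line)
        except ValueError:
            continue  # this particular query skips lines it cannot parse
        if isinstance(rec, dict):
            yield {field: rec.get(field) for field in fields}

if __name__ == "__main__":
    print(load_schema_on_write(RAW_EVENTS))
    print(list(query_schema_on_read(RAW_EVENTS, ["user", "ms"])))
```

The loader loses the incomplete event for good. The raw-read query still sees that event and can decide later how to treat the missing field.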

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

(Diagram: a Hadoop client (Java, Pig, Hive) works with the Job Tracker, which coordinates MapReduce (distributed processing), and with the Name Node and Secondary Name Node, which manage HDFS (distributed storage); every worker node runs both a Data Node and a Task Tracker)

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

(Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager, which allocates containers on Node Managers; an Application Master runs in one of those containers)

77 Software Group

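Tez¹

(Diagram: on MapReduce, a query runs as a chain of jobs (Job 1, Job 2, Job 3), each writing its output to HDFS before the next starts; on Tez, the same map and reduce steps run as a single job (Job 1) between HDFS input and output)

¹ Hindi for “fast”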

78 Software Group

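HBase

A real-time database built on Hadoop

(Diagram: Oracle, with tables cached in the buffer cache, a log buffer written to redo logs, and datafiles on ASM disks, compared with HBase, with tables cached in the MemStore, a write-ahead (WA) log, and HFiles on HDFS disks)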

79 Software Group

HBase Data Model

Flat table:
Name   Site              Counter
Dick   Ebay              507018
Dick   Google            690414
Jane   Google            716426
Dick   Facebook          723649
Jane   Facebook          643261
Jane   ILoveLarry.com    856767
Dick   MadBillFans.com   675230

Normalized relational tables:
NameId   Name
1        Dick
2        Jane

SiteId   SiteName
1        Ebay
2        Google
3        Facebook
4        ILoveLarry.com
5        MadBillFans.com

NameId   SiteId   Counter
1        1        507018
1        2        690414
2        2        716426
1        3        723649
2        3        643261
2        4        856767
1        5        675230

HBase (one wide, sparse row per name; each site is a column):
Id   Name   Ebay     Google   Facebook   (other columns)   MadBillFans.com
1    Dick   507018   690414   723649                       675230

Id   Name   Google   Facebook   (other columns)   ILoveLarry.com
2    Jane   716426   643261                       856767
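The reshaping on this slide can be sketched with an in-memory nested dictionary that maps row key → column → value. A column exists only for the sites a person actually visited. This mimics HBase's sparse, wide-row layout but is not the HBase client API. Real HBase also groups columns into column families, which this sketch leaves out.

```python
from collections import defaultdict

# (name, site, counter) rows from the flat table on the slide.
ROWS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot flat rows into row key -> {column: value}; absent sites simply have no column."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)

if __name__ == "__main__":
    wide = to_wide_rows(ROWS)
    for row_key, columns in wide.items():
        print(row_key, columns)
```

Jane's row has no Ebay column at all, so the sparse layout stores nothing for it. The normalized version needs a join to bring the same facts together.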

80 Software Group

Hive

81 Software Group

82 Software Group

(Diagram: SQL → Java → Results)

83 Software Group

Other SQL-like Hadoop Interfaces
• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

(Diagram: FLUME loads WebLogs into HDFS; SQOOP loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS)

86 Software Group

Berkeley Data Analytic Stack (BDAS)
• Cluster management: Mesos (heterogeneous cluster manager), YARN, EC2
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL), BlinkDB
• Also shown: MapReduce, Hive (SQL)

87 Software Group

Meanwhile, back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)
• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop, $/TB (hardware only):
• Exadata: $4,911
• Hadoop: $750
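The chart's two figures support a quick back-of-envelope comparison. This sketch uses only the per-terabyte hardware prices from the slide. The 100 TB volume is a hypothetical example. Like the slide's comparison, the sketch ignores software, support, and operating costs.

```python
# Hardware-only prices per terabyte, as shown on the slide.
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750

def hardware_cost(terabytes, usd_per_tb):
    """Raw hardware cost of storing a given volume at a flat per-TB price."""
    return terabytes * usd_per_tb

if __name__ == "__main__":
    volume_tb = 100  # hypothetical data volume
    exadata = hardware_cost(volume_tb, EXADATA_USD_PER_TB)
    hadoop = hardware_cost(volume_tb, HADOOP_USD_PER_TB)
    print(f"Exadata ${exadata:,}  Hadoop ${hadoop:,}  ratio {exadata / hadoop:.1f}x")
```

Whatever the volume, the ratio stays at 4911 / 750, so Exadata hardware costs roughly six and a half times as much per terabyte.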

93 Software Group

Oracle Big Data Appliance
• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
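Multiplying out the per-node figures gives a quick consistency check on this slide. The sketch uses only numbers from the bullet list. It reproduces the 864 GB of RAM, the 216 cores, and the 216 spindles. However, 216 disks of 2 TB each come to 432 TB of raw capacity, not the 864 TB stated. Either the per-disk size or the total on the slide appears to be off.

```python
# Per-node figures from the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6  # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

def appliance_totals(nodes=NODES):
    """Multiply the per-node figures out to rack-level totals."""
    spindles = nodes * DISKS_PER_NODE
    return {
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "spindles": spindles,
        "raw_tb": spindles * TB_PER_DISK,
    }

if __name__ == "__main__":
    print(appliance_totals())
```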

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”
• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

(Chart: data points with a fitted trend line f(x) = 0.971521231456065x + 0.71906459527154; x axis 0 to 120, y axis −20 to 120)

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
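A trend line like the one on this slide is what an ordinary least-squares fit produces. The chart's underlying data points are not in the transcript, so this minimal sketch fits made-up points instead. The slope and intercept use the standard closed-form least-squares estimates for a single predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept) of y = slope*x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(slope, intercept, x):
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept

if __name__ == "__main__":
    # Made-up observations; the chart's real data points are not available.
    xs = [10, 20, 30, 40, 50]
    ys = [12, 19, 31, 42, 48]
    slope, intercept = fit_line(xs, ys)
    print(f"f(x) = {slope:.4f}x + {intercept:.4f}; f(100) = {predict(slope, intercept, 100):.1f}")
```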

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
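One of the simplest ways to "create a model that classifies new data" is a nearest-centroid classifier. It averages the feature vectors of each labelled class, then gives a new item the label of the closest average. The slide does not name this technique; it is swapped in here for illustration. The two spam features (links per message, share of capitalised words) and the training examples are hypothetical.

```python
from math import dist

# Hypothetical labelled messages: (links per message, share of words in capitals) -> label.
TRAINING = [
    ([8, 0.6], "spam"),
    ([10, 0.8], "spam"),
    ([1, 0.1], "ham"),
    ([0, 0.05], "ham"),
]

def train_centroids(examples):
    """Average the feature vectors of each label into one centroid per class."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def classify(centroids, features):
    """Give new data the label of the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(classify(model, [7, 0.5]), classify(model, [1, 0.0]))
```

In practice the features would be scaled first, because raw distances let the larger-valued feature dominate.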

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
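k-means is a common way to group data that has no predefined classes. This minimal sketch seeds the centroids with the first k points; real implementations usually use random or k-means++ seeding instead. It then alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The basket features (spend, item count) are made up for illustration.

```python
from math import dist

# Hypothetical shopping baskets described by (spend in $, number of items).
BASKETS = [(10, 1), (12, 2), (11, 1), (80, 9), (85, 10), (78, 8)]

def k_means(points, k, iterations=20):
    """Basic k-means: seed with the first k points, then alternate assign and update steps."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            [sum(axis) / len(c) for axis in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, clusters

if __name__ == "__main__":
    centres, groups = k_means(BASKETS, k=2)
    for centre, members in zip(centres, groups):
        print(centre, members)
```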

121 Software Group

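Supervised Machine Learning

(Diagram: raw data from existing business is cleaned and split into a training set and a validation set; a candidate model is built from the training set and validated against the validation set to become the production model, which turns new data from new business into predictions)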
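The pipeline in the diagram has five steps: split cleaned data into training and validation sets, train a candidate, validate it, promote it to production, and score new data. Each step can be sketched in a few functions. In this hypothetical example the "model" is just a cutoff on one feature. The churn history is invented, and the 0.6 accuracy bar for promotion is an arbitrary assumption.

```python
import random

# Hypothetical cleaned history: (support calls last quarter, did the customer churn?).
HISTORY = [(calls, calls >= 5) for calls in range(12)]

def accuracy(threshold, rows):
    """Share of (feature, label) rows where 'feature >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)

def split(rows, validation_fraction=0.25, seed=7):
    """Shuffle reproducibly, then hold part of the data back as a validation set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_candidate(training):
    """Candidate model: the cutoff, taken from training values, with the best training accuracy."""
    cutoffs = sorted({x for x, _ in training})
    return max(cutoffs, key=lambda t: accuracy(t, training))

def build_production_model(rows, min_accuracy=0.6):
    """Train on one part, validate on the other, and promote only if validation passes."""
    training, validation = split(rows)
    candidate = train_candidate(training)
    return candidate if accuracy(candidate, validation) >= min_accuracy else None

if __name__ == "__main__":
    model = build_production_model(HISTORY)
    if model is None:
        print("candidate rejected; keep the existing model")
    else:
        new_customers = [2, 9]
        print([calls >= model for calls in new_customers])
```

The validation set never influences the choice of cutoff. That separation is what lets its accuracy stand in for how the model will do on new business.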

122 Software Group

Unsupervised learning

(Example: LinkedIn InMaps, inmaps.linkedin.com)

123 Software Group

124 Software Group

Big Data Analytics / Data Science
• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard
• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

(Diagram: change data capture from redo logs passes through a JMS queue to the Hadoop Poster, which delivers batched HDFS file copies, audit change data, or real-time HBase replication)

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Key components to build end-to-end BI/Analytics solutions

Dell’s offering was not complete…

(Diagram: Data Integration, Database Management, Advanced Analytics, Business Intelligence, and Server and Storage, with Dell’s TOAD & SharePlex, TOAD BI, Boomi, and Kitenga placed against them)

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions

(Diagram: Data Integration, Database Management, Advanced Analytics, Business Intelligence, and Server and Storage, now with STATISTICA added alongside TOAD & SharePlex, TOAD BI, Boomi, and Kitenga)

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 24: Thriving and surviving the Big Data revolution

24 Software Group

Will Big Data kill retail

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 25: Thriving and surviving the Big Data revolution

25 Software Group

Prevalence of Showrooming

Consumer Electronics

Home Improvement

0 10 20 30 40 50 60 70

Pct

Garter Research G00249458Survey Analysis Focus on Customer Basics to Challenge Amazon as Showrooming Is Universal but Not UnbeatablePublished 12 February 2013

26 Software Group

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• Map Reduce
• YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a HADOOP CLIENT (Java, Pig, Hive) works with the JOB TRACKER, which coordinates MAP REDUCE (distributed processing), and with the NAME NODE (plus a SECONDARY NAME NODE), which manages HDFS (distributed storage); each worker machine in the cluster runs a DATA NODE and a TASK TRACKER]
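
In this architecture the NAME NODE keeps only metadata (which DATA NODEs hold each block of each file), while the DATA NODEs store the blocks themselves. The toy Python model below illustrates that bookkeeping. place_blocks is hypothetical, the 64 MB block size and replication factor of 3 are assumed defaults, and the round-robin placement is a simplification of HDFS's real rack-aware policy.

```python
import math
from typing import Dict, List

BLOCK_SIZE_MB = 64  # assumed: the Hadoop 1.x default HDFS block size
REPLICATION = 3     # assumed: the default HDFS replication factor


def place_blocks(file_size_mb: int, datanodes: List[str]) -> Dict[int, List[str]]:
    """Toy NameNode bookkeeping: split a file into fixed-size blocks and record
    which DataNodes hold each replica.

    Round-robin placement is used only to keep the example short; real HDFS
    uses a rack-aware placement policy.
    """
    if len(datanodes) < REPLICATION:
        raise ValueError("need at least REPLICATION distinct DataNodes")
    n_blocks = math.ceil(file_size_mb / BLOCK_SIZE_MB)
    return {
        block: [datanodes[(block + r) % len(datanodes)] for r in range(REPLICATION)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4"]
    for block, replicas in place_blocks(200, nodes).items():
        print(block, replicas)  # a 200 MB file needs 4 blocks of up to 64 MB
```

Keeping several replicas on different nodes is what lets a TASK TRACKER usually find a local copy of the block its map task needs, and lets the cluster survive the loss of a DATA NODE.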

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a HADOOP CLIENT (Java, Pig, Hive) submits work to the RESOURCE MANAGER; each worker runs a NODE MANAGER that hosts CONTAINERs, and an APPLICATION MASTER manages the application's work]

77 Software Group

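Tez¹

¹ Hindi for "fast"

[Diagram: a pipeline run as three chained jobs (Job 1, Job 2, Job 3), each a MAP and REDUCE pass reading from and writing to HDFS, compared with the same MAP and REDUCE steps run as a single Tez job (Job 1) between HDFS input and output]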

78 Software Group

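HBase

A real-time database built on Hadoop

[Diagram comparing Oracle with HBase: in Oracle, tables are cached in the Buffer Cache and stored in Datafiles on ASM-managed disks, with changes written through the Log Buffer to Redo; in HBase, tables are cached in the MemStore and stored as HFiles in HDFS on disk, with changes written to a write-ahead log (WA Log)]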
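
As the side-by-side diagram suggests, HBase's MemStore plays a role similar to Oracle's buffer cache, and its write-ahead log a role similar to the redo log. The toy class below sketches that write path in Python under heavy simplifying assumptions: ToyRegion is hypothetical, and compactions, column families, timestamps, region servers and WAL cleanup are all omitted.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Greatly simplified model of an HBase-style write path (not the HBase API)."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.flush_threshold = flush_threshold
        self.wal: List[Tuple[str, str]] = []           # write-ahead log (kept on HDFS in HBase)
        self.memstore: Dict[str, str] = {}             # recent writes, held in memory
        self.hfiles: List[List[Tuple[str, str]]] = []  # immutable, key-sorted files

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. log the edit first, for crash recovery
        self.memstore[key] = value     # 2. then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new sorted 'HFile' and start a fresh one."""
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key: str) -> Optional[str]:
        """Check the MemStore first, then the HFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, value in hfile:
                if stored_key == key:
                    return value
        return None


if __name__ == "__main__":
    region = ToyRegion(flush_threshold=2)
    region.put("dick", "1")
    region.put("jane", "2")  # MemStore is full: flushed to the first HFile
    region.put("dick", "3")  # the newer value stays in the MemStore
    print(region.get("dick"), region.get("jane"), len(region.hfiles))  # 3 2 1
```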

79 Software Group

HBase Data Model

Single table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase rows (each row stores only the site columns it has):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
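
The pivot from the single Name/Site/Counter table into one wide row per person can be sketched with a dictionary of dictionaries. This is an illustration only: to_wide_rows is hypothetical, and real HBase stores byte-array row keys, column families and timestamped cell versions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The Name/Site/Counter rows listed on the slide.
ROWS: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Pivot (name, site, counter) triples into one sparse row per name,
    keyed by site, so each row stores only the columns it actually has."""
    wide: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in rows:
        wide[name][site] = counter
    return dict(wide)


if __name__ == "__main__":
    for name, columns in to_wide_rows(ROWS).items():
        print(name, columns)
```

Unlike a relational table, which would need a column for every site, each row here carries only the sites that person visited; a lookup for one person is a single row read rather than a join.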

80 Software Group

Hive

82 Software Group

[Diagram: SQL → JAVA → RESULTS]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (YARN and EC2 also shown)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – machine learning
• Shark (SQL)
• BlinkDB
• Map Reduce with Hive (SQL)

87 Software Group

Meanwhile back at the Death Star

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores; 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop: $/TB (hardware only)
• Exadata: $4,911
• Hadoop: $750
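
At the charted hardware-only prices, Exadata costs roughly 6.5 times as much per terabyte as Hadoop. The short sketch below just does that arithmetic; the 100 TB data volume is a hypothetical example.

```python
# $/TB figures as charted on the slide (hardware only).
EXADATA_DOLLARS_PER_TB = 4911
HADOOP_DOLLARS_PER_TB = 750


def hardware_cost(terabytes: int, dollars_per_tb: int) -> int:
    """Hardware-only cost of storing `terabytes` at a given $/TB rate."""
    return terabytes * dollars_per_tb


if __name__ == "__main__":
    volume_tb = 100  # hypothetical data volume
    print(hardware_cost(volume_tb, EXADATA_DOLLARS_PER_TB))          # 491100
    print(hardware_cost(volume_tb, HADOOP_DOLLARS_PER_TB))           # 75000
    print(round(EXADATA_DOLLARS_PER_TB / HADOOP_DOLLARS_PER_TB, 2))  # 6.55
```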

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2x6-core CPUs per node (216 cores total)
  – 12x2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

105 Software Group

Google Flu Trends

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

112 Software Group

Artificial Intelligence Strikes back

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: sample data with a fitted linear trend line; x axis 0 to 120, y axis -20 to 120]

f(x) = 0.971521231456065x + 0.71906459527154

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
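
The fitted line on the slide has the form f(x) = slope · x + intercept. Ordinary least squares (OLS) is the usual way such a trend line is computed, though the slide does not say which method produced it. The pure-Python sketch below assumes OLS and uses hypothetical data points.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of matching length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted trend to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical observations, roughly following y = x.
    slope, intercept = fit_line([0, 10, 20, 30], [1, 11, 19, 31])
    print(f"f(x) = {slope:.4f}x + {intercept:.4f}")  # f(x) = 0.9800x + 0.8000
    print(predict(slope, intercept, 120))            # extrapolation beyond the data
```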

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
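
Many techniques can build such a model. The sketch below uses one of the simplest, a nearest-centroid classifier, chosen here only for brevity; the churn features, labels and training data are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def train_centroids(examples: Dict[str, List[Vector]]) -> Dict[str, List[float]]:
    """Training: average each label's example vectors into one centroid."""
    centroids: Dict[str, List[float]] = {}
    for label, vectors in examples.items():
        dims = len(vectors[0])
        centroids[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]
    return centroids


def classify(centroids: Dict[str, List[float]], item: Vector) -> str:
    """Prediction: the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], item))


if __name__ == "__main__":
    # Hypothetical features: (support calls per month, months since last purchase)
    model = train_centroids({"churn": [[5, 6], [4, 8]], "stay": [[0, 1], [1, 2]]})
    print(classify(model, [3, 7]), classify(model, [1, 1]))  # churn stay
```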

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
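
k-means is one widely used clustering algorithm (the slide does not name one). This pure-Python sketch groups hypothetical basket summaries without any labels. The fixed starting centroids keep the result deterministic, whereas real implementations usually choose them randomly or with k-means++.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical basket summaries: (items in basket, average item price)
POINTS: List[Point] = [(1, 50), (2, 45), (1, 55), (20, 3), (22, 4), (19, 5)]


def kmeans(points: Sequence[Point], initial: Sequence[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means: assign each point to its nearest centroid, then move every
    centroid to the mean of its points; repeat a fixed number of times."""
    centroids: List[Point] = list(initial)
    labels: List[int] = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return centroids, labels


if __name__ == "__main__":
    centroids, labels = kmeans(POINTS, initial=[POINTS[0], POINTS[3]])
    print(labels)  # [0, 0, 0, 1, 1, 1]
```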

121 Software Group

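Supervised Machine Learning

[Diagram: Raw Data from the existing business is cleaned and split into a Training Set and a Validation Set; a Candidate Model built from the training set is validated against the validation set and becomes the Production Model, which takes New Data from new business and produces a Prediction]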
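
The train, validate, promote and predict cycle can be sketched with a deliberately tiny model: a single threshold on one feature (a decision stump). The data, the 0.9 promotion bar and the function names are hypothetical; the point is that the candidate is scored on validation examples it never saw during training.

```python
from typing import List, Tuple

Example = Tuple[float, int]  # (feature value, label: 1 = churned, 0 = stayed)

# Hypothetical history from the existing business: (support calls per month, churned?)
DATA: List[Example] = [(0, 0), (1, 0), (2, 0), (3, 1), (4, 1), (5, 1),
                       (1, 0), (6, 1), (2, 0), (4, 1)]


def accuracy(threshold: float, examples: List[Example]) -> float:
    """Share of examples where 'predict 1 if x >= threshold' matches the label."""
    return sum((x >= threshold) == bool(y) for x, y in examples) / len(examples)


def train_stump(training: List[Example]) -> float:
    """Candidate model: the threshold that best separates the training labels."""
    return max(sorted({x for x, _ in training}), key=lambda t: accuracy(t, training))


def predict(threshold: float, x: float) -> int:
    return int(x >= threshold)


if __name__ == "__main__":
    training, validation = DATA[:6], DATA[6:]  # hold data back for validation
    candidate = train_stump(training)
    score = accuracy(candidate, validation)    # judged on unseen examples
    if score >= 0.9:  # hypothetical bar for promoting a candidate to production
        print("new customer with 5 calls/month ->", predict(candidate, 5))
```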

122 Software Group

Unsupervised learning

InMaps.linkedin.com

124 Software Group

Big Data Analytics

Data Science:
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Change Data Capture reads the Redo-logs and passes changes through a JMS Queue and the Hadoop Poster, which deliver them as a Batched HDFS File Copy, as Audit Change Data, or as HBase Real-Time replication]

130 Software Group

Toad BI Suite

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 27: Thriving and surviving the Big Data revolution

27 Software Group

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS.]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

YARN, EC2

Mesos: heterogeneous cluster manager

Tachyon: in-memory file system

Spark: memory-optimized distributed execution

Spark Streaming

MLbase, MLlib: machine learning

Map Reduce

Shark (SQL), Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop $/TB (hardware only). Exadata: $4,911 per TB; Hadoop: $750 per TB.]
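The gap in the chart is easier to reason about as a ratio. This is a minimal sketch, assuming a flat hardware price per terabyte; real pricing steps with rack size, and software licences are excluded, as on the slide. The 100 TB data set is hypothetical:

```python
# Hardware-only cost per terabyte, as read from the chart.
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750


def hardware_cost(usd_per_tb, terabytes):
    """Hardware cost of `terabytes`, assuming a flat per-TB price."""
    return usd_per_tb * terabytes


ratio = EXADATA_USD_PER_TB / HADOOP_USD_PER_TB
print(f"Exadata / Hadoop cost per TB: {ratio:.1f}x")  # 6.5x

# Hypothetical 100 TB data set, to show the absolute difference.
for label, price in (("Exadata", EXADATA_USD_PER_TB), ("Hadoop", HADOOP_USD_PER_TB)):
    print(label, hardware_cost(price, 100))
```

At these rates, Exadata hardware costs roughly 6.5 times as much per terabyte as commodity Hadoop hardware.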

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
 - 48 GB RAM per node (864 GB total)
 - 2x6-core CPUs per node (216 cores total)
 - 12x2 TB HDD per node (216 spindles, 432 TB)
 - 40 Gb/s InfiniBand between nodes
 - 10 Gb/s Ethernet to the datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
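The rack-level totals follow directly from the per-node figures. A quick check of the arithmetic:

```python
# Per-node figures from the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6   # two 6-core CPUs
DISKS_PER_NODE = 12
TB_PER_DISK = 2

total_ram_gb = NODES * RAM_GB_PER_NODE          # 864 GB
total_cores = NODES * CORES_PER_NODE            # 216 cores
total_spindles = NODES * DISKS_PER_NODE         # 216 disks
raw_storage_tb = total_spindles * TB_PER_DISK   # 432 TB raw

print(total_ram_gb, total_cores, total_spindles, raw_storage_tb)
```

These are raw figures. HDFS keeps three replicas of each block by default, so usable capacity for HDFS data is roughly a third of the raw disk.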

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

Machine Learning: programs that evolve with "experience"

Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with a fitted trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression

• Non-linear (curve fit)

• Multivariate

• Time series

• Logistic regression

• CART
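Linear regression, the first technique listed, fits a line like the one on the chart by ordinary least squares. Below is a dependency-free sketch. The sample points are hypothetical; they are not the data behind the chart:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    # Extrapolate: apply the fitted line to a new x value.
    return slope * x + intercept


# Hypothetical observations, roughly following y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 79, 98]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.3f}x + {intercept:.3f}")
print(predict(slope, intercept, 120))
```

The other methods on the list (multivariate, time series, CART and so on) need more machinery. This is only the simplest case.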

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
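Here is a minimal classification sketch. It uses a nearest-centroid classifier, chosen here for brevity; the slide does not name a specific algorithm. The spam features (number of links and number of exclamation marks) and the training data are hypothetical:

```python
import math


def train_centroids(samples):
    """Compute the mean feature vector (centroid) for each class label.

    `samples` is a list of (features, label) pairs; every features tuple
    must have the same length.
    """
    sums, counts = {}, {}
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: tuple(s / counts[label] for s in sums[label]) for label in sums}


def classify(centroids, features):
    # Assign the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical training data: (links in message, exclamation marks) -> label.
training = [
    ((8, 5), "spam"), ((10, 7), "spam"), ((7, 9), "spam"),
    ((0, 1), "ham"), ((1, 0), "ham"), ((2, 1), "ham"),
]
model = train_centroids(training)
print(classify(model, (9, 6)))  # spam
print(classify(model, (1, 1)))  # ham
```

Production spam filters use far richer features and models. The shape is the same, though: learn from labelled examples, then label new data.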

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
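k-means is one common clustering algorithm; it is an illustrative choice here, since the slide does not name one. The sketch below is plain Lloyd's-style k-means with deterministic seeding so that it is reproducible. The basket features (items per basket and basket value in dollars) are hypothetical:

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on tuples of numbers.

    Centroids start at the first k points for reproducibility; real
    implementations usually use random or k-means++ seeding.
    """
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments will not change again
        centroids = new_centroids
    return centroids, clusters


# Hypothetical baskets: (items in basket, basket value in $).
baskets = [(1, 5), (2, 8), (1, 6), (20, 150), (22, 170), (19, 140)]
centroids, clusters = kmeans(baskets, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

No labels are supplied. The algorithm separates small, cheap baskets from large, expensive ones purely from the data.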

121 Software Group

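Supervised Machine Learning

[Diagram: Raw data from existing business is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is validated against the validation set before becoming the production model. New data from new business is fed to the production model to produce a prediction.]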
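The workflow in the diagram can be sketched end to end with a deliberately tiny model:

1. Split existing data into training and validation sets.
2. Build a candidate model.
3. Validate it.
4. Apply the production model to new data.

Everything in the sketch is hypothetical: the churn data, the single-threshold "decision stump" model, and the 0.75 promotion bar.

```python
import random


def split(records, validation_fraction=0.3, seed=42):
    """Shuffle labelled records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def evaluate(threshold, records):
    # Fraction of records labelled correctly when predicting 1 for x >= threshold.
    return sum((x >= threshold) == bool(label) for x, label in records) / len(records)


def train_stump(training):
    """Candidate model: the observed x value that best separates labels 0 and 1."""
    return max((x for x, _ in training), key=lambda t: evaluate(t, training))


# Hypothetical existing-business data: (support calls last month, churned 1/0).
history = [(0, 0), (1, 0), (1, 0), (2, 0), (2, 0), (3, 0),
           (5, 1), (6, 1), (6, 1), (7, 1), (8, 1), (9, 1)]
MIN_VALIDATION_ACCURACY = 0.75  # hypothetical promotion bar

training, validation = split(history)
candidate = train_stump(training)
val_acc = evaluate(candidate, validation)

if val_acc >= MIN_VALIDATION_ACCURACY:
    production = candidate
    new_customers = [1, 4, 7]  # support calls for hypothetical new customers
    print("churn predictions:", [int(x >= production) for x in new_customers])
else:
    print(f"candidate rejected: validation accuracy {val_acc:.2f}")
```

The validation set is never used for training. Its only job is to decide whether a candidate model is good enough to promote.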

122 Software Group

Unsupervised learning

[Image: InMaps.LinkedIn.com]

123 Software Group

124 Software Group

Big Data Analytics (Data Science)

• Search optimization

• Recommendation systems

• Security: vulnerability and penetration detection

• Fraud detection

• CRM: churn, defaults

• Medical: risk analysis, diagnosis, prognosis

• Game optimization

• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Redo logs → Change Data Capture → JMS Queue / Hadoop Poster → batched HDFS file copy, audit change data, or HBase real-time replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete...

Key components to build end-to-end BI/Analytics solutions:

• Server and Storage

• Database Management: TOAD & SharePlex

• Business Intelligence: TOAD BI

• Data Integration: Boomi

• Advanced Analytics: Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:

• Server and Storage

• Database Management: TOAD & SharePlex

• Business Intelligence: TOAD BI

• Data Integration: Boomi

• Advanced Analytics: Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
 - Mobility
 - Cloud
 - Hadoop
 - Big Data Analytics

• Where is the data?
 - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
 - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics
 - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
 - SQOOP
 - Hive
 - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

Page 28: Thriving and surviving the Big Data revolution

28 Software Group

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 29: Thriving and surviving the Big Data revolution

29 Software Group

30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot of sample data (x from 0 to 120) with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
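The trend line in the chart is the kind of result the first technique in the list, ordinary least-squares linear regression, produces. A minimal sketch in plain Python follows; the sample points are hypothetical and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate with the fitted line."""
    return slope * x + intercept

# Hypothetical observations that roughly follow y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 18, 41, 59, 79, 101]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")
print("prediction at x=120:", predict(slope, intercept, 120))
```

Multivariate, non-linear and logistic variants change the model being fitted, but the idea of choosing parameters that minimise error on existing data stays the same.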

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
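The slide does not name a particular algorithm, so the sketch below swaps in one of the simplest: k-nearest neighbours, which labels a new record by majority vote among the most similar labelled records. The churn features and labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest training examples.

    `training` is a list of (features, label) pairs; features are numeric tuples.
    """
    if not training:
        raise ValueError("training set is empty")
    # Sort labelled examples by Euclidean distance to the new point.
    by_distance = sorted(training, key=lambda item: math.dist(item[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical churn history: (months as a customer, support calls last month) -> label
training = [
    ((24, 0), "stays"), ((36, 1), "stays"), ((30, 0), "stays"),
    ((2, 5), "churns"), ((3, 4), "churns"), ((1, 6), "churns"),
]
print(knn_classify(training, (28, 1)))  # stays
print(knn_classify(training, (2, 5)))   # churns
```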

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
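As an illustration of grouping data with no labels at all, here is a minimal sketch of k-means (Lloyd's algorithm), one common clustering technique; the slide does not say which algorithm it has in mind. The basket features are hypothetical.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters with Lloyd's k-means algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical basket features: (items per basket, average item price), arbitrary units.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```

No group names are supplied; the two clusters emerge from the data, which is the distinction from classification.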

121 Software Group

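Supervised Machine Learning

[Diagram: raw data is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set; the validated model becomes the production model, which takes new data (new business and existing business) and produces a prediction]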
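The workflow in the diagram can be sketched end to end in a few lines. This is a toy under stated assumptions: the "model" is a hypothetical single cut-off on one score, the cleaning step only drops incomplete rows, and the 0.8 promotion bar is arbitrary.

```python
import random

def clean(raw_rows):
    """Drop rows with missing values (a stand-in for real data cleansing)."""
    return [(x, y) for x, y in raw_rows if x is not None and y is not None]

def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle labelled rows and split them into a training set and a validation set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def train_threshold(training):
    """Candidate model: the cut-off on x that best separates labels 0 and 1."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in training}):
        acc = accuracy(lambda x, t=t: int(x >= t), training)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x >= best_t)

# Hypothetical history: (risk score, defaulted?) plus two incomplete records.
raw = [(x, int(x >= 60)) for x in range(0, 100, 5)] + [(None, 1), (42, None)]
training, validation = split(clean(raw))
candidate = train_threshold(training)
score = accuracy(candidate, validation)

# Promote the candidate only if it does well on data it was not trained on.
production = candidate if score >= 0.8 else None
if production is not None:
    print("new applicant predicted to default:", production(75))
```

Holding back a validation set is what lets the workflow judge the candidate on data it has not already seen.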

122 Software Group

Inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: redo logs → change data capture → JMS queue → Hadoop poster, which feeds batched HDFS file copy, audit change data, and HBase real-time replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Key components to build end-to-end BI/Analytics solutions

Dell's offering was not complete…

Layers: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Layers: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Key components to build end-to-end BI/Analytics solutions

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring: integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics skills
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


30 Software Group

Some novel defences

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

• Shelf assortment optimization
• In-store offers
• Customer entertainment
• Checkout anywhere
• Relationship management
• Customer analytics

33 Software Group

34 Software Group

Why showrooming?

Reasons: selection, stock, faster, cheaper

Defences: dynamic pricing, predictive ordering, assortment optimization, predictive recommendations, personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages

• Retailers can only survive by embracing online and emulating online practices
  – Dynamic pricing
  – Shelf optimization
  – Personalized service and selection

• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Don't Mention the WAR

43 Software Group

Buying choices

Amazon softcover: $45.99

Oracle Performance Survival Guide

Amazon Kindle: $39.99

Say "screw you, bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

"Siri, call me an ambulance."
From now on, I'll call you 'An Ambulance'. OK?

"I want to jump off a bridge."
I found 14 bridges nearby.

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth personal area network
• 3G/WiFi wide area network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

[Diagram: Start → many parallel Map tasks → Reduce]

Map Reduce
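The map and reduce phases in the diagram can be simulated in a single process. A minimal sketch, assuming the classic word-count job as the example; a real Hadoop cluster runs the mappers in parallel across nodes and performs the shuffle over the network, which this code only imitates with a dictionary.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)

documents = ["big data big ideas", "data beats ideas", "big wins"]
intermediate = chain.from_iterable(map_phase(d) for d in documents)
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 3, 'data': 2, 'ideas': 2, 'beats': 1, 'wins': 1}
```

Because each mapper sees only its own record and each reducer only its own key, both phases can be spread across as many machines as the data requires.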

68 Software Group

[Diagram: data in HDFS → first-stage mappers (scan/sort) → second-stage workers (aggregate) → reduce → client]

Multi-stage Map-Reduce
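Multi-stage processing chains jobs so the output of one becomes the input of the next. A minimal single-process sketch, assuming a hypothetical click log: stage 1 counts visits per site, and stage 2 sends every result to one key so a single reducer can pick the most-visited site. The `run_job` helper is illustrative, not a Hadoop API.

```python
from collections import defaultdict

def run_job(records, mapper, reducer):
    """Simulate one MapReduce job: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]

# Hypothetical click log: one "user site" pair per line.
log = ["dick ebay", "jane google", "dick google", "jane facebook", "dick google"]

# Stage 1: count visits per site.
site_counts = run_job(
    log,
    mapper=lambda line: [(line.split()[1], 1)],
    reducer=lambda site, ones: (site, sum(ones)),
)

# Stage 2: route every (site, count) to one key so a single reducer finds the maximum.
top_site = run_job(
    site_counts,
    mapper=lambda pair: [("all", pair)],
    reducer=lambda _key, pairs: max(pairs, key=lambda p: p[1]),
)
print(site_counts)  # [('ebay', 1), ('facebook', 1), ('google', 3)]
print(top_site)     # [('google', 3)]
```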

69 Software Group

Schema on Read vs Schema on Write

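[Diagram] Schema on write: data is extracted, cleansed, normalized, aggregated, coded and transformed, then loaded into a data warehouse, where it is analysed and utilized. Schema on read: data is loaded into Hadoop as-is; it is cleansed, coded and analysed afterwards, and then utilized.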
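With schema on read, the raw data is stored untouched and the structure is imposed by whoever reads it. A minimal sketch, assuming hypothetical comma-separated click records: parsing, typing and rejection of bad rows all happen at query time, so a malformed line never blocks the load and remains in raw storage for a different reader to interpret.

```python
import csv
from datetime import date

# Raw data is stored exactly as it arrived; nothing is validated at load time.
RAW_LINES = [
    "2014-04-08,dick,ebay,3",
    "2014-04-08,jane,google,not-a-number",
    "2014-04-09,jane,facebook,5",
]

def read_with_schema(lines):
    """Apply a schema while reading: parse types and skip rows that don't fit."""
    for row in csv.reader(lines):
        try:
            day, user, site, clicks = row  # wrong field count raises ValueError
            yield {
                "day": date.fromisoformat(day),
                "user": user,
                "site": site,
                "clicks": int(clicks),
            }
        except ValueError:
            continue  # ignored by this reader, but still present in raw storage

total_clicks = sum(r["clicks"] for r in read_with_schema(RAW_LINES))
print(total_clicks)  # 8
```

A schema-on-write pipeline would run the same parsing before loading and reject the bad row up front.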

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop Distributed File System (HDFS)
MapReduce, YARN
HBase (database)
ZooKeeper (locking)
SQOOP (RDBMS loader)
Hive (query)
Pig (scripting)
Flume (log loader)
Oozie (workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) runs on top of MapReduce (distributed processing), managed by a JobTracker, and HDFS (distributed storage), managed by a NameNode and a Secondary NameNode; every worker node runs both a DataNode and a TaskTracker]

76 Software Group

Hadoop 2.0: YARN

[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Resource Manager, which allocates containers on Node Managers; an Application Master coordinates the work of each application]

YARN: Yet Another Resource Negotiator

77 Software Group

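Tez¹

¹Hindi for "fast"

[Diagram: without Tez, a pipeline of MAP and REDUCE stages runs as three separate MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing to HDFS; with Tez, the same stages run as a single job (Job 1) between HDFS input and output]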

78 Software Group

HBase

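A real-time database built on Hadoop

[Diagram comparing storage architectures. Oracle: tables, buffer cache and redo log buffer in memory; redo logs and datafiles on ASM over disks. HBase: tables and MemStore in memory; write-ahead (WA) log and HFiles on HDFS over disks]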

79 Software Group

Name   Site             Counter
Dick   Ebay             507018
Dick   Google           690414
Jane   Google           716426
Dick   Facebook         723649
Jane   Facebook         643261
Jane   ILoveLarry.com   856767
Dick   MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
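The last two tables store one wide, sparse row per person, with a column only where a value exists. The sketch below mimics that shape with plain Python dictionaries built from the first table's rows; it is not the HBase client API, and real HBase additionally groups columns into column families, keeps timestamped versions, and stores everything as bytes.

```python
from collections import defaultdict

# (name, site, counter) rows from the first table above.
rows = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse row per key."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # a column exists only where there is a value
    return dict(table)

wide = to_wide_rows(rows)
print(wide["Jane"])  # {'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```

Reading one person's data is a single keyed lookup here, with no join against Name and Site tables, which is the trade-off the slide contrasts with the normalized design.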

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
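Hive's role in the diagram is to turn SQL into MapReduce code. A minimal sketch of why that works for an aggregate query, under stated assumptions: SQLite (in memory) stands in for a SQL engine, Hive itself is not used, and the table name `visits` is hypothetical. A `GROUP BY` maps naturally onto "emit the grouping key, then reduce each group".

```python
import sqlite3
from collections import defaultdict

rows = [("Dick", "Google", 690414), ("Jane", "Google", 716426),
        ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261)]

QUERY = "SELECT site, SUM(counter) FROM visits GROUP BY site ORDER BY site"

def run_sql(rows):
    """Answer the query with an ordinary SQL engine (SQLite, in memory)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (name TEXT, site TEXT, counter INTEGER)")
    con.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result

def run_as_mapreduce(rows):
    """The same GROUP BY as map (emit site, counter) and reduce (sum per site)."""
    groups = defaultdict(list)
    for _name, site, counter in rows:  # map + shuffle
        groups[site].append(counter)
    return [(site, sum(values)) for site, values in sorted(groups.items())]  # reduce

print(run_sql(rows))
print(run_sql(rows) == run_as_mapreduce(rows))  # True
```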

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or HiveQL

85 Software Group

Flume and SQOOP

[Diagram: web logs → FLUME → HDFS; RDBMS tables (CUSTOMERS, PRODUCTS) → SQOOP → HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

[Diagram, layered stack]
YARN, EC2
Mesos: heterogeneous cluster manager
Tachyon: in-memory file system
Spark: memory-optimized distributed execution
Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 31: Thriving and surviving the Big Data revolution

31 Software Group

Web analytics for retail

32 Software Group

Connected Store

bull Shelf assortment optimization

bull In store offers

bull Customer entertainment

bull Checkout anywhere

bull Relationship management

bull Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring - integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest - a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud, social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • It's not enough to lay out products on tables
  • There's a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through "Big Data analytics"
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell's offering was not complete…
  • Dell acquires StatSoft
  • Slide 134
  • Data Visualization
  • Live scoring - integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app

32 Software Group

Connected Store

• Shelf assortment optimization

• In-store offers

• Customer entertainment

• Checkout anywhere

• Relationship management

• Customer analytics

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages

• Retailers can only survive by embracing online and emulating online practices
  - Dynamic pricing
  - Shelf optimization
  - Personalized service and selection

• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

Security

Finance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality: German

Don't Mention the WAR

43 Software Group

Buying choices

Amazon softcover $45.99

Oracle Performance Survival Guide

Amazon Kindle $39.99

Say "screw you bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on, I'll call you 'An Ambulance'. OK?

"Siri, call me an ambulance"

I found 14 bridges nearby

"I want to jump off a bridge"

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse/temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce

BigTable

Google Applications

Google Software Architecture

67 Software Group

Diagram: Start, then many Map tasks running in parallel, then Reduce

Map Reduce
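The Map and Reduce stages in the diagram can be illustrated with a word count, the usual introductory MapReduce example. This is a single-process Python sketch, not Hadoop's API: `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical names, and each input split, which Hadoop would map on a different node, is simply processed in a loop here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(splits: List[str]) -> Dict[str, int]:
    # In a cluster each split would be mapped in parallel; here they run sequentially.
    intermediate = [pair for split in splits for pair in map_phase(split)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(map_reduce(["big data big clusters", "data beats algorithms"]))
```

The mappers never talk to each other; only the shuffle brings matching keys together, which is what lets the map stage scale out across many machines.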

68 Software Group

Diagram: HDFS, a first set of MAPPERs (SCAN/SORT), a second set of MAPPERs (AGGREGATE), REDUCE, Client

Multi-stage Map-Reduce
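A multi-stage job chains passes so that the output of one reduce becomes the input of the next map. The sketch below is a hypothetical two-stage example in plain Python and does not reproduce the slide's exact scan/sort and aggregate stages: stage 1 counts page hits from raw log lines, and stage 2 ranks those counts so the top pages can be returned to the client. `run_job` and the mapper/reducer functions are illustrative names, not Hadoop classes.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[str, object]
Mapper = Callable[[object], Iterable[Pair]]
Reducer = Callable[[str, List[object]], Iterable[Pair]]


def run_job(records: Iterable[object], mapper: Mapper, reducer: Reducer) -> List[Pair]:
    """One map -> shuffle -> reduce pass; returns the reducer output as a list."""
    groups: Dict[str, List[object]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: List[Pair] = []
    for key in sorted(groups):  # keys reach the reducer in sorted order, as after a sort phase
        output.extend(reducer(key, groups[key]))
    return output


# Stage 1: count hits per page from raw "user page" log lines.
def hits_mapper(line: str) -> Iterable[Pair]:
    _user, page = line.split()
    yield page, 1


def hits_reducer(page: str, ones: List[object]) -> Iterable[Pair]:
    yield page, sum(ones)


# Stage 2: send every (page, count) pair to one key so a single reducer can rank them.
def rank_mapper(pair: Pair) -> Iterable[Pair]:
    yield "all", pair


def top2_reducer(_key: str, pairs: List[object]) -> Iterable[Pair]:
    ranked = sorted(pairs, key=lambda p: (-p[1], p[0]))
    yield from ranked[:2]


if __name__ == "__main__":
    logs = ["u1 /home", "u2 /home", "u1 /cart", "u3 /home", "u2 /cart", "u3 /help"]
    page_counts = run_job(logs, hits_mapper, hits_reducer)       # stage 1
    top_pages = run_job(page_counts, rank_mapper, top2_reducer)  # stage 2
    print(top_pages)
```

In classic Hadoop each stage is a separate job whose output is written to HDFS before the next job reads it, which is the overhead that Tez later targets.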

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

Code

Extract

Load

Transform

Data Warehouse

Data Load

Hadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize
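The difference between the two approaches is when the structure is imposed. In the minimal sketch below (the CSV feed, the `Sale` record and both function names are hypothetical), schema on write validates and converts rows during the load, so a malformed row never reaches storage; schema on read keeps the raw text exactly as loaded and interprets it only when a query runs, so each query decides how to treat bad data.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical raw feed: the second line has a malformed amount.
RAW = "north,10.5\nsouth,oops\nnorth,4.5\n"


@dataclass
class Sale:
    region: str
    amount: float


def load_schema_on_write(raw: str) -> List[Sale]:
    """Schema on write: validate and convert during the load; rejected rows never reach storage."""
    table: List[Sale] = []
    for fields in csv.reader(io.StringIO(raw)):
        try:
            region, amount = fields                  # wrong field count raises ValueError
            table.append(Sale(region, float(amount)))
        except ValueError:
            continue  # a real ETL job would log or quarantine the rejected row
    return table


def total_on_read(raw: str, region: str) -> float:
    """Schema on read: keep the raw text as loaded and interpret it only when a query runs."""
    total = 0.0
    for fields in csv.reader(io.StringIO(raw)):
        if len(fields) != 2 or fields[0] != region:
            continue
        try:
            total += float(fields[1])
        except ValueError:
            pass  # this query skips malformed values; another query could decide differently
    return total


if __name__ == "__main__":
    warehouse = load_schema_on_write(RAW)
    print(sum(s.amount for s in warehouse if s.region == "north"))
    print(total_on_read(RAW, "north"))
```

Both paths give the same answer here; the trade-off is that schema on write pays the cleansing cost once up front, while schema on read loads faster and keeps the raw data for questions nobody anticipated.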

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce

YARN

HBase (Database)

ZooKeeper (Locking)

SQOOP (RDBMS loader)

Hive (Query)

Pig (Scripting)

Flume (Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 1.0 Architecture

Diagram: Hadoop client (Java, Pig, Hive); MapReduce (distributed processing) with a Job Tracker; HDFS (distributed storage) with a Name Node and a Secondary Name Node; many worker nodes, each running a Data Node and a Task Tracker

76 Software Group

Hadoop 2.0 YARN

Diagram: Hadoop client (Java, Pig, Hive); Resource Manager; Node Managers, each hosting Containers; Application Master

Yet Another Resource Negotiator

77 Software Group

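Tez¹

¹ Hindi for "fast"

Diagram: three separate MapReduce jobs (Job 1, Job 2, Job 3) chained through HDFS, compared with the same MAP and REDUCE steps run as a single job (Job 1) between HDFS input and output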

78 Software Group

HBase

A real-time database built on Hadoop

Diagram comparing Oracle and HBase storage. Oracle: tables in the Buffer Cache, Log Buffer, Redo, Datafiles, ASM, Disks. HBase: tables in the MemStore, WA (write-ahead) Log, HFiles, HDFS, Disks.
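The HBase side of the comparison follows a log-structured write path: each change is appended to the write-ahead log, applied to the in-memory MemStore, and periodically flushed to an immutable sorted HFile. The toy class below sketches that idea in plain Python; `MiniLsmStore` and its methods are hypothetical and are not the HBase API, and it omits compaction, region splitting, crash replay and HDFS replication.

```python
from typing import Dict, List, Optional, Tuple


class MiniLsmStore:
    """Toy log-structured store: write-ahead log + in-memory MemStore + immutable sorted files."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []      # append-only log (HBase would replay it after a crash)
        self.memstore: Dict[str, str] = {}        # most recent writes, held in memory
        self.hfiles: List[Dict[str, str]] = []    # flushed snapshots, oldest first, never modified
        self.flush_threshold = flush_threshold

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))         # 1. record the change in the log
        self.memstore[row_key] = value            # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the MemStore as a new key-sorted immutable file, then start a fresh MemStore.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}
        self.wal.clear()  # simplification: the flushed edits no longer need the log

    def get(self, row_key: str) -> Optional[str]:
        if row_key in self.memstore:              # newest data is in memory
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):       # then search files from newest to oldest
            if row_key in hfile:
                return hfile[row_key]
        return None


if __name__ == "__main__":
    store = MiniLsmStore(flush_threshold=2)
    store.put("row1", "a")
    store.put("row2", "b")   # reaches the threshold and triggers a flush
    store.put("row1", "c")   # the newer version stays in the MemStore
    print(store.get("row1"), store.get("row2"), len(store.hfiles))
```

Writes only ever append or touch memory, which is why this design sustains high write rates on top of a file system like HDFS that does not support in-place updates.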

79 Software Group

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767

HBase Data Model
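The last two rows of the slide show the HBase view: one wide, sparse row per key, where a column exists only if that row has a value for it. The sketch below pivots the slide's (name, site, counter) records into that shape using nested Python dictionaries (row key, then column, then value). This is an illustration of the data model only; it ignores column families, timestamps and cell versions, and `to_wide_rows` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The (name, site, counter) records from the denormalized table on the slide.
VISITS: List[Tuple[str, str, int]] = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """One row per name; each site becomes a column that exists only where there is a value."""
    rows: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in visits:
        rows[name][site] = counter
    return dict(rows)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(VISITS).items():
        print(row_key, columns)
```

Unlike the normalized relational design, no join is needed to read everything about one person: a single row lookup returns all of that person's counters.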

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAVA

RESULTS

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill

Aster

Greenplum (Pivotal HD)

ParAccel

Hadapt

Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

YARN, EC2

Mesos - heterogeneous cluster manager

Tachyon - in-memory file system

Spark - memory-optimized distributed execution

Spark Streaming

MLbase / MLlib - Machine Learning

Map Reduce

Shark (SQL), Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Chart: Exadata vs Hadoop $/TB (hardware only): Exadata $4,911; Hadoop $750
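The chart's two figures imply that Exadata hardware costs roughly 6.5 times as much per terabyte as Hadoop hardware (4,911 / 750 ≈ 6.55). The short sketch below turns that into totals for a given capacity; the 100 TB capacity and the flat, linear price-per-TB model are assumptions for illustration, and like the chart it covers hardware only (no software licences, support or staff).

```python
# Hardware-only cost per terabyte, as read from the chart above.
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750


def hardware_cost(capacity_tb: float, usd_per_tb: float) -> float:
    """Linear cost model: capacity times a flat price per terabyte (an assumption)."""
    return capacity_tb * usd_per_tb


if __name__ == "__main__":
    capacity_tb = 100  # hypothetical capacity, for illustration only
    exadata = hardware_cost(capacity_tb, EXADATA_USD_PER_TB)
    hadoop = hardware_cost(capacity_tb, HADOOP_USD_PER_TB)
    print(f"Exadata: ${exadata:,.0f}  Hadoop: ${hadoop:,.0f}  ratio: {exadata / hadoop:.2f}x")
```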

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
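The appliance totals come from multiplying the per-node figures by the 18 servers. The sketch below does that arithmetic; `NodeSpec` and `cluster_totals` are hypothetical names, and the usable-capacity estimate assumes HDFS's default replication factor of 3 while ignoring file-system and temporary-space overhead. With the per-node figures as listed (12 drives of 2 TB each), raw disk across 18 nodes comes to 432 TB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    ram_gb: int
    sockets: int
    cores_per_socket: int
    disks: int
    disk_tb: int


def cluster_totals(nodes: int, spec: NodeSpec, replication: int = 3) -> dict:
    """Multiply per-node resources by the node count."""
    raw_tb = nodes * spec.disks * spec.disk_tb
    return {
        "ram_gb": nodes * spec.ram_gb,
        "cores": nodes * spec.sockets * spec.cores_per_socket,
        "spindles": nodes * spec.disks,
        "raw_disk_tb": raw_tb,
        # HDFS stores each block `replication` times (3 is the HDFS default),
        # so usable capacity is roughly raw capacity divided by that factor.
        "usable_tb_approx": raw_tb / replication,
    }


# Per-node figures as listed on the slide.
BDA_NODE = NodeSpec(ram_gb=48, sockets=2, cores_per_socket=6, disks=12, disk_tb=2)

if __name__ == "__main__":
    print(cluster_totals(18, BDA_NODE))
```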

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

Machine Learning: programs that evolve with "experience"

Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science
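One common form of collective intelligence is a recommender that predicts what a user will like from the ratings of similar users. Below is a minimal user-based collaborative filtering sketch using cosine similarity; the ratings data and function names are hypothetical, and production recommenders add rating normalization, sparsity handling and far larger data sets.

```python
from math import sqrt
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]

# Hypothetical "crowd" ratings (1-5) of Hadoop tools.
RATINGS: Ratings = {
    "ann": {"hadoop": 5, "hive": 4, "pig": 1},
    "bob": {"hadoop": 5, "hive": 5, "spark": 5},
    "cat": {"hadoop": 1, "pig": 5, "flume": 4},
}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors (unrated items count as 0)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[item] * b[item] for item in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(ratings: Ratings, user: str, top_n: int = 3) -> List[str]:
    """Score unseen items by the similarity-weighted average rating of other users."""
    totals: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their_ratings)
        if sim <= 0:
            continue
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated yet
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: totals[item] / weights[item] for item in totals}
    return sorted(predicted, key=lambda item: (-predicted[item], item))[:top_n]


if __name__ == "__main__":
    print(recommend(RATINGS, "ann"))
```

No rule about Hadoop tools is written anywhere in the code; the "intelligence" comes entirely from the crowd's ratings.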

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154 (x axis 0 to 120, y axis -20 to 120)

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
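A fitted line such as the one on the chart is typically produced by ordinary least squares, which chooses the slope and intercept that minimize the squared vertical distances to the points. The data behind the slide's chart is not available, so the sketch below uses made-up observations; `fit_line` and `predict` are hypothetical helper names.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    # Hypothetical observations; the data behind the slide's chart is not available.
    xs = [10, 20, 30, 40, 50]
    ys = [11, 19, 31, 40, 48]
    slope, intercept = fit_line(xs, ys)
    print(f"f(x) = {slope:.4f}x + {intercept:.4f}; f(120) = {predict(slope, intercept, 120):.1f}")
```

Extrapolating to x = 120 is exactly the "predictive" step, and it is only as trustworthy as the assumption that the linear trend continues beyond the observed range.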

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
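A very small classifier shows the idea: learn from labelled history, then assign a class to new records. This sketch uses a nearest-centroid rule, a deliberately simple stand-in rather than a technique named on the slide, with two hypothetical churn features: support calls per month and months since last purchase.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

# Hypothetical labelled history: (support calls per month, months since last purchase) -> outcome
TRAINING: List[Tuple[Tuple[float, float], str]] = [
    ((0, 1), "stay"),
    ((1, 2), "stay"),
    ((6, 9), "churn"),
    ((8, 7), "churn"),
]


def train_centroids(examples: List[Tuple[Tuple[float, float], str]]) -> Dict[str, List[float]]:
    """The model is simply the mean feature vector (centroid) of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    model = train_centroids(TRAINING)
    print(classify(model, (7, 8)), classify(model, (1, 1)))
```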

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
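k-means is one widely used clustering algorithm: it alternates between assigning each point to its nearest centre and moving each centre to the mean of its points, with no labels supplied in advance. The sketch below treats each shopping basket as a point of spend per product category (an assumption for illustration; the basket data is hypothetical) and uses a naive deterministic start instead of the random or k-means++ initialization real implementations use.

```python
from math import dist
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

# Hypothetical baskets: (grocery spend, electronics spend)
POINTS: List[Point] = [(1, 1), (9, 9), (1.5, 2), (8, 8.5), (1, 0.5), (9.5, 8)]


def kmeans(points: Sequence[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    centroids: List[Point] = [tuple(p) for p in points[:k]]  # naive start: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's previous centre
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    centres, groups = kmeans(POINTS, k=2)
    print(centres, groups)
```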

121 Software Group

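Supervised Machine Learning

Diagram: Raw Data is cleaned and split into a Training Set and a Validation Set; the Training Set builds a Candidate Model, which is checked against the Validation Set (Validate) to become the Production Model; New Data (New Business, Existing Business) is fed to the Production Model to produce a Prediction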
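The supervised learning flow (clean, split, train a candidate, validate, promote to production, score new data) can be sketched end to end in a few functions. The model here is a one-feature threshold rule (a "decision stump"), chosen only because it is short; the data, the every-fourth-record split, the 0.8 accuracy bar and all function names are hypothetical.

```python
from typing import List, Optional, Tuple

RawExample = Tuple[Optional[float], int]   # (feature value, or None if missing; label 0/1)
Example = Tuple[float, int]

# Hypothetical raw data, including one incomplete record.
RAW: List[RawExample] = [(1.0, 0), (2.0, 0), (None, 1), (3.0, 0), (7.0, 1),
                         (8.0, 1), (2.5, 0), (6.0, 1), (9.0, 1)]


def clean(raw: List[RawExample]) -> List[Example]:
    """Clean: drop records with a missing feature value."""
    return [(x, y) for x, y in raw if x is not None]


def split(data: List[Example], every: int = 4) -> Tuple[List[Example], List[Example]]:
    """Hold out every `every`-th record (starting with the first) as the validation set."""
    training = [e for i, e in enumerate(data) if i % every != 0]
    validation = [e for i, e in enumerate(data) if i % every == 0]
    return training, validation


def predict(threshold: float, x: float) -> int:
    """Model: predict 1 when the feature reaches the threshold."""
    return int(x >= threshold)


def accuracy(threshold: float, examples: List[Example]) -> float:
    return sum(predict(threshold, x) == y for x, y in examples) / len(examples)


def fit_stump(training: List[Example]) -> float:
    """Candidate model: the training value that, used as a threshold, best fits the training set."""
    return max((x for x, _ in training), key=lambda t: accuracy(t, training))


def train_and_promote(raw: List[RawExample], min_accuracy: float = 0.8) -> Tuple[Optional[float], float]:
    training, validation = split(clean(raw))
    candidate = fit_stump(training)
    score = accuracy(candidate, validation)
    # Only a candidate that validates well enough becomes the production model.
    return (candidate if score >= min_accuracy else None), score


if __name__ == "__main__":
    model, score = train_and_promote(RAW)
    print(model, score)
    if model is not None:
        print(predict(model, 7.5))  # score a new record with the production model
```

Scoring on held-out validation data, rather than on the training set, is what guards against promoting a model that has merely memorized its training examples.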

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 33: Thriving and surviving the Big Data revolution

33 Software Group

34 Software Group

Why showrooming

Selection

Stock

Faster

Cheaper

Dynamic Pricing

Predictive ordering

Assortment optimization

Predictive recommendations

Personalization

Defences

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online


34 Software Group

Why showrooming

[Diagram: online's advantages (Selection, Stock, Faster, Cheaper) set against retail defences]

Defences:
• Dynamic pricing
• Predictive ordering
• Assortment optimization
• Predictive recommendations
• Personalization

35 Software Group

It's not enough to lay out products on tables

• Online has significant advantages
• Retailers can only survive by embracing online and emulating online practices:
   - Dynamic pricing
   - Shelf optimization
   - Personalized service and selection
• Only big data analytics can provide these advantages

36 Software Group

There's a similar story in every industry

Web, Transport, Power Grid, Dating, Retail, Security, Finance, Government, Science, Healthcare, Insurance, Telecom, Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Don't Mention the WAR

43 Software Group

Buying choices: Oracle Performance Survival Guide

• Amazon softcover: $45.99
• Amazon Kindle: $39.99

Say "screw you bookseller" to buy the Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

"I want to jump off a bridge"
I found 14 bridges nearby.

"Siri, call me an ambulance"
From now on, I'll call you 'An Ambulance'. OK?

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Microphone/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: layered stack, from top to bottom: Google Applications; MapReduce and BigTable; Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, work fans out to many parallel Map tasks whose outputs converge on a Reduce]
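The map, shuffle and reduce phases in the diagram can be illustrated with the classic word count. This is a minimal single-process Python sketch. It assumes the input fits in memory and is not Hadoop's Java API. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all emitted values by key, in key order."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(map_reduce(["big data big ideas", "data beats ideas"]))
```

In Hadoop, each mapper would run on the node holding its block of input. The framework itself would perform the shuffle across the network.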

68 Software Group

Multi-stage Map-Reduce

[Diagram: data read from HDFS flows through successive stages of mappers (scan, sort, aggregate) and a final reduce, with the results returned to the client]
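In a multi-stage job, one stage's output becomes the next stage's input. The sketch below chains two in-memory map-reduce stages over a hypothetical "user site" click log. Stage 1 counts hits per site, and stage 2 re-keys the results by hit count. `run_job` and the mapper/reducer names are illustrative assumptions, not Hadoop APIs.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """Run one in-memory map-reduce stage: map, group values by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Keys are sorted before reducing, much as Hadoop sorts keys between phases
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Stage 1: count hits per site from raw "user site" log lines
def hits_mapper(line):
    _user, site = line.split()
    yield site, 1


def sum_reducer(site, counts):
    return site, sum(counts)


# Stage 2: consumes stage 1's output and re-keys it by hit count,
# so sites with the same number of hits land in the same group
def by_count_mapper(pair):
    site, hits = pair
    yield hits, site


def list_reducer(hits, sites):
    return hits, sorted(sites)


log = ["dick google", "jane google", "dick ebay", "jane facebook", "dick facebook"]
stage1 = run_job(log, hits_mapper, sum_reducer)
stage2 = run_job(stage1, by_count_mapper, list_reducer)
print(stage1)
print(stage2)
```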

69 Software Group

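Schema on Read vs Schema on Write

Schema on Write: data is cleansed, normalized and aggregated as it is extracted, transformed and loaded (ETL) into the data warehouse. Only then is it analysed, coded against and utilized.

Schema on Read: data is loaded into Hadoop as-is. Cleansing and analysis are coded afterwards, and the results are then utilized.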
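Here is a small sketch of schema on read, assuming a hypothetical comma-separated click log. The raw text is stored untouched. Types are applied, and bad rows cleansed, only when the data is read for analysis. Under schema on write, the same conversion and cleansing would run in the ETL step before loading.

```python
import csv
import io
from datetime import date

# Raw data is kept exactly as it arrived; nothing was validated at load time
RAW = """2014-04-08,dick,google,3
2014-04-08,jane,facebook,not-a-number
2014-04-09,jane,google,5
"""


def read_with_schema(raw: str):
    """Apply a schema (column types) only when the data is read."""
    for row in csv.reader(io.StringIO(raw)):
        day, user, site, hits = row
        try:
            yield date.fromisoformat(day), user, site, int(hits)
        except ValueError:
            # Cleansing happens at read time: skip rows that do not fit this schema
            continue


total_hits = sum(hits for _, _, _, hits in read_with_schema(RAW))
print(total_hits)
```

A different analysis could read the same raw text with a different schema, without reloading anything.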

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
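Dividing the cluster totals by the node count gives the average size of each node. This assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and identically configured nodes, neither of which the slide states.

```python
# Cluster totals from the slide
nodes = 4_000
disk_pb = 16
ram_tb = 64
cores = 32_000

# Averages per node, assuming decimal units and identical nodes
disk_tb_per_node = disk_pb * 1_000 / nodes   # PB -> TB
ram_gb_per_node = ram_tb * 1_000 / nodes     # TB -> GB
cores_per_node = cores / nodes

print(f"{disk_tb_per_node} TB disk, {ram_gb_per_node} GB RAM, {cores_per_node} cores per node")
```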

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• MapReduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) runs work through MapReduce (distributed processing), coordinated by the Job Tracker, over HDFS (distributed storage), coordinated by the Name Node and a Secondary Name Node; every worker node runs both a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; Node Managers host containers, and an Application Master coordinates each application]

77 Software Group

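Tez¹

¹ Hindi for "fast"

[Diagram: without Tez, a query runs as a chain of MapReduce jobs (Job 1, Job 2, Job 3) that pass data through HDFS; with Tez, the same map and reduce steps run as a single job (Job 1) between HDFS input and output]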

78 Software Group

HBase

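A real-time database built on Hadoop

[Diagram comparing storage architectures: Oracle tables with Buffer Cache, Log Buffer, Redo and Datafiles on ASM-managed disks, set against HBase tables with MemStore, Write-Ahead (WA) Log and HFiles on HDFS-managed disks]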

79 Software Group

HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized):

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name; each row stores only the columns it has):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
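The pivot from the normalized tables to HBase-style wide rows can be sketched with plain Python dictionaries. This illustrates the data model only, not the HBase client API. In a real table, the site columns would sit inside a column family whose name the slide does not give. The site ids follow the Name/Site/Counter listing at the top of the slide.

```python
from collections import defaultdict

# Normalized (relational) tables
names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
visits = [(1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
          (2, 3, 643261), (2, 4, 856767), (1, 5, 675230)]


def to_wide_rows(names, sites, visits):
    """Pivot join-table rows into one sparse row per name id, HBase-style:
    each site becomes a column, and sites a person never visited have no cell."""
    rows = defaultdict(dict)
    for name_id, site_id, counter in visits:
        rows[name_id]["Name"] = names[name_id]
        rows[name_id][sites[site_id]] = counter
    return dict(rows)


wide = to_wide_rows(names, sites, visits)
print(wide[1])
print(wide[2])
```

Reading all of Dick's counters then becomes a single row lookup instead of a three-table join.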

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram comparing Pig Latin with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP transfers the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (YARN and EC2 also appear at this layer)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL), alongside Hive (SQL) on MapReduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop, $/TB (hardware only):
• Exadata: $4,911
• Hadoop: $750
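Here is a quick check of the hardware-only figures above. The 100 TB data set size is a hypothetical example.

```python
exadata_usd_per_tb = 4911  # hardware only, from the slide
hadoop_usd_per_tb = 750    # hardware only, from the slide

ratio = exadata_usd_per_tb / hadoop_usd_per_tb
print(f"Exadata hardware costs about {ratio:.1f}x as much per TB as Hadoop")

# Hardware cost for a hypothetical 100 TB of data on each platform
capacity_tb = 100
exadata_cost = exadata_usd_per_tb * capacity_tb
hadoop_cost = hadoop_usd_per_tb * capacity_tb
print(f"Exadata ${exadata_cost:,} vs Hadoop ${hadoop_cost:,}")
```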

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
   - 48 GB RAM per node (864 GB total)
   - 2 × 6-core CPUs per node (216 cores total)
   - 12 × 2 TB HDD per node (216 spindles, 432 TB)
   - 40 Gb/s InfiniBand between nodes
   - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
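The appliance totals follow directly from the per-node figures. The sketch assumes decimal terabytes and shows raw capacity before HDFS replication. With HDFS's default replication factor of 3, usable space is roughly a third of raw capacity, ignoring other overheads.

```python
nodes = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6       # two 6-core CPUs per node
disks_per_node = 12
tb_per_disk = 2
hdfs_replication = 3         # HDFS default replication factor

total_ram_gb = nodes * ram_gb_per_node
total_cores = nodes * cores_per_node
total_spindles = nodes * disks_per_node
raw_tb = total_spindles * tb_per_disk
# Rough usable capacity: every block is stored three times (other overheads ignored)
usable_tb = raw_tb / hdfs_replication

print(total_ram_gb, total_cores, total_spindles, raw_tb, usable_tb)
```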

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with a fitted linear trend line, f(x) = 0.9715x + 0.7191]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
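A minimal ordinary least squares fit illustrates the technique behind a linear trend line such as the f(x) on this slide. The data points are made up for illustration. The fitted coefficients therefore will not match the slide's 0.9715 and 0.7191, which came from data not shown.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical observations
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 20, 40, 59, 79, 98]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")

# Extrapolating beyond the observed range is what makes this "predictive"
print("prediction at x=120:", slope * 120 + intercept)
```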

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
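A nearest-centroid classifier is one simple way to build a model that classifies new data. It averages the training examples of each class, then assigns new data to the closest average. The churn features (support calls per month, months as a customer) and all values are hypothetical.

```python
from math import dist
from statistics import mean


def train_centroids(examples):
    """examples: list of (features, label). Returns label -> mean feature vector."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*rows)) for label, rows in by_label.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training data: ((support calls per month, months as customer), label)
training = [
    ((9, 3), "churn"), ((7, 5), "churn"), ((8, 2), "churn"),
    ((1, 40), "stay"), ((2, 36), "stay"), ((0, 48), "stay"),
]
model = train_centroids(training)
print(classify(model, (6, 6)))
print(classify(model, (1, 30)))
```

The features are left unscaled to keep the sketch short. With real data, features on very different scales would normally be standardized first.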

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
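k-means is one common clustering algorithm, and it groups data without any pre-existing labels. The sketch makes three simplifying assumptions. The basket data is hypothetical (items, spend) pairs. The starting centroids are fixed. It runs a fixed number of iterations rather than testing for convergence.

```python
from math import dist
from statistics import mean


def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points. No labels are needed."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid
        centroids = [
            tuple(mean(coord) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical baskets: (number of items, spend in $)
baskets = [(2, 30), (3, 25), (1, 35), (12, 150), (15, 170), (11, 160)]
centres, groups = k_means(baskets, centroids=[baskets[0], baskets[3]])
print(centres)
print(groups)
```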

121 Software Group

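Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; candidate models are built from the training set and validated against the validation set; the validated production model is applied to new data from new business to produce predictions]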
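The pipeline above can be traced end to end with a deliberately tiny model: a single cut-off on one feature. Several choices here are hypothetical and for illustration only: the churn history, the 90% promotion bar, and the every-fourth-record validation split.

```python
def clean(raw_records):
    """Clean: drop records with a missing feature value."""
    return [(v, label) for v, label in raw_records if v is not None]


def train_threshold(training):
    """Build a candidate model: the cut-off on one feature that best
    separates the labels in the training set."""
    def accuracy(threshold):
        return sum((v >= threshold) == label for v, label in training) / len(training)
    return max(sorted(v for v, _ in training), key=accuracy)


def evaluate(threshold, validation):
    """Validate: score a model on data it was not trained on."""
    return sum((v >= threshold) == label for v, label in validation) / len(validation)


# Hypothetical history from existing business: (support calls, churned?)
raw = [(calls, calls >= 5) for calls in range(10)] * 3 + [(None, True)]
records = clean(raw)

# Deterministic split: every 4th record is held back for validation
training_set = [r for i, r in enumerate(records) if i % 4 != 0]
validation_set = [r for i, r in enumerate(records) if i % 4 == 0]

candidate = train_threshold(training_set)
score = evaluate(candidate, validation_set)
print("candidate threshold:", candidate, "validation accuracy:", score)

# Promote the candidate to production only if it validates well,
# then use it to predict outcomes for new data
production_model = candidate if score >= 0.9 else None
if production_model is not None:
    new_customers = [2, 8]
    print([calls >= production_model for calls in new_customers])
```

Holding back a validation set matters because a model scored only on its own training data can look far better than it will perform on new business.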

122 Software Group

Unsupervised learning

InMaps (linkedin.com)

123 Software Group

124 Software Group

Big Data Analytics

Data Science:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: change data capture from redo logs passes through a JMS queue to the Hadoop Poster, which delivers batched HDFS file copies, audit change data, or real-time replication to HBase]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete...

Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga
• Business Intelligence: TOAD BI
• Server and Storage

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga, STATISTICA
• Business Intelligence: TOAD BI
• Server and Storage

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
   - Mobility
   - Cloud
   - Hadoop
   - Big Data Analytics
• Where is the data?
   - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
   - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics
   - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
   - SQOOP
   - Hive
   - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 35: Thriving and surviving the Big Data revolution

35 Software Group

Itrsquos not enough to lay out products on tables

bull Online has significant advantages

bull Retailers can only survive by embracing online and emulating online practicesndash Dynamic pricingndash Shelf optimizationndash Personalized service and selection

bull Only big data analytics can provide these advantages

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 36: Thriving and surviving the Big Data revolution

36 Software Group

Therersquos a similar story in every industry

Web

Transport

Power Grid

Dating

Retail

SecurityFinance

Government

Science

Healthcare

Insurance

Telecom

Advertising

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

Google Applications

MapReduce, BigTable

Google File System (GFS)

67 Software Group

Map Reduce

(Diagram: a Start step fans out to many parallel Map tasks, whose outputs converge on a single Reduce.)
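The map, shuffle and reduce steps in the diagram can be sketched in plain Python. This is a minimal single-process illustration, assuming a word-count job; the function names are hypothetical and this is not Hadoop's Java API, which runs the same steps in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Each mapper emits (word, 1) for every word in its input split.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # The framework groups every value emitted for the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Each reducer combines all values for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data beats opinions"]))
```

Because each mapper only sees its own line and each reducer only sees one key, both phases can be spread over many machines; only the shuffle needs to move data between them.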

68 Software Group

Multi-stage Map-Reduce

(Diagram: mappers read input from HDFS and scan/sort it; a second set of mappers aggregates the sorted output; a final reduce returns the result to the client.)
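A multi-stage flow can be approximated by chaining functions, where each stage consumes the previous stage's output. This is a minimal sketch, assuming hypothetical `site,count` text records: stage 1 scans and sorts by key, stage 2 aggregates each run of equal keys, and a final reduce returns the top results to the client.

```python
from itertools import groupby
from operator import itemgetter


def stage1_scan_sort(records):
    # Scan: parse each "site,count" record into a (key, value) pair.
    # Sort: order by key, as the shuffle/sort between stages would.
    pairs = [(site, int(count)) for site, count in (r.split(",") for r in records)]
    return sorted(pairs, key=itemgetter(0))


def stage2_aggregate(sorted_pairs):
    # Aggregate: groupby only merges adjacent equal keys, which is why stage 1 sorts.
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(sorted_pairs, key=itemgetter(0))
    ]


def final_reduce(aggregates, top_n=2):
    # Reduce: return the top-N keys by total to the client.
    return sorted(aggregates, key=itemgetter(1), reverse=True)[:top_n]


if __name__ == "__main__":
    records = ["google,3", "ebay,1", "google,2", "facebook,4"]
    print(final_reduce(stage2_aggregate(stage1_scan_sort(records))))
```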

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform, Load (Cleanse, Normalize, Aggregate) → Data Warehouse → Code → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Code → Cleanse → Analyse → Utilize
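The difference between the two pipelines is *when* the schema is applied. A minimal sketch, assuming hypothetical JSON-lines input and a hypothetical `SCHEMA` mapping column names to types: schema on write rejects bad rows during the load, while schema on read stores the raw text and applies whichever schema a query supplies.

```python
import json

# Hypothetical schema: column name -> type used to coerce the raw value.
SCHEMA = {"user": str, "clicks": int}


def apply_schema(line, schema):
    """Parse one raw JSON line and coerce it to the schema; raises on bad input."""
    record = json.loads(line)
    return {column: typ(record[column]) for column, typ in schema.items()}


def schema_on_write(raw_lines, schema=SCHEMA):
    # Schema on write: validate and cleanse while loading; rejected rows never reach the store.
    table = []
    for line in raw_lines:
        try:
            table.append(apply_schema(line, schema))
        except (ValueError, KeyError, TypeError):
            continue  # dropped at load time
    return table


def schema_on_read(raw_lines):
    # Schema on read: keep the raw text as-is and apply a schema only when queried.
    stored = list(raw_lines)

    def query(schema):
        for line in stored:
            try:
                yield apply_schema(line, schema)
            except (ValueError, KeyError, TypeError):
                continue  # skipped for this query only; the raw line is still stored

    return query
```

With the input `'{"user": "jane"}'`, schema on write discards the row for lacking `clicks`, whereas a later schema-on-read query that asks only for `user` still returns it.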

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
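Dividing the cluster totals by the node count gives the average configuration per node. This assumes decimal units and treats the nodes as uniform, which the slide does not state:

```latex
\frac{16\ \text{PB}}{4000\ \text{nodes}} = 4\ \text{TB/node}, \qquad
\frac{64\ \text{TB}}{4000\ \text{nodes}} = 16\ \text{GB/node}, \qquad
\frac{32000\ \text{cores}}{4000\ \text{nodes}} = 8\ \text{cores/node}
```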

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• MapReduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

(Diagram: a Hadoop client (Java, Pig, Hive) submits work to MapReduce (distributed processing), coordinated by the Job Tracker, running over HDFS (distributed storage), coordinated by the Name Node with a Secondary Name Node; every worker node runs both a Data Node and a Task Tracker.)

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

(Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; each worker node runs a Node Manager that hosts containers, and one container runs the Application Master.)
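The division of labour in YARN can be illustrated with a toy allocator. The class names mirror the diagram labels but are hypothetical, and the "most free memory" placement rule is an assumption made for simplicity; real YARN schedulers such as the Capacity or Fair Scheduler apply much richer policies. The sketch shows the two steps: the Resource Manager starts an Application Master in a container, and the Application Master then requests containers for its tasks.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    # Hypothetical model of one worker node and the containers it is hosting.
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Toy scheduler: grants a container on the node with the most free memory."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app, role, mb):
        fits = [n for n in self.nodes if n.free_mb >= mb]
        if not fits:
            return None  # request cannot be satisfied right now
        node = max(fits, key=lambda n: n.free_mb)
        node.free_mb -= mb
        node.containers.append((app, role))
        return node.name


def run_application(rm, app, tasks, am_mb=1024, task_mb=2048):
    # 1. The Resource Manager starts the Application Master in its own container.
    placements = {"am": rm.allocate(app, "application-master", am_mb)}
    # 2. The Application Master then requests one container per task.
    for i in range(tasks):
        placements[f"task-{i}"] = rm.allocate(app, f"task-{i}", task_mb)
    return placements
```

In contrast with the Hadoop 1.0 Job Tracker, per-application scheduling logic lives in the Application Master, so the Resource Manager only has to track capacity.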

77 Software Group

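Tez¹

¹ Hindi for "fast"

(Diagram: three chained MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing back to HDFS, compared with a single Tez job that runs the same map and reduce stages as one pipeline.)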
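The saving comes from not persisting intermediate results. The toy accounting below counts how many times output is written to HDFS when the same three stages run as chained jobs versus as one DAG. The stage functions and names are hypothetical stand-ins for Job 1 to Job 3, and the write counts are modelled, not measured.

```python
from typing import Callable, List, Tuple

Stage = Callable[[list], list]


def run_as_mapreduce_jobs(data: list, stages: List[Stage]) -> Tuple[list, int]:
    # Classic chain: every stage is a separate job whose output is written to HDFS
    # and read back by the next job.
    hdfs_writes = 0
    for stage in stages:
        data = stage(data)
        hdfs_writes += 1
    return data, hdfs_writes


def run_as_tez_dag(data: list, stages: List[Stage]) -> Tuple[list, int]:
    # Tez-style DAG: intermediate results flow straight to the next stage;
    # only the final output is persisted.
    for stage in stages:
        data = stage(data)
    return data, 1


# Hypothetical three-stage pipeline standing in for Job 1, Job 2 and Job 3.
EXAMPLE_STAGES: List[Stage] = [
    lambda xs: [x * 2 for x in xs],
    lambda xs: [x for x in xs if x > 2],
    lambda xs: [sum(xs)],
]
```

Both functions return the same result; only the number of round trips to HDFS differs (three versus one for this pipeline).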

78 Software Group

HBase

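A real-time database built on Hadoop

(Diagram comparing Oracle and HBase storage. Oracle: tables cached in the buffer cache, changes written through the log buffer to redo, datafiles stored on disks via ASM. HBase: tables cached in the MemStore, changes written to a write-ahead (WA) log, data persisted as HFiles in HDFS on disks.)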
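The HBase side of the diagram can be modelled as a toy write path. This is a minimal sketch, assuming a row-count flush threshold (real HBase flushes by MemStore size in bytes, and also rolls and archives the WAL, which this sketch omits). The class and method names are hypothetical and are not the HBase client API.

```python
class ToyRegionStore:
    """Toy HBase write path: WAL + MemStore, flushed to immutable HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log: every edit is appended first, for recovery
        self.memstore = {}   # in-memory buffer of recent writes
        self.hfiles = []     # immutable sorted "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a new sorted, immutable HFile and clear it.
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row):
        # Newest data wins: check the MemStore first, then HFiles from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            if row in hfile:
                return hfile[row]
        return None
```

The parallel with Oracle is roughly: the WAL plays the role of redo, the MemStore that of the buffer cache, and HFiles that of datafiles. Unlike datafiles, however, HFiles are never updated in place.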

79 Software Group

HBase Data Model

Raw data:
Name   Site             Counter
Dick   Ebay             507018
Dick   Google           690414
Jane   Google           716426
Dick   Facebook         723649
Jane   Facebook         643261
Jane   ILoveLarry.com   856767
Dick   MadBillFans.com  675230

Normalized (relational) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
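The step from the normalized tables to the HBase layout is a pivot: each name becomes a row key, and each site it visited becomes a column that exists only in that row. A minimal sketch using the slide's data; in real HBase these columns would sit inside a column family, which this sketch leaves out, and the function name is hypothetical.

```python
from collections import defaultdict

# (name, site, counter) rows copied from the slide.
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(triples):
    """Pivot (row key, column, value) triples into one sparse row per key."""
    rows = defaultdict(dict)
    for name, site, counter in triples:
        # A column only exists in a row if that row has a value for it.
        rows[name][site] = counter
    return dict(rows)


if __name__ == "__main__":
    for name, columns in to_wide_rows(VISITS).items():
        print(name, columns)
```

Dick's row ends up with four columns and Jane's with three, with no NULL placeholders for the sites they never visited. This is why the two HBase rows on the slide have different column lists.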

80 Software Group

Hive

81 Software Group

82 Software Group

(Diagram: SQL → Java → Results)

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

(Diagram: Pig Latin compared with SQL or HiveQL.)

85 Software Group

Flume and SQOOP

(Diagram: Flume loads web logs into HDFS; SQOOP loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS.)

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (alongside YARN and EC2)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: machine learning
• Shark (SQL), alongside Hive (SQL) on Map Reduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only)

• Exadata: $4,911
• Hadoop: $750
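Taking the two hardware-only figures from the chart, Exadata's cost per terabyte is roughly six and a half times Hadoop's:

```latex
\frac{\$4{,}911\ /\ \text{TB}}{\$750\ /\ \text{TB}} \approx 6.5
```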

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPU per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle Enterprise R
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

(Chart: scatter plot of sample data with a fitted trend line, f(x) = 0.971521231456065x + 0.71906459527154.)

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
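A trend line like the chart's f(x) is typically produced by ordinary least squares, the first technique in the list. This is a minimal pure-Python sketch; the sample data used to exercise it is hypothetical, not the chart's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # covariance / variance of x
    intercept = mean_y - slope * mean_x  # the line passes through the means
    return slope, intercept


def predict(model, x):
    # Extrapolate from existing data: evaluate the fitted line at a new x.
    slope, intercept = model
    return slope * x + intercept
```

For points lying exactly on y = 2x + 1, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers slope 2 and intercept 1, and `predict` extends the line to x values that are not in the data.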

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
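As an example of classification, here is a nearest-centroid classifier applied to churn risk. It is a deliberately simple technique chosen for illustration, since the slide does not name an algorithm; the features (monthly logins, support tickets), labels and training rows are hypothetical.

```python
from math import dist


def train_nearest_centroid(examples):
    """examples: list of (feature_vector, label). Returns the mean vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(model, features):
    # Pick the label whose centroid is closest in Euclidean distance.
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical training data: (monthly logins, support tickets) -> outcome.
TRAINING = [
    ([20, 0], "stays"),
    ([25, 1], "stays"),
    ([2, 5], "churns"),
    ([4, 3], "churns"),
]
```

Training reduces each class to its average customer, and a new customer is labelled with whichever average it most resembles. For example, `classify(model, [18, 1])` returns `"stays"`.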

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
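As an example of clustering, here is k-means, one common clustering algorithm (the slide does not name one). This is a minimal sketch, assuming numeric points and a fixed number of iterations rather than a convergence test. No labels are supplied; the groups come only from distances between the points.

```python
import random
from math import dist


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, clusters) after a fixed number of k-means iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (keep the old centroid if a cluster ends up empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters
```

With two well-separated groups of hypothetical points, such as `[(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]` and `k=2`, the clusters settle on the two groups.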

121 Software Group

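Supervised Machine Learning

(Diagram: raw data is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and promoted to the production model; the production model scores new data from new and existing business to produce predictions.)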
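The workflow in the diagram can be sketched end to end: clean, split, train a candidate, validate, promote, predict. The candidate model here is a deliberately simple stand-in (a threshold halfway between the two class means), because the slide does not name an algorithm. The data (`x` = monthly logins, label 1 = churned) and all function names are hypothetical.

```python
import random


def clean(raw):
    # Drop records with missing values before modelling.
    return [(x, y) for x, y in raw if x is not None and y is not None]


def split(rows, validation_fraction=0.25, seed=7):
    # Shuffle a copy, then hold out a fraction of rows as the validation set.
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train(rows):
    # Candidate model: a threshold halfway between the mean x of each class.
    churned = [x for x, y in rows if y == 1]
    stayed = [x for x, y in rows if y == 0]
    return (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2


def predict(threshold, x):
    return 1 if x < threshold else 0


def accuracy(threshold, rows):
    return sum(predict(threshold, x) == y for x, y in rows) / len(rows)


def build_production_model(raw, min_accuracy=0.9):
    training, validation = split(clean(raw))
    candidate = train(training)
    score = accuracy(candidate, validation)
    if score < min_accuracy:
        raise ValueError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate  # promoted to production


# Hypothetical raw data, including two incomplete records that cleaning removes.
RAW = [(1, 1), (2, 1), (3, 1), (4, 1), (10, 0), (11, 0), (12, 0), (13, 0), (None, 1), (7, None)]
```

The validation set is never used for training, so its accuracy estimates how the candidate will behave on new business data before the candidate is promoted.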

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics / Data Science

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small-to-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

(Diagram: change data capture reads the redo logs and sends changes through a JMS queue to a Hadoop poster, which supports batched HDFS file copy, auditing of change data, and real-time replication into HBase.)

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete...

Key components to build end-to-end BI/analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 37: Thriving and surviving the Big Data revolution

37 Software Group

The Revolution is not over yet

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 38: Thriving and surviving the Big Data revolution

38 Software Group

39 Software Group

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”

• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent

• Predictive Analytics: programs that extrapolate from existing data into the future

• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence
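A classic collective-intelligence technique is the “customers who bought this also bought” recommendation. No single customer's data says much, but counts across the crowd do. The sketch below uses item co-occurrence counts over hypothetical shopping baskets, and the helper names are hypothetical. Production recommenders typically normalize these counts (for example with cosine similarity) and run at much larger scale.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: one set of items per customer
baskets = [
    {"tent", "sleeping bag", "stove"},
    {"tent", "sleeping bag"},
    {"tent", "sleeping bag", "lantern"},
    {"tent", "stove"},
    {"sleeping bag", "stove"},
]

def co_occurrence(baskets):
    """For every ordered pair (a, b), count how many baskets contain both."""
    counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(basket, 2):
            counts[a][b] += 1
    return counts

def also_bought(item, counts, top_n=2):
    """Items most often bought together with `item`, most frequent first."""
    return [other for other, _ in counts[item].most_common(top_n)]

counts = co_occurrence(baskets)
print(also_bought("tent", counts))
```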

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: scatter plot of sample data with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
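The trend line on the chart is an ordinary least-squares fit of the form f(x) = ax + b. The sketch below applies the closed-form calculation to hypothetical sample points:

- a = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
- b = ȳ − a·x̄

Spreadsheet trend lines and R's `lm` fit the same model. The other techniques listed (curve fitting, multivariate, time series, logistic regression, CART) need more machinery.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def predict(a, b, x):
    """Extrapolate with the fitted line."""
    return a * x + b

# Hypothetical observations lying close to y = x
xs = [10, 20, 30, 40, 50]
ys = [12, 19, 31, 42, 49]
a, b = fit_line(xs, ys)
print(f"f(x) = {a:.3f}x + {b:.3f}; f(100) = {predict(a, b, 100):.1f}")
```

For these points the fit works out to a = 0.97 and b = 1.5, so extrapolating to x = 100 gives 98.5.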

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
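As an illustration of classification, the sketch below trains a tiny multinomial naive Bayes spam filter on hypothetical labelled messages and then classifies new text. Naive Bayes is just one common choice; logistic regression and decision trees such as CART are others. Word splitting here is deliberately simplistic, and the `train` and `classify` helpers are hypothetical.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label). Returns per-label word counts and label counts."""
    word_counts = {}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability, using add-one smoothing."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_messages)  # prior
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the probability
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

training = [  # hypothetical labelled examples
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train(training)
print(classify("win cheap money", *model))      # spam
print(classify("agenda for tomorrow", *model))  # ham
```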

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
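Clustering groups records without labels supplied in advance. The sketch below runs k-means (Lloyd's algorithm) over hypothetical baskets described by item count and spend. The starting centroids are picked by hand so the run is deterministic, and the features are left unscaled for brevity. Real use would normally scale the features and use a smarter initialisation such as k-means++.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points; stop when nothing moves."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for members, old in zip(clusters, centroids):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical baskets: (number of items, total spend in $)
baskets = [(2, 15.0), (3, 18.0), (1, 12.0), (20, 210.0), (22, 190.0), (18, 200.0)]
centroids, clusters = kmeans(baskets, centroids=[baskets[0], baskets[3]])
print(centroids)
```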

121 Software Group

Supervised Machine Learning

Diagram: Existing Business → Raw Data → Clean → Training Set → Model → Candidate Model → Validate (against Validation Set) → Production Model; New Business → New Data → Production Model → Prediction
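The supervised-learning flow above runs in five steps:

1. Clean the raw data.
2. Split it into a training set and a validation set.
3. Fit a candidate model.
4. Validate the candidate.
5. Promote it to production to score new data.

Everything in the sketch below is hypothetical: the records, the helper names, the 0.8 accuracy bar, and the deliberately trivial threshold model. That model stands in for whatever learning algorithm is actually used.

```python
import random

def clean(raw):
    """Drop records with missing values (a stand-in for real data cleansing)."""
    return [(x, y) for x, y in raw if x is not None and y is not None]

def split(records, validation_fraction=0.25, seed=42):
    """Shuffle reproducibly, then hold out part of the data for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def train_threshold(training):
    """Candidate model: predict 1 when x exceeds the midpoint of the two class means."""
    pos = [x for x, y in training if y == 1]
    neg = [x for x, y in training if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > threshold else 0

def accuracy(model, data):
    """Fraction of records the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical existing-business data: (monthly support calls, churned?)
raw = [(1, 0), (2, 0), (None, 1), (3, 0), (8, 1), (9, 1), (2, 0), (10, 1),
       (1, 0), (7, 1), (3, 0), (9, 1)]
training_set, validation_set = split(clean(raw))
candidate = train_threshold(training_set)

if accuracy(candidate, validation_set) >= 0.8:  # validate before promoting
    production_model = candidate
    print([production_model(x) for x in (2, 9)])  # predictions for new data
else:
    print("candidate rejected; retrain or revisit the data")
```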

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search Optimization

• Recommendation Systems

• Security: vulnerability and penetration detection

• Fraud Detection

• CRM: churn, defaults

• Medical: risk analysis, diagnosis, prognosis

• Game optimization

• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram: Redo-logs → Change Data Capture → JMS Queue → Hadoop Poster → Batched HDFS File Copy / Audit Change Data / HBase Real-Time replication
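In this architecture, changes captured from the redo logs flow through a JMS queue to a Hadoop poster. The poster can write batched HDFS files, keep an audit trail of change data, or apply changes to HBase in real time. The sketch below illustrates the last two ideas. It assumes change records arrive as simple (operation, key, columns) tuples. The record format and `apply_changes` are hypothetical and are not SharePlex's actual message format or API.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change-record format: (operation, row key, changed column values)
Change = Tuple[str, str, Optional[Dict[str, str]]]

changes: List[Change] = [  # in commit order
    ("INSERT", "cust:1", {"name": "Dick", "city": "Austin"}),
    ("INSERT", "cust:2", {"name": "Jane", "city": "Boston"}),
    ("UPDATE", "cust:1", {"city": "Denver"}),
    ("DELETE", "cust:2", None),
]

def apply_changes(changes, table, audit_log):
    """Replay changes against a key-value table (an HBase-style replica)
    and append every change to an audit trail."""
    for op, key, columns in changes:
        audit_log.append((op, key, columns))  # full history is retained
        if op == "INSERT":
            table[key] = dict(columns)
        elif op == "UPDATE":
            table.setdefault(key, {}).update(columns)  # only changed columns arrive
        elif op == "DELETE":
            table.pop(key, None)
        else:
            raise ValueError(f"unknown operation {op!r}")

replica, audit = {}, []
apply_changes(changes, replica, audit)
print(replica)     # current state of the replica
print(len(audit))  # number of changes kept for auditing
```

The replica holds only the latest state of each row, while the audit list keeps every change. That contrast is the difference between real-time replication and auditing change data.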

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Key components to build end-to-end BI/Analytics solutions

Dell’s offering was not complete…

Components: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Components: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Key components to build end-to-end BI/Analytics solutions

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 40: Thriving and surviving the Big Data revolution

40 Software Group

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

41 Software Group

42 Software Group

Willy Bowman

Nationality German

Don’t Mention the WAR

43 Software Group

Buying choices

Amazon softcover $45.99

Oracle Performance Survival Guide

Amazon Kindle $39.99

Say “screw you bookseller” to buy Kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

“Siri, call me an ambulance”

From now on, I’ll call you ‘An Ambulance’. OK?

“I want to jump off a bridge”

I found 14 bridges nearby

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse/temp monitor
• Silent alarms
• Pedometer/sleep monitoring
• Compass
• Camera
• Mike/earphones
• Heads-up display
• Emotion/Attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

[Diagram: Google Applications running on MapReduce and BigTable, on top of the Google File System (GFS)]

Google Software Architecture

67 Software Group

[Diagram: Start → many parallel Map tasks → Reduce]

Map Reduce
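The map and reduce stages on the slide can be illustrated with the classic word-count job. This is a minimal single-process Python sketch, not Hadoop's Java API: the function names `map_phase`, `shuffle` and `reduce_phase` are illustrative assumptions, and the "cluster" is just a generator pipeline.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all emitted values by key, as the framework
    # does between the map and reduce stages.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values collected for one key.
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    # In Hadoop each input split would be handled by a mapper on a
    # different node; here everything runs in one Python process.
    emitted = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(emitted).items())


if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data beats ideas"]))
```

In a real cluster the shuffle moves data across the network between nodes; the sketch only preserves the shape of the data flow.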

68 Software Group

[Diagram: HDFS → 8 mappers → scan/sort → 4 mappers → aggregate → reduce → client]

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

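[Diagram: Schema on Write: data is extracted, cleansed, normalized and aggregated by code, transformed and loaded into a data warehouse before it can be analysed and utilized. Schema on Read: data is loaded into Hadoop as-is; code cleanses and analyses it when it is read, and the results are utilized.]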
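Schema on read can be sketched as storing raw records untouched and applying types and cleansing only in the query code. The minimal Python example below assumes a hypothetical comma-separated log of (day, name, site, hits); the field names and sample rows are made up for illustration.

```python
import csv
import io
from datetime import date

# Raw data is loaded "as is": nothing is validated or typed at load time.
RAW_LOG = """2014-04-08,dick,ebay,3
2014-04-08,jane,google,5
not-a-date,jane,facebook,oops
2014-04-09,dick,google,2
"""


def read_with_schema(raw: str):
    # Schema on read: types are applied, and bad records dropped, only
    # when the data is queried, not when it is stored.
    for record in csv.reader(io.StringIO(raw)):
        try:
            day, name, site, hits = record
            yield {"day": date.fromisoformat(day), "name": name,
                   "site": site, "hits": int(hits)}
        except ValueError:
            continue  # cleanse at read time: skip malformed rows


def hits_per_name(raw: str) -> dict[str, int]:
    # A simple "analyse" step over the typed view of the raw data.
    totals: dict[str, int] = {}
    for row in read_with_schema(raw):
        totals[row["name"]] = totals.get(row["name"], 0) + row["hits"]
    return totals
```

With schema on write, the malformed third row would have been rejected by the load process; here it sits in storage and is simply skipped by this particular reader.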

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 1.0 Architecture

[Diagram: Hadoop client (Java, Pig, Hive); MapReduce (distributed processing) coordinated by a Job Tracker; HDFS (distributed storage) coordinated by a Name Node and a Secondary Name Node; many worker nodes, each running a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0 YARN

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager, which allocates containers on Node Managers; one container runs the Application Master]

Yet Another Resource Negotiator

77 Software Group

Tez¹

¹ Hindi for “fast”

[Diagram: the same MAP and REDUCE stages run either as three chained MapReduce jobs (Job 1, Job 2, Job 3) or as a single Tez job (Job 1), reading from and writing to HDFS]

78 Software Group

HBase

A real-time database built on Hadoop

[Diagram: Oracle (buffer cache, log buffer, redo, datafiles on ASM disks) compared with HBase (MemStore, write-ahead log, HFiles on HDFS disks)]
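The HBase side of the diagram (MemStore, write-ahead log, HFiles) can be modelled in a few lines. This toy Python class assumes a heavily simplified write path: no log replay, no compactions, no column families. The class name and `flush_threshold` parameter are hypothetical, not part of any HBase API.

```python
import bisect


class MiniRegionStore:
    """Toy model of an HBase-style write path (illustrative only).

    Writes go to a write-ahead log and an in-memory MemStore; when the
    MemStore fills up it is flushed to an immutable, sorted "HFile".
    """

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: list[tuple[str, str]] = []            # append-only log for recovery
        self.memstore: dict[str, str] = {}              # recent writes, in memory
        self.hfiles: list[list[tuple[str, str]]] = []   # sorted, immutable files
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))   # 1. log first, for durability
        self.memstore[key] = value      # 2. then update memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the MemStore out as a new sorted file and start afresh.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key: str):
        if key in self.memstore:
            return self.memstore[key]
        # Search the newest file first so later writes win.
        for hfile in reversed(self.hfiles):
            keys = [k for k, _ in hfile]
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return hfile[i][1]
        return None
```

The design mirrors the Oracle side of the slide: the log plays the role of redo, and the MemStore plays the role of the buffer cache, but flushed data is never updated in place.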

79 Software Group

Relational, single table:

Name   Site              Counter
Dick   Ebay              507018
Dick   Google            690414
Jane   Google            716426
Dick   Facebook          723649
Jane   Facebook          643261
Jane   ILoveLarry.com    856767
Dick   MadBillFans.com   675230

Relational, normalized:

NameId   Name
1        Dick
2        Jane

SiteId   SiteName
1        Ebay
2        Google
3        Facebook
4        ILoveLarry.com
5        MadBillFans.com

NameId   SiteId   Counter
1        1        507018
1        2        690414
2        2        716426
1        3        723649
2        3        643261
2        4        856767
1        5        675230

HBase, one wide row per name:

Id   Name   Ebay     Google   Facebook   (other columns)   MadBillFans.com
1    Dick   507018   690414   723649                       675230

Id   Name   Google   Facebook   (other columns)   ILoveLarry.com
2    Jane   716426   643261                       856767

HBase Data Model
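The wide-row layout in the last two tables can be produced from the relational rows by pivoting on the row key. This plain-Python sketch covers the data model only, assuming a dict of dicts stands in for an HBase table; it does not use any HBase client API.

```python
from collections import defaultdict

# Relational rows from the slide: (name, site, counter).
VISITS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    # HBase-style model: one row per row key (the name), one column per
    # site. Rows are sparse: a column exists only if it holds a value.
    table: dict[str, dict[str, int]] = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)
```

Each name becomes a single row whose column set differs from its neighbours', which is why Dick's row has an Ebay column and Jane's does not.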

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
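Hive's role on the slide (SQL in, Java MapReduce jobs out) can be illustrated by hand-translating a simple aggregate. The sketch assumes a hypothetical `visits(name, site, counter)` table holding a few rows from the HBase data model slide; real Hive query plans contain more operators than a single map and reduce.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical table visits(name, site, counter).
VISITS = [
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
]


def mapper(row):
    # SELECT site, counter ...: emit the GROUP BY key with its value.
    _name, site, counter = row
    yield site, counter


def reducer(site, counters):
    # ... SUM(counter) ... GROUP BY site
    return site, sum(counters)


def run_query(rows):
    # Rough equivalent of: SELECT site, SUM(counter) FROM visits GROUP BY site
    groups = defaultdict(list)
    for site, counter in chain.from_iterable(mapper(r) for r in rows):
        groups[site].append(counter)   # shuffle: group values by key
    return dict(reducer(site, vals) for site, vals in groups.items())
```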

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Figure: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• YARN, EC2
• Mesos – heterogeneous cluster manager
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – machine learning
• MapReduce
• Shark (SQL), Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop $/TB (hardware only): Exadata $4,911, Hadoop $750]
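The per-terabyte figures on the chart scale linearly with capacity. A small Python calculation, assuming a hypothetical 100 TB requirement and counting hardware cost only, makes the gap concrete.

```python
# $/TB hardware-only figures read from the "Economies" chart.
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750


def hardware_cost(terabytes: float, per_tb: float) -> float:
    # Linear model: no volume discounts, software, or support costs.
    return terabytes * per_tb


if __name__ == "__main__":
    tb = 100  # hypothetical capacity requirement
    exa = hardware_cost(tb, EXADATA_PER_TB)
    hdp = hardware_cost(tb, HADOOP_PER_TB)
    print(f"Exadata ${exa:,.0f}, Hadoop ${hdp:,.0f}, ratio {exa / hdp:.1f}x")
```

At these figures Exadata hardware costs roughly 6.5 times as much per terabyte as Hadoop hardware.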

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2×6-core CPUs per node (216 cores total)
  – 12×2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

Machine Learning: Programs that evolve with “experience”

Collective Intelligence: Programs that use inputs from “crowds” to seem intelligent

Predictive Analytics: Programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot (x from 0 to 120, y from -20 to 120) with fitted linear trend line f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
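The trend line on the chart is the kind of result an ordinary least-squares linear regression produces. The chart's data points cannot be recovered from the slide, so this pure-Python sketch is exercised on made-up points.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); undefined if all xs are equal.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    # Extrapolate from the fitted line to a new x value.
    return slope * x + intercept
```

For example, the points (0, 1), (1, 3), (2, 5), (3, 7) fit exactly to slope 2 and intercept 1, so `predict(10, 2, 1)` extrapolates to 21.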

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
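One of the simplest ways to create a model that classifies new data is a nearest-centroid classifier. This Python sketch is a generic illustration, not the method used by any product named in this deck; the churn features (months as a customer, support calls) and the training rows are hypothetical.

```python
import math

Point = tuple[float, ...]


def train_centroids(examples: list[tuple[Point, str]]) -> dict[str, Point]:
    # "Create a model": average the feature vectors of each labelled class.
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc)
            for label, acc in sums.items()}


def classify(model: dict[str, Point], features: Point) -> str:
    # "Classify new data": pick the class whose centroid is nearest.
    return min(model, key=lambda label: math.dist(model[label], features))
```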

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
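k-means is a common way to group data without a pre-existing classification scheme. The sketch below implements plain k-means (Lloyd's algorithm) on 2-D points. Basket analysis, the slide's example, is often done with association rules instead, so k-means is a generic stand-in here.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iterations: int = 20, seed: int = 0):
    """Plain k-means: group points into k clusters with no labels."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initial guesses
    clusters: list[list[Point]] = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster ends up empty).
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters
```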

121 Software Group

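Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; a candidate model built from the training set is validated against the validation set to become the production model, which turns new data from new business into predictions]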
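The supervised workflow in the diagram (split existing data into training and validation sets, build a candidate model, check it on held-out data) can be sketched in a few lines. The one-feature threshold model on support calls is a hypothetical stand-in for a real learner.

```python
import random


def split(rows, validation_fraction=0.3, seed=42):
    # Split existing, labelled business data into training and validation sets.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_threshold(training):
    # Candidate model: predict churn when support calls >= threshold,
    # choosing the threshold that is most accurate on the training set.
    candidates = sorted({calls for calls, _ in training})

    def accuracy(t):
        return sum((calls >= t) == churned for calls, churned in training)

    return max(candidates, key=accuracy)


def validate(threshold, validation):
    # Score the candidate on held-out data before promoting it to production.
    correct = sum((calls >= threshold) == churned for calls, churned in validation)
    return correct / len(validation)
```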

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: change data capture from redo logs feeds a JMS queue and a Hadoop poster, which deliver batched HDFS file copies, audit change data, and real-time replication to HBase]
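The change-data-capture flow in the diagram comes down to replaying an ordered stream of captured changes against a target store. SharePlex's internal record format is not described here, so the `Change` record and `apply_changes` function below are hypothetical and only illustrate the replay idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    # One captured change, in commit order (hypothetical record format).
    op: str                      # "insert", "update" or "delete"
    key: int
    row: Optional[dict] = None   # full row image for insert/update


def apply_changes(target: dict[int, dict], changes: list[Change]) -> dict[int, dict]:
    # Replay captured changes, in order, against a copy of the target,
    # as a replication "poster" might against HBase or HDFS.
    result = {k: dict(v) for k, v in target.items()}
    for change in changes:
        if change.op in ("insert", "update"):
            result[change.key] = dict(change.row or {})
        elif change.op == "delete":
            result.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")
    return result
```

Order matters: replaying the same changes out of sequence can resurrect deleted rows, which is why capture from the redo log preserves commit order.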

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Key components to build end-to-end BI/Analytics solutions

Dell’s offering was not complete…

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD & SharePlex

TOAD BI

Boomi

Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD & SharePlex

TOAD BI

Boomi

Kitenga

Key components to build end-to-end BI/Analytics solutions

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 42: Thriving and surviving the Big Data revolution

42 Software Group

Willy Bowman

Nationality German

Donrsquot Mention the WAR

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 43: Thriving and surviving the Big Data revolution

43 Software Group

Buying choices

Amazon softcover $4599

Oracle Performance Survival Guide

Amazon Kindle $3999

Say ldquoscrew you booksellerrdquo to buy kindle version

44 Software Group

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

Multi-stage Map-Reduce

[Diagram: data read from HDFS by a first tier of MAPPER tasks (scan/sort), whose output feeds a second tier of MAPPER tasks (aggregate), followed by a REDUCE step that returns results to the client]
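A multi-stage pipeline is several MapReduce jobs in which each job's output becomes the next job's input. The sketch below assumes a hypothetical `run_job` helper that simulates one job in memory, then chains two jobs: the first counts words, the second routes every count to a single reducer so it can rank the top `n`. On a real cluster each intermediate result would be written to HDFS between jobs.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable


def run_job(records: Iterable[Any],
            mapper: Callable[[Any], Iterable[tuple[str, Any]]],
            reducer: Callable[[str, list[Any]], Iterable[Any]]) -> list[Any]:
    """Simulate one MapReduce job: map every record, group values by key, reduce each group."""
    groups: dict[str, list[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output: list[Any] = []
    for key in sorted(groups):  # process keys in sorted order, loosely mirroring Hadoop's sort phase
        output.extend(reducer(key, groups[key]))
    return output


def top_words(lines: list[str], n: int) -> list[tuple[int, str]]:
    # Job 1: word count. On a cluster its output would be written to HDFS.
    counts = run_job(
        lines,
        mapper=lambda line: [(word.lower(), 1) for word in line.split()],
        reducer=lambda word, ones: [(word, sum(ones))],
    )
    # Job 2: reads job 1's output and sends every (word, count) to one key,
    # so a single reducer can rank them (highest count first, ties alphabetical).
    return run_job(
        counts,
        mapper=lambda word_count: [("all", (word_count[1], word_count[0]))],
        reducer=lambda _key, pairs: sorted(pairs, key=lambda p: (-p[0], p[1]))[:n],
    )


if __name__ == "__main__":
    print(top_words(["to be or not to be", "to do"], 2))  # [(3, 'to'), (2, 'be')]
```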

69 Software Group

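Schema on Read vs Schema on Write

[Diagram: Schema on Write: data is extracted, transformed (coded, cleansed, normalized, aggregated) and loaded into a data warehouse before it can be analysed and utilized. Schema on Read: data is loaded into Hadoop as-is, and coding, cleansing and analysis happen when the data is utilized]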
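The difference is when structure is imposed on the data. A small Python sketch, assuming comma-separated `customer,amount` records (a made-up format): schema on write parses and validates at load time and rejects the whole load on bad data, while schema on read keeps the raw lines and lets each query apply its own schema.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Sale:
    customer: str
    amount: float


def load_schema_on_write(raw: str) -> list[Sale]:
    """Schema on write: parse and type every row at load time; bad data fails the load."""
    sales = []
    for customer, amount in csv.reader(io.StringIO(raw)):  # ValueError if a row has the wrong shape
        sales.append(Sale(customer.strip(), float(amount)))  # ValueError if amount is not numeric
    return sales


def load_schema_on_read(raw: str) -> list[str]:
    """Schema on read: keep the raw lines untouched; nothing is parsed at load time."""
    return raw.splitlines()


def total_by_customer(lines: list[str]) -> dict[str, float]:
    """A query that applies its own schema as it reads, skipping rows that don't fit."""
    totals: dict[str, float] = {}
    for row in csv.reader(lines):
        try:
            customer, amount = row[0].strip(), float(row[1])
        except (IndexError, ValueError):
            continue  # another query could interpret these rows differently
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


if __name__ == "__main__":
    raw = "alice,10\nbob,5\nnot a sale\nalice,2.5"
    print(total_by_customer(load_schema_on_read(raw)))
    try:
        load_schema_on_write(raw)
    except ValueError as err:
        print("schema-on-write load rejected:", err)
```

The trade-off: schema on write pays the modelling cost up front and gives clean, fast queries; schema on read makes loading trivial but pushes interpretation (and its errors) into every query.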

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses MapReduce (distributed processing), coordinated by the Job Tracker, and HDFS (distributed storage), coordinated by the Name Node with a Secondary Name Node; each worker node runs a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; an Application Master runs the job in containers, each hosted by a Node Manager on a worker node]

77 Software Group

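Tez¹

¹ Hindi for “fast”

[Diagram: three chained MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing back to HDFS, compared with a single Tez job that runs the same map and reduce steps as one pipeline]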

78 Software Group

HBase

A real-time database built on Hadoop

[Diagram: Oracle's storage architecture (tables, buffer cache, log buffer, redo, datafiles, ASM, disks) alongside HBase's (tables, MemStore, write-ahead log, HFiles, HDFS, disks)]
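The diagram's point is that HBase, like Oracle, logs each change before caching it in memory. A toy Python model of that write path follows, assuming a tiny flush threshold and ignoring column families, timestamps, compaction, WAL truncation and HDFS entirely: each put is appended to a write-ahead log, then buffered in a MemStore; when the MemStore fills it is flushed as an immutable, sorted "HFile". Reads check the MemStore first, then HFiles from newest to oldest. `ToyRegion` and its methods are hypothetical, not the HBase client API.

```python
import bisect
from typing import Optional


class ToyRegion:
    """A toy model of an HBase-style write path: WAL, MemStore and immutable sorted HFiles."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: list[tuple[str, str]] = []            # write-ahead log, append-only
        self.memstore: dict[str, str] = {}              # recent writes, held in memory
        self.hfiles: list[list[tuple[str, str]]] = []   # flushed files, oldest first, sorted by key
        self.flush_threshold = flush_threshold

    def put(self, row_key: str, value: str) -> None:
        self.wal.append((row_key, value))  # log first; in HBase this lets a write survive a crash
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new immutable file sorted by row key."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key: str) -> Optional[str]:
        if row_key in self.memstore:         # the newest data lives in memory
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):  # then search files from newest to oldest
            keys = [key for key, _ in hfile]
            i = bisect.bisect_left(keys, row_key)  # files are sorted, so binary search works
            if i < len(keys) and keys[i] == row_key:
                return hfile[i][1]
        return None
```

Writes never modify a file in place, which is why HBase can sustain high write rates on an append-only file system like HDFS.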

79 Software Group

HBase Data Model

Flat data (one row per visit):
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) design:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase design (one wide row per person, one column per site visited):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
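The slide contrasts a normalized relational design with HBase's sparse "wide row" model. A minimal Python sketch of the same pivot, using the slide's visit data: each person becomes one row, and each site that person visited becomes a column. Plain dictionaries stand in for HBase rows and column qualifiers; no HBase API is used.

```python
from collections import defaultdict

# The visit data from the slide: (name, site, counter).
VISITS = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits: list[tuple[str, str, int]]) -> dict[str, dict[str, int]]:
    """Pivot (name, site, counter) tuples into {name: {site: counter}}.

    Each person becomes one row and each site a column; a column exists only in
    rows that have a value for it, so Dick and Jane end up with different columns.
    """
    rows: dict[str, dict[str, int]] = defaultdict(dict)
    for name, site, counter in visits:
        rows[name][site] = counter
    return dict(rows)


if __name__ == "__main__":
    for name, columns in to_wide_rows(VISITS).items():
        print(name, columns)
```

Reading everything about one person becomes a single-row lookup with no joins, at the cost of the relational design's flexibility for ad hoc queries across sites.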

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
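Hive's job is to turn SQL into Java MapReduce code. As a rough illustration (not what Hive actually generates), the query `SELECT site, SUM(counter) FROM visits GROUP BY site` decomposes into a map that emits the GROUP BY column as the key and a reduce that applies SUM to each group. The table name `visits` and its columns are hypothetical, borrowed from the HBase data model slide.

```python
from collections import defaultdict
from typing import Iterable


def group_by_sum(rows: Iterable[tuple[str, str, int]]) -> dict[str, int]:
    """Equivalent of: SELECT site, SUM(counter) FROM visits GROUP BY site."""
    # Map: for each (name, site, counter) row, emit (GROUP BY column, value to aggregate).
    mapped = ((site, counter) for _name, site, counter in rows)
    # Shuffle: the framework brings all values for the same key together.
    groups: dict[str, list[int]] = defaultdict(list)
    for site, counter in mapped:
        groups[site].append(counter)
    # Reduce: apply the aggregate function (SUM) to each group.
    return {site: sum(values) for site, values in groups.items()}


if __name__ == "__main__":
    print(group_by_sum([("Dick", "Google", 690414), ("Jane", "Google", 716426), ("Dick", "Ebay", 507018)]))
```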

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Cluster managers: YARN, EC2, and Mesos (heterogeneous cluster manager)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib: machine learning
• MapReduce
• Shark (SQL), Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile, back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4,911; Hadoop: $750]
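At the chart's hardware-only prices the gap is roughly 6.5 to 1. A back-of-the-envelope Python calculation, assuming cost scales linearly with capacity and ignoring software licences, power, administration and any replication overhead; the 100 TB capacity is a hypothetical example:

```python
EXADATA_PER_TB = 4911  # USD per TB, hardware only (from the chart)
HADOOP_PER_TB = 750    # USD per TB, hardware only (from the chart)


def hardware_cost(terabytes: float, price_per_tb: float) -> float:
    """Linear cost model: capacity times price per TB."""
    return terabytes * price_per_tb


if __name__ == "__main__":
    tb = 100  # hypothetical capacity
    exadata = hardware_cost(tb, EXADATA_PER_TB)
    hadoop = hardware_cost(tb, HADOOP_PER_TB)
    print(f"{tb} TB: Exadata ${exadata:,.0f} vs Hadoop ${hadoop:,.0f} ({exadata / hadoop:.1f}x)")
    # 100 TB: Exadata $491,100 vs Hadoop $75,000 (6.5x)
```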

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 × 6-core CPUs per node (216 cores total)
  - 12 × 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to the data centre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted regression line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
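The fitted line on the chart, f(x) = 0.9715x + 0.7191, is the kind of result ordinary least-squares linear regression produces. The chart's underlying data is not available, so this minimal Python sketch uses made-up points; it computes the slope and intercept from the closed-form formulas slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x), then extrapolates to a new x.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate: apply the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up sample data
    ys = [11.0, 19.5, 30.2, 39.8, 49.6]
    slope, intercept = fit_line(xs, ys)
    print(f"f(x) = {slope:.4f}x + {intercept:.4f}; f(120) = {predict(slope, intercept, 120.0):.1f}")
```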

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
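One of the simplest ways to create a model that classifies new data is a nearest-centroid classifier: average the feature vectors of each class, then assign new data to the closest average. A minimal Python sketch, assuming hypothetical two-number spam features (links per message, exclamation marks); real spam or churn models would use many more features and a stronger algorithm.

```python
from math import dist

Features = tuple[float, ...]


def train_centroids(examples: list[tuple[Features, str]]) -> dict[str, Features]:
    """Training: compute the mean feature vector (centroid) of each class label."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for features, label in examples:
        totals = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            totals[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(t / counts[label] for t in totals) for label, totals in sums.items()}


def classify(centroids: dict[str, Features], features: Features) -> str:
    """Prediction: assign new data to the class with the nearest centroid (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features: (links per message, exclamation marks)
    training = [((8.0, 5.0), "spam"), ((10.0, 7.0), "spam"), ((1.0, 0.0), "ham"), ((0.0, 1.0), "ham")]
    model = train_centroids(training)
    print(classify(model, (7.0, 4.0)), classify(model, (1.0, 1.0)))  # spam ham
```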

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
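A common starting point for grouping unlabelled data is k-means clustering. A minimal Python sketch of Lloyd's algorithm on 2-D points, assuming k is chosen in advance, initial centroids are k randomly sampled input points, and a fixed iteration count stands in for a convergence test:

```python
import random
from math import dist

Point = tuple[float, float]


def k_means(points: list[Point], k: int, iterations: int = 20, seed: int = 0) -> list[int]:
    """Group points into k clusters with Lloyd's algorithm; returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k input points chosen at random
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return assignment


if __name__ == "__main__":
    baskets = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(k_means(baskets, 2))
```

Unlike the classifier, nothing here says what the clusters mean; interpreting them (for example, as shopper segments) is left to the analyst.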

121 Software Group

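Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and becomes the production model; new data from new business is fed to the production model to produce a prediction]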
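The supervised pipeline on this slide can be sketched in Python with a deliberately tiny model: a single cut-off on one feature. The churn framing (days since last login) and all names are hypothetical, and a deterministic hold-out (every fourth example) replaces a random split to keep the sketch reproducible. The point is the shape of the process: split labelled history, train a candidate, validate it on held-out data, and only then promote it to score new data.

```python
Example = tuple[float, bool]  # (feature, label), e.g. (days since last login, churned?)


def split(data: list[Example], every: int = 4) -> tuple[list[Example], list[Example]]:
    """Hold out every `every`-th example for validation; the rest is the training set."""
    training = [ex for i, ex in enumerate(data) if i % every != every - 1]
    validation = [ex for i, ex in enumerate(data) if i % every == every - 1]
    return training, validation


def accuracy(threshold: float, examples: list[Example]) -> float:
    """Fraction of examples where the prediction 'feature >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in examples) / len(examples)


def train(training: list[Example]) -> float:
    """Candidate model: try each observed feature value as a cut-off and keep the most accurate."""
    return max(sorted({x for x, _ in training}), key=lambda t: accuracy(t, training))


def build_production_model(data: list[Example], min_accuracy: float = 0.8) -> float:
    training, validation = split(data)
    candidate = train(training)
    # Validate on examples the model never saw; only promote it if it generalizes well enough.
    if accuracy(candidate, validation) < min_accuracy:
        raise RuntimeError("candidate model failed validation")
    return candidate


if __name__ == "__main__":
    # Made-up history in which customers inactive for 30+ days churned.
    history = [(float(days), days >= 30) for days in range(0, 60, 3)]
    threshold = build_production_model(history)
    print(threshold, [days >= threshold for days in (5.0, 45.0)])  # 30.0 [False, True]
```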

122 Software Group

InMaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science:
• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: change data capture from redo logs, posted through a JMS queue and a Hadoop poster as batched HDFS file copies, audit change data, or real-time replication into HBase]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell’s offering was not complete…

[Diagram: key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, and Server and Storage, with Dell products TOAD & SharePlex, TOAD BI, Boomi and Kitenga]

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

[Diagram: key components to build end-to-end BI/analytics solutions (Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage), now with STATISTICA added alongside TOAD & SharePlex, TOAD BI, Boomi and Kitenga]

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 45: Thriving and surviving the Big Data revolution

45 Software Group

Data Input

46 Software Group

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
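As a minimal illustration of "create a model that classifies new data", here is a k-nearest-neighbour churn classifier. The features, labels and k=3 are hypothetical choices. A real model would also scale the features so that no single feature dominates the distance.

```python
from collections import Counter
from math import dist

# Hypothetical labelled customers: (months_as_customer, support_calls) -> label.
TRAINING = [
    ((24, 0), "stay"), ((36, 1), "stay"), ((30, 2), "stay"),
    ((3, 5), "churn"), ((6, 4), "churn"), ((2, 6), "churn"),
]


def classify(point, training=TRAINING, k=3):
    """Label a new point by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda example: dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((4, 5)))   # churn: short tenure, many support calls
print(classify((28, 1)))  # stay: long tenure, few support calls
```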

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
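k-means is a common way to "group data without a pre-existing classification scheme". The sketch below is a minimal version applied to hypothetical basket summaries. Note that market-basket analysis is often done with association-rule mining instead; k-means is used here only as a generic clustering example.

```python
import random
from math import dist


def kmeans(points, k, max_iterations=20, seed=0):
    """Plain k-means: assign points to the nearest centroid, move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilised
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical shopping baskets: (basket value in $, number of items).
baskets = [(10, 1), (12, 2), (11, 1), (95, 20), (100, 22), (90, 18)]
centroids, clusters = kmeans(baskets, k=2)
for centre, members in zip(centroids, clusters):
    print(tuple(round(c, 1) for c in centre), members)
```

No labels are supplied. The "small basket" and "big basket" groups emerge from the data alone.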

121 Software Group

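Supervised Machine Learning

(Diagram: raw data from existing business is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is validated against the validation set and then promoted to the production model. New data from new business is fed to the production model to make a prediction.)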
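The workflow in the diagram can be sketched end to end. The following is a minimal illustration built on several assumptions. The "model" is a hypothetical one-feature threshold rule. The validation set is every third cleaned record, a deterministic stand-in for the random split you would normally use. The data is an invented toy set.

```python
def accuracy(model, examples):
    """Fraction of (x, label) examples that the model labels correctly."""
    return sum(model(x) == label for x, label in examples) / len(examples)


def train_candidate(training):
    """Candidate model: the threshold t (taken from training x values) whose rule
    'predict 1 when x >= t' scores best on the training set."""
    thresholds = sorted({x for x, _ in training})
    best_t = max(thresholds, key=lambda t: accuracy(lambda x: int(x >= t), training))
    return lambda x: int(x >= best_t)


def build_production_model(raw, min_accuracy=0.9):
    """Clean -> split -> train a candidate -> validate -> promote (or reject)."""
    clean = [(x, label) for x, label in raw if x is not None]     # Clean
    validation = [ex for i, ex in enumerate(clean) if i % 3 == 0]  # Validation set
    training = [ex for i, ex in enumerate(clean) if i % 3 != 0]    # Training set
    candidate = train_candidate(training)                          # Candidate model
    score = accuracy(candidate, validation)                        # Validate
    if score < min_accuracy:
        raise ValueError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate, score                                        # Production model


# Hypothetical "existing business" records: (score, outcome), plus one dirty row.
raw = [(x, int(x >= 50)) for x in range(0, 100, 5)] + [(None, 1)]
model, score = build_production_model(raw)
print(score)                 # 1.0 on this cleanly separable toy data
print(model(72), model(12))  # predictions for "new business" data: 1 0
```

The validation gate is the important design point. A candidate that does not generalise to held-out data is never promoted to production.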

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

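SharePlex® for Hadoop

(Diagram: change data capture reads the redo logs and passes changes through a JMS queue to a Hadoop poster, which delivers them either as batched HDFS file copies of audit change data or as real-time replication into HBase.)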

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell’s offering was not complete…

Key components to build end-to-end BI Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings shown: Server and Storage; TOAD & SharePlex; TOAD BI; Boomi; Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings shown: Server and Storage; TOAD & SharePlex; TOAD BI; Boomi; Kitenga; STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring – integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

C14LV

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


46 Software Group

Siri

“Siri, call me an ambulance.”

“From now on, I’ll call you ‘An Ambulance’, OK?”

“I want to jump off a bridge.”

“I found 14 bridges nearby.”

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth personal area network
• 3G/Wi-Fi wide area network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture (layers, top to bottom)

Google Applications

Map Reduce, BigTable

Google File System (GFS)

67 Software Group

Map Reduce

(Diagram: from Start, the work fans out to many Map tasks running in parallel, whose outputs feed a Reduce step.)
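The map, shuffle and reduce phases can be simulated in a single Python process. This is a minimal sketch of the classic word-count example, not Hadoop's Java API. On a cluster, each call to `map_phase` would run as a separate mapper on its own input split, and the framework itself would perform the shuffle.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values emitted for one key."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)  # parallel on a real cluster
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


print(map_reduce(["big data big", "data revolution"]))
# {'big': 2, 'data': 2, 'revolution': 1}
```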

68 Software Group

Multi-stage Map-Reduce

(Diagram: data is read from HDFS by a first stage of mappers that scan and sort, then by a second stage of mappers that aggregate, and finally by a reduce step that returns results to the client.)
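Some questions need more than one pass, because one job's output becomes the next job's input. The sketch below chains two simulated jobs over hypothetical "user,site" log lines. Job 1 counts visits per (user, site), and Job 2 uses that output to count distinct users per site. In Hadoop, the intermediate result would be written to HDFS between the jobs.

```python
from collections import defaultdict


def run_job(records, mapper, reducer):
    """One MapReduce pass: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


def visits_mapper(line):
    """Job 1 mapper: one raw 'user,site' log line -> ((user, site), 1)."""
    user, site = line.strip().split(",")
    yield (user, site), 1


def site_mapper(record):
    """Job 2 mapper: each distinct (user, site) pair from Job 1 -> (site, 1)."""
    (_user, site), _visits = record
    yield site, 1


def sum_reducer(key, values):
    return key, sum(values)


log = ["dick,google", "jane,google", "dick,google", "jane,facebook", "dick,ebay"]
stage1 = run_job(log, visits_mapper, sum_reducer)     # visits per (user, site)
stage2 = run_job(stage1, site_mapper, sum_reducer)    # distinct users per site
print(stage2)  # [('ebay', 1), ('facebook', 1), ('google', 2)]
```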

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform (Cleanse, Normalize, Aggregate), Load → Data Warehouse → Code → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Code → Cleanse → Analyse → Utilize
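The difference comes down to when the types are enforced. The following minimal sketch uses hypothetical CSV data and function names. The schema-on-write loader type-checks every row before storing it. The schema-on-read query keeps the raw text and applies the schema only when the data is read.

```python
import csv
import io

# Hypothetical raw input: one row has a malformed counter.
RAW = "Dick,Ebay,507018\nJane,Google,not-a-number\nJane,Facebook,643261\n"


def load_schema_on_write(raw):
    """Schema on write: type-check every row at load time; bad rows never reach the table."""
    table, rejected = [], []
    for name, site, counter in csv.reader(io.StringIO(raw)):
        try:
            table.append({"name": name, "site": site, "counter": int(counter)})
        except ValueError:
            rejected.append((name, site, counter))
    return table, rejected


def query_schema_on_read(raw, site):
    """Schema on read: keep the raw text as loaded and apply types only inside the query."""
    for name, row_site, counter in csv.reader(io.StringIO(raw)):
        if row_site == site and counter.isdigit():
            yield name, int(counter)


table, rejected = load_schema_on_write(RAW)
print(len(table), rejected)                         # 2 [('Jane', 'Google', 'not-a-number')]
print(list(query_schema_on_read(RAW, "Facebook")))  # [('Jane', 643261)]
```

Schema on write pays the cleansing cost once, up front. Schema on read loads immediately but pushes that cost into every query.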

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
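Dividing the cluster totals by the node count shows how modest each individual node is. The sketch assumes decimal units (1 PB = 1,000 TB, 1 TB = 1,000 GB) and identical nodes. Neither assumption is stated on the slide.

```python
# Cluster totals from the slide.
NODES = 4_000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000

# Assumes decimal units and identical nodes.
disk_tb_per_node = DISK_PB * 1_000 / NODES
ram_gb_per_node = RAM_TB * 1_000 / NODES
cores_per_node = CORES / NODES

print(f"{disk_tb_per_node:.0f} TB disk, {ram_gb_per_node:.0f} GB RAM, "
      f"{cores_per_node:.0f} cores per node")  # 4 TB disk, 16 GB RAM, 8 cores per node
```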

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• Map Reduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

(Diagram: a Hadoop client (Java, Pig, Hive) works with two layers. MapReduce (distributed processing) is coordinated by a Job Tracker. HDFS (distributed storage) is coordinated by a Name Node, backed by a Secondary Name Node. Every worker node runs both a Data Node and a Task Tracker.)

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

(Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager. Each worker runs a Node Manager that hosts containers, one of which runs the Application Master.)

77 Software Group

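Tez¹

(Diagram: without Tez, a pipeline runs as three separate MAP → REDUCE jobs (Job 1, Job 2, Job 3), each reading its input from and writing its output to HDFS. With Tez, the same MAP and REDUCE steps run as a single job (Job 1) between HDFS input and output.)

¹ Hindi for “fast”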

78 Software Group

HBase

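A real-time database built on Hadoop

(Diagram comparing Oracle and HBase storage. Oracle: tables are cached in the Buffer Cache and stored in Datafiles; changes pass through the Log Buffer to the Redo logs; everything sits on Disks managed by ASM. HBase: tables are cached in the MemStore and stored in HFiles; changes are recorded in the Write-Ahead (WA) Log; everything sits on Disks managed by HDFS.)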
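The HBase side of the diagram is a write path: log the edit, buffer it in memory, and flush it to immutable files. The class below is a hypothetical toy model of that path in the spirit of a log-structured merge design. It deliberately leaves out what a real region server also does, such as column families, block caches, compactions and WAL rolling.

```python
class ToyRegion:
    """Toy model of the HBase write path: WA log, MemStore, HFiles (hypothetical)."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log: every edit is appended here first
        self.memstore = {}   # recent edits, held in memory
        self.hfiles = []     # flushed, immutable, sorted files (oldest first)
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))  # log first, so the edit could be replayed after a crash
        self.memstore[row] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as a new sorted HFile and start a fresh MemStore."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row):
        """Check the MemStore first, then HFiles from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row:
                    return value
        return None


region = ToyRegion(flush_threshold=2)
region.put("Dick", 507018)
region.put("Jane", 716426)  # second edit fills the MemStore and triggers a flush
region.put("Dick", 723649)  # newer value, still only in the MemStore
print(region.get("Dick"), region.get("Jane"), len(region.hfiles), len(region.wal))
# 723649 716426 1 3
```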

79 Software Group

HBase Data Model

One row per name/site pair:

Name   Site             Counter
Dick   Ebay             507018
Dick   Google           690414
Jane   Google           716426
Dick   Facebook         723649
Jane   Facebook         643261
Jane   ILoveLarry.com   856767
Dick   MadBillFans.com  675230

Normalized:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase, one wide row per name:

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
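The move from the normalized tables to HBase's wide rows is essentially a pivot. Each site becomes a column that exists only in the rows that have a value for it. This minimal sketch uses the slide's own rows and models each HBase row as a plain Python dict, which ignores column families and timestamps.

```python
from collections import defaultdict

# (name, site, counter) rows taken from the slide.
ROWS = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse row per key, HBase-style."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # a site becomes a column only in rows that have it
    return dict(table)


wide = to_wide_rows(ROWS)
print(sorted(wide["Jane"]))  # ['Facebook', 'Google', 'ILoveLarry.com']: no Ebay column at all
```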

80 Software Group

Hive

81 Software Group

82 Software Group

(Diagram: SQL → Java → Results)

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

(Diagram: Pig Latin compared with SQL or HiveQL)

85 Software Group

Flume and SQOOP

(Diagram: FLUME loads web logs into HDFS; SQOOP moves the CUSTOMERS and PRODUCTS tables between an RDBMS and HDFS.)

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (alongside YARN and EC2)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL), alongside Map Reduce with Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 47: Thriving and surviving the Big Data revolution

Siri

From now on Irsquoll call you lsquoAn Ambulancersquo OK

ldquoSiri call me an ambulancerdquo

I found 14 bridges nearby

ldquoI want to jump off a bridgerdquo

48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight


48 Software Group

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse, temp monitor
• Silent alarms
• Pedometer, sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/Attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google Applications on top of Map Reduce and BigTable, which run on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: Start → many parallel Map tasks → Reduce]
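The Map Reduce pattern in the diagram can be sketched in plain, single-process Python. This is an illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase` and `map_reduce` are hypothetical, and word counting is assumed as the workload because it is the customary first example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle/sort: group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: combine every value emitted for one key."""
    return key, sum(values)

def map_reduce(documents):
    # Each document stands in for an input split handled by a separate mapper.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())

if __name__ == "__main__":
    print(map_reduce(["big data big ideas", "data beats ideas"]))
```

Because each mapper only sees its own split and each reducer only sees one key's values, both phases can be spread across many machines; the shuffle is the only step that moves data between them.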

68 Software Group

Multi-stage Map-Reduce

[Diagram: HDFS → many MAPPER tasks (SCAN/SORT) → a second set of MAPPER tasks (AGGREGATE) → REDUCE → Client]
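Chaining stages, where one stage's output becomes the next stage's input, can be sketched as below. The two stages chosen here (count visits per site, then rank sites by count) and all function names are illustrative assumptions rather than the slide's exact SCAN/SORT and AGGREGATE stages. In Hadoop 1.x each stage would be a separate job with its intermediate output written to HDFS; this sketch passes it in memory.

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    """One map-reduce stage: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reducers see keys in sorted order, mimicking the framework's sort step.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]

# Stage 1: count visits per site from "user site" log lines.
def count_mapper(line):
    _, site = line.split()
    yield site, 1

def count_reducer(site, ones):
    yield site, sum(ones)

# Stage 2: re-key by negative count so the sort step ranks sites by popularity.
def rank_mapper(pair):
    site, count = pair
    yield -count, site

def rank_reducer(neg_count, sites):
    for site in sorted(sites):
        yield site, -neg_count

def top_sites(log_lines):
    counts = run_stage(log_lines, count_mapper, count_reducer)  # output of job 1
    return run_stage(counts, rank_mapper, rank_reducer)         # input of job 2
```

Re-keying on the count in the second stage is what lets the framework's own sort do the ranking, which is a common reason a single question turns into several chained jobs.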

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data → Extract, Transform, Load (Cleanse, Normalize, Aggregate, Code) → Data Warehouse → Analyse → Utilize

Schema on Read: Data → Load → Hadoop → Cleanse, Code → Analyse → Utilize
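The difference between the two flows can be sketched with JSON log records. This is a minimal sketch under stated assumptions: the record format, the sample lines and the function names `load_raw`, `query` and `load_with_schema` are all hypothetical. Schema on read stores the raw text untouched and only parses and cleanses it when a query runs; schema on write forces every record through the schema before anything is stored.

```python
import json

RAW_LOG = [
    '{"user": "dick", "site": "google", "ms": 120}',
    '{"user": "jane", "site": "facebook"}',   # missing the "ms" field
    'not json at all',                        # dirty record
]

def load_raw(lines):
    """Schema on read: loading just copies the raw text; nothing is parsed or checked."""
    return list(lines)

def query(stored, fields):
    """Apply the schema at read time: parse each record, project fields, skip unparseable rows."""
    rows = []
    for line in stored:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # cleansing is deferred to each query
        rows.append({f: record.get(f) for f in fields})
    return rows

def load_with_schema(lines, fields):
    """Schema on write, for contrast: every record must fit the schema before it is stored."""
    table = []
    for line in lines:
        record = json.loads(line)                       # dirty input fails the whole load
        table.append(tuple(record[f] for f in fields))  # so does a missing field (KeyError)
    return table
```

Loading is trivial on the schema-on-read side, so new data can land immediately; the price is that each query has to decide how to treat missing or malformed records.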

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
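Dividing the Yahoo cluster totals by the node count gives a rough picture of one commodity node. This assumes decimal units (1 PB = 1,000 TB) and that resources are spread evenly across nodes, which the slide does not state.

```python
NODES = 4_000
DISK_TB = 16 * 1_000   # 16 PB expressed in TB (decimal units assumed)
RAM_GB = 64 * 1_000    # 64 TB expressed in GB
CORES = 32_000

def per_node(total, nodes=NODES):
    """Average share of a cluster-wide total, assuming an even spread across nodes."""
    return total / nodes

print(per_node(DISK_TB), "TB disk per node")  # 4.0
print(per_node(RAM_GB), "GB RAM per node")    # 16.0
print(per_node(CORES), "cores per node")      # 8.0
```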

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log Loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses MAP REDUCE (distributed processing), coordinated by the JOB TRACKER, and HDFS (distributed storage), coordinated by the NAME NODE and SECONDARY NAME NODE; every worker node runs both a DATA NODE and a TASK TRACKER]

76 Software Group

Hadoop 2.0 YARN

Yet Another Resource Negotiator

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the RESOURCE MANAGER; each NODE MANAGER runs CONTAINERs, and an APPLICATION MASTER coordinates the containers for a job]

77 Software Group

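Tez¹

¹ Hindi for “fast”

[Diagram: the same MAP and REDUCE stages run first as three separate jobs (Job 1, Job 2, Job 3), each reading from and writing to HDFS, and then as a single Tez job (Job 1) with one HDFS read and one HDFS write]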

78 Software Group

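HBase

A real-time database built on Hadoop

[Diagram comparing Oracle and HBase storage: Oracle tables sit on the Buffer Cache and Redo Log Buffer, backed by Datafiles and Redo on ASM disks; HBase tables sit on the MemStore and WA Log (write-ahead log), backed by HFiles on HDFS disks]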
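The write path in the HBase half of the diagram can be modelled with a toy class. This is a hypothetical teaching model, not the HBase client API: the class name, the `flush_threshold` parameter and the linear HFile scan are simplifications. It shows the parallel the slide draws with Oracle: the write-ahead log plays the role of redo, and the MemStore plays the role of the buffer cache.

```python
class ToyRegionStore:
    """Toy model of an HBase-style write path (not the HBase API).

    Each write is appended to a write-ahead log for recovery, then held in an
    in-memory MemStore; when the MemStore reaches flush_threshold entries it is
    written out as an immutable, key-sorted "HFile".
    """

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold
        self.wal = []       # append-only log of every write, in arrival order
        self.memstore = {}  # recent writes held in memory
        self.hfiles = []    # flushed files, each a sorted list of (key, value)

    def put(self, key, value):
        self.wal.append((key, value))  # durability first
        self.memstore[key] = value     # then the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Real HBase can discard WAL entries once they are flushed; this toy keeps them.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, key):
        # Newest data wins: check the MemStore, then HFiles from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None
```

Writes only ever append (to the log and to new files), which suits HDFS, where files cannot be updated in place.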

79 Software Group

HBase Data Model

One table:

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized (relational):

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide row per name; each row has only the columns it uses):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
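Turning the Name/Site/Counter rows into HBase-style wide rows is a pivot: each distinct name becomes one row, and each site becomes a column that exists only in the rows that use it. In this sketch plain Python dicts stand in for HBase rows (this is not the HBase client API), and the row key is the name rather than the slide's Id, an assumption made for brevity.

```python
from collections import defaultdict

ROWS = [("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
        ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
        ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
        ("Dick", "MadBillFans.com", 675230)]

def to_wide_rows(rows):
    """Pivot (row key, column, value) triples into one sparse row per key."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter  # a row only gets the columns it actually uses
    return dict(table)

wide = to_wide_rows(ROWS)
print(wide["Jane"])  # {'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```

Reading all of one user's counters is then a single-row lookup, with no join against separate name and site tables.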

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → JAVA → RESULTS]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Figure: Pig Latin compared with SQL or Hive QL]

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS]
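A SQOOP-style table import can be sketched with the standard-library `sqlite3` module standing in for the RDBMS and a dict of file names standing in for HDFS. The sketch roughly mirrors how Sqoop parallelises an import (find the MIN and MAX of a split column, give each mapper one key range, write comma-delimited `part-m-NNNNN` files), but the function names and the in-memory `customers` table are hypothetical, and table and column names are assumed to be trusted because they are formatted straight into the SQL.

```python
import sqlite3

def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous chunks, one per mapper."""
    size = (hi - lo + num_mappers) // num_mappers  # ceiling of (hi - lo + 1) / num_mappers
    return [(start, min(start + size - 1, hi)) for start in range(lo, hi + 1, size)]

def import_table(conn, table, key, num_mappers=2):
    """Each 'mapper' copies one key range of the table into its own part file."""
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    parts = {}
    for i, (start, end) in enumerate(split_ranges(lo, hi, num_mappers)):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}", (start, end)
        ).fetchall()
        # Comma-delimited text, one record per line.
        parts[f"part-m-{i:05d}"] = "\n".join(",".join(map(str, r)) for r in rows)
    return parts

def demo_import():
    conn = sqlite3.connect(":memory:")  # stands in for the source RDBMS
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Dick"), (2, "Jane"), (3, "Ann"), (4, "Bob"), (5, "Cy")])
    return import_table(conn, "customers", "id", num_mappers=2)
```

Splitting on a numeric key range assumes the key values are reasonably evenly distributed; a skewed key would leave some mappers with far more rows than others.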

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (YARN and EC2 are also shown at this layer)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL)
• BlinkDB
• Also shown: Map Reduce, Hive (SQL)

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only):
• Exadata: $4,911
• Hadoop: $750
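The chart's two hardware-only figures translate into a simple ratio and a cost for any capacity. The 100 TB capacity below is an arbitrary example, not a figure from the slide, and the flat $/TB rate ignores software, power and administration costs.

```python
EXADATA_PER_TB = 4911  # $/TB, hardware only, from the chart
HADOOP_PER_TB = 750    # $/TB, hardware only, from the chart

def hardware_cost(capacity_tb, dollars_per_tb):
    """Hardware-only cost of storing capacity_tb terabytes at a flat $/TB rate."""
    return capacity_tb * dollars_per_tb

ratio = EXADATA_PER_TB / HADOOP_PER_TB
print(f"Exadata is {ratio:.1f}x the cost of Hadoop per TB")  # Exadata is 6.5x the cost of Hadoop per TB
print(hardware_cost(100, EXADATA_PER_TB), hardware_cost(100, HADOOP_PER_TB))  # 491100 75000
```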

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 x 6-core CPUs per node (216 cores total)
  – 12 x 3 TB HDD per node (216 spindles, 648 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with fitted trend line f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
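A trend line like the one on the chart comes from a linear regression fit, which can then be used to extrapolate. Below is a minimal ordinary-least-squares sketch. The sample points in the usage note are made up, because the slide's underlying data is not available, and the names `fit_line` and `predict` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` returns a slope of 2.0 and an intercept of 1.0, and `predict(2.0, 1.0, 10)` extrapolates to 21.0.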

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
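One of the simplest ways to build a model that classifies new data is a nearest-centroid classifier: average the labelled examples of each class, then assign new data to the closest average. This is a swapped-in illustrative technique, not one named on the slide. The two spam features (links per message, exclamation marks) and the training data are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math

# Hypothetical labelled data: (links per message, exclamation marks) -> label
TRAINING = [((8, 5), "spam"), ((6, 7), "spam"), ((1, 0), "ham"), ((0, 1), "ham")]

def train_centroids(examples):
    """Average the feature vectors of each labelled class into one centroid."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign new data to the class whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))
```

With this training data the spam centroid is (7, 6) and the ham centroid is (0.5, 0.5), so a new message with features (7, 4) is classified as spam.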

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
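k-means is a common way to group data without a pre-existing classification scheme, and it is used here as an illustrative choice (the slide names no algorithm). This is a plain sketch under stated assumptions: the 2-D points are hypothetical, the number of clusters `k` must be chosen up front, and initial centres are sampled from the input with a fixed seed so runs are repeatable.

```python
import math
import random

POINTS = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]  # hypothetical data

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: group points into k clusters without any predefined labels."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:  # assignment step: each point joins its nearest centre
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):  # update step: move centres to the mean
            if members:  # keep the old centre if a cluster ends up empty
                centres[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, clusters
```

No labels are supplied: the two groups emerge purely from distances between points, which is what separates clustering from the classification approach.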

121 Software Group

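Supervised Machine Learning

[Diagram: Raw Data from existing business is cleaned and split into a Training Set and a Validation Set; the training set produces a Candidate Model, which is validated against the validation set to become the Production Model; New Data from new business is fed to the production model to produce a Prediction]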
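The clean, split, train, validate and promote loop in the diagram can be sketched as follows. This is a minimal sketch under stated assumptions: the one-feature threshold classifier, the 25% validation fraction, the 0.9 accuracy bar and all function names are hypothetical stand-ins, not part of the original slide.

```python
import random

def train_validation_split(rows, validation_fraction=0.25, seed=42):
    """Shuffle cleaned rows and hold back a fraction for validation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]

def train_threshold(training):
    """Candidate model: the cut-off t on x that best predicts the label as (x >= t)."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: sum((x >= t) == y for x, y in training))

def accuracy(threshold, rows):
    """Fraction of rows whose label matches the model's prediction."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)

def build_production_model(raw_rows, min_accuracy=0.9):
    cleaned = [(x, y) for x, y in raw_rows if x is not None]  # Clean
    training, validation = train_validation_split(cleaned)
    candidate = train_threshold(training)                     # Candidate model
    score = accuracy(candidate, validation)                   # Validate
    # Promote only a candidate that performs well on data it was not trained on.
    return (candidate if score >= min_accuracy else None), score
```

Scoring the candidate on the held-back validation set, rather than on the training set, is what guards against promoting a model that has merely memorised its training data.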

122 Software Group

Unsupervised learning

inmaps.linkedin.com

123 Software Group

124 Software Group

Big Data Analytics

Data Science:
• Search Optimization
• Recommendation Systems
• Security
  – Vulnerability
  – Penetration Detection
• Fraud Detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Change Data Capture reads the Redo-logs and passes changes through a JMS Queue to the Hadoop Poster, which delivers them as a Batched HDFS File Copy, as Audit Change Data, or as HBase RealTime replication]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products in the diagram: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products in the diagram: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 49: Thriving and surviving the Big Data revolution

49 Software Group

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 50: Thriving and surviving the Big Data revolution

50 Software Group

Brain Control

51 Software Group

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Cluster managers: Mesos (heterogeneous cluster manager), YARN, EC2
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• MapReduce
• Shark (SQL) and Hive (SQL)
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Chart: Exadata vs Hadoop, $/TB (hardware only)
• Exadata: $4,911 per TB
• Hadoop: $750 per TB
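
The chart's per-terabyte figures make the gap easy to quantify. In the quick calculation below, the prices are the hardware-only figures from the chart and the 100 TB capacity is an arbitrary example.

```python
EXADATA_PER_TB = 4911  # USD per TB, hardware only (from the chart)
HADOOP_PER_TB = 750    # USD per TB, hardware only (from the chart)


def hardware_cost(terabytes, price_per_tb):
    return terabytes * price_per_tb


ratio = EXADATA_PER_TB / HADOOP_PER_TB
print(round(ratio, 2))  # 6.55
print(hardware_cost(100, EXADATA_PER_TB), hardware_cost(100, HADOOP_PER_TB))  # 491100 75000
```

This compares hardware only. Software licences, support and administration costs are outside the chart.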

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPUs per node (216 cores total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154 (x axis 0 to 120, y axis -20 to 120)

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
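
Linear regression, the first technique listed, fits a trend line like the one in the chart by ordinary least squares. Here is a minimal sketch of that fit. The four sample points are made up and lie exactly on y = 2x + 1, so the result is easy to check. `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)                       # spread of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # co-variation of x and y
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # 2.0 1.0
print(slope * 10 + intercept)  # extrapolated prediction at x = 10: 21.0
```

Predictive analytics is this last line: use the fitted model to extrapolate to inputs that have not been observed yet.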

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
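
k-means is a common way to group data without predefined classes. It repeatedly assigns each point to the nearest centre, then moves each centre to the mean of its points. Below is a plain sketch of it. The customer spend vectors are hypothetical, and seeding with the first k points is a simplification; production implementations use smarter seeding such as k-means++.

```python
from math import dist  # Python 3.8+


def kmeans(points, k, iterations=10):
    """Plain k-means. Seeds with the first k points, so results depend on input order."""
    centroids = [list(p) for p in points[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # leave a centroid in place if its cluster is empty
                centroids[i] = [sum(axis) / len(members) for axis in zip(*members)]
    return centroids, clusters


# Hypothetical customers described by (spend on groceries, spend on electronics)
customers = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)  # [[(1, 1), (1.5, 2), (1, 0.5)], [(8, 8), (9, 8.5), (8.5, 9)]]
```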

121 Software Group

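Supervised Machine Learning

Diagram: raw data from the existing business is cleaned and split into a training set and a validation set. The training set is used to build a candidate model, which is validated against the validation set to become the production model. New data from new business is then fed to the production model to produce a prediction.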
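
The train, validate, promote loop in that diagram can be sketched in a few functions. In this sketch the two candidate models (a predict-the-average baseline and a least-squares line), the mean-squared-error metric, the 25% hold-out and the synthetic history are all assumptions chosen for illustration.

```python
import random


def split(records, validation_fraction=0.25, seed=42):
    """Shuffle reproducibly, then hold back a fraction of the records for validation."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_mean(train):
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y  # baseline: always predict the training average


def fit_linear(train):
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def select_production_model(records, candidates):
    """Fit each candidate on the training set; keep the one with the lowest validation error."""
    train, validation = split(records)
    best = None
    for name, fit in candidates.items():
        model = fit(train)               # candidate model from the training set
        error = mse(model, validation)   # validated on data it has not seen
        if best is None or error < best[0]:
            best = (error, name, model)
    return best[1], best[2]


# Illustrative "existing business" history in which y depends linearly on x
history = [(x, 3 * x + 2) for x in range(20)]
name, production_model = select_production_model(history, {"mean": fit_mean, "linear": fit_linear})
print(name, round(production_model(100), 6))  # linear 302.0
```

Scoring on the held-out validation set, rather than on the training data, is what guards against promoting a model that has merely memorised its inputs.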

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram components: redo logs → change data capture → JMS queue / Hadoop poster → targets: batched HDFS file copy, audit change data, and HBase real-time replication.
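
Two of the targets in that diagram consume the captured changes differently. The audit target keeps every change, while real-time replication keeps only the current state of each row. A rough sketch of both follows. The change-record shape and the `Change` and `apply_changes` names are hypothetical and do not reflect SharePlex's actual message format. A dict stands in for an HBase table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: str                     # primary key of the changed source row
    row: Optional[dict] = None   # changed column values; None for DELETE


def apply_changes(changes, replica, audit):
    """Append every change to an audit trail and apply it to a key-value replica."""
    for change in changes:
        audit.append(change)  # the audit target keeps full history, in capture order
        if change.op == "DELETE":
            replica.pop(change.key, None)
        elif change.op in ("INSERT", "UPDATE"):
            # Merge only the changed columns into the row, like an HBase put of those columns
            replica.setdefault(change.key, {}).update(change.row)
        else:
            raise ValueError(f"unknown operation: {change.op!r}")


replica, audit = {}, []
apply_changes([
    Change("INSERT", "c1", {"name": "Dick", "site": "Ebay"}),
    Change("INSERT", "c2", {"name": "Jane", "site": "Google"}),
    Change("UPDATE", "c1", {"site": "Facebook"}),
    Change("DELETE", "c2"),
], replica, audit)
print(replica)     # {'c1': {'name': 'Dick', 'site': 'Facebook'}}
print(len(audit))  # 4
```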

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: Boomi, TOAD & Shareplex, Kitenga, TOAD BI, Server and Storage

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: Boomi, TOAD & Shareplex, STATISTICA, Kitenga, TOAD BI, Server and Storage

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring: integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 52: Thriving and surviving the Big Data revolution

52 Software Group

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

Page 53: Thriving and surviving the Big Data revolution

53 Software Group

Muze

54 Software Group

55 Software Group

56 Software Group

The instrumented human

• Bluetooth Personal Area Network
• 3G/WiFi Wide Area Network
• GPS
• Storage
• Pulse/temperature monitor
• Silent alarms
• Pedometer/sleep monitoring
• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google Applications run on Map Reduce and BigTable, which in turn run on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: Start → many Map tasks running in parallel → Reduce]
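The diagram's flow can be simulated in one Python process to show what each phase does. This is a minimal sketch, not Hadoop's Java API: `map_fn`, `shuffle`, `reduce_fn` and `word_count` are hypothetical names, and the shuffle that Hadoop performs across the network between mappers and reducers is modelled here as an in-memory group-by.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle phase: group every emitted value by its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce phase: combine all values collected for one key."""
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    # Each line could go to a different mapper in parallel;
    # here they run one after another to keep the data flow visible.
    mapped = (pair for line in lines for pair in map_fn(line))
    return dict(reduce_fn(k, v) for k, v in shuffle(mapped).items())


print(word_count(["big data big ideas", "data beats opinions"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'beats': 1, 'opinions': 1}
```

In a real cluster the mappers would work on different input splits at the same time; the sequential loop only preserves the order of the phases.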

68 Software Group

Multi-stage Map-Reduce

[Diagram: data read from HDFS by a first stage of mappers (scan/sort) feeds a smaller second stage of mappers (aggregate), whose output is reduced and returned to the client]
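Chaining jobs, so that one job's reduce output becomes the next job's map input, is what makes a pipeline multi-stage. A hypothetical single-process sketch (the `run_job` helper is invented for illustration and is not a Hadoop API), loosely mirroring the scan and aggregate stages: the first job totals hits per site, and the second job finds the busiest site.

```python
from collections import defaultdict
from typing import Callable, Iterable


def run_job(records: Iterable, mapper: Callable, reducer: Callable) -> list:
    """Run one map -> shuffle -> reduce job entirely in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in groups.items()]


# Stage 1: scan raw "site,hits" lines and aggregate hits per site.
def parse_hits(line):
    site, hits = line.split(",")
    yield site, int(hits)


def total_hits(site, hits):
    return site, sum(hits)


# Stage 2: send every (site, total) pair to one key so a single reducer sees them all.
def to_single_key(pair):
    yield "max", pair


def pick_busiest(_, pairs):
    return max(pairs, key=lambda p: p[1])


logs = ["ebay,3", "google,5", "ebay,4", "facebook,2", "google,1"]
stage1 = run_job(logs, parse_hits, total_hits)        # output of job 1 ...
stage2 = run_job(stage1, to_single_key, pick_busiest)  # ... is the input of job 2
print(stage1)  # [('ebay', 7), ('google', 6), ('facebook', 2)]
print(stage2)  # [('ebay', 7)]
```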

69 Software Group

Schema on Read vs Schema on Write

• Schema on Write: Data → Code, Extract, Transform (Cleanse, Normalize, Aggregate), Load → Data Warehouse → Analyse → Utilize
• Schema on Read: Data → Load → Hadoop → Code, Cleanse → Analyse → Utilize
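With schema on read, data is loaded unchanged and a schema is applied only when a query reads it, so cleansing moves from load time to query time. A minimal sketch, assuming comma-separated records and a hypothetical `read_with_schema` helper:

```python
from typing import Callable

# Raw data lands as-is: no cleansing or normalization at load time.
RAW_LINES = [
    "2014-04-07,Dick,google,12",
    "2014-04-07,Jane,facebook,not-a-number",  # a bad record is stored anyway
    "2014-04-08,Jane,google,7",
]

# A schema is a list of (column name, converter) pairs.
Schema = list[tuple[str, Callable[[str], object]]]

VISIT_SCHEMA: Schema = [("day", str), ("name", str), ("site", str), ("hits", int)]


def read_with_schema(lines: list[str], schema: Schema) -> list[dict]:
    """Apply the schema at read time; rows that do not fit are skipped."""
    rows = []
    for line in lines:
        fields = line.split(",")
        if len(fields) != len(schema):
            continue
        try:
            rows.append({name: conv(value) for (name, conv), value in zip(schema, fields)})
        except ValueError:
            continue  # cleansing happens here, at query time
    return rows


visits = read_with_schema(RAW_LINES, VISIT_SCHEMA)
print(sum(row["hits"] for row in visits))  # 19
```

The same raw lines could later be read with a different schema, which is the flexibility schema on read trades against the up-front guarantees of schema on write.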

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• Map Reduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses Map Reduce (distributed processing), coordinated by the Job Tracker, and HDFS (distributed storage), coordinated by the Name Node and Secondary Name Node; every worker node runs both a Data Node and a Task Tracker]
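Because every Hadoop 1.0 worker runs both a Data Node and a Task Tracker, the Job Tracker can try to run each map task on a node that already stores its input block. The sketch below is a simplified, hypothetical model of that data-locality preference; the block map and slot counts are invented, and the real scheduler also weighs rack locality and other factors.

```python
def assign_tasks(block_locations: dict[str, list[str]],
                 free_slots: dict[str, int]) -> dict[str, tuple[str, bool]]:
    """Assign one map task per block, preferring nodes that hold a replica.

    block_locations: block id -> nodes storing a replica (as the Name Node knows it)
    free_slots: node -> free map slots on that node's Task Tracker
    Returns block id -> (chosen node, True if the read is data-local).
    """
    slots = dict(free_slots)
    plan = {}
    for block, replicas in block_locations.items():
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No replica holder is free: use any node with capacity
            # and read the block over the network instead.
            candidates = [n for n, s in slots.items() if s > 0]
            if not candidates:
                raise RuntimeError(f"no free slot for {block}")
            node, is_local = candidates[0], False
        slots[node] -= 1
        plan[block] = (node, is_local)
    return plan


blocks = {"b1": ["node1", "node2"], "b2": ["node1", "node3"], "b3": ["node1"]}
plan = assign_tasks(blocks, {"node1": 1, "node2": 1, "node3": 1})
print(plan)
# {'b1': ('node1', True), 'b2': ('node3', True), 'b3': ('node2', False)}
```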

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; an Application Master coordinates containers that run on Node Managers across the cluster]
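In YARN the Resource Manager only allocates capacity, while a per-application Application Master requests containers and runs its tasks in them. A toy model of that negotiation, assuming memory is the only resource; the classes are hypothetical and do not reflect the YARN client API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ResourceManager:
    """Grants containers on whichever Node Manager has the most free memory."""

    def __init__(self, nodes: list[NodeManager]):
        self.nodes = nodes

    def allocate(self, app: str, mb: int) -> Optional[str]:
        fits = [n for n in self.nodes if n.free_mb >= mb]
        if not fits:
            return None  # the request waits until capacity is released
        node = max(fits, key=lambda n: n.free_mb)
        node.free_mb -= mb
        node.containers.append(app)
        return node.name


class ApplicationMaster:
    """Negotiates one container per task for a single application."""

    def __init__(self, app: str, rm: ResourceManager):
        self.app, self.rm = app, rm

    def request(self, tasks: int, mb_per_task: int) -> list:
        return [self.rm.allocate(self.app, mb_per_task) for _ in range(tasks)]


rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
print(ApplicationMaster("hive-query", rm).request(tasks=4, mb_per_task=1024))
# ['nm1', 'nm1', 'nm1', 'nm2']
```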

77 Software Group

Tez¹

[Diagram: a chain of three Map Reduce jobs (Job 1, Job 2, Job 3) passing intermediate results through HDFS, compared with the same work expressed as a single Tez job (Job 1)]

¹Hindi for “fast”
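The point of the comparison is that a chain of Map Reduce jobs can run as one job whose stages form a directed acyclic graph (DAG), so intermediate results need not be written to HDFS between jobs. A toy illustration using Python's standard `graphlib` module (3.9+) to order hypothetical stages; the write counter only models where data would be materialized and is not part of any Tez API.

```python
from graphlib import TopologicalSorter

# Hypothetical three-stage pipeline; each stage depends on the previous one.
STAGES = {
    "filter": lambda rows: [r for r in rows if r > 0],
    "square": lambda rows: [r * r for r in rows],
    "total": lambda rows: [sum(rows)],
}
DEPENDS_ON = {"filter": set(), "square": {"filter"}, "total": {"square"}}


def run_pipeline(rows: list[int], as_dag: bool) -> tuple[list[int], int]:
    """Run the stages in dependency order and count HDFS writes.

    This sketch assumes a linear chain, so each stage consumes the previous
    stage's output. Chained jobs (as_dag=False) write every stage's output
    to HDFS; a single DAG job (as_dag=True) writes only the final result.
    """
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    hdfs_writes = 0
    data = rows
    for i, stage in enumerate(order):
        data = STAGES[stage](data)
        is_last = i == len(order) - 1
        if is_last or not as_dag:
            hdfs_writes += 1
    return data, hdfs_writes


print(run_pipeline([3, -1, 2], as_dag=False))  # ([13], 3)
print(run_pipeline([3, -1, 2], as_dag=True))   # ([13], 1)
```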

78 Software Group

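HBase

A real-time database built on Hadoop

[Diagram comparing Oracle with HBase: Oracle tables are cached in the Buffer Cache, changes pass through the Log Buffer to the Redo logs, and data is stored in Datafiles on ASM disks; HBase tables are cached in the MemStore, changes go to a write-ahead (WA) Log, and data is stored in HFiles on HDFS disks]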
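The HBase side of the comparison is a log-structured write path: each change is appended to the write-ahead log for durability, buffered in the MemStore, and flushed to an immutable, key-sorted HFile when the MemStore fills. A minimal in-memory sketch, assuming a hypothetical `RegionStore` class and a tiny flush threshold; real HBase also compacts HFiles and uses caches and Bloom filters.

```python
from typing import Optional


class RegionStore:
    """Toy model of one HBase store: write-ahead log, MemStore and HFiles."""

    def __init__(self, flush_threshold: int = 3):
        self.wal: list[tuple[str, str]] = []           # append-only log (on HDFS in HBase)
        self.memstore: dict[str, str] = {}             # recent writes, held in memory
        self.hfiles: list[list[tuple[str, str]]] = []  # immutable sorted files, newest last
        self.flush_threshold = flush_threshold

    def put(self, key: str, value: str) -> None:
        self.wal.append((key, value))  # 1. log first, so a crash can be replayed
        self.memstore[key] = value     # 2. then update the in-memory store
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # 3. persist the MemStore as a new key-sorted, immutable HFile.
        # (This toy keeps old WAL entries; HBase can discard flushed ones.)
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore.clear()

    def get(self, key: str) -> Optional[str]:
        if key in self.memstore:             # newest data is in memory
            return self.memstore[key]
        for hfile in reversed(self.hfiles):  # then the newest file first
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = RegionStore(flush_threshold=2)
store.put("Dick", "507018")
store.put("Jane", "716426")  # second key reaches the threshold and triggers a flush
store.put("Dick", "690414")  # the newer value lives in the MemStore
print(store.get("Dick"), store.get("Jane"), len(store.hfiles))  # 690414 716426 1
```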

79 Software Group

HBase Data Model

One row per name and site:

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized (relational) tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase wide rows (each row stores only the columns it has):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
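The wide-row layout can be derived mechanically from the name/site/counter rows. A minimal sketch in which plain Python dicts stand in for HBase rows (column families are ignored), so it illustrates the data layout only and not the HBase client API:

```python
# (Name, Site, Counter) rows from the slide's first table
VISITS = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(visits: list[tuple[str, str, int]]) -> dict[str, dict[str, int]]:
    """Pivot (name, site, counter) triples into one sparse wide row per name.

    A site a person never visited gets no column at all rather than a NULL,
    which is how the wide rows store sparse data.
    """
    rows: dict[str, dict[str, int]] = {}
    for name, site, counter in visits:
        rows.setdefault(name, {})[site] = counter
    return rows


wide = to_wide_rows(VISITS)
print(wide["Jane"])  # {'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```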

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
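Hive turns a SQL query into Java Map Reduce code that produces the results. As a hedged illustration of that idea (not Hive's actual query plan), a `GROUP BY` aggregate maps onto a single map and reduce phase, with the grouping column as the key:

```python
from collections import defaultdict

# Query being emulated (HiveQL-style):
#   SELECT site, SUM(counter) FROM visits GROUP BY site;
VISITS = [
    ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
]


def map_phase(row):
    _, site, counter = row
    yield site, counter  # the GROUP BY column becomes the key


def reduce_phase(site, counters):
    return site, sum(counters)  # the aggregate function runs in the reducer


def group_by_sum(rows):
    shuffled = defaultdict(list)
    for row in rows:
        for key, value in map_phase(row):
            shuffled[key].append(value)
    return dict(reduce_phase(k, v) for k, v in shuffled.items())


print(group_by_sum(VISITS))  # {'Google': 1406840, 'Facebook': 1366910}
```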

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Side-by-side comparison: Pig Latin vs SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads WebLogs into HDFS; SQOOP loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS]
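SQOOP copies relational tables into HDFS and can import only rows added since the previous run. The sketch below loosely mirrors that incremental-append idea, with Python's built-in `sqlite3` standing in for the RDBMS and a list of CSV lines standing in for an HDFS file; `incremental_import` and its parameters are hypothetical and are not Sqoop's command-line options.

```python
import csv
import io
import sqlite3


def incremental_import(conn: sqlite3.Connection, table: str, check_column: str,
                       last_value: int) -> tuple[list[str], int]:
    """Export rows whose check_column exceeds last_value as CSV lines.

    Returns the CSV lines (standing in for an HDFS file) and the new
    high-water mark to pass in on the next run.
    """
    # Table and column names are interpolated directly, so they must be trusted input.
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    col_index = [d[0] for d in cursor.description].index(check_column)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    new_last = last_value
    for row in cursor:
        writer.writerow(row)
        new_last = max(new_last, row[col_index])
    return buffer.getvalue().splitlines(), new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Dick"), (2, "Jane")])
lines, mark = incremental_import(conn, "customers", "id", last_value=0)
conn.execute("INSERT INTO customers VALUES (3, 'Larry')")
more, mark = incremental_import(conn, "customers", "id", last_value=mark)
print(lines, more, mark)  # ['1,Dick', '2,Jane'] ['3,Larry'] 3
```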

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (YARN and EC2 also shown)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – machine learning
• Shark (SQL), BlinkDB
• Existing Hadoop components: Map Reduce, Hive (SQL)

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Exadata vs Hadoop $/TB (hardware only)

• Exadata: $4,911
• Hadoop: $750
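The two bars make the hardware cost gap easy to quantify. A quick check using only the $/TB figures on the slide (hardware only, so software, support and operations are excluded):

```python
EXADATA_PER_TB = 4911  # USD per TB, from the slide
HADOOP_PER_TB = 750    # USD per TB, from the slide


def hardware_cost(terabytes: float, per_tb: float) -> float:
    """Hardware-only cost of storing the given capacity."""
    return terabytes * per_tb


ratio = EXADATA_PER_TB / HADOOP_PER_TB
print(f"Exadata costs {ratio:.1f}x more per TB")  # Exadata costs 6.5x more per TB
print(hardware_cost(100, EXADATA_PER_TB), hardware_cost(100, HADOOP_PER_TB))  # 491100 75000
```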

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
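The cluster totals follow from the per-node specification multiplied by the 18 servers; a quick check of the arithmetic:

```python
NODES = 18
PER_NODE = {
    "ram_gb": 48,
    "cores": 2 * 6,     # two 6-core CPUs
    "disks": 12,
    "disk_tb": 12 * 2,  # twelve 2 TB drives
}

# Scale every per-node resource up to the whole appliance.
totals = {resource: NODES * amount for resource, amount in PER_NODE.items()}
print(totals)  # {'ram_gb': 864, 'cores': 216, 'disks': 216, 'disk_tb': 432}
```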

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with a fitted linear trend line, $f(x) = 0.971521231456065\,x + 0.71906459527154$]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
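The trend line on the chart is the kind of result an ordinary least-squares fit produces. A minimal, dependency-free sketch of computing a slope and intercept; the sample points are invented so that the expected answer is exact:

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares([0, 10, 20, 30], [1, 21, 41, 61])
print(slope, intercept)  # 2.0 1.0
```

Predicting a new value is then just `slope * x + intercept`, the same form as the $f(x)$ on the chart.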

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
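A classifier learns from labelled examples and then labels new data. The sketch below uses k-nearest neighbours, chosen here only for brevity since the slide names no algorithm; the churn features (monthly spend, support calls) and labels are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train: list[tuple[tuple[float, ...], str]],
                 point: tuple[float, ...], k: int = 3) -> str:
    """Label `point` with the majority label among its k nearest training examples."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# (monthly spend, support calls) -> outcome; hypothetical data
customers = [
    ((20.0, 5.0), "churn"), ((25.0, 4.0), "churn"), ((22.0, 6.0), "churn"),
    ((80.0, 0.0), "stay"), ((75.0, 1.0), "stay"), ((90.0, 1.0), "stay"),
]
print(knn_classify(customers, (30.0, 4.0)))  # churn
print(knn_classify(customers, (70.0, 1.0)))  # stay
```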

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
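Clustering groups records without predefined labels. k-means is one common algorithm for this (the slide does not name one); a minimal sketch assuming 2-D points and caller-supplied starting centroids so that the result is deterministic:

```python
import math


def k_means(points: list[tuple[float, float]],
            centroids: list[tuple[float, float]],
            iterations: int = 10) -> list[int]:
    """Return the cluster index assigned to each point."""
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return assignment


# Hypothetical baskets: (grocery items, electronics items)
baskets = [(9, 0), (8, 1), (10, 1), (0, 3), (1, 4), (0, 5)]
print(k_means(baskets, centroids=[(5, 0), (5, 5)]))  # [0, 0, 0, 1, 1, 1]
```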

121 Software Group

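Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set and promoted to a production model; new data from new business is fed to the production model to make predictions]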
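The diagram's workflow (clean, split into training and validation sets, fit a candidate model, validate it, promote it to production, score new data) can be sketched end to end. Everything here is a hypothetical stand-in: the model is a one-feature threshold rule, and the 0.8 accuracy bar for promotion is an arbitrary assumption.

```python
def clean(raw):
    """Drop records with a missing feature or label."""
    return [(x, y) for x, y in raw if x is not None and y is not None]


def accuracy(model, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def train_threshold(training):
    """Candidate model: predict 1 when x >= t, choosing t that best fits the training set."""
    best_t, best_acc = None, -1.0
    for t, _ in training:
        acc = accuracy(lambda x, t=t: int(x >= t), training)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x: int(x >= best_t)


# Existing business: labelled history (label 1 when x >= 50), plus two bad records.
raw = [(x, int(x >= 50)) for x in range(0, 100, 5)] + [(None, 1), (42, None)]
data = clean(raw)
validation_set = data[::3]                                    # hold out every third record
training_set = [r for i, r in enumerate(data) if i % 3 != 0]

candidate = train_threshold(training_set)
production_model = None
if accuracy(candidate, validation_set) >= 0.8:                # validate before promoting
    production_model = candidate

# New business: score records the model has never seen.
if production_model is not None:
    print([production_model(x) for x in (12, 87)])            # [0, 1]
```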

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science
• Search Optimization
• Recommendation Systems
• Security: vulnerability, penetration detection
• Fraud Detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

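SharePlex® for Hadoop

[Diagram: Change Data Capture reads changes from the redo logs and passes them through a JMS queue to the Hadoop Poster, which delivers them as batched HDFS file copies, as audit change data, or as real-time replication into HBase]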
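SharePlex captures changes from redo logs and posts them to Hadoop either as batched files or as real-time replication into HBase. The sketch below models those two delivery styles in memory; the `Change` record format and both functions are invented for illustration and do not reflect SharePlex's actual interfaces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured change, as it might be decoded from a redo log."""
    op: str                      # "insert", "update" or "delete"
    key: str
    value: Optional[str] = None


def apply_realtime(table: dict[str, str], change: Change) -> None:
    """Real-time style: apply each change to a key-value table (HBase-like) as it arrives."""
    if change.op == "delete":
        table.pop(change.key, None)
    else:  # inserts and updates both become a put
        table[change.key] = change.value


def batch_to_files(changes: list[Change], batch_size: int) -> list[list[str]]:
    """Batched style: write change records as audit lines, batch_size lines per file."""
    lines = [f"{c.op},{c.key},{c.value or ''}" for c in changes]
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]


stream = [Change("insert", "1", "Dick"), Change("insert", "2", "Jane"),
          Change("update", "1", "Richard"), Change("delete", "2")]
replica: dict[str, str] = {}
for change in stream:
    apply_realtime(replica, change)
print(replica)  # {'1': 'Richard'}
print(batch_to_files(stream, 3))
# [['insert,1,Dick', 'insert,2,Jane', 'update,1,Richard'], ['delete,2,']]
```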

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & Shareplex
• Advanced Analytics: Kitenga
• Business Intelligence: TOAD BI
• Server and Storage

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & Shareplex
• Advanced Analytics: Kitenga, STATISTICA
• Business Intelligence: TOAD BI
• Server and Storage

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.



Page 55: Thriving and surviving the Big Data revolution

55 Software Group

56 Software Group

The instrumented human

bull Bluetooth Personal Area Network

bull 3GWiFi Wide Area Network

bull GPSbull Storage

bull Pulse temp monitor

bull Silent alarmsbull Pedometer sleep

monitoring

bull Compass bull Camerabull Mikeearphonesbull Heads up displaybull EmotionAttention

monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational model:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one wide row per name; each row has only the columns it needs):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
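The pivot from normalized rows to HBase-style wide rows can be sketched with nested dictionaries. Here it is built from the original Name/Site/Counter table. This is only an analogy. A real HBase table stores timestamped cells under column families, which the sketch omits.

```python
from collections import defaultdict

# The Name/Site/Counter rows from the slide
visits = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]

def to_wide_rows(rows):
    """Pivot (name, site, counter) rows into one wide row per name.
    Each row has columns only for the sites that name actually visited (sparse columns)."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)

wide = to_wide_rows(visits)
print(wide["Jane"])  # {'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```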

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: a SQL query is translated into JAVA (MapReduce) code, which runs on the cluster to produce RESULTS]
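The sketch below shows roughly what Hive does with a GROUP BY query. The query becomes a map function that emits (group key, value) pairs, followed by a sort/shuffle step and a reduce function that aggregates each group. The query and table name are hypothetical. Hive generates real MapReduce jobs, not code like this.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical HiveQL query (the table name 'visits' is illustrative):
#   SELECT site, SUM(counter) FROM visits GROUP BY site;

def map_fn(row):
    name, site, counter = row
    yield site, counter  # key = GROUP BY column, value = column being aggregated

def reduce_fn(site, counters):
    return site, sum(counters)  # SUM() over all values for one key

def run(rows):
    mapped = sorted(kv for row in rows for kv in map_fn(row))  # shuffle/sort by key
    return [reduce_fn(site, (v for _, v in group))
            for site, group in groupby(mapped, key=itemgetter(0))]

rows = [("Dick", "Google", 690414), ("Jane", "Google", 716426), ("Dick", "Ebay", 507018)]
print(run(rows))  # [('Ebay', 507018), ('Google', 1406840)]
```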

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Figure: a Pig Latin script compared with the equivalent SQL or Hive QL]
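Below is a short Pig Latin dataflow with a line-by-line Python equivalent. It shows the step-by-step scripting style Pig offers, as opposed to a single declarative SQL statement. The script, relation names and field names are illustrative, not taken from the slide.

```python
from collections import defaultdict

# Illustrative Pig Latin dataflow (relation and field names are hypothetical):
#   visits  = LOAD 'visits' AS (name:chararray, site:chararray, counter:long);
#   big     = FILTER visits BY counter > 600000;
#   by_name = GROUP big BY name;
#   result  = FOREACH by_name GENERATE group, COUNT(big);

visits = [("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
          ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261)]

big = [v for v in visits if v[2] > 600000]  # FILTER
by_name = defaultdict(list)
for v in big:                               # GROUP ... BY name
    by_name[v[0]].append(v)
result = sorted((name, len(group)) for name, group in by_name.items())  # FOREACH ... COUNT
print(result)  # [('Dick', 2), ('Jane', 2)]
```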

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos: heterogeneous cluster manager (alongside YARN and EC2)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL), alongside Hive (SQL) and Map Reduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4911 per TB; Hadoop: $750 per TB]
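The hardware-only figures from the chart give the per-terabyte ratio and the cost of a hypothetical 100 TB store.

```python
exadata_per_tb = 4911  # $/TB, hardware only, from the chart
hadoop_per_tb = 750    # $/TB, hardware only, from the chart

ratio = exadata_per_tb / hadoop_per_tb
print(round(ratio, 2))  # 6.55: Exadata costs about 6.5x more per TB

# Hypothetical 100 TB store
cost_100tb = {"exadata": exadata_per_tb * 100, "hadoop": hadoop_per_tb * 100}
print(cost_100tb)  # {'exadata': 491100, 'hadoop': 75000}
```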

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPU per node (216 cores total)
  - 12x2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
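The cluster totals come from multiplying each per-node figure on the slide by the 18 servers. The sketch below does only that arithmetic.

```python
servers = 18
ram_gb_per_node = 48
cores_per_node = 2 * 6  # two 6-core CPUs
disks_per_node = 12
disk_tb = 2

totals = {
    "ram_gb": servers * ram_gb_per_node,           # 864
    "cores": servers * cores_per_node,             # 216
    "spindles": servers * disks_per_node,          # 216
    "raw_tb": servers * disks_per_node * disk_tb,  # 432
}
print(totals)
```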

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle Enterprise R

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
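A fitted line of the kind on the chart is what ordinary least squares produces. Below is a minimal sketch of a simple linear-regression fit. The data points are hypothetical, not the data behind the chart, so the coefficients only resemble the slide's.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical observations
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 18, 41, 60, 77, 99]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")
```

For these points the slope comes out at about 0.973 and the intercept at about 0.857.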

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
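As a concrete instance of spam detection, here is a minimal multinomial naive Bayes classifier with add-one smoothing. Naive Bayes is a common choice for spam filtering, but the slide does not name it. The four training messages are made up.

```python
import math
from collections import Counter

def train(messages):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in messages:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, label_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)  # prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing so unseen words don't zero out the probability
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

training = [("win cash now", "spam"), ("cheap cash offer", "spam"),
            ("meeting at noon", "ham"), ("lunch at noon tomorrow", "ham")]
model = train(training)
print(classify(model, "cash offer now"))  # spam
print(classify(model, "noon meeting"))    # ham
```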

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
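k-means is one common clustering algorithm, though the slide does not name a specific one. This minimal sketch clusters hypothetical 2-D points. For basket analysis, the points would instead be feature vectors describing each basket.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its assigned points; repeat a fixed number of times."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p[0] - centroids[i][0]) ** 2
                                                  + (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(sorted(c) for c in clusters))
```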

121 Software Group

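Supervised Machine Learning

[Diagram: Raw Data from the Existing Business is cleaned and split into a Training Set and a Validation Set. The Training Set is used to build a Candidate Model, which is validated against the Validation Set and promoted to the Production Model. New Data from New Business is fed to the Production Model to produce a Prediction.]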
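The workflow can be sketched with a deliberately simple model: train a candidate on one split, check it against held-out data, then promote it and score new data. Everything here is hypothetical: the churn history, the decision-stump model, and the 0.8 accuracy bar for promotion.

```python
import random

def split(rows, validation_fraction=0.3, seed=42):
    """Shuffle cleaned historical records and split them into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

def train_stump(training):
    """Candidate model: a threshold halfway between the largest negative and the smallest
    positive example (assumes both classes appear in the training set)."""
    max_neg = max(x for x, label in training if not label)
    min_pos = min(x for x, label in training if label)
    threshold = (max_neg + min_pos) / 2
    return lambda x: x >= threshold

def validate(model, validation):
    """Fraction of held-out records the model labels correctly."""
    return sum(model(x) == label for x, label in validation) / len(validation)

# Hypothetical history: (support calls last month, churned?)
history = [(0, False), (1, False), (2, False), (1, False), (0, False),
           (5, True), (6, True), (7, True), (8, True), (6, True)]
training, validation = split(history)
candidate = train_stump(training)
score = validate(candidate, validation)
production = candidate if score >= 0.8 else None  # promote only if validation passes
if production:
    print([production(x) for x in (1, 9)])  # predictions for new customers
```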

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps (inmaps.linkedin.com)]

123 Software Group

124 Software Group

Big Data Analytics

[Diagram: applications of Data Science]
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: SharePlex Change Data Capture reads the redo logs and delivers changes through a JMS queue and a Hadoop poster, either as batched HDFS file copies (audit/change data) or as real-time replication into HBase]

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell’s offering was not complete…

[Diagram: key components to build end-to-end BI/Analytics solutions (Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage), with the Dell products TOAD & SharePlex, TOAD BI, Boomi and Kitenga]

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

[Diagram: key components to build end-to-end BI/Analytics solutions (Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage), with the Dell products TOAD & SharePlex, TOAD BI, Boomi, Kitenga and, newly added, STATISTICA]

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with Math & Statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle Enterprise R)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig

#C14LV

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


56 Software Group

The instrumented human

• Bluetooth Personal Area Network

• 3G/WiFi Wide Area Network

• GPS
• Storage

• Pulse, temp monitor

• Silent alarms
• Pedometer, sleep monitoring

• Compass
• Camera
• Mic/earphones
• Heads-up display
• Emotion/Attention monitor

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google Applications built on Map Reduce and BigTable, on top of the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, the input is processed by many Map tasks running in parallel, whose output is combined by a Reduce step]
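Here is a minimal word-count sketch of the pattern. Map tasks run in parallel over chunks of input, their output is grouped by key, and Reduce combines each group. A Python thread pool stands in for the cluster, and the function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_task(chunk):
    """Map: turn one chunk of input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_task(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)

def map_reduce(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, chunks))  # many Map tasks run in parallel
    groups = defaultdict(list)
    for pairs in mapped:                           # shuffle: group values by key
        for word, one in pairs:
            groups[word].append(one)
    return dict(reduce_task(w, c) for w, c in groups.items())

print(map_reduce(["Big data", "big Hadoop", "data"]))  # {'big': 2, 'data': 2, 'hadoop': 1}
```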

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 57: Thriving and surviving the Big Data revolution

57 Software Group

The instrumented world

58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  - A good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


58 Software Group

All of which accelerates what we call Big Data

59 Software Group

Big Database technologies

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google Software Architecture

[Diagram: Google Applications run on Map Reduce and BigTable, which sit on the Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, many Map tasks run in parallel and feed a single Reduce step]
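The map/reduce flow in the diagram can be illustrated with the classic word-count job. The sketch below is a minimal single-process simulation and assumes the input is a list of text lines. A real Hadoop job would run the map and reduce functions on many nodes, with the framework performing the shuffle between them. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Each line could be mapped on a different node; here the maps run one after another.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats opinions"]))
```

The map and reduce functions only ever see one line or one key at a time. That independence is what lets Hadoop spread them across a cluster.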

68 Software Group

Multi-stage Map-Reduce

[Diagram: a first stage of mappers reads from HDFS and scans/sorts the data; a second stage of mappers aggregates; a final reduce returns the result to the client]
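Chaining stages means feeding one stage's output into the next stage's mappers. The toy, in-memory sketch below shows that pattern. It assumes hypothetical "user,site" log lines and splits the work into two stages as an illustrative choice: stage 1 counts visits per site, and stage 2 aggregates those counts down to the single most-visited site. In Hadoop each stage would typically be a separate job, with its output written to HDFS before the next stage reads it.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Pair = Tuple[Any, Any]


def run_stage(
    records: Iterable[Any],
    mapper: Callable[[Any], Iterable[Pair]],
    reducer: Callable[[Any, List[Any]], Pair],
) -> List[Pair]:
    """One map -> shuffle -> reduce pass, returning the reduced (key, value) pairs."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Sorting by key stands in for the framework's sort between map and reduce.
    return [reducer(key, values) for key, values in sorted(groups.items())]


def most_visited_site(log_lines: Iterable[str]) -> Tuple[str, int]:
    """Two chained stages over hypothetical 'user,site' log lines."""
    # Stage 1: scan the raw log and count visits per site.
    counts = run_stage(
        log_lines,
        mapper=lambda line: [(line.split(",")[1].strip(), 1)],
        reducer=lambda site, ones: (site, sum(ones)),
    )
    if not counts:
        raise ValueError("no log lines to aggregate")
    # Stage 2: re-key every (site, count) pair under one key so a single reducer sees them all.
    [(_, (best_count, best_site))] = run_stage(
        counts,
        mapper=lambda pair: [("all", (pair[1], pair[0]))],
        reducer=lambda key, values: (key, max(values)),
    )
    return best_site, best_count
```

Routing everything to one key in the last stage mirrors the single REDUCE box in the diagram. It is simple, but that final reducer cannot run in parallel.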

69 Software Group

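Schema on Read vs Schema on Write

[Diagram: with schema on write, data is extracted, transformed (cleansed, normalized, aggregated and coded) and loaded into a data warehouse before it is analysed and utilized; with schema on read, data is loaded into Hadoop first and is cleansed, coded and analysed only when it is utilized]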
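The contrast between the two approaches comes down to *when* the schema is applied. The sketch below makes that concrete. It assumes simple comma-separated "name,site,counter" records like those in the HBase example later in the deck. The schema-on-write loader rejects bad rows up front. The schema-on-read loader stores raw text untouched, and each query applies the schema and decides what to do with rows that don't fit. All names are hypothetical.

```python
import csv
from dataclasses import dataclass
from typing import List


@dataclass
class Visit:
    name: str
    site: str
    counter: int


def parse(line: str) -> Visit:
    """Apply the schema: split one CSV line and convert the counter to an int."""
    name, site, counter = next(csv.reader([line]))  # ValueError if the field count is wrong
    return Visit(name.strip(), site.strip(), int(counter))  # ValueError if not a number


def load_with_schema(lines: List[str]) -> List[Visit]:
    """Schema on write: every line must fit the schema before anything is stored."""
    return [parse(line) for line in lines]


def load_raw(lines: List[str]) -> List[str]:
    """Schema on read: keep the raw text exactly as it arrived."""
    return list(lines)


def query_total_for(raw: List[str], name: str) -> int:
    """A query that applies the schema at read time, skipping rows it cannot parse."""
    total = 0
    for line in raw:
        try:
            visit = parse(line)
        except ValueError:
            continue  # this particular query chooses to ignore malformed rows
        if visit.name == name:
            total += visit.counter
    return total
```

Loading raw data is cheap and never fails. The trade-off is that every query pays the parsing cost and must handle bad data itself.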

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
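Dividing the cluster totals by the node count gives the average per-node configuration these figures imply. These are averages only; the slide does not say the nodes were identical, and decimal units (1 PB = 1,000 TB) are assumed.

```latex
\frac{16\ \text{PB}}{4000} = 4\ \text{TB disk}, \qquad
\frac{64\ \text{TB}}{4000} = 16\ \text{GB RAM}, \qquad
\frac{32000\ \text{cores}}{4000} = 8\ \text{cores per node}
```

In other words, the cluster gets its scale from thousands of modest commodity servers rather than from a few large machines.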

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• Map Reduce
• YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses Map Reduce (distributed processing), coordinated by the Job Tracker, and HDFS (distributed storage), managed by the Name Node with a Secondary Name Node; each worker node runs both a Data Node and a Task Tracker]

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; a Node Manager on each worker hosts containers, one of which runs the Application Master]

77 Software Group

Tez¹

[Diagram: the same map and reduce steps run as three chained Map Reduce jobs (Job 1, Job 2, Job 3) between HDFS reads and writes, and as a single Tez job (Job 1)]

¹ Hindi for "fast"

78 Software Group

HBase

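A real-time database built on Hadoop

[Diagram comparing Oracle with HBase: Oracle tables are cached in the buffer cache, with changes passing through the log buffer to redo, and datafiles and redo stored on ASM disks; HBase tables are cached in the MemStore, with changes written to a write-ahead (WA) log, and HFiles and the log stored on HDFS disks]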
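The HBase half of that comparison describes a write path: log first, buffer in memory, then flush to immutable files. The toy class below sketches that flow. It is a simplified model, not HBase's implementation. Everything is kept in memory, the flush threshold counts cells rather than bytes, and the whole log is discarded on flush. The class and parameter names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegion:
    """Toy model of the HBase write path: WAL + MemStore, flushed to immutable HFiles."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold          # hypothetical: flush after N buffered rows
        self.wal: List[Tuple[str, str]] = []            # append-only log, replayable after a crash
        self.memstore: Dict[str, str] = {}              # in-memory buffer of recent writes
        self.hfiles: List[List[Tuple[str, str]]] = []   # sorted, immutable files, oldest first

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))   # 1. durability first: record the edit in the log
        self.memstore[row] = value      # 2. then buffer it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the MemStore out as a new sorted HFile and start an empty buffer."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}
            self.wal = []  # simplification: these edits are now safe in an HFile

    def get(self, row: str) -> Optional[str]:
        """Newest data wins: check the MemStore, then HFiles from newest to oldest."""
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row:
                    return value
        return None
```

Writes touch only the sequential log and memory, which keeps them fast. In return, reads may have to look through several files, which is why HBase periodically merges (compacts) HFiles.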

79 Software Group

HBase Data Model

Denormalized view:

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase: one wide row per name, holding only the site columns that row needs:

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
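The move from the relational tables to HBase wide rows is essentially a pivot: the name becomes the row key and each site becomes a column. The sketch below shows that pivot. It assumes all sites live in one column family, named "sites" here purely for illustration. HBase addresses a cell by row key plus "family:qualifier", so the site name is used as the qualifier.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def to_wide_rows(
    visits: Iterable[Tuple[str, str, int]], family: str = "sites"
) -> Dict[str, Dict[str, int]]:
    """Pivot (name, site, counter) tuples into one sparse wide row per name."""
    rows: Dict[str, Dict[str, int]] = defaultdict(dict)
    for name, site, counter in visits:
        # Row key = name; column = "family:qualifier" with the site as the qualifier.
        rows[name][f"{family}:{site}"] = counter
    return dict(rows)


VISITS = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414), ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767), ("Dick", "MadBillFans.com", 675230),
]

if __name__ == "__main__":
    for key, columns in to_wide_rows(VISITS).items():
        print(key, columns)
```

Rows are sparse. Dick's row simply has no ILoveLarry.com column, rather than a NULL, so different rows can carry different sets of columns.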

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP loads the CUSTOMERS and PRODUCTS tables from an RDBMS into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos - heterogeneous cluster manager
• Tachyon - in-memory file system
• Spark - memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib - machine learning
• Shark (SQL)
• BlinkDB

[Also shown on the diagram: YARN, EC2, Map Reduce, Hive (SQL)]

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911; Hadoop $750]

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2x6-core CPUs per node (216 cores total)
  - 12x2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with fitted trend line f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
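A trend line like the one on the chart is what an ordinary least-squares fit of a single predictor produces. The sketch below computes the slope and intercept directly from the standard formulas. The chart's underlying data points are not recoverable, so any values you feed it are your own. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # covariance of x and y over variance of x
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Extrapolate from the fitted line to a new x value."""
    return slope * x + intercept
```

For example, `fit_line([0, 1, 2, 3], [1, 3, 5, 7])` recovers a slope of 2 and an intercept of 1. The other techniques in the list (multivariate, non-linear, logistic) generalize this same idea of minimizing prediction error.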

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Examples: spam detection, churn risk, customer value
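A minimal classifier makes the idea concrete: learn something from labelled examples, then assign a label to new data. The sketch below uses a nearest-centroid classifier, chosen only because it is short. It is not necessarily a technique the slide has in mind. The example features, such as support calls and months since last purchase for churn risk, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def train_centroids(examples: Sequence[Tuple[Point, str]]) -> Dict[str, Point]:
    """Learn one centroid (mean feature vector) per class label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}


def classify(centroids: Dict[str, Point], features: Point) -> str:
    """Assign new data to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))
```

With training rows such as `((0.0, 1.0), "stay")` and `((8.0, 9.0), "churn")`, a new customer at `(7.0, 8.0)` is labelled "churn" because that centroid is closer.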

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
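Clustering finds the groups itself, with no labels supplied. The sketch below is plain k-means on numeric feature vectors, used here as a common, easy-to-follow example of clustering. The slide mentions basket analysis but does not name an algorithm, so treat the choice of k-means, and the parameters `iterations` and `seed`, as assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _mean(points: Sequence[Point]) -> Point:
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(
    points: Sequence[Point], k: int, iterations: int = 100, seed: int = 0
) -> Tuple[List[Point], List[int]]:
    """Plain k-means: returns (centroids, cluster index for each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]  # random initial centres
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            new_centroids.append(_mean(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, assignment
```

The number of clusters `k` must be chosen up front. In practice that is often the hardest part of applying it.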

121 Software Group

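Supervised Machine Learning

[Diagram: raw data from existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; the production model turns new data from new business into predictions]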
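The workflow in the diagram can be sketched end to end: split cleaned data into training and validation sets, build a candidate model on one, and promote it only if it performs well on the other. To keep the focus on the workflow, the "model" below is a deliberately trivial one-feature threshold rule. The 30% validation split and the 0.9 accuracy bar are arbitrary assumptions, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, label 0 or 1)


def accuracy(threshold: float, data: Sequence[Example]) -> float:
    """Share of examples where 'feature >= threshold' matches the label."""
    correct = sum(1 for x, label in data if int(x >= threshold) == label)
    return correct / len(data)


def split(data: Sequence[Example], validation_fraction: float = 0.3,
          seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle the cleaned data and hold part of it back as a validation set."""
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def train_threshold(training: Sequence[Example]) -> float:
    """Candidate model: the threshold that best separates the training set."""
    values = sorted({x for x, _ in training})
    # Try a threshold halfway between each pair of neighbouring feature values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    return max(candidates, key=lambda t: accuracy(t, training))


def build_production_model(data: Sequence[Example],
                           min_accuracy: float = 0.9) -> Optional[float]:
    training, validation = split(data)
    candidate = train_threshold(training)
    # Promote the candidate only if it also does well on data it has not seen.
    if accuracy(candidate, validation) >= min_accuracy:
        return candidate
    return None
```

Holding back a validation set is the key design choice here. A model that merely memorized its training data would be caught before it reached production.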

122 Software Group

Unsupervised learning

Inmaps.linkedin.com

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search optimization
• Recommendation systems
• Security
  - Vulnerability
  - Penetration detection
• Fraud detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring

Page 60: Thriving and surviving the Big Data revolution

60 Software Group

Pioneers of Big Data

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram: a Pig Latin script compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: Flume loads WebLogs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

[Diagram, components as labelled on the slide:
• Mesos: heterogeneous cluster manager (YARN and EC2 also shown)
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• MapReduce
• Shark (SQL), Hive (SQL)
• BlinkDB]

87 Software Group

Meanwhile, back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Bar chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4,911; Hadoop: $750.]
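The gap in the chart is easier to judge as a ratio. Below is a back-of-the-envelope Python calculation using only the two $/TB figures from the chart. The 100 TB requirement is hypothetical, and linear scaling of hardware cost with capacity is an assumption (it ignores software licences, power and replication overhead).

```python
EXADATA_USD_PER_TB = 4911   # from the chart
HADOOP_USD_PER_TB = 750     # from the chart


def hardware_cost(usd_per_tb, terabytes):
    """Hardware-only cost, assuming cost scales linearly with capacity."""
    return usd_per_tb * terabytes


ratio = EXADATA_USD_PER_TB / HADOOP_USD_PER_TB
print(f"Exadata is {ratio:.1f}x the hardware cost per TB of Hadoop")

# Hypothetical 100 TB requirement.
for platform, rate in (("Exadata", EXADATA_USD_PER_TB), ("Hadoop", HADOOP_USD_PER_TB)):
    print(platform, f"${hardware_cost(rate, 100):,}")
```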

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
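The rack-level totals are the per-node figures multiplied by 18 nodes. A quick Python check of that arithmetic, using only the per-node numbers listed on the slide; the usable-capacity line assumes HDFS's default replication factor of 3, which the slide does not mention.

```python
# Per-node figures as listed on the slide.
NODES = 18
RAM_GB_PER_NODE = 48
CPUS_PER_NODE, CORES_PER_CPU = 2, 6
DISKS_PER_NODE, TB_PER_DISK = 12, 2
HDFS_REPLICATION = 3   # assumption: HDFS default, not stated on the slide

total_ram_gb = NODES * RAM_GB_PER_NODE
total_cores = NODES * CPUS_PER_NODE * CORES_PER_CPU
total_spindles = NODES * DISKS_PER_NODE
total_raw_tb = total_spindles * TB_PER_DISK
usable_tb = total_raw_tb / HDFS_REPLICATION   # every block stored three times

print(f"RAM: {total_ram_gb} GB, cores: {total_cores}, spindles: {total_spindles}")
print(f"Raw disk: {total_raw_tb} TB, usable at {HDFS_REPLICATION}x replication: {usable_tb:.0f} TB")
```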

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine Learning: programs that evolve with "experience"
• Collective Intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120.]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
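Linear regression, the first technique in the list, fits a line like the chart's f(x). A minimal ordinary-least-squares sketch in plain Python; the data points are invented for illustration and are not the slide's data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


# Hypothetical data points, roughly following y = x.
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 18, 41, 59, 79, 101]
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3), round(predict(slope, intercept, 120), 1))
```

Predicting at x = 120, beyond the observed range, is the extrapolation step that predictive analytics relies on; how far such a line can be trusted depends on the data.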

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
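As one concrete and deliberately simple classification technique, a nearest-centroid classifier learns the average feature vector of each labelled class and assigns new data to the closest one. The slide does not name an algorithm; this choice, the features and the churn data below are all hypothetical.

```python
from math import dist  # Python 3.8+


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical churn data: (support calls per month, months as a customer).
training = [([8, 3], "churn"), ([6, 5], "churn"), ([7, 2], "churn"),
            ([1, 30], "stay"), ([2, 24], "stay"), ([0, 40], "stay")]
model = train_centroids(training)
print(classify(model, [5, 4]))    # churn
print(classify(model, [1, 28]))   # stay
```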

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
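k-means is one common way to group data without predefined labels. A plain-Python sketch follows; the basket features (items per basket, average item price) and their values are made up, and real basket analysis often uses association-rule mining instead, so treat this only as an illustration of clustering in general.

```python
from math import dist  # Python 3.8+


def k_means(points, initial_centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move each
    centroid to the mean of its points. Returns (centroids, clusters)."""
    centroids = [list(c) for c in initial_centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


# Hypothetical baskets: (items per basket, average item price).
baskets = [(2, 3.0), (3, 2.5), (2, 2.0), (20, 4.0), (22, 5.0), (25, 4.5)]
centroids, clusters = k_means(baskets, initial_centroids=[baskets[0], baskets[3]])
print(centroids)
print(clusters)
```

No labels are supplied; the two groups (small top-up baskets and large weekly shops, say) emerge from the data alone.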

121 Software Group

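Supervised Machine Learning

[Diagram: raw data from the existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; the production model is applied to new data from new business to make predictions.]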
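The clean, split, train, validate and predict flow can be sketched end to end in a few lines. This is a minimal Python illustration under assumptions: the data is synthetic, the "model" is a single score threshold, and `split`, `train_threshold` and `accuracy` are hypothetical helper names, not any library's API.

```python
import random


def split(rows, validation_fraction=0.3, seed=42):
    """Shuffle cleaned rows and hold some back as a validation set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = len(rows) - round(len(rows) * validation_fraction)
    return rows[:cut], rows[cut:]


def accuracy(threshold, rows):
    """Fraction of (score, label) rows where `score >= threshold` matches the label."""
    return sum((score >= threshold) == label for score, label in rows) / len(rows)


def train_threshold(training):
    """Candidate model: the lowest score cut-off with the best training accuracy."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted({score for score, _ in training}):
        acc = accuracy(threshold, training)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
    return best_threshold


# Synthetic "existing business" data: (risk score, did the customer default?)
rows = [(score, score >= 50) for score in range(100)]
training, validation = split(rows)
model = train_threshold(training)                      # candidate model
print("validation accuracy:", accuracy(model, validation))

# Once validated, the production model scores new business.
new_scores = [12, 75]
print([score >= model for score in new_scores])
```

The validation set is never used for training, so its accuracy is an honest estimate of how the candidate model will behave on new data.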

122 Software Group

Unsupervised learning

[Image: LinkedIn InMaps (inmaps.linkedin.com)]

123 Software Group

124 Software Group

Big Data Analytics

Data Science
• Search optimization
• Recommendation systems
• Security
  - Vulnerability
  - Penetration detection
• Fraud detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: change data capture reads Oracle redo logs; changes are delivered through a JMS queue and a Hadoop Poster as batched HDFS file copies, as audit/change data, or as real-time replication into HBase.]
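Conceptually, change data capture turns committed database changes into an ordered stream that a target can replay. The toy Python sketch below replays such a stream into an HBase-style key-value table while keeping every change as audit data. It is not SharePlex code: the `Change` record layout and the `replicate` function are hypothetical, and ordering by Oracle's system change number (SCN) is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured change (hypothetical record layout)."""
    scn: int                      # commit order (Oracle's system change number)
    op: str                       # "INSERT", "UPDATE" or "DELETE"
    key: str                      # row key in the target table
    value: Optional[dict] = None  # column values for INSERT/UPDATE


def replicate(changes, table, audit_log):
    """Replay changes in commit order into an HBase-style key-value table,
    appending each change to an audit log as it is applied."""
    for change in sorted(changes, key=lambda c: c.scn):
        audit_log.append(change)
        if change.op == "DELETE":
            table.pop(change.key, None)
        elif change.op in ("INSERT", "UPDATE"):
            # Upsert: merge the new column values into the row.
            table.setdefault(change.key, {}).update(change.value or {})
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return table


captured = [
    Change(3, "DELETE", "cust:2"),
    Change(1, "INSERT", "cust:1", {"name": "Dick"}),
    Change(2, "INSERT", "cust:2", {"name": "Jane"}),
    Change(4, "UPDATE", "cust:1", {"city": "Las Vegas"}),
]
table, audit = {}, []
replicate(captured, table, audit)
print(table)   # {'cust:1': {'name': 'Dick', 'city': 'Las Vegas'}}
```

Replaying in commit order matters: applying the DELETE before the INSERT it follows would leave `cust:2` wrongly present.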

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring - integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig

#C14LV

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207: Surviving and thriving in the big data revolution
  • 207: Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest - a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • It's not enough to lay out products on tables
  • There's a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through "Big Data analytics"
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell's offering was not complete…
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring - integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app

61 Software Group

62 Software Group

63 Software Group

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Map Reduce

[Diagram: from Start, many Map tasks run in parallel and their outputs feed a Reduce step.]
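The classic illustration of this pattern is word count. The single-process Python sketch below walks through the map, shuffle/sort and reduce steps; the function names are hypothetical and this is not Hadoop's Java `Mapper`/`Reducer` API, where each task would run on a separate node.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(mapped_pairs):
    """Shuffle/sort: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key."""
    return key, sum(values)


def map_reduce(documents):
    # In Hadoop each map and reduce call could run on a different node;
    # here they simply run one after another in a single process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(mapped).items()))


print(map_reduce(["big data big ideas", "data means more"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'means': 1, 'more': 1}
```

Because each map call sees only its own split and each reduce call sees only one key, both phases parallelize naturally; the shuffle is the only step that moves data between them.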

68 Software Group

Multi-stage Map-Reduce

[Diagram: mappers read from HDFS and scan/sort the data; their output feeds a further set of mappers and an aggregate/reduce step, whose result is returned to the client.]
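Multi-stage jobs chain one map-reduce pass into the next, with the output of the first stage becoming the input of the second. Below is a hypothetical two-stage Python sketch over made-up web-log lines: stage 1 counts hits per site, stage 2 finds the busiest site. `run_stage` is an illustrative helper, not a Hadoop API.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One map-reduce stage: map every record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


# Hypothetical web-log lines: "user site"
log = ["dick google", "jane google", "dick ebay", "jane facebook", "dick google"]

# Stage 1: count hits per site.
hits = run_stage(log,
                 mapper=lambda line: [(line.split()[1], 1)],
                 reducer=lambda site, ones: (site, sum(ones)))

# Stage 2: a second job over stage 1's output finds the busiest site.
top = run_stage(hits,
                mapper=lambda pair: [("top", pair)],
                reducer=lambda _, pairs: max(pairs, key=lambda p: p[1]))

print(hits)  # [('ebay', 1), ('facebook', 1), ('google', 3)]
print(top)   # [('google', 3)]
```

In classic Hadoop each stage would be a separate job, with stage 1's output written to HDFS before stage 2 reads it.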

69 Software Group

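Schema on Read vs Schema on Write

[Diagram comparing two pipelines.
Schema on Write: data is extracted, cleansed, normalized and aggregated (transformed), then loaded into a data warehouse before it is coded against, analysed and utilized.
Schema on Read: data is loaded into Hadoop as-is, then cleansed, coded against, analysed and utilized.]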
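The difference is when structure is imposed. Here is a small Python sketch, assuming JSON event lines (the events and field names are made up). The schema-on-write loader validates and converts every record up front and rejects the whole load on bad input. The schema-on-read query stores lines untouched, applies only the fields a particular query needs, and skips records that do not fit.

```python
import json

# Hypothetical raw event lines, stored exactly as received.
RAW_EVENTS = [
    '{"user": "dick", "site": "google", "ms": "120"}',
    '{"user": "jane", "site": "ebay"}',   # no "ms" field
    'not even json',                      # malformed line
]


def load_schema_on_write(lines):
    """Schema on write: validate and convert every record before storing it.
    Any bad record raises, so nothing is loaded."""
    table = []
    for line in lines:
        record = json.loads(line)               # ValueError (JSONDecodeError) if malformed
        table.append({"user": str(record["user"]),
                      "site": str(record["site"]),
                      "ms": int(record["ms"])})  # KeyError if a field is missing
    return table


def query_schema_on_read(lines, field, cast=str):
    """Schema on read: interpret raw lines only when a query runs,
    skipping records that do not fit this query's schema."""
    for line in lines:
        try:
            yield cast(json.loads(line)[field])
        except (ValueError, KeyError, TypeError):
            continue


print(list(query_schema_on_read(RAW_EVENTS, "site")))     # ['google', 'ebay']
print(list(query_schema_on_read(RAW_EVENTS, "ms", int)))  # [120]
try:
    load_schema_on_write(RAW_EVENTS)
except (ValueError, KeyError) as err:
    print("load rejected:", repr(err))
```

The trade-off: schema on write catches bad data early but forces the schema to be decided before loading; schema on read loads anything immediately and defers those decisions (and their failures) to each query.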

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster:
• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores
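Dividing the cluster totals by the node count shows that the scale comes from the number of machines rather than from any single large server. A quick Python calculation, using decimal units (an assumption; the slide does not say which units it uses):

```python
NODES = 4000
DISK_PB, RAM_TB, CORES = 16, 64, 32000   # cluster totals from the slide

# Assumption: decimal units (1 PB = 1000 TB, 1 TB = 1000 GB).
disk_tb_per_node = DISK_PB * 1000 / NODES
ram_gb_per_node = RAM_TB * 1000 / NODES
cores_per_node = CORES / NODES

print(f"Per node: {disk_tb_per_node} TB disk, {ram_gb_per_node} GB RAM, {cores_per_node} cores")
```

Roughly 4 TB of disk, 16 GB of RAM and 8 cores per node: commodity hardware, multiplied.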

  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
66 Software Group

Google Software Architecture

[Layered diagram, top to bottom: Google Applications; Map Reduce and BigTable; Google File System (GFS)]

67 Software Group

Map Reduce

[Diagram: from Start, a large number of Map tasks run in parallel, and their output feeds a single Reduce step]
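The map/reduce flow in the diagram can be sketched in a few lines of Python using the classic word-count example. This is a minimal single-process illustration of the programming model, assuming the input splits fit in memory; the function names are hypothetical and this is not Hadoop's Java API, where the framework (not your code) performs the shuffle across machines.

```python
from collections import defaultdict


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine every value emitted for one key."""
    return key, sum(values)


def word_count(documents):
    # Each document stands in for the input split handled by one Map task.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats ideas"]))
```

Because each Map call only sees its own split and each Reduce call only sees one key, both phases can be spread over as many machines as there are splits and keys.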

68 Software Group

Multi-stage Map-Reduce

[Diagram: data in HDFS is read by a first stage of MAPPER tasks (SCAN/SORT), whose output feeds a second, smaller stage of MAPPER tasks (AGGREGATE), then a final REDUCE step that returns the result to the Client]
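Chaining passes is what makes a job multi-stage: the reduced output of one pass becomes the input of the next. The sketch below assumes a hypothetical `run_stage` helper and made-up web-log lines of the form "user site"; stage 1 counts hits per site and stage 2 ranks the sites. In Hadoop each stage would typically be a separate job, with intermediate results written to HDFS in between.

```python
from itertools import groupby
from operator import itemgetter


def run_stage(records, mapper, reducer):
    """One map-reduce pass: map every record, sort by key (the shuffle), reduce each key group."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # groupby only merges adjacent keys, so sort first
    return [reducer(key, [value for _, value in group])
            for key, group in groupby(mapped, key=itemgetter(0))]


# Stage 1 (scan/sort, then aggregate): count hits per site from "user site" lines.
def hits_mapper(line):
    _, site = line.split()
    yield site, 1


def hits_reducer(site, counts):
    return site, sum(counts)


# Stage 2: send every (site, hits) pair to one key so a single reducer can rank them.
def rank_mapper(site_hits):
    yield "all", site_hits


def rank_reducer(_, site_hits):
    return sorted(site_hits, key=lambda sh: (-sh[1], sh[0]))


if __name__ == "__main__":
    log = ["dick ebay", "jane google", "dick google", "jane facebook", "dick google"]
    stage1 = run_stage(log, hits_mapper, hits_reducer)
    (ranking,) = run_stage(stage1, rank_mapper, rank_reducer)
    print(ranking)
```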

69 Software Group

Schema on Read vs Schema on Write

[Diagram comparing two pipelines]

Schema on Write: Data → Code, Extract, Transform, Cleanse, Normalize, Aggregate → Load into the Data Warehouse → Analyse → Utilize

Schema on Read: Data → Load into Hadoop → Code, Cleanse, Analyse → Utilize
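The difference between the two pipelines is when the schema is enforced. In this sketch (the CSV feed and function names are hypothetical), the schema-on-write loader parses and rejects bad rows before anything is stored, while the schema-on-read loader keeps the raw lines and each query applies its own parsing when it runs.

```python
import csv
import io

# Hypothetical raw feed: the second data row has a non-numeric amount.
RAW = "id,amount\n1,10.5\n2,oops\n3,4.5\n"


def load_schema_on_write(raw):
    """Schema on write: parse and validate at load time; bad rows never reach the store."""
    stored, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw)):
        try:
            stored.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            rejected.append(row)
    return stored, rejected


def load_schema_on_read(raw):
    """Schema on read: keep the raw lines exactly as they arrived; no parsing at load time."""
    return raw.splitlines()


def query_total_amount(raw_lines):
    """Apply a schema at query time, skipping rows this particular query cannot interpret."""
    total = 0.0
    for row in csv.DictReader(raw_lines):
        try:
            total += float(row["amount"])
        except ValueError:
            continue
    return total
```

The trade-off shows up directly: the warehouse pays the cleansing cost once at load time, while Hadoop defers it, so nothing is discarded at load but every query has to cope with dirty data.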

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

74 Software Group

Hadoop ecosystem

• Hadoop Distributed File System (HDFS)
• Map Reduce, YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to MapReduce (distributed processing), coordinated by the Job Tracker, running over HDFS (distributed storage), coordinated by the Name Node and a Secondary Name Node; each worker node runs both a Data Node and a Task Tracker]
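To make the Name Node / Data Node split concrete, here is a sketch of the metadata a Name Node keeps for one file: the file is cut into fixed-size blocks and each block is replicated on several Data Nodes. The 64 MB block size and replication factor of 3 are assumptions (common Hadoop 1.x defaults), the function name is hypothetical, and the round-robin placement is a simplification; real HDFS uses a rack-aware placement policy.

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # assumption: 64 MB, the usual Hadoop 1.x default block size
REPLICATION = 3        # assumption: the usual HDFS default replication factor


def plan_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Sketch of Name Node metadata for one file.

    Returns a list of (block_id, length_in_bytes, [Data Nodes holding a replica]).
    Replicas go to consecutive Data Nodes in round-robin order (a simplification).
    """
    if replication > len(datanodes):
        raise ValueError("need at least as many Data Nodes as replicas")
    plan = []
    offset = 0
    while offset < file_size:
        block_id = len(plan)
        length = min(block_size, file_size - offset)  # the last block may be short
        first = block_id % len(datanodes)
        replicas = [datanodes[(first + i) % len(datanodes)] for i in range(replication)]
        plan.append((block_id, length, replicas))
        offset += length
    return plan


if __name__ == "__main__":
    for block in plan_blocks(150 * MB, ["dn1", "dn2", "dn3", "dn4"]):
        print(block)
```

Since every block lives on several Data Nodes, a Task Tracker on any of those nodes can process that block locally, which is why the diagram pairs each Data Node with a Task Tracker.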

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) talks to the Resource Manager, which coordinates several Node Managers; each Node Manager hosts Containers, and an Application Master runs in one of them]
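The YARN split of responsibilities can be illustrated with a toy model in which a Resource Manager grants memory-sized containers on Node Managers, and an Application Master asks for one container per task. All class and function names here are hypothetical, as is the 1 GB Application Master size; this is not the YARN API, and real YARN schedulers (capacity, fair) also handle queues, CPU and data locality.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    memory_mb: int                                   # memory this node offers to containers
    containers: list = field(default_factory=list)   # (app_id, memory_mb) pairs

    def free_mb(self):
        return self.memory_mb - sum(mb for _, mb in self.containers)


class ResourceManager:
    """Tracks cluster capacity and grants containers; it does not run any tasks itself."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app_id, memory_mb):
        # First fit: place the container on the first node with enough free memory.
        for node in self.nodes:
            if node.free_mb() >= memory_mb:
                node.containers.append((app_id, memory_mb))
                return node.name
        return None  # cannot be satisfied right now; a real AM would retry later


def run_application(rm, app_id, task_memory_mb, tasks):
    """Application Master: one container for itself, then one container per task."""
    placements = [rm.allocate(app_id, 1024)]  # assumed 1 GB for the Application Master
    placements += [rm.allocate(app_id, task_memory_mb) for _ in range(tasks)]
    return placements
```

Compared with the Hadoop 1.0 Job Tracker, per-job bookkeeping moves into the Application Master, leaving the Resource Manager with only cluster-wide allocation.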

77 Software Group

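Tez¹

¹Hindi for “fast”

[Diagram: a pipeline of MAP and REDUCE stages run as three separate jobs (Job 1, Job 2, Job 3), each passing its results through HDFS, compared with the same stages run by Tez as a single job (Job 1) that reads from and writes to HDFS only at the ends]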

78 Software Group

HBase

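A real-time database built on Hadoop

[Diagram comparing Oracle with HBase component by component: ASM ↔ HDFS, Datafiles ↔ HFiles, Buffer Cache ↔ MemStore, Redo and Log Buffer ↔ Write-Ahead (WA) Log; in both, tables sit on top and the data ends up on disks]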
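The Oracle analogy on the slide maps onto a simple write path: every edit goes to a write-ahead log first (like redo), is then held in the MemStore (like the buffer cache), and is periodically flushed to immutable, key-sorted files (the HFiles). The class below is a minimal sketch of that idea with hypothetical names; real HBase also rolls and archives the log, compacts HFiles, and keeps indexes and Bloom filters to speed up reads.

```python
import bisect


class MiniRegion:
    """Toy HBase-style store: write-ahead log + in-memory MemStore + sorted immutable files."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log: every edit is appended here first
        self.memstore = {}     # recent edits, held in memory
        self.hfiles = []       # flushed, immutable, key-sorted snapshots (stand-ins for HFiles)
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # durability first
        self.memstore[row_key] = value      # then the in-memory update
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a new sorted file and start an empty MemStore.
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}

    def get(self, row_key):
        # Newest data wins: check the MemStore, then files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            keys = [key for key, _ in hfile]
            i = bisect.bisect_left(keys, row_key)  # files are sorted, so binary search works
            if i < len(keys) and keys[i] == row_key:
                return hfile[i][1]
        return None
```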

79 Software Group

HBase Data Model

Flat table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Relational (normalized) model:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase model (one wide, sparse row per name; each row holds only the columns it needs):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
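The move from the relational model to the HBase model is essentially a pivot: each name becomes one row key, and each site becomes a column that exists only in the rows that need it. The sketch below performs that pivot on the slide's flat table, using nested Python dictionaries as a stand-in for HBase's row key → column → value structure; the function name is hypothetical, and real HBase cells additionally carry a column family and a timestamp.

```python
from collections import defaultdict

# The flat (Name, Site, Counter) rows from the slide.
FLAT = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(flat_rows):
    """Pivot flat rows into one sparse row per name: row key -> {column: value}.

    Only columns that actually have a value are stored, so Dick and Jane end up
    with different column sets, as rows in an HBase table can.
    """
    table = defaultdict(dict)
    for name, site, counter in flat_rows:
        table[name][site] = counter
    return dict(table)


if __name__ == "__main__":
    for row_key, columns in to_wide_rows(FLAT).items():
        print(row_key, columns)
```

Reading all of one person's counters is then a single row lookup rather than a three-table join.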

80 Software Group

Hive

82 Software Group

[Diagram: a SQL query is turned into Java (MapReduce) code, which produces the results]
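Conceptually, Hive turns a `GROUP BY` into a map phase that emits the grouping column as the key and a reduce phase that computes the aggregate per key. The sketch below emulates that for a hypothetical `visits` table and query; it illustrates the idea only and is not the plan Hive actually generates.

```python
from collections import defaultdict

# Hypothetical HiveQL being emulated:
#   SELECT site, SUM(counter) FROM visits GROUP BY site;
VISITS = [("Dick", "Google", 690414), ("Jane", "Google", 716426), ("Dick", "Ebay", 507018)]


def group_by_sum_mapper(row):
    _, site, counter = row
    yield site, counter          # the GROUP BY column becomes the map output key


def group_by_sum_reducer(site, counters):
    return site, sum(counters)   # the aggregate runs once per key in the reducer


def run_query(rows):
    groups = defaultdict(list)
    for row in rows:
        for key, value in group_by_sum_mapper(row):
            groups[key].append(value)
    return sorted(group_by_sum_reducer(k, v) for k, v in groups.items())


if __name__ == "__main__":
    print(run_query(VISITS))
```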

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (External Table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads WebLogs into HDFS; SQOOP loads tables such as CUSTOMERS and PRODUCTS from an RDBMS into HDFS]

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (shown alongside YARN and EC2)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase, MLlib – machine learning
• Shark (SQL)
• BlinkDB
• Map Reduce
• Hive (SQL)

87 Software Group

Meanwhile back at the Death Star

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Bar chart: Exadata vs Hadoop, $/TB (hardware only)]
• Exadata: $4,911 per TB
• Hadoop: $750 per TB
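A quick worked calculation puts the Economies figures in perspective: $4,911 / $750 is roughly 6.5, so each terabyte on Exadata costs about six and a half times as much hardware as on Hadoop. The 100 TB capacity in the usage example is hypothetical, and the calculation ignores software, support and operating costs, as the chart does.

```python
# Hardware-only $/TB figures from the Economies chart.
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750


def hardware_cost(terabytes, usd_per_tb):
    """Hardware cost for a given raw capacity at a flat $/TB rate."""
    return terabytes * usd_per_tb


ratio = EXADATA_USD_PER_TB / HADOOP_USD_PER_TB  # about 6.5x

if __name__ == "__main__":
    capacity_tb = 100  # hypothetical capacity, for illustration only
    print(f"Exadata: ${hardware_cost(capacity_tb, EXADATA_USD_PER_TB):,}")
    print(f"Hadoop:  ${hardware_cost(capacity_tb, HADOOP_USD_PER_TB):,}")
    print(f"Ratio:   {ratio:.1f}x")
```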

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 864 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
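Multiplying the per-node figures out is a quick consistency check on the appliance specification. The memory (18 × 48 GB = 864 GB), core (18 × 12 = 216) and spindle (18 × 12 = 216) totals all match; for storage, the same arithmetic gives 216 × 2 TB = 432 TB raw, so the 864 TB total and the 2 TB drive size cannot both be correct as transcribed. The constant and function names below are illustrative.

```python
NODES = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6     # two 6-core CPUs per node
DISKS_PER_NODE = 12
TB_PER_DISK = 2            # drive size as printed on the slide


def appliance_totals(nodes=NODES):
    """Multiply the per-node figures out to rack-level totals."""
    return {
        "ram_gb": nodes * RAM_GB_PER_NODE,
        "cores": nodes * CORES_PER_NODE,
        "spindles": nodes * DISKS_PER_NODE,
        "raw_tb": nodes * DISKS_PER_NODE * TB_PER_DISK,
    }


if __name__ == "__main__":
    print(appliance_totals())
```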

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics, AKA Data Science

96 Software Group

Collective Intelligence

105 Software Group

Google Flu Trends

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

112 Software Group

Artificial Intelligence Strikes back

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot of sample data with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
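The trend line on the chart is the kind of line an ordinary least-squares fit produces. A minimal sketch of simple linear regression using the closed-form formulas follows; the function names are illustrative, and the sample points are hypothetical, chosen to lie exactly on y = 2x + 1 so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept)                # 2.0 1.0
    print(predict(slope, intercept, 10))   # 21.0
```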

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
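As a minimal sketch of classification, the nearest-centroid classifier below learns one average feature vector per label from labelled examples and assigns new data to the closest one. The spam/ham features and function names are hypothetical, and nearest centroid is just one simple technique chosen for illustration; the slide does not name a specific algorithm.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two equally long coordinate sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def train_centroids(samples):
    """Training: average the feature vectors of each class into one centroid per label."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    """Prediction: return the label whose centroid is closest to the new data point."""
    return min(centroids, key=lambda label: euclidean(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical features per e-mail: (number of links, fraction of capital letters).
    training = [((8, 0.6), "spam"), ((10, 0.7), "spam"), ((0, 0.1), "ham"), ((1, 0.05), "ham")]
    model = train_centroids(training)
    print(classify(model, (9, 0.5)), classify(model, (0, 0.0)))
```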

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
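k-means is one common way to group data without a pre-existing classification scheme: it alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below seeds with the first k points (a simplification; real implementations typically use random or k-means++ initialization), and the sample points are hypothetical, standing in for, say, customers described by two spending features.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two equally long coordinate sequences."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points, k, max_iterations=100):
    """Return (centroids, clusters) for k clusters using Lloyd's algorithm."""
    centroids = [list(p) for p in points[:k]]  # simplification: first k points as seeds
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            [sum(coords) / len(members) for coords in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
    centroids, clusters = kmeans(data, 2)
    print(clusters)
```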

121 Software Group

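Supervised Machine Learning

[Diagram: raw data from the existing business is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set to become the production model; the production model takes new data from new business and returns a prediction]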
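The workflow in the diagram (clean, split into training and validation sets, build a candidate model, validate it, then promote it to production) can be sketched end to end with a deliberately simple model: a single threshold on one feature. The data, the function names, the every-fourth-row validation split and the 0.9 acceptance bar are all hypothetical choices for illustration.

```python
def accuracy(model, rows):
    """Fraction of (x, label) rows the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


def split(rows, every=4):
    """Hold out every `every`-th row as the validation set; the rest is the training set."""
    training = [row for i, row in enumerate(rows) if i % every]
    validation = [row for i, row in enumerate(rows) if not i % every]
    return training, validation


def train_threshold(training):
    """Candidate model: the cut-off on x that best separates label 0 from label 1."""
    best_score, best_t = -1.0, None
    for t in sorted({x for x, _ in training}):
        score = accuracy(lambda x: int(x >= t), training)
        if score > best_score:
            best_score, best_t = score, t
    return lambda x: int(x >= best_t)


def build_production_model(raw_rows, min_accuracy=0.9):
    """Clean, split, train a candidate, validate it, then promote it (or reject it)."""
    cleaned = [(x, label) for x, label in raw_rows if x is not None and label is not None]
    training, validation = split(cleaned)
    candidate = train_threshold(training)
    score = accuracy(candidate, validation)
    if score < min_accuracy:
        raise ValueError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate, score


if __name__ == "__main__":
    # Hypothetical "existing business" data: label is 1 when x >= 50, plus one dirty row.
    raw = [(x, int(x >= 50)) for x in range(0, 100, 5)] + [(None, 1)]
    production_model, validation_accuracy = build_production_model(raw)
    print(validation_accuracy, production_model(72), production_model(10))
```

Keeping the validation rows out of training is what makes the validation score an honest estimate of how the production model will do on new business.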

122 Software Group

Unsupervised learning

Inmaps.linkedin.com

124 Software Group

Big Data Analytics

[Diagram: Data Science and its application areas]
• Search Optimization
• Recommendation Systems
• Security: vulnerability, penetration detection
• Fraud Detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

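SharePlex® for Hadoop

[Diagram: SharePlex reads the redo logs (Change Data Capture) and passes the changes through a JMS queue to a Hadoop Poster, which delivers them as a batched HDFS file copy, as audit change data, or as real-time replication into HBase]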
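The SharePlex flow can be pictured as replaying an ordered stream of captured changes against a target. The sketch below applies insert, update and delete records in SCN order to a dictionary standing in for an HBase table, and appends each change to a list standing in for the audit/HDFS copy. The record shape and function names are hypothetical; they are not SharePlex's actual formats or APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured change; a hypothetical record shape, not SharePlex's actual format."""
    scn: int                     # ordering number taken from the redo stream
    op: str                      # "insert", "update" or "delete"
    key: str                     # primary key of the changed row
    row: Optional[dict] = None   # changed column values (None for deletes)


def apply_changes(changes, table, audit_log):
    """Replay changes in SCN order onto a key-value table (standing in for HBase)
    and record every change in an append-only audit log (standing in for HDFS files)."""
    for change in sorted(changes, key=lambda c: c.scn):
        audit_log.append((change.scn, change.op, change.key))
        if change.op == "delete":
            table.pop(change.key, None)
        elif change.op in ("insert", "update"):
            table.setdefault(change.key, {}).update(change.row or {})
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return table


if __name__ == "__main__":
    captured = [
        Change(2, "update", "c1", {"city": "Austin"}),
        Change(1, "insert", "c1", {"name": "Dick", "city": "Sydney"}),
        Change(3, "insert", "c2", {"name": "Jane"}),
        Change(4, "delete", "c2"),
    ]
    audit = []
    print(apply_changes(captured, {}, audit))
    print(audit)
```

Sorting by SCN matters: applying the update before the insert would leave the wrong city for c1.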

130 Software Group

Toad BI Suite

132 Software Group – Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga
• Business Intelligence: TOAD BI
• Server and Storage

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group – Confidential

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions:
• Data Integration: Boomi
• Database Management: TOAD & SharePlex
• Advanced Analytics: Kitenga, STATISTICA
• Business Intelligence: TOAD BI
• Server and Storage

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group – Confidential

135 Software Group – Confidential

Data Visualization

136 Software Group – Confidential

Live scoring – integration into operational systems

137 Software Group – Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with Math & Statistics
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

This box will have simplified instructions about how to complete the session evaluation online

Page 64: Thriving and surviving the Big Data revolution

64 Software Group

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 65: Thriving and surviving the Big Data revolution

65 Software Group

66 Software Group

Google File System (GFS)

Map Reduce BigTable

Google Applications

Google Software Architecture

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

Machine Learning: programs that evolve with "experience"

Collective Intelligence: programs that use inputs from "crowds" to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: scatter plot with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
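The trend line on the chart is a simple linear regression of the form y = slope * x + intercept. Below is a minimal ordinary-least-squares sketch in Python. The sample points are made up for illustration and are not the data behind the chart. `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate with the fitted line."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical observations (not the chart's data).
xs = [0, 10, 20, 30, 40]
ys = [1, 11, 19, 31, 39]
model = fit_line(xs, ys)
print(model, predict(model, 100))
```

Multivariate, non-linear and logistic variants follow the same pattern: fit parameters on historical data, then apply them to new inputs.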

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
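As a concrete, deliberately tiny illustration of classification, here is a nearest-centroid classifier in Python. It averages the feature vectors of each labelled class, then assigns new data to the closest average. Nearest-centroid is chosen only for brevity; the slide does not name an algorithm. The spam features (link count, exclamation marks) and the training examples are hypothetical.

```python
from math import dist


def train_centroids(examples):
    """examples: list of (feature_vector, label). Returns one mean vector per label."""
    sums, counts = {}, {}
    for features, label in examples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(centroids, features):
    # Pick the label whose centroid is closest in Euclidean distance.
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features: [number of links, number of exclamation marks]
training = [([8, 5], "spam"), ([10, 7], "spam"), ([0, 0], "ham"), ([1, 1], "ham")]
centroids = train_centroids(training)
print(classify(centroids, [7, 4]), classify(centroids, [0, 1]))
```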

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
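k-means is one common way to group data without predefined labels; the slide itself does not prescribe an algorithm. The sketch below is a minimal pure-Python version. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The two-dimensional points are hypothetical, and the fixed iteration count and seed are simplifications.

```python
import random
from math import dist


def kmeans(points, k, iterations=20, seed=0):
    """Return (centroids, clusters) after a fixed number of k-means iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep the old one if empty).
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customers described by two behaviour scores.
points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)
```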

121 Software Group

Supervised Machine Learning

Diagram: Raw data (existing business) -> Clean -> Training set -> Candidate model -> Validate (against validation set) -> Production model; New data (new business) -> Production model -> Prediction
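The workflow in the diagram can be sketched end to end in Python:

1. Split cleaned historical data into a training set and a validation set.
2. Fit a candidate model on the training set.
3. Measure its error on the held-back validation set.
4. Promote it to production only if the error is acceptable.
5. Use the production model to score new data.

The (visits, spend) data, the one-parameter "rate" model and the acceptance threshold are all hypothetical. They stand in for whatever model family and quality bar a real project would use.

```python
import random


def split(rows, validation_fraction=0.25, seed=42):
    """Shuffle rows and hold back a fraction of them as the validation set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_rate_model(training):
    """Candidate model: predict spend = rate * visits, with rate learned from training data."""
    rate = sum(spend for _, spend in training) / sum(visits for visits, _ in training)
    return lambda visits: rate * visits


def mean_abs_error(model, rows):
    return sum(abs(model(visits) - spend) for visits, spend in rows) / len(rows)


# Hypothetical (visits, spend) history for existing customers.
rows = [(v, 2 * v + (1 if v % 2 else -1)) for v in range(1, 21)]
training, validation = split(rows)
candidate = fit_rate_model(training)
error = mean_abs_error(candidate, validation)

ACCEPTABLE_ERROR = 2.0  # hypothetical acceptance threshold
production_model = candidate if error < ACCEPTABLE_ERROR else None
prediction = production_model(30) if production_model else None  # new customer, 30 visits
print(error, prediction)
```

Keeping the validation rows out of training is the point of the split: the error measured on them estimates how the model will behave on new business it has never seen.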

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram: redo logs -> Change Data Capture -> JMS queue / Hadoop Poster -> batched HDFS file copy (audit change data) or HBase real-time replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring - integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


66 Software Group

Google Software Architecture

Diagram (top to bottom): Google Applications; Map Reduce and BigTable; Google File System (GFS)

67 Software Group

Map Reduce

Diagram: Start -> many Map tasks running in parallel -> Reduce
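The map/reduce pattern in the diagram can be sketched as a classic word count in Python. This is a single-machine toy that assumes the input has already been divided into splits. The thread pool only stands in for Hadoop's map tasks running across a cluster. Merging the per-split `Counter`s plays the role of the reduce step; a real job would also partition keys across several reducers. The function names and sample text are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_task(split: str) -> Counter:
    # One map task: tokenise its input split and count words locally.
    return Counter(split.lower().split())


def reduce_task(partials) -> Counter:
    # Reduce: merge the per-split counts into global totals.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(splits):
    # Map tasks are independent of each other, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(map_task, splits))
    return reduce_task(partials)


splits = ["big data big ideas", "data beats ideas", "big clusters"]
counts = word_count(splits)
print(counts.most_common(3))
```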

68 Software Group

Multi-stage Map-Reduce

Diagram: HDFS -> mappers (scan/sort) -> mappers (aggregate) -> reduce -> client
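A multi-stage pipeline chains map-reduce passes so that one stage's output becomes the next stage's input. In the Python sketch below, stage 1 counts hits per URL in a web log. Stage 2 re-keys those counts onto a single key so that one reducer can pick the top URLs. `run_stage`, the mappers and reducers, and the log data are hypothetical. In Hadoop each stage would be a separate job that writes its output to HDFS.

```python
from collections import defaultdict
from itertools import chain


def run_stage(records, mapper, reducer):
    """One map-reduce stage: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(record) for record in records):
        groups[key].append(value)
    # Visiting keys in sorted order mimics the sort between the map and reduce phases.
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


# Hypothetical web log: "user url" per line.
log = ["alice /home", "bob /home", "carol /cart", "alice /cart", "bob /home", "dave /checkout"]


def count_mapper(line):
    _user, url = line.split()
    yield url, 1


def count_reducer(url, ones):
    yield url, sum(ones)


def top_mapper(pair):
    url, hits = pair
    yield "all", (hits, url)  # one key, so a single reducer sees every count


def top_reducer(_key, values, n=2):
    for hits, url in sorted(values, reverse=True)[:n]:
        yield url, hits


stage1 = run_stage(log, count_mapper, count_reducer)   # hits per URL
top_urls = run_stage(stage1, top_mapper, top_reducer)  # stage 2 reads stage 1's output
print(top_urls)
```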

69 Software Group

Schema on Read vs Schema on Write

Schema on Write: Data -> Extract, Transform (code, cleanse, normalize, aggregate), Load -> Data Warehouse -> Analyse -> Utilize

Schema on Read: Data -> Load -> Hadoop -> Code, cleanse, analyse -> Utilize
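The contrast can be made concrete with a small Python sketch. Under schema on write, every row is converted to typed fields while it is loaded, so a single malformed row makes the load fail. Under schema on read, the raw text is stored as-is, and the schema is applied only when a query reads it; that query decides how to treat rows that do not fit. The CSV extract, the `SCHEMA` tuple and all function names are hypothetical.

```python
import csv
import io
from datetime import date

# Hypothetical raw sales extract; the second row has a bad amount.
RAW = "2014-04-07,alice,19.99\n2014-04-07,bob,not-a-number\n2014-04-08,carol,5.00\n"

# Column names paired with the function that converts the raw text.
SCHEMA = (("day", date.fromisoformat), ("customer", str), ("amount", float))


def apply_schema(row):
    """Convert one row of raw strings into typed fields (raises ValueError on bad data)."""
    return {name: convert(value) for (name, convert), value in zip(SCHEMA, row)}


def load_schema_on_write(raw):
    # Schema on write: every row is converted during the load, so one bad row fails it.
    return [apply_schema(row) for row in csv.reader(io.StringIO(raw))]


def load_schema_on_read(raw):
    # Schema on read: store the raw text untouched; the load itself cannot fail.
    return raw


def total_amount(stored):
    # The schema is applied only at query time; rows that do not fit are skipped.
    total = 0.0
    for row in csv.reader(io.StringIO(stored)):
        try:
            total += apply_schema(row)["amount"]
        except ValueError:
            continue
    return total


try:
    load_schema_on_write(RAW)
    write_succeeded = True
except ValueError:
    write_succeeded = False

total = total_amount(load_schema_on_read(RAW))
print(write_succeeded, total)
```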

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce, YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

Diagram: a Hadoop client (Java, Pig, Hive) uses MapReduce (distributed processing), coordinated by the Job Tracker, over HDFS (distributed storage), managed by the Name Node and Secondary Name Node; each worker node runs a Data Node and a Task Tracker

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

Diagram: a Hadoop client (Java, Pig, Hive) talks to the Resource Manager; an Application Master coordinates the work; Node Managers on the worker nodes host the Containers

77 Software Group

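Tez¹

¹ Hindi for "fast"

Diagram: with plain MapReduce, a pipeline runs as three chained jobs (Job 1, Job 2, Job 3), each a map/reduce pass that reads from and writes to HDFS; with Tez, the same map and reduce steps run as a single job (Job 1) between HDFS input and output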

78 Software Group

HBase

A real-time database built on Hadoop

Diagram comparing the Oracle and HBase storage architectures:
• Oracle: tables, buffer cache, log buffer, redo, datafiles, ASM, disks
• HBase: tables, MemStore, write-ahead (WA) log, HFiles, HDFS, disks
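The HBase side of the diagram follows a write-ahead-log plus in-memory-buffer design. The Python toy below sketches that write path:

1. Each edit is appended to a write-ahead log and then applied to an in-memory MemStore.
2. When the MemStore grows past a threshold, it is flushed to a new immutable, key-sorted "HFile".
3. Reads check the MemStore first, then the HFiles from newest to oldest.

`ToyRegionStore` and its methods are hypothetical and are not the HBase API. Real HBase also versions cells by timestamp, compacts HFiles in the background, and keeps both the log and the HFiles on HDFS.

```python
import bisect


class ToyRegionStore:
    """Toy model of an HBase-style write path (hypothetical; not the HBase API)."""

    def __init__(self, flush_threshold=3):
        self.wal = []        # write-ahead log: each edit is appended here first
        self.memstore = {}   # in-memory buffer of recent edits
        self.hfiles = []     # immutable, key-sorted files; newest last
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))  # log the edit before applying it
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Persist the MemStore as a new sorted, immutable "HFile".
        self.hfiles.append(sorted(self.memstore.items()))
        self.memstore = {}
        # Every logged edit is now in an HFile, so the toy WAL can be truncated.
        self.wal = []

    def get(self, row_key):
        # Newest data wins: check the MemStore, then HFiles from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            keys = [key for key, _ in hfile]
            i = bisect.bisect_left(keys, row_key)
            if i < len(keys) and keys[i] == row_key:
                return hfile[i][1]
        return None


store = ToyRegionStore(flush_threshold=2)
store.put("Dick", 507018)
store.put("Jane", 716426)  # second distinct key triggers a flush
store.put("Dick", 690414)  # newer value sits in the MemStore
print(store.get("Dick"), store.get("Jane"))
```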

79 Software Group

HBase Data Model

Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
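The move from the normalized tables to HBase's wide, sparse rows amounts to a pivot. Each person becomes one row keyed by Id, and each site they visited becomes a column that exists only in that row. The Python sketch below performs this pivot on the slide's sample rows, taking Google as SiteId 2 per the SiteId/SiteName table. Plain dicts stand in for HBase rows here. In HBase itself, columns belong to column families and values are stored as byte arrays, which this toy ignores.

```python
from collections import defaultdict

# Normalized sample data from the slide (Google is SiteId 2).
names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
visits = [(1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
          (2, 3, 643261), (2, 4, 856767), (1, 5, 675230)]


def to_wide_rows(visits):
    """Pivot (NameId, SiteId, Counter) rows into one sparse row per person."""
    rows = defaultdict(dict)
    for name_id, site_id, counter in visits:
        row = rows[name_id]
        row["Name"] = names[name_id]
        row[sites[site_id]] = counter  # the site name becomes a column name
    return dict(rows)


wide = to_wide_rows(visits)
print(wide[1])
```

Because a row stores only the columns it actually has, Dick's row has no ILoveLarry.com column at all, rather than a NULL.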

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 67: Thriving and surviving the Big Data revolution

67 Software Group

Start ReduceMapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

MapMap

Map Reduce

68 Software Group

HDFS

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

MAPPER

SCANSORT

MAPPER

MAPPER

MAPPER

MAPPER

AGGREGATE

REDUCEClient

Multi-stage Map-Reduce

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables

68 Software Group

Multi-stage Map-Reduce

[Diagram: HDFS → mappers → scan/sort → mappers → aggregate → reduce → client]
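In a multi-stage pipeline the output of one map-reduce stage becomes the input of the next. A minimal in-memory sketch in Python, assuming a plain list stands in for HDFS and sorting by key stands in for the shuffle; `run_stage` and the mapper/reducer names are hypothetical and this is not the Hadoop API:

```python
from collections import defaultdict

def run_stage(records, mapper, reducer):
    """One map-reduce stage: map every record, group values by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):          # sorted keys stand in for the shuffle/sort step
        output.extend(reducer(key, groups[key]))
    return output

# Stage 1: word count.
def emit_words(line):
    for word in line.split():
        yield word, 1

def sum_counts(word, ones):
    yield word, sum(ones)

# Stage 2: regroup the stage-1 output by frequency.
def emit_by_count(pair):
    word, count = pair
    yield count, word

def list_words(count, words):
    yield count, sorted(words)

lines = ["big data big ideas", "data beats ideas", "big wins"]
stage1 = run_stage(lines, emit_words, sum_counts)
stage2 = run_stage(stage1, emit_by_count, list_words)
print(stage2)  # [(1, ['beats', 'wins']), (2, ['data', 'ideas']), (3, ['big'])]
```

In a real cluster each stage's output would be written back to HDFS before the next stage starts, which is the overhead the later Tez slide targets.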

69 Software Group

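Schema on Read vs Schema on Write

[Diagram]
Schema on Write (data warehouse): extract the data, transform it (cleanse, normalize, aggregate) and load it into the data warehouse; then code, analyse and utilize.
Schema on Read (Hadoop): load the raw data into Hadoop first; then code, cleanse, analyse and utilize.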
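The difference is when the structure is imposed. A minimal sketch, assuming a tiny CSV string stands in for the source data (the sample rows and function names are invented for illustration): schema on write parses and rejects rows at load time, while schema on read stores the raw lines and applies the parsing only when a query runs.

```python
import csv
import io

RAW = "id,amount\n1,10.5\n2,oops\n3,7\n"

def load_schema_on_write(text):
    """Parse and validate at load time; rows that do not fit are rejected before storage."""
    table, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            table.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except ValueError:
            rejected.append(row)
    return table, rejected

def load_schema_on_read(text):
    """Store the raw lines unchanged; nothing is parsed at load time."""
    return text.splitlines()

def query_total(raw_lines):
    """Apply the schema while reading, skipping rows that do not fit this query's view."""
    total = 0.0
    for row in csv.DictReader(raw_lines):
        try:
            total += float(row["amount"])
        except ValueError:
            continue
    return total

table, rejected = load_schema_on_write(RAW)
raw = load_schema_on_read(RAW)
print(len(table), len(rejected), len(raw), query_total(raw))  # 2 1 4 17.5
```

Both approaches give the same total here; the difference is that the raw store still holds row 2, so a later query with a different parser could reinterpret it.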

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores
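Dividing the cluster totals by the node count gives the size of a typical node. A quick check, assuming decimal units and identical nodes (the slide gives only totals):

```python
NODES = 4_000
DISK_PB = 16
RAM_TB = 64
CORES = 32_000

# Decimal units assumed: 1 PB = 1,000 TB and 1 TB = 1,000 GB.
disk_tb_per_node = DISK_PB * 1_000 / NODES
ram_gb_per_node = RAM_TB * 1_000 / NODES
cores_per_node = CORES / NODES

print(disk_tb_per_node, ram_gb_per_node, cores_per_node)  # 4.0 16.0 8.0
```

So each machine is a commodity server (roughly 8 cores, 16 GB RAM, 4 TB disk); the scale comes from the node count rather than from individual nodes.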

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem

• HDFS (Hadoop Distributed File System)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: a Hadoop client (Java, Pig, Hive) uses MapReduce (distributed processing), coordinated by the Job Tracker, over HDFS (distributed storage), managed by the Name Node and a Secondary Name Node. Every worker node runs both a Data Node and a Task Tracker.]
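HDFS stores a file as fixed-size blocks, each copied to several data nodes; the name node keeps the block map, and because task trackers run on the same machines, work can be sent to where the data lives. A minimal sketch, assuming the Hadoop 1.x defaults of a 64 MB block size and 3 replicas (both configurable); the round-robin placement is a deliberate simplification of HDFS's rack-aware policy, and `place_blocks` is a hypothetical helper:

```python
import math

BLOCK_SIZE_MB = 64   # Hadoop 1.x default block size
REPLICATION = 3      # HDFS default replication factor

def place_blocks(file_size_mb, data_nodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into blocks and assign each block to `replication` distinct data nodes.

    Round-robin placement only; real HDFS placement is rack-aware.
    """
    if replication > len(data_nodes):
        raise ValueError("need at least as many data nodes as replicas")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    return {
        block: [data_nodes[(block + r) % len(data_nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }

nodes = [f"datanode{i}" for i in range(1, 6)]
layout = place_blocks(200, nodes)   # a 200 MB file needs 4 blocks of up to 64 MB
for block, replicas in layout.items():
    print(block, replicas)
```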

76 Software Group

Hadoop 2.0: YARN (Yet Another Resource Negotiator)

[Diagram: a Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; a Node Manager on each worker hosts containers, one of which runs the Application Master.]
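YARN separates resource allocation (the Resource Manager) from per-application coordination (an Application Master running in a container). The toy allocator below only illustrates the idea of granting memory-sized containers on node managers; it is not YARN's Capacity or Fair Scheduler, and the class, function and node names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)

def allocate(node_managers, app, requests_mb):
    """Toy resource manager: place each container request on the node with the most free memory."""
    grants = []
    for size in requests_mb:
        node = max(node_managers, key=lambda n: n.free_mb)
        if node.free_mb < size:
            raise RuntimeError(f"cannot place a {size} MB container for {app}")
        node.free_mb -= size
        node.containers.append((app, size))
        grants.append(node.name)
    return grants

cluster = [NodeManager("nm1", 4096), NodeManager("nm2", 8192), NodeManager("nm3", 2048)]
# First request: the application master's container; the rest run tasks.
placements = allocate(cluster, "app-1", [1024, 2048, 2048, 4096])
print(placements)  # ['nm2', 'nm2', 'nm2', 'nm1']
```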

77 Software Group

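Tez¹

¹ Hindi for “fast”

[Diagram: with MapReduce, a query runs as a chain of jobs (Job 1, Job 2, Job 3), each reading from and writing back to HDFS; with Tez, the same map and reduce steps run as a single job (Job 1) between one HDFS read and one HDFS write.]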

78 Software Group

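HBase

A real-time database built on Hadoop

[Diagram comparing Oracle and HBase storage: Oracle tables are cached in the buffer cache and stored in datafiles on ASM, with changes going through the redo log buffer to redo logs on disk; HBase tables are buffered in the MemStore and stored as HFiles on HDFS, with changes written to a write-ahead log (WAL) on disk.]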
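The HBase write path on the slide (log first, buffer in memory, flush to immutable files) can be sketched as a toy class. This is a simplification under stated assumptions: `ToyRegion` is hypothetical, flushing is triggered by a row count rather than by MemStore size, and there is no compaction or WAL replay.

```python
class ToyRegion:
    """Toy HBase-style write path: WAL append, in-memory MemStore, flush to immutable sorted files."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # write-ahead log: every edit is recorded here first
        self.memstore = {}       # recent edits held in memory
        self.hfiles = []         # flushed, key-sorted snapshots, never modified (oldest first)
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))      # log first, so the edit survives a crash
        self.memstore[row_key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        self.hfiles.append(dict(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, row_key):
        # The newest value wins: check the MemStore, then files from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            if row_key in hfile:
                return hfile[row_key]
        return None

region = ToyRegion(flush_threshold=2)
region.put("dick", 507018)
region.put("jane", 716426)     # reaches the threshold and triggers a flush
region.put("dick", 690414)     # the newer value for "dick" sits in the MemStore
print(region.get("dick"), region.get("jane"), len(region.hfiles))  # 690414 716426 1
```

The pattern mirrors Oracle's redo log and buffer cache pairing in the diagram, except that flushed files are never updated in place.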

79 Software Group

HBase Data Model

Flat table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase wide rows (one row per name, one column per site):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
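Turning the normalized tables into HBase-style wide rows is a denormalizing pivot: one row per name, with a column only for the sites that person actually visited. A minimal sketch using plain dictionaries as a stand-in for HBase rows and column qualifiers (no HBase client API is used):

```python
from collections import defaultdict

names = {1: "Dick", 2: "Jane"}
sites = {1: "Ebay", 2: "Google", 3: "Facebook", 4: "ILoveLarry.com", 5: "MadBillFans.com"}
visits = [(1, 1, 507018), (1, 2, 690414), (2, 2, 716426), (1, 3, 723649),
          (2, 3, 643261), (2, 4, 856767), (1, 5, 675230)]   # (NameId, SiteId, Counter)

def to_wide_rows(visits, names, sites):
    """Denormalize: one row per name, one column per visited site (sparse, no NULL columns)."""
    rows = defaultdict(dict)
    for name_id, site_id, counter in visits:
        rows[name_id]["Name"] = names[name_id]
        rows[name_id][sites[site_id]] = counter
    return dict(rows)

wide = to_wide_rows(visits, names, sites)
print(wide[2])  # {'Name': 'Jane', 'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```

The two rows end up with different column sets, which HBase allows because columns within a column family need not be declared in advance.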

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
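Hive of this era compiled each query into Java MapReduce jobs. As an illustration only (Hive's planner generates its own operator plan), a query like `SELECT site, SUM(counter) FROM visits GROUP BY site` corresponds roughly to one map and one reduce phase; the `visits` rows below are a hypothetical sample:

```python
from collections import defaultdict

# Rows of a hypothetical `visits` table: (name, site, counter)
visits = [("Dick", "Google", 690414), ("Jane", "Google", 716426),
          ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261)]

def map_phase(rows):
    # Emit the GROUP BY column as the key and the aggregated column as the value.
    for name, site, counter in rows:
        yield site, counter

def reduce_phase(pairs):
    # The shuffle groups values by key; the reducer applies SUM to each group.
    groups = defaultdict(list)
    for site, counter in pairs:
        groups[site].append(counter)
    return {site: sum(values) for site, values in sorted(groups.items())}

result = reduce_phase(map_phase(visits))
print(result)  # {'Facebook': 1366910, 'Google': 1406840}
```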

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin vs SQL or HiveQL

85 Software Group

Flume and SQOOP

[Diagram: Flume loads web logs into HDFS; SQOOP loads RDBMS tables (CUSTOMERS, PRODUCTS) into HDFS.]
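Repeated RDBMS loads usually fetch only the rows added since the last run. Sqoop supports this with `--incremental append --check-column <col> --last-value <n>`; the sketch below reproduces the idea in Python against an in-memory SQLite table. It is not Sqoop itself, and `incremental_import` and the sample table are hypothetical.

```python
import sqlite3

def incremental_import(conn, table, check_column, last_value):
    """Return rows whose check column exceeds last_value, plus the new high-water mark.

    Table and column names are interpolated directly, so they must come from trusted config.
    """
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    rows = cursor.fetchall()
    column_names = [d[0] for d in cursor.description]
    idx = column_names.index(check_column)
    new_last = max((row[idx] for row in rows), default=last_value)
    return rows, new_last

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Dick"), (2, "Jane")])

first_batch, last = incremental_import(conn, "customers", "id", 0)      # both rows
conn.execute("INSERT INTO customers VALUES (3, 'Larry')")
second_batch, last = incremental_import(conn, "customers", "id", last)  # only the new row
print(first_batch, second_batch, last)
```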

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Cluster managers: Mesos (heterogeneous cluster manager), YARN, EC2
• Tachyon: in-memory file system
• Spark: memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib: machine learning
• Shark (SQL) on Spark; Hive (SQL) on MapReduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911; Hadoop $750]
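The gap is easier to see as totals. A quick calculation using only the two hardware figures from the chart; software licences, power and administration are excluded, as on the slide, and the storage sizes are arbitrary examples:

```python
EXADATA_PER_TB = 4_911   # hardware-only $/TB from the chart
HADOOP_PER_TB = 750

def hardware_cost(terabytes, dollars_per_tb):
    return terabytes * dollars_per_tb

ratio = EXADATA_PER_TB / HADOOP_PER_TB
print(f"Exadata costs about {ratio:.1f}x more per TB")   # about 6.5x
for tb in (10, 100):
    print(f"{tb} TB: Exadata ${hardware_cost(tb, EXADATA_PER_TB):,} vs Hadoop ${hardware_cost(tb, HADOOP_PER_TB):,}")
```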

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 432 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to the datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
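The appliance totals follow from the per-node specification. A quick check; the usable-capacity line additionally assumes HDFS's default 3-way replication and ignores other overheads:

```python
SERVERS = 18
RAM_GB_PER_NODE = 48
CORES_PER_NODE = 2 * 6        # two 6-core CPUs
DISKS_PER_NODE = 12
DISK_TB = 2

total_ram_gb = SERVERS * RAM_GB_PER_NODE       # 864
total_cores = SERVERS * CORES_PER_NODE         # 216
total_spindles = SERVERS * DISKS_PER_NODE      # 216
raw_capacity_tb = total_spindles * DISK_TB     # 432
usable_tb_at_3x = raw_capacity_tb / 3          # 144.0, assuming 3 replicas of every block

print(total_ram_gb, total_cores, total_spindles, raw_capacity_tb, usable_tb_at_3x)
```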

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
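A trend line like the one on the chart is what ordinary least squares produces. A minimal sketch of simple linear regression; the data points are invented for illustration and are not the slide's data:

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 38, 59, 79, 98]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")   # slope ≈ 0.9714, intercept ≈ 0.7619

# Prediction is simply extrapolation along the fitted line.
print(slope * 120 + intercept)
```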

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
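One of the simplest classification models is k-nearest neighbours: label a new item by majority vote among the most similar labelled examples. A minimal sketch for spam detection; the two features and the training examples are hypothetical, and the features are deliberately left unscaled for brevity (in practice they would normally be normalized first):

```python
import math
from collections import Counter

def knn_classify(training, point, k=3):
    """Label `point` by majority vote among its k nearest labelled neighbours (Euclidean distance)."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical features per message: (number of links, share of capitalised words)
training = [
    ((0, 0.05), "ham"), ((1, 0.10), "ham"), ((0, 0.02), "ham"),
    ((8, 0.60), "spam"), ((6, 0.45), "spam"), ((9, 0.70), "spam"),
]

print(knn_classify(training, (7, 0.50)))    # spam
print(knn_classify(training, (0.5, 0.05)))  # ham
```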

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
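k-means is a common way to find groups when no labels exist: assign each point to its nearest centre, move each centre to the mean of its points, and repeat. A minimal sketch that clusters hypothetical shopping baskets by (number of items, spend); the data and the choice of starting centres are assumptions for illustration:

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep a centroid in place if its cluster came out empty.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

baskets = [(1, 5), (2, 8), (1, 6), (10, 90), (12, 110), (11, 95)]   # (items, spend)
centres, groups = kmeans(baskets, centroids=[baskets[0], baskets[3]])
print(centres)  # small top-up baskets vs large weekly shops
```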

121 Software Group

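Supervised Machine Learning

[Diagram: raw data is cleaned and split into a training set and a validation set; the training set produces a candidate model, which is validated against the validation set before it becomes the production model; the production model turns new data into predictions for new business and existing business.]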
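The pipeline above can be sketched end to end with a deliberately tiny model: a single-feature threshold. Everything here is an assumption for illustration (the churn data, the 30% validation share, the 0.8 promotion bar and the function names); the point is the flow from clean, through train and validate, to a production model only if validation passes.

```python
import random

def accuracy(threshold, examples):
    """Share of examples where 'feature >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in examples) / len(examples)

def train_threshold(training):
    """Candidate model: the cut-off that best separates the labels on the training set."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))

def build_model(raw, validation_share=0.3, min_accuracy=0.8, seed=42):
    clean = [(x, label) for x, label in raw if x is not None]   # clean: drop unusable rows
    rng = random.Random(seed)
    rng.shuffle(clean)
    cut = len(clean) - round(len(clean) * validation_share)
    training, validation = clean[:cut], clean[cut:]
    candidate = train_threshold(training)
    score = accuracy(candidate, validation)                     # validate on unseen rows
    production = candidate if score >= min_accuracy else None  # promote only if good enough
    return production, score

# Hypothetical churn data: (months since last purchase, churned?)
raw = [(1, False), (2, False), (None, True), (3, False), (2, False), (7, True),
       (8, True), (6, True), (9, True), (1, False), (None, False), (10, True)]
model, score = build_model(raw)
if model is not None:
    print("production model: churn risk if months >=", model)
    print("prediction for a customer inactive 8 months:", 8 >= model)
else:
    print("candidate rejected, validation accuracy", score)
```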

122 Software Group

Unsupervised learning

[Example: InMaps (inmaps.linkedin.com)]

123 Software Group

124 Software Group

Big Data Analytics

Data Science:
• Search optimization
• Recommendation systems
• Security
  - Vulnerability
  - Penetration detection
• Fraud detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: redo logs → change data capture → JMS queue / Hadoop poster → batched HDFS file copy, audit change data, HBase real-time replication]
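Change data capture turns committed inserts, updates and deletes into a stream that is replayed, in commit order, on the target. A minimal sketch of the apply side, with a dictionary standing in for an HBase table and a list standing in for the audit copy; the `Change` record format and `apply_changes` are hypothetical and do not reflect SharePlex's internal format or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Change:
    op: str                 # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: int
    row: Optional[dict]     # new column values; None for a delete

def apply_changes(changes, replica, audit_log):
    """Apply captured changes in commit order to a key-value replica and record each one for audit."""
    for change in changes:
        audit_log.append((change.op, change.table, change.key))
        target = replica.setdefault(change.table, {})
        if change.op == "INSERT":
            target[change.key] = dict(change.row)
        elif change.op == "UPDATE":
            target.setdefault(change.key, {}).update(change.row)
        elif change.op == "DELETE":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation: {change.op}")

changes = [
    Change("INSERT", "customers", 1, {"name": "Dick", "city": "Austin"}),
    Change("UPDATE", "customers", 1, {"city": "Round Rock"}),
    Change("INSERT", "customers", 2, {"name": "Jane"}),
    Change("DELETE", "customers", 2, None),
]
replica, audit_log = {}, []
apply_changes(changes, replica, audit_log)
print(replica)  # {'customers': {1: {'name': 'Dick', 'city': 'Round Rock'}}}
```

Order matters: replaying the same records out of sequence could resurrect the deleted row, which is why the changes are taken from the redo stream in commit order.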

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring: integration into operational systems

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 69: Thriving and surviving the Big Data revolution

69 Software Group

Schema on Read vs Schema on Write

Data

Analyse

Aggregate

Normalize

Cleanse

CodeExtract

Load Transform Data Warehouse

Data LoadHadoop

Analyse

Cleanse

Code

Utilize

Schema on Write

Schema on Read

Utilize

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 70: Thriving and surviving the Big Data revolution

70 Software Group

Hadoop Open Source Map-Reduce Stack

71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

bull 4000 nodesbull 16PB diskbull 64 TB of RAMbull 32000 Cores

72 Software Group

73 Software Group

74 Software Group

Hadoop File System (HDFS)

Map Reduce YARNHbase

(Database)ZooKeeper(Locking)

SQOOP(RDBMS loader)

Hive(Query)

Pig(Scripting)

Flume(Log Loader)

Oozie (Workflow manager)

Hadoop ecosystem

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business
• How could data and algorithms transform your business?
• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  – A good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.



71 Software Group

Hadoop at Yahoo

Yahoo Hadoop cluster

• 4,000 nodes
• 16 PB disk
• 64 TB of RAM
• 32,000 cores

72 Software Group

73 Software Group

74 Software Group

Hadoop ecosystem
• Hadoop Distributed File System (HDFS)
• MapReduce / YARN
• HBase (Database)
• ZooKeeper (Locking)
• SQOOP (RDBMS loader)
• Hive (Query)
• Pig (Scripting)
• Flume (Log loader)
• Oozie (Workflow manager)

75 Software Group

Hadoop 1.0 Architecture

[Diagram: the Hadoop client (Java, Pig, Hive) sits on top of MapReduce (distributed processing) and HDFS (distributed storage). A Job Tracker coordinates MapReduce, and a Name Node, together with a Secondary Name Node, manages HDFS; each worker node runs both a Data Node and a Task Tracker.]
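A toy model of the storage half of this picture may help: HDFS splits a file into fixed-size blocks, stores several replicas of each block on different DataNodes, and the NameNode keeps only the metadata that maps blocks to nodes. This is a sketch under stated assumptions: the 64 MB block size and replication factor of 3 are the Hadoop 1.x defaults, but the round-robin placement and the `place_blocks` function are hypothetical simplifications (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE = 64 * 1024 * 1024  # Hadoop 1.x default block size (64 MB)
REPLICATION = 3                # HDFS default replication factor


def place_blocks(file_size, datanodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return a NameNode-style block map: {block_id: [datanode, ...]}.

    Hypothetical round-robin placement; real HDFS uses a rack-aware policy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    num_blocks = math.ceil(file_size / block_size)
    block_map = {}
    for block_id in range(num_blocks):
        # Start each block on a different node so storage spreads across the cluster.
        start = block_id % len(datanodes)
        block_map[block_id] = [
            datanodes[(start + r) % len(datanodes)] for r in range(replication)
        ]
    return block_map


if __name__ == "__main__":
    nodes = [f"datanode{i}" for i in range(1, 6)]  # hypothetical 5-node cluster
    # A 200 MB file needs ceil(200 / 64) = 4 blocks, each stored on 3 nodes.
    for block_id, replicas in place_blocks(200 * 1024 * 1024, nodes).items():
        print(block_id, replicas)
```

Because every block lives on three different nodes in this model, losing any single DataNode still leaves two copies of each block, which is the property that lets HDFS run on commodity hardware.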

76 Software Group

Hadoop 2.0 YARN

[Diagram: the Hadoop client (Java, Pig, Hive) submits work to the Resource Manager; an Application Master coordinates the job, whose tasks run in containers managed by a Node Manager on each node.]

YARN: Yet Another Resource Negotiator

77 Software Group

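Tez¹

[Diagram: a multi-stage pipeline of map and reduce steps run as three separate MapReduce jobs (Job 1, Job 2, Job 3), each reading from and writing to HDFS, compared with the same steps run by Tez as a single job (Job 1) that touches HDFS only at the start and end.]

¹ Hindi for “fast”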
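To make the Tez contrast concrete, here is a toy Python model (not Tez's API): the same three stages run either as chained jobs that persist every intermediate result to a simulated HDFS, or as one DAG that passes data between stages in memory. The `hdfs` dictionary, the stage functions and the `run_chained`/`run_dag` helpers are hypothetical stand-ins; the only point is that both plans compute the same answer while the chained plan performs more HDFS writes.

```python
from collections import defaultdict


def word_count(lines):
    # Stage 1: map lines to words, reduce to counts.
    counts = defaultdict(int)
    for line in lines:
        for word in line.split():
            counts[word.lower()] += 1
    return dict(counts)


def invert(counts):
    # Stage 2: group words by how often they occur.
    by_count = defaultdict(list)
    for word, n in counts.items():
        by_count[n].append(word)
    return {n: sorted(words) for n, words in by_count.items()}


def top(by_count):
    # Stage 3: keep the most frequent words.
    best = max(by_count)
    return best, by_count[best]


STAGES = [word_count, invert, top]


def run_chained(lines, hdfs):
    """MapReduce style: each job reads its input from, and writes its output to, 'HDFS'."""
    hdfs["input"] = lines
    source = "input"
    for i, stage in enumerate(STAGES, start=1):
        target = f"job{i}_output"
        hdfs[target] = stage(hdfs[source])  # every job materialises its result
        source = target
    return hdfs[source]


def run_dag(lines, hdfs):
    """Tez style: one job; intermediate results stay in memory, only the final one is written."""
    hdfs["input"] = lines
    data = hdfs["input"]
    for stage in STAGES:
        data = stage(data)
    hdfs["dag_output"] = data
    return data
```

In a real cluster each avoided HDFS write saves disk and network I/O plus job start-up overhead, which is where Tez's speed-up comes from.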

78 Software Group

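HBase: a real-time database built on Hadoop

[Diagram: Oracle and HBase storage architectures side by side. Oracle: tables cached in the buffer cache and stored in datafiles on ASM-managed disks, with changes passing through the log buffer to the redo logs. HBase: tables cached in the MemStore and stored as HFiles on HDFS disks, with changes recorded in the write-ahead (WA) log.]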
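The diagram pairs each Oracle component with an HBase counterpart, and the HBase write path follows that pairing: a write is appended to the write-ahead log for durability, buffered in the in-memory MemStore, and flushed to an immutable, sorted HFile once the MemStore fills. The class below is a toy model of that path, not the HBase client API; the `ToyRegionStore` name, the tiny flush threshold and the Python lists standing in for HDFS files are assumptions for illustration.

```python
class ToyRegionStore:
    """Toy model of an HBase store: WAL append, MemStore buffer, flush to sorted HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log: every mutation, in arrival order
        self.memstore = {}     # in-memory buffer of recent writes
        self.hfiles = []       # immutable, sorted "files" on (simulated) HDFS
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # 1. log first, for durability
        self.memstore[row_key] = value      # 2. then buffer in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()                    # 3. spill to a new HFile when full

    def flush(self):
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, row_key):
        # Newest data wins: check the MemStore, then HFiles from newest to oldest.
        if row_key in self.memstore:
            return self.memstore[row_key]
        for hfile in reversed(self.hfiles):
            for key, value in hfile:
                if key == row_key:
                    return value
        return None
```

Real HBase adds compaction (merging many small HFiles) and log rolling, which this sketch leaves out.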

79 Software Group

HBase Data Model

Denormalized table:
Name  Site             Counter
Dick  Ebay             507018
Dick  Google           690414
Jane  Google           716426
Dick  Facebook         723649
Jane  Facebook         643261
Jane  ILoveLarry.com   856767
Dick  MadBillFans.com  675230

Normalized relational tables:
NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase (one wide, sparse row per name):
Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
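The difference between the normalized tables and the HBase layout can be sketched with plain Python dictionaries standing in for HBase (this is not HBase client code): each (name, site, counter) fact becomes a cell in one row per name, with the site as the column name, so rows are wide and sparse and a read needs no join. The `to_wide_rows` and `normalize` helpers are hypothetical names for this illustration.

```python
from collections import defaultdict

# (Name, Site, Counter) facts from the denormalized table above.
FACTS = [
    ("Dick", "Ebay", 507018),
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
    ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(facts):
    """Pivot facts into one sparse row per name: {name: {site_column: counter}}."""
    rows = defaultdict(dict)
    for name, site, counter in facts:
        rows[name][site] = counter  # a column exists only where a value exists
    return dict(rows)


def normalize(facts):
    """Split the same facts into relational-style Names, Sites and Counters tables.

    IDs are assigned in order of first appearance, matching the numbering above.
    """
    names, sites, counters = {}, {}, []
    for name, site, counter in facts:
        name_id = names.setdefault(name, len(names) + 1)
        site_id = sites.setdefault(site, len(sites) + 1)
        counters.append((name_id, site_id, counter))
    return names, sites, counters
```

Reading Dick's counters from the wide row is a single lookup (`to_wide_rows(FACTS)["Dick"]`), whereas the normalized version needs the Counters table joined back to Names and Sites.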

80 Software Group

Hive

81 Software Group

82 Software Group

[Diagram: SQL → Java → Results]
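Conceptually, the SQL → Java → Results flow means Hive compiles a query into MapReduce jobs. The sketch below hand-translates a hypothetical `SELECT site, SUM(counter) FROM visits GROUP BY site` into map, shuffle and reduce phases in plain Python; the query, the `visits` rows (borrowed from the HBase data-model example) and the function names are assumptions, and real Hive generates Java jobs rather than anything like this code.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of a 'visits' table: (name, site, counter).
VISITS = [
    ("Dick", "Google", 690414),
    ("Jane", "Google", 716426),
    ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261),
]


def map_phase(row):
    # Emit the GROUP BY key together with the value to aggregate.
    name, site, counter = row
    yield site, counter


def shuffle(pairs):
    # The framework brings all values for the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # SUM(counter) for one GROUP BY key.
    return key, sum(values)


def run_query(rows):
    pairs = chain.from_iterable(map_phase(r) for r in rows)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For these rows `run_query(VISITS)` returns `{"Google": 1406840, "Facebook": 1366910}`.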

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

[Diagram: Pig Latin compared with SQL or HiveQL]

85 Software Group

Flume and SQOOP

[Diagram: FLUME loads web logs into HDFS, while SQOOP moves the CUSTOMERS and PRODUCTS tables between the RDBMS and HDFS.]
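SQOOP parallelises an import by splitting the table on a column (chosen with `--split-by`, by default the primary key) into ranges, one per map task (`--num-mappers`), and having each task read its range into its own HDFS file. The sketch below imitates that with Python's built-in `sqlite3` module standing in for the RDBMS and in-memory lists standing in for HDFS part files; the table contents and the `split_ranges`/`import_table` helpers are hypothetical, and Sqoop's own boundary calculation differs in detail.

```python
import sqlite3


def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous ranges, one per mapper."""
    size = hi - lo + 1
    step, extra = divmod(size, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        end = start + step + (1 if i < extra else 0) - 1
        if end >= start:  # skip empty ranges when there are more mappers than keys
            ranges.append((start, end))
        start = end + 1
    return ranges


def import_table(conn, table, key, num_mappers=2):
    """Read each key range with its own query, as separate map tasks would.

    Table and column names are interpolated directly, so they must be trusted values.
    """
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    parts = []
    for start, end in split_ranges(lo, hi, num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
            (start, end),
        ).fetchall()
        parts.append(rows)  # stand-in for one part-m-0000N file in HDFS
    return parts
```

Splitting on evenly spaced key values assumes the keys are reasonably uniform; a skewed split column leaves some mappers with far more rows than others.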

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile, back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

[Chart: Exadata vs Hadoop, $/TB (hardware only). Exadata: $4,911; Hadoop: $750]
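Using the chart's hardware-only figures, a quick calculation shows the size of the gap: $4,911/TB against $750/TB is roughly a 6.5× difference. The helpers below simply scale those two per-terabyte prices to a chosen capacity; the 100 TB example is hypothetical, and real costs would also include software licences, power and administration, none of which the chart covers.

```python
# Hardware-only cost per TB, as read from the "Economies" chart.
COST_PER_TB = {"Exadata": 4911, "Hadoop": 750}


def hardware_cost(platform, terabytes):
    """Estimated hardware cost in dollars for a given capacity."""
    return COST_PER_TB[platform] * terabytes


def cost_ratio():
    """How many times more Exadata costs per TB than Hadoop."""
    return COST_PER_TB["Exadata"] / COST_PER_TB["Hadoop"]


if __name__ == "__main__":
    for platform in COST_PER_TB:
        print(f"{platform}: ${hardware_cost(platform, 100):,} for 100 TB")
    print(f"Ratio: {cost_ratio():.1f}x")
```

For 100 TB this prints $491,100 for Exadata against $75,000 for Hadoop.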

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to the datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html
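The rack-level totals on this slide follow from multiplying the per-node specification by the 18 servers. The sketch below does that multiplication, which is a handy sanity check when comparing appliance datasheets; with 12 × 2 TB disks per node it gives 432 TB of raw disk. The dictionary keys and the `rack_totals` function are hypothetical labels chosen for this example.

```python
NODES = 18
PER_NODE = {
    "ram_gb": 48,
    "cpu_sockets": 2,
    "cores_per_socket": 6,
    "disks": 12,
    "disk_tb": 2,
}


def rack_totals(nodes=NODES, spec=PER_NODE):
    """Scale the per-node specification up to the whole rack."""
    return {
        "ram_gb": nodes * spec["ram_gb"],
        "cores": nodes * spec["cpu_sockets"] * spec["cores_per_socket"],
        "spindles": nodes * spec["disks"],
        "raw_tb": nodes * spec["disks"] * spec["disk_tb"],
    }


if __name__ == "__main__":
    print(rack_totals())
```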

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL Database
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot (x from 0 to 120, y from -20 to 120) with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
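The trend line on the chart is a linear regression of the form f(x) = slope·x + intercept. A minimal from-scratch sketch of the closed-form ordinary least-squares fit is below; the sample points in the usage section are made up (the slide's underlying data is not available), so the coefficients will not match the chart's.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    xs = [0, 20, 40, 60, 80, 100]  # hypothetical observations
    ys = [2, 18, 43, 57, 79, 101]
    slope, intercept = fit_line(xs, ys)
    print(f"f(x) = {slope:.4f}x + {intercept:.4f}")
    print("prediction at x=120:", predict(slope, intercept, 120))
```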

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
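As a toy illustration of building a model that classifies new data, here is a nearest-centroid classifier: training averages the feature vectors of each labelled class, and a new record receives the label of the closest average. The churn features (support calls per month, months since last purchase) and labels are invented, and nearest-centroid is just one simple technique picked for brevity, not one the slide prescribes. `math.dist` requires Python 3.8 or later.

```python
import math
from collections import defaultdict


def train(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid}."""
    sums, counts = {}, defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Assign the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


if __name__ == "__main__":
    # Hypothetical customers: (support calls per month, months since last purchase)
    history = [
        ((0, 1), "stays"), ((1, 2), "stays"), ((0, 3), "stays"),
        ((5, 9), "churns"), ((6, 12), "churns"), ((4, 10), "churns"),
    ]
    model = train(history)
    print(classify(model, (1, 1)))   # a new, low-risk customer
    print(classify(model, (5, 11)))  # a new, high-risk customer
```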

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
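Clustering needs no labels at all. A compact k-means (Lloyd's algorithm) sketch is below: it alternates between assigning each point to its nearest centre and moving each centre to the mean of its assigned points, stopping when the centres no longer move. Seeding the centres with the first k points is a simplification for this example; practical implementations use random or k-means++ initialisation.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Returns (centres, assignments)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = [tuple(p) for p in points[:k]]  # naive seeding: the first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the nearest centre for every point.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centres[c])) for p in points
        ]
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centres.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return centres, assignments
```

With hypothetical two-dimensional basket features (say, items per basket and average item price), each resulting cluster would be a group of similar shopping baskets found without any predefined categories.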

121 Software Group

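Supervised Machine Learning

[Diagram: raw data is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is validated against the validation set and becomes the production model. The production model takes new data (from new and existing business) and produces predictions.]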
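The supervised workflow can be sketched end to end: shuffle and split cleaned data into training and validation sets, fit a candidate model on the training set, score it on the validation set, and promote it to production only if it clears an accuracy bar. Everything here is hypothetical: the one-feature threshold "model" (which assumes larger values indicate the positive class), the 70/30 split and the 0.8 promotion threshold are illustrative choices, not recommendations.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=42):
    """Shuffle a copy of the records and split them into training and validation sets."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(training):
    """Candidate model: the midpoint between the mean feature value of each class."""
    pos = [x for x, label in training if label]
    neg = [x for x, label in training if not label]
    if not pos or not neg:
        raise ValueError("training set needs examples of both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, records):
    """Fraction of records where 'x > threshold' matches the true label."""
    return sum((x > threshold) == label for x, label in records) / len(records)


def build_production_model(records, min_accuracy=0.8):
    training, validation = train_test_split(records)
    candidate = fit_threshold(training)
    score = accuracy(candidate, validation)
    # Only a candidate that validates well is promoted to production.
    return (candidate if score >= min_accuracy else None), score
```

Keeping the validation set out of training is the key design point: the score then estimates how the model will behave on genuinely new data.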

122 Software Group

Unsupervised learning

[Image: inmaps.linkedin.com]

123 Software Group

124 Software Group

Big Data Analytics

Data Science
Search Optimization
Recommendation Systems
Security
• Vulnerability
• Penetration Detection
Fraud Detection
CRM
• Churn
• Defaults
Medical
• Risk analysis
• Diagnosis
• Prognosis
Game optimization
Advertising
• Targeting
• Tailoring


uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online


74 Software Group

Hadoop ecosystem

• Hadoop File System (HDFS)
• MapReduce / YARN
• HBase (database)
• ZooKeeper (locking)
• SQOOP (RDBMS loader)
• Hive (query)
• Pig (scripting)
• Flume (log loader)
• Oozie (workflow manager)

75 Software Group

Hadoop 1.0 Architecture

Diagram: a Hadoop client (Java, Pig, Hive) submits work to MapReduce (distributed processing), coordinated by a single JobTracker, and reads and writes HDFS (distributed storage), managed by a NameNode and a Secondary NameNode. Each worker node runs both a DataNode and a TaskTracker.
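To make the NameNode/DataNode split more concrete, here is a toy sketch (not Hadoop code) of the bookkeeping the NameNode does: cutting a file into fixed-size blocks and recording which DataNodes hold each replica. It assumes the Hadoop 1.x defaults of 64 MB blocks and three replicas, and it replaces HDFS's rack-aware placement with simple round-robin. `place_blocks` is a hypothetical name.

```python
import math

# Hadoop 1.x defaults: 64 MB blocks (dfs.block.size) and 3 replicas (dfs.replication).
BLOCK_SIZE_MB = 64
REPLICATION = 3


def place_blocks(file_size_mb, datanodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Toy NameNode bookkeeping: split a file into blocks and choose DataNodes for each replica.

    Real HDFS uses a rack-aware placement policy; this round-robin choice is a simplification.
    Returns {block_index: [datanode, ...]}.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication factor")
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block_id in range(n_blocks):
        # Start each block one node further along so replicas spread across the cluster.
        placement[block_id] = [
            datanodes[(block_id + i) % len(datanodes)] for i in range(replication)
        ]
    return placement


# A 200 MB file on a four-node cluster needs ceil(200 / 64) = 4 blocks.
for block, nodes in place_blocks(200, ["dn1", "dn2", "dn3", "dn4"]).items():
    print(block, nodes)
```

Each replica of a block sits on a different node, so losing one DataNode never loses a whole block. In this sketch, the only state the NameNode needs is the block-to-node map.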

76 Software Group

Hadoop 2.0 YARN (Yet Another Resource Negotiator)

Diagram: a Hadoop client (Java, Pig, Hive) submits applications to the ResourceManager. An ApplicationMaster coordinates each application's work, which runs in containers launched by the NodeManager on each worker node.

77 Software Group

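Tez¹

¹ Hindi for “fast”

Diagram: in classic MapReduce, a multi-stage query runs as three chained jobs (Job 1, Job 2, Job 3). Each job is a MAP → REDUCE pass that reads from and writes back to HDFS. Under Tez, the same MAP and REDUCE stages run as a single job that reads HDFS once and writes its result once.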
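The following Python sketch illustrates the idea behind the slide. It assumes that an in-memory list can stand in for the data flowing between stages. Classic MapReduce would run each MAP/REDUCE pair as a separate job and write the results to HDFS in between. A Tez-style DAG instead passes each stage's output straight to the next stage. `run_stage` and `run_pipeline` are illustrative names, not Tez APIs.

```python
from collections import defaultdict


def run_stage(records, mapper, reducer):
    """One MAP -> shuffle -> REDUCE stage, executed in memory."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: group values by key
    return [out for key in sorted(groups) for out in reducer(key, groups[key])]


def run_pipeline(records, stages):
    """Chain stages DAG-style: each stage's output feeds the next directly,
    instead of being written to HDFS and re-read by a separate job."""
    for mapper, reducer in stages:
        records = run_stage(records, mapper, reducer)
    return records


# Stage 1: word count.  Stage 2: how many distinct words occur N times.
word_count = (lambda line: [(w, 1) for w in line.split()],
              lambda word, ones: [(word, sum(ones))])
count_of_counts = (lambda pair: [(pair[1], pair[0])],
                   lambda n, words: [(n, len(words))])

lines = ["big data big hadoop", "hadoop tez big"]
print(run_pipeline(lines, [word_count, count_of_counts]))  # [(1, 2), (2, 1), (3, 1)]
```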

78 Software Group

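HBase

A real-time database built on Hadoop

Diagram: HBase storage compared with Oracle's. In Oracle, tables are cached in the buffer cache and stored in datafiles. Changes pass through the log buffer to the redo log, and everything sits on ASM-managed disks. In HBase, tables are cached in the MemStore and stored in HFiles. Changes are recorded in the write-ahead (WA) log, and everything sits in HDFS on disks.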
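The next example is a toy model of the HBase side of the diagram. It assumes a single region and ignores compaction, column families and timestamps. Every write is appended to the write-ahead log and applied to the MemStore. The MemStore is periodically flushed to an immutable, sorted HFile. `ToyRegionStore` and its flush threshold are hypothetical.

```python
class ToyRegionStore:
    """Toy HBase-like write path: WAL append, then MemStore, flushed to immutable HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # write-ahead log: every edit is appended first, for recovery
        self.memstore = {}     # in-memory buffer of recent edits
        self.hfiles = []       # flushed, immutable, sorted snapshots (newest last)
        self.flush_threshold = flush_threshold

    def put(self, row, value):
        self.wal.append((row, value))   # 1. log the edit
        self.memstore[row] = value      # 2. apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the MemStore out as a sorted, immutable "HFile" and start a fresh one.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row):
        # Newest data wins: check the MemStore, then HFiles from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            if row in hfile:
                return hfile[row]
        return None


store = ToyRegionStore(flush_threshold=2)
store.put("dick", 1)
store.put("jane", 2)      # second distinct row triggers a flush to an HFile
store.put("dick", 5)      # newer value lives in the MemStore
print(store.get("dick"), store.get("jane"))  # 5 2
```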

79 Software Group

HBase Data Model

Relational table (one row per name and site):

Name   Site             Counter
Dick   Ebay             507018
Dick   Google           690414
Jane   Google           716426
Dick   Facebook         723649
Jane   Facebook         643261
Jane   ILoveLarry.com   856767
Dick   MadBillFans.com  675230

Normalized relational tables:

NameId  Name
1       Dick
2       Jane

SiteId  SiteName
1       Ebay
2       Google
3       Facebook
4       ILoveLarry.com
5       MadBillFans.com

NameId  SiteId  Counter
1       1       507018
1       2       690414
2       2       716426
1       3       723649
2       3       643261
2       4       856767
1       5       675230

HBase rows (one wide row per name, holding only the site columns that name needs):

Id  Name  Ebay    Google  Facebook  (other columns)  MadBillFans.com
1   Dick  507018  690414  723649                     675230

Id  Name  Google  Facebook  (other columns)  ILoveLarry.com
2   Jane  716426  643261                     856767
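The slide's point is that HBase stores each person as a single row whose columns are created on demand. The following Python sketch performs that pivot. A plain dictionary stands in for an HBase table, and no HBase client API is assumed.

```python
from collections import defaultdict

# (name, site, counter) rows from the relational table above
visits = [
    ("Dick", "Ebay", 507018), ("Dick", "Google", 690414),
    ("Jane", "Google", 716426), ("Dick", "Facebook", 723649),
    ("Jane", "Facebook", 643261), ("Jane", "ILoveLarry.com", 856767),
    ("Dick", "MadBillFans.com", 675230),
]


def to_wide_rows(rows):
    """Pivot narrow rows into one sparse 'wide row' per row key, HBase-style:
    each site becomes a column, and a site a person never visited simply has no cell."""
    table = defaultdict(dict)
    for name, site, counter in rows:
        table[name][site] = counter
    return dict(table)


wide = to_wide_rows(visits)
print(wide["Jane"])  # {'Google': 716426, 'Facebook': 643261, 'ILoveLarry.com': 856767}
```

Fetching everything about one person then becomes a single-row lookup rather than a join across three tables.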

80 Software Group

Hive

82 Software Group

Diagram: SQL → Java → Results (Hive translates SQL into Java MapReduce jobs)
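The SQL → Java → Results flow can be illustrated with a small Python sketch. It computes the same GROUP BY aggregate twice: once declaratively and once as explicit map and reduce functions. SQLite stands in for Hive here only so that the example runs locally. Hive's actual generated plans are more involved than this.

```python
import sqlite3
from collections import defaultdict

visit_rows = [("Dick", "Google", 690414), ("Jane", "Google", 716426),
              ("Dick", "Facebook", 723649), ("Jane", "Facebook", 643261)]

# The declarative query a Hive user would write (run on SQLite for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (name TEXT, site TEXT, counter INTEGER)")
conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", visit_rows)
sql_result = dict(conn.execute(
    "SELECT site, SUM(counter) FROM visits GROUP BY site").fetchall())


# Roughly what a SQL-on-MapReduce compiler produces: a map phase emitting
# (group key, value) pairs and a reduce phase aggregating per key.
def map_phase(row):
    _name, site, counter = row
    yield site, counter


def reduce_phase(key, values):
    return key, sum(values)


shuffled = defaultdict(list)
for row in visit_rows:
    for key, value in map_phase(row):
        shuffled[key].append(value)
mr_result = dict(reduce_phase(k, v) for k, v in shuffled.items())

print(sql_result == mr_result)  # True
```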

83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Diagram: a Pig Latin script compared with the equivalent SQL or HiveQL.

85 Software Group

Flume and SQOOP

Diagram: Flume loads web logs into HDFS, while SQOOP loads RDBMS tables such as CUSTOMERS and PRODUCTS into HDFS.
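The following sketch shows how a parallel relational import can be partitioned for the SQOOP side:

1. Find the minimum and maximum of a numeric key.
2. Carve that range into one slice per mapper.
3. Have each mapper fetch its slice and emit comma-delimited text.

This mirrors the idea behind Sqoop's `--split-by` and `--num-mappers` options. However, `split_ranges` and `import_table` are hypothetical helpers, and SQLite stands in for the source RDBMS.

```python
import sqlite3


def split_ranges(lo, hi, num_mappers):
    """Divide the integer key range [lo, hi] into contiguous slices, one per mapper."""
    if num_mappers < 1:
        raise ValueError("need at least one mapper")
    size, extra = divmod(hi - lo + 1, num_mappers)
    ranges, start = [], lo
    for i in range(num_mappers):
        end = start + size + (1 if i < extra else 0) - 1
        if end >= start:  # skip empty slices when mappers outnumber keys
            ranges.append((start, end))
        start = end + 1
    return ranges


def import_table(conn, table, key, num_mappers):
    """Each 'mapper' fetches one key range and emits comma-delimited text lines.
    Table and column names are interpolated directly, so they must be trusted input."""
    lo, hi = conn.execute(f"SELECT MIN({key}), MAX({key}) FROM {table}").fetchone()
    parts = []
    for start, end in split_ranges(lo, hi, num_mappers):
        rows = conn.execute(
            f"SELECT * FROM {table} WHERE {key} BETWEEN ? AND ? ORDER BY {key}",
            (start, end),
        ).fetchall()
        parts.append([",".join(str(col) for col in row) for row in rows])
    return parts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di"), (5, "Ed")])
for part in import_table(conn, "customers", "id", 2):
    print(part)
```

Splitting on a key range lets each mapper issue an independent query. Skewed key distributions would, however, give some mappers much more work than others.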

86 Software Group

Berkeley Data Analytic Stack (BDAS)

• Mesos – heterogeneous cluster manager (YARN and EC2 also shown as resource layers)
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL), shown alongside Hive (SQL) on MapReduce
• BlinkDB

87 Software Group

Meanwhile back at the Death Star


89 Software Group

Oracle Exadata (X-2)

• Database servers: 64 cores, 576 GB RAM
• Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911, Hadoop $750.

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB total)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to the datacentre
• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine learning: programs that evolve with “experience”
• Collective intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive analytics: programs that extrapolate from existing data into the future
• Big Data analytics: AKA data science

96 Software Group

Collective Intelligence


105 Software Group

Google Flu Trends


107 Software Group

Collective Intelligence outsmarts Artificial Intelligence


112 Software Group

Artificial Intelligence Strikes back


117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154 (x axis 0–120, y axis −20–120).

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
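A trend line like the one on the chart is the kind of result that a simple linear regression produces. Below is a minimal ordinary-least-squares sketch in plain Python. The sample points are invented for illustration and are not the slide's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate from the fitted line."""
    return slope * x + intercept


# Hypothetical points lying exactly on y = 2x + 1
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)                 # 2.0 1.0
print(predict(slope, intercept, 10))    # 21.0
```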

119 Software Group

Classification

• Create a model that identifies and classifies new data
• For example: spam detection, churn risk, customer value
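Spam detection is the classic classification example. What follows is a compact multinomial naive Bayes sketch with add-one smoothing. The slide does not name an algorithm, so naive Bayes is an illustrative choice here, and the four training messages are invented.

```python
import math
from collections import Counter


def train_nb(docs):
    """docs: list of (text, label). Returns class counts, per-class word counts and vocabulary."""
    labels = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in labels}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return labels, word_counts, vocab


def classify(model, text):
    labels, word_counts, vocab = model
    total_docs = sum(labels.values())
    best_label, best_score = None, -math.inf
    for label in sorted(labels):
        # log P(label) + sum of log P(word | label), with add-one (Laplace) smoothing
        score = math.log(labels[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("win cash now", "spam"),
    ("cheap cash prize", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
]
model = train_nb(training)
print(classify(model, "cash prize now"))      # spam
print(classify(model, "agenda for meeting"))  # ham
```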

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
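The general idea of grouping unlabeled data is easiest to see with k-means, which is one common clustering algorithm. Below is a sketch of Lloyd's k-means on 2-D points. The slide names no algorithm, so k-means, the seed and the sample points are all illustrative assumptions. The sketch needs Python 3.8+ for `math.dist`.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, 2)
print(labels)
```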

121 Software Group

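Supervised Machine Learning

Diagram: raw data from existing business is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is checked against the validation set and, once validated, becomes the production model. The production model is then applied to new data from new business to make predictions.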
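The split, train, validate and promote steps of this flow can be sketched in a few lines. The model is a deliberately trivial one-feature threshold classifier. The churn data, the 25% validation fraction and the 0.9 promotion bar are all illustrative assumptions, and the cleaning step is not shown.

```python
import random


def train_validation_split(rows, validation_fraction=0.25, seed=42):
    """Shuffle labelled rows and hold back a fraction of them for validation."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, rows):
    """Share of rows where 'x >= threshold' matches the label."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


def train_stump(training):
    """Candidate model: the single-feature threshold that best fits the training set."""
    candidates = sorted({x for x, _ in training})
    return max(candidates, key=lambda t: accuracy(t, training))


# Hypothetical churn history: (months since last purchase, churned?)
history = [(months, months >= 6) for months in range(12)]

training, validation = train_validation_split(history)
candidate = train_stump(training)
validation_score = accuracy(candidate, validation)

# Promote the candidate to production only if it holds up on unseen data.
production_model = candidate if validation_score >= 0.9 else None
print(candidate, validation_score, production_model)
```

Holding back a validation set is what separates "fits the data we have" from "predicts new business". The candidate may score perfectly on training data and still fail promotion.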

122 Software Group

Unsupervised learning

Example: inmaps.linkedin.com


124 Software Group

Big Data Analytics

Data science applications:

• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium-sized businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

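SharePlex® for Hadoop

Diagram: SharePlex performs change data capture from the redo logs and delivers the changes to Hadoop through:
• A JMS queue feeding a Hadoop poster
• Batched HDFS file copy
• Audit change data
• HBase real-time replication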
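Conceptually, change data capture replays captured changes against the target in log order. The following sketch shows that apply loop. Several parts are assumptions for illustration rather than SharePlex internals:

- the `Change` record;
- the use of an SCN-like sequence number for ordering and idempotent restart;
- a Python dict standing in for an HBase table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    scn: int                    # ordering key, analogous to a redo-log position
    op: str                     # "insert", "update" or "delete"
    key: str
    row: Optional[dict] = None  # changed columns (absent for deletes)


def apply_changes(target, changes, last_applied=0):
    """Replay captured changes in log order onto a key-value target.
    Changes at or below `last_applied` are skipped, so replaying a log is idempotent."""
    for change in sorted(changes, key=lambda c: c.scn):
        if change.scn <= last_applied:
            continue
        if change.op in ("insert", "update"):
            # Column-level merge, similar in spirit to an HBase put.
            target.setdefault(change.key, {}).update(change.row or {})
        elif change.op == "delete":
            target.pop(change.key, None)
        else:
            raise ValueError(f"unknown operation {change.op!r}")
        last_applied = change.scn
    return last_applied


table = {}
log = [
    Change(1, "insert", "c1", {"name": "Dick", "city": "Austin"}),
    Change(2, "insert", "c2", {"name": "Jane"}),
    Change(4, "delete", "c2"),
    Change(3, "update", "c1", {"city": "Round Rock"}),
]
position = apply_changes(table, log)
print(position, table)
```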

130 Software Group

Toad BI Suite


132 Software Group

Dell’s offering was not complete…

Key components to build end-to-end BI/analytics solutions: data integration, database management, advanced analytics, business intelligence, and server and storage. Dell products shown: Toad & SharePlex, Toad BI, Boomi and Kitenga.

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions: data integration, database management, advanced analytics, business intelligence, and server and storage. Dell products shown: Toad & SharePlex, Toad BI, Boomi, Kitenga and STATISTICA.

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

135 Software Group

Data Visualization

136 Software Group

Live scoring – integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 75: Thriving and surviving the Big Data revolution

75 Software Group

Hadoop 10 Architecture

MAP REDUCE (DISTRIBUTED PROCESSING)

HADOOP CLIENT (JAVA PIG HIVE)

HDFS (DISTRIBUTED

STORAGE)

JOB TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

SECONDARY NAME NODE

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

DATA NODE TASK TRACKER

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 76: Thriving and surviving the Big Data revolution

76 Software Group

Hadoop 20 YARN

APPLICATION MASTER

NODE MANAGER

CONTAINER

RESOURCE MANAGER

NODE MANAGER

CONTAINER

NODE MANAGER

CONTAINER

HADOOP CLIENT (JAVA PIG HIVE)

Yet Another Resource Negotiator

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

Machine Learning: programs that evolve with “experience”

Collective Intelligence: programs that use inputs from “crowds” to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence

105 Software Group

Google Flu Trends

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

112 Software Group

Artificial Intelligence Strikes back

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot with linear trend line; x axis 0 to 120, y axis -20 to 120]

Fitted line: $f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
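The chart's trend line has the form f(x) = ax + b, which is what an ordinary least-squares fit produces. A minimal sketch of how the slope and intercept are computed and then used to extrapolate; the sample points are invented and are not the chart's data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (simple linear regression)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Extrapolate: evaluate the fitted line at a new x."""
    return slope * x + intercept

# Invented sample data, roughly following y = x
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 80, 97]
a, b = fit_line(xs, ys)
print(f"f(x) = {a:.4f}x + {b:.4f}; f(120) = {predict(a, b, 120):.1f}")
```

The other techniques in the list (multivariate, time series, logistic regression, CART) replace the straight line with richer models, but the pattern of fitting to history and then evaluating on new inputs is the same.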

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
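Spam detection is the textbook classification case: a model trained on labelled messages assigns a class to new ones. The slide names no particular algorithm; as one common choice, here is a tiny multinomial Naive Bayes classifier with add-one smoothing. The training messages are invented.

```python
import math
from collections import Counter

def train_classifier(examples):
    """Count words per class. examples: list of (text, label) pairs."""
    word_counts = {}        # label -> Counter of words seen in that class
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in examples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab

def classify(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(counts.values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label

TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]
model = train_classifier(TRAINING)
print(classify(model, "win a cash prize"))
print(classify(model, "team meeting monday"))
```

Working in log space avoids underflow when many small probabilities are multiplied; the same structure applies to churn risk if words are replaced by customer attributes.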

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
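Clustering finds groups without any labels to learn from. k-means is one widely used algorithm (the slide does not name one); below is a minimal two-dimensional version with caller-supplied starting centroids so the result is deterministic. The customer data, loosely in the spirit of the basket-analysis example, is invented.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Lloyd's k-means: assign points to the nearest centroid, then recompute.

    points and centroids are lists of (x, y) tuples. Stops when the
    centroids no longer move or after max_iter rounds.
    """
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Mean of each cluster; an empty cluster keeps its old centroid
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else old
            for c, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical customers: (grocery spend, electronics spend) in $100s
customers = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, clusters = kmeans(customers, centroids=[(1, 1), (9, 9)])
print(centroids)
```

In practice the number of clusters and the starting centroids are chosen by the analyst (or by heuristics such as k-means++), and the result can depend on both.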

121 Software Group

Supervised Machine Learning

[Diagram: Existing Business → Raw Data → Clean → Training Set and Validation Set → Model → Candidate Model → Validate → Production Model; New Business → New Data → Production Model → Prediction]
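The workflow in the diagram (clean the raw data, split it into training and validation sets, train a candidate model, validate it, and only then promote it to production to score new data) can be sketched end to end. Everything specific here is an assumption for illustration: the churn data, the 70/30 split, the threshold model and the 0.9 accuracy bar.

```python
import random

def clean(raw):
    """Drop records with missing fields (stand-in for real data cleaning)."""
    return [r for r in raw if None not in r]

def split(records, train_percent=70, seed=7):
    """Shuffle reproducibly, then split into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100
    return shuffled[:cut], shuffled[cut:]

def train_candidate(records):
    """Candidate model: a churn threshold on days of inactivity.

    Midpoint between the most-inactive customer who stayed and the
    least-inactive customer who churned in the training data.
    """
    stayed = max(days for days, churned in records if not churned)
    left = min(days for days, churned in records if churned)
    return (stayed + left) / 2

def predict(threshold, days_inactive):
    return days_inactive > threshold

def accuracy(threshold, records):
    """Fraction of records where the prediction matches the known outcome."""
    return sum(predict(threshold, d) == c for d, c in records) / len(records)

# Hypothetical "existing business" history: (days inactive, churned?)
raw_data = [(d, False) for d in range(15)] + [(d, True) for d in range(30, 45)]
raw_data += [(None, True), (12, None)]  # incomplete rows, removed by clean()

training_set, validation_set = split(clean(raw_data))
candidate = train_candidate(training_set)
score = accuracy(candidate, validation_set)

MIN_ACCURACY = 0.9  # hypothetical bar for promotion to production
production_model = candidate if score >= MIN_ACCURACY else None

if production_model is not None:
    new_customers = [3, 40]  # "new business": days inactive
    print([predict(production_model, d) for d in new_customers])
```

Holding the validation set out of training is what makes the promotion decision meaningful: a candidate is judged on data it has never seen, as it will be in production.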

122 Software Group

Unsupervised learning

inmaps.linkedin.com

124 Software Group

Big Data Analytics

Data Science

• Search optimization
• Recommendation systems
• Security
  - Vulnerability
  - Penetration detection
• Fraud detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Redo logs → Change Data Capture → JMS Queue → Hadoop Poster → batched HDFS file copy, audit change data, or HBase real-time replication]
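The replication path captures changes from the redo logs and hands them, via a queue, to a Hadoop poster that writes them to HDFS or HBase. Below is a generic change-data-capture sketch, not SharePlex's implementation or message format: each hypothetical change record is appended to an audit trail (like the audit change data target) and applied to a current-state table (like real-time replication into HBase). The record fields `op`, `key` and `row`, the class names and the sample data are all invented, and the queue is omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Change:
    """One captured change (hypothetical record format, not SharePlex's)."""
    op: str                     # "INSERT", "UPDATE" or "DELETE"
    key: str                    # row key in the target table
    row: Optional[dict] = None  # changed column values; None for DELETE

@dataclass
class HadoopTarget:
    """Hypothetical stand-in for the targets: current-state table plus audit trail."""
    table: dict = field(default_factory=dict)  # latest row versions, HBase-style
    audit: list = field(default_factory=list)  # append-only history of changes

    def apply(self, change: Change) -> None:
        if change.op not in ("INSERT", "UPDATE", "DELETE"):
            raise ValueError(f"unknown operation: {change.op}")
        # Audit target: keep every change, in the order it was captured
        self.audit.append((change.op, change.key, change.row))
        # Replication target: bring the current-state table up to date
        if change.op == "DELETE":
            self.table.pop(change.key, None)
        else:
            self.table.setdefault(change.key, {}).update(change.row or {})

target = HadoopTarget()
captured = [
    Change("INSERT", "cust1", {"name": "Dick", "city": "Austin"}),
    Change("UPDATE", "cust1", {"city": "Boston"}),
    Change("INSERT", "cust2", {"name": "Jane"}),
    Change("DELETE", "cust2"),
]
for change in captured:
    target.apply(change)
print(target.table)
print(len(target.audit), "changes audited")
```

The two targets answer different questions: the table holds only the latest state, while the audit trail preserves the full history, which is why a pipeline like the one in the diagram may write both.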

130 Software Group

Toad BI Suite

132 Software Group

Dell’s offering was not complete…

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• What are the technologies that will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data Analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics
  - Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - SQOOP
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 77: Thriving and surviving the Big Data revolution

77 Software Group

Tez1

1Hindi for ldquofastrdquo

HDFS

MAP

REDUCE

MAP

MAP

REDUCE

MAP

MAP

REDUCE

MAP

Job 2Job 1

Job 3

HDFS

Job 1

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 78: Thriving and surviving the Big Data revolution

78 Software Group

HBase

A Real time database built on Hadoop

ASM

Datafiles

Buffer Cache

Table Table

Redo

Disks

LogBuffe

r

HDFS

HFile

MemStore

Table Table

WA Log

Disks

HFile

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 79: Thriving and surviving the Big Data revolution

79 Software Group

Name Site Counter

Dick Ebay 507018

Dick Google 690414

Jane Google 716426

Dick Facebook 723649

Jane Facebook 643261

Jane ILoveLarrycom 856767

Dick MadBillFanscom 675230

NameId Name

1 Dick

2 Jane

SiteId SiteName

1 Ebay

2 Google

3 Facebook

4 ILoveLarrycom

5 MadBillFanscom

NameId SiteId Counter

1 1 507018

1 3 690414

2 3 716426

1 3 723649

2 3 643261

2 4 856767

1 5 675230

Id Name Ebay Google Facebook (other columns) MadBillFanscom

1 Dick 507018 690414 723649 675230

Id Name Google Facebook (other columns) ILoveLarrycom

2 Jane 716426 643261 856767

Hbase Data Model

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: sample data with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
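The fitted line in the chart is the kind of output simple linear regression produces. A minimal ordinary-least-squares sketch follows; the sample observations are hypothetical, and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


xs = [10, 20, 30, 40, 50]  # hypothetical observations
ys = [11, 19, 31, 39, 50]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")
print("forecast at x=60:", round(predict(slope, intercept, 60), 1))
```

On Python 3.10 and later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept from the standard library.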

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
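One of the simplest classifiers is k-nearest neighbours: a new point takes the majority label of the k closest labelled examples. The sketch below assumes two hypothetical churn-risk features and made-up training data (and Python 3.8+ for `math.dist`); real spam or churn models would use many more features and a properly validated algorithm.

```python
from collections import Counter
from math import dist

# Hypothetical labelled examples:
# ((support calls per month, months since last order), label)
TRAINING = [
    ((1.0, 2.0), "stay"),
    ((2.0, 1.0), "stay"),
    ((1.5, 1.5), "stay"),
    ((8.0, 9.0), "churn"),
    ((9.0, 8.0), "churn"),
    ((8.5, 9.5), "churn"),
]


def knn_classify(point, training, k=3):
    """Label `point` by majority vote among its k nearest training examples (Euclidean distance)."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of training examples")
    nearest = sorted(training, key=lambda example: dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


for customer in [(1.2, 1.8), (7.5, 8.5)]:
    print(customer, knn_classify(customer, TRAINING))
```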

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
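k-means is one common way to group data without a pre-existing classification scheme; the slide does not name an algorithm, so this is a swapped-in example. The sketch implements Lloyd's algorithm with naive initialisation from the first k points, and the customer features in the usage lines are hypothetical.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Lloyd's k-means: assign points to the nearest centroid, recompute centroids, repeat."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive initialisation
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of each axis; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (visits per month, average basket size in tens of dollars)
customers = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1), (8.5, 9)]
centroids, clusters = kmeans(customers, k=2)
print(centroids)
print(clusters)
```

Production implementations usually add smarter initialisation (such as k-means++) and several restarts, because plain Lloyd's algorithm can settle on a poor local optimum.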

121 Software Group

Supervised Machine Learning

Diagram: Raw data → Clean → Training set and Validation set → Candidate model → Validate → Production model. New data (new business and existing business) → Production model → Prediction.
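The workflow in the diagram (clean, split, train a candidate, validate, promote to production, score new data) can be sketched end to end. The threshold "model", the 25% validation split, the fixed random seed and the 0.9 accuracy bar are all assumptions chosen to keep the example small; a real project would use a proper learning algorithm and evaluation metric.

```python
import random


def clean(raw_rows):
    """'Clean' step: drop rows with a missing feature or label."""
    return [(x, y) for x, y in raw_rows if x is not None and y is not None]


def train_validation_split(rows, validation_fraction=0.25, seed=42):
    """Shuffle a copy of the rows and split them into training and validation sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(training):
    """Candidate model: predict True when x is at least halfway between the class means."""
    pos = [x for x, y in training if y]
    neg = [x for x, y in training if not y]
    if not pos or not neg:
        raise ValueError("training set needs examples of both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, rows):
    """Fraction of rows whose label matches the prediction x >= threshold."""
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


def build_production_model(raw_rows, min_accuracy=0.9):
    """Clean -> split -> train candidate -> validate -> promote (or reject)."""
    training, validation = train_validation_split(clean(raw_rows))
    candidate = fit_threshold(training)
    score = accuracy(candidate, validation)
    if score < min_accuracy:
        raise RuntimeError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate, score


raw = [(x, False) for x in range(10)] + [(x, True) for x in range(20, 30)] + [(None, True)]
model, score = build_production_model(raw)
new_data = [3, 25]
print([x >= model for x in new_data])  # [False, True]
```

Holding the validation set back from training is the key design point: the candidate is judged on data it has never seen, which is a rough stand-in for the new data it will score in production.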

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics / Data Science

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium-sized businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram: Redo logs → Change Data Capture → JMS queue / Hadoop poster → Batched HDFS file copy (audit change data) and HBase real-time replication
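The slide does not describe SharePlex's internals, so the following is only a hypothetical sketch of the general change-data-capture pattern it illustrates: ordered change records are replayed into a current-state table (standing in for real-time HBase replication) and appended to an audit log (standing in for the batched HDFS copy). The `Change` record format and the function names are invented for illustration and are not SharePlex APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured change, loosely modelled on a redo-log entry (hypothetical format)."""
    scn: int                      # ordering number for the change
    op: str                       # "INSERT", "UPDATE" or "DELETE"
    key: str                      # primary key of the affected row
    row: Optional[dict] = None    # column values (None for DELETE)


def apply_changes(changes, table, audit_log):
    """Replay changes in commit order.

    `table` keeps only the current state of each row (like a real-time replica),
    while `audit_log` keeps every change (like an append-only audit file).
    """
    for ch in sorted(changes, key=lambda c: c.scn):
        audit_log.append(f"{ch.scn},{ch.op},{ch.key}")
        if ch.op == "INSERT":
            table[ch.key] = dict(ch.row or {})
        elif ch.op == "UPDATE":
            # Assume updates carry only the changed columns and merge them in.
            table.setdefault(ch.key, {}).update(ch.row or {})
        elif ch.op == "DELETE":
            table.pop(ch.key, None)
        else:
            raise ValueError(f"unknown operation {ch.op!r}")
    return table, audit_log


changes = [
    Change(3, "DELETE", "c1"),
    Change(1, "INSERT", "c1", {"name": "Ann"}),
    Change(2, "UPDATE", "c1", {"city": "Austin"}),
    Change(4, "INSERT", "c2", {"name": "Bob"}),
]
table, log = apply_changes(changes, {}, [])
print(table)
print(log)
```

The two targets answer different questions: the replica shows what the data looks like now, while the audit log preserves how it got there, which is why the diagram routes the same captured changes to both.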

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group (Confidential)

Dell's offering was not complete...

Key components to build end-to-end BI analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group (Confidential)

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell offerings: Server and Storage, TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group (Confidential)

135 Software Group (Confidential)

Data Visualization

136 Software Group (Confidential)

Live scoring - integration into operational systems

137 Software Group (Confidential)

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data analytics

• Where is the data?
  - Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics skills
  - Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  - Sqoop
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 80: Thriving and surviving the Big Data revolution

80 Software Group

Hive

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 81: Thriving and surviving the Big Data revolution

81 Software Group

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 82: Thriving and surviving the Big Data revolution

82 Software Group

SQL

JAV

A

RES

ULT

S

83 Software Group

Other SQL-like Hadoop Interfaces

Cloudera Impala

MapR Drill Aster

Greenplumb (Pivotal HD) Paraccel Hadapt

Oracle SQL Connector for

Hadoop (External Table interface to

HDFS)

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math & statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.



83 Software Group

Other SQL-like Hadoop Interfaces

• Cloudera Impala
• MapR Drill
• Aster
• Greenplum (Pivotal HD)
• ParAccel
• Hadapt
• Oracle SQL Connector for Hadoop (external table interface to HDFS)

84 Software Group

Pig

Pig Latin

SQL or HiveQL
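Pig Latin expresses a job as a dataflow of named intermediate relations, one transformation per statement. SQL and HiveQL state the desired result in a single declarative query. The sketch below runs the same aggregation both ways: once step by step, with the equivalent Pig Latin statements as comments, and once as one SQL statement in SQLite. SQLite stands in for Hive here. The web-log records and the `visitor`/`bytes` field names are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample web-log records: (visitor, bytes transferred)
LOGS = [("alice", 120), ("bob", 80), ("alice", 300), ("carol", 150), ("bob", 200)]


def pig_style(logs):
    """Step-by-step dataflow; each step mirrors one Pig Latin statement."""
    # logs = LOAD 'weblogs' AS (visitor:chararray, bytes:int);
    # big = FILTER logs BY bytes > 100;
    big = [(visitor, b) for visitor, b in logs if b > 100]
    # grp = GROUP big BY visitor;
    grp = defaultdict(list)
    for visitor, b in big:
        grp[visitor].append(b)
    # totals = FOREACH grp GENERATE group, SUM(big.bytes);
    return sorted((visitor, sum(bs)) for visitor, bs in grp.items())


def sql_style(logs):
    """The same result as one declarative SQL statement, run in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logs (visitor TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO logs VALUES (?, ?)", logs)
    rows = conn.execute(
        "SELECT visitor, SUM(bytes) FROM logs WHERE bytes > 100 "
        "GROUP BY visitor ORDER BY visitor"
    ).fetchall()
    conn.close()
    return rows


print(pig_style(LOGS))  # [('alice', 420), ('bob', 200), ('carol', 150)]
print(sql_style(LOGS))  # [('alice', 420), ('bob', 200), ('carol', 150)]
```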

85 Software Group

Flume and SQOOP

Diagram: FLUME loads web logs into HDFS; SQOOP transfers the CUSTOMERS and PRODUCTS tables between the RDBMS and HDFS.
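Sqoop imports a relational table in parallel. It reads the minimum and maximum of a split column (by default the primary key, or the column given with `--split-by`), then divides that range among the map tasks set by `-m`/`--num-mappers`. The sketch below is a simplified, sequential model of that range split, using an in-memory SQLite table. Each slice stands in for one map task. The `customers` table and the function names are hypothetical, and Sqoop's own boundary handling differs in detail.

```python
import sqlite3


def split_ranges(lo, hi, num_mappers):
    """Divide the key range lo..hi (inclusive) into contiguous half-open slices [start, end)."""
    span = hi - lo + 1
    step = -(-span // num_mappers)  # ceiling division so there are at most num_mappers slices
    return [(start, min(start + step, hi + 1)) for start in range(lo, hi + 1, step)]


def split_import(conn, num_mappers):
    """Import the customers table slice by slice; each slice stands in for one map task."""
    lo, hi = conn.execute("SELECT MIN(id), MAX(id) FROM customers").fetchone()
    slices = []
    for start, end in split_ranges(lo, hi, num_mappers):
        rows = conn.execute(
            "SELECT id, name FROM customers WHERE id >= ? AND id < ? ORDER BY id",
            (start, end),
        ).fetchall()
        slices.append(rows)
    return slices


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(i, f"customer-{i}") for i in range(1, 11)])
slices = split_import(conn, num_mappers=4)
conn.close()
print(split_ranges(1, 10, 4))     # [(1, 4), (4, 7), (7, 10), (10, 11)]
print([len(s) for s in slices])   # [3, 3, 3, 1]
```

Because the slices are disjoint and together cover the whole key range, every row is imported exactly once. An evenly distributed split column keeps the map tasks balanced.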

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Diagram components:

• Resource layer: Mesos (heterogeneous cluster manager), YARN, EC2
• Tachyon – in-memory file system
• Spark – memory-optimized distributed execution
• Spark Streaming
• MLbase / MLlib – machine learning
• Shark (SQL)
• BlinkDB
• Hadoop components also shown: MapReduce, Hive (SQL)
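Spark's "memory-optimized" execution rests on RDDs. An RDD records a lineage of lazy transformations and can be kept in memory with `cache()`/`persist()`, so iterative algorithms do not rescan disk on every pass as chained MapReduce jobs do. The toy sketch below shows only that effect by counting source scans. `MiniRDD` and `run` are hypothetical teaching names, not the PySpark API, and nothing here is distributed.

```python
class MiniRDD:
    """Toy stand-in for a Spark RDD: lazy transformations, optional in-memory cache."""

    def __init__(self, compute):
        self._compute = compute    # zero-argument function that produces a list
        self._persist = False
        self._cached = None

    def map(self, f):
        return MiniRDD(lambda: [f(x) for x in self._materialize()])

    def filter(self, pred):
        return MiniRDD(lambda: [x for x in self._materialize() if pred(x)])

    def cache(self):
        self._persist = True
        return self

    def _materialize(self):
        if not self._persist:
            return self._compute()          # recompute from the lineage every time
        if self._cached is None:
            self._cached = self._compute()  # first use: compute once, keep in memory
        return self._cached

    def sum(self):
        return sum(self._materialize())     # the builtin sum, not this method


def run(iterations, use_cache):
    """Run an iterative job and count how often the (expensive) source is scanned."""
    scans = 0

    def read_source():
        nonlocal scans
        scans += 1
        return list(range(1, 11))  # pretend this is a full scan of an HDFS file

    data = MiniRDD(read_source).map(lambda x: x * 2)
    if use_cache:
        data.cache()
    results = [data.filter(lambda x, t=t: x > t).sum() for t in range(iterations)]
    return scans, results


print(run(3, use_cache=False))  # (3, [110, 110, 108]): source re-read every iteration
print(run(3, use_cache=True))   # (1, [110, 110, 108]): read once, then served from memory
```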

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers: 64 cores, 576 GB RAM

Storage servers: 112 cores, 100 TB SAS or 336 TB SATA, plus 5 TB SSD

90 Software Group

Economies

Chart: Exadata vs Hadoop, $/TB (hardware only): Exadata $4,911; Hadoop $750
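The chart's point is a cost ratio of roughly 6.5 to 1 per terabyte. The worked version below assumes a flat, hardware-only $/TB rate at the chart's values. It excludes software licences, power and administration, and the chart does not say whether the figures are per raw or per usable terabyte. `hardware_cost` is a hypothetical helper.

```python
# $/TB figures read off the "Economies" chart (hardware only)
EXADATA_PER_TB = 4911
HADOOP_PER_TB = 750


def hardware_cost(per_tb, terabytes):
    """Hardware-only cost of a given capacity at a flat $/TB rate."""
    return per_tb * terabytes


ratio = EXADATA_PER_TB / HADOOP_PER_TB
print(f"Exadata costs {ratio:.1f}x more per TB")  # Exadata costs 6.5x more per TB
for tb in (10, 100):
    print(tb, hardware_cost(EXADATA_PER_TB, tb), hardware_cost(HADOOP_PER_TB, tb))
```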

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 3 TB HDD per node (216 spindles, 648 TB)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
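The appliance totals follow from multiplying per-node figures by the 18 nodes. The sketch below shows that arithmetic with the drive size as a parameter. With 3 TB drives (assumed here, per Oracle's first-generation specification) the raw total is 648 TB; with 2 TB drives the same arithmetic gives 432 TB. The `hdfs_usable_tb` figure further assumes HDFS's default replication factor of 3 and ignores OS and temporary-space overhead. `ClusterSpec` and `totals` are hypothetical names.

```python
from dataclasses import dataclass

HDFS_REPLICATION = 3  # HDFS default dfs.replication: each block is stored three times


@dataclass
class ClusterSpec:
    nodes: int
    ram_gb_per_node: int
    cores_per_node: int
    disks_per_node: int
    tb_per_disk: int


def totals(spec: ClusterSpec) -> dict:
    """Cluster-wide totals derived from per-node figures."""
    spindles = spec.nodes * spec.disks_per_node
    raw_tb = spindles * spec.tb_per_disk
    return {
        "ram_gb": spec.nodes * spec.ram_gb_per_node,
        "cores": spec.nodes * spec.cores_per_node,
        "spindles": spindles,
        "raw_tb": raw_tb,
        "hdfs_usable_tb": raw_tb / HDFS_REPLICATION,  # before any other overhead
    }


bda = ClusterSpec(nodes=18, ram_gb_per_node=48, cores_per_node=2 * 6,
                  disks_per_node=12, tb_per_disk=3)
print(totals(bda))
```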

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

• Machine Learning: programs that evolve with “experience”
• Collective Intelligence: programs that use inputs from “crowds” to seem intelligent
• Predictive Analytics: programs that extrapolate from existing data into the future
• Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence
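A familiar form of collective intelligence is "customers who bought this also bought": no single purchase history contains the answer, but the combined behaviour of many customers does. The sketch below implements it as simple item co-occurrence counting over hypothetical shopping baskets. It is one common technique, not necessarily the one used in the slides' examples.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets):
    """Count, for every ordered pair of items, how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def also_bought(item, baskets, top_n=2):
    """Items most often bought together with `item` across all customers."""
    scores = Counter({other: n for (first, other), n in co_occurrence(baskets).items()
                      if first == item})
    return [other for other, _ in scores.most_common(top_n)]


# Hypothetical purchase histories from five customers
baskets = [
    ["book", "lamp"],
    ["book", "lamp", "pen"],
    ["book", "pen"],
    ["book", "lamp"],
    ["mug", "pen"],
]
print(also_bought("book", baskets))  # ['lamp', 'pen']
print(also_bought("mug", baskets))   # ['pen']
```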

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: scatter plot with a fitted linear trend line, $f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
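The trend line in the chart is the kind of result ordinary least-squares linear regression produces. The slope is $\sum (x-\bar{x})(y-\bar{y}) / \sum (x-\bar{x})^2$ and the intercept is $\bar{y} - \text{slope}\cdot\bar{x}$. The sketch below fits a line with those closed-form formulas and extrapolates it to a future point. The monthly sales figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Extrapolate the fitted line to a new x value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical history: sales per month, roughly following a straight line
months = [1, 2, 3, 4, 5, 6]
sales = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
model = fit_line(months, sales)
print(model)
print(predict(model, 12))  # forecast for month 12
```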

119 Software Group

Classification

• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
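For spam detection, one classic way to build such a model is multinomial naive Bayes with add-one smoothing. It learns word frequencies per class from labelled examples, then assigns a new message to the class with the highest log-probability. The sketch below uses a tiny hypothetical training set and whitespace tokenisation. A real filter would need far more data and better text handling.

```python
import math
from collections import Counter

# Hypothetical labelled training messages
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap prize offer", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]


def train(docs):
    """Count words per class and documents per class."""
    word_counts = {}
    class_counts = Counter()
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
        vocab.update(words)
    return word_counts, class_counts, vocab


def classify(model, text):
    """Return the class with the highest log-probability for `text`."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify(model, "claim your cash prize"))  # spam
print(classify(model, "agenda for monday"))      # ham
```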

120 Software Group

Clustering

• Group data without a pre-existing classification scheme
• For instance, basket analysis
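k-means is a common way to group data without predefined classes. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is plain Lloyd's algorithm on hypothetical 2-D points. It assumes the naive initialisation of "first k points", whereas practical implementations use smarter seeding such as k-means++ and several restarts.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def kmeans(points, k, iterations=10):
    """Lloyd's k-means: no labels are given; groups emerge from distances alone."""
    centroids = list(points[:k])  # naive initialisation: the first k points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical customers described by two features (e.g. visits, basket size)
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, k=2)
print(clusters)  # [[(1, 1), (1, 2), (2, 1)], [(9, 9), (8, 9), (9, 8)]]
```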

121 Software Group

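Supervised Machine Learning

Diagram: raw data from existing business is cleaned and split into a training set and a validation set. The training set produces a candidate model, which is validated against the validation set to become the production model. The production model is then applied to new data from new business to make predictions.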
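The pipeline in the diagram (clean, split, train candidates, validate, promote to production, predict) can be sketched in a few functions. This minimal version assumes two hypothetical candidate models, a constant "predict the mean" baseline and a least-squares line. It uses a fixed every-fourth-record hold-out and mean squared error as the validation metric. The (ad spend, sales) data is invented for illustration.

```python
def clean(records):
    """Drop records with missing values."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def split(records, every=4):
    """Hold out every `every`-th record as the validation set."""
    train = [r for i, r in enumerate(records) if i % every != 0]
    valid = [r for i, r in enumerate(records) if i % every == 0]
    return train, valid


def fit_mean(train):
    """Baseline candidate: always predict the average outcome."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m


def fit_line(train):
    """Second candidate: least-squares straight line."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = (sum((x - mx) * (y - my) for x, y in train)
             / sum((x - mx) ** 2 for x, _ in train))
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


def mse(model, data):
    """Mean squared error of a model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


def build_production_model(raw):
    train, valid = split(clean(raw))
    candidates = {"mean": fit_mean(train), "line": fit_line(train)}
    # Validate: keep the candidate with the lowest error on data it was not trained on
    best = min(candidates, key=lambda name: mse(candidates[name], valid))
    return best, candidates[best]


# Hypothetical existing-business history: (ad spend, sales), plus two dirty records
raw = [(x, 3 * x + 2) for x in range(12)] + [(None, 5), (4, None)]
name, production = build_production_model(raw)
print(name)            # line
print(production(20))  # 62.0, the prediction for a new-business input
```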

122 Software Group

Unsupervised learning

inmaps.linkedin.com

123 Software Group

124 Software Group

Big Data Analytics

Diagram: Data Science at the centre, with these applications:

• Search optimization
• Recommendation systems
• Security: vulnerability and penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small-medium businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 84: Thriving and surviving the Big Data revolution

84 Software Group

Pig

Pig Latin

SQL or Hive QL

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 85: Thriving and surviving the Big Data revolution

85 Software Group

Flume and SQOOP

CUSTOMERS

WebLogs

PRODUCTS

HDFS

RDBMS

FLUME

SQOOP

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 86: Thriving and surviving the Big Data revolution

86 Software Group

Berkeley Data Analytic Stack (BDAS)

Yarn Yarn EC2 Yarn

Mesos ndash heterogeneous cluster manager

Tachyon ndash in memory File system

Spark ndash memory optimized distributed execution

Spark Streaming

Mlbase Mlib ndash Machine Learning

Map Reduce

Shark (SQL) Hive (SQL)

BlinkDB

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  – 48 GB RAM per node (864 GB total)
  – 2 × 6-core CPUs per node (216 cores total)
  – 12 × 2 TB HDD per node (216 spindles, 432 TB raw)
  – 40 Gb/s InfiniBand between nodes
  – 10 Gb/s Ethernet to the data centre

• Competitive pricing: www.oracle.com/us/bigdata/index.html
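The appliance totals follow directly from the per-node figures. This minimal sketch multiplies them out. The `NodeSpec` class and `cluster_totals` helper are hypothetical names used only for illustration. The usable-capacity line assumes HDFS's default replication factor of three and ignores any other overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node hardware, as listed on the slide."""
    ram_gb: int
    sockets: int
    cores_per_socket: int
    disks: int
    disk_tb: int


def cluster_totals(nodes: int, spec: NodeSpec) -> dict:
    """Multiply per-node resources out to whole-cluster totals (raw capacity)."""
    return {
        "ram_gb": nodes * spec.ram_gb,
        "cores": nodes * spec.sockets * spec.cores_per_socket,
        "spindles": nodes * spec.disks,
        "raw_tb": nodes * spec.disks * spec.disk_tb,
    }


# 18 nodes, each with 48 GB RAM, 2 x 6-core CPUs and 12 x 2 TB disks.
bda = cluster_totals(18, NodeSpec(ram_gb=48, sockets=2, cores_per_socket=6, disks=12, disk_tb=2))
usable_tb = bda["raw_tb"] / 3  # assumes HDFS default 3-way replication
print(bda, usable_tb)
```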

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise
• Oracle R Enterprise
• Oracle NoSQL
• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through "Big Data analytics"

• Machine learning: programs that evolve with "experience"
• Collective intelligence: programs that use inputs from "crowds" to seem intelligent
• Predictive analytics: programs that extrapolate from existing data into the future
• Big Data analytics: AKA data science
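One common way to let "crowd" input drive a program is user-based collaborative filtering. This minimal sketch scores items a user has not rated by other users' ratings, weighted by how similar those users are (cosine similarity over co-rated items). The users, items and ratings are invented, and the technique is offered as a generic illustration, not as the method behind any product in this deck.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Invented for illustration.
ratings = {
    "ann": {"tent": 5, "stove": 4, "kayak": 1},
    "bob": {"tent": 4, "stove": 5, "kayak": 2, "lantern": 5},
    "carol": {"tent": 1, "kayak": 5, "paddle": 5, "lantern": 1},
}


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)


def recommend(user: str) -> list:
    """Rank unseen items by the similarity-weighted sum of other users' ratings."""
    scores = {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        for item, score in their.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * score
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))
```

Bob's tastes resemble Ann's far more than Carol's do, so Bob's enthusiasm for the lantern outweighs Carol's for the paddle.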

96 Software Group

Collective Intelligence

105 Software Group

Google Flu Trends

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

112 Software Group

Artificial Intelligence Strikes back

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Scatter plot with fitted linear trend line; x axis 0 to 120, y axis -20 to 120]

Trend line: $f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
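A trend line like the one above is what an ordinary least-squares (OLS) fit produces; spreadsheet trend lines typically use OLS. The slide's underlying data points are not recoverable, so this minimal sketch fits invented observations and then extrapolates beyond them, which is the essence of predictive analytics as defined earlier in the deck. `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations (invented; not the slide's data).
xs = [10, 20, 30, 40, 50]
ys = [11, 19, 32, 39, 50]
slope, intercept = fit_line(xs, ys)
forecast = slope * 60 + intercept  # extrapolate past the observed range
print(f"f(x) = {slope:.2f}x + {intercept:.2f}; f(60) = {forecast:.1f}")
```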

119 Software Group

Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
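As an illustration of the churn-risk case, this minimal sketch uses k-nearest neighbours, one of many classification techniques, chosen here only for brevity. The two features (monthly spend, support calls) and all labelled examples are invented. Real use would normally scale the features first, because raw Euclidean distance lets the larger-valued feature dominate.

```python
from collections import Counter
from math import dist

# Hypothetical labelled customers: (monthly_spend, support_calls) -> label.
training = [
    ((80.0, 0.0), "stays"),
    ((75.0, 1.0), "stays"),
    ((90.0, 1.0), "stays"),
    ((20.0, 6.0), "churns"),
    ((30.0, 5.0), "churns"),
    ((25.0, 7.0), "churns"),
]


def classify(point, k=3):
    """Label `point` by majority vote among its k nearest training examples."""
    nearest = sorted(training, key=lambda example: dist(point, example[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(classify((85.0, 0.0)))
print(classify((22.0, 6.0)))
```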

120 Software Group

Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
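A minimal clustering sketch using k-means (Lloyd's algorithm), one common choice among several. It groups shoppers by two invented basket features (fresh-food spend and electronics spend), with no labels supplied up front. Basket analysis is also often done with association rules instead; that alternative is not shown here. Starting centroids are passed in explicitly so the result is repeatable.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternately assign points to the nearest centroid and move centroids to cluster means."""
    centroids = [tuple(c) for c in centroids]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical baskets: (fresh_food_spend, electronics_spend).
baskets = [(40, 2), (35, 5), (45, 0), (3, 120), (5, 150), (0, 135)]
centroids, clusters = kmeans(baskets, [baskets[0], baskets[3]])
print(centroids)
print(clusters)
```

The exact-equality convergence test is fine for this toy input; with real data a small tolerance is the safer choice.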

121 Software Group

Supervised Machine Learning

• Existing business: raw data → clean → training set + validation set
• Training set → candidate model → validate against validation set → production model
• New business: new data → production model → prediction
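A minimal sketch of this workflow. A one-feature decision stump stands in for the candidate model. The data, the every-fourth-record hold-out and the 0.9 promotion bar are illustrative assumptions, not part of the deck.

```python
def clean(rows):
    """Drop records with missing values (a stand-in for real data cleansing)."""
    return [r for r in rows if r[0] is not None and r[1] is not None]


def train_stump(rows):
    """Pick the midpoint threshold on x that best separates label 1 from label 0."""
    xs = sorted({x for x, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: sum((x >= t) == bool(y) for x, y in rows))
    return lambda x: int(x >= best)


def accuracy(model, rows):
    return sum(model(x) == label for x, label in rows) / len(rows)


# Hypothetical raw data from existing business: (support_calls, churned).
raw = [(0, 0), (1, 0), (2, 0), (None, 1), (3, 0), (5, 1),
       (6, 1), (7, 1), (1, 0), (8, 1), (2, 0), (6, 1)]
rows = clean(raw)
validation_set = rows[::4]  # hold out every fourth record
training_set = [r for i, r in enumerate(rows) if i % 4]

candidate = train_stump(training_set)
score = accuracy(candidate, validation_set)
production = candidate if score >= 0.9 else None  # promotion bar is an assumption

# Score new data from new business only if a model was promoted.
predictions = [production(x) for x in (0, 7)] if production else []
print(score, predictions)
```

Holding out records the model never saw during training is what makes the validation step meaningful: a model can fit its training set perfectly and still generalise poorly.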

122 Software Group

Unsupervised learning

inmaps.linkedin.com

124 Software Group

Big Data Analytics

Data science applications:

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium-sized businesses need help to compete
• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Redo logs → change data capture → JMS queue → Hadoop poster, which delivers to:
• Batched HDFS file copy
• Audit change data
• HBase real-time replication
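To show why one change stream can feed both an audit target and a real-time replica, here is a minimal sketch. It assumes a hypothetical change-record shape; this is not SharePlex's actual message format or API. An in-memory list stands in for the audit change data, and a dict keyed by (table, key) stands in for an HBase table. Changes are applied in order of their source ordering number, for which an Oracle SCN is used here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One captured row change (hypothetical shape, for illustration only)."""
    scn: int               # ordering key from the source redo stream
    table: str
    key: str
    op: str                # "INSERT", "UPDATE" or "DELETE"
    row: Optional[dict]


def apply_changes(changes, audit_log, current_state):
    """Append every change to the audit log and keep only the latest row image per key."""
    for change in sorted(changes, key=lambda c: c.scn):
        audit_log.append(change)            # full change history
        cell = (change.table, change.key)
        if change.op == "DELETE":
            current_state.pop(cell, None)   # replica drops the deleted row
        else:
            current_state[cell] = change.row  # upsert the latest image


changes = [
    Change(101, "orders", "42", "INSERT", {"status": "NEW"}),
    Change(103, "orders", "42", "UPDATE", {"status": "SHIPPED"}),
    Change(102, "orders", "7", "INSERT", {"status": "NEW"}),
    Change(104, "orders", "7", "DELETE", None),
]
audit, state = [], {}
apply_changes(changes, audit, state)
print(state)
```

The audit log retains all four changes, while the replica holds only order 42 in its latest state.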

130 Software Group

Toad BI Suite

132 Software Group

Dell's offering was not complete…

Key components to build end-to-end BI analytics solutions:
• Data integration
• Database management
• Advanced analytics
• Business intelligence
• Server and storage

Dell components: Toad & SharePlex, Toad BI, Boomi, Kitenga

In order to address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI analytics solutions:
• Data integration
• Database management
• Advanced analytics
• Business intelligence
• Server and storage

Dell components: Toad & SharePlex, Toad BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

135 Software Group

Data Visualization

136 Software Group

Live scoring: integration into operational systems

137 Software Group

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 87: Thriving and surviving the Big Data revolution

87 Software Group

Meanwhile back at the Death Star

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 88: Thriving and surviving the Big Data revolution

88 Software Group

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 89: Thriving and surviving the Big Data revolution

89 Software Group

Oracle Exadata (X-2)

Database servers

64 cores 576 GB RAM

Storage Servers112 cores 100 TB SAS or336 TB SATA plus5 TB SSD

90 Software Group

Economies

Exadata

Hadoop

$0 $1000 $2000 $3000 $4000 $5000 $6000

$4911

$750

Exadata vs Hadoop $$TB (Hardware only)

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • It’s not enough to lay out products on tables
  • There’s a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through “Big Data analytics”
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell’s offering was not complete…
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring - integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 90: Thriving and surviving the Big Data revolution

90 Software Group

Economies

Chart: Exadata vs Hadoop, cost per TB (hardware only)

Exadata: $4,911 per TB

Hadoop: $750 per TB
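The two figures on the chart can be turned into a quick sizing calculation. This is a minimal sketch, assuming the slide's hardware-only $/TB rates scale linearly with capacity; it ignores software licences, power, networking and staff, and the function names are illustrative only.

```python
# Hardware-only cost per terabyte, as quoted on the "Economies" slide.
EXADATA_USD_PER_TB = 4911
HADOOP_USD_PER_TB = 750


def hardware_cost(terabytes: float, usd_per_tb: float) -> float:
    """Rough hardware-only cost for a given capacity (assumes linear scaling)."""
    return terabytes * usd_per_tb


def cost_ratio() -> float:
    """How many times more Exadata hardware costs per TB than Hadoop hardware."""
    return EXADATA_USD_PER_TB / HADOOP_USD_PER_TB


if __name__ == "__main__":
    for tb in (10, 100, 1000):
        exa = hardware_cost(tb, EXADATA_USD_PER_TB)
        hdp = hardware_cost(tb, HADOOP_USD_PER_TB)
        print(f"{tb:>5} TB  Exadata ${exa:>12,.0f}  Hadoop ${hdp:>10,.0f}")
    print(f"Exadata / Hadoop cost ratio: {cost_ratio():.1f}x")
```

At these rates, 100 TB of raw capacity works out to $491,100 of Exadata hardware versus $75,000 of Hadoop hardware, a ratio of roughly 6.5x.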

93 Software Group

Oracle Big Data Appliance

• 18 Sun X4270 M2 servers
  - 48 GB RAM per node (864 GB total)
  - 2 x 6-core CPUs per node (216 cores total)
  - 12 x 2 TB HDD per node (216 spindles, 864 TB)
  - 40 Gb/s InfiniBand between nodes
  - 10 Gb/s Ethernet to datacentre

• Competitive pricing: www.oracle.com/us/bigdata/index.html

94 Software Group

Big Data Appliance Software

• Cloudera Enterprise

• Oracle R Enterprise

• Oracle NoSQL

• Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through “Big Data analytics”

Machine Learning: programs that evolve with “experience”

Collective Intelligence: programs that use inputs from “crowds” to seem intelligent

Predictive Analytics: programs that extrapolate from existing data into the future

Big Data Analytics: AKA Data Science

96 Software Group

Collective Intelligence
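A common, simple form of collective intelligence is the "people who chose X also chose Y" recommendation: the program has no understanding of the items, it just aggregates what the crowd did. The sketch below counts item co-occurrence across baskets; the basket data and function names are hypothetical, and this is not presented as how any particular product implements recommendations.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each ordered pair of distinct items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, item, top_n=3):
    """Items the crowd most often chose alongside `item`, most frequent first."""
    return [other for other, _ in co.get(item, Counter()).most_common(top_n)]


if __name__ == "__main__":
    # Hypothetical purchase baskets from four customers.
    baskets = [
        ["tent", "stove", "lantern"],
        ["tent", "stove"],
        ["tent", "sleeping bag"],
        ["stove", "fuel"],
    ]
    co = build_cooccurrence(baskets)
    print(recommend(co, "tent", top_n=1))
```

The "intelligence" comes entirely from the data: the more baskets are fed in, the better the suggestions get, with no change to the code.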

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

Chart: sample data points with a fitted linear trend line, $f(x) = 0.971521231456065\,x + 0.71906459527154$

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART (classification and regression trees)
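A trend line like the one on the chart is typically produced by ordinary least squares. A minimal pure-Python sketch of a one-variable fit follows; it is not the tool used to draw the slide's chart, and the function names are illustrative.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                 # covariance(x, y) / variance(x)
    intercept = my - slope * mx       # the fitted line passes through the means
    return slope, intercept


def predict(slope, intercept, x):
    """Extrapolate the fitted line to a new x value."""
    return slope * x + intercept


if __name__ == "__main__":
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(slope, intercept, predict(slope, intercept, 10))
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` returns the same slope and intercept; the hand-written version just makes the two formulas visible.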

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
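k-nearest neighbours is one of the simplest ways to build a model that classifies new data: a new record gets the majority label of the most similar labelled records. The sketch below assumes a hypothetical churn dataset with two features (months as a customer, support calls); real use would normalise features so that one does not dominate the distance.

```python
from collections import Counter
from math import dist


def knn_classify(training, query, k=3):
    """Label `query` by majority vote among its k nearest labelled neighbours.

    training: list of (feature_vector, label) pairs.
    """
    if not training:
        raise ValueError("training set is empty")
    neighbours = sorted(training, key=lambda row: dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical customers: (months as customer, support calls) -> outcome.
    training = [
        ((24, 0), "stays"), ((36, 1), "stays"), ((30, 0), "stays"),
        ((3, 5), "churns"), ((5, 4), "churns"), ((2, 6), "churns"),
    ]
    print(knn_classify(training, (4, 5)), knn_classify(training, (28, 1)))
```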

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
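k-means is one widely used clustering method: it groups points around k centres without any labels being supplied. A minimal sketch, assuming numeric feature vectors and seeding the centres with the first k points for reproducibility (production libraries use smarter seeding such as k-means++); basket analysis is also often done with association rules, which this does not show.

```python
from math import dist


def kmeans(points, k, iterations=100):
    """Plain k-means. Returns (centroids, labels); seeds centroids with the first k points."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical 2-D data with two obvious groups.
    points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
    print(kmeans(points, k=2))
```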

121 Software Group

Supervised Machine Learning

Diagram: Existing Business → Raw Data → Clean → Training Set and Validation Set → Model → Candidate Model → Validate → Production Model; New Business → New Data → Production Model → Prediction
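The stages in the diagram can be walked through end to end in a few lines. This is a sketch under stated assumptions: the "model" is a deliberately tiny decision stump (predict True when x is at or above a learned threshold) standing in for a real algorithm, the 30% validation split and 0.8 accuracy bar are illustrative choices, and the function names are hypothetical.

```python
import random


def accuracy(threshold, rows):
    """Fraction of (x, label) rows that the rule `x >= threshold` gets right."""
    return sum((x >= threshold) == label for x, label in rows) / len(rows)


def train_stump(rows):
    """Training: try every observed x as a threshold and keep the most accurate one."""
    return max(sorted({x for x, _ in rows}), key=lambda t: accuracy(t, rows))


def supervised_pipeline(raw_rows, min_accuracy=0.8, validation_fraction=0.3, seed=42):
    """Raw data -> clean -> training/validation split -> candidate model -> validate."""
    # Clean: drop records with missing values.
    clean = [(x, y) for x, y in raw_rows if x is not None and y is not None]
    # Split: shuffle reproducibly, then hold out part of the data for validation.
    rng = random.Random(seed)
    rng.shuffle(clean)
    cut = int(len(clean) * (1 - validation_fraction))
    training, validation = clean[:cut], clean[cut:]
    # Train a candidate model, then score it on data it has never seen.
    candidate = train_stump(training)
    score = accuracy(candidate, validation)
    # Validate: only a candidate that clears the bar becomes the production model.
    if score < min_accuracy:
        raise RuntimeError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate, score


if __name__ == "__main__":
    # Hypothetical history from existing business: label is True when x >= 50.
    history = [(x, x >= 50) for x in range(100)] + [(None, True)]
    production_threshold, score = supervised_pipeline(history)
    # Prediction: new business data is scored with the promoted model.
    new_data = [12, 49, 75]
    print([x >= production_threshold for x in new_data], f"validation accuracy {score:.2f}")
```

Holding the validation set back from training is the important design point: a candidate is judged only on records it never saw, which is what the "Validate" box guards against.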

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search optimization

• Recommendation systems

• Security
  - Vulnerability
  - Penetration detection

• Fraud detection

• CRM
  - Churn
  - Defaults

• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis

• Game optimization

• Advertising
  - Targeting
  - Tailoring
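Fraud detection is often introduced with a simple statistical outlier rule before moving on to learned models. A sketch, assuming each customer has a history of transaction amounts and using a z-score cut-off of 3; both the data and the threshold are illustrative choices, not from the slide.

```python
from statistics import mean, stdev


def flag_outliers(history, new_amounts, z_threshold=3.0):
    """Flag amounts more than z_threshold standard deviations from the historical mean."""
    if len(history) < 2:
        raise ValueError("need at least two past transactions")
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        # No variation in the history: anything different from the usual amount is unusual.
        return [amount != mu for amount in new_amounts]
    return [abs(amount - mu) / sigma > z_threshold for amount in new_amounts]


if __name__ == "__main__":
    # Hypothetical card history for one customer, then two new transactions.
    history = [20, 25, 30, 22, 28, 24, 26]
    print(flag_outliers(history, [27, 500]))
```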

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Diagram: Redo logs → Change Data Capture → JMS Queue / Hadoop Poster → batched HDFS file copy, audit change data, HBase real-time replication
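The core idea of change data capture is that every insert, update and delete is captured once and replayed, in commit order, against one or more targets. The sketch below is a hypothetical illustration of that replay step only: a Python dict stands in for an HBase table keyed by row key, a list stands in for an audit trail of change data, and the `ChangeRecord` shape is an assumption. It does not describe SharePlex's actual internals or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    """One captured change (hypothetical shape, e.g. decoded from a redo log)."""
    scn: int                        # ordering key: system change number of the commit
    op: str                         # "INSERT", "UPDATE" or "DELETE"
    row_key: str
    columns: Optional[dict] = None  # new column values for INSERT/UPDATE


def apply_changes(table, audit_log, changes):
    """Replay changes in commit order into `table` and record each one in `audit_log`."""
    for change in sorted(changes, key=lambda c: c.scn):
        audit_log.append(change)  # audit copy of the raw change data
        if change.op == "INSERT":
            table[change.row_key] = dict(change.columns or {})
        elif change.op == "UPDATE":
            table.setdefault(change.row_key, {}).update(change.columns or {})
        elif change.op == "DELETE":
            table.pop(change.row_key, None)
        else:
            raise ValueError(f"unknown operation {change.op!r}")
    return table


if __name__ == "__main__":
    changes = [
        ChangeRecord(3, "DELETE", "c2"),
        ChangeRecord(1, "INSERT", "c1", {"name": "Ann", "tier": "gold"}),
        ChangeRecord(2, "INSERT", "c2", {"name": "Bob"}),
        ChangeRecord(4, "UPDATE", "c1", {"tier": "platinum"}),
    ]
    audit = []
    print(apply_changes({}, audit, changes), len(audit))
```

Sorting by the change number before applying matters: replaying the DELETE of `c2` before its INSERT would leave a row that the source database no longer has.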

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Key components to build end-to-end BI/Analytics solutions

Dell’s offering was not complete…

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD & SharePlex

TOAD BI

Boomi

Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 91: Thriving and surviving the Big Data revolution

93 Software Group

Oracle Big Data Appliance

bull 18 Sun X4270 M2 serversndash 48GB RAM per node (864GB total)ndash 2x6 Core CPU per node (216 total)ndash 12x2TB HDD per node (216 spindles 864 TB)ndash 40Gbs Infiniband between nodesndash 10Gbs Ethernet to datacentre

bull Competitive Pricingwwworaclecomusbigdataindexhtml

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 92: Thriving and surviving the Big Data revolution

94 Software Group

Big Data Appliance Software

bull Cloudera Enterprise

bull Oracle Enterprise R

bull Oracle NoSQL

bull Oracle Big Data Connectors

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 93: Thriving and surviving the Big Data revolution

95 Software Group

Generating competitive advantage through ldquoBig Data analyticsrdquo Machine

LearningPrograms that evolve with ldquoexperiencerdquo

Collective IntelligencePrograms that use inputs from ldquocrowdsrsquo to seem intelligent

Predictive AnalyticsPrograms that extrapolate from existing data into the future

Big Data AnalyticsAKA Data Science

96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.



96 Software Group

Collective Intelligence

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot (x axis 0 to 120, y axis -20 to 120) with fitted linear trend line f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
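Linear regression, the first technique listed, fits a straight line y = slope · x + intercept such as the trend line on this slide. A minimal ordinary-least-squares sketch follows; the sample points are invented for illustration and are not the data behind the slide's chart.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the fitted line passes through the means
    return slope, intercept

xs = [0, 20, 40, 60, 80, 100]
ys = [1, 21, 39, 61, 79, 101]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.3f}x + {intercept:.3f}")  # y = 0.994x + 0.619
```

The other techniques on the list (logistic regression, CART and so on) follow the same fit-then-predict pattern with a different model form.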

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
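As a minimal illustration of a model that classifies new data, the sketch below uses k-nearest neighbours, one simple technique among many (the slide does not name a specific algorithm). The two features, a count of suspicious words and a count of links per message, and the labelled examples are hypothetical.

```python
import math
from collections import Counter

# Hypothetical labelled training data: (suspicious_words, links) -> label
TRAINING = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((1, 0), "ham"), ((0, 1), "ham"), ((2, 1), "ham"),
]

def classify(features, k=3):
    """Label a new message by majority vote among its k nearest training examples."""
    nearest = sorted(TRAINING, key=lambda item: math.dist(features, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(classify((6, 4)))  # spam
print(classify((1, 1)))  # ham
```

An odd k avoids tied votes when there are only two classes.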

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
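Clustering groups records without any predefined labels. A minimal k-means sketch follows (k-means is one common clustering algorithm, chosen here for illustration). The customer points are hypothetical (spend per visit, visits per month), and the first k points serve as initial centroids so the result is deterministic.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move centroids to the mean."""
    centroids = [tuple(p) for p in points[:k]]  # deterministic initialisation for the example
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # assignments are stable: converged
        centroids = new_centroids
    return centroids, clusters

customers = [(10, 1), (12, 2), (90, 8), (11, 1), (95, 9), (88, 7)]
centroids, clusters = kmeans(customers, k=2)
```

With these points the algorithm converges in three passes to a low-spend group and a high-spend group of three customers each; no one told it those groups existed.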

121 Software Group

Supervised Machine Learning

[Diagram components: Raw Data; Clean; Training Set; Validation Set; Model; Candidate Model; Validate; Production Model; New Data; Existing Business; New Business; Prediction]
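The supervised learning workflow (clean the raw data, split it into a training set and a validation set, train a candidate model, validate it, and only then promote it to production to predict on new data) can be sketched as below. Everything here is a hypothetical illustration: a one-feature threshold rule stands in for a real learning algorithm, the every-fourth-record split is used only to keep the example reproducible (real pipelines usually shuffle first), and the 0.8 accuracy bar for promotion is an assumed policy.

```python
def clean(raw):
    """Drop records with missing values (a stand-in for real data cleansing)."""
    return [(x, y) for x, y in raw if x is not None and y is not None]

def train(training_set):
    """Candidate model: choose the threshold on x that best separates labels 0 and 1."""
    best_threshold, best_correct = None, -1
    for threshold, _ in training_set:
        correct = sum((x >= threshold) == bool(y) for x, y in training_set)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda x: int(x >= best_threshold)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def build_production_model(raw, min_accuracy=0.8):
    data = clean(raw)
    # Deterministic split: every 4th record is held out for validation.
    validation_set = data[::4]
    training_set = [rec for i, rec in enumerate(data) if i % 4 != 0]
    candidate = train(training_set)
    score = accuracy(candidate, validation_set)  # judged on records the model never saw
    if score < min_accuracy:
        raise RuntimeError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate

# Hypothetical raw data: x = months inactive, y = 1 if the customer churned.
raw = [(m, int(m >= 6)) for m in range(12)] * 3 + [(None, 1), (4, None)]
production_model = build_production_model(raw)
print(production_model(9), production_model(2))  # 1 0
```

The validation gate is what separates a candidate model from a production model: a model that merely fits its training data is rejected if it does not generalise to the held-out set.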

122 Software Group

InMaps (linkedin.com)

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science
• Search optimization
• Recommendation systems
• Security
  – Vulnerability
  – Penetration detection
• Fraud detection
• CRM
  – Churn
  – Defaults
• Medical
  – Risk analysis
  – Diagnosis
  – Prognosis
• Game optimization
• Advertising
  – Targeting
  – Tailoring
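Recommendation systems are one of the applications listed. A minimal item-to-item sketch follows, counting how often products appear in the same basket ("customers who bought X also bought Y"). The baskets are invented, and simple co-occurrence counting is only one of several possible approaches.

```python
from collections import Counter
from itertools import permutations

def co_occurrence(baskets):
    """Count, for every ordered pair of distinct items, how many baskets contain both."""
    counts = {}
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts

def recommend(counts, item, n=2):
    """Return up to n items most often bought with `item`, ties broken alphabetically."""
    ranked = sorted(counts.get(item, Counter()).items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]

baskets = [
    ["bread", "butter", "jam"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["butter", "jam"],
]
counts = co_occurrence(baskets)
print(recommend(counts, "bread"))  # ['butter', 'jam']
```

Because the counts come from many customers' baskets rather than hand-written rules, this is a small instance of the collective-intelligence idea discussed earlier in the deck.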

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 95: Thriving and surviving the Big Data revolution

97 Software Group

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 96: Thriving and surviving the Big Data revolution

98 Software Group

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 97: Thriving and surviving the Big Data revolution

99 Software Group

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through "Big Data analytics"
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell's offering was not complete…
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring - integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app

100 Software Group

101 Software Group

102 Software Group

103 Software Group

104 Software Group

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: scatter plot (x axis 0 to 120, y axis -20 to 120) with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
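A trend line like the one on the Predictive Analytics slide is typically an ordinary least-squares fit. The sketch below shows simple linear regression in plain Python; the sample points are invented for illustration and are not the data behind the slide's chart.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical sample points that roughly follow y = x
xs = [0, 20, 40, 60, 80, 100]
ys = [2, 19, 41, 58, 79, 99]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f}x + {intercept:.4f}")
```

The slope is the covariance of x and y divided by the variance of x. The intercept then forces the line through the point of means.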

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
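As a toy illustration of "create a model that classifies new data", here is a nearest-centroid classifier. The features (links and exclamation marks per message) and the labelled examples are hypothetical. Production spam or churn models would usually rely on richer features and on algorithms such as logistic regression or CART.

```python
from math import dist

def train_centroids(samples):
    """samples: list of (feature_vector, label). Returns {label: centroid vector}."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def classify(centroids, features):
    """Assign the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: (links per message, exclamation marks per message)
training = [
    ([8, 5], "spam"), ([10, 7], "spam"), ([9, 6], "spam"),
    ([1, 0], "ham"), ([0, 1], "ham"), ([2, 1], "ham"),
]
model = train_centroids(training)
print(classify(model, [7, 4]))
print(classify(model, [1, 1]))
```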

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
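Clustering discovers groups without any labels. The sketch below implements plain k-means (Lloyd's algorithm) and applies it to hypothetical shopper data (average basket value, visits per month). To keep the result reproducible, it seeds the centroids with the first k points. That is a simplification: real implementations usually seed randomly or with k-means++.

```python
from math import dist

def kmeans(points, k, max_iterations=20):
    """Return (centroids, clusters) after running Lloyd's algorithm."""
    centroids = [list(p) for p in points[:k]]  # naive deterministic seeding
    clusters = []
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centroids[i], p))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            [sum(axis) / len(members) for axis in zip(*members)] if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, clusters

# Hypothetical shoppers: (average basket value, visits per month)
shoppers = [(20, 2), (80, 1), (22, 3), (85, 2), (19, 2), (78, 1)]
centroids, clusters = kmeans(shoppers, k=2)
for centre, members in zip(centroids, clusters):
    print(centre, members)
```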

121 Software Group

Supervised Machine Learning

[Diagram: existing business → raw data → clean → training set and validation set; training set → candidate model → validate against validation set → production model; new business → new data → production model → prediction]
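The supervised-learning workflow runs as follows: clean the raw data, split it into training and validation sets, train candidate models, validate them, promote the best to production, and then score new data. The minimal sketch below walks through those steps. Several choices in it are assumptions made for illustration, not anything the slide prescribes: the two candidates (a mean baseline and a least-squares line), the 70/30 split, mean squared error as the validation metric, and the synthetic records.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_linear(xs, ys):
    """Candidate: ordinary least-squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def train_and_select(records, trainers, validation_fraction=0.3, seed=42):
    """Clean -> split -> train candidates -> validate -> return the production model."""
    clean = [(x, y) for x, y in records if x is not None and y is not None]  # drop incomplete rows
    random.Random(seed).shuffle(clean)
    cut = round(len(clean) * (1 - validation_fraction))
    train, valid = clean[:cut], clean[cut:]
    tx, ty = zip(*train)
    vx, vy = zip(*valid)
    candidates = {name: fit(tx, ty) for name, fit in trainers.items()}
    scores = {name: mse(model, vx, vy) for name, model in candidates.items()}
    best = min(scores, key=scores.get)
    return best, candidates[best], scores

# Hypothetical "existing business" records: y is roughly 2x + 1, plus one incomplete row
records = [(x, 2 * x + 1 + (x % 3 - 1)) for x in range(30)] + [(5, None)]
best, production, scores = train_and_select(records, {"mean": fit_mean, "linear": fit_linear})
print(best, scores)
print(production(100))  # score a "new business" record
```

Holding out a validation set means each candidate is judged on data it never saw during training. This guards against promoting a model that merely memorised the training set.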

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

[Diagram: Data Science and its application areas]

• Search optimization
• Recommendation systems
• Security
  - Vulnerability
  - Penetration detection
• Fraud detection
• CRM
  - Churn
  - Defaults
• Medical
  - Risk analysis
  - Diagnosis
  - Prognosis
• Game optimization
• Advertising
  - Targeting
  - Tailoring
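Recommendation systems are one of the application areas listed for big data analytics. One of the simplest approaches counts how often items appear together in baskets and recommends the items that most often co-occur with what a customer already has. This is only a minimal sketch of item co-occurrence; the products and baskets are hypothetical, and it makes no claim about how any particular commercial recommender works.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = {}
    for basket in baskets:
        for a, b in combinations(sorted(set(basket)), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

def recommend(co, owned, n=2):
    """Score items by co-occurrence with what the user already owns; skip owned items."""
    scores = Counter()
    for item in owned:
        for other, count in co.get(item, {}).items():
            if other not in owned:
                scores[other] += count
    return [item for item, _ in scores.most_common(n)]

# Hypothetical purchase history
baskets = [
    ["tent", "sleeping bag", "torch"],
    ["tent", "sleeping bag", "stove"],
    ["tent", "torch"],
    ["stove", "kettle"],
]
co = build_cooccurrence(baskets)
print(recommend(co, {"tent"}))
```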

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD

• Small-medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: redo logs → change data capture → JMS queue → Hadoop Poster → targets: batched HDFS file copy, audit change data, HBase real-time replication]
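The SharePlex for Hadoop slide shows changes captured from redo logs, passed through a queue, and then delivered both as real-time HBase replication and as batched HDFS file copies that double as an audit trail. The sketch below illustrates that change-data-capture fan-out in general terms only. The `Change` record format, the function names, and the `deque` standing in for the JMS queue are all hypothetical; this is not SharePlex's API or implementation.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class Change:
    """One captured row change (hypothetical record format)."""
    scn: int    # change number used to keep changes in commit order
    op: str     # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: str
    values: dict = field(default_factory=dict)

def apply_realtime(store, change):
    """Real-time path: keep an HBase-style copy (one row per table:key) current."""
    row_key = f"{change.table}:{change.key}"
    if change.op == "DELETE":
        store.pop(row_key, None)
    else:
        # INSERT creates the row; UPDATE merges the changed columns into it.
        store.setdefault(row_key, {}).update(change.values)

def batch_to_files(changes, batch_size):
    """Batched path: group change records into files for bulk copy (e.g. to HDFS);
    the files also serve as an audit trail of every change."""
    lines = [f"{c.scn},{c.op},{c.table},{c.key}" for c in changes]
    return [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]

# Hypothetical changes as they might be mined from the redo log, arriving out of order
captured = [
    Change(103, "UPDATE", "orders", "2", {"status": "shipped"}),
    Change(101, "INSERT", "orders", "1", {"status": "new"}),
    Change(104, "DELETE", "orders", "1"),
    Change(102, "INSERT", "orders", "2", {"status": "new"}),
]

queue = deque(sorted(captured, key=lambda c: c.scn))  # stand-in for the JMS queue
store, applied = {}, []
while queue:
    change = queue.popleft()
    apply_realtime(store, change)
    applied.append(change)

files = batch_to_files(applied, batch_size=2)
print(store)
print(files)
```

Changes are applied in change-number order so that the replica ends up in the same state as the source, even if records were captured out of order.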

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group

Dell's offering was not complete…

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga

In order to address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group

135 Software Group

Data Visualization

136 Software Group

Live scoring - integration into operational systems

Page 102: Thriving and surviving the Big Data revolution

104 Software Group

105 Software Group

Google Flu Trends


Collective Intelligence outsmarts Artificial Intelligence


Artificial Intelligence Strikes back


Watson is big data AI


Predictive Analytics

[Chart: scatter plot with fitted linear trend line, $f(x) = 0.971521231456065\,x + 0.71906459527154$]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART
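The trend line on the Predictive Analytics slide is the kind of output an ordinary least-squares fit produces. A minimal sketch in plain Python, assuming a single predictor; the sample points are invented for illustration and are not the data behind the slide:

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept for one predictor."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # slope = sum of cross-deviations / sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point of means (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope, intercept, x):
    """Use the fitted line to predict y for a new x."""
    return slope * x + intercept


# Hypothetical observations scattered around y = x (not the slide's data).
xs = [0, 20, 40, 60, 80, 100]
ys = [1, 20, 39, 59, 79, 98]
slope, intercept = fit_line(xs, ys)
print(f"f(x) = {slope:.4f} x + {intercept:.4f}")
print(predict(slope, intercept, 120))
```

The closed-form solution is used because one predictor needs no iterative optimization; the multivariate, logistic and CART models in the list require more machinery.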


Classification
• Create a model that identifies/classifies new data
• Spam detection, churn risk, customer value
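As a rough illustration of "create a model that classifies new data", here is a nearest-centroid classifier, chosen for brevity (the slide does not name an algorithm). The features (link count, share of capitalized words) and the spam/ham labels are hypothetical:

```python
from math import dist
from statistics import fmean


def train_centroids(samples, labels):
    """Build the model: the mean feature vector (centroid) of each known class."""
    rows_by_label = {}
    for features, label in zip(samples, labels):
        rows_by_label.setdefault(label, []).append(features)
    return {
        label: tuple(fmean(column) for column in zip(*rows))
        for label, rows in rows_by_label.items()
    }


def classify(centroids, features):
    """Assign new data to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical features: (number of links, share of capitalized words).
samples = [(0, 0.05), (1, 0.10), (8, 0.60), (10, 0.70)]
labels = ["ham", "ham", "spam", "spam"]
centroids = train_centroids(samples, labels)
print(classify(centroids, (9, 0.5)))  # spam
```

In practice, features on very different scales should be normalized first; otherwise the larger-scaled feature dominates the distance.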


Clustering
• Group data without a pre-existing classification scheme
• For instance, basket analysis
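k-means is one common way to group data without a pre-existing classification scheme. A minimal sketch of Lloyd's algorithm, assuming numeric feature vectors and, for simplicity, seeding the centroids with the first k points (real implementations usually use smarter seeding such as k-means++):

```python
from math import dist
from statistics import fmean


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm. Returns the final centroids and the last cluster assignment."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(fmean(col) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers described by two spending features.
points = [(1, 1), (1.5, 2), (8, 8), (9, 8.5), (1, 0.5), (8.5, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
```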


Supervised Machine Learning

[Diagram: Raw Data → Clean → Training Set / Validation Set → Model → Candidate Model → Validate → Production Model; New Data → Production Model → Prediction. Labels: Existing Business, New Business]


Unsupervised learning

Source: inmaps.linkedin.com


Big Data Analytics

Data Science applications:
• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring


Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD
• Small and medium-sized businesses need help to compete
• Data scientists to the rescue


Data Scientists to the rescue


Kitenga Analytics Suite


Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx


SharePlex® for Hadoop

[Diagram: Redo logs → Change Data Capture → JMS Queue → Hadoop Poster → Batched HDFS File Copy, Audit Change Data, HBase Real-Time Replication]
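The SharePlex for Hadoop diagram shows captured changes fanning out to batched HDFS files, audit change data, and real-time HBase replication. The sketch below imitates that fan-out in memory; it is not SharePlex's API. `Change`, `HadoopPosterSketch` and the batching rule are hypothetical, a Python `Queue` stands in for the JMS queue, a list of strings stands in for HDFS files, and a dict stands in for an HBase table:

```python
import json
from dataclasses import dataclass, field
from queue import Queue


@dataclass
class Change:
    """One captured change, as might be decoded from a redo log (hypothetical shape)."""
    op: str            # "INSERT", "UPDATE" or "DELETE"
    table: str
    key: str
    columns: dict = field(default_factory=dict)


class HadoopPosterSketch:
    """Hypothetical consumer that feeds the three targets shown on the slide."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.pending = []       # changes waiting for the next batched file
        self.files = []         # stands in for batch files copied to HDFS
        self.audit = []         # complete change history (audit change data)
        self.table_state = {}   # stands in for an HBase table kept current in real time

    def apply(self, change):
        self.audit.append(change)
        row_key = (change.table, change.key)
        if change.op == "DELETE":
            self.table_state.pop(row_key, None)
        else:
            # INSERT/UPDATE merge column values into the row, like a column-level put.
            self.table_state.setdefault(row_key, {}).update(change.columns)
        self.pending.append(change)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Write pending changes as one JSON-lines 'file' (a real pipeline would use HDFS)."""
        if self.pending:
            self.files.append("\n".join(json.dumps(vars(c)) for c in self.pending))
            self.pending = []

    def drain(self, queue):
        """Consume everything currently queued, then flush any partial batch."""
        while not queue.empty():
            self.apply(queue.get())
        self.flush()


q = Queue()
q.put(Change("INSERT", "orders", "1", {"qty": 5}))
q.put(Change("UPDATE", "orders", "1", {"qty": 7}))
poster = HadoopPosterSketch()
poster.drain(q)
print(poster.table_state)
```

Batching trades latency for fewer, larger files, which suits HDFS; the key-value path keeps the latest row state current for real-time lookups.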


Toad BI Suite


Dell's offering was not complete…

Key components to build end-to-end BI Analytics solutions

[Diagram: layers Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage; Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga]

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.


Dell acquires StatSoft

Key components to build end-to-end BI Analytics solutions

[Diagram: layers Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage; Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, with STATISTICA added]

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.


Data Visualization


Live scoring – integration into operational systems


Industry and cross-industry packaged solutions


For your business

• How could data and algorithms transform your business?
• What are the technologies that will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics
• Where is the data?
  – Start collecting now


For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  – Good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 103: Thriving and surviving the Big Data revolution

105 Software Group

Google Flu Trends

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 104: Thriving and surviving the Big Data revolution

106 Software Group

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 105: Thriving and surviving the Big Data revolution

107 Software Group

Collective Intelligence outsmarts Artificial Intelligence

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 106: Thriving and surviving the Big Data revolution

108 Software Group

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/Analytics solutions:
• Data Integration
• Database Management
• Advanced Analytics
• Business Intelligence
• Server and Storage

Products: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems
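Live scoring means applying a model trained offline to records as they flow through an operational system. The sketch below assumes a logistic-regression churn model. Its coefficients (`COEFFICIENTS`, `INTERCEPT`), the field names and the 0.5 action threshold are all invented for illustration:

```python
import math

# Hypothetical churn-model coefficients, as if exported from an offline training run.
COEFFICIENTS = {"support_calls": 0.8, "months_as_customer": -0.05}
INTERCEPT = -1.0


def churn_probability(record):
    """Logistic score 1 / (1 + exp(-z)), z = intercept + sum(weight * feature).
    Missing features are treated as 0."""
    z = INTERCEPT + sum(w * record.get(name, 0.0) for name, w in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def score_stream(records, threshold=0.5):
    """Score each operational record as it arrives; yield those needing action."""
    for record in records:
        p = churn_probability(record)
        if p >= threshold:
            yield record["customer_id"], round(p, 3)


if __name__ == "__main__":
    incoming = [
        {"customer_id": "A", "support_calls": 5, "months_as_customer": 12},
        {"customer_id": "B", "support_calls": 0, "months_as_customer": 24},
    ]
    for customer_id, p in score_stream(incoming):
        print(f"flag {customer_id} for a retention offer (p={p})")
```

Because the scorer is a plain function of one record, it can be called inline from a transaction path or a message consumer. No batch job is needed.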

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?
• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics
• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
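Hive and Pig translate queries into MapReduce jobs (or, in newer releases, Tez DAGs). A single-process simulation of the map, shuffle and reduce phases behind a word count shows roughly what those tools generate for a grouped count. This is plain Python for intuition and is not Hadoop's API:

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in sorted(shuffle(intermediate).items()))


if __name__ == "__main__":
    print(word_count(["big data big", "Data revolution"]))
    # {'big': 2, 'data': 2, 'revolution': 1}
```

In Hive, a similar result would come from a query along the lines of `SELECT word, COUNT(*) FROM (SELECT explode(split(line, ' ')) AS word FROM docs) w GROUP BY word;`. Here `docs` is a hypothetical table with a single string column `line`.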

Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207: Surviving and thriving in the big data revolution
  • 207: Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest – a brief history
  • But Seriously
  • What is Big Data?
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud, social and mobile
  • Not all upside
  • Will Big Data kill retail?
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming?
  • It’s not enough to lay out products on tables
  • There’s a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez1
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through “Big Data analytics”
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell’s offering was not complete…
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring – integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 107: Thriving and surviving the Big Data revolution

109 Software Group

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 108: Thriving and surviving the Big Data revolution

110 Software Group

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 109: Thriving and surviving the Big Data revolution

111 Software Group

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 110: Thriving and surviving the Big Data revolution

112 Software Group

Artificial Intelligence Strikes back

113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online


113 Software Group

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

[Chart: data points with a fitted linear trend line, f(x) = 0.971521231456065x + 0.71906459527154; x axis 0 to 120, y axis -20 to 120]

• Linear regression
• Non-linear (curve fit)
• Multivariate
• Time series
• Logistic regression
• CART (classification and regression trees)
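The trend line on the Predictive Analytics slide is an ordinary least-squares fit of the form f(x) = ax + b. The following is a minimal pure-Python sketch of how such a line is computed. `fit_line` and the sample points are hypothetical and do not come from the deck.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    # A least-squares line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    xs = [0, 20, 40, 60, 80, 100]
    ys = [1.0, 20.5, 39.0, 59.5, 78.0, 98.0]  # made-up sample data
    a, b = fit_line(xs, ys)
    print(f"f(x) = {a:.4f}x + {b:.4f}")
    print("prediction at x = 120:", a * 120 + b)
```

This closed form covers only the single-variable case. Multivariate, non-linear and logistic models from the list need more general fitting methods.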

119 Software Group

Classification

• Create a model that identifies/classifies new data

• Spam detection, churn risk, customer value
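Spam detection, the first classification example listed, is often introduced with a naive Bayes classifier. The deck does not say which algorithm it has in mind, so the sketch below is one such classifier under stated assumptions: whitespace tokenisation and add-one (Laplace) smoothing. The function names and training examples are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

# A trained model: (documents per label, word counts per label, vocabulary).
Model = Tuple[Counter, Dict[str, Counter], Set[str]]


def tokenize(text: str) -> List[str]:
    """Assumed tokenisation: lower-case and split on whitespace."""
    return text.lower().split()


def train(examples: List[Tuple[str, str]]) -> Model:
    """Learn label priors and per-label word counts from (text, label) pairs."""
    label_counts: Counter = Counter()
    word_counts: Dict[str, Counter] = {}
    vocab: Set[str] = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def classify(model: Model, text: str) -> str:
    """Return the label with the highest log-posterior score for the text."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = "", -math.inf
    for label, doc_count in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(doc_count / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train([
        ("win cash prize now", "spam"),
        ("cheap prize claim now", "spam"),
        ("meeting agenda for monday", "ham"),
        ("lunch with the team monday", "ham"),
    ])
    print(classify(model, "claim your cash prize"))
    print(classify(model, "team meeting monday"))
```

Churn risk or customer value classification follows the same train/classify shape. The labels and features would differ.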

120 Software Group

Clustering

• Group data without a pre-existing classification scheme

• For instance, basket analysis
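The slide does not name a clustering algorithm, so k-means serves here as a common illustration. This minimal sketch assumes numeric points. To keep results repeatable, it seeds the centroids deterministically with the first k points. `kmeans` and its parameters are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means. Returns (centroids, cluster index for each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic seeding: the first k points become the initial centroids.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the centroids are stable too
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, assignment


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (10, 10), (10.5, 11), (1, 0.5), (11, 10)]
    centres, labels = kmeans(pts, k=2)
    print(centres, labels)
```

Practical implementations usually add smarter seeding (such as k-means++) and several restarts, because plain k-means can settle in a poor local optimum.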

121 Software Group

Supervised Machine Learning

[Diagram: Raw Data → Clean → Training Set and Validation Set → Candidate Model → Validate → Production Model; New Data → Production Model → Prediction; panels labelled Existing Business and New Business]
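The supervised learning flow can be sketched end to end:

1. Clean the raw data.
2. Split it into training and validation sets.
3. Build a candidate model.
4. Validate the candidate.
5. Promote it to production and apply it to new data.

In this sketch, a one-feature decision stump stands in for the candidate model, and a minimum validation accuracy decides promotion. Both choices, like every function name below, are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple

Example = Tuple[float, int]  # (feature value, 0/1 label)


def accuracy(threshold: float, examples: List[Example]) -> float:
    """Fraction of examples where the rule 'x > threshold means 1' is right."""
    correct = sum(1 for x, y in examples if int(x > threshold) == y)
    return correct / len(examples)


def split(data: List[Example], validation_fraction: float = 0.3,
          seed: int = 42) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the data and split it into training and validation sets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def train_stump(training: List[Example]) -> float:
    """Candidate model: the threshold that best separates the training labels."""
    return max(sorted({x for x, _ in training}), key=lambda t: accuracy(t, training))


def build_production_model(raw: List[Optional[Example]], min_accuracy: float = 0.9) -> float:
    cleaned = [e for e in raw if e is not None]   # Clean: drop unusable records
    training, validation = split(cleaned)          # Training set / validation set
    candidate = train_stump(training)              # Candidate model
    score = accuracy(candidate, validation)        # Validate on held-out data
    if score < min_accuracy:
        raise RuntimeError(f"candidate rejected: validation accuracy {score:.2f}")
    return candidate                               # Promote to production


def predict(model: float, x: float) -> int:
    """Score new data with the production model."""
    return int(x > model)


if __name__ == "__main__":
    history = [(float(x), int(x > 50)) for x in range(100)] + [None]
    model = build_production_model(history)
    print("threshold:", model, "prediction for 75:", predict(model, 75.0))
```

The validation set is never used for training. Its accuracy is therefore an honest estimate of how the model will behave on new data.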

122 Software Group

inmaps.linkedin.com

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

• Search optimization
• Recommendation systems
• Security: vulnerability, penetration detection
• Fraud detection
• CRM: churn, defaults
• Medical: risk analysis, diagnosis, prognosis
• Game optimization
• Advertising: targeting, tailoring
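Recommendation systems are a classic collective-intelligence application: items are suggested because users with similar histories chose them. The sketch below is a deliberately simple user-overlap recommender. It assumes purchase histories held as in-memory sets. It does not describe how any product mentioned in the deck works, and all names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def recommend(purchases: Dict[str, Set[str]], user: str, top_n: int = 3) -> List[str]:
    """Suggest items the user lacks, weighted by how much other users' histories overlap."""
    owned = purchases[user]
    scores: Counter = Counter()
    for other, items in purchases.items():
        if other == user:
            continue
        overlap = len(owned & items)  # crude similarity: number of shared items
        if overlap == 0:
            continue  # no shared history, so no evidence either way
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so output is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


if __name__ == "__main__":
    history = {
        "ann": {"hadoop", "r"},
        "bob": {"hadoop", "r", "pig"},
        "cat": {"hadoop", "hive"},
        "dan": {"knitting"},
    }
    print(recommend(history, "ann"))
```

Production recommenders typically replace the raw overlap count with a normalised similarity (cosine or Jaccard) and precompute it at scale.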

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka, Mahout are HARD

• Small and medium-sized businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

[Diagram: Redo logs → Change Data Capture → JMS Queue and Hadoop Poster → batched HDFS file copy of audit change data, or HBase real-time replication]
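The SharePlex diagram shows captured changes flowing to two kinds of target: batched HDFS files of audit change data, and real-time HBase replication. A few lines can illustrate the difference between the two. An append-only audit log keeps every change, while a replica keeps only the latest row per key. The `Change` record format below is a hypothetical stand-in, not SharePlex's actual change format or API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, str]


@dataclass
class Change:
    """Hypothetical captured change; not SharePlex's real record format."""
    seq: int                    # capture order, e.g. position in the log stream
    op: str                     # "insert", "update" or "delete"
    key: str                    # primary key of the changed row
    row: Optional[Row] = None   # new column values; None for deletes


def apply_changes(changes: List[Change], replica: Dict[str, Row], audit: List[str]) -> None:
    """Feed each change to an append-only audit log and a latest-state replica."""
    for ch in sorted(changes, key=lambda c: c.seq):
        if ch.op not in ("insert", "update", "delete"):
            raise ValueError(f"unknown operation {ch.op!r}")
        # Audit path (cf. batched HDFS files): every change is kept, in order.
        audit.append(f"{ch.seq},{ch.op},{ch.key},{ch.row}")
        # Replica path (cf. real-time HBase replication): only the current row survives.
        if ch.op == "insert":
            replica[ch.key] = dict(ch.row or {})
        elif ch.op == "update":
            replica.setdefault(ch.key, {}).update(ch.row or {})
        else:
            replica.pop(ch.key, None)


if __name__ == "__main__":
    replica: Dict[str, Row] = {}
    audit: List[str] = []
    apply_changes([
        Change(1, "insert", "c1", {"name": "Ann", "status": "silver"}),
        Change(2, "update", "c1", {"status": "gold"}),
    ], replica, audit)
    print(replica)
    print(audit)
```

Sorting by `seq` matters. Applying an update before its insert would leave a replica row missing the inserted columns.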

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell's offering was not complete…

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: Toad & SharePlex, Toad BI, Boomi, Kitenga

To address the demands that face mid-market customers, Dell must offer end-to-end solutions enabled with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Key components to build end-to-end BI/analytics solutions: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products: Toad & SharePlex, Toad BI, Boomi, Kitenga, STATISTICA

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems
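Live scoring means evaluating an already-trained model inside the operational system at transaction time, rather than in an offline batch. The following is a minimal sketch. It assumes a logistic-regression churn model whose coefficients were exported from an offline training run. The feature names, weights, bias and threshold are all made up for illustration.

```python
import math
from typing import Dict

# Hypothetical coefficients, standing in for a model trained offline.
WEIGHTS: Dict[str, float] = {"months_inactive": 0.8, "support_calls": 0.5, "tenure_years": -0.3}
BIAS = -2.0


def churn_probability(record: Dict[str, float]) -> float:
    """Logistic-regression score: sigmoid(bias + sum of weight * feature)."""
    z = BIAS + sum(w * record.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_transaction(record: Dict[str, float], threshold: float = 0.5) -> str:
    """Score inline, inside the operational path, and act on the result at once."""
    p = churn_probability(record)
    return "offer_retention_discount" if p >= threshold else "no_action"


if __name__ == "__main__":
    customer = {"months_inactive": 5, "support_calls": 2, "tenure_years": 1}
    print(round(churn_probability(customer), 3), handle_transaction(customer))
```

Scoring is cheap (a dot product and a sigmoid), which is why it can run inside the transaction. Training can stay offline.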

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data analytics

• Where is the data?
  – Start collecting now


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 112: Thriving and surviving the Big Data revolution

114 Software Group

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 113: Thriving and surviving the Big Data revolution

115 Software Group

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 114: Thriving and surviving the Big Data revolution

116 Software Group

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 115: Thriving and surviving the Big Data revolution

117 Software Group

Watson is big data AI

118 Software Group

Predictive Analytics

0 20 40 60 80 100 120

-20

0

20

40

60

80

100

120

f(x) = 0971521231456065 x + 071906459527154

bull Linear regressionbull Non-linear (curve fit)bull Multivariatebull Time seriesbull Logistical Regressionbull CART

119 Software Group

Classificationbull Create a model that

identifiesclassifies new data

bull Spam detection churn risk customer value

120 Software Group

Clusteringbull Group data without a

pre-existing classification scheme

bull For instance basket analysis

121 Software Group

SupervisedMachine Learning

Raw Data Clean

Validate

Model

Candidate

ModelTraining Set

Validation Set

Production

ModelNew Data

New Business

Existing Business

Prediction

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Dell’s offering was not complete…

Key components to build end-to-end BI/analytics solutions

Layers: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products shown: TOAD & SharePlex, TOAD BI, Boomi, Kitenga, Server and Storage

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Layers: Data Integration, Database Management, Advanced Analytics, Business Intelligence, Server and Storage

Dell products shown: STATISTICA, TOAD & SharePlex, TOAD BI, Boomi, Kitenga, Server and Storage

Key components to build end-to-end BI/analytics solutions

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition.

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring: integration into operational systems
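Live scoring means applying an already-trained model to records as operational systems produce them. The sketch below assumes a logistic regression model whose intercept and weights (hypothetical values) were exported after offline training, and scores a stream of customer records against a threshold; the field names are also hypothetical.

```python
import math

# Hypothetical coefficients exported from an offline-trained logistic regression model.
MODEL = {"intercept": -3.0, "weights": {"support_calls": 0.5, "late_payments": 0.8}}


def churn_score(record, model=MODEL):
    """Return a churn probability in (0, 1) for one operational record."""
    z = model["intercept"]
    for feature, weight in model["weights"].items():
        z += weight * record.get(feature, 0)  # missing features contribute nothing
    return 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) link


def score_stream(records, threshold=0.5):
    """Score records as they arrive and flag those at or above the threshold."""
    for record in records:
        p = churn_score(record)
        yield record["customer_id"], round(p, 3), p >= threshold


if __name__ == "__main__":
    incoming = [
        {"customer_id": "a", "support_calls": 6, "late_payments": 1},
        {"customer_id": "b", "support_calls": 0},
    ]
    for customer, probability, flagged in score_stream(incoming):
        print(customer, probability, flagged)
```

Because scoring is just arithmetic on stored coefficients, it can run inside the operational application without the statistics package that trained the model.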


137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business
• How could data and algorithms transform your business?
• What technologies will be most important?
  - Mobility
  - Cloud
  - Hadoop
  - Big Data analytics
• Where is the data?
  - Start collecting now

139 Software Group

For your career
• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  - Demand will exceed supply for the foreseeable future
• Lots of opportunities for those with math and statistics skills
  - A good time to dust off that statistics textbook and play with R (maybe Oracle R Enterprise)
• Easy to get started with Hadoop
  - Sqoop
  - Hive
  - Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207: Surviving and thriving in the big data revolution
  • 207: Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest: a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud, social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • It’s not enough to lay out products on tables
  • There’s a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 1.0 Architecture
  • Hadoop 2.0 YARN
  • Tez1
  • HBase
  • HBase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through “Big Data analytics”
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlex® for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dell’s offering was not complete…
  • Dell acquires StatSoft
  • Slide 134
  • Data Visualization
  • Live scoring: integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 120: Thriving and surviving the Big Data revolution

122 Software Group

Inmapslinkedincom

Unsupervised learning

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Security: vulnerability, penetration detection

Fraud Detection

CRM: churn, defaults

Medical: risk analysis, diagnosis, prognosis

Game optimization

Advertising: targeting, tailoring

125 Software Group

Data Science is hard

• Machine learning, collective intelligence, Hadoop, predictive analytics, R, Weka and Mahout are HARD

• Small and medium businesses need help to compete

• Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

http://www.toadworld.com/products/toad-for-hadoop/default.aspx

129 Software Group

SharePlex® for Hadoop

Redo logs

Change Data Capture

JMS Queue

Hadoop Poster

Batched HDFS File Copy

Audit Change Data

HBase Real-Time Replication
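The diagram's labels suggest that one stream of captured changes feeds several Hadoop targets: batched HDFS copies, audit change data and real-time HBase replication. The sketch below shows, under loud assumptions, why a single change stream can serve both an append-only audit trail and a latest-state table. `ChangeRecord`, `ReplicaTarget` and the `seq` ordering field are hypothetical names, not SharePlex's interface. In-memory Python structures stand in for HDFS and HBase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class ChangeRecord:
    """One captured row change (hypothetical format, not SharePlex's)."""
    seq: int                     # hypothetical commit-order number from the source log
    op: str                      # "INSERT", "UPDATE" or "DELETE"
    key: str                     # primary key of the changed row
    columns: Dict[str, str] = field(default_factory=dict)

class ReplicaTarget:
    """Applies each change twice: to an audit trail and to a latest-state table."""

    def __init__(self) -> None:
        self.audit_log: List[ChangeRecord] = []      # stands in for audit data in HDFS
        self.table: Dict[str, Dict[str, str]] = {}   # stands in for an HBase table
        self.last_seq: Optional[int] = None

    def apply(self, rec: ChangeRecord) -> None:
        if rec.op not in ("INSERT", "UPDATE", "DELETE"):
            raise ValueError(f"unknown operation: {rec.op}")
        if self.last_seq is not None and rec.seq <= self.last_seq:
            return  # already applied (e.g. a redelivered queue message): stay idempotent
        self.audit_log.append(rec)  # the audit trail keeps every change
        if rec.op == "DELETE":
            self.table.pop(rec.key, None)
        else:
            # Merge changed columns into the current row, column-store style.
            self.table.setdefault(rec.key, {}).update(rec.columns)
        self.last_seq = rec.seq
```

Skipping records whose sequence number has already been applied is one simple way to tolerate a queue that can deliver the same message twice. A production replication tool would also need to handle transactions, schema changes and restarts, which this sketch ignores.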

130 Software Group

Toad BI Suite

131 Software Group

132 Software Group Confidential

Key components to build end-to-end BI/Analytics solutions

Dell's offering was not complete…

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD & SharePlex

TOAD BI

Boomi

Kitenga

To address the demands facing mid-market customers, Dell must offer end-to-end solutions with advanced analytic capabilities.

133 Software Group Confidential

Dell acquires StatSoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD & SharePlex

TOAD BI

Boomi

Kitenga

Key components to build end-to-end BI/Analytics solutions

Dell + StatSoft completes a strong, end-to-end, analytics-driven information management value proposition

134 Software Group Confidential

135 Software Group Confidential

Data Visualization

136 Software Group Confidential

Live scoring – integration into operational systems
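"Live scoring" means applying a trained model inside the operational system itself, at the moment a record is processed. Here is a minimal sketch of that idea. It assumes a logistic-regression churn model whose coefficients were exported from an offline training run. The feature names, coefficients, threshold and action names are all hypothetical. Once the model exists, scoring one record takes only a few arithmetic operations, cheap enough to run inline in a transaction path.

```python
import math
from typing import Dict

# Hypothetical churn model: in practice these coefficients would come from an
# offline training run; the numbers here are invented for illustration.
INTERCEPT = -3.0
WEIGHTS: Dict[str, float] = {
    "support_calls": 0.8,
    "months_as_customer": -0.05,
    "late_payments": 1.2,
}

def churn_score(record: Dict[str, float]) -> float:
    """Return a churn probability for one customer record (logistic regression)."""
    z = INTERCEPT + sum(w * record.get(name, 0.0) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))

def handle_request(record: Dict[str, float], threshold: float = 0.5) -> Dict[str, object]:
    """Score inline and return an action the operational system can act on."""
    score = churn_score(record)
    action = "offer_retention_discount" if score >= threshold else "no_action"
    return {"score": round(score, 3), "action": action}
```

A customer with 4 support calls, 6 months of tenure and 2 late payments scores about 0.909 and triggers the (hypothetical) retention offer. A 48-month customer with no calls or late payments scores close to zero.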

137 Software Group Confidential

Industry and cross-industry packaged solutions

138 Software Group

For your business

• How could data and algorithms transform your business?

• Which technologies will be most important?
  – Mobility
  – Cloud
  – Hadoop
  – Big Data Analytics

• Where is the data?
  – Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math & statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
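SQOOP, Hive and Pig all ultimately drive the map, shuffle and reduce phases that Hadoop jobs are built on. The following single-process imitation of the classic word count shows those phases in plain Python. It is a teaching sketch, not Hadoop's API: on a real cluster the framework splits the input, runs mappers in parallel, performs the shuffle across machines and runs reducers in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values seen for one key."""
    return key, sum(values)

def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases; keys are reduced in sorted order, as Hadoop sorts them."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))
```

`word_count(["big data big", "data revolution"])` returns `{'big': 2, 'data': 2, 'revolution': 1}`. Each function handles one record or one key at a time without shared state, so the same logic can be spread across many machines.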


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


Page 121: Thriving and surviving the Big Data revolution

123 Software Group

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 122: Thriving and surviving the Big Data revolution

124 Software Group

Big Data Analytics

Data Science

Search Optimization

Recommendation Systems

Securitybull Vulnerabili

tybull Penetratio

n Detection

Fraud Detection

CRMbull Churn bull Defaults

Medicalbull Risk

analysisbull Diagnosisbull Prognosis

Game optimization

Advertisingbull Targetingbull Tailoring

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 123: Thriving and surviving the Big Data revolution

125 Software Group

Data Science is hard

bull Machine learning collective intelligence Hadoop predictive analytics R Weka Mahout are HARD

bull Small-medium businesses need help to compete

bull Data scientists to the rescue

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 124: Thriving and surviving the Big Data revolution

126 Software Group

Data Scientists to the rescue

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 125: Thriving and surviving the Big Data revolution

127 Software Group

Kitenga Analytics Suite

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career

• Hadoop and NoSQL create strong career opportunities for DBAs and developers
  – Demand will exceed supply for the foreseeable future

• Lots of opportunities for those with math and statistics skills
  – Good time to brush off that statistics textbook and play with R (maybe Oracle R Enterprise)

• Easy to get started with Hadoop
  – SQOOP
  – Hive
  – Pig
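One reason getting started is easy is that SQOOP can pull only new rows from a relational table on each run (its incremental append import, driven by `--incremental append`, `--check-column` and `--last-value`). The pure-Python sketch below does not call SQOOP; it mimics that idea with `sqlite3` as the source database and a CSV buffer standing in for a file in HDFS. The function name `incremental_import` and the `orders` table are hypothetical.

```python
import csv
import io
import sqlite3
from typing import TextIO, Tuple


def incremental_import(conn: sqlite3.Connection, table: str, check_column: str,
                       last_value: int, out: TextIO) -> Tuple[int, int]:
    """Append rows whose check column exceeds last_value to `out` as CSV.

    Returns (rows_copied, new_last_value) so the next run can resume there.
    """
    # Identifiers cannot be bound as SQL parameters; this sketch assumes they
    # come from trusted configuration, not user input.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    idx = [d[0] for d in cur.description].index(check_column)
    writer = csv.writer(out)
    copied, new_last = 0, last_value
    for row in cur:
        writer.writerow(row)
        copied += 1
        new_last = row[idx]            # rows are ordered, so the last one wins
    return copied, new_last


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "new"), (2, "shipped"), (3, "new")])

buf = io.StringIO()
first = incremental_import(conn, "orders", "id", 0, buf)           # (3, 3)
conn.execute("INSERT INTO orders VALUES (4, 'new')")
second = incremental_import(conn, "orders", "id", first[1], buf)   # (1, 4)
```

The second run copies only the row added since the first, because it resumes from the last value the first run recorded.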


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.


  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 126: Thriving and surviving the Big Data revolution

128 Software Group

Toad for Hadoop

httpwwwtoadworldcomproductstoad-for-hadoopdefaultaspx

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 127: Thriving and surviving the Big Data revolution

129 Software Group

SharePlexreg for Hadoop

Redo-logs

Change Data Capture

JMS Queue Hadoop Poster

BatchedHDFS File Copy Audit Change

Data

HBase RealTime replication

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 128: Thriving and surviving the Big Data revolution

130 Software Group

Toad BI Suite

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 129: Thriving and surviving the Big Data revolution

131 Software Group

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 130: Thriving and surviving the Big Data revolution

132 Software GroupConfidential

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dellrsquos offering was not completehellip

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

In order to address the demands that face mid-market customers Dell must offer end-to-end solutions enabled with advanced analytic capabilities

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig

C

14

LV

C1

4LV

Please complete the session evaluation on the mobile appWe appreciate your feedback and insight

This box will have simplified instructions about how to complete the session evaluation online

  • 207Surviving and thriving in the big data revolution
  • 207Surviving and thriving in the big data revolution (2)
  • Introductions
  • Slide 4
  • Slide 5
  • Slide 6
  • Slide 7
  • Dell and Quest ndash a brief history
  • But Seriously
  • What is Big Data
  • Slide 11
  • Instead - the industrial Revolution of data
  • Slide 13
  • Slide 14
  • Slide 15
  • Slide 16
  • Slide 17
  • Slide 18
  • Slide 19
  • Slide 20
  • Data means more
  • Big Data is the culmination of cloud social and mobile
  • Not all upside
  • Will Big Data kill retail
  • Prevalence of Showrooming
  • Slide 26
  • Slide 27
  • Slide 28
  • Slide 29
  • Some novel defences
  • Web analytics for retail
  • Connected Store
  • Slide 33
  • Why showrooming
  • Itrsquos not enough to lay out products on tables
  • Therersquos a similar story in every industry
  • The Revolution is not over yet
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Data Input
  • Slide 46
  • Siri
  • Slide 48
  • Slide 49
  • Brain Control
  • Slide 51
  • Slide 52
  • Muze
  • Slide 54
  • Slide 55
  • The instrumented human
  • The instrumented world
  • All of which accelerates what we call Big Data
  • Big Database technologies
  • Pioneers of Big Data
  • Slide 61
  • Slide 62
  • Slide 63
  • Slide 64
  • Slide 65
  • Google Software Architecture
  • Map Reduce
  • Multi-stage Map-Reduce
  • Schema on Read vs Schema on Write
  • Hadoop Open Source Map-Reduce Stack
  • Hadoop at Yahoo
  • Slide 72
  • Slide 73
  • Hadoop ecosystem
  • Hadoop 10 Architecture
  • Hadoop 20 YARN
  • Tez1
  • HBase
  • Hbase Data Model
  • Hive
  • Slide 81
  • Slide 82
  • Other SQL-like Hadoop Interfaces
  • Pig
  • Flume and SQOOP
  • Berkeley Data Analytic Stack (BDAS)
  • Meanwhile back at the Death Star
  • Slide 88
  • Oracle Exadata (X-2)
  • Economies
  • Oracle Big Data Appliance
  • Big Data Appliance Software
  • Generating competitive advantage through ldquoBig Data analyticsrdquo
  • Collective Intelligence
  • Slide 97
  • Slide 98
  • Slide 99
  • Slide 100
  • Slide 101
  • Slide 102
  • Slide 103
  • Slide 104
  • Google Flu Trends
  • Slide 106
  • Collective Intelligence outsmarts Artificial Intelligence
  • Slide 108
  • Slide 109
  • Slide 110
  • Slide 111
  • Artificial Intelligence Strikes back
  • Slide 113
  • Slide 114
  • Slide 115
  • Slide 116
  • Watson is big data AI
  • Predictive Analytics
  • Classification
  • Clustering
  • Supervised Machine Learning
  • Unsupervised learning
  • Slide 123
  • Big Data Analytics
  • Data Science is hard
  • Data Scientists to the rescue
  • Kitenga Analytics Suite
  • Toad for Hadoop
  • SharePlexreg for Hadoop
  • Toad BI Suite
  • Slide 131
  • Dellrsquos offering was not completehellip
  • Dell acquires Statsoft
  • Slide 134
  • Data Visualization
  • Live scoring ndash integration into operational systems
  • Industry and cross-industry packaged solutions
  • For your business
  • For your career
  • Please complete the session evaluation on the mobile app We app
Page 131: Thriving and surviving the Big Data revolution

133 Software GroupConfidential

Dell acquires Statsoft

Data Integration

Database Management

Advanced Analytics

Business Intelligence

Server and Storage

STATISTICA

Server and Storage

TOAD amp Shareplex

TOAD BI

Boomi

Kitenga

Key co

mponents

to b

uild

end-

to-e

nd B

IA

naly

tics

solu

tions

Dell + StatSoft = completes a strong end-to-end analytics driven information management value proposition

134 Software GroupConfidentialConfidential13

4

135 Software GroupConfidentialConfidential

Data Visualization

135

136 Software GroupConfidentialConfidential

Live scoring ndash integration into operational systems

136

137 Software GroupConfidentialConfidential

Industry and cross-industry packaged solutions

137

138 Software Group

For your business

bull How could data and algorithms transform your business

bull What are the technologies that will be most importantndash Mobilityndash Cloudndash Hadoopndash Big Data Analytics

bull Where is the datandash Start collecting now

139 Software Group

For your career bull Hadoop and NoSQL creates

strong career opportunities for DBAs and developersndash Demand will exceed supply for

the foreseeable future

bull Lotrsquos of opportunities for those with Math amp Statisticsndash Good time to brush off that

statistics textbook and play with R (maybe Oracle Enterprise R)

bull Easy to get started with Hadoopndash SQOOPndash Hive ndash Pig


Please complete the session evaluation on the mobile app. We appreciate your feedback and insight.

