Copyright © 2014 Oracle and/or its affiliates. All rights reserved. | Oracle Big Data and Database Analytics in the Enterprise. J. Andrés Araújo, Principal Sales Consultant, Technology Sales Consulting. Seville, 10 December 2014

Oracle Big Data y Database Analytics - Andres Araujo


DESCRIPTION

Oracle provides a complete, open solution that is simple to deploy and combines hardware and software to bring Big Data environments and architectures into enterprise IT environments that demand high levels of reliability, security and productivity. With Oracle Big Data SQL, you can maintain multiple data repositories (Hadoop, NoSQL and relational) and access them in a unified way through SQL, with maximum performance and minimal data movement.


Page 1: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data and Database Analytics in the Enterprise

J. Andrés Araújo

Principal Sales Consultant

Technology Sales Consulting

Seville, 10 December 2014

Page 2: Oracle Big Data y Database Analytics - Andres Araujo

Big Data is the result of a Data Explosion

If you take a snapshot of a minute on the global internet, all of these activities are happening ...

Page 3: Oracle Big Data y Database Analytics - Andres Araujo

What Makes it Big Data?

VOLUME VELOCITY VARIETY VALUE

[Icons: blogs, smart metering, social media]

Page 4: Oracle Big Data y Database Analytics - Andres Araujo

Why Is Big Data Important? Value Creation

“In a big data world, a competitor that fails to sufficiently develop its capabilities will be left behind.” (McKinsey Global Institute)

HEALTH CARE: Reduce Prescription Fraud

MANUFACTURING: Accelerate Test Cycles to Reduce Backlog

COMMUNICATIONS: Offering New Services based on Location Data

RETAIL: Better Predict Product Success

PUBLIC SECTOR: Improve Student Outcomes

Page 5: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Technology Overview

Page 6: Oracle Big Data y Database Analytics - Andres Araujo


Extending Data Management… Big Data = Hadoop + NoSQL + Relational

Relational

• Run the Business

– Integrate existing systems

– Support mission-critical tasks

– Protect existing expenditures

– Ensure skills relevance

Hadoop

• Change the Business

– Disrupt competitors

– Disintermediate supply chains

– Leverage new paradigms

– Exploit new analyses

NoSQL

• Scale the Business

– Meet mobile challenges

– Accelerate developer agility

– Scale-out economically

– Serve data faster

Page 7: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Technology Today: Hadoop & MapReduce

Software framework

Distributed processing in large sets of computers with redundant storage

Highly scalable data processing

Cost-effective model for high volume, low density data

Open source

Batch operation

[Diagram labels: Management/Monitoring, Hadoop Distributed File System (HDFS), MapReduce]

Page 8: Oracle Big Data y Database Analytics - Andres Araujo

Scanning All The Data Using Map/Reduce

[Diagram: INPUT 1 and INPUT 2 pass through repeated MAP, SHUFFLE/SORT and REDUCE stages to produce OUTPUT 1 and OUTPUT 2]
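The MAP, SHUFFLE/SORT and REDUCE stages named on this slide can be illustrated with a minimal in-memory sketch. It is not Hadoop code: the function names (`map_phase`, `shuffle_sort`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for illustration, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict


def map_phase(line):
    """MAP: emit one (word, 1) pair per word in an input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_sort(pairs):
    """SHUFFLE/SORT: group values by key and order the keys."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(key, values):
    """REDUCE: collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(lines):
    """Chain the three stages over every input line (hypothetical driver)."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle_sort(mapped))


counts = run_job(["big data big value", "data volume"])
print(counts)
```

In a real cluster each stage would run in parallel on many nodes, and the output of one job can feed the map stage of the next, which is what the chained stages in the diagram suggest.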

Page 9: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Technology Today: NoSQL databases

• Not-only-SQL (2009)

• Broad class of non-relational DBMSs that typically

– Provide horizontal/distributed scalability

– Avoid joins

– Have relaxed consistency guarantees

– Don’t require a structured schema

– Are application/developer-centric

• No standards

– Rapidly evolving set of solutions (150+ on nosql-database.org)

– Highly variable feature set

– UnQL launched in July

• Majority are open source

Page 10: Oracle Big Data y Database Analytics - Andres Araujo

Oracle NoSQL Database

Key-value pair database

Dynamic data model

Highly scalable, available

Transparent load balancing

Commercial software and support

Easy management

Built using Berkeley DB

[Diagram: applications use the NoSQL Driver to send Read, Update and Delete operations to groups of nodes (Nodes East, Nodes West, Nodes Central)]
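As a rough illustration of how a key-value store can spread reads, updates and deletes across groups of nodes, here is a toy sketch. It is not the Oracle NoSQL Database API or its actual routing algorithm: the `TinyKVStore` class, its methods and the hash-modulo placement are all assumptions made for this example.

```python
import hashlib


class TinyKVStore:
    """Toy key-value store that places each key on one named node group."""

    def __init__(self, node_names):
        self.names = list(node_names)
        self.nodes = {name: {} for name in self.names}

    def node_for(self, key):
        """Pick a node by hashing the key (simple modulo placement)."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.names[int(digest, 16) % len(self.names)]

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)

    def delete(self, key):
        """Return True if the key existed and was removed."""
        return self.nodes[self.node_for(key)].pop(key, None) is not None


store = TinyKVStore(["east", "west", "central"])
store.put("user/12345", {"name": "Ana"})
store.put("user/12345", {"name": "Ana", "plan": "gold"})  # update in place
print(store.node_for("user/12345"), store.get("user/12345"))
```

The caller never names a node: the driver-like `node_for` step decides placement, which is the idea behind the "transparent load balancing" bullet.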

Page 11: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Technology Today: R Statistical Programming Language

Open source language and environment

Used for statistical computing and graphics

Strength in easily producing publication-quality graphs

Highly extensible

Created by Robert Gentleman and Ross Ihaka

Page 12: Oracle Big Data y Database Analytics - Andres Araujo

Big Data: The Oracle Proposal

Page 13: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Management System

SOURCES

DATA RESERVOIR DATA WAREHOUSE

Oracle Database

Oracle Industry Models

Oracle Advanced Analytics

Oracle Spatial & Graph

Big Data Appliance

Apache Flume

Oracle GoldenGate

Oracle Event Processing

Cloudera Hadoop

Oracle NoSQL

Oracle R Advanced Analytics for Hadoop

Oracle R Distribution

Oracle Database

In-Memory, Multi-tenant

Oracle Industry Models

Oracle Advanced Analytics

Oracle Spatial & Graph

Exadata

Oracle GoldenGate

Oracle Event Processing

Oracle Data Integrator

Oracle Big Data Connectors

Oracle Data Integrator

ORACLE BIG DATA SQL


Page 14: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Hardware

Physical Installation (10 racks): 286 hours

Electricians: 236 hours, 616 cables

Network Engineers: 264 hours, 864 cables

Storage Engineers: 320 hours, 576 cables

System Administrators: 232 hours

Totals: 1338 people hours, 677 elapsed hours, 2344 cables

Page 15: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Appliance Hardware: X4-2 Full Rack

• 18 Nodes fully cabled

• 288 Intel® Xeon® E5-2650 V2 processor cores

• 1152 GB total memory*

• 864 TB total raw storage capacity

• 40Gb/sec InfiniBand Network

• 10Gb/sec Data Center Connectivity

* Expandable to 9216 GB
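The rack totals can be broken down per node with simple division. The sketch below uses the X4-2 Full Rack figures above and the X4-2 Starter Rack figures given later in the deck; the per-node values are derived here rather than stated on the slides, and reading "288" and "96" as processor core counts is an assumption.

```python
# Totals as listed on the slides (cores: assumed interpretation of the Xeon count).
FULL_RACK = {
    "nodes": 18,
    "cores": 288,
    "memory_gb": 1152,
    "max_memory_gb": 9216,
    "raw_storage_tb": 864,
}
STARTER_RACK = {
    "nodes": 6,
    "cores": 96,
    "memory_gb": 384,
    "raw_storage_tb": 288,
}


def per_node(rack):
    """Divide every rack total by the node count."""
    nodes = rack["nodes"]
    return {name: total / nodes for name, total in rack.items() if name != "nodes"}


full = per_node(FULL_RACK)
starter = per_node(STARTER_RACK)
print(full)
print(starter)
```

Both configurations come out at the same per-node figures for cores, memory and raw storage, which is consistent with the starter rack being a one-third slice of the full rack that can be expanded in place.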

Page 16: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Appliance Installation

Custom vs. Oracle

Physical Installation (10 racks): 286 hours vs. 16 hours

Electricians: 236 hours, 616 cables vs. 16 hours, 32 cables

Network Engineers: 264 hours, 864 cables vs. 6 hours, 14 cables

Storage Engineers: 320 hours, 576 cables vs. n/a

System Administrators: 232 hours vs. n/a

Totals (Oracle vs. Custom): 38 vs. 1306 hours, 19 vs. 677 elapsed hours, 46 vs. 2344 cables

Page 17: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Appliance Hardware: X4-2 Starter Rack / In Rack Expansion

• 6 Nodes fully cabled

• 96 Intel® Xeon® E5 Processors (SandyBridge)

• 384 GB total memory

• 288 TB total raw storage capacity

• 40Gb/sec InfiniBand Network

• 10Gb/sec Data Center Connectivity

• All required switches for growth and Exadata Connectivity

Page 18: Oracle Big Data y Database Analytics - Andres Araujo

Big Data Software: Cloudera Distribution Including Apache Hadoop

Enterprise-ready Big Data platform.

• 100% pure Apache Hadoop

• All components for Hadoop deployment

• Cloudera Manager and all Cloudera subscription products included

Tested by Cloudera

Supported by Oracle

Coordination: APACHE ZOOKEEPER

Data Integration: APACHE FLUME, APACHE SQOOP

Fast Read/Write Access: APACHE HBASE

Languages / Compilers: APACHE PIG, APACHE HIVE, APACHE MAHOUT

Workflow: APACHE OOZIE

Scheduling: APACHE OOZIE

Metadata: APACHE HIVE

File System Mount: FUSE-DFS

UI Framework: HUE

SDK: HUE SDK

HDFS, MAPREDUCE

Page 19: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Appliance Software (I): Pre-installed, pre-integrated

• Oracle Linux 6.4 with UEK 2 (v2.6.39)

• Oracle Java – JDK 7

• Cloudera CDH 4.4

– including Impala, HBase, Accumulo and Search

• Cloudera Manager 4.7

– including Backup and Disaster Recovery (BDR) and Navigator

• Big Data Appliance Enterprise Manager Plug-In

• NoSQL DB CE 12cR1

• Oracle R Distribution (Open Source)

Page 20: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Appliance Software (II): Pre-installed, pre-integrated

• Oracle Big Data Connectors 2.3*

– Oracle SQL Connector for Hadoop

– Oracle Loader for Hadoop

– Oracle XQuery for Hadoop

– Oracle R Advanced Analytics for Hadoop

– Oracle Data Integrator Application Adapter for Hadoop

• Oracle Audit Vault and Database Firewall for Hadoop Auditing*

• Oracle Data Integrator*

• Oracle NoSQL Database Enterprise Edition*

* Separately licensed software, can be pre-installed and configured on BDA

Page 21: Oracle Big Data y Database Analytics - Andres Araujo


Required Skills for MapReduce Development

Java

Hadoop Framework

Parallel Algorithms

Page 22: Oracle Big Data y Database Analytics - Andres Araujo

A Map/Reduce Pipeline

[Diagram: INPUT 1 and INPUT 2 pass through repeated MAP, SHUFFLE/SORT and REDUCE stages to produce OUTPUT 1 and OUTPUT 2]

Page 23: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Data Integrator

Reduces Hadoop complexities through graphical tooling

Page 24: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Data Integrator: NoETL Approach

One Logical Design, Many Engine Alternatives

Data engine: SQL / OLTP Database
Examples: Oracle DBMS, any OLTP DBMS, DW appliances
Engine I/O: SSD / disk based
Best use: High volumes of transformations on relational data

Data engine: MapReduce
Examples: Hive / MR2, Pig / Oozie / MR2
Engine I/O: SSD / disk based
Best use: Huge batch-like transformations on any data types

Data engine: In Memory (SQL / Big Data)
Examples: Oracle InMemory, Hive / Tez / YARN, Spark / YARN, Cloudera Impala
Engine I/O: D/RAM, with various built-in spill-to-disk approaches
Best use: Highly interactive data transformation patterns

Data engine: Streaming Big Data
Examples: Storm / YARN, Oracle Event Processor (OEP)
Engine I/O: D/RAM; “always on” data pipeline
Best use: Very low latency transformations

Modern design studio for simple map development

Team-based GUI Tooling for work on Enterprise projects

Integrated lifecycle and metadata management

Automated support for Changed Data Capture

SEPARATE ETL ENGINE NOT REQUIRED!

Page 25: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Data Integration – Powerful Big Data Solutions

Commodity Data Reservoir

• Leverage Oracle Data Integration with a wide array of databases or data warehouse appliances

• Support Hadoop distributions on commodity hardware

Oracle Engineered Systems

• Deeply integrated with Oracle Big Data Appliance and Exadata

• Take advantage of InfiniBand performance, Oracle Big Data SQL, Columnar Compression, and all integrated Loader technologies

Streaming Big Data

• Integrate realtime transactional databases with streaming analytics

• Filter, join and transform data while it is in motion, make business decisions while data is in memory

Page 26: Oracle Big Data y Database Analytics - Andres Araujo


Most Heterogeneous Solution


Hadoop HBase Hadoop Hive/Flume HP Enscribe HP NonStop HP Neoview Hypersonic SQL IBM DB2 i Series IBM DB2 UDB IBM DB2 z Series IBM Informix IBM Netezza JMS / MQ Microsoft Access Microsoft SQLServer MySQL Pivotal Greenplum PostgreSQL Salesforce.com SAP BW / BI SAP ERP / ECC SAS SQL/MP SQL/MX Sybase ASE Sybase IQ Teradata

Adaptive Altova Apache Hcatalog Apache Hive/HQL Borland CA ERwin Cloudera Impala COBOL Copybook DataStax Embarcadero EMC ProActivity GentleWare Google BigQuery Grandite Hadapt Hive Hortonworks Hive IBM Cognos IBM DB2 IBM DataStage IBM Discovery IBM Federation Server IBM Lotus Notes IBM Netezza IBM Rational Rose IBM Rational Architect Informatica Metadata Mgr. Informatica PowerCenter

CoSORT ISO SQL Standard (DDL) MapR Hadoop Hive MicroFocus Microsoft Access Microsoft Office Excel Microsoft Visio Microsoft SQL Server Microsoft SSIS Microsoft Visual Studio Microstrategy Magic Draw OMG CWM Standard OMG UML Standard Oracle BI Answers Oracle BI Enterprise Edition Oracle BI Server Oracle DAC Oracle Data Integrator Oracle Data Modeler Oracle Database Oracle Designer Oracle Hyperion Applications Oracle Hyperion Essbase Oracle Warehouse Builder Pivotal Greenplum PostgreSQL

QlikView SAP BO Crystal Reports SAP BO Designer SAP BO Desktop Intelligence SAP BO Repository SAP BO Data Integrator SAP BO Data Steward SAP Master Data Management SAP Sybase PowerDesigner SAP Sybase ASE Database SAS Data Integration Studio SAS BI Server SAS Information Map SAS Metadata Management SAS OLAP Server Select Sparx Architect Syncsort Tableau Talend Teradata Tigris Visible W3C DTD & XSD Schema

Operational Integration (Movement / Transformation) Metadata Harvesting (Glossary, Lineage & Impact Analysis) Oracle Database Oracle Exadata Oracle Big Data Appliance Oracle TimesTen Oracle OLAP Oracle Business Intelligence Oracle BI Applications Oracle E-Business Suite Oracle JD Edwards Enterprise One Oracle JD Edwards World Oracle Fusion Applications Oracle Governance Risk and Compliance Oracle Fusion AIA Oracle Retail Applications Oracle Agile BI / DW Oracle Agile PLM for Process Oracle iFlex FlexCUBE Oracle iFlex Mantas Oracle Hyperion Applications Oracle PeopleSoft Oracle Siebel CRM / OnDemand Oracle Communications Oracle WebLogic Server Oracle Coherence Data Grid Oracle SOA Suite Oracle Enterprise Service Bus

+ open APIs and standards based meta-model

Page 27: Oracle Big Data y Database Analytics - Andres Araujo


Oracle Big Data Connectors

Page 28: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Connectors

Data Load: Oracle Loader for Hadoop

Data Access: Oracle SQL Connector for HDFS

R Analytics: Oracle R Advanced Analytics on Hadoop

Oracle Data Integrator Knowledge Modules

XML/XQuery: Oracle XQuery on Hadoop

[Diagram labels: XQuery, R Client]

Optimized for Hadoop: maximise parallelism, fast performance

Analyze data on Hadoop using familiar client tools

Page 29: Oracle Big Data y Database Analytics - Andres Araujo

Oracle XQuery for Hadoop

• OXH is a transformation engine for Big Data

• XQuery language executed on the Map/Reduce framework

Acquire – Organize – Analyze

XQuery:

for $ln in text:collection()
let $f := tokenize($ln)
where $f[1] = 'x'
return text:put($f[2])

[Diagram: the OXH Engine turns the XQuery into a Map/Reduce execution plan (a series of M/R jobs) that runs on the Map/Reduce worker nodes over HDFS; related Oracle Big Data Connectors shown: Oracle Data Integrator, Oracle Loader for Hadoop]
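For readers less familiar with XQuery, the logic of the query on this slide (read each line, split it into tokens, keep lines whose first token is 'x', and write out the second token) can be restated as a short Python sketch. This is only an analogue for explanation: the function name `filter_and_project` is hypothetical, whitespace tokenization is assumed, and nothing here runs on Hadoop or uses OXH.

```python
def filter_and_project(lines):
    """Yield the second token of every line whose first token is 'x'."""
    for line in lines:
        fields = line.split()  # assumed: tokens are separated by whitespace
        # XQuery sequences are 1-based, so $f[1] is fields[0] and $f[2] is fields[1].
        if len(fields) >= 2 and fields[0] == "x":
            yield fields[1]


sample = ["x alpha", "y beta", "x gamma delta", ""]
selected = list(filter_and_project(sample))
print(selected)
```

Because each line is handled independently, this kind of filter maps naturally onto parallel map tasks, which is why a declarative query like this can be executed on the Map/Reduce framework.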

Page 30: Oracle Big Data y Database Analytics - Andres Araujo

R Analytics leveraging Hadoop and HDFS: Oracle R Connector for Hadoop

Linearly Scale a Robust Set of R Algorithms

Leverage MapReduce for R Calculations

Compute Intensive Parallelism for Simulations

[Diagram: the Oracle R Client drives MAP and REDUCE tasks on Hadoop over HDFS]

Page 31: Oracle Big Data y Database Analytics - Andres Araujo

Running R on Hadoop: Oracle R Connector for Hadoop

Integrated R environment

• Native R MapReduce

• Native R HDFS Access

Improved productivity

[Diagram labels: Client Host (R Engine, ORE, ORCH); Hadoop Cluster Software on Oracle Big Data Appliance (R Engine, ORCH, MapReduce Nodes, HDFS); Oracle Exadata (R Engine, ORE)]

Page 32: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Loader for Hadoop

• Parallel load, optimized for Hadoop

• Automatic load balancing

• Convert to Oracle format on Hadoop

– Save database CPU

• Load specific Hive partitions

• Kerberos authentication

• Load directly into In-Memory table

Input formats: JSON, log files, Hive, text, Parquet, Avro, sequence files, compressed files, and more …

Page 33: Oracle Big Data y Database Analytics - Andres Araujo

Oracle SQL Connector for HDFS

• Parallel query and load

• Load into database or query in place

• Access text or Hive over text

• Access compressed data

• Access specific Hive partitions

• Kerberos authentication

External table:

create table customer_address
( ca_customer_id number(10,0)
, ca_street_number char(10)
, ca_state char(2)
, ca_zip char(10))
organization external (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS
(…
PREPROCESSOR HDFS_BIN_PATH:'hdfs_stream')
LOCATION ('addr1', 'addr2', 'addr3'))

[Diagram: parallel OSCH streams read Hive, text and compressed files into the external table]

Page 34: Oracle Big Data y Database Analytics - Andres Araujo


Oracle SQL Connector for HDFS

• Includes tool to generate external table

• Performance on Engineered Systems

– 15 TB/hour load time

• Query and load Oracle Data Pump files

– Binary file in Oracle format

– Uses fewer database CPU cycles during query/load

Page 35: Oracle Big Data y Database Analytics - Andres Araujo

Oracle In-Database Unified Analytics Platform

Oracle R Enterprise

Oracle Data Mining

Text and Search

Spatial Analytics

SQL Analytics

Oracle MapReduce

Parallel Processing Engine

Data Layer: XML, Relational, OLAP, Spatial, RDF, Media

Page 36: Oracle Big Data y Database Analytics - Andres Araujo

In-Database Map/Reduce: Oracle Database

[Diagram: Table -> Map -> Reduce -> Table, passing key/value (K, V) pairs]

Input table:

timestamp  userid  pageid
10:00:00   12345   A73_2
10:00:02   8901    A74_3
10:00:03   12345   A73_3
10:01:12   12345   A74_4

Output table:

session  userid  pageid  duration
0        12345   A73_2   3
0        12345   A73_3   70
0        12345   A74_4   12
1        8901    A74_3   89

MapReduce within the Oracle Database:

select session, userid, pageid, duration
from table(oracle_map_reduce.reducer(cursor(
  select * from table(oracle_map_reduce.mapper(cursor(
    select * from clicks))) map_result)));

=> Works on internal and external data sources

=> Leverage PL/SQL skills for big data analytics

=> High efficiency through parallel pipelined infrastructure

=> In-database execution allows for fast query performance
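The mapper/reducer pairing in this query can be pictured with a small Python sketch of click sessionization. This is an illustration only, not the `oracle_map_reduce` package: the `mapper`, `reducer` and `sessionize` functions, the 30-minute session timeout and the rule that duration is the gap to the user's next click are all assumptions, so the output is not meant to reproduce the figures on the slide.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows taken from the slide's input table.
CLICKS = [
    ("10:00:00", 12345, "A73_2"),
    ("10:00:02", 8901, "A74_3"),
    ("10:00:03", 12345, "A73_3"),
    ("10:01:12", 12345, "A74_4"),
]


def mapper(click):
    """MAP: key each click by user so one reducer sees a user's whole history."""
    timestamp, userid, pageid = click
    return userid, (datetime.strptime(timestamp, "%H:%M:%S"), pageid)


def reducer(userid, events, timeout_s=1800):
    """REDUCE: order a user's clicks and compute seconds until the next click."""
    events = sorted(events)
    rows, session = [], 0
    for i, (ts, pageid) in enumerate(events):
        nxt = events[i + 1][0] if i + 1 < len(events) else None
        duration = int((nxt - ts).total_seconds()) if nxt is not None else None
        rows.append((session, userid, pageid, duration))
        if duration is not None and duration > timeout_s:
            session += 1  # a long gap starts a new session (assumed rule)
    return rows


def sessionize(clicks):
    """Group mapper output by key, then run the reducer once per user."""
    grouped = defaultdict(list)
    for click in clicks:
        key, value = mapper(click)
        grouped[key].append(value)
    return [row for uid in sorted(grouped) for row in reducer(uid, grouped[uid])]


rows = sessionize(CLICKS)
for row in rows:
    print(row)
```

In the database version, the same split applies: the mapper is a table function that emits key/value rows, and the reducer is a second table function that consumes them through a cursor.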

Page 37: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Advanced Analytics: Oracle R Enterprise Approach

R code and/or SQL

Models run in-database

Avoid Data Movement

Processes large data sets

Uses the power of Oracle Database 11g, 12c and Exadata

Same code, much faster

Page 38: Oracle Big Data y Database Analytics - Andres Araujo

More Powerful Together: Business Intelligence and Information Discovery

Optimized for Exalytics In-Memory Machine

Oracle Business Intelligence: Proven Answers to Known Questions

Analysis Problems: Measure, Analyze, Report

Structured Data: Modeled and conforming

Oracle Endeca Information Discovery: Fast Answers to New Questions

Discovery Problems: Investigate, Explore, Understand

Unstructured Data: Diverse, textual, uncertain quality

Insights yield new metrics to monitor, data to integrate

New questions require exploration, new information

Leverage existing investments

Page 39: Oracle Big Data y Database Analytics - Andres Araujo

Extend Business Analytics with Unstructured Data

Oracle Endeca Information Discovery: Best platform for Unstructured Analytics

Endeca Server: Hybrid Search/Analytical Database, Flexible Data Model

Unstructured Data: Social Media; Content Systems, Files, Email; Websites; Big Data

Oracle Business Intelligence: Best platform for integrated ROLAP and MOLAP

BI Server + OLAP: Common Enterprise Information Model

Structured Data: OLTP & ODS Systems; Enterprise Applications (Oracle, SAP, Others); Data Warehouse & Data Marts

Page 40: Oracle Big Data y Database Analytics - Andres Araujo


Evolution of Analytical SQL

• Introduction of “window” functions

• Enhanced window functions (percentile, etc)

• Rollup, grouping sets, cube

• Statistical functions

• SQL model clause

• Partition Outer Join

• In-database Data Mining

• SQL Pivot

• Recursive WITH

• ListAgg, Nth value window

• Pattern matching

• Top N clause

• Approx Count distinct

• JSON support

8i 9i 10g 11g 12c

Page 41: Oracle Big Data y Database Analytics - Andres Araujo

Barriers to Big Data Adoption: Complexity

• Skills

– Lack of tools and training to exploit Big Data

– IT Operations' ability to administer and manage Big Data

• Integration

– Adding Big Data to existing architecture is complex

– Too much effort required in data preparation

• Security

– No clear route to governance or enforcement

Page 42: Oracle Big Data y Database Analytics - Andres Araujo

Evolution of Big Data Analytics in the Enterprise

Data Warehouses

Business Analytics

Transactional Applications

Operational Reporting

Social Media

Internet of Things

Big Data Platform

Page 43: Oracle Big Data y Database Analytics - Andres Araujo


Big Data Analytics Challenge Separate silos of information to analyze

Page 44: Oracle Big Data y Database Analytics - Andres Araujo


Data Analytics Challenge Separate data access interfaces

Page 45: Oracle Big Data y Database Analytics - Andres Araujo


Data Analytics Challenge No comprehensive SQL interface across Oracle, Hadoop and NoSQL

Page 46: Oracle Big Data y Database Analytics - Andres Araujo


What customers want Rich, comprehensive SQL access to all enterprise data

NoSQL

Page 47: Oracle Big Data y Database Analytics - Andres Araujo


What gives Exadata extreme performance?

Oracle Database 12c

SQL

Offload Query to Exadata Storage Servers

Small data subset quickly returned

Hadoop & NoSQL

Page 48: Oracle Big Data y Database Analytics - Andres Araujo


Introducing Oracle Big Data SQL Massively Parallel SQL Query across Oracle, Hadoop and NoSQL

Oracle Database 12c

Offload Query to Exadata Storage Servers

Small data subset quickly returned

Offload Query to Data Nodes

SQL

data subset

SQL

Hadoop & NoSQL

Page 49: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data SQL: A New Hadoop Processing Engine

Processing Layer: MapReduce and Hive, Spark, Impala, Search, Big Data SQL

Resource Management (YARN, cgroups)

Storage Layer: Filesystem (HDFS), NoSQL Databases (Oracle NoSQL DB, HBase)

Page 50: Oracle Big Data y Database Analytics - Andres Araujo

Big Data SQL

SELECT w.sess_id, c.name
FROM web_logs w, customers c
WHERE w.source_country = 'Brazil'
AND w.cust_id = c.customer_id;

Relevant SQL runs on BDA nodes

10s of gigabytes of data

Only columns and rows needed to answer query are returned

[Diagram: a Hadoop Cluster with Big Data SQL (B) on its nodes, and Oracle Database with the CUSTOMERS and WEB_LOGS tables]

Page 51: Oracle Big Data y Database Analytics - Andres Araujo

SQL Push Down in Big Data SQL

• Hadoop Scans on Unstructured Data

• WHERE Clause Evaluation

• Column Projection

• Bloom Filters for Better Join Performance

• JSON Parsing, Data Mining Model Evaluation
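Three of the push-down techniques listed above (WHERE clause evaluation, column projection and Bloom filters for joins) can be sketched in a few lines of Python. This is a conceptual illustration, not Oracle Big Data SQL internals: the `BloomFilter` class, the `scan_on_data_node` function and the sample rows are hypothetical, and the sketch assumes the web log rows sit on the Hadoop side while the customer rows sit in the database.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size, self.hashes, self.bits = size_bits, hashes, 0

    def _positions(self, item):
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def scan_on_data_node(rows, country, columns, join_filter):
    """Runs next to the data: filter rows, skip unmatched join keys, project columns."""
    for row in rows:
        if row["source_country"] != country:            # WHERE clause evaluation
            continue
        if not join_filter.might_contain(row["cust_id"]):  # Bloom filter on join key
            continue
        yield {name: row[name] for name in columns}     # column projection


customers = {1: "Ana", 2: "Luis"}  # database side (sample data)
web_logs = [                       # Hadoop side (sample data)
    {"sess_id": "s1", "cust_id": 1, "source_country": "Brazil", "payload": "..."},
    {"sess_id": "s2", "cust_id": 2, "source_country": "Spain", "payload": "..."},
    {"sess_id": "s3", "cust_id": 9, "source_country": "Brazil", "payload": "..."},
]

bloom = BloomFilter()
for cust_id in customers:
    bloom.add(cust_id)

shipped = list(scan_on_data_node(web_logs, "Brazil", ["sess_id", "cust_id"], bloom))
# The database finishes the join exactly, discarding any Bloom false positives.
result = [(r["sess_id"], customers[r["cust_id"]]) for r in shipped if r["cust_id"] in customers]
print(result)
```

The point of the design is that only the narrow, pre-filtered rows in `shipped` cross the network; the wide `payload` column and the non-matching rows never leave the data node.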

Page 52: Oracle Big Data y Database Analytics - Andres Araujo


Why Make Big Data a Divided World?

VS

Page 53: Oracle Big Data y Database Analytics - Andres Araujo


Unified Big Data Environment

VS &

Page 54: Oracle Big Data y Database Analytics - Andres Araujo

Securing Big Data

• Increasingly, Big Data solutions are capturing sensitive information that must be protected and audited

• This is no different than critical data stored in an RDBMS

Page 55: Oracle Big Data y Database Analytics - Andres Araujo


Enhanced Big Data Security

Authenticate users with secure Kerberos protocol

Authorize access to data with fine grained controls

Audit activity and access with Oracle Audit Vault and Database Firewall

Encrypt data as it flows through the system*

*Planned for v2.3.2

Page 56: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Management System

SOURCES

DATA RESERVOIR DATA WAREHOUSE

Oracle Database

Oracle Industry Models

Oracle Advanced Analytics

Oracle Spatial & Graph

Big Data Appliance

Apache Flume

Oracle GoldenGate

Oracle Event Processing

Cloudera Hadoop

Oracle NoSQL

Oracle R Advanced Analytics for Hadoop

Oracle R Distribution

Oracle Database

In-Memory, Multi-tenant

Oracle Industry Models

Oracle Advanced Analytics

Oracle Spatial & Graph

Exadata

Oracle GoldenGate

Oracle Event Processing

Oracle Data Integrator

Oracle Big Data Connectors

Oracle Data Integrator

ORACLE BIG DATA SQL


Page 57: Oracle Big Data y Database Analytics - Andres Araujo

Oracle Big Data Platform: Big Data Management System

BIG DATA APPLICATIONS: BY INDUSTRY & LINE OF BUSINESS

BUSINESS ANALYTICS: DISCOVERY, BUSINESS ANALYTICS

BIG DATA MANAGEMENT: DATA RESERVOIR, DATA WAREHOUSE, ORACLE BIG DATA SQL

SOURCES

Page 58: Oracle Big Data y Database Analytics - Andres Araujo

Big Data: Why Oracle?

Page 59: Oracle Big Data y Database Analytics - Andres Araujo

Unified Data Platform

Advanced Query & Analysis: Full Power of SQL and Advanced Analytics

Transparent to Applications: No Changes to Application Code

Single View of All Data: Unified Metadata Across RDBMS & Hadoop

Fastest Performance: Utilize SQL Processing Across the Platform

Leverage Existing Skills: Lower Cost & Complexity of Big Data Adoption

Page 60: Oracle Big Data y Database Analytics - Andres Araujo


Questions & Answers

Page 61: Oracle Big Data y Database Analytics - Andres Araujo


Page 62: Oracle Big Data y Database Analytics - Andres Araujo