OpenWorld 2014, Big Data Integration
Copyright © 2014 Oracle and/or its affiliates. All rights reserved. |
Oracle Data Integration: CON7934
Tapping into the Big Data Reservoir with All Data
Jeff Pollock
Vice President, Oracle Data Integration
Oracle OpenWorld 2014
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Today’s Agenda
1. Oracle Data Integration Solutions
2. Big Data Reservoir
• Next generation data platform architecture on Hadoop
3. Oracle Data Integration for Big Data Reservoir
• Take complete advantage of the modern Big Data platform and leave legacy ETL tools behind
4. Proven Results with Big Data
• Beyond theory, early adopters getting benefits NOW!
Oracle Data Integration Solutions and Proven Benefits
Improve Agility
• Deploy Projects Faster
• Reliable Real-Time
Reduce Risk
• Popular, Proven Tools
• Open, Not Proprietary
Reduce Costs
• Better Productivity
• Eliminate ETL Servers
Analytic Data Integration
• Big Data Integration & Governance
• Data Warehouse Integration
• Business Intelligence Applications
Enterprise Data Integration and Governance
• Enterprise Data Quality and Profiling
• Comprehensive, Heterogeneous Data Integration
• Business Glossary and Metadata Management
Business Continuity
• Active-Active for Maximum Availability
• Zero Downtime Migrations
• Data Consolidation / Application Modernization
24 x 7 x 365
Comprehensive Data Integration & Governance Capabilities
Real-Time Data Movement
– Low impact capture, stage in Hadoop
– Continuous data availability
Data Transformation
– Bulk data movement
– Pushdown data processing
Data Federation
– Virtualized Data Services
Data Quality & Verification
– Fix quality at the source
– Verify data consistency
Metadata Management
– Lineage and Impact Analysis
– Business Glossary Semantics
Data Governance Foundation
Oracle Data Integrator (Transformation)
Enterprise Data Quality (Profile, Cleanse, Match and De-duplicate)
FastLoad
Oracle GoldenGate (Movement)
Enterprise Metadata Management & Business Glossary (Business Glossary, Data Lineage, Impact Analysis and Data Provenance)
Data Service Integrator (Federation)
GoldenGate Veridata (Online Data Verification)
ELT Processing on Hadoop or SQL
Continuous Availability
Differentiated Technical Approach
Dynamic Data Movement
– Real-time CDC is by default, not ETL
– Least invasive on sources
– Proven best performance
– Integrated Oracle capture/apply
No ETL Engines
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)
Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards
Why the word “Reservoir”?
https://blogs.oracle.com/bigdata/entry/big_data_and_analytic_top
True Hadoop Opportunity: Big Data Reservoir
Data Preparation (Data Staging & Preparation)
• External data staging and long running batch jobs run in Hadoop to make the most of the DB
• Lower cost data staging and data preparation
Data Discovery (New Data Discovery)
• Data staged / merged in Hadoop to provide single place to explore/discover data
• Support for Exploratory Analytics without time consuming data modeling
Deep Data Storage (Detailed, Deep Data)
• Store more raw detail data for less cost, while keeping aggregates in the DB
• Lower cost storage for questionable business data
DW
Operational Data Flow and Staffing Models
Operational Responsibilities (DBAs, Developers, Data Stewards, Analysts)
• Reports & Dashboards
• Query Planning
• Data Integration
• Data Modelling
• Database Mgmt.
Data Science & Discovery (Data Scientists)
• Data Visualization
• Query Construction
• Data Enrichment
• Data Preparation
• Data Exploration
• Data Acquisition
Logical Architecture – Seamless Data Integration is Crucial
Data Sources
• Data Engines & Poly-structured sources
• Content: Docs, Web & Social Media, SMS
• Structured Data Sources: Operational Data, COTS Data, Streaming & BAM
• Master & Reference Data Sources
Data Ingestion
Information Interpretation
Access & Performance Layer
• Past, current and future interpretation of enterprise data. Structured to support agile access & navigation
Foundation Data Layer
• Immutable modelled data. Business Process Neutral form. Abstracted from business process changes
Raw Data Reservoir
• Immutable raw data reservoir
• Raw data at rest is not interpreted
Discovery Lab Sandboxes
• Project based data stores to support specific discovery objectives
Rapid Development Sandboxes
• Project based data stored to facilitate rapid content / presentation delivery
Virtualisation & Query Federation
Information Services
Enterprise Performance Management
Pre-built & Ad-hoc BI Assets
Data Science
Data Integration & Governance (DI&G)
Concrete Business Value with Big Data Reservoir
COST: Lower TCO for the Data Warehouse
• Control the costs of the Data Warehouse
• Massive value multipliers for Teradata and Netezza customers
• Put an end to the annual upgrade cycle
SPEED: LoB Faster Access to Analytic Data
• Give analytics to the business earlier in the data lifecycle
• Avoid up front modelling overhead for Discovery
• Empower IT to focus on highest value analytics
VALUE: New Types of Analytics for All Data
• Run BI queries faster
• Support Exploratory Analytics directly from Hadoop
• Run Streaming Analytics from OEP, Storm, Flume etc.
• Drive new business solutions (telematics data, machine data, log data, unstructured data)
Top US Automaker
Oracle Data Integration for Realtime Data Delivery to Hadoop Reservoir
Petabyte Scale
Oracle Data Integration – Powerful Big Data Solutions
Commodity Data Reservoir
• Leverage Oracle Data Integration with a wide array of databases or data warehouse appliances
• Support Hadoop distributions on commodity hardware
Oracle Engineered Systems
• Deeply integrated with Oracle Big Data Appliance and Exadata
• Take advantage of Infiniband performance, Oracle Big Data SQL, Columnar Compression, and all integrated Loader technologies
Streaming Big Data
• Integrate realtime transactional databases with streaming analytics
• Filter, join and transform data while it is in motion, make business decisions while data is in memory
Heterogeneous Reservoir with Oracle Data Integration
[Diagram components]
Logs, OLTP DB, API/File, NoSQL
Flume, SQOOP, OGG
Hive/HCat, HDFS, HBase
ODI: Hive on MR, Tez, Spark; Pig on MR, Tez, Spark; Spark; Oozie
OEDQ: Data Validation & Cleansing
OEMM: Metadata Mgmt & Lineage
Any DW
European Energy Co.
Oracle Data Integration for Data Staging and Transforming in Hortonworks
Real-Time to Hive
Red Stack Reservoir with Oracle Data Integration
Oracle DB OLTP
OGG
Hive/HDFS
Transform Hive: ODI
Load to Oracle: OLH/OSCH
Load from Oracle: CopyToBDA
Federate Hive/HDFS to Oracle: Big Data SQL
Federate Oracle to Hive: Query Provider for Hadoop
Engineered System for Big Data from Oracle
• Engineered data platform
• ODI Data Transformation at the speed of DRAM or the scale of Hadoop
• Utilize each data tier for specialized algorithms & compression
Data tiers
• DRAM: Hottest Data
• PCI FLASH: Active Data
• DISK: Warm Data
• Hadoop DISKS: Deep Data
Combined benefits
• Speed of DRAM
• I/Os of Flash
• Cost of Disk
• Scale of Hadoop
Oracle Data Integrator
Oracle GoldenGate
Fully exploit Big Data SQL, In-Memory and No-SQL Advancements from Oracle
Top European Bank
Oracle Data Integration MapReduce Data Transformations in Big Data Appliance
Massively Parallel
Streaming Reservoir with NoSQL and DIS
Sensors & Events
OEP
Oracle NoSQL
Any DB
OGG
Hive/HDFS
Transform (Hive, Pig/Oozie, Spark): ODI
Federate Hive/HDFS: Big Data SQL
Load to Oracle: OLH/OSCH
US Digital TV Provider
Oracle Data Integration with Hadoop & Kafka
100m Tx/Hr
The 90’s Are Calling: Don’t Custom Code Data Integration!!
Hey big data coders! Yes, all you out there writing your data load programs in Scala, PigLatin, HiveQL or Java MR….
Custom coded data loading is BAD, stay away!
Been there, done that with C++, Pipes and Pro*C
Debugging kills, live data is always bad, downtime is a major bummer, projects can’t scale to large teams…
When you are past Discovery and into Operations, use enterprise tools for their reliability and reach into existing IT systems.
Oracle Does Big Data Integration Better
Dynamic Data Movement
– CDC is by default, not an add-on
– Least invasive on sources
– Proven best performance
– Native Oracle capture/apply
vs. Batch Data Movement
– Typical ETL vendors all default to batch data movement in their reference architectures
– Some can “talk the talk” but their CDC tech can’t touch Oracle GoldenGate scale/performance
NoETL Engine
– Take the processing to the data; don’t move the data to the process
– Leverage your data engines for the workloads (Hadoop or SQL)
vs. ETL Engine Must Scale Alongside Hadoop
– Carefully watch how ETL engines scale out; parallelism runs via the Engine – more H/W to buy
– Map out the physical deployment architecture, compare to GG&ODI, the benefits will be clear
Most Heterogeneous
– Leverage open source Hadoop, not proprietary distributions
– Hadoop is the Hub, not ETL tools
– Open metadata standards
vs. Proprietary Vendor Lock-in
– One popular ETL vendor puts their engines at the center of the architecture, not Hadoop
– The mainframe of ETL vendors has proprietary features that mainly run in their own distro
– “Fake free” ETL vendors sell proprietary add-ons
Oracle Does Big Data Better: Dynamic Data Movement
HDFS (Files)
HBase (NoSQL)
Hive / Hive Streaming (SQL)
Flume & Storm (Streaming)
Kafka (MPP Pub/Sub)
Spark Streaming (Machine Learning)
Capture Database Transactions and Deliver to Big Data in Real-Time
GoldenGate: Capture, Trail, Pump, Route, Deliver
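As a rough illustration of the capture, trail and deliver stages named on this slide, the following Python sketch moves changes from a source to a target. It is a toy under stated assumptions, not GoldenGate's implementation or API: the change_log table stands in for a database transaction log, a list of JSON lines stands in for the trail, and a Python dict stands in for the big data target. All table and function names are hypothetical.

```python
import json
import sqlite3
from typing import Dict, Iterable, Iterator, List, Tuple


def capture(conn: sqlite3.Connection, last_seq: int) -> Iterator[dict]:
    """Capture: read change records newer than the last checkpoint.

    Assumption: a change_log table plays the role of the transaction log.
    """
    rows = conn.execute(
        "SELECT seq, op, id, name FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    )
    for seq, op, row_id, name in rows:
        yield {"seq": seq, "op": op, "id": row_id, "name": name}


def write_trail(changes: Iterable[dict]) -> List[str]:
    """Trail: serialize each change so it can be shipped and replayed later."""
    return [json.dumps(change) for change in changes]


def deliver(trail: Iterable[str], target: Dict[int, str]) -> int:
    """Deliver: apply changes in order and return the new checkpoint."""
    checkpoint = 0
    for line in trail:
        change = json.loads(line)
        if change["op"] == "D":
            target.pop(change["id"], None)       # delete
        else:
            target[change["id"]] = change["name"]  # insert or update
        checkpoint = change["seq"]
    return checkpoint


def run_demo() -> Tuple[Dict[int, str], int]:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE change_log ("
        "seq INTEGER PRIMARY KEY, op TEXT, id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO change_log (op, id, name) VALUES (?, ?, ?)",
        [("I", 1, "alice"), ("I", 2, "bob"), ("U", 1, "alicia"), ("D", 2, None)],
    )
    target: Dict[int, str] = {}
    checkpoint = deliver(write_trail(capture(conn, 0)), target)
    return target, checkpoint
```

Calling run_demo() replays four changes in sequence order; passing the returned checkpoint to a later capture() call would pick up only newer changes, which is what keeps the load incremental rather than batch.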
Oracle Does Big Data Better: Invented Pushdown Processing
ORCL Investments in ELT/Pushdown Tech
Timeline: 1990’s, Eon of Scripts and PL-SQL, Era of Native SQL, Big Data Revolution
• Scripted SQL
• Stored Procs
• Warehouse Builder
• Data Integrator (Heterogeneous)
• ODI for Columnar DBs
• ODI for In-Memory DBs
• ODI for Engineered Systems
• ODI for Hadoop Hive
• ODI for Hadoop NoSQL
• ODI for Hadoop Pig & Oozie
• ODI for Spark
• ODI for …
Oracle’s tool maturity and operational know-how for E-LT is unmatched
10x bigger footprint with E-LT than next closest competitor using “pushdown”
Simple and easy way to blend Hadoop and SQL E-LT execution from one tool
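To make the pushdown idea concrete, here is a minimal Python sketch that contrasts an E-LT style transformation (one set-based statement executed inside the data engine) with an ETL style one (rows pulled into a separate process, transformed there, then written back). SQLite is used only as a stand-in engine, and the table names stg_orders and dw_sales_by_region are hypothetical; this is not code generated by ODI.

```python
import sqlite3


def load_staging(conn: sqlite3.Connection) -> None:
    """Create a staging table and an empty target table in the engine."""
    conn.execute(
        "CREATE TABLE stg_orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.execute(
        "CREATE TABLE dw_sales_by_region (region TEXT PRIMARY KEY, total REAL)"
    )
    conn.executemany(
        "INSERT INTO stg_orders VALUES (?, ?, ?)",
        [(1, "EMEA", 10.0), (2, "EMEA", 5.0), (3, "APAC", 7.5)],
    )


def transform_pushdown(conn: sqlite3.Connection) -> None:
    """E-LT: the engine aggregates; no row leaves the database."""
    conn.execute(
        "INSERT INTO dw_sales_by_region (region, total) "
        "SELECT region, SUM(amount) FROM stg_orders GROUP BY region"
    )


def transform_in_client(conn: sqlite3.Connection) -> None:
    """ETL-style contrast: every row travels to this process and back."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM stg_orders"):
        totals[region] = totals.get(region, 0.0) + amount
    conn.executemany(
        "INSERT INTO dw_sales_by_region VALUES (?, ?)", sorted(totals.items())
    )


def run(transform):
    """Run one transform against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load_staging(conn)
    transform(conn)
    return conn.execute(
        "SELECT region, total FROM dw_sales_by_region ORDER BY region"
    ).fetchall()
```

Both variants produce the same rows; the difference is where the work happens. In the pushdown variant the data volume crossing the network is independent of the size of stg_orders, which is the argument the slide makes for using the data engine itself.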
Oracle Does Big Data Better: NoETL Approach
One Logical Design, Many Engine Alternatives:
Data Engines | Examples | Engine I/O | Best Use
SQL / OLTP Database | Oracle DBMS; Any OLTP DBMS; DW Appliances | SSD / Disk based | High volumes of transformations on relational data
MapReduce | Hive / MR2; Pig / Oozie / MR2 | SSD / Disk based | Huge batch-like transformations on any data types
In Memory (SQL / Big Data) | Oracle InMemory; Hive / Tez / YARN; Spark / YARN; Cloudera Impala | D/RAM; with various built in spill to disk approaches | Highly interactive data transformation patterns
Streaming Big Data | Storm / YARN; Oracle Event Processor (OEP) | D/RAM; “always on” data pipeline | Very low latency transformations
Data Integrator
• Modern design studio for simple map development
• Team-based GUI Tooling for work on Enterprise projects
• Integrated lifecycle and metadata management
• Automated support for Changed Data Capture
SEPARATE ETL ENGINE NOT REQUIRED!
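The "one logical design, many engines" idea can be sketched as a single mapping definition rendered into different engine dialects. The Python below is a hypothetical illustration only: the Mapping class, the renderer functions and the table names are assumptions made for this example and do not reflect how ODI stores mappings or generates code.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Mapping:
    """Engine-neutral description of one source-to-target transformation."""
    source: str
    target: str
    columns: Dict[str, str]  # target column -> source expression
    where: str = ""


def _select_list(m: Mapping) -> str:
    return ", ".join(f"{expr} AS {col}" for col, expr in m.columns.items())


def _where_clause(m: Mapping) -> str:
    return f" WHERE {m.where}" if m.where else ""


def render_sql(m: Mapping) -> str:
    """Render the mapping for a relational SQL engine."""
    cols = ", ".join(m.columns)
    return (
        f"INSERT INTO {m.target} ({cols}) "
        f"SELECT {_select_list(m)} FROM {m.source}{_where_clause(m)}"
    )


def render_hiveql(m: Mapping) -> str:
    """Render the same mapping as a HiveQL insert-select."""
    return (
        f"INSERT INTO TABLE {m.target} "
        f"SELECT {_select_list(m)} FROM {m.source}{_where_clause(m)}"
    )


RENDERERS: Dict[str, Callable[[Mapping], str]] = {
    "sql": render_sql,
    "hive": render_hiveql,
}


def generate(m: Mapping, engine: str) -> str:
    """Pick the engine at generation time; the logical design is unchanged."""
    return RENDERERS[engine](m)
```

Supporting another engine in this sketch means adding one renderer to RENDERERS; the Mapping itself does not change, which is the property the table above attributes to a single logical design.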
Oracle Does Big Data Better: Most Open & Heterogeneous
Operational Integration (Movement / Transformation)
Oracle Database, Oracle Exadata, Oracle Big Data Appliance, Oracle TimesTen, Oracle OLAP, Oracle Business Intelligence, Oracle BI Applications, Oracle E-Business Suite, Oracle JD Edwards Enterprise One, Oracle JD Edwards World, Oracle Fusion Applications, Oracle Governance Risk and Compliance, Oracle Fusion AIA, Oracle Retail Applications, Oracle Agile BI / DW, Oracle Agile PLM for Process, Oracle iFlex FlexCUBE, Oracle iFlex Mantas, Oracle Hyperion Applications, Oracle PeopleSoft, Oracle Siebel CRM / OnDemand, Oracle Communications, Oracle WebLogic Server, Oracle Coherence Data Grid, Oracle SOA Suite, Oracle Enterprise Service Bus
Hadoop HBase, Hadoop Hive/Flume, HP Enscribe, HP NonStop, HP Neoview, Hypersonic SQL, IBM DB2 i Series, IBM DB2 UDB, IBM DB2 z Series, IBM Informix, IBM Netezza, JMS / MQ, Microsoft Access, Microsoft SQLServer, MySQL, Pivotal Greenplum, PostgreSQL, Salesforce.com, SAP BW / BI, SAP ERP / ECC, SAS, SQL/MP, SQL/MX, Sybase ASE, Sybase IQ, Teradata
Metadata Harvesting (Glossary, Lineage & Impact Analysis)
Adaptive, Altova, Apache Hcatalog, Apache Hive/HQL, Borland, CA ERwin, Cloudera Impala, COBOL Copybook, DataStax, Embarcadero, EMC ProActivity, GentleWare, Google BigQuery, Grandite, Hadapt Hive, Hortonworks Hive, IBM Cognos, IBM DB2, IBM DataStage, IBM Discovery, IBM Federation Server, IBM Lotus Notes, IBM Netezza, IBM Rational Rose, IBM Rational Architect, Informatica Metadata Mgr., Informatica PowerCenter, CoSORT, ISO SQL Standard (DDL), MapR Hadoop Hive, MicroFocus, Microsoft Access, Microsoft Office Excel, Microsoft Visio, Microsoft SQL Server, Microsoft SSIS, Microsoft Visual Studio, Microstrategy, Magic Draw, OMG CWM Standard, OMG UML Standard, Oracle BI Answers, Oracle BI Enterprise Edition, Oracle BI Server, Oracle DAC, Oracle Data Integrator, Oracle Data Modeler, Oracle Database, Oracle Designer, Oracle Hyperion Applications, Oracle Hyperion Essbase, Oracle Warehouse Builder, Pivotal Greenplum, PostgreSQL, QlikView, SAP BO Crystal Reports, SAP BO Designer, SAP BO Desktop Intelligence, SAP BO Repository, SAP BO Data Integrator, SAP BO Data Steward, SAP Master Data Management, SAP Sybase PowerDesigner, SAP Sybase ASE Database, SAS Data Integration Studio, SAS BI Server, SAS Information Map, SAS Metadata Management, SAS OLAP Server, Select, Sparx Architect, Syncsort, Tableau, Talend, Teradata, Tigris, Visible, W3C DTD & XSD Schema
+ open APIs and standards based meta-model
Oracle Does Big Data Better: Clear Business Benefits
RISK: Proven Technology
• Unlike custom coding, a tools based approach is proven to result in lower cost long term operations
• Oracle GoldenGate is industry standard for Data Replication
• Oracle invented E-LT Pushdown processing and is 10x more widely deployed than competitors
SCALE: Better Architecture
• Oracle GoldenGate provides the most scalable, native integration for database replication
• Oracle Data Integrator provides ultimate scalability and choice for Hadoop data transformations
• Consistent agent-based architecture avoids having multiple, incompatible engines (e.g. old style ETL tools)
COMPLETE: Best for Oracle
• Exadata – OGG and ODI are deeply integrated and are the only Replication and ETL processes certified to run on the appliance
• Big Data Appliance – deeply integrated technology part of core reference architecture
• Big Data Connectors – ODI included with core connector technologies for Hadoop
Heterogeneous Access
Join the Community
#OOW14 #ODI12c #GoldenGate12c #EDQ12c
Oracle Data Integration blog
blogs.oracle.com/dataintegration
Connect with Oracle on Social Media
OR connect via the web
Oracle Data Integration Home Page
oracle.com/goto/dataintegration
2014 Oracle Excellence Award Ceremony for Fusion Middleware Innovation
ORACLE FUSION MIDDLEWARE: CELEBRATE THIS YEAR'S MOST INNOVATIVE CUSTOMER SOLUTIONS
Tuesday, September 30, 2014, 5:00-5:45pm
YBCA Theater (next to Moscone North)
Session ID: CON7029
Oracle Fusion Middleware
The Cloud Platform for Digital Business
Cloud
On-Premise
DIGITAL ENGAGEMENT
Web, Mobile, Social, Internet of Things
CONTENT & COLLABORATION
BUSINESS ANALYTICS
BUSINESS PROCESS MANAGEMENT
APPLICATION & DATA INTEGRATION
APPLICATION INFRASTRUCTURE & TOOLS
IDENTITY MANAGEMENT
SYSTEMS MANAGEMENT
Questions and Answers