Oracle Big Data: Interactive Quick Reference

Author: Dimpi Sharmah
Subject: This interactive diagram lets you explore the Oracle …


Copyright © 2019, Oracle and/or its affiliates. All rights reserved.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S. Government.

This software or hardware is developed for general use in a variety of information management applications. It is not developed or intended for use in any inherently dangerous applications, including applications that may create a risk of personal injury. If you use this software or hardware in dangerous applications, then you shall be responsible to take all appropriate fail-safe, backup, redundancy, and other measures to ensure its safe use. Oracle Corporation and its affiliates disclaim any liability for any damages caused by use of this software or hardware in dangerous applications.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark of The Open Group.

This software or hardware and documentation may provide access to or information about content, products, and services from third parties. Oracle Corporation and its affiliates are not responsible for and expressly disclaim all warranties of any kind with respect to third-party content, products, and services unless otherwise set forth in an applicable agreement between you and Oracle. Oracle Corporation and its affiliates will not be responsible for any loss, costs, or damages incurred due to your access to or use of third-party content, products, or services, except as set forth in an applicable agreement between you and Oracle.


Big Data Conceptual Architecture

[Figure: Big Data Conceptual Architecture. Labels: Input Events, Streaming Engine, Actionable Events, Data Lake, Actionable Data Sets, Enterprise Data and Reporting, Actionable Metrics, Structured Enterprise Data, Discovery Lab, Data, Discovery Output, Execution, Innovation.]

Oracle’s Information Management Conceptual Architecture shows key components and flows in a compact form. Oracle Big Data Cloud services deliver a broad and integrated portfolio of products and engineered systems. They help you acquire and organize diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.

1. Input Events: Input events are data from different sources that are given as input to the Streaming Engine. Data sources are potential sources of raw data that the business requires to address its information requirements. Sources include both internal and external systems, including IoT, websites, and more. Data from these systems will vary in structure and presentation method.

2. Streaming Engine: The streaming engine takes the data published by the producers, persists it, and reliably delivers it to consumers.

3. Actionable Events: Actionable events are events that let you take the next-best action for the plan to succeed.

4. Data Lake: A data lake is a storage repository that holds a vast amount of raw data in its original format, including structured, semi-structured, and unstructured data. With a data lake, you load in the raw data as-is, and you give it shape and structure only when you are ready to use it. This is called schema-on-read. A data lake allows for ad hoc discovery, organization, and enrichment of unmodelled data before it moves to more refined sets of analytics tools. It typically captures its data in a Hadoop cluster or Object Store.

5. Actionable Data Sets: An actionable data set is a piece of information that enables you to make an informed decision. Such data sets are usually derived by synthesizing vast amounts of data into crisp and concise statements.

6. Enterprise Data & Reporting: Enterprise Data is a large-scale, formalized, and modeled business-critical data store. It is typically represented by an Enterprise Data Warehouse. A warehouse stores products ready for consumers. This data, when gathered, cleansed, and formatted for reporting and analysis purposes, constitutes the bulk of traditional structured data warehouses, data marts, and OLAP. It includes BI tools and infrastructure components for timely and accurate reporting. In this phase, users may be engineers using the data for their systems, analysts, or decision makers.

7. Actionable Metrics: Actionable metrics translate data into something useful that helps you make a decision about your future plans. A metric generally adds context to data: how it compares to history, a benchmark, etc.

8. Structured Enterprise Data: Structured enterprise data originates from internal and external enterprise systems (e.g. ERP, HR, etc.). It is usually processed and has a defined structure. This includes data contained in relational databases and spreadsheets.

9. Execution: The interplay of the components and their assembly into solutions is divided into execution and innovation divisions. The execution division contains those tasks which support and inform daily operations. This arrangement of solutions on either side helps inform system requirements for security, governance, and timeliness.

10. Innovation: The interplay of the components and their assembly into solutions is divided into execution and innovation divisions. The innovation division contains those tasks which drive new insights back to the business. This arrangement of solutions on either side helps inform system requirements for security, governance, and timeliness.

11. Discovery Lab: The Discovery Lab is a distinct design pattern within the conceptual architecture. It has a set of data stores, processing engines, and analysis tools that are separate from the everyday processing of data.

12. Data: Data is given as input to the discovery lab for analysis.

13. Discovery Output: The discovery lab provides deployable code (Actionable Events), scores, or some interesting phenomena in the data as the discovery output. This may be a fraud prediction, next best offer, etc.
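The schema-on-read idea described under Data Lake can be illustrated with a minimal Python sketch. It is not taken from any Oracle product: the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names chosen for illustration. Raw lines are stored untouched, and a schema is applied only when a reader consumes them.

```python
import json

# Raw events land in the lake exactly as produced (hypothetical sample data).
RAW_LINES = [
    '{"user": "u1", "bytes": "1200", "country": "US"}',
    '{"user": "u2", "bytes": "300"}',
    '{"user": "u3", "bytes": "oops", "country": "DE"}',
]

# The schema is declared by the reader, not enforced at load time.
SCHEMA = {"user": str, "bytes": int, "country": str}


def read_with_schema(lines, schema):
    """Parse raw lines and coerce each field to the reader's schema.

    Missing fields become None; rows that cannot be coerced are skipped.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        try:
            row = {
                name: (cast(record[name]) if name in record else None)
                for name, cast in schema.items()
            }
        except (TypeError, ValueError):
            continue  # the raw line stays in the lake; only this read skips it
        rows.append(row)
    return rows


rows = read_with_schema(RAW_LINES, SCHEMA)
```

A different reader could apply a different schema to the same raw lines, which is the flexibility the data lake description refers to.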


Key Technologies

[Figure: Key Technologies mapped onto the conceptual architecture. Labels: Streaming Engine, NoSQL, Data Lake (Hadoop/HDFS, Object Store), Enterprise Data and Reporting, Data Visualization, Data Virtualization, Discovery Lab (Notebooks/Analytic Services), Input Events, Structured Enterprise Data, Data, Discovery Output, Execution, Innovation.]

The key technologies of the Big Data Architecture are:

1. Apache Kafka: Apache Kafka is a publish-subscribe messaging system that exchanges data between processes, applications, and servers. In Kafka, messages are immediately written to the file system and replicated within the cluster to prevent data loss. Kafka is used for real-time streams of data, to collect big data, or to do real-time analysis (or both).

2. NoSQL: NoSQL databases are highly scalable and flexible database management systems that allow you to store and process unstructured as well as semi-structured data. Many of them use a key-value data model. There are a variety of NoSQL databases in the market. NoSQL databases are frequently used to acquire and store big data. For example, NoSQL databases are often used to collect and store social media data.

3. Spark Streaming: Spark Streaming is an extension of Spark for large-scale stream processing. Spark Streaming supports both Java and Scala, which makes it easy for users to map, filter, join, and reduce streams (among other operations) using functions in the Scala/Java programming language. Spark Streaming reads data from a Kafka topic, processes it, and writes the processed data to a new topic, where it becomes available to users and applications. For more information, see Spark Streaming.

4. Object Store: The Object Store can store an unlimited amount of unstructured data of any content type, including analytic data and rich content, like images and videos.

5. Hadoop/HDFS: Hadoop is an open-source, distributed framework that allows you to store and process enormous amounts of data across clusters of computers using simple programming models.

6. Data Warehouse: A data warehouse is a strategic collection of all types of data in support of the decision-making process at all levels of an enterprise. This includes all types of data stores that maintain information for historical and analytical purposes. The historical data stores consolidate large quantities of information in a manner that best maintains historical integrity. Analytical data stores are designed to support analysis by maximizing ease of access and query performance. Technologies and schema designs consist of dimensional data models, OLAP cubes, etc.

7. Data Visualization: Data visualization describes the presentation of abstract information in graphical form. Data visualization allows us to identify patterns, trends, and correlations that otherwise might go unnoticed in traditional reports, tables, or spreadsheets. You can discover the insights hidden in your data with rich, interactive visuals.

8. Data Virtualization: Data virtualization technology provides a single point of access to the data by aggregating it from a wide range of data sources. The process of data virtualization involves abstracting, transforming, federating, and delivering data from disparate sources.

9. Notebooks/Analytics Services: Notebooks are used by data scientists for quick exploration tasks. Analytics notebooks enable easy access to both data and computing power. For example, Apache Zeppelin is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala, and more.
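The data virtualization steps listed above (abstracting, transforming, federating, and delivering) can be sketched in a few lines of Python. This is only an illustration under assumed names: the `orders` table, the CSV text, and the helper functions are hypothetical and do not represent Big Data SQL or any other product API.

```python
import csv
import io
import sqlite3


def rows_from_sqlite(conn, query):
    """Abstract a relational source as a stream of plain dictionaries."""
    cursor = conn.execute(query)
    names = [column[0] for column in cursor.description]
    for record in cursor:
        yield dict(zip(names, record))


def rows_from_csv(text):
    """Abstract a CSV source as a stream of plain dictionaries."""
    yield from csv.DictReader(io.StringIO(text))


def federate(*sources):
    """Transform every row to a common shape and deliver one combined stream."""
    for source in sources:
        for row in source:
            yield {"customer": str(row["customer"]), "amount": float(row["amount"])}


# Hypothetical source 1: a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES ('acme', 10.5)")

# Hypothetical source 2: a CSV extract where amounts arrive as text.
csv_text = "customer,amount\nglobex,4\n"

unified = list(federate(
    rows_from_sqlite(conn, "SELECT customer, amount FROM orders"),
    rows_from_csv(csv_text),
))
```

The consumer of `unified` sees one consistent record shape and does not need to know which source a row came from.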


Oracle Products

[Figure: Oracle Products mapped onto the conceptual architecture. Labels: Streaming Engine (Oracle Data Hub, Oracle Event Hub, Oracle Stream Analytics), Data Lake (Oracle Object Storage, Oracle Big Data, Oracle Big Data Cloud), Big Data SQL, Oracle Analytics Cloud, Enterprise Data and Reporting (Oracle Exadata Cloud Service, Oracle Database Cloud Service, Oracle Autonomous Data Warehouse), Oracle Data Integration Platform Cloud, Discovery Lab (Oracle Advanced Analytics, Oracle Data Mining, Oracle R Enterprise, Oracle Big Data Spatial and Graph, Oracle R Advanced Analytics for Hadoop), Input Events, Structured Enterprise Data, Data, Discovery Output, Execution, Innovation.]

Oracle products for Big Data Architecture are:

1. Streaming Engine: Oracle Event Hub, Oracle Data Hub, Oracle NoSQL Database, Oracle Stream Analytics

2. Data Lake: Oracle Object Storage, Oracle Big Data Cloud, Oracle Big Data

3. Enterprise Data and Reporting: Oracle Autonomous Data Warehouse, Oracle Exadata Cloud Service, Oracle Database Cloud Service

4. Data Virtualization: Big Data SQL

5. Data Visualization: Oracle Analytics Cloud

6. Discovery Lab: Oracle Advanced Analytics, Oracle R Advanced Analytics for Hadoop, Oracle Big Data Spatial and Graph, and Oracle Analytics Cloud

7. Oracle Data Integration Platform Cloud: Oracle Data Integration Platform Cloud with autonomous capabilities helps migrate and extract value from data by bringing together the capabilities of a complete Data Integration, Data Quality, and Data Governance solution into a single unified autonomous cloud-based platform. Oracle data integration products are Oracle GoldenGate, Oracle Data Integrator, and Oracle Enterprise Data Quality. In our Big Data Architecture, GoldenGate replicates data from Enterprise Data & Reporting to the Data Lake, and Data Integrator moves data from the Data Lake to Enterprise Data & Reporting. For more information, see Oracle Data Integration Platform Cloud.


Streaming Engine

[Figure: Streaming Engine. Labels: Oracle Event Hub, Oracle Data Hub, NoSQL, Oracle Stream Analytics, Use Case.]

1. Oracle Event Hub Cloud Service: Oracle Event Hub Cloud Service leverages Oracle Cloud and Apache Kafka to enable you to work with streaming data. You can quickly create, configure, and manage your Topics in the cloud while Oracle manages the underlying platform. An instance of Oracle Event Hub Cloud Service is called a Topic. All messages are organized into Topics. Oracle Event Hub Cloud Service enables you to unify and organize this data and make it easily accessible and available for consumption anytime by anyone, ranging from an engineer to an advanced analytic machine. For more information, see Oracle Event Hub Cloud Service.

2. Oracle Data Hub Cloud Service: Oracle Data Hub Cloud Service (DHCS) enables you to consistently provision and manage NoSQL database clusters such as Apache Cassandra on Oracle Cloud. Currently, you can use the DHCS Console/API/CLI to easily provision Apache Cassandra database clusters within the Oracle Cloud Infrastructure Classic (OCI-Classic) platform. You can use such a cluster as a data store for your big-data, cloud-native applications or to persistently store the messages from the Oracle Event Hub Cloud Service. Use DHCS when you want a consistent interface to provision, administer, and monitor popular open source database clusters within the Oracle Cloud platform. For more information, see Oracle Data Hub Cloud Service.

3. Oracle NoSQL Database: Oracle has its own NoSQL solution, which enables fast performance and flexibility by supporting a wide variety of data types and multiple data access methods. It offers Key-Value and Table data models. Oracle NoSQL Database stores unstructured, semi-structured, or structured data and is accessible through Java, C/C++, JavaScript, Python, Node.js, and REST APIs. It provides the following integration benefits:

    • Query NoSQL data from Oracle Database
    • Access to NoSQL data from Hadoop
    • Support for Hive, DMS, Apache Spark, Kerberos

For more information, see Oracle NoSQL Database.

4. Oracle Stream Analytics: Oracle Stream Analytics allows users to process and analyze large-scale real-time information by using sophisticated correlation patterns, enrichment, and machine learning. It offers real-time actionable business insight on streaming data and automates action to drive today’s agile businesses. From its interactive designer, users can explore real-time data through live charts, maps, and visualizations, and graphically build streaming pipelines without any hand coding. These pipelines execute in a scalable and highly available clustered Big Data environment that uses Spark integrated with Oracle’s Continuous Query Engine to address critical real-time use cases of modern enterprises. For more information, see Oracle Stream Analytics.


Apache Kafka

[Figure: Apache Kafka. Labels: Web App, Mobile App, Email App; Kafka Cluster (Topic 1, Topic 2); Recommendations, Security Monitoring, Analytics.]

Activity data is the lifeblood of Internet-based applications. On social networking sites such as Facebook or LinkedIn, you can see who has viewed your profile. Even the ads show up according to your browsing history and recent activities. Therefore, log data processing is critical for Internet companies. Apache Kafka is a technology designed for collecting and delivering log data in real time. It collects and delivers high volumes of activity log data with low latency by using a messaging system. Unlike traditional offline processing of log data, whereby data is processed and stored in a data warehouse or a Hadoop environment for batch processing jobs, reporting, and ad hoc analysis, Kafka is designed for real-time processing of log data. In addition, Kafka enables the transfer of log data between primary big data processing engines, including RDBMS, Hadoop, and NoSQL.

Apache Kafka runs as a cluster on one or more servers. With Kafka, a stream of messages of a particular type is defined as a topic. A producer can publish messages to a topic. The published messages are then stored in a set of servers called brokers in a Kafka cluster. Producers can be web servers or mobile apps, and the messages they send to Apache Kafka are logging information. These logs include events that indicate actions. For example, an event might record the link that a user clicks and when the link was clicked.

Consumers are various processes that want to find out about the events that are occurring in real time. They may want to generate analytics, monitor for unusual activity, generate personalized recommendations for users, and so on. Consumers may subscribe to one or more topics from the brokers and consume the subscribed messages by pulling data from the brokers. After pulling a message, consumers perform message aggregation or other processing of these streams. In addition to real-time processors, consumers may also be Hadoop and data warehousing stores that load virtually all feeds for batch-oriented processing.

For more information, see Apache Kafka.
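The producer, topic, broker, and pull-based consumer roles described above can be modelled with a small in-memory sketch. This is a toy model for illustration only and does not use the Apache Kafka client API; `ToyBroker`, `ToyConsumer`, and the `clicks` topic are hypothetical names.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: one append-only log per topic."""

    def __init__(self):
        self._logs = defaultdict(list)

    def publish(self, topic, message):
        """Append a message to the topic's log and return its offset."""
        log = self._logs[topic]
        log.append(message)
        return len(log) - 1

    def fetch(self, topic, offset, max_messages=10):
        """Return up to max_messages starting at the given offset."""
        return self._logs[topic][offset:offset + max_messages]


class ToyConsumer:
    """Pulls from subscribed topics and tracks its own position per topic."""

    def __init__(self, broker, topics):
        self._broker = broker
        self._offsets = {topic: 0 for topic in topics}

    def poll(self):
        """Pull any new messages from every subscribed topic."""
        received = []
        for topic, offset in self._offsets.items():
            batch = self._broker.fetch(topic, offset)
            self._offsets[topic] = offset + len(batch)
            received.extend((topic, message) for message in batch)
        return received


broker = ToyBroker()
broker.publish("clicks", {"user": "u1", "link": "/movies/42"})
broker.publish("clicks", {"user": "u2", "link": "/movies/7"})

analytics = ToyConsumer(broker, ["clicks"])
monitoring = ToyConsumer(broker, ["clicks"])

first_poll = analytics.poll()      # both messages
second_poll = analytics.poll()     # nothing new since the last pull
monitor_poll = monitoring.poll()   # independent position: both messages
```

Because each consumer keeps its own position, the analytics and monitoring consumers read the same topic independently, which mirrors how several consumers can subscribe to one topic.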


Apache Kafka Use Case

[Figure: Apache Kafka Use Case. Movie Site Activity and Network Activity publish events to the Streaming Engine, which holds a Movie Site topic and a Network topic.]

In this use case, we demonstrate the Oracle MoviePlex application. The use case is based on the Big Data Conceptual Architecture. Oracle MoviePlex is an on-line movie streaming company. With its web-based application, you can browse a catalog of movies, watch movie trailers, rent movies, review and rank movies, and get a personalized experience and recommendations. Like many other on-line stores, the company needed a cost-effective approach to tackle its “big data” challenges. It recently implemented Oracle’s Big Data Management System to better manage its business, identify key opportunities, and enhance customer satisfaction. Users accessing the MoviePlex application consume a massive amount of bandwidth, which is a potential big data challenge. By combining Movie Site activity data with Network data, you can answer questions like:

1. How do you monitor network traffic and find out who is consuming the most bandwidth?

2. How does the current revenue stream compare to the benchmark?

3. Which country or city is experiencing significant network issues?

Suppose you are watching a movie and suddenly face frequent buffering, loading problems, or network errors. What would you do? Probably log off the site. This can be a performance challenge for the MoviePlex application. In our use case, current Network activity and Movie Site activity are streamed using Kafka, which gives stream consumers the latest view of MoviePlex application usage and allocation. Network and Movie Site activity events (clicking the movie list, watching, logging, etc.) are published to the Streaming Engine, i.e. Kafka. Specifically, events are published to a specific topic in Kafka, e.g. web site activity to a topic called Movie Site. We have one topic named Movie Site and one named Network. Spark Streaming then reads data from the Kafka topics, processes it, and writes the processed data to a new topic, where it becomes available to users and applications for further analysis. Example: a targeted special offer is made during movie playback.
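The read, process, and write-to-a-new-topic flow can be sketched in plain Python. This is a simplified stand-in for the Spark Streaming job, not its actual code: the event fields, the `top_consumers` helper, and the derived `bandwidth_topic` are assumptions made for illustration, and Python lists stand in for Kafka topics.

```python
from collections import Counter

# Hypothetical events as they might arrive on the Network topic.
network_topic = [
    {"user": "u1", "country": "US", "bytes": 500},
    {"user": "u2", "country": "DE", "bytes": 2000},
    {"user": "u1", "country": "US", "bytes": 700},
    {"user": "u3", "country": "DE", "bytes": 100},
]


def top_consumers(events, n=1):
    """Sum bytes per user and return the n heaviest users."""
    usage = Counter()
    for event in events:
        usage[event["user"]] += event["bytes"]
    return usage.most_common(n)


# The processed result is written to a new, derived topic.
bandwidth_topic = [
    {"user": user, "total_bytes": total}
    for user, total in top_consumers(network_topic, n=2)
]
```

Grouping by `country` instead of `user` in the same way would address the question of which country is experiencing the most network load.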


Data Lake

[Figure: Data Lake. Labels: Oracle Object Storage, Oracle Big Data Cloud, Hadoop/HDFS, Oracle Big Data, Use Case.]

A data lake stores structured and unstructured data and provides a method for organizing large volumes of highly diverse data from varied sources. A data lake tends to ingest data very quickly and prepare it later, on the fly, as you access it. The data lake can have many platforms under it, for example:

1. Oracle Object Storage: Oracle Cloud Infrastructure Object Storage is an Infrastructure as a Service (IaaS) product that provides an object storage solution for files and unstructured data. You can use Oracle Cloud Infrastructure Object Storage to back up content to an off-site location, programmatically store and retrieve content, and share content with peers. Oracle Object Storage stores data as objects within a flat hierarchy of containers. With Object Storage, you can safely and securely store or retrieve data directly from the internet or from within the cloud platform. These objects could be image files, logs, HTML files, or any self-contained blob of bytes. Object storage allows data to be stored across multiple regions and scales to petabytes and beyond. For more information, see Oracle Object Storage.

2. Oracle Big Data Cloud Service: Oracle Big Data Cloud Service gives you access to the resources of a preconfigured Oracle Big Data environment, including a complete installation of the Cloudera Distribution Including Apache Hadoop (CDH) and Apache Spark. It is an efficient long-term store for Hadoop/HDFS. Use Oracle Big Data Cloud Service to capture and analyze the massive volumes of data generated by social media feeds, e-mail, web logs, photographs, smart meters, sensors, and similar devices. For more information, see Oracle Big Data Cloud Service.

3. Oracle Big Data Cloud: Oracle Big Data Cloud combines open source technologies such as Apache Spark and Apache Hadoop with unique innovations from Oracle to deliver a complete Big Data platform for running and managing Big Data Analytics applications. For more information, see Oracle Big Data Cloud.


Apache Hadoop

[Figure: Apache Hadoop. Distributed Storage: Filesystem (HDFS), Object Storage. Resource Management: YARN. Distributed Processing: Spark, Hive, MapReduce, Impala, Search, Big Data SQL. Also shown: Hive Metastore.]

Apache Hadoop is a fundamental building block in capturing and processing big data. At a high level, Apache Hadoop is designed to parallelize data processing across computing nodes (servers) to speed computations and hide latency and complexity. Apache Hadoop is a batch and interactive data-processing system for enormous amounts of data. It is designed to process huge amounts of structured and unstructured data (terabytes to petabytes) and is implemented on racks of commodity servers as a Hadoop cluster. Servers can be added or removed from the cluster dynamically because Apache Hadoop is designed to be “self-healing.” In other words, Apache Hadoop is able to detect changes, including failures, adjust to those changes, and continue to operate without interruption.

Apache Hadoop contains the following main core components:

    • HDFS is a distributed file system for storing information. It sits on top of the operating system that you are using.
    • Yet Another Resource Negotiator (YARN) is an extensible framework for job scheduling and cluster resource management.
    • MapReduce is a parallel processing framework that operates on local data whenever possible. It abstracts the complexity of parallel processing, which enables developers to focus on the business logic rather than on the processing framework.
    • Spark is another processing framework, which is more popular than MapReduce today. Spark is optimized to work with data in memory rather than on disk.
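To make the MapReduce model concrete, here is a minimal single-process word count in Python. It only mimics the map, shuffle, and reduce phases; it does not use the Hadoop API, and the function names and input splits are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine the values for one key into a single result."""
    return key, sum(values)


# Each string stands in for a block of data local to one node.
splits = ["big data big clusters", "data lake"]
mapped = [pair for split in splits for pair in map_phase(split)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

The developer writes only `map_phase` and `reduce_phase`; in a real cluster the framework handles distribution, grouping, and failure recovery, which is the abstraction the MapReduce description refers to.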

Object storage is the persistent storage repository for the data in your data lake. Combining object storage in the cloud with Spark is more flexible than a typical Hadoop/MapReduce configuration. If you need more compute, you can spin up a new Spark cluster and leave your storage alone. If you’ve just acquired many terabytes of new data, then just expand your object storage. In the cloud, compute and storage aren’t just elastic. They’re independently elastic. And that’s good, because your needs for compute and storage are also independently elastic.


Data Lake Use Case

[Figure: Data Lake Use Case. Movie Site Activity and Network Activity publish events to the Streaming Engine (Movie Site topic, Network topic), which feeds the Data Lake: Hadoop/HDFS (Movie Site History, Network History) and Object Store.]

Every click that takes place on the MoviePlex site is streamed into the persistent stores (HDFS or Object Store) and then analyzed. It is also easy to move data between the Hadoop cluster and Object Store.

Event Hub (Kafka) can save data into long-term, persistent storage using the OCI Sink Connector, which allows you to export data from a Kafka topic into an Oracle Cloud Storage instance. Object Store stores all historical movie site activity and network usage data. In our use case, if we want to compare the current network stream to the historical benchmark, we can use the data stored in the Object Store. To perform analytics, the historical data is copied into HDFS and analyzed.
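Comparing the current network stream with a historical benchmark can be sketched as follows. The numbers, the 20% tolerance, and the `below_benchmark` helper are illustrative assumptions, not values from the MoviePlex use case.

```python
from statistics import mean

# Hypothetical hourly throughput (Mbps) restored from the object store.
history = [100, 110, 90, 100]
# Hypothetical readings from the live Network topic.
current = [105, 60, 98]


def below_benchmark(readings, benchmark, tolerance=0.2):
    """Return readings more than `tolerance` below the benchmark."""
    floor = benchmark * (1 - tolerance)
    return [value for value in readings if value < floor]


benchmark = mean(history)
degraded = below_benchmark(current, benchmark)
```

Here the historical mean serves as the benchmark, and any live reading that falls well below it is flagged for follow-up.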


Enterprise Data and Reporting

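[Figure: Enterprise Data and Reporting. Inputs: Legacy Data, External Data, Operations Data. Stores: Enterprise Data Warehouse, Data Marts. Outputs: Analytical Reporting, Decision Making. Products: Oracle Exadata Cloud Service, Oracle Database Cloud Service, Oracle Autonomous Data Warehouse. Also shown: Use Case.]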

Data residing in operational systems such as CRM, ERP, warehouse management systems, etc., is typically very well structured and is designed to consistently store operational data, one transaction at a time. But this data is not always meaningfully presented to the end-user query tool. Analytical reporting, by contrast, requires a database design that even business users find directly usable. To achieve this, different database design techniques are required (for example, the use of dimensional and star schemas with highly denormalized dimension tables). In our Big Data Conceptual Architecture, Enterprise Data and Reporting is considered a Data Warehousing or Data Mart solution to all the issues identified with the data extraction strategy.
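A star schema of the kind mentioned above can be sketched with Python's built-in `sqlite3` module. The table names, columns, and sample rows are hypothetical and chosen only to show one fact table joined to denormalized dimension tables; a production warehouse would use a full database platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: denormalized, descriptive attributes.
    CREATE TABLE dim_movie (movie_id INTEGER PRIMARY KEY, title TEXT, genre TEXT);
    CREATE TABLE dim_country (country_id INTEGER PRIMARY KEY, name TEXT);
    -- Fact table: one row per rental, keyed to the dimensions.
    CREATE TABLE fact_rental (
        movie_id INTEGER REFERENCES dim_movie(movie_id),
        country_id INTEGER REFERENCES dim_country(country_id),
        revenue REAL
    );
    INSERT INTO dim_movie VALUES (1, 'Movie A', 'Drama'), (2, 'Movie B', 'Comedy');
    INSERT INTO dim_country VALUES (1, 'US'), (2, 'DE');
    INSERT INTO fact_rental VALUES (1, 1, 4.0), (1, 2, 3.0), (2, 1, 5.0);
""")

# A typical reporting query: revenue by genre, joining fact to dimension.
revenue_by_genre = conn.execute("""
    SELECT m.genre, SUM(f.revenue)
    FROM fact_rental AS f
    JOIN dim_movie AS m ON m.movie_id = f.movie_id
    GROUP BY m.genre
    ORDER BY m.genre
""").fetchall()
```

A business user only needs to know that facts join to dimensions on a key, which is what makes this design directly usable for reporting.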

Dimensional data models are generally used for structured data analysis. They support most of the operational and performance reporting requirements of the business. A data warehouse is a strategic collection of all types of data in support of the decision-making process at all levels of an enterprise. This includes all types of data stores that maintain information for historical and analytical purposes. The historical data stores consolidate large quantities of information in a manner that best maintains historical integrity.

Analytical data stores are designed to support analysis by maximizing ease of access and query performance. Technologies and schema designs consist of dimensional data models, OLAP cubes, etc.

The data warehouse is populated with many different types of data from a variety of sources, which include structured data sources such as operational data, as well as system-generated data and some forms of content. Data can be ingested into the data warehouse using a combination of batch and real-time methods. Traditional extract, transform, and load (ETL) processes, or the extract, load, transform (ELT) variant, are frequently used for batch data transfer. Typically, the data warehouse has a Foundation Data layer and an Access & Performance layer. The Foundation Data layer is, by its nature, a canonical, business-neutral representation of the data (third normal form, 3NF). It focuses on historic data management at the atomic level. The Access & Performance layer, also known as the analytical layer, contains business- or function-specific models, snapshots, aggregations, and summaries.

Oracle Cloud Products:

1. Oracle Autonomous Data Warehouse: Autonomous Data Warehouse Cloud (ADWC) is a fully managed database tuned and optimized for data warehouse workloads with the market-leading performance of Oracle Database. This is Oracle’s solution to data warehousing and BI in the cloud. ADWC is compatible with all business analytics tools that support Oracle Database. It uses applied machine learning to self-tune and automatically optimize performance while the database is running. Its main features are:

• ADWC is very easy to set up, manage, and use because it is built on a fully automated database. All the major tasks, such as provisioning, patching, upgrades, backups, and performance tuning, are completely automated.
• ADWC runs on Exadata, which is very fast, scalable, and reliable. It also benefits from Oracle Database capabilities such as parallelism, columnar processing, and compression.
• ADWC allows you to scale compute and storage without downtime, thus offering high elasticity. You pay only for the resources that you consume.

For more information, see Oracle Autonomous Data Warehouse Cloud Service.

2. Oracle Database CS: Oracle Database Cloud Service provides you the ability to deploy Oracle databases in the Cloud, with each database deployment containing a single Oracle database. You have full access to the features and operations available with Oracle Database, but with Oracle providing the computing power, physical storage, and (optionally) tooling to simplify routine database maintenance and management operations. For more information, see Oracle Database Cloud Service.

3. Oracle Exadata Cloud Service: Exadata Cloud Service is offered on Oracle Cloud, using state-of-the-art Oracle-managed data centers. You can also choose Exadata Cloud at Customer, which provides Exadata Cloud Service hosted in your data center. For more information, see Oracle Exadata Cloud Service.

Enterprise Data and Reporting Use Case

[Figure: Enterprise Data and Reporting use case. Diagram labels: Enterprise Data and Reporting (Oracle: Dimension Tables, Sales Transactions); Data Lake (Hadoop/HDFS: Movie Site History, Network History; Object Store); Streaming Engine (Movie Site Topic, Network Topic); data flows: Movie Site Activity, Sales Txns, Publish Events, Network Activity.]

Data from the Lake flows into the Enterprise Data and Reporting layer, where it is enriched, cleansed, and so on, making it a trusted source. Oracle Database holds both dimensional and fact data. This includes revenue information and information about sales, transactions, movies, users, and more. Suppose we want to find out whether sales revenue in a particular country, say the UK or France, has been impacted by significant network issues. We can then analyze the sales data and the streaming network data together.

Big Data SQL

[Figure: Oracle Big Data SQL. Diagram labels: client interfaces (Node.js, Java, REST, Python, SQL, R, Graph); Oracle SQL Engine; Oracle Big Data SQL; data sources (NoSQL Database, Oracle Storage, Hadoop/HDFS).]

Use Case

Due to the expanded adoption of big data stores such as Kafka, Hadoop, and NoSQL, Oracle customers are experiencing greater challenges integrating disparate data formats within their information management systems. In addition to the various big data sources, programming environments are also expanding, including technologies such as Representational State Transfer (REST), Node.js, and Python. This evolution raises important questions for Oracle customers who want to leverage valuable big data information. Some include:

1. How do you integrate big data with your Data Warehouse?
2. How do you analyze all data together?

Big Data SQL is a data virtualization technology that allows users and applications to use Oracle’s rich SQL language across data stored in Apache Kafka, Oracle Database, Hadoop, and NoSQL stores. One query can combine data from all these sources. Oracle Information Management System unifies the data platform by providing a common query language, management platform, and security framework across Kafka, Hadoop, NoSQL, and Oracle Database. In our IM architecture, Kafka contains stream data and can answer the question "what is going on right now", whereas Oracle Database stores operational data and Hadoop stores historical data; those two sources can answer the question "how it used to be". Oracle Big Data SQL is a key component of the platform. Big Data SQL allows you to run SQL over those three sources and correlate real-time events with historical data. For more information, see Oracle Big Data SQL.
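The idea of one query spanning several stores can be illustrated with a small analogy. The sketch below does not use Oracle Big Data SQL or its syntax; it uses Python's built-in sqlite3 module and SQLite's ATTACH DATABASE statement, with three separate in-memory databases standing in for Oracle Database (sales), Hadoop (history), and Kafka (stream). All schema names, table names, and values are hypothetical.

```python
import sqlite3


def build_stores():
    """Create three separate databases reachable through one connection."""
    conn = sqlite3.connect(":memory:")  # "main" plays the operational database
    conn.execute("ATTACH DATABASE ':memory:' AS history")  # stand-in for Hadoop
    conn.execute("ATTACH DATABASE ':memory:' AS stream")   # stand-in for Kafka

    conn.execute("CREATE TABLE main.sales (country TEXT, revenue REAL)")
    conn.execute("CREATE TABLE history.network_latency (country TEXT, latency_ms REAL)")
    conn.execute("CREATE TABLE stream.network_latency (country TEXT, latency_ms REAL)")

    # Made-up sample data, for illustration only.
    conn.executemany(
        "INSERT INTO main.sales VALUES (?, ?)",
        [("UK", 100.0), ("UK", 50.0), ("France", 80.0)],
    )
    conn.executemany(
        "INSERT INTO history.network_latency VALUES (?, ?)",
        [("UK", 40.0), ("UK", 60.0), ("France", 30.0)],
    )
    conn.executemany(
        "INSERT INTO stream.network_latency VALUES (?, ?)",
        [("UK", 200.0), ("France", 30.0), ("France", 34.0)],
    )
    return conn


def correlate(conn):
    """One SQL statement that joins sales, historical latency, and current latency."""
    return conn.execute(
        """
        SELECT s.country, s.revenue, h.avg_ms AS historical_ms, r.avg_ms AS current_ms
        FROM (SELECT country, SUM(revenue) AS revenue
              FROM main.sales GROUP BY country) s
        JOIN (SELECT country, AVG(latency_ms) AS avg_ms
              FROM history.network_latency GROUP BY country) h ON h.country = s.country
        JOIN (SELECT country, AVG(latency_ms) AS avg_ms
              FROM stream.network_latency GROUP BY country) r ON r.country = s.country
        ORDER BY s.country
        """
    ).fetchall()


conn = build_stores()
result = correlate(conn)
print(result)
```

The caller writes ordinary SQL and never needs to know which store each table lives in, which is the property the text attributes to a common query language across sources.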

Big Data SQL Use Case

[Figure: Big Data SQL use case. Diagram labels: Enterprise Data and Reporting (Oracle: Dimension Tables, Sales Transactions); Big Data SQL (Transactions and Context, Historical Benchmarks, Real Time Events); Data Lake (Hadoop/HDFS: Movie Site History, Network History; Object Store); Streaming Engine (Movie Site Topic, Network Topic); data flows: Movie Site Activity, Publish Events, Network Activity.]

One of the key challenges is to query across all these sources, ensuring business users have a full view of the data. So, now we have data in three different sources:

• Kafka contains the streaming Network data and Movie site activity.
• HDFS contains the web logs, or user behavior. It stores all the Network and Movie site offline data.
• Sales Transactions and dimensional data for the MoviePlex app come from Oracle Database 12c.

In our use case, Big Data SQL easily blends real-time streams with history, benchmarks, and context. It helps us to answer questions like:

1. Are we running at peak performance?
2. What is the opportunity cost of our current network latency?
3. How do you correlate between these stores?
4. How does the current network stream compare to the benchmark? And did this lead to sales (database)?

Big Data SQL combines data in flight with data in HDFS and Oracle Database. You can use Big Data SQL to run queries joining data across Oracle Database (sales data) and HDFS (movie site activity).
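The question of how the current network stream compares to the benchmark can be sketched independently of any particular product. The following plain-Python example is only an illustration: the function name flag_degraded, the tolerance factor of 1.5, and the latency samples are all assumptions, not part of the MoviePlex use case.

```python
from statistics import mean


def flag_degraded(current, benchmark, tolerance=1.5):
    """Return countries whose mean current latency exceeds tolerance x the benchmark mean.

    current and benchmark map a country name to a list of latency samples (ms).
    The result maps each flagged country to its current/benchmark ratio.
    """
    flagged = {}
    for country, samples in current.items():
        baseline = benchmark.get(country)
        if not samples or not baseline:
            continue  # nothing to compare for this country
        ratio = mean(samples) / mean(baseline)
        if ratio > tolerance:
            flagged[country] = round(ratio, 2)
    return flagged


# Made-up samples: "current" stands for data in flight, "benchmark" for history.
current_latency = {"UK": [180, 220], "France": [30, 34]}
benchmark_latency = {"UK": [40, 60], "France": [28, 32]}

degraded = flag_degraded(current_latency, benchmark_latency)
print(degraded)
```

A country flagged here would be the natural starting point for the follow-up question in the list above: whether sales in that country dropped over the same period.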

Oracle Analytics Cloud Use Case

[Figure: Oracle Analytics Cloud use case. Diagram labels: Oracle Analytics Cloud; Enterprise Data and Reporting (Oracle: Dimension Tables, Sales Transactions); Big Data SQL (Transactions and Context, Historical Benchmarks, Real Time Events); Data Lake (Hadoop/HDFS: Movie Site History, Network History; Object Store); Streaming Engine (Movie Site Topic, Network Topic); data flows: Movie Site Activity, Sales Txns, Publish Events, Network Activity.]

In our use case, you use Oracle Analytics Cloud (OAC) to analyze the movie site activity. Here, OAC accesses MoviePlex application data using Big Data SQL. You can use OAC to create a project and analyze network errors happening in different countries, or to visualize graphs that depict how the current revenue stream compares to the benchmark. Which country is experiencing network issues and a sales drop-off at the same time? Is revenue impacted for France or the UK?

Oracle Products:

1. Oracle Analytics Cloud: With Oracle Analytics Cloud, you can take data from any source, and explore and collaborate with real-time data. You can interact with your personal data, ingest and harmonize data sources, collate and manage disparate inputs, and handle data with coherence and consistency during organizational sharing. As you visually research and discover, you can review and visualize both personal and corporate data, and gain insights at key stages of the iterative information cycle. For more information, see Oracle Analytics Cloud.

2. Oracle BI Cloud Service: Oracle BI Cloud Service is one of the Platform as a Service (PaaS) services provided by Oracle Cloud. The service offers many self-service capabilities, such as creating reports for your line of business. You can use Oracle BI Cloud Service to easily and efficiently explore data, add your own data from external sources, and create and share analyses and dashboards that enable you to solve business problems. For more information, see Oracle BI Cloud Service.

3. Oracle Data Visualization Cloud Service: Oracle Data Visualization Cloud Service makes easy yet powerful visual analytics accessible to everyone. For more information, see Oracle Data Visualization Cloud Service.

Discovery Lab

[Figure: Discovery Lab. Diagram labels: Oracle Advanced Analytics (Oracle Data Mining, Oracle R Enterprise); Oracle Big Data Spatial and Graph; Oracle R Advanced Analytics for Hadoop.]

The Discovery Lab is itself a distinct design pattern within the conceptual architecture. It includes a set of processing engines and analysis tools that are separate from the everyday processing of data. This component facilitates the discovery of new knowledge of value to the business. It is characterized by the following:

• Specific focus on identifying commercial value for exploitation.
• Small group of highly skilled individuals (Data Scientists, Data Mining practitioners, and so on).
• Iterative development approach that is data oriented rather than development oriented.
• Wide range of tools and techniques applied.
• Discovery Lab outputs that may include new knowledge, data mining models or parameters, scored data, and others.

There are many tools for analytics and data discovery, for example Apache Zeppelin, Jupyter, and R. Apache Zeppelin is a web-based notebook used for data analytics. It provides a number of useful features such as data ingestion, data discovery, data visualization, and collaboration. You can construct data-driven, interactive, and collaborative documents with SQL, Scala, and more.

Oracle Products:

1. Oracle Advanced Analytics: Oracle Advanced Analytics allows data and business analysts to extract knowledge, discover new insights, and make predictions by working directly with large data volumes in Oracle Database. With Oracle Advanced Analytics, you can discover patterns hidden in massive data volumes and immediately transform raw data into actionable insights. For more information, see Oracle Advanced Analytics.

2. Oracle R Advanced Analytics for Hadoop: Oracle R Advanced Analytics for Hadoop is a collection of R packages that provide interfaces to work with Apache Hive tables, HDFS, the local R environment, and Oracle Database tables. It provides predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files. For more information, see Oracle R Advanced Analytics for Hadoop.

3. Oracle Big Data Spatial and Graph: Oracle Big Data Spatial and Graph delivers advanced spatial and graph analytic capabilities to supported Apache Hadoop and NoSQL Database Big Data platforms. The spatial features include support for data enrichment of location information, spatial filtering and categorization based on distance and location-based analysis, and spatial data processing for vector and raster processing of digital map, sensor, satellite and aerial imagery values, and APIs for map visualization. For more information, see Oracle Big Data Spatial and Graph.

4. Oracle Analytics Cloud: With Oracle Analytics Cloud, you can take data from any source, and explore and collaborate with real-time data. You can interact with your personal data, ingest and harmonize data sources, collate and manage disparate inputs, and handle data with coherence and consistency during organizational sharing. As you visually research and discover, you can review and visualize both personal and corporate data, and gain insights at key stages of the iterative information cycle. For more information, see Oracle Analytics Cloud.
