Oracle Database 12c - Features for Big Data

Disclaimer: The information presented here is based on my own views and on information gathered from online sources. The presentation is intended only to create awareness of these features and does not describe a real solution.

Presented by Abishek V S



Agenda

• What is Big Data

• Big Data Versus RDBMS

• Oracle In-Memory Column Store

• JSON support in Oracle Database

• Oracle Database And Hadoop


What is Big Data

Big data is simply data that breaks traditional architectures due to its sheer volume, speed and variety.

Structured

Unstructured

Semi-Structured

Multiple Sources

Large Volumes


Characterization of Big Data

Volume

Variety

Velocity

From “Understanding Big Data” by IBM

Veracity, Validity, Volatility


Characterization of Big Data

"From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days… and the pace is accelerating."

Eric Schmidt, Executive Chairman, Google


Big Data: Driving Factors & Motivation

• Exponential growth of the internet

• Widespread acceptance of E-Commerce

• Growth of the Social Network

• Commoditization of the computing resources

• The per-GB cost of storage is much lower now than it was 10 years ago.

• Commodity computers have become more powerful.

• Popularity of clusters based on commodity computers

• IoT (Internet of Things)

– Day by day the devices we own are getting smarter and are learning about us.


Big Data: Technologies and Tools

• Distributed computing

– Distributed servers and storage (cloud based)

– Distributed processing, e.g. MapReduce with Hadoop

• Schema-free databases

– NoSQL databases

• In-memory

• Semi-structured data

– JSON

– Key-value pairs

• Columnar databases

• Big Data operations

– Analytic / semantic processing (e.g. R, OWLIM)


Big Data versus RDBMS

• RDBMS

– Data is stored in defined structures (tables)

– Transactional in nature

– Data consistency is a primary consideration

– Drives operational systems

– Response time is crucial

• Big Data

– Data comes in all shapes and sizes

– Behavioral Data

– Prone to rapid change

– Useful in VAS, identifying patterns not exposed by Operational systems

– The value derived is of prime importance.


Big Data versus RDBMS

RDBMS

Captures Business Transactions

Ensures Operational Efficiency

Operational Decision support

Analytics is very limited

Integrating external data is expensive

ERP, BI, ETL, Data warehouse

Big Data

Captures User behavioral data

System logs, social data

Acts as Feedback to business

New opportunity exploration

Analytics is the key focus

Technology aims at integration.

User activity log, Web Analytics, Social Media Streaming API, Hadoop Map Reduce, NoSQL data store optimized for Analytics


Oracle In-Memory Column Store

• A column-format database stores each attribute of a transaction or record in a separate column structure.

• A column format is ideal for analytics, as it allows faster data retrieval when only a few columns are selected but the query accesses a large portion of the data set.

• A column format is less efficient at processing row-wise DML: to insert or delete a single record, all of the columnar structures in the table must be changed.

• Until now you have been forced to pick just one format and accept the trade-off of either suboptimal OLTP or suboptimal analytics performance.


Oracle In-Memory Column Store

Oracle Database In-Memory provides the best of both worlds.

The in-memory column store should be sized to fit the objects that must be stored in memory.

Less than 20% overhead in terms of total memory requirements.

Database In-Memory uses an In-Memory column store (IM column store), a new component of the Oracle Database System Global Area (SGA) called the In-Memory Area. Its size is set with the INMEMORY_SIZE parameter.
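
Sizing the In-Memory Area might look roughly like the sketch below. The 2G value is purely illustrative, and in 12.1.0.2 the parameter is static, so the new size takes effect only after an instance restart.

```sql
-- Reserve space in the SGA for the IM column store.
-- 2G is an arbitrary example value, not a recommendation.
ALTER SYSTEM SET INMEMORY_SIZE = 2G SCOPE = SPFILE;

-- After restarting the instance, check how the In-Memory Area is allocated.
SELECT pool, alloc_bytes, used_bytes, populate_status
FROM   v$inmemory_area;
```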


Oracle In-Memory Column Store

• Tablespace Level

– ALTER TABLESPACE ts_data INMEMORY;

• Table Level

– ALTER TABLE sales INMEMORY NO INMEMORY(prod_id);

• Partition Level

– ALTER TABLE sales MODIFY PARTITION SALES_Q1_1998 NO INMEMORY;

• Objects are populated into the IM column store either in a prioritized list immediately after the database is opened or after they are scanned (queried) for the first time.

– ALTER TABLE customers INMEMORY PRIORITY CRITICAL;
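
To see which objects are marked for the IM column store and whether they have been populated yet, queries along these lines can help. This is a sketch that assumes the tables above live in the current schema; partitioned tables report their settings per partition in USER_TAB_PARTITIONS instead.

```sql
-- Tables in the current schema that are enabled for the IM column store.
SELECT table_name, inmemory, inmemory_priority, inmemory_compression
FROM   user_tables
WHERE  inmemory = 'ENABLED';

-- Segments that are populated (or being populated) and how much is still pending.
SELECT segment_name, populate_status, bytes, inmemory_size, bytes_not_populated
FROM   v$im_segments;
```

A table without a PRIORITY setting appears in V$IM_SEGMENTS only after it has been scanned for the first time.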


Oracle In-Memory Column Store

• In-Memory Compression

• Typically compression is considered only as a space-saving mechanism. However, data populated into the IM column store is compressed using a new set of compression algorithms that not only help save space but also improve query performance
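
The compression level is chosen per object with the MEMCOMPRESS sub-clause. The statements below are alternatives rather than a script to run in sequence, and they reuse the sales table from the earlier slide as an assumed example.

```sql
-- Default level: favors query performance.
ALTER TABLE sales INMEMORY MEMCOMPRESS FOR QUERY LOW;

-- Higher compression ratio, at some cost in scan speed.
ALTER TABLE sales INMEMORY MEMCOMPRESS FOR CAPACITY HIGH;

-- No compression at all.
ALTER TABLE sales INMEMORY NO MEMCOMPRESS;
```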


Oracle In-Memory Column Store

• In-Memory Scans

– Analytic queries typically reference only a small subset of the columns in a table.

– Oracle Database In-Memory scans only the columns needed by a SQL statement, and applies any WHERE clause filter predicates to these columns directly without decompressing them.

• In-Memory Storage Index

– A further reduction in the amount of data accessed.

– Automatically created and maintained on each of the columns in the IM column store.

– Storage Indexes allow data pruning based on the filter predicates in a SQL statement.
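
One way to check whether a query is eligible for an in-memory scan is to look at its execution plan. The sketch below assumes a sales table with cust_id, time_id and amount_sold columns (as in Oracle's SH sample schema); these names are not taken from the slides.

```sql
-- Only two columns are referenced, which suits a columnar scan.
EXPLAIN PLAN FOR
SELECT cust_id, SUM(amount_sold) AS total_sold
FROM   sales
WHERE  time_id >= DATE '1998-01-01'
GROUP  BY cust_id;

-- A TABLE ACCESS INMEMORY FULL step means the optimizer chose the IM column store.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- After actually running the query, session statistics whose names start
-- with 'IM scan' give a view of in-memory scan and pruning activity.
SELECT n.name, s.value
FROM   v$mystat s
JOIN   v$statname n ON n.statistic# = s.statistic#
WHERE  n.name LIKE 'IM scan%'
AND    s.value > 0;
```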


Oracle In-Memory Column Store

• SIMD Vector Processing

– Database In-Memory uses SIMD (Single Instruction processing Multiple Data values) vector processing.

– SIMD vector processing allows a set of column values to be evaluated together in a single CPU instruction.

• In-Memory Joins

– SQL statements that join multiple tables can also be processed very efficiently in the IM column store, as they can take advantage of Bloom filters.

– A Bloom filter transforms a join into a filter that can be applied as part of the scan of the larger table.

• In-Memory Aggregation

– Analytic-style queries often require complex aggregations and summaries.

– A new optimizer transformation, called Vector Group By, was introduced in Oracle Database 12.1.0.2 so that more complex analytic queries can be processed using new CPU-efficient algorithms.
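
Whether a join uses a Bloom filter or an aggregation uses Vector Group By is the optimizer's decision, so the most practical check is the execution plan. The sketch below assumes sales and customers tables shaped like Oracle's SH sample schema; the column names are assumptions.

```sql
-- Join a large fact table to a filtered dimension table and aggregate.
EXPLAIN PLAN FOR
SELECT c.cust_city, SUM(s.amount_sold) AS total_sold
FROM   sales s
JOIN   customers c ON c.cust_id = s.cust_id
WHERE  c.cust_state_province = 'CA'
GROUP  BY c.cust_city;

-- JOIN FILTER CREATE / JOIN FILTER USE steps indicate a Bloom filter;
-- KEY VECTOR and VECTOR GROUP BY steps indicate In-Memory Aggregation.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Neither set of steps is guaranteed to appear; the plan depends on statistics, table sizes and whether the tables are populated in the IM column store.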


JSON support in Oracle Database

• JSON (JavaScript Object Notation) is a fast-growing data format often used in web and mobile applications.

• JSON is also used as a data interchange format

– More lightweight

– Less bandwidth-intensive

• JSON integrates well into web pages because JavaScript can consume JSON directly.


JSON support in Oracle Database

• JSON is gaining popularity

– APIs (application programming interfaces)

• Most social network providers offer JSON-based data service APIs.

• Web services: RESTful (Representational State Transfer)

– Big Data

• Many NoSQL databases use JSON as the storage format

– MongoDB, CouchDB, and Riak

– Internet of Things (IoT)

• As more personal devices and appliances become smart and connect to the internet, JSON is becoming the format of choice because it is lightweight and better suited to these devices.


JSON support in Oracle Database

• JSON in Oracle Database 12c R1 (12.1.0.2)

– Creating Tables to Hold JSON

– Querying JSON Data

• Dot Notation

• IS JSON

• JSON_EXISTS

• JSON_VALUE

• JSON_QUERY

• JSON_TABLE

• JSON_TEXTCONTAINS

– Identifying Columns Containing JSON

– Loading JSON Files Using External Tables


JSON support in Oracle Database

• Creating Tables to Hold JSON

– No new data type has been added to support JSON. Instead, it is stored in regular VARCHAR2 or CLOB columns.

– The IS JSON constraint indicates the column contains valid JSON data.

CREATE TABLE json_documents (
  id   RAW(16) NOT NULL,
  data CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json_chk CHECK (data IS JSON)
);

– Checking can be lax (the default) or strict, e.g. CHECK (data IS JSON (STRICT)).

– The [USER|ALL|DBA]_JSON_COLUMNS views can be used to identify tables and columns containing JSON data.
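
The difference between lax and strict checking can be sketched as follows. The table json_documents_strict and its constraint names are hypothetical, introduced only for this illustration.

```sql
-- Same shape as json_documents, but with strict syntax checking.
CREATE TABLE json_documents_strict (
  id   RAW(16) NOT NULL,
  data CLOB,
  CONSTRAINT json_documents_strict_pk  PRIMARY KEY (id),
  CONSTRAINT json_documents_strict_chk CHECK (data IS JSON (STRICT))
);

-- Lax syntax tolerates unquoted field names; strict syntax does not,
-- so this insert should be rejected with a check constraint violation.
INSERT INTO json_documents_strict (id, data)
VALUES (SYS_GUID(), '{ FirstName : "John" }');

-- List the columns in the current schema that are known to hold JSON.
SELECT table_name, column_name, format, data_type
FROM   user_json_columns;
```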


INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
          "FirstName" : "John",
          "LastName" : "Doe",
          "Job" : "Clerk",
          "Address" : {
            "Street" : "99 My Street",
            "City" : "My City",
            "Country" : "UK",
            "Postcode" : "A12 34B"
          },
          "ContactDetails" : {
            "Email" : "[email protected]",
            "Phone" : "44 123 123456",
            "Twitter" : "@johndoe"
          },
          "DateOfBirth" : "01-JAN-1980",
          "Active" : true
        }');


COLUMN FirstName FORMAT A15
COLUMN LastName FORMAT A15
COLUMN Postcode FORMAT A10
COLUMN Email FORMAT A25

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.Address.Postcode AS Postcode,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
ORDER BY a.data.FirstName,
         a.data.LastName;

FIRSTNAME       LASTNAME        POSTCODE   EMAIL
--------------- --------------- ---------- -------------------------
Jayne           Doe             A12 34B    [email protected]
John            Doe             A12 34B    [email protected]


• IS JSON

– The IS JSON condition can be used to test whether a column contains valid JSON data.

– SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name FROM json_documents_no_constraint a WHERE a.data IS JSON;

• JSON_EXISTS

– Checks whether an element exists in the document at the specified JSON path, similar in spirit to an IS NOT NULL test.

• JSON_VALUE

– Returns a scalar value from the JSON document, based on the specified JSON path.

• JSON_QUERY

– Returns a JSON fragment (an object or array) representing one or more values.

• JSON_TABLE

– Incorporates all the functionality of JSON_VALUE, JSON_EXISTS and JSON_QUERY.

– Makes JSON data look like relational data, which is especially useful when creating relational views over JSON data.

• JSON_TEXTCONTAINS

– Works with JSON indexes and enables faster text searching through the JSON data.
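
As a brief illustration of the functions above, the queries below run against the json_documents table and the sample document inserted earlier. They are a sketch; the column aliases and VARCHAR2 lengths are arbitrary choices.

```sql
-- JSON_EXISTS: keep only documents that have a Twitter handle.
SELECT JSON_VALUE(a.data, '$.FirstName') AS first_name
FROM   json_documents a
WHERE  JSON_EXISTS(a.data, '$.ContactDetails.Twitter');

-- JSON_VALUE returns a scalar; JSON_QUERY returns a JSON fragment.
SELECT JSON_VALUE(a.data, '$.Address.City')   AS city,
       JSON_QUERY(a.data, '$.ContactDetails') AS contact_details
FROM   json_documents a;

-- JSON_TABLE projects JSON fields as relational columns.
SELECT jt.first_name, jt.last_name, jt.city
FROM   json_documents a,
       JSON_TABLE(a.data, '$'
         COLUMNS (first_name VARCHAR2(50) PATH '$.FirstName',
                  last_name  VARCHAR2(50) PATH '$.LastName',
                  city       VARCHAR2(50) PATH '$.Address.City')) jt;
```

Wrapping the JSON_TABLE query in CREATE VIEW is one way to expose JSON documents to tools that only understand relational tables.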


JSON support in Oracle Database

Loading JSON Files Using External Tables

• Create the directory objects for use with the external table.

CREATE OR REPLACE DIRECTORY order_entry_dir
  AS '/u01/app/oracle/product/12.1.0.2/db_1/demo/schema/order_entry';
GRANT READ, WRITE ON DIRECTORY order_entry_dir TO test;

CREATE OR REPLACE DIRECTORY loader_output_dir AS '/tmp';
GRANT READ, WRITE ON DIRECTORY loader_output_dir TO test;

• Create the external table and query it to check if it is working.

CREATE TABLE json_dump_file_contents (json_document CLOB)
  ORGANIZATION EXTERNAL (
    TYPE ORACLE_LOADER
    DEFAULT DIRECTORY order_entry_dir
    ACCESS PARAMETERS (
      RECORDS DELIMITED BY 0x'0A'
      DISABLE_DIRECTORY_LINK_CHECK
      BADFILE loader_output_dir: 'JSONDumpFile.bad'
      LOGFILE order_entry_dir: 'JSONDumpFile.log'
      FIELDS (json_document CHAR(5000))
    )
    LOCATION (order_entry_dir:'PurchaseOrders.dmp')
  )
  PARALLEL
  REJECT LIMIT UNLIMITED;


JSON support in Oracle Database

SELECT COUNT(*) FROM json_dump_file_contents;

  COUNT(*)
----------
     10000

• You can now load the database table with the contents of the external table.

TRUNCATE TABLE json_documents;

INSERT /*+ APPEND */ INTO json_documents
SELECT SYS_GUID(), json_document
FROM   json_dump_file_contents
WHERE  json_document IS JSON;

COMMIT;
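
A quick sanity check after the load could look like the sketch below. It only uses the two tables defined above; the column aliases are arbitrary.

```sql
-- Lines in the external file that fail the JSON check and were therefore skipped.
SELECT COUNT(*) AS invalid_lines
FROM   json_dump_file_contents
WHERE  json_document IS NOT JSON;

-- Documents that made it into the target table.
SELECT COUNT(*) AS loaded_documents
FROM   json_documents;
```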


Oracle Database And Hadoop

• No Big Data discussion is complete without mentioning Hadoop.

• Hadoop is a distributed computing framework.

• It runs batch operations (MapReduce) on distributed clusters built from commodity computers.

• It stores data in a distributed, clustered filesystem (HDFS).

• Hadoop clusters follow a shared-nothing architecture.


Oracle Database And Hadoop

• MapReduce Paradigm


Oracle Database And Hadoop

• In-Database MapReduce

• Avoid Shipping of data residing in RDBMS to an external infrastructure

• Database security can be applied to the processed data.

• Shorter learning curve for both Developers and DBAs

• Mix SQL with MapReduce processing for flexibility and efficiency

• Uses PL/SQL or Java pipelined functions

INSERT INTO OUTTABLE
SELECT *
FROM   TABLE(Word_Count_Reduce(:ConfKey,
         CURSOR(SELECT *
                FROM   TABLE(Word_Cursor_Map(:ConfKey,
                         CURSOR(SELECT * FROM InTable))))));


Oracle Database And Hadoop

• Pipelined functions can return a stream of rows and can also take one as input.

• They can be parallelized with a partition key.

• They can be implemented in PL/SQL, Java or C.

• An in-database MapReduce job consists of two pipelined functions: one for the mapper and one for the reducer.

• The mapper's input source can be an external table, and the reducer's output can be placed in a database table or written out to a file on the filesystem.

• It can leverage external tables and DBFS, or use Java or C to write to files.

• The opportunities are endless when coupled with other database features and options.

• The database scheduler can be used to schedule the MapReduce job.

• It can be clustered across distributed databases using database links.

• RAC adds fault tolerance and scalability.
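
The mapper/reducer pairing described above can be sketched as a word count built from two PL/SQL pipelined functions. This is a minimal illustration, not the Word_Cursor_Map / Word_Count_Reduce implementation from the earlier slide: every name here (in_docs, word_count_t, word_count_tab, wc_types, word_map, word_reduce) is hypothetical, and the reducer simply accumulates totals in memory.

```sql
-- Hypothetical input table: one document per row.
CREATE TABLE in_docs (doc_text VARCHAR2(4000));

CREATE OR REPLACE TYPE word_count_t AS OBJECT (word VARCHAR2(4000), cnt NUMBER);
/
CREATE OR REPLACE TYPE word_count_tab AS TABLE OF word_count_t;
/
-- Strongly typed ref cursors, needed for hash partitioning in the reducer.
CREATE OR REPLACE PACKAGE wc_types AS
  TYPE doc_rec IS RECORD (doc_text VARCHAR2(4000));
  TYPE doc_cur IS REF CURSOR RETURN doc_rec;
  TYPE wc_rec  IS RECORD (word VARCHAR2(4000), cnt NUMBER);
  TYPE wc_cur  IS REF CURSOR RETURN wc_rec;
END wc_types;
/
-- Mapper: emits (word, 1) for every word in every input row.
CREATE OR REPLACE FUNCTION word_map (p_docs wc_types.doc_cur)
  RETURN word_count_tab PIPELINED
  PARALLEL_ENABLE (PARTITION p_docs BY ANY)
IS
  l_doc  wc_types.doc_rec;
  l_word VARCHAR2(4000);
  l_occ  PLS_INTEGER;
BEGIN
  LOOP
    FETCH p_docs INTO l_doc;
    EXIT WHEN p_docs%NOTFOUND;
    l_occ := 1;
    LOOP
      -- Pick the n-th run of alphanumeric characters as the n-th word.
      l_word := REGEXP_SUBSTR(l_doc.doc_text, '[[:alnum:]]+', 1, l_occ);
      EXIT WHEN l_word IS NULL;
      PIPE ROW (word_count_t(LOWER(l_word), 1));
      l_occ := l_occ + 1;
    END LOOP;
  END LOOP;
  CLOSE p_docs;
  RETURN;
END word_map;
/
-- Reducer: sums the counts per word. Hash partitioning on word sends all
-- pairs for a given word to the same parallel server process.
CREATE OR REPLACE FUNCTION word_reduce (p_pairs wc_types.wc_cur)
  RETURN word_count_tab PIPELINED
  PARALLEL_ENABLE (PARTITION p_pairs BY HASH (word))
IS
  TYPE totals_t IS TABLE OF NUMBER INDEX BY VARCHAR2(4000);
  l_totals totals_t;
  l_pair   wc_types.wc_rec;
  l_key    VARCHAR2(4000);
BEGIN
  LOOP
    FETCH p_pairs INTO l_pair;
    EXIT WHEN p_pairs%NOTFOUND;
    IF l_totals.EXISTS(l_pair.word) THEN
      l_totals(l_pair.word) := l_totals(l_pair.word) + l_pair.cnt;
    ELSE
      l_totals(l_pair.word) := l_pair.cnt;
    END IF;
  END LOOP;
  CLOSE p_pairs;
  -- Emit one row per distinct word.
  l_key := l_totals.FIRST;
  WHILE l_key IS NOT NULL LOOP
    PIPE ROW (word_count_t(l_key, l_totals(l_key)));
    l_key := l_totals.NEXT(l_key);
  END LOOP;
  RETURN;
END word_reduce;
/
-- Chain mapper and reducer in plain SQL.
SELECT word, cnt
FROM   TABLE(word_reduce(CURSOR(
         SELECT word, cnt
         FROM   TABLE(word_map(CURSOR(SELECT doc_text FROM in_docs))))))
ORDER  BY cnt DESC, word;
```

Holding the totals in an associative array keeps the sketch short but gives up streaming in the reducer; a production version would more likely rely on clustered input and emit each word as soon as its group ends.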


Oracle Database And Hadoop

• Oracle In-Database Hadoop

• We will look at this in a future discussion …


The Road Ahead

• Big Data/NoSQL databases WILL NOT replace RDBMS databases.

• Oracle's roadmap has favored single-vendor solutions.

• Reusing available resources: both technology and people.

• Oracle is building more appliance-based solutions.


The Road Ahead

• Oracle Big Data Products

– Oracle Big Data Management

• Oracle Big Data Appliance

• Oracle Big Data SQL

• Oracle NoSQL Database

– Oracle Big Data Integration

• Oracle GoldenGate

• Oracle Data Integrator

• Oracle Event Processing

– Big Data Analytics

• Oracle Big Data Discovery

• Oracle Advanced Analytics

• Oracle Business Intelligence Foundation


Please mail me at [email protected]
