Copyright © 2015 Oracle and/or its affiliates. All rights reserved.
Oracle Big Data Spatial and Graph: An Overview
July, 2015
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Agenda
1 Introduction to Big Data Spatial and Graph
2 Big Data – Graph Features
3 Big Data – Spatial Features
4 Resources
5 Q & A
Oracle’s Spatial and Graph Strategy
Enable Spatial and Graph use cases on every Big Data platform
NoSQL
• Oracle Big Data Spatial and Graph
• Oracle Database Spatial and Graph
• Spatial and Graph in Cloud Offerings
Oracle Big Data Spatial and Graph
Property Graph for Analysis of:
• Social Media relationships
• Internet of Things interactions
• Cyber-Security
Spatial Analysis Features for:
• Location Data Enrichment
• Proximity and containment analysis
• Preparation of digital map and imagery data sets
Conventional database or Big Data technologies
Typical technical decision criteria
[Radar chart comparing Hadoop and Relational on a 0 to 5 scale across these criteria: Tooling maturity; Stringent Non-Functionals; ACID transactional requirement; Security; Variety of data formats; Data sparsity; ETL simplicity; Cost effectively store low value data; Ingestion rate; Straight Through Processing (STP)]
The Big Picture – Oracle Big Data Management System
SOURCES
DATA RESERVOIR DATA WAREHOUSE
Oracle Database
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial & Graph
Big Data Appliance
Apache Flume
Oracle GoldenGate
Oracle Event Processing
Cloudera Hadoop
Oracle Big Data SQL
Oracle NoSQL
Oracle R Distribution
Oracle Big Data Spatial and Graph
Oracle Database
In-Memory, Multi-tenant
Oracle Industry Models
Oracle Advanced Analytics
Oracle Spatial and Graph
Exadata
Oracle GoldenGate
Oracle Event Processing
Oracle Data Integrator
Oracle Big Data Connectors
Oracle Data Integrator B
Program Agenda
1 Introduction to Big Data Spatial and Graph
2 Big Data – Graph Features
3 Big Data – Spatial Features
4 Resources
5 Q & A
Why Graph Databases Now?
• Rise of social networking
  – Google, Yahoo, Twitter, Facebook, LinkedIn
• Enterprise applications increasingly need to model data relationships
  – Telecoms: Network & data center management, identity management
  – Financial Services: Fraud detection; cross-selling
  – Media & Publishing: Social apps, recommendation, sentiment
  – Health Care: CRM, fraud detection
• Modeling complex relationships as graphs is efficient
  – Improves performance
  – Simplifies queries, traversal, search and analytics
Graph Data Models
Use Case | Graph Model | Industry Domain
RDF Data Model
• Data federation
• Knowledge representation
• Semantic Web
Property Graph Model
• Graph Data Management
• Social Network Analysis
• Entity analytics
Network Data Model
• Network path analysis
• Transportation modeling
Industry domains listed on the slide:
• National Intelligence, Public Safety, Social Media search, Marketing - Sentiment
• Life Sciences, Health Care, Publishing, Finance
• Logistics, Transportation, Utilities, Telecoms
Graph for Social and Unstructured Data Analysis
Graph is a powerful tool for Data Analysis
• By representing your data as a graph, you capture fine-grained, arbitrary relationships between data entities
• Individual relationships are represented as links
• When analyzing such a graph, you are using explicit relationships to find implicit information about your data
• Without computing multiple joins
Graph Analysis Examples
• Attribute searching (Get people with a given name)
• Node/edge adjacency (Get people that like a given Web page)
• Fixed-length paths (Get the friends of the friends of a given person)
• Reachability (Is there a “friend” connection between two people?)
• Pattern matching (Get the common friends between two people)
• Aggregates (Get the number of friends of a given person)
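The six query types listed here can be sketched on a toy in-memory graph. This is a minimal Python illustration, not the product API: the people, friendships, page names, and every function name below are made up for the example.

```python
from collections import deque

# Hypothetical toy data: vertex id -> properties.
people = {
    1: {"name": "Ann"},
    2: {"name": "Bob"},
    3: {"name": "Cho"},
    4: {"name": "Dee"},
    5: {"name": "Ann"},
}
# Undirected "friend" edges and "likes" edges from people to Web pages.
friend_edges = [(1, 2), (2, 3), (3, 4), (1, 3)]
likes = {1: {"example.com"}, 3: {"example.com"}, 4: {"example.org"}}

# Adjacency sets make neighbour lookups direct, with no join step.
friends = {v: set() for v in people}
for a, b in friend_edges:
    friends[a].add(b)
    friends[b].add(a)


def by_name(name):
    """Attribute searching: people with a given name."""
    return sorted(v for v, p in people.items() if p["name"] == name)


def who_likes(page):
    """Node/edge adjacency: people that like a given Web page."""
    return sorted(v for v, pages in likes.items() if page in pages)


def friends_of_friends(v):
    """Fixed-length paths: two hops away, excluding v and direct friends."""
    two_hops = set()
    for f in friends[v]:
        two_hops |= friends[f]
    return sorted(two_hops - friends[v] - {v})


def reachable(src, dst):
    """Reachability: breadth-first search over friend edges."""
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for n in friends[cur] - seen:
            seen.add(n)
            queue.append(n)
    return False


def common_friends(a, b):
    """Pattern matching: vertices adjacent to both a and b."""
    return sorted(friends[a] & friends[b])


def friend_count(v):
    """Aggregates: number of friends of a given person."""
    return len(friends[v])
```

Each function walks explicit links, which is the point the slides make about avoiding multiple joins.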
Common Graph Analysis Use Cases
• Product Recommendation: Recommend the most similar item purchased by similar people
• Influencer Identification: Find people that are central in the given network, e.g. influencer marketing
• Community Detection: Identify groups of people that are close to each other, e.g. target group marketing
• Graph Pattern Matching: Find all the sets of entities that match the given pattern, e.g. fraud detection
[Diagram labels: Purchase Record (customer, items); Communication Stream (e.g. tweets)]
CyberSecurity Modeling / Internet of Things
• Property graph model
• Dynamic construction of IP network
• The graph includes metadata as well as events/enriched data
• Extensible by other data sources (add properties, relations)
• Search – Text search on graph DB properties
Graph Solution Workflows and Characteristics
• Graph Data Management
  – Raw business data is converted to graph format and persisted in HDFS
  – Graph queries on HDFS or NoSQL using Java and REST APIs
• Analysis and Exploration (in-memory analysis engine)
  – Data scientists try different ideas (algorithms) on the data
  – Flexible, interactive, iterative, small-scale (sampled), …
• Production phase
  – Important discoveries are applied to the production system
  – Fixed, automated, batch-oriented, large-scale, …
[Diagram labels: Data Entities; Graph Persistence (HBase, NoSQL); Graph Query and Analysis; Ideas about the data; Discoveries on the data; Production System]
Enterprise Requirements
Performance
• Distributed Processing: HBase, NoSQL load, index, query, search
• Parallel, in-memory graph analytics
Scalability
• Horizontal scalability
• Concurrency: multiple users and analytic operations on one or more in-memory graphs
• Filtering to refine in-memory graph requirements
Ease of Programming
• Easy to use parallel & distributed analytics
• Ease of graph data modeling
• Popular open source Java and REST APIs
Manageability
• Integration with HBase and Oracle NoSQL Database features for indexing, sharding, fault tolerance, availability, installation, and management
• Option to Oracle BDA
Big Data Graph Detail
Graph Data Model
What is a graph?
– A set of links and nodes (and optionally attributes)
– A graph is simply linked data
Why do we care?
– Graphs are everywhere
  • Social networks/Social Web (Facebook, LinkedIn, Twitter, Baidu, Google+, …)
  • Cyber networks, power grids, protein interaction graphs
  • Knowledge graphs (IBM Watson, Apple Siri, Google Knowledge Graph)
– Graphs are intuitive and flexible
  • Easy to navigate, easy to form a path, natural to visualize
  • Do not require a predefined schema
[Figure: example graph with nodes A, B, C, D, E, F]
The Property Graph Data Model
• A set of vertices (or nodes)
  – each vertex has a unique identifier.
  – each vertex has a set of in/out edges.
  – each vertex has a collection of key-value properties.
• A set of edges (or links)
  – each edge has a unique identifier.
  – each edge has a head/tail vertex.
  – each edge has a label denoting type of relationship between two vertices.
  – each edge has a collection of key-value properties.
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
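The model above maps directly onto two small record types. The following Python sketch is only an illustration of the data model, assuming the usual convention that an edge leaves its tail vertex and points at its head vertex; the class and method names are invented here and are not the Blueprints or Oracle API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)
class Vertex:
    id: int                                             # unique identifier
    properties: Dict[str, Any] = field(default_factory=dict)
    # Edge lists are left out of repr to avoid vertex -> edge -> vertex recursion.
    out_edges: List["Edge"] = field(default_factory=list, repr=False)
    in_edges: List["Edge"] = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Edge:
    id: int                                             # unique identifier
    tail: Vertex                                        # vertex the edge leaves
    head: Vertex                                        # vertex the edge points to
    label: str                                          # type of relationship
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory container for vertices and edges."""

    def __init__(self):
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, vid, **properties):
        if vid in self.vertices:
            raise ValueError(f"duplicate vertex id {vid}")
        vertex = Vertex(vid, dict(properties))
        self.vertices[vid] = vertex
        return vertex

    def add_edge(self, eid, tail_id, head_id, label, **properties):
        if eid in self.edges:
            raise ValueError(f"duplicate edge id {eid}")
        tail, head = self.vertices[tail_id], self.vertices[head_id]
        edge = Edge(eid, tail, head, label, dict(properties))
        tail.out_edges.append(edge)   # keep each vertex's in/out edge sets current
        head.in_edges.append(edge)
        self.edges[eid] = edge
        return edge
```

For example, `g.add_edge(10, 1, 2, "knows", since=2015)` records a labeled, attributed relationship between two previously added vertices.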
Big Data Spatial and Graph (BDSG) Property Graph Features
• Highly scalable graph database and analytics engine
• Implemented on Apache HBase and Oracle NoSQL Database
• Rich developer APIs
  – Blueprints, REST, Java graph plus support for Groovy, Python, PHP, Perl, Ruby, and JavaScript
• Fast, scalable suite of social network analysis functions
  – Ranking, centrality, recommender, community detection, path finding…
  – Targeted to address main industry requirements
• Manageability
  – Bulk load
  – Console to execute Java and Gremlin APIs
Big Data Graph Architecture
[Architecture diagram. Labels:]
• Scalable and Persistent Storage: Property Graph Support on Apache HBase and Oracle NoSQL
• Graph Data Access Layer API: Blueprints & SolrCloud / Lucene
• Graph Analytics: In-memory Analytic Engine
• REST Web Service: Python, Perl, PHP, Ruby, JavaScript, …
• Java APIs
Support for Open Source TinkerPop Graph Tool Stack
The Oracle Big Data Spatial and Graph Blueprints API implementation provides support for the TinkerPop component stack, the de facto graph database standard.
These components include a query language, dataflow, REST APIs, and others.
Data Format Support
• GML, GraphML, GraphSON
• Oracle-defined Property Graph flat files
  – Vertex file, Edge file
  – Support basic data types + Date with Timezone + Serializable objects
  – Allow multiple data types to be associated with one key
  – UTF8 based
  – Example records:
    1,name,1,Barack%20Obama,,
    1,age,2,,53,
    1,likes,1,scrabble,,
    1,likes,5,,,2009-01-20T00:00:00.000-05:00
    1,occupation,1,44th%20president%20of%20United%20States%20of%20America,,
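The sample records suggest a six-field layout: vertex ID, key, a numeric type code, then separate slots for a string, a number, and a date value. The Python sketch below reads records on that assumption. The field layout and the meaning of type codes 1, 2, and 5 are inferred only from the sample above, not from a format specification, and the function names are hypothetical.

```python
from datetime import datetime
from urllib.parse import unquote

SAMPLE = """\
1,name,1,Barack%20Obama,,
1,age,2,,53,
1,likes,1,scrabble,,
1,likes,5,,,2009-01-20T00:00:00.000-05:00
1,occupation,1,44th%20president%20of%20United%20States%20of%20America,,
"""


def parse_vertex_line(line):
    """Split one record into (vertex_id, key, typed_value)."""
    vid, key, type_code, text, number, date = line.rstrip("\n").split(",")
    if type_code == "1":        # assumed: percent-encoded string
        value = unquote(text)
    elif type_code == "2":      # assumed: integer
        value = int(number)
    elif type_code == "5":      # assumed: date with time zone offset
        value = datetime.fromisoformat(date)
    else:
        raise ValueError(f"type code {type_code!r} is not covered by this sketch")
    return int(vid), key, value


def load_vertices(text):
    """Group values by vertex and key; one key may hold values of several types."""
    vertices = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        vid, key, value = parse_vertex_line(line)
        vertices.setdefault(vid, {}).setdefault(key, []).append(value)
    return vertices
```

Storing a list per key reflects the slide's point that multiple data types can be associated with one key: here `likes` holds both a string and a date.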
Text Search through Apache Lucene/Solr
• Integration with Apache Lucene/Solr
• Support manual and auto indexing of Graph elements
• Manual index:
  oraclePropertyGraph.createIndex("my_index", Vertex.class);
  indexVertices = oraclePropertyGraph.getIndex("my_index", Vertex.class);
  indexVertices.put("key", "value", myVertex);
• Auto index:
  oraclePropertyGraph.createKeyIndex("name", Edge.class);
  oraclePropertyGraph.getEdges("name", "*hello*world");
• Enables queries to use syntax like “*oracle* or *graph*”
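To make the wildcard syntax concrete, here is a rough Python emulation of a key lookup with `*` wildcards and an `or` between alternatives. It is only a sketch of the query semantics as the slide describes them: the real feature is backed by Lucene/Solr indexes, whose tokenization and matching rules differ, and the `edges` data and `get_edges` helper are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical edge store: edge id -> key/value properties.
edges = {
    "e1": {"name": "hello big world"},
    "e2": {"name": "oracle graph"},
    "e3": {"name": "hello"},
}


def get_edges(key, pattern):
    """Return ids of edges whose value for `key` matches any alternative.

    Alternatives are separated by " or " (an assumption about the syntax);
    within each alternative, "*" matches any run of characters.
    """
    alternatives = [p.strip() for p in pattern.split(" or ")]
    return sorted(
        eid
        for eid, props in edges.items()
        if key in props
        and any(fnmatchcase(str(props[key]), alt) for alt in alternatives)
    )
```

With this toy data, `get_edges("name", "*hello*world")` selects only the edge whose name contains "hello" and ends in "world".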
In-Memory Graph Analysis Framework
• Large graph analysis is time-consuming because …
  – The computation typically involves touching most nodes and edges in the graph
  – The data-access pattern is random
• In-memory, parallel framework for fast graph analytics
• Exploits the architecture of modern servers
  – The computation is parallelized using multiple CPU cores
  – The non-sequential data-access is mitigated with large DRAMs
• J2EE container support (WLS, Tomcat, Jetty)
35 Graph Functions
Detecting Components and Communities
• Tarjan’s, Kosaraju’s, Weakly Connected Components, Label Propagation (w/ variants), Soman and Narang’s
Ranking and Walking
• PageRank, Personalized PageRank, Betweenness Centrality (w/ variants), Closeness Centrality, Degree Centrality, Eigenvector Centrality, HITS, Random walking and sampling (w/ variants)
Evaluating Community Structures
• Conductance, Modularity, Clustering Coefficient (Triangle Counting), Adamic-Adar
Path-Finding
• Hop-Distance (BFS), Dijkstra’s, Bi-directional Dijkstra’s, Bellman-Ford’s
Link Prediction
• SALSA (Twitter’s Who-to-follow)
Other Classics
• Vertex Cover, Minimum Spanning-Tree (Prim’s)
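Two of the listed functions are small enough to sketch in plain Python: PageRank by power iteration and hop distance by breadth-first search. These are textbook single-threaded versions written for illustration, not the parallel in-memory engine; the function names, the damping factor of 0.85, and the toy graph are assumptions.

```python
from collections import deque


def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    out_links maps every vertex to the list of vertices it points to;
    every vertex must appear as a key, even if its list is empty.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread over all vertices.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank


def hop_distance(out_links, source):
    """Breadth-first search: number of hops from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in out_links[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


toy_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
```

Both routines touch most vertices and edges in an irregular order, which is the access pattern the in-memory framework is designed around.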
Unique Graph Filtering Operations
• Graph analysis engine reads graph into memory from HBase or NoSQL
  – Can reach memory limit for huge graphs
  – Subgraph mechanism to address this
• Data Access Layer filtering used to create subgraph for Property Graph Engine analytics
• Persisted graph can still be modified through Java and REST APIs
  – Changes can be propagated to in-memory graph
[Diagram labels: Oracle Property Graph or RDF (HBase or NoSQL); Property Graph Engine; Analytic Requests; Transactional Request]
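The subgraph idea can be sketched as a pair of predicates applied while loading: only vertices and edges that pass the filters reach the in-memory structure. This Python fragment is a conceptual illustration with invented names and data; it does not show the Data Access Layer API.

```python
def filter_subgraph(vertices, edges, vertex_pred, edge_pred):
    """Keep vertices that pass vertex_pred, and edges that pass edge_pred
    and whose two endpoints were both kept."""
    kept_vertices = {vid: props for vid, props in vertices.items() if vertex_pred(props)}
    kept_edges = [
        e for e in edges
        if edge_pred(e) and e["src"] in kept_vertices and e["dst"] in kept_vertices
    ]
    return kept_vertices, kept_edges


# Hypothetical persisted graph.
all_vertices = {
    1: {"country": "US"},
    2: {"country": "US"},
    3: {"country": "FR"},
}
all_edges = [
    {"src": 1, "dst": 2, "label": "friend"},
    {"src": 2, "dst": 3, "label": "friend"},
    {"src": 1, "dst": 2, "label": "likes"},
]

# Load only "friend" edges between US vertices into memory.
sub_vertices, sub_edges = filter_subgraph(
    all_vertices,
    all_edges,
    vertex_pred=lambda props: props["country"] == "US",
    edge_pred=lambda e: e["label"] == "friend",
)
```

Dropping edges whose endpoints were filtered out keeps the in-memory subgraph self-consistent for the analytics that run on it.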
Program Agenda
1 Introduction to Big Data Spatial and Graph
2 Big Data – Graph Features
3 Big Data – Spatial Features
4 Resources
5 Q & A
What is Spatial Data
Integral part of almost every database
• Business data that contains or describes location
  – Geographic features (roads, rivers, parks, etc.)
  – Assets (pipelines, cables, transformers)
  – Sales data (sales territory, customer registration, etc.)
  – Street and postal address (customers, stores, factories, etc.)
• Anything associated with a physical location
• Described by coordinates or implicitly as text (place name), ...
• Location is a “universal key” relating otherwise unrelated entities
Oracle Big Data Spatial Overview
• Oracle Big Data Spatial and Graph on Apache Hadoop is a framework that uses MapReduce programs and the analytic capabilities of a Hadoop cluster to store, access, and analyze spatial data
• The spatial features provide a schema and functions that facilitate the storage, retrieval, update, and query of collections of spatial data
• Spatial data is loaded for query and analysis by the Spatial Server, and images are stored and processed by an Image Processing Framework
Oracle Big Data – Spatial Features
• Geo-enrichment for Data Harmonization
  – Resolution of location-related information
  – Determination of location hierarchies
• Categorization and filtering
  – Tracking, proximity analysis, geo-fencing and categorization based on location
• Data preparation
  – Large scale geoprocessing for cleansing, preparation of imagery, sensor data, and raw data input
• Data visualization
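Geo-fencing and containment checks come down to a point-in-polygon test. A minimal Python sketch using the standard ray-casting method follows, assuming planar (Cartesian) coordinates and a simple polygon; the function names and the square fence are illustrative and are not the product's operators.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses.

    polygon is a list of (x, y) vertices in order; an odd number of crossings
    means the point is inside. Points exactly on an edge are not treated specially.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                      # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def inside_fence(points, fence):
    """Return the names of the points that fall inside the fence polygon."""
    return sorted(name for name, (x, y) in points.items() if point_in_polygon(x, y, fence))


# Hypothetical fence and tracked positions.
fence = [(0, 0), (4, 0), (4, 4), (0, 4)]
positions = {"truck-1": (2, 2), "truck-2": (5, 2), "truck-3": (1, 3.5)}
```

On geodetic (longitude/latitude) data a planar test like this is only an approximation for small areas.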
Linking information by location
Are these data points related?
• Tweet: sailing by #goldengate
• Instagram image subtitle: 골든게이트 교*
• Text message: Driving on 101 North, just reached border between Marin County and San Francisco County
• GPS Sensor: N 37°49′11″ W 122°28′44″
• Now find all data points around Golden Gate Bridge ...
* Golden Gate Bridge (in Korean)
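Once each data point has been resolved to coordinates, "around Golden Gate Bridge" becomes a distance filter. The Python sketch below converts the GPS reading from the slide to decimal degrees and applies a haversine (great-circle) distance on a spherical Earth. The two comparison points, the 5 km radius, and the function names are made-up illustrative values; resolving hashtags or place names to coordinates is not shown.

```python
import math


def dms_to_decimal(hemisphere, degrees, minutes, seconds):
    """Convert degrees/minutes/seconds to signed decimal degrees (S and W are negative)."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in kilometres, assuming a sphere of mean Earth radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


# GPS sensor reading from the slide: N 37°49′11″ W 122°28′44″
bridge = (dms_to_decimal("N", 37, 49, 11), dms_to_decimal("W", 122, 28, 44))

# Hypothetical, already-geocoded data points (latitude, longitude).
points = {
    "tweet": (37.8100, -122.4770),
    "far_away_post": (37.3382, -121.8863),
}

nearby = sorted(
    name for name, (lat, lon) in points.items()
    if haversine_km(bridge[0], bridge[1], lat, lon) <= 5.0
)
```

This is how location acts as a shared key: records from unrelated sources are linked only by falling within the same radius.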
Oracle Big Data Spatial Use Cases
Industry | Usage
Utilities | Smart grids, dynamic demand response, network utilization
Financial services, Insurance | Fraud analysis, origin-destination/flow, site planning, demographic analysis, risk assessment
Telecomm | Network monitoring and planning, location based advertising, tracking/location based services
Transportation, logistics | Asset tracking, fleet management, service planning
Retail | Site planning, location-based marketing
Use Cases: Geo-Enrichment
(i) Wireless network performance (dropped calls, utilization): aggregate and display geocoded Call Data Records, sensor data, other sources for analysis
(ii) Transportation origin-destination analysis: combine transit card/payment info and other sources to determine where (and how many) people travel to, starting from any station on a transit network
(iii) Geotagged Twitter: where are the tourists and locals tweeting
Use Cases: Categorization, Filtering, Aggregation
Use case: Data preparation
• Mosaic images
• Terrains and contours
• Shaded reliefs
• Pyramiding: layers at different resolution
Overview of Spatial Features
• Vector Data Processing
  – Support spatial processing of data stored in HDFS
  – Commonly used operations like pointInPolygon, buffer creation, distance calculations, anyinteract operations, etc.
  – Supports both Geodetic and Cartesian data
  – Data enrichment services using GeoNames and geometry hierarchy data
  – Map visualization API (HTML5)
• Raster Data Processing
  – GDAL to load raster data onto HDFS
  – Raster processing operations: Mosaic and sub-set operations
  – MapReduce framework for raster analysis operations (for example, calculate the slope at each pixel based on the DEM)
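The slope example can be made concrete with a small per-pixel calculation. The Python sketch below estimates slope from a digital elevation model (DEM) using simple finite differences, central in the interior and one-sided at the borders. It is an illustration of the per-pixel operation only: it is not the product's MapReduce job, and the function name and grid layout (a list of rows with square cells) are assumptions.

```python
import math


def slope_degrees(dem, cell_size):
    """Slope in degrees at each cell of a DEM.

    dem is a list of rows of elevations with at least 2 rows and 2 columns;
    cell_size is the cell width/height in the same unit as the elevations.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clamp neighbour indices so border cells use one-sided differences.
            r0, r1 = max(r - 1, 0), min(r + 1, rows - 1)
            c0, c1 = max(c - 1, 0), min(c + 1, cols - 1)
            dz_dx = (dem[r][c1] - dem[r][c0]) / ((c1 - c0) * cell_size)
            dz_dy = (dem[r1][c] - dem[r0][c]) / ((r1 - r0) * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


# A plane rising 10 units per 10-unit cell from left to right: 45 degrees everywhere.
ramp = [[0, 10, 20], [0, 10, 20], [0, 10, 20]]
flat = [[5, 5], [5, 5]]
```

Because each output cell depends only on its immediate neighbours, the work splits naturally into tiles, which is what makes this kind of raster analysis a fit for a MapReduce framework.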
Big Data Spatial and Graph Spatial Vector Processing Framework
[Architecture diagram. Labels:]
• Customer Application
• Sample Application
• S&G Java API
• Spatial Operators, Functions
• Spatial Enrichment, Categorization API
• MapReduce Framework, templates
• Mapper and Reducer Classes
• RecordReader class
• GeoJSON, JGeometry format
• GeoNames and Hierarchy data
• Enrichments, Categorizations results
• HDFS Geospatial Vector data (any format)
[Legend: Customer data; Generated data; Oracle Provided; Customer code]
Oracle Big Data Spatial Features (Vector Services) cont.
Build Map Visualization using HTML5-based API
• Includes an HTML5-based map client API for developers
  – Default world boundary data provided as JSON files
  – The map view can display any data stored in GeoJSON files
  – Browser based, rich interaction
Vector spatial data storage in HDFS
• Customers load their data into HDFS using a loader of their choice
  – We do not require the data to be in a format that we specify
  – This makes it easy for customers to use any data format their applications prefer
  – And the data can have other business data and not just spatial data
• We require the customer to provide a RecordReader class
  – This class reads the customer data record and produces an instance of JGeometry or a GeoJSON instance
  – With this model we can support any data format customers use for their data
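The RecordReader contract described above is "one customer record in, one geometry out". The actual class is Java code written against Hadoop; the Python sketch below only illustrates the idea with a hypothetical CSV layout (`id,longitude,latitude,text`) converted into a GeoJSON Feature. The layout, the function name, and the sample record are assumptions.

```python
import json


def read_record(line):
    """Turn one hypothetical CSV record into a GeoJSON Feature.

    Expected layout: id,longitude,latitude,free text (the text may contain commas).
    GeoJSON positions are ordered [longitude, latitude].
    """
    rec_id, lon, lat, text = line.rstrip("\n").split(",", 3)
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [float(lon), float(lat)]},
        "properties": {"id": rec_id, "text": text},
    }


sample_line = "t-1,-122.4789,37.8197,sailing by #goldengate, great view\n"
feature = read_record(sample_line)
feature_json = json.dumps(feature)
```

Keeping the non-spatial fields in `properties` matches the point that records can carry other business data alongside the geometry.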
Oracle Big Data Spatial Features (Raster Services)
• HDFS storage for the image or raster files
  – We can support dozens of file formats
  – Images are georeferenced
  – Images can be in different coordinate systems and resolutions
• Three main capabilities
  – GDAL-based loader to load raster data from NFS to HDFS
  – Mosaic and subset operations
  – Image processing framework for raster analysis
Oracle Big Data Spatial Features (Raster Services) cont.
• Console to view the set of images that are available
  – Map displays all available images using image footprints
  – Users can zoom into different areas of the world to see available images for any region
  – Information about the source and other spatial information for each image can also be displayed
  – Ability to group sets of images into groups for further processing
  – Images can be given priority in a group based on date, resolution, etc.
Console
Create Index on spatial data in HDFS
Console
Run MapReduce job to perform categorization based on spatial hierarchy
Console
Results in Console
“Tweets in May by State”
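A count such as "Tweets in May by State" follows the usual map and reduce split: the map step assigns each record to a region, and the reduce step sums per region. The Python sketch below shows that shape on a single machine. The rectangular regions, their names, and the records are invented for the example; real state boundaries are polygons drawn from hierarchy data, and the actual job runs on Hadoop.

```python
from collections import Counter

# Hypothetical regions as bounding boxes: (min_x, min_y, max_x, max_y).
REGIONS = {
    "Region A": (0.0, 0.0, 10.0, 10.0),
    "Region B": (10.0, 0.0, 20.0, 10.0),
}


def mapper(record):
    """Emit (region, 1) for the region containing the record's location, if any."""
    x, y = record["x"], record["y"]
    for name, (min_x, min_y, max_x, max_y) in REGIONS.items():
        if min_x <= x < max_x and min_y <= y < max_y:
            yield name, 1


def reducer(pairs):
    """Sum the emitted counts per region."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


records = [
    {"x": 1.0, "y": 2.0},
    {"x": 12.5, "y": 3.0},
    {"x": 3.3, "y": 9.9},
    {"x": 50.0, "y": 50.0},   # falls outside every region and is dropped
]

counts_by_region = reducer(pair for rec in records for pair in mapper(rec))
```

Records that match no region are simply not emitted, so they never reach the reduce step.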
Program Agenda
1 Introduction to Big Data Spatial and Graph
2 Big Data – Graph Features
3 Big Data – Spatial Features
4 Resources
5 Q & A
Resources
• Oracle Big Data Spatial and Graph on Oracle.com: https://www.oracle.com/database/big-data-spatial-and-graph
• OTN product page (trial software downloads, documentation): http://www.oracle.com/technetwork/database/database-technologies/bigdata-spatialandgraph
• Blog (technical examples and tips): https://blogs.oracle.com/bigdataspatialgraph/
• Big Data Lite Virtual Machine (a free sandbox environment to get started): http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html
Q&A