Copyright © 2014 Oracle and/or its affiliates. All rights reserved.
What’s New in Oracle Database 12c Graph Database
Xavier Lopez, Ph.D., Senior Director
Zhe Wu, Ph.D., Architect
Agenda
• Graph Database Strategy
• Customer Use Cases
• Oracle Spatial and Graph RDF Graph Features
• Future Plans
Graph Database Strategy
Support graph data types on all enterprise platforms:
• Oracle Database
• Oracle NoSQL Database
• Oracle Big Data Appliance
• Oracle Cloud
What Sets Us Apart?
• Scalability: Trillions of triples
• Transactional: Concurrent loading and updates with ACID properties
• Security: Oracle Label Security (OLS) labels at the triple level
• Standards based: W3C
• Manageable: Use existing DB tools, utilities and expertise
• Multi-type support: graph, relational, search, geospatial, …
• Multi-platform: Relational database, NoSQL, Hadoop
RDF Graph vs. Property Graph
RDF Semantic Graphs
• Use case: linked data, semantic metadata layer
• Analytics: pattern matching, inferencing
• Analytics execution: in-database
Property Graphs
• Use case: social network analysis
• Analytics: clustering, centrality, page rank, path finding
• Analytics execution: in-memory, in-database
RDF Semantic Graph feature of Oracle Spatial and Graph
For Oracle Database 12c
Two Application Use Cases
Find related content & relations by navigating connected entities; “reason” across entities
Linked Data
• Unified metadata model for distributed data sources
• Flexible model for sparse and evolving data
• Validate semantic and structural consistency
Entity Analytics
• SPARQL pattern matching: detecting related entities across large, sparse, disparate collections of data
• Inferencing: applying rules on asserted data
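To make "pattern matching" concrete, here is a minimal Python sketch of how a conjunction of triple patterns (a SPARQL basic graph pattern) can be evaluated over a small in-memory set of triples. It only illustrates the idea and is not how Oracle executes SPARQL; the sample data, the `?name` variable convention, and the function names are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical sample data; the names are illustrative only.
TRIPLES: List[Triple] = [
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "acme"),
    ("carol", "worksFor", "globex"),
    ("acme", "locatedIn", "CA"),
    ("globex", "locatedIn", "NY"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`; None on mismatch."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def query(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Evaluate a conjunction of triple patterns with a naive nested-loop join."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Who works for an organization located in CA?
results = query([("?person", "worksFor", "?org"), ("?org", "locatedIn", "CA")], TRIPLES)
people = sorted(b["?person"] for b in results)
print(people)
```

The shared variable `?org` is what joins the two patterns, which is how related entities are detected across otherwise separate facts.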
Graph-based Metadata Layer
Linked Data in Support of Distributed Data
– W3C standard, flexible model for sparse and evolving data
– Common vocabulary enables data integration & app development
– Relational data stays in place, apps don’t need to change
[Diagram: a database server holds the HR, Inventory, and Sales databases, each with its own schema. A mid-tier server exposes RDF graphs (including an Inventory Graph and a Sales Graph) linked by shared ontologies. Applications 1, 2, and 3 access the data through SQL and SPARQL.]
Linked Data in Enterprise
[Diagram: a semantic graph model and an index sit between the access & presentation layer and the data servers (content management, BI server, data warehouse, transaction systems, Hadoop appliance, event server). Data sources / types: machine generated data, subscription services, human sourced information, social media.]
Linked Data / Enterprise Metadata
Industries
• Life Sciences
• Finance
• Media
• Networks & Communications
• Defense & Intelligence
• Police
Hutchison 3G Austria
Novartis Institutes for BioMedical Research (NIBR)
Business Challenge
• Link database information on genes, proteins, metabolic pathways, compounds, ligands, etc. to original sources.
• Increase productivity for accessing, sharing, searching, navigating, cross-linking, and analyzing internal/external data
Solution
• Semantic integration layer on RDF graph
• Rich domain-specific terminology (biology, chemistry and medicine): 1.6 M terms
• Terminology Hub: 8 GB of referential data that cross-references between data repositories.
RDF Semantic Graph-based Applications
Find related content & relations by navigating connected entities; “reason” across entities
Linked Data
• Unified metadata model for distributed data sources
• Flexible model for sparse and evolving data
• Validate semantic and structural consistency
Entity Analytics
• SPARQL pattern matching: detecting related entities across large, sparse, disparate collections of data
• Inferencing: applying rules on asserted data
Knowledge Management in Intelligence Domain
National Intelligence Scenario
[Diagram: data sources (contents repository, databases, web resources, blogs, mails, news, RSS feeds, financial data, telephone records, internet traffic, enterprise data, spatial data, documents, images) feed information extraction (feature extraction, term extraction). The extracted entities & relationships are stored as RDF with intelligence ontologies and accessed through SQL/SPARQL for search, presentation, report, visualization, and query.]
[Example graph, node labels as shown on the slide:
• Person: Abduwali Abdukhadir Muse; Nationality: Somalian; Country: UK; Group: Al Shabab; Ideology: Islamist
• Person: ?; Nationality: Pakistani; Country: Pakistan; Group: ?
• Person: Chehab Abdouljamid Bouyaly; Country: Morocco; Group: al Qaeda
Edge labels: "Currently resides", "Member of", "Supports", "Has"; unknown connections are marked "Link ?".]
Oracle Spatial and Graph RDF Semantic Graph Features
Oracle Database 12c RDF Semantic Graph Database
• Exadata ready
• Compression & partitioning
• Parallel load, inference, query
• High availability
• Label security: triple-level
• W3C standards compliance
• Semantic indexing of text
• Enterprise Manager
Load / Storage
• Native RDF graph data store
• Manages billions of triples
• Optimized storage architecture
Query
• SPARQL: Jena/Joseki, Sesame
• SQL/graph query, B-tree indexing
• Ontology-assisted SQL query
Reasoning
• RDFS, OWL 2 RL, OWL 2 EL, SKOS
• User-defined rules
• Incremental, parallel reasoning
• User-defined inferencing
• Plug-in architecture
Analytics
• Semantic indexing framework
• Integration with OBIEE, Oracle R Enterprise, and Oracle Data Mining
Support for Apache Jena and OpenRDF Sesame
Leverage existing investments in open source frameworks
Provides application developers with:
• Easy-to-use Java APIs to access Oracle databases and RDF files
• A standards-compliant SPARQL web service endpoint (Joseki, Fuseki)
• Data loading (RDF/XML, N-Triples, N-Quads, TriG, Turtle)
• JSON output
• Oracle-specific extensions for query execution control and management
RDB to RDF (Relational to RDF) Mapping
• RDF views on relational tables
• Enables SPARQL query on distributed resources
• Views: automatic and custom
• Aligns with W3C RDB2RDF standard
• No duplication of data and storage
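As an illustration of what an "automatic" relational-to-RDF view produces, the following Python sketch maps one table row to triples, loosely following the pattern of the W3C Direct Mapping (a row IRI built from the primary key, one predicate per column, plus an `rdf:type` triple for the table). It is a simplified stand-in, not Oracle's RDF view implementation; the base IRI, the table, and the row are hypothetical.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def row_to_triples(base: str, table: str, pk: str, row: Dict[str, object]) -> List[Triple]:
    """Map one relational row to triples; NULL (None) columns produce no triple."""
    # Row IRI: <base><table>/<pk>=<value>
    subject = f"{base}{table}/{pk}={quote(str(row[pk]))}"
    triples: List[Triple] = [(subject, RDF_TYPE, f"{base}{table}")]
    for column, value in row.items():
        if value is not None:
            # Column predicate: <base><table>#<column>; values kept as plain strings here.
            triples.append((subject, f"{base}{table}#{column}", str(value)))
    return triples


# Hypothetical HR row; the relational data itself is never copied in a real RDF view.
employee = {"EMPNO": 7369, "ENAME": "SMITH", "MGR": None}
triples = row_to_triples("http://example.com/hr/", "EMP", "EMPNO", employee)
for t in triples:
    print(t)
```

In a view-based mapping these triples are computed at query time, which is why the slide can claim no duplication of data and storage.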
Oracle Label Security Data Classification
• Fine-grained security through integration with Oracle Label Security
• Model-level security through GRANT/REVOKE privileges
• Oracle Label Security: mandatory access control
• Labels are assigned to both users and data
• Data labels determine the sensitivity of the rows, or the rights a person must possess in order to read or write the data
• User labels indicate the user's access rights to the data records
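The interplay of user labels and data labels can be sketched in a few lines of Python. This is a simplified model assuming labels made of a numeric sensitivity level plus a set of compartments (groups are left out); the class, the level values, and the sample triples are hypothetical and do not reflect the Oracle Label Security API.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Label:
    level: int                                  # higher number = more sensitive
    compartments: FrozenSet[str] = frozenset()  # need-to-know categories


def can_read(user: Label, data: Label) -> bool:
    """Read is allowed when the user's level is high enough and the user
    holds every compartment that appears on the data label."""
    return user.level >= data.level and data.compartments <= user.compartments


PUBLIC, SECRET = 10, 30  # hypothetical level numbers

# Each triple carries its own data label (triple-level security).
labeled_triples: List[Tuple[Triple, Label]] = [
    (("acme", "locatedIn", "CA"), Label(PUBLIC)),
    (("acme", "suppliesTo", "agencyX"), Label(SECRET, frozenset({"OPS"}))),
]


def visible(user: Label) -> List[Triple]:
    """Return only the triples whose label the user label dominates."""
    return [t for t, label in labeled_triples if can_read(user, label)]


analyst = Label(SECRET, frozenset({"OPS"}))
guest = Label(PUBLIC)
print(visible(analyst))
print(visible(guest))
```

Because the check is mandatory, the same query returns different result sets for different users without any change to the application.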
Core Inferencing Features
• Forward-chaining based inference engine in the database
• Native rulebases: RDFS, OWL 2 RL, OWL 2 EL, SKOS
• Validation of inferred data
• Proof generation
• User-defined inferencing
– Temporal reasoning, spatial reasoning
• Ladder Based Inference
– Fine-grained security for inference graph
• Integration with external OWL 2 reasoners (TrOWL, Pellet)
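Forward chaining means applying rules to the asserted triples repeatedly until no new triples appear, then storing the results. The Python sketch below shows that fixpoint loop for just two RDFS rules (subclass transitivity and type inheritance); it is a naive illustration of the technique, not Oracle's engine, and the sample ontology is hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"


def forward_chain(asserted: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS rules to a fixpoint and return only the inferred triples."""
    known = set(asserted)
    while True:
        new: Set[Triple] = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Both rules join on: (s p o) and (o rdfs:subClassOf o2)
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:    # rdfs11: subClassOf is transitive
                    new.add((s, SUBCLASS, o2))
                elif p == TYPE:      # rdfs9: instances belong to superclasses
                    new.add((s, TYPE, o2))
        new -= known
        if not new:                  # fixpoint reached: nothing left to derive
            return known - asserted
        known |= new


asserted = {
    ("Professor", SUBCLASS, "Faculty"),
    ("Faculty", SUBCLASS, "Person"),
    ("alice", TYPE, "Professor"),
}
inferred = forward_chain(asserted)
for triple in sorted(inferred):
    print(triple)
```

Keeping the inferred triples separate from the asserted ones mirrors the idea of an entailment that can be validated, secured, or dropped independently of the source data.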
RDF Semantic Graph: Graph Visualization & Modeling Support
• Graph visualization: Cytoscape
• Semantic modeling: Protégé
Analyzing RDF with Oracle BI and Oracle Advanced Analytics
[Screenshots: Oracle BI; Oracle Advanced Analytics]
Oracle Partner Tools: IO Informatics
Oracle Partner Tools: Tom Sawyer Social Network Analysis
Manageability of RDF Semantic Graph
Built-in support from Oracle Database utilities and tools
Ingest / Replicate / Recover
• Bulk load:
– Apache Jena bulk loader
– Oracle external tables & SQL*Loader (direct path) with PL/SQL bulk load API
• Replicate & recover:
– Data Guard: physical standby
– Data Pump: staging tables
– Recovery Manager: RMAN
Manage
• Control query execution: in database & Jena client
• Create & monitor graph with SQL Developer:
– Semantic network
– Models, virtual models
– B-tree indexes
– Rule bases
– Entailments
– Security data labels
– Semantic index policies
Tune / Analyze
• Tune load / query / inference:
– Parallelism
– B-tree indexing of triples/quads
– Typed literals indexing
– SPARQL query hints
– Statistics gathering
– Dynamic sampling
• Analyze performance:
– Enterprise Manager: view optimizer plans, monitor execution / resource usage
Open Geospatial Consortium: GeoSPARQL Support
• Defines a vocabulary for spatial query patterns
– Classes: Spatial Object, Feature, Geometry
– Properties: topological relations; links between features and geometries
– Datatypes for geometry literals: ogc:wktLiteral, ogc:gmlLiteral
• Query functions
– Topological relations, distance, buffer, intersection, …
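To show what a geometry literal and a distance function involve, here is a small Python sketch that parses a WKT point literal (with its optional leading CRS IRI, defaulting to CRS84 longitude/latitude order) and computes a great-circle distance. It is a stand-in for the idea only: the function names are hypothetical, only POINT geometries are handled, and a real GeoSPARQL engine evaluates such functions inside the query.

```python
import math
import re
from typing import Tuple

CRS84 = "http://www.opengis.net/def/crs/OGC/1.3/CRS84"  # default CRS for WKT literals

_WKT_POINT = re.compile(
    r"^(?:<(?P<crs>[^>]+)>\s+)?POINT\s*\(\s*(?P<x>[-+.\deE]+)\s+(?P<y>[-+.\deE]+)\s*\)$",
    re.IGNORECASE,
)


def parse_wkt_point(literal: str) -> Tuple[str, float, float]:
    """Return (crs_iri, x, y) for a WKT point literal; the CRS IRI is optional."""
    m = _WKT_POINT.match(literal.strip())
    if m is None:
        raise ValueError(f"not a WKT point literal: {literal!r}")
    return (m.group("crs") or CRS84, float(m.group("x")), float(m.group("y")))


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float,
                radius_m: float = 6371008.8) -> float:
    """Great-circle distance in metres on a sphere (an approximation of the Earth)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Two hypothetical feature geometries, one degree of latitude apart.
crs_a, lon_a, lat_a = parse_wkt_point("POINT(-122.0 37.0)")
crs_b, lon_b, lat_b = parse_wkt_point(f"<{CRS84}> Point(-122.0 38.0)")
distance = haversine_m(lon_a, lat_a, lon_b, lat_b)
print(round(distance))
```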
RDF Graph for Oracle NoSQL
Graph Support on Oracle NoSQL DB
Brings horizontal scalability to RDF graph applications
• RDF Graph support in Oracle NoSQL Database Enterprise Edition
• High-performance key-value store
• SPARQL 1.1 access to graph data
• Jena & Joseki SPARQL web services
• Massive horizontal scalability
• Support for World Wide Web Consortium (W3C) Semantic Web standards
RDF Graph for Oracle NoSQL
When to Consider a NoSQL Database for Graphs
Horizontal scalability, low query latency/cost, ease of install & management
• High volume, simple queries (low latency)
• Queries aggregating over most of the graph (e.g., what are the hobbies of the 100 most popular people in the network)
• Frequent, large-scale updates
• Large data centers
Quick Steps to Get Started
• Install Oracle Database 12c, or use a prebuilt VM from OTN
• Initialize
– Create a tablespace 'ts'
– Run as SYS in SQL*Plus: exec sem_apis.create_sem_network('ts')
– Run as SYS (for 12.1.0.2 only) in SQL*Plus: exec mdsys.enableGeoRaster;
• Using Java APIs: load/query/inference through GraphOracleSem, DatasetGraphOracleSem, OracleBulkUpdateHandler, …
• Using SQL/PLSQL APIs: exec sem_apis.create_sem_model, insert/delete triples, bulk load, run SEM_MATCH, create_entailment, …
• Configure Joseki/Fuseki web service endpoint: SPARQL Query, SPARQL Update, REST APIs
Performance
Oracle Spatial and Graph - LUBM 200K on 3-Node RAC X2-4
Load, Inference and Query Performance
48+ billion edges graph
The LUBM 200K graph has 48+ billion triples (edges)
– Original graph has 26.6 billion unique triples (quads)
– Inference produced another 21.4 billion triples
Data Loading Performance
– Triples Loaded and Indexed Per Second (TLIPS): 273K
Inference Performance
– Triples Inferred and Indexed Per Second (TIIPS): 327K
SPARQL Query Performance
– Query Results Per Second (QRPS): 459K
Setup:
– Hardware: Sun Server X2-4, 3-node RAC
– Each node configured with 1 TB RAM, 4 CPUs (2.4 GHz 10-core Intel E7-4870)
– Storage: dual-node 7420, both heads configured as Sun ZFS Storage 7420 with 4 CPUs (2.00 GHz 8-core Intel E7-4820), 256 GB memory, 4x SSD SATA2 512 GB (READZ), 2x SATA 500 GB 10K; four disk trays with 20 x 900 GB disks @ 10K rpm, 4x SSD 73 GB (WRITEZ)
– Software: Oracle Database 11.2.0.3.0, SGA_TARGET=750G and PGA_AGGREGATE_TARGET=200G
Note: Only one node in this RAC was used for the performance test. Test performed in April 2013.
Oracle Spatial and Graph – LUBM 4400K on Exadata X4-2
Load, Inference and Query Performance
1.08 trillion edges graph
Results:
– Degrees of parallelism: 256*
– Data set: LUBM 4400K
– Load (B triples/hr): 605.4B / 115.2 hrs
– OWL inference (B triples/hr): 475.6+ B / 86 hrs 30 m
– Query (B answers/hr): 92.5B / 22.5 hrs
* A mix of DOP used: 296, 256, 192
Data Loading Performance
– Triples Loaded and Indexed Per Second (TLIPS): 1.420M
Inference Performance
– Triples Inferred and Indexed Per Second (TIIPS): 1.527M
SPARQL Query Performance
– Query Results Per Second (QRPS): 1.130M
Setup:
– Exadata X4-2 high capacity full rack
– ZS3-2 with 2 controllers, 8 trays of disk
– Eight compute nodes of Exadata
– Oracle 12.1.0.1 DB, standard install of Exadata
– Open cursors = 1000, Processes = 1000
– SGA = 132 GB, PGA = 100 GB
– 32K block size was given to all graph tablespaces
– TEMP group was created with 3 bigfile tablespaces
Test performed in Aug/Sept 2014.
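The headline figures on the two benchmark slides can be cross-checked with simple arithmetic: graph size is asserted plus inferred triples, and a per-second rate is a total divided by elapsed seconds. The short Python sketch below does this using only numbers quoted on the slides; the helper name is an assumption. The recomputed load and query rates may differ somewhat from the quoted TLIPS and QRPS values, presumably because of rounding or a different timing basis, which the slides do not state.

```python
BILLION = 1e9


def per_second(total: float, hours: float) -> float:
    """Average throughput: total items divided by elapsed seconds."""
    return total / (hours * 3600.0)


# Graph sizes: asserted triples plus inferred triples.
lubm_200k_total = 26.6 * BILLION + 21.4 * BILLION      # "48+ billion" slide
lubm_4400k_total = 605.4 * BILLION + 475.6 * BILLION   # "1.08 trillion" slide

# Average rates for the LUBM 4400K run, from the totals and durations on the slide.
load_rate = per_second(605.4 * BILLION, 115.2)
inference_rate = per_second(475.6 * BILLION, 86.5)     # 86 hrs 30 m
query_rate = per_second(92.5 * BILLION, 22.5)

print(f"LUBM 200K edges:  {lubm_200k_total / BILLION:.1f} B")
print(f"LUBM 4400K edges: {lubm_4400k_total / BILLION:.1f} B")
print(f"load:      {load_rate / 1e6:.3f} M triples/s")
print(f"inference: {inference_rate / 1e6:.3f} M triples/s")
print(f"query:     {query_rate / 1e6:.3f} M answers/s")
```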
Best Practices in Solving Performance Issues
• When a SQL statement underperforms in RDF data loading, inference, or query operations, check:
– Have you gathered statistics?
  APIs: export_model_stats, export_entailment_stats, export_network_stats, import_model_stats, import_entailment_stats, import_network_stats
– Have you tried parallel execution? Balanced hardware is key.
– Have you tried dynamic sampling? (Level 6, 8, 11)
– Is there a lack of indexes (including text index)? DO NOT just add indexes without careful & thorough testing.
Best Practices in Solving Performance Issues (2)
• When a SQL statement underperforms in RDF data loading, inference, or query operations, check:
– Have you looked at the plan?
– Is it possible to write the same query in a different way?
– Is it possible to simplify? Simpler queries give a better chance of finding a more efficient way to execute.
– Tweak the plan through hints
– Send a small, reproducible test case with the execution plan to Oracle Support or post it on the Forum
Best Practices in Solving Performance Issues (3)
• Find the top thread(s) in the Java VM
• Are there excessive GC activities?
– Try -XX:+UseParallelGC, -XX:+UseConcMarkSweepGC, …
• Has the heap size been set properly?
– Try a larger heap size; analyze the heap by performing a heap dump
• Send a small, reproducible test case with the thread dump to Oracle Support or post it on the Forum
Cool Ongoing Activities:
• Enable Oracle Cloud Services: Oracle Social Network
• Integration with Oracle business applications and middleware
• Ongoing support for RDF Graph on all major platforms
– Relational Database
– NoSQL Database
– Big Data (Hadoop)
– Cloud
Appendix
W3C Semantic Technology Stack
http://www.w3.org/2007/03/layerCake.svg
• Core Technologies
– URI: Uniform Resource Identifier
– RDF: Resource Description Framework
– RDFS: RDF Schema
– OWL: Web Ontology Language
What is RDF
• A graph data model for web resources and their relationships
• The graph can be serialized into RDF/XML, N3, N-TRIPLE, …
• Construction unit: triple (or assertion, or fact), made of Subject, Predicate, Object
  <http://foobar> <:produces> <:mp3>
• Quads (named graphs) add context, provenance, identification, etc. to assertions
  <http://foobar> <:produces> <:mp3> <:ProductGraph>
[Diagram: example graph. Nodes: http://www.foobar.com, “CA”, http://www.foobar.com/products/mp3, http://www.oracle.com, http://www.oracle.com/products/RDF. Edge labels: http://…/locatedIn, http://…/produce (twice), http://…/customerOf, http://…/uses.]
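A triple and a quad can be modelled as plain tuples, with the optional fourth element naming the graph that gives the assertion its context. The Python sketch below serializes such tuples in N-Triples / N-Quads style and groups them by named graph. It handles IRIs only (no literals or blank nodes), and the `http://example.com/` IRIs are hypothetical stand-ins for the abbreviated names on the slide.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, Optional[str]]  # graph is None for the default graph


def to_line(s: str, p: str, o: str, g: Optional[str] = None) -> str:
    """Serialize IRI-only terms as one N-Triples (3 terms) or N-Quads (4 terms) line."""
    terms = [f"<{s}>", f"<{p}>", f"<{o}>"]
    if g is not None:
        terms.append(f"<{g}>")
    return " ".join(terms) + " ."


def group_by_graph(quads: List[Quad]) -> Dict[Optional[str], List[Triple]]:
    """Collect triples under the named graph that provides their context."""
    graphs: Dict[Optional[str], List[Triple]] = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))
    return dict(graphs)


EX = "http://example.com/"  # hypothetical namespace
quads: List[Quad] = [
    (EX + "foobar", EX + "produces", EX + "mp3", EX + "ProductGraph"),
    (EX + "foobar", EX + "locatedIn", EX + "CA", None),
]

for quad in quads:
    print(to_line(*quad))

graphs = group_by_graph(quads)
```

Storing the graph name alongside each assertion is what lets a query restrict itself to, or report, the source of a fact.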