Copyright © 2017 Oracle and/or its affiliates and Data Ninja Services. All rights reserved.
Applying Large-Scale Text Analytics with Graph Databases to Visualize Entity and Relationship Inferences
Trung Diep, Architect, Docomo Innovations
Ronald Sujithan, Software Architect, Docomo Innovations
Zhe Wu, Architect, Oracle
Outline
• Introduction and overview of graph technologies and graph databases
• RDF semantic graph
• Property graph
• Overview of text analytics offered by Data Ninja Services
• Case Study #1: news mining application
• Case Study #2: insights from analyzing Amazon product reviews
• Summary
Relational Model vs. Graph Model
[Figure: side-by-side comparison of the relational model and the graph model. Courtesy: Tom Sawyer 2016]
Two Graph Models: RDF and Property Graph
• RDF Data Model
  – Features: data federation, knowledge representation, inferencing
  – Application area: Linked Data / Semantic Mediation
  – Industry domains: Life Sciences, Health Care, Publishing, Finance
• Property Graph Model
  – Features: graph search & analysis, Big Data analytics, entity analytics
  – Application area: Social Network Analysis
  – Industry domains: National Intelligence, Public Safety, Social Media search, Marketing (sentiment)
Release 2 (12.2) in Oracle Cloud
World's Fastest Big Data Graph Benchmark: 1 Trillion Triple RDF Benchmark with Oracle Spatial and Graph
Oracle Database 12c can load, query, and perform inference on millions of RDF graph edges per second
• World's fastest data loading performance
• World's fastest query performance
• World's fastest inference performance
• Massive scalability: 1.08 trillion edges
• Platform: Oracle Exadata X4-2 Database Machine
• Source: w3.org/wiki/LargeTripleStores, 9/26/2014
[Chart: millions of triples per second for query, load, and inference (values shown: 1.13, 1.42, 1.52)]
What is RDF
A graph data model for web resources and their relationships
The graph can be serialized as RDF/XML, N3, N-Triples, …
Construction unit: the triple (also called an assertion or fact), made of a subject, a predicate, and an object:
<http://foobar> <:produces> <:mp3>
Quads (named graphs) add context, provenance, identification, etc. to assertions:
<http://foobar> <:produces> <:mp3> <:ProductGraph>
[Figure: example RDF graph with the nodes http://www.foobar.com, "CA", http://www.foobar.com/products/mp3, http://www.oracle.com, and http://www.oracle.com/products/RDF, connected by http://…/locatedIn, http://…/produce, and http://…/uses edges]
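To make the triple/quad distinction concrete, here is a minimal Python sketch that writes the slide's example as N-Triples and N-Quads lines. The prefixed names `:produces`, `:mp3`, `:ProductGraph`, and `:locatedIn` are not absolute IRIs, so the sketch assumes a hypothetical namespace `http://example.org/`; it handles IRIs and plain string literals only (no escaping, datatypes, or language tags).

```python
from typing import Optional

# Hypothetical namespace standing in for the slide's ":" prefix.
EX = "http://example.org/"

def term(value: str) -> str:
    """Render an IRI as <...>; anything without a scheme becomes a plain literal."""
    if value.startswith(("http://", "https://", "urn:")):
        return f"<{value}>"
    return f'"{value}"'

def statement(s: str, p: str, o: str, g: Optional[str] = None) -> str:
    """Build one N-Triples line, or an N-Quads line when a graph name g is given."""
    parts = [term(s), term(p), term(o)]
    if g is not None:
        parts.append(term(g))  # the fourth term names the graph (context)
    return " ".join(parts) + " ."

triple = statement("http://foobar", EX + "produces", EX + "mp3")
quad = statement("http://foobar", EX + "produces", EX + "mp3", EX + "ProductGraph")
literal = statement("http://www.foobar.com", EX + "locatedIn", "CA")
print(triple)
print(quad)
print(literal)
```

The only structural difference between the two formats is the optional fourth term, which is why a quad store can treat every triple as belonging to some (possibly default) graph.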
RDF Semantic Graph Technologies Partners
[Figure: partner categories: Ontology Engineering & Visualization; Open Source Frameworks (e.g., Sesame, Joseki); Standards; External Reasoners; Applications & Tools; SI / Consulting; NLP Entity Extractors]
Why Oracle Spatial & Graph for Linked Data?
Advantages of the Oracle RDF triple store:
– Greater flexibility than single-purpose triple stores
– SPARQL and SQL interaction with relationally stored data
– Use of SQL hints, indexes, and caching to increase performance
– Standard database administration: backup/recovery, replication, etc.
– PL/SQL or Java programming
– Supports large volumes of data (hundreds of billions to over a trillion triples)
– Good integration with standard RDF client tools such as Jena and Sesame
Source: "Oracle Semantic Graph in a scientific knowledge portal", 16-09-2013
GeoSPARQL Support for Spatial Data
[Figure: enterprise data servers (a spatial database and a population statistics database, with a 2D feature schema and a relational schema) are exposed as the linked data graphs Spatial_Graph and Pop_Stat_Graph; Web Analysts 1 and 2 query them over REST with SPARQL/GeoSPARQL, using spatial vocabularies]
Enriching Text Using NLP and Domain Ontologies
[Figure: text is enriched with NLP and machine learning together with domain ontologies (e.g., Genzyme ontologies) to support search, presentation, reporting, visualization, and query]
Data Ninja Text Analytics Cloud Services
[Figure: unstructured data (e.g., from Oracle Social Cloud) is processed by text analytics (Semantic Extractor) into structured data, as an ontology (RDF) and relational tables, which are loaded into Oracle Spatial and Graph for graph analytics and graph visualization]
New business insights, by making graph inferences that could not be queried in a relational database
RDF Graph Roadmap
• SPARQL optimization with RDBMS kernel
• Social network analysis (SNA): clustering, path analysis, community detection, PageRank, …
• Manageable: Enterprise Developer integration
• R2RML Enhancements: Geospatial (vector) features
• Deeper RDBMS kernel: Graph computation
• Standards based: OWL QL
• Multi-type support: graph, relational, JSON, text, geospatial …
• Visualization: Richer graph visualization options
Property Graph
The Property Graph Data Model
• A set of vertices (or nodes)
  – each vertex has a unique identifier
  – each vertex has a set of in/out edges
  – each vertex has a collection of key-value properties
• A set of edges (or links)
  – each edge has a unique identifier
  – each edge has a head/tail vertex
  – each edge has a label denoting the type of relationship between two vertices
  – each edge has a collection of key-value properties
https://github.com/tinkerpop/blueprints/wiki/Property-Graph-Model
[Figure: example property graph with six vertices (name="marko", age=29; name="vadas", age=27; name="josh", age=32; name="peter", age=35; name="lop", lang="java"; name="ripple", lang="java") and six edges (ids 7–12) labelled "knows" or "created", each carrying a weight property]
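The data model above maps directly onto a few plain data structures. The following Python sketch is an illustration only, not the Blueprints or Oracle API; it assumes the figure is the classic TinkerPop sample graph (the vertex names, properties, and edge weights match it), with "tail" as the out-vertex and "head" as the in-vertex.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    props: dict = field(default_factory=dict)  # key-value properties

@dataclass
class Edge:
    id: int
    tail: int   # out (source) vertex id
    head: int   # in (target) vertex id
    label: str  # relationship type
    props: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = {}

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)

    def add_edge(self, eid, tail, head, label, **props):
        self.edges[eid] = Edge(eid, tail, head, label, props)

    def out_edges(self, vid, label=None):
        return [e for e in self.edges.values()
                if e.tail == vid and (label is None or e.label == label)]

    def in_edges(self, vid, label=None):
        return [e for e in self.edges.values()
                if e.head == vid and (label is None or e.label == label)]

g = PropertyGraph()
g.add_vertex(1, name="marko", age=29)
g.add_vertex(2, name="vadas", age=27)
g.add_vertex(3, name="lop", lang="java")
g.add_vertex(4, name="josh", age=32)
g.add_vertex(5, name="ripple", lang="java")
g.add_vertex(6, name="peter", age=35)
g.add_edge(7, 1, 2, "knows", weight=0.5)
g.add_edge(8, 1, 4, "knows", weight=1.0)
g.add_edge(9, 1, 3, "created", weight=0.4)
g.add_edge(10, 4, 5, "created", weight=1.0)
g.add_edge(11, 4, 3, "created", weight=0.4)
g.add_edge(12, 6, 3, "created", weight=0.2)

# Who does marko know, and who created lop?
friends = sorted(g.vertices[e.head].props["name"] for e in g.out_edges(1, "knows"))
creators = sorted(g.vertices[e.tail].props["name"] for e in g.in_edges(3, "created"))
print(friends, creators)
```

A real store indexes edges by vertex instead of scanning them, but the shape (identified vertices and labelled, directed edges, both carrying key-value properties) is the same.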
Graph Analysis in Business
• Product recommendation (input: purchase records linking customers and items): recommend the most similar items purchased by similar people
• Influencer identification (input: a communication stream, e.g., tweets): find the people who are central in the given network, e.g., for influencer marketing
• Community detection: identify groups of people who are close to each other, e.g., for target-group marketing
• Graph pattern matching: find all sets of entities that match a given pattern, e.g., for fraud detection
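One simple way to read "recommend the most similar items purchased by similar people" is neighbourhood-based scoring on the bipartite customer-item graph. The Python sketch below uses Jaccard overlap of purchase sets as the similarity measure; that choice, the function name `recommend`, and the sample data are assumptions for illustration, not the method used by any Oracle product.

```python
from collections import defaultdict

def recommend(purchases, customer, k=3):
    """Score items bought by customers who share purchases with `customer`.

    purchases maps customer -> set of items (the bipartite customer-item graph).
    Each neighbour's vote is weighted by the Jaccard similarity of purchase sets.
    """
    mine = purchases[customer]
    scores = defaultdict(float)
    for other, theirs in purchases.items():
        if other == customer:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared purchases: not a "similar" customer
        similarity = overlap / len(mine | theirs)
        for item in theirs - mine:
            scores[item] += similarity
    # Highest score first; ties broken by item name for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

# Hypothetical purchase records.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "lamp", "desk"},
    "carol": {"book", "pen"},
    "dave": {"guitar"},
}
print(recommend(purchases, "alice"))
```

Here bob shares two of alice's items, so his extra purchase outranks carol's; dave shares nothing and contributes no votes.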
Oracle Big Data Spatial and Graph: Architecture of Existing Property Graph Support
[Figure: architecture diagram with these components: Graph Analytics via Parallel In-Memory Graph Analytics (PGX); a Data Access Layer built on Apache Blueprints and Lucene/SolrCloud; Java APIs, a Java SDK, and a REST/Web Service for Java, Groovy, Python, …; storage in Oracle NoSQL Database, Apache HBase, or Oracle Spatial and Graph on Oracle Database 12.2 (Java APIs/JDBC/SQL/PLSQL); supported RDF formats: RDF/XML, N-Triples, N-Quads, TriG, N3, JSON; supported property graph formats: GraphML, GML, GraphSON, flat files, CSV, relational data sources]
Support for Cytoscape Open Source Visualization
Integration with Tom Sawyer Perspectives via property graph REST APIs
Oracle's In-Memory Analyst vs. Spark GraphX 1.1
In-Memory Analyst on 1 node is up to 2 orders of magnitude faster than Spark GraphX distributed execution on 2 to 16 nodes
[Chart: execution time in seconds (log scale) on the Twitter and Web graphs for Oracle (1 node) and Spark GraphX on 2, 4, 8, and 16 nodes; panels: Single-Source Shortest Path and PageRank]
Oracle Big Data Spatial and Graph: Roadmap for Property Graph Support
[Figure: roadmap architecture diagram with these components: Graph Analytics via Parallel In-Memory Graph Analytics (PGX), Apache Spark integration (MLlib, Spark SQL), and deep learning (neural networks); a Data Access Layer built on Apache TinkerPop3 and Lucene/SolrCloud/ElasticSearch; Java APIs, a Java SDK, and a REST/Web Service for Java, Groovy, Python, …; storage in Oracle NoSQL Database, Apache HBase, Apache Cassandra, or Oracle Spatial and Graph on Oracle Database 12.2 (Java APIs/JDBC/SQL/PLSQL); supported RDF formats: RDF/XML, N-Triples, N-Quads, TriG, N3, JSON; supported property graph formats: GraphML, GML, GraphSON, flat files, CSV, relational data sources]
Case Study: News Mining Application
Text Analytics API in N-Triples Format
[Figure: free-form texts (documents, messages, news) pass through Text Analytics, which outputs structured data: concepts, categories, entities, and sentiments]
• Cloud-based web services
• Daily updated knowledge base
• Support for customization
• Scalable performance
Text Analytics API for Constructing RDF Graphs
[Figure: free-form texts (documents, tweets, news) go to the Text Analytics API, which returns concepts, categories, entities, and sentiments as N-Triples; these form RDF graphs linking texts, concepts, categories, entities, and entity categories, organized by an ontology]
News Mining Overview
• Domain-specific, health-related news crawling
• English language only
• Worldwide coverage
• Healthcare-related keywords in news titles
Sample records (newsID | newsArticle | newsSource):
20160902_555 | A new study says that parts of Africa and the Asia-Pacific region may be vulnerable to outbreaks of the Zika virus, including some of the world's most populous countries and many with limited resources to identify and respond to the mosquito-borne disease. [more] | http://www.newkerala.com/news/2016/fullnews-113309.html
20160903_1317 | Hurricane Hermine, set to cause flooding and damage when it hits Florida overnight, will make it harder for the state to fight Zika, a mosquito-borne virus shown to cause birth defects, experts in infectious diseases and mosquitoes said on Thursday. [more] | http://kelo.com/news/articles/2016/sep/01/hurricane-hermine-will-complicate-floridas-zika-fight-experts/
20160904_2209 | Singapore confirmed 26 more cases of locally transmitted Zika infections, the health ministry and National Environment Agency (NEA) said in a joint statement on Saturday, bringing the tally to 215. Of the 26 new cases, 24 were linked to existing clusters while two cases have no known links to any existing cluster, they said. [more] | https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html
… | … | …
RDF Graph Example of Extracted Entities
Subject | Predicate | Object
http://www.newkerala.com/news/2016/fullnews-113309.html | http://dataninja.net/occurrence | urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/entity | http://dataninja.net/entity/Zika+virus
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment | http://dataninja.net/entity/sentiment/negative
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/count | "12"^^xsd:integer
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/sentiment_score | "-1.0"^^xsd:float
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/score | "1.0"^^xsd:float
urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33 | http://dataninja.net/occurrence/entity/text_locations | "(135,145) (565,575) (777,787) (950,960) (1142,1152) (1535,1545) (1696,1706) (1755,1765) (1887,1891) (2191,2195) (2352,2362) (2376,2386)"
(265 more triples for the same news article)
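Each occurrence node carries both a mention count and a `text_locations` string, so the two can be cross-checked. The Python sketch below assumes, since the slide does not define it, that each pair is a (start, end) character offset into the article text; under that reading the spans are 10 or 4 characters wide, which happens to match the lengths of "Zika virus" and "Zika".

```python
import re

def parse_locations(text_locations: str):
    """Turn "(start,end) (start,end) ..." into a list of integer pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"\((\d+),(\d+)\)", text_locations)]

locations = ("(135,145) (565,575) (777,787) (950,960) (1142,1152) (1535,1545) "
             "(1696,1706) (1755,1765) (1887,1891) (2191,2195) (2352,2362) (2376,2386)")
count = 12  # value of http://dataninja.net/occurrence/entity/count for this occurrence

spans = parse_locations(locations)
assert len(spans) == count  # one span per counted mention
widths = sorted({end - start for start, end in spans})
print(len(spans), widths)
```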
RDF Graphs for Extracted Entities (one news article)
[Figure: the article http://www.newkerala.com/news/2016/fullnews-113309.html is linked by http://dataninja.net/occurrence to the occurrence node urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33, which is linked by http://dataninja.net/entity to http://dataninja.net/entity/Zika+virus, by http://dataninja.net/occurrence/entity/sentiment to negative, and by http://dataninja.net/occurrence/entity/count to 12 (among others); other occurrence nodes link the same article to http://dataninja.net/entity/Philippines, http://dataninja.net/entity/Thailand, and http://dataninja.net/entity/Nigeria]
One occurrence (blank) node for each extracted entity
RDF Graphs for Extracted Entities (multiple articles)
[Figure: the occurrence nodes urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33, urn:uuid:68282cbb-b70c-4f6e-8157-5ef6b1d34d31, and urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15, attached to the articles http://www.newkerala.com/news/2016/fullnews-113309.html and https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html, all point to the same entity node http://dataninja.net/entity/Zika+virus]
Same URI for the same entity
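Because every article's occurrence node points at the same entity URI, "which articles mention Zika virus?" becomes a two-hop join, the equivalent of the SPARQL pattern `?article <http://dataninja.net/occurrence> ?occ . ?occ <http://dataninja.net/entity> <http://dataninja.net/entity/Zika+virus>`. The Python sketch below runs that join over an in-memory list of triples taken from the slides; the Singapore occurrence ID is hypothetical, added only to show co-mention lookup.

```python
OCC = "http://dataninja.net/occurrence"
ENT = "http://dataninja.net/entity"
ZIKA = "http://dataninja.net/entity/Zika+virus"
SINGAPORE = "http://dataninja.net/entity/Singapore"
NEWKERALA = "http://www.newkerala.com/news/2016/fullnews-113309.html"
YAHOO = ("https://www.yahoo.com/news/singapore-says-confirms-26-more-local-"
         "transmission-zika-052937119--finance.html")

triples = [
    (NEWKERALA, OCC, "urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33"),
    ("urn:uuid:e47c4916-e7c1-4a3b-b650-f243e0d7ba33", ENT, ZIKA),
    (YAHOO, OCC, "urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15"),
    ("urn:uuid:ab7b9e43-710f-436e-b6ff-15abad71ca15", ENT, ZIKA),
    # Hypothetical occurrence node, for illustration only.
    (YAHOO, OCC, "urn:uuid:hypothetical-singapore-occurrence"),
    ("urn:uuid:hypothetical-singapore-occurrence", ENT, SINGAPORE),
]

def articles_mentioning(triples, entity):
    """Two-hop join: ?article --occurrence--> ?occ --entity--> entity."""
    occurrences = {s for s, p, o in triples if p == ENT and o == entity}
    return sorted({s for s, p, o in triples if p == OCC and o in occurrences})

def co_mentioned(triples, entity):
    """Other entities that appear in at least one article mentioning `entity`."""
    articles = set(articles_mentioning(triples, entity))
    occs = {o for s, p, o in triples if p == OCC and s in articles}
    return sorted({o for s, p, o in triples if p == ENT and s in occs and o != entity})

print(articles_mentioning(triples, ZIKA))
print(co_mentioned(triples, ZIKA))
```

The join only works because the entity URI is shared; if each article minted its own URI for Zika virus, the second hop would find nothing in common.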
Ontology for Extracted Entities
[Figure: the entities http://dataninja.net/entity/Philippines, http://dataninja.net/entity/Thailand, and http://dataninja.net/entity/Nigeria are connected by rdfs:subClassOf edges to the categories http://dataninja.net/entity/category/Location, http://dataninja.net/entity/category/Country, and http://dataninja.net/entity/category/Kingdom]
Ontology extracted for categories of entities
Ontology for Extracted Entities (with more categories)
[Figure: the same entities and categories, plus the categories http://dataninja.net/category/Southeast+Asia, http://dataninja.net/category/Regions+of+Asia, and http://dataninja.net/category/Africa, all connected by rdfs:subClassOf edges]
Additional categories of entities added to the ontology
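The value of these rdfs:subClassOf edges is that a reasoner can infer memberships nobody asserted directly: RDFS entailment rule rdfs11 makes rdfs:subClassOf transitive. The Python sketch below computes that closure by fixed-point iteration. The exact edges in the figure are not recoverable from the text, so the hierarchy used here is a hypothetical arrangement of the slide's nodes.

```python
from collections import defaultdict

SUBCLASS = "rdfs:subClassOf"
NS = "http://dataninja.net/"

# Hypothetical arrangement of the slide's nodes; the real edges may differ.
edges = [
    (NS + "entity/Thailand", SUBCLASS, NS + "entity/category/Kingdom"),
    (NS + "entity/category/Kingdom", SUBCLASS, NS + "entity/category/Country"),
    (NS + "entity/category/Country", SUBCLASS, NS + "entity/category/Location"),
    (NS + "entity/Thailand", SUBCLASS, NS + "category/Southeast+Asia"),
    (NS + "entity/Philippines", SUBCLASS, NS + "category/Southeast+Asia"),
    (NS + "category/Southeast+Asia", SUBCLASS, NS + "category/Regions+of+Asia"),
    (NS + "entity/Nigeria", SUBCLASS, NS + "category/Africa"),
]

def subclass_closure(triples):
    """Apply the transitivity rule (rdfs11) until no new subClassOf facts appear."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == SUBCLASS:
            parents[s].add(o)
    changed = True
    while changed:
        changed = False
        for child in list(parents):
            inherited = set()
            for parent in parents[child]:
                inherited |= parents.get(parent, set())
            if not inherited <= parents[child]:
                parents[child] |= inherited
                changed = True
    return parents

closure = subclass_closure(edges)
thailand = sorted(c.rsplit("/", 1)[1] for c in closure[NS + "entity/Thailand"])
print(thailand)
```

With this hierarchy, a query for everything under Location or Regions of Asia finds Thailand even though neither fact was extracted directly.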
RDF Graphs for Extracted Concepts (one news article)
[Figure: the article http://www.newkerala.com/news/2016/fullnews-113309.html is linked by http://dataninja.net/occurrence to the occurrence node urn:uuid:3f365159-2572-4c91-99ea-0f7ec7c0b7bc, which is linked by http://dataninja.net/concept to http://dataninja.net/concept/Zika+virus and by http://dataninja.net/occurrence/concept/score to 0.33; the entity nodes http://dataninja.net/entity/Zika+virus and http://dataninja.net/entity/Zika+fever also appear, with an owl:sameAs link]
Same URI for the same concept, but not for entities with the same name
RDF Graphs for Extracted Concepts (with categories)
[Figure: the concepts http://dataninja.net/concept/Zika+virus and http://dataninja.net/concept/Zika+fever are connected by rdfs:subClassOf edges to the categories http://dataninja.net/category/Flaviviruses, http://dataninja.net/category/Zoonoses, http://dataninja.net/category/Viral+diseases, and http://dataninja.net/category/Infectious+diseases]
More categories of concepts added to improve the richness of the ontology
RDF Graphs for Extracted Relationships
[Figure: the article https://www.yahoo.com/news/singapore-says-confirms-26-more-local-transmission-zika-052937119--finance.html is linked, via http://dataninja.net/occurrence and http://dataninja.net/entity, to the entities http://dataninja.net/entity/Zika+virus and http://dataninja.net/entity/Singapore; the relationships http://dataninja.net/relationship/Outbreak, http://dataninja.net/relationship/Mosquitoes, and http://dataninja.net/relationship/Infections connect them, and an owl:intersectionOf construct also appears]
New relationships are discovered over time to enrich the ontology further
Semantic Search using RDF Graphs
[Figure: documents and news articles are stored as an RDF graph in Oracle Spatial and Graph (concepts, related concepts, categories, entities, entity categories, keywords, relationships); queries run through Oracle Graph Analytics return relevant, matched news articles]
Case Study: Insights from analyzing Amazon Product Reviews
Amazon Product Reviews – PG Data Model
[Figure: reviewer vertices (name="John", name="Sue", name="buy1", name="shopper") connected by "Review" edges to product vertices (asin="0000078", asin="10467328", asin="00675434", asin="20794378"); each edge carries the properties helpful, reviewText, overall, summary, and reviewTime]
Raw JSON format:
{"reviewerID": "A3AF8FFZAZYNE5",
 "asin": "0000000078",
 "helpful": [1, 1],
 "reviewText": "…",
 "overall": 5.0,
 "summary": "Impactful!",
 "unixReviewTime": 1092182400,
 "reviewTime": "08 11, 2004"}
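The model maps each raw JSON record to a reviewer vertex, a product vertex, and one "Review" edge that holds the review fields. The Python sketch below shows that mapping for the record above, using plain dicts rather than any Oracle API. It assumes, without confirmation from the slide, that "helpful" is [helpful votes, total votes], and it checks that the human-readable reviewTime ("MM DD, YYYY") agrees with unixReviewTime.

```python
import json
from datetime import datetime, timezone

raw = ('{"reviewerID": "A3AF8FFZAZYNE5", "asin": "0000000078", "helpful": [1, 1], '
       '"reviewText": "...", "overall": 5.0, "summary": "Impactful!", '
       '"unixReviewTime": 1092182400, "reviewTime": "08 11, 2004"}')

def to_graph_elements(record: dict):
    """Map one review record to (reviewer vertex, product vertex, Review edge)."""
    reviewer = {"id": record["reviewerID"], "label": "reviewer"}
    product = {"id": record["asin"], "label": "product", "asin": record["asin"]}
    # Assumption: "helpful" is [helpful votes, total votes].
    helpful_votes, total_votes = record.get("helpful", [0, 0])
    review_date = datetime.fromtimestamp(record["unixReviewTime"], tz=timezone.utc).date()
    edge = {
        "label": "Review",
        "out": reviewer["id"],
        "in": product["id"],
        "overall": float(record["overall"]),
        "summary": record.get("summary", ""),
        "reviewText": record.get("reviewText", ""),
        "helpfulVotes": helpful_votes,
        "totalVotes": total_votes,
        "reviewDate": review_date,
    }
    return reviewer, product, edge

record = json.loads(raw)
reviewer, product, edge = to_graph_elements(record)
# The "MM DD, YYYY" reviewTime should agree with the Unix timestamp (UTC).
assert datetime.strptime(record["reviewTime"], "%m %d, %Y").date() == edge["reviewDate"]
print(edge["out"], "->", edge["in"], edge["overall"], edge["reviewDate"])
```

Keeping review fields on the edge rather than on either vertex matches the slide's model: a reviewer can review many products, and each review is a distinct relationship.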
Amazon Product Reviews – Data Ninja Enrichment
[Figure: the same review graph, with each Review edge enriched by two additional properties, sentiment and sentimentScore; a processing pipeline (JSON Parser → Fetch Sentiment → Create Nodes → Create Relationship → Oracle Connector) loads the product reviews into Oracle Big Data Spatial and Graph, backed by Oracle NoSQL Database or Apache HBase]
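The pipeline stages in the figure can be sketched as one loop: parse each JSON line, fetch a sentiment for its review text, reuse or create the reviewer and product vertices, and append an enriched Review edge. In the Python sketch below the sentiment service is injected as a function returning a (sentiment, score) pair, the same shape the getSmartSentiment function in the demo that follows returns; `fake_sentiment` and the sample records are hypothetical stand-ins, and the final Oracle Connector step (writing to Oracle NoSQL Database or Apache HBase) is not shown.

```python
import json
from typing import Callable, Dict, List, Tuple

SentimentFn = Callable[[str], Tuple[str, float]]

def build_review_graph(json_lines: List[str], get_sentiment: SentimentFn):
    """JSON parser -> fetch sentiment -> create nodes -> create relationships."""
    vertices: Dict[str, dict] = {}
    edges: List[dict] = []
    for line in json_lines:
        review = json.loads(line)  # JSON parser
        sentiment, score = get_sentiment(review.get("reviewText", ""))  # fetch sentiment
        # Create nodes: one vertex per reviewer and per product, reused across reviews.
        vertices.setdefault(review["reviewerID"], {"type": "reviewer"})
        vertices.setdefault(review["asin"], {"type": "product"})
        # Create relationship: the Review edge carries the enriched properties.
        edges.append({
            "out": review["reviewerID"],
            "in": review["asin"],
            "label": "Review",
            "overall": float(review.get("overall", 0.0)),
            "sentiment": sentiment,
            "sentimentScore": score,
        })
    return vertices, edges

def fake_sentiment(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a call to the Data Ninja service."""
    return ("negative", -0.9) if "boring" in text.lower() else ("positive", 0.8)

lines = [
    '{"reviewerID": "R1", "asin": "P1", "overall": 5.0, "reviewText": "Great read"}',
    '{"reviewerID": "R1", "asin": "P2", "overall": 1.0, "reviewText": "A boring book"}',
    '{"reviewerID": "R2", "asin": "P1", "overall": 4.0, "reviewText": "Enjoyable"}',
]
vertices, edges = build_review_graph(lines, fake_sentiment)
print(len(vertices), len(edges))
```

Injecting the sentiment function keeps the graph-building logic testable offline; in a live run, the network call would dominate the cost, so batching or caching requests would be worth considering.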
Demo — Data Ninja Integration
# Please sign up at https://market.mashape.com/dataninja/smart-content
# and obtain your free Data Ninja API key.
# Alternatively, you can use the Amazon Web Services API Gateway
# (using your AWS account): https://auth.dataninja.net/cart
import json

import requests

smartcontent_url = 'https://smartcontent.dataninja.net/smartcontent/tag'
user_name = 'YOUR_USER_NAME_HERE'   # placeholder: your Mashape user name
mashape_key = 'YOUR_API_KEY_HERE'   # placeholder: your Data Ninja API key
headers = {'Content-Type': 'application/json',
           'Accept': 'application/json',
           'X-Mashape-User': user_name,
           'X-Mashape-Key': mashape_key}
Demo — Data Ninja Integration
def getSmartSentiment(text):
    """Send text to the Data Ninja Smart Content service; return (sentiment, sentiment_score)."""
    payload = {'text': text}
    r = requests.post(smartcontent_url, headers=headers,
                      data=json.dumps(payload))
    r.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = r.json()
    # Extract the sentiment and sentiment_score from the output,
    # falling back to defaults when a field is missing
    sentiment = data.get('sentiment', '')
    sentScore = data.get('sentiment_score', 0.0)
    return sentiment, sentScore
Demo — Initialization
# Log into the Oracle Big Data Lite VM, then start the Gremlin shell for Oracle NoSQL Database
cd /opt/oracle/oracle-spatial-graph/property_graph/dal/groovy
sh gremlin-opg-nosql.sh

// In the Gremlin shell: describe the graph stored in Oracle NoSQL Database,
// and declare the vertex/edge properties (with default values) to load
server = new ArrayList<String>();
server.add("localhost:5000");
cfg = GraphConfigBuilder.forPropertyGraphNosql() \
  .setName("aws_review").setStoreName("kvstore") \
  .setHosts(server) \
  .addVertexProperty("name", PropertyType.STRING, "EMPTY_NAME") \
  .addEdgeProperty("overall", PropertyType.DOUBLE, "0.0") \
  .addEdgeProperty("sentimentScore", PropertyType.DOUBLE, "0.0") \
  .addEdgeProperty("sentiment", PropertyType.STRING, "NO_SENTIMENT") \
  .addEdgeProperty("reviewText", PropertyType.STRING, "NO_REVIEW") \
  .setMaxNumConnections(2).build();
Demo — Create session
// Get a handle to the property graph stored in Oracle NoSQL Database,
// using the configuration from the previous step
opg = OraclePropertyGraph.getInstance(cfg);
// Create a new analyst session and read the graph from the database
// into memory; this lets us run PGQL queries efficiently
// and use the built-in graph algorithms
session = Pgx.createSession("session1");
analyst = session.createAnalyst();
pgxGraph = session.readGraphWithProperties(cfg);
Demo — PGQL queries
// PGQL is a SQL-like query language for property graphs
// http://pgql-lang.org/
query1 = "SELECT n, e, e.overall, e.sentimentScore, m " +
         "WHERE (n) -[e]-> (m) LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query1);
pgxResultSet.print(10);
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n | e | m |
===============================================================================================
| PgxVertex[ID=-7340878287527889238] | PgxEdge[ID=5762] | PgxVertex[ID=-9102601091582098129] |
| PgxVertex[ID=-3177690238472796119] | PgxEdge[ID=16300] | PgxVertex[ID=-9064039503677645533] |
| PgxVertex[ID=4519911688218637303] | PgxEdge[ID=17019] | PgxVertex[ID=-8952286227085815033] |
| PgxVertex[ID=-519930175215930092] | PgxEdge[ID=10178] | PgxVertex[ID=-8670116947875050439] |
| PgxVertex[ID=-3248157193225014577] | PgxEdge[ID=10818] | PgxVertex[ID=-8450344604270036796] |
| PgxVertex[ID=1160440609280744779] | PgxEdge[ID=11251] | PgxVertex[ID=-8079550817648245886] |
| PgxVertex[ID=6181033568449534264] | PgxEdge[ID=8948] | PgxVertex[ID=-7996993222650009100] |
| PgxVertex[ID=8061500766289030429] | PgxEdge[ID=3605] | PgxVertex[ID=-7826585563510228947] |
| PgxVertex[ID=-6856157354094250528] | PgxEdge[ID=5813] | PgxVertex[ID=-7593018979011067527] |
| PgxVertex[ID=862019015540675002] | PgxEdge[ID=1018] | PgxVertex[ID=-7556968917107238591] |
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Demo — Aggregate queries (1)
// Example 1: disagreement in polarity: high rating but low (negative) sentiment score
query2 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall > 4.0 and sentimentScore < -0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query2);   // pass the query variable, not the string "query2"
pgxResultSet.print(10);
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n.name | e.overall | e.sentimentScore | e.reviewText | m |
===============================================================================================================================================================================================
| Kiwi | 5.0 | -0.90514827 | She climbed out of the cockpit of her Fairey Barracuda and became instantly famo | PgxVertex[ID=1878509548385937579] |
| Gary Selikow | 5.0 | -0.90514827 | The Holocaust A History of the Jews of Europe During the Second World War , by p | PgxVertex[ID=9122138607977681669] |
| Miss Calculation "Mathbaby" | 5.0 | -0.90514827 | There I was. Probably the only one in the movie theater above the age of thirtee | PgxVertex[ID=-611636155378504919] |
| Srinivas P. Ganti "prasad" | 5.0 | -0.90514827 | In a very exhaustive account of Middle Eastern politics, Friedman narrates, base | PgxVertex[ID=7872217753950946849] |
| Bluestalking Reader "Bluestalking Reader" | 5.0 | -0.90514827 | I guess the only way to do this is just plunge right in, though of all the books | PgxVertex[ID=4467821667800686818] |
| Bonnie Brody "Book Lover and Knitter" | 5.0 | -0.90514827 | Joyce Carol Oates has written a deeply felt memoir, 'A Widow's Story', following | PgxVertex[ID=4467821667800686818] |
| Stephen Frater | 5.0 | -0.90514827 | Book reviewBy STEPHEN FRATER, author of HELL ABOVE EARTHLOST IN SHANGRI-LA: | PgxVertex[ID=5830558107292558467] |
| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | A true historic story of survival in the jungles of New Guinea amidst natives wh | PgxVertex[ID=5830558107292558467] |
| Cy B. Hilterman "Cy. Hilterman" | 5.0 | -0.90514827 | What a delightful read! Water for Elephants has got to be one of the best reads | PgxVertex[ID=5894498295248166816] |
| John Umland | 5.0 | -0.90514827 | I read Unbroken in two days. I will summarize the story, mention the author's ef | PgxVertex[ID=-5439053811866244671] |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Demo — Aggregate queries (2)
// Example 2: disagreement in polarity: low rating but high (positive) sentiment score
query3 = "SELECT n.name, e.overall, e.sentimentScore, e.reviewText, m " +
         "WHERE (n) -[e with overall < 2.0 and sentimentScore > 0.9]-> (m) " +
         "order by e.sentimentScore LIMIT 10";
pgxResultSet = pgxGraph.queryPgql(query3);   // pass the query variable, not the string "query3"
pgxResultSet.print(10);
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| n.name | e.overall | e.sentimentScore | e.reviewText | m |
=====================================================================================================================================================================================
| Amazon Customer | 1.0 | 0.90086615 | This title is deceptive-- no one knows what an "Annual" is, except for t | PgxVertex[ID=-5918987544460979951] |
| Galina | 1.0 | 0.90110934 | This book takes many, many pages to say in a remarkably roundabout and flowery w | PgxVertex[ID=-4968252386747415161] |
| Elizebeth Neumann | 1.0 | 0.90114343 | Unless you enjoy reading a book as interesting as the dictionary this book isnt | PgxVertex[ID=5415848389720693761] |
| Doug Rice | 1.0 | 0.90118825 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-8360052157946045560] |
| Doug Rice | 1.0 | 0.9011979 | A dictionary should demonstrate good lexicographic technique and have an up-to-d | PgxVertex[ID=-5498908216507816124] |
| Kindle Reader "Kindle Reader" | 1.0 | 0.90166533 | This was positively the most frustrating book I have ever read. Where others mi | PgxVertex[ID=-4463070554159192016] |
| Alessandro Bruno | 1.0 | 0.90180194 | I felt compelled to review this book in order to shake off that feeling of intel | PgxVertex[ID=3280190210596483762] |
| Hiwaycruzer | 1.0 | 0.9018065 | This book is a must read for all teenagers considering a career at nearby Disney | PgxVertex[ID=3280190210596483762] |
| Jackal | 1.0 | 0.9019033 | This is a boring book about traditional Russian cooking. If you want current Rus | PgxVertex[ID=2190854144543979320] |
| Amazon Customer "Sci-reader" | 1.0 | 0.90192723 | I just finished this book and I must ay that it was a spectacularly boring coll | PgxVertex[ID=-6659798236378008734] |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
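Both queries apply the same idea: keep reviews whose star rating and text sentiment point in opposite directions, order by sentimentScore ascending, and keep the first ten. The Python sketch below mirrors those two filters over plain dicts so the thresholds are easy to see; it is not PGQL, and the rows are a few values copied from the result tables plus one hypothetical, consistent review that neither filter should return.

```python
def polarity_disagreements(reviews, high_rating=True, limit=10):
    """Mirror of the two PGQL filters.

    high_rating=True  -> overall > 4.0 and sentimentScore < -0.9 (query2)
    high_rating=False -> overall < 2.0 and sentimentScore > 0.9  (query3)
    Both order by sentimentScore ascending and keep the first `limit` rows.
    """
    if high_rating:
        keep = [r for r in reviews if r["overall"] > 4.0 and r["sentimentScore"] < -0.9]
    else:
        keep = [r for r in reviews if r["overall"] < 2.0 and r["sentimentScore"] > 0.9]
    return sorted(keep, key=lambda r: r["sentimentScore"])[:limit]

reviews = [
    {"name": "Kiwi", "overall": 5.0, "sentimentScore": -0.90514827},
    {"name": "Jackal", "overall": 1.0, "sentimentScore": 0.9019033},
    {"name": "Galina", "overall": 1.0, "sentimentScore": 0.90110934},
    {"name": "Consistent (hypothetical)", "overall": 5.0, "sentimentScore": 0.95},
]
print([r["name"] for r in polarity_disagreements(reviews)])
print([r["name"] for r in polarity_disagreements(reviews, high_rating=False)])
```

Note that many scores in both result tables cluster just past the ±0.9 threshold, so the cut-off choice strongly affects which reviews are flagged.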
Demo — Graph Algorithms
// Personalized PageRank, starting from one vertex of interest
vertexSet = pgxGraph.createVertexSet();
vertex = pgxGraph.getVertex(4681900072665192241L);
vertexSet.add(vertex);
ppr = analyst.personalizedPagerank(pgxGraph, vertexSet);
topK = ppr.getTopKValues(10);   // the 10 vertices with the highest scores
for (entry in topK) {
  println("${entry.getKey().getId()} ${entry.getValue()}");
}
// Community detection, path analysis, clustering, …
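Personalized PageRank scores vertices by how often a random walk visits them when it keeps restarting from the chosen vertex set, so it ranks what is "close to" a given reviewer or product. The Python sketch below is a plain power-iteration version for intuition only; it is not PGX's implementation, the damping factor 0.85 and the dangling-vertex handling (sending that mass back to the sources) are assumptions, and PGX's defaults may differ. The small review graph is hypothetical.

```python
def personalized_pagerank(graph, sources, damping=0.85, iterations=100, tol=1e-10):
    """Power iteration for PageRank with restarts at `sources`.

    graph maps each vertex to a list of out-neighbours (all must be keys).
    With probability `damping` the walker follows an out-edge; otherwise it
    jumps back to a source vertex. Mass at vertices without out-edges is
    also sent back to the sources.
    """
    nodes = list(graph)
    restart = {v: (1.0 / len(sources) if v in sources else 0.0) for v in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {v: (1.0 - damping) * restart[v] for v in nodes}
        dangling = 0.0
        for v in nodes:
            out = graph[v]
            if out:
                share = damping * rank[v] / len(out)
                for w in out:
                    nxt[w] += share
            else:
                dangling += damping * rank[v]
        for v in nodes:
            nxt[v] += dangling * restart[v]
        delta = sum(abs(nxt[v] - rank[v]) for v in nodes)
        rank = nxt
        if delta < tol:
            break
    return rank

graph = {
    "reviewer": ["book1", "book2"],
    "book1": ["reviewer"],
    "book2": ["reviewer", "book3"],
    "book3": [],
    "other": ["book3"],
}
scores = personalized_pagerank(graph, {"reviewer"})
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Unlike global PageRank, the vertex "other" scores zero here: nothing reachable from the source leads back to it.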
Summary
• Introduction and overview of graph technologies and graph databases
  – RDF Semantic Graph
  – Property Graph
• Integrating text analytics with graph technologies
  – Construct graphs from text using natural language understanding technologies
  – Enrich graph data with text analytics
• Data Ninja Services Java client for Oracle Spatial and Graph, available with the Oracle Big Data Lite Virtual Machine
  – Please try it and give us your feedback!
• Contact us at [email protected] or [email protected]
Resources
• Oracle Spatial and Graph
oracle.com/technetwork/database/options/spatialandgraph
• Oracle Big Data Spatial and Graph
oracle.com/database/big-data-spatial-and-graph/index.html
• Data Ninja Services
https://dataninja.net
• Java SDK for Oracle Spatial and Graph
https://github.com/DataNinjaAPI/dataninja-api-oracle-sdk-java
BACKUP
Semantic Alignment of Enterprise Metadata Powering Enterprise Federation and Integration
Benefits:
– Existing relational data stays in place and corresponding applications do not need to change
– Use of virtual mapping eliminates synchronization issues
– Common vocabulary helps with data integration issues
[Figure: a database server holds the HR, Inventory, and Sales schemas; applications 1 to 3 on a mid-tier server access them through SQL and, via SPARQL, through RDF graphs (including an Inventory Graph and a Sales Graph) aligned by shared ontologies]
The National Statistics Center (NSTAC), an incorporated administrative agency, forms a part of the central statistical organization in Japan.
The IMISOS database has run on an Exadata X2-2 Half Rack since 2013, with the Active Data Guard option and Database Firewall. Oracle Japan published a customer case study.
NSTAC also bought an Exadata X3-2 Eighth Rack for tabulation work (FY14Q4).
Another Exadata opportunity, for the population census, is expected to close by FY15Q3.
http://www.nstac.go.jp/en/index.html
RDF Semantic Graph: RDF Views on Relational Tables
• Pattern matching on relational tables
• Supports the W3C RDF & SPARQL standards
• Automatic and custom mapping
  – Direct Mapping: automatic
  – R2RML: express customized mappings
• RDF views on tables, views, and SQL query results
• No duplication of data or storage
Employee table:
EmpNo | Ename | Job | Mgr | DeptNo
7521 | Ward | Salesman | 7698 | 10
7698 | Blake | Manager | 7839 | 10
7839 | King | President | (null) | 30
Department table:
DeptNo | LOC
10 | NYC
30 | CHI
[Figure: the resulting RDF graph: :emp7521, :emp7698, and :emp7839 have :name (Ward, Blake, King) and :job (Salesman, Manager, President); :hasMgr links :emp7521 to :emp7698 and :emp7698 to :emp7839; :worksAt links each employee to :dept10 or :dept30; :location links :dept10 to NYC and :dept30 to CHI]
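The correspondence between the two tables and the graph can be made explicit by materializing the mapping. In Oracle the RDF view is virtual (no data is copied); the Python sketch below just generates the same triples to show what each row and column becomes. The vocabulary (`:name`, `:job`, `:hasMgr`, `:worksAt`, `:location`) is the slide's custom, R2RML-style mapping, and, as in W3C Direct Mapping, a NULL column produces no triple.

```python
EMP = [
    # (EmpNo, Ename, Job, Mgr, DeptNo); King has no manager (NULL)
    (7521, "Ward", "Salesman", 7698, 10),
    (7698, "Blake", "Manager", 7839, 10),
    (7839, "King", "President", None, 30),
]
DEPT = [(10, "NYC"), (30, "CHI")]

def map_to_triples(emp_rows, dept_rows):
    """Custom mapping of the employee and department rows to the slide's vocabulary."""
    triples = []
    for empno, ename, job, mgr, deptno in emp_rows:
        subject = f":emp{empno}"
        triples.append((subject, ":name", ename))
        triples.append((subject, ":job", job))
        if mgr is not None:  # NULL columns produce no triple
            triples.append((subject, ":hasMgr", f":emp{mgr}"))
        # Foreign key DeptNo becomes a link to the department resource.
        triples.append((subject, ":worksAt", f":dept{deptno}"))
    for deptno, loc in dept_rows:
        triples.append((f":dept{deptno}", ":location", loc))
    return triples

triples = map_to_triples(EMP, DEPT)
for t in triples:
    print(*t)
```

Foreign keys (Mgr, DeptNo) become links between resources, which is what lets a SPARQL query follow :hasMgr chains that would need self-joins in SQL.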
Text Search through Apache Lucene/SolrCloud
• Integration with Apache Lucene and SolrCloud
• Supports manual and automatic indexing of graph elements
• Manual index:
  • oraclePropertyGraph.createIndex("my_index", Vertex.class);
  • indexVertices = oraclePropertyGraph.getIndex("my_index", Vertex.class);
  • indexVertices.put("key", "value", myVertex);
• Automatic index:
  • oraclePropertyGraph.createKeyIndex("name", Edge.class);
  • oraclePropertyGraph.getEdges("name", "*hello*world");
• Enables queries to use syntax like "*oracle* or *graph*"
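The automatic-index idea (index one property key for every element as it is added, then look elements up by a wildcard pattern) can be illustrated with a toy in-memory structure. The Python sketch below is an analogy only: it matches whole property values with shell-style wildcards via `fnmatch`, whereas Lucene evaluates wildcard queries against analyzed index terms, so real results can differ; the class name `KeyIndex` and the sample values are hypothetical.

```python
import fnmatch
from collections import defaultdict

class KeyIndex:
    """Toy automatic index: each element's value for `key` is indexed on insert."""

    def __init__(self, key):
        self.key = key
        self.by_value = defaultdict(set)

    def add(self, element_id, props):
        if self.key in props:
            self.by_value[str(props[self.key])].add(element_id)

    def get(self, pattern):
        """Return ids whose value matches a shell-style wildcard pattern (* and ?)."""
        hits = set()
        for value, ids in self.by_value.items():
            if fnmatch.fnmatchcase(value, pattern):
                hits |= ids
        return sorted(hits)

index = KeyIndex("name")
index.add("e1", {"name": "hello graph world"})
index.add("e2", {"name": "hello world"})
index.add("e3", {"name": "goodbye world"})
print(index.get("*hello*world"))
```

A real text index avoids this linear scan over values by matching the pattern against a sorted term dictionary, which is why leading wildcards such as "*hello" are typically the most expensive case.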