ontotext
May 3, 2023
First Steps in Semantic Data Modelling and Search & Analytics in the Cloud
Announcement: New training course
Designing a Semantic Technology Proof of Concept with GraphDB™
13 December 2016 | 10am CET | 9am GMT | 11am EET
Course contents:
• 3 hours' worth of tailored video materials on Semantic Technologies
• 2 hours' worth of SPARQL exercises and sample solutions
• 4-hour live interactive session designing a sample Proof of Concept with GraphDB
• 1-hour 1-on-1 consulting follow-up session
Topics Covered
• Modeling data using the Resource Description Framework
• Applying flexible schemas on schema-less data
• Using simple ontologies for automated reasoning on data
• Effectively using and configuring RDF databases & repositories
• Transforming, cleaning up and linking heterogeneous data with OntoRefine
• Loading distributed data in one unified data layer
• Querying and updating RDF data with SPARQL
• Linked Open Data: how to link data and useful LOD resources
• Data exploration and data visualization with GraphDB™
• Domain-specific use cases of adopting semantic technologies
Presentation Outline
• Modeling data using RDF
• Applying flexible schema on schema-less data
• Ontologies for automated reasoning on data
• SPARQL query types and modifiers
• Graph databases and triplestores
• Choosing an appropriate database solution
• Niche-specific reference projects
• S4 for on-demand low-cost smart data management
• S4 REST services
• S4 Knowledge graph
MODELING DATA USING RDF
Example
What is RDF?
Information can be described through relationships between things, e.g.
• The relationship between the movie Thor and Kenneth Branagh is that Kenneth directed the movie.
• The relationship between the movie Thor and the date May 6, 2011 is that the movie was released (in the US) on that date.
Such descriptions are formalized using the Resource Description Framework.
The Resource Description Framework (RDF) is a graph data model that
• Formally describes the semantics, or meaning, of information
• Represents metadata, i.e., data about data
The RDF data model consists of triples
• That represent links (or edges) in an RDF graph
• Where the structure of each triple is Subject, Predicate, Object
Example triples:
Subject     Predicate         Object
mdb:Thor    mdb:directedBy    mdb:KennethBranagh .
mdb:Thor    mdb:releaseDate   2011-05-06 .
‘mdb:’ refers to the namespace ‘http://example.org/movieDB/’ so that ‘mdb:Thor’ expands to <http://example.org/movieDB/Thor>, a Uniform Resource Identifier (URI).
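The prefix expansion described here can be sketched in a few lines of Python. This is only an illustration using plain tuples and the standard library, not an RDF library; the names PREFIXES and expand are made up for this sketch.

```python
# Minimal sketch: RDF triples as plain Python tuples (no RDF library assumed).
# PREFIXES and expand() are hypothetical helper names used only in this example.
PREFIXES = {"mdb": "http://example.org/movieDB/"}


def expand(term: str) -> str:
    """Expand a prefixed name such as 'mdb:Thor' into a full URI in angle brackets."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    return term  # literals and unknown prefixes are returned unchanged


# The two example triples from the slide: (subject, predicate, object).
triples = [
    ("mdb:Thor", "mdb:directedBy", "mdb:KennethBranagh"),
    ("mdb:Thor", "mdb:releaseDate", "2011-05-06"),
]

# Expand every term of every triple.
expanded = [tuple(expand(term) for term in triple) for triple in triples]

for subject, predicate, obj in expanded:
    print(subject, predicate, obj, ".")
```

The date is left as a bare string here; in real RDF it would be a typed literal.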
An Example of an RDF Model
But RDF is more than just a tool for representing information that we already know!
FLEXIBLE SCHEMA & AUTOMATED REASONING
What is RDFS?
RDF Schema (RDFS)
• Adds
  – Concepts such as Resource, Literal, Class, and Datatype
  – Relationships such as subClassOf, subPropertyOf, domain, and range
• Provides the means to define
  – Classes and properties
  – Hierarchies of classes and properties
• Includes “entailment rules”, i.e., rules for inferring new triples from existing ones
Applying RDFS To Infer New Triples
Asserted triples:
mdb:directedBy rdfs:domain mdb:Movie ;
               rdfs:range mdb:Director .
mdb:Thor mdb:directedBy mdb:KennethBranagh .
mdb:Director rdfs:subClassOf mdb:Human .
Inferred triples:
mdb:Thor a mdb:Movie .
mdb:KennethBranagh a mdb:Director .
mdb:KennethBranagh a mdb:Human .
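How the three new triples follow from the schema and data can be sketched as a small fixpoint loop. This is a minimal illustration over plain tuples, covering only the domain, range and subclass-membership rules; it is not how GraphDB implements inference, and rdfs_closure is a made-up name.

```python
# Minimal sketch of three RDFS entailment rules over tuples (not GraphDB's engine).
# rdfs_closure() is a hypothetical helper; "rdf:type" stands for Turtle's "a".
def rdfs_closure(triples):
    """Apply the domain, range and subclass-membership rules until nothing new appears."""
    known = set(triples)
    while True:
        new = set()
        for s, p, o in known:
            for s2, p2, o2 in known:
                # Domain rule: (p domain C) and (s p o) give (s type C).
                if p2 == "rdfs:domain" and s2 == p:
                    new.add((s, "rdf:type", o2))
                # Range rule: (p range C) and (s p o) give (o type C).
                if p2 == "rdfs:range" and s2 == p:
                    new.add((o, "rdf:type", o2))
                # Subclass rule: (s type C) and (C subClassOf D) give (s type D).
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdf:type", o2))
        if new <= known:
            return known
        known |= new


asserted = {
    ("mdb:directedBy", "rdfs:domain", "mdb:Movie"),
    ("mdb:directedBy", "rdfs:range", "mdb:Director"),
    ("mdb:Thor", "mdb:directedBy", "mdb:KennethBranagh"),
    ("mdb:Director", "rdfs:subClassOf", "mdb:Human"),
}

inferred = rdfs_closure(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

The loop needs two passes: the first derives the Movie and Director types, the second uses the Director type to derive the Human type.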
An ontology is a formal specification that provides sharable and reusable knowledge representation.
Other examples of such formal specifications include:
• Taxonomies
• Vocabularies
• Thesauri
• Topic Maps
• Logical Models
What is in an Ontology?
An ontology specification includes descriptions of
• Concepts and properties in a domain
• Relationships between concepts
• Constraints on how the relationships can be used
• Individuals as members of concepts
The Benefits of an Ontology
Ontologies provide:
• A common understanding of information
• Explicit domain assumptions
These provisions are valuable because ontologies:
• Support data integration for analytics
• Apply domain knowledge to data
• Support interoperation of applications
• Enable model-driven applications
• Reduce the time and cost of application development
• Improve data quality, i.e., metadata and provenance
OWL Overview
The Web Ontology Language (OWL) adds more powerful ontology modelling means to RDF/RDFS
• Providing
  – Consistency checks: Are there logical inconsistencies?
  – Satisfiability checks: Are there classes that cannot have instances?
  – Classification: What is the type of an instance?
• Adding identity equivalence and identity difference
  – Such as sameAs, differentFrom, equivalentClass, equivalentProperty
• Offering more expressive class definitions, such as
  – Class intersection, union, complement, disjointness
  – Cardinality restrictions
• Offering more expressive property definitions, such as
  – Object and datatype properties
  – Transitive, functional, symmetric, inverse properties
  – Value restrictions
SPARQL
What is SPARQL?
SPARQL is a SQL-like query language for RDF graph data. It has the following query forms:
• SELECT, which returns tabular results
• CONSTRUCT, which creates a new RDF graph based on query results
• ASK, which returns ‘yes’ (true) if the query has a solution, otherwise ‘no’ (false)
• DESCRIBE, which returns RDF graph data about a resource; useful when the query client does not know the structure of the RDF data in the data source
and the following update operations (SPARQL Update):
• INSERT, which inserts triples into a graph
• DELETE, which deletes triples from a graph
Ontotext, AD and Keen Analytics, LLC. All Rights Reserved
Using SPARQL to Insert Triples
To create an RDF graph, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use INSERT DATA to signify you want to insert statements.
• Write the subject-predicate-object statements (triples).
• Execute this update.

PREFIX mdb: <http://example.org/movieDB/>
INSERT DATA {
  mdb:Thor mdb:starring mdb:ChrisHemsworth ;
           mdb:starring mdb:NataliePortman ,
                        mdb:AnthonyHopkins .
}
Using SPARQL to Select Triples
To access the RDF graph you just created, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select certain information, and WHERE to signify your conditions, restrictions and filters.
• Execute this query.

PREFIX mdb: <http://example.org/movieDB/>
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }

Subject     Predicate         Object
mdb:Thor    mdb:directedBy    mdb:KennethBranagh
mdb:Thor    mdb:releaseDate   2011-05-06
mdb:Thor    mdb:starring      mdb:ChrisHemsworth
mdb:Thor    mdb:starring      mdb:NataliePortman
mdb:Thor    mdb:starring      mdb:AnthonyHopkins
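What a triple pattern such as ?subject ?predicate ?object does can be illustrated with a small matcher. This is a rough sketch over plain tuples, not a SPARQL engine: terms starting with "?" are treated as variables, everything else must match exactly, and the function name match is made up for the example.

```python
# Minimal sketch of triple-pattern matching (an illustration, not a SPARQL engine).
# match() is a hypothetical helper; terms starting with "?" act as variables.
def match(pattern, triples):
    """Yield one dict of variable bindings for each triple that fits the pattern."""
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in a pattern must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


triples = [
    ("mdb:Thor", "mdb:directedBy", "mdb:KennethBranagh"),
    ("mdb:Thor", "mdb:releaseDate", "2011-05-06"),
    ("mdb:Thor", "mdb:starring", "mdb:ChrisHemsworth"),
    ("mdb:Thor", "mdb:starring", "mdb:NataliePortman"),
    ("mdb:Thor", "mdb:starring", "mdb:AnthonyHopkins"),
]

# All-variable pattern: every triple matches, as in the SELECT on this slide.
all_rows = list(match(("?subject", "?predicate", "?object"), triples))

# A more selective pattern: only the cast of Thor.
cast = [row["?actor"] for row in match(("mdb:Thor", "mdb:starring", "?actor"), triples)]
print(len(all_rows), cast)
```

Replacing a variable with a constant narrows the result, which is how conditions are added to a WHERE clause.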
Using SPARQL to Find Prolific Actors
To find actors who star in multiple movies, first find out if such an actor exists:
• Define prefixes to URIs with the PREFIX keyword.
• Use ASK to discover whether an actor is starring in two (or more) different movies.
• Use WHERE to signify those conditions.

PREFIX mdb: <http://example.org/movieDB/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
ASK
WHERE {
  ?movie1 a mdb:Movie ;
          mdb:starring ?actor .
  ?movie2 a mdb:Movie ;
          mdb:starring ?actor .
  FILTER NOT EXISTS { ?movie1 owl:sameAs ?movie2 }
}

Result: YES

Note: the filter excludes a pair only when the repository holds (or infers) ?movie1 owl:sameAs ?movie2. Without owl:sameAs reasoning, FILTER (?movie1 != ?movie2) is the safer way to require two different movies.
Using SPARQL to Find Prolific Actors
Now that we know at least one such actor exists, perform these steps to find each actor and pair of movies:
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select an actor and 2 movies, and WHERE to signify your conditions.

PREFIX mdb: <http://example.org/movieDB/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?actor ?movie1 ?movie2
WHERE {
  ?movie1 a mdb:Movie ;
          mdb:starring ?actor .
  ?movie2 a mdb:Movie ;
          mdb:starring ?actor .
  FILTER NOT EXISTS { ?movie1 owl:sameAs ?movie2 }
}

?actor                ?movie1     ?movie2
mdb:AnthonyHopkins    mdb:Noah    mdb:Thor
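The logic behind the two "prolific actors" queries can be mimicked in plain Python. This is a sketch under stated assumptions: the mdb:Noah triples are invented here to match the result row on the slide (the deck does not show them being inserted), movies are compared by identifier rather than by owl:sameAs, and prolific_actors is a made-up name.

```python
# Minimal sketch of the "prolific actors" logic over tuples (not a SPARQL engine).
# Assumptions: the mdb:Noah triples are invented sample data, and two movies count
# as different when their identifiers differ (no owl:sameAs reasoning).
from itertools import combinations

triples = {
    ("mdb:Thor", "rdf:type", "mdb:Movie"),
    ("mdb:Noah", "rdf:type", "mdb:Movie"),
    ("mdb:Thor", "mdb:starring", "mdb:ChrisHemsworth"),
    ("mdb:Thor", "mdb:starring", "mdb:NataliePortman"),
    ("mdb:Thor", "mdb:starring", "mdb:AnthonyHopkins"),
    ("mdb:Noah", "mdb:starring", "mdb:AnthonyHopkins"),
}


def prolific_actors(triples):
    """Return (actor, movie1, movie2) rows for actors starring in two different movies."""
    movies = {s for s, p, o in triples if p == "rdf:type" and o == "mdb:Movie"}
    # Group movies by actor, keeping only subjects typed as mdb:Movie.
    movies_by_actor = {}
    for s, p, o in triples:
        if p == "mdb:starring" and s in movies:
            movies_by_actor.setdefault(o, set()).add(s)
    rows = []
    for actor in sorted(movies_by_actor):
        # Each unordered pair of distinct movies is reported once.
        for movie1, movie2 in combinations(sorted(movies_by_actor[actor]), 2):
            rows.append((actor, movie1, movie2))
    return rows


rows = prolific_actors(triples)
exists = bool(rows)  # plays the role of the ASK query
print(exists, rows)
```

Unlike this sketch, the SPARQL SELECT can also return the same pair with ?movie1 and ?movie2 swapped, since both orderings satisfy the pattern.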
GRAPH DATABASES & TRIPLESTORES
Graph databases
Graph databases store data in terms of entities and the relationships between entities.
They are particularly suited for interconnected data, as they cater for:
• Integration of heterogeneous data sources
• Hierarchical or interconnected datasets
• Dynamic data models / schema evolution
• Relationship-centric analytics / discovery
• Path traversal / navigation, sub-graph pattern matching
Semantic graph databases
RDF databases (triplestores, semantic graph databases) are a variant of graph databases that store data as triples of the form subject-predicate-object.
Advantages of semantic graph databases include:
• Simple, graph-based data model
• Exploratory queries against unknown schema
• Agile schema / schema-less
• Rich, semantic data models (schemas)
• Easily map between data models (schemas)
• Global identifiers of nodes & relations
• Inference of implicit facts, based on rules
• Compliance with standards (RDF, SPARQL), no vendor lock-in
• Easy to publish / consume open Knowledge Graphs (Linked Data)
GraphDB by Ontotext
• High-performance semantic graph database, 10s of billions of triples
• Full compliance with W3C standards (RDF, SPARQL, OWL, …)
• Various inference profiles, including custom rules
• Extensions
  – Geo-spatial, RDF Rank, full-text search, Blueprints/Gremlin, 3rd party plugins
• Tooling for DBAs
GraphDB™ Editions
• GraphDB™ Free
• GraphDB™ Standard
• GraphDB™ Cloud
• GraphDB™ as-a-Service (S4)
• GraphDB™ Enterprise
Fully Managed Database-as-a-Service
• Low-cost DBaaS for Ontotext GraphDB
• Ideal for small to moderate data & query volumes
  – Database options: 10M (free), 50M, 250M & 1B triples
• Instantly deploy new databases when needed
  – Easily scale up / down as data volume changes
• Zero administration
  – Automated operations, maintenance & upgrades
• Faster experimentation & prototyping, reduced TCO
CHOOSING A DATABASE SOLUTION
Choosing an appropriate database solution
From experimentation to production
• Priorities: cost, ease of deployment, performance, availability
• GraphDB options: Free, Standard, Enterprise
• Deployment: on premise, AWS cloud, database-as-a-service
• Seamless upgrade paths
  – All options based on the same engine
Learning | Prototype | Pilot | Production
Learning
• Priorities
  – Free
  – Easy & quick to set up, “sandbox” environment
• Recommended
  – Database-as-a-Service (free 10M triples)
  – GraphDB Free
Prototype
• Priorities
  – Free / low-cost
  – Easy & quick to set up, “sandbox” environment
• Recommended
  – GraphDB Free
  – Database-as-a-Service (10M – 50M triples)
Pilot
• Priorities
  – Low-cost
  – Performance and scalability
• Recommended
  – GraphDB Standard
• Also consider
  – Database-as-a-Service (250M – 1B triples)
  – GraphDB Free
Production
• Priorities
  – Performance and scalability
  – High availability
• Recommended
  – GraphDB Enterprise
  – GraphDB Standard
REFERENCE PROJECTS
BBC
Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue
Goals
• Create a dynamic semantic publishing platform that assembles web pages on-the-fly using a variety of data sources
• Deliver highly relevant data to web site visitors with sub-second response
Challenges
• BBC journalists author and publish content which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate, content re-use was not flexible
• User experience needed to be improved with relevant content
"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."
John O’Donovan, Chief Technical Architect
Future Media BBC MMXII
10 000+ Dynamic Aggregations
Financial Times
Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services
Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it
Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months' time between inception and go-live
• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user-behavior-based recommendation of relevant content and data across the entire media
LMI
Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies
Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results
AstraZeneca
Profile
• Global bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
Euromoney
Profile
• Euromoney Institutional Investor PLC, the international online information and events group
Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform which would include the latest authoring, storing, and display technologies, including semantic annotation, search and a triple store repository
Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction
• Analytics of reports and news of various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure
S4 - SELF-SERVICE SEMANTIC SUITE
Self-service semantic suite (S4)
• Capabilities for Smart Data management and analytics
  – Text analytics for news, life sciences and social media
  – RDF graph database as-a-service
  – Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
  – Simple RESTful services
• Simple pay-per-use pricing
  – No upfront commitments
S4 Benefits
• Enables quick prototyping
  – Instantly available, no provisioning & operations required
  – Focus on building applications, don’t worry about software + infrastructure
• Free tier!
• Easy to start, shorter learning curve
  – Detailed documentation, various add-ons, SDKs and demo code
• Based on enterprise technology by Ontotext
Support and FAQs
Additional resources:
Ontotext:
• Community Forum and Evaluation Support: http://stackoverflow.com/questions/tagged/graphdb
• GraphDB Website and Documentation: http://graphdb.ontotext.com
• Whitepapers, Fundamentals: http://ontotext.com/knowledge-hub/fundamentals/
SPARQL, OWL, and RDF:
• RDF: http://www.w3.org/TR/rdf11-concepts/
• RDFS: http://www.w3.org/TR/rdf-schema/
• SPARQL Overview: http://www.w3.org/TR/sparql11-overview/
• SPARQL Query: http://www.w3.org/TR/sparql11-query/
• SPARQL Update: http://www.w3.org/TR/sparql11-update
For Further Information
• Georgi Georgiev, Head of Global Alliances Development
  – [email protected]
  – 359.882.885.636
• Ilian Uzunov, Europe Sales and Business Development
  – [email protected]
  – 359.888.772.248
• Peio Popov, North America Sales and Business Development
  – [email protected]
  – 1.929.239.0659
The End