First Steps in Semantic Data Modelling and Search & Analytics in the Cloud

May 3, 2023


http://www.ontotext.com/

Announcement: New training course

Designing a Semantic Technology Proof of Concept with GraphDB™
13 December 2016 | 10am CET | 9am GMT | 11am EET

Course contents:
• 3 hours worth of tailored video materials on Semantic Technologies
• 2 hours worth of SPARQL exercises and sample solutions
• 4 hours live interactive session designing a sample Proof of Concept with GraphDB
• 1 hour 1-on-1 consulting follow-up session

Topics Covered

• Modeling data using the Resource Description Framework
• Applying flexible schemas on schema-less data
• Using simple ontologies for automated reasoning on data
• Effectively using and configuring RDF databases & repositories
• Transforming, cleaning up and linking heterogeneous data with OntoRefine
• Loading distributed data in one unified data layer
• Querying and updating RDF data with SPARQL
• Linked Open Data: how to link data and useful LOD resources
• Data exploration and data visualization with GraphDB™
• Domain-specific use cases of adopting semantic technologies

Presentation Outline

• Modeling data using RDF
• Applying flexible schema on schema-less data
• Ontologies for automated reasoning on data
• SPARQL query types and modifiers
• Graph databases and triplestores
• Choosing an appropriate database solution
• Niche-specific reference projects
• S4 for on-demand low-cost smart data management
• S4 REST services
• S4 Knowledge graph

MODELING DATA USING RDF

Example

Information can be described through relationships between things, e.g.
• The relationship between the movie Thor and Kenneth Branagh is that Kenneth directed the movie.
• The relationship between the movie Thor and the date May 6, 2011 is that the movie was released (in the US) on that date.

Such descriptions are formalized using the Resource Description Framework.

What is RDF?

Resource Description Framework (RDF) is a graph data model that
• Formally describes the semantics, or meaning, of information
• Represents metadata, i.e., data about data

The RDF data model consists of triples
• That represent links (or edges) in an RDF graph
• Where the structure of each triple is Subject, Predicate, Object

Example triples:

Subject     Predicate         Object
mdb:Thor    mdb:directedBy    mdb:KennethBranagh .
mdb:Thor    mdb:releaseDate   2011-05-06 .

‘mdb:’ refers to the namespace ‘http://example.org/movieDB/’, so that ‘mdb:Thor’ expands to <http://example.org/movieDB/Thor>, a Uniform Resource Identifier (URI).
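As a rough illustration of the triple structure and prefix expansion described above, the following Python sketch stores the two example triples as plain tuples. The `PREFIXES` table and the helpers `expand` and `objects` are assumptions made for this illustration, not part of any RDF library, and the release date is kept as a plain string rather than a typed literal.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples.
# PREFIXES, expand() and objects() are hypothetical helpers for illustration only.

PREFIXES = {"mdb": "http://example.org/movieDB/"}

triples = {
    ("mdb:Thor", "mdb:directedBy", "mdb:KennethBranagh"),
    ("mdb:Thor", "mdb:releaseDate", "2011-05-06"),  # literal kept as a plain string
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'mdb:Thor' into a full URI.

    Values whose prefix is not in PREFIXES (e.g. literals) are returned unchanged.
    """
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name


def objects(graph, subject, predicate):
    """Follow the edge labelled `predicate` out of the node `subject`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


print(expand("mdb:Thor"))
print(objects(triples, "mdb:Thor", "mdb:directedBy"))
```

Each tuple is one edge of the graph: the subject and object are nodes, and the predicate labels the link between them.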

An Example of an RDF Model


But RDF is more than just a tool for representing information that we already know!

FLEXIBLE SCHEMA & AUTOMATED REASONING

What is RDFS?

RDF Schema (RDFS)
• Adds
  – Concepts such as Resource, Literal, Class, and Datatype
  – Relationships such as subClassOf, subPropertyOf, domain, and range
• Provides the means to define
  – Classes and properties
  – Hierarchies of classes and properties
• Includes “entailment rules”, i.e., axioms to infer new triples from existing ones

Applying RDFS To Infer New Triples

Asserted triples:

mdb:directedBy rdfs:domain mdb:Movie ;
               rdfs:range mdb:Director .
mdb:Thor mdb:directedBy mdb:KennethBranagh .
mdb:Director rdfs:subClassOf mdb:Human .

Inferred from rdfs:domain and rdfs:range:

mdb:Thor a mdb:Movie .
mdb:KennethBranagh a mdb:Director .

Inferred from rdfs:subClassOf:

mdb:KennethBranagh a mdb:Human .
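The inference on this slide can be sketched in a few lines of Python. This is only an illustration of three RDFS entailment rules (domain, range and subClassOf) applied until no new triples appear; the function name `infer_rdfs` is hypothetical, and a real triplestore implements the full rule set far more efficiently.

```python
# Minimal sketch of three RDFS entailment rules applied to a set of triples.
# infer_rdfs() is a hypothetical helper, not an API of any RDF database.

TYPE = "rdf:type"  # written as 'a' in the slides
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
SUBCLASS = "rdfs:subClassOf"

asserted = {
    ("mdb:directedBy", DOMAIN, "mdb:Movie"),
    ("mdb:directedBy", RANGE, "mdb:Director"),
    ("mdb:Thor", "mdb:directedBy", "mdb:KennethBranagh"),
    ("mdb:Director", SUBCLASS, "mdb:Human"),
}


def infer_rdfs(graph):
    """Return the graph plus every triple derivable by the three rules below."""
    closure = set(graph)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # domain: (p domain C) and (s p o) imply (s type C)
                if p2 == DOMAIN and s2 == p:
                    new.add((s, TYPE, o2))
                # range: (p range C) and (s p o) imply (o type C)
                if p2 == RANGE and s2 == p:
                    new.add((o, TYPE, o2))
                # subClassOf: (C subClassOf D) and (x type C) imply (x type D)
                if p == TYPE and p2 == SUBCLASS and s2 == o:
                    new.add((s, TYPE, o2))
        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new


inferred = infer_rdfs(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

The loop runs twice before it settles: the first pass types Thor as a Movie and Kenneth Branagh as a Director, and the second pass uses that new Director triple to derive that he is also a Human.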

An ontology is a formal specification that provides sharable and reusable knowledge representation.

Other examples of such formal specifications include:
• Taxonomies
• Vocabularies
• Thesauri
• Topic Maps
• Logical Models


What is in an Ontology?

An ontology specification includes descriptions of
• Concepts and properties in a domain
• Relationships between concepts
• Constraints on how the relationships can be used
• Individuals as members of concepts

The Benefits of an Ontology

Ontologies provide:
• A common understanding of information
• Explicit domain assumptions

These provisions are valuable because ontologies:
• Support data integration for analytics
• Apply domain knowledge to data
• Support interoperation of applications
• Enable model-driven applications
• Reduce the time and cost of application development
• Improve data quality, i.e., metadata and provenance

OWL Overview

The Web Ontology Language (OWL) adds more powerful ontology modelling means to RDF/RDFS
• Providing
  – Consistency checks: Are there logical inconsistencies?
  – Satisfiability checks: Are there classes that cannot have instances?
  – Classification: What is the type of an instance?
• Adding identity equivalence and identity difference
  – Such as sameAs, differentFrom, equivalentClass, equivalentProperty
• Offering more expressive class definitions, such as
  – Class intersection, union, complement, disjointness
  – Cardinality restrictions
• Offering more expressive property definitions, such as
  – Object and datatype properties
  – Transitive, functional, symmetric, inverse properties
  – Value restrictions
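To make two of the property characteristics above concrete, here is a small Python sketch of what a reasoner derives from symmetric and transitive properties. The property names `mdb:workedWith` and `mdb:follows`, the sample triples, and the function `apply_property_rules` are all assumptions invented for this illustration; a real OWL reasoner would read `owl:SymmetricProperty` and `owl:TransitiveProperty` declarations from the ontology instead of hard-coded sets.

```python
# Minimal sketch of two OWL property characteristics: symmetric and transitive.
# Property names and data are hypothetical and only serve the illustration.

SYMMETRIC = {"mdb:workedWith"}
TRANSITIVE = {"mdb:follows"}

asserted = {
    ("mdb:ChrisHemsworth", "mdb:workedWith", "mdb:NataliePortman"),
    ("mdb:ThorRagnarok", "mdb:follows", "mdb:ThorTheDarkWorld"),
    ("mdb:ThorTheDarkWorld", "mdb:follows", "mdb:Thor"),
}


def apply_property_rules(graph):
    """Add triples implied by symmetric and transitive properties until stable."""
    closure = set(graph)
    changed = True
    while changed:
        new = set()
        for s, p, o in closure:
            if p in SYMMETRIC:
                # (s p o) implies (o p s)
                new.add((o, p, s))
            if p in TRANSITIVE:
                # (s p o) and (o p x) imply (s p x)
                for s2, p2, o2 in closure:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        changed = not new <= closure
        closure |= new
    return closure


inferred = apply_property_rules(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```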

SPARQL


What is SPARQL?

SPARQL is a SQL-like query language for RDF graph data with the following query types:
• SELECT, which returns tabular results
• CONSTRUCT, which creates a new RDF graph based on query results
• ASK, which returns ‘yes’ (true) if the query has a solution, otherwise ‘no’ (false)
• DESCRIBE, which returns RDF graph data about a resource; useful when the query client does not know the structure of the RDF data in the data source

SPARQL Update adds operations that modify data:
• INSERT, which inserts triples into a graph
• DELETE, which deletes triples from a graph


Ontotext, AD and Keen Analytics, LLC. All Rights Reserved 19

Using SPARQL to Insert Triples

To create an RDF graph, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use INSERT DATA to signify you want to insert statements. Write the subject-predicate-object statements (triples).
• Execute this query.

PREFIX mdb: <http://example.org/movieDB/>
INSERT DATA {
  mdb:Thor mdb:starring mdb:ChrisHemsworth ;
           mdb:starring mdb:NataliePortman ,
                        mdb:AnthonyHopkins .
}
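One way to execute such an update from a program is to send it over HTTP as described in the SPARQL 1.1 Protocol ("update via POST directly", with the content type `application/sparql-update`). The Python sketch below only builds the request and does not send it. The endpoint URL is a placeholder assumed for this example and must be replaced with the update URL of your own repository, and `build_update_request` is a hypothetical helper name.

```python
import urllib.request

UPDATE = """\
PREFIX mdb: <http://example.org/movieDB/>
INSERT DATA {
  mdb:Thor mdb:starring mdb:ChrisHemsworth ;
           mdb:starring mdb:NataliePortman ,
                        mdb:AnthonyHopkins .
}
"""

# Assumption: placeholder endpoint; use the update URL of your own repository.
ENDPOINT = "http://localhost:7200/repositories/movies/statements"


def build_update_request(endpoint: str, update: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update request."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


request = build_update_request(ENDPOINT, UPDATE)
print(request.get_method(), request.full_url)

# To run the update against a live endpoint, uncomment:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```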

Using SPARQL to Select Triples

To access the RDF graph you just created, perform these steps:
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select certain information, and WHERE to signify your conditions, restrictions and filters.
• Execute this query.

PREFIX mdb: <http://example.org/movieDB/>
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }

Subject     Predicate         Object
mdb:Thor    mdb:directedBy    mdb:KennethBranagh
mdb:Thor    mdb:releaseDate   2011-05-06
mdb:Thor    mdb:starring      mdb:ChrisHemsworth
mdb:Thor    mdb:starring      mdb:NataliePortman
mdb:Thor    mdb:starring      mdb:AnthonyHopkins

Using SPARQL to Find Prolific Actors

To find actors who star in multiple movies, first find out if such an actor exists:
• Define prefixes to URIs with the PREFIX keyword.
• Use ASK to discover whether an actor is starring in two (or more) different movies.
• Use WHERE to signify those conditions.

PREFIX mdb: <http://example.org/movieDB/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
ASK
WHERE {
  ?movie1 a mdb:Movie ;
          mdb:starring ?actor .
  ?movie2 a mdb:Movie ;
          mdb:starring ?actor .
  FILTER NOT EXISTS { ?movie1 owl:sameAs ?movie2 }
}

Result: YES

Using SPARQL to Find Prolific Actors

Now that we know at least one such actor exists, perform these steps to find each actor and pair of movies:
• Define prefixes to URIs with the PREFIX keyword.
• Use SELECT to signify you want to select an actor and two movies, and WHERE to signify your conditions.

PREFIX mdb: <http://example.org/movieDB/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?actor ?movie1 ?movie2
WHERE {
  ?movie1 a mdb:Movie ;
          mdb:starring ?actor .
  ?movie2 a mdb:Movie ;
          mdb:starring ?actor .
  FILTER NOT EXISTS { ?movie1 owl:sameAs ?movie2 }
}

?actor               ?movie1     ?movie2
mdb:AnthonyHopkins   mdb:Noah    mdb:Thor
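The graph pattern in the queries above is essentially a join on ?actor across two movies. The following Python sketch mimics that join over an in-memory set of triples. It is only an illustration: the sample data (including the Noah triples) and the function name `prolific_actors` are assumptions, and it swaps the owl:sameAs filter for a plain inequality check on the two movie identifiers, which does not depend on a reasoner.

```python
# Minimal sketch of the "prolific actors" pattern as a join over triples.
# Data and function names are illustrative assumptions.

TYPE = "rdf:type"  # written as 'a' in the queries

graph = {
    ("mdb:Thor", TYPE, "mdb:Movie"),
    ("mdb:Noah", TYPE, "mdb:Movie"),
    ("mdb:Thor", "mdb:starring", "mdb:ChrisHemsworth"),
    ("mdb:Thor", "mdb:starring", "mdb:NataliePortman"),
    ("mdb:Thor", "mdb:starring", "mdb:AnthonyHopkins"),
    ("mdb:Noah", "mdb:starring", "mdb:AnthonyHopkins"),
}


def prolific_actors(triples):
    """Return sorted (actor, movie1, movie2) rows for actors in two different movies."""
    movies = {s for s, p, o in triples if p == TYPE and o == "mdb:Movie"}
    starring = [(s, o) for s, p, o in triples if p == "mdb:starring" and s in movies]
    rows = {
        (actor1, movie1, movie2)
        for movie1, actor1 in starring
        for movie2, actor2 in starring
        # plain inequality used here instead of the owl:sameAs filter
        if actor1 == actor2 and movie1 != movie2
    }
    return sorted(rows)


rows = prolific_actors(graph)
exists = bool(rows)  # the ASK form only reports whether any row exists

print(exists)
for row in rows:
    print(row)
```

In this sketch each pair of movies comes back in both orders (Noah/Thor and Thor/Noah), because the pattern is symmetric in the two movie variables. An extra ordering condition on the two movies would keep a single row per pair.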

GRAPH DATABASES & TRIPLESTORES

Graph databases

Graph databases store data in terms of entities and the relationships between entities.

They are particularly suited for interconnected data, as they cater for:
• Integration of heterogeneous data sources
• Hierarchical or interconnected datasets
• Dynamic data models / schema evolution
• Relationship centric analytics / discovery
• Path traversal / navigation, sub-graph pattern matching

Semantic graph databases

RDF databases (triplestores, semantic graph databases) are a variant of graph databases that store data as triples in the format subject-predicate-object.

Advantages of semantic graph databases include:
• Simple, graph based data model
• Exploratory queries against unknown schema
• Agile schema / schema-less
• Rich, semantic data models (schemas)
• Easily map between data models (schemas)
• Global identifiers of nodes & relations
• Inference of implicit facts, based on rules
• Compliance to standards (RDF, SPARQL), no vendor lock-in
• Easy to publish / consume open Knowledge Graphs (Linked Data)

GraphDB by Ontotext

• High performance semantic graph database, 10s of billions of triples
• Full compliance to W3C standards (RDF, SPARQL, OWL, …)
• Various inference profiles, including custom rules
• Extensions
  – Geo-spatial, RDF Rank, full-text search, Blueprints/Gremlin, 3rd party plugins
• Tooling for DBAs

GraphDB™ Editions

• GraphDB™ Free
• GraphDB™ Standard
• GraphDB™ Cloud
• GraphDB™ as-a-Service (S4)
• GraphDB™ Enterprise

Fully Managed Database-as-a-Service

• Low-cost DBaaS for Ontotext GraphDB
• Ideal for small to moderate data & query volumes
  – Database options: 10M (free), 50M, 250M & 1B triples
• Instantly deploy new databases when needed
  – Easily scale up / down as data volume changes
• Zero administration
  – Automated operations, maintenance & upgrades
• Faster experimentation & prototyping, reduced TCO

CHOOSING A DATABASE SOLUTION

Choosing an appropriate database solution

From experimentation to production

• Priorities: cost, ease of deployment, performance, availability
• GraphDB options: Free, Standard, Enterprise
• Deployment: on premise, AWS cloud, database-as-a-service
• Seamless upgrade paths
  – All options based on the same engine

Learning Prototype Pilot Production


Learning

• Priorities
  – Free
  – Easy & quick to set up, “sandbox” environment
• Recommended
  – Database-as-a-Service (free 10M triples)
  – GraphDB Free

Prototype

• Priorities
  – Free / low-cost
  – Easy & quick to set up, “sandbox” environment
• Recommended
  – GraphDB Free
  – Database-as-a-Service (10M – 50M triples)

Pilot

• Priorities
  – Low-cost
  – Performance and scalability
• Recommended
  – GraphDB Standard
• Also consider
  – Database-as-a-Service (250M – 1B triples)
  – GraphDB Free

Production

• Priorities
  – Performance and scalability
  – High availability
• Recommended
  – GraphDB Enterprise
• Recommended
  – GraphDB Standard

REFERENCE PROJECTS


BBC

Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue

Goals
• Create a dynamic semantic publishing platform that assembles web pages on-the-fly using a variety of data sources
• Deliver highly relevant data to web site visitors with sub-second response

Challenges
• BBC journalists author and publish content which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate, content re-use was not flexible
• User experience needed to be improved with relevant content

"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform."

John O’Donovan, Chief Technical Architect, BBC

Future Media BBC MMXII

10 000+ Dynamic Aggregations

Financial Times

Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services

Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it

Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months time between inception and go live
• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user behavior based recommendation of relevant content and data across the entire media

LMI

Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management

Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies

Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results

AstraZeneca

Profile
• Global, Bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents

Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science

Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence based decisions

Euromoney

Profile
• Euromoney Institutional Investor PLC, the international online information and events group

Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform which would include the latest authoring, storing, and display technologies, including semantic annotation, search and a triple store repository

Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction

• Analytics of reports and news of various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure

S4 - SELF-SERVICE SEMANTIC SUITE

Self-service semantic suite (S4)

• Capabilities for Smart Data management and analytics
  – Text analytics for news, life sciences and social media
  – RDF graph database as-a-service
  – Access to large open knowledge graphs
• Available on-demand, anytime, anywhere
  – Simple RESTful services
• Simple pay-per-use pricing
  – No upfront commitments

S4 Benefits

• Enables quick prototyping
  – Instantly available, no provisioning & operations required
  – Focus on building applications, don’t worry about software + infrastructure
• Free tier!
• Easy to start, shorter learning curve
  – Detailed documentation, various add-ons, SDKs and demo code
• Based on enterprise technology by Ontotext

Support and FAQ’s

[email protected]

Additional resources:

Ontotext:
Community Forum and Evaluation Support: http://stackoverflow.com/questions/tagged/graphdb
GraphDB Website and Documentation: http://graphdb.ontotext.com
Whitepapers, Fundamentals: http://ontotext.com/knowledge-hub/fundamentals/

SPARQL, OWL, and RDF:
RDF: http://www.w3.org/TR/rdf11-concepts/
RDFS: http://www.w3.org/TR/rdf-schema/
SPARQL Overview: http://www.w3.org/TR/sparql11-overview/
SPARQL Query: http://www.w3.org/TR/sparql11-query/
SPARQL Update: http://www.w3.org/TR/sparql11-update/


For Further Information

• Georgi Georgiev, Head of Global Alliances Development
  – [email protected]
  – 359.882.885.636
• Ilian Uzunov, Europe Sales and Business Development
  – [email protected]
  – 359.888.772.248
• Peio Popov, North America Sales and Business Development
  – [email protected]
  – 1.929.239.0659





The End
