Semantic Technology in Publishing & Finance

Triplestores and inference, applications in Finance, the GraphDB engine, text-mining, projects and solutions for financial media and publishers

Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014


DESCRIPTION

Triplestores and inference, applications in Finance, text-mining. Projects and solutions for financial media and publishers. Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014. Thanks to Atanas Kiryakov for this presentation, I just cut it to size.


Outline

• Introduction to Ontotext
• Clients, cases
• Text Mining, Media and Publishing Solution
• SemTech applications in Finance
• Wrap-up

Ontotext

• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 75 employees, headquartered in Sofia, Bulgaria
• Sales presence in London, Washington DC, and Boston
• Clients include BBC, AstraZeneca, US DoD, OUP, Wiley, Getty…
• Over 400 person-years in R&D to create a one-stop shop for:
  – Content enrichment
  – Data management
  – Graph database engine
• Open and standard-compliant technology:
  – RDF(S), OWL, GATE, Sesame

Interlinking Text and Data

Semantic Annotation

[Figure: annotation graph linking the article pmid:17714090 (Clinical and experimental pharmacology …; Ian A Yang) to UMLS concepts. Labels shown: COPD, Chronic Obstructive Airway Diseases, Bronchial Diseases, Respiration Disorders, Asthma (umls:C000496), umls:C0035204, umls:C0006261]

Semantic Annotation

Semantic Annotation goes far beyond tagging. It allows search using enrichment, linking and rules to return explicit and implicit results – complete intelligence.

Graph Database
• Standards Based
• 24-7 Resiliency
• Hybrid Semantic Queries & Search

Content Enrichment
• Text Mining & Classification
• Curation
• Quality Monitoring

Data Management
• Ontologies and Semantic Annotation
• Web mining
• Identity Resolution

What is RDF Good for?

• Metadata-based content management
  – Metadata represents a re-usable result of content analytics
  – It can be repurposed, allowing for a wide range of applications
  – Most search engines do analytics, but the results are not explicit, so they cannot be validated, refined and used by other applications
• Linking text and structured data
  – Allows structured, uniform and efficient access to diverse domain models, taxonomies, dictionaries, reference databases
• Reference data management
  – E.g. product catalogs and taxonomies that are too structured to be managed with NoSQL, but too diverse and interconnected for SQL
• Using linked open data (LOD)
  – A growing amount of diverse public data can be used in enterprise Knowledge Management applications

LOD: Growing Exponentially

• July 2013 stats: 2,289 datasets (http://stats.lod2.eu/)
• Growing exponentially (see the dotted trend line)

Linked Data Datasets

Year:      2007  2008  2009  2010  2011  2012  2013
Datasets:    27    43    89   162   295   822  2,289
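The "growing exponentially" claim can be sanity-checked from the dataset counts shown on this slide. The sketch below is not part of the original deck. It computes the year-over-year ratios and the average multiplicative growth per year. The function and variable names are chosen here for illustration only.

```python
# Dataset counts per year, as read from the chart labels on the slide.
LOD_DATASETS = {
    2007: 27, 2008: 43, 2009: 89, 2010: 162,
    2011: 295, 2012: 822, 2013: 2289,
}


def yearly_ratios(counts):
    """Return {year: count / previous year's count} for consecutive years."""
    years = sorted(counts)
    return {y: counts[y] / counts[p] for p, y in zip(years, years[1:])}


def compound_annual_growth(counts):
    """Average multiplicative growth per year between first and last year."""
    years = sorted(counts)
    span = years[-1] - years[0]
    return (counts[years[-1]] / counts[years[0]]) ** (1 / span)


if __name__ == "__main__":
    for year, ratio in yearly_ratios(LOD_DATASETS).items():
        print(f"{year}: x{ratio:.2f}")
    print(f"average: x{compound_annual_growth(LOD_DATASETS):.2f} per year")
```

Every year-over-year ratio is above 1.5 and the average works out to roughly a doubling per year, which is consistent with an exponential trend over this period.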

How Does Inference Help?

• Intelligent mapping of queries to data
  – This matters a lot when an application should query a dataset combined from 10+ sources, which evolve independently
  – There is no way an application developer can stay on top of all schemata and all datasets, all the time
• Finding patterns and inferring new relationships
  – Think of someone constantly looking for patterns that elicit new relations, which can match patterns that elicit other relations …
  – Or someone who goes deeper and deeper into finding new ways to rewrite a query, over and over again, until all alternatives are exhausted
• Get deeper and more complete results
• Cheaper data integration, easier querying
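To make "inferring new relationships" concrete, here is a minimal forward-chaining sketch, assuming a toy triple set and two hand-written rule types (a transitive property and a subproperty mapping). The predicate names (`locatedIn`, `basedIn`), the `infer` helper and the data are illustrative assumptions. A triplestore such as GraphDB uses its own rule engine, which this does not reproduce.

```python
def infer(triples, transitive=(), subproperty=None):
    """Naive forward chaining: apply the rules until no new triple appears.

    transitive  -- predicates p where (a p b) and (b p c) imply (a p c)
    subproperty -- {specific: general}: (a specific b) implies (a general b)
    """
    subproperty = subproperty or {}
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in subproperty:
                new.add((s, subproperty[p], o))
            if p in transitive:
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:          # fixpoint reached, nothing left to derive
            return facts
        facts |= new


# Illustrative data: two sources that use different predicates for location.
EXPLICIT = {
    ("Sofia", "locatedIn", "Bulgaria"),
    ("Bulgaria", "locatedIn", "Europe"),
    ("Ontotext", "basedIn", "Sofia"),
}

closure = infer(
    EXPLICIT,
    transitive={"locatedIn"},
    subproperty={"basedIn": "locatedIn"},
)

if __name__ == "__main__":
    for triple in sorted(closure - EXPLICIT):
        print("inferred:", triple)
```

A query for everything `locatedIn` Europe now also returns Ontotext, although no source stated that directly and one source used a different predicate. This is the kind of query-to-data mapping the slide refers to.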

Ontotext Technology Portfolio

Outline

• Introduction to Ontotext
• Clients, cases
• Text Mining, Media and Publishing Solution
• SemTech applications in Finance
• Wrap-up

BBC: The Perfect Application

Since the year 2000, semantic technology has been striving for:
• Pertinent applications, a really good use case
• Real high-profile projects to prove its maturity

The “Dynamic Semantic Publishing” architecture implemented by the BBC for its FIFA World Cup 2010 website filled this gap!

It demonstrates:
• How an RDF database serves well where RDBMSs fail
• How text-mining and triplestores complement one another
• How inference adds value at a decent scale
• 24/7 live operation that cannot work without a functional triplestore

Ontotext and BBC

Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue

Goals
• Create a dynamic semantic publishing platform that assembles web pages on-the-fly using a variety of data sources
• Deliver highly relevant data to web site visitors with sub-second response

Challenges
• BBC journalists author and publish content which is then statically rendered. The costs and time to do this were high.
• Diverse content was difficult to navigate, content re-use was not flexible
• User experience needed to be improved with relevant content

"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform." John O’Donovan, Chief Technical Architect

BBC: The Perfect Application (ctd)

• The BBC’s FIFA World Cup 2010 project was widely recognized as the best showcase for SemTech
  – It used OWLIM as a triplestore (chosen after a thorough evaluation)
  – It triggered a wave of adoption of the technology
• The next milestone: London 2012 Olympic Games
  – The two most important websites used the DSP architecture: the official one, operated by Press Association, and the one of the BBC
  – Ontotext text-mining technology was used for content enrichment
• Four years later this application pattern is still the best use case
  – And there are still no other triplestores that can survive such load, judging by the LDBC Semantic Publishing Benchmark, public information and feedback from the industry

Ontotext and AstraZeneca

Profile
• Global bio-pharma company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents

Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science

Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches returning 1,000 – 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions

Context-based Disambiguation

Ontotext and LMI

Profile
• Established in 1961 to enable federal agencies
• Specializes in logistics, financial, infrastructure & information management

Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US Federal agencies

Challenges
• Analysts taking hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts – highly relevant searches

• Extracts knowledge from collection of documents
• Uses GraphDB to intuitively search and filter
• Knowledge base used to suggest searches
• Hyper speed performance
• Huge savings in analyst time
• Accurate results

In Apr’14 OpenPolicy won the Innovation Award of WP and NVTC

Some of our clients

The most popular financial newspaper

Outline

• Introduction to Ontotext
• Clients, cases
• Text Mining, Media and Publishing Solution
• SemTech applications in Finance
• Wrap-up

Publishing and Media Solution

Solution Features

• Dedicated solutions for media and publishing
• Based on the Ontotext Semantic Platform
• Mature implementation and continuous adaptation methodology
• Introducing advanced features to the authoring, editorial and publishing phases of content and data workflows

Methodology

Architecture Overview

Authoring

Related assets – as you type

Related entities and concepts

Entity profiles and facts on the fly

Create higher value content at the same cost

Contextual Authoring

Curation

Continuous adaptation through editorial feedback

Query driven publishing templates

Dynamic re-purposing and reuse

New publishing products with the same content

Example of Client Integrated Curation

Example Curation Tool: PressAssociation

Monitoring and Curation Curation Tool

Continuous Adaptation

Publishing

Dynamic construction of products (e.g. topic pages)

Personalized content streams

Semantics driven trend and user analytics

Behavior driven personal asset streams

User Behavior Tracking

Personalized Recommendations

Methodology

Complete Domain Ontology

Example KB for 50 daily publications

Methodology

Design of Machine Learning Pipeline

Outline

• Introduction to Ontotext
• Clients, cases
• Text Mining, Media and Publishing Solution
• SemTech applications in Finance
• Wrap-up

Discovering Suspicious Relationships

What It Takes to Make It Work?

• Have a database of locations, with part-of info
• Have a database with companies, with dependencies
• Define semantics for the relevant relationships:
  – sub-region and control are transitive relationships
  – Located-in is transitive over sub-region
• Define the semantics of suspicious relationships

CONSTRUCT { ?orgA my:suspiciousLink ?orgB } WHERE {
    ?orgA ptop:locatedIn ?x ; fibo:controls ?y .
    ?y fibo:controls ?orgB ; ptop:locatedIn ?z .
    ?orgB ptop:locatedIn ?x .
    ?z a ptop:OffshoreZone .
}
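The rules and the CONSTRUCT pattern on this slide can be approximated in plain Python. This is a sketch under stated assumptions, not a description of how GraphDB evaluates the query. The organisation and place names are invented, `closure` and `suspicious_links` are illustrative helpers, and the predicates are represented as sets of pairs without their `ptop:` and `fibo:` prefixes.

```python
from itertools import product


def closure(pairs):
    """Transitive closure of a set of (a, b) pairs."""
    result = set(pairs)
    while True:
        extra = {(a, d) for (a, b), (c, d) in product(result, result) if b == c}
        if extra <= result:
            return result
        result |= extra


def suspicious_links(located_in, sub_region, controls, offshore_zones):
    """Pairs (org_a, org_b) matching the pattern: org_a controls y,
    y controls org_b, y sits in an offshore zone, and org_a and org_b
    share at least one location."""
    regions = closure(sub_region)      # sub-region is transitive
    control = closure(controls)        # control is transitive
    # located-in is transitive over sub-region
    located = set(located_in) | {
        (org, outer)
        for (org, place), (inner, outer) in product(located_in, regions)
        if place == inner
    }
    links = set()
    for (org_a, y), (y2, org_b) in product(control, control):
        if y != y2:
            continue
        y_offshore = any(o == y and z in offshore_zones for o, z in located)
        places_a = {x for o, x in located if o == org_a}
        places_b = {x for o, x in located if o == org_b}
        if y_offshore and places_a & places_b:
            links.add((org_a, org_b))
    return links


# Invented example data, for illustration only.
LOCATED_IN = {("BankA", "CityA"), ("ShellY", "IslandZ"), ("LeasingB", "CityB")}
SUB_REGION = {("CityA", "CountryC"), ("CityB", "CountryC")}
CONTROLS = {("BankA", "ShellY"), ("ShellY", "LeasingB")}
OFFSHORE = {"IslandZ"}

if __name__ == "__main__":
    print(suspicious_links(LOCATED_IN, SUB_REGION, CONTROLS, OFFSHORE))
```

In this toy data the link between BankA and LeasingB is found only because located-in is propagated over sub-region: the two organisations sit in different cities and share a location only at the country level. With an empty `SUB_REGION` the function returns no links.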

Use Cases

• Investigating networks of linked entities
  – As prerequisite for risk assessment and compliance research
• Risk assessment
  – Tracing information about suspicious entities
  – Identifying risk-indicators across multiple sources
  – Identifying risks related to linked entities
  – Determining exposure against a group of linked entities
• Compliance-related research
  – Fraud detection, insider trading, etc.
• Searching in large policies and regulations
  – See Open Policy

How: Semantic BI/Data-warehouses

• Imagine an integrated database, which allows querying across siloed databases
  – E.g. bond market data vs. risk assessment vs. equity markets vs. M&A
  – A lot of duplicate data across various databases in different departments of banks, and data is simply not linked or organized in a unified data model
• Benefits compared to the mainstream technology:
  – Lower cost of development and maintenance
  – Direct benefit from industry standards, using inference
  – Real-time updates, unlike traditional data-warehousing, where updates should often be scheduled overnight
  – Support for a wide variety of analytical queries, which are far more flexible than traditional approaches

Ontotext and top 3 Business Media

Profile
• Top 3 business media
• Focused both on B2C publishing and B2B services

Goals
• Create a horizontal platform for both data and content based on semantics and serve all functionality through it

Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel with up to 2 months time between inception and go live

• GraphDB used not only for data, but for content storage as well
• Horizontal platform with focus on organizations, people, GPEs and relations between them
• Automatic extraction of all these concepts and relationships
• Separate stream of work for a user behavior based recommendation of relevant content and data across the entire media

Reference Projects: BCA/Euromoney

• BCA/Euromoney Macroeconomics Reports
  – Implementation of the Euromoney Semantic Platform
• Automatically generate metadata about:
  – Markets, geo-political entities, economies, currencies, indicators, indices;
  – Themes of the report;
  – Economic and market conditions;
  – Views of the economist with horizon, focus of the view, and prediction;
  – Suggested trades of tradable objects (bonds, commodities, equities).
• Semantic indices powering various services:
  – Live Charts – serving macroeconomic charts with the possibility to add additional data series/indices;
  – Macroeconomists dashboard of views with their objects, sentiment, horizon, and agreement/disagreement.

Ontotext and Euromoney

Profile
• Euromoney Institutional Investor PLC, the international online information and events group

Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform which would include the latest authoring, storing, and display technologies, including semantic annotation, search and a triple store repository

Challenges
• Different domains covered
• Sophisticated content analytics incl. relation, template and scenario extraction

• Analytics of reports and news of various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions; trades, condition and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure

Wrap up

• Ontotext has a full stack of semantic technologies
• Triplestores combine the strengths of NoSQL and SQL
• Inference fosters discovery in diverse dynamic data
• GraphDB is in a league of its own:
  – Standard-compliant – comprehensive support for OWL and SPARQL
  – Efficient inference through the entire life-cycle of the data
  – High-availability cluster architecture – proven and mature
  – FTS and NoSQL Connectors for seamless integration
• End-to-end solution for Media and Publishing
  – Authoring, curation and publishing through adaptive text-mining
• All the above proven with industry leaders