Semantic Technology in Publishing & Finance
Triplestores and inference, applications in Finance, the GraphDB engine, text-mining, projects and solutions for financial media and publishers
Keystone Industrial Panel, ISWC 2014, Riva del Garda, 18 Oct 2014
Semantic Technology in Publishing & Finance #1Oct 2014
Outline
• Introduction to Ontotext
• Clients, cases
• Text Mining, Media and Publishing Solution
• SemTech applications in Finance
• Wrap-up
Ontotext
• Information management company providing text analysis, data management and state-of-the-art semantic technology
• 75 employees, headquartered in Sofia, Bulgaria
• Sales presence in London, Washington DC, and Boston
• Clients include BBC, AstraZeneca, US DoD, OUP, Wiley, Getty…
• Over 400 person-years in R&D to create a one-stop shop for:
– Content enrichment
– Data management
– Graph database engine
• Open, standards-compliant technology:
– RDF(S), OWL, GATE, Sesame
Interlinking Text and Data
Semantic Annotation
[Figure: semantic annotation of a PubMed article (pmid:17714090; Ian A Yang; Clinical and experimental pharmacology …) linked to the concepts COPD, Chronic Obstructive Airway Diseases, Bronchial Diseases, Respiration Disorders and Asthma (umls:C0035204, umls:C0006261, umls:C000496)]
Semantic Annotation
Semantic annotation goes far beyond tagging. It enables search that uses enrichment, linking and rules to return both explicit and implicit results, giving a more complete picture.
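To make the "explicit and implicit results" point concrete, here is a minimal Python sketch under stated assumptions: annotations link documents to concepts, and a concept hierarchy lets a search for a broad concept also find documents annotated only with narrower ones. The document IDs, the hierarchy and all function names are hypothetical simplifications, not the real UMLS structure or Ontotext's annotation output.

```python
# Minimal sketch: tag matching vs. semantic search over a concept hierarchy.
# Hypothetical data: document IDs, concept names and BROADER links are
# illustrative simplifications, not the actual UMLS/MeSH hierarchy.

# Output of text mining: document -> concepts it was annotated with.
ANNOTATIONS = {
    "doc:1": {"COPD"},
    "doc:2": {"Asthma"},
    "doc:3": {"Hypertension"},
}

# Hypothetical hierarchy: concept -> its directly broader concepts.
BROADER = {
    "COPD": {"Chronic Obstructive Airway Diseases"},
    "Chronic Obstructive Airway Diseases": {"Bronchial Diseases"},
    "Asthma": {"Bronchial Diseases"},
    "Bronchial Diseases": {"Respiration Disorders"},
    "Hypertension": {"Cardiovascular Diseases"},
}


def ancestors(concept: str) -> set[str]:
    """Return the concept itself plus all concepts reachable via BROADER."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search_tags(concept: str) -> set[str]:
    """Plain tagging: documents tagged with exactly this concept (explicit only)."""
    return {doc for doc, tags in ANNOTATIONS.items() if concept in tags}


def search_semantic(concept: str) -> set[str]:
    """Semantic search: also match documents about narrower concepts (implicit)."""
    return {
        doc
        for doc, tags in ANNOTATIONS.items()
        if any(concept in ancestors(tag) for tag in tags)
    }


print(search_tags("Respiration Disorders"))              # set()
print(sorted(search_semantic("Respiration Disorders")))  # ['doc:1', 'doc:2']
```

A tag-only search for "Respiration Disorders" finds nothing, while the semantic search also returns the two documents annotated with narrower concepts.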
Graph Database
• Standards-Based
• 24/7 Resiliency
• Hybrid Semantic Queries & Search
Content Enrichment
• Text Mining & Classification
• Curation
• Quality Monitoring
Data Management
• Ontologies and Semantic Annotation
• Web mining
• Identity Resolution
What is RDF Good for?
• Metadata-based content management
– Metadata represents a reusable result of content analytics
– It can be repurposed, allowing for a wide range of applications
– Most search engines do analytics, but the results are not explicit, so they cannot be validated, refined or used by other applications
• Linking text and structured data
– Allows structured, uniform and efficient access to diverse domain models, taxonomies, dictionaries and reference databases
• Reference data management
– E.g. product catalogs and taxonomies that are too structured to be managed with NoSQL, but too diverse and interconnected for SQL
• Using linked open data (LOD)
– A growing volume of diverse public data can be used in enterprise knowledge management applications
LOD: Growing Exponentially
• July 2013 stats: 2,289 datasets (http://stats.lod2.eu/)
• Growing exponentially, as the yearly counts show
Linked Data datasets per year: 2007: 27; 2008: 43; 2009: 89; 2010: 162; 2011: 295; 2012: 822; 2013: 2,289
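The slide does not say how its trend line was computed. Assuming it is an exponential fit (ordinary least squares on the logarithm of the yearly counts above), a short Python sketch can check the "growing exponentially" claim; the function and variable names are illustrative.

```python
# Minimal sketch: fitting an exponential trend to the yearly LOD dataset counts.
# Assumption: the trend line is treated as a least-squares fit of
# log(count) against year; the slide does not state how it was drawn.
import math

COUNTS = {2007: 27, 2008: 43, 2009: 89, 2010: 162, 2011: 295, 2012: 822, 2013: 2289}


def fit_exponential(points: dict[int, int]) -> tuple[float, float]:
    """Least-squares fit of log(count) = a + b * (year - first_year); returns (a, b)."""
    first = min(points)
    xs = [year - first for year in points]
    ys = [math.log(points[year]) for year in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


a, b = fit_exponential(COUNTS)
growth_factor = math.exp(b)       # multiplicative growth per year
doubling_years = math.log(2) / b  # years needed for the count to double
print(f"about {growth_factor:.2f}x per year, doubling every {doubling_years:.2f} years")
```

On these seven data points the fitted growth factor is a little over 2x per year, i.e. the number of datasets roughly doubled every year between 2007 and 2013.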
How Does Inference Help?
• Intelligent mapping of queries to data
– This matters a lot when an application has to query a dataset combined from 10+ sources that evolve independently
– There is no way an application developer can stay on top of all schemata and all datasets, all the time
• Finding patterns and inferring new relationships
– Think of someone constantly looking for patterns that elicit new relations, which in turn match patterns that elicit other relations …
– Or of someone who goes deeper and deeper, rewriting a query over and over again until all alternatives are exhausted
• Deeper and more complete results
• Cheaper data integration, easier querying
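Both points can be illustrated with a minimal forward-chaining sketch in Python that applies rules until no new triples appear (a fixpoint). It assumes hypothetical srcA:, srcB: and ex: names and implements only two rules (rdfs:subPropertyOf propagation and owl:TransitiveProperty); production engines such as GraphDB support much richer rulesets and evaluate them far more efficiently.

```python
# Minimal sketch: forward-chaining inference to a fixpoint over triples.
# Hypothetical data: srcA:, srcB: and ex: names are illustrative. Only two
# rules are implemented (rdfs:subPropertyOf and owl:TransitiveProperty).

Triple = tuple[str, str, str]

FACTS: set[Triple] = {
    # Two independently evolving sources describe employment differently...
    ("ex:alice", "srcA:worksFor", "ex:acmeUK"),
    ("ex:bob", "srcB:employedBy", "ex:acmeDE"),
    # ...and the schema maps both onto one common property.
    ("srcA:worksFor", "rdfs:subPropertyOf", "ex:affiliatedWith"),
    ("srcB:employedBy", "rdfs:subPropertyOf", "ex:affiliatedWith"),
    # Corporate structure, with partOf declared transitive.
    ("ex:partOf", "rdf:type", "owl:TransitiveProperty"),
    ("ex:acmeUK", "ex:partOf", "ex:acmeEurope"),
    ("ex:acmeDE", "ex:partOf", "ex:acmeEurope"),
    ("ex:acmeEurope", "ex:partOf", "ex:acmeGroup"),
}


def infer(facts: set[Triple]) -> set[Triple]:
    """Apply the rules repeatedly until no new triple is derived."""
    closure = set(facts)
    while True:
        derived: set[Triple] = set()
        # Rule 1: (p subPropertyOf q) and (s p o)  =>  (s q o)
        for p, rel, q in closure:
            if rel == "rdfs:subPropertyOf":
                derived |= {(s, q, o) for s, pred, o in closure if pred == p}
        # Rule 2: p is transitive, (x p y) and (y p z)  =>  (x p z)
        for p, rel, cls in closure:
            if rel == "rdf:type" and cls == "owl:TransitiveProperty":
                edges = [(s, o) for s, pred, o in closure if pred == p]
                derived |= {(x, p, z) for x, y in edges for y2, z in edges if y == y2}
        new = derived - closure
        if not new:
            return closure
        closure |= new


def affiliated_with_group(triples: set[Triple], group: str) -> set[str]:
    """Query in the common vocabulary: who is affiliated with any part of `group`?"""
    return {
        s
        for s, p, o in triples
        if p == "ex:affiliatedWith"
        and (o == group or (o, "ex:partOf", group) in triples)
    }


closure = infer(FACTS)
print(affiliated_with_group(FACTS, "ex:acmeGroup"))            # set()
print(sorted(affiliated_with_group(closure, "ex:acmeGroup")))  # ['ex:alice', 'ex:bob']
```

Without inference the query over ex:affiliatedWith returns nothing; after materialization it finds both people, even though neither source uses that property and the link to ex:acmeGroup is two hops away.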
Ontotext Technology Portfolio
BBC: The Perfect Application
Since 2000, semantic technology had been striving for:
• Pertinent applications: a really good use case
• Real high-profile projects to prove its maturity
The “Dynamic Semantic Publishing” architecture implemented by the BBC for its FIFA World Cup 2010 website filled this gap!
It demonstrates:
• How an RDF database serves well where RDBMSs fail
• How text mining and triplestores complement one another
• How inference adds value at a decent scale
• 24/7 live operation that cannot work without a functional triplestore
Ontotext and BBC
Profile
• Mass media broadcaster founded in 1922
• 23,000 employees and over 5 billion pounds in annual revenue
Goals
• Create a dynamic semantic publishing platform that assembles web pages on the fly from a variety of data sources
• Deliver highly relevant data to website visitors with sub-second response times
Challenges
• BBC journalists author and publish content, which is then statically rendered; the cost and time of doing this were high
• Diverse content was difficult to navigate, and content reuse was not flexible
• The user experience needed to be improved with relevant content
"The goal is to be able to more easily and accurately aggregate content, find it and share it across many sources. From these simple relationships and building blocks you can dynamically build up incredibly rich sites and navigation on any platform." John O’Donovan, Chief Technical Architect
BBC: The Perfect Application (ctd)
• The BBC’s FIFA World Cup 2010 project was widely recognized as the best showcase for SemTech
– It used OWLIM as a triplestore (chosen after a thorough evaluation)
– It triggered a wave of adoption of the technology
• The next milestone: the London 2012 Olympic Games
– The two most important websites used the DSP architecture: the official one, operated by the Press Association, and the BBC’s
– Ontotext text-mining technology was used for content enrichment
• Four years later, this application pattern is still the best use case
– And, judging by the LDBC Semantic Publishing Benchmark, public information and feedback from the industry, there are still no other triplestores that can survive such a load
Ontotext and AstraZeneca
Profile
• Global biopharmaceutical company
• $28 billion in sales in 2012
• $4 billion in R&D across three continents
Goals
• Efficient design of new clinical studies
• Quick access to all of the data
• Improved evidence-based decision-making
• Strengthen the knowledge feedback loop
• Enable predictive science
Challenges
• Over 7,000 studies and 23,000 documents are difficult to obtain
• Searches return 1,000 to 10,000 results
• Document repositories not designed for reuse
• Tedious process to arrive at evidence-based decisions
Context-based Disambiguation
Ontotext and LMI
Profile
• Established in 1961 to support federal agencies
• Specializes in logistics, financial, infrastructure & information management
Goals
• Unlock large collections of complex documents
• Improve analyst productivity
• Create an application they can sell to US federal agencies
Challenges
• Analysts took hours to find, download and search documents, using inaccurate keyword searches
• Needed a knowledge base to search quickly and guide the analysts with highly relevant searches
• Extracts knowledge from a collection of documents
• Uses GraphDB for intuitive search and filtering
• Knowledge base used to suggest searches
• Hyper-speed performance
• Huge savings in analyst time
• Accurate results
In April 2014, OpenPolicy won the Innovation Award of WP and NVTC
Some of our clients
The most popular financial newspaper
Publishing and Media Solution
Solution Features
• Dedicated solutions for media and publishing
• Based on the Ontotext Semantic Platform
• Mature implementation and continuous adaptation methodology
• Introducing advanced features to the authoring, editorial and publishing phases of content and data workflows
Methodology
Architecture Overview
Authoring
Related assets – as you type
Related entities and concepts
Entity profiles and facts on the fly
Create higher value content at the same cost
Contextual Authoring
Curation
Continuous adaptation through editorial feedback
Query-driven publishing templates
Dynamic re-purposing and reuse
New publishing products with the same content
Example of Client-Integrated Curation
Example Curation Tool: Press Association
Monitoring and Curation Tool
Continuous Adaptation
Publishing
Dynamic construction of products (e.g. topic pages)
Personalized content streams
Semantics-driven trend and user analytics
Behavior-driven personal asset streams
User Behavior Tracking
Personalized Recommendations
Methodology
Methodology
Methodology
Methodology
Complete Domain Ontology
Example KB for 50 daily publications
Methodology
Design of Machine Learning Pipeline
Discovering Suspicious Relationships
What Does It Take to Make It Work?
• Have a database of locations, with part-of information
• Have a database of companies, with their dependencies
• Define the semantics of the relevant relationships:
– sub-region and control are transitive relationships
– located-in is transitive over sub-region
• Define the semantics of suspicious relationships:
# orgA and orgB share a location ?x; orgA controls ?y, which controls orgB
# and is located in an offshore zone ?z. The my:, ptop: and fibo: prefixes
# must be declared with PREFIX; their IRIs are not given on the slide.
CONSTRUCT { ?orgA my:suspiciousLink ?orgB } WHERE {
  ?orgA ptop:locatedIn ?x ; fibo:controls ?y .
  ?y fibo:controls ?orgB ; ptop:locatedIn ?z .
  ?orgB ptop:locatedIn ?x .
  ?z a ptop:OffshoreZone .
}
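The relationship semantics and the CONSTRUCT rule on this slide can be mimicked in plain Python to show what the inference contributes. In this sketch all company and place names are hypothetical, and hand-written closure functions stand in for the reasoning a triplestore would perform; note that without the sub-region closure, HoldCo would not be recognized as located in an offshore zone.

```python
# Minimal sketch: the suspicious-link rule evaluated over in-memory sets.
# Hypothetical data: all company and place names are illustrative; plain
# Python functions stand in for the inference a triplestore would perform.
from itertools import product

Pair = tuple[str, str]

SUB_REGION: set[Pair] = {  # (region, broader region)
    ("Sofia", "Bulgaria"), ("Plovdiv", "Bulgaria"),
    ("George Town", "Grand Cayman"), ("Grand Cayman", "Cayman Islands"),
}
OFFSHORE_ZONES = {"Cayman Islands"}
LOCATED_IN: set[Pair] = {  # (entity, region), as asserted in the data
    ("OrgA", "Sofia"), ("OrgB", "Plovdiv"),
    ("HoldCo", "George Town"), ("SubCo", "Sofia"),
}
CONTROLS: set[Pair] = {("OrgA", "HoldCo"), ("HoldCo", "OrgB"), ("OrgA", "SubCo")}


def transitive_closure(pairs: set[Pair]) -> set[Pair]:
    """Sub-region and control are transitive: (a, b), (b, c) => (a, c)."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def located_in_closure(located: set[Pair], regions: set[Pair]) -> set[Pair]:
    """Located-in is transitive over sub-region; `regions` must already be closed."""
    return located | {(e, r2) for e, r in located for r1, r2 in regions if r == r1}


def suspicious_links() -> set[Pair]:
    """Evaluate the CONSTRUCT pattern; comments name the matching query variables."""
    located = located_in_closure(LOCATED_IN, transitive_closure(SUB_REGION))
    controls = transitive_closure(CONTROLS)
    places: dict[str, set[str]] = {}
    for entity, place in located:
        places.setdefault(entity, set()).add(place)
    links: set[Pair] = set()
    for org_a, y in controls:                          # ?orgA fibo:controls ?y
        if not places.get(y, set()) & OFFSHORE_ZONES:  # ?y locatedIn ?z, ?z an OffshoreZone
            continue
        for y2, org_b in controls:                     # ?y fibo:controls ?orgB
            shared = places.get(org_a, set()) & places.get(org_b, set())
            if y2 == y and shared:                     # both located in the same ?x
                links.add((org_a, org_b))
    return links


print(suspicious_links())  # {('OrgA', 'OrgB')}
```

Only the OrgA/OrgB pair is flagged: SubCo is also controlled by OrgA and shares its location, but not through an offshore intermediary.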
Use Cases
• Investigating networks of linked entities
– As a prerequisite for risk assessment and compliance research
• Risk assessment
– Tracing information about suspicious entities
– Identifying risk indicators across multiple sources
– Identifying risks related to linked entities
– Determining exposure against a group of linked entities
• Compliance-related research
– Fraud detection, insider trading, etc.
• Searching large collections of policies and regulations
– See OpenPolicy
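Determining exposure against a group of linked entities boils down to a graph traversal followed by an aggregation. A minimal Python sketch, assuming hypothetical entities, control links and exposure figures, and defining a group as a parent plus everything it controls directly or indirectly:

```python
# Minimal sketch: aggregate exposure over a group of linked entities.
# Hypothetical data: entity names, control links and amounts are illustrative.
# A "group" is assumed to be a parent plus all directly/indirectly controlled entities.
from collections import deque

CONTROL_LINKS = {
    "ParentCo": ["BankSub", "HoldCo"],
    "HoldCo": ["TradingLtd"],
    "BankSub": [],
    "TradingLtd": [],
}

# Exposure per counterparty, e.g. in millions (hypothetical figures).
EXPOSURE = {"ParentCo": 10.0, "BankSub": 25.0, "TradingLtd": 40.0, "OtherCo": 99.0}


def group_members(parent: str) -> set[str]:
    """Breadth-first walk of control links: parent plus all controlled entities."""
    members = {parent}
    queue = deque([parent])
    while queue:
        for child in CONTROL_LINKS.get(queue.popleft(), []):
            if child not in members:  # visited check also guards against cycles
                members.add(child)
                queue.append(child)
    return members


def group_exposure(parent: str) -> float:
    """Sum exposure over the whole group; entities without exposure count as 0."""
    return sum(EXPOSURE.get(entity, 0.0) for entity in group_members(parent))


print(sorted(group_members("ParentCo")))  # ['BankSub', 'HoldCo', 'ParentCo', 'TradingLtd']
print(group_exposure("ParentCo"))         # 75.0
```

The visited-set check keeps the walk finite even if the control data contains cycles; OtherCo's exposure is excluded because it is not linked to the group.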
How: Semantic BI/Data-warehouses
• Imagine an integrated database that allows querying across siloed databases
– E.g. bond market data vs. risk assessment vs. equity markets vs. M&A
– Banks hold a lot of duplicate data across databases in different departments, and the data is simply not linked or organized in a unified data model
• Benefits compared to mainstream technology:
– Lower cost of development and maintenance
– Direct benefit from industry standards, using inference
– Real-time updates, unlike traditional data warehousing, where updates often have to be scheduled overnight
– Support for a wide variety of analytical queries, far more flexible than traditional approaches
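A minimal Python sketch of the integrated-database idea, under stated assumptions: two hypothetical silos (a bond desk keyed by issuer name and an equity desk keyed by ticker) are mapped onto shared entity identifiers through a hand-written identity-resolution table, after which a single query spans both. The silo layouts, the ex: names and the figures are all hypothetical.

```python
# Minimal sketch: querying across silos once they share one graph model.
# Hypothetical data: silo layouts, ex: identifiers and figures are illustrative;
# SAME_ENTITY stands in for identity resolution, which would normally be computed.

Triple = tuple[str, str, float]

# Silo 1: bond desk, keyed by issuer name.
BONDS = [{"issuer": "Acme Corp.", "notional": 5.0}, {"issuer": "Globex", "notional": 3.0}]
# Silo 2: equity desk, keyed by ticker.
EQUITIES = [{"ticker": "ACME", "value": 2.5}, {"ticker": "INIT", "value": 1.0}]

# Identity resolution: (silo, local key) -> one canonical entity identifier.
SAME_ENTITY = {
    ("bonds", "Acme Corp."): "ex:Acme",
    ("bonds", "Globex"): "ex:Globex",
    ("equities", "ACME"): "ex:Acme",
    ("equities", "INIT"): "ex:Initech",
}


def to_triples() -> list[Triple]:
    """Map rows from both silos onto canonical subjects in one triple list."""
    triples: list[Triple] = []
    for row in BONDS:
        subject = SAME_ENTITY[("bonds", row["issuer"])]
        triples.append((subject, "ex:bondNotional", row["notional"]))
    for row in EQUITIES:
        subject = SAME_ENTITY[("equities", row["ticker"])]
        triples.append((subject, "ex:equityValue", row["value"]))
    return triples


def total_position(entity: str, triples: list[Triple]) -> float:
    """One query across the former silos: bond plus equity positions for an entity."""
    return sum(
        value
        for s, p, value in triples
        if s == entity and p in ("ex:bondNotional", "ex:equityValue")
    )


GRAPH = to_triples()
print(total_position("ex:Acme", GRAPH))  # 7.5
```

A list is used rather than a set so that two identical positions from different rows are not silently merged.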
Ontotext and top 3 Business Media
Profile
• One of the top 3 business media companies
• Focused on both B2C publishing and B2B services
Goals
• Create a horizontal platform for both data and content, based on semantics, and serve all functionality through it
Challenges
• Critical part of the entire workflow
• Multiple development projects in parallel, with up to 2 months between inception and go-live
• GraphDB used not only for data, but for content storage as well
• Horizontal platform focused on organizations, people, geopolitical entities (GPEs) and the relations between them
• Automatic extraction of all these concepts and relationships
• A separate workstream for user-behavior-based recommendation of relevant content and data across the entire media
Reference Projects: BCA/Euromoney
• BCA/Euromoney Macroeconomics Reports
– Implementation of the Euromoney Semantic Platform
• Automatically generate metadata about:
– Markets, geopolitical entities, economies, currencies, indicators, indices
– Themes of the report
– Economic and market conditions
– Views of the economist, with horizon, focus of the view, and prediction
– Suggested trades of tradable objects (bonds, commodities, equities)
• Semantic indices powering various services:
– Live Charts: serving macroeconomic charts with the option to add further data series/indices
– Macroeconomists' dashboard of views, with their objects, sentiment, horizon, and agreement/disagreement
Ontotext and Euromoney
Profile
• Euromoney Institutional Investor PLC, the international online information and events group
Goals
• Create a horizontal platform to serve 100 different publications
• Create a new publishing and information platform that includes the latest authoring, storage and display technologies, including semantic annotation, search and a triplestore repository
Challenges
• Different domains covered
• Sophisticated content analytics, including relation, template and scenario extraction
• Analytics of reports and news from various domains
• Extraction of sophisticated macroeconomic views on markets and market conditions: trades, conditions and trade horizons, assets, asset allocations, etc.
• Multi-faceted search
• Completely new content and data infrastructure
Wrap-up
• Ontotext has a full stack of semantic technologies
• Triplestores combine the strengths of NoSQL and SQL
• Inference fosters discovery in diverse, dynamic data
• GraphDB is in a league of its own:
– Standards-compliant: comprehensive support for OWL and SPARQL
– Efficient inference through the entire life cycle of the data
– High-availability cluster architecture: proven and mature
– Full-text search (FTS) and NoSQL connectors for seamless integration
• End-to-end solution for media and publishing
– Authoring, curation and publishing through adaptive text mining
• All of the above proven with industry leaders