Text Mining in PoolParty Semantic Suite


Martin Kaltenböck, CFO, Semantic Web Company

Timea Turdean, Technical Consultant, SWC

POOLPARTY SEMANTIC SUITE

AIMS Webinar 21st Sept 2017


PoolParty Drupal Integration


Agenda

▸ Introduction Semantic Web Company (SWC)

▸ Introduction PoolParty Semantic Suite

▸ Using PoolParty for Text & Data Mining

▹ Text Mining for continuous knowledge graph modelling

▹ Entity linking and data integration

▹ Classification and semantic annotation / tagging

▸ DEMO(s) of text mining capability of PoolParty

▸ Customer Success Stories

▹ REEEP ClimateTagger

▹ healthdirect Australia

▹ CTCN Semantic Search

▹ EIP Water Matchmaking

▸ Q&A Session

INTRODUCTION

Semantic Web Company & PoolParty Semantic Suite

INTRODUCING SEMANTIC WEB COMPANY

Semantic Web Company (SWC)

▸ Founded in 2004

▸ Based in Vienna

▸ Privately held

▸ 40+ employees, experts in text mining & linked data

▸ ~15-20% revenue growth / year

▸ EUR 2.5 million in R&D funding

▸ SWC named to KMWorld’s 2017 ‘100 Companies That Matter in Knowledge Management’

▸ Organising the SEMANTiCS conference series for 13 years

▸ https://www.semantic-web.com

INTRODUCING POOLPARTY

PoolParty Semantic Suite

▸ First release in 2009

▸ Current version 6.0

▸ W3C standards compliant

▸ Over 200 installations worldwide

▸ 50% of revenue is reinvested into PoolParty development

▸ PoolParty can be installed on-premises or used as a cloud service

▸ KMWorld listed PoolParty as Trend-Setting Product 2015, 2016 and 2017

▸ https://www.poolparty.biz/

SELECTED CUSTOMER REFERENCES AND PARTNERS

SWC headquarters

Customer References

● Credit Suisse
● Boehringer Ingelheim
● Roche
● adidas
● The Pokémon Company
● Canadian Broadcasting Corporation
● Harvard Business School
● Wolters Kluwer
● Talend
● HealthStream
● TC Media
● Techtarget
● Seek
● Alliander N.V.
● Pearson - Always Learning
● Education Services Australia
● American Physical Society
● Healthdirect Australia
● World Bank Group
● Inter-American Development Bank
● Renewable Energy Partnership
● Wood MacKenzie
● Oxford University Press
● International Atomic Energy Agency
● Norwegian Directorate of Immigration
● Ministry of Finance (AT)
● Council of the E.U.
● Australian National Data Service

Partners

● Accenture
● EPAM Systems
● Enterprise Knowledge
● Mekon Intelligent Content Solutions
● B-S-S Business Software Solutions
● MarkLogic
● Wolters Kluwer
● Digirati
● Quark

US East

US West

AUS/NZL

UK

MAKE USE OF POOLPARTY SEMANTIC SUITE

OVERVIEW

TECHNICAL CORE COMPONENTS

Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney.

Taxonomy & Ontology Server

Entity Extractor & Text Mining

Data Integration & Data Linking

Unstructured Data

Semi-structured Data

Structured Data

UnifiedViews

PoolParty

GraphSearch

Identify new candidate concepts to be included in a controlled vocabulary

Controlled vocabularies as a basis for highly precise entity extraction

Entity Extractor informs all incoming data streams about their semantics and links them

Schema mapping based on ontologies

RDF Graph Database

PoolParty Semantic Suite

System Architecture Overview


360-degree views over various content repositories


‘Elevator Pitch’

▸ Built as a ‘Semantic Middleware’

▸ Outstanding user-friendliness

▸ Fully standards-compliant

▸ Highly precise entity extraction

▸ Comprehensive API

▸ Excellent maintainability of extraction models

▸ Integrated with leading search engines & graph databases

▸ Integrated with leading content management platforms

▸ Product configuration options for growing requirements

▸ Highly experienced partners and service team

Product Overview

All products are available as cloud services or for on-premises installation

> PoolParty Feature & Price Matrix


PoolParty Basic Server

PoolParty Advanced Server

PoolParty Enterprise Server

PoolParty Semantic Integrator

▸ SKOS Taxonomy Management
▸ Multiple Projects
▸ Taxonomy REST API
▸ Import/Export (incl. Excel)
▸ Rollback and History
▸ Ontologies and Custom Schemes
▸ Quality Management & Reports
▸ Advanced Corpus Management
▸ Vocabulary Mapping, Linked Data Mapping
▸ Linked Data Enrichment, Frontend, and SPARQL endpoint
▸ Entity Extractor
▸ Extractor API
▸ Auto Populate project from DBpedia
▸ Export to Remote Repository
▸ Workflow Management
▸ SKOS-XL (optional)
▸ Integration with Graph databases
▸ Integration with Search engines
▸ Data linking & mapping
▸ Data transformation pipelines with UnifiedViews
▸ Graph Search Server

HOW DOES THIS WORK

Taking a look under the hood

BASIC PRINCIPLES

Benefiting from the Semantic Web in a Nutshell

Four-layered Content Architecture


Metadata and semantic data

The Peggy Guggenheim Collection is a modern art museum on the Grand Canal in the Dorsoduro sestiere of Venice, Italy. It is one of the most visited attractions in Venice. The collection is housed in the Palazzo Venier dei Leoni, an 18th-century palace, which was the home of the American heiress Peggy Guggenheim for three decades. She began displaying her private collection of modern artworks to the public seasonally in 1951. After her death in 1979, it passed to the Solomon R. Guggenheim Foundation, which eventually opened the collection year-round.

Metadata and semantic data


[Diagram: the entities Peggy Guggenheim, Peggy Guggenheim Collection, Venice and Canale Grande are highlighted in the text. Each is linked by skos:prefLabel to a resource URI (http://my.com/resource/328832, http://my.com/docs/45367, http://my.com/docs/52345).]

Metadata and semantic data


[Diagram: the entities Peggy Guggenheim, Peggy Guggenheim Collection, Venice, museum and Canale Grande, each identified by a URI with a skos:prefLabel (http://my.com/docs/45367, http://my.com/docs/52345, http://my.com/resource/62545, http://my.com/resource/328832). The entities are connected by the relations "image" (pointing to http://www.mycom.com/images/90546089), "has landmark", "named after", "hosted in" and "has".]

Metadata and semantic data


[Diagram: the document http://my.com/docs/328832 described in RDF. Properties shown: rdf:type (schema:Article), dct:title ("Peggy Guggenheim Collection"), dct:creator (http://my.com/people/32, labelled "Mike Miller" and "Michael Miller" via skos:prefLabel and skos:altLabel), schema:image (http://my.com/img/99.jpg) and skos:subject. The subject concepts carry the labels "Peggy Guggenheim Collection Venice", "museum" and "Canale Grande" via skos:prefLabel and skos:altLabel, and are related by skos:broader.]
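The annotation graph on this slide can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python without an RDF library. The concept URIs under http://my.com/concepts/, the choice of "Michael Miller" as preferred and "Mike Miller" as alternative label, and the node each property attaches to are assumptions, because the edges of the original diagram are not recoverable from the slide text.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain string value, as opposed to a URI or prefixed name."""
    value: str


DOC = "http://my.com/docs/328832"
AUTHOR = "http://my.com/people/32"
COLLECTION = "http://my.com/concepts/1"  # hypothetical concept URI
MUSEUM = "http://my.com/concepts/2"      # hypothetical concept URI

# Each triple is (subject, predicate, object).
# The slide uses skos:subject; dct:subject is the more common choice today.
triples = [
    (DOC, "rdf:type", "schema:Article"),
    (DOC, "dct:title", Literal("Peggy Guggenheim Collection")),
    (DOC, "dct:creator", AUTHOR),
    (DOC, "schema:image", "http://my.com/img/99.jpg"),
    (DOC, "skos:subject", COLLECTION),
    (AUTHOR, "skos:prefLabel", Literal("Michael Miller")),  # assumed preferred
    (AUTHOR, "skos:altLabel", Literal("Mike Miller")),      # assumed alternative
    (COLLECTION, "skos:prefLabel", Literal("Peggy Guggenheim Collection Venice")),
    (COLLECTION, "skos:broader", MUSEUM),
    (MUSEUM, "skos:prefLabel", Literal("museum")),
]


def objects(subject, predicate):
    """Return all objects of triples matching the subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subject_labels(doc):
    """Return the preferred labels of all concepts a document is tagged with."""
    labels = []
    for concept in objects(doc, "skos:subject"):
        labels += [lit.value for lit in objects(concept, "skos:prefLabel")]
    return labels
```

Calling `subject_labels(DOC)` follows two edges (document to concept, concept to label), which is the kind of traversal a semantic search front end would perform on such metadata.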

Resolving Language Problems

“While most people can deal with linguistic features as synonyms, homographs, polyhierarchies, and even with far more peculiar characteristics of natural languages, machines often struggle with automatic sense-making because of the lack of a semantic knowledge model that can be used programmatically.”

Knowledge Graph

Text Mining for knowledge graph development

PoolParty Extractor

Uses several components of a knowledge model:

▸ Taxonomies based on the SKOS standard

▸ Ontologies based on RDF Schema or OWL

▸ Word form dictionaries

▸ Blacklists and stop word lists

▸ Disambiguation settings

▸ Domain-specific reference document corpus

▸ Statistical language model
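To illustrate how two of these components interact, here is a rough sketch of vocabulary-based extraction using only concept labels and a blacklist. It is not PoolParty's implementation or API: the label table, the example.org URIs and the `extract` function are hypothetical, and word forms, disambiguation, the reference corpus and the statistical language model are left out.

```python
import re

# Hypothetical mini-vocabulary: lower-cased label -> concept URI.
LABELS = {
    "bain capital": "http://example.org/c/bain-capital",
    "venture capital": "http://example.org/c/venture-capital",
    "burger king": "http://example.org/c/burger-king",
    "boston": "http://example.org/c/boston",
    "company": "http://example.org/c/company",
}

# Labels that exist in the vocabulary but should never be extracted.
BLACKLIST = {"company"}

# Longest label, in words; bounds the window tried at each position.
MAX_WORDS = max(len(label.split()) for label in LABELS)


def extract(text):
    """Return (matched text, concept URI) pairs, preferring the longest label."""
    tokens = re.findall(r"\w+", text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "Bain Capital" wins over shorter labels.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            key = candidate.lower()
            if key in LABELS and key not in BLACKLIST:
                found.append((candidate, LABELS[key]))
                i += n
                break
        else:
            # No label starts at this token; move on.
            i += 1
    return found
```

Applied to the first sentence of the Bain Capital sample text from the architecture slide, the sketch returns matches for "Bain Capital", "venture capital" and "Boston", and skips "company" because it is blacklisted.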


PoolParty’s SKOS editor


The Audi Q3 is a compact crossover SUV made by Audi.

It is based on the PQ35 platform of Volkswagen.

A5 platform

A series

PoolParty’s ontology and custom schema management


Taxonomy

Ontology

Ontology 1 (from library)

Ontology 2 (imported)

Ontology 3 (custom-made)

Custom Schema

‘Setting the rules’ for text mining & entity extraction via thesaurus

Proper use of a funduscope requires a bit of practice and familiarity with the functions of your device.

Diagnostic Equipment

Ophthalmoscope
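The slide suggests that a synonym found in the text ("funduscope") is resolved to a thesaurus concept (Ophthalmoscope) and, through it, to a broader concept (Diagnostic Equipment). A small sketch of that lookup follows, assuming this reading of the slide. The `ex:` identifiers and the dictionary layout are made up for illustration and do not reflect how PoolParty stores a thesaurus.

```python
# Hypothetical thesaurus fragment in a SKOS-like shape.
CONCEPTS = {
    "ex:ophthalmoscope": {
        "prefLabel": "Ophthalmoscope",
        "altLabels": ["funduscope"],
        "broader": "ex:diagnostic-equipment",
    },
    "ex:diagnostic-equipment": {
        "prefLabel": "Diagnostic Equipment",
        "altLabels": [],
        "broader": None,
    },
}


def resolve(label):
    """Return the concept whose preferred or alternative label equals `label`."""
    wanted = label.lower()
    for uri, concept in CONCEPTS.items():
        labels = [concept["prefLabel"]] + concept["altLabels"]
        if wanted in (known.lower() for known in labels):
            return uri
    return None


def broader_chain(uri):
    """Return the preferred labels of all broader concepts, nearest first."""
    chain = []
    parent = CONCEPTS[uri]["broader"]
    while parent is not None:
        chain.append(CONCEPTS[parent]["prefLabel"])
        parent = CONCEPTS[parent]["broader"]
    return chain
```

With this structure, a document that only mentions "funduscope" can still be tagged with the Ophthalmoscope concept and found through a search for diagnostic equipment.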

Disambiguation settings


Disambiguation settings


Corpus analysis results in a network of concepts and terms

I need support to continuously extend our taxonomy / controlled vocabulary!

skos:Concept

Reference Corpus

- Websites
- PDF, Word, …
- Abstracts from DBpedia
- RSS Feeds

skos:Concept

skos:Concept

Term 1

Term 3

Term 7

Term 8

Term 6

Term 4

Term 2

Term 5

- Relevant terms and phrases
- Relevancy of concepts
- Co-occurrence between concepts and terms
- Co-occurrence between terms and terms
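The co-occurrence figures listed above could be computed along the following lines. This is only a sketch under simple assumptions: co-occurrence is counted per document, matching is naive lower-cased substring search, and the function name and sample data are invented. The slides do not describe how PoolParty's corpus analysis is actually implemented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(documents, vocabulary):
    """Count, for each unordered pair of entries, the documents containing both.

    `vocabulary` may mix concept labels and candidate terms; matching is a
    naive case-insensitive substring test, which is an assumption of this sketch.
    """
    pairs = Counter()
    for text in documents:
        lowered = text.lower()
        # Sort so each pair always appears in the same order as a key.
        present = sorted(entry for entry in vocabulary if entry in lowered)
        pairs.update(combinations(present, 2))
    return pairs
```

Pairs with a high count that involve a term not yet in the taxonomy are natural candidates to review as new concepts.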

Semantic Annotation

Classification and Semantic Annotation / Tagging

Entity Extraction based on Knowledge Graphs


PoolParty as a supervised learning system


[Diagram: the roles Content Manager, Integrator and Taxonomist/Ontologist; the components Thesaurus Server, Extractor and PowerTagging; and the resources Index, Reference Corpus and CMS. They are connected by the relations "uses API", "is user of", "is basis of", "annotates", "enriches", "extends" and "analyzes".]

Data Integration

Mapping and Linking of Data

PoolParty Semantic Integrator - at a glance

https://youtu.be/l_LppfS3wxk


Deep Data Analytics

Semantic Search

Semantic Integrator

Unstructured Data

Structured Data

ETL / Monitoring / Scheduling

PoolParty Semantic Integrator

High-level architecture

DEMO(s)

… let's see how it works in action

PoolParty Thesaurus Manager
● SKOS editor
● Ontology and custom scheme manager

PoolParty PowerTagging for Drupal (backend)
● Automated Tagging
● Manual Tagging
● Configuration of modules

PoolParty GraphSearch for Drupal (frontend)
● Semantic Search
● Explore Trends & Sentiments
● Facets and Similarity

DEMOS

Drupal and PoolParty at a Glance


PoolParty Drupal Integration Demo: http://drupal.poolparty.biz/

USE CASES

Success Stories about Text Mining and Linked Data using PoolParty Semantic Suite

Use Cases: Text Mining & Linked Data

▸ Climate Tagger (PDF): Streamline and catalogue data and information resources

▸ healthdirect Australia (PDF): Semantic search based on the Australian Health Thesaurus

▸ CTCN Semantic Search: Integrating thousands of documents from several sources on climate technology

▸ European Innovation Partnership (EIP) on Water: Online marketplace including semantic matchmaking


Climate Tagger

Helps organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources.

Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus.

http://www.climatetagger.net

How does it work

EIP Water Matchmaking

Controlled vocabularies enable accurate matchmaking between supply and demand for water innovation in Europe.

Matchmaking is based upon the EIP Water Innovation Thesaurus (GEMET based).

http://www.eip-water.eu
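One simple way to picture vocabulary-based matchmaking is to tag each demand and each offer with thesaurus concepts and rank offers by how many concepts they share with the demand. The sketch below uses Jaccard similarity for that ranking. The slides do not say how the EIP Water marketplace computes its matches, so the scoring method, the threshold and all names and tags here are assumptions.

```python
def jaccard(a, b):
    """Share of concepts two tag sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match(demand_tags, offers, threshold=0.25):
    """Rank offers by concept overlap with a demand, best first.

    `offers` maps an offer name to its set of concept tags; offers scoring
    below `threshold` are dropped.
    """
    scored = [(jaccard(demand_tags, tags), name) for name, tags in offers.items()]
    return [
        (name, round(score, 2))
        for score, name in sorted(scored, reverse=True)
        if score >= threshold
    ]
```

Because both sides are tagged from the same controlled vocabulary, a demand and an offer can match even when their free-text descriptions use different wording.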

CTCN Semantic Search

Helps organisations in the climate technology field explore and find relevant content from thousands of Drupal nodes and several sources, using PoolParty, PowerTagging and s0nr webmining.

CTCN is backed by the CTCN Climate Technology Thesaurus.

https://www.ctc-n.org/semantic-search

healthdirect Australia

Integrated views and semantic search over more than 100 trusted sources.

Harmonization of various metadata systems through the use of a central vocabulary hub: Australian Health Thesaurus.

http://www.healthdirect.gov.au

SUMMARY

WHY TAXONOMISTS AND INFORMATION ARCHITECTS LIKE POOLPARTY

Different project stakeholders expect specific qualities from a semantic technology platform:

I am a taxonomist. I need a tool that provides convenient functionalities and intuitive user interfaces for my daily work.

I am an information architect. Enterprise metadata management deserves scalable technologies, which provide semantic services on top of rich APIs based on standards.

PoolParty Academy

Get certified!


https://www.poolparty.biz/academy/

GET STARTED

Get your test account at www.poolparty.biz

CONNECT

Timea Turdean, Technical Consultant, SWC

▸ timea.turdean@semantic-web.com

▸ https://www.linkedin.com/in/timeaturdean/

▸ https://twitter.com/poolparty_team

© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/

Martin Kaltenböck, CFO, Semantic Web Company

▸ m.kaltenboeck@semantic-web.at

▸ https://www.linkedin.com/in/martinkaltenboeck

▸ https://twitter.com/semwebcompany

▸ https://blog.semantic-web.at/
