ailab.ijs.si Machine Learning and Knowledge Discovery for Semantic Web Dunja Mladenić Artificial Intelligence Laboratory, J. Stefan Institute, Slovenia

Machine Learning and Knowledge Discovery for Semantic Webtranslectures.videolectures.net/site/normal_dl/tag=... · 2011-05-27 · Semantic Web integrates many existing ideas and technologies

Machine Learning and Knowledge Discovery for Semantic Web

Dunja Mladenić

Artificial Intelligence Laboratory, J. Stefan Institute, Slovenia

Jožef Stefan Institute, Artificial Intelligence Laboratory

Selection of FP6 & FP7 Projects (Integrated Projects and Networks of Excellence only):

FP7 IP ACTIVE – Enabling the Knowledge Powered Enterprise

FP7 IP COIN – COllaboration and INteroperability for networked enterprises

FP7 IP EURIDICE – Inter-Disciplinary Research on Intelligent Cargo for Efficient, Safe and Environment-friendly Logistics

FP7 NoE PASCAL2 – Pattern Analysis, Statistical Modeling and Computational Learning

FP7 NoE MetaNet – Machine Translation & Multilingual Information Retrieval

FP7 NoE Multilingual Web

FP6 IP NeOn – Lifecycle Support for Networked Ontologies

FP6 IP ECOLEAD – European Collaborative Networked Organizations Leadership Initiative

FP6 IP SEKT – Semantically-Enabled Knowledge Technologies

Jožef Stefan Institute (JSI) is the leading Slovene research institution for natural sciences (900+ people) in the areas of computer science, physics, and chemistry

Artificial Intelligence Laboratory has over 30 people working in various areas of artificial intelligence (machine learning, data mining, semantic technologies, computational linguistics, logic)

Spin-offs: Quintelligence, Cyc-Europe, LiveNetLife, ModroOko, Envigence

Selection of Portals and Products:

Text-Garden (http://www.textmining.net)

Enrycher (http://enrycher.ijs.si/)

VideoLectures.NET (http://videolectures.net/)

IST-World (http://www.ist-world.org/)

Project Intelligence (http://pi.ijs.si/)

Search-Point (http://searchpoint.ijs.si/)

OntoGen (http://ontogen.ijs.si/)

Document-Atlas (http://docatlas.ijs.si/)

AnswerArt (http://answerart.net/)

Contextify (http://contextify.net/)

Business Clients: Accenture Labs, Bloomberg, British Telecom, Google Labs, Microsoft Research, New York Times, Siemens, Wikipedia

Academic Partners: Carnegie Mellon, Cornell, Stanford, MIT, Uni. Maryland, KIT, UCL

AILab Technologies

Graph/Social Network Analysis (GraphGarden/SNAP, IST-World, FPIntelligence)

Complex Data Visualization (DocAtlas, NewsExplorer, SearchPoint)

Computational Linguistics (Enrycher, AnswerArt)

Social Computing/Web2.0 (LiveNetLife)

Light-Weight Semantic Technologies (OntoGen, Contextify)

Deep Semantics & Reasoning (Cyc)

Statistical Machine Learning

Data/Web/Text/Stream-Mining (TextGarden Suite of tools)

Outline

Motivation

Machine Learning and Ontologies

OntoGen

OntoPlus

Semantics for search and browsing

SearchPoint

AnswerArt

Enrycher

Sensor Search

Real-time data processing

NYTMiner, BBMiner, Personalized News Search

…to conclude

Motivation

Semantic Web

integrates many existing ideas and technologies, focusing on upgrading the existing nature of web-based information systems to a more “semantic” oriented nature

typical approach is top-down modeling of knowledge, proceeding down towards the data

Machine Learning and Knowledge Discovery in Databases

aims at data modeling and extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from large datasets

data-driven bottom-up approach trying to discover the structure in the data and express it in more abstract ways and richer knowledge formalisms

ML & KDD role within Semantic Web

Ontology construction

SW applications involve deep structured knowledge composed into ontologies

ML/KDD discover structure in the data, structuring knowledge

semi-automatically extract knowledge from data into ontological structure

Integrating domain knowledge

ML/KDD approaches, e.g., “Active Learning” and “Semi-supervised Learning”, make use of small pieces of human knowledge for better guidance towards the desired model (e.g., ontology)

reduce human effort by an order of magnitude while preserving the quality of results

Handling data over time - dynamic ontologies

data and the corresponding semantic structures change over time

KDD technologies for stream mining deal with the stream of incoming data fast enough to keep the corresponding models (ontologies) up to date

Supporting different data modalities

ML/KDD technologies are not limited to a specific data representation; they handle different data modalities (databases, text, multimedia, graphs)

ML/KDD for Language Technologies

SW mainly deals with textual data, so LT are important for SW, including lexical, syntactical and semantic levels of natural language processing

ML/KDD for modeling natural language by automatic learning from rare/costly data

Scalability

KDD approaches consider scalability

SW is ultimately concerned with real-life data on the web, which grow exponentially

Ontology - SW commonly uses ontologies to structure knowledge

Ontology can be seen as a graph/network structure consisting of:

a set of concepts (vertices in a graph),

a set of relationships connecting concepts (directed edges in a graph),

a set of instances assigned to particular concepts (data records assigned to vertices in a graph)
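The graph view of an ontology can be illustrated with a small data structure. This is only a sketch: the class name `Ontology`, its methods, the relation label `hasSubConcept` and the sample concepts are assumptions made for illustration, not part of any tool mentioned in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """An ontology viewed as a directed, labelled graph with attached instances."""
    concepts: set = field(default_factory=set)      # vertices
    relations: set = field(default_factory=set)     # directed edges: (source, label, target)
    instances: dict = field(default_factory=dict)   # concept -> list of data records

    def add_concept(self, name):
        self.concepts.add(name)
        self.instances.setdefault(name, [])

    def add_relation(self, source, label, target):
        # Both endpoints must already be concepts (vertices) of the graph.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both endpoints must be existing concepts")
        self.relations.add((source, label, target))

    def add_instance(self, concept, record):
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[concept].append(record)

    def related(self, source, label):
        """Concepts reachable from `source` over one edge with the given label."""
        return sorted(t for (s, l, t) in self.relations if s == source and l == label)


# Hypothetical example data.
onto = Ontology()
for name in ("News", "Business", "Society"):
    onto.add_concept(name)
onto.add_relation("News", "hasSubConcept", "Business")
onto.add_relation("News", "hasSubConcept", "Society")
onto.add_instance("Business", "UK takeovers and mergers")
```

Keeping instances in a separate mapping keyed by concept mirrors the third bullet: records are attached to vertices rather than stored as vertices themselves.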

Ontology construction

One of the methodologies defined for ontology construction is a methodology for semi-automatic ontology construction. Analogous to the CRISP-DM methodology, it can be defined as consisting of the following interrelated phases:

1. domain understanding (what is the area we are dealing with?),

2. data understanding (what is the available data and its relation to semi-automatic ontology construction?),

3. task definition (based on the available data and its properties, define task(s) to be addressed),

4. ontology learning (semi-automated process addressing the task(s)),

5. ontology evaluation (estimate quality of the solutions to the addressed task(s)),

6. refinement with human in the loop (perform any transformation needed to improve the ontology and return to any of the previous steps, as desired)

[Grobelnik & Mladenić 2006]

ML/KDD for ontology learning

Define the ontology learning tasks in terms of mappings between ontology components, where some of the components are given and some are missing and we want to induce the missing ones.

Some typical scenarios in ontology learning are the following:

Inducing concepts/clustering of instances (given instances)

Inducing relations (given concepts and the associated instances)

Ontology population (given an ontology and relevant, but not associated instances)

Ontology generation (given instances and any other background information)

Ontology updating/extending (given an ontology and background information, such as, new instances or the ontology usage patterns)

Ontology Population via document classification into topic ontology

Goal: given a collection of documents organized into a topic ontology, classify a new document into the ontology

Different classification algorithms were applied on different data representations (e.g., word-vectors, word n-gram vectors, flexible phrase vectors) and on different datasets (e.g., Yahoo! directory of Web pages, US patent database, directory of Slovenian/Croatian Web pages, news directory)
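As an illustration of one of the representations named above, the following minimal sketch builds a word n-gram vector for a document. The function name `word_ngram_vector`, the whitespace tokenization and the raw-count weighting are assumptions chosen for brevity; the slides do not say how the vectors were actually built.

```python
from collections import Counter


def word_ngram_vector(text, max_n=2):
    """Count all word n-grams of length 1..max_n in a whitespace-tokenized text."""
    tokens = text.lower().split()
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


vector = word_ngram_vector("machine learning for semantic web")
```

With `max_n=1` the same function yields a plain word-vector, so the two representations can be compared with the same classifier.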

OntoClassify

System for scalable classification of text into large topic ontologies [Grobelnik & Mladenić, 2005]

Available as Web service

for DMoz directory of Web pages

for Inspec ontology for annotating papers

for MeSH medical ontology

Constructing ontology from data stream

Goal: given a stream of documents (e.g., news arriving over time), construct an ontology

Solution: framework that incorporates the stream mining process into a formal definition of ontology [Grobelnik et al., 2006]

Extract named entities and use them as instances of the ontology

Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents they occur in

Concepts and relations can be formed either by clustering or by classification into an existing topic hierarchy
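The representation step can be sketched as follows: every entity and every co-occurring entity pair accumulates the words of the documents it appears in, one document at a time. The sketch assumes named entities have already been extracted; `update_profiles` and the toy stream are hypothetical and stand in for whatever the cited framework actually does.

```python
from collections import Counter, defaultdict
from itertools import combinations


def update_profiles(profiles, words, entities):
    """Add one document's words to the profile of every entity and entity pair in it."""
    bag = Counter(words)
    found = sorted(set(entities))
    for entity in found:
        profiles[(entity,)].update(bag)
    for pair in combinations(found, 2):   # co-occurring entity pairs
        profiles[pair].update(bag)


# Hypothetical stream of (document words, extracted entities).
profiles = defaultdict(Counter)
stream = [
    (["government", "talks", "society"], ["France", "UK"]),
    (["stocks", "bonds", "investing"], ["France", "UK"]),
    (["stocks", "election"], ["UK"]),
]
for words, entities in stream:
    update_profiles(profiles, words, entities)
```

Because profiles are updated incrementally, the vector for a pair such as `("France", "UK")` can be inspected at any point in the stream, which is what makes it possible to observe a relation changing over time.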

Illustrative results on Reuters news

Observe change in relations between entities over time, e.g.,

France – UK relation focused first on Society (Society, Government, Regional,...) and later moves to Business (Investing, Business, Stocks, Bonds,…)

Ontology Learning from text

Extending the existing ontology

commonly used is the English lexical ontology WordNet, which is extended using some text, e.g., Web documents [Agirre et al., 2000]

Learning relations for an existing ontology (from docs)

learn relations between the concepts (e.g., “isa” [Cimiano et al., 2004], “hasPart” [Maedche, Staab, 2001]), extract semantic relations from text based on collocations [Heyer et al., 2001]

Ontology construction based on clustering (from docs)

split each document into sentences, parse the text and apply clustering for semi-automatic construction of an ontology [Bisson et al., 2000; Reinberger et al., 2004]

cluster sentences and map them upon the concepts of a general ontology (e.g., WordNet [Hotho et al., 2003])

use whole documents and guide the user through a semi-automatic process of ontology construction [Fortuna et al., 2005]

Ontology Learning from text (cont)

Ontology construction based on semantic graphs

parse the documents and construct semantic graphs, use them for learning document summaries [Leskovec et al., 2004]

Ontology construction from a collection of news stories

represent news as graphs of named entities with relationships based on collocations, used for visualization/browsing [Grobelnik, Mladenić, 2004]

More information in edited book [Buitelaar et al., 2005]

SEMI-AUTOMATIC DATA-DRIVEN ONTOLOGY CONSTRUCTION

Blaž Fortuna, Dunja Mladenić, Marko Grobelnik

http://ontogen.ijs.si

Ontology Learning with OntoGen

Semi-Automatic

provides suggestions and insights into the domain

the user interacts with parameters of methods

final decisions taken by the user

Data-Driven

most of the aid provided by the system is based on some underlying data

instances are described by features extracted from the data (e.g., word-vectors)

Installation package available at ontogen.ijs.si

Main Features

Interactive user interface

User can interact in real-time with the integrated machine learning and text mining methods

Concept discovery methods:

Unsupervised: k-means clustering, Latent Semantic Indexing (LSI)

Supervised: Active learning

Concept visualization

Methods for helping at understanding the discovered concepts:

Keyword extraction: TFIDF and SVM-normal based keyword extraction

Concept visualization: LSI and multi-dimensional scaling based visualization

Also available as a separate tool named Document Atlas: http://docatlas.ijs.si
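To make the TFIDF-based keyword extraction concrete, here is a minimal sketch that describes a concept by the highest-weighted terms of its TFIDF centroid. The weighting formula (raw term frequency times log inverse document frequency), the function names and the toy documents are assumptions; OntoGen's own implementation may differ.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """docs: list of token lists -> list of {term: tf * log(N / df)} dicts."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def concept_keywords(vectors, member_ids, k=3):
    """Top-k terms of the summed TFIDF vectors of a concept's member documents."""
    centroid = defaultdict(float)
    for i in member_ids:
        for term, weight in vectors[i].items():
            centroid[term] += weight
    return sorted(centroid, key=lambda t: (-centroid[t], t))[:k]


# Hypothetical toy corpus; documents 2 and 3 form one concept.
docs = [
    ["bank", "loan", "rate"],
    ["bank", "stock", "rate"],
    ["fish", "sea", "bank"],
    ["fish", "net", "sea"],
]
vectors = tfidf_vectors(docs)
keywords = concept_keywords(vectors, member_ids=[2, 3])
```

A term such as "bank" that occurs in most documents receives a low weight and drops out of the keyword list, which is the behaviour one wants from concept labels.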

Ontology management

Concept hierarchy

List of suggested sub-concepts

Ontology visualization

Selected concept

Concept management

Concept’s details

Concept’s instance management

Selected concept

Keywords

Selected instance

Active Learning for concept learning

SVM hyperplane distance based active learning algorithm

First few labelled documents are bootstrapped from a query search

Instances for final concept are selected using the final SVM model

Query

New Concept
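The selection step of hyperplane-distance-based active learning can be sketched as below: the next document shown to the user is the unlabelled one closest to the current decision boundary. The sketch assumes a linear model `(w, b)` has already been trained on the bootstrapped labels; the training itself, the function names and the numbers are hypothetical and not taken from OntoGen.

```python
import math


def hyperplane_distance(w, b, x):
    """Distance of point x from the hyperplane w.x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


def next_query(w, b, unlabeled):
    """Index of the unlabelled instance the current model is least certain about."""
    return min(range(len(unlabeled)), key=lambda i: hyperplane_distance(w, b, unlabeled[i]))


# Hypothetical model and unlabelled pool.
w, b = [1.0, -1.0], 0.0
pool = [[3.0, 0.0], [1.0, 0.9], [0.0, 2.0]]
query_index = next_query(w, b, pool)
```

After the user labels the selected instance, the model is retrained and the selection repeats; asking about near-boundary instances is what lets a few labels go a long way.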

Multiple views of the same data

Reuters news articles are used in the example with two different sets of categories: topics, or the list of countries that appear in the news articles.

Each set of categories offers a different view on the data.

SVM based method detects importance of keywords for each view.

[Figure: Topics view and Countries view of the same articles. Sample articles: “UK takeovers and mergers: The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …” and “Lloyd’s CEO questioned in recovery suit in U.S.: Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …”]

Concept’s instances visualization

Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity.

Characteristic keywords are shown for all parts of the map.

User can select groups of instances on the map to create sub-concepts.

Ontology population

System uses one vs. all linear SVM trained on created ontology to classify new instances into concepts.

Users can finalize the classifications using an interactive user interface.

[Screenshot labels: New documents, Selected document, Classification of selected document]
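The one-vs-all decision can be sketched as follows: one linear model per concept scores the new instance, and the highest-scoring concept is proposed to the user. Only the prediction step is shown; the per-concept weights below are hypothetical placeholders for models that would come from SVM training.

```python
def classify(models, x):
    """models: {concept: (weights, bias)}. Returns (best concept, all scores)."""
    scores = {
        concept: sum(wi * xi for wi, xi in zip(weights, x)) + bias
        for concept, (weights, bias) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical per-concept linear models over a two-feature space.
models = {
    "Business": ([1.0, -0.5], 0.0),
    "Society": ([-0.5, 1.0], 0.0),
}
suggested, scores = classify(models, [2.0, 1.0])
```

Returning all scores, not just the winner, fits the interactive setting: the interface can show the ranked alternatives so the user can override the suggestion.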

ONTOGEN ON IMAGES

Nenad Tomašev, Blaž Fortuna, Dunja Mladenić, Marko Grobelnik

Image representation

[Diagram: SIFT features, color info and text are extracted as features, which feed data mining and then the application]

Image representation - features

SIFT features

Rotation, scale and translation invariant orientation gradients located at “interesting” points on an image

Usually, SIFT feature space is quantized to get “representative” vectors (“codebook” histogram)

Color histogram

Simply divide the color spectrum into “buckets” and calculate the distribution of colors into these buckets (color histogram)

Distance - weighted sum of SIFT codebook and color data distances
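The color histogram and the combined distance can be sketched as below. Several choices are assumptions made for illustration: RGB pixels with 4 buckets per channel, L1 distance for both parts, and an equal weighting `alpha=0.5`. SIFT extraction and codebook quantization are not shown; the SIFT codebook histograms are taken as given inputs.

```python
def color_histogram(pixels, buckets=4):
    """pixels: iterable of (r, g, b) in 0..255 -> normalized histogram of buckets**3 bins."""
    hist = [0.0] * buckets ** 3
    total = 0
    for r, g, b in pixels:
        # Map each channel value to a bucket index in 0..buckets-1.
        rb, gb, bb = r * buckets // 256, g * buckets // 256, b * buckets // 256
        hist[(rb * buckets + gb) * buckets + bb] += 1
        total += 1
    return [h / total for h in hist] if total else hist


def l1_distance(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def image_distance(sift_a, sift_b, color_a, color_b, alpha=0.5):
    """Weighted sum of the SIFT codebook distance and the color histogram distance."""
    return alpha * l1_distance(sift_a, sift_b) + (1 - alpha) * l1_distance(color_a, color_b)


# Hypothetical images: same SIFT codebook histogram, different dominant colors.
red = color_histogram([(255, 0, 0)] * 4)
blue = color_histogram([(0, 0, 255)] * 4)
codebook = [0.5, 0.5, 0.0]
distance = image_distance(codebook, codebook, red, blue)
```

The weight `alpha` controls whether shape-like (SIFT) or color evidence dominates when images are grouped into concepts.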

OntoGen on ImageNet subset (flowers, fire, buildings)

Document list for quick overview

Collection visualization (without displaying images)

Collection visualization (displaying images)

Creating ontology on images

Grouping similar images - concepts

Displaying relevant features as concept names

Sub-concept visualization

flower

buildings

fire

Adding sub-concepts

TEXT-DRIVEN ONTOLOGY EXTENSION

Inna Novalija, Dunja Mladenić

OntoPlus

OntoPlus methodology allows for the effective extension of very large ontologies.

OntoPlus methodology provides the user with required concepts and relationships in the form of a ranked list.

OntoPlus methodology combines textual ontology content, ontology structure and co-occurrence information.

[Architecture diagram]

Domain Information Module (DIM): Domain Keywords; Domain Glossary (Term Names; Term Descriptions)

Domain Subset Extraction Module (DSEM): Upper-Level Domain Extractor (extraction of relevant domains); Domain Knowledge Extractor (extraction of ontology concepts defined in relevant domains; extraction of ontology concepts with denotation similar to Glossary Term names); Relevant Ontology Subset

Ontology Extension Module (OEM): Ontology Extender; Candidate Entries (suggested): for each Glossary Term, a ranked list of related Ontology Concepts and correspondent Relations; Validated Entries: Glossary Term, Ontology Concept, Relation

Other diagram elements: Multi-Domain Ontology; Domain KB

Process steps shown in the diagram: domain information identification; extraction of the domain relevant ontology subset; related concepts extraction; user validation; ontology reuse

OntoPlus

Text-Driven Ontology Extension Using Ontology Content, Structure and Co-occurrence

Ranking existing ontology concepts as corresponding to a new domain concept suggested for the ontology extension

Experiments using Cyc ontology and textual material from two domains – Finances, and Fisheries & Aquaculture

Best results by combining content, structure and co-occurrence information

Financial domain - ontology content and structure

Fisheries & Aquaculture domain - ontology content and co-occurrence

Results – Concept Ranking

Evaluation of the top suggested candidate concepts for ontology extension (Financial glossary), 100 random terms

| Weighting measure | HR (Top 1), Eqv or Hier Rels | HR (Top 1), Any Rels | HR (Top 5), Eqv or Hier Rels | HR (Top 5), Any Rels | HR (Top 10), Eqv or Hier Rels | HR (Top 10), Any Rels |
| Baseline - Name (string edit distance of concept name): [1.0] | 18 | 28 | 24 | 36 | 25 | 40 |
| Content (cos. similarity): [1.0] | 32 | 65 | 60 | 92 | 68 | 95 |
| Co-occur (Jaccard similarity): [1.0] | 30 | 48 | 48 | 62 | 52 | 73 |
| Content + Structure + Co-occurrence (Content: [0.5], Structure: [0.4], Co-occur: [0.1]) | 38 | 68 | 66 | 95 | 76 | 98 |

Evaluation of the top suggested candidate concepts for ontology extension (ASFA thesaurus), 100 random terms

| Weighting measure | HR (Top 1), Eqv or Hier Rels | HR (Top 1), Any Rels | HR (Top 5), Eqv or Hier Rels | HR (Top 5), Any Rels | HR (Top 10), Eqv or Hier Rels | HR (Top 10), Any Rels |
| Baseline - Name (string edit distance of concept name): [1.0] | 24 | 37 | 25 | 38 | 27 | 40 |
| Content (cos. similarity): [1.0] | 32 | 72 | 52 | 88 | 56 | 91 |
| Co-occur (Jaccard similarity): [1.0] | 33 | 71 | 49 | 89 | 51 | 90 |
| Content + Co-occurrence (Content: [0.5], Structure: [0.0], Co-occur: [0.5]) | 42 | 84 | 63 | 96 | 66 | 96 |
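The weighted combination used in these experiments can be sketched as follows, with cosine similarity for content and Jaccard similarity for co-occurrence as named in the tables. How the structure score is computed is not stated in the slides, so it is treated here as a given number. All function names and candidate scores are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(content, structure, cooccur, weights=(0.5, 0.4, 0.1)):
    w_content, w_structure, w_cooccur = weights
    return w_content * content + w_structure * structure + w_cooccur * cooccur


def rank_concepts(candidates, weights=(0.5, 0.4, 0.1), top=10):
    """candidates: {concept: (content, structure, cooccur)} -> ranked concept names."""
    scored = {c: combined_score(*s, weights=weights) for c, s in candidates.items()}
    return sorted(scored, key=lambda c: (-scored[c], c))[:top]


# Hypothetical similarity scores of ontology concepts for one glossary term.
candidates = {
    "Loan": (0.8, 0.5, 0.2),
    "Bank": (0.6, 0.9, 0.1),
    "Fish": (0.1, 0.0, 0.0),
}
with_structure = rank_concepts(candidates, weights=(0.5, 0.4, 0.1))
without_structure = rank_concepts(candidates, weights=(0.5, 0.0, 0.5))
```

Changing only the weight triple reorders the candidates, which is the effect the two weight settings in the tables are exploring for the two domains.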

Demo

CONTEXT SENSITIVE SEARCH

Boštjan Pajntar, Marko Grobelnik, Dunja Mladenić

http://SearchPoint.ijs.si

SearchPoint

Search engines generally work very well

There are cases where it is difficult to specify a query

Idea: help the user by clustering all the hits and visualising the results space

Some related work:

mindset.research.yahoo.com – research vs. shopping aspect

www.ujiko.com – clustering & user interface

vivisimo.com – hierarchical clustering

Approach Description

Search results clustered and shown in 2D space

Each point in this cluster space corresponds to a ranking

Hits are ordered according to the position of the focus - the selected point

Initial focus position corresponds to Google ranking

Positioning clusters with respect to centroid-to-centroid similarity

Calculating ranking of document using its similarity to each centroid

Classifying documents into web directory (DMoz), visualising relevant parts of the directory
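One way such focus-dependent re-ranking could work is sketched below. The scoring rule is an assumption made for illustration, not SearchPoint's actual formula: each cluster gets a weight inversely proportional to the distance between the focus and the cluster's position on the map, and a document's score is the weighted sum of its cosine similarities to the cluster centroids. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors given as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def rerank(docs, centroids, positions, focus):
    """Order document indices by centroid similarity, weighted by closeness to the focus."""
    # Assumed weighting: inverse distance from the focus to each cluster position.
    raw = [1.0 / (1e-6 + math.hypot(focus[0] - x, focus[1] - y)) for x, y in positions]
    total = sum(raw)
    weights = [r / total for r in raw]
    scores = [sum(w * cosine(doc, c) for w, c in zip(weights, centroids)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Hypothetical "jaguar" hits: one about the car, one about the animal.
docs = [{"car": 1.0}, {"cat": 1.0}]
centroids = [{"car": 1.0}, {"cat": 1.0}]
positions = [(0.0, 0.0), (1.0, 0.0)]
near_car = rerank(docs, centroids, positions, focus=(0.1, 0.0))
near_cat = rerank(docs, centroids, positions, focus=(0.9, 0.0))
```

Because the weights vary continuously with the focus position, the user can stop between clusters and get a blended ranking instead of having to commit to a single cluster.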

Search

“Internet search” – one of the most common tasks involving text manipulation in everyday life

…but – how smart is search technology today?

…not too smart!

It is sophisticated, but not smart

Example – Searching for “jaguar”

Query “jaguar” has many meanings…

…but the first page of search engines doesn’t provide us with many answers

…there are 84M more results

Context sensitive search

[Screenshot labels: Query; Conceptual map; Search Point; Dynamic contextual ranking based on the search point]

SearchPoint

Main advantages

Generated clusters (in contrast to predefined)

User can search the whole cluster space and is not forced to select a single cluster

(Computer generated clusters are not necessarily what the user has in mind)

SearchPoint integrated in Accenture’s intranet search

ANSWER ART

Luka Bradeško, Lorand Dali, Blaž Fortuna, Marko Grobelnik, Dunja Mladenić, Inna Novalija, Boštjan Pajntar

http://AnswerArt.net

AnswerArt – System Architecture

[Architecture diagram components: AnswerArt preprocessing; Extraction; Triplets; Semantic enhancement of triplets; Domain ontology (ASFA, WordNet); Extended ontology; Cyc; AnswerArt Index; Question; Answer]

AnswerArt using Medline

AnswerArt using Medline

Show document

Show document overview

AnswerArt using ASFA

AnswerArt using ASFA

Show document

AnswerArt using ASFA

Show document overview

NATURAL LANGUAGE TEXT ENRICHMENT

Tadej Štajner, Delia Rusu, Lorand Dali, Blaž Fortuna, Dunja Mladenić, Marko Grobelnik

http://enrycher.ijs.si

Enrycher Service

Annotation Features:

Entity extraction

People, locations, organizations,

dates, percentages and money

amounts

Entity resolution

co-reference

anaphora

Entity linkage to Linked Open

Data (LOD)

Word Sense Disambiguation to

LOD (WordNet 3.0 VUA)

Assertion extraction

Subject – predicate – object sentence

elements together with their modifiers

Categories – from the Open

Directory and the Wikipedia category

schema


Entity resolution in text

Enrycher Service Dependencies

Dashed lines mark optional dependencies between components, whereas solid lines mark required dependencies.

A comparative view on five systems: Enrycher, Text Runner, Open Calais, GATE and Read the Web (NELL)

Features compared (table columns: Enrycher, Text Runner, Open Calais, GATE, NELL; most cell marks were lost in extraction):

• Named Entity Extraction
• Co-reference and Anaphora Resolution
• Entity Resolution
• Disambiguation
• Assertion Extraction (surviving cell entries: Relationship extraction; Events and Facts; Relationship extraction)
• Categories
• Visualization
• RDF Output
• Multi-Language Support (surviving cell entries: English; English, French, Spanish)
• Web Service API
• Can work on a single document


Enrycher - demo


Enrycher - demo


Enrycher - demo

Entities

Semantic graph


Enrycher - demo

Entity details

In OpenCyc

Category


OPINION MINING

Andreea Bizău, Delia Rusu, Dunja Mladenić

Opinion Mining

Use case: Twitter comments on movies

Diagram labels:

• IMDb Movie reviews* (sample)
• IMDb Movie reviews* (Training data)
• Domain-specific opinion vocabulary
• 2 Clusters
• Vocabulary, applied to Movie tweets (Test data)
• Twitter comments analysis

Word lists shown in the diagram: "amazing, awesome"; "weird, odd"; "weird, odd, bad"; "amazing, awesome, perfect, fantastic"

* http://www.cs.cornell.edu/people/pabo/movie-review-data/
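As a rough illustration of applying an opinion vocabulary to tweets, the following Python sketch counts vocabulary hits per cluster and derives an orientation. The word lists are the examples from the slide; labelling the two clusters "positive" and "negative", the tokenizer, and the function names are assumptions made for this sketch, not the authors' implementation.

```python
import re
from collections import Counter

# Two-cluster vocabulary using the example words from the slide.
# Calling the clusters "positive" and "negative" is an assumption.
POSITIVE = {"amazing", "awesome", "perfect", "fantastic"}
NEGATIVE = {"weird", "odd", "bad"}


def sentiment_counts(tweet):
    """Count how many words of each cluster occur in one tweet."""
    tokens = re.findall(r"[a-z']+", tweet.lower())
    counts = Counter()
    for token in tokens:
        if token in POSITIVE:
            counts["positive"] += 1
        elif token in NEGATIVE:
            counts["negative"] += 1
    return counts


def orientation(tweet):
    """Return 'positive', 'negative' or 'neutral' from the cluster counts."""
    counts = sentiment_counts(tweet)
    diff = counts["positive"] - counts["negative"]
    if diff > 0:
        return "positive"
    if diff < 0:
        return "negative"
    return "neutral"
```

Aggregating these per-tweet counts by movie or by time bucket would give the kind of distribution and evolution views that a Twitter comments analysis needs.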

Twitter comments analysis

• Sentiment words distribution for a movie
• Sentiment orientation evolution per week, day, hour
• Movie comparison


SENSOR SEARCH

Lorand Dali, Alexandra Moraru, Dunja Mladenić

Sensor Search - Architecture

SEARCH ENGINE

• Sensor Descriptions (Text)
• Inverted Index
• Ranking Model (Personalized PageRank)
• Geo Filtering

Query

• keywords
• center of area of interest
• radius of area of interest
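The query path (keywords, then a center and radius of interest) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, distances use the haversine formula, and ranking is a plain count of matched query terms that stands in for the Personalized PageRank model named on the slide, which is not reproduced here.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


class SensorIndex:
    """Hypothetical inverted index over textual sensor descriptions."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of sensor ids
        self.sensors = {}                 # sensor id -> (description, lat, lon)

    def add(self, sensor_id, description, lat, lon):
        self.sensors[sensor_id] = (description, lat, lon)
        for term in description.lower().split():
            self.postings[term].add(sensor_id)

    def search(self, keywords, center, radius_km):
        """Return sensor ids matching any keyword inside the circle of interest."""
        scores = defaultdict(int)
        for term in keywords.lower().split():
            for sensor_id in self.postings.get(term, ()):
                scores[sensor_id] += 1  # stand-in ranking: number of matched terms
        hits = []
        for sensor_id, score in scores.items():
            _, lat, lon = self.sensors[sensor_id]
            if haversine_km(center[0], center[1], lat, lon) <= radius_km:
                hits.append((sensor_id, score))
        # Best score first; ties broken by sensor id for a stable order.
        return [sid for sid, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]
```

Filtering by distance after the keyword lookup keeps the geo check limited to candidate sensors rather than the whole collection.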


REAL-TIME INFORMATION PROCESSING

Blaž Fortuna, Dunja Mladenić, Marko Grobelnik

QMiner – generic software platform for Real-Time information processing & Complex Event Detection & Anomaly Detection

Generic platform running on clouds for intensive data stream analytics
…processes thousands of events per second
…includes state of the art data/text/web/stream-mining algorithms

Deployed in British Telecom, NYTimes, Bloomberg, Microsoft, TheStreet.com
… ongoing work with Google News, Telefonica, Wikipedia

Diagram labels: Reality (Events): Sensors, Alarms, User logs; Capture; Transform & Enrich; Model; Anomaly detection; Complex events detection; Analytics: Prediction, Segment, Visualization

Network Monitoring for British Telecom

Diagram labels: British Telecom Network (~25 000 devices); Alarms (~10-100/sec); Alarms Server; Live feed of data; Alarms Explorer Server; Operator; Big board display

Alarms Explorer Server implements three real-time scenarios on the alarms stream:

1. Root-Cause-Analysis – finding which device is responsible for an occasional “flood” of alarms
2. Short-Term Fault Prediction – predicting which device will fail in the next 15 minutes
3. Long-Term Anomaly Detection – detecting unusual trends in the network

…system is used in British Telecom
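To make the idea of an alarm "flood" concrete, here is a small Python sketch of a sliding-window detector. It is only a naive heuristic written for illustration and is not the method used in the deployed system: the class name, the window length, the threshold, and the rule "blame the device with the most alarms in the window" are all assumptions. Timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class FloodDetector:
    """Hypothetical sliding-window detector for bursts of alarms."""

    def __init__(self, window_sec=60.0, flood_threshold=100):
        self.window_sec = window_sec
        self.flood_threshold = flood_threshold
        self.events = deque()        # (timestamp, device_id), oldest first
        self.per_device = Counter()  # alarms per device inside the window

    def add_alarm(self, timestamp, device_id):
        """Register one alarm; return the noisiest device during a flood, else None."""
        self.events.append((timestamp, device_id))
        self.per_device[device_id] += 1
        # Evict alarms that have fallen out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_sec:
            _, old_device = self.events.popleft()
            self.per_device[old_device] -= 1
            if self.per_device[old_device] == 0:
                del self.per_device[old_device]
        if len(self.events) >= self.flood_threshold:
            return self.per_device.most_common(1)[0][0]
        return None
```

A real root-cause analysis would also need the network topology, since the device raising the most alarms is not necessarily the one that failed.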

Visualizing Root-cause and prediction

Screenshot labels: Root-cause; Prediction

How Well Are We Predicting

[Chart: Percentage Realisation of Predictions; x-axis: Minutes; y-axis: Percentage (0% to 90%); labelled values: 86%, 80%, 60%]

User Modeling for NYTimes & Bloomberg

Diagram labels: Log Files (~100M page clicks per day); Stream of clicks; User profiles; Stream of profiles; NYT articles; Segments; Trend Detection System; Trends and updated segments; Advertisers; Campaign to sell segments; Sales ($)

Segment: Keywords

Stock Market: Stock Market, mortgage, banking, investors, Wall Street, turmoil, New York Stock Exchange
Health: diabetes, heart disease, disease, heart, illness
Green Energy: Hybrid cars, energy, power, model, carbonated, fuel, bulbs
Hybrid cars: Hybrid cars, vehicles, model, engines, diesel
Travel: travel, wine, opening, tickets, hotel, sites, cars, search, restaurant
…: …

Generalizing from registered users

[Charts: "BEP for Age (20% = random)" and "BEP for Gender on users with at least 10 visits (50% = random)"; x-axis feature sets: Context, Text Features, Named Entities, All Meta Data, All Content, All Features; legend entries: Male, Female, ≥2, ≥10, ≥50; y-axis ticks: 50.0% to 75.0% and 20.00% to 45.00%]
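The slide does not define BEP. Assuming it denotes the precision-recall break-even point commonly used in text classification, the following Python sketch shows one standard way to compute it: rank the examples by classifier score and take the precision at a cutoff equal to the number of positive examples, where precision and recall coincide by construction. The function name and the 0/1 label encoding are assumptions made for this sketch.

```python
def break_even_point(scores, labels):
    """Precision (equal to recall) at cutoff k = number of positive examples.

    scores: classifier scores, higher means more likely positive.
    labels: 0/1 ground-truth labels aligned with scores.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_pos])
    # At this cutoff precision = hits / n_pos and recall = hits / n_pos.
    return hits / n_pos
```

Under this reading, a classifier that scores at random would be expected to reach a BEP close to the share of the positive class, which is consistent with the "random" baselines quoted in the chart titles.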

Using User Modeling for News Recommendations

Good recommendations can make a big difference when keeping a user on a web site
…the key is how rich a context model the system uses to select information for a user

With bad recommendations <1% of users click; with good ones >5% of users click

Contextual personalized recommendations generated in ~20ms

Recommendation

Features:

History (user profile)
Geo (based on IP)
Requested page (where we serve recommendation)
Referring URL
Time

Diagram labels: time; now; training; US; Finance; Oil

Regular (visits > 50)

                 All  History  Context  Geo  Requested  Referring  Time
Top1 Recall       66       65       65   65         66         60    60
Top2 Recall       81       78       78   75         78         67    67
Top3 Recall       86       83       83   79         81         72    72
Top Precision     52       48       49   43         41         36    36

New (first visit)

                 Context  Geo  Requested  Referring  Time
Top1 Recall           60   58         46         60    60
Top2 Recall           77   70         61         71    71
Top3 Recall           85   77         72         78    78
Top Precision         45   36         35         37    37
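The slide does not say how the Top1, Top2 and Top3 Recall figures are computed. One plausible reading, assumed here, is the percentage of test cases in which the item the user actually clicked appears among the first k recommendations. The Python sketch below implements that reading; the function name and the input layout are hypothetical.

```python
def top_k_recall(recommendations, clicked, k):
    """Percentage of cases whose clicked item is among the first k recommendations.

    recommendations: one ranked list of item ids per test case.
    clicked: the item id actually clicked in each test case.
    """
    if len(recommendations) != len(clicked):
        raise ValueError("recommendations and clicked must have the same length")
    if not clicked:
        raise ValueError("need at least one test case")
    hits = sum(1 for recs, item in zip(recommendations, clicked) if item in recs[:k])
    return 100.0 * hits / len(clicked)
```

By this definition the value can only grow or stay the same as k increases, which matches the pattern in the tables, where Top3 Recall is never below Top1 Recall.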

Real-time Architecture

Diagram labels: Web; Logging; Collaborative Filter; SVM; Archive; Amazon; Crawl

Results

[Chart: News Personalization Test, Page-Story Page Transition Probabilities; y-axis: 0.0% to 7.0%; x-axis dates: 17 Apr, 24 Apr, 1 May, 8 May, 15 May; series legend: Control JSI SVM Random JSI CF DailyMe Personalized Most Popular Contextual Competitor]


PERSONALIZED NEWS SEARCH

Lorand Dali, Blaž Fortuna

Personalized News Search – System Architecture

Diagram labels: Search Logs; Learning to Rank; Ranking Model

Query

• keywords

User

• age
• country
• gender
• income
• industry
• job


User: Young female computer programmer

Query: Religion


User: Middle aged male clergy

Query: Religion

Videolectures.net

562 events, 8169 authors, 10539 lectures, 12859 videos


Montreal @ Video Lectures