ailab.ijs.si
Machine Learning and Knowledge Discovery for Semantic Web
Dunja Mladenić
Artificial Intelligence Laboratory,
J. Stefan Institute,
Slovenia
ailab.ijs.si
Jožef Stefan Institute, Artificial Intelligence Laboratory
Selection of FP6 & FP7 Projects (Integrated Projects and Networks of Excellence only):
FP7 IP ACTIVE – Enabling the Knowledge Powered Enterprise
FP7 IP COIN – COllaboration and INteroperability for networked enterprises
FP7 IP EURIDICE – Inter-Disciplinary Research on Intelligent Cargo for Efficient, Safe and Environment-friendly Logistics
FP7 NoE PASCAL2 – Pattern Analysis, Statistical Modeling and Computational Learning
FP7 NoE MetaNet – Machine Translation & Multilingual Information Retrieval
FP7 NoE Multilingual Web
FP6 IP NeOn – Lifecycle Support for Networked Ontologies
FP6 IP ECOLEAD – European Collaborative Networked Organizations Leadership Initiative
FP6 IP SEKT – Semantically-Enabled Knowledge Technologies
Jožef Stefan Institute (JSI) is the leading Slovene research institution for natural sciences (900+ people) in the areas of computer science, physics, chemistry
Artificial Intelligence Laboratory has over 30 people working in various areas of artificial intelligence (machine learning, data mining, semantic technologies, computational linguistics, logic)
Spin-offs: Quintelligence, Cyc-Europe, LiveNetLife, ModroOko, Envigence
Selection of Portals and Products:
Text-Garden (http://www.textmining.net)
Enrycher (http://enrycher.ijs.si/)
VideoLectures.NET (http://videolectures.net/)
IST-World (http://www.ist-world.org/)
Project Intelligence (http://pi.ijs.si/)
Search-Point (http://searchpoint.ijs.si/)
OntoGen (http://ontogen.ijs.si/)
Document-Atlas (http://docatlas.ijs.si/)
AnswerArt (http://answerart.net/)
Contextify (http://contextify.net/)
Business Clients: Accenture Labs, Bloomberg, British Telecom, Google Labs, Microsoft Research, New York Times, Siemens, Wikipedia
Academic Partners: Carnegie Mellon, Cornell, Stanford, MIT, Uni. Maryland, KIT, UCL
Screenshots shown: Document-Atlas, VideoLectures.NET, Enrycher, IST-World, SearchPoint, OntoGen, AnswerArt, Contextify e-mails
AILab Technologies
Graph/Social Network Analysis (GraphGarden/SNAP, IST-World, FPIntelligence)
Complex Data Visualization (DocAtlas, NewsExplorer, SearchPoint)
Computational Linguistics (Enrycher, AnswerArt)
Social Computing/Web2.0 (LiveNetLife)
Light-Weight Semantic Technologies (OntoGen, Contextify)
Deep Semantics & Reasoning (Cyc)
Statistical Machine Learning
Data/Web/Text/Stream-Mining (TextGarden Suite of tools)
Outline
Motivation
Machine Learning and Ontologies
OntoGen
OntoPlus
Semantics for search and browsing
SearchPoint
AnswerArt
Enrycher
Sensor Search
Real-time data processing
NYTMiner, BBMiner, Personalized News Search
…to conclude
Motivation
Semantic Web
integrates many existing ideas and technologies, focusing on upgrading web-based information systems to a more “semantic” oriented nature
typical approach is top-down: modeling knowledge first and proceeding down towards the data
Machine Learning and Knowledge Discovery in Databases
aims at data modeling and extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) information from large datasets
data-driven bottom-up approach: tries to discover the structure in the data and express it in more abstract ways and richer knowledge formalisms
ML & KDD role within Semantic Web
Ontology construction
SW applications involve deep structured knowledge composed into ontologies
ML/KDD discovering structure in the data - structuring knowledge
semi-automatically extract knowledge from data into ontological structure
Integrating domain knowledge
ML/KDD approaches, e.g., “Active Learning” and “Semi-supervised Learning”, make use of small pieces of human knowledge for better guidance towards the desired model (e.g., ontology)
reduce human efforts by an order of magnitude while preserving the quality of results
Handling data over time - dynamic ontologies
data and the corresponding semantic structures change in time
KDD technologies for stream mining - deal with the stream of incoming data fast enough to be up-to-date with the corresponding models (ontologies)
Supporting different data modalities
ML/KDD technologies are not limited to a specific data representation - handling different data modalities (databases, text, multimedia, graphs)
ML/KDD for Language Technologies
SW mainly deals with textual data; LT are thus important for SW, including lexical, syntactical and semantic levels of natural language processing
ML/KDD for modeling natural language by automatic learning from rare/costly data
Scalability
KDD approaches consider scalability
SW is ultimately concerned with real-life data on the web, which have exponential growth
Ontology - SW commonly uses ontologies to structure knowledge
Ontology can be seen as a graph/network structure consisting of:
a set of concepts (vertices in a graph),
a set of relationships connecting concepts (directed edges in a graph),
a set of instances assigned to particular concepts (data records assigned to vertices in a graph)
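The graph view of an ontology can be made concrete with a small data structure. The following is a minimal sketch, not code from any of the tools in this talk; the class name `Ontology`, its methods and the example concepts are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """Toy ontology: concepts are vertices, relations are directed labelled edges."""
    concepts: set = field(default_factory=set)
    relations: set = field(default_factory=set)    # (source, label, target) triples
    instances: dict = field(default_factory=dict)  # concept -> list of data records

    def add_concept(self, name):
        self.concepts.add(name)
        self.instances.setdefault(name, [])

    def add_relation(self, source, label, target):
        # Both endpoints must already be concepts (vertices) of the graph.
        if source not in self.concepts or target not in self.concepts:
            raise KeyError("both endpoints must be existing concepts")
        self.relations.add((source, label, target))

    def add_instance(self, concept, record):
        if concept not in self.concepts:
            raise KeyError(f"unknown concept: {concept}")
        self.instances[concept].append(record)

    def neighbours(self, concept):
        """Concepts reachable from `concept` over one outgoing edge."""
        return sorted(t for s, _, t in self.relations if s == concept)


# Hypothetical example content, chosen only for illustration.
onto = Ontology()
for name in ("Topic", "Business", "Society"):
    onto.add_concept(name)
onto.add_relation("Topic", "hasSubConcept", "Business")
onto.add_relation("Topic", "hasSubConcept", "Society")
onto.add_instance("Business", "news article about stocks and bonds")
```

Storing relations as triples keeps the edges directed and labelled, which matches the three components listed above.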
Ontology construction
One of the methodologies defined for ontology construction is a methodology for semi-automatic ontology construction. Analogous to the CRISP-DM methodology, it can be defined as consisting of the following interrelated phases:
1. domain understanding (what is the area we are dealing with?),
2. data understanding (what is the available data and its relation to semi-automatic ontology construction?),
3. task definition (based on the available data and its properties, define task(s) to be addressed),
4. ontology learning (semi-automated process addressing the task(s)),
5. ontology evaluation (estimate quality of the solutions to the addressed task(s)),
6. refinement with human in the loop (perform any transformation needed to improve the ontology and return to any of the previous steps, as desired)
[Grobelnik & Mladenić 2006]
ML/KDD for ontology learning
Define the ontology learning tasks in terms of mappings between ontology components, where some of the components are given and some are missing and we want to induce the missing ones.
Some typical scenarios in ontology learning are the following:
Inducing concepts/clustering of instances (given instances)
Inducing relations (given concepts and the associated instances)
Ontology population (given an ontology and relevant, but not associated instances)
Ontology generation (given instances and any other background information)
Ontology updating/extending (given an ontology and background information, such as, new instances or the ontology usage patterns)
Ontology Population via document classification into topic ontology
Goal: given a collection of documents organized into a topic ontology, classify a new document into the ontology
Different classification algorithms were applied on different data representations (e.g., word-vectors, word n-gram vectors, flexible phrase vectors) and on different datasets (e.g., Yahoo! directory of Web pages, US patent database, Directory of Slovenian/Croatian Web pages, News directory)
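Classifying a new document into a topic ontology can be sketched with a nearest-centroid classifier over word-vectors. This is an illustrative stand-in, since the slides do not name the algorithms used; the function names and the tiny training set are assumptions.

```python
import math
from collections import Counter


def word_vector(text):
    """Bag-of-words vector: lower-cased word -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def train_centroids(labelled_docs):
    """Sum the word vectors of each topic's documents into one centroid per topic."""
    centroids = {}
    for topic, text in labelled_docs:
        centroids.setdefault(topic, Counter()).update(word_vector(text))
    return centroids


def classify(text, centroids):
    """Return the topic whose centroid is most similar to the document."""
    vec = word_vector(text)
    return max(centroids, key=lambda topic: cosine(vec, centroids[topic]))


# Hypothetical two-topic ontology with made-up training documents.
training = [
    ("Business", "stocks bonds investing market shares"),
    ("Business", "market investors banking stocks"),
    ("Society", "government regional elections society"),
    ("Society", "society culture government policy"),
]
centroids = train_centroids(training)
predicted = classify("investors buy stocks and bonds", centroids)
```

Swapping `word_vector` for an n-gram or phrase representation would change the data representation without touching the classifier.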
OntoClassify
System for scalable classification of text into large topic ontologies [Grobelnik & Mladenić, 2005]
Available as Web service
for DMoz directory of Web pages
for Inspec ontology for annotating papers
for MeSH medical ontology
Constructing ontology from data stream
Goal: given a stream of documents (e.g., news arriving over time), construct an ontology
Solution: Framework that incorporates the stream mining process into a formal definition of ontology [Grobelnik et al., 2006]
Extract named entities and use them as instances of the ontology
Entities and co-occurring entity pairs are represented by feature vectors based on the content of the documents they occur in
Concepts and relations can be formed either by clustering or by classification into an existing topic hierarchy
Illustrative results on Reuters news
Observe change in relations between entities over time, e.g.,
France – UK relation focused first on Society (Society, Government, Regional,...) and later moves to Business (Investing, Business, Stocks, Bonds,…)
Ontology Learning from text
Extending the existing ontology
commonly used is the English lexical ontology WordNet, which is extended using some text, e.g., Web documents [Agirre et al., 2000]
Learning relations for an existing ontology (from docs)
learn relations between the concepts (e.g., “isa” [Cimiano et al., 2004], “hasPart” [Maedche, Staab, 2001]), extract semantic relations from text based on collocations [Heyer et al., 2001]
Ontology construction based on clustering (from docs)
split each document into sentences, parse the text and apply clustering for semi-automatic construction of an ontology [Bisson et al., 2000; Reinberger et al., 2004]
cluster sentences and map them onto the concepts of a general ontology (e.g., WordNet [Hotho et al., 2003])
use whole documents and guide the user through a semi-automatic process of ontology construction [Fortuna et al., 2005]
Ontology Learning from text (cont)
Ontology construction based on semantic graphs
parse the documents and construct semantic graphs, use them for learning document summaries [Leskovec et al., 2004]
Ontology construction from a collection of news stories
represent news as graphs of named entities with relationships based on collocations, used for visualization/browsing [Grobelnik, Mladenić, 2004]
More information in edited book [Buitelaar et al., 2005]
SEMI-AUTOMATIC DATA-DRIVEN ONTOLOGY CONSTRUCTION
Blaz Fortuna, Dunja Mladenić, Marko Grobelnik
http://ontogen.ijs.si
Ontology Learning with OntoGen
Semi-Automatic
provide suggestions and insights into the domain
the user interacts with parameters of methods
final decisions taken by the user
Data-Driven
most of the aid provided by the system is based on some underlying data
instances are described by features extracted from the data (e.g., word-vectors)
Installation package available at ontogen.ijs.si
Main Features
Interactive user interface
User can interact in real-time with the integrated machine learning and text mining methods
Concept discovery methods:
Unsupervised
k-means clustering
Latent Semantic Indexing (LSI)
Supervised
Active learning
Concept visualization
Methods that help in understanding the discovered concepts:
Keyword extraction
TFIDF and SVM-normal based keyword extraction
Concept visualization
LSI and multi-dimensional scaling based visualization
Also available as a separate tool named Document Atlas: http://docatlas.ijs.si
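TFIDF based keyword extraction for a concept can be sketched as follows. This is a simplified illustration, not OntoGen's implementation: it assumes whitespace tokenisation, that the concept's documents are part of the full collection, and a plain `tf * log(N / df)` weighting. The SVM-normal based variant is not shown.

```python
import math
from collections import Counter


def tfidf_keywords(concept_docs, all_docs, top_n=3):
    """Rank the words of a concept by term frequency times inverse document frequency.

    Assumes every document in `concept_docs` also appears in `all_docs`.
    """
    n_docs = len(all_docs)
    # Document frequency: in how many documents of the collection each word occurs.
    df = Counter()
    for doc in all_docs:
        df.update(set(doc.lower().split()))
    # Term frequency inside the concept's own documents.
    tf = Counter()
    for doc in concept_docs:
        tf.update(doc.lower().split())
    scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Made-up collection; the first two documents form a hypothetical concept.
all_docs = [
    "bank raises interest rates",
    "bank reports profit as interest income grows",
    "team wins the match",
    "match ends in a draw",
    "fans celebrate as team wins",
]
keywords = tfidf_keywords(all_docs[:2], all_docs, top_n=2)
```

Words that occur in every document of the collection get a score of zero, so they never surface as concept keywords.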
Ontology management
Screenshot labels: concept hierarchy, list of suggested sub-concepts, ontology visualization, selected concept
Concept management
Screenshot labels: concept’s details, concept’s instance management, selected concept, keywords, selected instance
Active Learning for concept learning
SVM hyperplane distance based active learning algorithm
First few labelled documents are bootstrapped from a query search
Instances for final concept are selected using the final SVM model
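The selection step of hyperplane-distance based active learning can be sketched as below: the next document shown to the user is the one the current linear model is least certain about. This is a minimal sketch; training the SVM is out of scope, and the weights, bias and document vectors are hypothetical values.

```python
def decision_value(weights, bias, x):
    """Signed score w.x + b of a linear model; its magnitude grows with distance from the hyperplane."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def next_query(weights, bias, unlabelled):
    """Pick the unlabelled instance closest to the hyperplane (the least certain one)."""
    return min(unlabelled, key=lambda name: abs(decision_value(weights, bias, unlabelled[name])))


# Hypothetical current model and unlabelled pool (not taken from the slides).
weights, bias = [1.0, -1.0], 0.0
unlabelled = {
    "doc_a": [0.9, 0.1],    # far on the positive side
    "doc_b": [0.5, 0.45],   # almost on the hyperplane
    "doc_c": [0.1, 0.8],    # far on the negative side
}
to_label = next_query(weights, bias, unlabelled)
```

In a full loop the user's label for `to_label` would be added to the training set and the model retrained before the next query.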
Query
New Concept
Multiple views of the same data
Reuters news articles are used in the example, with two different sets of categories: topics, or the list of countries that appear in the news articles.
Each set of categories offers a different view on the data.
SVM based method detects importance of keywords for each view.
Views shown: Topics view, Countries view
Example articles:
UK takeovers and mergers: The following are additions and deletions to the takeovers and mergers list for the week beginning August 19, as provided by the Takeover …
Lloyd’s CEO questioned in recovery suit in U.S.: Ronald Sandler, chief executive of Lloyd's of London, on Tuesday underwent a second day of court interrogation about …
Concept’s instances visualization
Instances are visualized as points on a 2D map. The distance between two instances on the map corresponds to their similarity.
Characteristic keywords are shown for all parts of the map.
User can select groups of instances on the map to create sub-concepts.
Ontology population
System uses one vs. all linear SVM trained on the created ontology to classify new instances into concepts.
Users can finalize the classifications using an interactive user interface.
Screenshot labels: new documents, selected document, classification of selected document
ONTOGEN ON IMAGES
Nenad Tomašev, Blaz Fortuna, Dunja Mladenić, Marko Grobelnik
Image representation
Diagram labels: SIFT features, color info, text; extract features; data mining application
Image representation - features
SIFT features
Rotation, scale and translation invariant orientation gradients located at “interesting” points on an image
Usually, SIFT feature space is quantized to get “representative” vectors (“codebook” histogram)
Color histogram
Simply divide the color spectrum into “buckets” and calculate the distribution of colors into these buckets (color histogram)
Distance - weighted sum of SIFT codebook and color data distances
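The weighted-sum distance can be sketched as follows. Several details are assumptions made for illustration: histograms are normalised, both component distances are Euclidean, the weight defaults to 0.5, and the bucketing works on grey-level values rather than a full color spectrum. SIFT extraction and codebook quantization are not shown; the codebook histograms are given as plain lists.

```python
import math


def normalise(hist):
    """Scale a histogram so its entries sum to 1."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def color_histogram(pixels, buckets=4):
    """Count grey-level pixel values (0-255) into equally wide buckets."""
    hist = [0] * buckets
    for value in pixels:
        hist[min(value * buckets // 256, buckets - 1)] += 1
    return hist


def image_distance(img_a, img_b, sift_weight=0.5):
    """Weighted sum of the SIFT codebook distance and the color histogram distance."""
    d_sift = euclidean(normalise(img_a["sift"]), normalise(img_b["sift"]))
    d_color = euclidean(normalise(img_a["color"]), normalise(img_b["color"]))
    return sift_weight * d_sift + (1.0 - sift_weight) * d_color


# Hypothetical images: codebook histograms and pixel values are made up.
flower = {"sift": [4, 0, 1], "color": color_histogram([250, 240, 10, 245])}
fire = {"sift": [0, 5, 0], "color": color_histogram([200, 220, 230, 60])}
d = image_distance(flower, fire)
```

Raising `sift_weight` makes shape-like local features dominate the grouping, lowering it favours color.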
OntoGen on ImageNet subset (flowers, fire, buildings)
Document list for quick overview
Collection visualization (without displaying images)
Collection visualization (displaying images)
Creating ontology on images
Grouping similar images - concepts
Displaying relevant features as concept names
Sub-concept visualization: flower, buildings, fire
Adding sub-concepts
TEXT-DRIVEN ONTOLOGY EXTENSION
Inna Novalija, Dunja Mladenić
OntoPlus Architecture
OntoPlus methodology allows for the effective extension of very large ontologies.
OntoPlus methodology provides the user with required concepts and relationships in the form of a ranked list.
OntoPlus methodology combines textual ontology content, ontology structure and co-occurrence information.
Architecture diagram, components and labels:
Domain Information Module (DIM): Domain Keywords; Domain Glossary (Term Names; Term Descriptions)
Domain Subset Extraction Module (DSEM): Upper-Level Domain Extractor (extraction of relevant domains); Domain Knowledge Extractor (extraction of ontology concepts defined in relevant domains; extraction of ontology concepts with denotation similar to Glossary Term names); Relevant Ontology Subset
Ontology Extension Module (OEM): Ontology Extender; Candidate Entries (for each Glossary Term, a ranked list of related ontology concepts and correspondent relations suggested); Validated Entries (Glossary Term, Ontology Concept, Relation)
Multi-Domain Ontology; Domain KB
Process steps: domain information identification; extraction of the domain relevant ontology subset; related concepts extraction; user validation; ontology reuse
OntoPlus
Text-Driven Ontology Extension Using Ontology Content, Structure and Co-occurrence
Ranking existing ontology concepts as corresponding to a new domain concept suggested for the ontology extension
Experiments using Cyc ontology and textual material from two domains – Finances, and Fisheries & Aquaculture
Best results by combining content, structure and co-occurrence information
Financial domain - ontology content and structure
Fisheries & Aquaculture domain - ontology content and co-occurrence
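Ranking candidate concepts by a weighted combination of the three information sources can be sketched as below. This is an illustration under assumptions, not the OntoPlus implementation: content similarity is cosine over word counts, co-occurrence is Jaccard over sets of document ids, and the structure score is a hypothetical precomputed number in [0, 1], since the slides do not say how structure is measured. The default weights (0.5, 0.4, 0.1) are the ones reported in the results for the combined measure.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_concepts(term, concepts, weights=(0.5, 0.4, 0.1)):
    """Rank ontology concepts for a glossary term by a weighted sum of three scores."""
    w_content, w_structure, w_cooccur = weights
    scored = []
    for c in concepts:
        score = (w_content * cosine(term["words"], c["words"])
                 + w_structure * c["structure"]
                 + w_cooccur * jaccard(term["docs"], c["docs"]))
        scored.append((score, c["name"]))
    # Highest score first; ties broken by name for determinism.
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical glossary term and candidate concepts.
term = {"words": {"loan": 1, "interest": 1}, "docs": {1, 2, 3}}
concepts = [
    {"name": "Fish", "words": {"fish": 1}, "structure": 0.1, "docs": {9}},
    {"name": "Loan", "words": {"loan": 2, "debt": 1}, "structure": 0.8, "docs": {1, 2}},
]
ranking = rank_concepts(term, concepts)
```

Passing `weights=(0.5, 0.0, 0.5)` reproduces the content plus co-occurrence setting that the results report for the second domain.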
Results – Concept Ranking
Baseline - Name: string edit distance of concept name

Evaluation of the top suggested candidate concepts for ontology extension (Financial glossary), 100 Random Terms
Weighting Measure                                    HR (Top 1)                   HR (Top 5)                   HR (Top 10)
                                                     Eqv or Hier Rels  Any Rels   Eqv or Hier Rels  Any Rels   Eqv or Hier Rels  Any Rels
Baseline - Name: [1.0]                               18                28         24                36         25                40
Content (cos. similarity): [1.0]                     32                65         60                92         68                95
Co-occur (Jaccard similarity): [1.0]                 30                48         48                62         52                73
Content: [0.5], Structure: [0.4], Co-occur: [0.1]    38                68         66                95         76                98

Evaluation of the top suggested candidate concepts for ontology extension (ASFA thesaurus), 100 Random Terms
Weighting Measure                                    HR (Top 1)                   HR (Top 5)                   HR (Top 10)
                                                     Eqv or Hier Rels  Any Rels   Eqv or Hier Rels  Any Rels   Eqv or Hier Rels  Any Rels
Baseline - Name: [1.0]                               24                37         25                38         27                40
Content (cos. similarity): [1.0]                     32                72         52                88         56                91
Co-occur (Jaccard similarity): [1.0]                 33                71         49                89         51                90
Content: [0.5], Structure: [0.0], Co-occur: [0.5]    42                84         63                96         66                96
Demo
CONTEXT SENSITIVE SEARCH
Boštjan Pajntar, Marko Grobelnik, Dunja Mladenić
http://SearchPoint.ijs.si
SearchPoint
Search engines generally work very well
There are cases where it is difficult to specify a query
Idea: help the user by clustering all the hits and visualising the results space
Some related work:
mindset.research.yahoo.com – research vs. shopping aspect
www.ujiko.com – clustering & user interface
vivisimo.com – hierarchical clustering
Approach Description
Search results clustered and shown in 2D space
Each point in this cluster space corresponds to a ranking
Hits are ordered according to the position of the focus - the selected point
Initial focus position corresponds to Google ranking
Positioning clusters with respect to centroid to centroid similarity
Calculating ranking of document using its similarity to each centroid
Classifying documents into web directory (DMoz), visualising relevant parts of the directory
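How a focus point could turn into a ranking is sketched below. The exact formula is not given in the slides, so everything here is an assumption made for illustration: clusters are weighted by the inverse distance between the focus and their position on the 2D map, and a hit's score is the weighted sum of its similarities to the cluster centroids. The initial Google ordering is not modelled.

```python
import math


def cluster_weights(focus, cluster_positions):
    """Weight each cluster by closeness of the focus point to its position on the 2D map."""
    raw = {name: 1.0 / (1e-9 + math.dist(focus, pos)) for name, pos in cluster_positions.items()}
    total = sum(raw.values())
    return {name: w / total for name, w in raw.items()}


def rerank(hits, focus, cluster_positions):
    """Order hits by the weighted sum of their similarities to the cluster centroids."""
    weights = cluster_weights(focus, cluster_positions)

    def score(hit):
        return sum(weights[c] * sim for c, sim in hit["similarity"].items())

    return [h["title"] for h in sorted(hits, key=score, reverse=True)]


# Hypothetical clusters and hits for the query "jaguar".
positions = {"cars": (0.0, 0.0), "animals": (1.0, 0.0)}
hits = [
    {"title": "Jaguar XF review", "similarity": {"cars": 0.9, "animals": 0.1}},
    {"title": "Jaguar habitat", "similarity": {"cars": 0.1, "animals": 0.8}},
]
near_cars = rerank(hits, (0.1, 0.0), positions)
near_animals = rerank(hits, (0.9, 0.0), positions)
```

Because every position of the focus yields its own weights, every point of the map corresponds to one ranking, and moving the focus reorders the hits continuously.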
Search
“Internet search” – one of the most common tasks involving text manipulation in everyday life
…but – how smart is search technology today?
…not too smart!
It is sophisticated, but not smart
Example – Searching for “jaguar”
Query “jaguar” has many meanings…
…but the first page of search engines doesn’t provide us with many answers
…there are 84M more results
Context sensitive search
Diagram labels: query, conceptual map, search point
Dynamic contextual ranking based on the search point
SearchPoint
Main advantages
Generated clusters (in contrast to predefined)
User can search the whole cluster space and is not forced to select a single cluster
(Computer generated clusters are not necessarily what the user has in mind)
SearchPoint integrated in Accenture’s intranet search
ANSWER ART
Luka Bradeško, Lorand Dali, Blaž Fortuna, Marko Grobelnik, Dunja
Mladenić, Inna Novalija, Boštjan Pajntar
http://AnswerArt.net
AnswerArt – System Architecture
Diagram labels: Question, Answer, AnswerArt preprocessing, Extraction, Triplets, Semantic enhancement of triplets, Domain ontology (ASFA, WordNet), Extended ontology, Cyc, AnswerArt Index
AnswerArt using Medline
AnswerArt using Medline: show document
Show document overview
AnswerArt using ASFA
AnswerArt using ASFA: show document
AnswerArt using ASFA: show document overview
NATURAL LANGUAGE TEXT ENRICHMENT
Tadej Štajner, Delia Rusu, Lorand Dali, Blaž Fortuna,
Dunja Mladenić, Marko Grobelnik
http://enrycher.ijs.si
Enrycher Service
Annotation Features:
Entity extraction
People, locations, organizations, dates, percentages and money amounts
Entity resolution
co-reference
anaphora
Entity linkage to Linked Open Data (LOD)
Word Sense Disambiguation to LOD (WordNet 3.0 VUA)
Assertion extraction
Subject – predicate – object sentence elements together with their modifiers
Categories – from the Open Directory and the Wikipedia category schema
Entity resolution in text
Enrycher Service Dependencies
The dashed line marks dependencies between components that are optional,
whereas the filled lines mark required dependencies
A comparative view on five systems: Enrycher, Text Runner, Open Calais, GATE and Read the Web (NELL)
Features compared (columns: Enrycher, Text Runner, Open Calais, GATE, NELL):
Named Entity Extraction
Co-reference and Anaphora Resolution
Entity resolution
Disambiguation
Assertion Extraction (cell entries: Relationship extraction; Events and Facts; Relationship extraction)
Categories
Visualization
RDF Output
Multi-Language Support (cell entries: English; English, French, Spanish)
Web Service API
Can work on a single document
Enrycher - demo
Enrycher - demo: entities, semantic graph
Enrycher - demo: entity details in OpenCyc, category
OPINION MINING
Andreea Bizău, Delia Rusu, Dunja Mladenić
Opinion Mining
Use case: Twitter comments on movies
IMDb Movie reviews* (training data, sample)
Domain-specific opinion vocabulary, 2 clusters
Example vocabulary words: amazing, awesome; weird, odd; amazing, awesome, perfect, fantastic; weird, odd, bad
Vocabulary applied to Twitter comments analysis on movie tweets (test data)
* http://www.cs.cornell.edu/people/pabo/movie-review-data/
Twitter comments analysis
• Sentiment words distribution for a movie
• Sentiment orientation evolution per week, day, hour
• Movie comparison
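The first two analyses can be sketched with a simple vocabulary lookup. This is an illustration only: the two word sets reuse the example words from the opinion vocabulary slide, while the tweets, the dates and the scoring rule (positive minus negative word count per day) are assumptions rather than the method used in the study.

```python
from collections import Counter, defaultdict

# Two-cluster opinion vocabulary, using the example words shown in the talk.
POSITIVE = {"amazing", "awesome", "perfect", "fantastic"}
NEGATIVE = {"weird", "odd", "bad"}


def sentiment_words(text):
    """Opinion-vocabulary words found in one tweet."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return [w for w in words if w in POSITIVE or w in NEGATIVE]


def word_distribution(tweets):
    """How often each sentiment word occurs across the tweets about one movie."""
    counts = Counter()
    for _, text in tweets:
        counts.update(sentiment_words(text))
    return counts


def orientation_per_day(tweets):
    """Positive minus negative word count for each day that has sentiment words."""
    per_day = defaultdict(int)
    for day, text in tweets:
        for w in sentiment_words(text):
            per_day[day] += 1 if w in POSITIVE else -1
    return dict(per_day)


# Made-up tweets about one hypothetical movie: (day, text).
tweets = [
    ("2011-05-01", "Amazing movie, awesome cast!"),
    ("2011-05-01", "Weird plot."),
    ("2011-05-02", "Bad and odd ending"),
]
distribution = word_distribution(tweets)
orientation = orientation_per_day(tweets)
```

Grouping by week or hour only requires a different key in place of the day, and comparing movies amounts to running the same functions on each movie's tweets.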
SENSOR SEARCH
Lorand Dali, Alexandra Moraru, Dunja Mladenić
Sensor Search - Architecture
Search engine components: Sensor Descriptions (Text), Inverted Index, Ranking Model (Personalized PageRank), Geo Filtering
Query
• keywords
• center of area of interest
• radius of area of interest
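The keyword lookup and the geo filtering step can be sketched together. This is a minimal sketch under assumptions: sensors are plain dictionaries with a text description and a (latitude, longitude) pair, distances use the haversine formula, and all sensor data is made up. The Personalized PageRank ranking model is not shown; results are simply sorted by id.

```python
import math
from collections import defaultdict


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def build_index(sensors):
    """Inverted index: word -> set of sensor ids whose description contains it."""
    index = defaultdict(set)
    for sid, sensor in sensors.items():
        for word in sensor["description"].lower().split():
            index[word].add(sid)
    return index


def search(sensors, index, keywords, center, radius_km):
    """Sensors matching all keywords that lie within radius_km of the center."""
    matches = set(sensors)
    for word in keywords:
        matches &= index.get(word.lower(), set())
    return sorted(sid for sid in matches
                  if haversine_km(*center, *sensors[sid]["location"]) <= radius_km)


# Hypothetical sensors with approximate coordinates.
sensors = {
    "s1": {"description": "outdoor temperature sensor", "location": (46.05, 14.51)},
    "s2": {"description": "temperature and humidity sensor", "location": (46.55, 15.65)},
    "s3": {"description": "traffic camera", "location": (46.06, 14.50)},
}
index = build_index(sensors)
nearby = search(sensors, index, ["temperature"], (46.05, 14.51), 20)
wider = search(sensors, index, ["temperature"], (46.05, 14.51), 200)
```

The three query parts listed above map directly onto the `keywords`, `center` and `radius_km` arguments.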
REAL-TIME INFORMATION PROCESSING
Blaz Fortuna, Dunja Mladenić, Marko Grobelnik
QMiner – generic software platform for Real-Time information processing & Complex Event Detection & Anomaly Detection
Generic platform running on clouds for intensive data stream analytics
…processes thousands of events per second
…includes state of the art data/text/web/stream-mining algorithms
Deployed in British Telecom, NYTimes, Bloomberg, Microsoft, TheStreet.com, … ongoing work with Google News, Telefonica, Wikipedia
Diagram labels: Capture Reality (Events): sensors, alarms, user logs, …; Transform & Enrich; Model; Anomaly detection; Complex events detection; Analytics: Prediction, Segment, Visualization
Network Monitoring for British Telecom
Diagram labels: British Telecom Network (~25 000 devices); Alarms ~10-100/sec; Alarms Server; Live feed of data; Alarms Explorer Server; Operator Big board display
Alarms Explorer Server implements three real-time scenarios on the alarms stream:
1. Root-Cause-Analysis – finding which device is responsible for an occasional “flood” of alarms
2. Short-Term Fault Prediction – predict which device will fail in the next 15 minutes
3. Long-Term Anomaly Detection – detect unusual trends in the network
…system is used in British Telecom
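A very rough stand-in for the first scenario is sketched below: keep the alarms of a sliding time window and, when the window holds a flood, report the device that raised most of them. This is not the deployed root-cause analysis, whose method the slides do not describe; the class name, window length, threshold and device names are all hypothetical.

```python
from collections import Counter, deque


class AlarmWindow:
    """Keeps the alarms of the last `window_s` seconds and counts them per device."""

    def __init__(self, window_s=60.0):
        self.window_s = window_s
        self.events = deque()        # (timestamp, device), oldest first
        self.per_device = Counter()

    def add(self, timestamp, device):
        self.events.append((timestamp, device))
        self.per_device[device] += 1
        # Drop alarms that fell out of the time window.
        while self.events and self.events[0][0] <= timestamp - self.window_s:
            _, old_device = self.events.popleft()
            self.per_device[old_device] -= 1
            if self.per_device[old_device] == 0:
                del self.per_device[old_device]

    def flood_suspect(self, threshold):
        """Device with the most alarms in the window, or None if there is no flood."""
        if len(self.events) < threshold:
            return None
        return self.per_device.most_common(1)[0][0]


# Made-up alarm stream: (timestamp in seconds, device name).
window = AlarmWindow(window_s=10.0)
for t, dev in [(0, "router-1"), (1, "switch-7"), (2, "switch-7"), (3, "switch-7")]:
    window.add(t, dev)
suspect = window.flood_suspect(threshold=4)

window.add(20, "router-2")           # much later: the old alarms expire
after_quiet = window.flood_suspect(threshold=4)
```

Each alarm is appended and removed once, so the per-alarm cost stays constant, which matters at rates of tens to hundreds of alarms per second.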
Visualizing root-cause and prediction
Screenshot labels: root-cause, prediction
How Well Are We Predicting
Chart: Percentage Realisation of Predictions (y-axis: percentage, 0% to 90%; x-axis: minutes; labelled values: 86%, 80%, 60%)
User Modeling for NYTimes & Bloomberg
Diagram labels: Log Files (~100M page clicks per day); NYT articles; User profiles; Stream of profiles; Stream of clicks; Trend Detection System; Trends and updated segments; Segments; Campaign to sell segments; Advertisers; Sales ($)
Segment: Keywords
Stock Market: Stock Market, mortgage, banking, investors, Wall Street, turmoil, New York Stock Exchange
Health: diabetes, heart disease, disease, heart, illness
Green Energy: Hybrid cars, energy, power, model, carbonated, fuel, bulbs,
Hybrid cars: Hybrid cars, vehicles, model, engines, diesel
Travel: travel, wine, opening, tickets, hotel, sites, cars, search, restaurant
…: …
Generalizing from registered users
Chart: BEP for Age (20% = random)
Chart: BEP for Gender on users with at least 10 visits (50% = random)
Feature sets on the axis: Context, Text Features, Named Entities, All Meta Data, All Content, All Features
Legends: Male, Female; ≥2, ≥10, ≥50
Using User Modeling for News Recommendations
Good recommendations can make a big difference when keeping a user on a web site
…the key is how rich a context model a system is using to select information for a user
With bad recommendations <1% of users click, with good ones >5% of users click
Contextual personalized recommendations generated in ~20ms
Recommendation Features:
History (user profile)
Geo (based on IP)
Requested page (where we serve recommendation)
Referring URL
Time
Diagram labels: time, now, training; US, Finance, Oil

Regular (visits > 50)
                 All   History   Context   Geo   Requested   Referring   Time
Top1 Recall      66    65        65        65    66          60          60
Top2 Recall      81    78        78        75    78          67          67
Top3 Recall      86    83        83        79    81          72          72
Top Precision    52    48        49        43    41          36          36

New (first visit)
                 Context   Geo   Requested   Referring   Time
Top1 Recall      60        58    46          60          60
Top2 Recall      77        70    61          71          71
Top3 Recall      85        77    72          78          78
Top Precision    45        36    35          37          37
Real-time Architecture
Diagram labels: Logging, Collaborative Filter, SVM, Archive, Web, Amazon, Crawl
Results
Chart: News Personalization Test Page-Story Page Transition Probabilities (y-axis: 0% to 7%; x-axis: 17 Apr, 24 Apr, 1 May, 8 May, 15 May)
Legend: Control JSI SVM Random JSI CF DailyMe Personalized Most Popular Contextual Competitor
http://log3.quintelligence.com/test/rec/test-svmcfni.html
PERSONALIZED NEWS SEARCH
Lorand Dali, Blaž Fortuna
Personalized News Search – System Architecture
Diagram labels: Search Logs, Learning to Rank, Ranking Model
Query: keywords
User: age, country, gender, income, industry, job
User: Young female computer programmer
Query: Religion
User: Middle aged male clergy
Query: Religion
Videolectures.net: 562 events, 8169 authors, 10539 lectures, 12859 videos
Montreal @ Video Lectures