Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides


Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides - UMAP 2012 Conference (User Modeling, Adaptation and Personalization) - Montreal (Canada) - Joint work of University of Bari and Philips Research, presented at the Industrial Track of the conference


UMAP 2012 - Industrial Track Montréal (Canada), 19.07.2012

Enhanced Semantic TV-Show Representation for Personalized Electronic Program Guides

Cataldo Musto, Fedelucio Narducci, Pasquale Lops, Giovanni Semeraro, Marco de Gemmis (University of Bari, Aldo Moro)

Mauro Barbieri, Jan Korst, Verus Pronk and Ramon Clout (Philips Research, Eindhoven, The Netherlands)

exponential growth of available TV assets

C. Musto, F. Narducci, G. Semeraro, P. Lops, M. de Gemmis, M. Barbieri, J. Korst, V. Pronk, R. Clout - Enhanced Semantic TV-Shows Representation for Personalized Electronic Program Guides - UMAP 2012 - 19.07.12

Some stats

4 hours watched every day

out of 3000 hours of broadcast TV shows

0.013% ratio

source: Nielsen Survey, 2011


Information Overload

what TV shows should I watch?

industrial scenario

how does Philips cope with the overload of TV shows?

solution: personalization.

recommender systems

content-based recommenders: key concepts

• Each item (TV show) has to be described through a set of features

• For example, the description of the TV show, the plot of a movie and so on

• Each user is described through the features that occur in the TV shows she watched (liked) in the past

• Recommendations are provided by calculating the overlap between the textual description of the TV show and the features stored in the user profile
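The overlap idea in the bullets above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the toy shows and the scoring rule (share of a show's features found in the profile) are hypothetical and are not taken from the slides.

```python
def build_profile(liked_shows):
    """Collect the features of every show the user liked into one set."""
    profile = set()
    for features in liked_shows:
        profile |= set(features)
    return profile


def overlap_score(profile, show_features):
    """Fraction of the show's features that also occur in the profile."""
    show_features = set(show_features)
    if not show_features:
        return 0.0
    return len(profile & show_features) / len(show_features)


def recommend(profile, candidates, top_k=3):
    """Rank candidate shows (name -> features) by overlap with the profile."""
    scored = [(overlap_score(profile, f), name) for name, f in candidates.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Shows with no shared feature are never recommended.
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical toy data: features of shows the user liked in the past.
liked = [["basketball", "nba", "playoffs"], ["football", "league"]]
profile = build_profile(liked)
candidates = {
    "NBA Finals": ["basketball", "nba", "finals"],
    "Nature Documentary": ["documentary", "wildlife"],
    "Football Night": ["football", "highlights"],
}
ranking = recommend(profile, candidates)
print(ranking)  # ['NBA Finals', 'Football Night']
```

A real system would typically weight features (for instance with TF-IDF) rather than count plain set overlap; the set-based score is used here only to keep the sketch short.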


content-based recommenders: example of TV show recommendations

Figure: user profile (with ♥ and X marks) and recommendations over the items documentary, basketball, football and nba (basketball).

‘in vitro’ experiments: personal channels concept

Idea: combine boolean filters, which select TV shows, with recommenders, which rank them.
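A minimal sketch of the filter-then-rank idea, assuming each show is a dictionary and that a recommender has already attached a relevance score to it. The field names (`type`, `language`, `score`) and the sample shows are hypothetical, chosen only for illustration.

```python
def personal_channel(shows, filters, score):
    """Keep the shows that pass every boolean filter, then rank them by score."""
    kept = [show for show in shows if all(f(show) for f in filters)]
    return sorted(kept, key=score, reverse=True)


# Hypothetical shows; "score" stands in for the output of a recommender.
shows = [
    {"title": "Show A", "type": "sport", "language": "de", "score": 0.40},
    {"title": "Show B", "type": "movie", "language": "de", "score": 0.90},
    {"title": "Show C", "type": "sport", "language": "de", "score": 0.80},
    {"title": "Show D", "type": "sport", "language": "en", "score": 0.95},
]

# Boolean filters defining the channel: German-language sport shows only.
filters = [
    lambda show: show["type"] == "sport",
    lambda show: show["language"] == "de",
]

channel = personal_channel(shows, filters, score=lambda show: show["score"])
print([show["title"] for show in channel])  # ['Show C', 'Show A']
```

The filters decide what belongs to the channel at all, while the recommender only orders what survives, so a highly scored show that fails a filter (Show D here) never appears.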

‘in vitro’ experiments

Watchmi plug-in, developed by Aprico.tv

problem

Descriptions of TV shows are often too short or too uninformative to feed a content-based recommendation algorithm.

solution: feature generation techniques based on open knowledge sources

explicit semantic analysis

• Explicit Semantic Analysis (ESA) (Gabrilovich and Markovitch, 2006)

• Goals

• To introduce a methodology for representing the knowledge stored in Wikipedia

• To define a relationship between terms in natural language and Wikipedia articles

• Insights

• ESA provides a vector-space representation for each term

• Terms are represented as rows in a matrix (called ESA matrix) where each column is a Wikipedia concept (article)

ESA representation: term/document matrix

Figure: matrix with the terms t1 to t4 as rows and the documents a1 to a9 as columns; check marks flag the documents in which each term occurs.

ESA representation: term/Wikipedia articles matrix

Figure: the same matrix, where each column is a Wikipedia article (column a3 is the article MotoGP).

ESA representation

Every Wikipedia article is a concept

Each concept is represented through the TF-IDF scores of the terms that occur in the article

Panthera: Cat [0.92], Leopard [0.84], Roar [0.77]

MotoGP: Superbike (0.92), grand prix (0.76), valentino rossi (0.59)

ESA representation: term/Wikipedia articles matrix

Figure: matrix with the terms Superbike, t2, t3 and t4 as rows and the Wikipedia articles Politics, MotoGP, Basketball, M. Biaggi and V. Rossi as columns; check marks flag the articles in which each term occurs.

ESA representation

Each term can be defined upon the Wikipedia concepts it occurs in

The whole vector is called Semantic Interpretation Vector

“the semantics of a term is the vector of its associations with Wikipedia articles”

Cat: Cat [0.95], Jane Fonda [0.07], Panthera [0.92]

Superbike: MotoGP (0.92), Bridgestone (0.43), Max Biaggi (0.63)
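The two views described above (a concept as TF-IDF-weighted terms, a term as a vector of concepts) can be sketched as follows. This is a toy illustration, not the authors' implementation: the three "articles" are invented, and the TF-IDF variant (length-normalized term frequency times log inverse document frequency) is an assumption, since the slides do not state the exact weighting.

```python
import math
from collections import Counter, defaultdict


def concept_vectors(articles):
    """Map each article title to {term: TF-IDF weight}."""
    n = len(articles)
    doc_freq = Counter()
    for text in articles.values():
        doc_freq.update(set(text.lower().split()))
    vectors = {}
    for title, text in articles.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values())
        vectors[title] = {
            term: (count / total) * math.log(n / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def interpretation_vectors(concepts):
    """Invert the concept vectors: each term maps to {article: weight}."""
    inverted = defaultdict(dict)
    for title, weights in concepts.items():
        for term, weight in weights.items():
            if weight > 0:  # terms found in every article carry no signal
                inverted[term][title] = weight
    return dict(inverted)


# Hypothetical miniature "Wikipedia" with three articles.
articles = {
    "MotoGP": "superbike grand prix rossi race",
    "Max Biaggi": "superbike rider biaggi race",
    "Basketball": "nba basketball game",
}
esa = interpretation_vectors(concept_vectors(articles))
print(sorted(esa["superbike"]))  # ['Max Biaggi', 'MotoGP']
```

Each entry of `esa` plays the role of one row of the ESA matrix: the semantic interpretation vector of a term, stored sparsely as a dictionary.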

ESA representation: semantics of text fragments

button: Dick Button [0.84], Button [0.93], Game Controller [0.32], Mouse (computing) [0.81]

mouse: Mouse (computing) [0.89], Mouse (rodent) [0.91], John Steinbeck [0.17], Mickey Mouse [0.81]

mouse button: Drag-and-drop [0.32], Mouse (rodent) [0.46], Mouse (computing) [0.85], IBM PS/2 [0.35]

The semantics of a text fragment is calculated as the centroid vector of the semantic interpretation vectors of the terms that compose the fragment
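The centroid computation can be sketched with sparse vectors stored as dictionaries. The function name is hypothetical; the weights for "button" and "mouse" are the ones shown on the slide.

```python
from collections import defaultdict


def centroid(term_vectors):
    """Average a list of sparse vectors, each given as {concept: weight}."""
    if not term_vectors:
        return {}
    sums = defaultdict(float)
    for vector in term_vectors:
        for concept, weight in vector.items():
            sums[concept] += weight
    return {concept: total / len(term_vectors) for concept, total in sums.items()}


button = {
    "Dick Button": 0.84,
    "Button": 0.93,
    "Game Controller": 0.32,
    "Mouse (computing)": 0.81,
}
mouse = {
    "Mouse (computing)": 0.89,
    "Mouse (rodent)": 0.91,
    "John Steinbeck": 0.17,
    "Mickey Mouse": 0.81,
}

fragment = centroid([mouse, button])
best = max(fragment, key=fragment.get)
print(best)  # Mouse (computing)
```

Averaging the two term vectors gives about 0.85 for Mouse (computing) and about 0.46 for Mouse (rodent), in line with the "mouse button" values on the slide: the concept shared by both terms dominates, which is how the centroid disambiguates the fragment.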

Research Question

How can we exploit ESA for performing feature generation in the scenario of EPG personalization?

ESA has already been adopted for text classification, information retrieval and semantic relatedness computation

From BOW to eBOW

Given a description of a TV show, we exploit ESA to obtain an enhanced representation

The original set of features is enriched with the set of Wikipedia articles most related to the TV show

From BOW to eBOW: algorithm

The centroid vector of the whole description of the TV show is calculated

The n most related Wikipedia concepts are extracted

Concepts are added to the original BOW to obtain an enhanced BOW (e-BOW)

Figure: the BOW is mapped to its centroid vector, whose top concepts are Concept 1 [0.85], Concept 47 [0.46], Concept 50 [0.35], ..., Concept n [0.32]
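The three steps of the algorithm can be sketched as one function. This is a minimal sketch under stated assumptions: `esa` is a hypothetical term-to-concepts dictionary, the "superbike" weights are those shown earlier in the slides, and the "prix" weights are invented for the example.

```python
def enhance_bow(bow, esa, n=3):
    """Return the BOW plus the n Wikipedia concepts closest to its centroid."""
    known = [term for term in bow if term in esa]
    totals = {}
    for term in known:
        for concept, weight in esa[term].items():
            totals[concept] = totals.get(concept, 0.0) + weight
    # Step 1: centroid of the semantic interpretation vectors of the terms.
    centroid = {concept: total / len(known) for concept, total in totals.items()}
    # Step 2: the n most related concepts (ties broken alphabetically).
    top = sorted(centroid, key=lambda c: (-centroid[c], c))[:n]
    # Step 3: append them to the original bag of words.
    return list(bow) + [concept for concept in top if concept not in bow]


# Hypothetical ESA rows; only the "superbike" weights come from the slides.
esa = {
    "superbike": {"MotoGP": 0.92, "Max Biaggi": 0.63, "Bridgestone": 0.43},
    "prix": {"MotoGP": 0.76, "Formula 1": 0.70},
}
bow = ["2012", "superbike", "italian", "grand", "prix"]
ebow = enhance_bow(bow, esa, n=2)
print(ebow)
# ['2012', 'superbike', 'italian', 'grand', 'prix', 'MotoGP', 'Formula 1']
```

Terms missing from the ESA matrix are simply skipped, so a description with no known term is returned unchanged.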

From BOW to eBOW: example

TV SHOW: Rad an Rad - Die besten Duelle der MotoGP (Wheel to wheel - The best duels in the MotoGP)

Wikipedia Articles: großer preis von italien (motorrad), großer preis von malaysia (motorrad), großer preis von tschechien (motorrad), scuderia ferrari, valentino rossi, motorrad-wm-saison 2005, motorrad-wm-saison 2006, max biaggi, großer preis der usa (motorrad), motorrad-wm-saison 2008, rad (heraldik), loris capirossi, shin’ya nakano, motogp

what about the advantages?


example: BOW representation

user profile: motogp, sports, motorbike, ..., competition

tv show: 2012 Superbike Italian Grand Prix

X No matching!


example: eBOW representation

user profile: motogp, superbike, sports, motorbike, formula 1, ..., competition

tv show: 2012 Superbike Italian Grand Prix

✔ Matching!


ESA advantages

Knowledge is fluid: it is necessary to exploit open and constantly updated knowledge sources.

concept: example ‘American Politics’

Year    Enrichment
2000    Clinton
2005    Bush
2011    Obama

concept: (counter)example ‘Italian Politics’

Year    Enrichment
2000    Berlusconi
2005    Berlusconi
2011    Berlusconi

experiments.


design of the experiments: task

• Retrieval task

• Given a set of program types and a repository of TV shows

• We want to retrieve the shows that belong to a specific program type (e.g., Movie)

dataset: Aprico.tv data

• Dataset

• 47 German-language channels provided by Axel Springer

• 133k TV shows, 17 program types

• Textual features: title, synopsis, description, program type

• Explicit Semantic Analysis

• Wikipedia dump: October 2010

• 814,013 terms (rows) and 484,218 articles (columns)

design of the experiments: learning methods

• Two state-of-the-art learning methods have been compared

• Random Indexing

• Vector Space Model (VSM)-based representation

• Incremental approach to compress the representation in an effective way

• Both TV shows and user profile are points in a vector space

• Logistic Regression

• Supervised learning method, state of the art for text classification

• Each TV show is classified as relevant or not relevant for the user, according to the user profile

• TV shows can be ranked according to their probability scores
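The Random Indexing bullets can be illustrated with one common formulation of the technique: every term gets a fixed sparse random vector, and a show or a profile is the sum of the vectors of its terms. This is a sketch of the general idea, not the configuration used in the experiments; the dimension, the number of non-zero entries, the seeding scheme and the sample terms are all assumptions.

```python
import math
import random


def index_vector(term, dim=64, nonzero=4):
    """Sparse ternary vector for a term: a few +1/-1 entries, the rest 0."""
    rng = random.Random(term)  # seeding with the term makes it reproducible
    vector = [0] * dim
    for position in rng.sample(range(dim), nonzero):
        vector[position] = rng.choice((-1, 1))
    return vector


def embed(terms, dim=64):
    """Sum the index vectors of the terms: a compressed bag of words."""
    total = [0] * dim
    for term in terms:
        for i, value in enumerate(index_vector(term, dim)):
            total[i] += value
    return total


def cosine(a, b):
    """Cosine similarity between two dense vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Profile and show live in the same reduced space and are compared directly.
profile = embed(["motogp", "sports", "motorbike"])
show = embed(["superbike", "motorbike", "race"])
score = cosine(profile, show)
```

The representation is incremental because adding a new term or a new show only requires summing one more index vector; nothing has to be recomputed over the whole collection.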

design of the experiments: research questions

1. Which learning method provides the best recommendations?

2. Does enriching the BOWs with ESA improve the accuracy of the suggestions?

experiment 1: results

Figure: precision (y-axis from 50 to 100) at P@5%, P@10%, P@25%, P@50%, P@75% and P@100% for Logistic Regression and Random Indexing.


experiment 2: results

Figure: precision (y-axis from 74 to 95) at P@5%, P@10%, P@25%, P@50%, P@75% and P@100% for BOW, eBOW (+20), eBOW (+40) and eBOW (+60).

Differences between BOW and eBOW (+40, +60) are statistically significant (Mann-Whitney test, p < 0.005)
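The P@5% to P@100% labels on the result charts are not defined in the slides. One plausible reading, assumed here, is precision computed over the top x% of the ranked list. Under that assumption the metric can be sketched as follows; the function name and the sample ranking are hypothetical.

```python
import math


def precision_at_percent(ranked_relevance, percent):
    """Precision over the top `percent`% of a ranked list of 0/1 labels."""
    if not ranked_relevance:
        return 0.0
    # Always evaluate at least one item, even for very small percentages.
    cutoff = max(1, math.ceil(len(ranked_relevance) * percent / 100))
    top = ranked_relevance[:cutoff]
    return sum(top) / len(top)


# Hypothetical ranking: 1 = show of the requested program type, 0 = other.
ranked = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(precision_at_percent(ranked, 10))   # 1.0
print(precision_at_percent(ranked, 50))   # 0.6
print(precision_at_percent(ranked, 100))  # 0.5
```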


Recap

• Content-based Personalization Techniques for Electronic Program Guides

• Joint work: Philips Research - Aprico.tv - University of Bari

• Feature generation to enrich textual descriptions of TV shows

• Exploitation of ESA: Explicit Semantic Analysis

• Introducing eBOW for content representation

• BOW + Wikipedia concepts related to the textual description

Conclusions

• Logistic Regression can provide good accuracy in retrieving related TV shows

• Almost 90% in precision.

• Feature generation techniques based on Wikipedia can improve the precision of a content-based recommendation approach

• eBOW representation outperforms the classical BOW representation

• Good results: 94% in precision

questions?
