Hierarchical Relational Models for Document Networks. Jonathan Chang and David Blei (Facebook and Princeton University). The Annals of Applied Statistics, 2010. Presented by Haojun Chen. Images and some text are from the original paper.


Page 1: Hierarchical Relational Models for Document Networks

Hierarchical Relational Models for Document Networks

Jonathan Chang and David Blei

Facebook and Princeton University

The Annals of Applied Statistics, 2010

Presented by Haojun Chen

Images and some text are from the original paper.

Page 2: Hierarchical Relational Models for Document Networks

Introduction

• Network data have attracted considerable research interest in machine learning and applied statistics.

• Previous work focused only on the network structure and ignored the attributes of the nodes.

For example, in a citation network of articles, the text and abstracts of the documents should also be used to uncover the latent structure in the data.

• In this paper, the Relational Topic Model (RTM) is developed for network data; it accounts for both links and node attributes.

Page 3: Hierarchical Relational Models for Document Networks

Data Example for RTM

Page 4: Hierarchical Relational Models for Document Networks

Graphical Model for RTM

Page 5: Hierarchical Relational Models for Document Networks

Generative Process for RTM
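
The figure for this slide is not reproduced here. As a rough illustration of the process described in the original paper (LDA-style word generation for each document, followed by a binary link indicator for each pair of documents), the following sketch may help. The function names, the parameter names, and the choice of the exponential link function are assumptions made for this example, not details taken from the slide.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def generate_corpus(num_docs, doc_len, alpha, beta, eta, nu, seed=0):
    """Sketch of an RTM-style generative process (all names are illustrative).

    beta[k] is topic k's distribution over the vocabulary. eta and nu
    parameterize an exponential link function, one of several possible choices.
    """
    rng = random.Random(seed)
    num_topics = len(alpha)
    vocab = range(len(beta[0]))
    docs, zbars = [], []
    for _ in range(num_docs):
        theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
        z = rng.choices(range(num_topics), weights=theta, k=doc_len)
        words = [rng.choices(vocab, weights=beta[k])[0] for k in z]
        docs.append(words)
        # Empirical topic frequencies of the document (z-bar).
        zbars.append([z.count(k) / doc_len for k in range(num_topics)])
    links = {}
    for d in range(num_docs):
        for e in range(d + 1, num_docs):
            score = sum(h * a * b for h, a, b in zip(eta, zbars[d], zbars[e]))
            p = min(1.0, math.exp(score + nu))  # clipped so p stays a probability
            links[(d, e)] = 1 if rng.random() < p else 0
    return docs, zbars, links
```

Because the link indicator depends on the topic assignments that also generated the words, documents with similar topic usage are more likely to be linked, which is the coupling between text and network structure that the model relies on.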

Page 6: Hierarchical Relational Models for Document Networks

Link Probability Function

• Four link probability functions:

Φ: CDF of the Normal distribution

∘: Hadamard (element-wise) product
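
The formulas on this slide were images and are not reproduced in the text. The sketch below shows the four link probability functions as recalled from the original paper (sigmoid, exponential, probit, and a Normal-style function of the squared difference); it should be checked against the paper before being relied on. Here `zd` and `ze` stand for the mean topic assignments of two documents, and `eta` and `nu` for the link-function parameters; all function names are made up for this example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def hadamard(a, b):
    """Element-wise (Hadamard) product of two equal-length vectors."""
    return [x * y for x, y in zip(a, b)]


def normal_cdf(x):
    """CDF of the standard Normal distribution, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def psi_sigmoid(zd, ze, eta, nu):
    return 1.0 / (1.0 + math.exp(-(dot(eta, hadamard(zd, ze)) + nu)))


def psi_exp(zd, ze, eta, nu):
    # Only a valid probability if eta and nu keep the exponent at or below 0.
    return math.exp(dot(eta, hadamard(zd, ze)) + nu)


def psi_probit(zd, ze, eta, nu):
    return normal_cdf(dot(eta, hadamard(zd, ze)) + nu)


def psi_normal(zd, ze, eta, nu):
    # Depends on the element-wise squared difference rather than the product.
    diff = [a - b for a, b in zip(zd, ze)]
    return math.exp(-dot(eta, hadamard(diff, diff)) - nu)
```

The first three functions grow with the weighted overlap between the two topic vectors, while the last one decays with their weighted squared distance.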

Page 7: Hierarchical Relational Models for Document Networks

Model Inference, Estimation and Prediction

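• Variational inference for the latent topic proportions θ and topic assignments z

• Maximum likelihood estimates for the topics β and the link-function parameters η and ν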

• Prediction

– Link prediction from words

– Word prediction from links
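
As a simplified illustration of link prediction from words: once topic proportions have been inferred for a new document from its text alone, candidate documents can be ranked by a link score. The sketch below uses the linear score inside the link function, with estimated topic proportions standing in for the variational expectations used in the paper. The function name and the toy inputs are hypothetical.

```python
def rank_link_candidates(query, candidates, eta, nu=0.0):
    """Rank candidate documents by eta^T (query * candidate) + nu, best first.

    query: estimated topic proportions of the new document.
    candidates: dict mapping a document id to its estimated topic proportions.
    """
    def score(doc_id):
        props = candidates[doc_id]
        return sum(h * q * c for h, q, c in zip(eta, query, props)) + nu

    return sorted(candidates, key=score, reverse=True)


# Toy usage: the query shares its dominant topic with "a" but not with "b".
ranking = rank_link_candidates(
    [0.9, 0.1],
    {"a": [0.8, 0.2], "b": [0.1, 0.9]},
    eta=[3.0, 3.0],
)
```

Any monotone link function applied to this score gives the same ranking, so the score alone is enough when only the order of the suggestions matters.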

Page 8: Hierarchical Relational Models for Document Networks

Empirical Results

• Data summary

• Three experiments

– Evaluating the predictive distribution

– Automatic link suggestion

– Modeling spatial data

Page 9: Hierarchical Relational Models for Document Networks

Evaluating Predictive Distribution (1/2)

Lower is Better

Page 10: Hierarchical Relational Models for Document Networks

Evaluating Predictive Distribution (2/2)

Page 11: Hierarchical Relational Models for Document Networks

Automatic Link Suggestion (1/3)

• Citation suggestion: suggest citations given the abstract

• Cora dataset; the number of topics is set to 10

• RTM improves precision over LDA + Regression by 80% among the first 20 documents retrieved from the model
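
For readers unfamiliar with the metric, precision among the first k retrieved documents can be computed as in the sketch below. This is a generic definition, not code from the paper; the function names are illustrative, and dividing by k (rather than by the number of documents actually returned) is one common convention.

```python
def precision_at_k(ranked, relevant, k=20):
    """Fraction of the top-k retrieved documents that are truly cited."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, e.g. 0.8 for an 80% improvement."""
    return (new - baseline) / baseline
```

Under this reading, an 80% improvement means the RTM precision is 1.8 times the baseline precision, not that it is 80 percentage points higher.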

Page 12: Hierarchical Relational Models for Document Networks

Automatic Link Suggestion (2/3)

Page 13: Hierarchical Relational Models for Document Networks

Automatic Link Suggestion (3/3)

Page 14: Hierarchical Relational Models for Document Networks

Modeling Spatial Data (1/4)

• Local news data: 51 documents, one document per state

• The number of topics is set to 5

• Words are ranked by the following score:
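
The score formula on this slide was an image and is not reproduced in the text. One widely used score of this kind is the term score from Blei and Lafferty's topic-model survey, which weights a word's probability in a topic by how much its log probability exceeds its average log probability across all topics. Whether this matches the slide's formula exactly is an assumption; the sketch below is only meant to illustrate the idea, and the function names and toy vocabulary are hypothetical.

```python
import math


def term_scores(beta):
    """score[k][w] = beta[k][w] * (log beta[k][w] - mean_j log beta[j][w]).

    beta[k][w] is the probability of word w under topic k (all entries > 0).
    """
    num_topics = len(beta)
    vocab_size = len(beta[0])
    scores = []
    for k in range(num_topics):
        row = []
        for w in range(vocab_size):
            mean_log = sum(math.log(beta[j][w]) for j in range(num_topics)) / num_topics
            row.append(beta[k][w] * (math.log(beta[k][w]) - mean_log))
        scores.append(row)
    return scores


def top_words(beta, vocab, topic, n=3):
    """Return the n highest-scoring words for one topic."""
    row = term_scores(beta)[topic]
    order = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)
    return [vocab[w] for w in order[:n]]
```

A word that is equally probable in every topic gets a score of zero, so common words drop out of the top-word lists even when their raw probability is high.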

Page 15: Hierarchical Relational Models for Document Networks

Modeling Spatial Data (2/4)

• Each color depicts a single topic. Each state’s color intensity indicates the magnitude of that topic’s component.

• Corresponding words associated with each topic are given in the table.

Figure panels: RTM and LDA

Page 16: Hierarchical Relational Models for Document Networks

Modeling Spatial Data (3/4)

Figure panels: RTM and LDA

Page 17: Hierarchical Relational Models for Document Networks

Modeling Spatial Data (4/4)

Figure panels: RTM and LDA

Page 18: Hierarchical Relational Models for Document Networks

Discussion

• The Relational Topic Model (RTM) is a hierarchical model of networks and per-node attribute data.

• It is demonstrated qualitatively and quantitatively that RTM is an effective and useful mechanism for analyzing and using network data.