Dynamic Multi-Faceted Topic Discovery in Twitter
Date: 2013/11/27
Source: CIKM’13
Advisor: Dr. Jia-ling, Koh
Speaker: Wei, Chang


Outline
• Introduction
• Approach
• Experiment
• Conclusion


Twitter


What are they talking about?
• Entity-centric
• Highly dynamic


Multiple facets of a topic discussed in Twitter


Goal


Outline
• Introduction
• Approach
  • Framework
  • Pre-processing
  • LDA
  • MfTM
• Experiment
• Conclusion


Framework

[Figure: framework diagram with the labels Twitter, Pre-processing, Training document, Model (hyperparameters), Per document, and Document vector]


Pre-processing
• Convert to lower-case
• Remove punctuation and numbers
• Normalize elongated words, e.g. “Goooood” to “good”
• Remove stop words
• Named entity recognition
  • Entity types: person, organization, location, general terms
  • Linked Web: http://nlp.stanford.edu/ner/
  • Tweet: http://github.com/aritter/twitter_nlp
• All of a user’s posts published during the same day are grouped as one document
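The text-cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list is a tiny made-up sample, elongated words are squeezed with a simple "three or more repeats become two" heuristic, the `(user, date, text)` input layout is hypothetical, and named entity recognition is left out because it relies on the external tools linked above.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; the slides do not name the list used).
STOP_WORDS = {"a", "an", "and", "are", "at", "for", "i", "is", "of", "on", "the", "this", "to"}


def normalize(text):
    """Lower-case, strip punctuation and digits, squeeze elongated words, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)        # keep ASCII letters and whitespace only
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)   # heuristic: "goooood" -> "good"
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def group_by_user_day(posts):
    """posts: iterable of (user, date_string, text). Returns {(user, date): [tokens]}."""
    docs = defaultdict(list)
    for user, day, text in posts:
        docs[(user, day)].extend(normalize(text))
    return dict(docs)


# Hypothetical sample posts, not taken from the paper's dataset.
posts = [
    ("alice", "2013-11-27", "This is Goooood!!! 100% true"),
    ("alice", "2013-11-27", "Kittens are cute."),
    ("bob", "2013-11-27", "I ate a banana for breakfast"),
]
docs = group_by_user_day(posts)
print(docs)
```

The squeeze heuristic is deliberately crude: it turns "sooo" into "soo", so a dictionary lookup would be needed for a more faithful normalization.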


Latent Dirichlet Allocation
• Each document may be viewed as a mixture of various topics.
• The topic distribution is assumed to have a Dirichlet prior.
• Unsupervised learning
• The number of topics K must be set in advance
• Not to be confused with linear discriminant analysis, which is also abbreviated LDA


Example
• I like to eat broccoli and bananas.
• I ate a banana and spinach smoothie for breakfast.
• Chinchillas and kittens are cute.
• My sister adopted a kitten yesterday.
• Look at this cute hamster munching on a piece of broccoli.

Topic 1: food
Topic 2: cute animals


How does LDA write a document?

[Figure: words drawn from Topic 1 and Topic 2: broccoli, munching, breakfast, bananas, kittens, chinchillas, cute, hamster]
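The generative story behind this slide can be sketched as follows: draw a topic mixture for the document from a Dirichlet prior, then for each word pick a topic from that mixture and a word from that topic. The sketch makes several assumptions that are not in the slides: the split of the eight words into "food" and "cute animals", uniform word probabilities within a topic, and the prior `alpha = [0.5, 0.5]`.

```python
import random

# Assumed word-to-topic split, based on the earlier food / cute animals example.
TOPICS = {
    "food": ["broccoli", "munching", "breakfast", "bananas"],
    "cute animals": ["kittens", "chinchillas", "cute", "hamster"],
}


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def write_document(n_words, alpha, rng):
    """Generate one document: returns its topic mixture and its words."""
    names = list(TOPICS)
    theta = sample_dirichlet(alpha, rng)               # per-document topic mixture
    words = []
    for _ in range(n_words):
        topic = rng.choices(names, weights=theta)[0]   # choose a topic for this word
        words.append(rng.choice(TOPICS[topic]))        # choose a word (uniform within topic)
    return theta, words


rng = random.Random(0)
theta, words = write_document(6, alpha=[0.5, 0.5], rng=rng)
print(theta, words)
```

Inference runs this story in reverse: given only the words, it estimates which mixtures and topics most plausibly produced them.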


Real World Example


LDA Plate Notation

[Figure: LDA plate diagram]

$$\beta = \begin{bmatrix} 0.7 & 0.2 & 0.1 & 0.8 & 0.4 & 0.7 & 0.8 & 0.6 \\ 0.3 & 0.8 & 0.9 & 0.2 & 0.6 & 0.3 & 0.2 & 0.4 \end{bmatrix}$$

A different α implies a different θ for every document. Each θ decides the fraction of each topic.

A different β implies a different topic mixture for each word.


LDA


$D = \{\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \ldots, \mathbf{w}_M\}$
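For reference, the standard LDA formulation usually writes the likelihood of a corpus of M documents as shown below. The symbols θ_d (topic mixture of document d), z_dn (topic of the n-th word), w_dn (the n-th word), and N_d (length of document d) are the conventional ones and are assumed here rather than taken from the slides.

```latex
p(D \mid \alpha, \beta)
  = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha)
    \left( \prod_{n=1}^{N_d} \sum_{z_{dn}}
      p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right)
    d\theta_d
```

The integral over θ_d has no closed form, which is why approximate inference is needed to fit the model.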


How to find the model parameters
• EM algorithm
• Gibbs sampling
• Stochastic Variational Inference (SVI)
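Of the options listed, Gibbs sampling is probably the easiest to sketch. Below is a minimal collapsed Gibbs sampler for plain LDA, not the inference procedure used in the paper. The symmetric priors `alpha` and `eta`, the iteration count, and the hand-tokenized version of the five example sentences are all assumptions made for illustration.

```python
import random


def gibbs_lda(docs, n_topics, alpha=0.1, eta=0.01, n_iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenized docs (lists of words)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_vocab = len(vocab)
    doc_topic = [[0] * n_topics for _ in docs]                        # tokens in doc d with topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(n_topics)]   # count of word w in topic k
    topic_total = [0] * n_topics                                      # tokens assigned to topic k
    assign = []                                                       # topic of every token

    # Random initialization of the topic assignments.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(n_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(zs)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + eta) / (topic_total[t] + n_vocab * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Point estimate of each document's topic mixture from the final counts.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word


# The five example sentences, reduced by hand to content words (an assumption).
docs = [
    ["eat", "broccoli", "bananas"],
    ["ate", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchillas", "kittens", "cute"],
    ["sister", "adopted", "kitten", "yesterday"],
    ["cute", "hamster", "munching", "broccoli"],
]
theta, topic_word = gibbs_lda(docs, n_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

With only five short documents the recovered topics are unstable; the sketch is meant to show the count bookkeeping, not to reproduce the food and cute-animals split.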


Multi-Faceted Topic Model


Outline
• Introduction
• Approach
• Experiment
• Conclusion


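Perplexity Evaluation
• Perplexity is algebraically equivalent to the inverse of the geometric mean per-word likelihood.

$$\mathrm{perplexity}(D_{\mathrm{test}} \mid M) = \exp\left( - \frac{\sum_{d} \log p(\mathbf{w}_d \mid M)}{\sum_{d} N_d} \right)$$

• M is the model learned from the training dataset, $\mathbf{w}_d$ is the word vector for document d and $N_d$ is the number of words in d.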
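As a rough illustration of that definition, the sketch below computes perplexity as the exponential of the negative total log-likelihood divided by the total word count, which is the inverse geometric mean of the per-word likelihood. The per-document log-likelihoods would come from a trained model; here they are generated by a hypothetical model that gives every word probability 1/8, chosen because the result is then easy to check by hand.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """exp(-(sum_d log p(w_d | M)) / (sum_d N_d)), using natural logarithms."""
    total_words = sum(doc_lengths)
    if total_words == 0:
        raise ValueError("the evaluation set contains no words")
    return math.exp(-sum(doc_log_likelihoods) / total_words)


# Hypothetical held-out set: three documents scored by a model that assigns
# probability 1/8 to every word, so log p(w_d | M) = N_d * log(1/8).
lengths = [3, 5, 4]
log_liks = [n * math.log(1 / 8) for n in lengths]
ppl = perplexity(log_liks, lengths)
print(ppl)
```

A uniform model over eight words has a perplexity of eight, matching the usual reading of perplexity as an effective vocabulary size; lower values indicate a better fit to held-out text.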


Perplexity Evaluation


KL-divergence
• P = {1/6, 1/6, 1/6, 1/6, 1/6, 1/6}
• Q = {1/10, 1/10, 1/10, 1/10, 1/10, 1/2}
• KL is a non-symmetric measure
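The asymmetry can be checked numerically with the two distributions above, using the usual definition D_KL(P‖Q) = Σ_i P(i) log(P(i)/Q(i)). The choice of the natural logarithm is an assumption, since the log base is not stated here, and the helper assumes Q(i) > 0 wherever P(i) > 0.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; terms with p_i == 0 contribute nothing."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


P = [1 / 6] * 6
Q = [1 / 10] * 5 + [1 / 2]

kl_pq = kl_divergence(P, Q)
kl_qp = kl_divergence(Q, P)
print(round(kl_pq, 4), round(kl_qp, 4))
```

The two directions come out at roughly 0.243 and 0.294 nats, so swapping the arguments changes the value.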


KL-divergence


Scalability
• A standard PC with a dual-core CPU, 4 GB RAM, and a 600 GB hard drive


Outline
• Introduction
• Approach
• Experiment
• Conclusion


Conclusion
• We propose a novel Multi-Faceted Topic Model. The model extracts semantically rich latent topics, including general terms mentioned in the topic, named entities, and a temporal distribution.
