Plans on “Latent Topic Model”


Page 1: Plans on “Latent Topic Model”

Plans on “Latent Topic Model”

Page 2: Plans on “Latent Topic Model”

High-Level Architecture

Diagram: Users and Ads feed into User Encoding; the encodings drive a User Clustering stage and eCTR / FB Prediction.

Page 3: Plans on “Latent Topic Model”

Existing Pipeline

• Encoding
  – Auto-encoder for dimension reduction
  – Political affiliation clustering
  – Output: Hive table (user id + low-dim representation)

• eCTR prediction
  – Optional: user clustering stage
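A minimal sketch of the encoding step, using a tied-weights linear autoencoder on synthetic data. The feature matrix, code width `k`, learning rate, and step count are all illustrative; the actual pipeline reads raw features from, and writes the low-dim codes back to, Hive tables.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the raw user-feature table: 100 users, 50 features.
X = rng.normal(size=(100, 50))

# Tied-weights linear autoencoder: encode to k dims, decode with W.T.
k = 5
W = rng.normal(scale=0.1, size=(50, k))

lr = 0.01
for _ in range(200):
    Z = X @ W                # encode: (100, k)
    X_hat = Z @ W.T          # decode: (100, 50)
    err = X_hat - X
    # Gradient of ||X_hat - X||^2 with respect to the tied weights W.
    grad = X.T @ (err @ W) + err.T @ (X @ W)
    W -= lr * grad / X.shape[0]

codes = X @ W                # low-dim representation, one row per user id
print(codes.shape)           # (100, 5)
```

In practice the pipeline's auto-encoder would be nonlinear; this linear version converges toward the PCA subspace but shows the same encode/decode bookkeeping.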

Page 4: Plans on “Latent Topic Model”

Approaches to use encoding in eCTR prediction

Page 5: Plans on “Latent Topic Model”

Social Networks

Page 6: Plans on “Latent Topic Model”

Information on a social network

• Social graph
  – Friendship networks
  – User-ads network …

• Text
  – News feed
  – Messages
  – Ads text …

• Images
  – Album
  – Random posts
  – Ads figures …

• Demographics
  – Age, occupation …

Challenges for eCTR prediction:
• Very high-dimensional
• Non-independent
• Insufficient training data (this is true even if we use the whole web)
• Hard to optimize and interpret
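To make the high-dimensionality point concrete, here is a hedged sketch of a baseline eCTR model over sparse signals like those above: feature hashing folds the open-ended feature space (graph edges, text tokens, demographics) into a fixed-width vector, and logistic regression is fit on top. All feature names, the hash width, and the two toy examples are illustrative, not the real feature set.

```python
import zlib
import numpy as np

D = 2 ** 12  # hashed feature width (illustrative)

def hash_features(tokens, dim=D):
    """Fold arbitrary string features into a fixed-width count vector."""
    v = np.zeros(dim)
    for tok in tokens:
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Two toy users with mixed graph/text/demographic features, and their clicks.
users = [["friend:42", "topic:sports", "age:25-34"],
         ["friend:7", "topic:politics", "age:35-44"]]
clicks = np.array([1.0, 0.0])

X = np.vstack([hash_features(u) for u in users])
w = np.zeros(D)
for _ in range(100):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - clicks)   # gradient step on the log loss
```

The hashed vector is exactly the "very high-dimensional, hard to optimize" representation the slide warns about, which is what motivates the low-dimensional encodings.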

Page 7: Plans on “Latent Topic Model”

Essentials of a good user-ads representation

• Distilling all local attribute semantics
  – Social roles
  – Topical contents
  – Ideology/sentiment

• Capture relational information
  – Long-range indirect influence
  – Social environments and contexts

• Capture dynamic trends
  – e.g., change of strength of interest
  – New/dying interests

• Discriminative
  – Optimize against a well-defined predictive task rather than vague intermediate goals such as clustering

• Low-dimensional and (perhaps) interpretable

Page 8: Plans on “Latent Topic Model”

Example:

Page 9: Plans on “Latent Topic Model”

Proposed Models

Page 10: Plans on “Latent Topic Model”

Dynamic tomography

• How to model dynamics in a simplex?

Project an individual/stock in a network into a "tomographic" space

Trajectory of an individual/stock in the "tomographic" space
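One common answer to the simplex-dynamics question (an assumption here, not necessarily the exact model on these slides) is logistic-normal dynamics: run an unconstrained Gaussian random walk in R^k and map each state onto the simplex with a softmax, giving a trajectory of mixed-membership ("tomographic") vectors over time.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

# Latent position evolves freely in R^k; softmax projects each step onto
# the simplex. k (roles) and T (time steps) are illustrative.
k, T = 4, 10
x = np.zeros(k)
trajectory = []
for t in range(T):
    x = x + rng.normal(scale=0.3, size=k)  # unconstrained random walk
    trajectory.append(softmax(x))          # point on the simplex

trajectory = np.array(trajectory)          # one role-mixture per time step
```

Each row is a valid mixture over the k roles, so the Senate-style role trajectories on the next slide can be read directly off such a sequence.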

Page 11: Plans on “Latent Topic Model”

Senate Network: role trajectories (cluster legend)

Jon Corzine's seat (#28, Democrat, New Jersey) was taken over by Bob Menendez from t=5 onwards. Corzine was especially left-wing, so much so that his views did not align with the majority of Democrats (t=1 to 4). Once Menendez took over, the latent space vector for senator #28 shifted towards role 4, corresponding to the main Democratic voting clique.


Ben Nelson (#75) is a right-wing Democrat (Nebraska), whose views are more consistent with the Republican party. Observe that as the 109th Congress proceeds into 2006, Nelson's latent space vector includes more of role 3, corresponding to the main Republican voting clique. This coincides with Nelson's re-election as the Senator from Nebraska in late 2006, during which a high proportion of Republicans voted for him.


Page 12: Plans on “Latent Topic Model”

Visualization

Page 13: Plans on “Latent Topic Model”

Visualization

Page 14: Plans on “Latent Topic Model”

Algorithm Details

Page 15: Plans on “Latent Topic Model”

Data

Page 16: Plans on “Latent Topic Model”

Learning System

Given: a network of users/documents.

• Perform the E-step (Gibbs sampling) in parallel; collect the sufficient statistics.
• Perform the M-step in parallel.
• Repeat until convergence.

Single program: workers share the global parameters α, β, η, μ and sample the per-user/document latent assignments z.
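The E/M loop above can be sketched with a mixture-of-multinomials model. The slide names a Gibbs-sampling E-step; to keep this sketch short it uses a deterministic E-step instead, and a loop over document shards stands in for the parallel workers, each emitting sufficient statistics that are summed before the M-step. K, V, and the corpus are toy values.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy corpus: word-count vectors for 200 users/documents over a 20-word
# vocabulary, drawn from 3 latent topics (all sizes are illustrative).
K, V, N = 3, 20, 200
true_beta = rng.dirichlet(np.ones(V), size=K)
z_true = rng.integers(K, size=N)
X = np.array([rng.multinomial(50, true_beta[z]) for z in z_true])

# EM for a mixture of multinomials; the E-step is embarrassingly parallel.
pi = np.full(K, 1.0 / K)
beta = rng.dirichlet(np.ones(V), size=K)

for _ in range(50):
    shard_stats = []
    for shard in np.array_split(np.arange(N), 4):    # stand-in for workers
        logp = np.log(pi) + X[shard] @ np.log(beta).T
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)      # E-step responsibilities
        # Sufficient stats: responsibility mass and expected word counts.
        shard_stats.append((resp.sum(axis=0), resp.T @ X[shard]))
    Nk = sum(s[0] for s in shard_stats)
    Ck = sum(s[1] for s in shard_stats)
    pi = Nk / Nk.sum()                               # M-step: mixing weights
    beta = (Ck + 1e-6) / (Ck + 1e-6).sum(axis=1, keepdims=True)
```

Because each shard contributes only its sufficient statistics, the same loop maps directly onto the "single program, parallel E/M steps" layout on the slide.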

Page 17: Plans on “Latent Topic Model”

Project Plans and Milestones

• Scalable implementation of baseline user text model (M1)

• Discriminative M1

• M1 + network model (M2)

• M2 + history + time (M3)

• Parallel work on downstream utility
  – eCTR prediction
  – Visualization
  – User/ads clustering

Page 18: Plans on “Latent Topic Model”

Resources

• CMU:
  – First intern, Keisuke, will come in mid-Oct, implementing M1
  – Second intern, Qirong Hu, will come in late Dec, implementing M2 and M3

• FB:
  – Rajat Raina
  – Rong Yang
  – System support