Gender and Interest Targeting For Sponsored Post Advertising at Tumblr
1. Gender and Interest Targeting for Sponsored Post Advertising at Tumblr
   Mihajlo Grbovic, Vladan Radosavljevic, Nemanja Djuric, Narayan Bhamidipati, Ananth Nagarajan
   Yahoo Research, Ad Sciences
2. Talk Overview
   Audience Targeting: Intro
   Tumblr Basics: Overview of the network, Advertising on Tumblr
   Tumblr Data: Data Sources, Tumblr User Profiles
   Tumblr Gender Prediction: Approach and Results
   Tumblr Interest Prediction: Approach and Results
3. Audience targeting
   Targeting unit = audience = a group of users
   1. Audience expansion
      Seed modeling: finding users similar to a provided set
      Click/Conversion prediction
4. Audience targeting
   Targeting unit = audience = a group of users
   2. Off-the-shelf audiences (e.g. sports > basketball)
      Interest-based: categorizing user actions (search, mail, news, Tumblr, apps); no supervision, pure interest (recency & intensity)
      Intent-based: categorizing clicks, purchases, etc.; supervision: fit a model that predicts clicks/purchases
5. Audience targeting
   3. Retargeting
      Search retargeting (finding similar queries to a provided set of queries)
      Mail retargeting (finding similar domains to a provided domain)
   retail PC      sport shoes PC          vitamins PC         food PC                 outdoor PC       cosmetics PC
   e.kohls.com    dickssportinggoods.com  vitacost.com        papajohns-specials.com  rei.com          hautelookmail.com
   landsend.com   finishline.com          luckyvitamin.com    dominos.com             backcountry.com  maccosmetics.com
   sears.com      footlocker.com          wansonvitamins.com  jimmyjohns.com          campmor.com      cs.sephora.com
   gap.com        newbalance.com          iherb.com           grubhub.com             orvis.com        ulta.com
   jcrew.com      zappos.com              walgreens.com       chipotle.com            usoutdoor.com    eyeslipsface.com
   o.macys.com    6pm.com                 christianbook.com   pizzahut.com            kelty.com        hautelookmail.com
   [1] Grbovic, M. et al. "Sparse Principal Component Analysis with Constraints", AAAI 2012
   [2] Grbovic, M. et al. "Generating Ad Targeting rules using Sparse Principal Component Analysis with Constraints", WWW 2014
   [3] Grbovic, M. et al. "Search retargeting using directed query embeddings", WWW 2015
6. Tumblr Basics
   Tumblr official statistics:
     249 Million Blogs
     117 Billion Posts
     90 Million Daily Posts
     13 Languages
   Source: http://www.tumblr.com/about
7. Tumblr Basics 1 user has 1 primary blog (user=blog)
8. Tumblr Basics
   Blogs have informative descriptions:
     "Tristan, i'm 15, Canada... Snowboarding Travel - Football FTB"
     "Hello, I'm Tess, I post a lot of stuff and Spot Conlon is my bae. Musicals are rad and Shawn Hunter is forever golden"
     "Alyssa|18|California. I like bands, books, shows, and random things. And geese. One bit me in the crotch once. Good times, good times"
     "I'm Carla // 19yrs old // Texas y'all listen, I just like to blog about anime and cute animals and and video games."
     "My name's Kierstin. I love basketball"
9. Tumblr Basics
   As a blog owner you can:
     create posts (your own or reblog)
     follow other blogs
   Post types (share of posts, in the order listed on the slide):
     text    14.13%
     photo   78.11%
     quote    2.27%
     link     0.46%
     chat     0.85%
     audio    2.01%
     video    1.35%
10. Tumblr Basics
    Regular post: title, body, tags, reblog, like
11. Tumblr Basics sponsored post
12. Advertising on Tumblr
13. Advertising on Tumblr
14. Advertising on Tumblr
    How to enhance it? Targeting: reach only users that are interested in the product/category
    1. Gender Targeting
       most basic form of ad targeting
       proven to work better than targeting random users
    2. Interest Targeting
       more involved: find users with interest in a specific category, e.g. fashion, sports, etc.
       proven to work better than pure gender targeting
15. Data Sources
    Firehose (user actions + post details)
    1. Blog details: title, description
    2. Post details:
       photo posts: caption, tags
       text posts: title, tags
       audio posts: artist, tags
    3. User actions: post, reblog, like, unlike
    Source: gnip.com/sources/tumblr
16. Data Sources
    Follower Graph
    Subset we extracted:
      96.9M nodes (users)
      5.1B edges (follows)
      18.2M blogs follow each other
      Average user follows 58.9 blogs
17. User Profiles
    User profile (details in paper) created from:
    1. Declared Features
       Text from Blog Title
       Text from Blog Description
    2. Content Features
       Tags from Blog Posts
       Text from Blog Post content
       Artist names from audio posts
    3. User Actions
       Like, Follow, Reblog
    Each user is represented as a vector (e.g. 0 1 7 0 3) whose values reflect intensity + recency
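The "intensity + recency" values could be computed in several ways; the slides do not say which. A minimal sketch, assuming exponentially decayed event counts (the half-life, the feature names, and the function `build_profile` are all hypothetical):

```python
from collections import defaultdict

def build_profile(events, now_day, half_life_days=30.0):
    """events: iterable of (feature, day) pairs; returns feature -> decayed count.

    Assumption: a fresh event counts as 1 and loses half of its weight
    every `half_life_days`. The talk does not specify the actual formula.
    """
    profile = defaultdict(float)
    for feature, day in events:
        age = now_day - day
        profile[feature] += 0.5 ** (age / half_life_days)
    return dict(profile)

# Hypothetical events: two uses of a tag and one follow action.
events = [("tag:soccer", 100), ("tag:soccer", 70), ("follow:blog42", 40)]
profile = build_profile(events, now_day=100)
```

Frequent features accumulate a larger value (intensity), while old events fade (recency), so a single number per feature carries both signals.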
18. Gender Prediction
    Main Goal: assign gender to Tumblr users
      For example: user x is most likely female
      Based on the results, serve targeted ads
    Steps
    1. Use a golden set (known gender) + user profiles to train a predictive model
    2. Score all users for which we have a profile
    3. Apply a threshold to keep only the most certain predictions
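Step 3 can be sketched as a symmetric cut on the predicted probability. The threshold value 0.8 and the function name `assign_gender` are assumptions for illustration; the slides do not state the cut-off that was used.

```python
def assign_gender(p_male, threshold=0.8):
    """Return 'male', 'female', or None when the score is not confident enough.

    p_male is the model's predicted probability that the user is male.
    The threshold is a hypothetical value, not the one used in production.
    """
    if p_male >= threshold:
        return "male"
    if p_male <= 1.0 - threshold:
        return "female"
    return None  # user is left unclassified and not gender-targeted
```

Raising the threshold trades coverage for precision: fewer users receive a label, but the labels that remain are more reliable.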
19. Gender Prediction
    Golden sets: based on declared user first names
      Extract first names from blog descriptions
      Use US Census data (1880 to 2013) to get the probability of gender given the name
    regex             count
    my name is*       783,564
    my names*         291,811
    me llamo*          47,663
    the names*         38,065
    mi nombre es*       9,751
    mi chiamo*          9,181
    mein name ist*      1,025
    meu nome e*           512
    mon nom est*          215
    mio nome e*           185
    Golden set size: male 395K, female 564K
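The golden-set construction can be sketched as follows. The patterns mirror three of the regexes on the slide, but their exact form, the capture group, and the toy `NAME_COUNTS` table (hypothetical numbers standing in for the US Census data) are assumptions.

```python
import re

# Toy stand-in for US Census name counts (hypothetical numbers, not real data).
NAME_COUNTS = {
    "carla": {"female": 990, "male": 10},
    "tristan": {"female": 50, "male": 950},
}

# Assumed regex forms of "my name is*", "my names*" and "me llamo*".
NAME_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bmy name is\s+(\w+)", r"\bmy name'?s\s+(\w+)", r"\bme llamo\s+(\w+)")
]

def extract_first_name(description):
    """Return the lower-cased first name declared in a blog description, if any."""
    for pattern in NAME_PATTERNS:
        match = pattern.search(description)
        if match:
            return match.group(1).lower()
    return None

def p_female_given_name(name):
    """Probability of female given the name, or None if the name is unknown."""
    counts = NAME_COUNTS.get(name)
    if counts is None:
        return None
    return counts["female"] / (counts["female"] + counts["male"])
```

For example, `extract_first_name("My name's Kierstin. I love basketball")` yields `"kierstin"`, which this toy table does not contain, so that blog would stay out of the golden set.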
20. Gender Prediction
    Model Training: large-scale weighted Logistic Regression
      Predicts the probability of the user being male
      Figure labels: ground truth; weights (model parameter); weighted learning
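A minimal sketch of weighted logistic regression trained with plain stochastic gradient descent. It assumes each training example carries a weight reflecting how much its label is trusted (for instance the name-based gender probability); that reading of "weighted learning", the toy data, and all names below are assumptions, and the production system is large-scale rather than in-memory.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_weighted_lr(X, y, sample_weight, lr=0.5, epochs=500):
    """X: list of feature lists, y: 1 = male / 0 = female, sample_weight: per-example weight."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi, si in zip(X, y, sample_weight):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            g = si * (p - yi)  # gradient of the weighted log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict_male_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data: feature 0 co-occurs with male labels, feature 1 with female labels.
X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [1, 1, 0, 0]
sample_weight = [0.99, 0.8, 0.95, 0.7]  # hypothetical label confidences
w, b = train_weighted_lr(X, y, sample_weight)
```

Examples with low weight move the parameters less, so ambiguous first names influence the model less than unambiguous ones.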
21. Gender Prediction
    Results on hold-out set:
      Gender   Precision   Recall
      female   0.806       0.838
      male     0.794       0.689
    Editorial evaluation of 1000 random blogs:
      Gender   Correct   Wrong   Don't know
      female   429       4       298
      male     144       5       127
    Coverage: the classified users cover >95% of actions (posts, reblogs, likes, etc.)
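The two tables can be summarized with a little arithmetic. The sketch below assumes that editorial precision is measured only on blogs the editors could judge, i.e. correct / (correct + wrong), leaving out the "don't know" column; the slides do not state this explicitly.

```python
editorial = {
    "female": {"correct": 429, "wrong": 4, "dont_know": 298},
    "male": {"correct": 144, "wrong": 5, "dont_know": 127},
}
holdout = {"female": (0.806, 0.838), "male": (0.794, 0.689)}  # (precision, recall)

def precision_on_judged(row):
    """Share of correct predictions among blogs the editors could judge."""
    return row["correct"] / (row["correct"] + row["wrong"])

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

editorial_precision = {g: precision_on_judged(r) for g, r in editorial.items()}
holdout_f1 = {g: f1(p, r) for g, (p, r) in holdout.items()}
```

Under that assumption the editorial precision is about 0.99 for female and 0.97 for male, and the hold-out F1 is about 0.82 and 0.74 respectively.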
22. Interest Targeting
    Main Goal: assign interest categories to Tumblr users
      For example: user x is interested in fashion
      Based on the results, serve targeted ads
    Interests are picked from a fixed Advertising Taxonomy
23. Interest Targeting
    Level 1: Arts and Entertainment; Style and Fashion; Pets; Shopping; Food and Drink; Home and Garden; Health and Fitness; Beauty and Personal Care; Education; Society; Sports; Technology and Computing; Travel; Automotive
    Level 2: Arts and Entertainment/Movies; Arts and Entertainment/Television; Style and Fashion/Clothing; Hobbies and Interests/Photography; Food and Drink/Dining Out; Family and Parenting; Food and Drink/Dining Out - Fast Food; Education/K to 12 Education; Beauty and Personal Care/Face and Body Care; Arts and Entertainment/Music; Arts and Entertainment/Books and Literature; Beauty and Personal Care/Hair Care; Style and Fashion/Footwear
24. Interest Targeting
    Intent Audiences (drive clicks)
      Collect clicks on categorized ads
      Train a model where clicks are labeled +1 and no clicks -1
      Score all users to estimate the probability of a click
    Interest Audiences (drive brand awareness with a relevant audience)
      Infer user interest in a certain category based on their activity
      Create categorized user profiles
25. Interest Targeting
    Approach (details in paper):
    1. Categorize keywords from post content (post tags, post text) and blog titles and descriptions
    2. Predict user interest categories based on the categorized tags and text in posts, blog titles and descriptions they use (intensity + recency)
    3. Leverage the follower graph and like actions to categorize users who do not create much content
26. Tag Categorization
    How to represent tags?
    1) Traditional: bag of words
       A sparse binary vector (e.g. 0 1 0 1 0 1) with 1 where the tag's words are and 0 everywhere else
       Tag 1: movie releases
       Tag 2: new blockbuster hits
    ISSUE: the two tags share no words, so there is no way to find that they are similar
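The issue can be reproduced in a few lines. This is an illustrative sketch with a five-word vocabulary chosen for the example; `bag_of_words` and `cosine` are helper names introduced here.

```python
import math

def bag_of_words(tag, vocabulary):
    """Binary vector: 1 where a vocabulary word occurs in the tag, 0 elsewhere."""
    words = set(tag.split())
    return [1 if w in words else 0 for w in vocabulary]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

vocabulary = ["blockbuster", "hits", "movie", "new", "releases"]
tag1 = bag_of_words("movie releases", vocabulary)        # [0, 0, 1, 0, 1]
tag2 = bag_of_words("new blockbuster hits", vocabulary)  # [1, 1, 0, 1, 0]
similarity = cosine(tag1, tag2)                          # 0.0: no shared words
```

Because the two vectors have no non-zero position in common, their similarity is exactly zero even though the tags mean nearly the same thing.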
27. Tag Categorization
    2) Improvement: add context
       "You shall know a word by the company it keeps" (J. R. Firth)
28. Tag Categorization
    3) New: move from sparse to dense vectors
       Represent tags as numeric vectors
       Learn vectors from training data (user posts)
       Leverage context of tags (surrounding tags in the same post)
       Result: tags with similar contexts will have similar vectors
    post 1: trip_ideas cheap_flights holiday_travel_deals
    post 2: trip_ideas air_tickets holiday_travel_deals
    Example tag vector: 0.2 1.1 7.2 0.8 3.1
30. Tag Categorization
    Word2Vec [1]
      Classification model with word w and context c pairs:
        surrounding words treated as positives: D
        random sampling of negatives: D'
      In our case, word w = tag
    [Figure: Skip-Gram. The j-th tag t_j is projected to predict t_{j-n}, ..., t_{j-1}, t_{j+1}, ..., t_{j+n}, the tags within a single post]
    [1] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of NIPS, 2013.
31. Tag Categorization
    Tag2Vec
    Example search session: tag8, tag1, tag2, tag6
    [Figure: embedding space showing the current tag, its neighborhood (T1, T2, T6, T8) and random negatives (RND)]
32. Tag Categorization
    Tag2Vec after training:
      movie releases         0.2   1.1  7.2  0.8   3.1
      new blockbuster hits   0.21  1.2  6.8  0.74  3.2
      similarity = 0.9
33. Tag Categorization
    How to learn tag classes?
      Tag features: tag2vec vector
      Tag labels: human input, 8,400 tags categorized by editors
34. Tag Categorization
    How to learn tag classes?
    1. Supervised learning using tag vectors as features x and assigned classes y
       tag               features x             label y
       #movie_releases   0.2 1.1 7.2 0.8 3.1    18
       Fit a model f(x) -> y that maps features to category labels:
         minimize prediction loss
         one-against-all classifiers (multi-class)
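A minimal one-against-all sketch, using a small logistic regression per category as the binary learner. The choice of learner, the two-dimensional toy vectors, and the category names are assumptions for illustration; the slides only state that one-against-all classifiers were fit on tag vectors.

```python
import math

def train_binary(X, y, lr=0.5, epochs=300):
    """Plain logistic regression trained with SGD; y holds 0/1 labels."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(a * c for a, c in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            w = [a - lr * (p - yi) * c for a, c in zip(w, xi)]
            b -= lr * (p - yi)
    return w, b

def train_one_vs_all(X, labels):
    """One binary model per category: that category vs. all the others."""
    return {c: train_binary(X, [1 if l == c else 0 for l in labels])
            for c in sorted(set(labels))}

def predict(models, x):
    """Pick the category whose binary model gives the highest score."""
    return max(models, key=lambda c: sum(a * v for a, v in zip(models[c][0], x)) + models[c][1])

# Toy 2-d "tag vectors" with editor-assigned categories (hypothetical data).
X = [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9]]
labels = ["movies", "movies", "fashion", "fashion"]
models = train_one_vs_all(X, labels)
```

With the 8,400 editor-labeled tags as training data, the same scheme would yield one scorer per taxonomy category.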
35. Tag Categorization
    How to learn tag classes?
    2. Semi-supervised learning of category vectors while we are learning tag vectors (predict the closest category vector)
       tag vector:       #movie_releases     0.2 1.1 7.2 0.8 3.1
       category vector:  arts&ent./movies    0.2 1.1 7.2 0.8 3.1
       similarity = 0.9
36. Tag Categorization
    Skip-Gram vs. semi-supervised Skip-Gram
    [Figure: Skip-Gram projects the j-th tag t_j to predict t_{j-n}, ..., t_{j-1}, t_{j+1}, ..., t_{j+n}, the tags within a single post; the semi-supervised Skip-Gram additionally predicts the j-th tag's categories c_1, ..., c_k]
    2. Semi-supervised learning of category vectors while we are learning tag vectors (predict the closest category vector)
38. Tag Categorization
    Tag2Vec Final Model
      t = tags
      c = context (surrounding tags)
      n = random negatives
      class = class tags
    [Figure: the j-th tag t_j is projected to predict both the tags t_{j-n}, ..., t_{j+n} within a single post and the j-th tag's categories c_1, ..., c_k]
39. Tag Categorization
    Tag2Vec training
      Data: ~6.8B posts that contained tags
      Parameters: window size = 5, random negatives = 5, most frequent tags down-sampled
      Output: ~2M tag vectors trained (d=300)
      Categorization: 380K most confident tag predictions kept (>0.5 cosine similarity to the closest category vector)
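The categorization rule on this slide (keep a tag only if its cosine similarity to the closest category vector exceeds 0.5) can be sketched as follows. The two-dimensional vectors and the helper names are illustrative assumptions; the real vectors have d=300.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def categorize(tag_vec, category_vecs, min_similarity=0.5):
    """Return (category, similarity) for the closest category vector,
    or (None, similarity) when the best match is not above the threshold."""
    best_cat, best_sim = None, -1.0
    for cat, vec in category_vecs.items():
        sim = cosine(tag_vec, vec)
        if sim > best_sim:
            best_cat, best_sim = cat, sim
    if best_sim > min_similarity:
        return best_cat, best_sim
    return None, best_sim

# Toy category vectors (hypothetical values).
category_vecs = {
    "Arts and Entertainment/Movies": [1.0, 0.0],
    "Style and Fashion": [0.0, 1.0],
}
confident = categorize([0.9, 0.1], category_vecs)
uncertain = categorize([-1.0, 0.2], category_vecs)
```

Tags that fall below the threshold stay uncategorized, which is how roughly 2M trained tag vectors reduce to the 380K confident predictions reported above.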
41. Tag Categorization
    Examples: Food & Drink/Desserts; Health & Fitness/Weight Loss
    http://youtu.be/ygn5oUBydfM
42. Tag Categorization
43. Interest Prediction
    user     category                        categorized features
    user 1   Arts and Entertainment/Movies   tag spoilers:30, tag shrek:18, tag hercules:12, desc dvd:1, tag pokemon:7, tag thor:58, tag cinderella:3, tag hobbit:123, desc comedy:1, txt movies:100, desc movie:1, tag hulk:21, photo aladdin:28, tag disney:500, photo batman:10, txt bambi:12, desc animation:12, tag pixar:87, tag tarzan:8, tag marvel:385, tag wolverine:21, desc oscar:1, tag twilight:2, tag ...
    user 2   Style and Fashion               txt fashion:108, tag womensfashion:110, tag fashiondiaries:133, tag redhair:2, tag menswear:125, tag springfashion:50, tag style:132, tag streetstyle:132, tag hairstylist:134, tag dapper:3, tag mensfashion:124, tag chanel:4
    Repeat the semi-supervised process for post context text and phrases (phrase2vec) to increase reach
    Calculate each user's affinity based on intensity and recency
44. Interest Prediction
    user     category            categorized features
    user 1, user 2: rows as on slide 43
    user 3   Food and Drinks     follows_user31:1, follows_user43:1, likes_user131:1, follows_user423:1, follows_user331:1
    user 4   Style and Fashion   follows_user556:1, follows_user221:1, likes_user191:1, follows_user13423:1, likes_user335831:1
    Leverage the follower graph and like actions
      We identify users with a high value of u_cat=k (influencers)
      Follows and likes of posts created by influencers in the k-th category serve as additional features
      Good for users who do not create much content
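The influencer idea can be sketched in two steps: pick users whose affinity for category k is high, then turn another user's follows and likes of those influencers into features. The score cut-off, the data layout, and the function names are assumptions; the feature naming follows the `follows_user31:1` pattern shown above.

```python
from collections import Counter

def find_influencers(category_scores, min_score):
    """category_scores: user -> affinity for one category k (u_cat=k)."""
    return {user for user, score in category_scores.items() if score >= min_score}

def influencer_features(actions, influencer_ids):
    """actions: list of (action, target_user), action in {'follows', 'likes'}.

    Only actions aimed at influencers of the category become features.
    """
    return Counter(
        f"{action}_{target}" for action, target in actions if target in influencer_ids
    )

# Hypothetical affinity scores for the category "Food and Drinks".
food_scores = {"user31": 0.9, "user43": 0.8, "user7": 0.1}
food_influencers = find_influencers(food_scores, min_score=0.5)

# A user who creates no content but follows and likes other blogs.
actions = [("follows", "user31"), ("likes", "user43"), ("follows", "user7")]
features = influencer_features(actions, food_influencers)
```

A user with no posts of their own still ends up with a non-empty profile for the category, which is the gap this step is meant to close.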
45. Tumblr Interest Targeting A/B Tests
    With 8 advertisers we ran consecutive untargeted and targeted campaigns
    On average 20% lift in engagement (likes, reblogs, follows)
    Campaign                     Control   Targeted
    Home & Garden                -         +9.71%
    Style & Fashion              -         +42.53%
    Sports/Outdoor               -         +19.86%
    Arts & Enter./Television     -         +24.37%
    Arts & Enter./Video Games    -         +19.02%
    Pets/Dogs                    -         +27.21%
    Arts & Enter. (1)            -         +9.08%
    Arts & Enter. (2)            -         +6.54%
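The "on average 20%" figure can be checked against the per-campaign numbers, assuming it is the unweighted mean of the eight lifts (the slides do not say whether campaigns were weighted by size):

```python
# Engagement lift of the targeted campaign over its control, in percent.
lifts = {
    "Home & Garden": 9.71,
    "Style & Fashion": 42.53,
    "Sports/Outdoor": 19.86,
    "Arts & Enter./Television": 24.37,
    "Arts & Enter./Video Games": 19.02,
    "Pets/Dogs": 27.21,
    "Arts & Enter. (1)": 9.08,
    "Arts & Enter. (2)": 6.54,
}

# Unweighted mean across the eight campaigns (an assumption about the averaging).
average_lift = sum(lifts.values()) / len(lifts)
best_campaign = max(lifts, key=lifts.get)
```

The unweighted mean comes out just under 20%, consistent with the headline number, and Style & Fashion shows the largest lift.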
46. Deployed System
    Delivers inference for users that cover more than 90% of daily activities on Tumblr
    Adoption rate: 60% of all campaigns use our targeting today
    Interest and gender models are retrained on a regular basis
    Daily scoring by leveraging MapReduce on Hadoop
47. Evaluation
    Accuracy tested on my blog
    Gender Prediction:
      Score      #features   inferred gender
      1.330301   236         male
    Interest Prediction (high-support categories):
      Category                #features   why
      Sports                  111         I follow a lot of soccer related blogs
      Arts and Entert./TV     107         I follow and reblog game of thrones blogs
      Photography             95          In my description I say I like photography and post about it
      Science                 29          I reblog Yahoo Labs blogs and have it in description
      Advertising/Marketing   7           I follow advertising related blogs