Online advertising and large scale model fitting

Online Advertising and Large-scale model fitting

Wush Wu

2014-10-24

Outline

Introduction of Online Advertising

Handling Real Data

Data Engineering

Model Matrix

Enhance Computation Speed of R

Fitting Model to Large Scale Data

Batch Algorithm: Parallelizing Existing Algorithm

Online Algorithm: SGD, FTPRL and Learning Rate Schema

Display Advertising Challenge

Ad Formats: Pre-Roll Video Ads

Ad Formats: Banner/Display Ads

Adwords Search Ads

Related Content Ads

Online Advertising is Growing Rapidly

Why Is Online Advertising Growing?

Wide reach

Target oriented

Quick conversion

Highly informative

Cost-effective

Easy to use

Measurable

Half the money I spend on advertising is wasted; the trouble is I don't know which half.

How do we measure the online ad?

User behavior on the internet is trackable.

We know who watches the ad.

We know who buys the product.

We collect data for measurement.

How do we collect the data?

Performance-based advertising

Pricing Model

Cost-Per-Mille (CPM)

Cost-Per-Click (CPC)

Cost-Per-Action (CPA) or Cost-Per-Order (CPO)

To Improve Profit

Display the ad with a high Click-Through Rate (CTR) * CPC, or Conversion Rate (CVR) * CPO

Estimating the probability of a click (conversion) is the central problem

Rule Based

Statistical Modeling (Machine Learning)
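The ranking rule above (show the ad with a high CTR * CPC) can be sketched as follows. This is a minimal illustration, not the system described in the talk: the ad ids, predicted CTRs, and CPC values are all made up.

```r
# Hypothetical candidate ads with made-up predicted CTRs and CPC prices.
ads <- data.frame(
  adid = c("A", "B", "C"),
  ctr  = c(0.010, 0.002, 0.005),  # predicted probability of a click
  cpc  = c(0.20, 1.50, 0.50)      # payment received per click
)

# Expected revenue per impression is CTR * CPC.
ads$expected_revenue <- ads$ctr * ads$cpc

# Display the ad with the highest expected revenue.
best <- ads[which.max(ads$expected_revenue), ]
print(best)
```

In this toy data, ad B is selected even though it has the lowest CTR, because its CPC is much higher. This is why the quality of the estimated click probability feeds directly into profit.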

System

Website

Ad Request

Recommendation

Ad Delivering

Website

Log Server

Model Fitting

Batch

Online

Rule Based

Let the advertiser select the target group

X

Statistical Modeling

We log the display and collect the response

Features

Ad

Channel

User

Features of Ad

Ad type

Text

Figure

Video

Ad Content

Fashion

Health

Game

Features of Channel

Visibility

Features of User

Sex

Age

Location

Behavior

Real Features

Zhang, Weinan and Yuan, Shuai and Wang, Jun and Shen, Xuehua. Real-Time Bidding Benchmarking with iPinYou Dataset

Know How vs. Know Why

We usually do not study the reason for a high CTR

A small improvement in accuracy implies a large improvement in profit

Predictive Analysis

Data

School

Static

Cleaned

Public

Commercial

Dynamic

Error

Private

Data Engineering

Impression

CLICK_TIME        CLIENT_IP      CLICKED ADID
2014/05/17 ...    2.17.x.x       133594
2014/05/17 ...    140.112.x.x    134811

Click

+
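Combining the impression log with the click log gives one labeled row per display. The sketch below assumes the two logs share a key; the `imp_id` column, the timestamps, and the rows are hypothetical and are not taken from the real logs.

```r
# Made-up impression log: one row per ad display.
impression <- data.frame(
  imp_id = 1:4,
  adid   = c(133594, 134811, 133594, 134811)
)

# Made-up click log: one row per click, keyed by the hypothetical imp_id.
click <- data.frame(
  imp_id     = c(2L, 3L),
  click_time = c("2000-01-01 00:00:01", "2000-01-01 00:00:07")
)

# Left join keeps every impression; impressions without a click get NA.
joined <- merge(impression, click, by = "imp_id", all.x = TRUE)

# The response for modeling: was this impression clicked?
joined$clicked <- !is.na(joined$click_time)
print(joined)
```

In practice the join key and the time window for matching a click to an impression are design decisions that depend on how the log server records events.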

Data Engineering with R
http://wush978.github.io/REngineering/

Automation of R Jobs

Convert R script to command line application

Learn modern tools such as Jenkins

Connections between multiple machines

Learn ssh

Logging

Linux tools: bash redirection, tee

R package: logging

R Error Handling

try, tryCatch
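A rough sketch of how these pieces fit together in one script, using base R only. The function names `parse_n` and `run_job`, the file name `job.R`, and the log format are hypothetical; the `logging` package could replace the `message()` calls.

```r
# Hypothetical job step: parse a numeric argument and fail loudly on bad input.
parse_n <- function(args) {
  n <- suppressWarnings(as.integer(args[1]))
  if (is.na(n)) stop("first argument must be an integer")
  n
}

# Run the job and convert any error into a log line plus an exit status.
run_job <- function(args) {
  tryCatch({
    n <- parse_n(args)
    message(sprintf("[%s] INFO processing %d records", format(Sys.time()), n))
    0L  # exit status on success
  }, error = function(e) {
    message(sprintf("[%s] ERROR %s", format(Sys.time()), conditionMessage(e)))
    1L  # non-zero status lets a scheduler detect the failure
  })
}

# In a script run as `Rscript job.R 1000`, one would end with:
# quit(status = run_job(commandArgs(trailingOnly = TRUE)))
status_ok  <- run_job("1000")
status_bad <- run_job("abc")
```

Because `message()` writes to stderr, the log can be captured on the command line with, for example, `Rscript job.R 1000 2>&1 | tee job.log`.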

Characteristics of Data

Rare Event

Large Amount of Categorical Features

Binning Numerical Features

Features are highly correlated

Some features occur frequently, some occur rarely
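Binning turns a numerical feature into a categorical one, which then becomes a set of binary indicator columns. A minimal sketch with `cut()`; the ages and the bin boundaries are made up for illustration.

```r
# Made-up numerical feature.
age <- c(15, 23, 37, 52, 68)

# Bin into left-closed intervals; the break points are arbitrary.
age_bin <- cut(age, breaks = c(0, 18, 35, 50, Inf), right = FALSE)

# One indicator column per bin (no intercept, so every bin gets a column).
X <- model.matrix(~ age_bin - 1)
print(table(age_bin))
print(X)
```

Each row of `X` contains exactly one 1, so the binned feature contributes one non-zero entry per instance regardless of how many bins there are.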

Common Statistical Model for CTR

Logistic Regression

Gradient Boosted Regression Tree

Check xgboost

Logistic Regression

Linear relationship with features

Fast prediction

(Relatively) Fast Fitting

Usually fit the model with L2 regularization
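To make the L2-regularized fit concrete, here is a small sketch on simulated data using `optim()` from base R. It is not the fitting procedure used in the talk: the data, the true coefficients, and the penalty weight `lambda` are assumptions, and the intercept is penalized too for brevity.

```r
set.seed(1)
n <- 200
X <- cbind(1, matrix(rnorm(n * 2), n, 2))   # intercept plus two features
w_true <- c(-1, 2, -1)                       # made-up true coefficients
y <- rbinom(n, 1, drop(plogis(X %*% w_true)))

# Negative log-likelihood of logistic regression plus an L2 penalty.
objective <- function(w, lambda) {
  eta <- drop(X %*% w)
  sum(log1p(exp(eta)) - y * eta) + lambda / 2 * sum(w^2)
}

# Gradient of the objective above.
gradient <- function(w, lambda) {
  p <- plogis(drop(X %*% w))
  drop(crossprod(X, p - y)) + lambda * w
}

fit <- function(lambda) {
  optim(rep(0, 3), objective, gradient, lambda = lambda, method = "BFGS")$par
}

w_plain <- fit(0)    # no regularization
w_ridge <- fit(10)   # L2 regularization shrinks the coefficients
print(rbind(w_plain, w_ridge))
```

The penalty pulls the coefficients toward zero, which helps when features are highly correlated or occur rarely.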

How large is the data?

Instances: 10^9

Binary features: 10^5

Subsampling

Sampling is useful for:

Data exploration

Code testing

Sampling might harm the accuracy (profit):

Rare event

Some features occur frequently and some occur rarely

We do not subsample data so far

Sampling

Olivier Chapelle et al. Simple and scalable response prediction for display advertising.
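One effect of subsampling with rare events can be seen in a small simulation. The sketch below keeps every click and only a fraction `r` of the non-clicks; the 1% click rate and `r = 0.1` are made-up values, and this is an illustration of the general idea rather than the exact procedure in the cited paper.

```r
set.seed(1)
n <- 1e5
y <- rbinom(n, 1, 0.01)           # rare event: about 1% clicks (made-up rate)

r <- 0.1                          # keep each non-click with probability r
keep <- y == 1 | runif(n) < r
y_sub <- y[keep]

odds <- function(p) p / (1 - p)
odds_full <- odds(mean(y))
odds_sub  <- odds(mean(y_sub))

# Subsampling non-clicks inflates the odds of a click by roughly 1 / r,
# so probabilities estimated on the subsample need recalibration.
print(c(full = odds_full, subsample = odds_sub, corrected = odds_sub * r))
```

Multiplying the subsample odds by `r` approximately recovers the odds in the full data; on the log-odds scale this corresponds to adding `log(r)` to the intercept of a logistic regression.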

Computation

Model Matrix

head(model.matrix(Species ~ ., iris))

Dense Matrix

10^9 instances

10^5 binary features

10^14 elements for model matrix

Size: 4 * 10^14 bytes

400 TB

In-memory access is about 10^3 times faster than on-disk access
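The size estimate above is simple arithmetic and can be checked directly. The 4 bytes per element follows the slide; note that R stores ordinary numeric values as 8-byte doubles, which would double the figure.

```r
n_instances <- 1e9
n_features  <- 1e5
bytes_per_element <- 4   # as assumed on the slide

# Every cell of a dense model matrix is stored, zero or not.
n_elements <- n_instances * n_features

size_bytes <- n_elements * bytes_per_element
size_tb    <- size_bytes / 1e12   # 1 TB = 10^12 bytes
print(size_tb)
```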

R and Large Scale Data

R cannot handle large scale data

R consumes lots of memory

Sparse Matrix

The number of non-zero entries can be estimated from the number of categorical variables
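The reasoning is that one-hot encoding puts exactly one non-zero entry per categorical variable in each row, so the non-zero count is about (number of instances) * (number of categorical variables), independent of the number of levels. A small check with the Matrix package; the two features and their level counts are hypothetical.

```r
library(Matrix)

set.seed(1)
n <- 1000
# Two hypothetical categorical features with 50 and 200 levels.
ad      <- sample(50, n, replace = TRUE)
channel <- sample(200, n, replace = TRUE)

# One-hot encode: columns 1..50 for ad, columns 51..250 for channel.
X <- sparseMatrix(
  i = rep(seq_len(n), 2),
  j = c(ad, 50 + channel),
  x = 1,
  dims = c(n, 250)
)

print(nnzero(X))   # n * 2 non-zero entries
print(length(X))   # n * 250 cells in the dense equivalent
```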

Sparse Matrix

Sparse matrix is useful for:

Large amount of categorical data

Text Analysis

Tag Analysis

R package: Matrix

m1
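As a hedged illustration of the Matrix package, `sparse.model.matrix()` builds the same design matrix as `model.matrix()` but stores only the non-zero entries. The data frame below is simulated, and the column names `ad_id` and `channel` are hypothetical.

```r
library(Matrix)

set.seed(1)
n <- 5000
d <- data.frame(
  ad_id   = factor(sample(300, n, replace = TRUE)),
  channel = factor(sample(100, n, replace = TRUE))
)

# Same formula, dense versus sparse storage.
dense  <- model.matrix(~ ad_id + channel, d)
sparse <- sparse.model.matrix(~ ad_id + channel, d)

print(dim(sparse))
print(object.size(dense), units = "MB")
print(object.size(sparse), units = "MB")
```

The gap between the two sizes grows with the number of factor levels, since the dense matrix stores every zero while the sparse one stores about three non-zero entries per row here (intercept, ad, channel).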