
Network A/B Testing: From Sampling to Estimation

Ya Xu‡

Joint work with Huan Gui† Anmol Bhasin‡ Jiawei Han†

† University of Illinois at Urbana-Champaign, Urbana ‡ LinkedIn Corporation

INTRODUCTION

A/B Testing

• Uniformly Random
• Control
• Treatment
• Average Treatment Effect

A/B Testing – Two parallel universes

• Two parallel universes

Parallel Universe 1 (control)

Parallel Universe 2 (treatment)

Real World (observations)

Assumption

Network A/B Testing

INTERACTIONS BETWEEN NODES IN NETWORKS

Examples
– Experiment on feed ranking algorithms
• Treatment feed algorithm ranks more relevant items higher
• Adam (treatment) clicks on a feed update (X)
• X shows up higher for Adam’s friend Ben (control)
• Ben (control) clicks on X
– Experiment on People You May Know recommendations
– …

Assumption: SUTVA
• SUTVA (Stable Unit Treatment Value Assumption)
– Treatment assignment vector
– Response function
• Each individual’s response is affected only by their own treatment assignment.
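The formulas on this slide did not survive extraction. As a sketch in standard notation (the symbols Z, Y_i, and N are assumed here, not taken from the slide), the assignment vector, the SUTVA restriction on the response function, and the average treatment effect can be written as:

```latex
\begin{align}
Z &= (Z_1, \dots, Z_N) \in \{0, 1\}^N
  && \text{(treatment assignment vector)} \\
Y_i(Z) &= Y_i(Z_i) \quad \text{for every unit } i
  && \text{(SUTVA)} \\
\mathrm{ATE} &= \frac{1}{N} \sum_{i=1}^{N}
  \bigl[ Y_i(Z = \mathbf{1}) - Y_i(Z = \mathbf{0}) \bigr]
\end{align}
```

Under SUTVA the last line reduces to the average of Y_i(1) − Y_i(0), which is what a difference in means between the treatment and control groups estimates.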

NETWORK A/B TESTING FRAMEWORK

Framework
1. Experimental Design
– Randomize assignment to minimize interactions
2. Experimental Analysis
– Adjust for network effect post experiment

Experimental Design
1. Partition the network/graph
2. Randomize at cluster level

• Minimize the links between clusters
• Minimize the interactions between treatment and control
• Minimize information leakage
• Smaller bias for ATE
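Step 2 can be sketched as follows. This is a minimal illustration, assuming the partition is already available as a node-to-cluster mapping; the function name `cluster_randomize`, the `treat_fraction` parameter, and the toy data are hypothetical and not from the slides.

```python
import random

def cluster_randomize(cluster_of, treat_fraction=0.5, seed=0):
    """Randomize at the cluster level: every node inherits its cluster's arm.

    cluster_of: dict mapping node -> cluster id (assumed given by step 1).
    Returns a dict mapping node -> 1 (treatment) or 0 (control).
    """
    rng = random.Random(seed)
    clusters = sorted(set(cluster_of.values()))
    rng.shuffle(clusters)
    n_treated = round(len(clusters) * treat_fraction)
    treated = set(clusters[:n_treated])
    return {node: int(c in treated) for node, c in cluster_of.items()}

# Toy partition: 8 nodes in 4 clusters of equal size.
cluster_of = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2, "f": 2, "g": 3, "h": 3}
assignment = cluster_randomize(cluster_of)
```

Because whole clusters are assigned together, only the edges that cross clusters can connect a treated user to a control user.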

Balanced Graph Partition

• If all clusters are the same size, the covariance is zero no matter what users’ responses are, which leads to an unbiased estimator.

See Middleton and Aronow (2011) for the derivation.

Clustering Real Network
Heterogeneous & large scale (350MM+)

An employee network from LinkedIn

3-net clustering (Ugander et al., KDD’13)

Randomized Balanced Graph Partition

• Random Shuffling on Label Propagation
1. Randomly initialize clusters (equal size).
2. Select two nodes and swap their labels if it results in fewer edges between clusters.
3. Randomly shuffle x% of labels.
4. Repeat until convergence.

The random shuffle in step 3 helps the search break out of local optima.
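The four steps can be sketched in a single-machine form as below. This is an illustration under stated assumptions, not the authors' implementation: the names `rslp`, `cut_size`, `shuffle_frac`, `rounds`, and `swaps` are hypothetical, a fixed number of rounds stands in for "repeat until convergence", and the shuffle permutes labels among a random subset of nodes so that cluster sizes stay equal.

```python
import random

def cut_size(edges, label):
    """Number of edges whose endpoints carry different cluster labels."""
    return sum(1 for u, v in edges if label[u] != label[v])

def rslp(nodes, edges, k, shuffle_frac=0.05, rounds=10, swaps=2000, seed=0):
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    # Step 1: random initial clusters of (near-)equal size.
    label = {u: i % k for i, u in enumerate(order)}
    nbrs = {u: [] for u in order}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def local_cut(u):
        return sum(1 for w in nbrs[u] if label[w] != label[u])

    best, best_cut = dict(label), cut_size(edges, label)
    for _ in range(rounds):
        # Step 2: swap two labels only if that strictly reduces cut edges.
        for _ in range(swaps):
            u, v = rng.sample(order, 2)
            if label[u] == label[v]:
                continue
            before = local_cut(u) + local_cut(v)
            label[u], label[v] = label[v], label[u]
            if local_cut(u) + local_cut(v) >= before:
                label[u], label[v] = label[v], label[u]  # revert the swap
        current = cut_size(edges, label)
        if current < best_cut:
            best, best_cut = dict(label), current
        # Step 3: permute the labels of a random subset (sizes are preserved).
        picked = rng.sample(order, max(2, int(shuffle_frac * len(order))))
        shuffled = [label[u] for u in picked]
        rng.shuffle(shuffled)
        for u, lab in zip(picked, shuffled):
            label[u] = lab
    # Step 4 is approximated by the fixed number of rounds; return the best seen.
    return best

# Toy graph: two 4-cliques joined by a single bridge edge.
clique_a = ["a1", "a2", "a3", "a4"]
clique_b = ["b1", "b2", "b3", "b4"]
edges = [(u, v) for grp in (clique_a, clique_b)
         for i, u in enumerate(grp) for v in grp[i + 1:]]
edges.append(("a1", "b1"))
labels = rslp(clique_a + clique_b, edges, k=2)
```

Swapping labels, rather than relabeling a single node, is what keeps the partition balanced throughout the search.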

Clustering Results
• Network statistics

Nodes #    Edges #    Max Degree    Avg. Degree
7.26e4     2.88e6     3997          39.67

• Number of edges within clusters

Method              LP       RSLP     MM
# of edges (1e6)    2.161    2.355    2.359

Like the label propagation (LP) algorithm, RSLP can be easily distributed, while achieving performance comparable to modularity maximization (MM).

Experimental Analysis
• Exposure Models
– SUTVA
– Neighborhood Exposure (Ugander et al., KDD’13)

• Definition: i is neighborhood exposed to treatment if (1) i is in treatment, and (2) at least a fraction θ of i’s neighbors are in treatment

• Assumption: i’s response under neighborhood exposure is the same as if everyone receives treatment.
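The definition translates directly into a check per user. The sketch below assumes an adjacency-list graph and a 0/1 assignment dict; the function name and toy data are hypothetical, and treating a treated user with no neighbors as exposed is a choice made here, not something the slides specify.

```python
def neighborhood_exposed(node, neighbors, assignment, theta):
    """True if node is treated and at least a fraction theta of its
    neighbors are treated (a treated node without neighbors counts as exposed)."""
    if assignment[node] != 1:
        return False
    nbrs = neighbors[node]
    if not nbrs:
        return True
    treated = sum(assignment[w] for w in nbrs)
    return treated / len(nbrs) >= theta

# Toy graph: Adam is connected to Ben, Cara, and Dan.
neighbors = {
    "adam": ["ben", "cara", "dan"],
    "ben": ["adam"],
    "cara": ["adam"],
    "dan": ["adam"],
}
assignment = {"adam": 1, "ben": 0, "cara": 1, "dan": 1}

adam_loose = neighborhood_exposed("adam", neighbors, assignment, theta=0.3)
adam_strict = neighborhood_exposed("adam", neighbors, assignment, theta=0.9)
```

With two of three neighbors treated, Adam qualifies at θ = 0.3 but not at θ = 0.9, which is the tension between keeping data points and trusting the exposure assumption.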

Bias-Variance Tradeoff

• θ = 0.9: about 80% of data points would be invalid (high variance)
• θ = 0.3: stronger assumption, Yi(θ = 0.3) = Yi(θ = 1) (large bias)

Fraction Neighborhood Exposure
• Users’ responses are determined by
– the treatment assignment
– the fraction of neighbors having the same treatment assignment.

The response function can be an arbitrary function.

E.g., Additive Models
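The formulas for this model did not survive extraction. As a hedged sketch, the exposure model can be written with assumed notation: g for the response function, σ_i for the fraction of i's neighbors sharing i's assignment, and N(i) for i's neighbors. The additive form in the last line is a hypothetical illustration and is not necessarily the deck's Additive Model I or II.

```latex
\begin{align}
Y_i(Z) &= g(Z_i, \sigma_i), \qquad
\sigma_i = \frac{\lvert \{ j \in N(i) : Z_j = Z_i \} \rvert}{\lvert N(i) \rvert} \\
% everyone treated gives (Z_i, \sigma_i) = (1, 1); everyone in control gives (0, 1)
\mathrm{ATE} &= g(1, 1) - g(0, 1) \\
% hypothetical additive form with one intercept and one slope per arm
g(z, \sigma) = \alpha_z + \gamma_z \sigma
\;&\Longrightarrow\;
\mathrm{ATE} = (\alpha_1 + \gamma_1) - (\alpha_0 + \gamma_0)
\end{align}
```

Every user contributes to fitting g, whatever their observed fraction σ_i, so no data points are discarded as they are under a fixed threshold θ.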

Example
• Additive Model I
– ATE

Example
• Additive Model II
– ATE

SIMULATIONS & REAL EXPERIMENTS

Simulations
• Real network graph
• Generation model (Eckles et al. 2014)
• Compare bias & variance of five estimators

[Figure: bias and variance of the estimators as the treatment percentage increases]

[Figure: bias and variance of the estimators as the network effect increases]

Real Online Experiment
1. Select a country
2. Apply randomized balanced graph partitioning to assign treatment/control
3. Apply two Feed ranking algorithms to treatment/control
4. Estimate ATE using various approaches

Real Online Experiment
• Picked Netherlands
• 600 clusters, 300/300 in treatment/control
• Conducted A/A test to ensure no bias

Real Online Experiments: Results

Method                                ATE for social gesture
SUTVA                                 0.168
Network Exposure (θ = 0.75)           0.264
Hajek, Network Exposure (θ = 0.75)    0.625
Fraction Exposure (Additive I)        0.687
Fraction Exposure (Additive II)       0.714
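To make the first and third rows concrete, here is a minimal sketch of a difference-in-means estimate (the SUTVA row) and a Hajek-style weighted estimate under an exposure condition. It assumes each unit's probability of being exposed is already known from the randomization design; the function names and the toy numbers are hypothetical and unrelated to the results in the table.

```python
def difference_in_means(y, z):
    """Mean response of treated units minus mean response of control units."""
    treated = [yi for yi, zi in zip(y, z) if zi == 1]
    control = [yi for yi, zi in zip(y, z) if zi == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def hajek_mean(y, exposed, prob):
    """Inverse-probability-weighted mean over exposed units, with the
    weights normalized by their sum (the Hajek normalization)."""
    num = sum(yi / p for yi, e, p in zip(y, exposed, prob) if e)
    den = sum(1 / p for e, p in zip(exposed, prob) if e)
    return num / den

def hajek_ate(y, exposed_t, prob_t, exposed_c, prob_c):
    """Hajek mean under treatment exposure minus Hajek mean under control exposure."""
    return hajek_mean(y, exposed_t, prob_t) - hajek_mean(y, exposed_c, prob_c)

# Toy data: four units, two exposed to treatment and two exposed to control.
y = [3.0, 2.0, 1.0, 1.0]
z = [1, 1, 0, 0]
exposed_t = [True, True, False, False]
prob_t = [0.5, 0.25, 0.5, 0.5]   # assumed probabilities of treatment exposure
exposed_c = [False, False, True, True]
prob_c = [0.5, 0.5, 0.5, 0.25]   # assumed probabilities of control exposure

naive = difference_in_means(y, z)
weighted = hajek_ate(y, exposed_t, prob_t, exposed_c, prob_c)
```

Units that were unlikely to end up exposed receive larger weights, and dividing by the sum of weights rather than by the number of units is what distinguishes the Hajek form from plain inverse probability weighting.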

Key Takeaways
• Network effect in A/B Testing
• Experimental Design: Balanced Graph Partition
• Experimental Analysis: Fraction Neighborhood Exposure Model
• Experiments
– Simulation
– Real Online Experiments
• Lots of future work!

Percentage of Units in Treatment

• The distribution of changes with percentage of units in treatment.

is not representative.

Graph Cluster Randomization (Ugander et al., KDD’13)

• Partition the social network
– How to cluster? Any constraints?
• Randomization on the cluster level
– Users in the same cluster receive the same treatment assignment (treatment/control).
• Estimate Average Treatment Effect
– Any assumptions?
