Website Optimization Problem and Its Solutions Shuhei Iitsuka and Yutaka Matsuo The University of Tokyo

Website Optimization Problem and Its Solutions


Page 1: Website Optimization Problem and Its Solutions

Website Optimization Problem and Its Solutions

Shuhei Iitsuka and Yutaka Matsuo The University of Tokyo

Page 2: Website Optimization Problem and Its Solutions

“Website Optimization Problem and Its Solutions” Shuhei Iitsuka and Matsuo Yutaka, The University of Tokyo. KDD 2015.

A/B testing is powerful.

ref. How Obama Raised $60 million by Running a Simple Experiment http://blog.optimizely.com/2010/11/29/how-obama-raised-60-million-by-running-a-simple-experiment/

Sign-up rate: 8.3% → 11.6%

$60M!

Page 3: Website Optimization Problem and Its Solutions

Sample size is power.

Page 4: Website Optimization Problem and Its Solutions

See the wood first.

Initialization Phase: See the wood first.
Local Search Phase: Search the neighbors.

Page 5: Website Optimization Problem and Its Solutions

Agenda

1. Related Studies
2. Website Optimization Problem
3. Proposed Testing Method
4. Experimental Results
5. Discussion & Conclusion

Page 6: Website Optimization Problem and Its Solutions

Past Studies

Giants are making profits from online testing with a large number of users.

However, how can we use it for smaller websites?

Page 7: Website Optimization Problem and Its Solutions

Existing Testing Methods

[Figure: diagrams of four existing testing methods: A/B Testing, Full Factorial Design, Fractional Factorial Design, and Bandit Algorithm]

Page 8: Website Optimization Problem and Its Solutions

Agenda

1. Related Studies
2. Website Optimization Problem
3. Proposed Testing Method
4. Experimental Results
5. Discussion & Conclusion

Page 9: Website Optimization Problem and Its Solutions

Expression of a Variation

A website variation can be denoted as a combination of elements.

[Figure: a variation shown as a tuple of three page elements, with example text values such as “GET INVOLVED” and “CHANGE”]

Website Variation: $x = (x_1, \cdots, x_m)$
Page Element: $x_i \in V_i$

→ The problem can be defined as a combinatorial optimization problem.
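The tuple notation maps directly onto a Cartesian product of the value sets $V_i$. The following is a minimal sketch, assuming three hypothetical page elements; the element names and most values are illustrative and do not come from the slides.

```python
from itertools import product

# Hypothetical page elements; each list plays the role of one value set V_i.
elements = {
    "headline": ["GET INVOLVED", "CHANGE"],
    "button_label": ["SIGN UP", "LEARN MORE", "JOIN US NOW"],
    "button_color": ["red", "blue"],
}

# The solution space X is the Cartesian product V_1 x ... x V_m.
solution_space = list(product(*elements.values()))

print(len(solution_space))   # 2 * 3 * 2 = 12 variations
print(solution_space[0])     # ('GET INVOLVED', 'SIGN UP', 'red')
```

The size of the space grows multiplicatively with each added element, which is why exhaustive testing becomes impractical for sites with little traffic.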

Page 10: Website Optimization Problem and Its Solutions

Interaction with Users

$x$: Website Variation
$y$: User Behavior

The evaluation value needs to be estimated from the given feedback:

$f(x) \simeq E[y \mid x]$ where $y \sim p(y \mid x)$

→ The evaluation function is estimated by the expected value.
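In practice the expected value is approximated from logged feedback. A minimal sketch, assuming the log is a list of (variation, feedback) pairs and that the plain sample mean is used as the estimator (the slides do not prescribe one):

```python
from collections import defaultdict

def estimate_f(observations):
    """Estimate f(x) ~ E[y|x] as the mean feedback observed for each variation x."""
    feedback = defaultdict(list)
    for x, y in observations:
        feedback[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in feedback.items()}

# Hypothetical log of (variation, clicked?) pairs.
log = [(("A", "L"), 1), (("A", "L"), 0), (("B", "R"), 0), (("B", "R"), 0)]
print(estimate_f(log))  # {('A', 'L'): 0.5, ('B', 'R'): 0.0}
```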

Page 11: Website Optimization Problem and Its Solutions

Website Optimization Problem

Find the solution $x^*$ which satisfies the following equation.

$x^* = \arg\max_{x \in X} E[y \mid x] \quad \text{s.t.} \quad y \sim p(y \mid x)$

• $x^*$ maximizes the conditional expected value of the key metrics.
• $y$ is derived from the probability distribution.

Page 12: Website Optimization Problem and Its Solutions

Local Search Solution

1. Initialization: $x \in X$
2. Repeat until no improvement is made or all samples have been used.
   2-1. Neighbor Solution Generation: $X' \leftarrow \mathrm{Neighbors}(x)$
   2-2. Solution Move: $x \leftarrow \mathrm{Move}(x, X')$
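The local search loop can be sketched as a simple hill climber. This is an illustration under assumptions: a neighbor is taken to be any variation that differs in exactly one element, the evaluation is noise-free, and the sample-budget stopping condition is omitted. None of these choices is stated on the slide.

```python
import random

def neighbors(x, domains):
    """All variations that differ from x in exactly one element."""
    result = []
    for i, values in enumerate(domains):
        for v in values:
            if v != x[i]:
                result.append(x[:i] + (v,) + x[i + 1:])
    return result

def local_search(domains, evaluate, rng):
    """Hill climbing: move to the best neighbor until no neighbor improves."""
    x = tuple(rng.choice(values) for values in domains)   # 1. Initialization
    while True:
        candidates = neighbors(x, domains)                # 2-1. Neighbor generation
        best = max(candidates, key=evaluate)
        if evaluate(best) <= evaluate(x):                 # no improvement: stop
            return x
        x = best                                          # 2-2. Solution move

domains = [(0, 1, 2)] * 3
score = lambda x: x[0] - x[1] + x[2]   # hypothetical noise-free evaluation
print(local_search(domains, score, random.Random(0)))  # (2, 0, 2)
```

With real user feedback, `evaluate` would be a noisy estimate, so each comparison consumes samples; that cost is what the later slides try to reduce.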

Page 13: Website Optimization Problem and Its Solutions

Agenda

1. Related Studies
2. Website Optimization Problem
3. Proposed Testing Method
4. Experimental Results
5. Discussion & Conclusion

Page 14: Website Optimization Problem and Its Solutions

Organization of Existing Testing Methods

Method | Search Algorithm | Technique
A/B Testing | Local Search | None
Full Factorial Design | Brute-force Search | None
Fractional Factorial Design | Brute-force Search | Linear Assumption
Bandit Algorithm | Brute-force Search | Flexible Sample Allocation

Page 15: Website Optimization Problem and Its Solutions

Technique #1: Linear Assumption

[Figure: a variation described by three elements, Color, Label, and Location; example values shown: A, B, C, L, R]

$x = (x_1, x_2, x_3)$
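One way to read the linear assumption is that each element contributes to the outcome independently, so the best value of each element can be chosen separately. The sketch below assumes a simple main-effects estimator (mean feedback per element value); the slides do not give the exact estimator, and the data is invented.

```python
from collections import defaultdict

def linear_assumption(observations, m):
    """Pick, for each of the m elements, the value with the highest mean feedback.

    Assumes elements contribute additively (no interactions between elements).
    """
    totals = [defaultdict(float) for _ in range(m)]
    counts = [defaultdict(int) for _ in range(m)]
    for x, y in observations:
        for i, value in enumerate(x):
            totals[i][value] += y
            counts[i][value] += 1
    return tuple(
        max(totals[i], key=lambda v: totals[i][v] / counts[i][v])
        for i in range(m)
    )

# Hypothetical observations: x = (color, label, location), y = feedback.
data = [
    (("red", "A", "L"), 3.0),
    (("blue", "B", "L"), 1.0),
    (("red", "B", "R"), 4.0),
    (("blue", "C", "R"), 2.0),
]
print(linear_assumption(data, 3))  # ('red', 'A', 'R')
```

Note that the returned combination was never observed directly; the estimate is assembled from per-element averages, which is what lets this approach work with few samples.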

Page 16: Website Optimization Problem and Its Solutions

Technique #2: Flexible Sample Allocation

[Figure: expected values of four variations: 3.2%, 2.4%, 5.6%, 1.6%]

Page 17: Website Optimization Problem and Its Solutions

Racing Algorithm

Another implementation of Flexible Sample Allocation.

[Figure: click-through rate (vertical axis) of variations A to E (horizontal axis); variations are marked Remove or Adopt]
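The remove/adopt idea can be sketched with confidence intervals. The version below is a Hoeffding-style race, assuming rewards in [0, 1] and a union-bound confidence radius; the slide does not say which statistical test the authors use, so treat the bound and the parameter names as assumptions.

```python
import math
import random

def race(arms, pull, rounds, delta=0.05):
    """Hoeffding-style race for rewards in [0, 1].

    Each round samples every surviving arm once, then removes any arm whose
    upper confidence bound falls below the best lower confidence bound.
    """
    alive = list(arms)
    sums = {a: 0.0 for a in arms}
    n = 0                                  # samples drawn per surviving arm
    for _ in range(rounds):
        n += 1
        for a in alive:
            sums[a] += pull(a)
        radius = math.sqrt(math.log(2 * len(arms) * rounds / delta) / (2 * n))
        best_lower = max(sums[a] / n for a in alive) - radius
        alive = [a for a in alive if sums[a] / n + radius >= best_lower]
        if len(alive) == 1:
            break                          # adopt the single survivor
    return max(alive, key=lambda a: sums[a] / n)

rng = random.Random(0)
ctr = {"A": 0.10, "B": 0.30, "C": 0.60}   # hypothetical true click-through rates
winner = race(ctr, lambda a: float(rng.random() < ctr[a]), rounds=2000)
print(winner)
```

Clearly inferior variations stop receiving traffic early, so the remaining samples are spent on the candidates that are still hard to distinguish.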

Page 18: Website Optimization Problem and Its Solutions

Overview of Proposed Method

Initialization Phase
• Collects data from the entire solution space.
• Estimates the optimal solution with linear assumption.

Local Search Phase
• Starts local search from the estimated solution.

+ streamlined by flexible sample allocation

Page 19: Website Optimization Problem and Its Solutions

Agenda

1. Related Studies
2. Website Optimization Problem
3. Proposed Testing Method
4. Experimental Results
5. Discussion & Conclusion

Page 20: Website Optimization Problem and Its Solutions

Evaluation Experiments

1. Simulation Experiment / Artificial Problem
2. Simulation Experiment / Actual Large-scale Website
3. Practical Experiment / Actual Small-scale Website

Page 21: Website Optimization Problem and Its Solutions

Testing Methods

Method | Initialization | Local Search
BF (Brute-force) | Random | N/A
LA (Linear Assumption) | Linear Assumption | N/A
LS (Local Search) | Random | Local Search
LALS (Linear Assumption + Local Search) | Linear Assumption | Local Search
LALS+ (LALS + Racing Algorithm) | Linear Assumption + Flexible Allocation | Local Search + Flexible Allocation

[Row groups on the slide: Baseline, Proposal]

Page 22: Website Optimization Problem and Its Solutions

Exp. #1: Simulation on Artificial Problems

Problem Settings

Problem | Evaluation Function $f(x)$ | Sample Size $N$
#1 | Linear | Init. Only
#2 | Linear | Init. + Local Search
#3 | Non-Linear | Init. Only
#4 | Non-Linear | Init. + Local Search

Linear Evaluation Function: $f_1(x) = x_1 + x_2 + x_3 - x_4 - x_5 - x_6 + N(0, 1)$
Non-Linear Evaluation Function: $f_2(x) = x_1 + x_2 + x_3 - x_4 - x_5 - x_6 - x_1 x_2 + N(0, 1)$

where $x_i \in \{0, 1, 2\}$, $x_1 x_2$ is the non-linear member, and $N(0, 1)$ is the noise.
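The two artificial evaluation functions are small enough to enumerate. The sketch below assumes that the operators missing from the extracted slide text are minus signs (the plus signs survived extraction), and it evaluates the noise-free part unless a random generator is passed in.

```python
import random
from itertools import product

def f1(x, rng=None):
    """Linear evaluation function; pass rng to add N(0, 1) noise."""
    noise = rng.gauss(0, 1) if rng else 0.0
    return x[0] + x[1] + x[2] - x[3] - x[4] - x[5] + noise

def f2(x, rng=None):
    """Non-linear evaluation function with the interaction term x1 * x2."""
    return f1(x, rng) - x[0] * x[1]

X = list(product((0, 1, 2), repeat=6))      # 3^6 = 729 candidate solutions
best1 = max(X, key=f1)                      # noise-free optimum of f1
best2 = max(X, key=f2)                      # one noise-free optimum of f2
print(len(X), best1, f1(best1))             # 729 (2, 2, 2, 0, 0, 0) 6.0
print(best2, f2(best2), f2(best1))          # (0, 2, 2, 0, 0, 0) 4.0 2.0
```

Under this reading, the optimum of the linear function is no longer optimal once the interaction term is present, which is the situation where a purely linear estimate needs a follow-up local search.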

Page 23: Website Optimization Problem and Its Solutions

Exp. #1 Results

Each method is evaluated by the accuracy of the estimated optimal solution.

Problem | BF | LA | LS | LALS | LALS+
#1 (Linear/Small) | 0.24 | 1.00 | 0.00 | 1.00 | 1.00
#2 (Linear/Large) | 0.54 | 1.00 | 0.01 | 1.00 | 1.00
#3 (Non-Linear/Small) | 0.26 | 0.14 | 0.01 | 0.22 | 0.22
#4 (Non-Linear/Large) | 0.46 | 0.26 | 0.02 | 0.33 | 0.68

[Column groups on the slide: Baseline, Proposal]

• Linear assumption works well with the linear evaluation function.
• Flexible sample allocation boosts the local search.

Page 24: Website Optimization Problem and Its Solutions

Exp. #2: Simulation on a Large Website

• Actual large-scale website with 1,000 to 10,000 visitors/day: SPYSEE http://spysee.jp
• Key metric: Ads click-through rate (CTR)
• The evaluation function is simulated from the log (Mar 14-22, 2013).

[Figure: three candidate changes A, B, and C] Which one maximizes CTR?

$q(x) = 0.0640 + 0.0117 x_A - 0.0067 x_B - 0.0134 x_C$

$x_i \in \{0, 1\}$ (apply the change or not)
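With only three binary changes, the simulated evaluation function can be checked by enumeration. The sketch assumes the two coefficients without a visible sign in the slide text are negative; the plus sign on the first coefficient is printed in the source.

```python
from itertools import product

def q(x_a, x_b, x_c):
    """Simulated CTR; the two minus signs are an assumption about the lost operators."""
    return 0.0640 + 0.0117 * x_a - 0.0067 * x_b - 0.0134 * x_c

# Enumerate all 2^3 = 8 combinations of applying or not applying each change.
best = max(product((0, 1), repeat=3), key=lambda x: q(*x))
print(best, round(q(*best), 4))  # (1, 0, 0) 0.0757
```

Under that assumption the best variation applies change A only, and a simulated visitor would click with probability `q(*x)`.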

Page 25: Website Optimization Problem and Its Solutions

Exp. #2 Results

[Figure: Average accuracy of each algorithm. Accuracy (0.25 to 1.00) versus sample size n (0 to 30,000) for LALS+, LALS, LS, LA, and BF, with the Init. and Local Search phases marked]

• LA exhibits the best performance because the evaluation function is linear.
• Our proposed methods succeed in starting the local search from a promising initial solution.
• LALS+ can improve the performance rapidly with the flexible sample allocation.

Page 26: Website Optimization Problem and Its Solutions

Exp. #3: Practical Test on a Small Website

• Implemented our proposed method as an optimizer program.
• Actual small-scale website with hundreds of visitors/day.
• LS (Baseline) vs. LALS (Proposal)
• Key metric: Page views per session

Imagerous* http://imagero.us

Tested Elements

Element | Values
Thumbnail border width | 0px, 5px
Thumbnail margin | 0px, 5px, 10px
Thumbnail size | 100px, 200px, 300px
Thumbnail shape | square, circle

Page 27: Website Optimization Problem and Its Solutions

Exp. #3: Results

• LALS reached a solution with a 57% higher expected value than LS (t-test: 99% confidence).
• Our proposed method functions as a practical optimizer program with an actual small-scale website.

[Figure: Transition of the current solution and the expected value. Expected value E[y|x] (0 to 8) versus sample size n (0 to 700) for LS and LALS; the gap between them is annotated as 57%]

Page 28: Website Optimization Problem and Its Solutions

Agenda

1. Related Studies
2. Website Optimization Problem
3. Proposed Testing Method
4. Experimental Results
5. Discussion & Conclusion

Page 29: Website Optimization Problem and Its Solutions

From Bits to Atoms

Requirements

• Each solution is expressed as a combination of elements.
• Reconfiguration cost is zero. ex.) 3D printers
• User feedback is observable. ex.) Review website

Page 30: Website Optimization Problem and Its Solutions

Conclusion

• We formalized existing testing methods and a website optimization problem.
• We proposed a new rapid testing method which works on small-scale websites.
• We verified that our proposed method works on an actual small-scale website.

Page 31: Website Optimization Problem and Its Solutions

Future Works

Website Optimization Process

1. Make a Hypothesis: $x \in X$
   How do we define our website as a set of variables? How can we automate the generation of candidates?
2. Define Metrics: $f(x)$
   Which key metrics do we need to focus on for effective experiments?
3. Explore the Solution: $x^* = \arg\max_{x \in X} f(x)$
   We’ve tackled this!

Page 32: Website Optimization Problem and Its Solutions

“Website Optimization Problem and Its Solutions (Paper ID:516)” Shuhei Iitsuka and Matsuo Yutaka, The University of Tokyo

Shuhei Iitsuka, The University of Tokyo. tushuhei.com

[email protected]

Thank you for listening.


Page 33: Website Optimization Problem and Its Solutions

Appendix

Page 34: Website Optimization Problem and Its Solutions

X: Candidate Solutions
Y ← ∅: Empty Set for Observed Data, n ← 0: Number of Observations.
N_1: Sample Size for Initialization Phase, N_2: Sample Size for Local Search Phase.

Initialization:
FOR N_1 TIMES:
    Y ← Observe(RandomChoice(X))
    n++
x* ← LinearAssumption(Y)

Local Search:
WHILE n < N DO:
    x’ ← GetNeighborSolution(x*, X)
    FOR N_2 TIMES:
        Y ← Observe(x’)
        n++
    x* ← Update(x*, x’, Y)
RETURN x*

+ Streamlined by flexible allocation
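The pseudocode can be turned into a small runnable sketch. Several details below are assumptions rather than the authors' implementation: the linear assumption is approximated by per-element mean feedback, a neighbor differs in one randomly chosen element, and each local search step samples both the incumbent and the candidate (the pseudocode samples only the candidate) so that the comparison uses direct observations of both. `budget` stands in for the total sample size N.

```python
import random
from itertools import product

def lals(domains, observe, n1, n2, budget, rng):
    """Two-phase search: linear-assumption initialization, then local search."""
    X = list(product(*domains))
    data = {}                                   # variation -> list of feedback

    def sample(x):
        data.setdefault(x, []).append(observe(x))

    def mean(x):
        return sum(data[x]) / len(data[x])

    # Initialization phase: random observations over the whole solution space.
    for _ in range(n1):
        sample(rng.choice(X))

    def effect(i, v):                           # mean feedback when element i == v
        ys = [y for x, obs in data.items() if x[i] == v for y in obs]
        return sum(ys) / len(ys) if ys else float("-inf")

    best = tuple(max(vals, key=lambda v: effect(i, v))
                 for i, vals in enumerate(domains))
    n = n1

    # Local search phase: compare the incumbent with one random neighbor at a time.
    while n + 2 * n2 <= budget:
        i = rng.randrange(len(domains))
        v = rng.choice([u for u in domains[i] if u != best[i]])
        cand = best[:i] + (v,) + best[i + 1:]
        for _ in range(n2):
            sample(best)
            sample(cand)
        n += 2 * n2
        if mean(cand) > mean(best):
            best = cand
    return best

rng = random.Random(0)
noisy = lambda x: x[0] - x[1] + x[2] + rng.gauss(0, 1)   # hypothetical feedback
print(lals([(0, 1, 2)] * 3, noisy, n1=300, n2=20, budget=1000, rng=rng))
```

The flexible allocation step (racing) is left out here to keep the two phases easy to follow.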

Page 35: Website Optimization Problem and Its Solutions

DOE and Linear Assumption

• DOE (Design of Experiments) is used in traditional industries, where reconfiguring the environment incurs a huge cost.
• Websites require no cost to change the parameters. → We can conduct random observation, then apply ANOVA to estimate each element’s effect.

Design of Experiments: Design beforehand.
Linear Assumption: Random collection first (zero reconfiguration cost).

Page 36: Website Optimization Problem and Its Solutions

Webpage Segmentation

[Figure: excerpt from the VIPS report. Figure 1. The layout structure and vision-based content structure of an example page. (d) and (e) show the corresponding specification of vision-based content structure.]

Since each $\Omega_i$ is a sub-web-page of the original page, it has similar content structure as $\Omega$. Recursively, we have $\Omega_s^t = (O_s^t, \Phi_s^t, \delta_s^t)$, $O_s^t = \{\Omega_{st}^1, \Omega_{st}^2, \ldots, \Omega_{st}^{N_{st}}\}$, ...

Cai, Deng, et al. VIPS: a vision-based page segmentation algorithm. Microsoft technical report, MSR-TR-2003-79, 2003.

Page 37: Website Optimization Problem and Its Solutions

Page Element Extraction

WELCOME!

JOIN NOW

Background: WHITE, BLACK

Button Color: WHITE, BLACK

Strong Interactive Effect?

Page 38: Website Optimization Problem and Its Solutions

Bandit Algorithm

• ε-greedy
  • ε: exploration, 1 − ε: exploitation
• Softmax: High expected value → High exploitation rate
• UCB1: Expected value + Freshness bonus

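[...] the algorithms known by this name are described here.

• epsilon-greedy: In the epsilon-greedy algorithm, a parameter 0 < ε < 1 is set in advance to a very small value. The algorithm explores with probability ε and exploits with probability 1 − ε. When exploitation is selected, it shows the solution with the largest expected evaluation value. When exploration is selected, it shows one solution chosen at random from the feasible solutions. By alternating between exploration and exploitation in this way, it evaluates solutions that have not yet been evaluated while preferentially showing solutions with a large expected value, which limits the loss caused by the experiment. Although it is easy to implement, it chooses between exploration and exploitation regardless of the expected values of the solutions, so it may show a solution with a low expected value even when the expected values differ greatly.

• Softmax: The Softmax algorithm changes the probability of showing a solution according to its expected evaluation value. Let $X$ be the solution space and $y_x$ the expected evaluation value of a solution $x \in X$ computed from the observed data. The probability $p(x)$ of showing solution $x$ to a user is given by Eq. 2.1.

$$p(x) = \frac{\exp(y_x / \tau)}{\sum_{x' \in X} \exp(y_{x'} / \tau)} \quad (2.1)$$

$\tau$ is a parameter called the temperature and represents the strength of exploration. When the temperature is very high, that is, as $\tau \to \infty$, the probability $p(x)$ converges to $1/|X|$, so all solutions are chosen with equal probability. Conversely, when the temperature is low, $y_x$ starts to take effect, and the solution with the highest expected evaluation value is chosen with a probability close to 1.

• UCB1: Unlike the algorithms introduced so far, UCB1 does not use randomness. UCB1 basically chooses the solution with the highest expected evaluation value $y_x$, but adds a bonus that depends on the number of times the solution has been chosen. Let $t_x$ be the number of times solution $x \in X$ has been shown. The UCB value $u_x$ of solution $x$ is computed as

$$u_x = y_x + \sqrt{\frac{2 \log\left(\sum_{x' \in X} t_{x'}\right)}{t_x}}$$

and the solution $x$ that maximizes this UCB value is selected.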
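The three selection rules can be sketched in a few lines each. The expected values reuse the four percentages from the flexible sample allocation slide; the arm names and display counts are hypothetical, and `ucb1` assumes every arm has been shown at least once.

```python
import math
import random

def epsilon_greedy(means, epsilon, rng):
    """Explore (random arm) with probability epsilon, otherwise exploit the best arm."""
    if rng.random() < epsilon:
        return rng.choice(list(means))
    return max(means, key=means.get)

def softmax_probs(means, tau):
    """p(x) = exp(y_x / tau) / sum over x' of exp(y_x' / tau)."""
    weights = {x: math.exp(y / tau) for x, y in means.items()}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}

def ucb1(means, counts):
    """Choose the arm maximizing y_x + sqrt(2 * log(sum of t_x) / t_x)."""
    total = sum(counts.values())
    return max(means, key=lambda x: means[x] + math.sqrt(2 * math.log(total) / counts[x]))

means = {"A": 0.032, "B": 0.024, "C": 0.056, "D": 0.016}
counts = {"A": 100, "B": 100, "C": 100, "D": 5}
print(epsilon_greedy(means, 0.0, random.Random(0)))   # C (pure exploitation)
print(ucb1(means, counts))                            # D (rarely shown, large bonus)
```

With a very large `tau`, `softmax_probs` returns nearly uniform probabilities, matching the limit described for Eq. 2.1.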

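Bandit algorithms can be described as a framework that, while running an experiment, filters out solutions with low evaluation values in the course of that experiment and gradually shifts to a state in which the optimal solution is always shown. In other words, once the hypothesis patterns and the algorithm have been set up, optimization can proceed automatically without human intervention [23]. On the top page of Amazon.com*8, a similar method is used ...

*8 Amazon http://www.amazon.com/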
