
Modern Computational Statistics

Lecture 20: Applications in ComputationalBiology

Cheng Zhang

School of Mathematical Sciences, Peking University

December 09, 2019


Introduction

• While modern statistical approaches have been quite successful in many application areas, there are still challenging areas where complex model structures make it difficult to apply those methods.

• In this lecture, we will discuss some recent advances in statistical approaches for computational biology, with an emphasis on evolutionary models.


Challenges in Computational Biology

[Figure omitted; adapted from Narges Razavian, 2013.]


Phylogenetic Inference

The goal of phylogenetic inference is to reconstruct the evolutionary history (e.g., phylogenetic trees) from molecular sequence data (e.g., DNA, RNA, or protein sequences).

Molecular Sequence Data

Taxa        Characters
Species A   ATGAACAT
Species B   ATGCACAC
Species C   ATGCATAT
Species D   ATGCATGC

[Figure: a phylogenetic tree relating species A, B, C, and D.]

There are lots of modern biological and medical applications: predicting the evolution of influenza viruses to help vaccine design, etc.


Example: B Cell Evolution

This happens inside of you! These inferences guide rational vaccine design.


Bayesian Phylogenetics

[Figure: four aligned sequences ATGAAC···, ATGCAC···, ATGCAT···, ATGCAT··· observed at the leaves y1, ..., y4 of a tree (τ, q) with internal nodes y5, y6; e denotes an edge from a parent node pa to a child node ch.]

Evolution model: p(ch | pa, q_e), where q_e is the amount of evolution on edge e.

Likelihood:

p(Y | τ, q) = ∏_{i=1}^{M} ∑_{aⁱ} η(aⁱ_ρ) ∏_{(u,v)∈E(τ)} P_{aⁱ_u aⁱ_v}(q_{uv}),

where M is the number of sites, the sum ranges over the extensions aⁱ of the observed states at site i to the internal nodes, η is the state distribution at the root ρ, and P_{aⁱ_u aⁱ_v}(q_{uv}) is the transition probability along edge (u, v).

Given a proper prior distribution p(τ, q), the posterior is

p(τ, q | Y) ∝ p(Y | τ, q) p(τ, q).

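The per-site sum over internal-node states in the likelihood above can be computed efficiently by Felsenstein's pruning algorithm. Below is a minimal sketch in Python for a single site, assuming a Jukes-Cantor substitution model and a uniform root distribution η; the tree, branch lengths, and function names are illustrative, not from the lecture.

```python
import numpy as np

def jc69_transition(t):
    """Jukes-Cantor transition probability matrix for branch length t."""
    same = 0.25 + 0.75 * np.exp(-4.0 * t / 3.0)
    diff = 0.25 - 0.25 * np.exp(-4.0 * t / 3.0)
    P = np.full((4, 4), diff)
    np.fill_diagonal(P, same)
    return P

def site_likelihood(node, children, branch_len, leaf_state):
    """Felsenstein pruning: partial likelihoods L[a] = p(tip data below node | state a)."""
    if node in leaf_state:                      # leaf: indicator on the observed state
        L = np.zeros(4)
        L[leaf_state[node]] = 1.0
        return L
    L = np.ones(4)
    for child in children[node]:
        P = jc69_transition(branch_len[child])  # q_e on the edge above `child`
        L *= P @ site_likelihood(child, children, branch_len, leaf_state)
    return L

# Toy tree ((A,B)u,(C,D)v)rho for one site; state coding: A=0, C=1, G=2, T=3
children = {"rho": ["u", "v"], "u": ["A", "B"], "v": ["C", "D"]}
branch_len = {"A": 0.1, "B": 0.1, "C": 0.1, "D": 0.1, "u": 0.05, "v": 0.05}
leaf_state = {"A": 0, "B": 0, "C": 1, "D": 3}
eta = np.full(4, 0.25)                          # uniform root distribution
lik = eta @ site_likelihood("rho", children, branch_len, leaf_state)
print(lik)  # single-site likelihood p(Y_i | tau, q)
```

The full likelihood p(Y | τ, q) would multiply such per-site terms over all M sites (in practice, summing log-likelihoods).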

Markov chain Monte Carlo

Random-walk MCMC (MrBayes, BEAST):

• Uses simple random perturbations (e.g., Nearest Neighbor Interchange, NNI) to generate new states.

Challenges for MCMC

• Large search space: (2n − 5)!! unrooted trees for n taxa.

• The intertwined parameter space and low acceptance rates make it hard to scale to data sets with many sequences.
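A quick sketch of how fast the (2n − 5)!! count grows (for n = 8 taxa this already gives the 10395 topologies used in the simulated study later in the deck); the function name is my own.

```python
def num_unrooted_topologies(n):
    """(2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5): the number of distinct
    unrooted binary tree topologies on n >= 3 labeled taxa."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count

for n in (4, 8, 20, 50):
    print(n, num_unrooted_topologies(n))
# n = 50 already exceeds 10^74 topologies, far too many to enumerate.
```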


Variational Inference

[Figure: a variational family Q with member q^*(θ) approximating the exact posterior p(θ|x).]

q^*(θ) = argmin_{q∈Q} KL(q(θ) ‖ p(θ|x))

• VI turns inference into optimization.

• Specify a variational family of distributions over the model parameters: Q = {q_φ(θ); φ ∈ Φ}.

• Fit the variational parameters φ to minimize the distance (often measured by KL divergence) to the exact posterior.


Evidence Lower Bound

L(θ) = E_{q(θ)}[log p(x, θ)] − E_{q(θ)}[log q(θ)] ≤ log p(x)

• The KL divergence is intractable; we maximize the evidence lower bound (ELBO) instead, which only requires the joint probability p(x, θ).
  • The ELBO is a lower bound on log p(x).
  • Maximizing the ELBO is equivalent to minimizing the KL.

• The ELBO strikes a balance between two terms:
  • The first term encourages q to focus probability mass where the model puts high probability.
  • The second term encourages q to be diffuse.

• As an optimization approach, VI tends to be faster than MCMC, and is easier to scale to large data sets (via stochastic gradient ascent).
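As a concrete illustration of the ELBO above, here is a minimal Monte Carlo sketch for a toy model (Gaussian prior and likelihood on a scalar θ, Gaussian variational family); the model, data, and parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(2.0, 1.0, size=20)   # toy observed data

def log_joint(theta):
    """log p(x, theta): N(theta; 0, 10^2) prior, N(x_j; theta, 1) likelihood."""
    log_prior = -0.5 * (theta / 10.0) ** 2 - np.log(10.0 * np.sqrt(2 * np.pi))
    log_lik = np.sum(-0.5 * (x[:, None] - theta) ** 2
                     - 0.5 * np.log(2 * np.pi), axis=0)
    return log_prior + log_lik

def elbo(mu, sigma, n_samples=5000):
    """Monte Carlo ELBO: E_q[log p(x, theta)] - E_q[log q(theta)]."""
    theta = rng.normal(mu, sigma, size=n_samples)
    log_q = -0.5 * ((theta - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2 * np.pi))
    return np.mean(log_joint(theta) - log_q)

# A q centered near the posterior mean gives a higher (tighter) bound
# than a poorly placed q:
print(elbo(float(np.mean(x)), 0.22))
print(elbo(0.0, 1.0))
```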


Subsplit Bayesian Networks

Inspired by previous work (Höhna and Drummond 2012; Larget 2013), we can decompose trees into local structures and encode the tree topology space via Bayesian networks!

[Figure: a subsplit Bayesian network with nodes S1, ..., S7; example rooted trees on taxa A, B, C, D and their induced subsplit assignments, e.g., (ABC, D), (A, BC), and (AB, CD), with conditional probability 1.0 on each corresponding edge.]


Probability Estimation Over Tree Topologies

[Figure: the subsplit Bayesian network with nodes S1, ..., S7 and the subsplit assignments, e.g., (ABC, D), (A, BC), (AB, CD), induced by rooted trees on taxa A, B, C, D.]

Rooted Trees:

p_sbn(T = τ) = p(S1 = s1) ∏_{i>1} p(Si = si | S_{πi} = s_{πi}).


Probability Estimation Over Tree Topologies

[Figure: an unrooted tree on taxa A, B, C, D with edges labeled 1–5; rooting at different edges (root/unroot) yields different rooted trees and hence different subsplit assignments, e.g., (A, BCD) for edge 1 and (AB, CD) for edge 3, on the network S1, ..., S7.]

Unrooted Trees:

p_sbn(Tᵘ = τ) = ∑_{s1∼τ} p(S1 = s1) ∏_{i>1} p(Si = si | S_{πi} = s_{πi}),

where s1 ∼ τ means the root subsplit s1 is compatible with the unrooted topology τ.
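The rooted-tree factorization above is just a product of conditional probabilities read off the network. A minimal sketch, with made-up subsplit probability tables on taxa {A, B, C, D} (the numbers and helper names are illustrative, not the lecture's):

```python
# Root subsplit distribution p(S1) and child conditionals p(Si | parent subsplit).
# Subsplits are encoded as tuples of clade strings; probabilities are invented.
p_root = {("ABC", "D"): 0.6, ("AB", "CD"): 0.4}
p_cond = {
    (("A", "BC"), ("ABC", "D")): 1.0,
    (("B", "C"), ("A", "BC")): 1.0,
    (("A", "B"), ("AB", "CD")): 1.0,
    (("C", "D"), ("AB", "CD")): 1.0,
}

def sbn_prob(root_subsplit, child_assignments):
    """p_sbn(T = tau) = p(S1 = s1) * prod_{i>1} p(Si = si | S_pi = s_pi)."""
    prob = p_root[root_subsplit]
    for si, parent in child_assignments:
        prob *= p_cond[(si, parent)]
    return prob

# The caterpillar tree ((A,(B,C)),D), described by its subsplit assignment:
tau = [(("A", "BC"), ("ABC", "D")), (("B", "C"), ("A", "BC"))]
print(sbn_prob(("ABC", "D"), tau))  # 0.6
```

The unrooted case would additionally sum this quantity over all root subsplits s1 compatible with τ.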


Tree Probability Estimation via SBNs

SBNs can be used to learn a probability distribution from a collection of trees T = {T1, ..., TK}, with

Tk = {Si = si,k, i ≥ 1}, k = 1, ..., K.

Rooted Trees

• Maximum likelihood estimates: relative frequencies.

p_MLE(S1 = s1) = m_{s1} / K,  p_MLE(Si = si | S_{πi} = t) = m_{si,t} / ∑_{s∈Ci} m_{s,t},

where m_{s1} and m_{si,t} count the corresponding root subsplits and parent–child subsplit pairs in T.

Unrooted Trees

• Expectation Maximization:

p_EM^{(n+1)} = argmax_p E_{p(S1 | T, p_EM^{(n)})} [ log p(S1) + ∑_{i>1} log p(Si | S_{πi}) ].
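The relative-frequency MLE for rooted trees can be sketched as follows; the toy tree sample and subsplit encodings are hypothetical.

```python
from collections import Counter

# Each rooted tree is a list of (subsplit, parent_subsplit) pairs;
# the root subsplit has parent None. Toy sample of K = 4 trees:
trees = [
    [(("ABC", "D"), None), (("A", "BC"), ("ABC", "D"))],
    [(("ABC", "D"), None), (("A", "BC"), ("ABC", "D"))],
    [(("ABC", "D"), None), (("AB", "C"), ("ABC", "D"))],
    [(("AB", "CD"), None), (("A", "B"), ("AB", "CD"))],
]

K = len(trees)
root_counts = Counter(s for tree in trees for s, parent in tree if parent is None)
pair_counts = Counter((s, parent) for tree in trees for s, parent in tree
                      if parent is not None)
parent_totals = Counter(parent for s, parent in pair_counts.elements())

# p_MLE(S1 = s1) = m_{s1} / K
p_root = {s: m / K for s, m in root_counts.items()}
# p_MLE(Si = s | parent = t) = m_{s,t} / sum_s' m_{s',t}
p_cond = {(s, t): m / parent_totals[t] for (s, t), m in pair_counts.items()}

print(p_root[("ABC", "D")])                  # 3 of 4 trees -> 0.75
print(p_cond[(("A", "BC"), ("ABC", "D"))])   # 2 of 3 children of (ABC, D)
```

For unrooted trees the root subsplit is unobserved, which is why the EM update above replaces these counts with expected counts under the current model.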


Example: Phylogenetic Posterior Estimation

[Figure: estimated vs. ground-truth tree probabilities (log–log) for CCD and SBN-EM on two posterior peaks, and KL divergence to the ground truth vs. number of samples on DS1 for ccd, sbn-sa, sbn-em, and sbn-em-srf.]

[Zhang and Matsen, NeurIPS 2018]

• Compared to the previous method CCD (Larget, 2013), SBNs significantly reduce the bias for both high-probability and low-probability trees.

• SBNs perform better in the weak-data regime.


Example: Phylogenetic Posterior Estimation

Data set  (#Taxa, #Sites)  Tree space size  Sampled trees | KL divergence to ground truth: SRF, CCD, SBN-SA, SBN-EM, SBN-EM-α

DS1  (27, 1949)  5.84×10^32   1228   |  0.0155   0.6027  0.0687  0.0136  0.0130
DS2  (29, 2520)  1.58×10^35   7      |  0.0122   0.0218  0.0218  0.0199  0.0128
DS3  (36, 1812)  4.89×10^47   43     |  0.3539   0.2074  0.1152  0.1243  0.0882
DS4  (41, 1137)  1.01×10^57   828    |  0.5322   0.1952  0.1021  0.0763  0.0637
DS5  (50, 378)   2.84×10^74   33752  |  11.5746  1.3272  0.8952  0.8599  0.8218
DS6  (50, 1133)  2.84×10^74   35407  |  10.0159  0.4526  0.2613  0.3016  0.2786
DS7  (59, 1824)  4.36×10^92   1125   |  1.2765   0.3292  0.2341  0.0483  0.0399
DS8  (64, 1008)  1.04×10^103  3067   |  2.1653   0.4149  0.2212  0.1415  0.1236

[Zhang and Matsen, NeurIPS 2018]

Remark: Unlike previous methods, SBNs are flexible enough to provide accurate approximations to real-data posteriors!


Variational Bayesian Phylogenetic Inference

• Approximating distribution (tree topology × branch lengths):

Q_{φ,ψ}(τ, q) := Q_φ(τ) · Q_ψ(q | τ)

• Multi-sample lower bound:

L^K(φ, ψ) = E_{Q_{φ,ψ}(τ^{1:K}, q^{1:K})} log ( (1/K) ∑_{i=1}^{K} p(Y | τⁱ, qⁱ) p(τⁱ, qⁱ) / (Q_φ(τⁱ) Q_ψ(qⁱ | τⁱ)) )

• Use stochastic gradient ascent (SGA) to maximize the lower bound:

φ^*, ψ^* = argmax_{φ,ψ} L^K(φ, ψ)

• φ: VIMCO/RWS; ψ: the reparameterization trick.

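Given K samples, the quantity inside the expectation of the multi-sample lower bound is a log-mean-exp of log importance weights log wⁱ = log p(Y, τⁱ, qⁱ) − log Q(τⁱ, qⁱ), which should be computed stably. A small sketch (the weight values are made up):

```python
import numpy as np

def multi_sample_lower_bound(log_w):
    """Single Monte Carlo estimate of L^K from K log importance weights:
    log( (1/K) * sum_i exp(log_w[i]) ), computed stably via the max trick."""
    log_w = np.asarray(log_w, dtype=float)
    m = log_w.max()
    return m + np.log(np.mean(np.exp(log_w - m)))

# With K = 1 this reduces to the ordinary ELBO estimate;
# larger K gives a tighter bound on the log marginal likelihood.
log_w = np.array([-7110.2, -7108.9, -7109.5, -7111.0])  # illustrative values
print(multi_sample_lower_bound(log_w))
```

Without the max trick, exponentiating log-weights of this magnitude would underflow to zero.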

Structured Parameterization

SBN Parameters

p(S1 = s1) = exp(φ_{s1}) / ∑_{sr∈Sr} exp(φ_{sr}),  p(Si = s | S_{πi} = t) = exp(φ_{s|t}) / ∑_{s∈S_{·|t}} exp(φ_{s|t})

Branch Length Parameters

Q_ψ(q | τ) = ∏_{e∈E(τ)} p_Lognormal(q_e | μ(e, τ), σ(e, τ))

• Simple Split:

μ_s(e, τ) = ψ^μ_{e/τ},  σ_s(e, τ) = ψ^σ_{e/τ}.

• Primary Subsplit Pair (PSP):

μ_psp(e, τ) = ψ^μ_{e/τ} + ∑_{s∈e//τ} ψ^μ_s,  σ_psp(e, τ) = ψ^σ_{e/τ} + ∑_{s∈e//τ} ψ^σ_s.

[Figure: an edge e between subsplits (W1, W2) and (Z1, Z2); the PSP mean μ_psp(e, τ) sums ψ^μ(W, Z), ψ^μ(W1, W2 | W, Z), and ψ^μ(Z1, Z2 | W, Z).]


Stochastic Gradient Estimators

SBN parameters φ. With τʲ, qʲ ~iid Q_{φ,ψ}(τ, q):

• VIMCO [Mnih and Rezende, ICML 2016]:

∇_φ L^K(φ, ψ) ≈ ∑_{j=1}^{K} ( L^K_{j|−j}(φ, ψ) − wʲ ) ∇_φ log Q_φ(τʲ).

• RWS [Bornschein and Bengio, ICLR 2015]:

∇_φ L^K(φ, ψ) ≈ ∑_{j=1}^{K} wʲ ∇_φ log Q_φ(τʲ).

Branch length parameters ψ. Let g_ψ(ε | τ) = exp(μ_{ψ,τ} + σ_{ψ,τ} ⊙ ε).

• Reparameterization trick. Let f_{φ,ψ}(τ, q) = p(Y | τ, q) p(τ, q) / (Q_φ(τ) Q_ψ(q | τ)). Then

∇_ψ L^K(φ, ψ) ≈ ∑_{j=1}^{K} wʲ ∇_ψ log f_{φ,ψ}(τʲ, g_ψ(εʲ | τʲ)),

where τʲ ~iid Q_φ(τ) and εʲ ~iid N(0, I).
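The reparameterization g_ψ(ε | τ) = exp(μ + σ ⊙ ε) makes the Lognormal branch lengths a deterministic, differentiable function of (μ, σ) given ε, so gradients can flow through the samples. A quick numerical sanity check of this idea (the test function f and all values are my own toy choices, not the phylogenetic objective):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_branch_lengths(mu, sigma, eps):
    """Reparameterized Lognormal draw: q = g_psi(eps) = exp(mu + sigma * eps)."""
    return np.exp(mu + sigma * eps)

mu, sigma = np.array([-2.0, -1.5]), np.array([0.3, 0.2])
eps = rng.standard_normal((100_000, 2))
q = sample_branch_lengths(mu, sigma, eps)

# Toy objective f(q) = sum(q). Analytically, d/dmu E[f] = exp(mu + sigma^2/2).
# The reparameterization estimator differentiates through g_psi:
# df/dmu = dq/dmu = q, averaged over eps.
grad_mc = q.mean(axis=0)
grad_exact = np.exp(mu + sigma**2 / 2)
print(grad_mc, grad_exact)  # the two should closely agree
```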


The VBPI Pipeline

[Figure: the VBPI training loop.]

• Sample tree topologies τ¹, τ², ..., τᴷ from Q_φ(τ), e.g., via ancestral sampling for SBNs.

• Sample branch lengths from Q_ψ(q | τ), e.g., Lognormal distributions, to obtain (τ¹, q¹), ..., (τᴷ, qᴷ).

• Compute the multi-sample lower bound L^K(φ, ψ) and perform an SGA update.


Performance on Synthetic Data

A simulated study on unrooted phylogenetic trees with 8 leaves (10395 trees). The target distribution is a random sample from the symmetric Dirichlet distribution Dir(β·1), β = 0.008.

[Figure: evidence lower bound and KL divergence to the ground truth vs. iterations for VIMCO(20), VIMCO(50), RWS(20), and RWS(50), compared to the exact target; variational approximation vs. ground-truth probabilities for VIMCO(50) and RWS(50).]

[Zhang and Matsen, ICLR 2019]

ELBOs approach 0 quickly ⇒ SBN approximations are flexible.

More samples in the multi-sample ELBOs can be helpful.


Performance on Real Data

[Figure: KL divergence to the ground truth vs. iterations for VIMCO(10), VIMCO(20), RWS(10), and RWS(20), with and without PSP, compared to MCMC; per-tree marginal likelihood estimates of VBPI vs. generalized stepping-stone (GSS).]

[Zhang and Matsen, ICLR 2019]

• More samples ⇒ better exploration ⇒ better approximation.

• More flexible branch length distributions across tree topologies (PSP) ease training and improve approximation.

• Outperforms MCMC via much more efficient tree space exploration and branch length updates.


Performance on Real Data

Marginal likelihood in nats, mean (standard deviation):

Data set  VIMCO(10)        VIMCO(20)        VIMCO(10)+PSP    VIMCO(20)+PSP    SS
DS1       -7108.43(0.26)   -7108.35(0.21)   -7108.41(0.16)   -7108.42(0.10)   -7108.42(0.18)
DS2       -26367.70(0.12)  -26367.71(0.09)  -26367.72(0.08)  -26367.70(0.10)  -26367.57(0.48)
DS3       -33735.08(0.11)  -33735.11(0.11)  -33735.10(0.09)  -33735.07(0.11)  -33735.44(0.50)
DS4       -13329.90(0.31)  -13329.98(0.20)  -13329.94(0.18)  -13329.93(0.22)  -13330.06(0.54)
DS5       -8214.36(0.67)   -8214.74(0.38)   -8214.61(0.38)   -8214.55(0.43)   -8214.51(0.28)
DS6       -6723.75(0.68)   -6723.71(0.65)   -6724.09(0.55)   -6724.34(0.45)   -6724.07(0.86)
DS7       -37332.03(0.43)  -37331.90(0.49)  -37331.90(0.32)  -37332.03(0.23)  -37332.76(2.42)
DS8       -8653.34(0.55)   -8651.54(0.80)   -8650.63(0.42)   -8650.55(0.46)   -8649.88(1.75)

[Zhang and Matsen, ICLR 2019]

• Competitive with the state of the art (stepping-stone), while dramatically reducing the cost at test time: VBPI uses 1,000 samples vs. 100,000 for SS.

• PSP alleviates the demand for large sample sizes, reducing computation while maintaining approximation accuracy.


Conclusion

• We introduced VBPI, a general variational framework for Bayesian phylogenetic inference.

• VBPI allows efficient learning of both tree topologies and branch lengths, providing performance competitive with MCMC while requiring much less computation.

• It can be used for further statistical analysis (e.g., marginal likelihood estimation) via importance sampling.

• There are many extensions, including more flexible branch length distributions, more general models, designing adaptive transition kernels for MCMC approaches, etc.


References

• Sebastian Höhna and Alexei J. Drummond. Guided tree topology proposals for Bayesian phylogenetic inference. Syst. Biol., 61(1):1–11, January 2012.

• Bret Larget. The estimation of tree posterior probabilities using conditional clade probability distributions. Syst. Biol., 62(4):501–511, July 2013.

• Zhang, C. and Matsen, F. A. Generalizing tree probability estimation via Bayesian networks. In Advances in Neural Information Processing Systems, 2018.

• Zhang, C. and Matsen, F. A. Variational Bayesian phylogenetic inference. In Proceedings of the 7th International Conference on Learning Representations, 2019.