Informative Subspace Learning for Counterfactual Inference. Yale Chang, Jennifer G. Dy. Department of Electrical and Computer Engineering, Northeastern University. February 9, 2017.



Page 1

Informative Subspace Learning for Counterfactual Inference

Yale Chang, Jennifer G. Dy
Department of Electrical and Computer Engineering
Northeastern University

February 9, 2017

Page 2

Motivation: Why Causal Inference?

Question of Interest: Treatment → Outcome?

Ø Healthcare: New Medication → Blood Pressure?

Ø Economics: Job Training → Employee's Income?

Ø Advertising: Advertising Campaign → Company's Revenue?

Page 3

Challenges

Ø Potential outcome framework: only one of the two outcomes can be observed for each sample.

Ø Randomized controlled trials avoid confounding but are often infeasible; in observational data, confounding factors bias naive comparisons.

Figures: Shalit & Sontag, www.cs.nyu.edu/~shalit/tutorial.html

Page 4

Contributions of This Work

Ø Propose a novel approach for causal inference on observational data: learn informative subspaces, then perform nearest-neighbor matching in them.

Ø Speed up the proposed approach (reducing quadratic to linear complexity) via randomized approximation and provide theoretical results proving an upper bound on the approximation error.

Ø Empirical results on simulated and real-world data demonstrate that our proposed approach outperforms competing methods.

Page 5

Potential Outcome Framework

[Figure: samples plotted by Age vs. Blood Pressure; each sample has a control outcome and a treatment outcome, but only the factual outcome is observed.]

Page 6

Potential Outcome Framework

[Figure: same Age vs. Blood Pressure plot; the unobserved potential outcome of each sample is its counterfactual outcome.]

Page 7

Potential Outcome Framework

[Figure: Age vs. Blood Pressure plot annotated with each sample's ITE.]

ITE: Individual Treatment Effect

Page 8

Potential Outcome Framework

[Figure: Age vs. Blood Pressure plot annotated with per-sample ITE and the ATE.]

ITE: Individual Treatment Effect
ATE: Average Treatment Effect
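In the standard potential outcome notation (not written out on the slide), with Yᵢ(1) the treatment outcome and Yᵢ(0) the control outcome, these two quantities are:

```latex
% ITE: per-sample effect; ATE: its average over the population / sample.
\mathrm{ITE}_i = Y_i(1) - Y_i(0)
\qquad
\mathrm{ATE} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{ITE}_i
```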

Page 9

Nearest Neighbor Matching

Ø Set each sample's counterfactual outcome equal to the factual outcome of its nearest neighbor in the opposite group.

Ø Distance can be measured with the Euclidean metric.

[Figure: Age vs. Blood Pressure plot illustrating matching between treatment and control samples.]
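As a minimal sketch (function name and data are illustrative, not from the talk), nearest-neighbor matching under the Euclidean metric can be written as:

```python
import numpy as np

def nn_match_counterfactuals(x_t, y_t, x_c, y_c):
    """Impute counterfactual outcomes by 1-nearest-neighbor matching.

    Each treated sample's counterfactual is the factual outcome of its
    nearest control sample (Euclidean distance), and vice versa.
    """
    # Pairwise Euclidean distances between treated and control samples.
    d = np.linalg.norm(x_t[:, None, :] - x_c[None, :, :], axis=2)
    cf_t = y_c[d.argmin(axis=1)]  # counterfactuals for treated samples
    cf_c = y_t[d.argmin(axis=0)]  # counterfactuals for control samples
    return cf_t, cf_c
```

The subspace-learning step described on the next slides would replace the raw features `x_t`, `x_c` with their learned low-dimensional embeddings before computing distances.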

Page 10

Nearest Neighbor Matching

Ø However, not all features affect the outcome; in this example, only age does.

Ø Before matching, we need to learn informative subspaces (predictive of outcomes) for both the treatment and control groups.

[Figure: 3-D plot over Age, Weight, and Blood Pressure in which only age affects the outcome.]

Page 11

Informative Subspace Learning

Key Property: samples xᵢ with similar outcomes yᵢ should be close in the learned subspace.

K_Y = [ sim(y₁, y₁) ⋯ sim(y₁, yₙ)
        ⋮           ⋱  ⋮
        sim(yₙ, y₁) ⋯ sim(yₙ, yₙ) ]

Learn a projection matrix W ∈ ℝ^{d×q} that maps xᵢ ∈ ℝ^d to its low-dimensional embedding zᵢ = Wᵀxᵢ ∈ ℝ^q while preserving the similarity structure in Y.

K_Z = [ sim(z₁, z₁) ⋯ sim(z₁, zₙ)
        ⋮           ⋱  ⋮
        sim(zₙ, z₁) ⋯ sim(zₙ, zₙ) ]

Maximize the Hilbert-Schmidt Independence Criterion (HSIC) between Z and Y:

HSIC(Z, Y) = 1/(n(n−1)) · Tr(K_Z K_Y) = 1/(n(n−1)) · Σᵢ₌₁ⁿ Σⱼ₌₁ⁿ K_Z(i, j) K_Y(i, j)

Page 12

Error Bound on HSIC Approximation

Challenge: storage and computation of the kernel matrices are quadratic in the sample size n.

Solution: approximate the kernel matrices with random Fourier features:

K_Z ≈ F Fᵀ,  F ∈ ℝ^{n×m};   K_Y ≈ G Gᵀ,  G ∈ ℝ^{n×l};   K_Z, K_Y ∈ ℝ^{n×n}

where m and l are the numbers of random Fourier features, m, l ≪ n.

Approximation Error Bound: the expected approximation error 𝔼|error| is bounded above by a term that vanishes as the numbers of random features m and l grow; the exact bound is given in the paper.
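A sketch of the random-Fourier-feature approximation (Rahimi-Recht-style cosine features; bandwidths, sizes, and names are illustrative). With K_Z ≈ FFᵀ and K_Y ≈ GGᵀ, the trace term becomes Tr(F Fᵀ G Gᵀ) = ‖FᵀG‖²_F, computable in O(n·m·l) time instead of O(n²):

```python
import numpy as np

def rff(x, num_features, sigma=1.0, seed=0):
    """Random Fourier features F such that F @ F.T approximates the
    Gaussian kernel matrix exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    x = np.atleast_2d(x.T).T                       # ensure shape (n, d)
    w = rng.normal(0.0, 1.0 / sigma, (x.shape[1], num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, num_features)
    return np.sqrt(2.0 / num_features) * np.cos(x @ w + b)

def trace_term(f, g):
    """Linear-time surrogate for Tr(K_Z K_Y): ||F^T G||_F^2 = Tr(F F^T G G^T)."""
    return np.linalg.norm(f.T @ g, "fro") ** 2
```

The n×n kernel matrices are never formed, which is what brings the storage and time costs down from quadratic to linear in n.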

Page 13

Learning Objective

max_W  HSIC(Z, Y) − λ‖W‖²_F

Ø Solved with L-BFGS

Ø Time complexity: 𝒪(n(md + ml + dq))

Ø Storage cost: 𝒪(n(d + m + l))
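To make the objective concrete, here is a toy sketch that maximizes HSIC(Z, Y) − λ‖W‖²_F by plain gradient ascent, assuming a linear kernel K_Z = X W Wᵀ Xᵀ so the gradient is available in closed form. The talk itself uses L-BFGS and the kernels from the paper; everything here (names, kernel choice, step sizes) is illustrative:

```python
import numpy as np

def fit_projection(x, k_y, q=1, lam=0.1, lr=0.01, steps=200, seed=0):
    """Gradient ascent on Tr(K_Z K_Y)/(n(n-1)) - lam * ||W||_F^2
    with the linear kernel K_Z = X W W^T X^T; gradient is
    2 X^T K_Y X W / (n(n-1)) - 2 lam W."""
    n, d = x.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=(d, q))
    a = x.T @ k_y @ x / (n * (n - 1))  # fixed matrix in the gradient
    for _ in range(steps):
        w += lr * (2 * a @ w - 2 * lam * w)
    return w

def objective(x, k_y, w, lam=0.1):
    """The slide's objective under the same linear-kernel assumption."""
    n = x.shape[0]
    z = x @ w
    return np.trace(z @ z.T @ k_y) / (n * (n - 1)) - lam * np.sum(w ** 2)
```

With a bounded kernel on Z (as in the talk) the objective is bounded and a quasi-Newton method such as L-BFGS is the natural choice; the linear kernel is used here only to keep the gradient derivation to one line.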

Page 14

Infant Health Development Data

[Figure: performance comparison of MDM, PSM, RLP, LASSO, BART, Causal Forest, and the proposed method.]

Page 15

News Data

[Figure: performance comparison of MDM, PSM, RLP, LASSO, BART, Causal Forest, and the proposed method.]

Page 16

Summary

Ø Significantly improved nearest-neighbor matching for counterfactual inference through informative subspace learning.

Ø Sped up the HSIC computation via random Fourier features and proved an upper bound on the approximation error.

Ø Empirically showed state-of-the-art performance on real datasets.