Using indication embeddings to represent patient health for drug safety studies
Rachel Melamed
Biomedical Data Science @ University of Chicago & Biology @ UMass
[email protected] | @RDMelamed
Goal: high-throughput drug safety studies
Randomized trials: low-throughput but unbiased
1) Enroll cohort 2) Randomize treatment 3) Compare experimental groups
Cohort studies: reuse health data to emulate a randomized trial.
1) From data, select people..
Drug safety: Does taking this drug change your risk of some health outcome?
Figure: health data timeline (exposure, cancer; events: X-ray, asthma, statin, arthritis)
Health data: 5,000 prescriptions, 10,000 diagnosis codes, 20,000 procedure codes
…taking treatment drug
…or comparator drug
Can we match without expert design?
Creating indication embeddings
Evaluating embeddings
The challenge: confounding
Embeddings identify comparator drugs
Match with embeddings
Figure: confounders (health, age, aspirin, diabetes)
Propensity score match: P(treated | age, ..)
Evaluate with propensity score: compare the distribution of P(treated | …) in the treated cohort with that in the comparator cohort
2) Match to emulate randomization
High-throughput cohort studies
Currently, cohort study design relies on domain experts:
Expert Task 1: Find suitable comparator drug
Expert Task 2: Design matching (identify confounders)
Figure axes: age (25 to 75); aspirin use (never to often)
The solution: matching
Match on confounders: P(treated | age, aspirin, …)
Figure labels: insulin resistance (insulin, Type 2 diabetes); amoxicillin; xanax; …other Rx, Dx, Px
How to match on 30,000+ dimensional, sparse, uninformative vectors?
Instead, map them to small, meaningful embeddings.
Training task: Predict drug
Simple neural network, 150 million patient histories
Learned representations: indication embedding, drug embedding
Create training examples: (History event, New Rx) pairs
Example: history of blood lab, gout, statin, diabetes; New Rx metformin
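The training-example step can be illustrated with a minimal sketch. It assumes every event in a patient's prior history is paired with the newly prescribed drug; the poster does not state how histories are windowed or sampled, so the function `make_training_pairs` and that pairing rule are hypothetical.

```python
from typing import List, Tuple


def make_training_pairs(history: List[str], new_rx: str) -> List[Tuple[str, str]]:
    """Pair each prior history event with the newly prescribed drug.

    Assumption: all prior events are used and weighted equally.
    """
    return [(event, new_rx) for event in history]


# Example events taken from the poster's figure labels.
history = ["blood lab", "gout", "statin", "diabetes"]
pairs = make_training_pairs(history, "metformin")
print(pairs)
```

Each pair would then serve as one (input, target) example for the drug-prediction network.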
Embeddings relate codes to health needs
Drug embedding = drugs given in most similar health contexts
Indication embedding = health context for prescription of a new drug
Map each event to a 50-dimensional vector
For each drug, performance of embedding distance to predict indications. Overall ROC AUC = 0.82
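As a rough illustration of this evaluation, the sketch below scores each diagnosis code against one drug and computes the ROC AUC against known indications. It assumes the dot product between drug and indication embeddings is the score; the 2-dimensional vectors, the code names, and the labels are made up for illustration and are not the poster's data.

```python
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical embeddings (the poster uses 50 dimensions).
drug_embedding = [1.0, 0.0]
indication_embeddings = {
    "type 2 diabetes": [0.9, 0.1],
    "gout": [0.7, 0.3],
    "insulin resistance": [0.5, 0.2],
    "fracture": [0.1, 0.9],
}
# Hypothetical ground truth: 1 = known indication of the drug.
known = {"type 2 diabetes": 1, "gout": 0, "insulin resistance": 1, "fracture": 0}

codes = list(indication_embeddings)
scores = [dot(drug_embedding, indication_embeddings[c]) for c in codes]
labels = [known[c] for c in codes]
auc = roc_auc(scores, labels)
print(auc)
```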
Dot products between antidepressants and selected closest diagnoses.
Figure labels: tricyclic, SSRI, SNRI, anticonvulsants, antipsychotics
Drugs with closest embedding dot product are more comparable, as measured by AUC.
Drugs with same therapeutic use as carbamazepine (primarily anticonvulsant, off-label for bipolar):
aripiprazole, asenapine, carbamazepine, chlorpromazine, clozapine, divalproex_sodium, fluphenazine, gabapentin, haloperidol, haloperidol_lactate, lacosamide, lamotrigine, levetiracetam, lithium_carbonate, lurasidone, olanzapine, oxcarbazepine, paliperidone, perphenazine, primidone, prochlorperazine, prochlorperazine_maleate, quetiapine_fumarate, risperidone, thioridazine, thiothixene, tiagabine, topiramate, trifluoperazine, valproic_acid, ziprasidone, zonisamide
Figure axes: ROC AUC; therapeutic class
Expert Task 1
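Ranking candidate comparators by embedding dot product might look like the sketch below. The 3-dimensional vectors are invented for illustration (the poster's embeddings are 50-dimensional), and `rank_comparators` is a hypothetical helper, not the author's code.

```python
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def rank_comparators(target: str, drug_embeddings: Dict[str, List[float]]) -> List[str]:
    """Order all other drugs by dot product with the target drug, highest first."""
    t = drug_embeddings[target]
    others = [name for name in drug_embeddings if name != target]
    return sorted(others, key=lambda name: dot(t, drug_embeddings[name]), reverse=True)


# Made-up drug embeddings for illustration only.
drug_embeddings = {
    "carbamazepine": [0.9, 0.4, 0.0],
    "lamotrigine": [0.8, 0.5, 0.1],
    "lithium_carbonate": [0.3, 0.9, 0.0],
    "amoxicillin": [0.0, 0.1, 0.9],
}
ranked = rank_comparators("carbamazepine", drug_embeddings)
print(ranked)
```

The top-ranked drugs would be the first candidates to consider as comparators for the treatment drug.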
Embedding matching
Step 1: Coarsened exact matching by age, gender, year, number of Rx
Figure axes: age (25 to 75); year (2003 to 2013)
Step 2: Encode histories → small dense vectors
Indication embedding (R^(V x E)), weighted average (upweight recent history)
Step 3: Mahalanobis match on health summaries within bins
Expert Task 2
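The three matching steps (coarsened exact matching, encoding histories as dense vectors, Mahalanobis matching within bins) can be sketched in plain Python. Everything specific here is an assumption for illustration: the bin widths, the exponential half-life weighting, the greedy 1:1 matching without replacement, the covariance estimated over all summaries, and the toy 2-dimensional embeddings and patients.

```python
import math
from collections import defaultdict


def coarsen(p):
    """Step 1: bin key for coarsened exact matching (bin widths are assumptions)."""
    return (p["age"] // 10, p["gender"], p["year"] // 5, min(p["n_rx"] // 5, 4))


def health_summary(history, emb, half_life_days=365.0):
    """Step 2: weighted average of indication embeddings, upweighting recent events."""
    dim = len(next(iter(emb.values())))
    total, wsum = [0.0] * dim, 0.0
    for code, days_ago in history:
        w = 0.5 ** (days_ago / half_life_days)  # assumed decay scheme
        wsum += w
        for i, x in enumerate(emb[code]):
            total[i] += w * x
    return [t / wsum for t in total] if wsum else total


def covariance(rows):
    n, d = len(rows), len(rows[0])
    mean = [sum(r[i] for r in rows) / n for i in range(d)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def invert(m, ridge=1e-9):
    """Gauss-Jordan inverse; a tiny ridge on the diagonal guards against singularity."""
    d = len(m)
    a = [[m[i][j] + (ridge if i == j else 0.0) for j in range(d)]
         + [1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        pv = a[c][c]
        a[c] = [x / pv for x in a[c]]
        for r in range(d):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    return [row[d:] for row in a]


def mahalanobis(u, v, vi):
    diff = [x - y for x, y in zip(u, v)]
    n = len(diff)
    q = sum(diff[i] * vi[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))


def match(treated, comparators, emb):
    """Step 3: greedy nearest-neighbor Mahalanobis match within each bin."""
    summaries = {p["id"]: health_summary(p["history"], emb) for p in treated + comparators}
    vi = invert(covariance(list(summaries.values())))
    bins = defaultdict(list)
    for c in comparators:
        bins[coarsen(c)].append(c)
    pairs = []
    for t in treated:
        pool = bins[coarsen(t)]
        if not pool:
            continue  # no comparator in this bin; leave unmatched
        best = min(pool, key=lambda c: mahalanobis(summaries[t["id"]], summaries[c["id"]], vi))
        pool.remove(best)  # match without replacement
        pairs.append((t["id"], best["id"]))
    return pairs


# Toy data: history entries are (code, days before the new prescription).
emb = {"smoking": [1.0, 0.1], "depression": [0.0, 1.0], "insomnia": [0.3, 0.6]}
treated = [
    {"id": "T1", "age": 42, "gender": "F", "year": 2010, "n_rx": 3,
     "history": [("depression", 30), ("smoking", 60)]},
    {"id": "T2", "age": 45, "gender": "F", "year": 2011, "n_rx": 4,
     "history": [("depression", 20)]},
]
comparators = [
    {"id": "C1", "age": 47, "gender": "F", "year": 2012, "n_rx": 2,
     "history": [("depression", 40), ("insomnia", 10)]},
    {"id": "C2", "age": 41, "gender": "F", "year": 2010, "n_rx": 3,
     "history": [("smoking", 50), ("depression", 90)]},
    {"id": "C3", "age": 70, "gender": "M", "year": 2005, "n_rx": 12,
     "history": [("insomnia", 5)]},
]
pairs = match(treated, comparators, emb)
print(pairs)
```

In this toy setup the treated person with a smoking-related history is paired with the comparator who also has one, while the comparator in a different age, gender, and year bin is never considered.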
Embedding can match for key confounders
Matching people on bupropion to trazodone is complicated by the alternate indication of bupropion for smoking cessation. Each point is one person on bupropion, trazodone, or varenicline.
Figure panels: Propensity score matching; Health summary vectors
Embedding better matches nonsmokers to nonsmokers
Figure: sparse history vectors (e.g. 0 0 1 0 0 0 0 0 1 0 1 0 0 0 0) are encoded as small dense vectors (e.g. 0 .2 1 and 0 .1 .8). Then do simple nearest-neighbor matching.