consistent connectome classification
joshua t. vogelstein
assistant research scientist, dept. of applied math & stats, johns hopkins university
right now
survey
who here is a card carrying:
biologist
neuroscientist
mathematician
computer scientist
statistician
nota bene
if i use some jargon that you don’t know, please interrupt me!!!
let’s have fun
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
motivation
International Neuroimaging Data-sharing Initiative (Child Mind Institute)
motivation
Connectome Project (Harvard University)
motivation
Institute for Data Intensive Engineering and Science (JHU)
background
human brains
O(10^11) neurons (vertices)
O(10^15) synapses (edges)
we believe the details matter (somewhat)
neuro-anatomists have proposed a multiscale structure
background
graphs
G = (V, E)
G ∈ G_n, |G_n| = ?
|G_n| = 2^(n choose 2)
# of bytes in a hellabyte: 10^27
# of chess positions: 10^47
# of atoms in the universe: 10^80
# of go positions: 10^170
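the size of G_n is easy to appreciate numerically; a minimal sketch (Python's exact integers make the comparison direct):

```python
from math import comb

def num_graphs(n):
    """Number of simple labeled undirected graphs on n vertices: 2^C(n,2)."""
    return 2 ** comb(n, 2)

# just 24 labeled vertices already yield more graphs than atoms in the universe
assert num_graphs(24) > 10**80
```

for the 70-vertex parcellation used later in the talk, |G_70| = 2^2415, a number with over 700 decimal digits.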
goals
long-term
colloquial: "understand" the relationship between our brains and our minds
math/cs: explain how brain-graphs encode and process information
stats: build statistical models that capture mental-property variability conditional on brain-graph properties
short-term
learn/develop some statistics for graphs
be able to classify people into groups based on their brain-graphs
concrete goal
classify
given a collection of brain-graphs and associated class labels,
D_s = {(G_i, y_i)}_{i∈[s]}, where (G_i, y_i) ∈ G × Y
build a classifier h : G → Y that takes a new graph G and estimates its corresponding class.
(most?) previous work
ignore graph structure
approach: take each graph, treat it as a matrix, vectorize it (concatenate columns), and apply standard machine-learning tools
problem: ignores graph structure; graphs are not matrices
ignore vertex labels
approach: take each graph, compute a bunch of graph invariants*, and apply standard machine-learning tools
problem: ignores vertex-label information, lacks theory
* a graph invariant is a function ψ : G → R^d that is invariant to vertex relabeling, e.g., the degree distribution.
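the degree distribution makes the definition concrete; a minimal sketch, where the adjacency matrix and the permutation are illustrative:

```python
import numpy as np

def degree_invariant(A):
    """Sorted degree sequence of adjacency matrix A:
    unchanged under any relabeling of the vertices."""
    return np.sort(A.sum(axis=0))

# a triangle plus an isolated vertex, under two different labelings
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
perm = [3, 0, 2, 1]                 # an arbitrary relabeling
B = A[np.ix_(perm, perm)]           # same graph, permuted vertices
assert (degree_invariant(A) == degree_invariant(B)).all()
```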
our work
consistent graph classification
1 posit a joint random graph-class model, F_GY = {F_GY(θ) : θ ∈ Θ}
2 construct an algorithm for this model
3 prove this algorithm is model consistent
4 demonstrate better-than-state-of-the-art performance on real data
probabilistic theory of pattern recognition
G : Ω → G, Y : Ω → Y
(G, Y), (G_i, y_i)_{i∈[s]} ~ (exchangeably) F_GY ∈ F_GY
Bayes optimal classifier:
h*(G) = argmax_{y∈Y} F_{G=G|Y=y} F_{Y=y}
Bayes error L*_F is the misclassification rate of h*
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
signal subgraph random graph classification model
assumptions
1 G_1(V) = G_2(V) = · · · = G_s(V), and all v ∈ V are uniquely labeled
2 edges are sampled independently
3 only the signal subgraph contains class conditional signal
formalizing assumptions
F_{G|Y} = F_{A|Y}                                                              (1)
        = ∏_{(u,v)∈E} Bernoulli(a_uv; η_uv|y)                                  (2)
        = ∏_{(u,v)∈S} Bernoulli(a_uv; η_uv|y) ∏_{(u,v)∉S} Bernoulli(a_uv; η_uv) (3)
signal subgraph classifier
bayes optimal classifier
h*(G) = argmax_y F_{G=G|Y=y} F_{Y=y}
      = argmax_y ∏_{(u,v)∈S} Bernoulli(a_uv; η_uv|y) · Bernoulli(y; π_y)
bayes plugin classifier
ĥ_s(G) = argmax_y F^s_{G=G|Y=y} F^s_{Y=y}
       = argmax_y ∏_{(u,v)∈Ŝ_s} Bernoulli(a_uv; η̂^s_uv|y) · Bernoulli(y; π̂^s_y)
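the plugin classifier amounts to summing Bernoulli log-likelihoods over the estimated signal subgraph; a minimal sketch, where the container names (`S`, `eta`, `pi`) are illustrative, not the paper's code:

```python
import numpy as np

def classify(A, S, eta, pi):
    """Bayes-plugin classification of one adjacency matrix A.
    S:   list of (u, v) signal-subgraph edges
    eta: dict mapping class y -> matrix of estimated edge probabilities
    pi:  dict mapping class y -> estimated prior probability
    Returns the class maximizing the log posterior score."""
    scores = {}
    for y in pi:
        ll = np.log(pi[y])
        for (u, v) in S:
            p = eta[y][u, v]
            ll += A[u, v] * np.log(p) + (1 - A[u, v]) * np.log(1 - p)
        scores[y] = ll
    return max(scores, key=scores.get)

# toy example: the one signal edge is likely present in class 0, absent in class 1
eta = {0: np.full((2, 2), 0.9), 1: np.full((2, 2), 0.1)}
pi = {0: 0.5, 1: 0.5}
A = np.array([[0, 1], [1, 0]])
assert classify(A, [(0, 1)], eta, pi) == 0
```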
signal subgraph classifier
our tasks
1 estimate π_y ∀ y ∈ Y
2 estimate S
3 estimate η_uv|y ∀ (u, v) ∈ S, y ∈ Y
estimating the prior
trivial
π̂^MLE_y = s_y / s
estimating the likelihood terms
less trivial
estimator              | equation                         | result
MLE                    | (1/s_y) Σ_{i:y_i=y} a^(i)_uv     | fails
objective MAP          | B(1/2, 1/2)                      | weird
weakly informative MAP | B(1, 1)                          | fails
spike & slab           | bern + beta                      | didn't try
Bishop et al. ('73)    | ω η^MLE + (1 − ω) λ              | expensive
our L-estimator        | look ↓                           | better for us

η̂_uv|y =
  ε_n            if max_{i:y_i=y} a^(i)_uv = 0
  1 − ε_n        if min_{i:y_i=y} a^(i)_uv = 1
  η̂^MLE_uv|y    otherwise
estimating the likelihood terms
L-estimators
a linear combination of (nonlinear functions of) order statistics
η̂ = Σ_i a_ni h(x_(i)), where
h(x_(i)) =
  ε_n        if max_{i:y_i=y} a^(i)_uv = 0
  1 − ε_n    if min_{i:y_i=y} a^(i)_uv = 1
  x_(i)      otherwise
and a_ni = 1/n
Thm 1: η̂ →P η^MLE as s → ∞.
Proof: it is an L-estimator.
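the estimator is one line of logic once written out; a minimal sketch, where `x` holds the binary edge values a^(i)_uv for the subjects in class y:

```python
import numpy as np

def eta_hat(x, eps):
    """Smoothed estimate of an edge probability from binary samples x.
    Clamps the estimate away from {0, 1}, so no test graph ever
    receives zero likelihood; otherwise reduces to the ordinary MLE."""
    x = np.asarray(x)
    if x.max() == 0:          # edge never observed in this class
        return eps
    if x.min() == 1:          # edge always observed in this class
        return 1 - eps
    return float(x.mean())    # plain MLE otherwise

assert eta_hat([0, 0, 0, 0], 0.01) == 0.01
assert eta_hat([1, 1, 1], 0.01) == 0.99
assert eta_hat([0, 1, 1, 0], 0.01) == 0.5
```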
estimating the signal subgraph
some definitions
signal subgraph: S = {(u, v) : η_uv|y ≠ η_uv|y′}
signal vertices: V = the minimal vertex set V′ such that every edge in S is incident to a vertex in V′
incoherent estimator: assume |S| = q ≪ (n choose 2) is known
coherent estimator: assume |V| = m ≪ n is known
estimating the signal subgraph
incoherent estimator
recall: edges are assumed to be independent
construct a test for each edge
H_0: η_uv|y = η_uv|y′
H_A: η_uv|y ≠ η_uv|y′
Fisher's exact test is optimal under our assumptions
let p_uv be the p-value for this test
rank-order the p-values, p_(1) ≤ p_(2) ≤ · · · ≤ p_((n choose 2))
let Ŝ^s_inc = the edges attaining p_(1), …, p_(q)
Thm 2: Ŝ^s_inc →P S as s → ∞
Proof: p_uv →P 0 ∀ (u, v) ∈ S and p_uv → U(0, 1) ∀ (u, v) ∉ S
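the incoherent estimator is a per-edge test plus a top-q selection; a sketch using `scipy.stats.fisher_exact`, where the array shapes and names are assumptions:

```python
import numpy as np
from scipy.stats import fisher_exact

def incoherent_estimate(As, ys, q):
    """Return the q edges whose class-conditional edge frequencies
    differ most (smallest Fisher exact-test p-values).
    As: (s, n, n) stack of binary adjacency matrices; ys: length-s 0/1 labels."""
    n = As.shape[1]
    A0, A1 = As[ys == 0], As[ys == 1]
    ranked = []
    for u in range(n):
        for v in range(u + 1, n):
            # 2x2 table: edge present / absent, per class
            c0, c1 = int(A0[:, u, v].sum()), int(A1[:, u, v].sum())
            table = [[c0, len(A0) - c0], [c1, len(A1) - c1]]
            _, p = fisher_exact(table)
            ranked.append((p, (u, v)))
    ranked.sort()
    return [edge for _, edge in ranked[:q]]
```

for example, with 20 graphs per class where only edge (0, 1) differs between classes, the top-1 selection recovers exactly that edge.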
estimating the signal subgraph
coherent estimator
again, get p-values the same way
rank-order the p-values incident to each u ∈ V: p_u,(1) ≤ p_u,(2) ≤ · · · ≤ p_u,(n)
find a collection of m vertices that collectively have the q most significant edges; call those edges Ŝ^s_coh
Thm 3: Ŝ^s_coh →P S as s → ∞
Proof: idem.
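a greedy simplification of the coherent idea (not the exact procedure above): score each vertex by its incident p-values, keep the m best-scoring "star" vertices, then return the q most significant incident edges. Names and structure are illustrative:

```python
import numpy as np

def coherent_estimate(pvals, m, q):
    """Greedy coherent signal-subgraph sketch.
    pvals: dict mapping edge (u, v), u < v, to its p-value.
    Picks m star vertices with the largest summed -log(p) over incident
    edges, then the q smallest-p edges touching those vertices."""
    score = {}
    for (u, v), p in pvals.items():
        w = -np.log(p)
        score[u] = score.get(u, 0.0) + w
        score[v] = score.get(v, 0.0) + w
    stars = set(sorted(score, key=score.get, reverse=True)[:m])
    incident = sorted((p, e) for e, p in pvals.items()
                      if e[0] in stars or e[1] in stars)
    return [e for _, e in incident[:q]]
```

with one strongly significant star vertex, e.g. `coherent_estimate({(0, 1): 1e-6, (0, 2): 1e-5, (1, 2): 0.9, (2, 3): 0.8}, m=1, q=2)`, the sketch keeps the two edges incident to vertex 0.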
neuro data analysis
the process
collect diffusion MRI data from 50 people in 2 groups
estimate a brain-graph for each person
pretend brain-graphs are perfect estimates
build classifiers
compare results
data collection
diffusion-weighted MRI (e.g., diffusion tensor imaging)
brains have large fiber bundles
water primarily diffuses within bundles, not across them
at each voxel, we can estimate the primary direction of diffusion
we can estimate the most likely path from each voxel
we say all voxels along the path are connected
we can also parcellate the brain into 70 regions
we can then compress the graph into 70 vertices
graph inference
MR-CAP: the magnetic resonance connectome automated pipeline
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
simulated data analysis
set |S| = 20, |V| = 1, η, and π
generate s samples (G_i, y_i) ~iid F_GY(θ)
estimate parameters using both strategies
compare edge selection and classification performance
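sampling from the model is just independent, class-conditional Bernoulli coin flips per edge; a sketch where `eta0`/`eta1` and the signal-edge list `S` are illustrative:

```python
import numpy as np

def sample_graph(y, eta0, eta1, S, rng):
    """Draw one symmetric binary adjacency matrix from the model.
    Off-signal edges always use probabilities eta0; signal edges
    (u, v) in S switch to eta1 when the class label y == 1."""
    n = eta0.shape[0]
    P = eta0.copy()
    if y == 1:
        for (u, v) in S:
            P[u, v] = P[v, u] = eta1[u, v]
    # flip each upper-triangular edge independently, then symmetrize
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(int)
    return upper + upper.T
```

usage: `rng = np.random.default_rng(0); G = sample_graph(1, np.full((5, 5), 0.3), np.full((5, 5), 0.9), [(0, 1)], rng)` yields a loop-free, symmetric 0/1 matrix.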
a single simulation
a monte carlo experiment
relative efficiency
brain-graphs!!!
average brain-graphs
leave-one-out cross-validation misclassification rate
synthetic data analysis
model assumption checking
signal subgraph
bake-off
complementary approaches
nonparametric: k-nearest neighbor
semiparametric: metric embedding + infinite Gaussian mixture model
hackometric: graph invariants + PCA + SVM
theoretical properties
nonparametric: universally consistent (see [1] or [2] for proof)
semiparametric: universally consistent? (proof in progress)
hackometric: not consistent
bake-off
results
classifier     | error (%)
naive bayes    | 41
incoherent     | 27
coherent       | 16
kNN            | 20
CW6GI          | 20
kNNPCA12GI     | 16
semiparametric | 16
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
summary
stats
first (to our knowledge) to develop probabilistic classifiers that natively operate on graph-valued data
the combination of parametric, semiparametric, nonparametric, and hackometric spans the space of strategies one might employ
neuro
we can now efficiently (in parallel) estimate brain-graphs
better than state-of-the-art performance
next steps
stats
semiparametric theory
more interesting generative process story
generalize to multiclass and regression
bayesianize
neuro
data start as 128 × 128 × 128 × 256 ≈ 50M values per subject
we then call our 70 × 70 graphs "high-dimensional"
we have O(10^3) brain-scans
building a neuroimaging database to efficiently process/query data; staticize "pre-processing"
can apply to functional connectivity as well
acknowledgements
brother: R. Jacob Vogelstein, PhD
postdoc advisor: Carey E. Priebe, PhD
grad student: William R. Gray
his advisor: Jerry L. Prince, PhD
presented data: Susan Resnick, PhD
large data collection: Michael Milham, PhD, MD
database: Randal Burns, PhD
anything
email: [email protected]
all code, pre-prints, etc., available at my website: http://jovo.me
all brain data: http://openconnecto.me