consistent connectome classification
joshua t. vogelstein
assistant research scientist, dept. of applied math & stats, johns hopkins university
right now
survey
who here is a card carrying:
biologist
neuroscientist
mathematician
computer scientist
statistician
nota bene
if i use some jargon that you don’t know, please interrupt me!!!
let’s have fun
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
motivation
International Neuroimaging Data-sharing Initiative (Child Mind Institute)
motivation
Connectome Project (Harvard University)
motivation
Institute for Data Intensive Engineering and Science (JHU)
background
human brains
O(10^11) neurons (vertices)
O(10^15) synapses (edges)
we believe the details matter (somewhat)
neuro-anatomists have proposed a multiscale structure
background
graphs
G = (V, E)
G ∈ G_n, |G_n| = ?
|G_n| = 2^(n choose 2)
# of bytes in a hellabyte: 10^27
# of chess positions: 10^47
# of atoms in the universe: 10^80
# of go positions: 10^170
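the size of G_n is easy to appreciate numerically; a minimal sketch (Python's exact integers make the comparison direct):

```python
from math import comb

def num_graphs(n):
    """Number of simple labeled undirected graphs on n vertices: 2^C(n,2)."""
    return 2 ** comb(n, 2)

# just 24 labeled vertices already yield more graphs than atoms in the universe
assert num_graphs(24) > 10**80
```

for the 70-vertex parcellation used later in the talk, |G_70| = 2^2415, a number with over 700 decimal digits.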
goals
long-term
colloquial: "understand" the relationship between our brains and our minds
math/cs: explain how brain-graphs encode and process information
stats: build statistical models that capture mental-property variability conditional on brain-graph properties
short-term
learn/develop some statistics for graphs
be able to classify people into groups based on their brain-graphs
concrete goal
classify
given a collection of brain-graphs and associated class labels,
D_s = {(G_i, y_i)}_{i∈[s]}, where (G_i, y_i) ∈ G × Y
build a classifier h : G → Y that takes a new graph G and estimates its corresponding class.
(most?) previous work
ignore graph structure
approach: take each graph, treat it as a matrix, vectorize it (concatenate columns), and apply standard machine-learning tools
problem: ignores graph structure; graphs are not matrices
ignore vertex labels
approach: take each graph, compute a bunch of graph invariants*, and apply standard machine-learning tools
problem: ignores vertex-label information, lacks theory
* a graph invariant is a function ψ : G → R^d that is invariant to vertex relabeling, e.g., the degree distribution.
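the degree distribution makes the definition concrete; a minimal sketch, where the adjacency matrix and the permutation are illustrative:

```python
import numpy as np

def degree_invariant(A):
    """Sorted degree sequence of adjacency matrix A:
    unchanged under any relabeling of the vertices."""
    return np.sort(A.sum(axis=0))

# a triangle plus an isolated vertex, under two different labelings
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
perm = [3, 0, 2, 1]                 # an arbitrary relabeling
B = A[np.ix_(perm, perm)]           # same graph, permuted vertices
assert (degree_invariant(A) == degree_invariant(B)).all()
```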
our work
consistent graph classification
1 posit a joint random graph-class model, F_GY = {F_GY(θ) : θ ∈ Θ}
2 construct an algorithm for this model
3 prove this algorithm is model consistent
4 demonstrate better-than-state-of-the-art performance on real data
probabilistic theory of pattern recognition
G : Ω → G, Y : Ω → Y
(G, Y), (G_i, y_i)_{i∈[s]} ~ (exchangeably) F_GY ∈ F_GY
Bayes optimal classifier:
h*(G) = argmax_{y∈Y} F_{G=G|Y=y} F_{Y=y}
Bayes error L*_F is the misclassification rate of h*
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
signal subgraph random graph classification model
assumptions
1 G_1(V) = G_2(V) = · · · = G_s(V), and all v ∈ V are uniquely labeled
2 edges are sampled independently
3 only the signal subgraph contains class conditional signal
formalizing assumptions
F_{G|Y} = F_{A|Y}                                                              (1)
        = ∏_{(u,v)∈E} Bernoulli(a_uv; η_uv|y)                                  (2)
        = ∏_{(u,v)∈S} Bernoulli(a_uv; η_uv|y) ∏_{(u,v)∉S} Bernoulli(a_uv; η_uv) (3)
signal subgraph classifier
bayes optimal classifier
h*(G) = argmax_y F_{G=G|Y=y} F_{Y=y}
      = argmax_y ∏_{(u,v)∈S} Bernoulli(a_uv; η_uv|y) · Bernoulli(y; π_y)
bayes plugin classifier
ĥ_s(G) = argmax_y F^s_{G=G|Y=y} F^s_{Y=y}
       = argmax_y ∏_{(u,v)∈Ŝ_s} Bernoulli(a_uv; η̂^s_uv|y) · Bernoulli(y; π̂^s_y)
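the plugin classifier amounts to summing Bernoulli log-likelihoods over the estimated signal subgraph; a minimal sketch, where the container names (`S`, `eta`, `pi`) are illustrative, not the paper's code:

```python
import numpy as np

def classify(A, S, eta, pi):
    """Bayes-plugin classification of one adjacency matrix A.
    S:   list of (u, v) signal-subgraph edges
    eta: dict mapping class y -> matrix of estimated edge probabilities
    pi:  dict mapping class y -> estimated prior probability
    Returns the class maximizing the log posterior score."""
    scores = {}
    for y in pi:
        ll = np.log(pi[y])
        for (u, v) in S:
            p = eta[y][u, v]
            ll += A[u, v] * np.log(p) + (1 - A[u, v]) * np.log(1 - p)
        scores[y] = ll
    return max(scores, key=scores.get)

# toy example: the one signal edge is likely present in class 0, absent in class 1
eta = {0: np.full((2, 2), 0.9), 1: np.full((2, 2), 0.1)}
pi = {0: 0.5, 1: 0.5}
A = np.array([[0, 1], [1, 0]])
assert classify(A, [(0, 1)], eta, pi) == 0
```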
signal subgraph classifier
our tasks
1 estimate π_y ∀ y ∈ Y
2 estimate S
3 estimate η_uv|y ∀ (u, v) ∈ S, y ∈ Y
estimating the prior
trivial
π̂^MLE_y = s_y / s
estimating the likelihood terms
less trivial
estimator              | equation                         | result
MLE                    | (1/s_y) Σ_{i:y_i=y} a^(i)_uv     | fails
objective MAP          | B(1/2, 1/2)                      | weird
weakly informative MAP | B(1, 1)                          | fails
spike & slab           | bern + beta                      | didn't try
Bishop et al. ('73)    | ω η^MLE + (1 − ω) λ              | expensive
our L-estimator        | look ↓                           | better for us

η̂_uv|y =
  ε_n            if max_{i:y_i=y} a^(i)_uv = 0
  1 − ε_n        if min_{i:y_i=y} a^(i)_uv = 1
  η̂^MLE_uv|y    otherwise
estimating the likelihood terms
L-estimators
a linear combination of (nonlinear functions of) order statistics
η̂ = Σ_i a_ni h(x_(i)), where
h(x_(i)) =
  ε_n        if max_{i:y_i=y} a^(i)_uv = 0
  1 − ε_n    if min_{i:y_i=y} a^(i)_uv = 1
  x_(i)      otherwise
and a_ni = 1/n
Thm 1: η̂ →P η^MLE as s → ∞.
Proof: it is an L-estimator.
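the estimator is one line of logic once written out; a minimal sketch, where `x` holds the binary edge values a^(i)_uv for the subjects in class y:

```python
import numpy as np

def eta_hat(x, eps):
    """Smoothed estimate of an edge probability from binary samples x.
    Clamps the estimate away from {0, 1}, so no test graph ever
    receives zero likelihood; otherwise reduces to the ordinary MLE."""
    x = np.asarray(x)
    if x.max() == 0:          # edge never observed in this class
        return eps
    if x.min() == 1:          # edge always observed in this class
        return 1 - eps
    return float(x.mean())    # plain MLE otherwise

assert eta_hat([0, 0, 0, 0], 0.01) == 0.01
assert eta_hat([1, 1, 1], 0.01) == 0.99
assert eta_hat([0, 1, 1, 0], 0.01) == 0.5
```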
estimating the signal subgraph
some definitions
signal subgraph: S = {(u, v) : η_uv|y ≠ η_uv|y′}
signal vertices: V = the minimal vertex set V′ such that every edge in S is incident to a vertex in V′
incoherent estimator: assume |S| = q ≪ (n choose 2) is known
coherent estimator: assume |V| = m ≪ n is known
estimating the signal subgraph
incoherent estimator
recall: edges are assumed to be independent
construct a test for each edge
H_0: η_uv|y = η_uv|y′
H_A: η_uv|y ≠ η_uv|y′
Fisher's exact test is optimal under our assumptions
let p_uv be the p-value for this test
rank-order the p-values, p_(1) ≤ p_(2) ≤ · · · ≤ p_((n choose 2))
let Ŝ^s_inc = the edges attaining p_(1), …, p_(q)
Thm 2: Ŝ^s_inc →P S as s → ∞
Proof: p_uv →P 0 ∀ (u, v) ∈ S and p_uv → U(0, 1) ∀ (u, v) ∉ S
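the incoherent estimator is a per-edge test plus a top-q selection; a sketch using `scipy.stats.fisher_exact`, where the array shapes and names are assumptions:

```python
import numpy as np
from scipy.stats import fisher_exact

def incoherent_estimate(As, ys, q):
    """Return the q edges whose class-conditional edge frequencies
    differ most (smallest Fisher exact-test p-values).
    As: (s, n, n) stack of binary adjacency matrices; ys: length-s 0/1 labels."""
    n = As.shape[1]
    A0, A1 = As[ys == 0], As[ys == 1]
    ranked = []
    for u in range(n):
        for v in range(u + 1, n):
            # 2x2 table: edge present / absent, per class
            c0, c1 = int(A0[:, u, v].sum()), int(A1[:, u, v].sum())
            table = [[c0, len(A0) - c0], [c1, len(A1) - c1]]
            _, p = fisher_exact(table)
            ranked.append((p, (u, v)))
    ranked.sort()
    return [edge for _, edge in ranked[:q]]
```

for example, with 20 graphs per class where only edge (0, 1) differs between classes, the top-1 selection recovers exactly that edge.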
estimating the signal subgraph
coherent estimator
again, get p-values the same way
rank-order the p-values incident to each u ∈ V: p_u,(1) ≤ p_u,(2) ≤ · · · ≤ p_u,(n)
find a collection of m vertices that collectively have the q most significant edges; call those edges Ŝ^s_coh
Thm 3: Ŝ^s_coh →P S as s → ∞
Proof: idem.
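a greedy simplification of the coherent idea (not the exact procedure above): score each vertex by its incident p-values, keep the m best-scoring "star" vertices, then return the q most significant incident edges. Names and structure are illustrative:

```python
import numpy as np

def coherent_estimate(pvals, m, q):
    """Greedy coherent signal-subgraph sketch.
    pvals: dict mapping edge (u, v), u < v, to its p-value.
    Picks m star vertices with the largest summed -log(p) over incident
    edges, then the q smallest-p edges touching those vertices."""
    score = {}
    for (u, v), p in pvals.items():
        w = -np.log(p)
        score[u] = score.get(u, 0.0) + w
        score[v] = score.get(v, 0.0) + w
    stars = set(sorted(score, key=score.get, reverse=True)[:m])
    incident = sorted((p, e) for e, p in pvals.items()
                      if e[0] in stars or e[1] in stars)
    return [e for _, e in incident[:q]]
```

with one strongly significant star vertex, e.g. `coherent_estimate({(0, 1): 1e-6, (0, 2): 1e-5, (1, 2): 0.9, (2, 3): 0.8}, m=1, q=2)`, the sketch keeps the two edges incident to vertex 0.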
neuro data analysis
the process
collect diffusion MRI data from 50 people in 2 groups
estimate a brain-graph for each person
pretend brain-graphs are perfect estimates
build classifiers
compare results
data collection
diffusion-weighted MRI (e.g., diffusion tensor imaging)
brains have large fiber bundles
water primarily diffuses within bundles, not across them
at each voxel, we can estimate the primary direction of diffusion
we can estimate the most likely path from each voxel
we say all voxels along the path are connected
we can also parcellate the brain into 70 regions
we can then compress the graph into 70 vertices
graph inference
MR-CAP: the magnetic resonance connectome automated pipeline
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
simulated data analysis
set |S| = 20, |V| = 1, η, and π
generate s samples (G_i, y_i) ~iid F_GY(θ)
estimate parameters using both strategies
compare edge selection and classification performance
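sampling from the model is just independent, class-conditional Bernoulli coin flips per edge; a sketch where `eta0`/`eta1` and the signal-edge list `S` are illustrative:

```python
import numpy as np

def sample_graph(y, eta0, eta1, S, rng):
    """Draw one symmetric binary adjacency matrix from the model.
    Off-signal edges always use probabilities eta0; signal edges
    (u, v) in S switch to eta1 when the class label y == 1."""
    n = eta0.shape[0]
    P = eta0.copy()
    if y == 1:
        for (u, v) in S:
            P[u, v] = P[v, u] = eta1[u, v]
    # flip each upper-triangular edge independently, then symmetrize
    upper = np.triu(rng.random((n, n)) < P, k=1).astype(int)
    return upper + upper.T
```

usage: `rng = np.random.default_rng(0); G = sample_graph(1, np.full((5, 5), 0.3), np.full((5, 5), 0.9), [(0, 1)], rng)` yields a loop-free, symmetric 0/1 matrix.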
a single simulation
a monte carlo experiment
relative efficiency
brain-graphs!!!
average brain-graphs
leave-one-out cross-validation misclassification rate
synthetic data analysis
model assumption checking
signal subgraph
bake-off
complementary approaches
nonparametric: k-nearest neighbor
semiparametric: metric embedding + infinite Gaussian mixture model
hackometric: graph invariants + PCA + SVM
theoretical properties
nonparametric: universally consistent (see [1] or [2] for proof)
semiparametric: universally consistent? (proof in progress)
hackometric: not consistent
bake-off
results
classifier     | error (%)
naive bayes    | 41
incoherent     | 27
coherent       | 16
kNN            | 20
CW6GI          | 20
kNNPCA12GI     | 16
semiparametric | 16
outline
1 background (neuro, stats)
2 methods (stats, neuro)
3 results (stats, neuro)
4 discussion
summary
stats
first (to our knowledge) to develop probabilistic classifiers that natively operate on graph-valued data
the combination of parametric, semiparametric, nonparametric, and hackometric spans the space of strategies one might employ
neuro
we can now efficiently (in parallel) estimate brain-graphs
better than state-of-the-art performance
next steps
stats
semiparametric theory
more interesting generative process story
generalize to multiclass and regression
bayesianize
neuro
data start as 128 × 128 × 128 × 256 ≈ 50M values per subject
we then call our 70 × 70 graphs "high-dimensional"
we have O(10^3) brain-scans
building a neuroimaging database to efficiently process/query data; staticize "pre-processing"
can apply to functional connectivity as well
acknowledgements
brother: R. Jacob Vogelstein, PhD
postdoc advisor: Carey E. Priebe, PhD
grad student: William R. Gray
his advisor: Jerry L. Prince, PhD
presented data: Susan Resnick, PhD
large data collection: Michael Milham, PhD, MD
database: Randal Burns, PhD
anything
email: [email protected]
all code, pre-prints, etc., available at my website: http://jovo.me
all brain data: http://openconnecto.me