BIG (NEURO) STATISTICS
by Joshua T. Vogelstein, Duke University, Dept. Stats
JHU, IDIES (cosmology) Kavli Salon, Here & Now
Outline
• Background & Motivation
• Computational Challenges
• Statistical Challenges
• Summary & Discussion
Neuroscientific Aims
• Lay: “understand” the brain and its relationship to the mind
• Formal: let X = brain and Y = mind:
• construct models: $\{P_\theta[X, Y] : \theta \in \Theta\}$
• estimate parameters: $\theta^* = \arg\max_{\theta \in \Theta} P_\theta[X, Y]$
• make predictions: $y^* = \arg\max_{y'} P_{\theta}[X = x \mid Y = y']\, P[Y = y']$
• test theories: $P_\theta[X = x \mid Y = 0] \neq P_\theta[X = x \mid Y = 1]$
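A minimal sketch of the formal program above, under a toy model assumed purely for illustration (not the author's): Y is a binary label, X is a single scalar measurement, and $P_\theta[X, Y] = P[Y]\,\mathcal{N}(X; \mu_Y, \sigma^2)$. The maximum-likelihood parameters have closed forms, and prediction applies the argmax rule directly. The data and the helper names `log_gaussian` and `predict` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: Y in {0, 1} plays "mind", scalar X plays "brain"
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)                              # Y ~ Bernoulli(1/2)
x = rng.normal(loc=np.where(y == 1, 2.0, 0.0), scale=1.0)   # X | Y = k ~ N(2k, 1)

# Estimate theta = (pi, mu_0, mu_1, sigma^2) by maximum likelihood (closed form here)
pi_hat = np.array([np.mean(y == 0), np.mean(y == 1)])
mu_hat = np.array([x[y == 0].mean(), x[y == 1].mean()])
sigma2_hat = np.mean((x - mu_hat[y]) ** 2)                  # pooled variance MLE

def log_gaussian(v, mu, sigma2):
    """Log density of N(mu, sigma2) evaluated at v."""
    return -0.5 * np.log(2 * np.pi * sigma2) - (v - mu) ** 2 / (2 * sigma2)

def predict(x_new):
    """y* = argmax_{y'} P_theta[X = x | Y = y'] P[Y = y'], computed in log space."""
    v = np.asarray(x_new, dtype=float)[..., None]           # broadcast against the 2 classes
    scores = log_gaussian(v, mu_hat, sigma2_hat) + np.log(pi_hat)
    return np.argmax(scores, axis=-1)

accuracy = np.mean(predict(x) == y)
```

Working in log space avoids underflow when many measurements are combined; the same structure extends to multivariate X by swapping in a multivariate Gaussian density.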
Motivations
• Y = awesome computational power
• Y = personality type
• Y = psychiatric disorder
(reminder: X = brain, and Y = mind)
other examples: Higgs boson, galaxy, mutant, etc.
Big Data Challenges
• Computational: memory/storage, writes & indexing, scalable algorithms
• Statistical: high dimensions, outliers, non-Euclidean data
Computational Challenges
Computational Desiderata
• Scalable Computer Vision: 4D alignment, color correction, scene segmentation
• Multidimensional Database: seamless integration, spatial (locality) indexing, massive writes
Take-Home Messages (stuff we lack)
• Computational: Scalable Computer Vision; Multidimensional Spatial Databases
Statistical Challenges
Motivating (Descriptive) Challenge
• Growth charts are useful prognostic tools
• We want a “growth chart” for the brain (using ‘objective’ measurements)
• This requires estimating descriptive statistics, such as the mean
Motivating (Predictive) Challenge
• Biomarkers for clinical disorders are useful
• We want biomarkers for various psychiatric disorders
• This requires estimating predictive statistics, such as a classifier
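Statistical Taxonomy
• Little data (dimension d ≪ number of samples n): descriptive and predictive problems
• Big data (dimension D ≫ number of samples n): descriptive and predictive problems
• samples are indexed by i ∈ {1, 2, …, n}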
Example Challenges (class: exemplars)
Descriptive
• location: mean, median
• scale: variance, MAD
• matrix factorizations: eigenvectors, NMF
• density estimation: KDE
• multimodal fusion: CCA
Predictive
• two-class classification: LDA, QDA
• n-class classification: kNN, CART
• regression: CART, SVR
• multivariate regression: RRR
• manifold matching: JoFC
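To make the location and scale exemplars concrete, this sketch computes the mean, median, sample variance, and unscaled MAD (median absolute deviation) with NumPy on a hypothetical 1-D sample containing one gross outlier.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1000.0])   # hypothetical sample; 1000 is the outlier

# Location exemplars
mean = x.mean()
median = np.median(x)

# Scale exemplars: sample variance and (unscaled) median absolute deviation
variance = x.var(ddof=1)
mad = np.median(np.abs(x - median))
```

Here the median is 3.5 and the MAD is 1.5, while the single outlier drags the mean to about 169 and inflates the variance by orders of magnitude; this is the sense in which the median and MAD are the robust exemplars.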
Challenge 1: High Dimensions
• simplest descriptive example possible: estimate the mean
• little solution: $x^* = \arg\min_{x} \sum_i \lVert x_i - x \rVert_2^2$
• big challenge: under squared-error loss, $x^*$ is inadmissible for Gaussian data when d > 2 (Stein's paradox)
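Stein's result can be checked numerically. This sketch assumes Gaussian data with known identity covariance and a hypothetical true mean, and compares the squared-error risk of the sample mean with that of the James-Stein shrinkage estimator. James-Stein is chosen as the standard illustration of inadmissibility, not necessarily the remedy the talk has in mind.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, reps = 10, 5, 2000
theta = np.full(d, 0.5)                            # hypothetical true mean

# reps independent datasets, each with n draws from N(theta, I_d)
X = theta + rng.normal(size=(reps, n, d))
xbar = X.mean(axis=1)                              # least-squares estimate x* per dataset

# James-Stein: shrink xbar toward 0; xbar has known covariance I_d / n here
shrink = 1.0 - (d - 2) / (n * np.sum(xbar ** 2, axis=1))
x_js = shrink[:, None] * xbar

mse_mean = np.mean(np.sum((xbar - theta) ** 2, axis=1))   # close to d / n = 2
mse_js = np.mean(np.sum((x_js - theta) ** 2, axis=1))     # smaller, as Stein's result predicts
```

The estimator shrinks toward an arbitrary fixed point (here the origin); its risk is below that of the sample mean for every true mean once d ≥ 3, with the largest gains when the true mean is near the shrinkage target.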
Challenge 1: High Dimensions
• simplest predictive example possible: 2-class classification
• little solution (LDA): $y^* = \arg\max_{y} \mathcal{N}(x; \mu_y, \Sigma)\, \pi_y$
• big challenge: the estimated covariance $\hat{\Sigma}$ is singular when D ≫ n
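The singularity is easy to reproduce. This sketch (hypothetical data with D = 100 and n = 20) forms the pooled within-class covariance used by LDA, whose rank is at most n − 2, then applies one common workaround: shrinkage toward a scaled identity with an arbitrarily chosen weight `lam = 0.5`. Shrinkage and the helper `lda_predict` are illustrative choices, not necessarily what the talk advocates.

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 20, 100                                   # many more dimensions than samples
y = np.repeat([0, 1], n // 2)                    # two balanced classes
X = rng.normal(size=(n, D))
X[y == 1] += 0.5                                 # hypothetical mean shift for class 1

# Class means and pooled within-class covariance, as in LDA
mu = np.stack([X[y == k].mean(axis=0) for k in (0, 1)])
R = X - mu[y]                                    # residuals sum to zero within each class
S = R.T @ R / (n - 2)
rank_S = np.linalg.matrix_rank(S)                # at most n - 2 = 18, far below D = 100

# Illustrative fix (an assumption, not the talk's method): shrink toward a scaled identity
lam = 0.5                                        # arbitrary shrinkage weight in (0, 1]
S_shrunk = (1 - lam) * S + lam * (np.trace(S) / D) * np.eye(D)
rank_shrunk = np.linalg.matrix_rank(S_shrunk)    # full rank, so it can be inverted
P = np.linalg.inv(S_shrunk)

def lda_predict(x):
    """Linear discriminant with equal class priors (so the log-prior term cancels)."""
    scores = [x @ P @ mu[k] - 0.5 * mu[k] @ P @ mu[k] for k in (0, 1)]
    return int(np.argmax(scores))
```

In practice the shrinkage weight would be chosen by cross-validation or an analytic rule rather than fixed by hand.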
Challenge 2: Outliers
• simplest descriptive example: estimate the mean with outliers
• little solution (median): $x^* = \arg\min_{x} \sum_i |x_i - x|$
• big challenge: the median has no single natural definition when d > 1 (e.g., coordinatewise vs. geometric median)
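Two natural generalizations of the 1-D formula give different answers: replacing $|x_i - x|$ by the $\ell_1$ norm yields the coordinatewise median, while the Euclidean norm yields the geometric median, computed here with Weiszfeld's iteration. The triangle data are hypothetical, and `geometric_median` is a hypothetical helper, not a library function.

```python
import numpy as np

def geometric_median(X, n_iter=500, eps=1e-12):
    """Weiszfeld iteration for argmin_x sum_i ||x_i - x||_2."""
    m = X.mean(axis=0)                       # start at the (non-robust) mean
    for _ in range(n_iter):
        dist = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(dist, eps)      # guard against division by zero at a data point
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < 1e-10:
            return m_new
        m = m_new
    return m

# Hypothetical 2-D data: the three vertices of a right triangle
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
coord_median = np.median(X, axis=0)          # minimizes sum_i ||x_i - x||_1
geo_median = geometric_median(X)             # minimizes sum_i ||x_i - x||_2
```

For this triangle the coordinatewise median is the vertex (0, 0), while the geometric median is the interior Fermat point with both coordinates equal to (3 − √3)/6 ≈ 0.211. The coordinatewise median is also not rotation equivariant, one reason no single definition dominates.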
Challenge 2: Outliers
• simplest predictive example: classify with outliers
• little solution: SVM
• big challenge: support vectors are even more singular
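A minimal linear soft-margin SVM, trained by full-batch subgradient descent on the regularized hinge loss, sketches the ‘little’ solution. The hyperparameters (`lam`, `lr`, `n_epochs`), the two-blob data, and the single planted outlier are all hypothetical. The hinge loss grows only linearly in a margin violation, so the outlier pulls on the solution less than it would under a squared loss.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, n_epochs=200):
    """Linear soft-margin SVM by full-batch subgradient descent on
    lam/2 * ||w||^2 + mean_i max(0, 1 - y_i * (w @ x_i + b)), with y_i in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, n_epochs + 1):
        margins = y * (X @ w + b)
        active = margins < 1                     # points with nonzero hinge loss
        ya, Xa = y[active], X[active]
        grad_w = lam * w - (ya[:, None] * Xa).sum(axis=0) / n
        grad_b = -ya.sum() / n
        step = lr / np.sqrt(t)                   # decaying step size
        w = w - step * grad_w
        b = b - step * grad_b
    return w, b

# Hypothetical data: two Gaussian blobs labeled +1 and -1 ...
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, size=(100, 2)),
               rng.normal(-2.0, 1.0, size=(100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
# ... plus one gross outlier: a +1 point deep inside the -1 cloud
X = np.vstack([X, [[-20.0, -20.0]]])
y = np.append(y, 1.0)

w, b = train_linear_svm(X, y)
inlier_accuracy = np.mean(np.sign(X[:-1] @ w + b) == y[:-1])
```

In practice a library solver such as scikit-learn's `sklearn.svm.LinearSVC` would typically be used instead of a hand-rolled loop.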
Challenge 3: Non-Euclidean
• simplest descriptive example: estimate the mean when $x_i \notin \mathbb{R}^d$
• little solution: $x^* = (x_1 + x_2 + \dots + x_n)/n$
• big challenge: there is no obvious choice for ‘+’; e.g., for two such objects A and B, what is A + B?
[Figure: A + B = ?]
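The circle is perhaps the simplest space where ‘+’ misleads. This sketch uses hypothetical angles as a stand-in for whatever objects the slide's figure showed, and compares the arithmetic mean with the Fréchet mean, which generalizes the least-squares characterization of the mean, $\arg\min_x \sum_i d(x_i, x)^2$, by replacing Euclidean distance with arc length. The grid search is for clarity, not efficiency, and `geodesic_deg` is a hypothetical helper.

```python
import numpy as np

def geodesic_deg(a, b):
    """Arc-length distance on the circle, in degrees."""
    diff = np.abs(a - b) % 360.0
    return np.minimum(diff, 360.0 - diff)

angles = np.array([350.0, 10.0])                 # hypothetical sample on the circle

naive_mean = angles.mean()                       # treats angles as points in R

# Frechet mean: argmin_x sum_i d(x_i, x)^2, searched over a 1-degree grid
candidates = np.arange(0.0, 360.0, 1.0)
cost = (geodesic_deg(angles[None, :], candidates[:, None]) ** 2).sum(axis=1)
frechet_mean = candidates[np.argmin(cost)]
```

The arithmetic mean, 180°, is the point on the circle farthest from both observations; the Fréchet mean is 0°. The same recipe applies to any space with a distance, though the minimizer may not be unique or easy to compute.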
Take-Home Messages (stuff we lack)
• Computational: Scalable Computer Vision; Multidimensional Spatial Databases
• Statistical: ‘Default’ Descriptive Theory & Methods; ‘Default’ Predictive Theory & Methods
Acknowledgements
• Theory: Carey E. Priebe, Mauro Maggioni, David Dunson
• Code: Randal Burns
• Data: M Milham, K Deisseroth, J Lichtman, C Reid, S Smith
• Funds: XDATA (DARPA), BIGDATA & CRCNS (NIH/NSF)
• Love: yummy, family, friends, earth, universe, multiverse?
e: jo.vo@duke.edu, c: 443.858.9911, w: jovo.me, openconnecto.me