CRICOS No. 00213J · a university for the real world
Bayesian modelling in the
big data era: case studies
in understanding
neurodegenerative
disease
Kerrie Mengersen
Distinguished Professor
Statistics, QUT
Acknowledgements
• Peter Silburn (MD, PD)
• Graham Kerr (QUT, PD)
• James Doecke (CSIRO, AD)
• BRAG (QUT)
• ACEMS (QUT)
https://www.youtube.com/watch?v=PI7SLOovO5c
Tackling Big Data with Data Science
DataScience@QUT
Australian Data Science Network
Partnerships in research and
translation
Domain-specific data science
research
Fundamental data science
research
Mathematical Sciences
+
Computer Science
DataScience@QUT
+
Collaborative Domains
Australian Data Science Network
+
External Partners
Advancing Training in Data Science
Creating and enhancing data science capability and capacity
Research Training HDRs and ECRs
Undergraduate vacation research programs
Targeted programs, e.g., Women in Data Science, Indigenous Data Science
Knowledge Exchange Peer-to-peer connections: internships, professional sabbaticals
Short courses, seminars, conferences, workshops, MOOC
Directory of Data Science Resources
QUT Activities Master of Data Analytics
Work Integrated Learning undergraduate program
QUT EX Program
e-Research Training Program
Linkages ARC Training Centres
Bayesian Research and Applications Group
(BRAG) at QUT
Modelling
• Combining data
sources
• Modelling with
uncertainty
• Using prior
information
• Probabilistic
prediction
• Risk stratification
• Complex systems
models
Computational Methods
• Algorithms
(MCMC, ABC, VB, HMC)
• Software
(R, Python, Julia)
• Approximations
• Ensembles
• Platforms
• Dimension reduction
• Parallelisation, Sketching
• Distributed learning
Applications
• Health
• Environment
• Industry
• Complex systems
• Varied information
sources
• Varied data sources
(citizen science, VR,
drones, digital data)
Models:
• Probabilistic
• Regularised
• Flexible
• Robust
• Transferable
• Adaptive
Computation:
• Scalable (parallelisable)
• Subsampling
• Pre-computable
• Approximations (eg. ABC)
"In the past ten years, it's hard to find anything that doesn't advocate a Bayesian approach." -Nate Silver
Inference:
• Estimation
• Optimisation
• Uncertainty quantification
• Testing
• Model averaging
Meeting the challenge: “New Bayes”
Bayesian Modelling
$p(\theta \mid y) = p(y \mid \theta)\, p(\theta) / p(y)$
$p(\theta \mid y) = p(\theta)\, p(y \mid \theta) / p(y)$
Likelihood × Prior / Normalising constant
Prior × Likelihood / Normalising constant
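A minimal numerical sketch of the formula above, not taken from the talk: a grid approximation to the posterior of a binomial success probability under a flat prior. The data (7 successes in 10 trials), the grid size and the function name are assumptions chosen for illustration.

```python
from math import comb

def grid_posterior(successes, trials, grid_size=101):
    """Posterior on a grid: likelihood x prior / normalising constant."""
    thetas = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0 / grid_size] * grid_size                    # flat prior p(theta)
    likelihood = [comb(trials, successes) * t**successes * (1 - t)**(trials - successes)
                  for t in thetas]                           # p(y | theta)
    unnormalised = [l * p for l, p in zip(likelihood, prior)]
    evidence = sum(unnormalised)                             # p(y), the normalising constant
    posterior = [u / evidence for u in unnormalised]
    return thetas, posterior

# Hypothetical data: 7 successes in 10 trials
thetas, post = grid_posterior(successes=7, trials=10)
post_mean = sum(t * p for t, p in zip(thetas, post))
```

Swapping the order of the prior and the likelihood in the product changes nothing, which is the point of the two equivalent forms shown on the slide.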
$p(\theta \mid y) = p(y \mid \theta)\, p(\theta) / p(y)$
1763: Bayes
1812: Laplace
1838: Boole, Venn
1930s: Fisher, Neyman
1950s: Jeffreys
1980s: Geman and Geman
1990s: Gelfand and Smith
2000s: Today's Bayesians
Terms used over time: "Probability Theory", "Inverse Probability", "Bayesian Analysis"
Bayesian spatial image analysis
MRI scans of brains
https://www.radiologyinfo.org/en/info.cfm?pg=alzheimers
Approaches
• Clair Alston-Knox – spatial mixture models
• Chris Strickland – spatial dynamic factor models
• Zoe van Havre – overfitted mixture models
• Matthew Moores – scalable approximate algorithms and pre-processing
• Cathy Hargrave – feature alignment
• Marcela Cespedes – hierarchical multivariate models
• Hongbo Xi – sparse matrix factorisation
• Insha Ullah – PCA approaches for high-D variable selection
• Jacinta Holloway – decision-tree methods
• Aleysha Thomas – ensemble meta-analysis + Bayesian network models
1. Parkinson’s Disease (PD)
• PD is a common neurodegenerative disorder that
affects 0.3% of the general population.
• 4% of the cases are under the age of 50 years.
• Onset of PD is often mistaken for normal healthy
ageing.
• There is limited literature on the age at PD onset. A
deeper understanding about the age at onset could
lead to better clinical assessments and timely
management.
Parkinson’s Disease (PD)
• Although some non-genetic risk factors have
been shown to have a strong influence on PD,
most of these have been individually studied,
and only a few have been studied in
association with the age at PD onset.
• One of the factors that is increasing in interest
is organochlorine pesticide (OCP) exposure.
• What are the combined effects of non-genetic
risk factors and pesticide exposure on the age
at onset of Parkinson’s Disease?
Ensemble Approach
• Meta-analysis to obtain odds ratio estimates from previous studies on the effect
of individual risk factors on age at PD onset and merge inferences on five OCPs
into a single distribution.
• Bayesian Network to combine results of the meta-analysis and
information on PD patients from multiple data sources
Data
• Risk factor information collected on a cohort of 350 PD patients as part of the
Queensland Parkinson’s Project (QPP).
• Concentrations of OCP (HCB, β-HCH, trans-nonachlor, p,p’-DDE and p,p’-DDT)
measured from pooled samples of human blood serum from males and females
collected in Brisbane, Australia in age groups 5-15, 16-30, 31-45, 46-60 and >60
years
• Estimates of the association between risk factors and age at onset obtained from a
systematic review: articles that reported an odds ratio (OR) and a 95% CI were
included.
Literature Review
Meta-analysis model
Fit separate models for each age group and gender.
Let $y_i$ be the estimated log odds ratio for early age at
onset of PD (<50 years old) associated with exposure
to pesticide for the i-th study.
• Fit a random-effects meta-analysis model to
combine effects across studies.
The overall combined OCP concentrations ($\theta_0$) for
each age group and gender were used to
parameterise part of the BN.
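To make the pooling step concrete, here is a sketch of a random-effects meta-analysis on the log odds ratio scale. It uses the classical DerSimonian-Laird moment estimator as a stand-in, which is an assumption: the slides do not state how the model was fitted, and a Bayesian fit would place priors on the pooled effect and the between-study variance instead. The study values below are hypothetical.

```python
from math import log, exp, sqrt

def log_or_and_se(or_est, ci_low, ci_high, z=1.96):
    """Convert a reported OR and 95% CI to a log odds ratio and its standard error."""
    return log(or_est), (log(ci_high) - log(ci_low)) / (2 * z)

def dersimonian_laird(ys, ses):
    """Random-effects pooled log OR using the DerSimonian-Laird moment estimator."""
    w = [1 / s**2 for s in ses]                                 # fixed-effect weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed)**2 for wi, yi in zip(w, ys))    # Cochran's Q
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)                    # between-study variance
    w_star = [1 / (s**2 + tau2) for s in ses]                   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, ys)) / sum(w_star)
    return pooled, sqrt(1 / sum(w_star)), tau2

# Hypothetical (OR, lower, upper) triples, for illustration only
studies = [(1.8, 1.1, 2.9), (1.3, 0.8, 2.1), (2.4, 1.2, 4.8)]
ys, ses = zip(*(log_or_and_se(*s) for s in studies))
pooled, pooled_se, tau2 = dersimonian_laird(ys, ses)
pooled_or = exp(pooled)
```

Working on the log scale keeps the sampling distribution of each study estimate close to normal, which is why the reported OR and CI are transformed before pooling.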
Meta-analysis results: OR of early age at onset of PD
associated with exposure to pesticide
Bayesian Network
Bayesian Network: strength of influence
Bayesian Network:
varying the evidence
Large variation in outcome associated
with:
- OCP exposure
- Head injury
- Both head injury and family history
Small variation associated with:
- smoking
- alcohol
- presence of both led to lower
probability of an early age at onset
The absence of one or both medical
history risk factors had a lower
probability of early age at onset.
Conclusions
• Family history, prior head injury and OCP exposure are strongly associated
with an earlier age at PD onset.
• Irrespective of other risk factors, OCP exposure has a strong influence on the
probability of an early age at onset: high exposure is linked to a higher
probability of early onset compared to low exposure.
Extracellular recordings provide real-time
monitoring of brain activity.
Measurement of action potentials (spikes) indicates
neuron populations present in the region of interest.
Spike sorting assigns individual spikes to source
neurons.
We want to analyse spikes collected from the
subthalamic nucleus during Deep Brain Stimulation, a
surgical intervention for the alleviation of symptoms in
patients with advanced Parkinson's disease (PD).
2. Spike sorting
Zoe van Havre, Nicole White, Judith
Rousseau, Kerrie Mengersen
• In PD, parts of the basal ganglia are either
under- or over-stimulated. Normal movement
is replaced by tremor, rigidity and stiffness.
• DBS of specific ganglia alters the abnormal
electrical circuits and helps stabilize the
feedback loops, thus reducing symptoms.
• Electrodes can be placed in the subthalamic
nucleus, thalamus or globus pallidus.
Deep Brain Stimulation
• Three independent samples, Y1, Y2, Y3
• Dimension reduction performed using robust Principal Components Analysis (PCA).
The first four principal components (PCs) were used as inputs into the model.
Data
• Problem: infer the partition of n multivariate observations into K clusters (K unknown).
• For the sample y = (y1,…, yn) let yi = (yi1,…, yir) consist of r measurements associated
with observation i.
• Cluster membership for each yi is inferred via the discrete latent variable zi where zi = k
denotes the assignment of yi to cluster k.
$p(y_i \mid z_i = k, \theta_k) = N_r(\mu_k, \Sigma_k)$
• Priors: $p(\mu_k \mid \Sigma_k) = N_r(b_0, \Sigma_k / N_0)$; $p(\Sigma_k) = IW(c_0, C_0)$
Model setup
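• Likelihood
$p(y \mid \theta, \pi) = \prod_{i=1}^{n} \sum_{k=1}^{K^*} \pi_k N_r(\mu_k, \Sigma_k)$, with $K^* > K$
• Priors:
$z_i \mid \pi \sim MN(1; \pi_1, \dots, \pi_{K^*})$
$\pi_1, \dots, \pi_{K^*} \sim D(\alpha, \dots, \alpha)$
• Set $b_0 = \bar{y}$, $N_0 = 0.01$, $c_0 = 5$, $C_0 = 0.75\,\mathrm{cov}(y)$.
• Choose $\alpha$ by prior tempering (ZMix).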
1: Overfitted finite Gaussian mixture model
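The role of the Dirichlet hyperparameter in an overfitted mixture can be illustrated at the prior level. The sketch below does not implement ZMix or prior tempering; it only draws weights from a symmetric Dirichlet prior and counts how many of the K* components receive non-negligible weight. The values K* = 10, the two hyperparameter settings and the 0.01 threshold are assumptions chosen for illustration.

```python
import random

def dirichlet_draw(alpha, k, rng):
    """One draw from a symmetric Dirichlet(alpha, ..., alpha) via normalised gammas."""
    g = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(g)
    return [x / total for x in g]

def mean_effective_components(alpha, k=10, draws=2000, threshold=0.01, seed=1):
    """Average number of components whose prior weight exceeds `threshold`."""
    rng = random.Random(seed)
    counts = [sum(w > threshold for w in dirichlet_draw(alpha, k, rng))
              for _ in range(draws)]
    return sum(counts) / draws

sparse = mean_effective_components(alpha=0.05)  # small hyperparameter: few components carry weight
dense = mean_effective_components(alpha=5.0)    # large hyperparameter: weight spread over all K*
```

A small hyperparameter favours configurations in which superfluous components are close to empty, which is what lets an overfitted model with K* > K recover the number of occupied clusters.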
Random measure G characterized by the mean $G_0$ of the DP and
concentration parameter m
$y_i \mid \theta_i \sim p(y_i \mid \theta_i)$
$\theta_i \mid G \sim G$
$G \sim DP(m G_0)$
2: Dirichlet Process model
2: Dirichlet Process mixture model
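$G = \sum_{k=1}^{\infty} \pi_k \delta_{\theta_k}$
$\pi_k = \nu_k \prod_{l<k} (1 - \nu_l)$
$\nu_k \sim \mathrm{Beta}(1, m)$
$\theta_k \mid G_0 \sim G_0$
$m \sim \mathrm{Gamma}(1, 1)$
• Implemented using a slice sampler
• The Posterior Expected Adjusted Rand (PEAR) index was used as a MAP estimator for z.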
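The stick-breaking construction of the mixture weights can be sketched as follows. This is a prior simulation only: it truncates the infinite sum at a fixed number of components, which is an assumption made for illustration, whereas the slice sampler named on the slide avoids a fixed truncation. The truncation level and seed are arbitrary.

```python
import random

def stick_breaking_weights(m, truncation, rng):
    """Truncated stick-breaking weights: each weight is a Beta(1, m) share of the remaining stick."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        nu = rng.betavariate(1.0, m)      # break-off proportion for this component
        weights.append(nu * remaining)    # piece broken off the remaining stick
        remaining *= 1.0 - nu             # stick left for later components
    return weights, remaining

rng = random.Random(42)
m = rng.gammavariate(1.0, 1.0)            # concentration drawn from Gamma(1, 1)
weights, leftover = stick_breaking_weights(m, truncation=50, rng=rng)
```

Small values of the concentration parameter put most of the mass on the first few sticks, while large values spread it over many small clusters.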
Results: posterior distribution of number
of occupied components
Results: optimal partitioning
Results: frequency of cluster membership
Conclusions
• Both methods could identify high probability clusters comprising spikes with similar
trajectories.
• The uncertainty in the clustering was caused by a small number of quite different
spikes in each dataset.
• The DPM captured this through a variable number of small clusters.
• The OFM captured this by combining the ‘outliers’ into a single group with a large
covariance (multivariate Gaussian noise component). This prevented the
interpretation of the smallest clusters and fine structure.
From PD spikes to AD pre-clinical diagnosis
• Accumulation of β-amyloid (Aβ) begins 15-25 years prior to the clinical
classification of dementia.
https://www.keepmemoryalive.org/brain-science/alzheimers-brain
The cortex shrivels up, damaging
areas involved in thinking, planning
and remembering.
Shrinkage is especially severe in the
hippocampus (formation of new
memories).
Ventricles (fluid-filled spaces) grow
larger.
From PD spikes to AD pre-clinical diagnosis
• We wanted to identify pre-clinical Alzheimer's Disease in a population of elderly
cognitively normal participants.
• We sampled 761 clinically normal (CN) participants at 4 periods (0-54 months) in the
Australian Imaging, Biomarkers and Lifestyle (AIBL) study of ageing.
• We used six standardised composite neuropsychological scores:
Verbal episodic memory; Visual memory;
Executive function; Language;
Attention and processing speed; Visuo-spatial functioning
Method
• We fitted a Bayesian mixture model to each composite score and time point, using
ZMix.
• We defined an aggregate measure of posterior probabilities (AMPP score) to
establish the likelihood of pre-clinical AD.
• For the ith person,
$\mathrm{AMPP}_i = \sum_{j=1}^{24} \Pr\big(z_{ij} = k \mid \Pr(\mu_k < 0) > 0.95\big)$
• We compared these results with groupings based on clinical measures, PET and MRI
scans.
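One possible reading of the AMPP definition, sketched under stated assumptions: for each of the 24 fitted models (taken here to be the six composite scores at four time points), add up the person's posterior probability of belonging to any cluster whose mean is below zero with posterior probability above 0.95. The function name, argument layout and input values are hypothetical, and the published method may differ in detail.

```python
def ampp(membership_probs, prob_mean_below_zero, cutoff=0.95):
    """Aggregate measure of posterior probabilities for one person.

    membership_probs[j][k]     : posterior probability that the person is in cluster k of model j
    prob_mean_below_zero[j][k] : posterior probability that the mean of cluster k in model j is < 0
    """
    total = 0.0
    for probs_j, low_j in zip(membership_probs, prob_mean_below_zero):
        # Only clusters that sit credibly below the standardised mean contribute
        total += sum(p for p, low in zip(probs_j, low_j) if low > cutoff)
    return total

# Hypothetical inputs: 2 of the 24 models, 2 clusters each
membership = [[0.9, 0.1], [0.3, 0.7]]
low_mean = [[0.01, 0.99], [0.50, 0.97]]
score = ampp(membership, low_mean)
```

Under this reading the score ranges from 0 to 24, with larger values indicating consistent membership of low-scoring clusters across tests and time points.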
Results
AMPP for 0 vs 18
months, for each
composite score.
Scale for AMPP:
Green: low
Yellow: medium
Red/black: high
Results (figure panels: Low, Moderate, High)
Results
• From Baseline through to 54 months, visuo-spatial function had
the greatest contribution to the AMPP score, followed by attention
and processing speed and visual memory.
• Participants with the highest AMPP scores had both increasing
neo-cortical amyloid burden and decreasing hippocampus volume
over 54 months, compared to those in the lowest category with
stable amyloid burden and hippocampus volume.
• This approach can provide an indication of pre-clinical AD.
Compare brain networks in normal and AD patients
3. Alzheimer's Disease
https://physicsworld.com/a/towards-a-vaccine-for-alzheimers-disease/
Pearson pairwise correlations (panels: Normal, Mild, Alzheimer's)
$y_{irk}$ : cortical thickness of region k = 1:K for participant i = 1:I who has r = 1:$R_i$ replicates
$y_{irk} \mid b_{ik}, \beta, \sigma^2 \sim N(x_i \beta + b_{ik}, \sigma^2)$
$b_i \mid \sigma_s^2, W \sim MVN(0, \sigma_s^2 Q)$
$Q^{-1} = \rho (D_w - W) + (1 - \rho) I$
$D_w$ : diagonal matrix with elements given by row sums (or number of neighbours) $\sum_{j=1}^{K} w_{jk}$
W : zero-diagonal, binary symmetric matrix, $w_{jk} = 1$ if regions j and k are neighbours, else 0
$\rho$ : determines global level of spatial correlation
Bayesian Hierarchical Model
Typically $\rho$ is fixed – can we estimate it?
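As a small illustration of how the precision matrix of the random effects is assembled from the neighbourhood matrix W and the spatial correlation parameter (written rho in the code), the sketch below builds it for a toy graph. The four-region neighbourhood, the value 0.9 and the function name are assumptions, not values from the talk.

```python
def car_precision(W, rho):
    """Precision matrix rho * (D_w - W) + (1 - rho) * I for a binary symmetric W."""
    k = len(W)
    degree = [sum(row) for row in W]      # diagonal of D_w: number of neighbours per region
    return [[rho * ((degree[j] if j == l else 0) - W[j][l])
             + (1 - rho) * (1 if j == l else 0)
             for l in range(k)]
            for j in range(k)]

# Hypothetical neighbourhood: 4 regions on a line (1-2, 2-3, 3-4)
W = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
Q_inv = car_precision(W, rho=0.9)
```

Setting the parameter to 0 returns the identity matrix (independent random effects), while 1 gives the intrinsic form in which each region is smoothed entirely towards its neighbours.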
Estimating the neighbourhood
Computation: MCMC
Results:
Simulation
Results: case study (panels: Normal, N=120; Alzheimer's, N=20)
Comparison of neurodegeneration over time
• Australian Imaging, Biomarkers and Lifestyle (AIBL) study of ageing
• Neuroimaging data across healthy controls (HC), mild cognitive impairment (MCI) and AD.
• Focus on ventricle and hippocampus regions
• Three types of inference:
1. comparisons of estimated rates of population deterioration
2. ranking of participants by order of linear volumetric rate of change
3. probability trajectories across age of diagnosis groups
Results
(i) large differences in average rate of change of
volume for the ventricle and hippocampus regions
across diagnosis groups
(ii) high-risk individuals who had progressed from HC
to MCI and displayed rates of deterioration similar to
their AD counterparts
(iii) critical time points that indicate where
deterioration of regions begins to diverge between the
diagnosis groups
Computation: Bayesian Analysis via AutoStat https://autostat.com.au/
Intelligent data collection via (adaptive) design
If we have a specific
question, we don’t need
to analyse all of the data.
Use experimental design
principles to select the
data required to answer
the question.
CC Drovandi, CC Holmes, JM McGree, K Mengersen, S Richardson, EG Ryan,
Principles of Experimental Design for Big Data Analysis, Statistical Science, 32 (3), 385–404, 2017
A decision analysis approach to experimental design
Bayesian experimental design
Experimental design in the context of big data
1. Answer questions of interest: Find the optimal (or near optimal) design to answer the
question and use the design as a ‘template’ for sub-sampling the data.
2. Sequential learning: Apply a given design to incoming data or new datasets until the
question of interest is answered.
3. Assess data quality: Absence of design points/windows may indicate structured
missingness or bias in the big dataset.
4. Assess model quality: Replicate designs can be ’laid over’ the big data for model
checking (eg posterior predictives), concept drift etc.
5. Enlarge loss function: Include model misspecification, time constraints etc.
Example: logistic regression
• 6 covariates
• 1,000,000 records
Analysis aims:
- Identification of important covariates for prediction
- Accurate and precise parameter estimates
Experimental design approach
• Select a random sample of 10,000 points to construct prior distributions.
• “Value add” to the information gained through a sequential design process.
• Use Sequential Monte Carlo for fast computation.
• For each new data point, update the prior information to reflect the information
gained and form a 95% credible interval for all parameters.
• If the credible interval for any parameter is contained within (−tol, tol), drop that covariate from the model.
• Re-fit the reduced model and re-run.
• Iterate until 20,000 data points are extracted.
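The covariate-dropping rule in the steps above can be sketched in isolation. This block covers only the interval check; the Sequential Monte Carlo updates and the design selection are not implemented, and the covariate names, interval values and tolerance are hypothetical.

```python
def covariates_to_drop(credible_intervals, tol):
    """Names of covariates whose 95% credible interval lies inside (-tol, tol)."""
    return [name for name, (low, high) in credible_intervals.items()
            if -tol < low and high < tol]

# Hypothetical intervals after some sequential updates
intervals = {"x1": (0.42, 0.61), "x2": (-0.03, 0.02), "x3": (-0.20, 0.01)}
dropped = covariates_to_drop(intervals, tol=0.05)
kept = [name for name in intervals if name not in dropped]
```

An interval that merely contains zero is not enough to drop a covariate under this rule: the whole interval must sit inside the tolerance band, so a wide, uncertain interval such as the one for x3 keeps its covariate in the model.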
Tackling Big Data with Bayesian Statistics
DataScience@QUT
Australian Data Science Network
QUT Centre for Data Science
References
M Cespedes, J McGree, CC Drovandi, K Mengersen, LB Reid, JD Doecke, J Fripp (2017) A Bayesian hierarchical approach to jointly model structural biomarkers and covariance networks. arXiv.
M Cespedes, J Fripp, JM McGree, CC Drovandi, K Mengersen, JD Doecke (2017) Comparison of neurodegeneration over time between healthy ageing and Alzheimer’s disease cohorts via Bayesian inference. BMJ Open, 7(2).
Z van Havre, P Manuff, V Villemagne, K Mengersen, J Rousseau, N White, J Doecke (2019) Identification of pre-clinical Alzheimer’s Disease in a population of elderly cognitively normal participants. Journal of Alzheimer’s Disease, 1-10.
Z van Havre, N White, J Rousseau, K Mengersen (2015) Clustering action potential spikes: Insights on the use of overfitted finite mixture models and Dirichlet process mixture models. arXiv preprint arXiv:1602.01915.
A Thomas, NM White, LM Leontjew Toms, K Mengersen (2018) Application of ensemble methods to analyse the decline of organochlorine pesticides in relation to the interactions between age, gender and time. PLOS ONE.