91
Theory in Practice: Modeling in Neuroimaging How to model “big” MRI datasets

Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Theory in Practice: Modeling in Neuroimaging

How to model “big” MRI datasets

Page 2: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Outline of talk

• Theory recap: modelling approaches can be reduced to two types: predictive and descriptive

• “Big data” complicates our ability to apply both approaches

• Marginal Modelling is a good approach good for descriptive modelling

• Functional Random Forests is a good approach for predictive modelling

• Other approaches can also handle big data, but are beyond the scope of this workshop

Page 3: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Before even considering models, we need to know what question to ask• How and where may cortical thickness be associated with working

memory performance?

Page 4: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Before even considering models, we need to know what question to ask• How and where may cortical thickness be associated with working

memory performance?

• Can measures of functional brain organization predict an individual’s working memory ability?

Page 5: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Each question requires a different modelling approach• How and where may cortical thickness be associated with working

memory performance? Descriptive modelling

• Can measures of functional brain organization predict an individual’s working memory ability? Predictive modelling

Page 6: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models measure what one has collected predictive models measure what one will collect

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

5

Page 7: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models explore data, predictive models confirm properties of data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

5

Page 8: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models provide insight, predictive models apply insight

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

5

Page 9: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models are limited to in-sample data, predictive models require out-of-sample data

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

5

Page 10: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models are assessed via theory and inference, predictive models are assessed by independent testing

https://www.educba.com/predictive-analytics-vs-descriptive-analytics/

5

5

Page 11: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Outline of talk

• Theory recap: modelling approaches can be reduced to two types: predictive and descriptive

• “Big data” complicates our ability to apply both approaches

• Marginal Modelling is a good approach for descriptive modelling

• Functional Random Forests is a good approach for predictive modelling

• Other approaches can also handle big data, but are beyond the scope of this workshop

Page 12: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

First, all health-focused imaging studies should probably be big data

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

Page 13: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

Page 14: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Our ABCD pipeline generates anywhere from 10 to 90 thousand tests (some special cases are in hundreds)

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

Page 15: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

We’ve collected about 10,000 cases

https://www.cell.com/neuron/pdf/S0896-6273(17)31141-8.pdf

Page 16: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

ABCD needed a lot of coordination and data aggregation to collect over 10,000 participants

Auchter et al, 2018, https://doi.org/10.1016/j.dcn.2018.04.003

Page 17: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Descriptive models must take into account this nested structure• Complex models may be slow to calculate when analyzing ~4500

participants• Permutation tests may take days or even weeks

• Permutation tests lack exchangeability for complex questions

Page 18: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Permutation testing can reveal whether differences in community structure are significantly different

Hirschhorn,2005, https://doi.org/10.1038/nrg1521

depression

Page 19: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Permute group assignment and calculate statistic

Hirschhorn,2005, https://doi.org/10.1038/nrg1521

depression

‘depression’

no depression

‘no depression’

Page 20: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Do so for multiple permutations and construct a distribution of the statistic for permuted groups

Hirschhorn,2005, https://doi.org/10.1038/nrg1521

depression

‘depression’

no depression

‘no depression’

Page 21: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

P value is determined by the proportional rankof the observed statistic compared to the permuted distribution

Freq

ue

ncy

Page 22: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

At a Z=2.3, falsepositive rates are highwhen not usingpermutation testing

Page 23: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

At a Z=3.1, falsepositive rates aregenerally better andin-line with the trueFP rate

Page 24: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

This all works because each individual is independently acquired from one another – the data are exchangeable

Page 25: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Independence gets more complicated when you have more complicated designs – but even here we can exchange every individual

Anderson and Braak, 2003, JSCS; 10.1080=0094965021000015558

Drug use

Cannabis Alcohol Nicotine Stimulant

Page 26: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

However, if a second factor is nested, our permutations are limited to the nested pairs, restricting our permutations

Anderson and Braak, 2003, JSCS; 10.1080=0094965021000015558

Drug use

Cannabis Alcohol Nicotine Stimulant

Family nested by drug use

Page 27: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

More complex designs have even more restrictions, relative to the total number of permutations

Anderson and Braak, 2003, JSCS; 10.1080=0094965021000015558

Drug use

Cannabis Alcohol Nicotine Stimulant

Ho

met

ow

n

Page 28: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

In turn, restricted permutations have reduced power when controlling for the false positive rate

Anderson and Braak, 2003, JSCS; 10.1080=0094965021000015558

Page 29: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5736019/

Page 30: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Scanner effects can be common, independent of site

Gareth Harman, 4/11/19 – combat Cortical Thickness

Page 31: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

ComBat has also been used to correct for ABCD data, which can be predicted by site

Nielson, 2018, biorxiv; http://dx.doi.org/10.1101/309260

Site classification accuracy

Page 32: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Cross-validation strategies can mitigate known but not unknown effects• Stratified validation is possible via independent stratified groups

• Leave-one-site-out validation can help catch site effects

• But what about effects of scanner upgrades, software maintenance, or even changes in personnel?

Page 33: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Outline of talk

• Theory recap: modelling approaches can be reduced to two types: predictive and descriptive

• “Big data” complicates our ability to apply both approaches

• Marginal Modelling is a good approach for descriptive modelling

• Functional Random Forests is a good approach for predictive modelling

• Other approaches can also handle big data, but are beyond the scope of this workshop

Page 34: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

The marginal model may be a more feasible solution for modeling ABCD populations• Strengths:

• Marginal model makes few assumptions with respect to the data• Nested-designs can be modeled or unmodeled, and left to the error term (hopefully)

• Individual cases can be incomplete or missing for a marginal model

• Longitudinal designs are feasible within the marginal model framework

• Marginal model has a closed-form solution to the equation via a Sandwich Estimator (SwE)• It’s fast, and can be feasibly run with limited resources on lots of data

• Use of a wild bootstrap (WB) provides an NHST framework for complex questions

Page 35: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Critical limitations

• The marginal model cannot be used to draw inferences about individuals within a population

• It is an exploratory approach, which can be verified using subsequent confirmatory approaches• DEAP can help conform such analyses to best standards and practices through

pre-registered reports, reproducibility, and independent validation

Page 36: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Bryan Gillaume’s and Tom Nichols implemented an approach that uses a sandwich estimator to solve a marginal model

Imaging Volume(s)

Statistical T map for inference

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Perform Wald Test

Compute model

Y/X = Beta

Page 37: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Marginal models are effectively linear, so we first estimate the parameters for our design matrix by dividing the imaging measure (Y) by the design (X)

Imaging Volume(s)

Design matrix

Compute model

Y/X = Beta

Page 38: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

For our software, the design matrix is just your non-imaging data

Imaging Volume(s)

Design matrix

Compute model

Y/X = Beta

Page 39: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

So for example, with the ABCD data we can input measures and test a model

Imaging Volume(s)

Design matrix

Compute model

Y/X = Beta

Marginal model: y ~ RT

Page 40: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

A sandwich estimator is used to estimate covariance and determine the fixed effects parameters

Imaging Volume(s)

Estimate FE covariance

(SwE)

Design matrix

Compute model

Y/X = Beta

Page 41: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

To handle nested structure, group covariance can be calculated separately (CRITICAL FOR ABCD)

Imaging Volume(s)

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Design matrix

Compute model

Y/X = Beta

Page 42: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

For ABCD, it is good to control for site and gender

Imaging Volume(s)

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Design matrix

Compute model

Y/X = Beta

site gender

14 2

5 2

Page 43: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

If needed we can perform a small sample size adjustment – this may be important if we used family as a nesting variable

Imaging Volume(s)

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Compute model

Y/X = Beta

Page 44: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Finally, a Wald test extracts a t-map for statistical inference

Imaging Volume(s)

Statistical T map for inference

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Perform Wald Test

Compute model

Y/X = Beta

Page 45: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

The statistical map looks like this

Imaging Volume(s)

Statistical T map for inference

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Perform Wald Test

Compute model

Y/X = Beta

Page 46: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Use of a wild bootstrap enables inference similar to a permutation test – so we can control for the FWER

Imaging Volume(s)

Statistical T map for inference

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Perform Wald Test

Wild bootstrap

WB maps

Cluster detection/

TFCEInference map

Compute model

Y/X = Beta

Page 47: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Such a test allows us to detect significant clusters

Imaging Volume(s)

Statistical T map for inference

Estimate FE covariance

(SwE)

Calculate subject /groups

covariance (residuals)

Perform small

sample adj.

Design matrix

Perform Wald Test

Wild bootstrap

WB maps

Cluster detection/

TFCEInference map

Compute model

Y/X = Beta

Page 48: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Wild bootstrap

• WB_value = fitted_value + residual_value*sample_value

• Sample with replacement can be from simple or complex distributions:• Radenbacher (-1, 1) would mean we either:

• WB_value = fitted_value – residual_value

• WB_value = fitted_value + residual_value

• However, LOTS of possible distributions, so choice of distribution is important.

Page 49: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

We have begun to implement a standalone MarginalModelCifti package in R

Alpha version will be released at -- http://github.com/dcan-labs/MarginalModelCifti

Page 50: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

The main wrapper for MarginalModelCifti takes in imaging volumes and prepares them for analysis

Imaging Volume(s)

PrepCIFTI/Surf/Vol

Page 51: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

ComputeMM is applied to the prepared data; user specifies the model using Wilkinson notation and wraps the SwE and Wald Test using Geepack

Imaging Volume(s)

PrepCIFTI/Surf/Vol

ComputeMMStatistical T map

for inference

Y ~ group + treatment

Page 52: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

ComputeMM_WB generates the WB maps used to draw inferences about the T map

Imaging Volume(s)

PrepCIFTI/Surf/Vol

ComputeMM

ComputeMM_WB Null Distribution

Statistical T map for inference

Page 53: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

In turn a family of functions are used to parallellize ComputeMM_WB

Imaging Volume(s)

PrepCIFTI/Surf/Vol

ComputeMM

ComputeMM_WB Null Distribution

Statistical T map for inference

ApplyWB_to_data

ComputeFits

ComputeResiudals

ComputeZscores

GetSurfAreas

GetVolAreas

Page 54: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Cluster detection is performed within the main wrapper, using information from both processes

Imaging Volume(s)

PrepCIFTI/Surf/Vol

ComputeMM

ComputeMM_WB Null DistributionCluster

detection/TFCE

Inference map

Statistical T map for inference

Page 55: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

The MarginalModelCifti package comprises multiple functions that can be accessed by anyone

Page 56: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Functions are documented in accordance with CRAN guidelines

Page 57: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Here are all the parameters for ConstructMarginalModel()

Page 58: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

To make things easier – we’ve made a jupyternotebook that can be used as a reference

Page 59: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Outline of talk

• Theory recap: modelling approaches can be reduced to two types: predictive and descriptive

• “Big data” complicates our ability to apply both approaches

• Marginal Modelling is a good approach for descriptive modelling

• Functional Random Forests is a good approach for predictive modelling

• Other approaches can also handle big data, but are beyond the scope of this workshop

Page 60: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Nested structures -- people belong to multiple subtypes

SODACOKE

POP

Dialect preferences: soda, coke or pop?

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

Page 61: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Nested structures -- people belong to multiple subtypes

DEM

GOP

U.S. 2016 presidential election voting preferences

SODACOKE

POP

Dialect preferences: soda, coke or pop?

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

Page 62: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Nested structures -- people belong to multiple subtypes

DEM

GOP

U.S. 2016 presidential election voting preferences

Stroke mortality for Adults 35+ per 100,000

RATE

SODACOKE

POP

Dialect preferences: soda, coke or pop?

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

Page 63: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

But what about effects of scanner upgrades, software maintenance, or even changes in personnel?

Page 64: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

• Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome

Page 65: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

If we want to control for unknown structure, we need to identify subtypes tied to an outcome

• Supervised approaches can confirm known subtypes but not discover unknown subtypes tied to an outcome

• Unsupervised approaches can discover unknown subtypes, but not tied to any outcome

Page 66: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

How does the Functional Random Forest work?

Supervised component

Page 67: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Ask a question: can we predict depression diagnosis?

Supervised component

Unsupervised component

Page 68: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

Page 69: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

We start with an input dataset

Input dataset

Unsupervised component

Page 70: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

This dataset can be a functional connectivity matrix

Input dataset

Unsupervised component

Page 71: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

This dataset can be a functional connectivity matrix – which gets reduced to either graph metrics or principal components

Input dataset

Unsupervised component

Page 72: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

Input data are modeled via a random forest via validation/testing

Random Forest

Creates decision trees

Input dataset

Unsupervised component

Page 73: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

Model is supervised because it attempts to predict the outcome of interest

Random Forest

Creates decision trees

Input dataset

Unsupervised component

Page 74: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Unsupervised component

Supervised component

If the random forest performs well on independent test data, a similarity matrix is produced from the RFs

Similarity matrix

Random Forest

Creates decision trees

Input dataset

=

Page 75: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

Unsupervised component

Subgroups are identified from this matrix via Infomap

Random Forest

Creates decision trees

Infomap

Identifies communities

Input dataset

Similarity matrix

Page 76: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Supervised component

Unsupervised component

Subtypes arise from the model that are tied to the outcome

Random Forest

Creates decision trees

Subpopulations

Infomap

Identifies communities

Input dataset

Similarity matrix

Page 77: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

The FRF can be used to identify trajectories in longitudinal data

Longitudinal dataset

Functional Data Analysis

Generates individual

trajectories

f(t) = a1ø1(t) + .... + akøk(t)

Page 78: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

Longitudinal dataset

Functional Data Analysis

Generates individual

trajectories

f(t) = a1ø1(t) + .... + akøk(t)

Page 79: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Combining the set of functions estimates a smooth trajectory for an individual’s symptoms

Longitudinal dataset

Functional Data Analysis

Generates individual

trajectories

f(t) = a1ø1(t) + .... + akøk(t)

Page 80: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

We can use an unsupervised approach to identify trajectories

Unsupervised Longitudinal dataset

Functional Data Analysis

Generates individual

trajectories

Infomap

Identifies

communities

Correlation-based

subpopulations

f(t) = a1ø1(t) + .... + akøk(t)

Correlation Matrix

Compares trajectories

Page 81: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Or use a “hybrid” approach that identifies trajectory subtypes tied to an outcome of interest

Unsupervised HybridLongitudinal dataset

Functional Data Analysis

Generates individual

trajectories

Infomap

Identifies

communities

Correlation-based

subpopulations

Model-based

subpopulations

Infomap

Identifies

communities

f(t) = a1ø1(t) + .... + akøk(t)

Correlation Matrix

Compares trajectories

Parameters

Random Forest

Creates decision trees

Similarity matrix

Page 82: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

A manual for using the FRF exists online(https://dcan-labs.github.io/functional-random-forest/)

Page 83: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

A new release is available at:

Page 84: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

A manual for using the FRF exists online(https://dcan-labs.github.io/functional-random-forest/)

Page 85: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Outline of talk

• Theory recap: modelling approaches can be reduced to two types: predictive and descriptive

• “Big data” complicates our ability to apply both approaches

• Marginal Modelling is a good approach for descriptive modelling

• Functional Random Forests is a good approach for predictive modelling

• Other approaches can also handle big data, but are beyond the scope of this workshop

Page 86: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

New approaches within statistics and machine learning can also accommodate problems with big data• Many of these approaches have been developed in genomics

• comBat is a Bayesian approach to handle known site effects in data

• Surrogate Variable Analaysis

• Such approaches need to be examined in the context of neuroimaging data to evaluate where each is most useful

• Knowing how to use these tools requires considerable skill in data science, which has been relatively untaught in mental health fields

• Hopefully, the workshop tomorrow should get you excited about applying these new tools and on your path towards doing “big data” science right.

Page 87: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

AcknowledgmentsFair Lab-Damien Fair-Oscar Miranda-Dominguez-Alice Graham

Computing Team-Darrick Sturgeon-Eric Earl-Anders Perrone-Emma Schifsky-Anthony Galassi-Kathy Snider-David Ball-Lucille Moore

Alpha Testers-Bene Ramirez-Jennifer Zhu-Robert Hermosillo-Mollie Marr-Oliva Doyle-Michaela Cordova-AJ Mitchell

Page 88: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Acknowledgments

• The mentors• Damien Fair• Joel Nigg• Eric Fombonne• Shannon McWeeney

• The databasors• Lourdes Irwin• Darrick Sturgeon• Rachel Klein

• The developers• Eric Earl• Anders Perrone• Darrick Sturgeon

• Other Labs:• Nigg Lab• McWeeney Lab

• The assessors:• Beth Langhorst• Michaela Cordova• Bene Ramirez• Brian Mills• Olivia Doyle

• The students:• Iliana Javier• Nadir Balba

• The docs:• Alice Graham• Oscar Miranda-Dominguez• Binyam Nardos

• The collaborators:• Sarah Karalunas• Alison Hill• Jan Van Santen

• Everyone I forgot, which is many ☺

Page 89: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Questions?

Page 90: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

High dimensionality is bad for predictive modelling

Feczko, Miranda-Dominguez, Marr, Graham, Nigg, Fair, TICS, 2019, DOI: https://doi.org/10.1016/j.tics.2019.03.009

Page 91: Theory in Practice: Modeling in Neuroimaging · 2021. 7. 20. · Outline of talk •Theory recap: modelling approaches can be reduced to two types: predictive and descriptive •“Big

Predictive models must also take into account nested structure

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3880143/