
Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE


  • 1 of 28

    2013-09-03, Nikolas Pontikos, Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE

    Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE

    Nikolas Pontikos

    PhD Student, CIMR

  • 2 of 28

    Biological Context: Cell Phenotypes

    Different types of cells can be identified based on their shape/size and the surface markers (proteins) that they express:

    Lymphocytes, Granulocytes, Neutrophils

    CD4+ Lymphocytes, CD8+ Lymphocytes

    CD45RA+, CD45RA-

    CD stands for Cluster of Differentiation: these are surface proteins which can be used as markers to distinguish different cell types.

  • 3 of 28

    What is Flow Cytometry?

    1998-2012 Abcam plc. All rights reserved

    [Figure: scatter plot of Side Scatter (Granularity) against Forward Scatter (Cell Size), 110,992 points, with the Lymphocytes, Neutrophils and Granulocytes populations labelled.]

    Cells     ForwardScatter  SideScatter  CD4  CD127  CD45RA  CD25
    1         2110            309          103  254    4       70
    2         1565            252          57   278    341     59
    ...       ...             ...          ...  ...    ...     ...
    110,992   964             256          78   199    9       345

  • 4 of 28

    The Transitional Phenotype of Cells

    [Figure: density of Log10 CD45RA Intensity, with the Memory Cells and Naive Cells modes labelled.]

    As cells transition from one cell type (state) to another they lose/gain expression of certain markers.

    Here the CD45RA marker is lost as cells transition from naive to memory status.

    This results in a bimodal distribution of the intensity of CD45RA.
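
As a rough illustration of how a bimodal intensity distribution separates two cell states, the sketch below simulates log10 CD45RA intensities and splits them with a one-dimensional two-means procedure. The means, spreads, cell counts and both function names are hypothetical values chosen for illustration, not figures from the data shown in this talk.

```python
import random
import statistics


def simulate_cd45ra(n_memory, n_naive, seed=0):
    """Draw hypothetical log10 CD45RA intensities: memory cells low, naive cells high."""
    rng = random.Random(seed)
    memory = [rng.gauss(0.5, 0.15) for _ in range(n_memory)]  # assumed low mode
    naive = [rng.gauss(1.5, 0.15) for _ in range(n_naive)]    # assumed high mode
    return memory + naive


def two_means_threshold(values, iterations=50):
    """1-D two-means: assign each value to the nearer mean, then update both means."""
    lo, hi = min(values), max(values)
    for _ in range(iterations):
        cut = (lo + hi) / 2
        low_group = [v for v in values if v <= cut]
        high_group = [v for v in values if v > cut]
        lo, hi = statistics.fmean(low_group), statistics.fmean(high_group)
    return (lo + hi) / 2


values = simulate_cd45ra(n_memory=600, n_naive=400)
threshold = two_means_threshold(values)
fraction_memory = sum(v <= threshold for v in values) / len(values)
print(f"threshold = {threshold:.2f}, fraction below = {fraction_memory:.2f}")
```

The threshold plays the role of a manually drawn gate between the two modes; a mixture model would be the more principled choice when the modes overlap.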

  • 5 of 28


    Manual Method of Identifying Cell Phenotypes

    % of CD25+ Naive Cells

    % of Memory Cells

  • 6 of 28

    Issues with Manual Analysis of Flow Cytometry Data

    Identifying all possible cell subsets is tedious and error-prone.

    With P parameters there are on the order of P^2 bi-dimensional comparisons.

    Manual analysis also introduces operator bias.

    Unexpected or rare cell populations may be missed.
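
To make the P^2 growth concrete, the sketch below counts the distinct two-parameter plots, P(P-1)/2, for the six measurement columns of the flow cytometry table shown earlier. The larger panel sizes in the loop are hypothetical and only illustrate the quadratic growth.

```python
from itertools import combinations
from math import comb

# Measurement columns from the flow cytometry table earlier in the talk.
markers = ["ForwardScatter", "SideScatter", "CD4", "CD127", "CD45RA", "CD25"]

# Every unordered pair of parameters is one bi-dimensional plot to inspect.
pairs = list(combinations(markers, 2))
n_pairs = len(pairs)
print(f"{len(markers)} parameters -> {n_pairs} two-parameter plots")

# Hypothetical panel sizes, to show how quickly the count grows.
for p in (10, 30, 50):
    print(f"P = {p}: {comb(p, 2)} plots")
```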

  • 7 of 28

    Flow Data VS Genetic Data

                 Flow Data            Genetic Data
    N            cells                individuals
    P            cellular markers     SNPs
    N ~          1,000,000            1000
    P ~          10                   100,000
    N vs P       N > 10,000 x P       N < 100 x P

    Distance-based clustering:
    - hclust
    - kmeans

    Density-based clustering:
    - identifying regions of significantly high-density
    - fitting mixture models

  • 8 of 28

    Motivation for SPADE

    Heading towards high-dimensional data sets:
    - pooling of datasets
    - mass cytometry

    Distance-based methods are fast at the expense of storing the entire distance matrix.

    Distance-based clustering is well suited for high-dimensional data sets when data is too sparse for density-based methods.
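
The cost of storing the entire distance matrix can be estimated with simple arithmetic. The sketch below assumes 8-byte floating-point distances and storage of only the upper triangle, N(N-1)/2 values; the function name is hypothetical and the sample sizes are the orders of magnitude mentioned in this talk.

```python
def distance_matrix_bytes(n_points, bytes_per_value=8):
    """Memory for the condensed (upper-triangle) pairwise distance matrix."""
    return n_points * (n_points - 1) // 2 * bytes_per_value


# 1,000 individuals, the 110,992-cell sample, and a 1,000,000-cell sample.
for n in (1_000, 110_992, 1_000_000):
    gib = distance_matrix_bytes(n) / 2**30
    print(f"N = {n:>9,}: {gib:,.2f} GiB")
```

Under these assumptions a full distance matrix is trivial for a thousand points but reaches tens of gigabytes for a single flow sample, which is one reason to reduce the number of points before distance-based clustering.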

  • 9 of 28

    SPADE: spanning-tree progression analysis of density-normalised events

    Primarily a visualisation tool for revealing structure in point clouds as obtained from flow cytometry.

    A clustering method with rare event detection thanks to density-dependent downsampling.

    Four main steps in SPADE:

    1) Density-dependent downsampling
    2) Agglomerative clustering
    3) Minimum spanning tree construction
    4) Upsampling

  • 10 of 28

    Results from paper

    Outline of SPADE as applied to a simulated data set
    - Proof of concept
    - Structure of data preserved and rarer cell population identified

    Analysis of mouse hematopoiesis using flow cytometry data
    - Ability to reconstruct a known hierarchy
    - Comparison to manual gating
    - Identified cell population missed in manual gating (dendritic cells)

    Analysis of human hematopoiesis using mass cytometry data
    - Joining multiple stimulation experiments on core markers
    - Non-targeted cell population identified (NK cells)

  • 11 of 28


    SPADE: Spanning-tree Progression Analysis of Density-normalised Events

    (i) A simulated two-parameter flow cytometry data set, with one rare population and three abundant populations.

    (ii) Result of density-dependent down-sampling of the original data.

    (iii) Agglomerative clustering result of the down-sampled cells. Adjacent clusters are drawn in alternating colors.

    (iv) Minimum spanning tree that connects the cell clusters.

    (v) Colored SPADE trees. Nodes are colored by the median intensities of protein markers of cells in each node, allowing visualization of the behaviors of the two markers across the entire heterogeneous cell population.

    Input

    Output

  • 12 of 28

    Density-dependent down-sampling: an example

    [Figure: two scatter plots of CD45RA against CD25, before downsampling (N=200) and after downsampling (N=50).]

    After downsampling the density has been flattened to the target density while preserving rare clusters.

    The green nodes can be used to build the minimum spanning-tree.
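
A minimal sketch of density-dependent downsampling on simulated two-marker data follows. It assumes that local density is the number of points within a fixed radius and that points are dropped below an outlier density, always kept up to a target density, and thinned in proportion to their density above it. The radius, thresholds, cluster positions and function names are all hypothetical choices for illustration.

```python
import math
import random


def local_density(points, radius):
    """Count, for each point, the points (itself included) within `radius`."""
    return [sum(math.dist(p, q) <= radius for q in points) for p in points]


def keep_probability(ld, target_density, outlier_density):
    """Assumed rule: drop outliers, keep sparse points, thin dense regions."""
    if ld <= outlier_density:
        return 0.0
    if ld <= target_density:
        return 1.0
    return target_density / ld


def downsample(points, radius, target_density, outlier_density, seed=0):
    """Keep each point at random according to its local density."""
    rng = random.Random(seed)
    densities = local_density(points, radius)
    return [p for p, ld in zip(points, densities)
            if rng.random() < keep_probability(ld, target_density, outlier_density)]


# Simulated data: one abundant population and one rare population.
rng = random.Random(1)
abundant = [(rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)) for _ in range(190)]
rare = [(rng.gauss(3.0, 0.1), rng.gauss(3.0, 0.1)) for _ in range(10)]

kept = downsample(abundant + rare, radius=0.5, target_density=10, outlier_density=1)
kept_rare = [p for p in kept if p[0] > 1.5]  # the rare population sits near (3, 3)
print(f"kept {len(kept)} of 200 points, {len(kept_rare)} of 10 rare points")
```

Because every rare point has a local density between the outlier and target densities, the rare population survives intact while the abundant population is heavily thinned.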

  • 13 of 28


    Extracting Cellular Hierarchy

    Identification of dendritic cells missed by manual gating in f

    Mouse Data

  • 14 of 28


    Identification of untargeted* cell type

    NK Cells NK Cells

    * CD127 and CD16 are typically not used to identify NK Cells.

    Human Data

  • 15 of 28


    Joining multiple flow experiments on core surface markers

    Core markers used to build tree. Other markers, functional or additional surface markers, used to annotate it.

    [Figure: density plot of CD4 intensity (N = 229591, Bandwidth = 0.08117).]

    Core markers need to align across experiments.

    Human Data

  • 16 of 28


    Visualisation of response

    Pooling of experiments on common tree structure allows visualisation across many different experimental conditions.

    Human Data

  • 17 of 28


    Visualises high dimensional data.

    Exposes hierarchy in bottom-up manner thanks to spanning-tree.

    Identification of novel and rare cell types in flow cytometry thanks to density-dependent downsampling.

    Pooling of multiple experiments on common tree structure for meta-analysis.

    Conclusion

  • 18 of 28


    Application to our data sets

  • 19 of 28

    Applied to Flow Data:

    Visualising response to stimulation

    [Figure: four panels (0U.lymphocytes2, 1U.lymphocytes2, 10U.lymphocytes2, 1000U.lymphocytes2), from resting to increasing dose, coloured by Median of Alexa.Fluor.488.A (scale 0.64 to 2.14; Range: 0.02 to 0.98 pctile), with the CD45RA+ CD25- and CD45RA- CD25+ regions labelled.]

    Data courtesy of Tony Cutler

  • 20 of 28

    Minimum spanning trees relate to single-linkage hierarchical clustering as used in heatmaps for viewing genetic data such as SNP arrays.
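
The relation can be checked numerically: the merge heights of single-linkage clustering coincide with the edge lengths of the minimum spanning tree, taken in increasing order. The sketch below uses Kruskal's algorithm for the tree and a deliberately naive single-linkage implementation. The six points and both function names are hypothetical, and neither routine is meant to reproduce R's `hclust`.

```python
import math
from itertools import combinations


def mst_edge_lengths(points):
    """Kruskal's algorithm on the complete graph; returns accepted edge lengths in order."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    lengths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it belongs to the tree
            parent[ri] = rj
            lengths.append(d)
    return lengths


def single_linkage_heights(points):
    """Naive single-linkage clustering; returns the merge heights in order."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        # Distance between clusters = distance between their closest members.
        d, a, b = min(
            (min(math.dist(p, q) for p in clusters[a] for q in clusters[b]), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        clusters[a] += clusters[b]
        del clusters[b]  # safe because a < b
        heights.append(d)
    return heights


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.5), (4.0, 0.0), (4.0, 0.5), (9.0, 9.0)]
tree_lengths = mst_edge_lengths(points)
merge_heights = single_linkage_heights(points)
print(tree_lengths)
print(merge_heights)
```

Each edge that Kruskal's algorithm accepts merges two components, which is exactly one single-linkage merge at that height.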

  • 21 of 28

    Applied to Genetic Data:

    Copy Number Imputation from SNP Log R Ratio and Theta

    [Figure: Copy Number Calls Using qPCR data, on axes from 0 to 3 with axis labels KIR3DS1, KIR3DL1, Gene A and Gene B; groups labelled 0-2, 1-2, 0-1, 1-1, 2-1, 3-0, 2-0 and 1-0; frequency labels 60.38%, 29.58%, 3.66%, 1.83%, 2.10%, 0.47%, 0.34% and 1.63%.]

    [Figure: Log R Ratio and Theta from ImmunoChip of 30 SNPs in gene A and B region.]

  • 22 of 28

    [Figure: Minimal Spanning Tree shown alongside the Single-linkage hclust dendrogram.]

  • 23 of 28

    Minimum spanning trees: another way of visualising high-dimensional data?

  • 24 of 28


    Lum, P. Y., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., et al. (2013). Extracting insights from the shape of complex data using topology. Scientific Reports, 3. doi:10.1038/srep01236

    Topological Data Analysis

  • 25 of 28

    Comparing distribution of cells across the same tree

    Qiu, P. (2012). Inferring Phenotypic Properties from Single-Cell Characteristics. PloS one, 7(5), e37038. doi:10.1371/journal.pone.0037038

    P and Q are cell distributions across the nodes of the same tree.

    P and Q can be compared using the Earth Mover Distance:

    $$\mathrm{EMD}(P,Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}$$

    subject to minimising $\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}$, where

    $f_{ij}$: # of cells moved from node i to node j

    $d_{ij}$: # of hops in shortest path from node i to j
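
When both distributions live on the same tree and every edge counts as one hop, the minimising flow does not need a general solver: the mass that must cross each edge is the net surplus of the subtree below it. The sketch below uses that closed form. It assumes the counts are first converted to proportions so that both distributions carry the same total mass; the function name, the parent-array encoding and the example trees are hypothetical.

```python
import math


def tree_emd(parent, p_counts, q_counts):
    """EMD between two cell distributions over the nodes of one rooted tree.

    parent[v] is the parent of node v (None for the root, node 0); nodes must
    be numbered so that parent[v] < v. Every edge counts as one hop.
    """
    p_total, q_total = sum(p_counts), sum(q_counts)
    # Work with proportions so that both distributions have total mass 1.
    surplus = [p / p_total - q / q_total for p, q in zip(p_counts, q_counts)]
    cost = 0.0
    # Visit children before parents: the surplus of a subtree has to cross
    # the edge to its parent, at a cost of one hop per unit of mass.
    for v in range(len(parent) - 1, 0, -1):
        cost += abs(surplus[v])
        surplus[parent[v]] += surplus[v]
    return cost  # the total flow is 1, so no further normalisation is needed


# Chain 0 - 1 - 2: all cells move from one end to the other (2 hops).
chain = [None, 0, 1]
emd_chain = tree_emd(chain, [10, 0, 0], [0, 0, 10])

# Star with centre 0: half of the cells move between two leaves (2 hops).
star = [None, 0, 0, 0]
emd_star = tree_emd(star, [0, 5, 5, 0], [0, 5, 0, 5])
print(emd_chain, emd_star)
```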

  • 26 of 28


    Calculating density

    All these points will be assigned the same local density (LD).

    According to the target density (TD) and outlier density (OD), SPADE keeps each cell i with the following probability:
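
A hedged sketch of that probability, following the downsampling rule described in the SPADE paper, with $LD_i$ denoting the local density of cell $i$. The handling of the boundary cases (strict versus non-strict inequalities) is an assumption and should be checked against the paper's Methods.

```latex
\[
P(\text{keep cell } i) =
\begin{cases}
0, & LD_i \le OD \\
1, & OD < LD_i \le TD \\
TD / LD_i, & LD_i > TD
\end{cases}
\]
```

Cells in sparse regions are always kept, while cells in dense regions are thinned in proportion to their local density, which flattens the density towards TD.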

  • 27 of 28


    Building Minimal Spanning Tree

    The minimal spanning-tree (MST) is the set of edges of smallest total length which connects all nodes without forming a cycle.

    Layout of tree determined by Fruchterman-Reingold algorithm (see Methods).

    MST is related to single-linkage hierarchical clustering algorithm (see later).
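
A minimal sketch of building a minimum spanning tree over cluster centres with Prim's algorithm follows. The five centres and the function name are hypothetical; this is not the SPADE implementation, and the Fruchterman-Reingold layout step is not covered.

```python
import math


def prim_mst(points):
    """Return the MST edges (i, j) over `points` using Prim's algorithm, O(n^2)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from the tree to each node
    best_from = [None] * n      # tree node that achieves best_dist
    best_dist[0] = 0.0          # start growing the tree from node 0
    edges = []
    for _ in range(n):
        # Add the node that is currently closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] is not None:
            edges.append((best_from[u], u))
        # Update the distances of the remaining nodes to the grown tree.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v], best_from[v] = d, u
    return edges


# Hypothetical cluster centres in two marker dimensions.
centres = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0), (5.0, 0.0)]
edges = prim_mst(centres)
total_length = sum(math.dist(centres[i], centres[j]) for i, j in edges)
print(edges, total_length)
```

In this example node 1 ends up with three neighbours, so the result is a branching tree rather than a single path through the nodes.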

  • 28 of 28

    [Figure: hierarchical clustering of the Log_R_Ratio and Theta measurements of the SNPs, shown with Average Linkage and with Single Linkage.]