1 of 28
2013-09-03, Nikolas Pontikos, Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
Extracting a cellular hierarchy from high-dimensional cytometry data with SPADE
Nikolas Pontikos
PhD Student, CIMR
2 of 28
Biological Context: Cell Phenotypes
Different types of cells can be identified based on their shape, size and the surface markers (proteins) that they express.
[Figure labels: Lymphocytes, Granulocytes, Neutrophils; CD4+ Lymphocytes, CD8+ Lymphocytes; CD45RA+, CD45RA-]
CD stands for Cluster of Differentiation. These are surface proteins which can be used as markers to distinguish different cell types.
3 of 28
What is Flow Cytometry?
[Figure credit: 1998-2012 Abcam plc. All rights reserved]
[Figure: scatter plot of Forward Scatter (x-axis, 0 to 4000; cell size) against Side Scatter (y-axis, 0 to 1000; granularity), 110,992 points, with Lymphocytes, Neutrophils and Granulocytes labelled]
Cell      Forward Scatter   Side Scatter   CD4   CD127   CD45RA   CD25
1         2110              309            103   254     4        70
2         1565              252            57    278     341      59
...       ...               ...            ...   ...     ...      ...
110,992   964               256            78    199     9        345
4 of 28
The Transitional Phenotype of Cells
[Figure labels: Memory Cell, Naive Cell, CD45RA]
[Figure: density plot of Log10 CD45RA Intensity (x-axis, 0.0 to 2.0) against Density (y-axis, 0.0 to 1.0), with Memory Cells and Naive Cells labelled]
As cells transition from one cell type (state) to another, they lose or gain expression of certain markers.
Here the CD45RA marker is lost as cells transition from naive to memory status.
This results in a bimodal distribution of CD45RA intensity.
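The bimodality can be illustrated with simulated data. The sketch below is not the slide's data: the component means, spreads and the 50/50 mixing proportion are made-up assumptions, with one Gaussian standing in for memory cells (low CD45RA) and one for naive cells (high CD45RA). A text histogram of the pooled values shows two peaks separated by a valley.

```python
import random

def simulate_cd45ra(n_cells, seed=0):
    """Simulate log10 CD45RA intensities as a two-component Gaussian mixture."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_cells):
        if rng.random() < 0.5:
            values.append(rng.gauss(0.5, 0.15))  # hypothetical memory cells
        else:
            values.append(rng.gauss(1.5, 0.15))  # hypothetical naive cells
    return values

def histogram(values, lo, hi, n_bins):
    """Count values falling in n_bins equal-width bins covering [lo, hi)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            index = min(int((v - lo) / width), n_bins - 1)
            counts[index] += 1
    return counts

values = simulate_cd45ra(10000)
counts = histogram(values, 0.0, 2.0, 20)
for i, c in enumerate(counts):
    # One '#' per 50 cells; the left edge of each bin is printed first
    print(f"{i * 0.1:4.1f} {'#' * (c // 50)}")
```

A manual gate between memory and naive cells would typically be placed in the valley between the two peaks.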
5 of 28
Manual Method of Identifying Cell Phenotypes
% of CD25+ Naive Cells
% of Memory Cells
6 of 28
Issues with Manual Analysis of Flow Cytometry Data
Identifying all possible cell subsets is tedious and error-prone.
P parameters give P(P-1)/2 pairwise two-dimensional comparisons, which is of the order of P^2.
Manual analysis also introduces operator bias.
Unexpected or rare cell populations may be missed.
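To make the quadratic growth concrete, a short sketch counts the distinct two-parameter plots for a few panel sizes. The value 6 matches the six measured parameters in the example table (forward scatter, side scatter and four markers); 10 and 40 are hypothetical panel sizes chosen only for illustration.

```python
from math import comb

def pairwise_plots(p):
    """Number of distinct two-parameter scatter plots for p parameters."""
    return comb(p, 2)

for p in (6, 10, 40):
    print(p, "parameters:", pairwise_plots(p), "two-dimensional plots")
```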
7 of 28
Flow Data VS Genetic Data
                  Flow Data            Genetic Data
N                 cells                individuals
P                 cellular markers     SNPs
Typical N         ~ 1,000,000          ~ 1000
Typical P         ~ 10                 ~ 100,000
N relative to P   N > 10,000 x P       N < 100 x P
Distance-based clustering:
- hclust
- kmeans
Density-based clustering:
- identifying regions of significantly high density
- fitting mixture models
8 of 28
Motivation for SPADE
Heading towards high-dimensional data sets:
- pooling of datasets
- mass cytometry
Distance-based methods are fast, at the expense of storing the entire distance matrix.
Distance-based clustering is well suited to high-dimensional data sets when the data is too sparse for density-based methods.
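The cost of storing the distance matrix can be estimated with a few lines. This is a back-of-the-envelope sketch, assuming one 8-byte floating-point value per pair of points and storing each pair once; the sample size is the 110,992 cells of the earlier flow cytometry example.

```python
def distance_matrix_bytes(n_points, bytes_per_value=8):
    """Bytes needed to store one distance per unordered pair of points."""
    pairs = n_points * (n_points - 1) // 2
    return pairs * bytes_per_value

n_cells = 110_992
print(f"{distance_matrix_bytes(n_cells) / 1e9:.1f} GB for {n_cells} cells")
```

At this scale the matrix runs to tens of gigabytes, which is one reason to reduce the number of points before clustering.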
9 of 28
SPADE: spanning-tree progression analysis of density-normalised events
Primarily a visualisation tool for revealing structure in point clouds as obtained from flow cytometry.
A clustering method with rare event detection thanks to density-dependent downsampling.
Four main steps in SPADE:
1) Density-dependent downsampling
2) Agglomerative clustering
3) Minimum spanning tree construction
4) Upsampling
10 of 28
Results from paper
Outline of SPADE as applied to a simulated data set
- Proof of concept
- Structure of data preserved and rarer cell population identified
Analysis of mouse hematopoiesis using flow cytometry data
- Ability to reconstruct a known hierarchy
- Comparison to manual gating
- Identified cell population missed in manual gating (dendritic cells)
Analysis of human hematopoiesis using mass cytometry data
- Joining multiple stimulation experiments on core markers
- Non-targeted cell population identified (NK cells)
11 of 28
SPADE: Spanning-tree Progression Analysis of Density-normalised Events
(i) A simulated two-parameter flow cytometry data set, with one rare population and three abundant populations.
(ii) Result of density-dependent down-sampling of the original data.
(iii) Agglomerative clustering result of the down-sampled cells. Adjacent clusters are drawn in alternating colors.
(iv) Minimum spanning tree that connects the cell clusters.
(v) Colored SPADE trees. Nodes are colored by the median intensities of protein markers of cells in each node, allowing visualization of the behaviors of the two markers across the entire heterogeneous cell population.
Input
Output
12 of 28
Density-dependent down-sampling: an example
[Figure: two scatter plots of CD25 (x-axis) against CD45RA (y-axis), one with N=200 points and one with N=50 points]
After downsampling the density has been flattened to the target density while preserving rare clusters.
The green nodes can be used to build the minimum spanning-tree.
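A minimal sketch of density-dependent downsampling on simulated data follows. Several things here are assumptions rather than content from the slides: local density is taken as the number of points within a fixed radius, the keep rule (drop below the outlier density, keep everything up to the target density, thin denser regions in proportion to target/local density) is the form I understand SPADE to use, and the cluster positions, radius and thresholds are invented for illustration.

```python
import math
import random

def local_density(points, radius):
    """For each point, count the points (itself included) within `radius`."""
    return [sum(1 for q in points if math.dist(p, q) <= radius) for p in points]

def keep_probability(ld, target_density, outlier_density):
    """Assumed keep rule: 0 for outliers, 1 up to the target, target/ld above it."""
    if ld <= outlier_density:
        return 0.0
    if ld <= target_density:
        return 1.0
    return target_density / ld

def downsample(points, radius, target_density, outlier_density, seed=0):
    """Keep each point at random with its density-dependent probability."""
    rng = random.Random(seed)
    densities = local_density(points, radius)
    return [p for p, ld in zip(points, densities)
            if rng.random() < keep_probability(ld, target_density, outlier_density)]

# Hypothetical data: one abundant and one rare population in two dimensions
rng = random.Random(1)
abundant = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(180)]
rare = [(rng.gauss(3, 0.3), rng.gauss(3, 0.3)) for _ in range(20)]
points = abundant + rare

kept = downsample(points, radius=0.5, target_density=20, outlier_density=1)
kept_rare = sum(1 for x, y in kept if x > 1.5)
print(len(kept), "of", len(points), "cells kept;", kept_rare, "from the rare population")
```

The abundant population is thinned heavily while the rare one is left nearly intact, so the rare cells make up a larger share of the downsampled set than of the original.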
13 of 28
Extracting Cellular Hierarchy
Identification of dendritic cells missed by manual gating in f
Mouse Data
14 of 28
Identification of untargeted* cell type
NK Cells NK Cells
* CD127 and CD16 are typically not used to identify NK Cells.
Human Data
15 of 28
Joining multiple flow experiments on core surface markers
Core markers used to build tree. Other markers, functional or additional surface markers, used to annotate it.
[Figure: density plot of CD4 intensity; N = 229591, bandwidth = 0.08117]
Core markers need to align across experiments.
Human Data
16 of 28
Visualisation of response
Pooling of experiments on common tree structure allows visualisation across many different experimental conditions.
Human Data
17 of 28
Conclusion
Visualises high-dimensional data.
Exposes the hierarchy in a bottom-up manner thanks to the spanning tree.
Identifies novel and rare cell types in flow cytometry thanks to density-dependent downsampling.
Pools multiple experiments on a common tree structure for meta-analysis.
18 of 28
Application to our data sets
19 of 28
Visualising response to stimulation
[Figure: four panels for lymphocytes at doses 0U, 1U, 10U and 1000U, each coloured by the median of Alexa.Fluor.488.A (colour scale 0.64 to 2.14; range: 0.02 to 0.98 percentile). Labels: CD45RA+ CD25-, CD45RA- CD25+; resting, increasing dose]
Data courtesy of Tony Cutler
Applied to Flow Data:
20 of 28
Minimum spanning trees relate to single-linkage hierarchical
clustering as used in heatmaps for viewing genetic data
such as SNP arrays.
21 of 28
Copy Number Imputation from SNP Log R Ratio and Theta
[Figure: Copy Number Calls Using qPCR data. Scatter plot of KIR3DS1 (x-axis, 0 to 3) against KIR3DL1 (y-axis, 0 to 3), with copy-number groups labelled Gene A-Gene B: 0-2, 1-2, 0-1, 1-1, 2-1, 3-0, 2-0, 1-0. Group percentages shown: 60.38%, 29.58%, 3.66%, 1.83%, 2.10%, 0.47%, 0.34%, 1.63%]
[Figure: Log R Ratio and Theta from ImmunoChip of 30 SNPs in gene A and B region. One label per SNP feature, such as Log_R_Ratio.seq.rs674268 and Theta.seq.rs598452]
Applied to Genetic Data:
22 of 28
[Figure panels: Minimal Spanning Tree; Single-linkage hclust]
23 of 28
Minimum spanning trees: another way of visualising high-dimensional data?
24 of 28
Lum, P. Y., Singh, G., Lehman, A., Ishkanov, T., Vejdemo-Johansson, M., Alagappan, M., et al. (2013). Extracting insights from the shape of complex data using topology. Scientific Reports, 3. doi:10.1038/srep01236
Topological Data Analysis
25 of 28
Qiu, P. (2012). Inferring Phenotypic Properties from Single-Cell Characteristics. PloS one, 7(5), e37038. doi:10.1371/journal.pone.0037038
Comparing distribution of cells across the same tree
[Figure labels: P, Q]
P and Q are cell distributions across the nodes of the same tree.
P and Q can be compared using the Earth Mover Distance (EMD):
$\mathrm{EMD}(P,Q) = \dfrac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}$
where the flows $f_{ij}$ are chosen to minimise $\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}$, and
$f_{ij}$: # of cells moved from node i to node j
$d_{ij}$: # of hops in shortest path from node i to node j
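On a tree with unit-length edges the minimisation does not need a general solver: the optimal cost equals, summed over edges, the absolute difference in mass between P and Q on one side of that edge. The sketch below uses this standard result, which is not stated on the slide. It also assumes both distributions are normalised to proportions, so the total flow in the denominator is 1. The function name `tree_emd` and the toy tree are hypothetical, and the tree is assumed to have at least one edge.

```python
def tree_emd(edges, p, q):
    """EMD between distributions p and q over the nodes of a unit-edge tree."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    total_p, total_q = sum(p.values()), sum(q.values())
    # Surplus of mass at each node after normalising p and q to sum to 1
    surplus = {v: p.get(v, 0) / total_p - q.get(v, 0) / total_q for v in neighbours}

    # Depth-first traversal from an arbitrary root, recording parents
    root = next(iter(neighbours))
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in neighbours[v]:
            if w not in parent:
                parent[w] = v
                stack.append(w)

    cost = 0.0
    for v in reversed(order):  # every node is visited before its parent
        if parent[v] is not None:
            cost += abs(surplus[v])  # mass that must cross the edge v-parent
            surplus[parent[v]] += surplus[v]
    return cost

# Toy tree: a - b - c, with d also attached to b
edges = [("a", "b"), ("b", "c"), ("b", "d")]
print(tree_emd(edges, {"a": 10}, {"c": 10}))          # all cells move two hops
print(tree_emd(edges, {"a": 5, "c": 5}, {"b": 10}))   # all cells move one hop
```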
26 of 28
Calculating density
All these points will be assigned the same local density (LD).
According to the target density (TD) and outlier density (OD), SPADE keeps each cell i with the following probability:
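The formula itself did not survive in this transcript. The form below is how I understand the SPADE rule, and it should be treated as an assumption to be checked against the paper rather than as the slide's own content. $LD_i$ is the local density of cell $i$, $TD$ the target density and $OD$ the outlier density.

```latex
P(\text{keep cell } i) =
\begin{cases}
0, & LD_i \le OD \\
1, & OD < LD_i \le TD \\
TD / LD_i, & LD_i > TD
\end{cases}
```

Under this rule cells in sparse noise are discarded, cells in moderately dense regions are always kept, and cells in dense regions are thinned so that the expected density after downsampling is about $TD$.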
27 of 28
Building Minimal Spanning Tree
The minimal spanning-tree (MST) is the set of edges with the smallest total length which connects all nodes without forming cycles.
Layout of tree determined by Fruchterman-Reingold algorithm (see Methods).
MST is related to single-linkage hierarchical clustering algorithm (see later).
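The link between the MST and single-linkage clustering can be seen in Kruskal's algorithm, sketched here on hypothetical two-dimensional node positions. Edges are considered in order of increasing length, and each edge that joins two separate groups is both an MST edge and a single-linkage merge at that distance. This illustrates the relationship only; it is not necessarily how SPADE builds its tree.

```python
import math
from itertools import combinations

def kruskal_mst(points):
    """Return the MST edges (distance, i, j) in order of increasing distance."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the group representative, halving the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    mst = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # joins two separate groups: a single-linkage merge
            parent[ri] = rj
            mst.append((d, i, j))
    return mst

# Hypothetical cluster centres: three close together, two further away
nodes = [(0, 0), (1, 0), (1, 1), (5, 5), (6, 5)]
mst = kruskal_mst(nodes)
for d, i, j in mst:
    print(f"merge groups containing nodes {i} and {j} at distance {d:.2f}")
```

Cutting the longest MST edge splits the nodes into the same two groups that single-linkage clustering gives when its dendrogram is cut into two clusters.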
28 of 28
[Figure: hierarchical clustering of the Log_R_Ratio and Theta SNP features, shown with Average Linkage and with Single Linkage]