Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
12 Grand Challenges in Single-Cell Data
Science
David Lähnemann*,1,2,3, Johannes Köster*,+,1,4, Ewa Szczurek*,5, Davis J. McCarthy*,6,7, Stephanie C. Hicks*,8, Mark D.
Robinson*,9, Catalina A. Vallejos*,10,11, Niko Beerenwinkel*,12,13, Kieran R. Campbell*,15,16,17, Ahmed Mahfouz*,18,19, Luca
Pinello*,20,21,22, Pavel Skums*,23, Alexandros Stamatakis*,24,25, Camille Stephan-Otto Attolini*,26, Samuel Aparicio16,27,
Jasmijn Baaijens29, Marleen Balvert29,31, Buys de Barbanson32,33,34, Antonio Cappuccio35, Giacomo Corleone36, Bas E.
Dutilh31,38, Maria Florescu32,33,34, Victor Guryev41, Rens Holmer42, Katharina Jahn12,13, Thamar Jessurun Lobo41, Emma M.
Keizer45, Indu Khatri46, Szymon M. Kiełbasa47, Jan O. Korbel48, Alexey M. Kozlov24, Tzu-Hao Kuo3, Boudewijn P.F.
Lelieveldt49,50, Ion I. Mandoiu51, John C. Marioni52,53,54, Tobias Marschall55,56, Felix Mölder1,59, Amir Niknejad60,61, Łukasz
Rączkowski5, Marcel Reinders18,19, Jeroen de Ridder32,33, Antoine-Emmanuel Saliba62, Antonios Somarakis50, Oliver
Stegle48,54,63, Fabian J. Theis67, Huan Yang68, Alex Zelikovsky69,70, Alice C. McHardy+,3, Benjamin J. Raphael+,71, Sohrab P.
Shah+,72, and Alexander Schönhuth@,+,*,29,31
*Joint first authors, major contributions to manuscript.1Algorithms for Reproducible Bioinformatics, Genome Informatics, Institute of Human Genetics, University Hospital Essen, University of
Duisburg-Essen, Germany2Department of Paediatric Oncology, Haematology and Immunology, Medical Faculty, Heinrich Heine University, University Hospital, Düsseldorf,
Germany3Computational Biology of Infection Research Group, Helmholtz Centre for Infection Research, Braunschweig, Germany
4Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, USA5Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics University of Warsaw, Poland
6Bioinformatics and Cellular Genomics, St Vincent’s Institute of Medical Research, Fitzroy, Australia7Melbourne Integrative Genomics, School of BioSciences — School of Mathematics & Statistics, Faculty of Science, University of Melbourne,
Australia8Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
9Institute of Molecular Life Sciences and SIB Swiss Institute of Bioinformatics, University of Zurich, Switzerland10MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, UK
11The Alan Turing Institute, British Library, London, UK12Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
13SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland15Department of Statistics, University of British Columbia, Vancouver, Canada16Department of Molecular Oncology, BC Cancer Agency, Vancouver, Canada17Data Science Institute, University of British Columbia, Vancouver, Canada
18Leiden Computational Biology Center, Leiden University Medical Center, The Netherlands19Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, The Netherlands
20Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital Research Institute, Charlestown, USA21Department of Pathology, Harvard Medical School, Boston, USA
22Broad Institute of Harvard and MIT, Cambridge, MA, USA23Department of Computer Science, Georgia State University, Atlanta, USA
24Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Germany25Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Germany
26Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Spain27Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, Canada
29Life Sciences and Health, Centrum Wiskunde & Informatica, Amsterdam, The Netherlands31Theoretical Biology and Bioinformatics, Science for Life, Utrecht University, The Netherlands
32Center for Molecular Medicine, University Medical Center Utrecht, The Netherlands33Oncode Institute, Utrecht, The Netherlands
34Quantitative biology, Hubrecht Institute, Utrecht, The Netherlands35Institute for Advanced Study, University of Amsterdam, The Netherlands
36Department of Surgery and Cancer, The Imperial Centre for Translational and Experimental Medicine, Imperial College London, UK38Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Nijmegen, The Netherlands
41European Research Institute for the Biology of Ageing, University Medical Center Groningen, University of Groningen, The Netherlands42Bioinformatics Group, Wageningen University, The Netherlands45Biometris, Wageningen University & Research, The Netherlands
46Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, The Netherlands47Department of Biomedical Data Sciences, Leiden University Medical Center, The Netherlands
48Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany49PRB lab, Delft University of Technology, The Netherlands
50Division of Image Processing, Department of Radiology, Leiden University Medical Center, The Netherlands51Computer Science & Engineering Department, University of Connecticut, Storrs, USA
52Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, UK53Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, UK
54European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK55Center for Bioinformatics, Saarland University, Saarbrücken, Germany
56Max Planck Institute for Informatics, Saarbrücken, Germany59Institute of Pathology, University Hospital Essen, University of Duisburg-Essen, Germany.
60Computation molecular design, Zuse Institute Berlin, Germany61Mathematics department, Mount Saint Vincent, New York, USA
62Helmholtz Institute for RNA-based Infection Research, Helmholtz-Center for Infection Research, Würzburg, Germany63Division of Computational Genomics and Systems Genetics, German Cancer Research Center – DKFZ, Heidelberg, Germany
67Institute of Computational Biology, Helmholtz Zentrum München – German Research Center for Environmental Health, Neuherberg, Germany68Division of Drug Discovery and Safety, Leiden Academic Center for Drug Research – LACDR — Leiden University, The Netherlands
69Department of Computer Science, Georgia State University, Atlanta, USA70The Laboratory of Bioinformatics, I.M. Sechenov First Moscow State Medical University, Moscow, Russia
71Department of Computer Science, Princeton University, USA72Computational Oncology, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, USA
+Joint last authors, workshop organizers.
@Corresponding author: Alexander Schönhuth, [email protected]
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
The recent upswing of microfluidics and1
combinatorial indexing strategies, further en-2
hanced by very low sequencing costs, have3
turned single cell sequencing into an em-4
powering technology; analyzing thousands—5
or even millions—of cells per experimental6
run is becoming a routine assignment in lab-7
oratories worldwide. As a consequence, we8
are witnessing a data revolution in single cell9
biology. Although some issues are similar in10
spirit to those experienced in bulk sequencing,11
many of the emerging data science problems12
are unique to single cell analysis; together,13
they give rise to the new realm of ’Single-Cell14
Data Science’.15
Here, we outline twelve challenges that will16
be central in bringing this new field forward.17
For each challenge, the current state of the art18
in terms of prior work is reviewed, and open19
problems are formulated, with an emphasis20
on the research goals that motivate them.21
This compendium is meant to serve as a22
guideline for established researchers, newcom-23
ers and students alike, highlighting interesting24
and rewarding problems in ’Single-Cell Data25
Science’ for the coming years.26
Contents27
1 Introduction 328
2 Single-Cell Data Science:29
Themes and Categories 430
2.1 Varying levels of resolution . . . 531
2.2 Quantifying uncertainty of32
measurements and analysis33
results . . . . . . . . . . . . . . 534
2.3 Scaling to higher dimensionali-35
ties: more cells, more features,36
broader coverage . . . . . . . . 637
2.4 Challenge categories . . . . . . 738
3 Challenges in single-cell tran- 39
scriptomics 7 40
3.1 Challenge I: Handling sparsity 41
in single-cell RNA sequencing . 7 42
3.2 Challenge II: Defining flexible 43
statistical frameworks for dis- 44
covering complex differential 45
patterns in gene expression . . . 13 46
3.3 Challenge III: Mapping single 47
cells to a reference atlas . . . . 15 48
3.4 Challenge IV: Generalizing 49
trajectory inference . . . . . . . 16 50
3.5 Challenge V: Finding patterns 51
in spatially resolved measure- 52
ments . . . . . . . . . . . . . . 18 53
4 Challenges in single-cell genomics 19 54
4.1 Challenge VI: Improving 55
single-cell DNA sequencing 56
data quality and scaling to 57
more cells . . . . . . . . . . . . 20 58
4.2 Challenge VII: Errors and 59
missing data in the identifi- 60
cation of features / variation 61
from single-cell DNA sequenc- 62
ing data. . . . . . . . . . . . . . 23 63
5 Challenges in single-cell phy- 64
logenomics 25 65
5.1 Challenge VIII: Scaling phylo- 66
genetic models to many cells 67
and many sites . . . . . . . . . 26 68
5.2 Challenge IX: Integrating mul- 69
tiple types of features / varia- 70
tion into phylogenetic models . 27 71
5.3 Challenge X: Inferring popula- 72
tion genetic parameters of tu- 73
mor heterogeneity by model in- 74
tegration . . . . . . . . . . . . . 29 75
2
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
6 Overarching challenges 351
6.1 Challenge XI: Integration of2
single-cell data: across sam-3
ples, experiments and types of4
measurement . . . . . . . . . . 355
6.2 Challenge XII: Validating and6
benchmarking analysis tools7
for single-cell measurements . . 398
7 Acknowledgements 419
1 Introduction10
Since being elevated to “Method of the Year”11
in 2013 [Nature Methods, 2013], sequencing12
of the genetic material of individual cells has13
become routine when investigating cell-to-cell14
heterogeneity. Single-cell measurements of15
both RNA and DNA, and more recently also16
of epigenetic marks and protein levels, can17
stratify cells at the finest resolution possible.18
Single-cell RNA sequencing (scRNA-seq)19
facilitates to distinguish cell states within20
coarser cell type clusters [for an early exam-21
ple, see Anchang et al., 2016], thereby ar-22
ranging populations of cells according to novel23
types of hierarchies. It is also possible to24
identify cells in transition between states, so25
we get a much clearer view on the dynamics26
of tissue and organism development, and on27
structures within cell populations that had so28
far been perceived as homogeneous. Along29
a similar vein, analyses based on single-cell30
DNA sequencing (scDNA-seq) can highlight31
somatic clonal structures [e.g. in cancer, see32
Francis et al., 2014, ?] and are thus helpful33
for tracking the formation of certain cell lin-34
eages and to provide insight into evolutionary35
processes acting on somatic mutations.36
The opportunities arising from single-cell37
sequencing (sc-seq) are enormous: only now38
is it possible to re-evaluate hypotheses about39
differences between pre-defined sample groups40
at the single-cell level—no matter if such 41
sample groups are disease subtypes, treat- 42
ment groups or simply morphologically dif- 43
ferent cell types. It is therefore no surprise 44
that the enthusiasm about the possibility to 45
screen the genetic material of the basic units 46
of life has been continuing to grow: a promi- 47
nent example is the Human Cell Atlas [Regev 48
et al., 2017], an initiative aiming to map the 49
different types and states of cells that a hu- 50
man being is composed of, or Zhang and Liu 51
[2019], as a most recent example of a list of 52
single-cell analysis based opportunities in par- 53
ticular domains such as the blood, the brain 54
and the lung. 55
Encouraged by the great potential of in- 56
vestigating DNA and RNA at the single- 57
cell level, the development of the corre- 58
sponding experimental technologies has expe- 59
rienced massive boosts. This upswing of high- 60
throughput sc-seq technologies—most impor- 61
tantly in microfluidics techniques and com- 62
binatorial indexing strategies [Zilionis et al., 63
2017, Vitak et al., 2017, Svensson et al., 64
2018b, Luo et al., 2019, Gao et al., 2019]— 65
means that tens or hundreds of thousands 66
of cells, instead of just tens or hundreds, 67
are routinely sequenced in one experiment; a 68
development—further fueled by in the mean- 69
time low sequencing costs—that has recently 70
even led to a publication on millions of cells in 71
one experiment [Cao et al., 2019a]. As a con- 72
sequence, primary and secondary sc-seq re- 73
sults of very large numbers of single cells are 74
becoming available worldwide, constituting a 75
data revolution for the field of single-cell anal- 76
ysis. 77
These vast amounts of data and the re- 78
search hypotheses that motivate them, need 79
to be handled in a computationally efficient 80
and statistically sound manner. As these 81
aspects clearly match a recent definition of 82
“Data Science” [Hicks and Peng, 2019], we 83
posit that we have entered the era of Single- 84
3
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Cell Data Science (SCDS).1
While SCDS faces many of the data sci-2
ence issues arising in bulk sequencing, it also3
substantially adds to them and further com-4
pounds existing scientific challenges. Namely,5
limited amounts of material available per cell6
lead to exceptionally high levels of uncer-7
tainty about (possibly missed) observations,8
and where amplification is used to generate9
more material, technical noise is added to the10
resulting data. Further, a new level of resolu-11
tion also means another—rapidly growing—12
dimension in data matrices, thus requiring13
scalable models and methods for data anal-14
ysis. While the particular challenges can vary15
greatly by research goal, tissue analyzed, ex-16
perimental setup or—last but not least—just17
by whether DNA or RNA is sequenced, fur-18
ther factoring into various protocols, assaying19
for example also the epigenome (bisulfite pro-20
tocols), chromatin accessibility (e.g. ATAC-21
seq) or protein levels (CITE-seq), the com-22
mon denominator is that the challenges are23
all rooted in data science, hence are compu-24
tational or statistical in nature. Here, we pro-25
pose the dozen data science challenges that we26
believe to be most relevant for bringing SCDS27
forward. We summarize and categorize them,28
providing a thorough review of the status of29
each challenge relative to existing approaches.30
From this foundation, we point to possible di-31
rections of research to tackle them. This cat-32
alog of SCDS challenges aims at focusing the33
development of data analysis methods and the34
directions of research in this rapidly evolving35
field—as a guideline for researchers looking36
for rewarding problems that match their per-37
sonal expertise and interests.38
2 Single-Cell Data Science: 39
Themes and Categories 40
A number of challenging themes are common 41
to all single-cell analyses, regardless of the 42
particular assay or data modality generated. 43
We will start our review by broadly categoriz- 44
ing these aspects. Later, when discussing the 45
specific 12 challenges, we will refer to these 46
broader categories wherever appropriate and, 47
if this is sensible, lay out what these broader 48
theme issues mean in the particular context. 49
If challenges covered in later sections are par- 50
ticularly entangled with the broader themes 51
listed here, we will also refer to them from 52
within this section. 53
These elementary themes may reflect issues 54
one also experiences when analyzing bulk se- 55
quencing data. However, even if not unique 56
to single-cell experiments, these issues may 57
become particularly dominant in the analysis 58
of sc-seq data and therefore require particu- 59
lar attention. The most driving of such el- 60
ementary themes, not necessarily unique to 61
sc-seq, are: (i) The need to quantify mea- 62
surement uncertainty (see challenges in sec- 63
tion 2.2) (ii) The need to benchmark methods 64
systematically, in a way that highlights the 65
metrics that are particularly critical in sc-seq 66
(section 6.2). The most driving themes spe- 67
cific to sc-seq, exacerbated by the rapid ad- 68
vances in terms of experimental technologies 69
supporting single-cell analyses, are: (i) The 70
need to scale to higher dimensional data, be 71
it more cells measured or more data mea- 72
sured per cell (section 2.3); this often arises 73
in combination with: (ii) The need to inte- 74
grate data across different types of single– 75
cell measurements (e.g. RNA, DNA, proteins, 76
methylation and so on) and across samples, 77
be they from different time points, treatment 78
groups or even organisms (section 6.1). Fi- 79
nally, the possibility to operate on the finest 80
4
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
levels of resolution casts an important, over-1
arching question: (iii) Which exact level of2
resolution is appropriate relative to the par-3
ticular research question one has in mind (sec-4
tion 2.1)? We will start by qualifying this last5
one.6
2.1 Varying levels of resolution7
Sc-seq allows for a fine-grained definition of8
cell types and states. Hence it allows for9
characterizations of cell populations that are10
significantly more detailed than characteriza-11
tions supported by bulk sequencing experi-12
ments. However, even though sc-seq operates13
at the most basic level, mapping cell types14
and states at a particular level of resolution15
of interest may be challenging: Depending on16
whether the research question allows for a cer-17
tain freedom in terms of resolution, and de-18
pending on the limits imposed by the particu-19
lar experimental setup, achieving the targeted20
level of resolution or granularity for the in-21
tended map of cells may require substantial22
methodological efforts.23
When drawing maps of cell types and24
states, it is important that they: (i) have a25
structure that recapitulates both tissue devel-26
opment and tissue organization; (ii) account27
for continuous cell states in addition to dis-28
crete cell types (i.e. reflecting cell state tra-29
jectories within cell types and smooth tran-30
sitions between cell types, as observed in tis-31
sue generation); (iii) allow for choosing the32
level of resolution flexibly (i.e. the map should33
possibly support zoom type operations, to34
let the researcher choose the desired level35
of granularity with respect to cell types and36
states conveniently, ranging from whole or-37
ganisms via tissues to cell populations and38
cellular subtypes); (iv) include biological and39
functional annotation wherever available and40
helpful in the intended functional context.41
An exemplary illustration of how maps of42
cell types and states can support different lev- 43
els of resolution are the structure-rich topolo- 44
gies generated by PAGA based on scRNA- 45
seq [Wolf et al., 2019], see Figure 1 for an 46
illustration1. At the highest levels of resolu- 47
tion, these topologies also reflect intermedi- 48
ate cell states and the developmental trajec- 49
tories passing through them. A similar ap- 50
proach that also allows for consistently zoom- 51
ing into more detailed levels of resolution is 52
provided by hierarchical stochastic neighbor 53
embedding (HSNE, Pezzotti et al. [2016]), a 54
method pioneered on mass cytometry data 55
sets [Unen et al., 2017, Höllt et al., 2018]. 56
In addition, manifold learning [Welch et al., 57
2017, Moon et al., 2018] and metric learning 58
[Hoffer and Ailon, 2015, Bromley et al., 1993] 59
may provide further theoretical support for 60
even more accurate maps, because they pro- 61
vide sound theories about reasonable, contin- 62
uous distance metrics, instead of just distinct, 63
discrete clusters. 64
2.2 Quantifying uncertainty of 65
measurements and analysis 66
results 67
The amount of material sampled from single 68
cells is considerably less in comparison with 69
the amounts of material raised in bulk exper- 70
iments, because the latter are based on ex- 71
amining the DNA or RNA of larger pools of 72
cells together. Signals become more stable 73
when individual signals are summarized (such 74
as in a bulk experiment), thus the increase in 75
resolution due to sc-seq also means a reduc- 76
tion of the stability of the supporting signals. 77
The reduction in signal stability, in turn, im- 78
plies that data becomes substantially more 79
1Figure 1 was adapted from Wolf et al. [2019],Fig. 3, provided under Creative CommonsAttribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).
5
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
�ssues cell types single cells
intermediatecell states
trajectories
Figure 1: Different levels of resolution are of interest, depending on the research question andthe data available. Thus, analysis tools and reference systems (such as cell atlases) will haveto accommodate for multiple levels of resolution from whole organs and tissues over discretecell types to continuously mappable intermediate cell states, indistinguishable even at themicroscopic level. A graph abstraction that enables such multiple levels of focus is providedby PAGA [Wolf et al., 2019], a structure that allows for discretely grouping cells, as well asinferring trajectories as paths through a graph.
uncertain and tasks hitherto considered rou-1
tine, such as single nucleotide variation (SNV)2
calling in bulk sequencing, require consider-3
able methodological care to be resolved also4
for sc-seq.5
These issues with data quality and in par-6
ticular missing data pose challenges that are7
novel and unique to sc-seq, and are thus8
at the core of several challenges: regarding9
scDNA-seq data quality (see challenges in10
section 4.1) and especially regarding missing11
data in scDNA-seq (section 4.2) and scRNA-12
seq (section 3.1). In contrast, the non-13
negligible batch effects that scRNA-seq can14
suffer from reflect a common problem in high-15
throughput data analysis [Leek et al., 2010],16
and thus are not discussed here (although in17
certain protocols such effects can be allevi-18
ated by careful use of negative control data19
in the form of spike-in RNA of known con-20
tent and concentration [Severson et al., 2018,21
BEARscc]).22
Optimally, sc-seq analysis tools would accu-23
rately quantify all uncertainties arising from 24
experimental errors and biases. Thereby, 25
these tools would prevent the uncertainties 26
from propagating to the intended downstream 27
analyses in an uncontrolled manner, and 28
rather translate them into statistically sound 29
and accurately quantified qualifiers of final re- 30
sults. 31
2.3 Scaling to higher 32
dimensionalities: more cells, 33
more features, broader 34
coverage 35
The current blossoming of experimental 36
methods poses considerable statistical chal- 37
lenges, and would do even if measurements 38
were not affected by errors and biases. 39
The increase in the number of single cells 40
analyzed per experiment translates into more 41
data points being generated, requiring meth- 42
ods to scale rapidly. With scRNA-seq already 43
6
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
scaling to millions of cells, some of the respec-1
tive methodology has picked up the thread2
[Sengupta et al., 2016, Sinha et al., 2018, Wolf3
et al., 2018, Iacono et al., 2018]. Of course,4
the respective issues have not yet been fully5
resolved; further improvements are conceiv-6
able. For scDNA-seq, experimental method-7
ology has just been scaling up to more cells re-8
cently (see section 4.1 and section 5.1), mak-9
ing this a pressing challenge in the develop-10
ment of data analysis methods.11
Beyond basic scRNA-seq and scDNA-seq12
experiments, various assays have been pro-13
posed to measure chromatin accessibility14
[Buenrostro et al., 2015, Cusanovich et al.,15
2015], DNA methylation [Karemaker and Ver-16
meulen, 2018], protein levels [Virant-Klun17
et al., 2016], protein binding, and also for per-18
forming multiple simultaneous measurements19
[Clark et al., 2018, Cao et al., 2018] in sin-20
gle cells.The corresponding increase in exper-21
imental choices means another possible infla-22
tion of feature spaces.23
In parallel to the increase in the number24
of cells queried and the number of different25
assays possible, the increase of the resolu-26
tion per cell of specific measurement types27
causes a steady increase of the dimension-28
ality of corresponding data spaces. For the29
field of SCDS this amounts to a severe and30
recurring case of the “curse of dimensional-31
ity” for all types of measurements. Here32
again, scRNA-seq based methods are in the33
lead when trying to deal with feature dimen-34
sionality, while scDNA-seq based methodol-35
ogy (which includes epigenome assays) has yet36
to catch up.37
Finally, there are efforts to measure multi-38
ple feature types in parallel, e.g. from scDNA-39
seq (see section 5.2). Also, with spatial and40
temporal sampling becoming available (see41
section 3.5 and section 5.3), data integration42
methods need to scale to more and new types43
of context information for individual cells (see44
section 6.1 for a comprehensive discussion of 45
data integration approaches). 46
2.4 Challenge categories 47
All challenges we identified fall into at least 48
one of three greater categories: transcrip- 49
tomics (section 3), genomics (section 4) and 50
phylogenomics (section 5). Here, the separa- 51
tion of phylogenomics from genomics is due to 52
the distinct research goals the respective chal- 53
lenges address. Last but not least, two chal- 54
lenges are relevant to all of these categories, 55
and are thus discussed as recapitulatory chal- 56
lenges at the end: the data integration chal- 57
lenge (section 6.1) draws on the types of mea- 58
surements and experiments described in the 59
category-specific challenges. The benchmark- 60
ing challenge (presented in section 6.2), al- 61
though being essential in many areas of data 62
science, is worth highlighting here in partic- 63
ular, because benchmarking for SCDS is still 64
in its infancy. 65
3 Challenges in single-cell 66
transcriptomics 67
3.1 Challenge I: Handling 68
sparsity in single-cell RNA 69
sequencing 70
A comprehensive characterization of the tran- 71
scriptional status of individual cells enables us 72
to gain full insight into the interplay of tran- 73
scripts within single cells. However, scRNA- 74
seq measurements typically suffer from large 75
fractions of observed zeros, where a given gene 76
in a given cell has no unique molecule identi- 77
fiers or reads mapping to it. These observed 78
zero values can represent either missing data 79
(i.e. a gene is expressed but not detected by 80
the sequencing technology) or true absence of 81
7
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
expression. The proportion of zeros, or degree1
of sparsity, is thought to be due to imper-2
fect reverse transcription and amplification,3
and other technical limitations (Hicks et al.4
[2018], Bacher and Kendziorski [2016]), and5
depends on the scRNA-seq platform used, the6
sequencing depth and the underlying expres-7
sion level of the gene. The term “dropout” is8
often used to denote observed zero values in9
scRNA-seq data, but this term conflates zero10
values attributable to methodological noise11
and biologically-true zero expression, so we12
recommend against its use as a catch-all term13
for observed zeros.14
Sparsity in scRNA-seq data can hinder15
downstream analyses, but it is challenging to16
model or handle it appropriately, and thus,17
there remains an ongoing need for improved18
methods. Sparsity pervades all aspects of19
scRNA-seq data analysis, but here we fo-20
cus on the linked problems of learning la-21
tent spaces and “imputing” expression values22
from scRNA-seq data (Figure 2). Imputation,23
“data smoothing” and “data reconstruction”24
approaches are closely linked to the challenges25
of normalization. But whereas normalization26
generally aims to make expression values be-27
tween cells more comparable to each other,28
imputation and data smoothing approaches29
aim to achieve adjusted data values that—it30
is hoped—better represent the true expression31
values. Imputation methods could therefore32
be used for normalization, but do not entail33
all possible or useful approaches to normal-34
ization.35
3.1.1 Status36
The imputation of missing values has been37
very successful for genotype data. Crucially,38
when imputing genotypes we often know39
which data are missing (e.g. when no geno-40
type call is possible due to no coverage of41
a locus, although see section section 4.2 for42
the challenges with scDNA-seq data) and rich 43
sources of external information are available 44
(e.g. haplotype reference panels). Thus, geno- 45
type imputation is now highly accurate and 46
a commonly-used step in data processing for 47
genetic association studies [Das et al., 2018]. 48
The situation is somewhat different for 49
scRNA-seq data, as we do not routinely have 50
external reference information to apply (see 51
section 3.3). In addition, we can never be sure 52
which observed zeros represent “missing data” 53
and which accurately represent a true gene ex- 54
pression level in the cell [Hicks et al., 2018]. 55
Observed zeros can either represent “biologi- 56
cal” zeros, i.e. those present because the true 57
expression level of a gene in a cell was zero. 58
Or they they are the result of methodological 59
noise, which can arise when a gene has true 60
non-zero expression in a cell, but no counts 61
are observed due to failures at any point in 62
the complicated process of processing mRNA 63
transcripts in cells into mapped reads. Such 64
noise can lead to artefactual zero that are ei- 65
ther more systematic (e.g. sequence-specific 66
mRNA degradation during cell lysis) or that 67
occur by chance (e.g. barely expressed tran- 68
scripts that at the same expression level will 69
sometimes be detected and sometimes not, 70
due to sampling variation, e.g in the sequenc- 71
ing). The high degree of sparsity in scRNA- 72
seq data therefore arises from technical zeros 73
and true biological zeros, which are difficult 74
to distinguish from one another. 75
In general, two broad approaches can be ap- 76
plied to tackle this problem of sparsity: (i) use 77
statistical models that inherently model the 78
sparsity, sampling variation and noise modes 79
of scRNA-seq data with an appropriate data 80
generative model; or (ii) attempt to “impute” 81
values for observed zeros (ideally the tech- 82
nical zeros; sometimes also non-zero values) 83
that better approximate the true gene expres- 84
sion levels. We prefer to use the first option 85
where possible, and for many single-cell data 86
8
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
analysis problems, statistical models appro-1
priate for sparse count data exist and should2
be used (e.g. for differential expression anal-3
ysis). However, there are many cases where4
the appropriate models are not available and5
accurate imputation of technical zeros would6
allow better results from downstream meth-7
ods and algorithms that cannot handle sparse8
count data. For example, imputation could9
be particularly useful for many dimension re-10
duction, visualization and clustering applica-11
tions. It is therefore desirable to improve both12
statistical methods that work on sparse count13
data directly and approaches for data impu-14
tation for scRNA-seq data, whether by re-15
fining existing techniques or developing new16
ones (see also section 2.2).17
We define three broad (and sometimes over-18
lapping) categories of methods that can be19
used to “impute” scRNA-seq data in the ab-20
sence of an external reference: (i) Model-based21
imputation methods of technical zeros use22
probabilistic models to identify which ob-23
served zeros represent technical rather than24
biological zeros and aim to impute expression25
levels just for these technical zeros, leaving26
other observed expression levels untouched;27
or (ii) Data-smoothing methods define sets28
of “similar” cells (e.g. cells that are neigh-29
bors in a graph or occupy a small region30
in a latent space) and adjust expression val-31
ues for each cell based on expression values32
in similar cells. These methods adjust all33
expression values, including technical zeros,34
biological zeros and observed non-zero val-35
ues. (iii) Data-reconstruction methods typ-36
ically aim to define a latent space repre-37
sentation of the cells. This is often done38
through matrix factorization (e.g. principal39
component analysis) or, increasingly, through40
machine learning approaches (e.g. variational41
autoencoders that exploit deep neural net-42
works to capture non-linear relationships).43
Although a broad class of methods, both ma-44
trix factorization methods and autoencoders 45
(among others) are able to “reconstruct” the 46
observed data matrix from low-rank or sim- 47
plified representations. The reconstructed 48
data matrix will typically no longer be sparse 49
(with many zeros) and the implicitly “im- 50
puted” data can be used for downstream ap- 51
plications that cannot handle sparse count 52
data. 53
The first category of methods generally 54
seeks to infer a probabilistic model that cap- 55
tures the data generation mechanism. Such 56
generative models can be used to identify, 57
probabilistically, which observed zeros cor- 58
respond to technical zeros (to be imputed) 59
and which correspond to biological zeros (to 60
be left alone). There are many model-based 61
imputation methods already available that 62
use ideas from clustering (e.g. k-means), di- 63
mension reduction, regression and other tech- 64
niques to impute technical zeros, oftentimes 65
combining ideas from several of these ap- 66
proaches. These include SAVER [Huang 67
et al., 2018], ScImpute [Li and Li, 2018], 68
bayNorm [Tang et al., 2018], scRecover [Miao 69
et al., 2019], and VIPER [Chen and Zhou, 70
2018]. Clustering methods that implicitly im- 71
pute values, such as CIDR [Lin et al., 2017b] 72
and BISCUIT [Azizi et al., 2017], are closely 73
related to this class of imputation methods. 74
Data-smoothing methods, which adjust all 75
gene expression levels based on expression 76
levels in “similar” cells, have also been pro- 77
posed to handle imputation problems. We 78
might regard these approaches as “denois- 79
ing” methods. To take a simplified exam- 80
ple (Figure 2), we might imagine that sin- 81
gle cells originally refer to points in two- 82
dimensional space, but are likely to describe a 83
one-dimensional curve; projecting data points 84
onto that curve eventually allows imputation 85
of the “missing” values (but all points are 86
adjusted, or smoothed, not just true tech- 87
nical zeros). Prominent data-smoothing ap- 88
9
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
true popula�on manifold
denoising / imputa�on manifold
measurement error
denoising process
"true" data point
observed data point
denoised data point
imputed data point
Figure 2: Measurement error requires denoising methods or approaches that quantify uncer-tainty and propagate it down analysis pipelines. Also, whenever methods cannot deal withthe abundant missing values, imputation approaches are necessary. Whereas the true popu-lation manifold that generated data is never known, one can usually obtain some estimationof it that can be used for both denoising and imputation.
proaches to handling sparse counts include:1
• diffusion-based MAGIC [Dijk et al.,2
2018]3
• k-nearest neighbor-based knn-smooth4
[Wagner et al., 2018b]5
• network diffusion-based netSmooth6
[Jonathan Ronen, 2018]7
• clustering-based DrImpute [Gong et al.,8
2018]9
• locality sensitive imputation in LSIm-10
pute [Moussa and Măndoiu, 2019]11
A major task in the analysis of high-12
dimensional single-cell data is to find low-13
dimensional representations of the data that14
capture the salient biological signals and ren-15
der the data more interpretable and amenable16
to further analyses. As it happens, the ma-17
trix factorization and latent-space learning18
methods used for that task also provide an-19
other route for imputation through their abil-20
ity to reconstruct the observed data matrix21
from simplified representations of it. Prin-22
cipal component analysis (PCA) is one such23
standard matrix factorization method that24
can be applied to scRNA-seq data (preferably25
after suitable data normalization) as are other 26
widely-used general statistical methods like 27
independent component analysis (ICA) and 28
non-negative matrix factorization (NMF). As 29
(linear) matrix factorization methods, PCA, 30
ICA and NMF decompose the observed data 31
matrix into a “small” number of factors in two 32
low-rank matrices, one representing cell-by- 33
factor weights and one gene-by-factor load- 34
ings. Many matrix factorization methods 35
with tweaks for single-cell data have been pro- 36
posed in recent years, including: 37
• ZIFA, a zero-inflated factor analysis 38
[Pierson and Yau, 2015] 39
• f-scLVM, a sparse Bayesian latent vari- 40
able model [Buettner et al., 2017] 41
• GPLVM, a Gaussian process latent vari- 42
able model [Verma and Engelhardt, 2018] 43
• ZINB-WaVE, a zero-inflated negative bi- 44
nomial factor model [Risso et al., 2018] 45
• scCoGAPS, an extension of NMF [Stein- 46
O’Brien et al., 2019] 47
• consensus NMF, a meta-analysis ap- 48
proach to NMF [Kotliar et al., 2019] 49
10
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
• pCMF, probabilistic count matrix factor-1
ization with a Poisson model [Durif et al.,2
2019]3
• SDA, sparse decomposition of arrays;4
another sparse Bayesian method [Jung5
et al., 2019].6
Some data reconstruction approaches have7
been specifically proposed for imputation, in-8
cluding:9
• ENHANCE, denoising PCA with an ag-10
gregation step [Wagner et al., 2019]11
• ALRA, SVD with adaptive thresholding12
[Linderman et al., 2018]13
• scRMD, robust matrix decomposition14
[Chen et al., 2018]15
Recently, machine learning methods have16
emerged that apply autoencoders [AutoIm-17
pute, Talwar et al., 2018] and deep neu-18
ral networks [DeepImpute, Arisdakessian19
et al., 2018]) or ensemble learning [EnImpute,20
Zhang et al., 2019c]) to impute expression val-21
ues.22
Additionally, many deep learning methods23
have been proposed for single-cell data anal-24
ysis that can, but need not, use probabilis-25
tic data generative processes to capture low-26
dimensional or latent space representations of27
a dataset. Even if imputation is not a main28
focus, such methods can generate “imputed”29
expression values as an upshot of a model pri-30
marily focused on other tasks like learning la-31
tent spaces, clustering, batch correction, or32
visualization (and often several of these tasks33
simultaneously). The latter set includes tools34
such as:35
• DCA, an autoencoder with a zero-36
inflated negative binomial distribution37
[Eraslan et al., 2019]38
• scVI, a variational autoencoder with a 39
zero-inflated negative binomial model 40
[Lopez et al., 2018] 41
• LATE [Badsha et al., 2018] 42
• VASC [Wang and Gu, 2018] 43
• compscVAE [Grønbech et al., 2018] 44
• scScope [Deng et al., 2019] 45
• Tybalt [Way and Greene, 2018] 46
• SAUCIE [Amodio et al., 2019] 47
• scvis [Ding et al., 2018] 48
• net-SNE [Cho et al., 2018] 49
• BERMUDA, focused on batch correction 50
[Wang et al., 2019] 51
• DUSC [Srinivasan et al., 2019] 52
• Expression Saliency [Kinalis et al., 2019] 53
• others [Lin et al., 2017a, Zhang, 2019] 54
Besides the three categories described 55
above, a small number of scRNA-seq impu- 56
tation methods have been developed to in- 57
corporate information external to the cur- 58
rent dataset for imputation. These include: 59
ADImpute [Leote et al., 2019], which uses 60
gene regulatory network information from ex- 61
ternal sources; SAVER-X [Wang et al., 2018], 62
a transfer learning method for denoising and 63
imputation that can use information from 64
atlas-type resources; and methods that bor- 65
row information from matched bulk RNA- 66
seq data like URSM [Zhu et al., 2018] and 67
SCRABBLE [Peng et al., 2019]. 68
11
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
3.1.2 Open problems1
A major challenge in this context is the circu-2
larity that arises when imputation solely relies3
on information that is internal to the imputed4
dataset. This circularity can artificially am-5
plify the signal contained in the data, leading6
to inflated correlations between genes and/or7
cells. In turn, this can introduce false pos-8
itives in downstream analyses such as differ-9
ential expression testing and gene network in-10
ference [Andrews and Hemberg, 2019]. Han-11
dling batch effects and potential confounders12
requires further work to ensure that imputa-13
tion methods do not mistake unwanted varia-14
tion from technical sources for biological sig-15
nal. In a similar vein, single-cell experiments16
are affected by various uncertainties (see sec-17
tion 2.2). Approaches that allow quantifica-18
tion and propagation of the uncertainties as-19
sociated with expression measurements (sec-20
tion 2.2), may help to avoid problems associ-21
ated with ‘overimputation’ and the introduc-22
tion of spurious signals noted by Andrews and23
Hemberg [2019].24
To avoid this circularity, it is important25
to identify reliable external sources of infor-26
mation that can inform the imputation pro-27
cess. One possibility is to exploit external28
reference panels (like in the context of ge-29
netic association studies). Such panels are30
not generally available for scRNA-seq data,31
but ongoing efforts to develop large scale cell32
atlases [e.g. Regev et al., 2017, see also sec-33
tion 3.3] could provide a valuable resource34
for this purpose. Systematic integration of35
known biological network structures is de-36
sirable and may also help to avoid circular-37
ity. A possible approach is to encode net-38
work structure knowledge as prior informa-39
tion, as attempted in netSmooth and ADIm-40
pute. Another alternative solution is to ex-41
plore complementary types of data that can42
inform scRNA-seq imputation. This idea was43
adopted in SCRABBLE and URSM, where an 44
external reference is defined by bulk expres- 45
sion measurements from the same population 46
of cells for which imputation is performed. 47
Yet another possibility could be to incorpo- 48
rate orthogonal information provided by dif- 49
ferent types of molecular measurements (see 50
section 6.1). Methods designed to integrate 51
multi-omics data could then be extended to 52
enable scRNA-seq imputation, e.g. through 53
generative models that explicitly link scRNA- 54
seq with other data types [e.g. clonealign, 55
Campbell et al., 2019] or by inferring a shared 56
low-dimensional latent structure [e.g. MOFA, 57
Argelaguet et al., 2018] that could be used 58
within a data-reconstruction framework. 59
With the proliferation of alternative meth- 60
ods, comprehensive benchmarking is urgently 61
required as for all areas of single-cell data 62
analysis section 6.2. Early attempts by Zhang 63
and Zhang [2018] and Andrews and Hemberg 64
[2019] provide valuable insights into the per- 65
formance of methods available at the time. 66
But many more methods have since been pro- 67
posed and even more comprehensive bench- 68
marking platforms are needed. Many meth- 69
ods, especially those using deep learning, de- 70
pend strongly on choice of hyperparameters 71
[Hu and Greene, 2019]. There, more de- 72
tailed comparisons that explore parameter 73
spaces would be helpful, extending work like 74
that from Sun et al. [2019] comparing di- 75
mensionality reduction methods. Learning 76
from exemplary benchmarking studies [Sone- 77
son and Robinson, 2018, Saelens et al., 2019], 78
it would be immensely beneficial to develop 79
a community-supported benchmarking plat- 80
form with a wide-range of synthetic and ex- 81
periment ground-truth datasets (or as close 82
as possible, in the case of experimental data) 83
and a variety of thoughtful metrics for eval- 84
uating performance. Ideally, such a bench- 85
marking platform would remain dynamic be- 86
yond an initial publication to allow ongoing 87
12
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
comparison of methods as new approaches are1
proposed. Detailed benchmarking would also2
help to establish when normalization methods3
derived from explicit count models [e.g. Hafe-4
meister and Satija, 2019, Townes et al., 2019]5
may be preferable to imputation.6
Finally, scalability for large numbers of7
cells remains an ongoing concern for imputa-8
tion, data smoothing and data reconstruction9
methods, as for all high-throughput single-cell10
methods and software (see section 2.3).11
3.2 Challenge II: Defining12
flexible statistical13
frameworks for discovering14
complex differential patterns15
in gene expression16
Beyond simple changes in average gene ex-17
pression between cell types (or across bulk-18
collected libraries), scRNA-seq enables a19
high granularity of changes in expression to20
be unraveled. Interesting and informative21
changes in expression patterns can be re-22
vealed, as well as cell-type-specific changes23
in cell state across samples (Figure 6, Ap-24
proach 1). Further understanding of gene25
expression changes will enable deeper knowl-26
edge across a myriad of applications, such as27
immune responses [Kang et al., 2018b, Stub-28
bington et al., 2017], development [Karaiskos29
et al., 2017a] and drug response [Kim et al.,30
2015].31
3.2.1 Status32
Currently, the vast majority of differential ex-33
pression detection methods assume that the34
groups of cells to be compared are known35
in advance (e.g., experimental conditions or36
cell types). However, most current analy-37
sis pipelines rely on clustering or cell type38
assignment to identify such groups, before39
downstream differential analysis is performed, 40
without propagating the uncertainty in these 41
assignments or accounting for the double use 42
of data (clustering, differential testing be- 43
tween clusters). 44
In this context, most methods have fo- 45
cused on comparing average expression be- 46
tween groups [Kharchenko et al., 2014, Fi- 47
nak et al., 2015], but it appears that single- 48
cell-specific methods do not uniformly out- 49
perform the state-of-the-art bulk methods 50
[Soneson and Robinson, 2018]. Instead, lit- 51
tle attention has been given to more gen- 52
eral patterns of differential expression (Fig- 53
ure 3), such as changes in variability that ac- 54
count for mean expression confounding [Eling 55
et al., 2018], changes in trajectory along pseu- 56
dotime [Campbell and Yau, 2018, van den 57
Berge et al., 2019], or more generally, changes 58
in distributions [Korthauer et al., 2016b]. 59
Furthermore, methods for cross-sample com- 60
parisons of gene expression (e.g., cell-type- 61
specific changes in cell state across samples, 62
compare section 6.1, Figure 6 and Table 2) 63
are now emerging, such as pseudo-bulk com- 64
parisons [Kang et al., 2018a], where expres- 65
sion is aggregated over multiple cells within 66
each sample. With the expanding capacity 67
of experimental techniques to generate multi- 68
sample scRNA-seq datasets, further general 69
and flexible statistical frameworks will be re- 70
quired to identify complex differential pat- 71
terns across samples. This will be particularly 72
critical in clinical applications, where cells are 73
collected from multiple patients. 74
3.2.2 Open problems 75
Accounting for uncertainty in cell type as- 76
signment and for double use of data will 77
require, first of all, a systematic study of 78
their impact. Integrative approaches in which 79
clustering and differential testing are simul- 80
taneously performed [Vavoulis et al., 2015] 81
13
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
popula�on differences in
mean variability pseudo�me
...
Q Q
...
...
R R
...
...
S S
...
Figure 3: Differential expression of a gene or transcript between cell populations. The top rowlabels the specific gene or transcript, as is also done in Figure 6. A difference in mean geneexpression manifests in a consistent difference of gene expression across all cells of a popu-lation (e.g. high vs. low). A difference in variability of gene expression means that in onepopulation, all cells have a very similar expression level, whereas in another population somecells have a much higher expression and some a much lower expression. The resulting averageexpression level may be the same and in such cases, only single-cell measurements can findthe difference between populations. A difference across pseudotime is a change of expres-sion within a population, e.g. along a developmental trajectory (compare Figure 1). Thisalso constitutes a difference between cell populations that is not apparent from populationaverages, but requires a pseudo-temporal ordering of measurements on single cells.
can address both issues. However, integra-1
tive methods typically require bespoke imple-2
mentations, precluding a direct combination3
between arbitrary clustering and differential4
testing tools. In such cases, the adaptation of5
selective inference methods [Reid et al., 2018,6
Zhang et al., 2019b] could provide an alterna-7
tive solution.8
While some methods exist to identify more9
general patterns of gene expression changes10
(e.g. variability, distributions), these meth-11
ods could be further improved by integrat-12
ing with existing approaches that account for13
confounding effects such as cell cycle [Ste-14
gle et al., 2015] and complex batch effects15
[Butler et al., 2018a, Haghverdi et al., 2018].16
Moreover, our capability to discover interest-17
ing gene expression patterns will be vastly18
expanded by connecting with other aspects19
of single-cell expression dynamics, such as20
cell type composition, RNA velocity [Manno 21
et al., 2018], splicing and allele-specificity. 22
This will allow us to fully exploit the granu- 23
larity contained in single-cell level expression 24
measurements. 25
In the multi-donor setting, several promis- 26
ing methods have been applied to discover 27
state transitions in high-dimensional cytome- 28
try datasets [Lun et al., 2017, Bruggner et al., 29
2014, Weber et al., 2018, Nowicka et al., 2017]. 30
These approaches could be expanded to the 31
higher dimensions and characteristic aspects 32
of scRNA-seq data. Alternatively, there is a 33
large space to explore other general and flex- 34
ible approaches, such as hierarchical models 35
where information is borrowed across sam- 36
ples, while allowing for sample-specific pat- 37
terns. 38
14
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
3.3 Challenge III: Mapping1
single cells to a reference2
atlas3
Classifying cells into cell types or states is4
essential for many secondary analyses. As5
an example, consider studying and classify-6
ing how expression varies across different cells7
and different biological conditions (for differ-8
ential expression analyses, see section 3.2 and9
data integration Approach 1 in section 6.1,10
Figure 6 and Table 2). To put the results of11
such studies on a map, reliable reference sys-12
tems are required.13
The lack of appropriate, available refer-14
ences has so far implied that only reference-15
free approaches were conceivable, where unsu-16
pervised clustering approaches were the pre-17
dominant option (see data integration Ap-18
proach 0 in section 6.1, Figure 6 and Table 2).19
Method development for such unsupervised20
clustering of cells has already reached a cer-21
tain level of maturity; see Duò et al. [2018],22
Freytag et al. [2018], Kiselev et al. [2019] for23
a systematic identification of available tech-24
niques.25
However, unsupervised approaches involve26
manual cluster annotation. There are two27
major caveats: (i) manual annotation is a28
time-consuming process, which also (ii) puts29
certain limits to the reproducibility of the re-30
sults. Cell atlases, as reference systems that31
systematically capture cell types and states,32
either tissue-specific or across different tis-33
sues, remedy this issue (see data integration34
Approach 2 in section 6.1, Figure 6 and Ta-35
ble 2; see also Figure 1 for an idea of what cell36
atlas type reference systems preferably could37
look like).38
3.3.1 Status39
See Table 1 for a list of cell atlas type refer-40
ences that have recently been published. For41
human, similar endeavors as for the mouse are 42
under way, with the intention to raise a Hu- 43
man Cell Atlas [Regev et al., 2017]. Towards 44
this end, initial consortia focus on specific or- 45
gans, for example the lung [Schiller et al., 46
2019]. 47
The availability of these reference atlases 48
has led to the active development of methods 49
that make use of them in the context of su- 50
pervised classification of cell types and states 51
[Lieberman et al., 2018, Srivastava et al., 52
2018, Cao et al., 2019b, DePasquale et al., 53
2019, Kanter et al., 2019, Sato et al., 2019, 54
Zhang et al., 2019a]. A field that serves as 55
a source of inspiration is flow/mass cytom- 56
etry, where several methods have addressed 57
the classification of high-dimensional cell type 58
data [Chester and Maecker, 2015, Weber and 59
Robinson, 2016, Saeys et al., 2016, Guilliams 60
et al., 2016]. Finally, as for benchmarking 61
methods that map cells of unknown type or 62
state onto reference atlases (see Section sec- 63
tion 6.2 for benchmarking in general), atlases 64
of model organisms where full lineages of cells 65
have been integrated can form the basis for 66
further developments [Spanjaard et al., 2018, 67
Plass et al., 2018, Fincher et al., 2018, Farrell 68
et al., 2018, Briggs et al., 2018].Importantly, 69
additional information available from lineage 70
tracing can provide a cross-check with respect 71
to the transcriptome-profile-based classifica- 72
tion [Briggs et al., 2018, Kester and van Oude- 73
naarden, 2018]. 74
3.3.2 Open problems 75
Cell atlases can still be considered under 76
active development, with several computa- 77
tional challenges still open, in particular re- 78
ferring to the fundamental themes from above 79
[Regev et al., 2017, Schiller et al., 2019, Hon 80
et al., 2018]. Here, we focus on the map- 81
ping of cells or rather their molecular profiles 82
onto stable existing reference atlases to fur- 83
15
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
organism scale of cell atlas citationnematodeCaenorhabditis ele-gans
whole organism at larval stageL2
[Cao et al., 2017]
planariaSchmidtea mediter-ranea
whole organism of the adult an-imal
[Fincher et al., 2018, Plass et al.,2018]
fruit flyDrosophilamelanogaster
whole organism at embryonicstage
[Karaiskos et al., 2017b]
Zebrafish whole organism at embryonicstage
[Farrell et al., 2018, Wagneret al., 2018a]
frogXenopus tropicalis
whole organism at embryonicstage
[Briggs et al., 2018]
Mouse whole adult brain [Rosenberg et al., 2018, Saunderset al., 2018, Zeisel et al., 2018]
Mouse whole adult organism [Tabula Muris Consortium, 2018,Han et al., 2018]
Table 1: Published cell atlases of whole tissues or whole organisms.
ther highlight the importance of these fun-1
damental themes. A computationally and2
statistically sound method for mapping cells3
onto atlases for a range of conceivable re-4
search questions will need to: (i) enable op-5
eration at various levels of resolution of inter-6
est, and also cover continuous, transient cell7
states (see section 2.1); (ii) quantify the un-8
certainty of a particular mapping of cells of9
unknown type/state (see section 2.2); (iii) to10
scale to ever more cells and broader cover-11
age of types and states (see section 2.3), and12
(iv) to eventually integrate information gen-13
erated not only through scRNA-seq experi-14
ments, but also through other types of mea-15
surements, for example scDNA-seq or protein16
expression data (see below in section 6.1 for a17
discussion of data integration, especially data18
integration Approaches 4 and 5 in section 6.1,19
Figure 6 and Table 2).20
3.4 Challenge IV: Generalizing 21
trajectory inference 22
Several biological processes, such as differen- 23
tiation, immune response or cancer expansion 24
can be described and represented as continu- 25
ous dynamic changes in cell type/state space 26
using tree, graphical or probabilistic mod- 27
els. A potential path that a cell can undergo 28
in this continuous space is often referred to 29
as a trajectory (Trapnell et al. [2014] and 30
Figure 1), and the ordering induced by this 31
path is referred to as pseudotime. Several 32
models have been proposed to describe cell 33
state dynamics, starting from transcriptomic 34
data [Saelens et al., 2019]. Trajectory infer- 35
ence is in principle not limited to transcrip- 36
tomics. Nevertheless, modeling of other mea- 37
surements, such as proteomic, metabolomic, 38
and epigenomic, or even integrating multiple 39
types of data (see section 6.1), is still at its 40
infancy. We believe the study of complex tra- 41
16
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
jectories integrating different data-types es-1
pecially epigenetics and proteomics informa-2
tion in addition to transcriptomics data will3
lead to a more systematic understanding of4
the processes determining cell fate.5
3.4.1 Status6
More than sixty trajectory methods have7
been proposed for trajectory inference from8
transcriptomic data using snapshot data at9
single or multiple time points [Saelens et al.,10
2019]. Briefly, those methods start from a11
count matrix where genes are rows and cells12
are columns. First, a feature selection or di-13
mensionality reduction step is used to explore14
a subspace where distances between cells are15
more reliable. Next, clustering and minimum16
spanning trees [Trapnell et al., 2014, Ji and17
Ji, 2016], principal curve or graph fitting [Qiu18
et al., 2017, Chen et al., 2019, Rizvi et al.,19
2017], or random walks and diffusion opera-20
tions on graphs (Haghverdi et al. [2016], Setty21
et al. [2016] among others) are used to in-22
fer pseudotime and/or branching trajectories.23
Alternative probabilistic descriptions can be24
obtained using optimal transport analysis25
[Schiebinger et al., 2017] or approximation of26
the Fokker-Planck equations [Weinreb et al.,27
2018] or by estimating pseudotime through di-28
mensionality reduction with a Gaussian pro-29
cess latent variable model [Campbell and Yau,30
2016, Reid and Wernisch, 2016, Ahmed et al.,31
2019].32
3.4.2 Open problems33
Potentially, many of the above-mentioned34
methods for trajectory inference can be35
extended to data obtained with non-36
transcriptomic assays. Thereby, the follow-37
ing aspects are crucial. First, it is necessary38
to define the features to use; while for tran-39
scriptomic data the features are well anno-40
tated and correspond to expression levels of 41
genes, clear-cut features are harder to deter- 42
mine for data such as methylation profiles and 43
chromatin accessibility where signals can re- 44
fer to individual genomic sites, but also be 45
pooled over sequence features or sequence re- 46
gions. Second, many of those recent technolo- 47
gies only allow measurement of a quite lim- 48
ited number of cells compared to transcrip- 49
tomic assays where millions of cells can be 50
profiled using droplet-based platforms [Ma- 51
cosko et al., 2015, Klein et al., 2015, Zheng 52
et al., 2017]. Third, some of those measure- 53
ments are technically challenging since the in- 54
put material for each cell is limited (for exam- 55
ple two copies of each chromosome for methy- 56
lation or chromatin accessibility), giving rise 57
to more sparsity than scRNA-seq. In the 58
latter case it is necessary to define distance 59
or similarity metrics that take this problem 60
into account. An alternative approach con- 61
sists of pooling/combining information from 62
several cells or data imputation. For ex- 63
ample, imputation has been used for single- 64
cell DNA methylation [Angermueller et al., 65
2017], aggregation over chromatin accessibil- 66
ity peaks from bulk or pseudo-bulk sample 67
[Cusanovich et al., 2018], and k-mer-based 68
approaches have been proposed [Buenrostro 69
et al., 2018, de Boer and Regev, 2018, Chen 70
et al., 2019]. However, so far, no systematic 71
evaluation (see section 6.2) of those choices 72
has been performed and it is not clear how 73
many cells are necessary to reliably define 74
those features. 75
A pressing challenge is to assess how the 76
different trajectory inference methods per- 77
form on different data types and importantly 78
to define metrics that are suitable. Also, it 79
is necessary to reason on the ground truth or 80
propose reasonable surrogates (e.g. previous 81
knowledge about developmental processes). 82
Some recent papers explore this idea using 83
scATAC-seq data, an assay to measure chro- 84
17
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
matin accessibility [Buenrostro et al., 2018,1
Chen et al., 2019, Pliner et al., 2018].2
Having defined robust methods to recon-3
struct trajectories from each data type, an-4
other future challenge is related to their com-5
parison or alignment. Here, some ideas from6
recent methods used to align transcriptomic7
datasets may be extended [Butler et al.,8
2018b, Haghverdi et al., 2018, Welch et al.,9
2018]. A related unsolved problem is that10
of comparing different trajectories obtained11
from the same data type but across individu-12
als or conditions to highlight unique and com-13
mon aspects.14
3.5 Challenge V: Finding15
patterns in spatially resolved16
measurements17
Single-cell spatial transcriptomics or pro-18
teomics [Crosetto et al., 2015, Strell et al.,19
2018, Moffitt et al., 2018] technologies can20
obtain transcript abundance measurements21
while retaining spatial coordinates of cells or22
even transcripts within a tissue (this can be23
seen as an additional feature space to inte-24
grate, see Approach 3 in section 6.1, Figure 625
and Table 2). With such data, the question26
arises of how spatial information can best be27
leveraged to find patterns, infer cell types or28
functions and classify cells in a given tissue29
[Tanay and Regev, 2017].30
3.5.1 Status31
Experimental approaches have been tailored32
to either systematically extract foci of cells33
and analyze them with scRNA-seq, or to mea-34
sure RNA and proteins in-situ. Histological35
sections can be projected in two dimensions36
while preserving spatial information using se-37
quencing arrays [Ståhl et al., 2016]. Whole38
tissues can be decomposed using the Niche-39
seq approach [Medaglia et al., 2017]: here40
a group of cells are specifically labeled with 41
a fluorescent signal, sorted and subjected to 42
scRNA-seq. The Slide-seq approach uses an 43
array of Drop-seq drops with known barcodes 44
to dissolve corresponding slide sites and se- 45
quence them with the respective barcodes 46
[Rodriques et al., 2019]. Ultimately, one 47
would like to sequence inside a tissue with- 48
out dissociating the cells and without compro- 49
mising on the unbiased nature of scRNA-seq. 50
A preliminary approach has been proposed 51
by Lee et al. [2015] coined FISSEQ (Fluo- 52
rescent in-situ sequencing). Lubeck et al. 53
[2014] have shown a first approach to itera- 54
tively apply fluorescence in-situ hybridization 55
to measure hundreds of RNA species simulta- 56
neously, called seqFISH. SeqFISH+ scales the 57
FISH barcoding strategy to 10,000 genes by 58
splitting each of four barcode locations to be 59
scanned into 20 separate readings to avoid sig- 60
nal crowding [Eng et al., 2019]. Based on a 61
related principle, MERFISH was proposed by 62
Chen et al. [2015], which enables to measure 63
hundreds to thousands of transcripts in sin- 64
gle cells simultaneously while retaining spa- 65
tial coordinates [Moffitt et al., 2016]. Here, 66
even the subcellular coordinates of each in- 67
dividual transcript are retained. In addition 68
to the methods that provide in-situ measure- 69
ments of RNA, Giesen et al. [2014] and An- 70
gelo et al. [2014] use mass cytometry tech- 71
nology to quantify the abundance of proteins 72
while preserving subcellular resolution. Fi- 73
nally, the recently described Digital Spatial 74
Profiling [DSP, Merritt et al., 2019, Van and 75
Blank, 2019] promises to provide both RNA 76
and protein measurements with spatial reso- 77
lution. 78
For determining cell types, or clustering 79
cells into groups that conduct a common func- 80
tion, several methods are available [Zhang 81
et al., 2019a, Kiselev et al., 2018, Butler et al., 82
2018b]. None of these currently directly use 83
spatial information. In contrast, spatial cor- 84
18
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
relation methods have been used to detect1
aggregation of proteins [Shivanandan et al.,2
2016]. Shah et al. [2016] use seqFISH to mea-3
sure transcript abundance of a set of marker4
genes while retaining the spatial coordinates5
of the cells. Cells are clustered by gene ex-6
pression profiles and then assigned to regions7
in the brain based on their coordinates in the8
sample. Recently, Edsgärd et al. [2018] pre-9
sented a method to detect spatial differential10
expression patterns per gene based on marked11
point processes [Jacobsen, 2005]. Svensson12
et al. [2018a] provided a method to perform a13
spatially resolved differential expression anal-14
ysis. Here, spatial dependence for each gene15
is learned by non-parametric regression, en-16
abling the testing of the statistical signifi-17
cance for a gene to be differentially expressed18
in space.19
3.5.2 Open problems20
The central problem is to consider gene or21
transcript expression and spatial coordinates22
of cells, and derive an assignment of cells23
to classes, functional groups or cell types.24
While methods for both assigning cell types25
or functional groups and spatially resolved26
gene expression analysis are present, there is27
currently no method available that combines28
the two by leveraging information from spa-29
tial localization to determine the cell type or30
find groups of cells that conduct a common31
function. Depending on the studied biolog-32
ical question, it can be useful to constrain33
assignments with expectations on the homo-34
geneity of the tissue. For example, a set of35
cells grouped together might be required to36
appear in one or multiple clusters where lit-37
tle to no other cells are present. Such con-38
straints might depend on the investigated cell39
types or tissues. For example, in cancer, spa-40
tial patterns can occur on multiple scales,41
ranging from single infiltrating immune cells42
[Fridman et al., 2011] and minor subclones 43
[Swanton, 2012] to larger subclonal structures 44
or the embedding in surrounding normal tis- 45
sue and the tumor microenvironment [Cretu 46
and Brooks, 2007]. Currently, to the best of 47
our knowledge, there is no method available 48
that would allow the encoding of such prior 49
knowledge while inferring cell types by inte- 50
grating spatial information with transcript or 51
gene expression. Another important aspect 52
when modeling the relation between space 53
and expression is whether uncertainty in the 54
measurements can be propagated to down- 55
stream analyses. For example, it is desir- 56
able to rely on transcript quantification meth- 57
ods that provide the posterior distribution 58
of transcript expression [Kharchenko et al., 59
2014, Köster et al., 2017] and propagate this 60
information to the spatial analysis. Finally, 61
in light of issues with sparsity in single-cell 62
measurements (section 3.1), it appears desir- 63
able to integrate spatial information into the 64
quantification itself, and e.g. use neighboring 65
cells within the same tissue for imputation 66
or the inference of a posterior distribution of 67
transcript expression. 68
4 Challenges in single-cell 69
genomics 70
With every cell division in an organism, the 71
genome can be altered through mutational 72
events ranging from point mutations, over 73
short insertions and deletions, to large scale 74
copy number variation and complex struc- 75
tural variants. In cancer, the entire reper- 76
toire of these genetic events can occur during 77
disease progression (Figure 4). The resulting 78
tumor cell populations are highly heteroge- 79
neous. As tumor heterogeneity can predict 80
patient survival and response to therapy, in- 81
cluding immunotherapy, quantifying this het- 82
19
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
erogeneity and understanding its dynamics1
are crucial for improving diagnosis and ther-2
apeutic choices (Figure 4).3
Classic bulk sequencing data of tumor sam-4
ples taken during surgery are always a mix-5
ture of tumor and normal cells (including6
e.g. invading immune cells). This means7
that disentangling mutational profiles of tu-8
mor subclones will always be challenging,9
which especially holds for rare subclones that10
could nevertheless be the ones e.g. bearing11
resistance mutation combinations prior to a12
treatment (Figure 4). Here, the sequencing13
of (sufficient) single cells holds the exciting14
promise of directly identifying and character-15
izing those subclone profiles (Figure 4).16
4.1 Challenge VI: Improving17
single-cell DNA sequencing18
data quality and scaling to19
more cells20
Despite accumulating technological advances21
in the field, the task of characterizing tumor22
heterogeneity and inferring the evolutionary23
mechanisms that give rise to this heterogene-24
ity is still hampered by multiple types of er-25
rors that occur during the process of scDNA-26
seq [Wang and Song, 2017, Hou et al., 2015,27
Gawad et al., 2016, Estévez-Gómez et al.,28
2018]. DNA sequencing technologies differ in29
their protocols of single-cell isolation and ly-30
sis, whole genome amplification (WGA), and31
library preparation [Zhang et al., 2016]. Fail-32
ure of cell isolation leads to the presence—33
albeit usually in a small proportion—of dou-34
blets instead of single cells and the cell lysis35
step can introduce artificial sequence modifi-36
cation. The main source of error, however,37
is the WGA step. Single cells only carry38
two (in case of normal cells) up to tens (in39
amplified regions of disease cells) of copies40
of DNA molecules, which need to be sub-41
stantially amplified from pico to nanogram 42
scale to read their sequence. Amplification- 43
related artifacts include i) amplification er- 44
rors, i.e. sequence alterations such as single 45
nucleotide or indel errors introduced by the 46
polymerase in the copy process, ii) allelic bias, 47
i.e. the differential amplification of the alle- 48
les at a genomic locus (if one allele fails to 49
amplify at all, this is an allele dropout, if 50
both fail, a locus dropout), iii) chimeric se- 51
quences. The majority of WGA approaches 52
can be broadly classified into methods based 53
on polymerase chain reaction (PCR) and 54
multiple displacement amplification (MDA). 55
The PCR-based technologies include degener- 56
ate oligonucleotide-primed PCR (DOP-PCR) 57
[Telenius et al., 1992], linker-adapter PCR 58
[Klein et al., 1999], primer extension pre- 59
amplification PCR (PEP-PCR-/I-PEP-PCR) 60
[Zhang et al., 1992, Arneson et al., 2008] 61
and others. They require thermostable poly- 62
merases that withstand all temperatures dur- 63
ing the cycling. More recent MDA-based 64
technologies use the strand-displacing, high- 65
fidelity Φ29 DNA polymerase [Blanco et al., 66
1989, Dean et al., 2002, Spits et al., 2006b, 67
Picher et al., 2016, Paez et al., 2004, Spits 68
et al., 2006a] for an isothermal reaction, as 69
it is not stable at common PCR temperature 70
maxima. Another approach, called multiple 71
annealing and looping-based amplification cy- 72
cles (MALBAC) combines MDA and PCR, 73
and relies on the Bacillus stearothermophilus 74
polymerase for the MDA process [Zong et al., 75
2012]. 76
4.1.1 Status 77
Ideally, scDNA-seq should provide informa- 78
tion about the entire repertoire of distinct 79
events that occurred in the genome of a single 80
cell, such as copy number alterations, genomic 81
rearrangements, together with SNVs and 82
smaller insertion and deletion variants. How- 83
20
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
subclonecompe��on
immunedetec�on
tumourresec�on
therapy
metastasisresec�on
selec�onpressures
biopsy
�mespace
small muta�on(SNV / indel)
amplifica�on
large dele�on
sampled cell theore�calsubclone phylogeny
Figure 4: From initiation of a tumor to its detection, resection and possible metastasis, itwill evolve somatically. New genomic mutations can confer a selective advantage to theresulting new subclone, that can allow it to outcompete other tumor subclones (subclonecompetition). At the same time, the acting selection pressures can change over time, e.g. dueto new subclones arising, the immune system detecting certain subclones, or as a result oftherapy. Understanding such selective regimes—and how specific mutations alter a subclone’ssusceptibility to changes in selection pressures—will help construct an evolutionary model oftumorigenesis. And it is only within this evolutionary model, that more efficient and morepatient-specific treatments can be developed. For such a model, unambiguously identifyingmutation profiles of subclones via scDNA-seq of resected or biopsied single cells is crucial.
21
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
ever, amplification biases and errors present a1
serious challenge to variant calling [de Bourcy2
et al., 2014, Hou et al., 2015, Huang et al.,3
2015, Estévez-Gómez et al., 2018]: It is4
broadly accepted that different WGA tech-5
nologies should be used depending on whether6
SNVs or whether copy number variation7
(CNV)s are to be detected, as the distinct8
technologies differ in the magnitude of ampli-9
fication bias, and the rates of amplification er-10
rors and chimera formation. Generally, PCR-11
based approaches with more uniform coverage12
should be used for CNV calling, while MDA-13
based methods that result in less single nu-14
cleotide errors should be applied for SNV call-15
ing. The goal must thus be to (i) improve the16
coverage uniformity of MDA-based methods,17
(ii) reduce the error rate of the PCR-based18
methods, or (iii) create new methods that ex-19
hibit both a low error rate and a more uni-20
form amplification of alleles. Recent years21
witnessed intensive research in these direc-22
tions, e.g.: (i) Improved coverage uniformity23
for MDA has been achieved using droplet mi-24
crofluidics-based methods, resulting in emul-25
sion WGA (eWGA, [Fu et al., 2015]), sin-26
gle droplet MDA (sd-MDA, [Hosokawa et al.,27
2017]) and digital droplet multiple displace-28
ment amplification (ddMDA, [Sidore et al.,29
2016]). A second approach has been to cou-30
ple the Φ29 DNA polymerase to a primase31
to reduce priming bias [Picher et al., 2016].32
Both these approaches improve the calling33
of CNVs from the resulting data. (ii) One34
way to reduce the amplification error rate35
of the PCR-based methods (including MAL-36
BAC) would be to employ a thermostable37
polymerase (necessary for use in PCR) with38
proof-reading activity similar to Φ29 DNA39
polymerase. While SD polymerase combines40
thermostability with strand displacement and41
has been tested for WGA [Blagodatskikh42
et al., 2017], we are not aware of any PCR43
DNA polymerases with a fidelity in the range44
of Φ29 DNA polymerase [Potapov and Ong, 45
2017] having been used in PCR-based WGA. 46
(iii) Three newer methods use an entirely dif- 47
ferent approach: They randomly insert trans- 48
posons into the whole genome and then lever- 49
age these as priming sites for library prepara- 50
tion and amplification. Direct library prepa- 51
ration (DLP, [Zahn et al., 2017a]), as the 52
name suggests, directly sequences the result- 53
ing shallow library without any amplification, 54
allowing only for CNV calling. It has re- 55
cently been further improved to account for 56
doublets and dead cells and scaled to 80,000 57
single cells [Laks et al., 2018]. Transposon 58
Barcoded (TnBC) follows the transposon in- 59
tegration with PCR amplification, making it 60
useful for CNV calling, but suffering from am- 61
plification errors [Xi et al., 2017]. Finally, 62
Linear Amplification via Transposon Inser- 63
tion (LIANTI, [Chen et al., 2017]) introduces 64
a new approach to dealing with amplifica- 65
tion errors. Instead of exponential amplifi- 66
cation, their amplification process is linear: 67
From promoters included in the transposon 68
insertion, they transcribe the original tagged 69
sequence multiple times and then use reverse 70
transcription and second-strand synthesis to 71
obtain double-stranded DNA for sequencing. 72
As errors introduced by the individual pro- 73
cesses are not propagated, they should be 74
unique to individual copies and accordingly 75
the authors report a false positive rate that is 76
even lower than for MDA [Chen et al., 2017]. 77
4.1.2 Open problems 78
These recent developments promise scalable 79
methodology for scDNA-seq comparable to 80
that already available for scRNA-seq, while 81
at the same time reducing previously limit- 82
ing errors and biases. In addition to fur- 83
ther improvements over the described exist- 84
ing methods, the major challenge will be to 85
continuously and systematically evaluate the 86
22
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
whole range of promising WGA methods for1
the identification of all types of genetic varia-2
tion from SNVs over smaller insertions and3
deletions up to copy number variation and4
structural variants.5
4.2 Challenge VII: Errors and6
missing data in the7
identification of features /8
variation from single-cell9
DNA sequencing data.10
The aim of scDNA sequencing usually is to11
track somatic evolution at the cellular level,12
that is, at the finest resolution possible rela-13
tive to the laws of reproduction (cell division,14
Figure 5). Examples refer to identifying het-15
erogeneity and tracking evolution in cancer,16
as the likely most predominant use case (also17
see below in section 5), but also to monitor-18
ing the interaction of somatic mutation with19
developmental and differentiation processes.20
To track genetic drifts, selective pressures, or21
other phenomena inherent to the development22
of cell clones or types (Figure 4)—but also to23
stratify cancer patients for the presence of re-24
sistant subclones—it is instrumental to geno-25
type and also phase genetic variants in single26
cells with sufficiently high confidence.27
The major disturbing factor in scDNA-seq28
data is the WGA process (see section 4.1).29
All methodologies introduce amplification er-30
rors (false positive alternative alleles), but31
more drastic is the effect of amplification bias:32
the insufficient or complete failure of am-33
plification, which leads to imbalanced pro-34
portions or complete lack of variant alleles.35
Overall, one can distinguish between three36
cases: (i) an imbalanced proportion of al-37
leles, i.e. loci harboring heterozygous muta-38
tions where preferential amplification of one39
of the two alleles leads to read counts that40
are distorted, sometimes heavily; (ii) allele41
10000
11000
11100
110101100? 100?01?101
Figure 5: Mutations (colored stars) accumu-late in cells during somatic cell divisionsand can be used to reconstruct the develop-mental lineages of individual cells within anorganism (leaf nodes of the tree with muta-tional presence / absence profiles attached).However, insufficient or unbalanced WGAcan lead to the dropout of one or both alle-les at a genomic site. This can be mitigatedby better amplification methods, but alsoby computational and statistical methodsthat can account for or impute the missingvalues.
drop-out, i.e. loci harboring heterozygous mu- 42
tations where only one of the alleles was am- 43
plified and sequenced, and (iii) site drop-out, 44
which is the complete failure of amplification 45
of both alleles at a site and the resulting lack 46
of any observation of a certain position of the 47
genome. Note that (ii) can be considered an 48
extreme case of (i). 49
A sound imputation of missing alleles and 50
a sufficiently accurate quantification of un- 51
certainties will yield massive improvements in 52
geno- and haplotyping (phasing) somatic vari- 53
ants. This, in turn, is necessary to substan- 54
tially improve the identification of subclonal 55
genotypes and the tracking of evolutionary 56
23
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
developments. Potential improvements in this1
area include (i) more explicit accounting for2
possible scDNA-seq error types, (ii) integrat-3
ing with different data types with error pro-4
files different from scDNA-seq (e.g. bulk se-5
quencing or RNA sequencing), or (iii) inte-6
grating further knowledge of the process of7
somatic evolution, such as the constraints of8
phylogenetic relationships among cells, into9
variant calling models. In this latter context,10
it is important to realize that somatic evolu-11
tion is asexual. Thus, no recombination oc-12
curs during mitosis, eliminating a major dis-13
turbing factor usually encountered when aim-14
ing to reconstruct species or population trees15
from germline mutation profiles.16
4.2.1 Status17
Current single-cell specific SNV callers in-18
clude Monovar [Zafar et al., 2016] and SC-19
caller [Dong et al., 2017]. SCcaller de-20
tects somatic variants independently for each21
cell, but accounts for local allelic amplifica-22
tion biases by integrating across neighboring23
germline single-nucleotide polymorphisms. It24
exploits the fact that allele drop-out af-25
fects contiguous regions of the genome large26
enough to harbor several, and not only one,27
heterozygous mutation loci. Monovar uses28
an orthogonal approach to variant calling. It29
does not assume any dependency across sites,30
but instead handles low and uneven cover-31
age and false positive alternative alleles by32
integrating the sequencing information across33
multiple cells. While Monovar merely creates34
a consensus across cells, integrating across35
cells is particularly powerful if further knowl-36
edge about the dependency structure among37
cells is incorporated. As pointed out above,38
due to the lack of recombination, any sam-39
ple of cells derived from an organism shares40
an evolutionary history that can be described41
by a cell lineage tree (see section 5). This42
tree, however, is in general unknown and can 43
in turn only be reconstructed from single-cell 44
mutation profiles. A possible solution is to 45
infer both mutation calls and a cell lineage 46
tree at the same time, an approach taken by 47
a number of existing tools: single-cell Geno- 48
typer [Roth et al., 2016], SciCloneFit [Zafar 49
et al., 2018] and SciΦ [Singer et al., 2018]. 50
Finally, SSrGE, identifies SNVs correlated 51
with gene expression from scRNA-seq data 52
[Poirion et al., 2018]. 53
54
Some basic approaches to CNV calling from 55
scDNA-seq data are available. These are usu- 56
ally based on hidden markov models (HMMs) 57
where the hidden variables correspond to copy 58
number states, as e.g. in Aneufinder [Bakker 59
et al., 2016]. Another tool, Ginkgo, pro- 60
vides interactive CNV detection using circu- 61
lar binary segmentation, but is only avail- 62
able as a web-based tool [Garvin et al., 2015]. 63
ScRNA-seq data, which does not suffer from 64
the errors and biases of WGA, can also be 65
used to call CNVs or loss of heterozygosity 66
events: an approach called HoneyBADGER 67
[Fan et al., 2018] utilizes a probabilistic hid- 68
den Markov model, whereas the R package 69
inferCNV simply averages the expression over 70
adjacent genes [Patel et al., 2014]. 71
4.2.2 Open problems 72
SNV callers for scDNA-seq data have al- 73
ready incorporated amplification error rates 74
and allele dropout in their models. But 75
beyond these rates, the challenge remains 76
to further extend this into a full statistical 77
modeling of the amplification process, that 78
would inherently account for both errors and 79
biases, and more accurately quantify the 80
resulting uncertainties (see section 2.2). This 81
could be achieved by expanding models that 82
accurately quantify uncertainties in related 83
settings [Köster et al., 2019] and would 84
24
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
ultimately even allow reliable control of false1
discovery rates in the variant discovery and2
genotyping process. Such expanded models3
can build on a number of recent studies4
in this context, e.g. on a formalization in5
a recent preprint [Koptagel et al., 2018].6
Furthermore, such models could integrate the7
structure of cell lineage trees with the struc-8
ture implicit in haplotypes that link alleles.9
For haplotype phasing, Satas and Raphael10
[2018] recently proposed an approach based11
on contiguous stretches of amplification bias12
(similar to SCcaller, see above), whereas13
others propose read-backed phasing in two14
recent studies [Bohrson et al., 2019, Hård15
et al., 2019]. In addition, the integration16
with deep bulk sequencing data, as well as17
with (sc)RNA-seq data remains unexplored,18
although it promises to improve the precision19
of callers without compromising sensitivity.20
21
Identification of short insertions and22
deletions (indels) is another major challenge23
to be addressed: we are not aware of any24
scDNA-seq variant callers with those respec-25
tive capabilities.26
27
For copy number variation calling, soft-28
ware has previously been published mostly in29
conjunction with data-driven studies. Here,30
a systematic analysis of biases in the most31
common WGA methods for copy number32
variation calling (including newer methods33
to come) could further inform method devel-34
opment. The already mentioned approach35
of leveraging amplification bias for phasing36
could also be informative [Satas and Raphael,37
2018].38
39
The final challenge is a systematic compar-40
ison of tools beyond the respective software41
publications, which is still lacking for both42
SNV and CNV callers. This requires system-43
atic benchmarks, which in turn require simu-44
lation tools to generate synthetic datasets, as 45
well as sample-based benchmarking datasets 46
with a reasonably reliable ground truth (see 47
section 6.2). 48
5 Challenges in single-cell 49
phylogenomics 50
Single-cell variant profiles from scDNA-seq, 51
as described above (section 4.2), can be used 52
in computational models of somatic evolution, 53
including cancer evolution as an important 54
special case (Figure 4). For cancer, there is an 55
on-going, lively discussion about the very na- 56
ture of evolutionary processes at play, with 57
competing theories such as linear, branch- 58
ing, neutral, and punctuated evolution [Davis 59
et al., 2017]. 60
Models of cancer evolution may range from 61
a simple binary representation of the pres- 62
ence versus the absence of a particular mu- 63
tational event (Figure 5), to elaborate models 64
of the mechanisms and rates of distinct muta- 65
tional events. There are two main modeling 66
approaches that lend themselves to the anal- 67
ysis of tumor evolution [Altrock et al., 2015]: 68
phylogenetics and population genetics. 69
Phylogenetics comes with a rich reper- 70
toire of computational methods for likelihood- 71
based inference of phylogenetic trees [Felsen- 72
stein, 1981]. Traditionally, these methods are 73
used to reconstruct the evolutionary history 74
of a set of distinct species. However, they can 75
also be applied to cancer cells or subclones 76
(Figure 4). In this setting, tips of the phy- 77
logeny (also called leaves or taxa) represent 78
sampled and sequenced cells or subclones, 79
whereas inner nodes (also called ancestral) 80
represent their hypothetical common ances- 81
tors. The input for a phylogenetic inference 82
commonly consists of a multiple sequence 83
alignment (MSA) of molecular sequences for 84
25
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
the species of interest. For cancer phyloge-1
nies, one would concatenate the SNVs (and2
possibly other variant types) to assemble the3
input MSA. The key challenge for phyloge-4
netic method development comprises design-5
ing sequence evolution models that are (i) bio-6
logically realistic and yet (ii) computationally7
tractable for the increasingly large number of8
sequenced cells per patient and study.9
In population genetics, the tumor is under-10
stood as a population of evolving cells (Fig-11
ure 4). To date, population genetic theory12
has been used to model the initiation, pro-13
gression and spread of tumors from bulk se-14
quencing data [Foo et al., 2011, Beerenwinkel15
et al., 2007, Haeno et al., 2012]. The general16
mathematical framework behind these mod-17
els are branching processes [Kimmel and Ax-18
elrod, 2015], e.g. in models of the accumula-19
tion of driver and passenger mutations [Bozic20
et al., 2016, 2010]. Here, the driver mutations21
carry a fitness advantage, as might epistatic22
interactions among them [Bauer et al., 2014].23
On the other hand, passenger mutations are24
assumed to be neutral regarding fitness; they25
merely hitchhike along the fitness advantage26
of driver mutations they are linked to via27
their haplotype. The parameters of popu-28
lation genetic models describe inherent fea-29
tures of individual cells that are relevant for30
the evolution of their populations, e.g. fit-31
ness and the rates of birth, death, and mu-32
tations. Such cell-specific parameters should33
more naturally apply to and be derived from34
information gathered by sequencing of indi-35
vidual cells, as opposed to sequencing of bulk36
tissue samples. Models using these param-37
eters and the information about the evolu-38
tionary dynamics of cancer they contain, will39
e.g. be essential in the design of adaptive can-40
cer treatment strategies that aim at manag-41
ing subclonal tumor composition [Acar et al.,42
2019, Zhang et al., 2017].43
5.1 Challenge VIII: Scaling 44
phylogenetic models to many 45
cells and many sites 46
Even if given perfect data, phylogenetic mod- 47
els of tumor evolution would still face the 48
challenge of computational tractability, which 49
is mainly induced by: (i) the increasing num- 50
bers of cells that are sequenced in cancer 51
studies (see section 2.3), and (ii) the increas- 52
ing numbers of sites that can be queried per 53
genome (also see section 2.3). 54
5.1.1 Open problems 55
(i) While adding data from more single cells 56
will help improve the resolution of tumor phy- 57
logenies [Graybeal, 1998, Pollock et al., 2002], 58
this exacerbates one of the main challenges 59
of phylogenetic inference in general: the im- 60
mense space of possible tree topologies that 61
grows super-exponentially with the number of 62
taxa—in our case the number of single cells. 63
Therefore, phylogenetic inference is NP-hard 64
[Roch, 2006] under most scoring criteria (a 65
scoring criterion takes a given tree and MSA 66
to calculate how well the tree explains the 67
observed data). Calculating the given score 68
on all possible trees to find the tree that 69
best explains the data is computationally not 70
feasible for MSAs containing more than ap- 71
proximately 20 single cells, and thus requires 72
heuristic approaches to explore only promis- 73
ing parts of the tree search space. 74
(ii) In addition to the growing number of 75
cells (taxa), the breadth of genomic sites and 76
genomic alterations that can be queried per 77
genome also increases. Classical approaches 78
thus need not only scale with the number of 79
single cells queried (see above), but also with 80
the length of the input MSA. Here, previ- 81
ous efforts for parallelization [Aberer et al., 82
2014, Ayres, 2017] and other optimisation ef- 83
forts [Ogilvie et al., 2017] exist and can be 84
26
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
built upon. The breadth of sequencing data1
also allows determination of large numbers of2
invariant sites, which further raises the ques-3
tion of whether including them will change4
results of phylogenetic inferences in the con-5
text of cancer. Excluding invariant sites from6
the inference has been coined ascertainment7
bias, and for phylogenetic analyses of closely8
related individuals from a few populations it9
has been shown that accounting for ascertain-10
ment bias alters branch lengths, but not the11
resulting tree topologies per se [Leaché et al.,12
2015].13
5.2 Challenge IX: Integrating14
multiple types of features /15
variation into phylogenetic16
models17
Naturally, downstream analyses—like charac-18
terizing intratumor heterogeneity and infer-19
ring its evolutionary history—suffer from the20
unreliable variant detection in single cells.21
The better the quality of the variant calls22
gets, however, the more important it be-23
comes to model all types of available signal24
in mathematical models of tumor evolution,25
with the goal of increasing the resolution and26
reliability of the resulting trees; from SNVs,27
over smaller insertions and deletions, to large28
structural variation and CNVs (Figure 4). Fi-29
nally, to model somatic phylogenies compre-30
hensively, all available types of variants will31
have to be integrated into a comprehensive32
model. In the context of cancer, with ge-33
nomic destabilization occurring, this will be34
especially challenging.35
5.2.1 Status36
For phylogenetic tree inference from SNVs of37
single cells, a considerable number of tools38
exist. The early tools OncoNEM [Ross and39
Markowetz, 2016] and SCITE [Jahn et al., 40
2016] use a binary representation of presence 41
or absence of a particular SNV. They account 42
for false negatives, false positives and missing 43
information in SNV calls, where false neg- 44
atives are orders of magnitude more likely 45
to occur than false positives. The more re- 46
cent tool SiFit [Zafar et al., 2017] also uses 47
a binary SNV representation, but infers tu- 48
mor phylogenies allowing for both noise in 49
the calls and for violations of the infinite 50
sites assumption. Another approach allow- 51
ing for violations of the infinite sites assump- 52
tion is the extension of the Dollo parsimony 53
model to allow for k losses of a mutation 54
(Dollo-k) [El-Kebir, 2018, Ciccolella et al., 55
2018]. Single cell genotyper [Roth et al., 56
2016], SciCloneFit [Zafar et al., 2018], or SciΦ 57
[Singer et al., 2018] jointly call mutations in 58
individual cells and estimate the tumor phy- 59
logeny of these cells, directly from single-cell 60
raw sequencing data. In a recent work [Ko- 61
zlov, 2018], a standard phylogenetic infer- 62
ence tool RAxML-NG [Kozlov et al., 2019] 63
has been extended to handle single-cell SNV 64
data. In particular, this implements (i) a 10-s- 65
tate substitution model to represent all pos- 66
sible unphased diploid genotypes and (ii) an 67
explicit error model for allelic dropout and 68
genotyping/amplification errors. Initial ex- 69
periments showed that—although a 10-state 70
model incorporates more information—it out- 71
performed the ternary model (as used by 72
SiFit) only slightly and only in simulations 73
with very high error rates (10%-50%). How- 74
ever, further analysis suggests that benefits of 75
the genotype model become much more pro- 76
nounced with an increasing number of cells 77
and, in particular, an increasing number of 78
SNVs (Kozlov, personal communication). 79
While there are no tools yet available to 80
identify insertions and deletions from scDNA- 81
seq (see challenge above), it is only a matter 82
of time until such callers will become avail- 83
27
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
able. As they can already be identified from1
bulk sequencing data, some precious efforts2
to incorporate indels in addition to substitu-3
tions into classical phylogenetic models exist:4
A decade ago, a simple probabilistic model5
of indel evolution was proposed [Rivas and6
Eddy, 2008]. But although some progress7
has been made since then, such models are8
less tractable than the respective substitution9
models [Holmes, 2017].10
Incorporating CNVs in the reconstruction11
of tumor phylogeny can be helpful for un-12
derstanding tumor progressions, as they rep-13
resent one of the most common mutation14
types associated to tumor hypermutability15
[Kim et al., 2013]. CNVs in single cells were16
extensively studied in the context of tumor17
evolution and clonal dynamics [Navin et al.,18
2011, Eirew et al., 2015]. Reconstructing a19
phylogeny with CNVs is not straightforward.20
The challenges are not only related to ex-21
perimental limits, such as the complexity of22
bulk sequencing data [Zaccaria et al., 2017]23
and amplification biases [Gawad et al., 2016],24
but also involve computational constraints.25
First of all, the causal mechanisms, such as26
breakage-fusion-bridge cycles [Bignell et al.,27
2007] and chromosome missegregation [San-28
taguida et al., 2017], can lead to overlapping29
copy number events [Schwarz et al., 2014].30
Secondly, inferring a phylogeny with CNV31
data requires quantifying transition proba-32
bilities for changes in copy numbers based33
on the causal mechanisms. Towards that34
goal, approaches to calculate the distance be-35
tween whole copy number profiles [Zeira and36
Shamir, 2018] are a first step. But for them,37
a number of challenges remain, with several38
of the underlying problems known to be NP-39
hard [Zeira and Shamir, 2018].40
Co-occurrence of all of the above variation41
types further complicates mathematical mod-42
eling, as these events are not independent.43
For example, multiple SNVs that occurred in44
the process of tumor evolution may disappear 45
at once via a deletion of a large genomic 46
region. In addition, recent analyses revealed 47
recurrence and loss of particular mutational 48
hits at specific sites in the life histories of 49
tumors [Kuipers et al., 2017], undermining 50
the validity of the so called infinite sites 51
assumption, commonly made by phylogenetic 52
models: it assumes an infinite number of 53
genomic sites, thus rendering a repeated 54
mutational hit of the same genomic site along 55
a phylogeny impossible. 56
57
5.2.2 Open problems 58
For phylogenetic reconstruction from SNVs, 59
we anticipate a shift towards leveraging im- 60
provements in input data quality as they are 61
achieved through better amplification meth- 62
ods and SNV callers (see challenges above). 63
For indels, variant callers for scDNA-seq data 64
remain to be developed (see challenge above), 65
but are anticipated. Thus, indel modeling 66
efforts for phylogenetic reconstruction from 67
bulk sequencing data should be adapted. For 68
phylogenetic inference from CNVs, the ma- 69
jor challenges are (i) determining correct mu- 70
tational profiles and (ii) computing realis- 71
tic transition probabilities between those pro- 72
files. 73
The final challenge will be to incorporate all 74
of the above phenomena into a holistic model 75
of cancer evolution. However, this will sub- 76
stantially increase the computational cost of 77
reconstructing the evolutionary history of tu- 78
mor cells. Thus, one needs to carefully de- 79
termine which phenomena actually do mat- 80
ter (e.g. which parameters even affect the fi- 81
nal tree topology) and which features can be 82
measured (section 4.1) and called (section 4.2) 83
with sufficient accuracy to actually improve 84
modeling results. As a consequence one might 85
be able to devise more lightweight models for 86
28
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
answering specific questions and invest con-1
siderable effort into optimizing novel tools at2
the algorithmic and technical level (see chal-3
lenge below).4
5.3 Challenge X: Inferring5
population genetic6
parameters of tumor7
heterogeneity by model8
integration9
Tumor heterogeneity is the result of an evo-10
lutionary journey of tumor cell populations11
through both time and space [Swanton, 2012,12
McGranahan and Swanton, 2017]. Microen-13
vironmental factors like access to the vascular14
system and infiltration with immune cells dif-15
fer greatly—for regions within the original tu-16
mor as well as between the main tumor and17
metastases, and across different time points18
[Yang and Lin, 2017]. This imposes different19
selective pressures on different tumor cells,20
driving the formation of tumor subclones and21
thus determining disease progression (includ-22
ing metastatic potential), patient outcome23
and susceptibility to treatment (Junttila and24
de Sauvage [2013], Corredor et al. [2018] and25
Figure 4). However, even the answers to very26
basic questions about the resulting dynam-27
ics remain unanswered [Turajlic and Swanton,28
2016]: for example, whether metastatic seed-29
ing from the primary tumor occurs early and30
multiple times in parallel, with metastases di-31
verging genetically from the primary tumor,32
or whether seeding of metastases occurs late,33
from a far-developed subclone in the primary34
tumor, with that subclone seeding multiple35
locations with a genotype closer to the late-36
stage primary tumor; and whether a single37
cell can seed a metastasis, or whether the joint38
migration of a set of cells is required. Here, sc-39
seq can provide invaluable resolution [Navin40
et al., 2011].41
Although many mathematical models of 42
tumor evolution have been proposed [Bozic 43
et al., 2010, 2016, Altrock et al., 2015, Foo 44
et al., 2011, Michor et al., 2004], fundamen- 45
tal parameters characterizing the evolution- 46
ary processes remain elusive. To quantita- 47
tively describe the tumor evolution process 48
and evaluate different possible modes against 49
each other (e.g. modes of metastatic seeding), 50
we would like to estimate fitness values of 51
individual mutations and mutation combina- 52
tions, as well as rates of mutation, cell birth 53
and cell death—if possible, on the level of sub- 54
clones. These parameters determine the un- 55
derlying fitness landscape of individual cells 56
within their microenvironment, which in turn 57
determines the evolutionary dynamics of can- 58
cer progression. 59
5.3.1 Status 60
Recent technological advances already allow 61
for measuring the arrangement and relation- 62
ships of tumor cells in space, with cell loca- 63
tion basically amounting to a second measure- 64
ment type requiring data integration within 65
a cell (Approach 3 in section 6.1, Figure 6 66
and Table 2). While in vivo imaging tech- 67
niques might also become interesting for ob- 68
taining time series data in the future [Larue 69
et al., 2017], the automated analysis of whole 70
slide immunohistochemistry images [Ghaz- 71
navi et al., 2013, Saco et al., 2016] seems 72
the most promising in the context of cancer 73
and mutational profiles from scDNA-seq. It 74
is already amenable to single-cell extraction 75
of characterized cells with known spatial con- 76
text and subsequent scDNA-seq. Using laser 77
capture microdissection [Datta et al., 2015] 78
hundreds of single cells have recently been 79
isolated from tissue sections and analyzed 80
for copy number variation [Casasent et al., 81
2018]. For cell and tissue characterization in 82
immunohistochemical images, machine learn- 83
29
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
ing models are trained to segment the im-1
ages and recognize structures within tissues2
and cells [Gurcan et al., 2009, Irshad et al.,3
2014, Komura and Ishikawa, 2018]: They can4
e.g. determine the densities and quantities5
of mitotic nuclei, vascular invasion, immune6
cell infiltration on the tissue level, as well as7
stained biomarkers on the level of the individ-8
ual cell. These are key parameters of the tu-9
mor microenvironment, characterizing the in-10
teraction tumor cells with their environment11
in space [Yuan, 2016, Heindl et al., 2015].12
Mathematical models of tumor popula-13
tion genetics have classically assumed well14
mixed populations, ignoring any spatial struc-15
ture, let alone evolutionary microenviron-16
ments. Recently, methods have been ex-17
tended to account for some spatial structure18
and have already led to refined predictions of19
the waiting time to cancer [Martens et al.,20
2011] and intratumor heterogeneity [Waclaw21
et al., 2015]. In particular, spatial statistics22
has been proposed for the quantitative sta-23
tistical analysis of cancer digital pathology24
imaging [Heindl et al., 2015], but the idea25
is applicable to other spatially resolved read-26
outs. A number of methods were proposed27
to model cell-cell interactions [Schapiro et al.,28
2017, Arnol et al., 2018] or to predict single-29
cell expression from microenvironmental fea-30
tures [Goltsev et al., 2018, Battich et al.,31
2015]. With the advent of spatially resolved32
DNA sequencing, models can be adapted to33
the new data.34
Regarding temporal resolution, it is already35
common to sequence tumor material from dif-36
ferent timepoints: biopsies used for diagnosis,37
resected tumors, lymph nodes and metastases38
upon surgery and tumors after relapse. These39
time-points already lend themselves to tem-40
poral analyses of clonal dynamics using bulk41
DNA sequencing data [Johnson et al., 2014].42
But scDNA-seq will help to increase the res-43
olution of subclonal genotypes. And inte-44
grating this clonal stratification across time- 45
points and with other readouts, such as cell 46
state markers, will allow to determine central 47
model parameters for the detection of positive 48
and negative selection, e.g. rates of prolifera- 49
tion, mutation and death. 50
To also leverage the kinship relationships 51
between cells, population genetic methods 52
and models could be integrated with ap- 53
proaches from phylogenetics. One prominent 54
example of this recent trend is the use of 55
the multi-species coalescent model for ana- 56
lyzing MSAs that contain several individuals 57
for several populations [Rannala and Yang, 58
2017, Liu et al., 2015]. This naturally trans- 59
lates into analyzing tumor subclones as pop- 60
ulations of single cells, capturing some of the 61
population structure seen in cancers. This 62
phylogenetic context also lends itself to mod- 63
eling differences in mutational rates and sig- 64
natures between different cell populations, 65
e.g. between normal somatic evolution before 66
tumor initiation and cancer evolution after 67
tumor initiation, or between different tumor 68
subclones. 69
In this setting, we will have to account 70
for heterotachy (see e.g. Kolaczkowski and 71
Thornton [2008]), that is, we cannot assume 72
a single model of substitution for the entire 73
tree, but have to allow different models to act 74
on distinct branches or subtrees/subclones. 75
Here, anything from a simple model of rate 76
heterogeneity (e.g. Yang [1994]) to an empir- 77
ical mixture model as used for protein evolu- 78
tion [Le et al., 2012] could be considered. 79
A recent example integrating population 80
genetics approaches with phylogenetics, is a 81
computational model for inference of fitness 82
landscapes of cancer clone populations using 83
scDNA-seq data, SCIFIL [Skums et al., 84
2019]. It estimates the maximum likelihood 85
fitness of clone variants by fitting a replicator 86
equation model onto a character-based tumor 87
phylogeny. 88
30
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
1
For the detection of positive selection, a2
number of phylogenetic and population ge-3
netic approaches have been proposed. Phy-4
logenetic trees may be used for detecting5
branches on which positive [Zhang et al.,6
2005] or diversifying episodic selection [Smith7
et al., 2015] is acting. The tests from the8
area of “classic” phylogenetics might serve as9
a starting point for exploring and adapting10
appropriate methods that will allow to asso-11
ciate positive selection events to branches of12
the tumor tree or specific evolutionary events.13
Evolutionary pressures are often quantified by14
the dN/dS ratio of non-synonymous and syn-15
onymous substitutions. In application to tu-16
mor cell populations, however, this ratio may17
not be applicable, as it has been shown to be18
relatively insensitive when applied to popula-19
tions within the same species [Kryazhimskiy20
and Plotkin, 2008]. Other measures have been21
proposed as better suited for detecting selec-22
tion within populations based on time-series23
data and could potentially be transferred to24
tumor cell populations [Neher et al., 2014,25
Gray et al., 2011, Steinbrück and McHardy,26
2011]. An open question is to which extent27
the above tests will be sensitive to errors in28
cancer data as they are known to produce29
high false positive rates in the classic phyloge-30
netic setting if the error rate in the input data31
is too high [Fletcher and Yang, 2010]. Com-32
putationally intense solutions for decreasing33
the high false positive rate have been pro-34
posed [Redelings, 2014], but they might not35
scale to cancer datasets. Importantly, devel-36
opment of tests for positive selection could37
contribute to the discussion of whether the38
evolution of tumors is driven by selection or39
neutral.40
For the detection of negative selection, time41
resolved measurements and resulting prolif-42
eration and death rates could prove equally43
promising. Further, approaches were de-44
veloped to discover epistatic interactions— 45
particularly synthetic lethality—from ge- 46
nomic and transcriptomic data in tumor 47
genomes and cancer cell lines [Szczurek et al., 48
2013, Jerby-Arnon et al., 2014], and patient 49
survival [Matlak and Szczurek, 2017]. Some of 50
these epistatic interactions, however, can be 51
hard to spot in bulk sequencing data, as they 52
may simply disappear because of a low fre- 53
quency. ScDNA-seq, ideally in a time resolved 54
fashion and across individuals, provides much 55
more insight into epistatic interactions than 56
bulk sequencing. The key feature is that 57
it is possible to identify pairs of mutations 58
that often occur simultaneously in the same 59
genome, and pairs that rarely or never do. 60
That is, cells affected by negatively selected 61
or synthetic lethal mutations will go extinct in 62
the tumor population and thus their genotype 63
with the synthetic lethal mutations occurring 64
together will not be observed. Cell death, 65
however, can be the result of mere chance, so 66
to detect significant negative pressures, large 67
cohorts of repeated time resolved experiments 68
would have to be performed. 69
5.3.2 Open problems 70
With an increased resolution of scDNA-seq 71
(section 4.1) and more work on the scDNA- 72
seq challenges described in other sections, it 73
will be possible to determine subclone geno- 74
types in more detail. 75
The first challenge will be to integrate 76
this with the spatial location of single cells 77
obtained from other measurements. This 78
will enable determining whether cells from 79
the same subclones are co-located, whether 80
metastases are founded recurrently by the 81
same subclone(s) and whether individual 82
metastases are founded by individual or mul- 83
tiple subclones. A number of studies utilizing 84
multiple region samples from the same tumor 85
and from distant metastases already paved 86
31
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
the way in investigating these questions [Tu-1
rajlic and Swanton, 2016]. Still, only single-2
cell spatial resolution will allow identification3
of specific individual genotypes in specific lo-4
cations and the drawing precise conclusions.5
The second challenge will be to determine6
rates of proliferation and death per subclone.7
This could be achieved by measuring num-8
bers of mitotic and apoptotic cells per sub-9
clone or by integrating subclone abundance10
profiles across time points. Good estimates11
of these basic parameters will greatly benefit12
models, e.g. for the detection of positive and13
negative selection in cancer.14
A third challenge will be to determine15
subclone-specific rates of mutation. Here, in-16
tegration of models from population genetics17
and phylogenetics holds promise.18
A fourth challenge will be to devise ways19
to determine further relevant model parame-20
ters. For example, comparing expanded sub-21
clones in drug screens to determine subclone22
fitness under different treatment regimes can23
both help to predict subclone resistance (and24
thus expected treatment success) and further25
inform cancer evolution models.26
A final step will then be to put all these27
parameters into context with further infor-28
mation about local microenvironments (such29
as vascular invasion and immune cell infiltra-30
tion), to estimate the selection potential of31
such local factors for or against different sub-32
clones.33
32
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
...
...
...
...
...
...
...
A B C DE F G ...
Approach 0
...
...
...
...
...
...
...
A B C D E F G ...
?
?
?
?
?
?
...
...
...
...
...
...
...
A B C D E F G ...
Approach 1
...
...
...
...
...
...
...
A B C D E F G ...3 5
3 1
Approach 2
...
...
...
...
...
...
...
A B C D E F G ...
Approach 3
x y zp1 ...p2p3p4p5
...
...
...
...
...
A B C D E F G ...
...
...
...
...
...
muta�onal profile
gene expression
...
...
...
...
...
...
...
...
...
...
...
...
A B C D E F G ...
gene methyla�on
chroma�n accessibility
loca�on
...
...
...
...
...
...
A B C D E F G ...Approach 4scDNA-seq
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
scRNA-seqclonalprofiles
Approach 5
...
...
...
...
...
...
A B C D E F G ...
...
p1 ...p2p3p4p5
...
...
...
...
...
...
...
Figure 6: Approaches for integrating single-cell measurement datasets across measurementtypes, samples and experiments, as also described in Table 2.Approach 0 Clustering of cells from one sample from one experiment, no data integration isneeded. Approach 1 Cell populations / clusters from multiple samples but the same mea-surement type need to be linked. Approach 2 For cell populations / clusters across multipleexperiments, stable reference systems like cell atlases are needed (compare Figure 1). Ap-
proach 3 Whenever multiple measurement types can be obtained from the same cell, theyare automatically linked. However, this setup highlights the problem of data sparsity of allavailable measurement types and the dependency of measurement types that needs to beaccounted for. Approach 4 When multiple measurement types cannot be obtained fromthe same cell, a solution is to obtain them from cells of the same cell population. However,this combines the problems of Approach 1 with those of Approach 3. Approach 5 Onepossibility for easing data integration across measurement types from separate cells wouldbe to have a stable reference (cell atlas) across multiple measurement types. Effectively, thiscombines the problems of Approaches 2, 3 and 4.
33
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Integration example MTs example AMs Promises Challenges0 none scDNA-seq,
scRNA-seq,merFISH
clustering /unsupervised
identify new celltypes and states
technical noise
1 within 1 MT,within 1 exp,across > 1 smps
scDNA-seq,scRNA-seq,merFISH
differentialanalyses,time series,spatial sampling
identify effectsacross samplegroups, time andspace
technical noise; batch effects;validate cell type assignments
2 within 1 MT,across > 1 exp,across > 1 smps,
scRNA-seq,merFISH
map cells tostable reference(cell atlas)
accelerate analyses;increase sample size& generalize obser-vations
technical noise; batch effects;validate cell type assignments;standards across experimentalcenters
3 across > 1 MTs,within 1 exp,within 1 cell
scG&T-seq,scM&T-seq,seqFISH
MOFA,DIABLO,MINT
holistic view of biol.processes withincell;quantification ofdependency of MTs
scaling cell throughput;MT combinations limited;dependency of MTs;data sparsity
4 across > 1 MTs,within 1 exp,across > 1 cells,within 1 cell pop
scDNA-seq +
scRNA-seq,DNA-seq +
scRNA-seq
Cardelino,Clonealign,MATCHER
use existing datasets(faster than 3);flexible experimen-tal design
technical noise;validate cell / data grouping;test assumptions for integratingdata
5 across > 1 MTs,across > 1 exps,across > 1 smps,within cells
hypothetical:any combina-tion
hypothetical:multi-omicHCA,single-cellTCGA
comprehensive char-acterizations of bio-logical systems
all from approaches 2, 3 & 4;standards across experimentalcenters
Table 2: Approaches for data integration and their potential.Abbreviations: AM – analysis method; exp(s) – experiment(s); HCA – human cell atlas;MT – measurement type; smps – samples; TCGA – The Cancer Genome Atlas
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
6 Overarching challenges1
6.1 Challenge XI: Integration of2
single-cell data: across3
samples, experiments and4
types of measurement5
Biological processes are complex and dy-6
namic, varying across cells and organisms. To7
comprehensively analyze such processes, dif-8
ferent types of measurements from multiple9
experiments need to be obtained and inte-10
grated. Depending on the actual research11
question, such experiments will refer to dif-12
ferent time points, tissues or organisms. For13
different measurement types, we put particu-14
lar emphasis on the combination of scRNA-15
seq and scDNA-seq data, although augment-16
ing sequencing data with records on protein17
or metabolite levels is also possible.18
Since the exploration of complex, dynamic19
and variable processes requires the integration20
of data from multiple experiments, we need21
flexible but rigorous statistical and compu-22
tational frameworks to support that integra-23
tion. See Table 2 and Figure 6 for an overview24
of how the issues in creating such frameworks25
can vary relative to the particular problem2.26
When aiming at the identification of pat-27
terns of differential expression, so as to char-28
acterize variability across organisms, individ-29
uals, or location, data refers to the same30
(unique) measurement type (for example,31
only scRNA-seq), but stems from different32
time points, different locations (such as dif-33
ferent tissues or sites in a tumor), or different34
organisms. See Approach 1 in Figure 6 and35
Table 2 for methodological challenges arising36
2Graph representation in Figure 6 Approaches 2 and5 taken from Wolf et al. [2019], Fig. 3, providedunder Creative Commons Attribution 4.0 Interna-tional License (http://creativecommons.org/licenses/by/4.0/)
from this scenario. 37
Another scenario arises when aiming at a 38
general increase in sample sizes, so as to gen- 39
eralize (and statistically corroborate) obser- 40
vations. The increase in generality may fur- 41
ther support the construction of a reference 42
system, such as a cell atlas, the existence of 43
which can support decisive speed-ups when 44
classifying cells or cell states, investigated in 45
subsequent experiments (see section 3.3). In- 46
creasing sample sizes often means that data is 47
raised across multiple experiments of identi- 48
cal setup, for example experimental replicates 49
possibly raised in different laboratories, such 50
that statistically accounting for batch effects 51
is a decisive factor. See Approach 2 in Fig- 52
ure 6 and Table 2 for respective methodolog- 53
ical challenges. 54
Yet another scenario manifests when try- 55
ing to unravel complexity and coordination 56
of intracellular biological processes, as well 57
as their mutual dependencies, so as to draw 58
a comprehensive picture of a single cell. In 59
this, an optimal setup is to raise data from 60
just one single cell across multiple experi- 61
ments referring to different types of mea- 62
surements, such as scDNA-seq, scRNA-seq, 63
possibly further augmented by measurements 64
of chromatin accessibility, gene methylation, 65
proteins or metabolites. See Approach 3 in 66
Figure 6 and Table 2 for this scenario. 67
Co-measuring different and possibly con- 68
curring types of quantities, for example 69
scRNA-seq and scDNA-seq [Kong et al., 70
2019], in just one single cell can be experi- 71
mentally challenging or even just impossible 72
at this point in time. An exit strategy to this 73
problem is to raise a population of cells that 74
is coherent in terms of cell type and state. 75
One then spreads the different measurements 76
across several single cells, all of which are 77
drawn from this population. Upon having ap- 78
plied the different measurements on different 79
single cells, one needs to combine the data 80
35
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
raised in a way that is biologically meaning-1
ful, respecting that each measurement stems2
from a different cell. Note that this approach3
encompasses the possibility to raise data both4
from single cells, and from bulks of cells. An5
example for the latter are bulk sequencing de-6
rived genotypes which one uses for imputa-7
tion of missing values or the quantification of8
data that have remained uncertain in single9
cells that stem from the same population as10
the bulk. The integration of different types of11
data raised across multiple single cells, pos-12
sibly including bulk data, casts issues that13
deserve attention in their own right (see Ap-14
proach 4 in Figure 6 and Table 2), because15
these issues can substantially differ from the16
methods referring to Approach 3.17
The most comprehensive goal, finally, may18
be to gain deeper insight into the complexity19
of (intra-)cellular circuits, and to chart20
their variability across time, tissues, and21
populations. Mapping cellular circuits in this22
comprehensive manner requires to take com-23
plementary and concurring measurements24
in single cells and across multiple single25
cells, possibly also across time, tissues and26
populations. Approach 5 in Figure 6 and27
Table 2 deals with this holistic approach to28
examining single cells. The ultimate goal is29
to comprehensively characterize biological30
systems, which requires to operate at the31
single-cell level, because one would not gain32
sufficient insight otherwise.33
34
The challenges just outlined in terms of Ap-35
proaches 1-5 in Figure 6 and Table 2 all are36
affected by the issues that influence single-37
cell data analysis in general, namely: (i) the38
varying resolution levels that are of interest39
depending on the research question at hand40
(section 2.1); (ii) the uncertainty of any mea-41
surements and how to quantify it for and dur-42
ing the analyses (section 2.2) and (iii) the43
scaling of single-cell methodology to more44
cells and more features measured at once (sec- 45
tion 2.3). All of these further compound the 46
most important challenge in the integration 47
of single-cell data: to link data from the dif- 48
ferent sources in a way that is biologically 49
meaningful and supports the intended anal- 50
ysis. It is an immediate insight that the 51
maps that describe how data from the differ- 52
ent sources is linked, increase in complexity 53
on increasing amounts of samples, time points 54
and types of measurements (Figure 6, Ta- 55
ble 2): Linking multiple samples referring to 56
the same quantity measured within one exper- 57
iment (Approach 1 in Figure 6 and Table 2) 58
or across several experiments (Approach 2) 59
needs to account for batch effects. Of course, 60
whenever possible, batch effects should be 61
minimized by establishing (global) standards 62
affecting experimental centers worldwide to 63
streamline common initiatives. Nevertheless, 64
even if standards have been successfully es- 65
tablished, additional validation of, for exam- 66
ple, assignments of cells to types and states 67
may be required. 68
The integration of measurements on mul- 69
tiple quantities (such as scRNA-seq and 70
scDNA-seq) raised in one single cell (Ap- 71
proach 3) needs to account for dependencies 72
if phenomena are concurrent. An illustrative 73
example is to measure copy number variation 74
(through scDNA-seq) or methylation so as to 75
investigate their effects on RNA levels (mea- 76
sured through scRNA-seq). 77
Linking multiple types of measurement 78
across different cells from the same cell pop- 79
ulation (Approach 4) may require the group- 80
ing of cells after experiments have been per- 81
formed, because only then does disturbing 82
variability among the (prior to the experiment 83
assumed coherent) different cells become ev- 84
ident. An example is to group cells based 85
on commonalities or differences in their geno- 86
type profile, having become evident only after 87
the application of a scDNA-seq experiment. 88
36
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Any assumptions that underlie these possible1
groupings need to resist thorough statistical2
testing and functional validation.3
6.1.1 Status4
For unsupervised clustering (Approach 0 in5
Figure 6 and Table 2), method development is6
a well-established field. Remaining challenges7
have already been identified systematically,8
see Duò et al. [2018], Freytag et al. [2018],9
Kiselev et al. [2019].10
For integrating multiple datasets of the11
same measurement type across different sam-12
ples in one experiment (Approach 1), a few13
approaches are available. See for exam-14
ple MNN [Haghverdi et al., 2018], and the15
methodologies included in the Seurat pack-16
age [Satija et al., 2015, Butler et al., 2018b,17
Stuart et al., 2018]. For the challenges and18
promises referring to the integration of sc-seq19
data that vary in terms of spatial and tempo-20
ral origin, see the discussions in the section 3.521
and section 5.3 below.22
For integrating multiple datasets of the23
same measurement type across experiments24
(Approach 2), mapping cells to reference25
datasets such as the Human Cell Atlas [Regev26
et al., 2017] are currently emerging as the27
most promising strategy. We refer the reader28
to more particular and detailed discussions in29
section 3.3. If applicable reference systems30
are not available (note that the human cell31
atlas is not yet fully operable), assembling32
cell type clusters from different experiments33
is a reasonable strategy, as implemented by34
several recently published tools [Zhang et al.,35
2018, Barkas et al., 2018, Gao et al., 2018,36
Kiselev et al., 2018, Park et al., 2018, Wag-37
ner and Yanai, 2018, Boufea et al., 2019, Jo-38
hansen and Quon, 2019, Johnson et al., 2019].39
The integration of data raised from one cell,40
referring to multiple types of measurements41
(Approach 3) is described in some particular42
experimental protocols that address the issue 43
[Macaulay et al., 2017]. These focus on com- 44
bining scDNA-seq and scRNA-seq (Dey et al. 45
[2015], Macaulay et al. [2016, 2017]), methyla- 46
tion data and scRNA-seq [Angermueller et al., 47
2016], or even all of scRNA-seq, scDNA-seq, 48
methylation and chromatin accessibility data 49
[Clark et al., 2018], or targeted queries on a 50
cell’s methylation, transcription (scRNA-seq) 51
and genotype status (sc-GEM, Cheow et al. 52
[2016]). Beyond these single-cell specific ap- 53
proaches, bulk approaches that address the 54
integration of data from different types of ex- 55
periments have the potential to be leveraged 56
to account for single-cell specific noise char- 57
acteristics or adapted to also qualify for cor- 58
responding single-cell analyses (MOFA, Arge- 59
laguet et al. [2018]), DIABLO [Rohart et al., 60
2017b, Singh et al., 2018] and MINT [Rohart 61
et al., 2017a]). 62
For the integration of different measure- 63
ments performed on several cells all of which 64
stem from a population of cells that is co- 65
herent with respect to the intended analysis 66
(Approach 4), technologies such as 10X ge- 67
nomics [Zheng et al., 2017] for scRNA-seq 68
and direct library preparation (DLP, Zahn 69
et al. [2017b]) for scDNA-seq establish an ex- 70
perimental basis. As above-mentioned, the 71
greater analytical challenge is to, upon hav- 72
ing performed experiments, identify subpop- 73
ulations that had hitherto remained invis- 74
ible, and whose identification is crucial so 75
as to not combine different types of data 76
in mistaken ways. An example for this are 77
the identification of cancer clones although 78
single cells had been sampled from identi- 79
cal tumor tissue—only performing scDNA- 80
seq experiments can definitively reveal the 81
clonal structure of a tumor. If one wishes 82
to correctly link mutation with transcription 83
profiles—the latter of which are examined via 84
scRNA-seq experiments—ignoring the clonal 85
structure of a tumor would be misleading. 86
37
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Several analytical methods that address this1
problem have recently emerged: (i) clonealign2
[Campbell et al., 2019] assumes a copy-num-3
ber dosage effect on transcription to assign4
gene expression states to clones. (ii) cardelino5
[McCarthy et al., 2018] aligns clone-specific6
SNVs in scRNA-seq to those inferred from7
bulk exome data to infer clone-specific ex-8
pression patterns. (iii) MATCHER [Welch9
et al., 2017] uses manifold alignment to com-10
bine scM&T-seq [Angermueller et al., 2016]11
with sc-GEM [Cheow et al., 2016], leverag-12
ing the common set of loci. All of these13
methods are based on biologically coherent14
assumptions on how to summarize measure-15
ments across different types and samples in a16
reasonable way, despite their different physi-17
cal origin.18
6.1.2 Open problems19
Experimental technologies that deal with tak-20
ing measurements of different kinds on one21
single cell (Approach 3 in Figure 6 and Ta-22
ble 2) are on the rise and will allow to as-23
say more cells at higher fidelity and reduced24
cost. Yet, however, many methods for evalu-25
ating combinations of different types of mea-26
surements performed on one single cell have27
not been in the focus. It is to be expected28
that the corresponding open problems will be-29
come more urgent. As an example, consider30
combined measurements of scDNA-seq and31
scRNA-seq, where one uses the transcripts de-32
rived from the latter to impute missing values33
in the genotype profile derived from the first.34
While this may make Approach 4 look as35
if becoming gradually obsolete, the advances36
with respect to Approach 3 and the corre-37
sponding advances in terms of the resolution38
of how intracellular measurements of different39
types are linked with one another will benefit40
from ground work on Approach 4. Further,41
work using Approach 4 will mean a boost for42
reference systems, such as cell atlases (see also 43
Approach 2), because our understanding of 44
the link between the different substrates mea- 45
sured will improve. As an example consider 46
how gene expression increases on increasing 47
genomic copy number, known as measure- 48
ment linkage [Loper et al., 2019], are impor- 49
tant to account for in such a reference system. 50
This, in turn, will yield techniques that map 51
different cellular quantities with greater ac- 52
curacy, eventually allowing analyses at higher 53
resolution and finer granularity. As a con- 54
sequence, approaches that address taking dif- 55
ferent measurement across different cells from 56
the same population (Approach 4) will deliver 57
more fine-grained results, hence also thanks 58
to these approaches being easier to perform 59
and being more cost efficient, likely will not 60
experience a loss in popularity. 61
As just mentioned, advances with respect 62
to Approach 3 and 4 will be partially based on 63
advances in terms of mappings that connect 64
cells across their types and states, see Ap- 65
proach 2. With combinations of measurement 66
types gradually being shifted in the focus of 67
attention, extensions of Approach 2 (which 68
predominantly addresses how to connect dif- 69
ferent cells based on a single measurement) 70
are necessary. These extensions will have to 71
address how to connect different cells also in 72
terms of multiple types of measurements, or 73
even combinations thereof, such as integrative 74
genotype-expression-profiles (raised by evalu- 75
ating combined experiments on both scRNA- 76
seq and scDNA-seq, for example), which 77
points out the need for improvements address- 78
ing Approach 5. 79
Amounts of material that underlie most 80
measurements will remain tiny, oftentimes 81
limited by the amounts within a single cell 82
and by a limited number of cells available 83
from a particular cell population. This means 84
that one overarching theme will persist: that 85
the analyses we have just discussed will suf- 86
38
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
fer from missing entire views—samples, time1
points, or measurement types missing entirely2
at the time of training models or mapping3
quantities on one another. This will add to4
the difficulties in terms of missing data one5
experiences in non-integrative approaches.6
6.2 Challenge XII: Validating7
and benchmarking analysis8
tools for single-cell9
measurements10
With the advances in sc-seq and other single-11
cell technologies, more and more analysis12
tools become available for researchers, and13
even more are being developed and will be14
published in the near future. Thus, the need15
for datasets and methods that support sys-16
tematic benchmarking and evaluation of these17
tools is becoming more pressing. To be useful18
and reliable, algorithms and pipelines should19
be able to pass the following quality control20
tests: (i) They should produce the expected21
results (e.g. reconstruct phylogenies, estimate22
differential expressions or cluster the data) of23
high quality and outperform existing meth-24
ods, if such methods exist. (ii) They should25
be robust to high levels of sequencing noise26
and technological biases, including PCR bias,27
allele dropout and chimeric signals. In any28
case, benchmarking should be conducted in a29
systematic way, following established recom-30
mendations [Mangul et al., 2019, Weber et al.,31
2019].32
Evaluation of tool performance requires33
benchmarking datasets with known ground34
truth. Such data should include cell popula-35
tions with known genomic compositions and36
population structures, i.e. where frequencies37
of clones and alleles are known. Currently,38
such datasets are scarce—with some notable39
exceptions [Grün et al., 2014, Tian et al.,40
2019]—because generating them in genuine41
laboratory settings is time-, labor- and cost- 42
intensive. Experimental benchmark datasets 43
for evolutionary analysis of single-cell pop- 44
ulations are even harder to obtain, as they 45
require follow-up samples with known infor- 46
mation about evolutionary trajectories and 47
developmental times. With lack of time- 48
resolved measurements, only anecdotal evi- 49
dence exists on, for instance, how the ac- 50
curacy of phylogenetic inferences is affected 51
by data quality. Availability of such gold- 52
standard datasets would benefit single-cell ge- 53
nomics research enormously. 54
Due to aforementioned difficulties, the most 55
affordable sources of benchmarking and vali- 56
dation data are in silico simulations. Simula- 57
tions provide ground truth test examples that 58
can be rapidly and cost-effectively generated 59
under different assumptions. However, devel- 60
opment of reliable simulation tools require de- 61
sign and implementation of models which cap- 62
ture the essence of underlying biological pro- 63
cesses and technological details of single-cell 64
technologies and high-throughput sequencing 65
platforms, establishing single-cell data sim- 66
ulation as a methodologically involved chal- 67
lenge. 68
6.2.1 Status 69
Recent studies [Soneson and Robinson, 2018, 70
Saelens et al., 2019] show that systematic 71
benchmarking of different single-cell analysis 72
methodologies has begun. However, to the 73
best of our knowledge, there is still a short- 74
age of single-cell data simulation tools. Many 75
single-cell data analysis packages include their 76
own ad hoc data simulators [Vallejos et al., 77
2015, Korthauer et al., 2016a, Lun et al., 2016, 78
Lun and Marioni, 2017, Jahn et al., 2016, Sa- 79
tas and Raphael, 2018, Rizzetto et al., 2017, 80
Köster et al., 2017]. However, these simula- 81
tors are usually not available as separate tools 82
or even as a source code, tailored to specific 83
39
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
problems studied in corresponding papers and1
sometimes not comprehensively documented,2
thus limiting their utility for the broad re-3
search community. Furthermore, since such4
simulators are used only as auxiliary subrou-5
tines inside particular projects and are not6
published as stand-alone tools, they them-7
selves are usually not evaluated, and there-8
fore the accuracy of their reflection of real9
biological and technological processes remain10
unclear. There are few exceptions known11
to us, including the tools Splatter [Zappia12
et al., 2017], powsimR [Vieth et al., 2017],13
and SymSim [Zhang et al., 2019d], which pro-14
vide frameworks for simulation of scRNA-seq15
data and whose accuracy has been validated16
by comparison of its results with real data.17
For single-cell phylogenomics, cancer genome18
evolution simulators are being designed [Se-19
meraro et al., 2018, Xia et al., 2018, Meng20
and Chen, 2018].21
6.2.2 Open problems22
Simulation tools mostly concentrate on differ-23
ential expression analysis, while comprehen-24
sive simulation methods for other important25
aspects of sc-seq analysis are still to be devel-26
oped. In particular, to the best of our knowl-27
edge, no such tool is available for scDNA-seq28
data.29
With single-cell phylogenomics, one would30
like to assess the accuracy of methods for31
phylogenetic inference and subclone identifi-32
cation, or the power of population genetics33
methods for estimating parameters of interest34
(e.g. tests for selection and epistatic interac-35
tions in cancer, see section 5.3). To this end,36
realistic and comprehensive (w.r.t. the evolu-37
tionary phenomena) simulation tools are re-38
quired.39
Another interesting computational problem40
is development of tools for validation of simu-41
lated sc-seq datasets themselves by their com-42
parison with real data using a comprehen- 43
sive set of biological parameters. The first 44
such tool for scRNA-seq data is countsimQC 45
[Soneson and Robinson, 2017], but similar 46
tools for scDNA-seq data are needed. Finally, 47
most of the simulators concentrate on model- 48
ing of biologically meaningful data, while ig- 49
noring or simplifying models for sc-seq errors 50
and artifacts. 51
Another important challenge in single-cell 52
analysis tool validation is the selection of com- 53
prehensive evaluation metrics, which should 54
be used for comparison of different analysis 55
results with each other and with the ground 56
truth. For single-cell data it is particularly 57
complicated, since many analysis tools deal 58
with heterogeneous clone populations, which 59
possesses multiple biological characteristics to 60
be inferred and analyzed. Development of 61
a single measure which captures several of 62
these characteristics is complicated, and in 63
many cases impossible. For example, valida- 64
tion of tools for imputation of cellular and 65
transcriptional heterogeneity should simulta- 66
neously evaluate two measures: (i) how close 67
are the reconstructed and true cellular ge- 68
nomic profiles and (ii) how close are recon- 69
structed and true SNV/haplotype frequency 70
distributions. Development of synthetic mea- 71
sures which capture several such characteris- 72
tics (e.g. based on utilization of earth mover’s 73
distance [Knyazev et al., 2018]) is highly im- 74
portant. 75
When simulating datasets in general, the 76
circularity of simulating and inferring pa- 77
rameters under the same—possibly simplis- 78
tic model—should be critically assessed, as 79
should potential biases. Thus, further eval- 80
uation on empirical datasets for which some 81
ground truth is known will be invaluable. Ide- 82
ally, all single-cell analysis fields should define 83
a standard set of benchmark datasets that will 84
allow for assessing and comparing methods or 85
come up with a regular data analysis chal- 86
40
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
lenge. This approach has been very success-1
ful, e.g. in protein structure prediction3 and2
metagenomic analyses4. A first step in this3
direction was the recent single-cell transcrip-4
tomics DREAM challenge5.5
7 Acknowledgements6
We are deeply grateful to the Lorentz Cen-7
ter for hosting the workshop “Single Cell Data8
Science: Making Sense of Data from Billions9
of Single Cells” (4–8 June 2018). In par-10
ticular, we would like to thank the Lorentz11
Center staff, who turned organizing and at-12
tending the workshop into a great pleasure.13
For a week, the authors of this review came14
together—researchers from the fields of statis-15
tics and medicine, computer science and biol-16
ogy, and any combinations thereof. In inter-17
active workshop sessions, we brought together18
our knowledge of single-cell analyses, ranging19
from the wet-lab to the server cluster, from20
statistical models to algorithms, from can-21
cer biology to evolutionary genetics. During22
these sessions, we formulated an initial set of23
challenges that was further systematized and24
refined in the following months, and substan-25
tiated with extensive literature research of the26
respective state-of-the-art for this review.27
Acronyms28
CNV copy number variation. 22, 24, 25, 27,29
2830
ICA independent component analysis. 1031
3http://predictioncenter.org/4https://data.cami-challenge.org5https://www.synapse.org/#!Synapse:
syn15665609/wiki/582909
MALBAC multiple annealing and looping- 32
based amplification cycles. 20, 22 33
MDA multiple displacement amplification. 34
20, 22 35
MSA multiple sequence alignment. 25, 26, 36
30 37
NMF non-negative matrix factorization. 10 38
PCA principal component analysis. 10, 11 39
PCR polymerase chain reaction. 20, 22, 39 40
sc-seq single-cell sequencing. 3–6, 29, 37, 39, 41
40 42
scDNA-seq single-cell DNA sequencing. 3, 43
6–8, 16, 20–25, 27–31, 34–38, 40 44
SCDS Single-Cell Data Science. 3, 4, 7 45
scRNA-seq single-cell RNA sequencing. 3, 46
5–14, 16–18, 22, 24, 34–38, 40 47
SNV single nucleotide variation. 6, 20, 22– 48
28, 38, 40 49
WGA whole genome amplification. 20, 22–25 50
References 51
Andre J Aberer, Kassian Kobert, and Alexan- 52
dros Stamatakis. ExaBayes: massively par- 53
allel bayesian tree inference for the whole- 54
genome era. Mol. Biol. Evol., 31(10):2553– 55
2556, October 2014. 56
Ahmet Acar, Daniel Nichol, Javier 57
Fernandez-Mateos, George D. Cress- 58
well, Iros Barozzi, Sung Pil Hong, 59
Inmaculada Spiteri, Mark Stubbs, Rose- 60
mary Burke, Adam Stewart, Georgios 61
Vlachogiannis, Carlo C. Maley, Luca 62
Magnani, Nicola Valeri, Udai Banerji, and 63
41
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Andrea Sottoriva. Exploiting evolution-1
ary herding to control drug resistance2
in cancer. bioRxiv, page 566950, March3
2019. doi: 10.1101/566950. URL4
https://www.biorxiv.org/content/10.5
1101/566950v1.6
Sumon Ahmed, Magnus Rattray, and7
Alexis Boukouvalas. GrandPrix: scaling8
up the Bayesian GPLVM for single-9
cell data. Bioinformatics, 35(1):47–10
54, January 2019. ISSN 1367-4803.11
doi: 10.1093/bioinformatics/bty533.12
URL https://academic.oup.com/13
bioinformatics/article/35/1/47/14
5047752.15
Philipp M Altrock, Lin L Liu, and Franziska16
Michor. The mathematics of cancer: in-17
tegrating quantitative models. Nat. Rev.18
Cancer, 15(12):730–745, December 2015.19
Matthew Amodio, David van Dijk, Krish-20
nan Srinivasan, William S Chen, Hussein21
Mohsen, Kevin R Moon, Allison Campbell,22
Yujiao Zhao, Xiaomei Wang, Manjunatha23
Venkataswamy, Anita Desai, V Ravi, Priti24
Kumar, Ruth Montgomery, Guy Wolf, and25
Smita Krishnaswamy. Exploring Single-26
Cell data with deep multitasking neural27
networks. January 2019.28
Benedict Anchang, Tom D. P. Hart, Sean C.29
Bendall, Peng Qiu, Zach Bjornson, Michael30
Linderman, Garry P. Nolan, and Sylvia K.31
Plevritis. Visualization and cellular32
hierarchy inference of single-cell data33
using SPADE. Nature Protocols, 11(7):34
1264–1279, July 2016. ISSN 1754-2189.35
doi: 10.1038/nprot.2016.066. URL http:36
//www.nature.com/nprot/journal/v11/37
n7/full/nprot.2016.066.html.38
Tallulah S. Andrews and Martin Hem-39
berg. False signals induced by single-cell40
imputation. F1000Research, 7:1740, 41
March 2019. ISSN 2046-1402. doi: 42
10.12688/f1000research.16613.2. URL 43
https://f1000research.com/articles/ 44
7-1740/v2. 45
Michael Angelo, Sean C. Bendall, Rachel 46
Finck, Matthew B. Hale, Chuck Hitzman, 47
Alexander D. Borowsky, Richard M. Lev- 48
enson, John B. Lowe, Scot D. Liu, Shuchun 49
Zhao, Yasodha Natkunam, and Garry P. 50
Nolan. Multiplexed ion beam imaging of 51
human breast tumors. Nature Medicine, 20 52
(4):436–442, April 2014. ISSN 1546-170X. 53
doi: 10.1038/nm.3488. 54
Christof Angermueller, Stephen J. Clark, 55
Heather J. Lee, Iain C. Macaulay, Mabel J. 56
Teng, Tim Xiaoming Hu, Felix Krueger, 57
Sebastien Smallwood, Chris P. Ponting, 58
Thierry Voet, Gavin Kelsey, Oliver Ste- 59
gle, and Wolf Reik. Parallel single-cell se- 60
quencing links transcriptional and epige- 61
netic heterogeneity. Nature Methods, 13(3): 62
229–232, March 2016. ISSN 1548-7105. doi: 63
10.1038/nmeth.3728. 64
Christof Angermueller, Heather J. Lee, 65
Wolf Reik, and Oliver Stegle. Deep- 66
CpG: accurate prediction of single-cell 67
DNA methylation states using deep learn- 68
ing. Genome Biology, 18(1):67, April 69
2017. ISSN 1474-760X. doi: 10.1186/ 70
s13059-017-1189-z. URL https://doi. 71
org/10.1186/s13059-017-1189-z. 72
Ricard Argelaguet, Britta Velten, Damien 73
Arnol, Sascha Dietrich, Thorsten Zenz, 74
John C. Marioni, Florian Buettner, Wolf- 75
gang Huber, and Oliver Stegle. Multi- 76
Omics Factor Analysis—a framework for 77
unsupervised integration of multi-omics 78
data sets. Molecular Systems Biology, 79
14(6):e8124, June 2018. ISSN 1744- 80
4292, 1744-4292. doi: 10.15252/msb. 81
42
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
20178124. URL http://msb.embopress.1
org/content/14/6/e8124.2
C Arisdakessian, O Poirion, B Yunits,3
X Zhu, and L Garmire. DeepIm-4
pute: an accurate, fast and scalable5
deep neural network method to im-6
pute single-cell RNA-Seq data. bioRxiv,7
2018. URL https://www.biorxiv.org/8
content/10.1101/353607v1.abstract.9
Nona Arneson, Simon Hughes, Richard Houl-10
ston, and Susan Done. Whole-Genome am-11
plification by improved primer extension12
preamplification PCR (I-PEP-PCR). CSH13
Protoc., 2008:db.prot4921, January 2008.14
Damien Arnol, Denis Schapiro, Bernd Bo-15
denmiller, Julio Saez-Rodriguez, and Oliver16
Stegle. Modelling cell-cell interactions from17
spatial molecular data with spatial variance18
component analysis, 2018.19
Daniel L. Ayres. Research And Applica-20
tion Of Parallel Computing Algorithms For21
Statistical Phylogenetic Inference. PhD22
thesis, University of Maryland, 2017.23
URL http://drum.lib.umd.edu/handle/24
1903/19951.25
Elham Azizi, Sandhya Prabhakaran, Ambrose26
Carr, and Dana Pe’er. Bayesian Infer-27
ence for Single-cell Clustering and Imput-28
ing. Genomics and Computational Biol-29
ogy, 3(1):46, January 2017. ISSN 2365-30
7154. doi: 10.18547/gcb.2017.vol3.iss1.e46.31
URL https://genomicscomputbiol.org/32
ojs/index.php/GCB/article/view/46.33
Rhonda Bacher and Christina Kendziorski.34
Design and computational analysis35
of single-cell RNA-sequencing ex-36
periments. Genome Biology, 17(1):37
63, April 2016. ISSN 1474-760X.38
doi: 10.1186/s13059-016-0927-y.39
URL https://doi.org/10.1186/ 40
s13059-016-0927-y. 41
Md Bahadur Badsha, Rui Li, Boxiang Liu, 42
Yang I Li, Min Xian, Nicholas E Banovich, 43
and Audrey Qiuyan Fu. Imputation of 44
single-cell gene expression with an autoen- 45
coder neural network. December 2018. 46
Bjorn Bakker, Aaron Taudt, Mirjam E 47
Belderbos, David Porubsky, Diana C J 48
Spierings, Tristan V de Jong, Nancy 49
Halsema, Hinke G Kazemier, Karina 50
Hoekstra-Wakker, Allan Bradley, Eveline S 51
J M de Bont, Anke van den Berg, Victor 52
Guryev, Peter M Lansdorp, Maria Colomé- 53
Tatché, and Floris Foijer. Single-cell se- 54
quencing reveals karyotype heterogeneity in 55
murine and human malignancies. Genome 56
Biol., 17(1):115, May 2016. 57
Nikolas Barkas, Viktor Petukhov, Daria Niko- 58
laeva, Yaroslav Lozinsky, Samuel Demhar- 59
ter, Konstantin Khodosevich, and Peter V. 60
Kharchenko. Wiring together large single- 61
cell RNA-seq sample collections. bioRxiv, 62
page 460246, November 2018. doi: 10.1101/ 63
460246. URL https://www.biorxiv.org/ 64
content/10.1101/460246v1. 65
Nico Battich, Thomas Stoeger, and Lucas 66
Pelkmans. Control of transcript variabil- 67
ity in single mammalian cells. Cell, 163(7): 68
1596–1610, December 2015. 69
Benedikt Bauer, Reiner Siebert, and Arne 70
Traulsen. Cancer initiation with epistatic 71
interactions between driver and passenger 72
mutations. J. Theor. Biol., 358:52–60, Oc- 73
tober 2014. 74
Niko Beerenwinkel, Tibor Antal, David 75
Dingli, Arne Traulsen, Kenneth W Kinzler, 76
Victor E Velculescu, Bert Vogelstein, and 77
Martin A Nowak. Genetic progression and 78
43
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
the waiting time to cancer. PLoS Comput.1
Biol., 3(11):e225, November 2007.2
Graham R. Bignell, Thomas Santarius, Jes-3
sica C.M. Pole, Adam P. Butler, Janet4
Perry, Erin Pleasance, Chris Greenman,5
Andrew Menzies, Sheila Taylor, Sarah6
Edkins, Peter Campbell, Michael Quail,7
Bob Plumb, Lucy Matthews, Kirsten8
McLay, Paul A.W. Edwards, Jane Rogers,9
Richard Wooster, P. Andrew Futreal,10
and Michael R. Stratton. Architec-11
tures of somatic genomic rearrangement12
in human cancer amplicons at sequence-13
level resolution. Genome Research, 1714
(9):1296–1303, 2007. doi: 10.1101/gr.15
6522707. URL http://genome.cshlp.16
org/content/17/9/1296.abstract.17
Konstantin A Blagodatskikh, Vladimir M18
Kramarov, Ekaterina V Barsova, Alexey V19
Garkovenko, Dmitriy S Shcherbo, An-20
drew A Shelenkov, Vera V Ustinova,21
Maria R Tokarenko, Simon C Baker, Ta-22
tiana V Kramarova, and Konstantin B Ig-23
natov. Improved DOP-PCR (iDOP-PCR):24
A robust and simple WGA method for effi-25
cient amplification of low copy number ge-26
nomic DNA. PLoS One, 12(9):e0184507,27
September 2017.28
L Blanco, A Bernad, J M Lázaro, G Martín,29
C Garmendia, and M Salas. Highly ef-30
ficient DNA synthesis by the phage phi31
29 DNA polymerase. symmetrical mode of32
DNA replication. J. Biol. Chem., 264(15):33
8935–8940, May 1989.34
Craig L. Bohrson, Alison R. Barton,35
Michael A. Lodato, Rachel E. Rodin,36
Lovelace J. Luquette, Vinay V. Viswanad-37
ham, Doga C. Gulhan, Isidro Cortés-38
Ciriano, Maxwell A. Sherman, Min-39
seok Kwon, Michael E. Coulter, Alon40
Galor, Christopher A. Walsh, and41
Peter J. Park. Linked-read analysis 42
identifies mutations in single-cell DNA- 43
sequencing data. Nature Genetics, 44
page 1, March 2019. ISSN 1546-1718. 45
doi: 10.1038/s41588-019-0366-2. URL 46
https://www.nature.com/articles/ 47
s41588-019-0366-2. 48
Katerina Boufea, Sohan Seth, and Nizar N. 49
Batada. scID: Identification of equivalent 50
transcriptional cell populations across 51
single cell RNA-seq data using discrim- 52
inant analysis. bioRxiv, page 470203, 53
January 2019. doi: 10.1101/470203. URL 54
https://www.biorxiv.org/content/10. 55
1101/470203v2. 56
Ivana Bozic, Tibor Antal, Hisashi Ohtsuki, 57
Hannah Carter, Dewey Kim, Sining Chen, 58
Rachel Karchin, Kenneth W Kinzler, Bert 59
Vogelstein, and Martin A Nowak. Accu- 60
mulation of driver and passenger muta- 61
tions during tumor progression. Proc. Natl. 62
Acad. Sci. U. S. A., 107(43):18545–18550, 63
October 2010. 64
Ivana Bozic, Jeffrey M Gerold, and Mar- 65
tin A Nowak. Quantifying clonal and sub- 66
clonal passenger mutations in cancer evolu- 67
tion. PLoS Comput. Biol., 12(2):e1004731, 68
February 2016. 69
James A. Briggs, Caleb Weinreb, Daniel E. 70
Wagner, Sean Megason, Leonid Peshkin, 71
Marc W. Kirschner, and Allon M. Klein. 72
The dynamics of gene expression in ver- 73
tebrate embryogenesis at single-cell res- 74
olution. Science, 360(6392):eaar5780, 75
June 2018. ISSN 0036-8075, 1095- 76
9203. doi: 10.1126/science.aar5780. 77
URL http://science.sciencemag.org/ 78
content/360/6392/eaar5780. 79
Jane Bromley, James W. Bentz, Léon Bot- 80
tou, Isabelle Guyon, Yann Lecun, Cliff 81
44
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Moore, Eduard Säckinger, and Roopak1
Shah. Signature verification using a2
“siamese” time delay neural network.3
International Journal of Pattern Recog-4
nition and Artificial Intelligence, 07(04):5
669–688, August 1993. ISSN 0218-0014.6
doi: 10.1142/S0218001493000339. URL7
https://www.worldscientific.com/8
doi/10.1142/S0218001493000339.9
Robert V Bruggner, Bernd Bodenmiller,10
David L Dill, Robert J Tibshirani, and11
Garry P Nolan. Automated identification12
of stratifying signatures in cellular subpop-13
ulations. Proc. Natl. Acad. Sci. U. S. A.,14
111(26):E2770–7, July 2014.15
Jason D. Buenrostro, Beijing Wu, Ulrike M.16
Litzenburger, Dave Ruff, Michael L.17
Gonzales, Michael P. Snyder, Howard Y.18
Chang, and William J. Greenleaf. Single-19
cell chromatin accessibility reveals princi-20
ples of regulatory variation. Nature, 52321
(7561):486–490, July 2015. ISSN 1476-22
4687. doi: 10.1038/nature14590. URL23
https://www.nature.com/articles/24
nature14590.25
Jason D. Buenrostro, M. Ryan Corces,26
Caleb A. Lareau, Beijing Wu, Alicia N.27
Schep, Martin J. Aryee, Ravindra Ma-28
jeti, Howard Y. Chang, and William J.29
Greenleaf. Integrated Single-Cell Analy-30
sis Maps the Continuous Regulatory Land-31
scape of Human Hematopoietic Differen-32
tiation. Cell, 173(6):1535–1548.e16, 2018.33
ISSN 1097-4172. doi: 10.1016/j.cell.2018.34
03.074.35
Florian Buettner, Naruemon Pratanwanich,36
Davis J McCarthy, John C Marioni, and37
Oliver Stegle. f-scLVM: scalable and ver-38
satile factor analysis for single-cell RNA-39
seq. Genome biology, 18(1):212, Novem-40
ber 2017. ISSN 1465-6906. doi: 10.1186/41
s13059-017-1334-8. URL http://dx.doi. 42
org/10.1186/s13059-017-1334-8. 43
Andrew Butler, Paul Hoffman, Peter Smib- 44
ert, Efthymia Papalexi, and Rahul Satija. 45
Integrating single-cell transcriptomic data 46
across different conditions, technologies, 47
and species. Nat. Biotechnol., 36(5):411– 48
420, June 2018a. 49
Andrew Butler, Paul Hoffman, Peter Smib- 50
ert, Efthymia Papalexi, and Rahul Satija. 51
Integrating single-cell transcriptomic data 52
across different conditions, technologies, 53
and species. Nature Biotechnology, 36(5): 54
411–420, May 2018b. ISSN 1546-1696. 55
doi: 10.1038/nbt.4096. URL https:// 56
www.nature.com/articles/nbt.4096. 57
Kieran R. Campbell and Christopher 58
Yau. Order Under Uncertainty: Robust 59
Differential Expression Analysis Using 60
Probabilistic Models for Pseudotime In- 61
ference. PLOS Computational Biology, 12 62
(11):e1005212, November 2016. ISSN 1553- 63
7358. doi: 10.1371/journal.pcbi.1005212. 64
URL https://journals.plos.org/ 65
ploscompbiol/article?id=10.1371/ 66
journal.pcbi.1005212. 67
Kieran R. Campbell and Christopher Yau. 68
Uncovering pseudotemporal trajectories 69
with covariates from single cell and bulk 70
expression data. Nature Communications, 71
9(1):2442, June 2018. ISSN 2041-1723. 72
doi: 10.1038/s41467-018-04696-6. URL 73
https://www.nature.com/articles/ 74
s41467-018-04696-6. 75
Kieran R. Campbell, Adi Steif, Emma Laks, 76
Hans Zahn, Daniel Lai, Andrew McPher- 77
son, Hossein Farahani, Farhia Kabeer, 78
Ciara O’Flanagan, Justina Biele, Jazmine 79
Brimhall, Beixi Wang, Pascale Walters, 80
IMAXT Consortium, Alexandre Bouchard- 81
Côté, Samuel Aparicio, and Sohrab P. 82
45
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Shah. clonealign: statistical integra-1
tion of independent single-cell RNA and2
DNA sequencing data from human can-3
cers. Genome Biology, 20(1):54, March4
2019. ISSN 1474-760X. doi: 10.1186/5
s13059-019-1645-z. URL https://doi.6
org/10.1186/s13059-019-1645-z.7
Junyue Cao, Jonathan S. Packer, Vijay Ra-8
mani, Darren A. Cusanovich, Chau Huynh,9
Riza Daza, Xiaojie Qiu, Choli Lee, Scott N.10
Furlan, Frank J. Steemers, Andrew Adey,11
Robert H. Waterston, Cole Trapnell, and12
Jay Shendure. Comprehensive single-13
cell transcriptional profiling of a multi-14
cellular organism. Science, 357(6352):15
661–667, August 2017. ISSN 0036-8075,16
1095-9203. doi: 10.1126/science.aam8940.17
URL http://science.sciencemag.org/18
content/357/6352/661.19
Junyue Cao, Darren A. Cusanovich, Vijay20
Ramani, Delasa Aghamirzaie, Hannah A.21
Pliner, Andrew J. Hill, Riza M. Daza,22
Jose L. McFaline-Figueroa, Jonathan S.23
Packer, Lena Christiansen, Frank J.24
Steemers, Andrew C. Adey, Cole Trap-25
nell, and Jay Shendure. Joint profil-26
ing of chromatin accessibility and gene27
expression in thousands of single cells.28
Science, 361(6409):1380–1385, Septem-29
ber 2018. ISSN 0036-8075, 1095-30
9203. doi: 10.1126/science.aau0730.31
URL https://science.sciencemag.org/32
content/361/6409/1380.33
Junyue Cao, Malte Spielmann, Xiaojie Qiu,34
Xingfan Huang, Daniel M. Ibrahim, An-35
drew J. Hill, Fan Zhang, Stefan Mundlos,36
Lena Christiansen, Frank J. Steemers, Cole37
Trapnell, and Jay Shendure. The single-38
cell transcriptional landscape of mam-39
malian organogenesis. Nature, 566(7745):40
496, February 2019a. ISSN 1476-4687.41
doi: 10.1038/s41586-019-0969-x. URL42
https://www.nature.com/articles/ 43
s41586-019-0969-x. 44
Zhi-Jie Cao, Lin Wei, Shen Lu, De-Chang 45
Yang, and Ge Gao. Cell BLAST: Search- 46
ing large-scale scRNA-seq database via 47
unbiased cell embedding. bioRxiv, page 48
587360, March 2019b. doi: 10.1101/ 49
587360. URL https://www.biorxiv.org/ 50
content/10.1101/587360v1. 51
Anna K Casasent, Aislyn Schalck, Ruli 52
Gao, Emi Sei, Annalyssa Long, William 53
Pangburn, Tod Casasent, Funda Meric- 54
Bernstam, Mary E Edgerton, and 55
Nicholas E Navin. Multiclonal invasion in 56
breast tumors identified by topographic 57
single cell sequencing. Cell, 172(1-2): 58
205–217.e12, January 2018. 59
Chong Chen, Changjing Wu, Linjie Wu, 60
Yishu Wang, Minghua Deng, and Ruibin 61
Xi. scRMD: Imputation for single cell 62
RNA-seq data via robust matrix de- 63
composition. November 2018. URL 64
https://www.biorxiv.org/content/10. 65
1101/459404v2. 66
Chongyi Chen, Dong Xing, Longzhi Tan, 67
Heng Li, Guangyu Zhou, Lei Huang, and 68
X Sunney Xie. Single-cell whole-genome 69
analyses by linear amplification via trans- 70
poson insertion (LIANTI). Science, 356 71
(6334):189–194, April 2017. 72
Huidong Chen, Luca Albergante, 73
Jonathan Y. Hsu, Caleb A. Lareau, 74
Giosuè Lo Bosco, Jihong Guan, Shuigeng 75
Zhou, Alexander N. Gorban, Daniel E. 76
Bauer, Martin J. Aryee, David M. Lan- 77
genau, Andrei Zinovyev, Jason D. Buen- 78
rostro, Guo-Cheng Yuan, and Luca Pinello. 79
Single-cell trajectories reconstruction, ex- 80
ploration and mapping of omics data with 81
STREAM. Nature Communications, 10 82
46
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
(1):1903, April 2019. ISSN 2041-1723.1
doi: 10.1038/s41467-019-09670-4. URL2
https://www.nature.com/articles/3
s41467-019-09670-4.4
Kok Hao Chen, Alistair N Boettiger, Jef-5
frey R Moffitt, Siyuan Wang, and Xiaowei6
Zhuang. RNA imaging. spatially resolved,7
highly multiplexed RNA profiling in sin-8
gle cells. Science, 348(6233):aaa6090, April9
2015.10
Mengjie Chen and Xiang Zhou. VIPER:11
variability-preserving imputation for accu-12
rate gene expression recovery in single-cell13
RNA sequencing studies. Genome Biol., 1914
(1):196, November 2018.15
Lih Feng Cheow, Elise T. Courtois, Yuliana16
Tan, Ramya Viswanathan, Qiaorui Xing,17
Rui Zhen Tan, Daniel S. W. Tan, Paul Rob-18
son, Yuin-Han Loh, Stephen R. Quake, and19
William F. Burkholder. Single-cell multi-20
modal profiling reveals cellular epigenetic21
heterogeneity. Nature Methods, 13(10):833–22
836, October 2016. ISSN 1548-7105. doi:23
10.1038/nmeth.3961. URL https://www.24
nature.com/articles/nmeth.3961.25
Cariad Chester and Holden T Maecker. Algo-26
rithmic tools for mining High-Dimensional27
cytometry data. J. Immunol., 195(3):773–28
779, August 2015.29
Hyunghoon Cho, Bonnie Berger, and Jian30
Peng. Generalizable and scalable visualiza-31
tion of Single-Cell data using neural net-32
works. Cell Syst, 7(2):185–191.e4, August33
2018.34
Simone Ciccolella, Mauricio Soto Gomez,35
Murray Patterson, Gianluca Della Ve-36
dova, Iman Hajirasouliha, and Paola37
Bonizzoni. Inferring Cancer Progres-38
sion from Single-cell Sequencing while Al-39
lowing Mutation Losses. bioRxiv, page40
268243, April 2018. doi: 10.1101/ 41
268243. URL https://www.biorxiv.org/ 42
content/10.1101/268243v2. 43
Stephen J. Clark, Ricard Argelaguet, 44
Chantriolnt-Andreas Kapourani, 45
Thomas M. Stubbs, Heather J. Lee, Celia 46
Alda-Catalinas, Felix Krueger, Guido San- 47
guinetti, Gavin Kelsey, John C. Marioni, 48
Oliver Stegle, and Wolf Reik. scNMT-seq 49
enables joint profiling of chromatin accessi- 50
bility DNA methylation and transcription 51
in single cells. Nature Communications, 9 52
(1):781, February 2018. ISSN 2041-1723. 53
doi: 10.1038/s41467-018-03149-4. URL 54
https://www.nature.com/articles/ 55
s41467-018-03149-4. 56
Germán Corredor, Xiangxue Wang, Yu Zhou, 57
Cheng Lu, Pingfu Fu, Konstantinos Syri- 58
gos, David L Rimm, Michael Yang, Ed- 59
uardo Romero, Kurt A Schalper, Vam- 60
sidhar Velcheti, and Anant Madabhushi. 61
Spatial architecture and arrangement of 62
Tumor-Infiltrating lymphocytes for pre- 63
dicting likelihood of recurrence in Early- 64
Stage Non-Small cell lung cancer. Clin. 65
Cancer Res., September 2018. 66
Alexandra Cretu and Peter C Brooks. Im- 67
pact of the non-cellular tumor microenvi- 68
ronment on metastasis: potential thera- 69
peutic and imaging opportunities. J. Cell. 70
Physiol., 213(2):391–402, November 2007. 71
Nicola Crosetto, Magda Bienko, and Alexan- 72
der van Oudenaarden. Spatially resolved 73
transcriptomics and beyond. Nat. Rev. 74
Genet., 16(1):57–66, January 2015. 75
Darren A. Cusanovich, Riza Daza, Andrew 76
Adey, Hannah A. Pliner, Lena Chris- 77
tiansen, Kevin L. Gunderson, Frank J. 78
Steemers, Cole Trapnell, and Jay Shen- 79
dure. Multiplex single cell profiling of chro- 80
47
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
matin accessibility by combinatorial cellu-1
lar indexing. Science (New York, N.Y.),2
348(6237):910–914, May 2015. ISSN 1095-3
9203. doi: 10.1126/science.aab1601.4
Darren A. Cusanovich, James P. Redding-5
ton, David A. Garfield, Riza M. Daza, De-6
lasa Aghamirzaie, Raquel Marco-Ferreres,7
Hannah A. Pliner, Lena Christiansen, Xi-8
aojie Qiu, Frank J. Steemers, Cole Trap-9
nell, Jay Shendure, and Eileen E. M. Fur-10
long. The cis-regulatory dynamics of em-11
bryonic development at single-cell resolu-12
tion. Nature, 555(7697):538–542, March13
2018. ISSN 1476-4687. doi: 10.1038/14
nature25981. URL https://www.nature.15
com/articles/nature25981.16
Sayantan Das, Gonçalo R Abecasis, and17
Brian L Browning. Genotype Impu-18
tation from Large Reference Panels.19
Annual review of genomics and hu-20
man genetics, 19:73–96, August 2018.21
ISSN 1527-8204, 1545-293X. doi:22
10.1146/annurev-genom-083117-021602.23
URL http://dx.doi.org/10.1146/24
annurev-genom-083117-021602.25
Soma Datta, Lavina Malhotra, Ryan Dicker-26
son, Scott Chaffee, Chandan K Sen, and27
Sashwati Roy. Laser capture microdissec-28
tion: Big data from small samples. Histol.29
Histopathol., 30(11):1255–1269, November30
2015.31
Alexander Davis, Ruli Gao, and Nicholas32
Navin. Tumor evolution: Linear, branch-33
ing, neutral or punctuated? Biochim. Bio-34
phys. Acta, 1867(2):151–161, April 2017.35
Carl G. de Boer and Aviv Regev. BROCK-36
MAN: deciphering variance in epige-37
nomic regulators by k-mer factoriza-38
tion. BMC Bioinformatics, 19(1):253, July39
2018. ISSN 1471-2105. doi: 10.1186/40
s12859-018-2255-6. URL https://doi. 41
org/10.1186/s12859-018-2255-6. 42
Charles F A de Bourcy, Iwijn De Vlam- 43
inck, Jad N Kanbar, Jianbin Wang, Charles 44
Gawad, and Stephen R Quake. A quantita- 45
tive comparison of single-cell whole genome 46
amplification methods. PLoS One, 9(8): 47
e105585, August 2014. 48
Frank B Dean, Seiyu Hosono, Linhua Fang, 49
Xiaohong Wu, A Fawad Faruqi, Patricia 50
Bray-Ward, Zhenyu Sun, Qiuling Zong, 51
Yuefen Du, Jing Du, Mark Driscoll, Wan- 52
min Song, Stephen F Kingsmore, Michael 53
Egholm, and Roger S Lasken. Compre- 54
hensive human genome amplification using 55
multiple displacement amplification. Proc. 56
Natl. Acad. Sci. U. S. A., 99(8):5261–5266, 57
April 2002. 58
Yue Deng, Feng Bao, Qionghai Dai, Lani F 59
Wu, and Steven J Altschuler. Scalable anal- 60
ysis of cell-type composition from single- 61
cell transcriptomics using deep recurrent 62
learning. Nature methods, March 2019. 63
ISSN 1548-7091, 1548-7105. doi: 10.1038/ 64
s41592-019-0353-7. URL https://doi. 65
org/10.1038/s41592-019-0353-7. 66
Erica AK DePasquale, Kyle Ferchen, Stuart 67
Hay, H. Leighton Grimes, and Nathan 68
Salomonis. cellHarmony: Cell-level 69
matching and comparison of single-cell 70
transcriptomes. bioRxiv, page 412080, 71
January 2019. doi: 10.1101/412080. URL 72
https://www.biorxiv.org/content/10. 73
1101/412080v4. 74
Siddharth S. Dey, Lennart Kester, Bastiaan 75
Spanjaard, Magda Bienko, and Alexan- 76
der van Oudenaarden. Integrated genome 77
and transcriptome sequencing of the same 78
cell. Nature Biotechnology, 33(3):285–289, 79
March 2015. ISSN 1546-1696. doi: 10.1038/ 80
48
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
nbt.3129. URL https://www.nature.1
com/articles/nbt.3129.2
David van Dijk, Roshan Sharma, Juozas3
Nainys, Kristina Yim, Pooja Kathail,4
Ambrose J. Carr, Cassandra Burdziak,5
Kevin R. Moon, Christine L. Chaf-6
fer, Diwakar Pattabiraman, Brian Bierie,7
Linas Mazutis, Guy Wolf, Smita Krish-8
naswamy, and Dana Pe’er. Recovering9
Gene Interactions from Single-Cell Data10
Using Data Diffusion. Cell, 174(3):716–11
729.e27, July 2018. ISSN 0092-8674,12
1097-4172. doi: 10.1016/j.cell.2018.05.13
061. URL https://www.cell.com/cell/14
abstract/S0092-8674(18)30724-4.15
Jiarui Ding, Anne Condon, and Sohrab P16
Shah. Interpretable dimensionality reduc-17
tion of single cell transcriptome data with18
deep generative models. Nat. Commun., 919
(1):2002, May 2018.20
Xiao Dong, Lei Zhang, Brandon Milholland,21
Moonsook Lee, Alexander Y. Maslov, Tao22
Wang, and Jan Vijg. Accurate iden-23
tification of single-nucleotide variants in24
whole-genome-amplified single cells. Na-25
ture Methods, 14(5):491–493, May 2017.26
ISSN 1548-7105. doi: 10.1038/nmeth.27
4227. URL https://www.nature.com/28
articles/nmeth.4227.29
Angelo Duò, Mark D Robinson, and Char-30
lotte Soneson. A systematic performance31
evaluation of clustering methods for single-32
cell RNA-seq data. F1000Res., 7, July33
2018.34
G Durif, L Modolo, J E Mold, S Lambert-35
Lacroix, and F Picard. Probabilistic36
Count Matrix Factorization for Single37
Cell Expression Data Analysis. Bioin-38
formatics, March 2019. ISSN 1367-4803,39
1367-4811. doi: 10.1093/bioinformatics/40
btz177. URL http://dx.doi.org/10. 41
1093/bioinformatics/btz177. 42
Daniel Edsgärd, Per Johnsson, and Rickard 43
Sandberg. Identification of spatial expres- 44
sion trends in single-cell gene expression 45
data. Nat. Methods, 15(5):339–342, May 46
2018. 47
Peter Eirew, Adi Steif, Jaswinder Khattra, 48
Gavin Ha, Damian Yap, Hossein Fara- 49
hani, Karen Gelmon, Stephen Chia, Colin 50
Mar, Adrian Wan, Emma Laks, Justina 51
Biele, Karey Shumansky, Jamie Rosner, 52
Andrew McPherson, Cydney Nielsen, 53
Andrew J. L. Roth, Calvin Lefebvre, Ali 54
Bashashati, Camila de Souza, Celia Siu, 55
Radhouane Aniba, Jazmine Brimhall, 56
Arusha Oloumi, Tomo Osako, Alejandra 57
Bruna, Jose L. Sandoval, Teresa Algara, 58
Wendy Greenwood, Kaston Leung, Hong- 59
wei Cheng, Hui Xue, Yuzhuo Wang, 60
Dong Lin, Andrew J. Mungall, Richard 61
Moore, Yongjun Zhao, Julie Lorette, Long 62
Nguyen, David Huntsman, Connie J. 63
Eaves, Carl Hansen, Marco A. Marra, Car- 64
los Caldas, Sohrab P. Shah, and Samuel 65
Aparicio. Dynamics of genomic clones 66
in breast cancer patient xenografts at 67
single-cell resolution. Nature, 518(7539): 68
422–426, February 2015. ISSN 0028-0836. 69
doi: 10.1038/nature13952. URL http: 70
//www.nature.com/nature/journal/ 71
v518/n7539/full/nature13952.html. 72
Mohammed El-Kebir. SPhyR: tumor 73
phylogeny estimation from single-cell 74
sequencing data under loss and er- 75
ror. Bioinformatics, 34(17):i671–i679, 76
September 2018. ISSN 1367-4803. 77
doi: 10.1093/bioinformatics/bty589. 78
URL https://academic.oup.com/ 79
bioinformatics/article/34/17/i671/ 80
5093218. 81
49
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Nils Eling, Arianne C. Richard, Sylvia1
Richardson, John C. Marioni, and2
Catalina A. Vallejos. Correcting the3
Mean-Variance Dependency for Differen-4
tial Variability Testing Using Single-Cell5
RNA Sequencing Data. Cell Systems, 7(3):6
284–294.e12, September 2018. ISSN 2405-7
4712. doi: 10.1016/j.cels.2018.06.011. URL8
https://www.cell.com/cell-systems/9
abstract/S2405-4712(18)30278-3.10
Chee-Huat Linus Eng, Michael Lawson,11
Qian Zhu, Ruben Dries, Noushin Koulena,12
Yodai Takei, Jina Yun, Christopher13
Cronin, Christoph Karp, Guo-Cheng14
Yuan, and Long Cai. Transcriptome-15
scale super-resolved imaging in tissues by16
RNA seqFISH+. Nature, 568(7751):17
235, April 2019. ISSN 1476-4687.18
doi: 10.1038/s41586-019-1049-y. URL19
https://www.nature.com/articles/20
s41586-019-1049-y.21
Gökcen Eraslan, Lukas M. Simon, Maria22
Mircea, Nikola S. Mueller, and Fabian J.23
Theis. Single-cell RNA-seq denois-24
ing using a deep count autoencoder.25
Nature Communications, 10(1):390,26
January 2019. ISSN 2041-1723. doi:27
10.1038/s41467-018-07931-2. URL28
https://www.nature.com/articles/29
s41467-018-07931-2.30
Nuria Estévez-Gómez, Tamara Prieto, Amy31
Guillaumet-Adkins, Holger Heyn, Sonia32
Prado-López, and David Posada. Com-33
parison of single-cell whole-genome am-34
plification strategies. bioRxiv, page35
443754, October 2018. doi: 10.1101/36
443754. URL https://www.biorxiv.org/37
content/10.1101/443754v1.38
Jean Fan, Hae-Ock Lee, Soohyun Lee, Da-39
Eun Ryu, Semin Lee, Catherine Xue,40
Seok Jin Kim, Kihyun Kim, Nikolaos41
Barkas, Peter J Park, Woong-Yang Park, 42
and Peter V Kharchenko. Linking tran- 43
scriptional and genetic tumor heterogeneity 44
through allele analysis of single-cell RNA- 45
seq data. Genome Res., 28(8):1217–1227, 46
August 2018. 47
Jeffrey A. Farrell, Yiqun Wang, Saman- 48
tha J. Riesenfeld, Karthik Shekhar, 49
Aviv Regev, and Alexander F. Schier. 50
Single-cell reconstruction of develop- 51
mental trajectories during zebrafish 52
embryogenesis. Science, 360(6392): 53
eaar3131, June 2018. ISSN 0036-8075, 54
1095-9203. doi: 10.1126/science.aar3131. 55
URL http://science.sciencemag.org/ 56
content/360/6392/eaar3131. 57
Joseph Felsenstein. Evolutionary trees from 58
DNA sequences: A maximum likelihood 59
approach. J. Mol. Evol., 17(6):368–376, 60
1981. 61
Greg Finak, Andrew McDavid, Masanao 62
Yajima, Jingyuan Deng, Vivian Gersuk, 63
Alex K. Shalek, Chloe K. Slichter, Han- 64
nah W. Miller, M. Juliana McElrath, 65
Martin Prlic, Peter S. Linsley, and Raphael 66
Gottardo. MAST: a flexible statistical 67
framework for assessing transcriptional 68
changes and characterizing heterogene- 69
ity in single-cell RNA sequencing data. 70
Genome Biology, 16, 2015. ISSN 1474- 71
7596. doi: 10.1186/s13059-015-0844-5. 72
URL https://www.ncbi.nlm.nih.gov/ 73
pmc/articles/PMC4676162/. 74
Christopher T. Fincher, Omri Wurtzel, 75
Thom de Hoog, Kellie M. Kravarik, and 76
Peter W. Reddien. Cell type transcriptome 77
atlas for the planarian Schmidtea mediter- 78
ranea. Science, 360(6391):eaaq1736, 79
May 2018. ISSN 0036-8075, 1095- 80
9203. doi: 10.1126/science.aaq1736. 81
50
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
URL http://science.sciencemag.org/1
content/360/6391/eaaq1736.2
William Fletcher and Ziheng Yang. The ef-3
fect of insertions, deletions, and alignment4
errors on the branch-site test of positive se-5
lection. Mol. Biol. Evol., 27(10):2257–2267,6
October 2010.7
Jasmine Foo, Kevin Leder, and Franziska Mi-8
chor. Stochastic dynamics of cancer initi-9
ation. Phys. Biol., 8(1):015002, February10
2011.11
Joshua M. Francis, Cheng-Zhong Zhang,12
Cecile L. Maire, Joonil Jung, Veronica E.13
Manzo, Viktor A. Adalsteinsson, Heather14
Homer, Sam Haidar, Brendan Blumenstiel,15
Chandra Sekhar Pedamallu, Azra H.16
Ligon, J. Christopher Love, Matthew17
Meyerson, and Keith L. Ligon. EGFR18
Variant Heterogeneity in Glioblastoma19
Resolved through Single-Nucleus Sequenc-20
ing. Cancer Discovery, 4(8):956–971,21
August 2014. ISSN 2159-8274, 2159-8290.22
doi: 10.1158/2159-8290.CD-13-0879.23
URL http://cancerdiscovery.24
aacrjournals.org/content/4/8/956.25
Saskia Freytag, Luyi Tian, Ingrid Lönnst-26
edt, Milica Ng, and Melanie Bahlo.27
Comparison of clustering tools in R for28
medium-sized 10x Genomics single-cell29
RNA-sequencing data. F1000Research,30
7, December 2018. ISSN 2046-1402. doi:31
10.12688/f1000research.15809.2. URL32
https://www.ncbi.nlm.nih.gov/pmc/33
articles/PMC6124389/.34
Wolf H Fridman, Jérome Galon, Marie-35
Caroline Dieu-Nosjean, Isabelle Cremer,36
Sylvain Fisson, Diane Damotte, Franck37
Pagès, Eric Tartour, and Catherine Sautès-38
Fridman. Immune infiltration in human39
cancer: Prognostic significance and disease40
control. In Glenn Dranoff, editor, Cancer 41
Immunology and Immunotherapy, pages 1– 42
24. Springer Berlin Heidelberg, Berlin, Hei- 43
delberg, 2011. 44
Yusi Fu, Chunmei Li, Sijia Lu, Wenxiong 45
Zhou, Fuchou Tang, X Sunney Xie, and 46
Yanyi Huang. Uniform and accurate 47
single-cell sequencing based on emulsion 48
whole-genome amplification. Proc. Natl. 49
Acad. Sci. U. S. A., 112(38):11923–11928, 50
September 2015. 51
Dan Gao, Feng Jin, Min Zhou, and Yuyang 52
Jiang. Recent advances in single cell ma- 53
nipulation and biochemical analysis on mi- 54
crofluidics. Analyst, 144(3):766–781, Jan- 55
uary 2019. 56
Xin Gao, Deqing Hu, Madelaine Gogol, 57
and Hua Li. ClusterMap: Com- 58
paring analyses across multiple Single 59
Cell RNA-Seq profiles. bioRxiv, page 60
331330, June 2018. doi: 10.1101/ 61
331330. URL https://www.biorxiv.org/ 62
content/10.1101/331330v2. 63
Tyler Garvin, Robert Aboukhalil, Jude 64
Kendall, Timour Baslan, Gurinder S At- 65
wal, James Hicks, Michael Wigler, and 66
Michael C Schatz. Interactive analysis and 67
assessment of single-cell copy-number vari- 68
ations. Nat. Methods, 12(11):1058–1060, 69
November 2015. 70
Charles Gawad, Winston Koh, and Stephen R 71
Quake. Single-cell genome sequencing: cur- 72
rent state of the science. Nat. Rev. Genet., 73
17(3):175–188, March 2016. 74
Farzad Ghaznavi, Andrew Evans, Anant 75
Madabhushi, and Michael Feldman. Digital 76
imaging in pathology: whole-slide imaging 77
and beyond. Annu. Rev. Pathol., 8:331– 78
359, January 2013. 79
51
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Charlotte Giesen, Hao A. O. Wang, Denis1
Schapiro, Nevena Zivanovic, Andrea Ja-2
cobs, Bodo Hattendorf, Peter J. Schüffler,3
Daniel Grolimund, Joachim M. Buhmann,4
Simone Brandt, Zsuzsanna Varga, Peter J.5
Wild, Detlef Günther, and Bernd Boden-6
miller. Highly multiplexed imaging of tu-7
mor tissues with subcellular resolution by8
mass cytometry. Nature Methods, 11(4):9
417–422, April 2014. ISSN 1548-7105. doi:10
10.1038/nmeth.2869. URL https://www.11
nature.com/articles/nmeth.2869.12
Yury Goltsev, Nikolay Samusik, Julia13
Kennedy-Darling, Salil Bhate, Matthew14
Hale, Gustavo Vazquez, Sarah Black, and15
Garry P. Nolan. Deep Profiling of Mouse16
Splenic Architecture with CODEX Multi-17
plexed Imaging. Cell, 174(4):968–981.e15,18
August 2018. ISSN 0092-8674. doi:19
10.1016/j.cell.2018.07.010. URL http:20
//www.sciencedirect.com/science/21
article/pii/S0092867418309048.22
Wuming Gong, Il-Youp Kwak, Pruthvi Pota,23
Naoko Koyano-Nakagawa, and Daniel J.24
Garry. DrImpute: imputing dropout25
events in single cell RNA sequencing26
data. BMC Bioinformatics, 19(1):220, June27
2018. ISSN 1471-2105. doi: 10.1186/28
s12859-018-2226-y. URL https://doi.29
org/10.1186/s12859-018-2226-y.30
R R Gray, O G Pybus, and M Salemi. Mea-31
suring the temporal structure in Serially-32
Sampled phylogenies. Methods Ecol. Evol.,33
2(5):437–445, October 2011.34
Anna Graybeal. Is it better to add taxa or35
characters to a difficult phylogenetic prob-36
lem? Syst. Biol., 47(1):9–17, 1998.37
Christopher Heje Grønbech, Maximil-38
lian Fornitz Vording, Pascal N Timshel,39
Casper Kaae Sønderby, Tune Hannes40
Pers, and Ole Winther. scVAE: Vari- 41
ational auto-encoders for single-cell 42
gene expression data. May 2018. URL 43
https://www.biorxiv.org/content/ 44
early/2018/05/16/318295. 45
Dominic Grün, Lennart Kester, and Alexan- 46
der van Oudenaarden. Validation of noise 47
models for single-cell transcriptomics. Na- 48
ture Methods, 11(6):637–640, June 2014. 49
ISSN 1548-7105. doi: 10.1038/nmeth. 50
2930. URL https://www.nature.com/ 51
articles/nmeth.2930. 52
Martin Guilliams, Charles-Antoine Dutertre, 53
Charlotte L. Scott, Naomi McGovern, 54
Dorine Sichien, Svetoslav Chakarov, Sofie 55
Van Gassen, Jinmiao Chen, Michael 56
Poidinger, Sofie De Prijck, Simon J. 57
Tavernier, Ivy Low, Sergio Erdal Irac, 58
Citra Nurfarah Mattar, Hermi Rizal Suma- 59
toh, Gillian Hui Ling Low, Tam John Kit 60
Chung, Dedrick Kok Hong Chan, Ker Kan 61
Tan, Tony Lim Kiat Hon, Even Fos- 62
sum, Bjarne Bogen, Mahesh Choolani, 63
Jerry Kok Yen Chan, Anis Larbi, Hervé 64
Luche, Sandrine Henri, Yvan Saeys, 65
Evan William Newell, Bart N. Lambrecht, 66
Bernard Malissen, and Florent Ginhoux. 67
Unsupervised High-Dimensional Analysis 68
Aligns Dendritic Cells across Tissues 69
and Species. Immunity, 45(3):669–684, 70
September 2016. ISSN 1074-7613. doi: 71
10.1016/j.immuni.2016.08.015. URL http: 72
//www.sciencedirect.com/science/ 73
article/pii/S1074761316303399. 74
Metin N Gurcan, Laura Boucheron, Ali Can, 75
Anant Madabhushi, Nasir Rajpoot, and 76
Bulent Yener. Histopathological image 77
analysis: A review. IEEE Rev. Biomed. 78
Eng., 2:147, 2009. 79
Hiroshi Haeno, Mithat Gonen, Meghan B 80
Davis, Joseph M Herman, Christine A 81
52
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Iacobuzio-Donahue, and Franziska Michor.1
Computational modeling of pancreatic can-2
cer reveals kinetics of metastasis suggesting3
optimum treatment strategies. Cell, 148(1-4
2):362–375, January 2012.5
Christoph Hafemeister and Rahul Satija. Nor-6
malization and variance stabilization of7
single-cell RNA-seq data using regular-8
ized negative binomial regression. March9
2019. URL https://www.biorxiv.org/10
content/10.1101/576827v2.11
Laleh Haghverdi, Maren Büttner, F. Alexan-12
der Wolf, Florian Buettner, and Fabian J.13
Theis. Diffusion pseudotime robustly14
reconstructs lineage branching. Nature15
Methods, 13(10):845–848, October 2016.16
ISSN 1548-7105. doi: 10.1038/nmeth.17
3971. URL https://www.nature.com/18
articles/nmeth.3971.19
Laleh Haghverdi, Aaron T. L. Lun,20
Michael D. Morgan, and John C. Marioni.21
Batch effects in single-cell RNA-sequencing22
data are corrected by matching mutual23
nearest neighbors. Nature Biotechnology,24
36(5):421–427, 2018. ISSN 1546-1696. doi:25
10.1038/nbt.4091.26
Xiaoping Han, Renying Wang, Yincong Zhou,27
Lijiang Fei, Huiyu Sun, Shujing Lai, Assieh28
Saadatpour, Ziming Zhou, Haide Chen,29
Fang Ye, Daosheng Huang, Yang Xu, Wen-30
tao Huang, Mengmeng Jiang, Xinyi Jiang,31
Jie Mao, Yao Chen, Chenyu Lu, Jin Xie,32
Qun Fang, Yibin Wang, Rui Yue, Tiefeng33
Li, He Huang, Stuart H Orkin, Guo-Cheng34
Yuan, Ming Chen, and Guoji Guo. Map-35
ping the mouse cell atlas by Microwell-Seq.36
Cell, 172(5):1091–1107.e17, February 2018.37
Andreas Heindl, Sidra Nawaz, and Yinyin38
Yuan. Mapping spatial heterogeneity in39
the tumor microenvironment: a new era for40
digital pathology. Lab. Invest., 95(4):377– 41
384, April 2015. 42
Stephanie C. Hicks and Roger D. Peng. 43
Elements and Principles of Data Anal- 44
ysis. arXiv:1903.07639 [stat], March 45
2019. URL http://arxiv.org/abs/1903. 46
07639. arXiv: 1903.07639. 47
Stephanie C. Hicks, F. William Townes, 48
Mingxiang Teng, and Rafael A. Irizarry. 49
Missing data and technical variability 50
in single-cell RNA-sequencing exper- 51
iments. Biostatistics, 19(4):562–578, 52
October 2018. ISSN 1465-4644. doi: 53
10.1093/biostatistics/kxx053. URL https: 54
//academic.oup.com/biostatistics/ 55
article/19/4/562/4599254. 56
Elad Hoffer and Nir Ailon. Deep Metric 57
Learning Using Triplet Network. In Aasa 58
Feragen, Marcello Pelillo, and Marco Loog, 59
editors, Similarity-Based Pattern Recogni- 60
tion, Lecture Notes in Computer Science, 61
pages 84–92. Springer International Pub- 62
lishing, 2015. ISBN 978-3-319-24261-3. 63
Ian H Holmes. Solving the master equation 64
for indels. BMC Bioinformatics, 18(1):255, 65
May 2017. 66
Chung-Chau Hon, Jay W. Shin, Piero Carn- 67
inci, and Michael J. T. Stubbington. 68
The Human Cell Atlas: Technical ap- 69
proaches and challenges. Briefings in 70
Functional Genomics, 17(4):283–294, July 71
2018. ISSN 2041-2649. doi: 10.1093/bfgp/ 72
elx029. URL https://academic.oup. 73
com/bfg/article/17/4/283/4571849. 74
Masahito Hosokawa, Yohei Nishikawa, 75
Masato Kogawa, and Haruko Takeyama. 76
Massively parallel whole genome amplifica- 77
tion for single-cell sequencing using droplet 78
microfluidics. Sci. Rep., 7(1):5199, July 79
2017. 80
53
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Yong Hou, Kui Wu, Xulian Shi, Fuqiang Li,1
Luting Song, Hanjie Wu, Michael Dean,2
Guibo Li, Shirley Tsang, Runze Jiang,3
Xiaolong Zhang, Bo Li, Geng Liu, Ni-4
harika Bedekar, Na Lu, Guoyun Xie, Han5
Liang, Liao Chang, Ting Wang, Jianghao6
Chen, Yingrui Li, Xiuqing Zhang, Huan-7
ming Yang, Xun Xu, Ling Wang, and Jun8
Wang. Comparison of variations detection9
between whole-genome amplification meth-10
ods used in single-cell resequencing. Giga-11
science, 4:37, August 2015.12
Qiwen Hu and Casey S Greene. Param-13
eter tuning is a key part of dimension-14
ality reduction via deep variational au-15
toencoders for single cell RNA transcrip-16
tomics. Pacific Symposium on Biocomput-17
ing. Pacific Symposium on Biocomputing,18
24:362–373, 2019. ISSN 2335-6936, 2335-19
6928. URL https://www.ncbi.nlm.nih.20
gov/pubmed/30963075.21
Lei Huang, Fei Ma, Alec Chapman, Sijia22
Lu, and Xiaoliang Sunney Xie. Single-Cell23
Whole-Genome amplification and sequenc-24
ing: Methodology and applications. Annu.25
Rev. Genomics Hum. Genet., 16:79–102,26
June 2015.27
Mo Huang, Jingshu Wang, Eduardo Torre,28
Hannah Dueck, Sydney Shaffer, Roberto29
Bonasio, John I. Murray, Arjun Raj,30
Mingyao Li, and Nancy R. Zhang.31
SAVER: gene expression recovery for32
single-cell RNA sequencing. Nature Meth-33
ods, 15(7):539, July 2018. ISSN 1548-7105.34
doi: 10.1038/s41592-018-0033-z. URL35
https://www.nature.com/articles/36
s41592-018-0033-z.37
Joanna Hård, Ezeddin Al Hakim, Marie38
Kindblom, Åsa K. Björklund, Bengt39
Sennblad, Ilke Demirci, Marta Paterlini,40
Pedro Reu, Erik Borgström, Patrik L.41
Ståhl, Jakob Michaelsson, Jeff E. Mold, 42
and Jonas Frisén. Conbase: a software 43
for unsupervised discovery of clonal so- 44
matic mutations in single cells through read 45
phasing. Genome Biology, 20(1):68, April 46
2019. ISSN 1474-760X. doi: 10.1186/ 47
s13059-019-1673-8. URL https://doi. 48
org/10.1186/s13059-019-1673-8. 49
T. Höllt, N. Pezzotti, V. van Unen, F. Kon- 50
ing, B. P. F. Lelieveldt, and A. Vilanova. 51
CyteGuide: Visual Guidance for Hierar- 52
chical Single-Cell Analysis. IEEE Trans- 53
actions on Visualization and Computer 54
Graphics, 24(1):739–748, January 2018. 55
ISSN 1077-2626. doi: 10.1109/TVCG.2017. 56
2744318. 57
Giovanni Iacono, Elisabetta Mereu, Amy 58
Guillaumet-Adkins, Roser Corominas, Ivon 59
Cuscó, Gustavo Rodríguez-Esteban, Marta 60
Gut, Luis Alberto Pérez-Jurado, Ivo Gut, 61
and Holger Heyn. bigSCale: an analyti- 62
cal framework for big-scale single-cell data. 63
Genome Res., 28(6):878–890, June 2018. 64
Humayun Irshad, Antoine Veillard, Ludovic 65
Roux, and Daniel Racoceanu. Methods 66
for nuclei detection, segmentation, and 67
classification in digital histopathology: A 68
Review—Current status and future poten- 69
tial, 2014. 70
Martin Jacobsen. Point Process Theory and 71
Applications: Marked Point and Piecewise 72
Deterministic Processes. Springer Science 73
& Business Media, December 2005. 74
Katharina Jahn, Jack Kuipers, and Niko 75
Beerenwinkel. Tree inference for single-cell 76
data. Genome Biol., 17:86, May 2016. 77
Livnat Jerby-Arnon, Nadja Pfetzer, Yedael Y 78
Waldman, Lynn McGarry, Daniel James, 79
Emma Shanks, Brinton Seashore-Ludlow, 80
Adam Weinstock, Tamar Geiger, Paul A 81
54
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Clemons, Eyal Gottlieb, and Eytan Rup-1
pin. Predicting cancer-specific vulnerabil-2
ity via data-driven detection of synthetic3
lethality. Cell, 158(5):1199–1209, August4
2014.5
Zhicheng Ji and Hongkai Ji. TSCAN:6
Pseudo-time reconstruction and evaluation7
in single-cell RNA-seq analysis. Nucleic8
Acids Research, 44(13):e117, 2016. ISSN9
1362-4962. doi: 10.1093/nar/gkw430.10
Nelson Johansen and Gerald Quon. scAlign:11
a tool for alignment, integration and12
rare cell identification from scRNA-seq13
data. bioRxiv, page 504944, March14
2019. doi: 10.1101/504944. URL15
https://www.biorxiv.org/content/10.16
1101/504944v4.17
Brett E Johnson, Tali Mazor, Chibo Hong,18
Michael Barnes, Koki Aihara, Cory Y19
McLean, Shaun D Fouse, Shogo Ya-20
mamoto, Hiroki Ueda, Kenji Tatsuno,21
Saurabh Asthana, Llewellyn E Jalbert,22
Sarah J Nelson, Andrew W Bollen, W Clay23
Gustafson, Elise Charron, William A24
Weiss, Ivan V Smirnov, Jun S Song,25
Adam B Olshen, Soonmee Cha, Yongjun26
Zhao, Richard A Moore, Andrew J27
Mungall, Steven J M Jones, Martin Hirst,28
Marco A Marra, Nobuhito Saito, Hiroyuki29
Aburatani, Akitake Mukasa, Mitchel S30
Berger, Susan M Chang, Barry S Taylor,31
and Joseph F Costello. Mutational anal-32
ysis reveals the origin and therapy-driven33
evolution of recurrent glioma. Science, 34334
(6167):189–193, January 2014.35
Travis S. Johnson, Tongxin Wang, Zhi36
Huang, Christina Y. Yu, Yi Wu, Yatong37
Han, Yan Zhang, Kun Huang, and Jie38
Zhang. LAmbDA: Label Ambiguous39
Domain Adaptation Dataset Integration40
Reduces Batch Effects and Improves41
Subtype Detection. Bioinformatics, April 42
2019. doi: 10.1093/bioinformatics/btz295. 43
URL https://academic.oup.com/ 44
bioinformatics/advance-article/ 45
doi/10.1093/bioinformatics/btz295/ 46
5481958. 47
Altuna Akalin Jonathan Ronen. netsmooth: 48
Network-smoothing based imputation for 49
single cell RNA-seq. F1000Res., 7, 2018. 50
Min Jung, Daniel Wells, Jannette Rusch, 51
Suhaira Ahmad, Jonathan Marchini, Si- 52
mon R Myers, and Donald F Conrad. Uni- 53
fied single-cell analysis of testis gene regu- 54
lation and pathology in five mouse strains. 55
eLife, 8, June 2019. ISSN 2050-084X. 56
doi: 10.7554/eLife.43966. URL http:// 57
dx.doi.org/10.7554/eLife.43966. 58
Melissa R Junttila and Frederic J de Sauvage. 59
Influence of tumour micro-environment 60
heterogeneity on therapeutic response. Na- 61
ture, 501(7467):346–354, September 2013. 62
Hyun Min Kang, Meena Subramaniam, Sasha 63
Targ, Michelle Nguyen, Lenka Maliskova, 64
Elizabeth McCarthy, Eunice Wan, Simon 65
Wong, Lauren Byrnes, Cristina Lanata, 66
Rachel Gate, Sara Mostafavi, Alexan- 67
der Marson, Noah Zaitlen, Lindsey A 68
Criswell, and Chun Jimmie Ye. Multi- 69
plexed droplet single-cell RNA-sequencing 70
using natural genetic variation. Na- 71
ture biotechnology, 36(1):89–94, January 72
2018a. ISSN 1087-0156. doi: 10.1038/nbt. 73
4042. URL https://www.ncbi.nlm.nih. 74
gov/pmc/articles/PMC5784859/. 75
Hyun Min Kang, Meena Subramaniam, Sasha 76
Targ, Michelle Nguyen, Lenka Maliskova, 77
Elizabeth McCarthy, Eunice Wan, Simon 78
Wong, Lauren Byrnes, Cristina M Lanata, 79
Rachel E Gate, Sara Mostafavi, Alexander 80
Marson, Noah Zaitlen, Lindsey A Criswell, 81
55
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
and Chun Jimmie Ye. Multiplexed droplet1
single-cell RNA-sequencing using natural2
genetic variation. Nat. Biotechnol., 36(1):3
89–94, January 2018b.4
Jurrian Kornelis de Kanter, Philip Lijnzaad,5
Tito Candelli, Thanasis Margaritis, and6
Frank Holstege. CHETAH: a selective, hi-7
erarchical cell type identification method8
for single-cell RNA sequencing. bioRxiv,9
page 558908, February 2019. doi: 10.1101/10
558908. URL https://www.biorxiv.org/11
content/10.1101/558908v1.12
Nikos Karaiskos, Philipp Wahle, Jonathan13
Alles, Anastasiya Boltengagen, Salah Ay-14
oub, Claudia Kipar, Christine Kocks, Niko-15
laus Rajewsky, and Robert P Zinzen. The16
drosophila embryo at single-cell transcrip-17
tome resolution. Science, 358(6360):194–18
199, October 2017a.19
Nikos Karaiskos, Philipp Wahle, Jonathan20
Alles, Anastasiya Boltengagen, Salah Ay-21
oub, Claudia Kipar, Christine Kocks, Niko-22
laus Rajewsky, and Robert P. Zinzen. The23
Drosophila embryo at single-cell transcrip-24
tome resolution. Science, 358(6360):194–25
199, October 2017b. ISSN 0036-8075,26
1095-9203. doi: 10.1126/science.aan3235.27
URL http://science.sciencemag.org/28
content/358/6360/194.29
Ino D. Karemaker and Michiel Vermeulen.30
Single-Cell DNA Methylation Profiling:31
Technologies and Biological Applications.32
Trends in Biotechnology, 36(9):952–965,33
September 2018. ISSN 0167-7799, 1879-34
3096. doi: 10.1016/j.tibtech.2018.04.002.35
URL https://www.cell.com/36
trends/biotechnology/abstract/37
S0167-7799(18)30115-X.38
Lennart Kester and Alexander van Oudenaar-39
den. Single-Cell transcriptomics meets lin-40
eage tracing. Cell Stem Cell, May 2018.41
Peter V Kharchenko, Lev Silberstein, and 42
David T Scadden. Bayesian approach 43
to single-cell differential expression analy- 44
sis. Nature Methods, 11(7):740–742, July 45
2014. ISSN 1548-7091. doi: 10.1038/ 46
nmeth.2967. URL http://www.nature. 47
com/doifinder/10.1038/nmeth.2967. 48
Kyu-Tae Kim, Hye Won Lee, Hae-Ock Lee, 49
Sang Cheol Kim, Yun Jee Seo, Woosung 50
Chung, Hye Hyeon Eum, Do-Hyun Nam, 51
Junhyong Kim, Kyeung Min Joo, and 52
Woong-Yang Park. Single-cell mRNA se- 53
quencing identifies subclonal heterogeneity 54
in anti-cancer drug responses of lung ade- 55
nocarcinoma cells. Genome Biol., 16:127, 56
June 2015. 57
Tae-Min Kim, Ruibin Xi, Lovelace J. Lu- 58
quette, Richard W. Park, Mark D. John- 59
son, and Peter J. Park. Functional 60
genomic analysis of chromosomal aber- 61
rations in a compendium of 8000 can- 62
cer genomes. Genome Research, 23(2): 63
217–227, 2013. doi: 10.1101/gr.140301. 64
112. URL http://genome.cshlp.org/ 65
content/23/2/217.abstract. 66
Marek Kimmel and David Axelrod. Branch- 67
ing Processes in Biology. Interdisci- 68
plinary Applied Mathematics. Springer- 69
Verlag, New York, 2 edition, 2015. ISBN 70
978-1-4939-1558-3. URL https://www. 71
springer.com/gp/book/9781493915583. 72
Savvas Kinalis, Finn Cilius Nielsen, Ole 73
Winther, and Frederik Otzen Bagger. 74
Deconvolution of autoencoders to learn bi- 75
ological regulatory modules from single cell 76
mRNA sequencing data. BMC bioinfor- 77
matics, 20(1):379, July 2019. ISSN 1471- 78
2105. doi: 10.1186/s12859-019-2952-9. 79
URL http://dx.doi.org/10.1186/ 80
s12859-019-2952-9. 81
56
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Vladimir Yu Kiselev, Andrew Yiu, and Mar-1
tin Hemberg. scmap: projection of single-2
cell RNA-seq data across data sets. Na-3
ture Methods, 15(5):359–362, May 2018.4
ISSN 1548-7105. doi: 10.1038/nmeth.5
4644. URL https://www.nature.com/6
articles/nmeth.4644.7
Vladimir Yu Kiselev, Tallulah S. Andrews,8
and Martin Hemberg. Challenges in9
unsupervised clustering of single-cell10
RNA-seq data. Nature Reviews Genetics,11
page 1, January 2019. ISSN 1471-0064.12
doi: 10.1038/s41576-018-0088-9. URL13
https://www.nature.com/articles/14
s41576-018-0088-9.15
Allon M. Klein, Linas Mazutis, Ilke Akar-16
tuna, Naren Tallapragada, Adrian Veres,17
Victor Li, Leonid Peshkin, David A. Weitz,18
and Marc W. Kirschner. Droplet barcod-19
ing for single-cell transcriptomics applied20
to embryonic stem cells. Cell, 161(5):1187–21
1201, May 2015. ISSN 1097-4172. doi:22
10.1016/j.cell.2015.04.044.23
C A Klein, O Schmidt-Kittler, J A Schardt,24
K Pantel, M R Speicher, and G Rieth-25
müller. Comparative genomic hybridiza-26
tion, loss of heterozygosity, and DNA se-27
quence analysis of single cells. Proc. Natl.28
Acad. Sci. U. S. A., 96(8):4494–4499, April29
1999.30
Sergey Knyazev, Viachaslau Tsyvina, An-31
drew Melnyk, Alexander Artyomenko, Ta-32
tiana Malygina, Yuri B Porozov, Ellsworth33
Campbell, William M Switzer, Pavel34
Skums, and Alex Zelikovsky. CliqueSNV:35
Scalable reconstruction of Intra-Host viral36
populations from NGS reads, 2018.37
Bryan Kolaczkowski and Joseph W Thornton.38
A mixed branch length model of hetero-39
tachy improves phylogenetic accuracy. Mol.40
Biol. Evol., 25(6):1054–1066, June 2008.41
Daisuke Komura and Shumpei Ishikawa. Ma- 42
chine learning methods for histopatho- 43
logical image analysis. Comput. Struct. 44
Biotechnol. J., 16:34–42, February 2018. 45
Say Li Kong, Huipeng Li, Joyce A Tai, Elise T 46
Courtois, Huay Mei Poh, Dawn Pingxi Lau, 47
Yu Xuan Haw, Narayanan Gopalakrishna 48
Iyer, Daniel Shao Weng Tan, Shyam Prab- 49
hakar, Dave Ruff, and Axel M Hillmer. 50
Concurrent Single-Cell RNA and targeted 51
DNA sequencing on an automated platform 52
for comeasurement of genomic and tran- 53
scriptomic signatures. Clin. Chem., 65(2): 54
272–281, February 2019. 55
Hazal Koptagel, Seong-Hwan Jun, and 56
Jens Lagergren. SCuPhr: A Prob- 57
abilistic Framework for Cell Lineage 58
Tree Reconstruction. bioRxiv, page 59
357442, June 2018. doi: 10.1101/ 60
357442. URL https://www.biorxiv.org/ 61
content/early/2018/06/29/357442. 62
Keegan D Korthauer, Li-Fang Chu, Michael A 63
Newton, Yuan Li, James Thomson, Ron 64
Stewart, and Christina Kendziorski. A sta- 65
tistical approach for identifying differential 66
distributions in single-cell RNA-seq exper- 67
iments. Genome Biol., 17(1):222, October 68
2016a. 69
Keegan D. Korthauer, Li-Fang Chu, 70
Michael A. Newton, Yuan Li, James 71
Thomson, Ron Stewart, and Christina 72
Kendziorski. A statistical approach for 73
identifying differential distributions in 74
single-cell RNA-seq experiments. Genome 75
Biology, 17(1):222, 2016b. ISSN 1474- 76
760X. doi: 10.1186/s13059-016-1077-y. 77
Johannes Köster, Myles Brown, and Xi- 78
aole Shirley Liu. A bayesian model for 79
single cell transcript expression analysis on 80
MERFISH data, September 2017. 81
57
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Dylan Kotliar, Adrian Veres, M Aurel Nagy,1
Shervin Tabrizi, Eran Hodis, Douglas A2
Melton, and Pardis C Sabeti. Identify-3
ing gene expression programs of cell-type4
identity and cellular activity with single-5
cell RNA-Seq. Elife, 8:e43803, July 2019.6
Alexey M. Kozlov, Diego Darriba, Tomáš7
Flouri, Benoit Morel, and Alexan-8
dros Stamatakis. RAxML-NG: a fast,9
scalable and user-friendly tool for10
maximum likelihood phylogenetic in-11
ference. Bioinformatics, May 2019.12
doi: 10.1093/bioinformatics/btz305.13
URL https://academic.oup.com/14
bioinformatics/advance-article/15
doi/10.1093/bioinformatics/btz305/16
5487384.17
O Kozlov. Models, Optimizations, and18
Tools for Large-Scale Phylogenetic Infer-19
ence, Handling Sequence Uncertainty, and20
Taxonomic Validation. PhD thesis, Karl-21
sruhe Institute of Technology (KIT), Octo-22
ber 2018.23
Sergey Kryazhimskiy and Joshua B Plotkin.24
The population genetics of dN/dS. PLoS25
Genet., 4(12):e1000304, December 2008.26
Jack Kuipers, Katharina Jahn, Benjamin J27
Raphael, and Niko Beerenwinkel. Single-28
cell sequencing data reveal widespread re-29
currence and loss of mutational hits in the30
life histories of tumors. Genome Res., 2731
(11):1885–1894, November 2017.32
Johannes Köster, Louis Dijkstra, Tobias33
Marschall, and Alexander Schönhuth.34
Enhancing sensitivity and controlling35
false discovery rate in somatic indel36
discovery. bioRxiv, page 741256, Au-37
gust 2019. doi: 10.1101/741256. URL38
https://www.biorxiv.org/content/10.39
1101/741256v1.40
Emma Laks, Hans Zahn, Daniel Lai, Andrew 41
McPherson, Adi Steif, Jazmine Brimhall, 42
Justina Biele, Beixi Wang, Tehmina 43
Masud, Diljot Grewal, Cydney Nielsen, 44
Samantha Leung, Viktoria Bojilova, Maia 45
Smith, Oleg Golovko, Steven Poon, Peter 46
Eirew, Farhia Kabeer, Teresa Ruiz de Al- 47
gara, So Ra Lee, M. Jafar Taghiyar, Curtis 48
Huebner, Jessica Ngo, Tim Chan, Spencer 49
Vatrt-Watts, Pascale Walters, Nafis Abrar, 50
Sophia Chan, Matt Wiens, Lauren Mar- 51
tin, R. Wilder Scott, Michael T. Under- 52
hill, Elizabeth Chavez, Christian Steidl, 53
Daniel Da Costa, Yusanne Ma, Robin J. N. 54
Coope, Richard Corbett, Stephen Plea- 55
sance, Richard Moore, Andy J. Mungall, 56
Cruk Imaxt Consortium, Marco A. Marra, 57
Carl Hansen, Sohrab Shah, and Samuel 58
Aparicio. Resource: Scalable whole genome 59
sequencing of 40,000 single cells identifies 60
stochastic aneuploidies, genome replication 61
states and clonal repertoires. bioRxiv, page 62
411058, September 2018. doi: 10.1101/ 63
411058. URL https://www.biorxiv.org/ 64
content/early/2018/09/13/411058. 65
Ruben T H M Larue, Gilles Defraene, 66
Dirk De Ruysscher, Philippe Lambin, and 67
Wouter van Elmpt. Quantitative radiomics 68
studies for tissue characterization: a re- 69
view of technology and methodological pro- 70
cedures. Br. J. Radiol., 90(1070):20160665, 71
February 2017. 72
Si Quang Le, Cuong Cao Dang, and Olivier 73
Gascuel. Modeling protein evolution with 74
several amino acid replacement matrices 75
depending on site rates. Mol. Biol. Evol., 76
29(10):2921–2936, October 2012. 77
Adam D Leaché, Barbara L Banbury, Joseph 78
Felsenstein, Adrián Nieto-Montes de Oca, 79
and Alexandros Stamatakis. Short tree, 80
long tree, right tree, wrong tree: New ac- 81
quisition bias corrections for inferring SNP 82
58
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
phylogenies. Syst. Biol., 64(6):1032–1047,1
2015.2
Je Hyuk Lee, Evan R Daugharthy, Jonathan3
Scheiman, Reza Kalhor, Thomas C Fer-4
rante, Richard Terry, Brian M Turczyk,5
Joyce L Yang, Ho Suk Lee, John Aach, Kun6
Zhang, and George M Church. Fluorescent7
in situ sequencing (FISSEQ) of RNA for8
gene expression profiling in intact cells and9
tissues. Nat. Protoc., 10(3):442–458, March10
2015.11
Jeffrey T. Leek, Robert B. Scharpf, Héc-12
tor Corrada Bravo, David Simcha, Ben-13
jamin Langmead, W. Evan Johnson, Don-14
ald Geman, Keith Baggerly, and Rafael A.15
Irizarry. Tackling the widespread and16
critical impact of batch effects in high-17
throughput data. Nature Reviews Genetics,18
11(10):733–739, October 2010. ISSN 1471-19
0064. doi: 10.1038/nrg2825. URL https:20
//www.nature.com/articles/nrg2825.21
A C Leote, X Wu, and A Beyer. Network-22
based imputation of dropouts in single-cell23
RNA sequencing data. bioRxiv, 2019.24
Wei Vivian Li and Jingyi Jessica Li. An accu-25
rate and robust imputation method scim-26
pute for single-cell RNA-seq data. Nat.27
Commun., 9(1):997, March 2018.28
Yuval Lieberman, Lior Rokach, and Tal29
Shay. CaSTLe – Classification of single30
cells by transfer learning: Harnessing31
the power of publicly available single cell32
RNA sequencing experiments to annotate33
new experiments. PLOS ONE, 13(10):34
e0205499, October 2018. ISSN 1932-6203.35
doi: 10.1371/journal.pone.0205499. URL36
https://journals.plos.org/plosone/37
article?id=10.1371/journal.pone.38
0205499.39
Chieh Lin, Siddhartha Jain, Hannah Kim, 40
and Ziv Bar-Joseph. Using neural networks 41
for reducing the dimensions of single-cell 42
RNA-Seq data. Nucleic Acids Res., 45(17): 43
e156, September 2017a. 44
Peijie Lin, Michael Troup, and Joshua W. K. 45
Ho. CIDR: Ultrafast and accurate clus- 46
tering through imputation for single-cell 47
RNA-seq data. Genome Biology, 18(1):59, 48
March 2017b. ISSN 1474-760X. doi: 10. 49
1186/s13059-017-1188-0. URL https:// 50
doi.org/10.1186/s13059-017-1188-0. 51
G C Linderman, J Zhao, and Y Kluger. Zero- 52
preserving imputation of scRNA-seq data 53
using low-rank approximation. bioRxiv, 54
2018. URL https://www.biorxiv.org/ 55
content/10.1101/397588v1.abstract. 56
Liang Liu, Zhenxiang Xi, Shaoyuan Wu, 57
Charles C Davis, and Scott V Edwards. Es- 58
timating phylogenetic trees from genome- 59
scale data. Ann. N. Y. Acad. Sci., 1360: 60
36–53, December 2015. 61
Jackson Loper, Trygve Bakken, Uygar 62
Sumbul, Gabe Murphy, Hongkui Zeng, 63
David Blei, and Liam Paninski. The 64
Markov link method: a nonparametric 65
approach to combine observations from 66
multiple experiments. bioRxiv, page 67
457283, January 2019. doi: 10.1101/ 68
457283. URL https://www.biorxiv.org/ 69
content/10.1101/457283v3. 70
Romain Lopez, Jeffrey Regier, Michael B 71
Cole, Michael I Jordan, and Nir Yosef. 72
Deep generative modeling for single-cell 73
transcriptomics. Nature methods, 15 74
(12):1053–1058, December 2018. ISSN 75
1548-7091, 1548-7105. doi: 10.1038/ 76
s41592-018-0229-2. URL http://dx.doi. 77
org/10.1038/s41592-018-0229-2. 78
59
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Eric Lubeck, Ahmet F Coskun, Timur1
Zhiyentayev, Mubhij Ahmad, and Long2
Cai. Single-cell in situ RNA profiling by3
sequential hybridization. Nat. Methods, 114
(4):360–361, April 2014.5
Aaron T L Lun and John C Marioni. Over-6
coming confounding plate effects in dif-7
ferential expression analyses of single-cell8
RNA-seq data. Biostatistics, 18(3):451–9
464, July 2017.10
Aaron T L Lun, Karsten Bach, and John C11
Marioni. Pooling across cells to normalize12
single-cell RNA sequencing data with many13
zero counts. Genome Biol., 17:75, April14
2016.15
Aaron T L Lun, Arianne C Richard, and16
John C Marioni. Testing for differential17
abundance in mass cytometry data. Nat.18
Methods, 14(7):707–709, July 2017.19
Tao Luo, Lei Fan, Rong Zhu, and Dong Sun.20
Microfluidic Single-Cell manipulation and21
analysis: Methods and applications. Micro-22
machines (Basel), 10(2), February 2019.23
Iain C. Macaulay, Mabel J. Teng, Wilfried24
Haerty, Parveen Kumar, Chris P. Ponting,25
and Thierry Voet. Separation and paral-26
lel sequencing of the genomes and tran-27
scriptomes of single cells using G&T-28
seq. Nature Protocols, 11(11):2081–2103,29
November 2016. ISSN 1750-2799. doi: 10.30
1038/nprot.2016.138. URL https://www.31
nature.com/articles/nprot.2016.138.32
Iain C. Macaulay, Chris P. Ponting, and33
Thierry Voet. Single-Cell Multiomics:34
Multiple Measurements from Single35
Cells. Trends in Genetics, 33(2):155–36
168, February 2017. ISSN 0168-9525.37
doi: 10.1016/j.tig.2016.12.003. URL38
https://www.ncbi.nlm.nih.gov/pmc/39
articles/PMC5303816/.40
Evan Z. Macosko, Anindita Basu, Rahul 41
Satija, James Nemesh, Karthik Shekhar, 42
Melissa Goldman, Itay Tirosh, Allison R. 43
Bialas, Nolan Kamitaki, Emily M. Marter- 44
steck, John J. Trombetta, David A. Weitz, 45
Joshua R. Sanes, Alex K. Shalek, Aviv 46
Regev, and Steven A. McCarroll. Highly 47
Parallel Genome-wide Expression Profil- 48
ing of Individual Cells Using Nanoliter 49
Droplets. Cell, 161(5):1202–1214, May 50
2015. ISSN 1097-4172. doi: 10.1016/j.cell. 51
2015.05.002. 52
Serghei Mangul, Lana S. Martin, Brian L. 53
Hill, Angela Ka-Mei Lam, Margaret G. 54
Distler, Alex Zelikovsky, Eleazar Es- 55
kin, and Jonathan Flint. Systematic 56
benchmarking of omics computational 57
tools. Nature Communications, 10(1): 58
1393, March 2019. ISSN 2041-1723. 59
doi: 10.1038/s41467-019-09406-4. URL 60
https://www.nature.com/articles/ 61
s41467-019-09406-4. 62
Gioele La Manno, Ruslan Soldatov, Amit 63
Zeisel, Emelie Braun, Hannah Hochgerner, 64
Viktor Petukhov, Katja Lidschreiber, 65
Maria E. Kastriti, Peter Lönnerberg, 66
Alessandro Furlan, Jean Fan, Lars E. 67
Borm, Zehua Liu, David van Bruggen, 68
Jimin Guo, Xiaoling He, Roger Barker, 69
Erik Sundström, Gonçalo Castelo-Branco, 70
Patrick Cramer, Igor Adameyko, Sten 71
Linnarsson, and Peter V. Kharchenko. 72
RNA velocity of single cells. Nature, 560 73
(7719):494, August 2018. ISSN 1476-4687. 74
doi: 10.1038/s41586-018-0414-6. URL 75
https://www.nature.com/articles/ 76
s41586-018-0414-6. 77
Erik A Martens, Rumen Kostadinov, Carlo C 78
Maley, and Oskar Hallatschek. Spatial 79
structure increases the waiting time for can- 80
cer. New J. Phys., 13, November 2011. 81
60
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Dariusz Matlak and Ewa Szczurek. Epista-1
sis in genomic and survival data of can-2
cer patients. PLoS Comput. Biol., 13(7):3
e1005626, July 2017.4
Davis James McCarthy, Raghd Rostom,5
Yuanhua Huang, Daniel J. Kunz, Petr6
Danecek, Marc Jan Bonder, Tzachi Hagai,7
HipSci Consortium, Wenyi Wang, Daniel J.8
Gaffney, Benjamin D. Simons, Oliver Ste-9
gle, and Sarah A. Teichmann. Cardelino:10
Integrating whole exomes and single-cell11
transcriptomes to reveal phenotypic im-12
pact of somatic variants. bioRxiv, page13
413047, November 2018. doi: 10.1101/14
413047. URL https://www.biorxiv.org/15
content/10.1101/413047v2.16
Nicholas McGranahan and Charles Swanton.17
Clonal heterogeneity and tumor evolution:18
Past, present, and the future. Cell, 168(4):19
613–628, February 2017.20
Chiara Medaglia, Amir Giladi, Liat Stoler-21
Barak, Marco De Giovanni, Tomer Meir22
Salame, Adi Biram, Eyal David, Han-23
jie Li, Matteo Iannacone, Ziv Shul-24
man, and Ido Amit. Spatial recon-25
struction of immune niches by combin-26
ing photoactivatable reporters and scRNA-27
seq. Science, 358(6370):1622–1626, De-28
cember 2017. ISSN 0036-8075, 1095-29
9203. doi: 10.1126/science.aao4277.30
URL http://science.sciencemag.org/31
content/358/6370/1622.32
Jing Meng and Yi-Ping Phoebe Chen. A33
database of simulated tumor genomes to-34
wards accurate detection of somatic small35
variants in cancer. PLoS One, 13(8):36
e0202982, August 2018.37
Christopher R. Merritt, Giang T. Ong,38
Sarah Church, Kristi Barker, Gary Geiss,39
Margaret Hoang, Jaemyeong Jung, Yan40
Liang, Jill McKay-Fleisch, Karen Nguyen, 41
Kristina Sorg, Isaac Sprague, Charles War- 42
ren, Sarah Warren, Zoey Zhou, Daniel R. 43
Zollinger, Dwayne L. Dunaway, Gordon B. 44
Mills, and Joseph M. Beechem. High 45
multiplex, digital spatial profiling of pro- 46
teins and RNA in fixed tissue using ge- 47
nomic detection methods. bioRxiv, page 48
559021, February 2019. doi: 10.1101/ 49
559021. URL https://www.biorxiv.org/ 50
content/10.1101/559021v2. 51
Zhun Miao, Jiaqi Li, and Xuegong Zhang. 52
screcover: Discriminating true and false ze- 53
ros in single-cell RNA-seq data for imputa- 54
tion. June 2019. 55
Franziska Michor, Yoh Iwasa, and Martin A 56
Nowak. Dynamics of cancer progression. 57
Nat. Rev. Cancer, 4(3):197–205, March 58
2004. 59
Jeffrey R Moffitt, Junjie Hao, Dhanan- 60
jay Bambah-Mukku, Tian Lu, Cather- 61
ine Dulac, and Xiaowei Zhuang. High- 62
performance multiplexed fluorescence in 63
situ hybridization in culture and tissue with 64
matrix imprinting and clearing. Proc. Natl. 65
Acad. Sci. U. S. A., 113(50):14456–14461, 66
December 2016. 67
Jeffrey R. Moffitt, Dhananjay Bambah- 68
Mukku, Stephen W. Eichhorn, Eric 69
Vaughn, Karthik Shekhar, Julio D. 70
Perez, Nimrod D. Rubinstein, Junjie 71
Hao, Aviv Regev, Catherine Dulac, 72
and Xiaowei Zhuang. Molecular, spa- 73
tial, and functional single-cell profiling 74
of the hypothalamic preoptic region. 75
Science, 362(6416):eaau5324, Novem- 76
ber 2018. ISSN 0036-8075, 1095-9203. 77
doi: 10.1126/science.aau5324. URL 78
http://science.sciencemag.org/ 79
content/362/6416/eaau5324. 80
61
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Kevin R Moon, Jay S Stanley, Daniel1
Burkhardt, David van Dijk, Guy Wolf, and2
Smita Krishnaswamy. Manifold learning-3
based methods for analyzing single-cell4
RNA-sequencing data. Current Opinion in5
Systems Biology, 7:36–46, 2018.6
Marmar Moussa and Ion I. Măndoiu.7
Locality Sensitive Imputation for Sin-8
gle Cell RNA-Seq Data. Journal of9
Computational Biology, February 2019.10
doi: 10.1089/cmb.2018.0236. URL11
https://www.liebertpub.com/doi/10.12
1089/cmb.2018.0236.13
Nature Methods, 2013. Method of the year14
2013. Nature Methods, 11:1 EP –, 1215
2013. URL https://doi.org/10.1038/16
nmeth.2801.17
Nicholas Navin, Jude Kendall, Jennifer Troge,18
Peter Andrews, Linda Rodgers, Jeanne19
McIndoo, Kerry Cook, Asya Stepan-20
sky, Dan Levy, Diane Esposito, Lakshmi21
Muthuswamy, Alex Krasnitz, W Richard22
McCombie, James Hicks, and Michael23
Wigler. Tumour evolution inferred by24
single-cell sequencing. Nature, 472(7341):25
90–94, April 2011.26
Richard A Neher, Colin A Russell, and Boris I27
Shraiman. Predicting evolution from the28
shape of genealogical trees. Elife, 3, Novem-29
ber 2014.30
Malgorzata Nowicka, Carsten Krieg, Lukas M31
Weber, Felix J Hartmann, Silvia Guglietta,32
Burkhard Becher, Mitchell P Levesque,33
and Mark D Robinson. CyTOF work-34
flow: differential discovery in high-35
throughput high-dimensional cytometry36
datasets. F1000Res., 6:748, May 2017.37
Huw A Ogilvie, Remco R Bouckaert, and38
Alexei J Drummond. StarBEAST2 brings39
faster species tree inference and accurate40
estimates of substitution rates. Mol. Biol. 41
Evol., 34(8):2101–2114, August 2017. 42
J Guillermo Paez, Ming Lin, Rameen 43
Beroukhim, Jeffrey C Lee, Xiaojun Zhao, 44
Daniel J Richter, Stacey Gabriel, Paula 45
Herman, Hidefumi Sasaki, David Alt- 46
shuler, Cheng Li, Matthew Meyerson, and 47
William R Sellers. Genome coverage and 48
sequence fidelity of phi29 polymerase-based 49
multiple strand displacement whole genome 50
amplification. Nucleic Acids Res., 32(9): 51
e71, May 2004. 52
Jong-Eun Park, Krzysztof Polanski, Kerstin 53
Meyer, and Sarah A. Teichmann. Fast 54
Batch Alignment of Single Cell Transcrip- 55
tomes Unifies Multiple Mouse Cell Atlases 56
into an Integrated Landscape. bioRxiv, 57
page 397042, August 2018. doi: 10.1101/ 58
397042. URL https://www.biorxiv.org/ 59
content/10.1101/397042v2. 60
Anoop P Patel, Itay Tirosh, John J Trom- 61
betta, Alex K Shalek, Shawn M Gille- 62
spie, Hiroaki Wakimoto, Daniel P Cahill, 63
Brian V Nahed, William T Curry, Robert L 64
Martuza, David N Louis, Orit Rozenblatt- 65
Rosen, Mario L Suvà, Aviv Regev, and 66
Bradley E Bernstein. Single-cell RNA- 67
seq highlights intratumoral heterogeneity 68
in primary glioblastoma. Science, 344 69
(6190):1396–1401, June 2014. 70
Tao Peng, Qin Zhu, Penghang Yin, and 71
Kai Tan. SCRABBLE: single-cell RNA- 72
seq imputation constrained by bulk RNA- 73
seq data. Genome biology, 20(1):88, May 74
2019. ISSN 1465-6906. doi: 10.1186/ 75
s13059-019-1681-8. URL http://dx.doi. 76
org/10.1186/s13059-019-1681-8. 77
N. Pezzotti, T. Höllt, B. Lelieveldt, E. Eise- 78
mann, and A. Vilanova. Hierarchical 79
Stochastic Neighbor Embedding. Com- 80
puter Graphics Forum, 35(3):21–30, 2016. 81
62
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
ISSN 1467-8659. doi: 10.1111/cgf.1
12878. URL https://onlinelibrary.2
wiley.com/doi/abs/10.1111/cgf.12878.3
Ángel J Picher, Bettina Budeus, Oliver4
Wafzig, Carola Krüger, Sara García-5
Gómez, María I Martínez-Jiménez, Al-6
berto Díaz-Talavera, Daniela Weber, Luis7
Blanco, and Armin Schneider. TruePrime8
is a novel method for whole-genome am-9
plification from single cells based on Tth-10
PrimPol. Nat. Commun., 7:13296, Novem-11
ber 2016.12
Emma Pierson and Christopher Yau.13
ZIFA: Dimensionality reduction for14
zero-inflated single-cell gene expres-15
sion analysis. Genome Biology, 1616
(1):241, November 2015. ISSN 1474-17
760X. doi: 10.1186/s13059-015-0805-z.18
URL https://doi.org/10.1186/19
s13059-015-0805-z.20
Mireya Plass, Jordi Solana, F. Alexan-21
der Wolf, Salah Ayoub, Aristotelis Mi-22
sios, Petar Glažar, Benedikt Obermayer,23
Fabian J. Theis, Christine Kocks, and Niko-24
laus Rajewsky. Cell type atlas and lineage25
tree of a whole complex animal by single-26
cell transcriptomics. Science, 360(6391):27
eaaq1723, May 2018. ISSN 0036-8075,28
1095-9203. doi: 10.1126/science.aaq1723.29
URL http://science.sciencemag.org/30
content/360/6391/eaaq1723.31
Hannah A. Pliner, Jonathan S. Packer,32
José L. McFaline-Figueroa, Darren A.33
Cusanovich, Riza M. Daza, Delasa34
Aghamirzaie, Sanjay Srivatsan, Xiaojie35
Qiu, Dana Jackson, Anna Minkina, An-36
drew C. Adey, Frank J. Steemers, Jay37
Shendure, and Cole Trapnell. Cicero38
Predicts cis-Regulatory DNA Interactions39
from Single-Cell Chromatin Accessi-40
bility Data. Molecular Cell, 71(5):41
858–871.e8, 2018. ISSN 1097-4164. doi: 42
10.1016/j.molcel.2018.06.044. 43
Olivier Poirion, Xun Zhu, Travers Ching, and 44
Lana X. Garmire. Using single nucleotide 45
variations in single-cell RNA-seq to identify 46
subpopulations and genotype-phenotype 47
linkage. Nature Communications, 9(1): 48
4892, November 2018. ISSN 2041-1723. 49
doi: 10.1038/s41467-018-07170-5. URL 50
https://www.nature.com/articles/ 51
s41467-018-07170-5. 52
David D Pollock, Derrick J Zwickl, Jimmy A 53
McGuire, and David M Hillis. Increased 54
taxon sampling is advantageous for phylo- 55
genetic inference. Syst. Biol., 51(4):664– 56
671, August 2002. 57
Vladimir Potapov and Jennifer L Ong. Ex- 58
amining sources of error in PCR by Single- 59
Molecule sequencing. PLoS One, 12(1): 60
e0169774, January 2017. 61
Xiaojie Qiu, Qi Mao, Ying Tang, Li Wang, 62
Raghav Chawla, Hannah A. Pliner, and 63
Cole Trapnell. Reversed graph embed- 64
ding resolves complex single-cell trajecto- 65
ries. Nature Methods, 14(10):979–982, Oc- 66
tober 2017. ISSN 1548-7105. doi: 10.1038/ 67
nmeth.4402. URL https://www.nature. 68
com/articles/nmeth.4402. 69
Bruce Rannala and Ziheng Yang. Efficient 70
bayesian species tree inference under the 71
multispecies coalescent. Syst. Biol., 66(5): 72
823–842, September 2017. 73
Benjamin Redelings. Erasing errors due to 74
alignment ambiguity when estimating posi- 75
tive selection. Mol. Biol. Evol., 31(8):1979– 76
1993, August 2014. 77
Aviv Regev, Sarah A. Teichmann, Eric S. 78
Lander, Ido Amit, Christophe Benoist, 79
Ewan Birney, Bernd Bodenmiller, Peter 80
63
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Campbell, Piero Carninci, Menna Clat-1
worthy, Hans Clevers, Bart Deplancke,2
Ian Dunham, James Eberwine, Roland3
Eils, Wolfgang Enard, Andrew Farmer,4
Lars Fugger, Berthold Göttgens, Nir Ha-5
cohen, Muzlifah Haniffa, Martin Hem-6
berg, Seung Kim, Paul Klenerman, Arnold7
Kriegstein, Ed Lein, Sten Linnarsson,8
Joakim Lundeberg, Partha Majumder,9
John C. Marioni, Miriam Merad, Musa10
Mhlanga, Martijn Nawijn, Mihai Netea,11
Garry Nolan, Dana Pe’er, Anthony Philli-12
pakis, Chris P. Ponting, Steve Quake,13
Wolf Reik, Orit Rozenblatt-Rosen, Joshua14
Sanes, Rahul Satija, Ton N. Schumacher,15
Alex Shalek, Ehud Shapiro, Padmanee16
Sharma, Jay W. Shin, Oliver Stegle,17
Michael Stratton, Michael J. T. Stubbing-18
ton, Alexander van Oudenaarden, Allon19
Wagner, Fiona Watt, Jonathan Weissman,20
Barbara Wold, Ramnik Xavier, Nir Yosef,21
and the Human Cell Atlas Meeting Partic-22
ipants. The Human Cell Atlas. bioRxiv,23
page 121202, May 2017. doi: 10.1101/24
121202. URL https://www.biorxiv.org/25
content/10.1101/121202v1.26
John E. Reid and Lorenz Wernisch. Pseu-27
dotime estimation: deconfounding single28
cell time series. Bioinformatics, 32(19):29
2973–2980, October 2016. ISSN 1367-30
4803. doi: 10.1093/bioinformatics/btw372.31
URL https://academic.oup.com/32
bioinformatics/article/32/19/2973/33
2196633.34
Stephen Reid, Jonathan Taylor, and Robert35
Tibshirani. A general framework for esti-36
mation and inference from clusters of fea-37
tures. J. Am. Stat. Assoc., 113(521):280–38
293, January 2018.39
Davide Risso, Fanny Perraudeau, Svet-40
lana Gribkova, Sandrine Dudoit, and41
Jean-Philippe Vert. A general and42
flexible method for signal extraction 43
from single-cell RNA-seq data. Na- 44
ture Communications, 9(1):284, Jan- 45
uary 2018. ISSN 2041-1723. doi: 46
10.1038/s41467-017-02554-5. URL 47
https://www.nature.com/articles/ 48
s41467-017-02554-5. 49
Elena Rivas and Sean R Eddy. Probabilis- 50
tic phylogenetic inference with insertions 51
and deletions. PLoS Comput. Biol., 4(9): 52
e1000172, September 2008. 53
Abbas H. Rizvi, Pablo G. Camara, Elena K. 54
Kandror, Thomas J. Roberts, Ira Schieren, 55
Tom Maniatis, and Raul Rabadan. Single- 56
cell topological RNA-seq analysis reveals 57
insights into cellular differentiation and de- 58
velopment. Nature Biotechnology, 35(6): 59
551–560, 2017. ISSN 1546-1696. doi: 10. 60
1038/nbt.3854. 61
Simone Rizzetto, Auda A Eltahla, Peijie Lin, 62
Rowena Bull, Andrew R Lloyd, Joshua 63
W K Ho, Vanessa Venturi, and Fabio Lu- 64
ciani. Impact of sequencing depth and read 65
length on single cell RNA sequencing data 66
of T cells. Sci. Rep., 7(1):12781, October 67
2017. 68
S Roch. A short proof that phylogenetic tree 69
reconstruction by maximum likelihood is 70
hard. IEEE/ACM Trans. Comput. Biol. 71
Bioinform., 3(1):92–94, 2006. 72
Samuel G. Rodriques, Robert R. Stick- 73
els, Aleksandrina Goeva, Carly A. Mar- 74
tin, Evan Murray, Charles R. Vander- 75
burg, Joshua Welch, Linlin M. Chen, Fei 76
Chen, and Evan Z. Macosko. Slide- 77
seq: A scalable technology for measur- 78
ing genome-wide expression at high spa- 79
tial resolution. Science, 363(6434):1463– 80
1467, March 2019. ISSN 0036-8075, 81
1095-9203. doi: 10.1126/science.aaw1219. 82
64
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
URL https://science.sciencemag.org/1
content/363/6434/1463.2
Florian Rohart, Aida Eslami, Nicholas Mati-3
gian, Stéphanie Bougeard, and Kim-4
Anh Lê Cao. MINT: a multivari-5
ate integrative method to identify re-6
producible molecular signatures across7
independent experiments and platforms.8
BMC Bioinformatics, 18(1):128, February9
2017a. ISSN 1471-2105. doi: 10.1186/10
s12859-017-1553-8. URL https://doi.11
org/10.1186/s12859-017-1553-8.12
Florian Rohart, Benoît Gautier, Amrit13
Singh, and Kim-Anh Lê Cao. mixOmics:14
An R package for ‘omics feature se-15
lection and multiple data integration.16
PLOS Computational Biology, 13(11):17
e1005752, November 2017b. ISSN 1553-18
7358. doi: 10.1371/journal.pcbi.1005752.19
URL https://journals.plos.org/20
ploscompbiol/article?id=10.1371/21
journal.pcbi.1005752.22
Alexander B. Rosenberg, Charles M. Roco,23
Richard A. Muscat, Anna Kuchina, Paul24
Sample, Zizhen Yao, Lucas T. Graybuck,25
David J. Peeler, Sumit Mukherjee, Wei26
Chen, Suzie H. Pun, Drew L. Sellers,27
Bosiljka Tasic, and Georg Seelig. Single-28
cell profiling of the developing mouse brain29
and spinal cord with split-pool barcoding.30
Science, 360(6385):176–182, April 2018.31
ISSN 0036-8075, 1095-9203. doi: 10.1126/32
science.aam8999. URL http://science.33
sciencemag.org/content/360/6385/176.34
Edith M Ross and Florian Markowetz. On-35
coNEM: inferring tumor evolution from36
single-cell sequencing data. Genome Biol.,37
17:69, April 2016.38
Andrew Roth, Andrew McPherson, Emma39
Laks, Justina Biele, Damian Yap, Adrian40
Wan, Maia A Smith, Cydney B Nielsen, 41
Jessica N McAlpine, Samuel Aparicio, 42
Alexandre Bouchard-Côté, and Sohrab P 43
Shah. Clonal genotype and population 44
structure inference from single-cell tumor 45
sequencing. Nat. Methods, 13(7):573–576, 46
July 2016. 47
Adela Saco, Jose Ramírez, Natalia Rakislova, 48
Aurea Mira, and Jaume Ordi. Validation of 49
Whole-Slide imaging for histolopathogical 50
diagnosis: Current state. Pathobiology, 83 51
(2-3):89–98, April 2016. 52
Wouter Saelens, Robrecht Cannoodt, 53
Helena Todorov, and Yvan Saeys. A 54
comparison of single-cell trajectory in- 55
ference methods. Nature Biotechnology, 56
page 1, April 2019. ISSN 1546-1696. 57
doi: 10.1038/s41587-019-0071-9. URL 58
https://www.nature.com/articles/ 59
s41587-019-0071-9. 60
Yvan Saeys, Sofie Van Gassen, and Bart N. 61
Lambrecht. Computational flow cytom- 62
etry: helping to make sense of high- 63
dimensional immunology data. Nature 64
Reviews Immunology, 16(7):449–462, July 65
2016. ISSN 1474-1741. doi: 10.1038/nri. 66
2016.56. URL https://www.nature.com/ 67
articles/nri.2016.56. 68
Stefano Santaguida, Amelia Richardson, 69
Divya Ramalingam Iyer, Ons M’Saad, 70
Lauren Zasadil, Kristin A. Knouse, 71
Yao Liang Wong, Nicholas Rhind, Arshad 72
Desai, and Angelika Amon. Chromosome 73
Mis-segregation Generates Cell-Cycle- 74
Arrested Cells with Complex Karyotypes 75
that Are Eliminated by the Immune 76
System. Developmental Cell, 41(6): 77
638–651.e5, June 2017. ISSN 15345807. 78
doi: 10.1016/j.devcel.2017.05.022. URL 79
https://linkinghub.elsevier.com/ 80
retrieve/pii/S1534580717304306. 81
65
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Gryte Satas and Benjamin J Raphael. Haplo-1
type phasing in single-cell DNA-sequencing2
data. Bioinformatics, 34(13):i211–i217,3
July 2018.4
Rahul Satija, Jeffrey A Farrell, David Gen-5
nert, Alexander F Schier, and Aviv Regev.6
Spatial reconstruction of single-cell gene ex-7
pression data. Nat. Biotechnol., 33(5):495–8
502, May 2015.9
Kenta Sato, Koki Tsuyuzaki, Kentaro10
Shimizu, and Itoshi Nikaido. CellFish-11
ing.jl: an ultrafast and scalable cell12
search method for single-cell RNA sequenc-13
ing. Genome Biology, 20(1):31, February14
2019. ISSN 1474-760X. doi: 10.1186/15
s13059-019-1639-x. URL https://doi.16
org/10.1186/s13059-019-1639-x.17
Arpiar Saunders, Evan Z. Macosko, Alec18
Wysoker, Melissa Goldman, Fenna M.19
Krienen, Heather de Rivera, Elizabeth20
Bien, Matthew Baum, Laura Bortolin,21
Shuyu Wang, Aleksandrina Goeva, James22
Nemesh, Nolan Kamitaki, Sara Brum-23
baugh, David Kulp, and Steven A.24
McCarroll. Molecular Diversity and Spe-25
cializations among the Cells of the Adult26
Mouse Brain. Cell, 174(4):1015–1030.e16,27
August 2018. ISSN 0092-8674. doi:28
10.1016/j.cell.2018.07.028. URL http:29
//www.sciencedirect.com/science/30
article/pii/S0092867418309553.31
Denis Schapiro, Hartland W Jackson, Swetha32
Raghuraman, Jana R Fischer, Vito R T33
Zanotelli, Daniel Schulz, Charlotte Giesen,34
Raúl Catena, Zsuzsanna Varga, and Bernd35
Bodenmiller. histoCAT: analysis of cell36
phenotypes and interactions in multiplex37
image cytometry data. Nat. Methods, 1438
(9):873–876, September 2017.39
Geoffrey Schiebinger, Jian Shu, Marcin40
Tabaka, Brian Cleary, Vidya Subrama-41
nian, Aryeh Solomon, Siyan Liu, Sta- 42
cie Lin, Peter Berube, Lia Lee, Jenny 43
Chen, Justin Brumbaugh, Philippe Rigol- 44
let, Konrad Hochedlinger, Rudolf Jaenisch, 45
Aviv Regev, and Eric S. Lander. Re- 46
construction of developmental landscapes 47
by optimal-transport analysis of single- 48
cell gene expression sheds light on cel- 49
lular reprogramming. bioRxiv, page 50
191056, September 2017. doi: 10.1101/ 51
191056. URL https://www.biorxiv.org/ 52
content/10.1101/191056v1. 53
Herbert B Schiller, Daniel T Montoro, 54
Lukas M Simon, Emma L Rawlins, 55
Kerstin B Meyer, Maximilian Strunz, 56
Felipe Vieira Braga, Wim Timens, Ger- 57
ard H Koppelman, G.R. Scott Budinger, 58
Janette K Burgess, Avinash Waghray, 59
Maarten van den Berge, Fabian J Theis, 60
Aviv Regev, Naftali Kaminski, Jayaraj 61
Rajagopal, Sarah A Teichmann, Alexan- 62
der V Misharin, and Martijn C Nawijn. 63
The Human Lung Cell Atlas - A high- 64
resolution reference map of the human 65
lung in health and disease. American 66
Journal of Respiratory Cell and Molecular 67
Biology, April 2019. ISSN 1044-1549. 68
doi: 10.1165/rcmb.2018-0416TR. URL 69
https://www.atsjournals.org/doi/ 70
abs/10.1165/rcmb.2018-0416TR. 71
Roland F. Schwarz, Anne Trinh, Botond 72
Sipos, James D. Brenton, Nick Gold- 73
man, and Florian Markowetz. Phyloge- 74
netic Quantification of Intra-tumour Het- 75
erogeneity. PLoS Computational Biol- 76
ogy, 10(4):e1003535, April 2014. ISSN 77
1553-7358. doi: 10.1371/journal.pcbi. 78
1003535. URL https://dx.plos.org/10. 79
1371/journal.pcbi.1003535. 80
Roberto Semeraro, Valerio Orlandini, and Al- 81
berto Magi. Xome-Blender: A novel can- 82
66
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
cer genome simulator. PLoS One, 13(4):1
e0194472, April 2018.2
Debarka Sengupta, Nirmala Arul Rayan,3
Michelle Lim, Bing Lim, and Shyam4
Prabhakar. Fast, scalable and accu-5
rate differential expression analysis for6
single cells. bioRxiv, page 049734,7
April 2016. doi: 10.1101/049734. URL8
https://www.biorxiv.org/content/10.9
1101/049734v1.10
Manu Setty, Michelle D. Tadmor, Shlomit11
Reich-Zeliger, Omer Angel, Tomer Meir12
Salame, Pooja Kathail, Kristy Choi, Sean13
Bendall, Nir Friedman, and Dana Pe’er.14
Wishbone identifies bifurcating develop-15
mental trajectories from single-cell data.16
Nature Biotechnology, 34(6):637–645, June17
2016. ISSN 1546-1696. doi: 10.1038/nbt.18
3569. URL https://www.nature.com/19
articles/nbt.3569.20
D T Severson, R P Owen, M J White, X Lu,21
and B Schuster-Böckler. BEARscc deter-22
mines robustness of single-cell clusters us-23
ing simulated technical replicates. Nat.24
Commun., 9(1):1187, March 2018.25
Sheel Shah, Eric Lubeck, Wen Zhou, and26
Long Cai. In situ transcription profiling of27
single cells reveals spatial organization of28
cells in the mouse hippocampus. Neuron,29
92(2):342–357, October 2016.30
Arun Shivanandan, Jayakrishnan Unnikrish-31
nan, and Aleksandra Radenovic. On char-32
acterizing protein spatial clusters with cor-33
relation approaches. Sci. Rep., 6:31164,34
August 2016.35
Angus M Sidore, Freeman Lan, Shaun W36
Lim, and Adam R Abate. Enhanced se-37
quencing coverage with digital droplet mul-38
tiple displacement amplification. Nucleic39
Acids Res., 44(7):e66, April 2016.40
Jochen Singer, Jack Kuipers, Katharina 41
Jahn, and Niko Beerenwinkel. Single-cell 42
mutation identification via phylogenetic 43
inference. Nature Communications, 9(1): 44
5144, December 2018. ISSN 2041-1723. 45
doi: 10.1038/s41467-018-07627-7. URL 46
https://www.nature.com/articles/ 47
s41467-018-07627-7. 48
Amrit Singh, Benoit Gautier, Casey P. 49
Shannon, Florian Rohart, Michael Vacher, 50
Scott J. Tebutt, and Kim-Anh Le Cao. 51
DIABLO: from multi-omics assays to 52
biomarker discovery, an integrative ap- 53
proach. bioRxiv, page 067611, March 54
2018. doi: 10.1101/067611. URL 55
https://www.biorxiv.org/content/10. 56
1101/067611v2. 57
Debajyoti Sinha, Akhilesh Kumar, Himan- 58
shu Kumar, Sanghamitra Bandyopadhyay, 59
and Debarka Sengupta. dropclust: efficient 60
clustering of ultra-large scRNA-seq data. 61
Nucleic Acids Res., 46(6):e36, April 2018. 62
Pavel Skums, Viachaslau Tsyvina, and Alex 63
Zelikovsky. Inference of clonal selec- 64
tion in cancer populations using single- 65
cell sequencing data. bioRxiv, page 66
465211, January 2019. doi: 10.1101/ 67
465211. URL https://www.biorxiv.org/ 68
content/10.1101/465211v2. 69
Martin D Smith, Joel O Wertheim, Steven 70
Weaver, Ben Murrell, Konrad Scheffler, and 71
Sergei L Kosakovsky Pond. Less is more: an 72
adaptive branch-site random effects model 73
for efficient detection of episodic diversify- 74
ing selection. Mol. Biol. Evol., 32(5):1342– 75
1353, May 2015. 76
Charlotte Soneson and Mark D Robinson. To- 77
wards unified quality verification of syn- 78
thetic count data with countsimQC. Bioin- 79
formatics, 34(4):691–692, 2017. 80
67
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Charlotte Soneson and Mark D Robinson.1
Bias, robustness and scalability in single-2
cell differential expression analysis. Nat.3
Methods, February 2018.4
Bastiaan Spanjaard, Bo Hu, Nina Mitic,5
Pedro Olivares-Chauvet, Sharan Janjuha,6
Nikolay Ninov, and Jan Philipp Junker.7
Simultaneous lineage tracing and cell-type8
identification using CRISPR-Cas9-induced9
genetic scars. Nat. Biotechnol., 36(5):469–10
473, June 2018.11
C Spits, C Le Caignec, M De Rycke,12
L Van Haute, A Van Steirteghem,13
I Liebaers, and K Sermon. Optimization14
and evaluation of single-cell whole-genome15
multiple displacement amplification. Hum.16
Mutat., 27(5):496–503, 2006a.17
Claudia Spits, Cédric Le Caignec, Martine18
De Rycke, Lindsey Van Haute, André19
Van Steirteghem, Inge Liebaers, and Karen20
Sermon. Whole-genome multiple displace-21
ment amplification from single cells. Nat.22
Protoc., 1(4):1965–1970, November 2006b.23
S Srinivasan, N T Johnson, and D Ko-24
rkin. A Hybrid Deep Clustering Ap-25
proach for Robust Cell Type Profiling Us-26
ing Single-cell RNA-seq Data. bioRxiv,27
2019. URL https://www.biorxiv.org/28
content/10.1101/511626v1.abstract.29
Divyanshu Srivastava, Arvind Iyer, Vibhor30
Kumar, and Debarka Sengupta. Cel-31
lAtlasSearch: a scalable search engine32
for single cells. Nucleic Acids Re-33
search, 46(W1):W141–W147, July 2018.34
ISSN 0305-1048. doi: 10.1093/nar/35
gky421. URL https://academic.oup.36
com/nar/article/46/W1/W141/5000022.37
Oliver Stegle, Sarah A Teichmann, and38
John C Marioni. Computational and an-39
alytical challenges in single-cell transcrip-40
tomics. Nat. Rev. Genet., 16(3):133, Jan- 41
uary 2015. 42
Genevieve L Stein-O’Brien, Brian S Clark, 43
Thomas Sherman, Cristina Zibetti, Qi- 44
wen Hu, Rachel Sealfon, Sheng Liu, Jiang 45
Qian, Carlo Colantuoni, Seth Blackshaw, 46
Loyal A Goff, and Elana J Fertig. De- 47
composing Cell Identity for Transfer Learn- 48
ing across Cellular Measurements, Plat- 49
forms, Tissues, and Species. Cell sys- 50
tems, 8(5):395–411.e8, May 2019. ISSN 51
2405-4720, 2405-4712. doi: 10.1016/j.cels. 52
2019.04.004. URL http://dx.doi.org/ 53
10.1016/j.cels.2019.04.004. 54
Lars Steinbrück and Alice Carolyn McHardy. 55
Allele dynamics plots for the study of evolu- 56
tionary dynamics in viral populations. Nu- 57
cleic Acids Res., 39(1):e4, January 2011. 58
Carina Strell, Markus M Hilscher, Navya 59
Laxman, Jessica Svedlund, Chenglin Wu, 60
Chika Yokota, and Mats Nilsson. Placing 61
RNA in context and space - methods for 62
spatially resolved transcriptomics. FEBS 63
J., March 2018. 64
Tim Stuart, Andrew Butler, Paul Hoffman, 65
Christoph Hafemeister, Efthymia Papalexi, 66
William M. Mauck, Marlon Stoeckius, Pe- 67
ter Smibert, and Rahul Satija. Comprehen- 68
sive integration of single cell data. bioRxiv, 69
page 460147, November 2018. doi: 10.1101/ 70
460147. URL https://www.biorxiv.org/ 71
content/10.1101/460147v1. 72
Michael J T Stubbington, Orit Rozenblatt- 73
Rosen, Aviv Regev, and Sarah A Teich- 74
mann. Single-cell transcriptomics to ex- 75
plore the immune system in health and dis- 76
ease. Science, 358(6359):58–63, October 77
2017. 78
Patrik L. Ståhl, Fredrik Salmén, Sanja Vick- 79
ovic, Anna Lundmark, José Fernández 80
68
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Navarro, Jens Magnusson, Stefania Gia-1
comello, Michaela Asp, Jakub O. West-2
holm, Mikael Huss, Annelie Mollbrink,3
Sten Linnarsson, Simone Codeluppi, Åke4
Borg, Fredrik Pontén, Paul Igor Costea,5
Pelin Sahlén, Jan Mulder, Olaf Bergmann,6
Joakim Lundeberg, and Jonas Frisén. Vi-7
sualization and analysis of gene expres-8
sion in tissue sections by spatial transcrip-9
tomics. Science (New York, N.Y.), 35310
(6294):78–82, July 2016. ISSN 1095-9203.11
doi: 10.1126/science.aaf2403.12
S Sun, J Zhu, Y Ma, and X Zhou. Ac-13
curacy, Robustness and Scalability of Di-14
mensionality Reduction Methods for Sin-15
gle Cell RNAseq Analysis. bioRxiv,16
2019. URL https://www.biorxiv.org/17
content/10.1101/641142v1.abstract.18
Valentine Svensson, Sarah A Teichmann, and19
Oliver Stegle. SpatialDE: identification of20
spatially variable genes. Nat. Methods, 1521
(5):343–346, May 2018a.22
Valentine Svensson, Roser Vento-Tormo, and23
Sarah A. Teichmann. Exponential scal-24
ing of single-cell RNA-seq in the past25
decade. Nature Protocols, 13(4):599–604,26
April 2018b. ISSN 1750-2799. doi: 10.27
1038/nprot.2017.149. URL https://www.28
nature.com/articles/nprot.2017.149.29
Charles Swanton. Intratumor heterogeneity:30
evolution through space and time. Cancer31
Res., 72(19):4875–4882, October 2012.32
Ewa Szczurek, Navodit Misra, and Martin33
Vingron. Synthetic sickness or lethality34
points at candidate combination therapy35
targets in glioblastoma. Int. J. Cancer, 13336
(9):2123–2132, November 2013.37
The Tabula Muris Consortium. Single-cell38
transcriptomics of 20 mouse organs cre-39
ates a Tabula Muris. Nature, 562(7727):40
367, October 2018. ISSN 1476-4687. 41
doi: 10.1038/s41586-018-0590-4. URL 42
https://www.nature.com/articles/ 43
s41586-018-0590-4. 44
Divyanshu Talwar, Aanchal Mongia, De- 45
barka Sengupta, and Angshul Majum- 46
dar. AutoImpute: Autoencoder based 47
imputation of single-cell RNA-seq data. 48
Scientific reports, 8(1):16329, November 49
2018. ISSN 2045-2322. doi: 10.1038/ 50
s41598-018-34688-x. URL http://dx. 51
doi.org/10.1038/s41598-018-34688-x. 52
Amos Tanay and Aviv Regev. Scaling 53
single-cell genomics from phenomenology 54
to mechanism. Nature, 541(7637):331–338, 55
January 2017. 56
W Tang, F Bertaux, P Thomas, C Ste- 57
fanelli, M Saint, and others. bayNorm: 58
Bayesian gene expression recovery, im- 59
putation and normalisation for single 60
cell RNA-sequencing data. bioRxiv, 61
2018. URL https://www.biorxiv.org/ 62
content/10.1101/384586v2.abstract. 63
H Telenius, N P Carter, C E Bebb, M Norden- 64
skjöld, B A Ponder, and A Tunnacliffe. De- 65
generate oligonucleotide-primed PCR: gen- 66
eral amplification of target DNA by a single 67
degenerate primer. Genomics, 13(3):718– 68
725, July 1992. 69
Luyi Tian, Xueyi Dong, Saskia Freytag, 70
Kim-Anh Lê Cao, Shian Su, Abolfazl 71
JalalAbadi, Daniela Amann-Zalcenstein, 72
Tom S. Weber, Azadeh Seidi, Jafar S. 73
Jabbari, Shalin H. Naik, and Matthew E. 74
Ritchie. Benchmarking single cell RNA- 75
sequencing analysis pipelines using mixture 76
control experiments. Nature Methods, 16 77
(6):479, June 2019. ISSN 1548-7105. 78
doi: 10.1038/s41592-019-0425-8. URL 79
https://www.nature.com/articles/ 80
s41592-019-0425-8. 81
69
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
F William Townes, Stephanie C Hicks,1
Martin J Aryee, and Rafael A Irizarry.2
Feature Selection and Dimension Reduc-3
tion for Single Cell RNA-Seq based on a4
Multinomial Model. March 2019. URL5
https://www.biorxiv.org/content/10.6
1101/574574v1.7
Cole Trapnell, Davide Cacchiarelli, Jonna8
Grimsby, Prapti Pokharel, Shuqiang Li,9
Michael Morse, Niall J. Lennon, Kenneth J.10
Livak, Tarjei S. Mikkelsen, and John L.11
Rinn. The dynamics and regulators of cell12
fate decisions are revealed by pseudotempo-13
ral ordering of single cells. Nature Biotech-14
nology, 32(4):381–386, April 2014. ISSN15
1546-1696. doi: 10.1038/nbt.2859.16
Samra Turajlic and Charles Swanton. Metas-17
tasis as an evolutionary process. Science,18
352(6282):169–175, April 2016.19
Vincent van Unen, Thomas Höllt, Nicola20
Pezzotti, Na Li, Marcel J. T. Rein-21
ders, Elmar Eisemann, Frits Koning,22
Anna Vilanova, and Boudewijn P. F.23
Lelieveldt. Visual analysis of mass cy-24
tometry data by hierarchical stochastic25
neighbour embedding reveals rare cell26
types. Nature Communications, 8(1):27
1740, November 2017. ISSN 2041-1723.28
doi: 10.1038/s41467-017-01689-9. URL29
https://www.nature.com/articles/30
s41467-017-01689-9.31
Catalina A Vallejos, John C Marioni, and32
Sylvia Richardson. BASiCS: Bayesian anal-33
ysis of Single-Cell sequencing data. PLoS34
Comput. Biol., 11(6):e1004333, June 2015.35
Trieu My Van and Christian U. Blank.36
A user’s perspective on GeoMxTM37
digital spatial profiling. Immuno-38
Oncology Technology, 1:11–18, July39
2019. ISSN 2590-0188, 2590-0188.40
doi: 10.1016/j.iotech.2019.05.001. URL 41
https://www.esmoiotech.org/article/ 42
S2590-0188(19)30002-4/abstract. 43
Koen van den Berge, Hector Roux de 44
Bezieux, Kelly Street, Wouter Saelens, Ro- 45
brecht Cannoodt, Yvan Saeys, Sandrine 46
Dudoit, and Lieven Clement. Trajectory- 47
based differential expression analysis for 48
single-cell sequencing data. bioRxiv, 49
page 623397, May 2019. doi: 10.1101/ 50
623397. URL https://www.biorxiv.org/ 51
content/10.1101/623397v1. 52
Dimitrios V Vavoulis, Margherita 53
Francescatto, Peter Heutink, and Ju- 54
lian Gough. DGEclust: differential 55
expression analysis of clustered count data. 56
Genome Biol., 16:39, February 2015. 57
A Verma and B Engelhardt. A robust nonlin- 58
ear low-dimensional manifold for single cell 59
RNA-seq data. bioRxiv, 2018. 60
Beate Vieth, Christoph Ziegenhain, Swati 61
Parekh, Wolfgang Enard, and Ines Hell- 62
mann. powsimr: power analysis for 63
bulk and single cell RNA-seq experiments. 64
Bioinformatics, 33(21):3486–3488, Novem- 65
ber 2017. 66
Irma Virant-Klun, Stefan Leicht, Christopher 67
Hughes, and Jeroen Krijgsveld. Identifi- 68
cation of Maturation-Specific Proteins by 69
Single-Cell Proteomics of Human Oocytes. 70
Molecular & cellular proteomics: MCP, 15 71
(8):2616–2627, 2016. ISSN 1535-9484. doi: 72
10.1074/mcp.M115.056887. 73
Sarah A. Vitak, Kristof A. Torkenczy, Jimi L. 74
Rosenkrantz, Andrew J. Fields, Lena 75
Christiansen, Melissa H. Wong, Lucia Car- 76
bone, Frank J. Steemers, and Andrew 77
Adey. Sequencing thousands of single-cell 78
genomes with combinatorial indexing. Na- 79
ture Methods, 14(3):302–308, March 2017. 80
70
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
ISSN 1548-7105. doi: 10.1038/nmeth.1
4154. URL https://www.nature.com/2
articles/nmeth.4154.3
Bartlomiej Waclaw, Ivana Bozic, Meredith E4
Pittman, Ralph H Hruban, Bert Vogelstein,5
and Martin A Nowak. A spatial model6
predicts that dispersal and cell turnover7
limit intratumour heterogeneity. Nature,8
525(7568):261–264, September 2015.9
Daniel E. Wagner, Caleb Weinreb, Zach M.10
Collins, James A. Briggs, Sean G. Megason,11
and Allon M. Klein. Single-cell mapping of12
gene expression landscapes and lineage in13
the zebrafish embryo. Science, 360(6392):14
981–987, June 2018a. ISSN 0036-8075,15
1095-9203. doi: 10.1126/science.aar4362.16
URL http://science.sciencemag.org/17
content/360/6392/981.18
F Wagner, D Barkley, and I Yanai. Accurate19
denoising of single-cell RNA-Seq data us-20
ing unbiased principal component analysis.21
bioRxiv, 2019.22
Florian Wagner and Itai Yanai. Moana: A23
robust and scalable cell type classifica-24
tion framework for single-cell RNA-Seq25
data. bioRxiv, page 456129, Octo-26
ber 2018. doi: 10.1101/456129. URL27
https://www.biorxiv.org/content/10.28
1101/456129v1.29
Florian Wagner, Yun Yan, and Itai30
Yanai. K-nearest neighbor smooth-31
ing for high-throughput single-cell32
RNA-Seq data. January 2018b. URL33
https://www.biorxiv.org/content/34
early/2018/01/24/217737?rss=1.35
Dongfang Wang and Jin Gu. VASC: Dimen-36
sion reduction and visualization of single-37
cell RNA-seq data by deep variational au-38
toencoder. Genomics Proteomics Bioinfor-39
matics, 16(5):320–331, October 2018.40
Jian Wang and Yuanlin Song. Single cell se- 41
quencing: a distinct new field. Clin. Transl. 42
Med., 6(1):10, December 2017. 43
Jingshu Wang, Divyansh Agarwal, 44
Mo Huang, Gang Hu, Zilu Zhou, Vin- 45
cent B Conley, Hugh MacMullan, and 46
Nancy R Zhang. Transfer learning in 47
single-cell transcriptomics improves data 48
denoising and pattern discovery. November 49
2018. 50
T Wang, T S Johnson, W Shao, J Zhang, and 51
K Huang. BURMUDA: A novel deep trans- 52
fer learning method for single-cell RNA se- 53
quencing batch correction reveals hidden 54
high-resolution cellular subtypes. bioRxiv, 55
2019. 56
Gregory P Way and Casey S Greene. Extract- 57
ing a biologically relevant latent space from 58
cancer transcriptomes with variational au- 59
toencoders. Pacific Symposium on Bio- 60
computing. Pacific Symposium on Biocom- 61
puting, 23:80–91, 2018. ISSN 2335-6936, 62
2335-6928. doi: 10.1142/9789813235533\ 63
_0008. URL https://www.ncbi.nlm. 64
nih.gov/pubmed/29218871. 65
Lukas M. Weber and Mark D. Robinson. 66
Comparison of clustering methods for 67
high-dimensional single-cell flow and mass 68
cytometry data. Cytometry Part A, 89 69
(12):1084–1096, December 2016. ISSN 70
1552-4922. doi: 10.1002/cyto.a.23030. 71
URL https://onlinelibrary.wiley. 72
com/doi/full/10.1002/cyto.a.23030. 73
Lukas M. Weber, Malgorzata Nowicka, Char- 74
lotte Soneson, and Mark D. Robin- 75
son. diffcyt: Differential discovery 76
in high-dimensional cytometry via high- 77
resolution clustering. bioRxiv, page 78
349738, November 2018. doi: 10.1101/ 79
349738. URL https://www.biorxiv.org/ 80
content/10.1101/349738v2. 81
71
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Lukas M. Weber, Wouter Saelens, Robrecht1
Cannoodt, Charlotte Soneson, Alexander2
Hapfelmeier, Paul P. Gardner, Anne-Laure3
Boulesteix, Yvan Saeys, and Mark D.4
Robinson. Essential guidelines for compu-5
tational method benchmarking. Genome6
Biology, 20(1):125, June 2019. ISSN 1474-7
760X. doi: 10.1186/s13059-019-1738-8.8
URL https://doi.org/10.1186/9
s13059-019-1738-8.10
Caleb Weinreb, Samuel Wolock, Betsabeh K.11
Tusi, Merav Socolovsky, and Allon M.12
Klein. Fundamental limits on dynamic in-13
ference from single-cell snapshots. Proceed-14
ings of the National Academy of Sciences,15
115(10):E2467–E2476, March 2018. ISSN16
0027-8424, 1091-6490. doi: 10.1073/pnas.17
1714723115. URL https://www.pnas.18
org/content/115/10/E2467.19
Joshua Welch, Velina Kozareva, Ashley Fer-20
reira, Charles Vanderburg, Carly Martin,21
and Evan Macosko. Integrative inference22
of brain cell similarities and differences23
from single-cell genomics. bioRxiv, page24
459891, November 2018. doi: 10.1101/25
459891. URL https://www.biorxiv.org/26
content/10.1101/459891v1.27
Joshua D Welch, Alexander J Hartemink, and28
Jan F Prins. MATCHER: manifold align-29
ment reveals correspondence between single30
cell transcriptome and epigenome dynam-31
ics. Genome Biol., 18(1):138, July 2017.32
F Alexander Wolf, Philipp Angerer, and33
Fabian J Theis. SCANPY: large-scale34
single-cell gene expression data analysis.35
Genome Biol., 19(1):15, February 2018.36
F. Alexander Wolf, Fiona K. Hamey,37
Mireya Plass, Jordi Solana, Joakim S.38
Dahlin, Berthold Göttgens, Nikolaus Ra-39
jewsky, Lukas Simon, and Fabian J.40
Theis. PAGA: graph abstraction recon- 41
ciles clustering with trajectory inference 42
through a topology preserving map of sin- 43
gle cells. Genome Biology, 20(1):59, March 44
2019. ISSN 1474-760X. doi: 10.1186/ 45
s13059-019-1663-x. URL https://doi. 46
org/10.1186/s13059-019-1663-x. 47
Larry Xi, Alexander Belyaev, Sandra Spur- 48
geon, Xiaohui Wang, Haibiao Gong, Robert 49
Aboukhalil, and Richard Fekete. New li- 50
brary construction method for single-cell 51
genomes. PLoS One, 12(7):e0181163, July 52
2017. 53
Li Charlie Xia, Dongmei Ai, Hojoon Lee, 54
Noemi Andor, Chao Li, Nancy R Zhang, 55
and Hanlee P Ji. SVEngine: an efficient 56
and versatile simulator of genome struc- 57
tural variations with features of cancer 58
clonal evolution. Gigascience, 7(7), July 59
2018. 60
Li Yang and P Charles Lin. Mechanisms that 61
drive inflammatory tumor microenviron- 62
ment, tumor heterogeneity, and metastatic 63
progression. Semin. Cancer Biol., 47:185– 64
195, December 2017. 65
Z Yang. Maximum likelihood phylogenetic es- 66
timation from DNA sequences with variable 67
rates over sites: approximate methods. J. 68
Mol. Evol., 39(3):306–314, September 1994. 69
Yinyin Yuan. Spatial heterogeneity in the tu- 70
mor microenvironment. Cold Spring Harb. 71
Perspect. Med., 6(8), August 2016. 72
Simone Zaccaria, Mohammed El-Kebir, Gun- 73
nar W. Klau, and Benjamin J. Raphael. 74
The Copy-Number Tree Mixture Deconvo- 75
lution Problem and Applications to Multi- 76
sample Bulk Sequencing Tumor Data. In 77
S. Cenk Sahinalp, editor, Research in 78
Computational Molecular Biology, Lecture 79
Notes in Computer Science, pages 318–335. 80
72
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
Springer International Publishing, 2017.1
ISBN 978-3-319-56970-3.2
H Zafar, N Navin, K Chen, and L Nakhleh.3
SiCloneFit: Bayesian inference of popula-4
tion structure, genotype, and phylogeny of5
tumor clones from single-cell genome se-6
quencing data. bioRxiv, 2018.7
Hamim Zafar, Yong Wang, Luay Nakhleh,8
Nicholas Navin, and Ken Chen. Mono-9
var: single-nucleotide variant detection in10
single cells. Nature Methods, 13(6):505–11
507, June 2016. ISSN 1548-7105. doi:12
10.1038/nmeth.3835. URL https://www.13
nature.com/articles/nmeth.3835.14
Hamim Zafar, Anthony Tzen, Nicholas Navin,15
Ken Chen, and Luay Nakhleh. SiFit: infer-16
ring tumor trees from single-cell sequenc-17
ing data under finite-sites models. Genome18
Biol., 18(1):178, September 2017.19
Hans Zahn, Adi Steif, Emma Laks, Peter20
Eirew, Michael VanInsberghe, Sohrab P21
Shah, Samuel Aparicio, and Carl L Hansen.22
Scalable whole-genome single-cell library23
preparation without preamplification. Nat.24
Methods, 14(2):167–173, February 2017a.25
Hans Zahn, Adi Steif, Emma Laks, Peter26
Eirew, Michael VanInsberghe, Sohrab P.27
Shah, Samuel Aparicio, and Carl L.28
Hansen. Scalable whole-genome single-29
cell library preparation without preampli-30
fication. Nature Methods, 14(2):167–173,31
February 2017b. ISSN 1548-7105. doi:32
10.1038/nmeth.4140. URL https://www.33
nature.com/articles/nmeth.4140.34
Luke Zappia, Belinda Phipson, and Alicia35
Oshlack. Splatter: simulation of single-cell36
RNA sequencing data. Genome Biol., 1837
(1):174, September 2017.38
Ron Zeira and Ron Shamir. Genome 39
Rearrangement Problems with Sin- 40
gle and Multiple Gene Copies : A 41
Review. Not clear where this was 42
initially published and whether it is 43
peer-reviewed., 2018. URL https: 44
//pdfs.semanticscholar.org/85e6/ 45
7eb03d1b3d004c60a12df08c1f937fbaa974. 46
pdf. 47
Amit Zeisel, Hannah Hochgerner, Peter 48
Lönnerberg, Anna Johnsson, Fatima 49
Memic, Job van der Zwan, Martin Häring, 50
Emelie Braun, Lars E. Borm, Gioele 51
La Manno, Simone Codeluppi, Alessan- 52
dro Furlan, Kawai Lee, Nathan Skene, 53
Kenneth D. Harris, Jens Hjerling-Leffler, 54
Ernest Arenas, Patrik Ernfors, Ulrika 55
Marklund, and Sten Linnarsson. Molec- 56
ular Architecture of the Mouse Nervous 57
System. Cell, 174(4):999–1014.e22, 58
August 2018. ISSN 0092-8674. doi: 59
10.1016/j.cell.2018.06.021. URL http: 60
//www.sciencedirect.com/science/ 61
article/pii/S009286741830789X. 62
Allen W. Zhang, Ciara O’Flanagan, Eliz- 63
abeth Chavez, Jamie LP Lim, Andrew 64
McPherson, Matt Wiens, Pascale Wal- 65
ters, Tim Chan, Brittany Hewitson, Daniel 66
Lai, Anja Mottok, Clementine Sarkozy, 67
Lauren Chong, Tomohiro Aoki, Xue- 68
hai Wang, Andrew P. Weng, Jessica N. 69
McAlpine, Samuel Aparicio, Christian 70
Steidl, Kieran R. Campbell, and Sohrab P. 71
Shah. Probabilistic cell type assignment 72
of single-cell transcriptomic data reveals 73
spatiotemporal microenvironment dynam- 74
ics in human cancers. bioRxiv, page 75
521914, January 2019a. doi: 10.1101/ 76
521914. URL https://www.biorxiv.org/ 77
content/10.1101/521914v1. 78
Chao Zhang. Single-Cell Data Analysis Us- 79
ing MMD Variational Autoencoder. April 80
73
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
2019. URL https://www.biorxiv.org/1
content/10.1101/613414v1.abstract.2
Huanan Zhang, Catherine A A Lee, Zhuliu Li,3
John R Garbe, Cindy R Eide, Raphael Pe-4
tegrosso, Rui Kuang, and Jakub Tolar. A5
multitask clustering approach for single-cell6
RNA-seq analysis in recessive dystrophic7
epidermolysis bullosa. PLoS Comput. Biol.,8
14(4):e1006053, April 2018.9
Jesse M. Zhang, Govinda M. Kamath,10
and David N. Tse. Valid post-clustering11
differential analysis for single-cell RNA-12
Seq. bioRxiv, page 463265, June13
2019b. doi: 10.1101/463265. URL14
https://www.biorxiv.org/content/10.15
1101/463265v3.16
Jianzhi Zhang, Rasmus Nielsen, and Ziheng17
Yang. Evaluation of an improved branch-18
site likelihood method for detecting posi-19
tive selection at the molecular level. Mol.20
Biol. Evol., 22(12):2472–2479, December21
2005.22
Jingsong Zhang, Jessica J. Cunningham,23
Joel S. Brown, and Robert A. Gatenby.24
Integrating evolutionary dynamics into25
treatment of metastatic castrate-resistant26
prostate cancer. Nature Communications, 827
(1):1816, November 2017. ISSN 2041-1723.28
doi: 10.1038/s41467-017-01968-5. URL29
https://www.nature.com/articles/30
s41467-017-01968-5.31
L. Zhang and S. Zhang. Comparison of com-32
putational methods for imputing single-33
cell RNA-sequencing data. IEEE/ACM34
Transactions on Computational Biology35
and Bioinformatics, pages 1–1, 2018. ISSN36
1545-5963. doi: 10.1109/TCBB.2018.37
2848633.38
L Zhang, X Cui, K Schmitt, R Hubert, W Na-39
vidi, and N Arnheim. Whole genome am-40
plification from a single cell: implications 41
for genetic analysis. Proc. Natl. Acad. Sci. 42
U. S. A., 89(13):5847–5851, July 1992. 43
Xiao-Fei Zhang, Le Ou-Yang, Shuo Yang, 44
Xing-Ming Zhao, Xiaohua Hu, and Hong 45
Yan. EnImpute: imputing dropout 46
events in single cell RNA sequencing 47
data via ensemble learning. Bioinfor- 48
matics, May 2019c. ISSN 1367-4803, 49
1367-4811. doi: 10.1093/bioinformatics/ 50
btz435. URL http://dx.doi.org/10. 51
1093/bioinformatics/btz435. 52
Xiaoyan Zhang, Sadie L Marjani, Zhaoyang 53
Hu, Sherman M Weissman, Xinghua Pan, 54
and Shixiu Wu. Single-Cell sequencing 55
for precise cancer research: Progress and 56
prospects. Cancer Res., 76(6):1305–1312, 57
March 2016. 58
Xiuwei Zhang, Chenling Xu, and Nir Yosef. 59
SymSim: simulating multi-faceted variabil- 60
ity in single cell RNA sequencing. bioRxiv, 61
page 378646, April 2019d. doi: 10.1101/ 62
378646. URL https://www.biorxiv.org/ 63
content/10.1101/378646v3. 64
Yifan Zhang and Feng Liu. Multidimensional 65
Single-Cell analyses in organ development 66
and maintenance. Trends Cell Biol., March 67
2019. 68
Grace X. Y. Zheng, Jessica M. Terry, Phillip 69
Belgrader, Paul Ryvkin, Zachary W. 70
Bent, Ryan Wilson, Solongo B. Ziraldo, 71
Tobias D. Wheeler, Geoff P. McDer- 72
mott, Junjie Zhu, Mark T. Gregory, Joe 73
Shuga, Luz Montesclaros, Jason G. Under- 74
wood, Donald A. Masquelier, Stefanie Y. 75
Nishimura, Michael Schnall-Levin, Paul W. 76
Wyatt, Christopher M. Hindson, Rajiv 77
Bharadwaj, Alexander Wong, Kevin D. 78
Ness, Lan W. Beppu, H. Joachim Deeg, 79
Christopher McFarland, Keith R. Loeb, 80
74
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019
William J. Valente, Nolan G. Ericson,1
Emily A. Stevens, Jerald P. Radich, Tar-2
jei S. Mikkelsen, Benjamin J. Hindson,3
and Jason H. Bielas. Massively paral-4
lel digital transcriptional profiling of sin-5
gle cells. Nature Communications, 8:14049,6
January 2017. ISSN 2041-1723. doi: 10.7
1038/ncomms14049. URL https://www.8
nature.com/articles/ncomms14049.9
Lingxue Zhu, Jing Lei, Bernie Devlin, and10
Kathryn Roeder. A UNIFIED STATISTI-11
CAL FRAMEWORK FOR SINGLE CELL12
AND BULK RNA SEQUENCING DATA.13
The annals of applied statistics, 12(1):609–14
632, March 2018. ISSN 1932-6157. doi:15
10.1214/17-AOAS1110. URL http://dx.16
doi.org/10.1214/17-AOAS1110.17
Rapolas Zilionis, Juozas Nainys, Adrian18
Veres, Virginia Savova, David Zemmour,19
Allon M Klein, and Linas Mazutis. Single-20
cell barcoding and sequencing using droplet21
microfluidics. Nat. Protoc., 12(1):44–73,22
January 2017.23
Chenghang Zong, Sijia Lu, Alec R Chapman,24
and X Sunney Xie. Genome-wide detection25
of single-nucleotide and copy-number vari-26
ations of a single human cell. Science, 33827
(6114):1622–1626, December 2012.28
75
PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.27885v3 | CC BY 4.0 Open Access | rec: 23 Aug 2019, publ: 23 Aug 2019