Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
1
Universality of cell differentiation trajectories revealed by
a reconstruction of transcriptional uncertainty landscapes
from single-cell transcriptomic data
Nan Papili Gao1, Olivier Gandrillon 2,3, András Páldi 4, Ulysse Herbach 5, Rudiyanto Gunawan 6,*
1 Institute for Chemical and Bioengineering, ETH Zurich, 8093 Zurich, Switzerland
2 Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, Laboratory of Biology
and Modelling of the Cell, F69007 Lyon, France
3 Inria Team Dracula, Inria Center Grenoble Rhône-Alpes, France
4 Ecole Pratique des Hautes Etudes, PSL Research University, St-Antoine Research Center, Inserm U938, 34 rue
Crozatier, 75012, Paris, France
5 Université de Lorraine, CNRS, Inria, IECL, F-54000 Nancy, France
6 Department of Chemical and Biological Engineering, University at Buffalo, Buffalo, NY 14260, USA
*Corresponding author (email: [email protected])
ABSTRACT
We employed our previously-described single-cell gene expression analysis CALISTA (Clustering And Lineage
Inference in Single-Cell Transcriptional Analysis) to evaluate transcriptional uncertainty at the single-cell level
using a stochastic mechanistic model of gene expression. We reconstructed a transcriptional uncertainty landscape
during cell differentiation by visualizing single-cell transcriptional uncertainty surface over a two dimensional
representation of the single-cell gene expression data. The reconstruction of transcriptional uncertainty landscapes
for ten publicly available single-cell gene expression datasets from cell differentiation processes with linear, single
or multi-branching cell lineage, reveals universal features in the cell differentiation trajectory that include: (i) a
peak in single-cell uncertainty during transition states, and in systems with bifurcating differentiation trajectories,
each branching point represents a state of high transcriptional uncertainty; (ii) a positive correlation of
transcriptional uncertainty with transcriptional burst size and frequency; (iii) an increase in RNA velocity
preceeding the increase in the cell transcriptional uncertainty. Finally, we provided biological interpretations of
the universal rise-then-fall profile of the transcriptional uncertainty landscape, including a link with the
Waddington’s epigenetic landscape, that is generalizable to every cell differentiation system.
Keywords: single cell, cell differentiation, epigenetic landscape, gene expression, heterogeneity
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
2
MAIN
Cell differentiation is the process through which unspecialized stem cells become more specialized. Because
of its importance in development, cellular repair, and organismal homeostasis, the molecular mechanisms of cell
differentiation has been the subject of intense scrutiny. Since roughly 50 years ago – along with the promulgation
of the central dogma of molecular biology by Francis Crick and the characterization of the lactose operon by
François Jacob and Jacques Monod – the existence of a genetic program has become a prevailing explanation for
the cell differentiation process. Although the details were originally not defined, at least not formally, such a
genetic program purports a constellation of master genes (i.e., transcription factors) that orchestrate the
transcription of downstream target genes in a precise spatiotemporal fashion, resulting in long-lasting alterations
in the gene expression patterns 1–3. A notable experimental evidence substantiating this view is the overexpression
of myoD inducing a myogenic phenotype in seemingly naive cells 4. Over the past few decades, the repertoire of
such master genes across numerous stem cell systems, such as Nanog, Oct4, Sox2, BATF and MyoD, begin to
coalesce 5–7.
Recent advances in single-cell technologies has revealed new aspects of the cell differentiation that are
incompatible with the idea of ordered and programmed (i.e., deterministic) gene expression. More specifically,
single-cell data paint a stochastic differentiation process that increases cell-to-cell variability of gene expression.
Such an observation has been made for a wide variety of cell differentiation systems, including chicken erythroid
progenitors 8, erythroid myeloid lymphoid (EML) cells 9, mouse embryonic stem cells (mESCs) 10,11, and human
CD34+ cells 12. Interestingly, a similar increase of gene expression variation was also observed during the de-
differentiation of somatic cells into iPSCs 13. Stochastic gene expression also appears to have functional role
beyond cell differentiation systems. For example, an increase in cell-to-cell variability of gene expression has
been reported during a forced adaptation of budding yeast cells to unforeseen challenges 14.
Based on the observations of single-cell data above, a different view of cell differentiation begins to
materialize. Instead of stem cells following an identical genetic program, the cell differentiation is akin to a
dynamic exploratory process. More specifically, in this view, the cell differentiation is thought to proceed as
follows 14–17:
I) extrinsic and/intrinsic internal stimuli, such as a medium change or the addition of new molecules in the external
medium, trigger a cellular response that destabilizes the initial high potency cell state;
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
3
II) each cell alters its internal cell state and engages an exploratory dynamics through a combination of the inherent
stochastic dynamics of gene transcription and the emergence of new stable cell state(s). At the cell population
level, we observe a rise in the cell-to-cell variability of gene expression;
III) a physiological selection / commitment to one stable lineage among possibly multiple lineages that arise from
the degeneracy of the system;
IV) finally, a reduction in the exploratory dynamics commences along with the establishment of stable cell state(s)
corresponding to differentiated cell type(s).
The above view is compatible with the idea that cell phenotype transition results from the dynamics of an
underlying stochastic molecular network 18,19. In 1957, Conrad Waddington proposed the presently well-known
epigenetic landscape that likens the cell differentiation process to a ball rolling on a downward sloping surface,
starting from a state of high cell potency and ending at one of possibly several states of low cell potency. The
landscape itself is shaped by the action of the genes and gene network – depicted in the less-frequently-shown
part B of Waddington’s original figure as a network of ropes that are tied to the surface, creating valleys and hills.
Although the epigenetic landscape was originally proposed only as a metaphor of how gene regulation governs
the cell differentiation process, this landscape has been formalized within the framework of dynamical systems
theory 20. The valleys in the Waddington’s epigenetic landscape are equated to stable states of a dynamical system,
called attractors, while the hills are often interpreted as energetic barriers.
The Waddington’s metaphor has further been re-examined in a series of works, where the behaviour of the
underlying dynamical gene regulatory network (GRN) drives cell fate determination. Some of these studies
employed stochastic simulations of simple gene networks with bi-/tri-stable states to describe the role of non-
genetic cell heterogeneity in cell differentiation processes 21–23. Here, to quantitatively reconstruct the epigenetic
landscape, an underlying GRN structure must be assumed a priori. In reality, the GRN driving cell fate
determination is complex and proved to be challenging to infer from data 6,24–28. Even when the complete
regulatory system is known, the high-dimensionality of the parameter space makes the landscape generation
computationally prohibitive 21, especially in the absence of an analytical solution, which are slowly emerging 29.
Thus, the above approaches provide more of a conceptual understanding than a mechanistic or molecular
explanation.
A number of recent studies provided a graphical representation of the differentiation process based on single-
cell transcriptomic data that conforms with the Waddington’s epigenetic landscape 30–34. More specifically, these
studies reconstructed the epigenetic landscape from single-cell gene expression data using probabilistic and quasi-
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
4
potential methods, for example by applying Hopfield neural networks 30,31, a cell-density based strategy 33,
network entropy measurements 34 or more recently Large Deviation Theory 29. However, with the exception of
Fard et al. 31 and Lv et al., 2014 29, the aforementioned studies produced monotonic descent passages during cell
differentiation, mimicking closely the Waddington’s epigenetic landscape metaphor (see for example 22,34). Also,
none of the above studies consider directly the cellular mechanism that generates stochastic gene transcriptional
bursts.
In the present work, we aimed to shed light on the gene transcriptional mechanism behind the rise-then-fall
trajectory of cell-to-cell variability in gene expression observed during the cellular differentiation process. To this
end, we analyzed a collection (8) of published single-cell transcriptomic datasets from various cell differentiation
systems, comprising both single-cell RT-qPCR (scRT-qPCR) 8,10,12,35–37 and single-cell RNA-sequencing
(scRNA-seq) 38,39. We employed a likelihood-based analysis using a recent method CALISTA (Clustering And
Lineage Inference in Single-cell Transcriptomics Analysis) 40. The analysis relied on a mechanistic model of the
stochastic gene transcriptional bursts to characterize single-cell gene expression distribution. We defined a new
concept, called transcriptional uncertainty landscape, based on the cell likelihood value from CALISTA analysis
to characterize the stochastic dynamics of the gene transcription process during cell differentiation. The stochastic
gene transcriptional model enabled identifying the specific parameters or mechanisms that explain the observed
changes in the the gene transcriptional uncertainty at the single-cell level 41. For two additional single-cell datasets,
we also evaluated the single-cell RNA-velocity using the recently published Velocyto method42. The two-state
model parameter analysis, combined with RNA-velocities, provided insights into the mechanism regulating cell
fate decisions, specifically on the role of stochastic gene transcriptions in the differentiation processes and on the
possible mechanism generating this stochasticity.
RESULTS
Single-cell Transcriptional Uncertainty Landscape using CALISTA
In this work, we used CALISTA, a likelihood-based bioinformatics toolbox designed for an end-to-end
analysis of single-cell gene expression data, to evaluate the transcriptional uncertainty of each individual cell
based on its gene expression data 40. CALISTA uses the two-state model of stochastic gene transcription bursts to
characterize the distribution of mRNA counts in individual cells 43. In the model, a gene promoter stochastically
switches between ON and OFF state, and only in the ON state can gene transcription occur. The distribution of
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
5
mRNA depends on four model parameters: qon (the rate of promoter activation), qoff (the rate of promoter
inactivation), qt (the rate of mRNA production when the promoter is on the ON state), and qd (the rate constant of
mRNA degradation) 27,44(see Fig. 1). For example, when qoff >>qon and qoff >>qd, keeping qt /qoff fixed, mRNA are
produced through bursts of short but intense transcription, which is a typical case observed for gene transcriptions
in single cells 45,46.
Our analysis procedure using CALISTA involves four main steps: cell clustering, lineage inference, and
calculation of single-cell transcriptional uncertainty. In the analysis, each cell is assigned a likelihood value, which
is computed from the probability of its gene expression (mRNA counts) based on the mRNA distribution given
by the two-state model above. In the single-cell clustering, we employed a greedy algorithm to find single-cell
clusters that maximizes the total likelihood value for the cells. In the lineage progression inference, we connected
cell clusters sharing similar gene expression distributions – by computing cluster distances based on cell likelihood
– to form a hierarchical network. Here, we assigned each cell to an edge in the lineage progression network that
is pointing to or emanating from its respective cluster, such that its single-cell likelihood is maximized. Finally,
we computed single-cell transcriptional uncertainty as the negative logarithm of the single-cell likelihood (NLL)
value. The full detail of CALISTA can be found in a recent publication 40.
The single-cell likelihood value reflects the joint probability of its gene expression repertoire. A cell with a
low likelihood value may indicate that the gene expression of the cell is different from its neighboring cells, i.e.
the cell is an outlier. But, more interestingly, a low likelihood value may also correspond to a cell state of high
uncertainty in the gene expression. The group of cells in such high uncertainty state have gene expressions that
are dissimilar to each other, and thus, the gene expression distributions will have high entropy. As mentioned
above, in this work we used the negative logarithm of the single-cell likelihood value as a metric of single-cell
transcriptional uncertainty. By plotting the single-cell transcriptional uncertainty over the two-dimensional
projection of the single-cell transcriptomics data – for example, using the first two principal components from
Principal Component Analysis (PCA), we constructed a transcriptional uncertainty landscape in the form of a
surface plot of the NLL values. By visualizing the single-cell transcriptional uncertainty as a surface plot, we can
study the landscape of transcriptional uncertainty during cell differentiation at single-cell resolution. On the
single-cell transcriptional uncertainty surface, an aberrant cell can be easily distinguished from a cell of high
uncertainty state, since an aberrant cell will appear isolated from its nearby cells and will be located on a region
with high NLL.
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
6
Transcriptional uncertainty landscape of iPSC cell differentiation to cardiomyocytes
In the following, we demonstrated an application of our procedure described above to a single-cell
transcriptional dataset from cardiomyocytes differentiation from human induced pluripotent stem cells (iPSCs) 35.
The single-cell clustering of CALISTA returned five clusters 40, in good agreement with the number of cell types
reported in the original study. CALISTA identified one bifurcation event in the lineage progression, which led to
two cell lineages 35. Fig. 2a-b gives the single-cell transcriptional uncertainty landscape showing cells exiting the
initial epiblast state that is characterized by a valley in the landscape, passing through a hill of high transcriptional
uncertainty corresponding to primitive streak (PS)-like progenitor state, before ending up at one of the low
transcriptional uncertainty terminal states corresponding to either mesodermal (desired) or endodermal
(undesired) fate. As depicted in Fig, 2c, the intermediate cell cluster (cluster 2) comprising PS-like cells have
higher cell uncertainty (lower single-cell likelihood) than the other clusters. Figures 2d and 2e give the moving-
averaged uncertainty values for pseudotemporally ordered cells using a moving window of 10% of the total cells
for both endodermal and mesodermal paths, respectively. The moving-averaged transcriptional uncertainty for
the two differentiation paths follows a rise-then-fall trajectory where the peak of uncertainty coincides with the
lineage bifurcation event.
We explored whether the rise-then-fall in uncertainty is an artefact from using the two-state model to evaluate
the cell likelihood values. To this end, we implemented a modified version of the algorithm for ordering cells by
calculating the cell likelihood values using the empirical (observed) distribution, instead of the analytical
distribution from the two-state model. As shown in Supplementary Fig. S1, the transcriptional uncertainty
landscape from the modified implementation shows a strong resemblance to the original one. We also investigated
whether the number of clusters may affect the landscape, in which using too few of clusters may artificially inflate
the uncertainty due to mixing of cells from different states. We reran CALISTA by using a higher number of
clusters (set to nine based on the eigengap heuristic 47). The hill in the uncertainty landscape is again seen around
the bifurcation event upon using a higher number of cell clusters (Supplementary Fig. S2). Finally, we used a
different algorithm to cluster cells, specifically using a Laplacian-based clustering algorithm SIMLR 48, to test
whether the shape of the transcriptional uncertainty landscape changes with the clustering algorithm. The single-
cell clusters can be interpreted as the transitional states that the differentiating cells go through. Starting with the
result of SIMLR cell clustering, we then generated the lineage progression and estimated the cell likelihood values
using CALISTA. The transcriptional uncertainty landscape from SIML cell clustering has the same shape as that
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
7
in Fig. 2a-b, demonstrating that the transcriptional uncertainty landscape observed above is not dependent on
using CALISTA for cell clustering (Supplementary Fig. S3).
To further elucidate the role of specific genes in shaping the transcriptional uncertainty landscape, we looked
at the transcriptional uncertainty associated with individual genes. Fig. 2f depicts the NLL distribution of each
gene for the five single-cell clusters. As expected, cell in cluster 2 have generally higher NLL than those in the
other clusters. Fig. 2f clearly illustrates that within cluster 2, some genes show higher NLL values than the others
(Supplementary Figure S4). To identify the important genes related to transcriptional uncertainty, we identified
genes with NLL values exceeding a threshold 𝛿 for at least 30% of the cells in each cluster, where 𝛿 is set to 3
standard deviation above the overall mean NLL for all cells and genes in the dataset (see Methods Eq. (2)). None
of the genes in clusters 1, 4 and 5 have NLL above the threshold. Meanwhile, 16 and 8 genes in clusters 2 and 3,
respectively, pass the above criterion for high uncertainty with 4 common genes between the two gene sets
(Supplementary Table S1). Genes with high transcriptional uncertainty in cluster 2 may have functional roles in
cell fate determination. The gene set of cluster 2 includes known genes upregulated only in the PS-like state (e.g.
EOMES, GSC, MESP1 and MIXL1), as well as markers of mesodermal and endodermal cells (e.g. BMP4,
HAND1, and SOX17) 35,40 (Supplementary Figure S5). Meanwhile, the main contributors to cell uncertainty in
cluster 3 (e.g. BMP4 and MYL4 35,40) are known transition genes between PS-like cells and the final mesoderm
fate (Supplementary Figure S6). Fig. 2g depicts the protein-protein interaction (PPI) network related to the gene
set of cluster 2 using STRING (minimum required interaction score of 0.4) 49, indicating that these genes form a
strongly interconnected hub of known transcription factors and molecules involved in the signal transduction of
embryonic development (Supplementary Table S1).
Transcriptional uncertainty landscapes of cell differentiation
We further applied the procedure above to seven additional single-cell transcriptomic datasets that were
generated using scRT-qPCR 8,10,12,36,37 and scRNA-sequencing 38,39, to assess the universality of the rise-then-fall
feature of single-cell transcriptional uncertainty landscape during cell differentiation. The first of these datasets
came from 405 cells during mouse embryonic fibroblast (MEF) reprogramming into induced neural (iN) and
myogenic (M) cells 38. Like the iPSC differentiation above, the lineage progression has a single bifurcation point.
As depicted in Figure 3a, the single-cell transcriptional uncertainty increases from the initial MEF state and
reaches a peak around the bifurcation before decreasing toward two end-point cell fates. The rise-then-fall of
transcriptional uncertainty in the MEF reprogramming is in good agreement with what we observed in the iPSCs
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
8
differentiation above. Higher entropy of gene expression distribution in a cell population has also been reported
in the reprogramming of iPSCs13.
Next, we analyzed datasets from cell differentiation processes without a lineage bifurcation and with
multiple lineage bifurcations. Three scRT-qPCR datasets came from differentiation systems without bifurcation,
including Richard et al. study on chicken erythrocytic differentiation of T2EC cells 8, Stumpf et al. study on
differentiation of mouse embryonic stem cells (ESC) to neural progenitor cells (NPC) 10, and Moussy et al. study
during CD34+ cell differentiation 12. The single-cell clustering and lineage progression by CALISTA produced
the expected cell differentiation trajectory (see Supplementary Figure S7-9). The single-cell transcriptional
uncertainty landscapes of these three differentiation systems, as shown in Figure 3b, exhibit a rise-then-fall profile,
creating a hill that the cells traverse through in the differentiation process. A transitory increase in single-cell gene
expression uncertainty was reported either directly or indirectly in the original publications. In Richard et al. 8 and
Stumpf et al 10, the authors adopted the Shannon entropy to quantify cell-to-cell variability (uncertainty), while
Moussy et al. 12 reported an unstable transition state with ‘hesitant cells’ flipping their morphology between
polarized and round shapes before committing to the common myeloid progenitors-like fate. Morphological
uncertainty therefore corresponded to a higher transcriptional uncertainty. Note that Moussy et al. study looked
at only the initial phase of the (hematopoietic) cell differentiation, and thus, it is likely that the differentiation
process had not completed for the cells in the dataset.
The next set of single-cell gene expression data came from differentiation systems with multi-branching
lineage, including Guo et al. study during mouse embryo development from zygote to blastocyst 36, Nestorowa et
al. 39 and Moignard et al. 37 studies on hematopoietic stem cell differentiation. Figure 3c show the single-cell
transcriptional landscape for each of the datasets. For Guo et al. study, we identified 7 cell clusters and identified
two bifurcations in the lineage. Here, we observed two hills in the transcriptional uncertainty landscape, each
coinciding with a bifurcation event in the lineage progression – one at 32-cell stage (cluster 2 to cluster 3 and 4)
and another at 64-cell stage (cluster 4 to cluster 6 and 7) (see Supplementary Figure S10). For Nestorowa et al. 39
(Supplementary Figure S11) and Moignard et al. 37 (see Methods and 40) datasets, we again observed peaks in the
transcriptional uncertainty landscape that colocalize with the bifurcation points in the lineage progression.
The use of the two-state mechanistic gene transcriptional model within CALISTA enabled us to probe into a
mechanistic explanation for the observed shape of the transcriptional uncertainty landscape. Table 1 show the
pairwise Pearson correlations between the cell-averaged NLL of each cluster with two biologically interpretable
model parameters, namely transcriptional burst size (number of transcripts generated in each burst) and burst
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
9
frequency (occurrence of burst per unit time) 50 (see Methods). The Pearson correlations indicate that the single-
cell gene expression uncertainty increases with higher burst size and burst frequency (p-value ≤ 0.01). Higher
transcriptional burst size and frequency are associated with a lower qoff – a lower rate of promoter turning off –
and a greater qon – higher rate of promoter turning on. One possible explanation for such change in model
parameters is a higher chromatin accessibility during the transition period of cell differentiation. This finding is
consistent with the view that stem cells increase its gene expression uncertainty or stochasticity by adopting a
more open chromatin state to enable the exploration of the gene expression space 33,50–52.
Coupling cell uncertainty with RNA velocity
In a recent paper 42, La Manno and colleagues introduced the concept of RNA velocity, which involves
computing the rate of change of mRNA from the ratio of unspliced to spliced mRNA. A positive RNA velocity
indicates an induction of gene expression, while a negative RNA velocity indicates a repression of gene
expression. La Manno et al. demonstrated that RNA velocities are able to predict the trajectory of cells undergoing
a dynamical transition, such as in circadian rhythms or cell differentiation. In the following, we explored the
relationship between RNA velocities and single-cell transcriptional uncertainty.
We evaluated the single-cell transcriptional uncertainty and RNA velocity for two single-cell gene expression
datasets that were previously analyzed in La Manno et al. 42; 42. The first dataset came from human glutamatergic
neurogenesis which has a linear (non-bifurcating) lineage progression. Figure 4 (top row) depicts the cell
clustering, single-cell transcriptional uncertainty, and RNA velocities (see also Supplementary Figure S12). The
single-cell transcriptional uncertainty landscape again has the rise-then-fall shape, as in the other cell
differentiation systems discussed above. Interestingly, the same rise-then-fall profile is also seen in the RNA
velocity. As illustrated in Figure 4, the increase and decrease of the RNA velocity preceed the transcriptional
uncertainty, and the peak of RNA velocity occurs prior to those of the transcriptional uncertainty (see
Supplementary File S1 for an animated illustration). Furthermore, a gene-wise cross-correlation analysis confirms
a positive correlation between RNA velocity and single-cell transcriptional uncertainty with a delay for individual
genes (see Supplementary Figure S13).
We also compared RNA velocity and single-cell transcriptional uncertainty for another dataset from mouse
hippocampal neurogenesis with a multi-branching lineage 53. Figure 4 (bottom row) shows that like in the
neurogenesis dataset earlier, the RNA velocity increases and then decreases during cell differentiation, and the
change in the RNA precede that of the transcriptional uncertainty (see Supplementary File S2). Also, the RNA
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
10
velocity peaks take place before the transcriptional uncertainty peaks. The rise-then-fall dynamic of the RNA
velocity seen in the two datasets above is consistent with the view that cells engage in an exploratory stochastic
dynamics as they leave the progenitor state, and disengage this explorative mode as they reach toward the final
cell state.
DISCUSSION
Although Waddington’s epigenetic landscape was originally proposed only as a metaphor, the landscape has
helped stem cell researchers to conceptualize the cell differentiation processes through canalization of cell
lineages. As mentioned earlier, much of the existing literature on the analytical reconstruction of the epigenetic
landscape relied on either a dynamical system theory applied to a simple gene network, or a thermodynamic
interpretation based on the potential energy of a reaction 21–23. In our study, we did not make any prior assumptions
on the gene regulatory network driving the differentiation process nor on the characteristics of the landscape, such
as the existence of a stable valley or that of an energetic barrier (hill). Here, we assumed that the gene transcription
at the single-cell level occurs as stochastic transcriptional bursts, in which mRNA counts follow the equilibrium
distribution that is given by the two-state gene transcription model 43. Therefore, the transcriptional uncertainty
landscape in our study is a reflection of the dynamic evolution of gene transcriptional stochasticity within
individual cells during the cell differentiation process.
The reconstruction of the transcriptional uncertainty landscapes from 10 single-cell transcriptomic datasets
of various cell differentiation processes in our study reveals a universal rise-then-fall trajectory in which cells start
from a high potency state with a uniform gene expression pattern in the cell population, then progress through
transitional cell state(s) marked by increased transcriptional uncertainty (i.e., higher cell-to-cell variability), and
eventually reach one of possibly several final cell states with again a uniform gene expression pattern among the
cells. Furthermore, the peaks of the transcriptional uncertainty landscape colocalize with forks in the cell lineage.
The rise-then-fall in cell uncertainty agrees well with other reports from different cell differentiation systems 8–
12,54,55, suggesting that stem cells go through a transition state of high gene expression uncertainty before
committing to a particular cell fate. The existence of a hill or barrier during the intermediate stage of cell
differentiation has also been proposed in previous studies 14,31,56. In particular Moris and colleagues compared this
transition state to the activation energy barrier in chemical reactions 56. We noted however, that a hill in our
transcriptional uncertainty landscape is a reflection of a peak in the cell-to-cell gene expression variability, and
thus does not represent a resistance or barrier that a cell has to overcome.
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
11
In the analysis of iPSCs differentiation into cardiomyocytes 35, the key genes regulating cardiomyocyte
differentiation are among the largest contributors to the overall transcriptional uncertainty at or around the peak
in the landscape, supporting the idea that dynamic cell-to-cell variability has a functional role in cell-fate decision
making processes 21,57,58. Such an idea would be in congruence with the recent demonstration that, in a
physiologically relevant cellular system, gene expression variability is functionally linked to differentiation 57,58.
The rise-then-fall trajectory in the transcriptional uncertainty landscape are more pronounced in some datatsets
than in others. For example, in Nestorowa 39 and Moignard 37 datasets (see Figure 3c), peaks in the transcriptional
uncertainty landscape are less noticeable than in the other differentiation systems. We noted that cells in the
Nestorowa 39 and Moignard 37 studies were pre-sorted by using flow cytometry based on the expression of surface
protein markers. We posited that at least some cells in the transition state(s) might have been lost during the cell
pre-sorting since such cells might not express the chosen surface markers strongly.
Further, the correlation analysis between the cell transcriptional uncertainty and biologically meaningful rates
of the stochastic gene transcription model showed strong positive correlations with transcriptional burst size and
frequency. In agreement with the result of the correlation analysis, several studies have reported an increase in
gene transcriptional bursts during transition states in cell differentiation and other recent studies have suggested
that both burst frequency and burst size regulate gene expression levels 33,51,52. Importantly, our comparison of the
single-cell transcriptional uncertainty and the single-cell RNA velocity revealed that an increase (decrease) in
RNA velocity predicts an increase (decrease) in transcriptional uncertainty after a short delay, and that a peak of
RNA velocity preceeds that of the transcriptional uncertainty.
The aforementioned observations, while correlative in nature, points to possible biological mechanisms
underlying the universal dynamic feature of single-cell transcriptional uncertainty during cell differentiation. At
the start of the differentiation process, cells engage an exploratory search dynamics in the gene expression space
by increasing stochastic transcriptional burst size and burst frequency. The putative objective of such a stochastic
search is to optimize the cell’s gene expression pattern given its new environment. The engagement of this
stochastic exploratory mode is supported by the observed increased in the overall RNA velocity and its expected-
but-delayed effect in elevating the cell-to-cell gene expression variability (i.e. higher transcriptional uncertainty).
Increased transcriptional burst size and frequency suggest a promoter state of stochastic gene transcription that
favors the ON state (higher qon and lower qoff ).
A possible mechanism behind this exploratory search dynamics is an increase in chromatin mobility, driven by
metabolic alterations in early differentiation 59. Multiple studies have demonstrated that a mismatch between the
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
12
intracellular state of stem cells and their immediate environment can lead to metabolic reorganization 60–62.
More specifically, a change in the balance between glycolysis and OXPHOS metabolism has been associated to
numerous differentiation processes (see Richard et al. 63 and references therein).
Furthermore, changes in the metabolic flux state in early differentiation can modulate the activity of chromatin
modifying enzymes through their metabolic co-factors 64, or in more direct fashion 65 and alter the cell
differentiation outcome. A more dynamic state of the chromatin is associated with more variable gene expressions
due to the changes in the opening-closing dynamics (breathing) of the chromatin 66. As the cells approach the final
state, cells disengage the exploratory search mode, as the cells approach an optimal gene expression and metabolic
state associated with a chosen cell type.
The findings of our analysis fit within the paradigm of a stochastic stem cell differentiation process as
proposed in the introduction. The disordered gene expression pattern during the transition period can be seen as
an exploratory dynamics to find the optimal pattern(s) 14,17. The transcriptional uncertainty in our analysis can be
interpreted as the width of the valley in Waddington’s epigenetic landscape. If one considers the epigenetic
landscape as a depiction of the accessible gene expression subspace through with stochastic single-cell trajectories
pass during differentiation, a wider valley indicates a more variable gene expression pattern. While in the original
Waddington’s epigenetic landscape the valley naturally widens around the branching point in the cell lineage, our
analysis shows that a widening of the valley (an increase in transcriptional uncertainty) also occurs in non-
branching lineage. In other words, the increase in transcriptional uncertainty appears to be a universal feature of
the cell differentiation process, one that arises from the engagement of exploratory mode through increased
stochasticity in transcriptional bursts, as explained above.
METHODS
Main steps of CALISTA workflow
Herein, we briefly describe the main steps involved in the calculation of single-cell transcriptional uncertainty
using CALISTA (for further details see 40).
Pre-processing
Given an 𝑁 × 𝐺 single-cell expression matrix 𝑀, where N denotes the number of cells and G the number of genes,
the pre-processing in CALISTA involves two steps: a normalization of the expression data 𝑚!,# – i.e. the number
of transcripts of gene g in the n-th cell, and a selection of the most variable genes 40.
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
13
Cell clustering
CALISTA clustering follows a two-step procedure. The first step involves a greedy optimization strategy to find
cell clustering that maximizes the total cell likelihood, i.e. the sum of the likelihood value for all cells. The single-
cell likelihood value is computed as the joint probability of the cell’s gene expression data, which is set equal to
the product of the probabilities of the mRNA counts for the selected genes based on the mRNA distribution from
the two-state stochastic gene transcription model. To avoid issues with numerical overflow, we use the logarithm
of the cell likelihood. By performing the greedy optimization multiple times, a consensus matrix containing the
number of times two cells in the dataset are put in the same cluster, is generated.. In the second and final step,
CALISTA generates the cell cluster assignments by using k-medoids clustering based on the consensus matrix.
The final outcome of CALISTA’s clustering is the assignment of cells into K clusters and the optimal model
parameters for the two-state gene transcription model: 𝜃∗(𝑔, 𝑘) = .𝜃%!∗ , 𝜃%&&∗ , 𝜃'∗/#(
for each gene g in cluster k (for
simplicity, we set 𝜃)∗ = 1 and scale the other parameters by the degradation rate) 40.
Lineage progression inference
In CALISTA, cell lineage progression is inferred based on cluster distances – a measure of dissimilarity between
two clusters. The cluster distance of two cell clusters is defined as the average decrease in the cell likelihood value
if the cells from these two clusters are grouped as one cluster, as opposed to the original clustering. The lineage
progression graph is built by adding transition edges between pairs of clusters in increasing magnitude of cluster
distance until all clusters are connected to at least one other cluster, or based on user-specified criteria.
Single-cell transcriptional uncertainty
The last step in our analysis is to compute the final single-cell likelihood. Briefly, for each cell, we consider all
edges in the lineage progression graph that are adjacent to the cell’s respective cluster, i.e. edges that eminate
from or pointing to the cluster to which the cell belongs. The likelihood of a cell along an edge is evaluated by
interpolating the likelihood values of the cell’s gene expression using the mRNA distributions from the two
adjacent clusters. Each cell is then assigned to the edge along which its interpolated likelihood value is maximum,
and the final cell likelihood is set to this maximum value. As mentioned above, the single-cell transcriptional
uncertainty is evaluated as the negative logarithm of the cell likelihood value (NLL).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
14
Pseudotimes calculation
For we can evaluate the pseudotimes for the cells according to the following procedure. First, a pseudotime is
given to each cluster with a value between 0 (initial cell state) and 1 (final cell fate). Subsequently, we determine
the linear fractional position of each cell along its respective edge at which its interpolated likelihood value is
maximum (see Single-cell transcriptional uncertainty). The pseudotime of a cell is computed by a linear
interpolation of the pseudotimes of the two clusters adjacent to its assigned edge according to the cell’s linear
fractional position on this edge.
Epigenetic landscape reconstruction
To visualize the 3D transcriptional uncertainty landscape, we apply dimensional reduction techniques such as
Principal Component Analysis (PCA) or t-SNE on the z-scored expression data, to project the gene expression of
each individual cell on two dimensional axis, which gives the x-y axis of the landscape plot. For the z axis, we
plot the NLL values. The transcriptional uncertainty landscape surface is reconstructed by estimating local
approximation of individual cell 3D coordinates on a regular 30x30 grid by using a publicly available Matlab
surface fitting code called GRIDFIT 67.
Pre-processing and analysis of single-cell expression datasets
Bargaje et al. scRT-qPCR dataset.
The dataset includes the expression profiles of 96 genes from 1896 single cells at 8 different time points (day 0,
1, 1.5, 2, 2.5, 3, 4, 5) during the differentiation of human pluripotent stem cells (iPSCs) into either mesodermal
(M) or endodermal (En) fate 35. By employing CALISTA, we obtained five cell clusters and detected a bifurcation
event, which gives rise to the two final cell fates. After lineage inference, we pseudotemporally ordered cells
along the inferred differentiation paths (for more details, see 40).
Treutlein et al. scRNA-sequencing dataset.
The dataset includes the gene expression profiles of 405 cells during reprogramming of mouse embryonic
fibroblast (MEF) into a desired induced neural (iN) and an alternative myogenic (M) cell fates 38. We pre-
processed the data using CALISTA to select the 40 most variable genes (10% of the number of cells) for the
transcriptional uncertainty analysis. CALISTA identified four different subpopulations and successfully
recovered the bifurcation event (for more details, see 40)
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
15
Richard et al. scRT-qPCR dataset.
The dataset contains the expression profile of 91 genes measured from 389 cells at 6 distinct time points (0, 8, 24,
33, 48, 72 h) during the differentiation of primary chicken erythrocytic progenitor cells (T2EC) 8. Following
CALISTA pre-processing step, we removed cells in which less than 75% of the genes are expressed. Then, we
selected the subset of genes with at least one non-zero expression values. A total of 354 cells and 88 genes were
considered in the transcriptional uncertainty analysis. Based on eigengap heuristics 40,47, we grouped cells into 6
optimal clusters and ordered cells along the inferred linear trajectory (see Supplementary Figure S7).
Stumpf et al. scRT-qPCR dataset.
The dataset comprises the single-cell expression of 97 genes at 7 time points (0, 24, 48, 72, 96, 120, 168 h) during
neural differentiation of mouse embryonic stem cells (E14 cell line) 10. In the data pre-processing, we excluded
cells in which less than 70% of genes are expressed. Then, we selected genes with at least one non-zero expression
values. A total of 276 cells and 93 genes were considered for for the transcriptional uncertainty analysis. Based
on eigengap heuristics 40, we grouped cells into five optimal clusters and ordered cells along the inferred linear
trajectory (Supplementary Figure S8).
Moussy et al. scRT-qPCR dataset.
The single-cell expression dataset includes normalized Ct values for 91 genes in 435 cells captured at 5 distinct
time points (0, 24, 48, 72, 96 h) during human cord blood-derived CD34+ differentiation 12. We employed
CALISTA to group cells into 7 clusters, reconstruct the developmental trajectory and calculate pseudotimes
(Supplementary Figure S9).
Guo et al. scRT-qPCR dataset.
The dataset comprises the single-cell expression values of 48 genes from 387 individual cells isolated at 4 distinct
developmental cell stages, from 8-cell stage mouse embryos to 64-blastocyst 36. By applying CALISTA, we
identified seven different subpopulations along the differentiation process, and the inferred lineage hierarchy
pinpointed two bifurcations events at 32- and 64-cell stage (Supplementary Figure S10). The timing of the lineage
bifurcations coincides with two well-known branching points: one at 32-cell stage when totipotent cells
differentiate into trophectoderm (TE) and inner cell mass (ICM), and another at 64-cell stage when ICM cells
differentiate into primitive endoderm (PE) and epiblast (E).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
16
Nestorowa et al. scRNA-sequencing dataset.
The dataset comprises single-cell gene expression of 1656 cells from mouse hematopoietic stem cell
differentiation 39. We pre-processed the data by removing genes with non-zero values in less than 10% of the
cells. Then, we selected 433 most variable genes, which is 10% of the number of genes after the previous pre-
processing step, for the transcriptional uncertainty analysis 40. We set the optimal number of clusters based on the
original study 39, which reported six different subpopulations and two bifurcation events: the first one producing
common myeloid progenitor (CMP) from lymphoid-primed multipotent progenitors (LMPP), and the second one
generating granulocyte–monocyte progenitors (GMP) from megakaryocyte-erythroid progenitors (MEP)
(Supplementary Figure S11).
Moignard et al. scRT-qPCR dataset.
The dataset contains the single-cell expression level of 18 transcription factors measured in a total of 597 mouse
bone marrow cells during hematopoietic differentiation. By applying CALISTA, we successfully identified the
five subpopulations and the two branching points detected in the original study 37: long-term hematopoietic stem
cells (HSC) differentiating into megakaryocyte–erythroid progenitors (PreM) or lymphoid-primed multipotent
progenitors (LMPP); LMPP cells differentiating into granulocyte–monocyte progenitors (GMP) and common
lymphoid progenitors (CLP) (for details see 40).
Pairwise correlation analysis of transcriptional uncertainty and transcriptional burst size
and frequency.
We defined gene transcriptional burst size and burst frequency using the two-state model parameters, as follows:
𝐵𝑢𝑟𝑠𝑡𝑠𝑖𝑧𝑒 = 𝜃'𝜃%&&
(1)
𝐵𝑢𝑟𝑠𝑡𝐹𝑟𝑒𝑞𝑢𝑒𝑛𝑐𝑦 = 𝜃%! (2)
The burst sizes and burst frequencies are evaluated using the parameters 𝜃∗(𝑔, 𝑘) = .𝜃%!∗ , 𝜃%&&∗ , 𝜃'∗/#(
obtained from
single-cell clustering analysis of CALISTA. Meanwhile, the average gene-wise NLL values for each single-cell
cluster was computed as:
where 𝑁𝐿𝐿#! is the negative log-likelihood of cell n based only on the expression of gene g, and Nk is the total
number of cells in cluster k.
𝑁𝐿𝐿#( =∑ +,,!"#$"%&+$
(3)
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
17
RNA VELOCITY ANALYSIS
Cells and genes were first filtered based on the pre-processing strategy in the original publication by La Manno
and colleagues 42, which resulted in a total of 1720 cells and 1448 genes from human glutamatergic neurogenesis,
and a total of 18140 cells and 2141 genes from mouse hippocampus dataset. We further reduced the number of
genes to only the top 500 highly variable genes for the transcriptional uncertainty analysis. The cell cluster
assignments generated by Velocyto – the algorithm for computing RNA velocity from the original publication –
were considered, instead of using CALISTA. Based on the clustering, we employed CALISTA to generate the
lineage progression and cell pseudotimes (Supplementary Figure S12). The RNA velocity and transcriptional
uncertainty values for the top 500 genes were calculated by employing Velocyto and CALISTA, respectively.
The cell-wise RNA velocity was set to the Euclidean norm of the vector of RNA velocities for each cell, while
the cell-wise NLLs was computed according to:
CODE AVAILABILITY
A MATLAB and R version of CALISTA used in this study is freely available from the following website:
https://www.cabselab.com/calista. All additional data are available from the corresponding author upon request.
DATA AVAILABILITY
All the public single cell data sets analysed in this study are available from the original publications.
COMPETING INTERESTS
The authors declare that they have no competing interests.
FUNDING
This work was supported by the Swiss National Science Foundation (grant number 157154 and 176279) and ANR
research grant SinCity (grant number ANR-17-CE12-0031-01).
𝑁𝐿𝐿!( =∑ +,,!"'((!%&
-.. (4)
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
18
ACKNOWLEDGEMENTS
We would like to thank all members of the SBDM team for lively discussions. We also thank the BioSyL
Federation and the LabEx Ecofect (ANR-11-LABX-0048) of the University of Lyon for inspiring scientific
events.
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
19
REFERENCES
1. Ohno, S., Nagai, Y., Ciccarese, S. & Iwata, H. Testis-Organizing H-Y Antigen and the Primary Sex-
Determining Mechanism of Mammals. Proc. 1978 Laurentian Horm. Conf. 449–476 (1979)
doi:10.1016/B978-0-12-571135-7.50014-X.
2. Herskowitz, I. A regulatory hierarchy for cell specialization in yeast. Nature 342, 749–757 (1989).
3. Lewis, E. B. Clusters of Master Control Genes Regulate the Development of Higher Organisms. JAMA J.
Am. Med. Assoc. 267, 1524 (1992).
4. Davis, R. L., Weintraub, H. & Lassar, A. B. Expression of a single transfected cDNA converts fibroblasts
to myoblasts. Cell 51, 987–1000 (1987).
5. Whyte, W. A. et al. Master transcription factors and mediator establish super-enhancers at key cell identity
genes. Cell 153, 307–19 (2013).
6. Papili Gao, N., Ud-Dean, S. M. M., Gandrillon, O. & Gunawan, R. SINCERITIES: Inferring gene
regulatory networks from time-stamped single cell transcriptional expression profiles. Bioinformatics
(2017) doi:10.1093/bioinformatics/btx575.
7. Sartorelli, V. & Puri, P. L. Shaping Gene Expression by Landscaping Chromatin Architecture: Lessons
from a Master. Mol. Cell 71, 375–388 (2018).
8. Richard, A. et al. Single-Cell-Based Analysis Highlights a Surge in Cell-to-Cell Molecular Variability
Preceding Irreversible Commitment in a Differentiation Process. PLOS Biol. 14, e1002585 (2016).
9. Mojtahedi, M. et al. Cell Fate Decision as High-Dimensional Critical State Transition. PLOS Biol. 14,
e2000640 (2016).
10. Stumpf, P. S. et al. Stem Cell Differentiation as a Non-Markov Stochastic Process. Cell Syst. 5, 268-
282.e7 (2017).
11. Semrau, S. et al. Dynamics of lineage commitment revealed by single-cell transcriptomics of
differentiating embryonic stem cells. Nat. Commun. 8, 1096 (2017).
12. Moussy, A. et al. Integrated time-lapse and single-cell transcription studies highlight the variable and
dynamic nature of human hematopoietic cell fate commitment. PLOS Biol. 15, e2001867 (2017).
13. Buganim, Y. et al. Single-Cell Expression Analyses during Cellular Reprogramming Reveal an Early
Stochastic and a Late Hierarchic Phase. Cell 150, 1209–1222 (2012).
14. Braun, E. The unforeseen challenge: from genotype-to-phenotype in cell populations. Reports Prog. Phys.
78, 036602 (2015).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
20
15. Kupiec, J. J. A chance-selection model for cell differentiation. Cell Death Differ. 3, 385–90 (1996).
16. Kupiec, J. J. A Darwinian theory for the origin of cellular differentiation. Mol. Gen. Genet. MGG 255,
201–208 (1997).
17. Paldi, A. Stochastic gene expression during cell differentiation: order from disorder? Cell. Mol. Life Sci.
60, 1775–1778 (2003).
18. Gupta, P. B. et al. Stochastic state transitions give rise to phenotypic equilibrium in populations of cancer
cells. Cell 146, 633–44 (2011).
19. Thomas, P., Popović, N. & Grima, R. Phenotypic switching in gene regulatory networks. Proc. Natl.
Acad. Sci. U. S. A. 111, 6994–9 (2014).
20. Huang, S. Non-genetic heterogeneity of cells in development: more than just noise. Development 136,
3853–3862 (2009).
21. Rebhahn, J. A. et al. An animated landscape representation of CD4+ T-cell differentiation, variability,
and plasticity: Insights into the behavior of populations versus cells. Eur. J. Immunol. 44, 2216–2229
(2014).
22. Bhattacharya, S., Zhang, Q. & Andersen, M. E. A deterministic map of Waddington’s epigenetic
landscape for cell fate specification. BMC Syst. Biol. 5, 85 (2011).
23. Brackston, R. D., Lakatos, E. & Stumpf, M. P. H. Transition state characteristics during cell
differentiation. PLOS Comput. Biol. 14, e1006405 (2018).
24. Ud-Dean, S. M. M. & Gunawan, R. Ensemble inference and inferability of gene regulatory networks.
PLoS One 9, e103812 (2014).
25. Bonnaffoux, A. et al. WASABI: A dynamic iterative framework for gene regulatory network inference.
BMC Bioinformatics 20, 220 (2019).
26. Ud-Dean, S. M. M. & Gunawan, R. Optimal design of gene knockout experiments for gene regulatory
network inference. Bioinformatics 32, 875–883 (2016).
27. Herbach, U., Bonnaffoux, A., Espinasse, T. & Gandrillon, O. Inferring gene regulatory networks from
single-cell data: a mechanistic approach. BMC Syst. Biol. 11, 105 (2017).
28. Qiu, X. et al. Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression
Dynamics Using Scribe. Cell Syst. 10, 265-274.e11 (2020).
29. Lv, C., Li, X., Li, F. & Li, T. Constructing the energy landscape for genetic switching system driven by
intrinsic noise. PLoS One 9, e88167 (2014).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
21
30. Guo, J. & Zheng, J. HopLand: single-cell pseudotime recovery using continuous Hopfield network-based
modeling of Waddington’s epigenetic landscape. Bioinformatics 33, i102–i109 (2017).
31. Fard, A. T., Srihari, S., Mar, J. C. & Ragan, M. A. Not just a colourful metaphor: modelling the landscape
of cellular development using Hopfield networks. npj Syst. Biol. Appl. 2, 16001 (2016).
32. Zwiessele, M. & Lawrence, N. D. Topslam: Waddington Landscape Recovery for Single Cell
Experiments. bioRxiv 057778 (2017) doi:10.1101/057778.
33. Zhang, J. & Zhou, T. Topographer Reveals Stochastic Dynamics of Cell Fate Decisions from Single-Cell
RNA-Seq Data. bioRxiv 251207 (2018) doi:10.1101/251207.
34. Shi, J., Teschendorff, A. E., Chen, W., Chen, L. & Li, T. Quantifying Waddington’s epigenetic landscape:
a comparison of single-cell potency measures. Brief. Bioinform. (2018) doi:10.1093/bib/bby093.
35. Bargaje, R. et al. Cell population structure prior to bifurcation predicts efficiency of directed
differentiation in human induced pluripotent cells. Proc. Natl. Acad. Sci. U. S. A. 114, 2271–2276 (2017).
36. Guo, G. et al. Resolution of cell fate decisions revealed by single-cell gene expression analysis from
zygote to blastocyst. Dev. Cell 18, 675–85 (2010).
37. Moignard, V. et al. Characterization of transcriptional networks in blood stem and progenitor cells using
high-throughput single-cell gene expression analysis. Nat. Cell Biol. 15, 363–72 (2013).
38. Treutlein, B. et al. Dissecting direct reprogramming from fibroblast to neuron using single-cell RNA-seq.
Nature 534, 391–395 (2016).
39. Nestorowa, S. et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell
differentiation. Blood 128, e20-31 (2016).
40. Papili Gao, N., Hartmann, T., Fang, T. & Gunawan, R. CALISTA: Clustering and LINEAGE Inference
in Single-Cell Transcriptional Analysis. Front. Bioeng. Biotechnol. 8, 18 (2020).
41. Coulon, A., Gandrillon, O. & Beslon, G. On the spontaneous stochastic dynamics of a single gene:
complexity of the molecular interplay at the promoter. BMC Syst. Biol. 4, 2 (2010).
42. La Manno, G. et al. RNA velocity of single cells. Nature 560, 494–498 (2018).
43. Peccoud, J. & Ycart, B. Markovian Modeling of Gene-Product Synthesis. Theor. Popul. Biol. 48, 222–
234 (1995).
44. Kim, J. & Marioni, J. C. Inferring the kinetics of stochastic gene expression from single-cell RNA-
sequencing data. Genome Biol. 14, R7 (2013).
45. Munsky, B., Neuert, G. & Van Oudenaarden, A. Using gene expression noise to understand gene
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
22
regulation. Science vol. 336 183–187 (2012).
46. Suter, D. M. et al. Mammalian Genes Are Transcribed with Widely Different Bursting Kinetics. Science
(80-. ). 332, 472–474 (2011).
47. von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 17, 395–416 (2007).
48. Wang, B., Zhu, J., Pierson, E., Ramazzotti, D. & Batzoglou, S. Visualization and analysis of single-cell
RNA-seq data by kernel-based similarity learning. Nat. Publ. Gr. 14, (2017).
49. Szklarczyk, D. et al. STRING v10: protein-protein interaction networks, integrated over the tree of life.
Nucleic Acids Res. 43, D447-52 (2015).
50. Nicolas, D., Phillips, N. E. & Naef, F. What shapes eukaryotic transcriptional bursting? Mol. Biosyst. 13,
1280–1290 (2017).
51. Antolović, V., Miermont, A., Corrigan, A. M. & Chubb, J. R. Generation of Single-Cell Transcript
Variability by Repression. Curr. Biol. 27, 1811-1817.e3 (2017).
52. Fritzsch, C. et al. Estrogen-dependent control and cell-to-cell variability of transcriptional bursting. Mol
Syst Biol 14, 7678 (2018).
53. Kriegstein, A. & Alvarez-Buylla, A. The Glial Nature of Embryonic and Adult Neural Stem Cells. (2009)
doi:10.1146/annurev.neuro.051508.135600.
54. Han, X. et al. Construction of a human cell landscape at single-cell level. Nature (2020)
doi:10.1038/s41586-020-2157-4.
55. Ye, Y., Yang, Z. & Lei, J. Using single-cell entropy to describe the dynamics of reprogramming and
differentiation of induced pluripotent stem cells. bioRxiv 2020.04.13.040311 (2020)
doi:10.1101/2020.04.13.040311.
56. Moris, N., Pina, C. & Arias, A. M. Transition states and cell fate decisions in epigenetic landscapes. Nat.
Rev. Genet. 17, 693–703 (2016).
57. Moris, N. et al. Histone Acetyltransferase KAT2A Stabilizes Pluripotency with Control of Transcriptional
Heterogeneity. Stem Cells 36, 1828–1838 (2018).
58. Guillemin, A., Duchesne, R., Crauste, F., Gonin-Giraud, S. & Gandrillon, O. Drugs modulating stochastic
gene expression affect the erythroid differentiation process. PLoS One 14, e0225166 (2019).
59. Paldi, A. What makes the cell differentiate? Prog. Biophys. Mol. Biol. 110, 41–43 (2012).
60. Argüello-Miranda, O., Liu, Y., Wood, N. E., Kositangool, P. & Doncic, A. Integration of Multiple
Metabolic Signals Determines Cell Fate Prior to Commitment. Mol. Cell 71, 733-744.e11 (2018).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
23
61. Gu, W. et al. Glycolytic Metabolism Plays a Functional Role in Regulating Human Pluripotent Stem Cell
State. Cell Stem Cell 19, 476–490 (2016).
62. Folmes, C. D. L. et al. Somatic oxidative bioenergetics transitions into pluripotency-dependent glycolysis
to facilitate nuclear reprogramming. Cell Metab. 14, 264–71 (2011).
63. Richard, A. et al. Erythroid differentiation displays a peak of energy consumption concomitant with
glycolytic metabolism rearrangements. PLoS One 14, e0221472 (2019).
64. Moussaieff, A. et al. Glycolysis-Mediated Changes in Acetyl-CoA and Histone Acetylation Control the
Early Differentiation of Embryonic Stem Cells. Cell Metab. 21, 392–402 (2015).
65. Zhang, D. et al. Metabolic regulation of gene expression by histone lactylation. Nature 574, 575–580
(2019).
66. Zwaka, T. P. Breathing chromatin in pluripotent stem cells. Developmental Cell vol. 10 1–2 (2006).
67. John D’Errico. Surface Fitting using gridfit. MATLAB Central File Exchange
https://www.mathworks.com/matlabcentral/fileexchange/8998-surface-fitting-using-gridfit (2020).
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
24
FIGURE AND TABLE LEGENDS
Figure 1. Single-cell transcriptional uncertainty landscape. The illustration depicts the landscape of single-
cell transcriptional uncertainty during a differentiation process over the (pseudo) time (from blue to yellow). Each
dot corresponds to a cell in the single-cell transcriptomic dataset. Cells start their journey from a valley in the
landscape, through a hill, before ending at one of the final valleys / states.
Initial stateTransition state
Final states
Nega
tive
log
likel
ihoo
d
OFF
mRNAx
θoff
θt
θd
θon
ON
Two-state model
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
25
Figure 2. Analysis of single-cell transcriptional profiles during iPSC differentiation into cardiomyocytes.
The single-cell gene expression dataset was taken from the study of Bargajeet al. 35 (a-b) Epigenetic landscape
plots estimates by CALISTA. Each dot on the landscape represents a cell where the colors indicate (a) the cell
cluster, (b) pseudotime (from dark blue to yellow). The x-y axes of the landscape plots correspond to the first and
second principal component (PC) coordinates, respectively, of the single-cell transcriptomic data. (c) Boxplots of
NLL values for each single-cell cluster. (d-e) Moving-window average NLL along (d) endoderm and (e)
mesoderm fate trajectory. (f) NLL of each gene and cell for every single-cell cluster. (g) Protein-protein interaction
network of top variable genes inferred by STRING 49. Blue nodes represent transcription factors, while red nodes
denote proteins involved in signal transduction. The width of the edges denotes the confidence for the inferred
relationship (thicker edge = higher confidence).
EM
En
PS
0 200 400 600 800 1000Cell Ordering
150
200
250
300
350
400
450
500
550
600
650
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 5
0 200 400 600 800 1000 1200Cell Ordering
150
200
250
300
350
400
450
500
550
600
650
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 3 4
aNL
L
NLL
NLL
NLL
Cluster Cell ordering Cell ordering
PC1 PC1
PC2PC2
NLL
c d e
b
Cells
Cluster 2 Cluster 3 Cluster 4 Cluster 5
NLL
NLL
NLL
NLL
NLL
Cluster 1
CellsCells
CellsCells
GenesGenes Genes
GenesGenes
f
g
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
26
Figure 3. CALISTA analysis of single-cell expression data. (a-c) Landscape plots (based on cell clusters and
pseudotime) and moving-averaged NLLs for each differentiation path of (a) single-branching trajectory (Treutlein
dataset 38), (b) linear trajectories (Richard 8, Stumpf 10, and Moussy 12 datasets), (c) multi-braching trajectories
(Guo 36, Nestorowa 39, and Moignard 37 datasets). Green and red vertical arrows in moving-averaged NLL plots
indicate the first and second peak in cell uncertainty, respectively.
0 50 100 150 200 250Cell Ordering
140
160
180
200
220
240
260
280
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 3
0 50 100 150 200 250 300Cell Ordering
140
160
180
200
220
240
260
280
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 4
Treu
tlein
0 50 100 150 200Cell Ordering
250
300
350
400
450
500
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 3 4 5
0 50 100 150 200 250 300Cell Ordering
200
300
400
500
600
700
800
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 3 4 5 6
0 100 200 300Cell Ordering
200
250
300
350
400
450
500
550
600
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 2 1 4 3 5 6 7
Stum
pfRi
char
dM
ouss
y
0 50 100 150 200Cell Ordering
140
160
180
200
220
240
260
280
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 3 5
0 20 40 60 80 100 120 140 160 180Cell Ordering
140
160
180
200
220
240
260
280
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 4 7
0 20 40 60 80 100 120 140 160Cell Ordering
140
160
180
200
220
240
260
280
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 4 6
0 20 40 60 80 100 120 140 160 180 200Cell Ordering
80
90
100
110
120
130
140
150
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 3
0 50 100 150 200 250 300Cell Ordering
80
90
100
110
120
130
140
150
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 4
0 50 100 150 200 250 300Cell Ordering
80
90
100
110
120
130
140
150
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 5
0 100 200 300 400 500 600Cell Ordering
1800
1900
2000
2100
2200
2300
2400
2500
2600
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 3
0 100 200 300 400 500 600 700 800 900Cell Ordering
1800
1900
2000
2100
2200
2300
2400
2500
2600
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 4 6
0 100 200 300 400 500 600Cell Ordering
1800
1900
2000
2100
2200
2300
2400
2500
2600
Nega
tive
log
likel
ihoo
d
Moving averaged cell-logP after cell ordering
Clusters in path: 1 2 5
Guo
Nes
toro
wa
Moi
gnar
d
a
b
c
iN
MMEF
MEF -> iN
HSC MPPLMPP
CMPGMP
MEP
GMP
PreMCLP
HSC
EPE
TE
8C
MEF -> M
ESC
NPC
HSC
CMP
HSC -> CMP
ESC -> NPC
8C/16C -> TE 8C/16C -> ICM -> E
ICM
8C/16C -> ICM -> PE
HSC/MPP -> LMPP HSC/MPP ->CMP -> GMPHSC/MPP ->CMP -> MEP
HSC -> PreM HSC -> LMPP -> CLP
LMPP
HSC -> LMPP -> GMP
T2EC (0h)
T2EC (72h)
T2EC (0h) -> T2EC (72h)
16C
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
27
Figure 4. Comparison between RNA Velocities estimated using Velocyto 42 and CALISTA NLL values.
(Top row) Human glutamatergic neurogenesis in La Manno et al. study 42. (Bottom row) Mouse hippocampal
neurogenesis in Kreigstein et al. study 5353. (First column) Cell clustering assignments evaluated from Velocyto.
Normalized values for Euclidean norm of RNA velocities (2nd column), CALISTA single-cell transcriptional
uncertainty (NLL; 3rd column). The colors in the first column indicate the cell clusters, and those in the second-
third columns indicate the normalized cell-wise RNA velocities and NLL values respectively.
-20 -10 0 10PC1
-15
-10
-5
0
5
10
15PC
2
-20 -10 0 10PC1
-15
-10
-5
0
5
10
15
PC2
-20 -10 0 10PC1
-15
-10
-5
0
5
10
15
PC2
-20 -10 0 10 20TSNE1
-30
-20
-10
0
10
20
30
TSNE2
-20 -10 0 10 20TSNE1
-30
-20
-10
0
10
20
30
TSNE2
-20 -10 0 10 20TSNE1
-30
-20
-10
0
10
20
30
TSNE2
RNA Velocity NLLCell ClustersLa
Man
noKr
eigs
tein
Radial glia
Neuroblast
Immature neuron
Neuron
Astrocyte OPC
CA2-3-4
CA
Radial glia
nlPC
CA1-Sub
Nbl
Granule
ImmGranule
0
1
RNA
Velo
city/
NLL
a b c
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint
28
Table 1. Pairwise correlation coefficients between transcriptional uncertainty and transcriptional burst
frequency / burst size.
Correlation with Transcriptional Uncertainty (p-value ≤ 0.01 in red boldface):
Burst Frequency Burst Size
Bargaje et al. 0.74 0.70
Treutlein et al. 0.71 0.64
Richard et al. 0.68 0.55
Stumpf et al. 0.68 0.88
Moussy et al. 0.40 0.81
Guo et al. 0.75 0.82
Nestorowa et al. 0.73 0.71
Moignard et al. 0.04 0.83
La Manno et al. 0.78 0.55
Kreigstein et al. 0.77 0.32
.CC-BY-NC-ND 4.0 International license(which was not certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprintthis version posted April 25, 2020. . https://doi.org/10.1101/2020.04.23.056069doi: bioRxiv preprint