Supporting Appendix for: MicroRNA-15a and 16-1 Act Via MYB to Elevate Fetal Hemoglobin Expression in Human Trisomy 13 Vijay G. Sankaran, Tobias F. Menne, Danilo Šćepanović, Jo-Anne Vergilio, Peng Ji, Jinkuk Kim, Prathapan Thiru, Stuart H. Orkin, Eric S. Lander, and Harvey F. Lodish* *To whom correspondence should be addressed. E-mail: [email protected]
SI Materials and Methods
Cell Culture
293T cells were maintained in DMEM with 10% FCS and 2% penicillin/streptomycin. These cells
were transfected with the FuGENE 6 (Roche) reagent according to the manufacturer’s protocol.
K562 erythroleukemia cells were cultured in RPMI-1640 medium with 10% FCS, 2%
penicillin/streptomycin, and 1% L-glutamine. Cells were maintained at a density of 0.1-1 × 10⁶
cells/ml.
Primary human CD34+ cells were cultured and differentiated in a manner similar to that
previously described (1). Briefly, the CD34+ cells were obtained from magnetically sorted
mononuclear samples of G-CSF mobilized peripheral blood from donors and were frozen after
isolation. Cells were obtained from the Yale Center of Excellence in Molecular Hematology
(YCEMH). Cells were thawed and washed into RPMI-1640 with 10% FCS, and then seeded in
StemSpan SFEM Medium (StemCell Technologies Inc.) with 1X CC100 cytokine mix (StemCell
Technologies Inc.) and 2% penicillin/streptomycin. Cells were maintained in this expansion
medium at a density of 0.1-1 × 10⁶ cells/ml with media changes every other or every third day
as necessary. Cells were kept in expansion medium for a total of 6 days. On day 6, cells were
reseeded into StemSpan SFEM Medium with 2% penicillin/streptomycin, 20 ng/ml SCF, 1 U/ml Epo,
5 ng/ml IL-3, 2 µM dexamethasone, and 1 µM β-estradiol. Cells were maintained in
differentiation medium, with media changes every other or every third day as needed. Cells
were maintained at a density of 0.1-1 × 10⁶ cells/ml. By day 3 of differentiation, homogeneous
larger blasts were present in the culture. By day 5, the majority of cells had proerythroblast
morphology and on day 7 the majority of the cells had basophilic erythroblast morphology. By
day 12 of differentiation, the majority of cells demonstrated orthochromatophilic and
polychromatophilic erythroblast morphology. This morphological classification has been
confirmed using phenotypic markers of erythropoiesis, including CD235, CD71, CD45, and
CD36 expression.
Constructs
A 628 bp genomic DNA fragment from human chromosome 13 containing the hairpin region of
miR-15a and 16-1 and 200bp flanking sequence on each side was isolated by PCR from human
genomic DNA and was cloned into the XhoI/XbaI and FseI/PacI sites of the pLVX-puro
(Clontech) and pSMPUW-puro (Cell BioLabs, Inc.) lentiviral expression vectors, respectively
(core primer sequences: forward 5’-GGGCACAGAATGGACTTCAG-3’; reverse 5’-
GATGGCATTCAATACAATTATTA-3’).
The 1.21 kb 3’-UTR of MYB was cloned into the XhoI and NotI sites of the psiCHECK-2 vector
(Promega) after PCR amplification from human genomic DNA.
shRNA lentiviral constructs targeting human MYB were obtained from the Sigma-Aldrich
Mission shRNA collection and the clones used in this study were TRCN0000009853 and
TRCN0000040058. The sequences of the shRNAs encoded by these clones are (respectively):
CCGGAACAGAATGGAACAGATGACCTCGAGGTCATCTGTTCCATTCTGTTCTTTTTG
CCGGCCAGATTGTAAATGCTCATTTCTCGAGAAATGAGCATTTACAATCTGGTTTTTG
Lentiviral Preparation & Infection
Lentiviruses were prepared in a manner similar to what has been described previously (1).
Briefly, 293T cells were transfected with the lentiviral constructs described above along with
VSV-G and p∆8.9 vectors that contain the other necessary viral components. At 24 hours after
transfection, the medium was replaced with the medium appropriate for the target cells. At 48
hours, this medium was collected and filtered at 0.45 µm. CD34+ cells (generally transduced for these
experiments at day 3 of expansion) and K562 cells were transduced with lentivirus using a spin-
infection method. The cells were mixed with filtered viral supernatant (with cytokines added as
appropriate for the cell line), and polybrene (filtered 4 mg/ml stock) was added to the mixture at
2 µl/ml of medium. The cells were spun at room temperature for 90 minutes at 2500 rpm. Cells
were then kept at 37°C overnight; the following day, the cells were washed twice with PBS,
and seeded into fresh medium. Selection with puromycin was started at 48 hours after viral
transduction. Appropriate selection concentrations were determined by exposing uninfected
cells to a range of puromycin concentrations; the final concentration used was typically
1-2 µg/ml.
RNA extraction, qRT-PCR, and Microarray Analysis
Isolation of RNA was performed using the miRNeasy Mini Kit (Qiagen). An on-column DNase
(Qiagen) digestion was performed according to manufacturer’s instructions with the RNeasy
Mini Kit. RNA was quantified by a NanoDrop spectrophotometer (Thermo Scientific). cDNA
was synthesized with the iScript cDNA synthesis Kit (Bio-Rad). Real time PCR was performed
using the ABI 7900 Machine with ABI SYBR Green PCR Mastermix (Applied Biosystems, Inc.),
similar to what has been previously described (1, 2). For quantification of microRNA levels, the
appropriate TaqMan set of primers and the TaqMan PCR Mastermix (Applied Biosystems, Inc.)
were used following manufacturer’s instructions on the ABI 7900 instrument. RNU19 was used
as a control for these experiments. Relative expression was quantitated using the ∆∆Ct method
as described previously (1, 2). Sequences of primers used for RT-PCR have been published
previously (3). Preparation of samples for expression microarray analysis was done as
previously described (1) and microarrays were processed by the Genome Technology Core at
the Whitehead Institute. Expression array data were normalized by Robust Multichip Averaging
(RMA) using the affy package from Bioconductor. A custom probeset definition was used for
processing the arrays such that there was one probe set per Entrez Gene ID. Differentially
expressed genes were identified using a moderated t test in the limma package in Bioconductor
(4), corrected for the false discovery rate. This analytical approach was also utilized with gene
set GSE13110 from MYB siRNA knockdown in human cord blood CD34+ cells (5). The
expression of all genes in this dataset was normalized and the expression of the hemoglobin
genes was subsequently ascertained. Gene set enrichment analysis (GSEA) was performed as
previously described using processed microarray data (2, 6, 7). To produce gene sets from the
lineage-specific data (8) or from the shMYB data, genes with log2 expression changes of >2
(with p values < 0.01) or >1 (with p values < 0.05), respectively, were used to create the various
gene sets. These were then applied using the GSEA algorithm to assess enrichment of these
gene sets in the processed datasets described in the text. This was done for both the
processed shMYB data, as well as for an erythroid differentiation dataset (GSE628) that has
been described previously (9). The gene sets that were used in this analysis are available on
request. The shMYB microarray data has been deposited in the Gene Expression Omnibus
(http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE25678.
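The ∆∆Ct calculation referred to above can be sketched as follows. This is a minimal illustration that assumes the common 2^(-∆∆Ct) form with roughly 100% amplification efficiency; the function name and the Ct values are hypothetical and are not taken from the study.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target in a sample relative to a control condition.

    Assumes the standard 2^(-ddCt) form with ~100% amplification efficiency.
    """
    # dCt: target Ct normalized to the reference transcript within each condition
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_control = ct_target_control - ct_ref_control
    # ddCt: difference in normalized Ct between the two conditions
    dd_ct = d_ct_sample - d_ct_control
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values, for illustration only
fold_change = relative_expression(24.0, 18.0, 26.0, 18.0)
print(fold_change)  # 4.0
```

A target that crosses threshold two cycles earlier in the sample than in the control, after normalization, is reported as a four-fold increase.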
Flow Cytometry
For cell cycle analysis by propidium iodide (PI) staining, cells were washed in PBS, diluted into
300 µl of PBS, and fixed with 3 ml of 70-80% ethanol (EtOH) added slowly in a dropwise
fashion to the cells. The cells were allowed to fix at 4°C for at least 15 minutes and were often
stored in the EtOH solution for several days. Cells were washed with PBS with 1% serum and
then incubated in 250 µl of PBS/1% serum containing 50 µg/ml PI and 0.1 mg/ml RNase A.
The cells were incubated at 37°C for 30 minutes (in the dark) and then were kept on ice prior to
FACS analysis.
Staining of surface markers was performed as has been previously described (2). Cells were
washed with PBS with 1% serum and stained with the appropriate surface antibodies (labeled
with various fluorophores). After unbound antibodies were washed away, cells were stained with
either PI or 7-AAD as a marker of viability.
All data analysis was performed using the FlowJo 7.5.5 suite of flow cytometry analysis tools.
Cell cycle analysis was performed with the cell cycle modeling feature in FlowJo, using the
Watson model, which estimates the percentage of cells in G1, S, and G2/M phases by fitting
the flow cytometry data from propidium iodide staining.
Luciferase Reporter Assays
293T cells were seeded at 5000 cells per well in 96 well plates on the day prior to transfection.
Transfection was carried out in at least 3 replicates using FuGENE 6 (Roche), with 5 ng of the
pLVX-miR-15a-16-1 or pLVX expression vectors and 10 ng of the MYB 3’-UTR psiCHECK-2
construct. After 2 days, detection of both firefly and renilla luciferase was performed using the
Dual-Glo Luciferase assay kit (Promega). Luminescence was quantitated and renilla luciferase
readings were normalized against the firefly luciferase activity to determine the relative
luciferase activity.
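The normalization described above can be sketched as follows. This is an illustrative calculation only: the well readings are hypothetical, and expressing the result relative to the empty pLVX vector is an assumption about how the two transfection conditions were compared.

```python
from statistics import mean


def relative_luciferase(renilla, firefly):
    """Per-well Renilla signal normalized to the firefly signal from the same well."""
    return [r / f for r, f in zip(renilla, firefly)]


def normalized_to_control(test_ratios, control_ratios):
    """Mean ratio for the test vector relative to the mean ratio for the control vector."""
    return mean(test_ratios) / mean(control_ratios)


# Hypothetical luminescence readings from three replicate wells per condition
control = relative_luciferase([1000.0, 1100.0, 900.0], [500.0, 550.0, 450.0])
mir = relative_luciferase([500.0, 600.0, 400.0], [500.0, 600.0, 400.0])
print(normalized_to_control(mir, control))  # 0.5
```

A value below 1 indicates lower reporter activity in the presence of the microRNA expression vector than with the empty vector.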
Western Blotting
Western blotting was performed as described previously (1, 3). K562 cells were selected in
puromycin for at least 1 week and harvested before western blotting. The anti-MYB antibody
(clone 1-1) was purchased from Millipore.
Pathological Analysis
Pathological specimens from autopsies from over four decades of the archives of the
Department of Pathology at the Children’s Hospital of Boston were examined using a query for
trisomy 13. In several cases, improperly annotated cases were found and were not used in
further analysis. In other cases, limited hematopoietic tissues (bone marrow, spleen, and liver)
were available for assessment and so these cases were not used in further analysis. After
filtering the cases obtained, 17 cases were used in the final analysis and hematopoiesis was
assessed by a hematopathologist in all cases, using standard criteria for lineage ratios,
cellularity, and numbers of progenitors (10, 11). Summaries of this analysis are shown in Table
S1. Similar samples at the Children’s Hospital of Boston, from the same time period, but from
patients with other diagnoses were used to control for the abnormal findings observed in this
analysis.
Integrative Genomic Analysis
The boundaries of the chromosomal 13q14 region were defined using tools available through
the UCSC genome browser (http://genome.ucsc.edu/) using Human (Homo sapiens) genome
draft version GRCh37/ hg19 (February 2009). The region was defined from coordinates
chr13:40,100,001-55,300,000. All genes in this region were identified using RefSeq, GenBank,
and UniProt identifiers and redundancy among the genes was collapsed to define a unique
group of genes for analysis. To assess the relative expression of these genes in human
erythroid (CD71+) progenitors, data from the GNF expression atlas was used for these genes
compared with a panel of 78 other cell types from this dataset (12). Using such an analysis, all
known regulators of globin gene regulation and erythropoiesis (13) have expression levels > 1
and therefore this was used as a cutoff in assessing potential candidate genes (Figure S2).
Among the genes found in this region, 18% have such an expression ratio and were
selected for further analysis and examination as candidates for the elevated fetal hemoglobin
level trait. These data are shown in Table S1 and Figure S2.
The analysis of histone 3 lysine 4 trimethylation using ENCODE data from K562 erythroid cell
lines in the 13q14 region was performed in the following manner. Promoters of genes in the
region of interest were downloaded using UCSC Genes definitions, 1500 bases upstream of the
transcription start site using the UCSC Genome Browser’s Table Browser feature. ENCODE
ChIP data was downloaded in WIG format and converted to BED format. Overlapping regions
were determined with the “Intersect” tool in Galaxy Tools (http://main.g2.bx.psu.edu/), applied
to the files containing the promoters and the ChIP data. A ChIP signal height of at least 50
pixels was used as the threshold for a positive peak. The resulting promoters in chromosomal
band 13q14 that had significant H3K4me3 peaks are shown in Table S2. The intersection of
this data with the expression data in Table S1 (for genes with relative expression > 1) is shown
in Table S3.
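The promoter and peak intersection described above can be sketched in a few lines. This is not the Galaxy implementation; it is a minimal stand-in that assumes BED-style half-open intervals and treats the 50-pixel display height as a numeric threshold on a per-peak height value. The gene names, coordinates, and heights are hypothetical.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int            # 0-based, inclusive (BED convention)
    end: int              # exclusive
    name: str
    height: float = 0.0   # peak height; left at 0 for promoters


def promoters_with_peaks(promoters: List[Interval], peaks: List[Interval],
                         min_height: float = 50.0) -> List[str]:
    """Names of promoters overlapping at least one peak of sufficient height."""
    strong = [p for p in peaks if p.height >= min_height]
    hits = []
    for prom in promoters:
        # Two half-open intervals overlap when each starts before the other ends
        if any(prom.chrom == pk.chrom and prom.start < pk.end and pk.start < prom.end
               for pk in strong):
            hits.append(prom.name)
    return hits


# Hypothetical 1500-base promoter windows and ChIP peaks
promoters = [Interval("chr13", 1000, 2500, "GENE_A"),
             Interval("chr13", 9000, 10500, "GENE_B")]
peaks = [Interval("chr13", 2400, 2600, "peak1", 80.0),
         Interval("chr13", 9100, 9300, "peak2", 20.0)]
print(promoters_with_peaks(promoters, peaks))  # ['GENE_A']
```

GENE_B overlaps a peak, but that peak falls below the height threshold and is therefore not counted.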
To assess potential candidate microRNA targets, the aggregate P_CT scores (14) for all predicted targets
of miR-15/16 were compared with the relative expression levels in erythroid and hematopoietic
tissues from the GNF expression atlas (12), similar to what is described above. Candidate
targets of interest were chosen based upon high aggregate P_CT scores, along with relatively
elevated expression within the tissues of interest (Figure 3A and S6). The aggregate P_CT scores
for all miR-15/16 seed targets can be obtained from the TargetScan website
(http://www.targetscan.org/). The data from release 5.1 (April 2009) were used in this analysis.
Among this group of candidates, MYB was particularly notable and therefore was used in further
analyses.
SI Text
Probabilistic Models of Chromosomal Region Association
We developed a set of models to explain the correlation between trisomy of particular bands of
chromosome 13 and elevation/persistence of fetal hemoglobin (HbF). The aim is to compute the
posterior probability of model parameters given the data under a particular model M :
$$P(\text{parameters} \mid \text{data}, M)$$
The above can be calculated using the standard Bayesian formulation:
$$P(\text{parameters} \mid \text{data}, M) = \frac{P(\text{data} \mid \text{parameters}, M)\, P(\text{parameters} \mid M)}{P(\text{data} \mid M)} \tag{1}$$
The denominator in Equation (1) represents the probability of observing the data under a
specific model, and is commonly referred to as the Bayesian Score (BS). An appropriate model
produces a high BS, whereas inappropriate models (ones with improper assumptions or
parameters) produce a low BS. It is important to note that the BS is not a function of any
particular parameter values, but rather it considers all possible parameter values in proportion to
their ability to explain the data. We use the BS to select the most appropriate model.
Data:
We analyze a data set of 57 subjects with partial trisomy of chromosome 13 who have been
classified as affected (elevated HbF) or unaffected (15, 16). Each subject’s partial trisomy is
represented as a vector of 0s and 1s, where 1 signifies trisomy. The data set is a matrix where
each row is a particular subject. There are $N$ rows (number of subjects) and $B$ columns
(number of bands of chromosome 13).
$$
\mathbf{S} =
\begin{bmatrix} \text{Subject } 1 \\ \text{Subject } 2 \\ \vdots \\ \text{Subject } N \end{bmatrix}
=
\left.\underbrace{\begin{bmatrix}
1 & 1 & \cdots & 0 \\
1 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 1 & \cdots & 1
\end{bmatrix}}_{B}\right\} N
$$
Each subject is either affected or unaffected with elevated HbF. This information is stored in a
binary vector $\vec{A}$, also of length $N$, where 1 signifies an affected subject:
$$\vec{A} = \underbrace{\begin{bmatrix} 1 & 0 & \cdots & 1 \end{bmatrix}^T}_{N}$$
To summarize the data, we can construct probability distributions of trisomy of various bands in
the affected and unaffected populations (shown in Figures 1B and S1 in the paper):
$$P(\text{Trisomy} \mid \text{Affected}) = \vec{A}^T \mathbf{S} \,/\, \vec{A}^T \vec{1}$$
$$P(\text{Trisomy} \mid \text{Unaffected}) = (\vec{1} - \vec{A})^T \mathbf{S} \,/\, (\vec{1} - \vec{A})^T \vec{1}$$
Here, $\vec{1}$ is a column vector of length $N$ and simply serves to either sum the elements of a
vector, or to reverse the bits in $\vec{A}$. Thus, $\vec{A}^T \vec{1}$ is simply the total number of affected subjects in
the data set, $\vec{1} - \vec{A}$ is a vector analogous to $\vec{A}$ but with 1s for unaffected subjects, and
$(\vec{1} - \vec{A})^T \vec{1}$ is the total number of unaffected subjects. For a full list of symbols used in this
document, refer to the Mathematical Glossary.
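As a small illustration of the two band-wise trisomy distributions defined above, the following sketch computes them for a toy data set. The matrix and the affected vector are hypothetical (4 subjects, 3 bands) and are not the study data; the function name is likewise illustrative.

```python
def trisomy_frequency(S, A, affected=True):
    """Fraction of affected (or unaffected) subjects with trisomy of each band.

    S is a list of per-subject rows of 0/1 band flags; A is the 0/1 affected vector.
    """
    wanted = 1 if affected else 0
    rows = [row for row, a in zip(S, A) if a == wanted]
    if not rows:
        raise ValueError("no subjects in the requested group")
    # Column sums divided by the group size
    return [sum(band) / len(rows) for band in zip(*rows)]


# Hypothetical toy data: 4 subjects (rows) by 3 bands (columns)
S = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
A = [1, 1, 0, 0]
print(trisomy_frequency(S, A, affected=True))   # [1.0, 0.5, 0.0]
print(trisomy_frequency(S, A, affected=False))  # [0.0, 0.5, 1.0]
```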
Model Notation and Assumptions
We test a number of models designed to compute the probability that a particular band X of
chromosome 13 contains the gene responsible for elevation of HbF. This is similar to the model
used by Korbel et al. in their study of segmental trisomy 21 (17). However, we also model gene
silencing by supposing that there exists a regulatory region, which is necessary for the genes in
band X to be expressed or, alternatively, to ensure that the genes in band X are not silenced.
We assume that if the gene responsible for elevation of HbF is expressed in trisomy, then that
subject has a probability $P_*$ of being affected (or $1 - P_*$ of being unaffected). On the other hand,
if the responsible gene is not expressed in trisomy (either because band $X$ is not present in
trisomy, or because it is present in trisomy but the regulatory region is not), then the subject has a
probability $P_0$ of being affected ($1 - P_0$ of being unaffected), with $P_0 < P_*$.
In all models, we have a single parameter $X$, and we wish to compute $P(X = x \mid \text{data}, M)$ as
in Equation (1). There are two other potential parameters, $P_*$ and $P_0$, but their values can be
ascertained from full trisomy data. The vast majority of individuals with full trisomy 13 have
elevated HbF, and since full trisomy implies that the gene of interest is certainly expressed in
trisomy, we set $P_* = 0.8$. On the other hand, HbF elevation is very rare in individuals without
trisomy 13, so we set $P_0 = 0.05$. We also test these assumptions, and show that as long as
$P_0 < P_*$, on average we obtain the same answer as if we assume the aforementioned explicit
values for $P_*$ and $P_0$.
Finally, we consider the prior probability of $X$: $P(X = x)$ (how strongly we believe that the
gene of interest resides on a particular band $x$, before we see any data). There are many ways
to derive a valid prior; for example, we could say that the probability of a gene being on a
particular band is proportional to the number of genes contained within that band. For simplicity,
we use a uniform prior, where we consider each band equally likely to contain the gene of
interest. In the remainder of the document, all the equations contain the general term
$P(X = x)$, but in the numerical analysis we use $P(X = x) = 1/B$ for all $x$.
General Probability Structure
For each model, we compute the probability $P(\text{data} \mid X = x, M)$. Due to the definitions of
$P_*$ and $P_0$, the conceptual equation of this probability is:
$$P(\text{data} \mid X = x, M) = P_*^{N_{ATxE}}\; P_0^{N_{ADx} + N_{ATxS}}\; (1 - P_*)^{N_{UTxE}}\; (1 - P_0)^{N_{UDx} + N_{UTxS}} \tag{2}$$
where $N_{ATxE}$ is the number of subjects who are affected ($A$) with elevated HbF and in whom
band $x$ is expressed ($E$) in trisomy ($T$), $N_{ADx}$ is the number of subjects who are affected and
in whom band $x$ is present in disomy ($D$), $N_{ATxS}$ is the number of subjects who are affected
and in whom band $x$ is present in trisomy but is silenced ($S$); $N_{UTxE}$, $N_{UDx}$, and $N_{UTxS}$ are the
analogues for the unaffected ($U$) population. In the equation for each model, we show a matrix-
vector notation for how each of these counts is calculated from the data ($\mathbf{S}$ and $\vec{A}$).
The denominator of Equation (1) is simply Equation (2) multiplied by the prior $P(X = x)$, and
summed over all possible values of $x$ (by the law of total probability):
$$P(\text{data} \mid M) = \sum_{k=1}^{B} P(\text{data} \mid X = k, M)\, P(X = k)$$
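Equation (2) and the sum that gives the Bayesian Score can be sketched numerically as follows. This is an illustrative implementation, not the authors' code: it works with log-likelihoods for numerical convenience (an implementation choice not stated in the text), the function names are hypothetical, and the example counts are invented.

```python
import math


def log_likelihood(n_atxe, n_adx, n_atxs, n_utxe, n_udx, n_utxs,
                   p_star=0.8, p_zero=0.05):
    """Logarithm of Equation (2) for one candidate band, given the six counts."""
    return (n_atxe * math.log(p_star)
            + (n_adx + n_atxs) * math.log(p_zero)
            + n_utxe * math.log(1.0 - p_star)
            + (n_udx + n_utxs) * math.log(1.0 - p_zero))


def bayesian_score(log_likelihoods, prior=None):
    """P(data | M): likelihood times prior, summed over all candidate bands."""
    n_bands = len(log_likelihoods)
    if prior is None:
        prior = [1.0 / n_bands] * n_bands  # uniform prior, 1/B per band
    return sum(math.exp(ll) * p for ll, p in zip(log_likelihoods, prior))


# Hypothetical counts for two candidate bands
lls = [log_likelihood(2, 0, 0, 0, 2, 0),   # band fits the data well
       log_likelihood(0, 2, 0, 2, 0, 0)]   # band fits the data poorly
print(bayesian_score(lls))
```

Counts may be fractional, as happens in the models with a separate regulatory region, and the same functions apply unchanged.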
Model 1
In Model 1, we assume that no other regulatory region is necessary for expression of the genes
on band X . This is equivalent to assuming that a regulatory region does exist, but that it also
resides on band X . For this model, the posterior probability distribution for X is:
$$
P(X = x \mid \text{data}, M_1) =
\frac{
P_*^{\vec{A}^T \vec{S}_x}\; P_0^{\vec{A}^T (\vec{1} - \vec{S}_x)}\; (1 - P_*)^{(\vec{1} - \vec{A})^T \vec{S}_x}\; (1 - P_0)^{(\vec{1} - \vec{A})^T (\vec{1} - \vec{S}_x)}\; P(X = x)
}{
\sum_{k=1}^{B} P_*^{\vec{A}^T \vec{S}_k}\; P_0^{\vec{A}^T (\vec{1} - \vec{S}_k)}\; (1 - P_*)^{(\vec{1} - \vec{A})^T \vec{S}_k}\; (1 - P_0)^{(\vec{1} - \vec{A})^T (\vec{1} - \vec{S}_k)}\; P(X = k)
}
$$
where $\vec{S}_x = \mathbf{S}\,\vec{\delta}_x$ is the $x$th column of $\mathbf{S}$.
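The Model 1 posterior can be sketched as follows, under the uniform prior and the values of $P_*$ and $P_0$ given earlier. This is an illustrative implementation rather than the authors' code: it multiplies per-subject factors, which is equivalent to the exponent form because every entry of $\vec{S}_x$ is 0 or 1. The toy matrix and affected vector are hypothetical.

```python
def model1_posterior(S, A, p_star=0.8, p_zero=0.05):
    """Return (posterior over bands, Bayesian Score) for Model 1 with a uniform prior."""
    n_bands = len(S[0])
    prior = 1.0 / n_bands
    weights = []
    for x in range(n_bands):
        w = prior
        for row, a in zip(S, A):
            # The gene is taken to be expressed in trisomy exactly when band x is trisomic
            p_affected = p_star if row[x] == 1 else p_zero
            w *= p_affected if a == 1 else (1.0 - p_affected)
        weights.append(w)
    score = sum(weights)  # denominator of the posterior, i.e. P(data | M1)
    return [w / score for w in weights], score


# Hypothetical toy data: 4 subjects (rows) by 3 bands (columns)
S = [[1, 1, 0],
     [1, 0, 0],
     [0, 1, 1],
     [0, 0, 1]]
A = [1, 1, 0, 0]
posterior, score = model1_posterior(S, A)
print(posterior, score)
```

In this toy data set the first band is trisomic in both affected subjects and in neither unaffected subject, so it receives nearly all of the posterior mass.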
Model 2
In Model 2, we assume that a regulatory region exists somewhere between the p terminus and
$X$, but not within $X$. We also assume that the regulatory region is equally likely to exist in any
of the bands between the p terminus and $X$ (probability of $1/(x - 1)$ for each band). The
posterior probability distribution for $X$ is similar to that of Model 1, but with a different set of
exponents:
$$
P(X = x \mid \text{data}, M_2) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{L}_x}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{L}_x)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{L}_x}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{L}_x)\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{L}_k}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{L}_k)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{L}_k}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{L}_k)\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{3}
$$
For clarity, we use the vector $\vec{S}_x$ as in Model 1, and define a diagonal matrix containing the
vector $\vec{S}_x$ on the main diagonal, $\mathbf{D}_x = \operatorname{diag}(\vec{S}_x)$, and a vector containing the fraction of bands
between the p terminus and band $x$ (excluding band $x$) that are trisomied:
$$\vec{L}_x = \frac{\mathbf{S}\,\vec{\delta}_{<x}}{x - 1}$$
To understand the meaning of $\vec{L}_x$, consider the two subjects in Figure 1, and assume they are
both affected. Subject 1 has trisomy of bands 1-6 and subject 2 has trisomy of bands 4-10. If we
are looking at band $x = 5$, and we assume that the regulatory region is between the p terminus
(left side) and 5, then subject S1 contains the regulatory region within the trisomied segment
with probability 1 and therefore S1 contributes a 1 to the $N_{ATxE}$ count. Subject 2, however,
contains trisomy of band 5, but only contains trisomy of one of the four bands that could contain
the regulatory region. Since each of the four bands (1-4) has equal probability of containing the
regulatory region, S2 has a 0.25 chance of also having trisomy of the regulatory region and
being subject to $P_*$, and a 0.75 chance of not containing the regulatory region within its trisomy
region, in which case he/she is subject to $P_0$. Therefore, S2 would contribute 0.25 to the $N_{ATxE}$
count and 0.75 to the $N_{ATxS}$ count. $\vec{L}_x$ contains the fraction of bands that potentially contain the
regulatory region and are trisomied.
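The worked example above can be checked with a short sketch. The helper names are hypothetical, bands are numbered from 1 as in the example, and the case $x = 1$ is rejected because the text does not say how a band with no p-side neighbours is handled.

```python
def fraction_trisomic_before(row, x):
    """Entry of L_x for one subject: fraction of bands 1..x-1 that are trisomic."""
    if x <= 1:
        # Handling of the first band is not specified in the text; refuse rather than guess
        raise ValueError("L_x is undefined for x = 1")
    return sum(row[: x - 1]) / (x - 1)


def model2_contributions(row, x):
    """Contribution of one subject to the 'expressed' and 'not expressed' counts."""
    s_x = row[x - 1]                                   # 1 if band x is trisomic
    l_x = fraction_trisomic_before(row, x)
    expressed = s_x * l_x                              # entry of D_x L_x
    not_expressed = (1 - s_x) + s_x * (1 - l_x)        # entry of 1 - S_x + D_x (1 - L_x)
    return expressed, not_expressed


subject1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # trisomy of bands 1-6
subject2 = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]  # trisomy of bands 4-10
print(model2_contributions(subject1, 5))  # (1.0, 0.0)
print(model2_contributions(subject2, 5))  # (0.25, 0.75)
```

For affected subjects the two returned values are added to the $N_{ATxE}$ count and to the combined $N_{ADx} + N_{ATxS}$ count, respectively.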
[Figure 1 image: subjects S1 and S2 drawn against bands 1-10, running from the p terminus to the q terminus.]
Figure 1 Partial trisomy of two hypothetical affected subjects.
Model 3
Model 3 is similar to Model 2, except that we assume that the regulatory region exists
somewhere between $X$ and the q terminus. The other assumptions of Model 2 remain: the
regulatory region is not within $X$, and it is equally likely to be in any of the bands between $X$
and the q terminus. The equation for Model 3 is:
$$
P(X = x \mid \text{data}, M_3) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{G}_x}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{G}_x)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{G}_x}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{G}_x)\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{G}_k}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{G}_k)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{G}_k}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{G}_k)\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{4}
$$
where we replace $\vec{L}_x$ with
$$\vec{G}_x = \frac{\mathbf{S}\,\vec{\delta}_{>x}}{B - x}$$
which contains the fraction of trisomied bands between band $x$ and the q terminus (besides the
different denominators, note that in $\vec{L}_x$ we only look at bands less than $x$, but in $\vec{G}_x$ we only
look at bands greater than $x$).
Model 4
In Model 4, we assume that there is a regulatory region, but we make no restriction on where it
can be. We still assume that it is equally likely to be in any of the bands, except band $X$. The
equation for this probability distribution is similar to Equations (3) and (4) except that we define
a vector which contains the fraction of all bands outside of band $X$ that are trisomied:
$$\vec{O}_x = \frac{\mathbf{S}\,(\vec{1}_B - \vec{\delta}_x)}{B - 1}$$
Using this definition and the same definitions of $\vec{S}_x$ and $\mathbf{D}_x$ from above, we
can write the distribution for this model as:
$$
P(X = x \mid \text{data}, M_4) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{O}_x}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{O}_x)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{O}_x}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{O}_x)\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{O}_k}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{O}_k)\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{O}_k}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{O}_k)\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{5}
$$
In Models 2-4, we assumed that the regulatory region does not exist within the same band as
the gene of interest. For completeness, we also consider models where we include band X as
a possible location of the regulatory region. We name these models M2I-M4I, since they are
inclusive of band X . The equations are:
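Model 2I
We define $\vec{L}_{xI}$ to also include band $x$:
$$\vec{L}_{xI} = \frac{\mathbf{S}\,\vec{\delta}_{\leq x}}{x}$$
$$
P(X = x \mid \text{data}, M_{2I}) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{L}_{xI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{L}_{xI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{L}_{xI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{L}_{xI})\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{L}_{kI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{L}_{kI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{L}_{kI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{L}_{kI})\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{6}
$$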
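Model 3I
We define $\vec{G}_{xI}$ to also include band $x$:
$$\vec{G}_{xI} = \frac{\mathbf{S}\,\vec{\delta}_{\geq x}}{B - x + 1}$$
$$
P(X = x \mid \text{data}, M_{3I}) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{G}_{xI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{G}_{xI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{G}_{xI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{G}_{xI})\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{G}_{kI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{G}_{kI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{G}_{kI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{G}_{kI})\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{7}
$$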
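Model 4I
We define $\vec{O}_{xI}$ to also include band $x$:
$$\vec{O}_{xI} = \frac{\mathbf{S}\,\vec{1}_B}{B}$$
$$
P(X = x \mid \text{data}, M_{4I}) =
\frac{
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_x \vec{O}_{xI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{O}_{xI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_x \vec{O}_{xI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_x + \mathbf{D}_x (\vec{1} - \vec{O}_{xI})\right)} \times \\
P(X = x)
\end{array}\right\}
}{
\displaystyle\sum_{k=1}^{B}
\left\{\begin{array}{l}
P_*^{\vec{A}^T \mathbf{D}_k \vec{O}_{kI}}\; P_0^{\vec{A}^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{O}_{kI})\right)} \times \\
(1 - P_*)^{(\vec{1} - \vec{A})^T \mathbf{D}_k \vec{O}_{kI}}\; (1 - P_0)^{(\vec{1} - \vec{A})^T \left(\vec{1} - \vec{S}_k + \mathbf{D}_k (\vec{1} - \vec{O}_{kI})\right)} \times \\
P(X = k)
\end{array}\right\}
} \tag{8}
$$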
Results
Figure 2 contains the BS for all 7 models. Judging from the values, Model 2 is the most
appropriate.
[Figure 2 image: bar chart titled "Model Bayesian Scores"; x-axis: Model (M1, M2, M3, M4, M2I, M3I, M4I); y-axis: Bayesian Score (× 10⁻¹²).]
Figure 2 Model M2 has the highest BS, followed closely by models M1 and M2I. Although it cannot be distinguished from the figure, the order of BS for the remaining models is M4I, M4, M3I, M3 (from highest to lowest).
We show the probability distributions according to all 7 models in Figure 3. It is interesting to
note that Models 1, 4, and 4I all suggest 13q12 as the one most likely to contain the gene
related to elevation of HbF. However, the model which best explains the data (Model 2) peaks
for 13q14. Since Model 2 is the best model according to Figure 2, we conclude that 13q14
contains the gene causing elevation of HbF.
The next logical question concerns our choice of $P_*$ and $P_0$: how sensitive are the BS and the
most likely band to the choice of these values? To answer this question, we computed the
probability distributions and BSs for each model for a set of $P_*$ and $P_0$ values, maintaining only
the condition that $P_* > P_0$. Figure 4A shows that the actual values of $P_*$ and $P_0$ affect which is
the most likely model. There are four regions: Model 1 is most appropriate for low values of
$P_*$ and relatively low values of $P_0$, Model 3 is most appropriate for high values of $P_*$ and high
values of $P_0$, while Model 2 is most appropriate for the majority of ($P_*$, $P_0$) values with a
subsection containing smaller values of $P_*$ owned by Model 2I. Figure 4B shows that the BSs
corresponding to the most likely models from panel A are actually fairly well contained in the
($P_*$, $P_0$) space, being highest around $P_* = 0.75$ and $P_0 = 0.12$. Figure 4C shows that if we
choose the most appropriate model for each ($P_*$, $P_0$) combination, and then choose the band
with the highest posterior probability, band 13q14 dominates the majority of the parameter
space. Smaller sections of the parameter space favor bands 13q11 - 13q13 and the terminal
bands pter and qter. Figure 4D shows the actual posterior probabilities of the corresponding
bands in panel C.
[Figure 3 image: four panels under the title "Probability distributions of band responsible for persistent fetal hemoglobin"; x-axis: Band (pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter); y-axes: P(X=x|data,M1), P(X=x|data,M2 or M2I), P(X=x|data,M3 or M3I), and P(X=x|data,M4 or M4I).]
Figure 3 Posterior probability distributions according to all 7 models. The majority of models favor band 13q12; however, Model 2, which is the most likely according to the BS criterion, points to band 13q14.
[Figure 4 image: four panels, each plotted over $P_0$ and $P_*$. Panel A: "Model with the highest BS" (legend: M1, M2, M3, M4, M2I, M3I, M4I). Panel B: "BS of Most Appropriate Model" (color scale × 10⁻¹¹). Panel C: "Most Likely Band from Most Appropriate Model" (legend: pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter). Panel D: "Probability of Most Likely Band from Most Appropriate Model" (color scale 0 to 1).]
Figure 4 The most likely model for a particular $P_*$, $P_0$ is shown in panel A. Panel B shows the actual BS of the corresponding models in A. Panel C shows the most likely band from the distributions corresponding to the most likely models in A, and panel D shows the actual probability of the corresponding band in panel C.
To summarize the $P_*$, $P_0$ perturbation study detailed in Figure 4, we compute the average over
all values of $P_*$ and $P_0$ of the probability that each band is the highest-probability band
according to the most likely model. In other words: for each value of $P_*$ and $P_0$, compute the
BS of each model, select the model with the highest BS ($M^*$), use $P(X = x \mid \text{data}, M^*)$ to
select the band with the highest probability ($x^*$), and record the value $P(X = x^*, \text{data} \mid M^*)$,
which captures both the likelihood of the particular band containing the gene of interest and the
appropriateness of the model in explaining all the data. Then, for each band $b$, compute the
average of $P(X = x^*, \text{data} \mid M^*)$ over all the $P_*$ and $P_0$ for which $x^* = b$. This result is shown
in Figure 5, where we clearly see that regardless of choice of $P_*$ and $P_0$, if we pick the most
appropriate model for the situation, band 13q14 most likely contains the gene responsible for
elevation of HbF in trisomy 13.
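The averaging procedure described above can be sketched as follows. This is a simplified illustration, not the authors' analysis: only Models 1 and 4I are included (chosen because neither needs a special case at the terminal bands), the 0.1-step parameter grid is an assumption, and the toy data are hypothetical. Each model is expressed through a function giving, per subject, the fraction of the subject's weight assigned to $P_*$.

```python
import math


def model1_fraction(row, x):
    """Model 1: the gene is expressed in trisomy whenever band x is trisomic."""
    return float(row[x])


def model4i_fraction(row, x):
    """Model 4I: band x trisomic, weighted by the fraction of all bands that are trisomic."""
    return row[x] * sum(row) / len(row)


def posterior_and_score(S, A, fraction, p_star, p_zero):
    """Posterior over bands and Bayesian Score for one model under a uniform prior."""
    n_bands = len(S[0])
    weights = []
    for x in range(n_bands):
        log_w = math.log(1.0 / n_bands)
        for row, a in zip(S, A):
            e = fraction(row, x)
            if a == 1:
                log_w += e * math.log(p_star) + (1 - e) * math.log(p_zero)
            else:
                log_w += e * math.log(1 - p_star) + (1 - e) * math.log(1 - p_zero)
        weights.append(math.exp(log_w))
    score = sum(weights)
    return [w / score for w in weights], score


def sweep(S, A, models, grid):
    """Average P(X = x*, data | M*) per band over the grid points where that band wins."""
    n_bands = len(S[0])
    totals, counts = [0.0] * n_bands, [0] * n_bands
    for p_star, p_zero in grid:
        if not p_zero < p_star:
            continue  # keep only the region with P0 < P*
        results = [posterior_and_score(S, A, f, p_star, p_zero) for f in models]
        post, score = max(results, key=lambda r: r[1])      # model with the highest BS
        x_star = max(range(n_bands), key=post.__getitem__)  # its most probable band
        totals[x_star] += post[x_star] * score              # joint P(X = x*, data | M*)
        counts[x_star] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Hypothetical toy data and an assumed 0.1-step grid
S = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]
A = [1, 1, 0, 0]
steps = [i / 10 for i in range(1, 10)]
grid = [(ps, pz) for ps in steps for pz in steps]
averages = sweep(S, A, [model1_fraction, model4i_fraction], grid)
print(averages)
```

In this toy data set the first band wins at every grid point, so the other bands receive an average of zero.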
[Figure 5 image: bar chart titled "Average Probability of each band over all P* and P0"; x-axis: Band (pter, p12, p11.2, p11.1, q11, q12, q13, q14, q21, q22, q31, q32, q33, qter); y-axis: $E_{P_*, P_0}\left(P(X = x^*, \text{data} \mid M^*)\, I_{x^* = b}\right)$.]
Figure 5 Probability that a band contains the gene responsible for elevated HbF according to the most appropriate model ($M^*$), averaged over all values of $P_*$ and $P_0$.
Conclusion
We constructed seven models to calculate the probability that a particular band of chromosome
13 houses the gene responsible for elevation of HbF when the gene is expressed in trisomy.
The likelihood of each model was evaluated using a data set containing 57 partial trisomy cases
with documented levels of HbF. The most likely model assumes that a single band contains the
gene of interest, and that a regulatory region for that gene exists between the p terminus and
the gene itself. According to this model, the gene implicated in elevation of HbF is located on
band 13q14. This conclusion is not sensitive to the particular values of $P_*$ (probability of
elevated HbF if the responsible gene is expressed in trisomy) and $P_0$ (probability of elevated
HbF if the responsible gene is not expressed in trisomy) assumed in the model.
Mathematical Glossary
Here we present a quick reference for all the symbols used in the text above. All vectors are
column vectors; their transposes (superscript $T$) are shown in the definitions below.

Subject trisomy matrix, one subject on each row, $N$ subjects, $B$ chromosome bands, 1 means trisomy of a particular band (rows are Subject 1, Subject 2, ..., Subject $N$; there are $B$ columns):
$$\mathbf{S} = \begin{bmatrix} 1 & 1 & \cdots & 0 \\ 1 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 1 & \cdots & 1 \end{bmatrix}$$

Affected vector, each subject is either affected (1) or unaffected (0):
$$\vec{A}^T = \underbrace{[\,1 \;\; 0 \;\; \cdots \;\; 1\,]}_{N}$$

Vector of ones, same size as $\vec{A}$:
$$\vec{1}^T = \underbrace{[\,1 \;\; 1 \;\; \cdots \;\; 1\,]}_{N}$$

Vector of ones, length $B$:
$$\vec{1}_B^T = \underbrace{[\,1 \;\; 1 \;\; \cdots \;\; 1\,]}_{B}$$

Vector of zeros with a 1 in position corresponding to $x$; length $B$:
$$\vec{\delta}_x^T = \overbrace{[\,0 \;\; \cdots \;\; 0 \;\; 1 \;\; 0 \;\; \cdots \;\; 0\,]}^{B} \quad \text{(the 1 is in position } x\text{)}$$

$\vec{\delta}_{<x}$, $\vec{\delta}_{>x}$, etc. are vectors like $\vec{\delta}_x$ with ones where the subscript is true; for example, $\vec{\delta}_{<x}$ has ones for all bands less than $x$.

The $x$th column of $\mathbf{S}$:
$$\vec{S}_x = \mathbf{S}\,\vec{\delta}_x$$

Diagonal matrix with the vector $\vec{S}_x$ on the main diagonal:
$$\mathbf{D}_x = \mathrm{diag}(\vec{S}_x)$$

Vector containing the fraction of bands between the p terminus and band $x$ that are trisomied:
excluding $x$: $$\vec{L}_x = \mathbf{S}\left(\frac{\vec{\delta}_{<x}}{x - 1}\right)$$
including $x$: $$\vec{LI}_x = \mathbf{S}\left(\frac{\vec{\delta}_{\le x}}{x}\right)$$

Vector containing the fraction of bands between the band $x$ and the q terminus that are trisomied:
excluding $x$: $$\vec{G}_x = \mathbf{S}\left(\frac{\vec{\delta}_{>x}}{B - x}\right)$$
including $x$: $$\vec{GI}_x = \mathbf{S}\left(\frac{\vec{\delta}_{\ge x}}{B - x + 1}\right)$$

Vector containing the fraction of bands outside band $x$ that are trisomied:
excluding $x$: $$\vec{O}_x = \mathbf{S}\left(\frac{\vec{1}_B - \vec{\delta}_x}{B - 1}\right)$$
including $x$: $$\vec{OI}_x = \mathbf{S}\left(\frac{\vec{1}_B}{B}\right)$$
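As a concrete illustration of the glossary, the sketch below builds the band-fraction vectors with NumPy for a small trisomy matrix. The function name `glossary_vectors`, the dictionary keys, and the 1-based band index `x` are assumptions made for this example; the exclusive variants are omitted when their denominator would be zero.

```python
import numpy as np

def glossary_vectors(S, x):
    """Return the glossary vectors for band x (1-based) of trisomy matrix S.

    S is an (N, B) 0/1 array: one row per subject, one column per band.
    Names and return layout are hypothetical, chosen for this sketch.
    """
    S = np.asarray(S, dtype=float)
    n_bands = S.shape[1]
    if not 1 <= x <= n_bands:
        raise ValueError("x must be between 1 and the number of bands")

    bands = np.arange(1, n_bands + 1)
    ones_B = np.ones(n_bands)
    delta_x = (bands == x).astype(float)          # indicator of band x

    S_x = S @ delta_x                             # x-th column of S
    out = {
        "S_x": S_x,
        "D_x": np.diag(S_x),                      # S_x on the main diagonal
        "LI_x": S @ (bands <= x).astype(float) / x,
        "GI_x": S @ (bands >= x).astype(float) / (n_bands - x + 1),
        "OI_x": S @ ones_B / n_bands,
    }
    if x > 1:                                     # bands strictly before x exist
        out["L_x"] = S @ (bands < x).astype(float) / (x - 1)
    if x < n_bands:                               # bands strictly after x exist
        out["G_x"] = S @ (bands > x).astype(float) / (n_bands - x)
    if n_bands > 1:                               # bands other than x exist
        out["O_x"] = S @ (ones_B - delta_x) / (n_bands - 1)
    return out
```

Each returned vector has one entry per subject, so it can be compared directly against the affected vector.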
Table S1. Chromosomal Band 13q14 Gene Information and Relative Erythroid Expression.
Name RefSeq EntrezGene CD71+ Erythroid Known Position
Description
EBPL NM_032565 84650 -3.013 chr13:49132863-49163612
emopamil binding related protein, delta8-delta7
DGKH NM_178009 160851 -2.69 chr13:41520889-41701888
diacylglycerol kinase, eta isoform 2
FNDC3A NM_001079673 22862 -2.319 chr13:48448049-48681916
fibronectin type III domain containing 3A
NUDT15 NM_018283 55270 -2.29 chr13:47509704-47519283
nudix-type motif 15
C13orf1 NM_020456 57213 -2.274 chr13:49384843-49408626
chromosome 13 open reading frame 1 isoform 1
SUGT1L1 NR_003365 283507 -2.256 chr13:40384027-40393886
Homo sapiens full length insert cDNA clone YY75B04.
LRCH1 NM_015116 23143 -2.08 chr13:46025304-46215731
leucine-rich repeats and calponin homology (CH)
THSD1 NM_018676 55901 -1.868 chr13:51849304-51878321
thrombospondin type I domain-containing 1
HTR2A NM_000621 3356 -1.778 chr13:46305514-46368370
5-hydroxytryptamine (serotonin) receptor 2A
CYSLTR2 NM_020377 57105 -1.75 chr13:48178954-48181499
cysteinyl leukotriene receptor 2
KCNRG NM_199464 283518 -1.673 chr13:49487391-49493059
potassium channel regulator isoform 1
MGC5590 n/a n/a -1.656 chr13:43615287-43633393
Homo sapiens hypothetical protein MGC5590, mRNA (cDNA clone MGC:5590 IMAGE:3460233), complete cds.
AKAP11 NM_016248 11215 -1.539 chr13:41744289-41795402
A-kinase anchor protein 11
BC025370 n/a n/a -1.536 chr13:43872388-43878021
Homo sapiens hypothetical gene supported by BC025370, mRNA (cDNA clone IMAGE:3945331).
ENOX1 NM_001127615 55068 -1.477 chr13:42685666-43101613
ecto-NOX disulfide-thiol exchanger 1
TSC22D1 NM_183422 8848 -1.437 chr13:43905655-44048701
TSC22 domain family, member 1 isoform 1
SUCLA2 NM_003850 8803 -1.427 chr13:47414792-47473463
succinate-CoA ligase, ADP-forming, beta subunit
SERPINE3 NM_001101320 647174 -1.302 chr13:50813169-50834240
nexin-related serine protease inhibitor
TNFSF11 NM_003701 8600 -1.134 chr13:42046298-42080148
tumor necrosis factor ligand superfamily, member
KIAA0564 NM_015058 23078 -1.067 chr13:41038961-41433221
hypothetical protein LOC23078 isoform a
C13orf18 NM_025113 80183 -1.013 chr13:45814138-45859636
hypothetical protein LOC80183
P2RY5 NM_005767 10161 -1.003 chr13:47883183-47887947
G-protein coupled purinergic receptor P2Y5
KCTD4 NM_198404 386618 -0.948 chr13:44664988-44673175
potassium channel tetramerisation domain
CDADC1 NM_030911 81602 -0.907 chr13:48720104-48765619
cytidine and dCMP deaminase domain containing 1
PHF11 NM_001040443 51131 -0.875 chr13:48967802-49001118
PHD finger protein 11 isoform a
TRIM13 NM_001007278 10206 -0.834 chr13:49469144-49490604
ret finger protein 2 isoform 2
SERP2 NM_001010897 387923 -0.743 chr13:43845978-43869850
stress-associated endoplasmic reticulum protein
FOXO1 NM_002015 2308 -0.701 chr13:40027801-40138734
forkhead box O1
CAB39L NM_030925 81617 -0.556 chr13:48780787-48873736
calcium binding protein 39-like
CCDC70 NM_031290 83446 -0.528 chr13:51334118-51338373
coiled-coil domain containing 70 precursor
C13orf31 NM_001128303 144811 -0.5 chr13:43351420-43366068
hypothetical protein LOC144811
SLC25A30 NM_001010875 253512 -0.464 chr13:44868044-44890515
solute carrier family 25, member 30
WDFY2 NM_052950 115825 -0.392 chr13:51056485-51234172
WD repeat and FYVE domain containing 2
ITM2B NM_021999 9445 -0.344 chr13:47705309-47734228
integral membrane protein 2B
LECT1 NM_007015 11061 -0.217 chr13:52175400-52211948
leukocyte cell derived chemotaxin 1 isoform 1
LCP1 NM_002298 3936 -0.188 chr13:45598059-45654395
L-plastin
GTF2F2 NM_004128 2963 -0.181 chr13:44592672-44756237
general transcription factor IIF, polypeptide 2,
FAM124A NM_145019 220108 -0.109 chr13:50694508-50753617
hypothetical protein LOC220108
GUCY1B2 NR_003923 2974 -0.07 chr13:50487553-50538143
Guanylate cyclase 1, soluble, beta 2.
MLNR NM_001507 2862 -0.042 chr13:48692475-48694514
motilin receptor
NEK3 NM_152720 4752 -0.035 chr13:51604780-51631511
NIMA-related kinase 3
RCBTB2 NM_001268 1102 -0.017 chr13:47961100-48005317
regulator of chromosome condensation and BTB
TPT1 NM_003295 7178 0.043 chr13:44809304-44813297
tumor protein, translationally-controlled 1
DKFZp434F1622 NR_002816 374500 0.096 chr13:51639848-51763681
Homo sapiens cDNA FLJ14630 fis, clone NT2RP2000459.
COG3 NM_031431 83548 0.247 chr13:44937072-45008761
component of golgi transport complex 3
DHRS12 NM_001031719 79758 0.26 chr13:51242921-51276294
dehydrogenase/reductase (SDR family) member 12
ATP7B NM_000053 540 0.263 chr13:51404806-51483631
ATPase, Cu++ transporting, beta polypeptide
KIAA1704 NM_018559 55425 0.283 chr13:44461687-44500404
hypothetical protein LOC55425
RCBTB1 NM_018191 55213 0.451 chr13:49004083-49057720
regulator of chromosome condensation (RCC1) and
VPS36 NM_016075 51028 0.46 chr13:51884738-51922764
vacuolar protein sorting 36
EPSTI1 NM_001002264 94240 0.477 chr13:42360122-42464377
epithelial stromal interaction 1 isoform 1
ALG11 NM_021645 9724 0.478 chr13:51484551-51504074
asparagine-linked glycosylation 11,
UTP14C NM_021645 9724 0.478 chr13:51500942-51503242
UTP14, U3 small nucleolar ribonucleoprotein,
CPB2 NM_001872 1361 0.492 chr13:45525323-45577212
plasma carboxypeptidase B2 isoform a
ESD NM_001984 2098 0.553 chr13:46243392-46269368
esterase D/formylglutathione hydrolase
RNASEH2B NM_024570 79621 0.656 chr13:50381893-50428902
ribonuclease H2, subunit B isoform 1
C13orf30 NM_182508 144809 0.659 chr13:42253751-42263683
hypothetical protein LOC144809
ELF1 NM_172373 1997 0.714 chr13:40404164-40454418
E74-like factor 1 (ets domain transcription
INTS6 NM_012141 26512 0.748 chr13:50833702-50925276
integrator complex subunit 6 isoform a
MED4 NM_014166 29079 0.895 chr13:47548093-47567241
mediator complex subunit 4
NARG1L NM_024561 79612 0.94 chr13:40783341-40849166
NMDA receptor regulated 1-like protein isoform
NUFIP1 NM_012345 26747 0.943 chr13:44411384-44461613
nuclear fragile X mental retardation protein
KPNA3 NM_002267 3839 1.003 chr13:49171462-49265058
karyopherin alpha 3
ZC3H13 NM_015070 23091 1.015 chr13:45434315-45524895
zinc finger CCCH-type containing 13
DNAJC15 NM_013238 29103 1.057 chr13:42495362-42581306
DNAJ domain-containing
RB1 NM_000321 5925 1.231 chr13:47775884-47954027
retinoblastoma 1
WBP4 NM_007187 11193 1.762 chr13:40533697-40556139
WW domain-containing binding protein 4
MTRF1 NM_004294 9617 1.768 chr13:40688516-40735713
mitochondrial translational release factor 1
SIAH3 NM_198849 283514 1.848 chr13:45252417-45323847
seven in absentia homolog 3
SLC25A15 NM_014252 10166 1.89 chr13:40261597-40282246
solute carrier family 25 (mitochondrial carrier;
MRPS31 NM_005830 10240 1.917 chr13:40201432-40243347
mitochondrial ribosomal protein S31
DLEU2 NR_002612 8847 2.142 chr13:49454689-49597678
Homo sapiens BCMS-upstream neighbor (BCMSUN) mRNA, partial sequence.
ARL11 NM_138450 115761 2.207 chr13:49100625-49105732
ADP-ribosylation factor-like 11
SUGT1 NM_006704 10910 2.809 chr13:52125002-52160434
suppressor of G2 allele of SKP1 isoform b
CKAP2 NM_001098525 26586 3.003 chr13:51927496-51948764
cytoskeleton associated protein 2 isoform 2
C13orf15 NM_014059 28984 3.753 chr13:40929542-40943013
response gene to complement 32
Table S2. Chromosomal Band 13q14 Promoters with Significant H3K4me3 Peaks in K562 Cells.
Gene Name
AK056182
AK095119
AK124928
ALG11
ATP7B
BC039553
C13orf1
CAB39L
CDADC1
CKAP2
CLLD8
CR625002
DHRS12
DKFZp434B105
DKFZp434F1622
DKFZp434H1720
DLEU2
EBPL
EBRP
GTF2F2
HNRNPA1L2
INTS6
ITM2B
KBTBD6
KIAA0564
KIAA1704
KPNA3
MRPS31
MTRF1
NARG1L
NUDT15
NUFIP1
pp13759
RB1
SETDB2
SLC25A15
SUGT1
TPT1
UTP14C
WBP4
WDF2
WDFY2
XTP6
ZC3H13
Table S3. Chromosomal Band 13q14 Genes with Elevated Expression in CD71+ Erythroid Progenitors and Significant H3K4me3 Promoter Peaks in K562 Cells.
Gene Name
CKAP2
DLEU2
MRPS31
MTRF1
RB1
SLC25A15
SUGT1
WBP4
ZC3H13
Table S4. Pathological Analysis of Hematopoiesis in Trisomy 13 Cases.
Identification Number; Age, Gender, and Phenotypic Information; Notes from Pathological Assessment

TR13-1; 3 month, female, Trisomy 13
‐ 90% cellular BM
‐ M:E = 1:1-2
‐ Full erythroid maturation
‐ 6 megas per 40x hpf, increased
‐ Mild EMH in spleen

TR13-2; 5 day, male, Trisomy 13
‐ >90% cellular BM
‐ M:E = 1-2:1
‐ Full erythroid maturation
‐ 2-3 megas per 40x hpf
‐ Mild EMH in liver

TR13-3; 1 day, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 1-2:1
‐ Full erythroid maturation
‐ 1-2 megas per 40x hpf
‐ Mild EMH in liver

TR13-4; 3 day, female, Trisomy D (unclear if karyotype done, phenotype compatible)
‐ 100% cellular BM
‐ M:E = 10:1
‐ Full erythroid maturation
‐ 2-3 megas per 40x hpf
‐ Mild EMH in liver

TR13-5; 1 day, female, Trisomy 13
‐ 100% cellular BM
‐ M:E = 10:1
‐ Rare erythroids, but seem mature
‐ Overall 1-2 megas per 40x hpf, focal clusters present
‐ Mild EMH in liver

TR13-6; 1 hr 18 min, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 3-5:1
‐ Full erythroid maturation
‐ 2-3 megas per 40x hpf
‐ Marked EMH in liver

TR13-7; 5 hr, male, Trisomy 13
‐ 100% cellular BM
‐ M:E = 10:1
‐ Only rare erythroids, maturation difficult to assess
‐ 1-2 megas per 40x hpf
‐ Moderate EMH in liver

TR13-8; 1 day, female, Trisomy 13 mosaic
‐ 100% cellular BM
‐ M:E = 1:3
‐ Full erythroid maturation
‐ 2-3 megas per 40x hpf
‐ Prominent EMH in liver

TR13-9; 4 day old, male, Trisomy 13 [46,XY,-D,t(DqDq)+]
‐ 100% cellular BM
‐ 5:1 M:E ratio
‐ Slight left shift of myeloid elements
‐ Full erythroid maturation
‐ Abnormal megakaryocyte nuclei are noted (hyperchromatic nuclei with “Staghorn” appearance), megas with patchy distribution t/o marrow, foci of increased megakaryocytes
‐ Mild EMH in liver

TR13-10; 7 day old female, Trisomy 13 Mosaic (present in 36% of cells)
‐ 100% cellular BM
‐ 1:1 to 1:2 M:E ratio
‐ Full erythroid maturation, but with increased immature forms (SEE PHOTO)
‐ Overall 2-3 megas per hpf, patchy distribution
‐ Megakaryocytes with abnormal nuclei (small hyperchromatic nuclei and “Staghorn” appearance, SEE PHOTO)
‐ No extramedullary hematopoiesis noted in liver
‐ Spleen with mild hematopoiesis

TR13-11; 11 day old female, Trisomy 13
‐ 100% cellular BM
‐ > 5:1 M:E ratio
‐ Some left-shift of myeloid elements
‐ Full erythroid maturation
‐ 5-10 megas per 40x hpf (increased), abnormal nuclei (small hyperchromatic – SEE PHOTO)
‐ No extramedullary hematopoiesis noted in liver
‐ Mild hematopoiesis in spleen

TR13-12; 6 hour old male, Trisomy D1/13 (no genetic studies reported, but phenotypically compatible)
‐ Scant marrow space, appears 100% cellular
‐ 1:1 M:E ratio (Note: large number of hematogones excluded in this estimate)
‐ Full erythroid maturation
‐ Scattered megas seen (cytologically unremarkable, ? appropriate in number)
‐ Marked extramedullary hematopoiesis in liver with a 1:1 M:E ratio present in the liver

TR13-13; 5 day old female, Trisomy 13, ABO incompatibility with jaundice
‐ 100% cellular BM
‐ 2:1 M:E ratio -> varies to 1:1 in certain regions
‐ Full maturation of erythroid elements
‐ Megakaryocytes decreased in number, 0-1 per 40x hpf
‐ ? Liver (not examined) -> small foci of extramedullary hematopoiesis noted on pathology report

TR13-14; 1 year old male [Trisomy 13 never confirmed – these were pre-mortem studies]
‐ 100% cellular BM (slightly hypercellular for age)
‐ M:E ratio 5-6:1
‐ Myeloid cells showing some left shift
‐ Full erythroid maturation noted
‐ On avg 3-4 megakaryocytes per 40x hpf, with small and condensed hyperchromatic nuclei (some with staghorn appearance)
‐ No extramedullary hematopoiesis was noted

TR13-15; 9 month old female, Trisomy D1 (Trisomy 13)
‐ >95% cellular
‐ M:E ratio 1-3:1
‐ Slight left shift in myeloid cells
‐ Full maturation of erythroid elements
‐ Megakaryocytes show small hyperchromatic nuclei with “Staghorn” appearance
‐ Slight increase in megakaryocyte numbers noted per high power field (~5-7 megakaryocytes seen in most high power fields examined)
‐ Liver not available for review

TR13-16; 23 day old female, Partial Trisomy D1
Mother gravida 8: one prior stillborn, four first trimester abortions, two normal kids (one male & one female)
Similarities and differences with classic phenotype of trisomy 13
‐ Appropriate cellularity for age (>95% cellular)
‐ Increased hematogones
‐ M:E ratio is 5:1
‐ Slight left shift in myeloid cells
‐ Full erythroid maturation visualized
‐ Megakaryocytes show small, “Staghorn” appearing, hyperchromatic nuclei
‐ Slight increase in megakaryocyte numbers, 5-10 per high power field is noted
‐ Small minor clusters of hematopoiesis in the liver are noted (with both erythroid and myeloid elements seen)

TR13-17; 25 day old female, Trisomy D1 (Trisomy 13)
‐ Appropriate cellularity for age (>95% cellular)
‐ M:E ratio is 3-5:1 (morphology is poor)
‐ Left shift in myeloid cells is observed
‐ Full erythroid maturation is seen
‐ Megakaryocytes show small hyperchromatic “Staghorn” nuclei
‐ ~2 megas per high power field, but given limited cellularity, appears increased
‐ No observable hematopoiesis in the liver (multiple sections assessed)

Supplementary Figure Legends
Fig. S1. Schematic of partial trisomy 13 cases with elevated and normal fetal hemoglobin (HbF)
levels. The figure on the left shows the 14 cases with elevated levels of HbF, from which the
proportion of each chromosomal region involved in cases with elevated HbF is derived in Figure
1B. The figure on the right shows the 43 cases with normal HbF levels, from which the
proportion of each chromosomal region involved in cases with normal HbF is derived in Figure
1B. Each case is shown in a vertical column and chromosomal regions included in each case
are shown in red.
Fig. S2. Relative expression in CD71+ erythroid precursors is shown for genes in the
chromosomal band 13q14 region. Relative expression is shown as a log ratio compared to a
panel of 78 other tissues (12). Genes that are known to play a role in erythropoiesis and globin
gene regulation consistently show a relative expression > 1 using this approach (including
BCL11A, GATA1, KLF1, and SOX6). The 13q14 genes and their relative expression in CD71+
erythroid progenitors are listed in Table S1.
Fig. S3. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression at day 5 of
erythroid differentiation in primary human CD34+ derived cells transduced with pLVX-puro or
pLVX-miR-15a-16-1 lentiviruses. Quantification was performed by the ∆∆Ct method with
RNU19 as a control.
Fig. S4. Relative miR-15a (top, blue) and miR-16 (bottom, pink) expression in K562 cells
transduced with pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses. Quantification was
performed by the ∆∆Ct method with RNU19 as a control.
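For readers unfamiliar with the ∆∆Ct calculation named in the two legends above, a minimal sketch follows. It assumes the standard form of the method, in which each cycle corresponds to a doubling of product (about 100% amplification efficiency); the function name and argument names are hypothetical, and the reference gene would be RNU19 here.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target relative to a control sample by the ddCt method.

    Assumes one doubling per PCR cycle; all inputs are threshold cycle values.
    """
    # Normalize the target to the reference gene within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    # Compare the normalized values between sample and control.
    delta_delta = delta_sample - delta_control
    return 2.0 ** (-delta_delta)
```

For example, a target that crosses threshold two cycles earlier in the transduced sample than in the control, with unchanged reference Ct values, corresponds to a fourfold relative expression.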
Fig. S5. CD36 and CD45 staining of pLVX control (black) and pLVX-miR-15a-16-1 (cyan)
transduced erythroid progenitors on day 5 of differentiation shows similar levels of staining. The
FACS plots shown are representative of three independent samples. The extent of staining is
similar overall, with no major differences in mean fluorescence intensity of either marker. CD36
is expressed at this stage of erythropoiesis and low levels of CD45 are also present. This
suggests that these cells are phenotypically similar. The pLVX-miR-15a-16-1 cells tend to be
slightly bigger, giving rise to correspondingly higher mean
fluorescence intensities of these markers. The high intensity tail present in CD45 samples is
from the small number of myeloid cells present in the culture.
Fig. S6. A comparison of aggregate PCT with relative expression in erythroid tissues
of interest. Graphs are shown for bone marrow, fetal liver, and K562 cells, showing relative
expression (as a log2 expression ratio) compared to a panel of 78 other tissues (12). The MYB
gene is highlighted in red in all of the graphs. The x-axis plots aggregate PCT (14) on a linear
scale, while the y-axis shows relative expression in the various tissues as a log2 ratio.
Fig. S7. Relative normalized expression of γ-globin (HBG) and ε-globin (HBE) from a MYB
siRNA dataset in primary erythroid cord blood progenitors (GSE13110). The expression of
γ-globin (HBG) and ε-globin (HBE) was normalized with all genes in the dataset and plotted
relative to the expression of adult β-globin (HBB). The data from control, mock siRNA, and
MYB siRNA experiments are shown, respectively, from left to right. These data confirm the role
that MYB plays in silencing the fetal and embryonic globin genes in human erythroid
progenitors.
Fig. S8. Representative cytospins from control or shMYB 1 transduced cells from day 6 of
erythroid differentiation. The shMYB cells appear more mature with smaller cell size and more
compact nuclei. Additionally, an increased number of myeloid cells (monocytes) is found in various
cytospins, as exemplified by the image on the far right.
Fig. S9. Gene expression of known regulators of globin gene expression and switching.
Expression levels of the previously characterized regulators of globin gene expression,
BCL11A, GATA1, KLF1, ZFPM1, and SOX6 (13) are depicted after processing and normalizing
the microarray data from control cells and cells in which MYB has been knocked down with
shMYB (n = 4 per group). The data is depicted using log2 expression levels. The error bars
show the standard deviation of the mean.
Fig. S10. Gene set enrichment analysis (GSEA) demonstrates that precocious erythroid
differentiation occurs with MYB knockdown. GSEA (6, 7) was used to examine whether a gene
expression set derived from significantly upregulated genes between the MYB shRNA and
control cells (totaling 188 significantly different genes), was enriched when comparing later time
points of erythroid differentiation among G1E cells (9). For these comparisons, the 21 hour
(top) and 30 hour (bottom) time points were compared with the 0 hour time point. The G1E
expression data were obtained from the Gene Expression Omnibus dataset GSE628 (9).
Fig. S11. Marked upregulation of γ-globin gene expression upon knockdown of MYB in K562
cells. qRT-PCR was used to compare γ-globin gene expression in K562 cells transduced with
pSMPUW or pSMPUW-miR-15a-16-1 lentiviruses following selection. All experimental samples
are significantly different than the controls (p < 0.001, n = 3-4 per group).
Fig. S12. Alteration in cell cycle regulators on knockdown of MYB. Increase in the relative
expression of cell cycle regulatory genes from microarray data of shMYB and control samples is
shown. The decrease of certain positive cell cycle regulators and increase of certain negative
cell cycle regulators is depicted in the graphs. All data is depicted as the normalized mean ±
the standard deviation (n=4 per group).
Fig. S13. Normal megakaryocyte morphology on bone marrow histological sections. These
sections are shown at the same magnification and with similar processing and staining of
samples, as those shown in Fig. 4. Examples of two normal megakaryocytes (with normal
nuclear morphology) are highlighted in the images with cyan arrows. All images are shown at
400X magnification and slides were stained with hematoxylin and eosin.
References
1. Sankaran VG, et al. (2008) Human fetal hemoglobin expression is regulated by the developmental stage-specific repressor BCL11A. Science 322(5909):1839-1842.
2. Sankaran VG, Orkin SH, & Walkley CR (2008) Rb intrinsically promotes erythropoiesis by coupling cell cycle exit with mitochondrial biogenesis. Genes & development 22(4):463-475.
3. Sankaran VG, et al. (2009) Developmental and species-divergent globin switching are driven by BCL11A. Nature 460(7259):1093-1097.
4. Dai M, et al. (2005) Evolving gene/transcript definitions significantly alter the interpretation of GeneChip data. Nucleic acids research 33(20):e175.
5. Bianchi E, et al. (2010) c-myb supports erythropoiesis through the transactivation of KLF1 and LMO2 expression. Blood 116(22):e99-e110.
6. Mootha VK, et al. (2003) PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature genetics 34(3):267-273.
7. Subramanian A, et al. (2005) Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102(43):15545-15550.
8. Watkins NA, et al. (2009) A HaemAtlas: characterizing gene expression in differentiated human blood cells. Blood 113(19):e1-9.
9. Welch JJ, et al. (2004) Global regulation of erythroid gene expression by transcription factor GATA-1. Blood 104(10):3136-3147.
10. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part II. Normal erythropoiesis. Archives of disease in childhood 27(133):214-221.
11. Gairdner D, Marks J, & Roscoe JD (1952) Blood formation in infancy. Part I. The normal bone marrow. Archives of disease in childhood 27(132):128-133.
12. Su AI, et al. (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proceedings of the National Academy of Sciences of the United States of America 101(16):6062-6067.
13. Sankaran VG, Xu J, & Orkin SH (2010) Advances in the understanding of haemoglobin switching. British journal of haematology 149(2):181-194.
14. Friedman RC, Farh KK, Burge CB, & Bartel DP (2009) Most mammalian mRNAs are conserved targets of microRNAs. Genome research 19(1):92-105.
15. Rogers JF (1984) Clinical delineation of proximal and distal partial 13q trisomy. Clin Genet 25(3):221-229.
16. Tharapel SA, Lewandowski RC, Tharapel AT, & Wilroy RS, Jr. (1986) Phenotype-karyotype correlation in patients trisomic for various segments of chromosome 13. J Med Genet 23(4):310-315.
17. Korbel JO, et al. (2009) The genetic architecture of Down syndrome phenotypes revealed by high-resolution analysis of human segmental trisomies. Proc Natl Acad Sci U S A 106(29):12031-12036.
Figure S1 [image: two panels, "Elevated HbF Subject Ploidy (red=trisomy)" and "Normal HbF Subject Ploidy (red=trisomy)"; x-axis: Subject number; y-axis: Band (pter-q11, q12, q13, q14, q21, q22, q31, q32-qter)]
Figure S2 [image: bar plot; x-axis: 13q14 genes ordered from EBPL to C13orf15; y-axis: CD71+ Erythroid relative expression, -4 to 5]
Figure S3 [image: two bar plots, "Relative miR-15a Expression" and "Relative miR-16 Expression"; x-axis: Control, miR-15a-16-1]
Figure S4 [image: two bar plots, "Relative miR-15a Expression" and "Relative miR-16 Expression"; x-axis: Control, miR-15a-16-1]
Figure S5 [image: two FACS panels, CD36 and CD45]
Figure S6 [image: three scatter plots, Bone Marrow, Fetal Liver, and K562 Cells; x-axis: Aggregate PCT; y-axis: Log2 Expression Ratio]
Figure S7 [image: two bar plots, HBG/HBB and HBE/HBB; x-axis: Control, Mock siRNA, MYB siRNA]
Figure S8 [image: cytospin panels, Control and shMYB]
Figure S9 [image: bar plot; x-axis: BCL11A, GATA1, ZFPM1, KLF1, SOX6; y-axis: Normalized Expression Level (Log2); series: Control, shMYB]
Figure S10 [image: two GSEA plots, "G1E cells 21 hours vs. 0 hours" (p = 0.004) and "G1E cells 30 hours vs. 0 hours" (p < 0.001)]
Figure S11 [image: bar plot; x-axis: Control, shMYB 1, shMYB 2; y-axis: Relative γ-Globin Expression]
Figure S12 [image: bar plot; x-axis: Positive Cell Cycle Regulators (CDK2, CDK4, CDK6, MYC) and Negative Cell Cycle Regulators (CDKN1B, CDKN1C, CDKN2C, CDKN2D); y-axis: Relative Expression; series: Control, shMYB]
Figure S13