Fetal hemoglobin in sickle cell anemia: Bayesian modeling of genetic associations


Paola Sebastiani,1* Ling Wang,1 Vikki G. Nolan,2 Efthymia Melista,3 Qianli Ma,2 Clinton T. Baldwin,3 and Martin H. Steinberg2

We genotyped single nucleotide polymorphisms (SNPs) in: (1) the β-globin gene-like cluster, (2) quantitative trait loci (QTL) previously associated with fetal hemoglobin (HbF) concentration on chromosomes 6q, 8q, and Xp, and (3) candidate genes that could affect HbF levels, in sickle cell anemia subjects. HbF concentration was modeled as a continuous variable with values in a finite interval using a novel Bayesian approach. We first tested the associations of SNPs with HbF in a group of 1,518 adults and children (CSSCD study), and validated the results in a second independent group of 211 adults (MSH study). In subjects aged ≥24 years, 5 SNPs in TOX (8q12.1), 2 SNPs in the β-globin gene-like cluster, 2 SNPs in the Xp QTL, and 1 SNP in chromosome 15q22 were associated with HbF in the CSSCD and also validated in the MSH. Four other SNPs in 15q22 were associated with HbF only in the larger CSSCD data. When patients aged <24 years in the CSSCD were examined, additional genes, including 4 with roles in nitric oxide metabolism, were associated with HbF level. These studies confirm prior analyses using traditional analytical approaches showing associations of SNPs in TOX, GPM6B, and the β-globin gene-like cluster with HbF levels. We also identified an additional candidate regulatory region in chromosome 15q22 that is associated with HbF level. By stratifying patients by age, our results also suggest that different genes might modulate the rate of decline of HbF and the final level of HbF in sickle cell anemia. Am. J. Hematol. 83:189–195, 2008. © 2007 Wiley-Liss, Inc.

Introduction

Fetal hemoglobin (HbF) inhibits the polymerization of sickle hemoglobin (HbS) [1] and its levels are likely to be regulated as a multigenic trait. The occurrence of some complications of sickle cell anemia, such as osteonecrosis, acute chest syndrome, and painful episodes, is reduced by increased concentrations of HbF, and HbF levels are inversely associated with mortality [2–5]. HbF concentration, high at birth, declines slowly during the first three decades of life in sickle cell anemia (homozygosity for HBB glu6val) subjects. Its rate of decline and the stable level ultimately achieved in adults vary considerably. Thus, understanding which genes modulate its concentration has significant biological and clinical importance.

Some of this variation in HbF levels is governed by genes linked to the β-globin gene-like cluster and by quantitative trait loci (QTL) present on chromosomes 6 [6,7], 8 [8,9], and the X-chromosome [10]. Other regulatory loci and epigenetic factors are also suggested to be important [11,12]. Previously, we genotyped single nucleotide polymorphisms (SNPs) within candidate genes and QTLs on chromosomes 6q, 8q, and Xp, and in the β-globin gene-like cluster. As HbF levels were not normally distributed, we used a lognormal transformation of HbF with linear regression and Random Forest analysis to find that SNPs in TOX (8q), EGFL6 and GPM6B (Xp), several genes in 6q, and SNPs within and 5′ to the β-globin gene-like cluster were associated with HbF [6,13]. In this report, we describe the application of a newly developed Bayesian approach that uses the Beta distribution [14] to model HbF concentrations without ad hoc transformations. Because our method is more powerful than traditional regression analysis, we confirm our prior analyses and also identify another possible regulatory region in chromosome 15q22. By separately analyzing adults and children, we also find that distinct genes might modulate the rate of decline of HbF.

Results

Clinical variables

Table I reports summaries of the clinical and laboratory variables in the two age-delimited groups from the Cooperative Study of Sickle Cell Disease (CSSCD) and in the Multicenter Study of Hydroxyurea in Sickle Cell Anemia (MSH) validation group. The average age of the subjects in the "young" group is 11.8 years, while the average age of the "older" subjects is 36 years in the CSSCD and 33 years in the MSH. The older subjects have a larger proportion of females, comparable prevalence of clinical complications, and a slightly lower HbF concentration. Figure 1 shows that HbF concentration has, on average, a decreasing trend in younger subjects (aged <24 years) and ranges from 9% (aged 4–8 years) to about 5.5% (aged 16–24 years); after age 24 years, HbF levels become more stable.

Genetic modifiers of HbF in subjects aged ≥24 years

Table II reports the SNPs that we found associated with HbF concentration in the older subjects of the CSSCD. Only those associations that were either supported by multiple SNPs in the same region or gene, or were validated in the MSH subjects, are shown. While SNPs in HBE1 and HBG2 were not significantly associated with HbF in the MSH data set, the distribution of HbF levels for each of the three genotypes, as well as the observation that those homozygous for the minor allele (G) had a distribution concentrated on lower values in the two data sets (Fig. 2), suggested a trend toward association. Similarly, the two SNPs in GPM6B associated with HbF levels in older subjects of the CSSCD are not significant in the MSH set, but the fitted means show the same trend and HbF is lowest in subjects who are homozygous for the minor allele (see Fig. 3). The associations of 5 SNPs in 15q21-22 suggest that AQP9, MAP2K1, SMAD3, and SMAD6, or other genes or regulatory elements in this region, are novel modulators of HbF. The 4 SNPs in MAP2K1, SMAD3, and SMAD6 span a region of approximately 740 kb. The SNP rs1867380 in AQP9 (15q22) is a functional SNP with amino acid change A/T located in the last exon of the gene. The minor allele frequency (MAF) observed for the minor allele A is 22% in the CSSCD set and 19% in the MSH set. This is consistent with the 20% estimated in the African American panel of 23 samples provided by the Coriell Cell Repository and reported in dbSNP. HbF levels are slightly lower in subjects who are homozygous for the minor allele A, while heterozygous subjects have a highly variable distribution of HbF (Fig. 4).

1Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts; 2Department of Medicine, Boston University School of Medicine, Boston, Massachusetts; 3Center for Human Genetics, Boston University School of Medicine, Boston, Massachusetts

Contract grant sponsor: NHLBI; Contract grant numbers: R21 HL080463, R01 HL68970, U54 HL70819, T32 HL007501.

*Correspondence to: Paola Sebastiani, Boston University School of Public Health, 715 Albany Street, Boston, MA 02118. E-mail: [email protected]

Received for publication 7 May 2002; Revised 29 June 2007; Accepted 9 July 2007

Am. J. Hematol. 83:189–195, 2008.

Published online 4 October 2007 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/ajh.21048

© 2007 Wiley-Liss, Inc.

American Journal of Hematology http://www3.interscience.wiley.com/cgi-bin/jhome/35105

Genetic modifiers of HbF in subjects aged <24 years

Tables III and IV report the SNPs that we found associated with HbF concentration in the younger subjects of the CSSCD. Consistent with the greater power provided by a sample larger than that of the older group, we found a larger number of significant associations: 59 SNPs in 13 different genes. The four SNPs in KDR are within a distance of 10 kb from the functional SNP rs2305948 with amino acid change V/I. The 4 SNPs in NOX3 are at an average distance of 8 kb, while the three SNPs in NOS3 tag a region of less than 3 kb. The 19 SNPs in 13q12-13 tag the three genes FLT1, ALOX5AP, and KL in a region of less than 500 kb. The associations of SNPs in TOX, HBE1, HBG2, and GPM6B are in agreement with the findings among the older subjects.

Discussion

The complex regulatory environment that modulates the decline in γ-globin gene (HBG2, HBG1) expression from its peak during the 2nd trimester of gestation to its nadir in adult life and its steady-state level in adults provides ample opportunity for genetic modulation [11,12]. Both cis- and trans-acting elements have been found to affect HbF concentration, and the contribution from epigenetic modulation is just beginning to be explored [15,16]. In this study we apply newly developed analytical methods to understand the association of SNPs in candidate genes with the phenotype of HbF levels in adults and in children with sickle cell anemia.

Compared with analytical models using a lognormal distribution, the Bayesian method with Beta distribution we developed for this analysis has a comparable false positive rate but a higher true positive rate, which increases the power when the sample size is small. Using this novel approach, we confirm our previous findings suggesting that SNPs in TOX, GPM6B, and within and 5′ to the β-globin gene-like cluster are associated with HbF [9]. TOX, whose product is a member of the high mobility group (HMG) box protein family [17], binds DNA in the minor groove with high sequence specificity and might control gene expression by changing local chromatin structure and modulating the formation of transcriptional regulatory complexes [18]. In addition, using this more powerful statistical approach, we found additional genes not previously reported to be associated with HbF levels.

In sickle cell anemia, unlike in normal individuals, where about 90% of people aged 5–9 years have HbF levels less than 1%, postnatal HbF levels decline very gradually and reach their steady-state level only in the third decade (see Ref. 19 and Fig. 1, left panel). When we stratified our cases into individuals aged <24 years and aged ≥24 years, we found that additional SNPs were associated with HbF in the younger patients. Among these genes were ASS, NOS1, NOS3, and KL, all of which have roles in NO metabolism [20–22]. Polymorphisms of these genes could modulate the levels of NO that, via its role in the synthesis of cGMP, might affect γ-globin gene expression [23,24]. Also associated with HbF were SNPs in the vascular endothelial growth factor receptors KDR and FLT1. It is unclear how these genes might influence HbF levels; however, one could postulate an effect on erythroid regeneration [25]. The presence of additional SNPs associated with HbF level only in patients aged <24 years suggests that these genes may modulate the decline in HbF. It is also possible that the larger number of younger patients provides greater power for detecting SNPs and genes with smaller effects on HbF level.

We have also identified a new region in 15q22-21 containing many genes, some of which might be candidate modulators of HbF expression. Included in this genetic interval are MAP2K1, SMAD3, and AQP9, an aquaporin channel facilitating water and urea permeability and expressed in leukocytes, liver, lung, and spleen [26]. None of these genes has a known connection to HbF synthesis or erythropoiesis, and this region is not a known QTL associated with HbF.

To eliminate the inherent bias of candidate gene association studies, we have initiated genome-wide scans to explore further the relationship of SNPs with HbF levels. In our first studies with pooled DNA samples from the MSH, we compared DNA pools from 60 patients in the top and

TABLE I. Summary of Clinical and Laboratory Variables for the Subjects in the Two Data Sets From the CSSCD, and the MSH

                       CSSCD (<24, N = 980)    CSSCD (≥24, N = 538)    MSH (N = 211)
Variable               Mean      SD            Mean      SD            Mean      SD
Age (years)            11.83     6.48          35.50     9.66          32.70     7.50
ALT (SGPT) U/L         30.42     43.56         48.70     58.35         25.42     17.40
AST (SGOT) U/L         49.14     20.96         45.49     23.53         42.95     21.70
Bilirubin mg/dl        3.09      1.88          3.17      1.84          3.74      2.41
Hb g/dl                8.29      1.23          8.69      1.35          8.49      1.31
%HbF                   7.19      6.16          5.93      5.36          5.12      3.51
MCV fL                 87.47     7.32          93.57     7.64          100.45    8.94
Reticulocytes %rRC     11.42     5.65          11.57     4.91          10.99     3.82
Sys BP ** mmHg         101.90    9.86          110.00    10.60         122.30    13.83
WBC 1000/µL            12.20     2.92          11.65     2.61          12.45     3.92

                       Proportion (%)          Proportion (%)          Proportion (%)
ACS                    69                      73                      34
AVN                    35                      38                      25
Blood transfusion      14                      17                      50
Death                  3                       13                      3
HbSS 1 thal            30                      32                      30
Leg ulceration         5                       34                      23
Priapism               12                      12                      5
Sex                    50 (female)             57 (female)             51 (female)
Stroke                 8                       9                       9


60 patients in the bottom quartile of HbF concentration. Using the Sentrix HumanHap300 bead chip (Illumina), which contains approximately 317,000 HapMap-based, gene-centric SNPs, we found an association of AQP9, MAP2K1, and SMAD3 with HbF level, supporting the results of the candidate gene studies [27]. Within the 15q22-21 interval, and associated with HbF in our genome-wide analysis, were SNPs in FOXB1, RORA, and ADAM10, examples of a putative transcription factor, a nuclear hormone receptor, and a metalloprotease, respectively. While one or more genes within 15q22-21 could modulate HbF expression by an undefined mechanism, it is also possible that these SNPs are marking a chromosomal region with a regulatory role in HbF production. Further fine mapping of this region might help pinpoint candidate genes.

Although we have examined 2 unrelated patient groups, we acknowledge that further validation and replication of our findings in additional patients is necessary. Also, our main studies were based on analysis of candidate genes and were not the result of genome-wide association testing of individual patient samples. We therefore believe that an unbiased genome-wide association study in large numbers of patients should be the next step and will further the understanding of genetic modulation of HbF in sickle cell anemia.

Methods

Data

Our primary data set consists of 1,518 subjects from the Cooperative Study of Sickle Cell Disease (CSSCD) [28] for whom we had phenotypic and genotype information. Methods of laboratory measurements were described previously [29]. We divided this set into two groups: a set of 980 patients aged <24 years, and a set of 538 patients aged ≥24 years; we chose the age of 24 years because it appears to denote the time when HbF becomes stable in sickle cell anemia (see Fig. 1 and Ref. 19). The Multicenter Study of Hydroxyurea in Sickle Cell Anemia (MSH) [30] database provides a replication set of 211 adult patients, aged ≥24 years, with DNA samples and HbF levels before hydroxyurea treatment. These studies were approved by the IRB of Boston Medical Center. Summaries of the phenotypic and clinical data are in Table I.

TABLE II. Summary of SNPs That Were Found Significantly Associated With Levels of HbF in the Older Subjects of the CSSCD

                                   CSSCD (age ≥ 24 years)             MSH
Gene     Band      SNP             AA      AB      BB      Δ(DIC)     AA      AB      BB      Δ(DIC)
TOX      8q21.1    rs10504269      0.070   0.062   0.054   -2.95      0.049   0.053   0.042   -0.15
                   rs6997859       0.066   0.063   0.052   -0.03      0.051   0.047   0.048   2.706
                   rs12155519      0.062   0.067   0.056   -0.36      0.052   0.051   0.041   -1.27
                   rs1947178       0.056   0.065   0.065   0.79       0.043   0.049   0.056   -1.94
                   rs389349        0.060   0.070   0.039   -6.91      0.047   0.058   0.069   -4.79
                   rs4737532       0.600   0.670   0.058   1.11       0.045   0.052   0.055   0.377
                   rs851800        0.061   0.068   0.045   -4.25      0.045   0.052   0.058   1.78
HBE1     11p15.4   rs7130110       0.058   0.078   0.067   -7.3       0.049   0.050   0.063   2.07
HBG2     11p15.4   rs7482144       0.058   0.078   0.055   -6.93      0.049   0.050           2.05
AQP9     15q22     rs1867380       0.058   0.070   0.056   -3.64      0.052   0.044   0.049   -2.08
MAP2K1   15q22.1   rs4489951       0.066   0.057   0.051   -5.04
SMAD6    15q22.1   rs1440372       0.062   0.054   0.042   -4.1
SMAD3    15q22.1   rs10518707      0.062   0.056   0.070   -3.84
                   rs8038623       0.061   0.068   0.043   -2.87
GPM6B    Xp22.2    rs1005589       0.067   0.064   0.056   -2.77      0.051   0.051   0.047   1.75
GPM6B    Xp22.2    rs11095629      0.067   0.062   0.047   -4.13      0.051   0.048   0.042   0.837

Column (1) Official gene name; (2) Cytogenetic band; (3) SNP identifier. Underlined SNPs are validated in the MSH set; (4–6) Fitted mean of HbF level for the three genotypes with A = major allele and B = minor allele; (7) Measure of significance given by the difference in DIC: the smaller the stronger the evidence of association; (8–10) Fitted mean of HbF level in the MSH set; (11) Measure of significance in the MSH set.

Figure 1. Left: Distribution of fetal hemoglobin (y-axis HbF%: percent of total hemoglobin) for different ages (x-axis). The plot shows that HbF% decreases with increasing age until about 24 years, and then becomes more stable. Right: Distribution of HbF after the logit transformation used to achieve approximate normality.


We analyzed approximately 850 SNPs in 320 candidate genes that were chosen for their potential role in the modulation of sickle cell anemia subphenotypes [31–34], genes in QTLs at 6q22-23 [6], 8q11-q12 [8], and Xp22.2-p22.3 [10], and also the β-globin gene-like cluster on chromosome 11p15.5 [13,35,36]. SNPs were chosen and typed between 2003 and 2006 using a variety of techniques that included mass spectrometry (Sequenom, San Diego, CA), ABI SNPLEX, ABI Realtime PCR assays, and a custom-made Illumina platform based on haplotype tagging SNPs selected from the HapMap project (Phase I, Yoruba sample) [37]. For quality control purposes, about 3% of the DNA samples were regenotyped.

Statistical analysis

We examined the association of the 850 SNPs with HbF concentration in our primary dataset using a two-step approach. In the first step, we used standard linear regression to select a set of SNPs with some evidence of association with HbF concentration. This initial step is computationally efficient, making it easy to screen large numbers of SNPs, but it is based on approximations that are valid in large samples and could yield many false positive associations. Therefore, in the second step we tested the associations identified in the first step with a more sophisticated Bayesian approach able to reduce the number of invalid associations (Wang L, et al. Discovery of genetic modifiers of a finite domain phenotype using Bayesian beta regression, submitted).
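As an illustration only, the screening step of this two-step approach might be sketched as follows. This is not the authors' script: the function names, the 0/1/2 genotype coding, and the Gaussian form of the AIC (written up to an additive constant) are assumptions made for the example. The sketch uses the common convention in which a lower AIC indicates the preferred model; the sign convention behind the comparison reported in the text may differ.

```python
import numpy as np

def logit(p):
    """Logit transform of proportions strictly inside (0, 1)."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def genotype_dummies(genotypes):
    """Code genotypes 0/1/2 (assumed: copies of the minor allele) as (X1, X2)."""
    g = np.asarray(genotypes)
    return np.column_stack([(g == 1).astype(float), (g == 2).astype(float)])

def gaussian_aic(y, design):
    """AIC of an ordinary least-squares fit, up to an additive constant."""
    n, k = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    return n * np.log(rss / n) + 2 * (k + 1)  # +1 counts the error variance

def screen_snp(hbf, genotypes, covariates):
    """AIC(no association) minus AIC(association); positive favours association."""
    y = logit(hbf)
    n = len(y)
    base = np.column_stack([np.ones(n), covariates])           # intercept + covariates
    full = np.column_stack([base, genotype_dummies(genotypes)])  # add SNP terms
    return gaussian_aic(y, base) - gaussian_aic(y, full)
```

Here `hbf` is the HbF proportion on the (0, 1) scale and `covariates` would hold adjustment variables such as age and gender; SNPs with a positive difference would be carried forward to the second, Bayesian step.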

In the initial step, we transformed the value of HbF%, normalized to the interval (0, 1), using the logit transformation log(HbF/(1-HbF)). We chose this transformation of HbF to achieve approximate normality and be able to use standard regression (see Fig. 1, right panel). For each SNP in the data set, we fit the linear regression model of no association between the SNP and log(HbF/(1-HbF)), and the linear regression model of association between the SNP and log(HbF/(1-HbF)), after appropriately adjusting for age, gender, and hemoglobin genotype. We compared the two models using the Akaike information criterion (AIC) [38], and selected all SNPs in which the model of association yielded a larger AIC. In this first analysis, we identified 200 SNPs associated with variation of HbF in the data of adult subjects, and 232 SNPs associated with variation of HbF in the data of young subjects. Only 68 of these SNPs were found associated with HbF in both data sets. We then proceeded with the more sophisticated but computationally intensive analysis of the association of the selected SNPs. For each of these SNPs, the procedure assumes that the proportion of HbF follows a Beta distribution, and uses linear regression equations to model the association among the SNP genotypes, the logit transformation of the average HbF, and possibly covariates. We use the Beta distribution because it is well suited to describing data that vary in a limited interval [39]. The (unadjusted) regression model is described in the following equation:

$$\log\frac{\mu}{1-\mu} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$

where μ is the average HbF and X1 and X2 are dummy variables taking values (0,0) for a subject homozygous for the common allele, (1,0) for a heterozygous subject, and (0,1) for a subject homozygous for the rare allele. The model is fitted using a Bayesian procedure that is described in detail by Wang L, et al. (Discovery of genetic modifiers of a finite domain phenotype using Bayesian beta regression, submitted), and computes the distributions of the regression parameters conditionally on the data. Because the calculation of these distributions in closed form is not feasible, we resort to stochastic computations such as Gibbs sampling (see Ref. 40 for a recent review of this method). We used the implementation of Gibbs sampling in the program WinBugs 1.4 to estimate the regression coefficients, the fitted means, and the goodness-of-fit measure known as the "deviance information criterion" (DIC). This measure generalizes the AIC to models in which the number of parameters is difficult to estimate [41], and it is computed in WinBugs 1.4 as part of the model fit procedure. We used this procedure to fit the model of no association between each SNP selected earlier and HbF concentration and the model of general association. As suggested in Ref. 41, we selected as associated those SNPs with a general association model that yielded a change of DIC of at least 3 compared with the model of no association. A change of DIC of at least 3 means that the model of association is approximately 10 times more likely than the model of no association. To support some of the findings, we also report SNPs that have weaker associations with changes in HbF levels (a change in DIC between 0 and 3) but are physically near other SNPs with a significant effect on HbF. The scripts for running the analysis are available from this website: http://www.bu.edu/sicklecell/downloads/Projects/

Figure 2. The top boxplots display the distribution of HbF in older subjects with the three genotypes of the SNP in the HBE1 gene in the CSSCD data (left) and MSH data (right). Although the model of general association does not reach statistical significance in the MSH set, the trends in the two sets are concordant. A similar observation applies to the distribution of HbF displayed in the top-right boxplots (SNP in HBG2), with heterozygous subjects predisposed to slightly higher HbF concentration.

Figure 3. The top boxplots display the distribution of HbF in older subjects with the three genotypes of the SNP rs1005589 in the GPM6B gene in the CSSCD data (left) and MSH data (right). Once again, the model of general association does not reach statistical significance in the MSH set, but the trends in the two sets are concordant. A similar observation applies to the distribution of HbF displayed in the top-right boxplots (SNP in HBG2), with heterozygous subjects predisposed to slightly higher HbF concentration.

Figure 4. Distribution of HbF in older subjects with the three genotypes of the SNP rs1867380 in the AQP9 gene in the CSSCD data (left) and MSH data (right). In both sets, there is a strong association between the SNP genotypes and HbF levels. Homozygous subjects for the minor allele A appear to be predisposed to slightly lower HbF concentration.
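To make the Beta regression model of the statistical analysis more concrete, the following minimal sketch evaluates the genotype-specific fitted means implied by the logit link and the corresponding Beta log-likelihood. It is an illustration under stated assumptions, not the authors' WinBugs code: it assumes the common mean-precision parametrization Beta(μφ, (1-μ)φ), with a hypothetical precision parameter `phi`, and it omits the prior distributions and the Gibbs sampler entirely, since the text does not specify them.

```python
import math
import numpy as np

def fitted_means(beta):
    """Mean HbF for genotypes AA, AB, BB under logit(mu) = b0 + b1*X1 + b2*X2."""
    b0, b1, b2 = beta
    eta = np.array([b0, b0 + b1, b0 + b2])  # (X1, X2) = (0,0), (1,0), (0,1)
    return 1.0 / (1.0 + np.exp(-eta))

def beta_loglik(hbf, genotypes, beta, phi):
    """Log-likelihood of HbF proportions under Beta(mu*phi, (1-mu)*phi).

    genotypes are coded 0 (AA), 1 (AB), 2 (BB); phi is an assumed
    precision parameter that does not appear in the original text.
    """
    means = fitted_means(beta)
    total = 0.0
    for y, g in zip(hbf, genotypes):
        a = means[g] * phi            # first Beta shape parameter
        b = (1.0 - means[g]) * phi    # second Beta shape parameter
        total += (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))
    return total
```

In a Bayesian fit, a sampler would combine this likelihood with priors on the coefficients and on `phi`; the fitted means reported in the tables correspond to `fitted_means` evaluated at the estimated coefficients.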

We validated a subset of the associations found in the primary analysis using the MSH data, for which we had genotype data for a subset of 384 SNPs. Because the subjects in the MSH set are older (see Table I), we used these data only to validate the reproducibility of the associations found in the set of older subjects of the CSSCD. Furthermore, since the sample size of the validation set is smaller than that of the CSSCD and hence provides less power, we considered an association "validated" if there was either a significant change in the DIC or a concordant trend in the distribution of HbF.
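The selection and validation rules described in this section can be summarized in a short sketch. The sign convention is an assumption: the change in DIC is taken as DIC(association) minus DIC(no association), so that negative values favour association, in line with the table note that smaller values indicate stronger evidence. The function names are hypothetical, and the "concordant trend" judgment is passed in as a flag because the text does not give a numerical definition for it.

```python
def classify_delta_dic(delta_dic, threshold=3.0):
    """Label a SNP from its change in DIC (association minus no association)."""
    if delta_dic <= -threshold:
        return "associated"        # DIC improves by at least the threshold
    if delta_dic < 0.0:
        return "weak"              # improvement between 0 and the threshold
    return "not associated"

def is_validated(delta_dic_replication, concordant_trend, threshold=3.0):
    """Validation rule: significant DIC change OR a concordant trend in HbF."""
    return delta_dic_replication <= -threshold or bool(concordant_trend)
```

For example, a SNP with a change in DIC of -4.79 in the replication set would count as validated on the DIC criterion alone, whereas one with a small or positive change would be validated only if the genotype-specific HbF distributions were judged concordant.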

Acknowledgements

The authors thank the investigators of the Multicenter Study of Hydroxyurea and the Cooperative Study of Sickle Cell Disease who obtained blood samples for DNA-based studies and analyzed data from these studies for the study publications cited in the text of this article.

References

1. Poillon WN, et al. Sparing effect of hemoglobin F and hemoglobin A2 on the polymerization of hemoglobin S at physiologic ligand saturations. Proc Natl Acad Sci USA 1993;90:5039–5043.

2. Platt OS, Thorington BD, Brambilla DJ, et al. Pain in sickle cell disease. Rates and risk factors. N Engl J Med 1991;325:11–16.

3. Platt OS, et al. Mortality in sickle cell disease. Life expectancy and risk factors for early death. N Engl J Med 1994;330:1639–1644.

4. Milner PF, et al. Osteonecrosis of the humeral head in sickle cell disease. Clin Orthop Relat Res 1993;283:136–143.

5. Castro O, et al. The acute chest syndrome in sickle cell disease: Incidence and risk factors. The Cooperative Study of Sickle Cell Disease. Blood 1994;84:643–649.

6. Wyszynski DF, et al. Polymorphisms near a chromosome 6q QTL area are associated with modulation of fetal hemoglobin levels in sickle cell anemia. Cell Mol Biol (Noisy-le-grand) 2004;50:23–33.

7. Garner C, et al. Haplotype mapping of a major quantitative-trait locus for fetal hemoglobin production, on chromosome 6q23. Am J Hum Genet 1998;62:1468–1474.

8. Garner C, et al. Quantitative trait locus on chromosome 8q influences the switch from fetal to adult hemoglobin. Blood 2004;104:2184–2186.

9. Garner CP, et al. Evidence of genetic interaction between the β-globin complex and chromosome 8q in the expression of fetal hemoglobin. Am J Hum Genet 2002;70:793–799.

10. Dover GJ, et al. Fetal hemoglobin levels in sickle cell disease and normal individuals are partially controlled by an X-linked gene located at Xp22.2. Blood 1992;80:816–824.

11. Stamatoyannopoulos G. Prospects for developing a molecular cure for thalassemia. Hematology 2005;10(Suppl 1):255–257.

12. Bank A. Regulation of human fetal hemoglobin: New players, new complexities. Blood 2006;107:435–443.

13. Ma Q, et al. Fetal hemoglobin in sickle cell anemia: Genetic determinants of response to hydroxyurea. Pharmacogenomics J 2007; DOI: 10.1038/sj.tpj.6500433.

14. Palmer CR. Encyclopedia of biostatistics. BMJ 1999;318:542.

15. Figueiredo MS, Steinberg MH. Fetal hemoglobin in sickle cell anemia: Examination of phylogenetically conserved sequences within the locus control region but outside the cores of hypersensitive sites 2 and 3. Blood Cells Mol Dis 1997;23:188–200.

TABLE III. List of SNPs in Genes Between Chromosome 1 and 11 That Were Found Associated With Levels of HbF in the Younger Subjects of the CSSCD

                                   CSSCD (age < 24 years)
Gene     Band      rs number       AA      AB      BB      Δ(DIC)
KDR      4q11      rs6554233       0.07    0.071   0.093   -3.91
                   rs6828477       0.076   0.069   0.066   -0.73
                   rs7654599       0.063   0.075   0.073   -2.36
                   rs2305948       0.071   0.076   0.062   -0.02
MAP3K7   6q16      rs1145729       0.069   0.079   0.081   -4.46
                   rs157681        0.072   0.075   0.056   -3.86
NOX3     6q25.1    rs231944        0.075   0.068   0.080   -0.71
                   rs231945        0.075   0.066   0.085   -2.9
                   rs9371889       0.067   0.075   0.076   0.53
                   rs6557420       0.066   0.073   0.080   -2.41
NOS3     7q36      rs1008140       0.071   0.079   0.059   -0.31
                   rs743507        0.067   0.083   0.067   -7.47
                   rs1808593       0.068   0.082   0.070   -5.3
TOX      8q21.1    rs2726599       0.077   0.068   0.065   -2.32
                   rs3109904       0.072   0.076   0.061   -2.44
                   rs7821556       0.077   0.070   0.061   -2.64
                   rs7817609       0.066   0.070   0.081   -2.64
                   rs826730        0.077   0.074   0.064   -1.37
                   rs3779999       0.07    0.078   0.063   -0.69
                   rs1349115       0.077   0.067   0.066   -1.64
                   rs2594953       0.065   0.080   0.071   -5.31
                   rs10283344      0.073   0.075   0.063   -1.27
                   rs12545204      0.066   0.074   0.075   -0.25
                   rs380620        0.074   0.074   0.062   -0.88
ASS      9q34      rs590086        0.075   0.066   0.085   -4.04
                   rs652313        0.067   0.077   0.068   -1.9
                   rs12555797      0.069   0.085   0.079   -2.28
                   rs543048        0.069   0.077   0.067   -0.59
HBE1     11p15.4   rs7130110       0.07    0.071   0.093   -6.96
                   rs3759070       0.068   0.075   0.060   -1.84
HBG2     11p15.4   rs7482144       0.07    0.082   0.084   -6.93

TABLE IV. List of SNPs in Genes Between Chromosome 12 and 23 That Were Found Associated With Levels of HbF in the Younger Subjects of the CSSCD

                                   CSSCD (age < 24 years)
Gene     Chromosome  rs number     AA       AB     BB     D(DIC)
NOS1     12q24       rs2682820     0.073    0.068  0.094  20.69
                     rs3825102     0.073    0.067  0.106  23.55
                     rs1483757     0.078    0.070  0.066  21.05
FLT1     13q12       rs7987291     0.064    0.076  0.075  22.85
                     rs2387632     0.067    0.079  0.090  26.78
                     rs9513097     0.076    0.067  0.066  21.54
                     rs9508026     0.076    0.066  0.062  23.16
                     rs8002446     0.074    0.067  0.084  20.16
                     rs638889      0.065    0.078  0.069  22.99
                     rs2256849     0.066    0.076  0.072  20.02
                     rs670084      0.076    0.067  0.077  20.27
                     rs600640      0.072    0.076  0.065  20.06
ALOX5AP  13q12       rs4468448     0.064    0.075  0.073  20.98
                     rs4769058     0.069    0.078  0.057  21.67
                     rs12019512    0.069    0.078  0.069  20.65
                     rs4445746     0.068    0.078  0.071  20.86
KL       13q13       rs398655      0.07     0.076  0.062  22.04
                     rs577912      0.068    0.076  0.070  20.5
                     rs7982726     0.069    0.074  0.091  23.79
                     rs685417      0.076    0.070  0.065  0.24
                     rs9527025     0.075    0.068  0.050  23.67
                     rs648202      0.067    0.072  0.080  20.76
GPM6B    Xp22.2      rs5978663     0.07203  0.077  0.064  20.43
                     rs11095629    0.069    0.082  0.071  21.42
                     rs7890737     0.07     0.084  0.072  20.57
                     rs4830513     0.069    0.084  0.068  22.91
                     rs5979998     0.07     0.081  0.069  21.23
                     rs6654096     0.067    0.083  0.069  25.06

194 American Journal of Hematology DOI 10.1002/ajh

16. Lu ZH, Steinberg MH. Fetal hemoglobin in sickle cell anemia: Relation to regulatory sequences cis to the β-globin gene. Multicenter Study of Hydroxyurea. Blood 1996;87:1604–1611.

17. Wilkinson B, et al. TOX: An HMG box protein implicated in the regulation of thymocyte selection. Nat Immunol 2002;3:272–280.

18. Bianchi ME, Beltrame M. Flexing DNA: HMG-box proteins and their partners. Am J Hum Genet 1998;63:1573–1577.

19. Rucknagel DL, et al. Age and sex effects on hemoglobin F in sickle cell anemia. In: Stamatoyannopoulos G, Nienhuis AW, editors. Cellular and Molecular Regulation of Hemoglobin Switching. New York: Grune and Stratton; 1979. pp 107–118.

20. Mack AK, Kato GJ. Sickle cell disease and nitric oxide: A paradigm shift? Int J Biochem Cell Biol 2006;38:1237–1243.

21. Nagai R, et al. Endothelial dysfunction in the klotho mouse and downregulation of klotho gene expression in various animal models of vascular and metabolic diseases. Cell Mol Life Sci 2000;57:738–746.

22. Nakamura T, et al. Production of nitric oxide, but not prostacyclin, is reduced in klotho mice. Jpn J Pharmacol 2002;89:149–156.

23. Ikuta T, Ausenda S, Cappellini MD. Mechanism for fetal globin gene expression: Role of the soluble guanylate cyclase-cGMP-dependent protein kinase pathway. Proc Natl Acad Sci USA 2001;98:1847–1852.

24. Inoue A, et al. Negative regulation of γ-globin gene expression by cyclic AMP-dependent pathway in erythroid cells. Exp Hematol 2004;32:244–253.

25. Stamatoyannopoulos G. Molecular and cellular basis of hemoglobin switching. In: Forget BG, Steinberg MH, Higgs DR, Nagel RL, editors. Disorders of Hemoglobin: Genetics, Pathophysiology, and Clinical Management. Cambridge: Cambridge University Press; 2001. pp 131–145.

26. Ishibashi K, et al. Cloning and functional expression of a new aquaporin (AQP9) abundantly expressed in the peripheral leukocytes permeable to water and urea, but not to glycerol. Biochem Biophys Res Commun 1998;244:268–274.

27. Sebastiani P, et al. Genome-wide association study using pooled DNA samples reveal new genetic modulators of HbF concentration. Paper presented in National Sickle Cell Disease Program, 35th Annual Meeting 2007, Washington, DC.

28. Gaston M, Rosse WF. The cooperative study of sickle cell disease: Review of study design and objectives. Am J Pediatr Hematol Oncol 1982;4:197–201.

29. West MS, et al. Laboratory profile of sickle cell disease: A cross-sectional analysis. The Cooperative Study of Sickle Cell Disease. J Clin Epidemiol 1992;45:893–909.

30. Steinberg MH, et al. Fetal hemoglobin in sickle cell anemia: Determinants of response to hydroxyurea. Multicenter Study of Hydroxyurea. Blood 1997;89:1078–1088.

31. Sebastiani P, et al. Genetic dissection and prognostic modeling of overt stroke in sickle cell anemia. Nat Genet 2005;37:435–440.

32. Nolan VG, et al. Association of single nucleotide polymorphisms in klotho with priapism in sickle cell anaemia. Br J Haematol 2005;128:266–272.

33. Nolan VG, et al. Sickle cell leg ulcers: Associations with haemolysis and SNPs in klotho, TEK and genes of the TGF-β/BMP pathway. Br J Haematol 2006;133:570–578.

34. Baldwin C, et al. Association of klotho, bone morphogenic protein 6, and annexin A2 polymorphisms with sickle cell osteonecrosis. Blood 2005;106:372–375.

35. Tuan D, et al. The “beta-like-globin” gene domain in human erythroid cells. Proc Natl Acad Sci USA 1985;82:6384–6388.

36. Thein SL, et al. Detection of a major gene for heterocellular hereditary persistence of fetal hemoglobin after accounting for genetic modifiers. Am J Hum Genet 1994;54:214–228.

37. The International HapMap Consortium. The International HapMap project. Nature 2003;426:789–796.

38. Akaike H. Information theory and an extension of the maximum likelihood principle. In Paper Presented at 2nd International Symposium on Information Theory. Kiado; 1973.

39. Smithson M, Verkuilen J. A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables. Psychol Methods 2006;11:54–71.

40. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet 2006;7:781–791.

41. Spiegelhalter DJ, et al. Bayesian measures of model complexity and fit. J Roy Statist Soc Series B 2002;64:583–639.
