Identification of low frequency variants associated with
gout and serum uric acid levels
Patrick Sulem1*, Daniel F. Gudbjartsson1*, G. Bragi Walters1*, Hafdis T. Helgadottir1, Agnar Helgason1, Sigurjon A. Gudjonsson1, Carlo Zanon1, Soren Besenbacher1, Gyda Bjornsdottir1, Olafur T. Magnusson1, Gisli Magnusson1, Eirikur Hjartarson1, Jona Saemundsdottir1, Arnaldur Gylfason1, Adalbjorg Jonasdottir1, Hilma Holm1, Ari Karason1, Thorunn Rafnar1, Hreinn Stefansson1, Ole A. Andreassen2, Jesper H. Pedersen3, Allan I. Pack4, Marieke C.H. de Visser5, Lambertus A. Kiemeney5,6,7, Arni J Geirsson8, Gudmundur I. Eyjolfsson9, Isleifur Olafsson10, Augustine Kong1, Gisli Masson1, Helgi Jonsson8,11, Unnur Thorsteinsdottir1,11, Ingileif Jonsdottir1,11,12 & Kari Stefansson1,11
1 deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland.
2 Division of Mental Health and Addiction, Oslo University Hospital & Institute of Clinical Medicine, University of Oslo, Oslo, Norway.
3 Department of Cardiothoracic Surgery, Rigshospitalet, University of Copenhagen, Copenhagen, Denmark.
4 Center for Sleep and Circadian Neurobiology, Division of Sleep Medicine, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania.
5 Department of Epidemiology, Biostatistics & HTA, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, the Netherlands.
6 Comprehensive Cancer Center IKO, 6501 BG Nijmegen, the Netherlands.
7 Department of Urology, Radboud University Nijmegen Medical Centre, 6500 HB Nijmegen, the Netherlands.
8 Landspitali, The National University Hospital of Iceland, Department of Medicine, Reykjavik, Iceland.
9 Icelandic Medical Center (Laeknasetrid) Laboratory in Mjodd (RAM).
10 Landspitali, The National University Hospital of Iceland, Department of Clinical Biochemistry, Reykjavik, Iceland.
11 University of Iceland, Faculty of Medicine, Reykjavik, Iceland.
12 Landspitali, The National University Hospital of Iceland, Department of Immunology, Reykjavik, Iceland.
* Authors with an equal contribution
Nature Genetics: doi:10.1038/ng.972
Supplementary Information
Contents:
Supplementary Note
Supplementary Table 1: Association with gout of genome-wide significant markers at the 19q13 locus
Supplementary Table 2: Association with gout of genome-wide significant markers at the ABCG2 locus
Supplementary Table 3: Novel and previously reported sequence variants associating with serum uric acid levels and gout
Supplementary Table 4: Sex-stratified uric acid levels association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Supplementary Table 5: Sex-stratified gout association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Supplementary Table 6: Association of uric acid sequence variants with age at visit to the clinician
Supplementary Figure 1: The sequencing depth of the 457 whole-genome sequenced individuals
Supplementary Figure 2: Quantile-quantile plot of the SNPs in the genome-wide association scan for gout
Supplementary Figure 3: Quantile-quantile plot of the SNPs in the genome-wide association scan for uric acid levels
Supplementary Note
Whole Genome Sequencing
SNPs were identified through the Icelandic whole-genome sequencing project. A total of 457
Icelanders were selected for sequencing based on having various neoplastic, cardiovascular and
psychiatric conditions. All of the individuals were sequenced to a depth of at least 10X. Using
this set of individuals as the reference, 15,957,390 SNPs were imputed.
Sample preparation. Paired-end libraries for sequencing were prepared according to the
manufacturer's instructions (Illumina). In short, approximately 5 μg of genomic DNA, isolated from
frozen blood samples, was fragmented to a mean target size of 300 bp using a Covaris E210
instrument. The resulting fragmented DNA was end repaired using T4 and Klenow polymerases and
T4 polynucleotide kinase with 10 mM dNTP followed by addition of an 'A' base at the ends using
Klenow exo fragment (3′ to 5′-exo minus) and dATP (1 mM). Sequencing adaptors containing 'T'
overhangs were ligated to the DNA products followed by agarose (2%) gel electrophoresis. Fragments
of about 400 bp were isolated from the gels (QIAGEN Gel Extraction Kit), and the adaptor-modified
DNA fragments were PCR enriched for ten cycles using Phusion DNA polymerase (Finnzymes Oy) and
PCR primers PE 1.0 and PE 2.0 (Illumina). Enriched libraries were further purified using agarose (2%)
gel electrophoresis as described above. The quality and concentration of the libraries were assessed
with the Agilent 2100 Bioanalyzer using the DNA 1000 LabChip (Agilent). Barcoded libraries were
stored at −20 °C. All steps in the workflow were monitored using an in-house laboratory information
management system with barcode tracking of all samples and reagents.
DNA sequencing. Template DNA fragments were hybridized to the surface of flow cells (Illumina PE
flowcell, v4) and amplified to form clusters using the Illumina cBot. In brief, DNA (8–10 pM) was
denatured, followed by hybridization to grafted adaptors on the flowcell. Isothermal bridge
amplification using Phusion polymerase was then followed by linearization of the bridged DNA,
denaturation, blocking of 3′ ends and hybridization of the sequencing primer.
Sequencing-by-synthesis was performed on Illumina GAIIx instruments equipped with paired-end
modules. Paired-end libraries were sequenced using 2 × 101 cycles of incorporation and imaging with
Illumina sequencing kits, v4. Each library or sample was initially run on a single lane for
validation, followed by further sequencing of ≥4 lanes with targeted cluster densities of
250–300 k/mm². Imaging and data analysis were performed using the SCS 2.6 and RTA 1.6 software
packages from Illumina, respectively. Real-time analysis involved conversion of image data to base
calls in real time.
Alignment. For each lane in the DNA sequencing output, the resulting qseq files were converted into
fastq files using an in-house script. All output from sequencing was converted, and the Illumina
quality filtering flag was retained in the output. The fastq files were then aligned against Build 36 of
the human reference sequence using bwa version 0.5.7 (ref. 1).
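The in-house conversion script is not described further. As a rough illustration only, the sketch below converts one qseq line (11 tab-separated fields, the last being the Illumina quality filter flag) into a FASTQ record while keeping the filter flag in the header. The function name `qseq_to_fastq` and the header layout, including the `filter=` tag, are hypothetical choices rather than the authors' format, and the quality string is passed through unchanged.

```python
def qseq_to_fastq(line: str) -> str:
    """Convert one tab-separated qseq line into a four-line FASTQ record.

    Hypothetical header layout; the authors' in-house format is not described.
    """
    (machine, run, lane, tile, x, y,
     index, read_no, seq, qual, passed) = line.rstrip("\n").split("\t")
    # qseq writes '.' for an uncalled base; FASTQ conventionally uses 'N'.
    seq = seq.replace(".", "N")
    # Retain the Illumina quality filter flag (1 = passed, 0 = failed) in the header.
    header = f"@{machine}_{run}:{lane}:{tile}:{x}:{y}#{index}/{read_no} filter={passed}"
    # The quality string is copied as-is (no re-encoding is attempted here).
    return f"{header}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    example = "HWI\t1\t3\t12\t100\t200\t0\t1\tAC.GT\tabcde\t1"
    print(qseq_to_fastq(example), end="")
```

Carrying the flag in the header, rather than dropping failed reads, mirrors the statement that all sequencing output was converted and the flag retained.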
BAM file generation. SAM file output from the alignment was converted into BAM format using
samtools version 0.1.8 (ref. 2), and an in-house script was used to carry the Illumina quality filter flag
over to the BAM file. The BAM files for each sample were then merged into a single BAM file using
samtools. Finally, Picard version 1.17 (see http://picard.sourceforge.net/) was used to mark
duplicates in the resulting sample BAM files.
SNP calling and genotyping in whole-genome sequencing
A two-step approach was applied. The first step was to detect SNPs by identifying sequence positions
where at least one individual could be determined to be different from the reference sequence with
confidence (quality threshold of 20) based on the SNP calling feature of the pileup tool in samtools
(ref. 2). Positions at which every individual differed from the reference, whether as a heterozygote
or a homozygote, were removed. The second step was to use the pileup tool to genotype the SNPs at the
positions that were flagged as polymorphic. Because sequencing depth varies and hence the certainty
of genotype calls also varies, genotype likelihoods rather than deterministic calls were calculated
(see below). Of the 2.5 million SNPs reported in the HapMap2 CEU samples, 96.3% were observed in the
whole-genome sequencing data. Of the 6.9 million SNPs reported in the 1000 Genomes Project data,
89.4% were observed in the whole-genome sequencing data.
Long range phasing
Long range phasing of all chip-genotyped individuals was performed with methods described
previously (refs. 3,4). In brief, phasing is achieved using an iterative algorithm which phases a
single proband at a time given the available phasing information about everyone else that shares a
long haplotype identically by state with the proband. Given the large fraction of the Icelandic
population that has been chip-typed, accurate long range phasing is available genome-wide for all
chip-typed Icelanders.
Genotype imputation
We imputed the SNPs identified and genotyped through sequencing into all Icelanders who had been
phased with long range phasing using the same model as used by IMPUTE (ref. 5). The genotype data from
sequencing can be ambiguous due to low sequencing coverage. In order to phase the sequencing
genotypes, an iterative algorithm was applied for each SNP with alleles 0 and 1. We let H be the long
range phased haplotypes of the sequenced individuals and applied the following algorithm:
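1. For each haplotype $h$ in $H$, use the Hidden Markov Model of IMPUTE to calculate, for every
other $k$ in $H$, the likelihood, denoted $\gamma_{h,k}$, of $h$ having the same ancestral source as
$k$ at the SNP.
2. For every $h$ in $H$, initialize the parameter $\theta_h$, which specifies how likely the one
allele of the SNP is to occur on the background of $h$, from the genotype likelihoods obtained from
sequencing. The genotype likelihood $L_g$ is the probability of the observed sequencing data at
the SNP for a given individual assuming $g$ is the true genotype at the SNP. If $L_0$, $L_1$ and
$L_2$ are the likelihoods of the genotypes 0, 1 and 2 in the individual that carries $h$, then set
$$\theta_h = \frac{L_2 + \tfrac{1}{2} L_1}{L_2 + L_1 + L_0}.$$
3. For every pair of haplotypes $h$ and $k$ in $H$ that are carried by the same individual, use the
other haplotypes in $H$ to predict the genotype of the SNP on the backgrounds of $h$ and $k$:
$\tau_h = \sum_{l \in H \setminus \{h\}} \gamma_{h,l}\,\theta_l$ and
$\tau_k = \sum_{l \in H \setminus \{k\}} \gamma_{k,l}\,\theta_l$. Combining these predictions with
the genotype likelihoods from sequencing gives un-normalized updated phased genotype probabilities:
$P_{00} = (1-\tau_h)(1-\tau_k) L_0$, $P_{10} = \tau_h (1-\tau_k) \tfrac{1}{2} L_1$,
$P_{01} = (1-\tau_h) \tau_k \tfrac{1}{2} L_1$, and $P_{11} = \tau_h \tau_k L_2$. Now use these values
to update $\theta_h$ and $\theta_k$ to
$$\theta_h = \frac{P_{10} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}} \quad\text{and}\quad \theta_k = \frac{P_{01} + P_{11}}{P_{00} + P_{01} + P_{10} + P_{11}}.$$
4. Repeat step 3 while the maximum difference between iterations is greater than a
convergence threshold $\varepsilon$. We used $\varepsilon = 10^{-7}$.
Given the long range phased haplotypes and $\theta$, the allele of the SNP on a new haplotype $h$
not in $H$ is imputed as $\sum_{l \in H} \gamma_{h,l}\,\theta_l$.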
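The iteration in steps 2 to 4 can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the sharing weights `gamma` (the output of step 1) are taken as a precomputed matrix with a zero diagonal, haplotypes `2*i` and `2*i + 1` are assumed to belong to individual `i`, updates are applied in place one individual at a time, and the names `phase_snp`, `impute_new_haplotype` and `max_iter` are hypothetical. The update formulas follow the standard form of this algorithm, with theta initialised to (L2 + L1/2)/(L0 + L1 + L2).

```python
import numpy as np


def phase_snp(gamma, likelihoods, eps=1e-7, max_iter=1000):
    """Iteratively estimate theta_h, the probability of allele 1 on haplotype h.

    gamma:       (2n, 2n) sharing weights gamma[h, k] with a zero diagonal (assumed given).
    likelihoods: (n, 3) genotype likelihoods L0, L1, L2 per sequenced individual.
    """
    gamma = np.asarray(gamma, dtype=float)
    lik = np.asarray(likelihoods, dtype=float)
    n_ind = lik.shape[0]
    # Step 2: theta = (L2 + L1/2) / (L0 + L1 + L2), shared by both haplotypes.
    init = (lik[:, 2] + 0.5 * lik[:, 1]) / lik.sum(axis=1)
    theta = np.repeat(init, 2)
    for _ in range(max_iter):
        previous = theta.copy()
        for i in range(n_ind):
            h, k = 2 * i, 2 * i + 1
            l0, l1, l2 = lik[i]
            # Step 3: predictions from the other haplotypes (diagonal of gamma is zero).
            tau_h = gamma[h] @ theta
            tau_k = gamma[k] @ theta
            p00 = (1 - tau_h) * (1 - tau_k) * l0
            p10 = tau_h * (1 - tau_k) * 0.5 * l1
            p01 = (1 - tau_h) * tau_k * 0.5 * l1
            p11 = tau_h * tau_k * l2
            total = p00 + p10 + p01 + p11
            if total > 0:  # leave theta unchanged if all products vanish
                theta[h] = (p10 + p11) / total
                theta[k] = (p01 + p11) / total
        # Step 4: stop once the largest change falls to eps or below.
        if np.max(np.abs(theta - previous)) <= eps:
            break
    return theta


def impute_new_haplotype(gamma_new, theta):
    """Expected allele on a haplotype outside H, given its sharing weights with H."""
    return float(np.asarray(gamma_new, dtype=float) @ theta)
```

In practice `gamma` would come from the IMPUTE hidden Markov model; here it is simply an input, which keeps the sketch independent of how the sharing likelihoods are computed.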
The above algorithm can easily be extended to handle simple family structures such as
parent-offspring pairs and triads by letting the P distribution run over all founder haplotypes in
the family structure. The algorithm also extends trivially to the X chromosome. If source genotype
data are only ambiguous in phase, such as chip genotype data, then the algorithm is still applied,
but all but one of
the $L$s will be 0. In some instances, the reference set was intentionally enriched for carriers of
the minor allele of a rare SNP in order to improve imputation accuracy. In this case, expected allele
counts will be biased toward the minor allele of the SNP. Call the enrichment of the minor allele
$E$, let $\theta'$ be the expected minor allele count calculated from the naïve imputation method,
and let $\theta$ be the unbiased expected allele count; then
$$\theta' = \frac{E\theta}{1 - \theta + E\theta} \quad\text{and hence}\quad \theta = \frac{\theta'}{E + (1 - E)\theta'}.$$
This adjustment was applied to all imputations based on enriched imputation sets. We note that if
$\theta'$ is 0 or 1, then $\theta$ will also be 0 or 1, respectively.
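As a small worked check of the enrichment adjustment, the sketch below assumes the relation θ' = Eθ / (1 − θ + Eθ) between the naïve count θ' and the unbiased count θ, and inverts it. The function names are hypothetical.

```python
def enriched_count(theta: float, enrichment: float) -> float:
    """Naive expected minor allele count under enrichment E (assumed relation)."""
    return enrichment * theta / (1.0 - theta + enrichment * theta)


def unbiased_count(theta_naive: float, enrichment: float) -> float:
    """Invert the relation above to recover the unbiased expected allele count."""
    return theta_naive / (enrichment + (1.0 - enrichment) * theta_naive)


if __name__ == "__main__":
    naive = enriched_count(0.01, 5.0)
    print(naive, unbiased_count(naive, 5.0))
```

The endpoints are fixed points of the correction: a naïve count of 0 or 1 maps back to 0 or 1, as the text notes.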
Genotype imputation information
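The informativeness of genotype imputation was estimated by the ratio of the variance of imputed
expected allele counts and the variance of the actual allele counts:
$$\frac{\mathrm{Var}\left(E(\theta \mid \text{chip data})\right)}{\mathrm{Var}(\theta)},$$
where $\theta \in \{0, 1\}$ is the allele count. $\mathrm{Var}\left(E(\theta \mid \text{chip data})\right)$
was estimated by the observed variance of the imputed expected counts and $\mathrm{Var}(\theta)$ was
estimated by $p(1-p)$, where $p$ is the allele frequency.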
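A minimal sketch of this information measure, assuming per-haplotype expected allele counts in [0, 1] and using the mean of the imputed counts as the allele frequency when none is supplied (that default is an assumption of this example, as is the function name):

```python
import numpy as np


def imputation_info(expected_counts, allele_freq=None):
    """Ratio of the variance of imputed expected counts to p(1 - p)."""
    e = np.asarray(expected_counts, dtype=float)
    # Assumption: fall back to the mean imputed count as the frequency estimate.
    p = float(e.mean()) if allele_freq is None else float(allele_freq)
    # Population variance (ddof=0) of the imputed expected allele counts.
    return float(e.var()) / (p * (1.0 - p))
```

Counts imputed with certainty (all 0 or 1) give a ratio of 1, while counts that all equal the allele frequency give 0; monomorphic input (p of 0 or 1) is not handled here.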
In silico genotyping
In addition to imputing sequence variants from the whole genome sequencing effort into chip
genotyped individuals, we also performed a second imputation step where genotypes were imputed
into relatives of chip genotyped individuals, creating in silico genotypes. The inputs into the second
imputation step are the fully phased (in particular every allele has been assigned a parent of origin)
imputed and chip type genotypes of the available chip typed individuals. The algorithm used to
perform the second imputation step consists of:
1. For each ungenotyped individual (the proband), find all chip genotyped individuals within two
meioses of the individual. The six possible types of two-meiosis relatives of the proband are
(ignoring more complicated relationships due to pedigree loops): parents, full and half
siblings, grandparents, children and grandchildren. If all pedigree paths from the proband to
a genotyped relative go through other genotyped relatives, then that relative is excluded.
E.g. if a parent of the proband is genotyped, then the proband’s grandparents through that
parent are excluded. If the number of meioses in the pedigree around the proband exceeds a
threshold (we used 12), then relatives are removed from the pedigree until the number of
meioses falls below 12, in order to reduce computational complexity.
2. At every point in the genome, calculate the probability for each genotyped relative sharing
with the proband based on the autosomal SNPs used for phasing. A multipoint algorithm
based on the hidden Markov model Lander-Green multipoint linkage algorithm using fast
Fourier transforms is used to calculate these sharing probabilities (refs. 6,7). First, single point
sharing probabilities are calculated by dividing the genome into 0.5 cM bins and using the
haplotypes over these bins as alleles. Haplotypes that are the same, except at most at a single SNP,
are treated as identical. When the haplotypes in the pedigree are incompatible over a bin, a
uniform probability distribution is used for that bin. The most common causes for such
incompatibilities are recombinations within the pedigree, phasing errors and genotyping
errors. Note that since the input genotypes are fully phased, the single point information is
substantially more informative than for unphased genotypes; in particular, one haplotype of
the parent of a genotyped child is always known. The single point distributions are then
convolved using the multipoint algorithm to obtain multipoint sharing probabilities at the
center of each bin. Genetic distances were obtained from the most recent version of the
deCODE genetic map (ref. 4).
3. Based on the sharing probabilities at the center of each bin, all the SNPs from the whole
genome sequencing are imputed into the proband. To impute the genotype of the paternal
allele of a SNP located at $x$, flanked by bins with centers at $x_{left}$ and $x_{right}$, start
with the left bin and go through all possible sharing patterns $v$. Let $I_v$ be the set of
haplotypes of genotyped individuals that share identically by descent within the pedigree with the
proband’s paternal haplotype given the sharing pattern $v$, let $P(v)$ be the probability of $v$
at the left bin (this is the output from step 2 above), and let $e_i$ be the expected allele count
of the SNP for haplotype $i$. Then $e_v = \frac{\sum_{i \in I_v} e_i}{\sum_{i \in I_v} 1}$ is the
expected allele count of the paternal haplotype of the proband given $v$, and an overall estimate of
the allele count given the sharing distribution at the left bin is obtained from
$e_{left} = \sum_v P(v)\, e_v$. If $I_v$ is empty then no relative shares with the proband’s paternal
haplotype given $v$ and thus there is no information about the allele count. We therefore store the
probability that some genotyped relative shared the proband’s paternal haplotype,
$O_{left} = \sum_{v:\, I_v \neq \emptyset} P(v)$, and an expected allele count, conditional on the
proband’s paternal haplotype being shared by at least one genotyped relative:
$c_{left} = \frac{\sum_{v:\, I_v \neq \emptyset} P(v)\, e_v}{\sum_{v:\, I_v \neq \emptyset} P(v)}$.
In the same way calculate $O_{right}$ and $c_{right}$.
Linear interpolation is then used to get estimates at the SNP from the two flanking bins:
$$O = O_{left} + \frac{x - x_{left}}{x_{right} - x_{left}}\,(O_{right} - O_{left}),$$
$$c = c_{left} + \frac{x - x_{left}}{x_{right} - x_{left}}\,(c_{right} - c_{left}).$$
If $\theta$ is an estimate of the population frequency of the SNP then $Oc + (1 - O)\theta$ is an
estimate of the allele count for the proband’s paternal haplotype. Similarly, an expected
allele count can be obtained for the proband’s maternal haplotype.
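The last part of step 3 (summarising each flanking bin, interpolating to the SNP position, and falling back on the population frequency where no relative shares the haplotype) can be sketched as follows. This is an illustration under assumptions: each sharing pattern is represented as a pair of its probability and the list of expected allele counts of the haplotypes that share with the proband, and all function and argument names are hypothetical.

```python
def bin_summary(patterns):
    """Return (O, c) for one bin.

    patterns: list of (probability, [expected allele counts of sharing haplotypes]).
    O is the probability that at least one genotyped relative shares the haplotype;
    c is the expected allele count conditional on such sharing.
    """
    shared = [(p, sum(counts) / len(counts)) for p, counts in patterns if counts]
    o = sum(p for p, _ in shared)
    c = sum(p * e for p, e in shared) / o if o > 0 else 0.0
    return o, c


def interpolate(x, x_left, x_right, left, right):
    """Linear interpolation between the two flanking bin centers."""
    return left + (x - x_left) / (x_right - x_left) * (right - left)


def paternal_allele_count(x, x_left, x_right, left_patterns, right_patterns, pop_freq):
    """Expected allele count O*c + (1 - O)*theta for the paternal haplotype."""
    o_left, c_left = bin_summary(left_patterns)
    o_right, c_right = bin_summary(right_patterns)
    o = interpolate(x, x_left, x_right, o_left, o_right)
    c = interpolate(x, x_left, x_right, c_left, c_right)
    return o * c + (1.0 - o) * pop_freq
```

For example, with sharing probability 0.8 and conditional count 1 at the left bin, 0.4 and 0 at the right bin, a SNP midway between the centers and a population frequency of 0.1, the estimate is 0.6 × 0.5 + 0.4 × 0.1 = 0.34.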
Case control association testing
Logistic regression was used to test for association between SNPs and disease, treating disease status
as the response and expected genotype counts from imputation or allele counts from direct
genotyping as covariates. Testing was performed using the likelihood ratio statistic. The conditional
analysis of the chromosome 1 centromere and 19q13 loci was performed by adding the strongest
SNP at each locus as a covariate while testing every SNP in the region for association with gout. When
testing for association based on the in silico genotypes, controls were matched to cases based on the
informativeness of the imputed genotypes, such that for each case $r$ controls of matching
informativeness were chosen. Failing to match cases and controls will lead to a highly inflated
genomic control factor, and in some cases may lead to spurious false positive findings. The
informativeness of the imputation of each of an individual’s haplotypes was estimated by
taking the average of
$$a(e, \theta) = \begin{cases} \dfrac{e - \theta}{1 - \theta}, & e \ge \theta \\[1ex] \dfrac{\theta - e}{\theta}, & e < \theta \end{cases}$$
over all SNPs imputed for the individual, where $e$ is the expected allele count for the haplotype at
the SNP and $\theta$ is the population frequency of the SNP. Note that $a(\theta, \theta) = 0$ and
$a(0, \theta) = a(1, \theta) = 1$. The mean informativeness values cluster into groups corresponding
to the most common pedigree configurations used in the imputation, such as imputing from parent into
child or from child into parent. Based on this clustering of imputation informativeness we divided the
haplotypes of individuals into seven groups of varying informativeness, which created 27 groups of
individuals of similar imputation informativeness: 7 groups of individuals with both haplotypes having
similar informativeness, plus 21 groups of individuals with the two haplotypes having different
informativeness, minus the one group of individuals with neither haplotype being imputed well.
Within each group we calculate the ratio of the number of controls to the number of cases, and
choose the largest integer $r$ that is less than this ratio in all the groups. For example, if in one
group there are 10.3 times as many controls as cases and if in all other groups this ratio was greater,
then we would set $r = 10$ and within each group randomly select ten times as many controls as
there are cases. For gout we used .
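The two quantities used for matching can be illustrated with a short sketch. It assumes the piecewise informativeness score (e − θ)/(1 − θ) for e ≥ θ and (θ − e)/θ otherwise, with 0 < θ < 1, and reads "largest integer less than the ratio" literally as a strict inequality. The names `informativeness`, `matching_ratio` and the group labels are hypothetical.

```python
import math


def informativeness(e: float, theta: float) -> float:
    """Per-SNP informativeness of an imputed haplotype, for 0 < theta < 1."""
    if e >= theta:
        return (e - theta) / (1.0 - theta)
    return (theta - e) / theta


def matching_ratio(group_counts) -> int:
    """Largest integer strictly below the controls-to-cases ratio in every group.

    group_counts: mapping of group label -> (number of cases, number of controls).
    """
    smallest = min(controls / cases
                   for cases, controls in group_counts.values() if cases > 0)
    return math.ceil(smallest) - 1


if __name__ == "__main__":
    # One group has 10.3 times as many controls as cases, the other 20 times.
    print(matching_ratio({"parent-to-child": (10, 103), "child-to-parent": (5, 100)}))
```

With the counts in the usage example, the smallest ratio is 10.3 and the function returns 10, matching the worked example in the text.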
Quantitative trait association testing
A generalized form of linear regression was used to test for association of uric acid (UA) with SNPs.
Let $y$ be the vector of quantitative measurements, and let $g$ be the vector of expected allele
counts for the SNP being tested. We assume the quantitative measurements follow a normal
distribution with a mean that depends linearly on the expected allele count at the SNP and a variance
covariance matrix proportional to the kinship matrix:
$$y \sim N(\alpha + \beta g,\ 2\sigma^2 \Phi),$$
where the matrix $\Phi$ is based on the kinship between individuals as estimated from the Icelandic
genealogical database ($k_{ij}$) and an estimate of the heritability of the trait ($h^2$). It is not
computationally feasible to use this full model and we therefore split the individuals with in silico
genotypes and UA measurements into smaller clusters. Here we chose to restrict the cluster size to
at most 300 individuals.
The maximum likelihood estimates for the parameters $\alpha$, $\beta$, and $\sigma^2$ involve
inverting the kinship matrix. If there are $n$ individuals in the cluster, then this inversion
requires $O(n^3)$ calculations, but since these calculations only need to be performed once, the
computational cost of doing a GWAS will only be $O(n^2)$ calculations per SNP; this is the cost of
calculating the maximum likelihood estimates if the kinship matrix has already been inverted.
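The computational point (invert the kinship-based matrix once per cluster, then reuse it for every SNP) can be sketched with generalized least squares. This is a minimal illustration, assuming the covariance is a scalar multiple of a known matrix Φ whose inverse `phi_inv` has been precomputed; the function name is hypothetical, and the returned `scale` is the maximum likelihood estimate of the constant multiplying Φ rather than any particular parametrisation of σ².

```python
import numpy as np


def fit_cluster(y, g, phi_inv):
    """ML estimates of (alpha, beta, scale) for y ~ N(alpha + beta * g, scale * Phi).

    phi_inv is the precomputed inverse of Phi for the cluster, so each SNP
    costs only matrix-vector work instead of a fresh inversion.
    """
    y = np.asarray(y, dtype=float)
    g = np.asarray(g, dtype=float)
    design = np.column_stack([np.ones_like(g), g])  # intercept and allele count
    weighted = design.T @ phi_inv                   # X^T Phi^{-1}
    coef = np.linalg.solve(weighted @ design, weighted @ y)
    resid = y - design @ coef
    scale = float(resid @ phi_inv @ resid) / len(y)
    return float(coef[0]), float(coef[1]), scale
```

In a scan, `np.linalg.inv(phi)` would be called once per cluster and `fit_cluster` once per SNP, which is the source of the saving described above.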
Effective sample size estimation
In order to estimate the effective sample size of the case control and quantitative trait association
analyses, we compared the variances of the logistic and generalized linear regression parameter
estimates based on the in silico genotypes to their one step imputation counterparts. For the
quantitative trait association analysis, assume that a single step imputation (SNPs are imputed, but in
silico genotypes are not used) association analysis with $n_1$ subjects leads on average to an
estimate of the regression parameter with variance $\sigma_1^2$ and that the corresponding in silico
genotype association analysis leads to an estimate of the regression parameter with variance
$\sigma_2^2$; then, assuming that variance goes down linearly with sample size, we estimate the
effective sample size in the in silico genotype association analysis as
$n_2 = n_1 \sigma_1^2 / \sigma_2^2$. For the case control association analysis,
the number of controls is much greater than the number of cases and we use the same formula to
estimate the effective number of cases, with the $n$-s representing the number of cases and the
$\sigma^2$-s representing the variances of the logistic regression coefficient.
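As a one-line worked example of this estimate, assuming the variance of the regression estimate is inversely proportional to sample size (function and argument names are hypothetical, and the numbers are made up for illustration):

```python
def effective_sample_size(n_single_step: float,
                          var_single_step: float,
                          var_in_silico: float) -> float:
    """Effective n of the in silico analysis from the ratio of estimator variances."""
    return n_single_step * var_single_step / var_in_silico


if __name__ == "__main__":
    # Halving the variance relative to a 1,000-subject analysis doubles the effective n.
    print(effective_sample_size(1000, 0.04, 0.02))
```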
References
1. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760 (2009).
2. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079 (2009).
3. Kong, A. et al. Detection of sharing by descent, long-range phasing and haplotype imputation. Nat. Genet. 40, 1068-1075 (2008).
4. Kong, A. et al. Fine-scale recombination rate differences between sexes, populations and individuals. Nature 467, 1099-1103 (2010).
5. Marchini, J., Howie, B., Myers, S., McVean, G. & Donnelly, P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat. Genet. 39, 906-913 (2007).
6. Lander, E.S. & Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84, 2363-2367 (1987).
7. Kruglyak, L. & Lander, E.S. Faster multipoint linkage analysis using Fourier transforms. J. Comput. Biol. 5, 1-7 (1998).
Supplementary Table 1 – Association with gout of genome-wide significant markers at the 19q13 locus (top marker is bolded)
Columns: P-value; Allelic odds ratio; Allelic frequency in cases; Allelic frequency in controls; Info; Coding allele; Other allele; Marker; Chr 19 Build 36 position; Gene (if aa change); Amino acid change
1.7×10^-8 2.815 0.0251 0.0116 0.89 T G chr19:54326383 54,326,383 - -
4.6×10^-8 2.509 0.0288 0.0144 0.87 T C chr19:54400812 54,400,812 - -
1.7×10^-10 3.101 0.0273 0.0116 0.88 C A chr19:54505919 54,505,919 - -
1.5×10^-16 3.122 0.0442 0.0183 0.89 G C chr19:54660818 54,660,818 ALDH16A1 P476R
2.7×10^-8 1.660 0.0946 0.0658 0.92 G C rs1064257 54,685,347 - -
2.2×10^-8 2.113 0.0506 0.0314 0.76 C G rs62128084 54,706,613 - -
3.2×10^-11 3.080 0.0293 0.0124 0.87 C G chr19:54788061 54,788,061 - -
1.3×10^-11 3.205 0.0288 0.0119 0.87 G A chr19:54812310 54,812,310 - -
4.7×10^-11 2.203 0.0566 0.0317 0.90 C T rs62128132 54,909,767 - -
1.8×10^-9 2.941 0.0259 0.0114 0.88 G A chr19:54991872 54,991,872 - -
2.0×10^-9 2.932 0.0259 0.0113 0.88 A G chr19:55018776 55,018,776 - -
1.8×10^-9 2.939 0.0259 0.0113 0.89 T C chr19:55068782 55,068,782 - -
1.9×10^-9 2.938 0.0259 0.0113 0.89 G T chr19:55071043 55,071,043 - -
1.9×10^-9 2.938 0.0259 0.0113 0.89 C G chr19:55071103 55,071,103 - -
1.1×10^-9 3.141 0.0218 0.0078 0.99 C T chr19:55268031 55,268,031 - -
1.1×10^-9 3.136 0.0219 0.0079 0.98 C G chr19:55270330 55,270,330 - -
5.2×10^-10 3.162 0.0225 0.0081 0.96 C T chr19:55483086 55,483,086 - -
1.8×10^-10 3.240 0.0228 0.008 0.98 T C chr19:55576372 55,576,372 - -
1.9×10^-8 2.611 0.027 0.0125 0.94 G A chr19:55602702 55,602,702 - -
The tests for association are based on an effective sample size of 968 gout cases and over 40,000 controls.
Supplementary Table 2 – Association with gout of genome-wide significant markers at the ABCG2 locus (top marker is bolded)
Columns: P-value; Allelic odds ratio; Allelic frequency in cases; Allelic frequency in controls; Info; Coding allele; Other allele; Marker; Chr 4 Build 36 position; Gene (if aa change); Amino acid change
1.2×10^-9 1.35 0.479 0.419 0.97 T C rs2725261 89,255,377 - -
5.3×10^-12 1.65 0.147 0.102 0.99 G A rs1481012 89,258,106 - -
2.9×10^-12 1.66 0.145 0.100 1.00 C G rs45499402 89,262,658 - -
3.0×10^-12 1.66 0.145 0.100 1.00 C G chr4:89263204 89,263,204 - -
3.0×10^-12 1.66 0.145 0.100 1.00 T A chr4:89263336 89,263,336 - -
3.0×10^-12 1.66 0.145 0.100 1.00 A G rs75544042 89,264,355 - -
3.0×10^-12 1.66 0.145 0.100 1.00 T C chr4:89265226 89,265,226 - -
2.8×10^-12 1.66 0.145 0.100 1.00 A C rs74904971 89,269,050 - -
2.8×10^-12 1.66 0.145 0.100 1.00 T G rs2231142 89,271,347 ABCG2 Q141K
2.8×10^-12 1.66 0.145 0.100 1.00 G A rs4148155 89,273,691 - -
2.3×10^-8 1.31 0.518 0.462 1.00 A C rs2622620 89,282,875 - -
1.9×10^-9 1.34 0.562 0.502 0.99 A C rs2622627 89,284,377 - -
2.0×10^-9 1.34 0.561 0.501 0.99 C A rs2725249 89,284,892 - -
1.6×10^-9 1.35 0.565 0.505 0.99 A C rs2622626 89,285,739 - -
4.6×10^-8 1.36 0.752 0.703 0.98 A C rs2725248 89,287,031 - -
4.5×10^-10 1.36 0.513 0.451 0.99 A G rs2725247 89,287,281 - -
2.8×10^-10 1.36 0.511 0.448 0.99 T G rs17731799 89,287,479 - -
4.7×10^-10 1.36 0.513 0.451 0.99 A G rs2725246 89,287,522 - -
2.7×10^-8 1.37 0.751 0.701 0.98 C T rs2622625 89,287,761 - -
4.7×10^-10 1.36 0.513 0.451 0.99 A G rs2725245 89,287,762 - -
1.8×10^-9 1.35 0.564 0.504 0.99 C T rs2725244 89,287,785 - -
4.7×10^-10 1.36 0.512 0.449 0.99 C T rs2622624 89,288,430 - -
1.9×10^-9 1.34 0.564 0.504 0.99 A T rs2725242 89,288,551 - -
2.6×10^-9 1.34 0.565 0.506 0.99 C T chr4:89293003 89,293,003 - -
4.0×10^-9 1.34 0.559 0.500 0.99 T C rs13109944 89,293,429 - -
7.1×10^-10 1.35 0.508 0.447 0.99 G A rs28856119 89,293,627 - -
4.4×10^-10 1.36 0.493 0.432 0.98 A G chr4:89293691 89,293,691 - -
1.7×10^-9 1.35 0.491 0.431 0.97 T C chr4:89293698 89,293,698 - -
2.1×10^-9 1.35 0.490 0.431 0.97 T C chr4:89293703 89,293,703 - -
2.3×10^-9 1.34 0.490 0.431 0.97 G T chr4:89293711 89,293,711 - -
2.1×10^-9 1.35 0.495 0.436 0.98 C T chr4:89293717 89,293,717 - -
2.1×10^-9 1.34 0.498 0.438 0.98 C G chr4:89293718 89,293,718 - -
7.1×10^-10 1.35 0.508 0.447 0.99 G C rs34633905 89,293,795 - -
1.2×10^-9 1.35 0.505 0.444 1.00 A C rs2725239 89,294,647 - -
1.3×10^-9 1.35 0.505 0.444 1.00 G C rs2622603 89,296,505 - -
1.1×10^-9 1.35 0.504 0.443 0.99 T C rs2622605 89,298,410 - -
1.5×10^-9 1.34 0.504 0.444 1.00 C T rs2622605 89,298,410 - -
1.3×10^-9 1.35 0.504 0.444 1.00 C T rs3114020 89,302,690 - -
7.5×10^-10 1.35 0.504 0.442 1.00 A G rs2725226 89,304,355 - -
7.6×10^-10 1.35 0.506 0.444 0.99 T A rs2622608 89,305,768 - -
6.9×10^-10 1.35 0.505 0.443 0.99 C A rs2622609 89,307,499 - -
The tests for association are based on an effective sample size of 968 gout cases and over 40,000 controls.
Supplementary Table 3 – Novel and previously reported sequence variants associating with serum uric acid levels and gout
Columns: SNP; Chr; Pos; Allele (effect/other); Freq; Info; Uric acid effect (95% CI)a; Uric acid P; Gout OR (95% CI); Gout P
Novel SNP associations
chr1_142697422 1 142,697,422 C/T 0.986 0.55 0.48 (0.36, 0.60) 4.5×10^-16 1.92 (1.01, 3.63) 0.046
chr1_144539240 1 144,539,240 A/G 0.987 0.65 0.41 (0.30, 0.52) 2.5×10^-13 2.06 (1.11, 3.82) 0.023
c.1580C>G 19 54,660,818 G/C 0.019 0.89 0.36 (0.29, 0.44) 4.5×10^-21 3.12 (2.38, 4.17) 1.5×10^-16
Replication of previously reported SNP associations
rs1967017 1 144,435,002 T/C 0.449 0.93 0.03 (0.01, 0.05) 0.0016 1.09 (0.98, 1.19) 0.12
rs12129861 1 144,437,046 G/A 0.500 0.90 0.04 (0.01, 0.06) 0.0012 1.04 (0.94, 1.15) 0.46
rs780094 2 27,594,741 T/C 0.340 1.00 0.04 (0.02, 0.06) 0.00071 1.19 (1.08, 1.32) 0.00092
rs780093 2 27,596,107 T/C 0.342 1.00 0.04 (0.02, 0.06) 0.00082 1.18 (1.06, 1.30) 0.0012
rs734553 4 9,532,102 T/G 0.790 1.00 0.24 (0.22, 0.27) 1.0×10^-80 1.39 (1.23, 1.59) 2.4×10^-7
rs13129697 4 9,536,065 T/G 0.767 1.00 0.23 (0.21, 0.26) 1.6×10^-79 1.32 (1.18, 1.49) 5.1×10^-6
rs2199936 4 89,264,355 A/G 0.101 1.00 0.16 (0.12, 0.19) 1.9×10^-20 1.66 (1.44, 1.91) 3.0×10^-12
rs2231142 4 89,271,347 T/G 0.101 1.00 0.16 (0.12, 0.19) 2.3×10^-20 1.67 (1.43, 1.92) 2.8×10^-12
rs675209 6 7,047,083 T/C 0.260 1.00 0.04 (0.01, 0.06) 0.0022 1.03 (0.92, 1.15) 0.62
rs742132 6 25,715,550 A/G 0.716 1.00 0.01 (-0.01, 0.04) 0.25 1.09 (0.98, 1.21) 0.12
rs1165196 6 25,921,129 A/G 0.492 1.00 0.05 (0.03, 0.07) 2.8×10^-6 1.10 (1.00, 1.21) 0.059
rs1183201 6 25,931,423 T/A 0.479 1.00 0.05 (0.03, 0.07) 6.4×10^-6 1.06 (0.96, 1.18) 0.22
rs12356193 10 61,083,359 A/G 0.849 1.00 0.04 (0.01, 0.07) 0.0062 1.05 (0.92, 1.21) 0.46
rs17300741 11 64,088,038 A/G 0.499 1.00 0.04 (0.02, 0.06) 1.9×10^-5 1.06 (0.96, 1.16) 0.27
rs2078267 11 64,090,690 C/T 0.502 1.00 0.05 (0.03, 0.07) 1.1×10^-5 1.07 (0.97, 1.17) 0.18
rs505802 11 64,113,648 C/T 0.280 1.00 0.03 (0.01, 0.05) 0.0076 1.03 (0.93, 1.15) 0.57
rs1106766 12 56,095,723 C/T 0.689 0.98 0.04 (0.02, 0.06) 0.00084 1.06 (0.95, 1.17) 0.30
The tests for association are based on an effective sample size of 15,506 individuals with uric acid measurements and 968 gout cases and over 40,000 controls. Previously reported SNPs are from the publications of Yang et al. and Kolz et al.
a Effects on uric acid levels are in standard deviations.
Supplementary Table 4 – Sex-stratified uric acid levels association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Columns: SNP; Chr; Pos; Allele (effect/other); Freq; Info; Male effect (95% CI); Male P; Female effect (95% CI); Female P; Pdiff
chr1_142697422 1 142697422 C/T 0.986 0.55 0.58 (0.40, 0.76) 2.4×10^-10 0.41 (0.26, 0.55) 2.7×10^-8 0.14
rs1967017 1 144435002 T/C 0.449 0.93 0.04 (0.01, 0.07) 0.0086 0.03 (0.00, 0.05) 0.043 0.47
rs12129861 1 144437046 G/A 0.5 0.9 0.04 (0.01, 0.08) 0.0064 0.03 (0.00, 0.06) 0.035 0.46
rs780094 2 27594741 T/C 0.34 1 0.03 (0.00, 0.07) 0.041 0.04 (0.01, 0.07) 0.0036 0.78
rs780093 2 27596107 T/C 0.342 1 0.03 (0.00, 0.06) 0.053 0.04 (0.01, 0.07) 0.0033 0.70
rs734553 4 9532102 T/G 0.79 1 0.19 (0.15, 0.23) 5.1×10^-23 0.28 (0.25, 0.31) 2.1×10^-70 0.00031
rs13129697 4 9536065 T/G 0.767 1 0.18 (0.15, 0.22) 1.2×10^-22 0.27 (0.24, 0.30) 1.3×10^-69 0.00026
rs2199936 4 89264355 A/G 0.101 1 0.20 (0.15, 0.25) 7.0×10^-15 0.13 (0.09, 0.17) 3.3×10^-9 0.036
rs2231142 4 89271347 T/G 0.101 1 0.20 (0.15, 0.25) 6.5×10^-15 0.13 (0.08, 0.17) 4.1×10^-9 0.034
rs675209 6 7047083 T/C 0.26 1 0.06 (0.02, 0.09) 0.001 0.02 (-0.01, 0.05) 0.25 0.074
rs742132 6 25715550 A/G 0.716 1 0.01 (-0.03, 0.04) 0.66 0.02 (-0.01, 0.05) 0.19 0.62
rs1165196 6 25921129 A/G 0.492 1 0.05 (0.01, 0.08) 0.0038 0.05 (0.02, 0.08) 0.00013 0.82
rs1183201 6 25931423 T/A 0.479 1 0.04 (0.01, 0.07) 0.012 0.05 (0.03, 0.08) 0.0001 0.59
rs12356193 10 61083359 A/G 0.849 1 0.05 (0.01, 0.10) 0.018 0.03 (0.00, 0.07) 0.069 0.50
rs17300741 11 64088038 A/G 0.499 1 0.05 (0.01, 0.08) 0.0037 0.04 (0.02, 0.07) 0.0007 0.95
rs2078267 11 64090690 C/T 0.502 1 0.05 (0.02, 0.08) 0.003 0.05 (0.02, 0.07) 0.00048 0.96
rs505802 11 64113648 C/T 0.28 1 0.03 (0.00, 0.07) 0.062 0.03 (0.00, 0.06) 0.048 0.87
rs1106766 12 56095723 C/T 0.689 0.98 0.04 (0.01, 0.08) 0.0097 0.03 (0.00, 0.06) 0.023 0.59
c.1580C>G 19 54660818 G/C 0.019 0.89 0.35 (0.24, 0.46) 2.9×10^-10 0.38 (0.28, 0.48) 4.9×10^-14 0.71
Supplementary Table 5 – Sex-stratified gout association for novel and previously reported sequence variants associating with gout and serum uric acid levels
Columns: SNP; Chr; Pos; Allele (effect/other); Freq; Info; Male OR (95% CI); Male P; Female OR (95% CI); Female P; Pdiff
chr1_142697422 1 142697422 C/T 0.986 0.55 5.62 (2.01, 15.70) 0.00098 0.88 (0.40, 1.94) 0.75 0.0049
rs1967017 1 144435002 T/C 0.449 0.93 1.09 (0.95, 1.23) 0.21 1.08 (0.92, 1.25) 0.37 0.92
rs12129861 1 144437046 G/A 0.5 0.9 1.06 (0.93, 1.22) 0.32 0.99 (0.85, 1.15) 0.89 0.45
rs780094 2 27594741 T/C 0.34 1 1.19 (1.05, 1.35) 0.006 1.16 (1.00, 1.35) 0.057 0.82
rs780093 2 27596107 T/C 0.342 1 1.19 (1.04, 1.35) 0.0084 1.16 (1.00, 1.35) 0.058 0.82
rs734553 4 9532102 T/G 0.79 1 1.33 (1.14, 1.56) 0.00034 1.47 (1.22, 1.79) 7.6×10^-5 0.44
rs13129697 4 9536065 T/G 0.767 1 1.30 (1.11, 1.52) 0.00075 1.35 (1.14, 1.61) 0.001 0.74
rs2199936 4 89264355 A/G 0.101 1 1.79 (1.49, 2.13) 1.8×10^-10 1.44 (1.16, 1.80) 0.0011 0.13
rs2231142 4 89271347 T/G 0.101 1 1.79 (1.49, 2.13) 1.7×10^-10 1.45 (1.15, 1.79) 0.0011 0.15
rs675209 6 7047083 T/C 0.26 1 0.95 (0.82, 1.10) 0.47 1.15 (0.98, 1.35) 0.088 0.076
rs742132 6 25715550 A/G 0.716 1 1.07 (0.94, 1.23) 0.3 1.09 (0.93, 1.29) 0.3 0.86
rs1165196 6 25921129 A/G 0.492 1 1.08 (0.96, 1.22) 0.22 1.12 (0.97, 1.30) 0.13 0.71
rs1183201 6 25931423 T/A 0.479 1 1.03 (0.92, 1.18) 0.58 1.10 (0.95, 1.28) 0.19 0.48
rs12356193 10 61083359 A/G 0.849 1 1.13 (0.95, 1.35) 0.17 0.97 (0.79, 1.19) 0.76 0.25
rs17300741 11 64088038 A/G 0.499 1 1.04 (0.92, 1.18) 0.51 1.09 (0.94, 1.26) 0.24 0.62
rs2078267 11 64090690 C/T 0.502 1 1.05 (0.93, 1.19) 0.45 1.11 (0.96, 1.28) 0.16 0.57
rs505802 11 64113648 C/T 0.28 1 1.02 (0.89, 1.17) 0.75 1.04 (0.88, 1.22) 0.64 0.85
rs1106766 12 56095723 C/T 0.689 0.98 1.07 (0.93, 1.22) 0.33 1.04 (0.89, 1.22) 0.61 0.78
c.1580C>G 19 54660818 G/C 0.019 0.89 3.85 (2.86, 5.56) 1.7×10^-16 2.04 (1.28, 3.23) 0.0024 0.027
Supplementary Table 6 – Association of uric acid sequence variants with age at visit to the clinician
Columns: SNP; Chr; Pos; Allele (effect/other); Effect (95% CI); P
chr1_142697422 1 142697422 C/T -14.94 (-30.00, 0.11) 0.052
rs1967017 1 144435002 T/C -0.87 (-2.71, 0.97) 0.35
rs12129861 1 144437046 G/A -0.80 (-2.66, 1.07) 0.40
rs780094 2 27594741 T/C -0.23 (-1.98, 1.52) 0.80
rs780093 2 27596107 T/C -0.23 (-1.98, 1.53) 0.80
rs734553 4 9532102 T/G 0.24 (-2.15, 2.64) 0.84
rs13129697 4 9536065 T/G -0.35 (-2.60, 1.90) 0.76
rs2199936 4 89264355 A/G -3.41 (-5.85, -0.97) 0.0061
rs2231142 4 89271347 T/G -3.41 (-5.85, -0.97) 0.0062
rs675209 6 7047083 T/C 1.10 (-0.95, 3.14) 0.29
rs742132 6 25715550 A/G 0.83 (-1.14, 2.81) 0.41
rs1165196 6 25921129 A/G -0.77 (-2.51, 0.97) 0.38
rs1183201 6 25931423 T/A -0.72 (-2.48, 1.04) 0.42
rs12356193 10 61083359 A/G 0.56 (-1.97, 3.09) 0.67
rs17300741 11 64088038 A/G 0.08 (-1.62, 1.78) 0.93
rs2078267 11 64090690 C/T 0.20 (-1.50, 1.90) 0.82
rs505802 11 64113648 C/T -0.25 (-2.11, 1.60) 0.79
rs1106766 12 56095723 C/T 0.55 (-1.42, 2.51) 0.58
c.1580C>G 19 54660818 G/C -7.62 (-12.36, -2.89) 0.0016
Effect is expressed in years.
Supplementary Figure 1
The sequencing depth of the 457 whole-genome sequenced individuals.
Supplementary Figure 2
Quantile-quantile plot of the 15,957,390 SNPs in the genome-wide association scan for gout. The blue ‘x’s represent the P values scaled down by the genomic control inflation factor of 1.10. The diagonal red line represents where the dots are expected to fall under the null hypothesis of no association. The horizontal green line represents P = 5 × 10^-8.
Supplementary Figure 3
Quantile-quantile plot of the 15,957,390 SNPs in the genome-wide association scan for uric acid levels. The blue ‘x’s represent the P values scaled down by the genomic control inflation factor of 1.18. The diagonal red line represents where the dots are expected to fall under the null hypothesis of no association. The horizontal green line represents P = 5 × 10^-8.
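The genomic control scaling mentioned in the two figure legends can be illustrated as follows. This is a generic sketch, not the authors' code: it assumes each P value corresponds to a chi-square statistic with one degree of freedom, estimates the inflation factor as the median statistic divided by the null median, and divides every statistic by that factor before converting back to a P value. All names are hypothetical.

```python
import math
from statistics import NormalDist, median

# Median of the chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def p_to_chi2(p: float) -> float:
    """1-df chi-square statistic for a two-sided P value (0 < p <= 1)."""
    z = NormalDist().inv_cdf(p / 2.0)  # lower-tail quantile, stable for tiny p
    return z * z


def chi2_to_p(stat: float) -> float:
    """Upper-tail probability of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def inflation_factor(pvalues) -> float:
    """Genomic control lambda: median observed statistic over the null median."""
    return median(p_to_chi2(p) for p in pvalues) / CHI2_1DF_MEDIAN


def genomic_control(pvalues, lam: float):
    """P values after dividing each chi-square statistic by lambda."""
    return [chi2_to_p(p_to_chi2(p) / lam) for p in pvalues]


if __name__ == "__main__":
    # With lambda = 1.10, a P value at the 5e-8 threshold becomes less significant.
    print(genomic_control([5e-8], 1.10)[0])
```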