

Prevention and Epidemiology

Identification of Novel Breast Cancer Risk Loci

Claire Hian Tzer Chan1, Prabhakaran Munusamy1, Sau Yeen Loke1, Geok Ling Koh1, Edward Sern Yuen Wong1, Hai Yang Law2, Chui Sheun Yoon2, Min-Han Tan3,4,5, Yoon Sim Yap3, Peter Ang3,6, and Ann Siew Gek Lee1,7,8

Abstract

It has been estimated that >1,000 genetic loci have yet to be identified for breast cancer risk. Here we report the first study utilizing targeted next-generation sequencing to identify single-nucleotide polymorphisms (SNP) associated with breast cancer risk. Targeted sequencing of 283 genes was performed in 240 women with early-onset breast cancer (≤40 years) or a family history of breast and/or ovarian cancer. Common coding variants with minor allele frequencies (MAF) >1% that were identified were presumed initially to be SNPs, but further database inspections revealed variants had MAF of ≤1% in the general population. Through prioritization and stringent selection criteria, we selected 24 SNPs for further genotyping in 1,516 breast cancer cases and 1,189 noncancer controls. Overall, we identified the JAK2 SNP rs56118985 to be significantly associated with overall breast cancer risk. Subtype analysis performed for patient subgroups defined by ER, PR, and HER2 status suggested additional associations of the NOTCH3 SNP rs200504060 and the HIF1A SNP rs142179458 with breast cancer risk. In silico analysis indicated that coding amino acids encoded at these three SNP sites were conserved evolutionarily and associated with decreased protein stability, suggesting a likely impact on protein function. Our results offer proof of concept for identifying novel cancer risk loci from next-generation sequencing data, with iterative data analysis from targeted, whole-exome, or whole-genome sequencing a wellspring to identify new SNPs associated with cancer risk. Cancer Res; 77(19); 5428–37. ©2017 AACR.

Introduction

Large-scale genome-wide association studies utilizing high-density genotyping microarrays have identified approximately 100 common variants associated with breast cancer risk (1–4), with >1000 additional loci yet to be identified (4). These common genetic variants have high minor allele frequencies (MAF) >1% and are associated with elevated breast cancer risk with ORs that are typically below 1.5 as compared with the general population (5). Most of these variants are located in the intronic or intergenic regions, with a small proportion (~4%) within the coding regions (6, 7).

Whole-exome sequencing and targeted gene sequencing have generated an enormous amount of sequence data for coding regions of genes. Typically, variants that are detected with MAFs of >1% within the cases are filtered out at an early stage of data analysis, as these variants are assumed to be common polymorphisms. We hypothesized that these common coding variants in breast cancer patients could be associated with breast cancer risk.

In this proof-of-concept study, targeted next-generation sequencing of 283 cancer-associated genes (Supplementary Table S1) was performed for 240 women with a family history of breast and/or ovarian cancer or early-onset breast cancer. We identified coding variants with MAF > 1% among these women but with MAF ≤ 1% within the general population (ascertained from the 1000 Genomes Project and ExAC databases), and 24 coding variants were selected for high-throughput genotyping in an additional cohort of 1,516 cases and 1,189 controls.

Patients and Methods

Study population

The discovery phase of the study utilized DNA samples obtained from 240 women with a family history of breast cancer and/or ovarian cancer or early-onset breast cancer (collectively designated as FH), who were referred to the National Cancer Centre Singapore (NCCS) for genetic risk assessment. They were invited to participate in the study if they had a family history of breast and/or ovarian cancer in first- and/or second-degree relatives; had both breast and ovarian cancer or bilateral breast cancer; or if they had early-onset breast or ovarian cancer at ≤40 years of age (Table 1; ref. 8). Of the 240 subjects, 12 did not have a personal cancer history of breast and/or ovarian cancer. Peripheral blood samples were taken and DNA was extracted using an optimized in-house method (9).

The validation phase was performed using 1,516 DNA samples from women of Chinese ancestry with breast cancer (Table 1).

1Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre, Singapore. 2DNA Diagnostic and Research Laboratory, KK Women's and Children's Hospital, Singapore. 3Department of Medical Oncology, National Cancer Centre, Singapore. 4Institute of Bioengineering and Nanotechnology, Singapore. 5Lucence Diagnostics Pte Ltd, Singapore. 6Oncocare Cancer Centre, Gleneagles Medical Centre, Singapore. 7Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore. 8Office of Clinical & Academic Faculty Affairs, Duke-NUS Graduate Medical School, Singapore.

Note: Supplementary data for this article are available at Cancer Research Online (http://cancerres.aacrjournals.org/).

C.H.T. Chan and P. Munusamy contributed equally to this article.

Corresponding Author: Ann Siew Gek Lee, Division of Medical Sciences, Humphrey Oei Institute of Cancer Research, National Cancer Centre Singapore, 11 Hospital Drive, Singapore 169610. Phone: 65-6436-8313; Fax: 65-6372-0161; E-mail: [email protected]

doi: 10.1158/0008-5472.CAN-17-0992

©2017 American Association for Cancer Research.

Cancer Research

Cancer Res; 77(19) October 1, 2017 5428

on May 24, 2020. © 2017 American Association for Cancer Research. cancerres.aacrjournals.org Downloaded from

Published OnlineFirst August 3, 2017; DOI: 10.1158/0008-5472.CAN-17-0992


These samples were obtained from patients recruited at outpatient clinics at NCCS and Singapore General Hospital or were archival frozen peripheral blood samples from the SingHealth Tissue Repository (STR). DNA was extracted using the same in-house method used in the discovery phase. A control group of 1,189 healthy women of Chinese ancestry was also included. These controls were archival DNA samples obtained from the DNA Diagnostic and Research Lab, KK Women's and Children's Hospital, Singapore.

The study was approved by the SingHealth Centralized Institutional Review Board (CIRB Ref: 2008/478/B), and written informed consent was taken from each participant. Patient studies were conducted in accordance with the ethical guidelines of the Declaration of Helsinki.

Targeted sequencing and data analysis

The 240 DNA samples for the discovery phase were sequenced using a multi-gene target panel consisting of 283 genes (Supplementary Table S1). Target exome enrichment was performed using the Agilent SureSelect kit, and the enriched libraries were subsequently sequenced on the Illumina HiSeq 2000 or HiSeq 4000 platforms. Using the Burrows–Wheeler alignment tool (BWA v0.7.5; ref. 10), the raw sequence reads were mapped against the human reference genome (hg19) sequence. The aligned reads were sorted, reordered, and processed for PCR duplicates using Picard tool v1.74 (http://broadinstitute.github.io/picard/). Indel realignment was performed on the targets using the GATK IndelRealigner module (11). Variant (SNPs and indels) detection in the target regions was carried out using the GATK HaplotypeCaller algorithm v3.4-46 (11). The functional annotation of the variants was performed using the ANNOVAR pipeline (12), and tools such as SIFT (13), PolyPhen-2 (14), MutationTaster (15), Mutation Assessor (16), and CADD (17) were used to predict the impact of amino acid change on the protein.

Variant filtering and SNP selection

Variants in the exonic regions and in the ±50 bp intronic regions flanking the exons were included in the analysis. In the next step of filtering, variants were selected on the basis of minor allele frequency (MAF ≤1%) obtained from the 1000 Genomes Project (18) and ExAC databases (19) for all ethnicities (Fig. 1). Subjects with mutations in 25 known breast cancer predisposition genes (ATM, BARD1, BMPR1A, BRCA1, BRCA2, BRIP1, CDH1, CDKN2A, CHEK2, FANCC, MLH1, MSH2, MSH6, NBN, NF1, PALB2, PMS2, PTEN, RAD51C, RAD51D, SMAD4, STK11, TP53, VHL, and XRCC2) associated with increased breast cancer risk were excluded. After exclusion, 170 subjects remained in our study cohort. From the filtered list of variants, the corresponding list of genes was compared against the list of highly mutated genes carrying somatic mutations in the publicly available TCGA breast and ovarian cancer datasets (http://cancergenome.nih.gov/); genes that were common to both were selected, and their variants were chosen for further analysis in this study. Variants with a PhyloP (20) conservation score of zero and above were retained, as were variants with a CADD score greater than or equal to 10. Finally, a set of 24 SNPs was chosen for SNP genotyping, prioritized by the frequency of occurrence in the samples (Fig. 1).
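The numeric thresholds in this filtering cascade can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the `Variant` record, its field names, and the four example variants are hypothetical, and the exclusion of predisposition-gene carriers and the TCGA gene comparison are left out because they depend on external data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    """Hypothetical record; field names are assumptions for this sketch."""
    rsid: str
    population_maf: float      # MAF in the combined (ALL) population, 1000 Genomes / ExAC
    phylop: Optional[float]    # None when no PhyloP score is available
    cadd: float                # CADD C-scaled score
    carriers: int              # discovery subjects carrying the variant


def passes_filters(v: Variant, n_subjects: int = 170) -> bool:
    """Apply the thresholds named in the text and in Fig. 1, in order."""
    if v.population_maf > 0.01:                # keep MAF <= 1% in public databases
        return False
    if v.phylop is not None and v.phylop < 0:  # keep PhyloP >= 0 or no score
        return False
    if v.cadd < 10:                            # keep CADD >= 10
        return False
    if v.carriers < 2:                         # keep variants seen in >= 2 subjects
        return False
    # Final cut-off: frequency in the discovery subjects above 3.5%.
    return 100.0 * v.carriers / n_subjects > 3.5


# Hypothetical records for illustration only; these are not study data.
variants = [
    Variant("var_A", 0.004, 2.1, 16.9, 7),    # passes every step
    Variant("var_B", 0.030, 5.0, 20.0, 12),   # too common in the population
    Variant("var_C", 0.002, None, 8.0, 9),    # CADD below 10
    Variant("var_D", 0.002, 1.0, 15.0, 3),    # 3/170 is below the 3.5% cut-off
]
selected = [v.rsid for v in variants if passes_filters(v)]
print(selected)
```

Ordering the checks from cheapest database lookup to cohort-level frequency mirrors the order of the flow chart; the same result would be obtained in any order, because every condition must hold.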

SNP genotyping

SNP genotyping was carried out on 192.24 Dynamic Array integrated fluidic circuits (IFC) using TaqMan SNP Genotyping Assays (Applied Biosystems; ref. 21). The IFC Controller RX (Fluidigm) was used to load samples and assays onto the IFC, and the BioMark HD (Fluidigm) was used for thermal cycling and detection of fluorescence. Data were analyzed using the Fluidigm SNP Genotyping Analysis software, which automatically calls genotypes based on k-means clustering.
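To make the idea of k-means genotype calling concrete, the sketch below clusters a one-dimensional allele signal into three groups. It is a toy illustration and not the Fluidigm software's algorithm: the signal fractions are invented, the starting centroids (0, 0.5, 1) are an assumption, and real callers work on two-channel fluorescence intensities.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Plain Lloyd's k-means in one dimension with caller-supplied starting centroids."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(x - centroids[k]))
            for x in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [x for x, lab in zip(values, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = sum(members) / len(members)
    return labels, centroids


# Hypothetical minor-allele signal fractions: allele 2 / (allele 1 + allele 2).
signal_fraction = [0.03, 0.05, 0.02, 0.48, 0.52, 0.51, 0.97, 0.95]
labels, centres = kmeans_1d(signal_fraction, centroids=[0.0, 0.5, 1.0])

genotype_names = ["major homozygote", "heterozygote", "minor homozygote"]
calls = [genotype_names[lab] for lab in labels]
print(calls)
```

Seeding the centroids at the three expected genotype positions keeps the cluster-to-genotype mapping fixed, which a randomly initialized k-means would not guarantee.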

Table 1. Demographic and clinical characteristics of study participants

| Characteristic | Discovery cohort: subjects (n = 240) | Validation cohort: cases (n = 1,516) | Validation cohort: controls (n = 1,189) |
|---|---|---|---|
| Personal cancer history | | | |
| Breast | 205 | 1,490 | — |
| Ovarian | 12 | — | — |
| Breast and ovarian | 11 | 26 | — |
| None | 12 | — | — |
| Age (years)^a | | | |
| Mean | 39.2 | 51.5 | 42.8 |
| Median | 38 | 51 | 43 |
| Range | 19–67 | 24–91 | 21–79 |
| Family history | 112 | 81 | — |
| Early onset breast cancer (≤40 years old) | 151 | 192 | — |
| ER status | | | |
| Positive | 132 | 1,019 | — |
| Negative | 67 | 433 | — |
| Unknown | 17 | 64 | — |
| PR status | | | |
| Positive | 111 | 870 | — |
| Negative | 88 | 577 | — |
| Unknown | 17 | 69 | — |
| Her2 status | | | |
| Positive | 52 | 341 | — |
| Negative | 123 | 772 | — |
| Equivocal | 12 | 104 | — |
| Unknown | 29 | 299 | — |
| Triple-negative breast cancer | 32 | 131 | — |
| Breast cancer histologic type | | | |
| Invasive ductal carcinoma (IDC) | 162 | 1,221 | — |
| Invasive lobular carcinoma (ILC) | 5 | 48 | — |
| Invasive micropapillary carcinoma | 0 | 12 | — |
| Invasive mucinous carcinoma | 9 | 33 | — |
| Ductal carcinoma in situ (DCIS) | 9 | 31 | — |
| Others | 4 | 16 | — |
| Mixed histologic types | 13 | 66 | — |
| Unknown | 14 | 89 | — |
| Histologic grade | | | |
| Grade 1 | 9 | 177 | — |
| Grade 2 | 42 | 500 | — |
| Grade 3 | 69 | 592 | — |
| Unknown | 87 | 247 | — |
| Tumor size | | | |
| ≤20 mm | 73 | 521 | — |
| 20 mm to ≤50 mm | 46 | 618 | — |
| >50 mm | 16 | 117 | — |
| Unknown | 81 | 260 | — |
| Lymph node status | | | |
| Negative | 80 | 636 | — |
| 1–3 positive | 38 | 401 | — |
| 4–9 positive | 13 | 168 | — |
| ≥10 positive | 14 | 104 | — |
| Unknown | 71 | 207 | — |

^a Age refers to the age of cancer diagnosis for cases and the age at recruitment for controls.

Novel Breast Cancer Risk Loci Identified by Targeted NGS


SNP association analysis

Statistical analysis of the SNP genotype data was performed using the R package SNPassoc (22) to estimate the per-allele OR with 95% confidence intervals (CI); significance was evaluated using a logistic regression model. In addition, association of the SNPs was assessed with respect to ER status, PR status, HER2 status, sporadic breast cancer, or FH. A P value of ≤0.05 was considered statistically significant.
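As a rough illustration of what a per-allele (additive) OR means, the sketch below fits a logistic regression of case status on minor-allele dosage (0, 1, or 2) by Newton-Raphson and derives a Wald 95% CI. It is a Python stand-in written for this note, not the SNPassoc implementation; the function names and the genotype counts are hypothetical and no covariates are included.

```python
import math


def score_and_information(b0, b1, cases, controls):
    """Gradient and information matrix of the binomial log-likelihood."""
    g0 = g1 = h00 = h01 = h11 = 0.0
    for dose, (a, c) in enumerate(zip(cases, controls)):
        n = a + c
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * dose)))  # fitted P(case | dosage)
        g0 += a - n * p
        g1 += dose * (a - n * p)
        w = n * p * (1.0 - p)
        h00 += w
        h01 += w * dose
        h11 += w * dose * dose
    return g0, g1, h00, h01, h11


def per_allele_or(cases, controls, iterations=25):
    """Fit logit(P(case)) = b0 + b1 * dosage on grouped genotype counts.

    cases / controls: counts for minor-allele dosage 0, 1, 2.
    Returns (OR per allele, lower 95% limit, upper 95% limit).
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, h00, h01, h11 = score_and_information(b0, b1, cases, controls)
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) @ gradient (2 x 2 case).
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Wald standard error of b1 from the information matrix at the final estimate.
    _, _, h00, h01, h11 = score_and_information(b0, b1, cases, controls)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Hypothetical genotype counts (dosage 0, 1, 2); these are not study data.
odds_ratio, lower, upper = per_allele_or(cases=[900, 90, 10], controls=[940, 58, 2])
print(f"per-allele OR = {odds_ratio:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Under the additive model each extra copy of the minor allele multiplies the odds of disease by the same factor, so a single coefficient summarizes the effect; smaller subgroups shrink the information matrix and therefore widen the interval.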

Detection of amino acid conservation using ConSurf analysis and multiple sequence alignment

ConSurf analysis (http://consurftest.tau.ac.il/) was carried out to identify the conservation of amino acid residue position in a protein utilizing the phylogenetic relationship between homologous sequences. Using the R Bioconductor package msa (23), multiple sequence alignment across 12 different species was carried out based on the ClustalOmega method (24). For rs200504060 (NOTCH3), only 10 species were available.

Protein structure stability prediction

To determine the effect of the mutation on protein structure and function, we used three different protein stability prediction programs, namely I-Mutant2.0 (25), Impact of Non-synonymous variations on Protein Stability-Multi-Dimension (INPS-MD) (26), and the HOPE server (27). The protein sequences of the three genes JAK2, NOTCH3, and HIF1A were retrieved from the NCBI protein database and were used for the analysis of protein stability. The accession numbers for the protein sequences of the

Targeted sequencing of 283 genes for 240 subjects

Exonic and splicing variants with exclusion of synonymous variants (2,083 variants in 260 genes)

Selected variants with MAF ≤ 1% in all populations as reported in the 1000 Genomes Project and ExAC databases (1,699 variants in 254 genes)

After exclusion of samples carrying mutations in 25 known breast cancer predisposition genes (1,154 variants in 217 genes, in 170 subjects)

Selected variants in genes also found to be mutated in both breast and ovarian cancers in the TCGA dataset (1,082 variants in 197 genes)

Selected variants with PhyloP conservation score ≥ 0 or no score (970 variants in 191 genes)

Selected variants with CADD score of 10 or above (775 variants in 186 genes)

Selected variants present in 2 or more subjects (204 variants in 106 genes)

Selected 24 variants with frequency > 3.5% in subjects

Figure 1. Flow chart for the selection of 24 SNPs.

Chan et al.


genes JAK2, NOTCH3, and HIF1A that were used are NP_004963.1, NP_000426.2, and NP_001521.1, respectively.

Results

The discovery cohort (n = 240) comprised subjects who were Chinese (82.9%), Malays (7.1%), Indians (2.9%), and others (7.1%). The clinicopathologic characteristics of this cohort are shown in Table 1. To investigate the association of the 24 selected SNPs with breast cancer risk, genotyping was performed on 1,516 breast cancer cases and 1,189 controls, all of whom were Chinese. The clinicopathologic characteristics of these subjects are described in Table 1.

Targeted sequencing resulted in the detection of 1,154 exonic and splicing variants within 217 genes, excluding synonymous variants. After filtering and annotation of variants as shown in Fig. 1, there were 204 variants in 106 genes. Each of these variants was present in varying proportions in our cohort of 170 patient samples, ranging from 1.17% to 97.64%. The observation that these variants were common in our discovery cohort cases was in contrast to their minor allele frequency (≤1% in all populations, and ≤4.2% in the East Asian population) reported in publicly available databases such as the 1000 Genomes Project and ExAC (Table 2). Taking into consideration the fact that common variants have been found to be associated with breast cancer risk, we selected 24 variants with the highest occurrence in our cohort to determine their association with breast cancer risk. In addition, the functional effect of the 24 variants was predicted to be deleterious by one or more in silico prediction tools (Table 2). Furthermore, all 24 variants had a positive nucleotide conservation score (range, 0.879–9.230) measured using the PhyloP program (Table 2).

All 24 SNP assays had a call rate of more than 95.0%, with an average call rate of 98.34%. None of the 24 SNPs is included on commercially available Illumina and Affymetrix genotyping arrays. Five SNPs, rs112515611, rs60244562, rs199839047, rs112790792, and rs4024370, were found to be monomorphic in our cases and controls and were excluded from analysis. After exclusion of these monomorphic SNPs and applying Bonferroni correction to the remaining 19 SNPs, one SNP, rs56118985, located at 9p24.1/JAK2, was found to be significantly associated with breast cancer risk via an additive model (per-allele OR = 1.81; 95% CI = 1.24–2.64; P = 0.00331; Table 3).

The association of the 19 SNPs with clinicopathologic parameters (ER, PR, and HER2 status; sporadic breast cancer; and FH) was also investigated (Table 4). The number of cases in each subgroup is listed in Table 4. For rs56118985, only sporadic (per-allele OR = 1.81; 95% CI = 1.22–2.69; P = 0.004495) and PR-negative breast cancer cases (per-allele OR = 2.02; 95% CI = 1.28–3.18; P = 0.00381) showed significant associations with breast cancer risk (Table 4; Supplementary Table S2).

For SNP rs200504060, which maps to 19p13.12/NOTCH3, significant associations with breast cancer risk were detected in sporadic (per-allele OR = 2.45; 95% CI = 1.39–4.33; P = 0.002331), ER-positive (per-allele OR = 2.49; 95% CI = 1.38–4.49; P = 0.00323), and HER2-positive breast cancer (per-allele OR = 3.31; 95% CI = 1.64–6.69; P = 0.001175; Table 4; Supplementary Table S3).

Another SNP, rs142179458, located at 14q23.2 within the HIF1A gene, was identified to be specifically associated with breast cancer risk only in HER2-negative cases (per-allele OR = 1.92; 95% CI = 1.26–2.90; P = 0.00147; Table 4; Supplementary Table S4). Wider CIs were observed in some of the subgroups, which can be attributed to the smaller number of samples within each subgroup and suggests a lack of sufficient statistical power. Further studies with larger sample sizes of the different hormone receptor subgroups should be carried out to confirm our findings.

The neural network algorithm of ConSurf was used to predict a conservation score for each amino acid with the conservation scale ranging from 1 to 9 (a score of 1 being "variable" to 9 being "highly conserved"). The amino acid at position 127 of the JAK2 protein (SNP: rs56118985) had a score of between 6 and 7 (moderately conserved), and is predicted to be an exposed residue based on the algorithm. However, the amino acid residues at position 1175 of NOTCH3 (SNP: rs200504060) and at position 349 of HIF1A (SNP: rs142179458) both had scores of 3 (likely less conserved). In addition, multiple sequence alignment based on the ClustalOmega method showed the residues of interest to be conserved across different species for all three sites (Fig. 2A–C).

Analysis of protein stability for the JAK2 SNP rs56118985 predicted a decrease in stability for the G127D mutation by I-Mutant2.0 and INPS-MD with free energy values (ΔΔG) of –0.07 and –0.1836 kcal/mol, respectively. The HOPE server analysis of the JAK2 SNP reported the mutant residue to be larger than the wild-type. In addition, the mutation alters the charge from neutral to negative with increased hydrophobicity. Likewise, the SNP rs200504060, which causes the change of amino acid residue "R" at position 1175 to "W" in the NOTCH3 protein, results in alteration of the amino acid charge from positive to neutral with the mutant residue being larger, as reported by the HOPE server. SNP rs200504060 caused reduced NOTCH3 protein stability as predicted by the tools I-Mutant2.0 and INPS-MD, with ΔΔG values of –0.35 and –0.6867 kcal/mol, respectively. For the SNP rs142179458 within the HIF1A gene, decreased protein stability was predicted by I-Mutant2.0 and INPS-MD with ΔΔG values of –0.28 and –0.5563 kcal/mol, respectively, due to the D349N mutation. In addition, analysis by the HOPE server reported that the amino acid charge changed from negative to neutral, thereby likely disturbing its function due to differences in amino acid properties (Table 5).
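The stability and charge statements above can be tabulated with a short script. This is a reader's sketch, not output from I-Mutant2.0, INPS-MD, or HOPE: the ΔΔG values are copied from the text, the sign convention (negative ΔΔG taken as destabilizing) follows the text's own reading of those values, and the `CHARGE` lookup and `summarize` helper are hypothetical names.

```python
# Side-chain charge at physiological pH for the residues involved.
CHARGE = {"G": "neutral", "D": "negative", "R": "positive",
          "W": "neutral", "N": "neutral"}

# (gene, substitution, I-Mutant2.0 ddG, INPS-MD ddG) in kcal/mol, as quoted in the text.
PREDICTIONS = [
    ("JAK2", "G127D", -0.07, -0.1836),
    ("NOTCH3", "R1175W", -0.35, -0.6867),
    ("HIF1A", "D349N", -0.28, -0.5563),
]


def summarize(gene, substitution, *ddg_values):
    """Collect the charge change and whether every tool predicts lower stability."""
    wild_type, mutant = substitution[0], substitution[-1]
    return {
        "gene": gene,
        "substitution": substitution,
        "charge_change": f"{CHARGE[wild_type]} to {CHARGE[mutant]}",
        # Assumed convention: a negative ddG means the mutant is less stable.
        "destabilizing": all(value < 0 for value in ddg_values),
    }


summaries = [summarize(*row) for row in PREDICTIONS]
for s in summaries:
    print(s["gene"], s["substitution"], s["charge_change"],
          "destabilizing" if s["destabilizing"] else "not destabilizing")
```

All three substitutions change the side-chain charge and carry negative ΔΔG values from both tools, which is the pattern the text summarizes as decreased protein stability.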

Discussion

Common variants associated with breast cancer risk have been identified from several GWAS. We report here the first study utilizing targeted next-generation sequencing to identify breast cancer risk loci. Through high-throughput SNP genotyping of 24 selected SNPs, three novel coding variants were found to be significantly associated with breast cancer risk in Chinese women.

We have detected a novel SNP, rs56118985, associated with breast cancer risk via an additive model in a Singaporean Chinese population. Rs56118985 has been reported to be associated with acute leukemia and acute myeloid leukemia in a single study conducted in a Chinese population (28). No other association or functional studies have been done on rs56118985. Rs56118985 is located in the coding region of the JAK2 gene on 9p24.1. The JAK–STAT signaling pathway plays a role in proliferation, differentiation, and apoptosis, and has been implicated in tumorigenesis and cancer


Table 2. MAF of the 24 SNPs and their functional effect prediction using in silico tools

| SNP | Locus | Gene | Alleles^a | Frequency, % (n = 170)^b | MAF: 1000 Genomes (ALL) | MAF: 1000 Genomes (EAS) | MAF: ExAC (ALL) | MAF: ExAC (EAS) | SIFT^c | PolyPhen | MutationTaster | Mutation Assessor^d | CADD C-scaled score^e | PhyloP conservation score^f |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs112515611 | 7q36.1 | KMT2C | G/A | 97.64 | NR | NR | NR | NR | 0.16 (T) | Possibly damaging | Disease causing | 1.245 (L) | 15.34 | 6.332 |
| rs60244562 | 7q36.1 | KMT2C | T/C | 90.00 | NR | NR | NR | NR | 0.09 (T) | Damaging | Disease causing | 1.15 (L) | 13.48 | 1.399 |
| rs199839047 | 7q36.1 | KMT2C | A/G | 84.70 | NR | NR | NR | NR | 0 (D) | Damaging | Disease causing | 1.355 (L) | 16.84 | 7.062 |
| rs112790792 | 7q36.1 | KMT2C | C/T | 11.18 | NR | NR | NR | NR | NA | NA | Disease causing | NA | 23.8 | 7.814 |
| rs201760077 | 8p11.21 | KAT6A | TCT/- | 10.59 | 0.007 | 0.03 | 0.006 | 0.04 | NA | NA | NA | NA | NA | NA |
| rs78128744 | 12q12 | ARID2 | A/G | 7.65 | 0.005 | 0.024 | 0.003 | 0.03 | 0.34 (T) | Benign | Disease causing | 0 (N) | 11.12 | 5.517 |
| rs35118262 | 10q11.21 | RET | C/A | 6.47 | 0.004 | 0.019 | 0.002 | 0.027 | 0.14 (T) | Benign | Disease causing | 1.095 (L) | 17.54 | 3.758 |
| rs146242251 | 22q13.2 | EP300 | A/G | 6.47 | 0.003 | 0.011 | 0.002 | 0.017 | 0.02 (D) | Benign | Polymorphism automatic | 0.975 (L) | 11.57 | 1.289 |
| rs34172843 | 13q12.2 | FLT3 | T/A | 6.47 | 0.005 | 0.025 | 0.002 | 0.023 | 0.13 (T) | Benign | Polymorphism | 0 (N) | 15.46 | 2.891 |
| rs138399473 | Xq11.2 | AMER1 | C/T | 5.88 | 0.007 | 0.028 | 0.003 | 0.034 | 0.29 (T) | Benign | Disease causing | 0.695 (N) | 16.03 | 2.524 |
| rs150804738 | 11q23.3 | KMT2A | G/A | 5.88 | 0.005 | 0.024 | 0.002 | 0.034 | 0.23 (T) | Damaging | Disease causing | 1.245 (L) | 15.49 | 6.778 |
| rs200567881 | 6p21.32 | DAXX | CCT/- | 5.88 | 0.004 | 0.02 | 0.002 | 0.018 | NA | NA | NA | NA | NA | NA |
| rs75191113 | 7q36.1 | KMT2C | G/T | 5.88 | 0.006 | 0.03 | 0.002 | 0.024 | 1 (T) | Benign | Disease causing | 1.74 (L) | 11.91 | 6.086 |
| rs142179458 | 14q23.2 | HIF1A | G/A | 4.71 | 0.005 | 0.022 | 0.002 | 0.026 | 0.01 (D) | Possibly damaging | Disease causing | 2.07 (M) | 16.97 | 5.462 |
| rs3832931 | 14q32.31 | HSP90AA1 | TTT/- | 4.71 | 0.008 | 0.038 | 0.003 | 0.042 | NA | NA | NA | NA | NA | NA |
| rs4024370 | 7q36.1 | KMT2C | G/A | 4.71 | NR | NR | NR | NR | 1 (T) | NA | Disease causing | NA | 41 | 4.399 |
| rs200504060 | 19p13.12 | NOTCH3 | G/A | 4.71 | 0.002 | 0.008 | 0.001 | 0.013 | 0 (D) | Possibly damaging | Disease causing | 2.485 (M) | 12.93 | 3.883 |
| rs56118985 | 9p24.1 | JAK2 | G/A | 4.12 | 0.004 | 0.015 | 0.002 | 0.018 | 0.09 (T) | Damaging | Polymorphism | 1.355 (L) | 16.86 | 2.114 |
| rs78004519 | 7q36.1 | KMT2C | A/G | 4.12 | 0.003 | 0.016 | 0.002 | 0.015 | 0.05 (D) | Benign | Disease causing | 2.215 (M) | 16.16 | 6.388 |
| rs75758215 | 2q22.1 | LRP1B | G/A | 4.12 | 0.004 | 0.016 | 0.002 | 0.015 | 0.3 (T) | Possibly damaging | Disease causing | 1.39 (L) | 12.76 | 0.879 |
| rs75321043 | 1p34.1 | MUTYH | C/T | 4.12 | 0.002 | 0.008 | 0.001 | 0.013 | 0.54 (T) | Damaging | Disease causing | 1.735 (L) | 23.3 | 9.23 |
| rs79777494 | 1p34.1 | MUTYH | G/A | 4.12 | 0.002 | 0.008 | 0.001 | 0.013 | 0.09 (T) | Possibly damaging | Polymorphism | 1.445 (L) | 18.07 | 1.433 |
| rs150513105 | 17p12 | NCOR1 | C/T | 4.12 | 0.002 | 0.012 | 0.001 | 0.011 | 1 (T) | Benign | Disease causing | 1.04 (L) | 12.64 | 1.318 |
| rs3782356 | 12q13.12 | KMT2D | C/T | 3.53 | 0.002 | 0.009 | 0.001 | 0.014 | 0.05 (D) | Damaging | Disease causing | 1.965 (M) | 32 | 6.006 |

NOTE: ALL includes East Asian (EAS), South Asian (SAS), European (EUR), African (AFR), and Admixed American (AMR) population.
Abbreviations: NR, not reported; NA, prediction is not available.
^a Major/minor allele of the SNP.
^b Frequency of alternate homozygous/heterozygous genotypes in the discovery cohort.
^c D, deleterious (SIFT ≤ 0.05); T, tolerated (SIFT > 0.05).
^d Variants classified as neutral (N)/low (L) are predicted to be nonfunctional; variants classified as medium (M)/high (H) are predicted to be functional.
^e Variants with CADD scores of more than or equal to 10 are classified as deleterious.
^f Positive scores indicate evolutionary conservation.


development (29). Germline JAK2 mutations have been most commonly associated with myeloproliferative neoplasms (30).

The rs56118985 variant allele (MAF = 0.0161) was present only in East Asians in a study that characterized germline variation in 158 cancer susceptibility genes by analyzing whole-genome sequences of 681 healthy individuals of diverse ethnicities (31). MAFs of rs56118985 from the 1000 Genomes Project and ExAC databases were also higher in East Asian populations than in the general population (0.015 vs. 0.004 and 0.018 vs. 0.002, respectively). In a database comprising 765 healthy Singaporean individuals (http://beacon.prism-genomics.org/), we found the MAF of this variant to be 0.016. Taken together, these data suggest that this SNP has a higher MAF in East Asian populations. It is well established that the allele frequencies of genetic variants can differ among ethnicities and that variants can confer varying degrees of disease susceptibility (32). Association studies carried out on a single population may not always be applicable to other populations, which emphasizes the importance of ethnicity in these studies. GWAS have been highly successful in identifying risk loci that contribute to breast cancer susceptibility (1–4). However, the majority of these studies have been performed in European populations, and the risk loci identified do not always apply to Asian populations. The approach described in this study demonstrates the feasibility of identifying novel breast cancer risk loci using next-generation sequencing, and it could be extended to other populations and cancers.
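As a quick arithmetic check of the population comparison above, the following Python sketch computes the ratio of the East Asian MAF to the all-population MAF for rs56118985, using only the four database values quoted in the text. The names `MAFS`, `fold_enrichment`, and `ENRICHMENT` are illustrative and are not part of the authors' analysis.

```python
# MAFs for rs56118985 as quoted in the text: (East Asian, all populations).
MAFS = {
    "1000 Genomes": (0.015, 0.004),
    "ExAC": (0.018, 0.002),
}


def fold_enrichment(eas_maf: float, all_maf: float) -> float:
    """Return the ratio of the East Asian MAF to the all-population MAF."""
    if all_maf <= 0:
        raise ValueError("all-population MAF must be positive")
    return eas_maf / all_maf


# Ratio per database; a value above 1 means the allele is more common in EAS.
ENRICHMENT = {db: fold_enrichment(*pair) for db, pair in MAFS.items()}

if __name__ == "__main__":
    for db, ratio in ENRICHMENT.items():
        print(f"{db}: East Asian MAF is about {ratio:.2f}x the overall MAF")
```

With the quoted values, the variant is roughly 3.75 times (1000 Genomes) and 9 times (ExAC) more frequent in East Asians than in the pooled populations.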

We also identified another SNP, rs200504060, which was specifically associated with HER2-positive breast cancer. Rs200504060 resides in the coding region of NOTCH3 on 19p13.12. The Notch family of proteins is highly conserved and crucial in development. These proteins are involved in signaling pathways that control cell fate by influencing proliferation, differentiation, and apoptosis (33). It has been suggested that Notch signaling drives the proliferation of epithelial cells during mammary gland development and prevents their terminal differentiation (34). Thus, the upregulation of Notch signaling may lead to breast tumorigenesis (34). The role of NOTCH3 in the development of breast cancer has also been established (35, 36). In a study investigating the relationship between HER2 and NOTCH3 in DCIS (37), their expression levels were found to be directly correlated, and the upregulation and activation of the two pathways may contribute to the progression to invasive breast carcinomas. This could explain why the association of rs200504060 with breast cancer risk was found only in HER2-positive breast cancer cases. The joint role of HER2 and NOTCH3 in the development of breast cancer should be further elucidated.

Table 3. Association of 19 SNPs with breast cancer risk in the validation cohort

SNP  Chr: Position  Gene  Alleles^a  Risk allele  Frequency, % (n = 1,516)^b  OR (95% CI)  P
rs20176077  8: 41794797  KAT6A  TCT/-  -  7.9  1.01 (0.78–1.31)  0.95237
rs78128744  12: 46243406  ARID2  A/G  A  98.4  1.28 (0.94–1.74)  0.12139
rs35118262  10: 43600607  RET  C/A  A  6.27  1.12 (0.82–1.52)  0.48617
rs146242251  22: 41527628  EP300  A/G  G  3.1  1.12 (0.73–1.73)  0.59991
rs34172843  13: 28622544  FLT3  T/A  T  98.1  1.12 (0.77–1.63)  0.52293
rs138399473  X: 63413082  AMER1  C/T  C  97.6  1.19 (0.88–1.62)  0.26378
rs150804738  11: 118375998  KMT2A  G/A  G  98.2  1.22 (0.93–1.61)  0.15249
rs200567881  6: 33287881  DAXX  CCT/-  CCT  96.8  0.85 (0.6–1.21)  0.3656
rs75191113  7: 151859288  KMT2C  G/T  G  97.7  0.97 (0.67–1.39)  0.8569
rs142179458  14: 62203623  HIF1A  G/A  G  98.1  1.37 (1.01–1.85)  0.04248
rs3832931  14: 102551276  HSP90AA1  TTT/-  -  10.0  1.36 (1.05–1.76)  0.02052
rs200504060  19: 15290031  NOTCH3  G/A  A  2.8  2.25 (1.29–3.94)  0.00796
rs56118985  9: 5044432  JAK2  G/A  A  5.6  1.81 (1.24–2.64)  0.00331
rs78004519  7: 151860023  KMT2C  A/G  A  98.5  0.83 (0.54–1.29)  0.4111
rs75758215  2: 140995843  LRP1B  G/A  G  98.3  0.94 (0.51–1.72)  0.8429
rs75321043  1: 45800146  MUTYH  C/T  C  97.0  0.97 (0.61–1.54)  0.8923
rs79777494  1: 45800167  MUTYH  G/A  G  98.3  0.91 (0.57–1.43)  0.6766
rs150513105  17: 15983784  NCOR1  C/T  T  2.0  1.3 (0.73–2.32)  0.3728
rs3782356  12: 49420078  KMT2D  C/T  T  2.6  1.48 (0.88–2.49)  0.23739

NOTE: SNPs with P = 0.0042 (0.05/12) are considered significant. Only 12 SNPs satisfy the additive model in this study.
Abbreviation: CI, confidence interval.
a Major/minor allele of the SNP.
b Frequency of alternate homozygous/heterozygous genotypes in the validation cohort.

Table 4. Association of SNPs with different subgroups of breast cancer cases, and their ORs and P values

Groups  Cases (n)  rs56118985 (JAK2)^a: OR (95% CI), P  rs200504060 (NOTCH3)^b: OR (95% CI), P  rs142179458 (HIF1A)^c: OR (95% CI), P
All cases  1,516  1.81 (1.24–2.64), 0.00331  2.25 (1.29–3.94), 0.008  1.37 (1.01–1.85), 0.0425
Sporadic  1,227  1.81 (1.22–2.69), 0.0045  2.45 (1.39–4.33), 0.00233  1.36 (0.98–1.88), 0.0604
FH  318  1.85 (1.07–3.21), 0.0276  1.47 (0.57–3.79), 0.4376  1.40 (0.83–2.38), 0.4146
ER+  1,019  1.71 (1.13–2.59), 0.01614  2.49 (1.38–4.49), 0.00323  1.60 (1.12–2.29), 0.00804
ER-  433  1.88 (1.14–3.10), 0.0166  2.08 (1.00–4.34), 0.07335  1.15 (0.74–1.77), 0.7500
PR+  870  1.60 (1.04–2.47), 0.04976  2.23 (1.21–4.10), 0.01498  1.64 (1.12–2.40), 0.00842
PR-  577  2.02 (1.28–3.18), 0.00381  2.61 (1.35–5.01), 0.00753  1.20 (0.81–1.78), 0.6505
HER2+  341  1.46 (0.82–2.61), 0.1362  3.31 (1.64–6.69), 0.001175  1.10 (0.68–1.75), 0.9256
HER2-  772  1.90 (1.24–2.93), 0.00536  1.82 (0.96–3.47), 0.08064  1.91 (1.26–2.90), 0.00147
Triple negative
ER- and PR- and HER2-  131  2.25 (1.06–4.74), 0.04973  2.46 (0.81–7.49), 0.1460  1.62 (0.70–3.77), 0.4236
ER/PR+ HER2-
(ER+ or PR+) and HER2-  639  1.84 (1.17–2.90), 0.00965  1.73 (0.88–3.38), 0.1041  1.98 (1.25–3.12), 0.00185
ER+ and PR+ and HER2-  493  2.14 (1.34–3.41), 0.00198  2.19 (1.11–4.32), 0.0299  1.83 (1.12–2.99), 0.0103

NOTE: SNPs with statistically significant association after Bonferroni correction are highlighted in bold.
a For rs56118985, the major/minor allele is G/A, and the risk allele is A.
b For rs200504060, the major/minor allele is G/A, and the risk allele is A.
c For rs142179458, the major/minor allele is G/A, and the risk allele is G.
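For readers who want to relate the ORs and 95% confidence intervals in Table 3 to their P values, the Python sketch below back-calculates an approximate two-sided P value. It assumes each interval is a Wald interval on the log-odds scale, which is an assumption made here for illustration. The reported P values come from the authors' own association model, so exact agreement is not expected. The function name `wald_p_from_or_ci` is hypothetical.

```python
import math


def wald_p_from_or_ci(odds_ratio: float, ci_low: float, ci_high: float) -> float:
    """Approximate two-sided P value from an OR and its 95% CI.

    Assumes the interval is exp(log(OR) +/- 1.96 * SE), i.e. a Wald
    interval on the log-odds scale (an assumption, not the paper's method).
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of a standard normal, via erfc.
    return math.erfc(abs(z) / math.sqrt(2))


# Overall associations transcribed from Table 3: (OR, lower, upper, reported P).
TABLE3 = {
    "rs56118985 (JAK2)": (1.81, 1.24, 2.64, 0.00331),
    "rs200504060 (NOTCH3)": (2.25, 1.29, 3.94, 0.00796),
    "rs142179458 (HIF1A)": (1.37, 1.01, 1.85, 0.04248),
}

APPROX_P = {
    snp: wald_p_from_or_ci(o, lo, hi) for snp, (o, lo, hi, _) in TABLE3.items()
}

if __name__ == "__main__":
    for snp, (_, _, _, reported) in TABLE3.items():
        print(f"{snp}: approx P = {APPROX_P[snp]:.4f}, reported P = {reported}")
```

Under these assumptions the approximations (roughly 0.002, 0.004, and 0.04) fall in the same range as the reported values, which is a useful sanity check when transcribing a table such as this one.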

A third SNP, rs142179458, was found to be significantly associated with HER2-negative breast cancer. Rs142179458 is located in the coding region of HIF1A on chromosome 14q23, which encodes the alpha subunit of hypoxia-inducible factor 1 (HIF1). A number of studies have established associations between HIF1A polymorphisms and disease phenotypes, including cancer (38). However, none have identified rs142179458. The overexpression of HIF1A is brought about by decreased levels of cellular oxygen (39), and functional HIF1 activates the transcription of genes that allow cells to adapt to these hypoxic conditions. HIF1A has been found to be highly expressed in many solid tumors, and its role in promoting angiogenesis and metastasis has also been established (40). In breast cancer, levels of HIF1A have been found to be directly correlated with the stage of cancer progression (41) and could potentially be used as a marker of poor prognosis (42). High expression levels of HIF1A have been associated with HER2-positive breast cancers, in which studies have demonstrated its role in further promoting cancer progression and resistance to therapy (43–45).

Figure 2.
Multiple sequence alignment of rs56118985 (A), rs200504060 (B), and rs142179458 (C) showing conservation of the amino acid at the mutation sites (arrowed) across species.

Sequencing studies identify numerous genetic variants, leaving investigators with the challenge of determining the clinical significance of these variants with regard to disease. Conducting functional experiments is one way to determine whether these variants are deleterious, but because such experiments are time-consuming and laborious, they are not often carried out. Although databases such as ClinVar, HGMD, dbSNP, and OMIM have cataloged the functional effects of previously reported variants, information is often not available for novel variants detected through NGS. To overcome this, prediction tools based on theoretical knowledge and on features such as nucleotide or amino acid conservation and the biochemical properties of amino acids have been built to classify these variants as benign, likely benign, pathogenic, likely pathogenic, or of unknown significance. However, these prediction programs are based on different methods and datasets, and their interpretations vary (46). To address this limitation, tools like CADD (17) integrate the predictions of different individual methods and produce a score that classifies the variant as either benign or deleterious. In our study, we employed commonly used tools, namely SIFT (13), PolyPhen (14), MutationTaster (15), and MutationAssessor (16), as well as CADD (17), to determine the pathogenicity of the variants. Utilizing multiple tools to determine the nature of the variants could provide a better estimate of each variant's effect on the protein. Although in silico prediction tools can suggest associations of variants with disease, conclusive evidence of the pathogenicity of a variant is best drawn from functional studies (47).
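One simple way to combine the outputs of several prediction tools is to count how many of them flag a variant as damaging. The Python sketch below does this with the cutoffs given in the footnotes of the in silico prediction table (SIFT ≤ 0.05, MutationAssessor medium/high, CADD ≥ 10). The vote-counting scheme and the names `Prediction` and `damaging_votes` are assumptions made for illustration. The source does not describe the authors combining the tools in this way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    sift: float             # SIFT score; <= 0.05 is called deleterious
    polyphen: str           # PolyPhen label as printed in the table
    mutation_taster: str    # MutationTaster label as printed in the table
    mutation_assessor: str  # impact class: N, L, M, or H
    cadd: float             # CADD score; >= 10 is called deleterious


def damaging_votes(p: Prediction) -> int:
    """Count how many of the five tools flag the variant as damaging."""
    votes = [
        p.sift <= 0.05,
        p.polyphen.lower() in {"damaging", "possibly damaging"},
        p.mutation_taster.lower() == "disease causing",
        p.mutation_assessor in {"M", "H"},
        p.cadd >= 10,
    ]
    return sum(votes)


# Values transcribed from the table rows for two of the reported SNPs.
NOTCH3_RS200504060 = Prediction(0.0, "Possibly damaging", "Disease causing", "M", 12.93)
JAK2_RS56118985 = Prediction(0.09, "Damaging", "Polymorphism", "L", 16.86)

if __name__ == "__main__":
    print("rs200504060:", damaging_votes(NOTCH3_RS200504060), "of 5 tools")
    print("rs56118985:", damaging_votes(JAK2_RS56118985), "of 5 tools")
```

The two rows illustrate the variability noted above: the tools agree on rs200504060 but split on rs56118985, which is why functional studies remain the stronger evidence.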

Rs56118985, rs200504060, and rs142179458, reported in this study, are low-frequency (1% < MAF ≤ 5%) coding variants. There are currently limited data on the role of low-frequency variants and their contribution to the missing heritability in cancer. However, recent studies suggest that low-frequency and rare variants may have larger effect sizes than common variants (48). Low-frequency missense variants associated with lung cancer risk or epithelial ovarian cancer risk have been identified using genotyping arrays (49, 50), providing evidence that such variants are relevant in cancer susceptibility.

We interrogated several publicly available databases [TCGA (www.cbioportal.org), COSMIC (http://cancer.sanger.ac.uk/cosmic), ICGC (https://dcc.icgc.org/), LOVD (http://www.lovd.nl/3.0/), Intogen (https://www.intogen.org/), and DoCM (http://docm.genome.wustl.edu/)] for the variants rs56118985, rs200504060, and rs142179458, but none of these variants was reported in any of these databases except for rs200504060. The variant rs200504060 was detected in only one lung cancer sample out of 14 tumor–normal matched samples from lung carcinoma patients that were exome sequenced (COSMIC). Possible reasons for the low frequency or lack of detection of these variants are that (i) the variants could have been filtered out during the variant filtering and prioritization process, or (ii) these variants are uncommon in Caucasian populations. Further studies in additional diverse populations are warranted to determine the frequency of the three variants identified in this study.

One limitation of this study is that the cases and controls in the validation cohort are not age matched. As the controls are about a decade younger than the cases in the validation cohort, some of the controls may develop cancer in the future and would then be categorized as "cases." Hence, the ORs reported here could be lower than they would be if the cases and controls were age matched. We performed a preliminary analysis of our validation cohort with 866 cases and 886 controls that were age matched and observed that the 3 SNPs of interest [rs56118985 (JAK2), rs200504060 (NOTCH3), and rs142179458 (HIF1A)] were significant at P < 0.05 (data not shown). With more samples added (a total of 1,516 cases and 1,189 controls), these 3 SNPs remained significant even with a more stringent cutoff after Bonferroni correction. It is likely that adding more age-matched samples would yield similar results.
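The Bonferroni cutoff mentioned above can be reproduced in a few lines. The Python sketch below uses the threshold from the note to Table 3 (0.05 divided by the 12 SNPs that satisfied the additive model) and applies it to a handful of P values transcribed from Table 4. Applying the same cutoff to the subgroup analyses is an assumption made here, since the note to Table 4 does not state the threshold. All variable names are illustrative.

```python
ALPHA = 0.05
N_TESTS = 12  # SNPs satisfying the additive model, per the note to Table 3

# Bonferroni-adjusted significance threshold (about 0.0042).
THRESHOLD = ALPHA / N_TESTS

# P values transcribed from Table 4, keyed by (SNP, subgroup).
P_VALUES = {
    ("rs56118985 (JAK2)", "All cases"): 0.00331,
    ("rs200504060 (NOTCH3)", "All cases"): 0.008,
    ("rs200504060 (NOTCH3)", "HER2+"): 0.001175,
    ("rs142179458 (HIF1A)", "All cases"): 0.0425,
    ("rs142179458 (HIF1A)", "HER2-"): 0.00147,
}

# Keep only the associations that pass the adjusted threshold.
SIGNIFICANT = {key for key, p in P_VALUES.items() if p < THRESHOLD}

if __name__ == "__main__":
    print(f"Bonferroni threshold: {THRESHOLD:.4f}")
    for snp, group in sorted(SIGNIFICANT):
        print(f"significant: {snp} in {group}")
```

With this cutoff, rs56118985 passes in all cases, whereas rs200504060 and rs142179458 pass only in the HER2-positive and HER2-negative subgroups, respectively, among the entries listed.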

Table 5. Prediction of protein stability for SNPs rs56118985, rs200504060, and rs142179458

SNP  I-Mutant2.0 ΔΔG^a (kcal/mol)  INPS-MD ΔΔG^a (kcal/mol)  HOPE server: amino acid change  HOPE server: predicted effect on structure and function
rs56118985  -0.07  -0.1836  [not recoverable]  Located within FERM domain and interacts with cytokine/interferon/growth hormone receptors. Loss of the flexible residue, glycine, might abolish the protein function
rs200504060  -0.35  -0.6867  [not recoverable]  Located within EGF-like 30; calcium-binding domain. Mutation introduces a more hydrophobic residue and can result in loss of hydrogen bonds and/or disturb correct folding
rs142179458  -0.28  -0.5563  [not recoverable]  Located in a region likely to interact with TSGA10. The differences in amino acid properties can disturb this region and its function

a Free energy change value: ΔΔG < 0, decreased stability; ΔΔG > 0, increased stability.
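The sign convention in the footnote to Table 5 can be applied mechanically. The Python sketch below classifies each predicted free energy change and checks whether the two predictors agree for each SNP. The values are transcribed from Table 5, while the helper `stability_call` and the handling of a value of exactly zero are assumptions added for illustration.

```python
# Predicted free energy changes (kcal/mol) transcribed from Table 5.
DDG = {
    "rs56118985": {"I-Mutant2.0": -0.07, "INPS-MD": -0.1836},
    "rs200504060": {"I-Mutant2.0": -0.35, "INPS-MD": -0.6867},
    "rs142179458": {"I-Mutant2.0": -0.28, "INPS-MD": -0.5563},
}


def stability_call(ddg: float) -> str:
    """Apply the Table 5 footnote: a negative value means decreased stability."""
    if ddg < 0:
        return "decreased"
    if ddg > 0:
        return "increased"
    return "unchanged"  # exactly 0 is not covered by the footnote (assumption)


# Per-SNP call from each tool, and whether the two tools agree.
CALLS = {
    snp: {tool: stability_call(value) for tool, value in tools.items()}
    for snp, tools in DDG.items()
}
CONCORDANT = {snp: len(set(calls.values())) == 1 for snp, calls in CALLS.items()}

if __name__ == "__main__":
    for snp, calls in CALLS.items():
        print(snp, calls, "concordant" if CONCORDANT[snp] else "discordant")
```

Both predictors give a negative value for all three SNPs, so each is called as decreasing protein stability, although the magnitudes differ between tools.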


In summary, we identified variants in JAK2, NOTCH3, and HIF1A that are associated with breast cancer in a Chinese population through a novel strategy utilizing data derived from targeted sequencing. Additional studies in other populations are warranted to determine whether these variants are associated with breast cancer risk in other ethnicities. Our findings suggest that, through the filtering pipeline described here, additional risk loci associated with cancer could be discovered from next-generation sequencing data.
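The core filtering idea, retaining coding variants that are common among sequenced cases but not in the general population, can be sketched as follows. This Python example is a minimal illustration under stated assumptions: the 1% cutoff follows the text, the choice of `<=` at the population cutoff is an assumption, the `Variant` record and the function `candidate_risk_variants` are hypothetical, and the example records are invented for illustration rather than taken from the study.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    variant_id: str
    case_maf: float        # MAF among the sequenced cases
    population_maf: float  # MAF in a reference database
    is_coding: bool


def candidate_risk_variants(variants: List[Variant], cutoff: float = 0.01) -> List[Variant]:
    """Keep coding variants common in cases but not in the reference population."""
    return [
        v
        for v in variants
        if v.is_coding and v.case_maf > cutoff and v.population_maf <= cutoff
    ]


# Hypothetical records for illustration only; not data from the study.
EXAMPLE = [
    Variant("var_A", 0.021, 0.004, True),   # enriched in cases: kept
    Variant("var_B", 0.150, 0.140, True),   # common everywhere: dropped
    Variant("var_C", 0.004, 0.001, True),   # rare in cases: dropped
    Variant("var_D", 0.030, 0.002, False),  # non-coding: dropped
]

SELECTED = [v.variant_id for v in candidate_risk_variants(EXAMPLE)]

if __name__ == "__main__":
    print("candidates for genotyping:", SELECTED)
```

In a real analysis, the surviving candidates would still need the prioritization and case-control genotyping steps described in the study before any association could be claimed.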

Disclosure of Potential Conflicts of Interest
M.-H. Tan is a CEO and Medical Director at Lucence Diagnostics Pte. Ltd. No potential conflicts of interest were disclosed by the other authors.

Authors' Contributions
Conception and design: A.S.G. Lee
Development of methodology: P. Munusamy, A.S.G. Lee
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): C.H.T. Chan, S.Y. Loke, G.L. Koh, E.S.Y. Wong, H.Y. Law, M.-H. Tan, Y.S. Yap, P. Ang
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): C.H.T. Chan, P. Munusamy, Y.S. Yap, A.S.G. Lee
Writing, review, and/or revision of the manuscript: C.H.T. Chan, P. Munusamy, S.Y. Loke, G.L. Koh, E.S.Y. Wong, C.S. Yoon, Y.S. Yap, P. Ang, A.S.G. Lee
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): H.Y. Law, C.S. Yoon
Study supervision: A.S.G. Lee

Acknowledgments
The authors are grateful to the volunteers who have participated in the study. The authors also thank Dr. C.Y. Wong, Dr. W.S. Yong, Dr. N.S. Wong, Dr. R. Ng, Dr. K.W. Ong, Dr. P. Madhukumar, Dr. C.L. Oey, and Dr. G.H. Ho for referring patients for the study.

Grant Support
This study was supported by a grant from the National Medical Research Council (NMRC) of Singapore (NMRC/CBRG/0034/2013) awarded to A.S.G. Lee and by Centre Grant NMRC support to the National Cancer Centre of Singapore.

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received April 5, 2017; revised June 9, 2017; accepted July 25, 2017; published OnlineFirst August 3, 2017.

References
1. Couch FJ, Wang X, McGuffog L, Lee A, Olswold C, Kuchenbaecker KB, et al. Genome-wide association study in BRCA1 mutation carriers identifies novel loci associated with breast and ovarian cancer risk. PLoS Genet 2013;9:e1003212.
2. Lindstrom S, Thompson DJ, Paterson AD, Li J, Gierach GL, Scott C, et al. Genome-wide association study identifies multiple loci associated with both mammographic density and breast cancer risk. Nat Commun 2014;5:5303.
3. Michailidou K, Beesley J, Lindstrom S, Canisius S, Dennis J, Lush MJ, et al. Genome-wide association analysis of more than 120,000 individuals identifies 15 new susceptibility loci for breast cancer. Nat Genet 2015;47:373–80.
4. Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genotyping identifies 41 new loci associated with breast cancer risk. Nat Genet 2013;45:353–61.
5. Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med 2010;363:166–76.
6. Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 2001;29:308–11.
7. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 2001;409:928–33.
8. Wong ES, Shekar S, Met-Domestici M, Chan C, Sze M, Yap YS, et al. Inherited breast cancer predisposition in Asians: multigene panel testing outcomes from Singapore. NPJ Genomic Medicine 2016;1:15003.
9. Chan M, Chan MW, Loh TW, Law HY, Yoon CS, Than SS, et al. Evaluation of nanofluidics technology for high-throughput SNP genotyping in a clinical setting. J Mol Diagn 2011;13:305–12.
10. Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics 2010;26:589–95.
11. McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing next-generation DNA sequencing data. Genome Res 2010;20:1297–303.
12. Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 2010;38:e164.
13. Sim N-L, Kumar P, Hu J, Henikoff S, Schneider G, Ng PC. SIFT web server: predicting effects of amino acid substitutions on proteins. Nucleic Acids Res 2012;40:W452–7.
14. Adzhubei I, Jordan DM, Sunyaev SR. Predicting functional effect of human missense mutations using PolyPhen-2. Curr Protoc Hum Genet 2013;Chapter 7:Unit 20.
15. Schwarz JM, Cooper DN, Schuelke M, Seelow D. MutationTaster2: mutation prediction for the deep-sequencing age. Nat Methods 2014;11:361–2.
16. Frousios K, Iliopoulos CS, Schlitt T, Simpson MA. Predicting the functional consequences of non-synonymous DNA sequence variants – evaluation of bioinformatics tools and development of a consensus strategy. Genomics 2013;102:223–8.
17. Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. A general framework for estimating the relative pathogenicity of human genetic variants. Nat Genet 2014;46:310–5.
18. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 2015;526:68–74.
19. Lek M, Karczewski KJ, Minikel EV, Samocha KE, Banks E, Fennell T, et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 2016;536:285–91.
20. Pollard KS, Hubisz MJ, Rosenbloom KR, Siepel A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res 2010;20:110–21.
21. Chan M, Ji SM, Liaw CS, Yap YS, Law HY, Yoon CS, et al. Association of common genetic variants with breast cancer risk and clinicopathological characteristics in a Chinese population. Breast Cancer Res Treat 2012;136:209–20.
22. Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, et al. SNPassoc: an R package to perform whole genome association studies. Bioinformatics 2007;23:644–5.
23. Bodenhofer U, Bonatesta E, Horejs-Kainrath C, Hochreiter S. msa: an R package for multiple sequence alignment. Bioinformatics 2015;31:3997–9.
24. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 2011;7:539.
25. Capriotti E, Fariselli P, Casadio R. I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure. Nucleic Acids Res 2005;33:W306–10.
26. Savojardo C, Fariselli P, Martelli PL, Casadio R. INPS-MD: a web server to predict stability of protein variants from sequence and structure. Bioinformatics 2016;32:2542–4.
27. Venselaar H, Te Beek TA, Kuipers RK, Hekkelman ML, Vriend G. Protein structure analysis of mutations causing inheritable diseases. An e-Science approach with life scientist friendly interfaces. BMC Bioinformatics 2010;11:548.
28. Zhong Y, Wu J, Ma R, Cao H, Wang Z, Ding J, et al. Association of Janus kinase 2 (JAK2) polymorphisms with acute leukemia susceptibility. Int J Lab Hematol 2012;34:248–53.
29. Thomas SJ, Snowden JA, Zeidler MP, Danson SJ. The role of JAK/STAT signalling in the pathogenesis, prognosis and treatment of solid tumours. Br J Cancer 2015;113:365–71.


30. Hinds DA, Barnholt KE, Mesa RA, Kiefer AK, Do CB, Eriksson N, et al. Germ line variants predispose to both JAK2 V617F clonal hematopoiesis and myeloproliferative neoplasms. Blood 2016;128:1121–8.
31. Bodian DL, McCutcheon JN, Kothiyal P, Huddleston KC, Iyer RK, Vockley JG, et al. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One 2014;9:e94554.
32. Henderson BE, Lee NH, Seewaldt V, Shen H. The influence of race and ethnicity on the biology of cancer. Nat Rev Cancer 2012;12:648–53.
33. Artavanis-Tsakonas S, Rand MD, Lake RJ. Notch signaling: cell fate control and signal integration in development. Science 1999;284:770–6.
34. Farnie G, Clarke RB. Mammary stem cells and breast cancer–role of Notch signalling. Stem Cell Rev 2007;3:169–75.
35. Choy L, Hagenbeek T, Solon M, French DM, Finkle D, Shelton A, et al. Constitutive NOTCH3 signaling promotes the growth of basal breast cancers. Cancer Res 2017;77:1439–52.
36. Zhang Z, Wang H, Ikeda S, Fahey F, Bielenberg D, Smits P, et al. Notch3 in human breast cancer cell lines regulates osteoblast-cancer cell interactions and osteolytic bone metastasis. Am J Pathol 2010;177:1459–69.
37. Pradeep C-R, Köstler WJ, Lauriola M, Granit R, Zhang F, Jacob-Hirsch J, et al. Modeling ductal carcinoma in situ: a HER2-Notch3 collaboration enables luminal filling. Oncogene 2012;31:907–17.
38. Gladek I, Ferdin J, Horvat S, Calin GA, Kunej T. HIF1A gene polymorphisms and human diseases: graphical review of 97 association studies. Genes Chromosomes Cancer 2017;56:439–52.
39. Semenza GL. Regulation of mammalian O2 homeostasis by hypoxia-inducible factor 1. Annu Rev Cell Dev Biol 1999;15:551–78.
40. Zhong H, De Marzo AM, Laughner E, Lim M, Hilton DA, Zagzag D, et al. Overexpression of hypoxia-inducible factor 1alpha in common human cancers and their metastases. Cancer Res 1999;59:5830–5.
41. Bos R, Zhong H, Hanrahan CF, Mommers ECM, Semenza GL, Pinedo HM, et al. Levels of hypoxia-inducible factor-1α during breast carcinogenesis. J Natl Cancer Inst 2001;93:309–14.
42. Generali D, Berruti A, Brizzi MP, Campo L, Bonardi S, Wigfield S, et al. Hypoxia-inducible factor-1alpha expression predicts a poor response to primary chemoendocrine therapy and disease-free survival in primary human breast cancer. Clin Cancer Res 2006;12:4562–8.
43. Whelan KA, Schwab LP, Karakashev SV, Franchetti L, Johannes GJ, Seagroves TN, et al. The oncogene HER2/neu (ERBB2) requires the hypoxia-inducible factor HIF-1 for mammary tumor growth and anoikis resistance. J Biol Chem 2013;288:15865–77.
44. Karakashev SV, Reginato MJ. Hypoxia/HIF1alpha induces lapatinib resistance in ERBB2-positive breast cancer cells via regulation of DUSP2. Oncotarget 2015;6:1967–80.
45. Laughner E, Taghavi P, Chiles K, Mahon PC, Semenza GL. HER2 (neu) signaling increases the rate of hypoxia-inducible factor 1alpha (HIF-1alpha) synthesis: novel mechanism for HIF-1-mediated vascular endothelial growth factor expression. Mol Cell Biol 2001;21:3995–4004.
46. Walters-Sen LC, Hashimoto S, Thrush DL, Reshmi S, Gastier-Foster JM, Astbury C, et al. Variability in pathogenicity prediction programs: impact on clinical diagnostics. Mol Genet Genomic Med 2015;3:99–110.
47. Miosge LA, Field MA, Sontani Y, Cho V, Johnson S, Palkova A, et al. Comparison of predicted and actual consequences of missense mutations. Proc Natl Acad Sci 2015;112:E5189–98.
48. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet 2012;13:135–45.
49. Permuth JB, Pirie A, Ann Chen Y, Lin HY, Reid BM, Chen Z, et al. Exome genotyping arrays to identify rare and low frequency variants associated with epithelial ovarian cancer risk. Hum Mol Genet 2016;25:3600–12.
50. Jin G, Zhu M, Yin R, Shen W, Liu J, Sun J, et al. Low-frequency coding variants at 6p21.33 and 20q11.21 are associated with lung cancer risk in Chinese populations. Am J Hum Genet 2015;96:832–40.


Cancer Res 2017;77:5428-5437. Published OnlineFirst August 3, 2017.
Claire Hian Tzer Chan, Prabhakaran Munusamy, Sau Yeen Loke, et al. Identification of Novel Breast Cancer Risk Loci.

Updated version: access the most recent version of this article at doi: 10.1158/0008-5472.CAN-17-0992
Supplementary material: access the most recent supplemental material at http://cancerres.aacrjournals.org/content/suppl/2017/08/03/0008-5472.CAN-17-0992.DC1

   

   
