12
General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from orbit.dtu.dk on: Sep 17, 2020 Klinefelter syndrome comorbidities linked to increased X chromosome gene dosage and altered protein interactome activity Belling, Kirstine González-Izarzugaza; Russo, Francesco; Jensen, Anders Boeck; Dalgaard, Marlene Danner; Westergaard, David; Rajpert-De Meyts, Ewa; Skakkebæk, Niels E.; Juul, Anders Christian; Brunak, Søren Published in: Human Molecular Genetics Link to article, DOI: 10.1093/hmg/ddx014 Publication date: 2017 Document Version Publisher's PDF, also known as Version of record Link back to DTU Orbit Citation (APA): Belling, K. G-I., Russo, F., Jensen, A. B., Dalgaard, M. D., Westergaard, D., Rajpert-De Meyts, E., Skakkebæk, N. E., Juul, A. C., & Brunak, S. (2017). Klinefelter syndrome comorbidities linked to increased X chromosome gene dosage and altered protein interactome activity. Human Molecular Genetics, 26(7), 1219-1229. https://doi.org/10.1093/hmg/ddx014

Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors andor other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights

Users may download and print one copy of any publication from the public portal for the purpose of private study or research

You may not further distribute the material or use it for any profit-making activity or commercial gain

You may freely distribute the URL identifying the publication in the public portal If you believe that this document breaches copyright please contact us providing details and we will remove access to the work immediately and investigate your claim

Downloaded from orbitdtudk on Sep 17 2020

Klinefelter syndrome comorbidities linked to increased X chromosome gene dosageand altered protein interactome activity

Belling Kirstine Gonzaacutelez-Izarzugaza Russo Francesco Jensen Anders Boeck Dalgaard MarleneDanner Westergaard David Rajpert-De Meyts Ewa Skakkebaeligk Niels E Juul Anders ChristianBrunak Soslashren

Published inHuman Molecular Genetics

Link to article DOI101093hmgddx014

Publication date2017

Document VersionPublishers PDF also known as Version of record

Link back to DTU Orbit

Citation (APA)Belling K G-I Russo F Jensen A B Dalgaard M D Westergaard D Rajpert-De Meyts E SkakkebaeligkN E Juul A C amp Brunak S (2017) Klinefelter syndrome comorbidities linked to increased X chromosomegene dosage and altered protein interactome activity Human Molecular Genetics 26(7) 1219-1229httpsdoiorg101093hmgddx014

O R I G I N A L A R T I C L E

Klinefelter syndrome comorbidities linked to increased

X chromosome gene dosage and altered protein

interactome activityKirstine Belling1 Francesco Russo1 Anders B Jensen1Marlene D Dalgaard2 David Westergaard1 Ewa Rajpert-De Meyts34Niels E Skakkebaeligk34 Anders Juul34 and Soslashren Brunak1

1Novo Nordisk Foundation Center for Protein Research Faculty of Health and Medical Sciences University ofCopenhagen Copenhagen Denmark 2Center for Biological Sequence Analysis Department of SystemsBiology Technical University of Denmark Lyngby Denmark 3Department of Growth and ReproductionRigshospitalet University of Copenhagen Copenhagen Denmark and 4International Research and ResearchTraining Centre in Endocrine Disruption of Male Reproduction and Child Health (EDMaRC) RigshospitaletUniversity of Copenhagen Copenhagen Denmark

To whom correspondence should be addressed at Novo Nordisk Foundation Center for Protein Research Faculty of Health and Medical SciencesUniversity of Copenhagen Blegdamsvej 3B DK-2200 Copenhagen Denmark Tel thorn45 35325000 Fax thorn45 35325001 Email kirstinebellingcprkudk

AbstractKlinefelter syndrome (KS) (47XXY) is the most common male sex chromosome aneuploidy Diagnosis and clinicalsupervision remain a challenge due to varying phenotypic presentation and insufficient characterization of the syndromeHere we combine health data-driven epidemiology and molecular level systems biology to improve the understanding of KSand the molecular interplay influencing its comorbidities In total 78 overrepresented KS comorbidities were identified usingin- and out-patient registry data from the entire Danish population covering 68 million individuals The comorbidities ex-tracted included both clinically well-known (eg infertility and osteoporosis) and still less established KS comorbidities (egpituitary gland hypofunction and dental caries) Several systems biology approaches were applied to identify key molecularplayers underlying KS comorbidities Identification of co-expressed modules as well as central hubs and gene dosage per-turbed protein complexes in a KS comorbidity network build from known disease proteins and their proteinndashprotein inter-actions The systems biology approaches together pointed to novel aspects of KS disease phenotypes including perturbed Jak-STAT pathway dysregulated genes important for disturbed immune system (IL4) energy balance (POMC and LEP) anderythropoietin signalling in KS We present an extended epidemiological study that links KS comorbidities to the molecularlevel and identify potential causal players in the disease biology underlying the identified comorbidities

Received October 25 2016 Revised December 20 2016 Accepted December 21 2016

VC The Author 2017 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicensesby-nc40) which permits non-commercial re-use distribution and reproduction in any medium provided the original work is properly citedFor commercial re-use please contact journalspermissionsoupcom

1219

Human Molecular Genetics 2017 Vol 26 No 7 1219ndash1229

doi 101093hmgddx014Original Article

IntroductionKlinefelter syndrome (KS) (47XXY) is the most common malesex chromosome aneuploidy with a prevalence of approxi-mately 1 in 650 males Diagnosis of KS and prevention andtreatment of its comorbidities keep being a challenge It is esti-mated that only 25 of KS males are ever diagnosed due tovariable phenotype and possibly insufficient characterisationof the syndrome Many KS males are only diagnosed in adultlife due to infertility (1)

The presence of an extra chromosome can cause higher le-vels of gene expression and gene products in amounts affectedby regulation at different levels protein degradation andmodification (2) Also the interaction pattern of the proteinsencoded by the additional chromosome can influence theseverity of a trisomy (3) In the case of KS X chromosome in-activation (XCI) counterweights the extra dosage from themany X chromosome genes (n 2000) Still roughly 15 of X-linked genes escape XCI including genes in the two pseudoau-tosomal regions (PARs) (4)

KS males present certain shared phenotypes such as smalltestes tall stature narrow shoulders broad hips sparse bodyhair gynecomastia and decreased verbal intelligence KS malesalso have increased risk of a wide range of additional disorderscompared to the background population ie comorbiditiesTesticular dysfunction is prevalent in KS It is estimated that KScauses 1ndash3 of all male infertility cases (5) In fact 10 of allmale patients with azoospermia are likely to be caused by KSNumerous KS comorbidities are related to hypogonadismosteoporosis cognitive disorders metabolic syndrome and dia-betes and cardiac disease Consequently KS males are allocatedlifelong testosterone replacement therapy from puberty Yetthe therapy does not counteract or prevent all comorbidities Itis still unclear whether the chromosome abnormality adds anextra testosterone-independent facet to the complexity of KScomorbidities (6)

Comorbidities have been of interest for many years in epi-demiology studies In recent years more data-drivenapproaches have linked molecular aetiology to observed dis-ease correlations extracted from electronic patient recordsStudying disease comorbidities using a systems biology ap-proach potentially enhances the understanding of disease ori-gin and progression and the longitudinal interplay betweencomorbidities (78) Genes associated with the same diseasetend to be co-expressed and work together in protein com-plexes (9ndash12) Thus studying co-expressed genes and proteincomplexes can give further functional insight into the biolo-gical transformation between healthy and diseased individ-uals and potentially further improve disease diagnosistreatment and target discovery (13)

Here we present a population-wide study of KS comorbid-ities extracted from data in the Danish National Patient Registry(14) collected over a period of 185 years Gene expression datawas generated from KS patients and controls from peripheralblood and co-expressed modules of high relevance for the KSphenotype were identified A KS comorbidity network was builtbased on the extracted disease-pair frequencies knowndisease-associated proteins and their proteinndashprotein inter-actions (PPIs) Disease distances and nodes central for the inter-play between diseases were subsequently explored in thenetwork The various analytical approaches used in this studypoint out the same disturbed functionalities and therebystrengthen their relevance for KS

ResultsKlinefelter syndrome comorbidities

KS comorbidities were systematically extracted from the DanishNational Patient Registry (14) using the International Classificationof Diseases version 10 (ICD-10) level-3 codes checking for co-occurrence of the KS code Q98 with all other disease codes (seelsquoMaterials and Methodsrsquo for details) Comorbidities were defined asdiseases significantly diagnosed a minimum of 50 more fre-quently in men with KS compared to controls ie relative risk(RR) 15 P-values were obtained using the Fisherrsquos test andBenjamini-Hochberg correction was applied to obtain a false dis-covery rate (FDR) of 5 corresponding to P 00082 A total of 78significant KS comorbidities was identified with a minimum of fiveKS patients diagnosed (Supplementary Material Table S1) Figure 1displays the KS comorbidities by ICD-10 chapters (short abbrevi-ations for chapter titles in Supplementary Material Table S2)Many well-known KS comorbidities were as expected observed inthe patient records including E29 lsquotesticular dysfunctionrsquo (RRfrac14 10188) M81 lsquoosteoporosis without pathological fractionrsquo (RRfrac14 12433)N62 lsquohypertrophy of the breastrsquo (RRfrac14 8252) F79 lsquounspecified men-tal retardationrsquo (RRfrac14 6907) E14 lsquounspecified diabetes mellitusrsquo(RRfrac14 6332) Q53 lsquoundescended testiclersquo (RRfrac14 4227) E66 lsquoobesityrsquo(RRfrac14 4038) and G40 lsquoepilepsyrsquo (RRfrac14 3928) (15ndash17) Less establishedKS comorbidities were also observed eg E23 lsquohypofunction andother disorders of the pituitary glandrsquo (RRfrac14 8369) K02 lsquodental car-iesrsquo (RRfrac14 4816) and K07 lsquodentofacial anomalities [including mal-occlusion]rsquo (RRfrac14 384) All comorbid level-3 codes were checked forobservations representing phenotypes diverging in opposite direc-tions as level-4 codes Only one such case was identified E29 lsquotes-ticular dysfunctionrsquo (RRfrac14 10188) that includes both E290lsquotesticular hyperfunctionrsquo and E291 lsquotesticular hypofunctionrsquo ofwhich only the latter was found comorbid with KS (RRfrac14 11767)

Differentially expressed genes in peripheral blood

Blood samples from eight KS patients and eight controls were ex-pression profiled using microarrays A total of 12947 genes werefound to be expressed in at least one of the 16 samples (intensitycut off 727) In this gene set we found an enrichment of genesencoded by chromosomes 4 (Pfrac14 00232) X (Pfrac14 0017) and Y(Pfrac14 2483e-05) (Fisherrsquos exact test P 005) (SupplementaryMaterial Fig S1) Statistical testing identified 363 differentially ex-pressed genes (146 up- and 217 down-regulated) considering anabsolute log2 fold change (lfc) 15 and a BenjaminindashHochberg-corrected FDR of 5 corresponding to P 00069 (SupplementaryMaterial Table S3)

The most significant differentially expressed and upregu-lated KS gene was the long non-coding RNA X inactive specifictranscript (XIST) (lfcfrac14 841) A total of 22 X chromosome geneswas deregulated in KS compared to controls 16 were up-regulated (AKAP17A ASMTL CSF2RA EIF1AX EIF2S3 GPR82GTPBP6 IL3RA PLCXD1 PPP2R3B PRKX RP11-EPO6O153 SEPT6SLC25A6 TMSB4X and XIST) and six were down-regulated(BEND2 BEX1 COX7B FOXO4 NHS and TFE3) Some of thesegenes were expressed from pseudoautosomal region 1 (PAR1)AKAP17A ASMTL CSF2RA GTPBP6 IL3RA and PLCXD1 but nodifferentially expressed genes were observed encoded frompseudoautosomal region 2 An interesting observation was theup-regulation of another long non-coding RNA RP11-706O153The function of this gene is still unknown but it is located nearPAR1 at Xp2233 and thus potentially escapes XCI andor mightbe involved in the XCI process itself

1220 | Human Molecular Genetics 2017 Vol 26 No 7

Gene set enrichment analysis (GSEA) was performed to gainfunctional understanding of the deregulated genes (nfrac14 363) Themost enriched biological process terms were lsquoimmune responsersquoand lsquoresponse to bacteriumrsquo and the most enriched pathwayswere lsquoCytokinendashcytokine receptor interactionrsquo and lsquoJak-STAT sig-nalling pathwayrsquo (full list in Supplementary Material Table S4)The lsquoCytokinendashcytokine receptor interactionrsquo pathway included18 deregulated genes (seven up-regulated CCR7 CSF2RA IFNA4IL3RA IL6ST IL7R and TNFRSF25 and 11 down-regulated CXCL1CXCL16 CXCL2 CXCR1 EPOR IL1R1 IL4 INHBB LEP LEPR and

OSM) Multiple genes are involved in both the lsquoCytokinendashcytokinereceptor interactionrsquo and the lsquoJak-STAT signalling pathwayrsquo(CSF2RA EPOR IFNA4 IL3RA IL4 IL6ST IL7R LEP LEPR and OSM)

Co-expressed gene modules correlating with Klinefeltersyndrome status

A total of 54 co-expressed modules containing genes with simi-lar expression patterns was identified by applying weightedgene co-expression network analysis (WGCNA) (18) (see

Figure 1 Klinefelter syndrome (KS) comorbidities extracted from Danish patient records (nfrac1478) Comorbidities are summarised by disease chapter and each dot repre-

sents a KS comorbidity with a certain relative risk (RR) The disease chapter titles are shortened (see Supplementary Material Table S2) There were no comorbidities

between KS and disorders from chapters 2 7 8 16 and 18ndash21 See Supplementary Material Table S1 for the full list of KS comorbidities and RRs

1221Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 2 Gene co-expression modules correlating with Klinefelter syndrome (KS) status (A) Gene co-expression analysis dendrogram showing co-expressed gene mo-

dules identified using Weighted gene co-expression network analysis (WGCNA) (nfrac1454) The top colour bar states expression correlation with KS status (red positive

correlation blue negative correlation) The bar below informs about module membership In total 27 modules correlated significantly with KS status and reported

here as coloured across the dendrogram (B) Scaled expression per individual of the 99 genes in the most positively correlated module with KS status M1saddlebrown

(C) Scaled expression per individual of the 81 genes in the most negatively correlated module with KS status M2skyblue3

1222 | Human Molecular Genetics 2017 Vol 26 No 7

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 2: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

O R I G I N A L A R T I C L E

Klinefelter syndrome comorbidities linked to increased

X chromosome gene dosage and altered protein

interactome activityKirstine Belling1 Francesco Russo1 Anders B Jensen1Marlene D Dalgaard2 David Westergaard1 Ewa Rajpert-De Meyts34Niels E Skakkebaeligk34 Anders Juul34 and Soslashren Brunak1

1Novo Nordisk Foundation Center for Protein Research Faculty of Health and Medical Sciences University ofCopenhagen Copenhagen Denmark 2Center for Biological Sequence Analysis Department of SystemsBiology Technical University of Denmark Lyngby Denmark 3Department of Growth and ReproductionRigshospitalet University of Copenhagen Copenhagen Denmark and 4International Research and ResearchTraining Centre in Endocrine Disruption of Male Reproduction and Child Health (EDMaRC) RigshospitaletUniversity of Copenhagen Copenhagen Denmark

To whom correspondence should be addressed at Novo Nordisk Foundation Center for Protein Research Faculty of Health and Medical SciencesUniversity of Copenhagen Blegdamsvej 3B DK-2200 Copenhagen Denmark Tel thorn45 35325000 Fax thorn45 35325001 Email kirstinebellingcprkudk

AbstractKlinefelter syndrome (KS) (47XXY) is the most common male sex chromosome aneuploidy Diagnosis and clinicalsupervision remain a challenge due to varying phenotypic presentation and insufficient characterization of the syndromeHere we combine health data-driven epidemiology and molecular level systems biology to improve the understanding of KSand the molecular interplay influencing its comorbidities In total 78 overrepresented KS comorbidities were identified usingin- and out-patient registry data from the entire Danish population covering 68 million individuals The comorbidities ex-tracted included both clinically well-known (eg infertility and osteoporosis) and still less established KS comorbidities (egpituitary gland hypofunction and dental caries) Several systems biology approaches were applied to identify key molecularplayers underlying KS comorbidities Identification of co-expressed modules as well as central hubs and gene dosage per-turbed protein complexes in a KS comorbidity network build from known disease proteins and their proteinndashprotein inter-actions The systems biology approaches together pointed to novel aspects of KS disease phenotypes including perturbed Jak-STAT pathway dysregulated genes important for disturbed immune system (IL4) energy balance (POMC and LEP) anderythropoietin signalling in KS We present an extended epidemiological study that links KS comorbidities to the molecularlevel and identify potential causal players in the disease biology underlying the identified comorbidities

Received October 25 2016 Revised December 20 2016 Accepted December 21 2016

VC The Author 2017 Published by Oxford University PressThis is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (httpcreativecommonsorglicensesby-nc40) which permits non-commercial re-use distribution and reproduction in any medium provided the original work is properly citedFor commercial re-use please contact journalspermissionsoupcom

1219

Human Molecular Genetics 2017 Vol 26 No 7 1219ndash1229

doi 101093hmgddx014Original Article

IntroductionKlinefelter syndrome (KS) (47XXY) is the most common malesex chromosome aneuploidy with a prevalence of approxi-mately 1 in 650 males Diagnosis of KS and prevention andtreatment of its comorbidities keep being a challenge It is esti-mated that only 25 of KS males are ever diagnosed due tovariable phenotype and possibly insufficient characterisationof the syndrome Many KS males are only diagnosed in adultlife due to infertility (1)

The presence of an extra chromosome can cause higher le-vels of gene expression and gene products in amounts affectedby regulation at different levels protein degradation andmodification (2) Also the interaction pattern of the proteinsencoded by the additional chromosome can influence theseverity of a trisomy (3) In the case of KS X chromosome in-activation (XCI) counterweights the extra dosage from themany X chromosome genes (n 2000) Still roughly 15 of X-linked genes escape XCI including genes in the two pseudoau-tosomal regions (PARs) (4)

KS males present certain shared phenotypes such as smalltestes tall stature narrow shoulders broad hips sparse bodyhair gynecomastia and decreased verbal intelligence KS malesalso have increased risk of a wide range of additional disorderscompared to the background population ie comorbiditiesTesticular dysfunction is prevalent in KS It is estimated that KScauses 1ndash3 of all male infertility cases (5) In fact 10 of allmale patients with azoospermia are likely to be caused by KSNumerous KS comorbidities are related to hypogonadismosteoporosis cognitive disorders metabolic syndrome and dia-betes and cardiac disease Consequently KS males are allocatedlifelong testosterone replacement therapy from puberty Yetthe therapy does not counteract or prevent all comorbidities Itis still unclear whether the chromosome abnormality adds anextra testosterone-independent facet to the complexity of KScomorbidities (6)

Comorbidities have been of interest for many years in epi-demiology studies In recent years more data-drivenapproaches have linked molecular aetiology to observed dis-ease correlations extracted from electronic patient recordsStudying disease comorbidities using a systems biology ap-proach potentially enhances the understanding of disease ori-gin and progression and the longitudinal interplay betweencomorbidities (78) Genes associated with the same diseasetend to be co-expressed and work together in protein com-plexes (9ndash12) Thus studying co-expressed genes and proteincomplexes can give further functional insight into the biolo-gical transformation between healthy and diseased individ-uals and potentially further improve disease diagnosistreatment and target discovery (13)

Here we present a population-wide study of KS comorbid-ities extracted from data in the Danish National Patient Registry(14) collected over a period of 185 years Gene expression datawas generated from KS patients and controls from peripheralblood and co-expressed modules of high relevance for the KSphenotype were identified A KS comorbidity network was builtbased on the extracted disease-pair frequencies knowndisease-associated proteins and their proteinndashprotein inter-actions (PPIs) Disease distances and nodes central for the inter-play between diseases were subsequently explored in thenetwork The various analytical approaches used in this studypoint out the same disturbed functionalities and therebystrengthen their relevance for KS

ResultsKlinefelter syndrome comorbidities

KS comorbidities were systematically extracted from the DanishNational Patient Registry (14) using the International Classificationof Diseases version 10 (ICD-10) level-3 codes checking for co-occurrence of the KS code Q98 with all other disease codes (seelsquoMaterials and Methodsrsquo for details) Comorbidities were defined asdiseases significantly diagnosed a minimum of 50 more fre-quently in men with KS compared to controls ie relative risk(RR) 15 P-values were obtained using the Fisherrsquos test andBenjamini-Hochberg correction was applied to obtain a false dis-covery rate (FDR) of 5 corresponding to P 00082 A total of 78significant KS comorbidities was identified with a minimum of fiveKS patients diagnosed (Supplementary Material Table S1) Figure 1displays the KS comorbidities by ICD-10 chapters (short abbrevi-ations for chapter titles in Supplementary Material Table S2)Many well-known KS comorbidities were as expected observed inthe patient records including E29 lsquotesticular dysfunctionrsquo (RRfrac14 10188) M81 lsquoosteoporosis without pathological fractionrsquo (RRfrac14 12433)N62 lsquohypertrophy of the breastrsquo (RRfrac14 8252) F79 lsquounspecified men-tal retardationrsquo (RRfrac14 6907) E14 lsquounspecified diabetes mellitusrsquo(RRfrac14 6332) Q53 lsquoundescended testiclersquo (RRfrac14 4227) E66 lsquoobesityrsquo(RRfrac14 4038) and G40 lsquoepilepsyrsquo (RRfrac14 3928) (15ndash17) Less establishedKS comorbidities were also observed eg E23 lsquohypofunction andother disorders of the pituitary glandrsquo (RRfrac14 8369) K02 lsquodental car-iesrsquo (RRfrac14 4816) and K07 lsquodentofacial anomalities [including mal-occlusion]rsquo (RRfrac14 384) All comorbid level-3 codes were checked forobservations representing phenotypes diverging in opposite direc-tions as level-4 codes Only one such case was identified E29 lsquotes-ticular dysfunctionrsquo (RRfrac14 10188) that includes both E290lsquotesticular hyperfunctionrsquo and E291 lsquotesticular hypofunctionrsquo ofwhich only the latter was found comorbid with KS (RRfrac14 11767)

Differentially expressed genes in peripheral blood

Blood samples from eight KS patients and eight controls were ex-pression profiled using microarrays A total of 12947 genes werefound to be expressed in at least one of the 16 samples (intensitycut off 727) In this gene set we found an enrichment of genesencoded by chromosomes 4 (Pfrac14 00232) X (Pfrac14 0017) and Y(Pfrac14 2483e-05) (Fisherrsquos exact test P 005) (SupplementaryMaterial Fig S1) Statistical testing identified 363 differentially ex-pressed genes (146 up- and 217 down-regulated) considering anabsolute log2 fold change (lfc) 15 and a BenjaminindashHochberg-corrected FDR of 5 corresponding to P 00069 (SupplementaryMaterial Table S3)

The most significant differentially expressed and upregu-lated KS gene was the long non-coding RNA X inactive specifictranscript (XIST) (lfcfrac14 841) A total of 22 X chromosome geneswas deregulated in KS compared to controls 16 were up-regulated (AKAP17A ASMTL CSF2RA EIF1AX EIF2S3 GPR82GTPBP6 IL3RA PLCXD1 PPP2R3B PRKX RP11-EPO6O153 SEPT6SLC25A6 TMSB4X and XIST) and six were down-regulated(BEND2 BEX1 COX7B FOXO4 NHS and TFE3) Some of thesegenes were expressed from pseudoautosomal region 1 (PAR1)AKAP17A ASMTL CSF2RA GTPBP6 IL3RA and PLCXD1 but nodifferentially expressed genes were observed encoded frompseudoautosomal region 2 An interesting observation was theup-regulation of another long non-coding RNA RP11-706O153The function of this gene is still unknown but it is located nearPAR1 at Xp2233 and thus potentially escapes XCI andor mightbe involved in the XCI process itself

1220 | Human Molecular Genetics 2017 Vol 26 No 7

Gene set enrichment analysis (GSEA) was performed to gainfunctional understanding of the deregulated genes (nfrac14 363) Themost enriched biological process terms were lsquoimmune responsersquoand lsquoresponse to bacteriumrsquo and the most enriched pathwayswere lsquoCytokinendashcytokine receptor interactionrsquo and lsquoJak-STAT sig-nalling pathwayrsquo (full list in Supplementary Material Table S4)The lsquoCytokinendashcytokine receptor interactionrsquo pathway included18 deregulated genes (seven up-regulated CCR7 CSF2RA IFNA4IL3RA IL6ST IL7R and TNFRSF25 and 11 down-regulated CXCL1CXCL16 CXCL2 CXCR1 EPOR IL1R1 IL4 INHBB LEP LEPR and

OSM) Multiple genes are involved in both the lsquoCytokinendashcytokinereceptor interactionrsquo and the lsquoJak-STAT signalling pathwayrsquo(CSF2RA EPOR IFNA4 IL3RA IL4 IL6ST IL7R LEP LEPR and OSM)

Co-expressed gene modules correlating with Klinefeltersyndrome status

A total of 54 co-expressed modules containing genes with simi-lar expression patterns was identified by applying weightedgene co-expression network analysis (WGCNA) (18) (see

Figure 1 Klinefelter syndrome (KS) comorbidities extracted from Danish patient records (nfrac1478) Comorbidities are summarised by disease chapter and each dot repre-

sents a KS comorbidity with a certain relative risk (RR) The disease chapter titles are shortened (see Supplementary Material Table S2) There were no comorbidities

between KS and disorders from chapters 2 7 8 16 and 18ndash21 See Supplementary Material Table S1 for the full list of KS comorbidities and RRs

1221Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 2 Gene co-expression modules correlating with Klinefelter syndrome (KS) status (A) Gene co-expression analysis dendrogram showing co-expressed gene mo-

dules identified using Weighted gene co-expression network analysis (WGCNA) (nfrac1454) The top colour bar states expression correlation with KS status (red positive

correlation blue negative correlation) The bar below informs about module membership In total 27 modules correlated significantly with KS status and reported

here as coloured across the dendrogram (B) Scaled expression per individual of the 99 genes in the most positively correlated module with KS status M1saddlebrown

(C) Scaled expression per individual of the 81 genes in the most negatively correlated module with KS status M2skyblue3

1222 | Human Molecular Genetics 2017 Vol 26 No 7

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 3: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

IntroductionKlinefelter syndrome (KS) (47XXY) is the most common malesex chromosome aneuploidy with a prevalence of approxi-mately 1 in 650 males Diagnosis of KS and prevention andtreatment of its comorbidities keep being a challenge It is esti-mated that only 25 of KS males are ever diagnosed due tovariable phenotype and possibly insufficient characterisationof the syndrome Many KS males are only diagnosed in adultlife due to infertility (1)

The presence of an extra chromosome can cause higher le-vels of gene expression and gene products in amounts affectedby regulation at different levels protein degradation andmodification (2) Also the interaction pattern of the proteinsencoded by the additional chromosome can influence theseverity of a trisomy (3) In the case of KS X chromosome in-activation (XCI) counterweights the extra dosage from themany X chromosome genes (n 2000) Still roughly 15 of X-linked genes escape XCI including genes in the two pseudoau-tosomal regions (PARs) (4)

KS males present certain shared phenotypes such as smalltestes tall stature narrow shoulders broad hips sparse bodyhair gynecomastia and decreased verbal intelligence KS malesalso have increased risk of a wide range of additional disorderscompared to the background population ie comorbiditiesTesticular dysfunction is prevalent in KS It is estimated that KScauses 1ndash3 of all male infertility cases (5) In fact 10 of allmale patients with azoospermia are likely to be caused by KSNumerous KS comorbidities are related to hypogonadismosteoporosis cognitive disorders metabolic syndrome and dia-betes and cardiac disease Consequently KS males are allocatedlifelong testosterone replacement therapy from puberty Yetthe therapy does not counteract or prevent all comorbidities Itis still unclear whether the chromosome abnormality adds anextra testosterone-independent facet to the complexity of KScomorbidities (6)

Comorbidities have been of interest for many years in epi-demiology studies In recent years more data-drivenapproaches have linked molecular aetiology to observed dis-ease correlations extracted from electronic patient recordsStudying disease comorbidities using a systems biology ap-proach potentially enhances the understanding of disease ori-gin and progression and the longitudinal interplay betweencomorbidities (78) Genes associated with the same diseasetend to be co-expressed and work together in protein com-plexes (9ndash12) Thus studying co-expressed genes and proteincomplexes can give further functional insight into the biolo-gical transformation between healthy and diseased individ-uals and potentially further improve disease diagnosistreatment and target discovery (13)

Here we present a population-wide study of KS comorbid-ities extracted from data in the Danish National Patient Registry(14) collected over a period of 185 years Gene expression datawas generated from KS patients and controls from peripheralblood and co-expressed modules of high relevance for the KSphenotype were identified A KS comorbidity network was builtbased on the extracted disease-pair frequencies knowndisease-associated proteins and their proteinndashprotein inter-actions (PPIs) Disease distances and nodes central for the inter-play between diseases were subsequently explored in thenetwork The various analytical approaches used in this studypoint out the same disturbed functionalities and therebystrengthen their relevance for KS

ResultsKlinefelter syndrome comorbidities

KS comorbidities were systematically extracted from the DanishNational Patient Registry (14) using the International Classificationof Diseases version 10 (ICD-10) level-3 codes checking for co-occurrence of the KS code Q98 with all other disease codes (seelsquoMaterials and Methodsrsquo for details) Comorbidities were defined asdiseases significantly diagnosed a minimum of 50 more fre-quently in men with KS compared to controls ie relative risk(RR) 15 P-values were obtained using the Fisherrsquos test andBenjamini-Hochberg correction was applied to obtain a false dis-covery rate (FDR) of 5 corresponding to P 00082 A total of 78significant KS comorbidities was identified with a minimum of fiveKS patients diagnosed (Supplementary Material Table S1) Figure 1displays the KS comorbidities by ICD-10 chapters (short abbrevi-ations for chapter titles in Supplementary Material Table S2)Many well-known KS comorbidities were as expected observed inthe patient records including E29 lsquotesticular dysfunctionrsquo (RRfrac14 10188) M81 lsquoosteoporosis without pathological fractionrsquo (RRfrac14 12433)N62 lsquohypertrophy of the breastrsquo (RRfrac14 8252) F79 lsquounspecified men-tal retardationrsquo (RRfrac14 6907) E14 lsquounspecified diabetes mellitusrsquo(RRfrac14 6332) Q53 lsquoundescended testiclersquo (RRfrac14 4227) E66 lsquoobesityrsquo(RRfrac14 4038) and G40 lsquoepilepsyrsquo (RRfrac14 3928) (15ndash17) Less establishedKS comorbidities were also observed eg E23 lsquohypofunction andother disorders of the pituitary glandrsquo (RRfrac14 8369) K02 lsquodental car-iesrsquo (RRfrac14 4816) and K07 lsquodentofacial anomalities [including mal-occlusion]rsquo (RRfrac14 384) All comorbid level-3 codes were checked forobservations representing phenotypes diverging in opposite direc-tions as level-4 codes Only one such case was identified E29 lsquotes-ticular dysfunctionrsquo (RRfrac14 10188) that includes both E290lsquotesticular hyperfunctionrsquo and E291 lsquotesticular hypofunctionrsquo ofwhich only the latter was found comorbid with KS (RRfrac14 11767)

Differentially expressed genes in peripheral blood

Blood samples from eight KS patients and eight controls were ex-pression profiled using microarrays A total of 12947 genes werefound to be expressed in at least one of the 16 samples (intensitycut off 727) In this gene set we found an enrichment of genesencoded by chromosomes 4 (Pfrac14 00232) X (Pfrac14 0017) and Y(Pfrac14 2483e-05) (Fisherrsquos exact test P 005) (SupplementaryMaterial Fig S1) Statistical testing identified 363 differentially ex-pressed genes (146 up- and 217 down-regulated) considering anabsolute log2 fold change (lfc) 15 and a BenjaminindashHochberg-corrected FDR of 5 corresponding to P 00069 (SupplementaryMaterial Table S3)

The most significant differentially expressed and upregu-lated KS gene was the long non-coding RNA X inactive specifictranscript (XIST) (lfcfrac14 841) A total of 22 X chromosome geneswas deregulated in KS compared to controls 16 were up-regulated (AKAP17A ASMTL CSF2RA EIF1AX EIF2S3 GPR82GTPBP6 IL3RA PLCXD1 PPP2R3B PRKX RP11-EPO6O153 SEPT6SLC25A6 TMSB4X and XIST) and six were down-regulated(BEND2 BEX1 COX7B FOXO4 NHS and TFE3) Some of thesegenes were expressed from pseudoautosomal region 1 (PAR1)AKAP17A ASMTL CSF2RA GTPBP6 IL3RA and PLCXD1 but nodifferentially expressed genes were observed encoded frompseudoautosomal region 2 An interesting observation was theup-regulation of another long non-coding RNA RP11-706O153The function of this gene is still unknown but it is located nearPAR1 at Xp2233 and thus potentially escapes XCI andor mightbe involved in the XCI process itself

1220 | Human Molecular Genetics 2017 Vol 26 No 7

Gene set enrichment analysis (GSEA) was performed to gainfunctional understanding of the deregulated genes (nfrac14 363) Themost enriched biological process terms were lsquoimmune responsersquoand lsquoresponse to bacteriumrsquo and the most enriched pathwayswere lsquoCytokinendashcytokine receptor interactionrsquo and lsquoJak-STAT sig-nalling pathwayrsquo (full list in Supplementary Material Table S4)The lsquoCytokinendashcytokine receptor interactionrsquo pathway included18 deregulated genes (seven up-regulated CCR7 CSF2RA IFNA4IL3RA IL6ST IL7R and TNFRSF25 and 11 down-regulated CXCL1CXCL16 CXCL2 CXCR1 EPOR IL1R1 IL4 INHBB LEP LEPR and

OSM) Multiple genes are involved in both the lsquoCytokinendashcytokinereceptor interactionrsquo and the lsquoJak-STAT signalling pathwayrsquo(CSF2RA EPOR IFNA4 IL3RA IL4 IL6ST IL7R LEP LEPR and OSM)

Co-expressed gene modules correlating with Klinefeltersyndrome status

A total of 54 co-expressed modules containing genes with simi-lar expression patterns was identified by applying weightedgene co-expression network analysis (WGCNA) (18) (see

Figure 1 Klinefelter syndrome (KS) comorbidities extracted from Danish patient records (nfrac1478) Comorbidities are summarised by disease chapter and each dot repre-

sents a KS comorbidity with a certain relative risk (RR) The disease chapter titles are shortened (see Supplementary Material Table S2) There were no comorbidities

between KS and disorders from chapters 2 7 8 16 and 18ndash21 See Supplementary Material Table S1 for the full list of KS comorbidities and RRs

1221Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 2 Gene co-expression modules correlating with Klinefelter syndrome (KS) status (A) Gene co-expression analysis dendrogram showing co-expressed gene mo-

dules identified using Weighted gene co-expression network analysis (WGCNA) (nfrac1454) The top colour bar states expression correlation with KS status (red positive

correlation blue negative correlation) The bar below informs about module membership In total 27 modules correlated significantly with KS status and reported

here as coloured across the dendrogram (B) Scaled expression per individual of the 99 genes in the most positively correlated module with KS status M1saddlebrown

(C) Scaled expression per individual of the 81 genes in the most negatively correlated module with KS status M2skyblue3

1222 | Human Molecular Genetics 2017 Vol 26 No 7

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 4: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

Gene set enrichment analysis (GSEA) was performed to gainfunctional understanding of the deregulated genes (nfrac14 363) Themost enriched biological process terms were lsquoimmune responsersquoand lsquoresponse to bacteriumrsquo and the most enriched pathwayswere lsquoCytokinendashcytokine receptor interactionrsquo and lsquoJak-STAT sig-nalling pathwayrsquo (full list in Supplementary Material Table S4)The lsquoCytokinendashcytokine receptor interactionrsquo pathway included18 deregulated genes (seven up-regulated CCR7 CSF2RA IFNA4IL3RA IL6ST IL7R and TNFRSF25 and 11 down-regulated CXCL1CXCL16 CXCL2 CXCR1 EPOR IL1R1 IL4 INHBB LEP LEPR and

OSM) Multiple genes are involved in both the lsquoCytokinendashcytokinereceptor interactionrsquo and the lsquoJak-STAT signalling pathwayrsquo(CSF2RA EPOR IFNA4 IL3RA IL4 IL6ST IL7R LEP LEPR and OSM)

Co-expressed gene modules correlating with Klinefeltersyndrome status

A total of 54 co-expressed modules containing genes with simi-lar expression patterns was identified by applying weightedgene co-expression network analysis (WGCNA) (18) (see

Figure 1 Klinefelter syndrome (KS) comorbidities extracted from Danish patient records (nfrac1478) Comorbidities are summarised by disease chapter and each dot repre-

sents a KS comorbidity with a certain relative risk (RR) The disease chapter titles are shortened (see Supplementary Material Table S2) There were no comorbidities

between KS and disorders from chapters 2 7 8 16 and 18ndash21 See Supplementary Material Table S1 for the full list of KS comorbidities and RRs

1221Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 2 Gene co-expression modules correlating with Klinefelter syndrome (KS) status (A) Gene co-expression analysis dendrogram showing co-expressed gene mo-

dules identified using Weighted gene co-expression network analysis (WGCNA) (nfrac1454) The top colour bar states expression correlation with KS status (red positive

correlation blue negative correlation) The bar below informs about module membership In total 27 modules correlated significantly with KS status and reported

here as coloured across the dendrogram (B) Scaled expression per individual of the 99 genes in the most positively correlated module with KS status M1saddlebrown

(C) Scaled expression per individual of the 81 genes in the most negatively correlated module with KS status M2skyblue3

1222 | Human Molecular Genetics 2017 Vol 26 No 7

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 5: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

Figure 2 Gene co-expression modules correlating with Klinefelter syndrome (KS) status (A) Gene co-expression analysis dendrogram showing co-expressed gene mo-

dules identified using Weighted gene co-expression network analysis (WGCNA) (nfrac1454) The top colour bar states expression correlation with KS status (red positive

correlation blue negative correlation) The bar below informs about module membership In total 27 modules correlated significantly with KS status and reported

here as coloured across the dendrogram (B) Scaled expression per individual of the 99 genes in the most positively correlated module with KS status M1saddlebrown

(C) Scaled expression per individual of the 81 genes in the most negatively correlated module with KS status M2skyblue3

1222 | Human Molecular Genetics 2017 Vol 26 No 7

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 6: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

lsquoMaterials and Methodsrsquo for details) Of these 27 modules hadexpression signatures that correlated with KS status either posi-tively (higher expressed in KS than controls nfrac14 14) or nega-tively (lower expressed in KS than controls nfrac14 13) All moduleswere assigned a colour and labelled numerically according totheir degree of significant correlation to KS status (Fig 2A andSupplementary Material Table S5)

The most significant and positively correlated modulewas M1saddlebrown (WGCNA student asymptotic t-testPfrac14 112e07) (Fig 2B) This module included 99 genes includingmany X-linked and PAR1 genes AKAP17A CA5B EIF1AX IL3RASCML1 SEPT6 XIST and ZBED1 An interesting gene in the mo-dule was DDX17 that recently was discovered as protein interac-tor of XIST (19) M1 contained 32 of the significantly up-regulated genes in KS identified in the statistical test describedpreviously (Supplementary Material Table S3) GSEA showedthat the M1 genes were involved in the KEGG pathwayslsquoneuroactive ligand-receptor interaction pathwayrsquo and lsquocalciumsignalling pathwayrsquo and several Online Mendelian Inheritancein Man (OMIM) diseases such as lsquoabnormal thyroid hormonemetabolismrsquo lsquoepilepsyrsquo and lsquodeafnessrsquo (Supplementary MaterialTable S5 for complete list)

The most significant negatively correlated module was M2skyblue3 (WGCNA student asymptotic t-test Pfrac14 521e06) (Fig2C) This module included 81 genes none of them located in thePARs M2 contained 38 of the down-regulated genes in KS andGSEA revealed enriched biological processes such as lsquoimmunesystem processrsquo and lsquoresponse to bacteriarsquo and the KEGG path-way lsquoplatelet activationrsquo (Supplementary Material Table S5)Interestingly both WGCNA and the differential expression ana-lysis showed significantly enriched biological terms related tothe immune system These terms contained mainly downregu-lated genes In order to evaluate the overlap between the genesinvolved in these terms within M2 and the deregulated geneswe computed the hypergeometric distribution obtaining a sig-nificant overlap (Pfrac14 335e47) This result showed that theGSEA analysis and the consequent interpretation of the resultswere consistent that is the down-regulation of genes involvedin the immune system could explain some diseases associatedwith KS and its comorbidities

Supplementary Material Figure S2 displays the chromosomaldistribution of genes in each of the 27 modules correlating signifi-cantly with KS status M1 was one of the modules with the high-est X chromosome gene content Yet overall the modulescontained genes spread out genome-wide underlining that hav-ing an extra chromosome changes gene expression globally

Klinefelter syndrome comorbidity network

One of the main aims of this study was to investigate the molecu-lar relationship between KS and its comorbidities and to identifykey players in their crosstalk For this purpose a KS comorbiditynetwork was constructed consisting of disease sub-networkseach covering KS or one of its comorbidities Each disease sub-network was built of associated proteins extracted from disease-gene databases and following connecting with observed PPIsfrom InWeb_InBioMap (InWeb_IM) (11) The disease sub-networks were following merged into the KS comorbidity net-work (see lsquoMaterials and Methodsrsquo and Supplementary MaterialFig S3 for further details) Each of the sub-networks were testedfor increased PPI connectivity compared to random networks ofthe same size and node degree distribution thereby testing theirfunctional relevance under the hypothesis that functionally

related proteins are more likely to interact than unrelated pro-teins (20) It was possible to build disease sub-networks for KSand 29 of its comorbidities and the sub-networks for KS and 22comorbidity subsequently showed significantly high connectivity(P-value cut off 005) (Supplementary Material Table S6)Accordingly the KS comorbidity network covered 23 diseasesand consisted of 299 nodes and 528 edges (Fig 3A)

Network distances between each of the disease pairs in theKS comorbidity network were determined by comparing thenetwork-based distances between the proteins involved in thetwo diseases (21) As expected most diseases had a negative dis-ease separation indicating overlapping network topology Thecomorbidities with most PPI topological overlap (TO) to KS wasQ53 lsquoundescended testiclersquo N46 lsquomale infertilityrsquo and E14 lsquoun-specified diabetes mellitusrsquo (Fig 3B and Supplementary MaterialTable S7) The comorbidity network had the common topologicalcharacteristics of networks from the molecular level biologicaldomain and the KS sub-network did not stand out in any topo-logical measures (see Supplementary Material Text S1 and TextS2 for more details and Supplementary Material Table S8)

Numerous nodes in the network were associated to multipleKS comorbidities In this study we defined two categories of suchnodes comorbidity nodes and comorbidity-linking hubs Comorbiditynodes are associated in disease-gene databases to numerouscomorbidities themselves Comorbidity-linking hubs are centralto multiple comorbidities through their first-order interactionpartners which are associated with multiple comorbiditiesThus the term lsquohubrsquo here refers to the centrality of the node tonumerous diseases and is not referring to the PPI connectivity ofthe node The KS comorbidity network can be considered as thelsquobackbonersquo for tissue-agnostic network and applying gene expres-sion data from different tissues will give tissue-specific informa-tion In this study we applied the gene expression data fromblood presented above This data set covered 174 nodes in thenetwork (Supplementary Material Table S9)

Some comorbidity nodes were associated with numerous codesfrom the same or disease-related ICD-10 chapters eg chapters Iand Q both featuring heart diseases Of more interest werecomorbidity nodes linking diseases affecting diverse pathophysi-ology across tissues Numerous such nodes were present in thenetwork and involved in up to twelve diseases Three nodes link-ing minimum five diseases were significantly down-regulated inKS POMC (lfc frac14 0762) LEP (lfc frac14 0785) and IL4 (lfc frac14 0814)linking seven six and five diseases respectively

Comorbidity-linking hubs were also abundant in the networkie nodes that connect multiple comorbidities through theirfirst order interactions partners The biggest deregulatedcomorbidity-linking hubs were LEP LEPR (lfc frac14 0600) POMCEPOR (lfc frac14 0603) CSF2RA (lfcfrac14 0648) and IL4 connectingnine eight eight seven seven and six diseases respectivelyThus LEP POMC and IL4 were both comorbidity nodes andcomorbidity-linking hubs ie both associated to multiple di-seases themselves and also linking multiple diseases throughtheir first-order interaction partners

Protein complexes with perturbed functionality in KS

Protein complexes ie clusters were identified in the comorbi-dity network In total 18 significant clusters (P 005) were iden-tified with the most significant cluster C1 containing 16proteins CCR2 CSF2RA EPO EPOR GHR IL2 IL23R IL2RA IL4RIL5 IL5RA IL10RA JAK2 LEPR MPL and PKD1 (Fig 3C andSupplementary Material Table S10) These proteins are

1223Human Molecular Genetics 2017 Vol 26 No 7 |

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 7: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

Figure 3 Klinefelter syndrome (KS) comorbidity network disease separation and protein complex with disturbed functionality (A) The global KS comorbidity network

created from disease-associated proteins from knowledge databases linked by observed proteinndashprotein interactions (PPIs) The network represents the sub-networks

for KS and the 22 comorbidities with significant high connectivity Nodes are coloured according to their associated disease chapter similarly to Figure 1 Nodes associ-

ated to diseases belonging to multiple chapters are multi-coloured Node size is scaled to the number of KS comorbidities each node is associated with (B) Disease se-

parations of the 23 diseases represented in the comorbidity network clustered after similarity and only including edges between diseases with overlapping network

topology (C) The most statistical significant cluster in the network The size and colouring of the nodes are similar to in the full network Node border colour represents

significant up- (red) and down-regulation (blue) in KS compared to controls

1224 | Human Molecular Genetics 2017 Vol 26 No 7

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 8: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

involved in three main KEGG pathways lsquoCytokinendashcytokine re-ceptor interactionrsquo lsquoJak-STAT signalling pathwayrsquo and lsquohemato-poietic cell lineagersquo and also related to numerous KScomorbidities such as obesity diabetes mellitus atherosclerosisand respiratory disorders In this cluster CSF2RA LEP and EPORwere deregulated as before mentioned which might cause thiscomplex to have disturbed functionality in KS

DiscussionKS is the most common male sex chromosome aneuploidy KSmales have a 70 increased risk of being hospitalized and have areduced life expectancy by two years compared to normal karyo-type males due to increased disease susceptibility (22) Still KS isunder diagnosed and treatment is most often limited to testoster-one therapy which overcomes some but by far all comorbiditiesIn this study we identified 78 significant KS comorbidities usingpopulation-wide patient record data Up to half of the diseaseswere not previously reported in comorbidity studies on KS comor-bidities (15ndash17) Among the novel and unexpected comorbidityfindings was E23 lsquohypofunction and other disorders of the pituit-ary glandrsquo (RRfrac14 542) Usually KS males present a normal growthhormone-IGF-I axis (23) and increased luteinizing hormone andfollicle-stimulating hormone levels due to their androgen defi-ciency (1724) Our observation could in theory result from in-appropriate ICD-10 registration of luteinizing hormonefollicle-stimulating hormone hyposecretion observed during androgen re-placement therapy which occasionally suppresses gonadotropinsto subnormal levels due to negative feedback Another still less es-tablished comorbidity of KS is dental anomalies although severalcase studies of observed late eruption and dental problems in KSmales have been published (25ndash28) Consequently a conclusionfrom this study is to prioritise dental observation of KS males

XCI compensates extensively for the extra X chromosomegene dosage in KS males and probably increases their chancesof survival compared to other aneuploidies The master effectorof XCI the non-coding RNA XIST was the most significantlyupregulated gene in KS consistent with previous findings(2930) Still it is estimated that approximately 15 of the Xencoded genes escape XCI (31) prompting the characteristic KSphenotypes and comorbidities In this study we identified thegenes escaping XCI We found that the majority of the 363 sig-nificantly deregulated genes were not X encoded Thus havingan extra X chromosome influences gene expression across thegenome and underlines the trans-regulatory effect of the es-capees Interestingly it has been shown that the mere presenceof extra chromosome leads to genomic instability and changesin gene expression in cancer cell lines (32ndash34) These studiesshowed that gain of chromosomes triggers replication stresspromoting genomic instability and possibly contributing totumorigenesis in cell lines Moreover the authors showed a re-markable change in the transcriptional regulation (33) Otherstudies showed that excess expression of one or more Xchromosome genes influences brain development and it hasbeen hypothesized that specific brain-expressed genes that areknown to escape XCI could be responsible for cognitive dis-orders of KS patients (3536) Future work with in vivo analyses isneeded to better understand this phenomenon but the gain ofextra chromosomes seems to be responsible for the observedchanges in the global gene expression

The most enriched biological process term for the 363 diffe-rential expressed genes was lsquoimmune responsersquo This is consistentwith that the lack of testosterone in KS patients enhances cellularand humoral immunity and androgen replacement treatment

may suppress this process (37) Moreover it has been reported thatinfectious diseases cause increased mortality of KS patients (38)

Previously focus has mostly been on single proteinndashdiseaserelationships Yet proteins work in concert and we are only start-ing to fully understand how protein subunits work in a networkperspective The KS comorbidity network emphasises knowledgethat can be further studied as disease interplay at the PPI levelThe three downregulated comorbidity nodes and comorbidity-linking hubs POMC LEP and IL4 are involved in the regulation ofenergy balance and the immune system both known to be af-fected by KS POMC is a hormone precursor for multiple peptidesactive across tissues including adrenocorticotropic hormone thatregulates cortisol release and also the three melanocyte stimu-lating hormones that balance food intake and energy consump-tion amongst others LEP is also a long-term regulator of theenergy balance by suppressing food intake and works throughPOMC-containing neurons (39) IL4 is produced by T-cells andplays an essential role in inducing adaptive immune response(40) Possibly IL4 also plays a role in autoimmune diseases whichKS patients have increased risk of including triggering self-antigen presentation in pancreatic islet cells driving the develop-ment of autoimmune diabetes (41) The comorbidity-linking hubEPOR is the receptor for EPO Testosterone stimulates erythropoi-esis through EPO stimulation (42) Thus the observed hypoandro-genism in KS might explain the down-regulation of EPOR

The combined network biology approaches used in this studyemphasised perturbation of lsquocytokinendashcytokine receptor inter-actionrsquo and lsquoJak-STAT pathwaysrsquo immune system alterationsand disturbed EPO signalling and energy balance through POMCand LEP in KS LEP and EPO appear relevant to the increasedprevalence of at least diabetes and obesity in KS patients (43) andmight be used as biomarkers for the clinical management of KSin the future An interesting observation was that both LEP andLEPR were down-regulated Another possible future treatment ofKS might be to add additional XIST molecules edited to comple-ment the escapees like it was done recently for Trisomy 21 (44)This could lead to silencing of the complete extra X chromosomeAdministering such treatment in foetal life has the potential ofKS males being symptom free because it obstructs the primaryinducers of the KS-associated comorbidities

A potential weakness of this study is that the use of registrydata solely consisting of hospital diagnosis data for each pa-tient There are several factors such as socio-economic factorslifestyle and other drug prescriptions that affect the risk of di-seases However it is difficult to obtain such data for the entirepopulation of a full country Thorough follow-up studies shouldinclude as many as these as possible

As already mentioned a significant fraction of KS patientsare never diagnosed The comorbidity profile of KS described inthis study could potentially be used to identify undiagnosed pa-tients However the use of a disease trajectory approach similarto the one published earlier (4546) where the temporal order ofdiseases is incorporated would presumably be more powerfulcompared to the comorbidity profile described here This will bethe target of further work

Although the expression data from different individuals ingeneral was in good agreement the cohort is admittedly smallNevertheless we pointed out the importance and power of dataintegration approaches in particular the integration of comor-bidities with other biological data types In future studies wewill increase the number of samples in order to further confirmthe results In this work a first attempt has been done providingrelevant results consistent with previous studies and addingnovel biological aspects of KS

1225Human Molecular Genetics 2017 Vol 26 No 7 |

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 9: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

In this study several novel and less clinically established KScomorbidities were identified in an unbiased data-driven waySeveral genes in the identified co-expressed modules networkhubs and clusters may be promising biomarkers for preventionor treatment of common KS comorbidities The data-driven epi-demiological approach presented here can be applied to any an-euploidy and disease in general and shed light on their dosage-provoked comorbidities established as well as non-establishedones pointing at their aetiology and molecular interplay

Materials and MethodsKlinefelter syndrome comorbidities

Comorbidities were identified from the over-occurrence of KSwith another disease X compared to individuals not having KSIn this study we used diagnoses from electronic patient recordsin the Danish National Patient Registry (14) for 68 million pa-tients hospitalized between 1996 and 2014 comprising 90 mil-lion hospital encounters and 87 million unique patient-diagnosis associations at the ICD-10 level-3 (4547)

ICD-10 has a hierarchical structure divided into chaptersblocks level-3 codes and finally level-4 codes which are themost specific Each level-4 code can be lsquoroundedrsquo to its parentlevel-3 code and further to the block or chapter To get largercounts and thereby better statistics we have used level-3 codesThis also reduces a misclassification where an inappropriatelevel-4 code was selected among the level-4 codes under a spe-cific level-3 code

KS comorbidities were extracted using the level-3 diseasecode Q98 lsquoother sex chromosome abnormalities male pheno-type not elsewhere classifiedrsquo but only including patients diag-nosed with level-4 codes Q980 lsquoKS karyotype 47XXYrsquo Q981 lsquoKSmale with more than two X chromosomesrsquo Q982 lsquoKS malewith 46XX karyotypersquo or Q984 lsquoKS unspecifiedrsquo (nfrac14 595)Comorbidities were defined as significantly increased RR of Q98co-occurring with other level-3 ICD-10 codes when compared toa control population Fisherrsquos Exact test was used to assign a P-value to the association using uncorrected counts We definedthe full control population as all patients in the Danish NationalPatient Registry without a KS diagnosis Including all patientsfrom this population as the non-exposed comparison group inthe statistical analysis will give a very skewed number ofexposed (KS) patients versus non-exposed patients Thereforewe sampled a matched background (BG) population including

20 controls per KS patient The choice of 20 control patients perKS patient is a compromise between having enough controls topick up rare diseases but not too many to increase the statis-tical power of the test The controls were matched to have samegender and year of birth as the KS patient

RR is normally defined as

RRfrac14 ethKS w disease XTHORN=ethethKS w disease XTHORNthornethKS wo disease XTHORNTHORNethBG w disease XTHORN=eth BG w disease Xeth THORNthornethBG wo disease XTHORNTHORN

With this definition RR overestimates the association of rarediseases This can be addressed by adding a pseudo count (orcontinuity correction) to each of the four patient groups in thecalculation of the RR For a pseudo count of k RR is defined as

RR frac14 ethKS w disease Xthorn kTHORN=ethethKS w disease Xthorn kTHORN thorn ethKS wo disease Xthorn kTHORNTHORNethBG w disease Xthorn kTHORN=eth BG w disease Xthorn keth THORN thorn ethBG wo disease Xthorn kTHORNTHORN

A study found a pseudo count of 12 to be recommended fora generalization of Fisherrsquos Exact test to N groups (the MantelndashHaenszel procedure) (48) We rounded this to an integer of 1 Forthe comorbidities we required a minimum of five KS patientsdiagnosed with disease X and a RR 15 We utilized BenjaminindashHochberg to limit the FDR to 5 The level-4 ICD-10 subcatego-ries of the extracted KS comorbidities were checked for observa-tions representing phenotypes diverging in opposite directions

Three level-3 codes had phenotypic opposite directed level-4codes E23 lsquoHypofunction and other disorders of pituitary glandrsquoE29 lsquoTesticular dysfunctionrsquo and E34 lsquoOther endocrine dis-ordersrsquo For these three level-3 codes we used the level-4 codesto get more detailed information after seeing that the level-3code was significant

Gene expression data and analysis

Blood samples were available from eight KS males and eightnormal karyotype male controls Total RNA was extracted withQIAamp RNA Blood Mini Kit (QIAGEN Hilden Germany) accor-ding to the manufacturerrsquos protocol RNA quantity and qualitywas determined using NanoDrop (Thermo ScientificWilmington Delaware US) with Bioanalyzer Nano Kit (AgilentTechnologies Santa Clara California US) The samples wereamplified (one round) using the MessageAmp II aRNAAmplification Kit (Applied Biosystems Carlsbad California US)and the aRNA was applied and run on Agilent Human GenomeMicroarrays 44K (Agilent Technologies Santa Clara CaliforniaUS) Hybridization and scanning of the one-color arrays wereperformed according to instructions from the manufacturer(Agilent Technologies Santa Clara California US)

Processing of the gene expression microarray data was per-formed using the R software suite version 331 (wwwr-projectorg) gProcessedSignals were loaded into the limma RBioconductor package (4950) and normalized between arraysusing the quantile normalization procedure (51) Probes were

collapsed for each systematic gene ID by taken the median ex-pression value The age of the KS patients and controls varied(Supplementary Material Table S11) which was corrected forusing the ComBat method (52) implemented in the SurrogateVariable Analysis package (53) A Gaussian mixture model wasemployed to determine a threshold of genuine gene expressionusing the mixtools package (54) The expectation maximizationalgorithm was used to model the log2-transformed intensity sig-nals of the probes using a mixture of three Gaussian distribu-tions Genes with mean of log2-transformed expression valuesover the 95 percentile of two distributions of lowest expressedgenes were considered expressed (cut off 727) and included inthe further analysis The expectation maximization algorithmwas implemented using the normalmixEM function from themixtools R package The limma package (50) was used to per-form differential expression analysis Genes were considereddifferentially expressed if having an absolute fold change 15and an BenjaminindashHochberg-corrected P 005 The raw micro-array data is deposited in ArrayExpress with accession numberE-MTAB-4922 Description of the arrays are available inSupplementary Material Table S11

1226 | Human Molecular Genetics 2017 Vol 26 No 7

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 10: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

Co-expressed gene modules correlating with KS status

The WGCNA package in R was used to define co-expressed genemodules (55) Correlations were estimated using the biweightmid-correlation and a signed weighted correlation network wasused to identify co-expression modules with high TO (18) TheTO is a useful approach to exclude spurious or isolated connec-tions during network construction TO considers each pair ofgenes in relation to all other genes in the network Genes havehigh TO if they are connected to roughly the same group ofgenes in the network (ie they share the same neighbourhood)The TO captures relationships among neighbourhoods of genesand is therefore more robust than pairwise correlation alone forclustering genes by similarity (18)

Modules were defined as branches of a hierarchical clustertree using the top-down dynamic tree cut method (56) The ex-pression patterns of each module were summarized by themodule eigengene defined as the first principal component of agiven module Pairs of modules with high module eigengenecorrelations (rgt 09) were merged A weighted signed networkwas computed based on a fit to scale-free topology A threshol-ding power of 26 was chosen (lowest threshold resulting in ascale-free R2 fit of 08) and the pairwise TO between genes wascalculated which converted pairwise correlation values [11]to TO co-expression values [01] where values close to 1 repre-sented highly shared co-expression neighbourhoods The TOdendrogram was used to define modules using the dynamictree cut method function in WGCNA (55) with a minimum mo-dule size set to 40 genes and the deepSplit parameter set to 2(57) Finally a measure of module-disease and gene-disease cor-relations to KS were calculated Student asymptotic P-valueusing the R function corPvalueStudent of the WGCNA method

The gProfileR package (58) was used to compute the GSEA ofthe differentially expressed genes and the genes in modulesThe background was set to the total list of expressed genesdefined above The hypergeometric distribution was used to es-timate the significance of enriched pathways and processesusing a threshold of FDR 5

Klinefelter syndrome comorbidity associated proteinsand network

In order to identify proteins associated with KS comorbiditieswe created a mapping between the Disease Ontology Database(59) and ICD-10 codes and then used the associations betweendiseases and proteins extracted from the Diseases database(60) Genetics home reference (61) OMIM (62) and the UniProtKnowledgebase (63) Disease sub-networks were built for KSand all extracted comorbidities with associated proteins con-nected with observed physical PPIs We used high confidencePPIs from InWeb_IM (11) using the recommended cut off of con-fidence score 01 A common hypothesis is that proteins asso-ciated to the same disease and thus having some functionalrelationship are more likely to interact than randomly selectedproteins (20) Therefore each disease sub-network was statistic-ally tested for this hypothesis by comparing the observed num-ber of PPIs between the disease proteins compared to 1000random sub-networks of the same size and node degree distri-bution picked randomly in the whole InWeb_IM network of highconfidence interactions P-values were calculated as the propor-tion of random networks with equal or more PPIs than the truenetwork (workflow illustrated in Supplementary Material FigS3) A significance level of P 005 was set as cut off for a func-tional network Disease separations were calculated as

previously published (21) and network topology was investi-gated (see Supplementary Material Text S1) Subsequently theClusterONE algorithm (64) implemented in Cytoscape (65) wasused to extract protein clusters using the default parametersand the InWeb_IM confidence score as edge weights

Supplementary MaterialSupplementary Material is available at HMG online

Conflict of Interest statement None declared

FundingFP7 grant SyBoSS EU 7th Framework (GA N 242129) NovoNordisk Foundation (grant agreement NNF14CC0001)MedBioinformatics EU Horizon 2020 grant (agreement no634143) BioMedBridges EU FP7 Capacities Specific Programmegrant (agreement number 284209) International Research andResearch Training Centre in Endocrine Disruption of MaleReproduction and Child Health (EDMaRC) Funding to pay theOpen Access publication charges for this article was provided byNovo Nordisk Foundation (grant agreement NNF14CC0001)

References1 Bojesen A Juul S and Gravholt CH (2003) Prenatal and

postnatal prevalence of Klinefelter syndrome a nationalregistry study J Clin Endocrinol Metab 88 622ndash626

2 Durrbaum M and Storchova Z (2016) Effects of aneuploidy ongene expression implications for cancer FEBS J 283 791ndash802

3 Kirk I K Weinhold N Belling K Skakkebaeligk N E JensenT S Leffers H Juul A and Brunak S (2017) Chromosome-wise protein interaction patterns and their impact on func-tional implications of large-scale genomic aberrations Cell

Systems 4 1ndash84 Deng X Berletch JB Nguyen DK and Disteche CM

(2014) X chromosome regulation diverse patterns in devel-opment tissues and disease Nat Rev Genet 15 367ndash378

5 Lanfranco F Kamischke A Zitzmann M and Nieschlag E(2004) Klinefelterrsquos syndrome Lancet 364 273ndash283

6 Hoslashst C Skakkebaek A Groth KA and Bojesen A (2014)The role of hypogonadism in Klinefelter Syndrome Asian J

Androl 16 185ndash1917 Barabasi AL Gulbahce N and Loscalzo J (2011) Network

medicine a network-based approach to human disease Nat

Rev Genet 12 56ndash688 Hu JX Thomas CE and Brunak S (2016) Network biology

concepts in complex disease comorbidities Nat Rev Genet17 615ndash629

9 Taylor IW Linding R Warde-Farley D Liu Y PesquitaC Faria D Bull S Pawson T Morris Q and Wrana JL(2009) Dynamic modularity in protein interaction networkspredicts breast cancer outcome Nat Biotechnol 27 199ndash204

10 Lage K (2014) Protein-protein interactions and genetic dis-eases the interactome Biochim Biophys Acta 1842 1971ndash1980

11 Li T Wernersson R Hansen RB Horn H Mercer JMSlodkowicz G Workman C Regina O Rapacki KStaerfeldt HH et al (2016) A scored human protein-proteininteraction network to catalyze genomic interpretation Nat

Methods14 61ndash6412 Ideker T and Sharan R (2008) Protein networks in disease

Genome Res 18 644ndash652

1227Human Molecular Genetics 2017 Vol 26 No 7 |

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 11: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

13 Sun K Goncalves JP Larminie C and Przulj N (2014)Predicting disease associations via biological network ana-lysis BMC Bioinformatics 15 1ndash13

14 Lynge E Sandegaard JL and Rebolj M (2011) The DanishNational Patient Register Scand J Public Health 39 30ndash33

15 Bourke E Herlihy A Snow P Metcalfe S and Amor D(2014) Klinefelter syndrome a general practice perspectiveAustr Family Physician 43 38ndash41

16 Bojesen A and Gravholt CH (2007) Klinefelter syndrome inclinical practice Nat Rev Urol 4 192ndash204

17 Bojesen A and Gravholt CH (2011) Morbidity and mortalityin Klinefelter syndrome (47XXY) Acta Paediatr 100807ndash813

18 Zhang B and Horvath S (2005) A general framework forweighted gene co-expression network analysis Stat ApplGenet Mol Biol 4 Article17

19 Chu C Zhang QC da Rocha ST Flynn RA BharadwajM Calabrese JM Magnuson T Heard E and Chang HY(2015) Systematic discovery of Xist RNA binding proteinsCell 161 404ndash416

20 Goh KI Cusick ME Valle D Childs B Vidal M andBarabasi AL (2007) The human disease network Proc NatlAcad Sci USA 104 8685ndash8690

21 Menche J Sharma A Kitsak M Ghiassian SD Vidal MLoscalzo J and Barabasi AL (2015) Disease networksUncovering disease-disease relationships through the in-complete interactome Science 347 841ndash849

22 Bojesen A Stochholm K Juul S and Gravholt CH (2011)Socioeconomic trajectories affect mortality in Klinefeltersyndrome J Clin Endocrinol Metab 96 2098ndash2104

23 Aksglaede L Skakkebaek NE and Juul A (2008) Abnormalsex chromosome constitution and longitudinal growthserum levels of insulin-like growth factor (IGF)-I IGF bindingprotein-3 luteinizing hormone and testosterone in 109males with 47XXY 47XYY or sex-determining region ofthe Y chromosome (SRY)-positive 46XX karyotypes J ClinEndocrinol Metab 93 169ndash176

24 Aksglaede L Skakkebaek NE Almstrup K and Juul A(2011) Clinical and biological parameters in 166 boys adoles-cents and adults with nonmosaic Klinefelter syndrome aCopenhagen experience Acta Paediatr 100 793ndash806

25 Hunter ML Collard MM Razavi T and Hunter B (2003)Increased primary tooth size in a 47XXY male a first casereport Int J Paediatr Dent 13 271ndash273

26 Schulman GS Redford-Badwal D Poole A Mathieu GBurleson J and Dauser D (2005) Taurodontism and learningdisabilities in patients with Klinefelter syndrome PediatrDent 27 389ndash394

27 Marques-da-Silva B Baratto-Filho F Abuabara A MouraP Losso EM and Moro A (2010) Multiple taurodontismthe challenge of endodontic treatment J Oral Sci 52653ndash658

28 DrsquoAlessandro G Armuzzi L Cocchi G and Piana G (2012)Eruption delay in a 47 XXY male a case report Eur JPaediatr Dent 13 159ndash160

29 Vawter MP Harvey PD and DeLisi LE (2007)Dysregulation of X-linked gene expression in Klinefelterrsquossyndrome and association with verbal cognition Am J MedGenet A 144B 728ndash734

30 Ma Y Li C Gu J Tang F Li C Li P Ping P Yang S LiZ and Jin Y (2012) Aberrant gene expression profiles inpluripotent stem cells induced from fibroblasts of aKlinefelter syndrome patient J Biol Chem 28738970ndash38979

31 Tuttelmann F and Gromoll J (2010) Novel genetic aspectsof Klinefelterrsquos syndrome Mol Hum Reprod 16 386ndash395

32 Passerini V Ozeri-Galai E de Pagter MS Donnelly NSchmalbrock S Kloosterman WP Kerem B andStorchova Z (2016) The presence of extra chromosomesleads to genomic instability Nat Commun 7 10754

33 Sheltzer JM Torres EM Dunham MJ and Amon A(2012) Transcriptional consequences of aneuploidy ProcNatl Acad Sci USA 109 12644ndash12649

34 Durrbaum M Kuznetsova AY Passerini V Stingele SStoehr G and Storchova Z (2014) Unique features of thetranscriptional response to model aneuploidy in humancells BMC Genomics 15 139

35 Wallentin M Skakkebaek A Bojesen A Fedder J LaurbergP Oslashstergaard JR Hertz JM Pedersen AD and GravholtCH (2016) Klinefelter syndrome has increased brain re-sponses to auditory stimuli and motor output but not to vis-ual stimuli or Stroop adaptation Neuroimage Clin 11 239ndash251

36 DeLisi LE Maurizio AM Svetina C Ardekani B SzulcK Nierenberg J Leonard J and Harvey PD (2005)Klinefelterrsquos syndrome (XXY) as a genetic model for psych-otic disorders Am J Med Genet B Neuropsychiatr Genet135B 15ndash23

37 Kocar IH Yesilova Z Ozata M Turan M Sengul A andOzdemir I (2000) The effect of testosterone replacementtreatment on immunological features of patients withKlinefelterrsquos syndrome Clin Exp Immunol 121 448ndash452

38 Bojesen A Juul S Birkebaek N and Gravholt CH (2004)Increased mortality in Klinefelter syndrome J ClinEndocrinol Metab 89 3830ndash3834

39 Ehrlich S Weiss D Burghardt R Infante-Duarte CBrockhaus S Muschler MA Bleich S Lehmkuhl U andFrieling H (2010) Promoter specific DNA methylation and geneexpression of POMC in acutely underweight and recovered pa-tients with anorexia nervosa J Psychiatr Res 44 827ndash833

40 Choi P and Reiser H (1998) IL-4 role in disease and regula-tion of production Clin Exp Immunol 113 317ndash319

41 Mueller R Bradley LM Krahl T and Sarvetnick N (1997)Mechanism underlying counterregulation of autoimmunediabetes by IL-4 Immunity 7 411ndash418

42 Bachman E Travison TG Basaria S Davda MN GuoW Li M Westfall JC Bae H Gordeuk V and Bhasin S(2014) Testosterone induces erythrocytosis via increasederythropoietin and suppressed hepcidin evidence for a newerythropoietinhemoglobin set point J Gerontol A Biol SciMed Sci 69 725ndash735

43 Rocca MS Pecile V Cleva L Speltra E Selice R DiMambro A Foresta C and Ferlin A (2016) The Klinefeltersyndrome is associated with high recurrence of copy num-ber variations on the X chromosome with a potential role inthe clinical phenotype Andrology 4 328ndash334

44 Jiang J Jing Y Cost GJ Chiang JC Kolpa HJ CottonAM Carone DM Carone BR Shivak DA Guschin DYet al (2013) Translating dosage compensation to trisomy 21Nature 500 296ndash300

45 Jensen AB Moseley PL Oprea TI Ellesoslashe SG ErikssonR Schmock H Jensen PB Jensen LJ and Brunak S (2014)Temporal disease trajectories condensed from population-wide registry data covering 62 million patients NatCommun 5 4022

46 Beck MK Jensen AB Nielsen AB Perner A MoseleyPL and Brunak S (2016) Diagnosis trajectories of priormulti-morbidity predict sepsis mortality Sci Rep 6 36624

1228 | Human Molecular Genetics 2017 Vol 26 No 7

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |

Page 12: Klinefelter syndrome comorbidities linked to …Klinefelter syndrome (KS) (47,XXY) is the most common male sex chromosome aneuploidy. Diagnosis and clinical Diagnosis and clinical

47 Roque FS Jensen PB Schmock H Dalgaard MAndreatta M Hansen T Soslasheby K Bredkjaeligr S Juul AWerge T et al (2011) Using electronic patient records to dis-cover disease correlations and stratify patient cohorts PLoS

Comput Biol 7 e100214148 Sweeting MJ Sutton AJ and Lambert PC (2004) What to

add to nothing Use and avoidance of continuity correctionsin meta-analysis of sparse data Stat Med 23 1351ndash1375

49 Gentleman RC Carey VJ Bates DM Bolstad B DettlingM Dudoit S Ellis B Gautier L Ge Y Gentry J et al(2004) Bioconductor open software development for compu-tational biology and bioinformatics Genome Biol 5 R80

50 Ritchie ME Phipson B Wu D Hu Y Law CW Shi Wand Smyth GK (2015) limma powers differential expressionanalyses for RNA-sequencing and microarray studiesNucleic Acids Res 43 e47

51 Smyth GK (2004) Linear models and empirical bayes meth-ods for assessing differential expression in microarray ex-periments Stat Appl Genet Mol Biol 3 Article 3

52 Johnson WE Li C and Rabinovic A (2007) Adjusting batcheffects in microarray expression data using empirical Bayesmethods Biostatistics 8 118ndash127

53 Leek JT Johnson WE Parker HS Jaffe AE and StoreyJD (2012) The sva package for removing batch effects andother unwanted variation in high-throughput experimentsBioinformatics 28 882ndash883

54 Benaglia T Chauveau D Hunter DR and Young DS(2009) mixtools an R Package for Analyzing Finite MixtureModels J Stat Softw 32 1ndash29

55 Langfelder P and Horvath S (2008) WGCNA an R packagefor weighted correlation network analysis BMC

Bioinformatics 9 55956 Langfelder P and Horvath S (2012) Fast R functions for ro-

bust correlations and hierarchical clustering J Stat Softw46 pii i11

57 Voineagu I Wang X Johnston P Lowe JK Tian YHorvath S Mill J Cantor RM Blencowe BJ andGeschwind DH (2011) Transcriptomic analysis of autisticbrain reveals convergent molecular pathology Nature 474380ndash384

58 Reimand J Kull M Peterson H Hansen J and Vilo J(2007) gProfilerndasha web-based toolset for functional profilingof gene lists from large-scale experiments Nucleic Acids Res35 W193ndashW200

59 Kibbe WA Arze C Felix V Mitraka E Bolton E Fu GMungall CJ Binder JX Malone J Vasant D et al (2015)Disease Ontology 2015 update an expanded and updateddatabase of human diseases for linking biomedical know-ledge through disease data Nucleic Acids Res 43D1071ndashD1078

60 Pletscher-Frankild S Palleja A Tsafou K Binder JX andJensen LJ (2015) DISEASES text mining and data integra-tion of disease-gene associations Methods 74 83ndash89

61 Mitchell JA and McCray AT (2003) The Genetics HomeReference a new NLM consumer health resource AMIAAnnu Symp Proc 2003 936

62 Online Mendelian Inheritance in Man OMIMVR - McKusick-Nathans Institute of Genetic Medicine Johns HopkinsUniversity (Baltimore MD) Online Mendelian Inheritance inMan OMIMVR - McKusick-Nathans Institute of GeneticMedicine Johns Hopkins University (Baltimore MD)

63 UniProt Consortium (2015) UniProt a hub for protein infor-mation Nucleic Acids Res 43 D204ndashD212

64 Nepusz T Yu H and Paccanaro A (2012) Detecting over-lapping protein complexes in protein-protein interactionnetworks Nat Methods 9 471ndash472

65 Shannon P Markiel A Ozier O Baliga NS Wang JTRamage D Amin N Schwikowski B and Ideker T (2003)Cytoscape a software environment for integrated models ofbiomolecular interaction networks Genome Res 132498ndash2504

1229Human Molecular Genetics 2017 Vol 26 No 7 |