ACTA UNIVERSITATIS UPSALIENSIS UPPSALA 2019 Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1599 Human leukocyte antigen in sickness and in health Ankylosing spondylitis and HLA in Sweden JESSIKA NORDIN ISSN 1651-6206 ISBN 978-91-513-0760-2 urn:nbn:se:uu:diva-393317

Human leukocyte antigen in sickness and in health (uu.diva-portal.org/smash/get/diva2:1354233/FULLTEXT01.pdf). Nordin, J. 2019. Human leukocyte antigen in sickness and in health. Ankylosing



Dissertation presented at Uppsala University to be publicly examined in Room B41, BMC, Husargatan 3, Uppsala, Thursday, 14 November 2019 at 13:15 for the degree of Doctor of Philosophy (Faculty of Medicine). The examination will be conducted in English. Faculty examiner: Dr. Alison Meynert (MRC Human Genetics Unit, MRC Institute of Genetics & Molecular Medicine, University of Edinburgh).

Abstract

Nordin, J. 2019. Human leukocyte antigen in sickness and in health. Ankylosing spondylitis and HLA in Sweden. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1599. 69 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0760-2.

The human leukocyte antigen (HLA) plays a major role in keeping us healthy, but some of the HLA alleles can contribute to disease susceptibility. One example is HLA-B*27, which confers increased susceptibility to ankylosing spondylitis and represents one of the strongest genetic associations found in any common human disease. Ankylosing spondylitis shows a strong sex ratio skew (2-3:1 male to female) and studies confirm the existence of sexual dimorphism in the presentation of this disease. The genetic predisposition for this, however, has not previously been studied.

A Swedish ankylosing spondylitis population was sequenced with a targeted array to investigate the existence of sex-specific associations. RUNX3 was revealed to be associated in males by a univariate test, while aggregate tests revealed the HLA gene MICB to be associated in females. Functional validation demonstrated that the risk variants in RUNX3 increase expression, and that those in MICB change transcription factor binding sites. Interestingly, since the disease involves bone changes, both RUNX3 and one of the MICB variants had an effect in the bone cell line SaOS-2.

In order to help researchers obtain more controls for HLA analysis, an HLA allele bioresource (SweHLA) was generated from 1,000 Swedish genomes. The alleles were typed with three to four HLA typing software programs and the results were combined by an n-1 methodology. This produced high quality alleles where the bias from each software program was diminished.

The methodology from SweHLA was utilised to study HLA in ankylosing spondylitis. To investigate both sex-specific predisposition and HLA-B*27 independence, samples were subdivided into two populations (one population with mixed HLA-B*27 positive and negative samples and one with only HLA-B*27 positive samples) that in turn were grouped by sex. In the mixed population, several alleles were replicated from previous studies. This study also revealed three female-specific alleles, two of which were new and one that had previously been associated with the severity of radiological changes. The HLA-B*27 population revealed a previously unknown protective allele, HLA-A*24:02. Through deeper examination of the HLA-B*27 population, two amino acids in HLA-A, position 119 in the whole set and position 180 in the male set, were revealed to be protective.

This thesis brings new insight into the genetic predisposition for a sex-skewed disease, demonstrating how sexual dimorphism can be reflected in the genetic predisposition, hopefully leading to more similar studies. It also highlights the importance of methodology and demonstrates the drastic biases that can be imparted by software programs.

Keywords: Disease genetics, Ankylosing spondylitis, HLA typing, Imputation, Inference, Sex-stratified, HLA-B*27 independent, Association tests, Functional validation

Jessika Nordin, Department of Medical Biochemistry and Microbiology, Box 582, Uppsala University, SE-75123 Uppsala, Sweden.

© Jessika Nordin 2019

ISSN 1651-6206
ISBN 978-91-513-0760-2
urn:nbn:se:uu:diva-393317 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-393317)


“With method and logic one can accomplish anything.”

― Hercule Poirot (Agatha Christie)

To all those who made a difference


List of papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Mathioudaki A*, Nordin J*, Karlsson Å, Murén E, Hultin-Rosenberg L, Bianchi M, Eriksson D, Pettersson M, Olsson M, Neumann L, Hartmann A, Farias F.H.G, Dahlqvist J, ImmunoArray Development Consortium, Welander J, Klingberg E, Forsblad-d’Elia H, Rosengren Pielberg G, Kastbom A, Cedergren J, Eriksson P, Söderkvist P, Lindblad-Toh K, Meadows J.R.S.* (2019) The sex-stratified genetic architecture of ankylosing spondylitis. Manuscript.

II Nordin J, Ameur A, Lindblad-Toh K, Gyllensten U, Meadows J.R.S. (2019) SweHLA: the high confidence HLA typing bio-resource drawn from 1,000 Swedish genomes. Published online on bioRxiv, and submitted manuscript.

III Nordin J, Pettersson M, Hultin Rosenberg L, Mathioudaki A, Karlsson Å, Murén E, Tandre K, Rönnblom L, Kastbom A, Cedergren J, Eriksson P, Söderkvist P, Lindblad-Toh K, Meadows J.R.S. (2019) HLA-A confers protection in HLA-B*27 positive ankylosing spondylitis. Submitted manuscript.

*These authors contributed equally to this work.


Author contribution

The papers included in this thesis are the result of collaborative work. To make my contribution clear, a list is provided here:

I Shared first author - responsible for the pipeline to go from fastqs to high quality variants. Performed the aggregate test analysis (including creating the average exon coverage and performing backward elimination), formed hypotheses around the variants revealed to be associated by the test, and generated ideas for functional validation. Calculated allele frequencies, Fisher’s exact test and odds ratios from the replication set. Contributed with text, to the interpretation of MICB, and to revision of the manuscript.

II First author - took a major part in planning, performed all HLA analysis, took a major role in interpreting the results together with co-authors, wrote the first draft, and was partly responsible for revisions of the manuscript.

III First author - took a major part in planning, performed all analysis, took a major role in interpreting the results together with co-authors, wrote the first draft, and was partly responsible for revisions of the manuscript.


Related work by the Author

(Not included in this thesis)

Eriksson D*, Bianchi M*, Landegren N, Nordin J, Dalin F, Mathioudaki A, Eriksson G.N, Hultin-Rosenberg L, Dahlqvist J, Zetterqvist H, Karlsson Å, Hallgren Å, Farias F.H, Murén E, Ahlgren K.M, Lobell A, Andersson G, Tandre K, Dahlqvist S. R, Söderkvist P, Rönnblom L, Hulting A.L, Wahlberg J, Ekwall O, Dahlqvist P, Meadows J.R.S, Bensing S, Lindblad-Toh K, Kämpe O, Rosengren Pielberg G. (2016) Extended exome sequencing identifies BACH2 as a novel major risk locus for Addison's disease. Journal of Internal Medicine, 280(6):595-608.

Ramírez Sepúlveda J.I, Kvarnström M, Eriksson P, Mandl T, Brække Norheim K, Johnsen S.J, Hammenfors D, Jonsson M.V, Skarstein K, Brun J.G, the DISSECT consortium, Rönnblom L, Forsblad-d’Elia H, Magnusson Bucher S, Baecklund E, Theander E, Omdal R, Jonsson R, Nordmark G, Wahren-Herlenius M. (2017) Long-term follow-up in primary Sjögren's syndrome reveals differences in clinical presentation between female and male patients. Biology of Sex Differences, 8:25.

Eriksson D, Bianchi M, Landegren N, Dalin F, Skov J, Hultin-Rosenberg L, Mathioudaki A, Nordin J, Hallgren Å, Andersson G, Tandre K, Rantapää Dahlqvist S, Söderkvist P, Rönnblom L, Hulting A.L, Wahlberg J, Dahlqvist P, Ekwall O, Meadows J.R.S, Lindblad-Toh K, Bensing S, Rosengren Pielberg G, Kämpe O. (2018) Common genetic variation in the autoimmune regulator (AIRE) locus is associated with autoimmune Addison’s disease in Sweden. Scientific Reports, 8(1):8395.

*These authors contributed equally to this work.


Contents

Introduction
  Genetic immune-mediated disease
    Ankylosing spondylitis
  The role of MHC genes
    In sickness and in health
Aim of this thesis
Comments on material and methods
  Study populations
  The path from DNA to variants
    Reading the DNA
    Pipeline: from targeted sequencing to high quality variants
    Pipeline: from fastq to high quality HLA alleles
  Associations and where to find them
    A test for each occasion
    Prioritisation of variants
    How to hypothesise
Results and discussion
  Paper I: The sex-stratified genetic architecture of ankylosing spondylitis
  Paper II: SweHLA: the high confidence HLA typing bio-resource drawn from 1,000 Swedish genomes
  Paper III: HLA-A confers protection in HLA-B*27 positive ankylosing spondylitis
  Paper I and III
Concluding remarks and future prospects
Popular science summary in Swedish (Populärvetenskaplig sammanfattning)
Acknowledgement
References


Abbreviations

AD      Autoimmune disease
AID     Autoinflammatory disease
AS      Ankylosing spondylitis
BAM     Binary sequence alignment map
bp      Base pair
BWA     Burrows-Wheeler aligner
CNV     Copy number variation
DNA     Deoxyribonucleic acid
EMSA    Electrophoretic mobility shift assay
GATK    Genome analysis toolkit
GWAS    Genome-wide association study
HLA     Human leukocyte antigen
IMGT    Immunogenetics
indel   Insertion/deletion
L       Leucine
Mb      Megabase
MHC     Major histocompatibility complex
MICB    MHC class I polypeptide-related sequence B
NGS     Next-generation sequencing
NSAID   Non-steroidal anti-inflammatory drug
OR      Odds ratio
P3H1    Prolyl 3-hydroxylase 1
Q       Glutamine
RUNX3   Runt-related transcription factor 3
SKAT    Sequence kernel association test
SLE     Systemic lupus erythematosus
SNP     Single nucleotide polymorphism
SNV     Single nucleotide variant
SV      Structural variation
TFBS    Transcription factor binding site
TNF-α   Tumour necrosis factor α
vcf     Variant call format
WGS     Whole-genome sequencing


Introduction

With genetics being key to this thesis, we start with the father of genetics, Gregor Mendel, who experimented with peas as early as the 1860s [1]. He noticed that some phenotypes of the peas, such as seed shape and colour, seemed to be transferred from one generation of plants to the next [1]. He theorised that this was determined by factors (now known as genes, conveyed in the genetic material, deoxyribonucleic acid; DNA) that could have either a dominant or recessive effect [1]. Monogenic traits are therefore named after him: Mendelian traits [2].

A well-known example of a Mendelian disease is Huntington’s disease, which is caused by repeats in the IT15 gene [3]. Huntington’s disease is a dominant trait [3], meaning that if one parent develops Huntington’s disease, their child will have a 50-100% risk of also developing the disease, depending on whether the parent is heterozygous (one copy) or homozygous (two copies) [3]. Since the healthy copy of IT15 is recessive, two copies of it (one from each parent) are needed to not develop disease. As such, if both parents have the disease, there is only a 0-25% chance that the child will be disease-free [3].
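The risk figures quoted for a dominant trait can be reproduced by enumerating the four equally likely allele combinations a child can receive (a Punnett square). The following is a minimal sketch, not part of the thesis; the function name `affected_risk` and the one-letter allele symbols (`H` for the disease allele, `h` for the healthy allele) are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def affected_risk(parent1: str, parent2: str, disease_allele: str = "H") -> Fraction:
    """Probability that a child carries at least one dominant disease allele.

    Each parent genotype is a two-character string such as "Hh"
    (hypothetical symbols: "H" = disease allele, "h" = healthy allele).
    """
    # The child receives one allele from each parent; all four pairs are equally likely.
    outcomes = list(product(parent1, parent2))
    affected = sum(1 for pair in outcomes if disease_allele in pair)
    return Fraction(affected, len(outcomes))


# One affected parent, heterozygous or homozygous: 50% or 100% risk.
print(affected_risk("Hh", "hh"))  # 1/2
print(affected_risk("HH", "hh"))  # 1

# Both parents affected: the chance of a disease-free child is 25% at most.
print(1 - affected_risk("Hh", "Hh"))  # 1/4
print(1 - affected_risk("HH", "Hh"))  # 0
```

The sketch assumes full penetrance and ignores repeat length, so it only illustrates the Mendelian arithmetic behind the 50-100% and 0-25% ranges.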

Another well-known genetic condition is colour blindness [4]. In the most common case of colour blindness, deuteranopia (red/green colour blindness), a locus containing two genes, OPN1LW and OPN1MW, on the X chromosome is the cause [4,5]. These genes do not exist on the Y chromosome, which explains why the condition is more prevalent in males. If a male inherits a defective copy of the X chromosome, there is no second X chromosome with normal genes to buffer it, as there is in females [4]. If a mother is a carrier, her male child has a 50% chance of inheriting the condition, since if he inherits the defective copy there is no healthy one available. For a female child to inherit the condition, both X chromosomes (inherited from the father and mother) need to carry the defective genes. This is a recessive phenotype, which can act a bit differently because the locus is absent from the Y chromosome.
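The X-linked case can be enumerated in the same way, with the difference that a son receives only the mother's X chromosome. This is a minimal sketch under simplifying assumptions (a single locus, full penetrance); the function name `x_linked_risk` and the allele symbols (`N` for a normal X copy, `d` for a defective one) are hypothetical.

```python
from fractions import Fraction
from itertools import product


def x_linked_risk(mother: str, father_x: str, child_sex: str) -> Fraction:
    """Probability that a child shows an X-linked recessive condition.

    mother:    her two X alleles, e.g. "Nd" for a carrier
    father_x:  his single X allele, "N" or "d"
    child_sex: "male" or "female"
    """
    if child_sex == "male":
        # Sons get the Y chromosome from the father, so only the maternal X counts.
        genotypes = [(allele,) for allele in mother]
    elif child_sex == "female":
        # Daughters get one X from the mother and the father's only X.
        genotypes = list(product(mother, father_x))
    else:
        raise ValueError("child_sex must be 'male' or 'female'")
    # Recessive: affected only if every X copy present is defective.
    affected = sum(1 for g in genotypes if all(a == "d" for a in g))
    return Fraction(affected, len(genotypes))


print(x_linked_risk("Nd", "N", "male"))    # 1/2
print(x_linked_risk("Nd", "N", "female"))  # 0
print(x_linked_risk("Nd", "d", "female"))  # 1/2
```

With a carrier mother and an unaffected father, sons have a 50% risk while daughters are at most carriers, which mirrors the sex difference described above.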

Huntington’s disease and colour blindness are examples of simple inheritance, but more often than not it is more complicated to understand what the cause of a phenotype is. A phenotype can result from mutations in the DNA that arise either in the germ line (egg or sperm) or during development of the embryo. To complicate it even more, not all disorders can be explained by just one gene or locus. Those that do not fit the “one gene cause” description are instead called polygenic or complex diseases. Although monogenic and polygenic diseases are different, the approach to studying complex diseases has developed from the Mendelian approach [2].

It might sound simple, as if a certain genetic make-up always gives a specific set of phenotypes. Unfortunately, it is seldom as simple as affected individuals (cases) all having one genetic variant, and unaffected individuals (controls) having the opposite. Instead, it is more common that one group of individuals has an enrichment of a particular genetic variant. Not only that, there can be additional factors that contribute to the development of some diseases, or genetic traits generally, e.g. environment and infections [6]. That is why two genetically identical individuals (monozygotic twins) might not develop the same disease, as other factors, like the environment, might differ [6]. In essence, studying genetic diseases is not always black and white, and that makes the process more difficult.

Genetic immune-mediated disease

Genetic disorders can be divided into several groups, e.g. cancer and developmental disorders, to mention a few. I will focus on one group, namely immune-mediated diseases, which includes genetic diseases that affect the immune system in one way or another. For simplicity, immune-mediated diseases are sometimes divided into two classes: autoimmune diseases (AD; where the acquired immune system causes disease by attacking self-cells) and autoinflammatory diseases (AID; where the innate immune system causes disease by creating inflammation without an external trigger). It has been 20 years since the term AID was coined in the field of medicine [7,8], when tumour necrosis factor receptor-associated periodic syndrome (TRAPS) was described [9–11]. Since then the immune-mediated disease classes have slowly transformed [12,13].

The disease classification has gone from a strict two-camp policy, where the disease is either autoimmune or autoinflammatory, to a continuum with a class at each end and different degrees of mixture of the two in between (Figure 1) [7–12,14,15]. When functioning normally, the innate immune system is the first responder to foreign molecules inside or outside of the cell, and also helps activate the acquired immune system by presenting potentially foreign peptides [13]. Seeing how closely they work together, it is perhaps not surprising that features from both the innate and acquired immune systems can be seen in the presentation of many of the immune-mediated diseases [16]. A good example of a disease with a mix of autoimmune and autoinflammatory features is psoriasis. This is considered an autoinflammatory disease, but autoantibodies against keratin have been observed in around 28% of the patients [17]. Autoantibodies are a part of the acquired immune system and therefore point toward an autoimmune disease.

Immune-mediated diseases are very diverse in both presentation and cause, for both Mendelian and complex diseases (Figure 1). Even though there is a high genetic diversity, the symptoms often overlap, making it challenging to diagnose the specific disease, not to mention to treat it subsequently [18,19]. In some cases, genetic information can help in the process of diagnosis to distinguish between possible diseases. Systemic Autoinflammatory Disease Support (saidsupport.org) has compared genetic testing for autoinflammatory panels available in the US, with the largest testing panel available from Blueprint Genetics. This panel, for different immune-mediated disorders, covers 274 genes. Although this might sound like a lot, there are still diseases without a known genetic cause or, as for most of the complex diseases, where only a small fraction of the genetic cause has been explained [14]. An example of an immune-mediated disease where only a small part of the genetic cause has been discovered is ankylosing spondylitis (AS), which we will dive deeper into.

Figure 1. Immune-mediated diseases are often a mixture of autoimmune and autoinflammatory processes. This figure illustrates how each disease can have different degrees of autoimmune or autoinflammatory elements, and how its clinical presentation can range from organ-specific to systemic disease. Grey boxes are Mendelian diseases and white boxes are complex diseases. Adapted from [14] and [10,12].

Ankylosing spondylitis

Clinical presentation

AS is a rare chronic immune-mediated disease, with a prevalence of 0.1-1.4% in the general worldwide population [20] (0.09%-0.18% in Sweden [21–23] and 0.24% in Europe [21]). Contrary to most immune-mediated diseases, AS is more common in males than females (2-3:1 ratio in Europe [19,24], 1.6:1 in Sweden [23]).

The name is descriptive of the disease; ankylosing is stiffness of a joint where the bones on both sides of the joint start to fuse together, and spondylitis describes the cause of stiffness and the location, namely, inflammation of the vertebrae of the spine. As the name indicates, the main characteristics of AS are inflammation of the axial skeleton and sacroiliac joints. The clinical signs of disease, however, can manifest differently from case to case [19,20,23,25,26]. Diagnosis is made even harder by the presence of co-morbidities, a common feature in AS patients, with 40-60% of cases also having subclinical gut inflammation [17,27,28], 20-40% uveitis (a form of eye inflammation) [17], around 30% peripheral arthritis [29] and 10-30% psoriasis [17,19].

Like many other immune-mediated diseases, AS often takes years to diagnose. This delay allows the disease to progress, which is significant for the eventual prognosis, as the first 10 years are the most critical for loss of function caused by the disease [30]. Ankylosing, in this case vertebral fusion, is irreversible, thus prompt treatment is critical to stop the spine from turning into a bamboo-like structure [19]. Inflammation of the spine leads to stiffness and back pain, which in turn leads to problems with standing up straight (kyphosis; Figure 2) [19]. The back pain cannot be relieved by rest and can keep patients awake, preventing a full night’s sleep [31]. This can severely affect quality of life and increases the risk of depression by 60% [32]; a similar pattern is seen in other chronic diseases, such as arthritis and cancer [33]. As the disease progresses, the back begins to hunch, changing the angles of the ribs and limiting the volume of the chest cavity, which in turn restricts the ability to take deep breaths [34] (Figure 2).

AS diagnosis is based upon fulfilling the modified New York criteria [34]. These criteria consist of two parts, diagnosis and grading [34]. Diagnosis includes clinical criteria (low back pain for more than three months, limited mobility of the lumbar spine, and decreased chest expansion) and radiological criteria (signs of inflammation in the sacroiliac joints) [34]. The grading tells whether it is a certain case of AS (radiological changes and at least one of the clinical criteria present) or a probable case (when all clinical signs are present, or when only the radiological criteria are fulfilled) [34].
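The grading rule summarised above can be written as a small decision function. This is a minimal sketch of the rule only as it is worded in this text, not of the full published criteria (which also specify radiological grades); the function name, the parameter names and the label "not classified" are illustrative assumptions, and "one of the clinical criteria" is read here as "at least one".

```python
def grade_as_case(clinical_criteria_met: int, radiological_criterion_met: bool) -> str:
    """Grade a case from the number of clinical criteria met (0-3) and the
    radiological finding, following the summary given in the text."""
    if not 0 <= clinical_criteria_met <= 3:
        raise ValueError("expected between 0 and 3 clinical criteria")
    # Certain: radiological changes plus at least one clinical criterion.
    if radiological_criterion_met and clinical_criteria_met >= 1:
        return "certain"
    # Probable: all three clinical criteria alone, or radiological changes alone.
    if radiological_criterion_met or clinical_criteria_met == 3:
        return "probable"
    # Hypothetical label for cases that fulfil neither grade.
    return "not classified"


print(grade_as_case(1, True))   # certain
print(grade_as_case(3, False))  # probable
print(grade_as_case(0, True))   # probable
print(grade_as_case(2, False))  # not classified
```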

While there is no cure, there are several treatment options that can relieve pain, increase mobility, decrease inflammation and overall slow down progression of disease [19,35,36]. Physical therapy is always recommended to patients as a part of increasing mobility [37]. Normally the first medication administered is a non-steroidal anti-inflammatory drug (NSAID) [37]. If that fails, alternative treatments include tumour necrosis factor α (TNF-α) inhibitors and different monoclonal antibodies (targeting interleukin-17A, CD20+ B cells, and interleukin-12/23) [38–42]. These treatments have been shown to reduce the signs and symptoms of disease, but no treatment has yet been shown to affect the progression of structural damage [38–42]. Additional drugs for treatment of AS are being developed, including interleukin-6 receptor inhibitors [42].

Even though the disease is highly heterogeneous, it is possible to see distinct groups of cases with similar clinical presentations or treatment response [43]. Females experience higher disease activity, while males have more kyphosis and radiological changes [43,44]. When treated with TNF-α inhibitors, females typically respond poorly and need to change TNF-α inhibitor treatment more frequently than males [43,44]. This could be explained by the generally higher body fat proportion in females, which may affect the response to treatment [44]. Another division in the clinical presentation can be made by genetic predisposition, namely human leukocyte antigen B*27 (HLA-B*27) status. Individuals who are HLA-B*27 positive have an earlier onset of disease (before 35 years old) [45–49], more often have a family history of AS (22.5% versus 15.6%) [45], and have higher disease activity (with more sacroiliitis [49], radiological changes of the spine [45,49] and joint involvement [45,47–49]) and more uveitis [47–49].

Figure 2. One of the symptoms of ankylosing spondylitis is that the spine becomes bent and stiff (kyphosis) [19]. To the left is the posture of an individual who suffers from ankylosing spondylitis and to the right that of a healthy individual (spine highlighted to illustrate the impact of AS on posture).

Genetics

The high heritability of the disease shows that ankylosing spondylitis is largely caused by genetic predisposition [17,28]. Monozygotic twin and family studies have revealed a heritability of over 90% [50]. In 1973, three studies were published describing the existence of a link between the allele HLA-B*27 and AS [51–53]. In these studies, 88%-96% of cases were HLA-B*27 positive [51–53], while today it is more often said that the expected proportion of HLA-B*27 positive AS cases is above 80% [19,54]. Even though the vast majority of disease cases carry the allele, less than 7% of HLA-B*27 positive individuals (depending on ethnicity) develop disease [19,25,53,54].

It has been almost 50 years since the association between HLA-B*27 and AS was revealed, but it still remains one of the strongest genetic associations discovered in any human disease (with p-values down to 10^-300 [55]) [17,35,56]. Even though a lot of time has passed since this association was found, how the allele contributes to disease and through which molecular mechanisms is still unknown [14,17,56,57]. There have been different hypotheses describing HLA-B*27 involvement in disease causation (Figure 3), but so far none have been proven correct [14].

A strong association, however, does not mean that it fully explains the heritability [17,19]; in fact only 20% can be explained by HLA-B*27 [20,28,35,56,58]. This is not that surprising seeing as only a small fraction of the individuals with the allele develop the disease. It clearly demonstrates that there must be other genes involved in AS [17,36,56,58,59]. It is hypothesised that at least 10 genes, but maybe as many as thousands of genes, could be contributing to disease susceptibility [35,56]. In total, around 30% of the heritability has been explained: as mentioned, 20% by HLA-B*27, while the remaining 10% comes from 45 other loci that have been associated with disease [20,28,35,56,58]. More variants and loci have been associated since this calculation was made.

To date, more than 100 variants, located in around 50 loci, have been associated with AS with genome-wide significance [17,19,20,28,35,36,55,58,60–62]. The majority of the associations have been identified through genome-wide association studies (GWAS), while a few others have been found using HLA imputation from an Immunochip [63] and an exome-wide association study [28].

Despite these associations, the causative picture of disease is not understood. Since a majority of the cases have subclinical gut inflammation, one hypothesis is that this is where the disease starts [56] (Figure 3): the microbiome becomes disturbed such that the gut mucosal immunity, involving the disease-associated interleukin-23 receptor and others, overreacts and drives inflammation throughout the body [56]. Both endoplasmic reticulum aminopeptidase (ERAP) 1 and 2, and HLA-B*27, are part of the aminopeptidase pathway, supporting a hypothesis where the peptide presentation pathway becomes defective, which leads to disease [36,56] (Figure 3). There is more than one idea here; either peptides are not being processed properly, or the protein from HLA-B*27 is being misfolded, leading to an endoplasmic reticulum (ER) stress response [36,56]. These hypotheses and associated genes are illustrated in Figure 3.


Figure 3. Many genes have been associated with disease, with different hypotheses about their role. A) In the ER, proteins from the ERAP1 and ERAP2 genes are suspected to cut peptides abnormally. The result is that HLA-B*27 either is misfolded, causing the unfolded protein response, or its ability to bind peptides is changed. B) The peptides presented by HLA-B*27 might activate a cytotoxic response. C) In the gut, the microbiome in AS cases is altered compared to healthy controls, and many of the genetic susceptibility genes are predicted to have a role in inflammation of the gut. D) The processes mentioned would lead to the release of cytokines that travel through the bloodstream to e.g. the spine, where inflammation flares up. The cytokines also affect other processes in the body and are predicted to activate bone erosion. Adapted from [56,62,64].

The role of MHC genes

HLA-B is one of the genes located in the major histocompatibility complex (MHC), which is a key gene region for the immune system not just in humans but in all jawed vertebrates65. In humans, the MHC contains more than 200 genes, with around 40% of them having immunological function66. Located on the short arm of chromosome 6 in humans66, this approximately 4 Mb region contains the human leukocyte antigen (HLA) genes, which are divided into classes depending on which cells they are expressed in65 (Figure 4). Class I molecules are expressed in all nucleated cells in the body, while class II molecules are located on cells specialised to present antigens to T-cells65,67. The HLA genes are some of the most polymorphic genes in the human genome, and new variations continue to be discovered66,68,69.

At the end of the 20th century, the international ImMunoGeneTics project established its HLA database (IMGT/HLA database) in an effort to gather all known polymorphisms of these genes in one place70,71. The number of alleles in this database expanded sixfold over ten years (2008-2018)72,73, for both HLA class I (from ~2,500 to ~15,500 alleles) and class II (from ~1,000 to ~6,000 alleles)72,73.

With this vast diversity of alleles, a robust system for nomenclature is required. Each version of a gene is called an allele of that gene. All alleles of e.g. HLA-A have an allele name starting with that gene. Depending on their serotype, the alleles are divided into 1-field groups, e.g. HLA-A*24. The next division, 2-field, is based on the specific protein an allele encodes, e.g. HLA-A*02:24. The third field distinguishes synonymous variation in the coding region, and the fourth field identifies changes in the non-coding region. However, there are more ways to divide HLA alleles, such as by their G-groups, in which all alleles that are identical in the exons encoding the binding groove (exons 2 and 3 for class I and exon 2 for class II) are grouped together. Some software, and often sequence-based lab typing, genotype at the G-group level.
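To make the field structure concrete, the following is a minimal Python sketch that splits an allele name into its gene and fields and truncates it to a chosen resolution. The names `HlaAllele`, `parse_allele` and `at_resolution` are hypothetical helpers introduced only for illustration and are not taken from any HLA software. The sketch assumes plain colon-separated names and ignores expression suffixes and G-group codes.

```python
import re
from typing import NamedTuple, Tuple

# Matches names such as "HLA-A*02:24" or "HLA-A*24:02:01:01" (1 to 4 fields).
# The "HLA-" prefix is optional so that names like "MICA*008" also parse.
_ALLELE_RE = re.compile(
    r"^(?:HLA-)?(?P<gene>[A-Z0-9]+)\*(?P<fields>\d+(?::\d+){0,3})$"
)


class HlaAllele(NamedTuple):
    """Hypothetical container: gene name plus the colon-separated fields."""
    gene: str
    fields: Tuple[str, ...]

    def at_resolution(self, n_fields: int) -> str:
        """Return the allele name truncated to at most n_fields fields."""
        if not 1 <= n_fields <= 4:
            raise ValueError("resolution must be between 1 and 4 fields")
        # Slicing keeps whatever is available if the name has fewer fields.
        kept = self.fields[:n_fields]
        return f"HLA-{self.gene}*{':'.join(kept)}"


def parse_allele(name: str) -> HlaAllele:
    """Split an allele name into gene and fields."""
    match = _ALLELE_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised allele name: {name!r}")
    return HlaAllele(match.group("gene"), tuple(match.group("fields").split(":")))
```

For example, `parse_allele("HLA-A*24:02:01:01").at_resolution(2)` reduces a 4-field name to its 2-field, protein-level form, and `at_resolution(1)` gives the 1-field group.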

The high polymorphism is not the only reason why this region is hard to work with. There is also high sequence similarity between genes and high linkage disequilibrium (LD) in the region65. For example, HLA-DRB1 and HLA-DRB5 are two separate genes with very high (>90%) sequence similarity in the coding region74. Also, when alleles are in high LD, it can be hard to pinpoint which of the alleles is the cause of an association signal. Alleles from a group of genes can form haplotypes. In 2004, two MHC haplotypes, PGF and COX, were described based on homozygous human cell lines, which also gave the haplotypes their names75. These two cell lines were chosen not only because they represented some of the most common haplotypes in Europe but also because they were carriers of autoimmune disease haplotypes of four alleles each75. More than 95 MHC haplotypes have now been characterised and named with a focus on eight genes (HLA-A, -B, -C, -DRB1, -DQA1, -DQB1, -DPA1 and -DPB1)74, and depending on the number of genes considered, and which genes, several haplotypes might look the same. In the British population, when considering the first six genes for the haplotype, COX (and VAVY, since they share those six alleles) has a frequency of 7.51% and PGF 3.03%76. These haplotypes have relatively low frequencies, and the more genes that are included, the lower the frequency becomes, which illustrates how diverse individuals are when it comes to the immune system.


In sickness and in health

We come into contact with foreign molecules every day: molecules that do not naturally occur in our bodies. With a working immune system, these foreign molecules seldom bother us, or we may get a fever as the immune system is activated to rid us of harmful intruders.

MHC class I and II molecules have slightly different roles in protecting us from foreign molecules (Figure 4). Class I molecules help the immune system to see that our own cells are healthy by presenting peptides from inside the cell to CD8+ T-cells65,67. All nucleated cells in our body express class I genes, and if a foreign agent, e.g. a virus, invades a cell, the cell will present peptides derived from it65,67. The presentation of a foreign peptide in turn leads to activation of T-cells, which will kill cells that present this peptide65,67.

Class II molecules are only located on cells that can engulf and break down other cells, e.g. dendritic cells65,67. These cells have the job of cleaning up in the body and presenting what they have found. They can break down a bacterium and will present peptides derived from it to CD4+ T-cells65,67. Other HLA genes (HLA-DM and TAP) are translated into proteins that play a role in placing the peptide in the binding groove of the peptide presenters65,67.

Since there are many foreign molecules, the repertoire of HLA genes needs to be large in order to bind peptides derived from the vast diversity of possible invaders77. Not only are the genes highly polymorphic, but it is also common that individuals are heterozygous for HLA genes77. HLA genes are co-dominant, such that both molecules are produced in heterozygous individuals, which increases the chance of recognising an intruder since each allele will present slightly different peptides77.

Figure 4. MHC is located on the short arm of chromosome 6. The genes are divided into different classes depending on their function, class I, II, III and class I-like genes.


HLA genes do not only have an important role in keeping us healthy; they can also play an important part in causing disease. Even though the HLA genes are essential for the immune system to work properly, they can also be the cause of malfunction. HLA class I and class II proteins have important roles in immune recognition, which can impact processes ranging from organ transplantation, to disease and infection susceptibility (including immunological diseases, cancers and neuropathies), to drug response and pregnancy68,69,78,79.

The name MHC actually comes from the role many of the genes have in transplantation65. It is usually in that context that people have heard the name HLA, when signing up to be a bone marrow donor or hearing about organ transplantation. To be able to transplant an organ or bone marrow, HLA genes must be similar, and the more HLA genes that match between donor and recipient, the better65,78. The lower limit for what is accepted for a transplant is one serological mismatch for HLA-A, -B and -DRB1, but this depends on what is being transplanted and on hospital routines, so it can be up to 2-3 mismatches if HLA-C is included in the genes considered80. HLA proteins are part of how the body recognises what is self and what is foreign (“non-self”)65,78. It is thus important that the proteins look the same on the cells; otherwise, the binding to the T-cells could be affected and alert the immune system that a normal cell from a donated organ is foreign65,78. If the immune system notices that there are foreign cells, even if they are actually good for the body, it will create antibodies against them and start to attack, causing what is called host versus graft syndrome, which can result in transplant rejection65,78.

Rejecting something foreign found in the body is an understandable immune response. However, the immune response does not always work as it should: sometimes the immune system turns against the body's own cells and tissues, causing autoimmune disease65. The detailed mechanisms behind this are unclear, but HLA genes are often implicated in genetic predisposition to autoimmune disease65. This is also true for autoinflammatory diseases, where spontaneous inflammation occurs without any signs of infection or damage65.

HLA genes are essential when they function normally, but their contribution to disease is enigmatic65. They can even be a problem when they are not the cause of disease, by causing drug sensitivity65. Here, as in most cases, several concepts have been devised to explain this, but none of them has been proven65.


Aim of this thesis

The overall aim of this thesis was to investigate the genetic predisposition to ankylosing spondylitis, and the HLA allele distribution in Sweden. A further aim was to reveal stratified signals based on sex and/or HLA-B*27 status in ankylosing spondylitis.

The specific aims were to:

1. Delve into the genetic predisposition to AS in Sweden with the help of targeted sequencing data
   a. Investigate the existence of sex-stratified genetic associations
   b. Dissect the association signal from the MHC region by studying HLA allele associations
   c. Explore HLA-B*27-independent HLA associations

2. Create a Swedish HLA bio-resource, SweHLA, using data from healthy Swedish individuals for the research community at large

3. Develop a methodology to obtain robust HLA genotype calls based on NGS data and existing software programs


Comments on material and methods

Descriptions of the material and methods can be found in the respective papers. This section summarises the populations and methods used, to aid the understanding of the results and discussion.

Study populations

The Swedish population has been the focus of the studies included in this thesis, and not only because we are based in Sweden. AS is included in two of the three papers in this thesis, and the third paper describes the creation of a resource that we built in order to study AS accurately.

Why is Sweden a good population in which to study the genetics behind AS? AS is commonly studied in populations of European ancestry (British) or in Han Chinese. These populations differ in predisposition to AS, e.g. HLA-B*27:05 is the allele most commonly associated with disease in Caucasian populations, whereas HLA-B*27:04 takes that role in Han Chinese42. Even though the Han Chinese population has a higher prevalence of AS (0.2-0.54%81) than the United Kingdom (0.15%72), using another Caucasian population to study disease association could help in finding the missing heritability and in dissecting haplotypes. Comparing prevalence between studies is hard, however, since there are multiple ways to perform the analysis82. As mentioned in the introduction, Sweden has an AS prevalence of 0.18%23, and the population in the South of Sweden is more homogeneous than populations of British or mixed European ancestry (with higher regional linkage disequilibrium83,84; p-value < 0.002 in pair-wise comparison84). Theoretically, fewer samples should thus be required to identify associations. Although AS in Sweden clinically mirrors that of Europe21,23,24, Sweden also has a genetic background that sets it apart (with around 9 million unique variants85), which can help in finding new associations relevant for other populations. A further reason is that Sweden has one of the highest frequencies of HLA-B*27:05 in the world, at 14%34,86,87, compared to a United States population of European ancestry with an allele frequency of 3.3%88 and a British population with an allele frequency of 4.2%76.

Paper I includes, after quality control, 310 patients with ankylosing spondylitis and 381 healthy individuals (blood donors) from the South East of Sweden (Linköping, Jönköping and Örebro). Our collaborators from Linköping University collected these DNA samples, and the ankylosing spondylitis diagnosis was based on the modified New York criteria. These cases were enriched for HLA-B*27 positive samples and therefore have a higher frequency of HLA-B*27 than a representative AS population. For replication, a second dataset containing 619 samples was collected (317 cases and 302 controls, of which above 84% and below 13% were HLA-B*27 positive, respectively), including samples from our collaborators at Linköping University and the University of Gothenburg. Of note, the replication set from Gothenburg was an AS population without any signs of subclinical gut inflammation.

Paper II includes sample data from the Swedish whole-genome data resource (SweGen)34. SweGen34 contains 1,000 healthy individuals from a cross-section of Sweden, with samples from the Swedish Twin Registry (only one of each pair) and the Northern Sweden Population Health Study.

Paper III includes next-generation sequencing data from the discovery population in paper I and the healthy Swedish population in paper II. In addition to these samples, data for 815 healthy individuals from the Uppsala Bio-resource89 were included (comprising blood donors from Uppsala and Stockholm), making the total number of controls 2,196. These extra controls were included in order to obtain a control population large enough for a case/control study with only HLA-B*27 positive individuals.

The path from DNA to variants

The idea that led to this thesis came about before I joined the project. There was an interest in studying immune-mediated disease in a way other than the standard method of genome-wide association studies (GWAS).

GWAS has been a big help in discovering common variants associated with disease, e.g. Alzheimer's disease90, B cell lymphomas91, Crohn’s disease6, multiple sclerosis92 and spondyloarthritis93. For GWAS, single nucleotide polymorphism (SNP) markers are spread over the whole genome at known polymorphic sites. In a typical GWAS, 90% of the associations found are in non-coding regions of the genome and can thus be hard to interpret6,94. The associated variants identified might not be the causative ones, but might instead be in linkage disequilibrium with the real culprit, which was not targeted by the GWAS6,95. When a signal is found with GWAS, the associated region needs to be studied further to determine which SNP, or other variant type, is the causative one.

There are several techniques to go from a sample containing DNA (e.g. blood) to data files with information on the order of the bases. The breakthrough for reading DNA came with the development of Sanger sequencing96, the first generation of sequencing techniques. As with most technology, sequencing techniques are continuously improving. Since Sanger sequencing was introduced in 1977, the techniques to sequence DNA have moved forward in great leaps, especially after the success of the draft human genome97–99, published in 2001. During these years, technologies have moved past next-generation sequencing (from sequencing one DNA fragment at a time to millions in parallel) to third-generation sequencing (from hundreds of bp per read to tens of thousands). In this thesis, samples have been sequenced with next-generation sequencing (NGS), specifically using short-read technology on Illumina HiSeq 2500 or X platforms. With NGS, it is possible to find rare variants, see which variants are in LD in the same region, and also study structural variation6, overcoming some of the disadvantages of GWAS.

In 2012, whole genome sequencing (WGS) was still very expensive (and still is in comparison to other options). In order to get the benefits of WGS without actually sequencing the whole genome, a custom-made targeted array was developed covering around 1% of the human genome89 (Figure 5). The regions for sequencing were carefully selected so that the array could be used for a handful of immune-mediated diseases89. Roughly 1,900 genes were selected based on their previous implication in immune-mediated disease in several species, e.g. humans and dogs, or for their role in immunological pathways89. The sequences in and around these genes, including conserved regions100 within 100 kb up- or downstream of the genes, were included in the target, allowing the study of regulatory regions in addition to protein-coding sequences89 (Figure 5).

Figure 5. Illustration of the targeted array. Roughly 1,900 genes (green) and their conserved regions, including extra genes (orange), 100 kb up- and downstream of the genes were targeted.

Reading the DNA

Illumina sequencing

Samples for Study I were sequenced on Illumina HiSeq 2500 (v3 chemistry) using 100-bp paired-end reads, with eight barcoded samples per lane. The extra controls from the Uppsala Bio-resource were sequenced the same way (some samples with v4 chemistry), while SweGen was sequenced with Illumina HiSeq X (v2.5) using 150-bp paired-end reads.

I will provide a quick walkthrough of the basics of Illumina sequencing (Figure 6). In my case, DNA is fragmented by restriction enzymes and ligated with oligos101,102. In the Illumina flow cell, bridge amplification creates local clusters of copies of the same fragment101,102. During sequencing-by-synthesis, the strand is copied using bases that emit a fluorescent signal as they are built into the nucleic acid chain, each base with a different colour101,102. This creates reads that are copies of the fragmented DNA and can be put back together (aligned) to a reference DNA sequence.

Figure 6. Illumina HiSeq sequencing method. This illustrates how the DNA is 1) fragmented into a 2) library, 3) the library is fixed on a flow cell and forms clusters, and 4) is read at the end. For the final stage, light of different colours is emitted, detected and translated into bases.

Pipeline: from targeted sequencing to high quality variants

Sequencing produces a file containing all the reads, a fastq file, which contains roughly 100-bp sequence reads but does not tell you which part of the genome the reads come from. By aligning the reads to the human genome, it is possible to place them in the right position. We used the Burrows-Wheeler Aligner (BWA)103 for this (Figure 7). BWA is not the most accurate tool available, but it is a good option when running many samples: weighing execution time against the accuracy gained, it offers a good combination of speed and accuracy. The reads are scored based on how well they match the genome where they are placed, e.g. a read that is placed next to its pair, with no other read that looks the same (duplicates), gets a higher quality. There are differences in the genome between individuals, which can make it harder to align reads, e.g. in the case of insertions or deletions (indels). The Genome Analysis Toolkit104–106 (GATK) has several modules, including ones that realign reads around potential indels to improve the overall alignment (RealignerTargetCreator and IndelRealigner). Picard (http://broadinstitute.github.io/picard) is used to mark duplicates in order to exclude them from downstream analysis (Figure 7). The quality score is recalibrated based on a list of true single nucleotide variants (SNV) that is submitted to another of GATK's modules, BaseRecalibrator (Figure 7). It inspects how true variants look in the dataset and changes the quality score accordingly. GATK modules are also used for calling variants, first per individual using HaplotypeCaller, which creates a genomic variant call format (gvcf) file containing genotype probabilities for the different genotypes (Figure 7). This module discovers locations where there might be a variant and, disregarding the existing alignment, re-aligns the region to make sure it is a variant and not an alignment or mapping error. Genotypes are afterwards called on a population/cohort level (all samples/gvcfs together) through GenotypeGVCFs, which helps with calling the correct variant (Figure 7). If several individuals are called as variant at a position, this affects the probability that others might be variant at that position as well.

Figure 7. The pipeline for going from fastqs to association analysis ready variants. This illustrates how the pipeline for variant calling is part of the pipeline for HLA allele calling, with imputation and inference branching off at different points.

Out with the bad and in with the good

The variants called are not all of good quality, and bad quality variants need to be filtered out. The first basic step is to make sure that the variant had a chance of being good: the larger the number of reads supporting a variant, the more likely it is that the variant is true. The first round of filtering is performed on individual genotypes based on genotype quality and read depth106,107. By filtering genotypes with a read depth of less than 8, the calls with a low chance of detecting heterozygotes are removed107 (Figure 7). Each genotype also has a genotype quality score that ranges from 1-100; genotypes scoring below 20 were filtered out107 (Figure 7). These filters remove the genotype call for a single individual, but not the whole position, from analysis.
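The genotype-level filter can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used in the papers: the `GenotypeCall` record and the `mask_low_quality` function are hypothetical names, and only the two thresholds (read depth 8, genotype quality 20) come from the text. A failing call is set to missing for that individual while the position itself is kept.

```python
from dataclasses import dataclass, replace
from typing import List, Optional

MIN_DEPTH = 8      # read depth threshold stated in the text
MIN_QUALITY = 20   # genotype quality threshold stated in the text


@dataclass(frozen=True)
class GenotypeCall:
    """Hypothetical record for one individual's call at one position."""
    sample: str
    genotype: Optional[str]  # e.g. "0/1"; None means missing
    depth: int               # number of reads covering the position
    quality: int             # genotype quality score


def mask_low_quality(
    calls: List[GenotypeCall],
    min_depth: int = MIN_DEPTH,
    min_quality: int = MIN_QUALITY,
) -> List[GenotypeCall]:
    """Set genotypes that fail either threshold to missing.

    Only the individual call is masked; the position stays in the dataset.
    """
    masked = []
    for call in calls:
        passes = call.depth >= min_depth and call.quality >= min_quality
        masked.append(call if passes else replace(call, genotype=None))
    return masked
```

Masking rather than deleting keeps one entry per sample at each position, which is what makes a per-position call rate meaningful afterwards.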

The second step is to remove positions that are bad. This is done by another GATK module, VariantRecalibrator, a machine learning based approach to which you provide files with real variants of different qualities (Figure 7). The program uses the provided data to set a cut-off based on how the scores for real variants look in the dataset, and makes a rule for what is real. This process filters out some of the positions suspected of bad quality, and we also applied an extra hard filter to remove positions called in less than 85% of the samples (Figure 7).
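The hard call-rate filter is simple enough to sketch directly. The function names and the dictionary layout below are assumptions made for illustration; only the 85% threshold is taken from the text. Missing genotypes are represented as `None`.

```python
from typing import Dict, List, Optional


def call_rate(genotypes: List[Optional[str]]) -> float:
    """Fraction of samples with a non-missing genotype at one position."""
    if not genotypes:
        return 0.0
    called = sum(genotype is not None for genotype in genotypes)
    return called / len(genotypes)


def filter_positions(
    genotypes_by_position: Dict[str, List[Optional[str]]],
    min_call_rate: float = 0.85,
) -> Dict[str, List[Optional[str]]]:
    """Keep positions called in at least min_call_rate of the samples."""
    return {
        position: genotypes
        for position, genotypes in genotypes_by_position.items()
        if call_rate(genotypes) >= min_call_rate
    }
```

The same function with a higher `min_call_rate` would express the stricter cut-off that the thesis applies before HLA imputation.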

Samples were removed if they had a call rate lower than 80%, a high level of singletons, a high level of heterozygosity, or discordant sex (Figure 7). The thresholds for what constituted high levels were determined empirically based on values for the data as a whole.

Pipeline: from fastq to high quality HLA alleles

This pipeline is, to some degree, based upon the one described previously (Figure 7). Imputation software programs need high quality variants and therefore utilise the variant pipeline, while the inference software programs quickly branch off onto their own path (Figure 7). HLA typing relies on estimations made by software programs from NGS read data, and neither these software programs nor NGS reads are perfect. In particular, there might be inherent biases in many software programs, which our experiments showed to be the case. To get past this issue, a combination of software programs was used. Since not all software programs are able to call alleles for all genes, the ones used were chosen with that in mind.

Even though Sanger sequence-based typing (SBT) is considered the gold standard for genotyping HLA alleles, considerable effort has been made to use NGS data to extract this information108. The different lab typing approaches are both time-consuming and expensive, especially if NGS data are already available108. While sequence-based lab typing often calls HLA alleles at a G-group level (where the exons encoding the binding groove are the same), software programs have the ability to type up to 4-field resolution108. It is, however, hard to say how accurate this would be, since not many software programs use the data needed for this level of resolution108. Lab typing can of course go to higher resolution too, but it is not as simple as it sounds because of the sequence similarities in the region. NGS-based methods have been limited by the fact that they rely on the presence of known alleles, but this is beginning to change109. There is a group of software programs, not used here, that use G-groups as resolution and try to find new alleles, e.g. Kourami109 and HLA-PRG110.

Since the protein level of the gene was of interest to us, a 2-field resolution was used. As the technique develops, the best way to genotype HLA alleles will hopefully soon be to use long-read sequencing to sequence the whole gene as one single read, and to overlap reads to obtain the full haplotype over the MHC, something that should be possible already (but at a considerably higher cost).

Imputation

Imputation relies heavily on the quality of the input: if bad quality variants go in, bad quality results come out. This is because imputation fills in information that is missing in a haplotype based on the information that is present (Figure 8). The previous section describes the pipeline for going from fastqs to high quality variants, and it is utilised for imputation too (Figure 7). In order to get even higher quality variants, only positions with a call rate above 98% were kept for the HLA genotyping.

We used SNP2HLA111 to impute HLA genes. This software uses a reference panel as a database to determine linked variants in order to fill in missing information. We used the T1DGC reference panel, based on roughly 5,000 European samples111. SNP2HLA utilises Beagle112 to impute variants, and the combinations of the variants in the HLA genes (included in the reference panel) are translated into HLA alleles. This software was selected based on its popularity, having been cited over 300 times. It is commonly used as the imputation software for GWAS and other SNP chip data where studies of HLA genes were warranted. SNP2HLA had also been used in an ankylosing spondylitis HLA study by Cortes (2015), which studied associations based on imputed genotype data63; this also contributed to its prioritisation above other HLA imputation software.


Figure 8. An illustration of how imputation in SNP2HLA is performed. A reference panel that includes and extends past the gene of interest is combined with information on LD between SNPs. Missing SNPs in the input will be probabilistically inferred based on the reference panel.
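The idea of filling in missing information from a reference panel can be illustrated with a deliberately simplified toy example. It is not how SNP2HLA or Beagle work internally: the sketch merely counts reference haplotypes that are compatible with the observed SNPs, as a stand-in for a proper probabilistic haplotype model. The function name, the four-SNP haplotypes and the panel contents in the usage note are all invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# A reference haplotype: (SNP alleles along the region, HLA allele it carries).
ReferenceHaplotype = Tuple[str, str]


def impute_hla(
    observed: str, reference: List[ReferenceHaplotype]
) -> Dict[str, float]:
    """Estimate HLA allele probabilities for one observed haplotype.

    `observed` holds one character per SNP, with '?' marking a missing SNP.
    Every reference haplotype that agrees with all non-missing SNPs counts
    as one vote for the HLA allele it carries.
    """
    compatible = [
        hla
        for snps, hla in reference
        if len(snps) == len(observed)
        and all(o == "?" or o == s for o, s in zip(observed, snps))
    ]
    if not compatible:
        return {}
    counts = Counter(compatible)
    total = sum(counts.values())
    return {hla: n / total for hla, n in counts.items()}
```

With a toy panel of `("ACGT", "HLA-B*27:05")` twice, `("ACTT", "HLA-B*07:02")` once and `("GCTA", "HLA-B*08:01")` once, the input `"AC?T"` is compatible with the first three haplotypes, so the missing SNP leaves two candidate alleles with unequal support. This also shows why missing or wrong input SNPs translate directly into uncertain or wrong HLA calls.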

Inference

For inference software programs, only the fastqs are needed for genotyping the HLA alleles. To simplify the task of calling correct alleles, only the reads from the MHC region and unmapped reads are used (Figures 7 and 9). If all reads are mapped to a smaller target, BWA’s mapping algorithms will cause reads that should not map to a given location to do so, because there is no better option available. The pipeline from the previous section was used to create a binary sequence alignment map (bam) file from which to extract the desired reads (Figure 7), which in most cases were converted back into fastq files. These files are then used as input to the inference software programs, where the reads are aligned to all the possible HLA alleles in the reference used (Figure 9). In this thesis, four inference software programs were used, namely HLA-VBSeq113, HLAscan114, HLA-HD115 and OptiType116.
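The read selection step can be sketched on text-format (SAM) alignment records using only the standard library. This is an assumption-laden illustration, not the extraction used in the thesis: the region coordinates are placeholders (the thesis does not state them), the chromosome is assumed to be named `6`, and the function name is hypothetical. A read is kept if its unmapped flag (0x4 in the SAM FLAG field) is set or if its leftmost mapped position falls inside the region; mate handling is ignored.

```python
from typing import Iterable, Iterator

# Placeholder coordinates for the MHC region; adjust to the reference build.
MHC_CHROM = "6"
MHC_START = 28_000_000
MHC_END = 34_000_000

UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def select_mhc_and_unmapped(
    sam_lines: Iterable[str],
    chrom: str = MHC_CHROM,
    start: int = MHC_START,
    end: int = MHC_END,
) -> Iterator[str]:
    """Yield SAM records that are unmapped or start inside the region."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])       # column 2: bitwise FLAG
        reference_name = fields[2]  # column 3: RNAME
        position = int(fields[3])   # column 4: 1-based leftmost POS
        in_region = reference_name == chrom and start <= position <= end
        if flag & UNMAPPED_FLAG or in_region:
            yield line
```

In practice this selection would normally be done on the bam file with a dedicated tool; the sketch only shows the logic of keeping region reads together with unmapped reads.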

Each inference software program is different in several, and not always obvious, ways, including not just the method used for inferring alleles but also the reference and the achievable resolution. The reference can differ in two ways. All four programs mentioned use an IMGT/HLA70 reference, but there are many versions of this reference, and not all software programs allow the reference to be changed. The reference also comes in three forms: the sequence of only the exons (nucleotide reference), the sequence of the whole gene (genomic reference), and the amino acids in the resulting protein (protein sequence). Some software programs use the nucleotide reference while others use the genomic one. It might not sound like a big difference, but the content of these references differs considerably in the number of available alleles, because only a small fraction of the alleles has been reported with genomic sequence. Only recently has it become common practice to require the genomic sequence when describing a new allele; earlier, it was possible to make only the exons coding for the binding groove available. Since higher resolution HLA typing needs intronic information, not all software programs are able to infer alleles at that resolution. Some software programs try to take advantage of LD to make a best guess when using the nucleotide reference, but those calls might not be correct since they ignore intronic variation. It is not only variants in the introns that might be missed; some software programs only take the binding groove exons into account, ignoring variants in other exons. This is something we noticed during our study: one software program did not include exon 1, resulting in a random call between two alleles that differed only in exon 1.
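The exon 1 observation can be reproduced with a toy example: if a program compares alleles only over a subset of exons, any alleles that are identical over that subset become indistinguishable. The function name, the allele labels and the short sequences below are invented for illustration and do not correspond to real IMGT/HLA entries.

```python
from typing import Dict, Iterable, List, Tuple


def indistinguishable_groups(
    alleles: Dict[str, Dict[int, str]], exons: Iterable[int]
) -> List[List[str]]:
    """Group alleles that are identical over the given exons.

    `alleles` maps an allele label to {exon number: sequence}. Only groups
    with more than one member, i.e. real ambiguities, are returned.
    """
    considered = tuple(exons)
    groups: Dict[Tuple[str, ...], List[str]] = {}
    for name, sequences in alleles.items():
        key = tuple(sequences[exon] for exon in considered)
        groups.setdefault(key, []).append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented toy alleles: X and Y differ only in exon 1.
TOY_ALLELES = {
    "allele_X": {1: "ATGC", 2: "GGTT", 3: "CCAA"},
    "allele_Y": {1: "ATGA", 2: "GGTT", 3: "CCAA"},
    "allele_Z": {1: "ATGC", 2: "GGTA", 3: "CCAA"},
}
```

Comparing `TOY_ALLELES` over exons 2 and 3 only reports `allele_X` and `allele_Y` as one ambiguous group, whereas including exon 1 separates all three alleles. Grouping over the binding groove exons is also the logic behind G-groups.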

Figure 9. A general description of the workflow for inference software programs. Raw reads are aligned to a reference genome, and those that align to the MHC region or are unmapped are extracted. These selected reads are realigned to alleles from the IMGT/HLA database. The interpretation of which alleles are present in any given individual differs between software programs.

n-1 method

Seeing as each software program is biased in its own way, we decided to make a consensus dataset that would ensure high quality genotypes while diminishing the bias. The combination of software programs varied depending on the study.

In paper II, the software programs used for the three HLA class I genes (HLA-A, HLA-B and HLA-C) were SNP2HLA111, OptiType116, HLA-VBSeq113 and HLAscan114. For these genes, three out of four software programs needed to be concordant for the allele to be called. For the other five genes (HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1 and HLA-DRB1), only three software programs were used, SNP2HLA, HLA-VBSeq and HLAscan, and the call was therefore based on two out of three software programs instead.

In paper III, OptiType was replaced with HLA-HD115 in order to expand the selection of HLA genes included in the study. From eight genes in paper II, we now included 17 genes (more could have been included, but these were not in the targeted array and were therefore excluded from analysis). The eight genes from paper II were all called by four software programs (SNP2HLA, HLA-VBSeq, HLAscan and HLA-HD), requiring three out of four to be concordant, while the remaining nine (HLA-DOA, HLA-DOB, HLA-DRA, HLA-E, HLA-F, HLA-G, MICA, MICB and TAP2) were called with two out of three concordance (note that SNP2HLA is not able to call these with the T1DGC reference panel).

Associations and where to find them

A test for each occasion

Unfortunately, there is no “one size fits all” when it comes to testing for associations. Depending on whether rare or common variants are to be included, and on the disease studied, different tests work well in different situations.

Univariate test

A univariate test is commonly used to find associations in GWAS data. Here, each variant is tested on its own for association with the disease. Only common variants are used for this analysis. Common is often defined as a minor allele frequency (MAF) above 5%, but the threshold can also be set for the population used, based on the number of samples (n), with the formula threshold = 1/√(2n)117. As for most methods, there are several software programs able to perform this analysis, for example the software suite GenABEL (used in Paper I). There are different models of univariate tests, and the model is selected based on the inheritance model of the disease. For AS, the polygenic mixed model was utilised. An identity by state (IBS) matrix was included in order to remove confounding effects of population stratification.
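As a worked illustration of the sample-size-based threshold, the short Python sketch below evaluates 1/√(2n) for a few cohort sizes. The function name and the sample sizes are chosen only for illustration and do not correspond to the cohorts in the papers.

```python
import math


def common_maf_threshold(n_samples):
    """Sample-size-based MAF cut-off for 'common' variants: 1 / sqrt(2n)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return 1.0 / math.sqrt(2 * n_samples)


def is_common(maf, n_samples):
    """True when a variant's MAF reaches the sample-size-based cut-off."""
    return maf >= common_maf_threshold(n_samples)


# Hypothetical cohort sizes: the cut-off shrinks as the cohort grows.
thresholds = {n: common_maf_threshold(n) for n in (200, 1250, 5000)}
```

With 200 samples the formula gives exactly the conventional 5% cut-off, while 5,000 samples lower it to 1%, so larger cohorts allow rarer variants to be treated as common.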

Aggregate test

In order to analyse whether a combination of both common and rare variants in a region produces an association, aggregate tests can be used. Aggregate tests can also help discover variants with low effect size if there are several in the same region, since the combined effect of such variants can be larger than that of any individual variant. Selecting a software program for aggregate testing is not a simple task. There are many options to choose from, and they might not be as good as they appear. The problem with many of the options is that there is no explanation of what the program actually does: the algorithm is not described, or the parameters used are not specified, turning the software program into a black box.

In the end, we selected the R package Sequence Kernel Association Test (SKAT118), which seemed to be one of the most transparent options, with the algorithm easily available. It also had the advantage of a ready-made program for prioritising the variants from a significant region using a backward elimination test (described in more detail in the next section).

SKAT includes all markers in a region, irrespective of MAF117, and we performed a combined rare and common variant test, SKAT CommonRare117, with the SKAT algorithm. The SKAT algorithm first performs individual regression tests of the variants before combining the variants in the region, which removes any problems that variants with effects in different directions could bring. It also performs best with data where the majority of the variants are non-causal, which was what we expected for our dataset.

SKAT CommonRare aggregates the same region twice, once with only rare and once with only common variants. This is because the variants are weighted differently in the two tests (rarer alleles are given more weight, and the weight decreases down to the common allele threshold, above which all alleles are given the same weight), and it also makes it possible to use different algorithms for the two tests if so wished. The two tests are then combined to provide the association of the region as a whole.

The decision of which regions and variants to aggregate is up to the user. It is possible to use a priori information or only variants that are predicted to have a functional effect, but that would narrow the search; what is already known or predicted might not completely reflect what can be known. Our regions were based on the longest transcript space of each gene (the first start and the last stop of any transcript) together with regulatory regions, specifically from the 5’UTR plus 2 kb upstream to the 3’UTR plus 20 bp downstream. The targeted sequencing array ended up covering more than 7,200 genes, since additional genes overlapped the conserved regions captured around the roughly 1,900 targeted genes.

In order to decide which genes were covered well enough to be part of the SKAT analysis, we calculated the average exon coverage (AEC). The logic behind this was that the test would not be able to fairly evaluate a gene if only a small part of it had been sequenced. To calculate AEC, gVCF files for 10% of the samples were used. First, the percentage of coverage for exons was calculated per sample (number of bases with ≥ 8 read depth and ≥ 20 genotype quality divided by the total number of bases in the exon). A threshold was chosen empirically based on plots where different levels of AEC were plotted against sample missingness for each transcript. In order to keep many targeted genes with coverage in the majority of samples, ≥ 70% AEC and ≤ 10% missingness were used as thresholds (and only regions with at least two variants were used), resulting in 3,673 regions retained in the analysis for males and 3,652 regions for females.
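The coverage filter can be illustrated with the following Python sketch. It is a simplified reading of the description above, not the code used in the study: the input is assumed to be a list of (read depth, genotype quality) pairs per base, the per-exon fractions are assumed to be averaged with equal weight, and all function names are hypothetical.

```python
MIN_DEPTH = 8   # minimum read depth for a base to count as covered
MIN_GQ = 20     # minimum genotype quality for a base to count as covered


def exon_coverage(bases):
    """Fraction of bases in one exon passing both depth and quality cut-offs.

    `bases` is a list of (read_depth, genotype_quality) pairs, one per base.
    """
    if not bases:
        return 0.0
    passing = sum(1 for depth, gq in bases if depth >= MIN_DEPTH and gq >= MIN_GQ)
    return passing / len(bases)


def average_exon_coverage(exons):
    """Mean of the per-exon coverage fractions (assumed: equal weight per exon)."""
    return sum(exon_coverage(exon) for exon in exons) / len(exons)


def keep_region(aec, missingness, n_variants,
                min_aec=0.70, max_missingness=0.10, min_variants=2):
    """Apply the thresholds quoted in the text to one region."""
    return aec >= min_aec and missingness <= max_missingness and n_variants >= min_variants


# Toy transcript with two exons of four bases each.
exon_1 = [(10, 30), (10, 30), (5, 30), (10, 10)]   # two of four bases pass
exon_2 = [(8, 20), (8, 20), (8, 20), (8, 20)]      # all four bases pass
aec = average_exon_coverage([exon_1, exon_2])
retained = keep_region(aec, missingness=0.05, n_variants=3)
```

In this toy example the AEC is 0.75, which passes the 70% threshold, so the region would be retained given its low missingness and three variants.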

Logistic regression and Fisher’s exact test

In the PyHLA software119 used to analyse HLA alleles, three association tests were available: logistic regression, Fisher’s exact test and the chi-squared test. Even though the tests are similar, there are differences between them that make one or the other a better choice.

Since sex was significantly associated with disease, we wanted (when possible) to use sex as a covariate in the association test. This could only be done with the logistic regression test (and with linear regression, but that is for quantitative traits), which made the choice easy in this case. For amino acids, logistic regression was not available, and since Fisher’s exact test works better on small sample sizes than chi-squared, it was used for these tests.

Male, female and the whole population analysis

For both paper I and III, we were interested in investigating whether genetic sex-stratification could be identified in this sex-biased disease. Around 2/3 of our case population were male. By dividing the population studied into males and females it was possible to compare the associations found in the different data. This was by far the best way to try to dissect sex-specific associations, partly because the population studied became more homogeneous, making signals of association clearer, and partly because other options might have a hard time handling the skew. When studying the whole population, sex needed to be taken into consideration, since it was significant in our dataset. This was done in two different ways: sex was either used as a covariate in the test (paper III), or the individual tests for each sex were combined with a weighted Fisher’s method120, which combines the p-values of the shared variants from two groups of different sizes (paper I).

Prioritisation of variants

After running the association test, most variants are filtered from downstream analysis since they do not have significant p-values. Once again there are choices to be made: what should the significance threshold be, and which variants are driving the signal?

Significance threshold

It is obvious that there needs to be a threshold separating what is significant from what can be seen by chance, but how to set it is not always straightforward or easily understandable. The significance threshold for the p-value of a single test is normally set to 0.05121. When performing several tests, the significance threshold needs to be adjusted for the number of tests121.

Bonferroni correction is one of the most common ways of choosing a significance threshold to correct for multiple testing. In paper I, two different Bonferroni thresholds were applied. For the univariate test, a cut-off of 1×10⁻⁶ was used instead of the classical 5×10⁻⁸ since we have targeted and not genome-wide data121. For the aggregate tests, a 5% Bonferroni threshold was used, based on the number of regions aggregated in the test.

Bonferroni is based on the assumption that the tests are independent, which often makes it too strict for genomics datasets with many variants and loci in linkage disequilibrium. To address this, it is possible to use permutation testing, which takes all dependence and dataset structure into account. A false discovery rate (FDR) was used in two of the papers (I and III), where the tests were performed 1,000 times with phenotype or sex shuffled. The p-value for the 95% lowest value is used as the threshold for a 5% FDR.
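A permutation-derived threshold of this kind could be sketched as below. The text does not spell out the exact procedure, so this is one common reading, stated here as an assumption: the labels are shuffled, the smallest p-value of each permuted run is recorded, and the threshold is the value below which only 5% of those minima fall. The function names and the `test_fn` callback (which should return the p-values of all variants for a given label ordering) are hypothetical.

```python
import random


def permutation_min_pvalues(labels, test_fn, n_perm=1000, seed=0):
    """Shuffle the labels `n_perm` times and record the minimum p-value per run.

    `test_fn(labels)` is a user-supplied function (an assumption of this
    sketch) that returns the p-values of all tested variants.
    """
    rng = random.Random(seed)
    shuffled = list(labels)
    minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        minima.append(min(test_fn(shuffled)))
    return minima


def threshold_from_null(min_pvalues, alpha=0.05):
    """Return the p-value that only a fraction `alpha` of permuted runs fall at or below."""
    ordered = sorted(min_pvalues)
    k = max(1, round(alpha * len(ordered)))
    return ordered[k - 1]


# Toy null distribution: 100 evenly spaced minimum p-values.
toy_minima = [i / 100 for i in range(1, 101)]
toy_threshold = threshold_from_null(toy_minima)
```

Because the labels are shuffled while the genotypes are left intact, the correlation between variants in linkage disequilibrium is preserved in every permuted run, which is what makes the resulting threshold less conservative than Bonferroni.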

Some software programs also provide an odds ratio (OR), which estimates the effect size of the variant. The OR is the ratio of two ratios: i) having the variant versus not having it in cases and ii) having the variant versus not having it in controls. An OR whose 95% confidence interval (CI) crosses 1 (the line between a protective variant and a risk variant) is a sign that more data are needed to narrow the CI and obtain a trustworthy effect of the variant.
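The OR and its confidence interval can be computed from a 2×2 table as in the sketch below. The interval here uses the standard log-based (Woolf) approximation, which is an assumption of this example; the software programs used in the papers may compute the interval differently, and the counts are invented.

```python
import math


def odds_ratio_ci(case_with, case_without, control_with, control_without, z=1.96):
    """Odds ratio with an approximate 95% CI from a 2x2 table.

    All four counts must be non-zero for this approximation to be defined.
    """
    odds_ratio = (case_with * control_without) / (case_without * control_with)
    se = math.sqrt(1 / case_with + 1 / case_without
                   + 1 / control_with + 1 / control_without)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# Invented counts: 20 of 100 cases and 10 of 100 controls carry the variant.
or_value, ci_low, ci_high = odds_ratio_ci(20, 80, 10, 90)
crosses_one = ci_low < 1.0 < ci_high
```

In this invented example the OR is 2.25, but the interval runs from roughly 0.99 to 5.09 and therefore crosses 1, which is the situation described above where more data would be needed before the effect can be trusted.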

Backward elimination

An aggregate test uses all variant positions in a region when performing the test. But as each individual has 3-5 million single nucleotide variants (SNVs), the majority of these do not have an impact on disease122. To narrow down the variants in a region to those more likely to cause disease, backward elimination (BE; SKAT-BE123) can be performed.

The idea of BE is simple: variants that contribute to the signal will negatively influence the p-value when removed, while removal of neutral variants will not change the p-value or will even make it more significant. Running the aggregate test with all variants gives a p-value that is used as the baseline for the BE. One variant at a time is removed and the p-value re-measured. The variant that improves the p-value the most when removed is excluded from the next run, and the new, improved p-value is used as the baseline. The process continues until no variant can be removed without negatively changing the p-value, creating a set of variants that are most likely to have an impact on disease.
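The elimination loop described above can be written generically as in the following sketch. It is not the SKAT-BE implementation: the aggregate test is abstracted into a user-supplied `pvalue_fn`, and the toy p-value function, the variant names and the set of "causal" variants are invented purely to make the example runnable.

```python
def backward_elimination(variants, pvalue_fn):
    """Greedy backward elimination over a list of variants.

    `pvalue_fn(variants)` stands in for the aggregate test (an assumption of
    this sketch). A variant is dropped when removing it leaves the p-value
    unchanged or smaller; the loop stops when every removal would worsen it.
    """
    kept = list(variants)
    baseline = pvalue_fn(kept)
    while len(kept) > 1:
        best_p, best_variant = None, None
        for variant in kept:
            reduced = [v for v in kept if v != variant]
            p = pvalue_fn(reduced)
            if best_p is None or p < best_p:
                best_p, best_variant = p, variant
        if best_p > baseline:
            break  # every possible removal makes the p-value worse
        kept.remove(best_variant)
        baseline = best_p
    return kept, baseline


# Invented toy model: causal variants lower the p-value, neutral ones dilute it.
CAUSAL = {"v1", "v3"}


def toy_pvalue(variants):
    n_causal = sum(1 for v in variants if v in CAUSAL)
    n_neutral = len(variants) - n_causal
    return min(1.0, 0.5 ** n_causal * 1.1 ** n_neutral)


selected, final_p = backward_elimination(["v1", "v2", "v3", "v4", "v5"], toy_pvalue)
```

In the toy model the three neutral variants are removed one at a time, and the loop stops once only the two signal-carrying variants remain, since removing either of them would increase the p-value.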

How to hypothesise

Discovering associated variants is not enough to meaningfully contribute to disease diagnosis and treatment. When a set of significantly associated variants has been acquired, the question becomes “how”. How might this specific variant impact disease? How could this hypothesis be tested?

A hypothesis of what the function of a variant might be can be created in several ways. Collecting information about the variant, for example whether it is a coding variant and in which gene it is located, can provide crucial information about its potential function. The variant or gene might have been associated with a phenotype before, which might exist in the disease studied. Functional analysis might have been done on the variant before, which could shed light on its role in disease.

In some cases, there are no previous studies mentioning an associated variant and no information about what the resulting protein does. Luckily, there is a multitude of databases available to mine for information. Some of the ones used for this study include ENCODE124, RegulomeDB125, GTEx126, GeneHancer127, PROVEAN128, SnpEff 4.1129, and pair-wise TFBS potential, sTRAP130,131. These resources can all be used to predict function, such as transcription factor binding, amino acid changes, loss of function, which genes can be affected, how deleterious a variant might be and how probable it is that it has a regulatory effect. With this information collected, it is possible to create a hypothesis of how the variants might contribute to disease and, even better, find a way to prove it. By using the little grey cells, it is possible to design an appropriate experiment.

“Everything must be taken into account. If the fact will not fit the theory – let the theory go.”

― Hercule Poirot (Agatha Christie)

Results and discussion

The discussion is divided into four parts: one for each of Papers I-III, and a final section covering the investigation of whether the sex-dimorphism observed for disease is reflected in the genetic predisposition, as studied in Paper I and III.

Paper I: The sex-stratified genetic architecture of ankylosing spondylitis

In this study we identified both shared and sex-specific associated loci. By subdividing the population by sex, we could uncover three sex-specific loci significantly associated with disease. The univariate test revealed that a distal promoter of runt-related transcription factor 3 (RUNX3) was associated in males. Aggregate tests revealed one associated locus in males, prolyl 3-hydroxylase 1 precursor (P3H1, also known as LEPRE1), and one in females, MHC class I polypeptide-related sequence B (MICB). All variants prioritised for functional validation were replicated in a second Swedish population unless otherwise stated.

MHC was most highly associated with disease in the univariate test for both males and females, which was not surprising. The SNP with the lowest p-value was located around 700 bp upstream of HLA-B. Males and females did not share the same most significant SNP, even though the two SNPs were in LD. Since the SNP is located in a region with histone marks and transcription factor binding sites (TFBS), it is possible that this variation can affect gene regulation. When performing a conditional analysis on the most significant SNP, the remaining association peak differed between males and females, namely HLA-B in males and MICA in females. It is, however, difficult to dissect the signal in the region because of the high linkage disequilibrium.

RUNX3, on chromosome 1, is a locus that has previously been associated with AS, first reported around 10 years ago36. In this study, however, we have discovered some new information. We do not have the same top SNP in the region as previous studies36, though it is in LD (r²=0.98), and in our data we can show that the locus is male-specific. This may explain why this locus does not always show up as significant, seeing as a population of mixed female and male samples would lower the signal. The locus consists of five variants, four SNPs and one indel, located in a GeneHancer element127 with several upstream transcription factor-binding sites (TFBS). The region may play a regulatory role not only on RUNX3 itself but also as a distal enhancer for MAN1C1 and SYF2I. The LD between these variants was investigated and haplotypes reconstructed. Homozygote risk was observed in 33/16% of male cases/controls, which was significant with an OR of 2.6, and in 20/28% of female cases/controls (not significant).

Luciferase assays were performed on two fragment sizes, one with three of the variants and one with all five. The risk haplotype was a significantly stronger enhancer than the non-risk haplotype, and the longer fragment produced 3-4 times more relative luciferase expression than the shorter fragment. This is not only evidence that these variants can affect expression, but also hints that they may influence expression of RUNX3, MAN1C1 and SYF2I. It also highlights the importance of considering more than one variant when performing validation. Intriguingly, this experiment revealed that expression is altered not only in immunological cell lines (Jurkat), but also in skin cell lines (HaCat) and bone cell lines (SaOS-2). These three cell lines are all of interest for AS, with psoriasis as a common comorbidity (skin) and ankylosing as the main symptom (bone erosion followed by growth).

Electrophoretic mobility shift assay (EMSA) was performed on the most significant SNP in the locus, since the risk variant was predicted to disrupt several TFBS. Since EMSA is not a quantitative assay, it is hard to tell whether there is more or less binding in either group, or which transcription factor binds. What we could conclude was that both alleles showed binding in both activated and naïve T-lymphocyte cells (Jurkat). Seeing as the association has been known since 2011, it is not surprising that the idea of how it contributes to disease has evolved during this time. Based on a recent mouse study, Runx3 might have a dual role in disease, both in immune regulation and in osteogenesis (the process of bone building and breakdown)132–136.

The aggregate test revealed two loci, MICB and P3H1, to be associated with AS. By implementing BE, the number of variants in the loci decreased so that 27 of 115 variants from MICB and three of 31 variants from P3H1 were retained as contributing to the association signal and are thus more likely to contribute to disease.

P3H1 is a good candidate gene, given its previous association with the bone disease osteogenesis imperfecta. Of the three SNPs, two were rare and one was common, and only the common variant was used in downstream analysis. This allele was, however, not significant in the replication population or in the combination of the discovery and replication data. This might be because of disease differences between the discovery and the replication dataset, or because they were collected in different regions of Sweden. In any case, we decided to examine whether the locus could be functionally validated using the common variant, rs7552138. Even though the variant was predicted to affect TFBS, this could not be validated with EMSA. This does not mean that the variant is not functional, and other cell lines might tell a different story.

MICB was more successful. With 27 variants to choose from, we investigated the LD in the region. This resulted in two haplotype blocks and a few variants independent of these blocks. Both blocks had a risk haplotype. Interestingly, the homozygote risk haplotype of block 1 always co-occurred with the homozygote risk haplotype of block 2. The reverse was not true: the homozygote risk haplotype of block 2 was as common with the heterozygote as with the homozygote risk haplotype of block 1. Information about regulatory potential and previous associations was utilised to prioritise one variant from each block or independent group (three in total) for functional validation. With the help of EMSA it was possible to show that all three variants had binding capabilities. One of the variants in block 1 (rs3828903-A) showed competitive binding across skin (HaCat), T-lymphocyte (K562) and bone cell lines (SaOS-2). MICB is known for its role in presentation and binding to natural killer cells in the immune response137, so this result hints towards MICB having roles not yet known. This variant is in weak LD (r²=0.23) with the most significant HLA-B allele, but that does not exclude the possibility of LD with HLA-B*27. It is also one of 13 variants in that haplotype that are co-located in a GeneHancer element that influences 26 genes.

GTEx data support regulation by this region in certain tissues for five of those 26 genes. Three of the genes that might be regulated by this region have been associated with ankylosing spondylitis before: DDX39B (BAT1)138, HLA-C139 and MICB140. One of the other genes, ATF6B, was also particularly interesting. Even though this gene has never been genetically associated with disease, previous reports have described a 1.8 fold up-regulation of mRNA in the whole blood of AS patients compared with healthy controls141. A knock-out of the ATF6B counterpart in mice, Atf6B, revealed a correlation between expression of the gene and chondrocyte proliferation142, suggesting that ATF6B might have a potential role not only in endoplasmic reticulum stress response, a response theorised to play a part in AS, but also in ankylosing.

The MHC region is, as always, hard to dissect, and it is hard to understand explicitly what role it has in disease. We hypothesise in Paper I that the MICB association could have a role in ankylosing spondylitis through TNF-α homeostasis and the innate immune system, while RUNX3 might have a role in the adaptive immune response through regulation of T-cell differentiation. The results from both the MICB variant rs3828903 and RUNX3 also hint at a role outside of the immune system, with interesting results in a bone cell line. The fact that RUNX3 is male-specific might explain why it has been elusive in association tests, since the composition of males and females in study populations will affect the strength of association signals from sex-specific variants. In this study, we have contributed to increasing the knowledge of predisposition to AS, exposed new ideas of how previous associations, like RUNX3, might function, and opened up for new studies, e.g. on what effect increased levels of ATF6B have.

Paper II: SweHLA: the high confidence HLA typing bio-resource drawn from 1,000 Swedish genomes

This study focused on the generation of the high confidence Swedish HLA typing bio-resource (SweHLA). This resource was the next step in the study of 1,000 Swedish whole genomes published in 2017, SweGen85. With the data from SweGen, eight HLA genes (HLA-A, -B, -C, -DPA1, -DPB1, -DQA1, -DQB1 and -DRB1) were genotyped and their allele frequencies calculated. Both the frequencies of the n-1 high confidence set and those of the individual software programs are freely available. The methodology used has been carefully described in the comments on samples and methods section; here we are going to explain why this is so important.

When trying to do a case-control study with only one software program we found one association, but after a closer look we realised that it was a false association. The software could not differentiate between two of the alleles, and by random chance the control samples had been assigned more of one allele than the cases. This was disconcerting and implied that we needed another, better approach to genotype HLA alleles. The solution came half from the software Ensemble143 (a software program combining results) and half from the fact that we had already tried more than one software program. Since each software program has its biases (see Comments on samples and methods), combining their results should diminish their individual biases. Although some alleles would not be called when the software programs disagreed, the genotyping rate of the combined results was still very high, varying between genes from 82.4% to 98.1% in this dataset.

The big question was whether these biases could make a significant difference. When comparing results with studies using the same software, the same bias is present in both datasets, so here the bias should be negligible. But when comparing results from different software programs, problems start to appear. We discovered 18 alleles that had more than 2% difference in their allele frequencies between any two software programs. Of these, 15 were alleles that have been associated with some kind of disease. This clearly makes a difference. Depending on the software program utilised, the allele might be rare and filtered out, or common and included in the association tests, or even worse, some alleles could not be called at all. Therefore, before saying that an allele was not replicated in a dataset, it is important to check whether it could have been called at all. One clear example of the bias imparted by using only one software program is HLA-DRB1*16:01, which had three different allele frequencies depending on software: 0%, 0.3% and 10.7%.
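The comparison of allele frequencies between software programs can be sketched as follows. This is only an illustration of the 2% criterion described above: the function name and the program labels are hypothetical, the HLA-DRB1*16:01 frequencies are the three values quoted in the text (the text does not say which program produced which value), and the HLA-DRB1*15:01 frequencies are invented as a contrasting example.

```python
def discordant_alleles(freqs_by_program, max_diff=0.02):
    """Flag alleles whose frequency differs by more than `max_diff` between any two programs.

    `freqs_by_program` maps a program name to a dict of allele -> frequency.
    An allele that a program never calls is counted as frequency 0.0.
    """
    alleles = set()
    for freqs in freqs_by_program.values():
        alleles.update(freqs)
    flagged = {}
    for allele in sorted(alleles):
        values = [freqs.get(allele, 0.0) for freqs in freqs_by_program.values()]
        spread = max(values) - min(values)  # largest pairwise difference
        if spread > max_diff:
            flagged[allele] = spread
    return flagged


# Program labels are placeholders; DRB1*15:01 values are invented.
example_freqs = {
    "program_a": {"HLA-DRB1*15:01": 0.145},
    "program_b": {"HLA-DRB1*16:01": 0.003, "HLA-DRB1*15:01": 0.150},
    "program_c": {"HLA-DRB1*16:01": 0.107, "HLA-DRB1*15:01": 0.150},
}
flagged = discordant_alleles(example_freqs)
```

Treating an uncalled allele as frequency 0.0 is a deliberate choice in this sketch, since an allele missing from a program's reference is exactly the kind of discrepancy the comparison is meant to expose.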

In order to evaluate the quality of the data, the allele frequencies were compared to the largest freely available lab-typed Swedish population, which consists of only 245 Swedes144. This is the only population with more than three genes available for the same set of individuals at a 2-field resolution. Unfortunately, not all of the genes studied had 2-field information available, but there was a high correlation for the genes that were possible to compare. The dataset was also compared to a British SNP2HLA population76 with over 5,000 samples to compare population diversity. HLA-DQA1*03:03 had a 6% allele frequency in SweHLA but was not found in the British set. This is another clear example of the problems that using different software programs can create: it was not proof of diversity, but rather a result of SNP2HLA not having the allele in its reference panel. These kinds of discrepancies in calling are to be expected, since the number of alleles available in the references for the four software programs used in this study varies between 298 and 9,854.

This approach, to create consensus calls, seems to be closer to the gold standard than any of these software programs are on their own. SweHLA is a high-quality resource with the option to use allele frequencies from one single software program if that is desirable, but our recommendation is to combine software programs for a more robust and trustworthy result.

Paper III: HLA-A confers protection in HLA-B*27 positive ankylosing spondylitis

In this study, the methodology developed in paper II was utilised to study association between HLA alleles and ankylosing spondylitis. By changing one of the software programs, 17 genes could be genotyped (HLA-A, -B, -C, -DOA, -DOB, -DPA1, -DPB1, -DQA1, -DQB1, -DRA, -DRB1, -E, -F, -G, MICA, MICB and TAP2), with 15 having a genotyping rate above 80% (HLA-DPA1 and TAP2 were excluded). Since both HLA-B*27 independence and sex-specific predisposition were to be investigated, the population was subdivided into six groups: one group with all samples and one group with all HLA-B*27 positive samples, each of which was further split into only males and only females. Seeing as sex was significantly associated with disease, it was included as a covariate in analyses of groups with combined male and female samples (except for amino acids, since Fisher’s exact test could not use covariates). Discussion about sex-stratification can be found in the next section because of the close connection with paper I in that regard.

Of the 15 genes that were genotyped at a high rate (> 80%), nine were associated with disease, with 25 alleles when all samples (combined set: ncases=310 and ncontrols=2,196) were used (all associated alleles with OR and CI for all tests can be viewed in Figure 10). These alleles were detected in different combinations of the datasets (illustrated in Figure 10 with boxes: grey for combined, black for male and white for female). Four alleles were detected only in the combined dataset, whereas seven were associated in all datasets: the combined, male and female datasets. The remaining 14 associated alleles were shared between the combined and the male set only. The seven alleles associated in all groups demonstrate a robust association, while the ones shared between the combined and male datasets hint at the composition of the combined set (3:1 males). Three alleles were associated in females but not in the male or combined dataset: HLA-DQA1*04:01, -DQB1*04:02 and -DRB1*08:01 (Figure 10). Interestingly, HLA-DRB1*08:01 has previously been associated with a decreased degree of radiological changes in AS, concordant with the manifestation of disease, seeing that female cases often have fewer changes145. The other two alleles were novel. One of the alleles associated with risk in the combined dataset was MICB*005:02.

An HLA-B*27 positive (or negative) cohort is the best way to find out whether there are any association signals independent of this gene in ankylosing spondylitis. In most studies, researchers perform conditional analysis, where the gene is either masked or used as a covariate. This strategy is commonly used in AS too, but it was not the optimal way to do it here. In AS, with more than 80% of cases being HLA-B*27 positive19,54 versus only around 7-14% of controls depending on population144,146, this comes very close to conditioning on disease, which is not the intent. The best way to extract truly independent associations in AS is to have a homogeneous group. In this study we were able to create an HLA-B*27 positive population (with only eight HLA-B*27 negative cases, a negative analysis could not be done).

Figure 10. All alleles significantly associated with ankylosing spondylitis when analysing a population with both HLA-B*27 positive and negative individuals. An odds ratio above 1 (with the 95% confidence interval not crossing the line) indicates risk, and below 1 indicates a protective effect. Grey boxes indicate alleles significantly associated with disease in the combined set, black in males and white in females.

One allele, HLA-A*24:02, was protective in the HLA-B*27 positive population. Neither the male- nor the female-specific analysis revealed any significant associations. This was not unexpected given that the groups were rather small. In order to try to find other HLA-B*27 independent associations, these three groups were further analysed to see if there could be any association signals from amino acids. This, too, presents an opportunity to discuss resolution. An amino acid is a lower resolution than a 2-field allele, because amino acid analysis groups alleles together by matching one single amino acid. This does not mean that the alleles have the same three bases coding for it, but in the protein, the amino acid at that position is the same. Our HLA-B*27-independent associations reveal a link between the amino acids and the associated allele, as the protective amino acids are found in HLA-A*24:02. When all HLA-B*27 positive samples were included, position 119 in HLA-A was significantly protective when it was a leucine (119L). Males revealed a significant protective association at HLA-A position 180 if the amino acid was glutamine (180Q). Both amino acids are located in the binding groove of HLA-A and could significantly impact binding capacity. Both 119L and 180Q have bigger side chains than the other amino acids found at these positions, and while 119L does not change the charge of the protein, 180Q is hydrophilic, in contrast to the hydrophobic amino acids commonly occurring at that position.

This is the largest HLA study (by the number of genes) of ankylosing spondylitis done in one and the same population. It is also one of the largest studies, by the number of samples and genes, of only HLA-B*27 positive samples. We have not only replicated some of the previously known HLA associations but also presented new ones. What makes the newly described HLA-A*24:02 allele protective in an HLA-B*27 positive population? Figuring out how this allele is involved in hindering disease progression might reveal a drug target.

Paper I and III

Both paper I and paper III cover association analysis of ankylosing spondylitis with the goal of investigating whether a sex-specific predisposition to disease exists. The manifestation of AS differs between the sexes, but the genetic background to this has not been studied. It is, however, important to understand whether this has a genetic cause, considering that it could be exploited to better personalise disease treatment. In fact, these studies demonstrate that the genetic predisposition reflects the sexual dimorphism in disease.

Paper I revealed male-specific variants in or close to RUNX3 and P3H1, whereas MICB is female-specific. Paper III found three female-specific alleles in the HLA-B*27 mixed population (HLA-DQA1*04:01, -DRB1*08:01, -DQB1*04:02), and one amino acid in males (HLA-A 180Q). To make sure that the results were not caused by subsampling or differences in cohort size, two 5% FDR thresholds were used, one with the phenotype shuffled (paper III) and one with sex shuffled (paper I and paper III). A further investigation compared the difference in allele frequencies between cases and controls in females versus males. This examined whether the difference in association is caused by one sex having a higher frequency of a specific allele. If the difference between cases and controls is larger in one sex than in the other, the sex-specific association is more likely to be genuine. This all adds weight to the discovery of sex-specific genetic predisposition in ankylosing spondylitis.
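The shuffling idea can be sketched in a few lines of Python. This is an illustration only: it derives an empirical threshold from the 95th percentile of a shuffled-label null distribution, which is an assumed simplification and not the FDR procedure used in the papers. The cohort is toy data and all function names are hypothetical.

```python
import random


def frequency_difference(carrier, is_case):
    """Carrier frequency in cases minus carrier frequency in controls."""
    cases = [c for c, k in zip(carrier, is_case) if k]
    controls = [c for c, k in zip(carrier, is_case) if not k]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def shuffled_threshold(carrier, is_case, n_shuffles=1000, quantile=0.95, seed=0):
    """Empirical threshold: a quantile of |difference| under shuffled labels."""
    rng = random.Random(seed)
    labels = list(is_case)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(labels)  # breaks any link between allele and phenotype
        null.append(abs(frequency_difference(carrier, labels)))
    null.sort()
    return null[int(quantile * (len(null) - 1))]


# Toy cohort: 1 = carries the allele, 0 = does not.
carrier = [1] * 30 + [0] * 20 + [1] * 10 + [0] * 40
is_case = [True] * 50 + [False] * 50

observed = frequency_difference(carrier, is_case)
threshold = shuffled_threshold(carrier, is_case)
```

Passing sex labels instead of phenotype labels to `shuffled_threshold` gives the corresponding sex-shuffled check, and running `frequency_difference` separately on the female and male subsets gives the per-sex comparison described above.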

There are a few points of importance here. When sex is significantly associated with disease, using it only as a covariate might not be enough. In paper III, we used sex as a covariate, which should strengthen sex-specific association signals. Since we have more males, it is easy for the signal to be dominated by them; using sex as a covariate should make this influence less pronounced. However, this did not reveal any of the sex-specific variants that we discovered later by dividing the cohort into male and female datasets. Even though the subdivision reduces the number of samples, it creates more homogeneous groups and simplifies the detection of associations. It probably goes without saying, but many of these sex-specific associations would not have been detected if sex had been disregarded, as is often the case in studies of ankylosing spondylitis. There are studies that do not mention the sex ratio of their population at all, which is likely to be imbalanced, and others include this information but do not use it as a covariate in the analysis even though it is significantly associated with disease.
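Why stratification can expose a signal that the full cohort hides is easiest to see with a small worked example. The Python sketch below compares a pooled odds ratio with per-sex odds ratios, using the standard Woolf (log-scale) 95% confidence interval. The counts are invented toy numbers, and the pooled estimate is unadjusted, so this is not the covariate-adjusted model of paper III.

```python
import math


def odds_ratio_ci(case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers, z=1.96):
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 table."""
    a, b, c, d = case_carriers, case_noncarriers, control_carriers, control_noncarriers
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Toy counts per stratum, invented for illustration:
# (case carriers, case non-carriers, control carriers, control non-carriers)
strata = {
    "female": (5, 95, 20, 80),
    "male": (30, 170, 25, 175),
}

per_sex = {sex: odds_ratio_ci(*counts) for sex, counts in strata.items()}

# Pooled table: add the strata together column by column.
pooled = odds_ratio_ci(*(sum(column) for column in zip(*strata.values())))
```

With these toy counts the female interval lies entirely below 1, while the pooled interval crosses 1 because the larger male stratum dominates the estimate.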

Another point in favour of these studies is that although the population sizes were relatively small, they were able to replicate known associations as well as reveal new ones. In paper I, functional validation strengthens the result, while the associations in paper III would need more investigation to prove the putative functional effects. HLA-DRB1*08:01 does make a compelling story, though, with its association with fewer radiological changes and its association with female-specific disease, as females are reported to have fewer radiological changes.

Concluding remarks and future prospects

Overall, we have taken the research into ankylosing spondylitis and NGS HLA genotyping forward. There is still much that could be done with these data, not to mention how this work could be transferred and utilised in other diseases or in clinical genetics.

What have we learnt from these projects?

The first efforts to utilise this dataset for genetic studies have resulted in new insight into the sex stratification of ankylosing spondylitis, for both the whole targeted sequence array and the HLA alleles. Even though males have a higher incidence of disease and males and females have different manifestations of disease, the genetic differences had not been studied before. Skewed sex ratios are found in several diseases, especially immunological diseases. Hopefully this proof of concept, that a skewed sex ratio and differential manifestation of disease might be caused by genetic predisposition, will lead to more studies of its kind. Another lesson that really hits home is not to expect a software program to be perfect. Always make sure to test it, ideally on data where you know the result, to learn whether it performs the task that it should and in the way you expect. If you are unsure about what bias a software program might bring, combine several to reduce the overall bias. This might result in less data, but at least the data you have will be of high quality and confidence.

What else could be done with the ankylosing spondylitis dataset?

There is still a lot that can be done with this dataset. First of all, imputation, which might be the easiest next step to implement. Since we have a targeted array covering around 1% of the genome, there is a wealth of information that has not been considered yet. The targeted array has allowed for a more in-depth study of this disease than the various SNP chips used in earlier publications. By imputation based on a really large reference panel, with almost 65,000 haplotypes from 32,500 samples of predominantly European ancestry [147] and with high-quality variants, we could expand the region of information and learn even more. To provide extra strength to this analysis, extra controls from SweGen can be included.

Second, careful association analysis should also be performed for the sex chromosomes. The sex chromosomes are largely overlooked in most disease association studies, often explained by the greater difficulty of calling variants on sex chromosomes than on autosomes. In females, chromosome X is diploid, but in males there is a mix of haploid and diploid regions (chromosomes X and Y share overlapping regions called pseudoautosomal regions [148]), which makes variant calling tricky. Our solution has been to divide the X chromosome in males, treating the pseudoautosomal region as diploid and the remaining region as haploid, but further analysis is in the works.
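The division of the male X chromosome can be expressed as a small lookup. The following Python sketch assumes GRCh38 coordinates for the pseudoautosomal regions, since the text does not state which reference build was used; the function name is hypothetical, and the Y chromosome is handled only in the simplest possible way.

```python
# Pseudoautosomal regions on chromosome X, 1-based inclusive coordinates.
# Assumption: GRCh38; the build used in the thesis is not stated here.
PAR_X_GRCH38 = (
    (10_001, 2_781_479),          # PAR1
    (155_701_383, 156_030_895),   # PAR2
)


def expected_ploidy(chrom, pos, sex):
    """Ploidy a variant caller should assume at one position for one sample."""
    if sex not in ("female", "male"):
        raise ValueError("sex must be 'female' or 'male'")
    if chrom == "X":
        if sex == "female":
            return 2
        in_par = any(start <= pos <= end for start, end in PAR_X_GRCH38)
        return 2 if in_par else 1
    if chrom == "Y":
        # Simplification: the pseudoautosomal part of Y is ignored in this sketch.
        return 1 if sex == "male" else 0
    return 2  # autosomes
```

For example, `expected_ploidy("X", 1_000_000, "male")` falls inside PAR1 and is treated as diploid, whereas a male position in the middle of chromosome X is treated as haploid.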

Third, it would be interesting to investigate structural variation (SV) in the dataset. Even though our data come from targeted sequencing, which makes SV detection harder, it is not impossible. There are software programs especially designed to accommodate exome sequencing, which is similar in scale to our targeted array [149]. In this field, it is standard to apply an approach similar to our HLA typing: using several software programs and taking the consensus before trusting an SV. This examination is even more interesting since it is known that there is a copy number variation (CNV) region within the MHC that covers, among others, HLA-DQA1 [150].
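The consensus idea, which the popular science summary later in this thesis describes as an n-1 method across three to four programs, can be sketched as follows. The rule coded here (accept a call when all but at most one program agree, and count a missing call as a disagreement) is an assumed reading of that method, the function name is hypothetical, and each call is treated as a single label for simplicity.

```python
from collections import Counter


def consensus_call(calls):
    """Return the call made by at least n-1 of the n programs, else None.

    `calls` holds one entry per program; None marks a program that made
    no call and counts as a disagreement.
    """
    n = len(calls)
    if n < 3:
        raise ValueError("an n-1 consensus needs at least three programs")
    counts = Counter(call for call in calls if call is not None)
    if not counts:
        return None
    call, support = counts.most_common(1)[0]
    return call if support >= n - 1 else None
```

With three programs, two must agree; with four, three must agree, so a single program with a built-in bias can never decide a call on its own. The same voting scheme could be applied to SV calls once they have been reduced to comparable labels.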

Fourth, it would be good to do further functional validation. In some of the studies, we could show that a specific variant had an effect on the binding of transcription factors or altered expression, but we still do not know how these variants contribute to disease. We also have the example of HLA-A*24:02 and the amino acids, which could affect binding in this allele compared to alleles without the two associated amino acids, but the functionality of this is not proven. In any case, the picture will not be complete until the molecular mechanism by which variants contribute to disease can be demonstrated.

Fifth, it would be interesting to expand the study and look further afield. By this I mean investigating whether the associations found in the Swedish population can be replicated in other populations. In an ideal study, all the AS datasets would be combined to see what the additional power could reveal in a larger, more diverse dataset, and which variants hold the key to disease regardless of population. With a larger AS population, it would also be possible to subdivide into sub-phenotypes, such as co-morbidity categories.

Sixth, even though I do think combining software programs is the best way to type HLA alleles, I would like to compare the associations from our SNP2HLA data with those from Cortes et al. (2015). This paper did a thorough investigation of HLA alleles imputed with SNP2HLA from an Immunochip. Cortes had 9,069 ankylosing spondylitis cases and over 13,000 controls (no sex information provided). It would be interesting to see if we are able to find the same associations in our different (and smaller) populations and, if we do, whether we can find a difference when having sex as a covariate or when testing each sex by itself.

What about all data generated with the custom-made targeted array?

Since the creation of the array, several diseases have been sequenced with it: not just ankylosing spondylitis but also Addison’s disease, systemic lupus erythematosus (SLE), Sjögren’s syndrome, myositis and vasculitis, generating a group of immune-mediated diseases (all sequenced with the same array and in the same way). This presents a unique dataset that should be exploited. Some of these diseases also share co-morbidities, like psoriasis or nephritis. In AS the co-morbidity groups were too small for independent examination, but by combining these different disease datasets, it might be possible to study sub-phenotypes common across diseases. Not to mention the possibility of finding disease variants common between diseases. Combining the datasets can lead to the detection of causative variants with lower effect, which are hard to detect in a single disease but will be brought out by the heightened power of more samples.

What is next for HLA analysis?

Even though our work is important and, if implemented by others, will lead to more robust genotyping of HLA genes from NGS data, there are ideas that might take it a step further. New software programs rely more on graph-guided typing instead of realigning to the known alleles from the IMGT/HLA database [108,109]. This idea of using population graphs, with the different variant options across the gene, will make the software programs less dependent on what has already been reported [108,109]. These software programs claim the ability to find novel alleles [108,109], but these alleles are currently only reported as G-groups, which is why we have not used them in these studies.

We have tested Kourami, but since the resolution was different we could not incorporate it in paper II. Overall, the G-groups agreed with the calls from the other software programs once their 2-field alleles had been converted to the corresponding G-groups. We tried to investigate the potentially novel alleles, but the variants did not agree with those seen in IGV when looking at the BAM files. This can have many explanations, for example that the reads are realigned to the graph, and only Sanger sequencing of the potentially novel alleles would prove or disprove their existence. When the population reference graph-based software programs start giving 2-field resolution, they will be one of the best options available. This, in combination with long-read sequencing of the MHC, will give the gold standard of HLA sequencing (SBT) a hard fight.
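The comparison across resolutions described above amounts to converting 2-field calls to G-groups before checking agreement. A minimal Python sketch follows; the lookup table uses placeholder allele and G-group names invented for illustration, whereas a real table would be derived from the G-group definitions distributed with the IMGT/HLA database. The function names are hypothetical.

```python
# Hypothetical lookup from 2-field allele to G-group (placeholder names).
# Two different 2-field alleles may share one G-group.
TWO_FIELD_TO_G_GROUP = {
    "A*toy:01": "A*toy:01:01G",
    "A*toy:05": "A*toy:01:01G",
    "A*toy:02": "A*toy:02:01G",
}


def to_g_group(allele, lookup):
    """Map a 2-field allele to its G-group, or None if it has no entry."""
    return lookup.get(allele)


def concordance(two_field_calls, g_group_calls, lookup):
    """Fraction of paired calls that agree once both are at G-group level."""
    pairs = list(zip(two_field_calls, g_group_calls))
    matches = sum(to_g_group(allele, lookup) == g for allele, g in pairs)
    return matches / len(pairs)


rate = concordance(
    ["A*toy:01", "A*toy:05", "A*toy:02"],
    ["A*toy:01:01G", "A*toy:01:01G", "A*toy:03:01G"],
    TWO_FIELD_TO_G_GROUP,
)
```

The conversion only works in one direction: a G-group cannot be turned back into a unique 2-field allele, which is why calls reported only as G-groups could not be combined with the 2-field calls.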

What is next for SweHLA?

As mentioned above, more software programs are being developed. Since the creation of SweHLA, the same population has been run with one more software program, and Kourami has been run, though it is not included in the analysis. My vision is that this resource will continue to grow, to meet developing needs in the field. It already serves the needs of SNP-based typing as well as NGS data at 2-field resolution. The next step would be to include G-groups from population graph software programs. With the use of software programs that claim they can detect novel alleles, another door opens. As for the resource, a combination of software programs capable of identifying novel alleles could be used for an in-depth analysis of HLA in the Swedish population. Novel alleles would of course need validation by sequencing; here long-read sequencing could be put to good use.

Popular science summary

Ankylosing spondylitis (previously known as Bechterew's disease) is an immunological disease that affects the quality of life of those affected. The main symptoms of the disease are back pain and stiffness, caused by inflammation in the spine and the joints of the pelvis. If the disease is allowed to progress undisturbed, the spine becomes curved and the vertebrae begin to form bony outgrowths, which fuse and eventually form a bamboo-like structure. The curved spine leads to breathing difficulties, as the volume of the chest decreases and it becomes harder to expand because the whole structure is stiffer. It is also common for patients with ankylosing spondylitis to be affected by other conditions: subclinical inflammation of the gastrointestinal tract, psoriasis, uveitis (inflammation of the eye) and joint inflammation. Ankylosing spondylitis is more common in men than in women (2-3:1), which is unusual for immunological diseases. The symptoms can also vary with sex; for example, a higher degree of skeletal changes is seen in men.

Ankylosing spondylitis is a hereditary disease whose cause has not been fully mapped. When it comes to the genetics behind the disease, more than 50 regions of the genome have been linked to it. One variant of the immune gene HLA-B, namely HLA-B*27, is found in the genome of more than 80% of patients, but fewer than 7% of those carrying HLA-B*27 develop the disease, which suggests that more genes are involved. (HLA genes are translated into proteins that are essential for a functioning immune system.) This thesis focuses on the genetic causes of ankylosing spondylitis in a Swedish population and the methods used for these investigations.

In the first study we sequenced (read) about 1% of the human genome, selected on the basis of genes previously associated with various immunological diseases, together with the surrounding conserved regions (to include regulatory elements). In total, 310 individuals with ankylosing spondylitis and 381 healthy blood donors were used to investigate associations with the disease. As the disease varies between the sexes, we investigated whether this was reflected in its genetic background. With a univariate test, the common variants (carried by >5% of the samples) could be examined, and it was no surprise that the strongest association in both sexes came from HLA-B. Some variants near the gene RUNX3 were significantly associated with the disease in men, while an aggregate test, which includes both common and less common variants in a region (for example around a gene), identified both P3H1 and the HLA gene MICB as associated with disease in men and women, respectively. We were able to demonstrate that the variants at RUNX3 can contribute to increased expression of the genes they regulate, while MICB alters the binding of transcription factors. This could be shown in bone cells, among others, which is of interest for the disease as it involves skeletal changes.

The next study involved both finding a robust method for examining HLA genes and creating access to a larger number of controls, for continued studies of ankylosing spondylitis but also as a resource for other researchers. The genomes of 1,000 Swedes were sequenced and published in 2017, and with these data we could expand our control group for an HLA study. HLA genes can be typed from sequenced samples with the help of software programs. We used three to four programs, depending on the gene, whose results were combined with an n-1 method to avoid the built-in bias of any one program. While building up a control resource for HLA studies, we could also examine the frequency of the different HLA gene variants in the Swedish population. We could also see clear examples of how bias can affect research, as 15 of the 18 gene variants with large differences between the software programs have been associated with various diseases.

With the knowledge from the previous study, an investigation was made of the role of the HLA genes in ankylosing spondylitis, both regarding whether there was any difference between the sexes and whether there were variants independent of HLA-B*27. The same approach was used, 3-4 programs whose results were combined with an n-1 method, to obtain HLA gene variants from the previously mentioned samples plus 815 extra controls. When all samples were used, several earlier associations could be replicated; we also saw that three HLA variants were associated in women only. Two of these have not been described in connection with the disease before, and the third (HLA-DRB1*08:01) has been seen in connection with a lower degree of skeletal changes. To investigate independence from HLA-B*27, only the samples carrying this variant were used. HLA-A*24:02 was associated with ankylosing spondylitis in the group with both sexes. We took a closer look at the region by examining whether the amino acids that the genes give rise to could give us more information. HLA-A had two associated amino acids, both of which are encoded in A*24:02. Position 119 was significant in the group with both women and men, while position 180 was specific to men.

With these studies we have been able to show that not only the presentation of the disease differs between men and women, but also the genetic background. This adds weight to the importance of using sex as a factor when studying diseases, especially when they are known to look different between the sexes. We have also shown that the methods one uses can play a large role in one's results. When doing studies that rely on software programs, one should not assume that they do exactly what one intended. Many programs unfortunately have a built-in bias; by combining results from several programs one can reduce the effect of this and obtain more robust and credible results.

Acknowledgement

“I’ve heard it said that people come into our lives for a reason, bringing something we must learn. And we are led to those who help us most to grow, if we let them, and we help them in return. Well, I don't know if I believe that’s true, but I know I am who I am today because I knew you.”

- For Good from Wicked

Many people have come and gone in the corridor during the years I have been here. I would like to thank you all for being a part of this experience, no matter how small that part has been. I also would like to thank collaborators, people at IMBIM I have interacted with and scientists I met or whose talks I have heard; you have all contributed in some way to this. Some have had a more crucial part and need to be mentioned:

First of all, my supervisors, who accepted the challenge to have me as a Ph.D. student. I honestly did not think I would ever do a Ph.D., but the combination of supervisors and the opportunity to work on genetic diseases was too good to resist.

Kerstin, for letting me come to the lab for research training and, even more importantly, for letting me come back and stay. From the first lecture I heard about your research I decided that one day, I would work there. Back then I thought my option was working with dogs, while my dream was genetic diseases in humans, but you made my dream come true. Oh, and for allowing me to steal your office for half-time and thesis writing, that was invaluable.

Jennifer, I don’t know where to start, there is so, so much I’m thankful for. After having you as a supervisor for the master thesis, I knew you would be a fantastic Ph.D. supervisor as well. It is impossible to count all the times you made me more motivated, made me do my very best or made a day better. I would never have made it without you.

Matt W, for being willing to become my supervisor halfway through, even though the project planned for us never happened.

My office mates Doreen, Iris, and Sharda; J. K. Rowling said it best “There are some things you can't share without ending up liking each other” and knocking out a twelve-foot mountain tro… I mean… and sharing an office during your Ph.D. is one of them. Even though it got very stuffy when we were all there at the same time. For both the times at work and, even more so, for outside of work; dogs, Great Britain, Greece, TV-shows (you won’t admit to watching), books, movies, fika, and food.

All the people who made Friday-fika and lunchtimes memorable. An extra shout out to Freyja, Jonas, Mette, Turid, Matt C, Jason, Andreas, Gerli, Laura, Erik, Anna, Matteo and Sharda, who have been part of conversations with A LOT of laughter that makes me smile thinking about them now.

Åsa and Eva M, for all the lab work you have done for the projects and, together with Susanne, Ulla and Jessica, being all-knowing for any lab or corridor related questions. And on that note, Cecilia, I’m still not sure how the group is running without you, no matter the question you had the answer. Erik and Maja, for taking me in the first time. Ginger and Gerli, for asking good questions, giving good advice and giving an encouraging word when it is needed the most. This is also true for Lina, and I enjoyed our discussions on methods and trying to solve errors. I’m happy that we catch up over food now and then. Mette, with your bright personality you always gave me an energy boost. I have missed having you around. Thank you for proofreading this thesis, for fikas and for being my friend.

People outside of the lab: Eva G, for being the best help imaginable when teaching. I always enjoy running into you. Alexis, Dorothe, co-teachers and course responsible, for being helpful and teaching me new things, who knew I would enjoy bacteria or blood so much. Uppmax, for being a great computational resource with excellent support, even though we have had some hard times during the years. Johan, Björn and others from SciLifeLabs Bioinformatics advisory program. To Rågher and Ronnie, for being my models for the “in sickness and in health” figure.

An acknowledgment is not finished without the people that make your life complete, the ones outside of work: The wonderful people I have been incredibly lucky to gather around me since I got to Uppsala a little more than 10 years ago.

The guy in my life, Michael, who has for some reason been by my side for over eight years. Your encouragement and craziness always keep me together and make any day better, no matter which “pandamonium” is going on. Awwwwwww.

Amanda, a frequent visitor in our home, you always bring sunshine wherever you go. For being my Scotland road trip companion, the memories of which help me through hard days, and for, even more important, getting me. Du fattas mig! (I miss you!)

Christofer, for being a great walking companion, psychologist, patient, gamer, cheerleader and for being the one that encouraged me and went with me on the greatest adventure (oh, New Zealand how I miss you). I’m deeply grateful for the eight years together, and there will be more, no matter where we end up in the world (you will not get rid of me that easily). And for being a big help in keeping me sane through this time.

Veronica, for being a great friend all round. Spending time with you always makes me feel better (you are good at saying the right things), and I am thankful that you have time for me.

Tove, Ida and again Christofer and Veronica, for helping me escape from thinking about the thesis, and also escaping rooms and games. I hope these escape days/nights continue for a long time (but they need to hurry and create more. Or is that our next mission in life Christofer?).

Viktor, for being a nerd at my own level, and for being the first to read this thesis, at least the first draft. Sofia (Bauero), and Simon and Sabina, I always cherish the time spent with you.

Friends from my Vikbolandet and Norrköping days. Tessan (you will always be Tessan for me, sorry) and Martin, for being my friends since forever. It is great to have friends you know are there no matter how long it was since the last time you met.

My high school friends My, Nanna and Daniela for the feel-good fikas during the last five years. I wish we could have them more often!

During the years, I have had several amazing teachers without whom I would probably not have chosen this road. Thank you for everything you taught me, for believing in me and for encouraging me.

And of course, my family. My parents, for always helping me with studying when asked to before I moved to Uppsala (even if it was hard sometimes), and letting me choose my own path, even if you did not fully understand what that was. For taking care of the involuntary renovating of the bathroom, allowing me to focus on finishing. To them all (parents, grandparents, brother with family) for letting me come home, for feeding me, and giving me time to relax (or work when needed). Cousin Peter, for Sweden rock, whisky, and talks. Aunt Ewa, for opening your home to us during the Scotland trip. And one more time, the newest member of the family, Rasmus, for being the best possible reason for going home and the best way to replenish energy.

And lastly, to those no longer with us.

References

1. Mendel G. Experiments in plant hybridization. J R Hortic Soc. 1901;26:1-32.

2. Botstein D, Risch N. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nat Genet. 2003;33(S3):228-237.

3. Landles C, Bates GP. Huntingtin and the molecular pathogenesis of Huntington’s disease. Fourth in molecular medicine review series. EMBO Rep. 2004;5(10):958-963.

4. Nathans J, Thomas D, Hogness DS, Shows T, Hogness D. Molecular genetics of human color vision: the genes encoding blue, green, and red pigments. Science. 1986;232(4747):193-202.

5. Neitz J, Neitz M. The genetics of normal and defective color vision. Vision Res. 2011;51(7):633-51.

6. Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet. 2015;6:149.

7. McDermott MF, Aksentijevich I, Galon J, et al. Germline mutations in the extracellular domains of the 55 kDa TNF receptor, TNFR1, define a family of dominantly inherited autoinflammatory syndromes. Cell. 1999;97(1):133-44.

8. Lidar M, Giat E. An Up-to-date Approach to a Patient with a Suspected Autoinflammatory Disease. Rambam Maimonides Med J. 2017;8(1):1-7.

9. Aksentijevich I, Galon J, Soares M, et al. The Tumor-Necrosis-Factor Receptor–Associated Periodic Syndrome: New Mutations in TNFRSF1A, Ancestral Origins, Genotype-Phenotype Studies, and Evidence for Further Genetic Heterogeneity of Periodic Fevers. Am J Hum Genet. 2001;69(2):301-314.

10. Wekell P, Karlsson A, Berg S, Fasth A. Review of autoinflammatory diseases, with a special focus on periodic fever, aphthous stomatitis, pharyngitis and cervical adenitis syndrome. Acta Paediatr Int J Paediatr. 2016;105(10):1140-1151.

11. Wekell P, Berg S, Karlsson A, Fasth A. Toward an Inclusive, Congruent, and Precise Definition of Autoinflammatory Diseases. Front Immunol. 2017;8(April):6-10.

12. McGonagle D, McDermott MF. A proposed classification of the immunological diseases. PLoS Med. 2006;3(8):e297.

13. McDermott MF, Aksentijevich I. The autoinflammatory syndromes. Curr Opin Allergy Clin Immunol. 2002;2(6):511-6.

14. Park H, Bourla AB, Kastner DL, Colbert RA, Siegel RM. Lighting the fires within: the cell biology of autoinflammatory diseases. Nat Rev Immunol. 2012;12(8):570-580.

15. Hedrich CM. Shaping the spectrum — From autoinflammation to autoimmunity. Clin Immunol. 2016;165:21-28.

16. Samuels J, Ozen S. Familial Mediterranean fever and the other autoinflammatory syndromes: evaluation of the patient with recurrent fever. Curr Opin Rheumatol. 2006;18(1):108-17.

17. Robinson PC, Brown MA. Genetics of ankylosing spondylitis. Mol Immunol. 2014;57(1):2-11.

18. Toplak N, Frenkel J, Ozen S, et al. An international registry on autoinflammatory diseases: the Eurofever experience. Ann Rheum Dis. 2012;71(7):1177-82.

19. Cortes A, Hadler J, Pointon JP, et al. Identification of multiple risk variants for ankylosing spondylitis through high-density genotyping of immune-related loci. Nat Genet. 2013;45(7):730-738.

20. O’Rielly DD, Uddin M, Rahman P. Ankylosing spondylitis: beyond genome-wide association studies. Curr Opin Rheumatol. 2016;28(4):337-345.

21. Dean LE, Jones GT, MacDonald AG, Downham C, Sturrock RD, Macfarlane GJ. Global prevalence of ankylosing spondylitis. Rheumatology. 2014;53(4):650-657.

22. Haglund E, Bremander AB, Petersson IF, et al. Prevalence of spondyloarthritis and its subtypes in southern Sweden. Ann Rheum Dis. 2011;70(6):943-948.

23. Exarchou S, Lindström U, Askling J, et al. The prevalence of clinically diagnosed ankylosing spondylitis and its clinical manifestations: a nationwide register study. Arthritis Res Ther. 2015;17(1):118.

24. Lee W, Reveille JD, Weisman MH. Women with ankylosing spondylitis: A review. Arthritis Rheum. 2008;59(3):449-454.

25. Reveille JD, Weisman MH. The epidemiology of back pain, axial spondyloarthritis and HLA-B27 in the United States. Am J Med Sci. 2013;345(6):431-436.

26. Pal B. Ankylosing spondylitis, a seronegative spondarthritis. Practitioner. 1987;231(1430):785-93.

27. Klingberg E, Strid H, Ståhl A, et al. A longitudinal study of fecal calprotectin and the development of inflammatory bowel disease in ankylosing spondylitis. Arthritis Res Ther. 2017;19(1):21.

28. Robinson PC, Leo PJ, Pointon JJ, et al. Exome-wide study of ankylosing spondylitis demonstrates additional shared genetic background with inflammatory bowel disease. npj Genomic Med. 2016;1(1):16008.

29. Ginsburg WW, Cohen MD. Peripheral arthritis in ankylosing spondylitis. A review of 209 patients followed up for more than 20 years. Mayo Clin Proc. 1983;58(9):593-6.

30. Gran JT, Skomsvoll JF. The Outcome of Ankylosing Spondylitis: A Study of 100 Patients. Br J Rheumatol. 1997;36(7):766-71.

31. McVeigh CM, Cairns AP. Diagnosis and management of ankylosing spondylitis. BMJ. 2006;333(7568):581-5.

32. Meesters JJL, Bremander A, Bergman S, Petersson IF, Turkiewicz A, Englund M. The risk for depression in patients with ankylosing spondylitis: a population-based cohort study. Arthritis Res Ther. 2014;16(5):418.

33. Wells KB, Golding JM, Burnam MA. Psychiatric disorder in a sample of the general population with and without chronic medical conditions. Am J Psychiatry. 1988;145(8):976-981.

34. van der Linden S, Valkenburg HA, Cats A. Evaluation of Diagnostic Criteria for Ankylosing Spondylitis. Arthritis Rheum. 1984;27(4):361-368.

35. Kenna TJ, Hanson A, Costello M-E, Brown MA. Functional Genomics and Its Bench-to-Bedside Translation Pertaining to the Identified Susceptibility Alleles and Loci in Ankylosing Spondylitis. Curr Rheumatol Rep. 2016.

36. Evans DM, Spencer CCA, Pointon JJ, et al. Interaction between ERAP1 and HLA-B27 in ankylosing spondylitis implicates peptide handling in the mechanism for HLA-B27 in disease susceptibility. Nat Genet. 2011;43(8):761-767.

37. Dougados M, Dijkmans B, Khan M, Maksymowych W, van der Linden S, Brandt J. Conventional treatments for ankylosing spondylitis. Ann Rheum Dis. 2002;61(Suppl 3):iii40-50.

38. van der Heijde D, Landewé R, Einstein S, et al. Radiographic progression of ankylosing spondylitis after up to two years of treatment with etanercept. Arthritis Rheum. 2008;58(5):1324-1331.

39. van der Heijde D, Landewé R, Baraliakos X, et al. Radiographic findings following two years of infliximab therapy in patients with ankylosing spondylitis. Arthritis Rheum. 2008;58(10):3063-3070.

40. van der Heijde D, Braun J, Deodhar A, et al. Modified Stoke Ankylosing Spondylitis Spinal Score as an outcome measure to assess the impact of treatment on structural progression in ankylosing spondylitis. Rheumatology. 2019;58(3):388-400.

41. Braun J, Haibel H, de Hooge M, et al. Spinal radiographic progression over 2 years in ankylosing spondylitis patients treated with secukinumab: a historical cohort comparison. Arthritis Res Ther. 2019;21(1):142.

42. Zhu W, He X, Cheng K, et al. Ankylosing spondylitis: etiology, pathogenesis, and treatments. Bone Res. 2019;7(1):22.

43. van der Slik B, Spoorenberg A, Wink F, et al. Although female patients with ankylosing spondylitis score worse on disease activity than male patients and improvement in disease activity is comparable, male patients show more radiographic progression during treatment with TNF-α inhibitors. Semin Arthritis Rheum. 2019;48(5):828-833.

44. Murray C, Fearon C, Dockery M, et al. Ankylosing Spondylitis Response to TNF Inhibition Is Gender Specific: A 6-Year Cohort Study. Ir Med J. 2018;111(9):820.

45. Yang M, Xu M, Pan X, et al. Epidemiological comparison of clinical manifestations according to HLA-B*27 carrier status of Chinese Ankylosing Spondylitis patients. Tissue Antigens. 2013;82(5):338-343.

46. Wu Z, Lin Z, Wei Q, Gu J. Clinical features of ankylosing spondylitis may correlate with HLA-B27 polymorphism. Rheumatol Int. 2009;29(4):389-392.

47. Kim T-J, Kim T-H. Clinical spectrum of ankylosing spondylitis in Korea. Jt Bone Spine. 2010;77(3):235-240.

48. Kim T-J, Na K-S, Lee H-J, Lee B, Kim T-H. HLA-B27 homozygosity has no influence on clinical manifestations and functional disability in ankylosing spondylitis. Clin Exp Rheumatol. 2009;27(4):574-9.

49. Freeston J, Barkham N, Hensor E, Emery P, Fraser A. Ankylosing spondylitis, HLA-B27 positivity and the need for biologic therapies. Jt Bone Spine. 2007;74(2):140-143.

50. Brown MA, Jepson A, Young A, Whittle HC, Greenwood BM, Wordsworth BP. Ankylosing spondylitis in West Africans--evidence for a non-HLA-B27 protective effect. Ann Rheum Dis. 1997;56(1):68-70.

51. Schlosstein L, Terasaki PI, Bluestone R, Pearson CM. High Association of an HL-A Antigen, W27, with Ankylosing Spondylitis. N Engl J Med. 1973;288(14):704-706.

52. Brewerton DA, Hart FD, Nicholls A, Caffrey M, James DCO, Sturrock RD. Ankylosing spondylitis and HL-A 27. Lancet. 1973;301(7809):904-907.

53. Caffrey MFP, James DCO. Human Lymphocyte Antigen Association in Ankylosing Spondylitis. Nature. 1973;242:121.

54. Brown MA. Genetics of ankylosing spondylitis. Curr Opin Rheumatol. 2010;22(2):126-132.

55. Reveille JD, Sims A-M, Danoy P, et al. Genome-wide association study of ankylosing spondylitis identifies non-MHC susceptibility loci. Nat Genet. 2010;42(2):123-127.

56. Brown MA, Kenna T, Wordsworth BP. Genetics of ankylosing spondylitis—insights into pathogenesis. Nat Rev Rheumatol. 2016;12(2):81-91.

57. Smith JA. Update on Ankylosing Spondylitis: Current Concepts in Pathogenesis. Curr Allergy Asthma Rep. 2015;15(1):489.

58. Danoy P, Pryce K, Hadler J, et al. Association of Variants at 1q32 and STAT3 with Ankylosing Spondylitis Suggests Genetic Overlap with Crohn’s Disease. Gibson G, ed. PLoS Genet. 2010;6(12):e1001195.

59. Brown MA, Laval SH, Brophy S, Calin A. Recurrence risk modelling of the genetic susceptibility to ankylosing spondylitis. Ann Rheum Dis. 2000;59(11):883-6.

60. Ellinghaus D, Jostins L, Spain SL, et al. Analysis of five chronic inflammatory diseases identifies 27 new associations and highlights disease-specific patterns at shared loci. Nat Genet. 2016;48(5):510-518.

61. Burton PR, Clayton DG, Cardon LR, et al. Association scan of 14,500 nonsynonymous SNPs in four diseases identifies autoimmunity variants. Nat Genet. 2007;39(11):1329-1337.

62. Ranganathan V, Gracey E, Brown MA, Inman RD, Haroon N. Pathogenesis of ankylosing spondylitis — recent advances and future directions. Nat Rev Rheumatol. 2017;13(6):359-367.

63. Cortes A, Pulit SL, Leo PJ, et al. Major histocompatibility complex associations of ankylosing spondylitis are complex and involve further epistasis with ERAP1. Nat Commun. 2015;6(1):7146.

64. Simone D, Al Mossawi MH, Bowness P. Progress in our understanding of the pathogenesis of ankylosing spondylitis. Rheumatology. 2018;57(suppl_6):vi4-vi9.

65. Mosaad YM. Clinical Role of Human Leukocyte Antigen in Health and Disease. Scand J Immunol. 2015;82(4):283-306.

66. The MHC sequencing Consortium. Complete sequence and gene map of a human major histocompatibility complex. Nature. 1999;401(6756):921-923.

67. Wagner CS, Grotzke JE, Cresswell P. Intracellular events regulating cross-presentation. Front Immunol. 2012;3:138.

68. Sommer S. The importance of immune gene variability (MHC) in evolutionary ecology and conservation. Front Zool. 2005;2:16.

69. Choo SY. The HLA System: Genetics, Immunology, Clinical Testing, and Clinical Implications. Yonsei Med J. 2007;48(1):11.

70. Robinson J, Mistry K, McWilliam H, Lopez R, Parham P, Marsh SGE. The IMGT/HLA database. Nucleic Acids Res. 2011;39(Database):D1171-D1176.

71. Robinson J, Halliwell JA, McWilliam H, Lopez R, Parham P, Marsh SGE. The IMGT/HLA database. Nucleic Acids Res. 2012;41(D1):D1222-D1227.

72. IPD-IMGT/HLA Database. https://www.ebi.ac.uk/ipd/imgt/hla/intro.html. 2019.

73. Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SGE. The IPD and IMGT/HLA database: Allele variant databases. Nucleic Acids Res. 2015;43(D1):D423-D431.

74. Norman PJ, Norberg SJ, Guethlein LA, et al. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II. Genome Res. 2017;27(5):813-823.

75. Stewart CA, Horton R, Allcock RJN, et al. Complete MHC haplotype sequencing for common disease gene mapping. Genome Res. 2004;14(6):1176-87.

76. Neville MJ, Lee W, Humburg P, et al. High resolution HLA haplotyping by imputation for a British population bioresource. Hum Immunol. 2017;78(3):242-251.

77. Markov PV, Pybus OG. Evolution and Diversity of the Human Leukocyte Antigen (HLA). Evol Med Public Health. 2015;2015(1):1.

78. Ayala García MA, González Yebra B, López Flores AL, Guaní Guerra E. The Major Histocompatibility Complex in Transplantation. J Transplant. 2012;2012:1-7.

79. Trowsdale J, Knight JC. Major Histocompatibility Complex Genomics and Human Disease. Annu Rev Genomics Hum Genet. 2013;14(1):301-323.

80. Kanda J. Effect of HLA mismatch on acute graft-versus-host disease. Int J Hematol. 2013;98(3):300-308.

81. Hamilton L, Macgregor A, Toms A, Warmington V, Pinch E, Gaffney K. The prevalence of axial spondyloarthritis in the UK: a cross-sectional cohort study. BMC Musculoskelet Disord. 2015;16:392.

82. J. G. ImmunoGenetic study in Chinese population with Ankylosing Spondylitis: Are there specific genes recently disclosed? Int J Rheum Dis. 2013;16:16.

83. Salmela E, Lappalainen T, Fransson I, et al. Genome-wide analysis of single nucleotide polymorphisms uncovers population structure in Northern Europe. PLoS One. 2008;3(10).

84. Salmela E, Lappalainen T, Liu J, et al. Swedish Population Substructure Revealed by Genome-Wide Single Nucleotide Polymorphism Data. Kayser M, ed. PLoS One. 2011;6(2):e16747.

85. Ameur A, Dahlberg J, Olason P, et al. SweGen: A whole-genome data resource of genetic variability in a cross-section of the Swedish population. Eur J Hum Genet. 2017;25(11):1253-1260.

86. Mathieu A, Paladini F, Vacca A, Cauli A, Fiorillo MT, Sorrentino R. The interplay between the geographic distribution of HLA-B27 alleles and their role in infectious and autoimmune diseases: A unifying hypothesis. Autoimmun Rev. 2009;8(5):420-425.

87. Khan MA. HLA-B27 and its subtypes in world populations. Curr Opin Rheumatol. 1995;7(4):263-9.

88. Maiers M, Gragert L, Klitz W. High-resolution HLA alleles and haplotypes in the United States population. Hum Immunol. 2007;68(9):779-788.

89. Eriksson D, Bianchi M, Landegren N, et al. Extended exome sequencing identifies BACH2 as a novel major risk locus for Addison’s disease. J Intern Med. 2016;280(6):595-608.

90. Bertram L, Tanzi RE. Genome-wide association studies in Alzheimer’s disease. Hum Mol Genet. 2009;18(R2):137-145.

91. Cerhan JR, Berndt SI, Vijai J, et al. Genome-wide association study identifies multiple susceptibility loci for diffuse large B cell lymphoma. Nat Genet. 2014;46(11):1233-8.

92. Sawcer S, Hellenthal G, Pirinen M, et al. Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis. Nature. 2011;476(7359):214-9.

93. Costantino F, Talpin A, Said-Nahal R, et al. A family-based genome-wide association study reveals an association of spondyloarthritis with MAPK14. Ann Rheum Dis. 2017;76(1):310-314.

94. Hrdlickova B, de Almeida RC, Borek Z, Withoff S. Genetic variation in the non-coding genome: Involvement of micro-RNAs and long non-coding RNAs in disease. Biochim Biophys Acta - Mol Basis Dis. 2014;1842(10):1910-1922.

95. Gusev A, Bhatia G, Zaitlen N, et al. Quantifying Missing Heritability at Known GWAS Loci. Visscher PM, ed. PLoS Genet. 2013;9(12):e1003993.

96. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci. 1977;74(12):5463-5467.

97. Schuster SC. Next-generation sequencing transforms today's biology. Nat Methods. 2008;5(1):16-18.

98. Lander ES, Linton LM, Birren B, Nusbaum C, et al. Initial sequencing and analysis of the human genome. Nature. 2001;409(6822):860-921.

99. Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome. Science. 2001;291(5507):1304-1351.

100. Lindblad-Toh K, Garber M, Zuk O, et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature. 2011;478(7370):476-482.

101. Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008;24(3):133-141.

102. Davey JW, Hohenlohe PA, Etter PD, Boone JQ, Catchen JM, Blaxter ML. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nat Rev Genet. 2011;12(7):499-510.

103. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754-1760.

104. McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20(9):1297-303.

105. DePristo MA, Banks E, Poplin R, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43(5):491-8.

106. Van der Auwera GA, Carneiro MO, Hartl C, et al. From FastQ Data to High-Confidence Variant Calls: The Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinformatics. 2013;43:11.10.1-11.10.33.

107. Carson AR, Smith EN, Matsui H, et al. Effective filtering strategies to improve data quality from population-based whole exome sequencing studies. BMC Bioinformatics. 2014;15(1):125.

108. Dilthey AT, Gourraud P-A, Mentzer AJ, Cereb N, Iqbal Z, McVean G. High-Accuracy HLA Type Inference from Whole-Genome Sequencing Data Using Population Reference Graphs. Franke A, ed. PLOS Comput Biol. 2016;12(10):e1005151.

109. Lee H, Kingsford C. Kourami: graph-guided assembly for novel human leukocyte antigen allele discovery. Genome Biol. 2018;19(1):16.

110. Dilthey A, Cox C, Iqbal Z, Nelson MR, McVean G. Improved genome inference in the MHC using a population reference graph. Nat Genet. 2015;47(6):682-688.

111. Jia X, Han B, Onengut-Gumuscu S, et al. Imputing Amino Acid Polymorphisms in Human Leukocyte Antigens. PLoS One. 2013;8(6):e64683.

112. Browning SR, Browning BL. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am J Hum Genet. 2007;81(5):1084-97.

113. Nariai N, Kojima K, Saito S, et al. HLA-VBSeq: accurate HLA typing at full resolution from whole-genome sequencing data. BMC Genomics. 2015;16(Suppl 2):S7.

114. Ka S, Lee S, Hong J, et al. HLAscan: genotyping of the HLA region using next-generation sequencing data. BMC Bioinformatics. 2017;18(1):258.

115. Kawaguchi S, Higasa K, Shimizu M, Yamada R, Matsuda F. HLA-HD: An accurate HLA typing algorithm for next-generation sequencing data. Hum Mutat. 2017;38(7):788-797.

116. Szolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O. OptiType: Precision HLA typing from next-generation sequencing data. Bioinformatics. 2014;30(23):3310-3316.

117. Ionita-Laza I, Lee S, Makarov V, Buxbaum JD, Lin X. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet. 2013;92(6):841-53.

118. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-Variant Association Testing for Sequencing Data with the Sequence Kernel Association Test. Am J Hum Genet. 2011;89(1):82-93.

119. Fan Y, Song Y-Q. PyHLA: tests for the association between HLA alleles and diseases. BMC Bioinformatics. 2017;18(1):90.

120. Whitlock MC. Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J Evol Biol. 2005;18(5):1368-1373.

121. Fadista J, Manning AK, Florez JC, Groop L. The (in)famous GWAS P-value threshold revisited and updated for low-frequency variants. Eur J Hum Genet. 2016;24(8):1202-1205.

122. Cline MS, Karchin R. Using bioinformatics to predict the functional impact of SNVs. Bioinformatics. 2011;27(4):441-448.

123. Ionita-Laza I, Capanu M, De Rubeis S, McCallum K, Buxbaum JD. Identification of rare causal variants in sequence-based studies: methods and applications to VPS13B, a gene involved in Cohen syndrome and autism. PLoS Genet. 2014;10(12):e1004729.

124. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012;489(7414):57-74.

125. Boyle AP, Hong EL, Hariharan M, et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012;22(9):1790-7.

126. Lonsdale J, Thomas J, Salvatore M, et al. The Genotype-Tissue Expression (GTEx) project. Nat Genet. 2013;45(6):580-585.

127. Fishilevich S, Nudel R, Rappaport N, et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford). 2017;2017.

128. Choi Y, Chan AP. PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels. Bioinformatics. 2015;31(16):2745-2747.

129. Cingolani P, Platts A, Wang LL, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly. 2012;6(2):80-92.

130. Roider HG, Kanhere A, Manke T, Vingron M. Predicting transcription factor affinities to DNA from a biophysical model. Bioinformatics. 2007;23(2):134-141.

131. Thomas-Chollier M, Hufton A, Heinig M, et al. Transcription factor binding predictions using TRAP for the analysis of ChIP-seq data and regulatory SNPs. Nat Protoc. 2011;6(12):1860-1869.

132. Steinke FC, Yu S, Zhou X, et al. TCF-1 and LEF-1 act upstream of Th-POK to promote the CD4+ T cell fate and interact with Runx3 to silence Cd4 in CD8+ T cells. Nat Immunol. 2014;15(7):646-656.

133. Bauer O, Sharir A, Kimura A, Hantisteanu S, Takeda S, Groner Y. Loss of Osteoblast Runx3 Produces Severe Congenital Osteopenia. Mol Cell Biol. 2015;35(7):1097-1109.

134. Wang D, Diao H, Getzler AJ, et al. The Transcription Factor Runx3 Establishes Chromatin Accessibility of cis-Regulatory Landscapes that Drive Memory Cytotoxic T Lymphocyte Formation. Immunity. 2018;48(4):659-674.e6.

135. Bonyadi Rad E, Musumeci G, Pichler K, et al. Runx2 mediated Induction of Novel Targets ST2 and Runx3 Leads to Cooperative Regulation of Hypertrophic Differentiation in ATDC5 Chondrocytes. Sci Rep. 2017;7(1):17947.

136. Bleil J, Sieper J, Maier R, et al. Cartilage in facet joints of patients with ankylosing spondylitis (AS) shows signs of cartilage degeneration rather than chondrocyte hypertrophy: implications for joint remodeling in AS. Arthritis Res Ther. 2015;17(1):170.

137. Bauer S, Groh V, Wu J, et al. Activation of NK cells and T cells by NKG2D, a receptor for stress-inducible MICA. Science. 1999;285(5428):727-9.

138. Spies T, Blanck G, Bresnahan M, Sands J, Strominger J. A new cluster of genes within the human major histocompatibility complex. Science. 1989;243(4888):214-217.

139. Jiao Y-L, Ma C-Y, Wang L-C, et al. Polymorphisms of KIRs Gene and HLA-C Alleles in Patients with Ankylosing Spondylitis: Possible Association with Susceptibility to the Disease. J Clin Immunol. 2008;28(4):343-349.

140. Wang WY, Tian W, Zhu FM, Liu XX, Li LX, Wang F. MICA, MICB Polymorphisms and Linkage Disequilibrium with HLA-B in a Chinese Mongolian Population. Scand J Immunol. 2016;83(6):456-462.

141. Dolcino M, Tinazzi E, Pelosi A, et al. Gene Expression Analysis before and after Treatment with Adalimumab in Patients with Ankylosing Spondylitis Identifies Molecular Pathways Associated with Response to Therapy. Genes (Basel). 2017;8(4).

142. Forouhan M, Mori K, Boot-Handford RP. Paradoxical roles of ATF6α and ATF6β in modulating disease severity caused by mutations in collagen X. Matrix Biol. March 2018.

143. Larjo A, Eveleigh R, Kilpeläinen E, et al. Accuracy of Programs for the Determination of Human Leukocyte Antigen Alleles from Next-Generation Sequencing Data. Front Immunol. 2017;8(DEC):1-9.

144. Johansson Å, Ingman M, Mack SJ, Erlich H, Gyllensten U. Genetic origin of the Swedish Sami inferred from HLA class I and class II allele frequencies. Eur J Hum Genet. 2008;16(11):1341-1349.

145. Lee W, Reveille JD, Davis JC, Learch TJ, Ward MM, Weisman MH. Are there gender differences in severity of ankylosing spondylitis? Results from the PSOAS cohort. Ann Rheum Dis. 2007;66(5):633-638.

146. Gran JT, Mellby AS, Husby G. The prevalence of HLA-B27 in Northern Norway. Scand J Rheumatol. 1984;13(2):173-6.

147. The Haplotype Reference Consortium, McCarthy S, Das S, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet. 2016;48(10):1279-1283.

148. Ross MT, Grafham DV, Coffey AJ, et al. The DNA sequence of the human X chromosome. Nature. 2005;434(7031):325-37.

149. Kosugi S, Momozawa Y, Liu X, Terao C, Kubo M, Kamatani Y. Comprehensive evaluation of structural variation detection algorithms for whole genome sequencing. Genome Biol. 2019;20(1):117.

150. Wang J, Yang Y, Guo S, et al. Association between copy number variations of HLA-DQA1 and ankylosing spondylitis in the Chinese Han population. Genes Immun. 2013;14(8):500-503.

Acta Universitatis Upsaliensis
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine 1599

Editor: The Dean of the Faculty of Medicine

A doctoral dissertation from the Faculty of Medicine, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Medicine”.)

Distribution: publications.uu.se
urn:nbn:se:uu:diva-393317
