Investigation of GWAS data on Type 1 Diabetes Mellitus using pathway and network based approaches T.J.E. Hooimeijer I6049728 Biomedical Sciences Direction Biological Health Sciences Bachelor thesis Supervised by: Martina Summer-Kutmon & Elisa Cirillo Internship period: 20-04-2015 – 03-07-2015 Maastricht University Date: 02-07-2015

Contents

Abstract
1. Introduction
    1.1 Type 1 Diabetes Mellitus
    1.2 Disease causes
    1.3 Genome wide association studies
    1.4 Research aim
2. Materials and methods
    2.1 Data resources
        2.1.1 Diabetes type 1 GWAS datasets
        2.1.2 WikiPathways
        2.1.3 DisGeNET-analysis
    2.2 Tools
        2.2.1 GWAS3D
        2.2.2 Ensembl BioMart
        2.2.3 Cytoscape
        2.2.4 PathVisio
        2.2.5 GO-analysis
    2.3 Workflow
3. Results
    3.1 GWAS3D-analysis
    3.2 GO-Analysis
    3.3 Network analysis
    3.4 PathVisio visualization of SNPs
        3.4.1 Visualization of SNPs in PathVisio
        3.4.2 Pathways linked to T1DM
4. Discussion
    4.1 GWAS analysis
    4.2 GO-Analysis
    4.3 Network analysis
    4.4 PathVisio visualization of SNPs
    4.5 Comparing results with other studies
    4.6 Shortcomings in this study
    4.7 Conclusion
5. References
6. Appendix
Add-on page: Description of work by student Bachelor BMW

Abstract

Type 1 Diabetes Mellitus (T1DM) is an autoimmune disorder in which the body attacks its

own beta-cells. This leads to an insufficient insulin production, causing a defective glucose

homeostasis. The exact mechanism causing T1DM is unknown, but environmental as well as

genetic factors play a crucial role. With the rise of genome-wide association studies

(GWAS), more and more genes determining the genetic susceptibility to T1DM are becoming

known. The MHC-genes are well known to be involved, but many other genes have already been

discovered and more will probably be discovered by future research. In this research project

we tried to find the disease related processes by integrating and visualizing the genetic

variants with existing pathway knowledge.

Data from two different GWAS were reanalyzed to examine the involvement of SNPs in the

susceptibility to T1DM. SNPs were visualized and reprioritized by GWAS3D and

consequently linked to their genes using Ensembl BioMart. A network was created in which the

gene-SNP list was merged with a list of all the pathways and genes from

WikiPathways. Not all genes could be linked to a pathway from WikiPathways and not all

pathways had genes with T1DM related SNPs. These genes and pathways were removed to

create a final network. This network was analyzed and a GO-analysis was performed to

search for enriched biological processes. The genes not linked to a pathway were further

investigated using DisGeNET. An effort was made to visualize SNPs using PathVisio.

The three pathways with the most genes with SNPs were the ‘Allograft Rejection’,

‘Proteasome degradation’ and ‘Type II interferon signaling’ pathways. The MHC/HLA-genes

were involved multiple times in these pathways, explaining their importance in T1DM. GO-

analysis also showed that immunological processes represent most of the enriched biological

processes. The visualization of SNPs in PathVisio possibly improves the pathway analysis

from GWAS data by giving information about the SNPs after selection of a gene and

providing the possibility to visualize SNPs based on SIFT-scores (from Ensembl BioMart).

The method applied is able to confirm existing knowledge and gives possibilities to search

new loci and pathways involved in the disease process.

1. Introduction

1.1 Type 1 Diabetes Mellitus

Type 1 Diabetes Mellitus (T1DM) is a chronic disease in which the beta-cells in the islets of

Langerhans are damaged or destroyed. T1DM belongs to the class of autoimmune disorders,

which are characterized by the breakdown of the host’s own tissue leading to damage in

different organs and tissues, in this case the beta-cells (1). In the Netherlands alone

approximately 750.000 people are diagnosed with diabetes, of whom 125.000 had

T1DM (2013) (2).

T1DM is also called ‘juvenile onset diabetes’ or ‘insulin-dependent diabetes mellitus’ because

the typical age of onset is between 6-25 years. T1DM is a disease in which the human body

does not produce (sufficient) insulin because of the damaged or missing beta-cells, causing

blood glucose concentrations (BGC) to exceed the accepted range. BGC can range between

300 mg/100 ml and 1200 mg/100 ml, which has multiple physiological effects. The excess

glucose will partly be excreted via the kidneys. The lack of insulin causes inefficient use of

the available carbohydrates. The excess excretion together with the increased osmotic

pressure in the blood stream causes dehydration of the cells and extracellular fluids. Long

periods of high BGC can even lead to endothelial dysfunction, leading to insufficient blood

supply to various tissues, which in turn increases the risk of a stroke, heart attack, blindness

and numerous other serious conditions. Patients are treated with insulin, helping to control

blood sugar levels and thereby counteracting side effects of hyperglycemia (3, 4).

The negative effects associated with long periods of high BGC indicate that more research

into the causes and possible alternative treatment options for T1DM patients is needed. Better

treatment options would be of benefit to numerous people (1, 3).

T1DM must not be confused with Type 2 Diabetes Mellitus (T2DM), see Figure 1. Both types

of diabetes mellitus are recognized by frequent urination, hunger and intense thirst when

untreated. Despite this similarity there are a lot of differences between T1DM and T2DM.

T2DM is also called ‘maturity onset diabetes’ and is caused by insulin resistance. T2DM

often develops at ages over 40 years and is strongly associated with obesity (3, 5).

Figure 1: The differences between T1DM and T2DM.

In both types of diabetes, cells do not absorb glucose. In T1DM patients there is not enough insulin produced

due to the lack of pancreatic beta-cells, which are destroyed by an autoimmune reaction. The insulin is needed to

absorb the glucose. In T2DM patients the cells do not respond to the produced insulin, since the cells have

become insulin resistant. Both diseases lead to a high blood sugar level also called hyperglycemia.

1.2 Disease causes  

The autoimmune reaction in T1DM causes the breakdown of the beta-islets, but it may not be

the primary cause of the disease. The exact mechanisms causing T1DM are still unknown.

Immune disorders and viral infection can play a role, but also genetic predisposition and

environmental factors are important (6, 7). This has also been demonstrated by studying the

different rates of development of T1DM between monozygotic twins or the 15-fold higher

risk for developing the disease for people with a first-degree relative of T1DM patients (8).

Research into the causes of T1DM has shown the importance of the autoimmune reaction

against the pancreatic beta-cells, especially the CD4+- and CD8+ T-cells in the breakdown of

beta-cells in the pancreas. Genetic variations in genes involved in T-cell signaling,

activation and proliferation, or in genes involved in cytokine production, may

cause malfunctioning proteins, favouring an autoimmune response (8).

Currently, a lot of research is conducted into the genes involved in all the processes around T-

cell signaling.

1.3 Genome wide association studies  

In a genome wide association study (GWAS) the genome is screened for genetic variants in

multiple individuals. A common approach is to select subjects with and without a specific

trait or disease and then compare the genomes of the two groups. The aim of genome-wide

association studies is to find genetic variants which are more frequent in the disease population

than in the (healthy) control group. With the increasing amount of GWAS data available, more

and more loci are found that may be involved in disease development. Several studies found

genes and loci related to T1DM, including genes involved in immune response, like the major

histocompatibility complex (MHC) genes, the protein tyrosine phosphatase-22 (PTPN22)

gene and the inhibitory T-cell signaling molecule producing cytotoxic T-lymphocyte-

associated protein 4 (CTLA4) gene (9). The identification of the genes and loci involved in

T1DM may help to better understand the disease mechanisms, and might be useful for early

diagnostics and development of better treatment options.

1.4 Research aim  

In this project we combine data from GWAS on T1DM (alone or in combination with other

diseases) to investigate the effects of single nucleotide polymorphisms (SNPs) on the genetic

predisposition to T1DM using pathway- and network-based approaches. A SNP is a variation

in the DNA sequence found in at least 1% of the population (10). These SNPs will be

analyzed by linking them to genes and consequently combining the genes with pathway

knowledge to form a pathway-gene-SNP network.

Our research question is: “Can we find disease related processes by integrating and

visualizing the significant genetic variants with existing pathway knowledge?”

The visualization of the relationship between SNPs, genes and diseases in pathway diagrams

and networks is an important and intuitive method to confirm existing knowledge as well as

gain new insights into disease mechanisms. Since T1DM is an autoimmune disease, we

expect to find variations in genes that are involved in immune related pathways.

The GWAS data was obtained from an open access database provided by Johnson AD and

O'Donnell CJ. The SNPs were visualized and reprioritized using GWAS3D, a tool that adds

SNPs based on linkage disequilibrium. Linkage disequilibrium is the non-random association of

alleles at different loci: the alleles occur together more (or less) often than would be expected if

they were independent (11). Significant SNPs were linked to their genes using Ensembl BioMart and

consequently these genes were linked to biological pathways in which they are involved. A

network of SNPs, genes and pathways was made using Cytoscape and relevant pathways were

visualized using PathVisio.

2. Materials and methods

Section 2.1 explains the data resources used in the analysis, section 2.2 describes the tools

and section 2.3 gives a detailed workflow of the analysis.

2.1 Data resources

2.1.1 Diabetes type 1 GWAS datasets  

Johnson and O'Donnell analysed and integrated the results of more than a hundred GWA

studies (12). Two of those studies included data about T1DM. The first study was performed

by the WTCCC group, who conducted a GWAS based on 2.000 cases each for 7 diseases with

a combined control group of 3.000 people. The 7 diseases were bipolar disorder, coronary

artery disease, Crohn's disease, hypertension, rheumatoid arthritis and type I and II diabetes

mellitus. The participants in this study were residents of Great Britain and were classified as

white Europeans with Caucasian ancestry. Of the 3.000 control individuals, 1.500 were

recruited from the 1958 British Birth Cohort and 1.500 from blood donors recruited by the

WTCCC. Their age varied between 18 and 69 years. Participants of the T1DM group were

required to be diagnosed with T1DM before the age of 17 and to be insulin dependent from

that time on with a minimum period of 6 months. After quality control, a total of 4.806

samples were analyzed from 1.868 cases and 2.938 controls. A total of 469.557 SNPs were

analyzed (13).

The second study was performed by Hakonarson et al. In this dataset a total of 1.028 cases

and 1.143 controls were used. Cases and family trios with T1DM were selected from multiple

paediatric diabetes clinics in the United States. Control individuals were selected from the

Children’s Hospital of Philadelphia Health Care Network. With the Illumina Human Hap550

Genotyping BeadChip 550.000 SNPs were genotyped from a total of 563 cases, 1.146

controls and 483 T1DM family trios. They all had European ancestry. After quality control a

total of 534.071 SNPs were analyzed (14).

2.1.2 WikiPathways

WikiPathways (http://www.wikipathways.org) is an open online database for biological

pathways that is used for sharing and publishing pathway knowledge (15). Hyperlinks to

literature references and/or information pages on a gene or protein can be added to the

pathway diagram to include as much information as possible. WikiPathways is not only a

convenient tool to study human biology; it also includes pathway data for 20 other species.

Each pathway has its own wiki page where the pathway diagram, the description, the

references and a list of pathway elements, including gene products, metabolites and proteins,

are available. Pathways can be edited by using the pathway editor. A version history for each

change in the pathway is also kept (15). Pathways can also be downloaded and used in

advanced analysis programs like PathVisio, which is described in the PathVisio subsection of section 2.2.

A list of all the pathways and linked genes from WikiPathways was provided by Bram van

Steen.

2.1.3. DisGeNET-analysis

DisGeNET (http://disgenet.org/web/DisGeNET/menu) is an open access database that

combines different sources to link genes to diseases (16). DisGeNET was used to check for

gene-disease associations of the genes in our study that were not present in any of the

pathways.

A script was written to convert the Entrez gene ID-disease associations into Ensembl gene

ID-disease associations. In Cytoscape a network was made in which genes were linked to

the disease(s) to which they are associated. This network was analyzed manually.
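
The identifier conversion described above can be sketched as follows. This is a minimal illustration, not the script used in this project: the function name, the mapping table and all identifiers are hypothetical placeholders, and it assumes the Entrez-to-Ensembl mapping is already available as a dictionary.

```python
def convert_associations(entrez_disease, entrez_to_ensembl):
    """Translate (Entrez ID, disease) pairs into (Ensembl ID, disease) pairs.

    An Entrez ID may map to zero, one or several Ensembl IDs. Unmapped
    IDs are returned separately so they can be checked by hand.
    """
    converted, unmapped = [], []
    for entrez_id, disease in entrez_disease:
        ensembl_ids = entrez_to_ensembl.get(entrez_id, [])
        if not ensembl_ids:
            unmapped.append(entrez_id)
        for ensembl_id in ensembl_ids:
            converted.append((ensembl_id, disease))
    return converted, unmapped


# Placeholder identifiers, not real database entries.
mapping = {"E1": ["ENSG_A"], "E2": ["ENSG_B", "ENSG_C"]}
pairs = [("E1", "disease X"), ("E2", "disease X"), ("E3", "disease Y")]
edges, missing = convert_associations(pairs, mapping)
```

The resulting `edges` list has the gene-disease edge format that a network tool such as Cytoscape can import as a table.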

2.2 Tools

2.2.1 GWAS3D  

GWAS3D (http://jjwanglab.org/gwas3d) is a tool that is helpful in presenting and interpreting

the large amount of data available from GWAS. One way in which GWAS3D helps to present

results is with the Circos-style GWAS visualization, see Figure 4. The most significant

variants are displayed in the outer circle. These variants are linked to genes or genomic

locations shown in the inner circle. Red lines in the center of the circle indicate an epistatic

regulatory interaction with another locus. The thickness of the line represents the intensity of

the interaction (11).

The GWAS3D tool combines different genomic features with comprehensive annotations,

giving an overview of the significance of different genomic regions. The tool also provides a

prioritization algorithm that adds SNPs based on linkage disequilibrium.
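
To make the linkage disequilibrium measure concrete, the sketch below computes the squared correlation between two biallelic loci from allele and haplotype frequencies. It is only an illustration of the standard definition, not GWAS3D's implementation; the function name and the example frequencies are hypothetical.

```python
def ld_r_squared(p_a: float, p_b: float, p_ab: float) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    p_a, p_b: frequencies of allele A (locus 1) and allele B (locus 2).
    p_ab: frequency of the haplotype carrying both A and B.
    """
    d = p_ab - p_a * p_b  # deviation from independence
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


# Hypothetical haplotype frequencies, for illustration only.
r2 = ld_r_squared(p_a=0.5, p_b=0.5, p_ab=0.45)
in_ld = r2 >= 0.8  # 0.8 is the cutoff used in this analysis
```

When the two alleles are independent, `p_ab` equals `p_a * p_b` and the measure is zero; it reaches one when the alleles always occur together.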

In this experiment the analysis was performed in a European American (CEPH) population.

The p-value cutoff was 1E-07, the linkage disequilibrium cutoff (R²) was 0.8, the binding

site p-value was 1E-02 and the promoter range (up/down) was 500/100. No cell type

restrictions were given.

2.2.2 Ensembl BioMart

Ensembl BioMart (http://www.ensembl.org/biomart/) is a data management system that can

be used for advanced data querying. Ensembl BioMart can find genes, sequences etc. in the

Ensembl database based on multiple attributes such as chromosome position or gene list. It

provides a table with a clear overview of all the results. In the filters panel, multiple filters can

be added to define restrictions, like specific genes or chromosomes. The attributes panel allows

you to obtain specific information about the given restrictions, like Ensembl Gene ID,

Ensembl Transcript ID and many other options. One way to use this tool is to look

for genes linked with a certain SNP or list of SNPs. In this project we used the SNPs (after

GWAS3D analysis) as a filter. We selected the ‘Ensembl Gene ID’, ‘HGNC-symbol’ and

‘SIFT-score’ as attributes. The SIFT-score estimates if a SNP will be deleterious. The score

ranges between 0 and 1, where a low score indicates a possible deleterious SNP.
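
As an illustration of how SIFT-scores can be used to flag SNPs, the sketch below selects SNPs below a cutoff. The cutoff of 0.05 is the conventional SIFT threshold and is an assumption here, since no threshold is stated in this text; the function name and SNP identifiers are hypothetical.

```python
SIFT_CUTOFF = 0.05  # conventional threshold; an assumption, not stated in the text


def flag_deleterious(snp_scores, cutoff=SIFT_CUTOFF):
    """Return the ids of SNPs whose SIFT score lies below the cutoff.

    SNPs without a score (None) are skipped, since SIFT only scores
    variants that change an amino acid.
    """
    return sorted(
        snp for snp, score in snp_scores.items()
        if score is not None and score < cutoff
    )


# Hypothetical scores for placeholder SNP ids.
scores = {"rsA": 0.01, "rsB": 0.40, "rsC": None, "rsD": 0.05}
flagged = flag_deleterious(scores)
```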

Ensembl BioMart was also used to investigate the association between genes (filters) and

their phenotypes (attributes: phenotype description), to see whether genes

could be linked to type 1 diabetes. For both these investigations the ‘Ensembl Variation 80,

Homo sapiens Short Variants SNPs and Indels GRCh38.2’ database was used.

2.2.3 Cytoscape

Cytoscape (http://www.cytoscape.org/) is a freely available software tool that can be used to

analyse, visualize and integrate networks (17). One of the functions of Cytoscape is to create

networks by importing files and/or tables. Nodes and edges will then have to be set, after

which Cytoscape will create the network. An example to illustrate the meaning of nodes and

edges is the following: ‘T1DM’ could be a node, each gene associated with T1DM another node,

and each association an edge linking a gene to ‘T1DM’. Networks can also be merged with other networks, giving users

the option to improve interpretation options by advanced visualization. The merging of

networks is illustrated in Figure 2.

Figure 2: Example of the merging function in Cytoscape. In network 1 a gene (gene 1 in this example) is associated with a SNP (SNP 1). Gene 1 is involved in a

pathway (pathway 1), shown in network 2. The merging function combines the two networks so that the pathway,

gene and SNP are linked. ‘Pathway 1’, ‘Gene 1’ and ‘SNP 1’ are examples used to demonstrate the merging function

in Cytoscape.

In this project two networks were made initially. The first network consisted of a list

of SNPs linked to their genes. The second network consisted of the pathways and linked genes from

WikiPathways. These two networks were then merged to form a network with

pathways, genes and SNPs. This is further explained in the workflow (section 2.3) and the

results section.
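
The idea behind the merge can be sketched outside Cytoscape as a union of two edge lists in which shared gene nodes connect the two networks. This is a simplified illustration using the example names from Figure 2; the function name is hypothetical and Cytoscape's own merge offers more options than shown here.

```python
def merge_networks(gene_snp_edges, pathway_gene_edges):
    """Union two edge lists; shared gene nodes join the two networks."""
    edges = set(gene_snp_edges) | set(pathway_gene_edges)
    nodes = {node for edge in edges for node in edge}
    return nodes, edges


network_1 = [("Gene 1", "SNP 1")]       # gene-SNP network
network_2 = [("Pathway 1", "Gene 1")]   # pathway-gene network
nodes, edges = merge_networks(network_1, network_2)
```

Because ‘Gene 1’ appears in both inputs, it becomes a single node that links ‘Pathway 1’ to ‘SNP 1’ in the merged result.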

2.2.4 PathVisio

PathVisio (http://www.pathvisio.org/) is a program that gives researchers the ability to

create, edit, analyse and visualize biological pathways (18). Pathways have been used in

biology for a long time to describe the biological interactions between molecules like genes,

proteins and metabolites. PathVisio gives the possibility to edit pathways or even to create

new ones that can be directly uploaded to WikiPathways. Within a pathway references and

comments can be added, giving the possibility to show a lot of information in a single

overview.

PathVisio can also be used for the visualization of pathways. The visualization option gives

the users the possibility to show experimental data on the pathway diagrams. The data can be

loaded into PathVisio which then can visualize the results on the data nodes and interactions

in the pathway. Different rules for visualization can be used, for example a color gradient can

be used to visualize an activity measurement, whereas a color rule can be used to show

significance levels. Data nodes will split into multiple columns, so all data is visible at the

same time.

Pathway statistics is a tool to find pathways which are (significantly) altered in a dataset. The

user defines a criterion on which the selection should take place (value for activity

measurement, p-value etc). Based on the selected gene list PathVisio performs an over-

representation analysis calculating a Z-score for each pathway. A positive Z-score indicates

that more genes in that pathway meet the criteria than expected based on the complete dataset.
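
A Z-score of this kind can be sketched as the observed count minus its expected value, divided by the standard deviation under a hypergeometric model. The expression below is a commonly used form; that PathVisio uses exactly this expression is an assumption, and the symbols N, R, n and r as well as the example numbers are labels chosen here for illustration.

```python
import math


def pathway_z_score(N, R, n, r):
    """Over-representation Z-score for one pathway.

    N: genes in the complete dataset; R: genes meeting the criterion;
    n: genes in the pathway; r: pathway genes meeting the criterion.
    """
    expected = n * R / N
    # Variance of the hypergeometric distribution.
    variance = n * (R / N) * (1 - R / N) * (N - n) / (N - 1)
    return (r - expected) / math.sqrt(variance)


# Hypothetical counts: 10 of 50 pathway genes meet the criterion,
# against 100 of 1000 genes overall.
z = pathway_z_score(N=1000, R=100, n=50, r=10)
```

Here 5 pathway genes would be expected by chance, so observing 10 gives a positive score.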

PathVisio can also be extended with plugins. In this project the new RI-plugin for PathVisio

was tested. This plugin links the genes in the pathway to related elements, for example the

known variants for the selected gene. Those variants are then visualized in a side-panel. By

clicking on a gene, information about the linked SNPs and their SIFT-scores could be

obtained. These SIFT-scores were obtained from Ensembl BioMart. Visualizations could then

be performed based on SIFT-scores and p-values, making it easier to find genes (with SNPs)

with a potential deleterious effect.

2.2.5 GO-analysis

GOrilla (http://cbl-gorilla.cs.technion.ac.il/) is a tool that can be used to identify enriched GO-

terms in a list of genes (19). There are two input options. In the first option a ‘Single ranked

list of genes’ has to be provided. In the second option ‘two unranked lists of genes (target and

background lists)’ have to be provided. Four different ontologies can be chosen: Process,

function, component and ‘all’. The tool then provides researchers with a table of the most

enriched processes/functions/components and provides a schematic overview.

In this project we used all the genes in the WikiPathways collection as a background list and

used all the genes with T1DM associated SNPs as a ‘single ranked list of genes’, so we could

investigate which processes are overrepresented.
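
The principle of testing a GO-term for overrepresentation against a background list can be illustrated with a hypergeometric tail probability. This is a generic sketch and does not reproduce GOrilla's own statistics; the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(N, B, n, b):
    """Probability of seeing at least b annotated genes by chance.

    N: genes in the background list; B: background genes annotated with
    the GO-term; n: genes in the target list; b: target genes annotated
    with the GO-term.
    """
    upper = min(n, B)
    tail = sum(comb(B, k) * comb(N - B, n - k) for k in range(b, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy example: both target genes carry the term,
# while only 2 of the 4 background genes do.
p = enrichment_p_value(N=4, B=2, n=2, b=2)
```

A small p-value means that the term occurs more often in the target list than would be expected from the background.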

2.3 Workflow  

In Figure 3 a detailed workflow of the analysis is shown to provide a clear overview of the

steps taken. SNPs associated with T1DM were extracted from the database (step 1). SNPs

were visualized and reprioritized by GWAS3D (step 2) and consequently linked to their genes

using Ensembl BioMart (step 3). A GO-analysis was performed to search for enriched

biological processes (step 4). Genes together with their T1DM associated SNPs were

visualized in a network using Cytoscape (step 5). This network was merged together with a

network of all the pathways from WikiPathways and the genes involved in these pathways.

The merged network (‘Merged pathway-gene-SNP network’) consisted of pathways linked to

genes, which were in turn linked to the SNPs (step 6). Not all genes could be linked to

a pathway from WikiPathways and not all pathways had genes with T1DM associated SNPs.

These genes and pathways were removed (step 7) to create a new network, called ‘Filtered

pathway-gene-SNP network’ (step 8). This network was then analyzed using Excel (step 9).

The genes not present in any pathways were further investigated using DisGeNET (step 10).

Additionally, an effort was made to visualize and analyse the T1DM related SNPs using

PathVisio (step 11).
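
The filtering in steps 7 and 8 can be sketched on edge lists as shown below. This is an illustration of the two filters only, not the procedure carried out in Cytoscape; the function name and all identifiers are hypothetical.

```python
def filter_network(gene_snp_edges, pathway_gene_edges):
    """Apply both filters to the merged network.

    Keeps pathway-gene edges only for genes that carry SNPs, and keeps
    gene-SNP edges only for genes that occur in at least one pathway.
    Genes with SNPs but without a pathway are returned separately.
    """
    snp_genes = {gene for gene, _ in gene_snp_edges}
    kept_pathway_edges = [(p, g) for p, g in pathway_gene_edges if g in snp_genes]
    pathway_genes = {g for _, g in kept_pathway_edges}
    kept_snp_edges = [(g, s) for g, s in gene_snp_edges if g in pathway_genes]
    unlinked = snp_genes - pathway_genes
    return kept_pathway_edges, kept_snp_edges, unlinked


# Placeholder genes (G), SNPs (S) and pathways (P).
gene_snp = [("G1", "S1"), ("G2", "S2")]
pathway_gene = [("P1", "G1"), ("P1", "G3"), ("P2", "G4")]
kept_pathways, kept_snps, unlinked = filter_network(gene_snp, pathway_gene)
```

In this toy input, pathway P2 drops out because none of its genes carries a SNP, and gene G2 is set aside as a candidate for the follow-up in step 10.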

Figure 3: The 11 analysis steps performed in this project to integrate, visualize and analyse genetic

variation data together with biological pathway knowledge.

3. Results

3.1 GWAS3D-analysis

5.352 different SNPs related to T1DM were extracted from the database provided by Johnson

and O'Donnell. The p-values which indicated the associations of the SNPs to T1DM varied

from 1,12E-307 for rs9273363 to 8,11E-04 for rs12124983. These SNPs were reanalyzed

using the GWAS3D SNP analyzing program. GWAS3D detected 190 variants having

regulatory signals, 190 variants causing transcription factor binding site affinity changes, 159

variants affecting long range interactions, 62 variants having a direct effect as GWAS leading

SNPs and 128 variants having an indirect effect through high linkage disequilibrium (LD) with

GWAS leading SNPs. The Circos-style visualization is shown in Figure 4.

Figure 4: Circos-style visualization (GWAS3D) of significant SNPs.

SNP ids or loci are displayed in the outer circle, linked to the corresponding genes or genomic location in the

inner circle. Red lines indicate interaction between the linked genes or loci, where the thickness of the line

represents the intensity of the interaction.

Ensembl BioMart

After removing SNPs with a p-value > 1E-07 and after GWAS3D reprioritization, the

remaining SNPs were linked to the affected genes using Ensembl BioMart. These included a

total of 2.183 different SNPs. Eventually 1.521 SNPs of these 2.183 SNPs were matched to

340 different genes. There were 2.496 different links between the 1.521 SNPs and 340 genes.

This is possible since a SNP can be linked to multiple genes.
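
Why the number of links can exceed the number of SNPs is easiest to see in a small example. The sketch below applies the p-value cutoff and then counts distinct SNPs, distinct genes and SNP-gene links; the function name, identifiers and p-values are hypothetical.

```python
P_CUTOFF = 1e-7  # SNPs with a larger p-value are removed


def summarize_links(links, p_values, cutoff=P_CUTOFF):
    """Count distinct SNPs, distinct genes and links after filtering.

    links: (snp, gene) pairs; p_values: snp -> association p-value.
    """
    kept = [(s, g) for s, g in links if p_values[s] <= cutoff]
    snps = {s for s, _ in kept}
    genes = {g for _, g in kept}
    return len(snps), len(genes), len(kept)


# Placeholder SNPs and genes: rs1 is linked to two genes.
links = [("rs1", "GA"), ("rs1", "GB"), ("rs2", "GA"), ("rs3", "GC")]
p_values = {"rs1": 1e-20, "rs2": 1e-9, "rs3": 1e-3}
n_snps, n_genes, n_links = summarize_links(links, p_values)
```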

SNPs not linked to genes

The 662 SNPs that could not be linked to a gene were analyzed again to be sure that Ensembl

BioMart could not link them to a gene. For 145 SNPs a matching gene was found and these

were included in the further analysis.

Most significant SNP

The SNP which is most significantly associated with T1DM is rs9273363 (association

with T1DM, p-value: 1,12E-307). This SNP is linked to the HLA-DQB1 gene.

3.2 GO-Analysis  

A GO-analysis was performed to search for overrepresented biological processes. All the

genes with T1DM associated SNPs were checked against all genes involved in the pathway

collection from WikiPathways. The most significant processes relate to antigen

presentation (from exogenous pathogens). Other processes involve

leukocytes, lymphocytes, T-cells and B-cells, for instance the ‘positive regulation of T-cell

activation’ process.

Some processes not related to immune function are detected too, like nucleosome

organization and assembly. All the processes are shown in Appendix 6.1: Table 1.

3.3 Network analysis  

Merged pathway-gene-SNP network

The 2.496 links between the SNPs and the 340 genes were used to create a network in Cytoscape. In

this network the SNPs were placed around the genes to which they are linked. This network

was merged together with a network where all the pathways and connected genes from

WikiPathways were shown. This latter network consisted of 284 pathways and 5.703 different

genes. Merging the two networks resulted in a network where the genes had SNPs associated

with T1DM linked to them and the genes were then linked to the pathways in which they are

involved (Figure 5). This resulted in a very complex network with 7.502 nodes and 17.017

edges. For explanation of the merging function see section Materials and Methods 2.2.3 and

Figure 2.

Figure 5: Overview of all the pathways, genes and SNPs linked in a single network. The larger network was made by merging the gene-SNP and pathway-gene networks.

Genes not linked to any pathway

The 306 genes that have known T1DM associated SNPs but could not be linked to any pathway were examined with DisGeNET to check for any known involvement in T1DM. For two genes, PTPN22 and ZFP57, DisGeNET found a known link to T1DM. These genes were included in the further analysis.

Filtered pathway-gene-SNP network

To facilitate the interpretation of the network, two filters were applied: (i) genes (and their associated SNPs) not connected to any of the pathways were removed and (ii) pathways without genes that are linked to T1DM associated SNPs were removed. In total, 306 different genes and their SNPs were removed from the network. A total of 210 pathways were removed, so 74 pathways remained, to which 57 different genes were linked. These genes were connected to 364 different SNPs. In the remaining network, the ‘Filtered pathway-gene-SNP network’, the CDK2 gene is involved in 16 different pathways, followed by the DAXX and ITPR1 genes, which are involved in 7 and 6 pathways, respectively. The genes with the most SNPs were HLA-DQA2 with 43 SNPs, followed by HLA-DRA with 30 SNPs.
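The two filters can be sketched as follows. This is a minimal illustration on edge lists, not the procedure used in Cytoscape. It also assumes that pathway genes without SNPs are dropped, which is consistent with the 57 remaining genes but is not stated explicitly. The function name, `GENE_X`, `GENE_Y` and `rs_placeholder` are hypothetical.

```python
from collections import Counter


def filter_network(gene_snp_edges, pathway_gene_edges):
    """Keep only genes that have SNPs and belong to at least one pathway."""
    genes_with_snps = {gene for gene, _ in gene_snp_edges}
    genes_in_pathways = {gene for _, gene in pathway_gene_edges}
    kept_genes = genes_with_snps & genes_in_pathways
    # Filter (i): drop genes (and their SNPs) that are not in any pathway.
    kept_gene_snp = [(g, s) for g, s in gene_snp_edges if g in kept_genes]
    # Filter (ii): pathways without a kept gene lose all edges and disappear.
    kept_pathway_gene = [(p, g) for p, g in pathway_gene_edges if g in kept_genes]
    return kept_gene_snp, kept_pathway_gene


gene_snp_edges = [
    ("CDK2", "rs773107"),
    ("PTPN22", "rs_placeholder"),  # hypothetical SNP id; gene not in any pathway
]
pathway_gene_edges = [
    ("Cell Cycle", "CDK2"),
    ("DNA Replication", "CDK2"),
    ("Cell Cycle", "GENE_X"),      # hypothetical gene without SNPs
    ("Focal adhesion", "GENE_Y"),  # hypothetical gene without SNPs
]

kept_gene_snp, kept_pathway_gene = filter_network(gene_snp_edges, pathway_gene_edges)
pathways_per_gene = Counter(gene for _, gene in kept_pathway_gene)
print(kept_gene_snp)
print(pathways_per_gene.most_common())
```

Counting the remaining pathway edges per gene, as done with `Counter` here, gives the kind of ranking reported for CDK2, DAXX and ITPR1.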

The highest percentage of ‘genes (with SNPs) linked to pathway/total genes in pathway’ was found in the ‘Arylamine metabolism’ pathway, where 1 out of 7 genes (14,3%) has T1DM related SNPs. The second highest ratio, 7,8%, was found in the ‘Allograft Rejection’ pathway. The ‘Focal adhesion’ pathway contains the most genes, a total of 211. An overview of all the pathways, with the linked genes and SNPs, is given in Appendix 6.2, Table 2.
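The ratio used above can be reproduced with a short calculation. The gene counts are the ones reported in this chapter; the function and variable names are hypothetical and only serve the illustration.

```python
def percent_genes_with_snps(genes_with_snps, total_genes):
    """Percentage of pathway genes that have T1DM associated SNPs."""
    return 100.0 * genes_with_snps / total_genes


# (genes with SNPs, total genes in pathway), as reported in the text
pathway_counts = {
    "Arylamine metabolism": (1, 7),
    "Allograft Rejection": (16, 204),
    "Proteasome degradation": (7, 135),
    "Type II interferon signaling": (4, 74),
}

ranked = sorted(
    pathway_counts.items(),
    key=lambda item: percent_genes_with_snps(*item[1]),
    reverse=True,
)
for name, (hits, total) in ranked:
    print(f"{name}: {hits}/{total} = {percent_genes_with_snps(hits, total):.1f}%")
```

The ratio favours small pathways: a single gene out of 7 already gives the highest percentage, whereas the absolute number of genes with SNPs is highest in the ‘Allograft Rejection’ pathway.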

Figure 6: Overview of all the pathways, genes and SNPs linked in a single network. Network of pathways with genes with T1DM associated SNPs. The 3 pathways with the most genes with T1DM associated SNPs and the HLA-DQB1 gene, which is linked to the SNP with the lowest p-value, are indicated.


3.4 PathVisio visualization of SNPs

3.4.1. Visualization of SNPs in PathVisio  

The new RI-plugin for PathVisio was used to visualize SNPs in a pathway. Information about the number of SNPs (and the SNP ids) becomes available in a side panel when a gene is selected (Figure 7). The SNPs are coloured based on their p-value (left side of the box). All SNPs of the selected HLA-C gene have a p-value lower than the cut-off value of 1×10⁻⁷. The right side of the box shows the SIFT-score. These SIFT-scores were obtained from BioMart. Blue means tolerated, whereas white means that no information is available. No deleterious SNPs were found for this gene.

Figure 7: Visualization of SNPs of the Allograft Rejection pathway in PathVisio. In the pathway image the HLA-C gene is selected and the back page on the right shows the list of SNPs linked to this gene. The SNPs are coloured based on the p-value and SIFT-score; below the list, a link to the NCBI page of the SNP and details of the given data are provided. http://wikipathways.org/index.php/Pathway:WP2328

Legend of boxes in right upper corner: the left box indicates the p-value. Green means p-value < 1×10⁻⁷, whereas white indicates a value above that. The right half of the box indicates the SIFT-score. Blue means tolerated, white means no information available, and red (not shown) means a deleterious SNP.
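The colour rules in the legend can be written down as two small functions. This is a sketch of the rules as described here, not the plugin's actual code. The function names and the representation of the SIFT result as a text label are assumptions.

```python
P_VALUE_CUTOFF = 1e-7  # cut-off value stated in the legend


def p_value_colour(p_value):
    """Left half of the box: green below the cut-off, white otherwise."""
    return "green" if p_value < P_VALUE_CUTOFF else "white"


def sift_colour(prediction):
    """Right half of the box: blue = tolerated, red = deleterious, white = no data.

    `prediction` is assumed to be a text label such as "tolerated" or
    "deleterious", or None when no SIFT information is available.
    """
    if prediction is None:
        return "white"
    label = prediction.strip().lower()
    if label.startswith("tolerated"):
        return "blue"
    if label.startswith("deleterious"):
        return "red"
    return "white"


print(p_value_colour(1.12e-307), sift_colour("tolerated"))
print(p_value_colour(0.01), sift_colour(None))
```

Keeping the two halves independent means that a SNP can be significant without having a SIFT prediction, which matches the white right halves visible in Figure 7.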

3.4.2. Pathways linked to T1DM

The ‘Allograft Rejection’, ‘Proteasome degradation’ and ‘Type II interferon signaling’ pathways are discussed in this section. These pathways were chosen because they contain the most genes with T1DM associated SNPs.


Allograft Rejection

The Allograft Rejection pathway has a total of 204 genes, of which 16 have T1DM associated SNPs. The pathway shows interactions of the adaptive immune response that are responsible for the degradation and destruction of an allograft.

The HLA-genes involved in the Allograft Rejection pathway include HLA-DRB4, -DRB1, -DQB1, -DQA1, -DPA1, -DRB3, -DPB1, -DOA, -DMA, -DMB, -DOB, -DRA, -DRB5 and -DQA2. For all those genes except HLA-DRB4, -DRB3, -DOA and -DRB5, SNPs were found that are linked to T1DM. Besides the HLA-genes, the C4A and MICA genes also have SNPs related to T1DM (Figure 7).

Proteasome degradation pathway

7 out of the 135 genes in the ‘Proteasome degradation’ pathway are linked to T1DM related SNPs. The above mentioned HLA-genes also play an important role in the ‘Proteasome degradation’ pathway. The last step of the pathway is the presentation of the antigens on MHC-class I proteins. These are encoded by the HLA-A, -B, -C, -F and -J genes. PSMB8 and PSMB9 are the two other genes with T1DM associated SNPs in the ‘Proteasome degradation’ pathway. These two genes encode beta-subunits of the proteasome 20S catalytic core (P45). The pathway is shown in Figure 8.


Figure 8: The Proteasome degradation pathway from WikiPathways. The red circles are placed around the genes with T1DM related SNPs. Image adapted from WikiPathways. http://wikipathways.org/index.php/Pathway:WP183

Type II interferon signaling

In the ‘Type II interferon signaling’ pathway, 4 out of 74 genes have T1DM associated SNPs linked to them (Figure 9). These four genes are HIST2H4, HLA-B, TAP1 and PSMB9. HIST2H4 encodes the histone H4 protein, one of the proteins involved in the formation of chromatin structure. The TAP1 gene encodes an ABC transporter, which is able to transport a variety of molecules across membranes. The PSMB9 protein is one of the 17 subunits of the 20S proteasome complex, and HLA-B is involved in the recognition of foreign antigens.


Figure 9: The Type II Interferon Signaling pathway from WikiPathways. The genes with T1DM related

SNPs are marked with a red circle. Image adapted from WikiPathways.

http://wikipathways.org/index.php/Pathway:WP619

4. Discussion

4.1 GWAS analysis

A total of 2.183 different T1DM associated SNPs were extracted from Johnson and O'Donnell’s database. After GWAS3D analysis, 1.521 SNPs were eventually matched 2.496 times to 340 different genes using Ensembl BioMart. This means that 662 SNPs could not be linked to any gene, which can cause important data to be lost. These 662 SNPs were entered again in Ensembl BioMart, and 145 of them could be linked to Ensembl gene identifiers but not to HGNC symbols.
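The bookkeeping behind these numbers can be made explicit with a small sketch. A SNP can be linked to more than one gene, which is why 1.521 SNPs produce 2.496 SNP-gene matches, and every SNP without a match counts as lost (2.183 - 1.521 = 662). The function name and `rs_unmapped` are hypothetical; the other identifiers are taken from Table 2.

```python
def summarise_mapping(all_snps, snp_gene_pairs):
    """Count mapped SNPs, SNP-gene links, distinct genes and unmapped SNPs."""
    mapped_snps = {snp for snp, _ in snp_gene_pairs}
    genes = {gene for _, gene in snp_gene_pairs}
    unmapped = [snp for snp in all_snps if snp not in mapped_snps]
    return {
        "mapped_snps": len(mapped_snps),
        "snp_gene_links": len(set(snp_gene_pairs)),
        "genes": len(genes),
        "unmapped_snps": len(unmapped),
    }


all_snps = ["rs3819294", "rs9273363", "rs_unmapped"]  # last id is hypothetical
snp_gene_pairs = [
    ("rs3819294", "HLA-B"),  # the same SNP is linked to two genes
    ("rs3819294", "HLA-C"),
    ("rs9273363", "HLA-DQB1"),
]

summary = summarise_mapping(all_snps, snp_gene_pairs)
print(summary)
```

In this toy example two SNPs are mapped through three links to three genes, and one SNP remains unmapped.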

The SNP with the lowest p-value (rs9273363) can be linked to the HLA-DQB1 gene. Depending on the genotype at this SNP, the risk for T1DM can increase or decrease. The A/A genotype increases the risk, while people with the A/C and C/C genotypes have a 0,87 and 0,15 times lower risk, respectively (20).

The HLA-genes are already classified as important loci in T1DM, as described by Nejentsev et al. In their association analysis they identified 1.475 SNPs linked to the genes of the MHC-region (21).


4.2 GO-Analysis  

The GO-analysis revealed that most of the genes with SNPs linked to T1DM are related to the immune system and its function. Torkamani et al. performed a pathway analysis of the seven diseases assessed by the WTCCC (22). Most of the processes they found for T1DM are also related to the immune system, like the ‘IL2 activation and signaling pathway’. They also found two pathways involved in signal transduction (‘Calcium signaling’ and ‘IP3 signaling’) and several pathways involving angiotensin, AKT and ERK. Differences in results can be explained by the use of MetaCore by Torkamani et al., while other approaches were used in this thesis. Although some differences are found, both analyses found immune-related processes. Since GO-classes are much more generic than pathways, this might also explain some of the differences. Nucleosome organization and assembly may show up because chromatin modification is critical for efficient transcription of the MHC-class II genes (23).

4.3 Network analysis  

WikiPathways comprises a total of 284 curated pathways for Homo sapiens. These pathways contain a total of 5.916 different genes. It is clear that a large amount of knowledge is not (yet) incorporated in these pathways, causing relevant genes to be missing in the pathway analysis. The size of this problem might have been reduced by integrating multiple pathway sources, for example KEGG or Reactome, although the overlap in gene count is quite substantial. The ‘Merged Pathway-gene-SNP network’ had multiple genes that could not be connected to any pathway. Ensembl BioMart was used to investigate the phenotype descriptions associated with these genes. Numerous genes lacked phenotype descriptions and therefore a DisGeNET-analysis was performed. DisGeNET and Ensembl BioMart indicated the association of the ZFP57 gene with T1DM. However, the ZFP57 gene is more strongly associated with another form of diabetes, called ‘Diabetes Mellitus Transient Neonatal 1’ (24).

The pathway- and network analysis approach was found to be useful for investigating SNPs and for seeing which pathways are involved in the genetic susceptibility to T1DM. Numerous loci have been found to be associated with T1DM and a part of these were validated by this novel approach.

4.4 PathVisio visualization of SNPs  

The visualization of SNPs in PathVisio was not meant to generate new data but to test a method to interactively visualize SNPs directly in the pathway. The Allograft Rejection pathway was visualized based on p-values and SIFT-scores. These SIFT-scores indicated no deleterious SNPs. One of the problems encountered was that we did not succeed in linking more than one transcript per gene. A method to take these transcripts into account is necessary, since SIFT-scores can vary significantly between transcripts. This visualization shows that it is possible to visualize SNPs in PathVisio, which facilitates the interpretation of SNPs in pathways.

Allograft rejection and diabetes

The ‘Allograft Rejection’ pathway is the pathway with the highest number of genes with T1DM related SNPs. These genes mainly belong to the MHC/HLA-class genes. Carbonetto and Stephens analyzed the data from the WTCCC, which are also included in this experiment. They found the ‘Allograft Rejection’ and ‘Asthma’ pathways as the top pathways in rheumatoid arthritis and T1DM (25).

In the ‘Allograft Rejection’ pathway antigen presenting cells from the donor or the recipient

(direct and indirect pathway, resp.) activate naive T-cells leading to CD8+ and CD4+ T cell

activation. The MHC/HLA-proteins help in the process of presenting the antigens to the T-

cells. The activation of T-cells ultimately leads to destruction and apoptosis of the allograft

cells (Figure 7). Cytotoxic T-lymphocyte-associated protein 4 (CTLA4) acts as an inhibitor of

the inflammatory response, but no SNPs linked to this gene were found in the analysis of the

GWAS data (26). In both the Asthma and Allograft Rejection pathway, multiple MHC-genes

are involved, explaining the involvement in T1DM. Unfortunately the ‘asthma’ pathway was

not included in the WikiPathways collection.

Articles on the association of the HLA-genes with T1DM go as far back as 1970 (27). In 1996, Noble et al. described the role of the HLA-class II genes, especially of the HLA-DRB1, -DQA1, -DQB1 and -DPB1 genotypes (28). The role of these genes in T1DM is confirmed in this experiment. The similarity between ‘Allograft Rejection’ and T1DM is the immune reaction against tissue cells. In the rejection of an allograft, the allograft is seen as a foreign object and attacked by the immune system, whereas in T1DM the host’s own beta-cells are attacked. It was therefore expected that this pathway would show up, since similar processes are involved in T1DM.

Proteolysis and diabetes

We found several models for the involvement of proteolysis in T1DM. An interesting model of the involvement of proteolysis in the development of T1DM is proposed by Fierabracci (29). Fierabracci’s model starts with an environmental influence, like a viral infection, causing breakdown of self-proteins in the beta-cells. Pieces of the remaining protein fragments are presented to CD8+ T-cells by the MHC-class I molecules. This T-cell activation leads to lysis of the beta-cells. By presentation of these antigens to B-cells, autoantibodies are produced that attack the beta-cells. An extensive description of this hypothesis is given in ‘The putative role of proteolytic pathways in the pathogenesis of Type 1 diabetes mellitus: The ‘autophagy’ hypothesis’ (29).

Balasubramanyam et al. described a different aspect of proteolysis and its involvement in T1DM (30). The ubiquitin-proteasome system is one of the systems involved in the degradation of cellular proteins and is involved in numerous biological functions. One of these biological functions is insulin signaling (30). Insulin promotes cellular growth not only by increasing the synthesis of some proteins, but mainly by the inhibition of overall proteolysis (31, 32). Experimental evidence shows that this inhibition of overall proteolysis comes from the inhibition of the ATP- and ubiquitin-dependent degradation by the proteasomes, in vitro as well as in cultured cells (33). Contrasting evidence shows that prolonged exposure of cells to insulin causes IRS-1 ubiquitin conjugation and consequently degradation of IRS-1 (34, 35). In diabetes, inflammatory mediators could upregulate SOCS proteins (suppressor of cytokine signaling proteins), especially SOCS3. IRS-1 and -2 could then be degraded by proteasomes, causing insulin resistance. These SOCS proteins and the processes involved could play a role in T1DM, although the exact mechanisms are still unclear (33). In addition to the ubiquitin/proteasome system, the proteolytic autophagic pathway may also play a role in the autoimmune processes. Autophagy could be involved in the autoimmune process within MHC class I & II self-antigen presentation. Components of the ubiquitin/proteasome system may even be shared with the autophagy process (29). This information suggests that we can confirm the role of the ‘Proteasome Degradation’ pathway in T1DM.

Type II interferon signaling

Inflammation is an important process involved in T1DM. Type II interferon signaling is a pathway involved in the inflammatory response. The four genes with T1DM associated SNPs in the ‘Type II interferon signaling’ pathway are HIST2H4, HLA-B, TAP1 and PSMB9. TAP1 has been reported to be associated with T1DM (36). Together with HLA-B, among others, the TAP and MHC complexes cooperate in the antigen presentation to T-cells (21, 36). The GO-Analysis also showed an overrepresentation of interferon signaling and the involvement of TAP proteins.


4.5 Comparing results with other studies  

We compared our results with multiple more recent publications to investigate the role of several genes associated with T1DM. In 2007, Hakonarson et al. found a significant association with variations in the gene region that contains the KIAA0350 gene (9). This region was not found in the GWA studies used in this analysis.

The PTPN22 gene is frequently associated with T1DM. This gene encodes the lymphoid protein tyrosine phosphatase (37, 38). A SNP in the gene could contribute to the development of T1DM through the negative regulation of T-cell activation, demonstrating the important role of T-cells in T1DM (39). Unfortunately, PTPN22 is not present in any of the pathways in the WikiPathways collection, and so it was ‘lost’ in the ‘Filtered pathway-gene-SNP network’. However, SNPs related to PTPN22 were found in the GWA studies, and the gene should therefore nonetheless be considered an important factor in T1DM.

Cytotoxic T-lymphocyte-associated protein 4 (CTLA4) is a protein on the surface of activated T-cells that produces an inhibitory signal for T-cell activation. Changes in the expression of the CTLA4 gene, caused by possible variations in one of its regulators, may increase T-cell self-reactivity, indicating its possible link to T1DM (26, 40). Either no SNP related to the CTLA4 gene was present in the database, or the SNP could not be matched to the CTLA4 gene using Ensembl BioMart, which explains the absence of the gene in the network.

In the experiment conducted by Carbonetto and Stephens, as described above in the discussion section ‘Allograft rejection and diabetes’, the pathway with the highest Bayes factor was the IL-2 signaling pathway. This pathway is missing in the final network, since none of its genes had T1DM associated SNPs linked to them. The IL-2 cytokine is involved in the activation, development and maintenance of T-regulatory cells. Defects in these pathways are involved in autoimmune disorders (25).

Evangelou et al. performed a pathway analysis in which they started with SNPs and looked for the pathways involved in T1DM (41). A major difference between their approach and this thesis is that Evangelou et al. removed the MHC-loci from the analysis, so that these loci could not bias the results towards pathways in which multiple MHC-loci are involved. The most important pathways they found were also involved in the immune response. Since they used a different approach and different pathway databases (Reactome and BioCarta), their pathways differed from the pathways found in this thesis.


4.6 Shortcomings in this study  

What must be taken into account is the fact that the database from Johnson and O'Donnell

was established in 2009 and the articles from Hakonarson and the WTCCC were both

published in 2007. Therefore SNPs found more recently by other researchers may not be

included. Besides this, only 50% of the protein coding genes are in the pathways since the

detailed mechanisms and functions are often not known. Ensembl BioMart not being able to

link some SNPs to genes may explain incomplete networks.

4.7 Conclusion

We performed a complex interactive network approach to investigate the following research question: “Can we find the disease related processes by integrating and visualizing the genetic variants with existing pathway knowledge?”

We found several pathways and biological processes to be involved in T1DM. The MHC-

region was already known to be important in T1DM, and our pathway- and network based

approaches confirmed this finding, showing several relevant pathways containing multiple

HLA-genes. This result shows the importance of immune related pathways in T1DM.

The method applied is able to confirm existing knowledge and offers possibilities to search for new loci and pathways involved in the disease process. One of the difficulties encountered was that the ‘Merged Pathway-gene-SNP network’ was too big to interpret visually. In the ‘Filtered Pathway-gene-SNP network’ this problem was resolved. In this smaller network the pathways, genes, SNPs and their connections could be distinguished based on the visualization used. Future research could focus on improvements in linking SNPs to genes and genes to pathways, thereby possibly expanding the knowledge on the disease process. This could eventually lead to improvements in the diagnosis or treatment of T1DM patients.


5. References

1. Van Belle TL, Coppieters KT, Von Herrath MG. Type 1 diabetes: etiology,

immunology, and therapeutic strategies. Physiol Rev. 2011;91(1):79-118.

2. Centraal Bureau voor de Statistiek. Steeds meer mensen met diabetes. 2014. Available from: http://www.cbs.nl/nl-NL/menu/themas/gezondheid-welzijn/publicaties/artikelen/archief/2014/2014-4173-wm.htm.

3. Frayn KN. Metabolic Regulation. A Human Perspective. Oxford (UK): Wiley-Blackwell; 2010.

4. Guyton AC, Hall JE. Textbook of Medical Physiology. Philadelphia: Elsevier

Saunders; 2010.

5. Goossens GH. The role of adipose tissue dysfunction in the pathogenesis of obesity-related insulin resistance. Physiol Behav. May 2008;94(2):206-18.

6. Fierabracci A. Unravelling the role of infectious agents in the pathogenesis of human autoimmunity: the hypothesis of the retroviral involvement revisited. Curr Mol Med. 2009;20099:1024-33.

7. Fierabracci A, Ayroldi E. Experimental strategies in autoimmunity: antagonists of cytokines and their receptors, nanocarriers, inhibitors of immunoproteasome, leukocyte migration and protein kinases. Curr Pharm Des. 2011;17:3094-107.

8. Bottini N, Vang T, Cucca F, Mustelin T. Role of PTPN22 in type 1 diabetes and other autoimmune diseases. Seminars in Immunology. 2006;18(4):207-13.

9. Hakonarson H. A genome-wide association study identifies KIAA0350 as a type 1

diabetes gene. Nature. August 2007;448:591-4.

10. Barillot E, et al. Computational Systems Biology of Cancer: Taylor & Francis Group;

2012.

11. Li MJ, Wang LY, Xia Z, Sham PC, Wang J. GWAS3D: detecting human regulatory

variants by integrative analysis of genome-wide associations, chromosome interactions and

histone modifications. Nucleic Acids Res. 2013.

12. Johnson AD, O'Donnell CJ. An Open Access Database of Genome-wide Association

Results. BMC Medical Genetics. 2009;10(6).

13. The Wellcome Trust Case Control Consortium. Genome-wide association study of

14.000 cases of seven common diseases and 3.000 shared controls. Nature. June

2007;447(7145):661-78.


14. Hakonarson H. A genome-wide association study identifies KIAA0350 as a type 1 diabetes gene. Nature. August 2007;448:591-4.

15. Kelder T, van Iersel MP, Hanspers K, Kutmon M, Conklin BR, Evelo C, et al.

WikiPathways: building research communities on biological pathways. Nucleic Acids Res.

2011:1301-7.

16. Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database (Oxford). 2015.

17. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: A

Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome

Res. 2003;13(11):2498–504.

18. Kutmon M, Van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3:

An Extendable Pathway Analysis Toolbox. PLoS Comput Biol. 2015;11(2).

19. Eden E, Navon R, Steinfeld I, Lipson D, Yakhini Z. GOrilla: A Tool For Discovery

And Visualization of Enriched GO Terms in Ranked Gene Lists. BMC Bioinformatics.

2009;10(48).

20. Cariaso M, Lennon G. SNPedia: a wiki supporting personal genome annotation,

interpretation and analysis. Nucleic Acids Res. 2012:1308-12.

21. Nejentsev S, et al. Localization of type 1 diabetes susceptibility to the MHC class I

genes HLA-B and HLA-A. Nature. 2007;450(7171):887-92.

22. Torkamani A, Topol EJ, Schork NJ. Pathway analysis of seven common diseases assessed by genome-wide association. Genomics. 2008;92:265-72.

23. Choi NM, et al. Regulation of major histocompatibility complex class II genes. Curr

Opin Immunol. 2011;23(1):81-7.

24. Baglivo L, et al. Genetic and epigenetic mutations affect the DNA binding capability

of human ZFP57 in transient neonatal diabetes type 1. FEBS Lett. 2013;587(10):1474-81.

25. Carbonetto P, Stephens M. Integrated Enrichment Analysis of Variants and Pathways

in Genome-Wide Association Studies Indicates Central Role for IL-2 Signaling Genes in

Type 1 Diabetes, and Cytokine Signaling Genes in Crohn's Disease. PloS Genetics. 3 October

2013.

26. Ueda H, et al. Association of the T-cell regulatory gene CTLA4 with susceptibility to autoimmune disease. Nature. 2003;423:506-11.

27. Noble JA. Genetics of Type 1 Diabetes. Cold Spring Harb Perspect Med. 2012;2.

28. Noble JA, et al. The role of HLA class II genes in insulin-dependent diabetes mellitus: molecular analysis of 180 Caucasian, multiplex families. Am J Hum Genet. 1996;59:1134-48.


29. Fierabracci A. The putative role of proteolytic pathways in the pathogenesis of Type 1

diabetes mellitus: The ‘autophagy’ hypothesis. Medical Hypotheses. 2014;82(5):553-7.

30. Balasubramanyam M, Sampathkumar R, Mohan V. Is insulin signaling molecules

misguided in diabetes for ubiquitin–proteasome mediated degradation? Molecular and

Cellular Biochemistry. 2005;275:117-25.

31. Fryburg DA, Jahn LA, Hill SA, Oliveras DM, Barrett EJ. Insulin and insulin-like

growth factor-I enhance human skeletal muscle protein anabolism during

hyperaminoacidemia by different mechanisms. Journal of Clinical Investigation.

1995;96:1722-9.

32. Russell-Jones DL, Umpleby M. Protein anabolic action of insulin,

growth hormone and insulin-like growth factor I. Eur J Endocrinol. 1996;135:631-42.

33. Bennett RG, Hamel FG, Duckworth WC. Insulin inhibits the ubiquitin dependent

degrading activity of the 26S proteasome. Endocrinology. 2000;141:2508-17.

34. Sun XJ, Goldberg JL, Qiao LY, Mitchell JJ. Insulin-induced insulin receptor substrate-

1 degradation is mediated by the proteasome degradation pathway. Diabetes. 1999;48:1359-

64.

35. Zhande R, Mitchell JJ, Wu J, Sun XJ. Molecular mechanism of insulin-induced degradation of insulin receptor substrate 1. Mol Cell Biol. 2002;22:1016-26.

36. Sia C, Weinem M. Genetic susceptibility to type 1 diabetes in the intracellular

pathway of antigen processing - a subject review and cross-study comparison. Rev Diabet

Stud. 2005;2(1):40-52.

37. Hirschhorn JN. Genetic epidemiology of type 1 diabetes. Pediatr Diabetes. 2003;4:87-100.

38. Maier LM, Wicker LS. Genetic susceptibility to type 1 diabetes. Curr Opin Immunol.

2005;17:601-8.

39. Mehers KL, Gillespie KM. The genetic basis for type 1 diabetes. Br Med Bull.

2008;88(1):115-29.

40. Kristiansen OP, Larsen ZM, Pociot F. CTLA-4 in autoimmune diseases—a general

susceptibility gene to autoimmunity? Genes Immun. 2000;1:170-84.

41. Evangelou M, et al. A Method for Gene-Based Pathway Analysis Using Genomewide Association Study Summary Statistics Reveals Nine New Type 1 Diabetes Associations. Genet Epidemiol. 2014;38(8):661-70.


6. Appendix

6.1 Results of GO-analysis

Table 1: The enriched processes found by GO-Analysis. A description of the process, together with the P-value, FDR q-value and enrichment values, is given.


6.2 Overview of pathways – genes – SNPs in ‘Filtered pathway-gene-SNP network’

Table 2: Overview of all the pathways which have genes with T1DM associated SNPs linked. The left column shows the pathway, the middle one the HGNC-symbols of the genes and the right column indicates which SNPs are linked to the gene. If no SNP is indicated, the gene is also mentioned in another pathway and the SNPs are indicated there.

Pathway HGNC Symbol

SNPs linked to gene

AGE/RAGE pathway

AGER rs3134945, rs3134943, rs3134947, rs3134946, rs1800684, rs3132965, rs3096689

Allograft Rejection

C4A rs391165, rs389512

HLA-A rs3823343 HLA-B rs9378249, rs3819294, rs2523612 HLA-C rs9468933, rs9468929, rs7382307, rs3819294, rs9378249,

rs7751729, rs2844613, rs2524078, rs3134792, rs3134745, rs2523612, rs2248902, rs2524069, rs2524067, rs13208617, rs13207315, rs2248880, rs13216197, rs13200073, rs13191519, rs13203895, rs13200571, rs10484554, rs10456057, rs13191343, rs12191877

HLA-DMA

rs209473

HLA-DMB

rs3132131

HLA-DOB

rs2857124, rs2857118, rs2857135, rs2071474, rs2071472, rs2621338, rs2621321, rs2071470

HLA-DPA1

rs9277380

Page 31: Investigation of GWAS data on Type 1 Diabetes Mellitus ...projects.bigcat.unimaas.nl/data/students/TomHooimeijer/BaThesis... · Not all genes could be linked to a pathway from WikiPathways

28

HLA-DPB1

rs9277380, rs3117224, rs9277477, rs2179920

HLA-DQA1

rs9272346, rs9271775, rs9272775, rs17843593, rs6927022, rs3129768

HLA-DQA2

rs9276395, rs9276394, rs9276401, rs9276364, rs9276357, rs9276375, rs9276370, rs9276319, rs9276313, rs9276351, rs9276348, rs7773694, rs7773441, rs9276311, rs7773955, rs7773068, rs7755597, rs7773407, rs7773149, rs6457644, rs6457643, rs7755596, rs7744593, rs2051600, rs13214143, rs5021448, rs2894287, rs13199553, rs12183007, rs13214069, rs13199787, rs10947336

HLA-DQB1

rs9273349, rs3134996, rs9273363, rs1063355

HLA-DRA

rs983561, rs9268657, rs9268645, rs9268659, rs9268658, rs9268615, rs8084, rs9268641, rs9268634, rs3135394, rs3135393, rs5000563, rs4612206, rs3129887, rs3129881, rs3135392, rs3135391, rs3129876, rs3129875, rs3129878, rs3129877, rs2395181, rs2395180, rs3129872, rs3129868, rs2395176, rs2395174, rs2395179, rs2395177, rs14004

HLA-DRB1

rs35464393, rs35366052, rs701831, rs28490179

HLA-F MICA rs3132472, rs3094584, rs9501387, rs2596542, rs2596541,

rs2857282, rs2844521, rs2523459, rs2523453, rs2596540, rs2523467, rs2428474, rs2256328, rs2523452, rs2428475, rs2251731, rs1063635, rs2256175, rs2256028, rs1051790

Alzheimers Disease

ITPR1 rs3805006

Amyotrophic lateral sclerosis (ALS)

DAXX rs3130099, rs3130018, rs3130100, rs2073525, rs2073524, rs3106189, rs2395379, rs1061783

Androgen receptor signaling pathway

DAXX

Apoptosis Modulation and Signaling

DAXX

HSPA1B rs2763979, rs2471980, rs3115674

Apoptosis Modulation by HSP70

HSPA1B

Apoptosis-related network due to altered Notch3 in ovarian cancer

ERBB3 rs7971751, rs4759229, rs877636

IER3 rs3130660, rs2284174, rs1059612

Arrhythmogenic Right Ventricular Cardiomyopathy

CACNG1 rs3785579, rs1799938, rs7210865, rs16960487, rs16960501, rs16960497

TCF7L2 rs7904519, rs7900150, rs7924080, rs7077247, rs7077039, rs7895340, rs7100927, rs6585201, rs6585200, rs7071302, rs6585202, rs6585197, rs4074720, rs6585199, rs6585198, rs12359102, rs12265291, rs4074718, rs12718338, rs11196205, rs11196200, rs12258200, rs11196208, rs10885405, rs10885402, rs10885409, rs10885406, rs10787472

Aryl Hydrocarbon Receptor

CDK2 rs773107

Arylamine metabolism

SULT1A2 rs11401

ATM Signaling Pathway

CDK2

MDC1 rs3132584

Calcium Regulation in the Cardiac Cell

ITPR1

Cardiac Progenitor Differentiation

POU5F1 rs887468, rs887465, rs3130457, rs1265159, rs887464, rs6929434, rs1150765

Cell Cycle

CDK2

Complement and Coagulation Cascades

C2 rs638383, rs558702, rs644045, rs497309, rs3130683, rs519417, rs512559, rs3128761

CFB rs512559, rs2072633, rs1270942

Cytokines and Inflammatory Response

HLA-DRA

HLA-DRB1

Cytoplasmic Ribosomal Proteins

RPS26 rs1131017, rs705704

Diurnally Regulated Genes with Circadian Orthologs

HIST1H2BN

rs201005, rs200995, rs200989, rs200981, rs200991, rs200990, rs200948, rs17763089, rs200953, rs200949, rs13199772, rs13194781, rs17695758, rs13199906

HLA-DMA

DNA Damage Response

CDK2

DNA Damage Response (only ATM dependent)

TCF7L2

DNA Replication

CDK2

ErbB Signaling Pathway

ERBB3

NRG3 rs11818231, rs11816685, rs17101073, rs17095600, rs11815363

Eukaryotic Transcription Initiation

GTF2H4 rs886420, rs1264308, rs1264304, rs1264312, rs1264310

FAS pathway and Stress induction of HSP regulation

DAXX

Fatty Acid Beta Oxidation

ACSL1 rs13112568, rs13126272, rs13120078

Fatty Acid Biosynthesis

ACSL1

Focal Adhesion

TNXB rs1150752, rs393544, rs3134954, rs433061, rs3130285, rs3117189, rs3130342, rs3130287, rs3096695, rs204895, rs3117182, rs3117181, rs204885, rs204879, rs204890, rs204889, rs1269852, rs1150753, rs204878, rs1269854

G Protein Signaling Pathways

ITPR1

G1 to S cell cycle control

ATF6B rs393544, rs3134954, rs3117182, rs3117181, rs3130342, rs3130288, rs204894, rs204892, rs3096695, rs204895, rs1269854, rs1269852, rs204890, rs204889, rs1269851, rs1150752

CDK2

Ganglio Sphingolipid Metabolism

B3GALT4 rs464865, rs463260, rs469064, rs446735, rs462093, rs455567

Gastric Cancer Network 1

HIST1H4J rs200502

Heart Development

ERBB3

Histone Modifications

EHMT2 rs558702

HIST1H3J rs200977

HIST1H4J

ID signaling pathway

CDK2

IL-5 Signaling Pathway

SPRED1

Insulin Signaling

FLOT1 rs8233, rs3095329, rs3094127, rs3130660, rs3095330, rs1059612, rs2284174, rs1064627

Integrated Breast Cancer Pathway

CDK2

Integrated Cancer pathway

CDK2

Integrated Pancreatic Cancer Pathway

CDK2

DAXX

Iron uptake and transport

ATP6V1G2

rs2857607

SLC17A3 rs555460, rs548987, rs726836, rs629444, rs1324088, rs1324087, rs523383, rs501220, rs1165168, rs1184498, rs1177441

MAPK Signaling Pathway

DAXX

HSPA1B

Mesodermal Commitment Pathway

POU5F1

Metapathway biotransformation

SULT1A2

miR-targeted genes in lymphocytes - TarBase

ATAT1 rs9262135, rs9262130

EHMT2

miR-targeted genes in muscle cell - TarBase

EHMT2

ERBB3

miRNA Regulation of DNA Damage Response

CDK2

miRNA targets in ECM and membrane receptors

TNXB

Mitochondrial LC-Fatty Acid Beta-Oxidation

ACSL1

mRNA Processing

DHX16 rs9262135, rs9262141

Myometrial Relaxation and Contraction Pathways

ATF6B

ITPR1

Neural Crest Differentiation

NOTCH4 rs3132946, rs3132940, rs3134942, rs3132956, rs3096690, rs2071278, rs3131296, rs3131294, rs204989, rs204987, rs204991, rs204990, rs1044506

NOTCH4 NOTCH4

NRF2 pathway

AGER

Oncostatin M Signaling Pathway

CDK2

Ovarian Infertility Genes

MSH5 rs707915, rs3132445, rs707938, rs3131378, rs3130484, rs3131383, rs3131379, rs3117574, rs3115672, rs3117577, rs3117575, rs1150793, rs1144708, rs3115671, rs3101018

p38 MAPK Signaling Pathway

DAXX

Parkin-Ubiquitin Proteasomal System pathway

HSPA1B

TUBB rs3095330, rs3095329, rs8233, rs3132584, rs3094127

Pathogenic Escherichia coli infection

TUBB

Proteasome Degradation

HLA-A

HLA-B

HLA-C

HLA-F rs929158, rs885942, rs929160, rs1633106, rs1632957, rs1736915, rs1736913, rs1610608, rs1610602, rs1628578, rs1611350, rs1610601, rs1419696

HLA-J rs356969, rs356968

PSMB8 rs4148882, rs241427

PSMB9 rs4148882

RB in Cancer

CDK2

Regulation of Actin Cytoskeleton

FGF14 rs12708382

Regulation of Microtubule Cytoskeleton

SPRED1

Serotonin Receptor 2 and ELK-SRF/GATA4 signaling

ITPR1

SIDS Susceptibility Pathways

C4A

POU5F1

Signaling Pathways in Glioblastoma

CDK2

ERBB3

Spinal Cord Injury

AIF1 rs2857600, rs2736177, rs3763295, rs2736176, rs2269475

CDK2

Sulfation Biotransformation Reaction

SULT1A2

TCR Signaling Pathway

ITPR1

Triacylglyceride Synthesis

AGPAT1 rs3134947, rs3134943, rs3132965, rs3134946, rs3134945, rs3130347, rs3130284, rs3131297, rs3130348, rs3096689, rs2849013, rs3130283, rs3096697, rs1269839

TSH signaling pathway

CDK2

Type II interferon signaling (IFNG)

HIST1H4J

HLA-B

PSMB9

TAP1 rs4148882

Wnt Signaling Pathway and Pluripotency

POU5F1

TCF7L2

Wnt Signaling Pathway Netpath

TCF7L2

Add-on page: Description of work by student Bachelor BMW

This add-on page to the Bachelor thesis BMW provides details on the role of the student in the experiments, data collection, and analyses described in the thesis.

Experiments and measurements

Please describe here which wet-lab experiments, clinical measurements, or other experiments and/or measurements were conducted by the student:

No measurements were performed.

[Extend the box if needed]

Provenance of data

Please describe here, for all data(sets) described in the results, whether it was generated by the student, generated by others during the project (e.g. project members, staff, other students), previously generated or existing in the lab, or taken from public resources:

The SNPs were extracted from a freely accessible online database. A reference to the database is given in the thesis.

The WikiPathways file (pathways and their genes) was provided by the thesis supervisor.

The DisGeNET script was obtained by contacting the developers.

[Extend the box if needed]

Data analysis and interpretation

Please describe here which data analyses and interpretations were conducted by the student for the data(sets) described in the previous box:

SNPs were linked to their genes using Ensembl BioMart. These genes were then combined with the WikiPathways file to create a network (pathway-gene-SNP). Genes not linked to a pathway were removed in Cytoscape. The resulting network was interpreted by the student with the aid of Excel (sorting pathways, sorting genes, etc.).

The pathway used in PathVisio was downloaded from WikiPathways.

The DisGeNET and GO analysis results were visually interpreted by the student.
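The network construction described in this box can be illustrated with a minimal sketch in plain Python. This is not the workflow used in the thesis, which relied on Ensembl BioMart, Cytoscape and Excel; the function name `build_network`, the input layout, and the two placeholder genes are assumptions made for illustration only. The remaining gene, SNP and pathway names are taken from the appendix table.

```python
from collections import defaultdict


def build_network(snp_gene_pairs, pathway_to_genes):
    """Build pathway-gene and gene-SNP edges, dropping genes that are in no pathway.

    Hypothetical helper: the inputs mimic a SNP-to-gene mapping and a
    pathway-to-genes file, but their layout is an assumption.
    """
    # Every gene that occurs in at least one pathway.
    genes_in_pathways = {g for genes in pathway_to_genes.values() for g in genes}

    # Collect the SNPs per gene, skipping genes without any pathway.
    gene_to_snps = defaultdict(set)
    for snp, gene in snp_gene_pairs:
        if gene in genes_in_pathways:
            gene_to_snps[gene].add(snp)

    # Keep only pathway-gene edges whose gene carries at least one SNP.
    pathway_gene_edges = sorted(
        (pathway, gene)
        for pathway, genes in pathway_to_genes.items()
        for gene in genes
        if gene in gene_to_snps
    )
    gene_snp_edges = sorted(
        (gene, snp) for gene, snps in gene_to_snps.items() for snp in snps
    )
    return pathway_gene_edges, gene_snp_edges


# Toy input; one SNP may map to more than one gene (rs512559 -> C2 and CFB).
snp_gene_pairs = [
    ("rs773107", "CDK2"),
    ("rs3805006", "ITPR1"),
    ("rs512559", "C2"),
    ("rs512559", "CFB"),
    ("rs0000000", "GENE_NO_PATHWAY"),  # hypothetical placeholder, not real data
]
pathway_to_genes = {
    "Cell Cycle": ["CDK2", "GENE_NO_SNP"],  # GENE_NO_SNP is a hypothetical placeholder
    "TCR Signaling Pathway": ["ITPR1"],
    "Complement and Coagulation Cascades": ["C2", "CFB"],
}

pathway_gene_edges, gene_snp_edges = build_network(snp_gene_pairs, pathway_to_genes)
for edge in pathway_gene_edges + gene_snp_edges:
    print(edge)
```

The two edge lists together form the three-layer pathway-gene-SNP network; written out as a two-column table, they could be loaded into a network tool for visualization.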

[Extend the box if needed]

Integration with other analytical results

Please describe here which other analytical results were integrated with those of the analysis performed by the student, if any:

[Extend the box if needed]

Remarks

Visualization of SNPs in PathVisio was performed together with the supervisor and co-supervisor of this project, Martina Summer-Kutmon and Elisa Cirillo, respectively. This technique is new and was not meant to create data for this thesis.

[Extend the box if needed]