UNIVERSITA' DEGLI STUDI DI PADOVA

Facoltà di Scienze MM. FF. NN.

Centro Ricerche Interdipartimentale Biotecnologie Innovative (CRIBI)

SCUOLA DI DOTTORATO DI RICERCA IN BIOCHIMICA E BIOTECNOLOGIE

INDIRIZZO IN BIOTECNOLOGIE

CICLO XX

DEVELOPMENT OF AN INTEGRATED DISEASE ONTOLOGY KNOWLEDGEBASE AND ITS APPLICATION TO STUDY MECHANISMS OF NEUROPSYCHIATRIC DISORDERS

Direttore della Scuola Ch.mo Prof. Giuseppe Zanotti

Supervisore Ch.mo Prof. Giorgio Valle

Dottorando Fabrizio Caldara

Abstract

The production and distribution of scientific information has grown exponentially in recent years. PubMed, a service of the U.S. National Library of Medicine that includes over 18 million Medline citations to journal articles, has been adding some 40,000 abstracts from the life sciences and biomedical literature every month.

The information age has allowed the storage and dissemination of huge amounts of data, but our ability to extract and process knowledge has remained constant. We make inferences about uncharacterised observations by recording and using natural language, which unfortunately is rarely adequate. Furthermore, biomedical research is characterised by highly specialised disciplines with limited communication among them and poorly shared resources.

These aspects draw attention to the real need for integration, a general concept with many definitions. In the context of my PhD, integration is intended as the process by which data from one source can be exchanged, interpreted or manipulated by another, in a way that makes sense to the users in their interaction with the system.

Biomedical ontologies (OBO) in general, and the Gene Ontology (GO) in particular, have been fundamental components of an important information integration effort that started in 2000 with the ambitious goal of building a tool for the unification of biology and beyond. My PhD project, standing on the shoulders of those initiatives, has focused on the development of a human-readable knowledgebase system intended to facilitate the exploitation of biological experimental data. This resource relies on information extracted from many databases, mostly manually curated, and uses an ontology of human diseases (the 'Disease Ontology') as the backbone of the system. The objective is to support the biomedical scientific community in the interpretation of data on human diseases and their correlated genes, possibly delivering information on available interacting drugs.

To test the system while evaluating its value, a real research case was investigated in the second part of my PhD work.

Functional analysis of inherently complex high-throughput data sources for systems biology (e.g. microarrays) is a fundamental step towards understanding the mechanisms that regulate the molecular processes modulated in diseases and pathological states. Nonetheless, progress in disease understanding and drug discovery for psychiatric disorders has in recent years been limited compared with other areas. Therefore, a suitable computational strategy, supported by the newly developed resource, was designed to investigate the involvement of specific disease genes in dendritic plasticity, their mechanisms of action and the available drugs they are known to interact with.

Dendritic plasticity, an important component of central nervous system function during development, has recently been postulated to be strongly involved in the pathogenesis of psychiatric diseases. The concept of plasticity spans a broad spectrum, from clinical features of behaviour, learning and memory down to the molecular mechanisms by which neurons create and lose synaptic connections with one another.

The chosen approach allowed the semi-automated identification of a large number of genes involved in plasticity mechanisms at the molecular level. At the same time, it allowed a preliminary validation of the newly developed Disease Ontology Knowledgebase and an evaluation of its potential.

Abstract

In recent years, the production and distribution of scientific data has grown exponentially. PubMed, a service of the U.S. National Library of Medicine that now includes over 18 million citations drawn from Medline, increases its content by about 40,000 abstracts of scientific or biomedical publications every month.

The advent of the information age has made it possible to accumulate and disseminate enormous quantities of data, but our ability to derive knowledge from them has remained constant. The inferences we draw from observation often rely on natural language, which is rarely adequate. Moreover, biomedical research is characterised by highly specialised disciplines that rarely communicate or share resources.

All these aspects draw attention to the real need to integrate information, a general concept with many definitions. In the context of my PhD, integration means the process by which data can be exchanged, interpreted and manipulated while remaining comprehensible to those who use the system.

Biomedical ontologies in general, and the Gene Ontology in particular, have been a fundamental component of an important effort to integrate biological information, started in 2000 with the ambitious goal of developing a tool for the unification of biology and beyond. My PhD project, working alongside this initiative, focused on the development of a particular type of database (a knowledgebase) that can facilitate the exploration of specific experimental data.

The system is built on information extracted from numerous data sources, for the most part manually curated, using an ontology of human diseases (the Disease Ontology) as its backbone. The aim is to support the biomedical scientific community in interpreting data on human diseases, the genes linked to them and the drugs able to treat them.

In the second part of the PhD, a specific research topic was investigated in depth in order to test the system and assess its real possibilities.

Functional analysis of complex data produced with high-throughput technologies such as microarrays is fundamental to understanding the mechanisms that regulate the molecular processes involved in pathological states. However, despite the availability of valid investigative tools, the field of psychiatric diseases has not seen the same significant progress in understanding pathological mechanisms that has been achieved in other research areas.

Therefore, a suitable computational strategy, combined with the recently developed resource that is the subject of this work, was designed to allow an investigation of the involvement of specific genes, mechanisms and drugs in the cause or treatment of psychiatric pathology.

Dendritic plasticity is an important component of central nervous system function during development, and it has recently been postulated that it may be strongly involved in the pathogenesis of diseases of the central nervous system.

The concept of plasticity spans a broad spectrum, from clinical features describing aspects of behaviour, learning and memory down to the molecular mechanisms by which neurons create or lose their synapses.

The chosen strategy made it possible to identify, in a semi-automated way, a large number of genes involved at the molecular level in the mechanism of dendritic plasticity, and at the same time allowed a partial and preliminary verification of the qualities and potential of the knowledgebase developed.

Acknowledgements

My first thank you goes to Prof. Giorgio Valle for giving me the opportunity to do this PhD. For their collaboration and help in the development of the database, I wish to thank Dr. Erika Feltrin and Dr. Alessandro Albiero. I would also like to thank Dr. Andrea Telatin for the preliminary database interface.

Further well-deserved thanks go to Chris Hastwell, who always supported me and this project with unconditional trust.

I would like to thank my wife Laura for her incomparable encouragement in every circumstance, my children Giorgia and Tommaso for their perseverance in reminding me of real-life priorities, and my mother Maly for always being there to help me when needed.

Finally, I want to remember and thank my father, who shared the beginning, but sadly not the end, of this period of my life.

Contents

1 Introduction
    1.1 Data integration and ontologies
        1.1.1 The word ontology
        1.1.2 Ontologies in modern research
    1.2 Open Biomedical Ontologies
    1.3 The Gene Ontology project
    1.4 Gene Ontology Annotation (GOA)

2 Disease Ontology Knowledgebase
    2.1 Introduction
    2.2 Basic resources
        2.2.1 Disease Ontology
        2.2.2 Online Mendelian Inheritance in Man (OMIM)
        2.2.3 Genetic Association Database
        2.2.4 DrugBank
        2.2.5 PharmGKB
    2.3 Methods and Results
        2.3.1 Retrieval and integration of data
        2.3.2 Acquisition of disease name synonyms
        2.3.3 Gene annotation findings and gene-disease relationships
        2.3.4 Compilation of the drug dictionary
        2.3.5 Finding correlations between drugs and diseases
        2.3.6 Identifying relationships between drugs and target genes
    2.4 Possible applications

3 Case study
    3.1 Introduction
    3.2 Background
        3.2.1 Neuropsychiatric disorders and mechanisms of regulation
        3.2.2 Dendritic plasticity
        3.2.3 Factors influencing dendritic plasticity
    3.3 Methods and results (Part I)
        3.3.1 Selection of query terms and creation of a gene list
        3.3.2 Annotation and selection of the Dendritic Plasticity gene dataset
        3.3.3 Identification of the correlations gene-disease
        3.3.4 Data validation using alternative methods
    3.4 Methods and Results (Part II)
        3.4.1 Introduction to pathway and network analyses
        3.4.2 Databases and tools for Pathway/Network analysis
        3.4.3 Canonical pathways analysis
        3.4.4 Gene networks
        3.4.5 Drug/nutraceutical interactors
    3.5 Discussion

4 Conclusions

A Dendritic Plasticity gene dataset

B Over-represented disease genes

C List of abbreviations

Bibliography

Chapter 1

Introduction

This document is divided into four chapters. The first is an introduction to ontologies as they are understood and used in computer science and biomedical research. Chapter 2 focuses on the annotation of the Disease Ontology and the development of the Disease Ontology Knowledgebase. Chapter 3 concerns the validation of the system: a research case study on neuropsychiatric diseases is described in some detail. Conclusions are presented in the final chapter.

1.1 Data integration and ontologies

The exponential increase of data-based information, owing to fast biotechnological advances and high-throughput technologies, together with the arrival of the World Wide Web as a new means of data exchange, has made it more complex and difficult to ascertain the biological meaning contained in the heterogeneous biological data available to the scientific community. Moreover, the huge amounts of information now produced on a daily basis require more advanced management solutions, and the availability of the web as a modern infrastructure for scientific exchange has created new requirements with respect to data accessibility [1].

Concurrently, in the era of genome-scale biology, the accumulation of biological data has been accompanied by the distributed proliferation of biology-oriented databases [2]. Therefore, to make the most effective use of such databases and the knowledge they incorporate, different kinds of information from different sources must be merged in ways that make sense to life scientists. In that respect, the consolidation of data from existing databases has long been acknowledged as a significant component of life science studies, and different technologies and approaches to data integration have been pursued over the past decade [1].

A major component of the integration effort is the development and use of annotation

criteria such as ontologies.

1.1.1 The word ‘ontology’

The word 'ontology' derives from the Greek ontos (being) and logos (word), and its conceptual origin can be traced back to early philosophers, who studied the theory of objects and their ties for centuries. In philosophy, ontology names the discipline that tries to describe reality.

The term 'ontology' is, however, still debated, since different people have different ideas about its significance and definition in different linguistic contexts. The first formal and explicit approach to ontologies in the technical (not philosophical) sense goes back to 1900 and was given by Husserl. Later, in the 1980s, ontologies entered the computer science domain as a way to offer a simplified and clear view of a particular field of interest.

There is some consensus on what an ontology is not: it is not a taxonomy (i.e. just a class-subclass hierarchy), nor a dictionary (an ontology includes relationships between terms), nor a knowledgebase that includes individual objects. According to Gruber, an ontology is 'the specification of conceptualisations, used to help programs and humans share knowledge' [3].

Today ontologies are more formalised conceptual models used in computer science, database integration and artificial intelligence. They provide a common terminology across a domain, which is necessary for communication between people and organisations, and they provide the foundation for interoperability between systems. They can be used to make the content of information sources explicit and serve well as an index to a repository of information [4].

1.1.2 Ontologies in modern research

Many decades ago, the main drive of bioinformatics was to store, retrieve and analyse the data created by life scientists, such as nucleotide sequences and protein structures. At that time, the limited quantity of data acquired by biological investigators required only elementary systems for their management, organisation and analysis. However, the advent of genome sequencing projects, high-throughput experiments and other techniques gave rise to a huge amount of data that needed to be analysed. Today, bioinformatics systems have to deal with once inconceivable quantities of complex information, unmanageable for a scientist without advanced knowledge of data management and information processing tools [5]. Such data are growing at an exponential rate, but the knowledge contained in them is not maturing at an equivalent pace. There are different reasons for this deficiency of productive knowledge; the most significant is that biological phenomena can be described in many different ways [6] and this complexity has not been tackled semantically. As a result, life scientists are usually left with a giant realm of information that they cannot access, analyse or integrate in a sensible way [7].

The difficulty of extracting information from the available data adds pressure to implement standardised and compatible nomenclature in molecular biology. The central problem is that biomedical scientists gather facts, often recording them in natural language, and then use that knowledge to make inferences about as yet uncharacterised observations. Because of this, knowledge is extremely heterogeneous. While it is easy to compare, for instance, nucleic acid or polypeptide sequences between bioinformatics resources, the knowledge content of these resources is very difficult to compare, for both humans and computers, because the knowledge is represented in a wide variety of lexical forms [8].

Often in biology, a single word refers to different concepts: for example, 'gametogenesis' denotes different processes in mammals and in plants, and a user querying a database for this concept needs to deal with these terminological and conceptual incompatibilities. The situation is even more complicated for a computer, which cannot reason over the data and simply capture the knowledge content.

Thus, there is an urgent demand for strategies suitable for representing biological knowledge in a formal manner [9]. One way of capturing that knowledge within computational applications and databases in biology is the use of ontologies, which in recent years has driven the maturation of 'bio-ontologies' and promoted a relatively new area of bioinformatics [10].

An ontology is a 'controlled vocabulary' that provides a way to capture and represent the knowledge of a domain in a computer-comprehensible way. An ontology describes objects and the relations between them in a formal way, and has a grammar for using the vocabulary terms to express something meaningful within a specific domain of interest [11]. The labels used for the objects and the relationships in an ontological model provide a language with which a community can talk about the domain being modelled. By agreeing on a particular ontological representation, a common vocabulary can be used to describe and ultimately analyse data. Such sharing has obvious benefits because it helps humans to make inferences about the domain under study.
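As a minimal sketch of this idea, the Python fragment below stores a toy ontology as (term, relation, term) triples and follows the is_a relation to list the more general terms of a given term. The term names, the relation labels and the helper functions are illustrative assumptions and are not taken from any real ontology.

```python
from collections import defaultdict

# Toy ontology as (child, relation, parent) triples.
# Terms and relations are invented for illustration only.
RELATIONS = [
    ("schizophrenia", "is_a", "psychotic disorder"),
    ("psychotic disorder", "is_a", "mental disorder"),
    ("mental disorder", "is_a", "disease"),
    ("dendrite", "part_of", "neuron"),
]


def build_index(triples):
    """Map each term to the list of (relation, parent) pairs it participates in."""
    index = defaultdict(list)
    for child, relation, parent in triples:
        index[child].append((relation, parent))
    return index


def ancestors(term, index, relation="is_a"):
    """Return every term reachable from `term` by following `relation` edges."""
    found = []
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in index.get(current, []):
            if rel == relation and parent not in seen:
                seen.add(parent)
                found.append(parent)
                stack.append(parent)
    return found


INDEX = build_index(RELATIONS)
print(ancestors("schizophrenia", INDEX))
print(ancestors("dendrite", INDEX, relation="part_of"))
```

Keeping the relation type on every edge is what distinguishes this structure from a plain class-subclass hierarchy: the same traversal can follow is_a or part_of, depending on the question being asked.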

The data, which are clues for enriching knowledge about the domain, become much easier to handle when the same things are referred to in the same manner across the resources in which those data are stored. If different biological databases use the same ontologies to describe their data objects, the bio-ontologies can be used to link the databases and retrieve information from them. Ultimately, since ontologies give well-defined semantics to the knowledge representation language, machines can make inferences about the facts expressed in that language [8].
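The linking step described above can be sketched in Python as follows. The two dictionaries stand in for independent databases that annotate their records with the same ontology term identifiers; every record name and identifier is invented for illustration, and the function is a hypothetical helper rather than part of any existing tool.

```python
# Two toy "databases" annotated with the same (hypothetical) ontology term IDs.
GENE_DB = {
    "GENE_A": {"TERM:0001"},
    "GENE_B": {"TERM:0002"},
    "GENE_C": {"TERM:0001", "TERM:0003"},
}
DRUG_DB = {
    "DRUG_X": {"TERM:0001"},
    "DRUG_Y": {"TERM:0003"},
}


def link_by_term(left, right):
    """Pair records from two sources that share at least one ontology term."""
    links = []
    for left_id, left_terms in sorted(left.items()):
        for right_id, right_terms in sorted(right.items()):
            shared = left_terms & right_terms  # set intersection of term IDs
            if shared:
                links.append((left_id, right_id, sorted(shared)))
    return links


LINKS = link_by_term(GENE_DB, DRUG_DB)
for gene, drug, terms in LINKS:
    print(gene, drug, terms)
```

The join works only because both sources use identical identifiers for the same concept; with free-text labels the intersection would miss synonyms and spelling variants.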

Ontologies are designed for the domain and application that they are intended to support. However, it is worth pointing out that, for any ontology to be valuable, it has to be defined following specific rules and assertions. There are several fundamental characteristics that an ontology must possess to be considered complete and ready to be widely used [12]:

• Completeness: ontologies are designed to capture the maximum quantity of relevant concepts for the domain they represent;

• Formalism: ontologies are built using mathematical formalisms, making them readable by computers;

• Understandability (by humans): ontologies are built using natural language terms, making them accessible to scientists;

• Freedom: ontologies aim to represent conceptual domains independently of any specific use or implementation [13].

Figure 1.1: Interplay between ontologies, biology,

computer science and philosophy. Molecular biologists

discover facts that need to be organised and stored in

databases. Computer scientists provide techniques for data

representation and manipulation. Philosophers and

linguists help in organising the meaning behind database

labels [14].

Therefore, the development of an ontology requires in-depth subject knowledge, computer science skills to provide techniques for data representation and manipulation, and an understanding of philosophy and linguistics to organise the semantics behind data labels. The interplay between all these disciplines is illustrated in figure 1.1.

Finally, it is worth pointing out that an ontology aiming to be of public interest has to be widely acknowledged by the community of the specific domain that it tries to represent. Furthermore, the entire scientific community needs to be strongly involved in the improvement of a newly created ontology and is expected to promote the principle that only a single ontology for each area should be placed in the public domain.

The number and variety of ontologies will most probably grow in the coming years. They have proved to be useful tools not only for successfully integrating different resources but also for creating knowledge and making predictions [15]. It has been demonstrated, for instance, that functional annotation of new sequences based on sequence similarity is not optimal [16], while semantic methods based on ontologies and applied to the same task can represent a real improvement [17].

1.2 Open Biomedical Ontologies

The Open Biomedical Ontologies Foundry¹ initiative gave shape to some of the principles described above as relevant for an ontology to be of general interest, such as being widely disseminated and accepted among users of the field that it aims to describe. The OBO Foundry is a collaborative experiment involving developers of science-based ontologies who are establishing a set of principles for ontology development. The goal is to create a suite of orthogonal, interoperable reference ontologies for the biomedical domain. It has been, and still is, a strong community effort devoted to ensuring wide ontological coverage on the one hand and avoiding duplication of activities on the other. Some of the many OBO Foundry candidate ontologies are reported in Table 1.2.

The aim of this initiative, focused on object-level questions, is to represent in an exhaustive

way the proteins, organisms, diseases or drug interactions that are of primary interest in

biomedical research [18].

¹ http://obofoundry.org

Table 1.2: Summary of some of the many groups developing ontologies that have expressed an interest in the OBO Foundry goal.

As a tangible result, the Open Biomedical Ontologies (OBO) library is now a unique collection of controlled vocabularies shared across different biological and medical domains, and it forms the basis of the OBO Foundry. The main role of OBO is to be the reference resource for ontologies in the biological sciences. It is supported by the NIH Roadmap National Center for Biomedical Ontology (NCBO) through its BioPortal and is continually kept up to date by ontology developers. There are currently over 60 life-science ontologies lodged in OBO, covering domains such as anatomy, development and phenotype, genomic and proteomic information, and taxonomic information. All of them use a range of different attributes to describe their respective biological domains.

There are many resources available under the OBO umbrella, and most of these are shown in figure 1.3, in which the OBO ontologies have been roughly arranged along a spectrum from genotype to phenotype. To be included in the OBO Foundry, an ontology has to be developed following a set of principles that are used to give coherence to wider ontological efforts across the community:

• Openness: ontologies must be available to all, without any constraint or license on their use; it is only asked that users acknowledge the original source. This encourages usage, community buy-in and effort;

• Common representation: this is either the OBO format² or the Web Ontology Language (OWL)³. This provides common access via open tools and offers common semantics for knowledge representation;

• Independence: lack of redundancy across separate ontologies encourages combinatorial re-use of ontologies and the interlinking of ontologies via relationships;

• Identifiers: each term should have a semantic-free identifier, the first part of which refers to the originating ontology. This promotes easy management;

• Natural language definitions: terms themselves are often ambiguous, even in the context of their ontology, and a definition helps ensure appropriate interpretation. Thus, the terms in each ontology must have a proper textual definition explaining clearly the exact meaning of the concept within the context of that particular ontology.
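To make the representation and identifier principles more concrete, the following Python sketch reads one stanza written in the tag-value style of the OBO flat-file format and splits a term identifier into its ontology prefix and local part. The stanza content (the EX prefix, the term names and the definition) is invented for illustration, and the parser is a simplified assumption that handles only plain 'tag: value' lines, not the full format.

```python
# A made-up stanza in OBO tag-value style; the EX prefix and all values are hypothetical.
STANZA = """
[Term]
id: EX:0000002
name: example child term
def: "A made-up definition used only for this sketch." [EX:curator]
is_a: EX:0000001 ! example parent term
"""


def parse_stanza(text):
    """Parse one stanza into a dict mapping each tag to its list of values."""
    record = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the stanza header such as [Term]
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        record.setdefault(tag.strip(), []).append(value)
    return record


def split_identifier(term_id):
    """Split 'PREFIX:LOCAL' into the ontology prefix and the local part."""
    prefix, _, local = term_id.partition(":")
    return prefix, local


TERM = parse_stanza(STANZA)
print(TERM["name"][0], split_identifier(TERM["id"][0]))
```

Because the identifier carries no meaning beyond its prefix, a term can be renamed or redefined without breaking the annotations that point to it, while the textual definition remains available to human readers.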

² http://www.geneontology.org/GO.format.shtml#oboflat

³ http://www.w3.org/TR/owl-features/

Figure 1.3: The OBO ontologies arranged on a spectrum of genotype to phenotype, according

to their main domain [8].

1.3 The Gene Ontology project

The Gene Ontology⁴ (GO) project began in 1998 as a collaborative effort between three

model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD)

and the Mouse Genome Informatics (MGI) project [21]. Since then, many databases have joined

the GO Consortium including several of the world's major repositories for plant, animal and

microbial genomes.

Nowadays, the GO is the most successful OBO ontology and it is used in several studies

including expression profile analysis and proteomic studies to extract additional knowledge

from the huge amount of data available.

⁴ http://www.geneontology.org

The GO project took its first steps from the observation that a large fraction of the genes identified by genomic sequencing and specifying the core biological functions are shared by all organisms.

At present, many robust methods are available for the automated transfer of biological annotations from experimentally tractable model organisms to others, based on gene and protein sequence similarity. The accumulated knowledge can often be transferred across organisms, but there is a wide range of hurdles to overcome. First, the current system of nomenclature for genes and their products is not applied consistently: even when an underlying similarity between two genes can be recognised, experts are not very confident in using the right nomenclature. Second, the lack of interoperability between genomic databases limits the use of their content. The Gene Ontology project was formed to help overcome these major barriers.

The GO project has three main goals:

i) To develop and maintain a set of controlled and structured vocabularies, or ontologies [22, 23], for the description of genes and gene products;

ii) To use these vocabularies to annotate genes and gene products in biological databases from as many species as possible;

iii) To provide a public resource allowing access to the ontologies, to gene annotation files and to specific tools developed to utilise all GO data [24].

1.4 Gene Ontology Annotation (GOA)

Annotation is carried out primarily by species-specific database resources, such as the Mouse Genome Informatics and FlyBase, and by multispecies resources such as UniProt. The complete list of contributing database groups and the total numbers of annotations are given on the GO web page⁵. Among these contributors is the GOA group located at the European Bioinformatics Institute (EBI)⁶.

The Gene Ontology Annotation (GOA)⁷ project aims to provide high-quality GO annotations to proteins of the UniProt Knowledgebase (UniProtKB)⁸ and the International Protein Index (IPI)⁹. It is also a central dataset for other major multi-species databases such as Ensembl¹⁰ and NCBI¹¹.

GOA has been a member of the GO Consortium since 2001, and is responsible for the

integration and release of GO annotations to the human, chicken and cow proteomes. GOA is

also committed to the comprehensive annotation of a set of disease-related proteins in human.

High-quality GO annotations are generated through a combination of electronic and manual

techniques, the latter being accomplished by expert biologists.

By annotating all characterised proteins with GO terms and facilitating the transfer of this

knowledge to similar uncharacterised proteins, the UniProt group will make a valuable

contribution to biological and biotechnological research through a better understanding of all

proteomes.

5http://www.geneontology.org/GO.current.annotations.shtml

6http://ebi.ac.uk/

7http://www.ebi.ac.uk/GOA/

8http://www.ebi.ac.uk/uniprot/index.html

9http://www.ebi.ac.uk/IPI

10http://www.ensembl.org/

11http://www.ncbi.nlm.nih.gov/


Chapter 2

Disease Ontology Knowledgebase

This second chapter describes in some detail the development of the Disease Ontology Knowledgebase, a computational resource that represents relations between genes, drugs and diseases to help understand the mechanisms of diseases. The data sources are described in the first part. A section then follows on how the data were collected and organised in the database. The last part suggests possible applications that make full use of the system.

2.1 Introduction

In recent years, the development and implementation of high-throughput functional genomic technologies have resulted in the rapid accumulation of genome-scale data sets. Simultaneously, linkage analysis and association studies that identify disease-associated genes are generating increasingly large candidate gene sets that need to be exploited. It remains, however, a difficult task to identify the most likely gene-disease relationship, since the etiology of most chronic diseases involves the interaction of environmental factors and genes that modulate important biological processes [25]. This is further complicated by the poorly understood molecular mechanisms underlying the correlation between chemicals and diseases.

An additional limitation is that scientists working in different research fields are currently hampered by the specialisation of their technical language. For example, a physician trying to collect information on gene products correlated to 'Epilepsy' might find that the same genes are also relevant for 'Febrile Seizure' and 'Unverricht-Lundborg Disease' without knowing that the latter is just a correct synonym for a type of 'Epilepsy'. A closely related problem is polysemy, which is the ambiguity of an individual word or phrase that can be used in different contexts with different meanings. As a result, there is a major, continuing need to aggregate and annotate data on genes, drugs, diseases and their interactions to generate new knowledge.

To make the best use of biological databases, different kinds of information from different sources must be integrated in ways that make sense to the entire scientific community. Ontologies are a valuable means of data integration [2]. Following the example of the Gene Ontology Annotation (GOA) project, our goal is to classify and represent gene-drug, gene-disease or gene-drug-disease associations in a standardised way using ontologies. The idea is to associate genes that are both related to disorders and regulated by drug treatment using the terms of the Disease Ontology (DO), with the final objective of building a knowledge base of genes, drugs and targets to help the investigation of the molecular processes relevant to diseases.

2.2 Basic resources

Many data sources focused on genes, drugs and diseases, such as ontologies and specialised databases, were evaluated to develop the knowledgebase, and five were selected: three to build a vocabulary of disease names, and two to develop an equivalent dictionary for drug names.

2.2.1 Disease Ontology

The Disease Ontology1 (DO) is a controlled medical vocabulary modelled on the GO structure and developed at the Bioinformatics Core Facility, in collaboration with the NuGene Project, at the Center for Genetic Medicine (Chicago, US). It was designed to facilitate the mapping of diseases and associated conditions to particular medical codes such as ICD9CM2, SNOMED3 and others. The Disease Ontology is implemented as a directed acyclic graph (DAG) and makes use of the Unified Medical Language System (UMLS)4 [26]. Building on this standard makes much of the process of updating the ontology easier to handle. As with the GO curation process and its open development, the ontology is continually extended and revised in order to encompass diseases broadly. The DO is available in OBO format and can be readily edited and viewed using the OBO-Edit tool. Figure 2.1 shows an OBO-Edit screenshot of the Disease Ontology version 3.

Figure 2.1: Screenshot of DO using OBO-Edit version 1.1.

1http://diseaseontology.sourceforge.net/#projects

2The International Classification of Diseases, Ninth Revision, Clinical Modification is the official system for the classification of disease entries and of diagnostic and therapeutic procedures associated with hospital utilisation in the US.

3Systematised Nomenclature of Medicine-Clinical Terms is a standardised vocabulary system that creates a common clinical language for medical databases. Current modules contain more than 357,000 concepts.

4The UMLS contains a metathesaurus of medical concepts and a semantic network. It is intended to be used mainly by developers of systems in medical informatics and it provides facilities for natural language processing.


The previous version (v2.1) of the Disease Ontology was almost entirely based on ICD9CM, with some additional concepts useful for mapping common diseases. The newest version (v3) is based primarily on freely available vocabularies. For this project, the latest available version (v3), containing 12,448 concept nodes, was downloaded from the SourceForge5 home page and used. Among other reasons (e.g. it is an OBO ontology), this ontology was chosen because, unlike the GO, it had never been used for gene annotation, and consequently no annotation projects had yet been initiated. Moreover, as already mentioned in the introduction, there were several objective advantages in using ontologies for our data integration project:

• Through their semantic-free identifiers (unique IDs), ontologies allow quick linkage to other resources that already make use of their notation system (e.g. GAD database);

• Terms in natural language are often ambiguous, even within the context of an ontology; the hierarchical definition structure (DAG), however, ensures appropriate interpretation (Figure 2.2);

• Finally, and most importantly for computational projects, ontologies are machine processable.

Figure 2.2: Genes annotated with the child concept 'synovitis' can be transitively annotated with the parent terms 'disorder of tendon' and 'rheumatism' (on the right). This is not possible outside a DAG structure (on the left).

5http://sourceforge.net/project/showfiles.php?group_id=79168&package_id=202115&release_id=508426

2.2.2 Online Mendelian Inheritance in Man (OMIM)

The Online Mendelian Inheritance in Man (OMIM)6 is a comprehensive, authoritative and

regularly updated knowledgebase of human genes and genetic disorders compiled to support

human genetics research, education and the practice of clinical genetics [27].

OMIM data are organised in two different files: the 'Gene Map' and the 'Morbid Map' files,

both available at the OMIM project FTP site7. The OMIM Gene Map is a single file, in tabular

format, listing genes that are described in the database. Not all OMIM entries are included in the

Gene Map, but only those for which a cytogenetic location has been published in the cited

references. Each entry is a list of fields such as gene location, gene symbol, MIM number,

disorders and reference. The OMIM Morbid Map is an alphabetical list of diseases used in the

database and their corresponding cytogenetic locations.

2.2.3 Genetic Association Database

The Genetic Association Database8 (GAD) is a publicly available, NIH-based database of published gene-based genetic association studies, containing records of over 5,000 human genetic association studies. The database is centred on genes and provides a standardised molecular nomenclature by including official HUGO gene symbols. Each record refers to a gene or a marker and is annotated with links to molecular databases (e.g. LocusLink, GeneCards) and reference databases (e.g. PubMed, CDC) [28]. The goal of GAD is to allow rapid identification of medically relevant polymorphisms from a large volume of mutational data.

6http://www.ncbi.nlm.nih.gov/omim/

7ftp://ftp.ncbi.nih.gov/repository/OMIM/

8http://geneticassociationdb.nih.gov

GAD contains several data fields collected from genetic association studies, such as disease, phenotypes, sample size and allele descriptions (Fig. 2.3). Of particular interest to the Disease Ontology Knowledgebase project are the disease data fields. A top-level 'disease class' is assigned, followed by the 'disease' specification from the original paper. Then there is the 'Broad (or Narrow) Phenotype' disease class, which is assigned if studies recognise clinical subphenotypes, and finally there are the MeSH Disease Terms. The full list of diseases/phenotypes available in GAD can be freely retrieved9.

In addition, the OMIM gene field links each official HUGO gene name in GAD to its OMIM ID.

This database was selected as a valuable external resource because it is based on manual curation and therefore provides an excellent baseline for constructing our knowledgebase. A relatively large community of experts, registered in an ad hoc list, contributes to the GAD curation process. Anyone specialised in a specific disease, a specific gene or other related expertise, such as disease-specific or gene-specific data collections, is invited to join the list.

Figure 2.3: A simple search of associations for the disease schizophrenia. Fields in

this view include Official Gene Symbol, Disease Phenotype, Disease class, OMIM

ID, MeSH Disease term.

9http://geneticassociationdb.nih.gov/diseaselist.html


2.2.4 DrugBank

The DrugBank10 is a unique bioinformatics and cheminformatics resource combining

detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with comprehensive

drug target information (i.e. sequence, structure, and pathways) [29]. It includes physical

property data, structure and image files, pharmacological and physiological data on thousands

of drug products as well as extensive molecular biological information about their corresponding

drug targets.

Each DrugCard contains more than 80 data fields with half of the information being

dedicated to drug/chemical data and the other half to drug target or protein data. Each entry is

created and formatted by one member of the curation team and then separately validated by a second member of the same team, who guarantees quality and completeness. Drug targets and drug structures are confirmed using multiple data sources (e.g. PubMed, RxList, PharmGKB, KEGG, PubChem).

Chiefly because of this extensive manual curation, the high-quality data collected in DrugBank were partly integrated into our DO Knowledgebase.

2.2.5 PharmGKB

The Pharmacogenetics and Pharmacogenomics Knowledge Base11 (PharmGKB) is a public

resource that contains genomic, phenotype and clinical information collected from ongoing

research and from the literature [30]. It is devoted to cataloguing information about

pharmacogenes, which are genes involved in modulating the response to drugs [31].

Pharmacogenes are either involved in the pharmacokinetics (PK) of a drug (how the drug is

absorbed, distributed, metabolised and eliminated) or the pharmacodynamics (PD) of a drug

(how the drug acts on its target and its mechanisms of action).

The aim of PharmGKB is to capture the relationships between drugs, diseases/phenotypes

and genes from several types of information such as literature annotations, primary data sets,

PK and PD pathways, and expert-generated summaries of PK/PD relationships [32].

10http://redpoll.pharmacy.ualberta.ca/drugbank/index.html

11http://www.pharmgkb.org/index.jsp


Figure 2.4: Some of the relationships among data objects in PharmGKB.

Today, PharmGKB has curated evidence for nearly 2,000 genes involved in drug response. There are 545 drugs with associated phenotype and genotype data or literature annotations, 57 manually created drug-centred pathways, 542 diseases with supporting information and more than 2,100 literature annotations (Figure 2.4). The scientific community contributes to the growth of the database content by providing information about gene-drug, gene-disease or gene-drug-disease associations, as well as the available evidence for the associations. Submitted data are internally curated to avoid possible inconsistencies.

2.3 Methods and Results

The approach used to create the Disease Ontology Knowledgebase combines automated and

manual curation to address two principal tasks: i) extracting gene, disease and drug data from

selected sources and ii) characterising relationships using several complementary strategies.

The method was divided into four phases:

• Phase 1: acquisition and integration of data from the external resources;

• Phase 2: compilation of two vocabularies, one of disease names and disease synonyms, and

another of drug names and drug synonyms;


• Phase 3: association of diseases, genes and drugs to DO terms based on automated and

manually curated approaches;

• Phase 4: design and implementation of a MySQL database.

Several data-format problems were encountered and solved during the development of our resource. To parse the data and pull out all the relevant information available, many software routines were designed on a case-by-case basis. The information was then completely reorganised to make it easily accessible to the newly developed query tool and to allow easy maintenance and updating.

2.3.1 Retrieval and integration of data

The main initial effort focused on retrieving relevant data on genes, drugs, diseases and their inter-relationships from each of the selected external databases. This task was complicated by the many differences in information content and data format among those resources. Different approaches were therefore adopted to standardise the files and make them easily accessible.

After the newest revision (revision 21) of the Disease Ontology version 3 text file1 had been downloaded, it was parsed to extract disease names and synonyms with their corresponding DO identifiers, leaving out the 'temp holding' and 'obsolete' terms. As already mentioned, terms in the DO are structured as a DAG: parent terms can be linked to more than one child term and, in turn, child terms can have more than one parent. A Perl script was developed to navigate the data structure and draw inferences from selected terms, going down through the descendants or up through the ancestors of a given node and taking multiple paths into account.
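The navigation step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script used in this work. The sample text assumes an OBO-style layout (`[Term]` stanzas with `id`, `name`, `synonym`, `is_a` and `is_obsolete` tags), the identifiers and the synonym are made-up placeholders, and the term names follow the example of Figure 2.2. Handling of the 'temp holding' branch is left out.

```python
from collections import defaultdict

# Tiny OBO-style sample. Identifiers and the synonym are hypothetical placeholders;
# the term names follow the example of Figure 2.2.
SAMPLE_OBO = """\
[Term]
id: DOID:A
name: rheumatism

[Term]
id: DOID:B
name: disorder of tendon

[Term]
id: DOID:C
name: synovitis
synonym: "made-up synonym" EXACT []
is_a: DOID:A ! rheumatism
is_a: DOID:B ! disorder of tendon

[Term]
id: DOID:D
name: retired term
is_obsolete: true
"""


def parse_obo(text):
    """Return {id: term dict} for all non-obsolete [Term] stanzas."""
    terms, current = {}, None

    def flush(term):
        # Keep a finished stanza only if it has an id and is not obsolete.
        if term and term["id"] and not term["obsolete"]:
            terms[term["id"]] = term

    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            flush(current)
            current = None
            if line == "[Term]":
                current = {"id": None, "name": "", "synonyms": [],
                           "parents": [], "obsolete": False}
        elif current is not None and ":" in line:
            tag, _, value = line.partition(":")
            value = value.split("!")[0].strip()  # drop a trailing "! comment"
            if tag == "id":
                current["id"] = value
            elif tag == "name":
                current["name"] = value
            elif tag == "synonym" and '"' in value:
                current["synonyms"].append(value.split('"')[1])
            elif tag == "is_a":
                current["parents"].append(value)
            elif tag == "is_obsolete":
                current["obsolete"] = value == "true"
    flush(current)
    return terms


def ancestors(terms, term_id):
    """All terms reachable by following is_a links upwards, over every path."""
    seen, stack = set(), list(terms[term_id]["parents"])
    while stack:
        parent = stack.pop()
        if parent in seen or parent not in terms:
            continue
        seen.add(parent)
        stack.extend(terms[parent]["parents"])
    return seen


def descendants(terms, term_id):
    """All terms reachable by following is_a links downwards, over every path."""
    children = defaultdict(list)
    for tid, term in terms.items():
        for parent in term["parents"]:
            children[parent].append(tid)
    seen, stack = set(), list(children[term_id])
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(children[child])
    return seen
```

Because each visited node is recorded in a set, a term reached through several paths is reported once, which is the behaviour needed when child terms have more than one parent.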

PharmGKB provides access to a selected subset of its data via a SOAP interface with accompanying documentation. The sample client code and the client programs are freely available and can be downloaded from the home page2. Several Perl scripts were combined by the authors in order to extract different types of information from the PharmGKB knowledgebase. In particular, the specialSearch.pl script was run with option '6' to obtain all diseases with supporting information. The results were parsed and passed as input to the disease.pl script to obtain information about all genes and drugs related to each disease. Finally, the drugs.pl and genes.pl scripts were used to retrieve information on each individual drug and gene. In addition, when available, the chemical structure of the drug was collected and integrated into our knowledgebase.

1http://sourceforge.net/project/showfiles.php?group_id=79168&package_id=202115&release_id=508426

2http://www.pharmgkb.org/home/projects/webservices/index.jsp

To download the complete database as tab-delimited text files, GAD required us to fill in a user request form manually with personal credentials.

Since every entry is described in GAD by several attributes that can be sub-selected, filters

were applied to extract only those fields relevant to our project, like Broad Phenotype, Disease

Class, MeSH Disease Term, Gene, Gene Name, OMIM ID.
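As an illustration of this field selection, the following Python sketch keeps only the listed columns of a tab-delimited export. It assumes that the file has a header row whose labels match the field names given above, which may differ from the actual GAD download; the sample row contains placeholder values only.

```python
import csv
import io

# Field names as listed in the text; their exact spelling in the GAD file is an assumption.
WANTED = ["Broad Phenotype", "Disease Class", "MeSH Disease Term",
          "Gene", "Gene Name", "OMIM ID"]

# Placeholder data: one header row and one invented record.
SAMPLE = (
    "Gene\tGene Name\tDisease Class\tBroad Phenotype\tMeSH Disease Term\tOMIM ID\tSample Size\n"
    "GENE1\texample gene 1\tCLASS1\tphenotype 1\tTerm 1\t000000\t100\n"
)


def select_fields(handle, wanted=WANTED):
    """Yield one dict per record, restricted to the wanted columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # Missing columns are returned as empty strings rather than raising an error.
        yield {name: (row.get(name) or "").strip() for name in wanted}
```

For example, `list(select_fields(io.StringIO(SAMPLE)))` returns a single record that no longer contains the 'Sample Size' column.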

The OMIM Morbid Map was used to extract additional information on disorders and on the genes involved in them, starting from the assigned OMIM ID.

Finally, since DrugBank is a freely available resource, a full set of DrugBank Approved

DrugCards was downloaded3 in a single flat file and used as a source for drug names, synonyms,

and gene target symbols.

For each database, a list of all diseases was gathered and used for the compilation of the disease dictionary. Then, association data for genes and diseases were extracted from each resource and subsequently used for the gene annotation process.

2.3.2 Acquisition of disease name synonyms

One issue that had to be addressed was the presence of disease synonyms, that is, different names used to describe the same disease. Genes known to be associated with the same disease are often annotated to different synonyms, causing retrieval problems or incompleteness. Therefore, to solve the problem, external resources (again GAD, PharmGKB and OMIM) were carefully parsed to provide an additional set of disease synonyms.

The strategy used to compare DO terms with disease names was based on the combination of an automated association process (i.e. a comparison algorithm) with very time-consuming manual curation. The former produced an initial, relatively low-quality set of associations derived without human intervention; the latter improved quality to a much higher standard.

3http://redpoll.pharmacy.ualberta.ca/drugbank/cgi-bin/download.cgi

All the data collected were formatted to make them suitable for any available analysis tool. The list of disorders included in the DO was used to link the internal ontology to external resources. Each DO concept was mapped to every other database containing disease names by running a Perl script specifically designed to perform term-term comparison, identify overlapping definitions and extract correlated synonyms when available. In order to perform the comparison in all databases, the script was adjusted to accept the different input file formats. Automated processing followed the principle of reducing false negatives as much as possible while accepting limited stringency on false positives.

This initial approach maximised the identification of possible synonyms from the outset, leaving accuracy to the manual step. Because the comparison involved standard definitions rather than natural language in its widest sense, no sophisticated learning algorithms were necessary. The automated comparison method was based on the application of simple rules to score the level of identity between definitions, also taking into consideration some semantic content of the component words when possible. Similar definitions were considered synonyms if, in the first instance, they satisfied the following condition: I >= int(K/2), where K = T - N (I = number of identities, T = total number of words, N = number of non-relevant words). Conjunctions, generic medical words and the order of terms were considered either irrelevant for the identification of disease synonyms or negatively correlated with the level of identity to be calculated. The main limitations of this comparison approach were that synonyms could not be spotted when definitions contained different words with the same meaning, or when completely different definitions of the same disease existed (e.g. depressive disorder and major depression).
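The first-pass condition can be sketched in Python as follows. Several points are assumptions of this sketch and not details taken from the original Perl script: T and N are counted on the DO term, the list of non-relevant words is a hypothetical stand-in, words are compared as exact lower-case tokens, and at least one identity is required so that a term with a single relevant word does not match every name.

```python
import re

# Hypothetical stand-in for the conjunctions and generic medical words
# treated as non-relevant; the actual list is not given in the text.
NON_RELEVANT = {"and", "or", "of", "the", "disease", "disorder", "syndrome"}


def words(name):
    """Lower-case a name and return its set of words (word order is ignored)."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def is_candidate_synonym(do_term, other_name):
    """First-pass test I >= int(K/2) with K = T - N, counted on the DO term."""
    relevant = words(do_term) - NON_RELEVANT   # the K = T - N relevant words
    k = len(relevant)
    i = len(relevant & words(other_name))      # I = identities
    # Requiring at least one identity is an assumption of this sketch.
    return i >= max(1, k // 2)
```

Under these assumptions, 'rheumatoid arthritis' is paired with 'arthritis, rheumatoid' and, permissively, with plain 'arthritis', which would be left to manual curation, whereas 'depressive disorder' and 'major depression' share no word and are missed, in line with the limitation noted above.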

When a DO term was successfully mapped to a disease name present in one of the source databases, all its synonyms were extracted. The next step was the careful manual curation of the results, which also addressed the problem of false positives (Fig. 2.5).

A disease vocabulary was thus created, and almost all the diseases described in the external databases were associated with at least one DO term with a unique identifier.


Figure 2.5: Overview of the method. The input files of the method correspond to two

lists of disease names and synonyms: one from the Disease Ontology and one among

GAD, OMIM and PharmGKB. In this example, the DO dictionary is augmented by

synonyms provided by PharmGKB. After the initial filtering with the comparison algorithm, manual curation was applied to correct the result and find additional associations.


The highest number of exact matches was found between the DO and PharmGKB database.

A total of 2,633 exact matches between these two resources were obtained, e.g. osteoporosis

(DOID: 11476 and GKB: PA445190), and rheumatoid arthritis (DOID: 7148 and GKB:

PA443434).

Table 2.6 recapitulates the number of matches between DO and external databases. Column

A represents the total number of associations generated by the script used for the comparison.

Table 2.6: Results of the comparison between DO and the three resources are reported;

numbers in brackets correspond to total number of terms for each database. Column A: total

number of associations generated by the script used for the comparison. Column B: sum of

totals in column C and D. Column C: total number of identities between the DO name and the

name or synonyms in the other database. Column D: total number of matches found after the

manual curation.

The associations, including false positives, were redundant and required curation. For instance, the script found and filtered 186,894 possible PharmGKB positive results, which corresponded to 2,976 non-redundant associations. After manual curation of this large set of almost 3,000 entries, 2,866 were confirmed as correctly matched by the script (column B). The total matches are the sum of the matches in columns C and D. Column C shows the identities between the DO name and the name or synonyms in the other database; column D shows the matches found after manual curation. The highest global overlap (71.68%) was found between the DO and the PharmGKB database. The DO terms at the highest levels of the ontology hierarchy were easily spotted in all databases, e.g. osteoporosis (DOID: 11476; PharmGKB: PA445190; OMIM: 166710). The low-level DO terms, which refer to more specific disease classes, were nevertheless found in at least one external database.

2.3.3 Gene annotation findings and gene-disease relationships

Gene information was retrieved from the NCBI FTP site4. The file containing only human gene-based information was then parsed to collect the data of interest, and the extracted information was stored in a MySQL database (Fig. 2.7).

Gene annotation data were also found in the GOA gene association files5, which maintain the GO assignments for the proteins of the non-redundant human proteome set.

Figure 2.7: The general disease vocabulary has been populated using GAD, OMIM and PharmGKB data. This vocabulary was used to search in the genetic association file obtained from GAD. All matches between the two files were collected in a gene annotation file where genes are associated with one or more DO diseases with a unique DO ID.

4ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz

5ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/gene_association.goa_human.gz

Disease names and synonyms from the general disease vocabulary were used to search the genetic association file produced with the data collected from GAD. Matches between the two files were collated in a general gene annotation file in which genes are associated with one or more DO diseases, each with a unique DO ID (Fig. 2.7).
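The matching step can be sketched in Python as a dictionary lookup. The DO identifiers for osteoporosis and rheumatoid arthritis are those quoted earlier in this chapter; the gene symbols are placeholders, the synonym is illustrative, and exact case-insensitive matching is an assumption of the sketch.

```python
# Disease vocabulary: DO identifier -> name and synonyms (synonym is illustrative).
VOCABULARY = {
    "DOID:11476": ["osteoporosis"],
    "DOID:7148": ["rheumatoid arthritis", "arthritis, rheumatoid"],
}


def build_lookup(vocabulary):
    """Map every lower-cased name or synonym to its DO identifier."""
    lookup = {}
    for do_id, names in vocabulary.items():
        for name in names:
            lookup[name.lower()] = do_id
    return lookup


def annotate(associations, lookup):
    """Turn (gene, disease name) pairs into {gene: set of DO identifiers}."""
    annotations = {}
    for gene, disease in associations:
        do_id = lookup.get(disease.strip().lower())
        if do_id is not None:  # names absent from the vocabulary are skipped
            annotations.setdefault(gene, set()).add(do_id)
    return annotations


# Placeholder association records standing in for the file derived from GAD.
ASSOCIATIONS = [
    ("GENE_A", "Osteoporosis"),
    ("GENE_A", "Arthritis, Rheumatoid"),
    ("GENE_B", "a disease not in the vocabulary"),
]
```

Here `annotate(ASSOCIATIONS, build_lookup(VOCABULARY))` associates GENE_A with both DO identifiers, while GENE_B remains unannotated because its disease name is not in the vocabulary.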

2.3.4 Compilation of the drug dictionary

The list of drugs used in our database was compiled from DrugBank and PharmGKB, with the former used first.

DrugBank content is based on the 'active principle' or 'active ingredient set' of drugs. Owing to the effort required for curation, some drugs are still in the queue of 'to be added' drugs even though they are publicly available (e.g. nimesulide). Relevant data were extracted from each DrugCard entry by selecting, among others, the following fields:

• Generic name: standard name of drug as provided by drug manufacturer;

• Brand name and synonym: alternate names of the drug, brand names from different

manufacturers;

• Brand name mixtures: brand names and composition of mixtures that include the drug

described in the DrugCard file;

• Indication: description or common names of diseases that the drug is used to treat;

• Drug target(s) name: name of the protein or macromolecule (or other small molecule) that

the drug is supposed to act upon. Some drugs act on multiple targets, so these fields may

be repeated several times, reflecting the number of drug targets that a specific drug may

have;

• Drug target(s) gene name: gene name of drug target;

• Drug target(s) synonyms: alternate names (protein names, abbreviations, etc.) of the drug

target;

• Other fields such as ChEBI ID, CAS RN and PharmGKB ID.


Starting from the list of generic drug names obtained from DrugBank, Perl scripts were run on the PharmGKB database to extract additional entries (names or synonyms) and populate a general dictionary in which each generic drug name is associated with all its possible synonyms and with the mixtures that include the drug. A mixture might be associated with more than one drug, e.g. Ana-Kit is a mixture composed of Chlorpheniramine (APRD00001) and Epinephrine (APRD00450). A total of 1,349 drug active principle names and 24,303 synonyms were identified, with an average of 19 associated synonyms per name (Table 2.8).

Table 2.8: A total of 1,349 drug active principle names with an average of 19 synonyms have been identified. 216 drugs are from DrugBank only, 149 from PharmGKB only and 984 are in common. The total number of synonyms is 24,303 and the number of mixtures is 253.
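How two name lists can be merged into one dictionary and broken down by source is sketched below in Python. The drug names are taken from this chapter and the synonyms are common alternative names, but which source lists which synonym is invented for the example, and case-insensitive comparison of generic names is an assumption of the sketch.

```python
# Illustrative inputs: generic name -> synonyms. The split between sources is made up.
DRUGBANK = {"Chlorpheniramine": ["Chlorphenamine"], "Epinephrine": ["Adrenaline"]}
PHARMGKB = {"epinephrine": ["Adrenalin"], "Rizatriptan": []}


def merge_drug_dictionaries(*sources):
    """Merge {generic name: synonyms} dictionaries, ignoring letter case."""
    merged = {}
    for source in sources:
        for generic, synonyms in source.items():
            entry = merged.setdefault(generic.lower(), set())
            entry.update(s.lower() for s in synonyms)
    return merged


def source_counts(first, second):
    """Count generic names found only in the first, only in the second, or in both."""
    a = {name.lower() for name in first}
    b = {name.lower() for name in second}
    return {"first_only": len(a - b), "second_only": len(b - a), "common": len(a & b)}
```

The three categories returned by `source_counts` partition the merged names, which is the relation behind the figures of Table 2.8: 216 + 149 + 984 = 1,349.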

2.3.5 Finding correlations between drugs and diseases

The next step was to extract and characterise the relationships between drugs and diseases. The Indication field in DrugBank provides a description of the possible uses of a drug for the treatment of specific disorders. Unfortunately, these descriptions do not follow any standard, as they are written in natural language. Finding drug-disease associations was therefore further complicated by the large number of possible false positives. To address this problem, Perl scripts were developed that implemented stringency criteria designed to improve predictivity as much as possible. Nonetheless, the output files required manual curation to increase the level of accuracy of each drug-disease relationship. As a result, 888 drugs from DrugBank were associated with 801 DO diseases (Fig. 2.9).
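One simple stringency criterion is to accept a disease name only when it occurs as a whole phrase in the Indication text. The Python sketch below illustrates this idea; it is an assumption about what such a criterion could look like, not the rule implemented in the original scripts. The identifiers are placeholders and the indication sentence is composed from the disease list given in this section for Chlorpheniramine.

```python
import re

# Name or synonym -> placeholder identifier (not real DO IDs).
LOOKUP = {
    "rhinitis": "X:rhinitis",
    "urticaria": "X:urticaria",
    "allergy": "X:hypersensitivity",
    "hypersensitivity": "X:hypersensitivity",
    "common cold": "X:common_cold",
    "asthma": "X:asthma",
    "hay fever": "X:hay_fever",
}

# Example sentence written for this sketch, not copied from DrugBank.
INDICATION = ("For the treatment of rhinitis, urticaria, allergy, "
              "common cold, asthma and hay fever.")


def diseases_in_indication(indication, lookup):
    """Return the identifiers whose name occurs as a whole phrase in the text."""
    text = indication.lower()
    found = set()
    for name, identifier in lookup.items():
        # Word boundaries prevent a name from matching inside a longer word.
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            found.add(identifier)
    return found
```

With these inputs, `diseases_in_indication(INDICATION, LOOKUP)` yields six identifiers, and 'allergy' resolves to the same entry as its synonym 'hypersensitivity'. A name such as 'arthritis' would not be reported for a text that mentions only 'osteoarthritis'.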


Figure 2.9: Example of associations between DO diseases and drugs.

Several drugs were associated with more than one DO term, for example Chlorpheniramine (DrugBankID: APRD00001), which is indicated for the treatment of rhinitis, urticaria, allergy, common cold, asthma and hay fever. As expected, it was properly linked to six DO entries by our knowledgebase (Table 2.10). Chlorpheniramine was also associated with the DO term 'hypersensitivity' as a synonym of the disease name 'allergy'.

Table 2.10: Chlorpheniramine (DrugBankID:APRD00001), used for the treatment of

rhinitis, urticaria, allergy, common cold, asthma and hay fever, has been associated to six

DO entries.
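The synonym-aware matching of free-text indications to DO terms can be illustrated with a minimal Python sketch. It is not the Perl implementation used in this work: the vocabulary structure, the function name `match_do_terms` and the indication string are assumptions made for illustration, and only the disease names and the allergy/hypersensitivity pairing come from the Chlorpheniramine example above.

```python
import re

# Illustrative DO-like vocabulary: preferred term -> list of synonyms.
# Assumption: 'allergy' is stored as a synonym of 'hypersensitivity'.
DO_TERMS = {
    "rhinitis": [],
    "urticaria": [],
    "hypersensitivity": ["allergy"],
    "common cold": [],
    "asthma": [],
    "hay fever": [],
    "migraine": [],
}


def match_do_terms(indication, vocabulary):
    """Return the sorted preferred terms whose name or synonym occurs as a whole phrase."""
    text = indication.lower()
    hits = set()
    for preferred, synonyms in vocabulary.items():
        for label in [preferred, *synonyms]:
            # Word boundaries stop a term from matching inside a longer word.
            if re.search(r"\b" + re.escape(label.lower()) + r"\b", text):
                hits.add(preferred)
                break
    return sorted(hits)


# Hypothetical indication text assembled from the uses listed in the example.
indication = ("For the treatment of rhinitis, urticaria, allergy, "
              "common cold, asthma and hay fever.")
matched = match_do_terms(indication, DO_TERMS)
print(matched)
```

Whole-phrase matching is one simple stringency criterion; it reduces but does not remove false positives, which is why manual curation of the output remained necessary.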

Thus, the knowledgebase proved able to retrieve the principal diseases associated with a specific drug. In the same way, different drugs associated with the same disease can be identified (Table 2.11).

Table 2.11: Divalproex and Rizatriptan are both used for the treatment of migraine (DOID:6364).

2.3.6 Identifying relationships between drugs and target genes

Genes or proteins are identified as key disease molecules when they are involved in specific metabolic or signalling pathways relevant to a given condition or pathology. A protein, however, might also be considered a key molecule because it is the target of a drug treatment. Its inhibition, for instance, could block a pathway in the disease state.

Since the DrugBank database provides this type of information, it was used to characterise and store relationships between drugs and their respective target genes. Each drug is linked by ID to the relevant entries in the GENE table (Figure 2.12) of the MySQL database, which also contains the UniProt ID, alternative gene names, etc. (Figure 2.13).
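The ID-based link between drugs and target genes can be sketched as follows. This is an illustration only: an in-memory SQLite database stands in for the MySQL database, the GENE table name is taken from the text, while the DRUG and DRUG_GENE tables, all column names and all rows are hypothetical placeholders rather than the actual schema or DrugBank content.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL database described in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DRUG (drug_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE GENE (gene_id INTEGER PRIMARY KEY, symbol TEXT NOT NULL,
                   uniprot_id TEXT, alt_name TEXT);
-- Hypothetical link table: one row per drug-target pair.
CREATE TABLE DRUG_GENE (drug_id TEXT REFERENCES DRUG(drug_id),
                        gene_id INTEGER REFERENCES GENE(gene_id),
                        PRIMARY KEY (drug_id, gene_id));
""")

# Placeholder rows, not taken from DrugBank.
conn.execute("INSERT INTO DRUG VALUES ('DRUG_X', 'example drug X')")
conn.execute("INSERT INTO DRUG VALUES ('DRUG_Y', 'example drug Y')")
conn.execute("INSERT INTO GENE VALUES (1, 'GENE_A', 'UNIPROT_A', 'alias A')")
conn.execute("INSERT INTO GENE VALUES (2, 'GENE_B', 'UNIPROT_B', 'alias B')")
conn.executemany("INSERT INTO DRUG_GENE VALUES (?, ?)",
                 [("DRUG_X", 1), ("DRUG_X", 2), ("DRUG_Y", 2)])


def targets_of(drug_id):
    """Return (symbol, uniprot_id) pairs for the genes targeted by a drug."""
    rows = conn.execute(
        """SELECT g.symbol, g.uniprot_id FROM DRUG_GENE dg
           JOIN GENE g ON g.gene_id = dg.gene_id
           WHERE dg.drug_id = ? ORDER BY g.symbol""", (drug_id,))
    return rows.fetchall()


def drugs_for(symbol):
    """Return the names of the drugs that target a given gene symbol."""
    rows = conn.execute(
        """SELECT d.name FROM DRUG_GENE dg
           JOIN DRUG d ON d.drug_id = dg.drug_id
           JOIN GENE g ON g.gene_id = dg.gene_id
           WHERE g.symbol = ? ORDER BY d.name""", (symbol,))
    return [row[0] for row in rows]


print(targets_of("DRUG_X"))
print(drugs_for("GENE_B"))
```

A separate link table is assumed here because one drug may have several targets and one gene may be targeted by several drugs; the same relationships can then be queried in either direction.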

Figure 2.12: Example of associations between genes and drugs.

2.4 Possible applications

Analysis of differential gene expression is a widely used method to reveal lists of possible disease candidate genes, which usually range from entries well known in the literature to completely new ones. Screening such a list in PubMed is neither quick nor easy, even for a general overview, and the process is further complicated by the existence of synonyms and common names. For instance, searching for the official gene name BDNF in combination with ‘bipolar’ retrieves 177 PubMed abstracts. This number drops to 153 if the extended name Brain Derived Neurotrophic Factor is used instead. An equivalent search for ALOX12 returns only 1 entry. However, both genes would be quickly associated with bipolar disorder using our DO Knowledgebase. Where relevant, it is even more difficult to find public domain resources able to correlate genes with known interacting drugs, which is another useful feature of the DO Knowledgebase. In summary, retrieving useful information on genes, diseases and/or drugs of interest is a time-consuming and sometimes frustrating job. Our DO Knowledgebase is potentially useful for working through long lists of expression-derived gene data because it helps organise information at a higher level, saving time for detailed and focused analysis.

The aim of the work has been to provide the research community with a tool able to deliver a quick overview of potential links between genes, drugs and diseases. The DO Knowledgebase successfully connects those data by making use of several up-to-date and manually curated resources. As soon as the web interface is available, the user will be able to browse the list of disease concepts or query the database to search for disease terms or genes of interest. Intersection of the gene sets relevant to different disease concepts of particular interest will also be possible. It is, for instance, widely recognised that people suffering from one specific sub-type of mood disorder have an increased susceptibility to additional mood disorder sub-types. With this approach it would be feasible to answer questions such as: how many genes are associated with the several known types of depression in a complex expression experiment? And among them, how many are in common or appear frequently? Once those genes are identified, it is quick and straightforward with the DO Knowledgebase to verify whether they are associated with any available therapeutic drugs for mood disorders. Moreover, finding gene-drug relationships would form the basis of more detailed pharmacogenetic experimental investigations.

Figure 2.13: Schema of the MySQL database.
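The intersection of gene sets across disease concepts can be sketched in a few lines of Python. The disease labels, gene symbols and function names below are placeholders chosen for illustration; they are not results from the knowledgebase.

```python
from collections import Counter

# Hypothetical gene sets per disease concept; the symbols are placeholders.
genes_by_disease = {
    "depression subtype 1": {"GENE_A", "GENE_B", "GENE_C"},
    "depression subtype 2": {"GENE_B", "GENE_C", "GENE_D"},
    "depression subtype 3": {"GENE_C", "GENE_E"},
}


def shared_genes(gene_sets):
    """Genes present in every disease concept."""
    sets = list(gene_sets.values())
    return set.intersection(*sets) if sets else set()


def gene_frequency(gene_sets):
    """Number of disease concepts in which each gene appears."""
    return Counter(gene for genes in gene_sets.values() for gene in genes)


common = shared_genes(genes_by_disease)
freq = gene_frequency(genes_by_disease)
# Genes that recur in at least two concepts without being in all of them.
frequent = sorted(gene for gene, count in freq.items() if count >= 2)
print(common, frequent)
```

The strict intersection answers "how many are in common", while the frequency count answers "how many appear frequently" with a threshold that the user can adjust.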

Chapter 3

Case study

This chapter describes the research case study used to verify the quality and potential of the Disease Ontology Knowledgebase. An initial introduction is followed by some background that places the molecular mechanisms of disease in the context of neuropsychiatric disorders. The central part of the chapter is divided into two sections:

a. DO Knowledgebase analysis and ‘Methods and results’ part one.

b. Pathway/Network analysis and ‘Methods and results’ part two.

The last paragraph is devoted to a discussion of this second part of my PhD activity.

3.1 Introduction

To test the functionality of the Disease Ontology Knowledgebase developed during my PhD activity, validate the internal consistency of the data and possibly broaden the current state of knowledge on the subject, genes and mechanisms involved in dendritic plasticity were investigated, especially in relation to neuropsychiatric diseases.

Evidence that the etiopathology of several cognitive disorders is strongly influenced by the regulation of plasticity mechanisms at the molecular level has recently started to accumulate. Many aspects of this research subject are already known in the literature and are useful to control and verify results; several others, however, remain to be elucidated. A general description of the multiplicity of molecular aspects involved in many of the disorders is often lacking, and even disease categorization is based on poorly objective criteria. A detailed overview of the many aspects implicated, spanning from basic definitions to a description of state-of-the-art research, is reported in the next paragraphs (3.2, 3.2.1, 3.2.2, 3.2.3). This background is provided in order to better understand the methods and results described in the second part of the chapter.

This study does not claim to give exhaustive results on the subject; it only proposes a research activity that uses the computational resource developed during my PhD activity, integrated into a wider computational strategy. The main objective was the validation of the DO Knowledgebase in a real research case, which is a less subjective method than test examples created ‘ad hoc’. The good results obtained, however, represent an interesting starting point for further investigations in the field.

3.2 Background

Mental Disorders are categorized according to their predominant features. For example,

phobias, social anxiety, and post-traumatic stress disorder all include anxiety as a main feature of

the disorder. All of these disorders are therefore categorized under Anxiety Disorders. There are

over 300 different psychiatric disorders listed in the DSM-IV. With continued research, more are

named every year and some others are removed or re-categorized.

Important factors in the molecular genetics of psychiatric illnesses and the relevance of molecular signals have been elucidated using a combination of experiments and computation. However, research in the field is still very far from describing even the fundamental mechanisms of most of the disorders. This is true, for instance, of depression, one of the most serious mental diseases, with the highest prevalence worldwide. It is becoming a major source of disability, second only to cardiovascular diseases. Depression, like most mental illnesses, is probably caused by a combination of genetic and environmental factors. Abnormalities in brain biochemistry and in the structure or activity of certain neural circuits are known to be responsible for the extreme shifts in mood, energy, and functioning that characterize depression. Lithium has remained one of the most effective medicines for patients with depression, but the mechanism of this effect is still unclear, even though several of its molecular targets, such as Glycogen Synthase Kinase 3 (GSK3), have been identified.

Data from the last decade have suggested that abnormalities in the development and

information processing in the neuronal networks involved in emotional processes may at least

partially underlie mood disorders [33] [34] [35].

Plasticity of neuronal networks has therefore emerged not only as a determinant of the

disease but also as a necessary component in successful antidepressant treatments, including

pharmacological and psychological therapies.

3.2.1 Neuropsychiatric disorders and mechanisms of regulation

There are many hypotheses on neuropsychiatric disorders and the mode of action of antidepressants. Several of them are largely based on the dysregulation of the hypothalamic-pituitary-adrenal (HPA) axis, mediated by corticotropin-releasing hormone (CRH), glucocorticoids, brain-derived neurotrophic factor (BDNF) and CREB [36]. Others focus on the fact that neuropsychiatric disorders are stress-related, and there is good evidence that episodes of depression, for instance, often occur in response to stress or trauma. A prominent mechanism by which the brain reacts to acute and chronic stress is the activation of the HPA axis. On exposure to stressors, CRH is produced within the hypothalamus. In turn, CRH stimulates the anterior pituitary gland to release adrenocorticotropic hormone (ACTH) into the bloodstream. ACTH then stimulates the release of glucocorticoids (e.g. cortisol in humans) from the adrenal cortex [37]. Circulating glucocorticoids interact with their receptors in various target organs such as the liver and muscle tissue, as well as in the brain and the HPA axis itself, where they are responsible for initiating feedback inhibition. They thus exert profound effects on general metabolism and also affect several processes such as neurogenesis, survival of neurons, neuronal plasticity, neuronal cell proliferation and cell death [38].

Another hypothesis suggests a role for neurotrophic factors at the basis of several pathological neuropsychiatric conditions. Neurotrophic factors regulate neuronal growth and differentiation during development but are also known to be potent regulators of plasticity and survival of adult neurons and glial cells. Many papers dealing with the neurotrophin hypothesis have shown that acute and chronic stress decrease levels of BDNF expression in several brain regions [39].

Plasticity of dendritic spines is widely recognised as one of the emerging mechanisms of particular interest in brain disorders, but its details have not yet been elucidated.

Dendritic spines are morphological specializations that protrude from the main shaft of

dendrites. Most excitatory synapses in the mature mammalian brain occur on spines. So, spines

represent the main unitary postsynaptic compartment for excitatory input. Dendritic spines

generally consist of a head attached to a dendrite via a stalk or a neck and within this general

description, spines span a continuum of shapes. Spines have been classified by shape as thin,

stubby, mushroom and cup-shaped. However, spine morphology is not static; spines change size

and shape over variable timescales.

There is no definitive answer on the significance of dendritic spines, but the prevailing view is that their primary function is to provide a microcompartment for segregating postsynaptic chemical responses, such as elevated calcium.

Regulated changes in spine number might reflect mechanisms for converting transient

changes in synaptic activity into long-lasting alterations. Indeed, changes in spine density have

been observed in response to changes in the efficacy of neurotransmission. In general terms,

spines seem to be maintained by an 'optimal' level of synaptic activity: spine density increases

when there is insufficient activity, and decreases when stimulation is excessive. Moreover, spine

morphology is markedly influenced by the activity of glutamate receptors.

In-depth knowledge of dendritic plasticity would contribute to a better understanding of a wide range of diseases with apparently different pathogenesis, symptoms and drug treatments, but possibly common molecular determinants.

3.2.2 Dendritic plasticity

In neuroscience, synaptic plasticity is the ability of the connection, or synapse, between two neurons to change in strength. Neuronal plasticity or remodelling is most often discussed with regard to cellular and behavioural models of learning and memory. More generally, neuronal plasticity is a fundamental process by which the brain acquires information and makes the appropriate adaptive responses in related future settings.

The majority of synapses throughout the nervous system are made onto dendrites. Dendrites are a major determinant of how neurons integrate and process incoming information and therefore play a vital role in the functional properties of neural circuits. In particular, the extent of a neuron’s dendritic arborisation is an important determinant of its input structure. Dendritic spines exhibit rapid motility.

Most spines can change shape in seconds. The shape change involves a remodelling of the

cytoskeleton in the spine, and actin-based protrusive activity from the spine head. The

underlying molecular mechanisms of this motile behaviour, and its functional significance, are

unknown. Existing data suggest that more spines form when neurons have less excitatory

activation, are maintained by optimal activation, and are lost when activation is too high, or if

the presynaptic axons degenerate [40]. This pattern supports the hypothesis that neurons may

homeostatically regulate input through spine number. It also suggests a second important fact

about dendritic spines. Extra spines that form when excitatory neuronal activation is low can

provide a morphological basis to support new synaptic plasticity.

Considerable progress has been made in identifying the molecules that control spine growth and maturation. The cytoskeleton is crucial for their development and stability, and an expanding set of actin-binding and actin-regulatory molecules has also been implicated in these processes. They include the small GTPase Ras, GTPases of the Rho/Rac/Cdc42 family, and a series of receptors and scaffold proteins [41]. Several questions, however, remain to be answered in this nascent field.

3.2.3 Factors influencing dendritic plasticity

Antidepressants have been shown to induce neuronal plasticity when administered over a sufficiently long period [42] [43] in several cortical regions, particularly the hippocampus, in a manner analogous to that produced by favourable environmental stimulation [44]. Importantly, if neurogenesis is prevented, antidepressants fail to produce typical behavioural responses in rodents, demonstrating at least an association between neurogenesis and the behavioural effects of antidepressants [45]. Another manipulation that causes widespread changes in dendrites in a variety of adult brain regions is exposure to drugs of abuse. Because chronic exposure to drugs causes profound experience-dependent changes in behaviour, it was hypothesized, based partly on the data from environmental enrichment and training, that such exposure could also cause persistent changes in dendritic structure. Furthermore, drugs of abuse change the concentration of various neurotransmitters at synapses, which is known to affect dendrite development.

Recent studies have indicated that neurotrophins also control dendritic growth and arborization in the CNS. Some authors have found that all four neurotrophins have diverse effects on the dendritic arborization of pyramidal neurons in slices of developing visual cortex [46]. Horch and Katz (2002) have published elegant studies showing that BDNF supplied by a single neuron in ferret cortex brain slices induces dendritic branching in nearby neurons in a distance-dependent manner [47]. These data support a role for BDNF in dendritic growth and remodelling in the neocortex. However, some neurotrophins appear to have opposing effects on cortical dendritic growth: NT-3 has been shown to exert dendrite-retractive effects in developing visual cortex, suggesting a “push-pull” control of dendritic arborization by different neurotrophins [48].

One of the simplest ways to link changes in activity to changes in dendrites is via the effects of neurotransmitters themselves. Glutamatergic transmission, particularly that mediated by the N-methyl-D-aspartate (NMDA) receptor subtype, has been shown to affect dendritic structure in the developing cortex, and there is an interaction between synaptic activity driven by glutamate and the effects of BDNF.

3.3 Methods and results (Part I)

The strategy adopted to gain insight into the mechanisms involved in dendritic plasticity and neuropsychiatric disorders, illustrated in the previous paragraphs, is summarised below and then detailed in paragraphs 3.3.1 to 3.4.4:

Step 1: Identify relevant query terms to interrogate selected databases (e.g. Gene Ontology) and extract a list, as exhaustive as possible, of known genes associated with dendritic plasticity.

Step 2: Completely annotate genes with information on functional role, tissue localization, molecular mechanisms involved, etc. Filter the list based on gene expression criteria.

Step 3: Submit the gene list to the Disease Ontology Knowledgebase and extract all the associated diseases.

Step 4: Investigate gene distribution in canonical pathways.

Step 5: Extract the genes showing an involvement in at least two related diseases identified in Step 3.

Step 6: Build a network of relationships for the genes identified in Step 5 to delineate the molecular context and spot topological “hubs”. Use the DO Knowledgebase to study the network.

Step 7: Collect available gene-drug information for the genes selected in Step 5 and Step 6.
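The selection made in Step 5 on the output of Step 3 can be sketched as follows. The gene symbols, the gene-disease associations and the grouping of "related" diseases are hypothetical placeholders, as is the function name `genes_in_multiple`; the sketch only shows the shape of the operation, not the actual data or implementation.

```python
# Hypothetical output of Step 3: gene -> set of associated DO disease names.
gene_to_diseases = {
    "GENE_A": {"Schizophrenia", "Bipolar Disorder", "Obesity"},
    "GENE_B": {"Alzheimer's Disease"},
    "GENE_C": {"Epilepsy", "Schizophrenia"},
    "GENE_D": {"Obesity", "Hypertension"},
}

# Assumed group of related diseases; the real grouping is not specified here.
related = {"Schizophrenia", "Bipolar Disorder", "Epilepsy", "Alzheimer's Disease"}


def genes_in_multiple(gene_map, disease_group, minimum=2):
    """Keep genes linked to at least `minimum` diseases within the group."""
    selected = {}
    for gene, diseases in gene_map.items():
        hits = diseases & disease_group
        if len(hits) >= minimum:
            selected[gene] = sorted(hits)
    return selected


step5 = genes_in_multiple(gene_to_diseases, related)
print(step5)
```

Restricting the count to a group of related diseases, rather than to any two diseases, is what keeps a gene such as the placeholder GENE_D (two unrelated diseases) out of the selection.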

3.3.1 Selection of query terms and creation of a gene list

Information sources covering the literature space, public domain ontologies and commercial bioinformatics software solutions were selected in order to extract genes from computationally structured knowledge.

Short textual definitions describing the several known aspects of the molecular mechanisms of dendritic plasticity were shortlisted to interrogate the Gene Ontology, PubMed and GeneGO (GeneGo® bioinformatics software)¹ and extract relevant genes. These definitions included free-text query terms pertinent to biological processes (e.g. plasticity), morphological aspects (e.g. dendritic spines) and known mechanisms (e.g. known protein markers). Several restrictions were applied to the original query definitions to avoid retrieving genes that were either too unspecific or only marginally relevant. The resulting list was then mapped, when possible, to standard biological subject headings (i.e. descriptors) and used to interrogate the selected data sources.

¹ http://www.genego.com

The expected output was a group of genes containing several hundred entries significantly

associated to the query terms (Fig. 3.1).

Figure 3.1: Query terms related to the mechanism under investigation are selected through several criteria and used to identify correlated genes.

Gene duplications caused by alias names were consolidated, yielding a final group of more than 200 relevant genes out of a total of approximately 500. These genes were selected for complete annotation followed by data analysis. The full list of genes is available in Appendix A.
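The consolidation of duplicates caused by alias names can be illustrated with a small Python sketch. The alias table is hypothetical apart from the BDNF extended name mentioned in Chapter 2, and the normalisation rule (lower case, hyphens treated as spaces) is an assumption rather than the procedure actually followed.

```python
# Hypothetical alias table: normalised name -> official gene symbol.
ALIAS_TO_SYMBOL = {
    "bdnf": "BDNF",
    "brain derived neurotrophic factor": "BDNF",
    "gene_x": "GENE_X",
    "gene x alias": "GENE_X",
}


def consolidate(names, alias_map):
    """Map names to official symbols, dropping duplicates in first-seen order."""
    seen = set()
    consolidated, unmapped = [], []
    for name in names:
        # Normalise case, hyphens and repeated whitespace before the lookup.
        key = " ".join(name.lower().replace("-", " ").split())
        symbol = alias_map.get(key)
        if symbol is None:
            unmapped.append(name)  # left for manual inspection
        elif symbol not in seen:
            seen.add(symbol)
            consolidated.append(symbol)
    return consolidated, unmapped


raw = ["BDNF", "Brain-Derived Neurotrophic Factor", "GENE_X",
       "gene X alias", "unknown name"]
symbols, leftovers = consolidate(raw, ALIAS_TO_SYMBOL)
print(symbols, leftovers)
```

Names that cannot be mapped are returned separately instead of being discarded, so that they can be checked by hand.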

[Figure 3.1 diagram. Data sources: PubMed, GO, Networks Warehouse, disease expert knowledge space. Free-text and ontology-based queries: Process (“Plasticity” and related), Structure (Dendrite, Dendritic), Molecular mechanism (Known protein). Filters: Expressed in correct regions; Correlated expression; Quality of query terms. Output: List of relevant genes.]

3.3.2 Annotation and selection of the Dendritic Plasticity gene dataset

A focused dataset of over 220 genes relevant to dendritic plasticity, extracted as described in the previous section (3.3.1), was fully annotated with quality-controlled data so that it could be clustered in various ways based on common features. Information was retrieved from many sources using manual and semi-automated methods, often complemented, when available, by public domain Web tools provided by the resource authors.

In summary, the most significant sources of knowledge selected were the Gene Ontology, the whole literature, the Jackson Laboratory mouse phenotypes (The Jackson Laboratory, Bar Harbor, Maine, US)², the Gene Expression Omnibus (GEO, National Center for Biotechnology Information, Bethesda, US)³ and the Allen Brain Atlas, a public resource of the Allen Institute for Brain Science, Seattle (WA, US)⁴. GO annotations were extracted from the most descriptive levels of the ontology tree in all three branches (i.e. biological process, molecular function, cellular component) using EASE, a standalone version of the Database for Annotation, Visualization and Integrated Discovery (DAVID) tool [49][50].

Genes were ordered along rows and annotated along multiple columns of an Excel worksheet. The information extracted was condensed to fill in key fields and sub-fields in some detail and as consistently as possible. When specific data were not available, the corresponding cell was left empty. A snapshot of the data sheet content, split by main categories and sub-functions, is listed below. The number of genes annotated for each feature is reported in parentheses:

o Gene Ontology
   o Biological process (200); Molecular function (193); Cellular component (189)
o Mechanisms
   o Structural (67); Trafficking (37); Long Term Potentiation (88); Long Term Depression (30); Development (76); Neurogenesis (45)
o Species
   o Human (223); Mouse (222); Rat (222); Drosophila only (15)
o Functional localization
   o Adult (105); Development (107); Neuronal (109); Dendritic Spine (46); Post-synaptic (49); Pre-synaptic (28)
o Brain region
   o Cortex (104); Hippocampus (103); Amygdala (41); Thalamus (102); Hypothalamus (99); Cerebellum (102)
o Cell localization
   o Extracellular (50); Membrane (88); Cytoplasmic (121); Nucleus (48); Mitochondrial (6)

² http://www.informatics.jax.org
³ http://www.ncbi.nlm.nih.gov/geo
⁴ http://www.brain-map.org

The number of genes sharing characteristics known to be associated with dendritic plasticity was further reduced to form a smaller and more focused subset. Expression in appropriate brain areas, developmental stage, functional localization, etc. were used to filter out genes with weaker evidence supporting their link to plasticity. Since no better sources of human data were available, brain expression information was derived from the mouse Brain Atlas.

This further sub-selection was important to improve specificity and to allow a more accurate identification of the diseases associated with the selected genes after analysis with the Disease Ontology Knowledgebase.
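A filter of this kind can be sketched in Python over annotation records modelled on the worksheet categories listed above. The record type, the gene symbols and, in particular, the filtering criteria are assumptions made for illustration; the exact criteria applied in this work are not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class GeneAnnotation:
    """One worksheet row; empty sets correspond to cells left empty."""
    symbol: str
    brain_regions: set = field(default_factory=set)
    localization: set = field(default_factory=set)
    mechanisms: set = field(default_factory=set)


# Placeholder records, not taken from the annotated dataset.
annotations = [
    GeneAnnotation("GENE_A", {"Cortex", "Hippocampus"}, {"Dendritic Spine"},
                   {"Long Term Potentiation"}),
    GeneAnnotation("GENE_B", {"Cerebellum"}, {"Pre-synaptic"}, {"Trafficking"}),
    GeneAnnotation("GENE_C", set(), {"Post-synaptic"}, {"Structural"}),
    GeneAnnotation("GENE_D", {"Hippocampus"}, {"Post-synaptic"}, set()),
]

# Assumed criteria: expressed in a region of interest AND localised at
# the spine or the post-synaptic side.
REGIONS = {"Cortex", "Hippocampus", "Amygdala"}
LOCALIZATIONS = {"Dendritic Spine", "Post-synaptic"}


def keep(annotation):
    """True when the record meets both expression and localisation criteria."""
    return (bool(annotation.brain_regions & REGIONS)
            and bool(annotation.localization & LOCALIZATIONS))


focused = [a.symbol for a in annotations if keep(a)]
print(focused)
```

In this sketch a gene with no expression data (the placeholder GENE_C) is excluded, which is a conservative choice; a different policy for empty cells would change the resulting subset.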

3.3.3 Identification of the correlations gene-disease

The list of genes extracted from public domain sources and filtered to exclude some of the less relevant ones was submitted to the Disease Ontology Knowledgebase. This made it possible to extract gene-disease associations based on the annotated dataset. A significant portion of the selected genes appeared to be implicated in one or more diseases. Neuropsychiatric disorders, the most interesting for the objectives of this work, appeared highly enriched in genes of the dendritic plasticity dataset (Fig. 3.2).

Figure 3.2: The absolute number of genes assigned to DO nodes is reported on the vertical axis, while diseases are distributed along the horizontal axis. Alzheimer’s disease and Schizophrenia were the two most represented diseases in the “Dendritic Plasticity” focused dataset.

By far the most represented were Alzheimer’s disease and Schizophrenia, followed by Epilepsy, Myocardial infarction, Hypertension and Obesity. Since dendritic plasticity is a mechanism known to be extremely relevant at the molecular level for cognition-related tasks, it was not unexpected to find diseases affecting these tasks at the top of the list. On the other hand, disease candidates like myocardial infarction and hypertension were less obvious to interpret and would have required further investigation, which is outside the scope of this project.

[Figure 3.2 chart. Vertical axis: number of genes, 0 to 20. Horizontal axis, diseases: Achondroplasia; Acute myocardial infarction; Adenocarcinoma of Lung; Alcohol abuse; Alexander Disease; Alzheimer's Disease; Asthma; Atherosclerosis; Bipolar Disorder; Blepharospasm; Breast Carcinoma; Carcinoma of Skin; Celiac Disease; Colorectal Cancer; Coronary heart disease; Craniofrontonasal Dysplasia; Dementia; Diabetes Mellitus; Diabetic Nephropathy; Diabetic Retinopathy; Down Syndrome; Eating Disorders; Endometriosis; Epilepsy; Essential Hypertension; Factor VII Deficiency; Factor XII Deficiency; Fragile X Syndrome; Frontotemporal dementia; Gastric ulcer; Germ cell tumor; Glaucoma; Graves' Disease; Hepatitis A; Hepatitis E; Heroin Dependence; Histiocytoma; Huntington Disease; Hypercholesterolemia; Hyperlipidemia; Hyperlipoproteinemia Type III; Hypertension; Kidney Failure; Leprosy; Lesch-Nyhan Syndrome; Li-Fraumeni Syndrome; Lupus erythematosus; Macular degeneration; Multiple Myeloma; Multiple Sclerosis; Myasthenia Gravis; Myelofibrosis; Myocardial Infarction; Myocardial Ischemia; Myotonic Dystrophy; Neuropathy; Obesity; Obsessive-Compulsive Disorder; Osteoporosis; Osteosarcoma; Pancreatic carcinoma; Pancreatitis; Parkinson Disease; Periodontitis; Personality Disorders; Piebaldism; Prostate carcinoma; Retinoblastoma; Rheumatoid Arthritis; Sarcoidosis; Schizophrenia; Squamous cell carcinoma; Systemic lupus erythematosus; Thrombocytopenia; Thyroid carcinoma; Wegener's Granulomatosis; Wiskott-Aldrich syndrome. Labelled bars: Alzheimer’s, Schizophrenia, Myocardial infarction, Hypertension, Epilepsy, Obesity.]

It is necessary to highlight that the total number of annotations per disease in the DO Knowledgebase, extracted as described previously, could be unintentionally biased towards certain diseases. This is not an effect of the annotation process; instead, it reflects the data distribution in public domain databases (e.g. the literature). Some research fields are more investigated than others, and for these the wealth of information available is clearly overwhelming. In some cases the bias is simply due to the availability of data preferentially produced with large-scale analysis techniques (e.g. omic studies).

Figure 3.3: Total number of annotations per single disease available in the knowledgebase is

highlighted in purple; in blue the equivalent number of disease annotations found for the

dendritic plasticity dataset. Log scale distribution of total disease annotations for relevant

diseases in the DO does not clearly correlate with distribution of diseases in the dendritic

plasticity dataset even if some tendency cannot be excluded.

However, as shown on a logarithmic scale in Fig. 3.3, the over-representation of disease annotations in the query set showed only a tendency to correlate with the total number of annotations in the knowledgebase, not a clear relation. Alzheimer's disease and Schizophrenia, for example, are both highly annotated and also the most enriched in the dataset, but Epilepsy and Parkinson’s disease do not follow the same trend. At this point, statistical analysis would be very useful to evaluate the significance of the results. The integration of statistical methods for data validation has, however, been deferred to a later improvement of the knowledgebase system and is not available in this first version of the analysis tools.

[Figure 3.3 chart. Axis: logarithmic scale (1, 10, 100). Series: Dendr. plasticity; Tot. annotations. Disease categories: Myeloid leukemia, Angelman Syndrome, Bipolar Disorder, Celiac Disease, Lupus erythematosus, Osteoporosis, Osteosarcoma, Personality Disorders, Asthma, Coronary heart disease, Parkinson Disease, Rheumatoid Arthritis, Systemic lupus erythematosus, Colorectal Cancer, Multiple Sclerosis, Hypertension, Myocardial Infarction, Epilepsy, Obesity, Schizophrenia, Alzheimer's Disease.]
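As an illustration of the kind of statistical check mentioned above, the following minimal sketch computes a Spearman rank correlation between total annotation counts and dataset annotation counts. It is not part of the knowledgebase tools: the function names and the counts are hypothetical, and the rank correlation is only one possible choice of test.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical counts for five diseases (NOT the thesis data).
total_annotations = [120, 95, 40, 300, 15]
dataset_annotations = [20, 3, 5, 18, 1]
rho = spearman(total_annotations, dataset_annotations)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho close to 1 would indicate that enrichment in the dataset simply follows the overall annotation density, whereas a low value would suggest that the enrichment is not explained by annotation bias alone.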

3.3.4 Data validation using alternative methods

To compare the gene-disease associations retrieved with the DO Knowledgebase and possibly obtain indirect confirmation of the results, two supplementary tools were applied to the dataset: the commercial Ingenuity Pathway Analysis (IPA, Ingenuity® Systems, www.ingenuity.com) and the public domain DAVID.

The Ingenuity database contains more than one million findings privately curated from the public domain literature. The terminology of the Ingenuity Knowledge Base is, however, not standard: it is the result of an internal effort and, as such, is not identical to that developed for the Disease Ontology. Nonetheless, it was straightforward to verify that this approach delivered very comparable results, as summarised in Figure 3.4. IPA made it possible to score the disease terms over-represented in the dendritic plasticity dataset and to calculate a p-value to evaluate statistical significance. Both the “Neurological disease” and the “Psychological disorders” internal categories fell below the fixed significance threshold of 0.05.

Figure 3.4: Genes associated to diseases by IPA tool showed that neurological and

psychological disorders are the most represented diseases in the dendritic plasticity dataset.


The IPA analysis supported the results obtained from the DO Knowledgebase and was considered a valid indication to proceed with additional investigation.

The same analysis was performed with DAVID to take advantage of the large number of public domain pathway databases integrated in the tool. The top scoring pathways enriched with genes of the dendritic plasticity gene set, ranked by significance, are listed in Table 3.5. The relationships of the dataset with Alzheimer’s disease and other neurodegenerative diseases in KEGG pathways were statistically significant. The number of genes assigned to over-represented categories, however, was much lower than that obtained with the DO Knowledgebase.

Table 3.5: Annotation of dendritic plasticity genes was obtained using DAVID.

Neurodegenerative Diseases and Alzheimer’s disease are significantly over-represented in

KEGG pathways.


When disease-specific databases were included in the DAVID analysis, significant associations emerged from GAD (Table 3.6), which is also one of the databases used to annotate the DO (see Chapter 2, 2.2.1). The two highest scoring categories were again Schizophrenia and Alzheimer's disease, in accordance with the previous analysis. However, although the number of genes assigned to Schizophrenia was similar, the number of genes assigned to Alzheimer’s disease was again much lower. This shows that integrating several different data sources in the DO Knowledgebase yields improved results and better annotation.

Table 3.6: Usage of disease specific annotations in DAVID allowed identifying Schizophrenia

and Alzheimer’s disease as the highest scoring entries.


3.4 Methods and Results (Part II)

After an initial introduction, Part II of the ‘Methods and Results’ section is dedicated to pathway and network analysis. Details of the process used to validate and extend the output of the DO Knowledgebase analysis are given in paragraphs 3.4.3 and 3.4.4.

3.4.1 Introduction to pathway and network analyses

Pathway analysis refers to the computational approaches used to investigate networks of genes, proteins or metabolites as a system. In a broad sense, biologists use the term “pathway” to describe a set of molecular events underlying a biological process. Topological analysis of a pathway identifies the global qualitative properties of the system. The process of building pathways once relied completely on the slow accumulation of knowledge about individual molecular events, so information was often spread over thousands of publications. The introduction of new technologies is changing this old approach, and pathway analysis is increasingly used to interpret high-throughput experiments that measure the abundance of biological molecules, generating a high number of data points in a single run.

Several approaches produce this kind of observation: measurement of mRNA using gene expression microarrays [51][52], metabolomics experiments measuring endogenous and drug metabolite concentrations [53], proteomics experiments measuring protein levels [54], studies of protein phosphorylation and protein-protein interactions by protein arrays, mass spectrometry or yeast two-hybrid screens [55][56] and, finally, domain-driven lists of manually generated protein or gene entries. Because of the technical assumptions, characteristics and implied limitations of each platform, interpretation is, however, not straightforward.

Most approaches use known molecular interactions to calculate pathways, but some try to

infer novel interactions directly from the profiling data. Information about activated pathways

can be used to select known drugs for personalized therapy, to select and prioritize potential

drug targets to develop new drugs and to evaluate the efficacy of a drug candidate or to predict

drug side effects and toxicity.


As a result, analysis methods may fall into three broad categories:

1. Pathway analysis: loosely defined as highlighting known or pre-defined

pathways in response to stimulus during an experiment.

2. Network analysis: loosely defined as highlighting networks of interacting

proteins and molecules, where those interactions may include direct molecular

interactions, or interactions defined by other scientific criteria such as

signalling cascades or downstream effects.

3. Multi-dataset pathway analysis: loosely defined as answering the question

"How significant are results across datasets, compared to what could randomly

be considered an overlap across datasets?"

3.4.2 Databases and tools for Pathway/Network analysis

Several databases of molecular interactions have been developed using manual curation. Public manually curated databases include BIND, KEGG, DIP, HPRD, and Reactome. Commercial manually curated databases have been developed by Ingenuity Systems, Jubilant Biosystems, Molecular Connections and GeneGo. These databases provide highly accurate data but, because they rely on manual curation, they suffer from slow and expensive data accumulation.

The pathway analysis framework just described provides a natural environment for expanding the molecular interaction network over uncharacterized proteins and for assigning a confidence score to the interactions.

Network database navigation tools can find a minimal set of regulators responsible for most of the changes in molecular profiles and reveal how the regulation activity could be carried out. For example, a transcriptomics experiment usually finds thousands of differentially expressed genes, but their expression is driven in theory by a limited number of transcription factors. Representing the molecular profile as a set of major regulators reduces the complexity of the observed pattern and simplifies further analysis. The set of major regulators, in combination with their downstream targets exhibiting differential profiles, is interpreted as a list of affected biological processes [57].

Another major application of current pathway analysis tools is biomarker discovery and optimization. The most convenient biomarkers are secreted proteins and metabolites, which expression analysis technologies cannot detect directly. However, they can still be identified as downstream targets of selected sets of differentially expressed genes. Statistically significant variations in expression level are usually identified using microarray technology, which cannot detect changes in secretion or in signalling through protein modification and chemical reactions.

Standard algorithms for network navigation [58][59] enable network expansion by finding the shortest path with the highest score between database entities, common regulators and targets. Network analysis using these algorithms is still laborious and time consuming, but it would be unfeasible without them. Most of the commercial companies mentioned in the previous paragraph provide such tools, with varying degrees of complexity.
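The idea of a highest-scoring path between two database entities can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of references [58][59] or of any commercial tool: each edge is assumed to carry a confidence value in (0, 1], the score of a path is assumed to be the product of its edge confidences, and Dijkstra's algorithm is run on the negative logarithms. The graph and node names are hypothetical.

```python
import heapq
import math


def best_path(graph, source, target):
    """Return (path, score) maximising the product of edge confidences.

    graph maps node -> {neighbour: confidence in (0, 1]}.
    Maximising a product equals minimising the sum of -log(confidence),
    so a standard Dijkstra search applies.
    """
    heap = [(0.0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return path, math.exp(-cost)
        if node in settled:
            continue
        settled.add(node)
        for neighbour, confidence in graph.get(node, {}).items():
            if neighbour not in settled:
                heapq.heappush(
                    heap, (cost - math.log(confidence), neighbour, path + [neighbour])
                )
    return None, 0.0  # target not reachable from source


# Hypothetical interaction network with made-up confidence scores.
demo_graph = {
    "A": {"B": 0.9, "C": 0.5},
    "B": {"D": 0.9},
    "C": {"D": 1.0},
    "D": {},
}
path, score = best_path(demo_graph, "A", "D")
print(path, round(score, 2))
```

Working with logarithms keeps the search additive, so the same routine can be reused with any other non-negative edge cost if a different scoring scheme is preferred.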

3.4.3 Canonical pathways analysis

To better understand the molecular mechanisms correlated with dendritic plasticity, the genes of the dataset were superimposed on canonical pathways from the IPA library. The significance of the association between the genes in the dataset and each canonical pathway identified was measured as follows:

1. The number of genes from the dataset that mapped to the canonical pathway was divided by the total number of genes belonging to the same pathway.

2. Fisher’s exact test was then used to calculate a p-value, that is, the probability that the association between the genes in the dataset and the canonical pathway could be explained by chance alone.
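The two steps above can be sketched as follows. This is a minimal illustration, not IPA's implementation: it assumes a one-sided (right-tailed) Fisher's exact test, which is equivalent to a hypergeometric tail probability, and it assumes that a background gene count is available. All function names and numbers are hypothetical.

```python
from math import comb


def pathway_ratio(overlap, pathway_size):
    """Step 1: fraction of the pathway's genes that are found in the dataset."""
    return overlap / pathway_size


def fisher_right_tail(overlap, dataset_size, pathway_size, background_size):
    """Step 2: probability of seeing at least `overlap` pathway genes in a
    random dataset of `dataset_size` genes drawn from the background."""
    max_overlap = min(dataset_size, pathway_size)
    all_draws = comb(background_size, dataset_size)
    tail = sum(
        comb(pathway_size, k) * comb(background_size - pathway_size, dataset_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / all_draws


# Hypothetical numbers: 12 of 300 dataset genes fall in a 100-gene pathway,
# against an assumed background of 20000 genes.
ratio = pathway_ratio(12, 100)
p_value = fisher_right_tail(12, 300, 100, 20000)
print(f"ratio = {ratio:.2f}, significant at 0.05: {p_value < 0.05}")
```

The ratio describes how much of the pathway is covered, while the p-value accounts for pathway size and dataset size, which is why the two measures are reported together.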


Among the most significantly enriched pathways retrieved, the Axonal Guidance Signaling and Synaptic Long Term Potentiation categories confirmed the consistency of the dataset selected for this study (Fig. 3.7).

Another interesting enriched pathway, the Huntington’s disease (HD) pathway, highlighted an additional link between the dendritic plasticity dataset and neurodegenerative disease. HD causes astrogliosis and loss of medium spiny neurons. Areas of the brain are affected according to their structure and the types of neurons they contain, reducing in size as they cumulatively lose cells. The areas affected are mainly in the striatum, but also include the frontal and temporal cortices.

Although very different in its aetiology, Huntington’s disease shares the same degenerative processes with Alzheimer's disease, as confirmed by the literature [60]. This preliminary investigation suggested that the molecular mechanisms most probably also have aspects in common, at least those related to dendritic plasticity.

Figure 3.7: Canonical pathways are listed from most significant to least and

orange line denotes cut-off for significance (p-value < 0.05). Taller bars

represent categories with greater number of genes compared to shorter bars.


Maps were very useful to visualize the biological context graphically and to evaluate which genes were included in the canonical pathways and how. Two of the maps most relevant to this analysis, the Glutamate Receptor and the Ephrin Receptor signalling pathways, are shown as examples in Figures 3.8 and 3.9. The two maps report the correlation between dendritic plasticity genes and, respectively, psychiatric diseases and mechanisms of plasticity.

The glutamate receptor signalling pathway potently inhibits actin dynamics in spines. Activation of the AMPA or NMDA receptor subtypes blocks protrusive activity from the spine head, causing spines to become more stable and regular in their morphology [61].

Figure 3.8: The glutamate receptor signalling pathway as described in IPA. Genes of the selected dataset that overlap the pathway are highlighted in grey. Cyan coloured edges show connections of genes to psychiatric diseases.

Ephrins are especially interesting because they are known to be strongly related to dendritic plasticity mechanisms. Ephrins are cell surface ligands for Eph receptors, the largest family of tyrosine kinase receptors, and during development they seem to influence cell behaviour during processes such as morphogenesis and organogenesis. In adulthood the Eph/ephrin cell communication system continues to play roles in tissue plasticity, giving shape to dendritic spines during neuronal plasticity [62].

Figure 3.9: Genes in dendritic spine related mechanisms are emphasized in red spots on

the ephrins canonical pathway. Map was generated using IPA.

3.4.4 Gene networks

Although canonical pathways can be visually helpful, they suffer from a somewhat artificial grouping of genes into a limited number of maps and provide no guidance on the absolute statistical significance of the results. A useful complementary approach is to map genes onto a large network of biological relationships derived from gene interaction databases (e.g. databases of gene regulation) or obtained by manual curation of the literature, as for the IPA tool. A network is generally understood as a graphical representation of the molecular relationships between genes or gene products. Genes or gene products are represented as nodes, and the biological relationship between two nodes is represented as an edge (line). In IPA, all edges are supported by at least one reference from the literature, from a textbook, or from canonical information stored in the Ingenuity Pathways Knowledge Base. Human, mouse, and rat orthologs of a gene are stored as separate objects, but are represented as a single node in the network.
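The network representation described above can be illustrated with a small data structure. This is only a sketch of the general idea and says nothing about how IPA stores its data: the class, the ortholog table, the gene names, the edge labels and the references are all hypothetical.

```python
from collections import defaultdict

# Hypothetical ortholog table: (species, identifier) -> shared node name.
ORTHOLOG_TO_NODE = {
    ("human", "GENE1"): "GENE1",
    ("mouse", "Gene1"): "GENE1",
    ("rat", "Gene1"): "GENE1",
    ("human", "GENE2"): "GENE2",
}


class Network:
    """Undirected network whose edges keep their supporting references."""

    def __init__(self):
        # frozenset({node_a, node_b}) -> list of (label, reference)
        self.edges = defaultdict(list)

    def add_relationship(self, source, target, label, reference):
        """Add one finding; orthologs collapse onto the same node."""
        if not reference:
            raise ValueError("every edge needs at least one supporting reference")
        node_a = ORTHOLOG_TO_NODE[source]
        node_b = ORTHOLOG_TO_NODE[target]
        self.edges[frozenset((node_a, node_b))].append((label, reference))

    def nodes(self):
        return {node for pair in self.edges for node in pair}


network = Network()
network.add_relationship(("mouse", "Gene1"), ("human", "GENE2"), "P", "hypothetical ref 1")
network.add_relationship(("rat", "Gene1"), ("human", "GENE2"), "T", "hypothetical ref 2")
print(sorted(network.nodes()), len(network.edges))
```

Findings reported for the mouse and rat orthologs end up on the same edge, so the evidence from different species accumulates on a single node pair.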

Given such a network of biological relationships covering many types of cellular processes, such as signaling, transcriptional regulation and metabolism, and a query set of interesting genes, the objective is to search the network for subnetworks consisting mostly of query genes. The group of genes in such subnetworks and the literature-based relationships among them provide some biological insight into the mechanism of action. The method used relies on a scoring function and an algorithm to find the high-scoring subnetworks. The genes contained in a subnetwork found by the algorithm comprise members of the query set and genes not included in the query set that fill in the ‘gaps’. The number of ‘gaps’ can be modified (i.e. increased or reduced) to identify the boundaries of the statistical significance measure for each subnetwork.
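The role of ‘gap’ genes can be illustrated with a simple search. The sketch below is not the proprietary algorithm or scoring function used here; it only assumes that a gap gene is a non-query gene lying on a path that links two query genes through at most `max_gaps` intermediate non-query genes. The adjacency map and gene names are hypothetical.

```python
from collections import deque


def find_gap_genes(adjacency, query, max_gaps=1):
    """Return non-query genes that connect two query genes.

    adjacency maps gene -> iterable of neighbouring genes.
    A path may pass through at most `max_gaps` non-query genes.
    """
    gap_genes = set()
    for start in query:
        # Each queue entry is a path that begins at a query gene.
        queue = deque([[start]])
        while queue:
            path = queue.popleft()
            for neighbour in adjacency.get(path[-1], ()):
                if neighbour in path:
                    continue  # do not walk in circles
                if neighbour in query:
                    gap_genes.update(path[1:])  # intermediates, possibly none
                elif len(path) - 1 < max_gaps:
                    queue.append(path + [neighbour])
    return gap_genes


# Hypothetical network: query genes A, B, C; connectors X and Y.
demo_adjacency = {
    "A": ["X"],
    "X": ["A", "B", "Y"],
    "B": ["X"],
    "Y": ["X", "C"],
    "C": ["Y"],
}
demo_query = {"A", "B", "C"}
for gaps in (0, 1, 2):
    print(gaps, sorted(find_gap_genes(demo_adjacency, demo_query, gaps)))
```

Raising `max_gaps` pulls more connector genes into the subnetwork, which mirrors the trade-off between network size and specificity discussed in the text. The search enumerates simple paths and is meant for small examples only.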

The genes used to query the network of biological relationships were selected according to the absolute size of their disease classes (paragraph 3.3.3); as emerged from the disease analyses, the top categories corresponded to psychiatric diseases. Therefore, 18 Schizophrenia and 20 Alzheimer’s disease related genes (Appendix B), assigned to their respective categories by the Disease Ontology Knowledgebase, were overlaid independently on the IPA proprietary global network of gene relationships. In detail, the dataset containing the list of gene identifiers was uploaded into the application. Each gene identifier was mapped to its corresponding gene object in the Ingenuity Pathways Knowledge Base. Subnetworks of these focused genes were then algorithmically generated and ranked by connectivity score for each disease separately; the top scoring subnetworks for Schizophrenia and Alzheimer's disease showed that some of the constituent genes were common to both diseases.

When the same group of highly relevant genes is assigned to independent high-scoring networks in distinct analyses, it is reasonable to speculate on a similar role for the constituent genes. This sometimes underlines the importance of specific genes in common molecular functions or regulation processes. Genes spanning different diseases are potentially interesting for elucidating the physiological mechanisms of pathologies, but are also useful for classifying diseases, especially when classification is based on observations of clinical features only, as for several psychiatric disorders.

Figure 3.10a shows the subnetwork of Schizophrenia related genes, while in Figure 3.10b the same subnetwork is used to superimpose the genes found in common with Alzheimer's disease.

Figure 3.10a, 3.10b: Each gene is represented by a node in the graph and each

relationship between genes is represented by an edge. Genes of the dendritic plasticity

dataset related to Schizophrenia disease are displayed in grey boxes highlighted in blue

on the left picture. Four of them identified as also relevant to Alzheimer’s disease are

highlighted on the right hand frame. Direct and indirect relationships are respectively

represented as solid and dashed edges. Data were analyzed through IPA tool.

Nodes are displayed using various shapes to symbolize the functional class of the gene product. Edges carry labels from several classes that describe the nature of the relationship between the nodes (e.g., P for phosphorylation, T for transcription). Negative evidence (gene A does not bind gene B) and group/complex relationships are excluded from the network.


The four genes shared by Schizophrenia and Alzheimer’s disease, namely brain derived neurotrophic factor (BDNF), nerve growth factor 2 (NTF3), apolipoprotein E precursor (APOE) and major prion protein precursor (PRNP), were mapped back onto the literature network as already described to search for high-scoring subnetworks.

The objective of this step was to identify an enlarged group of genes possibly implicated in basic biological processes common to distinct neuropsychiatric diseases (Fig. 3.11).

Figure 3.11: Genes common to Schizophrenia and Alzheimer’s disease, boxed in blue, were mapped on the literature network to identify common top scoring networks.


A topological study of those interactions highlighted interconnectivity among many genes and made it possible to identify several central nodes of the network. This is the case for Mapk, ERK and Akt, which are located in strategic positions known as ‘hubs’ in scale-free networks, where a higher number of relationships (i.e. edges) converge.
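One simple way to spot candidate hubs is to rank nodes by degree, that is, by their number of distinct neighbours. The sketch below assumes an undirected edge list; the function name and the toy edges are hypothetical and do not reproduce the network of Fig. 3.11.

```python
from collections import Counter


def degree_ranking(edges):
    """Rank nodes by number of distinct neighbours (undirected, no self-loops)."""
    unique_edges = {frozenset(edge) for edge in edges}  # drop duplicates and direction
    degree = Counter()
    for edge in unique_edges:
        if len(edge) == 2:  # skip self-loops such as ("A", "A")
            for node in edge:
                degree[node] += 1
    return degree.most_common()


# Hypothetical edge list; ("g2", "HUB") duplicates ("HUB", "g2").
demo_edges = [
    ("HUB", "g1"),
    ("HUB", "g2"),
    ("HUB", "g3"),
    ("g1", "g2"),
    ("g2", "HUB"),
]
ranking = degree_ranking(demo_edges)
print(ranking[0])
```

Degree is only the simplest centrality measure; other measures could be substituted without changing the overall approach.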

An even greater group of elements can be collected by extending the network to more distant genes. The risk, however, is to obtain too many unspecific and irrelevant connections. The balance between enlarged networks and stringency criteria must always be carefully evaluated in order to maintain consistent results. Therefore, to identify sub-network boundaries, the gene set obtained in the last analysis, which is an extension of the small group of genes related to both Schizophrenia and Alzheimer's disease, was compared to the DO Knowledgebase. As expected, many cognition related disease classes in addition to Schizophrenia and Alzheimer's disease, such as Memory Impairment, Personality Trait, Attention Deficit Hyperactivity Disorder, Cognition Performance, Obsessive-Compulsive disorder, and Vascular Dementia, emerged as the most enriched in the set.

When the sub-network was extended to more distant relationships through the addition of one gap among possibly related elements, a kind of dilution effect was observed. The correlation with neuropsychiatric diseases was weakened, while apparently unrelated diseases started to appear. The genes identified after network expansion do not overlap with any other gene either included in the dendritic plasticity dataset or previously associated with Schizophrenia and Alzheimer's disease. This high-scoring subnetwork is therefore particularly interesting as a starting point for analysing the role of constituent genes in fundamental mechanisms of neuropsychiatric disorders.

For the specific purpose of validating the quality and usefulness of the DO Knowledgebase in network analysis this test was successful, as it made it possible to discriminate properly the most focused subnetwork after network expansion.

The complete list of genes identified with the network analysis and displayed in Fig. 10 is available in Appendix B.


3.4.5 Drug/nutraceutical interactors

As described in previous paragraphs, analysis of the dendritic plasticity dataset led to the identification of several Schizophrenia and Alzheimer’s disease related genes, which were then mapped on gene pathways and networks to spot the most interesting ones. Network extension then made it possible to increase the number of genes common to both diseases.

To investigate which of those genes were already known as targets of commercial drugs and for what diseases, they were compared against the quite exhaustive information on gene-drug interactions available in the DO Knowledgebase (Table 3.12). Data were easily retrieved using the gene name as the key field in the database query command.
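A query of this kind can be sketched as follows. The actual schema of the DO Knowledgebase is not given in this chapter, so the table name, the column names and the rows below are hypothetical, and an in-memory SQLite database stands in for the MySQL server so that the example is self-contained.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical gene_drug table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_drug (gene TEXT, drug TEXT, main_indication TEXT, source TEXT)"
    )
    conn.executemany(
        "INSERT INTO gene_drug VALUES (?, ?, ?, ?)",
        [
            ("GENE_A", "drug_1", "indication_y", "source_1"),
            ("GENE_A", "drug_2", "indication_x", "source_2"),
            ("GENE_B", "drug_3", "indication_x", "source_1"),
            ("GENE_C", "drug_4", "indication_z", "source_3"),
        ],
    )
    return conn


def drugs_for_genes(conn, genes):
    """Look up drugs by gene name, sorted by main indication."""
    genes = list(genes)
    if not genes:
        return []
    placeholders = ",".join("?" for _ in genes)  # one bound parameter per gene
    sql = (
        "SELECT gene, drug, main_indication, source FROM gene_drug "
        f"WHERE gene IN ({placeholders}) "
        "ORDER BY main_indication, gene, drug"
    )
    return conn.execute(sql, genes).fetchall()


conn = build_demo_db()
rows = drugs_for_genes(conn, ["GENE_A", "GENE_B"])
for row in rows:
    print(row)
```

Binding the gene names as parameters instead of pasting them into the SQL string avoids quoting problems with unusual gene symbols, and keeping the source column in the result preserves the link to the original database.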

Table 3.12: Genes identified for Schizophrenia and Alzheimer in the DO Knowledgebase

together with those obtained in common sub-network are associated to known interacting

drugs and principal indication. Data are sorted by ‘Main Indication’ column.


3.5 Discussion

The research objective of this case study was to investigate dendritic plasticity mechanisms in the context of diseases. Application of the method described in previous paragraphs suggested a direct correlation between dendritic plasticity and neurological diseases such as Schizophrenia, Alzheimer's disease and epilepsy; these general results were confirmed by the literature. Reductions in markers of axon terminal density [63] [64] and in pyramidal cell somal volume [65] [66] in the prefrontal cortex of subjects with schizophrenia have already been reported and associated with reduced density of pyramidal cell dendritic spines. Similarly, a number of studies indicate that experimentally induced reductions in excitatory afferent input can result in reduced dendritic spine density [67]. However, only recently has reduced dendritic spine density been observed in subjects with Schizophrenia [68]. Likewise, it has been widely confirmed that dendrites and spines undergo dynamic changes both in physiological conditions, such as learning and memory, and in pathological conditions, such as Alzheimer's disease and epilepsy [69].

More specifically, at the gene level, network analysis reported the involvement of several dendritic plasticity specific genes (i.e. BDNF, NTF3, APOE, PRNP) in both Schizophrenia and Alzheimer's disease. This information supported the hypothesis that certain molecular mechanisms are probably shared among different neuropsychiatric disorders. The correlation of specific genes with both diseases has been investigated to a certain extent in other studies. BDNF has already been identified as a risk locus for mood disorders in adults [70], anorexia nervosa [71] and obsessive compulsive disorder and, in combination with NTF3 and NTRK, as a possible candidate gene for attention deficit hyperactivity disorder (ADHD) [72].

BDNF and NTF3 have also been reported as possibly associated to the pathogenesis of

schizophrenia and to some neurodevelopment abnormalities found in the diseased brains [73].

Functional and structural alterations of the hippocampal formation have been described in

major depression but the underlying pathophysiology remains unclear. Interactions between the

5-HT-system and neurotrophic factors like BDNF and glutamate are however known to also

affect morphology of hippocampus [74].


APOE has also already been associated with several neuropsychiatric disorders, and it is the first identified molecular susceptibility locus for sporadic and familial forms of Alzheimer's disease. Also, polymorphisms of PRNP are known to be strongly associated with neurodegenerative disorders and might influence variables such as age at onset, disease progression, cognitive impairment or response to antipsychotics [75].


Chapter 4

Conclusions

The development of a number of high throughput technologies such as transcriptional

profiling, proteomics, genetic association etc. is helpful in the identification of genes important

for a particular research investigation. Making correlations to biological pathways, diseases,

drugs or any other useful information for a subset of those genes is however still challenging.

Integration of different data sources, annotation projects, and high quality public domain databases is still a major need. It is important to understand that biological interpretation of gene lists based on a single source of data can be successful, but is also limited by the underlying knowledge bias and sometimes by poor quality. Efforts involving multiple research groups with mixed competencies ensure wider coverage of the knowledge space and greater control over quality. With very similar advantages, computational biology methods are extremely useful to collect and analyse huge amounts of data, provided that sources of information are carefully selected.

In this PhD work I developed a computational resource based on highly consistent cross-domain information, able to link genes, diseases and drugs in the same framework. The resulting knowledgebase can be used either to extract biological knowledge from gene sets or simply to correlate single genes with the diseases they are known to be involved in and the drugs they are modulated by. One of the major requirements for this computational effort was data quality; therefore sources were carefully selected and data were manually curated.

The backbone of the system is represented by ontologies, which under the Open Biomedical Ontologies umbrella represent a fundamental step towards the standardization of the multiplicity of possible descriptors available to synthesize biological knowledge. One of those ontologies, the Disease Ontology (DO), and its annotation with the proper genes and drugs was the primary target of the first part of my PhD activity. When this activity started there were no other groups in the scientific community working towards the same objective, which can therefore be considered an original and innovative contribution to the field.

I developed software tools to automatically extract gene-disease-drug relations from several databases, many of them manually curated. The preliminary creation of a dictionary of disease and drug synonyms allowed the term-to-term matching necessary to annotate the ontology consistently. The information collected in this process has been stored in a MySQL relational database together with references to the original sources. The DO Knowledgebase contains original information from the GO, ChEBI, DrugBank, GKB, KEGG, OMIM and GAD databases, which can be easily cross-linked by interrogating the system with simple or complex queries. The knowledgebase content is fully available from the command line on UNIX/Linux, but a preliminary, simplified Web interface has also been developed to quickly search for correlations among single genes, diseases and drugs. Searches for entire gene datasets will be implemented in the next version of the tool. A fully functional Web interface will be developed to include user registration, a comment form, and basic or advanced query options to access data for genes, drugs, disorders and their relationships. BioMart¹ is the software currently being investigated to implement the interface. BioMart is a query-oriented data management system developed jointly by the European Bioinformatics Institute (EBI) and Cold Spring Harbor Laboratory (CSHL). It simplifies the creation and maintenance of advanced query interfaces backed by a relational database, and it is particularly suited to 'data mining'-like searches of complex descriptive (e.g. biological) data. It can work with newly created databases as well as with existing data repositories, which are converted to the required format. The annotation process is expected to continue, increasing the number of entries collected in the database. At the same time, the vocabulary will be improved with additional synonyms and integrated with supplementary information such as pathway and biological reaction data.
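As an illustration only, the synonym-based term matching and relational storage described above can be sketched as follows. This is a minimal sketch and not the software developed in this work: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `gene_disease`, the `SYNONYMS` dictionary, the helper functions and the placeholder gene and source names are all assumptions introduced for the example.

```python
import sqlite3

# Hypothetical synonym dictionary: each lower-cased name maps to one canonical term.
SYNONYMS = {
    "alzheimer disease": "Alzheimer's disease",
    "alzheimer's disease": "Alzheimer's disease",
    "ad": "Alzheimer's disease",
    "schizophrenia": "Schizophrenia",
    "sczd": "Schizophrenia",
}


def normalise(term):
    """Return the canonical term for a raw name, or None if it is not in the dictionary."""
    return SYNONYMS.get(term.strip().lower())


def build_db(records):
    """Load (gene, raw disease name, source) records, keeping only matched names."""
    db = sqlite3.connect(":memory:")  # sqlite3 stands in for MySQL in this sketch
    db.execute("CREATE TABLE gene_disease (gene TEXT, disease TEXT, source TEXT)")
    for gene, raw_name, source in records:
        disease = normalise(raw_name)
        if disease is not None:  # unmatched names are skipped in this sketch
            db.execute(
                "INSERT INTO gene_disease VALUES (?, ?, ?)", (gene, disease, source)
            )
    return db


def diseases_for_gene(db, gene):
    """Return the distinct (disease, source) pairs recorded for one gene."""
    cursor = db.execute(
        "SELECT DISTINCT disease, source FROM gene_disease "
        "WHERE gene = ? ORDER BY disease, source",
        (gene,),
    )
    return cursor.fetchall()


# Toy records with placeholder gene and source names (not data from the knowledgebase).
toy_records = [
    ("GENE_A", "Alzheimer disease", "SOURCE_1"),
    ("GENE_A", " AD ", "SOURCE_2"),
    ("GENE_B", "unlisted syndrome", "SOURCE_1"),
]
toy_db = build_db(toy_records)
print(diseases_for_gene(toy_db, "GENE_A"))
```

Keeping the source name next to every relation mirrors the idea of storing each annotation together with a reference to the database it came from, so that conflicting or redundant entries can be traced back during curation.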

¹ http://www.biomart.org/


It will also be necessary to develop a new strategy to keep the resources up to date, since knowledge of genes, drugs and diseases accumulates and changes rapidly. A revised data curation plan is needed to speed up manual curation, which is currently very time-consuming.

This resource is expected to improve further with feedback from, and collaboration with, the experimental groups that benefit from its use. The Disease Ontology Knowledgebase will therefore be made accessible to collaborators and participating members of the scientific community, so that they can test the system, evaluate its functionality and return constructive suggestions. Those interested in working on case studies devoted to system validation will be particularly welcome.

The second part of my PhD activities was spent testing the knowledgebase by investigating an important mechanism related to learning and memory processes, namely dendritic plasticity, which has recently been suggested to be strongly involved in neuropsychiatric disorders. Despite the increasing amount of information that has emerged about plasticity, however, the underlying molecular mechanisms are still poorly understood. To make use of the knowledgebase while building a context for dendritic plasticity, I collected several hundred relevant genes, principally from the GO and the literature. That large group of genes provided a solid foundation on which to build knowledge and offered a good chance of finding actual connections to neuropsychiatric diseases. Indeed, when the DO Knowledgebase was interrogated with the set of around 250 genes, Alzheimer’s disease and Schizophrenia emerged as the best hits. Obesity, Epilepsy and Myocardial infarction, among others, were also found to be possibly correlated with the dataset. However, since statistical confirmation of significance for over-represented groups of genes will be the object of a future improvement of the knowledgebase, I used other, indirect methods to validate the results. Analysis of the same set of ~250 genes related to dendritic plasticity with both public-domain (DAVID) and commercial (IPA) tools confirmed that Neurodegenerative Diseases and Alzheimer’s disease are the most significantly enriched diseases.
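The interrogation of the knowledgebase with a gene set, where diseases are ranked by the number of query genes annotated to them, can be sketched as below. This is a hedged illustration under stated assumptions: the function `rank_diseases`, the in-memory mapping `toy_annotations` and the placeholder gene and disease names are hypothetical and do not reproduce the actual queries or content of the DO Knowledgebase.

```python
from collections import Counter


def rank_diseases(query_genes, gene_to_diseases):
    """Rank diseases by how many distinct query genes are annotated to them."""
    counts = Counter()
    for gene in set(query_genes):  # ignore repeated gene symbols in the query
        for disease in set(gene_to_diseases.get(gene, ())):
            counts[disease] += 1
    # Highest count first; ties are broken alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Placeholder annotations (hypothetical, for illustration only).
toy_annotations = {
    "GENE_A": ["Disease X", "Disease Y"],
    "GENE_B": ["Disease X"],
    "GENE_C": ["Disease Z"],
}
toy_ranking = rank_diseases(["GENE_A", "GENE_B", "GENE_B", "GENE_D"], toy_annotations)
print(toy_ranking)
```

A raw count of this kind favours diseases that are annotated with many genes in general, which is why the ranking alone is not taken here as evidence of significance.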

In the following step, additional pathway and network methods applied to the genes associated with Schizophrenia and Alzheimer’s disease extended this knowledge to further relations with dendritic plasticity mechanisms, which, however, still need to be validated. Several canonical pathways and many hub genes highlighted in this computational research could readily be used to start additional confirmation studies, for instance gene expression experiments in diseased brain tissue.
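One common way to flag hub genes in an interaction network is to rank nodes by their degree, that is, by their number of distinct interaction partners. The sketch below illustrates that idea only; it is an assumption that degree is the criterion of interest, it is not the procedure of the commercial or public tools used in this work, and the function `hub_genes` and the placeholder edges are hypothetical.

```python
from collections import defaultdict


def hub_genes(edges, top=3):
    """Return the `top` genes with the most distinct interaction partners."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-interactions
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Sort by descending degree, then by gene name for a stable order.
    ranked = sorted(neighbours, key=lambda gene: (-len(neighbours[gene]), gene))
    return [(gene, len(neighbours[gene])) for gene in ranked[:top]]


# Placeholder undirected interactions (hypothetical, for illustration only).
toy_edges = [
    ("G1", "G2"),
    ("G1", "G3"),
    ("G1", "G4"),
    ("G2", "G3"),
    ("G2", "G1"),  # duplicate of the first edge, counted once
]
toy_hubs = hub_genes(toy_edges, top=2)
print(toy_hubs)
```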

Full use of the knowledgebase I developed in this PhD work allowed me to extend the results of the case study beyond its original objectives and demonstrated that it is valuable when applied to real data. The successful identification of diseases correlated with plasticity was also made possible by the selection of a critical number of relevant genes. Reaching such a number is usually not a problem when the originating experiment is based on microarray expression or proteomics; however, low fold changes, and the consequent difficulty of identifying regulated genes, are often a problem in experiments with CNS samples.

The knowledgebase is also immediately useful for identifying diseases or drugs linked to single genes. In this case, careful annotation of the DO, carried out as described above, is the basic element needed to pull true relationships out of the database. Similarly, no fundamental improvement is needed simply to extract annotations from the knowledgebase for gene/protein datasets. Nevertheless, the Disease Ontology Knowledgebase still needs to be improved in several respects. The principal one is further annotation of the Disease Ontology, which is necessary to obtain even more consistent and robust results, especially when the number of genes in the investigated dataset is low. Two others have been identified: the implementation of statistical methods to measure the significance of over-representation, and a full interface that allows users to exploit the content of the knowledgebase completely.
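A standard statistical measure for over-representation is the one-sided hypergeometric test, which gives the probability of observing at least k disease-annotated genes in a query set of n genes when K of the N genes in the background carry that annotation. The following is a minimal sketch of that calculation using only the Python standard library; the choice of this particular test, the function name `hypergeom_pvalue` and the numbers in the usage example are assumptions for illustration, not a method or result of this work.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k) for n genes drawn without replacement from N, K of which are annotated."""
    upper = min(n, K)  # the overlap cannot exceed the query size or the annotated set
    total = comb(N, n)  # number of possible query sets of size n
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers: background of 10 genes, 4 annotated, query of 3 genes, 2 of them annotated.
toy_p = hypergeom_pvalue(k=2, n=3, K=4, N=10)
print(toy_p)
```

When many diseases are tested against the same gene set, the resulting p-values would normally also need a correction for multiple testing before being interpreted.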

Further improvements, hopefully suggested by users, are expected if the system adds value to their research activities.


Appendix A

Dendritic Plasticity gene dataset

The table below lists all of the more than 220 genes collected from public-domain sources (e.g. Gene Ontology), with some relevant annotations included.

Gene Name | Official Gene Symbol | Alias Symbols | Gene Ontology (Biological processes) | Gene Ontology (Molecular function) | Gene Ontology (Cellular component)

Genes below were obtained from the Gene Ontology database

angiotensinogen (serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 8) | AGT | ANHU;SERPINA8 | Regulation of long-term neuronal synaptic plasticity | hormone activity;serine-type endopeptidase inhibitor activity | soluble fraction
dishevelled, dsh homolog 1 (Drosophila) | DVL1 | DVL;MGC54245 | Positive regulation of dendrite morphogenesis, Dendrite morphogenesis | protein binding;signal transducer activity | cytoplasmic vesicle
EphB2 | EPHB2 | DRT;EPHT3;ERK;Hek5;Tyro5 | Positive regulation of long-term neuronal synaptic plasticity, Regulation of neuronal synaptic plasticity | ATP binding;axon guidance receptor activity;transmembrane-ephrin receptor activity | integral to plasma membrane
forkhead box G1 | FOXG1 | HFK2, QIN, BF1, HFK1, HFK3, HBF-3 | Neuron morphogenesis during differentiation | |
FYN oncogene related to SRC, FGR, YES | FYN | MGC45350;SLK;SYN | Regulation of neuronal synaptic plasticity | ATP binding;non-membrane spanning protein tyrosine kinase activity;protein serine/threonine kinase activity | actin filament;cellular_component unknown

guanine nucleotide binding protein (G protein), q polypeptide | GNAQ | G-ALPHA-q | Neuron remodeling | GTP binding;heterotrimeric G-protein GTPase activity;signal transducer activity | cytoplasm;heterotrimeric G-protein complex
glutamate receptor, ionotropic, AMPA 1 | GRIA1 | GLUH1;GLUR1;GLURA;HBGR1 | Regulation of long-term neuronal synaptic plasticity | alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionate selective glutamate receptor activity;glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity;protein prenyltransferase activity | integral to membrane;plasma membrane
glutamate receptor, ionotropic, kainate 1 | GRIK1 | EAA3;EEA3;GLR5;GLUR5 | Regulation of short-term neuronal synaptic plasticity, Regulation of long-term neuronal synaptic plasticity | glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity | integral to plasma membrane
glutamate receptor, ionotropic, kainate 2 | GRIK2 | EAA4;GLR6;GLUR6 | Regulation of short-term neuronal synaptic plasticity | glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity | integral to plasma membrane
glutamate receptor, metabotropic 5 | GRM5 | GPRC1E;MGLUR5;MGLUR5A;MGLUR5B;mGlu5 | Positive regulation of long-term neuronal synaptic plasticity | metabotropic glutamate, GABA-B-like receptor activity | integral to plasma membrane
hypoxanthine phosphoribosyltransferase 1 (Lesch-Nyhan syndrome) | HPRT1 | HGPRT;HPRT | Dendrite morphogenesis | hypoxanthine phosphoribosyltransferase activity;magnesium ion binding | chloroplast
v-Ha-ras Harvey rat sarcoma viral oncogene homolog | HRAS | HRAS1;RASH1 | Regulation of long-term neuronal synaptic plasticity | GTP binding;RAS small monomeric GTPase activity | cytoplasm;plasma membrane
basic helix-loop-helix domain containing, class B, 3 | BHLHB3 | DEC2, SHARP-1, SHARP1 | Regulation of neuronal synaptic plasticity | transcription factor activity | nucleus
amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease) | APP | AAA;ABETA;AD1;CVAP | Neuron remodeling | cell adhesion molecule activity;heparin binding;protein binding;serine-type endopeptidase inhibitor activity | Golgi apparatus;coated pit;endoplasmic reticulum;extracellular;integral to plasma membrane
v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog | KRAS | KRAS1 | Regulation of long-term neuronal synaptic plasticity | |

calcium channel, voltage-dependent, P/Q type, alpha 1A subunit | CACNA1A | APCA;CACNL1A4;EA2;FHM;HPCA;MHP;MHP1;SCA6 | Dendrite morphogenesis | DNA binding;calcium ion binding;voltage-gated calcium channel activity | nucleus;voltage-gated calcium channel complex
calcium channel, voltage-dependent, alpha 1F subunit | CACNA1F | CSNB2;CSNBX2 | Dendrite morphogenesis | calcium ion binding;voltage-gated calcium channel activity;voltage-gated sodium channel activity | voltage-gated calcium channel complex;voltage-gated sodium channel complex
dopamine receptor D5 | DRD5 | DBDR;DRD1B;DRD1L2;MGC10601 | Regulation of long-term neuronal synaptic plasticity | dopamine receptor activity | integral to plasma membrane
matrix metallopeptidase 9 | Mmp9 | | positive regulation of synaptic plasticity | gelatinase B activity | extracellular space
glutamate receptor, ionotropic, N-methyl D-aspartate 2B | GRIN2B | NMDAR2B;NR2B;hNR3 | Regulation of neuronal synaptic plasticity | N-methyl-D-aspartate selective glutamate receptor activity;glutamate-gated ion channel activity;magnesium ion binding | integral to plasma membrane;synaptic vesicle
apolipoprotein E | APOE | | Regulation of neuronal synaptic plasticity | antioxidant activity;apolipoprotein E receptor binding;beta-amyloid binding;heparin binding;lipid binding;lipid transporter activity;low-density lipoprotein receptor binding;tau protein binding | cytoplasm;extracellular;membrane
asp (abnormal spindle)-like, microcephaly associated (Drosophila) | ASPM;MCPH5 | FLJ10517;FLJ10549;MCPH5 | Forebrain neuroblast division | calmodulin binding | nucleus
T-cell leukemia, homeobox 2 | TLX2 | Enx;HOX11L1;NCX | Negative regulation of dendrite morphogenesis | molecular_function unknown;transcription factor activity | cellular_component unknown;nucleus
nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 4 | NFATC4 | | regulation of synaptic plasticity | growth factor | cytosol, nucleus
neuroplastin | NPTN | SDR1, GP55, GP65, np65, np55 | Positive regulation of long-term neuronal synaptic plasticity | receptor activity | integral to membrane
metallothionein 3 (growth inhibitory factor (neurotrophic)) | MT3 | GIF;GIFB;GRIF | Negative regulation of dendrite morphogenesis | antioxidant activity;copper ion binding;electron transporter activity;zinc ion binding | synaptic vesicle

early growth response 2 (Krox-20 homolog, Drosophila) | EGR2 | CMT1D;CMT4E;KROX20 | Regulation of neuronal synaptic plasticity | DNA binding | transcription factor complex
synaptophysin | SYP | | Regulation of neuronal synaptic plasticity | calcium ion binding;molecular_function unknown;transporter activity | integral to synaptic vesicle membrane;synapse;synaptosome
acetylcholinesterase (YT blood group) | ACHE | YT | Positive regulation of dendrite morphogenesis | acetylcholine binding;acetylcholinesterase activity;beta-amyloid binding;cholinesterase activity;protein homodimerization activity;serine | basal lamina;membrane;synapse
ring finger protein 39 | RNF39 | HZF;HZFW;HZFW1;HZFw1;LIRF | Regulation of neuronal synaptic plasticity | molecular_function unknown;zinc ion binding | cellular_component unknown;integral to membrane
pro-melanin-concentrating hormone | PMCH | MCH | Regulation of neuronal synaptic plasticity | melanin-concentrating hormone activity;molecular_function unknown;neuropeptide hormone activity | extracellular
ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1) | RAC1 | TC-25;p21-Rac1 | Positive regulation of dendrite morphogenesis | ATP binding;GTP binding;Rho small monomeric GTPase activity | filopodium
ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome) | ATP7A | MK;MNK;OHS | Pyramidal neuron development, Dendrite morphogenesis | ATP binding;copper ion binding;copper-exporting ATPase activity;magnesium ion binding;mercury ion transporter activity | Golgi apparatus;integral to plasma membrane
myosin light chain kinase 2, skeletal muscle | MYLK2 | KMLC;MLCK;skMLCK | Regulation of neuronal synaptic plasticity | ATP binding;calmodulin binding;myosin-light-chain kinase activity;protein-tyrosine kinase activity |
mitogen-activated protein kinase 8 | MAPK8 | JNK;JNK1;JNK1A2;JNK21B1/2;PRKM8;SAPK1 | Positive regulation of dendrite morphogenesis | ATP binding;JUN kinase activity;MAP kinase activity;MAP kinase kinase activity | nucleus
presenilin 1 | Psen1 | | regulation of synaptic plasticity | cadherin binding, peptidase activity | dendrite, axon
presenilin 2 | Psen2 | | regulation of synaptic plasticity | endopeptidase activity | cell soma, Z disc, integral to plasma membrane

Ras protein-specific guanine nucleotide-releasing factor 1 | RASGRF1 | CDC25;CDC25L;GNRP;GRF1;GRF55;H-GRF55 | Regulation of neuronal synaptic plasticity | Ras guanyl-nucleotide exchange factor activity | nucleosome;plasma membrane;synaptosome
| | | Regulation of neuronal synaptic plasticity | |
brain-derived neurotrophic factor | BDNF | MGC34632 | Regulation of short-term neuronal synaptic plasticity, Regulation of long-term neuronal synaptic plasticity | growth factor activity;protein binding | extracellular
S100 calcium binding protein, beta (neural) | S100B | NEF;S100 | Regulation of long-term neuronal synaptic plasticity, Regulation of neuronal synaptic plasticity | S100 alpha binding;S100 beta binding;calcium ion binding;kinase inhibitor activity;protein homodimerization activity;tau protein binding;zinc ion binding | cytoplasm;extracellular
steroidogenic acute regulatory protein | STAR | STARD1 | Regulation of neuronal synaptic plasticity | cholesterol binding;cholesterol transporter activity;lipid binding | mitochondrion
wingless-type MMTV integration site family, member 7B | WNT7B | | Positive regulation of dendrite morphogenesis | extracellular matrix structural constituent;signal transducer activity | extracellular
tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide | YWHAH | YWHA1 | Negative regulation of dendrite morphogenesis | protein domain specific binding;protein kinase C inhibitor activity | cytoplasm
LIM homeobox 8 | LHX8 | Lhx7 | Forebrain neuron differentiation | |
protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 3 | PPFIA3 | KIAA0654;LPNA3 | Regulation of short-term neuronal synaptic plasticity | |
basic helix-loop-helix domain containing, class B, 2 | BHLHB2 | DEC1;Stra14 | Regulation of neuronal synaptic plasticity | transcription factor activity | nucleus
Kruppel-like factor 7 (ubiquitous) | KLF7 | UKLF | Dendrite morphogenesis | transcription coactivator activity;transcription factor activity;zinc ion binding | perinuclear space
calcium/calmodulin-dependent protein kinase (CaM kinase) II gamma | CAMK2G | CAMK;CAMK-II;CAMKG;MGC26678 | Regulation of long-term neuronal synaptic plasticity | ATP binding;calcium-dependent protein serine/threonine phosphatase activity;calcium/calmodulin-dependent protein kinase activity;calmodulin binding;calmodulin-dependent protein kinase I activity;protein-tyrosine kinase activity;signal transducer activity;transporter activity | cellular_component unknown;membrane

doublecortin-like kinase 1 | DCLK1 | KIAA0369, DCLK, DCDC3A | Dendrite morphogenesis | |
FERM, RhoGEF and pleckstrin domain protein 2 | FARP2 | FIR;FRG;KIAA0793 | Neuron remodeling | guanyl-nucleotide exchange factor activity | cytoskeleton
GIPC PDZ domain containing family, member 1 | GIPC1 | | regulation of synaptic plasticity | PDZ domain binding | dendritic spine, synaptic vesicle
citron (rho-interacting, serine/threonine kinase 21) | CIT | CRIK;KIAA0949;STK21 | Negative regulation of dendrite morphogenesis | diacylglycerol binding;small GTPase regulatory/interacting protein activity | actin cytoskeleton
leucine zipper, putative tumor suppressor 1 | LZTS1 | F37;FEZ1 | Regulation of dendrite morphogenesis | transcription factor activity | cytoplasm;nucleus
neurochondrin | | KIAA0607 | Regulation of neuronal synaptic plasticity | |
activity-regulated cytoskeleton-associated protein | ARC | KIAA0278 | Regulation of neuronal synaptic plasticity | actin binding | cytoskeleton
signal-induced proliferation-associated 1 like 1 | | KIAA0440, E6TP1 | Regulation of dendrite morphogenesis | |
Rho family GTPase 1 | RND1 | Rho6, ARHS | Neuron remodeling | |
plexin A3 | PLXNA3 | 6.3;PLEXIN-A3;PLXN4;Plxn3;SEX;XAP-6 | Pyramidal neuron development | transmembrane receptor activity | integral to membrane
netrin 4 | NTN4 | PRO3091 | Neuron remodeling | structural molecule activity | extracellular matrix
chondroitin sulfate proteoglycan BEHAB | | BEHAB;MGC13038 | Regulation of neuronal synaptic plasticity | hyaluronic acid binding;sugar binding |

chondroitin sulfate proteoglycan 4 (melanoma-associated) | CSPG4 | MCSP;MCSPG;MEL-CSPG;MSK16;NG2 | Neuron remodeling | ATP binding;hydrogen-transporting two-sector ATPase activity | integral to plasma membrane
cytoplasmic polyadenylation element binding protein 1 | CPEB1 | CPEB;FLJ13203 | Regulation of neuronal synaptic plasticity | nucleic acid binding | viral nucleocapsid
drebrin 1 | DBN1 | D0S117E;DKFZp434D064 | Regulation of neuronal synaptic plasticity | actin binding;profilin binding | actomyosin;dendrite
doublecortex; lissencephaly, X-linked (doublecortin) | DCX | DBCN;DC;LISX;SCLH;XLIS | Dendrite morphogenesis | microtubule binding | microtubule associated complex
discs, large homolog 4 (Drosophila) | DLG4 | PSD95;SAP90 | Regulation of long-term neuronal synaptic plasticity | guanylate kinase activity;membrane-associated guanylate kinase;protein C-terminus binding | intercellular junction
candidate plasticity gene 1 | | | Regulation of neuronal synaptic plasticity | |
discs, large homolog 4 (Drosophila) | | | Regulation of long-term neuronal synaptic plasticity | |

Genes below were obtained from the literature (manually curated)

synovial sarcoma translocation gene on chromosome 18-like 1 | SS18L1 | CREST; SS18L1; LP2261; KIAA0693; MGC26711; MGC78386 | | |
cAMP responsive element binding protein 1 | CREB1 | | | |
nerve growth factor (beta polypeptide) | NGF | NGFB; HSAN5; Beta-NGF | neuronal growth, survival, differentiation | neurotrophin | secreted
glutamate receptor interacting protein 1 | GRIP1 | GRIP | intracellular signaling cascade | protein binding;receptor signaling complex scaffold activity | cellular_component unknown;ribosome
sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3A | SEMA3A | Hsema-I;SEMA1;SEMAD;SEMAIII;SEMAL;SemD;coll-1;sema III | neurogenesis | receptor activity | extracellular
Notch homolog 1, translocation-associated (Drosophila) | NOTCH1 | notch-1;TAN1, hN1 | | |

neurotrophic tyrosine kinase, receptor, type 2 | NTRK2 | neurotrophic tyrosine kinase, receptor, type 2;SK378;TrkB.FL;NTRK2 | transmembrane receptor protein tyrosine kinase signaling pathway;protein amino acid phosphorylation;neurogenesis | neurotrophin TRKB receptor activity;kinase activity;transferase activity;receptor activity;neurotrophin binding;transmembrane receptor protein tyrosine kinase activity;ATP binding | integral to plasma membrane;membrane;integral to membrane
v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | RP11-83J21.1, ABL, JTK7, bcr/abl, c-ABL, p150, v-abl, abl1 | Synaptic plasticity; dendrite arborisation | protein kinase | cytoplasm, dendritic spine
cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila) | CELSR2 | CDHF10;EGFL2;Flamingo1;KIAA0279;MEGF3 | development;homophilic cell adhesion;neuropeptide signaling pathway | G-protein coupled receptor activity;calcium ion binding;structural molecule activity | integral to membrane
rhodopsin (opsin 2, rod pigment) (retinitis pigmentosa 4, autosomal dominant) | RHO | OPN2;RP4 | G-protein coupled receptor protein signaling pathway;phototransduction, visible light;rhodopsin mediated signaling | G-protein coupled photoreceptor activity | integral to plasma membrane
cell division cycle 42 (GTP binding protein, 25kDa) | CDC42 | CDC42Hs;G25K | actin filament organization;small GTPase mediated signal transduction | GTP binding;Rho small monomeric GTPase activity | filopodium
T-cell lymphoma invasion and metastasis 1 | TIAM1 | | intracellular signaling cascade | Rho guanyl-nucleotide exchange factor activity;protein binding;receptor signaling protein activity | membrane
cyclin-dependent kinase 5 | CDK5 | PSSALRE | axonogenesis;cell cycle;cytokinesis;protein amino acid phosphorylation | ATP binding;cyclin-dependent protein kinase activity;protein-tyrosine kinase activity | cytoplasm
adenosine A2a receptor | ADORA2A | ADORA2;RDC8;hA2aR | adenylate cyclase activation;apoptosis;blood coagulation;cAMP biosynthesis;cell-cell signaling;cellular defense response;central nervous system development;circulation;inflammatory response;phagocytosis;sensory perception | A2A adenosine receptor activity, G-protein coupled | integral to plasma membrane;membrane fraction

clusterin (complement lysis inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate message 2, apolipoprotein J) | CLU | APOJ;CLI;SGP-2;SGP2;SP-40;TRPM-2;TRPM2 | apoptosis;complement activation, classical pathway;fertilization (sensu Animalia);lipid metabolism | binding | extracellular
dystrophin (muscular dystrophy, Duchenne and Becker types) | DMD | BMD;DXS142;DXS164;DXS206;DXS230;DXS239;DXS268;DXS269;DXS270;DXS272 | biological_process unknown;muscle contraction;muscle development | actin binding;calcium ion binding;molecular_function unknown;structural constituent of cytoskeleton;zinc ion binding | cellular_component unknown;cytoskeleton;membrane
epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian) | EGFR | ERBB;ERBB1 | EGF receptor signaling pathway;cell proliferation;electron transport;protein amino acid phosphorylation | ATP binding;electron transporter activity;epidermal growth factor receptor activity | cytoskeleton;endosome;integral to plasma membrane
growth arrest-specific 7 | GAS7 | KIAA0394 | cell cycle arrest;neurogenesis | transcription factor activity | mitochondrion
gap junction protein, alpha 1, 43kDa (connexin 43) | GJA1 | CX43;DFNB38;ODD;ODDD;ODOD;SDTY3 | cell-cell signaling;hearing;heart development;muscle contraction;regulation of heart rate;transport | connexon channel activity;ion transporter activity | connexon complex;integral to plasma membrane
leukemia inhibitory factor receptor | LIFR | | cell surface receptor linked signal transduction | leukemia inhibitory factor receptor activity | integral to plasma membrane
neurofilament, light polypeptide 68kDa | NEFL | CMT1F;CMT2E;NF68;NFL | cytoskeleton organization and biogenesis | structural constituent of cytoskeleton | neurofilament
Microtubule-associated protein 1S | MAP1S/C19ORF5 | MAP1S, C19ORF5 | | |
thrombospondin 4 | THBS4 | TSP4 | cell adhesion;substrate-bound cell migration, cell extension | calcium ion binding;cell adhesion molecule activity;heparin binding;structural molecule activity | extracellular matrix;extracellular space
adenomatosis polyposis coli | APC | DP2;DP2.5;DP3;FAP;FPC;GS | Wnt receptor signaling pathway;cell adhesion;negative regulation of cell cycle;protein complex assembly | beta-catenin binding | cytoplasm

cyclin-dependent kinase 5, regulatory subunit 1 (p35) | CDK5R1 | CDK5P35;CDK5R;MGC33831;NCK5A;p23;p25;p35;p35nck5a | brain development;regulation of CDK activity;regulation of neuron differentiation | cyclin-dependent protein kinase 5 activator activity;protein kinase activity | cyclin-dependent protein kinase 5 activator complex
cell adhesion molecule with homology to L1CAM (close homolog of L1) | CHL1 | CALL;L1CAM2 | cell adhesion;signal transduction | cell adhesion molecule activity | integral to membrane
contactin 4 | CNTN4 | AXCAM;BIG-2;CNTN4A;MGC33615 | cell adhesion;signal transduction | cell adhesion molecule activity | integral to membrane
dihydropyrimidinase-like 3 | DPYSL3 | CRMP-4;CRMP4;DRP-3;DRP3;ULIP | neurogenesis;nucleobase, nucleoside, nucleotide and nucleic acid metabolism;signal transduction | dihydropyrimidinase activity | membrane
fibroblast growth factor receptor 1 (fms-related tyrosine kinase 2, Pfeiffer syndrome) | FGFR1 | BFGFR;C-FGR;CEK;FLG;FLJ14326;FLT2;H2;H3;H4;H5;KAL2;N-SAM | FGF receptor signaling pathway;MAPKKK cascade;cell growth;oncogenesis;protein amino acid phosphorylation;skeletal development | ATP binding;fibroblast growth factor receptor activity;heparin binding | integral to plasma membrane;membrane fraction
galanin receptor 2 | GALR2 | GALNR2 | G-protein signaling, coupled to cAMP nucleotide second messenger;cytosolic calcium ion concentration elevation;development;digestion;feeding behavior;learning and/or memory;muscle contraction;synaptic transmission | galanin receptor activity | integral to membrane;plasma membrane
glial cell derived neurotrophic factor | GDNF | | | |
G protein-regulator of neurite outgrowth 1 | GPRIN1 | KIAA1983 | | |
glutamate receptor, metabotropic 4 | GRM4 | GPRC1D;MGLUR4;mGlu4 | negative regulation of adenylate cyclase activity;synaptic transmission | metabotropic glutamate, GABA-B-like receptor activity | integral to plasma membrane

insulin-like growth factor 1 receptor | IGF1R | JTK13 | anti-apoptosis;insulin receptor signaling pathway;positive regulation of cell proliferation;protein amino acid phosphorylation;regulation of cell cycle | ATP binding;epidermal growth factor receptor activity;insulin-like growth factor receptor activity;protein binding | integral to membrane
laminin, beta 1 | LAMB1 | CLM | | |
myosin, heavy polypeptide 10, non-muscle | MYH10 | NMMHCB | cytokinesis | ATP binding;actin binding;calmodulin binding;motor activity | myosin complex
neurturin | NRTN | | | |
phosphatase and tensin homolog (mutated in multiple advanced cancers 1) | PTEN | BZS;MGC11227;MHAM;MMAC1;PTEN1;TEP1 | development;negative regulation of cell cycle;protein amino acid dephosphorylation;regulation of CDK activity | phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase activity;protein-tyrosine-phosphatase activity | cytoplasm
protein tyrosine phosphatase, receptor type, K | PTPRK | R-PTP-kappa | protein amino acid dephosphorylation;transmembrane receptor protein tyrosine phosphatase signaling pathway | transmembrane receptor protein tyrosine phosphatase activity | integral to plasma membrane
runt-related transcription factor 3 | RUNX3 | AML2;CBFA3;PEBP2aC | cell proliferation;regulation of transcription, DNA-dependent | ATP binding;DNA binding | nucleus;ribosome
septin 2 | SEPT2 | DIFF6, KIAA0158, NEDD5, Pnutl3, hNedd5 | | |
abl interactor 2 | ABI2 | ABI-2, ABI2B, AIP-1, AblBP3, SSH3BP2, argBPIA, argBPIB | | |
ADP-ribosylation factor 6 | ARF6 | | intracellular protein transport;nonselective vesicle transport;small GTPase mediated signal transduction | ARF small monomeric GTPase activity;GTP binding;enzyme activator activity;protein transporter activity | Golgi apparatus;membrane fraction;plasma membrane
Bardet-Biedl syndrome 1 | BBS1 | BBS2L2;FLJ23590 | | |

Bardet-Biedl syndrome 4

BBS4 vision

choline acetyltransferase

CHAT FIMG2 neurotransmitter biosynthesis

choline O-acetyltransferase activity

cytoplasm;nucleus

FEZ family zinc finger 2

FEZF2

FEZ, FEZL, FKSG36, FLJ10142, TOF, ZFP312, ZNF312

regulation of forebrain development

transcription factor activity

nucleus

ghrelin precursor GHRL MTLRP

G-protein coupled receptor protein signaling pathway;cell-cell signaling

growth hormone receptor binding;growth hormone-releasing hormone activity

extracellular space;soluble fraction

glutamate receptor, ionotropic, N-methyl-D-aspartate 3A

GRIN3A NMDAR-L;NR3A

ion transport

glutamate-gated ion channel activity;inotropic glutamate receptor activity

membrane

immunoglobulin superfamily, member 9, dasm1

IGSF9 KIAA1355;Nrt1

flight behavior transmembrane receptor activity

integral to plasma membrane

leukocyte specific transcript 1

LST1 B144;D6S49E;LST-1

cellular defense response

defense/immunity protein activity

integral to plasma membrane

MCF.2 cell line derived transforming sequence

MCF2 DBL

cell growth and/or maintenance;intracellular signaling cascade

guanyl-nucleotide exchange factor activity

cytoskeleton;cytosol;membrane fraction

methyl CpG binding protein 2 (Rett syndrome)

MECP2 MRX16;MRX79;PPMX;RTS;RTT

negative regulation of transcription from Pol II promoter

methyl-CpG binding;transcription corepressor activity

chromatin;nucleus

Microtubule associated protein 1B

MAP1B

DKFZp686E1099, DKFZp686F1345, FLJ38954, FUTSCH, MAP5

microtubule cytoskeleton

Microtubule associated protein 2

MAP2

DKFZp686I2148, MAP2A, MAP2B, MAP2C

microtubule cytoskeleton

myosin VI MYO6 DFNA22;DFNB37;KIAA0389

cytoskeleton organization and biogenesis;hearing;striated muscle contraction

ATP binding;actin binding;calmodulin binding;motor activity;myosin ATPase activity;structural constituent of muscle

unconventional myosin

neuropilin 1 NRP1 NRP;VEGF165R

angiogenesis;axon guidance;cell adhesion;cell-cell signaling;positive regulation of cell proliferation;signal transduction

cell adhesion molecule activity;protein binding;vascular endothelial growth factor receptor activity

integral to membrane;membrane fraction

p21/Cdc42/Rac1-activated kinase 1 (STE20 homolog, yeast)

PAK1 PAKalpha

JNK cascade;apoptosis;protein amino acid phosphorylation

ATP binding;protein serine/threonine kinase activity

focal adhesion

protein phosphatase 1, regulatory subunit 9B, spinophilin

PPP1R9B PPP1R9;SPINO

RNA splicing;cell cycle arrest;colony morphology;interpretation of external signals that regulate cell growth;negative regulation of cell growth;regulation of cell proliferation;regulation of exit from mitosis;transport

protein phosphatase 1 binding;protein phosphatase inhibitor activity;transporter activity

cytoplasm;membrane;nucleoplasm;protein phosphatase type 1 complex

protein kinase, cGMP-dependent, type I

PRKG1

CGKI;PGK;PRKG1B;PRKGR1B;cGKI-BETA;cGKI-alpha

actin cytoskeleton organization and biogenesis;actin cytoskeleton reorganization;protein amino acid phosphorylation;regulation of smooth muscle contraction;signal transduction

3',5'-cGMP binding;ATP binding;cAMP-dependent protein kinase regulator activity;cGMP-dependent protein kinase activity;protein-tyrosine kinase activity

cAMP-dependent protein kinase complex

pleckstrin homology, Sec7 and coiled-coil domains 2 (cytohesin-2)

PSCD2 ARNO;CTS18.1;Sec7p-L

actin cytoskeleton organization and biogenesis;endocytosis;signal transduction

ARF guanyl-nucleotide exchange factor activity

membrane fraction;plasma membrane

scavenger receptor class F, member 1

SCARF1 KIAA0149;MGC47738;SREC

cell adhesion;low-density lipoprotein catabolism;receptor mediated endocytosis

cell adhesion molecule activity;low-density lipoprotein binding;scavenger receptor activity;structural molecule activity

integral to membrane

synaptic Ras GTPase activating protein 1 homolog (rat)

SYNGAP1

DKFZp761G1421;KIAA1938;RASA1;RASA5;SYNGAP

GTPase activator activity

trafficking protein particle complex 4

TRAPPC4

ER to Golgi transport;dendrite morphogenesis;neurotransmitter receptor biosynthesis;vesicle-mediated transport

protein binding

Golgi cis-face;Golgi stack;dendrite;endoplasmic reticulum;synapse;synaptic junction;synaptic vesicle

kalirin, RhoGEF kinase

Kalrn

signal transduction;vesicle-mediated transport

guanyl-nucleotide exchange factor activity

determination of neuronal dendritic morphology

molecular switch nuclear transcription factor

tRNA cysteine TRC

Down syndrome cell adhesion molecule

DSCAM CHD2-42;CHD2-52

cell adhesion;neurogenesis

cell adhesion molecule activity

integral to plasma membrane;membrane fraction

Spineless (Drosophila); aryl hydrocarbon receptor (vertebrates)

determination of neuronal dendritic morphology

regulation of gene transcription

nuclear transcription factor

cadherin 2, type 1, N-cadherin (neuronal)

CDH2

CDHN; NCAD; CD325; CDw325

Wilson-Turner X-linked mental retardation syndrome

WTS MRXS6

Salvador SAV1

Neuritin NRN1 MGC44811, NRN, dJ380B8.2

neuritogenesis cell adhesion GPI-anchored membrane receptor

myosin Va Myo5a synapse organization and biogenesis

protein binding axon, neuron projection

staufen (RNA binding protein) homolog 1 (Drosophila)

Stau1 intracellular mRNA localization

double-stranded RNA binding

neuron projection

Rho GTPase-activating protein

RICS

The genes below were obtained from The Jackson Laboratory phenotype data on LTP/LTD.

amiloride-sensitive cation channel 2, neuronal

ACCN2_MOUSE

ACCN2_MOUSE;ASIC1_MOUSE;ASIC;Accn2;amiloride-sensitive cation channel 2, neuronal

transport;ion transport;cation transport;sodium ion transport;calcium ion transport;monovalent inorganic cation transport;associative learning;response to acid;memory

calcium ion binding;sodium channel activity;sodium ion binding;amiloride-sensitive sodium channel activity;ion channel activity;monovalent inorganic cation transmembrane transporter activity;cation channel activity

dendritic shaft;dendritic spine;synaptosome;integral to membrane;membrane;integral to plasma membrane;synapse

adenylate cyclase 8 ADCY8_MOUSE

ADCY8_MOUSE;AC8;ADCY8;AW060868;Adcy8

intracellular signaling cascade;cyclic nucleotide biosynthetic process;cAMP biosynthetic process;adenylate cyclase activation

phosphorus-oxygen lyase activity;metal ion binding;adenylate cyclase activity;lyase activity;magnesium ion binding

integral to membrane;membrane;plasma membrane

adenylate cyclase activating polypeptide 1 receptor 1

ADCYAP1R1_MOUSE

PITUITARY ADENYLATE CYCLASE ACTIVATING POLYPEPTIDE TYPE I RECEPTOR PRECURSOR;PACAP TYPE I RECEPTOR;ADCYAP1R1_MOUSE;PACAP-R-1;PACAPR_MOUSE

signal transduction;spermatogenesis;multicellular organismal development;G-protein coupled receptor protein signaling pathway;cell differentiation

receptor activity;signal transducer activity;G-protein coupled receptor activity;vasoactive intestinal polypeptide receptor activity

integral to membrane;membrane;extracellular space

adducin 2 (beta) ADD2_MOUSE

ADD2_MOUSE;Add2;ADD2

hemopoiesis

structural molecule activity;metal ion binding;calmodulin binding

cytoskeleton;cytoplasm;membrane

AF4/FMR2 family, member 2

AFF2_MOUSE

FMR2;Fmr2;OX19;Ox19;Oxh;FMR2_MOUSE

learning and/or memory

adaptor-related protein complex 3, mu 2 subunit

AP3M2_MOUSE

Ap3m2;5830445E16Rik;AP3M2_MOUSE

protein complex assembly;transport;vesicle-mediated transport;protein transport;intracellular protein transport

protein transporter activity;protein binding

cytoplasmic vesicle;clathrin adaptor complex;clathrin vesicle coat;membrane coat;Golgi apparatus;membrane

calbindin 2 CALB2_MOUSE

calretinin;CALB2;CR;Calb2;CALB2_MOUSE

calcium ion binding gap junction

calcium/calmodulin-dependent protein kinase kinase 2, beta

CAMKK2_MOUSE

CAMKK2_MOUSE;6330570N16RIK_MOUSE;6330570N16Rik

protein amino acid phosphorylation

calmodulin binding;calmodulin-dependent protein kinase activity;ATP binding;protein serine/threonine kinase activity;transferase activity;nucleotide binding;protein kinase activity;kinase activity

cytoplasm

cerebellin 1 precursor protein

CBLN1_MOUSE

CBLN1_MOUSE;Cbln1;CBLN1;AI323299

membrane;extracellular space;synapse;cell junction;extracellular region

CD247 antigen CD247_MOUSE

CD3Z_MOUSE;CD247;CD3H;Tcrz;TCRk;Cd3z;CD3Z;Cd3;T3z;Cd3h;TCRZ

cell surface receptor linked signal transduction

transmembrane receptor activity;protein binding;receptor activity

plasma membrane;membrane;integral to membrane;T cell receptor complex;alpha-beta T cell receptor complex;cytoplasm

carbohydrate sulfotransferase 10

CHST10_MOUSE

AI507003;AU041319;Chst10;ST;CHST10_MOUSE;Hnk-1st-pending;Hnk-1st

long-term memory;carbohydrate metabolic process;learning

sulfotransferase activity;transferase activity

integral to membrane;Golgi apparatus;membrane;cellular_component

cannabinoid receptor 1 (brain)

CNR1_MOUSE

CNR1_MOUSE;CB1;CB1-R;Cannabinoid receptor 1

signal transduction;G-protein coupled receptor protein signaling pathway

cannabinoid receptor activity;receptor activity;signal transducer activity;rhodopsin-like receptor activity;G-protein coupled receptor activity

integral to membrane;membrane

collapsin response mediator protein 1

CRMP1_MOUSE

CRMP1_MOUSE;DRP-1;Collapsin response mediator protein 1;Dihydropyrimidinase related protein-1_mouse;CRMP-1;CRMP1;DPYSL1

hydrolase activity cell soma;dendrite;cytoplasm

chondroitin sulfate proteoglycan 5

CSPG5_MOUSE

NGC;Cspg5;CSPG5_MOUSE

cell differentiation;nervous system development;multicellular organismal development;regulation of cell growth;regulation of synaptic transmission

extracellular space;endoplasmic reticulum;Golgi apparatus;membrane;integral to membrane

catenin (cadherin associated protein), delta 2

CTNND2_MOUSE

Ctnnd2;CATND2_MOUSE;Catnd2;Nprap

learning;regulation of synaptic plasticity;transcription;morphogenesis of a branching structure;regulation of transcription, DNA-dependent;cell adhesion;multicellular organismal development

protein binding;binding;structural molecule activity

nucleus;cytoplasm;cell junction;cytoskeleton

dystroglycan 1 DAG1_MOUSE

DAG-1;D9Wsu13e;DAG1;DG;Dag1;DAG1_MOUSE

morphogenesis of an epithelial sheet

calcium ion binding;protein binding

insoluble fraction;lipid raft;sarcolemma;integral to membrane;extracellular region;cytoplasm;cytoskeleton;plasma membrane;dystroglycan complex;membrane

diacylglycerol kinase, epsilon

DGKE_MOUSE

DGKE_MOUSE;Dgke;DGK;DAGK6

intracellular signaling cascade;protein kinase C activation

diacylglycerol kinase activity;zinc ion binding;kinase activity;transferase activity;diacylglycerol binding;metal ion binding

integral to membrane;membrane

double C2, alpha DOC2A_MOUSE

DOC2A_MOUSE;Doc2a

transport;exocytosis

transporter activity;calcium-dependent phospholipid binding;calcium ion binding

synaptosome;cell junction;cytoplasmic vesicle;synapse;membrane;synaptic vesicle

eukaryotic translation initiation factor 2 alpha kinase 4

EIF2AK4_MOUSE

EIF2AK4_MOUSE

post-translational protein modification;regulation of translation initiation in response to stress;unfolded protein response;cellular response to starvation;translation;tRNA aminoacylation for protein translation;negative regulation of translation;regulation of protein metabolic process;protein amino acid phosphorylation

nucleotide binding;protein serine/threonine kinase activity;translation initiation factor activity;small conjugating protein ligase activity;transferase activity;kinase activity;ATP binding;aminoacyl-tRNA ligase activity;eukaryotic translation initiation factor 2alpha kinase activity;protein kinase activity

cytoplasm

eukaryotic translation initiation factor 4E binding protein 2

EIF4EBP2_MOUSE

EIF4EBP2_MOUSE;PHAS-II;Eif4ebp2;4E-BP2;2810011I19Rik

insulin receptor signaling pathway;regulation of translational initiation;negative regulation of translation;cAMP-mediated signaling;negative regulation of translational initiation;regulation of translation

eukaryotic initiation factor 4E binding;translation initiation factor activity;protein binding

cellular_component

Eph receptor A4 EPHA4_MOUSE

EPHA4_MOUSE

transmembrane receptor protein tyrosine kinase signaling pathway;protein amino acid phosphorylation;axon guidance;adult walking behavior

ATP binding;transferase activity;ephrin receptor activity;protein kinase activity;protein-tyrosine kinase activity;transmembrane receptor protein tyrosine kinase activity;nucleotide binding;receptor activity;kinase activity;protein binding

membrane;integral to membrane

fragile X mental retardation syndrome 1 homolog

FMR1_MOUSE

FMRP;FMR-1;FMR1;Fmr-1;Fmr1;FMR1_MOUSE

central nervous system development;transport;mRNA transport

RNA binding;protein binding

cytoplasm;nucleus

gamma-aminobutyric acid (GABA-B) receptor, 1

GABBR1_MOUSE

GABA-B-R;GABBR1_MOUSE;gamma-aminobutyric acid B receptor, 1A;GABAB1A_MOUSE

signal transduction;G-protein coupled receptor protein signaling pathway

metabotropic glutamate, GABA-B-like receptor activity;signal transducer activity;protein binding;G-protein coupled receptor activity;GABA-B receptor activity;receptor activity

postsynaptic membrane;cell junction;synapse;integral to membrane;cytoplasm;membrane

glial fibrillary acidic protein

GFAP_MOUSE

GFAP_MOUSE;Gfap

intermediate filament-based process

protein binding;structural molecule activity

intermediate filament;membrane fraction;cytoplasm

guanine nucleotide binding protein (G protein), alpha inhibiting 1

GNAI1_MOUSE

GNAI1_MOUSE;Gnai1;Gnai-1;Gialpha1

G-protein coupled receptor protein signaling pathway

GTPase activity intracellular

glutamate receptor, metabotropic 1

GRM1_MOUSE

MGLUR1_MOUSE;mGluR1;mGluR1alpha;GRM1_MOUSE;Gprc1a;Glutamate receptor, metabotropic 1

regulation of sensory perception of pain;regulation of MAPKKK cascade;locomotory behavior;G-protein coupled receptor protein signaling pathway;signal transduction;activation of MAPK activity;activation of MAPKK activity

metabotropic glutamate, GABA-B-like receptor activity;protein binding;G-protein coupled receptor activity;receptor activity;signal transducer activity;PLC activating metabotropic glutamate receptor activity

microsome;postsynaptic density;membrane;integral to membrane;dendrite;nucleus;cell soma;postsynaptic membrane

intercellular adhesion molecule 5, telencephalin

ICAM5_MOUSE

Tlcn;ICAM5;Icam5;TLCN;TLN;ICAM5_MOUSE

cell-cell adhesion;cell adhesion

protein binding

membrane;plasma membrane;integral to membrane

inositol 1,4,5-trisphosphate 3-kinase A

ITPKA_MOUSE

ITPKA_MOUSE;Itpka;MGC28924

inositol metabolic process

ATP binding;kinase activity;nucleotide binding;transferase activity;inositol or phosphatidylinositol kinase activity;calmodulin binding;inositol trisphosphate 3-kinase activity

cellular_component

potassium voltage-gated channel, shaker-related subfamily, beta member 1

KCNAB1_MOUSE

Akr8a8;Kcnab1;potassium voltage-gated channel, shaker-related subfamily, beta member 1;KCNAB1_MOUSE;Kv beta1_MOUSE

ion transport;potassium ion transport;transport

voltage-gated ion channel activity;voltage-gated potassium channel activity;potassium channel activity;potassium ion binding;oxidoreductase activity;ion channel activity

integral to membrane;integral to plasma membrane;cytoplasm

Kv channel interacting protein 3, calsenilin

KCNIP3_MOUSE

KCNIP3_MOUSE;Csen;DREAM;calsenilin, presenilin binding protein, EF hand transcription factor;KCHIP3_MOUSE

negative regulation of transcription from RNA polymerase II promoter;potassium ion transport;apoptosis;behavior;negative regulation of transcription;sensory perception of pain;regulation of neuron apoptosis;response to pain;transcription;regulation of transcription, DNA-dependent;ion transport;transport

protein C-terminus binding;voltage-gated ion channel activity;specific transcriptional repressor activity;potassium ion binding;calcium-dependent protein binding;ion channel activity;DNA binding;protein binding;calcium ion binding;potassium channel activity;transcription repressor activity

cytoplasm;nucleus;membrane;cytosol;Golgi apparatus;endoplasmic reticulum

potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2

KCNN2_MOUSE

KCNN2_MOUSE;Kcnn2;SK2;SK-2_MOUSE;potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2

potassium ion transport;biological_process;transport;ion transport

small conductance calcium-activated potassium channel activity;ion channel activity;calmodulin binding;calcium-activated potassium channel activity

integral to membrane;membrane

LIM-domain containing, protein kinase

LIMK1_MOUSE

LIMK1_MOUSE

protein amino acid phosphorylation;positive regulation of axon extension

protein heterodimerization activity;metal ion binding;transferase activity;kinase activity;nucleotide binding;zinc ion binding;protein kinase activity;ATP binding;protein binding;protein-tyrosine kinase activity;protein serine/threonine kinase activity

focal adhesion;nucleus;cytoplasm

mannosidase 2, alpha B1

MAN2B1_MOUSE

MANB;LAMAN;MAN2B;MAN2B1;MAN2B1_MOUSE;Man2b1;AW107687

carbohydrate metabolic process;learning and/or memory;metabolic process;mannose metabolic process

alpha-mannosidase activity;hydrolase activity, acting on glycosyl bonds;zinc ion binding;mannosidase activity;metal ion binding;hydrolase activity

lysosome

mitogen-activated protein kinase 3

MAPK3_MOUSE

MAPK3_MOUSE

response to exogenous dsRNA;response to lipopolysaccharide;lipopolysaccharide-mediated signaling pathway;phosphorylation;organ morphogenesis;signal transduction;cell cycle;response to DNA damage stimulus;cartilage development;protein amino acid phosphorylation;sensory perception of pain

transferase activity;nucleotide binding;phosphotyrosine binding;protein kinase activity;protein serine/threonine kinase activity;MAP kinase activity;protein binding;kinase activity;ATP binding

nucleus;cytoplasm

MAS1 oncogene MAS1_MOUSE

MAS1_MOUSE;Oncogene MAS1

cellular process;G-protein coupled receptor protein signaling pathway;regulation of cell cycle;signal transduction

receptor activity;rhodopsin-like receptor activity;G-protein coupled receptor activity;signal transducer activity

intracellular;membrane;integral to membrane

methyl-CpG binding domain protein 1

MBD1_MOUSE

Cxxc3;MBD1_MOUSE;Mbd1;PCM1

transcription;DNA methylation;regulation of transcription, DNA-dependent

zinc ion binding;DNA binding;metal ion binding

heterochromatin;chromatin;nucleus

neurocan NCAN_MOUSE

Ncan;CSPG3;Cspg3;NCAN;CSPG3_MOUSE;neurocan;C230035B04

cell adhesion

sugar binding;hyaluronic acid binding;calcium ion binding

extracellular space

NEL-like 2 (chicken)

NELL2_MOUSE

NELL2_MOUSE;mel91;Nell2;A330108N19Rik

cell adhesion calcium ion binding;structural molecule activity

extracellular space;extracellular region

neuro-oncological ventral antigen 2

NOVA2_MOUSE

Gm1424 protein binding;RNA binding

neurogranin NRGN_MOUSE

0710001B06Rik;NG;NG/RC3;NRGN_MOUSE;RC3;Nrgn;Pss1;R75334

protein kinase cascade

calmodulin binding

opioid receptor-like 1

OPRL1_MOUSE

KOR-3;OPRL1_MOUSE;ORL1;KOR3;nociceptin receptor;12C;K3;kappa-type 3 opioid receptor;orphanin FQ receptor

G-protein coupled receptor protein signaling pathway;signal transduction

rhodopsin-like receptor activity;signal transducer activity;receptor activity;G-protein coupled receptor activity;opioid receptor activity;X-opioid receptor activity

integral to membrane;membrane

opioid receptor, mu 1

OPRM1_MOUSE

MU-TYPE OPIOID RECEPTOR;OPRM1_MOUSE;MOR-1;MOR1

G-protein coupled receptor protein signaling pathway;signal transduction;behavior;G-protein signaling, adenylate cyclase inhibiting pathway;dopamine receptor, adenylate cyclase activating pathway;locomotory behavior

receptor activity;G-protein coupled receptor activity;signal transducer activity;mu-opioid receptor activity;opioid receptor activity;rhodopsin-like receptor activity

membrane fraction;membrane;integral to membrane

p21 (CDKN1A)-activated kinase 3

PAK3_MOUSE

PAK3_MOUSE

multicellular organismal development;protein amino acid phosphorylation

transferase activity;kinase activity;ATP binding;protein binding;protein serine/threonine kinase activity;protein kinase activity;catalytic activity;magnesium ion binding;nucleotide binding;metal ion binding

Parkinson disease (autosomal recessive, early onset) 7

PARK7_MOUSE

DJ-1_MOUSE;Dj1-pending;hiptar0004921;thiJ homologue (Caenorhabditis elegans);CAP1 (Rattus norvegicus);DJ-1 putative peptidase;4-methyl-5(beta-hydroxyethyl)-thiazole monophosphate biosynthesis protein (Escherichia coli);contraception-associated protein 1 (Rattus norvegicus);thiJ g.p. (Escherichia coli)

response to hydrogen peroxide;synaptic transmission, dopaminergic;adult locomotory behavior;dopamine uptake;cell proliferation

RNA binding nucleus;cytoplasm

plasminogen activator, tissue

PLAT_MOUSE

PLAT_MOUSE;t-plasminogen activator;hiptar0004973;tPA;tissue plasminogen activator

platelet-derived growth factor receptor signaling pathway;proteolysis

peptidase activity;plasminogen activator activity;hydrolase activity;serine-type endopeptidase activity

extracellular region;extracellular space;apical part of cell;cytoplasm;secretory granule

phospholipase C, beta 4

PLCB4_MOUSE

PLCB4_MOUSE;Plcb4

intracellular signaling cascade;lipid metabolic process;signal transduction

protein binding;phospholipase C activity;phosphoinositide phospholipase C activity

dendrite;postsynaptic density;smooth endoplasmic reticulum;nucleus;microsome

protein phosphatase 1, regulatory (inhibitor) subunit 1A

PPP1R1A_MOUSE

Ppp1r1a;0610038N18Rik;PPP1R1A_MOUSE;I-1

carbohydrate metabolic process;signal transduction;glycogen metabolic process

protein binding;protein phosphatase inhibitor activity

protein kinase, cAMP dependent, catalytic, beta

PRKACB_MOUSE

PRKACB_MOUSE

protein amino acid phosphorylation;G-protein signaling, coupled to cAMP nucleotide second messenger

magnesium ion binding;protein serine/threonine kinase activity;cAMP-dependent protein kinase activity;ATP binding;kinase activity;transferase activity;nucleotide binding;protein kinase activity

cAMP-dependent protein kinase complex;cytoplasm;nucleus

protein kinase, cAMP dependent regulatory, type I beta

PRKAR1B_MOUSE

PRKAR1B_MOUSE;RIbeta;Prkar1b;AI385716

cell proliferation;organ morphogenesis;protein amino acid phosphorylation;signal transduction;learning and/or memory

cAMP binding;kinase activity;cAMP-dependent protein kinase regulator activity;nucleotide binding

cytoplasm;cAMP-dependent protein kinase complex

pleiotrophin PTN_MOUSE

HARP;HBBN;HBGF-8;HBNF;OSF;Ptn;HB-GAM;PTN;Osf1;Osf-1;PTN_MOUSE

bone mineralization;cell proliferation;learning;ossification

heparin binding;growth factor activity

extracellular space;proteinaceous extracellular matrix;extracellular region

protein tyrosine phosphatase, receptor type, D

PTPRD_MOUSE

PTPRD_MOUSE;Ptprd;MGC36851

dephosphorylation;transmembrane receptor protein tyrosine phosphatase signaling pathway;protein amino acid dephosphorylation

phosphoric monoester hydrolase activity;hydrolase activity;receptor activity;protein tyrosine phosphatase activity;phosphoprotein phosphatase activity

integral to membrane;membrane;plasma membrane

retinoic acid receptor, beta

RARB_MOUSE

RARB_MOUSE;Rarb

embryonic eye morphogenesis;positive regulation of transcription from RNA polymerase II promoter;positive regulation of apoptosis;regulation of transcription, DNA-dependent;ventricular cardiac muscle cell differentiation;transcription;ureteric bud development

receptor activity;DNA binding;metal ion binding;ligand-dependent nuclear receptor activity;transcription activator activity;sequence-specific DNA binding;zinc ion binding;transcription factor activity;steroid hormone receptor activity;retinoic acid receptor activity

nucleus

regulating synaptic membrane exocytosis 1

RIMS1_MOUSE

RIMS1_MOUSE;Rim;RIM1;RIM1a;Serg1;Rims1;Rab3ip1

exocytosis;neurotransmitter transport;intracellular protein transport;regulation of long-term neuronal synaptic plasticity;transport

protein binding;metal ion binding;Rab GTPase binding;zinc ion binding

cell junction;synapse

Ras and Rab interactor 1

RIN1_MOUSE

RIN1_MOUSE;Rin1

intracellular signaling cascade;signal transduction;neuropeptide signaling pathway;endocytosis

GTPase activator activity;protein binding

cytoplasm;cytoskeleton;membrane

ryanodine receptor 3

RYR3_MOUSE

Ryr3;AI851294;RYR3_MOUSE;ryanodine receptor 3

striated muscle contraction;transport;cellular calcium ion homeostasis;ion transport

receptor activity;ion channel activity

integral to membrane;junctional membrane complex

syndecan 3 SDC3_MOUSE

mKIAA0468;Synd3;SDC3_MOUSE;syn-3;MGC69616;SDC3;MGC65603;Sdc3

cytoskeletal protein binding

membrane;integral to membrane

serine (or cysteine) peptidase inhibitor, clade E, member 2

SERPINE2_MOUSE

GDN;Serpine2;PI7;PN1;Glia derived nexin [Precursor];Protease nexin I;PN-1;Protease inhibitor 7;SERPINE2_MOUSE

nervous system development;multicellular organismal development;cell differentiation

heparin binding;serine-type endopeptidase inhibitor activity;endopeptidase inhibitor activity

extracellular region;extracellular space

solute carrier family 24 (sodium/potassium/calcium exchanger), member 2

SLC24A2_MOUSE

Slc24a2;2810021B17Rik;SLC24A2_MOUSE

integral to membrane

solute carrier family 8 (sodium/calcium exchanger), member 2

SLC8A2_MOUSE

Ncx2;SLC8A2_MOUSE;Slc8a2

calcium ion transport;transport

calcium:sodium antiporter activity;transmembrane transporter activity;calmodulin binding

integral to plasma membrane;membrane;integral to membrane

ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 4

ST8SIA4_MOUSE

Siat8d;SIAT8D;SIAT8D_MOUSE;PST;PST-1;ST8SiaIV

protein amino acid glycosylation

transferase activity, transferring glycosyl groups;alpha-N-acetylneuraminate alpha-2,8-sialyltransferase activity;sialyltransferase activity;transferase activity

integral to membrane;Golgi apparatus;integral to Golgi membrane;membrane

synaptopodin SYNPO_MOUSE

SYNPO_MOUSE;9330140I15Rik;LOC170766;Synpo

cortical cytoskeleton organization and biogenesis

actin binding

cytoskeleton;tight junction;actin cytoskeleton;membrane;cell junction;postsynaptic membrane;synapse;axon;cell projection;dendritic spine;dendrite;cytoplasm

thymus cell antigen 1, theta

THY1_MOUSE

Thy1;Thy-1;THY1_MOUSE;THY1;THY-1;CD90;Thy1.1

retinal cone cell development;negative regulation of T cell receptor signaling pathway;angiogenesis

GPI anchor binding

membrane;external side of plasma membrane;anchored to external side of plasma membrane

tropomodulin 2 TMOD2_MOUSE

N-Tmod;TMOD2_MOUSE;NTMOD;Tmod2

positive regulation of G-protein coupled receptor protein signaling pathway;learning and/or memory;nerve-nerve synaptic transmission

actin binding;tropomyosin binding

cytoskeleton;cytoplasm

ubiquitin protein ligase E3A

UBE3A_MOUSE

4732496B02;UBE3A_MOUSE;Hpve6a;Ube3a

ubiquitin-dependent protein catabolic process;protein modification process;ubiquitin cycle

ubiquitin-protein ligase activity;ligase activity;protein binding

protein complex;cytosol;cytoplasm;nucleus;intracellular

ubiquitin specific peptidase 14

USP14_MOUSE

ubiquitin specific protease 14;USP14_MOUSE;TGT subunit;hiptar0005312;USP14 g.p. (Homo sapiens);tRNA-guanine transglycosylase 60-kDa subunit

synaptic transmission;ubiquitin cycle;ubiquitin-dependent protein catabolic process;protein modification process

cysteine-type peptidase activity;peptidase activity;ubiquitin thiolesterase activity;hydrolase activity

soluble fraction;synaptosome

voltage-dependent anion channel 1

VDAC1_MOUSE

VDAC1_MOUSE;VDAC1_MOUSE_V1;porin-1_MOUSE_v1;voltage-dependent anion channel 1;Vdac5;Vdac1;Plasmalemmal VDAC1;PL-VDAC1

learning;nerve-nerve synaptic transmission;synaptic transmission;behavioral fear response;transport;ion transport;anion transport;apoptosis

voltage-gated ion-selective channel activity

mitochondrion;mitochondrial inner membrane;membrane;integral to membrane;mitochondrial outer membrane;outer membrane;extracellular space

very low density lipoprotein receptor

VLDLR_MOUSE

VLDLR_MOUSE;Vldlr

transport;positive regulation of protein kinase activity;cholesterol metabolic process;lipid transport;endocytosis;steroid metabolic process;lipid metabolic process

lipid transporter activity;receptor activity;calcium ion binding

integral to membrane;membrane;membrane fraction;extracellular space;coated pit

A kinase (PRKA) anchor protein 5

AKAP5_MOUSE

Gm258 protein kinase binding

Cdc42 guanine nucleotide exchange factor (GEF) 9

ARHGEF9_MOUSE

TIG120842

intracellular signaling cascade;regulation of Rho protein signal transduction;small GTPase mediated signal transduction

Rho guanyl-nucleotide exchange factor activity;guanyl-nucleotide exchange factor activity

cell cortex;cytoplasm;intracellular

ataxin 1 ATXN1_MOUSE

Ataxin-1;Atx1;SCA1;SCA1_MOUSE;Sca1

adult locomotory behavior;regulation of excitatory postsynaptic membrane potential;visual learning

RNA binding;binding

cytoplasm;nuclear inclusion body;nuclear matrix;nucleus

beta-1,3-glucuronyltransferase 1 (glucuronosyltransferase P)

B3GAT1_MOUSE

0710007K08Rik;AI846286;B3GAT1;B3GAT1_MOUSE;B3gat1;GlcAT-P;HNK-1

UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity;galactosylgalactosylxylosylprotein 3-beta-glucuronosyltransferase activity;glucuronosyltransferase activity;manganese ion binding;metal ion binding;transferase activity

Golgi apparatus;extracellular space;integral to membrane;membrane

complexin 2 CPLX2_MOUSE

921-L;AI413745;AW492120;CPLX2_MOUSE;Cplx2

exocytosis;mast cell degranulation;membrane fusion;neurotransmitter transport;transport;vacuole organization and biogenesis;vesicle docking during exocytosis

syntaxin binding cytoplasm

galanin GAL_MOUSE

GALANIN;GALANIN MESSAGE-ASSOCIATED PEPTIDE;GALN;GAL_MOUSE;GLNN;GMAP

nervous system development;neuropeptide signaling pathway

hormone activity extracellular region;extracellular space

glutamate receptor, ionotropic, delta 2

GRID2_MOUSE

GLURD2_MOUSE;GRID2_MOUSE;Grid2;Lc;glutamate receptor, ionotropic, delta 2

ion transport;prepulse inhibition;regulation of excitatory postsynaptic membrane potential;synaptic transmission, glutamatergic;transport

extracellular-glutamate-gated ion channel activity;ion channel activity;ionotropic glutamate receptor activity;protein binding;receptor activity

cell junction;integral to membrane;membrane;membrane fraction;postsynaptic membrane;synapse;synaptosome

5-hydroxytryptamine (serotonin) receptor 2C

HTR2C_MOUSE

5-HT-2C;5-HT1C, 5HT2C;5-hydroxytryptamine 2C receptor;5HT-1;5HT-2C;HTR2C_MOUSE;serotonin receptor 2C

G-protein coupled receptor protein signaling pathway;inositol phosphate-mediated signaling;signal transduction

G-protein coupled receptor activity;receptor activity;rhodopsin-like receptor activity;serotonin receptor activity;signal transducer activity

external side of plasma membrane;integral to membrane;membrane

laminin, alpha 2

LAMA2_MOUSE

5830440B04;LAMA2;LAMA2_MOUSE;Lama2;dy;mer;merosin

cell adhesion;positive regulation of synaptic transmission, cholinergic;regulation of cell adhesion;regulation of cell migration;regulation of embryonic development

extracellular matrix structural constituent;protein binding;receptor binding

basal lamina;basement membrane;extracellular matrix;extracellular region;extracellular space;laminin-1 complex;proteinaceous extracellular matrix;sarcolemma

leptin receptor

LEPR_MOUSE

DB;LEPR;LEPROT;LEPR_MOUSE;Lepr;MGC6694;OB-RGRP;OBR;Obr;db;diabetes;obese-like;obl

cholesterol metabolic process;negative regulation of hydrolase activity;regulation of metabolic process;signal transduction

hematopoietin/interferon-class (D200-domain) cytokine receptor activity;protein binding;receptor activity;transmembrane receptor activity

extracellular region;extracellular space;integral to membrane;integral to plasma membrane;membrane

purinergic receptor P2X, ligand-gated ion channel 4

P2RX4_MOUSE

P2RX4_MOUSE;P2RX4_MOUSE_V1;P2X4;P2X4_MOUSE_v1;P2rx4;purinergic receptor P2X, ligand-gated ion channel 4

calcium ion transport;ion transport;metabolic process;nitric oxide biosynthetic process;regulation of excitatory postsynaptic membrane potential;transport;vasodilation

ATP binding;ATP-gated cation channel activity;ion channel activity;receptor activity

apical part of cell;integral to membrane;integral to plasma membrane;membrane


PTEN induced putative kinase 1

PINK1_MOUSE

1190006F07RIK_MOUSE;1190006F07Rik;PINK1_MOUSE

protein amino acid phosphorylation;protein kinase cascade

ATP binding;kinase activity;magnesium ion binding;metal ion binding;nucleotide binding;protein kinase activity;protein serine/threonine kinase activity;transferase activity

mitochondrion

prion protein

PRNP_MOUSE

CJD;Creutzfeld-Jakob disease, Gerstmann-Strausler-Scheinker syndrome;PRIP;PRNP_MOUSE;fatal familial insomnia;p27-30;prion protein

cellular copper ion homeostasis;nucleobase, nucleoside, nucleotide and nucleic acid metabolic process;protein homooligomerization;response to oxidative stress

GPI anchor binding;copper ion binding;protein binding

Golgi apparatus;endoplasmic reticulum;lipid raft;membrane;plasma membrane

protein tyrosine phosphatase, non-receptor type 4

PTPN4_MOUSE

PTPMEG;PTPN4_MOUSE;Ptn4;Ptpn4;TEP;hPTP-MEG

intracellular signaling cascade;protein amino acid dephosphorylation

hydrolase activity;non-membrane spanning protein tyrosine phosphatase activity;phosphoprotein phosphatase activity;prenylated protein tyrosine phosphatase activity;protein tyrosine phosphatase activity;receptor activity;structural molecule activity

cytoplasm;cytoskeleton

tenascin C

TNC_MOUSE

AI528729;Hxb;TN-C;TNC_MOUSE;Ten;Tnc

cell adhesion;neuromuscular junction development;signal transduction

fibronectin binding;protein binding;receptor binding

basement membrane;extracellular region;extracellular space;proteinaceous extracellular matrix

WASP family 1

WASF1_MOUSE

AI195380;AI838537;Scar;WASF1_MOUSE;WAVE-1;Wasf1

actin filament polymerization;cell morphogenesis;cell motility;protein complex assembly

actin binding;protein binding

actin cytoskeleton;cytoplasm;cytoskeleton;lamellipodium;mitochondrial outer membrane;mitochondrion


Appendix B

Over-represented disease genes

The following tables summarize the genes assigned to the two most represented diseases (alone or together), obtained both by querying the DO Knowledgebase with dendritic plasticity-relevant genes and by network expansion. GO Biological Process terms were used as the reference annotation.

Alzheimer’s disease

Gene Symbol | Gene Name | GO Biological Process
A2M | alpha-2-macroglobulin | intracellular protein transport
APBB1 | amyloid beta (A4) precursor protein-binding, family B, member 1 | axonogenesis
APOA1 | apolipoprotein A-I | cholesterol metabolism
APOE | apolipoprotein E | regulation of neuronal synaptic plasticity
APP | amyloid beta (A4) precursor protein | signal transduction
BACE1 | beta-secretase 1 | beta amyloid metabolic process
BDNF | brain derived neurotrophic factor | neurogenesis
CDC2 | cell division cycle 2, G1 to S and G2 to M | start control point of mitotic cell cycle
CHAT | choline acetyltransferase | neurotransmitter biosynthesis
ESR1 | estrogen receptor 1 | cell growth;negative regulation of mitosis
FYN | fyn proto-oncogene | feeding behavior;learning;
LRP1 | low density lipoprotein receptor-related protein 1 | cell proliferation;pathogenesis
MAPK8IP1 | mitogen-activated protein kinase 8 interacting protein 1 | regulation of JNK cascade
MAPT | microtubule-associated protein tau | microtubule stabilization
NTF3 | neurotrophin 3;neurotrophin-3 (HDNF/NT-3) | brain development
PRNP | prion protein (p27-30) | signal transduction
PSEN1 | presenilin 1 | intracellular signaling cascade
PSEN2 | presenilin 2 | intracellular signaling cascade
SNCA | synuclein, alpha | central nervous system development;pathogenesis

Schizophrenia

Gene Symbol | Gene Name | GO Biological Process
CTLA4 | cytotoxic T-lymphocyte-associated protein 4 | immune response
HTATIP | HIV-1 Tat interactive protein, 60 kD | chromatin assembly/disassembly
CHL1 | cell adhesion molecule with homology to L1CAM (close homolog of L1) | axon guidance;signal transduction
APOE | apolipoprotein E | learning and/or memory;regulation of neuronal synaptic plasticity
CHRNA7 | cholinergic receptor, nicotinic, alpha polypeptide 7 | activation of MAPK;synaptic transmission
CNR1 | cannabinoid receptor 1 (brain) | G-protein signaling, coupled to cyclic nucleotide second messenger
BDNF | brain derived neurotrophic factor | neurogenesis;regulation of long-term neuronal synaptic plasticity
DRD5 | dopamine receptor 5 | synaptic transmission;transmission of nerve impulse
GABBR1 | gamma-aminobutyric acid (GABA) B receptor, 1 | gamma-aminobutyric acid signaling pathway;synaptic transmission
GABRA5 | gamma-aminobutyric acid A receptor, alpha 5 | associative learning;synaptic transmission
GRIA4 | glutamate receptor, ionotrophic, AMPA 4 | glutamate signaling pathway;synaptic transmission
GRIN1 | glutamate receptor, ionotropic, NMDA1 | learning and/or memory;regulation of synaptic plasticity
GRIN2A | glutamate receptor, ionotropic, NMDA2A (epsilon 1) | learning and/or memory;synaptic transmission
GRIN2B | glutamate receptor, ionotropic, NMDA2B (epsilon 2) | learning and/or memory;synaptic transmission
NTF3 | neurotrophin 3 | brain development;signal transduction
PRNP | prion protein (p27-30) | pathogenesis;signal transduction
HTR2C | 5-hydroxytryptamine (serotonin) receptor 2C | serotonin receptor signaling pathway;synaptic transmission
YWHAH | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide | intracellular protein transport;protein kinase C activation

Schizophrenia and Alzheimer’s related genes after network extension

Gene Symbol | GO Biological Process | Gene Name
NRG2 | anti-apoptosis;cell-cell signaling | neuregulin 2
ADRA2B | G-protein coupled receptor protein signaling pathway;cell-cell signaling | adrenergic, alpha-2B-, receptor
OLIG2 | cell growth and/or maintenance;regulation of transcription, DNA-dependent | oligodendrocyte lineage transcription factor 2
SULF1 | apoptosis;heparan sulfate proteoglycan metabolism;lipid metabolism | sulfatase 1
GFRA2 | transmembrane receptor protein tyrosine kinase signaling pathway | GDNF family receptor alpha 2
TRIB1 | regulation of MAP kinase activity | Tribbles homolog 1
LGI1 | cell proliferation;neurogenesis | leucine-rich, glioma inactivated 1
NRG3 | embryonic development;regulation of cell growth;transmembrane receptor protein tyrosine kinase ligand binding | neuregulin 3
NTF3 | anti-apoptosis;brain development;cell motility;cell-cell signaling;glial cell fate determination; | neurotrophin 3
TDGF1 | mesoderm cell fate determination;signal transduction | teratocarcinoma-derived growth factor 1
FGF13 | cell-cell signaling;neurogenesis | fibroblast growth factor 13
NRTN | MAPKKK cascade;neurogenesis;transmembrane receptor protein tyrosine kinase signaling pathway | neurturin
PRNP | posttranslational membrane targeting;regulation of transcription, DNA-dependent;signal transduction | prion protein (p27-30) (Creutzfeld-Jakob disease, Gerstmann-Strausler-Scheinker syndrome, fatal familial insomnia)
MFGE8 | cell adhesion;oncogenesis | milk fat globule-EGF factor 8 protein
SHC3 | intracellular signaling cascade;regulation of transcription, DNA-dependent | src homology 2 domain containing transforming protein C3
TACC1 | cell growth and/or maintenance | transforming, acidic coiled-coil containing protein 1
THBS2 | cell adhesion | thrombospondin 2
RUSC1 | development | RUN and SH3 domain containing 1
MFN2 | biological_process unknown | mitofusin 2
APOE | learning and/or memory;regulation of neuronal synaptic plasticity | apolipoprotein E
ANGPTL1 | development | angiopoietin-like 1
VGF | biological_process unknown | VGF nerve growth factor inducible
BDNF | neurogenesis | brain-derived neurotrophic factor
HTR1B | G-protein signaling, coupled to cyclic nucleotide second messenger;synaptic transmission | 5-hydroxytryptamine (serotonin) receptor 1B
MAPK6 | cell cycle;protein amino acid phosphorylation;signal transduction | mitogen-activated protein kinase 6
ATP1A2 | ATP hydrolysis coupled proton transport;hydrogen ion homeostasis; | ATPase, Na+/K+ transporting, alpha 2 (+) polypeptide


Appendix C

List of abbreviations

GOA Gene Ontology Annotation

GO Gene Ontology

DAG Directed Acyclic Graph

DO Disease Ontology

FTP File Transfer Protocol

UMLS Unified Medical Language System

HUGO Human Genome Organisation

OBO Open Biomedical Ontologies

GAD Genetic Association Database

OMIM Online Mendelian Inheritance in Man

PharmGKB Pharmacogenetics and Pharmacogenomics Knowledge Base

KEGG Kyoto Encyclopedia of Genes and Genomes

PubMed Publisher's MEDLINE

PK Pharmacokinetics

PD Pharmacodynamics

MeSH Medical Subject Headings

CSHL Cold Spring Harbor Laboratory

OWL Web Ontology Language

MGI Mouse Genome Informatics

SGD Saccharomyces Genome Database


RxList The Internet Drug Index

CAS RN Chemical Abstracts Service Registry Number

CREB cyclic AMP responsive element binding protein

AMPA α-amino-3-hydroxyl-5-methyl-4-isoxazole-propionate

NMDA N-methyl-D-aspartic acid

HD Huntington’s disease

UNIX UNiplexed Information and Computing System

EASE Expression Analysis Systematic Explorer

DAVID Database for Annotation Visualization and Integrated Discovery

EBI European Bioinformatics Institute

UniProtKB UniProt Knowledgebase

IPI International Protein Index

IPA Ingenuity Pathway Analysis

NCBO National Center for Biomedical Ontology

ICD9CM The International Classification of Diseases, Ninth Revision, Clinical Modification

SNOMED Systematized Nomenclature of Medicine-Clinical Terms

CNS Central Nervous System

HPA Hypothalamic-Pituitary-Adrenal axis

ACTH adrenocorticotropic hormone

ChEBI Chemical Entities of Biological Interest

CAS Chemical Abstracts Service Registry Database


Bibliography

[1] S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses

and systems biology. Nat Rev Genet, 7(6):482-8, 2006.

[2] M. A. Harris, J. Clark, A. Ireland, J. Lomax, M. Ashburner, R. Foulger, K. Eilbeck, S. Lewis, B.

Marshall, C. Mungall, J. Richter, G. M. Rubin, J. A. Blake, C. Bult, M. Dolan, H. Drabkin, J. T.

Eppig, D. P. Hill, L. Ni, M. Ringwald, R. Balakrishnan, J. M. Cherry, K. R. Christie, M. C.

Costanzo, S. S. Dwight, S. Engel, D. G. Fisk, J. E. Hirschman, E. L. Hong, R. S. Nash, A.

Sethuraman, C. L. Theesfeld, D. Botstein, K. Dolinski, B. Feierbach, T. Berardini, S. Mundodi, S.

Y. Rhee, R. Apweiler, D. Barrell, E. Camon, E. Dimmer, V. Lee, R. Chisholm, P. Gaudet, W.

Kibbe, R. Kishore, E. M. Schwarz, P. Sternberg, M. Gwinn, L. Hannick, J. Wortman, M.

Berriman, V. Wood, N. de la Cruz, P. Tonellato, P. Jaiswal, T. Seigfried, and R. White. The Gene

Ontology (GO) database and informatics resource. Nucleic Acids Res, 32(Database issue):D258-61, 2004.

[3] T. R. Gruber. Towards principles for the design of ontologies used for knowledge sharing. In N.

Guarino and R. Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge

Representation, Deventer, The Netherlands, 1993. Kluwer Academic Publishers.

[4] P. Lambrix, M. Habbouche, and M. Perez. Evaluation of ontology development tools for

bioinformatics. Bioinformatics, 19(12):1564-71, 2003.

[5] R. Mack and M. Hehenberger. Text-based knowledge discovery: search and mining of life-sciences documents. Drug Discov Today, 7(11 Suppl):S89-98, 2002.


[6] M. Harris and H. Parkinson. Standards and Ontologies for Functional Genomics: Towards Unified

Ontologies for Biology and Biomedicine. Comparative and Functional Genomics, 4(1):116-120,

2003. doi:10.1002/cfg.249.

[7] M. Deng, Z. Tu, F. Sun, and T. Chen. Mapping Gene Ontology to proteins based on protein-protein interaction data. Bioinformatics, 20(6):895-902, 2004.

[8] O. Bodenreider and R. Stevens. Bio-ontologies: current trends and future directions. Brief

Bioinform, 7(3):256-74, 2006.

[9] J. I. Clark, C. Brooksbank, and J. Lomax. It's all GO for plant scientists. Plant Physiol,

138(3):1268-79, 2005.

[10] J. B. Bard and S. Y. Rhee. Ontologies in biology: design, applications and future challenges. Nat

Rev Genet, 5(3):213-22, 2004.

[11] T. R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge

Acquisition, 5(2):199-220, 1993.

[12] B. Andersen. What is an ontology? Ontology Works (http://www.ontologyworks.com), 2001.

[13] O. Bodenreider, J. A. Mitchell, and A. T. McCray. Biomedical ontologies. Pac Symp Biocomput,

pages 76-8, 2005.

[14] S. Schulze-Kremer. Ontologies for molecular biology and bioinformatics. In Silico Biol, 2(3):179-93, 2002.

[15] J. S. Caldwell. Ontology recapitulates physiology. Chem Biol, 10(9):784-6, 2003.


[16] D. Devos and A. Valencia. Intrinsic errors in genome annotation. Trends Genet, 17(8):429-31,

2001.

[17] C. Blaschke and A. Valencia. Automatic ontology construction from the literature. Genome

Inform, 13:201-13, 2002.

[18] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck, A.

Ireland, C. J. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S. A. Sansone, R. H.

Scheuermann, N. Shah, P. L. Whetzel, and S. Lewis. The OBO Foundry: coordinated evolution of

ontologies to support biomedical data integration. Nat Biotechnol, 25(11):1251-1255, 2007.

[19] B. Smith, W. Ceusters, B. Klagges, J. Kohler, A. Kumar, J. Lomax, C. Mungall, F. Neuhaus, A. L.

Rector, and C. Rosse. Relations in biomedical ontologies. Genome Biol, 6(5):R46, 2005.

[20] J. A. Blake and C. J. Bult. Beyond the data deluge: data integration and bio-ontologies. J Biomed

Inform, 39(3):314-20, 2006.

[21] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K.

Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S.

Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene

ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25-9, 2000.

[22] D. M. Jones and R. C. Paton. Toward principles for the representation of hierarchical knowledge in

formal ontologies. Data & Knowledge Engineering, 31(2):99-113, 1999.

[23] R. Stevens, C. A. Goble, and S. Bechhofer. Ontology-based knowledge representation for

bioinformatics. Brief Bioinform, 1(4):398-414, 2000.


[24] J. Blake and M. Harris. The Gene Ontology Project: Structured vocabularies for molecular biology

and their application to genome and expression analysis. In A.D. Baxevanis, D.B. Davison, R. Page,

G. Stormo, and L. Stein, editors, Current Protocols in Bioinformatics. Wiley & Sons, New York, 2003.

[25] K. Olden and S. Wilson. Environmental health and genomics: visions and implications. Nat Rev

Genet, 1(2):149-53, 2000.

[26] O. Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical

terminology. Nucleic Acids Res, 32(Database issue):D267-70, 2004.

[27] A. Hamosh, A. F. Scott, J. S. Amberger, C. A. Bocchini, and V. A. McKusick. Online Mendelian

Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic

Acids Res, 33(Database issue):D514-7, 2005.

[28] K. G. Becker, K. C. Barnes, T. J. Bright, and S. A. Wang. The genetic association database. Nat

Genet, 36(5):431-2, 2004.

[29] D. S. Wishart, C. Knox, A. C. Guo, S. Shrivastava, M. Hassanali, P. Stothard, Z. Chang, and J.

Woolsey. DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Nucleic Acids Res, 34(Database issue):D668-72, 2006.

[30] M. Hewett, D. E. Oliver, D. L. Rubin, K. L. Easton, J. M. Stuart, R. B. Altman, and T. E. Klein.

PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res, 30(1):163-5, 2002.

[31] R. B. Altman. PharmGKB: a logical home for knowledge relating genotype to drug response

phenotype. Nat Genet, 39(4):426, 2007.

[32] T. Hernandez-Boussard, M. Whirl-Carrillo, J. M. Hebert, L. Gong, R. Owen, M. Gong, W. Gor, F.

Liu, C. Truong, R. Whaley, M. Woon, T. Zhou, R. B. Altman, and T. E. Klein. The


pharmacogenetics and pharmacogenomics knowledge base: accentuating the knowledge. Nucleic

Acids Res, 2007.

[33] E. Castrén. Is mood chemistry? Nat Rev Neurosci 6(3):241-246, 2005

[34] D.S. Charney, H.K. Manji. Life stress, genes, and depression: multiple pathways lead to increased

risk and new opportunities for intervention. Sci STKE (225):re5, 2004

[35] R.S. Duman, G.R. Heninger, E.J. Nestler. A molecular and cellular theory of depression. Arch Gen

Psychiatry 54(7):597-606, 1997.

[36] E. J. Nestler, M. Barrot, R. J. DiLeone, A. J. Eisch, S. J. Gold, and L. M. Monteggia. Neurobiology

of depression. Neuron, 34(1):13-25, 2002.

[37] F. Holsboer. Stress, hypercortisolism and corticosteroid receptors in depression: implications for

therapy. J Affect Disord, 62(1-2):77-91, 2001.

[38] R. M. Sapolsky. Glucocorticoids and hippocampal atrophy in neuropsychiatric disorders. Arch Gen

Psychiatry, 57(10):925-35, 2000.

[39] A. A. Russo-Neustadt and M. J. Chen. Brain-derived neurotrophic factor and antidepressant

activity. Curr Pharm Des, 11(12):1495-510, 2005.

[40] K.M. Harris. Structure, development, and plasticity of dendritic spines. Curr Opin Neurobiol,

9:343–348, 1999.

[41] H. Hering and M. Sheng. Dendritic spines: structure, dynamics and regulation. Nature Reviews

Neuroscience 2, 880-888, 2001.

[42] E. Castrén. Neurotrophic effects of antidepressant drugs. Curr Opin Pharmacol 4(1):58-64, 2004.


[43] J.E. Malberg, A.J. Eisch, E.J. Nestler, R.S. Duman. Chronic antidepressant treatment increases

neurogenesis in adult rat hippocampus. J Neurosci 20(24):9104-9110, 2000.

[44] H. van Praag, G. Kempermann, F.H. Gage. Neural consequences of environmental enrichment. Nat

Rev Neurosci 1(3):191-198, 2000.

[45] L. Santarelli, M. Saxe, C. Gross, A. Surget, F. Battaglia, S. Dulawa, N. Weisstaub, J. Lee, R.

Duman, O. Arancio, C. Belzung, R. Hen. Requirement of hippocampal neurogenesis for the

behavioral effects of antidepressants. Science 301(5634):805-809, 2003.

[46] A.K. McAllister, D.C. Lo, L.C. Katz. Neurotrophins regulate dendritic growth in developing visual

cortex. Neuron 15:791–803, 1995.

[47] H.W. Horch, L.C. Katz. BDNF release from single cells elicits local dendritic growth in nearby

neurons. Nat Neurosci 5(11):1177–84, 2002.

[48] A.K. McAllister. Cellular and molecular mechanisms of dendrite growth. Cereb Cortex 10:963–73,

2000.

[49] G. Dennis Jr, B.T. Sherman, D.A. Hosack, J. Yang, W. Gao, H.C. Lane, and R.A. Lempicki. DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 4(9):R60, 2003.

[50] DAVID [http://www.DAVID.niaid.nih.gov]

[51] G. Elvidge. Microarray expression technology: from start to finish. Pharmacogenomics 7(1):123-34, 2006

[52] R.B. Stoughton. Applications of DNA microarrays in biology. Annu Rev Biochem. 74:53-82, 2005


[53] W. Weckwerth, K. Morgenthal. Metabolomics: from pattern recognition to biological

interpretation. Drug Discov Today. 10(22):1551-8, 2005

[54] C.R. Stubberfield, M.J. Page. Applying proteomics to drug discovery. Expert Opin Investig Drugs.

Jan;8(1):65-70, 1999

[55] M.R. Flory, R. Aebersold. Proteomic approaches for the identification of cell cycle-related drug

targets. Prog Cell Cycle Res 5:167-71, 2003

[56] J.F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, G.F. Berriz, F.D.

Gibbons, M. Dreze, N. Ayivi-Guedehoussou, N. Klitgord, C. Simon, M. Boxem, S. Milstein, J.

Rosenberg, D.S. Goldberg, L.V. Zhang, S.L. Wong, G. Franklin, S. Li, J.S. Albala, J. Lim, C.

Fraughton, E. Llamosas, S. Cevik, C. Bex, P. Lamesch, R.S. Sikorski, J. Vandenhaute, H.Y.

Zoghbi, A. Smolyar, S. Bosak, R. Sequerra, L. Doucette-Stamm, M.E. Cusick, D.E. Hill, F.P.

Roth, M. Vidal. Towards a proteome-scale map of the human protein-protein interaction network.

Nature 20;437(7062):1173-8, 2005

[57] S.E. Calvano, W. Xiao, D.R. Richards, R.M. Felciano, H.V. Baker, R.J. Cho, R.O. Chen, B.H.

Brownstein, J.P. Cobb, S.K. Tschoeke, C. Miller-Graziano, L.L. Moldawer, M.N. Mindrinos, R.W.

Davis, R.G. Tompkins, S.F. Lowry. Inflamm and Host Response to Injury Large Scale Collab. Res.

Program. A network-based analysis of systemic inflammation in humans. Nature

13;437(7061):1032-7, 2005.

[58] E.F. Moore. The Shortest Path Through a Maze. Proc. International Symposium on the

Theory of Switching, Part II, Vol. 30 of “The Annals of the Computation Laboratory of Harvard

University”, Cambridge, MA, Harvard University Press, 1959

[59] A. Nikitin, S. Egorov, N. Daraselia and I. Mazo. Pathway studio - the analysis and navigation of

molecular networks. Bioinformatics 19 (0):1-3, 2003.


[60] T.L. Spires and A.J. Hannan. Molecular mechanisms mediating pathological plasticity in

Huntington's disease and Alzheimer's disease. Journal of Neurochemistry 100:874–882, 2007.

[61] M. Fischer, S. Kaech, U. Wagner, H. Brinkhaus and A. Matus. Glutamate receptors regulate actin-based plasticity in dendritic spines. Nature Neuroscience 3:887–894, 2000.

[62] R. Klein. Eph/ephrin signaling in morphogenesis, neural development and plasticity. Current

Opinion in Cell Biology 16(5):580–589, 2004.

[63] L.A. Glantz, D.A. Lewis. Reduction of synaptophysin immunoreactivity in the prefrontal cortex of

subjects with schizophrenia. Regional and diagnostic specificity. Arch Gen Psychiatry 54: 943–952, 1997.

[64] D.A. Lewis, D.A. Cruz, D.S. Melchitzky, J.N. Pierri. Lamina-specific deficits in parvalbumin-immunoreactive varicosities in the prefrontal cortex of subjects with schizophrenia: evidence for

fewer projections from the thalamus. Am J Psychiatry 158: 1411–1422, 2001.

[65] G. Rajkowska, L.D. Selemon, P.S. Goldman-Rakic. Neuronal and glial somal size in the prefrontal

cortex: a postmortem morphometric study of schizophrenia and Huntington disease. Arch Gen

Psychiatry 55: 215–224, 1998.

[66] J.N. Pierri, C.L. Volk, S. Auh, A. Sampson, D.A. Lewis. Decreased somal size of deep layer 3

pyramidal neurons in the prefrontal cortex of subjects with schizophrenia. Arch Gen Psychiatry 58:

466–473, 2001.

[67] M. Segal, V. Greenberger, E. Korkotian. Formation of dendritic spines in cultured striatal neurons

depends on excitatory afferent activity. Eur J Neurosci 17: 2573–2585, 2003.


[68] A.R. Sweet, R.A. Henteleff, W. Zhang, A.R. Sampson and D.A. Lewis. Reduced Dendritic Spine

Density in Auditory Cortex of Subjects with Schizophrenia. Neuropsychopharmacology 34, 374–389, 2009.

[69] D.J. Selkoe, A. Triller, C. Yves. Synaptic Plasticity and the Mechanism of Alzheimer's Disease.

Springer (Eds.), 2008.

[70] J. Coyle, R. Duman. Finding the Intracellular Signaling Pathways Affected by Mood Disorder

Treatments. Neuron 38, 157–160, 2003

[71] M. Ribasés, M. Gratacòs, F. Fernández-Aranda, L. Bellodi, C. Boni, M. Anderluh, M.C. Cavallini,

E. Cellini, D. Di Bella, S. Erzegovesi, C. Foulon, M. Gabrovsek, P. Gorwood, J. Hebebrand,

A. Hinney, J. Holliday, X. Hu, A. Karwautz, A. Kipman, R. Komel, B. Nacmias, H. Remschmidt,

V. Ricca, S. Sorbi, M. Tomori, G. Wagner, J. Treasure, D. A. Collier and X. Estivill. Association

of BDNF with restricting anorexia nervosa and minimum body mass index: a family-based

association study of eight European populations. Eur J Hum Genet 13, 428–434, 2005.

[72] A.C. Conner, C. Kissling, E. Hodges, R. Hünnerkopf, R.M. Clement, E. Dudley, C.M. Freitag, M.

Rösler, W. Retz, J. Thome. Neurotrophic Factor-Related Gene Polymorphisms and Adult Attention

Deficit Hyperactivity Disorder (ADHD) Score in a High-Risk Male Population. Am J Med Genet B

Neuropsychiatr Genet. 147B(8):1476-1480, 2008.

[73] M. Dierssen, M. Gratacos, I. Sahun, M. Martin, X. Gallego, A. Amador-Arjona, M. Martinez de

Lagran, P. Murtra, E. Marti, M. A. Pujana, I. Ferrer, E. Dalfo, C. Martinez-Cue, J. Florez, J. F.

Torres-Peraza, J. Alberch, R. Maldonado, C. Fillat, X. Estivill. Transgenic mice overexpressing the

full-length neurotrophin receptor TrkC exhibit increased catecholaminergic neuron density in

specific brain areas and increased anxiety-like behavior and panic reaction. Neurobiology of

Disease 24(2):403-418, 2006.


[74] T. Frodl, P. Zill, T. Baghai, C. Schüle, R. Rupprecht, T. Zetzsche, B. Bondy, M. Reiser, H.J.

Möller, E.M. Meisenzahl. Reduced hippocampal volumes associated with the long variant of the

tri- and diallelic serotonin transporter polymorphism in major depression. Am J Med Genet B

Neuropsychiatr Genet. 147B(7):1003-1007, 2008

[75] D. Rujescu, I. Giegling, A. Gietl, C. Gonnermann, A. Kirner and H.J. Möller. Association study of

a SNP coding for a M129V substitution in the prion protein in schizophrenia. Schizophrenia

Research 62(3):289-291, 2003
