UNIVERSITA' DEGLI STUDI DI
PADOVA
Facoltà di Scienze MM. FF. NN.
Centro Ricerche Interdipartimentale Biotecnologie
Innovative (CRIBI)
SCUOLA DI DOTTORATO DI RICERCA IN BIOCHIMICA E BIOTECNOLOGIE
INDIRIZZO IN BIOTECNOLOGIE
CICLO XX
DEVELOPMENT OF AN INTEGRATED DISEASE
ONTOLOGY KNOWLEDGEBASE AND ITS APPLICATION
TO STUDY MECHANISMS OF NEUROPSYCHIATRIC
DISORDERS
School Director: Prof. Giuseppe Zanotti
Supervisor: Prof. Giorgio Valle
PhD candidate: Fabrizio Caldara
Abstract
The production and distribution of scientific information have grown exponentially in recent
years. PubMed, a service of the U.S. National Library of Medicine that includes over 18 million
Medline citations to journal articles, has been adding some 40,000 abstracts from the life
sciences and biomedical literature every month.
The information age has allowed the storage and dissemination of huge amounts of data, but our
ability to extract and process knowledge has remained constant. We make inferences on
uncharacterised observations by recording and using natural language, which unfortunately is
rarely adequate. Furthermore, biomedical research is characterised by highly specialised
disciplines with limited communication among them and poorly shared resources.
These many aspects draw attention to the real need of integration, a general concept with
many definitions. In the context of my PhD, integration is intended as the process by which data
from one source can be exchanged, interpreted or manipulated by another, in a way that makes
sense to the users in their interaction with the system.
Biomedical ontologies (OBO) in general, and the Gene Ontology (GO) in particular, have
been fundamental components of an important information integration effort started in
2000 with the ambitious goal of building a tool for the unification of biology and beyond. My PhD
project, standing on the shoulders of those initiatives, has been focused on the development of a
human-readable knowledgebase system that hopefully would facilitate exploitation of biological
experimental data. This resource relies on information extracted from many databases, mostly
manually curated, and uses an ontology of human diseases (i.e. the ‘Disease Ontology’) as a
backbone of the system. The objective is to provide some support to the scientific biomedical
community in the interpretation of data on human diseases and their correlated genes, possibly
delivering information on available interacting drugs.
To test the system while evaluating its value, a real research case was investigated in the
second part of my PhD work.
Functional analysis of inherently complex high-throughput data sources for systems biology
(e.g. microarray) is a fundamental step to understand mechanisms regulating molecular
processes modulated in diseases and pathological states. Nonetheless, advances at any level
relevant to disease understanding and drug discovery for psychiatric disorders in recent years
have been relatively unsuccessful compared with other areas. Therefore, a suitable
computational strategy sustained by the newly developed resource was designed to allow
investigation of the involvement in dendritic plasticity of specific disease genes, their
mechanisms of action and the available drugs they are known to interact with.
Dendritic plasticity, an important component of the central nervous system function during
development, has been recently postulated to be strongly involved in pathogenesis of psychiatric
diseases. The concept of plasticity spans a broad spectrum from describing clinical features of
behavior/learning and memory down to the molecular mechanisms by which neurons create and
lose synapse connections between one another.
The chosen approach allowed the semi-automated identification of a great number of genes
involved in plasticity mechanisms at the molecular level. At the same time, it also allowed a
preliminary validation of the newly developed Disease Ontology Knowledgebase and an
evaluation of its potential.
Abstract (Italian version)
In recent years, the production and distribution of scientific data have grown
exponentially. PubMed, a service of the U.S. National Library of Medicine that now includes
over 18 million citations drawn from Medline, increases its content by about
40,000 abstracts from scientific or biomedical publications every month.
The advent of the information age has made it possible to accumulate and disseminate enormous
quantities of data, but our ability to derive knowledge from them has remained constant. Our
inferences from observation often rely on natural language, which
is rarely adequate. Moreover, biomedical research is characterised by highly
specialised disciplines that rarely communicate or share resources.
All these aspects draw attention to the real need to integrate
information, a general concept with many definitions. In the context of my PhD,
integration means the process by which data can be exchanged,
interpreted and manipulated while remaining comprehensible to those who use the system.
Biomedical ontologies in general, and the Gene Ontology in particular, have been a
fundamental component of an important effort to integrate biological information,
started in 2000 with the ambitious goal of developing a tool for
the unification of biology and beyond. My PhD project, alongside this
initiative, focused on the development of a particular type of database (a knowledgebase) that
can facilitate the exploration of specific experimental data.
The system is built on information extracted from numerous data sources, for the most
part manually curated, using an ontology of human diseases
(the Disease Ontology) as its backbone. The aim is to provide support to the biomedical scientific
community in interpreting data on human diseases, the genes that can be linked to them and the
drugs able to treat them.
In the second part of the PhD, a specific research topic was investigated in depth
to test the system and assess its real possibilities.
Functional analysis of complex data produced with high-throughput technologies such as
microarrays is fundamental to understanding the mechanisms that regulate the molecular
processes involved in pathological states. However, despite the availability of valid
investigative tools, the field of psychiatric diseases has not seen the same significant
progress in understanding pathological mechanisms that has been achieved in other
research areas.
Therefore, a suitable computational strategy, combined with the recently developed resource
that is the subject of this work, was designed to allow an investigation of the involvement of
specific genes, mechanisms and drugs in the cause or treatment of psychiatric disease.
Dendritic plasticity is an important component of central nervous system
function during development, and it has recently been postulated to be
strongly involved in the pathogenesis of diseases of the central nervous system.
The concept of plasticity spans a broad spectrum, from clinical features describing
aspects of behaviour, learning and memory down to the molecular mechanisms
by which neurons create or lose their synapses.
The chosen strategy made it possible to identify semi-automatically a large number
of genes involved at the molecular level in the mechanism of dendritic plasticity and, at the
same time, allowed a preliminary and partial verification of the quality and
potential of the knowledgebase developed.
Acknowledgements
My first thank you goes to Prof. Giorgio Valle for the opportunity he offered me to do this
PhD. For their collaboration and help in the development of the database I wish to thank Dr.
Erika Feltrin and Dr. Alessandro Albiero. I would also like to thank Dr. Andrea Telatin for the
preliminary database interface.
Another well-deserved thank you goes to Chris Hastwell, who always supported me and this
project with unconditional trust.
I would like to thank my wife Laura for her incomparable encouragement in any
circumstance, my children Giorgia and Tommaso for their perseverance in reminding me of
real-life priorities, and my mother Maly for having always been there to help me when needed.
Finally, I want to remember and thank my father, who shared the beginning, but sadly
not the end, of this period of my life.
Contents
1 Introduction
1.1 Data integration and ontologies
1.1.1 The word ‘ontology’
1.1.2 Ontologies in modern research
1.2 Open Biomedical Ontologies
1.3 The Gene Ontology project
1.4 Gene Ontology Annotation (GOA)
2 Disease Ontology Knowledgebase
2.1 Introduction
2.2 Basic resources
2.2.1 Disease Ontology
2.2.2 Online Mendelian Inheritance in Man (OMIM)
2.2.3 Genetic Association Database
2.2.4 DrugBank
2.2.5 PharmGKB
2.3 Methods and Results
2.3.1 Retrieval and integration of data
2.3.2 Acquisition of disease name synonyms
2.3.3 Gene annotation findings and gene-disease relationships
2.3.4 Compilation of the drug dictionary
2.3.5 Finding correlations between drugs and diseases
2.3.6 Identifying relationships between drugs and target genes
2.4 Possible applications
3 Case study
3.1 Introduction
3.2 Background
3.2.1 Neuropsychiatric disorders and mechanisms of regulation
3.2.2 Dendritic plasticity
3.2.3 Factors influencing dendritic plasticity
3.3 Methods and Results (Part I)
3.3.1 Selection of query terms and creation of a gene list
3.3.2 Annotation and selection of the Dendritic Plasticity gene dataset
3.3.3 Identification of the correlations gene-disease
3.3.4 Data validation using alternative methods
3.4 Methods and Results (Part II)
3.4.1 Introduction to pathway and network analyses
3.4.2 Databases and tools for Pathway/Network analysis
3.4.3 Canonical pathways analysis
3.4.4 Gene networks
3.4.5 Drug/nutraceutical interactors
3.5 Discussion
4 Conclusions
A Dendritic Plasticity gene dataset
B Over-represented disease genes
C List of abbreviations
Bibliography
Chapter 1
Introduction
This document is divided into four chapters. The first is an introduction to ontologies as
they are intended and used in the context of computer science and biomedical research. Chapter
2 focuses on the annotation of the Disease Ontology and the development of the Disease
Ontology Knowledgebase. Chapter 3 concerns the validation of the system; a research case study
focused on neuropsychiatric diseases is described in some detail. Conclusions are presented in
the final chapter of the document.
1.1 Data integration and ontologies
The exponential increase of data-based information, owing to fast biotechnological advances
and to high-throughput technologies, together with the arrival of the World Wide Web as a new
means of data exchange, has made it more complex and difficult to ascertain the biological
meaning contained in the heterogeneous biological data available to the scientific community.
Moreover, the huge amounts of information that are now produced on a daily basis require more
advanced management solutions, and the availability of the web as a modern infrastructure for
scientific exchange has created new requirements with respect to data accessibility [1].
Concurrently, in the era of genome-scale biology, the aggregation of biological data is
followed by the distributed proliferation of biology-oriented databases [2]. Therefore, to make
the most effective use of such databases and the knowledge they incorporate, different kinds of
information from different sources must be merged in ways that make sense to life scientists. In
that respect, the consolidation of data from existing databases has long been acknowledged
as a significant component of life science studies, and different technologies and approaches
to data integration have been pursued over the past decade [1].
A major component of the integration effort is the development and use of annotation
criteria such as ontologies.
1.1.1 The word ‘ontology’
The word 'ontology' derives from the Greek ontos (being) and logos (word), and its
conceptual origin can be traced back to early philosophers, who studied the theory
of objects and their ties for centuries. In philosophy, ontology is the name of the discipline
that tries to describe reality.
The term 'ontology', however, is still disputed, since different people have different ideas
about its significance and definition in different linguistic contexts. The first formal and
explicit approach to ontologies in the technical (not philosophical) sense goes back to 1900 and
was given by Husserl. Later, in the 1980s, ontologies entered the computer science domain as a
way to offer a simplified and clear view of a particular field of interest.
There is some consensus on what an ontology is not: it is not a taxonomy (i.e. just a
class-subclass hierarchy), a dictionary (an ontology includes relationships between terms), nor a
knowledgebase that includes individual objects. According to Gruber, an ontology is 'the
specification of conceptualisations, used to help programs and humans share knowledge' [3].
Today ontologies are more formalised conceptual models used in computer science,
database integration, and artificial intelligence and they make accessible a common terminology,
across a domain, necessary for communication between people and organisations. They provide
the foundation for interoperability between systems. They can be used to make the content in
information sources explicit and serve well as an index to a repository of information [4].
1.1.2 Ontologies in modern research
Many decades ago, the main drive of bioinformatics was to store, retrieve and analyse the
data created by life scientists, such as nucleotide sequences and protein structures. At that
time, the limited quantity of data acquired by biological investigators required only elementary
systems for their management, organisation and analysis. However, the advent of the genome
sequencing projects, high-throughput experiments, and other techniques gave rise to a huge
amount of data that needed to be analysed. Today, bioinformatics systems have to deal with
once inconceivable quantities of complex information, unmanageable for a scientist without
advanced knowledge of management and information processing tools [5]. Such data are growing
at an exponential rate, but the knowledge contained in them is not maturing at an equivalent
pace. There are different reasons for this shortfall in productive knowledge; the most
significant is that biological phenomena can be described in many different ways [6] and this
complexity has not been tackled semantically. This means that life scientists are usually left
with a giant realm of information that they cannot access, analyse, or integrate in a sensible way
[7].
The impossibility of drawing on information from the data available adds
pressure to implement standardised and compatible nomenclature in molecular biology. The
central problem is that biomedical scientists gather facts, often recording them in natural
language, and then use that knowledge to make inferences about yet uncharacterised
observations. Because of this, knowledge is extremely heterogeneous. While it is easy to
compare, for instance, nucleic acid or polypeptide sequences between bioinformatics resources,
the knowledge content of these resources is very difficult to compare, both for humans and
computers, because the knowledge is represented in a wide variety of lexical forms [8].
Often in biology, the same word refers to different concepts: for example, the term
'gametogenesis' denotes different processes in mammals and in plants, and a user querying a
database for this concept needs to deal with these terminological and conceptual
incompatibilities. This situation makes it more complicated for a computer to process
information, because it is not able to reason over the data and simply capture the
knowledge content.
Thus, there is an urgent demand for strategies suitable for the representation of biological
knowledge in a formal manner [9]. One possibility for capturing that knowledge within
computational applications and databases in biology lies in the use of ontologies,
which in recent years has driven the maturation of 'bio-ontologies' and promoted a relatively
new area of bioinformatics [10].
An ontology is a 'controlled vocabulary' that provides a way to capture and represent the
knowledge of a domain in a computer-comprehensible way. An ontology describes objects and
the relations between them in a formal way, and has a grammar for using the vocabulary terms
to express something meaningful within a specific domain of interest [11]. The labels used for
the objects and the relationships in an ontological model can provide a language for a
community to talk about the domain being modelled. By agreeing on a particular ontological
representation, a common vocabulary can be used to describe and ultimately analyse data. Such
sharing has obvious benefits because it helps humans to make inferences about a studied
domain.
The data, that are clues for enriching the knowledge about the domain, become much easier
to handle as the same things are referred to in the same manner across the resources in which
those data are stored. If different biological databases use the same ontologies to describe their
data objects, the bio-ontologies can be used to link the databases and retrieve information from
them. Ultimately, since ontologies give well-defined semantics to the knowledge
representation language, machines can make inferences about the facts expressed in that
language [8].
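The idea that a shared ontology lets separate databases be linked and queried together can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term identifiers, the toy is_a hierarchy, the two databases and their record names are invented, and the traversal is only one simple way of exploiting the relations between terms.

```python
# Toy ontology: term id -> (label, parent ids linked by an is_a relation).
# Identifiers, labels and hierarchy are invented for illustration.
ONTOLOGY = {
    "EX:0001": ("biological process", []),
    "EX:0002": ("reproduction", ["EX:0001"]),
    "EX:0003": ("gametogenesis", ["EX:0002"]),
}

# Two hypothetical databases that annotate their records with the same term ids.
DATABASE_A = {"geneA": ["EX:0003"], "geneB": ["EX:0001"]}
DATABASE_B = {"proteinX": ["EX:0002"], "proteinY": ["EX:0003"]}


def ancestors(term_id):
    """Return the term itself plus every term reachable through is_a links."""
    seen = set()
    stack = [term_id]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(ONTOLOGY[current][1])
    return seen


def records_for(term_id, *databases):
    """Collect records annotated with term_id or with any of its descendants."""
    hits = []
    for database in databases:
        for record, terms in database.items():
            if any(term_id in ancestors(t) for t in terms):
                hits.append(record)
    return sorted(hits)


print(records_for("EX:0002", DATABASE_A, DATABASE_B))
```

Because both databases use the same identifiers, a single query for 'reproduction' also retrieves the records annotated with the more specific term 'gametogenesis', which is the kind of inference a plain keyword search cannot make.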
Ontologies are designed for the domain and application that they are intended to support.
However, it is worth pointing out that, for any ontology to be valuable, it has to be defined
following specific rules and assertions. There are several fundamental characteristics that an
ontology must possess to be considered complete and ready to be widely used [12]:
• Completeness: ontologies are designed to capture the maximum quantity of relevant
concepts for the domain they represent;
• Formalism: ontologies are built using mathematical formalisms, making them readable by
computers;
• Understandability (by humans): ontologies are built using natural language terms, making
them accessible to scientists;
• Freedom: ontologies aim to represent conceptual domains independently of any specific
use or implementation [13].
Figure 1.1: Interplay between ontologies, biology,
computer science and philosophy. Molecular biologists
discover facts that need to be organised and stored in
databases. Computer scientists provide techniques for data
representation and manipulation. Philosophers and
linguists help in organising the meaning behind database
labels [14].
Therefore, the development of an ontology requires in-depth subject knowledge, computer
science skills to provide techniques for data representation and manipulation, and an
understanding of philosophy and linguistics to organise the semantics behind data labels. The
interplay between all these disciplines is illustrated in figure 1.1.
Finally, it is worth pointing out that an ontology aiming to be of public interest has to be
widely acknowledged by the community of the specific domain that it tries to represent.
Furthermore, the entire scientific community needs to be strongly involved in the improvement
of a newly created ontology and is expected to promote the concept that only a single ontology
for each area should be placed in the public domain.
The number and variety of ontologies will most probably grow in the coming years. They have
widely proven to be useful tools not only for successfully integrating different resources
but also for creating knowledge and making predictions [15]. It has been demonstrated, for
instance, that functional annotation of new sequences based on sequence similarity is not
optimal [16], while semantic methods based on ontologies and applied to the same task can
represent a real improvement [17].
1.2 Open Biomedical Ontologies
The Open Biomedical Ontologies Foundry¹ initiative gave shape to some of the principles
described above as relevant for an ontology to be of general interest, such as being widely
disseminated and accepted among users of the field that it aims to describe. The OBO Foundry is
a collaborative experiment involving developers of science-based ontologies who are
establishing a set of principles for ontology development. The goal is to create a suite of
orthogonal interoperable reference ontologies for the biomedical domain. It has been, and still
is, a strong community effort devoted to ensuring wide ontological coverage on the one hand and
to avoiding duplication of activities on the other. Some of the many OBO Foundry candidate
ontologies are reported in Table 1.2.
The aim of this initiative, focused on object-level questions, is to represent in an exhaustive
way the proteins, organisms, diseases or drug interactions that are of primary interest in
biomedical research [18].
¹ http://obofoundry.org
Table 1.2: Summary of some of the many groups developing ontologies who have expressed an interest in
the OBO Foundry goal.
As a tangible result, the Open Biomedical Ontologies (OBO) library is now a unique
collection of controlled vocabularies shared across different biological and medical domains that
forms the basis of the OBO Foundry. The main role of OBO is to be the reference resource for
ontologies in the biological science domain. It is supported by the NIH Roadmap National
Center for Biomedical Ontology (NCBO) through its BioPortal and is continually kept up to
date by ontology developers. There are currently over 60 life-science ontologies lodged in
OBO, covering domains such as anatomy, development and phenotype, genomic and proteomic
information and taxonomic information. All of them use a range of different attributes to
describe their respective biological domains.
There are many resources available under the OBO umbrella, and most of these are shown
in figure 1.3, in which the OBO ontologies are roughly arranged along a spectrum from genotype to
phenotype. To be included in the OBO Foundry, an ontology has to be developed following a set of
principles that are used to give coherence to wider ontological efforts across the community:
• Openness: ontologies must be available to all, without any constraint or license on their
use and it is only asked that users acknowledge the original source. This encourages usage
and community buy-in and effort;
• Common representation: this is either the OBO format² or the Web Ontology Language
(OWL)³. This provides common access via open tools and offers common semantics for
knowledge representation;
• Independence: lack of redundancy across separate ontologies encourages combinatorial
re-use of ontologies and the interlinking of ontologies via relationships;
• Identifiers: each term should have a semantic-free identifier, the first part of which refers
to the originating ontology. This promotes easy management;
• Natural language definitions: terms themselves are often ambiguous, even in the context of
their ontology, and a definition helps ensure appropriate interpretation. Thus, the terms in
each ontology must have a proper textual definition explaining clearly the exact meaning
of the concept within the context of a particular ontology.
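Two of the principles above, identifiers and natural language definitions, lend themselves to an automatic check. The following is a minimal sketch under stated assumptions: the stanza is a hypothetical term written in the tag-value style of the OBO flat format, the 'EX' prefix and the identifier pattern are invented for illustration, and the parser handles only simple 'tag: value' lines rather than the full format.

```python
import re

# A hypothetical term stanza in the tag-value style of the OBO flat format.
STANZA = '''[Term]
id: EX:0000003
name: gametogenesis
def: "Illustrative definition text." [EX:curator]
is_a: EX:0000002
'''

# Assumed identifier shape: an ontology prefix, a colon, then digits only.
ID_PATTERN = re.compile(r"^([A-Za-z_]+):(\d+)$")


def parse_stanza(text):
    """Parse 'tag: value' lines into a dict of tag -> list of values."""
    tags = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and the stanza header
        tag, _, value = line.partition(": ")
        tags.setdefault(tag, []).append(value)
    return tags


def check_term(tags, expected_prefix):
    """Return a list of principle violations found in one parsed term."""
    problems = []
    match = ID_PATTERN.match(tags.get("id", [""])[0])
    if not match:
        problems.append("identifier is not of the form PREFIX:digits")
    elif match.group(1) != expected_prefix:
        problems.append("identifier prefix does not name the originating ontology")
    if not tags.get("def"):
        problems.append("missing natural language definition")
    return problems


print(check_term(parse_stanza(STANZA), "EX"))
```

The identifier carries no meaning beyond its prefix, so a term can be renamed or redefined without breaking the annotations that refer to it.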
² http://www.geneontology.org/GO.format.shtml#oboflat
³ http://www.w3.org/TR/owl-features/
Figure 1.3: The OBO ontologies arranged on a spectrum of genotype to phenotype, according
to their main domain [8].
1.3 The Gene Ontology project
The Gene Ontology⁴ (GO) project began in 1998 as a collaborative effort between three
model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD)
and the Mouse Genome Informatics (MGI) project [21]. Since then, many databases have joined
the GO Consortium including several of the world's major repositories for plant, animal and
microbial genomes.
Nowadays, the GO is the most successful OBO ontology and it is used in several studies
including expression profile analysis and proteomic studies to extract additional knowledge
from the huge amount of data available.
⁴ http://www.geneontology.org
The GO project took its first step from the observation that a large fraction of the genes
identified by genomic sequencing and specifying the core biological functions are shared by all
organisms.
At the moment, many robust methods are available for the automated transfer of biological
annotations from experimentally tractable model organisms to others, based on gene and
protein sequence similarity. The accumulated knowledge can often be transferred across
organisms, but there is a wide range of hurdles to overcome. First, the current system of
nomenclature for genes and their products is not applied consistently: even when an underlying
similarity between two genes can be appreciated, experts are not always confident of using the
right nomenclature. Secondly, the lack of interoperability between genomic databases limits
the use of their content. The Gene Ontology project was formed to help
overcome these major barriers.
The GO project has three main goals:
i) To develop and maintain a set of controlled and structured vocabularies, or ontologies
[22, 23], for the description of genes and gene products
ii) To use these vocabularies to annotate genes and gene products in biological databases
from as many species as possible
iii) To provide a public resource allowing access to ontologies, to gene annotation files and
to specific tools developed to utilise all GO data [24]
1.4 Gene Ontology Annotation (GOA)
Data annotation is primarily carried out in species-specific database resources, such as
Mouse Genome Informatics and FlyBase, and in multispecies resources such as UniProt. The
complete list of contributing database groups and the total numbers of annotations are listed on
the GO web page⁵. Among such contributors is the GOA group located at the European
Bioinformatics Institute (EBI)⁶.
The Gene Ontology Annotation (GOA)⁷ project aims to provide high-quality GO annotations
to proteins of the UniProt Knowledgebase (UniProtKB)⁸ and the International Protein Index
(IPI)⁹. It is also a central dataset for other major multi-species databases such as Ensembl¹⁰ and
NCBI¹¹.
GOA has been a member of the GO Consortium since 2001, and is responsible for the
integration and release of GO annotations to the human, chicken and cow proteomes. GOA is
also committed to the comprehensive annotation of a set of disease-related proteins in human.
High-quality GO annotations are generated through a combination of electronic and manual
techniques, the latter being accomplished by expert biologists.
By annotating all characterised proteins with GO terms and facilitating the transfer of this
knowledge to similar uncharacterised proteins, the UniProt group will make a valuable
contribution to biological and biotechnological research through a better understanding of all
proteomes.
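The transfer of annotations from characterised to similar uncharacterised proteins can be pictured with a small sketch. It is not the GOA pipeline: the protein names, term identifiers, similarity scores and the 0.8 threshold are all hypothetical, and the 'electronic' label simply marks a proposal as computationally derived so that it stays distinguishable from manually curated annotations.

```python
# Hypothetical curated annotations: characterised protein -> set of term ids.
CURATED = {"P1": {"EX:0003"}, "P2": {"EX:0002"}}

# Hypothetical similarity scores (0..1) between an uncharacterised query
# protein and a characterised reference protein.
SIMILARITY = {("Q1", "P1"): 0.92, ("Q1", "P2"): 0.40, ("Q2", "P2"): 0.55}


def transfer_annotations(curated, similarity, threshold=0.8):
    """Propose annotations for query proteins from sufficiently similar references."""
    proposed = {}
    for (query, reference), score in similarity.items():
        if score >= threshold and reference in curated:
            for term in curated[reference]:
                # Record provenance: the term, how it was obtained, and from whom.
                proposed.setdefault(query, set()).add((term, "electronic", reference))
    return proposed


print(transfer_annotations(CURATED, SIMILARITY))
```

Keeping the provenance with each proposed annotation reflects the distinction made above between electronic and manual techniques, and lets a curator review transferred terms before they are trusted.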
⁵ http://www.geneontology.org/GO.current.annotations.shtml
⁶ http://ebi.ac.uk/
⁷ http://www.ebi.ac.uk/GOA/
⁸ http://www.ebi.ac.uk/uniprot/index.html
⁹ http://www.ebi.ac.uk/IPI
¹⁰ http://www.ensembl.org/
¹¹ http://www.ncbi.nlm.nih.gov/
Chapter 2
Disease Ontology Knowledgebase
This second chapter describes in some detail the development of the Disease Ontology
Knowledgebase, a computational resource that represents relations between genes, drugs
and diseases to help understand disease mechanisms. The data sources are described in the
first part. A section then follows on how the data were collected and organised in the database.
The last part suggests possible applications that make full use of the system.
2.1 Introduction
In recent years, the development and implementation of high-throughput functional genomic
technologies have resulted in the rapid accumulation of genome-scale data sets. Simultaneously,
linkage analysis and association studies that identify disease-associated genes are generating
increasingly large candidate gene sets that need to be exploited. It remains, however, a difficult
task to identify the most likely gene-disease relationships, since the etiology of most chronic
diseases involves the interaction of environmental factors and genes that modulate important
biological processes [25]. This is further complicated by the poorly understood molecular
mechanisms underlying the correlation between chemicals and diseases.
An additional limitation is that scientists working in different research fields are currently
hampered by the specialisation of their technical language. For example, a physician trying to
collect information on gene products correlated to ‘Epilepsy’ might find that the same genes are
also relevant for 'Febrile Seizure' and 'Unverricht-Lundborg Disease' without knowing that the latter
is just a correct synonym for a type of ‘Epilepsy’. A closely related problem is polysemy,
the ambiguity of an individual word or phrase that can be used in different contexts
with different meanings. As a result, there is a major, continuing need to aggregate and annotate
data on genes, drugs, diseases and their interactions to generate new knowledge.
To make the best use of biological databases, different kinds of information from
different sources must be integrated in ways that make sense to the entire scientific community.
Ontologies offer a valuable means of data integration [2]. Following the example of the Gene
Ontology Annotation (GOA) project, our goal is to classify and represent gene-drug,
gene-disease or gene-drug-disease associations in a standardised way using ontologies. The idea
is to annotate genes related to disorders and genes regulated by drug treatment using the terms
of the Disease Ontology (DO), with the final objective of building a knowledge base of genes,
drugs and targets to help the investigation of the molecular processes relevant to diseases.
2.2 Basic resources
Many data sources focused on genes, drugs and diseases, such as ontologies and
specialised databases, were evaluated to develop the knowledgebase, and five were selected:
three to build a vocabulary of disease names, and two to develop an equivalent dictionary of
drug names.
2.2.1 Disease Ontology
The Disease Ontology1 (DO) is a controlled medical vocabulary modelled on the GO
structure and developed at the Bioinformatics Core Facility, in collaboration with the NuGene
Project, at the Center for Genetic Medicine (Chicago, US). It was designed to facilitate the
mapping of diseases and associated conditions to particular medical codes such as ICD9CM2,
1http://diseaseontology.sourceforge.net/#projects
2The International Classification of Diseases, Ninth Revision, Clinical Modification is the official system
for the classification of disease entries and of the diagnostic and therapeutic procedures associated with
hospital utilisation in the US.
SNOMED3 and others. The Disease Ontology is implemented as a directed acyclic graph (DAG)
and utilises the Unified Medical Language System (UMLS)4 [26]. Based on this standard, much
of the process of updating the ontology can be handled more easily. In a manner similar to the
GO curation process and open development, the ontology is continually extended and revised in
order to broadly encompass diseases. The DO is available in OBO format and can be readily
edited and viewed using the OBO-Edit tool. Figure 2.1 shows an OBO-Edit screenshot of the
Disease Ontology version 3.
Figure 2.1: Screenshot of DO using OBO-Edit version 1.1.
3 Systematised Nomenclature of Medicine-Clinical Terms is a standardised vocabulary system that
creates a common clinical language for medical databases. Current modules contain more than 357,000
concepts.
4The UMLS contains a metathesaurus of medical concepts and a semantic network. It is intended to
be used mainly by developers of systems in medical informatics and it provides facilities for natural
language processing.
The previous version (v2.1) of the Disease Ontology was almost entirely based on ICD9CM, with
some additional concepts useful for mapping common diseases. The newest version (v3) is based
primarily on freely available vocabularies. For this project the latest available version (v3),
containing 12,448 concept nodes, was downloaded from the SourceForge5 home page and used.
Among other reasons (e.g. it is an OBO ontology), this ontology was chosen because, unlike the
GO, it had never been used for gene annotation, and consequently no annotation projects had yet
been initiated. Moreover, as already mentioned in the introduction, there were several objective
advantages in using ontologies for our data integration project:
• Through their semantic-free identifiers (unique IDs), ontologies allow quick linkage to
other resources that already make use of their notation system (e.g. GAD database);
• Terms in natural language are often ambiguous, even in the context of their ontologies; the
hierarchical definition structure (DAG), however, ensures appropriate interpretation
(Figure 2.2);
• Finally, and most importantly for computational projects, ontologies are machine
processable.
Figure 2.2: Genes annotated with the child concept 'synovitis' can be
transitively annotated with the parent terms 'disorder of tendon' and
'rheumatism' (on the right). This is not possible outside a DAG structure (on
the left).
5http://sourceforge.net/project/showfiles.php?group_id79168&package_id=202115&release_id=508426
2.2.2 Online Mendelian Inheritance in Man (OMIM)
The Online Mendelian Inheritance in Man (OMIM)6 is a comprehensive, authoritative and
regularly updated knowledgebase of human genes and genetic disorders compiled to support
human genetics research, education and the practice of clinical genetics [27].
OMIM data are organised in two different files: the 'Gene Map' and the 'Morbid Map' files,
both available at the OMIM project FTP site7. The OMIM Gene Map is a single file, in tabular
format, listing genes that are described in the database. Not all OMIM entries are included in the
Gene Map, but only those for which a cytogenetic location has been published in the cited
references. Each entry is a list of fields such as gene location, gene symbol, MIM number,
disorders and reference. The OMIM Morbid Map is an alphabetical list of diseases used in the
database and their corresponding cytogenetic locations.
2.2.3 Genetic Association Database
The Genetic Association Database8 (GAD) is a publicly available, NIH-based database of
published gene-based genetic association studies, containing records of over 5,000 human
genetic association studies. The database is centred on genes and provides a standardised
molecular nomenclature by including official HUGO gene symbols. Each record refers to a gene
or a marker and is annotated with links to molecular databases (e.g. LocusLink, GeneCards) and
reference databases (e.g. PubMed, CDC) [28]. The goal of GAD is to allow rapid identification of
6http://www.ncbi.nlm.nih.gov/omim/
7ftp://ftp.ncbi.nih.gov/repository/OMIM/
8http://geneticassociationdb.nih.gov
medically relevant polymorphisms from a large volume of mutational data.
GAD contains several data fields collected from genetic association studies, such as
disease, phenotypes, sample size and allele descriptions (Fig. 2.3). Of particular interest to the
Disease Ontology Knowledgebase project are the disease data fields. A top-level 'disease
class' is assigned, followed by the 'disease' specification from the original paper. Then there is
the 'Broad (or Narrow) Phenotype' disease class, which is assigned if studies recognise clinical
subphenotypes, and finally there are the MeSH Disease Terms. The full list of
diseases/phenotypes available in GAD can be freely retrieved9.
In addition, the OMIM gene field links each official HUGO gene name in GAD to an OMIM ID.
This database was selected as a valuable external resource because it is based on manual
curation and therefore provides an excellent baseline for constructing our knowledgebase. A
relatively large community of experts, registered in an ad hoc list, contributes to the GAD
curation process. Anyone with expertise in a specific disease, a specific gene or a related area,
such as disease-specific or gene-specific data collections, is invited to join the list.
Figure 2.3: A simple search of associations for the disease schizophrenia. Fields in
this view include Official Gene Symbol, Disease Phenotype, Disease class, OMIM
ID, MeSH Disease term.
9http://geneticassociationdb.nih.gov/diseaselist.html
2.2.4 DrugBank
The DrugBank10 is a unique bioinformatics and cheminformatics resource combining
detailed drug data (i.e. chemical, pharmacological and pharmaceutical) with comprehensive
drug target information (i.e. sequence, structure, and pathways) [29]. It includes physical
property data, structure and image files, pharmacological and physiological data on thousands
of drug products as well as extensive molecular biological information about their corresponding
drug targets.
Each DrugCard contains more than 80 data fields with half of the information being
dedicated to drug/chemical data and the other half to drug target or protein data. Each entry is
created and formatted by one member of the curation team and then separately validated by a
second member of the same team that guarantees quality and completeness. Drug targets and
drug structures are accurately confirmed by using multiple data sources (e.g. PubMed, RxList,
PharmGKB, KEGG, PubChem).
Owing in particular to its extensive manual curation, the high-quality data collected in
DrugBank were partly integrated into our DO Knowledgebase.
2.2.5 PharmGKB
The Pharmacogenetics and Pharmacogenomics Knowledge Base11 (PharmGKB) is a public
resource that contains genomic, phenotype and clinical information collected from ongoing
research and from the literature [30]. It is devoted to cataloguing information about
pharmacogenes, which are genes involved in modulating the response to drugs [31].
Pharmacogenes are either involved in the pharmacokinetics (PK) of a drug (how the drug is
absorbed, distributed, metabolised and eliminated) or the pharmacodynamics (PD) of a drug
(how the drug acts on its target and its mechanisms of action).
The aim of PharmGKB is to capture the relationships between drugs, diseases/phenotypes
and genes from several types of information such as literature annotations, primary data sets,
PK and PD pathways, and expert-generated summaries of PK/PD relationships [32].
10http://redpoll.pharmacy.ualberta.ca/drugbank/index.html
11http://www.pharmgkb.org/index.jsp
Figure 2.4: Some of the relationships among data objects in PharmGKB.
Today, PharmGKB has curated evidence for nearly 2,000 genes involved in drug response.
There are 545 drugs with associated phenotype and genotype data or literature annotations,
57 manually created drug-centred pathways, 542 diseases with supporting information and
more than 2,100 literature annotations (Figure 2.4). The scientific community contributes to
growing the database content by providing information about gene-drug, gene-disease or
gene-drug-disease associations, as well as the available evidence for the associations. Submitted
data are internally curated to avoid possible inconsistencies.
2.3 Methods and Results
The approach used to create the Disease Ontology Knowledgebase combines automated and
manual curation to address two principal tasks: i) extracting gene, disease and drug data from
selected sources and ii) characterising relationships using several complementary strategies.
The method was divided into four phases:
• Phase 1: acquisition and integration of data from the external resources;
• Phase 2: compilation of two vocabularies, one of disease names and disease synonyms, and
another of drug names and drug synonyms;
• Phase 3: association of diseases, genes and drugs to DO terms based on automated and
manually curated approaches;
• Phase 4: design and implementation of a MySQL database.
Several data-format problems were encountered and solved during the development of our
resource. To parse the data and pull out all the relevant information available, many software
routines were designed on a case-by-case basis. The information was then completely reorganised
to make it easily accessible to the newly developed query tool and to allow easy maintenance
and updating.
2.3.1 Retrieval and integration of data
The main initial effort focused on retrieving relevant data on genes, drugs, diseases and
their inter-relationships from every external database selected. This task was
complicated by the many differences in information content and data format among those
resources. Different approaches were therefore adopted to standardise the files and make them
easily accessible.
The newest revision of the Disease Ontology version 3 text file1 (revision 21) was downloaded
and parsed to extract disease names and synonyms with the corresponding DO identifiers, leaving
out the 'temp holding' and the 'obsolete' terms. As already mentioned, terms in the DO are
structured as a DAG: a parent term can be linked to more than one child term and, in turn, a
child term can have more than one parent. A Perl script was developed to navigate the data
structure and draw inferences from selected terms, going down through the descendants or up
through the ancestors of a given node and taking account of multiple paths.
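The navigation described above can be illustrated with a minimal Python sketch. It is not the Perl script used in this work: the function names are hypothetical, the edges for 'synovitis' follow the caption of Figure 2.2, and the root term 'musculoskeletal disorder' is invented purely to create two converging paths.

```python
from collections import defaultdict


def build_children(parents):
    """Invert a child -> parents map into a parent -> children map."""
    children = defaultdict(set)
    for child, parent_set in parents.items():
        for parent in parent_set:
            children[parent].add(child)
    return children


def reachable(start, edges):
    """Return every node reachable from start by following edges.

    The 'seen' set guarantees that a node reached through several
    paths of the DAG is collected and expanded only once.
    """
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Illustrative child -> parents edges; the root term is an assumption.
PARENTS = {
    "synovitis": {"disorder of tendon", "rheumatism"},
    "disorder of tendon": {"musculoskeletal disorder"},
    "rheumatism": {"musculoskeletal disorder"},
}
CHILDREN = build_children(PARENTS)

ancestors = reachable("synovitis", PARENTS)
descendants = reachable("musculoskeletal disorder", CHILDREN)
```

With such a traversal, a gene annotated to 'synovitis' can be transitively annotated to every term in `ancestors`, even though the root is reachable through two different parents.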
The PharmGKB provides access to a selected subset of data via a SOAP interface and
documentation. The sample client code and the client programs are freely available and can be
downloaded from the home page2. Several Perl scripts have been combined by authors in order
to allow extraction of different types of information from the PharmGKB knowledgebase. In
1http://sourceforge.net/project/showfiles.php?group_id79168&package_id=202115&release_id=508426
2http://www.pharmgkb.org/home/projects/webservices/index.jsp
particular, the specialSearch.pl script was run with option ‘6’ to obtain all diseases with
supporting information. The results were parsed and passed as input to the disease.pl script to
obtain information about all related genes and drugs for each disease. Finally, the drugs.pl and
genes.pl scripts were used to retrieve information on each individual drug and gene. In addition,
when available, the drug chemical structure was collected and integrated into our knowledgebase.
To download the complete database as tab-delimited text files, GAD required us to
manually fill in a user request form with personal credentials.
Since every GAD entry is described by several attributes that can be sub-selected, filters
were applied to extract only the fields relevant to our project, such as Broad Phenotype, Disease
Class, MeSH Disease Term, Gene, Gene Name and OMIM ID.
The OMIM Morbid Map was used to extract additional information on disorders and the genes
involved in them, starting from the assigned OMIM ID.
Finally, since DrugBank is a freely available resource, a full set of DrugBank Approved
DrugCards was downloaded3 in a single flat file and used as a source for drug names, synonyms,
and gene target symbols.
For each database, a list of all diseases was gathered and used for the compilation of the
disease dictionary. Then, association data for genes and diseases were extracted from each
resource and successively used for the gene annotation process.
2.3.2 Acquisition of disease name synonyms
One issue that had to be addressed was the existence of disease synonyms, whereby the same
disease is described under different names. Genes known to be associated to the same disease
are also often annotated to different synonyms, causing retrieval problems or incompleteness.
To solve this problem, the external resources GAD, PharmGKB and OMIM were carefully parsed
to provide an additional set of disease synonyms.
The strategy used to associate DO terms with disease names was based on the
combination of an automated association process (i.e. a comparison algorithm) with very
time-consuming manual curation. The former produced an initial, relatively low-quality set of
3http://redpoll.pharmacy.ualberta.ca/drugbank/cgi-bin/download.cgi
associations derived without human intervention, whereas the latter raised the quality to a much
higher standard.
All the data collected were formatted so as to be suitable for any available analysis tool.
The list of disorders included in the DO was used to link the internal ontology to
external resources. Each DO concept was mapped to every other database containing disease
names by running a Perl script specifically designed to perform term-term comparison, identify
overlapping definitions and extract correlated synonyms when available. In order to perform the
comparison across all databases, the script was adapted to handle the different input file
formats. Automated processing followed the principle of reducing false negatives as much as
possible, while accepting limited stringency on false positives.
This initial approach maximised the identification of possible synonyms from the
beginning, leaving accuracy to the manual step. Since the comparison involved standard
definitions rather than natural language in its widest accepted meaning, no sophisticated
learning algorithms were necessary. The automated comparison method applied simple rules to
score the level of identity between sentences, also taking into consideration some of the
semantic content of the composing words when possible. Similar definitions were considered
synonyms if, in the first instance, they satisfied the following condition: I ≥ int(K/2), where
K = T − N (I = identities, T = total number of words, N = words not relevant). Conjunctions,
generic medical words and the order of terms were considered either irrelevant for the
identification of disease synonyms or negatively correlated to the level of identity to be
calculated. The main limitations of this comparison approach were the impossibility of
spotting synonyms when definitions contained different words with the same meaning, and
when completely different definitions of the same disease existed (e.g. depressive disorder and
major depression).
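The scoring rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original Perl script: the list of irrelevant words is hypothetical, K is computed on the DO name (the text does not say on which definition T and N are counted), and the requirement of at least one shared word is our own guard, added because int(K/2) is 0 when K = 1.

```python
import re

# Hypothetical list of words treated as not relevant (N in the rule).
IRRELEVANT = frozenset(
    {"and", "or", "of", "the", "with", "disease", "disorder", "syndrome"}
)


def relevant_words(name):
    """Lower-case a name, split it into words and drop irrelevant ones.

    Returning a set makes the comparison independent of word order.
    """
    words = re.findall(r"[a-z0-9]+", name.lower())
    return {word for word in words if word not in IRRELEVANT}


def is_candidate_synonym(do_name, other_name):
    """Apply I >= int(K/2), with K = T - N counted on the DO name."""
    do_words = relevant_words(do_name)
    other_words = relevant_words(other_name)
    k = len(do_words)                          # K = T - N (assumed on DO name)
    identities = len(do_words & other_words)   # I
    # Guard (our addition): require at least one shared relevant word.
    return identities >= 1 and identities >= k // 2


same_disease = is_candidate_synonym("rheumatoid arthritis", "Arthritis, Rheumatoid")
missed_pair = is_candidate_synonym("depressive disorder", "major depression")
loose_match = is_candidate_synonym("febrile seizure", "seizure")
```

The second call reproduces the limitation noted above: the two names share no word, so the pair is not proposed. The third call shows the deliberately low stringency, which leaves the final decision to manual curation.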
When a DO term was successfully mapped to a disease name present in one of the source
databases, all its synonyms were extracted. The next step was the careful curation
of the results, which also addressed the problem of false positives (Fig. 2.5).
A disease vocabulary was thus created, and almost all the diseases described in the external
databases were associated to at least one DO term with a unique identifier.
Figure 2.5: Overview of the method. The input files of the method are two
lists of disease names and synonyms: one from the Disease Ontology and one from
GAD, OMIM or PharmGKB. In this example, the DO dictionary is augmented by
synonyms provided by PharmGKB. After the initial filtering by the comparison
algorithm, manual curation was applied to correct the result and find
additional associations.
The highest number of exact matches was found between the DO and PharmGKB database.
A total of 2,633 exact matches between these two resources were obtained, e.g. osteoporosis
(DOID: 11476 and GKB: PA445190), and rheumatoid arthritis (DOID: 7148 and GKB:
PA443434).
Table 2.6 recapitulates the number of matches between DO and external databases. Column
A represents the total number of associations generated by the script used for the comparison.
Table 2.6: Results of the comparison between DO and the three resources are reported;
numbers in brackets correspond to total number of terms for each database. Column A: total
number of associations generated by the script used for the comparison. Column B: sum of
totals in column C and D. Column C: total number of identities between the DO name and the
name or synonyms in the other database. Column D: total number of matches found after the
manual curation.
The associations, which included false positives, were redundant and required a curation
process. For instance, the script found and filtered 186,894 possible PharmGKB positive results
that corresponded to 2,976 non-redundant associations. After manual curation of this large set
of almost 3,000 entries, 2,866 were confirmed as correctly matched (column B). The total
matches are the sum of the matches in columns C and D. Column C shows the
identities between the DO name and the name or synonyms in the other database; column D
shows the matches found after manual curation. The highest global overlap (71.68%) was found
between the DO and the PharmGKB database. The DO terms associated to the highest levels of the
ontology hierarchy were easily spotted in all databases, e.g. osteoporosis (DOID: 11476;
PharmGKB: PA445190; OMIM: 166710). The low-level DO terms, which refer to more specific
disease classes, were nevertheless found in at least one external database.
2.3.3 Gene annotation findings and gene-disease relationships
Gene information was downloaded from the NCBI FTP site4. The file containing human
gene-based information only was then parsed to collect the data of interest, and the
extracted information was loaded into a MySQL database (Fig. 2.7).
Gene annotation data were also found in the GOA gene association files5, which maintain the
Figure 2.7: The general disease vocabulary has been populated using GAD, OMIM and
PharmGKB data. This vocabulary was used to search in the genetic association file obtained
from GAD. All matches between the two files were collected in a gene annotation file where
genes are associated with one or more DO diseases with a unique DO ID.
4ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
5ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/HUMAN/gene_association.goa_human.gz
GO assignments for the proteins of the non-redundant human proteome set.
Disease names and synonyms from the general disease vocabulary were used to search the
genetic association file produced with data collected from GAD. Matches between the two files
were collated in a general gene annotation file in which genes are associated to one or more DO
diseases, each with a unique DO ID (Fig. 2.7).
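The matching step just described can be sketched as follows. This is a minimal Python illustration, not the code used to build the knowledgebase: the function names and the gene symbols GENE1 to GENE3 are placeholders, the synonym 'arthritis, rheumatoid' is illustrative, and only the three DO identifiers are taken from this chapter.

```python
from collections import defaultdict


def normalise(name):
    """Lower-case a disease name and collapse runs of whitespace."""
    return " ".join(name.lower().split())


def annotate_genes(vocabulary, associations):
    """Map each gene to the DO IDs whose names or synonyms it matches.

    vocabulary:   DO ID -> iterable of disease names and synonyms
    associations: iterable of (gene symbol, disease name) pairs
    """
    name_to_doids = defaultdict(set)
    for doid, names in vocabulary.items():
        for name in names:
            name_to_doids[normalise(name)].add(doid)

    annotations = defaultdict(set)
    for gene, disease in associations:
        for doid in name_to_doids.get(normalise(disease), ()):
            annotations[gene].add(doid)
    return dict(annotations)


# DO IDs as quoted in this chapter; the synonym is illustrative.
VOCABULARY = {
    "DOID:11476": ["osteoporosis"],
    "DOID:7148": ["rheumatoid arthritis", "arthritis, rheumatoid"],
    "DOID:6364": ["migraine"],
}
# Placeholder gene symbols, not real gene-disease associations.
ASSOCIATIONS = [
    ("GENE1", "Osteoporosis"),
    ("GENE1", "Arthritis,  Rheumatoid"),
    ("GENE2", "migraine"),
    ("GENE3", "unlisted condition"),
]

gene_annotations = annotate_genes(VOCABULARY, ASSOCIATIONS)
```

Genes whose disease name is absent from the vocabulary (GENE3 here) receive no annotation, which is where additional synonyms and manual curation become necessary.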
2.3.4 Compilation of the drug dictionary
The list of drugs used in our database was compiled from DrugBank and PharmGKB, the
former being used first.
DrugBank content is based on the 'active principle' or ‘active ingredient set’ of drugs. Because
of the effort required for curation, some drugs remain in the queue of 'to be added' drugs even
though they are publicly available (e.g. nimesulide). Relevant data were extracted from each
DrugCard entry by selecting, among others, the following fields:
• Generic name: standard name of drug as provided by drug manufacturer;
• Brand name and synonym: alternate names of the drug, brand names from different
manufacturers;
• Brand name mixtures: brand names and composition of mixtures that include the drug
described in the DrugCard file;
• Indication: description or common names of diseases that the drug is used to treat;
• Drug target(s) name: name of the protein or macromolecule (or other small molecule) that
the drug is supposed to act upon. Some drugs act on multiple targets, so these fields may
be repeated several times, reflecting the number of drug targets that a specific drug may
have;
• Drug target(s) gene name: gene name of drug target;
• Drug target(s) synonyms: alternate names (protein names, abbreviations, etc.) of the drug
target;
• Other fields such as ChEBI ID, CAS RN and PharmGKB ID.
Starting from the list of generic drug names obtained from DrugBank, Perl scripts were run on
the PharmGKB database to extract additional entries (names or synonyms) and populate a general
dictionary in which each generic drug name is associated to all its possible synonyms and to
the mixtures that include the drug. A mixture might be associated to more than one drug, e.g.
Ana-Kit is a mixture composed of Chlorpheniramine (APRD00001) and Epinephrine
(APRD00450). A total of 1,349 drug active principle names and 24,303 synonyms have been
identified, each drug having an average of 19 associated synonyms (Table 2.8).
Table 2.8: A total of 1,349 drug active principle names with an average of 19 synonyms
have been identified. 216 drugs are from DrugBank, 149 from PharmGKB and 984 are in
common. The total number of synonyms is 24,303 and number of mixture is 253.
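A dictionary of this kind can be sketched with a small Python class. The class and method names are hypothetical and the structure is only one possible layout, not the one implemented in the knowledgebase; the Ana-Kit composition is taken from the text, while the two synonyms are added for illustration.

```python
from collections import defaultdict


class DrugDictionary:
    """Generic drug names with their synonyms and the mixtures containing them."""

    def __init__(self):
        self.synonyms = defaultdict(set)   # generic name -> synonyms
        self.mixtures = defaultdict(set)   # mixture name -> generic names
        self._lookup = {}                  # lower-cased name -> generic name

    def add_drug(self, generic, synonyms=()):
        """Register a generic name and any alternative names for it."""
        self._lookup[generic.lower()] = generic
        self.synonyms[generic].update(synonyms)
        for synonym in synonyms:
            self._lookup[synonym.lower()] = generic

    def add_mixture(self, mixture, components):
        """Register a mixture and the generic drugs it is composed of."""
        self.mixtures[mixture].update(components)

    def resolve(self, name):
        """Return the generic name for a name or synonym, or None."""
        return self._lookup.get(name.lower())

    def mixtures_containing(self, generic):
        """Return the sorted names of all mixtures that include a drug."""
        return sorted(m for m, parts in self.mixtures.items() if generic in parts)


drugs = DrugDictionary()
drugs.add_drug("Chlorpheniramine", ["Chlorphenamine"])   # illustrative synonym
drugs.add_drug("Epinephrine", ["Adrenaline"])            # illustrative synonym
drugs.add_mixture("Ana-Kit", ["Chlorpheniramine", "Epinephrine"])

generic_name = drugs.resolve("adrenaline")
ana_kit_hits = drugs.mixtures_containing("Chlorpheniramine")
```

Keeping mixtures in a separate map reflects the fact that one mixture can point to several generic drugs, whereas each synonym resolves to exactly one.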
2.3.5 Finding correlations between drugs and diseases
The next step was to extract and characterise the relationships between
drugs and diseases. The Indication field in DrugBank provides a description of the possible uses
of a drug for the treatment of specific disorders. Unfortunately, these descriptions do not follow
any standard, as they are written in natural language. Finding drug-disease associations was
therefore further complicated by the large number of possible false positives. To address
this problem, Perl scripts were developed implementing stringency criteria intended to improve
predictivity as much as possible. Nonetheless, the output files required manual curation to
increase the level of accuracy of each drug-disease relationship. As a result, 888 drugs from
DrugBank have been associated to 801 DO diseases (Fig. 2.9).
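One simple way to search free-text indications for vocabulary terms is whole-phrase matching, sketched below in Python. The stringency criteria actually implemented in the Perl scripts are not detailed here, so the word-boundary rule is an assumption, as are the function name and the small vocabulary; the indication text paraphrases the Chlorpheniramine example used in this section.

```python
import re


def match_indication(indication, vocabulary):
    """Return the DO terms whose names or synonyms occur in an indication.

    vocabulary maps a disease name or synonym to its DO term. Matching is
    done on whole words or phrases, so 'arthritis' is not found inside
    'osteoarthritis'.
    """
    text = indication.lower()
    matched = set()
    for name, do_term in vocabulary.items():
        pattern = r"\b" + re.escape(name.lower()) + r"\b"
        if re.search(pattern, text):
            matched.add(do_term)
    return matched


# Illustrative vocabulary: name or synonym -> DO term name.
VOCABULARY = {
    "rhinitis": "rhinitis",
    "urticaria": "urticaria",
    "allergy": "hypersensitivity",   # 'allergy' treated as a synonym
    "common cold": "common cold",
    "asthma": "asthma",
    "hay fever": "hay fever",
    "migraine": "migraine",
}
INDICATION = (
    "For the treatment of rhinitis, urticaria, allergy, common cold, "
    "asthma and hay fever."
)

do_terms = match_indication(INDICATION, VOCABULARY)
no_terms = match_indication("Used in osteoarthritis.", {"arthritis": "arthritis"})
```

Whole-phrase matching removes one obvious class of false positives, but negations and contextual mentions still pass through, which is why manual curation of the output remains necessary.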
Figure 2.9: Example of associations between DO diseases and drugs.
Several drugs were associated to more than one DO term. One example is
Chlorpheniramine (DrugBankID: APRD00001), which is indicated for the treatment of
rhinitis, urticaria, allergy, common cold, asthma and hay fever. As expected, it was correctly
linked to six DO entries by our knowledgebase (Table 2.10). Chlorpheniramine was also
associated to the DO term 'hypersensitivity', as a synonym of the disease name 'allergy'.
Table 2.10: Chlorpheniramine (DrugBankID:APRD00001), used for the treatment of
rhinitis, urticaria, allergy, common cold, asthma and hay fever, has been associated to six
DO entries.
Thus, usage of our knowledgebase confirmed its capacity to retrieve the principal diseases
associated to a specific drug. In a similar manner, different drugs associated to the same disease
can be equally identified (Table 2.11).
Table 2.11: Divalproex and Rizatriptan are both used for the treatment of migraine DOID:
6364.
2.3.6 Identifying relationships between drugs and target genes
Genes or proteins are identified as key disease molecules when they are involved in specific
metabolic or signalling pathways relevant to a given condition or pathology. A protein, however,
might also be considered a key molecule because it is the target of a drug treatment. Its
inhibition, for instance, could block a pathway in the disease state.
Since the DrugBank database provides this type of information, it was used to characterise
and store relationships between drugs and their respective target genes. Each drug is linked by
ID to the relevant entries in the GENE table (Figure 2.12) of the MySQL database, which also
contains the UniProt ID, alternative gene names etc. (Figure 2.13).
Figure 2.12: Example of associations between genes and drugs.
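The ID-based link between drugs and entries of the GENE table can be illustrated with a small relational sketch. Everything here is an assumption made for illustration: the table and column names (`drug`, `gene`, `drug_gene`, `uniprot_id`, `alt_name`) are hypothetical and do not reproduce the actual schema of Figure 2.13, the rows are placeholders rather than real DrugBank data, and SQLite from the Python standard library stands in for MySQL.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical drug/gene schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE drug (drug_id TEXT PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE gene (
            gene_id    INTEGER PRIMARY KEY,
            symbol     TEXT NOT NULL,
            uniprot_id TEXT,
            alt_name   TEXT
        );
        CREATE TABLE drug_gene (
            drug_id TEXT    NOT NULL REFERENCES drug(drug_id),
            gene_id INTEGER NOT NULL REFERENCES gene(gene_id),
            PRIMARY KEY (drug_id, gene_id)
        );
    """)
    # Placeholder rows only; none of these identifiers is real.
    conn.executemany("INSERT INTO drug VALUES (?, ?)",
                     [("DRUG_1", "drug one"), ("DRUG_2", "drug two")])
    conn.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)",
                     [(1, "GENE_A", "UNIPROT_A", "alias a"),
                      (2, "GENE_B", "UNIPROT_B", None)])
    conn.executemany("INSERT INTO drug_gene VALUES (?, ?)",
                     [("DRUG_1", 1), ("DRUG_1", 2), ("DRUG_2", 2)])
    return conn


def targets_of(conn, drug_id):
    """Return (symbol, uniprot_id) pairs for the target genes of one drug."""
    cursor = conn.execute(
        """SELECT g.symbol, g.uniprot_id
           FROM drug_gene AS dg
           JOIN gene AS g ON g.gene_id = dg.gene_id
           WHERE dg.drug_id = ?
           ORDER BY g.symbol""",
        (drug_id,),
    )
    return cursor.fetchall()


def drugs_targeting(conn, symbol):
    """Return the IDs of all drugs linked to a gene symbol."""
    cursor = conn.execute(
        """SELECT d.drug_id
           FROM drug AS d
           JOIN drug_gene AS dg ON dg.drug_id = d.drug_id
           JOIN gene AS g ON g.gene_id = dg.gene_id
           WHERE g.symbol = ?
           ORDER BY d.drug_id""",
        (symbol,),
    )
    return [row[0] for row in cursor]


conn = build_demo_db()
print(targets_of(conn, "DRUG_1"))
print(drugs_targeting(conn, "GENE_B"))
```

A separate association table is the usual way to store a many-to-many relationship, since one drug may have several targets and one gene may be targeted by several drugs.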
2.4 Possible applications
Analysis of differential gene expression is a widely used method for revealing lists of
possible disease candidate genes, which usually range from entries well known in the literature
to completely new ones. Screening such a list in PubMed is neither quick nor easy, even for a
general overview, and the process is further complicated by the existence of synonyms and
common names. For instance, a search for the official gene name BDNF in combination with
‘bipolar’ retrieves 177 PubMed abstracts. This number is reduced to 153 if the extended name
Brain Derived Neurotrophic Factor is used instead. An equivalent search for ALOX12 returns
1 entry only. However, both genes would be quickly associated with bipolar disorder using our
DO Knowledgebase. It is even more difficult to find public domain resources able to correlate
genes with known interacting drugs, which is another useful option of the DO Knowledgebase.
In summary, retrieving useful information on genes, diseases and/or drugs of interest is a
time-consuming and sometimes frustrating job. Our DO Knowledgebase is potentially useful for
working through long lists of expression-derived gene data because it helps to organise
information at a higher level, saving time for detailed and focused analysis.
The aim of the work has been to provide the research community with a tool able to deliver a
quick overview of potential links between genes, drugs and diseases. The DO Knowledgebase
successfully connects those data by making use of several up-to-date, manually curated
resources. As soon as the web interface is available, the user will be able to browse the list of
disease concepts or query the database for disease terms or genes of interest. Intersecting the
gene sets relevant to different disease concepts of particular interest will also be possible. It
is, for instance, widely recognised that people suffering from one specific sub-type of mood
disorder have an increased susceptibility to additional sub-types. With this approach it would
be feasible to answer questions such as: how many genes in a complex expression experiment
are associated with the several known types of depression? And among them, how many are in
common or appear frequently? Once those genes are identified, it is quick and straightforward
with the DO Knowledgebase to verify whether they are associated with any available therapeutic
drugs for mood disorders. Moreover, finding gene-drug relationships would form the basis of
more detailed pharmacogenetic experimental investigations.
Figure 2.13: Schema of the MySQL database.
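The questions raised above (which genes are common to several disease sub-types, and which appear frequently) reduce to set operations once each disease concept is mapped to its gene set. The following is a minimal sketch under that assumption; the sub-type labels and gene identifiers are placeholders, not results from the knowledgebase, and the function names are hypothetical.

```python
from collections import Counter


def common_to_all(disease_to_genes):
    """Genes present in every disease gene set."""
    gene_sets = [set(genes) for genes in disease_to_genes.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


def recurrent_genes(disease_to_genes, min_diseases=2):
    """Genes found in at least `min_diseases` gene sets, with their counts."""
    counts = Counter(
        gene
        for genes in disease_to_genes.values()
        for gene in set(genes)  # count each gene once per disease
    )
    return {gene: n for gene, n in counts.items() if n >= min_diseases}


# Placeholder gene sets for three hypothetical depression sub-types.
disease_to_genes = {
    "subtype_1": {"G1", "G2", "G3"},
    "subtype_2": {"G2", "G3", "G4"},
    "subtype_3": {"G3", "G5"},
}

print(common_to_all(disease_to_genes))
print(recurrent_genes(disease_to_genes, min_diseases=2))
```

The genes returned by either function could then be passed to a gene-drug lookup to check for known therapeutic drugs.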
Chapter 3
Case study
This chapter describes the research case study used to verify the quality and potential of
the Disease Ontology Knowledgebase. An initial introduction is followed by some background
that places the molecular mechanisms of disease in the context of neuropsychiatric disorders.
The central part of the chapter is divided into two sections:
a. DO Knowledgebase analysis and ‘Methods and results’ part one.
b. Pathway/Network analysis and ‘Methods and results’ part two.
The last paragraph is devoted to a discussion of this second part of my PhD activity.
3.1 Introduction
To test the functionality of the Disease Ontology Knowledgebase developed in my PhD
activity, validate the internal consistency of the data and possibly broaden the current state of
knowledge on the subject, genes and mechanisms involved in dendritic plasticity were
investigated, especially in relation to neuropsychiatric diseases.
Evidence that the etiopathology of several cognitive disorders is strongly influenced by the
regulation of plasticity mechanisms at the molecular level has recently started to accumulate.
Many aspects of this research subject are already known in the literature and are useful to
control and verify results; several others, however, remain to be elucidated. A general
description of the multiplicity of molecular aspects involved in many of the disorders is often
lacking, and even disease categorization is based on poorly objective criteria. A detailed
overview of the many aspects implicated, spanning from basic definitions to a description of
state-of-the-art research, is given in the next paragraphs (3.2, 3.2.1, 3.2.2, 3.2.3). This
background is provided to help the reader understand the methods and results described in the
second part of the chapter.
This study does not claim to give exhaustive results on the subject; it only suggests a
research activity that makes use of the computational resource developed during my PhD activity,
integrated in a wider computational strategy. The main objective was the validation of the DO
Knowledgebase in a real research case, which is a less subjective method than test examples
created ‘ad hoc’. The good results obtained, however, represent an interesting starting point for
further investigations in the field.
3.2 Background
Mental Disorders are categorized according to their predominant features. For example,
phobias, social anxiety, and post-traumatic stress disorder all include anxiety as a main feature of
the disorder. All of these disorders are therefore categorized under Anxiety Disorders. There are
over 300 different psychiatric disorders listed in the DSM-IV. With continued research, more are
named every year and some others are removed or re-categorized.
Important factors in the molecular genetics of psychiatric illnesses and the relevance of
molecular signals have been elucidated using a combination of experiments and computation.
However, research in the field is still very far from describing even the fundamental mechanisms
of most of the disorders. This is true, for instance, of depression, one of the most serious
mental diseases, with the highest prevalence worldwide. It is becoming a major source of
disability, second only to cardiovascular diseases. Depression, like most mental illnesses, is
probably caused by a combination of genetic and environmental factors. Abnormalities in brain
biochemistry and in the structure or activity of certain neural circuits are known to be
responsible for the extreme shifts in mood, energy, and functioning that characterize depression.
Lithium has remained one of the most effective medicines for depression patients, but the
mechanism of this effect is still unclear, even though several of its molecular targets, such as
Glycogen Synthase Kinase 3 (GSK3), have been identified.
Data from the last decade have suggested that abnormalities in the development and
information processing in the neuronal networks involved in emotional processes may at least
partially underlie mood disorders [33] [34] [35].
Plasticity of neuronal networks has therefore emerged not only as a determinant of the
disease but also as a necessary component in successful antidepressant treatments, including
pharmacological and psychological therapies.
3.2.1 Neuropsychiatric disorders and mechanisms of regulation
There are many hypotheses on neuropsychiatric disorders and on the mode of action of
antidepressants, several of which are largely based on the dysregulation of the
hypothalamic-pituitary-adrenal (HPA) axis, mediated by corticotropin-releasing hormone (CRH),
glucocorticoids, brain-derived neurotrophic factor (BDNF) and CREB [36]. Others focus on the
fact that neuropsychiatric disorders are stress-related, and there is good evidence that episodes
of depression, for instance, often occur in response to stress or to some trauma. A prominent
mechanism by which the brain reacts to acute and chronic stress is the activation of the
HPA axis. When the axis is activated by exposure to stressors, CRH is produced within the
hypothalamus. In turn, CRH stimulates the anterior pituitary gland to release
adrenocorticotropic hormone (ACTH) into the bloodstream. ACTH then stimulates the release of
glucocorticoids (e.g. cortisol in humans) from the adrenal cortex [37]. Circulating
glucocorticoids interact with their receptors in various target organs such as the liver and
muscle tissue, as well as the brain and the HPA axis itself. Here they are responsible for
initiating feedback inhibition. Thus they exert profound effects on general metabolism and also
affect several processes such as neurogenesis, survival of neurons, neuronal plasticity, neuronal
cell proliferation and cell death [38].
Another hypothesis suggests a role for neurotrophic factors at the basis of several
pathological neuropsychiatric conditions. Neurotrophic factors regulate neuronal growth and
differentiation during development but are also known to be potent regulators of plasticity and
survival of adult neurons and glial cells. Many papers dealing with the neurotrophin hypothesis
have shown that acute and chronic stress decrease levels of BDNF expression in several brain
regions [39].
Plasticity of dendritic spines is widely recognised as one of the emerging mechanisms of
particular interest in brain disorders, but its details have not yet been elucidated.
Dendritic spines are morphological specializations that protrude from the main shaft of
dendrites. Most excitatory synapses in the mature mammalian brain occur on spines. So, spines
represent the main unitary postsynaptic compartment for excitatory input. Dendritic spines
generally consist of a head attached to a dendrite via a stalk or a neck and within this general
description, spines span a continuum of shapes. Spines have been classified by shape as thin,
stubby, mushroom and cup-shaped. However, spine morphology is not static; spines change size
and shape over variable timescales.
There is no definitive answer on the significance of dendritic spines, but the prevailing view
is that their primary function is to provide a microcompartment for segregating postsynaptic
chemical responses, such as elevated calcium.
Regulated changes in spine number might reflect mechanisms for converting transient
changes in synaptic activity into long-lasting alterations. Indeed, changes in spine density have
been observed in response to changes in the efficacy of neurotransmission. In general terms,
spines seem to be maintained by an 'optimal' level of synaptic activity: spine density increases
when there is insufficient activity, and decreases when stimulation is excessive. Moreover, spine
morphology is markedly influenced by the activity of glutamate receptors.
In-depth knowledge of dendritic plasticity would contribute to a better understanding of a
wide range of diseases with apparently different pathogenesis, symptoms and drug treatments
but possibly common molecular determinants.
3.2.2 Dendritic plasticity
In neuroscience, synaptic plasticity is the ability of the connection, or synapse, between two
neurons to change in strength. Neuronal plasticity or remodelling is most often discussed with
regard to cellular and behavioural models of learning and memory. However, neuronal plasticity
is a fundamental process by which the brain acquires information and makes the appropriate
adaptive responses in future-related settings.
The majority of synapses throughout the nervous system are made onto dendrites. The extent
of a neuron’s dendritic arborisation is an important determinant of its input structure and of
how incoming information is integrated and processed, so dendrites play a vital role in the
functional properties of neural circuits. Dendritic spines exhibit rapid motility. Most spines
can change shape in seconds. The shape change involves a remodelling of the cytoskeleton in the
spine, and actin-based protrusive activity from the spine head. The underlying molecular
mechanisms of this motile behaviour, and its functional significance, are unknown. Existing data
suggest that more spines form when neurons have less excitatory activation, are maintained by
optimal activation, and are lost when activation is too high or if the presynaptic axons
degenerate [40]. This pattern supports the hypothesis that neurons may homeostatically regulate
input through spine number. It also suggests a second important fact about dendritic spines:
extra spines that form when excitatory neuronal activation is low can provide a morphological
basis to support new synaptic plasticity.
Considerable progress has been made in identifying the molecules that control spine growth
and maturation. The cytoskeleton is crucial for their development and stability, and an
expanding set of actin-binding and actin-regulatory molecules has also been implicated in these
processes. They include Ras, and GTPases of the Rho/Rac/Cdc42 family, the small GTPase
series of receptors and scaffold proteins [41]. Several questions remain however to be answered
in this nascent field.
3.2.3 Factors influencing dendritic plasticity
Antidepressants have been shown to induce neuronal plasticity when administered over a
sufficiently long period [42] [43] in several cortical regions, particularly the hippocampus, in
a manner analogous to that produced by favourable environmental stimulation [44].
Importantly, if neurogenesis is prevented, antidepressants fail to produce typical behavioural
responses in rodents, demonstrating at least an association between neurogenesis and the
behavioural effects of antidepressants [45]. Another manipulation that causes widespread
changes in dendrites in a variety of adult brain regions is exposure to drugs of abuse. Because
chronic exposure to drugs causes profound experience-dependent changes in behaviour, it was
hypothesized, based partly on the data from environmental enrichment and training, that such
exposure could also cause persistent changes in dendritic structure. Furthermore, drugs of
abuse are known to change the concentration of various neurotransmitters at synapses, which is
known to affect dendrite development.
Recent studies have indicated that neurotrophins also control dendritic growth and
arborization in the CNS. Some authors have found that all four neurotrophins have diverse
effects on dendritic arborization of pyramidal neurons in slices of developing visual cortex [46].
Horch and Katz (2002) have published elegant studies showing that BDNF supplied by a single
neuron in ferret cortex brain slices induces dendritic branching in nearby neurons in a
distance-dependent manner [47]. These data support a role for BDNF in dendritic growth and
remodelling in the neocortex. However, it appears that some neurotrophins exhibit opposing
effects on cortical dendritic growth: NT-3 has been shown to exert dendrite-retractive effects
in developing visual cortex, suggesting a “push-pull” control of dendritic arborization by
different neurotrophins [48].
One of the simplest ways to link changes in activity to changes in dendrites is via the effects
of neurotransmitters themselves. Glutamatergic transmission, particularly that mediated by the
N-methyl-D-aspartate (NMDA) receptor subtype, has been shown to affect dendritic structure in
the developing cortex, and there is an interaction between synaptic activity driven by glutamate
and the effects of BDNF.
3.3 Methods and results (Part I)
The strategy adopted to gain insight into the mechanisms involved in dendritic plasticity
and neuropsychiatric disorders, illustrated in the previous paragraphs, is summarised below
and then detailed in paragraphs 3.3.1 to 3.4.4:
Step 1: Identify relevant query terms to interrogate selected databases (e.g. Gene
Ontology) and extract as exhaustive a list as possible of known genes associated
with dendritic plasticity.
Step 2: Completely annotate genes with information on functional role, tissue
localization, molecular mechanisms involved etc. Filter the list based on gene
expression criteria.
Step 3: Submit the gene list to the Disease Ontology Knowledgebase and extract all the
associated diseases.
Step 4: Investigate gene distribution in canonical pathways.
Step 5: Extract the genes showing an involvement in at least two related diseases
identified in the previous steps.
Step 6: Build a network of relationships for genes identified in Step 5 to delineate
the molecular context and spot topological “hubs”. Use the DO Knowledgebase to
study the network.
Step 7: Collect available gene-drug information for genes selected in Step 5 and Step 6.
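Steps 5 and 6 of the summary above can be illustrated with a short sketch: keep the genes linked to at least two diseases, restrict a set of gene-gene relationships to those genes, and rank the genes by number of neighbours to spot candidate hubs. The data are placeholders, the function names are hypothetical, and using node degree with a fixed threshold is an assumption made here; the text does not specify how hubs were defined.

```python
from collections import defaultdict


def genes_in_multiple_diseases(gene_to_diseases, min_diseases=2):
    """Step 5: genes associated with at least `min_diseases` diseases."""
    return {gene for gene, diseases in gene_to_diseases.items()
            if len(set(diseases)) >= min_diseases}


def degrees(edges, allowed_genes):
    """Step 6: number of distinct neighbours per gene, within `allowed_genes`."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b or a not in allowed_genes or b not in allowed_genes:
            continue  # skip self-loops and edges leaving the selection
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {gene: len(others) for gene, others in neighbours.items()}


def find_hubs(edges, allowed_genes, min_degree):
    """Genes whose degree reaches the chosen (arbitrary) threshold."""
    return sorted(gene for gene, degree in degrees(edges, allowed_genes).items()
                  if degree >= min_degree)


# Placeholder associations and relationships.
gene_to_diseases = {
    "G1": {"D1", "D2"},
    "G2": {"D1"},
    "G3": {"D1", "D2", "D3"},
    "G4": {"D2", "D3"},
}
edges = [("G1", "G3"), ("G3", "G4"), ("G1", "G2"), ("G3", "G3")]

selected = genes_in_multiple_diseases(gene_to_diseases)
print(sorted(selected))
print(find_hubs(edges, selected, min_degree=2))
```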
3.3.1 Selection of query terms and creation of a gene list
Information sources covering the literature space, public domain ontologies and commercial
bioinformatics software solutions were selected in order to extract genes from computationally
structured knowledge.
Short textual definitions able to quickly describe the several known aspects of the molecular
mechanisms of dendritic plasticity were shortlisted to interrogate the Gene Ontology, PubMed
and GeneGO (GeneGo® bioinformatics software)¹ and extract relevant genes. These definitions
included free text query terms pertinent to biological processes (e.g. plasticity), morphological
aspects (e.g. dendritic spines) and known mechanisms (e.g. known protein markers). Several
restrictions were applied to the original query definitions to avoid retrieval of genes that were
either too unspecific or only marginally relevant. The resulting list was then mapped, when
possible, to standard biological subject headings (i.e. descriptors) and used to interrogate the
selected data sources.
¹ http://www.genego.com
The expected output was a group of genes containing several hundred entries significantly
associated to the query terms (Fig. 3.1).
Figure 3.1: Query terms related to the mechanism under investigation are selected through
several criteria and used to identify correlated genes.
Gene duplications caused by alias names were consolidated, leaving a final group of more
than 200 relevant genes out of a total of approximately 500. These genes were selected for
complete annotation followed by data analysis. The full list of genes is available in
Appendix A.
Figure 3.1 (diagram content):
Data sources
• PubMed
• GO
• Networks Warehouse
• Disease expert knowledge space
Free text and ontology based queries
• Process: “Plasticity” and related
• Structure: Dendrite, Dendritic
• Molecular mechanism: Known protein
Filters
• Expressed in correct regions
• Correlated expression
• Quality of query terms
Output
• List of relevant genes
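The consolidation of duplicates caused by alias names, mentioned in this section, can be sketched as a normalisation step followed by de-duplication. The alias table below contains only the BDNF pair quoted in Chapter 2; the function names and the normalisation rules (trimming, collapsing spaces, upper-casing) are assumptions made for illustration, and a real run would need a curated alias resource.

```python
# Hypothetical alias table: normalised alias -> official gene symbol.
ALIASES = {"BRAIN DERIVED NEUROTROPHIC FACTOR": "BDNF"}


def normalise(name):
    """Collapse whitespace and upper-case a gene name."""
    return " ".join(name.split()).upper()


def consolidate(names, alias_to_symbol):
    """Map each name to its official symbol and drop duplicates, keeping order."""
    seen = set()
    consolidated = []
    for name in names:
        key = normalise(name)
        symbol = alias_to_symbol.get(key, key)
        if symbol not in seen:
            seen.add(symbol)
            consolidated.append(symbol)
    return consolidated


raw_hits = ["BDNF", "Brain Derived Neurotrophic Factor", "ALOX12", " bdnf "]
print(consolidate(raw_hits, ALIASES))
```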
3.3.2 Annotation and selection of the Dendritic Plasticity gene dataset
A focused dataset of over 220 genes relevant to dendritic plasticity, extracted as described
in the previous section (3.3.1), was fully annotated with quality-controlled data so that it
could be clustered in various ways based on common features. Information was retrieved from
many sources using manual and semi-automated methods, the latter often integrated by their
authors, when available, in public domain Web tools.
In summary, the most significant sources of knowledge selected were the Gene Ontology,
the whole literature, the Jackson’s mouse phenotypes (The Jackson Laboratory, Bar Harbor,
Maine, US)², the Gene Expression Omnibus (GEO, National Center for Biotechnology
Information, Bethesda, US)³ and the Allen Brain Atlas, a public resource of the Allen Institute
for Brain Science, Seattle (WA, US)⁴. GO annotations were extracted from the most descriptive
levels of the ontology tree in all three branches (i.e. biological processes, molecular
functions, cellular components) using EASE, a standalone version of the Database for
Annotation, Visualization and Integrated Discovery tool [49][50].
Genes were ordered along rows and annotated along multiple columns of an Excel worksheet.
The information extracted was condensed to fill in key fields and sub-fields in some detail and
as consistently as possible. When specific data were not available, the corresponding cell was
left empty in the table. A snapshot of the data sheet content, split by main categories and
sub-functions, is listed below. The number of genes annotated for each feature is reported in
parentheses:
o Gene Ontology
    o Biological process (200); Molecular function (193); Cellular component (189)
o Mechanisms
    o Structural (67); Trafficking (37); Long Term Potentiation (88); Long Term
      Depression (30); Development (76); Neurogenesis (45)
o Species
    o Human (223); Mouse (222); Rat (222); Drosophila only (15)
o Functional localization
    o Adult (105); Development (107); Neuronal (109); Dendritic Spine (46);
      Post-synaptic (49); Pre-synaptic (28)
o Brain region
    o Cortex (104); Hippocampus (103); Amygdala (41); Thalamus (102);
      Hypothalamus (99); Cerebellum (102)
o Cell localization
    o Extracellular (50); Membrane (88); Cytoplasmic (121); Nucleus (48);
      Mitochondrial (6)
² http://www.informatics.jax.org
³ http://www.ncbi.nlm.nih.gov/geo
⁴ http://www.brain-map.org
The set of genes sharing characteristics known to be associated with dendritic plasticity
was further reduced to form a smaller and more focused subset. Expression in appropriate brain
areas, developmental stage, functional localization etc. were used to filter out genes with
weaker evidence supporting their link to plasticity. Since no better sources of human data were
available, brain expression information was derived from the mouse Brain Atlas.
This further sub-selection was important to improve specificity and to allow a more accurate
identification of the diseases associated with the selected genes in the subsequent analysis
with the Disease Ontology Knowledgebase.
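The sub-selection described above amounts to filtering rows of the annotation table on a few columns. The sketch below assumes each gene is a record holding a set of brain regions and a set of functional localizations, with an empty set standing for an empty cell. The field names, the rows and the rule itself (at least one region of interest and at least one localization of interest) are hypothetical; the exact criteria used in the study are not stated in the text.

```python
def passes_filters(row, regions_of_interest, localizations_of_interest):
    """True if the gene is annotated in a wanted region AND a wanted localization."""
    in_region = bool(row.get("brain_regions", set()) & regions_of_interest)
    localized = bool(row.get("localization", set()) & localizations_of_interest)
    return in_region and localized


def filter_genes(rows, regions_of_interest, localizations_of_interest):
    """Return the symbols of the rows that pass both filters, in input order."""
    return [row["symbol"] for row in rows
            if passes_filters(row, regions_of_interest, localizations_of_interest)]


# Placeholder annotation rows; category names follow the data sheet snapshot.
rows = [
    {"symbol": "G1", "brain_regions": {"Cortex", "Hippocampus"},
     "localization": {"Dendritic Spine"}},
    {"symbol": "G2", "brain_regions": {"Cerebellum"},
     "localization": {"Post-synaptic"}},
    {"symbol": "G3", "brain_regions": {"Hippocampus"},
     "localization": set()},  # empty cell: no localization data available
]

kept = filter_genes(rows,
                    regions_of_interest={"Cortex", "Hippocampus"},
                    localizations_of_interest={"Dendritic Spine", "Post-synaptic"})
print(kept)
```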
3.3.3 Identification of gene-disease correlations
The list of genes extracted from public domain sources and filtered to exclude some of the
less relevant ones was submitted to the Disease Ontology Knowledgebase. This made it possible
to extract gene-disease associations for the annotated dataset. A significant portion of the
selected genes appeared to be implicated in one or more diseases.
Neuropsychiatric disorders, the most interesting to the objectives of this work, appeared highly
enriched in genes of the dendritic plasticity dataset (Fig. 3.2).
Figure 3.2: The absolute number of genes assigned to DO nodes is reported on the
vertical axis, while diseases are distributed along the horizontal axis. Alzheimer’s
disease and Schizophrenia were the two most represented diseases in the “Dendritic
Plasticity” focused dataset.
The most represented of all were by far Alzheimer’s disease and Schizophrenia,
followed by Epilepsy, Myocardial infarction, Hypertension and Obesity. Since dendritic
plasticity is a mechanism known to be extremely relevant to cognitive tasks at the molecular
level, it was not unexpected to find diseases affecting these tasks at the top of the list. On
the other hand, disease candidates such as myocardial infarction and hypertension were less
obvious to interpret and would have required further investigation, which is outside the scope
of this project.
It should be noted that the total number of annotations per disease in the DO
knowledgebase, extracted as described previously, could be unintentionally biased towards
certain diseases. This is not an effect of the annotation process; instead, it reflects the
distribution of data in
public domain databases (e.g. the literature). Some research fields are investigated more
than others, and for these the wealth of information available is clearly overwhelming. In some
cases the bias is simply due to the availability of data preferentially produced with
large-scale analysis techniques (e.g. omic studies).
Figure 3.3: Total number of annotations per single disease available in the knowledgebase is
highlighted in purple; in blue the equivalent number of disease annotations found for the
dendritic plasticity dataset. Log scale distribution of total disease annotations for relevant
diseases in the DO does not clearly correlate with distribution of diseases in the dendritic
plasticity dataset even if some tendency cannot be excluded.
However, as shown on a logarithmic scale in Fig. 3.3, the over-representation of disease
annotations identified in the query set showed only a weak tendency to correlate with, and no
clear relation to, the total number of annotations in the knowledgebase. Alzheimer’s disease
and Schizophrenia, for example, are both highly annotated and also the most enriched in the
dataset, but Epilepsy and Parkinson’s disease do not follow the same trend. At this point,
statistical analysis would be very useful to evaluate the significance of the results. The
integration of statistical methods for data validation, however, has been taken into
consideration for further
improvement of the knowledgebase system and it is not available in this first version of the
analysis tools.
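As an illustration of the kind of statistical check mentioned above, a common way to test whether a disease is over-represented in a gene set, while accounting for how heavily that disease is annotated overall, is the one-sided hypergeometric test. The sketch below is not part of the knowledgebase, which has no statistical module in this first version; the function name and all numbers are placeholders chosen for the example.

```python
from math import comb


def hypergeom_upper_tail(k, population, annotated, sample):
    """P(X >= k) for a hypergeometric draw without replacement.

    population: total number of genes in the background
    annotated:  background genes annotated to the disease
    sample:     size of the query gene set
    k:          query genes annotated to the disease
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    lowest = max(k, 0)
    highest = min(annotated, sample)
    total = comb(population, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(lowest, highest + 1)
    )
    return favourable / total


# Placeholder numbers: 1000 background genes, 20 annotated to a disease,
# a query set of 50 genes containing 10 of the annotated ones.
p_value = hypergeom_upper_tail(k=10, population=1000, annotated=20, sample=50)
print(f"p = {p_value:.3g}")
```

Because the total number of annotations per disease enters the calculation explicitly, a heavily annotated disease needs proportionally more hits to reach the same significance. When many diseases are tested at once, a multiple-testing correction would also be needed.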
3.3.4 Data validation using alternative methods
To compare the gene-disease associations retrieved with the DO Knowledgebase and
possibly obtain indirect confirmation of the results, two additional tools were applied to the
dataset: the commercial Ingenuity Pathway Analysis (IPA, Ingenuity® Systems,
www.ingenuity.com) and the public domain DAVID.
The Ingenuity database contains more than one million findings privately curated from the
public domain literature. The terminology of the Ingenuity Knowledge Base, however, is not
standard: it is the result of an internal effort and as such is not identical to that developed
for the Disease Ontology. Nonetheless, it was straightforward to verify that this approach
delivered very comparable results, as summarised in Figure 3.4. IPA made it possible to score
the disease terms over-represented in the dendritic plasticity dataset and to calculate a
p-value to evaluate statistical significance. Both the “Neurological disease” and the
“Psychological disorders” internal categories had p-values below the fixed significance
threshold of 0.05.
Figure 3.4: Genes associated to diseases by IPA tool showed that neurological and
psychological disorders are the most represented diseases in the dendritic plasticity dataset.
The IPA analysis supported the results obtained from the DO Knowledgebase and was
considered a valid indication to proceed with additional investigation.
The same analysis was performed with DAVID to take advantage of the large number of public
domain pathway databases integrated in the tool. The top scoring pathways enriched with genes
of the dendritic plasticity gene set, ranked by significance, are listed in Table 3.5.
The relationships of the dataset with Alzheimer’s disease and other neurodegenerative diseases
in KEGG pathways were statistically significant. The number of genes assigned to
over-represented categories, however, was much lower than that obtained with the DO
Knowledgebase.
Table 3.5: Annotation of dendritic plasticity genes obtained using DAVID.
Neurodegenerative Diseases and Alzheimer’s disease are significantly over-represented in
KEGG pathways.
When disease-specific databases were included in the DAVID analysis, significant associations
emerged from GAD (Table 3.6), which is also one of the databases used to annotate the DO (see
Chapter 2, 2.2.1). The two highest scoring categories were again Schizophrenia and Alzheimer,
in accordance with the previous analysis. However, while the number of genes assigned to
Schizophrenia was similar, the number of genes assigned to Alzheimer’s disease was again much
lower. This shows that integrating several different sources of data in the DO
Knowledgebase yields improved results and better annotation.
Table 3.6: Disease-specific annotations in DAVID identified Schizophrenia
and Alzheimer’s disease as the highest scoring entries.
3.4 Methods and Results (Part II)
After an initial introduction, Part II of the ‘Methods and Results’ section is dedicated to
pathway and network analysis. Details of the process used to validate and extend the output of
the DO Knowledgebase analysis are given in paragraphs 3.4.3 and 3.4.4.
3.4.1 Introduction to pathway and network analyses
Pathway analysis refers to the computational approaches used to investigate networks of
genes, proteins or metabolites as a system. In a broad sense, biologists use the term “pathway”
to describe a set of molecular events underlying a biological process. Topological analysis of a
pathway identifies the global qualitative properties of the system. Building pathways once relied
completely on the slow accumulation of knowledge about individual molecular events, and the
information was therefore often spread over thousands of publications. New technologies are
changing this approach, and pathway analysis is increasingly used to interpret high-throughput
experiments that measure the abundance of biological molecules for a large number of data
points generated in a single run.
Several approaches produce this kind of observation: measurement of mRNA using gene
expression microarrays [51][52], metabolomics experiments measuring endogenous and drug
metabolite concentrations [53], proteomics experiments measuring protein levels [54], studies of
protein phosphorylation and protein-protein interactions by protein arrays, mass spectrometry
or yeast two-hybrid screens [55][56] and, finally, domain-driven lists of manually generated
protein or gene entries. Because each platform carries its own technical assumptions,
characteristics and implied limitations, interpretation is however not straightforward.
Most approaches use known molecular interactions to calculate pathways, but some try to
infer novel interactions directly from the profiling data. Information about activated pathways
can be used to select known drugs for personalized therapy, to select and prioritize potential
drug targets for the development of new drugs, to evaluate the efficacy of a drug candidate, or
to predict drug side effects and toxicity.
As a result, analysis methods may fall into three broad categories:
1. Pathway analysis: loosely defined as highlighting known or pre-defined
pathways in response to stimulus during an experiment.
2. Network analysis: loosely defined as highlighting networks of interacting
proteins and molecules, where those interactions may include direct molecular
interactions, or interactions defined by other scientific criteria such as
signalling cascades or downstream effects.
3. Multi-dataset pathway analysis: loosely defined as answering the question
"How significant are results across datasets, compared to what could randomly
be considered an overlap across datasets?"
3.4.2 Databases and tools for Pathway/Network analysis
Several databases of molecular interactions have been developed using manual curation.
Public manually curated databases include BIND, KEGG, DIP, HPRD, and Reactome.
Commercial manually curated databases have been developed by Ingenuity Systems, Jubilant
Biosystems, Molecular Connections and GeneGo. Manual curation makes these databases highly
accurate, but data accumulation is unfortunately slow and expensive.
The pathway analysis framework as just described provides a natural environment for
expanding the molecular interaction network over uncharacterized proteins and for assigning a
confidence score to the interactions.
Network database navigation tools can find a minimal set of regulators responsible for most
of the changes in molecular profiles and reveal how the regulation activity could be carried out.
For example, a transcriptomics experiment usually finds thousands of differentially expressed
genes, but their expression is driven in theory by a limited number of transcription factors. The
representation of the molecular profile as a set of major regulators reduces the complexity of the
observed pattern and simplifies further analysis. The set of major regulators, in combination
with their downstream targets exhibiting differential profiles, is interpreted as a list of affected
biological processes [57].
Another major application for current pathway analysis tools is biomarker discovery and
optimization. The most convenient biomarkers are secreted proteins and metabolites, which
expression analysis technologies cannot detect directly. However, they can still be identified as
downstream targets of selected sets of differentially expressed genes. Statistically significant
variations in expression level are usually identified using microarray technology, which cannot
detect changes in secretion or in signalling through protein modification and chemical reactions.
Standard algorithms for network navigation [58][59] enable network expansion by finding
the shortest path with the highest score between database entities, common regulators and
targets. Network analysis using these algorithms is still laborious and time consuming, but it
would be unfeasible without them. Most of the commercial companies mentioned in the previous
paragraph provide such tools, with varying degrees of complexity.
3.4.3 Canonical pathways analysis
To better understand the molecular mechanisms correlated to dendritic plasticity, genes of the
dataset were superimposed on canonical pathways using the IPA library. The significance of the
association between genes in the data set and each canonical pathway identified was measured
as follows:
1. The number of genes from the data set that mapped to the canonical pathway was divided
by the total number of genes belonging to the same pathway.
2. Fisher’s exact test was then used to calculate a p-value, i.e. the probability that
the association between the genes in the dataset and the canonical pathway
could be explained by chance alone.
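As an illustration of the two steps above, the following Python sketch computes the pathway ratio and a one-sided Fisher's exact test p-value from the hypergeometric distribution. It is not the IPA implementation, which is proprietary: the function names, the choice of a one-sided (over-representation) test and the size of the background gene universe are assumptions made here for illustration only.

```python
from math import comb


def pathway_ratio(hits, pathway_size):
    """Step 1: fraction of the pathway's genes that are found in the dataset."""
    if pathway_size <= 0:
        raise ValueError("pathway_size must be positive")
    return hits / pathway_size


def fisher_right_tail(hits, dataset_size, pathway_size, background_size):
    """Step 2: one-sided Fisher's exact test (over-representation).

    Probability of observing `hits` or more pathway genes when
    `dataset_size` genes are drawn at random, without replacement,
    from a background of `background_size` genes that contains
    `pathway_size` pathway members (hypergeometric upper tail).
    """
    max_hits = min(dataset_size, pathway_size)
    total = comb(background_size, dataset_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(background_size - pathway_size, dataset_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the analysis in this chapter.
    print(pathway_ratio(12, 100))
    print(fisher_right_tail(12, 250, 100, 20000))
```

The result depends strongly on the background chosen as reference set, so the same dataset can give different p-values in different tools.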
Among the most significantly enriched pathways retrieved, the Axonal Guidance Signaling and
Synaptic Long Term Potentiation categories confirmed the consistency of the dataset selected
for this study (Fig. 3.7).
Another interesting enriched pathway, the Huntington’s disease (HD) pathway,
highlighted an additional link between the dendritic plasticity dataset and neurodegenerative
disease. HD causes astrogliosis and loss of medium spiny neurons. Areas of the brain are
affected according to their structure and the types of neurons they contain, reducing in size as
they cumulatively lose cells. The areas affected are mainly in the striatum, but also in the frontal
and temporal cortices.
Although very different in its aetiology, Huntington’s disease shares with Alzheimer the
same degenerative processes, as confirmed by the literature [60]. This preliminary investigation
suggested that the molecular mechanisms most probably also have common aspects, at
least those related to dendritic plasticity.
Figure 3.7: Canonical pathways are listed from most to least significant; the
orange line denotes the cut-off for significance (p-value < 0.05). Taller bars
represent categories with a greater number of genes than shorter bars.
Maps were very useful to visualize the biological context graphically and to evaluate which
genes were included in canonical pathways and how. Two of the maps most relevant to this
analysis, the Glutamate Receptor and the Ephrin Receptor signalling pathways, are shown in
figures 3.8 and 3.9 as examples. The two maps report the correlation of dendritic plasticity genes
with psychiatric diseases and with mechanisms of plasticity, respectively.
The glutamate receptor signaling pathway potently inhibits actin dynamics in spines. Activation
of the AMPA or NMDA receptor subtypes blocks protrusive activity from the spine head, causing
spines to become more stable and regular in their morphology [61].
Figure 3.8: The glutamate receptor signalling pathway as described in IPA. Genes of
the selected dataset overlapping the pathway in question are highlighted in grey. Cyan
coloured edges show connections of genes to psychiatric diseases.
Ephrins are especially interesting because they are known to be strongly related to dendritic
plasticity mechanisms. Ephrins are cell surface ligands for Eph receptors, the largest family of
tyrosine kinase receptors, and during development they seem to influence cell behaviours such
as morphogenesis and organogenesis. In adulthood the Eph/ephrin cell communication system
continues to play a role in tissue plasticity, giving shape to dendritic spines during neuronal
plasticity [62].
Figure 3.9: Genes involved in dendritic spine related mechanisms are marked with red spots on
the ephrin canonical pathway. The map was generated using IPA.
3.4.4 Gene networks
Although canonical pathways can be visually helpful, they suffer from a somewhat artificial
grouping of genes in a limited number of maps and provide no guidance on the absolute
statistical significance of the results. A useful complementary approach is to map genes onto a
large network of biological relationships derived from gene interaction
databases (e.g. databases of gene regulation) or obtained by manual curation of the literature, as
for the IPA tool. A network is generally understood as a graphical representation of the molecular
relationships between genes/gene products. Genes or gene products are represented as nodes,
and the biological relationship between two nodes is represented as an edge (line). In IPA, all
edges are supported by at least one reference from the literature, from a textbook, or from
canonical information stored in the Ingenuity Pathways Knowledge Base. Human,
mouse, and rat orthologs of a gene are stored as separate objects, but are represented as a single
node in the network.
Given such a network of biological relationships covering many types of cellular processes,
such as signaling, transcriptional regulation and metabolism, and a query set of interesting
genes, the objective is to search the network for subnetworks consisting mostly of query genes.
The group of genes in such subnetworks and the literature-based relationships among them
provide some biological insight into the mechanism of action. The method used relies on a
scoring function and an algorithm to find the high-scoring subnetworks. The genes contained in
a subnetwork found by the algorithm comprise members of the query set and genes
not included in the query set that fill in the ‘gaps’. The number of ‘gaps’ can be modified (i.e.
increased or reduced) to identify the boundaries of the statistical significance measure for each
subnetwork.
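The idea of query genes plus ‘gap’ genes can be illustrated with a toy Python sketch. It is not the IPA algorithm, whose scoring function is proprietary and not reproduced here. The sketch assumes an undirected edge list, treats a non-query gene as a gap filler only when it is directly linked to at least two query genes, and uses a simple query fraction as a stand-in for a real score; all names and node labels are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> neighbours mapping."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def subnetwork_with_gaps(edges, query, allow_gap=True):
    """Return (nodes, gap_nodes) for a query gene set.

    `nodes` holds the query genes present in the network plus,
    if `allow_gap` is true, every non-query gene that bridges
    at least two query genes (a one-gene 'gap').
    """
    adjacency = build_adjacency(edges)
    query = set(query)
    gap_nodes = set()
    if allow_gap:
        for node, neighbours in adjacency.items():
            if node not in query and len(neighbours & query) >= 2:
                gap_nodes.add(node)
    nodes = (query & set(adjacency)) | gap_nodes
    return nodes, gap_nodes


def query_fraction(nodes, query):
    """Share of subnetwork nodes that belong to the query set."""
    if not nodes:
        return 0.0
    return len(nodes & set(query)) / len(nodes)


if __name__ == "__main__":
    # Toy network with placeholder labels, not real gene relationships.
    toy_edges = [("A", "X"), ("X", "B"), ("B", "C"), ("Y", "C")]
    toy_query = {"A", "B", "C"}
    nodes, gaps = subnetwork_with_gaps(toy_edges, toy_query)
    print(sorted(nodes), sorted(gaps), query_fraction(nodes, toy_query))
```

Allowing more gap genes enlarges the subnetwork but lowers the fraction of query genes, which is the trade-off between extension and specificity discussed in this section.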
The genes of the dataset were selected on the basis of their absolute disease class
numerosity (paragraph 3.3.3), which for the top categories is correlated with psychiatric
diseases, as emerged from the disease analyses, and were used to query the network of
biological relationships. Therefore, 18 Schizophrenia and 20 Alzheimer’s disease related genes
(Appendix B), assigned to the respective categories by the Disease Ontology Knowledgebase,
were overlaid independently on the IPA proprietary global network of gene relationships. In
detail, the data set containing the list of gene identifiers was uploaded into the application. Each
gene identifier was mapped to its corresponding gene object in the Ingenuity Pathways
Knowledge Base. Subnetworks of these focused genes were then algorithmically generated and
ranked by connectivity score for each disease separately; the top scoring subnetworks for
Schizophrenia and Alzheimer showed that some of the constituent genes were common to both
diseases.
When the same group of highly relevant genes is assigned to independent high-scoring
networks in distinct analyses, it is reasonable to speculate on a similar role of the constituent genes.
This aspect sometimes underlines the importance of specific genes in common molecular
functions or regulation processes. Genes spanning different diseases are potentially
interesting to elucidate the physiological mechanisms of pathologies, but are also useful to
classify diseases, especially when this task is based only on the observation of clinical features,
as for several psychiatric disorders.
In figure 3.10a the subnetwork of Schizophrenia related genes is represented, while in figure
3.10b the same subnetwork is used to superimpose the genes found in common with Alzheimer.
Figure 3.10a, 3.10b: Each gene is represented by a node in the graph and each
relationship between genes is represented by an edge. Genes of the dendritic plasticity
dataset related to Schizophrenia are displayed in grey boxes highlighted in blue
in the left picture. Four of them, also identified as relevant to Alzheimer’s disease, are
highlighted in the right hand frame. Direct and indirect relationships are represented
as solid and dashed edges respectively. Data were analyzed with the IPA tool.
Nodes are displayed using various shapes to symbolize the functional class of the gene
product. Edges are associated with several different label classes that describe the nature of the
relationship between the nodes (e.g., P for phosphorylation, T for transcription). Negative
evidence (gene A does not bind gene B) and group/complex relationships are excluded from the
network.
The four genes shared by Schizophrenia and Alzheimer’s disease, namely brain derived
neurotrophic factor (BDNF), nerve growth factor 2 (NTF3), apolipoprotein E precursor (APOE)
and major prion protein precursor (PRNP), were mapped back on the literature network as
already described to search for high-scoring subnetworks.
The objective of this step was to identify an enlarged group of genes possibly implicated in basic
biological processes common to distinct neuropsychiatric diseases (Fig. 3.11).
Figure 3.11: Schizophrenia and Alzheimer’s common genes, boxed in blue, were
mapped on the literature network to identify common top scoring networks.
Topological study of those interactions highlighted interconnectivity among many genes and
allowed several central nodes of the network to be identified. This is the case for Mapk, ERK and
Akt, which are graphically located in strategic positions known as ‘hubs’ in scale-free networks,
where a higher number of relationships (i.e. edges) converge.
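A simple way to make the notion of ‘hub’ concrete is to rank nodes by degree, i.e. by the number of distinct edges attached to them. The Python sketch below is only an illustration under that assumption: hubs in this study were identified by inspecting the IPA network, not with this code, and the node labels are placeholders rather than real gene relationships.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct undirected edges per node, ignoring self-loops."""
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for pair in unique_edges:
        for node in pair:
            degree[node] += 1
    return degree


def top_hubs(edges, n=3):
    """Return the n nodes with the highest degree as (node, degree) pairs.

    Ties are broken alphabetically so that the ranking is reproducible.
    """
    ranked = sorted(node_degrees(edges).items(),
                    key=lambda item: (-item[1], item[0]))
    return ranked[:n]


if __name__ == "__main__":
    # Placeholder network: "H" is connected to every other node.
    toy_edges = [("H", "a"), ("H", "b"), ("H", "c"), ("a", "b"), ("b", "a")]
    print(top_hubs(toy_edges, n=2))
```

Degree is only one of several centrality measures; it is used here because it matches the description of hubs as nodes that attract a higher number of edges.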
An even greater group of elements can be collected by extending the network to more distant
genes. The risk, however, is to obtain too many unspecific and irrelevant connections.
The balance between enlarged networks and stringency criteria must always be carefully
evaluated in order to maintain consistent results. Therefore, to identify sub-network boundaries,
the gene set obtained in the last analysis, which is an extension of the small group of genes
related to both Schizophrenia and Alzheimer, was compared to the DO Knowledgebase. As
expected, many cognition related disease classes in addition to Schizophrenia and Alzheimer,
such as Memory Impairment, Personality Trait, Attention Deficit Hyperactivity Disorder,
Cognition Performance, Obsessive-Compulsive disorder, and Vascular Dementia, emerged as the
most enriched in the set.
When the sub-network was extended to more distant relationships through the addition of
one gap among possibly related elements, a kind of dilution effect was observed. The correlation
to neuropsychiatric diseases was weakened, while apparently unrelated diseases started to
appear. Genes identified after network expansion do not overlap with any other gene either
included in the dendritic plasticity dataset or previously associated with Schizophrenia and
Alzheimer diseases. This high-scoring subnetwork is therefore a particularly interesting starting
point for analysing the role of its constituent genes in fundamental mechanisms of
neuropsychiatric disorders.
For the specific purpose of validating the quality and usefulness of the DO Knowledgebase
in network analysis, this test was successful, as it made it possible to properly discriminate the
most focused subnetwork after network expansion.
A complete list of the genes identified with the network analysis and displayed in Fig. 10 is
available in Appendix B.
3.4.5 Drug/nutraceutical interactors
As described in the previous paragraphs, analysis of the dendritic plasticity dataset led to the
identification of several Schizophrenia and Alzheimer’s related genes, which were then mapped
on gene pathways and networks to spot the most interesting ones. Network extension then made
it possible to increase the number of genes common to both diseases.
To investigate which of those genes were already known as targets of commercial drugs, and
for which diseases, they were compared against the quite exhaustive information on gene-drug
interactions available in the DO Knowledgebase (Table 3.12). Data were easily retrieved using
the gene name as the key field in the database query command.
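A query of this kind, keyed on the gene name, can be sketched as follows. The actual schema of the DO Knowledgebase is not shown in this chapter, so the table name `gene_drug` and its columns are hypothetical. Python's built-in `sqlite3` module with an in-memory database is used here only as a self-contained stand-in for the MySQL database, and the rows are placeholders rather than real gene-drug associations.

```python
import sqlite3


def demo_connection():
    """Create an in-memory database with a hypothetical gene_drug table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE gene_drug ("
        "gene_symbol TEXT, drug_name TEXT, main_indication TEXT)"
    )
    # Placeholder rows, not real associations.
    conn.executemany(
        "INSERT INTO gene_drug VALUES (?, ?, ?)",
        [
            ("GENE_A", "drug_x", "indication_2"),
            ("GENE_B", "drug_y", "indication_1"),
            ("GENE_C", "drug_z", "indication_1"),
        ],
    )
    return conn


def drugs_for_genes(conn, genes):
    """Return (gene, drug, indication) rows for the given gene names.

    The gene name is the key field; results are sorted by main
    indication, as in Table 3.12. Parameters are bound with '?'
    placeholders rather than string formatting of the values.
    """
    genes = list(genes)
    if not genes:
        return []
    placeholders = ",".join("?" for _ in genes)
    sql = (
        "SELECT gene_symbol, drug_name, main_indication "
        "FROM gene_drug "
        f"WHERE gene_symbol IN ({placeholders}) "
        "ORDER BY main_indication, gene_symbol, drug_name"
    )
    return conn.execute(sql, genes).fetchall()


if __name__ == "__main__":
    connection = demo_connection()
    for row in drugs_for_genes(connection, ["GENE_A", "GENE_B"]):
        print(row)
```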
Table 3.12: Genes identified for Schizophrenia and Alzheimer in the DO Knowledgebase,
together with those obtained in the common sub-network, are associated with known interacting
drugs and their principal indication. Data are sorted by the ‘Main Indication’ column.
3.5 Discussion
The research objective of this case study was to investigate dendritic plasticity
mechanisms in the context of diseases. Application of the method described in the
previous paragraphs suggested a direct correlation between dendritic plasticity and neurological
diseases such as Schizophrenia, Alzheimer and epilepsy; these general results were
confirmed by the literature. Reductions in markers of axon terminal density [63] [64] and
pyramidal cell somal volume [65] [66] in the prefrontal cortex of subjects with schizophrenia
have already been reported and associated with reduced density of pyramidal cell dendritic
spines. Similarly, a number of studies indicate that experimentally induced reductions in
excitatory afferent input can result in reduced dendritic spine density [67]. However, only
recently has reduced dendritic spine density been observed in subjects with Schizophrenia [68].
Likewise, it has been widely confirmed that dendrites and spines undergo dynamic changes in
physiological conditions, such as learning and memory, and in pathological conditions, such as
Alzheimer's disease and epilepsy [69].
More specifically, at the gene level, the involvement of several dendritic plasticity specific genes
in both Schizophrenia and Alzheimer diseases (i.e. BDNF, NTF3, APOE, PRNP) was reported
after network analysis. This information supported the hypothesis that certain molecular
mechanisms are probably shared among different neuropsychiatric disorders. The correlation of
specific genes to both diseases has been investigated to a certain extent in other studies. BDNF
has already been identified as a risk locus for mood disorders in adults [70], anorexia nervosa
[71] and obsessive compulsive disorder and, in combination with NTF3 and NTRK, as a possible
candidate gene for attention deficit hyperactivity disorder (ADHD) [72].
BDNF and NTF3 have also been reported as possibly associated with the pathogenesis of
schizophrenia and with some neurodevelopmental abnormalities found in the diseased brains [73].
Functional and structural alterations of the hippocampal formation have been described in
major depression but the underlying pathophysiology remains unclear. Interactions between the
5-HT-system and neurotrophic factors like BDNF and glutamate are however known to also
affect morphology of hippocampus [74].
APOE has also already been associated with several neuropsychiatric
disorders and is the first identified molecular susceptibility locus for sporadic and familial
forms of Alzheimer. Also, polymorphisms of PRNP are known to be strongly associated with
neurodegenerative disorders and might influence variables such as age at onset, disease
progression, cognitive impairment or response to antipsychotics [75].
Chapter 4
Conclusions
The development of a number of high throughput technologies such as transcriptional
profiling, proteomics, genetic association etc. is helpful in the identification of genes important
for a particular research investigation. Making correlations to biological pathways, diseases,
drugs or any other useful information for a subset of those genes is however still challenging.
Integration of different data sources, annotation projects, and high quality public domain
databases is still a major need. It is important to understand that biological interpretation of
gene lists based on a single source of data can be successful but is also limited by the underlying
knowledge bias and sometimes by poor quality. Efforts involving multiple research groups with
mixed competencies guarantee wider coverage of the knowledge space and greater control over
quality. With very similar advantages, computational biology methods are extremely useful to
collect and analyse huge amounts of data, provided that sources of information are carefully
selected.
In this PhD work I developed a computational resource based on highly consistent
cross-domain information, able to link genes, diseases and drugs in the same framework. The
resulting knowledgebase can be used either to extract biological knowledge from gene sets or
simply to correlate single genes with the diseases they are known to be involved in and the drugs
they are modulated by. One of the major expectations for this computational effort was data
quality; therefore, careful selection of sources and manual curation of data have been ensured.
The backbone of the system is represented by ontologies, which under the Open Biomedical
Ontologies umbrella represent a fundamental step towards the standardization of the
multiplicity of possible descriptors available to synthesize the biological knowledge. One of
those ontologies, the Disease Ontology (DO), and its annotation with the proper genes and drugs
was the primary target of the first part of my PhD activity. When this activity started there were,
to my knowledge, no other groups in the scientific community working towards the same
objective, which can therefore be considered an original and innovative contribution to the field.
I developed software tools to automatically extract gene-disease-drug relations from several,
often manually curated, databases. The preliminary creation of a dictionary of disease and drug
synonyms allowed the term-term matching necessary to make consistent annotations of the
ontology. The information collected during this process has been stored in a MySQL relational
database together with references to the original source. The DO Knowledgebase contains
original information from the GO, ChEBI, DrugBank, GKB, KEGG, OMIM and GAD databases
that can be easily cross-linked by interrogating the system with simple or complex queries. The
knowledgebase content is fully available from the command line on UNIX/Linux, but a
preliminary simplified Web interface has also been developed to quickly search correlations
among single genes, diseases and drugs. Searches for entire gene datasets will be implemented
in the next version of the tool. A fully functional Web interface will be developed to include user
registration, a comment form, and basic or advanced query options to access data for genes,
drugs, disorders and their relationships. BioMart1 software is the solution currently being
investigated to implement the interface. BioMart is a query-oriented data management system
developed jointly by the European Bioinformatics Institute (EBI) and Cold Spring Harbor
Laboratory (CSHL). BioMart simplifies the creation and maintenance of advanced query
interfaces backed by a relational database and is particularly suited to providing 'data mining'
like searches of complex descriptive (e.g. biological) data. It can work with existing data
repositories by converting them to the required format, as well as with newly created databases.
The annotation process is expected to continue, increasing the number of entries collected in the
database. At the same time the vocabulary will be improved with additional synonyms and
integrated with supplementary information such as pathway and biological reaction data.
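The term-term matching through a dictionary of synonyms mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the software developed for the knowledgebase: the normalisation rules (lower-casing, removal of apostrophes, collapsing of whitespace), the function names and the small example dictionary are all assumptions.

```python
def normalise(term):
    """Lower-case a term, drop apostrophes and collapse whitespace."""
    cleaned = term.lower().replace("'", "").replace("’", "")
    return " ".join(cleaned.split())


def build_synonym_index(dictionary):
    """Map every normalised name or synonym to its canonical term.

    `dictionary` maps a canonical ontology term to a list of synonyms.
    """
    index = {}
    for canonical, synonyms in dictionary.items():
        for name in [canonical, *synonyms]:
            index[normalise(name)] = canonical
    return index


def match_term(raw_term, index):
    """Return the canonical term for a raw database label, or None."""
    return index.get(normalise(raw_term))


if __name__ == "__main__":
    # Illustrative dictionary; the real one was curated manually.
    synonyms = {
        "Alzheimer's disease": ["Alzheimer disease", "Alzheimer"],
        "Schizophrenia": ["Schizophrenic disorder"],
    }
    index = build_synonym_index(synonyms)
    print(match_term("ALZHEIMER’S  Disease", index))
    print(match_term("unknown term", index))
```

Exact matching after normalisation keeps false positives low; terms that remain unmatched would still need manual review, consistent with the manual curation described in this chapter.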
1http://www.biomart.org/
It will also be necessary to develop a new strategy to keep resources up to date, since
knowledge of genes, drugs and diseases accumulates and changes rapidly. A revised data
curation plan is necessary to speed up the process of manual curation, which is currently very
time consuming.
This resource is expected to be further improved with the feedback and collaboration of
experimental groups that benefit from its usage. Therefore, the Disease Ontology
Knowledgebase will be made accessible to collaborators and participating members of the
scientific community, who can evaluate its functionality by testing the system and return
constructive suggestions. Those interested in working on case studies devoted to system
validation will be particularly welcome.
The second part of my PhD activities was spent testing the knowledgebase by investigating an
important mechanism related to learning and memory processes, namely dendritic
plasticity, which has recently been suggested to be strongly involved in neuropsychiatric
disorders. Despite the increasing amount of information that has emerged around plasticity,
however, the underlying molecular mechanisms are still poorly known. To make use of the
knowledgebase while creating a context for dendritic plasticity, I collected several hundred
relevant genes, principally from the GO and the literature. That large group of genes allowed
knowledge to be built on a solid foundation and offered significant chances of finding actual
connections to neuropsychiatric diseases. As a confirmation, when the DO Knowledgebase was
interrogated with the set of around 250 genes, Alzheimer’s disease and Schizophrenia emerged
as the best hits. Among others, Obesity, Epilepsy and Myocardial infarction were also found to be
possibly correlated to the dataset. However, since statistical confirmation of significance for
over-represented groups of genes will be the object of a future improvement of the
knowledgebase, I exploited other indirect methods to validate the results. Analysis of the same
set of ~250 genes related to dendritic plasticity with both public domain (DAVID) and
commercial (IPA) tools confirmed that Neurodegenerative Diseases and Alzheimer are the
diseases most significantly enriched.
In the following step, additional pathway and network methods applied to the genes
associated with Schizophrenia and Alzheimer diseases extended knowledge to further
relations with dendritic plasticity mechanisms, which however still need to be
validated. Several canonical pathways and many hub genes highlighted in this computational
66 Chapter 4. Conclusions ____________________________________________________________________________________________________________________________________________________________________________________________________________________________
research could be easily used to start additional confirmation studies, as for instance gene
expression experiments in brain diseased tissues.
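As an illustration of what is meant by a hub gene, the sketch below ranks the nodes of an interaction network by their degree (number of distinct partners). It is a simplified assumption about how hubs can be picked, not the procedure used by the pathway tools mentioned above; the function name and the edge list are hypothetical.

```python
from collections import defaultdict

def hub_genes(edges, top_n=3):
    """Return the top_n (gene, degree) pairs from an undirected edge list."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    degree = {gene: len(ns) for gene, ns in neighbours.items()}
    # Highest degree first; ties broken alphabetically
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

# Toy network with placeholder gene names (illustration only)
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"), ("G4", "G5")]
hubs = hub_genes(toy_edges, top_n=2)
print(hubs)
```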
Full use of the knowledgebase developed in this PhD work allowed me to extend the results
of the case study analysis beyond its objectives and demonstrated that the knowledgebase is
valuable when applied to real data. The successful identification of diseases correlated with
plasticity was also possible because a critical number of relevant genes had been selected.
Reaching such a number is usually not a problem when the originating experiment is based on
microarray expression or proteomics; however, low fold changes, and the consequently difficult
identification of regulated genes, are often a problem in experiments with CNS samples.
The knowledgebase is also immediately useful for identifying diseases or drugs linked to single
genes. In this case careful annotation of the DO, progressed as described above, is the basic
element for pulling true relationships out of the database. Similarly, no fundamental
improvement is needed to simply extract annotations from the knowledgebase for gene/protein
datasets. Conversely, the Disease Ontology Knowledgebase still needs to be improved in several
respects. The principal one is further annotation of the Disease Ontology, necessary to obtain
even more consistent and robust results, especially when the number of genes in the dataset
investigated is low. Two others have been identified: the implementation of statistical
methods to measure the significance of over-representation analysis, and a full interface
allowing users to exploit the content of the knowledgebase completely.
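One common way to measure the significance of over-representation is a one-sided hypergeometric test; it is given here only as a possible sketch of the planned improvement, not as the method that will be implemented. The symbols are assumptions introduced for the example: N is the number of annotated genes in the background, K the number annotated to a given disease, n the size of the query dataset and k the number of query genes annotated to that disease.

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """Probability of observing at least k disease genes in a query of n genes.

    Computes P(X >= k) for X ~ Hypergeometric(N, K, n) by summing the
    exact tail probabilities with integer binomial coefficients.
    """
    upper = min(K, n)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy numbers (illustration only): background of 20 genes, 5 annotated to the
# disease, query of 4 genes of which 2 are annotated to the disease.
p = hypergeom_enrichment_p(N=20, K=5, n=4, k=2)
print(round(p, 4))
```

When many diseases are tested against the same dataset, the resulting p-values would also need a multiple-testing correction before being interpreted.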
Further improvements, hopefully suggested by users, are expected if the system brings
additional value to their research activities.
Appendix A
Dendritic Plasticity gene dataset
The table below lists the more than 220 genes collected from public domain sources (e.g. Gene Ontology), together with some relevant annotation.
Columns (in order): Gene Name | Official Gene Symbol | Alias Symbols | Gene Ontology (Biological processes) | Gene Ontology (Molecular function) | Gene Ontology (Cellular component)
Genes below were obtained from the Gene Ontology database
angiotensinogen (serine (or cysteine) proteinase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 8)
AGT ANHU;SERPINA8
Regulation of long-term neuronal synaptic plasticity
hormone activity;serine-type endopeptidase inhibitor activity
soluble fraction
dishevelled, dsh homolog 1 (Drosophila)
DVL1 DVL;MGC54245
Positive regulation of dendrite morphogenesis, Dendrite morphogenesis
protein binding;signal transducer activity
cytoplasmic vesicle
EphB2 EPHB2 DRT;EPHT3;ERK;Hek5;Tyro5
Positive regulation of long-term neuronal synaptic plasticity, Regulation of neuronal synaptic plasticity
ATP binding;axon guidance receptor activity;transmembrane-ephrin receptor activity
integral to plasma membrane
forkhead box G1 FOXG1
HFK2, QIN, BF1, HFK1, HFK3, HBF-3
Neuron morphogenesis during differentiation
FYN oncogene related to SRC, FGR, YES
FYN MGC45350;SLK;SYN
Regulation of neuronal synaptic plasticity
ATP binding;non-membrane spanning protein tyrosine kinase activity;protein serine/threonine kinase activity
actin filament;cellular_component unknown
guanine nucleotide binding protein (G protein), q polypeptide
GNAQ G-ALPHA-q Neuron remodeling
GTP binding;heterotrimeric G-protein GTPase activity;signal transducer activity
cytoplasm;heterotrimeric G-protein complex
glutamate receptor, ionotropic, AMPA 1
GRIA1 GLUH1;GLUR1;GLURA;HBGR1
Regulation of long-term neuronal synaptic plasticity
alpha-amino-3-hydroxy-5-methyl-4-isoxazole propionate selective glutamate receptor activity;glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity;protein prenyltransferase activity
integral to membrane;plasma membrane
glutamate receptor, ionotropic, kainate 1
GRIK1 EAA3;EEA3;GLR5;GLUR5
Regulation of short-term neuronal synaptic plasticity, Regulation of long-term neuronal synaptic plasticity
glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity
integral to plasma membrane
glutamate receptor, ionotropic, kainate 2
GRIK2 EAA4;GLR6;GLUR6
Regulation of short-term neuronal synaptic plasticity
glutamate-gated ion channel activity;kainate selective glutamate receptor activity;potassium channel activity
integral to plasma membrane
glutamate receptor, metabotropic 5
GRM5
GPRC1E;MGLUR5;MGLUR5A;MGLUR5B;mGlu5
Positive regulation of long-term neuronal synaptic plasticity
metabotropic glutamate, GABA-B-like receptor activity
integral to plasma membrane
hypoxanthine phosphoribosyltransferase 1 (Lesch-Nyhan syndrome)
HPRT1 HGPRT;HPRT
Dendrite morphogenesis
hypoxanthine phosphoribosyltransferase activity;magnesium ion binding
chloroplast
v-Ha-ras Harvey rat sarcoma viral oncogene homolog
HRAS HRAS1;RASH1
Regulation of long-term neuronal synaptic plasticity
GTP binding;RAS small monomeric GTPase activity
cytoplasm;plasma membrane
basic helix-loop-helix domain containing, class B, 3
BHLHB3 DEC2, SHARP-1, SHARP1
Regulation of neuronal synaptic plasticity
transcription factor activity
nucleus
amyloid beta (A4) precursor protein (protease nexin-II, Alzheimer disease)
APP AAA;ABETA;AD1;CVAP
Neuron remodeling
cell adhesion molecule activity;heparin binding;protein binding;serine-type endopeptidase inhibitor activity
Golgi apparatus;coated pit;endoplasmic reticulum;extracellular;integral to plasma membrane
v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog
KRAS KRAS1 Regulation of long-term neuronal synaptic plasticity
calcium channel, voltage-dependent, P/Q type, alpha 1A subunit
CACNA1A
APCA;CACNL1A4;EA2;FHM;HPCA;MHP;MHP1;SCA6
Dendrite morphogenesis
DNA binding;calcium ion binding;voltage-gated calcium channel activity
nucleus;voltage-gated calcium channel complex
calcium channel, voltage-dependent, alpha 1F subunit
CACNA1F
CSNB2;CSNBX2
Dendrite morphogenesis
calcium ion binding;voltage-gated calcium channel activity;voltage-gated sodium channel activity
voltage-gated calcium channel complex;voltage-gated sodium channel complex
dopamine receptor D5
DRD5 DBDR;DRD1B;DRD1L2;MGC10601
Regulation of long-term neuronal synaptic plasticity
dopamine receptor activity
integral to plasma membrane
matrix metallopeptidase 9
Mmp9 positive regulation of synaptic plasticity
gelatinase B activity extracellular space
glutamate receptor, ionotropic, N-methyl D-aspartate 2B
GRIN2B NMDAR2B;NR2B;hNR3
Regulation of neuronal synaptic plasticity
N-methyl-D-aspartate selective glutamate receptor activity;glutamate-gated ion channel activity;magnesium ion binding
integral to plasma membrane;synaptic vesicle
apolipoprotein E APOE Regulation of neuronal synaptic plasticity
antioxidant activity;apolipoprotein E receptor binding;beta-amyloid binding;heparin binding;lipid binding;lipid transporter activity;low-density lipoprotein receptor binding;tau protein binding
cytoplasm;extracellular;membrane
asp (abnormal spindle)-like, microcephaly associated (Drosophila)
ASPM;MCPH5
FLJ10517;FLJ10549;MCPH5
Forebrain neuroblast division
calmodulin binding nucleus
T-cell leukemia, homeobox 2
TLX2 Enx;HOX11L1;NCX
Negative regulation of dendrite morphogenesis
molecular_function unknown;transcription factor activity
cellular_component unknown;nucleus
nuclear factor of activated T-cells, cytoplasmic, calcineurin-dependent 4
NFATC4 regulation of synaptic plasticity
growth factor cytosol, nucleus
neuroplastin NPTN SDR1, GP55, GP65, np65, np55
Positive regulation of long-term neuronal synaptic plasticity
receptor activity integral to membrane
metallothionein 3 (growth inhibitory factor (neurotrophic))
MT3 GIF;GIFB;GRIF
Negative regulation of dendrite morphogenesis
antioxidant activity;copper ion binding;electron transporter activity;zinc ion binding
synaptic vesicle
early growth response 2 (Krox-20 homolog, Drosophila)
EGR2 CMT1D;CMT4E;KROX20
Regulation of neuronal synaptic plasticity
DNA binding transcription factor complex
synaptophysin SYP Regulation of neuronal synaptic plasticity
calcium ion binding;molecular_function unknown;transporter activity
integral to synaptic vesicle membrane;synapse;synaptosome
acetylcholinesterase (YT blood group)
ACHE YT Positive regulation of dendrite morphogenesis
acetylcholine binding;acetylcholinesterase activity;beta-amyloid binding;cholinesterase activity;protein homodimerization activity;serine
basal lamina;membrane;synapse
ring finger protein 39
RNF39 HZF;HZFW;HZFW1;HZFw1;LIRF
Regulation of neuronal synaptic plasticity
molecular_function unknown;zinc ion binding
cellular_component unknown;integral to membrane
pro-melanin-concentrating hormone
PMCH MCH Regulation of neuronal synaptic plasticity
melanin-concentrating hormone activity;molecular_function unknown;neuropeptide hormone activity
extracellular
ras-related C3 botulinum toxin substrate 1 (rho family, small GTP binding protein Rac1)
RAC1 TC-25;p21-Rac1
Positive regulation of dendrite morphogenesis
ATP binding;GTP binding;Rho small monomeric GTPase activity
filopodium
ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome)
ATP7A MK;MNK;OHS
Pyramidal neuron development , Dendrite morphogenesis
ATP binding;copper ion binding;copper-exporting ATPase activity;magnesium ion binding;mercury ion transporter activity
Golgi apparatus;integral to plasma membrane
myosin light chain kinase 2, skeletal muscle
MYLK2 KMLC;MLCK;skMLCK
Regulation of neuronal synaptic plasticity
ATP binding;calmodulin binding;myosin-light-chain kinase activity;protein-tyrosine kinase activity
mitogen-activated protein kinase 8
MAPK8
JNK;JNK1;JNK1A2;JNK21B1/2;PRKM8;SAPK1
Positive regulation of dendrite morphogenesis
ATP binding;JUN kinase activity;MAP kinase activity;MAP kinase kinase activity
nucleus
presenilin 1 Psen1 regulation of synaptic plasticity
cadherin binding, peptidase activity
dendrite, axon
presenilin 2 Psen2 regulation of synaptic plasticity
endopeptidase activity
cell soma, Z disc, integral to plasma membrane
Ras protein-specific guanine nucleotide-releasing factor 1
RASGRF1
CDC25;CDC25L;GNRP;GRF1;GRF55;H-GRF55
Regulation of neuronal synaptic plasticity
Ras guanyl-nucleotide exchange factor activity
nucleosome;plasma membrane;synaptosome
Regulation of neuronal synaptic plasticity
brain-derived neurotrophic factor
BDNF MGC34632
Regulation of short-term neuronal synaptic plasticity, Regulation of long-term neuronal synaptic plasticity
growth factor activity;protein binding
extracellular
S100 calcium binding protein, beta (neural)
S100B NEF;S100
Regulation of long-term neuronal synaptic plasticity, Regulation of neuronal synaptic plasticity
S100 alpha binding;S100 beta binding;calcium ion binding;kinase inhibitor activity;protein homodimerization activity;tau protein binding;zinc ion binding
cytoplasm;extracellular
steroidogenic acute regulatory protein
STAR STARD1 Regulation of neuronal synaptic plasticity
cholesterol binding;cholesterol transporter activity;lipid binding
mitochondrion
wingless-type MMTV integration site family, member 7B
WNT7B Positive regulation of dendrite morphogenesis
extracellular matrix structural constituent;signal transducer activity
extracellular
tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide
YWHAH YWHA1 Negative regulation of dendrite morphogenesis
protein domain specific binding;protein kinase C inhibitor activity
cytoplasm
LIM homeobox 8 LHX8 Lhx7 Forebrain neuron differentiation
protein tyrosine phosphatase, receptor type, f polypeptide (PTPRF), interacting protein (liprin), alpha 3
PPFIA3 KIAA0654;LPNA3
Regulation of short-term neuronal synaptic plasticity
basic helix-loop-helix domain containing, class B, 2
BHLHB2 DEC1;Stra14 Regulation of neuronal synaptic plasticity
transcription factor activity
nucleus
Kruppel-like factor 7 (ubiquitous)
KLF7 UKLF Dendrite morphogenesis
transcription coactivator activity;transcription factor activity;zinc ion binding
perinuclear space
calcium/calmodulin-dependent protein kinase (CaM kinase) II gamma
CAMK2G
CAMK;CAMK-II;CAMKG;MGC26678
Regulation of long-term neuronal synaptic plasticity
ATP binding;calcium-dependent protein serine/threonine phosphatase activity;calcium/calmodulin-dependent protein kinase activity;calmodulin binding;calmodulin-dependent protein kinase I activity;protein-tyrosine kinase activity;signal transducer activity;transporter activity
cellular_component unknown;membrane
doublecortin-like kinase 1
DCLK1 KIAA0369, DCLK, DCDC3A
Dendrite morphogenesis
FERM, RhoGEF and pleckstrin domain protein 2
FARP2 FIR;FRG;KIAA0793
Neuron remodeling guanyl-nucleotide exchange factor activity
cytoskeleton
GIPC PDZ domain containing family, member 1
GIPC1 regulation of synaptic plasticity
PDZ domain binding
dendritic spine, synaptic vesicle
citron (rho-interacting, serine/threonine kinase 21)
CIT CRIK;KIAA0949;STK21
Negative regulation of dendrite morphogenesis
diacylglycerol binding;small GTPase regulatory/interacting protein activity
actin cytoskeleton
leucine zipper, putative tumor suppressor 1
LZTS1 F37;FEZ1 Regulation of dendrite morphogenesis
transcription factor activity
cytoplasm;nucleus
neurochondrin KIAA0607 Regulation of neuronal synaptic plasticity
activity-regulated cytoskeleton-associated protein
ARC KIAA0278 Regulation of neuronal synaptic plasticity
actin binding cytoskeleton
signal-induced proliferation-associated 1 like 1
KIAA0440, E6TP1
Regulation of dendrite morphogenesis
Rho family GTPase 1
RND1 Rho6, ARHS Neuron remodeling
plexin A3 PLXNA3
6.3;PLEXIN-A3;PLXN4;Plxn3;SEX;XAP-6
Pyramidal neuron development
transmembrane receptor activity
integral to membrane
netrin 4 NTN4 PRO3091 Neuron remodeling structural molecule activity
extracellular matrix
chondroitin sulfate proteoglycan BEHAB
BEHAB;MGC13038
Regulation of neuronal synaptic plasticity
hyaluronic acid binding;sugar binding
chondroitin sulfate proteoglycan 4 (melanoma-associated)
CSPG4
MCSP;MCSPG;MEL-CSPG;MSK16;NG2
Neuron remodeling
ATP binding;hydrogen-transporting two-sector ATPase activity
integral to plasma membrane
cytoplasmic polyadenylation element binding protein 1
CPEB1 CPEB;FLJ13203
Regulation of neuronal synaptic plasticity
nucleic acid binding viral nucleocapsid
drebrin 1 DBN1 D0S117E;DKFZp434D064
Regulation of neuronal synaptic plasticity
actin binding;profilin binding
actomyosin;dendrite
doublecortex; lissencephaly, X-linked (doublecortin)
DCX DBCN;DC;LISX;SCLH;XLIS
Dendrite morphogenesis
microtubule binding microtubule associated complex
discs, large homolog 4 (Drosophila)
DLG4 PSD95;SAP90
Regulation of long-term neuronal synaptic plasticity
guanylate kinase activity;membrane-associated guanylate kinase;protein C-terminus binding
intercellular junction
candidate plasticity gene 1
Regulation of neuronal synaptic plasticity
discs, large homolog 4 (Drosophila)
Regulation of long-term neuronal synaptic plasticity
Genes below were obtained from the literature (manually curated)
synovial sarcoma translocation gene on chromosome 18-like 1
SS18L1
CREST; SS18L1; LP2261; KIAA0693; MGC26711; MGC78386
cAMP responsive element binding protein 1
CREB1
nerve growth factor (beta polypeptide)
NGF NGFB; HSAN5; Beta-NGF
neuronal growth, survival, differentiation
neurotrophin secreted
glutamate receptor interacting protein 1
GRIP1 GRIP intracellular signaling cascade
protein binding;receptor signaling complex scaffold activity
cellular_component unknown;ribosome
sema domain, immunoglobulin domain (Ig), short basic domain, secreted, (semaphorin) 3A
SEMA3A
Hsema-I;SEMA1;SEMAD;SEMAIII;SEMAL;SemD;coll-1;sema III
neurogenesis receptor activity extracellular
Notch homolog 1, translocation-associated (Drosophila)
NOTCH1 notch-1;TAN1, hN1
neurotrophic tyrosine kinase, receptor, type 2
NTRK2
neurotrophic tyrosine kinase, receptor, type 2;SK378;TrkB.FL;NTRK2
transmembrane receptor protein tyrosine kinase signaling pathway;protein amino acid phosphorylation;neurogenesis
integral to plasma membrane;membrane;integral to membrane
neurotrophin TRKB receptor activity;kinase activity;transferase activity;receptor activity;neurotrophin binding;transmembrane receptor protein tyrosine kinase activity;ATP binding
v-abl Abelson murine leukemia viral oncogene homolog 1
ABL1
RP11-83J21.1, ABL, JTK7, bcr/abl, c-ABL, p150, v-abl, abl1
Synaptic plasticity; dendrite arborisation
protein kinase cytoplasm, dendritic spine
cadherin, EGF LAG seven-pass G-type receptor 2 (flamingo homolog, Drosophila)
CELSR2
CDHF10;EGFL2;Flamingo1;KIAA0279;MEGF3
development;homophilic cell adhesion;neuropeptide signaling pathway
G-protein coupled receptor activity;calcium ion binding;structural molecule activity
integral to membrane
rhodopsin (opsin 2, rod pigment) (retinitis pigmentosa 4, autosomal dominant)
RHO OPN2;RP4
G-protein coupled receptor protein signaling pathway;phototransduction, visible light;rhodopsin mediated signaling
G-protein coupled photoreceptor activity
integral to plasma membrane
cell division cycle 42 (GTP binding protein, 25kDa)
CDC42 CDC42Hs;G25K
actin filament organization;small GTPase mediated signal transduction
GTP binding;Rho small monomeric GTPase activity
filopodium
T-cell lymphoma invasion and metastasis 1
TIAM1 intracellular signaling cascade
Rho guanyl-nucleotide exchange factor activity;protein binding;receptor signaling protein activity
membrane
cyclin-dependent kinase 5
CDK5 PSSALRE
axonogenesis;cell cycle;cytokinesis;protein amino acid phosphorylation
ATP binding;cyclin-dependent protein kinase activity;protein-tyrosine kinase activity
cytoplasm
adenosine A2a receptor
ADORA2A
ADORA2;RDC8;hA2aR
adenylate cyclase activation;apoptosis;blood coagulation;cAMP biosynthesis;cell-cell signaling;cellular defense response;central nervous system development;circulation;inflammatory response;phagocytosis;sensory perception
A2A adenosine receptor activity, G-protein coupled
integral to plasma membrane;membrane fraction
clusterin (complement lysis inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate message 2, apolipoprotein J)
CLU
APOJ;CLI;SGP-2;SGP2;SP-40;TRPM-2;TRPM2
apoptosis;complement activation, classical pathway;fertilization (sensu Animalia);lipid metabolism
binding extracellular
dystrophin (muscular dystrophy, Duchenne and Becker types)
DMD
BMD;DXS142;DXS164;DXS206;DXS230;DXS239;DXS268;DXS269;DXS270;DXS272
biological_process unknown;muscle contraction;muscle development
actin binding;calcium ion binding;molecular_function unknown;structural constituent of cytoskeleton;zinc ion binding
cellular_component unknown;cytoskeleton;membrane
epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)
EGFR ERBB;ERBB1
EGF receptor signaling pathway;cell proliferation;electron transport;protein amino acid phosphorylation
ATP binding;electron transporter activity;epidermal growth factor receptor activity
cytoskeleton;endosome;integral to plasma membrane
growth arrest-specific 7
GAS7 KIAA0394 cell cycle arrest;neurogenesis
transcription factor activity
mitochondrion
gap junction protein, alpha 1, 43kDa (connexin 43)
GJA1
CX43;DFNB38;ODD;ODDD;ODOD;SDTY3
cell-cell signaling;hearing;heart development;muscle contraction;regulation of heart rate;transport
connexon channel activity;ion transporter activity
connexon complex;integral to plasma membrane
leukemia inhibitory factor receptor
LIFR cell surface receptor linked signal transduction
leukemia inhibitory factor receptor activity
integral to plasma membrane
neurofilament, light polypeptide 68kDa
NEFL CMT1F;CMT2E;NF68;NFL
cytoskeleton organization and biogenesis
structural constituent of cytoskeleton
neurofilament
Microtubule-associated protein 1S
MAP1S/C19ORF5
MAP1S, C19ORF5
thrombospondin 4 THBS4 TSP4
cell adhesion;substrate-bound cell migration, cell extension
calcium ion binding;cell adhesion molecule activity;heparin binding;structural molecule activity
extracellular matrix;extracellular space
adenomatosis polyposis coli
APC DP2;DP2.5;DP3;FAP;FPC;GS
Wnt receptor signaling pathway;cell adhesion;negative regulation of cell cycle;protein complex assembly
beta-catenin binding cytoplasm
cyclin-dependent kinase 5, regulatory subunit 1 (p35)
CDK5R1
CDK5P35;CDK5R;MGC33831;NCK5A;p23;p25;p35;p35nck5a
brain development;regulation of CDK activity;regulation of neuron differentiation
cyclin-dependent protein kinase 5 activator activity;protein kinase activity
cyclin-dependent protein kinase 5 activator complex
cell adhesion molecule with homology to L1CAM (close homolog of L1)
CHL1 CALL;L1CAM2
cell adhesion;signal transduction
cell adhesion molecule activity
integral to membrane
contactin 4 CNTN4
AXCAM;BIG-2;CNTN4A;MGC33615
cell adhesion;signal transduction
cell adhesion molecule activity
integral to membrane
dihydropyrimidinase-like 3
DPYSL3
CRMP-4;CRMP4;DRP-3;DRP3;ULIP
neurogenesis;nucleobase, nucleoside, nucleotide and nucleic acid metabolism;signal transduction
dihydropyrimidinase activity
membrane
fibroblast growth factor receptor 1 (fms-related tyrosine kinase 2, Pfeiffer syndrome)
FGFR1
BFGFR;C-FGR;CEK;FLG;FLJ14326;FLT2;H2;H3;H4;H5;KAL2;N-SAM
FGF receptor signaling pathway;MAPKKK cascade;cell growth;oncogenesis;protein amino acid phosphorylation;skeletal development
ATP binding;fibroblast growth factor receptor activity;heparin binding
integral to plasma membrane;membrane fraction
galanin receptor 2 GALR2 GALNR2
G-protein signaling, coupled to cAMP nucleotide second messenger;cytosolic calcium ion concentration elevation;development;digestion;feeding behavior;learning and/or memory;muscle contraction;synaptic transmission
galanin receptor activity
integral to membrane;plasma membrane
glial cell derived neurotrophic factor
GDNF
G protein-regulator of neurite outgrowth 1
GPRIN1 KIAA1983
glutamate receptor, metabotropic 4
GRM4 GPRC1D;MGLUR4;mGlu4
negative regulation of adenylate cyclase activity;synaptic transmission
metabotropic glutamate, GABA-B-like receptor activity
integral to plasma membrane
insulin-like growth factor 1 receptor
IGF1R JTK13
anti-apoptosis;insulin receptor signaling pathway;positive regulation of cell proliferation;protein amino acid phosphorylation;regulation of cell cycle
ATP binding;epidermal growth factor receptor activity;insulin-like growth factor receptor activity;protein binding
integral to membrane
laminin, beta 1 LAMB1 CLM
myosin, heavy polypeptide 10, non-muscle
MYH10 NMMHCB cytokinesis
ATP binding;actin binding;calmodulin binding;motor activity
myosin complex
neurturin NRTN
phosphatase and tensin homolog (mutated in multiple advanced cancers 1)
PTEN
BZS;MGC11227;MHAM;MMAC1;PTEN1;TEP1
development;negative regulation of cell cycle;protein amino acid dephosphorylation;regulation of CDK activity
phosphatidylinositol-3,4,5-trisphosphate 3-phosphatase activity;protein-tyrosine-phosphatase activity
cytoplasm
protein tyrosine phosphatase, receptor type, K
PTPRK R-PTP-kappa
protein amino acid dephosphorylation;transmembrane receptor protein tyrosine phosphatase signaling pathway
transmembrane receptor protein tyrosine phosphatase activity
integral to plasma membrane
runt-related transcription factor 3
RUNX3 AML2;CBFA3;PEBP2aC
cell proliferation;regulation of transcription, DNA-dependent
ATP binding;DNA binding
nucleus;ribosome
septin 2 SEPT2
DIFF6, KIAA0158, NEDD5, Pnutl3, hNedd5
abl interactor 2 ABI2
ABI-2, ABI2B, AIP-1, AblBP3, SSH3BP2, argBPIA, argBPIB
ADP-ribosylation factor 6
ARF6
intracellular protein transport;nonselective vesicle transport;small GTPase mediated signal transduction
ARF small monomeric GTPase activity;GTP binding;enzyme activator activity;protein transporter activity
Golgi apparatus;membrane fraction;plasma membrane
Bardet-Biedl syndrome 1
BBS1 BBS2L2;FLJ23590
Bardet-Biedl syndrome 4
BBS4 vision
choline acetyltransferase
CHAT FIMG2 neurotransmitter biosynthesis
choline O-acetyltransferase activity
cytoplasm;nucleus
FEZ family zinc finger 2
FEZF2
FEZ, FEZL, FKSG36, FLJ10142, TOF, ZFP312, ZNF312
regulation of forebrain development
transcription factor activity
nucleus
ghrelin precursor GHRL MTLRP
G-protein coupled receptor protein signaling pathway;cell-cell signaling
growth hormone receptor binding;growth hormone-releasing hormone activity
extracellular space;soluble fraction
glutamate receptor, ionotropic, N-methyl-D-aspartate 3A
GRIN3A NMDAR-L;NR3A
ion transport
glutamate-gated ion channel activity;ionotropic glutamate receptor activity
membrane
immunoglobulin superfamily, member 9, dasm1
IGSF9 KIAA1355;Nrt1
flight behavior transmembrane receptor activity
integral to plasma membrane
leukocyte specific transcript 1
LST1 B144;D6S49E;LST-1
cellular defense response
defense/immunity protein activity
integral to plasma membrane
MCF.2 cell line derived transforming sequence
MCF2 DBL
cell growth and/or maintenance;intracellular signaling cascade
guanyl-nucleotide exchange factor activity
cytoskeleton;cytosol;membrane fraction
methyl CpG binding protein 2 (Rett syndrome)
MECP2 MRX16;MRX79;PPMX;RTS;RTT
negative regulation of transcription from Pol II promoter
methyl-CpG binding;transcription corepressor activity
chromatin;nucleus
Microtubule associated protein 1B
MAP1B
DKFZp686E1099, DKFZp686F1345, FLJ38954, FUTSCH, MAP5
microtubule cytoskeleton
Microtubule associated protein 2
MAP2
DKFZp686I2148, MAP2A, MAP2B, MAP2C
microtubule cytoskeleton
myosin VI MYO6 DFNA22;DFNB37;KIAA0389
cytoskeleton organization and biogenesis;hearing;striated muscle contraction
ATP binding;actin binding;calmodulin binding;motor activity;myosin ATPase activity;structural constituent of muscle
unconventional myosin
neuropilin 1 NRP1 NRP;VEGF165R
angiogenesis;axon guidance;cell adhesion;cell-cell signaling;positive regulation of cell proliferation;signal transduction
cell adhesion molecule activity;protein binding;vascular endothelial growth factor receptor activity
integral to membrane;membrane fraction
p21/Cdc42/Rac1-activated kinase 1 (STE20 homolog, yeast)
PAK1 PAKalpha
JNK cascade;apoptosis;protein amino acid phosphorylation
ATP binding;protein serine/threonine kinase activity
focal adhesion
protein phosphatase 1, regulatory subunit 9B, spinophilin
PPP1R9B PPP1R9;SPINO
RNA splicing;cell cycle arrest;colony morphology;interpretation of external signals that regulate cell growth;negative regulation of cell growth;regulation of cell proliferation;regulation of exit from mitosis;transport
protein phosphatase 1 binding;protein phosphatase inhibitor activity;transporter activity
cytoplasm;membrane;nucleoplasm;protein phosphatase type 1 complex
protein kinase, cGMP-dependent, type I
PRKG1
CGKI;PGK;PRKG1B;PRKGR1B;cGKI-BETA;cGKI-alpha
actin cytoskeleton organization and biogenesis;actin cytoskeleton reorganization;protein amino acid phosphorylation;regulation of smooth muscle contraction;signal transduction
3',5'-cGMP binding;ATP binding;cAMP-dependent protein kinase regulator activity;cGMP-dependent protein kinase activity;protein-tyrosine kinase activity
cAMP-dependent protein kinase complex
pleckstrin homology, Sec7 and coiled-coil domains 2 (cytohesin-2)
PSCD2 ARNO;CTS18.1;Sec7p-L
actin cytoskeleton organization and biogenesis;endocytosis;signal transduction
ARF guanyl-nucleotide exchange factor activity
membrane fraction;plasma membrane
scavenger receptor class F, member 1
SCARF1 KIAA0149;MGC47738;SREC
cell adhesion;low-density lipoprotein catabolism;receptor mediated endocytosis
cell adhesion molecule activity;low-density lipoprotein binding;scavenger receptor activity;structural molecule activity
integral to membrane
synaptic Ras GTPase activating protein 1 homolog (rat)
SYNGAP1
DKFZp761G1421;KIAA1938;RASA1;RASA5;SYNGAP
GTPase activator activity
trafficking protein particle complex 4
TRAPPC4
ER to Golgi transport;dendrite morphogenesis;neurotransmitter receptor biosynthesis;vesicle-mediated transport
protein binding
Golgi cis-face;Golgi stack;dendrite;endoplasmic reticulum;synapse;synaptic junction;synaptic vesicle
kalirin, RhoGEF kinase
Kalrn
signal transduction;vesicle-mediated transport
guanyl-nucleotide exchange factor activity
determination of neuronal dendritic morphology
molecular switch nuclear transcription factor
tRNA cysteine TRC
Down syndrome cell adhesion molecule
DSCAM
CHD2-42;CHD2-52
cell adhesion;neurogenesis
cell adhesion molecule activity
integral to plasma membrane;membrane fraction
Spineless (Drosophila); aryl hydrocarbon receptor (vertebrates)
determination of neuronal dendritic morphology
regulation of gene transcription
nuclear transcription factor
cadherin 2, type 1, N-cadherin (neuronal)
CDH2
CDHN; NCAD; CD325; CDw325
Wilson-Turner X-linked mental retardation syndrome
WTS MRXS6
Salvador SAV1
Neuritin NRN1 MGC44811, NRN, dJ380B8.2
neuritogenesis cell adhesion GPI-anchored membrane receptor
myosin Va
Myo5a
synapse organization and biogenesis
protein binding
axon, neuron projection
staufen (RNA binding protein) homolog 1 (Drosophila)
Stau1
intracellular mRNA localization
double-stranded RNA binding
neuron projection
Rho GTPase-activating protein
RICS
The genes below were obtained from Jackson Laboratory phenotypes with data on LTP/LTD.
amiloride-sensitive cation channel 2, neuronal
ACCN2_MOUSE
ACCN2_MOUSE;ASIC1_MOUSE;ASIC;Accn2;amiloride-sensitive cation channel 2, neuronal
transport;ion transport;cation transport;sodium ion transport;calcium ion transport;monovalent inorganic cation transport;associative learning;response to acid;memory
calcium ion binding;sodium channel activity;sodium ion binding;amiloride-sensitive sodium channel activity;ion channel activity;monovalent inorganic cation transmembrane transporter activity;cation channel activity
dendritic shaft;dendritic spine;synaptosome;integral to membrane;membrane;integral to plasma membrane;synapse
adenylate cyclase 8
ADCY8_MOUSE
ADCY8_MOUSE;AC8;ADCY8;AW060868;Adcy8
intracellular signaling cascade;cyclic nucleotide biosynthetic process;cAMP biosynthetic process;adenylate cyclase activation
phosphorus-oxygen lyase activity;metal ion binding;adenylate cyclase activity;lyase activity;magnesium ion binding
integral to membrane;membrane;plasma membrane
adenylate cyclase activating polypeptide 1 receptor 1
ADCYAP1R1_MOUSE
PITUITARY ADENYLATE CYCLASE ACTIVATING POLYPEPTIDE TYPE I RECEPTOR PRECURSOR;PACAP TYPE I RECEPTOR;ADCYAP1R1_MOUSE;PACAP-R-1;PACAPR_MOUSE
signal transduction;spermatogenesis;multicellular organismal development;G-protein coupled receptor protein signaling pathway;cell differentiation
receptor activity;signal transducer activity;G-protein coupled receptor activity;vasoactive intestinal polypeptide receptor activity
integral to membrane;membrane;extracellular space
adducin 2 (beta)
ADD2_MOUSE
ADD2_MOUSE;Add2;ADD2
hemopoiesis
structural molecule activity;metal ion binding;calmodulin binding
cytoskeleton;cytoplasm;membrane
AF4/FMR2 family, member 2
AFF2_MOUSE
FMR2;Fmr2;OX19;Ox19;Oxh;FMR2_MOUSE
learning and/or memory
adaptor-related protein complex 3, mu 2 subunit
AP3M2_MOUSE
Ap3m2;5830445E16Rik;AP3M2_MOUSE
protein complex assembly;transport;vesicle-mediated transport;protein transport;intracellular protein transport
protein transporter activity;protein binding
cytoplasmic vesicle;clathrin adaptor complex;clathrin vesicle coat;membrane coat;Golgi apparatus;membrane
calbindin 2
CALB2_MOUSE
calretinin;CALB2;CR;Calb2;CALB2_MOUSE
calcium ion binding
gap junction
calcium/calmodulin-dependent protein kinase kinase 2, beta
CAMKK2_MOUSE
CAMKK2_MOUSE;6330570N16RIK_MOUSE;6330570N16Rik
protein amino acid phosphorylation
calmodulin binding;calmodulin-dependent protein kinase activity;ATP binding;protein serine/threonine kinase activity;transferase activity;nucleotide binding;protein kinase activity;kinase activity
cytoplasm
cerebellin 1 precursor protein
CBLN1_MOUSE
CBLN1_MOUSE;Cbln1;CBLN1;AI323299
membrane;extracellular space;synapse;cell junction;extracellular region
CD247 antigen
CD247_MOUSE
CD3Z_MOUSE;CD247;CD3H;Tcrz;TCRk;Cd3z;CD3Z;Cd3;T3z;Cd3h;TCRZ
cell surface receptor linked signal transduction
transmembrane receptor activity;protein binding;receptor activity
plasma membrane;membrane;integral to membrane;T cell receptor complex;alpha-beta T cell receptor complex;cytoplasm
carbohydrate sulfotransferase 10
CHST10_MOUSE
AI507003;AU041319;Chst10;ST;CHST10_MOUSE;Hnk-1st-pending;Hnk-1st
long-term memory;carbohydrate metabolic process;learning
sulfotransferase activity;transferase activity
integral to membrane;Golgi apparatus;membrane;cellular_component
cannabinoid receptor 1 (brain)
CNR1_MOUSE
CNR1_MOUSE;CB1;CB1-R;Cannabinoid receptor 1
signal transduction;G-protein coupled receptor protein signaling pathway
cannabinoid receptor activity;receptor activity;signal transducer activity;rhodopsin-like receptor activity;G-protein coupled receptor activity
integral to membrane;membrane
collapsin response mediator protein 1
CRMP1_MOUSE
CRMP1_MOUSE;DRP-1;Collapsin response mediator protein 1;Dihydropyrimidinase related protein-1_mouse;CRMP-1;CRMP1;DPYSL1
hydrolase activity
cell soma;dendrite;cytoplasm
chondroitin sulfate proteoglycan 5
CSPG5_MOUSE
NGC;Cspg5;CSPG5_MOUSE
cell differentiation;nervous system development;multicellular organismal development;regulation of cell growth;regulation of synaptic transmission
extracellular space;endoplasmic reticulum;Golgi apparatus;membrane;integral to membrane
catenin (cadherin associated protein), delta 2
CTNND2_MOUSE
Ctnnd2;CATND2_MOUSE;Catnd2;Nprap
learning;regulation of synaptic plasticity;transcription;morphogenesis of a branching structure;regulation of transcription, DNA-dependent;cell adhesion;multicellular organismal development
protein binding;binding;structural molecule activity
nucleus;cytoplasm;cell junction;cytoskeleton
dystroglycan 1
DAG1_MOUSE
DAG-1;D9Wsu13e;DAG1;DG;Dag1;DAG1_MOUSE
morphogenesis of an epithelial sheet
calcium ion binding;protein binding
insoluble fraction;lipid raft;sarcolemma;integral to membrane;extracellular region;cytoplasm;cytoskeleton;plasma membrane;dystroglycan complex;membrane
diacylglycerol kinase, epsilon
DGKE_MOUSE
DGKE_MOUSE;Dgke;DGK;DAGK6
intracellular signaling cascade;protein kinase C activation
diacylglycerol kinase activity;zinc ion binding;kinase activity;transferase activity;diacylglycerol binding;metal ion binding
integral to membrane;membrane
double C2, alpha
DOC2A_MOUSE
DOC2A_MOUSE;Doc2a
transport;exocytosis
transporter activity;calcium-dependent phospholipid binding;calcium ion binding
synaptosome;cell junction;cytoplasmic vesicle;synapse;membrane;synaptic vesicle
eukaryotic translation initiation factor 2 alpha kinase 4
EIF2AK4_MOUSE
EIF2AK4_MOUSE
post-translational protein modification;regulation of translation initiation in response to stress;unfolded protein response;cellular response to starvation;translation;tRNA aminoacylation for protein translation;negative regulation of translation;regulation of protein metabolic process;protein amino acid phosphorylation
nucleotide binding;protein serine/threonine kinase activity;translation initiation factor activity;small conjugating protein ligase activity;transferase activity;kinase activity;ATP binding;aminoacyl-tRNA ligase activity;eukaryotic translation initiation factor 2alpha kinase activity;protein kinase activity
cytoplasm
eukaryotic translation initiation factor 4E binding protein 2
EIF4EBP2_MOUSE
EIF4EBP2_MOUSE;PHAS-II;Eif4ebp2;4E-BP2;2810011I19Rik
insulin receptor signaling pathway;regulation of translational initiation;negative regulation of translation;cAMP-mediated signaling;negative regulation of translational initiation;regulation of translation
eukaryotic initiation factor 4E binding;translation initiation factor activity;protein binding
cellular_component
Eph receptor A4
EPHA4_MOUSE
EPHA4_MOUSE
transmembrane receptor protein tyrosine kinase signaling pathway;protein amino acid phosphorylation;axon guidance;adult walking behavior
ATP binding;transferase activity;ephrin receptor activity;protein kinase activity;protein-tyrosine kinase activity;transmembrane receptor protein tyrosine kinase activity;nucleotide binding;receptor activity;kinase activity;protein binding
membrane;integral to membrane
fragile X mental retardation syndrome 1 homolog
FMR1_MOUSE
FMRP;FMR-1;FMR1;Fmr-1;Fmr1;FMR1_MOUSE
central nervous system development;transport;mRNA transport
RNA binding;protein binding
cytoplasm;nucleus
gamma-aminobutyric acid (GABA-B) receptor, 1
GABBR1_MOUSE
GABA-B-R;GABBR1_MOUSE;gamma-aminobutyric acid B receptor, 1A;GABAB1A_MOUSE
signal transduction;G-protein coupled receptor protein signaling pathway
metabotropic glutamate, GABA-B-like receptor activity;signal transducer activity;protein binding;G-protein coupled receptor activity;GABA-B receptor activity;receptor activity
postsynaptic membrane;cell junction;synapse;integral to membrane;cytoplasm;membrane
glial fibrillary acidic protein
GFAP_MOUSE
GFAP_MOUSE;Gfap
intermediate filament-based process
protein binding;structural molecule activity
intermediate filament;membrane fraction;cytoplasm
guanine nucleotide binding protein (G protein), alpha inhibiting 1
GNAI1_MOUSE
GNAI1_MOUSE;Gnai1;Gnai-1;Gialpha1
G-protein coupled receptor protein signaling pathway
GTPase activity
intracellular
glutamate receptor, metabotropic 1
GRM1_MOUSE
MGLUR1_MOUSE;mGluR1;mGluR1alpha;GRM1_MOUSE;Gprc1a;Glutamate receptor, metabotropic 1
regulation of sensory perception of pain;regulation of MAPKKK cascade;locomotory behavior;G-protein coupled receptor protein signaling pathway;signal transduction;activation of MAPK activity;activation of MAPKK activity
metabotropic glutamate, GABA-B-like receptor activity;protein binding;G-protein coupled receptor activity;receptor activity;signal transducer activity;PLC activating metabotropic glutamate receptor activity
microsome;postsynaptic density;membrane;integral to membrane;dendrite;nucleus;cell soma;postsynaptic membrane
intercellular adhesion molecule 5, telencephalin
ICAM5_MOUSE
Tlcn;ICAM5;Icam5;TLCN;TLN;ICAM5_MOUSE
cell-cell adhesion;cell adhesion
protein binding
membrane;plasma membrane;integral to membrane
inositol 1,4,5-trisphosphate 3-kinase A
ITPKA_MOUSE
ITPKA_MOUSE;Itpka;MGC28924
inositol metabolic process
ATP binding;kinase activity;nucleotide binding;transferase activity;inositol or phosphatidylinositol kinase activity;calmodulin binding;inositol trisphosphate 3-kinase activity
cellular_component
potassium voltage-gated channel, shaker-related subfamily, beta member 1
KCNAB1_MOUSE
Akr8a8;Kcnab1;potassium voltage-gated channel, shaker-related subfamily, beta member 1;KCNAB1_MOUSE;Kv beta1_MOUSE
ion transport;potassium ion transport;transport
voltage-gated ion channel activity;voltage-gated potassium channel activity;potassium channel activity;potassium ion binding;oxidoreductase activity;ion channel activity
integral to membrane;integral to plasma membrane;cytoplasm
Kv channel interacting protein 3, calsenilin
KCNIP3_MOUSE
KCNIP3_MOUSE;Csen;DREAM;calsenilin, presenilin binding protein, EF hand transcription factor;KCHIP3_MOUSE
negative regulation of transcription from RNA polymerase II promoter;potassium ion transport;apoptosis;behavior;negative regulation of transcription;sensory perception of pain;regulation of neuron apoptosis;response to pain;transcription;regulation of transcription, DNA-dependent;ion transport;transport
protein C-terminus binding;voltage-gated ion channel activity;specific transcriptional repressor activity;potassium ion binding;calcium-dependent protein binding;ion channel activity;DNA binding;protein binding;calcium ion binding;potassium channel activity;transcription repressor activity
cytoplasm;nucleus;membrane;cytosol;Golgi apparatus;endoplasmic reticulum
potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2
KCNN2_MOUSE
KCNN2_MOUSE;Kcnn2;SK2;SK-2_MOUSE;potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2
potassium ion transport;biological_process;transport;ion transport
small conductance calcium-activated potassium channel activity;ion channel activity;calmodulin binding;calcium-activated potassium channel activity
integral to membrane;membrane
LIM-domain containing, protein kinase
LIMK1_MOUSE
LIMK1_MOUSE
protein amino acid phosphorylation;positive regulation of axon extension
protein heterodimerization activity;metal ion binding;transferase activity;kinase activity;nucleotide binding;zinc ion binding;protein kinase activity;ATP binding;protein binding;protein-tyrosine kinase activity;protein serine/threonine kinase activity
focal adhesion;nucleus;cytoplasm
mannosidase 2, alpha B1
MAN2B1_MOUSE
MANB;LAMAN;MAN2B;MAN2B1;MAN2B1_MOUSE;Man2b1;AW107687
carbohydrate metabolic process;learning and/or memory;metabolic process;mannose metabolic process
alpha-mannosidase activity;hydrolase activity, acting on glycosyl bonds;zinc ion binding;mannosidase activity;metal ion binding;hydrolase activity
lysosome
mitogen-activated protein kinase 3
MAPK3_MOUSE
MAPK3_MOUSE
response to exogenous dsRNA;response to lipopolysaccharide;lipopolysaccharide-mediated signaling pathway;phosphorylation;organ morphogenesis;signal transduction;cell cycle;response to DNA damage stimulus;cartilage development;protein amino acid phosphorylation;sensory perception of pain
transferase activity;nucleotide binding;phosphotyrosine binding;protein kinase activity;protein serine/threonine kinase activity;MAP kinase activity;protein binding;kinase activity;ATP binding
nucleus;cytoplasm
MAS1 oncogene
MAS1_MOUSE
MAS1_MOUSE;Oncogene MAS1
cellular process;G-protein coupled receptor protein signaling pathway;regulation of cell cycle;signal transduction
receptor activity;rhodopsin-like receptor activity;G-protein coupled receptor activity;signal transducer activity
intracellular;membrane;integral to membrane
methyl-CpG binding domain protein 1
MBD1_MOUSE
Cxxc3;MBD1_MOUSE;Mbd1;PCM1
transcription;DNA methylation;regulation of transcription, DNA-dependent
zinc ion binding;DNA binding;metal ion binding
heterochromatin;chromatin;nucleus
neurocan
NCAN_MOUSE
Ncan;CSPG3;Cspg3;NCAN;CSPG3_MOUSE;neurocan;C230035B04
cell adhesion
sugar binding;hyaluronic acid binding;calcium ion binding
extracellular space
NEL-like 2 (chicken)
NELL2_MOUSE
NELL2_MOUSE;mel91;Nell2;A330108N19Rik
cell adhesion
calcium ion binding;structural molecule activity
extracellular space;extracellular region
neuro-oncological ventral antigen 2
NOVA2_MOUSE
Gm1424
protein binding;RNA binding
neurogranin
NRGN_MOUSE
0710001B06Rik;NG;NG/RC3;NRGN_MOUSE;RC3;Nrgn;Pss1;R75334
protein kinase cascade
calmodulin binding
opioid receptor-like 1
OPRL1_MOUSE
KOR-3;OPRL1_MOUSE;ORL1;KOR3;nociceptin receptor;12C;K3;kappa-type 3 opioid receptor;orphanin FQ receptor
G-protein coupled receptor protein signaling pathway;signal transduction
rhodopsin-like receptor activity;signal transducer activity;receptor activity;G-protein coupled receptor activity;opioid receptor activity;X-opioid receptor activity
integral to membrane;membrane
opioid receptor, mu 1
OPRM1_MOUSE
MU-TYPE OPIOID RECEPTOR;OPRM1_MOUSE;MOR-1;MOR1
G-protein coupled receptor protein signaling pathway;signal transduction;behavior;G-protein signaling, adenylate cyclase inhibiting pathway;dopamine receptor, adenylate cyclase activating pathway;locomotory behavior
receptor activity;G-protein coupled receptor activity;signal transducer activity;mu-opioid receptor activity;opioid receptor activity;rhodopsin-like receptor activity
membrane fraction;membrane;integral to membrane
p21 (CDKN1A)-activated kinase 3
PAK3_MOUSE
PAK3_MOUSE
multicellular organismal development;protein amino acid phosphorylation
transferase activity;kinase activity;ATP binding;protein binding;protein serine/threonine kinase activity;protein kinase activity;catalytic activity;magnesium ion binding;nucleotide binding;metal ion binding
Parkinson disease (autosomal recessive, early onset) 7
PARK7_MOUSE
DJ-1_MOUSE;Dj1-pending;hiptar0004921;thiJ homologue (Caenorhabditis elegans);CAP1 (Rattus norvegicus);DJ-1 putative peptidase;4-methyl-5(beta-hydroxyethyl)-thiazole monophosphate biosynthesis protein (Escherichia coli);contraception-associated protein 1 (Rattus norvegicus);thiJ g.p. (Escherichia coli)
response to hydrogen peroxide;synaptic transmission, dopaminergic;adult locomotory behavior;dopamine uptake;cell proliferation
RNA binding
nucleus;cytoplasm
plasminogen activator, tissue
PLAT_MOUSE
PLAT_MOUSE;t-plasminogen activator;hiptar0004973;tPA;tissue plasminogen activator
platelet-derived growth factor receptor signaling pathway;proteolysis
peptidase activity;plasminogen activator activity;hydrolase activity;serine-type endopeptidase activity
extracellular region;extracellular space;apical part of cell;cytoplasm;secretory granule
phospholipase C, beta 4
PLCB4_MOUSE
PLCB4_MOUSE;Plcb4
intracellular signaling cascade;lipid metabolic process;signal transduction
protein binding;phospholipase C activity;phosphoinositide phospholipase C activity
dendrite;postsynaptic density;smooth endoplasmic reticulum;nucleus;microsome
protein phosphatase 1, regulatory (inhibitor) subunit 1A
PPP1R1A_MOUSE
Ppp1r1a;0610038N18Rik;PPP1R1A_MOUSE;I-1
carbohydrate metabolic process;signal transduction;glycogen metabolic process
protein binding;protein phosphatase inhibitor activity
protein kinase, cAMP dependent, catalytic, beta
PRKACB_MOUSE
PRKACB_MOUSE
protein amino acid phosphorylation;G-protein signaling, coupled to cAMP nucleotide second messenger
magnesium ion binding;protein serine/threonine kinase activity;cAMP-dependent protein kinase activity;ATP binding;kinase activity;transferase activity;nucleotide binding;protein kinase activity
cAMP-dependent protein kinase complex;cytoplasm;nucleus
protein kinase, cAMP dependent regulatory, type I beta
PRKAR1B_MOUSE
PRKAR1B_MOUSE;RIbeta;Prkar1b;AI385716
cell proliferation;organ morphogenesis;protein amino acid phosphorylation;signal transduction;learning and/or memory
cAMP binding;kinase activity;cAMP-dependent protein kinase regulator activity;nucleotide binding
cytoplasm;cAMP-dependent protein kinase complex
pleiotrophin
PTN_MOUSE
HARP;HBBN;HBGF-8;HBNF;OSF;Ptn;HB-GAM;PTN;Osf1;Osf-1;PTN_MOUSE
bone mineralization;cell proliferation;learning;ossification
heparin binding;growth factor activity
extracellular space;proteinaceous extracellular matrix;extracellular region
protein tyrosine phosphatase, receptor type, D
PTPRD_MOUSE
PTPRD_MOUSE;Ptprd;MGC36851
dephosphorylation;transmembrane receptor protein tyrosine phosphatase signaling pathway;protein amino acid dephosphorylation
phosphoric monoester hydrolase activity;hydrolase activity;receptor activity;protein tyrosine phosphatase activity;phosphoprotein phosphatase activity
integral to membrane;membrane;plasma membrane
retinoic acid receptor, beta
RARB_MOUSE
RARB_MOUSE;Rarb
embryonic eye morphogenesis;positive regulation of transcription from RNA polymerase II promoter;positive regulation of apoptosis;regulation of transcription, DNA-dependent;ventricular cardiac muscle cell differentiation;transcription;ureteric bud development
receptor activity;DNA binding;metal ion binding;ligand-dependent nuclear receptor activity;transcription activator activity;sequence-specific DNA binding;zinc ion binding;transcription factor activity;steroid hormone receptor activity;retinoic acid receptor activity
nucleus
regulating synaptic membrane exocytosis 1
RIMS1_MOUSE
RIMS1_MOUSE;Rim;RIM1;RIM1a;Serg1;Rims1;Rab3ip1
exocytosis;neurotransmitter transport;intracellular protein transport;regulation of long-term neuronal synaptic plasticity;transport
protein binding;metal ion binding;Rab GTPase binding;zinc ion binding
cell junction;synapse
Ras and Rab interactor 1
RIN1_MOUSE
RIN1_MOUSE;Rin1
intracellular signaling cascade;signal transduction;neuropeptide signaling pathway;endocytosis
GTPase activator activity;protein binding
cytoplasm;cytoskeleton;membrane
ryanodine receptor 3
RYR3_MOUSE
Ryr3;AI851294;RYR3_MOUSE;ryanodine receptor 3
striated muscle contraction;transport;cellular calcium ion homeostasis;ion transport
receptor activity;ion channel activity
integral to membrane;junctional membrane complex
syndecan 3
SDC3_MOUSE
mKIAA0468;Synd3;SDC3_MOUSE;syn-3;MGC69616;SDC3;MGC65603;Sdc3
cytoskeletal protein binding
membrane;integral to membrane
serine (or cysteine) peptidase inhibitor, clade E, member 2
SERPINE2_MOUSE
GDN;Serpine2;PI7;PN1;Glia derived nexin [Precursor];Protease nexin I;PN-1;Protease inhibitor 7;SERPINE2_MOUSE
nervous system development;multicellular organismal development;cell differentiation
heparin binding;serine-type endopeptidase inhibitor activity;endopeptidase inhibitor activity
extracellular region;extracellular space
solute carrier family 24 (sodium/potassium/calcium exchanger), member 2
SLC24A2_MOUSE
Slc24a2;2810021B17Rik;SLC24A2_MOUSE
integral to membrane
solute carrier family 8 (sodium/calcium exchanger), member 2
SLC8A2_MOUSE
Ncx2;SLC8A2_MOUSE;Slc8a2
calcium ion transport;transport
calcium:sodium antiporter activity;transmembrane transporter activity;calmodulin binding
integral to plasma membrane;membrane;integral to membrane
ST8 alpha-N-acetyl-neuraminide alpha-2,8-sialyltransferase 4
ST8SIA4_MOUSE
Siat8d;SIAT8D;SIAT8D_MOUSE;PST;PST-1;ST8SiaIV
protein amino acid glycosylation
transferase activity, transferring glycosyl groups;alpha-N-acetylneuraminate alpha-2,8-sialyltransferase activity;sialyltransferase activity;transferase activity
integral to membrane;Golgi apparatus;integral to Golgi membrane;membrane
synaptopodin
SYNPO_MOUSE
SYNPO_MOUSE;9330140I15Rik;LOC170766;Synpo
cortical cytoskeleton organization and biogenesis
actin binding
cytoskeleton;tight junction;actin cytoskeleton;membrane;cell junction;postsynaptic membrane;synapse;axon;cell projection;dendritic spine;dendrite;cytoplasm
thymus cell antigen 1, theta
THY1_MOUSE
Thy1;Thy-1;THY1_MOUSE;THY1;THY-1;CD90;Thy1.1
retinal cone cell development;negative regulation of T cell receptor signaling pathway;angiogenesis
GPI anchor binding
membrane;external side of plasma membrane;anchored to external side of plasma membrane
tropomodulin 2
TMOD2_MOUSE
N-Tmod;TMOD2_MOUSE;NTMOD;Tmod2
positive regulation of G-protein coupled receptor protein signaling pathway;learning and/or memory;nerve-nerve synaptic transmission
actin binding;tropomyosin binding
cytoskeleton;cytoplasm
ubiquitin protein ligase E3A
UBE3A_MOUSE
4732496B02;UBE3A_MOUSE;Hpve6a;Ube3a
ubiquitin-dependent protein catabolic process;protein modification process;ubiquitin cycle
ubiquitin-protein ligase activity;ligase activity;protein binding
protein complex;cytosol;cytoplasm;nucleus;intracellular
ubiquitin specific peptidase 14
USP14_MOUSE
ubiquitin specific protease 14;USP14_MOUSE;TGT subunit;hiptar0005312;USP14 g.p. (Homo sapiens);tRNA-guanine transglycosylase 60-kDa subunit
synaptic transmission;ubiquitin cycle;ubiquitin-dependent protein catabolic process;protein modification process
cysteine-type peptidase activity;peptidase activity;ubiquitin thiolesterase activity;hydrolase activity
soluble fraction;synaptosome
voltage-dependent anion channel 1
VDAC1_MOUSE
VDAC1_MOUSE;VDAC1_MOUSE_V1;porin-1_MOUSE_v1;voltage-dependent anion channel 1;Vdac5;Vdac1;Plasmalemmal VDAC1;PL-VDAC1
learning;nerve-nerve synaptic transmission;synaptic transmission;behavioral fear response;transport;ion transport;anion transport;apoptosis
voltage-gated ion-selective channel activity
mitochondrion;mitochondrial inner membrane;membrane;integral to membrane;mitochondrial outer membrane;outer membrane;extracellular space
very low density lipoprotein receptor
VLDLR_MOUSE
VLDLR_MOUSE;Vldlr
transport;positive regulation of protein kinase activity;cholesterol metabolic process;lipid transport;endocytosis;steroid metabolic process;lipid metabolic process
lipid transporter activity;receptor activity;calcium ion binding
integral to membrane;membrane;membrane fraction;extracellular space;coated pit
A kinase (PRKA) anchor protein 5
AKAP5_MOUSE
Gm258
protein kinase binding
Cdc42 guanine nucleotide exchange factor (GEF) 9
ARHGEF9_MOUSE
TIG120842
intracellular signaling cascade;regulation of Rho protein signal transduction;small GTPase mediated signal transduction
Rho guanyl-nucleotide exchange factor activity;guanyl-nucleotide exchange factor activity
cell cortex;cytoplasm;intracellular
ataxin 1
ATXN1_MOUSE
Ataxin-1;Atx1;SCA1;SCA1_MOUSE;Sca1
adult locomotory behavior;regulation of excitatory postsynaptic membrane potential;visual learning
RNA binding;binding
cytoplasm;nuclear inclusion body;nuclear matrix;nucleus
beta-1,3-glucuronyltransferase 1 (glucuronosyltransferase P)
B3GAT1_MOUSE
0710007K08Rik;AI846286;B3GAT1;B3GAT1_MOUSE;B3gat1;GlcAT-P;HNK-1
UDP-galactose:beta-N-acetylglucosamine beta-1,3-galactosyltransferase activity;galactosylgalactosylxylosylprotein 3-beta-glucuronosyltransferase activity;glucuronosyltransferase activity;manganese ion binding;metal ion binding;transferase activity
Golgi apparatus;extracellular space;integral to membrane;membrane
complexin 2
CPLX2_MOUSE
921-L;AI413745;AW492120;CPLX2_MOUSE;Cplx2
exocytosis;mast cell degranulation;membrane fusion;neurotransmitter transport;transport;vacuole organization and biogenesis;vesicle docking during exocytosis
syntaxin binding
cytoplasm
galanin
GAL_MOUSE
GALANIN;GALANIN MESSAGE-ASSOCIATED PEPTIDE;GALN;GAL_MOUSE;GLNN;GMAP
nervous system development;neuropeptide signaling pathway
hormone activity
extracellular region;extracellular space
glutamate receptor, ionotropic, delta 2
GRID2_MOUSE
GLURD2_MOUSE;GRID2_MOUSE;Grid2;Lc;glutamate receptor, ionotropic, delta 2
ion transport;prepulse inhibition;regulation of excitatory postsynaptic membrane potential;synaptic transmission, glutamatergic;transport
extracellular-glutamate-gated ion channel activity;ion channel activity;ionotropic glutamate receptor activity;protein binding;receptor activity
cell junction;integral to membrane;membrane;membrane fraction;postsynaptic membrane;synapse;synaptosome
5-hydroxytryptamine (serotonin) receptor 2C
HTR2C_MOUSE
5-HT-2C;5-HT1C, 5HT2C;5-hydroxytryptamine 2C receptor;5HT-1;5HT-2C;HTR2C_MOUSE;serotonin receptor 2C
G-protein coupled receptor protein signaling pathway;inositol phosphate-mediated signaling;signal transduction
G-protein coupled receptor activity;receptor activity;rhodopsin-like receptor activity;serotonin receptor activity;signal transducer activity
external side of plasma membrane;integral to membrane;membrane
laminin, alpha 2
LAMA2_MOUSE
5830440B04;LAMA2;LAMA2_MOUSE;Lama2;dy;mer;merosin
cell adhesion;positive regulation of synaptic transmission, cholinergic;regulation of cell adhesion;regulation of cell migration;regulation of embryonic development
extracellular matrix structural constituent;protein binding;receptor binding
basal lamina;basement membrane;extracellular matrix;extracellular region;extracellular space;laminin-1 complex;proteinaceous extracellular matrix;sarcolemma
leptin receptor
LEPR_MOUSE
DB;LEPR;LEPROT;LEPR_MOUSE;Lepr;MGC6694;OB-RGRP;OBR;Obr;db;diabetes;obese-like;obl
cholesterol metabolic process;negative regulation of hydrolase activity;regulation of metabolic process;signal transduction
hematopoietin/interferon-class (D200-domain) cytokine receptor activity;protein binding;receptor activity;transmembrane receptor activity
extracellular region;extracellular space;integral to membrane;integral to plasma membrane;membrane
purinergic receptor P2X, ligand-gated ion channel 4
P2RX4_MOUSE
P2RX4_MOUSE;P2RX4_MOUSE_V1;P2X4;P2X4_MOUSE_v1;P2rx4;purinergic receptor P2X, ligand-gated ion channel 4
calcium ion transport;ion transport;metabolic process;nitric oxide biosynthetic process;regulation of excitatory postsynaptic membrane potential;transport;vasodilation
ATP binding;ATP-gated cation channel activity;ion channel activity;receptor activity
apical part of cell;integral to membrane;integral to plasma membrane;membrane
PTEN induced putative kinase 1
PINK1_MOUSE
1190006F07RIK_MOUSE;1190006F07Rik;PINK1_MOUSE
protein amino acid phosphorylation;protein kinase cascade
ATP binding;kinase activity;magnesium ion binding;metal ion binding;nucleotide binding;protein kinase activity;protein serine/threonine kinase activity;transferase activity
mitochondrion
prion protein
PRNP_MOUSE
CJD;Creutzfeld-Jakob disease, Gerstmann-Strausler-Scheinker syndrome;PRIP;PRNP_MOUSE;fatal familial insomnia;p27-30;prion protein
cellular copper ion homeostasis;nucleobase, nucleoside, nucleotide and nucleic acid metabolic process;protein homooligomerization;response to oxidative stress
GPI anchor binding;copper ion binding;protein binding
Golgi apparatus;endoplasmic reticulum;lipid raft;membrane;plasma membrane
protein tyrosine phosphatase, non-receptor type 4
PTPN4_MOUSE
PTPMEG;PTPN4_MOUSE;Ptn4;Ptpn4;TEP;hPTP-MEG
intracellular signaling cascade;protein amino acid dephosphorylation
hydrolase activity;non-membrane spanning protein tyrosine phosphatase activity;phosphoprotein phosphatase activity;prenylated protein tyrosine phosphatase activity;protein tyrosine phosphatase activity;receptor activity;structural molecule activity
cytoplasm;cytoskeleton
tenascin C
TNC_MOUSE
AI528729;Hxb;TN-C;TNC_MOUSE;Ten;Tnc
cell adhesion;neuromuscular junction development;signal transduction
fibronectin binding;protein binding;receptor binding
basement membrane;extracellular region;extracellular space;proteinaceous extracellular matrix
WASP family 1
WASF1_MOUSE
AI195380;AI838537;Scar;WASF1_MOUSE;WAVE-1;Wasf1
actin filament polymerization;cell morphogenesis;cell motility;protein complex assembly
actin binding;protein binding
actin cytoskeleton;cytoplasm;cytoskeleton;lamellipodium;mitochondrial outer membrane;mitochondrion
Appendix B
Over-represented disease genes
The following tables summarize the genes assigned to the two most represented diseases (alone or
together), obtained both by interrogating the DO Knowledgebase with genes relevant to dendritic
plasticity and by network expansion. GO Biological Process terms were used as the reference
annotation.
Alzheimer’s disease
Gene Symbol | Gene Name | GO Biological Process
A2M | alpha-2-macroglobulin | intracellular protein transport
APBB1 | amyloid beta (A4) precursor protein-binding, family B, member 1 | axonogenesis
APOA1 | apolipoprotein A-I | cholesterol metabolism
APOE | apolipoprotein E | regulation of neuronal synaptic plasticity
APP | amyloid beta (A4) precursor protein | signal transduction
BACE1 | beta-secretase 1 | beta amyloid metabolic process
BDNF | brain derived neurotrophic factor | neurogenesis
CDC2 | cell division cycle 2, G1 to S and G2 to M | start control point of mitotic cell cycle
CHAT | choline acetyltransferase | neurotransmitter biosynthesis
ESR1 | estrogen receptor 1 | cell growth;negative regulation of mitosis
FYN | fyn proto-oncogene | feeding behavior;learning;
LRP1 | low density lipoprotein receptor-related protein 1 | cell proliferation;pathogenesis
MAPK8IP1 | mitogen-activated protein kinase 8 interacting protein 1 | regulation of JNK cascade
MAPT | microtubule-associated protein tau | microtubule stabilization
NTF3 | neurotrophin 3;neurotrophin-3 (HDNF/NT-3) | brain development
PRNP | prion protein (p27-30) | signal transduction
PSEN1 | presenilin 1 | intracellular signaling cascade
PSEN2 | presenilin 2 | intracellular signaling cascade
SNCA | synuclein, alpha | central nervous system development;pathogenesis
Schizophrenia
Gene Symbol | Gene Name | GO Biological Process
CTLA4 | cytotoxic T-lymphocyte-associated protein 4 | immune response
HTATIP | HIV-1 Tat interactive protein, 60 kD | chromatin assembly/disassembly
CHL1 | cell adhesion molecule with homology to L1CAM (close homolog of L1) | axon guidance;signal transduction
APOE | apolipoprotein E | learning and/or memory;regulation of neuronal synaptic plasticity
CHRNA7 | cholinergic receptor, nicotinic, alpha polypeptide 7 | activation of MAPK;synaptic transmission
CNR1 | cannabinoid receptor 1 (brain) | G-protein signaling, coupled to cyclic nucleotide second messenger
BDNF | brain derived neurotrophic factor | neurogenesis;regulation of long-term neuronal synaptic plasticity
DRD5 | dopamine receptor 5 | synaptic transmission;transmission of nerve impulse
GABBR1 | gamma-aminobutyric acid (GABA) B receptor, 1 | gamma-aminobutyric acid signaling pathway;synaptic transmission
GABRA5 | gamma-aminobutyric acid A receptor, alpha 5 | associative learning;synaptic transmission
GRIA4 | glutamate receptor, ionotrophic, AMPA 4 | glutamate signaling pathway;synaptic transmission
GRIN1 | glutamate receptor, ionotropic, NMDA1 | learning and/or memory;regulation of synaptic plasticity
GRIN2A | glutamate receptor, ionotropic, NMDA2A (epsilon 1) | learning and/or memory;synaptic transmission
GRIN2B | glutamate receptor, ionotropic, NMDA2B (epsilon 2) | learning and/or memory;synaptic transmission
NTF3 | neurotrophin 3 | brain development;signal transduction
PRNP | prion protein (p27-30) | pathogenesis;signal transduction
HTR2C | 5-hydroxytryptamine (serotonin) receptor 2C | serotonin receptor signaling pathway;synaptic transmission
YWHAH | tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, eta polypeptide | intracellular protein transport;protein kinase C activation
Schizophrenia and Alzheimer’s related genes after network extension
Gene Symbol | GO Biological Process | Gene Name
NRG2 | anti-apoptosis;cell-cell signaling | neuregulin 2
ADRA2B | G-protein coupled receptor protein signaling pathway;cell-cell signaling | adrenergic, alpha-2B-, receptor
OLIG2 | cell growth and/or maintenance;regulation of transcription, DNA-dependent | oligodendrocyte lineage transcription factor 2
SULF1 | apoptosis;heparan sulfate proteoglycan metabolism;lipid metabolism | sulfatase 1
GFRA2 | transmembrane receptor protein tyrosine kinase signaling pathway | GDNF family receptor alpha 2
TRIB1 | regulation of MAP kinase activity | Tribbles homolog 1
LGI1 | cell proliferation;neurogenesis | leucine-rich, glioma inactivated 1
NRG3 | embryonic development;regulation of cell growth;transmembrane receptor protein tyrosine kinase ligand binding | neuregulin 3
NTF3 | anti-apoptosis;brain development;cell motility;cell-cell signaling;glial cell fate determination; | neurotrophin 3
TDGF1 | mesoderm cell fate determination;signal transduction | teratocarcinoma-derived growth factor 1
FGF13 | cell-cell signaling;neurogenesis | fibroblast growth factor 13
NRTN | MAPKKK cascade;neurogenesis;transmembrane receptor protein tyrosine kinase signaling pathway | neurturin
PRNP | posttranslational membrane targeting;regulation of transcription, DNA-dependent;signal transduction | prion protein (p27-30) (Creutzfeld-Jakob disease, Gerstmann-Strausler-Scheinker syndrome, fatal familial insomnia)
MFGE8 | cell adhesion;oncogenesis | milk fat globule-EGF factor 8 protein
SHC3 | intracellular signaling cascade;regulation of transcription, DNA-dependent | src homology 2 domain containing transforming protein C3
TACC1 | cell growth and/or maintenance | transforming, acidic coiled-coil containing protein 1
THBS2 | cell adhesion | thrombospondin 2
RUSC1 | development | RUN and SH3 domain containing 1
MFN2 | biological_process unknown | mitofusin 2
APOE | learning and/or memory;regulation of neuronal synaptic plasticity | apolipoprotein E
ANGPTL1 | development | angiopoietin-like 1
VGF | biological_process unknown | VGF nerve growth factor inducible
BDNF | neurogenesis | brain-derived neurotrophic factor
HTR1B | G-protein signaling, coupled to cyclic nucleotide second messenger;synaptic transmission | 5-hydroxytryptamine (serotonin) receptor 1B
MAPK6 | cell cycle;protein amino acid phosphorylation;signal transduction | mitogen-activated protein kinase 6
ATP1A2 | ATP hydrolysis coupled proton transport;hydrogen ion homeostasis; | ATPase, Na+/K+ transporting, alpha 2 (+) polypeptide
Appendix C
List of abbreviations
GOA Gene Ontology Annotation
GO Gene Ontology
DAG Directed Acyclic Graph
DO Disease Ontology
FTP File Transfer Protocol
UMLS Unified Medical Language System
HUGO Human Genome Organisation
OBO Open Biomedical Ontologies
GAD Genetic Association Database
OMIM Online Mendelian Inheritance in Man
PharmGKB Pharmacogenetics and Pharmacogenomics Knowledge Base
KEGG Kyoto Encyclopedia of Genes and Genomes
PubMed Publisher's MEDLINE
PK Pharmacokinetics
PD Pharmacodynamics
MeSH Medical Subject Headings
CSHL Cold Spring Harbor Laboratory
OWL Web Ontology Language
MGI Mouse Genome Informatics
SGD Saccharomyces Genome Database
RxList The Internet Drug Index
CAS RN Chemical Abstracts Service Registry Number
CREB cyclic AMP responsive element binding
AMPA α-amino-3-hydroxy-5-methyl-4-isoxazolepropionate
NMDA N-methyl-D-aspartic acid
HD Huntington’s disease
UNIX UNiplexed Information and Computing System
EASE Expression Analysis Systematic Explorer
DAVID Database for Annotation Visualization and Integrated Discovery
EBI European Bioinformatics Institute
UniProtKB UniProt Knowledgebase
IPI International Protein Index
IPA Ingenuity Pathway Analysis
NCBO National Center for Biomedical Ontology
ICD9CM The International Classification of Diseases, Ninth Revision, Clinical Modification
SNOMED Systematized Nomenclature of Medicine-Clinical Terms
CNS Central Nervous System
HPA Hypothalamic-Pituitary-Adrenal axis
ACTH adrenocorticotropic hormone
ChEBI Chemical Entities of Biological Interest
CAS Chemical Abstracts Service Registry Database
Bibliography
[1] S. Philippi and J. Kohler. Addressing the problems with life-science databases for traditional uses
and systems biology. Nat Rev Genet, 7(6):482-8, 2006.
[2] M. A. Harris, J. Clark, A. Ireland, J. Lomax, M. Ashburner, R. Foulger, K. Eilbeck, S. Lewis, B.
Marshall, C. Mungall, J. Richter, G. M. Rubin, J. A. Blake, C. Bult, M. Dolan, H. Drabkin, J. T.
Eppig, D. P. Hill, L. Ni, M. Ringwald, R. Balakrishnan, J. M. Cherry, K. R. Christie, M. C.
Costanzo, S. S. Dwight, S. Engel, D. G. Fisk, J. E. Hirschman, E. L. Hong, R. S. Nash, A.
Sethuraman, C. L. Theesfeld, D. Botstein, K. Dolinski, B. Feierbach, T. Berardini, S. Mundodi, S.
Y. Rhee, R. Apweiler, D. Barrell, E. Camon, E. Dimmer, V. Lee, R. Chisholm, P. Gaudet, W.
Kibbe, R. Kishore, E. M. Schwarz, P. Sternberg, M. Gwinn, L. Hannick, J. Wortman, M.
Berriman, V. Wood, N. de la Cruz, P. Tonellato, P. Jaiswal, T. Seigfried, and R. White. The Gene
Ontology (GO) database and informatics resource. Nucleic Acids Res, 32(Database issue):D258-
61, 2004.
[3] T. R. Gruber. Towards principles for the design of ontologies used for knowledge sharing. In N.
Guarino and R. Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge
Representation, Deventer, The Netherlands, 1993. Kluwer Academic Publishers.
[4] P. Lambrix, M. Habbouche, and M. Perez. Evaluation of ontology development tools for
bioinformatics. Bioinformatics, 19(12):1564-71, 2003.
[5] R. Mack and M. Hehenberger. Text-based knowledge discovery: search and mining of
life-sciences documents. Drug Discov Today, 7(11 Suppl):S89-98, 2002.
[6] M. Harris and H. Parkinson. Standards and Ontologies for Functional Genomics: Towards Unified
Ontologies for Biology and Biomedicine. Comparative and Functional Genomics, 4(1):116-120,
2003. doi:10.1002/cfg.249.
[7] M. Deng, Z. Tu, F. Sun, and T. Chen. Mapping Gene Ontology to proteins based on
protein-protein interaction data. Bioinformatics, 20(6):895-902, 2004.
[8] O. Bodenreider and R. Stevens. Bio-ontologies: current trends and future directions. Brief
Bioinform, 7(3):256-74, 2006.
[9] J. I. Clark, C. Brooksbank, and J. Lomax. It's all GO for plant scientists. Plant Physiol,
138(3):1268-79, 2005.
[10] J. B. Bard and S. Y. Rhee. Ontologies in biology: design, applications and future challenges. Nat
Rev Genet, 5(3):213-22, 2004.
[11] T. R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge
Acquisition, 5(2):199-220, 1993.
[12] B. Andersen. What is an ontology? Ontology Works (http://www.ontologyworks.com), 2001.
[13] O. Bodenreider, J. A. Mitchell, and A. T. McCray. Biomedical ontologies. Pac Symp Biocomput,
pages 76-8, 2005.
[14] S. Schulze-Kremer. Ontologies for molecular biology and bioinformatics. In Silico Biol, 2(3):179-
93, 2002.
[15] J. S. Caldwell. Ontology recapitulates physiology. Chem Biol, 10(9):784-6, 2003.
[16] D. Devos and A. Valencia. Intrinsic errors in genome annotation. Trends Genet, 17(8):429-31,
2001.
[17] C. Blaschke and A. Valencia. Automatic ontology construction from the literature. Genome
Inform, 13:201-13, 2002.
[18] B. Smith, M. Ashburner, C. Rosse, J. Bard, W. Bug, W. Ceusters, L. J. Goldberg, K. Eilbeck, A.
Ireland, C. J. Mungall, N. Leontis, P. Rocca-Serra, A. Ruttenberg, S. A. Sansone, R. H.
Scheuermann, N. Shah, P. L. Whetzel, and S. Lewis. The OBO Foundry: coordinated evolution of
ontologies to support biomedical data integration. Nat Biotechnol, 25(11):1251-1255, 2007.
[19] B. Smith, W. Ceusters, B. Klagges, J. Kohler, A. Kumar, J. Lomax, C. Mungall, F. Neuhaus, A. L.
Rector, and C. Rosse. Relations in biomedical ontologies. Genome Biol, 6(5):R46, 2005.
[20] J. A. Blake and C. J. Bult. Beyond the data deluge: data integration and bio-ontologies. J Biomed
Inform, 39(3):314-20, 2006.
[21] M. Ashburner, C. A. Ball, J. A. Blake, D. Botstein, H. Butler, J. M. Cherry, A. P. Davis, K.
Dolinski, S. S. Dwight, J. T. Eppig, M. A. Harris, D. P. Hill, L. Issel-Tarver, A. Kasarskis, S.
Lewis, J. C. Matese, J. E. Richardson, M. Ringwald, G. M. Rubin, and G. Sherlock. Gene
ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1):25-
9, 2000.
[22] D. M. Jones and R. C. Paton. Toward principles for the representation of hierarchical knowledge in
formal ontologies. Data & Knowledge Engineering, 31(2):99-113, 1999.
[23] R. Stevens, C. A. Goble, and S. Bechhofer. Ontology-based knowledge representation for
bioinformatics. Brief Bioinform, 1(4):398-414, 2000.
[24] J. Blake and M. Harris. The Gene Ontology Project: Structured vocabularies for molecular biology
and their application to genome and expression analysis. In A. D. Baxevanis, D. B. Davison, R. Page,
G. Stormo, and L. Stein, editors, Current Protocols in Bioinformatics. Wiley & Sons, New York, 2003.
[25] K. Olden and S. Wilson. Environmental health and genomics: visions and implications. Nat Rev
Genet, 1(2):149-53, 2000.
[26] O. Bodenreider. The Unified Medical Language System (UMLS): integrating biomedical
terminology. Nucleic Acids Res, 32(Database issue):D267-70, 2004.
[27] A. Hamosh, A. F. Scott, J. S. Amberger, C. A. Bocchini, and V. A. McKusick. Online Mendelian
Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic
Acids Res, 33(Database issue):D514-7, 2005.
[28] K. G. Becker, K. C. Barnes, T. J. Bright, and S. A. Wang. The genetic association database. Nat
Genet, 36(5):431-2, 2004.
[29] D. S. Wishart, C. Knox, A. C. Guo, S. Shrivastava, M. Hassanali, P. Stothard, Z. Chang, and J.
Woolsey. DrugBank: a comprehensive resource for in silico drug discovery and exploration.
Nucleic Acids Res, 34(Database issue):D668-72, 2006.
[30] M. Hewett, D. E. Oliver, D. L. Rubin, K. L. Easton, J. M. Stuart, R. B. Altman, and T. E. Klein.
PharmGKB: the Pharmacogenetics Knowledge Base. Nucleic Acids Res, 30(1):163-5, 2002.
[31] R. B. Altman. PharmGKB: a logical home for knowledge relating genotype to drug response
phenotype. Nat Genet, 39(4):426, 2007.
[32] T. Hernandez-Boussard, M. Whirl-Carrillo, J. M. Hebert, L. Gong, R. Owen, M. Gong, W. Gor, F.
Liu, C. Truong, R. Whaley, M. Woon, T. Zhou, R. B. Altman, and T. E. Klein. The
pharmacogenetics and pharmacogenomics knowledge base: accentuating the knowledge. Nucleic
Acids Res, 2007.
[33] E. Castrén. Is mood chemistry? Nat Rev Neurosci 6(3):241-246, 2005
[34] D.S. Charney, H.K. Manji. Life stress, genes, and depression: multiple pathways lead to increased
risk and new opportunities for intervention. Sci STKE (225):re5, 2004
[35] R.S. Duman, G.R. Heninger, E.J. Nestler. A molecular and cellular theory of depression. Arch Gen
Psychiatry 54(7):597-606, 1997.
[36] E. J. Nestler, M. Barrot, R. J. DiLeone, A. J. Eisch, S. J. Gold, and L. M. Monteggia. Neurobiology
of depression. Neuron, 34(1):13-25, 2002.
[37] F. Holsboer. Stress, hypercortisolism and corticosteroid receptors in depression: implications for
therapy. J Affect Disord, 62(1-2):77-91, 2001.
[38] R. M. Sapolsky. Glucocorticoids and hippocampal atrophy in neuropsychiatric disorders. Arch Gen
Psychiatry, 57(10):925-35, 2000.
[39] A. A. Russo-Neustadt and M. J. Chen. Brain-derived neurotrophic factor and antidepressant
activity. Curr Pharm Des, 11(12):1495-510, 2005.
[40] K.M. Harris. Structure, development, and plasticity of dendritic spines. Curr Opin Neurobiol,
9:343–348, 1999.
[41] H. Hering and M. Sheng. Dendritic spines: structure, dynamics and regulation. Nature Reviews
Neuroscience 2, 880-888, 2001.
[42] E. Castrén. Neurotrophic effects of antidepressant drugs. Curr Opin Pharmacol 4(1):58-64, 2004.
[43] J.E. Malberg, A.J. Eisch, E.J. Nestler, R.S. Duman. Chronic antidepressant treatment increases
neurogenesis in adult rat hippocampus. J Neurosci 20(24):9104-9110, 2000.
[44] H. van Praag, G. Kempermann, F.H. Gage. Neural consequences of environmental enrichment. Nat
Rev Neurosci 1(3):191-198, 2000.
[45] L. Santarelli, M. Saxe, C. Gross, A. Surget, F. Battaglia, S. Dulawa, N. Weisstaub, J. Lee, R.
Duman, O. Arancio, C. Belzung, R. Hen. Requirement of hippocampal neurogenesis for the
behavioral effects of antidepressants. Science 301(5634):805-809, 2003.
[46] A.K. McAllister, D.C. Lo, L.C. Katz. Neurotrophins regulate dendritic growth in developing visual
cortex. Neuron 15:791–803, 1995.
[47] H.W. Horch, L.C. Katz. BDNF release from single cells elicits local dendritic growth in nearby
neurons. Nat Neurosci 5(11):1177–84, 2002.
[48] A.K. McAllister. Cellular and molecular mechanisms of dendrite growth. Cereb Cortex 10:963–73,
2000.
[49] G. Dennis Jr, B.T. Sherman, D.A. Hosack, J. Yang, W. Gao, H.C. Lane, and R.A. Lempicki.
DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol
4(9):R60, 2003.
[50] DAVID [http://www.DAVID.niaid.nih.gov]
[51] G. Elvidge. Microarray expression technology: from start to finish. Pharmacogenomics 7(1):123-
34, 2006
[52] R.B. Stoughton. Applications of DNA microarrays in biology. Annu Rev Biochem. 74:53-82, 2005
[53] W. Weckwerth, K. Morgenthal. Metabolomics: from pattern recognition to biological
interpretation. Drug Discov Today. 10(22):1551-8, 2005
[54] C.R. Stubberfield, M.J. Page. Applying proteomics to drug discovery. Expert Opin Investig Drugs.
8(1):65-70, 1999
[55] M.R. Flory, R. Aebersold. Proteomic approaches for the identification of cell cycle-related drug
targets. Prog Cell Cycle Res 5:167-71, 2003
[56] J.F. Rual, K. Venkatesan, T. Hao, T. Hirozane-Kishikawa, A. Dricot, N. Li, G.F. Berriz, F.D.
Gibbons, M. Dreze, N. Ayivi-Guedehoussou, N. Klitgord, C. Simon, M. Boxem, S. Milstein, J.
Rosenberg, D.S. Goldberg, L.V. Zhang, S.L. Wong, G. Franklin, S. Li, J.S. Albala, J. Lim, C.
Fraughton, E. Llamosas, S. Cevik, C. Bex, P. Lamesch, R.S. Sikorski, J. Vandenhaute, H.Y.
Zoghbi, A. Smolyar, S. Bosak, R. Sequerra, L. Doucette-Stamm, M.E. Cusick, D.E. Hill, F.P.
Roth, M. Vidal, Towards a proteome-scale map of the human protein-protein interaction network.
Nature 437(7062):1173-8, 2005
[57] S.E. Calvano, W. Xiao, D.R. Richards, R.M. Felciano, H.V. Baker, R.J. Cho, R.O. Chen, B.H.
Brownstein, J.P. Cobb, S.K. Tschoeke, C. Miller-Graziano, L.L. Moldawer, M.N. Mindrinos, R.W.
Davis, R.G. Tompkins, S.F. Lowry. Inflamm and Host Response to Injury Large Scale Collab. Res.
Program. A network-based analysis of systemic inflammation in humans. Nature
437(7061):1032-7, 2005.
[58] E.F. Moore. The Shortest Path Through a Maze. Proc. International Symposium on the
Theory of Switching, Part II, Vol. 30 of “The Annals of the Computation Laboratory of Harvard
University”, Cambridge, MA, Harvard University Press, 1959.
[59] A. Nikitin, S. Egorov, N. Daraselia and I. Mazo. Pathway studio - the analysis and navigation of
molecular networks. Bioinformatics 19 (0):1-3, 2003.
[60] T.L. Spires and A.J. Hannan. Molecular mechanisms mediating pathological plasticity in
Huntington's disease and Alzheimer's disease. Journal of Neurochemistry 100:874–882, 2007.
[61] M. Fischer, S. Kaech, U. Wagner, H. Brinkhaus and A. Matus. Glutamate receptors regulate actin-
based plasticity in dendritic spines. Nature Neuroscience 3:887–894, 2000.
[62] R. Klein. Eph/ephrin signaling in morphogenesis, neural development and plasticity. Current
Opinion in Cell Biology 16(5):580–589, 2004.
[63] L.A. Glantz, D.A. Lewis. Reduction of synaptophysin immunoreactivity in the prefrontal cortex of
subjects with schizophrenia. Regional and diagnostic specificity. Arch Gen Psychiatry 54: 943–
952, 1997.
[64] D.A. Lewis, D.A. Cruz, D.S. Melchitzky, J.N. Pierri. Lamina-specific deficits in parvalbumin-
immunoreactive varicosities in the prefrontal cortex of subjects with schizophrenia: evidence for
fewer projections from the thalamus. Am J Psychiatry 158: 1411–1422, 2001.
[65] G. Rajkowska, L.D. Selemon, P.S. Goldman-Rakic. Neuronal and glial somal size in the prefrontal
cortex: a postmortem morphometric study of schizophrenia and Huntington disease. Arch Gen
Psychiatry 55: 215–224, 1998.
[66] J.N. Pierri, C.L. Volk, S. Auh, A. Sampson, D.A. Lewis. Decreased somal size of deep layer 3
pyramidal neurons in the prefrontal cortex of subjects with schizophrenia. Arch Gen Psychiatry 58:
466–473, 2001.
[67] M. Segal, V. Greenberger, E. Korkotian. Formation of dendritic spines in cultured striatal neurons
depends on excitatory afferent activity. Eur J Neurosci 17: 2573–2585, 2003.
[68] R.A. Sweet, R.A. Henteleff, W. Zhang, A.R. Sampson and D.A. Lewis. Reduced Dendritic Spine
Density in Auditory Cortex of Subjects with Schizophrenia. Neuropsychopharmacology 34, 374–
389, 2009.
[69] D.J. Selkoe, A. Triller, Y. Christen (Eds.). Synaptic Plasticity and the Mechanism of Alzheimer's Disease.
Springer, 2008.
[70] J. Coyle, R. Duman. Finding the Intracellular Signaling Pathways Affected by Mood Disorder
Treatments. Neuron 38, 157-160, 2003
[71] M. Ribasés, M. Gratacòs, F. Fernández-Aranda, L. Bellodi, C. Boni, M. Anderluh, M.C. Cavallini,
E. Cellini, D. Di Bella, S. Erzegovesi, C. Foulon, M. Gabrovsek, P. Gorwood, J. Hebebrand,
A. Hinney, J. Holliday, X. Hu, A. Karwautz, A. Kipman, R. Komel, B. Nacmias, H. Remschmidt,
V. Ricca, S. Sorbi, M. Tomori, G. Wagner, J. Treasure, D. A. Collier and X. Estivill. Association
of BDNF with restricting anorexia nervosa and minimum body mass index: a family-based
association study of eight European populations. Eur J Hum Genet 13, 428–434, 2005.
[72] A.C. Conner, C. Kissling, E. Hodges, R. Hünnerkopf, R.M. Clement, E. Dudley, C.M. Freitag, M.
Rösler, W. Retz, J. Thome. Neurotrophic Factor-Related Gene Polymorphisms and Adult Attention
Deficit Hyperactivity Disorder (ADHD) Score in a High-Risk Male Population. Am J Med Genet B
Neuropsychiatr Genet. 147B(8):1476-1480, 2008.
[73] M. Dierssen, M. Gratacos, I. Sahun, M. Martin, X. Gallego, A. Amador-Arjona, M. Martinez de
Lagran, P. Murtra, E. Marti, M. A. Pujana, I. Ferrer, E. Dalfo, C. Martinez-Cue, J. Florez, J. F.
Torres-Peraza, J. Alberch, R. Maldonado, C. Fillat, X. Estivill, Transgenic mice overexpressing the
full-length neurotrophin receptor TrkC exhibit increased catecholaminergic neuron density in
specific brain areas and increased anxiety-like behavior and panic reaction. Neurobiology of
Disease 24(2):403-418, 2006.
[74] T. Frodl, P. Zill, T. Baghai, C. Schüle, R. Rupprecht, T. Zetzsche, B. Bondy, M. Reiser, H.J.
Möller, E.M. Meisenzahl. Reduced hippocampal volumes associated with the long variant of the
tri- and diallelic serotonin transporter polymorphism in major depression. Am J Med Genet B
Neuropsychiatr Genet. 147B(7):1003-1007, 2008
[75] D. Rujescu, I. Giegling, A. Gietl, C. Gonnermann, A. Kirner and H.J. Möller. Association study of
a SNP coding for a M129V substitution in the prion protein in schizophrenia. Schizophrenia
Research 62(3):289-291, 2003