Building a Suite of Biomedical Ontologies Barry Smith 1

Building a Suite of Biomedical Ontologies

Barry Smith

1

Problems with UMLS-style approaches

• let a million ontologies bloom, each one close to the terminological habits of its authors

• in concordance with the “not invented here” syndrome

• then map these ontologies, and use these mappings to integrate your different pots of data

2

Mappings are hard

They create an N² problem; they are fragile and expensive to maintain

They need new authorities to maintain them (one for each pair of mapped ontologies), yielding a new risk of forking – who will police the mappings?

The goal should be to minimize the need for mappings, by avoiding redundancy in the first place – one ontology for each domain

Invest resources in disjoint ontology modules which work well together – reduce need for mappings to minimum possible
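
The N² claim can be made concrete with a little arithmetic. The sketch below counts the mappings needed to connect every pair of n overlapping ontologies, assuming one mapping per unordered pair, or two if mappings are maintained in each direction. The function name is illustrative and does not come from the talk.

```python
def pairwise_mappings(n: int, directional: bool = False) -> int:
    """Number of mappings needed to connect every pair of n ontologies."""
    if n < 0:
        raise ValueError("n must be non-negative")
    pairs = n * (n - 1) // 2          # unordered pairs of ontologies
    return 2 * pairs if directional else pairs


for n in (2, 10, 100):
    print(n, pairwise_mappings(n))    # growth is quadratic in n
```

With disjoint modules, by contrast, adding an ontology for a new domain adds no mapping at all, which is the sense in which integration becomes addition.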

3

How to do it right?

• how to create an incremental, evolutionary process, where what is good survives, and what is bad fails
• where the number of ontologies needing to be used together is small – integration = addition
• where these ontologies are stable
• by creating a scenario in which people will find it profitable to reuse ontologies, terminologies and coding systems which have been tried and tested

4

Modularity

modularity ensures
• annotations can be additive
• division of labor amongst domain experts
• high value of training in any given module
• lessons learned in one module can benefit work on other modules
• incentivization of those responsible for individual modules

5

Reasons why GO has been successful

It is a system for prospective standardization built with coherent top level but with content contributed and monitored by domain specialists

Based on community consensus

Updated every night

Clear versioning principles ensure backwards compatibility; prior annotations do not lose their value

Initially low-tech to encourage users, with movement to more powerful formal approaches (including OWL-DL – though still proceeding with caution)

6

GO has learned the lessons of successful cooperation

• Clear documentation
• The terms chosen are already familiar
• Fully open source (allows thorough testing in manifold combinations with other ontologies)
• Subjected to considerable third-party critique
• Tracker for user input and help desk with rapid turnaround

7

GO has been amazingly successful in overcoming the data balkanization problem

but it covers only generic biological entities of three sorts:

– cellular components
– molecular functions
– biological processes

no diseases, symptoms, disease biomarkers, protein interactions, experimental processes …

8

How to create a disease ontology?

• One option: a flat list
• One option: template approach
– Cancer
– Infectious Disease
– Diabetes
– Autoimmune Disease

• To make this work: think very hard about what a disease is

9

Aristotelian definitions

• To define a term ‘A’ in an ontology identify the parent term ‘B’ and start your definition:

• An A is a B which … Cs ….

A = species
B = genus
C = differentia

10

• Cancer disease is a disease which …
• Genetic disease is a disease which …
• Infectious disease is a disease which …
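
The genus-differentia template lends itself to a small data structure. The sketch below is one hypothetical way to hold the three parts and render the sentence "An A is a B which Cs"; the class and helper names are illustrative, and the differentia is left as a placeholder because the slides do not supply one.

```python
from dataclasses import dataclass


def _article(phrase: str) -> str:
    """Pick 'a' or 'an' from the first letter (a rough heuristic only)."""
    return "an" if phrase[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


@dataclass(frozen=True)
class AristotelianDefinition:
    species: str      # A, the term being defined
    genus: str        # B, the parent term
    differentia: str  # C, what marks out the As among the Bs

    def render(self) -> str:
        head = f"{_article(self.species).capitalize()} {self.species}"
        return f"{head} is {_article(self.genus)} {self.genus} which {self.differentia}"


# The differentia is deliberately a placeholder, as on the slide.
print(AristotelianDefinition("infectious disease", "disease", "…").render())
```

Forcing every definition through this shape makes the parent term explicit, so the is_a hierarchy and the textual definitions cannot drift apart.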

11

Information Artifact Ontology (IAO)

Ontology for Biomedical Investigations (OBI)

Ontology for General Medical Science (OGMS)

Basic Formal Ontology (BFO)

12

OBO Foundry Modular Organization

top level
– Basic Formal Ontology (BFO)

mid-level
– Information Artifact Ontology (IAO)
– Ontology for Biomedical Investigations (OBI)
– Ontology for General Medical Science (OGMS)

domain level
– Anatomy Ontology (FMA*, CARO)
– Environment Ontology (EnvO)
– Infectious Disease Ontology (IDO*)
– Biological Process Ontology (GO*)
– Cell Ontology (CL)
– Cellular Component Ontology (FMA*, GO*)
– Phenotypic Quality Ontology (PaTO)
– Subcellular Anatomy Ontology (SAO)
– Sequence Ontology (SO*)
– Molecular Function (GO*)
– Protein Ontology (PRO*)

13

Ontology for General Medical Science

http://code.google.com/p/ogms/

(OBO) http://purl.obolibrary.org/obo/ogms.obo

(OWL) http://purl.obolibrary.org/obo/ogms.owl

14

OGMS-based initiatives

Vital Signs Ontology (VSO)

EHR / Demographics Ontology

Infectious Disease Ontology (IDO)

Psychology Ontology (PSY)

Emotion Ontology (PSY-EM)

Genetic Disease Ontology

Cancer Ontology

15

BFO: the very top

Continuant
  Independent Continuant
  Dependent Continuant
Occurrent (Process, Event)

16

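RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT

GRANULARITY: ORGAN AND ORGANISM; CELL AND CELLULAR COMPONENT; MOLECULE

ORGAN AND ORGANISM
– independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
– dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
– occurrent: Biological Process (GO)

CELL AND CELLULAR COMPONENT
– independent continuant: Cell (CL); Cellular Component (FMA, GO)
– dependent continuant: Cellular Function (GO)

MOLECULE
– independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
– dependent continuant: Molecular Function (GO)
– occurrent: Molecular Process (GO)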

17

BFO & GO

continuant
  independent continuant: cellular component
  dependent continuant: molecular function
occurrent: biological processes

18

Basic Formal Ontology

types:
Continuant
  Independent Continuant (thing)
  Dependent Continuant (quality)
Occurrent (process, event)

instances: .... ..... .......

19

Experience with BFO in building ontologies provides

• a community of skilled ontology developers and users (user group has 120 members)

• associated logical tools
• documentation for different types of users
• a methodology for building conformant ontologies by starting with BFO and populating downwards

20

Example: The Cell Ontology

How to build an ontology

import BFO into ontology editor such as Protégé

work with domain experts to create an initial mid-level classification

find ~50 most commonly used terms corresponding to types in reality

arrange these terms into an informal is_a hierarchy according to this universality principle

A is_a B =def. every instance of A is an instance of B

fill in missing terms to give a complete hierarchy

(leave it to domain experts to populate the lower levels of the hierarchy)
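
The universality principle behind is_a can be illustrated with plain sets. This is a toy sketch only: real ontologies assert is_a between types rather than enumerating instances, and the cell types and instance identifiers below are made up for illustration.

```python
def is_a(instances_of: dict[str, set[str]], a: str, b: str) -> bool:
    """A is_a B holds when every instance of A is also an instance of B."""
    return instances_of[a] <= instances_of[b]   # subset test


# Hypothetical instance data: three individual cells, labeled c1..c3.
instances_of = {
    "cell": {"c1", "c2", "c3"},
    "neuron": {"c1"},
    "erythrocyte": {"c2"},
}

print(is_a(instances_of, "neuron", "cell"))    # every neuron is a cell
print(is_a(instances_of, "cell", "neuron"))    # but not every cell is a neuron
```

A candidate parent that fails this test for even one instance is not a genuine is_a parent, which is a useful check when arranging the first ~50 terms into a hierarchy.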

22

Basic Formal Ontology

continuant
  independent continuant
    organism
  dependent continuant
occurrent

23

Continuants

• continue to exist through time, preserving their identity while undergoing different sorts of changes

• independent continuants – objects, things, ...

• dependent continuants – qualities, attributes, shapes, potentialities ...

24

Occurrents

• processes, events, happenings
– your life
– this process of accelerated cell division

25

Qualities
temperature
blood pressure
mass
...

are continuants

they exist through time while undergoing changes

26

Qualities

temperature / blood pressure / mass ...

are dimensions of variation within the structure of the entity

a quality is something which can change while its bearer remains one and the same

27

A chart representing how John’s temperature changes

28

29

John’s temperature, the temperature he has throughout his entire life, cycles through different determinate temperatures from one time to the next

John’s temperature, in thus changing, exerts an influence on other dimensions of variation in the physiology of the organism through time

30

BFO: The Very Top

continuant
  independent continuant
  dependent continuant
    quality
      temperature
occurrent

31

Blinding Flash of the Obvious

types:
independent continuant
  organism
dependent continuant
  quality
    temperature

instances: John; John’s temperature

32

Blinding Flash of the Obvious

types: organism; temperature

instances: John’s temperature inheres_in John

34

types: temperature; determinate temperatures 37ºC, 37.1ºC, 37.5ºC, 37.2ºC, 37.3ºC, 37.4ºC

instances: John’s temperature

John’s temperature instantiates a determinate temperature at each of the times t1, t2, t3, t4, t5, t6
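
The picture of one quality instance that instantiates different determinate types over time can be sketched as a small record. The class and field names are assumptions made for illustration, and so is the pairing of particular temperatures with particular times, which the slide does not make recoverable.

```python
from dataclasses import dataclass, field


@dataclass
class QualityInstance:
    name: str
    bearer: str         # the independent continuant the quality inheres_in
    determinable: str   # the type instantiated throughout, e.g. "temperature"
    # time label -> determinate type instantiated at that time
    determinates: dict[str, str] = field(default_factory=dict)

    def instantiate(self, time: str, determinate_type: str) -> None:
        self.determinates[time] = determinate_type

    def type_at(self, time: str) -> str:
        return self.determinates[time]


temp = QualityInstance("John's temperature", bearer="John", determinable="temperature")
temp.instantiate("t1", "37ºC")      # assumed pairing of time and value
temp.instantiate("t2", "37.1ºC")    # assumed pairing of time and value

# The instance stays one and the same; only the determinate type changes.
print(temp.name, temp.type_at("t1"), temp.type_at("t2"))
```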

35

types: human; development stages embryo, fetus, neonate, infant, child, adult

instances: John

John instantiates a development stage at each of the times t1, t2, t3, t4, t5, t6

36

whole plant: continuants and occurrents

37

continuants: zygote, pro-embryo, globular embryo, bilateral embryo, ..., mature whole plant

occurrents: fertilization, first cell division, becomes reproductively able

child transformation_of fetus

38

Temperature subtypes
Development-stage subtypes

are threshold divisions (hence we do not have sharp boundaries, and we have a certain degree of choice, e.g. in how many subtypes to distinguish, though not in their ordering)

39

types:
independent continuant
  organism
dependent continuant
  quality
    temperature

instances: John; John’s temperature

40

types:
independent continuant
  organism
dependent continuant
  quality
    temperature
occurrent
  process
    course of temperature changes

instances: John; John’s temperature; John’s temperature history

41

types:
independent continuant
  organism
dependent continuant
  quality
    temperature
occurrent
  process
    life of an organism

instances: John; John’s temperature; John’s life

42

BFO: The Very Top

continuant
  independent continuant
  dependent continuant
    quality
    disposition
occurrent

43

BFO: The Very Top

continuant
  independent continuant
  dependent continuant
    quality
    function
    role
    disposition
occurrent

44

disposition
- of a glass vase, to shatter if dropped
- of a human, to eat
- of a banana, to ripen
- of John, to lose hair

45

disposition
– if it ceases to exist, then its bearer and/or its immediate surrounding environment is physically changed
– its realization occurs when its bearer is in some special physical circumstances
– its realization is what it is in virtue of the bearer’s physical make-up

46

function
- of liver: to store glycogen
- of birth canal: to enable transport
- of eye: to see
- of mitochondrion: to produce ATP

not optional; reflection of physical makeup of bearer; subtype of disposition

47

types:
independent continuant
  eye
dependent continuant
  function
    to see
occurrent
  process
    process of seeing

instances: John’s eye; function of John’s eye: to see; John seeing

48

OGMS

Ontology for General Medical Science

http://code.google.com/p/ogms

49

Physical Disorder

50

Physical Disorder

– independent continuant (fiat object part)

A causally linked combination of physical components of the extended organism that is clinically abnormal.

51

Clinically abnormal

– (1) not part of the life plan for an organism of the relevant type (unlike aging or pregnancy),

– (2) causally linked to an elevated risk either of pain or other feelings of illness, or of death or dysfunction, and

– (3) such that the elevated risk exceeds a certain threshold level.*

*Compare: baldness

52
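
The three conditions for clinical abnormality combine as a conjunction, which can be sketched as a predicate. Everything numeric here is a hypothetical assumption: the slides do not say how risk is measured or where the threshold lies, so the risk scores and the default threshold below are invented purely to show the logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodilyFeature:
    name: str
    part_of_life_plan: bool  # condition (1): True for e.g. aging or pregnancy
    elevated_risk: float     # conditions (2)/(3): hypothetical added-risk score, 0 = none


def is_clinically_abnormal(feature: BodilyFeature, threshold: float = 0.1) -> bool:
    """All three conditions must hold; the threshold value is an assumption."""
    return (
        not feature.part_of_life_plan        # (1) not part of the life plan
        and feature.elevated_risk > 0        # (2) linked to some elevated risk
        and feature.elevated_risk > threshold  # (3) risk exceeds the threshold
    )


# Invented scores, for illustration only.
pregnancy = BodilyFeature("pregnancy", part_of_life_plan=True, elevated_risk=0.3)
low_risk = BodilyFeature("low-risk feature", part_of_life_plan=False, elevated_risk=0.01)
high_risk = BodilyFeature("high-risk feature", part_of_life_plan=False, elevated_risk=0.5)

for f in (pregnancy, low_risk, high_risk):
    print(f.name, is_clinically_abnormal(f))
```

The second case shows why condition (3) is stated separately: a feature can carry some elevated risk and still fall below the threshold.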

Big Picture

53

Pathological Process =def. – A bodily process that is a manifestation of a disorder and is clinically abnormal.

Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.

54

Cirrhosis - environmental exposure

• Etiological process - phenobarbital-induced hepatic cell death
– produces
• Disorder - necrotic liver
– bears
• Disposition (disease) - cirrhosis
– realized_in
• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - fatigue, anorexia
• Signs - jaundice, enlarged spleen
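
The chain on this slide alternates entities and relations, so it can be written down as subject-relation-object triples and walked in order. The sketch below is an illustrative encoding only, not an OGMS serialization; the entity labels are shortened from the slide and the function name is hypothetical.

```python
# Each triple is (subject, relation, object), following the slide's chain.
CIRRHOSIS = [
    ("phenobarbital-induced hepatic cell death", "produces", "necrotic liver"),
    ("necrotic liver", "bears", "cirrhosis"),
    ("cirrhosis", "realized_in", "abnormal tissue repair"),
    ("abnormal tissue repair", "produces", "abnormal bodily features"),
    ("abnormal bodily features", "recognized_as", "symptoms and signs"),
]


def follow(triples, start):
    """Walk the chain from start, yielding (relation, target) at each step."""
    index = {s: (r, o) for s, r, o in triples}
    node, seen = start, set()
    while node in index and node not in seen:   # stop at the end or on a cycle
        seen.add(node)
        relation, node = index[node]
        yield relation, node


for relation, target in follow(CIRRHOSIS, "phenobarbital-induced hepatic cell death"):
    print(f"-{relation}-> {target}")
```

Because every disease example in the deck uses the same relations in the same order, the same walk applies unchanged to the other chains.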

55

Dispositions and Predispositions

All diseases are dispositions; not all dispositions are diseases.

Predisposition to Disease

=def. – A disposition in an organism that constitutes an increased risk of the organism’s subsequently developing some disease.

56

HNPCC - genetic pre-disposition

• Etiological process - inheritance of a mutant mismatch repair gene
– produces
• Disorder - chromosome 3 with abnormal hMLH1
– bears
• Disposition (disease) - Lynch syndrome
– realized_in
• Pathological process - abnormal repair of DNA mismatches
– produces
• Disorder - mutations in proto-oncogenes and tumor suppressor genes with microsatellite repeats (e.g. TGF-beta R2)
– bears
• Disposition (disease) - non-polyposis colon cancer
– realized_in
• Symptoms (including pain)

57

Huntington’s Disease – genetic disease

• Etiological process - inheritance of >39 CAG repeats in the HTT gene
– produces
• Disorder - chromosome 4 with abnormal mHTT
– bears
• Disposition (disease) - Huntington’s disease
– realized_in
• Pathological process - accumulation of mHTT protein fragments, abnormal transcription regulation, neuronal cell death in striatum
– produces
• Abnormal bodily features
– recognized_as
• Symptoms - anxiety, depression
• Signs - difficulties in speaking and swallowing

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - rule out Huntington’s suggests

Laboratory tests produces

Test results - molecular detection of the HTT gene with >39 CAG repeats used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease Huntington’s disease

58

Cirrhosis - environmental exposure

• Etiological process - phenobarbital-induced hepatic cell death

– produces

• Disorder - necrotic liver

– bears

• Disposition (disease) - cirrhosis

– realized_in

• Pathological process - abnormal tissue repair with cell proliferation and fibrosis that exceed a certain threshold; hypoxia-induced cell death

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - fatigue, anorexia

• Signs - jaundice, splenomegaly

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - rule out cirrhosis suggests

Laboratory tests produces

Test results - elevated liver enzymes in serum used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease cirrhosis

59

Systemic arterial hypertension

• Etiological process – abnormal reabsorption of NaCl by the kidney

– produces

• Disorder – abnormally large scattered molecular aggregate of salt in the blood

– bears

• Disposition (disease) - hypertension

– realized_in

• Pathological process – exertion of abnormal pressure against arterial wall

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - headaches, dizziness

• Signs – elevated blood pressure

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - rule out hypertension suggests

Laboratory tests produces

Test results - used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease hypertension

60

Type 2 Diabetes Mellitus

• Etiological process –
– produces
• Disorder – abnormal pancreatic beta cells and abnormal muscle/fat cells
– bears
• Disposition (disease) – diabetes mellitus
– realized_in
• Pathological processes – diminished insulin production, diminished muscle/fat uptake of glucose
– produces
• Abnormal bodily features
– recognized_as
• Symptoms – polydipsia, polyuria, polyphagia, blurred vision
• Signs – elevated blood glucose and hemoglobin A1c

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - rule out diabetes mellitus suggests

Laboratory tests – fasting serum blood glucose, oral glucose challenge test, and/or blood hemoglobin A1c produces

Test results - used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease type 2 diabetes mellitus

61

Type 1 hypersensitivity to penicillin

• Etiological process – sensitizing of mast cells and basophils during exposure to penicillin-class substance
– produces
• Disorder – mast cells and basophils with epitope-specific IgE bound to Fc epsilon receptor I
– bears
• Disposition (disease) – type I hypersensitivity
– realized_in
• Pathological process – type I hypersensitivity reaction
– produces
• Abnormal bodily features
– recognized_as
• Symptoms – pruritus, shortness of breath
• Signs – rash, urticaria, anaphylaxis

Symptoms & Signs used_in

Interpretive process produces

Hypothesis - suggests

Laboratory tests – produces

Test results – occasionally, skin testing used_in

Interpretive process produces

Result - diagnosis that patient X has a disorder that bears the disease type 1 hypersensitivity to penicillin

62

63

Disease vs. Disease course

Disease =def. – A disposition to undergo pathological processes that exists in an organism because of one or more disorders in that organism.

Disease course =def. – The aggregate of processes in which a disease disposition is realized.

64

coronary heart disease

John’s coronary heart disease

instantiates, at each of the times t1, t2, t3, t4, t5 (along the time axis), one of:
– disease associated with asymptomatic (‘silent’) infarction
– disease associated with early lesions and small fibrous plaques
– stable angina
– disease associated with surface disruption of plaque
– unstable angina

65

types:
independent continuant
  disorder
dependent continuant
  disposition
    disease
occurrent
  process
    course of disease

instances: John’s disordered heart; John’s coronary heart disease; course of John’s disease

66

Examples of ontology terms

Independent Continuant
– OGMS: Disorder
– IDO: Infectious disorder

Dependent Continuant
– OGMS: Disease; Predisposition to disease
– IDO: Infectious disease; Protective resistance

Occurrent
– OGMS: Disease course
– IDO: Infectious disease course

IDO (Infectious Disease Ontology) Core

Follows GO strategy of providing a canonical ontology of what is involved in every infectious disease – host, pathogen, vector, virulence, vaccine, transmission – accompanied by IDO Extensions for specific diseases, pathogens and vectors

Provides common terminology resources and tested common guidelines for a vast array of different disease communities

68

Infectious Disease Ontology Consortium

• MITRE, Mount Sinai, UT Southwestern – Influenza
• IMBB/VectorBase – Vector borne diseases (A. gambiae, A. aegypti, I. scapularis, C. pipiens, P. humanus)
• Colorado State University – Dengue Fever
• Duke University – Tuberculosis, Staph. aureus
• Cleveland Clinic – Infective Endocarditis
• University of Michigan – Brucellosis
• Duke University, University at Buffalo – HIV

69

Influenza - infectious

• Etiological process - infection of airway epithelial cells with influenza virus

– produces

• Disorder - viable cells with influenza virus

– bears

• Disposition (disease) - flu

– realized_in

• Pathological process - acute inflammation

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - weakness, dizziness

• Signs - fever

70

Influenza – disease course

• Etiological process - infection of airway epithelial cells with influenza virus

– produces

• Disorder - viable cells with influenza virus

– bears

• Disposition (disease) - flu

– realized_in

• Pathological process - acute inflammation

– produces

• Abnormal bodily features

– recognized_as

• Symptoms - weakness, dizziness

• Signs - fever

71

The disorder also induces normal physiological processes (immune response) that can result in the elimination of the disorder (transient disease course).

Big Picture

72