The Lexical Grid
1/14/2005 Copyright © 2005 Mayo Clinic
Outline
• Purpose - why we built the Lexical Grid
• Function – what the Lexical Grid can do
Purpose
Why did we build the Lexical Grid?
Communication
Communication
Communication
n. 2. the imparting or interchange of thoughts, opinions, or information by speech, writing or signs.
Information
n. 2. any knowledge gained through communication, research, instruction, etc.
Random House Dictionary of the English Language, 1983
Language and the Communication Process
• Language - a “specification” that enables communication
• Semantics - the association between signs or symbols and their intended “meaning”
• Syntax - the rules for ordering and structuring the signs into phrases and sentences
• Pragmatics - the relationship between signs and symbols and the recipient. Broadly, the shared context.
Ogden’s Semiotic Triangle
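C. K. Ogden and I. A. Richards. The Meaning of Meaning.
[Figure: semiotic triangle with vertices Thought or Reference, Symbol and Referent. The Symbol “symbolises” the Thought or Reference, the Thought or Reference “refers to” the Referent, and the Symbol “stands for” the Referent. Example symbols: “Rose”, “ClipArt”.]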
The Communication Process
[Figure, built up over four slides: two communicators each hold a CONCEPT of the same Referent and each use a Symbol for it, forming two semiotic triangles (edges “Refers To”, “Symbolises”, “Stands For”; example symbols “Rose”, “ClipArt”). The message “I see a ClipArt image of a rose” passes between the two Symbols. Successive builds add the labels Semantics, Syntax, Context (one per communicator) and Shared Context.]
Shared Context
Impacts how much information can be contained in a symbol.
[Figure: chart of information per symbol against degree of shared context. Labels: No Shared Context, Shared Universe, Common Language, Shared Species, Shared Planet, Shared Sun, Common Culture, Similar Education, Common Profession, Common Specialty. Axis: Information / Symbol.]
Minimum Shared Context
Shared Culture
“It is possible to find the truth without controls, but the process has been demonstrated again and again to be notably inefficient, so that years may be required before it is appreciated that a given treatment is worthless...”
Shared Specialties
“Both ontologically and in order of explanation, the intentionality of the propositional attitudes is prior to the intentionality of natural languages; and, both ontologically and in order of explanation, the intentionality of mental representations is prior to the intentionality of propositional attitudes.”
The impact of context on communication
Shared context:
• Allows information to be communicated in larger, more succinct “chunks”.
• Drug, analgesic and NSAID are all “chunks”, yet differ markedly in conceptual complexity.
• Enables specialized symbol sets:
• Contrast the amount of information contained in the formula E=mc² versus that contained in this presentation...
Contextual Formalism
The degree of formality in a shared context can vary across a wide spectrum:
• Tacit context which is simply presumed
• Contextual negotiation preceding the actual message
• Rigorous and formal rules and documents describing the form and possible meanings behind every message and phrase.
Factors Affecting the Degree of Contextual Formalism
• Number of participating parties
• Formalism needs to increase as the number of participants increases
• Geographic, cultural and temporal proximity of communicators
• The further apart communicators are, the less they can assume
• Amount of shared context
• The more you have, the more important it becomes to be organized
Factors Affecting the Degree of Contextual Formalism
• The cost of imprecise communication
• Poetry and literature - low cost (some may argue actual gain)
• Technical and professional - high to very high cost
• What is the cost of assuming the units of a thrust specification?
• What is the cost of assuming the dose of a prescription?
• What is the cost of assuming the century in which the communication originated?
Common Forms of Contextual Formalism
• Dictionaries
• Thesauri
• Textbooks, college courses, etc.
• Operations manuals
• Data dictionaries
• Terminologies
Shared Context...
The Communication Process
[Figure, two builds of the same two-party communication diagram: the first carries the labels Semantics and Syntax; the second shows Lexicon and Information Model / Data Model in their place. Both include Context for each communicator and Shared Context.]
Making Shared Context Explicit
[Figure: the same two-party communication diagram (two CONCEPTs, two Symbols, one Referent, the message “I see a ClipArt image of a rose”, Context for each communicator). Labels added: Formal Shared Context; Terminologies (one per communicator).]
Shared Context Least Common Denominator
[Figure: the same two-party communication diagram, with Context for each communicator and three Terminologies labels. Captions: “Reduce the Shared Context...” and “... increase the symbol complexity”. The message “I see a ClipArt image of a rose” is shown alongside “I see a ClipArt image of a red flower with ...”.]
Communication and Clinical Information
Clinical Information Today
• Still transitioning from paper to digital
• Existing digital records grew up in an era of scarce resources
• Maximum shared context
• High information content per token
• Information is highly filtered
• Only preserve what is absolutely necessary
• Context is tacit and local
• Sharing information is labor intensive and imprecise
• Mapping between local contextual knowledge and incomplete and idiosyncratic understanding of target medium
Clinical Information Today
[Figure labels: Clinical Record; Dictation; Notes; Direct Entry; Coding and Classification; Local Context; Global Context.]
Clinical Information Today
• High degree of tacit shared context
• Records require interpretation to be used externally
• External representations are ‘lossy’
• ICD-9, CPT & the like are abstractions and classifications
• Expensive or impossible to utilize for purposes not originally envisioned by system architects
• e.g. genetics research
Bioinformatics Research
• Needs high volumes of clinical data
• Statistical power requirements
• May have to span more than one institution
• May need to include environmental and other factors
• Across institutions and regions
• Will often be searching for new information
• ‘Raw’ information available to clinician or laboratory
• May not be considered clinically significant
Bioinformatics Research
Clinical Information Requirements
• High Volume
• Detailed
• Comparable
• Multi-institutional
• Multi-specialty
• Linkable with bio-, enviro-, socio- and other information types
How do we get there?
• Maximize usefulness of today’s clinical information
• Improve acquisition and recording mechanisms for future clinical information
How do we get there?
1) Create an explicit, global shared context
• Terminology – shared terms, codes, and precise definitions
• Information model – data elements, relationships and definitional (terminological) links
How do we get there?
2) Map existing information structures onto this grid
• Local data elements and structures → common data elements
• Local names and descriptions → common terminology
• Local codes and classifications → defined in terms of common terminology structure
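The mapping step above could be sketched as a lookup from local codes to entries in a common terminology. This is only an illustration: the local codes (`LAB-GLU`, `LAB-K`), the common codes (`COMMON:0001`, `COMMON:0002`) and all function names are hypothetical and do not come from the Lexical Grid.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CommonTerm:
    """A term in the shared terminology (all values here are made up)."""
    code: str
    name: str


# Hypothetical common terminology, keyed by common code.
COMMON_TERMS: Dict[str, CommonTerm] = {
    "COMMON:0001": CommonTerm("COMMON:0001", "Glucose measurement"),
    "COMMON:0002": CommonTerm("COMMON:0002", "Potassium measurement"),
}

# Hypothetical map from one institution's local codes to common codes.
LOCAL_TO_COMMON: Dict[str, str] = {
    "LAB-GLU": "COMMON:0001",
    "LAB-K": "COMMON:0002",
}


def map_local_code(local_code: str) -> Optional[CommonTerm]:
    """Return the common term for a local code, or None if it is unmapped."""
    common_code = LOCAL_TO_COMMON.get(local_code)
    if common_code is None:
        return None
    return COMMON_TERMS.get(common_code)


def map_record(local_codes: List[str]) -> Tuple[List[CommonTerm], List[str]]:
    """Split a record's local codes into mapped terms and unmapped leftovers."""
    mapped: List[CommonTerm] = []
    unmapped: List[str] = []
    for code in local_codes:
        term = map_local_code(code)
        if term is None:
            unmapped.append(code)
        else:
            mapped.append(term)
    return mapped, unmapped
```

Returning the unmapped codes separately, rather than dropping them, keeps the gaps in the local-to-common mapping visible.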
How do we get there?
3) Define future information systems requirements in terms of a shared information model and terminology
• Develop tooling to allow consistent, granular information capture
• Storage resources are no longer the issue
• Clinician time and efficiency is the new bottleneck
• Create standards and resources to make implementation practical
Clinical Information Today
[Figure labels: Clinical Record; Dictation; Notes; Direct Entry; Coding and Classification; Local Context; Global Context.]
Clinical Information in the Future
[Figure labels: Global Context; Clinical Record; Dictation; Notes; Direct Entry; Coding and Classification; (Process); Terminologies; Information Models.]
Future Information
• Migrate to common models and terminology
• common shared terminology
• well understood shared information models
• Locally enhanced, but common elements remain common
• Tooling to minimize / eliminate translation steps
• (semi) structured information entry
• Definitions and terminology directly available to clinician
• Entry tools that allow both precision and detail
The GAP
[Figure: a spectrum running from “Terminologies” (Coding and Classification) to “Ontologies” (Computable DL Frameworks). Resources shown: ICD-9-CM, CPT-4, ICD-10-PCS, MeSH, SNOMED-III, SNOMED CT, GO, Countries, Languages, Mime Types, SNOP, FMA, ChEBI, MGED, GMOD, ... Caption: “Many, many more to come”.]
We want the best of both worlds
• Definitions, instructions, comments
• Access to existing clinical records
• Computability
• Dynamic and distributed evolution
• Focus on specific uses and contexts
• Ability to navigate and traverse at will
The Ultimate Goal
Terminology as a commodity resource
• Available whenever and wherever it is needed
• Online or downloadable
• Push or pull update mechanism
• Available 24x7
• Revised and updated in “real-time”
• Cross-linked and indexed
Synopsis
The purpose of the Lexical Grid is to provide a framework that:
• Can represent yesterday’s, today’s and tomorrow’s terminological resources as a single virtual structure
• Allows these resources to be cross-linked and indexed
• Consists of building blocks and tools that allow applications and users to take advantage of the content where and when it is needed.
Function
What can the Lexical Grid do?
The Heart of the Lexical Grid
The LexGrid Model - a model of terminology that:
1) Explicitly names and defines the things that the LexGrid tools need to reference explicitly
2) Represents “non-semantic” entities as name/value pairs
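The two design points above might look like this in code: a concept type with a few explicitly named fields, plus an open list of name/value properties for everything else. The field names `concept_code` and `entity_description` echo labels that appear on the Cross References slide; the class itself and the layout of `properties` are assumptions made for illustration, not the LexGrid schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Concept:
    """Hypothetical concept record, not the actual LexGrid model."""
    # Explicitly named and defined: tools reference these directly.
    concept_code: str
    entity_description: str
    # Everything else is carried as opaque (name, value) pairs.
    properties: List[Tuple[str, str]] = field(default_factory=list)

    def add_property(self, name: str, value: str) -> None:
        """Attach a name/value pair; repeated names are allowed."""
        self.properties.append((name, value))

    def get_properties(self, name: str) -> List[str]:
        """Return every value stored under the given property name."""
        return [v for n, v in self.properties if n == name]


# Example values taken from the Cross References slide.
concept = Concept("C222", "Alkylsulfonate Compound")
concept.add_property("Semantic_Type", "SemNet:T123")
concept.add_property("UMLS_CUI", "C0002072")
```

Keeping the property list open means a new kind of annotation can be stored without changing the model, at the cost that tools cannot rely on any particular property being present.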
The LexGrid Model
The LexGrid Model
• Source is currently maintained in XML Schema
• (Semi) automatic transformations available to
• Unified Modeling Language (UML)
• XML Metadata Interchange (XMI)
• Eclipse Modeling Framework (EMF)
• Java
• LDAP Schema
The LexGrid Node
• A LexGrid Node is software and a backing data store that represents terminological information in a format semantically faithful to the LexGrid Model
[Figure: a LexGrid Node with its Data Store.]
LexGrid Components
[Figure: a LexGrid Node and its Data Store surrounded by component groups. Labels: Services (Web Clients, Java, .NET, ...); Import; Editors; Browsers; Query Tools; OWL; RDF XML; CSV; Terminology; Browse and Edit; Export; Embed; ...]
Import Toolkit
[Figure: the LexGrid Components diagram, repeated.]
Import Toolkit
UMLS / SQL / SQL Lite
The LexGrid model currently has three different server-based data storage formats: LDAP, SQL Lite and SQL. The converter package is the tooling that we have written to allow you to move data from one format to another. The converter package also allows you to import terminologies directly from a local image of the UMLS database.
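Moving terminology content from one storage format to another could be sketched as follows. The example reads rows of CSV text and loads them into an in-memory SQLite database using Python's standard `csv` and `sqlite3` modules. The table name `concepts`, its two columns and the CSV header are hypothetical; they are not the schema or input format of the converter package.

```python
import csv
import io
import sqlite3
from typing import Iterable, Iterator, Tuple


def read_concepts(csv_text: str) -> Iterator[Tuple[str, str]]:
    """Yield (code, description) pairs from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        yield row["code"].strip(), row["description"].strip()


def load_into_sqlite(conn: sqlite3.Connection,
                     rows: Iterable[Tuple[str, str]]) -> int:
    """Insert rows into a hypothetical 'concepts' table; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concepts ("
        "code TEXT PRIMARY KEY, description TEXT NOT NULL)"
    )
    # INSERT OR REPLACE makes a repeated import overwrite rather than fail.
    conn.executemany(
        "INSERT OR REPLACE INTO concepts (code, description) VALUES (?, ?)",
        rows,
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM concepts").fetchone()[0]
```

A caller would open a connection with `sqlite3.connect(...)` and pass `read_concepts(text)` straight into `load_into_sqlite`; because the reader is a generator, the source never has to be held in memory as a list.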
Import Toolkit
HL7 Version 3
Import Toolkit
Excel Import
Import Toolkit
Native Protege
Import Components
OWL
Import Components
Custom Formats as Needed
• HTML
• CSV
• SQL
• XML (arbitrary flavor)
• ...
Functional Components
[Figure: the LexGrid Components diagram, repeated.]
Browse and Edit
LexGrid Editor
• Eclipse Based
• Multi Terminology Query and Browsing
• Can co-exist w/ Protege
• Logging and Audit Trail
• Much more – needs a presentation by itself!
Browse and Edit
CTS Plugin for Protege
Functional Components
[Figure: the LexGrid Components diagram, repeated.]
Services
Common Terminology Services
Services
SOAP
Services
.NET
Functionality
[Figure: the LexGrid Components diagram, repeated.]
Export
LexGrid XML Format
Export
OWL Format
UMLS Semantic Net rendered in OWL
Export
• Other XML formats
• CSV
• Parameterized queries
• ...
Functionality
Virtual Nodes
[Figure: four LexGrid Nodes, each with its own Data Store, at Mayo, Stanford, UCSF and NCI.]
Functionality
Virtual Nodes
• Virtual Node Toolkit
• Create and load a local node
• Publish in web space
• Node is treated as part of the larger grid
Functionality
Virtual Nodes – Cross Node Search
[Figure: a search spanning ICD-9, FMA and MeSH.]
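A cross-node search could be sketched as one query fanned out to several nodes, with each hit tagged by the node it came from. The `Node` class, its in-memory store and the function names are placeholders invented for illustration; they do not reflect the Virtual Node Toolkit's interfaces, and a real grid would query nodes over the network.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for one terminology node."""
    name: str
    concepts: Dict[str, str] = field(default_factory=dict)  # code -> description

    def search(self, text: str) -> List[Tuple[str, str]]:
        """Case-insensitive substring match on descriptions, ordered by code."""
        needle = text.lower()
        return [
            (code, description)
            for code, description in sorted(self.concepts.items())
            if needle in description.lower()
        ]


def cross_node_search(nodes: Iterable[Node],
                      text: str) -> List[Tuple[str, str, str]]:
    """Run the same search on every node; return (node, code, description)."""
    hits: List[Tuple[str, str, str]] = []
    for node in nodes:
        for code, description in node.search(text):
            hits.append((node.name, code, description))
    return hits
```

Tagging each hit with its node name lets the caller see which terminology supplied a match, which matters when the same words appear in several sources.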
Functionality
Replication / Update
[Figure: an NCI node (Data Store, Change Log) receives an Update at NCI. NCI Replica nodes at Mayo and Stanford, each with a Data Store and Change Log, Subscribe to it. Changes propagate by “Push” or “Pull”.]
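The push and pull styles named on this slide can be illustrated with a small change-log model. A primary node appends every update to its change log; a replica either asks for the entries it has not yet seen (pull) or registers as a subscriber and receives each entry as it is written (push). All class and method names are hypothetical and chosen for this sketch only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One change-log entry: (sequence number, concept code, new description).
Change = Tuple[int, str, str]


@dataclass
class Replica:
    store: Dict[str, str] = field(default_factory=dict)
    last_seq: int = 0  # highest sequence number applied so far

    def apply(self, change: Change) -> None:
        """Apply one change; entries already seen are ignored."""
        seq, code, description = change
        if seq <= self.last_seq:
            return
        self.store[code] = description
        self.last_seq = seq

    def pull_from(self, primary: "Primary") -> None:
        """Pull: fetch and apply everything newer than last_seq."""
        for change in primary.changes_since(self.last_seq):
            self.apply(change)


@dataclass
class Primary:
    store: Dict[str, str] = field(default_factory=dict)
    change_log: List[Change] = field(default_factory=list)
    subscribers: List[Replica] = field(default_factory=list)

    def subscribe(self, replica: Replica) -> None:
        # A replica that subscribes late should pull first to catch up.
        self.subscribers.append(replica)

    def update(self, code: str, description: str) -> None:
        """Record a change, then push it to every subscriber."""
        change = (len(self.change_log) + 1, code, description)
        self.store[code] = description
        self.change_log.append(change)
        for replica in self.subscribers:
            replica.apply(change)

    def changes_since(self, seq: int) -> List[Change]:
        return [c for c in self.change_log if c[0] > seq]
```

Because both paths go through the same `apply` method and the same sequence numbers, a replica can mix the two styles without applying a change twice.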
Functionality
Indices
[Figure: an NCI node (Data Store) receives an Update. An Index Service and a Reasoning Service each Subscribe and receive changes by “Push”.]
Functionality
Cross References
[Figure: three Data Stores (NCI, UMLS and Semantic NET) linked by cross references.]
ConceptCode: C222
entityDescription: Alkylsulfonate Compound
Semantic_Type: SemNet:T123
UMLS_CUI: C0002072
UMLS_CUI = URN:ISO:2.16.840.1.113883.6.56:C0002072
Semantic_Type = URN:ISO:2.16.840.1.113883.6.56.1:T123
T123 – “Biologically Active Substance”
C0002072 – “Alkanesulfonates”
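The cross-reference values on this slide appear to follow the pattern `URN:ISO:<OID>:<code>`. Assuming that layout (it is inferred from the two examples shown, not taken from a specification), a reference could be split into its coding-scheme OID and local code as below. The class and function names are hypothetical.

```python
from typing import NamedTuple


class CrossRef(NamedTuple):
    oid: str   # dotted numeric identifier of the coding scheme
    code: str  # code within that scheme


def parse_cross_ref(urn: str) -> CrossRef:
    """Split 'URN:ISO:<OID>:<code>' into its OID and code parts.

    Raises ValueError if the text does not match the assumed layout.
    """
    parts = urn.split(":")
    if len(parts) != 4 or parts[0].upper() != "URN" or parts[1].upper() != "ISO":
        raise ValueError(f"not a URN:ISO reference: {urn!r}")
    oid, code = parts[2], parts[3]
    # An OID is a sequence of non-negative integers separated by dots.
    if not oid or not all(arc.isdigit() for arc in oid.split(".")):
        raise ValueError(f"malformed OID in reference: {urn!r}")
    if not code:
        raise ValueError(f"missing code in reference: {urn!r}")
    return CrossRef(oid, code)
```

With the slide's own values, `parse_cross_ref("URN:ISO:2.16.840.1.113883.6.56:C0002072")` yields the OID `2.16.840.1.113883.6.56` and the code `C0002072`.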
Functionality
Node Directory
Acknowledgements
• Dr. Christopher Chute – visionary and enabler
• James Buntrock – administrative enabler
• Daniel Armbrust – programmer extraordinaire
• Thomas Johnson – architect / programmer extraordinaire
• Deepak Sharma – programmer extraordinaire
Acknowledgements
This work was supported in part by a grant from the US National Library of Medicine: LM07319.
LexGrid home page
https://cabig-kc.nci.nih.gov/Vocab/KC