Medical Digital Library to Support Scenario Specific Information Retrieval
Wesley W. Chu
[email protected]
Computer Science Department
University of California
Los Angeles, California
A Project of the NIH Grant at UCLA
A Digital File Room for Patient Care, Education, and Research
Wesley W. Chu, PhD
Hooshang Kangarloo, MD
Usha Sinha, PhD
David B. Johnson, PhD
Bernard Churchill, MD
John D. N. Dionisio, PhD
Richard Johnson, MD
Osman Ratib, MD, PhD
Outline: Background • Hypothesis • Specific Aims • Significance • Approach and Innovations • Research Progress
Background
• Current file rooms managing patient records have limited functionality
– Main goal of mapping patient ID to patient records
• PACS implementations are an electronic version of the traditional file room
Background (continued)
• Finding relevant information for a particular user is time consuming and labor intensive
• Poorly structured and incomplete results, which may affect patient management
• Current search tools are limited to general use and are not tailored to specific users or tasks
Lack of structure makes...
Digital File Room Requirements
A navigable information space providing:
– Relevant and reputable information
– Access to similar patient records
– Content-based cross referencing
– Dynamically updated data repository
– Tailored access for specific users and devices
Hypotheses
• A digital file room (digital library) that delivers relevant and structured answers to specific queries can be developed from existing medical databases
• Such a digital file room will increase user satisfaction and improve patient management
Specific Aims
SA1 Develop a system that identifies and provides access to reputable information sources
SA2 Provide users with greater query capability (e.g. similar-to, approximate)
SA3 Extract knowledge from patient data, medical literature and radiology teaching files to support content-based cross-referencing
SA4 Provide access to dynamically updated collections based on patient data
SA5 Adapt information retrieval to user and device characteristics
Approach and Innovations
• Intelligent information registration
– Provide access to multiple, related data sources through a single access point
• Content-based navigation and matching
– Develop similarity matching based on medical concepts & patterns
– Content correlation
• User and device modeling
– Adaptive information retrieval based on user and device models
• Scenario-based information web (proxies)
– Develop information web linking clustered data sources for a given set of related tasks (i.e., scenario)
Intelligent Information Registration
Registers multiple information sources to provide transparent access through a single point (proxy object).
– Information requests are routed to appropriate data sources based on query characteristics
– Data sources are hierarchically clustered according to a four-layer data model
[Figure: four-layer data model. Layer labels: data (procedure database: billing, CPT; laboratory databases), summarization (Ortho, Incontinence, Neurological), meta-data (Procedures, Labs), proxy object / access point (Patient).]
Content-Based Navigation & Matching
Two types of navigation
– Navigation of the information space using proxies and content correlation
– Pattern/similarity navigation using type abstraction hierarchies (TAHs)
Pattern-Based Type Abstraction Hierarchies
• Scalable, hierarchical knowledge structures that facilitate similarity matching
[Figure: pattern-based TAH for incontinence. The root "Incontinence" has the branches "Adequate holding" and "Poor holding". Leaf types: Type V (adequate holding, poor storage, poor emptying), Type II (adequate holding, adequate storage, poor emptying), Type III (poor holding, adequate storage, poor emptying), Type IV (poor holding, poor storage, poor emptying). Example patients shown by age and sex: 6 day M, 7 mo F, 12 yr M, 25 yr F, 28 day M, 24 mo F, 15 yr M, 20 yr F.]
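A minimal sketch of how a type abstraction hierarchy could support similarity matching: a query is relaxed by moving up the hierarchy, and every record under the chosen ancestor counts as similar. The node names follow the incontinence hierarchy on this slide; the patient identifiers, their type assignments, and all function names are invented for illustration and are not taken from the project's implementation.

```python
# Hypothetical TAH: each node maps to its parent (names taken from the slide).
PARENT = {
    "Type V": "Adequate holding",
    "Type II": "Adequate holding",
    "Type III": "Poor holding",
    "Type IV": "Poor holding",
    "Adequate holding": "Incontinence",
    "Poor holding": "Incontinence",
}

# Illustrative patient-to-type assignments (invented for this sketch).
PATIENTS = {"p1": "Type V", "p2": "Type II", "p3": "Type III", "p4": "Type IV"}


def ancestors(node):
    """Return the path from node up to the root, starting with node itself."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def covered_by(node):
    """Return the set of patient types located at or below node."""
    return {t for t in set(PATIENTS.values()) if node in ancestors(t)}


def similar_patients(patient_id, levels=1):
    """Relax the patient's type by `levels` steps up the hierarchy and
    return the other patients whose type falls under that ancestor."""
    path = ancestors(PATIENTS[patient_id])
    target = path[min(levels, len(path) - 1)]
    types = covered_by(target)
    return sorted(p for p, t in PATIENTS.items() if t in types and p != patient_id)
```

With these assumed assignments, relaxing p1 by one level matches only patients under "Adequate holding", while relaxing by two levels reaches the root and matches every other patient.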
Adaptive Information Retrieval
• Tailors query processing and query results according to:
– Particular user
– Characteristics of their device
• Examples:
– Doctors prefer JAMA or Lancet while patients prefer Time or CNN.
– High resolution workstations support large, detailed imaging studies while portable devices need lower-bandwidth data.
• Allows the system to retrieve appropriate data for a particular query, user, and device
Scenario-Based Proxy
A framework that defines, for a particular domain and set of tasks, the access methods to and the relationships between information sources.
[Figure: scenario-based proxy. Node labels: Patient; UCLA, HFC; Procedures, Labs; HFC Blood, MD Office, UCLA Blood. An inset shows a TAH with "Adequate holding" and "Inadequate holding" branches over Type V, Type II, Type III, and Type IV.]
– intelligent information registration
– pattern-based similarity matching
– adaptive information retrieval
– information web
Scenario-Based Information Web
A directed graph that defines access paths for navigation among proxy objects
[Figure: proxy objects Patient, Teaching File, and Literature connected by similar-to and correlated-to links.]
Scenario-Based Information Web
[Figure: proxy objects Patient, Literature, and Teaching File connected by similar-to and correlated-to links.]
• Similar-to links relate objects based on their similarity
– patients similar by age, sex, and disease
• Correlated-to links relate objects based on related content
– disease can be correlated to relevant literature
Extends the scope of the digital file room into a digital medical library
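One way to picture the information web is as a small directed graph with typed edges. The sketch below is an assumption about a possible data structure, not the project's design; the class name, method names, and the example object identifiers are hypothetical.

```python
from collections import defaultdict

LINK_TYPES = ("similar-to", "correlated-to")


class InformationWeb:
    """Directed graph of proxy objects connected by typed links."""

    def __init__(self):
        self._links = defaultdict(list)  # source -> [(link_type, target), ...]

    def add_link(self, source, link_type, target):
        """Add a directed link of the given type from source to target."""
        if link_type not in LINK_TYPES:
            raise ValueError(f"unknown link type: {link_type}")
        self._links[source].append((link_type, target))

    def neighbors(self, source, link_type=None):
        """Return targets reachable from source, optionally by one link type."""
        return [
            target
            for kind, target in self._links.get(source, [])
            if link_type is None or kind == link_type
        ]


web = InformationWeb()
web.add_link("patient:A", "similar-to", "patient:B")
web.add_link("patient:A", "correlated-to", "literature:L1")
web.add_link("patient:A", "correlated-to", "teaching-file:T1")
```

Navigation then amounts to following one link type from a starting object, for example `web.neighbors("patient:A", "correlated-to")` to move from a patient record to related literature and teaching files.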
Research Progress
• Phrase Indexing: phrases are generated from n-word combinations in a sentence.
– Domain Specific Retrieval
– Document Summarization
• Content Correlation
– Linking of relevant documents via patterns
Domain Specific Retrieval
• Documents are grouped into domain-specific collections
– Medical patient reports
– Web sites are often tailored to specific subject areas
• Phrases can capture content better than single words and thus improve retrieval performance
Problem With Longer Phrases
[Figure: number of candidate phrases (log scale, 1.00E+00 to 1.00E+12) versus phrase length (1 to 6 words). Series: 100 word document, 125 word document, 150 word document, 100^n, 14-word sentence.]
Large combinatorial problem
To process longer phrases it is necessary to partition documents into smaller segments
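The scale of the problem can be checked with a short calculation. The sketch below assumes a phrase is an unordered combination of n distinct word positions, which is one reading of the word-combination definition used in these slides; the function name is illustrative.

```python
from math import comb


def candidate_phrase_count(num_words, phrase_len):
    """Number of unordered phrase_len-word combinations drawn from num_words words."""
    return comb(num_words, phrase_len)


# Compare a 100-word document with a 14-word sentence for phrase lengths 1 to 6.
for n in range(1, 7):
    print(n, candidate_phrase_count(100, n), candidate_phrase_count(14, n))
```

Under this assumption, 4-word phrases give 3,921,225 candidates for a 100-word document against 1,001 for a 14-word sentence, which is why restricting combinations to sentences keeps the candidate set manageable.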
Phrase Analysis
• A phrase is defined as any 2, 3 or 4 words co-occurring in a sentence (word combination)
• Very large number of possible phrases
– Use a stoplist to remove “useless” words
– Normalize words to a common stem
[Figure: phrase extraction steps for one sentence]
sentence: The right upper lobe mass is seen again.
case normalization: the right upper lobe mass is seen again
stop word removal: right upper lobe mass seen again
stemming: right upp lob mass seen again
sorting: again lob mass right seen upp
candidate 2-word combinations: again lob, again mass, again right, again seen, again upp, lob mass, lob right, lob seen, lob upp, mass right, mass seen, mass upp, right seen, right upp, seen upp
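The steps on this slide can be sketched in a few lines. This is an illustration only: the stoplist is a tiny invented sample, and `toy_stem` is a crude suffix stripper standing in for whatever stemmer the project used, chosen so that it reproduces the stems shown for this one sentence.

```python
import re
from itertools import combinations

STOPLIST = {"the", "is", "a", "an", "of", "in"}  # tiny illustrative stoplist


def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemming algorithm."""
    for suffix in ("er", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_phrases(sentence, n):
    """Return the sorted n-word combinations of the stemmed content words."""
    words = re.findall(r"[a-z]+", sentence.lower())           # case normalization
    content = [w for w in words if w not in STOPLIST]         # stop word removal
    stems = sorted({toy_stem(w) for w in content})            # stemming and sorting
    return [" ".join(combo) for combo in combinations(stems, n)]


phrases = candidate_phrases("The right upper lobe mass is seen again.", 2)
```

Sorting the stems before combining means each word set yields one canonical phrase regardless of word order in the sentence; for the example sentence this produces 15 two-word candidates, from "again lob" to "seen upp".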
Document Retrieval Evaluation
• Preliminary evaluation
– A domain specific collection of documents
– Can phrase analysis limited to sentences improve retrieval effectiveness?
– SMART system (single word terms) used as baseline
• Data
– Thoracic radiology patient reports
– Dictated reports
– Describe anatomy and abnormal findings such as enlarged lymph nodes and cancer masses
Domain Specific Document Retrieval
• Query: “right upper lobe mass”
Frequent N-Words
(a) Frequent 1-word table (total 34)
heart, aspirin, patient, doct, study, they, risk, prevent, take, diseas, stafford, use, too, may, thi, we, attack, ther, intern, bia, gener, peopl, problem, call, know, not, pain, some, reduc, medicat, very, becaus, data, regul
(b) Frequent 2-word table (total 52)
aspirin patient, heart aspirin, aspirin use, aspirin take, aspirin risk, aspirin study, patient take, patient study, heart diseas, heart patient, diseas peopl, prevent too, they not, they ther, they take, doct data, doct some, doct too, doct use, doct stafford, aspirin regul, aspirin becaus, aspirin reduc, aspirin some, aspirin pain, aspirin not, aspirin attack, aspirin too, aspirin diseas, use regul, aspirin they, aspirin doct, stafford intern, take not, risk reduc, study take, patient becaus, patient some, patient not, patient too, patient use, patient they, patient doct, heart regul, heart peopl, heart attack, heart too, heart use, heart stafford, use some, heart study, heart doct
(c) Frequent 3-word table (total 39)
aspirin patient take, aspirin patient study, heart aspirin patient, aspirin doct some, aspirin patient some, heart aspirin use, doct use some, aspirin take not, aspirin they not, aspirin patient not, aspirin they take, aspirin study take, patient doct use, heart aspirin diseas, heart use regul, heart aspirin regul, aspirin patient too, heart aspirin attack, aspirin risk reduc, patient take not, patient they not, heart patient too, heart aspirin too, patient use some, patient doct some, patient they take, patient study take, aspirin doct use, heart doct stafford, aspirin patient use, heart diseas peopl, aspirin use regul, aspirin patient they, heart patient study, heart aspirin study, aspirin patient becaus, aspirin patient doct, aspirin use some, they take not
(d) Frequent 4-word table (total 14)
heart aspirin use regul, aspirin they take not, aspirin patient take not, patient doct use some, aspirin patient study take, patient they take not, aspirin patient use some, aspirin doct use some, aspirin patient they not, aspirin patient they take, aspirin patient doct some, heart aspirin patient too, aspirin patient doct use, heart aspirin patient study
(e) Frequent 5-word table (total 2)
aspirin patient they take not, aspirin patient doct use some
Phrase length distribution
[Figure: number of frequent phrases (0 to 500) versus phrase length (N-word, 1 to 9). Series: Aspirin1, Aspirin2, Elian04, LAPD06, CNN-Bush, CNN-Florida.]
Automatic Text Summarization
Salton Method
• Given a text file with n paragraphs
• A paragraph can be represented by $D_i = (d_{i1}, d_{i2}, \ldots, d_{im})$
– $d_{ik}$ is the weight representing the importance of term $T_k$ (word or phrase)
• The pair-wise similarity of two paragraphs is $\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{m} d_{ik}\, d_{jk}$
Text relationship map:
• Nodes = paragraphs
• Links = pair-wise similarity of the connected nodes
• Links are created if $\mathrm{Sim}(D_i, D_j) >$ threshold
Bushiness of a node = number of links of the node
The text summary is derived from the bushy nodes.
[Figure: text relationship map with paragraph nodes P1, P2, P3, P4, P5, ..., Pn.]
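The text relationship map and the bushiness ranking can be sketched as follows. The similarity is the inner product of paragraph weight vectors as defined on this slide; the function names, the tie-breaking rule, and the choice to return the top k paragraphs in document order are assumptions made for illustration.

```python
def similarity(d_i, d_j):
    """Inner product of two paragraph weight vectors."""
    return sum(a * b for a, b in zip(d_i, d_j))


def bushiness(vectors, threshold):
    """Count, for each paragraph, the links whose similarity exceeds threshold."""
    counts = [0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if similarity(vectors[i], vectors[j]) > threshold:
                counts[i] += 1
                counts[j] += 1
    return counts


def summarize(vectors, threshold, k):
    """Pick the k bushiest paragraphs and return their indices in document order."""
    counts = bushiness(vectors, threshold)
    # Ties are broken by paragraph position (an assumption of this sketch).
    ranked = sorted(range(len(vectors)), key=lambda i: (-counts[i], i))
    return sorted(ranked[:k])


# Four toy paragraphs over three terms (weights invented for this example).
paragraphs = [[1, 0, 0], [1, 1, 0], [0, 1, 1], [0, 0, 1]]
```

In this toy input the two middle paragraphs each share a term with two neighbors, so they are the bushiest nodes and form a two-paragraph summary.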
Performance Comparison of Salton’s Summarization Method Based on Phrases and Single Words
Aspirin.txt: paragraph ranking based on bushiness
            Words            2W phrases       3W phrases
Threshold   0.1  0.2  0.3    0.1  0.2  0.3    0.1  0.2  0.3
No.1        4    6    8      2    2    2      2    2    2
No.2        6    8    2      3    3    3      3    3    3
No.3        8    3    3      6    6    6      8    8    8
No.4        1    4    4      1    4    4      4    4    4
No.5        5    5    5      8    5    5      6    6    6
No.6        2    1    6      4    1    1      5    5    5
No.7        3    2    1      5    8    8      7    7    7
No.8        9    9    9      7    7    7      1    1    1
No.9        7    7    7      9    9    9      9    9    9
Summarization based on phrases is less sensitive to the threshold setting than summarization based on single words.
Comparison between Salton & FBI
No.  Salton (threshold 0.1)  Salton (threshold 0.2)  Salton (threshold 0.3)  FBI               df
Aspirin01 (13 sent)
1    12,9,3,2,7              9,2,3,7,1               1,2,3,7,9               2,9,3,12,7        0
2    3,2,9,1,7               3,2,9,1,7               3,2,9,1,7               2,3,12,9,4        2
3    2,3,12,4,9              2,3,12,4,9              2,3,12,4,9              2,12,3,9,4        0
Aspirin02 (68 sent)
1    12,14,22,61,66          12,14,1,15,20           1,12,14,15,20           14,12,22,66,20    0
2    22,14,12,15,36          22,12,15,36,66          15,36,66,20,22          14,12,66,22,36    0
3    12,14,66,22,36          12,14,22,36,66          12,14,22,36,66          14,12,66,22,36    0
Elian04 (92 sent)
1    26,76,33,59,2           26,76,33,2,24           76,26,2,44,7            26,76,2,7,44      1
2    26,7,76,33,2            26,7,76,29,82           26,7,76,2,29            26,76,2,7,6       1
3    6,26,27,7,2             6,27,7,26,2             6,27,26,2,7             26,2,6,27,59      1
LAPD06 (27 sent)
1    7,6,19,25,20            6,19,7,20,25            6,19,7,14,25            6,7,19,20,25      0
2    18,6,19,20,9            18,6,19,20,9            6,19,18,20,9            18,19,6,20,7      1
3    5,12,14,17,18           5,12,14,17,18           5,12,14,17,18           5,20,12,14,17     1
CNNbush (14 sent)
1    12,5,6,8,11             12,5,6,11,7             12,5,6,11,3             5,12,8,11,6       0
2    8,12,5,6,7              8,12,5,11,3             8,12,5,3,10             5,12,8,3,10       0
3    5,8,12,10,3             5,8,12,10,3             12,5,8,3,10             12,5,8,9,10       1
Florida (49 sent)
1    29,11,41,2,26           29,41,11,26,2           29,41,26,11,14          29,11,17,48,41    1
2    20,40,17,11,22          20,40,17,11,22          20,17,40,22,25          17,40,20,11,48    1
3    17,20,40,6,22           17,20,40,6,22           17,20,6,22,25           17,20,25,40,11    1
Content Correlation
• Given a document in one collection, content correlation links relevant documents in another document collection
[Figure: document collections linked by content correlation: Patient Records, New England Journal of Medicine, CNN, Time.]
Document Cluster By Pattern
• A pattern is a set of unique terms that characterize some features in the data set
• Patterns can be found in a collection of documents by data mining
• Documents are grouped into clusters based on patterns via a clustering technique
Cluster Signature
• Every cluster can be classified according to the occurrence frequency of the patterns
• Looking to answer:
– Which set of patterns summarizes a given cluster?
– How are the patterns related among the clusters?
[Figure: Patient Records and Literature collections.]
Deriving Cluster Signature
• Metrics
– Local Cluster Certainty (LCC) measures the coverage of a pattern in a given cluster (Popularity)
– The Global Cluster Certainty (GCC) measures the coverage of a pattern among clusters (Exclusiveness)
• The Cluster Signature is the set of those patterns that have both high LCC and GCC
• Documents from one collection (source) can be linked to relevant clusters in another collection (target)
[Figure: Patient Records and Literature collections.]
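The slides do not give formulas for LCC and GCC, so the sketch below substitutes simple document-frequency ratios purely to illustrate the idea of combining popularity with exclusiveness. Both definitions, the thresholds, and all names are assumptions and will not reproduce the values reported in the preliminary results.

```python
def lcc(pattern, cluster):
    """Assumed LCC: fraction of documents in the cluster containing the pattern."""
    return sum(1 for doc in cluster if pattern in doc) / len(cluster)


def gcc(pattern, cluster, all_clusters):
    """Assumed GCC: share of all documents containing the pattern that lie in this cluster."""
    total = sum(1 for c in all_clusters for doc in c if pattern in doc)
    if total == 0:
        return 0.0
    inside = sum(1 for doc in cluster if pattern in doc)
    return inside / total


def cluster_signature(cluster, all_clusters, min_lcc=0.8, min_gcc=0.8):
    """Patterns that are both popular in the cluster and exclusive to it."""
    patterns = set().union(*cluster)
    return sorted(
        p
        for p in patterns
        if lcc(p, cluster) >= min_lcc and gcc(p, cluster, all_clusters) >= min_gcc
    )


# Each document is modeled as the set of patterns it contains (toy data).
laparoscopy = [{"laparoscop", "pediatr"}, {"laparoscop", "compl"}, {"laparoscop", "pediatr"}]
other = [{"pediatr", "urolog"}, {"urolog", "reflux"}]
```

In this toy data "laparoscop" appears in every document of the first cluster and nowhere else, so it is the only pattern in that cluster's signature; "pediatr" is common but also occurs in the other cluster, so it is excluded.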
Preliminary Results
• A collection of 69 pediatric urology literature abstracts taken from Medline was clustered using the complete link clustering algorithm
– 3 large clusters, each with 2 or more sub-clusters
• GCC and LCC were calculated for patterns found in several sub-clusters
• Data from one sub-cluster is reported here
Document # Title
1 Complications in pediatric urological laparoscopy: results of a survey
2 Laparoscopic surgery in pediatric urology
3 [Laparoscopic interventions in pediatric urology]
4 Role of laparoscopic surgery in pediatric urology
5 [Laparoscopic interventions in urology]
6 Laparoscopic heminephroureterectomy in pediatric patients
GCC
Term/Phrase Cl
Pediatr 1.0
Result 1.0
Patient 1.0
Perform 1.0
Compl 1.0
Laparoscop 1.0
Urolog 0.34
Laparoscop pediatr 1.0
Laparoscop perform 1.0
Diagnost laparoscop 0.35
Laparoscop operat 0.35
Compl rate 0.35
Laparoscop patient 0.35
Laparoscop operat perform 0.0817
Laparoscop patient perform 0.0817
LCC
Term/Phrase Cg
Laparoscop 0.1887
Compl 0.0817
Child Laparoscop 1.0
Laparoscop patient 1.0
Compl Laparoscop 1.0
Comple techn 1.0
<MEAS> compl 1.0
Laparoscop perform 0.6088
Compl rate 0.4564
Laparoscop patient perform 1.0
Laparoscop perform procedur 1.0
<MEAS> compl rate 1.0
Laparoscop pediatr perform 1.0
Compl laparoscop techn 1.0
Project Summary
A system that provides:
– relevant and reputable information,
– access to similar patient records,
– content-based cross referencing,
– a dynamically updated data repository, and
– tailored access for specific users and devices
will:
– augment the patient record to provide tailored and timely access to a broader array of reputable information and
– extend the digital file room into a digital medical library.
Research Results
• Phrase Indexing
– Developed an efficient algorithm for extracting n-word features from textual documents
– Phrase indexes provide better results than single-word indexes in document retrieval and summarization
• Content Correlation via Cluster Signature (LCC & GCC)
– Preliminary results reveal the feasibility of using cluster signatures for linking relevant documents
• Work begun on proxy for information navigation
Future Work
• Develop Ontology for Intelligent Information Registration
• User Model for Information Retrieval