Medical Digital Library to Support Scenario Specific Information Retrieval

Page 1: Medical Digital Library to Support Scenario Specific Information Retrieval

Wesley W. Chu
[email protected]
Computer Science Department
University of California
Los Angeles, California

Page 2: Medical Digital Library to Support Scenario Specific Information Retrieval

A Project of the NIH Grant at UCLA

A Digital File Room for Patient Care, Education, and Research

Wesley W. Chu, PhD
Hooshang Kangarloo, MD
Usha Sinha, PhD
David B. Johnson, PhD
Bernard Churchill, MD
John D. N. Dionisio, PhD
Richard Johnson, MD
Osman Ratib, MD, PhD

Page 3: Medical Digital Library to Support Scenario Specific Information Retrieval

Background • Hypothesis • Specific Aims • Significance • Approach and Innovations • Research Progress

Background

• Current file rooms managing patient records have limited functionality
  – Main goal: mapping a patient ID to patient records
• PACS implementations are an electronic version of the traditional file room

Page 4: Medical Digital Library to Support Scenario Specific Information Retrieval

Background

• Finding relevant information for a particular user is time consuming and labor intensive
• Results are poorly structured and incomplete, which may affect patient management
• Current search tools are general-purpose and not tailored to specific users or tasks

Lack of structure makes...

Page 5: Medical Digital Library to Support Scenario Specific Information Retrieval

Digital File Room Requirements

A navigable information space providing:
– Relevant and reputable information
– Access to similar patient records
– Content-based cross referencing
– Dynamically updated data repository
– Tailored access for specific users and devices

Page 6: Medical Digital Library to Support Scenario Specific Information Retrieval

Hypotheses

• A digital file room (digital library) that delivers relevant and structured answers to specific queries can be developed from existing medical databases
• Such a digital file room will increase user satisfaction and improve patient management

Page 7: Medical Digital Library to Support Scenario Specific Information Retrieval

Specific Aims

SA1 Develop a system that identifies and provides access to reputable information sources
SA2 Provide users with greater query capability (e.g. similar-to, approximate)
SA3 Extract knowledge from patient data, medical literature and radiology teaching files to support content-based cross-referencing
SA4 Provide access to dynamically updated collections based on patient data
SA5 Adapt information retrieval to user and device characteristics

Page 8: Medical Digital Library to Support Scenario Specific Information Retrieval

Approach and Innovations

• Intelligent information registration
  – Provide access to multiple, related data sources through a single access point
• Content-based navigation and matching
  – Develop similarity matching based on medical concepts & patterns
  – Content correlation
• User and device modeling
  – Adaptive information retrieval based on user and device models
• Scenario-based information web (proxies)
  – Develop an information web linking clustered data sources for a given set of related tasks (i.e., scenario)

Page 9: Medical Digital Library to Support Scenario Specific Information Retrieval

Intelligent Information Registration

Registers multiple information sources to provide transparent access through a single point (proxy object).
– Information requests are routed to appropriate data sources based on query characteristics
– Data sources are hierarchically clustered according to a four-layer data model

[Figure: four-layer data model. Proxy object (access point): Patient. Meta-data: Procedures, Labs. Summarization: Ortho, Incontinence, Neurological. Data: procedure database (billing, CPT), laboratory databases.]

Page 10: Medical Digital Library to Support Scenario Specific Information Retrieval

Content-Based Navigation & Matching

Two types of navigation
– Navigation of the information space using proxies and content correlation
– Pattern/similarity navigation using type abstraction hierarchies (TAHs)

Page 11: Medical Digital Library to Support Scenario Specific Information Retrieval

Pattern-Based Type Abstraction Hierarchies

• Scalable, hierarchical knowledge structures that facilitate similarity matching

[Figure: TAH for incontinence. The root (Incontinence) branches into Adequate holding and Poor holding. Adequate holding contains Type V (adequate holding, poor storage, poor emptying) and Type II (adequate holding, adequate storage, poor emptying). Poor holding contains Type III (poor holding, adequate storage, poor emptying) and Type IV (poor holding, poor storage, poor emptying). Leaf nodes are patient cases labeled by age and sex: 6 day M, 7 mo F, 12 yr M, 25 yr F, 28 day M, 24 mo F, 15 yr M, 20 yr F.]
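As a rough illustration of how a TAH can support similarity matching, the sketch below encodes the incontinence hierarchy from this slide as parent links and answers a similar-to query by climbing a number of levels and returning every type under the resulting node. The names `PARENT`, `leaves_under`, and `similar_to`, and the idea of relaxing by a fixed number of levels, are assumptions made for illustration and are not taken from the project's implementation.

```python
# Parent links for the incontinence TAH shown on this slide.
PARENT = {
    "Type V": "Adequate holding",
    "Type II": "Adequate holding",
    "Type III": "Poor holding",
    "Type IV": "Poor holding",
    "Adequate holding": "Incontinence",
    "Poor holding": "Incontinence",
}


def leaves_under(node):
    """Return all leaf types below a node (or the node itself if it is a leaf)."""
    children = [child for child, parent in PARENT.items() if parent == node]
    if not children:
        return [node]
    leaves = []
    for child in children:
        leaves.extend(leaves_under(child))
    return leaves


def similar_to(leaf, levels=1):
    """Relax a query by climbing `levels` steps, then collect the leaves below."""
    node = leaf
    for _ in range(levels):
        node = PARENT.get(node, node)  # stay at the root once it is reached
    return leaves_under(node)


if __name__ == "__main__":
    print(similar_to("Type II", levels=1))
    print(similar_to("Type II", levels=2))
```

Climbing one level keeps the match within the same holding category; climbing two levels reaches the root and returns every type, which is the coarsest possible notion of "similar".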

Page 12: Medical Digital Library to Support Scenario Specific Information Retrieval

Adaptive Information Retrieval

• Tailors query processing and query results according to:
  – Particular user
  – Characteristics of their device
• Examples:
  – Doctors prefer JAMA or Lancet while patients prefer Time or CNN.
  – High resolution workstations support large, detailed imaging studies while portable devices need lower-bandwidth data.
• Allows the system to retrieve appropriate data for a particular query, user, and device

Page 13: Medical Digital Library to Support Scenario Specific Information Retrieval

Scenario-Based Proxy

A framework that defines, for a particular domain and set of tasks, the access methods to and the relationships between information sources.
– intelligent information registration
– pattern-based similarity matching
– adaptive information retrieval
– information web

[Figure: scenario-based proxy. A Patient proxy sits over Procedures and Labs, drawing on sources labeled UCLA, HFC, HFC Blood, MD Office, and UCLA Blood, shown next to the incontinence TAH (Type V, Type II, Type III, Type IV under Adequate holding and Inadequate holding).]

Page 14: Medical Digital Library to Support Scenario Specific Information Retrieval

Scenario-Based Information Web

A directed graph that defines access paths for navigation among proxy objects

[Figure: information web with proxy objects Patient, Literature, and Teaching File connected by similar-to and correlated-to links.]

Page 15: Medical Digital Library to Support Scenario Specific Information Retrieval

Scenario-Based Information Web

[Figure: information web with proxy objects Patient, Literature, and Teaching File connected by similar-to and correlated-to links.]

• Similar-to links relate objects based on their similarity
  – patients similar by age, sex, and disease
• Correlated-to links relate objects based on related content
  – disease can be correlated to relevant literature

Extends the scope of the digital file room into a digital medical library
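One way to picture the information web is as a small labeled directed graph. The sketch below is a minimal version under stated assumptions: the class name `InformationWeb`, its methods, and the specific edges added in `build_example_web` are hypothetical. The slides name only the two link kinds and the three proxy objects, not which links connect which objects.

```python
class InformationWeb:
    """Directed graph of proxy objects with labeled links (illustrative only)."""

    LINK_KINDS = ("similar-to", "correlated-to")

    def __init__(self):
        self._links = {}  # source -> list of (kind, target)

    def add_link(self, source, kind, target):
        if kind not in self.LINK_KINDS:
            raise ValueError(f"unknown link kind: {kind}")
        self._links.setdefault(source, []).append((kind, target))

    def neighbors(self, source, kind=None):
        """Targets reachable from `source`, optionally filtered by link kind."""
        return [
            target
            for link_kind, target in self._links.get(source, [])
            if kind is None or link_kind == kind
        ]


def build_example_web():
    # Assumed edges: patients link to similar patients, and patient data
    # is correlated to literature and teaching files.
    web = InformationWeb()
    web.add_link("Patient", "similar-to", "Patient")
    web.add_link("Patient", "correlated-to", "Literature")
    web.add_link("Patient", "correlated-to", "Teaching File")
    return web


if __name__ == "__main__":
    web = build_example_web()
    print(web.neighbors("Patient", "correlated-to"))
```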

Page 16: Medical Digital Library to Support Scenario Specific Information Retrieval

Research Progress

• Phrase Indexing
  A phrase is generated from an n-word combination in a sentence.
  – Domain Specific Retrieval
  – Document Summarization
• Content Correlation
  – Linking of relevant documents via patterns

Page 17: Medical Digital Library to Support Scenario Specific Information Retrieval

Domain Specific Retrieval

• Documents are grouped into domain-specific collections
  – Medical patient reports
  – Web sites are often tailored to specific subject areas
• Phrases can capture content better than single words, and thus improve retrieval performance

Page 18: Medical Digital Library to Support Scenario Specific Information Retrieval

Problem With Longer Phrases

[Figure: number of candidate n-word combinations (log scale, 1.00E+00 to 1.00E+12) versus n = 1 to 6, with curves for a 100-word document, a 125-word document, a 150-word document, 100^n, and a 14-word sentence.]

Large combinatorial problem

To process longer phrases it is necessary to partition documents into smaller segments
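To make the growth concrete, the following sketch counts unordered n-word combinations with Python's `math.comb`, comparing a 100-word document with a 14-word sentence. Treating a phrase as an unordered combination of distinct word positions is an assumption consistent with the slide's wording, and the helper name `combination_counts` is hypothetical.

```python
import math


def combination_counts(num_words, max_n):
    """Map each phrase length n (1..max_n) to the number of n-word combinations."""
    return {n: math.comb(num_words, n) for n in range(1, max_n + 1)}


if __name__ == "__main__":
    document = combination_counts(100, 6)
    sentence = combination_counts(14, 6)
    for n in range(1, 7):
        print(f"n={n}: 100-word document {document[n]:>13,}  14-word sentence {sentence[n]:>6,}")
```

For 4-word phrases, a 100-word document yields 3,921,225 combinations while a 14-word sentence yields 1,001, which is why restricting phrases to sentence-sized segments keeps the problem tractable.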

Page 19: Medical Digital Library to Support Scenario Specific Information Retrieval

Phrase Analysis

• A phrase is defined as any 2, 3 or 4 words co-occurring in a sentence (word combination)
• Very large number of possible phrases
  – Use a stoplist to remove “useless” words
  – Normalize words to a common stem

Example:
sentence: The right upper lobe mass is seen again.
case normalization: the right upper lobe mass is seen again
stop word removal: right upper lobe mass seen again
stemming: right upp lob mass seen again
sorting: again lob mass right seen upp
candidate 2-word combinations: again lob, again mass, again right, again seen, again upp, lob mass, lob right, lob seen, lob upp, mass right, mass seen, mass upp, right seen, right upp, seen upp
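The processing steps on this slide (case normalization, stop word removal, stemming, sorting, and enumeration of word combinations) can be sketched in a few lines of Python. This is only an illustration: `STOPLIST` is a hypothetical stoplist, and `toy_stem` is a deliberately crude suffix stripper chosen so that the slide's example stems ("upp", "lob") come out; it stands in for whatever stemmer the project actually used.

```python
import re
from itertools import combinations

# Hypothetical stoplist; a real one would be much larger.
STOPLIST = {"the", "is", "a", "an", "of"}


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip a trailing 'er' or 'e'."""
    for suffix in ("er", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def candidate_phrases(sentence, n=2):
    """Return the sorted n-word combinations of the stemmed content words."""
    words = re.findall(r"[a-z]+", sentence.lower())      # case normalization
    words = [w for w in words if w not in STOPLIST]      # stop word removal
    stems = sorted({toy_stem(w) for w in words})         # stemming, then sorting
    return [" ".join(combo) for combo in combinations(stems, n)]


if __name__ == "__main__":
    for phrase in candidate_phrases("The right upper lobe mass is seen again."):
        print(phrase)
```

Sorting the stems before enumerating combinations gives every phrase a canonical word order, so "mass right" and "right mass" are counted as the same phrase.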

Page 20: Medical Digital Library to Support Scenario Specific Information Retrieval

Document Retrieval Evaluation

• Preliminary evaluation
  – A domain specific collection of documents
  – Can phrase analysis limited to sentences improve retrieval effectiveness?
  – SMART system (single word terms) used as baseline
• Data
  – Thoracic radiology patient reports
  – Dictated reports
  – Describe anatomy and abnormal findings such as enlarged lymph nodes and cancer masses

Page 21: Medical Digital Library to Support Scenario Specific Information Retrieval

Domain Specific Document Retrieval

• Query: “right upper lobe mass”

Page 22: Medical Digital Library to Support Scenario Specific Information Retrieval

Frequent N-Words

(a) Frequent 1-word table (total 34)
heart, aspirin, patient, doct, study, they, risk, prevent, take, diseas, stafford, use, too, may, thi, we, attack, ther, intern, bia, gener, peopl, problem, call, know, not, pain, some, reduc, medicat, very, becaus, data, regul

(b) Frequent 2-word table (total 52)
aspirin patient, heart aspirin, aspirin use, aspirin take, aspirin risk, aspirin study, patient take, patient study, heart diseas, heart patient, diseas peopl, prevent too, they not, they ther, they take, doct data, doct some, doct too, doct use, doct stafford, aspirin regul, aspirin becaus, aspirin reduc, aspirin some, aspirin pain, aspirin not, aspirin attack, aspirin too, aspirin diseas, use regul, aspirin they, aspirin doct, stafford intern, take not, risk reduc, study take, patient becaus, patient some, patient not, patient too, patient use, patient they, patient doct, heart regul, heart peopl, heart attack, heart too, heart use, heart stafford, use some, heart study, heart doct

(c) Frequent 3-word table (total 39)
aspirin patient take, aspirin patient study, heart aspirin patient, aspirin doct some, aspirin patient some, heart aspirin use, doct use some, aspirin take not, aspirin they not, aspirin patient not, aspirin they take, aspirin study take, patient doct use, heart aspirin diseas, heart use regul, heart aspirin regul, aspirin patient too, heart aspirin attack, aspirin risk reduc, patient take not, patient they not, heart patient too, heart aspirin too, patient use some, patient doct some, patient they take, patient study take, aspirin doct use, heart doct stafford, aspirin patient use, heart diseas peopl, aspirin use regul, aspirin patient they, heart patient study, heart aspirin study, aspirin patient becaus, aspirin patient doct, aspirin use some, they take not

(d) Frequent 4-word table (total 14)
heart aspirin use regul, aspirin they take not, aspirin patient take not, patient doct use some, aspirin patient study take, patient they take not, aspirin patient use some, aspirin doct use some, aspirin patient they not, aspirin patient they take, aspirin patient doct some, heart aspirin patient too, aspirin patient doct use, heart aspirin patient study

(e) Frequent 5-word table (total 2)
aspirin patient they take not, aspirin patient doct use some

Page 23: Medical Digital Library to Support Scenario Specific Information Retrieval

Phrase length distribution

[Figure: number of phrases (0 to 500) versus N-word phrase length (1 to 9) for six documents: Aspirin1, Aspirin2, Elian04, LAPD06, CNN-Bush, CNN-Florida.]

Page 24: Medical Digital Library to Support Scenario Specific Information Retrieval

Automatic Text Summarization

Salton Method
• Given a text file with n paragraphs
• A paragraph can be represented by $D_i = (d_{i1}, d_{i2}, \ldots, d_{im})$
  – $d_{ik}$ is the weight representing the importance of term $T_k$ (word or phrase)
• The pair-wise similarity of two paragraphs is $\mathrm{Sim}(D_i, D_j) = \sum_{k=1}^{m} d_{ik} \, d_{jk}$

Text relationship map:
• Nodes = paragraphs
• Links = pair-wise similarity of the connected nodes
• Links are created if $\mathrm{Sim}(D_i, D_j) > \text{threshold}$

Bushiness of a node = number of links of the node
The text summary is derived from the bushy nodes.

[Figure: text relationship map with paragraph nodes P1, P2, P3, P4, P5, ..., Pn.]
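The text relationship map lends itself to a short sketch. The code below builds paragraph vectors, links paragraphs whose inner-product similarity exceeds a threshold, and ranks paragraphs by bushiness. The term weights here are length-normalized term frequencies over whitespace-separated words, which is an assumption made for illustration; the slides do not specify the weighting scheme, and all function names are hypothetical.

```python
import math
from collections import Counter


def weight_vector(paragraph):
    """Length-normalized term-frequency weights (an assumed weighting scheme)."""
    counts = Counter(paragraph.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def similarity(d_i, d_j):
    """Inner product over shared terms: sum of d_ik * d_jk."""
    return sum(w * d_j.get(term, 0.0) for term, w in d_i.items())


def bushiness(paragraphs, threshold):
    """Number of links per paragraph in the text relationship map."""
    vectors = [weight_vector(p) for p in paragraphs]
    degree = [0] * len(vectors)
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if similarity(vectors[i], vectors[j]) > threshold:
                degree[i] += 1
                degree[j] += 1
    return degree


def summarize(paragraphs, threshold, k):
    """Keep the k bushiest paragraphs, returned in document order."""
    degree = bushiness(paragraphs, threshold)
    top = sorted(range(len(paragraphs)), key=lambda i: -degree[i])[:k]
    return [paragraphs[i] for i in sorted(top)]


if __name__ == "__main__":
    text = [
        "aspirin reduces heart attack risk",
        "aspirin heart study",
        "weather is sunny today",
    ]
    print(bushiness(text, threshold=0.2))
    print(summarize(text, threshold=0.2, k=1))
```

Replacing the single-word terms with sorted n-word combinations per sentence would give the phrase-based variant that the later slides compare against.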

Page 25: Medical Digital Library to Support Scenario Specific Information Retrieval

Performance Comparison of Salton’s Summarization Method Based on Phrase and Single Word

Aspirin.txt: paragraph ranking based on bushiness

            Words            2W phrases       3W phrases
Threshold   0.1  0.2  0.3    0.1  0.2  0.3    0.1  0.2  0.3
No.1        4    6    8      2    2    2      2    2    2
No.2        6    8    2      3    3    3      3    3    3
No.3        8    3    3      6    6    6      8    8    8
No.4        1    4    4      1    4    4      4    4    4
No.5        5    5    5      8    5    5      6    6    6
No.6        2    1    6      4    1    1      5    5    5
No.7        3    2    1      5    8    8      7    7    7
No.8        9    9    9      7    7    7      1    1    1
No.9        7    7    7      9    9    9      9    9    9

Summarization based on phrases is less sensitive to the threshold setting than summarization based on single words.

Page 26: Medical Digital Library to Support Scenario Specific Information Retrieval

Comparison between Salton & FBI

Document (sentences)   #   Salton, threshold 0.1   Salton, threshold 0.2   Salton, threshold 0.3   FBI               df
Aspirin01 (13 sent)    1   12,9,3,2,7              9,2,3,7,1               1,2,3,7,9               2,9,3,12,7        0
                       2   3,2,9,1,7               3,2,9,1,7               3,2,9,1,7               2,3,12,9,4        2
                       3   2,3,12,4,9              2,3,12,4,9              2,3,12,4,9              2,12,3,9,4        0
Aspirin02 (68 sent)    1   12,14,22,61,66          12,14,1,15,20           1,12,14,15,20           14,12,22,66,20    0
                       2   22,14,12,15,36          22,12,15,36,66          15,36,66,20,22          14,12,66,22,36    0
                       3   12,14,66,22,36          12,14,22,36,66          12,14,22,36,66          14,12,66,22,36    0
Elian04 (92 sent)      1   26,76,33,59,2           26,76,33,2,24           76,26,2,44,7            26,76,2,7,44      1
                       2   26,7,76,33,2            26,7,76,29,82           26,7,76,2,29            26,76,2,7,6       1
                       3   6,26,27,7,2             6,27,7,26,2             6,27,26,2,7             26,2,6,27,59      1
LAPD06 (27 sent)       1   7,6,19,25,20            6,19,7,20,25            6,19,7,14,25            6,7,19,20,25      0
                       2   18,6,19,20,9            18,6,19,20,9            6,19,18,20,9            18,19,6,20,7      1
                       3   5,12,14,17,18           5,12,14,17,18           5,12,14,17,18           5,20,12,14,17     1
CNNbush (14 sent)      1   12,5,6,8,11             12,5,6,11,7             12,5,6,11,3             5,12,8,11,6       0
                       2   8,12,5,6,7              8,12,5,11,3             8,12,5,3,10             5,12,8,3,10       0
                       3   5,8,12,10,3             5,8,12,10,3             12,5,8,3,10             12,5,8,9,10       1
Florida (49 sent)      1   29,11,41,2,26           29,41,11,26,2           29,41,26,11,14          29,11,17,48,41    1
                       2   20,40,17,11,22          20,40,17,11,22          20,17,40,22,25          17,40,20,11,48    1
                       3   17,20,40,6,22           17,20,40,6,22           17,20,6,22,25           17,20,25,40,11    1

Page 27: Medical Digital Library to Support Scenario Specific Information Retrieval

Content Correlation

• Given a document in one collection, content correlation links relevant documents in another document collection

[Figure: document collections linked by content correlation: Patient Records, New England Journal of Medicine, CNN, Time.]

Page 28: Medical Digital Library to Support Scenario Specific Information Retrieval

Document Cluster By Pattern

• A pattern is a set of unique terms that characterize some features in the data set
• Patterns can be found in a collection of documents by data mining
• Documents are grouped into clusters based on patterns via a clustering technique

Page 29: Medical Digital Library to Support Scenario Specific Information Retrieval

Cluster Signature

• Every cluster can be classified according to the occurrence frequency of the patterns
• Looking to answer:
  – Which set of patterns summarizes a given cluster?
  – How are the patterns related among the clusters?

[Figure: clusters in two collections, Patient Records and Literature.]

Page 30: Medical Digital Library to Support Scenario Specific Information Retrieval

Deriving Cluster Signature

• Metrics
  – Local Cluster Certainty (LCC) measures the coverage of a pattern in a given cluster (Popularity)
  – Global Cluster Certainty (GCC) measures the coverage of a pattern among clusters (Exclusiveness)
• The Cluster Signature is the set of those patterns that have both high LCC and GCC
• Documents from one collection (source) can be linked to relevant clusters in another collection (target)

[Figure: Patient Records (source) linked to Literature clusters (target).]
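The slides describe LCC and GCC only qualitatively, so the sketch below picks one plausible pair of definitions to show how a cluster signature could be derived. Both formulas are assumptions and not the project's definitions: LCC is taken as the fraction of documents in the cluster that contain the pattern, and GCC as the fraction of all documents containing the pattern that fall in this cluster. Documents are modeled as sets of pattern strings, and the thresholds are hypothetical.

```python
def lcc(pattern, cluster):
    """Assumed LCC: share of the cluster's documents that contain the pattern."""
    return sum(pattern in doc for doc in cluster) / len(cluster)


def gcc(pattern, cluster, clusters):
    """Assumed GCC: share of the pattern's occurrences that lie in this cluster."""
    total = sum(pattern in doc for c in clusters for doc in c)
    if total == 0:
        return 0.0
    return sum(pattern in doc for doc in cluster) / total


def cluster_signature(cluster, clusters, min_lcc=0.6, min_gcc=0.6):
    """Patterns that are both popular in the cluster and exclusive to it."""
    patterns = set().union(*cluster)
    return sorted(
        p
        for p in patterns
        if lcc(p, cluster) >= min_lcc and gcc(p, cluster, clusters) >= min_gcc
    )


if __name__ == "__main__":
    # Toy clusters; the documents are invented for illustration only.
    clusters = [
        [{"laparoscop", "pediatr"}, {"laparoscop", "compl"}],
        [{"urolog", "pediatr"}, {"urolog"}],
    ]
    print(cluster_signature(clusters[0], clusters))
```

A source document could then be linked to the target cluster whose signature shares the most patterns with it, which is one simple reading of the linking step in the last bullet.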

Page 31: Medical Digital Library to Support Scenario Specific Information Retrieval

Preliminary Results

• A collection of 69 pediatric urology literature abstracts taken from Medline was clustered using the complete link clustering algorithm
  – 3 large clusters, each with 2 or more sub-clusters
• GCC and LCC were calculated for patterns found in several sub-clusters
• Data from one sub-cluster is reported here

Document #   Title
1            Complications in pediatric urological laparoscopy: results of a survey
2            Laparoscopic surgery in pediatric urology
3            [Laparoscopic interventions in pediatric urology]
4            Role of laparoscopic surgery in pediatric urology
5            [Laparoscopic interventions in urology]
6            Laparoscopic heminephroureterectomy in pediatric patients

Page 32: Medical Digital Library to Support Scenario Specific Information Retrieval

GCC

Term/Phrase                   Cl
Pediatr                       1.0
Result                        1.0
Patient                       1.0
Perform                       1.0
Compl                         1.0
Laparoscop                    1.0
Urolog                        0.34
Laparoscop pediatr            1.0
Laparoscop perform            1.0
Diagnost laparoscop           0.35
Laparoscop operat             0.35
Compl rate                    0.35
Laparoscop patient            0.35
Laparoscop operat perform     0.0817
Laparoscop patient perform    0.0817

LCC

Term/Phrase                   Cg
Laparoscop                    0.1887
Compl                         0.0817
Child Laparoscop              1.0
Laparoscop patient            1.0
Compl Laparoscop              1.0
Comple techn                  1.0
<MEAS> compl                  1.0
Laparoscop perform            0.6088
Compl rate                    0.4564
Laparoscop patient perform    1.0
Laparoscop perform procedur   1.0
<MEAS> compl rate             1.0
Laparoscop pediatr perform    1.0
Compl laparoscop techn        1.0

Page 33: Medical Digital Library to Support Scenario Specific Information Retrieval

Project Summary

A system that provides:
– relevant and reputable information,
– access to similar patient records,
– content-based cross referencing,
– a dynamically updated data repository, and
– tailored access for specific users and devices

will:
– augment the patient record to provide tailored and timely access to a broader array of reputable information and
– extend the digital file room into a digital medical library.

Page 34: Medical Digital Library to Support Scenario Specific Information Retrieval

Research Results

• Phrase Indexing
  – Developed an efficient algorithm for extracting n-word features from textual documents
  – Phrase indexes provide better results than single-word indexes in document retrieval and summarization
• Content Correlation via Cluster Signature (LCC & GCC)
  – Preliminary results show the feasibility of using cluster signatures for linking relevant documents
• Work has begun on a proxy for information navigation

Page 35: Medical Digital Library to Support Scenario Specific Information Retrieval

Future Work

• Develop Ontology for Intelligent Information Registration
• User Model for Information Retrieval