Medical Information Retrieval: Challenges in a Webbed World
William Hersh, M.D.
Associate Professor and Chief
Division of Medical Informatics and Outcomes Research
Oregon Health Sciences University
hersh@ohsu.edu
Overview
Describe current information retrieval technology
Summarize IR research activities and their results
Discuss the implications of the World Wide Web (WWW) for IR
Overview of IR Process
[Diagram: documents and queries both pass through a common indexing language; a search engine matches queries against the indexed documents, with indexing on the document side and retrieval on the query side]
What is the field of IR?
Concerned with creation, storage, organization, and retrieval of computer-based information
“IR” has traditionally focused on retrieval of information from heterogeneous textual databases
Recent expansion to multimedia and integration with “traditional” databases
Why is IR pertinent to health care?
Growth of knowledge has long surpassed human memory capabilities
Clinicians have frequent and unmet information needs
Primary literature on a given topic can be scattered and hard to synthesize
Non-primary literature sources are often neither comprehensive nor systematic
Further reading
Hersh WR, Information Retrieval: A Health Care Perspective, Springer-Verlag, 1996
Hersh WR, Hickam DH, How well do physicians use electronic information retrieval systems? A framework for investigation and systematic review, Journal of the American Medical Association, 1998, 280: 1347-1352
IR state of the art
Databases
Indexing
Retrieval
Evaluation
Databases
Bibliographic
– References to journal literature
– Used in initial IR systems
– Most famous example is MEDLINE
» Nearly 9 million references to peer-reviewed literature dating back to 1966
» Covers about 3,000 journals, mostly English-based
» About 300,000 new references added yearly
» Maintained by National Library of Medicine
Databases (cont.)
Full-text
– Journal literature has been available for over a decade in text-only form and at high cost
– Last decade has seen increasing growth of the CD-ROM market
– New “evidence-based” resources are becoming available, e.g., Best Evidence, Cochrane
Hypertext
– Information linked in non-linear fashion
Indexing
Two major types:
– Human indexing with controlled vocabulary
» MEDLINE uses the 18,000-term Medical Subject Headings (MeSH) vocabulary
– Computer assignment of all words in record
» Often a stop word list to remove common words (e.g., the, and, which) is used
» Some systems “stem” words to root form (e.g., coughs to cough)
(A small code sketch of stop word filtering and stemming follows this slide.)
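To make the word-indexing steps concrete, here is a minimal Python sketch of stop word filtering and stemming. The stop list and suffix rules are invented for illustration; a real system might use a published stop list and the Porter stemmer.

```python
# Minimal word-indexing sketch: lowercase, drop stop words, strip suffixes.
# The stop list and suffix rules are illustrative only.
import re

STOP_WORDS = {"the", "and", "which", "of", "a", "in", "to", "is"}

def stem(word):
    """Naive suffix stripping (a real system might use the Porter stemmer)."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def index_words(text):
    """Return the stemmed, non-stop words of a document or query."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOP_WORDS]

print(index_words("The patient coughs and is coughing"))  # ['patient', 'cough', 'cough']
```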
Limitations of indexing approaches
Human indexing
– Inconsistency
– Inadequate indexing vocabulary
Word indexing
– Synonymy - e.g., cancer and carcinoma
– Polysemy - e.g., lead
– Granularity - e.g., antibiotics vs. penicillin
– Focus
Retrieval
Traditional approach: indexing terms connected by AND, OR
Most bibliographic systems allow searching on both vocabulary and text words
Proximity operators require words to be within a certain range
Some systems hide Boolean operators
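As an illustration of Boolean retrieval (not any particular vendor's system), the sketch below builds an inverted index over three invented documents and evaluates AND and OR queries.

```python
# A small sketch of Boolean AND/OR retrieval over an inverted index.
# The documents and the stop list are invented for illustration.
from collections import defaultdict

STOP_WORDS = {"the", "and", "of", "in", "with"}

docs = {
    1: "treatment of hypertension with beta blockers",
    2: "penicillin in the treatment of pneumonia",
    3: "hypertension and pregnancy",
}

# Inverted index: term -> set of ids of documents containing that term.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        if term not in STOP_WORDS:
            inverted[term].add(doc_id)

def boolean_and(*terms):
    """Documents containing every term (Boolean AND)."""
    return set.intersection(*(inverted.get(t, set()) for t in terms))

def boolean_or(*terms):
    """Documents containing at least one term (Boolean OR)."""
    return set.union(*(inverted.get(t, set()) for t in terms))

print(boolean_and("hypertension", "treatment"))  # {1}
print(boolean_or("penicillin", "pregnancy"))     # {2, 3}
```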
Limitations of retrieval approaches
Novices confuse ANDs and ORs
Complex user interfaces dissuade busy users
Returned documents displayed in arbitrary or, at best, reverse chronological order
An alternative approach to indexing and retrieval
Called vector-space, word-statistical, automated retrieval…
Developed by Salton in the 1960s but, since it works best for end-users, did not achieve commercial prominence until the 1990s
Based on notion of finding similarity in words between user’s query and document
Used in Knowledge Finder (Aries) and most Web search engines
Word-statistical indexing
Indexing done of all words (though nothing precludes use of MeSH or other terms)
After stop word filtering and stemming, each word in each document is assigned a weight based on the product IDF * TF:
– Inverse document frequency of term i
» IDF(i) = log(# documents / # documents containing term i) + 1
– Term frequency of term i in document j
» TF(i,j) = log(frequency of term i in document j) + 1
(A code sketch of these formulas follows this slide.)
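A minimal sketch of these weight formulas in Python, assuming documents have already been reduced to word lists by stop word filtering and stemming; the collection shown is invented.

```python
# IDF*TF weighting per the formulas above:
#   IDF(i)  = log(# documents / # documents containing term i) + 1
#   TF(i,j) = log(frequency of term i in document j) + 1
import math

def idf(term, collection):
    """Inverse document frequency of a term across a collection of word lists."""
    n_with_term = sum(1 for words in collection if term in words)
    return math.log(len(collection) / n_with_term) + 1

def tf(term, words):
    """Term frequency weight of a term within one document (a word list)."""
    return math.log(words.count(term)) + 1

def weight(term, words, collection):
    """Weight of a term in a document: the product IDF * TF."""
    return idf(term, collection) * tf(term, words)

collection = [
    ["cough", "fever", "pneumonia"],   # invented example documents
    ["cough", "asthma"],
    ["hypertension", "pregnancy"],
]
print(round(weight("cough", collection[0], collection), 3))  # about 1.405
```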
Word-statistical retrieval
Queries entered in natural language, subject to same stop list and stemming
Each document gets a score based on sum of weights for each query term in the document
Results are sorted and presented to user (relevance ranking)
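Putting indexing and retrieval together, this self-contained sketch scores each document by summing the IDF*TF weights of the query terms it contains and returns a relevance-ranked list; the document contents are invented.

```python
# Word-statistical retrieval sketch: score = sum of IDF*TF weights of the
# query terms present in the document, then sort by descending score.
import math

def tfidf_weight(term, doc, docs):
    """IDF*TF weight per the slide formulas (doc and docs are word lists)."""
    n = sum(1 for d in docs if term in d)
    if n == 0 or term not in doc:
        return 0.0
    return (math.log(len(docs) / n) + 1) * (math.log(doc.count(term)) + 1)

def search(query_terms, docs):
    """Return (document index, score) pairs, highest scores first."""
    scores = [
        (i, sum(tfidf_weight(t, doc, docs) for t in query_terms))
        for i, doc in enumerate(docs)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

docs = [
    ["cough", "fever", "pneumonia"],   # invented example documents
    ["cough", "asthma"],
    ["hypertension", "pregnancy"],
]
print(search(["cough", "pneumonia"], docs))  # document 0 ranks first
```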
This approach allows other features:
Relevance feedback
– After the user designates relevant documents, the query is modified
Query expansion
– Same but using top-ranked documents without user relevance designations
(One simple way to implement both is sketched below.)
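The slides do not specify how the query is modified, so the following is only one simple illustration: add the most common terms from the feedback documents (user-designated relevant documents for relevance feedback, top-ranked documents for query expansion) to the original query, then search again.

```python
# Illustrative query modification (an assumption, not the slides' specific method):
# append the n most common terms from the feedback documents to the query.
from collections import Counter

def expand_query(query_terms, feedback_docs, n_new_terms=3):
    """Add the n most common new terms from the feedback documents.

    For relevance feedback, feedback_docs are the documents the user marked
    relevant; for query expansion, they are simply the top-ranked documents."""
    counts = Counter(t for doc in feedback_docs for t in doc)
    new_terms = [t for t, _ in counts.most_common() if t not in query_terms]
    return list(query_terms) + new_terms[:n_new_terms]

print(expand_query(["cough"], [["cough", "asthma", "wheeze"]]))
# ['cough', 'asthma', 'wheeze']
```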
Evaluation
What questions to ask?
– Is the system used?
– Are users satisfied?
– Do they find relevant information?
– Do they complete their desired task?
Most research has focused on retrieval of relevant documents
Relevance-based measures
Recall = (# retrieved and relevant documents) / (# relevant documents in collection)
Precision = (# retrieved and relevant documents) / (# documents retrieved in the search)
(Both are computed below for an invented example.)
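As a quick worked example, the two measures can be computed directly from sets of document identifiers; the sets here are invented.

```python
# Recall and precision from sets of document ids.
def recall(retrieved, relevant):
    """Fraction of the relevant documents in the collection that were retrieved."""
    return len(retrieved & relevant) / len(relevant)

def precision(retrieved, relevant):
    """Fraction of the retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)

retrieved = {1, 2, 3, 4}       # documents returned by a search (invented)
relevant = {2, 4, 5, 6, 7}     # documents judged relevant (invented)
print(recall(retrieved, relevant))     # 2/5 = 0.4
print(precision(retrieved, relevant))  # 2/4 = 0.5
```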
Comments about recall and precision
There tends to be a trade-off between the two
“Relevance” can be a slippery notion
It is unclear whether they correlate with a user’s success in using an IR system
The proliferation of standard test collections leads to a great deal of research that excludes real users
How well do users search? (Haynes et al., Annals of Internal Medicine, 1990)

Searcher                Recall   Precision
Novice                  27%      38%
Experienced clinician   48%      48%
Librarian               49%      57%
More searching results (Hersh et al., Bull Med Libr Assoc, 1994)

Searcher            System             Retrieved   Recall   Precision
Novice physicians   Knowledge Finder   88.8        68.2     14.7
Novice physicians   KF top 15          14.6        31.2     24.8
Librarians          Full MEDLINE       18.0        37.1     36.1
Librarians          Text words only    17.0        31.5     31.9
Exp. physicians     Full MEDLINE       10.9        26.6     34.9
Exp. physicians     Text words only    14.8        30.6     31.4
Other results
Little overlap among retrieval sets
– Searchers tend to find similar quantities of disparate relevant documents
Novice searchers are satisfied with results
– Adequate information or ignorant bliss?
New approaches to evaluation
Changing the research questions
– How well can clinical users answer questions?
– What factors are associated with success?
» Demographics, experience, cognitive factors, and searching mechanics?
Ongoing study funded by NLM
Challenges
– Appropriate questions, database, sample size, etc.
IR research directions
Enhancing word-statistical approaches
Linguistic approaches
Enhancing conventional indexing and retrieval
Enhancing word-statistical approaches
Passage retrieval
– Giving weight to documents that have sections mapping closely to the query
Use of phrases
– High, blood, and pressure have more meaning when occurring near each other
(An illustrative proximity sketch follows this slide.)
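As one illustrative way (an assumption, not necessarily the method behind these slides) to reward co-occurring query words, a document can receive a bonus for each pair of query terms appearing within a small window of each other:

```python
# Proximity bonus sketch: count query-term pairs co-occurring within a window.
def proximity_bonus(query_terms, doc_words, window=3):
    """Count query-term pairs that co-occur within `window` positions."""
    positions = {
        t: [i for i, w in enumerate(doc_words) if w == t]
        for t in query_terms
    }
    bonus = 0
    terms = list(query_terms)
    for a in range(len(terms)):
        for b in range(a + 1, len(terms)):
            bonus += sum(
                1
                for i in positions[terms[a]]
                for j in positions[terms[b]]
                if abs(i - j) <= window
            )
    return bonus

doc = "high blood pressure is treated with diet and exercise".split()
print(proximity_bonus(["high", "blood", "pressure"], doc))  # 3 nearby pairs
```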
Linguistic approaches
Syntactic approaches
– Conceptual matter tends to occur in noun phrases
Semantic approaches
– Can we overcome problems of synonymy, polysemy, granularity, etc.?
Identifying semantics in documents
SAPHIRE (Hersh and Hickam, 1995)
– Direct mapping of text to terms in a large controlled vocabulary (UMLS Metathesaurus)
– Works best when exact terms and synonyms are present, less well when terms are vague or synonyms non-standard
MEDSPACE (Schatz, 1997)
– Large-scale processing to uncover underlying related terms and literatures
Enhancing conventional systems
Better content
– Evidence-based resources
» More informative abstracts, e.g., Best Evidence
» Systematic reviews, e.g., Cochrane Database of Systematic Reviews
» Critically-Appraised Topics (CATs)
Better indexing
– NLM’s MedIndex system provides expert assistance to indexers
Enhancing conventional systems (cont.)
Better retrieval
– NLM’s Internet Grateful Med looks for common searching mistakes (e.g., excessive ANDs) and informs the searcher
Better vocabularies
– NLM’s UMLS Project adds terminology from other vocabularies
IR and the World Wide Web
Indexing and retrieval approaches
Implications for scientific publishing
Implications for health care
Limitations
Indexing and retrieval on the Web
Web crawlers
– Index everything they find
– Examples: Alta Vista, InfoSeek, Lycos
– Problems: non-discriminating, word only
Filtering and/or classifying
– Sites filtered and/or classified based on criteria
– Examples: Yahoo, CliniWeb, OMNI
– Problems: maintenance, intended audience
Implications for scientific publishing
Peer-review process
– Imperfect but the best means for controlling quality in publications
Responsibility
– Increased anonymity of the Web enhances the ability for misrepresentation
Liability
– Who is liable for inaccurate information?
Implications for health care
Informativeness vs. marketing
– There is potential conflict between providing information and self-promotion
Patient empowerment
– Absolutely important but much potential for damage from misinformation
Much medical information is on the Web
“Free” information from government agencies, medical schools, and advocacy groups is easy to access and use
“Best” information from traditional medical publishers still costly and fragmented
Some well-known launching pads
– Medical Matrix: www.medmatrix.org/
– CliniWeb: www.ohsu.edu/cliniweb/
Limitations of the Web (Hersh, ACPJC, 1996)
Difficult to find information - a diversity of different search engines, each with its own benefits and limitations
Everyone can be a publisher - Good for democratic society, less so for scientific and professional fields
Misrepresentation and fraud - Web can amplify misinformation and allow easy fraud
Some have expressed concern about free information on the Web
Silberg et al. (JAMA, 1997) suggested standards for health information on the Web
– Authorship - names, affiliations, and credentials
– Attribution - references, sources, and (where appropriate) copyright
– Disclosure - potential and real conflicts of interest
– Currency - dates content posted and updated
But the applicability and quality of Web content are poor
Hersh, Gorman, and Sacherek, JAMA, 1998
Searched on 50 questions generated by clinicians
Less than 10% of pages relevant, none for half of queries
Low percentage of the JAMA quality indicators present
Final thoughts
We are on the threshold of an exciting new era in communications and information dissemination
– Integrity of information and responsibility for it must be maintained
– It should augment and not substitute for human communication
References Cited:
1. Hersh W. Information Retrieval: A Health Care Perspective. New York: Springer-Verlag, 1996.
2. Hersh W, Hickam D. How well do physicians use electronic information retrieval systems? A framework for investigation and review of the literature. Journal of the American Medical Association, 1998; 280: 1347-1352.
3. Haynes R, et al. Online access to MEDLINE in clinical settings. Annals of Internal Medicine, 1990; 112: 78-84.
4. Hersh W, Hickam D. The use of a multi-application computer workstation in a clinical setting. Bulletin of the Medical Library Association, 1994; 82: 382-389.
5. Hersh W, Hickam D. Information retrieval in medicine: the SAPHIRE experience. Journal of the American Society for Information Science, 1995; 46: 743-747.
6. Schatz B. Information retrieval in digital libraries: bringing search to the net. Science, 1997; 275: 327-334.
7. Hersh W. Evidence-based medicine and the Internet. ACP Journal Club, 1996; 5(4): A12-A14.
8. Silberg W, Lundberg G, Musacchio R. Assessing, controlling, and assuring the quality of medical information on the Internet: caveat lector et viewor - let the reader and viewer beware. Journal of the American Medical Association, 1997; 277: 1244-1245.
9. Hersh W, Gorman P, Sacherek L. Applicability and quality of information for answering clinical questions on the Web. Journal of the American Medical Association, 1998; 280: 1307-1308.

URLs:
Division of Medical Informatics & Outcomes Research: www.ohsu.edu/bicc-informatics/
CliniWeb: www.ohsu.edu/cliniweb/
SAPHIRE International: www.ohsu.edu/cliniweb/saphint/