MITRE© 2001 The MITRE Corporation. ALL RIGHTS RESERVED.
Mining the Biomedical Literature: Creating a Challenge Evaluation

Lynette Hirschman
Chief Scientist, Information Technology Center
The MITRE Corporation, Bedford, MA, USA
Outline
1. Overview: why mine the literature?
2. Where we are: technologies for mining text
3. Creating a challenge evaluation
4. Recommendations
Why Mine the Literature?
• Biologists need information contained in text
  - To integrate information across articles (e.g., in constructing metabolic pathways)
  - To refine sequence searches, e.g., literature search coupled to BLAST searches (Chang et al., 2001)
  - To research prior art (for patents)
  - To update databases
• Natural language processing offers the tools to make information in text accessible
How Good is Current Language Processing?
• Natural language processing (NLP) works!
• Automated NLP systems exist now that can:
  - Return documents relevant to a subject (information retrieval)
  - Identify entities (90-95% accuracy) or relations among entities (70-80% accuracy) in text (information extraction)
  - Answer factual questions using large document collections at 75-85% accuracy (question answering)
• But... these systems work on news, not biology
  - And we don't have comparable performance metrics for biology
Text Mining for Biology
• Current situation:
  - There are increasing numbers of groups working on NLP for biology
  - Each group reports results for a particular task, on a specialized data set
• Right now, it is very difficult to compare results across the groups
• Lack of standards also makes it difficult to share
  - Data and knowledge resources
  - Software components

A common challenge evaluation can focus research and speed progress
Outline
1. Overview: why natural language?
☛ 2. Where we are: technologies for mining text
3. Creating a challenge evaluation
4. Recommendations
Literature Mining Overview
[Figure: the literature mining pyramid. Information Retrieval takes key words to documents; Information Extraction takes documents to entities and relations; Question Answering takes a question to an answer. The data shrinks at each step: collections (gigabytes; e.g., MEDLINE, PIR, Genbank) → documents (megabytes) → lists and tables (kilobytes; e.g., a table of Ebola outbreak reports) → phrases (bytes; e.g., "Protease-resistant prion protein interacts with...").]
Information Retrieval

• Input: query words
  Output: ranked list of documents
• Approach
  - Speed, scalability, domain independence, and robustness are critical for access to large collections of documents
• Technique
  - Shallow processing provides a coarse-grained result (entire documents or passages)
  - Query is transformed to a collection of words, but grammatical relations between words are lost
  - Documents are indexed by word occurrences
  - Search matches the query "probe" against indexed documents using a Boolean combination of terms, a vector of word occurrences, or a language model
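As a sketch of the indexing and matching steps described above (purely illustrative; the toy documents are invented), a word-based inverted index with Boolean AND matching can be written as:

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, query):
    """Return ids of documents containing every query word."""
    sets = [index.get(w.lower(), set()) for w in query.split()]
    return set.intersection(*sets) if sets else set()

docs = {
    1: "prion protein interacts with laminin receptor",
    2: "Ebola outbreak reported in Gulu Uganda",
    3: "protease resistant prion protein accumulates",
}
index = build_index(docs)
print(boolean_and(index, "prion protein"))  # {1, 3}
```

Real retrieval systems rank these matches (vector or language-model scoring) rather than returning an unordered Boolean set, but the index structure is the same.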
Information Retrieval

[Figure: the literature mining pyramid, highlighting the Information Retrieval step: key words to documents.]
Evaluating Text Retrieval
• The Text Retrieval Conference (TREC) has been held annually, starting in 1992, run by NIST*
  - Successful, attracting 100s of international participants from industry, academia, government
• Goal: systematic evaluation of retrieval systems using a large (5 GB) common corpus
  - Given a set of queries, for each query:
    - Systems return a ranked list of documents
    - Human judges provide relevance assessments for the ranked documents
    - Relevance judgements are used to compute average precision-recall plots for each system
      = Precision: % of returned docs judged relevant
      = Recall: % of relevant documents found

*US National Institute of Standards and Technology
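The precision and recall definitions above can be made concrete with a small sketch (the document ids and relevance judgments here are invented for illustration):

```python
def precision_recall(returned, relevant):
    """Precision: fraction of returned docs judged relevant.
    Recall: fraction of all relevant docs that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgments: the system returns 5 docs, 4 are relevant,
# but 10 documents in the collection are actually relevant.
p, r = precision_recall([1, 2, 3, 4, 5], [2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
print(p, r)  # 0.8 0.4
```

This mirrors the situation on the TREC9 results slide: 4 of the top 5 documents relevant gives 80% precision, while recall stays low.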
Sample TREC Topic

<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of Pan Am Flight 103 over Lockerbie, Scotland on December 21, 1988?
<narr> Narrative:
Documents describing any charges, claims or fines presented to or imposed by any court or tribunal are relevant, but documents that discuss charges made in diplomatic jousting are not relevant.

Title: up to 3 words best describing the topic
Description: one-sentence description of the topic
Narrative: description of what makes a document relevant or irrelevant
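A minimal parser for this topic layout might look like the following; the field markers come from the sample above, while the helper name is hypothetical:

```python
import re

def parse_topic(text):
    """Pull the number, title, description, and narrative out of a
    TREC topic written with <num>, <title>, <desc>, <narr> markers."""
    fields = {}
    for tag, body in re.findall(r"<(num|title|desc|narr)>([^<]*)", text):
        # Drop the leading label ("Number:", "Description:", ...) if present.
        fields[tag] = body.split(":", 1)[-1].strip()
    return fields

topic = """<num> Number: 409
<title> legal, Pan Am, 103
<desc> Description:
What legal actions have resulted from the destruction of
Pan Am Flight 103 over Lockerbie, Scotland?"""

print(parse_topic(topic)["num"])    # 409
print(parse_topic(topic)["title"])  # legal, Pan Am, 103
```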
TREC9 Results for a High-Performing System

[Figure: precision-recall curves for automatically and manually generated queries; precision on the y-axis, recall on the x-axis.]

• 4 of top 5 documents relevant: precision = 80%; recall low!
• This is a representative high-performing system
• Manual ("expert") choice of query words works better than automatically generated queries
• Note that if you need to find "all" the literature on a subject, you have to look through lots of junk!
Lessons from TREC
• The basic retrieval paradigm indexes words; adding syntax, semantics hasn't helped (yet)
• For short queries, adding information (words) to the query helps
  - by hand (manual query creation)
  - by thesaurus (synonyms, semantic classes)
  - by feedback of relevant documents to cull more key words
• Need finer-grained retrieval -- documents too big!
  - This has led to question-answering evaluation
Information Extraction
• Information extraction is the identification of domain-specific classes of entities & relations among them
• Input for extraction: documents
  Output: entities in documents, lists of relations
• Metrics:
  - F-measure: harmonic mean of precision, recall
• Systems need training data (annotated examples from text) to "learn" how to identify entities and relations
  - Data is typically generated by human experts
  - Systems need 1000s of examples - the more data, the higher the performance
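The F-measure mentioned above is just the harmonic mean of precision and recall; a one-function sketch (the example numbers are illustrative):

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# An entity tagger with 95% precision and 90% recall:
print(round(f_measure(0.95, 0.90), 3))  # 0.924
```

The harmonic mean punishes imbalance: a system with 100% precision but 10% recall scores far below one with 55% of each.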
Information Extraction

[Figure: the literature mining pyramid, highlighting the Information Extraction step: documents to entities and relations.]
Information Extraction: Epidemiology Example

1. Extract entities from text (color coded via HTML)

2. Extract outbreak events into a table:

Disease  Source  Country  City_name  Date         Cases  New_cases  Dead
Ebola    PROMED  Uganda   Gula       26-Oct-2000    182         17    64
Ebola    PROMED  Uganda   Gula       5-Nov-2000     280         14    89
Ebola    PROMED  Uganda   Gulu       13-Oct-2000     42          9    30
Ebola    PROMED  Uganda   Gulu       15-Oct-2000     51          7    31
Ebola    PROMED  Uganda   Gulu       16-Oct-2000     63         12    33
Ebola    PROMED  Uganda   Gulu       17-Oct-2000     73          2    35
Ebola    PROMED  Uganda   Gulu       18-Oct-2000     94         21    39
Ebola    PROMED  Uganda   Gulu       19-Oct-2000    111         17    41
3. Display events

[Figure: "Total Cases; New Cases" — time-series chart of the outbreak, 13-Oct-2000 through 24-Nov-2000; y-axis: number of cases; series: Cases, New_cases, Dead.]
Information Extraction Evaluations for Newswire

[Figure: F-measure (accuracy) by year, 1991-1999, for name extraction in English, Japanese, and Chinese, and for relation and event extraction.]

• Name extraction > 90% in English, Japanese; improving in Chinese
• Relation extraction now at over 80%
• Event extraction less than 60%, improving slowly
• Commercial name taggers exist for news reports in multiple languages
Lessons Learned from Extraction
• Name extraction works well
  - For news (person, organization, location, time, money), results are over 90%
  - Local information is used to identify names, e.g., morphology, terminology lists, local context
• Relations require more information -- identification of 2 entities & their relationship
  - Predicted relation accuracy = Pr(E1)*Pr(E2)*Pr(R) ~ (.93) * (.93) * (.93) = .80
• Events are even harder
  - More slots to fill means lower performance
  - Events require more cross-sentence information
  - Complex syntax in abstracts is a problem (see examples from Park et al., PSB 2001)
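The back-of-the-envelope relation-accuracy estimate above (three roughly 93%-accurate decisions that must all succeed, assumed independent) can be checked in a few lines:

```python
def compound_accuracy(*stage_accuracies):
    """Accuracy of a pipeline whose stages must all succeed,
    assuming the stages fail independently."""
    result = 1.0
    for a in stage_accuracies:
        result *= a
    return result

# Two entities and the relation itself, each found 93% of the time:
print(round(compound_accuracy(0.93, 0.93, 0.93), 2))  # 0.8
```

The same multiplication explains why events, with more slots to fill, degrade faster than binary relations.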
Question Answering (MITRE's QANDA System)

Where did Dylan Thomas die?
✖ 1. Swansea: In "Dylan: the Nine Lives of Dylan Thomas," Fryer makes a virtue of not coming from Swansea
✖ 2. Italy: Dylan Thomas's widow Caitlin, who died last week in Italy aged 81,
3. New York: Dylan Thomas died in New York 40 years ago next Tuesday

What diseases are caused by prions?
1. Both CJD and BSE are caused by mysterious particles of infectious protein called prions
2. Scientists trying to understand the epidemic face an unusual problem: BSE, scrapie, and CJD are caused by a bizarre infectious agent, the prion, which does not follow the normal rules of microbiology.
✖ 3. These diseases are caused by a prion, an abnormal version of a naturally-occurring protein, but researchers have recognized different strains of prions that differ in incubation times, symptoms, and severity of illness. ...
Question Answering

[Figure: the literature mining pyramid, highlighting the Question Answering step: question to answer.]
Question Answering

• Stage 1: Question analysis
  - Find the type of object that answers the question: "when" needs a time, "which proteins" needs a protein
• Stage 2: Document retrieval
  - Using the (augmented) question, retrieve a set of possibly relevant documents via information retrieval
• Stage 3: Document processing
  - Search documents for entities of the desired type using information extraction
  - Search for entities in appropriate relations
• Stage 4: Rank answer candidates
• Stage 5: Present the answer (N bytes, or a phrase or a sentence or a summary)
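The five stages can be sketched end to end with toy stand-ins; every function here is a deliberately naive placeholder (not MITRE's QANDA implementation), and the documents are invented:

```python
def classify_question(question):
    """Stage 1 (toy): "when" wants a TIME; otherwise assume PROTEIN."""
    return "TIME" if question.lower().startswith("when") else "PROTEIN"

def retrieve(question, docs):
    """Stage 2 (toy): keep documents sharing any word with the question."""
    words = set(question.lower().split())
    return [d for d in docs if words & set(d.lower().split())]

def extract_entities(doc):
    """Stage 3 stub: any capitalized token counts as a PROTEIN,
    scored higher the earlier it appears."""
    return [("PROTEIN", w, 1.0 / (i + 1))
            for i, w in enumerate(doc.split()) if w[0].isupper()]

def answer(question, docs):
    wanted = classify_question(question)                      # Stage 1
    candidates = retrieve(question, docs)                     # Stage 2
    found = [e for d in candidates for e in extract_entities(d)
             if e[0] == wanted]                               # Stage 3
    found.sort(key=lambda e: -e[2])                           # Stage 4
    return found[0][1] if found else None                     # Stage 5

docs = ["PrP binds to the laminin receptor",
        "nothing relevant in this sentence"]
print(answer("which proteins bind to laminin", docs))  # PrP
```

A real system replaces each stub with the corresponding retrieval and extraction machinery, but the staged control flow is the same.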
TREC Q&A 2000 Results (250-byte)

[Figure: bar chart of scores for the participating systems: SMU, U Waterloo, LIMSI, Imperial College, U Montreal, U Sheffield, U Mass, Nat Taiwan U, D.I. (Pisa), Seoul Nat U.]

Best system (Harabagiu and Moldovan, Southern Methodist University):
• Mean Reciprocal Rank: 76%
• First Answer Correct: 69%
• Correct Answer in Top 5: 86%

Lessons: question answering works -- at least for simple factual questions
Outline
1. Overview: why natural language?
2. Where we are: technologies for mining text
☛ 3. Creating a challenge evaluation
4. Recommendations
What is a Challenge Evaluation?
• A challenge evaluation is ... an evaluation party
• The host provides
  - The challenge problem
  - The data to feed (train) the systems
  - The evaluation metric to judge the systems
• The guests bring their systems
• The guests then compete and share results and insights, refereed by the host
• The CASP* evaluations are an example of a challenge evaluation in biology: predict 3D protein structure from linear sequence data

* Critical Assessment of techniques for protein Structure Prediction
Corpus-Based Evaluation Method
1. TASK: Define a useful task where people care about the results
2. GOLD STANDARD: Have human experts create a "gold standard" or answer key for a representative data sample
3. SCORING: Devise a method to score "correctness" of a result compared to the gold standard
4. TRAINING: Use the annotated data to "train" a system to emulate human performance
5. EVALUATION: Evaluate system performance against the gold standard on unseen (blind) test data
6. ITERATION: Iterate and improve
Example of a Task*
*Table from Stephens et al., PSB 2001

Extraction of protein-protein relations from the literature via a thesaurus-based approach

Lessons:
• Pick real problems where possible
• Make sure people can do the task
• Choose intuitive evaluation metrics
• Share... data, tools, metrics
Tool from Genia* (Ohta et al, U Tokyo)
*Tools for Ontology-based Corpus Annotation, Tomoko Ohta, Yuka Tateisi, and Jun'ichi Tsujii, University of Tokyo, Tutorial at ISMB 01

[Screenshot of the annotation tool: click "Insert" to insert a new tag.]

We are starting to see shared data, shared tools. Now we need to share an evaluation.
Outline
1. Overview: why mine the literature?
2. Where we are: technologies for mining text
3. Creating a challenge evaluation
☛ 4. Recommendations
Text Mining for Bioinformatics: Recommendations

• Goal:
  - Enable rapid progress in text mining for biology
  - Transfer (or leapfrog over!) results from text processing for newswire
    = Is biology easier? A restricted domain with an ontology
    = Or is it harder? Syntax is complex, new terms are introduced constantly, there is confusion between gene vs. protein, ...
• Approach:
  - Create a challenge evaluation for text mining for biology
Steps for Creating a Challenge Evaluation
• Assess state of the art
  - Inventory current approaches & results
  - Inventory available annotated data, knowledge sources (ontologies), NLP tools, standards
• Identify participants
  - Researchers: biologists, NL researchers, bioinformatics researchers, ...
  - Identify other stakeholders: biotech industry, pharmaceuticals, standards organizations, ...
• Identify infrastructure needs
  - How much data, on what timetable?
  - How to define interesting problems with answers?
Who Do We Need?

• Users (biologists) to define a set of relevant problems with "right answers" & create data sets
• Researchers (NL researchers) with relevant technology, e.g., entity taggers, event extraction, retrieval systems
• Data providers who have relevant ontologies and data collections and standards
• Funders who will pay for
  - Preparation of data
  - Creation of evaluation tools
  - Running the evaluation
• Evaluator, who will coordinate the evaluation

Discussion of possible challenge evaluations will continue at PSB2002 in the Natural Language Session
References

Blaschke, C., Andrade, M.A., Ouzounis, C., and Valencia, A. "Automatic extraction of biological information from scientific text: Protein-protein interactions." International Conference on Intelligent Systems for Molecular Biology, Heidelberg, 1999.

Chang, J.T., Raychaudhuri, S., and Altman, R.B. "Including Biological Literature Improves Homology Search." Pacific Symposium on Biocomputing 6:374-383 (2001).

Ohta, T., Tateisi, Y., and Tsujii, J. "Tools for Ontology-based Corpus Annotation." Tutorial at ISMB 01.

Park, J.C., Kim, H.S., and Kim, J.J. "Bidirectional Incremental Parsing for Automatic Pathway Identification with Combinatory Categorial Grammar." Pacific Symposium on Biocomputing 6:396-407 (2001).

Stephens, M., Palakal, M., Mukhopadhyay, S., Raje, R., and Mostafa, J. "Detecting Gene Relations from MEDLINE Abstracts." Pacific Symposium on Biocomputing 6:483-496 (2001).
Back-Ups
Text Mining for Bioinformatics: Recommendations

(1) Use ontologies to define objects of interest
  - Entity classes (proteins, genes, ...)
  - Event types (protein-protein interaction, ...)
(2) Create training data to train systems
  - 100s of documents, 1000s of tagged entities
(3) Systems must handle language complexity
  - Journal abstracts are complex and filled with (new) terminology; this may require different syntactic & discourse processing than newswire
(4) Biologists must specify what output they want
  - E.g., extract a database of relations, find answers to questions, seed further searches, ...
1. Task Definition

• Basic principle:
  - If people cannot do a task reliably and reproducibly, a program cannot do it either!
  - Verify by having several experts perform the task (e.g., mark up data with "right answers")
  - For example, for the task of identifying proper names as Person or Organization or Location, people agree 98% of the time
2. Corpus Creation
• Define the things of interest for the task, e.g.,
  - Genes, DNA, RNA, proteins, and the relations among them
• Provide correctly annotated data
  - Tools to aid the human annotator are needed

Note: having an ontology and a list of terms is very useful at this stage
3. Define Automated Evaluation Method

• An ideal evaluation function is
  - Intuitive
  - Highly correlated with important functionality; e.g., for speech transcription, the evaluation function is word error
  - For NLP, evaluation is often "accuracy" measured as precision and recall:
    = Precision: of things classified, what percent are correct?
    = Recall: of things that should have been in a class, what percent were returned?
    = F-measure: harmonic mean of P&R
4. System Training

• Identify patterns in data, either by machine learning or by hand-crafted heuristics
• These patterns can recognize previously unseen entities from context
• From co-occurrences in text, we see that:
  - "binding to PROTEIN" occurs frequently
  - "conversion of PROTEIN" occurs frequently
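The learned context patterns above amount to something like the following regular-expression sketch; the two patterns come from the slide, while the example sentence is invented:

```python
import re

# Context patterns of the kind learned from co-occurrence counts:
# a protein name is whatever word fills the slot after the trigger.
PATTERNS = [re.compile(p) for p in
            (r"binding to (\w+)", r"conversion of (\w+)")]

def find_proteins(sentence):
    """Apply the context patterns to spot previously unseen
    protein names from their surrounding words."""
    hits = []
    for pattern in PATTERNS:
        hits += pattern.findall(sentence)
    return hits

print(find_proteins("We observed binding to PrP after conversion of Sup35."))
# ['PrP', 'Sup35']
```

Statistical taggers generalize the same idea, weighting many such contexts instead of matching a fixed list.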
5. Evaluation
• Test the system on blind (previously unseen) data
• Compare results across systems, to understand what techniques work, which ones don't

6. Iteration

• Iterate and improve
  - Note: may want to improve the task definition, the scoring function, the amount or quality of training data, the rules used by the system, ...
Evaluating Question Answering Systems
• TREC-9 Q&A Evaluation:
  - For each of 700 factual short-answer questions
  - Each system must return a ranked list of 5 candidate answers (250-byte or 50-byte) based on the standard TREC document collection
  - Each question-answer pair is judged as correct or incorrect by a person ("assessor")
  - System score is mean reciprocal rank of correct answers
• For TREC-8 and TREC-9, all questions had answers that consisted of a phrase
• Later TRECs will include questions without answers, and questions with lists as answers
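Mean reciprocal rank, the score used above, averages 1/rank of the first correct answer over all questions, counting 0 when no candidate is correct; a sketch with invented ranks:

```python
def mean_reciprocal_rank(ranks):
    """ranks: for each question, the 1-based rank of the first
    correct answer, or None if no candidate was correct."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

# Hypothetical run over four questions: correct at ranks 1, 2, 1,
# and one question missed entirely.
print(mean_reciprocal_rank([1, 2, 1, None]))  # 0.625
```

A perfect system scores 1.0; a system that always puts the right answer second scores 0.5, which is why rank 1 matters so much in this metric.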