
ResPubliQA 2010: QA on European Legislation

Anselmo Peñas, UNED, Spain
Pamela Forner, CELCT, Italy
Richard Sutcliffe, U. Limerick, Ireland
Alvaro Rodrigo, UNED, Spain

http://celct.isti.cnr.it/ResPubliQA/

ResPubliQA 2010, 22 September, Padua, Italy

Outline

  • The Multiple Language Question Answering Track at CLEF
    – A bit of history
  • ResPubliQA this year
    – What is new
  • Participation, Runs and Languages
  • Assessment and Metrics
  • Results
  • Conclusions


Multiple Language Question Answering at CLEF

  • Era I (2003-2006): ungrouped, mainly factoid questions asked against monolingual newspapers; exact answers returned
  • Era II (2007-2008): grouped questions asked against newspapers and Wikipedia; exact answers returned
  • Era III (2009-2010): ResPubliQA; ungrouped questions against multilingual parallel-aligned EU legislative documents; passages returned

Started in 2003: eighth year


ResPubliQA 2010 – Second Year

Key points:
– same set of questions in all languages
– same document collections: parallel-aligned documents

Same objectives:
– to move towards a domain of potential users
– to allow the direct comparison of performance across languages
– to allow QA technologies to be evaluated against IR approaches
– to promote the use of validation technologies

But also some novelties…


What’s new

1. New task (Answer Selection)
2. New document collection (EuroParl)
3. New question types
4. Automatic evaluation


The Tasks

Paragraph Selection (PS)
– to extract a relevant paragraph of text that completely satisfies the information need expressed by a natural language question

Answer Selection (AS) (NEW)
– to demarcate the shorter string of text corresponding to the exact answer, supported by the entire paragraph


The Collections

Subset of JRC-Acquis (10,700 docs per language)
– EU treaties, EU legislation, agreements and resolutions
– Between 1950 and 2006
– Parallel-aligned at the document level (not always at the paragraph level)
– XML-TEI.2 encoding

Small subset of EUROPARL (~150 docs per language) (NEW)
– Proceedings of the European Parliament
  • translations into Romanian from January 2009
  • Debates (CRE) from 2009 and Texts Adopted (TA) from 2007
– Parallel-aligned at the document level (not always at the paragraph level)
– XML encoding


EuroParl Collection
– is compatible with the Acquis domain
– allows the scope of the questions to be widened

Unfortunately
– small number of texts
  • documents are not fully translated

The specific fragments of JRC-Acquis and Europarl used by ResPubliQA are available at http://celct.isti.cnr.it/ResPubliQA/Downloads


Questions

Two new question categories:
– OPINION: What did the Council think about the terrorist attacks on London?
– OTHER: What is the e-Content program about?

Reason and Purpose categories merged together: Why was Perwiz Kambakhsh sentenced to death?

And also Factoid, Definition, Procedure


ResPubliQA Campaigns

Task              Registered groups   Participant groups   Submitted runs            Organizing people
ResPubliQA 2009   20                  11                   28 + 16 (baseline runs)   9
ResPubliQA 2010   24                  13                   49 (42 PS and 7 AS)       6 (+ 6 additional translators/assessors)

More participants and more submissions


ResPubliQA 2010 Participants

System name   Team                                      Reference
bpac          SZTAKI, HUNGARY                           Nemeskey
dict          Dhirubhai Ambani Institute of Information and Communication Technology, INDIA  Sabnani et al.
elix          University of Basque Country, SPAIN       Agirre et al.
icia          RACAI, ROMANIA                            Ion et al.
iles          LIMSI-CNRS, FRANCE                        Tannier et al.
ju_c          Jadavpur University, INDIA                Pakray et al.
loga          University Koblenz, GERMANY               Glöckner and Pelzer
nlel          U. Politecnica Valencia, SPAIN            Correa et al.
prib          Priberam, PORTUGAL                        -
uaic          Al.I.Cuza University of Iasi, ROMANIA     Iftene et al.
uc3m          Universidad Carlos III de Madrid, SPAIN   Vicente-Díez et al.
uiir          University of Indonesia, INDONESIA        Toba et al.
uned          UNED, SPAIN                               Rodrigo et al.

13 participants 8 countries

4 new participants


Submissions by Task and Language

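Rows: source language. Columns: target language. Each cell gives the number of runs as total (PS, AS).

Source   DE         EN         ES         FR         IT         PT         RO         Total
DE       4 (4,0)    -          -          -          -          -          -          4 (4,0)
EN       -          19 (16,3)  -          -          -          -          2 (2,0)    21 (18,3)
ES       -          -          7 (6,1)    -          -          -          -          7 (6,1)
EU       -          2 (2,0)    -          -          -          -          -          2 (2,0)
FR       -          -          -          7 (5,2)    -          -          -          7 (5,2)
IT       -          -          -          -          3 (2,1)    -          -          3 (2,1)
PT       -          -          -          -          -          1 (1,0)    -          1 (1,0)
RO       -          -          -          -          -          -          4 (4,0)    4 (4,0)
Total    4 (4,0)    21 (18,3)  7 (6,1)    7 (5,2)    3 (2,1)    1 (1,0)    6 (6,0)    49 (42,7)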


System Output

Two options:
– Give an answer (paragraph or exact answer)
– Return NoA as response = no answer is given
  • The system is not confident about the correctness of its answer

Objective:
– avoid returning an incorrect answer
– reduce only the proportion of wrong answers
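The two output options above can be sketched as a simple confidence check. This is only an illustration: the `Response` type, the `respond` function, and the 0.5 threshold are assumptions made for the example and are not taken from the track guidelines or from any participant system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    """Hypothetical container for one system response."""
    answered: bool             # False means the system returned NoA
    candidate: Optional[str]   # the answer, or the withheld candidate for NoA


def respond(candidate: Optional[str], confidence: float,
            threshold: float = 0.5) -> Response:
    """Give the candidate only when the system is confident enough.

    The threshold value is an arbitrary assumption for this sketch.
    """
    if candidate is None:
        # Nothing was found: NoA with no candidate.
        return Response(answered=False, candidate=None)
    if confidence < threshold:
        # Not confident: return NoA but keep the withheld candidate.
        return Response(answered=False, candidate=candidate)
    return Response(answered=True, candidate=candidate)
```

Raising the threshold trades answered questions for unanswered ones, which only pays off if the withheld answers are mostly wrong ones.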


Evaluation Measure

$$c@1 = \frac{1}{n}\left(n_R + n_U \cdot \frac{n_R}{n}\right)$$

n_R: number of questions correctly answered
n_U: number of questions unanswered
n: total number of questions (200 this year)

If n_U = 0 then c@1 = n_R / n (accuracy)
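The measure can be computed directly from the three counts. The sketch below is a minimal illustration; the function name `c_at_1` and the input checks are assumptions, not part of the official evaluation software.

```python
def c_at_1(n_right: int, n_unanswered: int, n_total: int) -> float:
    """c@1: unanswered questions are credited at the system's accuracy rate."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_right < 0 or n_unanswered < 0 or n_right + n_unanswered > n_total:
        raise ValueError("counts are inconsistent with n_total")
    accuracy = n_right / n_total
    return (n_right + n_unanswered * accuracy) / n_total


# With no unanswered questions, c@1 reduces to plain accuracy.
print(c_at_1(60, 0, 200))
# Counts from the iles101ASenen row of the AS results: 17 right, 9 unanswered.
print(round(c_at_1(17, 9, 200), 2))
```

A system therefore gains from returning NoA only when its withheld answers would have been wrong more often than its accuracy suggests.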


Assessment

Two steps:
1) Automatic evaluation
  o responses automatically compared against the manually produced Gold Standard
    – answers that exactly match the Gold Standard are given the correct value (R)
    – correctness of a response: exact match of Document identifier, Paragraph identifier, and the text retrieved by the system with respect to those in the Gold Standard
2) Manual assessment
  o non-matching paragraphs/answers judged by human assessors
  o anonymous and simultaneous for the same question

31% of the answers automatically marked as correct
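The automatic step amounts to a strict three-field comparison. A minimal sketch follows; the `ParagraphAnswer` type, the `auto_assess` function, the "MANUAL" label, and the identifiers in the example are hypothetical, and no text normalisation is applied because the slides describe none.

```python
from typing import NamedTuple


class ParagraphAnswer(NamedTuple):
    """Hypothetical record for a system response or a Gold Standard entry."""
    doc_id: str
    paragraph_id: str
    text: str


def auto_assess(response: ParagraphAnswer, gold: ParagraphAnswer) -> str:
    """Return 'R' on an exact match of all three fields.

    Anything else is left for the human assessors ('MANUAL').
    """
    if response == gold:
        return "R"
    return "MANUAL"


# Illustrative identifiers only, not taken from the collections.
gold = ParagraphAnswer("doc-001", "p12", "Example paragraph text.")
same = ParagraphAnswer("doc-001", "p12", "Example paragraph text.")
other = ParagraphAnswer("doc-001", "p13", "Another paragraph.")
print(auto_assess(same, gold), auto_assess(other, gold))
```

A non-matching response is not marked wrong at this stage, since a different paragraph may still answer the question.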


Assessment for Paragraph Selection (PS)

Binary assessment:
– Right (R)
– Wrong (W)

NoA answers:
– automatically filtered and marked as U (Unanswered)
– discarded candidate answers were also evaluated
  • NoA R: NoA, but the candidate answer was correct
  • NoA W: NoA, and the candidate answer was incorrect
  • NoA Empty: NoA and no candidate answer was given

Evaluators were guided by the initial “gold” paragraph
– only a hint
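The PS labels can be tallied into the counts that c@1 uses. The sketch below assumes one label string per question, spelled as in the list above; the `tally` function and its output keys are hypothetical names.

```python
from collections import Counter
from typing import Dict, Iterable

ANSWERED = {"R", "W"}
UNANSWERED = {"NoA R", "NoA W", "NoA Empty"}


def tally(labels: Iterable[str]) -> Dict[str, int]:
    """Count right, wrong and unanswered questions from per-question labels."""
    counts = Counter(labels)
    unknown = set(counts) - ANSWERED - UNANSWERED
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    # Every NoA variant counts as unanswered, even when the withheld
    # candidate was judged correct (NoA R).
    n_unanswered = sum(counts[label] for label in UNANSWERED)
    return {
        "n_R": counts["R"],
        "n_W": counts["W"],
        "n_U": n_unanswered,
        "n": sum(counts.values()),
    }
```

Keeping NoA R and NoA W as separate labels shows how often a system withheld an answer that was in fact correct.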


Assessment for Answer Selection (AS)

R (Right): the answer-string consists of an exact and correct answer, supported by the returned paragraph.

X (ineXact): the answer-string contains either part of a correct answer present in the returned paragraph, or the whole correct answer plus unnecessary additional text.

M (Missed): the answer-string does not contain a correct answer, even in part, but the returned paragraph does contain one.

W (Wrong): the answer-string does not contain a correct answer and the returned paragraph does not contain one either; or the answer-string contains an unsupported answer.
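The four AS labels can be read as a small decision table over judgments made by a human assessor. The sketch below only encodes that table; the function name, the boolean parameters, and the order of the checks are assumptions about how the definitions combine.

```python
def assess_answer_string(exact_and_correct: bool,
                         partial_or_padded: bool,
                         paragraph_has_answer: bool,
                         supported: bool = True) -> str:
    """Map assessor judgments to R, X, M or W.

    exact_and_correct: the answer-string is exactly a correct answer.
    partial_or_padded: it holds part of a correct answer, or all of it
        plus unnecessary text.
    paragraph_has_answer: the returned paragraph contains a correct answer.
    supported: the returned paragraph supports the answer-string.
    """
    if not supported:
        return "W"   # unsupported answers are wrong
    if exact_and_correct:
        return "R"
    if partial_or_padded:
        return "X"
    if paragraph_has_answer:
        return "M"   # the paragraph was right, the extraction missed it
    return "W"
```

M separates extraction failures from paragraph selection failures, which W alone would hide.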


Monolingual Results for PS

system DE EN ES FR IT PT RO

Combination          0.75 0.94 0.82 0.74 0.73 0.56 0.70
uiir101              0.73
dict102              0.68
bpac102              0.68
loga102              0.62
loga101              0.59
prib101              0.56
nlel101              0.49 0.65 0.56 0.55 0.63
bpac101              0.65
elix101              0.65
IR baseline (uned)   0.65 0.54
uned102              0.54
uc3m102              0.52
uc3m101              0.51
dict101              0.64
uiir102              0.64
uned101              0.63
elix102              0.62
nlel102              0.59 0.62 0.20 0.55 0.53
ju_c101              0.50
iles102              0.48 0.36
uaic102              0.46 0.24 0.55
uaic101              0.43 0.30 0.52
icia102              0.49


Improvement in the Performance

Monolingual PS Task:

                   BEST   AVERAGE
ResPubliQA 2009    0.68   0.39
ResPubliQA 2010    0.73   0.54

2010 Collections   BEST   AVERAGE
JRC-Acquis         0.71   0.53
EuroParl           0.77   0.55


Cross-language Results for PS

system        DE     EN     ES     FR     IT     PT     RO
elix102euen   -      0.36   -      -      -      -      -
elix101euen   -      0.33   -      -      -      -      -
icia101enro   -      -      -      -      -      -      0.29
icia102enro   -      -      -      -      -      -      0.29

In comparison to ResPubliQA 2009:
– More cross-language runs (+2)
– Improvement in the best performance: from c@1 0.18 to 0.36


Results for the AS Task

System         c@1   #R   #W   #M   #X   #NoA  #NoA R  #NoA W  #NoA M  #NoA X  #NoA empty
combination    0.30  60   140  0    0    0     0       0       0       0       0
ju_c101ASenen  0.26  31   12   10   8    115   0       40      24      0       75
iles101ASenen  0.09  17   124  6    44   9     0       0       0       0       9
iles101ASfrfr  0.08  14   128  7    36   15    0       0       0       0       15
nlel101ASenen  0.07  10   97   20   6    67    0       0       0       0       67
nlel101ASeses  0.06  12   138  21   1    28    0       0       0       0       28
nlel101ASitit  0.03  6    139  18   7    30    0       0       0       0       30
nlel101ASfrfr  0.02  4    132  13   11   40    0       0       0       0       40


Conclusions

  • Successful continuation of ResPubliQA 2009
  • AS task: few groups and poor results
  • Overall improvement of results
  • New document collection and new question types
  • c@1 evaluation metric encourages the use of a validation module


More on System Analyses and Approaches

MLQA’10 Workshop on Wednesday 14:30 – 18:00

Thank you!