Overspecified reference in hierarchical domains: measuring the benefits for readers Ivandre Paraboni * Judith Masthoff # Kees van Deemter # * = University of Sao Paulo # = University of Aberdeen

Overspecified reference in hierarchical domains: measuring the benefits for readers

Ivandre Paraboni *

Judith Masthoff #

Kees van Deemter #

* = University of Sao Paulo
# = University of Aberdeen

What this is about

• Generation of Referring Expressions (GRE)
• A referring expression is overspecified if a clear referring expression can be obtained by removing a property

• Informally: overspecified = logically redundant
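
The definition above can be sketched in a few lines of Python. This is only an illustration, not the authors' algorithm: it assumes that a description is a set of properties, that "clear" means "true of the target and of nothing else", and that the toy domain and the function names (`extension`, `is_distinguishing`, `is_overspecified`) are hypothetical.

```python
def extension(properties, domain):
    """Names of all objects that have every property in `properties`."""
    return {name for name, props in domain.items() if properties <= props}


def is_distinguishing(properties, target, domain):
    """A description is distinguishing if it fits the target and nothing else."""
    return extension(properties, domain) == {target}


def is_overspecified(properties, target, domain):
    """True if removing some single property still leaves a distinguishing description."""
    return any(
        is_distinguishing(properties - {p}, target, domain)
        for p in properties
    )


# Hypothetical toy domain (an assumption for illustration only).
domain = {
    "house_a": {"number 968", "Western Road"},
    "house_b": {"number 12", "Western Road"},
    "house_c": {"number 12", "King Street"},
}

minimal = {"number 968"}
longer = {"number 968", "Western Road"}

print(is_overspecified(minimal, "house_a", domain))  # minimal description
print(is_overspecified(longer, "house_a", domain))   # logically redundant one
```

Under these assumptions the one-property description is not overspecified, while the two-property description is, because the street name can be dropped without losing the referent.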

Introduction to the problem

Suppose
– I live on Western Road, the longest street in Aberdeen
– I live at number 968. No other house in Aberdeen has that number

“Number 968, Aberdeen” is a distinguishing description, but it’s not very useful

It’s better to add logically redundant information, e.g., “968 Western Road, Aberdeen”, or even “968 Western Road, Bon Accord, Aberdeen”

Overspecification in referring expressions

• Any GRE algorithm that does not achieve “Full Brevity” (Dale 1989)

• Investigated in its own right by e.g.
– Arts 2004 (role of location; purely empirical)
– Jordan 2000 (overspec in specific situations, e.g., when a sale is confirmed)
– Horacek 2005 (overspec when there is uncertainty about applicability of properties)

Our focus:

• The need for overspecification when a large domain is not fully known in advance to a hearer. Typical examples involve space or time:
– A house in a city, a photocopier in a building, a picture in a document
– (An event or object in time, e.g., ‘the minister of the colonies in the XYZ government’)

• This talk: empirical validation of algorithms

Caveat

• Overspecification can make it easier to identify the referent ...

• ... but it is bound to lengthen reading times

• Our terminology: we expect overspecification
– to make interpretation harder
– to make resolution easier

Short history ...

Paraboni & van Deemter (INLG-2002):

• A simple theory of the way in which hearers perform search: Ancestral Search (AS)

• Two types of situations that AS predicts to be problematic for hearers: Lack of Orientation (LO) and Dead End (DE).

• An algorithm (in two flavours) that adds redundant information when AS predicts these problems

• An experiment to test whether these algorithms improve the output of GRE

(1) Lack of Orientation (LO)

[Figure: domain hierarchy. Levels, top to bottom: University of Brighton; Watts building, Cockcroft building; North Wing, South Wing, North, West, South; library, library, auditorium]

“the West Wing”

(2) Dead End (DE)

[Figure: domain hierarchy. Levels, top to bottom: University of Brighton; Watts building, Cockcroft building; North Wing (marked “?”), South Wing, North, West, South; library, library, auditorium]

“the library in the North Wing”

Explanation (informal!)

• Why are LO and DE bad?

• Ancestral Search (AS):

“Search locally, then one level up at a time”

• Essentially, this is just salience (cf. Krahmer & Theune 2000) applied to hierarchies
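
A minimal sketch of “search locally, then one level up at a time” is given below. It is an informal reading of the slide, not the authors' implementation: the function names, the dictionary encoding of the hierarchy, and in particular which wing belongs to which building and where the libraries sit are all assumptions made for illustration.

```python
def subtree(node, children):
    """Yield every descendant of `node`, depth first (the node itself is excluded)."""
    for child in children.get(node, []):
        yield child
        yield from subtree(child, children)


def ancestral_search(label, start, parent, children, labels):
    """Look for `label` below `start`; if nothing matches, widen the scope by one level."""
    scope = start
    while scope is not None:
        matches = [n for n in subtree(scope, children) if labels[n] == label]
        if matches:
            return scope, matches
        scope = parent.get(scope)  # None once we move above the root
    return None, []


# Hypothetical hierarchy; the parent/child assignments are assumptions.
children = {
    "university": ["watts", "cockcroft"],
    "watts": ["watts_north", "watts_south"],
    "cockcroft": ["cockcroft_north", "cockcroft_west", "cockcroft_south"],
    "watts_north": ["watts_library"],
    "cockcroft_north": ["cockcroft_library"],
}
labels = {
    "university": "University of Brighton",
    "watts": "Watts building",
    "cockcroft": "Cockcroft building",
    "watts_north": "North Wing",
    "watts_south": "South Wing",
    "cockcroft_north": "North Wing",
    "cockcroft_west": "West Wing",
    "cockcroft_south": "South Wing",
    "watts_library": "library",
    "cockcroft_library": "library",
}
parent = {child: p for p, kids in children.items() for child in kids}

# A reader located in the South Wing of the Watts building:
print(ancestral_search("library", "watts_south", parent, children, labels))
print(ancestral_search("West Wing", "watts_south", parent, children, labels))
```

In this toy setup the search for “library” stops at the reader's own building, which is the salience effect mentioned above, whereas the search for “West Wing” only succeeds after the scope has been widened to the whole university.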

Summary of Experiment 1: Descriptions compared by subjects

• 15 subjects were shown documents from which most of the words were deleted

• Binary forced choice between two expressions that refer to document parts:

1. the obvious minimal description

2. the redundant description generated by our algorithm

What the subjects chose between (example)

Hypotheses & Outcomes

• Hyp 1: In problematic situations, redundant descriptions are preferred

• Hyp 2: In non-problematic situations, non-redundant descriptions are preferred

• Outcomes:
– Hyp 1: overwhelmingly confirmed
– Hyp 2: trend in the right direction (57%), but not statistically significant. (Too few subjects?)

Limitations of first experiment

• This experiment was hybrid: partly about reading, partly about writing

• It did not teach us why redundant descriptions were preferred (in problematic cases)

• We think this was because non-redundant descriptions caused problems for resolution ...

• ... but the experiment did not address resolution separately. (Subjects may have balanced interpretation and resolution when judging.)

What next?

• Therefore, a new experiment was called for, which addresses resolution only.

• Documents as our domain again
• Add hyperlinks to support non-linear search through the document
• Track readers’ resolution (i.e., search) process
• Intricate experiment, hence a new author: Judith Masthoff (University of Aberdeen)

Experiment 2: Tracking resolution

• Effect of logical redundancy on the performance of readers

• Focussing on resolution

Experimental Design

• 40 subjects completed experiment
• Within-subjects design: each subject shown 20 documents
• Order of documents randomized
• Documents were made to look different
• Reader had knowledge of hierarchical structure
• Reader was given task: “Please click on..”
• Navigation actions recorded

“Let’s talk about helicopters. Please click on picture 4 in part C”

Reader Location

Hypothesis 1

• In a problematic (DE/LO) situation, the number of navigation actions required for a long (FI/SL) description is smaller than that required for a minimal description.

• Informally: redundancy helps resolution! (in problematic situations)

But ...

• it seems likely that redundant information will always help resolution

• so let’s compare the “Gain” in problematic/unproblematic situations

Hypothesis 2

• The Gain achieved by a long description over a minimal description will be larger in a problematic situation than in a non-problematic situation

• Informally: redundancy helps especially in problematic situations

But ...

• Even more redundancy might have helped even more

• The obvious candidate: a complete description
• Compare cases where our algorithm prescribes a complete description with ones where it does not.

• We want b to be greater than a:

a = Gain(complete-description, incomplete-description-generated-by-algorithm)

b = Gain(complete-description-generated-by-algorithm, incomplete-description)

Hypothesis 3

• The Gain of a complete description over a less complete one will be larger for a situation in which our algorithms generated the complete description, than for a situation in which our algorithms generated the less complete description.

Results: Hypothesis 1

[Bar chart: # Clicks (axis 0 to 9) by Situation (1 DE, 2 DE, 3 LO, 4 LO, 5 LO, 6 LO); series: MD, Long (SL/FI)]

Do redundant descriptions benefit problematic situations?

Yes!

Results: Hypothesis 2

[Bar chart: # Clicks (axis 0 to 5) by Situation (1 DE, 7 NONE, 2 DE, 8 NONE); series: MD, Long]

Do redundant descriptions benefit problematic situations MORE than non-problematic situations?

Comparing like with like

• General Linear Model (GLM) with repeated measures

• Comparison of similar situations, e.g. 2 and 7

sit2&7:
minimal = “pic.3 in part A”
redundant = “pic.3 in part A of section 2”

sit2: reader is in same section as target
sit7: reader is in a different section

Results: Hypothesis 2

[Bar chart: # Clicks (axis 0 to 5) by Situation (1 DE, 7 NONE, 2 DE, 8 NONE); series: MD, Long]

Do redundant descriptions benefit problematic situations MORE than non-problematic situations?

Yes!

Results: Hypothesis 3

[Bar chart: # Clicks (axis 0 to 6) by Situation (3 LO, 5 LO, 4 LO, 6 LO; each labelled FI); series: Not complete, Complete]

Are our algorithms economical with redundancy?

Yes!

How much overspecification is optimal?

[Figure: domain hierarchy. Levels, top to bottom: University of Brighton; Watts building, Cockcroft building; North Wing, South, North, West, South; library, library, auditorium]

“The auditorium”

“The ...in the North Wing”

“The .... in the Watts building”

“The .... on this campus”

• Which of all these descriptions is best?
• Depends on issues other than the structure of the domain, e.g.,
– how much time/space has the speaker/writer available?
– how important is it that misunderstandings are avoided? [cf., Van Deemter et al., this conference]
– is there room for negotiation through dialogue? [cf., Khan et al., this conference]

In the setting of this experiment

• We did not find a point beyond which overspecification backfires

• We did find a point of “diminishing returns” for resolution speed

• Given that interpretation deteriorates with every added property, the figures are suggestive

Getting a feeling for the numbers

• Nonproblematic situations (situations 7 and 8):
– short descr: 1.53 clicks (2 properties)
– redundant (other): 1.34 clicks (3 properties)

• Problematic situations (situations 3 and 4):
– short descr: 4.05 clicks (1 property)
– redundant (algorithm): 1.77 clicks (2 properties)
– redundant (other): 1.31 clicks (3 properties)
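
The arithmetic behind these figures can be worked through as follows. The slides do not spell out how Gain is computed; the sketch assumes it is simply the reduction in mean clicks when one description replaces another, and the dictionary layout and the `gain` helper are hypothetical names introduced here. Only the click counts come from the slide.

```python
# Mean clicks as reported on the slide.
clicks = {
    "nonproblematic": {"short": 1.53, "redundant_other": 1.34},
    "problematic": {
        "short": 4.05,
        "redundant_algorithm": 1.77,
        "redundant_other": 1.31,
    },
}


def gain(clicks_before, clicks_after):
    """Assumed definition: clicks saved by the longer description, rounded to 2 places."""
    return round(clicks_before - clicks_after, 2)


nonprob = clicks["nonproblematic"]
prob = clicks["problematic"]

# One extra property in a non-problematic situation.
gain_nonproblematic = gain(nonprob["short"], nonprob["redundant_other"])

# The property added by the algorithm in a problematic situation.
gain_algorithm = gain(prob["short"], prob["redundant_algorithm"])

# A further property on top of what the algorithm generated.
gain_beyond_algorithm = gain(prob["redundant_algorithm"], prob["redundant_other"])

print(gain_nonproblematic, gain_algorithm, gain_beyond_algorithm)
```

Under this assumed definition, the first added property saves about 2.28 clicks in the problematic situations, against about 0.19 in the non-problematic ones, and a further property beyond the algorithm's output saves about 0.46 more.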

Conclusion

• Overspec can have many reasons (Jordan 2000, Horacek 2005)

• Overspec isn’t always equally necessary
• Focus on overspec for guiding “resolution”
• The optimum amount of overspec is hard to determine
• But we have found a point of diminishing returns, based on the need to avoid DE and LO.

Additional slides

[ A medical comparison

• A hospital with two types of patients, all of whom have coughing (cf., clicking!) as their main symptom
– chest infections (serious patients)
– throat infections (light patients)

• You can administer 1, 2, or 3 pills (cf., properties). But pills can be harmful, so the doctor uses them sparingly

The doctor’s regime:

• light patients should get 1 pill

• serious patients should get 2 pills on a normal night, and 3 pills on a bad night

Is this a wise regime?

Tests were done ...

Test of effectiveness of pills

1. Serious patients who get their 2 or 3 pills start coughing less

2. Serious patients benefit more from getting their prescribed high number of pills (as opposed to just 1) than light patients

3. Focus on serious patients. Try giving the ones that are having a good night 3 pills (i.e. one more than prescribed). They benefit less (from getting 3 instead of 2 pills) than the ones that are having a bad night benefitted (from getting 3 instead of 2 pills).

]

Results on Search Behaviour

[Bar chart: # subjects (axis 0 to 10) by number of deviations (0 to 12)]

# Deviations from Ancestral Search in first navigation action for 12 documents with incomplete descriptions