NLG STEC Workshop April 20-21, 2007 Arlington, VA Nancy Green Univ. of North Carolina Greensboro, USA


NLG Pipeline Model & STEC

STEC

Pro-STEC Assumptions:

• (All/most/worth-funding) NLG can be decomposed into well-defined independent STEC-modules such that improving each one will advance NLG

• Input/output representation for STEC is non-controversial

NLG ‘Pipeline’ = Tip of Iceberg

• Discourse KR&R

• Domain Communication KR&R

• User Model KR&R

• Media/Presentation-related KR&R

Who will pay for NLG research outside the classical pipeline? This is essential empirical research and a major cost, but I am afraid it would fall outside the STEC funding model.

Example NLG System KR&R

GenIE: generates letters to genetics clinic patients; the goal is to justify medical experts’ conclusions such that all arguments are comprehensible to a lay person

• Discourse: argumentation

• Domain Communication: conceptual causal model underlying expert-lay communication (not domain model)

• User Model: model of appraisal

• Media/Presentation: how presentation affects argument comprehension

Lesson from GenIE

• NLG Pipeline = global control + sentence planning/realization

• can use existing surface realizers, standard domain ontology, and lexical resources

• Main cost has been KR&R modules; mainly empirical work:

  • Goal: find non-domain-specific principles/guidelines to optimize lay audience’s comprehension of arguments

  • Corpus studies: very useful but not sufficient

  • Controlled studies: necessary, and cannot afford to wait for other disciplines (HCI, learning sciences, etc.) to do them for us

GenIE Corpus Studies

• Intercoder reliability of content annotation scheme: used to justify domain communication model

• Argumentation schemes (non-domain-specific, both normative and affective)

• Stylistic (lexical/syntactic) features of author perspective

• Argument presentation features (order, cue words, explicitness)
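Intercoder reliability, the first item above, is commonly reported with a chance-corrected agreement statistic. The slides do not say which statistic the GenIE studies used, so the following is only a minimal sketch of one common choice, Cohen's kappa for two coders. The function name `cohen_kappa` and the label sequences are hypothetical and are not taken from the GenIE corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labeled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each coder's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both coders used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical annotations of six text segments by two coders.
coder_1 = ["claim", "data", "data", "claim", "data", "claim"]
coder_2 = ["claim", "data", "claim", "claim", "data", "data"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))
```

In this made-up example the coders agree on four of six segments, but half of that agreement is expected by chance, which is why a chance-corrected measure is preferred over raw percent agreement when justifying an annotation scheme.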

GenIE Controlled Studies

• How multimedia layout, cross-media cue words affect comprehension

• How argument presentation (explicit vs. implied claim, cue words) affects recognition of argument components (Claim vs. Data) & dependence of final claim on intermediate claims

NLG Pipeline Model & STEC

STEC

Pro-STEC Assumptions:

• (All/most/worth-funding) NLG can be decomposed into well-defined independent STEC-modules such that improving each one will advance NLG

• Input/output representation for STEC is non-controversial

STEC Input/Output Problem

Different input representations needed for different types of output; e.g. compare requirements for:

• Fixed-format text (original scope of NLG)

• Task-appropriate, user-friendly text format (e.g. line length, paragraphing, headings, font)

• Text and (reported or quoted) dialogue in story

• Dialogue spoken by animated emoting conversational agent

• Integrated text and images or data graphics

• Text referring to physical or visual properties of presentation (‘The red line in Fig. 2 shows sales in 2002.’)

Big Challenges

Empirical research to test computation-oriented, general theories, principles, guidelines to answer:

• What makes a “text” (i.e. including spoken dialogue, MMPs, etc.)

  • Coherent? In story dialogue, believable?

  • User-friendly? Task-appropriate?

  • Comprehensible? Pedagogically effective?

  • Entertaining (suspenseful, funny, etc.)?

Ex. Challenges (cont.)

• How does channel change answer?

  • E.g. HCI research: cannot assume findings for paper apply to computer screen

• How does length change answer?

  • E.g. learning sciences: 300-word summary vs. 3-page science argument for middle school

• How do individual differences matter?

  • E.g. cognitive impairments, affect

Conclusions

• Need some NLG research with massively interdisciplinary view: cognitive science, communication studies, etc.

• Need some NLG research motivated by search for answers to general questions such as above

• Will STEC approach effectively kill the above kind of NLG research?
