A natural-language-based approach to modeling structured documents
Yves Marcoux, GRDS – EBSI, Université de Montréal
OLST-RALI, 21 March 2007



A natural-language approach to modeling: Why is some XML so difficult to write?

<http://www.idealliance.org/papers/extreme/proceedings/html/2006/Marcoux01/EML2006Marcoux01.html>


Structure of the talk

1. The problem

2. Proposed direction for solution

3. Conclusion

4. Question period


Writing well-formed XML: author’s choices

• <sex><male /></sex>
• <is-female>FALSE</is-female>
• <gender gender="&#x2642;" />
• <note>It's a boy!</note>

&#x2642; = ♂


Writing valid XML is collaborative work

• Modeler has chosen the markup (container)

• Author supplies the contents

• Much like a form

• Collaborative work requires communication between the parties: modeler and author

• But the modeler is gone…


Problem

• Authoring environments are:
  – good at conveying the syntactic intentions (or decisions) of the modeler
  – not as good at conveying the semantic intentions of the modeler

• Often, all there is is a generic identifier (genID) or some slightly more developed form
  – Ex.: “date” in a memo


What is available?

• More or less developed forms of genIDs (and attribute names)

• General documentation of the model

• Per element (attribute) documentation

• OK for tooltips or popups

• Could we do better?

• (Applications / stylesheets are not appropriate)


Could we aim at…

• Having a semantic conversation right in the editing window?

• In the same way that there is actually a syntactic conversation?

• Yes…


Structure of the talk

1. The problem

2. Proposed direction for solution

3. Conclusion

4. Question period


Key idea

• Have the modeler prepare bits of natural-language (NL) prose

• That can be intertwined with author-supplied contents to give them meaning

• Allows “fill-in”-like sentences

• And thus, a semantic conversation in the editing window

• NB: modeler segments can contain hyperlinks


Example

Facts about some US cities

City         Population  Annual snowfall (inches)
Denver       850,000     23
Rochester    240,000     88
Palm Spring  48,000      0


Raw XML

<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
  ...
</facts-about-US-cities>


Prose equivalent

Here are facts about some US cities. The city of Denver has a population of 850,000 and an annual snowfall of 23 inches. The city of Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city of Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.


Modeler prepares “peritext” segments

Element                    text-before                             text-after
facts-about-US-cities      "Here are facts about some US cities."  empty
city                       " The city "                            "."
name                       "named "                                empty
population                 " has a population of "                 empty
annual-snowfall-in-inches  " and an annual snowfall of "           " inches"


Possible “semantic” view

Here are facts about some US cities. The city named Denver has a population of 850,000 and an annual snowfall of 23 inches. The city named Rochester has a population of 240,000 and an annual snowfall of 88 inches. The city named Palm Spring has a population of 48,000 and an annual snowfall of 0 inches.
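The semantic view above can be derived mechanically from the raw XML and the peritext table. The following is a minimal sketch of one way to do it, not the implementation behind the talk: the function name `semantic_view` and the `PERITEXTS` dictionary are illustrative, and the sketch assumes that indentation whitespace in the XML is not significant.

```python
import xml.etree.ElementTree as ET

# Peritext table from the talk: element name -> (text-before, text-after).
# "empty" in the table is represented by the empty string.
PERITEXTS = {
    "facts-about-US-cities": ("Here are facts about some US cities.", ""),
    "city": (" The city ", "."),
    "name": ("named ", ""),
    "population": (" has a population of ", ""),
    "annual-snowfall-in-inches": (" and an annual snowfall of ", " inches"),
}

RAW_XML = """
<facts-about-US-cities>
  <city>
    <name>Denver</name>
    <population>850,000</population>
    <annual-snowfall-in-inches>23</annual-snowfall-in-inches>
  </city>
  <city>
    <name>Rochester</name>
    <population>240,000</population>
    <annual-snowfall-in-inches>88</annual-snowfall-in-inches>
  </city>
</facts-about-US-cities>
"""


def semantic_view(element):
    """Return the element's contents wrapped in its text-before / text-after."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    # Assumption: leading/trailing whitespace is only pretty-printing.
    parts = [before, (element.text or "").strip()]
    for child in element:
        parts.append(semantic_view(child))
        parts.append((child.tail or "").strip())
    parts.append(after)
    return "".join(parts)


if __name__ == "__main__":
    print(semantic_view(ET.fromstring(RAW_XML.strip())))
```

Elements missing from the table simply contribute their contents with no peritext, which keeps the sketch usable on partially documented models.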


What it allows during editing (in semantic view)

• Peritexts convey the semantic intentions of the modeler

• A semantic conversation takes place in the editing window (instead of a syntactic one)

• Fill-in sentences:
  – Make “tag abuse” embarrassing…
  – Likely to reduce some kinds of errors

• Other views / fragment viewing / hyperlink


Discussion

• This is not like defining an application
  – Not a stylesheet mechanism

• Peritexts (fixed here) could be allowed to vary with some parameters:
  – position among siblings
  – attribute value
  – etc.

• (Attributes should be treated)
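As an illustration of peritexts that vary with position among siblings, here is a small sketch. The `authors` / `author` element names, the lead text "Written by ", and both function names are hypothetical and do not come from the talk; the point is only that text-before becomes a function of the element's position rather than a fixed string.

```python
import xml.etree.ElementTree as ET


def author_text_before(position, count):
    """Hypothetical position-dependent text-before for an <author> element."""
    if position == 0:
        return ""        # first sibling: nothing before it
    if position == count - 1:
        return " and "   # last sibling
    return ", "          # any sibling in between


def render_authors(authors):
    """Interleave <author> contents with their position-dependent peritexts."""
    children = list(authors)
    parts = ["Written by "]  # hypothetical text-before of <authors>
    for position, child in enumerate(children):
        parts.append(author_text_before(position, len(children)))
        parts.append((child.text or "").strip())
    parts.append(".")        # hypothetical text-after of <authors>
    return "".join(parts)


if __name__ == "__main__":
    doc = ET.fromstring(
        "<authors><author>Ann</author><author>Bob</author>"
        "<author>Cy</author></authors>"
    )
    print(render_authors(doc))
```

A peritext that depends on an attribute value could follow the same pattern, with the attribute passed in alongside the position.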


Why does it work?

• Sometimes tricky (see paper), but…

• NL has very high affordance

• NL can act as its own metalanguage

• XML contents + NL usually mix pretty well


Intertextual semantics

• Meaning of a text fragment is given by placing it in a network of other texts

• That network can simply consist of a sentence (or “quasi-sentence”)

• Or a more elaborate topology: peritexts can contain hyperlinks, determining sense-making / learning paths
  – Too much hyperlinking can spoil the idea!


Interpretation workflow

• d is a document or fragment, H is a human
• S(d) is the intertextual semantics of d
• S(d) is in NL
• S is machine computable
• Actual meaning of d for H may vary:
  – with H
  – for the same H, from one “reading” of S(d) to another

d --S--> S(d) --H--> actual “meaning” of d for H


Interpretation workflow

[Diagram: d, S(d), and readers H1, H2, H3]


Suggests a modeling process

• Modeler starts with the prose

• Identify peritexts

• Work out more and more abbreviated forms
  – Will correspond to different “views” in the editor

• Tersest level gives markup

• Increase model usability?


Mixed content question revisited

• Known: can get rid of mixed content with
  <!ELEMENT text (#PCDATA)>

Example:
  <!ELEMENT e (#PCDATA | e1 | e2 | …)*>

becomes:
  <!ELEMENT e (e1 | e2 | … | text)*>

• Why does it feel bad?
  – “text” tags are not abbreviations of any reasonable peritexts!
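To make the transformation concrete, here is a minimal sketch that rewrites one mixed-content element so that every run of character data becomes a `text` child. The function name `wrap_pcdata` and the sample element `p` are illustrative assumptions; only the `text` element name comes from the slide.

```python
import xml.etree.ElementTree as ET


def wrap_pcdata(parent):
    """Wrap each character-data run directly inside `parent` in a <text> child."""
    new_children = []
    if parent.text:
        wrapper = ET.Element("text")
        wrapper.text = parent.text
        new_children.append(wrapper)
    for child in list(parent):
        new_children.append(child)
        if child.tail:
            # Character data following a child lives in its .tail.
            wrapper = ET.Element("text")
            wrapper.text = child.tail
            new_children.append(wrapper)
            child.tail = None
    # Rebuild the child list with element-only content.
    parent.text = None
    for child in list(parent):
        parent.remove(child)
    parent.extend(new_children)
    return parent


if __name__ == "__main__":
    p = ET.fromstring("<p>Hello <e1>big</e1> world</p>")
    print(ET.tostring(wrap_pcdata(p), encoding="unicode"))
```

The result is element-only content, but the `text` wrappers carry no meaning of their own, which is the discomfort the slide points at.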


Is NL too much to ask for?

• Relative to some “target” community

• Can go a long way (previous slide)

• Hyperlinks are allowed in peritexts
  – Allows defining “sense-making” or learning paths

• (Almost) anything formal can be turned into NL…


NL as formalism common denominator

[Diagram: expression in artificial formalism + textbook explaining the formalism, joined by a STAPLER, gives an equivalent expression in NL]


Editing setup without intertextual semantics

[Diagram. Labels: Modeler; Author; World; XML DTD; Doc. / tr. material; XML EDITOR; Valid XML instance or fragment; NL and presupposed knowledge of target community]


Editing setup with intertextual semantics

[Diagram. Labels: Modeler; Author; World; XML DTD; text-before and text-after segments; XML EDITOR; Valid XML instance or fragment; NL equivalent; NL and presupposed knowledge of target community]


Structure of the talk

1. The problem

2. Proposed direction for solution

3. Conclusion

4. Question period


What it suggests

• Bring some of the discipline of producing “good documents” (manuals of style) into model & interface design
  – E.g., don’t abuse hyperlinking

• Literate modeling, literate interfaces
  – Literate interface / interaction design

• Benefit: make explicit prerequisite knowledge & sense-making / learning paths


Other possible uses of intertextual semantics

• Legal documents with multiple renditions
• NLP systems that cannot treat markup
  – Including full-text indexing
    • <ex>Hamlet</ex>
    • “Exit Hamlet”
• Other data models
  – Ex.: relational
    • Normal forms
  – A new look at expressivity
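For the full-text indexing case, a sketch of the idea: expand the markup into its NL equivalent before tokenizing, so that an indexer that cannot treat markup still sees “Exit Hamlet” rather than just “Hamlet”. The text-before "Exit " for `ex` is inferred from the pairing on the slide; the function names and the simple word tokenizer are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed peritext for <ex>, inferred from the slide's "Exit Hamlet" pairing.
PERITEXTS = {"ex": ("Exit ", "")}


def expand(element):
    """Return the NL equivalent of an element: contents wrapped in peritexts."""
    before, after = PERITEXTS.get(element.tag, ("", ""))
    inner = element.text or ""
    for child in element:
        inner += expand(child) + (child.tail or "")
    return before + inner + after


def index_terms(xml_fragment):
    """Tokenize the NL equivalent of a fragment into lowercase index terms."""
    text = expand(ET.fromstring(xml_fragment))
    return re.findall(r"\w+", text.lower())


if __name__ == "__main__":
    print(expand(ET.fromstring("<ex>Hamlet</ex>")))
    print(index_terms("<ex>Hamlet</ex>"))
```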


Future work

• Editing:
  – Work out a few existing / new models
  – Properly integrate attributes
  – More powerful peritext computation
  – Implement ideas in a real editor
    • Display peritexts when choosing insertion
    • Hyperlinks in displayed peritexts
  – Experiment with real authors


Future work

• More than peritexts?

• More than NL (icons, sound, …)?

• Compare with other semantic frameworks
  – Downstream semantics: Wrightson, Renear et al.

• Other models

• Tackle literate modeling / interface design


Thank you!

Questions?