Outline
• Natural Language Processing
• Augmentative and Alternative Communication
• My work: generation of messages
– What does the process look like?
– What is needed?
• NLP in AAC
– Word prediction
– Message generation
– IR methods
Natural Language Processing
Applications and models that use linguistic knowledge, or that provide it (POS taggers, parsers, etc.)
• Language applications:
– Machine translation
– Text summarization
– Information retrieval/extraction
– Human-computer interfaces
Augmentative and Alternative Communication
• AAC users
– Congenital diseases, e.g. cerebral palsy
– Progressive diseases, e.g. ALS (amyotrophic lateral sclerosis, Lou Gehrig's disease)
– Trauma, e.g. head injury
• Cognitive vs. physical disabilities (each requires different methods and assumptions).
• Slow rate of conversation
– Speech: 150-200 wpm; skilled typist: 60 wpm
– Speech prosthesis users: 10-15 wpm
• Each keystroke may consume a lot of energy.
• Trade-off between conversation rate and cohesion of utterances.
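The rate figures above imply a large gap. As a worked check that uses only the slide's numbers, the sketch below computes how much slower a speech-prosthesis user is than a speaker and how long one utterance takes at each rate. The 20-word utterance length is an arbitrary assumption chosen for illustration.

```python
# Rates in words per minute, as given on the slide.
SPEECH_WPM = (150, 200)
TYPIST_WPM = 60
PROSTHESIS_WPM = (10, 15)

def seconds_for(words: int, wpm: float) -> float:
    """Seconds needed to produce `words` words at `wpm` words per minute."""
    return words * 60 / wpm

# Slowdown relative to speech: best case (slowest speaker vs. fastest
# prosthesis user) and worst case (fastest speaker vs. slowest prosthesis user).
min_ratio = SPEECH_WPM[0] / PROSTHESIS_WPM[1]  # 150 / 15
max_ratio = SPEECH_WPM[1] / PROSTHESIS_WPM[0]  # 200 / 10

n = 20  # hypothetical utterance length in words
print(f"slowdown vs. speech: {min_ratio:.0f}x to {max_ratio:.0f}x")  # 10x to 20x
print(f"speech:     {seconds_for(n, SPEECH_WPM[1]):.0f}-{seconds_for(n, SPEECH_WPM[0]):.0f} s")          # 6-8 s
print(f"typist:     {seconds_for(n, TYPIST_WPM):.0f} s")                                                  # 20 s
print(f"prosthesis: {seconds_for(n, PROSTHESIS_WPM[1]):.0f}-{seconds_for(n, PROSTHESIS_WPM[0]):.0f} s")  # 80-120 s
```

A sentence a speaker says in 6 to 8 seconds therefore takes a prosthesis user one and a half to two minutes. This is the gap that keystroke-saving techniques try to narrow.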
AAC Techniques
• Simple pointing on boards and letter charts
• Portable keyboard devices
• Computer-based systems using single-switch access for severely impaired subjects
• Symbols or letters
– Various symbol systems (Blissymbolics) and sets (PCS)
• Pre-stored phrases accessible via grid or iconic buttons.
AAC and NLP
• Common issues:
– Text generation
– Speech recognition
– Text-to-speech synthesis
– Information retrieval
• Three workshops and a special issue of the journal Natural Language Engineering.
Our framework
• Natural language generation
– Content planning
– Surface realization: lexical choice, syntactic realization, morphological processing
• FUF/SURGE, HUGG
• Lexicon (Jing et al.)
Examples
• ME / TO SEE / CAT / TO EAT
– I saw the cat eating.
• CAT / TO EAT / TO SEE / ME
– The cat ate and I saw it.
– The cat that ate saw me.
Blissymbolics
• Invented by Charles Bliss as a written universal language (Semantography, 1949; enlarged edition 1965).
• Adapted by Canadian speech therapists in the early 1970s: a successful alternative to verbal communication.
• Consists of approx. 100 basic symbols.
• The language now has more than 2,000 complex ‘words’.
• Not so easy to learn, but...
– Enables novel, personal expressions
– Good basis for literacy
– Adults like to use it too
Syntax: Ambiguity
• ME / SEE / CAT / TO EAT
– I saw the cat eating.
• CAT / TO EAT / SEE / ME
– The cat ate and I saw it.
– The cat that ate saw me.
• However, AAC users do not usually follow the word order of spoken language:
– go+girl+house or girl+house+go or house+go+girl
– two+bed+sleep+boy+one+girl+white+bed+brown+bed (the boy and the girl are sleeping in two beds, one in a white bed and the other in a brown bed).
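Symbol order is unreliable, so one option is to assign roles from each symbol's semantic category rather than from its position. The sketch below shows this under simple assumptions. The mini-lexicon and the category-to-role table are hypothetical, and with them every permutation of go+girl+house yields the same frame.

```python
# Hypothetical mini-lexicon: symbol -> semantic category.
LEXICON = {"go": "action", "girl": "person", "house": "place"}

# Hypothetical mapping from semantic category to the role it fills.
ROLE_OF = {"action": "predicate", "person": "agent", "place": "goal"}

def to_frame(symbols: list[str]) -> dict[str, str]:
    """Build a role frame from symbols, ignoring the order they were chosen in."""
    frame = {}
    for sym in symbols:
        frame[ROLE_OF[LEXICON[sym]]] = sym
    return frame

orders = [["go", "girl", "house"], ["girl", "house", "go"], ["house", "go", "girl"]]
frames = [to_frame(o) for o in orders]
assert all(f == frames[0] for f in frames)
print(frames[0])  # {'predicate': 'go', 'agent': 'girl', 'goal': 'house'}
```

The approach fails once two symbols share a category, as with the boy and the girl in the second example. At that point world knowledge and context have to resolve the roles.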
Pragmatics
• Where the situation is taking place
• Who the hearer is
– Good morning vs. Hi
– Open the window vs. Can you please open the window?
• Gestures (facial, body)
Contextual Resources
• The context of what is said: what was already said before, referential expressions.
• In a restaurant you can talk about “the menu”
• In front of a computer, “the menu” is a set of commands.
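In the simplest case, the "menu" example can be modeled as sense lookup keyed on the current situation. The sketch below is minimal and uses a hypothetical two-entry sense inventory. A real system would need a much richer model of the situation than a single label.

```python
# Hypothetical sense inventory: (word, situation) -> intended sense.
SENSES = {
    ("menu", "restaurant"): "list of dishes",
    ("menu", "computer"): "set of commands",
}

def resolve(word: str, situation: str) -> str:
    """Pick the sense of `word` licensed by the current situation, if one is known."""
    return SENSES.get((word, situation), f"<ambiguous: {word}>")

print(resolve("menu", "restaurant"))  # list of dishes
print(resolve("menu", "computer"))    # set of commands
print(resolve("menu", "office"))      # <ambiguous: menu>
```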
Textual Context: Previous Utterances
someone
a person
a woman
a mother
a female parent
Lucy
she
etc…
I met Lucy. She looks great.
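The list above runs from the most general to the most specific ways of referring to the same person. The sketch below implements one common heuristic, which is an assumption here and not necessarily the method used in this work. It uses the proper name on first mention and a pronoun once the referent is already in the discourse context, which reproduces the Lucy example.

```python
from dataclasses import dataclass

@dataclass
class Entity:
    name: str     # proper name, e.g. "Lucy"
    pronoun: str  # subject pronoun, e.g. "she"

def refer(entity: Entity, mentioned: set, sentence_initial: bool = False) -> str:
    """Name on first mention; pronoun once the entity is already in `mentioned`."""
    if entity.name in mentioned:
        form = entity.pronoun
    else:
        form = entity.name
        mentioned.add(entity.name)
    return form[0].upper() + form[1:] if sentence_initial else form

lucy = Entity("Lucy", "she")
context: set = set()
s1 = f"I met {refer(lucy, context)}."
s2 = f"{refer(lucy, context, sentence_initial=True)} looks great."
print(s1, s2)  # I met Lucy. She looks great.
```

The heuristic breaks down when two previously mentioned referents share a pronoun. In that case a more specific description from the list is needed.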
Generating from Symbols: Issues
• Syntactic ambiguity
• Contextual ambiguity
• No strict rules for the use of symbols
– Individual codes, conventions, abbreviations
• Textual: how one word affects the choice of another, word ordering, fluency
• Practical: enhancing the communication rate without limiting expressive ability
– Efficient keyboard setup, word prediction, structure prediction
An overview of the architecture
Content planning → Lexicalization → Syntactic realization → Vocalization
• Content planning and lexicalization
– Task: parsing
– Requirements: world knowledge, lexical information
– Output: well-formed input for the syntactic realizer
– Tools: Vaillant’s conceptual graph parsing; lexicon for verbs (Jing et al.); Bliss lexicon
• Syntactic realization
– Task: generating well-formed sentences
– Tools: SURGE/HUGG
Lexicon
• Mapping concepts (symbols) to words
• Compositional vs. non-compositional
• Organization of symbols for efficient retrieval (POS, semantic connections)
• Available lexical knowledge: syntactic structure, irregularities, etc.
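One way to picture the compositional vs. non-compositional distinction is as follows. A compound symbol may have its own lexicon entry, because its meaning is not the sum of its parts. Otherwise, its wording has to be built from its components. The sketch below uses a hypothetical toy lexicon: the entries and the joining rule are illustrative placeholders, not real Bliss definitions.

```python
# Hypothetical lexicon entries, for illustration only (not real Bliss definitions).
ATOMIC = {"house": "house", "water": "water", "feeling": "feeling"}
COMPOUND = {("house", "water"): "bathroom"}  # non-compositional: stored as a unit

def lexicalize(symbol: tuple) -> str:
    """Map a symbol, given as a tuple of component names, to an English word or phrase."""
    if symbol in COMPOUND:  # non-compositional: use the stored word
        return COMPOUND[symbol]
    # Compositional fallback (placeholder rule): join the component words in order.
    return " ".join(ATOMIC[part] for part in symbol)

print(lexicalize(("house", "water")))  # bathroom
print(lexicalize(("water", "house")))  # water house
print(lexicalize(("feeling",)))        # feeling
```

Checking the compound table first lets stored idiomatic meanings override composition. An unknown component raises `KeyError`, which marks where lexicon coverage ends.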
Methodology
• Test the interaction of different aspects
– Word/symbol/structure prediction
• With more specific questions:
– Concepts to words
– Referential expression generation
– Pragmatic considerations
Word -> Symbols
Symbols -> Symbol | Symbol near Symbols
Symbol -> FeaturedComponent | Symbol++FeaturedComponent
FeaturedComponent -> (Atomic)Position+Size+Direction
Atomic -> Pictographs | Arbitrary
Pictographs -> protection, house, circle, plus, pointer^, arrow, room, body, legs, chair, water, wheel, feeling
Arbitrary -> Articles | Numerals | Math-sign | Bliss-arbitrary
Position -> Vertical and Horizontal
Vertical -> VerticalPosition Spacing @ VerticalSigner
VerticalPosition -> right, left, centralized
VerticalSigner -> skyline, midline, earthline
Horizontal -> HorizontalPosition Spacing @ HorizontalSigner
HorizontalPosition -> above, under, centralized
HorizontalSigner -> leftline, middleline, rightline
Spacing -> zero | one | two ;; the distance between the constituents
Size -> full, half, quarter
Direction -> as-is, horizontal, vertical, left, right, upside-down | Direction-Direction
Bliss-arbitrary -> action, enclosure, multiplication, evaluation, nature, horizontal-line, vertical-line,
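The grammar above can be read as a data model. The sketch below is one hypothetical encoding that models only the component level (Symbol, FeaturedComponent, Position, Size, Direction). It checks the closed value sets listed in the grammar. The class and function names are illustrative assumptions, not part of the original system.

```python
from dataclasses import dataclass
from enum import Enum

# Closed value sets taken from the grammar rules above.
VERTICAL_POSITIONS = {"right", "left", "centralized"}
VERTICAL_SIGNERS = {"skyline", "midline", "earthline"}
HORIZONTAL_POSITIONS = {"above", "under", "centralized"}
HORIZONTAL_SIGNERS = {"leftline", "middleline", "rightline"}

class Size(Enum):
    FULL = "full"
    HALF = "half"
    QUARTER = "quarter"

class Direction(Enum):
    AS_IS = "as-is"
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    LEFT = "left"
    RIGHT = "right"
    UPSIDE_DOWN = "upside-down"

@dataclass(frozen=True)
class Axis:
    """Vertical or Horizontal: Position Spacing @ Signer."""
    position: str
    spacing: int  # zero | one | two: distance between the constituents
    signer: str

    def __post_init__(self):
        if self.spacing not in (0, 1, 2):
            raise ValueError(f"spacing must be 0, 1 or 2, got {self.spacing}")

@dataclass(frozen=True)
class Position:
    vertical: Axis
    horizontal: Axis

    def __post_init__(self):
        if (self.vertical.position not in VERTICAL_POSITIONS
                or self.vertical.signer not in VERTICAL_SIGNERS):
            raise ValueError("invalid vertical axis")
        if (self.horizontal.position not in HORIZONTAL_POSITIONS
                or self.horizontal.signer not in HORIZONTAL_SIGNERS):
            raise ValueError("invalid horizontal axis")

@dataclass(frozen=True)
class FeaturedComponent:
    atomic: str  # a pictograph or arbitrary sign, e.g. "house"
    position: Position
    size: Size
    direction: tuple = (Direction.AS_IS,)  # Direction-Direction allows chains

def compose(*components: FeaturedComponent) -> tuple:
    """Symbol -> FeaturedComponent | Symbol ++ FeaturedComponent (non-empty sequence)."""
    if not components:
        raise ValueError("a symbol needs at least one component")
    return tuple(components)

centre = Position(Axis("centralized", 0, "midline"), Axis("centralized", 0, "middleline"))
house = FeaturedComponent("house", centre, Size.FULL)
print(compose(house))
```

Frozen dataclasses make components hashable, so composed symbols could serve directly as lexicon keys.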
Word Prediction
• Simple non-linguistic methods: possibly up to 50% savings in keystrokes.
• Needed: improvement by including syntactic/semantic knowledge in the prediction process, using machine learning methods based on corpus analysis.
• Methods:
– Frequency-based models (bigrams/trigrams)
– Grammatical and conceptual modeling to predict well-formed utterances, such as the use of POS tags
Message Generation
• Language generation from reduced input
– Telegraphic text [Cushler, Badman, Demasco and McCoy]: think red hammer break John => I think that the red hammer was broken by John.
– Cogeneration [Copestake]: construction of full sentences from templates.
– PVI
• Main assumption: the order in which words are chosen implies topicalization and should be taken into account.
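The telegraphic example above can illustrate the main assumption. The theme symbol (hammer) is chosen before the agent (John), so the theme is treated as the topic and realized as the subject of a passive clause. The sketch below is a toy version. The verb-form tables, the rule, and the hard-coded "I think that" wrapper are simplifying assumptions, not the cited systems' implementation.

```python
PAST = {"break": "broke"}        # hypothetical past-tense table
PARTICIPLE = {"break": "broken"} # hypothetical past-participle table

def realize(verb, agent, theme, order, theme_mods=()):
    """Passive if the theme was selected before the agent, active otherwise."""
    np = " ".join(["the", *theme_mods, theme])
    if order.index(theme) < order.index(agent):  # theme came first: topicalize it
        return f"{np} was {PARTICIPLE[verb]} by {agent}"
    return f"{agent} {PAST[verb]} {np}"

symbols = ["think", "red", "hammer", "break", "John"]
clause = realize("break", "John", "hammer", symbols, theme_mods=("red",))
print(f"I think that {clause}.")  # I think that the red hammer was broken by John.
print(realize("break", "John", "hammer", ["John", "break", "hammer"]))  # John broke the hammer
```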
The common architecture
Iconic/telegraphic input
Semantic parser: identification of the predicator, unification of arguments
Lexical choice
Syntactic realization: closed-class word selection, linearization, morphology
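The stages above can be sketched as a chain of functions. Everything below is a hypothetical stand-in for the real components (a semantic parser, a lexicon, and a realizer such as SURGE): the tiny lexicon, the naive argument assignment, subject-verb-object linearization, and the past-tense table. The sketch only shows how each stage's output feeds the next, on a reduced version of the earlier example, ME / SEE / CAT.

```python
# Hypothetical lexicon: icon -> (word, part of speech).
LEXICON = {"ME": ("I", "pronoun"), "SEE": ("see", "verb"), "CAT": ("cat", "noun")}
PAST = {"see": "saw"}  # hypothetical irregular-verb table

def semantic_parse(icons):
    """Identify the predicator and attach the other icons as its arguments."""
    pred = next(i for i in icons if LEXICON[i][1] == "verb")
    agent, theme = [i for i in icons if i != pred]  # naive: first argument = agent
    return {"predicate": pred, "agent": agent, "theme": theme}

def lexical_choice(frame):
    """Replace each icon in the frame by its (word, part of speech) entry."""
    return {role: LEXICON[icon] for role, icon in frame.items()}

def syntactic_realization(words):
    """Closed-class word selection, SVO linearization and past-tense morphology."""
    def noun_phrase(word, pos):
        return word if pos == "pronoun" else f"the {word}"  # add the article
    verb = words["predicate"][0]
    verb = PAST.get(verb, verb + "ed")  # naive regular past as fallback
    sentence = " ".join([noun_phrase(*words["agent"]), verb, noun_phrase(*words["theme"])])
    return sentence[0].upper() + sentence[1:] + "."

print(syntactic_realization(lexical_choice(semantic_parse(["ME", "SEE", "CAT"]))))
# I saw the cat.
```

Taking the first remaining icon as the agent is exactly the assumption that unordered AAC input breaks. A real semantic parser would replace it with category- or knowledge-based argument unification.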
Cogeneration approach
• Situation-based approach.
• A set of predefined templates:
– Topic of discussion: <>
– Participants: <>
– Time of discussion: <> (optional)
– You know <participants> talking about <topic>
• Preferred: You know we were talking at breakfast about buying a desk lamp.
• Ambiguous: You know we were talking about buying a desk lamp at breakfast.
• Templates without cogeneration: You know us talking about buy desk lamp breakfast.
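The sketch below contrasts bare slot filling with filling that applies the inflection the template context requires. The slot values, the inflection tables, the indefinite article, and the fixed placement of the time phrase right after "talking" are all illustrative assumptions, not Copestake's actual cogeneration machinery. Placing the time phrase there avoids the ambiguous reading.

```python
TEMPLATE = "You know {participants} {be} talking{time} about {topic}."

SUBJECT_FORM = {"us": "we", "me": "I"}  # hypothetical pronoun-case table
BE_PAST = {"we": "were", "I": "was"}
GERUND = {"buy": "buying"}              # hypothetical -ing forms

def naive(participants, topic, time=None):
    """Concatenate the raw slot values, as a template without cogeneration would."""
    parts = ["You know", participants, "talking about", *topic]
    if time:
        parts.append(time)
    return " ".join(parts) + "."

def cogenerate(participants, topic, time=None):
    """Fill the template, inflecting slot values to fit their syntactic position."""
    subj = SUBJECT_FORM.get(participants, participants)
    verb, obj = topic
    return TEMPLATE.format(
        participants=subj,
        be=BE_PAST[subj],
        time=f" at {time}" if time else "",
        topic=f"{GERUND.get(verb, verb)} a {obj}",  # indefinite article assumed
    )

print(naive("us", ("buy", "desk lamp"), "breakfast"))
# You know us talking about buy desk lamp breakfast.
print(cogenerate("us", ("buy", "desk lamp"), "breakfast"))
# You know we were talking at breakfast about buying a desk lamp.
```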
PVI
• Paradigmatic dimension: icons organized in taxemes, further grouped into semantic domains.
• Syntagmatic dimension: building a case structure of predicative concepts.
• Meaning of an icon: the features that distinguish it from the other icons.
• Semantic analysis: reconstructing the meaning of the icon sequence by building a semantic network.
• Lexical choice: assuming there is no one-to-one (bijective) mapping between icons and words.
• Generation
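The differential view of icon meaning can be sketched with feature sets. The taxeme and its features below are hypothetical. Generic features are those shared by every icon in the taxeme. An icon's specific features are its own features minus the generic ones, and a pairwise difference shows what separates it from one particular rival. This is a simplification of the idea, not PVI's actual representation.

```python
# Hypothetical taxeme: icons in one paradigm, each with its semantic features.
SEATS = {
    "chair": {"seat", "back", "one_person"},
    "stool": {"seat", "one_person"},
    "sofa": {"seat", "back", "several_people"},
}

def generic_features(taxeme):
    """Features shared by every icon in the taxeme (what makes them one class)."""
    return set.intersection(*taxeme.values())

def specific_features(icon, taxeme):
    """Features of `icon` that are not common to the whole taxeme."""
    return taxeme[icon] - generic_features(taxeme)

def distinguishes(icon, other, taxeme):
    """Features that tell `icon` apart from one particular rival icon."""
    return taxeme[icon] - taxeme[other]

print(sorted(generic_features(SEATS)))             # ['seat']
print(sorted(specific_features("chair", SEATS)))   # ['back', 'one_person']
print(sorted(distinguishes("chair", "stool", SEATS)))  # ['back']
```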
Message Selection Systems
• Discourse structure
• Talk:About (University of Dundee)
• The user relies on pre-stored sentences.
• The sentences are indexed using rhetorical structure assumptions.
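The sketch below shows one way to index pre-stored sentences for selection. Each sentence is filed under a topic and a rhetorical relation, and the user retrieves candidates for the current topic and the conversational move they want to make. The class name, the relation labels, and the sentences are hypothetical. The code does not describe how Talk:About actually organizes its material.

```python
from collections import defaultdict

class PhraseStore:
    """Pre-stored sentences indexed by (topic, rhetorical relation)."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, topic, relation, sentence):
        self._index[(topic, relation)].append(sentence)

    def retrieve(self, topic, relation):
        """Return a copy of the stored sentences for this topic and relation."""
        return list(self._index.get((topic, relation), []))

store = PhraseStore()
store.add("holiday", "elaboration", "We stayed in a small cottage by the sea.")
store.add("holiday", "evaluation", "It was the best week of the year.")
store.add("holiday", "elaboration", "It rained almost every day.")
print(store.retrieve("holiday", "elaboration"))
```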
Language Simplification and Language Understanding
• PSET project [Carroll et al.]
• Intended for aphasic readers with lexical or syntactic impairments.
• Syntactic simplification:
– Passive to active
• Lexical simplification: look up synonyms and use the most frequent one.
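The lexical simplification rule above can be sketched as follows: replace each word by its most frequent synonym. The synonym sets and frequency counts below are made-up stand-ins for a real thesaurus and corpus frequency list. The sketch also ignores word sense, although a real system must choose synonyms of the intended sense.

```python
# Hypothetical synonym sets and corpus frequencies (made-up numbers).
SYNONYMS = {
    "purchase": ["buy", "purchase", "acquire"],
    "commence": ["start", "begin", "commence"],
}
FREQ = {"buy": 300, "purchase": 40, "acquire": 25, "start": 400, "begin": 250, "commence": 5}

def simplify_word(word):
    """Replace a word by its most frequent synonym, or keep it if none is known."""
    candidates = SYNONYMS.get(word, [word])
    return max(candidates, key=lambda w: FREQ.get(w, 0))

def simplify(sentence):
    return " ".join(simplify_word(w) for w in sentence.split())

print(simplify("they commence to purchase food"))  # they start to buy food
```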