Oct 2009 HLT 2
Acknowledgement
Material for some of these slides is taken from J. Nivre (University of Gothenburg, Sweden) and from D. Jurafsky & J. Martin.
Human Language Technology
HLT is sometimes referred to as:
Natural Language Processing: focus on linguistic processing
Computational Linguistics: focus on understanding language
Language Engineering: focus on practical tasks and results
HLT – Engineering v. Science
Engineering: NLP is concerned with the design and implementation of effective NL input and output components for computational systems (Robert Dale 2000)
Science: The use of computers for linguistic research and applications
HLT is Interdisciplinary
Linguistics: theoretical, applied
Computer Science: algorithms, compiling techniques
Artificial Intelligence: understanding, reasoning, intelligent action
Web Analytics
Data-mining of social media: weblogs, discussion forums, message boards, user groups, and other forms of user-generated media
Sentiment analysis, social network analysis:
Product marketing information
Opinion tracking over space and time
Social network analysis
Buzz analysis (what’s hot, what topics people are talking about right now)
HLT can help with
Understanding how language works: by implementing complex theories directly
More natural communication:
Development of multimodal M/M communication: language, speech, gesture
Development of multilingual applications
Knowledge management: language is the fabric of the web
Language Enabled Applications
What makes an application a language processing application (as opposed to any other piece of software)?
An application that requires the use of knowledge about human languages.
Example: Is Unix wc (word count) a language processing application?
Language Enabled Applications
Word count?
When it counts words: Yes. To count words you need to know what a word is. That’s knowledge of language.
When it counts lines and bytes: No. Lines and bytes are computer artifacts, not linguistic entities.
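A minimal sketch of why counting words needs a definition of "word". The whitespace counter below is only similar in spirit to wc, and WORD_PATTERN is a hypothetical definition chosen for illustration (it is not from the slides); a different definition would give a different count.

```python
import re

def count_whitespace_tokens(text: str) -> int:
    """Count chunks separated by whitespace; no linguistic knowledge involved."""
    return len(text.split())

# Hypothetical definition of a word (an assumption for this sketch):
# a run of letters or digits, optionally joined by internal apostrophes or hyphens.
WORD_PATTERN = re.compile(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*")

def count_linguistic_words(text: str) -> int:
    """Count only the chunks that match the word definition above."""
    return len(WORD_PATTERN.findall(text))

sample = "Open the pod bay doors , Hal -- now !"
print(count_whitespace_tokens(sample))  # punctuation chunks are counted too
print(count_linguistic_words(sample))   # punctuation chunks are ignored
```

The two counts differ because the second function encodes a (very small) piece of knowledge about language.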
Topics: Applications
Small: spelling correction, hyphenation
Medium: word-sense disambiguation, named entity recognition, information retrieval
Big: question answering, conversational agents, automatic summarisation, machine translation
Stand-alone
Enabling applications
Funding/Business plans
Big Applications
These kinds of applications require a tremendous amount of knowledge of language.
Consider the following interaction with HAL the computer from 2001: A Space Odyssey
HAL from 2001
Dave: Open the pod bay doors, Hal.
HAL: I’m sorry Dave, I’m afraid I can’t do that.
http://www.youtube.com/watch?v=kkyUMmNl4hk
What’s needed?
Speech recognition and synthesis
Knowledge of the English words involved:
What they mean
How groups of words fit together into groups
What the groups mean
How the groups relate to each other
What’s needed?
Dialog:
It is polite to respond, even if you’re planning to kill someone.
It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
Summary of Application Areas
Document Processing: classification, summarisation, information extraction
Question answering, information retrieval, dialogue
Multilinguality: machine translation, translation tools
Multimodality: speech, intonation, image
Basic Problems
Analysis: conversion of NL input to internal representations
Generation: conversion of internal representations to NL output
Issues:
What kind of input/output/representations?
Role of learning: supervised v. unsupervised
What training data is available?
System evaluation
Levels of Linguistic Knowledge
Phonetics/Phonology: sound structure
Morphology: word structure
Syntax: sentence structure
Semantics: meanings
Pragmatics: use of language in context
Discourse: paragraphs, texts, dialogues
Processing Pipelines
Each level of knowledge is associated with an encapsulated set of processes.
Interfaces are defined that allow the various levels to communicate.
This often leads to a pipeline architecture.
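A minimal sketch of such a pipeline, assuming each level is a function that reads a shared document record and adds its own results. The stage names, the plural-stripping rule and the tiny lexicon are all hypothetical, chosen only to show the architecture.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def tokenize(doc: Dict) -> Dict:
    # Word level: split the raw text on whitespace.
    doc["tokens"] = doc["text"].split()
    return doc

def analyse_morphology(doc: Dict) -> Dict:
    # Toy morphology (assumption): strip a final "s" from longer tokens.
    doc["stems"] = [t[:-1] if t.endswith("s") and len(t) > 3 else t
                    for t in doc["tokens"]]
    return doc

def tag_categories(doc: Dict) -> Dict:
    # Toy lexicon (assumption): look up a category for each stem.
    lexicon = {"open": "V", "the": "DET", "door": "N"}
    doc["tags"] = [lexicon.get(s.lower(), "UNK") for s in doc["stems"]]
    return doc

def run_pipeline(text: str, stages: List[Stage]) -> Dict:
    # Each stage only sees the record produced by the stages before it.
    doc: Dict = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc

result = run_pipeline("Open the doors",
                      [tokenize, analyse_morphology, tag_categories])
print(result["tags"])
```

The interface between levels here is just the keys of the record; a later stage cannot send information back to an earlier one, which is the limitation that matters once ambiguity enters the picture.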
Ambiguity
Computational linguists are obsessed with ambiguity
Ambiguity is a fundamental problem of computational linguistics
Resolving ambiguity is a crucial goal
Ambiguity arises at different levels of analysis
Ambiguity – different flavours
Lexical: I made her duck
Syntactic: Young men and women
Referential: She did it
Pragmatic: Can you pass the salt?
Ambiguity
Find at least 5 meanings of this sentence: I made her duck
I cooked waterfowl for her benefit (to eat)
I cooked waterfowl belonging to her
I created the (plaster?) duck she owns
I caused her to quickly lower her head or body
I waved my magic wand and turned her into undifferentiated waterfowl
Ambiguity is Pervasive
I made her duck
I caused her to quickly lower her head or body. Lexical category: “duck” can be a N or V
I cooked waterfowl belonging to her. Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns. Lexical semantics: “make” can mean “create” or “cook”
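A minimal sketch of how lexical category ambiguity multiplies. The lexicon below is a hypothetical one covering only this sentence, and the labels (PRON, V, POSS, DAT, N) are assumptions made for illustration; the code simply enumerates every combination of categories.

```python
from itertools import product

# Hypothetical lexicon: each word maps to the categories it may take.
LEXICON = {
    "I": ["PRON"],
    "made": ["V"],
    "her": ["POSS", "DAT"],   # possessive ("of her") or dative ("for her")
    "duck": ["N", "V"],       # noun or verb
}

def category_sequences(sentence: str):
    """Return every possible assignment of categories to the words."""
    words = sentence.split()
    options = [LEXICON[w] for w in words]
    return [tuple(choice) for choice in product(*options)]

readings = category_sequences("I made her duck")
for reading in readings:
    print(reading)
```

Two ambiguous words with two categories each already give four candidate sequences, before the different senses of "make" are even considered.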
Ambiguity is Pervasive
Grammar: “make” can be:
Transitive (verb has a noun direct object): I cooked [waterfowl belonging to her]
Ditransitive (verb has 2 noun objects): I made [her] (into) [undifferentiated waterfowl]
Action-transitive (verb has a direct object and another verb): I caused [her] [to move her body]
Ambiguity is Pervasive
Phonetics!
I mate or duck
I’m eight or duck
Eye maid; her duck
Aye mate, her duck
I maid her duck
I’m aid her duck
I mate her duck
I’m ate her duck
I’m ate or duck
I mate or duck
Dealing with Ambiguity
Four possible approaches:
1. Tightly coupled interaction among processing levels; knowledge from other levels can help decide among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as it occurs and hopes that other levels can eliminate incorrect structures.
3. Probabilistic approaches based on making the most likely choices
4. Don’t do anything, maybe it won’t matter
   1. We’ll leave when the duck is ready to eat.
   2. The duck is ready to eat now.
   Does the “duck” ambiguity matter with respect to whether we can leave?
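Approach 3 can be sketched as picking the category seen most often for each word. The observations below are invented toy data for illustration only (they are not taken from any corpus), and the tag labels are assumptions.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Invented toy observations of (word, category); NOT real corpus counts.
TOY_OBSERVATIONS: List[Tuple[str, str]] = [
    ("duck", "N"), ("duck", "N"), ("duck", "N"), ("duck", "V"),
    ("her", "POSS"), ("her", "POSS"), ("her", "DAT"),
]

def train(observations: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count how often each word was seen with each category."""
    counts: Dict[str, Counter] = {}
    for word, tag in observations:
        counts.setdefault(word, Counter())[tag] += 1
    return counts

def most_likely_tag(counts: Dict[str, Counter], word: str) -> Optional[str]:
    """Return the most frequent category for a word, or None if unseen."""
    if word not in counts:
        return None
    return counts[word].most_common(1)[0][0]

model = train(TOY_OBSERVATIONS)
print(most_likely_tag(model, "duck"))
print(most_likely_tag(model, "her"))
```

This baseline ignores context entirely, so it will always choose the same reading for "duck"; it shows the idea of making the most likely choice, not a complete disambiguation method.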
Ways of Studying NLP
By Application: MT, IE, IR etc.
By Approach: rational vs. empirical
By Linguistic Level: morphology, syntax etc.
By Algorithm
Algorithms
State Machines: automata and transducers
Rule Systems: regular and context free grammars
Search: top-down/bottom-up parsing
Probabilistic algorithms
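As an illustration of the first item, here is a minimal sketch of a deterministic finite-state automaton. The toy language it accepts ("b", then two or more "a", then "!") and the state numbering are assumptions chosen for this example.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,   # loop: any number of further "a"s
    (3, "!"): 4,
}
START_STATE = 0
ACCEPTING_STATES = {4}

def accepts(string: str) -> bool:
    """Run the automaton over the string; reject on any missing transition."""
    state = START_STATE
    for symbol in string:
        key = (state, symbol)
        if key not in TRANSITIONS:
            return False
        state = TRANSITIONS[key]
    return state in ACCEPTING_STATES

for candidate in ["baa!", "baaaa!", "ba!", "baa"]:
    print(candidate, accepts(candidate))
```

The same table-driven loop works for any deterministic automaton; only TRANSITIONS, START_STATE and ACCEPTING_STATES change.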
Organisation of Course
Module 1: Words
Linguistics: Morphological Structure
Morphological Processing
LAB + Assignment I
Module 2: Sentences
Linguistics: Syntactic Structure
NL Parsing Algorithms
LAB + Assignment II
Module 3: Texts
Statistics
Text Classification
LAB + Assignment III