Special Topics in Computer Science
Advanced Topics in Information Retrieval
Lecture 10: Natural Language Processing and IR
Syntax and structural disambiguation
Alexander Gelbukh
www.Gelbukh.com
Previous Chapter: Conclusions
Tagging, word sense disambiguation, and anaphora resolution are cases of disambiguation of meaning
Useful in translation, information retrieval, and text understanding
Dictionary-based methods are good but expensive
Statistical methods are cheap and sometimes imperfect... but not always (if very large corpora are available)
Previous Chapter: Research topics
Too many to list
New methods
Lexical resources (dictionaries)
= Computational linguistics
Contents
Language levels
Syntax
  Dependency approach
  Constituency-based approach
  Head-driven approach
Grammars and parsing
Ambiguity and disambiguation
Language levels
Letters are built up into words
Words into sentences
Sentences into <...> text
Each level has its own representation
This allows for modular processing
A module describes one level or transforms from one level to another
Source of language complexity: 1-D
[Figure: a meaning in Brain 1 is expressed through language as a one-dimensional text (speech), from which Brain 2 reconstructs the meaning]
Source of language complexity: 1-D
[Figure: knowledge on each side is connected through language to a one-dimensional text]
Linguistic processor translates between representations
[Figure: a linguistic module converts between texts and meanings; the meanings are used by an applied system]
General scheme of text processing
[Figure: input and output texts pass through the linguistic processor, which exchanges a (semantic) representation with an applied system (e.g., an expert system)]
The linguistic processor uses linguistic knowledge
The applied system uses other types of knowledge (e.g., artificial intelligence)
Language levels
Morphological: words
Syntactic: sentences
Semantic: meaning
Pragmatic: intention
...?
Fine structure of linguistic processor
[Figure: Text (surface representation) → morphological transformer → morphological representation → syntactic transformer → syntactic representation → semantic transformer → semantic representation (meaning)]
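The fine structure above is a chain of transformers, each mapping one representation to the next. A minimal sketch of that modular design, assuming toy stand-in transformers: the function names and the dictionary used as a "syntactic representation" are hypothetical placeholders, and a real morphological stage would also lemmatize (e.g., is → BE).

```python
from typing import Any, Callable, List

# A transformer maps the representation of one level to that of the next.
Transformer = Callable[[Any], Any]

def morphological_transformer(text: str) -> List[str]:
    # Toy stand-in: surface text -> lowercase word forms (no lemmatization).
    return text.lower().replace(".", " ").split()

def syntactic_transformer(words: List[str]) -> dict:
    # Toy stand-in: assume an "X is Y" sentence with the verb as the head.
    return {"head": words[1], "dependents": [words[0]] + words[2:]}

def semantic_transformer(tree: dict) -> str:
    # Toy stand-in: render the tree as a predicate with arguments.
    args = ", ".join(d.upper() for d in tree["dependents"])
    return f"{tree['head'].upper()}({args})"

def run_pipeline(stages: List[Transformer], representation: Any) -> Any:
    # Each module knows only its own input and output level, so any stage
    # can be replaced without touching the others.
    for stage in stages:
        representation = stage(representation)
    return representation

PIPELINE = [morphological_transformer, syntactic_transformer, semantic_transformer]
print(run_pipeline(PIPELINE, "Science is important."))  # IS(SCIENCE, IMPORTANT)
```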
Example of text
“Science is important for our country. The Government pays it much attention.”
Textual representation
Text is a sequence of letters:
S c i e n c e   i s   i m p o r t a n t   f o r   o u r   c o u n t r y .   T h e   G o v e r n m e n t   p a y s   i t   m u c h   a t t e n t i o n .
Morphological analysis
[Figure: the linguistic processor as a morphological analyzer, a syntactic parser, and a semantic analyzer; the morphological analyzer is highlighted]
Morphological representation
A sequence of words:
Word       Lemma      Part of speech  Features
The        THE        article         definite, plural/singular
science    SCIENCE    noun            singular
is         BE         verb            present, 3rd person, singular
important  IMPORTANT  adjective
for        FOR        preposition
our        WE         pronoun         possessive
country    COUNTRY    noun            singular
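The representation above pairs each word form with a lemma, a part of speech, and grammatical features. A minimal dictionary-lookup sketch, assuming a hypothetical toy lexicon that covers only the words in the table; a real analyzer would combine a full dictionary with inflection rules, and the "unknown" fallback is our own choice.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: word form -> (lemma, part of speech, features).
LEXICON: Dict[str, Tuple[str, str, str]] = {
    "the": ("THE", "article", "definite, plural/singular"),
    "science": ("SCIENCE", "noun", "singular"),
    "is": ("BE", "verb", "present, 3rd person, singular"),
    "important": ("IMPORTANT", "adjective", ""),
    "for": ("FOR", "preposition", ""),
    "our": ("WE", "pronoun", "possessive"),
    "country": ("COUNTRY", "noun", "singular"),
}

def analyze(text: str) -> List[Tuple[str, str, str, str]]:
    """Turn a letter sequence into a sequence of analyzed words."""
    result = []
    for token in text.replace(".", " ").split():
        # Unknown words get their uppercased form as lemma and no analysis.
        lemma, pos, features = LEXICON.get(token.lower(), (token.upper(), "unknown", ""))
        result.append((token, lemma, pos, features))
    return result

for row in analyze("Science is important for our country."):
    print(row)
```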
Syntactic parsing
[Figure: the linguistic processor as a morphological analyzer, a syntactic parser, and a semantic analyzer; the syntactic parser is highlighted]
Syntactic representation
A sequence of syntactic trees:
BE
    SCIENCE
    IMPORTANT
        COUNTRY
            WE

PAY
    GOVERNMENT
    ATTENTION
        MUCH
    IT
Syntactic representation
What happened?
With whom did it happen?
... and their details
PAY
    GOVERNMENT
    ATTENTION
        MUCH
    IT
Semantic analysis
[Figure: the linguistic processor as a morphological analyzer, a syntactic parser, and a semantic analyzer; the semantic analyzer is highlighted]
Next lecture...
Syntax
The structure describing the relationships between words in a sentence
Describes the relationships implied by grammatical characteristics, not by meaning
Often allows for simple paraphrasing: “John reads the book” ↔ “The book is read by John”
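Because both sentences share the same syntactic relations (John is the one who reads, the book is what is read), one structure can be realized either way. A minimal sketch, assuming a hypothetical `Clause` record that stores the relations together with the needed verb forms; real generation would derive the forms morphologically.

```python
from dataclasses import dataclass

@dataclass
class Clause:
    # Syntactic relations of a simple transitive clause (hypothetical representation).
    subject: str
    verb_finite: str      # e.g. "reads"
    verb_participle: str  # e.g. "read"
    obj: str

def active(c: Clause) -> str:
    return f"{c.subject} {c.verb_finite} {c.obj}"

def passive(c: Clause) -> str:
    # The object becomes the surface subject; the subject moves into a "by" phrase.
    return f"{c.obj[0].upper()}{c.obj[1:]} is {c.verb_participle} by {c.subject}"

clause = Clause(subject="John", verb_finite="reads", verb_participle="read", obj="the book")
print(active(clause))   # John reads the book
print(passive(clause))  # The book is read by John
```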
Early approach: Dependency syntax
Tree
  Nodes: words
  Arcs: “modified by”
Modifies means adds details, clarifies, chooses one of many... makes more specific
Arcs are typed
  Types are: subject, object, attribute, ...
PAY
    GOVERNMENT (subject)
    ATTENTION (object)
        MUCH (attribute)
    IT (recipient)
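A dependency tree with typed arcs can be stored as a list of (head, type, dependent) triples. A minimal sketch for the tree above; the triple format is our assumption, and words double as node identifiers only because each word occurs once here (real systems use token positions).

```python
from typing import List, Tuple

# Typed dependency arcs for "The Government pays it much attention".
ARCS: List[Tuple[str, str, str]] = [
    ("PAY", "subject", "GOVERNMENT"),
    ("PAY", "object", "ATTENTION"),
    ("PAY", "recipient", "IT"),
    ("ATTENTION", "attribute", "MUCH"),
]

def children(arcs: List[Tuple[str, str, str]], head: str) -> List[Tuple[str, str]]:
    return [(rel, dep) for h, rel, dep in arcs if h == head]

def find_root(arcs: List[Tuple[str, str, str]]) -> str:
    # The root is the only word that never appears as a dependent.
    heads = {h for h, _, _ in arcs}
    deps = {d for _, _, d in arcs}
    (root,) = heads - deps
    return root

def show(arcs: List[Tuple[str, str, str]], node: str, depth: int = 0, rel: str = "root") -> List[str]:
    # Indent each word under the word it modifies, with the arc type in parentheses.
    lines = ["    " * depth + f"{node} ({rel})"]
    for child_rel, child in children(arcs, node):
        lines += show(arcs, child, depth + 1, child_rel)
    return lines

print("\n".join(show(ARCS, find_root(ARCS))))
```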
... Dependency syntax
General situation: pay
More specifically, the one where:
  who pays is the government
  what is paid is attention
  to whom it is paid is it
More specifically still: attention that is much
PAY
    GOVERNMENT (subject)
    ATTENTION (object)
        MUCH (attribute)
    IT (recipient)
Advantages/disadvantages of dependency syntax
Advantages:
  Solid linguistic base
  Rather direct translation into semantics
  Easily applicable to languages with free word order: Russian, Latin; Korean?
    This explains the solid linguistic base: it is good for the classical languages!
Disadvantages:
  No nice mathematical base
  No simple algorithms
Most popular approach: Constituency (phrase structure grammars)
Tree
  Nodes: nested segments of the phrase
    Cannot intersect, only nest
    Usually labeled with part-of-speech names
  Arcs: nesting
    In the classical approach, arcs are not labeled
[[Our Government] [pays [much attention] [to it]]]
Constituency
[[Our Government] [pays [much attention] [to it]]]
[Figure: the same bracketing drawn as nested segments: “Our Government”, “pays”, “much attention”, “to it”]
Constituency
[[Our/R Government/N]NP [pays/V [much/A attention/N]NP [to/P it/R]PP]VP]S
R: pronoun        NP: noun phrase
N: noun           VP: verb phrase
V: verb           PP: prepositional phrase
A: adjective      S: sentence
P: preposition
Constituency: graphical representation
[[Our Government]NP [pays [much attention]NP [to it]PP]VP]S
S
    NP
        R  Our
        NP
            N  Government
    VP
        VP
            V  pays
        NP
            A  much
            NP
                N  attention
        PP
            P  to
            NP
                R  it
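Labeled bracket notation and the drawn tree carry the same nesting. A minimal sketch of reading the notation into a tree and printing it indented, assuming each label is written directly after its closing bracket (the slide's string with spacing normalized); the tokenizer regex is our assumption about the notation. Since the string labels only phrases, the result is flatter than the drawn tree.

```python
import re
from typing import List, Tuple, Union

# A constituent is (label, children); a leaf is a plain word.
Tree = Union[str, Tuple[str, list]]

# Tokens: an opening bracket, a closing bracket with its label, or a word.
TOKEN = re.compile(r"\[|\][A-Z]*|[^\s\[\]]+")

def parse_brackets(text: str) -> Tree:
    """Parse labeled bracket notation such as "[[Our Government]NP ...]S"."""
    stack: List[list] = [[]]
    for tok in TOKEN.findall(text):
        if tok == "[":
            stack.append([])                        # open a new constituent
        elif tok.startswith("]"):
            kids = stack.pop()                      # close it; the label follows "]"
            stack[-1].append((tok[1:] or "?", kids))
        else:
            stack[-1].append(tok)                   # a word
    (tree,) = stack[0]
    return tree

def draw(tree: Tree, depth: int = 0) -> List[str]:
    if isinstance(tree, str):
        return ["    " * depth + tree]
    label, kids = tree
    lines = ["    " * depth + label]
    for kid in kids:
        lines += draw(kid, depth + 1)
    return lines

tree = parse_brackets("[[Our Government]NP [pays [much attention]NP [to it]PP]VP]S")
print("\n".join(draw(tree)))
```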
Phrase structure grammar
Enumerates the possible configurations at nodes
Usually recursive
S → NP VP
NP → A NP
NP → R NP
PP → P NP
NP → N
NP → R
VP → VP NP PP
VP → V
[Figure: the same constituency tree for “Our Government pays much attention to it” as on the previous slide]
Context-independence hypothesis
A configuration is either possible or not, regardless of where it is used
  Wherever you find VP NP PP, it can be a VP
  Wherever you find NP VP, it can be an S
  If you can put together an S that covers the whole sentence, it is a grammatically correct description
With this, given a suitable grammar, you can
  List all sentences of a language
  List only correct sentences of that language
  List all and only correct structures
Correctness means a native speaker’s intuition
Generative idea
Find a grammar that lists all and only the correct sentences (with their structures) of a language
  This is a complete description of that language!
How can this be useful in analysis? Reverse the grammar
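Listing the sentences of a grammar can be sketched directly. This assumes a toy grammar in the spirit of the phrase structure slide, reading the rules PP → P NP and NP → R off its tree, plus a hypothetical seven-word lexicon; a depth bound is needed because the rules are recursive. The output shows why a *suitable* grammar is hard to find: this one also produces “our pays”, which is not a correct sentence.

```python
from itertools import product
from typing import Dict, Iterator, List, Tuple

# Toy grammar; PP -> P NP and NP -> R are our reading of the slide's tree.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "NP": [["A", "NP"], ["R", "NP"], ["N"], ["R"]],
    "VP": [["VP", "NP", "PP"], ["V"]],
    "PP": [["P", "NP"]],
}
# Hypothetical lexicon: part of speech -> words.
LEXICON: Dict[str, List[str]] = {
    "R": ["our", "it"], "N": ["government", "attention"],
    "A": ["much"], "V": ["pays"], "P": ["to"],
}

def generate(symbol: str, depth: int) -> Iterator[Tuple[str, ...]]:
    """Yield every word sequence derivable from `symbol` using at most
    `depth` levels of phrase rules."""
    if symbol in LEXICON:
        for word in LEXICON[symbol]:
            yield (word,)
        return
    if depth == 0:
        return
    for rhs in GRAMMAR[symbol]:
        options = [list(generate(s, depth - 1)) for s in rhs]
        for parts in product(*options):
            yield tuple(word for part in parts for word in part)

for sentence in sorted(set(generate("S", 2))):
    print(" ".join(sentence))
# attention pays / government pays / it pays / our pays
```

With `depth=4` the same generator already produces the slide's sentence “our government pays much attention to it”, among a few thousand others.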
Parsing
Given a grammar and a sentence, find all possible structures that describe this sentence with this grammar
Many methods, not discussed today
  A lot of research
  Very fast algorithms
Complexity: cubic in the number of words in the sentence (better methods exist, with an exponent down to about 2.8)
Problem: combinatorics of variants
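One classical cubic method is the CYK chart algorithm; the lecture does not say which methods it has in mind, so this is only an illustration. A minimal recognizer sketch, assuming the phrase structure slide's rules with PP → P NP and NP → R, a hypothetical lexicon, and a helper symbol `NP_PP` introduced here to split the three-child rule VP → VP NP PP into binary rules. The loops over span length, start, and split point give the O(n³) behavior; a full parser would also keep back-pointers to recover the structures.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Binary rules: (left child, right child) -> possible parents.
# VP -> VP NP PP is split into VP -> VP NP_PP and NP_PP -> NP PP (helper symbol).
BINARY: Dict[Tuple[str, str], Set[str]] = {
    ("NP", "VP"): {"S"},
    ("A", "NP"): {"NP"},
    ("R", "NP"): {"NP"},
    ("P", "NP"): {"PP"},
    ("VP", "NP_PP"): {"VP"},
    ("NP", "PP"): {"NP_PP"},
}
# Unary rules: child -> possible parents (NP -> N, NP -> R, VP -> V).
UNARY: Dict[str, Set[str]] = {"N": {"NP"}, "R": {"NP"}, "V": {"VP"}}
# Hypothetical lexicon: word -> parts of speech.
LEXICON: Dict[str, Set[str]] = {
    "our": {"R"}, "it": {"R"}, "government": {"N"}, "attention": {"N"},
    "much": {"A"}, "pays": {"V"}, "to": {"P"},
}

def close_unary(symbols: Set[str]) -> Set[str]:
    """Add every category reachable through unary rules."""
    result = set(symbols)
    frontier = list(symbols)
    while frontier:
        for parent in UNARY.get(frontier.pop(), set()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result

def cyk(words: List[str]) -> List[List[Set[str]]]:
    n = len(words)
    # chart[i][j] = categories that can cover words[i:j]
    chart: List[List[Set[str]]] = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1] = close_unary(LEXICON.get(word, set()))
    for length in range(2, n + 1):              # span length
        for i in range(n - length + 1):         # span start
            j = i + length
            for k in range(i + 1, j):           # split point
                for left, right in product(chart[i][k], chart[k][j]):
                    chart[i][j] |= BINARY.get((left, right), set())
            chart[i][j] = close_unary(chart[i][j])
    return chart

words = "our government pays much attention to it".split()
chart = cyk(words)
print("S" in chart[0][len(words)])  # True
```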
Advantages and disadvantages of constituency approach
Advantages:
  Nice mathematics, very well understood
  Efficient analysis algorithms, very well elaborated
  Good for languages with fixed word order: English. Chinese?
Disadvantages:
  Difficult translation into semantics
  Bad when it comes to freer word order: even in English! Worse in other languages
Head-driven approaches
Combine some advantages of the dependency-based and constituency-based approaches
Syntax is still fixed-order, but word dependency information is added:
  Easier translation into semantics
  More linguistically based
How? In each constituent, the main word (head) is marked
  It modifies the head of the larger constituent
[[Our Government] [pays [much attention] [to it]]]
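Marking heads turns a constituency tree into word dependencies: the head of every non-head child modifies the head of its parent constituent. A minimal sketch, assuming the flat bracketing above with part-of-speech leaves and a hypothetical head table (the priority lists, and making the preposition the head of a PP, are our choices, not the lecture's).

```python
from typing import List, Tuple, Union

# (POS, word) for a leaf, (label, children) otherwise.
Tree = Tuple[str, Union[str, list]]

TREE: Tree = ("S", [
    ("NP", [("R", "Our"), ("N", "Government")]),
    ("VP", [("V", "pays"),
            ("NP", [("A", "much"), ("N", "attention")]),
            ("PP", [("P", "to"), ("R", "it")])]),
])

# Hypothetical head table: child labels to try, in priority order.
HEAD_CHILD = {"S": ["VP"], "VP": ["V", "VP"], "NP": ["N", "NP", "R"], "PP": ["P"]}

def head_child(tree: Tree) -> Tree:
    label, kids = tree
    for cat in HEAD_CHILD[label]:
        for kid in kids:
            if kid[0] == cat:
                return kid
    return kids[0]  # fallback: leftmost child

def head_word(tree: Tree) -> str:
    _, content = tree
    if isinstance(content, str):
        return content  # a (POS, word) leaf is its own head
    return head_word(head_child(tree))

def dependencies(tree: Tree) -> List[Tuple[str, str]]:
    """(head, dependent) pairs: each non-head child's head modifies the constituent's head."""
    _, content = tree
    if isinstance(content, str):
        return []
    h, hc = head_word(tree), head_child(tree)
    pairs: List[Tuple[str, str]] = []
    for kid in content:
        if kid is not hc:
            pairs.append((h, head_word(kid)))
        pairs.extend(dependencies(kid))
    return pairs

for head, dep in dependencies(TREE):
    print(f"{dep} -> {head}")
```

The result matches the dependency tree of the earlier slides, with the preposition "to" standing between PAY and IT under this head table.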
Syntactic ambiguity
I see a cat with a telescope
  I see [a cat] [with a telescope]: I use a telescope to see a cat
  I see [a cat [with a telescope]]: I see a cat that has a telescope
Nearly any preposition causes ambiguity
  Dozens, thousands, millions of variants for a sentence!
  Because their numbers multiply: I see a cat with a telescope in a garden at the shore of a river
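How fast the variants grow can be checked by brute force under a simplified model (our assumption, not the lecture's): each prepositional phrase attaches to the verb, to the object, or to the noun of an earlier PP, and attachments may not cross. Under this model the counts for one to four PPs are 2, 5, 14, 42 (the Catalan numbers); the river sentence has four PPs.

```python
from itertools import product
from typing import Tuple

Arc = Tuple[int, int]  # (head position, dependent position), head < dependent

def crosses(a: Arc, b: Arc) -> bool:
    # Two rightward arcs cross if their spans interleave.
    (h1, d1), (h2, d2) = a, b
    return h1 < h2 < d1 < d2 or h2 < h1 < d2 < d1

def count_attachments(n_pps: int) -> int:
    """Count non-crossing PP attachments for "V NP PP_1 ... PP_n".

    Positions: 0 = verb, 1 = object noun, 2.. = the PPs (each PP stands for
    its noun, which later PPs may also modify).
    """
    pp_positions = range(2, 2 + n_pps)
    count = 0
    # Each PP may attach to any earlier position.
    for heads in product(*(range(p) for p in pp_positions)):
        arcs = [(0, 1)] + list(zip(heads, pp_positions))
        if not any(crosses(a, b) for i, a in enumerate(arcs) for b in arcs[i + 1:]):
            count += 1
    return count

print([count_attachments(n) for n in range(1, 5)])  # [2, 5, 14, 42]
```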
Ambiguity resolution
Syntactic means are not enough
  Is telescope more related to see or to cat?
Statistical methods: is it used more often with see or with cat?
Dictionary-based methods: does it share more meaning with see or with cat?
  Path length in a dictionary of semantic relationships
Ideally, the context should be analyzed and reasoning applied: I see a cat with a telescope. It keeps the telescope in its left paw.
  There are no good methods for this yet.
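The two heuristics above can be sketched side by side. All data here is hypothetical: the co-occurrence counts stand in for corpus statistics, the small network stands in for a dictionary of semantic relationships, and “a cat with whiskers” is an extra test case of our own. Neither method looks at the wider context, which is exactly the limitation noted above.

```python
from collections import Counter, deque
from typing import Dict, List, Optional

# Hypothetical toy counts of (word, preposition, noun) co-occurrences.
COUNTS = Counter({
    ("see", "with", "telescope"): 12,
    ("cat", "with", "telescope"): 1,
    ("cat", "with", "whiskers"): 9,
})

def attach_by_statistics(verb: str, noun: str, prep: str, pp_noun: str) -> str:
    """Attach the PP to whichever word it co-occurs with more often."""
    if COUNTS[(verb, prep, pp_noun)] >= COUNTS[(noun, prep, pp_noun)]:
        return "verb"
    return "noun"

# Hypothetical toy dictionary of semantic relationships (undirected edges).
EDGES = [
    ("telescope", "optical_instrument"), ("optical_instrument", "see"),
    ("see", "eye"), ("eye", "organism"), ("organism", "animal"),
    ("animal", "cat"), ("cat", "whiskers"),
]
GRAPH: Dict[str, List[str]] = {}
for a, b in EDGES:
    GRAPH.setdefault(a, []).append(b)
    GRAPH.setdefault(b, []).append(a)

def path_length(start: str, goal: str) -> Optional[int]:
    """Breadth-first search: number of relations on the shortest path."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in GRAPH.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

def distance(a: str, b: str) -> float:
    d = path_length(a, b)
    return float("inf") if d is None else d

def attach_by_dictionary(verb: str, noun: str, pp_noun: str) -> str:
    """Attach the PP to whichever word is closer in the semantic network."""
    return "verb" if distance(pp_noun, verb) <= distance(pp_noun, noun) else "noun"

print(attach_by_statistics("see", "cat", "with", "telescope"))  # verb
print(attach_by_dictionary("see", "cat", "whiskers"))           # noun
```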
Shallow parsing
Due to the HUGE problems in resolving ambiguity:
  Do not resolve it! Do what you can do well
  I see [a cat] [with a telescope] [in a garden] [at the shore] [of a river]
Better than nothing
Can be done well
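A shallow parser brackets flat chunks and leaves attachment undecided. A minimal rule-based sketch, assuming part-of-speech tags are already assigned; the tag D (determiner) is our addition to the slides' tag set, and pronouns are left outside chunks so that the output matches the bracketing above.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (word, tag)

def chunk(tagged: Tagged) -> str:
    """Bracket flat [NP] and [P NP] chunks without deciding what they attach to."""
    out: List[str] = []
    i = 0
    while i < len(tagged):
        start = i
        if tagged[i][1] == "P":                    # a PP chunk starts with a preposition
            i += 1
        j = i
        while j < len(tagged) and tagged[j][1] in ("D", "A"):
            j += 1                                 # determiners and adjectives
        if j < len(tagged) and tagged[j][1] == "N":
            while j < len(tagged) and tagged[j][1] == "N":
                j += 1                             # one or more nouns end the chunk
            out.append("[" + " ".join(w for w, _ in tagged[start:j]) + "]")
            i = j
        else:
            out.append(tagged[start][0])           # not a chunk: emit the word as is
            i = start + 1
    return " ".join(out)

TEXT = ("I/R see/V a/D cat/N with/P a/D telescope/N in/P a/D garden/N "
        "at/P the/D shore/N of/P a/D river/N")
SENTENCE: Tagged = [(w, t) for w, t in (tok.split("/") for tok in TEXT.split())]
print(chunk(SENTENCE))
# I see [a cat] [with a telescope] [in a garden] [at the shore] [of a river]
```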
Evaluation
PARSEVAL international contests
A practical parser usually gives only one variant
  Implies disambiguation!
Manually built corpora (treebanks)
  Compare what the program did with what humans did
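The PARSEVAL-style comparison scores the constituents (brackets) a parser outputs against those in a treebank. A minimal sketch of labeled bracket precision, recall, and F1, assuming brackets are written as (label, start, end) over word positions; both analyses below are hand-written for illustration, and real evaluation tools add further conventions not modeled here.

```python
from typing import Set, Tuple

Bracket = Tuple[str, int, int]  # (label, first word, one past the last word)

def parseval(gold: Set[Bracket], test: Set[Bracket]) -> Tuple[float, float, float]:
    """Labeled bracket precision, recall and F1 of `test` against `gold`."""
    matched = len(gold & test)
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1

# "I see a cat with a telescope": words 0..6.
# Hand-written treebank analysis: the PP attaches to the verb.
GOLD = {("S", 0, 7), ("NP", 0, 1), ("VP", 1, 7), ("NP", 2, 4), ("PP", 4, 7), ("NP", 5, 7)}
# Hypothetical parser output: the PP attaches to "a cat", adding NP(2, 7).
TEST = GOLD | {("NP", 2, 7)}

p, r, f = parseval(GOLD, TEST)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.857 R=1.000 F1=0.923
```

In this example the wrong attachment costs only one extra bracket, although the two readings mean quite different things.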
One of the uses in IR: Lexical ambiguity resolution
Syntactic analysis helps in POS disambiguation:
  Oil is used well in Mexico.
  Oil well is used in Mexico.
  Well = ?
But it does not help in WSD:
  I deposited my money in an international bank.
  I live on a beautiful bank of the Han river.
Research topics
Faster algorithms
  E.g., parallel
Handling linguistic phenomena not handled by current approaches
Ambiguity resolution!
  Statistical methods
  A lot can be done
Conclusions
Syntactic structure is one of the intermediate representations of a text for its processing
  Helps text understanding
    Thus reasoning, question answering, ...
  Directly helps POS tagging
    Resolves lexical ambiguity of part of speech
    But not WSD-type ambiguities
A big science in itself, with 50 (2000?) years of history
Thank you!
Till June 8? 6 pm
Semantics