CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 38–Universal Networking Language) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14th April, 2011

Page 1: CS460/626 : Natural Language  Processing/Speech, NLP and the Web ( Lecture  38–Universal  Networking Language)

CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 38–Universal Networking Language)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

14th April, 2011

Page 2

A Perspective

Morphology
Lexicon
Syntax
Semantics
Pragmatics
Discourse

Page 3

UNL: a United Nations project

  • Started in 1996
  • 10-year program
  • 15 research groups across continents
  • First goal: generators
  • Next goal: analysers (needs solving various ambiguity problems)
  • Current active language groups
      • UNL_French (GETA-CLIPS, IMAG)
      • UNL_English+Hindi
      • UNL_Italian (Univ. of Pisa)
      • UNL_Portuguese (Univ. of Sao Paulo, Brazil)
      • UNL_Russian (Institute of Linguistics, Moscow)
      • UNL_Spanish (UPM, Madrid)

Page 4

World-wide Universal Networking Language (UNL) Project

[Figure: UNL, a language independent meaning representation, at the centre, linked to English, Russian, Japanese, Hindi, Spanish, Marathi and others.]

Page 5

Foundations and Applications

  • UNL Foundations
      • Semantic Relations
      • Universal Words
      • Attributes
      • How to write UNL expressions
  • UNL Applications
      • Machine Translation: Rule based and Statistical
      • Search
      • Text Entailment
      • Sentiment Analysis

Page 6

UNL represents knowledge: John eats rice with a spoon

[Figure: UNL graph for the sentence, with its semantic relations, attributes and universal words labelled.]

Repository of 42 semantic relations and 84 attribute labels

Page 7

Sentence embeddings

Deepa claimed that she had composed a poem.

[UNL]
agt(claim.@entry.@past, Deepa)
obj(claim.@entry.@past, :01)
agt:01(compose.@past.@entry.@complete, she)
obj:01(compose.@past.@entry.@complete, poem.@indef)
[\UNL]
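The expression above can be read mechanically. The following is a minimal sketch, not part of any UNL toolkit, that splits one relation line into its label, optional scope, and the two arguments with their attributes. The names `Relation`, `split_args`, `split_uw` and `parse_relation` are hypothetical, and the sketch assumes attributes are always introduced by `.@`.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    label: str          # relation name, e.g. "agt"
    scope: str          # scope id such as "01", or "" for the main sentence
    head: str           # first argument without its attributes
    head_attrs: tuple   # e.g. ("@entry", "@past")
    dep: str            # second argument without its attributes
    dep_attrs: tuple


def split_args(body):
    """Split 'a, b' at the first comma that is not inside parentheses."""
    depth = 0
    for i, ch in enumerate(body):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            return body[:i].strip(), body[i + 1:].strip()
    raise ValueError(f"expected two arguments in: {body!r}")


def split_uw(arg):
    """Separate a UW (or a scope reference like ':01') from its attributes."""
    head, *attrs = arg.split(".@")
    return head, tuple("@" + a for a in attrs)


def parse_relation(line):
    line = line.strip()
    if "(" not in line or not line.endswith(")"):
        raise ValueError(f"not a relation: {line!r}")
    open_idx = line.index("(")
    label, _, scope = line[:open_idx].strip().partition(":")
    first, second = split_args(line[open_idx + 1:-1])
    head, head_attrs = split_uw(first)
    dep, dep_attrs = split_uw(second)
    return Relation(label, scope, head, head_attrs, dep, dep_attrs)


LINES = [
    "agt(claim.@entry.@past, Deepa)",
    "obj(claim.@entry.@past, :01)",
    "agt:01(compose.@past.@entry.@complete, she)",
    "obj:01(compose.@past.@entry.@complete, poem.@indef)",
]
parsed = [parse_relation(line) for line in LINES]
```

The second relation keeps `:01` as its dependent, which is how the embedded clause is attached to `claim`.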

Page 8

English sentences: basic structure

  • A <verb> B
      John eats bread
      agt(eat.@entry, John)
      obj(eat.@entry, bread)
  • A <verb>
      John sleeps
      aoj(sleep.@entry, John)
  • A <be> B
      John is good
      aoj(good.@entry, John)

[Figure: graph templates for these patterns, with nodes verb, A and B and edges labelled R1, R2 and aoj.]

Page 9

Hindi sentences: basic structure

  • A B <verb>
      John roti khaataa hai
      agt(eat.@entry, John)
      obj(eat.@entry, bread)
  • A <verb>
      John sotaa hai
      aoj(sleep.@entry, John)
  • A <be> B
      John acchaa hai
      aoj(good.@entry, John)

[Figure: graph templates for these patterns, with nodes verb, A and B and edges labelled R1, R2 and aoj.]

Page 10

Complex English sentences: Use recursion on the basic structure

  • A <verb> B
      John who is a good boy eats bread which is toasted

agt(eat.@entry, :01)
obj(eat.@entry, :02)
aoj:01(boy, John.@entry)
mod:01(boy, good)
obj:02(toast, bread.@entry.@focus)

[Figure: UNL graph in which eat has an agt edge to scope :01 (boy, with aoj to John and mod to good) and an obj edge to scope :02 (toast, with obj to bread). Red arrows indicate entry nodes.]
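The recursion on scopes can be made concrete with a small sketch. It assumes the toast relation belongs to scope :02, as the scope labels in the figure suggest, and it treats a reference such as `:01` as standing for the entry node of that scope. The function names `entry_nodes` and `flatten` are hypothetical and the flattening itself is an illustration, not a step defined by UNL.

```python
# Each item is (scope, relation, head, dependent); "" is the main sentence.
RELATIONS = [
    ("", "agt", "eat.@entry", ":01"),
    ("", "obj", "eat.@entry", ":02"),
    ("01", "aoj", "boy", "John.@entry"),
    ("01", "mod", "boy", "good"),
    ("02", "obj", "toast", "bread.@entry.@focus"),
]


def entry_nodes(relations):
    """Map each scope id to the UW that carries @entry inside it."""
    entries = {}
    for scope, _, head, dep in relations:
        for uw in (head, dep):
            name, *attrs = uw.split(".")
            if "@entry" in attrs:
                entries[scope] = name
    return entries


def flatten(relations):
    """Replace scope references by the entry node of the referenced scope."""
    entries = entry_nodes(relations)

    def resolve(uw):
        if uw.startswith(":"):
            return entries[uw[1:]]
        return uw.split(".")[0]

    return [(rel, resolve(head), resolve(dep)) for _, rel, head, dep in relations]


entries = entry_nodes(RELATIONS)
flat = flatten(RELATIONS)
```

After flattening, the main clause reads as agt(eat, John) and obj(eat, bread), which is the basic A <verb> B structure with the two relative clauses hanging off John and bread.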

Page 11

Constituents of Universal Networking Language

  • Universal Words (UWs)
  • Relations
  • Attributes
  • Knowledge Base

Page 12

What is a Universal Word (UW)?

  • Words of UNL
  • Constitute the UNL vocabulary, the syntactic-semantic units to form UNL expressions
  • A UW represents a concept
      • Basic UW (an English word/compound word/phrase with no restrictions or Constraint List)
      • Restricted UW (with a Constraint List)
  • Examples:
      “crane(icl>device)”
      “crane(icl>bird)”
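As an illustration of the difference between a basic and a restricted UW, the sketch below splits a UW string into its headword and constraint list. The function name `parse_uw` is hypothetical, and the sketch only handles flat constraint lists such as the two crane examples.

```python
def parse_uw(uw):
    """Split 'crane(icl>device)' into ('crane', [('icl', 'device')])."""
    uw = uw.strip()
    if "(" not in uw:
        return uw, []  # basic UW: headword only, no constraint list
    if not uw.endswith(")"):
        raise ValueError(f"unbalanced constraint list: {uw!r}")
    headword, _, rest = uw.partition("(")
    constraints = []
    for item in rest[:-1].split(","):
        label, sep, target = item.strip().partition(">")
        if not sep:
            raise ValueError(f"constraint without '>': {item!r}")
        constraints.append((label, target))
    return headword, constraints


# Two restricted UWs share one headword but denote different concepts.
device = parse_uw("crane(icl>device)")
bird = parse_uw("crane(icl>bird)")
basic = parse_uw("crane")
```

Both restricted UWs return the headword `crane`; only the constraint list tells the device from the bird.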

Page 13

The Lexicon

Format of the dictionary entry:

    [headword] {} “Universal word” (Attribute list);

e.g.,

    [minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN);

Here minister is the head word, minister(icl>person) is the universal word, and N,ANIMT,PHSCL,PRSN are the attributes.

Attributes:

  • Morphological - Pl (plural), V_ed (past tense form)
  • Syntactic - V (verb), VOA (verb of action)
  • Semantic - ANIMT (animate), PLACE, TIME

Page 14

The Lexicon (contd.)

He forwarded the mail to the minister.

Content words (each entry gives the headword, the Universal Word and the attributes):

[forward] {} “forward(icl>send)” (V,VOA) <E,0,0>;

[mail] {} “mail(icl>message)” (N,PHSCL,INANI) <E,0,0>;

[minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN) <E,0,0>;
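The entry format shown on these slides is regular enough to parse with one pattern. The sketch below is an assumption-laden illustration: the names `Entry` and `parse_entry` are hypothetical, the trailing `<E,0,0>` part is treated as an optional list of flags whose meaning the slides do not explain, and both straight and curly quotes are accepted around the universal word.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    uw: str
    attributes: tuple
    flags: tuple  # contents of the optional <...> part, uninterpreted


_ENTRY = re.compile(
    r'\[(?P<head>[^\]]+)\]\s*\{\}\s*'        # [headword] {}
    r'["“”](?P<uw>[^"“”]+)["“”]\s*'          # "universal word"
    r'\((?P<attrs>[^)]*)\)\s*'               # (attribute list)
    r'(?:<(?P<flags>[^>]*)>)?\s*;?'          # optional <E,0,0> and ';'
)


def _split(text):
    return tuple(part.strip() for part in text.split(",") if part.strip())


def parse_entry(line):
    match = _ENTRY.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    return Entry(
        headword=match["head"],
        uw=match["uw"],
        attributes=_split(match["attrs"]),
        flags=_split(match["flags"] or ""),
    )


forward = parse_entry('[forward] {} "forward(icl>send)" (V,VOA) <E,0,0>;')
minister = parse_entry('[minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN);')
```

An analyser could then look up `forward.attributes` to learn that the word is a verb of action before choosing a relation.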

Page 15

The Lexicon (contd.)

He forwarded the mail to the minister.

Function words (each entry gives the headword, the Universal Word and the attributes):

[he] {} “he” (PRON,SUB,SING,3RD)

[the] {} “the” (ART,THE) <E,0,0>;

[to] {} “to” (PRE,#TO) <E,0,0>;

Page 16

Multilingual dictionary

Universal word: farmer(icl>creator)

Language       Head word    Attributes
E (English)    farmer       N,ANIMT,FAUNA,MML,PRSN
M (Marathi)    शेतकरी       N,M,ANIMT,FAUNA,MML,PRSN
H (Hindi)      किसान        N,M,ANIMT,FAUNA,MML,PRSN,Na

Page 17

The Features of a UW

  • Every concept existing in any language must correspond to a UW
  • The constraint list should be as small as necessary to disambiguate the headword
  • Every UW should be defined in the UNL Knowledge-Base (now WordNet)

Page 18

Restricted UWs

  • Examples
      He will hold office until the spring of next year.
      The spring was broken.
  • Restricted UWs, which are Headwords with a constraint list, for example:
      “spring(icl>season)”
      “spring(icl>device)”
      “spring(icl>jump)”
      “spring(icl>fountain)”

Page 19

How to create UWs?

  • Pick up a concept
      the concept of “crane” as “a device for lifting heavy loads”
      or as “a long-legged bird that wades in water in search of food”
  • Choose an English word for the concept.
      In the case of “crane”, since it is a word of English, the corresponding word should be ‘crane’
  • Choose a constraint list for the word.
      ‘crane(icl>device)’
      ‘crane(icl>bird)’

Page 20

Example: Hindi word ghar

  • ghar - house
      usne garmii me ghar kii marammat kii
      he renovated the house in the summer
  • ghar - home
      office ke baad ghar louto
      return home after office
  • ghar - family
      bade ghar kii betii
      girl from a renowned family

Page 21

Example: ghar (contd.)

  • ghar - own country
      bahut saal bidesh me kaam karke ghar louta aayaa
      returned home after working abroad for many years
  • ghar - astrological position
      ashtam ghar par budh hai
      Mercury is in the eighth house

Page 22

House in English Wordnet

1. (1029) house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")

3. (51) house -- (a building in which something is sheltered or located; "they had a large carriage house")

4. (39) family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household")

Page 23

House in English Wordnet

7. (13) house -- (aristocratic family line; "the House of York")

11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)

Page 24

Unambiguous construction of UWs

  • Use constraints: Ontological, Semantic and Argument
  • Example: forward a mail to the minister
      forward(icl>do, icl>send, agt>thing(icl>animate), obj>thing(icl>inanimate), gol>thing)
  • Constraint types:
      icl>do: ontological
      icl>send: semantic
      agt>thing, obj>thing, gol>thing: argument
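The three constraint types can be told apart by a simple rule of thumb, sketched below. This is an illustration under stated assumptions, not the UNL specification: the set `ONTOLOGICAL_CLASSES` is a hypothetical list of top-level classes, any non-`icl` label is taken to be an argument constraint, and the names `split_top_level` and `classify_constraints` are invented for the example.

```python
# Assumption: a hypothetical set of top-level ontological classes.
ONTOLOGICAL_CLASSES = {"do", "be", "occur", "thing"}


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            parts.append(text[start:i].strip())
            start = i + 1
    parts.append(text[start:].strip())
    return parts


def classify_constraints(uw):
    """Return (headword, [(constraint, kind), ...]) for a restricted UW."""
    headword, _, rest = uw.strip().partition("(")
    classified = []
    for item in split_top_level(rest[:-1]):  # drop the closing ")"
        label, _, target = item.partition(">")
        if label != "icl":
            kind = "argument"        # agt>, obj>, gol>, ...
        elif target in ONTOLOGICAL_CLASSES:
            kind = "ontological"     # e.g. icl>do
        else:
            kind = "semantic"        # e.g. icl>send
        classified.append((item, kind))
    return headword, classified


UW = ("forward(icl>do, icl>send, agt>thing(icl>animate), "
      "obj>thing(icl>inanimate), gol>thing)")
headword, kinds = classify_constraints(UW)
```

Splitting only at top-level commas keeps nested constraints such as `agt>thing(icl>animate)` in one piece.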

Page 25

UNL Relations

Page 26

  • Relations constitute the syntax of UNL
  • Express how concepts (UWs) constitute a sentence
  • Represented as strings of 3 characters or less
  • A set of 41 relations specified in UNL (e.g., agt, aoj, ben, gol, obj, plc, src, tim,…)
  • Refer to a semantic role between two lexical items in a sentence

Page 27

AGT / AOJ / OBJ

  • AGT (Agent)
      Definition: Agt defines a thing which initiates an action
  • AOJ (Thing with attribute)
      Definition: Aoj defines a thing which is in a state or has an attribute
  • OBJ (Affected thing)
      Definition: Obj defines a thing in focus which is directly affected by an event or state

Page 28

Examples

  • John broke the window.
      agt(break.@entry.@past, John)
  • This flower is beautiful.
      aoj(beautiful.@entry, flower)
  • He blamed John for the accident.
      obj(blame.@entry.@past, John)

Page 29

Example: UNL Graph with agt, obj, ben

He carved a toy for the baby.

[Figure: UNL graph with entry node carve(icl>cut) carrying @entry and @past; agt edge to he(iof>person), obj edge to toy(icl>plaything), ben edge to baby(icl>child); the attribute @def also appears in the graph.]

Page 30

GOL / SRC

  • GOL (Goal: final state)
      Definition: Gol defines the final state of an object or the thing finally associated with an object of an event
  • SRC (Source: initial state)
      Definition: Src defines the initial state of an object or the thing initially associated with an object of an event

Page 31

GOL

I deposited my money in my bank account.

[Figure: UNL graph with entry node deposit(icl>put) carrying @entry and @past; agt edge to I, obj edge to money(icl>currency), gol edge to account(icl>statement); three mod edges attach bank(icl>possession) and I as modifiers.]

Page 32

SRC

They make a small income from fishing.

[Figure: UNL graph with entry node make(icl>do) carrying @entry and @present; agt edge to they(icl>persons), obj edge to income(icl>gain), src edge to fishing(icl>business); a mod edge attaches small(aoj>thing).]

Page 33

PUR

  • PUR (Purpose or objective)
      Definition: Pur defines the purpose or objectives of the agent of an event or the purpose of a thing that exists

This budget is for food.
pur(food.@entry, budget)
mod(budget, this)

Page 34

RSN

  • RSN (Reason)
      Definition: Rsn defines a reason why an event or a state happens

They selected him for his honesty.
agt(select(icl>choose).@entry, they)
obj(select(icl>choose).@entry, he)
rsn(select(icl>choose).@entry, honesty)

Page 35

TIM

  • TIM (Time)
      Definition: Tim defines the time an event occurs or a state is true

I wake up at noon.
agt(wake up.@entry, I)
tim(wake up.@entry, noon(icl>time))

Page 36

PLC

  • PLC (Place)
      Definition: Plc defines the place an event occurs or a state is true or a thing exists

Temples are very famous in India.
aoj(famous.@entry, temple.@pl)
man(famous.@entry, very)
plc(famous.@entry, India)

Page 37

INS

  • INS (Instrument)
      Definition: Ins defines the instrument to carry out an event

I solved it with computer.
agt(solve.@entry.@past, I)
ins(solve.@entry.@past, computer)
obj(solve.@entry.@past, it)

Page 38

INS

John covered the baby with a blanket.

[Figure: UNL graph with entry node cover(icl>do) carrying @entry and @past; agt edge to John(iof>person), obj edge to baby(icl>child), ins edge to blanket(icl>object); the attribute @def also appears in the graph.]

Page 39

Attributes

  • Constitute syntax of UNL
  • Play the role of bridging the conceptual world and the real world in the UNL expressions
  • Show how and when the speaker views what is said and with what intention, feeling, and so on
  • Seven types:
      • Time with respect to the speaker
      • Aspects
      • Speaker’s view of reference
      • Speaker’s emphasis, focus, topic, etc.
      • Convention
      • Speaker’s attitudes
      • Speaker’s feelings and viewpoints

Page 40

Tense: @past

The past tense is normally expressed by @past

He went there yesterday

{unl}
agt(go.@entry.@past, he)
…
{/unl}

Page 41

Aspects: @progress

It’s raining hard.

{unl}
man(rain.@entry.@present.@progress, hard)
{/unl}

Page 42

Speaker’s view of reference

  • @def (Specific concept (already referred))
      The house on the corner is for sale.
  • @indef (Non-specific class)
      There is a book on the desk
  • @not is always attached to the UW which is negated.
      He didn’t come.
      agt(come.@entry.@past.@not, he)

Page 43

Speaker’s emphasis

  • @emphasis
      John his name is.
      mod(name, he)
      aoj(John.@emphasis.@entry, name)
  • @entry denotes the entry point or main UW of a UNL expression

Page 44

How to generate UNL

Page 45

Early Enco (1996-98)

  • Analysis windows - Two in number
      Left Analysis Window (LAW)
      Right Analysis Window (RAW)
  • Condition windows - Many in number
      Left Condition Window (LCW)
      Right Condition Window (RCW)

[Figure: a sentence Word1 Word2 Word3 Word4 ... Wordn with the windows LCW, LAW, RAW and RCW placed over its words.]

Page 46

UNL Rule for a Semantic Relation

;Create relation between V and N2, after resolving the preposition preceding N2

<{V,VOA,:::}{N,TIME,DAY,ONRES,PRERES::tim:}P25;

IF
    the left analysis window is on a verb (V) which is a verb of action (VOA)
AND
    the right analysis window is on a noun (N) and has TIME, DAY attribute for which the preceding preposition (on) has been processed and deleted
THEN
    set up the tim relation between V and N2 (indicated by < at the start of the rule).
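The IF/AND/THEN reading of the rule can be mimicked in a few lines. The sketch below is only an illustration of the attribute test, not an Enco implementation: window movement and the priority P25 are not modelled, ONRES and PRERES are taken to be flags recording that the preposition "on" has already been resolved, and the names `Node` and `apply_tim_rule` as well as the example words are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    word: str
    attrs: set = field(default_factory=set)


# Attribute lists copied from the two windows of the rule.
LEFT_CONDITION = {"V", "VOA"}
RIGHT_CONDITION = {"N", "TIME", "DAY", "ONRES", "PRERES"}


def apply_tim_rule(law, raw):
    """Return a tim relation if both analysis windows satisfy the rule."""
    if LEFT_CONDITION <= law.attrs and RIGHT_CONDITION <= raw.attrs:
        return ("tim", law.word, raw.word)
    return None  # the rule does not fire


# Hypothetical state after "on" has been processed and deleted.
verb = Node("arrive", {"V", "VOA"})
day = Node("Monday", {"N", "TIME", "DAY", "ONRES", "PRERES"})
place = Node("Mumbai", {"N", "PLACE"})

fired = apply_tim_rule(verb, day)
skipped = apply_tim_rule(verb, place)
```

Using subset tests means a node may carry extra attributes and still match, which is how attribute lists in such rules are usually read.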

Page 47

UNL generation using NLP tools and resources

Page 48

SRS based system

Page 49

Multi parser based system

Page 50

Evaluation

Recall = (#expressions matched in gold and generated UNL) / (#expressions expected in gold UNL)

Precision = (#expressions matched in gold and generated UNL) / (#expressions in generated UNL)

F1 score = (2 * recall * precision) / (recall + precision)
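These three measures can be computed directly once gold and generated UNL expressions are available as sets. The sketch below assumes that a "match" means exact string equality of a relation expression, which is a simplifying assumption, and the function name `evaluate` and the sample expressions are hypothetical.

```python
def evaluate(gold, generated):
    """Return (precision, recall, f1) over two collections of expressions."""
    gold, generated = set(gold), set(generated)
    matched = len(gold & generated)
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(generated) if generated else 0.0
    if matched == 0:
        return precision, recall, 0.0  # avoid dividing by zero
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


gold = {
    "agt(eat.@entry, John)",
    "obj(eat.@entry, bread)",
}
generated = {
    "agt(eat.@entry, John)",
    "obj(eat.@entry, rice)",
    "ins(eat.@entry, spoon)",
}
precision, recall, f1 = evaluate(gold, generated)
```

Here one of the two gold expressions is recovered and one of the three generated expressions is correct, so recall is 1/2 and precision is 1/3.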

Page 51

Comparison between the two systems

Table Name                 Accuracy of XLE Parser Based System    Accuracy of Multi-parser based system
evalTb_OXF_V_TO_INF        0.8376                                 0.8591
evalTb_OXF_VN_TO_INF       0.8369                                 0.8429
evalTb_OXF_S_TO_DO_VERB    0.7833                                 0.7833
evalTb_XTAG                0.7181                                 0.7835
evalTb_FRAMENET            0.6618                                 0.7591
evalTb_RADFORD             0.8141                                 0.8542
evalTb_V                   0.5920                                 0.7587
evalTb_VN                  0.7528                                 0.7625
evalTb_VNN                 0.7692                                 0.7902
evalTb_VING                0.7084                                 0.7084
evalTb_VADJ                0.5486                                 0.6214
evalTb_VINF                0.7236                                 0.7772
evalTb_VTHAT               0.7988                                 0.7999
evalTb_TOI_Education       0.3875                                 0.3669
evalTb_test                0.4667                                 0.4667
evalTb_demo                1.0000                                 1.0000
evalTb_Test2               0.3913                                 0.5116
evalTb_t3                  0.7155                                 0.8553
evalTb_Barcelona           0.3194                                 0.3181
Total                      0.6489                                 0.7010

Page 52

Use of UNL in multiple NLP tasks

Language Processing & Understanding

  • Information Extraction: Part of Speech tagging, Named Entity Recognition, Shallow Parsing, Summarization
  • Machine Learning: Semantic Role labeling, Sentiment Analysis, Text Entailment (web 2.0 applications), using graphical models, support vector machines, neural networks
  • IR: Cross Lingual Search, Crawling, Indexing, Multilingual Relevance Feedback
  • Machine Translation: Statistical, Interlingua Based, English to Indian languages, Indian languages to Indian languages, Indowordnet

Resources: http://www.cfilt.iitb.ac.in
Publications: http://www.cse.iitb.ac.in/~pb

Linguistics is the eye and computation the body

Page 53

Summing up

  • Some NLP milestones covered
      • WSD: various approaches
      • SMT
      • Parsing (classical and probabilistic)
      • Phonology, Phonetics, Syllabification, Transliteration
      • Semantics, UNL
  • Assignments: to reinforce understanding of lectures
  • Important topics left out: IR, Similarity measures
  • Seminars: wide range of topics for breadth and exposure
  • Lectures: Foundation and depth

Page 54

God Bless!!