Drop me a mail: [email protected]
Visit me at: http://rushdishams.googlepages.com
Rushdi Shams, Lecturer, Dept of CSE, KUET, Bangladesh

L1 l2 l3 introduction to machine translation


• Peter mentioned the book I sent to Mary
• We will give medicines to pregnant women and children
• I saw the boy with the telescope
• The painter put on another coat
• We like flying planes
• The judge threw the book at him
• Visiting relatives can be tiresome
• Da Vinci liked to paint his models nude.
• He wrote the note yesterday
• You mean you carried the information by a bus?
• Connecting wires are tiring in DLD lab
• Squad helps dog bite victim

Why use computers in translation?

• Too much translation for humans
• Technical materials too boring for humans
• Greater consistency required
• Need results more quickly
• Not everything needs to be top quality
• Reduce costs

• Any one of these may justify machine translation or computer aids

Components of a Language

• There are three components of a language:
1. Lexicon
2. Categorization
3. Grammar Rules

Lexicon

stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ...
is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | ...
right | left | east | south | back | smelly | ...
here | there | nearby | ahead | right | left | east | south | back | ...
me | you | I | it | S=HE | Y’ALL | ...
John | Mary | Boston | UCB | PAJC | ...
the | a | an | ...
to | in | on | near | ...
and | or | but | ...
0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

Categorization

Noun > stench | breeze | glitter | nothing | wumpus | pit | pits | gold | east | ...
Verb > is | see | smell | shoot | feel | stinks | go | grab | carry | kill | turn | ...
Adjective > right | left | east | south | back | smelly | ...
Adverb > here | there | nearby | ahead | right | left | east | south | back | ...
Pronoun > me | you | I | it | S=HE | Y’ALL | ...
Name > John | Mary | Boston | UCB | PAJC | ...
Article > the | a | an | ...
Preposition > to | in | on | near | ...
Conjunction > and | or | but | ...
Digit > 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
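
A categorized lexicon like the one above can be sketched as a Python dictionary. This is only an illustration, not part of the original slides: the name `LEXICON` and the helper `categories_of` are hypothetical, and the word lists are the truncated ones from the slide (the pronoun list is cut down to the unambiguous entries).

```python
# Word categories copied from the Categorization slide (truncated lists).
LEXICON = {
    "Noun": ["stench", "breeze", "glitter", "nothing", "wumpus", "pit", "pits", "gold", "east"],
    "Verb": ["is", "see", "smell", "shoot", "feel", "stinks", "go", "grab", "carry", "kill", "turn"],
    "Adjective": ["right", "left", "east", "south", "back", "smelly"],
    "Adverb": ["here", "there", "nearby", "ahead", "right", "left", "east", "south", "back"],
    "Pronoun": ["me", "you", "I", "it"],  # subset of the slide's list
    "Name": ["John", "Mary", "Boston", "UCB", "PAJC"],
    "Article": ["the", "a", "an"],
    "Preposition": ["to", "in", "on", "near"],
    "Conjunction": ["and", "or", "but"],
    "Digit": list("0123456789"),
}


def categories_of(word):
    """Return every category whose word list contains `word` (hypothetical helper)."""
    return [category for category, words in LEXICON.items() if word in words]


print(categories_of("wumpus"))  # ['Noun']
print(categories_of("east"))    # ['Noun', 'Adjective', 'Adverb']
```

Note that a word such as east falls into several categories at once, which is one reason a lexicon alone is not enough to analyse a sentence.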

Grammar Structure

• Attending this lecture and the one following it carefully does not mean you will know all of the English language
• That would mean studying NLP as a single subject for 4 years! ☺
• We will learn how to define the basic grammar structure for NLP systems
• We will also learn what you need to keep in mind while devising such systems

Syntactic Tree

• Humans recognize the organization of words in a sentence, according to their parts of speech (POS), as trees.
• Do you deny it?
• Well, you can, because you didn’t learn it this way in your childhood.
• No one did!
• But it has been proved that our brain draws a tree-like structure when we first develop our language skills
• That research is beyond the scope of this lecture

Syntactic Tree

• So, if you really do that unintentionally, then why not learn it on pen and paper so that you can understand how you will teach machines to learn languages?
• The tree structure that humans contemplate is called a syntactic tree

Parsing a Syntactic Tree

• Parsing is the process of using grammar rules to determine whether a sentence is legal, and to obtain its syntactical structure
• ‘The large cat eats the small rat’

Parsing

[Figure: the syntactic tree of ‘The large cat eats the small rat’, built bottom-up over five slides]

1. Words: The large cat eats the small rat
2. Categories: Article adjective noun Verb Article adjective noun
3. The second ‘Article adjective noun’ (the small rat) is grouped into a noun phrase
4. The first ‘Article adjective noun’ (The large cat) forms a noun phrase, and ‘Verb noun phrase’ forms a verb phrase
5. ‘Noun phrase verb phrase’ forms the sentence
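
The bottom-up steps on the parsing slides can be sketched as a tiny shift-reduce parser. This is a minimal sketch under stated assumptions, not the lecture's own method: the three grammar rules are read off the slides, while `LEXICON`, `RULES` and `parse` are hypothetical names, and the greedy "reduce whenever possible" strategy only works for toy grammars like this one.

```python
# Word categories for the example sentence only.
LEXICON = {"the": "Article", "large": "Adjective", "small": "Adjective",
           "cat": "Noun", "rat": "Noun", "eats": "Verb"}

# Grammar rules read off the parsing slides: parent <- sequence of children.
RULES = [
    ("NounPhrase", ("Article", "Adjective", "Noun")),
    ("VerbPhrase", ("Verb", "NounPhrase")),
    ("Sentence", ("NounPhrase", "VerbPhrase")),
]


def parse(sentence):
    """Return the syntactic tree as nested tuples, or None if the sentence is not legal."""
    stack = []
    for word in sentence.lower().split():
        if word not in LEXICON:
            return None
        stack.append((LEXICON[word], word))      # shift: push (category, word)
        reduced = True
        while reduced:                           # reduce as long as a rule matches
            reduced = False
            for parent, children in RULES:
                n = len(children)
                top = tuple(node[0] for node in stack[-n:])
                if len(stack) >= n and top == children:
                    stack[-n:] = [(parent, *stack[-n:])]
                    reduced = True
                    break
    if len(stack) == 1 and stack[0][0] == "Sentence":
        return stack[0]
    return None


tree = parse("The large cat eats the small rat")
print(tree[0])                               # Sentence
print(parse("The cat eats the small rat"))   # None (no rule for Article + Noun)
```

The second call shows the "is the sentence legal?" half of parsing: with only these three rules, a noun phrase without an adjective is rejected.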

Syntactic Tree

• The point where lines begin or end is called a node
• Each node has a label, such as S, PP or chased
• If 2 nodes are connected by a line, the upper node is the immediate dominator of the lower node. D is the immediate dominator of ‘the’
• All upper nodes in a branch are called dominators. NP is a dominator of D, N, ‘the’ and ‘dog’

Syntactic Tree

• Two nodes are sisters if they are immediately dominated by the same node. D and N are sisters.
• Their immediate dominator is called their mother. NP is the mother of D and N. Similarly, D and N are daughters of NP
• More generally, the immediate dominator of a node is called its parent
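
The relations between nodes (immediate dominator, dominators, sisters, mother) can be sketched in Python. The tree figure itself is missing from this transcript, so the tree below for ‘the dog chased a cat into the garden’ is an assumption reconstructed from the labels the slides mention (S, NP, VP, PP, D, N); the `Node` class and its methods are hypothetical names.

```python
class Node:
    """A tree node that remembers its mother (immediate dominator)."""

    def __init__(self, label, *children):
        self.label = label
        self.children = list(children)
        self.mother = None
        for child in children:
            child.mother = self

    def dominators(self):
        """All nodes above this one in its branch, nearest first."""
        result, node = [], self.mother
        while node is not None:
            result.append(node)
            node = node.mother
        return result

    def sisters(self):
        """Other nodes immediately dominated by the same mother."""
        if self.mother is None:
            return []
        return [node for node in self.mother.children if node is not self]


# Assumed tree for "the dog chased a cat into the garden".
the, dog = Node("the"), Node("dog")
det, noun = Node("D", the), Node("N", dog)
subject_np = Node("NP", det, noun)
vp = Node("VP",
          Node("V", Node("chased")),
          Node("NP", Node("D", Node("a")), Node("N", Node("cat"))),
          Node("PP", Node("P", Node("into")),
               Node("NP", Node("D", Node("the")), Node("N", Node("garden")))))
sentence = Node("S", subject_np, vp)

print(the.mother.label)                         # D
print([n.label for n in the.dominators()])      # ['D', 'NP', 'S']
print([n.label for n in det.sisters()])         # ['N']
```

Storing the mother on each node makes "walk upwards" questions cheap; the daughters of a node are simply its `children` list.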

Syntactic Tree

• Constituents are the terminal nodes that are all dominated by a single non-terminal node. ‘chased a cat into the garden’ is a constituent, as its words are all dominated by VP

Label Bracketing

• It is a process of representing the syntactic tree in another way.
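
As a rough illustration of label bracketing, the sketch below prints a tree as nested, labelled square brackets. The bracket notation `[Label child child]` is one common convention and may differ from the one drawn on the original slide; the tree `TREE` and the function `to_brackets` are assumptions made for this example.

```python
# Assumed tree for "the dog chased a cat into the garden":
# each node is (label, child, child, ...), and a plain string is a word.
TREE = ("S",
        ("NP", ("D", "the"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("D", "a"), ("N", "cat")),
               ("PP", ("P", "into"),
                      ("NP", ("D", "the"), ("N", "garden")))))


def to_brackets(node):
    """Render a (label, child, ...) tuple as a labelled bracketing string."""
    if isinstance(node, str):          # a word (terminal node)
        return node
    label, *children = node
    return "[" + label + " " + " ".join(to_brackets(child) for child in children) + "]"


print(to_brackets(TREE))
# [S [NP [D the] [N dog]] [VP [V chased] [NP [D a] [N cat]] [PP [P into] [NP [D the] [N garden]]]]]
```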

Do it yourself: label-bracket the tree

Remember, you may have to practise the reverse: constructing a syntactic tree from label bracketing ☺
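
For checking your own answers, the reverse direction can be sketched as a small reader that turns a bracketed string back into a tree. It assumes the `[Label child child]` square-bracket notation, which may not match the notation of the original figure, and `from_brackets` is a hypothetical name; badly balanced input simply raises an error.

```python
import re


def from_brackets(text):
    """Parse a labelled bracketing such as "[NP [D the] [N dog]]" into nested tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)   # brackets, labels and words
    pos = 0

    def read_node():
        nonlocal pos
        if tokens[pos] != "[":        # a plain word (terminal node)
            word = tokens[pos]
            pos += 1
            return word
        pos += 1                      # skip "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            children.append(read_node())
        pos += 1                      # skip "]"
        return (label, *children)

    tree = read_node()
    if pos != len(tokens):
        raise ValueError("unexpected text after the tree")
    return tree


print(from_brackets("[NP [D the] [N dog]]"))
# ('NP', ('D', 'the'), ('N', 'dog'))
```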

Constituents and Categories

• Tree structure provides two pieces of information:
1. It divides the sentence into constituents (in English, these are called phrases)
2. It puts them into categories (NP, VP, etc)

Constituents and Categories

• How do we know what would be the right way to group words into the right category?
• How do we know ‘into the garden’ is a category, but ‘a cat into’ is not?
• Any words that can be moved as a group are probably constituents: ‘the dog chased a cat into the garden’ and ‘into the garden, the dog chased a cat’ have the same meaning.
• Which one did you move? ‘Into the garden’, right?
• And the meaning did not change
• That’s probably our constituent

Constituents and Categories

• Any string of words that can be deleted is probably a constituent
• If you omit ‘into the garden’ from the sentence, nothing is changed grammatically.
• Usually, a constituent makes sense as a unit of meaning. ‘Into the garden’ is much more meaningful than ‘a cat into’
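
Once a syntactic tree is available, the question "is this group of words a constituent?" can be checked mechanically. The sketch below is an illustration only: it assumes a particular tree for ‘the dog chased a cat into the garden’, and the helpers `words`, `constituents` and `is_constituent` are hypothetical names.

```python
# Assumed tree: each node is (label, child, ...), a plain string is a word.
TREE = ("S",
        ("NP", ("D", "the"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("D", "a"), ("N", "cat")),
               ("PP", ("P", "into"),
                      ("NP", ("D", "the"), ("N", "garden")))))


def words(node):
    """List the words (terminal nodes) under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1:] for word in words(child)]


def constituents(node):
    """Yield (label, word string) for every non-terminal node in the tree."""
    if isinstance(node, str):
        return
    yield node[0], " ".join(words(node))
    for child in node[1:]:
        yield from constituents(child)


def is_constituent(phrase, tree=TREE):
    """True if some single node dominates exactly the words of `phrase`."""
    return any(text == phrase for _, text in constituents(tree))


print(is_constituent("into the garden"))  # True
print(is_constituent("a cat into"))       # False
```

The movement and deletion tests are how a human decides on the tree in the first place; the code only reads the answer back off a tree that has already been drawn.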

Constituents and Categories

• However, we are only talking about syntactic structure, not the semantic one.
• The dog, the cat and the garden: their grammar structure says they are all noun phrases.
• It means they can be used interchangeably; no linguist can deny that
• Then what about “The garden chased the cat into the dog”? ☺
• “We will not focus on semantics”, you said before!

Ambiguity

• There are 2 types of ambiguity:
1. Lexical Ambiguity: Sentence contains an idiom/word/term that has more than one meaning.
Glasses means both drinking glasses and spectacles
2. Structural Ambiguity: Sentence has more than one syntactic tree
I saw the boy with the telescope:
Did you use a telescope to see the boy? Or
Did you see the boy who had a telescope?

Structural Ambiguity
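
That ‘I saw the boy with the telescope’ has more than one syntactic tree can be checked by counting parses. The sketch below uses a CKY-style chart over a small grammar written for this one sentence; the grammar, the lexicon and the function `count_trees` are all assumptions made for illustration, not taken from the slides.

```python
from collections import defaultdict

# Toy grammar in binary form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "with the telescope" attaches to the verb phrase
    ("NP", "D", "N"),
    ("NP", "NP", "PP"),   # "with the telescope" attaches to "the boy"
    ("PP", "P", "NP"),
]
LEXICON = {"i": "NP", "saw": "V", "the": "D", "boy": "N",
           "telescope": "N", "with": "P"}


def count_trees(sentence):
    """Count the distinct syntactic trees the toy grammar assigns to the sentence."""
    words = sentence.lower().split()   # every word must be in LEXICON
    n = len(words)
    # chart[i][j][X] = number of ways category X can cover words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        chart[i][i + 1][LEXICON[word]] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n]["S"]


print(count_trees("I saw the boy"))                      # 1
print(count_trees("I saw the boy with the telescope"))   # 2
```

The two trees differ only in where the prepositional phrase attaches: to the verb phrase (seeing with a telescope) or to the noun phrase (the boy who has the telescope).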

Difficulties with Natural Language: Anaphora

• Using pronouns to refer back to entities already introduced in the text

After Mary proposed to John, they found a preacher and got married.
For the honeymoon, they went to Hawaii
Mary saw a ring through the window and asked John for it
Mary threw a rock at the window and broke it

Difficulties with Natural Language: Indexicality

• Indexical sentences refer to the utterance situation (place, time, S/H, etc.)

I am over here
Why did you do that?

Difficulties with Natural Language: Metonymy

• Using one noun phrase to stand for another

I've read Shakespeare
Chrysler announced record profits
The ham sandwich on Table 4 wants another beer

Difficulties with Natural Language: Metaphor

• “Non-literal” usage of words and phrases, often systematic.

I've tried killing the process but it won't die. Its parent keeps it alive.

Semantics in NL

• I can't untie that knot with one hand.
– The sentence is about the abilities of whoever spoke or wrote it. (Call this person the speaker.)
– It's also about a knot, maybe one that the speaker is pointing at
– The sentence denies that the speaker has a certain ability. (This is the contribution of the word `can't'.)
– Untying is a way of making something not tied.
– The sentence doesn't mean that the knot has one hand; it has to do with how many hands are used to do the untying.

Problems in Semantics in NL

• If you do not understand certain characteristics of linguistics, you will not be able to understand the semantics.
• If you do understand them, you need to feel them
• If you do feel them, you need to see the context
• If you see the context, you are dealing with both semantics and pragmatics in NL ☺

Synonymy

• Synonyms are different words (or sometimes phrases) with identical or very similar meanings.
• Words that are synonyms are said to be synonymous, and the state of being a synonym is called synonymy

Synonymy

• student and pupil (noun)
• buy and purchase (verb)
• sick and ill (adjective)
• quickly and speedily (adverb)
• on and upon (preposition)

Synonymy

• Note that synonyms are defined with respect to certain senses of words
• pupil as the "aperture in the iris of the eye" is not synonymous with student.
• Similarly, ‘he expired’ means the same as ‘he died’, yet ‘my passport has expired’ cannot be replaced by ‘my passport has died’.
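
One way to respect this in a program is to store synonyms per sense rather than per word. The following is a minimal sketch: the sense labels, the table `SYNONYMS` and the helper `can_substitute` are made up for illustration and do not come from any real lexical resource.

```python
# Synonym sets keyed by (word, sense); the sense labels are made up for this sketch.
SYNONYMS = {
    ("pupil", "learner"): {"student"},
    ("pupil", "part of the eye"): set(),
    ("expire", "die"): {"die"},
    ("expire", "lapse"): set(),
}


def can_substitute(word, sense, candidate):
    """True if `candidate` is a synonym of `word` in the given sense."""
    return candidate in SYNONYMS.get((word, sense), set())


print(can_substitute("pupil", "learner", "student"))          # True
print(can_substitute("pupil", "part of the eye", "student"))  # False
print(can_substitute("expire", "lapse", "die"))               # False
```

The hard part, deciding which sense a word has in a given sentence, is not solved here; the table only shows why that decision has to come first.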

Antonymy

• Antonyms are words with opposite or nearly opposite meanings. For example:
• short and tall
• dead and alive
• increase and decrease

Homonymy

• A homonym is one of a group of words that share the same spelling and the same pronunciation but have different meanings, usually as a result of the two words having different origins.
• The state of being a homonym is called homonymy.
• bark (the sound of a dog) and bark (the skin of a tree).

Heteronymy

• Heteronyms (also known as heterophones) are words with identical spellings (or characters) but different pronunciations and meanings.

Monolingual ambiguity

• morphological ambiguity:
– German -en: noun plural, dative plural, weak noun non-nominative, adjective masculine non-nominative, etc.
• compound nouns:
– coincide -> coin+cide, cooperate -> cooper+ate
• category ambiguity:
– round: the first round (noun), to round up cattle (verb), the round table (adjective), go on a voyage round the Mediterranean (preposition), it measures three feet round (adverb), etc.
• homographs and polysemes:
– branch: ‘of a tree’, ‘of a bank’; crane (a bird or lifting machine)
– ball: The ball rolled down the hill, The ball lasted until midnight

Bilingual lexical ambiguity

• English wall: German Mauer (outside) or Wand (inside)
• English river: French fleuve (major) or rivière (general term)
• English leg: French jambe (human), patte (animal, insect), pied (table), étape (journey)
• English blue: Russian goluboi (pale blue) or sinii (dark blue)
• French louer: English hire or rent
• German leihen: English borrow or lend
• English wear: Japanese haoru (coat/jacket), haku (shoes/trousers), kaburu (hat), hameru (ring/gloves), shimeru (belt/tie/scarf), tsukeru (brooch/clip), kakeru (glasses/necklace)

• resolvable by:
– rules (indicating allowable or usual categories or types of subjects, objects, verbs, etc.)
– collocations (specifying particular adjacent words)
– frequencies (most probable adjacent or dependent words)
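
The rule-based option can be sketched for the English wear example: the type of the object decides which Japanese verb to use. The table below copies the pairings from the slide; the names `WEAR_BY_OBJECT` and `translate_wear` are hypothetical, and a real system would need far larger object classes than these few nouns.

```python
# Japanese verbs for English "wear", selected by the object noun (pairings from the slide).
WEAR_BY_OBJECT = {
    "haoru": {"coat", "jacket"},
    "haku": {"shoes", "trousers"},
    "kaburu": {"hat"},
    "hameru": {"ring", "gloves"},
    "shimeru": {"belt", "tie", "scarf"},
    "tsukeru": {"brooch", "clip"},
    "kakeru": {"glasses", "necklace"},
}


def translate_wear(direct_object):
    """Pick the Japanese verb for 'wear' from its object noun, or None if no rule applies."""
    for verb, objects in WEAR_BY_OBJECT.items():
        if direct_object.lower() in objects:
            return verb
    return None


print(translate_wear("hat"))      # kaburu
print(translate_wear("glasses"))  # kakeru
print(translate_wear("perfume"))  # None
```

When no rule applies, a system could fall back on frequencies and pick the most probable verb, which is the third option in the list above.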

Structural ambiguity

• Flying planes can be dangerous
• The man saw the girl with a telescope
• John mentioned the book I sent to Mary
• I told everyone concerned about the strike
– everyone concerned/involved/relevant, or: everyone disturbed/worried
• He noticed her shaking hands
– either which were shaking from cold, or which were shaking other hands
• They complained to the guide that they could not hear
– that as relative pronoun (‘whom they could not hear’) or as complementizer (‘that they could not hear him’)
• The mathematics students sat their examinations
• The mathematics students study today is very complex
– difficulty of identifying noun compound vs. relative clause
• Gas pump prices rose last time oil stocks fell
– each word potentially noun or verb
