Final Assignment Demo, 11th Nov, 2012
Deepak Suyel, Geetanjali Rakshit, Sachin Pawar
CS 626 – Speech, NLP and the Web

Assignments

• POS Tagger
  – Bigram Viterbi
  – Trigram Viterbi
  – A-Star
  – Bigram Discriminative Viterbi

• Language Model (Word Prediction)
  – Bigram
  – Trigram

• Yago Explorer

• Parser Projection and NLTK

POS Tagger

Viterbi: Generative Model

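• Most probable tag sequence given word sequence:

$$T^* = \arg\max_{T} P(T \mid W) = \arg\max_{T} P(T)\,P(W \mid T)$$

• Bigram Model:

$$T^* = \arg\max_{T} \prod_{i=1}^{n} P(t_i \mid t_{i-1})\,P(w_i \mid t_i)$$

• Trigram Model:

$$T^* = \arg\max_{T} \prod_{i=1}^{n} P(t_i \mid t_{i-2}, t_{i-1})\,P(w_i \mid t_i)$$

Here W = w_1 … w_n is the word sequence, T = t_1 … t_n is the tag sequence, and t_0 (and t_{-1}) are sentence-start tags.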
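The bigram decoder can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary layout of `trans` and `emit`, the start symbol `^`, the toy probabilities, and the floor value used for unseen pairs are all assumptions. Scores are added as log-probabilities so long sentences do not underflow.

```python
import math


def viterbi_bigram(words, tags, trans, emit, start="^"):
    """Most probable tag sequence under a bigram HMM (illustrative sketch).

    trans[(prev_tag, tag)] = P(tag | prev_tag)
    emit[(tag, word)]      = P(word | tag)
    Unseen pairs get a tiny floor probability (an assumption, not real smoothing).
    """
    floor = 1e-12

    def logp(table, key):
        return math.log(table.get(key, floor))

    # best[tag] = (log score of the best path ending in tag, that path)
    best = {start: (0.0, [])}
    for word in words:
        step = {}
        for tag in tags:
            # Extend every surviving path by `tag` and keep only the best extension.
            step[tag] = max(
                (score + logp(trans, (prev, tag)) + logp(emit, (tag, word)), path + [tag])
                for prev, (score, path) in best.items()
            )
        best = step
    return max(best.values())[1]


# Toy model (made-up numbers)
tags = ("N", "V")
trans = {("^", "N"): 0.8, ("^", "V"): 0.2, ("N", "N"): 0.3,
         ("N", "V"): 0.7, ("V", "N"): 0.6, ("V", "V"): 0.4}
emit = {("N", "fish"): 0.5, ("V", "fish"): 0.2,
        ("N", "swim"): 0.1, ("V", "swim"): 0.5}
print(viterbi_bigram(["fish", "swim"], tags, trans, emit))  # ['N', 'V']
```

The trigram model uses the same recurrence, but each state is a tag pair (t_{i-1}, t_i), so transitions are looked up as P(t_i | t_{i-2}, t_{i-1}). The sketch also omits an end-of-sentence transition.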

Discriminative Bigram Model

• Most probable tag sequence given word sequence:

$$T^* = \arg\max_{T} P(T \mid W) = \arg\max_{T} \prod_{i=1}^{n} P(t_i \mid t_{i-1}, w_i)$$
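Decoding the discriminative model uses the same Viterbi recurrence, but each step is scored by a single probability P(t_i | t_{i-1}, w_i) instead of a transition times an emission. A minimal sketch; the key layout of the hypothetical `local` table, the `^` start symbol, and the toy numbers are illustrative assumptions, not taken from the assignment.

```python
import math


def viterbi_discriminative(words, tags, local, start="^"):
    """Best tag sequence when each step is scored by P(t_i | t_{i-1}, w_i).

    local[(prev_tag, word, tag)] holds that probability (assumed to be estimated
    beforehand, e.g. from relative frequencies); unseen keys get a tiny floor.
    """
    best = {start: (0.0, [])}  # tag -> (log score, best path ending in tag)
    for word in words:
        best = {
            tag: max(
                (score + math.log(local.get((prev, word, tag), 1e-12)), path + [tag])
                for prev, (score, path) in best.items()
            )
            for tag in tags
        }
    return max(best.values())[1]


local = {("^", "fish", "N"): 0.7, ("^", "fish", "V"): 0.3,
         ("N", "swim", "V"): 0.9, ("N", "swim", "N"): 0.1,
         ("V", "swim", "N"): 0.6, ("V", "swim", "V"): 0.4}
print(viterbi_discriminative(["fish", "swim"], ("N", "V"), local))  # ['N', 'V']
```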

A-Star Heuristic

• A: highest transition probability
  – Static score, which can be found directly from the learned model

• B: highest lexical probability in the given sentence
  – Dynamic score

• Min_cost = -log(A) - log(B)

• h(n) = Min_cost * (no. of hops till goal state)
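Assuming a hop's cost is -log P(t_i | t_{i-1}) - log P(w_i | t_i), this heuristic never overestimates: no transition probability exceeds A and no lexical probability for a word of the sentence exceeds B, so every remaining hop costs at least Min_cost. A minimal sketch of computing h(n), assuming a hypothetical dictionary layout (`trans[(prev, tag)]`, `emit[(tag, word)]`) and toy values:

```python
import math


def min_cost(trans, emit, sentence):
    """Min_cost = -log(A) - log(B), following the definitions above.

    A: highest transition probability in the learned model (static).
    B: highest lexical probability P(word | tag) over this sentence's words (dynamic).
    """
    a = max(trans.values())
    words = set(sentence)
    b = max(p for (_tag, word), p in emit.items() if word in words)
    return -math.log(a) - math.log(b)


def h(hops_to_goal, cost):
    """h(n) = Min_cost * (no. of hops till goal state)."""
    return cost * hops_to_goal


trans = {("^", "N"): 0.8, ("N", "V"): 0.7, ("V", "N"): 0.6}
emit = {("N", "fish"): 0.5, ("V", "swim"): 0.5, ("N", "dog"): 0.9}
c = min_cost(trans, emit, ["fish", "swim"])  # "dog" is ignored: not in the sentence
print(round(h(2, c), 4))  # 1.8326
```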

Comparison of different flavours of POS Taggers

POS tagger                         Correct    Total     Accuracy (%)
Bigram Generative Viterbi          812188     862785    94.14
Trigram Generative Viterbi         814505     862785    94.40
A-Star                             793441     862785    91.96
Bigram Discriminative Viterbi      796890     862785    92.36

Language Model

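Next word prediction: Bigram Model

• Using a language model on raw text: $\hat{w}_i = \arg\max_{w} P(w \mid w_{i-1})$

• Using a language model on POS-tagged text: $\hat{w}_i = \arg\max_{w} P(w \mid w_{i-1}, t_{i-1})$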
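The two bigram predictors differ only in what counts as the context. A minimal sketch, assuming unsmoothed maximum-likelihood counts, a corpus given as (word, tag) pairs, and the `TAG_word` context format used in the examples later in this deck; the helper names, toy sentences, and C5-style tags are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tagged, use_pos):
    """Count next words after each context: w_{i-1}, or TAG_w_{i-1} when use_pos."""
    counts = defaultdict(Counter)
    for (prev_word, prev_tag), (word, _tag) in zip(tagged, tagged[1:]):
        context = f"{prev_tag}_{prev_word}" if use_pos else prev_word
        counts[context][word] += 1
    return counts


def predict(counts, context):
    """argmax_w P(w | context) under maximum-likelihood counts; None if unseen."""
    if context not in counts:
        return None
    return counts[context].most_common(1)[0][0]


tagged = [("i", "PNP"), ("can", "VM0"), ("swim", "VVI"),
          ("the", "AT0"), ("can", "NN1"), ("is", "VBZ"),
          ("we", "PNP"), ("can", "VM0"), ("swim", "VVI")]
raw_lm = train_bigram(tagged, use_pos=False)
pos_lm = train_bigram(tagged, use_pos=True)
print(predict(raw_lm, "can"))      # swim  (2 of the 3 continuations of "can")
print(predict(pos_lm, "NN1_can"))  # is    (the noun reading has its own counts)
```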

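Next word prediction: Trigram Model

• Using a language model on raw text: $\hat{w}_i = \arg\max_{w} P(w \mid w_{i-2}, w_{i-1})$

• Using a language model on POS-tagged text: $\hat{w}_i = \arg\max_{w} P(w \mid w_{i-2}, t_{i-2}, w_{i-1}, t_{i-1})$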
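For the trigram case the context is the two previous tokens. The sketch below also measures word prediction accuracy, i.e. how often the top prediction equals the actual next word. It assumes unsmoothed counts, a toy corpus made up for illustration, and, for brevity, evaluation on the training data itself, so its numbers say nothing about the results reported in this deck. Function names are hypothetical.

```python
from collections import Counter, defaultdict


def context(prev_two, use_pos):
    """(w_{i-2}, w_{i-1}), or their TAG_word forms when use_pos is True."""
    return tuple(f"{tag}_{word}" if use_pos else word for word, tag in prev_two)


def train_trigram(tagged, use_pos):
    counts = defaultdict(Counter)
    for i in range(2, len(tagged)):
        counts[context(tagged[i - 2:i], use_pos)][tagged[i][0]] += 1
    return counts


def prediction_accuracy(counts, tagged, use_pos):
    """Share of positions where the most frequent continuation is the actual next word."""
    hits = total = 0
    for i in range(2, len(tagged)):
        ctx = context(tagged[i - 2:i], use_pos)
        total += 1
        if ctx in counts and counts[ctx].most_common(1)[0][0] == tagged[i][0]:
            hits += 1
    return hits / total if total else 0.0


# "saw her duck fly" twice, then "saw her duck down" (toy sentences, concatenated)
tagged = ([("saw", "VVD"), ("her", "DPS"), ("duck", "NN1"), ("fly", "VVI")] * 2
          + [("saw", "VVD"), ("her", "PNP"), ("duck", "VVI"), ("down", "AVP")])
for use_pos in (False, True):
    lm = train_trigram(tagged, use_pos)
    print(use_pos, prediction_accuracy(lm, tagged, use_pos))
# False 0.9
# True 1.0
```

In the raw model the context (her, duck) is shared by both readings, so "down" is always mispredicted; the tags separate DPS_her NN1_duck from PNP_her VVI_duck.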

Metrics: Comparing Language Models

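• We used perplexity to compare two language models:
  – a language model using only the previous word
  – a language model using the previous word as well as the POS tag of the previous word

• Perplexity is the weighted average branching factor, calculated as

$$PP(W) = P(w_1 w_2 \ldots w_N)^{-1/N} = \left( \prod_{i=1}^{N} \frac{1}{P(w_i \mid h_i)} \right)^{1/N}$$

where h_i is the context the model conditions on (the previous word, or the previous word and its POS tag).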
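In practice the formula is evaluated as the exponential of the average negative log-probability, which avoids multiplying many small numbers. A minimal sketch, assuming the model has already produced P(w_i | h_i) for every test word (the function name is illustrative):

```python
import math


def perplexity(word_probs):
    """PP(W) = P(w_1 ... w_N) ** (-1/N), where word_probs[i] = P(w_i | h_i)."""
    n = len(word_probs)
    avg_neg_log = -sum(math.log(p) for p in word_probs) / n
    return math.exp(avg_neg_log)


print(round(perplexity([0.25] * 4), 6))    # 4.0: like a uniform choice among 4 words
print(round(perplexity([0.5, 0.125]), 6))  # 4.0: geometric mean of 2 and 8
```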

Results

• Raw text LM:
  – Word prediction accuracy: 12.97%
  – Perplexity: 5451

• POS-tagged text LM:
  – Word prediction accuracy: 13.24%
  – Perplexity: 5002

Examples

Raw text (incorrect)               POS-tagged text (correct)
porridgy liquid is : fertiliser    AJ0_porridgy NN1_liquid is : is
malt dissolve into : terms         NN1_malt VVB_dissolve into : into
also act as : of                   AV0_also VVB_act as : as

Examples (contd.)

about english literature : and     PRP_about AJ0_english literature : literature
spoken english was : literature    AJ0_spoken NN1_english was : was

Yago Explorer

• Made use of:
  – WikipediaCategories
  – WordNetCategories, and
  – YagoFacts.

• A modified breadth-first search (BFS).

Algorithm

• Input: entities E1, E2

• Output: paths between E1 and E2

• Procedure:

1. Find the WikipediaCategories of E1 and E2. If any category matches, return.

2. Find the WordNetCategories of E1 and E2. If any match is found, return.

3. Find the YagoFacts of E1 and E2. If any match is found, return.

4. Expand the YagoFacts of E1 and E2. For each pair of entities obtained from E1 and E2, repeat steps 1-4.
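The procedure can be sketched as a two-sided, level-by-level search. This is a toy illustration, not the authors' code: the in-memory dictionaries stand in for the YAGO data (their names and contents are hypothetical, modelled on the Mahesh Bhupathi and Mother Teresa example), a fact match is taken to mean that both sides reach the same entity, and edges are followed only forward, whereas the Narendra Modi and Mohan Bhagwat example also follows reversed edges.

```python
from itertools import product

# Hypothetical in-memory stand-ins for YAGO data (toy values; not the YAGO API or files).
WIKI_CATEGORIES = {
    "Bangalore": {"Metropolitan_cities_in_India"},
    "Kolkata": {"Metropolitan_cities_in_India"},
    "Padma_Shri": {"Civil_awards_and_decorations_of_India"},
    "Bharat_Ratna": {"Civil_awards_and_decorations_of_India"},
}
WORDNET_CATEGORIES = {}  # entity -> set of WordNet categories (empty in this toy)
FACTS = {  # entity -> list of (relation, object) facts
    "Mahesh_Bhupathi": [("livesIn", "Bangalore"), ("hasWonPrize", "Padma_Shri")],
    "Mother_Teresa": [("diedIn", "Kolkata"), ("hasWonPrize", "Bharat_Ratna")],
}


def category_matches(left, right):
    """Steps 1-2: path pairs whose end entities share a Wikipedia (else WordNet) category."""
    for categories in (WIKI_CATEGORIES, WORDNET_CATEGORIES):
        matches = []
        for a, b in product(left, right):
            for cat in sorted(categories.get(a, set()) & categories.get(b, set())):
                matches.append((left[a] + [(a, "category", cat)],
                                right[b] + [(b, "category", cat)]))
        if matches:
            return matches
    return []


def expand(frontier, visited):
    """Follow each entity's facts one hop; keep the first path found to each new entity."""
    nxt = {}
    for entity, path in frontier.items():
        for relation, obj in FACTS.get(entity, []):
            if obj not in visited and obj not in nxt:
                nxt[obj] = path + [(entity, relation, obj)]
    return nxt


def explore(e1, e2, max_depth=3):
    """Return a list of (path from e1, path from e2) pairs, or [] if none is found."""
    left, right = {e1: []}, {e2: []}  # frontier entity -> path that reached it
    seen_left, seen_right = {e1}, {e2}
    for _ in range(max_depth):
        matches = category_matches(left, right)
        if matches:
            return matches
        next_left, next_right = expand(left, seen_left), expand(right, seen_right)
        # Step 3: an entity reached by facts from both sides links the two paths.
        shared = sorted(next_left.keys() & next_right.keys())
        if shared:
            return [(next_left[o], next_right[o]) for o in shared]
        if not next_left or not next_right:
            return []
        # Step 4: the newly reached entities become the frontier; repeat steps 1-4.
        seen_left.update(next_left)
        seen_right.update(next_right)
        left, right = next_left, next_right
    return []


for from_e1, from_e2 in explore("Mahesh_Bhupathi", "Mother_Teresa"):
    print(from_e1, "|", from_e2)
```

Searching from both ends keeps each frontier small, and `max_depth` bounds the number of expansions, since fact graphs like YAGO's branch quickly.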

Ex 1: Narendra Modi and Indian National Congress

• Path from E1: Narendra_Modi--livesIn--> Gandhinagar; Gandhinagar--category--> Indian_capital_cities;

• Path from E2: Indian_National_Congress--isLocatedIn--> New_Delhi; New_Delhi--category--> Indian_capital_cities;

• Path from E1: Narendra_Modi--isAffiliatedTo--> Bharatiya_Janata_Party; Bharatiya_Janata_Party--category--> Political_parties_in_India;

• Path from E2: Indian_National_Congress--category--> Political_parties_in_India;

Ex 2: Mahesh Bhupathi and Mother Teresa

• Path from E1: Mahesh_Bhupathi--livesIn--> Bangalore; Bangalore--category--> Metropolitan_cities_in_India;

• Path from E2: Mother_Teresa--diedIn--> Kolkata; Kolkata--category--> Metropolitan_cities_in_India;

• Path from E1: Mahesh_Bhupathi--hasWonPrize--> Padma_Shri; Padma_Shri--category--> Civil_awards_and_decorations_of_India;

• Path from E2: Mother_Teresa--hasWonPrize--> Bharat_Ratna; Bharat_Ratna--category--> Civil_awards_and_decorations_of_India;

Ex 3: Michelle Obama and Frederick Jelinek

• Path from E1: Michelle_Obama--graduatedFrom--> Princeton_University; Princeton_University--category--> university_108286569;

• Path from E2: Frederick_Jelinek--graduatedFrom--> Massachusetts_Institute_of_Technology; Massachusetts_Institute_of_Technology--category--> university_108286569;

Ex 4: Sonia Gandhi and Benito Mussolini

• Path from E1: Sonia_Gandhi--isCitizenOf--> Italy; Italy--dealsWith--> Germany; Germany--isLocatedIn--> Europe;

• Path from E2: Benito_Mussolini--isAffiliatedTo--> National_Fascist_Party; National_Fascist_Party--isLocatedIn--> Rome; Rome--isLocatedIn--> Europe;

Ex 5: Narendra Modi and Mohan Bhagwat

• Path from E1: Narendra_Modi--isAffiliatedTo--> Bharatiya_Janata_Party; Bharatiya_Janata_Party<--isAffiliatedTo--Hansraj_Gangaram_Ahir;

• Path from E2: Mohan_Bhagwat--wasBornIn--> Chandrapur; Chandrapur<--livesIn--Hansraj_Gangaram_Ahir;

Parser Projection

Example

E: Delhi is the capital of India
H: dillii bhaarat kii raajdhaani hai

E-parse:
[ [ [Delhi]NN ]NP
  [ [is]VBZ [ [the]ART [capital]NN ]NP [ [of]P [ [India]NNP ]NP ]PP ]VP
]S

H-parse:
[ [ [dillii]NN ]NP
  [ [ [ [bhaarat]NNP ]NP [kii]P ]PP [raajdhaani]NN ]NP [hai]VBZ ]VP
]S
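One simple way to carry constituents from English to Hindi is to map each English span onto the smallest Hindi span covering its aligned words. The sketch below does this for the Delhi example; the word alignment is written by hand, and both the projection rule and the `project_spans` helper are illustrative assumptions, not necessarily the method used in the assignment.

```python
def project_spans(constituents, alignment):
    """Map each source constituent (label, start, end) onto the smallest target
    span covering the words its source words align to; unaligned ones are dropped."""
    projected = []
    for label, start, end in constituents:
        targets = [alignment[i] for i in range(start, end) if i in alignment]
        if targets:
            projected.append((label, min(targets), max(targets) + 1))
    return projected


# English: Delhi(0) is(1) the(2) capital(3) of(4) India(5)
hindi = ["dillii", "bhaarat", "kii", "raajdhaani", "hai"]
# Hand-written alignment (English index -> Hindi index); "the" has no counterpart.
alignment = {0: 0, 1: 4, 3: 3, 4: 2, 5: 1}
# Constituents of the E-parse as half-open token spans.
e_spans = [("S", 0, 6), ("NP", 0, 1), ("VP", 1, 6),
           ("NP", 2, 4), ("PP", 4, 6), ("NP", 5, 6)]
for label, s, e in project_spans(e_spans, alignment):
    print(label, " ".join(hindi[s:e]))
```

The projected PP (bhaarat kii) and NP (bhaarat) match the H-parse, but because the English tree attaches the PP to the VP, direct projection yields the NP "raajdhaani" on its own rather than the Hindi NP that contains the PP, so the projected structure would still need adjustment.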

Resources and Tools

• Parallel corpora in two languages L1 and L2

• Parser for language L1

• Word translation model

• A statistical model of the relationship between the syntactic structures of the two languages (which can be learned effectively from a bilingual corpus by an unsupervised learning technique)

Challenges

• Conflation across languages
  – E.g. “goes” in English corresponds to “जाता है” in Hindi

• Phrase-to-phrase translation is required; some phrases are opaque to translation
  – E.g. phrases like “piece of cake”

• Noise introduced by misalignments

Natural Language Toolkit (NLTK)

• NLTK is a platform for building Python programs to work with human language data.

• It provides easy-to-use interfaces to over 50 corpora and lexical resources, such as WordNet.

• It has a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.

NLTK Modules

Language processing task      NLTK modules                  Functionality
Collocation discovery         nltk.collocations             t-test, chi-squared, point-wise mutual information
Part-of-speech tagging        nltk.tag                      n-gram, backoff, Brill, HMM, TnT
Classification                nltk.classify, nltk.cluster   decision tree, maximum entropy, naive Bayes, EM, k-means
Chunking                      nltk.chunk                    regular expression, n-gram, named-entity
Parsing                       nltk.parse                    chart, feature-based, unification, probabilistic, dependency

NLTK Modules (contd.)

Language processing task      NLTK modules                  Functionality
Semantic interpretation       nltk.sem, nltk.inference      lambda calculus, first-order logic, model checking
Evaluation metrics            nltk.metrics                  precision, recall, agreement coefficients
Probability and estimation    nltk.probability              frequency distributions, smoothed probability distributions
Applications                  nltk.app, nltk.chat           graphical concordancer, parsers, WordNet browser, chatbots
Linguistic fieldwork          nltk.toolbox                  manipulate data in SIL Toolbox format