
Lecture 18: Natural Language Processing

Marco Chiarandini

Department of Mathematics & Computer Science, University of Southern Denmark

Slides by Dan Klein at Berkeley

Course Overview

(✓ = covered so far, ▶ = upcoming)

✓ Introduction
  ✓ Artificial Intelligence
  ✓ Intelligent Agents
✓ Search
  ✓ Uninformed Search
  ✓ Heuristic Search
✓ Uncertain knowledge and Reasoning
  ✓ Probability and Bayesian approach
  ✓ Bayesian Networks
  ✓ Hidden Markov Chains
  ✓ Kalman Filters
✓ Learning
  ✓ Supervised: Decision Trees, Neural Networks, Learning Bayesian Networks
  ✓ Unsupervised: EM Algorithm
  ✓ Reinforcement Learning
▶ Games and Adversarial Search
  ▶ Minimax search and Alpha-beta pruning
  ▶ Multiagent search
▶ Knowledge representation and Reasoning
  ▶ Propositional logic
  ▶ First order logic
  ▶ Inference
  ▶ Planning

Outline

1. Recap

2. Speech Recognition

3. Machine Translation
   Statistical MT
   Rule-based MT

Recap: Sequential data

Recap: Filtering

Recap: State Trellis

- State trellis: a graph of states and transitions over time
- Each arc represents a transition x_{t−1} → x_t
- Each arc has weight Pr(x_t | x_{t−1}) Pr(e_t | x_t)
- Each path is a sequence of states
- The product of the weights along a path is that sequence's probability
- The Forward (and now Viterbi) algorithms can be thought of as computing the sum over all paths (or the best path) in this graph, as in the sketch below
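As a concrete illustration, here is a minimal Python sketch of both computations on a toy weather HMM. The model parameters below are assumptions for the example, not taken from the slides; Forward sums the arc-weight products over all paths, Viterbi maximizes them.

```python
# Toy HMM (assumed for illustration): two states, two observations.
trans = {'rain': {'rain': 0.7, 'sun': 0.3},       # Pr(x_t | x_{t-1})
         'sun':  {'rain': 0.3, 'sun': 0.7}}
emit  = {'rain': {'umbrella': 0.9, 'none': 0.1},  # Pr(e_t | x_t)
         'sun':  {'umbrella': 0.2, 'none': 0.8}}
prior = {'rain': 0.5, 'sun': 0.5}

def forward(evidence):
    """Sum of arc-weight products over all paths: returns Pr(e_{1:T})."""
    alpha = {x: prior[x] * emit[x][evidence[0]] for x in prior}
    for e in evidence[1:]:
        alpha = {x: emit[x][e] * sum(alpha[xp] * trans[xp][x] for xp in alpha)
                 for x in prior}
    return sum(alpha.values())

def viterbi(evidence):
    """Max instead of sum: returns the weight of the single best path."""
    m = {x: prior[x] * emit[x][evidence[0]] for x in prior}
    for e in evidence[1:]:
        m = {x: emit[x][e] * max(m[xp] * trans[xp][x] for xp in m)
             for x in prior}
    return max(m.values())

print(forward(['umbrella', 'umbrella', 'none']))
print(viterbi(['umbrella', 'umbrella', 'none']))
```

(To recover the best state sequence itself, Viterbi additionally keeps backpointers; the sketch only returns the best path's probability.)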

Recap: Forward/Viterbi

Recap: Particle Filtering

Particles: track samples of states rather than an explicit distribution
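A minimal sketch of the idea on the same toy weather HMM as above (redefined here so the snippet stands alone; all parameters are illustrative assumptions): elapse time by sampling, weight by the evidence, then resample.

```python
import random

# Toy weather HMM (assumed for illustration).
trans = {'rain': {'rain': 0.7, 'sun': 0.3},       # Pr(x_t | x_{t-1})
         'sun':  {'rain': 0.3, 'sun': 0.7}}
emit  = {'rain': {'umbrella': 0.9, 'none': 0.1},  # Pr(e_t | x_t)
         'sun':  {'umbrella': 0.2, 'none': 0.8}}

def particle_filter(evidence, n=1000):
    particles = [random.choice(['rain', 'sun']) for _ in range(n)]
    for e in evidence:
        # 1. Elapse time: move each particle by sampling Pr(x_t | x_{t-1})
        particles = [random.choices(list(trans[p]),
                                    weights=list(trans[p].values()))[0]
                     for p in particles]
        # 2. Weight each particle by the evidence likelihood Pr(e_t | x_t),
        # 3. then resample so the population is unweighted again
        weights = [emit[p][e] for p in particles]
        particles = random.choices(particles, weights=weights, k=n)
    # Sample frequencies approximate the filtered distribution Pr(x_t | e_{1:t})
    return {x: particles.count(x) / n for x in set(particles)}

print(particle_filter(['umbrella', 'umbrella', 'none']))
```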

Natural Language

- 100,000 years ago humans started to speak

- 7,000 years ago humans started to write

Machines process natural language to:

- acquire information

- communicate with humans

Natural Language Processing

- Speech technologies:
  - Automatic speech recognition (ASR)
  - Text-to-speech synthesis (TTS)
  - Dialog systems

- Language processing technologies:
  - Machine translation
  - Information extraction
  - Web search, question answering
  - Text classification, spam filtering, etc.

Outline

1. Recap

2. Speech Recognition

3. Machine Translation
   Statistical MT
   Rule-based MT

Digitalizing Speech

Speech input is an acoustic waveform

Spectral Analysis

Acoustic Feature Sequence

State Space

- Pr(E | X) encodes which acoustic vectors are appropriate for each phoneme (each kind of sound)

- Pr(X | X′) encodes how sounds can be strung together

- We will have one state for each sound in each word

- From some state x, we can only:
  - stay in the same state (e.g., speaking slowly)
  - move to the next position in the word
  - at the end of the word, move to the start of the next word

- We build a little state graph for each word and chain them together to form our state space X, as sketched below
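A rough sketch of this construction, with a hypothetical two-word lexicon and assumed self-loop/advance probabilities (none of these numbers come from the slides):

```python
# Sketch: build the speech state space described above. Each word gets one
# state per sound, with self-loops (speaking slowly), next-position arcs,
# and end-of-word arcs back to the start of every word. The tiny phoneme
# dictionary and the probabilities are assumptions for illustration.

lexicon = {'yes': ['y', 'eh', 's'], 'no': ['n', 'ow']}
P_SELF, P_NEXT = 0.3, 0.7

transitions = {}                      # (word, pos) -> {(word', pos'): prob}
word_starts = [(w, 0) for w in lexicon]
for w, phones in lexicon.items():
    for i in range(len(phones)):
        state = (w, i)
        arcs = {state: P_SELF}        # stay in the same sound
        if i + 1 < len(phones):
            arcs[(w, i + 1)] = P_NEXT # advance within the word
        else:                         # word ends: jump to any word start
            for s in word_starts:     # (uniform word bigram, an assumption)
                arcs[s] = P_NEXT / len(word_starts)
        transitions[state] = arcs
```

A real system would replace the uniform end-of-word jumps with word bigram probabilities, which is exactly what the next slide adds.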

HMM for speech

Transition with Bigrams

Decoding

- While there are some practical issues, finding the words given the acoustics is an HMM inference problem.

- We want the state sequence x_{1:T} that is most likely given the evidence e_{1:T}:

  x*_{1:T} = argmax_{x_{1:T}} Pr(x_{1:T} | e_{1:T})

- From the sequence x, we can simply read off the words.

Outline

1. Recap

2. Speech Recognition

3. Machine Translation
   Statistical MT
   Rule-based MT

Machine Translation

- Fundamental goal: analyze and process human language, broadly, robustly, accurately...

- End systems that we want to build:
  - Ambitious: speech recognition, machine translation, information extraction, dialog interfaces, question answering...
  - Modest: spelling correction, text categorization, language recognition, genre classification.

Language Models

- A language is defined by a set of strings and rules, called a grammar.

- Formal languages also need semantics that define meaning.

- Natural languages are:

  1. not definitive: there is disagreement about grammar rules
     "Not to be invited is sad" vs. "To be not invited is sad"

  2. ambiguous:
     "Entire store 25% off"
     "I will bring my bike tomorrow if it looks nice in the morning."

  3. large and constantly changing

n-gram Models

- n-gram: a sequence of n characters (or n words, syllables, ...)
- n-gram models define probability distributions over these sequences
- An n-gram model is a Markov chain of order n − 1.

For a trigram (character) model:

  p(c_i | c_{1:i−1}) = p(c_i | c_{i−2:i−1})

  p(c_{1:N}) = ∏_{i=1}^{N} p(c_i | c_{1:i−1}) = ∏_{i=1}^{N} p(c_i | c_{i−2:i−1})

- With an alphabet of 100 characters the model already has millions of entries; with words it is even worse

- Corpus: a body of text
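A minimal sketch of such a character trigram model. Add-one smoothing for unseen trigrams is an assumption of the sketch, not something the slides specify:

```python
from collections import Counter, defaultdict

# Sketch: estimate a character trigram model p(c_i | c_{i-2:i-1}) from a
# corpus string, with add-one smoothing over an assumed alphabet.

def train_trigram(corpus, alphabet):
    counts = defaultdict(Counter)
    padded = '  ' + corpus           # two pad characters give the first context
    for i in range(2, len(padded)):
        counts[padded[i-2:i]][padded[i]] += 1
    def p(c, context):               # add-one (Laplace) smoothing
        ctx = counts[context]
        return (ctx[c] + 1) / (sum(ctx.values()) + len(alphabet))
    return p

p = train_trigram("the cat sat on the mat", "abcdefghijklmnopqrstuvwxyz ")
print(p('e', 'th'))                  # relatively high: 'the' occurs in the corpus
print(p('q', 'th'))                  # low: never seen, smoothing only
```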

Language identification

Learned from a corpus:

  p(c_i | c_{i−2:i−1}, l)

Most probable language:

  l* = argmax_l p(l | c_{1:N})
     = argmax_l p(l) p(c_{1:N} | l)                      (Bayes' rule)
     = argmax_l p(l) ∏_{i=1}^{N} p(c_i | c_{i−2:i−1}, l)  (Markov property)

Computers can reach 99% accuracy
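A toy version of this classifier, reusing `train_trigram` from the sketch above. The two miniature "corpora" and the uniform prior are illustrative assumptions; real systems train on far larger text collections:

```python
import math

# Sketch: language identification with per-language character trigram models.
corpora = {'en': "the quick brown fox jumps over the lazy dog",
           'da': "den hurtige brune raev hopper over den dovne hund"}
alphabet = "abcdefghijklmnopqrstuvwxyz "
models = {l: train_trigram(text, alphabet) for l, text in corpora.items()}
prior = {l: 1 / len(corpora) for l in corpora}   # uniform p(l), an assumption

def identify(text):
    # argmax_l  log p(l) + sum_i log p(c_i | c_{i-2:i-1}, l)
    def score(l):
        padded = '  ' + text
        return math.log(prior[l]) + sum(
            math.log(models[l](padded[i], padded[i-2:i]))
            for i in range(2, len(padded)))
    return max(corpora, key=score)

print(identify("the dog"))           # -> 'en' (on these toy corpora)
```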

Machine Translation

Rough translation: gives the main point but contains errors

Pre-edited translation: the original text is written in a constrained language that is easier to translate automatically

Restricted-source translation: fully automatic, but only on narrow technical content such as weather forecasts

Machine Translation Systems

Very much simplified, there are three types of machine translation:

Statistical machine translation (SMT): learns relational dependencies of features such as n-grams, lemmas, etc.
- Requires large data sets
- Relatively easy to implement
- Example: Google Translate

Rule-based machine translation (RBMT): uses grammatical rules and language constructions to analyze syntax and semantics.
- Uses moderate-size data sets
- Long development time, requires expertise

Hybrid machine translation: either constructs a translation with RBMT and uses SMT to post-process and optimize the result, or uses grammatical rules to derive further features that are then fed into the statistical learning machine.
- A new direction of research

Brief History

Machine Translation Models

- Interlingual model: the source language, i.e., the text to be translated, is transformed into an interlingua, an abstract language-independent representation. The target language is then generated from the interlingua.

- Transfer model: the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform the source-language representation into an abstract target-language representation, from which the target sentence is generated.

- Direct model: words are translated directly without passing through an additional representation.

Levels of Transfer

[Vauquois pyramid: levels of transfer between English and French]

  Interlingua (semantics): Attraction(NamedJohn, NamedMary, High)
  English semantics: Loves(John, Mary)  <->  French semantics: Aime(Jean, Marie)
  English syntax: S(NP(John), VP(loves, NP(Mary)))  <->  French syntax: S(NP(Jean), VP(aime, NP(Marie)))
  English words: John loves Mary  <->  French words: Jean aime Marie

Levels of Transfer

The problem with dictionary look-ups

Statistical machine translation

Data-driven MT


- e: sequence of strings in English
- f: sequence of strings in French

  f* = argmax_f Pr(f | e) = argmax_f Pr(e | f) Pr(f)

- Pr(e | f) is learned from a bilingual (parallel) corpus of phrases seen before


Example:

  English: There is a smelly wumpus sleeping in 2 2
  French:  Il y a un wumpus malodorant qui dort à 2 2

[Figure: phrase alignment of English phrases e1, ..., e5 with French phrases f1, ..., f5; distortions d1 = 0, d2 = +1, d3 = −2, d4 = +1, d5 = 0]

Given an English sentence e, find the French sentence f*:

1. break English e into phrases e_1, ..., e_n
2. for each e_i, choose a French phrase f_i with probability Pr(f_i | e_i)
3. choose a permutation of the phrases f_1, ..., f_n; for each f_i, choose a distortion d_i: the number of words that phrase f_i has moved with respect to f_{i−1}

  Pr(f, d | e) = ∏_{i=1}^{n} Pr(f_i | e_i) Pr(d_i)

With 100 candidate French phrases per English phrase, a 5-phrase English sentence admits 100^5 different phrase choices and 5! reorderings.
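A small sketch of scoring one candidate under this model. The phrase table and the distortion distribution below are toy assumptions, not learned values (how they are learned is the subject of the next slide):

```python
# Sketch: score one candidate translation under the phrase model
# Pr(f, d | e) = prod_i Pr(f_i | e_i) Pr(d_i).

phrase_table = {                      # assumed Pr(f_i | e_i) values
    'there is': {'il y a': 0.8, 'il est': 0.2},
    'a smelly wumpus': {'un wumpus malodorant': 0.9},
}

def p_distortion(d):
    # Assumed distortion model: geometric decay in |d|, normalized so that
    # it sums to 1 over all integer distortions (sum of 0.5**|d| is 3).
    return (0.5 ** abs(d)) / 3

def score(e_phrases, f_phrases, distortions):
    prob = 1.0
    for e, f, d in zip(e_phrases, f_phrases, distortions):
        prob *= phrase_table[e][f] * p_distortion(d)
    return prob

print(score(['there is', 'a smelly wumpus'],
            ['il y a', 'un wumpus malodorant'],
            [0, 0]))                  # -> 0.08
```

A decoder would search over all segmentations, phrase choices, and permutations for the candidate maximizing this score.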

Learn probabilities

1. Parallel corpus: parliamentary debates, web pages

2. Segment into sentences: periods are good indicators, with some care

3. Align sentences: sentence length is one indicator, landmarks are another (see the sketch below)

4. Align phrases within sentences: an iterative process that aggregates evidence that no other pair appears as frequently in the corpus; yields Pr(f_i | e_i)

5. Extract distortions: count how often each distortion appears in the corpus after phrase alignment (with smoothing)

6. Improve the estimates of Pr(f | e) and Pr(d) with EM.
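As an illustration of step 3, a much-simplified length-based sentence aligner in the style of Gale & Church: a dynamic program over 1-1 matches and skips. The cost function and skip penalty are assumptions of this sketch:

```python
import math

# Sketch: align sentences of a parallel corpus by length. A 1-1 match is
# cheap when the two sentences have a similar length; unmatched sentences
# pay an assumed fixed penalty. Sentences are assumed non-empty.

def align(src, tgt):
    SKIP = 3.0                                    # assumed skip cost
    def cost(s, t):                               # similar lengths are cheap
        return abs(math.log(len(s) / len(t)))
    n, m = len(src), len(tgt)
    best = {(0, 0): (0.0, None)}                  # (i, j) -> (cost, predecessor)
    for i in range(n + 1):
        for j in range(m + 1):
            if (i, j) == (0, 0):
                continue
            cands = []
            if i and j:                           # 1-1 match
                cands.append((best[(i-1, j-1)][0] + cost(src[i-1], tgt[j-1]),
                              (i-1, j-1)))
            if i:                                 # skip a source sentence
                cands.append((best[(i-1, j)][0] + SKIP, (i-1, j)))
            if j:                                 # skip a target sentence
                cands.append((best[(i, j-1)][0] + SKIP, (i, j-1)))
            best[(i, j)] = min(cands)
    pairs, ij = [], (n, m)                        # walk back to recover matches
    while best[ij][1] is not None:
        pi, pj = best[ij][1]
        if ij == (pi + 1, pj + 1):
            pairs.append((src[pi], tgt[pj]))
        ij = (pi, pj)
    return pairs[::-1]
```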

Learning to translate

An HMM model

Machine translation systems

Grammars

Grammar: a set of rewrite rules (read from left to right) that describe how to form strings from the language's alphabet that are valid according to the language's syntax (a language generator).

Parsing is the process of recognizing a string of a natural language by breaking it down into a set of symbols and analyzing each one against the grammar of the language, i.e., determining whether the string belongs to the language or is grammatically incorrect. The result is a parse tree.

- context-free grammars (see http://en.wikipedia.org/wiki/Chomsky_hierarchy)
- probabilistic context-free grammars (see the CYK sketch below)
- lexicalized probabilistic context-free grammars
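A compact sketch of parsing with a probabilistic CFG: the CYK dynamic program below finds the most probable parse under a tiny assumed grammar in Chomsky normal form (the grammar and its probabilities are illustrative only):

```python
from collections import defaultdict

# Binary rules head -> (B, C) with probabilities, plus lexical rules.
binary = {'S':  [(('NP', 'VP'), 1.0)],
          'VP': [(('V', 'NP'), 1.0)]}
lexicon = {'NP': [('John', 0.5), ('Mary', 0.5)],
           'V':  [('loves', 1.0)]}

def cyk(words):
    n = len(words)
    chart = defaultdict(dict)        # chart[(i, j)][symbol] = (prob, backpointer)
    for i, w in enumerate(words):    # fill in width-1 spans from the lexicon
        for sym, entries in lexicon.items():
            for word, p in entries:
                if word == w:
                    chart[(i, i+1)][sym] = (p, w)
    for span in range(2, n + 1):     # combine smaller spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                 # split point
                for head, rules in binary.items():
                    for (b, c), p in rules:
                        if b in chart[(i, k)] and c in chart[(k, j)]:
                            prob = (p * chart[(i, k)][b][0]
                                      * chart[(k, j)][c][0])
                            if prob > chart[(i, j)].get(head, (0,))[0]:
                                chart[(i, j)][head] = (prob, (b, c, k))
    return chart[(0, n)].get('S')    # (probability, backpointer) or None

print(cyk('John loves Mary'.split()))  # -> (0.25, ('NP', 'VP', 1))
```

Following the backpointers recursively reconstructs the parse tree S(NP(John), VP(loves, NP(Mary))) from the Vauquois-pyramid example.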

Parsing as search

Probabilistic Context Free Grammars

Hybrid Systems

The translated sentence can be checked against a monolingual corpus.

Machine Translation

- Translate text from one language to another

- Recombine fragments of example translations

- Challenges:
  - What fragments? [learning to translate]
  - How to make it efficient? [fast translation search]

Machine Translation

- After a first bubble, the sector is now at full speed

- In spite of the economic crisis, 7% growth worldwide

- Commercial and technological focus

- Danish is a marginal language and existing systems cannot be applied reliably

- www.eicom.dk and www.oversaetterhuset.dk seek development in collaboration with research institutions (SDU, CBS, ASB)

Announcement

Need for human resources: possibilities for thesis and individual study activities together with:

- the Visual Interactive Syntax Learning project at the Institute for Language and Communication of SDU: http://beta.visl.sdu.dk/constraint_grammar.html

- Eckhard Bick, project leader: http://en.wikipedia.org/wiki/Eckhard_Bick

If interested, contact me.
