93
Natural language processing

Natural language processing 2

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Natural language processing 2

Natural language processing

Page 2: Natural language processing 2

OverviewBasic knowledgeDemonstration

Outline

Page 3: Natural language processing 2

LOGO

USER : Men are all alike.ELIZA : In what way?USER : They’re always bugging us about something or other. ELIZA : Can you think of a specific example?USER : Well, my boyfriend made me come here. ELIZA : Your boyfriend made you come here ?USER : He says I’m depressed much of the time.ELIZA : I am sorry to hear you are depressed.USER : It’s true, I am unhappy.ELIZA : Do you think coming here will help you not to be unhappy?USER : I need some help; that much seems certain.ELIZA : What would it mean to you if you got some help?USER : Perhaps I could learn to get along with my mother.ELIZA : Tell me more about your family.USER : My mother takes care of me.ELIZA : Who else in your family takes care of you?USER : My father.ELIZA : Your father ?USER : You are like my father in some ways.ELIZA : What resemblance do you see?

vtloc
Question:- What is the first thought come into your mind when you hear about NLP- Have you ever chat to a chatbot? Or ever hear of it?
Page 4: Natural language processing 2

Natural language processing (NLP)

A sub-field of Artificial Intelligent, since 1960s …

Concerned with the interactions between computers and human languages with one ultimate goal : Computers can “understand” human

Many applications in real world

Do Phuong Dong
The definition of "understanding" is one of the major problems in natural-language processing The ultimate goal is to be able to specify models that approach human performance in the linguistic tasks of reading, writing, hearing, and speaking
Do Phuong Dong
Eliza Program developed in the mid-1960s in MITNLP draws on research in Linguistics,Theoretical Computer Science, ArtificialIntelligence, Mathematics and Statistics,Psychology, etc.
Do Phuong Dong
because natural-language recognition seems to require extensive knowledge about the outside world and the ability to manipulate it
Page 5: Natural language processing 2

Natural language processing (NLP)

Natural language unit? Natural language understanding Natural language generation

Data? Speech processing Text processing

Natural language text understanding!

Page 6: Natural language processing 2

Task of generating natural language from a machine representation

May be viewed as the opposite of natural language understanding .

Applications:Jokes generationTextual summaries of databases Enhancing accessibility

Natural language generation

Do Phuong Dong
The difference can be put this way: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.
Do Phuong Dong
Content determinationDiscourse planningSentence aggregationLexicalisationReferring expression generationRealisation
Do Phuong Dong
For example the SPOTLIGHT system developed at A.C. Nielsen automatically generated readable English text based on the analysis of large amounts of retail sales data.
Do Phuong Dong
describing graphs and data sets to blind people
Page 7: Natural language processing 2

An advanced subtopic of NLP deals with reading comprehension

More complex than NLG Many commercial interest in this field

News-gatheringData-MiningVoice-ActivationLarge-scale content analysis

Natural language understanding

Do Phuong Dong
because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are pre-determined when outputting language.
Page 8: Natural language processing 2

Ambiguity in Language - Difficulties

Logic is too clear, the lost of flexibility cause difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money ! Someone else said it, but I didn't.

Do Phuong Dong
A string of words may be interpreted in different ways. For example, the string "Time flies like an arrow" may be interpreted in a variety of ways: * The common simile: time moves quickly just like an arrow does; * measure the speed of flies like you would measure that of an arrow (thus interpreted as an imperative) - i.e. (You should) time flies as you would (time) an arrow; * measure the speed of flies like an arrow would - i.e. Time flies in the same way that an arrow would (time them); * measure the speed of flies that are like arrows - i.e. Time those flies that are like arrows; * all of a type of flying insect, "time-flies," collectively enjoys a single arrow (compare Fruit flies like a banana); * each of a type of flying insect, "time-flies," individually enjoys a different arrow (similar comparison applies); * A concrete object, for example the magazine, Time, travels through the air in an arrow-like manner.
Page 9: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility

become difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I simply didn't ever say it

Page 10: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility become

difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I might have implied it in some way, but I never

explicitly said it

Page 11: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility

become difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I said someone took it; I didn't say it was she

Page 12: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility

become difficulties in NLP

Examples:Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I just said she probably borrowed it

Page 13: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility

become difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I said she stole someone else's money

Page 14: Natural language processing 2

Ambiguity in LanguageLogic is too clear, the lost of flexibility

become difficulties in NLP

Examples :Time flies like an arrowCan be understood in 7 ways !!!

I never said she stole my money !I said she stole something, but not my money

Page 15: Natural language processing 2

Concrete ProblemsWords combination and divisionStress placing on wordsThe properties of subjects

We gave the monkeys the bananas because they were hungry

We gave the monkeys the bananas because they were over-ripe

Specifying which word an adjective applies toA pretty little girls' school

Page 16: Natural language processing 2

Concrete ProblemsInvolves reasoning about the worldEmbedded a social system of people

interactingpersuading, insulting and amusing themchanging over time

Homonymous

Do Phuong Dong
Từ đồng âm
Page 17: Natural language processing 2

APPLICATION

vtloc
Kinds of application you think is belonged to this field?
Page 18: Natural language processing 2

Applications of NLP Techniques

Automatic Summarization

Page 19: Natural language processing 2

Information Extraction

Applications of NLP Techniques

Do Phuong Dong
Áp dụng trong Data Mining
Page 20: Natural language processing 2

Grammar Testing

Applications of NLP Techniques

Page 21: Natural language processing 2

Applications of NLP Techniques

Page 22: Natural language processing 2

Applications of NLP Techniques

Page 23: Natural language processing 2

Applications of NLP Techniques

Page 24: Natural language processing 2

Applications of NLP Techniques

Page 25: Natural language processing 2

Applications of NLP Techniques

Page 26: Natural language processing 2

Applications of NLP Techniques

Page 27: Natural language processing 2

ACHIEVEMENT

Page 28: Natural language processing 2

Achievements

ePi Group:Automatic Vietnamese processing systemwww.baomoi.com

Collecting news from all Vietnamese e-newspapers

EVTrans – Softex Co Ltd.CyclopVnKim

Page 29: Natural language processing 2
Page 30: Natural language processing 2
Page 31: Natural language processing 2

BASIC KNOWLEDGE

Page 32: Natural language processing 2

Stages

Page 33: Natural language processing 2

Morphological analysis : Individual words are analyzed into their

components Syntactic analysis

Linear sequence of words are transformed into structures that show how the words relate to each other

Semantic analysis A transformation is made from the input

text to an internal representation that reflects the meaning

Pragmatic analysis To reinterpret what was said to what

was actually meant Discourse analysis

Resolving references between sentences

Stages

Page 34: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 35: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 36: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesMorphemes: smallest meaningful unit spoken units of language.Stem: book, cat, car, …Affixes : un-, -s, -es, ..Clitic: ‘ve, ‘m

Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships

vtloc
The study of meaningful parts and how they are put together.
Page 37: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesWord Classes Parts of speech: noun, verb,

adjectives, etc.Word class dictates how a word

combines with morphemes to form new words

ExamplesBooks: book + s Unladylike = un + lady + like

vtloc
We need a lexicon library to support this
Page 38: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesVietnamese?

Ăn = ănUống = uốngXe = xe

No ‘Xes’ in Vietnamese!Problems are text tokenizing.

Page 39: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesWhy parse words?

To identify a word’s part-of-speechTo identify a word’s stem (IR)

… then?Spell- checkingTo predict next wordsTo predict the word’s accent

Page 40: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesAmbiguityI want her to go to the cinema

with me

To - infinitive?To - preposition?

Con ngựa đá đá con ngựa đá.

đá = đá?

Page 41: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesHow to implement?Regular expressionFinite State Transducers (FST)Finite State Accepter (FSA)

*.exeir??man\b[0-9]+ *(Mb|[Mm]egabytes?)\b

vtloc
Regular Expression can be considered asa pattern to specify text search strings to search a corpus of texts
vtloc
Finite-state methods are useful in dealing with the lexicon (words)
Page 42: Natural language processing 2
Page 43: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesRelate terms:

Stem, stemmingPart of speechN-gram

vtloc
Start with eight basic categories-Noun, verb, pronoun, preposition, adjective, adverb, article, conjunction- These categories are based on morphological and distributional properties
vtloc
Stem: don't care about other infix, just the root. Used in IR
vtloc
Start looking at words in contextAn artificial task: predicting next words in asequence
Page 44: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 45: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

SYNTAX

Page 46: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesLinear sequence of words are transformed into structures that show how the words relate to each other.

Determine grammatical structure.

I am a boy = [Subject] [Verb] [Cardinal] [Noun]

Page 47: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

vtloc
we commonly use tree to represent sentence's syntax
Page 48: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesSyntax

Actual structure of a sentence

GrammarThe rule set used in the analysis

Page 49: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesA grammar define syntactically legal sentences I ate an apple (syntactic legal) I ate apple (not syntactic legal) I ate a building (syntactic legal,

but?)

doesn’t mean that it’s meaningful!

Page 50: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesAmbiguities

Page 51: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 52: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

SEMANTIC

vtloc
First we worked with words (morphology)o Then we looked at syntax and grammaro Now we’re moving on to meaning
Page 53: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesWhat could this mean…

Representations of linguistic inputs that capture the meanings of those inputs

For us it means Representations that permit or

facilitate semantic processing Permit us to reason about their truth

(relationship to some world) Permit us to answer questions based

on their content Permit us to perform inference

(answer questions and determine the truth of things we don’t actually know)

vtloc
So far, we have focused on the structure of language not on what things mean.We have seen that words have different meaning, depending on the context in which they are used.
Page 54: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

vtloc
First we worked with words (morphology)o Then we looked at syntax and grammaro Now we’re moving on to meaning
Page 55: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesRequirements

VerifiabilityAmbiguityCanonical FormInferenceExpressiveness

vtloc
First we worked with words (morphology)o Then we looked at syntax and grammaro Now we’re moving on to meaning
vtloc
The system’s ability to compare the state of affairs described by a representation to the state of affairs in some world as modeled in the knowledge base
vtloc
The system should allow us to represent meanings unambiguously
vtloc
Distinct inputs could have the same meaning n Does Herfi serve vegetarian dishes? n Do they have vegetarian food at Herfi? n Are vegetarian dishes served at Herfi? n Does Herfi serve vegetarian fare? o Alternative (if not the same): n Four different semantic representations n Store all possible meaning representations in KB
vtloc
Consider a more complex request
vtloc
Must accommodate wide variety of meanings First Order Predicate Calculus (FOPC) is expressive enough to handle many of the NLP needs
Page 56: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

vtloc
First we worked with words (morphology)o Then we looked at syntax and grammaro Now we’re moving on to meaning
vtloc
The system’s ability to compare the state of affairs described by a representation to the state of affairs in some world as modeled in the knowledge base
vtloc
The system should allow us to represent meanings unambiguously
vtloc
Distinct inputs could have the same meaning n Does Herfi serve vegetarian dishes? n Do they have vegetarian food at Herfi? n Are vegetarian dishes served at Herfi? n Does Herfi serve vegetarian fare? o Alternative (if not the same): n Four different semantic representations n Store all possible meaning representations in KB
vtloc
Consider a more complex request
vtloc
Must accommodate wide variety of meanings First Order Predicate Calculus (FOPC) is expressive enough to handle many of the NLP needs
Page 57: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

StagesPragmatics: concerns how

sentences are used in different situations and how use affects the interpretation of the sentence

Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence

Page 58: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

‘He’, ‘it’, ‘his’ can be inferred from previous sentence

It’s discourse

Page 59: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 60: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 61: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 62: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 63: Natural language processing 2

Morphology

Syntax

Semantic

Pragmatic

Discourse

Stages

Page 64: Natural language processing 2

WordnetMindnetStanford TaggerStanford Parser……..

Resources

Page 65: Natural language processing 2

Machine translationSearch engineInformation extractionChat bot

Demonstration

Page 66: Natural language processing 2

Machine translation

Page 67: Natural language processing 2

Machine translation

Page 68: Natural language processing 2

Machine translation

Page 69: Natural language processing 2

Can we use previously translated text to learn how to translate new texts?Yes! But, it’s not so easyTwo paradigms, statistical MT, and EBMT

Requirements:Aligned large parallel corpus of translated

sentences{S source S target }Bilingual dictionary for intra-S alignmentGeneralization patterns (names, numbers,

dates…)

Example-based MT (EBMT)- Statistical MT

Page 70: Natural language processing 2

Simplest: Translation MemoryIf S new= S source in corpus, output aligned S target

Compositional EBMTIf fragment of Snew matches fragment of Ss,

output corresponding fragment of aligned St

Prefer maximal-length fragmentsMaximize grammatical compositionality

Via a target language grammarOr, via an N-gram statistical language model

Example-based MT (EBMT)- Statistical MT

Page 71: Natural language processing 2

Requires an Interlingua - language-neutral Knowledge Representation (KR)

Philosophical debate: Is there an interlingua?FOL is not totally language neutral (predicates,

functions, expressed in a language)Other near-interlinguas (Conceptual Dependency)

Requires a fully-disambiguating parserDomain model of legal objects, actions, relations

Requires a NL generator (KR -> text)Applicable only to well-defined technical domainsProduces high-quality MT in those domains

Intelingual based MT

Page 72: Natural language processing 2

Intelingua-based MTRule-based MT

Machine translation (MT)

Page 73: Natural language processing 2

Each approach has its own strength

Rapidly adaptable: statistical, example-basedGood grammar: rule-based (grammar)High precision in narrow domain: Intelingua

Machine translation

Page 74: Natural language processing 2

GoogleYahooAlta-vistaAnswer.com

Search engine

Page 75: Natural language processing 2

Spider - a browser-like program that downloads web pages.Crawler – a program that automatically follows all of the

links on each web page.Indexer - a program that analyzes web pages downloaded

by the spider and the crawler. Database– storage for downloaded and processed pages.Results engine – extracts search results from the database.  Web server – a server that is responsible for interaction

between the user and other search engine components.

Search engine

Page 76: Natural language processing 2

Spider - a browser-like program that downloads web pages. Crawler – a program that automatically follows all of the

links on each web page. Indexer - a program that analyzes web pages downloaded

by the spider and the crawler. Database– storage for downloaded and processed pages. Results engine – extracts search results from the database.  Web server – a server that is responsible for interaction

between the user and other search engine components.

Search engine

NLP?

Page 77: Natural language processing 2

Search engine

NLP?

Page 78: Natural language processing 2

Google search?

Statistic-basedHOT

Page 79: Natural language processing 2

Information extraction

vtloc
IR: help user to find information from large amount of data.
Page 80: Natural language processing 2

Idea is to ‘extract’ particular types of information from arbitrary text or transcribed speech

Examples:Names entities: people, places, organizationTelephone numbersDates

Many uses:Question answering systems, fisting of news or

mail…Job ads, financial information, terrorist attacks

Information extraction

Page 81: Natural language processing 2

Often use a set of simple templates or frames with slots to be filled in from input text. Ignore everything else.Husni’s number is 966-3-860-2624.The inventor of the First plane was Abbas ibnu

FernasThe British King died in March of 1932.

Information extraction

Page 82: Natural language processing 2

Named Entity recognition (NE) Finds and classifies names, places etc.

Co-reference Resolution (CO) Identifies identity relations between entities in texts.

Template Element construction (TE) Adds descriptive information to NE results (using

CO). Template Relation construction (TR)

Finds relations between TE entities. Scenario Template production (ST)

Fits TE and TR results into specified event scenarios.

Information extraction

Page 83: Natural language processing 2

GATE

Page 84: Natural language processing 2

Named Entity recognition (NE)

Page 85: Natural language processing 2

Co-reference Resolution (CO)

Page 86: Natural language processing 2

Template elements

Page 87: Natural language processing 2

Scenario template

Page 88: Natural language processing 2

VN-Kim

Page 89: Natural language processing 2

AIML = Artificial Intelligent Mark-up Language

Alice

AIML and A.L.I.C.E

Page 90: Natural language processing 2

A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) an award-winning free natural language

artificial intelligence chat robot.

Ruled-baseHuman-like answer without complicated

“brain”Multi-language

AIML and A.L.I.C.E

Page 91: Natural language processing 2

AIML

Page 92: Natural language processing 2

NLP’s course , Husni Al-MuhtasebLexical descriptions for Vietnamese language

processing .en.wikipedia.orgwww.xulyngonngu.com

References

Page 93: Natural language processing 2

Thank you