Upload
thanh-loc-vo
View
673
Download
1
Embed Size (px)
DESCRIPTION
Citation preview
Natural language processing
OverviewBasic knowledgeDemonstration
Outline
LOGO
USER : Men are all alike.ELIZA : In what way?USER : They’re always bugging us about something or other. ELIZA : Can you think of a specific example?USER : Well, my boyfriend made me come here. ELIZA : Your boyfriend made you come here ?USER : He says I’m depressed much of the time.ELIZA : I am sorry to hear you are depressed.USER : It’s true, I am unhappy.ELIZA : Do you think coming here will help you not to be unhappy?USER : I need some help; that much seems certain.ELIZA : What would it mean to you if you got some help?USER : Perhaps I could learn to get along with my mother.ELIZA : Tell me more about your family.USER : My mother takes care of me.ELIZA : Who else in your family takes care of you?USER : My father.ELIZA : Your father ?USER : You are like my father in some ways.ELIZA : What resemblance do you see?
Natural language processing (NLP)
A sub-field of Artificial Intelligent, since 1960s …
Concerned with the interactions between computers and human languages with one ultimate goal : Computers can “understand” human
Many applications in real world
Natural language processing (NLP)
Natural language unit? Natural language understanding Natural language generation
Data? Speech processing Text processing
Natural language text understanding!
Task of generating natural language from a machine representation
May be viewed as the opposite of natural language understanding .
Applications:Jokes generationTextual summaries of databases Enhancing accessibility
Natural language generation
An advanced subtopic of NLP deals with reading comprehension
More complex than NLG Many commercial interest in this field
News-gatheringData-MiningVoice-ActivationLarge-scale content analysis
Natural language understanding
Ambiguity in Language - Difficulties
Logic is too clear, the lost of flexibility cause difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money ! Someone else said it, but I didn't.
Ambiguity in LanguageLogic is too clear, the lost of flexibility
become difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I simply didn't ever say it
Ambiguity in LanguageLogic is too clear, the lost of flexibility become
difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I might have implied it in some way, but I never
explicitly said it
Ambiguity in LanguageLogic is too clear, the lost of flexibility
become difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I said someone took it; I didn't say it was she
Ambiguity in LanguageLogic is too clear, the lost of flexibility
become difficulties in NLP
Examples:Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I just said she probably borrowed it
Ambiguity in LanguageLogic is too clear, the lost of flexibility
become difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I said she stole someone else's money
Ambiguity in LanguageLogic is too clear, the lost of flexibility
become difficulties in NLP
Examples :Time flies like an arrowCan be understood in 7 ways !!!
I never said she stole my money !I said she stole something, but not my money
Concrete ProblemsWords combination and divisionStress placing on wordsThe properties of subjects
We gave the monkeys the bananas because they were hungry
We gave the monkeys the bananas because they were over-ripe
Specifying which word an adjective applies toA pretty little girls' school
Concrete ProblemsInvolves reasoning about the worldEmbedded a social system of people
interactingpersuading, insulting and amusing themchanging over time
Homonymous
APPLICATION
Applications of NLP Techniques
Automatic Summarization
Information Extraction
Applications of NLP Techniques
Grammar Testing
Applications of NLP Techniques
Applications of NLP Techniques
Applications of NLP Techniques
Applications of NLP Techniques
Applications of NLP Techniques
Applications of NLP Techniques
Applications of NLP Techniques
ACHIEVEMENT
Achievements
ePi Group:Automatic Vietnamese processing systemwww.baomoi.com
Collecting news from all Vietnamese e-newspapers
EVTrans – Softex Co Ltd.CyclopVnKim
BASIC KNOWLEDGE
Stages
Morphological analysis : Individual words are analyzed into their
components Syntactic analysis
Linear sequence of words are transformed into structures that show how the words relate to each other
Semantic analysis A transformation is made from the input
text to an internal representation that reflects the meaning
Pragmatic analysis To reinterpret what was said to what
was actually meant Discourse analysis
Resolving references between sentences
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesMorphemes: smallest meaningful unit spoken units of language.Stem: book, cat, car, …Affixes : un-, -s, -es, ..Clitic: ‘ve, ‘m
Morphological parsing: parsing a word into stem and affixes and identifying the parts and their relationships
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesWord Classes Parts of speech: noun, verb,
adjectives, etc.Word class dictates how a word
combines with morphemes to form new words
ExamplesBooks: book + s Unladylike = un + lady + like
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesVietnamese?
Ăn = ănUống = uốngXe = xe
No ‘Xes’ in Vietnamese!Problems are text tokenizing.
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesWhy parse words?
To identify a word’s part-of-speechTo identify a word’s stem (IR)
… then?Spell- checkingTo predict next wordsTo predict the word’s accent
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesAmbiguityI want her to go to the cinema
with me
To - infinitive?To - preposition?
Con ngựa đá đá con ngựa đá.
đá = đá?
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesHow to implement?Regular expressionFinite State Transducers (FST)Finite State Accepter (FSA)
*.exeir??man\b[0-9]+ *(Mb|[Mm]egabytes?)\b
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesRelate terms:
Stem, stemmingPart of speechN-gram
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
SYNTAX
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesLinear sequence of words are transformed into structures that show how the words relate to each other.
Determine grammatical structure.
I am a boy = [Subject] [Verb] [Cardinal] [Noun]
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesSyntax
Actual structure of a sentence
GrammarThe rule set used in the analysis
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesA grammar define syntactically legal sentences I ate an apple (syntactic legal) I ate apple (not syntactic legal) I ate a building (syntactic legal,
but?)
doesn’t mean that it’s meaningful!
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesAmbiguities
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
SEMANTIC
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesWhat could this mean…
Representations of linguistic inputs that capture the meanings of those inputs
For us it means Representations that permit or
facilitate semantic processing Permit us to reason about their truth
(relationship to some world) Permit us to answer questions based
on their content Permit us to perform inference
(answer questions and determine the truth of things we don’t actually know)
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesRequirements
VerifiabilityAmbiguityCanonical FormInferenceExpressiveness
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
StagesPragmatics: concerns how
sentences are used in different situations and how use affects the interpretation of the sentence
Discourse: concerns how the immediately preceding sentences affect the interpretation of the next sentence
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
‘He’, ‘it’, ‘his’ can be inferred from previous sentence
It’s discourse
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
Morphology
Syntax
Semantic
Pragmatic
Discourse
Stages
WordnetMindnetStanford TaggerStanford Parser……..
Resources
Machine translationSearch engineInformation extractionChat bot
Demonstration
Machine translation
Machine translation
Machine translation
Can we use previously translated text to learn how to translate new texts?Yes! But, it’s not so easyTwo paradigms, statistical MT, and EBMT
Requirements:Aligned large parallel corpus of translated
sentences{S source S target }Bilingual dictionary for intra-S alignmentGeneralization patterns (names, numbers,
dates…)
Example-based MT (EBMT)- Statistical MT
Simplest: Translation MemoryIf S new= S source in corpus, output aligned S target
Compositional EBMTIf fragment of Snew matches fragment of Ss,
output corresponding fragment of aligned St
Prefer maximal-length fragmentsMaximize grammatical compositionality
Via a target language grammarOr, via an N-gram statistical language model
Example-based MT (EBMT)- Statistical MT
Requires an Interlingua - language-neutral Knowledge Representation (KR)
Philosophical debate: Is there an interlingua?FOL is not totally language neutral (predicates,
functions, expressed in a language)Other near-interlinguas (Conceptual Dependency)
Requires a fully-disambiguating parserDomain model of legal objects, actions, relations
Requires a NL generator (KR -> text)Applicable only to well-defined technical domainsProduces high-quality MT in those domains
Intelingual based MT
Intelingua-based MTRule-based MT
Machine translation (MT)
Each approach has its own strength
Rapidly adaptable: statistical, example-basedGood grammar: rule-based (grammar)High precision in narrow domain: Intelingua
Machine translation
GoogleYahooAlta-vistaAnswer.com
Search engine
Spider - a browser-like program that downloads web pages.Crawler – a program that automatically follows all of the
links on each web page.Indexer - a program that analyzes web pages downloaded
by the spider and the crawler. Database– storage for downloaded and processed pages.Results engine – extracts search results from the database. Web server – a server that is responsible for interaction
between the user and other search engine components.
Search engine
Spider - a browser-like program that downloads web pages. Crawler – a program that automatically follows all of the
links on each web page. Indexer - a program that analyzes web pages downloaded
by the spider and the crawler. Database– storage for downloaded and processed pages. Results engine – extracts search results from the database. Web server – a server that is responsible for interaction
between the user and other search engine components.
Search engine
NLP?
Search engine
NLP?
Google search?
Statistic-basedHOT
Information extraction
Idea is to ‘extract’ particular types of information from arbitrary text or transcribed speech
Examples:Names entities: people, places, organizationTelephone numbersDates
Many uses:Question answering systems, fisting of news or
mail…Job ads, financial information, terrorist attacks
Information extraction
Often use a set of simple templates or frames with slots to be filled in from input text. Ignore everything else.Husni’s number is 966-3-860-2624.The inventor of the First plane was Abbas ibnu
FernasThe British King died in March of 1932.
Information extraction
Named Entity recognition (NE) Finds and classifies names, places etc.
Co-reference Resolution (CO) Identifies identity relations between entities in texts.
Template Element construction (TE) Adds descriptive information to NE results (using
CO). Template Relation construction (TR)
Finds relations between TE entities. Scenario Template production (ST)
Fits TE and TR results into specified event scenarios.
Information extraction
GATE
Named Entity recognition (NE)
Co-reference Resolution (CO)
Template elements
Scenario template
VN-Kim
AIML = Artificial Intelligent Mark-up Language
Alice
AIML and A.L.I.C.E
A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) an award-winning free natural language
artificial intelligence chat robot.
Ruled-baseHuman-like answer without complicated
“brain”Multi-language
AIML and A.L.I.C.E
AIML
NLP’s course , Husni Al-MuhtasebLexical descriptions for Vietnamese language
processing .en.wikipedia.orgwww.xulyngonngu.com
References
Thank you