Computational Linguistics: A short introduction


Outline

● Introduction: Natural Language Processing

● Why NLP@DS?

● Syntax

● Semantics

● Pragmatics

● Applications

● Tools

● Conclusions


NLP: Applications


Machine Translation

Machine translation (MT) is a sub-field of computational linguistics that investigates the use of software to translate text or speech from one language to another.


Machine Translation: rule based

[Figure: Bernard Vauquois' pyramid, showing the comparative depths of intermediary representation in rule-based MT: interlingual machine translation at the peak, followed by transfer-based, then direct translation.]


Machine Translation: rule based

Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source text, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract, language-independent representation. The target-language text is then generated from the interlingua.


Machine Translation: rule based

Direct machine translation can use a method based on dictionary entries, which means that the words are translated as a dictionary does: word by word, usually without much correlation of meaning between them. Dictionary lookups may be done with or without morphological analysis or lemmatisation. While this approach to machine translation is probably the least sophisticated, dictionary-based machine translation is well suited to translating long lists of phrases at the subsentential level (i.e., not full sentences), e.g. inventories or simple catalogs of products and services.
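A minimal sketch of word-by-word dictionary lookup, without morphological analysis or lemmatisation. The three-entry Italian-to-English dictionary and the function name `translate_direct` are assumptions made up for illustration, not part of any real system.

```python
# Toy Italian-to-English dictionary; the entries are illustrative assumptions.
DICTIONARY = {
    "gnocchi": "gnocchi",
    "al": "to the",   # literal rendering of the contracted preposition "a + il"
    "pesto": "pesto",
}


def translate_direct(text, dictionary):
    """Translate word by word, leaving unknown words unchanged."""
    words = text.lower().split()
    return " ".join(dictionary.get(word, word) for word in words)


print(translate_direct("Gnocchi al pesto", DICTIONARY))
# prints: gnocchi to the pesto
```

Each word is looked up in isolation, so the result is literal: the lookup has no way to know that "al" in a dish name is normally rendered as "with".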


Machine Translation: rule based

Look for a nice “Gnocchi to the pesto” advertisement (downtown, Piazza Raibetta!): a clear example of direct machine translation failure!


Machine Translation: rule based

In contrast to the simpler direct model of MT, transfer-based machine translation breaks translation into three steps: analysis of the source-language text to determine its grammatical structure, transfer of the resulting structure to a structure suitable for generating text in the target language, and finally generation of this text. Transfer-based MT systems are thus capable of using knowledge of the source and target languages.
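The three steps can be sketched on a toy English-to-Italian example. Everything here is an assumption for illustration: the three-word lexicon, the part-of-speech tags, and the single transfer rule (an adjective moves after the noun it precedes). Gender and number agreement are ignored, and words missing from the lexicon raise a `KeyError`.

```python
# Hypothetical toy lexicon: English word -> (part of speech, Italian word).
LEXICON = {
    "the": ("DET", "la"),
    "red": ("ADJ", "rossa"),
    "car": ("NOUN", "macchina"),
}


def analyse(sentence):
    """Analysis: tag each source word with its part of speech."""
    return [(word, LEXICON[word][0]) for word in sentence.lower().split()]


def transfer(tagged):
    """Transfer: reorder ADJ NOUN into NOUN ADJ, as the target language expects."""
    result = list(tagged)
    i = 0
    while i < len(result) - 1:
        if result[i][1] == "ADJ" and result[i + 1][1] == "NOUN":
            result[i], result[i + 1] = result[i + 1], result[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return result


def generate(tagged):
    """Generation: emit the target-language word for each source word."""
    return " ".join(LEXICON[word][1] for word, _ in tagged)


def translate_transfer(sentence):
    return generate(transfer(analyse(sentence)))


print(translate_transfer("the red car"))
# prints: la macchina rossa
```

A direct word-by-word lookup with the same lexicon would give "la rossa macchina"; the reordering happens only because the transfer step works on the analysed structure.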


Machine Translation: statistical

Statistical machine translation (SMT) is a machine translation paradigm where translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora.

The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the ideas of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in the late 1980s and early 1990s by researchers at IBM's Thomas J. Watson Research Center and has contributed to the significant resurgence in interest in machine translation in recent years.

Nowadays it is by far the most widely studied machine translation method.


Machine Translation: statistical

Some benefits:

● More efficient use of human and data resources.

● There are many parallel corpora in machine-readable format and even more monolingual data.

● Generally, SMT systems are not tailored to any specific pair of languages.

● Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.

● More fluent translations owing to use of a language model.


Machine Translation: statistical

Some shortcomings:

● Corpus creation can be costly.

● Specific errors are hard to predict and fix.

● Results may have superficial fluency that masks translation problems.

● Statistical machine translation usually works less well for language pairs with significantly different word order.

● The benefits obtained for translation between Western European languages are not representative of results for other language pairs, owing to smaller training corpora and greater grammatical differences.


Machine Translation: example-based

Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning.


Machine Translation

Online tools:

● http://babelnet.org/

● https://translate.google.com/

● http://www.worldlingo.com/products_services/worldlingo_translator.html

● ...many, many others....


Machine Translation

Let's build our own machine translation system exploiting DCGs....
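DCGs (definite clause grammars) are a Prolog formalism, so the following is only a rough Python analogue of the idea, not a DCG itself: each grammar rule consumes words from the front of the input and hands back their translation together with the remaining words. The tiny English-to-Italian lexicons, the three grammar rules and all function names are assumptions for illustration; there is no backtracking over alternative rules as in Prolog.

```python
# Hypothetical toy lexicons (English -> Italian), one per word class.
DETERMINERS = {"the": "il"}
NOUNS = {"cat": "gatto", "fish": "pesce"}
VERBS = {"eats": "mangia", "sees": "vede"}


def word(lexicon):
    """Build a rule that consumes one word of the given class."""
    def parse(tokens):
        if tokens and tokens[0] in lexicon:
            return [lexicon[tokens[0]]], tokens[1:]
        return None  # the rule does not apply
    return parse


def sequence(*rules):
    """Build a rule that applies the given rules one after another."""
    def parse(tokens):
        output = []
        for rule in rules:
            result = rule(tokens)
            if result is None:
                return None
            translated, tokens = result
            output.extend(translated)
        return output, tokens
    return parse


# The grammar, mirroring how it would read as a DCG:
#   sentence    --> noun_phrase, verb_phrase.
#   noun_phrase --> determiner, noun.
#   verb_phrase --> verb, noun_phrase.
noun_phrase = sequence(word(DETERMINERS), word(NOUNS))
verb_phrase = sequence(word(VERBS), noun_phrase)
sentence = sequence(noun_phrase, verb_phrase)


def translate(text):
    """Return the translation, or None if the text is not a full sentence."""
    result = sentence(text.lower().split())
    if result is None or result[1]:  # rule failed or words were left over
        return None
    return " ".join(result[0])


print(translate("The cat eats the fish"))
# prints: il gatto mangia il pesce
```

Input that the grammar does not cover, such as "cat the eats", is rejected with `None` rather than translated word by word.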


Information filtering and retrieval

An information filtering system is a system that removes redundant or unwanted information from an information stream using (semi)automated or computerized methods prior to presentation to a human user. Its main goal is to manage information overload and to increase the semantic signal-to-noise ratio. To do this, the user's profile is compared to some reference characteristics. These characteristics may originate from the information item (the content-based approach) or the user's social environment (the collaborative filtering approach).
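A minimal sketch of the content-based approach, assuming the user's profile and each item are plain text compared as bags of words with cosine similarity. The similarity measure, the `threshold` value and the sample texts are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Count how often each lower-cased word occurs."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_stream(profile_text, items, threshold=0.2):
    """Keep only the items that are similar enough to the user's profile."""
    profile = bag_of_words(profile_text)
    return [
        item for item in items
        if cosine_similarity(profile, bag_of_words(item)) >= threshold
    ]


stream = [
    "statistical machine translation with parallel corpora",
    "football results from last night",
    "a grammar for parsing italian",
]
for kept in filter_stream("machine translation parsing grammar", stream):
    print(kept)
```

With this profile the football item shares no words with the profile and is dropped, while the other two pass the threshold. Collaborative filtering would instead compare the user with other users, which this sketch does not attempt.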


Information filtering and retrieval

Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on full-text or other content-based indexing. Information retrieval is the science of searching for information in a document, searching for documents themselves, and also searching for metadata that describe data, and for databases of texts, images or sounds.

Automated information retrieval systems are used to reduce what has been called information overload. Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications.
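Full-text search is commonly built on an inverted index, which maps each term to the documents that contain it. The sketch below assumes whitespace tokenisation and a boolean AND query; the document collection and the function names are made up for illustration, and real search engines add ranking, stemming and much more.

```python
from collections import defaultdict


def build_index(documents):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of the documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


documents = {
    1: "machine translation of natural language",
    2: "information retrieval and web search",
    3: "statistical machine translation",
}
index = build_index(documents)
print(sorted(search(index, "machine translation")))
# prints: [1, 3]
```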


Question answering

Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

● Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance), and can be seen as an easier task because NLP systems can exploit domain-specific knowledge frequently formalized in ontologies.

● Open-domain question answering deals with questions about nearly anything, and can only rely on general ontologies and world knowledge. On the other hand, these systems usually have much more data available from which to extract the answer.


Automatic summarization

Automatic summarization is the process of shortening a text document with software, in order to create a summary with the major points of the original document. Technologies that can make a coherent summary take into account variables such as length, writing style and syntax.
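One simple family of summarizers is extractive: it selects the highest-scoring sentences from the original text rather than writing new ones. The sketch below scores each sentence by the average frequency of its words in the whole document. The stop-word list, the scoring formula and the sample text are assumptions for illustration; the sketch does not model writing style or syntax.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "are", "for", "with"}


def content_words(text):
    """Lower-cased words of the text, without stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    frequency = Counter(content_words(text))

    def score(sentence):
        words = content_words(sentence)
        if not words:
            return 0.0
        return sum(frequency[w] for w in words) / len(words)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)


text = (
    "Machine translation converts text between languages. "
    "Statistical machine translation learns translation models from corpora. "
    "Lunch was good."
)
print(summarize(text))
# prints: Statistical machine translation learns translation models from corpora.
```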

Online tools:

● http://textcompactor.com/

● https://www.tools4noobs.com/summarize/

● http://smmry.com/


Sentiment analysis

Sentiment analysis (sometimes known as opinion mining or emotion AI) refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information.

Sentiment analysis aims to determine the attitude of a speaker, writer, or other subject with respect to some topic or the overall contextual polarity or emotional reaction to a document, interaction, or event. The attitude may be a judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author or speaker), or the intended emotional communication (that is to say, the emotional effect intended by the author or interlocutor).


Sentiment analysis

Tools:

SentiWordNet is a lexical resource for opinion mining. SentiWordNet assigns to each synset of WordNet three sentiment scores: positivity, negativity, and objectivity.

http://sentiwordnet.isti.cnr.it/
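A lexicon of this kind can drive a very simple polarity classifier. The sketch below is only in the style of SentiWordNet: the scores are invented for illustration and are not taken from the real resource, words stand in for synsets, and objectivity is assumed to be whatever remains after positivity and negativity are subtracted from 1.

```python
# Hypothetical scores: word -> (positivity, negativity).
# NOT values from the real SentiWordNet, which scores synsets rather than words.
LEXICON = {
    "good": (0.75, 0.0),
    "great": (0.75, 0.0),
    "bad": (0.0, 0.625),
    "terrible": (0.0, 0.75),
}


def objectivity(word):
    """Assumed here to be what is left after positivity and negativity."""
    positivity, negativity = LEXICON.get(word, (0.0, 0.0))
    return 1.0 - positivity - negativity


def polarity(text):
    """Sum of (positivity - negativity) over all words; unknown words count as 0."""
    total = 0.0
    for word in text.lower().split():
        positivity, negativity = LEXICON.get(word, (0.0, 0.0))
        total += positivity - negativity
    return total


def label(text):
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("The food was good"))   # prints: positive
print(label("Terrible service"))    # prints: negative
```

Because each word is scored on its own, a sentence such as "oh great another delay" comes out as positive: plain lexicon lookup sees neither negation nor sarcasm.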


IBM Watson

https://www.ibm.com/watson/


Open problems

● Sarcasm is "a sharp, bitter, or cutting expression or remark; a bitter gibe or taunt". Sarcasm may employ ambivalence, although sarcasm is not necessarily ironic.

● Understanding the subtlety of this usage requires second-order interpretation of the speaker's or writer's intentions; different parts of the brain must work together to understand sarcasm. This sophisticated understanding can be lacking in some people with certain forms of brain damage, dementia and autism (although not always).


● In its broadest sense, irony is a rhetorical device, literary technique, or event in which what appears, on the surface, to be the case, differs radically from what is actually the case.


Irony as a result of wrong machine translation

● In the office of a doctor in Rome: Specialist in women and other diseases.

● In a Japanese hotel: You are invited to take advantage of the chambermaid.

● In a Norwegian cocktail lounge: Ladies are requested not to have children in the bar.

● ANY PERSONS (EXCEPT PLAYERS) CAUGHT COLLECTING GOLF BALLS ON THIS COURSE WILL BE PROSECUTED AND HAVE THEIR BALLS REMOVED


...well... I'm afraid the lessons on Computational Linguistics are over...

THIS IS SARCASM ;)