Natural Language Processing - Basics / Non Technical

Page 1: Natural Language Processing - Basics / Non Technical

Welcome!

Page 2: Natural Language Processing - Basics / Non Technical

Why NLP?

- We have to adapt to how the computer wants data, and we still adapt to the way the computer gives information back.
- NLP helps us make computers understand one of the most powerful human interfaces: language.
- Apple Siri and Google Now are cutting-edge examples of how NLP helps computers fit humans.
- More details: http://www.slideshare.net/yourfrienddhruv/apps-with-ears-and-eyes

Page 3: Natural Language Processing - Basics / Non Technical

Google Now vs. Siri vs. Cortana

https://www.stonetemple.com/great-knowledge-box-showdown/

Page 4: Natural Language Processing - Basics / Non Technical

Cutting edge NLP!

https://news.ycombinator.com/item?id=8426148

https://news.ycombinator.com/item?id=8428007

http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/

Page 5: Natural Language Processing - Basics / Non Technical

Cutting edge NLP!

https://news.ycombinator.com/item?id=8428418

AI Websites That Design Themselves

thegrid.io

Page 6: Natural Language Processing - Basics / Non Technical

NLP in today's session

In this session we will focus on how we can deal with written language in software products.

Page 7: Natural Language Processing - Basics / Non Technical

NLP for text analysis

- Knowledge is a fundamental requirement for any problem solving.
- An intelligent decision-making system needs three major things:
  - A) Lots of relevant knowledge
  - B) A way to represent that knowledge in relation to the current problem or question at hand
  - C) A way to represent the answer in human language

Page 8: Natural Language Processing - Basics / Non Technical

General Architecture of NLP systems

- Basic systems
  - Tokenization -> [lemmatization] -> tagging -> chunking -> domain mapping
  - NLP systems require pre-created, domain-specific corpora (dictionary + rule set handcrafted by humans)
  - Details: http://www.nltk.org/book/ch05.html
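The pipeline above can be sketched in a few lines of plain Python. This is only a toy illustration, not how NLTK or any other toolkit implements these stages: the `LEXICON` and `DOMAIN` dictionaries are hypothetical stand-ins for the handcrafted, domain-specific corpus, and the suffix stripping in `lemmatize` is a crude assumption.

```python
import re

# Hypothetical hand-made lexicon standing in for a domain-specific corpus.
LEXICON = {
    "the": "DET", "a": "DET",
    "customer": "NOUN", "order": "NOUN", "refund": "NOUN",
    "want": "VERB", "cancel": "VERB",
    "to": "PART", "late": "ADJ",
}

# Hypothetical domain mapping from lemmas to business areas.
DOMAIN = {"refund": "BILLING", "order": "SALES", "cancel": "SALES"}


def tokenize(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lemmatize(token):
    """Crude suffix stripping; real lemmatizers rely on dictionaries."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tag(lemmas):
    """Look each lemma up in the lexicon; unknown words are tagged 'UNK'."""
    return [(lemma, LEXICON.get(lemma, "UNK")) for lemma in lemmas]


def chunk(tagged):
    """Group runs of DET/ADJ/NOUN words into noun-phrase chunks."""
    chunks, current = [], []
    for word, pos in tagged + [("", "END")]:
        if pos in ("DET", "ADJ", "NOUN"):
            current.append((word, pos))
        else:
            # A run only counts as a noun phrase if it contains a noun.
            if any(p == "NOUN" for _, p in current):
                chunks.append(" ".join(w for w, _ in current))
            current = []
    return chunks


def map_to_domain(lemmas):
    """Map lemmas to the business areas they mention."""
    return sorted({DOMAIN[lemma] for lemma in lemmas if lemma in DOMAIN})


def analyze(text):
    """Run the whole pipeline and return the output of each stage."""
    lemmas = [lemmatize(t) for t in tokenize(text)]
    tagged = tag(lemmas)
    return {"tagged": tagged, "chunks": chunk(tagged), "domains": map_to_domain(lemmas)}


result = analyze("The customer wants to cancel the late orders")
print(result["chunks"], result["domains"])
```

Each stage only consumes the output of the previous one, which is why the dictionary and rule set have such a large influence on the final result.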

Page 9: Natural Language Processing - Basics / Non Technical

General Architecture of NLP systems

- Advanced systems
  - http://nlp.stanford.edu/software/patternslearning.shtml

Page 10: Natural Language Processing - Basics / Non Technical

Relationship to Machine Learning

- NLP
  - Algorithms and tooling are targeted at converting text/data to values
- ML
  - Algorithms and tooling are targeted at consuming values and producing meaningful values/vectors
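To make this division of labour concrete, here is a minimal sketch. It assumes a bag-of-words count as the "text to values" step and cosine similarity as the "values in, values out" step. Both choices, and the tiny `vocabulary`, are illustrative assumptions rather than anything prescribed by the session.

```python
import math
import re
from collections import Counter


def vectorize(text, vocabulary):
    """NLP side: turn text into a list of word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """ML side: consume numeric vectors and produce a similarity value."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical vocabulary and documents, for illustration only.
vocabulary = ["good", "bad", "phone", "battery"]
doc_a = vectorize("Good phone, good battery", vocabulary)
doc_b = vectorize("Bad battery", vocabulary)
similarity = cosine_similarity(doc_a, doc_b)
print(doc_a, doc_b, round(similarity, 3))
```

The ML side never sees words, only numbers, so the quality of the conversion step limits everything downstream.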

Page 11: Natural Language Processing - Basics / Non Technical

A few popular NLP toolkits

- Python
  - http://www.nltk.org
  - http://scikit-learn.org/
  - https://textblob.readthedocs.org
- Java
  - http://nlp.stanford.edu/software/index.shtml
  - https://gate.ac.uk/overview.html
  - https://opennlp.apache.org/
- R
  - http://cran.r-project.org/web/views/NaturalLanguageProcessing.html

Page 12: Natural Language Processing - Basics / Non Technical

Interesting applications

- Covered in this session
  - 1) Information summarization
  - 2) Information extraction
  - 3) Sentiment Analysis
  - 4) Dialog based systems

Page 13: Natural Language Processing - Basics / Non Technical

1) Information summarization

- Creates a summary of a large text.
  - http://summly.com/
- You can create a highly personalized summary of the same content per user.
  - http://automatedinsights.com/wordsmith/
- The race is on between 'plagiarism detection' and 'automatic paraphrasing'.
  - http://copyscape.com/
  - https://oaps.eu/project/overview/
  - http://plagcontrol.com
- Handy code:
  - Python and related: https://github.com/miso-belica/sumy
  - Java/Scala: https://github.com/MojoJolo/textteaser
- Basics:
  - Have to pick most interesting sentences with highest
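The selection criterion on this slide is cut off in the transcript. The sketch below therefore assumes one common choice: score each sentence by the average frequency of its non-stop words across the whole text, and keep the top scorers in their original order. The `STOP_WORDS` set is a small hypothetical list. Neither sumy nor textteaser is claimed to work this way.

```python
import re
from collections import Counter

# Small hypothetical stop-word list; real toolkits ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "it", "on", "for"}


def split_sentences(text):
    """Naive sentence splitter: break on whitespace after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercase words of a sentence with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    frequency = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(frequency[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:max_sentences])]


text = (
    "NLP helps computers read text. "
    "The weather is nice. "
    "NLP turns text into values."
)
print(summarize(text, max_sentences=2))
```

Sentences that reuse the text's most frequent words win, so the off-topic sentence about the weather is dropped.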

Page 14: Natural Language Processing - Basics / Non Technical

2) Information extraction

- Named Entity Recognition (NER)
  - Common entity types include ORGANIZATION, PERSON, LOCATION, DATE, TIME, MONEY, and GPE (geo-political entity).
- Relationship extraction
  - Mainly between named entities
  - http://www.cruxbot.com/
- Handy code:
  - http://www.nltk.org/book/ch07.html
- Basics:
  - Find an interesting pair of words, and note the adjoining words to learn the relationship between them.
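As a rough illustration of that idea, the sketch below treats any run of capitalized words as an "entity" and reports the few lowercase words sitting between two neighbouring entities as the candidate relationship. That entity rule is a deliberate simplification made for this example: real named entity recognition is far richer, and a sentence-initial word such as "The" would be wrongly picked up.

```python
import re

# Assumption: an entity is a run of capitalized words, e.g. "Larry Page".
ENTITY = re.compile(r"[A-Z][a-z]+(?:\s[A-Z][a-z]+)*")


def extract_relations(sentence, max_gap_words=4):
    """Return (entity, connecting words, entity) triples for neighbouring entities."""
    entities = list(ENTITY.finditer(sentence))
    triples = []
    for left, right in zip(entities, entities[1:]):
        between = sentence[left.end():right.start()].split()
        # Keep the pair only if a few plain lowercase words connect the two entities.
        if 1 <= len(between) <= max_gap_words and all(
            w.isalpha() and w.islower() for w in between
        ):
            triples.append((left.group(), " ".join(between), right.group()))
    return triples


relations = extract_relations("Larry Page founded Google in California.")
print(relations)
```

The connecting words ("founded", "in") are what a fuller system would then map onto relation types.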

Page 15: Natural Language Processing - Basics / Non Technical

2.1) Information Retrieval

- Large text needs to be searched based on keywords.
- Traditional RDBMS indexing doesn't work.
- Use full-text search toolkits, which are good practical examples of NLP implementation.
- Handy code:
  - Solr: Java
  - PostgreSQL: DB
  - http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
- Basics:
  - While storing large text, remove non-value-adding words (stop words such as "the" and "of") and index only the roots of words.
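A minimal sketch of that indexing idea follows. It assumes the dropped words come from a short hypothetical stop-word list and uses a crude suffix-stripping `stem` function in place of a real stemmer. Solr and PostgreSQL do considerably more than this, so treat it only as an illustration of the principle.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real engines use language-specific lists.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}


def stem(word):
    """Crude suffix stripping so that related word forms share one index key."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, drop stop words, and reduce each remaining word to its root."""
    return [stem(w) for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def build_index(documents):
    """Build an inverted index: root word -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in terms(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every query term."""
    query_terms = terms(query)
    if not query_terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in query_terms))


documents = {
    1: "The cat is chasing the mice",
    2: "Dogs chased a cat",
    3: "Indexing large documents",
}
index = build_index(documents)
print(search(index, "chasing cats"))
```

Because both the stored text and the query pass through the same `terms` function, "chasing cats" matches documents that say "chased a cat".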

Page 16: Natural Language Processing - Basics / Non Technical

3) Sentiment Analysis

- To understand the overall meaning/tone of text.
  - e.g. Neutral vs. Polar, Positive vs. Negative.
- Demo
  - http://text-processing.com/demo/sentiment/
  - http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
- Use:
  - Finding whether a Twitter trend is positive or negative
  - Finding whether the overall review of a product is positive or negative
- Basics:
  - Have to pick the most interesting phrases and correlate their meaning.
  - Correlate/group things with similar meaning.
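One simple way to approximate this is a lexicon-based scorer: group words of similar meaning into positive and negative sets, count the hits, and flip the polarity of a word that directly follows a negation. The word sets below are hypothetical and tiny. The demos linked above use trained models, not this approach.

```python
import re

# Hypothetical mini-lexicon; real systems use large curated or learned lexicons.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Classify text as 'positive', 'negative' or 'neutral' by counting lexicon hits."""
    score = 0
    negate = False
    for token in re.findall(r"[a-z']+", text.lower()):
        if token in NEGATIONS:
            negate = True
            continue
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # A negation only flips the word that immediately follows it.
        score += -polarity if negate else polarity
        negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("I love this phone, the camera is great"))
print(sentiment("The battery is not good and the screen is terrible"))
```

The same function covers the neutral vs. polar split: a score of zero is reported as neutral, anything else as polar with a direction.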

Page 17: Natural Language Processing - Basics / Non Technical

4) Dialog based systems

- Understand input given in natural language.
  - Google search, Siri, Google Now
- Building interactive chat bots to handle customer support.
- Details: http://www.nltk.org/book/ch10.html
- Handy code:
  - We can convert a question to a SQL query!
- Basics:
  - Have English grammar mapped to another grammar for input parsing, and vice versa.
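The question-to-SQL idea can be sketched with a very restricted English "grammar" expressed as one regular expression. Everything here is an assumption made for illustration: the `SCHEMA` tables and columns are hypothetical, the accepted phrasing is limited to "show/list/how many <table> [with|where <column> [is] <value>]", and the NLTK chapter linked above uses feature grammars instead of regular expressions.

```python
import re
import sqlite3

# Hypothetical schema: table names and the columns each one allows in filters.
SCHEMA = {"orders": {"status", "city"}, "customers": {"city", "country"}}

QUESTION = re.compile(
    r"^(?P<intent>show|list|how many)\s+(?P<table>\w+)"
    r"(?:\s+(?:with|where)\s+(?P<column>\w+)\s+(?:is\s+)?(?P<value>[\w ]+?))?\s*\??$",
    re.IGNORECASE,
)


def question_to_sql(question):
    """Map a restricted English question to (sql, params), or None if it does not parse."""
    match = QUESTION.match(question.strip())
    if not match:
        return None
    table = match.group("table").lower()
    if table not in SCHEMA:
        return None
    select = "COUNT(*)" if match.group("intent").lower() == "how many" else "*"
    sql = f"SELECT {select} FROM {table}"
    params = ()
    column = match.group("column")
    if column:
        column = column.lower()
        if column not in SCHEMA[table]:
            return None
        # Only whitelisted names are interpolated; the value is passed as a parameter.
        sql += f" WHERE {column} = ?"
        params = (match.group("value").strip(),)
    return sql, params


# Usage against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "shipped", "Pune"), (2, "pending", "Pune"), (3, "shipped", "Surat")],
)
sql, params = question_to_sql("How many orders with status shipped?")
shipped_count = conn.execute(sql, params).fetchone()[0]
print(sql, params, shipped_count)
```

Questions outside the tiny grammar return `None`, which is the point at which a real dialog system would ask the user to rephrase.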

Page 18: Natural Language Processing - Basics / Non Technical

Development & Testing/Verifying of NLP systems

- 1) Understand Gold Set, Training Set, Test Set
- 2) Seen vs. Unseen Data
- 3) Accuracy: Precision & Recall
- 4) Confusion Matrices
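As a small worked example of points 3 and 4, the sketch below compares a system's predictions with gold labels, builds a confusion matrix as a counter of (gold, predicted) pairs, and derives precision and recall for one label. The labels and data are made up for illustration.

```python
from collections import Counter


def confusion_matrix(gold, predicted):
    """Count how often each (gold label, predicted label) pair occurs."""
    return Counter(zip(gold, predicted))


def precision_recall(matrix, label):
    """Precision: share of predictions of `label` that were right.
    Recall: share of gold `label` items that the system found."""
    true_pos = matrix[(label, label)]
    predicted_pos = sum(n for (g, p), n in matrix.items() if p == label)
    actual_pos = sum(n for (g, p), n in matrix.items() if g == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Made-up gold labels (human annotated) and system output on unseen test data.
gold = ["pos", "pos", "neg", "neg", "pos", "neg"]
predicted = ["pos", "neg", "neg", "pos", "pos", "neg"]

matrix = confusion_matrix(gold, predicted)
precision, recall = precision_recall(matrix, "pos")
print(dict(matrix), precision, recall)
```

The off-diagonal entries of the matrix, such as ("pos", "neg"), show exactly which confusions the system makes, which a single accuracy number hides.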

Page 19: Natural Language Processing - Basics / Non Technical

Session Summary

- 1) NLP + ML capabilities are the foundation for intelligent systems working with/on consumer data.
- 2) Domain knowledge is the key differentiator and a MAJOR cost factor.
- 3) NLP system development requires a different mindset, as it is not creation but evolution of a software system.
- 4) Lots and lots of academic/research reading is a must.

Page 20: Natural Language Processing - Basics / Non Technical

What Next? Q&A? Are you sure?

- I have an idea which might require NLP
  - Go reach out to more people: @nikunjness, @yourfrienddhruv
- I want to know how to develop such systems
- I think I want to research more possibilities!
  - Read this: http://www.nltk.org/book/ch01.html
  - Yes, it's Python.
- I think it's too complex.
  - You are not alone.