CWIN 17 – Natural Language Processing
Armin Wallrab | Director PreSales Central & Northern Europe
©2017 Talend Inc

CWIN17 Frankfurt / talend_nlp

Natural Language Processing

• What is natural language processing?
• Text tokenization
• Sentence splitting
• Part-of-Speech tagging (http://www.clips.ua.ac.be/pages/mbsp-tags)
• Syntactic parsing
• Shallow parsing (aka chunking)
• Named Entity Recognition
• Co-reference resolution
• Dependency parsing
• Sentiment analysis

Play at http://nlp.stanford.edu:8080/corenlp/process
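
The first two steps in the list, sentence splitting and tokenization, can be illustrated with a very small sketch. This is a naive regex-based approximation written for illustration only; it is not how Stanford CoreNLP or the Talend components do it, and the function names are made up for this example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def tokenize(sentence):
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)

text = "Talend supports NLP. Does it run on Spark? Yes!"
sentences = split_sentences(text)
tokens = [tokenize(s) for s in sentences]

for sentence, toks in zip(sentences, tokens):
    print(sentence, "->", toks)
```

A rule this simple breaks on abbreviations such as "e.g." or "Dr.", which is one reason trained models are normally used for these steps.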

Where can this be useful?

• Extract useful information from textual resources (such as forums, notes in Salesforce, etc.)
  • Names of persons
  • Names of companies (competitors...)
  • Names of tools (competing tools...)
• Classify discussions by topics
  • Group discussions together
  • Find discussions where people are mentioned but don't participate in the discussion
• Entity linking
  • Links between profiles and mentions in the text
  • Links between persons and organizations
  • Links between persons and any other information that may be used for re-identification
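
One of the use cases above, finding discussions where people are mentioned without taking part, can be sketched as follows. The data layout (`id`, `participants`, `messages`) and the function name are assumptions made for this example, and plain substring matching stands in for a real named entity recognition step.

```python
def mentioned_but_absent(discussions, known_people):
    """Map discussion id -> people named in the text who did not participate."""
    results = {}
    for d in discussions:
        participants = {p.lower() for p in d["participants"]}
        text = " ".join(d["messages"]).lower()
        absent = sorted(
            name for name in known_people
            if name.lower() in text and name.lower() not in participants
        )
        if absent:
            results[d["id"]] = absent
    return results

# Hypothetical sample data
discussions = [
    {
        "id": "t1",
        "participants": ["Alice Martin", "Bob Chen"],
        "messages": ["Bob Chen, did you ask Carol Diaz about the license?", "Not yet."],
    },
    {
        "id": "t2",
        "participants": ["Carol Diaz"],
        "messages": ["I will check the Spark job."],
    },
]
known_people = ["Alice Martin", "Bob Chen", "Carol Diaz"]

result = mentioned_but_absent(discussions, known_people)
print(result)
```

In a real pipeline the list of names would come from the entity extraction step rather than from a fixed list.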

Relationship with data quality?

• Use textual data to get more information about your structured data
• Analyze CRM notes
  • Extract contact names
  • Get information about their status (left the company, new phone number, got married and changed name…)
• Compare them with the current values in your structured data
  • Contact information up to date?
  • Name changed?
  • Phone changed?
  • Address changed?
  • …

Self-healing customer data quality issues through interpretation of unstructured data (Chandrasekaran. K, Clement. D)
http://ualr.edu/informationquality/iciq-proceedings/iciq-2015/
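
The "compare with current values" step can be illustrated for the phone number case. This is only a sketch under simple assumptions: the record layout, the phone regex, and the function names are invented for the example, and a real job would rely on a trained extraction model rather than a pattern.

```python
import re

# Loose pattern for phone-like strings: optional '+', then digits with separators
PHONE_RE = re.compile(r"\+?\d[\d\s\-/]{6,}\d")

def normalize_phone(raw):
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", raw)

def find_phone_mismatch(note, record):
    """Return phone numbers found in the note that differ from the stored one."""
    stored = normalize_phone(record["phone"])
    found = {normalize_phone(m) for m in PHONE_RE.findall(note)}
    return sorted(p for p in found if p != stored)

# Hypothetical CRM record and free-text note
record = {"name": "Jane Doe", "phone": "+49 69 1234567"}
note = "Jane called, her new number is +49 69 7654321."

candidates = find_phone_mismatch(note, record)
print(candidates)
```

A non-empty result flags the record for review; it does not prove the stored value is wrong.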

Great! How does it work in Talend?

• Prepare text sample
  • Remove clutter (e.g. HTML tags)
  • Tokenize & normalize
• Train a Model
  • Design the features
  • Label entities
  • Validate the model (e.g. K-Fold Cross Validation)
• Use the Model
  • Apply on full text

Use Spark Batch
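
To make the preparation and validation steps more concrete, here is a minimal sketch in plain Python. It is not the Talend implementation (which runs as Spark Batch jobs); the function names are hypothetical, the HTML stripping is a crude regex, and the K-fold split is unshuffled for readability.

```python
import html
import re

def clean_text(raw):
    """Remove clutter: drop HTML tags, decode entities, collapse whitespace."""
    no_tags = re.sub(r"<[^>]+>", " ", raw)
    unescaped = html.unescape(no_tags)
    return re.sub(r"\s+", " ", unescaped).strip()

def tokenize_and_normalize(text):
    """Lowercase the text and keep runs of word characters as tokens."""
    return re.findall(r"\w+", text.lower())

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k consecutive folds."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most 1
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

cleaned = clean_text("<p>Tom &amp; <b>Jerry</b></p>")
tokens = tokenize_and_normalize(cleaned)
folds = list(k_fold_indices(5, 2))
print(cleaned, tokens, folds)
```

Each sample lands in the validation set exactly once, so averaging the per-fold scores gives an estimate that does not depend on a single train/test split.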

Component workflow

Text transformations

Convert to CoNLL-2003 format, add optional features, and label tokens

Extract named entities with <PER> labels
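
The two transformations above can be sketched as follows. This is a simplified illustration, not the output of the Talend components: the full CoNLL-2003 layout has four columns per token (word, POS tag, chunk tag, named entity tag), while this sketch assumes only a token and a label column, and the function names are hypothetical.

```python
def to_conll(tokens, labels):
    """Write one 'token label' pair per line (simplified CoNLL-style layout)."""
    return "\n".join(f"{tok} {lab}" for tok, lab in zip(tokens, labels))

def extract_entities(conll_text, label="PER"):
    """Join consecutive tokens carrying the given label into entity strings."""
    entities, current = [], []
    for line in conll_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[-1] == label:
            current.append(parts[0])
        elif current:
            # A differently labelled token or a blank line ends the entity
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

# Hypothetical labelled sentence
tokens = ["Jane", "Doe", "works", "at", "Talend", "."]
labels = ["PER", "PER", "O", "O", "ORG", "O"]

conll = to_conll(tokens, labels)
persons = extract_entities(conll, label="PER")
print(conll)
print(persons)
```

Because adjacent tokens with the same label are merged, two person names that directly follow each other would be joined; begin/inside tag schemes exist to avoid that.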

The Stanford CoreNLP Library

Semantic Analysis

http://nlp.stanford.edu:8080/corenlp/

Meaning of the tags

https://www.clips.uantwerpen.be/pages/mbsp-tags

Sentiment Analysis & Sentiment Tree

http://corenlp.run/

http://nlp.stanford.edu:8080/sentiment/rntnDemo.html

Let’s do some NLP with Talend!

Capturing Twitter Messages

Analysis of text messages with Talend

Summary

• Natural Language Processing (NLP) components are available in Spark Batch and Streaming
• What can it be used for?
  • Extract useful information from textual resources (people names, companies, tools…)
  • Classify discussions by topics (group discussions together, find discussions where people are mentioned)
  • Entity linking (e.g. persons and organizations linking, links between persons and any other information that may be used for re-identification)
• What are the typical industry use cases?
  • Intelligent Search
  • Sentiment Analysis
  • Marketing Personalization
  • GDPR
  • …
• Talend comes with Support for NLP
  • Model Preparation
  • Model Training
  • Model Evaluation
