Intro to Information Extraction: Sentiment Lexicon Induction as an Example

Many slides adapted from Ellen Riloff and Dan Jurafsky

Source: faculty.cse.tamu.edu/huangrh/Fall18-638/l16_ie-overview_sentiment_lexicon.pdf

What is Information Extraction?

• Information extraction (IE) is an umbrella term for NLP tasks that involve extracting pieces of information from text and assigning some meaning to that information.
• Many IE applications aim to turn unstructured text into a “structured” representation.
• IE problems typically involve:
– identifying text snippets to extract
– assigning semantic meaning to entities or concepts
– finding relations between entities or concepts

IE Applications

• Biological Processes (Genomics)
• Clinical Medicine
• Question Answering / Web Search
• Query Expansion / Semantic Sets
• Extracting Entity Profiles
• Tracking Events (Violent, Diseases, Business, etc.)
• Tracking Opinions (Political, Product Reputation, Financial Prediction, On-line Reviews, etc.)

General Techniques

• Syntactic Analysis
– Phrase Identification
– Feature Extraction
• Semantic Analysis
• Statistical Measures
• Machine Learning
– Supervised & Weakly Supervised
• Graph Algorithms

Named Entity Recognition (NER)

Mars One announced Monday that it has picked 1,058 aspiring spaceflyers to move on to the next round in its search for the first humans to live and die on the Red Planet.

The Wall Street Journal reports that Google plans to partner with Toyota to develop Android software for their hybrid cars.

NER typically involves extracting and labeling certain types of entities, such as proper names and dates.
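As a rough illustration of the labeling step, the sketch below tags entities in the example sentences by looking them up in a small hand-built list (a gazetteer). The list, the label names ORG, DATE, LOC and PRODUCT, and the function `tag_entities` are all hypothetical; practical NER systems usually learn to label entities with statistical models rather than relying only on fixed lists.

```python
import re

# Hypothetical gazetteer: entity string -> label. A real system would need far
# larger lists (or a learned model) to cover names it has never seen.
GAZETTEER = {
    "Mars One": "ORG",
    "The Wall Street Journal": "ORG",
    "Google": "ORG",
    "Toyota": "ORG",
    "Android": "PRODUCT",
    "Monday": "DATE",
    "Red Planet": "LOC",
}

# Longest names first, so the longest entry wins when several start at one position.
_NAMES = sorted(GAZETTEER, key=len, reverse=True)
_PATTERN = re.compile(r"\b(" + "|".join(re.escape(n) for n in _NAMES) + r")\b")


def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity, label) pairs in the order they appear in `text`."""
    return [(m.group(1), GAZETTEER[m.group(1)]) for m in _PATTERN.finditer(text)]


sentence = ("The Wall Street Journal reports that Google plans to partner with "
            "Toyota to develop Android software for their hybrid cars.")
print(tag_entities(sentence))
# [('The Wall Street Journal', 'ORG'), ('Google', 'ORG'), ('Toyota', 'ORG'), ('Android', 'PRODUCT')]
```

A lookup like this only finds names already in the list, which is exactly the gap that learned NER models and lexicon induction try to close.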

Domain-specific NER

Biomedical systems must recognize genes and proteins:

IL-2 gene expression and NF-kappa B activation through CD28 requires reactive oxygen production by 5-lipoxygenase.

Clinical medical systems must recognize problems and treatments:

Adrenal-sparing surgery is safe and effective, and may become the treatment of choice in patients with hereditary phaeochromocytoma.

Semantic Class Identification

Mars One announced Monday that it has picked 1,058 aspiring spaceflyers to move on to the next round in its search for the first humans to live and die on the Red Planet.

The Wall Street Journal reports that Google plans to partner with Toyota to develop Android software for their hybrid cars.

Semantic Lexicon Induction

• Although some general semantic dictionaries exist (e.g., WordNet), domain-specific applications often have specialized vocabulary.
• Semantic lexicon induction techniques learn lists of words that belong to a semantic class.

Vehicles: car, jeep, helicopter, bike, tricycle, scooter, …
Animal: tiger, zebra, wolverine, platypus, echidna, …
Symptoms: cough, sneeze, pain, pu/pd, elevated bp, …
Products: camera, laptop, iPad, tablet, GPS device, …

Domain-specific Vocabulary

A 14yo m/n doxy owned by a reputable breeder is being treated for IBD with pred.

doxy = ANIMAL, breeder = HUMAN, IBD = DISEASE, pred = DRUG

Domain-specific meanings: lab, mix, m/n = ANIMAL

Semantic Taxonomy Induction

• Ideally, we want semantic concepts to be organized in a taxonomy, to support generalization while still distinguishing different subtypes.

Animal
  Mammal
    Feline
      Lion, Panthera Leo
      Tiger, Panthera Tigris, Felis Tigris
      Cougar, Mountain Lion, Puma, Panther, Catamount
    Canine
      Wolf, Canis Lupus
      Coyote, Prairie Wolf, Brush Wolf, American Jackal
      Dog, Puppy, Canis Lupus Familiaris, Mongrel

Challenges in Taxonomy Induction

• But there are often many ways to organize a conceptual space!
• Strict hierarchies are rare in real data; graphs/networks are more realistic than tree structures.
• For example, animals could be subcategorized based on:
– carnivore vs. herbivore
– water-dwelling vs. land-dwelling
– wild vs. pets vs. agricultural
– physical characteristics (e.g., baleen vs. toothed whales)
– habitat (e.g., arctic vs. desert)
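One way to capture this in code is to let each concept point to several parent categories, which turns the taxonomy into a directed graph instead of a tree. The concepts and edges below are a small hand-picked illustration (the `PARENTS` table and the `ancestors` helper are hypothetical), not an induced taxonomy.

```python
# Hypothetical, hand-picked edges: each concept maps to its parent categories.
# "wolf" has three parents, so the structure is a graph (DAG), not a tree.
PARENTS: dict[str, set[str]] = {
    "mammal": {"animal"},
    "feline": {"mammal"},
    "canine": {"mammal"},
    "carnivore": {"animal"},
    "wild": {"animal"},
    "pet": {"animal"},
    "lion": {"feline", "carnivore", "wild"},
    "wolf": {"canine", "carnivore", "wild"},
    "dog": {"canine", "pet"},
}


def ancestors(concept: str) -> set[str]:
    """All categories reachable by following parent links (iterative DFS)."""
    seen: set[str] = set()
    stack = list(PARENTS.get(concept, ()))
    while stack:
        category = stack.pop()
        if category not in seen:
            seen.add(category)
            stack.extend(PARENTS.get(category, ()))
    return seen


print(sorted(ancestors("wolf")))  # ['animal', 'canine', 'carnivore', 'mammal', 'wild']
print(sorted(ancestors("dog")))   # ['animal', 'canine', 'mammal', 'pet']
```

The `seen` set keeps the traversal from revisiting "animal", which is reachable along several paths once concepts have more than one parent.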

Relation Extraction

In Salzburg, little Mozart grew up in a loving middle-class environment.
Birthplace(Mozart, Salzburg)

Steve Ballmer is an American businessman who has been serving as the CEO of Microsoft since January 2000.
Employed-By(Steve Ballmer, Microsoft)
CEO(Steve Ballmer, Microsoft)

Relations for Web Search

Paraphrasing

• Relations can often be expressed with a multitude of different expressions.
• Paraphrasing systems try to explicitly learn phrases that represent the same type of relation.
• Examples:
– X was born in Y
– Y is the birthplace of X
– X’s birthplace is Y
– X’s hometown is Y
– X grew up in Y
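A minimal sketch of how the paraphrases above could be used as hand-written extraction patterns for a Birthplace(X, Y) relation. It assumes that a name is a run of capitalized words, a crude stand-in for real NER; `NAME`, `PATTERNS` and `extract_birthplaces` are hypothetical names. A paraphrasing system would learn such patterns rather than have them typed in.

```python
import re

# Crude assumption: a name is one or more capitalized words.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"
POSS = r"['’]s"  # accept a straight or curly apostrophe

# (pattern, y_first): y_first is True when the paraphrase mentions Y before X.
PATTERNS = [
    (re.compile(NAME + r" was born in " + NAME), False),
    (re.compile(NAME + r" is the birthplace of " + NAME), True),
    (re.compile(NAME + POSS + r" birthplace is " + NAME), False),
    (re.compile(NAME + POSS + r" hometown is " + NAME), False),
    (re.compile(NAME + r" grew up in " + NAME), False),
]


def extract_birthplaces(text: str) -> list[tuple[str, str, str]]:
    """Return ("Birthplace", X, Y) tuples matched by any paraphrase pattern."""
    found = []
    for pattern, y_first in PATTERNS:
        for m in pattern.finditer(text):
            first, second = m.group(1), m.group(2)
            x, y = (second, first) if y_first else (first, second)
            found.append(("Birthplace", x, y))
    return found


print(extract_birthplaces("Salzburg is the birthplace of Mozart."))
# [('Birthplace', 'Mozart', 'Salzburg')]
```

Different surface forms map to the same relation tuple, which is the point of grouping paraphrases together.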

Event Extraction

Goal: extract facts about events from unstructured documents.

Example: extracting information about terrorism events in news articles:

December 29, Pakistan - The U.S. embassy in Islamabad was damaged this morning by a car bomb. Three diplomats were injured in the explosion. Al Qaeda has claimed responsibility for the attack.

EVENT
Type: bombing
Target: U.S. embassy
Location: Islamabad, Pakistan
Date: December 29
Weapon: car bomb
Victim: three diplomats
Perpetrator: Al Qaeda
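The filled template is simply a structured record. A minimal Python rendering might look like the sketch below; the class and field names (`TerrorismEvent`, `event_type`, `missing_slots`) are hypothetical. Slots default to None because an extractor may find no filler for some slots in a given document.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TerrorismEvent:
    """Slot-filling template mirroring the bombing example (names are hypothetical)."""
    event_type: Optional[str] = None  # the template's "Type" slot
    target: Optional[str] = None
    location: Optional[str] = None
    date: Optional[str] = None
    weapon: Optional[str] = None
    victim: Optional[str] = None
    perpetrator: Optional[str] = None

    def missing_slots(self) -> list[str]:
        """Names of slots that are still unfilled."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The template filled from the example article (values copied from the example).
event = TerrorismEvent(
    event_type="bombing",
    target="U.S. embassy",
    location="Islamabad, Pakistan",
    date="December 29",
    weapon="car bomb",
    victim="three diplomats",
    perpetrator="Al Qaeda",
)
print(event.missing_slots())  # []
```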

Event Extraction

Another example: extracting information about disease outbreak events.

Document text:
New Jersey, February 26. An outbreak of swine flu has been confirmed in Mercer County, NJ. Five teenage boys appear to have contracted the deadly virus from an unknown source. The CDC is investigating the cases and is taking measures to prevent the spread. . .

Event
Disease: swine flu
Location: Mercer County, NJ
Victim: Five teenage boys
Date: February 26
Status: confirmed

Large-Scale IE from the Web

• Some researchers have been developing IE systems for large-scale extraction of facts and relations from the Web.
• These systems exploit the massive amount of text and redundancy available on the Web and use weakly supervised, iterative learning to harvest information for automated knowledge base construction.
• The KnowItAll project at UW and the NELL project at CMU are well-known efforts pursuing this work.

Opinion Extraction

I just bought a Powershot a few days ago. I took some pictures using the camera. Colors are so beautiful even when flash is used. Also easy to grip since the body has a grip handle. [Kobayashi et al., 2007]

Source: <writer>
Target: Powershot
Aspect: pictures, colors
Evaluation: beautiful, easy to grip

Opinion Extraction from News [Wilson & Wiebe, 2009]

Italian senator Renzo Gubert praised the Chinese Government’s efforts.
Source: Italian senator Renzo Gubert
Target: the Chinese Government
Evaluation: praised (POSITIVE)

African observers generally approved of his victory while Western governments denounced it.
Source: African observers
Target: his victory
Evaluation: approved (POSITIVE)

Source: Western governments
Target: it (his victory)
Evaluation: denounced (NEGATIVE)

Summary

• Information extraction systems frequently rely on low-level NLP tools for basic language analysis, often in a pipeline architecture.
• There is a wide variety of applications for IE, including both broad-coverage and domain-specific applications.
• Some IE tasks are relatively well understood (e.g., named entity recognition), while others are still quite challenging!
• We’ve only scratched the surface of possible IE tasks… nearly endless possibilities.

Turney Algorithm to Learn a Sentiment Lexicon

1. Extract a phrasal lexicon from reviews
2. Learn the polarity of each phrase
3. Rate a review by the average polarity of its phrases

Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews
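Step 3 is simple enough to sketch directly, assuming steps 1 and 2 have already produced one polarity score per extracted phrase. The function name `rate_review` and the tie-breaking choice (an average of exactly zero counts as thumbs down) are assumptions of this sketch.

```python
from statistics import mean


def rate_review(phrase_polarities: list[float]) -> str:
    """Classify a review by the average polarity of its extracted phrases.

    Assumes steps 1-2 already produced one polarity score per phrase; a
    positive average is read as thumbs up, anything else as thumbs down.
    """
    if not phrase_polarities:
        raise ValueError("no phrases were extracted from the review")
    return "thumbs up" if mean(phrase_polarities) > 0 else "thumbs down"


# A few phrase polarities taken from the example review tables later in this deck.
print(rate_review([2.8, 2.3, -0.73, -1.5]))  # thumbs up
print(rate_review([5.8, -2.8, -6.8, -8.5]))  # thumbs down
```

Because only the sign of the average matters, a few strongly negative phrases can outweigh several mildly positive ones.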

Extract two-word phrases with adjectives

First Word         Second Word             Third Word (not extracted)
JJ                 NN or NNS               anything
RB, RBR, or RBS    JJ                      not NN nor NNS
JJ                 JJ                      not NN nor NNS
NN or NNS          JJ                      not NN nor NNS
RB, RBR, or RBS    VB, VBD, VBN, or VBG    anything
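The table can be applied mechanically to a POS-tagged sentence. The sketch below assumes Penn Treebank tags from some tagger (not shown; the example tags are assigned by hand) and treats a missing third word at the end of a sentence as satisfying "not NN nor NNS", which is an assumption of this sketch. `extract_phrases` is a hypothetical name.

```python
# Tag sets used by the five patterns (Penn Treebank tags).
NOUNS = {"NN", "NNS"}
ADVERBS = {"RB", "RBR", "RBS"}
VERBS = {"VB", "VBD", "VBN", "VBG"}


def extract_phrases(tagged: list[tuple[str, str]]) -> list[str]:
    """Return two-word phrases from one (word, tag) sentence that match a pattern."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # The third word is only inspected, never extracted; None at sentence end
        # (assumed here to satisfy "not NN nor NNS").
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        third_ok = t3 not in NOUNS
        if ((t1 == "JJ" and t2 in NOUNS)                        # JJ + NN/NNS
                or (t1 in ADVERBS and t2 == "JJ" and third_ok)  # RB* + JJ
                or (t1 == "JJ" and t2 == "JJ" and third_ok)     # JJ + JJ
                or (t1 in NOUNS and t2 == "JJ" and third_ok)    # NN/NNS + JJ
                or (t1 in ADVERBS and t2 in VERBS)):            # RB* + VB*
            phrases.append(f"{w1} {w2}")
    return phrases


# Hand-tagged example sentence (tags assumed, not produced by a real tagger here).
sentence = [("the", "DT"), ("bank", "NN"), ("is", "VBZ"),
            ("inconveniently", "RB"), ("located", "VBN"), ("but", "CC"),
            ("has", "VBZ"), ("very", "RB"), ("low", "JJ"), ("fees", "NNS")]
print(extract_phrases(sentence))  # ['inconveniently located', 'low fees']
```

Here "very low" is skipped because the following word "fees" is NNS, while "low fees" is picked up by the first pattern.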

How to measure the polarity of a phrase?

• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?

Pointwise Mutual Information

• Pointwise mutual information:
– How much more do events x and y co-occur than if they were independent?

$$\mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)}$$

• PMI between two words:
– How much more do two words co-occur than if they were independent?

$$\mathrm{PMI}(\text{word}_1, \text{word}_2) = \log_2 \frac{P(\text{word}_1, \text{word}_2)}{P(\text{word}_1)\,P(\text{word}_2)}$$
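The definition translates directly into code once the probabilities are estimated from counts. A minimal sketch, assuming maximum-likelihood estimates (count divided by the number of observations n); `pmi` is a hypothetical name, and the counts are chosen so the floating-point arithmetic is exact.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, n: int) -> float:
    """PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ) with count / n estimates.

    PMI is undefined (log of zero) when x and y never co-occur.
    """
    if count_xy == 0:
        raise ValueError("x and y never co-occur, so PMI is undefined")
    p_xy = count_xy / n
    p_x = count_x / n
    p_y = count_y / n
    return math.log2(p_xy / (p_x * p_y))


print(pmi(8, 32, 32, 1024))  # 3.0: co-occur 8 times more often than chance
print(pmi(1, 32, 32, 1024))  # 0.0: exactly what independence predicts
print(pmi(1, 64, 64, 1024))  # -2.0: co-occur 4 times less often than chance
```

Positive values mean the two words appear together more often than independence would predict, negative values less often.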

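How to Estimate Pointwise Mutual Information

– Query a search engine (AltaVista)
• P(word) estimated by hits(word)/N
• P(word1, word2) estimated by hits(word1 NEAR word2)/N
– (Strictly, the denominator for the pair should be kN: there are N consecutive bigrams (word1, word2) in total, but about kN word pairs that lie within k words of each other. We nevertheless use N on this slide and the next.)

$$\mathrm{PMI}(\text{word}_1, \text{word}_2) = \log_2 \frac{\frac{1}{N}\,\text{hits}(\text{word}_1 \text{ NEAR } \text{word}_2)}{\frac{1}{N}\,\text{hits}(\text{word}_1)\,\frac{1}{N}\,\text{hits}(\text{word}_2)}$$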

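Does the phrase appear more with “poor” or “excellent”?

$$
\begin{aligned}
\text{Polarity}(\text{phrase}) &= \mathrm{PMI}(\text{phrase}, \text{``excellent''}) - \mathrm{PMI}(\text{phrase}, \text{``poor''}) \\
&= \log_2 \frac{\frac{1}{N}\,\text{hits}(\text{phrase NEAR ``excellent''})}{\frac{1}{N}\,\text{hits}(\text{phrase})\,\frac{1}{N}\,\text{hits}(\text{``excellent''})} - \log_2 \frac{\frac{1}{N}\,\text{hits}(\text{phrase NEAR ``poor''})}{\frac{1}{N}\,\text{hits}(\text{phrase})\,\frac{1}{N}\,\text{hits}(\text{``poor''})} \\
&= \log_2 \left( \frac{\text{hits}(\text{phrase NEAR ``excellent''})}{\text{hits}(\text{phrase})\,\text{hits}(\text{``excellent''})} \cdot \frac{\text{hits}(\text{phrase})\,\text{hits}(\text{``poor''})}{\text{hits}(\text{phrase NEAR ``poor''})} \right) \\
&= \log_2 \frac{\text{hits}(\text{phrase NEAR ``excellent''})\,\text{hits}(\text{``poor''})}{\text{hits}(\text{phrase NEAR ``poor''})\,\text{hits}(\text{``excellent''})}
\end{aligned}
$$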
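The derivation can be checked numerically: computing the two PMI terms separately and using the final simplified ratio should give the same polarity, since N and hits(phrase) cancel. The hit counts below are made up for illustration, and the small `smoothing` constant added to the two NEAR counts is an assumption of this sketch, used so that a phrase never seen near one anchor word does not produce log2(0). All function names are hypothetical.

```python
import math


def pmi_from_hits(hits_near: float, hits_a: float, hits_b: float, n: float) -> float:
    """PMI(a, b) with each probability estimated as a hit count divided by N."""
    return math.log2((hits_near / n) / ((hits_a / n) * (hits_b / n)))


def polarity(near_exc, near_poor, hits_phrase, hits_exc, hits_poor, n, smoothing=0.01):
    """PMI(phrase, "excellent") - PMI(phrase, "poor"), computed term by term.

    `smoothing` is added to both NEAR counts (an assumption of this sketch).
    """
    return (pmi_from_hits(near_exc + smoothing, hits_phrase, hits_exc, n)
            - pmi_from_hits(near_poor + smoothing, hits_phrase, hits_poor, n))


def polarity_simplified(near_exc, near_poor, hits_exc, hits_poor, smoothing=0.01):
    """The last line of the derivation: N and hits(phrase) have cancelled."""
    return math.log2(((near_exc + smoothing) * hits_poor)
                     / ((near_poor + smoothing) * hits_exc))


# Hypothetical hit counts, for illustration only.
full = polarity(300, 50, 1_000, 200_000, 100_000, n=10**9)
short = polarity_simplified(300, 50, 200_000, 100_000)
print(math.isclose(full, short), full > 0)  # True True
```

The simplified form needs only four queries per phrase and never needs N, which is why it is the convenient one to compute.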

Phrases from a thumbs-up review

Phrase                    POS tags    Polarity
online service            JJ NN        2.8
online experience         JJ NN        2.3
direct deposit            JJ NN        1.3
local branch              JJ NN        0.42
…
low fees                  JJ NNS       0.33
true service              JJ NN       -0.73
other bank                JJ NN       -0.85
inconveniently located    RB VBN      -1.5
Average                                0.32

Phrases from a thumbs-down review

Phrase                    POS tags    Polarity
direct deposits           JJ NNS       5.8
online web                JJ NN        1.9
very handy                RB JJ        1.4
…
virtual monopoly          JJ NN       -2.0
lesser evil               RBR JJ      -2.3
other problems            JJ NNS      -2.8
low funds                 JJ NNS      -6.8
unethical practices       JJ NNS      -8.5
Average                               -1.2

Results of Turney algorithm

• 410 reviews from Epinions
– 170 (41%) negative
– 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information
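The baseline figure is just the share of the larger class: a classifier that labels every review positive is right 240 times out of 410. The arithmetic, as a quick check (variable names are illustrative):

```python
negative, positive = 170, 240
total = negative + positive                          # 410 reviews
majority_baseline = max(negative, positive) / total  # always predict "positive"

print(f"{negative / total:.0%} negative, {positive / total:.0%} positive")
# 41% negative, 59% positive
print(f"majority-class baseline accuracy: {majority_baseline:.1%}")
# majority-class baseline accuracy: 58.5%
```

Rounded to whole percent, 58.5% is the 59% baseline reported above, so the Turney algorithm's 74% is roughly a 15-point gain over always guessing the majority class.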