Case study: Comcast Consumer reviews Olabanji Shonibare and Melissa Davidson March 1, 2017

Natural language processing using Comcast reviews

Outline

• Introduction

• Word co-occurrences

• Topic modeling

• Sentiment analysis

Introduction

Comcast Consumer Complaints

• comcast_consumeraffairs_complaints.csv: raw complaint data about Comcast television and internet published at consumeraffairs.com between 04/08 and 09/16.

• comcast_fcc_complaints_2015.csv: raw complaints made to the FCC about Comcast between 04/15 and 06/15.

ConsumerAffairs

Number of complaints (CA)

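[Figure: horizontal bar chart of complaint counts by location. x-axis: Count (0 to 800); y-axis: Location. Locations in plot order: AK, BC, NE, ON, PE, WY, HI, ID, ND, KS, NV, ME, NC, WI, OH, VT, MO, AR, KY, WV, AZ, LA, NY, MS, NM, DC, SC, DE, NH, AL, UT, CT, OR, MN, IN, CO, MA, VA, WA, MD, TN, MI, TX, NJ, PA, IL, GA, CA, FL.]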

Complaints by percentage (CA)

[Figure: US state map titled "CA Percentage for Areas with Comcast". Legend "Percent" bins: [0.00272 to 0.00387), [0.00387 to 0.00443), [0.00443 to 0.00481), [0.00481 to 0.00571), [0.00571 to 0.00705), [0.00705 to 0.00931), [0.00931 to 0.27972], NA.]

FCC

Complaints by percentage (FCC)

[Figure: US state map titled "FCC Percentage for Areas with Comcast". Legend "Percent" bins: [0.000663 to 0.001310), [0.001315 to 0.001580), [0.001577 to 0.001680), [0.001682 to 0.002090), [0.002093 to 0.003100), [0.003103 to 0.005270), [0.005271 to 0.303950], NA.]

Complaints by zip code (FCC)

[Figure: map titled "FCC Comments by Zip Code". x-axis: longitude (−120 to −80); y-axis: latitude (30 to 50); legend: rank(freq.x) (1000 to 2000).]

CA and FCC

Complaints for FCC and CA

[Figure: horizontal bar chart of combined FCC and CA complaint counts by location. x-axis: Count (0 to 1000); y-axis: Location. Locations in plot order: alaska, iowa, montana, nebraska, rhode island, wyoming, hawaii, idaho, north dakota, kansas, nevada, wisconsin, north carolina, maine, ohio, vermont, missouri, arkansas, kentucky, west virginia, new york, louisiana, arizona, new mexico, delaware, new hampshire, district of columbia, south carolina, mississippi, alabama, utah, connecticut, minnesota, oregon, indiana, massachusetts, colorado, virginia, washington, maryland, texas, new jersey, michigan, tennessee, pennsylvania, illinois, georgia, california, florida, oklahoma, south dakota.]

Complaints by percentage (FCC and CA)

[Figure: US state map titled "Both Percentage for Areas with Comcast". Legend "Percent" bins: [0.00340 to 0.00522), [0.00522 to 0.00623), [0.00623 to 0.00690), [0.00690 to 0.00799), [0.00799 to 0.01037), [0.01037 to 0.01295), [0.01295 to 0.30395], NA.]

Word co-occurrences

Co-occurrence network in Comcast dataset texts (CA)

[Figure: co-occurrence network. Edge legend n: 1000 to 3000. Terms shown: call, customer, service, Internet, month, bill, day, told, time, hour, charge, pay, week, cable, tv, supervisor, technician, rep, bad, receive, finally, tech, phone, home, fee, company, account, issue, cancel, box, speak, people, wait, credit, minute, fix.]

Co-occurrence network in Comcast dataset titles (FCC)

[Figure: co-occurrence network. Edge legend n: 25 to 100. Terms shown: service, data, customer, caps, speed, billing, practices, issues, unfair, slow, speeds, poor, internet, phone, throttling, xfinity, switch, connection, tv, complaint, cap, horrible, bill, services, business, failure, pricing, issue, charges, terrible, price, rental, fraudulent, 300gb, bad, usage, cable, bait, overage, modem.]

Co-occurrence network in Comcast dataset text (FCC)

[Figure: co-occurrence network. Edge legend n: 300 to 900. Terms shown: service, told, month, called, bill, time, call, customer, phone, cable, pay, speed, services, account, issue, paying, times, home, months, tv, day, modem, charge, internet, received.]
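A co-occurrence network of this kind is typically drawn from counts of word pairs that appear together. The following is a minimal sketch of one common way to get those counts, counting each unordered pair once per document. The mini-corpus, the stopword list, and the function names are hypothetical, and this is not necessarily the procedure that produced the figures in these slides.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical mini-corpus standing in for complaint texts.
docs = [
    "Customer service never returned my call",
    "I call customer service every month about my bill",
    "My bill went up again this month",
]

# Hypothetical stopword list, just large enough for this example.
STOPWORDS = {"i", "my", "the", "a", "this", "about", "every",
             "never", "again", "up", "went"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower())
            if w not in STOPWORDS]

def cooccurrences(documents):
    """Count how many documents contain each unordered word pair."""
    pairs = Counter()
    for doc in documents:
        words = sorted(set(tokenize(doc)))   # each word once per document
        pairs.update(combinations(words, 2)) # pairs come out alphabetically ordered
    return pairs

pair_counts = cooccurrences(docs)

# Pairs with the highest counts would become the heaviest edges in the network.
for (a, b), n in pair_counts.most_common(4):
    print(a, b, n)
```

Filtering the pairs by a minimum count before plotting keeps the network readable, which is presumably why the legends above start at a nonzero n.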

Topic modeling

Topic modeling is a method for unsupervised classification of documents, similar to “clustering” on numerical data.

Topic model: Latent Dirichlet Allocation (LDA)

Top 10 terms in each LDA topic (CA)

[Figure: four bar-chart panels, one per topic; x-axis: β.]

• Topic 1: channel, time, pay, charge, cable, day, service, phone, bill, call

• Topic 2: charge, cable, hour, service, pay, month, Internet, time, phone, customer

• Topic 3: day, cable, pay, customer, time, bill, Internet, month, call, service

• Topic 4: Internet, cable, receive, technician, customer, home, bill, box, call, service

Top 10 terms in each LDA topic (FCC)

[Figure: four bar-chart panels, one per topic; x-axis: β.]

• Topic 1: pay, tv, rate, package, price, cable, bill, month, Internet, service

• Topic 2: told, issue, customer, account, day, Internet, phone, time, service, call

• Topic 3: time, issue, bill, month, pay, charge, modem, service, Internet, speed

• Topic 4: customer, stream, pay, limit, usage, month, service, cap, Internet, datum

Probability distribution for each topic (CA)

[Figure: histograms in four panels, one per topic (1 to 4). x-axis: gamma (0.2 to 0.6); y-axis: Number of documents (0 to 5000); legend: Topic.]

Probability distribution for each topic (FCC)

[Figure: histograms in four panels, one per topic (1 to 4). x-axis: gamma (0.00 to 1.00); y-axis: Number of documents (0 to 1000); legend: Topic.]
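LDA models each document as a mixture of topics and each topic as a distribution over words. The β values in the term charts are per-topic word probabilities, and the gamma values in the histograms are per-document topic probabilities. The sketch below fits LDA by collapsed Gibbs sampling using only the Python standard library, to show where both quantities come from. The toy documents, the hyperparameters `alpha` and `eta`, and the function name `lda_gibbs` are assumptions for illustration; the slides do not say which implementation or settings were used.

```python
import random

def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, eta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs is a list of token lists. Returns (vocab, beta, gamma), where
    beta[k][w] estimates P(word w | topic k) and gamma[d][k] estimates
    P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]        # tokens of doc d in topic k
    topic_word = [[0] * V for _ in range(n_topics)]   # word w assigned to topic k
    topic_total = [0] * n_topics                      # all tokens in topic k
    assign = []                                       # topic of every token

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][index[w]] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = index[w], assign[d][i]
                # Remove this token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Resample its topic from the collapsed conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + eta) / (topic_total[t] + V * eta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1

    beta = [[(topic_word[k][w] + eta) / (topic_total[k] + V * eta)
             for w in range(V)] for k in range(n_topics)]
    gamma = [[(doc_topic[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return vocab, beta, gamma

# Hypothetical pre-tokenized documents.
docs = [
    "bill charge month pay bill".split(),
    "charge bill pay fee".split(),
    "internet speed slow modem internet".split(),
    "modem speed internet slow".split(),
]

vocab, beta, gamma = lda_gibbs(docs, n_topics=2)
for k, row in enumerate(beta):
    top = sorted(zip(row, vocab), reverse=True)[:3]
    print("topic", k + 1, [w for _, w in top])
```

Sorting each row of `beta` gives a top-terms list like the ones charted above, and a histogram of one column of `gamma` gives a per-topic probability distribution over documents.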

Sentiment analysis

Sentiment analysis is a computational approach to identifying people's opinions toward an entity.

Classification techniques

• Machine learning

• Lexicon-based

NRC

## # A tibble: 13,901 × 2
##    word        sentiment
##    <chr>       <chr>
##  1 abacus      trust
##  2 abandon     fear
##  3 abandon     negative
##  4 abandon     sadness
##  5 abandoned   anger
##  6 abandoned   fear
##  7 abandoned   negative
##  8 abandoned   sadness
##  9 abandonment anger
## 10 abandonment fear
## # ... with 13,891 more rows

BING

## # A tibble: 6,788 × 2
##    word        sentiment
##    <chr>       <chr>
##  1 2-faced     negative
##  2 2-faces     negative
##  3 a+          positive
##  4 abnormal    negative
##  5 abolish     negative
##  6 abominable  negative
##  7 abominably  negative
##  8 abominate   negative
##  9 abomination negative
## 10 abort       negative
## # ... with 6,778 more rows

AFINN

## # A tibble: 2,476 × 2
##    word       score
##    <chr>      <int>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ... with 2,466 more rows

AFINN lexicon

• "I'm not happy and I don't like it" (happy = 3, like = 2). Sentiment score = 5

• "This is utterly excellent!" (excellent = 3). Sentiment score = 3

• "I continue to receive unwanted calls from Comcast despite my instructions." (unwanted = -2). Sentiment score = -2
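The three scores above follow from summing the AFINN score of every word in the sentence. Below is a minimal sketch of that lexicon-based scoring. The dictionary holds only the four words scored on the slide, not the full lexicon, and the function name and tokenizer are assumptions for illustration.

```python
import re

# Only the words scored in the slide examples; the real lexicon is much larger.
AFINN_SAMPLE = {"happy": 3, "like": 2, "excellent": 3, "unwanted": -2}

def afinn_score(text, lexicon=AFINN_SAMPLE):
    """Sum the lexicon scores of the words in text; unknown words count as 0."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return sum(lexicon.get(w, 0) for w in words)

examples = [
    "I'm not happy and I don't like it",
    "This is utterly excellent!",
    "I continue to receive unwanted calls from Comcast despite my instructions.",
]

for sentence in examples:
    print(afinn_score(sentence), sentence)
```

The first sentence scores +5 even though it is clearly negative, because a plain word-by-word sum ignores "not" and "don't".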

Average AFINN score for reviews within each state (CA)

[Figure: horizontal bar chart. x-axis: Average sentiment score (−3 to 0); y-axis: state; legend: sentiment (negative). States in plot order: Alaska, North Carolina, Hawaii, North Dakota, Maine, Wyoming, New York, Idaho, Missouri, Alabama, Nevada, Utah, New Mexico, Pennsylvania, Massachusetts, Texas, Nebraska, Arkansas, Maryland, Illinois, New Jersey, Colorado, Connecticut, Georgia, Vermont, Minnesota, Florida, Arizona, Michigan, California, Tennessee, Washington, Ohio, New Hampshire, Virginia, South Carolina, Louisiana, Delaware, Oregon, Indiana, Kansas, Kentucky, Wisconsin, Mississippi, West Virginia.]

Average AFINN score for reviews within each state (CA)

[Figure: US state map. Legend "Sentiment" bins: [−3.000 to −0.833), [−0.833 to −0.735), [−0.735 to −0.683), [−0.683 to −0.616), [−0.616 to −0.549), [−0.549 to −0.516), [−0.516 to −0.282], NA.]

Average AFINN score for reviews within each state (FCC)

[Figure: horizontal bar chart. x-axis: Average sentiment score (−1 to 1); y-axis: State; legend: sentiment (negative, positive). States in plot order: Iowa, North Carolina, Arkansas, Kansas, West Virginia, Maine, Vermont, Arizona, Washington, New York, Florida, South Carolina, Illinois, Pennsylvania, Montana, Michigan, Tennessee, Minnesota, Alabama, California, Georgia, Colorado, Missouri, Kentucky, District Of Columbia, Mississippi, Texas, New Jersey, Oregon, Indiana, New Hampshire, Maryland, Massachusetts, Delaware, Virginia, New Mexico, Utah, Connecticut, Nevada, Louisiana, Ohio, Rhode Island.]
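A per-state average like the ones charted above can be computed by grouping scored items by state and taking the mean. This is a minimal sketch under the assumption that each row already carries a state and a sentiment score. The rows are hypothetical, and the slides do not say whether the average was taken over reviews or over individual scored words.

```python
from collections import defaultdict

# Hypothetical (state, sentiment score) rows, not taken from the datasets.
scored_rows = [
    ("Florida", -3), ("Florida", -1),
    ("Georgia", -2), ("Georgia", 0),
    ("Ohio", 1),
]

def average_by_state(rows):
    """Return the mean score for each state."""
    totals = defaultdict(lambda: [0, 0])   # state -> [sum of scores, row count]
    for state, score in rows:
        totals[state][0] += score
        totals[state][1] += 1
    return {state: total / n for state, (total, n) in totals.items()}

averages = average_by_state(scored_rows)

# Most negative states first, as in a sorted bar chart.
for state, avg in sorted(averages.items(), key=lambda item: item[1]):
    print(state, avg)
```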

Words with the highest contribution to sentiment scores (CA)

[Figure: horizontal bar chart. x-axis: Contribution to sentiment (−4000 to 0); y-axis: word; legend: sentiment (negative, positive). Words in plot order: pay, bad, cancel, charged, wrong, refuse, horrible, poor, complain, terrible, mistake, worst, ridiculous, error, lost, leave, hate, free, happy, hope, nice, fine, support, resolve, care.]

Words with the highest contribution to sentiment scores (FCC)

[Figure: horizontal bar chart. x-axis: Contribution to sentiment (−1500 to 500); y-axis: word; legend: sentiment (negative, positive). Words in plot order: pay, charged, cancel, refuse, bad, complain, wrong, ridiculous, poor, drop, lack, terrible, error, mistake, unfair, agree, save, hope, promise, fine, resolved, care, increase, support, resolve.]
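A word's contribution to the overall sentiment can be taken as its lexicon score multiplied by the number of times it occurs, so a mildly negative but very frequent word such as "pay" can outweigh rarer, stronger words. The sketch below shows that calculation. The texts are hypothetical, and the scores in `LEXICON` are illustrative AFINN-style values rather than a verified excerpt of the lexicon.

```python
from collections import Counter
import re

# Illustrative AFINN-style scores; treat these values as assumptions.
LEXICON = {"pay": -1, "bad": -3, "cancel": -1, "free": 1, "happy": 3}

# Hypothetical complaint texts.
texts = [
    "I pay too much and the service is bad",
    "They made me pay to cancel",
    "Not happy, bad support",
]

def contributions(documents, lexicon):
    """Return score * occurrence count for every lexicon word that appears."""
    counts = Counter(
        word
        for doc in documents
        for word in re.findall(r"[a-z']+", doc.lower())
        if word in lexicon
    )
    return {word: lexicon[word] * n for word, n in counts.items()}

result = contributions(texts, LEXICON)

# Largest absolute contributions first.
for word, value in sorted(result.items(), key=lambda item: -abs(item[1])):
    print(word, value)
```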

Negations that contributed the most to sentiment (CA)

[Figure: six bar-chart panels, one per negation: without, won't, no, not, can't, don't. x-axis: Sentiment score * number of occurrences; y-axis: Words preceded by a negation. Words shown across the panels: worry, pay, bother, honor, trust, recommend, appreciate, like, want, care, worth, resolved, true, happy, help, bore, fight, stop, waste, agree, accept, allow, support, blame, cancel, awful, punish, effectively, fulfill, improve, save, thank, guarantee, win, problem, problems, warning, better, solution, success, matter, good, luck, penalty, fail, interruption, losing, killing, loss, resolve.]

Negations that contributed the most to sentiment (FCC)

[Figure: six bar-chart panels, one per negation: without, won't, no, not, can't, don't. x-axis: Sentiment score * number of occurrences; y-axis: Words preceded by a negation. Words shown across the panels: pay, complain, allow, appreciate, help, honor, support, trust, like, care, want, helpful, fair, true, resolved, extend, restore, cancel, avoid, leave, problem, warning, problems, charges, progress, resolve, thanks, improvement, luck, success, matter, good, fail, penalty, losing, worrying, fear, interrupted, restriction, trouble, approval, resolving.]
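The negation charts look at scored words that directly follow a negation, since those are the words a plain lexicon sum scores in the wrong direction. A minimal sketch of that bigram analysis follows, using the six negations named in the panels. The texts and the lexicon values are hypothetical, and the tokenizer is an assumption.

```python
from collections import Counter
import re

# The six negation words that label the chart panels.
NEGATIONS = {"without", "won't", "no", "not", "can't", "don't"}

# Illustrative AFINN-style scores; treat these values as assumptions.
LEXICON = {"like": 2, "help": 2}

# Hypothetical complaint texts.
texts = [
    "I don't like it and they won't help",
    "They don't like to help",
    "No help, no credit",
]

def negated_contributions(documents, lexicon):
    """Score * count for each (negation, word) bigram found in the texts."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc.lower())
        for first, second in zip(words, words[1:]):
            if first in NEGATIONS and second in lexicon:
                counts[(first, second)] += 1
    return {pair: lexicon[pair[1]] * n for pair, n in counts.items()}

negated = negated_contributions(texts, LEXICON)
for (negation, word), value in sorted(negated.items()):
    print(negation, word, value)
```

Each value is the amount by which the word-level score is misleading for that pair; one possible correction, not shown here, is to flip the sign of a word's score when a negation precedes it.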

Thank you

Questions or comments?

Next step

• Machine learning

• Other packages: sentimentr, algorithmia, …