Aspect Extraction with Automated Prior Knowledge Learningzchen/presentations/ACL...Aspect Extraction...

Preview:

Citation preview

Aspect Extraction with Automated Prior Knowledge Learning

Zhiyuan (Brett) Chen Arjun Mukherjee

Bing Liu

Aspect Extraction

Extracting  aspect  terms

Aspect Terms

This  camera  takes  beautiful  pictures  but  its  price  is  higher  than  $200.

Aspect Terms

This  camera  takes  beautiful  pictures  but  its  price  is  higher  than  $200.

Aspect Extraction

Grouping  terms  into  categories

Extracting  aspect  terms

Grouping

PicturePhotoImageAspect  1 Aspect  2

PriceCostMoney

Aspect Extraction

Input:  A  review  collection

Output:  A  set  of  aspects(with  top  aspect  terms). Price

CheapCostMoneyPricy

BatteryLifeChargeAAAHour

Aspect  1 Aspect  2

Topic Models to Extract Aspects (e.g.,  Chen  et  al.,  2013;  Kim  et  al.,  2013;  Lazaridou  et  al.,  2013;  Mukherjee  and  Liu,  2012;  Moghaddam  and  Ester,  2011;  Sauper  et  al.,  2011;  Lin  and  He,  2009;  Titov  and  McDonald,  2008;  Lu  and  Zhai,  2008;)

Perform  both  extracting  and  grouping

A  topic  is  basically  an  aspect

Traditional Modeling Flow

M  DocsDomain  1

Traditional Modeling Flow

T  Topics

LDA

M  DocsDomain  1

Traditional Modeling Flow

T  Topics

LDA

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

Traditional Modeling Flow

T  Topics

LDA

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

Can we improve these topics by using them only?

Can we improve these topics by using them only? Fully automatic No other resources No human intervention

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Our Proposed Algorithm

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

Our Proposed Algorithm

Knowledge  BaseLearn  Knowledge  Automatically

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

Our Proposed Algorithm

Knowledge  BaseLearn  Knowledge  Automatically

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

Our Proposed Algorithm

a)  Existing  Domains

AKL (Automated !Knowledge LDA)!

Knowledge  BaseLearn  Knowledge  Automatically

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

M  DocsDomain  1

T  Topics

AKL

M  DocsDomain  2

T  Topics

AKL

M  DocsDomain  N

T  Topics

AKL

Our Proposed Algorithm

a)  Existing  Domains

Knowledge  BaseLearn  Knowledge  Automatically

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

Our Proposed Algorithm

b)  New  Domain

Knowledge  BaseLearn  Knowledge  Automatically

M  DocsDomain  1

T  Topics

LDA

M  DocsDomain  2

T  Topics

LDA

M  DocsDomain  N

T  Topics

LDA

Topic  Base

M  DocsDomain  N+1

T  Topics

AKL

Our Proposed Algorithm

b)  New  Domain

Why don’t we merge documents from different domains and run LDA?

Run LDA on Merged Data

Number  of  Topics

Topic  belongs  to  which  domain

Scalability

M  Docs

M  Docs M  Docs

M  Docs

M  Docs

Run LDA on Merged Data

Run LDA on Merged Data

Run  LDA

Our Proposed Algorithm Run  LDA Run  LDA Run  LDA

Run  LDA

Run  LDA

T  Topics

T  Topics T  Topics

T  Topics

T  Topics

Our Proposed Algorithm

Our Proposed Algorithm

Learn  Knowledge

Knowledge

Our Proposed Algorithm

Knowledge Knowledge

Knowledge

Knowledge

Our Proposed Algorithm Run  AKL Run  AKL Run  AKL

Run  AKL

Run  AKL

Multiple  Senses

KnowledgeReliability

Learn Knowledge Automatically

Multiple  Senses

KnowledgeReliability

Learn Knowledge Automatically

{Light,  Bright}{Light,  Luminance}

{Light,  Weight}{Light,  Heavy}

Light

Multiple Senses

Existing  Models  with  Multiple Senses

Assume  single  senseDF-‐‑‒LDA  (Andrzejewski  et  al.,  2009)

User  specified  multiple  sensesMC-‐‑‒LDA  (Chen  et  al.,  2013)

Automatically  distinguish  senses  when  extracting  knowledge

Multiple  Senses

KnowledgeReliability

Topic  Clustering

Learn knowledge Automatically

Topic Clustering

A  topic  represents  words  with  similar  meaning  (but  noisy)

Group  topics  with  similar  sense  into  one  cluster

Different  senses  of  a  word  should  be  split  into  different  clusters

Multiple  Senses

KnowledgeReliability

Topic  Clustering

Learn knowledge Automatically

Topic Overlapping

Every  product  domain  has  price.

Most  electronic  domains  have  battery.

Some  electronic  domains  share  screen.

Example BatteryLifePictureCharge

BatteryPriceLifeSize

BatteryChargeAAAScreen

D1                                        D2                                    D3

Example

BatteryLifePictureCharge

BatteryPriceLifeSize

BatteryChargeAAAScreen

D1                                        D2                                    D3

Two  words  together  at  least  2  times

Example

BatteryLifePictureCharge

BatteryPriceLifeSize

BatteryChargeAAAScreen

D1                                        D2                                    D3

Two  words  together  at  least  2  times

{Battery,  Life}  and  {Battery,  Charge}

Multiple  Senses

KnowledgeReliability

Topic  Clustering

Frequent  Itemset  Mining

Learn knowledge Automatically

Frequent Itemset Mining (FIM)

Each  topic  is  a  transaction

Find  frequent  patterns  satisfy  minimum  support  thresholds

Each  pattern  contains  2  terms

Knowledge Representation

In  the  form  of  knowledge  clusters  (KC)

Each  KC  has  a  list  of  frequent  2-‐‑‒patterns

KC1:  {battery,  life},  {battery,  charge},  {battery,  hour},  {charge,  hour}

AKL (Automated Knowledge LDA)

Incorporate  Knowledge

Wrong  Know.  Towards  Domain

AKL Model

Add  variable  cIncorporate  Knowledge

Wrong  Know.  Towards  Domain

AKL Plate Notation

c:  knowledge  cluster

AKL Plate Notation

c:  knowledge  cluster

AKL Plate Notation

c:  knowledge  cluster

AKL Plate Notation

c:  knowledge  cluster

AKL Model

Add  variable  c

GPU  Model

Incorporate  Knowledge

Wrong  Know.  Towards  Domain

Topic  0

price

LDA with SPU (Simple Pólya Urn Model)

Topic  0

price price

LDA with SPU (Simple Pólya Urn Model)

Topic  0

price

AKL with GPU (Generalized Pólya Urn Model)

Topic  0

price price

cheap

{price,  cheap}AKL with GPU (Generalized Pólya Urn Model)

AKL Model

Add  variable  c

GPU  Model

Incorporate  Knowledge

Wrong  Know.  Towards  Domain

Wrong Know. Towards Domain Wrong  because  of  TM  mistakes{Price,  Picture}

Wrong  towards  a  particular  domain  {Light,  Bright}{Light,  Weight}

AKL Model

Add  variable  c

GPU  Model

Co-‐‑‒Document  Frequency  Ratio

Incorporate  Knowledge

Wrong  Know.  Towards  Domain

Co-Document Frequency Ratio

Co-Document Frequency Ratio

Estimated  in  the  current  domain

Co-Document Frequency Ratio

Estimated  in  the  current  domain

{Price,  Cheap}{Price,  Image}

Evaluation

Evaluation 36  product  domains.  Each  domain:1000  Reviews15  Topics

EvaluationHuman

Objective

Model Comparison

LDA  (Blei  et  al.,  2003)

GK-‐‑‒LDA  (Chen  et  al.,  2013)

MC-‐‑‒LDA  (Chen  et  al.,  2013)

Model Comparison

LDA  (Blei  et  al.,  2003)

GK-‐‑‒LDA  (Chen  et  al.,  2013)

Feed  them  with  the  knowledge  from  our  algorithm

MC-‐‑‒LDA  (Chen  et  al.,  2013)

Objective Evaluation

-1510

-1490

-1470

-1450

-1430

0 1 2 3 4 5 6

Topi

c C

oher

ence

AKL GK-LDAMC-LDA LDA

Example Aspects

Human Evaluation

0.6

0.7

0.8

0.9

1.0

Camera Computer Headphone GPS

Precision

@ 5

AKL GK-LDA MC-LDA LDA

Human Evaluation

0.6

0.7

0.8

0.9

1.0

Camera Computer Headphone GPS

Precision

@ 1

0

AKL GK-LDA MC-LDA LDA

Number of Topic Clusters

-1510-1490-1470-1450-1430

20 30 40 50 60 70Top

ic C

oher

ence

#Clusters

Conclusions

To  extract  better  aspects

Learn  knowledge  automatically

AKL:  Leverage  automated  knowledge

Multiple  Senses

KnowledgeReliability

Learn knowledge Automatically

Topic  Clustering

Frequent  Itemset  Mining

AKL Model

Add  variable  c

GPU  Model

Co-‐‑‒Document  Frequency  Ratio

Incorporate  Knowledge

Wrong  Know.  Towards  Domain

Q&A

Recommended