
Page 1

Data Analytics at Texas A&M Lab

Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, Xia Hu

Texas A&M University

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

Page 2


RNNs are regarded as black boxes

“Interpreting Machine Learning Models”. Medium.com, 2017.

RNNs have made rapid progress

[Figure: model families plotted by accuracy vs. interpretability]

Applications: text classification and machine translation

[Figure: four types of RNNs]

Page 3


Interpretation is beneficial to both researchers and end-users


Our Goal --- Provide post-hoc interpretations for individual predictions

• Increase the interpretability of RNNs

• Keep prediction performance unchanged


[Diagram: the RNN produces an explanation; end-users gain trust, researchers/developers refine the model]

Page 4


Key factors

--- A pre-trained RNN and an input text

--- The prediction made by the RNN

Post-hoc Interpretation

--- A contribution score for each feature in the input

--- Deeper color in the heatmap indicates higher contribution


Interpretation heatmap

[Diagram: RNN unrolled over inputs 𝑥1 … 𝑥𝑇, hidden states ℎ1 … ℎ𝑇, logit 𝑧, prediction 𝑦]

“Teaching Machines to Read and Comprehend”. NIPS, 2015.

Page 5


1. How to guarantee that the post-hoc interpretations are indeed faithful to the original prediction? Local approximation based methods may not be faithful to the original model.

2. It is challenging to develop an attribution method that can generate phrase-level explanations.


Example --- phrase-level explanation: the word “favorite” alone reads positive, but the phrase “Used to be my favorite” expresses negative sentiment.

"Why Should I Trust You?": Explaining the Predictions of Any Classifier”. KDD, 2016.

Page 6


• Faithful interpretation

• Phrase-level interpretation

• Should investigate internal neurons

• The interpretation method should be flexible

Can we utilize decomposition-based methods to derive such interpretations?


Page 7


• Symbol α𝑡: the proportion of new evidence brought in at time step 𝑡

• Symbol 𝑔(𝑥𝑡): the evidence that the RNN obtains at time step 𝑡

• Some architectures follow this rule exactly (e.g., GRU); others follow it approximately (e.g., LSTM)



• Abstracted RNN updating rule (see below)

[Diagram: one update step ℎ𝑡−1 → ℎ𝑡, combining input 𝑥𝑡 via updating vector α𝑡]
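One consistent way to write this rule, given the symbol definitions above (the gate convention here is an assumption, not a quotation from the paper):

    h_t = (1 - \alpha_t) \odot h_{t-1} + \alpha_t \odot g(x_t)

Under this reading, α𝑡 ⊙ 𝑔(𝑥𝑡) is the new evidence added at step 𝑡, and the factor (1 − α𝑡) carries the previously accumulated evidence forward.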

Page 8

Data Analytics at Texas A&M Lab 7


Abstracted RNN updating rule: same as on the previous slide.

• RNN prediction decomposition: starting from the RNN logit value, unroll the final hidden state into one additive term per time step (sketched below).

Two essential elements: the hidden state vectors ℎ𝑡 and the updating vectors α𝑡.
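Under the rule assumed on the previous slide, with ℎ0 = 0 and w_c the classifier weight row for class c, the logit value decomposes additively over time steps:

    z_c = w_c^T h_T = \sum_{t=1}^{T} w_c^T \Big[ \Big( \prod_{k=t+1}^{T} (1 - \alpha_k) \Big) \odot \alpha_t \odot g(x_t) \Big]

Each summand isolates the evidence contributed at a single time step, which is what makes word-level and phrase-level attribution additive.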

Page 9

Data Analytics at Texas A&M Lab 8

• From decomposition to word-level explanation

[Diagram: unrolled RNN with evidence updating at step 𝑡 and evidence forgetting from 𝑡 + 1 to 𝑇]

Contribution score for 𝑥𝑡 combines two parts: updating from 𝑡 − 1 to 𝑡, and forgetting from 𝑡 + 1 to 𝑇 (sketched below).
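With the same assumed convention, the word-level contribution score takes the form

    S_c(x_t) = w_c^T \Big[ \Big( \prod_{k=t+1}^{T} (1 - \alpha_k) \Big) \odot \big( h_t - (1 - \alpha_t) \odot h_{t-1} \big) \Big]

The difference h_t - (1 - α_t) ⊙ h_{t-1} equals α_t ⊙ g(x_t), the evidence updated at step t; the leading product is the fraction of that evidence surviving steps t + 1 through T. Summing S_c(x_t) over all t recovers the logit z_c exactly.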

Page 10

Data Analytics at Texas A&M Lab 9

• Phrase-level explanation

• Contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, … , 𝑟} (see the sketch below)

[Diagram: evidence updating from 𝑞 − 1 to 𝑟, evidence forgetting from 𝑟 + 1 to 𝑇]

Two parts: evidence updating from 𝑞 − 1 to 𝑟, and evidence forgetting from 𝑟 + 1 to 𝑇.
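Extending the assumed word-level form to a contiguous phrase gives

    S_c(x_A) = w_c^T \Big[ \Big( \prod_{k=r+1}^{T} (1 - \alpha_k) \Big) \odot \Big( h_r - \Big( \prod_{k=q}^{r} (1 - \alpha_k) \Big) \odot h_{q-1} \Big) \Big]

The inner difference keeps exactly the evidence accumulated inside the phrase (updating from q − 1 to r), and the outer product accounts for forgetting after the phrase ends.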

Page 11

Data Analytics at Texas A&M Lab 10

• Hidden state vector updating rule for GRU (restated below)

“Understanding LSTM Networks”. Colah’s blog, 2015.

• REAT updating rule, and the GRU contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, … , 𝑟}: we only need to replace the REAT updating vector α𝑡 with the GRU update gate vector 𝑢𝑡 (see the sketch below).
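Following the GRU equations in the cited blog post (writing the update gate as 𝑢𝑡), the hidden state update is

    h_t = (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t

which matches the abstracted rule exactly, with α_t = u_t and g(x_t) = h̃_t, the candidate state. As a concrete illustration, here is a minimal NumPy sketch of the resulting word-level scores under the convention assumed on the previous slides; reat_word_scores, h, u, and w_c are illustrative names, not from the paper:

    import numpy as np

    def reat_word_scores(h, u, w_c):
        # Word-level REAT scores for a GRU under the assumed convention
        #   h_t = (1 - u_t) * h_{t-1} + u_t * candidate_t,  with h_0 = 0.
        # h:   (T, d) hidden states h_1..h_T recorded from the GRU
        # u:   (T, d) update-gate activations u_1..u_T
        # w_c: (d,)   classifier weight row for the target class c
        T, d = h.shape
        # retain[t] = prod_{k=t+1..T} (1 - u_k): evidence forgetting from t+1 to T
        retain = np.ones((T, d))
        for t in range(T - 2, -1, -1):
            retain[t] = retain[t + 1] * (1.0 - u[t + 1])
        scores = np.empty(T)
        h_prev = np.zeros(d)
        for t in range(T):
            update = h[t] - (1.0 - u[t]) * h_prev  # evidence updating at step t
            scores[t] = w_c @ (retain[t] * update)
            h_prev = h[t]
        return scores

Because the per-step terms telescope, with h_0 = 0 the value scores.sum() reproduces the class logit w_c · h_T exactly, which is the faithfulness property REAT targets.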

Page 12

Data Analytics at Texas A&M Lab 11

• Hidden state vector updating rule for LSTM (restated below)

“Understanding LSTM Networks”. Colah’s blog, 2015.

• Approximate REAT updating rule

• LSTM contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, … , 𝑟} (see the sketch below)
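The LSTM equations from the cited blog post are

    c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)

Because the forget gate f_t and the input gate i_t are not tied (f_t ≠ 1 − i_t), the abstracted rule holds only approximately: the cell state c_t plays the role of the evidence accumulator, f_t acts as the carry factor in place of (1 − α_t), and i_t ⊙ c̃_t is the new evidence, with the decomposition of c_T then mapped through the output gate and tanh. This reading is a sketch consistent with the slide text; the paper's exact approximation may differ.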

Page 13

Data Analytics at Texas A&M Lab 12

• Concatenation of a normal (forward) GRU and a reverse GRU

• Phrase-level attribution for BiGRU: the concatenation of two terms, the normal GRU decomposition and the reverse GRU decomposition (sketched below).
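One plausible way to write this, splitting the classifier weights w_c into the halves acting on the forward and reverse final hidden states, is

    S_c(x_A) = \overrightarrow{S}_c(x_A) + \overleftarrow{S}_c(x_A)

where each term applies the GRU phrase decomposition from the previous slides in its own direction (the reverse term runs over the reversed sequence).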

Page 14


1. Attribution Faithfulness Evaluation

2. Qualitative Evaluation via Case Studies

3. Applying REAT for Linguistic Patterns Analysis

4. Applying REAT for Model Misbehavior Analysis

Page 15

Data Analytics at Texas A&M Lab 14

• Deleting the most important sentence identified by an attribution method should cause the largest accuracy drop for the target class (see the sketch below).

“ It is ridiculous , but of course it is also refreshing”.

REAT explanations are highly faithful to the original RNN
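A minimal sketch of this deletion-based faithfulness check, with hypothetical placeholders (model.predict_proba, attribute, and segments are illustrative, not an API from the paper):

    # Rank segments by attribution, delete the top-ranked one, and measure
    # how much the target-class probability drops. Larger drops mean the
    # explanation pointed at genuinely important input.
    def deletion_drop(model, attribute, segments, target):
        scores = [attribute(model, segments, i, target) for i in range(len(segments))]
        top = max(range(len(segments)), key=lambda i: scores[i])
        reduced = segments[:top] + segments[top + 1:]
        return model.predict_proba(segments)[target] - model.predict_proba(reduced)[target]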

Page 16


REAT accurately reflects the prediction scores of different architectures


• Visualizations Under Different RNN Architectures

[Heatmaps of “The fight scenes are fun but it grows tedious” under GRU, LSTM, and BiGRU]

• GRU positive prediction (51.6% confidence)

• LSTM positive prediction (96.2% confidence)

• BiGRU negative prediction (62.7% confidence)

Green: positive contribution, red: negative contribution

Page 17

Data Analytics at Texas A&M Lab 16

• Hierarchical Attribution: LSTM negative prediction with 99.46% confidence

[Heatmaps of “The story may be new but the movie does n’t serve up lots of laughs” at word, phrase, and clause level]

Green: positive contribution, red: negative contribution

• In general, the first part of the text has negative contribution

• The second part of the text has positive contribution

• This hierarchical attribution represents the contributions at different levels of granularity

Page 18

Data Analytics at Texas A&M Lab 17

• Apply REAT to analyze linguistic patterns for an LSTM on the SST2 test set.

[Box plot: attribution score distributions by POS category]

• RBS (superlative adverbs, e.g., “best”, “most”): highest scores

• JJ (adjectives): ranks relatively high

• NN (nouns): near-zero median score

REAT unveils useful linguistic knowledge captured by the LSTM

Page 19

Data Analytics at Texas A&M Lab 18

• The LSTM wrongly gives a 99.97%-confidence negative sentiment prediction.

• Attribution score distributions for the two words “terrible” and “terribly”

“Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists”.

• REAT tells us that the LSTM captures the meaning related to “terrible”

• while ignoring other senses, such as “extremely”

• This LSTM fails to model the polysemy of words

Page 20

Data Analytics at Texas A&M Lab 19

“Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists”.

• Interpretable adversarial attack on an LSTM classifier (see the sketch below)


Negative prediction, 99.97% confidence
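The next slides substitute the word that REAT blames most for the wrong prediction. A hypothetical sketch of that attribution-guided substitution loop (scores are per-word REAT scores; synonyms and classify are placeholders, not from the paper):

    # Swap the word with the strongest contribution to the (wrong) predicted
    # class for a synonym, and check whether the prediction flips.
    def attribution_guided_attack(tokens, scores, synonyms, classify):
        wrong_label = classify(tokens)
        i = max(range(len(tokens)), key=lambda t: scores[t])  # most-blamed word
        for s in synonyms.get(tokens[i], []):
            candidate = tokens[:i] + [s] + tokens[i + 1:]
            if classify(candidate) != wrong_label:  # prediction flipped
                return candidate
        return None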

Page 21

Data Analytics at Texas A&M Lab 20

“Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists”.

• Interpretable adversarial attack on an LSTM classifier


Substituting “terribly” with “extremely”

Page 22

Data Analytics at Texas A&M Lab 21

“Schweiger is talented and extremely charismatic, qualities essential to both movie stars and social anarchists”.

• Interpretable adversarial attack on an LSTM classifier


Positive prediction, 81.29% confidence

Page 23

Data Analytics at Texas A&M Lab 22

“Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists”.

• Interpretable adversarial attack on an LSTM classifier


Substituting “terribly” with “very”

Page 24

Data Analytics at Texas A&M Lab 23

“Schweiger is talented and very charismatic, qualities essential to both movie stars and social anarchists”.

• Interpretable adversarial attack on an LSTM classifier


Positive prediction, 99.53% confidence

Page 25

Data Analytics at Texas A&M Lab 24

• This adversarial attack generalizes to other instances

“Occasionally melodramatic, it ’s also extremely effective.” → Positive prediction, 99.53%

“Occasionally melodramatic, it ’s also terribly effective.” → Negative prediction, 99.0%

“Extremely well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → Positive prediction, 99.98%

“Terribly well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → Negative prediction, 87.7%

Page 26


REAT: A post-hoc interpretation method for predictions made by RNNs ---

• Highly faithful and interpretable explanations

• A useful debugging tool for examining RNNs

Future work ---

“Techniques for Interpretable Machine Learning”. Mengnan Du, Ninghao Liu, Xia Hu. Communications of the ACM, 2019.


[Diagram: directions for interpretable ML --- intrinsic explanation (global or local), e.g., a new layer with interpretable constraints; post-hoc global explanation; post-hoc local explanation]