On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, Xia Hu

Data Analytics at Texas A&M Lab, Texas A&M University

RNNs are regarded as black boxes

“Interpreting Machine Learning Models”. Medium.com, 2017.

RNNs have made a lot of progress on tasks such as text classification and machine translation.

[Figure: four types of RNN applications achieve high accuracy but low interpretability]


Interpretation is beneficial to both researchers and end-users: explanations of an RNN's predictions help end-users trust the model, and help researchers and developers refine it.

Our goal: provide post-hoc interpretations behind individual predictions
• Increase the interpretability of RNNs
• Keep prediction performance unchanged


Key factors

Input:
• A pre-trained RNN and an input text
• The prediction of the RNN

Post-hoc interpretation:
• A contribution score for each feature in the input
• Deeper color in the interpretation heatmap means higher contribution

[Figure: unrolled RNN with inputs 𝑥1 … 𝑥𝑇, hidden states ℎ1 … ℎ𝑇, logit 𝑧, and prediction 𝑦, next to an interpretation heatmap]

Teaching Machines to Read and Comprehend. NIPS, 2015.


Challenges:

1. How can we guarantee that the post-hoc interpretations are indeed faithful to the original prediction? Local-approximation-based methods may not be faithful to the original model.

2. It is challenging to develop an attribution method that can generate phrase-level explanations.

Example: the phrase “used to be my favorite” expresses negative sentiment, even though the word “favorite” on its own looks positive; this motivates phrase-level explanations.

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier. KDD, 2016.


Desired properties:
• Faithful interpretation → should investigate internal neurons
• Phrase-level interpretation → interpretation method should be flexible

Can we utilize decomposition-based methods to derive interpretations?


• Symbol α𝑡: the updating vector, controlling how much of the previous evidence is carried into time step 𝑡
• Symbol 𝑔(𝑥𝑡): the evidence that the RNN obtains from the input at time step 𝑡
• Some architectures follow this rule exactly (e.g., GRU); some approximately (e.g., LSTM)

• Abstracted RNN updating rule: ℎ𝑡 = α𝑡 ⊙ ℎ𝑡−1 + 𝑔(𝑥𝑡)

[Figure: unrolled RNN with inputs 𝑥1 … 𝑥𝑇, hidden states ℎ1 … ℎ𝑇, logit 𝑧, and prediction 𝑦]
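The abstracted updating rule can be sketched numerically. A minimal pure-Python illustration with toy values (not from the paper):

```python
# Minimal sketch of the abstracted RNN updating rule
# h_t = alpha_t (*) h_{t-1} + g(x_t), where (*) is the elementwise product.
# alpha_t is the updating vector; evidence_t stands in for g(x_t).
# All numbers below are toy values, purely for illustration.

def rnn_step(h_prev, alpha_t, evidence_t):
    """One abstracted step: keep a fraction of the old state, add new evidence."""
    return [a * h + e for a, h, e in zip(alpha_t, h_prev, evidence_t)]

# A 2-dimensional hidden state unrolled over 3 time steps.
h = [0.0, 0.0]  # h_0
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
evidence = [[1.0, 2.0], [0.5, -1.0], [0.2, 0.3]]
for a, e in zip(alphas, evidence):
    h = rnn_step(h, a, e)
```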

• Abstracted RNN updating rule: ℎ𝑡 = α𝑡 ⊙ ℎ𝑡−1 + 𝑔(𝑥𝑡)

• RNN logit value: 𝑧 = 𝑊 ℎ𝑇

• RNN prediction decomposition (with ℎ0 = 0):
  𝑧 = Σ𝑡=1…𝑇 𝑊 [ (∏𝑘=𝑡+1…𝑇 α𝑘) ⊙ (ℎ𝑡 − α𝑡 ⊙ ℎ𝑡−1) ]

• Two essential elements: the hidden state vector ℎ𝑡 and the updating vector α𝑡

• From decomposition to word-level explanation

• Contribution score for 𝑥𝑡:
  𝑆(𝑥𝑡) = 𝑊 [ (∏𝑘=𝑡+1…𝑇 α𝑘) ⊙ (ℎ𝑡 − α𝑡 ⊙ ℎ𝑡−1) ]

• Two parts: evidence updating from 𝑡 − 1 to 𝑡, and evidence forgetting from 𝑡 + 1 to 𝑇
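The word-level scores are additive by construction: with ℎ0 = 0 they sum exactly to the logit. A toy pure-Python check (illustrative values, not the paper's implementation):

```python
# Toy check that word-level scores of the additive decomposition sum to
# the logit: S(x_t) = W [ (prod_{k=t+1..T} alpha_k) (*) (h_t - alpha_t (*) h_{t-1}) ]
# and sum_t S(x_t) == W h_T when h_0 = 0.

def unroll(alphas, evidence):
    """Run h_t = alpha_t (*) h_{t-1} + g(x_t); return all states h_0..h_T."""
    hs = [[0.0, 0.0]]
    for a, e in zip(alphas, evidence):
        hs.append([ai * hi + ei for ai, hi, ei in zip(a, hs[-1], e)])
    return hs

def word_score(w, alphas, hs, t):
    """Contribution of x_t (t is 1-indexed): updating at t, forgetting after t."""
    update = [hs[t][i] - alphas[t - 1][i] * hs[t - 1][i] for i in range(len(w))]
    keep = [1.0] * len(w)
    for k in range(t, len(alphas)):        # product of alpha_{t+1} .. alpha_T
        keep = [keep[i] * alphas[k][i] for i in range(len(w))]
    return sum(w[i] * keep[i] * update[i] for i in range(len(w)))

w = [1.0, -1.0]                            # toy classifier weights
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
evidence = [[1.0, 2.0], [0.5, -1.0], [0.2, 0.3]]
hs = unroll(alphas, evidence)
scores = [word_score(w, alphas, hs, t) for t in range(1, len(alphas) + 1)]
logit = sum(wi * hi for wi, hi in zip(w, hs[-1]))
```

The telescoping sum is exact, which is what makes this decomposition faithful rather than a local approximation.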

• Phrase-level explanation: contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}:
  𝑆(𝑥𝐴) = 𝑊 [ (∏𝑘=𝑟+1…𝑇 α𝑘) ⊙ (ℎ𝑟 − (∏𝑘=𝑞…𝑟 α𝑘) ⊙ ℎ𝑞−1) ]

• Two parts: evidence updating from 𝑞 − 1 to 𝑟, and evidence forgetting from 𝑟 + 1 to 𝑇
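The phrase score has the same shape as the word score, with the update measured over the whole span. A toy sketch (1-based indices 𝑞, 𝑟 as on the slide; illustrative values):

```python
# Toy sketch of the phrase-level score for x_A, A = {q, ..., r}:
# S(x_A) = W [ (prod_{k=r+1..T} alpha_k) (*) (h_r - (prod_{k=q..r} alpha_k) (*) h_{q-1}) ]

def phrase_score(w, alphas, hs, q, r):
    T, n = len(alphas), len(w)
    inner = [1.0] * n                      # product of alpha_q .. alpha_r
    for k in range(q - 1, r):
        inner = [inner[i] * alphas[k][i] for i in range(n)]
    update = [hs[r][i] - inner[i] * hs[q - 1][i] for i in range(n)]
    forget = [1.0] * n                     # product of alpha_{r+1} .. alpha_T
    for k in range(r, T):
        forget = [forget[i] * alphas[k][i] for i in range(n)]
    return sum(w[i] * forget[i] * update[i] for i in range(n))

w = [1.0, -1.0]
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
hs = [[0.0, 0.0], [1.0, 2.0], [1.3, 0.2], [1.11, 0.38]]  # h_0..h_3 from the rule
whole = phrase_score(w, alphas, hs, 1, 3)  # the whole text as one "phrase"
logit = sum(wi * hi for wi, hi in zip(w, hs[-1]))
```

Taking the whole text as a single phrase recovers the full logit, a sanity check on the formula.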

• Hidden state vector updating rule for GRU: ℎ𝑡 = 𝑢𝑡 ⊙ ℎ𝑡−1 + (1 − 𝑢𝑡) ⊙ ℎ̃𝑡

“Understanding LSTM Networks”. Colah’s blog, 2015.

• REAT updating rule: the GRU rule already has the abstracted form, with α𝑡 = 𝑢𝑡 and 𝑔(𝑥𝑡) = (1 − 𝑢𝑡) ⊙ ℎ̃𝑡

• GRU contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}: we only need to replace the REAT updating vector α𝑡 with the GRU updating gate vector 𝑢𝑡 in the phrase-level score
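A scalar GRU step with illustrative fixed weights (not trained, hidden size 1) confirms that the GRU update matches the abstracted rule exactly with α𝑡 = 𝑢𝑡:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar GRU cell with illustrative fixed weights (hidden size 1).
# Convention from the slide: h_t = u_t * h_{t-1} + (1 - u_t) * h~_t,
# so the REAT updating vector alpha_t is exactly the update gate u_t.
def gru_step(h_prev, x):
    u = sigmoid(0.5 * x + 0.3 * h_prev)             # update gate u_t
    r = sigmoid(0.4 * x - 0.2 * h_prev)             # reset gate r_t
    h_cand = math.tanh(0.8 * x + 0.6 * r * h_prev)  # candidate state h~_t
    h = u * h_prev + (1.0 - u) * h_cand
    return h, u, h_cand

h_prev, x = 0.2, 1.0
h, u, h_cand = gru_step(h_prev, x)
# GRU fits h_t = alpha_t * h_{t-1} + g(x_t) exactly,
# with alpha_t = u and g(x_t) = (1 - u) * h_cand:
residual = h - (u * h_prev + (1.0 - u) * h_cand)
```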

• Hidden state vector updating rule for LSTM: 𝑐𝑡 = 𝑓𝑡 ⊙ 𝑐𝑡−1 + 𝑖𝑡 ⊙ 𝑐̃𝑡,  ℎ𝑡 = 𝑜𝑡 ⊙ tanh(𝑐𝑡)

“Understanding LSTM Networks”. Colah’s blog, 2015.

• Approximate REAT updating rule: the cell state follows the abstracted form with α𝑡 ≈ 𝑓𝑡, but the output gate and the tanh nonlinearity make the hidden-state decomposition approximate

• LSTM contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}: apply the decomposition with this approximate updating vector
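A scalar LSTM step (illustrative fixed weights, hidden size 1) shows why the LSTM case is only approximate: the cell state is exactly additive, but 𝑜𝑡 and tanh sit between 𝑐𝑡 and ℎ𝑡:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar LSTM cell with illustrative fixed weights (hidden size 1).
def lstm_step(h_prev, c_prev, x):
    f = sigmoid(0.5 * x + 0.1 * h_prev)   # forget gate f_t
    i = sigmoid(0.4 * x + 0.2 * h_prev)   # input gate i_t
    o = sigmoid(0.3 * x + 0.3 * h_prev)   # output gate o_t
    c_cand = math.tanh(0.8 * x + 0.6 * h_prev)
    c = f * c_prev + i * c_cand           # cell state: exactly the additive form
    h = o * math.tanh(c)                  # nonlinearity => only approximately additive
    return h, c, f, i, c_cand

h_prev, c_prev, x = 0.1, 0.5, 1.0
h, c, f, i, c_cand = lstm_step(h_prev, c_prev, x)
# The cell state fits c_t = alpha_t * c_{t-1} + g(x_t) with alpha_t = f_t:
cell_residual = c - (f * c_prev + i * c_cand)
```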

• Concatenation of a normal (forward) GRU and a reverse GRU: ℎ𝑡 = [ℎ𝑡→ ; ℎ𝑡←]

• Phrase-level attribution for BiGRU is the concatenation of two terms: the normal GRU decomposition and the reverse GRU decomposition
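Because the logit of a bidirectional model is a sum over the two directions' final states, each direction can be decomposed independently and the scores added. A toy 1-dimensional sketch (illustrative values):

```python
# Toy sketch of BiGRU attribution as a sum of two one-directional
# decompositions. The forward pass reads x_1..x_T, the reverse pass reads
# x_T..x_1; the logit combines both final states, so REAT can decompose
# each direction separately and concatenate (sum) the contributions.

def unroll(alphas, evidence):
    """1-dim abstracted recurrence h_t = alpha_t * h_{t-1} + g(x_t)."""
    hs = [[0.0]]
    for a, e in zip(alphas, evidence):
        hs.append([a[0] * hs[-1][0] + e[0]])
    return hs

def total(w, hs):
    return w * hs[-1][0]

# Toy values for a 2-step input, one direction per pair of lists.
fwd_alphas, fwd_evid = [[0.9], [0.8]], [[1.0], [0.5]]
bwd_alphas, bwd_evid = [[0.7], [0.6]], [[0.4], [-0.2]]   # reversed reading order
w_fwd, w_bwd = 1.0, -1.0

logit = (total(w_fwd, unroll(fwd_alphas, fwd_evid))
         + total(w_bwd, unroll(bwd_alphas, bwd_evid)))
```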

Evaluation:

1. Attribution Faithfulness Evaluation
2. Qualitative Evaluation via Case Studies
3. Applying REAT for Linguistic Patterns Analysis
4. Applying REAT for Model Misbehavior Analysis

• Faithfulness criterion: once the most important sentence is deleted, it should cause the highest accuracy drop for the target class.

Example: “It is ridiculous, but of course it is also refreshing.”

• REAT explanations are highly faithful to the original RNN
• REAT accurately reflects the prediction scores of different architectures
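The deletion protocol above can be sketched as follows. The "model" here is a dummy bag-of-words scorer and the helper names are hypothetical; the actual evaluation deletes the top-attributed sentence from a real RNN's input:

```python
# Sketch of deletion-based faithfulness evaluation: remove the
# top-attributed token and measure the drop in the target-class score.
# The score function is a toy word-weight sum, purely for illustration.

def faithfulness_drop(score_fn, tokens, attributions):
    top = max(range(len(tokens)), key=lambda i: attributions[i])
    reduced = tokens[:top] + tokens[top + 1:]
    return score_fn(tokens) - score_fn(reduced)

LEXICON = {"ridiculous": -0.4, "refreshing": 0.9}            # toy weights
score_fn = lambda toks: sum(LEXICON.get(w, 0.0) for w in toks)

tokens = ["it", "is", "ridiculous", "but", "it", "is", "also", "refreshing"]
attributions = [LEXICON.get(w, 0.0) for w in tokens]         # stand-in for REAT
drop = faithfulness_drop(score_fn, tokens, attributions)
# A faithful attribution picks "refreshing", causing the largest score drop.
```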

• Visualizations under different RNN architectures, for the input “The fight scenes are fun but it grows tedious” (green: positive contribution, red: negative contribution):

• GRU: positive prediction (51.6% confidence)
• LSTM: positive prediction (96.2% confidence)
• BiGRU: negative prediction (62.7% confidence)

• Hierarchical attribution: LSTM negative prediction with 99.46% confidence on “The story may be new but the movie does n’t serve up lots of laughs”

• Heatmaps at three granularities: word, phrase, and clause (green: positive contribution, red: negative contribution)

• In general, the first part of the text has a negative contribution and the second part a positive contribution
• This hierarchical attribution represents the contributions at different levels of granularity

• Apply REAT to analyze linguistic patterns for an LSTM on the SST2 test set

POS category score distributions:
• RBS (superlative adverbs, e.g., “best”, “most”): highest scores
• JJ (adjectives): ranks relatively high
• NN (nouns): near-zero median score

• REAT unveils useful linguistic knowledge captured by the LSTM

• The LSTM wrongly gives a 99.97% negative sentiment prediction for: “Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists.”

• Comparing the attribution score distributions for the two words “terrible” and “terribly”, REAT tells us that the LSTM captures only the meaning related to “terrible”, while ignoring other meanings of “terribly”, such as “extremely”

• This LSTM fails to model the polysemy of words

• Interpretable adversarial attack on an LSTM classifier

• Original input: “Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists.” → negative prediction, 99.97% confidence

• Replacing “terribly” with “extremely”: positive prediction, 81.29% confidence

• Replacing “terribly” with “very”: positive prediction, 99.53% confidence
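The attack loop can be sketched as below. All names, the synonym table, and the word weights are hypothetical stand-ins; a real attack queries the trained LSTM and uses REAT's attribution scores:

```python
# Sketch of an attribution-guided substitution attack: swap the most
# negatively attributed word for a near-synonym and check whether the
# (toy) sentiment score flips sign.

def attack(tokens, attributions, synonyms, score_fn):
    worst = min(range(len(tokens)), key=lambda i: attributions[i])
    patched = list(tokens)
    patched[worst] = synonyms.get(tokens[worst], tokens[worst])
    return patched, score_fn(patched)

WEIGHTS = {"talented": 0.5, "terribly": -1.2, "charismatic": 0.6,
           "extremely": 0.3}                     # toy word weights
score_fn = lambda toks: sum(WEIGHTS.get(w, 0.0) for w in toks)
SYNONYMS = {"terribly": "extremely"}             # semantically equivalent swap

tokens = ["schweiger", "is", "talented", "and", "terribly", "charismatic"]
attributions = [WEIGHTS.get(w, 0.0) for w in tokens]  # stand-in for REAT scores
before = score_fn(tokens)                        # negative overall score
patched, after = attack(tokens, attributions, SYNONYMS, score_fn)
```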


• This adversarial attack generalizes to other instances:

“Occasionally melodramatic, it ’s also extremely effective.” → positive prediction, 99.53%
“Occasionally melodramatic, it ’s also terribly effective.” → negative prediction, 99.0%

“Extremely well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → positive prediction, 99.98%
“Terribly well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → negative prediction, 87.7%

Conclusion: REAT, a post-hoc interpretation method for predictions made by RNNs
• Highly faithful and interpretable explanations
• A useful debugging tool to examine RNNs

Future work: interpretable machine learning more broadly
• New layers with interpretable constraints
• Intrinsic explanation (global or local)
• Post-hoc global explanation
• Post-hoc local explanation

Recommended reading: “Techniques for Interpretable Machine Learning”, Mengnan Du, Ninghao Liu, Xia Hu, Communications of the ACM, 2019.
Recommended