On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

Mengnan Du, Ninghao Liu, Fan Yang, Shuiwang Ji, Xia Hu

Data Analytics at Texas A&M Lab, Texas A&M University

RNNs are regarded as black boxes

“Interpreting Machine Learning Models”. Medium.com, 2017.

RNNs have made a lot of progress on tasks such as text classification and machine translation.

[Figure: four types of RNN applications achieve high accuracy but low interpretability]


Interpretation is beneficial to both researchers and end-users: explanations of an RNN's predictions help end-users trust the model, and help researchers and developers refine it.

Our goal: provide post-hoc interpretations behind individual predictions
• Increase the interpretability of RNNs
• Keep prediction performance unchanged


Key factors

Input:
• A pre-trained RNN and an input text
• The prediction of the RNN

Post-hoc interpretation:
• A contribution score for each feature in the input
• Deeper color in the interpretation heatmap means higher contribution

[Figure: unrolled RNN with inputs 𝑥1 … 𝑥𝑇, hidden states ℎ1 … ℎ𝑇, logit 𝑧, and prediction 𝑦, next to an interpretation heatmap]

Teaching Machines to Read and Comprehend. NIPS, 2015.


Challenges:

1. How can we guarantee that the post-hoc interpretations are indeed faithful to the original prediction? Local-approximation-based methods may not be faithful to the original model.

2. It is challenging to develop an attribution method that can generate phrase-level explanations.

Example: the phrase “used to be my favorite” expresses negative sentiment, even though the word “favorite” on its own looks positive; this motivates phrase-level explanations.

“Why Should I Trust You?”: Explaining the Predictions of Any Classifier. KDD, 2016.


Desired properties:
• Faithful interpretation → should investigate internal neurons
• Phrase-level interpretation → interpretation method should be flexible

Can we utilize decomposition-based methods to derive interpretations?


• Symbol α𝑡: the updating vector, controlling how much of the previous evidence is carried into time step 𝑡
• Symbol 𝑔(𝑥𝑡): the evidence that the RNN obtains from the input at time step 𝑡
• Some architectures follow this rule exactly (e.g., GRU); some approximately (e.g., LSTM)

• Abstracted RNN updating rule: ℎ𝑡 = α𝑡 ⊙ ℎ𝑡−1 + 𝑔(𝑥𝑡)

[Figure: unrolled RNN with inputs 𝑥1 … 𝑥𝑇, hidden states ℎ1 … ℎ𝑇, logit 𝑧, and prediction 𝑦]
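The abstracted updating rule can be sketched numerically. A minimal pure-Python illustration with toy values (not from the paper):

```python
# Minimal sketch of the abstracted RNN updating rule
# h_t = alpha_t (*) h_{t-1} + g(x_t), where (*) is the elementwise product.
# alpha_t is the updating vector; evidence_t stands in for g(x_t).
# All numbers below are toy values, purely for illustration.

def rnn_step(h_prev, alpha_t, evidence_t):
    """One abstracted step: keep a fraction of the old state, add new evidence."""
    return [a * h + e for a, h, e in zip(alpha_t, h_prev, evidence_t)]

# A 2-dimensional hidden state unrolled over 3 time steps.
h = [0.0, 0.0]  # h_0
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
evidence = [[1.0, 2.0], [0.5, -1.0], [0.2, 0.3]]
for a, e in zip(alphas, evidence):
    h = rnn_step(h, a, e)
```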

• Abstracted RNN updating rule: ℎ𝑡 = α𝑡 ⊙ ℎ𝑡−1 + 𝑔(𝑥𝑡)

• RNN logit value: 𝑧 = 𝑊 ℎ𝑇

• RNN prediction decomposition (with ℎ0 = 0):
  𝑧 = Σ𝑡=1…𝑇 𝑊 [ (∏𝑘=𝑡+1…𝑇 α𝑘) ⊙ (ℎ𝑡 − α𝑡 ⊙ ℎ𝑡−1) ]

• Two essential elements: the hidden state vector ℎ𝑡 and the updating vector α𝑡

• From decomposition to word-level explanation

• Contribution score for 𝑥𝑡:
  𝑆(𝑥𝑡) = 𝑊 [ (∏𝑘=𝑡+1…𝑇 α𝑘) ⊙ (ℎ𝑡 − α𝑡 ⊙ ℎ𝑡−1) ]

• Two parts: evidence updating from 𝑡 − 1 to 𝑡, and evidence forgetting from 𝑡 + 1 to 𝑇
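The word-level scores are additive by construction: with ℎ0 = 0 they sum exactly to the logit. A toy pure-Python check (illustrative values, not the paper's implementation):

```python
# Toy check that word-level scores of the additive decomposition sum to
# the logit: S(x_t) = W [ (prod_{k=t+1..T} alpha_k) (*) (h_t - alpha_t (*) h_{t-1}) ]
# and sum_t S(x_t) == W h_T when h_0 = 0.

def unroll(alphas, evidence):
    """Run h_t = alpha_t (*) h_{t-1} + g(x_t); return all states h_0..h_T."""
    hs = [[0.0, 0.0]]
    for a, e in zip(alphas, evidence):
        hs.append([ai * hi + ei for ai, hi, ei in zip(a, hs[-1], e)])
    return hs

def word_score(w, alphas, hs, t):
    """Contribution of x_t (t is 1-indexed): updating at t, forgetting after t."""
    update = [hs[t][i] - alphas[t - 1][i] * hs[t - 1][i] for i in range(len(w))]
    keep = [1.0] * len(w)
    for k in range(t, len(alphas)):        # product of alpha_{t+1} .. alpha_T
        keep = [keep[i] * alphas[k][i] for i in range(len(w))]
    return sum(w[i] * keep[i] * update[i] for i in range(len(w)))

w = [1.0, -1.0]                            # toy classifier weights
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
evidence = [[1.0, 2.0], [0.5, -1.0], [0.2, 0.3]]
hs = unroll(alphas, evidence)
scores = [word_score(w, alphas, hs, t) for t in range(1, len(alphas) + 1)]
logit = sum(wi * hi for wi, hi in zip(w, hs[-1]))
```

The telescoping sum is exact, which is what makes this decomposition faithful rather than a local approximation.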

• Phrase-level explanation: contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}:
  𝑆(𝑥𝐴) = 𝑊 [ (∏𝑘=𝑟+1…𝑇 α𝑘) ⊙ (ℎ𝑟 − (∏𝑘=𝑞…𝑟 α𝑘) ⊙ ℎ𝑞−1) ]

• Two parts: evidence updating from 𝑞 − 1 to 𝑟, and evidence forgetting from 𝑟 + 1 to 𝑇
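The phrase score has the same shape as the word score, with the update measured over the whole span. A toy sketch (1-based indices 𝑞, 𝑟 as on the slide; illustrative values):

```python
# Toy sketch of the phrase-level score for x_A, A = {q, ..., r}:
# S(x_A) = W [ (prod_{k=r+1..T} alpha_k) (*) (h_r - (prod_{k=q..r} alpha_k) (*) h_{q-1}) ]

def phrase_score(w, alphas, hs, q, r):
    T, n = len(alphas), len(w)
    inner = [1.0] * n                      # product of alpha_q .. alpha_r
    for k in range(q - 1, r):
        inner = [inner[i] * alphas[k][i] for i in range(n)]
    update = [hs[r][i] - inner[i] * hs[q - 1][i] for i in range(n)]
    forget = [1.0] * n                     # product of alpha_{r+1} .. alpha_T
    for k in range(r, T):
        forget = [forget[i] * alphas[k][i] for i in range(n)]
    return sum(w[i] * forget[i] * update[i] for i in range(n))

w = [1.0, -1.0]
alphas = [[0.9, 0.5], [0.8, 0.6], [0.7, 0.4]]
hs = [[0.0, 0.0], [1.0, 2.0], [1.3, 0.2], [1.11, 0.38]]  # h_0..h_3 from the rule
whole = phrase_score(w, alphas, hs, 1, 3)  # the whole text as one "phrase"
logit = sum(wi * hi for wi, hi in zip(w, hs[-1]))
```

Taking the whole text as a single phrase recovers the full logit, a sanity check on the formula.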

• Hidden state vector updating rule for GRU: ℎ𝑡 = 𝑢𝑡 ⊙ ℎ𝑡−1 + (1 − 𝑢𝑡) ⊙ ℎ̃𝑡

“Understanding LSTM Networks”. Colah’s blog, 2015.

• REAT updating rule: the GRU rule already has the abstracted form, with α𝑡 = 𝑢𝑡 and 𝑔(𝑥𝑡) = (1 − 𝑢𝑡) ⊙ ℎ̃𝑡

• GRU contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}: we only need to replace the REAT updating vector α𝑡 with the GRU updating gate vector 𝑢𝑡 in the phrase-level score
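A scalar GRU step with illustrative fixed weights (not trained, hidden size 1) confirms that the GRU update matches the abstracted rule exactly with α𝑡 = 𝑢𝑡:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar GRU cell with illustrative fixed weights (hidden size 1).
# Convention from the slide: h_t = u_t * h_{t-1} + (1 - u_t) * h~_t,
# so the REAT updating vector alpha_t is exactly the update gate u_t.
def gru_step(h_prev, x):
    u = sigmoid(0.5 * x + 0.3 * h_prev)             # update gate u_t
    r = sigmoid(0.4 * x - 0.2 * h_prev)             # reset gate r_t
    h_cand = math.tanh(0.8 * x + 0.6 * r * h_prev)  # candidate state h~_t
    h = u * h_prev + (1.0 - u) * h_cand
    return h, u, h_cand

h_prev, x = 0.2, 1.0
h, u, h_cand = gru_step(h_prev, x)
# GRU fits h_t = alpha_t * h_{t-1} + g(x_t) exactly,
# with alpha_t = u and g(x_t) = (1 - u) * h_cand:
residual = h - (u * h_prev + (1.0 - u) * h_cand)
```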

• Hidden state vector updating rule for LSTM: 𝑐𝑡 = 𝑓𝑡 ⊙ 𝑐𝑡−1 + 𝑖𝑡 ⊙ 𝑐̃𝑡,  ℎ𝑡 = 𝑜𝑡 ⊙ tanh(𝑐𝑡)

“Understanding LSTM Networks”. Colah’s blog, 2015.

• Approximate REAT updating rule: the cell state follows the abstracted form with α𝑡 ≈ 𝑓𝑡, but the output gate and the tanh nonlinearity make the hidden-state decomposition approximate

• LSTM contribution score for a phrase 𝑥𝐴, 𝐴 = {𝑞, …, 𝑟}: apply the decomposition with this approximate updating vector
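A scalar LSTM step (illustrative fixed weights, hidden size 1) shows why the LSTM case is only approximate: the cell state is exactly additive, but 𝑜𝑡 and tanh sit between 𝑐𝑡 and ℎ𝑡:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Scalar LSTM cell with illustrative fixed weights (hidden size 1).
def lstm_step(h_prev, c_prev, x):
    f = sigmoid(0.5 * x + 0.1 * h_prev)   # forget gate f_t
    i = sigmoid(0.4 * x + 0.2 * h_prev)   # input gate i_t
    o = sigmoid(0.3 * x + 0.3 * h_prev)   # output gate o_t
    c_cand = math.tanh(0.8 * x + 0.6 * h_prev)
    c = f * c_prev + i * c_cand           # cell state: exactly the additive form
    h = o * math.tanh(c)                  # nonlinearity => only approximately additive
    return h, c, f, i, c_cand

h_prev, c_prev, x = 0.1, 0.5, 1.0
h, c, f, i, c_cand = lstm_step(h_prev, c_prev, x)
# The cell state fits c_t = alpha_t * c_{t-1} + g(x_t) with alpha_t = f_t:
cell_residual = c - (f * c_prev + i * c_cand)
```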

• Concatenation of a normal (forward) GRU and a reverse GRU: ℎ𝑡 = [ℎ𝑡→ ; ℎ𝑡←]

• Phrase-level attribution for BiGRU is the concatenation of two terms: the normal GRU decomposition and the reverse GRU decomposition
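Because the logit of a bidirectional model is a sum over the two directions' final states, each direction can be decomposed independently and the scores added. A toy 1-dimensional sketch (illustrative values):

```python
# Toy sketch of BiGRU attribution as a sum of two one-directional
# decompositions. The forward pass reads x_1..x_T, the reverse pass reads
# x_T..x_1; the logit combines both final states, so REAT can decompose
# each direction separately and concatenate (sum) the contributions.

def unroll(alphas, evidence):
    """1-dim abstracted recurrence h_t = alpha_t * h_{t-1} + g(x_t)."""
    hs = [[0.0]]
    for a, e in zip(alphas, evidence):
        hs.append([a[0] * hs[-1][0] + e[0]])
    return hs

def total(w, hs):
    return w * hs[-1][0]

# Toy values for a 2-step input, one direction per pair of lists.
fwd_alphas, fwd_evid = [[0.9], [0.8]], [[1.0], [0.5]]
bwd_alphas, bwd_evid = [[0.7], [0.6]], [[0.4], [-0.2]]   # reversed reading order
w_fwd, w_bwd = 1.0, -1.0

logit = (total(w_fwd, unroll(fwd_alphas, fwd_evid))
         + total(w_bwd, unroll(bwd_alphas, bwd_evid)))
```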

Evaluation:

1. Attribution Faithfulness Evaluation
2. Qualitative Evaluation via Case Studies
3. Applying REAT for Linguistic Patterns Analysis
4. Applying REAT for Model Misbehavior Analysis

• Faithfulness criterion: once the most important sentence is deleted, it should cause the highest accuracy drop for the target class.

Example: “It is ridiculous, but of course it is also refreshing.”

• REAT explanations are highly faithful to the original RNN
• REAT accurately reflects the prediction scores of different architectures
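The deletion protocol above can be sketched as follows. The "model" here is a dummy bag-of-words scorer and the helper names are hypothetical; the actual evaluation deletes the top-attributed sentence from a real RNN's input:

```python
# Sketch of deletion-based faithfulness evaluation: remove the
# top-attributed token and measure the drop in the target-class score.
# The score function is a toy word-weight sum, purely for illustration.

def faithfulness_drop(score_fn, tokens, attributions):
    top = max(range(len(tokens)), key=lambda i: attributions[i])
    reduced = tokens[:top] + tokens[top + 1:]
    return score_fn(tokens) - score_fn(reduced)

LEXICON = {"ridiculous": -0.4, "refreshing": 0.9}            # toy weights
score_fn = lambda toks: sum(LEXICON.get(w, 0.0) for w in toks)

tokens = ["it", "is", "ridiculous", "but", "it", "is", "also", "refreshing"]
attributions = [LEXICON.get(w, 0.0) for w in tokens]         # stand-in for REAT
drop = faithfulness_drop(score_fn, tokens, attributions)
# A faithful attribution picks "refreshing", causing the largest score drop.
```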

• Visualizations under different RNN architectures, for the input “The fight scenes are fun but it grows tedious” (green: positive contribution, red: negative contribution):

• GRU: positive prediction (51.6% confidence)
• LSTM: positive prediction (96.2% confidence)
• BiGRU: negative prediction (62.7% confidence)

• Hierarchical attribution: LSTM negative prediction with 99.46% confidence on “The story may be new but the movie does n’t serve up lots of laughs”

• Heatmaps at three granularities: word, phrase, and clause (green: positive contribution, red: negative contribution)

• In general, the first part of the text has a negative contribution and the second part a positive contribution
• This hierarchical attribution represents the contributions at different levels of granularity

• Apply REAT to analyze linguistic patterns for an LSTM on the SST2 test set

POS category score distributions:
• RBS (superlative adverbs, e.g., “best”, “most”): highest scores
• JJ (adjectives): ranks relatively high
• NN (nouns): near-zero median score

• REAT unveils useful linguistic knowledge captured by the LSTM

• The LSTM wrongly gives a 99.97% negative sentiment prediction for: “Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists.”

• Comparing the attribution score distributions for the two words “terrible” and “terribly”, REAT tells us that the LSTM captures only the meaning related to “terrible”, while ignoring other meanings of “terribly”, such as “extremely”

• This LSTM fails to model the polysemy of words

• Interpretable adversarial attack on an LSTM classifier

• Original input: “Schweiger is talented and terribly charismatic, qualities essential to both movie stars and social anarchists.” → negative prediction, 99.97% confidence

• Replacing “terribly” with “extremely”: positive prediction, 81.29% confidence

• Replacing “terribly” with “very”: positive prediction, 99.53% confidence
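The attack loop can be sketched as below. All names, the synonym table, and the word weights are hypothetical stand-ins; a real attack queries the trained LSTM and uses REAT's attribution scores:

```python
# Sketch of an attribution-guided substitution attack: swap the most
# negatively attributed word for a near-synonym and check whether the
# (toy) sentiment score flips sign.

def attack(tokens, attributions, synonyms, score_fn):
    worst = min(range(len(tokens)), key=lambda i: attributions[i])
    patched = list(tokens)
    patched[worst] = synonyms.get(tokens[worst], tokens[worst])
    return patched, score_fn(patched)

WEIGHTS = {"talented": 0.5, "terribly": -1.2, "charismatic": 0.6,
           "extremely": 0.3}                     # toy word weights
score_fn = lambda toks: sum(WEIGHTS.get(w, 0.0) for w in toks)
SYNONYMS = {"terribly": "extremely"}             # semantically equivalent swap

tokens = ["schweiger", "is", "talented", "and", "terribly", "charismatic"]
attributions = [WEIGHTS.get(w, 0.0) for w in tokens]  # stand-in for REAT scores
before = score_fn(tokens)                        # negative overall score
patched, after = attack(tokens, attributions, SYNONYMS, score_fn)
```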


• This adversarial attack generalizes to other instances:

“Occasionally melodramatic, it ’s also extremely effective.” → positive prediction, 99.53%
“Occasionally melodramatic, it ’s also terribly effective.” → negative prediction, 99.0%

“Extremely well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → positive prediction, 99.98%
“Terribly well acted by the four primary actors, this is a seriously intended movie that is not easily forgotten.” → negative prediction, 87.7%

Conclusion: REAT, a post-hoc interpretation method for predictions made by RNNs
• Highly faithful and interpretable explanations
• A useful debugging tool to examine RNNs

Future work: interpretable machine learning more broadly
• New layers with interpretable constraints
• Intrinsic explanation (global or local)
• Post-hoc global explanation
• Post-hoc local explanation

Recommended reading: “Techniques for Interpretable Machine Learning”, Mengnan Du, Ninghao Liu, Xia Hu, Communications of the ACM, 2019.
Recommended