
Page 1

BCAP: An Artificial Neural Network Pruning Technique to Reduce Overfitting

By: Kiante Brantley, Graduate Student, M.S. Computer Science

Guided By: Dr. Tim Oates

Page 2

Index

• Motivating Problem

• Research Objective

• Methodology

• Experiments

• Analysis

• Conclusions

• Future Work


Page 3

What is an artificial neural network (ANN)?

A machine learning model inspired by biological neural networks.

Page 4

But what is an artificial neural network (ANN)?

A functional mapping between input values and output values.

Page 5

What is the problem?

Larger networks:

Pros:
- Learn quickly
- Less sensitive to initial conditions
- Less sensitive to local minima

Cons:
- Overfitting

[Reed, 1993]

Page 6

What is the problem?

Smaller networks:

Pros:
- Generalize better
- Faster to build
- Faster to compute
- Easier to understand

Cons:
- Underfitting

[Reed, 1993]

Page 7

Current approaches to solving overfitting:

Dropout [Hinton et al., 2014]
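A minimal numpy sketch of (inverted) dropout as the slide describes it; the function and parameter names (`dropout`, `p_drop`) are illustrative, not from the source:

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True):
    """Inverted dropout on a layer's activations x (shape: units x records).
    Each unit is dropped with probability p_drop during training; dividing
    by the keep probability leaves the expected activation unchanged."""
    if not train or p_drop == 0.0:
        return x
    keep = 1.0 - p_drop
    mask = (np.random.rand(*x.shape) < keep) / keep
    return x * mask
```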

Page 8

Current approaches to solving overfitting:

DropConnect [Wan et al., 2013]

Regular Network vs. DropConnect Network
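For contrast, a matching sketch of DropConnect, which masks individual weights rather than whole units; again the names are illustrative:

```python
import numpy as np

def dropconnect_forward(W, x, b, p_drop=0.5, train=True):
    """DropConnect: zero out individual weights with probability p_drop,
    rescaling the survivors, then apply the usual affine map z = Wx + b."""
    if train and p_drop > 0.0:
        keep = 1.0 - p_drop
        W = W * (np.random.rand(*W.shape) < keep) / keep
    return W @ x + b
```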

Page 9

But...

Can we harness the benefits of both small and large networks?

Larger networks:
Pros:
- Learn quickly
- Less sensitive to initial conditions
- Less sensitive to local minima
Cons:
- Overfitting

Smaller networks:
Pros:
- Generalize better
- Faster to build
- Faster to compute
- Easier to understand
Cons:
- Underfitting

[Reed, 1993]

Page 10

Let's try pruning...

Can we train a large network and then remove hidden units to make the network smaller?

Train → Prune

Page 11

Pruning Neural Networks:

Neural Net Pruning - Why and How [Sietsma et al., 1988]

Page 12

Pruning Neural Networks:

Others...
- [Sietsma et al., 1988]
- Pattern Recognition Theory [Chung et al., 1993]
- Reed's survey, Sensitivity Pruning [Reed, 1993]
- Optimal Brain Damage [Le Cun et al., 1990]
- Modified Back Propagation
- Singular Value Decomposition (SVD) [Xue et al., 1990]
- Modified Cross Validation [Castellano et al., 1997]
- ...

Page 13

Research Objective

Develop a pruning technique that can be applied to a fully connected layer of a neural network to address two issues that neural networks face: overfitting and tuning hyperparameters.

Page 14

Research Objective

Develop a pruning technique that can be applied to a fully connected layer of a neural network to address three issues that neural networks face: overfitting, tuning hyperparameters, and expensive hardware.

Our Contributions...

• A novel pruning technique that can be applied to a fully connected hidden layer of a neural network to address overfitting and tuning the network configuration.

• A demonstration of BCAP's ability to prune networks on data sets in two domains: life science and vision.

Page 15

BCAP: Model Description
- Finding Prunable Hidden Units
- Pruning Hidden Units
- Error Metric
- Stopping Criteria
- Groups and Cosine Curve Approximation

Page 16

Feedforward Network:

Definitions:
- n: number of records in the data
- l: index of a fully connected hidden layer
- h_l: number of hidden units in layer l
- x^l: output of layer l, with dimension h by n
- W^l: weight matrix associated with layer l
- b^l: bias matrix associated with layer l
- f(x): activation function
- z^(l+1): input matrix to layer l+1

Formal Feedforward Operation:
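Written out from the definitions above, the operation is the standard affine map followed by the activation:

$$z^{(l+1)} = W^{l} x^{l} + b^{l}, \qquad x^{(l+1)} = f\left(z^{(l+1)}\right)$$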

Page 17

BCAP Modification on the Feedforward Network:

BCAP can be described as mapping the h dimensions of the weight and bias vectors to h', where h' ≤ h.

Original feedforward model vs. BCAP feedforward model:
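In symbols, with h the original layer width and h' ≤ h the pruned width, a sketch consistent with the statement above (the hatted symbols are assumptions):

Original feedforward model:
$$z^{(l+1)} = W^{l} x^{l} + b^{l}, \quad W^{l} \in \mathbb{R}^{h \times h_{l-1}}, \; b^{l} \in \mathbb{R}^{h}$$

BCAP feedforward model:
$$\hat{z}^{(l+1)} = \hat{W}^{l} x^{l} + \hat{b}^{l}, \quad \hat{W}^{l} \in \mathbb{R}^{h' \times h_{l-1}}, \; \hat{b}^{l} \in \mathbb{R}^{h'}, \quad h' \le h$$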

Page 18

Finding Prunable Hidden Units:

A node is prunable if it produces output similar to another node in the same layer; cosine similarity is used to detect the similarity between hidden units.
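A minimal sketch of the similarity test in Python, assuming each hidden unit's outputs over the n records are collected into a vector:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between the output vectors of two hidden units,
    each of length n (one activation per training record)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Two units whose output vectors point in (nearly) the same direction score near 1 and become candidates for pruning.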

Page 19

Finding Prunable Hidden Units:

With two records and three hidden units (A, B, C), the pre-activations are $Z = Wx + b$:

$$W = \begin{bmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \\ W_{31} & W_{32} \end{bmatrix}, \qquad x = \begin{bmatrix} 8 & 9 \\ 3 & 4 \end{bmatrix} \;\text{(two records)}$$

$$Z = \begin{bmatrix} (8W_{11}+3W_{12})+b_1 & (9W_{11}+4W_{12})+b_1 \\ (8W_{21}+3W_{22})+b_2 & (9W_{21}+4W_{22})+b_2 \\ (8W_{31}+3W_{32})+b_3 & (9W_{31}+4W_{32})+b_3 \end{bmatrix}$$

For the three hidden units this might evaluate to

$$Z = \begin{bmatrix} 0.5 & 0.7 \\ 0.2 & 0.3 \\ 0.5 & 0.7 \end{bmatrix} \begin{matrix} A \\ B \\ C \end{matrix}$$

Hidden units A and C produce identical outputs on both records, so one of the pair is prunable.
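The same example worked numerically; the activation values are the ones shown on the slide:

```python
import numpy as np

# Outputs of three hidden units (rows A, B, C) on two records (columns),
# taken from the slide's example.
Z = np.array([
    [0.5, 0.7],  # unit A
    [0.2, 0.3],  # unit B
    [0.5, 0.7],  # unit C
])

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Compare every pair of hidden units in the layer.
h = Z.shape[0]
for i in range(h):
    for j in range(i + 1, h):
        print(f"sim({i},{j}) = {cosine_similarity(Z[i], Z[j]):.4f}")
# sim(0,2) prints exactly 1.0000: units A and C are perfect candidates
# for merging, which is what BCAP exploits.
```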

Page 20

BCAP: Model Description
- Finding Prunable Hidden Units ✓
- Pruning Hidden Units
- Error Metric
- Stopping Criteria
- Groups and Cosine Curve Approximation

Page 21

Pruning Hidden Units:

When dealing with a fully connected hidden layer, every hidden unit receives the same input from the previous layer.

Page 22

Pruning Hidden Units:

We formally describe how to combine hidden units mathematically; one formulation is sketched below.
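A hedged reconstruction of one standard merge rule, consistent with the setup above but not necessarily the authors' exact formula: if units i and j in layer l produce nearly identical outputs, unit j's outgoing weights can be folded into unit i's before j is removed, leaving the next layer's input almost unchanged.

$$x^{l}_{i} \approx x^{l}_{j} \;\Longrightarrow\; W^{l+1}_{k,i} \leftarrow W^{l+1}_{k,i} + W^{l+1}_{k,j} \text{ for all } k, \text{ then delete row } j \text{ of } W^{l} \text{ and entry } b^{l}_{j}.$$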

Page 23

BCAP: Model Description
- Finding Prunable Hidden Units ✓
- Pruning Hidden Units ✓
- Error Metric
- Stopping Criteria
- Groups and Cosine Curve Approximation

Page 24

Error Metric:

In practice, if there are hidden units in a hidden layer l that are similar, the cosine similarity measurement produced is not exactly 1.

Page 25

Error Metric:

We formally describe, mathematically, the error that is introduced:

Page 26

Error Metric:

In general, we would like to minimize the effect on the network's performance with respect to the error e discussed, because we do not know whether the effect is positive (i.e., increasing generalization) or negative (decreasing generalization).

Our error measurement is based on the Mean Squared Error (MSE):

Page 27

Error Metric:

The MSE alone is not sufficient as an error metric because it depends on the size of the hidden layer. Dividing by the size of the hidden layer shares the error measure across all the nodes in the layer, rather than letting one node's change dominate the error.
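One consistent reading of the two slides above, written out (the symbols x and x̂ for the layer's outputs before and after pruning are assumptions):

$$e = \frac{1}{h_l} \sum_{i=1}^{h_l} \frac{1}{n} \sum_{r=1}^{n} \left( \hat{x}^{l}_{i,r} - x^{l}_{i,r} \right)^2$$

The inner sum is the per-unit MSE over the n records; the outer division by $h_l$ spreads the measure across the layer so a single unit's change cannot dominate.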

Page 28

BCAP: Model Description
- Finding Prunable Hidden Units ✓
- Pruning Hidden Units ✓
- Error Metric ✓
- Stopping Criteria
- Groups and Cosine Curve Approximation

Page 29

Stopping Criteria:

We would like to prune hidden units from the most similar to the least similar, where the maximum amount of error introduced into the network by all the prunes is less than ε.


Page 32

Stopping Criteria:

One way to find the optimal number of prunes with respect to the stopping parameter ε is to iterate over all possible prunable hidden units from the most similar to the least similar.

If hidden layer l has n hidden units, the number of 2-combinations that can be formed is $\binom{n}{2} = \frac{n(n-1)}{2}$, which grows roughly as $n^2$.
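A sketch of this exhaustive search in Python; the per-merge error estimate is an assumption in the spirit of the MSE metric above, and interactions among overlapping pairs are ignored for simplicity:

```python
import numpy as np
from itertools import combinations

def exhaustive_prune_order(Z, eps):
    """Z: (h, n) hidden-layer outputs, one row per unit.
    Returns merge pairs, most similar first, while the total
    estimated error stays below the stopping parameter eps."""
    h = Z.shape[0]

    def cos(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    # All O(h^2) 2-combinations, sorted from most to least similar.
    pairs = sorted(combinations(range(h), 2),
                   key=lambda p: cos(Z[p[0]], Z[p[1]]),
                   reverse=True)

    chosen, total_err = [], 0.0
    for i, j in pairs:
        err = float(np.mean((Z[i] - Z[j]) ** 2)) / h  # shared across the layer
        if total_err + err > eps:
            break  # stopping criterion: at most eps total introduced error
        total_err += err
        chosen.append((i, j))
    return chosen
```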

Page 33

BCAP: Model Description
- Finding Prunable Hidden Units ✓
- Pruning Hidden Units ✓
- Error Metric ✓
- Stopping Criteria ✓
- Groups and Cosine Curve Approximation

Page 34

Groups and Cosine Curve Approximation:

This means it would take roughly quadratic time to explore all of the angles produced by computing the similarity between hidden units. We would like to reduce this computation time by approximating the optimal angle associated with the last possible prunable hidden-unit pair.

Page 35

Groups and Cosine Curve Approximation:

The cosine similarity is a function with domain [0°, 180°] and range [−1, 1]: the cosine of 0° is 1, the cosine of 90° is 0, and the cosine of 180° is −1. We can divide the cosine curve into i parts and use those values to find the optimal angle threshold, denoted t_i.


Page 37

Groups and Cosine Curve Approximation:

In addition to creating a series of thresholds t_i to find the optimal degree threshold, we need to combine similar hidden units into groups, denoted G_i. If we did not group hidden units, we would be performing the same amount of computation as the iterative version.
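A greedy grouping sketch in Python; the representative-based rule is an assumption, since the source does not spell out how each G_i is formed:

```python
import numpy as np

def group_by_angle(Z, t_deg):
    """Assign each hidden unit to the first group G_i whose representative
    output vector lies within t_deg degrees; otherwise start a new group.
    Grouping avoids re-scoring all O(h^2) pairs at every threshold."""
    groups, reps = [], []
    for idx, z in enumerate(Z):
        for g, r in zip(groups, reps):
            c = np.dot(z, r) / (np.linalg.norm(z) * np.linalg.norm(r))
            angle = np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
            if angle <= t_deg:
                g.append(idx)
                break
        else:
            groups.append([idx])
            reps.append(z)
    return groups
```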

Page 38

BCAP: Model Description
- Finding Prunable Hidden Units ✓
- Pruning Hidden Units ✓
- Error Metric ✓
- Stopping Criteria ✓
- Groups and Cosine Curve Approximation ✓

Page 39

BCAP Formal Algorithm:
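An end-to-end sketch assembled from the steps on the preceding slides (grouping by an angle threshold, estimating the introduced error, stopping at ε); this is a hedged reconstruction, not the authors' exact pseudocode:

```python
import numpy as np

def angle_deg(u, v):
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))

def group_units(Z, t_deg):
    # Greedy grouping, as in the earlier sketch.
    groups, reps = [], []
    for i, z in enumerate(Z):
        for g, r in zip(groups, reps):
            if angle_deg(z, r) <= t_deg:
                g.append(i)
                break
        else:
            groups.append([i])
            reps.append(z)
    return groups

def merge_error(Z, groups):
    # MSE of replacing each unit by its group representative,
    # shared across the layer (assumed error form).
    h = Z.shape[0]
    return sum(float(np.mean((Z[i] - Z[g[0]]) ** 2)) / h
               for g in groups for i in g[1:])

def bcap_sketch(Z, eps, thresholds=(1, 2, 5, 10, 15, 30)):
    """Pick the largest angle threshold t_i whose total merge error stays
    below eps; the resulting groups are the units to combine."""
    best = [[i] for i in range(Z.shape[0])]  # default: prune nothing
    for t in sorted(thresholds):
        groups = group_units(Z, t)
        if merge_error(Z, groups) > eps:
            break
        best = groups
    return best
```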

Page 40

Experiments:

We evaluate the BCAP pruning technique on fully connected neural network layers trained on various data sets in different domains for classification tasks.

We intentionally use hidden layer sizes larger than necessary to take advantage of large networks' ability to learn fast, and then we prune the network either after or during training. If the training duration (i.e., the number of epochs) is too short, we avoid applying BCAP during training.

Page 41

Experiments:

The MNIST data set contains 28 × 28 pixel black-and-white handwritten digit images. There are 10 classes of handwritten digits in this data set, ranging from 0 to 9. The training and test sets contain digits from each of the classes.

Page 42

Experiments: [2-layer MLP with 800 hidden units in each layer]
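A minimal PyTorch sketch of the stated architecture, assuming ReLU activations and a flattened 784-dimensional MNIST input (both assumptions; the slide only specifies two 800-unit hidden layers):

```python
import torch.nn as nn

# Deliberately over-sized MLP for MNIST: 784 -> 800 -> 800 -> 10.
# BCAP would later shrink the two 800-unit hidden layers.
mlp = nn.Sequential(
    nn.Flatten(),          # 28 x 28 image -> 784 vector
    nn.Linear(784, 800),
    nn.ReLU(),
    nn.Linear(800, 800),
    nn.ReLU(),
    nn.Linear(800, 10),    # one logit per digit class
)
```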

Page 43

Pruning Analysis
- Tuning of BCAP Error ε
- Tradeoff Between Accuracy and Network Size
- Pruning Epoch for Activation Functions

Page 44

Tuning of BCAP Error ε:

This parameter decides how much error will be introduced into the network after pruning.

Page 45

Pruning Analysis
- Tuning of BCAP Error ε ✓
- Tradeoff Between Accuracy and Network Size
- Pruning Epoch for Activation Functions

Page 46

Tradeoff Between Accuracy and Network Size:

When pruning neural networks there is an inherent tradeoff between network size and accuracy. We would like to minimize the network size and maximize the accuracy of the network with respect to the test data set.

Page 47

Pruning Analysis
- Tuning of BCAP Error ε ✓
- Tradeoff Between Accuracy and Network Size ✓
- Pruning Epoch for Activation Functions

Page 48

Pruning Epoch for Activation Functions:


Page 50

Conclusions from our work

• A method for pruning a fully connected layer of a neural network

• Evidence that pruning a neural network can be effective


Page 52

Future work

• Explore other types of neural networks besides MLPs

• Explore neural networks with more than 2 hidden layers

• Evaluate pruning hidden units whose directions are opposite

Our method is a promising technique to help reduce overfitting and increase generalization in a neural network.

Page 53

References

• Reed, Russell. "Pruning Algorithms - A Survey." IEEE Transactions on Neural Networks 4.5 (1993)
• Srivastava, Nitish, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research (2014)
• Wan, Li, Matthew Zeiler, Sixin Zhang, Yann LeCun, and Robert Fergus. "Regularization of Neural Networks Using DropConnect."
• Le Cun, Yann, John S. Denker, and Sara A. Solla. "Optimal Brain Damage." (1990)
• Chung, F.L. "A Node Pruning Algorithm for Backpropagation Networks." International Journal of Neural Systems 3.3 (1993)
• Schiffmann, W., M. Joost, and R. Werner. "Synthesis and Performance Analysis of Multilayer Neural Network Architectures." (1992)
• Xue, Qiuzhen, Yu Hen Hu, and Paul Milenkovic. "Analyses of the Hidden Units of the Multi-layer Perceptron and Its Application in Acoustic-to-articulatory Mapping." IEEE Transactions on Neural Networks 1.4 (1990)
• Castellano, G., A.M. Fanelli, and M. Pelillo. "An Iterative Pruning Algorithm for Feedforward Neural Networks." IEEE Transactions on Neural Networks 8.3 (1997)

Page 54

Thank you
