
Page 1: Neural Networks - cs.umd.edu

Neural Networks

CMSC 422

MARINE CARPUAT

[email protected]

XOR slides by Graham Neubig (CMU)

Page 2: Neural Networks - cs.umd.edu

Neural Networks

• Today
  – What are Neural Networks?
  – How to make a prediction given an input?
  – Why are neural networks powerful?

• Next time
  – How to train them?

Page 3: Neural Networks - cs.umd.edu

A warm-up example

Sentiment analysis for movie reviews

• the movie was horrible +1
• the actors are excellent -1
• the movie was not horrible -1
• he is usually an excellent actor, but not in this movie +1

Page 4: Neural Networks - cs.umd.edu

Binary classification via hyperplanes

• At test time, we check which side of the hyperplane an example falls on

  y = sign(wᵀx + b)

Page 5: Neural Networks - cs.umd.edu

Function Approximation with Perceptron

Problem setting

• Set of possible instances X
  – Each instance x ∈ X is a feature vector x = [x1, …, xD]

• Unknown target function f: X → Y
  – Y is binary valued {-1, +1}

• Set of function hypotheses H = {h | h: X → Y}
  – Each hypothesis h is a hyperplane in D-dimensional space

Input

• Training examples {(x(1), y(1)), …, (x(N), y(N))} of unknown target function f

Output

• Hypothesis h ∈ H that best approximates target function f

Page 6: Neural Networks - cs.umd.edu

Aside: biological inspiration

Analogy: the perceptron as a neuron

Page 7: Neural Networks - cs.umd.edu

Neural Networks

• We can think of neural networks as a combination of multiple perceptrons
  – Multilayer perceptron

• Why would we want to do that?
  – Discover more complex decision boundaries
  – Learn combinations of features

Page 8: Neural Networks - cs.umd.edu

What does a 2-layer perceptron look like?

(illustration on board)

• Key concepts:
  – Input dimensionality
  – Hidden units
  – Hidden layer
  – Output layer
  – Activation functions

Page 9: Neural Networks - cs.umd.edu

Activation functions (aka link functions)

• Activation functions are non-linear functions
  – sign function, as in the perceptron
  – hyperbolic tangent and other sigmoid functions that approximate sign but are differentiable

• What happens if the hidden units use the identity function as an activation function?

Page 10: Neural Networks - cs.umd.edu

[Diagram: a two-layer network whose hidden layer is parameterized by a matrix and whose output layer is parameterized by a vector.]
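To make the prediction step concrete, here is a minimal sketch (not from the slides) of the forward pass in Python; the shapes, variable names, and the tanh/sign choices are assumptions consistent with the surrounding slides. The final comment also answers the question on the previous slide: with identity activations the network collapses to a single linear classifier.

    import numpy as np

    def two_layer_predict(W, b, v, c, x):
        """Two-layer perceptron forward pass.
        W: (K, D) matrix of hidden layer parameters
        b: (K,)   hidden layer biases
        v: (K,)   vector of output layer parameters
        c: scalar output bias
        x: (D,)   input feature vector
        """
        h = np.tanh(W @ x + b)            # hidden activations (non-linearity)
        return 1 if v @ h + c >= 0 else -1

    # Hypothetical parameters: D = 3 input features, K = 2 hidden units
    W = np.array([[0.5, -1.0,  0.2],
                  [1.5,  0.3, -0.7]])
    b = np.array([0.0, -0.5])
    v = np.array([1.0, -1.0])
    c = 0.1
    print(two_layer_predict(W, b, v, c, np.array([1.0, 0.0, 2.0])))

    # If the hidden units used the identity activation instead of tanh,
    # v @ (W @ x + b) + c == (v @ W) @ x + (v @ b + c),
    # i.e. the whole network reduces to a single linear classifier.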

Page 11: Neural Networks - cs.umd.edu

What functions can we approximate with a 2-layer perceptron?

Problem setting

• Set of possible instances X
  – Each instance x ∈ X is a feature vector x = [x1, …, xD]

• Unknown target function f: X → Y
  – Y is binary valued {-1, +1}

• Set of function hypotheses H = {h | h: X → Y}

Input

• Training examples {(x(1), y(1)), …, (x(N), y(N))} of unknown target function f

Output

• Hypothesis h ∈ H that best approximates target function f

Page 12: Neural Networks - cs.umd.edu

Two-Layer Networks are Universal Function Approximators

• Theorem (Th 9 in CIML):
Let F be a continuous function on a bounded subset of D-dimensional space. Then there exists a two-layer neural network F̂ with a finite number of hidden units that approximates F arbitrarily well. Namely, for any ε > 0 and for all x in the domain of F, |F(x) − F̂(x)| < ε.

Page 13: Neural Networks - cs.umd.edu

Example: a neural network to solve the XOR problem

[Figure: the four XOR examples plotted in the original feature space φ0 and in the space φ1 computed by the hidden layer; all connection weights in the diagram are ±1.]

• Original space (not linearly separable): φ0(x1) = {-1, 1} and φ0(x4) = {1, -1} are marked X; φ0(x2) = {1, 1} and φ0(x3) = {-1, -1} are marked O.

• Hidden layer space: φ1(x1) = {-1, -1}, φ1(x2) = {1, -1}, φ1(x3) = {-1, 1}, φ1(x4) = {-1, -1}.

Page 14: Neural Networks - cs.umd.edu

Example

● In the new space, the examples are linearly separable!

[Figure: the same four examples; in the φ1 space the X examples (both at {-1, -1}) can be separated from the O examples (at {1, -1} and {-1, 1}) by a single hyperplane. An output unit φ2[0] = y with weights {1, 1} and bias 1 implements that separator.]

Page 15: Neural Networks - cs.umd.edu

Example

● The final net

[Figure: the complete network. Inputs φ0[0], φ0[1] plus a constant 1 feed two tanh hidden units; the hidden outputs φ1[0], φ1[1] plus a constant 1 feed a final tanh output unit φ2[0]. Reading the ±1 weights off the diagram:

  φ1[0] = tanh( 1·φ0[0] + 1·φ0[1] − 1 )
  φ1[1] = tanh( −1·φ0[0] − 1·φ0[1] − 1 )
  φ2[0] = tanh( 1·φ1[0] + 1·φ1[1] + 1 ) ]
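As a small verification sketch (not part of the original slides), the network as read off the figure above can be checked in Python; it maps the two XOR classes to opposite signs of the output.

    import numpy as np

    # Hidden layer weights and biases as read off the slide's diagram (all +/-1)
    W = np.array([[ 1.0,  1.0],
                  [-1.0, -1.0]])
    b = np.array([-1.0, -1.0])
    # Output layer weights and bias
    v = np.array([1.0, 1.0])
    c = 1.0

    def net(x):
        """Forward pass of the two-layer tanh network from the XOR example."""
        phi1 = np.tanh(W @ x + b)
        return np.tanh(v @ phi1 + c)

    points = {
        "x1": np.array([-1.0,  1.0]),   # class X
        "x2": np.array([ 1.0,  1.0]),   # class O
        "x3": np.array([-1.0, -1.0]),   # class O
        "x4": np.array([ 1.0, -1.0]),   # class X
    }
    for name, x in points.items():
        print(name, np.sign(net(x)))    # O examples -> 1.0, X examples -> -1.0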

Page 16: Neural Networks - cs.umd.edu

Discussion

• A 2-layer perceptron lets us
  – Discover more complex decision boundaries than the perceptron
  – Learn combinations of features that are useful for classification

• Key design question
  – How many hidden units?
  – More hidden units yield more complex functions
  – Fewer hidden units require fewer examples to train

Page 17: Neural Networks - cs.umd.edu

Neural Networks

• Today
  – What are Neural Networks?
    • Multilayer perceptron
  – How to make a prediction given an input?
    • Simple matrix operations + non-linearities
  – Why are neural networks powerful?
    • Universal function approximators!

• Next
  – How to train them?