CS-424 Gregory Dudek
Today’s Lecture
• Administrative Details
• Learning
– Decision trees: cleanup & details
– Belief nets
– Sub-symbolic learning
• Neural networks
Administrivia
• Assignment 3 now available. Due in 1 week.
<http://www.cim.mcgill.ca/~dudek/424/game.pdf>
• The game for the final project has been defined. Its description can be found via the course home page at: <http://www.cim.mcgill.ca/~dudek/424/game.pdf>
• Examine the game now, try it, see what good strategies might be.
• Note: the normal late policy will not apply to the project.
– You **must** submit the electronic (executable) version on time, or it may not be evaluated (i.e. you get zero)!
– It must run on Linux. Be certain to compile and test it on one of the Linux machines in the lab well before it's due.
• If you are developing on another platform, regularly test on Linux during development.
ID3 (more…)
• Last class, we discussed using entropy to select a question for building a decision tree. This idea was first developed by Quinlan (1979) in the ID3 system, later improved resulting in C4.5.
• Recap:
– Entropy for classification into sets p+ and p- is I(p+, p-).
– Consider the information gain per attribute A.
• For each subtree Ai we have the bits given by the distribution of cases on the subtree, I(pi+, pi-). To fully classify all sub-cases, we need
Remainder(A) = ∑i weight(i) · I(pi+, pi-)
• Thus, the information gain is what's left:
Gain(A) = I(p+, p-) - Remainder(A)
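The two formulas above can be sketched directly in Python (a minimal illustration for the two-class case; function names are mine, not from ID3/C4.5):

```python
import math

def entropy(pos, neg):
    """I(p+, p-) in bits for a two-class collection with pos/neg examples."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count > 0:
            p = count / total
            h -= p * math.log2(p)
    return h

def information_gain(pos, neg, subsets):
    """Gain(A) = I(p+, p-) - Remainder(A).

    subsets: one (pos_i, neg_i) pair per value of attribute A.
    weight(i) is the fraction of all examples landing in subtree i.
    """
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in subsets)
    return entropy(pos, neg) - remainder
```

For instance, an attribute that splits 2+/2- perfectly into (2+, 0-) and (0+, 2-) has zero remainder, so its gain equals the full 1 bit of initial entropy.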
Final thoughts on entropy...
• Provoked by the seminar yesterday.
• Idea:
– Entropy tells you about unpredictability, or randomness.
– When selecting a question, the one with the highest entropy will carry the most information with respect to what you already knew, because the answer is hardest to predict.
– When asking how much we already know about something, high entropy is bad.
– Consider the PDF of a robot's position estimate: more entropy means more uncertainty.
Training & testing
• When constructing a learning system, we want to generalize to new examples. (already discussed ad nauseam)
• How can we tell if it's working?
– Look at how well we do with our training data.
• But… what if we just learned a "quirk" of the data? Overfitting? Bad features?
– (tank classification example; table lookup)
– Look at how we do on a set of examples never used for any training: a "test set".
• But… what if we can't afford the data?
– Cross-validation for learner L on training set X: with S = X - {i},
e(L; X) = ∑i∈X error(L(S) on case i)²
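The leave-one-out cross-validation sum above can be sketched as follows (a hypothetical illustration; the `learn`/`data` interface is an assumption for the sketch, not from the slides):

```python
def loo_error(learn, data):
    """Leave-one-out cross-validation error e(L; X).

    learn: fits on a list of (x, y) pairs, returns a predictor function.
    data:  the full training set X as a list of (x, y) pairs.
    Returns the sum of squared errors over each held-out case i.
    """
    total = 0.0
    for i, (x, y) in enumerate(data):
        held_out = data[:i] + data[i + 1:]   # S = X - {i}
        predictor = learn(held_out)          # train on everything but case i
        total += (predictor(x) - y) ** 2     # test on the held-out case
    return total
```

A learner that always predicts the mean target of its training set, for example, scores zero LOO error on constant data and a nonzero error as soon as the targets disagree.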
Simple functions?
Is there a fixed circuit network topology that can be used to represent
a family of functions?
Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.
Belief networks (ch. 15) - briefly
• We will cover only R&N Section 15.1 & 15.2 (briefly), and then segue to
chapter 19. You should read 15.3. This will be cursory coverage only.
• A belief net is a formalism for describing probabilistic relationships (a.k.a. Bayes nets).
• A graph (in fact a DAG) G(V,E). Nodes are random variables. Directed edges indicate a node has direct influence on another node.
• Each node has an associated conditional probability table quantifying the effects of its "parents".
• No directed cycles.
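As a sketch of how a DAG plus conditional probability tables defines a distribution, here is a hypothetical two-node net, Rain → WetGrass, with made-up numbers (the joint factors by the chain rule into each node's CPT conditioned on its parents):

```python
# Hypothetical CPTs (illustrative numbers, not from the lecture).
P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.1, False: 0.9}}

def joint(rain, wet):
    """P(Rain=rain, WetGrass=wet) via the chain rule over the DAG."""
    return P_rain[rain] * P_wet_given_rain[rain][wet]

def p_rain_given_wet():
    """Query variable Rain, evidence WetGrass=True:
    enumerate the joint and normalize."""
    num = joint(True, True)
    den = joint(True, True) + joint(False, True)
    return num / den
```

With these numbers, observing wet grass raises the probability of rain from the 0.2 prior to about 0.69.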
Why?
• Objective:
– Compute probabilities of variables of interest, the query variables,
– given observations of specific phenomena in the world, the evidence variables.
• The net you get depends on how you construct it, not just the problem and probabilities.
• Seek compactness (fewer links, tighter clusters): called locally structured nets.
• See overheads...
Issues
• Where do the probabilities come from?
– They can be learned or inferred from data.
• Where does the causal structure (the topology) come from?
– It's (sometimes) very hard to learn.
– Problem: lots of alternative topologies are possible. What's really cause and what's effect?
• Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data?
• Both these topics are current research areas.
Not in text
Neural Networks?
Artificial Neural Nets
a.k.a.
Connectionist Nets (connectionist learning)
a.k.a.
Sub-symbolic learning
a.k.a.
Perceptron learning (a special case)
Networks that model the brain?
• Note there is an interesting connection to Bayes nets; it isn't considered in the book.
– Something to reflect on.
• Idea: model intelligence without "jumping ahead" to symbolic representations.
• Related to the earliest work on cybernetics.
The idealized neuron
• Artificial neural networks come in several "flavors".
– Most are based on a simplified model of a neuron.
• A set of (many) inputs.
• One output.
• The output is a function of the sum of the inputs.
– Typical functions:
• Weighted sum
• Threshold
• Gaussian
Why neural nets?• Motives:
– We wish to create systems with abilities akin to those of the human mind.
• The mind is usually assumed to be a direct consequence of the structure of the brain.
– Let’s mimic the structure of the brain!
– By using simple computing elements, we obtain a system that might scale up easily to parallel hardware.
– Avoids (or solves?) the key unresolved problem of how to get from “signal domain” to symbolic representations.
– Fault tolerance
Not in text
Real and fake neurons
• Signals in neurons are coded by "spike rate".
• In ANNs, inputs can be either:
– 0 or 1 (binary)
– [0,1]
– [-1,1]
– R (real)
• Each input Ii has an associated real-valued weight wi.
• Learning takes place by changing weights at synapses.
Brains
• The brain seems divided into functional areas.
• These are often seen as analogous to modules in a software system.
• Why would it be like this? (3 possible answers)
– Evolution: incremental improvement is easier in a modular system.
– Advantage of combining complementary solutions.
– It isn't!
Inductive bias?
• Where's the inductive bias?
– In the topology and architecture of the network.
– In the learning rules.
– In the input and output representation.
– In the initial weights.
Not in text
Simple neural models
• The oldest ANN model is the McCulloch-Pitts neuron [1943].
– Inputs are +1 or -1, with real-valued weights.
– If the sum of the weighted inputs is > 0, then the neuron "fires" and gives +1 as an output.
– Showed you can compute logical functions.
– The relation to learning was proposed (later!) by Donald Hebb [1949].
• Perceptron model [Rosenblatt, 1958].
– Single-layer network with the same kind of neuron.
• Fires when the input is above a threshold: ∑xiwi > t.
– Added a learning rule to allow weight selection.
Not in text
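A McCulloch-Pitts unit is small enough to write out in full (a minimal sketch; the ±1 encoding follows the slide, the function name is mine):

```python
def mp_fire(inputs, weights):
    """McCulloch-Pitts unit: inputs are +1 or -1, weights are real.

    Fires (+1) when the weighted sum exceeds 0, otherwise outputs -1.
    """
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > 0 else -1
```

With weights [1, 1] this unit computes logical AND in the ±1 encoding: only the input (+1, +1) drives the sum above 0.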
Perceptron nets
Perceptron learning
• Perceptron learning:
– Have a set of training examples (TS) encoded as input values (i.e. in the form of binary vectors).
– Have a set of desired output values associated with these inputs.
• This is supervised learning.
– Problem: how to adjust the weights to make the actual outputs match the training examples.
• NOTE: we do not allow the topology to change! [You should be thinking of a question here.]
• Intuition: when a perceptron makes a mistake, its weights are wrong.
– Modify them to make the output bigger or smaller, as desired.
Learning algorithm
• Desired output Ti, actual output Oi.
• Weight update formula (weight from unit j to i):
Wj,i = Wj,i + k · xj · (Ti - Oi)
where k is the learning rate.
• If the examples can be learned (encoded), then the perceptron learning rule will find the weights.
– How? Gradient descent.
The key thing to prove is the absence of local minima.
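The update rule can be sketched in Python (a minimal illustration, not the lecture's code; the 0/1 output encoding, the bias weight folded in as a constant input of 1, and the fixed epoch count are my assumptions):

```python
def train_perceptron(examples, n_inputs, k=0.1, epochs=100):
    """Perceptron learning rule: W_j <- W_j + k * x_j * (T - O).

    examples: list of (input_vector, target) pairs, targets 0 or 1.
    A bias weight is learned via an extra constant input of 1.
    """
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        for x, t in examples:
            xs = list(x) + [1.0]                  # append bias input
            s = sum(wi * xi for wi, xi in zip(w, xs))
            o = 1 if s > 0 else 0                 # thresholded output O
            for j in range(len(w)):               # apply the update rule
                w[j] += k * xs[j] * (t - o)
    return w

def predict(w, x):
    xs = list(x) + [1.0]
    return 1 if sum(wi * xi for wi, xi in zip(w, xs)) > 0 else 0
```

On a linearly separable target such as AND, the weights stop changing after a few epochs, as the convergence result promises; on XOR they never settle, which previews the next slide.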
Perceptrons: what can they learn?
• Only linearly separable functions [Minsky & Papert 1969].
• In N dimensions, the decision boundary is an (N-1)-dimensional hyperplane.
More general networks
• Generalize in 3 ways:
– Allow continuous output values [0,1].
– Allow multiple layers.
• This is key to learning a larger class of functions.
– Allow a more complicated function than thresholded summation. [why??]
Generalize the learning rule to accommodate this: let's see how it works.
The threshold
• The key variant:
– Change the threshold into a differentiable function.
– Sigmoid, known as a "soft non-linearity" (silly).
M = ∑ xi wi
O = 1 / (1 + e^(-kM))
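The sigmoid unit above, written out directly (a minimal sketch; the function name and default k=1 are assumptions):

```python
import math

def sigmoid_unit(xs, ws, k=1.0):
    """Soft-threshold unit: M = sum of x_i * w_i, O = 1 / (1 + e^(-k*M)).

    Unlike the hard threshold, the output varies smoothly in (0, 1),
    so it is differentiable everywhere and usable for gradient descent.
    """
    m = sum(x * w for x, w in zip(xs, ws))
    return 1.0 / (1.0 + math.exp(-k * m))
```

At M = 0 the output is exactly 0.5; large positive or negative sums push it toward 1 or 0, recovering the hard threshold in the limit of large k.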