Learning from how dogs learn
Prof. Bruce Blumberg
The Media Lab, MIT
[email protected]
www.media.mit.edu/~bruce
Practical & compelling real-time learning
• Easy for interactive characters to learn what they ought to be able to learn
• Easy for a human trainer to guide learning process
• A compelling user experience
• Provide heuristics and practical design principles
My bias & focus
• Learning occurs within an innate structure that biases…
  • Attention
  • Motivation
  • Innate frequency, form and organization of behavior
  • When certain things are most easily learned
• What are the catalytic components of the scaffolding that make learning possible?
Temporal representation
See temporal representation (aka Goatzilla) video on my website
Rover@home
See rover@home video on my website or go to Scientific American Frontiers website
Why look at Dog Training?
• Interactive characters pose unique challenges:
  • State, action and state-action spaces are often continuous and far too big to search exhaustively
  • To be compelling, characters must
    • Learn “obvious” contingencies between state, actions and consequences quickly
    • Be easy to train without visibility into internal state of character
  • Learning is only one thing they have to do
• Dogs and their trainers seem to solve these problems easily
Invaluable resources
• Doing it, and talking to people who do it.
• Wilkes, Pryor, Ramirez
• Lindsay, Burch & Bailey, Mackintosh
• Lorenz, Leyhausen, Coppinger & Coppinger
The problem facing dogs (real and synthetic)
[Figure: three sets: all possible stimuli, all possible actions, all motivational goals]
What do I do, when, in order to best satisfy my motivational goals?
The space of possible stimuli is wicked big
[Figure: state space, the set of all possible stimuli. Axes: modality of stimuli (smells, motion, sounds, dog sounds, speech, whistles) vs. time of occurrence]
The space of possible actions is also very big
[Figure: action space, the set of all possible actions. Axes: action (figure-8, shake, low shake, high-5, beg, down, left ear twitch) vs. time of performance]
Who gets credit for good things happening?
[Figure: action axis (figure-8, shake, low shake, high-5, beg, down, left ear twitch) vs. modality of stimuli axis (motion, sounds, dog sounds, speech, whistles), with reward “Yumm..”]
Who gets credit for good things happening?
[Figure: behavior sequence over time: orient, eye, stalk, chase, grab-bite, kill-bite, then reward “Yumm..”]
Conventional idea: back propagation from goal
[Figure: behavior sequence over time: orient, eye, stalk, chase, grab-bite, kill-bite, then reward “Yumm..”; credit flows backward from the reward]
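The "credit flows backward" picture can be sketched as a toy calculation. This is only an illustration under stated assumptions: the geometric `discount` factor and the function name `propagate_credit` are hypothetical and do not come from the slides, which only say that credit propagates back from the goal.

```python
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

def propagate_credit(sequence, reward, discount=0.5):
    """Assign credit to each element, working backward from the goal.

    The discount factor is an assumption made for illustration only.
    """
    credit = {}
    value = reward
    for element in reversed(sequence):
        credit[element] = value   # elements closer to the reward get more credit
        value *= discount         # credit shrinks with each step back in time
    return credit

credit = propagate_credit(SEQUENCE, reward=1.0)
for element in SEQUENCE:
    print(f"{element:10s} {credit[element]:.5f}")
```

Note that nothing is credited until the reward at the end of the sequence has actually been obtained, so the early elements learn last and learn least.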
The problem
• If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)
• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.
• Don’t know if it is the right sequence until goal is reached
• What happens if “variant” needs to be learned?
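The arithmetic behind these counts can be checked directly. The sketch below assumes a six-element sequence (orient, eye, stalk, chase, grab-bite, kill-bite), which is what makes 3 variants per element give 729. The slides do not say how the larger figure was counted; it equals 108^6, whereas pairing each variant with one of 12 stimuli at every step would give 36^6. Either way, the space is far too large to search exhaustively.

```python
ELEMENTS = 6    # assumed length of the behavior sequence
VARIANTS = 3    # variants per element, from the slide
STIMULI = 12    # possible stimuli, from the slide

action_only = VARIANTS ** ELEMENTS            # sequences of variants, ignoring stimuli
paired = (VARIANTS * STIMULI) ** ELEMENTS     # one (stimulus, variant) pair per element
slide_figure = 1_586_874_322_944              # figure quoted on the slide

print(action_only)                            # 729
print(paired)                                 # 2176782336
print(slide_figure == 108 ** ELEMENTS)        # True
print(slide_figure == action_only * paired)   # True
```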
Leyhausen’s suggestion…
[Figure: behavior sequence over time: orient, eye, stalk, chase, grab-bite, kill-bite; each element carries its own “motivation & reward”]
Each element is innately self-motivating and has innate reward metric
Coppinger’s suggestion…
[Figure: behavior sequence over time: orient, eye, stalk, chase, grab-bite, kill-bite]
Varying innate tendency to follow behavior with “next” in sequence
Functional goal plays incidental role
[Figure: behavior sequence over time: orient, eye, stalk, chase, grab-bite, kill-bite, then reward “Yumm..”]
Propagated value from functional goal plays incidental role
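One way to picture the two suggestions together is a chain in which every element pays its own local reward when performed and has some innate tendency to be followed by the next element. The sketch below is a toy model, not the author's system: the tendency values, the uniform `local_reward`, and the name `run_episode` are all hypothetical.

```python
import random

SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

# Hypothetical innate tendency to follow each element with the next one.
FOLLOW_TENDENCY = {
    "orient": 0.9, "eye": 0.8, "stalk": 0.7, "chase": 0.6, "grab-bite": 0.5,
}

def run_episode(rng, local_reward=1.0):
    """Perform elements in order; each pays its own reward when performed."""
    performed = []
    total = 0.0
    for element in SEQUENCE:
        performed.append(element)
        total += local_reward    # rewarded locally, without waiting for the goal
        tendency = FOLLOW_TENDENCY.get(element, 0.0)   # last element has no successor
        if rng.random() >= tendency:
            break                # chain stops here; rewards already earned are kept
    return performed, total

performed, total = run_episode(random.Random(0))
print(performed, total)
```

Because reward is local, an episode that stops at "stalk" still reinforces orient, eye and stalk, which is the sense in which the functional goal plays only an incidental role.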
Big idea: innate biases make learning possible
• Biases include…
  • Temporal proximity implies causality
  • Attend more readily to certain classes of stimuli than to others (motion vs. speech)
  • Lazy discovery (pay attention once you have a reason to pay attention)
  • Elements may be “innately” self-motivating and have local metric of “goodness”
Good trainers actively guide dog’s exploration
• Behavioral
  • Train behavior, then cue
  • Differential rewards encourage variability
• Motor
  • Shaping
    • Rewarding successive approximations
  • Luring
    • Pose, e.g. “down”
    • Trajectory, e.g. “figure-8”
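Shaping by rewarding successive approximations can be sketched with a one-dimensional toy model. Everything here is an assumption made for illustration: the pose is reduced to a single number (0 = standing, 1 = fully down), the learner varies its performance uniformly around its current habit, and the trainer raises the criterion by a fixed `step` after each reward.

```python
import random

def shape(target=1.0, step=0.1, trials=500, seed=0):
    """Toy shaping loop: reward attempts that meet a gradually rising criterion."""
    rng = random.Random(seed)
    habit = 0.0         # learner's current typical performance
    criterion = step    # what the trainer currently requires for a reward
    for _ in range(trials):
        # Learner varies its performance around the current habit.
        attempt = min(target, max(0.0, habit + rng.uniform(-0.2, 0.2)))
        if attempt >= criterion:
            habit = attempt                           # repeat what was rewarded
            criterion = min(target, attempt + step)   # trainer asks for slightly more
    return habit, criterion

habit, criterion = shape()
print(habit, criterion)
```

The learner never has to stumble on the full behavior by chance; it only has to exceed the current criterion, which always sits within reach of its natural variability.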
Dogs constrain search for causal agents
[Figure: timeline: Sit, Approach, Eat]
Attention Window: Cue given immediately before or as dog is moving into desired pose
Consequences Window: Trainer “clicks” signaling reward is coming; when reward is actually received
Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
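The two windows can be sketched as filters over a timestamped event log. This is a minimal illustration under assumptions: the `Event` record, the window widths, and the function names are hypothetical, and for simplicity the attention window here ends at the onset of the action.

```python
from dataclasses import dataclass

@dataclass
class Event:
    time: float
    kind: str    # "stimulus", "action", or "click"
    name: str

def candidate_cues(events, action, attention_window=1.0):
    """Stimuli that occurred shortly before (or at) the onset of the action."""
    return [e.name for e in events
            if e.kind == "stimulus"
            and action.time - attention_window <= e.time <= action.time]

def is_rewarded(events, action, consequences_window=2.0):
    """True if a click arrives shortly after the action."""
    return any(e.kind == "click"
               and action.time < e.time <= action.time + consequences_window
               for e in events)

sit = Event(5.0, "action", "sit")
log = [
    Event(0.5, "stimulus", "doorbell"),
    Event(4.6, "stimulus", "sit-utterance"),
    sit,
    Event(5.8, "click", "click"),
]
print(candidate_cues(log, sit))   # ['sit-utterance']
print(is_rewarded(log, sit))      # True
```

The doorbell is never considered as a possible cause because it falls outside the attention window, which is how the windows keep the search small.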
Dogs use implicit feedback to guide perceptual learning
[Figure: timeline: Sit, Approach, Eat. Events: “sit-utterance” perceived; dog decides to sit; “click” perceived; build & update perceptual model of “sit-utterance”]
Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models
Dogs give credit where credit is due…
• Trainer repeatedly lures dog through a trajectory or into a pose
• Eventually, dog performs behavior spontaneously
• Implication
  • Dog associates reward with resulting body configuration or trajectory and not just with “follow-your-nose”
Observation: dogs give credit where credit is due
[Figure: timeline: Sit, Approach, Eat. Events: “sit-utterance” perceived; dog decides to sit; “click” perceived]
1. Credit sitting in presence of “sit-utterance”
2. Build & update perceptual model of “sit-utterance”
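The two steps can be sketched as a single update that runs only on rewarded trials. This is a hedged illustration, not the system described in the talk: the running-mean `PerceptualModel` stands in for whatever utterance classifier is actually used, and the feature vectors, the `credit` dictionary, and `observe_trial` are hypothetical.

```python
class PerceptualModel:
    """Running mean of feature vectors; a stand-in for a real utterance model."""

    def __init__(self, dims):
        self.mean = [0.0] * dims
        self.count = 0

    def update(self, features):
        self.count += 1
        self.mean = [m + (f - m) / self.count
                     for m, f in zip(self.mean, features)]

def observe_trial(model, credit, action, utterance_features, rewarded):
    """Use the rewarded action as context for both credit and perceptual learning."""
    if not rewarded:
        return   # unrewarded utterances are not treated as good examples
    credit[action] = credit.get(action, 0) + 1   # 1. credit the action given the utterance
    model.update(utterance_features)             # 2. update the model of the utterance

model = PerceptualModel(dims=2)
credit = {}
observe_trial(model, credit, "sit", [1.0, 3.0], rewarded=True)
observe_trial(model, credit, "sit", [9.0, 9.0], rewarded=False)
observe_trial(model, credit, "sit", [3.0, 5.0], rewarded=True)
print(credit, model.mean)
```

Only utterances heard in the context of a rewarded sit become training examples, so the reward does double duty as a label for perceptual learning.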
D.L.: Take Advantage of Predictable Regularities
• Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces
  • Use consequences to bias choice of action
  • But vary performance and attend to differences
• Explore state and action spaces on “as-needed” basis
  • Build models on demand
D.L.: Make Use of All Feedback: Explicit & Implicit
• Use rewarded action as context for identifying
  • Promising state space and action space to explore
  • Good examples from which to construct perceptual models, e.g.,
    • A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.
D.L.: Make Them Easy to Train
• Respond quickly to “obvious” contingencies
• Support Luring and Shaping
  • Techniques to prompt infrequently expressed or novel motor actions
• “Trainer friendly” credit assignment
  • Assign credit to candidate that matches trainer’s expectation
Limitations and Future Work
• Important extensions
  • Other kinds of learning (e.g., social or spatial)
  • Generalization
  • Sequences
  • Expectation-based emotion system
• How will the system scale?
Useful Insights
• Use
  • Temporal proximity to limit search.
  • Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration
  • “Trainer friendly” credit assignment
• Luring and shaping are essential
Acknowledgements
• Members of the Synthetic Characters Group, past, present & future
• Gary Wilkes
• Funded by the Digital Life Consortium