Learning from how dogs learn

Prof. Bruce Blumberg
The Media Lab, MIT
[email protected]
www.media.mit.edu/~bruce


About me…

Practical & compelling real-time learning

• Easy for interactive characters to learn what they ought to be able to learn

• Easy for a human trainer to guide the learning process

• A compelling user experience

• Provide heuristics and practical design principles

My bias & focus

• Learning occurs within an innate structure that biases…

  • Attention

  • Motivation

  • Innate frequency, form and organization of behavior

  • When certain things are most easily learned

• What are the catalytic components of the scaffolding that make learning possible?

sheep|dog: trial by eire

See sheep|dog video on my website

Object persistence

See object persistence video on my website

Temporal representation

See temporal representation (aka Goatzilla) video on my website

Alpha Wolf

See alpha wolf video on my website

Rover@home

See rover@home video on my website or go to Scientific American Frontiers website

Dobie T. Coyote Goes to School

See Dobie video on my website

Why look at Dog Training?

• Interactive characters pose unique challenges:

  • State, action and state-action spaces are often continuous and far too big to search exhaustively

  • To be compelling, characters must

    • Learn “obvious” contingencies between state, actions and consequences quickly

    • Be easy to train without visibility into the internal state of the character

  • Learning is only one thing they have to do.

• Dogs and their trainers seem to solve these problems easily

Invaluable resources

• Doing it, and talking to people who do it.

• Wilkes, Pryor, Ramirez

• Lindsay, Burch & Bailey, Mackintosh

• Lorenz, Leyhausen, Coppinger & Coppinger

The problem facing dogs (real and synthetic)

[Figure: set of all possible actions; set of all motivational goals; set of all possible stimuli]

What do I do, when, in order to best satisfy my motivational goals?

The space of possible stimuli is wicked big

[Figure: State Space. The set of all possible stimuli, with axes for modality of stimuli (smells, motion, sounds, dog sounds, speech, whistles) and time of occurrence]

The space of possible actions is also very big

[Figure: Action Space. The set of all possible actions, with axes for action (figure-8, shake, low shake, high-5, beg, down, left ear twitch) and time of performance]

Who gets credit for good things happening?

[Figure: a reward (“Yumm..”) set against the axes of action (figure-8, shake, low shake, high-5, beg, down, left ear twitch) and modality of stimuli (motion, sounds, dog sounds, speech, whistles)]

Who gets credit for good things happening?

[Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”]

Conventional idea: back propagation from goal

[Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”]

Credit flows backward
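As a rough illustration of "credit flows backward", the sketch below hands the full goal reward to the last element of the sequence and a discounted share to each earlier one. The slides do not give a formula; the discount value, the function name and the exact ordering of the sequence are assumptions made for this example.

```python
# Hypothetical sketch: discounted credit flowing backward from the goal.
# The sequence order and the discount value are assumptions for illustration.
SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]


def propagate_credit(sequence, goal_reward=1.0, discount=0.5):
    """Give the last element the full goal reward and each earlier
    element `discount` times the credit of the element after it."""
    credit = {}
    value = goal_reward
    for element in reversed(sequence):
        credit[element] = value
        value *= discount
    return credit


for name, value in propagate_credit(SEQUENCE).items():
    print(f"{name:10s} {value:.5f}")
```

With any discount below 1, the earliest elements receive very little credit, and none at all until the goal has been reached at least once.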

The problem

• If each element in sequence has 3 variants, there are 729 possible combinations of which 1 may work (ignoring stimuli)

• If there are 12 possible stimuli, there are 1,586,874,322,944 possible combinations of stimuli-action pairs to explore.

• Don’t know if it is the right sequence until goal is reached

• What happens if “variant” needs to be learned?
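The figure of 729 matches a six-element sequence with three variants per element (3^6). The slides do not say how the larger figure is counted, so the sketch below only checks one reading, which is an assumption: each of the six elements independently picks one of 3 variants and one of 12 stimuli, giving 36^6. The slide's total turns out to be 729 times that, that is, 108^6.

```python
# Arithmetic check under an assumed counting model (not stated on the slide).
ELEMENTS = 6    # orient, eye, stalk, chase, grab-bite, kill-bite
VARIANTS = 3    # variants per element
STIMULI = 12    # possible stimuli

action_only = VARIANTS ** ELEMENTS           # sequences, ignoring stimuli
paired = (VARIANTS * STIMULI) ** ELEMENTS    # one (variant, stimulus) pair per element
slide_total = 1_586_874_322_944              # figure quoted on the slide

print(action_only)                           # 729
print(paired)                                # 2176782336
print(slide_total == action_only * paired)   # True
print(slide_total == 108 ** ELEMENTS)        # True
```

Under either count the space is far too large to explore by trial and error when feedback only arrives at the goal.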

Leyhausen’s suggestion…

[Figure: predatory sequence over time (orient → eye → stalk → chase → grab-bite → kill-bite), with each element labeled “motivation & reward”]

Each element is innately self-motivating and has innate reward metric

Coppinger’s suggestion…

[Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite]

Varying innate tendency to follow behavior with “next” in sequence

Functional goal plays incidental role

[Figure: predatory sequence over time: orient → eye → stalk → chase → grab-bite → kill-bite → “Yumm..”]

Propagated value from functional goal plays incidental role
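One way to picture the last three slides together, as a toy sketch only: every element pays its own innate reward, an innate tendency decides whether the "next" element follows, and the functional goal adds just a small bonus at the end. All numbers, names and the stopping rule are invented for this example and are not taken from the talk.

```python
import random

SEQUENCE = ["orient", "eye", "stalk", "chase", "grab-bite", "kill-bite"]

# Hypothetical values: each element is rewarding in itself ...
INNATE_REWARD = {name: 1.0 for name in SEQUENCE}
# ... and has a varying innate tendency to be followed by the next element.
FOLLOW_TENDENCY = {"orient": 0.9, "eye": 0.9, "stalk": 0.8,
                   "chase": 0.6, "grab-bite": 0.3}
GOAL_REWARD = 1.0


def run_episode(rng, goal_weight=0.1):
    """Run the sequence once; return (elements performed, total reward)."""
    performed, total = [], 0.0
    for element in SEQUENCE:
        performed.append(element)
        total += INNATE_REWARD[element]
        # Stop if the innate tendency to continue does not fire
        # (the last element has no successor, so its tendency is 0).
        if rng.random() >= FOLLOW_TENDENCY.get(element, 0.0):
            break
    if performed[-1] == SEQUENCE[-1]:
        # The functional goal contributes only a small, incidental bonus.
        total += goal_weight * GOAL_REWARD
    return performed, total


print(run_episode(random.Random(0)))
```

In this toy model a partial sequence is still rewarding, so behavior is sustained even when the functional goal is never reached.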

Big idea: innate biases make learning possible

• Biases include…

  • Temporal proximity implies causality

  • Attend more readily to certain classes of stimuli than to others (motion vs. speech)

  • Lazy discovery (pay attention once you have a reason to pay attention)

  • Elements may be “innately” self-motivating and have local metric of “goodness”

Good trainers actively guide dog’s exploration

• Behavioral

  • Train behavior, then cue

  • Differential rewards encourage variability

• Motor

  • Shaping

    • Rewarding successive approximations

  • Luring

    • Pose, e.g. “down”

    • Trajectory, e.g. “figure-8”
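Shaping by "rewarding successive approximations" can be sketched as a reward criterion that tightens each time the learner meets it. This is a minimal illustration under assumed names and numbers (a one-dimensional "pose" value, a tolerance that shrinks by a fixed factor), not a description of the system in the talk.

```python
def shape(attempts, target, start_tolerance=0.6, tighten=0.5):
    """Reward attempts that fall within the current tolerance of `target`,
    then tighten the tolerance so the next reward needs a closer attempt.
    Returns a list of (attempt, rewarded) pairs."""
    tolerance = start_tolerance
    log = []
    for attempt in attempts:
        rewarded = abs(attempt - target) <= tolerance
        log.append((attempt, rewarded))
        if rewarded:
            tolerance *= tighten  # raise the bar after each success
    return log


# A hypothetical training session: 1.0 stands for the finished behavior.
for attempt, rewarded in shape([0.5, 0.9, 0.5, 0.95], target=1.0):
    print(attempt, "click" if rewarded else "no click")
```

The third attempt would have earned a click at the start of the session but no longer does, which is the point of shaping.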

Dogs constrain search for causal agents

[Figure: timeline of Sit → Approach → Eat, with an attention window and a consequences window marked]

• Attention Window: cue given immediately before or as dog is moving into desired pose

• Consequences Window: trainer “clicks”, signaling reward is coming; reward is actually received

Dogs make the problem tractable by constraining search for causal agents to narrow temporal windows
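The two windows can be sketched as simple time filters: only stimuli perceived shortly before the action are candidate cues, and only a click shortly after the action earns it credit. The window lengths, the `Event` type and the function names below are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float   # seconds since the start of the session
    label: str


def candidate_causes(stimuli, action_start, attention_window=1.0):
    """Stimuli perceived within `attention_window` seconds before the action."""
    earliest = action_start - attention_window
    return [s for s in stimuli if earliest <= s.time <= action_start]


def action_is_credited(action_end, click_time, consequences_window=1.0):
    """True if the click arrives within `consequences_window` seconds
    after the action ends."""
    return 0.0 <= click_time - action_end <= consequences_window


stimuli = [Event(0.5, "doorbell"), Event(4.6, "sit-utterance"), Event(9.0, "whistle")]
print([s.label for s in candidate_causes(stimuli, action_start=5.0)])
print(action_is_credited(action_end=6.0, click_time=6.5))
```

Everything outside the two windows is ignored, which is what keeps the number of candidate stimulus-action pairs small.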

Dogs use implicit feedback to guide perceptual learning

[Figure: timeline: “sit-utterance” perceived → dog decides to sit → Sit → “click” perceived → Approach → Eat; build & update perceptual model of “sit-utterance”]

Dogs use rewarded action to identify potentially promising state to explore and to guide formation of perceptual models

Dogs give credit where credit is due…

• Trainer repeatedly lures dog through a trajectory or into a pose

• Eventually, dog performs behavior spontaneously

• Implication

  • Dog associates reward with resulting body configuration or trajectory and not just with “follow-your-nose”

Observation: dogs give credit where credit is due

[Figure: timeline: “sit-utterance” perceived → dog decides to sit → Sit → “click” perceived → Approach → Eat]

1. Credit sitting in presence of “sit-utterance”

2. Build & update perceptual model of “sit-utterance”
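The two steps can be sketched as a single handler that runs when a click follows an action: it credits the action, and it treats the utterance heard just before the action as a good example for that action's perceptual model. The running-mean model, the feature vectors and all names here are assumptions for illustration; the slides do not specify how the models are built.

```python
class PerceptualModel:
    """Hypothetical model of an utterance: the running mean of the
    feature vectors of its examples."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, features):
        self.count += 1
        self.mean = [m + (x - m) / self.count
                     for m, x in zip(self.mean, features)]


def on_reward(action, utterance_features, credit, models):
    """Handle a click that follows `action`."""
    # 1. Credit the action performed in the presence of the utterance.
    credit[action] = credit.get(action, 0) + 1
    # 2. Use the utterance as a good example for the action's model,
    #    creating the model the first time it is needed.
    if action not in models:
        models[action] = PerceptualModel(len(utterance_features))
    models[action].update(utterance_features)


credit, models = {}, {}
on_reward("sit", [1.0, 3.0], credit, models)
on_reward("sit", [3.0, 5.0], credit, models)
print(credit["sit"], models["sit"].mean)
```

Utterances that are never followed by a rewarded action never reach a model, so the reward acts as the label for perceptual learning.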

D.L.: Take Advantage of Predictable Regularities

• Constrain search for causal agents by taking advantage of temporal proximity & natural hierarchy of state spaces

  • Use consequences to bias choice of action

  • But vary performance and attend to differences

• Explore state and action spaces on “as-needed” basis

  • Build models on demand
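Exploring "as needed" over a hierarchy of states might look like the sketch below: the learner starts by tracking only a coarse state and begins to attend to a state's children once that state has been present at a rewarded action. The hierarchy shown is a guess based on the stimulus modalities named earlier in the slides, and the function name is hypothetical.

```python
# Assumed hierarchy of stimulus classes (illustrative only).
HIERARCHY = {
    "any stimulus": ["smells", "motion", "sounds"],
    "sounds": ["dog sounds", "speech", "whistles"],
}


def expand_on_demand(tracked, rewarded_state):
    """Start attending to the children of `rewarded_state`, but only if the
    learner was already tracking that state. Modifies and returns `tracked`."""
    if rewarded_state in tracked:
        tracked.update(HIERARCHY.get(rewarded_state, []))
    return tracked


tracked = {"any stimulus"}
expand_on_demand(tracked, "sounds")        # ignored: not tracked yet
expand_on_demand(tracked, "any stimulus")  # now tracks smells, motion, sounds
expand_on_demand(tracked, "sounds")        # now tracks the kinds of sound too
print(sorted(tracked))
```

Branches that never coincide with reward are never expanded, so models are only built where there is a reason to pay attention.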

D.L.: Make Use of All Feedback: Explicit & Implicit

• Use rewarded action as context for identifying

  • Promising state space and action space to explore

  • Good examples from which to construct perceptual models, e.g.,

    • A good example of a “sit-utterance” is one that occurs within the context of a rewarded Sit.

D.L.: Make Them Easy to Train

• Respond quickly to “obvious” contingencies

• Support Luring and Shaping

  • Techniques to prompt infrequently expressed or novel motor actions

• “Trainer friendly” credit assignment

  • Assign credit to candidate that matches trainer’s expectation

The System

Dobie T. Coyote…

See dobie video on my website

Limitations and Future Work

• Important extensions

  • Other kinds of learning (e.g., social or spatial)

  • Generalization

  • Sequences

  • Expectation-based emotion system

• How will the system scale?

Useful Insights

• Use

  • Temporal proximity to limit search.

  • Hierarchical representations of state, action and state-action space & use implicit feedback to guide exploration

  • “Trainer friendly” credit assignment

• Luring and shaping are essential

Acknowledgements

• Members of the Synthetic Characters Group, past, present & future

• Gary Wilkes

• Funded by the Digital Life Consortium