
Acquisition

Increased conditioned response to the CS as a result of being paired with the US.

CS→US or CS+

Eyeblink Conditioning Example

[Figure: % CRs (0-100) across blocks of 10 trials (1-9), showing acquisition of the conditioned eyeblink response.]

What Causes Acquisition?

Formation of a CS-US association.

CS → US

Extinction

Decreased conditioned response to a previously reinforced CS as a result of nonreinforced presentations (i.e., in the absence of the US).

CS→no US or CS−

[Figure: % CRs (0-100) across blocks of 10 trials (1-14) for CS+ and CS−, showing acquisition followed by extinction.]

What Causes Extinction?

• Unlearning – weakening of the CS-US association

• Inhibition – new learning that interferes with the expression of the CS-US association

Prediction – if extinction is due to inhibition of an intact CS-US association, then certain stimulus manipulations should recover the CR.

Spontaneous Recovery

Increased conditioned response to the CS if a delay is interpolated between extinction sessions (Pavlov, 1927).

[Figure: % CRs (0-100) across blocks of 10 trials (1-23) during acquisition followed by successive extinction sessions, illustrating spontaneous recovery at the start of each session.]

Disinhibition

• Following light→food pairings, Pavlov initiated extinction: during the extinction trials the dog stopped salivating to the light CS.

• Pavlov now presented a new, novel stimulus, e.g., a clicker during the light CS.

• The dog salivated, suggesting a release from inhibition, or Disinhibition.

Disinhibition

Light→Food, then Light→no US (Light−); presenting a novel Clicker with the Light restores salivation.

Conditioned Inhibition

• Pavlov discovered conditioned inhibition.

• A conditioned inhibitor is a stimulus that inhibits the conditioned response.

• Interspersed light→food trials with light+tone→no US trials.

• Abbreviated A+/AX-. A = light, X = tone, + and – represent food and no food, respectively.

Conditioned Inhibition Procedure

A→US   AX→no US   A→US   A→US   A→US   A→US   AX→no US   A→US   AX→no US
(A+ trials and AX− trials interspersed)

Conditioned Inhibition

[Figure: CR over trials for A and AX under A+/AX− training.]

Summation Test for CI

[Figure: CR over trials for A, AX, and BX following A+/AX−, B+ training; if X is a conditioned inhibitor, it reduces responding when compounded with the separately trained excitor B.]

Retardation Test for CI

[Figure: CR over acquisition trials for X+ and Y+ following A+/AX− training; conditioning to X (the putative inhibitor) is retarded relative to the control stimulus Y.]

Second-Order Conditioning

• Pavlov discovered the phenomenon of Second-Order Conditioning (SOC), which uses a procedure similar to that for conditioned inhibition.

• A+/AX− training.

• However, the number of AX− trials is critical:
  – Few AX− trials lead to SOC
  – Many AX− trials lead to conditioned inhibition

• SOC is also typically produced in two phases: A+ training followed by AX− training.

Second-Order Conditioning

Design of Conditioned Inhibition
Phase 1: A+/AX−   →   Test X: CI
Many AX− trials (tens to hundreds)

Design of Second-Order Conditioning
Phase 1: A+   Phase 2: AX−   →   Test X: CR
Few AX− trials (typically not more than 8-10)

Factors in Conditioning

Contiguity: The closer two stimuli are in space and time, the stronger the conditioned response.

Salience: More intense or noticeable stimuli condition more rapidly.

Contiguity

Delay conditioning is typically stronger than trace conditioning.

[Figure: mean CR for the Delay, Trace, and Control groups; the Delay group shows the strongest responding.]

Delay: CS→US (the US immediately follows the CS)
Trace: CS ------------> US (a gap separates CS offset and the US)
Control: CS ~US (CS and US unpaired)

Salience

More intense CS conditions faster than less intense CS.

[Figure: CR over trials for an 80 dB tone CS vs. a 60 dB tone CS; the more intense tone conditions faster.]

Simple Model of Associative Learning

∆VCS = αβ(λ − VCS(n))

Bush & Mosteller (1955)

∆VCS = change in associative strength of the CS
VCS = associative strength of the CS
λ = asymptote of learning

Learning-rate parameters:
α = CS salience (0-1; 0 = no CS)
β = US salience (0-1; 0 = no US)

VCS(n+1) = VCS(n) + ∆VCS

Bush & Mosteller

λ = 1.0, β = .5, α(60 dB) = .2, α(80 dB) = .4

Trial   ∆V (60 dB)                  V (60 dB)   ∆V (80 dB)                  V (80 dB)
1       .2 × .5(1 − 0)    = .10      .10         .4 × .5(1 − 0)    = .20      .20
2       .2 × .5(1 − .10)  = .090     .19         .4 × .5(1 − .20)  = .16      .36
3       .2 × .5(1 − .19)  = .081     .271        .4 × .5(1 − .36)  = .128     .488
4       .2 × .5(1 − .271) = .073     .344        .4 × .5(1 − .488) = .102     .590
5       .2 × .5(1 − .344) = .066     .410        .4 × .5(1 − .590) = .082     .672
6       .2 × .5(1 − .410) = .059     .469        .4 × .5(1 − .672) = .066     .738

∆VCS = αβ(λ − VCS)
VCS(n+1) = VCS(n) + ∆VCS
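To make the update rule concrete, here is a minimal Python sketch of the trial-by-trial computation in the worked example above; the function name, its structure, and the printed output are illustrative assumptions, not anything from the original lecture.

```python
# Minimal sketch of the Bush & Mosteller single-cue update rule described above.
# Parameter values (lam, beta, alpha) follow the worked example on this slide;
# the function name and return format are illustrative, not from the lecture.

def bush_mosteller(alpha, beta, lam, v0=0.0, n_trials=6):
    """Return (deltaV, V) for each trial of a single CS."""
    v = v0
    history = []
    for _ in range(n_trials):
        delta_v = alpha * beta * (lam - v)   # deltaV = alpha*beta*(lambda - V)
        v = v + delta_v                      # V(n+1) = V(n) + deltaV
        history.append((round(delta_v, 3), round(v, 3)))
    return history

# Acquisition (lambda = 1.0) for the 60 dB (alpha = .2) and 80 dB (alpha = .4) tones:
print(bush_mosteller(alpha=0.2, beta=0.5, lam=1.0))  # V: .100, .190, .271, .344, ...
print(bush_mosteller(alpha=0.4, beta=0.5, lam=1.0))  # V: .200, .360, .488, .590, ...

# Extinction is the same update with lambda = 0, starting from the acquired strength:
print(bush_mosteller(alpha=0.4, beta=0.5, lam=0.0, v0=1.0))
```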

Bush & Mosteller - Acquisition

[Figure: associative strength (0-1.0) over trials 1-31 for the 60 dB and 80 dB CSs; both curves approach the asymptote λ, the 80 dB curve more rapidly.]

Bush & Mosteller - Extinction

[Figure: associative strength (1.0 to 0) over trials 1-31 for the 60 dB and 80 dB CSs during extinction; the 80 dB curve declines more rapidly.]

∆VCS = αβ(λ − VCS), with λ = 0 during extinction
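As a usage note on the Python sketch above: calling the same (hypothetical) bush_mosteller update with lam = 0.0 and a nonzero starting strength reproduces these decaying extinction curves.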

Factors in Conditioning

Contiguity: The closer two stimuli are in space and time, the stronger the association between them can be.

Salience: More intense or noticeable stimuli condition more rapidly.

Contingency: The higher the correlation between two stimuli, the stronger the conditioned response.

Cue-Competition Effects

When multiple CSs are presented together during conditioning, responding to one (or more) CS can be reduced when that CS is presented alone on a probe test.

Overshadowing Effect

Overshadowing – Reduced CR to a CS if it is paired with the US in the presence of a more salient CS, i.e., AX→US, where A is more salient than X.

Design (lowercase cr = weak response, uppercase CR = strong response):

Group           Treatment   Test X
Overshadowing   AX+         cr
Acq. Control    X+          CR

Pavlov, 1927

Overshadowing (Blaisdell et al., 1998)

[Figure: test responding by group; the Overshadowing group (AX+ training) shows a weak cr to X, while the Control group (X+ training) shows a strong CR.]

Blocking Effect

Blocking – Reduced CR to a CS if it is paired with the US in the presence of a previously established CS, i.e., A→US followed by AX→US.

Design:

Group           Phase 1   Phase 2   Test X
Blocking        A+        AX+       cr
Acq. Control    B+        AX+       CR

Kamin, 1968

Kamin’s (1968) Interpretation of Blocking

Group   Phase 1   Phase 2
Block   A→US      AX→US
Acq     B→US      AX→US

US has to be “surprising” to the animal for learning of the blocked CS-US association to occur.

Because A predicts the US in the Blocking group, the US is not surprising during Phase 2 trials.

Rescorla and Wagner (1972) Model

Formalized the notion of “surprise” as a learning factor:

∆VCS = αβ(λ − VSUM), where VSUM is the summed associative strength of all CSs present on the trial.

Phase 2 computations (with αβ = 1 and λ = 1):

Blocking group: ∆VX = αβ(λ − VA+X) = 1 × (1 − [1 + 0]) = 0
Acq. group:     ∆VX = αβ(λ − VA+X) = 1 × (1 − [0 + 0]) = 1

Group   Phase 1   Phase 2   VA   VX
Block   A+        AX+       1    0
Acq     B+        AX+       1    1
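To show how the summed error term produces blocking, here is a minimal Python sketch of the design above; the function names, trial counts, and the learning rate αβ = 0.3 are illustrative assumptions rather than values from the lecture.

```python
# Minimal Rescorla-Wagner sketch of the blocking design (Phase 1: A+ or B+, Phase 2: AX+).
# All names and parameter values are illustrative assumptions.

def rw_update(V, cues_present, lam, alpha_beta=0.3):
    """One trial: every cue present shares the common error term (lambda - VSUM)."""
    v_sum = sum(V[c] for c in cues_present)      # VSUM over the cues on this trial
    error = lam - v_sum                          # the "surprise"
    for c in cues_present:
        V[c] += alpha_beta * error               # deltaV = alpha*beta*(lambda - VSUM)

def run(phase1_cues, phase2_cues, n1=20, n2=20):
    V = {"A": 0.0, "B": 0.0, "X": 0.0}
    for _ in range(n1):
        rw_update(V, phase1_cues, lam=1.0)
    for _ in range(n2):
        rw_update(V, phase2_cues, lam=1.0)
    return V

blocking = run(["A"], ["A", "X"])   # Phase 1: A+,  Phase 2: AX+
control  = run(["B"], ["A", "X"])   # Phase 1: B+,  Phase 2: AX+
print("Blocking group V_X:", round(blocking["X"], 2))  # stays near 0: A already predicts the US
print("Control group  V_X:", round(control["X"], 2))   # about 0.5: A and X split lambda in Phase 2
```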

Rescorla and Wagner (1972) model

Accounts for cue-competition effects:

Blocking (Kamin)

Overshadowing (Pavlov): AX+. With αA > αX, the A-US association develops faster than the X-US association.

Conditioned inhibition (Pavlov): A+/AX−. On AX− trials, ∆VX = αβ(λ − VA+X) = (0 − [1 + 0]) = −1, so X develops negative associative strength!
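As a companion to the blocking sketch, here is a minimal Python sketch showing how the same summed-error rule yields overshadowing (unequal saliences) and conditioned inhibition (negative VX); the salience values, β, and trial counts are illustrative assumptions.

```python
# Minimal Rescorla-Wagner sketch of overshadowing and conditioned inhibition.
# Salience values, beta, and trial counts are illustrative assumptions.

def rw_trial(V, cues, lam, alphas, beta=0.5):
    error = lam - sum(V[c] for c in cues)            # (lambda - VSUM), shared by all cues present
    for c in cues:
        V[c] += alphas[c] * beta * error             # deltaV_c = alpha_c * beta * (lambda - VSUM)

# Overshadowing: AX+ training with A more salient than X -> A takes most of the strength.
V = {"A": 0.0, "X": 0.0}
alphas = {"A": 0.4, "X": 0.1}
for _ in range(50):
    rw_trial(V, ["A", "X"], lam=1.0, alphas=alphas)
print("Overshadowing:", {k: round(v, 2) for k, v in V.items()})          # ~ A: 0.8, X: 0.2

# Conditioned inhibition: interspersed A+ and AX- trials -> V_X becomes negative.
V = {"A": 0.0, "X": 0.0}
alphas = {"A": 0.3, "X": 0.3}
for _ in range(100):
    rw_trial(V, ["A"], lam=1.0, alphas=alphas)           # A+ trial
    rw_trial(V, ["A", "X"], lam=0.0, alphas=alphas)      # AX- trial (no US, so lambda = 0)
print("Conditioned inhibition:", {k: round(v, 2) for k, v in V.items()})  # V_X drifts toward -1
```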

E.L. Thorndike

The Invention of S-R Psychology: Connectionism à la Thorndike

• Influenced by Darwin, Associationism, and Pavlov – but was interested in associations based on outcome or consequence.
• Animal Intelligence? Anecdote instead of careful observation and experiment. See Hans the Clever Horse.

The “Puzzle Box”

Learning: accident, insight or effect?

“When put into the box, the cat would show evident signs of discomfort and impulse to escape from confinement. It tries to squeeze through any opening; it claws and bites at the wire; it thrusts its paws out through any opening and claws at everything it reaches…. It does not pay very much attention to the food outside but seems simply to strive instinctively to escape from confinement…. The cat that is clawing all over the box in her impulsive struggle will probably claw the string or loop or button so as to open the door. And gradually all the other unsuccessful impulses will be stamped out and the particular impulse leading to the successful act will be stamped in by the resulting pleasure, until, after many trials, the cat will, when put in the box, immediately claw the button or loop in a definite way" (Thorndike, 1913:13).

Watching cats in William James's attic.

The Three “Laws”

Thorndike's theory consists of three primary laws:

(1) Law of Effect – responses to a situation which are followed by a rewarding state of affairs (satisfiers) will be strengthened and become habitual responses to that situation; the opposite holds for annoyers.

(2) Law of Readiness – a series of responses can be chained together to satisfy some goal, which will result in annoyance if blocked (expectancy).

(3) Law of Exercise – connections become strengthened with practice and weakened when practice is discontinued.

A corollary of the Law of Effect was that responses that reduce the likelihood of achieving a rewarding state (i.e., punishments, failures) will decrease in strength.

Thorndike's legacy to education

Frequency not sufficient – drill and rote memory fell out of favor after Thorndike's Law of Effect was popularized.

Punishment not as effective as reward – corporal punishment came to be seen as less justified.

Thorndike Time-Line

1874 The birth of Edward Lee Thorndike

1897 Applied for graduate program at Columbia University

1898 Awarded his doctorate

1899 Instructor in Psychology at Teachers College, Columbia

1905 Formalized the Law of Effect

1911 Published "Animal Intelligence"

1912 Elected President of American Psychological Association

1917 One of the first psychologists admitted to the National Academy of Sciences

1921 Ranked #1 in American Men of Science

1934 Elected President of the American Association for the Advancement of Science

1939 Retired

1949 Thorndike died

B. F. Skinner (1904-1990)

Behavior of Organisms (1938)

Generic nature of stimulus and response (1935)

The state of the world → stimulus field
Response → equivalence set

Skinner

■ Respondent behavior - elicited by a known stimulus

■ Operant behavior - emitted by the organism

■ Type S or respondent conditioning
■ Type R or operant conditioning

Skinner: The 3-term Contingency

{Stimulus} ----> RESPONSE ----> SR+
Environment      Emitted behavior      Consequence
e.g., foraging context      search of area      food

Skinner

■ Deprivation state
■ Magazine training
■ Shaping - reinforcement of successive approximations toward a goal response
  – differential reinforcement
  – successive approximation
■ Extinction - removal of the reinforcer maintaining a response such that the response is eliminated

Contingencies of Reinforcement (HEDONIC value × ACCESS: present vs. remove)

• Present a positive (appetitive) stimulus → reward learning (“praise”, “food”, “token economies”)
• Present a negative (aversive) stimulus → positive punishment (“physical pain”, “criticism”)
• Remove a positive stimulus → negative punishment (“time out”)
• Remove a negative stimulus → negative reinforcement

Underlying Response Dimensions

Probability of a response increases if it produces reinforcement.

[Figure: probability of response for example operants such as “corner search”, “biting bar”, and “lever pressing”.]

Skinner on Punishment

■ Negatives of punishment:
  – emotional byproducts
  – ineffective – not long lasting
  – states what the organism should not do, not what it should do
  – justifies inflicting pain – punishment is often administered for the gratification of the punisher
  – punishment elicits aggressive behavior

Skinner - Schedules of Reinforcement

■ There are a large number of reinforcement schedules (Ferster & Skinner, 1957)

■ Seven we will address:
  – continuous
  – fixed ratio (FR)
  – fixed interval (FI)
  – variable interval (VI)
  – variable ratio (VR)
  – concurrent
  – DRL and DRO

Skinner - Schedules of Reinforcement

■ Continuous - every response is reinforced.

■ Fixed ratio (FR) - reinforcement occurs after every nth response; the ratio can be one-to-one (equivalent to continuous reinforcement), five-to-one, etc.
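To make the delivery rules concrete, here is a minimal Python sketch of how FR and VI schedules decide which responses earn reinforcement; the steady response stream, the FR-5 and VI-30 s values, and the function names are illustrative assumptions (real cumulative records also reflect how the schedule shapes the animal's response rate).

```python
# Minimal sketch of fixed-ratio (FR) and variable-interval (VI) reinforcement rules.
# Parameter values and the assumption of a steady response stream are illustrative.
import random

def fixed_ratio(n_responses, ratio=5):
    """Reinforce every `ratio`-th response (FR-1 would be continuous reinforcement)."""
    return [(r + 1) % ratio == 0 for r in range(n_responses)]

def variable_interval(n_seconds, mean_interval=30.0):
    """Reinforce the first response after a randomly varying interval has elapsed,
    assuming one response per second."""
    times_reinforced = []
    next_available = random.expovariate(1.0 / mean_interval)
    for t in range(n_seconds):                   # one response at each second t
        if t >= next_available:
            times_reinforced.append(t)
            next_available = t + random.expovariate(1.0 / mean_interval)
    return times_reinforced

print(sum(fixed_ratio(100)), "reinforcers earned in 100 responses on FR-5")         # -> 20
print(len(variable_interval(600)), "reinforcers earned in 10 min on a VI-30 s schedule")
```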

Skinner - Schedules of Reinforcement

[Figure: cumulative record under a fixed-ratio schedule, showing the post-reinforcement pause and the resulting “step ladder” pattern.]

Skinner - Schedules of Reinforcement

[Figure: cumulative record under a variable-ratio schedule, showing high productivity (a high, steady response rate).]

Skinner - Schedules of Reinforcement

[Figure: cumulative record under a fixed-interval schedule, showing the characteristic scalloping effect.]

Skinner - Schedules of Reinforcement

[Figure: cumulative record under a variable-interval schedule.]

Herrnstein’s Matching Law

■ For a concurrent reinforcement schedule (a rat responds on two levers, each on a different reinforcement schedule), the relative frequency of behavior matches the relative frequency of reinforcement.

Matching Law

B1 / (B1 + B2) = R1 / (R1 + R2)

where B1 and B2 are the response rates on the two alternatives and R1 and R2 are the reinforcement rates they earn.
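Here is a minimal Python sketch of the matching prediction, using made-up reinforcement rates purely for illustration:

```python
# Minimal sketch of strict matching on a two-alternative concurrent schedule.
# The reinforcement rates below are made-up illustrative numbers.

def predicted_share_on_lever_1(r1, r2):
    """B1 / (B1 + B2) predicted by matching = R1 / (R1 + R2)."""
    return r1 / (r1 + r2)

# Example: lever 1 earns 40 reinforcers per hour, lever 2 earns 10.
share = predicted_share_on_lever_1(40, 10)
print(f"Matching predicts {share:.0%} of responses on lever 1")   # -> 80%
```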

Skinner

■ Teaching machines
■ Programmed learning
■ Contingency contracts
■ Skinner as social theorist

■ Skinner introducing Teaching Machines.

Skinner - Verbal Behavior (1957)

■ Language:
  – mand
  – tact
  – echoic
  – autoclitic

■ Critic: Noam Chomsky
■ Skinner framing the debate.

Anomalies, misbehavior and confusion about Operant conditioning and Pavlovian Conditioning.

Polydipsia (Adjunctive Behavior)

Superstitious Pigeons

Staddon & Simmelhag (interim and terminal behaviors)

Premack Principle

Optimality

In the Brain.

Premack Principle

■ A less frequently engaged-in activity can be reinforced by the opportunity to engage in a more frequently engaged-in activity (e.g., for a child who plays more than they read, the chance to play can reinforce reading).

Timberlake’s Disequilibrium hypothesis

■ Given a free-choice situation, an organism will distribute its time across different behaviors; that distribution sets an equilibrium. If access to a behavior falls below its equilibrium level, the animal will be motivated to increase that behavior to restore it. Behaviors pushed above their equilibrium create the reverse situation.
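As a hypothetical illustration (the numbers are assumptions, not from the lecture): if a rat with free access spends 80% of its time running in a wheel and 20% pressing a lever, a schedule that restricts wheel running below that baseline creates disequilibrium, so the opportunity to run will now reinforce lever pressing; forcing running above its baseline would produce the reverse effect.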

Misbehaviors

■ Breland and Breland – instinctual drift

■ Autoshaping - an animal will automatically condition itself instinctually if motivated. Pigeons peck at an illuminated disc presented prior to eating; they learn to do this automatically.
