Operant Conditioning: Reward and Punishment

Thorndike’s cats – trial and error

Cats escape from box to get a treat

At first it’s all trial and error

When successful the behaviour is rewarded

This good consequence strengthens the behaviour

Law of effect – good consequence more likely to be repeated, bad consequence not

Instrumental learning – the cat is active in achieving its own escape and reward

Operant Conditioning

A learning process by which the likelihood of a particular behaviour occurring is determined by the consequences of that behaviour

Theory of Operant Conditioning - Behaviour operates on the environment and our behaviour is instrumental in producing the consequences - Rewards/Punishments

US psychologist Burrhus Frederic Skinner (1904 – 1990) referred to the responses observed in trial and error learning as operants.

Operant Conditioning - Skinner

American psychologist B.F. Skinner (1904 – 1990) believed behaviour can be reduced to the relationships between the behaviour, its antecedents (the events that precede it), and its consequences.

Operant - a response (or set of responses) that occurs and acts (“operates”) on the environment to produce some kind of effect. It is a response or behaviour that generates consequences.

Operant Conditioning

Operant Conditioning is based on Thorndike’s law of effect: an organism will tend to repeat behaviours (operants) that have desirable consequences (e.g. receiving a treat) or that enable it to avoid undesirable consequences (e.g. being given a detention). Organisms will tend not to repeat a behaviour that has undesirable consequences (e.g. disapproval or a fine).

THREE-PHASE MODEL OF OPERANT CONDITIONING

3 components:

1. Stimulus (S) that precedes an operant response

2. Operant response (R) to the stimulus

3. Consequence (C) to the operant response

S → R → C

THREE-PHASE MODEL OF OPERANT CONDITIONING

Sometimes expressed as:

S → R → S, where the second S is a stimulus in the form of a consequence.

The model means the probability of an operant response (R) to a stimulus (S) is a function of (depends on) the consequence (C) that has followed (R) in the past.

e.g. Cat in puzzle box, the S is the box, R the sequence of movements needed to open the door and C is escape and food.
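The idea that the probability of R depends on the consequences that followed it in the past can be sketched in a few lines of Python. This is a toy illustration only, not a model taken from Skinner or the textbook: the function name `response_probability`, the starting value of 0.5 and the step size of 0.1 are all assumptions chosen for the example.

```python
def response_probability(past_consequences, start=0.5, step=0.1):
    """Toy estimate of how likely response R is to stimulus S.

    past_consequences: list of "reward" or "punish" strings, one for each
    earlier occasion on which R was made to S (a hypothetical encoding).
    start and step are arbitrary illustrative values, not measured ones.
    """
    p = start
    for consequence in past_consequences:
        if consequence == "reward":
            p += step * (1.0 - p)   # good consequence: R becomes more likely
        elif consequence == "punish":
            p -= step * p           # bad consequence: R becomes less likely
        else:
            raise ValueError(f"unknown consequence: {consequence!r}")
    return p


# Cat in the puzzle box: S = box, R = door-opening movements, C = escape and food
rewarded_five_times = ["reward"] * 5
print(round(response_probability(rewarded_five_times), 3))
```

The value always stays between 0 and 1, rising after each rewarded response and falling after each punished one, which is all the three-phase model claims.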

See further examples in Table 10.2 (page 479)

SKINNER’S EXPERIMENTS WITH RATS

Skinner used the term “operant conditioning” rather than “instrumental learning” as he wanted to emphasise that animals and people learn to operate on the environment to produce desired or satisfying consequences.

He proposed that in Thorndike’s experiments the cat “operated” on the environment to allow it to escape and get the fish reward.


The operant that became conditioned was the behaviour of pushing the lever to open the door.

Skinner also contrasted operants with respondents in classical conditioning. Respondents are behaviours produced by known or recognised stimuli, e.g. Pavlov’s dogs responded by salivating to meat powder and later the bell. Thorndike’s cats made many different responses that were not prompted by a particular stimulus. The dog receives a consequence (food) whether or not it has learned the conditioned response.

This is why Skinner referred to classical conditioning as “respondent conditioning”.


In operant conditioning the consequence only occurs if the organism performs the response.

SUMMING UP: In operant conditioning, if the response is not made, the consequence doesn’t happen. In classical conditioning, the consequence occurs regardless of responding.


Skinner believed that ALL behaviour could be explained by the relationships between the behaviour, its antecedents (events occurring before it) and its consequences.

Skinner argued that any behaviour that is followed by a consequence will change in strength (become more, or less, established) and frequency (occur more, or less often) depending on the nature of that consequence (reward or punishment).

THE SKINNER BOX

The Skinner Box is a small chamber in which an experimental animal learns to make a particular response for which the consequences can be controlled by the researcher. It contains a lever that delivers food (or water) into a dish when pressed. Some boxes also have lights and buzzers, and some have grid floors that can deliver a mild electric shock.

THE SKINNER BOX

Lever is usually wired to a cumulative recorder (chart paper with a pen that makes a special mark each time a desired response is made).

The recorder indicates the frequency of responses (how often) and the rate of response (speed).

Rats – press lever

Pigeons – peck disc


Skinner referred to different types of rewards as “reinforcers”.

He used the Skinner Box to reward the animals according to different types of programs or schedules of reinforcement.

The fact the rats were hungry provided the motivation for their frantic activity, increasing the probability the lever would eventually be pressed and the food reward dispensed.


Skinner believed there was no need to search for internal agents (factors within an organism) to explain changes in behaviour.

He based his view on the notion that behaviour can be understood in terms of environmental or external influences, without any consideration of internal mental processes.

Reinforcement

Any stimulus (event or action) that subsequently strengthens or increases the likelihood of the response (behaviour) that it follows.

The reinforcer comes after the response (behaviour)

Reinforcement makes things stronger

Reinforcement

Reinforcement can involve receiving a pleasant stimulus (e.g. Treat for your dog) or avoiding or escaping an unpleasant stimulus (e.g. Umbrella on a rainy day).

An essential feature of reinforcement is that it is only used after the desired or correct response is made.

Reinforcement

A reinforcer is any stimulus (object or event) that strengthens or increases the frequency or likelihood of a response that it follows.

The word reinforcer is often used interchangeably with the word reward (although they are not technically the same).

One difference is that a reward suggests an outcome that is positive, such as satisfaction or pleasure.

A stimulus is a reinforcer if it strengthens the preceding behaviour.

Also, a stimulus can be rewarding because it’s pleasurable, but is not a reinforcer unless it increases the frequency of a response or the likelihood of a response occurring.

e.g. Eating chocolate is pleasurable but is not a reinforcer unless it promotes or strengthens a particular response.

Positive Reinforcement

PLUS something GOOD

Positive reinforcer – a stimulus which strengthens a response by providing a pleasant or satisfying consequence

Examples: Skinner’s experiment = food pellets, money, grades, applause

Negative Reinforcement

MINUS something BAD

Negative reinforcer – a stimulus that strengthens a response by the reduction, removal or prevention of an unpleasant stimulus

The behaviour that removes, reduces or prevents an unpleasant stimulus is strengthened by the consequence

Examples: Skinner’s experiment = electric shock, taking Panadol for a headache, driving slowly to avoid a fine

Pos vs Neg Reinforcement

Positive reinforcement – add good

Negative reinforcement – take away bad

Both STRENGTHEN a response

Overall outcome is desirable to the organism, it has just been achieved in different ways

Punishment

Positive punishment – the delivery of a stimulus following an undesirable response (PLUS BAD)

Negative punishment – the removal of a stimulus following an undesired response (MINUS GOOD)

Punisher – an unpleasant stimulus that when paired with a response weakens the response or decreases the rate of responding over time

Punishers reduce unwanted behaviour

It is usually more effective to reinforce alternative desirable behaviour than it is to punish undesirable behaviour

Response cost – negative punishment

MINUS GOOD

Negative punishment is often referred to as response cost

It occurs when a valued stimulus is removed

E.g. If you drink drive we will take away your licence
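The PLUS/MINUS and GOOD/BAD labels used for reinforcement and punishment amount to a two-by-two lookup. A minimal Python sketch of that lookup follows. The function name `classify_consequence` and its two boolean parameters are hypothetical, introduced only for this illustration.

```python
def classify_consequence(stimulus_is_pleasant, stimulus_is_added):
    """Name the kind of consequence from the PLUS/MINUS, GOOD/BAD labels.

    stimulus_is_pleasant: True for a GOOD stimulus, False for a BAD one.
    stimulus_is_added: True for PLUS (delivered), False for MINUS (removed).
    Returns a (name, effect_on_response) tuple.
    """
    if stimulus_is_added and stimulus_is_pleasant:
        return ("positive reinforcement", "strengthens")
    if not stimulus_is_added and not stimulus_is_pleasant:
        return ("negative reinforcement", "strengthens")
    if stimulus_is_added and not stimulus_is_pleasant:
        return ("positive punishment", "weakens")
    # Remaining case: a pleasant stimulus is taken away
    return ("negative punishment (response cost)", "weakens")


# Examples: description -> (stimulus_is_pleasant, stimulus_is_added)
examples = {
    "food pellet for pressing the lever": (True, True),
    "Panadol removes a headache": (False, False),
    "licence taken away for drink driving": (True, False),
}
for description, (pleasant, added) in examples.items():
    name, effect = classify_consequence(pleasant, added)
    print(f"{description}: {name}, {effect} the response")
```

Both reinforcement cells strengthen the response and both punishment cells weaken it. The words "positive" and "negative" only say whether a stimulus is added or removed.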

Schedules of Reinforcement

The way reinforcement is delivered is referred to as the “schedule of reinforcement”.

It is a program for giving reinforcement, specifically the frequency and manner in which a desired response is reinforced.

The schedule influences the speed of learning and the strength of the learned response.

Schedules of Reinforcement

Continuous Reinforcement necessary for a response to become learned

Partial Reinforcement can be more effective at maintaining a response

Schedules of Reinforcement

Fixed Ratio – fixed number of correct responses – e.g. being paid $5 for every 100 newspapers delivered

Variable Ratio – variable number of correct responses – e.g. poker machines

Fixed Interval – fixed time period – e.g. teachers at Gleneagles get paid every fortnight

Variable Interval – variable time period – e.g. fishing
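The four partial schedules differ only in the rule that decides whether a given response earns a reinforcer. The Python sketch below expresses each rule as a small function. It is an illustration under stated assumptions, not a standard implementation: all function names are hypothetical, time is counted in arbitrary units, the variable ratio rule is approximated by giving every response the same chance of reinforcement, and the variable interval rule draws each waiting time uniformly between 0 and twice the mean.

```python
import random


def fixed_ratio(n):
    """Reinforce every n-th correct response."""
    count = 0

    def respond(now):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses, mean_n on average.

    Assumption: each response has a 1-in-mean_n chance of reinforcement.
    """
    def respond(now):
        return rng.random() < 1.0 / mean_n
    return respond


def fixed_interval(period):
    """Reinforce the first response made once `period` has elapsed."""
    last = 0.0

    def respond(now):
        nonlocal last
        if now - last >= period:
            last = now
            return True
        return False
    return respond


def variable_interval(mean_period, rng):
    """Reinforce the first response after an unpredictable waiting time."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_period)

    def respond(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_period)  # draw the next waiting time
            return True
        return False
    return respond


# Usage: the learner responds once per time unit for 100 time units
rng = random.Random(0)
schedules = {
    "fixed ratio (every 5th response)": fixed_ratio(5),
    "variable ratio (about every 5th)": variable_ratio(5, rng),
    "fixed interval (every 10 units)": fixed_interval(10),
    "variable interval (about 10 units)": variable_interval(10, rng),
}
for name, respond in schedules.items():
    reinforcers = sum(respond(t) for t in range(1, 101))
    print(f"{name}: {reinforcers} reinforcers")
```

Under the two ratio rules, responding more often earns more reinforcers. Under the two interval rules, extra responses inside the waiting period earn nothing.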

Variable Ratio

The variable ratio schedule is the most resistant to extinction

It leads to the fastest rate of responding

Gambling addiction is explicable through variable ratio reinforcement

Factors Affecting Reinforcement

Order of presentation – reinforcement needs to occur after the desired response, not before, so that the organism associates the reinforcement with the behaviour

Timing – Reinforcers need to occur as close in time to the desired response as possible. Most effective reinforcement occurs immediately after the desired response

Appropriateness of the reinforcer – For a stimulus to be a reinforcer it must provide a pleasing or satisfying consequence for its recipient.

REINFORCERS

Reinforcers that work in one situation will not always work in another.

The characteristics of the individual involved and the particular situation need to be taken into account when deciding on the best kind of reinforcer to be used.

An inappropriate punisher can have the opposite effect and produce the same consequence as a reinforcer (e.g. a verbal reprimand from a teacher to an attention-seeking, talkative Year 8 student can act as a reinforcer for the talkative behaviour).

PUNISHMENT

Punishment may temporarily decrease the occurrence of unwanted responses or behaviour, but it doesn’t promote more desirable or appropriate behaviour in its place.

So, instead Skinner advocated for the greater use of positive reinforcement to strengthen desirable behaviours or to promote the learning of alternative behaviours to punishable behaviours.

KEY PROCESSES IN OPERANT CONDITIONING

Same key processes as in classical conditioning:

ACQUISITION

EXTINCTION

SPONTANEOUS RECOVERY

STIMULUS GENERALISATION

STIMULUS DISCRIMINATION

ACQUISITION

Refers to the overall learning process during which a specific response, or pattern of responses is established.

THE MEANS by which this is acquired is different between operant and classical conditioning.

TYPES OF BEHAVIOURS acquired through operant conditioning are usually more complex than the reflexive involuntary responses in classical conditioning.

ACQUISITION

Acquisition in operant conditioning is the establishment of a response through reinforcement.

Speed of establishment of response depends on the schedule of reinforcement.

Sometimes, a behaviour to be acquired is too complex to be performed completely at the end of the acquisition process, so a simpler version of the behaviour or a step towards the target behaviour is attempted and reinforced continuously until it is established. This involves a procedure called shaping.

Extinction

Extinction – the gradual decrease in the strength or rate of responding after a period of non-reinforcement. Extinction occurs after the termination of reinforcement.

Extinction has occurred when a conditioned response is no longer present.

Depending on whether partial or continuous reinforcement has been used, the response rate may actually increase in the initial phase of extinction after reinforcement is stopped.

Extinction

There is often reluctance to stop the response altogether as it has had satisfying consequences.

Frustration and anger may also accompany the increased response rate.

Extinction is less likely to occur when partial reinforcement is used. Uncertainty leads to greater tendency for response to continue.
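Acquisition followed by extinction can be pictured with a toy numerical model in Python. This is a deliberately crude sketch: the names `update_strength` and `run_trials`, the gain and decay values, and the threshold for "no longer present" are all assumptions made for illustration. It does not reproduce the initial increase in responding, and it ignores the effect of partial reinforcement.

```python
def update_strength(strength, reinforced, gain=0.3, decay=0.2):
    """One trial of a toy response-strength model (values between 0 and 1).

    gain and decay are arbitrary illustrative rates, not measured values.
    """
    if reinforced:
        return strength + gain * (1.0 - strength)   # acquisition: strength grows
    return strength * (1.0 - decay)                 # no reinforcer: strength fades


def run_trials(n_reinforced, n_unreinforced, threshold=0.05):
    """Run reinforced trials, then unreinforced ones.

    Returns the strength after every trial and whether the response counts
    as extinguished (final strength below the assumed threshold).
    """
    strength = 0.0
    history = []
    for trial in range(n_reinforced + n_unreinforced):
        strength = update_strength(strength, reinforced=trial < n_reinforced)
        history.append(strength)
    extinguished = bool(history) and history[-1] < threshold
    return history, extinguished


history, extinguished = run_trials(n_reinforced=10, n_unreinforced=30)
print("strength after acquisition:", round(history[9], 3))
print("extinguished after 30 unreinforced trials:", extinguished)
```

The sketch only captures the gradual decrease: strength climbs while the response is reinforced and falls away once reinforcement stops.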

SPONTANEOUS RECOVERY

Spontaneous recovery – the response is (after a rest period) again shown in the absence of reinforcement.

Response is likely to be weaker and will probably not last very long.

A spontaneously recovered response is often stronger when it occurs after a lengthy period following extinction of the response than when it occurs relatively soon after extinction.

Stimulus Generalisation and Discrimination

Stimulus generalisation - occurs when the correct response is made to another stimulus which is similar to the stimulus for which reinforcement is obtained.

Response usually occurs at a reduced level (frequency and strength), e.g. pigeons pecked other coloured lights.

Stimulus discrimination - organism makes response to a stimulus for which reinforcement is obtained but not for any other similar stimulus (e.g. sniffer dogs used by drug detection units)

Shaping

Shaping – a strategy in which a reinforcer is given for any response that successively approximates and ultimately leads to the final desired response

Used to train behaviours that are unlikely to occur spontaneously

SHAPING

Also known as the method of successive approximations.

Used when the desired response has a low probability of occurring naturally.

Used in real life – dolphins at SeaWorld for entertainment purposes, search and rescue dogs tracking skills, guide dogs.

Learning to write

Children learning to swim

Monkeys trained to assist quadriplegics (Read Box 10.7 on page 499)
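The method of successive approximations can be sketched as a loop in Python: reinforce any response that meets the current criterion, then raise the criterion a step closer to the target. The sketch is a toy under stated assumptions. The function name `shape`, the numeric "response level", the scatter of plus or minus 2 around the learner's typical response, and the rule that a reinforced response becomes the new typical level are all invented for illustration.

```python
import random


def shape(target, rng, start_criterion=1.0, step=1.0, max_trials=10_000):
    """Toy shaping loop using successive approximations.

    Each trial the learner emits a response scattered around its current
    typical level. A response that meets the current criterion is reinforced,
    so it becomes the new typical level, and the criterion is then raised
    one step closer to the target.
    Returns the number of trials taken, or None if max_trials runs out.
    """
    typical = 0.0
    criterion = start_criterion
    for trial in range(1, max_trials + 1):
        response = typical + rng.uniform(-2.0, 2.0)
        if response >= criterion:
            typical = response              # reinforced response is repeated
            if criterion >= target:
                return trial                # final desired response reached
            criterion = min(target, criterion + step)
    return None


# With shaping: reinforce small steps towards a target level of 10
print("with shaping:", shape(10.0, random.Random(1)), "trials")

# Without shaping: only the final response is reinforced, and it never
# occurs spontaneously because the learner's responses stay near 0
print("without shaping:", shape(10.0, random.Random(1), start_criterion=10.0))
```

The second call returns None because the target response has no chance of occurring naturally, which is the situation shaping is used for.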

Token Economies

The consistent use of Operant conditioning to alter behaviour over time

Use of tokens as rewards that can be ‘cashed in’ for bigger rewards later

E.g. Schools, prisons

TOKEN ECONOMIES

Token Economies are a form of behaviour modification using reinforcement tokens to influence behaviour change.

E.g. Prisons – tokens cashed in for rewards such as cigarettes or privileges.

A token economy is a setting in which an individual receives tokens (reinforcers) for desired behaviour and these tokens can then be collected and exchanged for other reinforcers in the form of actual or “real” rewards.

E.g. Prisons, schools

TOKEN ECONOMIES

Tokens may be withdrawn as “penalties” for undesirable behaviour.

Advantage of tokens: can be used in large group situations where real rewards are difficult to administer immediately after a desired behaviour occurs.

Once desired behaviour is established, tokens can be phased out and replaced by more “natural” and easily administered reinforcers (e.g. Praise, smile).

E.g. Schools – to increase reading by students, improve social skills of students with intellectual disabilities.
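The bookkeeping behind a token economy (tokens earned, tokens withdrawn as penalties, tokens exchanged for real rewards) can be sketched as a small Python class. The class name `TokenEconomy`, its method names, the student name and the reward price are all hypothetical choices for this example, not part of any described program.

```python
class TokenEconomy:
    """Minimal token ledger: earn, lose and exchange tokens."""

    def __init__(self, prices):
        # prices maps each real reward to its cost in tokens (assumed values)
        self.prices = dict(prices)
        self.balances = {}

    def award(self, person, tokens=1):
        """Give tokens straight after a desired behaviour."""
        self.balances[person] = self.balances.get(person, 0) + tokens

    def penalise(self, person, tokens=1):
        """Withdraw tokens after undesirable behaviour (never below zero)."""
        current = self.balances.get(person, 0)
        self.balances[person] = max(0, current - tokens)

    def exchange(self, person, reward):
        """Cash in tokens for a real reward; returns True if affordable."""
        price = self.prices[reward]
        if self.balances.get(person, 0) < price:
            return False
        self.balances[person] -= price
        return True


# Usage: a school scheme to increase reading
economy = TokenEconomy({"extra free time": 5})
for _ in range(6):
    economy.award("Sam")                 # one token per book read
economy.penalise("Sam")                  # one token withdrawn as a penalty
print(economy.exchange("Sam", "extra free time"))
print(economy.balances["Sam"])
```

Keeping the token separate from the real reward is what lets the token be handed over immediately, even in a large group.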

BACKFIRING TOKEN ECONOMIES

Sometimes token economies backfire or fail.

WHY?

People may feel manipulated and refuse to co-operate.

Or

Situations are so complex and uncontrolled that well planned programs can go wrong (e.g. not smiling when delivering the reinforcer).

Operant conditioning procedures may also fail when the underlying cause of a behaviour is not altered, e.g. rewarding cheerfulness when the gloominess is caused by a boring job – the solution may lie in changing jobs.

Classical Vs Operant Conditioning

                                 CLASSICAL                   OPERANT

ROLE OF LEARNER                  Passive                     Active

TIMING OF STIMULUS & RESPONSE    Stimulus before response    Stimulus (reinforcement) after response

NATURE OF THE RESPONSE           Automatic, involuntary      Voluntary
