Skinner's Operant Conditioning: When Rewards Backfire

Updated on June 20, 2026 | March 28, 2023

Skinner’s operant conditioning works in classrooms, but misapplied rewards backfire. Learn reinforcement schedules, misconceptions, and the science.


Main, P. (2023, March 28). Skinner's Theories. Retrieved from https://www.structural-learning.com/post/skinners-theories

Skinner's theory of operant conditioning (Skinner, 1953) explains how consequences shape behaviour. In operant conditioning, learners repeat actions that bring useful outcomes. They reduce actions that bring unwanted outcomes.

Operant conditioning is one of several fundamental theories of learning that inform modern classroom practice.

Infographic: "Skinner's Operant Conditioning: A Classroom Framework" (consequences, ethical reinforcement and self-regulation in UK classrooms).

Positive reinforcement adds something valued. Negative reinforcement removes something unpleasant. Punishment tries to reduce an action.

In schools, this explains why praise, routines, points and sanctions can change behaviour quickly. It also explains why they can go wrong. A sticker chart may bring short-term quiet, but it can teach learners to work only when a reward is visible (Deci, Koestner, and Ryan, 1999).

The classroom question is practical. Use Skinner to spot the antecedent, behaviour and consequence. Then teach the replacement behaviour and fade rewards over time.

The goal is not control for its own sake. The goal is calm routines, clear feedback and growing self-regulation.

Use a simple test. First, name the action, then name what happens next. Ask whether that consequence changes how likely the action is.

If it does, Skinner is useful. If it does not, look instead at task design, emotion, relationships or prior knowledge.

Key Takeaways

  • Consequences shape habits: learners are more likely to repeat actions that lead to useful outcomes.
  • Reinforcement is not bribery: use it to teach a replacement behaviour, then fade the reward.
  • Punishment has limits: it may stop an action briefly, but it does not teach what to do instead.
  • Ethics matter: rewards and sanctions should be fair, calm and linked to clear routines.
  • Self-regulation is the goal: Skinner helps teachers design routines that learners can later manage for themselves.

Skinner used operant conditioning to train pigeons, but classroom rewards need a narrower claim. Deci, Koestner, and Ryan (1999) reported a free-choice effect of d = -0.34 for tangible rewards overall, while expected and controlling rewards were more damaging than informational feedback. The classroom lesson is practical: do not pay learners with points for work they already value. Use rewards briefly, make the feedback specific, and fade the reward before interest becomes transactional.
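
For readers unfamiliar with the d statistic, a standardized mean difference is conventionally defined as below. This is the textbook form, offered only as an orienting sketch: the subscripts are illustrative labels, and the meta-analysis applied its own corrections and weighting, which are not reproduced here.

```latex
d = \frac{\bar{x}_{\text{reward}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

Read this way, d = -0.34 means that rewarded learners spent roughly a third of a standard deviation less free-choice time on the task than comparable unrewarded learners.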


Paul Main reviewed this article. He is the Founder and Educational Consultant at Structural Learning.

Operant Conditioning Definition

Classical conditioning and operant conditioning are often confused, but they work in different ways. Classical conditioning, linked with Pavlov (1927), produces automatic, reflexive responses. The familiar example is the bell paired with food in Pavlov's dogs. In this kind of learning, learners have little choice in how they react.

Operant conditioning is built on voluntary action. The learner does something, such as raising a hand, completing homework, or speaking up in class. What happens next shapes whether they are likely to do it again. The consequence follows the behaviour, and that timing creates the learning.

The ABC model makes operant conditioning easier to use in classrooms. A stands for Antecedent: the trigger, instruction, or context that prompts behaviour. For example, a teacher sets a maths problem on the board.

B is the Behaviour: the learner raises their hand to ask for help, works quietly, or doodles in the margin. C is the Consequence: praise, a mark, peer attention, or the internal satisfaction of solving it. The consequence is what increases or decreases the chance that the behaviour will happen again. Skinner (1953) developed this principle through animal studies and later applied it to human learning contexts, including education and clinical psychology.

In a Year 4 maths lesson, the teacher displays a challenging problem (antecedent). Learner A raises their hand to ask a clarifying question (behaviour). The teacher responds warmly with a hint (consequence: positive attention), so Learner A is more likely to ask questions in future lessons.


Learner B also raises their hand, but the teacher is busy and misses it. Learner B may stop volunteering because there was no useful consequence.

Learner C gives an incorrect answer aloud and hears giggles from peers. Learner C may avoid speaking in class because the consequence was social risk. All three outcomes depend on what happened after the behaviour, not on the learner's nature or fixed ability.
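
The ABC reading of these three learners can be written down as a small record. The sketch below is illustrative only: the `ABCRecord` class, its field names and the `likely_effect` labels are assumptions made for this example, not part of Skinner's work or of any school system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ABCRecord:
    """One observation in Antecedent-Behaviour-Consequence form."""
    learner: str
    antecedent: str
    behaviour: str
    consequence: str
    likely_effect: str  # hypothetical label: "more likely" or "less likely"


# The three learners from the Year 4 maths example above.
records = [
    ABCRecord("A", "challenging problem displayed", "raises hand to ask a question",
              "teacher responds warmly with a hint", "more likely"),
    ABCRecord("B", "challenging problem displayed", "raises hand",
              "teacher is busy and misses it", "less likely"),
    ABCRecord("C", "challenging problem displayed", "gives an incorrect answer aloud",
              "peers giggle", "less likely"),
]


def behaviours_at_risk(observations):
    """Return the learners whose behaviour the consequence may be weakening."""
    return [r.learner for r in observations if r.likely_effect == "less likely"]


print(behaviours_at_risk(records))  # ['B', 'C']
```

Keeping the antecedent identical across the three records makes the point of the example visible: only the consequence differs.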


Positive Reinforcement in Classrooms

Positive reinforcement means adding a valued consequence after a behaviour so the behaviour is more likely to happen again. In class, that might be quiet praise after a learner uses a sentence stem, a house point for arriving with the right equipment, or extra choice after a group tidies well. It works best when the learner understands exactly which behaviour earned the response.

In a Year 6 English lesson, a learner who usually struggles with writing finishes a paragraph ahead of schedule and reads it aloud. The teacher says, "That opening sentence is vivid; I can picture the scene. Well done," and writes a note in the learner's book.

The praise works because it is immediate, specific, and valued by the learner. In Skinner's terms, it reinforces the operant (Skinner, 1957). Not every learner wants public recognition, so the teacher still needs to choose the right form of feedback. More broadly, Korpershoek et al. (2016) found small but significant average effects of classroom-management strategies on student outcomes.

A common pitfall is praise that is too vague ("Good job!"), too delayed (weeks later at parents' evening), or given for effort on a task the learner already finds intrinsically rewarding (see the Common Misconceptions section below). The goal is to use positive reinforcement strategically: reinforce the behaviour you most want to see, and do it often enough that learners build new habits before the reinforcement is gradually withdrawn.

Negative Reinforcement in Practice

Negative reinforcement means removing something unpleasant after a behaviour occurs, increasing the likelihood that the behaviour will happen again. Despite its name, negative reinforcement is not punishment. It strengthens behaviour. The word "negative" means the removal, or subtraction, of an aversive stimulus, not the quality of the outcome.

In a Year 3 classroom, the teacher starts a timer and asks for silent working until the noise level drops below a set threshold. Once it does, the timer stops and the demand for silence is lifted. If the teacher also adds extra playtime, that part is positive reinforcement: the unpleasant demand is removed, and a valued activity is added.

The learner who settles first experiences relief because the demand for silence is lifted. Next time, they are more likely to settle quickly because they have learned that doing so ends the aversive state. Schools use the same pattern when they say, "Once you have completed your spelling words, you can leave the table."

The Department for Education's Behaviour in Schools guidance (February 2024) says schools should teach expected behaviour through clear routines, use positive reinforcement and sanctions, and make reasonable proactive adjustments for learners with additional needs. For example, a teacher may stop insisting on eye contact from neurodivergent learners. They may also avoid requiring perfect silence during independent work.

Negative reinforcement can create compliance. But it can also teach learners to tolerate or ignore the aversive condition. As a result, they may not truly internalise the value of the behaviour. Teachers sometimes overuse it, which can create classrooms where learners behave only to escape demands, not because they understand or agree with expectations.

Positive Punishment in Schools

Positive punishment means adding something unpleasant after a behaviour, so the behaviour is less likely to happen again. In school, this often means a verbal reprimand or a detention.

Imagine a Year 5 learner interrupts a lesson. The teacher gives a calm, firm reprimand, which usually stops the interruption in the moment. However, Skinner (1953) argued that punishment may suppress behaviour temporarily and can produce unwanted emotional, escape and avoidance effects; it does not by itself teach alternative behaviour.

Punishment does not teach the replacement behaviour. It usually suppresses the unwanted behaviour for a short time. This is where strict zero-tolerance systems and isolation booths can drift away from Skinner's own logic: if a learner is removed, shamed or isolated without being taught what to do instead, the school has only changed the consequence, not the repertoire. Sidman (1989) warned that coercive control often produces avoidance, anxiety and counter-control, so repeated removals can train learners to escape school rather than rejoin learning.

A less harsh option is removing a privilege, which is strictly negative punishment: a learner who has been off-task loses the choice of activity for the afternoon and is assigned a specific task instead. The consequence (reduced autonomy) is delivered calmly and without shame. The learner's behaviour may improve in the short term, but the long-term effect depends on whether they also learn what behaviour is expected and why it matters. Without that teaching, punishment alone tends to breed resentment, not understanding.

Negative Punishment and Lost Privileges

Negative punishment means removing something desirable after a behaviour occurs, decreasing the likelihood that the behaviour will happen again. Teachers may also hear this called response cost or withdrawal of privileges. It is the quadrant that people most often misunderstand.

In a Year 4 classroom, learners earn minutes in a behaviour tally to spend on Friday afternoon activities. When a learner makes unkind comments during group work, the teacher quietly says, "That is unkind. You have lost two minutes," and updates the tally.

The learner has lost access to something they value. The behaviour is less likely to happen again if the loss feels significant and is applied consistently. The EEF (2019) behaviour guidance supports consistent policies, but it also stresses knowing learners, teaching learning behaviours and giving targeted support. Digital token economies, including ClassCharts-style points systems, can create high compliance but low independence if leaders treat points as the behaviour policy itself.

When rewards are removed, behaviour often returns to baseline. This happens because learners have learned to comply when incentives are visible, not to self-regulate.


A related concern is shame. Some learners feel deeply shamed when they lose a privilege, especially if adults announce it in public. SEND learners may also find behaviour systems unpredictable or sensory-threatening, particularly when points are removed without warning. The Timpson Review (2019) helps here: behaviour may signal an unmet need, but schools still need clear expectations.

Negative punishment works best when adults explain the cause and effect. They should also teach the replacement behaviour and make reasonable adjustments when disability, trauma or communication needs are involved.
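
The four quadrants covered in the sections above reduce to two questions: is a stimulus added or removed, and does the behaviour become more or less likely? A minimal sketch of that lookup follows; the function name and the string labels are assumptions chosen for illustration.

```python
def classify_consequence(stimulus_change: str, behaviour_change: str) -> str:
    """Name the operant quadrant for a consequence.

    stimulus_change: "added" or "removed" (what happens to the stimulus)
    behaviour_change: "increases" or "decreases" (observed effect on behaviour)
    """
    quadrants = {
        ("added", "increases"): "positive reinforcement",
        ("removed", "increases"): "negative reinforcement",
        ("added", "decreases"): "positive punishment",
        ("removed", "decreases"): "negative punishment",
    }
    try:
        return quadrants[(stimulus_change, behaviour_change)]
    except KeyError:
        raise ValueError(
            "expected 'added'/'removed' and 'increases'/'decreases'"
        ) from None


# Losing two minutes from the Friday tally, followed by fewer unkind comments:
print(classify_consequence("removed", "decreases"))  # negative punishment
```

The second argument is the observed effect, not the adult's intention: a reprimand that is followed by more shouting out is functioning as reinforcement, whatever it was meant to be.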

Why Reinforcement Schedules Matter

Schedules of reinforcement are rules for when teachers give feedback or rewards. Continuous reinforcement, such as praising every correct use of a new routine, helps learners acquire a behaviour. Intermittent reinforcement helps maintain it.

A fixed-ratio schedule rewards after a set number of responses. A variable-ratio schedule rewards after an unpredictable number of responses. The variable-ratio schedule often makes behaviour more resistant to extinction (Skinner, 1953).
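
To make the two ratio schedules concrete, the sketch below marks which responses earn a reward under each rule. It is a toy model, not a description of any classroom system or app: the function names, the ratio of 4 and the random seed are all assumptions. The variable-ratio rule here is strictly a random-ratio variant that rewards each response with probability 1/ratio, so rewards average one per `ratio` responses.

```python
import random


def fixed_ratio(n_responses: int, ratio: int) -> list[bool]:
    """Reward exactly every `ratio`-th response."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses: int, ratio: int, rng: random.Random) -> list[bool]:
    """Reward each response with probability 1/ratio (unpredictable gaps)."""
    return [rng.random() < 1 / ratio for _ in range(n_responses)]


rng = random.Random(0)  # fixed seed so the run is repeatable
fr = fixed_ratio(12, ratio=4)
vr = variable_ratio(12, ratio=4, rng=rng)

print("FR4:", "".join("R" if r else "." for r in fr))  # FR4: ...R...R...R
print("VR4:", "".join("R" if r else "." for r in vr))
```

Under the fixed rule a learner can predict exactly when the next reward is due; under the variable rule they cannot, which is the property usually linked to resistance to extinction.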

This is why gamified AI tutors and adaptive apps need scrutiny. Streaks, surprise badges and unpredictable prompts can train attention to the platform rather than strengthen metacognitive learning. Recent reviews of intelligent tutoring systems echo this risk (Batsaikhan & Correia, 2024).

When reinforcement stops suddenly, behaviour may briefly get louder, happen more often or become more intense. This is an extinction burst, not automatic failure. It can also happen across a whole system.

A trust that scales a digital points economy across several schools should plan how rewards will fade when learners move phase, teacher or setting. If the next classroom does not use the same reward economy, learners may test the boundary before the new routine settles. Plan for that stage, stay consistent and reinforce the replacement behaviour, such as waiting to be invited in.

Use the Premack Principle when a preferred activity can support a less preferred one: "finish the retrieval questions, then choose the practical equipment". This works best when the preferred activity is already meaningful to the learner. It should not become a bribe for basic compliance, and it should be faded once the routine becomes secure.

Ethics of Reinforcement in Schools

School rewards must be fair, proportionate and respectful. The EEF behaviour guidance recommends knowing learners, teaching learning behaviours and using targeted support. It warns against relying only on rewards and sanctions (Education Endowment Foundation, 2019). The Initial Teacher Training and Early Career Framework (ITTECF) also places behaviour within early teacher development: teachers should teach and maintain clear routines, apply rewards and sanctions consistently, use positive reinforcement, and adapt approaches for SEND (Department for Education, 2024).

For trauma-informed practice, the question is not "what consequence will make this stop?" but "what does this behaviour protect, avoid or communicate?" Reward the behaviour you want to see, teach self-regulation explicitly, and build predictable safe spaces before a learner reaches crisis. A calm exit card, a co-regulation script or a planned sensory break is still behaviour teaching; it simply starts with regulation rather than compliance.

Build the behaviour plan around prevention, teaching and fading. Involve learners when rules and consequences are set. Review whether the plan is still fair, especially for SEND learners and those with adverse experiences. Adults should model the behaviours they expect: calm language, patience, repair after conflict and clear feedback about the next successful step.

Where Skinner's Model Falls Short

Skinner's model has limits because it focuses on behaviour we can see. It gives less weight to emotion, motivation and the wider social context of learning. Bandura (1977) showed why this matters: learners can pick up behaviours by watching models, expecting consequences and judging their own capability, not only through direct reinforcement. A learner may copy the calm peer who gets listened to, or avoid the teacher who humiliates mistakes, before any reward or sanction is applied to them personally.

Operant conditioning is also part of a live debate about rewards. Deci and Ryan (1985) argued that controlling rewards can harm autonomy and intrinsic motivation. Cameron and Pierce (1994) argued that some rewards, such as verbal praise and feedback linked to performance, do not always harm motivation. Modern neuroscience adds a further point: reward prediction error, rather than the reward alone, seems central to learning because the brain updates when outcomes differ from expectation (Schultz, 2016).
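
The reward prediction error idea can be sketched with a simple delta-rule update. This is a standard textbook simplification, not the model set out in Schultz (2016); the names `expected` and `learning_rate` and the reward values are assumptions for illustration.

```python
def update_expectation(expected: float, reward: float, learning_rate: float = 0.5):
    """Return (prediction_error, new_expectation) after one outcome."""
    prediction_error = reward - expected  # surprise: outcome minus expectation
    new_expected = expected + learning_rate * prediction_error
    return prediction_error, new_expected


expected = 0.0
for trial in range(1, 5):
    error, expected = update_expectation(expected, reward=1.0)
    print(f"trial {trial}: error={error:.3f}, expectation={expected:.3f}")
# The error shrinks each trial (1.000, 0.500, 0.250, 0.125):
# a fully predicted reward carries little new information.
```

On this reading, a reward that arrives exactly as expected produces little updating, whereas an expected reward that fails to arrive produces a negative error.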

For teachers, the key question is simple. Does the consequence give useful information, or does it only buy compliance?

Use operant conditioning alongside cognitive and relational strategies. Help learners name the goal, practise the replacement behaviour, notice progress and explain what helped. Keep large rewards rare. Use small, specific feedback while a routine is being learned, then fade it so the learner can act from mastery, belonging and purpose rather than from the promise of a prize.


Common Misconceptions About Operant Conditioning

Operant conditioning is a powerful and popular tool. However, three common myths can stop it from working well in schools. Below are the traps that catch out even the most experienced teachers.

The Overjustification Effect

Rewarding learners for a task they already enjoy can undermine their natural motivation. Lepper, Greene and Nisbett (1973) offered preschool children an expected Good Player Award for drawing with Magic Markers, an activity they had already shown interest in. In a later free-choice observation, children who had expected the award spent less time drawing than children in the unexpected-award or no-award groups. The children appeared to reinterpret the activity as something they did for a prize, not for interest.

When the reward disappeared, the behaviour became weaker too. This matters in the classroom. If a learner already enjoys reading, a prize scheme may pull attention away from the story and towards the certificate. The risk is not every reward, but an expected, visible reward linked to an activity that already has value.

Teachers must protect natural drives such as curiosity, mastery, autonomy and relatedness (Deci & Ryan, 1985). Save tangible rewards for routines learners do not yet value, such as tidying equipment or starting a disliked task. As the routine becomes easier, replace the token with informational feedback: "You started without a prompt today, so you have more working time." Then fade even that prompt.

Extinction Bursts: Don't Quit Too Early

Teachers sometimes use planned ignoring: they stop giving attention to a challenging behaviour that attention has previously reinforced, in the hope that it will fade. This can be a legitimate operant strategy, but it often triggers a predictable response.

For three to five days, the unwanted behaviour may intensify because the learner tries harder to get the attention they used to receive. This is called an extinction burst, and it catches many teachers by surprise. The learner who used to shout out for attention now shouts even louder.

The teacher may think the strategy is not working and abandon it when consistency is most needed. If the teacher gives attention at that point, even through scolding, they may reinforce the intensified behaviour. Training in operant techniques needs to prepare teachers for extinction bursts; without that preparation, well-intentioned behaviour plans often fail.

Confusing Compliance with Self-Regulation

A token economy or behaviour chart can produce clear short-term improvements in classroom behaviour. Learners work for tokens, lose them for misbehaviour, and compliance rises visibly. But compliance and self-regulation are not the same thing.

Compliance means doing what you are told because of outside consequences. Self-regulation means managing your own behaviour because you understand its value and have internalised it. Research cited in the Education Endowment Foundation (2019) behaviour guidance suggests that operant strategies can reduce challenging behaviour in the moment. However, they do not always build self-regulation once the external system is removed.

A learner who behaves well on a reward chart may struggle when the chart ends or in settings without it. This is a leadership issue, not only a classroom issue. If a multi-academy trust standardises a points system, leaders should also standardise the fade: what gets taught explicitly, when rewards become less frequent, and how learners practise the same behaviour without the points screen. Token systems can build initial momentum, but they should be scaffolds that are removed, not permanent features.

Frequently Asked Questions About Operant Conditioning

Operant conditioning in simple terms

Skinner (1938) showed that consequences shape learning. Learners repeat behaviours with positive outcomes. Negative outcomes mean learners do the behaviour less. Teachers use this in reward systems, praise, and policies.

Positive and negative reinforcement difference

Positive reinforcement gives learners a reward, such as praise for good work. Negative reinforcement removes an unwanted pressure when learners reach a goal.

For example, you can remove an extra practice task after a learner has shown mastery. Both forms increase a behaviour. Punishment is different because it aims to reduce a behaviour (Skinner, 1953).

Skinner's theory in schools

Skinner's (1953) work informs behaviour strategies such as reward charts. Praise every instance while a behaviour is new, then praise less predictably to maintain it. Intermittent reinforcement of this kind tends to make behaviour more resistant to extinction.

Effective teaching means scaffolding through shaping. Break tasks down and reward each small step (Skinner, 1953).

Chomsky's critique of Skinner

Chomsky (1959) reviewed Skinner's Verbal Behavior (1957). He argued that rewards alone cannot explain how we learn language. Learners create new sentences, which shows that language skills go beyond simple responses.

His review helped fuel the shift from behaviourism towards cognitive psychology. It also argued that operant conditioning is not a complete account of learning (Chomsky, 1959).

Limitations of classroom rewards

Deci and Ryan (1985) argued that extrinsic rewards may reduce learner motivation. Learners may lose interest in tasks they enjoy if a reward is offered and then withdrawn. Good teachers mix rewards with things that build internal drive. Teachers should consider autonomy, purpose, and awareness of progress.

Research Evidence Check


Evidence Synthesis

Five Consensus records give moderate support for using operant conditioning in classroom routines. They also support its use in reinforcement and behaviour support. Use it as a starting point for professional discussion: identify the learner's current need, record evidence from more than one lesson, and agree the next classroom adjustment with the SENCO or family.

Promising support: The evidence is strongest when teachers define the behaviour clearly. They should reinforce a useful replacement quickly and avoid treating rewards as a complete behaviour policy. Skinner gives a diagnostic lens, not a full account of motivation.

60% Yes from 5 studies (moderate evidence)
  • Yes: 60%
  • Possibly: 40%
  • Mixed: 0%
  • No: 0%

Teacher takeaway

Use Skinner's ideas as a simple check on behaviour. First, define the behaviour. Then identify the antecedent, or what happens before it, and the consequence, or what happens after it. Reinforce the replacement behaviour quickly, then fade external rewards so learners build self-regulation and do not rely on rewards.

The evidence behind this answer (5 studies)

1. Behaviorism, Skinner, and Operant Conditioning: Considerations for Coaching Practice. Thomas M. Leeder (2022) · Strategies
theoretical overview · possibly · 2022 · 26 citations

This gives a concise theoretical overview of Skinner's operant conditioning. It also includes four reflective questions for practitioners. It frames behaviourism as one tool among many. This helps teachers question their own assumptions about how learners change their behaviour.

Classroom implication: Use operant conditioning to ask what a routine rewards. Do not use it as a blanket behaviour policy.

2. Positive Reinforcement Strategy in Classroom Behaviour: A Scoping Review. Aisha Rafi et al. (2020) · Journal of Rawalpindi Medical College
scoping review · yes · 2020 · 16 citations

A PRISMA scoping review looked at positive reinforcement strategies for managing challenging classroom behaviour. Praise and feedback were the most common strategies. The review links both approaches to Skinner's operant principles.

Classroom implication: Name the behaviour you want to see. Reinforce it quickly, and make the feedback clear enough for learners to repeat it.

3. Effect of Reinforcement on Teaching-Learning Process. Rezaul Hoque (2013) · IOSR Journal of Humanities and Social Science
classroom experimental study · yes · 2013 · 18 citations

An empirical study involved 100 Class IX learners split into experimental and control groups. The results supported Skinner's operant conditioning theory. They showed measurable behaviour change linked to systematic reinforcement. This was a rare classroom-based experimental design in this literature.

Classroom implication: Track whether a reinforcement routine changes the target behaviour. Do not assume that rewards are working.

4. The Skinnerian Teaching Machine (1953-1968). Jason K. McDonald (2020) · EdTech Books
historical design case · possibly · 2020

A historical design case examined Skinner's teaching machine. It showed how three operant principles, reinforcement, shaping, and vanishing, were intentionally built into the device. This gives useful context for teachers working with adaptive learning software, which inherits this design lineage.

Classroom implication: When using adaptive software, check what the immediate feedback is doing. Is it helping learners think, or is it simply training response patterns?

5. The Learning Theory of B.F. Skinner and Teaching Strategies for ADHD Students. Thilagarasi Subramaniam et al. (2025) · Special Education
literature review · yes · 2025

A recent (2025) review applied Skinner's operant conditioning to ADHD classroom strategies. It found that positive reinforcement, structured environments, and behaviour modification can help learners sustain attention, complete tasks, and reduce disruption.

Classroom implication: Pair reinforcement with clear routines and a well-structured classroom. This is especially useful when learners need support with attention and task completion.


References

Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Plenum Press.

Department for Education. (2024). Behaviour in schools: advice for headteachers and school staff. UK Government Publications.

Education Endowment Foundation. (2019). Improving behaviour in schools: evidence review and recommendations. EEF.

Korpershoek, H., Harms, T., de Boer, H., van Kuijk, M., & Doolaard, S. (2016). A meta-analysis of the effects of classroom management strategies and classroom management programs on students' academic, behavioral, emotional, and motivational outcomes. Review of Educational Research, 86(3), 643-680.

Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children's intrinsic interest with extrinsic reward: A test of the "overjustification" hypothesis. Journal of Personality and Social Psychology, 28(1), 129-137.

Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Appleton-Century-Crofts.

Skinner, B. F. (1953). Science and human behavior. Macmillan.

Skinner, B. F. (1957). Verbal behavior. Appleton-Century-Crofts.

About the Author
Paul Main
Founder & Metacognition Researcher

Paul Main is an educator and metacognition researcher who founded Structural Learning in 2002. With a psychology degree from the University of Sunderland and 22+ years helping schools embed thinking skills, he bridges the gap between educational research and classroom practice. Fellow of the RSA and Chartered College of Teaching, with 128+ Google Scholar citations.
