Skinner's Operant Conditioning: When Rewards Backfire
Skinner’s operant conditioning works in classrooms, but misapplied rewards backfire. Learn reinforcement schedules, misconceptions, and the science.


Skinner's theory of operant conditioning (Skinner, 1953) explains how consequences shape behaviour. In operant conditioning, learners repeat actions that bring useful outcomes. They reduce actions that bring unwanted outcomes.
This connects to the wider context of fundamental theories of learning in modern classroom practice.

Positive reinforcement adds something valued. Negative reinforcement removes something unpleasant. Punishment tries to reduce an action.
In schools, this explains why praise, routines, points and sanctions can change behaviour quickly. It also explains why they can go wrong. A sticker chart may bring short-term quiet, but it can teach learners to work only when a reward is visible (Deci, Koestner, and Ryan, 1999).
The classroom question is practical. Use Skinner to spot the antecedent, behaviour and consequence. Then teach the replacement behaviour and fade rewards over time.
The goal is not control for its own sake. The goal is calm routines, clear feedback and growing self-regulation.
Use a simple test. First, name the action, then name what happens next. Ask whether that consequence makes the action more or less likely.
If the consequence is changing how likely the action is, Skinner is useful. If it is not, look instead at task design, emotion, relationships or prior knowledge.
Skinner used operant conditioning to train pigeons, but classroom rewards need a narrower claim. Deci, Koestner, and Ryan (1999) reported a free-choice effect of d = -0.34 for tangible rewards overall, while expected and controlling rewards were more damaging than informational feedback. The classroom lesson is practical: do not pay learners with points for work they already value. Use rewards briefly, make the feedback specific, and fade the reward before interest becomes transactional.
Evidence overview
Paul Main reviewed this article. He is the Founder and Educational Consultant at Structural Learning.
Classical conditioning and operant conditioning are often confused, but they work in different ways. Classical conditioning, linked with Pavlov (1927), produces automatic, reflexive responses. The familiar example is the bell paired with food in Pavlov's dogs. In this kind of learning, learners have little choice in how they react.
Operant conditioning is built on voluntary action. The learner does something, such as raising a hand, completing homework, or speaking up in class. What happens next shapes whether they are likely to do it again. The consequence follows the behaviour, and that timing creates the learning.
The ABC model makes operant conditioning easier to use in classrooms. A stands for Antecedent: the trigger, instruction, or context that prompts behaviour. For example, a teacher sets a maths problem on the board.
B is the Behaviour: the learner raises their hand to ask for help, works quietly, or doodles in the margin. C is the Consequence: praise, a mark, peer attention, or the internal satisfaction of solving it. The consequence is what increases or decreases the chance that the behaviour will happen again. Skinner (1953) developed this principle through animal studies and later applied it to human learning contexts, including education and clinical psychology.
In a Year 4 maths lesson, the teacher displays a challenging problem (antecedent). Learner A raises their hand to ask a clarifying question (behaviour). The teacher responds warmly with a hint (consequence: positive attention), so Learner A is more likely to ask questions in future lessons.

Learner B also raises their hand, but the teacher is busy and misses it. Learner B may stop volunteering because there was no useful consequence.
Learner C gives an incorrect answer aloud and hears giggles from peers. Learner C may avoid speaking in class because the consequence was social risk. All three outcomes depend on what happened after the behaviour, not on the learner's nature or fixed ability.
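The three outcomes above can be logged in the same antecedent, behaviour, consequence shape. The following is a minimal Python sketch, not an established observation tool: the `ABCRecord` class, the `repeated_later` field and the helper function are hypothetical names introduced for illustration, and the True/False values simply restate the likely outcomes described in the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ABCRecord:
    """One observed antecedent-behaviour-consequence sequence."""
    antecedent: str
    behaviour: str
    consequence: str
    repeated_later: bool  # hypothetical field: did the behaviour recur afterwards?


def consequences_by_outcome(records):
    """Count consequences separately for behaviours that recurred and those that did not."""
    recurred = Counter(r.consequence for r in records if r.repeated_later)
    faded = Counter(r.consequence for r in records if not r.repeated_later)
    return recurred, faded


# Illustrative entries based on Learners A, B and C in the Year 4 example.
log = [
    ABCRecord("challenging problem on board", "raises hand to ask", "teacher gives warm hint", True),
    ABCRecord("challenging problem on board", "raises hand to ask", "hand not noticed", False),
    ABCRecord("challenging problem on board", "answers aloud, incorrect", "peers giggle", False),
]

recurred, faded = consequences_by_outcome(log)
print("Followed by more of the behaviour:", dict(recurred))
print("Followed by less of the behaviour:", dict(faded))
```

Grouping by consequence rather than by learner keeps attention on what happened after the behaviour, which is the point of the ABC model.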
Positive reinforcement means adding a valued consequence after a behaviour so the behaviour is more likely to happen again. In class, that might be quiet praise after a learner uses a sentence stem, a house point for arriving with the right equipment, or extra choice after a group tidies well. It works best when the learner understands exactly which behaviour earned the response.
In a Year 6 English lesson, a learner who usually struggles with writing finishes a paragraph ahead of schedule and reads it aloud. The teacher says, "That opening sentence is vivid; I can picture the scene. Well done," and writes a note in the learner's book.
The praise works because it is immediate, specific, and valued by the learner. Not every learner wants public recognition, so the teacher still needs to choose the right form of feedback. Korpershoek et al. (2016) found small but significant average effects of classroom-management strategies on student outcomes. Skinner (1957) called this pattern "reinforcing the operant."
A common pitfall is praise that is too vague ("Good job!"), too delayed (weeks later at parents' evening), or given for effort on a task the learner already finds intrinsically rewarding (see the Common Misconceptions section below). The goal is to use positive reinforcement strategically: reinforce the behaviour you most want to see, and do it often enough that learners build new habits before the reinforcement is gradually withdrawn.
Negative reinforcement means removing something unpleasant after a behaviour occurs, increasing the likelihood that the behaviour will happen again. Despite its name, negative reinforcement is not punishment. It strengthens behaviour. The word "negative" means the removal, or subtraction, of an aversive stimulus, not the quality of the outcome.
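In a Year 3 classroom, learners are asked to work in silence while a timer runs, until the noise level drops below a set threshold. Once it does, the timer stops, the demand for silence is lifted, and learners get extra playtime. Lifting the unpleasant demand is the negative reinforcement. The extra playtime is an added reward, so that part is positive reinforcement.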
The learner who settles first experiences relief because the demand for silence is lifted. Next time, they are more likely to settle quickly because they have learned that doing so ends the aversive state. Schools use the same pattern when they say, "Once you have completed your spelling words, you can leave the table."
The Department for Education's Behaviour in Schools guidance (February 2024) says schools should teach expected behaviour through clear routines, use positive reinforcement and sanctions, and make reasonable proactive adjustments for learners with additional needs. For example, a teacher may stop insisting on eye contact from neurodivergent learners. They may also avoid requiring perfect silence during independent work.
Negative reinforcement can create compliance. But it can also teach learners to tolerate or ignore the aversive condition. As a result, they may not truly internalise the value of the behaviour. Teachers sometimes overuse it, which can create classrooms where learners behave only to escape demands, not because they understand or agree with expectations.
Positive punishment means adding something unpleasant after a behaviour, so the behaviour is less likely to happen again. In school, this often means a verbal reprimand, an extra task, or detention.
Imagine a Year 5 learner interrupts a lesson. The teacher calmly gives a firm reprimand, which usually stops the interruption straight away. However, Skinner (1953) argued that punishment may suppress behaviour temporarily and can produce unwanted emotional, escape and avoidance effects; it does not by itself teach alternative behaviour.
Punishment does not teach the replacement behaviour. It usually suppresses the unwanted behaviour for a short time. This is where strict zero-tolerance systems and isolation booths can drift away from Skinner's own logic: if a learner is removed, shamed or isolated without being taught what to do instead, the school has only changed the consequence, not the repertoire. Sidman (1989) warned that coercive control often produces avoidance, anxiety and counter-control, so repeated removals can train learners to escape school rather than rejoin learning.
A less harsh option is removing a privilege, which is strictly negative punishment because something valued is taken away: a learner who has been off-task loses the choice of activity for the afternoon and is assigned a specific task instead. The consequence (reduced autonomy) is delivered calmly and without shame. The learner's behaviour may improve in the short term, but the long-term effect depends on whether they also learn what behaviour is expected and why it matters. Without that combination, punishment alone breeds resentment, not understanding.
Negative punishment means removing something desirable after a behaviour occurs, decreasing the likelihood that the behaviour will happen again. Teachers may also hear this called response cost or withdrawal of privileges. It is the quadrant that people most often misunderstand.
In a Year 4 classroom, learners earn minutes in a behaviour tally to spend on Friday afternoon activities. When a learner makes unkind comments during group work, the teacher quietly says, "That is unkind. You have lost two minutes," and updates the tally.
The learner has lost access to something they value. The behaviour is less likely to happen again if the loss feels significant and is applied consistently. The EEF (2019) behaviour guidance supports consistent policies, but it also stresses knowing learners, teaching learning behaviours and giving targeted support. Digital token economies, including ClassCharts-style points systems, can create high compliance but low independence if leaders treat points as the behaviour policy itself.
When rewards are removed, behaviour often returns to baseline. This happens because learners have learned to comply when incentives are visible, not to self-regulate.

A related concern is shame. Some learners feel deeply shamed when they lose a privilege, especially if adults announce it in public. SEND learners may also find behaviour systems unpredictable or sensory-threatening, particularly when points are removed without warning. The Timpson Review (2019) helps here: behaviour may signal an unmet need, but schools still need clear expectations.
Negative punishment works best when adults explain the cause and effect. They should also teach the replacement behaviour and make reasonable adjustments when disability, trauma or communication needs are involved.
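The four quadrants reduce to two questions: was something added or removed after the behaviour, and did the behaviour become more or less likely? The short Python sketch below encodes only that naming rule. The names `Change`, `Effect` and `quadrant` are hypothetical and introduced for illustration; the example comments reuse situations from this article.

```python
from enum import Enum


class Change(Enum):
    ADDED = "added"      # something is introduced after the behaviour
    REMOVED = "removed"  # something is taken away after the behaviour


class Effect(Enum):
    INCREASES = "increases"  # the behaviour becomes more likely
    DECREASES = "decreases"  # the behaviour becomes less likely


def quadrant(change: Change, effect: Effect) -> str:
    """Name the operant quadrant from the stimulus change and its effect on behaviour."""
    sign = "positive" if change is Change.ADDED else "negative"
    kind = "reinforcement" if effect is Effect.INCREASES else "punishment"
    return f"{sign} {kind}"


print(quadrant(Change.ADDED, Effect.INCREASES))    # specific praise after a sentence stem
print(quadrant(Change.REMOVED, Effect.INCREASES))  # a demand is lifted once learners settle
print(quadrant(Change.ADDED, Effect.DECREASES))    # a verbal reprimand
print(quadrant(Change.REMOVED, Effect.DECREASES))  # two minutes lost from the tally
```

"Positive" and "negative" depend only on the first argument, never on whether the outcome feels pleasant. That is why negative reinforcement is not punishment.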
Schedules of reinforcement are rules for when teachers give feedback or rewards. Continuous reinforcement, such as praising every correct use of a new routine, helps learners acquire a behaviour. Intermittent reinforcement helps maintain it.
A fixed-ratio schedule rewards after a set number of responses. A variable-ratio schedule rewards after an unpredictable number of responses. The variable-ratio schedule often makes behaviour more resistant to extinction (Skinner, 1953).
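The difference between these schedules is easiest to see as a pattern of rewarded and unrewarded responses. The Python sketch below is a toy illustration under stated assumptions: the function names are hypothetical, continuous reinforcement is treated as a fixed ratio of 1, and the variable-ratio gap is drawn uniformly between 1 and `2 * mean_ratio - 1` so that it averages `mean_ratio`. It shows only when rewards arrive; it does not model how a learner responds or how resistant the behaviour is to extinction.

```python
import random


def fixed_ratio(n_responses, ratio):
    """Reward every `ratio`-th response; a ratio of 1 is continuous reinforcement."""
    return [i % ratio == 0 for i in range(1, n_responses + 1)]


def variable_ratio(n_responses, mean_ratio, rng):
    """Reward after an unpredictable number of responses that averages `mean_ratio`."""
    rewards = []
    count = 0
    target = rng.randint(1, 2 * mean_ratio - 1)  # assumption: uniform gap between rewards
    for _ in range(n_responses):
        count += 1
        if count >= target:
            rewards.append(True)
            count = 0
            target = rng.randint(1, 2 * mean_ratio - 1)
        else:
            rewards.append(False)
    return rewards


def as_track(rewards):
    """Render a schedule as text: R for a rewarded response, . for an unrewarded one."""
    return "".join("R" if r else "." for r in rewards)


print("continuous     ", as_track(fixed_ratio(12, 1)))
print("fixed-ratio 3  ", as_track(fixed_ratio(12, 3)))
print("variable, ~3   ", as_track(variable_ratio(12, 3, random.Random(1))))
```

In the fixed-ratio track the next reward is always predictable. In the variable-ratio track it is not, which is the feature the text links to greater resistance to extinction.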
This is why gamified AI tutors and adaptive apps need scrutiny. Streaks, surprise badges and unpredictable prompts can train attention to the platform rather than strengthen metacognitive learning. Recent reviews of intelligent tutoring systems echo this risk (Batsaikhan & Correia, 2024).
When reinforcement stops suddenly, behaviour may briefly get louder, happen more often or become more intense. This is an extinction burst, not automatic failure. It can also happen across a whole system.
A trust that scales a digital points economy across several schools should plan how rewards will fade when learners move phase, teacher or setting. If the next classroom does not use the same reward economy, learners may test the boundary before the new routine settles. Plan for that stage, stay consistent and reinforce the replacement behaviour, such as waiting to be invited in.
Use the Premack Principle when a preferred activity can support a less preferred one: "finish the retrieval questions, then choose the practical equipment". This works best when the preferred activity is already meaningful to the learner. It should not become a bribe for basic compliance, and it should be faded once the routine becomes secure.
School rewards must be fair, proportionate and respectful. The EEF behaviour guidance recommends knowing learners, teaching learning behaviours and using targeted support. It warns against relying only on rewards and sanctions (Education Endowment Foundation, 2019). The ITTECF also places behaviour within early teacher development: teachers should teach and maintain clear routines, apply rewards and sanctions consistently, use positive reinforcement, and adapt approaches for SEND (Department for Education, 2024).
For trauma-informed practice, the question is not "what consequence will make this stop?" but "what does this behaviour protect, avoid or communicate?" Reward the behaviour you want to see, teach self-regulation explicitly, and build predictable safe spaces before a learner reaches crisis. A calm exit card, a co-regulation script or a planned sensory break is still behaviour teaching; it simply starts with regulation rather than compliance.
Build the behaviour plan around prevention, teaching and fading. Involve learners when rules and consequences are set. Review whether the plan is still fair, especially for SEND learners and those with adverse experiences. Adults should model the behaviours they expect: calm language, patience, repair after conflict and clear feedback about the next successful step.
Skinner's model has limits because it focuses on behaviour we can see. It gives less weight to emotion, motivation and the wider social context of learning. Bandura (1977) showed why this matters: learners can pick up behaviours by watching models, expecting consequences and judging their own capability, not only through direct reinforcement. A learner may copy the calm peer who gets listened to, or avoid the teacher who humiliates mistakes, before any reward or sanction is applied to them personally.
Operant conditioning is also part of a live debate about rewards. Deci and Ryan (1985) argued that controlling rewards can harm autonomy and intrinsic motivation. Cameron and Pierce (1994) argued that some rewards, such as verbal praise and feedback linked to performance, do not always harm motivation. Modern neuroscience adds a further point: reward prediction error, rather than the reward alone, seems central to learning because the brain updates when outcomes differ from expectation (Schultz, 2016).
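The prediction-error idea can be made concrete with a simplified textbook-style update rule, in which an expectation moves a fraction of the way towards each outcome. This Python sketch is an illustration only, not the model from Schultz (2016): the function name, the learning rate of 0.5 and the reward values are assumptions.

```python
def update_expectation(expected, reward, learning_rate=0.5):
    """Return the new expectation and the prediction error that drove the update."""
    error = reward - expected  # positive if better than expected, negative if worse
    return expected + learning_rate * error, error


expected = 0.0
# The same reward is delivered on every trial, so it becomes steadily less surprising.
for trial in range(1, 6):
    expected, error = update_expectation(expected, reward=1.0)
    print(f"trial {trial}: prediction error {error:.3f}, new expectation {expected:.3f}")

# An expected reward that is then withheld produces a negative prediction error.
_, omitted_error = update_expectation(expected, reward=0.0)
print(f"reward omitted: prediction error {omitted_error:.3f}")
```

In this sketch the error shrinks on every repeated trial, so a fully predictable reward carries little new information, while its sudden removal carries a lot.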
For teachers, the key question is simple. Does the consequence give useful information, or does it only buy compliance?
Use operant conditioning alongside cognitive and relational strategies. Help learners name the goal, practise the replacement behaviour, notice progress and explain what helped. Keep large rewards rare. Use small, specific feedback while a routine is being learned, then fade it so the learner can act from mastery, belonging and purpose rather than from the promise of a prize.
Operant conditioning is a powerful and popular tool. However, three common myths can stop it from working well in schools. Below are the traps that catch out even the most experienced teachers.
Rewarding learners for a task they already enjoy can undermine their natural motivation. Lepper, Greene and Nisbett (1973) offered preschool children an expected Good Player Award for drawing with Magic Markers, an activity they had already shown interest in. In a later free-choice observation, these children spent less time drawing with the markers than children in the unexpected-award or no-award groups. The children appeared to reinterpret the activity as something they did for a prize, not for interest.
When the reward disappeared, the behaviour became weaker too. This matters in the classroom. If a learner already enjoys reading, a prize scheme may pull attention away from the story and towards the certificate. The risk is not every reward, but an expected, visible reward linked to an activity that already has value.
Teachers must protect natural drives such as curiosity, mastery, autonomy and relatedness (Deci & Ryan, 1985). Save tangible rewards for routines learners do not yet value, such as tidying equipment or starting a disliked task. As the routine becomes easier, replace the token with informational feedback: "You started without a prompt today, so you have more working time." Then fade even that prompt.
When a previously reinforced behaviour stops being reinforced, teachers sometimes use planned ignoring. They stop giving attention to a challenging behaviour in the hope that it will fade. This can be a legitimate operant strategy, but it often triggers a predictable response.
For three to five days, the unwanted behaviour may intensify because the learner tries harder to get the attention they used to receive. This is called an extinction burst, and it catches many teachers by surprise. The learner who used to shout out for attention now shouts even louder.
The teacher may think the strategy is not working and abandon it when consistency is most needed. If the teacher gives attention at that point, even through scolding, they may reinforce the intensified behaviour. Training in operant techniques needs to prepare teachers for extinction bursts; without that preparation, well-intentioned behaviour plans often fail.
A token economy or behaviour chart can produce clear short-term improvements in classroom behaviour. Learners work for tokens, lose them for misbehaviour, and compliance rises visibly. But compliance and self-regulation are not the same thing.
Compliance means doing what you are told because of outside consequences. Self-regulation means managing your own behaviour because you understand its value and have internalised it. Research cited in the Education Endowment Foundation (2019) behaviour guidance suggests that operant strategies can reduce challenging behaviour at the time. However, they do not always build self-regulation once the external system is removed.
A learner who behaves well on a reward chart may struggle when the chart ends or in settings without it. This is a leadership issue, not only a classroom issue. If a multi-academy trust standardises a points system, leaders should also standardise the fade: what gets taught explicitly, when rewards become less frequent, and how learners practise the same behaviour without the points screen. Token systems can build initial momentum, but they should be scaffolds that are removed, not permanent features.
Skinner (1938) showed that consequences shape learning. Learners repeat behaviours that bring positive outcomes and reduce behaviours that bring negative outcomes. Teachers use this in reward systems, praise, and behaviour policies.
Positive reinforcement gives learners a reward, such as praise for good work. Negative reinforcement removes an unwanted pressure when learners reach a goal.
For example, you can remove an extra practice task after a learner has shown mastery. Both forms increase a behaviour. Punishment is different because it aims to reduce a behaviour (Skinner, 1953).
Skinner (1953) informs behaviour strategies such as reward charts. Once a routine is established, less predictable praise tends to maintain the behaviour better than praise for every instance, because intermittent reinforcement makes behaviour more resistant to extinction.
Effective teaching means scaffolding through shaping. Break tasks down and reward each small step (Skinner, 1953).
Chomsky (1959) reviewed Skinner's work. He said rewards alone cannot explain how we learn language. Learners create new sentences, which shows that language skills go beyond simple responses.
His review helped cognitive psychology displace behaviourism as the dominant approach. It also made the case that operant conditioning is not a complete theory of learning (Chomsky, 1959).
Deci and Ryan (1985) argued that extrinsic rewards may reduce learner motivation. Learners may lose interest in tasks they once enjoyed if they are rewarded for them and the reward then stops. Good teachers combine rewards with approaches that build internal drive, such as autonomy, purpose, and awareness of progress.
Research Evidence Check
Five Consensus records give moderate support for using operant conditioning in classroom routines. They also support its use in reinforcement and behaviour support. Use it as a starting point for professional discussion: identify the learner's current need, record evidence from more than one lesson, and agree the next classroom adjustment with the SENCO or family.
Promising support: The evidence is strongest when teachers define the behaviour clearly. They should reinforce a useful replacement quickly and avoid treating rewards as a complete behaviour policy. Skinner gives a diagnostic lens, not a full account of motivation.
Use Skinner's ideas as a simple check on behaviour. First, define the behaviour. Then identify the antecedent, or what happens before it, and the consequence, or what happens after it. Reinforce the replacement behaviour quickly, then fade external rewards so learners build self-regulation and do not rely on rewards.
This gives a concise theoretical overview of Skinner's operant conditioning. It also includes four reflective questions for practitioners. It frames behaviourism as one tool among many. This helps teachers question their own assumptions about how learners change their behaviour.
Classroom implication: Use operant conditioning to ask what a routine rewards. Do not use it as a blanket behaviour policy.
A PRISMA scoping review looked at positive reinforcement strategies for managing challenging classroom behaviour. Praise and feedback were the most common strategies. The review links both approaches to Skinner's operant principles.
Classroom implication: Name the behaviour you want to see. Reinforce it quickly, and make the feedback clear enough for learners to repeat it.
An empirical study involved 100 Class IX learners split into experimental and control groups. The results supported Skinner's operant conditioning theory. They showed measurable behaviour change linked to systematic reinforcement. This was a rare classroom-based experimental design in this literature.
Classroom implication: Track whether a reinforcement routine changes the target behaviour. Do not assume that rewards are working.
A historical design case examined Skinner's teaching machine. It showed how three operant principles, reinforcement, shaping, and vanishing, were intentionally built into the device. This gives useful context for teachers working with adaptive learning software, which inherits this design lineage.
Classroom implication: When using adaptive software, check what the immediate feedback is doing. Is it helping learners think, or is it simply training response patterns?
A recent (2025) review applied Skinner's operant conditioning to ADHD classroom strategies. It found that positive reinforcement, structured environments, and behaviour modification can help learners sustain attention, complete tasks, and reduce disruption.
Classroom implication: Pair reinforcement with clear routines and a well-structured classroom. This is especially useful when learners need support with attention and task completion.
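One of the implications above is to track whether a reinforcement routine changes the target behaviour and not to assume that it does. A minimal Python sketch of that check follows. The function name and the tallies are hypothetical, and a difference in means over a few lessons is a prompt for professional discussion, not proof that the routine caused the change.

```python
from statistics import mean


def change_in_behaviour(baseline_counts, intervention_counts):
    """Difference in mean count per lesson; a negative value means fewer instances."""
    return mean(intervention_counts) - mean(baseline_counts)


# Hypothetical tallies of call-outs per lesson, recorded across several lessons.
baseline = [6, 7, 5, 6]      # before the reinforcement routine
intervention = [4, 3, 4, 3]  # while the routine is running

difference = change_in_behaviour(baseline, intervention)
print(f"Mean change per lesson: {difference:+.1f}")
```

Recording several lessons on each side matches the earlier advice to gather evidence from more than one lesson before agreeing the next adjustment.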
Deci, E. L., & Ryan, R. M. (1985). Intrinsic motivation and self-determination in human behavior. Plenum Press.
Department for Education. (2024). Behaviour in schools: advice for headteachers and school staff. UK Government Publications.
Education Endowment Foundation. (2019). Improving behaviour in schools: evidence review and recommendations. EEF.
Korpershoek, H., Harms, T., de Boer, H., van Kuijk, M., & Doolaard, S. (2016). A meta-analysis of the effects of classroom management strategies and classroom management programs on students' academic, behavioral, emotional, and motivational outcomes. Review of Educational Research, 86(3), 643-680.
Lepper, M. R., Greene, D., & Nisbett, R. E. (1973). Undermining children's intrinsic interest with extrinsic reward: A test of the "overjustification" hypothesis. Journal of Personality and Social Psychology, 28(1), 129-137.
Skinner, B. F. (1938). The behavior of organisms: An experimental analysis. Appleton-Century-Crofts.
Skinner, B. F. (1953). Science and human behavior. Macmillan.
Skinner, B. F. (1957). Verbal behavior. Appleton-Century-Crofts.