Stop Grading Facts: Assessing Conceptual Understanding in the MYP
Stop assessing recall and start assessing understanding. A practical rubric for MYP teachers using the Thinking Framework to map depth of knowledge across achievement levels 1-8.



Most MYP rubrics ask teachers to assess "conceptual understanding" at levels 5 to 8. Most MYP teachers end up assessing whether learners can describe things clearly. The gap between those two outcomes is not a marking problem. It is a task design problem.
A Year 9 history teacher asks learners to "explain the causes of the First World War". Forty responses arrive. Most learners list militarism, alliances, imperialism and nationalism from the lesson notes, then add a short conclusion. The work is accurate, but the task has allowed description to pass as explanation.
To stop grading facts, the question has to ask learners to test a causal relationship, weigh evidence and apply the pattern to a case they have not rehearsed.
A learner can describe a topic well and still not analyse it. A learner can recall facts well and still not show conceptual understanding.
The IB's MYP criterion descriptors explicitly use words like "analyse", "evaluate", and "synthesise" at the higher achievement levels. Yet when the task only asks learners to describe, teachers have no choice but to mark what they receive. The rubric has been rendered meaningless at levels 7 and 8 because the task never required that depth of thinking.
This is not a teacher failure. It is a validity problem, and it affects school data. If levels 7 and 8 are awarded for polished recall in Key Stage 3, SLT receives inflated attainment data. Year 12 teachers then inherit learners who can write fluently but struggle to defend a claim in DP or A-Level study.
Heads of teaching and learning should therefore spend less CPD time calibrating broad rubric language and more time redesigning the tasks that set the ceiling for achievement. Wiliam (2011) would call this assessment validity drift: the descriptor claims to assess one construct, while the task measures another.
In the MYP, conceptual understanding means that a learner can use facts and relationships to explain a transferable idea in a new context. The MYP is one of the IB's four programmes: PYP (ages 3 to 12), MYP (11 to 16), DP (16 to 19) and Career-related Programme / CP (16 to 19). Do not import PYP concept lists into MYP assessment. The current PYP has seven specified concepts, often still called key concepts: form, function, causation, change, connection, perspective and responsibility.
Reflection was the eighth historical lens, but the Enhanced PYP (2018) treats reflection as a continuous practice across inquiry, assessment and action; many school websites still say "8 lenses" in 2026. Erickson and Lanning (2014) help here because their model separates facts, topics, concepts, generalisations and theory. This matters in teaching and learning because a learner can recite the topic but still miss the transferable relationship that the rubric is meant to assess.
| Layer | Example (Biology) | Assessment Implication |
|---|---|---|
| Facts | The cell membrane is made of phospholipids | Level 1-2 assessment: recall and identification |
| Topics | Cell membrane structure | Topic-level description: names components |
| Concepts | Selective permeability | Level 3-4: describes the concept with detail |
| Generalisations | Cells maintain internal conditions by controlling what moves across membranes | Level 5-8 target: explains relationships |
| Theory | Cell theory: living organisms are composed of cells that maintain homeostasis | Level 7-8 synthesis: evaluates across systems |
The IB states that learners should work at the generalisation and theory levels, not just at the fact and topic levels. Yet most MYP summative tasks reach only the concept level at best. In Erickson's model, a learner who "describes selective permeability" has reached level 3 or 4. When the task asks them to 'explain why cells would cease to function if membrane permeability became non-selective', it moves them towards generalisation, where levels 5 to 8 belong.
Grant Wiggins and Jay McTighe (2005) make the same point in Understanding by Design. Their "facets of understanding" model treats transfer as the harder end of understanding: can the learner apply an idea to a context not practised in class? That does not mean asking novices to generalise before they know enough.
Kirschner, Sweller and Clark (2006) warn that unguided problem solving can overload working memory, and Ashman (2021) makes the same point for classroom instruction. In MYP assessment, factual fluency is not the enemy of conceptual work. It is the base that makes transfer fair.
The Thinking Framework's eight cognitive operations can be used as a local task-design map for MYP achievement levels. They are not an IB scoring scale. Their value is practical: each operation names the cognitive work a learner must perform if the assessment is to distinguish recall, explanation, transfer and evaluation.
| MYP Level | Cognitive Demand | Thinking Framework Operations | What Learner Work Looks Like |
|---|---|---|---|
| 1-2 (Limited) | Recall, identify | Classify (sort into given categories), Sequence (put in given order) | Lists facts, follows a template, copies structure |
| 3-4 (Adequate) | Describe, outline | Compare (similarities and differences), Part-Whole (break down into components) | Describes with some detail, identifies components, makes basic comparisons |
| 5-6 (Substantial) | Analyse, explain | Cause and Effect (explain why), Analogy (connect to other contexts) | Explains relationships, identifies causes, transfers to new contexts |
| 7-8 (Excellent) | Evaluate, synthesise | Perspective (multiple viewpoints), Systems Thinking (interconnections) | Evaluates competing arguments, synthesises across sources, identifies systemic patterns |
This mapping does something that standard Bloom's Taxonomy guidance (Bloom, 1956) does not do for the MYP context: it gives you a concrete cognitive operation to design around, not just a verb to look for in learner writing. "Analyse" is an instruction to learners. "Cause and Effect" is the thinking structure you build the task around.
The distinction matters because Bloom's verbs describe what you want learners to do in their response. The Thinking Framework operations describe the mental structure that makes that response possible. When you design a task that requires Cause and Effect thinking, you are not just hoping learners will analyse. You are building the question so that analysis is the only way to answer it.
Norman Webb (1997) made a similar argument with his Depth of Knowledge framework: the cognitive demand is a property of the task, not the learner's response. A Depth of Knowledge level 1 task cannot produce level 4 evidence regardless of how capable the learner is. The same logic applies here. If your MYP task only requires Classify thinking, you cannot award level 7 even if a learner's response is beautifully written.
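The ceiling logic can be sketched in a few lines of Python. This is an illustrative sketch only, not an IB scoring tool: the names `OPERATION_BAND`, `task_ceiling` and `can_award` are hypothetical, and the bands simply restate the local mapping table used in this guide.

```python
# Hypothetical lookup restating this guide's mapping table.
# It is a local task-design aid, not an IB scoring scale.
OPERATION_BAND = {
    "Classify": (1, 2),
    "Sequence": (1, 2),
    "Compare": (3, 4),
    "Part-Whole": (3, 4),
    "Cause and Effect": (5, 6),
    "Analogy": (5, 6),
    "Perspective": (7, 8),
    "Systems Thinking": (7, 8),
}


def task_ceiling(required_operations):
    """Return the highest band (low, high) that the task itself demands."""
    if not required_operations:
        raise ValueError("A task must require at least one operation")
    # Tuples compare element by element, so max() picks the highest band.
    return max(OPERATION_BAND[op] for op in required_operations)


def can_award(level, required_operations):
    """True if the task demands enough thinking to evidence this level."""
    _, high = task_ceiling(required_operations)
    return level <= high


# A task that only requires sorting cannot evidence level 7,
# however well the response is written.
print(can_award(7, ["Classify"]))                  # False
print(can_award(7, ["Compare", "Perspective"]))    # True
```

The design choice here is that the ceiling belongs to the task, which mirrors Webb's argument: the check never looks at the learner's response.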
The worked examples below show how each Thinking Framework operation translates to an MYP task. They are drawn from different subject areas to show that the mapping is not subject specific (Wiggins and McTighe, 2005). Use the table as a starting point for professional discussion: identify the learner's current need, record evidence from more than one lesson, and agree the next classroom adjustment with the SENCO or family.
| Level | Operation | Example Task |
|---|---|---|
| 1-2 | Classify | Sort these ten historical events into the categories 'political', 'economic', and 'social'. |
| 1-2 | Sequence | Arrange these stages of mitosis in the correct order and name each one. |
| 3-4 | Compare | Identify three similarities and three differences between the causes of the First and Second World Wars. |
| 3-4 | Part-Whole | Break down a short story into its narrative components: setting, protagonist, conflict, resolution. Describe what each contributes. |
| 5-6 | Cause and Effect | Explain why industrialisation caused urbanisation in 19th-century Britain, then predict where you would expect to see similar patterns emerging in a developing economy today. |
| 5-6 | Analogy | A learner says that a cell is like a factory. Evaluate this analogy: which aspects of cellular function does it explain well, and which aspects does it fail to capture? |
| 7-8 | Perspective | Evaluate the claim that globalisation benefits everyone by examining it from the perspectives of a multinational company, a rural farmer in a developing country, and an environmental scientist. |
| 7-8 | Systems Thinking | A government introduces a sugar tax. Map all the likely effects across health outcomes, food industry behaviour, household economics, and public attitudes. Identify which effects could be self-reinforcing. |
Notice how the Cause and Effect task at level 5 to 6 stops learners from answering by description alone. They must explain a mechanism and then transfer that mechanism to a new context. Even so, there is still a validity risk.
Koretz (2014) warns that complex performance tasks can be hard to score reliably. Christodoulou (2017) argues that assessments of broad skills often reward prior knowledge and verbal fluency. For MYP moderation, level 7 to 8 evidence should show the reasoning process, not only a polished written product.
Metacognition matters here because learners can only monitor the thinking they can name. When the task asks for Cause and Effect, Perspective or Systems Thinking, formative assessment can focus on the operation rather than the length of the answer. This gives teachers a cleaner moderation question: what kind of thinking did the task demand, and what evidence shows the learner used it?
Stern, Ferraro and Mohnkern (2017) describe transfer task design for conceptual understanding. It is built around factual, conceptual and transfer questions. For heads of teaching and learning, this three-part structure is useful. It separates content security from conceptual explanation and transfer, rather than treating all three as one grade.
| Level | Question Type | MYP Achievement Band | Thinking Framework Operation |
|---|---|---|---|
| 1, Factual | Do they know the content? | 1-4 | Classify, Sequence, Compare, Part-Whole |
| 2, Conceptual | Can they explain the relationships? | 5-6 | Cause and Effect, Analogy |
| 3, Transfer | Can they apply it to a new context? | 7-8 | Perspective, Systems Thinking |
MYP summative tasks should include all three question levels. Learners who answer only factual questions can reach level 4 at most. Learners who also answer conceptual questions show level 5-6 understanding. Learners who also complete transfer tasks, by evaluating perspectives or judging effects in a new context, reach level 7-8.
Most MYP summatives that teachers share at moderation sessions contain factual (level 1) questions and sometimes conceptual (level 2) questions. The transfer task is either absent or disguised as a writing task ("write a report that evaluates...") without scaffolding the specific cognitive operation being assessed.
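As a rough illustration, the three-tier structure can be written as a small decision rule. This is a sketch under stated assumptions: the function name `achievement_band` is hypothetical, and it assumes the tiers are cumulative, so a transfer answer without a secure conceptual answer does not lift the band. The guide does not say what to award when factual questions are not answered, so that case returns `None`.

```python
def achievement_band(factual, conceptual, transfer):
    """Map the question tiers a learner answered securely to a band.

    Assumption (not from the IB): tiers are cumulative, so a higher tier
    only counts when every tier below it is also secure.
    """
    if not factual:
        return None      # no band specified in this guide for this case
    if not conceptual:
        return "1-4"     # factual questions only: level 4 at most
    if not transfer:
        return "5-6"     # factual plus conceptual
    return "7-8"         # all three tiers secure


print(achievement_band(True, False, False))  # 1-4
print(achievement_band(True, True, False))   # 5-6
print(achievement_band(True, True, True))    # 7-8
```

A department could relax the cumulative assumption, but keeping it makes the moderation question explicit: which tier is the first one the learner did not secure?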
This creates a clear challenge for novice learners. Conceptual transfer is not the starting point for every class. Kirschner, Sweller and Clark (2006) show why problem solving with little guidance can overload learners who do not yet have enough background knowledge.
A Year 7 learner who cannot recall the causes of urbanisation fluently is unlikely to make a valid transfer argument about a new housing crisis. In that case, the task has not shown higher order thinking. It has exposed weak factual automaticity.
The practical response is not to stop grading conceptual understanding. It is to stop grading it too early. Teach the facts, rehearse the relationships, then assess transfer when learners have enough schema to think with. This protects learner learning and makes the assessment fairer for SEND and EAL learners, who are often penalised when abstract writing is used as a proxy for thinking.
Wiggins and McTighe (2005) are direct on this point: if you only assess what learners can do with familiar material, you are assessing memory, not understanding. The transfer task does not need to be a wholly new topic. It needs to present the concept in a context the learner has not studied before.

An economics learner studies supply and demand in global commodity markets. They should be assessed on applying these concepts to a local housing shortage they haven't analysed before. That is a transfer task.
Three patterns appear often in MYP moderation. They need to be named clearly.
Using the verb "analyse" in the task but accepting "describe" in the marking. This is the most common problem. The task says "analyse the impact of...", but markers often accept detailed description as evidence.
To fix this, build the analysis into the task. Ask for a causal claim, a comparison of explanations, or a judgement about evidence quality. Without that limit, the rubric rewards confident writing rather than conceptual understanding.
Over-scaffolding so the rubric does the thinking. Some teachers provide rubric criteria so detailed that the learner only has to match the descriptor rather than construct an original response. At level 7 to 8, descriptors should describe the quality of thinking, not the content of the answer. If the level 8 descriptor includes the arguments the learner should make, it has dressed a level 4 task in level 8 packaging.
Assessing the product, not the process. In subjects like Design and Drama, teachers sometimes mark the finished artefact rather than the thinking behind it. A beautiful model built from a template shows Sequence thinking at level 1-2.
A model designed to solve a new constraint, tested against criteria and revised using feedback evidence, shows Systems Thinking at level 7-8. The two products may look similar, but the cognitive process is entirely different.
Wiliam (2011) argues that assessment works when teachers and learners share a clear view of the goal, the current state and the next move. For MYP teams, that means writing learning intentions at two levels: the surface product learners will produce and the deeper thinking operation they must use. The rubric names achievement. The task has to name the thinking.

Place the Thinking Framework operations over standard MYP rubrics. For example, use them to improve Year 10 History Criterion B (Investigating). This change benefits the learner.
Before (standard MYP-style descriptor):
| Level | Standard Descriptor |
|---|---|
| 7-8 | The learner consistently analyses and evaluates a range of sources and demonstrates a thorough understanding of historical significance. |
| 5-6 | The learner analyses some sources and demonstrates a substantial understanding of historical significance. |
| 3-4 | The learner describes some sources and demonstrates an adequate understanding of historical significance. |
After (Thinking Framework-informed descriptor):
| Level | TF Operation | Thinking Framework-Informed Descriptor |
|---|---|---|
| 7-8 | Perspective + Systems Thinking | The learner evaluates sources by examining competing interpretations and identifying how the historian's standpoint, context, or purpose shapes the argument. They identify patterns across sources that reveal systemic factors in historical change. |
| 5-6 | Cause and Effect | The learner explains how specific sources provide evidence for causal claims. They make explicit connections between historical evidence and the factors they identify as significant, rather than treating evidence as illustration. |
| 3-4 | Compare + Part-Whole | The learner describes what sources show and identifies similarities or differences between them. They can break a source down into its component claims but do not yet explain why those claims are significant. |
Assessments should show how learners think, not just how much work they finish. The shift from level 5-6 to 7-8 is about cognitive processes. Learners move from explaining causes to evaluating perspectives, weighing evidence and noticing system effects.
This also makes the rubric usable as a formative assessment tool. Partway through a unit, you can ask learners: 'Which level is your current draft at?' A learner can identify whether they are writing at the Compare level or the Cause and Effect level. That is a meaningful metacognitive question. "Is this adequately detailed?" is not.
Biggs and Collis (1982) developed SOLO Taxonomy, which many teachers know. It has five levels: prestructural, unistructural, multistructural, relational and extended abstract. The Thinking Framework operations match these levels. This makes them useful for learners in IB programmes.
Where SOLO provides a description of the structure of learner responses, the Thinking Framework provides the operation that produces that structure. A learner at SOLO's "relational" level is performing Cause and Effect or Compare thinking. A learner at "extended abstract" is performing Perspective or Systems Thinking. The two frameworks are complementary: SOLO tells you what you see in the work, the Thinking Framework tells you what to design for.
Teachers often wrongly award level 7-8 for detailed factual knowledge. SOLO multistructural responses list facts without connections (Biggs & Collis, 1982). These belong in level 3-4, despite detail. The Thinking Framework clarifies this better than the IB rubric.
One of the most practical consequences of using the Thinking Framework depth mapping is that it makes written feedback specific enough to be acted upon. Hattie and Timperley (2007), in line with Hattie's later Visible Learning synthesis (Hattie, 2009), identify three conditions that make feedback effective. It must address where the learner currently is, where they need to go and how to get there. Generic level descriptions fail all three conditions.
Compare two comments about Sarah. The first reads: "Sarah understands different map projections and can describe them." The second reads: "Sarah shows a strong grasp of different map projections and can judge how useful each one is for showing spatial phenomena."
The first comment stays at the surface. The second points to deeper understanding (Sadler, 1989). Hattie and Timperley (2007) found that feedback affects learner progress. Shute (2008) also found that feedback has a strong effect on learning.
A weak report comment says: Ahmed understands migration patterns well and writes clearly. To improve, he should analyse information more deeply and consider other points of view. It sounds positive, but it does not tell Ahmed what kind of thinking to attempt next.
A stronger comment says: Ahmed links cause and effect in migration when explaining push and pull factors. To reach level 7-8, he should use the Perspective operation by testing the same migration pattern from the viewpoint of the migrant, the receiving community and the policy-maker. The next task should give him practice at that operation, rather than asking for a longer answer.
Ahmed sees which operation he can use now and which one he still needs to develop. The feedback then shows what better work should look like in the next task. Metacognition research describes this as useful feedback (e.g. Nelson & Narens, 1990; Flavell, 1979). The learner understands how to improve, rather than only hearing that the work needs fixes.
These criteria give departments common ground for moderation. Instead of debating whether a response "sounds analytical", ask: "Does this learner use Cause and Effect thinking, or Perspective thinking?" That question focuses observation, discussion and staff training on visible evidence in the work.
For your next MYP summative task, name the highest Thinking Framework operation assessed at levels 7-8 before writing criteria. Make the task require that operation for learners to achieve the top band.
For Perspective tasks at level 7-8, give learners claims that can truly be read in more than one way. Learners must evaluate interpretations, not just describe them. "Consider viewpoints" isn't enough.
To assess this well, ask learners to judge the evidence behind each viewpoint. They should also explain how the viewpoint could change (Wiliam, 2018).
Then write your rubric level descriptions from the highest band down. Level 7-8 describes Perspective or Systems Thinking. Level 5-6 describes what Cause and Effect thinking looks like in this task. Level 3-4 describes Compare or Part-Whole thinking.
Level 1-2 describes Classify or Sequence thinking. The criteria now form a cognitive ladder, not just a detail scale.
Use this method for the first summative assessment, then compare learner responses with earlier work on the same topic. In 2026, also check the assessment format itself. Large language models can produce fluent generalisations, so written synthesis alone is weaker evidence than it used to be.
Ofqual (2024) treats AI use in non-exam assessment as a live validity risk. UCL (2025) advises clear categories for acceptable GenAI use. Schools can allow AI for planning only when the final judgement still comes from in-room explanation, annotated evidence, oral defence or another format that shows the learner's own reasoning.
These resources provide practical help with IB assessment design. They also assist with using the Thinking Framework across your department. Use the current IB MYP subject guides and programme resource centre materials alongside these classroom tools when reviewing assessment design.
The strongest critique concerns development. Conceptual transfer is not a shortcut around knowledge. Kirschner, Sweller and Clark (2006) argue that novice learners need explicit teaching, because open problem solving can overload working memory; Ashman (2021) makes the same case for classroom tasks. If MYP teams ask learners to generalise before they are fluent with the facts, conceptual assessment can widen gaps instead of showing understanding.
A second limit is measurement. Bloom (1956), Webb (1997) and Hattie (2009) help teachers name cognitive demand, but they do not solve the scoring problem. Koretz (2014) warned that complex assessments can lose reliability when broad constructs are judged from small samples of work.
Christodoulou (2017) adds that generic skill assessment often rewards vocabulary, background knowledge and coached performance. In the MYP, a polished transfer essay may therefore measure fluent language as much as conceptual synthesis.
There is also a cultural critique. Some international education models treat abstract reasoning, away from real context, as culturally neutral. Andreotti (2011) and Doherty (2020) question this view. Some learners may show relational or community-based knowledge in ways that do not fit neatly inside analytic rubrics.
AI adds a newer limit. Generative tools can draft responses that sound plausible and conceptual. Ofqual (2024) and UCL (2025) both point to the need for clearer rules here. Even with these limits, the framework remains valuable as a way to design tasks, not as a substitute for subject knowledge, moderation and professional judgement.
Bloom, B. (1956). Taxonomy of educational objectives.
Hattie, J. (2009). Visible learning.
Webb, N. (1997). Criteria for alignment of expectations and assessments.
The sources below are the verified core texts behind this guide's assessment approach. They cover concept-based curriculum design, transfer-focused assessment, backward design, SOLO taxonomy, Depth of Knowledge, feedback and metacognition.
Concept-Based Curriculum and Instruction for the Thinking Classroom
Core text
Erickson, H. L., & Lanning, L. A. (2014)
Erickson and Lanning (2014) clarify facts, topics, concepts, generalisations and theory through the Structure of Knowledge model. This distinction helps teachers design assessment tasks that move beyond topic recall towards transferable conceptual understanding.
Tools for Teaching Conceptual Understanding, Secondary
Core text
Stern, J., Ferraro, K., & Mohnkern, J. (2017)
Stern, Ferraro and Mohnkern (2017) link factual, conceptual and transfer levels to task design. The book gives subject examples that help MYP coordinators design assessment tasks with clearer cognitive demand.
Understanding by Design (Expanded 2nd Edition)
Core text
Wiggins, G., & McTighe, J. (2005)
Wiggins and McTighe's (2005) backward design helps plan assessments before lessons. Their six facets support Thinking Framework depth mapping. This is useful for defining transfer across subjects.
The Power of Feedback
81+ citations
Hattie, J., & Timperley, H. (2007)
Hattie and Timperley's research identified key feedback elements. They are: current learner position, target destination, and bridging strategies. Their work shows why generic rubrics fail to improve learner outcomes. This research informs the Thinking Framework report comments (Hattie & Timperley, 2007).
Alignment of Assessment, Curriculum, and Instruction
Core text
Webb, N. L. (1997)
Webb (1997) argued that Depth of Knowledge is a property of the task, not of the learner's response, so cognitive demand has to be designed into the question. This framework directly informs the level-to-operation mapping discussed in this guide.