Stop Grading Facts: Assessing Conceptual Understanding in the MYP

MYP classroom illustration of a teacher assessing conceptual understanding rather than factual recall.

Updated on

June 20, 2026

Stop assessing recall and start assessing understanding. A practical rubric for MYP teachers using the Thinking Framework to map depth of knowledge across achievement levels 1-8.

Infographic comparing factual recall and conceptual assessment in the MYP to design better tasks for learners. — Factual Recall vs. Conceptual Assessment in the MYP

Most MYP rubrics ask teachers to assess "conceptual understanding" at levels 5 to 8. Most MYP teachers end up assessing whether learners can describe things clearly. The gap between those two outcomes is not a marking problem. It is a task design problem.

Key Takeaways

Recall is not understanding: A learner who can name the parts of a cell is not demonstrating conceptual understanding. A learner who can explain why selective permeability matters is.
The Thinking Framework maps directly to MYP levels: Each of the eight cognitive operations corresponds to a specific achievement level band, giving teachers a practical tool for task design.
Three-level assessment architecture: Every MYP summative should have tasks at factual, conceptual, and transfer levels. Most only have the first.
Erickson's generalisation level is the target: The IB asks learners to work at the level of generalisations and theories, not facts and topics. Most assessments stop short of this.
Report comments write themselves: When you assess with the depth mapping, you can cite the exact cognitive operation a learner is using, making written feedback specific and actionable.

The Problem: Teachers Grade What Is Easy to Measure

A Year 9 history teacher asks learners to "explain the causes of the First World War". Forty responses arrive. Most learners list militarism, alliances, imperialism and nationalism from the lesson notes, then add a short conclusion. The work is accurate, but the task has allowed description to pass as explanation.

To stop grading facts, the question has to ask learners to test a causal relationship, weigh evidence and apply the pattern to a case they have not rehearsed.

A learner can describe a topic well and still not analyse it. A learner can recall facts well and still not show conceptual understanding.

The IB's MYP criterion descriptors explicitly use words like "analyse", "evaluate", and "synthesise" at the higher achievement levels. Yet when the task only asks learners to describe, teachers have no choice but to mark what they receive. The rubric has been rendered meaningless at levels 7 and 8 because the task never required that depth of thinking.

This is not a teacher failure. It is a validity problem, and it affects school data. If levels 7 and 8 are awarded for polished recall in Key Stage 3, SLT receives inflated attainment data. Year 12 teachers then inherit learners who can write fluently but struggle to defend a claim in DP or A-Level study.

Heads of teaching and learning should therefore spend less CPD time calibrating broad rubric language and more time redesigning the tasks that set the ceiling for achievement. Wiliam (2011) would call this assessment validity drift: the descriptor claims to assess one construct, while the task measures another.

What the IB Actually Means by Conceptual Understanding

In the MYP, conceptual understanding means that a learner can use facts and relationships to explain a transferable idea in a new context. The MYP is one of the IB's four programmes: PYP (ages 3 to 12), MYP (11 to 16), DP (16 to 19) and Career-related Programme / CP (16 to 19). Do not import PYP concept lists into MYP assessment. The current PYP has seven specified concepts, often still called key concepts: form, function, causation, change, connection, perspective and responsibility.

Reflection was the eighth historical lens, but the Enhanced PYP (2018) treats reflection as a continuous practice across inquiry, assessment and action; many school websites still say "8 lenses" in 2026. Erickson and Lanning (2014) help here because their model separates facts, topics, concepts, generalisations and theory. This matters in teaching and learning because a learner can recite the topic but still miss the transferable relationship that the rubric is meant to assess.

Layer	Example (Biology)	Assessment Implication
Facts	The cell membrane is made of phospholipids	Level 1-2 assessment: recall and identification
Topics	Cell membrane structure	Topic-level description: names components
Concepts	Selective permeability	Level 3-4: describes the concept with detail
Generalisations	Cells maintain internal conditions by controlling what moves across membranes	Level 5-8 target: explains relationships
Theory	Cell theory: living organisms are composed of cells that maintain homeostasis	Level 7-8 synthesis: evaluates across systems

The IB states that learners should work at the generalisation and theory levels, not just at the fact and topic levels. Yet most MYP summative tasks reach only the concept level at best. On the mapping above, a learner who "describes selective permeability" has reached level 3 or 4. When the task asks them to "explain why cells would cease to function if membrane permeability became non-selective", it moves them towards generalisation, where levels 5 to 8 belong.

Grant Wiggins and Jay McTighe (2005) make the same point in Understanding by Design. Their "facets of understanding" model treats transfer as the harder end of understanding: can the learner apply an idea to a context not practised in class? That does not mean asking novices to generalise before they know enough.

Kirschner, Sweller and Clark (2006) warn that unguided problem solving can overload working memory, and Ashman (2021) makes the same point for classroom instruction. In MYP assessment, factual fluency is not the enemy of conceptual work. It is the base that makes transfer fair.

The Thinking Framework Depth Mapping

The Thinking Framework's eight cognitive operations can be used as a local task-design map for MYP achievement levels. They are not an IB scoring scale. Their value is practical: each operation names the cognitive work a learner must perform if the assessment is to distinguish recall, explanation, transfer and evaluation.

MYP Level	Cognitive Demand	Thinking Framework Operations	What Learner Work Looks Like
1-2 (Limited)	Recall, identify	Classify (sort into given categories), Sequence (put in given order)	Lists facts, follows a template, copies structure
3-4 (Adequate)	Describe, outline	Compare (similarities and differences), Part-Whole (break down into components)	Describes with some detail, identifies components, makes basic comparisons
5-6 (Substantial)	Analyse, explain	Cause and Effect (explain why), Analogy (connect to other contexts)	Explains relationships, identifies causes, transfers to new contexts
7-8 (Excellent)	Evaluate, synthesise	Perspective (multiple viewpoints), Systems Thinking (interconnections)	Evaluates competing arguments, synthesises across sources, identifies systemic patterns

This mapping does something that standard Bloom's Taxonomy guidance (Bloom, 1956) does not do for the MYP context: it gives you a concrete cognitive operation to design around, not just a verb to look for in learner writing. "Analyse" is an instruction to learners. "Cause and Effect" is the thinking structure you build the task around.

The distinction matters because Bloom's verbs describe what you want learners to do in their response. The Thinking Framework operations describe the mental structure that makes that response possible. When you design a task that requires Cause and Effect thinking, you are not just hoping learners will analyse. You are building the question so that analysis is the only way to answer it.

Norman Webb (1997) made a similar argument with his Depth of Knowledge framework: the cognitive demand is a property of the task, not the learner's response. A Depth of Knowledge level 1 task cannot produce level 4 evidence regardless of how capable the learner is. The same logic applies here. If your MYP task only requires Classify thinking, you cannot award level 7 even if a learner's response is beautifully written.
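For departments that keep task plans in a spreadsheet or script, the ceiling idea can be sketched in a few lines of Python. This is only an illustration of the mapping table in this section, not an IB scoring rule: the names `BAND_CEILING`, `task_ceiling` and `capped_level` are hypothetical, and the numbers are simply the top of each band in the table.

```python
# Hypothetical lookup: Thinking Framework operation -> top of its MYP band,
# copied from the depth mapping table. A local design aid, not an IB scale.
BAND_CEILING = {
    "Classify": 2,
    "Sequence": 2,
    "Compare": 4,
    "Part-Whole": 4,
    "Cause and Effect": 6,
    "Analogy": 6,
    "Perspective": 8,
    "Systems Thinking": 8,
}


def task_ceiling(operations):
    """Return the highest MYP level a task can evidence, given the operations it requires."""
    if not operations:
        raise ValueError("a task must require at least one operation")
    return max(BAND_CEILING[op] for op in operations)


def capped_level(provisional_level, operations):
    """Cap a marker's provisional level at the ceiling set by the task design."""
    return min(provisional_level, task_ceiling(operations))


# A sorting-and-comparing task cannot evidence more than level 4.
print(task_ceiling(["Classify", "Compare"]))  # 4
# A polished response to a Classify-only task is still capped at level 2.
print(capped_level(7, ["Classify"]))  # 2
```

The point of the sketch is the direction of the check: the ceiling is computed from the task before any learner work is read, which mirrors Webb's argument that demand belongs to the task.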

Designing Assessment Tasks at Each Level

The worked examples below show how each Thinking Framework operation translates into an MYP task. The examples are drawn from different subject areas to show that the mapping is not subject-specific (Wiggins and McTighe, 2005).

Level	Operation	Example Task
1-2	Classify	Sort these ten historical events into the categories 'political', 'economic', and 'social'.
1-2	Sequence	Arrange these stages of mitosis in the correct order and name each one.
3-4	Compare	Identify three similarities and three differences between the causes of the First and Second World Wars.
3-4	Part-Whole	Break down a short story into its narrative components: setting, protagonist, conflict, resolution. Describe what each contributes.
5-6	Cause and Effect	Explain why industrialisation caused urbanisation in 19th-century Britain, then predict where you would expect to see similar patterns emerging in a developing economy today.
5-6	Analogy	A learner says that a cell is like a factory. Evaluate this analogy: which aspects of cellular function does it explain well, and which aspects does it fail to capture?
7-8	Perspective	Evaluate the claim that globalisation benefits everyone by examining it from the perspectives of a multinational company, a rural farmer in a developing country, and an environmental scientist.
7-8	Systems Thinking	A government introduces a sugar tax. Map all the likely effects across health outcomes, food industry behaviour, household economics, and public attitudes. Identify which effects could be self-reinforcing.

Notice how the Cause and Effect task at level 5 to 6 stops learners from answering by description alone. They must explain a mechanism and then transfer that mechanism to a new context. Even so, there is still a validity risk.

Koretz (2014) warns that complex performance tasks can be hard to score reliably. Christodoulou (2017) argues that assessments of broad skills often reward prior knowledge and verbal fluency. For MYP moderation, level 7 to 8 evidence should show the reasoning process, not only a polished written product.

Metacognition matters here because learners can only monitor the thinking they can name. When the task asks for Cause and Effect, Perspective or Systems Thinking, formative assessment can focus on the operation rather than the length of the answer. This gives teachers a cleaner moderation question: what kind of thinking did the task demand, and what evidence shows the learner used it?

The Three-Level Assessment Architecture

Stern, Ferraro and Mohnkern (2017) describe transfer task design for conceptual understanding. It is built around factual, conceptual and transfer questions. For heads of teaching and learning, this three-part structure is useful. It separates content security from conceptual explanation and transfer, rather than treating all three as one grade.

Level	Question Type	MYP Achievement Band	Thinking Framework Operation
1, Factual	Do they know the content?	1-4	Classify, Sequence, Compare, Part-Whole
2, Conceptual	Can they explain the relationships?	5-6	Cause and Effect, Analogy
3, Transfer	Can they apply it to a new context?	7-8	Perspective, Systems Thinking

Every MYP summative task should include all three levels. Learners who answer only factual questions can reach level 4 at most. Learners who also answer conceptual questions show level 5-6 understanding. Learners who complete transfer tasks, by evaluating perspectives or judging system effects, can reach level 7-8.

Most MYP summatives that teachers share at moderation sessions contain factual tasks (architecture level 1) and sometimes conceptual tasks (level 2). The transfer task is either absent or disguised as a writing task ("write a report that evaluates...") without scaffolding the specific cognitive operation being assessed.
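A quick way to audit a draft summative against the three-level architecture is to tag each question with the operation it requires and list which tiers are missing. The Python below is a minimal sketch under that assumption: `TIER_OF_OPERATION` and `missing_tiers` are hypothetical names, and the tier assignments are taken directly from the table above.

```python
# Hypothetical audit helper: which architecture tiers does a draft summative lack?
# Tier assignments follow the three-level table; they are not an IB requirement.
TIER_OF_OPERATION = {
    "Classify": "factual",
    "Sequence": "factual",
    "Compare": "factual",
    "Part-Whole": "factual",
    "Cause and Effect": "conceptual",
    "Analogy": "conceptual",
    "Perspective": "transfer",
    "Systems Thinking": "transfer",
}

TIERS = ("factual", "conceptual", "transfer")


def missing_tiers(question_operations):
    """Return the tiers that no question in the summative addresses, in tier order."""
    present = {TIER_OF_OPERATION[op] for op in question_operations}
    return [tier for tier in TIERS if tier not in present]


# A summative with only sorting and comparing questions lacks two tiers.
print(missing_tiers(["Classify", "Compare"]))
# One question per tier leaves nothing missing.
print(missing_tiers(["Sequence", "Cause and Effect", "Systems Thinking"]))
```

The tagging itself remains a professional judgement made when the task is written; the script only makes a missing tier visible before the assessment is set.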

This creates a clear challenge for novice learners. Conceptual transfer is not the starting point for every class. Kirschner, Sweller and Clark (2006) show why problem solving with little guidance can overload learners who do not yet have enough background knowledge.

A Year 7 learner who cannot recall the causes of urbanisation fluently is unlikely to make a valid transfer argument about a new housing crisis. In that case, the task has not shown higher order thinking. It has exposed weak factual automaticity.

The practical response is not to stop grading conceptual understanding. It is to stop grading it too early. Teach the facts, rehearse the relationships, then assess transfer when learners have enough schema to think with. This protects learning and makes the assessment fairer for SEND and EAL learners, who are often penalised when abstract writing is used as a proxy for thinking.

Wiggins and McTighe (2005) are direct on this point: if you only assess what learners can do with familiar material, you are assessing memory, not understanding. The transfer task does not need to be a wholly new topic. It needs to present the concept in a context the learner has not studied before.

An economics learner studies supply and demand in global commodity markets. They should be assessed on applying these concepts to a local housing shortage they haven't analysed before. That is a transfer task.

Common Assessment Mistakes in the MYP

Three patterns appear often in MYP moderation, and each needs to be named clearly.

Using the verb "analyse" in the task but accepting "describe" in the marking. This is the most common problem. The task says "analyse the impact of...", but markers often accept detailed description as evidence.

To fix this, build the analysis into the task. Ask for a causal claim, a comparison of explanations, or a judgement about evidence quality. Without that limit, the rubric rewards confident writing rather than conceptual understanding.

Over-scaffolding so the rubric does the thinking. Some teachers provide rubric criteria so detailed that the learner only has to match the descriptor rather than construct an original response. At level 7 to 8, descriptors should describe the quality of thinking, not the content of the answer. If the level 8 descriptor includes the arguments the learner should make, it has wrapped a level 4 task in level 8 packaging.

Assessing the product, not the process. In subjects like Design and Drama, teachers sometimes mark the finished artefact rather than the thinking behind it. A beautiful model built from a template shows Sequence thinking at level 1-2.

A model designed to solve a new constraint, tested against criteria and revised using feedback evidence, shows Systems Thinking at level 7-8. The products may look similar, but the cognitive process is entirely different.

Wiliam (2011) argues that assessment works when teachers and learners share a clear view of the goal, the current state and the next move. For MYP teams, that means writing learning intentions at two levels: the surface product learners will produce and the deeper thinking operation they must use. The rubric names achievement. The task has to name the thinking.

Teacher and learners plan inquiry and project work with ATL routines in an International Baccalaureate classroom. — IB MYP Project Learning in Action in practice: learners connect concepts, evidence and project decisions.

Rubric Design Using the Thinking Framework

Place the Thinking Framework operations over standard MYP rubrics. The example below applies them to Year 10 History Criterion B (Investigating), so that each band names the thinking a learner must show.

Before (standard MYP-style descriptor):

Level	Standard Descriptor
7-8	The learner consistently analyses and evaluates a range of sources and demonstrates a thorough understanding of historical significance.
5-6	The learner analyses some sources and demonstrates a substantial understanding of historical significance.
3-4	The learner describes some sources and demonstrates an adequate understanding of historical significance.

After (Thinking Framework-informed descriptor):

Level	TF Operation	Thinking Framework-Informed Descriptor
7-8	Perspective + Systems Thinking	The learner evaluates sources by examining competing interpretations and identifying how the historian's standpoint, context, or purpose shapes the argument. They identify patterns across sources that reveal systemic factors in historical change.
5-6	Cause and Effect	The learner explains how specific sources provide evidence for causal claims. They make explicit connections between historical evidence and the factors they identify as significant, rather than treating evidence as illustration.
3-4	Compare + Part-Whole	The learner describes what sources show and identifies similarities or differences between them. They can break a source down into its component claims but do not yet explain why those claims are significant.

Assessments should show how learners think, not just how much work they finish. The shift from level 5-6 to 7-8 is about cognitive processes. Learners move from explaining causes to evaluating perspectives, weighing evidence and noticing system effects.

This also makes the rubric usable as a formative assessment tool. Partway through a unit, you can ask learners: 'Which level is your current draft at?' A learner can identify whether they are writing at the Compare level or the Cause and Effect level. That is a meaningful metacognitive question. "Is this adequately detailed?" is not.

The SOLO Taxonomy Connection

Biggs and Collis (1982) developed the SOLO Taxonomy, which many teachers know. It has five levels: prestructural, unistructural, multistructural, relational and extended abstract. The Thinking Framework operations can be mapped onto these levels, which makes the two easy to use together in IB programmes.

Where SOLO provides a description of the structure of learner responses, the Thinking Framework provides the operation that produces that structure. A learner at SOLO's "relational" level is performing Cause and Effect or Compare thinking. A learner at "extended abstract" is performing Perspective or Systems Thinking. The two frameworks are complementary: SOLO tells you what you see in the work, the Thinking Framework tells you what to design for.

Teachers often wrongly award level 7-8 for detailed factual knowledge. SOLO multistructural responses list facts without connections (Biggs & Collis, 1982). These belong in level 3-4, despite detail. The Thinking Framework clarifies this better than the IB rubric.

What This Means for Report Writing

One of the most practical consequences of using the Thinking Framework depth mapping is that it makes written feedback specific enough to be acted upon. Hattie and Timperley (2007), in line with Hattie's later Visible Learning synthesis (Hattie, 2009), identify three conditions that make feedback effective. It must address where the learner currently is, where they need to go and how to get there. Generic level descriptions fail all three conditions.

Compare two comments about Sarah. "Sarah understands different map projections and can describe them." Now compare it with: "Sarah shows a strong grasp of different map projections and can judge how useful each one is for showing spatial phenomena."

The first comment stays at the surface. The second points to deeper understanding (Sadler, 1989). Hattie and Timperley (2007) found that feedback affects learner progress. Shute (2008) also found that feedback has a strong effect on learning.

A weak report comment says: Ahmed understands migration patterns well and writes clearly. To improve, he should analyse information more deeply and consider other points of view. It sounds positive, but it does not tell Ahmed what kind of thinking to attempt next.

A stronger comment says: Ahmed links cause and effect in migration when explaining push and pull factors. To reach level 7-8, he should use the Perspective operation by testing the same migration pattern from the viewpoint of the migrant, the receiving community and the policy-maker. The next task should give him practice at that operation, rather than asking for a longer answer.

Ahmed sees which operation he can use now and which one he still needs to develop. The feedback then shows what better work should look like in the next task. Metacognition research describes this as useful feedback (e.g. Nelson & Narens, 1990; Flavell, 1979). The learner understands how to improve, rather than only hearing that the work needs fixes.

These criteria give departments common ground for moderation. Instead of debating whether a response "sounds analytical", ask: "Does this learner use Cause and Effect thinking, or Perspective thinking?" That question focuses observation, discussion and staff training on visible evidence in the work.

What to Try With Your Next Summative

For your next MYP summative task, name the highest Thinking Framework operation assessed at levels 7-8 before writing criteria. Make the task require that operation for learners to achieve the top band.

For Perspective tasks at level 7-8, give learners claims that can truly be read in more than one way. Learners must evaluate interpretations, not just describe them. "Consider viewpoints" isn't enough.

To assess this well, ask learners to judge the evidence behind each viewpoint. They should also explain how the viewpoint could change (Wiliam, 2018).

Then write your rubric level descriptions from the highest band down. Level 7-8 describes Perspective or Systems Thinking. Level 5-6 describes what Cause and Effect thinking looks like in this task. Level 3-4 describes Compare or Part-Whole thinking.

Level 1-2 describes Classify or Sequence thinking. The criteria now form a cognitive ladder, not just a detail scale.

Use this method for the first summative assessment, then compare learner responses with earlier work on the same topic. In 2026, also check the assessment format itself. Large language models can produce fluent generalisations, so written synthesis alone is weaker evidence than it used to be.

Ofqual (2024) treats AI use in non-exam assessment as a live validity risk. UCL (2025) advises clear categories for acceptable GenAI use. Schools can allow AI for planning only when the final judgement still comes from in-room explanation, annotated evidence, oral defence or another format that shows the learner's own reasoning.

Use the current IB MYP subject guides and programme resource centre materials alongside the Thinking Framework when reviewing assessment design across your department.

Key Takeaways

Task design determines ceiling: The cognitive operation in the task sets the maximum level a learner can demonstrate. No matter how capable the learner, a Classify task cannot produce Perspective thinking.
Erickson's generalisation level is the MYP target: Assessing at the fact or topic level is not what the MYP rubric is designed to reward. The upper bands require generalisation and theory-level responses.
The depth mapping is a design tool, not just a marking tool: Use the Thinking Framework operations before you write the task, not after you have received the learner responses.
Three-level architecture: Every MYP summative should contain factual tasks (levels 1-4), conceptual tasks (levels 5-6), and transfer tasks (levels 7-8). Most only have the first level.
Feedback becomes specific: Naming the cognitive operation a learner is using or needs to develop transforms generic level descriptions into actionable next steps.

Limitations and Critiques

The strongest critique concerns development. Conceptual transfer is not a shortcut around knowledge. Kirschner, Sweller and Clark (2006) argue that novice learners need explicit teaching, because open problem solving can overload working memory; Ashman (2021) makes the same case for classroom tasks. If MYP teams ask learners to generalise before they are fluent with the facts, conceptual assessment can widen gaps instead of showing understanding.

A second limit is measurement. Bloom (1956), Webb (1997) and Hattie (2009) help teachers name cognitive demand, but they do not solve the scoring problem. Koretz (2014) warned that complex assessments can lose reliability when broad constructs are judged from small samples of work.

Christodoulou (2017) adds that generic skill assessment often rewards vocabulary, background knowledge and coached performance. In the MYP, a polished transfer essay may therefore measure fluent language as much as conceptual synthesis.

There is also a cultural critique. Some international education models treat abstract reasoning, away from real context, as culturally neutral. Andreotti (2011) and Doherty (2020) question this view. Some learners may show relational or community-based knowledge in ways that do not fit neatly inside analytic rubrics.

AI adds a newer limit. Generative tools can draft responses that sound plausible and conceptual. Ofqual (2024) and UCL (2025) both point to the need for clearer rules here. Even with these limits, the framework remains valuable as a way to design tasks, not as a substitute for subject knowledge, moderation and professional judgement.

References

Bloom, B. (1956). Taxonomy of educational objectives.

Hattie, J. (2009). Visible learning.

Webb, N. (1997). Criteria for alignment of expectations and assessments.

About the Author

Paul Main

Founder & Metacognition Researcher

Paul Main is an educator and metacognition researcher who founded Structural Learning in 2002. With a psychology degree from the University of Sunderland and 22+ years helping schools embed thinking skills, he bridges the gap between educational research and classroom practice. Fellow of the RSA and Chartered College of Teaching, with 128+ Google Scholar citations.
