This is an expanded version of the talk that I gave at ResearchEd on 9 September 2017. In it I argue that Tim Oates, Dylan Wiliam and Daisy Christodoulou, all educationalists whom I admire, have nevertheless got much wrong in their account of the curriculum. 14,000 words. You can bookmark individual slides by right clicking on the “SLIDE X” caption and selecting “Copy link address”. Slides can be enlarged by clicking on the slide.
In this talk, I am going to challenge some of the key orthodox positions that you will hear widely when you go to conferences such as this. In particular, I am going to challenge some of the positions taken by Tim Oates, Dylan Wiliam and Daisy Christodoulou. These are key thought-leaders in our discourse about education at the moment. Although there is much in what they have been saying that I agree with, I believe that the key conclusions that emerge from their arguments are flawed.
I am going to start by discussing the context in which we find ourselves.
Nearly twenty years ago, Ofsted published the Tooley Report, which made a survey of all the papers published in one year in four of the leading educational journals. The report’s findings were one of the key sources cited by Tom Bennett as a reason for setting up the ResearchEd conferences.
Tooley proposed that almost none of the academic research on education was of any use to the classroom teacher. Only 10% of the papers sampled asked how we could improve teaching and learning, only 15% were based on robust quantitative data; and only 36% were free from major methodological flaws. If you ask what proportion of education research is about improving teaching and learning AND being based on robust data AND being free from major methodological flaws, and if you treat these as three independent variables, then the answer would be about ½%. And it got worse. Tooley reported that studies were almost never repeated or contested: researchers tended to work “in a vacuum, unnoticed and unheeded” by anyone else. Views became accepted by a process of “academic Chinese whispers” – a sort of distorted rumour mill – and not through robust debate.
It may be that the number of quantitative studies into teaching and learning has increased. We have seen a lot of talk recently about effect sizes, based on the work of people like John Hattie and the Education Endowment Foundation – but I shall be arguing today that the significance of the quantitative data that we have is still uncertain and that the quality of the debate has not changed very much.
Another part of the context in which we find ourselves is that the performance of our education service is extremely inconsistent. Research by Eric Hanushek, often quoted by Dylan Wiliam, suggests that the best teachers teach in 6 months what the worst teachers take 2 years to teach. That suggests a massive level of inconsistency and under-performance in the system, of a magnitude that would never be tolerated in business or in the health system. Such conclusions have been corroborated recently by the mediocre position of the UK and US in international league tables.
And a third element of the context that I want to start with is what I am calling the current orthodoxy with regard to pedagogy, expressed here by Dylan Wiliam in a recent keynote at the Bryanston Festival of Education. The two things that Professor Wiliam thinks we need to do in order to improve the quality of our education are to focus on our curriculum and on the amount of formative assessment.
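The "about ½%" figure can be checked with a line of arithmetic. The sketch below simply multiplies the three proportions quoted above, under the same assumption made in the text that the three properties are independent; the function and variable names are illustrative only.

```python
import math

def joint_proportion(*proportions):
    """Proportion of papers having every property, assuming independence."""
    return math.prod(proportions)

# Proportions quoted from the Tooley Report in the paragraph above
about_teaching_and_learning = 0.10
robust_quantitative_data = 0.15
free_from_major_flaws = 0.36

joint = joint_proportion(
    about_teaching_and_learning,
    robust_quantitative_data,
    free_from_major_flaws,
)
print(f"{joint:.2%}")  # 0.54%, i.e. roughly half of one per cent
```

If the three properties are in fact correlated (for instance, if robust data and sound methodology tend to go together), the true proportion would differ from this product, so the figure is best read as a rough indication.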
The findings of the Tooley report were bad. The inconsistency of our provision is bad. But these basic conclusions sound plausible at the very least. Stating the argument at this high level, I agree with Dylan. But what these high level statements actually mean and why the curriculum matters is the subject of this talk.
Before I talk about the curriculum, I want to talk about formative assessment, also known as responsive teaching or assessment for learning.
This is Professor Rob Coe talking on the subject of Assessment for Learning at the first ResearchEd conference in 2013, saying that in spite of all the research and all the money spent on promoting assessment for learning, it has had, and I quote, “no impact at all” on educational outcomes. So we cannot continue to advocate assessment for learning without also acknowledging that for 10 or 15 years, assessment for learning has been promoted and funded to no effect. We cannot continue to take the argument for formative assessment seriously unless we also seek to explain this failure of implementation. So in the next section, I will do just that.
Although it is not entirely clear what formative assessment refers to (an issue that I will address in my next post), a fairly safe starting point is to accept Wikipedia when it says: “Feedback is the central function of formative assessment”. If you are going to give feedback to students, you have first to stimulate them to do something. You have to ask them a question, set them a problem, or give them a creative opportunity. In other words, you have to design a learning activity. And this is not always easy.
Tim Oates reports that when the Expert Panel for the National Curriculum Review consulted teachers in 2011, there was widespread opposition to the idea of more practice in the Maths curriculum because most teachers assumed that practice was all about dull repetition. But it isn’t. In Singapore, Oates suggests, the practice provided in textbooks is challenging and interesting. The reason why UK Maths teachers oppose the idea of practice is that they are themselves so bad at designing interesting exercises. And without interesting practice, without activity, there is no possibility of giving students useful feedback or assessing what they need to do next.
The second problem with assessment for learning is that assessing students accurately is difficult.
The 2015 Commission for Assessment without Levels, on which Daisy Christodoulou sat, recommended that summative and formative assessment should be treated quite separately. And in making this argument, the report makes an assumption that accuracy matters for summative assessment but not so much for formative assessment. It says, for example, that standardized tests – by which it means summative exams – “can offer very accurate and reliable information”. In fact standardization does nothing to improve accuracy, it merely ensures the comparability of results. And in fact, our summative exams are not accurate at all: it is commonly estimated that between one fifth and one quarter of our SATs level allocations are wrong.
The way you improve reliability (which is about the consistency of results, being an important prerequisite of accuracy) is by repetition. Any political pollster will tell you that you will not get a very accurate poll if you ask the opinion of only one person – you need to aggregate the responses of a large number of respondents. Our summative examinations are inherently unreliable because they depend on single-sample snapshots of performance.
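The pollster's point can be illustrated with a small simulation. This is only a sketch under stated assumptions: every simulated student has the same hypothetical true score of 60, each assessment adds independent random noise with a standard deviation of 10 marks, and the figure of 40 repeated assessments is chosen purely for illustration. None of these numbers comes from real exam data.

```python
import random
import statistics

def spread_of_average(n_assessments, true_score=60.0, noise_sd=10.0,
                      trials=2000, seed=0):
    """Standard deviation of the averaged score across simulated students
    who all share the same (hypothetical) true score."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    averages = []
    for _ in range(trials):
        # Each assessment is the true score plus independent random noise
        scores = [rng.gauss(true_score, noise_sd) for _ in range(n_assessments)]
        averages.append(statistics.fmean(scores))
    return statistics.stdev(averages)

single = spread_of_average(1)     # a single-snapshot assessment
repeated = spread_of_average(40)  # the average of many assessments

print(f"one assessment: spread of about {single:.1f} marks")
print(f"forty assessments: spread of about {repeated:.1f} marks")
```

Under these assumptions the spread of the averaged result shrinks roughly in proportion to the square root of the number of assessments, which is why a result built from many observations is more consistent than a single snapshot. Real assessments will not have fully independent errors, so the improvement in practice is likely to be smaller.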
When it comes to formative assessment, the Commission makes no mention whatsoever of the need for accuracy, while at the same time claiming that results need not be recorded. If you do not record results, you cannot aggregate or corroborate results – you are again committing yourself to a highly unreliable, single-shot form of assessment. The Commission says that the only thing that matters about formative assessment is that its results are acted on. This is an absurd statement. If the results are inaccurate, then the action will be wrong. Surely they aren’t saying that any action will do? Surely they want teachers to modify their teaching in ways that will improve learning and not reduce it? But if that is the case, then the accuracy of formative assessment matters.
The Commission, and the profession more widely, is wrong to assume that accuracy matters more for summative assessment than for formative. If my summative assessment is wrong, I will walk away with the wrong bit of paper at the end of my course. But if my formative assessment is systematically wrong, then the quality of my teaching and learning itself will be damaged. Inaccurate summative assessment creates paper errors; inaccurate formative assessment does real damage to people’s real education. The only reason why people are able to maintain that this is not so important is that the damage is invisible. It is not recorded. It is, in the language of the intelligence agencies, deniable.
The reason that the Commission for Assessment without Levels seeks to separate formative and summative assessment is that it perceives that what is commonly called the “validity” of summative assessment – whether it attempts to assess the right things – is weakened by its attempt to compensate for its inherent lack of reliability. It does this by becoming increasingly formulaic and reductive and the Commission does not want formative assessment to be contaminated by poorly conceived summative assessment practice.
But this is a weak argument. First, it is unrealistic. Do we really expect teachers not to teach to the test? Are they to tell their students, “I don’t care if you fail your exams – I will teach you something much more important”?
Second, if summative exams are formulaic and unreliable, then this is in itself a problem. Shouldn’t we be trying to solve the root problem with summative assessment, rather than merely to contain it?
Tim Oates recognises the problem with summative assessment. Cambridge Assessment “could make GCSEs incredibly reliable”, he says (implicitly admitting that they are not very reliable at the moment) – but only by making them “long and incredibly expensive”. This argument acknowledges the importance of repetition and corroboration that I mentioned a moment ago. Oates assumes that it would not be realistic to make exams 40 hours long. But he would be wrong, if we were to derive summative results from formative assessments, ignoring the recommendations of Wiliam and Christodoulou and the Commission for Assessment without Levels that formative data should not be recorded. If we created really good practice activities and constantly monitored student performance on those activities, we would improve the learning of students, we would improve the opportunities for formative assessment, and we would improve the validity and reliability of our summative assessment all at the same time. 40 hours of practice is not too long because practice is the very stuff of a good education. Teachers do not have the expertise or resources to produce and manage such extensive programmes of formative assessment – but that is no reason why specialist providers could not do it, if we gave them the opportunity.
So the second difficulty with formative assessment is that it is difficult to achieve accuracy in our assessments and that, contrary to the general perception of educationalists, we need our assessments to be accurate if our interventions are going to be effective.
A third difficulty with formative assessment is that it implies personalisation, which is difficult to manage. The need for personalisation is regularly denied by the proponents of Assessment for Learning, such as Dylan Wiliam, who suggest that all that is required is a couple of oral questions to a class, allowing the teacher to derive a general impression of whether a class has “got it” or not. But actions are personal, misconceptions are personal and it follows that feedback and progression need to be personal as well.
One of the key recommendations of the Commission for Assessment without Levels was for Mastery Learning. It did not explain what it meant by “Mastery Learning”. But whatever it was, it said that “high quality research” had found that such an approach produced “consistent and positive impacts”. It supported that statement with references to an Encyclopedia by Guskey and a meta-analysis by Kulik.
Mastery learning was an idea publicised from the 1960s, principally by Benjamin Bloom (of “Bloom’s taxonomy” fame). It involved setting a threshold test at the end of every topic and ensuring that the class did not move on until everyone had passed the test. Those who failed the test at the first attempt were given remedial teaching and the chance to re-take. Kulik published a meta-analysis in 1990, which claimed that the method was very successful – but this conclusion was very seriously challenged by Robert Slavin’s 1996 article, Mastery learning revisited.
Slavin pointed out that Mastery Learning required “enormous amounts of corrective instruction” for students who did not pass the threshold test – and this involved two fundamental problems. First, this amount of remediation “could never be applied in real classrooms”. Second, it invalidated the research that Kulik was reporting. It is not surprising that students who were instructed according to Mastery Learning principles did better than those who were not, as “the total instruction time provided was sometimes two or three times more in experimental groups than control groups”.
If the Commission for Assessment without Levels was not aware of the Slavin article, it only had to refer to Wikipedia to learn that the technique had generally been abandoned because of “the difficulty of managing the classroom when each student is following an individual course of learning”. The Commission for Assessment without Levels recommended Mastery Learning without explaining what it is, without acknowledging that it is almost impossible to implement, and while justifying its recommendation by referencing discredited research.
I am not saying that the principles of Mastery Learning are not worth looking at carefully but like all formative learning, it implies personalisation, and personalisation is very difficult to manage.
A fourth problem with formative assessment is the need for timeliness of feedback. Take a basic learning interaction, like riding a bicycle. The student gets on the bicycle and pushes off – that is the action. The bicycle wobbles and the student falls off – that is the reaction that closes the feedback loop. The student reflects on what happened, and tries again. This basic feedback loop echoes what Professor Diana Laurillard has called the conversational framework – though in the case of a bicycle, the conversation is of a non-verbal kind. The learning happens because of the misalignment of the reaction and the student’s expectation, and because of the association that the student is able to make between exactly what was done and what happened as a result. There may be several iterations of reflection, some occurring immediately and some being delayed (a point I will come back to later); some sorts of reflection and consolidation may happen in your sleep or months later in the middle of the summer holidays. But if there is a long delay between action and reaction, then it will be difficult for the student to make an association between the two and the learning opportunity will be lost. The opportunity for a quick iteration of attempts – to get back on the bicycle and try again – is also lost. Rapid feedback is therefore required – yet this requires individual, real-time interaction between student and the instructive other with which the interaction is occurring. In a class of 30 students, learning abstract concepts or thinking skills, the instructive other is likely to be a teacher and achieving that speed of response becomes almost impossible.
The fifth problem is the sensitivity of learning to different types of feedback.
Dylan Wiliam often cites the Education Endowment Foundation research to make the case that feedback is one of the most effective interventions that a teacher can make. At the same time, he acknowledges that the effectiveness of feedback varies widely: sometimes it is super-effective; but in nearly 40% of studies, feedback was actually counter-productive. Feedback is not a commodity that comes in a vat and can be sprayed around the room indiscriminately. To be useful, not only do you need to assess the student’s current state of knowledge accurately, you also need to choose the right sort of feedback to provide. That is difficult to do, especially when you might be having to give different types of feedback to 30 different students in a class.
The best sort of feedback, according to Dylan Wiliam, is something that he does not even call feedback, although I would. It is when the teacher provides, not an evaluation of the student’s performance, or criticism, or advice on how to do it better – but another activity, specially chosen to allow the student to focus on whatever is causing difficulty. You could call it remediation; in edtech circles, it might be called adaptive sequencing; Professor Wiliam tends to call it responsive teaching. According to Professor Wiliam, the research suggests that it is twice as effective as evaluation or criticism.
It is worth noting that better use of adaptive sequencing would also resolve many of the problems that have been pointed out with differentiation. This isn’t about putting people in the top set or the bottom set, with all the harmful messaging that that involves. It simply says to the student, “this is what you did last” and it therefore follows that “this is what you need to do next”.
If we recognise the value of adaptive sequencing as a way of implementing formative assessment and ask how difficult it would be to put into practice, we will find that we are brought back to the first of our challenges, which was to provide a well-designed learning activity. Not only do you need a well-designed learning activity to provoke the student to her first action; you need an extensive collection of well-designed learning activities to allow multiple, individualised learning pathways to be provided to different students, depending on their different needs.
So if there is a single thing on which almost everyone in education circles seems to be agreed at the moment, it is that formative assessment is our motherhood and apple pie – and yet no-one seems to be addressing the fact that in practice it doesn’t work, and no-one seems to be asking why it doesn’t work or trying to do anything about it. The truth is that to implement formative assessment at scale represents a massive logistical challenge.
Another bit of motherhood and apple pie, here voiced by Michael Wilshaw, is that “a school is only ever as good as its teachers”. The teacher is the essential, irreducible unit of supply in our current educational system.
I suspect that the reason that politicians say this so often is that it appears flattering to teachers, and it is always a good idea to flatter those on the front line, on whom all the top brass ultimately depend. But in this case the flattery is a poisoned chalice because, when things go wrong (as they often do), you have to conclude from this premise that it’s all the teachers’ fault. There is no-one else to blame.
The assumption that all you need is a good teacher goes back to Socrates. This is Raphael’s School of Athens and Socrates is the ugly guy in the green cloak, top left, talking to a rather coy and effeminate young man, watched by a crowd of onlookers. The Socratic method consisted of a conversation between an expert tutor and a very few students – normally one or two. It was a highly responsive dialogue, completely centered on an all-absorbing dialogue with a single teacher. It is a method that ticks all the boxes for formative assessment: it is highly interactive, the feedback is timely, it is personal, and it challenges the student to further intellectual efforts.
As far as instructional methodologies go, the Socratic dialogue is still the gold standard. Four days ago, Oxford and Cambridge were rated the best two universities in the world. They both stick to a tutorial system that faithfully implements the Socratic method: one or two students in conversation with a leading expert. There is only one thing wrong with the Socratic method – there are not enough leading experts, and certainly not enough to deliver a comprehensive education system with a set-size of 1 or 2. The method doesn’t scale and is therefore intrinsically elitist.
Tim Oates recently said that the best learning resources were still Nuffield Combined Science and SMP Maths, both of which were created in the 1970s as a response to some of the problems that I have been discussing. It is worth noting, in passing, this extraordinary statement, which suggests that in those fifty years, in which publishing has experienced its most important revolution for five hundred years, the education system has not managed to make any significant improvement to its educational resources at all. It is another indicator that there is a serious dysfunction in our education system.
But the point I want to make about Nuffield comes from Kim Taylor, the headmaster of my old school, Sevenoaks, who went on to become Director of Learning Resources at Nuffield and wrote a book, Resources for Learning, as a justification of the Nuffield programme.
Taylor argued that the real problem with comprehensive education had nothing to do with the principle of selection: the challenge was how you were going to handle scale. He observed that education was an extremely labour-intensive business. It is almost entirely dependent on “workers and overseers” and hardly at all on “machinery and equipment” or the “tools of the trade” that could help teachers do their job.
The problem with this model is that “the craftsmen we need are going to be in scant supply”. And in this he was prophetic – we have ever since suffered from a chronic shortage of teachers, particularly in the most economically valuable subjects, and these shortages are only going to get worse, as is the problem of workload for those who remain. This is yet another symptom of our dysfunctional system.
Worst of all, the way that we organise our scarce and expert teachers is inefficient. Individual teachers are essentially left to get on with it themselves. As Taylor puts it, “There is not much to counterbalance the skill, or lack of skill, of the individual teacher”…
…who will almost certainly fail to achieve the almost impossible task that is expected of him, requiring as it does “too many performances…prodigies of co-ordination, busking his restive audience like a one-man band”.
The medical service relies on a series of overlapping roles between consultant, registrar, doctor, nurse, technician and pharmacist, allowing supervision, teamwork and on-the-job training. When it comes to teaching, by contrast, “It is hard to think of any other trade in which such isolation persists”. It was in an attempt to improve this model of the isolated and unsupported teacher that Nuffield produced its programme of course materials, which was intended to provide teachers with the “tools of the trade” that they needed to do their job consistently and at scale.
This slide summarizes the position that Taylor describes. At the top, government has overall responsibility for the education service. In the middle are the systematic elements of the service, what Tim Oates refers to as policy instruments: to illustrate this concept I have chosen textbooks, assessment and training – though there are others too, like inspection and curriculum.
At the bottom are teachers, who in our present system are isolated and unsupported, generally being expected to get on with things themselves, determining their own objectives, devising their own course materials, their own formative assessments, being the main source in the classroom of authoritative feedback, and often being resentful of what they see as unhelpful interventions in their private domain by education authorities. As we have already seen, these expectations are unrealistic, given the logistical complexity of the task that teachers face. It is these unrealistic expectations that result in a service with an extraordinarily inconsistent performance, as well as high workload and stress for teachers, which aggravates the wastage of staff and the overall under-performance of the system.
Tim Oates, like Kim Taylor and the Nuffield Institute in the 1970s, argues that we should place more emphasis on improving these systematic elements, and particularly textbooks, not in order to replace teachers but to give them more support.
Oates makes this argument by looking at the international comparisons. When asked whether they base their teaching on a textbook, only 4% of UK Science teachers and 10% of UK Maths teachers answer yes. But in jurisdictions at the top of the PISA tables, the results are very different. In Singapore, the equivalent answers are 68% and 70%, and in Finland – a country often cited for the quality of its teachers – the answers are 94% and 95%.
This slide shows not just an association between the use of textbooks and high-performing education systems – it also indicates the prejudice against textbooks in the British profession. A teacher who relies heavily on the textbook, it is widely assumed, is a bad teacher. And if it is a bad textbook, they might be right.
This deprecation of textbooks is supported by the widespread view, strongly advocated by Dylan Wiliam, that the teacher’s expertise is at heart a matter of intuition, which cannot therefore be systematized.
Wiliam bases his argument on three pillars.
First is what in ancient Greek is called phronesis, translated as “practical wisdom”. This is taken from a recent interpretation of Aristotle by Bent Flyvbjerg, which is used to argue that:
- first, the expertise of teachers consists in the determination of your own objectives (that’s why this is about wisdom and not just technical ability);
- second, that this can only be done on the basis of your own private experience, which has primacy over any abstract theory.
The second pillar is Polanyi’s theory of tacit knowledge. This is the sort of knowledge that we can’t read in a book but that we have to develop ourselves, again, through our own private experience. Polanyi talks about maxims. These can be read in books but they only make sense to those who are already possessed of the knowledge of the art. Maxims can help codify or reinforce our understanding but only when we have already acquired the basic knowledge through private experience.
Third is Csíkszentmihályi’s [pronounced “cheek-sent-me-high’s”] theory of flow, or “being in the zone”. This is the idea that to be really good at something, we often need to stop trying so hard. We need to relax into a state of proficiency.
All three of these theories come together, as I see it, to give a good account of performance art. A concert pianist does not think about what all her fingers are doing on the keyboard: that is tacit knowledge that she has already acquired by a lifetime of practice. At the time of performance, she forgets the mechanics and focuses instead on the end goal – the particular sort of artistic expression that she is aiming for. She depends on automatic, non-conscious skills, focuses on ends rather than means, and relaxes into a state of proficiency.
This might be a helpful way to think about teaching if you are a teacher working in the current system, in which you are almost entirely dependent on your performance in the classroom. And it will continue to be a helpful way to think about teaching if you think that teaching will remain a performance art. If you think that Michael Wilshaw is right, that it is all about the individual, front-line teacher and her performance. If you are happy with our highly decentralized organisational model, which depends on isolated and largely unsupported teachers. If, in placing such an emphasis on individual teachers, you are prepared to tolerate the extremely inconsistent performance of the system as a whole. In that case, you will be happy to accept an account of teaching which sees it as a kind of performance art.
But if you agree with me that we should be aiming for more consistent outcomes; if you agree that formative assessment makes logistical demands that isolated and unsupported teachers are unable to meet, then you may think that such a focus on the performance of teachers is unhelpful. Then, you may agree with Kim Taylor, that we place an expectation on teachers of “too many performances” and “prodigies of co-ordination”, busking a “restive audience like a one-man band” – and that these demands for “too many performances” do not always make teaching a very attractive proposition for practitioners, either.
These are the reasons for rejecting Professor Wiliam’s emphasis on intuition: it doesn’t work; it doesn’t lead to consistent outcomes; it cannot meet the logistical requirements of teaching at scale; and it doesn’t create the sort of satisfying working environment that will attract and retain high quality teachers.
It is also relevant that Flyvbjerg’s interpretation of Aristotle, on which much of the theory depends and which has been taken up by a series of contemporary educationalists, including Frank Furedi and Gert Biesta, is based on a serious misunderstanding of Aristotle. I have dealt with this issue in two of my blog posts, Aristotle’s phronesis misunderstood and Flyvbjerg, phronesis and the expertise of teachers. If you ask who am I to say that I am right and half of the professors of education in this country are wrong, then my argument has at least been endorsed by Professor Kristján Kristjánsson, Professor of Virtue Ethics at Birmingham University and an Aristotle expert, who commented on my blog that “I am very impressed with your rebuttal of common misunderstandings of phronesis in educational circles…I have made many of the same points myself”.
It is not that anyone here is necessarily that interested in Aristotle. The reason that our understanding of Aristotle matters is that what Aristotle said was right. He uses the example of a saddle-maker. The saddle-maker’s expertise lies in how to make a good saddle. It doesn’t consist in what constitutes a good saddle – that is part of the expertise of the cavalryman, the saddle-maker’s customer, who knows what is needed to ride a horse into battle. And while the cavalryman’s expertise lies in how to fight from horseback, it is not the cavalryman but the military general whose job it is to know what constitutes an effective cavalry charge. At each stage, the expertise of the supplier of a service lies in how to deliver that service but it is not for them to determine why. And it is the same for teachers. What we should teach in an academic schools’ maths curriculum is for the university maths professor to say – the next link in the supply chain; what skills are needed to boost the employability of students is for employers to say – the ends of education are for others to decide, not teachers. To argue from Aristotle’s theory of phronesis that every teacher should decide on objectives independently is to turn Aristotle’s theory on its head. And it is to turn common sense on its head too.
So we have seen that the emphasis on tacit knowledge and intuition, which Dylan Wiliam advocates, justifies a highly decentralized model of provision that is incapable of meeting the logistical demands of effective forms of formative assessment. We have also seen that the conflation of means and ends, that such a doctrine of intuition implies, is inconsistent with basic ethical principles and a proper understanding of the nature of expertise.
This brings me at last to consider curriculum, which is, after all, the subject of this talk.
A good place to start is with this true and important statement by Professor Wiliam, that “the word curriculum has no generally agreed meaning”. I started by saying that our educational system was dysfunctional – and I think this is another good indicator of that dysfunctionality. Saying that educationalists have not yet agreed what they mean by “curriculum” is a bit like saying that physicists have not yet agreed what they mean by “acceleration”. How can you build any coherent body of theory when you have not even defined your most elementary terminology?
I would suggest that there are at least four common meanings which people give to the word “curriculum”. When you hear politicians use the word, perhaps in House of Commons Committee rooms, they are generally referring to what is on the school timetable. If you ask “is ‘citizenship education’ in the curriculum?”, you mean “do students get any lessons in citizenship?”. That is fairly uncontroversial but it is dealing with the issue at a high level. And it reinforces the model in which education is delivered by isolated and unsupported teachers. Yes, Jo Bloggs gets one period a week on citizenship with Mrs Jones – and he goes to the right classroom at the right time and we shut the classroom door and at that point the school authorities wash their hands of the whole business: what happens after that is up to the teacher, relying on intuitive judgments and personal experience, to decide both the detailed ends and the methodology that will be deployed to attain those ends.
If we delve any deeper, we find there are two radically different understandings of what the curriculum is.
The first I would define as “an aggregation of learning objectives” and when I talk about “learning objectives”, I am talking about the knowledge, skills, and understandings that we want the students to acquire. In my view, this is the most useful definition and the one that matches most closely our common-sense understanding. The curriculum is “the stuff that you are being taught” and by “stuff”, most people mean “knowledge, skills and understandings”.
But in the last fifteen or twenty years, this definition has been rejected by educationalists. According to Wiliam, it started with Stenhouse, who argued that setting predetermined learning objectives was reductive. It prevented students acquiring skills of originality and creativity and developing according to their own lights. We should instead give students experiences and let them make of those experiences what they like. In a phrase that is often repeated by Professor Wiliam, Stenhouse argued that setting predetermined objectives “made the teacher an intellectual navvy, knowing where to dig trenches without knowing why”. In fact, the opposite is the case: it is only by setting pre-determined objectives that you can communicate why you are doing anything at all. Without objectives, there is no reason to dig any trenches at all and there is no criterion by which you can judge whether the trenches you have dug are the right ones or not. Without objectives, the whole of education becomes, quite literally, aimless.
The argument about intellectual navvies also conflates the matter of understanding the objective and deciding on the objective. It is very important that everyone in a business understands why they are doing what they are doing, so that they can respond to unforeseen events and optimize their performance. But this is not the same as saying that it is for the supplier, still less for the individual worker, to decide the purpose of the work being done. That would be like telling everyone to dig trenches wherever they like, which is not a good way to build a railway.
The upshot of this very flaky argument, according to Wiliam, is that we should “specify content rather than objectives”.
The frequently-used word “content” is another indicator of a poorly defined terminology. It is almost completely meaningless. The contents of a handbag are quite different from the contents of a speech; the contents of a book are different from the contents of a lesson, which are different from the contents of an assessment, which may or may not be different from the contents of a curriculum (depending on what “curriculum” is taken to mean: according to my definition, curriculum content is an aggregation of objectives). Never talk about content unless it is clear what the stuff you are talking about is contained in. But let us assume that by “content”, Stenhouse means “content of a lesson” and that this means some sort of activity.
And so it is that the modern, official meaning of “curriculum” has morphed from an “aggregation of learning objectives” – the knowledge, skills and understandings that we want students to acquire – into a “programme of planned activities”. This is the sense in which Ofsted uses the term when it inspects schools on their curricula. And it is the sense in which the term is used by the 1999 National Curriculum, which states [p10] that “The School Curriculum comprises all the learning and other experiences that each school plans for its pupils”. This extraordinary statement goes so far as to suggest that learning itself is a type of experience and does not therefore describe the acquisition of knowledge, skills and understanding. Such is the Alice-in-Wonderland world into which the whole education system is plunged when we are not clear about the meaning of the words that we use.
The change in meaning of the word “curriculum” has left everyone confused. Dylan Wiliam and Tim Oates, who were on the expert panel advising Michael Gove on the curriculum in 2012, both believe that “the National Curriculum is not really a curriculum at all” – because the National Curriculum is a statement of knowledge, skills and understandings, and Wiliam and to some degree Oates believe that a real curriculum should be a programme of planned activities. Their report to the Secretary of State used the word “curriculum” in both senses, almost indiscriminately. And it is clear, when you listen to the House of Commons Education Select Committee taking evidence from educationalists, that for most of the time the politicians have not got a clue what the educationalists are saying, because they are talking in a convoluted, inconsistent, private language.
In supporting a definition of curriculum as a “programme of planned activities”, Dylan Wiliam emphasizes that “curriculum is pedagogy”, pedagogy in this context meaning the method of instruction (in contrast to many Marxist educationalists, who understand pedagogy as an approach to education that is based on political ideology). It is important to note that this formula conflates the ends and means of education. We don’t really have a word any more that describes our educational objectives – all we talk about is what we do, the activities we plan, the means we employ. As the purpose of these activities is undefined – or at best, determined by the individual intuition of hundreds of thousands of different teachers – there is no way of determining whether those activities are well chosen or effective. There is no answer to the question “what works?” because there is no shared statement of what we are trying to achieve.
So the really important confusion is created by definitions 2 and 3: an aggregation of learning objectives and a programme of planned activities.
Tim Oates, whose view on this matters because he was the Chairman of the Expert Panel for the National Curriculum Review in 2011, references a fourth definition, according to which the curriculum is an aggregation of policy instruments – and by policy instruments, I am talking about that middle layer in my organisation diagram: training, textbooks, assessments, inspection etc. The curriculum, according to Oates, is more than just a programme of planned activities; it is everything we do to deliver our educational services.
That is what he explained in a talk at ResearchEd 2016 called “Why curriculum matters” – to which this talk is a direct response.
A lot of Tim Oates’s argument about the curriculum hinges on the term “curriculum coherence”, which he has taken from Professor Bill Schmidt, who worked on the data from the TIMSS Maths tests in the 1990s. You will see a lot of talk about “Bill Schmidt’s work on curriculum coherence” not just in what Oates writes but also, for example, in the Commission for Assessment without Levels. But it is clear that, just like the games of academic Chinese whispers that Tooley observed in 1998, no one actually understands what Schmidt meant by “curriculum coherence” because everyone is copying the explanation given by Oates, who got it wrong.
Describing “curriculum coherence” as “a highly precise technical term” [Could do better, p4], Oates explains that it has two key characteristics.
- First, “content” (whatever that is) is arranged in a way that matches age-related progression. Six-year-olds should either learn the things that six-year-olds are ready to learn or they should do the sorts of activities that six-year-olds are ready to do.
And in passing, I will note that there has been a lot of criticism of this sort of approach, which rests in part on the work of Jean Piaget and his idea of readiness. Maybe some people can do at 6 what other people have to wait until they are 12 before they can manage. Any idea of age-related progression seems to have the effect of holding back our brightest students and giving an excuse for low expectations.
- Second, Oates observes, the coherent curriculum is one in which all the elements of the system, or policy instruments, line up. This is where we get the fourth definition of curriculum, which is an aggregation of policy instruments which, insofar as they are coherent, will be aligned.
To justify this definition, Oates references Schmidt’s 2006 paper, Curriculum coherence and national control of education.
But in this paper, Schmidt specifically denies that curriculum coherence is principally about the alignment of instruments. “Most of the studies”, he says, “have defined curriculum coherence as ‘alignment’. This is an important criterion, but we argue that…it is not a sufficient one”. You might say that Schmidt is allowing that alignment is at least part of the story – but this would over-interpret Schmidt’s begrudging concession that alignment is “an important criterion”. At best, I think that he is suggesting that alignment is a necessary precondition of any discussion about curriculum because in the absence of any clear statement of educational objectives, those objectives can be inferred by observing what people are actually doing. If you visit a rifle range and cannot see what the targets are, you might lean over someone’s shoulder, look along their sights, and see what they are aiming at. In that way, educational objectives might be inferred by looking at textbooks, on the assumption that the two are aligned. If you read the whole of this paper, which is the one that is always cited by Oates and those he has influenced, you will understand that it is not concerned to define what curriculum coherence is, but only to ask whether central state control is necessary in order to achieve it. In this sense, Oates is referencing the wrong paper. It is not in the 2006 paper that Schmidt gives his account of curriculum coherence, but in two earlier papers.
In his 2002 A coherent curriculum, Schmidt explicitly defines “standards and curricula as coherent if they are articulated as a sequence of topics and performances that are logical and reflect the sequential or hierarchical nature of the content”; and he makes clear that he is not using coherence to mean alignment by referring to the issue of alignment separately, complaining, for example, that “American students and teachers are greatly disadvantaged by our country’s lack of a common, coherent curriculum and the texts, materials, and training that match it” [i.e. are aligned to it].
Note that in America, “standards” are what we would call criterion references or statements of attainment – in other words, learning objectives. So what Schmidt means by “curriculum” is not activities and certainly not policy instruments, but learning objectives – the knowledge, skills and understanding that are revealed by the student’s performances. Note too Schmidt’s use of the word “performances”, which I shall come back to.
There is no mention here, nor anywhere else that I can find in Schmidt’s papers on this subject, about “age-related progression”. What he is really talking about is the intrinsic structure of the knowledge itself.
In this 2002 paper, and his earlier 1997 paper, A splintered vision, Schmidt criticised the American maths curricula as comprising “long laundry lists of unrelated topics”, “a mile wide and an inch deep”. The point about this, the best-known of Schmidt’s analogies, is that a thin film of water has no structure. Curriculum coherence is all about the way that your different learning objectives are structured and how they relate to each other.
And the best way to start building that structure is to identify the really important, organising principles.
Dylan Wiliam gives what to my mind is the correct explanation of curriculum coherence. It is, he says in his Principled Curriculum Design, about “the internal logic of each discipline or subject” whose structure will best be revealed if you identify the “big ideas” of that subject. So far, so good. And he goes on to say that identifying these big ideas is “a very difficult task, requiring profound subject knowledge” as well as “substantial teaching experience”. I completely agree. But, first, let’s note that we are now talking about the curriculum as an aggregation of learning objectives and not, as Wiliam earlier argued in the same book, as a programme of planned activities. So one of my principal criticisms of Wiliam is inconsistency in the way that he defines a word to mean one thing and then uses it to mean something else.
Second (and remembering Wiliam’s emphasis on private intuition), why do we expect hundreds of thousands of individual teachers to come up with their own different accounts of the curriculum when it is clear from what Wiliam says that they are not qualified to do this? Think of the Maths teacher who is being asked to teach Computing on the basis of a pretty superficial understanding of the subject, or the history teacher who is covering a new topic by staying a few pages ahead of the class in the textbook: how can it be reasonable to expect these teachers to complete this extremely challenging task, which must be done on the basis of “profound subject knowledge”?
And why do we need to ask tens of thousands of front-line teachers to reinvent this wheel, probably badly, over and over again? We do not need students in Hull to learn a different sort of Maths from students in Croydon. There is no good reason at all to adopt such an inefficient way of devising coherent curricula. These processes need to be centralised.
The difficulty we have if we want to centralise the process of curriculum design (understanding curriculum as an aggregation of learning objectives) is that the way that we have attempted to describe our learning objectives through criterion references, or rubrics, has been shown to be unreliable.
Anyone who has been attending these conferences will probably have heard from Dylan or Daisy about the inadequacy of rubrics. This argument has frequently been made by reference to an imaginary rubric which might require students to “compare two fractions and identify which is larger”. Research from the early 1980s shows what is predictable enough, that the difficulty of this task is entirely dependent on which fractions the student is asked to compare: 90% of 14-year-olds can tell which is larger of 3/7 and 5/7 – but only 15% of 14-year-olds can tell which is larger of 5/7 and 5/9. The research doesn’t say what proportion of 14-year-olds can tell which is larger between 3/7 and 5/9, where both numerator and denominator are different – perhaps only 1% or 2%. The point is that the rubric “compare two fractions and identify which is larger” gives no indication whatsoever of the difficulty of the task being prescribed or the level of understanding required by the student.
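To make the point concrete, here is a minimal sketch in Python of the three kinds of item discussed above. The labels are my own shorthand and not taken from the research; the code only checks the arithmetic. It says nothing about difficulty, which is exactly what the rubric also fails to say.

```python
from fractions import Fraction

# Three item types that all satisfy the same rubric wording:
# "compare two fractions and identify which is larger".
# The labels are illustrative shorthand, not terms from the research.
pairs = {
    "same denominator": (Fraction(3, 7), Fraction(5, 7)),
    "same numerator": (Fraction(5, 7), Fraction(5, 9)),
    "both different": (Fraction(3, 7), Fraction(5, 9)),
}


def larger(a, b):
    """Return the larger of two fractions, using exact arithmetic."""
    return a if a > b else b


answers = {label: larger(a, b) for label, (a, b) in pairs.items()}
for label, winner in answers.items():
    print(f"{label}: {winner} is larger")
```

The third case is the only one that cannot be settled by inspecting a single number: it needs a common denominator (27/63 against 35/63) or an equivalent strategy, yet the rubric treats all three items as the same task.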
The first thing to be said about this argument is that the rubric “compare two fractions and identify which is larger” does not come from any real curriculum. It was invented by Dylan Wiliam specifically for the purpose of illustrating what a really bad rubric might look like. That is fine if you are illustrating a potential problem with rubrics. But to argue from that example that rubrics are intrinsically inadequate is a bit like saying that because you can’t fly to New York on that table, then it is pointless to try to fly to New York at all. It is a really bad argument. If this rubric is so bad, we should be asking ourselves how we write a better rubric – or better still, what else we can do to describe and communicate our learning objectives.
Both Dylan and Daisy will give you the answer if you ask them. They will say that learning objectives need to be exemplified.
And this was the original recommendation of the 1987 Task Group for Assessment and Testing, chaired by Professor Paul Black, which informed the 1988 Education Act and the introduction of the first National Curriculum. TGAT said that all attainment targets should be carefully exemplified. The problem with the National Curriculum and our recent history of criterion referencing, is that the recommendations of the TGAT report were not followed. And the problem with the argument being made by Dylan and Daisy is that they do not seem to recognise that exemplification is a way of describing and communicating learning objectives across the education system. Professor Wiliam recognises the importance of exemplars as a way for teachers to communicate a learning objective to their own children and recommends they always try and offer at least a couple of exemplars to their class. But this just gives yet another task to the isolated and unsupported teacher to complete and doesn’t help the consistency of standards across the system. Exemplification is not recognised as a systematic response that will allow for the central definition of learning objectives. That, it is assumed, can only be done by rubrics, which they dismiss as inadequate.
Instead of suggesting that we can clarify rubrics by the use of exemplars, Daisy Christodoulou argues that exemplars should replace rubrics. She proposes that a way of moving beyond the rubric might be a method of assessment called comparative judgement. This requires that teachers develop their own tacit appreciation of their own educational objectives (what Dylan Wiliam calls their “nose for quality”) by comparing a series of pairs of student work and in each case deciding which is the better of the two. After using this technique on a series of different pairs, teachers will have created a rank order of quality across the group, which will have created a set of exemplars of what good looks like and what it doesn’t look like.
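The mechanics of turning paired decisions into a rank order can be sketched in a few lines of Python. This is a deliberately crude illustration that simply counts how often each script was preferred; I am assuming that real comparative judgement schemes fit a statistical model to the decisions rather than counting wins, and the script names and judgements below are invented.

```python
from collections import Counter

# Hypothetical judgements: each tuple is (preferred script, other script).
judgements = [
    ("script_A", "script_B"),
    ("script_A", "script_C"),
    ("script_B", "script_C"),
    ("script_A", "script_D"),
    ("script_B", "script_D"),
    ("script_C", "script_D"),
]


def rank_by_wins(judgements):
    """Order scripts by how often they were judged the better of a pair."""
    wins = Counter()
    for better, worse in judgements:
        wins[better] += 1
        wins[worse] += 0  # ensure scripts that never win still appear
    # Most wins first; ties are broken alphabetically for a stable result.
    ordered = sorted(wins.items(), key=lambda item: (-item[1], item[0]))
    return [script for script, _ in ordered]


ranking = rank_by_wins(judgements)
print(ranking)
```

Note what the data structure contains: which script was preferred, and nothing about why it was preferred.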
I am not against comparative judgement. Nor do I disagree with Professor Wiliam’s concept of a “nose for quality” – a phrase that sums up the idea that teachers need to internalize their appreciation of “what good looks like”. My problem with this campaign is that comparative judgement is being presented as an alternative to criterion referencing and the explicit communication of learning objectives between teachers.
What happens if one teacher comes up with a different rank order to another? It will probably be because different teachers are implicitly ranking the work according to different criteria.
At this point, I think I need to acknowledge that there is a political or ideological resonance to Christodoulou’s campaign – and one that I am sympathetic to. The implementation of the TGAT report in the late 1980s was influenced by left-wing education advisers, particularly in the Inner London Education Authority, who interpreted criterion references as binary checkboxes: either you had “got it” or you hadn’t. And if you were very clear about what your criteria were and you made sure that the criteria closely matched the natural age-related development processes of children and you were half-way competent in your teaching, there was no reason why almost all of your students should not end up “getting it”. This promised an egalitarian education system in which there was a realistic hope of prizes for all.
By insisting that you cannot understand what good looks like until you have put your students in rank order, Daisy is challenging this egalitarian ethic. To that extent, I agree with her. Education is intrinsically inegalitarian because it is about making people better. Any value-system that places undue weight on avoiding inequality will always tend to lower expectations and be corrosive to educational endeavour.
But this political point is not a reason to throw out criterion referencing, which was misinterpreted when it was assumed that it was about binary check-boxes. If you were judging a dance in Strictly Come Dancing, you might analyse a performance against several different criteria such as technical footwork, accurate timing, fluency of movement and artistic expression; and none of those criteria can be represented as binary check-boxes: they are all represented by marks out of 10, in which some might excel while others are merely competent.
Without any generally accepted definitions of learning objectives, comparative judgement creates a system in which teachers develop entirely private understandings of what good looks like. This reinforces all our current problems of inconsistent performance, isolated and unsupported teachers, and a failure to develop centralised responses to educational requirements.
When the outputs from such a system need to be accepted as evidence for the purposes of statutory teacher assessment, Daisy’s answer is that they must be based on inter-school moderation that will establish standardised measures of performance. But moderation is potentially time consuming and needs to be approached with caution.
That was the conclusion of the 1994 Dearing Review into what had by then been recognised as the car crash of the early implementation of the 1988 National Curriculum: “Great care must be taken to assess the ‘opportunity costs’ of any moderation system. We must balance the need for objective scrutiny of the marking standards in individual schools against the very considerable cost in teachers’ time that such a system inevitably involves”. That advice should be taken particularly seriously in the light of our current workload crisis.
Daisy addresses the issue of workload by promoting the Sharing Standards scheme, a systematic process of online moderation, managed by digital analytics systems. In this diagram, taken from a sample report sent to a subscribing school, the range of standards of work submitted as evidence for KS2 writing is compared to the standards of writing examples produced by other schools in the scheme. In general terms, I think this represents exactly the right way to go: automated, centralised systems that support front-line teachers while minimizing workload.
But if you acknowledge that this is a system based on exemplars, you have to ask “exemplars of what?” And the answer is that we are exemplifying only the very highest-level learning objectives: in this case “writing ability”. The system will not tell you what aspects of writing ability teachers in other schools might value more highly than you do, nor will it help you ensure that the order in which you rank your students is more closely aligned to the way that other teachers would rank those same examples. It is not going to help you improve the quality of your formative assessment. It is not going to help classify different types of intervention designed, for example, to teach better paragraph structure or the use of metaphor. To paraphrase Dylan Wiliam and Stenhouse, it will tell you how many marks to give to your students but not why.
At the other end of Christodoulou’s spectrum of recommendations is the setting up of national banks of question items. This also became a recommendation of the Commission for Assessment without Levels.
Associated with the promotion of national data banks of question items is Christodoulou’s advice that teachers track student performance against their answers to such individual questions and not against meaningless rubrics. These two screen grabs are taken from the online video of her presentation to ResearchEd 2015.
Christodoulou tells teachers not to track students against criteria such as whether they are able to ask and answer questions or whether they can make inferences when reading texts; instead, they should track whether their students can answer question item 10,341 in the national item bank: “What is the verb in the sentence ‘I run to the shops’?”
So we are left in a situation in which we can measure whether a student is judged to be good at writing and whether they can answer question 10,341, but nothing in between.
I believe it is useful to envisage the curriculum as a hierarchy of learning objectives, with concrete & specific objectives at the bottom (knowing which is the verb in the sentence “I run to the shops”) and high-level objectives at the top (being good at writing). In this model, the intermediate objectives allow a smooth progression from the mastery of tightly defined procedures and factual knowledge at the bottom up to the general skills and dispositions that allow people to respond to real-world requirements. They provide the definitions that Bill Schmidt argues we need in order to provide a structured, coherent curriculum, and the means to understand how our programmes of study need to be sequenced.
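As a rough illustration of what such a hierarchy might look like as a data structure, here is a minimal Python sketch. The class, the intermediate objectives (“sentence grammar”, “paragraph structure”) and all the scores are hypothetical, chosen only to show how results recorded against specific objectives can be rolled up to the objectives above them.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Objective:
    """A learning objective that may contain more specific objectives."""
    name: str
    children: list = field(default_factory=list)
    scores: list = field(default_factory=list)  # results recorded directly here

    def all_scores(self):
        """Collect every score recorded at or below this objective."""
        collected = list(self.scores)
        for child in self.children:
            collected.extend(child.all_scores())
        return collected

    def summary(self):
        """Mean of all scores at or below this objective, or None if empty."""
        collected = self.all_scores()
        return mean(collected) if collected else None


# Hypothetical hierarchy: specific at the bottom, general at the top.
verbs = Objective("identify the verb in 'I run to the shops'", scores=[1, 1, 0, 1])
grammar = Objective("sentence grammar", children=[verbs])
paragraphs = Objective("paragraph structure", scores=[0.5, 0.5])
writing = Objective("being good at writing", children=[grammar, paragraphs])

print(grammar.summary())
print(writing.summary())
```

A plain average is the simplest possible roll-up and is used here only as a placeholder; the point is that the intermediate layer gives each result somewhere to be classified.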
But it seems to me that in Daisy Christodoulou’s model, in which we assess general writing skills on one hand and responses to individual questions on the other but disregard any intermediate statement of objectives, the whole of this important middle piece of the curriculum is lost. In this model, the curriculum has no structure, no chance of coherence, and there is no opportunity to create digital tools that will help us manage progression and sequencing, or aggregate individual results to improve the reliability of our assessments. It is a model to make any data scientist weep.
I acknowledge, of course, that the criteria that Daisy cites, such as “Can ask and answer questions”, are meaningless. But just because we have created bad criteria in the past – shockingly bad criteria at that – is no reason why we cannot create good and meaningful criteria in the future. Just because we can’t fly to New York on that table is no reason why we can’t fly to New York in a 747.
This raises the question of how we are going to do in the future what we have failed to do properly in the past.
We have already established that we are going to create meaningful learning objectives by exemplification. That raises the next question, which is what is the nature of the examples we are going to use? And the answer is that you are going to produce examples of student performances, the same word that was used by Bill Schmidt when talking about the specification of curricula in terms of knowledge, skills and understanding.
This gets us into another misleading argument that is made by Daisy, Dylan, Warwick Mansell and many other educationalists, that we should not rely on performance as an indication of knowledge, skills and understanding.
The argument that is most commonly used is taken from Robert Bjork, who points out that “instructors frequently misinterpret short-term performance as a guide to long-term learning”.
Dylan Wiliam makes the point by saying that “psychologists tend to…[distinguish] between performance and learning. Performance is what we see when someone is being taught how to do something, while learning is defined…as ‘a change in long-term memory’”. This distinction is clearly false. It is simply not true that performance “is what we see when someone is being taught how to do something”: performance can occur at any time.
Indeed, there are many reasons why we need to require students to produce repeat performances, many of them being summed up by the phrase “spaced learning”: the likelihood that mastery will decay over time; the need to learn to apply principles in a variety of different contexts; the need to “put it all together” by applying isolated skills in more complex and what are often called “authentic” situations; the need progressively to withdraw teacher support or “scaffolding”. An additional reason, highlighted by the work of Bjork and other psychologists, is that learning can occur through processes of consolidation and delayed reflection, which may occur long after the receipt of instruction or participation in learning activity – and the effect of those delayed processes needs to be measured, and probably needs to be reinforced, by delayed or spaced performances. Where student performances are regarded as opportunities for assessment as well as for teaching, it is also relevant that we need to build up reliability by repeat sampling.
All this emphasizes the fact that performance does not only occur during instruction, but can and should be repeated over extended periods, often well after instruction; or that instruction should not be regarded as a short episode, but rather an extended process, often revisiting familiar territory from slightly different perspectives.
It is also of no help to define learning as “a change in long-term memory”, claiming that this is something different from performance, because we cannot observe long-term memory directly – we do not even really understand what it is. Everything we know about someone’s long-term memory is inferred from our observation of their performances.
Bjork is right that we should not confuse short-term or isolated performances with long-term memory and he is right that the relationship between an individual performance and the internal process of learning may very well be complex and counter-intuitive. But this is not the same as saying that repeat performances made over an extended period of time are not a very good guide to long-term memory. Indeed, they are the only guide that we ever have to long-term memory: that is, to knowledge, skills and understanding.
And that is why Bill Schmidt defines the curriculum – the knowledge, skills and understandings that we wish our students to attain – in terms of the performances that we expect to see in the case that they have mastered those learning objectives. And it is by the exemplification of performance that we can describe that curriculum.
Let me conclude this discussion of performance by summarising the two very important reasons why we should not mistrust performance.
First, because it is by performance that we learn. Practice is a form of performance and we learn new skills by practice. It is true that our active practice needs to be accompanied by internal processes that we can call consolidation or reflection, and that these internal processes may occur at different times. But these internal processes are the consequence of performance and practice, not an alternative to them.
Second, because the stuff that we want to teach – knowledge, skills and understanding – is invisible. There is no way that we can get inside people’s heads and determine what they know, what they understand and what they can do. The only way that we can know any of these things is by observing their performance. And so all of the different ways that we characterise our learning objectives – knowledge, skill, understanding, perhaps even attitude and physical development – can be reduced to one word: capability. Capability is a disposition to perform. When we attribute a capability to someone, then we are predicting the types and standards of performance that they will produce in certain situations; and it is only by observing those performances over an extended period that we can infer that they have a certain enduring capability. If that is the only way we can infer knowledge, skills, understanding and attitude, then that is the only way that we can describe and define knowledge, skills, understanding and attitude (and if that conclusion seems odd, have a look at my discussion of logical positivism in The elephant in the room and Choose your paradigm).
And yet, in spite of the central importance of performance to any coherent theory of education, its importance is regularly denied by our leading educationalists.
There are several important conclusions that need to be drawn from this discussion. I have already made the argument that the attack on criterion referencing is unfounded because those making the attack have discounted the importance of exemplification – by which I mean the referencing of performance – as a way of describing learning objectives accurately. If you take Dylan Wiliam’s example of the ability to compare fractions, the problems with this objective would be revealed immediately, and the means to resolve the problems would at the same time be suggested, by the observation of the inconsistency of student performance across a comprehensive body of exemplars.
Second, we have established that our observations of different performances need to be aggregated over time across different but related contexts. And it is by the accumulation and corroboration of different observations of performance that we achieve reliability in our assessment data. When Daisy Christodoulou bases her argument against our current methods of testing on Daniel Koretz’s book, Measuring Up, neither Christodoulou nor Koretz makes any mention at all of the importance of building reliability by aggregating results. This is a significant intellectual failure on the part of the educational establishment in general: they have missed the most important point. Relying on single-shot, high-stakes summative exams is like running a political poll by asking a single person how they are going to vote. And it is the intrinsic unreliability of single-shot exams that forces the exam boards to try to compensate by building formulaic and reductive exams, which in turn gets us into the difficulty of teaching to the test. No-one is addressing this problem.
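To put a rough number on the polling analogy, here is a minimal sketch using the Spearman-Brown prediction formula from classical test theory. This is my illustration rather than anything drawn from Koretz or Christodoulou. It assumes that the observations being combined are parallel measures of the same capability, and the starting reliability of 0.5 is a hypothetical figure chosen only to show the shape of the curve.

```python
def aggregated_reliability(single_reliability: float, n_observations: int) -> float:
    """Spearman-Brown prediction of the reliability of n combined observations.

    Assumes every observation measures the same capability equally well
    (the "parallel measures" assumption), which real assessments only approximate.
    """
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_observations < 1:
        raise ValueError("need at least one observation")
    r, k = single_reliability, n_observations
    return k * r / (1 + (k - 1) * r)


# Hypothetical starting point: a single assessment with reliability 0.5.
for k in (1, 2, 5, 10, 20):
    print(f"{k:>2} observations -> predicted reliability {aggregated_reliability(0.5, k):.2f}")
```

Under these assumptions the predicted reliability climbs steadily as observations are added, which is the sense in which a run of modest, low-stakes assessments can outperform a single high-stakes exam.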
The aggregation of assessment data is entirely dependent on our ability to compare apples to apples. Averaging out a student’s ability to perform trigonometry and to write a moving poem will not be very useful. We need to classify the different tasks and questions against the learning objectives that they address. The retreat from criterion referencing means that such aggregation of data becomes impossible. At the same time, the recommendation of Christodoulou, Wiliam and the Commission for Assessment without Levels that the results of regular formative tests should be discarded means that those results will not be available in any case to build greater reliability in our assessment system. The conclusions of the Commission for Assessment without Levels are precisely – 180 degrees – wrong.
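The difference between comparing like with like and averaging everything together can be sketched in a few lines. The objective labels and scores here are hypothetical, and a real system would need a far richer classification of tasks than a single label per result.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (learning objective, score as a fraction of marks available).
results = [
    ("trigonometry", 0.40),
    ("trigonometry", 0.55),
    ("trigonometry", 0.50),
    ("poetry writing", 0.90),
    ("poetry writing", 0.80),
]


def aggregate_by_objective(records):
    """Group scores by the objective they address; return (mean, count) per objective."""
    grouped = defaultdict(list)
    for objective, score in records:
        grouped[objective].append(score)
    return {objective: (mean(scores), len(scores)) for objective, scores in grouped.items()}


per_objective = aggregate_by_objective(results)
overall = mean(score for _, score in results)  # the apples-and-pears average

for objective, (average, count) in per_objective.items():
    print(f"{objective}: mean {average:.2f} over {count} observations")
print(f"undifferentiated average: {overall:.2f}")
```

The undifferentiated average describes neither capability, whereas the per-objective means can each be corroborated by further observations of the same kind. Without the objective label attached to each result, the grouping step has nothing to work with.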
Third, perhaps the chief reason why teachers don’t understand the vital importance of aggregating results from different performances is the additional logistical complexity that this requires, in addition to all the logistical complexity that we have already discussed in the case of formative assessment. What is the point of talking about what is beyond the capabilities of isolated and unsupported teachers to achieve? We don’t have the right tools of the trade and the academics and thought leaders in the profession, who base their views on their observations of existing practice, do not understand what these tools might be or why they are needed.
So why is “curriculum” in particular such a problem?
First, because we talk about it all the time without knowing what it means.
Second, if we accept (as in my view we ought to do) that the curriculum should describe our learning objectives, then we have lost confidence in our ability to describe those learning objectives.
Third, the reason we have lost confidence in our ability to describe our learning objectives is that we have lost sight of the fact that learning is closely related to performance, both because it is through practice that we learn, and also because it is through the observation of performance that we define our objectives and measure our attainment.
And that is why the discourse of educationalists constantly falls back on a sort of romantic, mystical mumbo-jumbo about teacher judgement, intuition and tacit knowledge – which amounts in practice to an apology for the dysfunctional system in which teachers are currently trapped.
It is not enough to hand out tips and tricks to teachers on how to survive in a dysfunctional system. We need to fix the system. And systems can only work when we have clearly defined objectives. And the curriculum is what ought to define our objectives.
This is the slide which I used earlier to visualize our isolated and unsupported teachers at the bottom, and our inadequate and uncoordinated policy instruments in the middle. And I suggested that calling these elements “policy instruments” was itself unhelpful, because it suggested a top-down bureaucracy, it suggested that they were levers of power rather than elements in what ought to be a self-regulating system, not dependent on constant tinkering from Whitehall.
I agree with Tim Oates that if we are to achieve the sort of system reform that we need, then the alignment of these different elements is important. But alignment is not enough. When we are dead, we will have no heartbeat, no breathing, no brain activity, no reflexes: all our body functions will be perfectly aligned, but we’ll be dead. Our education systems, as well as being aligned, also need to be well designed in their own rights, and that means that there must be an opportunity to innovate. Innovation means change, and when you start to introduce change, things can often fall out of alignment. So there is at the very least a tension between innovation and alignment. We need to ask not just whether our textbooks are aligned with our assessment and our assessment is aligned with our curriculum – but how this alignment is achieved and whether it is helping us to design good individual processes and technologies.
One way of achieving alignment is by handing the whole shooting match to a single person or organization, who will make sure that the official textbooks, the official assessments, the official curriculum and the official training are all perfectly aligned. It could be the DfE or, more likely, some sort of outsourcer. It could be Cambridge Assessment, Tim Oates’ employer.
The problem with this solution, and the problem with monopolies in general, is that they suppress innovation and therefore reduce quality. There is no room for someone to come along and create a new curriculum, which better serves the needs of the modern world; or new textbooks which are more effective at supporting teachers in teaching an existing curriculum. Alignment in this model creates a sort of gridlock. And it tends to make it difficult, maybe impossible, even to perceive that there is a problem in the system because there is only a single source of truth. It would be a bit like living in the middle ages when it was impossible to challenge what was said by the church because they owned all the books.
If you think I am exaggerating the problem here, listen to Tim Oates talking on the DfE’s YouTube channel about Assessment without Levels. He says that the Expert Panel found three coexisting models of assessment and he runs through the very significant problems with each of those models. Compensation-based tests – in other words, the SATs – award a level 3 based on the number of marks you get in the test, even though you might have got all your marks from the level 2 questions and the level 4 questions and got all the level 3 questions wrong. Best fit has similar problems – you might award a student level 3 because she is too good for level 2 and not good enough for level 4, even though she has serious gaps in her attainment judged against level 3 criteria. And threshold, when you award level 3 because you can find evidence of level 3 responses, is even worse – because you tend to award level 3 when a student is only just over the threshold of level 3 and still might have very large gaps in level 3 understanding.
All three of these assessment models are highly inaccurate in their different ways.
Now, if all three models were accurate, they would produce the same result. But because all three are inaccurate and they are all inaccurate in different ways, they produce different results.
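A toy illustration of how the three models can disagree about the same student follows. Everything in it is an assumption made for the sake of the example: the mark allocations, the cut scores, the evidence threshold and my particular reading of "best fit" are hypothetical and are not taken from the Expert Panel or from any real test.

```python
# Hypothetical test: 10 marks available at each of levels 2, 3 and 4.
MARKS_AVAILABLE = {2: 10, 3: 10, 4: 10}

# Hypothetical student, echoing the example above: no marks on the level 3 questions.
marks_earned = {2: 9, 3: 0, 4: 6}

# Assumed cut scores on the total mark for the compensation model.
CUT_SCORES = {2: 5, 3: 13, 4: 22}


def compensation_level(earned):
    """Award the highest level whose cut score is reached by the total mark."""
    total = sum(earned.values())
    reached = [level for level, cut in CUT_SCORES.items() if total >= cut]
    return max(reached) if reached else None


def threshold_level(earned, evidence=0.3):
    """Award the highest level at which some evidence of success is found."""
    shown = [level for level, marks in earned.items()
             if marks / MARKS_AVAILABLE[level] >= evidence]
    return max(shown) if shown else None


def best_fit_level(earned):
    """Award the level whose ideal profile (full marks up to that level,
    none above it) lies closest to the student's actual profile."""
    def misfit(candidate):
        total = 0.0
        for level, marks in earned.items():
            expected = 1.0 if level <= candidate else 0.0
            total += (marks / MARKS_AVAILABLE[level] - expected) ** 2
        return total
    return min(earned, key=misfit)


print("compensation:", compensation_level(marks_earned))
print("threshold:   ", threshold_level(marks_earned))
print("best fit:    ", best_fit_level(marks_earned))
```

With these made-up numbers the three functions return three different levels for one and the same set of answers. The particular levels depend entirely on the assumed parameters; the point is only that the disagreement is informative.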
Extraordinarily enough, Tim Oates argues that the inaccuracy of our assessment models isn’t really the problem. The only problem is that there are differences between the models, which create disagreement. The problem is the coexistence of different models.
This is dangerous talk. The fact that the different models produce different results tells you that your assessment data is inaccurate – and that is an important thing to know. In my view, all assessment data should be accompanied by an indication of confidence. The main reason why this does not happen is that the authorities do not want to admit how low those confidence levels would be. But Tim Oates would be perfectly happy with this level of inaccuracy, so long as nobody can see that it is inaccurate, which would be the case if there weren’t any alternative assessment models to compare it with.
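As a sketch of what an indication of confidence might look like in practice, the function below reports a mean score together with an approximate 95% interval. It uses a simple normal approximation, which is only rough for small samples and assumes the observations are independent; the scores themselves are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def score_with_interval(scores, z=1.96):
    """Return (mean, lower, upper) using a normal approximation to the
    sampling error of the mean. A rough guide, not a rigorous interval."""
    if len(scores) < 2:
        raise ValueError("need at least two observations to estimate spread")
    centre = mean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return centre, centre - half_width, centre + half_width


# Hypothetical percentage scores for one student against one learning objective.
observed = [52, 61, 48, 70, 55, 58]
centre, lower, upper = score_with_interval(observed)
print(f"estimated attainment {centre:.1f}, plausible range {lower:.1f} to {upper:.1f}")
```

Note that a single result cannot be given an interval at all by this method, because one observation offers no estimate of its own spread.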
This attitude illustrates the danger of a doctrine that emphasizes alignment above quality and ignores the vital importance of achieving reliability by aggregating and corroborating data, rather than trusting a single dataset, just because it is deemed to be authoritative.
Perhaps we shouldn’t be too shocked by Tim Oates’ totalitarian approach to assessment because this is what already happens. When it comes to public exams, there is a single source of truth which we know is very unreliable but we just have to put up with it. And the market for textbooks is uncompetitive because the most important thing to do if you are a textbook publisher is to get your textbooks endorsed by the main exam board. So the whole thing is run as a cabal and the excuse produced by all monopolists and all totalitarian regimes is that everything is very well aligned.
What we need is a system that encourages alignment while at the same time encouraging innovation and competition.
The only way, it seems to me, to achieve this is to define our educational objectives – in other words, our curriculum – and then allow a free market in the provision of all the other elements of the system.
The proper way to achieve alignment is for our means to be well aligned to our ends. The measure of quality for the market will be the ability of these other elements to assist students in the attainment of these predetermined objectives. By being aligned with the objectives, all the elements of the system would tend to align with each other. At the same time, there would be no restriction on innovation.
We have already seen that the way to define learning objectives clearly is by exemplification and that what we are exemplifying is performance and that performance is measured by assessment.
As assessment is one of the key system elements, and we require our assessment to be aligned with our learning objectives, and our learning objectives are themselves defined by reference to assessment outcomes, there is also a need to ensure that assessments by which students’ attainment is measured are well aligned with the exemplars of assessment by which our objectives are described.
This can be achieved by digital analytics systems, which can compare assessment data at scale and can ensure that consistent standards of performance are being maintained in respect of multiple learning objectives.
Instead of this very hierarchical, top-down model of our education system, with isolated and unsupported teachers on the front-line…
We would instead have a system in which the government can supervise from a distance what is essentially a self-regulating and self-optimizing system.
Once we have the means to describe our learning objectives clearly, then we can devise processes to ensure that we can create the different curricula that different students require, responding to a wide range of inputs from different societal stakeholders. And in this way the democratic accountability of the education service and its relevance to the requirements of the modern world will be enhanced.
Once those objectives are more clearly described, we can have an open competition between educational suppliers to provide the different system elements – the training, the textbooks, the digital technology, the assessments – providing the tools of the trade that front-line teachers need in order to cope with the problem of scale, which is the key challenge for modern education systems. Teachers will not be marginalized or de-skilled by this provision of technology, any more than a surgeon is de-professionalised by being given a well-equipped operating theatre in which to work. Teachers will be put in a more powerful position to manage the education of their students, well supported by the tools of the trade that they need, and with the time to focus on the building of relationships with their students and maintaining mastery of their subjects.
In order to achieve this transformation, we must start by defining our terms in a way that is clear and consistent. There is nothing more revealing about the intellectual poverty of our education theory than the Babel of inconsistent meanings that is given to some of the most basic terminology that teachers need to understand if they are to start to develop a coherent theory of practice.
We must find the means to describe our objectives which, as I have argued above, will depend on the development of digital technologies that are able to ensure the validity of data that is gathered against different statements of capability.
Contrary to the advice of Dylan Wiliam that curriculum is pedagogy, we must decouple pedagogy, defined as the means of achieving our goals, from curriculum, defined as the statement of those goals.
As one of my heroes, Brian Simon, said in his well-known 1981 essay, Why no pedagogy in England?, “Attempts to define common objectives for all pupils across the main subjects is the first necessary condition for identifying effective pedagogic means [to achieve them]”.
Curriculum matters because without clear learning objectives, there is no pedagogy, no measure of educational quality, no objective research, no pooling of resources to address provision at a systemic level. Without these things, we continue to be forced to rely on the efforts of isolated, unsupported and increasingly overworked, undervalued and demoralized teachers; with the government trying to compensate for the inadequacy of that delivery model with ineffective and frequently counter-productive bureaucratic controls.
But because no-one is recognising these problems, because no-one in the current system has the experience or perspective required to address them, and because our main thought leaders are promoting educational theories which actively prevent any progress in solving them – I can see no reason why our current dysfunctional education system should not survive for many decades to come.
If we are to have any chance of seeing an improvement, then the first thing that we need is to raise the quality of the academic discourse.
I have chosen to critique these three educationalists because in all three cases, they are thinkers that I rate. There is much that they have said that I agree with. I agree with Tim Oates that we need more systematic responses to our educational problems, in particular from learning materials such as textbooks, and I agree with him that curriculum matters. I agree with Dylan Wiliam that we need to improve our feedback and the extent to which our teaching can be adaptive. And on the whole I agree with Daisy’s critique of progressive education theory and support her interest in using technology to address the problems we face. I have chosen these three people to critique, not because I regard them as particularly egregious examples of deluded educationalists, but on the contrary because I think they are among the most interesting and important of our educational thinkers. But even so, I think there is an awful lot that they have got wrong. And when even our best and most influential thinkers have got so much wrong, there is clearly something wrong at a systemic level.
Tooley, with whom I started, said in 1998 that the work of most academics went “unnoticed and unheeded by anyone else”. Perhaps that cannot be said of my three subjects, who are all high-profile and influential figures. But what Tooley was really talking about in this phrase was the low level of contestation between academics. In that respect, little appears to have changed.
If you accept at least some of the criticisms that I have made in this article, you need also to ask why no-one else has made these arguments. Why, when there is no shortage of professional, academic educationalists, is it left to an amateur blogger to point out that the emperor is so scantily clad? I suspect that the world of bloggers and social media has made a situation that was bad enough in 1998 even worse, as people build their own echo chambers of support and ignore those criticisms that might compromise their online reputations. This matters because it is only through contestation that positions can be clarified, errors corrected, and new solutions to old problems suggested.
Of all the different ways in which I think our current education system can be described as dysfunctional, I think that this lack of serious, contested debate is perhaps the root of all evil. I hope that this essay might make a very small contribution to addressing this problem. And to that end, I hope that Tim Oates, Dylan Wiliam and Daisy Christodoulou will respond to the points that I have made, either in the comments section below, in guest posts that I shall be happy to provide on this blog, or on other platforms of their choosing.