In defence of feedback


What do we mean by “feedback”? Is it useful? Is it (like revenge) better served cold? And what has this to do with Bjork’s new theory of disuse?

This is the first of a series of (what were supposed to be short) follow-up posts, responding to significant comments made by readers of my longer article Curriculum Matters. In this, the first, I discuss what we mean by feedback, whether it is more effective when it is immediate or when it is delayed, how this question relates to Robert Bjork’s new theory of disuse, and how teachers should make sense of the complex (and often uncertain) theory of how the brain works. 11,000 words.

I am grateful to Dr Chan Moruzi for drawing my attention on Twitter to a paper in the Journal of Applied Research in Memory and Cognition, Delaying feedback promotes transfer of knowledge despite student preferences to receive feedback immediately. The conclusions of this paper appeared to contradict my contention that “if there is a long delay between action and reaction [i.e. feedback], then it will be difficult for the student to make an association between the two and the learning opportunity will be lost”. This led to a discussion with Dylan Wiliam about Robert Bjork’s new theory of disuse (NTD), which provides a possible theoretical justification for the paper referenced by Dr Moruzi.

The issue is important to the argument I make in Curriculum Matters, that the logistical complexity of handling feedback is one of the principal reasons that the isolated and unsupported teacher is not able to meet the demands of a mass education system.

The issue is also given resonance by a recent trend in UK education to downplay the importance of feedback.

  • Teachers are being advised by educational writers such as David Didau, influenced by the work of Robert Bjork, on ways of “Reducing feedback” [heading to chapter 17];
  • The Workload Commission, undertaken on the initiative of UK Secretary of State for Education Nicky Morgan in 2016, recommended dramatically reducing the amount of written feedback to children, on the grounds that this did little good and “misinterpreted and ultimately distorted the main messages of Assessment for Learning” [point 14, p.6];
  • Assessment for Learning, which in Curriculum Matters I represented as principally concerning feedback, can also be interpreted in a way that de-emphasizes the importance of feedback in favour of students taking control of their own learning;
  • Some proponents of the knowledge-based curriculum appear to suggest that the transmission of factual knowledge is more important than practicing skills, which is where the opportunities for feedback are mainly to be found.

I am not suggesting that any of the voices characterised above are saying that feedback does not matter at all. They would undoubtedly say that they are encouraging teachers to work smarter rather than harder – and the importance of working smart is underlined by John Hattie’s findings that 40% of feedback is counter-productive. Yet the conclusion of their position, as David Didau puts it, is that “Getting feedback right is a difficult business”. If the panacea that teachers have been promised has turned out to be a curate’s egg, then many will conclude that feedback is too complicated and uncertain and is best avoided altogether.

Bjork’s new theory of disuse

First, let me summarize Bjork’s new theory of disuse, which is offered by Wiliam and Didau as justification for the view that we need less feedback, less often.

The main challenge that the new theory of disuse poses to many of our casual assumptions is that human memory is not at all like most storage systems that we know, like computer hard-drives or attics. The most difficult part of learning is not necessarily storing things in memory but getting them out again. Once things are stored, they are stored for ever: “it is assumed that storage strength once accumulated is never lost” [p.42, point 2]. Nor (unlike attics) does it become more difficult to stuff new items into memory, the fuller it becomes: “there is no limit on storage capacity” [ibid p.43, point 3]. On the contrary, our minds are a neurological equivalent of Dr Who’s TARDIS: the more you put into them, the more space is created for more stuff. But if we do not regularly get these memories down and dust them off: “items of information…eventually become non-recallable with disuse” [ibid p.43, point 4]. We therefore need to practice retrieval, and the harder that practice is, the more effective it is: “the benefits of a successful retrieval, in terms of its influence on that item’s subsequent retrieval strength, are larger the more difficult or involved the act of retrieval” [ibid p.42 point 4]. This final point is associated with the idea of creating “desirable difficulties” in the testing of retrieval.

We can make our retrieval practice more difficult by spacing it out and/or by interleaving it with other tasks – both of which will help us forget about the item after its first encoding, making its subsequent retrieval more difficult, and therefore a more effective learning exercise. Bjork’s theory is called “new” because it replaces a similar but older theory of Thorndike (1914), which held that where information was not regularly retrieved, the storage strength of that information itself decayed. Under the new theory of disuse, this information is still stored just as well as it ever was: it is our ability to retrieve it that degrades.

A further twist to the theory is that our ability to retrieve information depends on the prompts that induce us to attempt that retrieval. It will generally be easier to retrieve information if we are given prompts that are similar to those that accompanied the original encoding. Varying the prompts is a third way (after spacing and interleaving) of making the process of retrieval more difficult – and remember, difficulties in this regard are desirable. One way of imagining how this process aids retrieval strength is to imagine that by varying the prompts, one is increasing the number of associations between the stored item and different potential prompts. This points to another paradox of the theory: practicing retrieval doesn’t only give you better access to what you already know; by making new connections, it also increases what you know. The more you get stuff down from the attic, the more the stuff in the attic accumulates.
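To make these three manipulations concrete, here is a minimal sketch, in Python, of a practice schedule that spaces retrieval over sessions, interleaves items within each session, and varies the prompt on each attempt. It is my own toy illustration – the items, prompts and session structure are all invented – and not a model drawn from Bjork’s work.

```python
import random

# A hypothetical sketch (my own, not Bjork's): build a practice schedule that
# spaces repetitions of each item over several sessions, interleaves items
# within each session, and varies the prompt used for each retrieval attempt.
ITEMS = {
    "roho (soul)": ["Swahili for 'soul'?", "roho = ?", "Which Swahili word means 'soul'?"],
    "1066": ["Year of the Battle of Hastings?", "When did William invade England?"],
    "photosynthesis": ["What process turns light into sugars?", "Define photosynthesis"],
}

def build_schedule(items, sessions=4, seed=0):
    """One session per day (spacing); shuffled order within a session
    (interleaving); a different cue each time (varied prompts)."""
    rng = random.Random(seed)
    schedule = []
    for s in range(sessions):
        session = [(item, prompts[s % len(prompts)]) for item, prompts in items.items()]
        rng.shuffle(session)
        schedule.append(session)
    return schedule

for day, session in enumerate(build_schedule(ITEMS), start=1):
    print(f"Day {day}:", "; ".join(prompt for _, prompt in session))
```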

Sometimes the same prompt might be used to retrieve different items of stored information: for example, “my room number” will be associated with both the hotel room that I am staying in tonight and the hotel room that I stayed in last night. In this case, the different associated items will compete with each other and practicing the retrieval of my current room number will tend to reduce the retrieval strength of yesterday’s room number. This has obvious practical advantages, as one needs to be able to discriminate between information that is useful and current and information that is redundant.

Practical consequences of the NTD

A number of fairly straightforward conclusions can be drawn by teachers from the new theory of disuse.

The theory underlines the fact that students will not necessarily be able to recall what they are told, but that recall will need to be repeatedly practiced. This should raise questions about simplistic understandings of direct instruction and the knowledge-based curriculum, if these terms are understood to refer to the transmission of factual information, unsupported by regular practice in the retrieval and application of that knowledge. On the contrary, this point emphasizes the importance of feedback mechanisms that ensure teaching programmes are responsive to what the student has actually learned.

What is called the “testing effect” suggests, paradoxically, that testing is often the most effective way of studying. This challenges the position of many teachers and their representatives, who frequently make the case against “teaching to the test” by claiming that testing is a distraction from teaching. They could not be more wrong. It is further evidence of the dysfunctionality of our education system that such a fundamental misapprehension could have such a strong hold on such a large section of the teaching profession.

The idea of “desirable difficulties” suggests that such tests should be:

  • spaced;
  • interleaved with other material;
  • associated with the variation of contexts or prompts.

These conclusions are consistent with the points made in my Curriculum Matters about the logistical complexity of managing and sequencing learning activities over time and across varied contexts.

The idea of “desirable difficulties” also tends to suggest that students may be poor stewards of their own learning. Most students like to make their learning easy and regard teachers as the pedagogical equivalent of hotel porters in this respect. They want teachers just to bring them the answers, maybe by giving them some notes to copy off the board. They do not necessarily realize that the easier their study is, the less effective it is. This is consistent with Dan Willingham’s aphorism that “memory is the residue of thought”, and that effective learning requires hard thinking. This point should give us pause when considering ideas of independent learning, which have become particularly prominent within the edtech community and in some versions of Assessment for Learning. On the other hand, it emphasizes that one of the key roles of the teacher should be to make students’ study more difficult and that she should therefore be careful not to provide too much assistance or “scaffolding”. This is reflected in Dylan’s saying that school is too often where students go to watch teachers work – and that this balance needs to be reversed. Teachers should refrain from helping children out too quickly.

Having examined Bjork’s new theory of disuse and listed some of its obvious consequences, I will now examine some of the evidence in favour of delayed feedback.

Feedback exhibit A: evidence presented by Robert Bjork

Bjork’s direct contribution to the subject of feedback is contained in his 2007 paper, The costs and benefits of providing feedback during learning. The paper is based on an experiment in which students were asked to learn a list of Swahili words using flashcards, with the English word on one side of the card and the Swahili word on the other.

The research measured the effectiveness of feedback “in the form of the correct answer”, which was given by flipping a flashcard to see an English word on its reverse after the student had been shown the Swahili word and tried to recall the English equivalent. The paper reported an important benefit in receiving this form of feedback where an incorrect response had been made, improving final recall by 494%. On the other hand, there was little or no benefit in being shown the correct answer after a correct response had already been made. Where time was short (as it normally is), giving feedback when the student was already confident of the correct answer involved an opportunity cost, preventing the more productive use of the time by continuing directly with more retrieval practice. In contrast to the general suggestion (above) that students are not good stewards of their own learning, it was found that in this case, students’ own judgments of whether feedback would be useful or not were likely to be fairly accurate. After all, why would you seek out the correct answer that you were already confident you knew? This does not strike me as a particularly interesting finding. I doubt many people would have imagined that telling students what they already knew would be a useful aid to learning.

There are two further characteristics of this research that should warn against over-generalising from very particular findings:

  • the research deals in only one sort of feedback (providing the correct answer) which, I will argue below, is not really feedback at all;
  • the research deals in very simple, atomic pieces of information (vocabulary pairs) that are relatively easy to store and where the opportunities for storing incomplete or incorrect information are limited.

In fact, Bjork’s paper seems to me to be a rather slight contribution to the already well-established research into the benefits of delayed feedback. But before I consider some other research into delayed feedback, this might be a good moment to ask what we mean by feedback and what the different sorts of feedback are.

Classifying feedback

Feedback is what makes the hi-fi speakers screech at a concert when someone uses the microphone in front of them. The more sound comes out of the speakers, the more sound is recorded by the microphone; and the more sound is recorded by the microphone, the more sound comes out of the speakers. It is what threatens us with runaway global warming, because CO2 raises the atmospheric temperature and raised atmospheric temperatures increase the amount of CO2 in the atmosphere. In cybernetics, a branch of systems design specifically concerned with feedback, feedback loops allow systems to regulate themselves by the continuous monitoring of outcomes. More broadly, a feedback loop is any circular causal relationship in which A causes B and B causes A (A and B both standing for categories of phenomena). In the case of teaching and learning, this is illustrated by my “learning to ride a bicycle” slide. Yet in education, this word, like so many, is not clearly defined or properly understood: teachers generally use “feedback” as a synonym for “criticism” or “evaluation”, which are only a couple of forms of possible feedback.
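For what it is worth, the technical sense of a feedback loop is easy to express in code. The toy sketch below – my own illustration, not drawn from any of the sources discussed here – shows a self-regulating (negative feedback) loop of the kind cybernetics describes: the measured outcome is fed back, compared with a target, and the difference determines the next action.

```python
# A toy negative-feedback loop: the action changes the outcome, and the
# measured outcome in turn determines the next action (A causes B, B causes A).
def regulate(target=20.0, temperature=10.0, steps=8, gain=0.5):
    for step in range(steps):
        error = target - temperature      # monitor the outcome
        action = gain * error             # choose an action in response
        temperature += action             # the action changes the outcome again
        print(f"step {step}: temperature = {temperature:.2f}")

regulate()
```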

If feedback is understood in its technical sense, then showing someone the correct answer after they have made their response to the question (regardless of what that response was) is not feedback at all. Feedback is by definition sensitive to the student response. Showing the correct answer, regardless of the student’s response, is an undiscriminating invitation to restudy – and restudy is something that is clearly distinguished in the literature from feedback. Yet this is exactly the type of intervention that leads Robert Bjork to draw (what we must assume are therefore unjustified) conclusions about feedback.

The simplest kind of genuine feedback that could be given in Bjork’s experiment would have been to show the student the correct answer only if their response was incorrect. The process would demonstrate minimal responsiveness to the student’s actions. In this case, Bjork’s experiment would have shown that the effect of feedback would have been to improve student performance on the long-term retention test by 494%.

If you think that I am splitting hairs over the definition of feedback (and who cares about definitions, you might think), this statistic should surely make you think again. Precisely because the matter of giving feedback is difficult, precisely because we need to discriminate between positive forms of feedback and negative, it is really important that we use language carefully and are clear about what we are talking about.
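To make the distinction concrete, here is a minimal sketch of the two procedures side by side, with invented word pairs and an invented set of student responses; it is my own illustration, not a reconstruction of Bjork’s materials.

```python
# Invented word pairs and student answers, for illustration only.
PAIRS = {"roho": "soul", "rafiki": "friend", "maji": "water"}
STUDENT_ANSWERS = {"roho": "soul", "rafiki": "enemy", "maji": "water"}  # one error

def drill(pairs, answers, responsive):
    for swahili, english in pairs.items():
        attempt = answers.get(swahili)                              # retrieval attempt
        if responsive:
            if attempt != english:                                  # minimal genuine feedback:
                print(f"{swahili}: the correct answer is '{english}'")  # shown only on error
        else:
            print(f"{swahili}: the correct answer is '{english}'")      # unconditional restudy

drill(PAIRS, STUDENT_ANSWERS, responsive=True)   # shows the answer only for 'rafiki'
drill(PAIRS, STUDENT_ANSWERS, responsive=False)  # shows every answer, regardless of the attempt
```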

Having made a distinction between feedback and one example of something that is not feedback, it would also be useful to distinguish between the different forms of feedback. I do not claim that the following is an exhaustive classification of different sorts of feedback – but it is a start on what I would see as a fundamental task: the development of a precise technical vocabulary of education, and one that I have not seen seriously attempted elsewhere.

I am starting this list at item number 4 to allow for later insertions.

Feedback type 4. Intrinsic feedback. This term is widely recognised as referring to feedback that happens naturally as one performs a learning activity. You make a stack of bricks which is slightly off-centre and the stack falls down: that is feedback which is intrinsic to the business of stacking up bricks. Nobody is telling you how to make a pile of bricks other than the bricks that you are piling up. Correctly interpreted, such feedback will help you develop your own understanding of how to make steady stacks of bricks. The physical environment gives plenty of intrinsic feedback, which is why young children appear to learn motor skills relatively easily, without teaching. Activities targeting abstract learning need more deliberate engineering – this is what Seymour Papert meant in his 1980 book, Mindstorms, when he developed the notion of microworlds. He was referring to simulated environments that have been specifically created to teach abstract concepts through intrinsic feedback: turtle graphics to teach geometry; skidpans to teach Newtonian physics etc. This is why I focused in Curriculum Matters on the difficulty of creating initial learning activities and remedial learning activities.

As far as I am aware, all the research into the merits of delayed feedback concerns augmented feedback, which is the term that is (in my view incorrectly) applied by psychologists to all forms of feedback that are not “intrinsic”.

Feedback type 5. Operant conditioning. Much of the work on animals by behaviourists like B.F.Skinner focused on reward and punishment. In this case, speed of feedback is critical. To be effective, rewards and punishments need to be immediate. The paper referenced by Dr Moruzi, for example, states that “In operant learning paradigms…one of the core findings…was that the response and subsequent feedback had to be paired closely in time in order for the animal to perceive the contingency for learning…delaying feedback by even a few seconds dramatically delayed its effectiveness” [p.222]. Many modern educators regard behaviourist approaches to learning as mechanistic, simplistic and slightly distasteful. Even referring to “operant learning paradigms” carries an implication that this research is simply not relevant to researchers working within other paradigms (see my Choose your paradigm). Maybe they conjure up those campaigns against laboratory testing on animals which pictured a small, furry creature with electrodes strapped to its skull, waiting for some nasty scientist to deliver a mind-altering electric shock. But I suspect that we should not dismiss Skinner too lightly. Reward and punishment (which may be intrinsic as well as extrinsic) are fundamental to human motivation and it is common for people to observe that motivation is an important contributor to learning. It is also important to bear in mind that one of the reasons why the brain works so differently to computers is that it is optimised for managing behaviour. While the behaviourist approach to learning may well have paid insufficient regard to the complex processes by which brains develop their intellectual models of the world, it may be that cognitivists have underestimated the extent to which every cycle of the human central processing unit represents what is at heart a behavioural response.

Feedback type 6. Evaluation. I use this term to refer to any feedback which tells the student how well they did, perhaps by a mark, a grade, a tick or a cross. This could be viewed as the equivalent to a reward or punishment, on the assumption that the student cares how well they did and associates the evaluation with approval or disapproval. Research reported by Dylan Wiliam has emphasized the dangers of evaluative feedback, on the basis that it might discourage further effort, either because the student concludes that they are good enough already, or that they are so bad that they might as well give up. My own hunch is that this is a little like the Christian aphorism about blaming the sin and not the sinner: evaluation which is quick, iterative and is attached to the performance, not the performer, will surely provide valuable information that will allow the performer to optimise their approach to the activity. Evaluative feedback that is delayed and which is not associated with an immediate retry is much more likely to become de-coupled from the performance and seen instead as a reflection on the performer. When an athlete is experimenting with slightly different running techniques, or a Formula One team is adjusting their driver’s carburetor, rapid evaluative feedback is vital in order to assess the merits of slightly different approaches. Because evaluative feedback may carry different sorts of emotional baggage, I think that it is useful to distinguish between three sub-types:

6a) raw evaluative feedback, where feedback measures some objectively verifiable characteristic of the student’s performance (such as the time to complete a race);

6b) normative evaluative feedback, which carries some comparison with the level of performance that is expected (what Daniel Koretz refers to as “cut scores on a continuum of performance” [location 3362 in Chapter 13]) or with the distribution of other performances across a reference cohort;

6c) ipsative evaluative feedback, which carries some comparison with other performances by the student in similar tasks (retrospective, such as best performance scores, or anticipatory, such as personal targets).

These three varieties of evaluative feedback are summarised in the following diagram.


Note that even raw evaluation imposes a particular view of what matters, illustrated in this figure by the fact that a score represents a single dimension through what is in reality a multi-dimensional space. While such a one-dimensional view of a student’s performance might be simplistic (in which case a student could be given multiple scores in respect of different aspects of the same performance), this is not necessarily a problem. It is principally by evaluative feedback that the teacher can communicate to the student an idea of the learning objective, which is an essential function of the teacher. Getting better at something and understanding what it is you have to get better at are often closely related.

Feedback type 7. Criticism. In this category I include all comments that draw attention to weaknesses, set general targets for improvement or offer guidance for improvement. Intuitively, most of us would imagine that such criticism ought to be useful – but often it isn’t because the advice is difficult to follow, especially by someone who has not yet experienced what success feels like. Aristotle describes this paradox by observing that “the things we have to learn before we can do them, we learn by doing them” [Book II, Chapter 1, p.21]. Diana Laurillard makes a similar point when she describes formal education as “the need to learn things we cannot know of until we have learned them”. An anecdote about a disgruntled student has him replying to a teacher’s comment advising him to improve his paragraph structure, “if I had known how to improve my paragraph structure, I would have used paragraphs properly in the first place”. Yet it is easy to criticise criticism too quickly. We still seek out mentors who can give us the secrets of success, help us resolve our recurring difficulties and move on to higher levels of skill. What we really value in this search for constructive criticism is:

  • a genuine requirement (i.e. a problem that we cannot resolve on our own);
  • genuine domain expertise on the part of the critic;
  • a perceptive assessment of our current capability;
  • an understanding of what remedies may be effective;
  • preparedness on our part to receive and respond to criticism.

When it comes to criticism, quality matters more than quantity.

Feedback type 8. Adaptive sequencing. This is the approach discussed in Curriculum Matters, that Dylan Wiliam in his 2007 keynote to the Association for Learning Technology calls “aggregation technology”. Such an approach also lies at the heart of Benjamin Bloom’s Mastery Learning, in which context it could be called remediation. Often the best way of addressing problems is the individualised selection of the next learning activity, chosen in a way that will help the student address his particular problems and experience success, without which criticism is often difficult to interpret.

These are what appear to me, at first reflection, to be the main types of feedback; but I shall define four more types as I delve more deeply into the subject.

Applying the classification to Bjork’s experiment

None of these definitions of feedback so far includes showing the student the correct answer, which is what happened in Bjork’s experiment under the description of providing feedback. One might say that the student can, easily enough, derive from a view of the correct answer two types of genuine feedback:

  • type 6, an evaluation of their response (was it correct or incorrect?);
  • type 7, guidance or criticism (this is what you ought to have said).

Both these instances of what might be thought of as genuine feedback are in fact generated by the student herself, which makes them indistinguishable from internal reflection and, as the graphic on my original slide makes clear, reflection is distinct from the feedback that prompts it.

In the case of Bjork’s study, the reflection is prompted by an event (viewing the correct answer) which happens regardless of what the student did. This is indistinguishable from restudy, embedded in a linear activity sequence.

Slide 2

It is possible that such a linear sequence might be devised to prompt reflection by the interesting juxtaposition of activities and events. If students behave in predictable ways (by always getting the answer wrong on the flashcards, for example) such a sequence might give the appearance of offering feedback. Given a sort of pedagogical Turing test, it might appear that the linear sequence was offering responsive feedback when in fact it was just guessing. Following this logic, I shall enter this sort of sequencing into the classification of feedback under the name of:

Feedback type 1. Faux feedback. A linear sequence of activities designed to promote reflection by giving the appearance of feedback to expected student responses.

If you think this is being pedantic, re-read the argument above which points out that the difference between faux feedback and the simplest kind of genuine feedback is between a negative effect on learning and an improvement to learning of 494%. If you think that is splitting hairs, you need to think again.

Nevertheless, I should make clear that I am not saying that faux feedback is necessarily a bad thing, any more than faux fur on your collar is a bad thing: it will still keep you warm while minimizing suffering to animals. Given that accurately assessing and appropriately responding to student actions consumes scarce teaching resource, the design of non-responsive environments that nevertheless stimulate reflection may still be beneficial in many circumstances.

Taking the case in which genuine feedback is applied to Bjork’s experiment by showing the correct answer only where there has been an incorrect response, the sort of feedback being offered is adaptive sequencing (feedback type 8), as illustrated by the following process diagram.

Slide 3

In passing, it is worth observing that there are three distinct phases involved in this process:

  • performance (a.k.a. practice)
  • assessment
  • feedback.

And that the last of these stages could be further sub-divided into dialogic feedback and adaptive feedback.

Slide 4

The later phases are dependent on the previous phases (assessment can only occur when there has been a performance and feedback can only occur after some sort of assessment) but the two earlier phases do not depend on the later ones (it is quite possible to practice without being assessed or to be assessed without being given feedback). The language that teachers commonly use tends to conflate these different stages. “Formative assessment” is commonly used to refer to feedback and not assessment at all. At the same time, the benefits of formative assessment are commonly perceived to be those that flow from practice, which is separate and independent from both assessment and teacher-controlled feedback. Similarly, “assessment” is commonly used to refer to the task that elicits the performance (as in “an assessment”) and not the process of evaluation. Not only are assessed tasks often highly conducive to learning, being indistinguishable in themselves from normal practice, but also the assessment itself requires no student time at all. Such a muddling of terminology routinely leads teachers (and indeed psychology researchers) to draw false conclusions about the fundamental processes that lie at the heart of their professional practice.
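Keeping the three phases separate is easier when they are written down separately. The following sketch is my own illustration (the task, the student and the helper names are invented), not a description of any existing system: practice can happen without assessment, assessment can happen without feedback, and feedback, where it is given, can be either dialogic or adaptive.

```python
# My own sketch: the three phases kept separate. A task can be practised with
# no assessment, assessed with no feedback, or followed by dialogic feedback
# (a comment within the task) or adaptive feedback (a new task).
def perform(task, student):
    return student(task["prompt"])                    # phase 1: performance

def assess(task, response):
    return response == task["answer"]                 # phase 2: assessment

def feed_back(task, correct):                         # phase 3: feedback (optional)
    if correct:
        return {"next_task": {**task, "difficulty": task["difficulty"] + 1}}  # adaptive
    return {"comment": f"Compare your answer with '{task['answer']}'"}        # dialogic

task = {"prompt": "English for 'roho'?", "answer": "soul", "difficulty": 1}
response = perform(task, student=lambda prompt: "soul")
print(feed_back(task, assess(task, response)))
```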

The purpose of feedback

In a Twitter conversation with Dylan Wiliam following publication of my Curriculum Matters article, I observed that the immediate purpose of all of these forms of feedback is to modify the information (or behaviour, which comes to the same thing) that is memorized, rather than to improve the ability of the student to retrieve that information from memory that has already been stored. Education is not just about storage and retrieval: it is above all about storing and retrieving the right things. If I have it in my head that the Swahili word for soul is rojo (instead of the correct response, roho), then retrieval practice without corrective feedback will merely reinforce my belief in false information. My contention was that the primary purpose of feedback was therefore to steer the student to storing the correct information, rather than to the practice and reinforcement of retrieval, which is oblivious to whether the information being retrieved is right or wrong.

Dylan responded by pointing out that where a student is unable to retrieve already encoded information, then the supply of hints would count as a form of feedback. I accept this, and would be happy to add this as another form of feedback. Generalising a little from the specific example of hints, I will call this:

Feedback type 9. Calibration. Adjusting the difficulty of a particular task to match the capability of the student.

Adaptive sequencing (feedback type 8) can also be an appropriate response to a failure of retrieval, routing the student to restudy and/or another test, perhaps with different prompts. Indeed, calibration (new feedback type 9) could be thought of in terms of routing the student to a new question (feedback type 8) with easier prompts. I refer to both types of feedback as “adaptive”, in that they alter the next task that the student is asked to undertake.

All the same, I hold to the view that what most teachers refer to as feedback (types 5, reinforcement; 6, evaluation; 7, criticism) all introduce new information to be encoded; and it is only types 8, adaptive sequencing, and new type 9, calibration, that can be used to support retrieval.

I should also add, as another caveat to this position, that the distinction between encoding and retrieval is not quite as clear as the digital computer or attic analogies might suggest.

Different sorts of information

There are many who are better qualified than me to propose a comprehensive classification of different types of information. From the perspective of a non-specialist, I would observe that there may be important differences between information that is factual, procedural, related to motor skills, attitudinal or behavioural. There might be a relationship between these different types of information and the different types of feedback proposed in the previous section: for example, reward and punishment seems more likely to be associated with behaviour than with the memorization of facts.

As well as these different types of information, I think we can safely assume that information varies in its complexity. Swahili-English word pairs are a simple sort of propositional fact. Such facts might be regarded as the simplest forms of molecule in our memories. I do not go quite so far as to say “atoms”, as there is also a long-standing distinction between familiarity (“connaître” in French) which concerns a single object (“I know John”) and propositional knowledge (“savoir” in French) which represents a linking of two concepts (“I know John is late”). In my diagram, I would equate our familiarity with single concepts to atoms and the linking of two concepts, which is always involved in any proposition, to a simple molecule. Many capabilities (to solve problems or engage in creative activity) must reflect more complex networks of such atomic facts and behaviours than simple propositions. We might envisage this sense of increasing complexity as an ever-widening network of such atoms, forming increasingly complex, overlapping molecules.

Slide 5

This idea of a continuum of wider or narrower associative networks corresponds to an idea of a scale between more concrete knowledge (at the bottom of the hierarchy) and more abstract knowledge (at the top). This is the same structure that corresponds to what Bill Schmidt calls the “upper triangular structure” of the coherent curriculum, illustrated in my Curriculum Matters with this slide.

Slide 6

This is the same idea that lies behind:

  • E.D.Hirsch’s argument that reading ability depends on good domain knowledge, which suggests that more complex skills and capabilities are composed of (or dependent on) more discrete items of knowledge;
  • Bloom’s taxonomy (unfairly attacked, in my view, by proponents of the knowledge curriculum like Daisy Christodoulou and David Didau), which suggests how discrete factual information needs to be progressively incorporated into more complex tasks, such as problem solving, creative exercises and evaluation, proposing as the first step in building this complexity a distinction between “knowing” and “understanding”, as illustrated above;
  • Jean Piaget’s theory by which new information is either assimilated or accommodated within our existing mental models, supporting the view that discrete, atomic facts and reflexes are networked together to form knowledge networks;
  • cognitive load theory, which suggests that it is only by such networking of facts that we can ever transfer information from transient working memory to more durable long-term memory;
  • modern neuroscience, which pictures the brain as a neural network.

Feedback exhibit B: evidence presented by Richard Schmidt

Having discussed at length Bjork’s new theory of disuse and its (apparently rather slight) connection to the argument about delayed feedback, I want to turn to other evidence for the merits of delayed feedback. My main source for this is Richard Schmidt’s 1991 review of the evidence, Frequent augmented feedback can degrade learning: evidence and interpretations. Richard Schmidt should not be confused with Bill Schmidt, whom I discussed in Curriculum Matters in relation to his theory of “curriculum coherence”.

The first thing to note about Richard Schmidt’s article is that it confines itself to a discussion of “augmented” feedback – defined by Schmidt as feedback that is not intrinsic to the task, in the sense of feedback type 4 discussed above. I should note at this point that I do not accept Schmidt’s definition of “augmented” – but it is at least better than the definition offered by Robert Bjork, quoted by David Didau, who refers to “feedback from an external source (i.e., augmented feedback)” [p.254]. Intrinsic feedback (which, according to Schmidt, is all feedback that is not augmented) clearly comes from external sources, such as bicycles and piles of bricks and (in social interactions) other interlocutors. Indeed, it is arguable that all feedback comes from external sources and that what might be called “internal feedback” is better called “reflection”. So, accepting for the moment Schmidt’s definition of “augmented feedback”, it is worth noting that none of the research referenced by Schmidt implies that intrinsic feedback (which is normally immediate) is not helpful.

Second, all the research discussed is conducted in relation to relatively simple performances, most often involving motor skills such as holding a button down for a set period of time or replicating a particular movement pattern with one’s arms. Taking place in the physical world, these performances involve plentiful intrinsic feedback and do not involve complex networks of abstract knowledge. Given the very different sorts of knowledge that might be involved in school learning, and the different effects of feedback already noted in the case of operant conditioning, its conclusions must be treated with some caution.

Putting these two points together, it is worth noting that teachers of academic subjects face a particular problem because the sort of abstract knowledge that they are trying to convey cannot be learnt through the sort of physical environment that gives plentiful intrinsic feedback. People performing physical tasks, holding down buttons or performing golf swings, are already receiving intrinsic feedback through their own nervous systems from the physical environment around them, of a type that people performing tasks involving what Jean Piaget called “formal operations” do not. This is the argument that Seymour Papert made in his book, Mindstorms, already discussed.

Schmidt summarizes a number of experiments, from the seminal work of Lavery in 1962 through to more recent experiments in the late 1980s in which he was involved. Subjects were given varying amounts of feedback, in the form of a computer-generated comparison between the intended pattern of limb movements and their actual pattern of movements, along with a computer-generated score based on the comparison of the actual and target movements.

Here, then, two different sorts of feedback are being used:

  • a score, which counts as feedback type (6a), raw evaluation;
  • a computer-generated trace of the student’s actual arm movement, compared to the target path.

This second form of feedback again seems difficult to fit into the classification proposed so far. The addition of computer sensors is clearly an “augmentation” of the feedback mechanisms that are normally available to us. Yet in giving the student the ability to visualize the movement they have just made, we are dealing with raw data representing what they did. It might be a sort of data that we would not normally be able to access but it is of a similar sort to other types of sense data. From an educational perspective, the sort of augmentation provided is of a passive and technical sort. Unlike the bicycle, which makes a sort of (albeit inanimate) judgement on whether the student’s inputs are of the type that are required to keep a bicycle upright, the computer sensors provide no processing, no judgments, and no unexpected results: they merely show the student what she has done.

In Curriculum Matters, I discuss the iterative feedback loop with the “instructive other” as a sort of conversation. This analogy of a conversation is picked up in my earlier post, In the beginning was the conversation; in Diana Laurillard’s book Teaching as a design science; and in Robin Alexander’s work on dialogic teaching. The nature of this educational conversation is that the student engages with an instructive other that encapsulates some sort of assessment of the student’s actions. But in the case of the computer-generated trace of the performer’s arm movement, no decision is made by the “instructive other”; it merely plays back to the student what the student has just done. If it were a chat-bot being submitted to the Turing test, it would be playing that particularly annoying sort of conversation partner which repeats whatever you have just said.

I will allow that this qualifies as feedback, technically speaking, because it is sensitive to what the actor has just done – but from an instructional perspective, it is a very simple kind of feedback. It might be called “feed-through” as there is no significant processing of the information involved. It does not show you what the consequences of your actions are, it does not evaluate your actions or suggest ways of improving. For this reason, I shall add it to my classification of feedback types as:

Feedback type 3. Mirrored feedback. Playing back to the student what they have just done.

Because “what you have just done” is an intrinsic part of the activity that you are performing, this sort of feedback counts as “intrinsic”; but because it requires additional sensors that are not on our normal kit list, it is also “augmented”. This is the reason why it is not satisfactory to define “augmented feedback” as “feedback that is not intrinsic”. A better definition would be “feedback that is not normally available to us in the course of performing an activity”.

This is the same sort of feedback that teachers or sportsmen use when they video themselves in the classroom or on the field of play; or ballet dancers when they practice their moves in front of the mirror. It is undoubtedly a useful way to provoke reflection, but generally in combination with other, more advanced types of feedback. It gives you a better view of what you are doing but not whether what you are doing is right.

In this respect, mirrored feedback may be less useful than much intrinsic feedback, which might contain more information to promote evaluation (as when you fall off the bike or the stack of bricks falls down). Much intrinsic feedback is quite responsive, in this sense of being based on a kind of assessment.

The upshot of this discussion is that the existence of type 3 mirrored feedback (which is a sort of augmented feed-through) reveals yet another type of feedback, lying even more quietly in the performer’s own sensory systems, a sort of natural feed-through which I shall call:

Feedback type 2: awareness. The knowledge of results of a performer’s actions delivered by non-augmented means.

You might say that this new feedback type 2 is so obvious that it really wasn’t worth spending any time talking about it. Buddhists (who stress the importance of mindfulness) and librarians (who like tidy taxonomies) might disagree.

A summary of my feedback classification

I have now completed my classification of different types of feedback, which I summarise in the following table.

TABLE 1 A taxonomy of feedback
1 Faux feedback – A linear sequence of activities designed to promote reflection, often giving the impression of feedback
2 Awareness – The normal monitoring of one’s actions by natural sense data
3 Mirroring – The representation to the student of his/her performance by augmented means
4 Intrinsic feedback – Responsive feedback which occurs naturally as part of an activity
5 Operant conditioning – The attempt to reinforce certain behaviours by:
  5a Reward – Positive response given to desirable behaviours
  5b Punishment – Negative response given to undesirable behaviours
6 Evaluation – Expresses the quality of performance in respect of particular criteria:
  6a Raw – Where the expression contains absolute data
  6b Normative – Where the expression is relative to an expected level of achievement
  6c Ipsative – Where the expression is compared to previous achievement
7 Criticism – Comments on the student’s performance that are:
  7a Reactive – identifying good or bad aspects of previous performances
  7b Proactive – proposing how future performances might be improved
8 Sequencing – Selecting the next task based on previous performance
9 Calibration – Adjusting a task on the basis of previous performance

Above this nine-point taxonomy is a four-point meta-taxonomy, indicated by the shading in the table.

TABLE 2 A meta-taxonomy of feedback
Non-responsive feedback
Intrinsic feedback
Dialogic feedback
Adaptive feedback

Teachers can ensure that students receive plenty of intrinsic feedback by assigning well-designed tasks, while extrinsic feedback requires a continuing pedagogical effort.

Feedback types 1, 2 and 3, which I place below “intrinsic”, are all “non-responsive” because they are either totally insensitive to the student’s performance (faux feedback) or they simply feed-through information about that performance (awareness and mirroring). Feedback types 5 to 9, which I place above “intrinsic”, are all “responsive”. Of these, feedback types that occur within the context of a task I call “dialogic”; and when they involve the setting of a new task, “adaptive”.
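For readers who prefer their taxonomies machine-readable, the two tables can be written out as a small data structure. This is simply my own encoding of the classification above, nothing more:

```python
from collections import defaultdict

# The taxonomy and meta-taxonomy above, encoded as a mapping from each
# feedback type to its name and meta-category.
FEEDBACK_TYPES = {
    1: ("Faux feedback", "non-responsive"),
    2: ("Awareness", "non-responsive"),
    3: ("Mirroring", "non-responsive"),
    4: ("Intrinsic feedback", "intrinsic"),
    5: ("Operant conditioning", "dialogic"),
    6: ("Evaluation", "dialogic"),
    7: ("Criticism", "dialogic"),
    8: ("Sequencing", "adaptive"),
    9: ("Calibration", "adaptive"),
}

groups = defaultdict(list)
for number, (name, meta) in FEEDBACK_TYPES.items():
    groups[meta].append(f"{number} {name}")

for meta, members in groups.items():
    print(f"{meta}: {', '.join(members)}")
```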

The results of Schmidt’s research

In Schmidt’s research, the mirrored feedback and raw evaluations described in the previous section were given at different intervals to different groups engaged in blocked practice:

  • the 100% group received feedback after every attempt;
  • the 50% group received feedback after every other attempt;
  • the fade-down group received decreasing amounts of feedback throughout the trial;
  • the fade-up group received increasing amounts of feedback throughout the trial;
  • the bandwidth group received feedback after every performance that was judged to be poorer than a threshold.
Slide 7
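To make the five schedules easier to compare, here is a rough sketch of how each one decides whether to give feedback on a given attempt. The trial counts, error values and bandwidth threshold are invented for illustration; they are not Schmidt’s.

```python
# My own sketch of the five schedules; trial numbers, errors and the bandwidth
# threshold are invented for illustration.
def give_feedback(schedule, trial, total_trials, error, threshold=10.0):
    if schedule == "100%":
        return True                                    # after every attempt
    if schedule == "50%":
        return trial % 2 == 0                          # after every other attempt
    if schedule == "fade-down":
        return trial < total_trials // 2               # crudely: only in the first half
    if schedule == "fade-up":
        return trial >= total_trials // 2              # crudely: only in the second half
    if schedule == "bandwidth":
        return error > threshold                       # only when the attempt is poor
    raise ValueError(f"unknown schedule: {schedule}")

errors = [15.0, 12.0, 8.0, 11.0, 6.0, 9.0, 4.0, 7.0, 3.0, 5.0]  # invented, improving
for schedule in ("100%", "50%", "fade-down", "fade-up", "bandwidth"):
    flags = ["FB" if give_feedback(schedule, t, len(errors), e) else "--"
             for t, e in enumerate(errors)]
    print(f"{schedule:>9}: {' '.join(flags)}")
```

Note that, as errors shrink with practice, the bandwidth schedule fades its feedback of its own accord.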

In these experiments, the order of performance on the long-term retention test was:

  • (best) bandwidth group;
  • fade-down group;
  • 50% group;
  • 100% group;
  • (worst) fade-up group.

Although these results suggest some benefit in adapting feedback to the subject’s performance (which can be assumed to improve over the course of practice), the main benefit was observed from simply reducing the amount of feedback (i.e. the difference between the 100% group and the 50% group).

In a separate experiment, different groups were given summary or averaged feedback after:

  • each single performance (control);
  • the previous 5 performances;
  • the previous 10 performances;
  • the previous 15 performances.

In this experiment, the group that performed best varied between experiments: in one it was the 15-performance group that performed best, in another it was the 5-performance group; but in neither was it the single performance group.

The fact that in some circumstances the group receiving moderate feedback did better than the groups receiving less feedback shows that a simple “less is more” conclusion would not be justified. The research also shows that “inserting different [unrelated] tasks between a given performance and its feedback” (i.e. interleaving) has the effect of degrading performance on the long-term retention test. This is the opposite of Bjork’s conclusion on interleaving in spaced practice [section 2.3, p.8] and suggests that delaying feedback might be problematic in real-world conditions (in which delaying feedback almost inevitably means interleaving feedback).

Schmidt’s explanations of the evidence

In his conclusion, Schmidt discusses four different theories which might explain the results, as follows:

  • The feedback “becomes part of the task”, so that when the student comes to perform the task without feedback, it seems unfamiliar and therefore more difficult. This idea does not necessarily make much sense, as feedback normally follows the performance of a task, in which circumstances it cannot be considered to influence the nature of the task that precedes it. Yet in the case of iterative, blocked practice, the feedback that followed the first performance precedes the second performance, in relationship to which it can have an influence. In this case, by increasing the student’s familiarity with the task, it would reduce the “desirable difficulties” that Bjork believes to encourage long-term learning.
Slide 8
  • Schmidt’s second explanation is similar, being based more explicitly on Bjork’s NTD. This suggests that giving feedback does more than simply make the task more familiar, it provides explicit assistance or “scaffolding” to the task, removing the “desirable difficulties” that would otherwise enhance the value of the retrieval practice that is part of the performance. But as Schmidt observes, “At present [in 1991], there is no convincing evidence either for or against this hypothesis” [p.13].
  • Schmidt’s preferred explanation is that early feedback inhibits the mental processes that require the subject to evaluate his or her own performance. Without augmented feedback, “the learner develops a sensitivity to response-produced [i.e. intrinsic] feedback, leading to a learned capability to detect errors that provides the basis for recognition memory, or for an error-detection mechanism” [p.12]. As this evaluative mechanism may occur immediately after performance, this process may be disrupted by extrinsic feedback that is received at the same time.
  • With reference to the improved utility of summary feedback, it may be that “maladaptive short-term corrections” are elicited by feedback that responds to isolated errors that have little long-term significance for the student’s transferable capability, or that responds only to the particular context in which a generic skill is applied. The student might end up making constant course corrections in response to the buffets of every passing wave, rather than steering a steady course, ignoring the noise of insignificant, short-term problems.

All four of these explanations seem plausible in their different ways, and may overlap. I have suggested that explanations 1 and 2 are both “Bjorkian” in attributing the problem to the avoidance of “desirable difficulties”. Yet this explanation only appears to work in the case of repetitive or blocked practice and does not account for the effect when it is exhibited in association with isolated performances. Feedback is unlikely to influence a performance when the performance precedes the feedback.

Similarly, I see explanations 3 and 4 overlapping in that they both suggest interference with the ability of the student to make sense of the task immediately after completion. They are either unable to evaluate their own performance (explanation 3) or they are distracted from focusing on the key criteria of performance by irrelevant noise (explanation 4). Both of these explanations I would characterise as “Schmidtian”. When it comes to intrinsic feedback, most performances could be viewed as iterative interactions in which feedback occurs in conjunction with student action. The action-reaction cycle, characterised in my diagram of learning to ride a bicycle, is often constant, rapid and immediate. Just as this sort of feedback is intrinsic to the activity, so evaluating and responding to such feedback is intrinsic to the performance. Extrinsic feedback that is provided too quickly and too often is likely to desensitize the student to this intrinsic feedback, which is what she should be learning to focus on.

Feedback exhibit C: the 2013 Duke/Texas paper

This brings me back to the first paper on this subject that was recommended by Dr Moruzi, published by Mullet, Butler, Verdin, Borries and Marsh from Duke University and the University of Texas. The paper reported two experiments which suggested benefits from delaying feedback.

The first experiment

The first experiment involved 26 students following a higher education maths course. They were given weekly assignments that required the application of complex mathematical concepts to between 10 and 14 problems. An example of the workings involved in answering one of these problems is given below.

Slide 9

The students were required to submit free-text responses to each problem, after which they were given a multiple-choice question that offered four alternative solutions to the same problem. In neither the control group nor the experimental group were students shown the correct answer immediately after answering the multiple-choice question. This was made available to the control group at the end of the week in which they were allowed to complete the assignment, and to the experimental group one week later. Students were sent an email inviting them to view the feedback and were only credited with completing the assignment when they had done so. Long-term retention was measured by an exam, delivered in a similar format, at the end of each semester and at the end of the course.

Four students were excluded (e.g. because they did not complete the course), leaving a population of 22. The proportion of students answering questions correctly during the initial assignment was about equal in the two groups. The students in the “non-delayed” group viewed feedback on average 4.1 days after completing the initial assignment and the students in the “delayed” group on average 11.6 days after. The proportion of correct answers in the final exams given by students in the “delayed” group was on average slightly higher (92%) than by students in the “non-delayed” group (84%). Given a total population of 22 students, this represents an average difference of about 1 correct response on each question between the groups.

The students were later asked to evaluate their experience of receiving what is misleadingly referred to as “immediate” or delayed feedback. While 73% of students that received so-called immediate feedback said that they liked its timing, only 57% of students that received delayed feedback said the same.

The second experiment

In a second experiment, a student population of 36 was split two ways, between optional and mandatory feedback, and between so-called immediate and delayed feedback, providing an effective group size of 9. The average elapsed time between performance and viewing feedback is shown for the four groups in the following table.

TABLE 3 Delay (in days) between performance and first viewing of feedback
              “Immediate”    Delayed
Optional      13.0           20.1
Mandatory     5.8            14.3

The percentage of questions answered correctly in the end-of-unit exam by the four groups is given in table 4.

TABLE 4 Questions answered correctly in the end-of-unit exam
              “Immediate”    Delayed
Optional      48%            55%
Mandatory     66%            82%

In the end-of-unit exam, those receiving more delayed feedback performed better by an average margin of 0.8 correct responses per 9 students where feedback was optional and by an average of 1.6 correct responses per 9 students where feedback was mandatory.


The study (like many in educational research) is under-powered and it must be assumed that this problem will be compounded by the chance of publication bias: the likelihood that if no result had been obtained by the experiment, it would not have been published at all, distorting the conclusions that might be drawn from studies that are published.

Nevertheless, all three comparisons produced consistent results, in which the groups receiving more delayed feedback performed better by 8%, 16% and 7%. Assuming that these results could be shown by further research to be reliable, what interpretation can we lay on these results?

It is worth noting again that (like much education at scale in current conditions) the quality of feedback offered was low. There is no reason why software could not be developed that would parse students’ free-text responses and analyse their mistakes – yet we are still not in a position where this technology is widely available or easily deployed. The result is that no automatic feedback at all was given on students’ free-text responses, nor was any check made on whether a serious free-text answer had even been attempted. This seems a little slap-dash: given the small number of students involved, there is no reason why a manual check could not have been made.

Where genuine attempts were made at a free-text response, then a certain amount of immediate faux-feedback is provided to all students by the sequencing of free-text and multiple-choice responses (forcing the student to re-examine the first response in the light of the choices offered in the second stage).

The difference in timing between the two groups is less clear than might be expected (the so-called “immediate” groups received feedback after an average of 4.1, 5.8 and 13.0 days). This fact makes all four of Schmidt’s explanations of delayed feedback implausible. It is difficult to imagine how feedback received after such a significant delay could interfere with the student’s ability to evaluate their own performance (the Schmidtian explanation); nor is it possible to conceive how such delayed feedback, received well after all questions had been answered, would assist the students in answering the questions so many days earlier or become associated in their minds with those performances (the Bjorkian explanation).

Taken together with the low quality of the feedback itself, a different sort of Bjorkian explanation seems much more likely: that it is not the feedback, per se, that is accounting for the difference in performance, but the restudy that is being prompted by the receipt of feedback. In order to understand the feedback, the students need to think themselves into the problem another time and remind themselves how the solution works. It is this rehearsal of the performance, rather than the information about their earlier attempt, that is helping to reinforce the students’ understanding – and Bjork’s new theory of disuse has established that more widely spaced practice is more effective than less widely spaced practice, because the process of forgetting (which introduces “desirable difficulties” for the later rehearsal) is continuous.

Laying aside its statistical problems, this experiment appears not really to be about feedback at all, but about spaced practice.

My Twitter discussion with Dylan

Following Dr Moruzi’s link to the Duke/Texas paper, I had a brief conversation about delayed feedback with Dylan Wiliam, whom I had critiqued in my original post, Curriculum Matters. As so often happens on Twitter, the nuances of what are necessarily complex conversations are quickly lost in the to-and-fro of 280-character soundbites. So I am taking this opportunity to explain why I disagree with some of Dylan’s points.

I reproduce the full discussion as a graphic (which can be enlarged by clicking) – and I have transcribed Dylan’s tweets below, along with my responses.



1. Dylan tweeted:

In my view Robert Bjork’s new theory of disuse provides a better explanation of why delayed feedback is often more effective than immediate feedback [better than my saying “evaluative advice better after student has made an effort to work it out themselves”]

I respond, on the basis of this essay.

1a. It is not clear to me why Bjork’s NTD should be classed as a “better” explanation of delayed feedback than “making an effort to work it out themselves”, when NTD relies heavily on the concept of desirable difficulties, which is very largely about making students do the work themselves. What I meant by my subsequent reference to cognitive load theory is that our ability to transfer knowledge from short-term to long-term memory relies on our ability to find strongly interlinked relationships between different knowledge “items”, and Bjork’s NTD has much to say about how this stitching together of memory occurs. So too do Schmidtian explanations, which insist that the student develops his or her own ability to self-evaluate. Working it out oneself is the common thread that runs through all these theories and, laying aside some Twitter miscommunication, I would be very surprised if Dylan disagreed with this general perspective.

1b. I have explained above why I do not agree with Dylan that Bjork’s new theory of disuse provides a satisfactory explanation of the merits of delayed feedback. His own research into this matter (Exhibit A) depended on faux feedback which is not true feedback at all. If feedback is understood correctly, Bjork’s experiment shows that immediate feedback is overwhelmingly productive. Other experiments seem to show that receiving intrinsic feedback is also productive of learning, which tends to support Schmidtian explanations above Bjorkian ones. Although I acknowledge that the Duke/Texas experiment is more susceptible of a Bjorkian than a Schmidtian explanation, it seems much more likely that this is to do with the merits of spaced practice, which is prompted by feedback, rather than being the result of receiving delayed feedback per se.

2. Dylan tweeted

That’s not actually correct. [my saying “Bjork’s disuse relates to retrieval not feedback”] The new theory of disuse specifically addresses both retrieval and restudy (former better than latter). Feedback can support retrieval, or prompt restudy (which explains hierarchy of effects in feedback studies).

I respond, on the basis of this essay.

I do not agree that my statement was incorrect and I think that Dylan’s response is confused. By definition, what is disused is what has previously been acquired. Applying the analogy of possession to memory, disuse occurs specifically through a lack of retrieval. So Bjork’s theory of disuse concerns retrieval (or lack of it) by definition. To compare retrieval to restudy is a category error: retrieval is a neurological process, comparable to encoding, while restudy is a pedagogical process, comparable to testing and initial study. This categorisation is illustrated in the following table.

TABLE 5  Different categories of concept in Bjork’s new theory of disuse

Theoretical characteristics          | Storage strength | Retrieval strength
Theoretical neurological processes   | Storage/encoding | Retrieval
Pedagogical/experimental processes   | Study, re-study  | Testing

Bjork’s NTD is not a theory of study, re-study or testing, though of course practical consequences regarding classroom practice can be drawn from it. Those consequences are twofold in this regard.

  1. It is easier to memorize an item by restudy, after that item has previously been studied and forgotten, than it was to memorize it through the initial study. This suggests that the memorized item is still stored and that restudy involves an element of retrieval, as a sort of shadow-response to a repeated encoding. Perhaps another analogy for this process is digging a tunnel that is worked on from both ends at once.
  2. If Dylan intended to compare restudy to testing, then Bjork’s conclusion is that, so long as retrieval is possible, “the act of retrieving an item of information is considerably more potent in terms of facilitating its subsequent successful recall than is an additional study trial on that item” [p.37]. So what Bjork’s NTD has to say about re-study (in the sense of re-reading previously read material) is that one should generally avoid it in favour of re-testing.
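
To put point 2 in schematic form – this is my own toy encoding of the distinction, with arbitrary numbers, and not Bjork’s actual model – “so long as retrieval is possible” testing beats restudy, but once retrieval strength has fallen too far a test on its own achieves nothing:

```python
RETRIEVAL_THRESHOLD = 0.2   # arbitrary: below this, the item cannot be recalled unaided
RESTUDY_GAIN = 0.3          # arbitrary boost to storage strength from re-reading
RETRIEVAL_GAIN = 0.8        # arbitrary, larger boost from successfully recalling the item

def rehearse(storage: float, retrieval: float, mode: str) -> float:
    """Return the new storage strength after one rehearsal of the given kind."""
    if mode == "testing":
        # Retrieval is only potent when it succeeds; a failed test adds nothing by itself.
        return storage + RETRIEVAL_GAIN if retrieval >= RETRIEVAL_THRESHOLD else storage
    return storage + RESTUDY_GAIN   # restudy (re-reading) is always possible, but less potent

print(rehearse(storage=1.0, retrieval=0.5, mode="testing"))   # 1.8 - successful retrieval
print(rehearse(storage=1.0, retrieval=0.5, mode="restudy"))   # 1.3 - re-reading
print(rehearse(storage=1.0, retrieval=0.1, mode="testing"))   # 1.0 - retrieval failed
```

The sketch is only a restatement of the claim, but it keeps the categories of table 5 distinct: testing and restudy are the pedagogical levers; storage and retrieval strength are the theoretical quantities they act on.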

3. Dylan tweeted

[In response to this point that we are measuring the effect of spaced practice and not feedback itself] I would have thought that the main function of feedback is whichever function of feedback has the greatest impact on learning. In other words, it is an empirical question…

I respond, on the basis of this essay.

In my own mind, Dylan’s point here becomes clearer if we replace “function” with “consequence”. If A has consequence B then we might choose to do A for the sake of the benefit that we get from B (which is what we then deem to be the function of A). So far, so good.

Let me return, though, to Dylan’s navvies, which I discussed in Curriculum Matters: they know where to dig but not why, which Dylan argues is not the approach we want teachers to take. To argue that this is essentially an empirical question is surely to contradict that position: it says that all we need to know is that action A results in beneficial consequence B – so let’s just carry on doing action A. I suggest that there are two reasons why this approach is to be avoided.

  1. We do not know whether even better consequences might be possible than those which we can (empirically) demonstrate we are achieving with current methods. And the likelihood of achieving better outcomes can only be assessed if we understand why an action has a particular consequence.
  2. Rules of thumb work fairly well if the context of our actions remains constant; but where (as in teaching) the context is constantly shifting, a rule of thumb that works in one context is likely not to work in another.

If we observe, empirically, that delayed feedback produces beneficial outcomes and we do not understand that those outcomes are produced by spaced practice and not by the feedback itself, we open ourselves to two mistakes.

  1. We are missing the opportunity to achieve the same benefits by scheduling delayed practice, without having to go through all the unnecessary work of feedback. Just because A prompts B does not mean that A is the only way of bringing about B.
  2. Using delayed feedback as a way of triggering spaced practice might actually be counter-productive. Bjork’s own research focuses on the opportunity cost: perhaps you could fit in two practice attempts for every feedback-plus-practice cycle. Even more significantly, Bjork’s work shows that feedback which precedes repeat practice is liable to reduce the desirable difficulties that make the spaced practice so valuable.

For these reasons, I would not be content to say that “it is an empirical question”, but would add that empirical evidence needs to be accommodated within a model which offers reliable and transferable interpretations (which is what we mean when we talk about “understanding” the evidence). And I stand by my more general point that such understanding requires teachers to be much more specific and accurate in the way that they use terminology.

4. Dylan tweeted:

[In response to “feedback helps modify information as it is encoded – it doesn’t affect retrieval”] I don’t think it’s that simple. If feedback takes the form of a hint that supports the learner in retrieval, but where there are still “desirable difficulties” for the learner, then it is more likely to be effective than correction.

I respond, on the basis of this essay.

I accept that what I classify above as “adaptive” feedback can prompt retrieval – and I think that Dylan and I would agree (probably furiously) that adaptation (what Dylan calls “responsive teaching” or, in 2007, in the context of edtech, “aggregation”) is an effective form of feedback. With that important caveat, however, I think my position still holds for what I call “dialogic” feedback (which is, I suspect, what most teachers think of as feedback, plain and simple): it is about modifying the information to be encoded, not about assisting retrieval. The importance of altering the information being encoded is, in my view, only weakly reflected in the experiments reported by Schmidt and Bjork, because these generally refer to simple word pairs or physical tasks and not to complex concepts or skills.

5. In a separate tweet, unconnected in the conversation with me, Dylan also gave fulsome praise for an article called The Empty Brain, by Robert Epstein:

This may be the most important blog post I have read this year: “Your brain does not process information, retrieve knowledge or store memories. In short: your brain is not a computer.”

While I completely agree with Epstein’s case that the brain does not work like a computer, his message that “your brain does not…retrieve knowledge” is, quite simply, wrong. It seems strange that at one moment Dylan is championing Bjork’s new theory of disuse, which examines the importance of memory retrieval, and at the next is promoting quite so enthusiastically the message that our brains do not retrieve knowledge or store memories at all.

Dylan’s initial response to my Curriculum Matters was to tweet:

I will reply to this in a week or two. In the meantime, people should be aware that many, perhaps most, of the views attributed to me by Crispin are not views I have ever actually held.

I am still looking forward to Dylan’s response but in the meantime I would stress that (while I agree with many of the points that Dylan has argued), my arguments in Curriculum Matters are based on things that Dylan has said (or not said) and that, as I said in that piece, “one of my principal criticisms of Wiliam is inconsistency”. I hope I have not misled anyone as to what Dylan believes because in many cases (I would not go so far as to say “most”), I would not claim to be sure of it myself.

General conclusions

1. Like Richard Schmidt in 1991, I have not been able to find any evidence that persuades me that Robert Bjork’s theories explain the benefits of delayed feedback – though I think he argues persuasively for the benefits of spaced practice, which is a different thing.

2. Other research suggests that the place of feedback varies depending on the nature of that feedback. Many of the statements routinely made about feedback concern procedures that do not involve any genuine feedback at all; other statements concern only particular (generally rather limited) sorts of feedback in response to very simple tasks. For that reason, I think that, as a matter of some urgency, teachers (and perhaps psychologists) should develop a more detailed, technical understanding of what feedback is and what are its different forms. In order to support such an effort, I have in this essay proposed a classification of feedback.

3. I find Schmidtian explanations of the benefits of delayed feedback more convincing than Bjorkian ones – and it may be that, seen from a Schmidtian perspective, it would be more appropriate to think about the benefits of “reduced” or “averaged” feedback rather than “delayed” feedback. It should also be borne in mind that Schmidt’s research applies exclusively to “augmented” feedback, in which category I include “mirrored”, “dialogic” and “adaptive” feedback, though (in the absence of any research showing the contrary) I assume that the theory does not apply to adaptive feedback. The Schmidtian case is, then, that mirrored and dialogic feedback interfere with the development of the student’s ability to evaluate intrinsic feedback directly. On the other hand, none of the research suggests that you can have too much intrinsic feedback or that it can come too quickly (and this is, of course, the sort of feedback that I was discussing in my example of learning to ride a bicycle).

4. The elephant in the room, which only starts to become apparent when this last point has been accepted, is that such intrinsic feedback is very hard to come by when learning abstract academic subjects. This was the central dilemma that was tackled by Seymour Papert in Mindstorms. The point underlines the central importance of engineering good learning activities, as discussed in Curriculum Matters. I do not think that Dylan would disagree with me here.

5. Like Diana Laurillard, I use the language of conversation and dialogue as an analogy for the instructional feedback loop between teacher and student, even where this does not involve an actual face-to-face conversation. This should not obscure the fact that, in many domains concerning abstract concepts, it may be very difficult to engineer appropriate microworlds (Seymour Papert’s term for simulated environments designed to teach such concepts). Often, the best sort of learning activity is real, old-fashioned, face-to-face conversation with an expert – this is what the Socratic dialectic offered two and a half thousand years ago and what the Oxbridge tutorial offers today. The importance of this approach has also been championed recently by Robin Alexander, who has written about “dialogic teaching”. My Curriculum Matters argued that, where this is the case, the role of technology must be to reduce the amount of time that teachers spend on mechanical work, so that they can give much more productive time to this vital (and rewarding) task. When the teacher participates in such conversations, the feedback that the teacher gives is best thought of as “intrinsic” to the activity, rather than as extrinsic advice or evaluation. Such feedback is best given immediately and, I suggest, nothing in the research referenced by Bjork or Schmidt contradicts this. Conversations depend on rapid responses from your interlocutor.

6. The difficulty of conducting live, face-to-face conversations with hundreds of individual students means that such conversations may have to occur (though in rather less dynamic form) in written feedback as well as face-to-face. This sort of conversational written feedback (“Good point – but have you thought about x?”) may often be more useful than criticism or evaluation. I regard the March 2016 report of the Independent Teacher Workload Review Group as a dangerously superficial analysis of the problem of written feedback. As a piece of work driven principally by the desire to reduce workload and not by any coherent analysis of good pedagogy, I suggest that its main significance lies in its illustration of the intractable dysfunction of our current educational system rather than in any useful recommendations or solutions it offers.

7. I also agree with Dylan on the importance of what I term “adaptive” feedback. This point plays a key part in my argument in Curriculum Matters.

8. Finally, although theories of reduced, averaged and delayed feedback may have placed a health warning on what I term “dialogic” feedback, none of them question its ultimate importance. There may sometimes be better ways of answering the question “how” students should learn, but no amount of mirrors, remedial exercises or simulated environments will answer the questions “why?” or “what?” Instead of drawing the conclusion that dialogic feedback is so unpredictable in its consequences that it is best avoided, perhaps a better way of viewing the problem is to think that this sort of feedback is so precious that the teacher should take care not to devalue it by thoughtless or casual use. This again supports my general argument in Curriculum Matters. We do not de-professionalize the teacher by automating those parts of education that can easily be automated. On the contrary, we give the teacher more space to do what he or she is really good at – and, incidentally, what he or she generally wants to do most – and to do that thing really well.

One thought on “In defence of feedback”

  1. Do you know of any research that supports the use of immediate feedback, particularly of non-trivial modes, i.e. not simply presenting the correct answer?

    I can offer an explanation of why ‘right answer’ feedback does not work; it comes not from education but from Ericksonian hypnosis:

    The issue is one of ‘satisfaction’. I will take your Swahili example. If I offer you the simple feedback “the spelling is roho”, you will find this explanation satisfactory, and will stop thinking about it. But what if I tease you “‘rojo’ would look just right to a Spaniard!” That might engage your attention and send you off on a long trawl that results in some strongly reinforced connections. You now have a whole story, which is much harder to forget.

    I offer this with the caveat that afaik nobody has ever even tried to research it, and I am not sure how they could. I personally find it persuasive, however.
