Learning analytics for better learning content

A summary of a break-out discussion at the JISC/CETIS annual conference

For me, the highlight of the 2012 JISC/CETIS annual conference[1] was Adam Cooper’s session on “Mapping Cause and Effect”. Adam asked participants to create diagrams which traced chains of causality (both negative and positive) through to a final, desirable, pre-defined outcome.

I joined a break-out group with Colin Smythe (Chief Architect at IMS), Tore Hoel (Oslo and Akershus University, Norway), Malcolm Batchelor (JISC/CETIS), and Seyoung Kim (Carnegie Mellon University, USA). We chose to analyse what preconditions would favour or disfavour the use of analytics to improve the quality of course materials, taking course materials to be synonymous with learning content. We envisaged a scenario in which the author of a digital course might be able to track the performance of students taking the course. Having discovered from this data which parts of the course worked well and which worked less well, the author could improve the quality of the course materials.

Having sketched out a process diagram to summarise how analytics and content authoring might work together in practice, we identified two key prerequisites of our chosen outcome:

  • the aggregation of data in a data repository, because learning analytics must have suitable data to work with;
  • the reuse of content, because without reuse, there is no point in making any improvements.

The aggregation of data

Turning first to the data, Colin urged on us the need for huge quantities of the stuff—and that it needed to be of good quality.

I heard a very similar message a week later when I attended the “SAP Insider” conference in Las Vegas[2]. SAP is a software company whose products underlie the Enterprise Resource Planning (ERP) systems used by most large multinational companies. The conference keynote was entitled “How ‘Big Data’ Shapes Business Results”. While most SAP databases are measured in terabytes (thousands of gigabytes), the largest are measured in petabytes (millions of gigabytes), and the quantity of data driving these systems is growing constantly. By the middle of the decade, it is estimated that we will collectively be storing about eight zettabytes (trillions of gigabytes) of data.

The keynote identified the desirable qualities of all that data in terms of four “v”s: volume, variety, velocity and validity. It was interesting that in our own discussion, we came up with three of these four, though we did not manage the alliterative flourish.

We thought you needed:

  • large quantities of data (volume);
  • varied sources of data (variety);
  • semantically meaningful data (which is similar to validity: validity is about accuracy, whereas semantic meaning is about having a meaningful standard in the first place, against which that accuracy can be measured). This is a complex topic, touching on structured, unstructured and semi-structured data, which deserves another article in the future.

All that our group at the CETIS conference missed out from the SAP list was velocity. Maybe this omission just reflects how far behind the business world we are in education. While business measures data velocity in seconds, schools (if they have any meaningful data about student performance and competency at all) are doing well if they can measure it in months.

The omission of velocity from our list also reflects the nature of the educational requirement. If Johnnie is perceived to be weak on Simultaneous Equations, he does not need to be dragged out of his Geography lesson by the rapid-response remediation team so that the problem can be solved before lunch. Some extra revision over the next few weeks would be more appropriate.

In addition to the three essential characteristics of big data, our group also noted the need to resolve issues around data protection and privacy, if such aggregations of data were to be acceptable to the public in an educational context.

Only later did we hit on a fifth contributor to the creation of useful stores of aggregated data, and possibly the most interesting one.

Content re-use

In the meantime, we turned to the other half of our process map.

Of all those qualities which became known to the SCORM community as the “-ilities”, reusability has been the subject of the most heated discussion. What might initially have seemed a modest aspiration proved in practice to be very difficult to achieve.

We agreed that if learning content were to be widely reused, it needed to be:

  • of good quality (otherwise there would be little incentive to use it at all);
  • adaptable (because no two teaching contexts are likely to be identical and so content specifically designed for one purpose will not otherwise be reusable in another);
  • disaggregated (because the larger the granularity of content, the less likely it is to fit into different teaching contexts: teachers tend to be magpies, accumulating small pieces of content from different places, but rarely locking themselves into a single, all-encompassing script).

Perhaps the most interesting of these three criteria is the proposition that, in order to be susceptible to improvement, content must be of a certain quality to start with. This suggests that we are looking at a quality threshold, below which content will be stillborn and above which it will not only survive but steadily improve.

Good quality content

We identified four contributory factors in the creation of high-quality content:

  • authors must be incentivised to create quality content;
  • there must be investment in the authoring process (which includes the salaries of suitably skilled authors, as well as other development costs);
  • there must be an understanding of what quality means—and in particular that this generally means activity-based content and not just expositive media;
  • authors should have access to high-level tools (enabling the production of this kind of activity-based content without having to become full-time programmers).

Colin and I wanted to make the first point about incentives because, even though it might appear so obvious as not to be worth mentioning, we both felt that most academic authors working in HE are not so incentivised. When a lecturer or teacher produces poor quality content for the use of their own students, it is very unlikely that the author will suffer any financial or professional penalty; any shortcomings can be mitigated by the author’s teaching; and so few people are likely to use home-made content that it is in any case not worth putting a significant amount of work into attempting to achieve perfection.

The need for significant investment in the creation of high quality content also challenges the whole notion of the amateur teacher-author producing home-made or OER content. The proof of this pudding lay in the fact that most OER content comprises “expositive media”: PowerPoints, Word documents, a graphic or some video if you are lucky. And yet any halfway-competent educationalist knows that we learn by doing, not by having information thrown at us. In this sense, hardly any current amateur-authored learning content gets anywhere near the threshold at which it will be reused and improved.

This same conclusion had been endorsed by Adam Cooper’s presentation earlier in the day, in which he analysed the use over the previous year of key buzzwords in the CETIS blogs[3]. The popularity of each term was measured both by the frequency of its use and the degree of approval or disapproval with which it was used. Adam noted that the term “OER” (Open Educational Resources) had declined in popularity, concluding that the promise of OER had not been realized because in practice (wait for it) it tended to comprise pedagogically uninteresting PowerPoints and Word documents.

Even if teachers and lecturers are not generally able to produce quality content, they nevertheless need to be able to control the way that content is presented to their students. No teacher wants to be locked into a script devised by someone else. We concluded that this circle could be squared by the production of appropriate high-level authoring tools. These would need to allow the creation of activity-based content, automating the parts that non-technical authors could not be expected to master, and delivering consistent, high-quality pedagogical processes with general application. The teacher or lecturer would then be left to add the academic and curriculum information that made the activity appropriate to the current educational context and goals.

High-level authoring tools could be thought of as another form of adaptable content. In some circumstances, you might choose to think of an authoring system creating different content instances; in other circumstances, you might choose to think of the authoring tool as the content, with adaptations for different educational contexts. The two scenarios differ only in respect of whether you choose to view the tool or the activity as the true “content”. This illustrates a point made elsewhere[4], that there are many different types of content in an educational ecosystem.

Commercial provision

Given that content (including high-level tools) created by amateurs is not generally adequate, it follows that commercial provision is the key enabler for the production of high quality, reusable content.

In particular, we thought that commercial provision would enable:

  • appropriate levels of investment;
  • the incentives to optimise content;
  • the expertise required to create high-level authoring tools.

It is worth noting that, although it has been clear that amateur content is of little pedagogical use, the creation of learning content by amateurs has underpinned the strategy pursued by Becta and now the DfE over the last ten years. All the evidence is that the utility of the current generation of learning platforms is undermined because of the difficulty of populating them with satisfactory content. A fashionable perspective amongst many in the ed-tech community is that students should be encouraged to create learning content.

There are complex reasons why the commercial supply chain is currently so weak and our group did not have the time to discuss this question thoroughly. The one key enabler for commercial provision that we identified was good content distribution systems. Without the ability to sell widely, commercial companies will not be able to see the returns to justify the substantial investments required. Other key barriers in the current environment include the constant meddling in the market by government procurement frameworks; the funding of public sector initiatives which undercut commercial suppliers without providing any lasting benefit; and the uncompetitive practices of awarding bodies, which hold an important key to the sale of learning content in formal education.

There was an interesting exchange in Alan Sugar’s boardroom during the final round of his last Apprentice TV show. One of the finalists, misguidedly trying to appeal to Lord Sugar’s altruism, suggested that he set up a project to promote the use of computers to improve education. Lord Sugar turned to Bordan Tkachuk, CEO of Viglen, which was awarded a place on Becta’s Infrastructure framework in 2010, remarking “We tried that, didn’t we. There’s no money in education”. Until innovative commercial companies are able to make a reasonable return on their investments in education technology, education will have to continue to make do and mend with low-grade, home-made, overwhelmingly expositive learning content.

It always surprises me how much contention this point of view tends to cause in the ed-tech community. In my experience, there are commonly three objections made to commercial content.

1. “There is not the money in education to fund commercial provision of content”. But the cost of education in this country lies overwhelmingly in wages and very little in resources. In many subject areas, there are not enough suitably qualified people to fill the jobs being advertised. This presents the ed-tech industry with a huge opportunity to create software systems that reduce the wage bill, not costing but saving money. This does not mean doing away with teachers, any more than books do away with teachers. It means saving teacher time and making efficient use of a scarce and expensive resource.

2. “Commercial content disenfranchises the teacher, who is the only person qualified to design appropriate teaching methods for his/her students”. This argument judges the learning content that an efficient market would supply by the degraded standards of most OER. As our discussion group identified, quality content can be easily disaggregated, selected and adapted, giving the teacher control over how the content is used.

3. “Look at company x: they produce product y which is very expensive and is completely useless”. If this type of anecdote is true, then the fault lies not with the company (which cannot be blamed for selling its products); nor with the principle of commercial provision, which has done so much to raise our standards of living; but with the incompetent, bureaucratic systems that procured such inappropriate technology in the first place. Given an open market, teachers would buy what they found useful, and successful companies would be those who supplied such useful products.

Outcome metrics

So far, our system map has developed as a straightforward hierarchy. Starting with two initial enablers, it has divided itself neatly, like a brain, into two separate hemispheres. Towards the end of our discussion, we were able to add a critical piece that linked the two hemispheres and promised the kind of virtuous circle that could trigger exponential improvement.

Unlike expositive media, the kinds of activity-based content proposed in the right hand side of our system map play or run over time and produce outcomes; outcomes can be expressed as data; and data can be automatically exported into the type of repository proposed in the left hand side of our system map.

Activity-based content is therefore a causal contributor to the creation of automatically-generated outcome metrics, which are in turn a causal contributor to the accumulation of voluminous, varied and valid aggregated data repositories.

This is a critical enabler because the Achilles heel of any learning analytics system is data input. It is not remotely realistic to expect serried rows of teachers to man the ICT suite long into the evening, keying in student performance metrics. The only way to get this data is to capture it automatically, and that data will only be captured if you digitize the activities which form the basic “stuff” of instruction.

There are many examples which can be used to illustrate this general principle. The revolution in supermarkets was largely based on the bar-code reader, which automated data input at the check-out. One aspect of the commercial potential of the internet lies in the ability of companies to capture data about their customers, either by getting them to fill in forms themselves, or by tracking their browsing behaviour. A friend of mine involved in developing in-car navigation systems told me about the problem they had in getting reliable data about traffic flows in London. Their answer was to give away their GPS-enabled systems to all London taxis, and monitor traffic flows by watching their own boxes move around the city. Data capture is critical and in our system map, this sine qua non of learning analytics is represented by outcome metrics.
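To make the chain from activity-based content to outcome metrics to aggregated data more concrete, here is a minimal Python sketch. It assumes a hypothetical quiz activity and a hypothetical central repository; none of the class or field names comes from any real product or standard. Each response is marked by the activity and exported as an outcome record at the moment it happens, so nobody has to key anything in, and the repository can then report which items learners found hard.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class OutcomeRecord:
    """One outcome produced by a learner working through an activity (hypothetical format)."""
    learner_id: str
    activity_id: str
    item_id: str
    score: float      # normalised to the range 0.0-1.0
    timestamp: str    # ISO 8601, UTC


class OutcomeRepository:
    """Hypothetical central store that aggregates outcomes from many activities."""

    def __init__(self) -> None:
        self.records: list[OutcomeRecord] = []

    def ingest(self, record: OutcomeRecord) -> None:
        self.records.append(record)

    def mean_score_by_item(self) -> dict[str, float]:
        """Average score per item: low values flag parts of a course that may need work."""
        scores: dict[str, list[float]] = {}
        for r in self.records:
            scores.setdefault(r.item_id, []).append(r.score)
        return {item: sum(s) / len(s) for item, s in scores.items()}


class QuizActivity:
    """Hypothetical activity-based content that reports every response automatically."""

    def __init__(self, activity_id: str, answers: dict[str, str],
                 repository: OutcomeRepository) -> None:
        self.activity_id = activity_id
        self.answers = answers          # item_id -> expected answer
        self.repository = repository

    def respond(self, learner_id: str, item_id: str, response: str) -> float:
        # Mark the response, then export the outcome with no manual data entry.
        score = 1.0 if response == self.answers[item_id] else 0.0
        self.repository.ingest(OutcomeRecord(
            learner_id, self.activity_id, item_id, score,
            datetime.now(timezone.utc).isoformat()))
        return score


if __name__ == "__main__":
    repo = OutcomeRepository()
    quiz = QuizActivity("simultaneous-equations-1", {"q1": "x=2", "q2": "y=3"}, repo)
    quiz.respond("johnnie", "q1", "x=4")
    quiz.respond("johnnie", "q2", "y=3")
    quiz.respond("amy", "q1", "x=2")
    quiz.respond("amy", "q2", "y=3")
    print(repo.mean_score_by_item())  # {'q1': 0.5, 'q2': 1.0}
```

A low average on one item (here q1) is the kind of evidence that might prompt a course author to revisit that part of the material; in a real system the repository would be a separate service rather than an in-memory list.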

Interoperability

Interoperability is the last enabler to be added to our system map, and only once the rest of that map is complete can its true significance be appreciated.

Interoperability is a key enabler for:

  • the accumulation of data from varied sources, which will need to share common formats, transports and protocols to communicate effectively;
  • the accumulation of semantically meaningful data, given that it is a significant challenge to preserve the meaning of data when it is passed between different parties;
  • the automatic transmission of outcome metrics from activities managed by third-party software to centrally managed analytics systems;
  • the management of adaptable content, where the settings or parameters which define particular adaptations may often need to be accessed through a common learning management system or other gateway;
  • the management of disaggregated content, given that there needs to be agreement on what level of granularity is permissible to the rights holder, and common standards for dividing and joining content aggregations;
  • the distribution of content, given the need for standard formats of metadata to describe the coverage, legal constraints, behaviours, and intended usage and popularity of distributed content objects.
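The point about preserving the meaning of data as it passes between parties can be illustrated with a small sketch. It assumes two hypothetical third-party tools that report the same kind of outcome in different shapes: one sends a percentage in JSON, the other a fraction as comma-separated text. Averaging their raw numbers would be meaningless; mapping both into an agreed common format first, and rejecting anything that does not conform, keeps the aggregated data valid. The tools, field names and format are illustrative assumptions, not any real specification.

```python
import json

# Hypothetical common format agreed between suppliers: every record carries these
# fields, and "score" always means the proportion correct, from 0.0 to 1.0.
COMMON_FIELDS = ("learner", "activity", "score")


def from_vendor_a(payload: str) -> dict:
    """Hypothetical tool A: JSON with a percentage score (0-100)."""
    data = json.loads(payload)
    return {"learner": data["user"], "activity": data["task"],
            "score": data["percent"] / 100}


def from_vendor_b(payload: str) -> dict:
    """Hypothetical tool B: comma-separated values with a fractional score."""
    learner, activity, fraction = payload.split(",")
    return {"learner": learner, "activity": activity, "score": float(fraction)}


def validate(record: dict) -> dict:
    """Reject records that do not conform to the common format."""
    missing = [f for f in COMMON_FIELDS if f not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not 0.0 <= record["score"] <= 1.0:
        raise ValueError(f"score out of range: {record['score']}")
    return record


def mean_score(records: list[dict]) -> float:
    """Aggregate only after every record shares the same meaning of 'score'."""
    return sum(r["score"] for r in records) / len(records)


if __name__ == "__main__":
    records = [
        validate(from_vendor_a('{"user": "s1", "task": "algebra-1", "percent": 75}')),
        validate(from_vendor_b("s2,algebra-1,0.5")),
    ]
    print(mean_score(records))  # 0.625
```

One design consequence is that, with a shared format, each new tool needs only one mapping into that format, rather than a separate translation for every other tool it has to exchange data with.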


Our discussion focused on a single, narrowly defined outcome. The causal enablers of this outcome are nevertheless ones that would be likely to appear in almost any analysis of causal relationships for effective e-learning.

At the end of the discussion, we highlighted what we felt were the most important levers for change, choosing interoperability, semantically meaningful data, and content distribution systems.

In redrawing the map, I would change the emphasis a little and choose instead interoperability, outcome metrics, and commercial provision. My reason for changing this emphasis is that semantically meaningful data and good metadata for content distribution are both implied by interoperability, while the development of appropriate distribution channels will be driven by a healthy commercial market. I therefore see commercial provision as the prior cause, itself dependent on the procurement policy pursued by governments, which is outside the control of the industry and the standards community.

Of all the different types of interoperability, I have selected outcome metrics as the most significant, not just because it is the link between the two halves of our system map, but because it is the key means by which management data will be captured. It also presupposes the critical understanding that learning content should be about activity, and not about the production of expositive media.

St Paul recommended three virtues to the Corinthians[5], and both the discussion group and my reworked version of its output selected three key levers for change. If the group and I disagree slightly on our selection of the three, I do not think there is any doubt about our selection of the single most important lever for change. Paraphrasing St Paul, “the greatest of these is interoperability”.


I attach a photograph of our original flipchart map. I have modified this a little, rearranging some of the linking arrows and leaving out two boxes: “analytical skill of author”, which I think is implied in “investment”; and “personalised learning”, the relevance of which in this context is not clear to me.

We created a diagram that distinguished between positive and negative causal influences (shown by blue and red boxes respectively). In retrospect, I am not sure that this distinction was useful. Every negative cause could be equally well expressed as its positive converse: one could equally well say that progress was favoured by good quality content as that it was disfavoured by poor quality content. The decision to express something as either a positive or a negative influence did not therefore reflect the essential nature of the causal relationship, but suggested a generalised assessment of the current state of affairs (e.g. was the current quality of content generally good or generally bad?). In redrawing our mind-map for this post, I have therefore removed this distinction, expressing all relationships in their positive form.

Related posts

Scrapping “ICT” argued that the term “ICT” was no longer useful and should be scrapped. I did not know at the time that the Royal Society had published a report five days earlier which came to the same conclusion.
Aristotle’s Saddlemaker makes the argument for education-specific software, based on a discussion of the relationship between ends and means found originally in Aristotle’s Nicomachean Ethics.
What do we mean by “content”? argues that we need a broader understanding of the term “learning content”, including for example “learning activities” and not just “expositive media”.
Home page, with a full listing of posts on this blog.


[1] JISC/CETIS is the lead body for learning technology standards for Higher Education. The details of their 2012 conference are at http://jisc.cetis.ac.uk/events/register.php?id=294.

[3] http://blogs.cetis.ac.uk/adam/2012/02/21/edtech-blogs-a-visualisation-playground/ (although the link to “falling terms”, which is needed to view the downward trend for “OER”, was not working at the time of writing).

[4] I touch on this point in Aristotle’s Saddle-maker, at https://edtechnow.net/2012/01/25/aristotles-saddle-maker/; and will return to the point in more detail in my next post.

[5] See 1 Corinthians, Chapter 13: “Now faith, hope, love, abide these three; but the greatest of these is love”.

5 thoughts on “Learning analytics for better learning content”

  1. In a subsequent conversation, Adam Cooper told me that I had misinterpreted his comments at the plenary. Although it was true that less emphasis had been placed on the term in CETIS blogs, he still felt there was considerable interest in OER in the JISC/CETIS community.

  2. Crispin, I enjoyed your “Learning Analytics for Better Learning Content” blog post and thought you had the perfect rationale for the DMD the need for semantically meaningful data and why the aggregation.
    The whole quest is about producing “good” content and without regard for the other factors – incentives, tools, etc, I agree good content is activity based content. “we learn by doing, not by having information thrown at us.” That is why activities like simulation simulators, games, drills, etc are great examples of activity based content
    Your sketch (Activity based Content > Outcome metrics > Aggregation of data) is the rationale for the data – outcomes or show me the data. Your analogy of what big data did for guys like Walmart and others by analyzing the sales info /bar code data or the guy that gave away GPS devices to taxi cab drivers to figure out the traffic flow in London by capturing the data off of their GPS systems – brilliant stories of why machine read data and what can happen with learning analytics and what we may accomplish. Thanks Frank

    • Hi Krisca Te,

      I am sorry to be so slow in responding – I am afraid I got distracted from the blog over the UK summer. Thanks very much for the infographic: I have embedded this on a new “guest blog” accessible through the main menu – and I will comment soon.

      I intend to post again soon on learning analytics – and will take time to have a good look at your site before I do.

      Best, Crispin.
