In the UK we are right in the middle of the two-week period in which 16- and 18-year-olds get the results of the public exams that will determine so much of their future prospects. For the past two years, this period has also underlined problems of algorithmic accountability, transparency and ethics.
The 2020 debacle
Two years ago, in the midst of the pandemic, the UK government announced that all 2020 public exams would be cancelled and that students would instead be awarded grades based on the assessments of their teachers, known as Centre Assessment Grades (CAGs).
Ofqual (the Office of Qualifications and Examinations Regulation), which regulates public exams in England, was concerned that, because teachers tend to be overly optimistic about their students, this would result in students being awarded higher grades than they deserved, leading to rampant grade inflation.
So Ofqual asked teachers to also assign a ranking to each student in their class: a simple ordering from the best-performing student to the weakest, with no ties allowed. Teachers found this incredibly challenging, and it led to many sleepless nights as they worried about the impact their rankings might have on each student’s future.
Ofqual said the rankings would be used “to standardise judgements—allowing fine tuning of the standard applied across schools and colleges”, but no detailed information about how this ‘fine tuning’ or ‘standardisation’ would work in practice was forthcoming.
The algorithm
The algorithm that Ofqual came up with, working with Cambridge Assessment, was complicated, but it essentially used the following elements to work out the final grade for each student:
- The historical grade distribution at the school over the three previous years (2017–2019).
- The rank of each student in their year based on their teacher’s evaluation.
- The previous exam results for a student in each subject.
But the way the algorithm was designed meant that the rankings, which teachers had agonised over providing, ended up being the most important element, largely determining each student’s final grade.
In practice, the model took the historical grade distribution of a school and assigned each 2020 student a grade according to where they sat in the current year group’s ranking, relative to previous cohorts. For example, if a student was at the mid-point of the ranked list in a subject, their grade was roughly whatever the student at the mid-point of that subject’s results at their school had obtained over the previous three years.
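To make the mechanism concrete, here is a deliberately simplified sketch of rank-based standardisation of this kind. It is not Ofqual’s actual model (which also factored in prior attainment and other adjustments); the grade distribution, cohort size and function name are invented for illustration.

```python
# Simplified illustration of rank-based standardisation (not Ofqual's actual code).
# A school's historical grade distribution is mapped onto the current,
# teacher-ranked cohort: a student's position in the ranking alone decides
# which grade band they fall into.

HISTORICAL_DISTRIBUTION = [   # share of grades at this school, 2017-2019 (invented)
    ("A*", 0.05),
    ("A", 0.20),
    ("B", 0.35),
    ("C", 0.30),
    ("D", 0.10),
]

def grade_from_rank(rank: int, cohort_size: int) -> str:
    """Return the grade implied by a student's rank (1 = best) in the cohort."""
    percentile = (rank - 0.5) / cohort_size   # mid-point of the student's rank slot
    cumulative = 0.0
    for grade, share in HISTORICAL_DISTRIBUTION:
        cumulative += share
        if percentile <= cumulative:
            return grade
    return HISTORICAL_DISTRIBUTION[-1][0]

# A class of 20: the teacher's ranking alone decides the grade each student gets.
for rank in (1, 2, 10, 20):
    print(rank, grade_from_rank(rank, 20))
```

Even in this toy version, the historical distribution acts as a hard ceiling: if no A* appears in it, no student in the current cohort can be awarded one, however strong they are – which is exactly the outlier problem described below.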
The results were catastrophic. A mass downgrading of grades meant that many students lost university offers on Results Day, with far-reaching and long-term consequences for their futures. There were three particular aspects of Ofqual’s algorithm which caused problems:
Outliers
If you were a high-performing student at a school which had not produced similarly high grades during the three previous years used as data for the algorithm, you were penalised. In other words, if no one at your school had achieved an A* in your subject in the past three years, you were very unlikely to be awarded an A*, even if your past grades and CAG indicated that you should be.
Small class sizes
In cases where five or fewer students from a school were entered for a subject, the CAG (teacher) grades were used to award results instead of the algorithm. These were typically higher than the grades the algorithm generated, and because small classes are more common in independent (private) schools, this exception usually benefitted their students.
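A minimal, hypothetical sketch of that fallback rule is below; the function name and threshold parameter are illustrative, not taken from Ofqual’s specification, and any blended treatment of slightly larger cohorts is ignored.

```python
def awarded_grade(cohort_size: int, algorithm_grade: str, cag_grade: str,
                  threshold: int = 5) -> str:
    """Fall back to the teacher's CAG for small subject cohorts."""
    # Small cohorts (five or fewer entrants) kept their teacher-assessed grade;
    # everyone else received the algorithm's standardised grade.
    return cag_grade if cohort_size <= threshold else algorithm_grade

# A class of 4 keeps the (typically higher) CAG; a class of 30 does not.
print(awarded_grade(4, algorithm_grade="B", cag_grade="A"))    # -> A
print(awarded_grade(30, algorithm_grade="B", cag_grade="A"))   # -> B
```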
Rounding down
The model predicted the proportion of each grade in a class on a continuous scale, and this then had to be converted into a whole number of students. For example, in a class of 27, if the model suggested that 5.7% of the class should be awarded an A* grade, should one student (3.7%) or two students (7.4%) receive an A*? The algorithm was designed to round down for the higher grades, leading to a mass downgrading of CAGs.
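As a rough illustration of that arithmetic – a hypothetical sketch rather than Ofqual’s actual rounding rule – flooring the predicted count of top grades systematically shaves students off the highest bands:

```python
import math

class_size = 27
predicted_a_star_share = 0.057   # model output: 5.7% of the class "should" get an A*

exact_count = predicted_a_star_share * class_size   # 1.539 students
rounded_down = math.floor(exact_count)              # 1 student  (3.7% of the class)
rounded_up = math.ceil(exact_count)                 # 2 students (7.4% of the class)

print(f"exact: {exact_count:.3f}, floor: {rounded_down}, ceil: {rounded_up}")

# Rounding down at the top grade means the awarded share (3.7%) falls short of
# the predicted share (5.7%); repeated across thousands of classes, this
# systematically pushes students below the grades their teachers expected.
```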
A cautionary tale of algorithmic accountability
After a huge national outcry when grades were initially announced, with many individual cases of unfairness reported in the media, Ofqual backtracked and reverted to the CAG grades, throwing the whole UK university admissions process into chaos. Offers which had been withdrawn on Results Day when grades weren’t met then had to be reinstated, and in some cases the courses were already full by the time this happened. After the reversal of the algorithm-awarded grades, colleges and universities found themselves having to honour far more offers than they had predicted for that year, requiring some students to defer their entry for twelve months. Some students gave up, applied to other universities instead, and missed out on places they were entitled to. The fallout from the chaos caused by the algorithm is still being felt two years later.
It beggars belief that an untested, untried, largely opaque algorithm could ever have been used to determine something as sensitive and life-changing as A level grades. Algorithmic accountability and transparency seem never to have been priorities for Ofqual. They were fiercely protective, verging on secretive, about the design of the algorithm and rejected many outside offers of help to assess it before it was used.
The Royal Statistical Society, for example, criticised the lack of independent expertise on the technical panel set up to advise Ofqual on the algorithm; it called for an advisory panel to be created and suggested independent statisticians to sit on it – only to be told by Ofqual that they would need to sign an NDA to be considered.
The UK’s A level algorithm was one of the first high-profile cases of algorithmic accountability and unfairness being brought into public view. But opaque and biased algorithms increasingly affect all areas of our lives: from social media to education, from finance to law and order. If Ofqual had approached the design of its algorithm with a commitment to transparency and involved independent, objective experts in its design, there’s a real possibility that the whole debacle could have been avoided. That is worth bearing in mind, both for the public – who should be calling for more transparency in the algorithms that affect them – and for those who will design algorithms in the future.
For more on algorithmic accountability and how algorithms are changing our world, pick up a copy of my new book.