
expert reaction to ONS COVID-19 Infection Survey technical article: analysis of populations in the UK by risk of testing positive for COVID-19, September 2021

The Office for National Statistics (ONS) have released a technical article looking at risk of testing positive for COVID-19.


Dr James Doidge, Senior Statistician, Intensive Care National Audit & Research Centre (ICNARC); and Honorary Associate Professor, London School of Hygiene and Tropical Medicine, said:

“As one of the largest surveys of randomly sampled individuals in the world focusing on COVID-19, the ONS Coronavirus Infection Survey is one of the most valuable resources on the planet.  This analysis provides yet more evidence (see Figure 3) that the immunity conveyed by prior infection is comparable to that provided through vaccination.  Governments everywhere should consider the implications of this for vaccine mandates and for vaccination programs targeting populations with high prevalence of infection, such as children and adolescents in the UK.”


Prof Kevin McConway, Emeritus Professor of Applied Statistics, The Open University, said:

“This ONS article is indeed technical, and pretty complicated in many ways.  I think the modelling team from the collaborating universities and ONS have done a competent statistical job, though, as with any dataset as complicated and rich in information as this one, other analysts might have chosen to analyse the data in different ways, that could possibly have led to rather different results.

“I don’t think there’s anything particularly surprising or new in the headline findings in the ONS article, based on data from the Covid-19 Infection Survey (CIS) from the fortnight 29 August-11 September.  That’s a large survey of a representative sample of the UK community population, that has been going on since the end of April 2020.  The results here come from almost 170,000 people, who were tested in that fortnight, as part of the CIS, for the virus that can cause Covid-19, and also completed questionnaires about their behaviour and characteristics.  Just under 1,900 of them tested positive for the virus (that is, about 1 in every 90).  The headline findings for that fortnight, briefly, are that vaccinated people were less likely to test positive, and people in households of three or more, younger people, people who never wore face coverings indoors, and people who had more social contacts were all more likely to test positive.  As I said, none of that is surprising.
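The "about 1 in every 90" figure can be checked with a couple of lines of arithmetic. The sketch below uses the rounded counts from the quote (170,000 tested, 1,900 positive); the exact survey counts are slightly lower on both sides, so the result is only approximate. The function name `one_in_n` is purely illustrative.

```python
def one_in_n(positives: int, tested: int) -> float:
    """Return N such that the positivity rate reads as '1 in N'."""
    if positives <= 0 or tested <= 0:
        raise ValueError("counts must be positive")
    return tested / positives


# Rounded figures taken from the quote ("almost 170,000", "just under 1,900").
# These are assumptions standing in for the exact survey counts.
tested = 170_000
positives = 1_900

rate = positives / tested
print(f"positivity: {rate:.2%}, roughly 1 in {one_in_n(positives, tested):.0f}")
```

With these rounded inputs the rate comes out a little above 1%, close to the quoted 1 in 90.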

“What’s perhaps more interesting relates to the characteristics that the analysts did not find to be associated with the chance of testing positive.  The data were compatible with women and men having the same chance of testing positive; with people in multigenerational households having the same chance of testing positive as people in other household types; with people of non-white and white ethnicity having the same chance of testing positive; and with the chance of infection in rural areas being the same as in cities and towns.  This could have been because there really was a difference but the level of statistical uncertainty means that the difference was within the margin of error that must always arise with survey data and with statistical modelling.  However, it could have been because there really was no difference, or only a very small difference.

“For instance, the central estimate is that people of non-white ethnicity (who were analysed together as just one group) had pretty well the same chance of infection as people of white ethnicity, about 1 in 90.  But because the number of ethnically non-white people in the CIS sample was considerably smaller than the number of white people (because the UK population contains many more people of white ethnicity than of non-white ethnicity), there is quite a wide margin of error around that estimate, and the chance of infection in people of non-white ethnicity (other things being equal) could have been a different figure, between 1 in 75 and 1 in 105.  This does tell us that there is unlikely to be a really huge difference between the chance of infection in the white and non-white groups, and that there may be no difference or only a very small difference, but somewhat larger differences are also compatible with the data.  Also, in interpreting these figures we need to take into account that the estimates take out of consideration (“control for”) differences between ethnic groups in terms of things like age, some household characteristics, and some characteristics of the areas where people live.
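The link between group size and margin of error can be illustrated with a simple interval for a proportion. This is a minimal sketch using the textbook normal-approximation (Wald) interval, not the model-based intervals used in the ONS article, and the two group sizes are hypothetical values chosen only to show the effect.

```python
import math


def wald_interval(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation 95% interval for a proportion p observed in n people."""
    se = math.sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)


p = 1 / 90  # central estimate quoted in the text

# Hypothetical group sizes, NOT the actual CIS sample sizes.
for label, n in [("smaller group", 15_000), ("larger group", 150_000)]:
    low, high = wald_interval(p, n)
    # A higher probability corresponds to a smaller "1 in N", so the bounds swap.
    print(f"{label} (n={n}): between 1 in {1 / high:.0f} and 1 in {1 / low:.0f}")
```

The same central estimate gives a visibly wider range for the smaller group, which is the point being made about the non-white group in the survey.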

“There are several issues that need to be taken into account in interpreting results like these.  Those issues include the use of what’s sometimes called ‘statistical control’ to try to pick out the extra effects of a factor in addition to other factors in a statistical model, and the fact that this study is observational, so that despite the statistical control, it can’t be clear on what causes what.  I say more about these points in the ‘Further information’ below.  I should also point out an error in Figure 4 in the ONS article.  This gives figures for the likelihood of infection, depending on physical contacts with under 18 year olds and on social contacts with 18-69 year olds.  However, the different categories are given in the Figure as “Aged 0”, “Aged 1 to 5”, and so on.  The text of the article, and (more clearly) the accompanying data file, make it clear that these should be “No contacts”, “1 to 5 contacts”, and so on.

Further information

“There are some issues that always arise in interpreting any complicated statistical model of this kind on data from a survey of this nature.  An important point is that each measure of the association between a single characteristic and the chance of testing positive takes into account the other factors that were examined in the same statistical model.  So, for instance, one of the findings is that the chance of testing positive in people who had their second Pfizer jab more than 150 days ago was about a third of the chance of testing positive in unvaccinated people (with a statistical margin of error going from roughly a quarter of the risk to a bit less than half the risk).  But there will be lots of other differences, on average, between unvaccinated people and people who had their second Pfizer jab quite a long time ago, apart from their vaccination status.  So the estimate of a third is the estimate of the difference you would get if you compared two groups of people, that both had the same pattern of all the other factors in the statistical model (age, sex, the sort of place they live, ethnicity, household size, whether they were previously infected, their employment status, their smoking status, and more), except that one group was unvaccinated and the other had had two Pfizer jabs at least 150 days ago.  This process, sometimes called statistical control, is done in order to try to pick out the association of just one factor at a time with the chance of testing positive.  It’s done in a complicated way in this study, because there are three different statistical models, and the effects of different factors are interpreted in different models.
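The idea of statistical control can be shown on a toy example. The sketch below uses invented counts and a single controlling factor (age group), and it swaps in a simple stratified estimate (the Mantel-Haenszel pooled odds ratio) in place of the regression models used in the ONS analysis. Every number and name here is hypothetical.

```python
# Each stratum: (vax_pos, vax_neg, unvax_pos, unvax_neg). Invented counts in
# which vaccinated people are mostly older and older people test positive less.
strata = {
    "younger": (10, 990, 90, 2910),
    "older": (12, 3988, 9, 991),
}


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds of a positive test in the vaccinated relative to the unvaccinated."""
    return (a * d) / (b * c)


def crude_odds_ratio(tables) -> float:
    """Pool all strata into one table, ignoring age entirely."""
    a, b, c, d = (sum(col) for col in zip(*tables))
    return odds_ratio(a, b, c, d)


def mantel_haenszel_odds_ratio(tables) -> float:
    """Pooled odds ratio that compares like with like within each stratum."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


crude = crude_odds_ratio(strata.values())
adjusted = mantel_haenszel_odds_ratio(strata.values())
print(f"crude odds ratio:        {crude:.2f}")
print(f"age-adjusted odds ratio: {adjusted:.2f}")
```

In this invented data the crude comparison (about 0.17) overstates the apparent benefit, because it mixes the effect of vaccination with the effect of age. The adjusted figure (about 0.33) matches what is seen within each age group.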

“To relate this to the example I mentioned previously, about ethnicity, other studies, particularly in other times earlier in the pandemic, did (sometimes, though not always) find differences between ethnic groups in infection risk, while this study did not.  But the finding in this study that the chance of infection in people of non-white ethnicity could be the same as in people of white ethnicity is making that comparison after taking into account many other factors such as household size, age, and the type of area where people live (urban/rural, level of deprivation), and so on.  Some previous studies, early on, did not take into account some relevant factors.

“But another point is that the study is observational – that is, the researchers don’t change anything that the survey participants do, other than testing them and getting them to answer questions.  That means that we just can’t be sure about what is causing what.  There will be many differences, on average, between people who tested positive and people who didn’t.  The way that the statistical models are fitted means that the analysis has, to some extent, taken account of other factors when measuring the association between a given factor and the chance of testing positive.  But this can’t take into account any factors that weren’t recorded in the survey.  So what could be taken to be a cause-and-effect association between a factor and the chance of testing positive might actually be caused by something else that wasn’t in the data or the statistical model.

“In fact it’s likely to be more complicated still.  There could well be chains of cause and effect involving several factors, for instance, and picking those out from statistical analyses like this is essentially impossible, at least not without taking other information into account.  Trying to take this into account to a certain extent seems to be one of the reasons behind the researchers’ use of several different statistical models, and there’s more on this in the preprint1 that is referred to in the ONS article.  But the researchers, rightly, do not claim that this clears up all aspects of what causes what here.  I do agree with the researchers’ conclusion in that preprint, that “the screening process presented could be a valuable tool in understanding the characteristics driving current SARS-CoV-2 positivity, allowing us to provide enhanced up-to-date understanding of the pandemic across the UK.”  But that, rightly, isn’t saying that their methods can tell us all we need to know about cause and effect.

“Finally, one piece of clarification.  The diagrams in the article show, for the various characteristics, what is labelled as “Likelihood of testing positive”, which is given as a multiplier (2x, half, 1/3, and so on).  This is not referring to the technical statistical term “Likelihood”, which has a different meaning.  What’s being shown is the chance of testing positive, for people with the characteristic in question, compared to the chance in the ‘reference category’ for that factor.  So, in Figure 4 in the article, and ignoring the margin of error, the “Likelihood of testing positive” for people who never wear a face covering in enclosed spaces is a little over 1.5x.  That means that the chance that someone who never wears a mask in enclosed spaces was infected, in the fortnight in question, was a little over one and a half times the chance that a person in the reference category, “Always wears face covering in enclosed spaces”, has of being infected, when all the other factors in the statistical model are controlled for.  If you wanted to know what the chance of being infected actually was for people in that reference category, you could look it up in the data file that accompanies the article – it’s about 1 in 110, so that the chance of being infected for someone who never wears a face covering is about 1 in 70, other things in the model being equal.  And one final technical point about this, in the unlikely event that anyone is concerned – those “Likelihoods” are actually odds ratios, and in general the odds ratio isn’t the same as the ratio of chances (probabilities) – but at the levels of infection that were prevalent in the fortnight in question, the odds ratios will indeed be very close to the ratio of probabilities, so this nit-pick is irrelevant.”
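The closing point about odds ratios and ratios of probabilities can be checked numerically. This sketch assumes a reference chance of 1 in 110 (from the quote) and uses 1.5 as a round illustrative odds ratio; the article's figure is "a little over 1.5x", which is why the quote arrives at about 1 in 70 and not the slightly lower chance computed here. The 50% baseline is a hypothetical contrast case.

```python
def probability_from_odds_ratio(p_ref: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a reference probability and return the new probability."""
    odds_ref = p_ref / (1 - p_ref)
    odds_new = odds_ref * odds_ratio
    return odds_new / (1 + odds_new)


def risk_ratio(p_ref: float, odds_ratio: float) -> float:
    """Ratio of probabilities implied by an odds ratio at a given baseline."""
    return probability_from_odds_ratio(p_ref, odds_ratio) / p_ref


ILLUSTRATIVE_OR = 1.5  # assumption: rounded down from "a little over 1.5x"

# 1 in 110 is the reference chance from the quote; 0.5 is a made-up high baseline.
for label, p_ref in [("1 in 110 baseline", 1 / 110), ("50% baseline", 0.5)]:
    p_new = probability_from_odds_ratio(p_ref, ILLUSTRATIVE_OR)
    rr = risk_ratio(p_ref, ILLUSTRATIVE_OR)
    print(f"{label}: new chance 1 in {1 / p_new:.1f}, ratio of chances {rr:.3f}")
```

At the low baseline the ratio of chances stays very close to the odds ratio of 1.5, whereas at a 50% baseline the same odds ratio corresponds to a ratio of chances of only 1.2. That is why the distinction does not matter at the infection levels seen in that fortnight.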




All our previous output on this subject can be seen at this weblink:



Declared interests

Dr James Doidge: “No conflicts of interest to declare.”

Prof Kevin McConway: “I am a Trustee of the SMC and a member of its Advisory Committee.  I am also a member of the Public Data Advisory Group, which provides expert advice to the Cabinet Office on aspects of public understanding of data during the pandemic.  My quote above is in my capacity as an independent professional statistician.”
