expert reaction to a preprint reporting deleted deep sequencing data of early SARS-CoV-2 sequences from Wuhan

A preprint, an unpublished non-peer reviewed study, looks at deleted deep sequencing data in an attempt to shed more light on the early Wuhan SARS-CoV-2 epidemic.


Prof David Robertson, MRC Investigator, Head of CVR Bioinformatics, MRC-University of Glasgow Centre for Virus Research, said:

“This seems a bit of a no story. The paper the “deleted” data was originally published in is diagnostic in focus, and all of the variation (the deleted data) is presented in their tables. Obviously this is not ideal for reproducing their results but the paper wasn’t looking at the origins of SARS-CoV-2, rather it is a methodology paper, which is the reason the utility of this data set was missed. It’s hard therefore to conclude this is a cover-up rather than a more mundane deletion of data. Clearly the authors should have shared the data and hopefully it’ll be made available ASAP so it is accessible to others. We also know already that the Huanan market wasn’t the sole spillover event and SARS-CoV-2 was probably circulating in late October/November, e.g., and so there’s not much new understanding being presented. New data is always useful but this is an excellent example of where the peer review is needed, e.g., including an email as a figure is highly unusual. Due to the high genetic divergence involved it’s also a bit simplistic to expect the human SARS-CoV-2 sequences to be properly rooted based on the RaTG13 bat virus. A couple of back-mutations would mess with the assumption being made here and the issue with rooting is discussed in some detail here.”


Prof Martin Hibberd, Professor of Emerging Infectious Disease, London School of Hygiene & Tropical Medicine (LSHTM), said:

“This is an interesting, but speculative, paper. It is suggesting that there were earlier cases of SARS-CoV-2 infections than those reported in the Wuhan market, based on some deleted partial sequences – and that these partial sequences were more similar to the sequences derived from bats.

“More work would need to be done to know how solid these findings are, particularly the accuracy and reasons for the sequence deletions, but it does look intriguing.”


Prof Andrew Preston, Professor of Microbial Pathogenesis, University of Bath, said:

“The paper has some interesting aspects, but it’s going to be very difficult to corroborate the work. The author raises issues with the provenance of SARS-CoV-2 samples, but including data from seemingly deleted SRA files raises the same provenance issues.

“The language of the paper is unusual: it contains a significant degree of supposition and conjecture, cites blog posts and appears to be pointing towards a deliberate cover-up by Chinese authorities of early sequence data from Wuhan. However, this is an entirely subjective appraisal of the situation, which will be very difficult to confirm or disprove.

“Overall, the paper is suggesting that the SARS-CoV-2 sequences attributed to patients with contact with the Wuhan Seafood market, early in the outbreak in China, are not representative of other samples of viruses attributed to other patients from other locations at that time. The paper does add further discussion to how difficult it is to identify the very earliest patients in the outbreak (i.e. patient zero), but as mentioned, the paper veers into non-scientific areas such as cover-ups and deliberate withholding of data.”



Dr David Matthews, Reader in Virology, University of Bristol, said:

“This is an interesting paper which recovers and reanalyses sequencing data that was submitted to an international archive and subsequently deleted at the request of the Chinese scientists who submitted the data. The analysis lends weight to the idea that while Wuhan is likely to be the origin of the pandemic, the Wuhan seafood market investigated by the WHO team and others is unlikely to be “ground zero”. The paper also implies that Chinese scientists were themselves keen on open sharing of the data they were generating but their initial openness might have been curtailed. Exactly why that initial data sharing was later withdrawn is unclear, but it does not seem to have been on pure scientific grounds.”



Declared interests

Prof David Robertson: “No interests to declare.”

Prof Martin Hibberd: “No conflicts to declare.”

Prof Andrew Preston: “No relevant declarations of interest. I am a partner on an EU IMI-2 project on pertussis (whooping cough) vaccinology. GSK and Sanofi are partners and funders of this project, and the Bill and Melinda Gates Foundation is also a funder. I receive funding from GSK for a Ph.D. studentship, also in the field of pertussis vaccines.”

Dr David Matthews: “No conflicts of interest.”

None others received.



