
expert reaction to a study looking at the effectiveness of AI at diagnosing disease compared to health professionals

Research, published in The Lancet Digital Health, reports that artificial intelligence may be as effective as health professionals at diagnosing disease.


Dr Franz Kiraly, UCL Department of Statistical Science, and Turing Fellow, Alan Turing Institute, said:

“The study looks like it’s been carefully conducted (as a review), and the statements are valid.

“The big caveat is, in my opinion, that the story is not ‘AI may be as good as health professionals’, but that ‘the general standard of evaluating performance of AI is shoddy’.

“As the study states, only a small fraction of studies of this type adhere to the more obvious best practices, and there are a number of potential subtle issues that the review didn’t check.

“Regarding doing RCTs with algorithms: I don’t think there’s a major problem scientifically. The problem is (a) with the regulators (there are otherwise no incentives), and (b) with infrastructure – AIs today are usually not of sufficiently high quality (in terms of software and infrastructure) that they could be systematically tested in a study.”


Dr Maggie Cheang, Team Leader in Genomic Analysis, The Institute of Cancer Research, London (ICR), said:

“This review article demonstrates the potential of applying AI to the analysis of radiological images in the diagnostic setting, but the promise is a little premature.

“AI algorithms remain a “black box” in how they are trained, optimised and validated. I look forward to a real “head-to-head” comparison between human assessment and AI diagnostics in accuracy and, most importantly, added clinical benefit in a multivariate analysis in a randomised trial, to the same standard we have applied to other genomic biomarker IVDs (in vitro diagnostic devices).”


Dr Nils Hammerla, Director of Machine Learning and Natural Language Processing, Babylon, said:

“I congratulate the authors on this monumental effort to sort through many thousands of studies to conduct this meta-analysis. I am glad to see the promising results and that studies have improved in quality over time. 

“I think a major takeaway is that machine learning researchers have to adapt their methodologies to better bridge the gap between their community and clinical practice. This is vital to increase trust in AI and to enable this research to actually make a difference. It starts with their language – there is very little consistent use of terminology – and extends to more thorough study design and standardised reporting of results.

“Machine learning can have a massive impact on problems in healthcare, big and small, but unless we can convince clinicians and the public of its safety and ability then it won’t be much use to anybody.”


Prof David Spiegelhalter, Chair, Winton Centre for Risk and Evidence Communication, University of Cambridge, said:

“This excellent review demonstrates that the massive hype over AI in medicine obscures the lamentable quality of almost all evaluation studies. Deep learning can be a powerful and impressive technique, but clinicians and commissioners should be asking the crucial question: what does it actually add to clinical practice?”


Prof Paul Leeson, Professor of Cardiovascular Medicine, University of Oxford, said:

“This paper summarises the current state of research testing how well artificial intelligence identifies disease in a medical image compared to a clinician. The review has been performed very carefully, but real challenges were encountered in trying to get a useful result. The authors had to lump together findings from completely different medical problems and types of imaging, including research performed at a very early stage in development. They have also had to ask a very simple question, which is not really relevant to the majority of AI applications in healthcare. Clinicians are not likely to use results generated by AI in isolation, head-to-head with a computer, but rather to combine the information from AI tools with other sources to decide how best to look after a patient. Importantly, the work highlights that a new phase of research is needed, using more detailed trials, to work out the best ways to use artificial intelligence in healthcare.”


Dr Peter Bannister, Chair of the IET’s Healthcare Panel, said:

“The application of artificial intelligence (AI) techniques to diagnostics continues to attract considerable attention given the potential upside in terms of sensitivity, repeatability and throughput when applied to large, information-rich datasets, including medical images. However, there has been limited adoption to date and moreover there exists scepticism as to whether these approaches can ever yield a net patient benefit when implemented at scale. This comprehensive study clearly illustrates what is possible but also identifies the large evidence gap faced by nearly all groups who have tried to apply AI to diagnostics. The paper considers a broad range of clinical applications and highlights key issues with study design and reporting which, if applied, should deliver a much-needed, greater level of clinical rigour to the evaluation of these techniques, which should ultimately increase the likelihood of effective AI diagnostics being integrated with routine clinical workflows.”


Prof David Curtis, Honorary Professor, UCL Genetics Institute, said:

“This is an interesting report with some important findings.

“I think the most striking aspect is that, out of over 20,000 published studies applying AI to medical imaging, only 14 were good enough to use. That’s fewer than 1 in a thousand. Almost all published studies of AI for medical imaging did not use proper methods and can be safely ignored.

“Of the tiny handful of studies which were actually valid, the results show that AI may interpret imaging about as well as medical professionals but in many cases the professionals were denied access to information which would have been available to them in a real clinical scenario.

“The review authors draw attention to the many problems which may arise when attempting to apply AI approaches in clinical practice and which are not captured by reports of accuracy in an artificial test situation.

“The review confines its attention to the use of AI for interpreting medical imaging. In a way, this ought to be one of the easiest areas in which to apply deep learning approaches, and some success in this limited area certainly does not mean that AI implementations would be useful in other areas of medical practice.

“To be honest, implementing AI processes demands very advanced computational techniques which editors, reviewers and readers of medical journals simply do not properly understand. This means that they can fail to see the limitations of these methods and that people can form an over-optimistic opinion of their usefulness.”


Prof Richard Mitchell, Professor of Cybernetics and University Teaching Fellow, University of Reading, said:

“Great strides are being made in artificial intelligence, including the use of Deep Learning methods, and in some circumstances such systems can outperform humans. One issue with Deep Learning is that, unlike, say, an Expert System (a computer system that emulates the decision-making ability of a human expert), it is not straightforward to ‘explain’ why a particular outcome was reached. There are some examples where a combination of human and artificial intelligence gives an even better result, and that may be the better route to take.”


Dr Dennis Wang, Lecturer in Bioinformatics and Genomic Medicine, University of Sheffield, said:

“Despite the performance of the deep learning algorithms at predicting diagnoses, the authors rightly pointed out that many of these algorithms were poorly described. This will hinder the adoption of AI techniques in the clinic, since patients and clinicians want not only the final diagnosis but also how the algorithms arrived at their conclusion. Would you feel comfortable listening to a ‘black box’ without knowing how it worked?”


‘A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis’ by Liu et al. was published in The Lancet Digital Health at 23:30 UK time on Tuesday 24 September 2019.

DOI: 10.1016/S2589-7500(19)30124-4


Declared interests

Dr Nils Hammerla: “I work at (and am a shareholder in) Babylon, which uses artificial intelligence and machine learning to provide healthcare tools for patients and clinicians.”

Prof Paul Leeson: “Research grants in the field of AI and medical imaging. Founder and Director of Ultromics, an echocardiography-AI company.”

Dr Peter Bannister: “I’m on the same leadership training course as one of the authors but we have no professional links”

Prof David Curtis: “I have no conflict of interest”

Prof David Spiegelhalter: “No conflicts of interest”

Prof Richard Mitchell: “None”

Dr Dennis Wang: “I have no conflicts.”

None others received.
