Scientists comment on a UK Biobank data breach where data was offered for sale online in China.
Anni Feng, Co-Chair of the Digital Futures Policy Centre at the Institution of Engineering and Technology (IET), said:
“Data is incredibly valuable and powerful and therefore often a target for cyber-attacks. But data is also fundamental to technology and innovation: the more high-quality, interoperable data we have, the more accurate the picture it provides.
“Even the data we have now is not robust enough or readily sharable enough to allow advancements in AI and medicine to reach their true potential, which can have an impact on things like reproducibility. Ethical data sharing is going to be critical to the successful development, deployment and assessment of tools. The IET recommends that the Responsible Handover of AI should be included as part of ethical guidance. We also need to ensure we have diverse data from different data sources, as there will naturally be a bias in the data if people are volunteering their data or if we model technologies on data from a population with a ‘healthy’ or ‘unhealthy’ bias.
“Technologies such as environmental sensors, monitoring air quality, temperature, and noise, can be linked with patient health data to improve outcomes. However, fundamentally, transparency around the data used is critical to ensure that we maintain trust and build public confidence in the use of their data. Training and emphasis on data handling skills for researchers working with data should be a key focus.”
Anna Steere, Head of Understanding Patient Data, said:
“UK Biobank participants have made an exceptional contribution to health research, volunteering their data through explicit consent to support the public good. Reports like this will understandably be unsettling, particularly for people who have taken part in good faith.
“It is important to recognise that UK Biobank is a specific, consented research cohort, and not representative of how most people’s health data is routinely used across the NHS. When distinctions like this are unclear, incidents can fuel wider anxiety and undermine confidence across the whole health data system.
“We welcome the swift initial steps taken to remove the listings, suspend access and pause data sharing while safeguards are reviewed. Establishing clear facts and understanding what went wrong will be essential.
“Trust in the use of health data is not built on the idea that nothing ever goes wrong, but on what organisations do when it does: through accountability, openness and a clear commitment to strengthening protections in response. Those early actions matter in maintaining public confidence and ensuring essential research can continue responsibly.”
Prof John Gallacher, Director, Dementias Platform UK, University of Oxford, said:
“Whilst this data breach is alarming, it is also a backhanded compliment to the global value of UK Biobank. UK Biobank is one of the world’s foremost biomedical resources; its breadth and depth of data creating scientific opportunity globally. Critical to this value is maintaining the covenant of trust between researchers and participants. The rapid action of UK Biobank and the UK and Chinese Governments against bad actors to ensure the privacy of participants is testament to the importance of this trust. As a ‘Biobanker’ I am reassured that the value of my small contribution to global health is jealously guarded.”
Professor John Danesh, BHF Professor of Epidemiology and Medicine, Department of Public Health and Primary Care, University of Cambridge, said:
“UK Biobank is a jewel in the crown of UK science—a uniquely powerful resource for medical discovery and arguably the most valuable population-scale biomedical dataset in the world. It’s critical to advancing science to help patients.
“Since making de-identified participant data available to researchers in 2012, UK Biobank has enabled thousands of scientists globally—including my own team at the University of Cambridge and the Wellcome Sanger Institute—to generate insights that are improving the prevention and treatment of disease.
“It is reassuring that UK Biobank’s swift and robust response has reportedly prevented an illegal attempt to breach its access rules.
“Continued vigilance, strong governance, and transparency will help sustain the strong public trust already demonstrated in this resource.
“At moments like this, it is important to recognise the profound scientific and public health benefits that the data in UK Biobank delivers. Responsible, secure access to that data is fundamental.”
Professor Andrew Morris, Director of HDR UK, said:
“To find data for sale on a website in China will be greatly concerning for participants. Even with all identifying information removed from the data, this is still sensitive data and a serious data breach.
“I am glad to see that rapid action has been taken at the highest levels with a joined-up response between UK Biobank and the UK and Chinese governments. The fact that the datasets were rapidly taken down and interim measures have been put in place offers reassurance. It is important that there is a full review.
“Health research using large de-identified datasets is delivering great advances in the prevention, diagnosis and treatment of diseases affecting millions of people in the UK and globally. UK Biobank has been at the vanguard of many of these discoveries. But such research is only possible with the trust of participants in how their data is handled.
“Significant advances have been made in the last 5-10 years in how data is accessed by researchers to give added layers of security and privacy protection. This has included the use of secure environments that bring researchers to the data, not data to the researchers, and that also allow rigorous checking of all information that leaves those environments. UK Biobank has moved to a secure environment and has described how it intends to introduce output checking. This is an important step.
“The future of healthcare is increasingly data-dependent. We must double down on implementation of secure systems to enable essential research that is responsible, trusted and can operate at scale.”
Prof Luc Rocher, Associate Professor, Oxford Internet Institute, University of Oxford, said:
“This is the 198th known exposure of UK Biobank data since last summer (see https://biobank.rocher.lc/). UK Biobank data is not just available for sale, it also remains available online for anyone to download today. Researchers have, in the past, repeatedly and accidentally uploaded datasets to online code sharing platforms, and many of these files are now replicated across the web. UK Biobank has sought to downplay the importance of such exposures, stating that data are “de-identified” and free of “personally identifying information,” and that no participant has been unwillingly re-identified. Last month, The Guardian correctly identified a single participant from just two easily known facts. The actions being taken today are inadequate to take down data from the web, and cannot protect the 500,000 participants whose intimate health records have been exposed for the 198th time this year.”
Prof Elena Simperl, Department of Informatics, King’s College London, said:
“The recent UK Biobank data exposure is not a moment to point fingers, but to take seriously what it tells us about national data infrastructure. Initiatives like UK Biobank are absolutely essential to driving innovation across the health and life sciences ecosystem. With longitudinal data on half a million volunteers and more than 18,000 peer-reviewed papers to its name, the UK is world-leading in this space, and rightly proud of it. What happened here was an infrastructure problem, not the result of a complex cyber-attack. Too often, the costs of maintaining infrastructure for flagship data stewardship projects like this are treated as an afterthought. The UK has built something remarkable, but we need to keep investing in keeping it safe.”
https://www.bbc.co.uk/news/articles/cpvxgl3n138o
Declared interests
Prof John Danesh: Professorial Fellow, Jesus College, Cambridge. Faculty Member, Wellcome Sanger Institute. Director, Health Data Research UK-Cambridge
Professor Andrew Morris: “Director of Health Data Research UK, the national institute for health data science; is Professor of Medicine and Vice Principal at the University of Edinburgh; is President of the Academy of Medical Sciences; has a minority (<1.5%) shareholding in Aridhia Informatics and a small number of shares in GSK (<£5,000).”
Prof Luc Rocher: “no conflicts of interest”
Prof Elena Simperl: “no conflicts”