
Towards nationally curated data archives for clinical radiology image analysis at scale: Learnings from national data collection in response to a pandemic.

Dominic Cushnan¹, Rosalind Berka², Ottavia Bertolli², Peter Williams¹, Daniel Schofield¹, Indra Joshi¹, Alberto Favaro², Mark Halling-Brown³,⁴, Gergely Imreh², Emily Jefferson⁵,⁶, Neil J Sebire⁵, Gerry Reilly⁵, Jonathan C L Rodrigues⁷, Graham Robinson⁷, Susan Copley⁸, Rizwan Malik⁹, Claire Bloomfield¹⁰,¹¹, Fergus Gleeson¹⁰,¹¹, Moira Crotty¹², Erika Denton¹³, Jeanette Dickson¹⁴, Gary Leeming¹⁵, Hayley E Hardwick¹⁶, Kenneth Baillie¹⁷, Peter JM Openshaw¹⁸, Malcolm G Semple¹⁹, Caroline Rubin²⁰, Andy Howlett¹², Andrea G Rockall⁸,²¹, Ayub Bhayat²², Daniel Fascia²³, Cathie Sudlow²⁴, Joseph Jacob²⁵,²⁶.

Abstract

The prevalence of the coronavirus SARS-CoV-2 disease has resulted in the unprecedented collection of health data to support research. Historically, coordinating the collation of such datasets on a national scale has been challenging to execute for several reasons, including issues with data privacy, the lack of data reporting standards, interoperable technologies, and distribution methods. The coronavirus SARS-CoV-2 disease pandemic has highlighted the importance of collaboration between government bodies, healthcare institutions, academic researchers and commercial companies in overcoming these issues during times of urgency. The National COVID-19 Chest Imaging Database, led by NHSX, British Society of Thoracic Imaging, Royal Surrey NHS Foundation Trust and Faculty, is an example of such a national initiative. Here, we summarise the experiences and challenges of setting up the National COVID-19 Chest Imaging Database, and the implications for future ambitions of national data curation in medical imaging to advance the safe adoption of artificial intelligence in healthcare.


Keywords: Imaging; artificial intelligence; coronavirus SARS-CoV-2 disease; general; machine learning; medicine; radiology; respiratory

Year: 2021 PMID: 34868617 PMCID: PMC8637703 DOI: 10.1177/20552076211048654

Source DB: PubMed Journal: Digit Health ISSN: 2055-2076

Introduction

Medical images have a central role in disease diagnosis, predicting prognosis and risk stratification. Recent advances in computing power and a growth in the availability of larger data repositories (including open-access datasets) have led to an increase in the medical applications of artificial intelligence (AI) systems, with a focus on image analysis to support clinical pathways of care. These systems aim to deliver image analysis functions similar to human interpretation of medical imaging, as well as to detect insights beyond the capability of the human eye. In response to the coronavirus SARS-CoV-2 disease (COVID-19), huge efforts have been made by the deep learning community to analyse imaging datasets with the aim of enhancing the certainty of COVID-19 diagnosis[2,3] and improving disease outcome prediction[4,5] (such as ITU admission and death), with varying levels of success. The main barriers to the development of successful AI models have been the variable quality of imaging data available for model training, and the limited amount of curated data that comprehensively encompasses the variability of real-world clinical cases necessary for robust model validation. Until recently, the collection of imaging data to enable the development of AI solutions in the UK has occurred on a relatively limited scale, dependent largely on bespoke local academic and commercial partnerships within healthcare institutions. Homogeneous but small datasets are expensive to curate and limit clinical validation of AI tools using a wider study population.[10,11] Algorithms developed on such datasets could be prone to biases relating to gender, ethnicity, socioeconomic status and environmental factors, which shows the need for a sufficiently diverse dataset, such as one collected at national scale.
Larger national-scale imaging datasets covering multiple geographic regions, using routinely collected data and accessible to the wider community, have the potential to enable development of more robust AI tools where the data can help evaluate clinical questions, validate AI algorithms and assess translational challenges. The general lack of such datasets is cited as one of the most significant challenges to the adoption of AI technologies in the NHS. In this paper, we discuss our experiences and challenges in setting up the National COVID-19 Chest Imaging Database (NCCID). The NCCID was established by the NHS AI Lab in May 2020 to combat the COVID-19 pandemic using AI technologies. It assimilates data from over 15,000 patients across 22 NHS Trusts and Health Boards to enable validation of AI algorithms. The NCCID was endorsed by the Royal College of Radiologists and aimed to provide a central repository of chest X-rays, chest computed tomography (CT) scans and cardiac magnetic resonance images (MRIs) of patients diagnosed with COVID-19 (through reverse transcription polymerase chain reaction (RT-PCR) testing) and controls.

Methodology

The experiences from the NCCID have emphasised that there are opportunities to enhance the current UK health data infrastructure for the national collection of medical imaging data. This could be accelerated if led by a neutral national healthcare body, such as NHSX in England, in close collaboration with existing entities leading on data collection initiatives in the UK (several of which are referenced in this paper) and the devolved administrations in Scotland, Wales and Northern Ireland. These learnings are summarised in Table 1.

Table 1.

Learnings summary from the National COVID-19 Chest Imaging Database (NCCID) for future data collection initiatives.

	Category	Learning for future data collection
1	Information governance (IG)	Data governance processes must be clarified and standardised to reduce barriers to NHS Trust participation in future national data collection exercises
2	Database linkages	Collaboration and linking with other databases improve the quality, completeness and coverage of the data collected, increasing opportunities for discovery
3	Automation	Incorporating automation is vital to enable mass data collection and reduce manual data capture and burden for hospital staff
4	Trusted research environments (TREs)	Building national infrastructure enables data to be accessed and analysed in a safe and secure way, facilitating research and innovation
5	Availability of validation datasets	Creation of large-scale high-quality validation datasets helps to accelerate the route to market for new artificial intelligence models
6	Funding	Defining a variety of funding mechanisms to support NHS Trust data collection activities and infrastructure engineering is key to the sustainability of any national programme
7	Patient and public engagement	Consulting patients and the wider public is important to ensure that concerns regarding how patient data is used and stored are addressed and that this is done in a safe, secure and ethical way
8	Benefit share models	Defining benefit share frameworks helps to ensure that the NHS benefits at the local site level, which can incentivise participation in national data collection exercises

A diagram and description of the infrastructure set-up for the NCCID is also included in Figure 1.

Figure 1.

National COVID-19 Chest Imaging Database (NCCID) infrastructure and explanation.

Information governance: Rationalising data governance processes to reduce the barrier to participation in future national data collection exercises

The NCCID's data governance arrangements were established quickly as a consequence of regulations put in place for data collection during COVID-19. The notice under regulation 3(4) of the Health Service (Control of Patient Information) Regulations 2002 (COPI Notice), valid for England and Wales until the end of September 2021, enabled the NCCID to collect pseudonymised patient data without seeking consent to support a national response to the pandemic.[20,21] Once the COPI Notice expires, additional ethical approvals will need to be acquired, potentially through application to the Confidentiality Advisory Group (CAG). In addition to compliance with the terms of the COPI Notice, the NCCID has established robust data governance procedures. Using the IG expertise of NHSX advisors and project partners, in particular the National Consortium for Intelligent Medical Imaging (NCIMI), the NCCID initiative received formal ethical approval from the Health Research Authority's Research Ethics Committee to establish a research database for the collection and sharing of the NCCID data. Indeed, using a research database model was crucial in communicating the ethical basis of the initiative to hospital sites to encourage participation. Ongoing amendments became necessary as the data points collected in the database expanded to keep pace with the rapid progression of scientific understanding around COVID-19. These experiences highlight the comprehensive support and expertise required to establish data governance arrangements for national-level initiatives. A Data Protection Impact Assessment (DPIA) is formally shared with each participating hospital site alongside a Data Sharing Agreement (DSA) for approval by Data Protection Officers (DPOs).
The national data opt-out policy mandates that hospital sites should not upload data for patients who have registered to opt out of research studies, and additional checks have been built into the NCCID's data management procedures to ensure that such data is automatically deleted in the rare instances where it is mistakenly uploaded. Before accessing the database, researchers must sign a Data Access Agreement and Data Access Framework contract. Information governance (IG) for national data collection is further complicated by the fact that legislation is not consistent across the devolved administrations. For example, the COPI Notice does not apply in Scotland, where approval must be sought through the Public Benefit and Privacy Panel (PBPP) instead. Navigating different regulatory procedures across UK territories inevitably delays UK-wide imaging data collection. In the short term, it would be beneficial for a neutral healthcare body to publish clear, accessible guidance that helps researchers understand the processes required when setting up data collection studies. In the long term, collaboration across national organisations would help to review existing data governance models and identify how these can be streamlined to support researchers.
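The automatic deletion check described above can be illustrated with a minimal sketch. The source does not describe how the NCCID matches uploads against opt-out registrations, so the record fields (`pseudonym`, `modality`), the function name `remove_opted_out` and the idea of matching on a pseudonym are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One uploaded record. Field names are hypothetical."""
    pseudonym: str
    modality: str


def remove_opted_out(submissions, opted_out):
    """Split uploads into (kept, removed) according to opt-out status.

    `opted_out` is an iterable of pseudonyms registered as opted out.
    How a real system obtains and matches this list is not covered here.
    """
    opted_out = set(opted_out)
    kept = [s for s in submissions if s.pseudonym not in opted_out]
    removed = [s for s in submissions if s.pseudonym in opted_out]
    return kept, removed


uploads = [
    Submission("p-001", "CT"),
    Submission("p-002", "X-ray"),
    Submission("p-003", "X-ray"),
]
kept, removed = remove_opted_out(uploads, opted_out={"p-002"})
```

Returning the removed records separately, rather than silently dropping them, is one way to leave an audit trail of what was deleted and why.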

Database linkages: Promoting discovery through collaboration with other databases

The NCCID has been collaborating with other UK database initiatives to improve the quality, completeness and coverage of the data collected. These are detailed in Table 2.

Table 2.

Database linkages for the National COVID-19 Chest Imaging Database (NCCID).

	Database linkage	Objective
1	International Severe Acute Respiratory and Emerging Infection Consortium (ISARIC) 4C repository⁵⁹	To enhance the number of clinical variables that can be evaluated alongside clinical imaging and to reduce duplication of data gathering efforts for hospital sites during a period of stretched resources.
2	NHS England and Improvement	To provide comprehensive ethnicity data, which can be challenging to collect when relying solely on hospital records. Collecting reliable ethnicity data is essential given the disproportionate impact of COVID-19 on the Black, Asian and Minority Ethnic (BAME) populations, and to avoid introducing inherent biases during artificial intelligence model development.⁶⁰ ⁶¹
3	National Scottish Picture Archiving and Communications System (PACS) and Safe Haven Network[62,63]	To increase the geographic coverage of the database to the entirety of Scotland.⁶⁴

Linking the NCCID to other databases was achieved one database at a time, and maintaining the necessary support from IG experts, project managers, technologists and clinicians over a prolonged period is difficult to sustain. Establishing national-level infrastructure and processes to integrate and allow communication between existing datasets will be important to leverage the potential of data collected within the NHS. A number of initiatives in the UK have made progress on this. The ambition of Health Data Research UK (HDR UK) is to create a UK-wide approach for the development of secure and transparent data services through initiatives such as the Health Data Research Innovation Gateway, a common discovery portal to UK health research data for accredited researchers and innovators.[26,27] HDR UK is also working closely with other Trusted Research Environments (TREs) and Safe Haven Environments to increase the discoverability of UK health datasets and is promoting accessibility to vital COVID-19 datasets via the COVID-19 National Core Studies programme. Our experience in the NCCID has highlighted the national-level coordination required to overcome the barriers of a fragmented data landscape by establishing data access terms and processes that are common across UK initiatives. A potential solution is data federation, which refers to the aggregation of different data sources into a standard data format for users to access. This is a long-term vision that will require national coordination to be fulfilled, yet success will be key to leveraging the potential of data collected in the NHS.
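As a rough illustration of aggregating different data sources into a standard format, the sketch below maps records from two sources onto one shared schema. The source names (`site_a`, `site_b`), every field name and the mapping table are hypothetical; none of them reflect the actual schemas of the NCCID or its linked databases.

```python
# Hypothetical mappings: source field name -> common field name.
FIELD_MAPS = {
    "site_a": {"PatientAge": "age_years", "Sex": "sex", "SwabResult": "rt_pcr_positive"},
    "site_b": {"age": "age_years", "gender": "sex", "pcr": "rt_pcr_positive"},
}


def to_common_format(source, record):
    """Rename the fields of one record to the common schema.

    Fields missing from the record are omitted; unmapped fields are dropped.
    """
    mapping = FIELD_MAPS[source]
    return {common: record[src] for src, common in mapping.items() if src in record}


def federate(sources):
    """Combine {source name: [records]} into one list of harmonised rows."""
    rows = []
    for source, records in sources.items():
        for record in records:
            row = to_common_format(source, record)
            row["source"] = source  # keep provenance for later auditing
            rows.append(row)
    return rows


rows = federate({
    "site_a": [{"PatientAge": 61, "Sex": "F", "SwabResult": True}],
    "site_b": [{"age": 47, "gender": "M"}],
})
```

A real federation layer would also need to reconcile units, coding systems and access terms, which is where the national coordination discussed above becomes necessary.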

Automation: Building infrastructure to enable mass data collection and reduce manual data capture for hospital staff

The NCCID project aims to reduce the burden on staff at hospital sites by running a pilot exercise to automate imaging data collection. To rapidly commence data collection, the initial architecture of the NCCID pragmatically relied upon Sectra's Image Exchange Portal (IEP) as processes for transferring images via this network are well-known, resulting in minimal setup overhead at hospital sites. To allow the automation of image transfer, Royal Surrey NHS Foundation Trust (RSNFT) has deployed SMART boxes at NCCID pilot sites, installed on either physical hardware or a virtual machine, that act as a data collection node at each site. The SMART box fulfils three roles: (a) management of cases and clinical data through an internal web portal, (b) image collection and de-identification through a DICOM server, and (c) data transfer to the central NCCID data warehouse. This infrastructure, previously used to create large-scale imaging databases such as the OPTIMAM Image Database project, is significantly reducing workloads on hospital PACS teams. The capture and storage of clinical variables are supported by technology and data standards that are less comprehensive and universal than those for images. The manual processes to capture these data points utilise local staff or informatics to collate the information from the local hospital systems. Automating the collection of these clinical data points by the NCCID has proved difficult due to the heterogeneity of the clinical systems and integration processes. The initiative is currently exploring possible solutions to this challenge in addition to the data linkage with ISARIC 4C. The experiences of the NCCID demonstrate that a coordinated UK-wide approach to automating components of the data collection process across multiple use cases of medical imaging could accelerate the development of AI technologies. 
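The de-identification role (b) of a collection node can be sketched as follows. This is not the SMART box implementation, which the source does not describe: the keyed-hash pseudonym, the short list of identifying fields and the use of a plain dictionary in place of a DICOM header are all assumptions for illustration. Real DICOM de-identification covers far more attributes than the three listed here.

```python
import hashlib
import hmac

# Hypothetical subset of directly identifying header fields to drop.
DIRECT_IDENTIFIERS = {"PatientName", "PatientBirthDate", "PatientAddress"}


def pseudonymise_id(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using a keyed hash.

    The same identifier and key always give the same pseudonym, so
    records for one patient can still be linked after de-identification.
    """
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(header: dict, secret_key: bytes) -> dict:
    """Return a copy of the header with identifiers removed or replaced."""
    cleaned = {
        key: value
        for key, value in header.items()
        if key not in DIRECT_IDENTIFIERS and key != "PatientID"
    }
    cleaned["Pseudonym"] = pseudonymise_id(header["PatientID"], secret_key)
    return cleaned


site_key = b"example-key-held-only-at-the-site"  # placeholder, not a real key
header = {"PatientID": "1234567890", "PatientName": "DOE^JANE", "Modality": "CT"}
clean = deidentify(header, site_key)
```

Using a keyed hash rather than a plain hash means the pseudonym cannot be recomputed by someone who knows the original identifier but not the key.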
Numerous regional and national initiatives are already underway including five new centres of excellence aiming to enhance digital pathology and medical imaging AI development and deployment, initiatives such as PICTURES led by the University of Dundee in conjunction with NHS Scotland and at a more local level, radiology imaging networks such as EMRAD (see Table 3).

Table 3.

Examples of existing initiatives facilitating collection of medical imaging data.

	Data initiative	Description of their work
1	National Consortium of Intelligent Medical Imaging (NCIMI)	Led from Oxford's Big Data Institute and brings together expertise across 14 NHS partners, academia and 13 industry partners to support the development, testing, validation and adoption of new clinical imaging AI tools into the NHS.
2	PICTURES (Interdisciplinary Collaboration for the Efficient and Effective Use of Clinical Images in Big Data Health Care Research)	Led by the University of Dundee in conjunction with NHS Scotland and other academic bodies. The study is collating 30 million images from the Scottish National PACS, across all 14 Scottish health boards, to support the development of AI technologies. PICTURES will allow researchers to work on vast amounts of data in a secure environment that protects individual patient information.
3	East Midlands Radiology Consortium (EMRAD)	Working with Kheiron Medical to develop and test a CE-marked Mammography Intelligent Assessment (MIA) tool that can help detect breast cancer, using the historic data and images provided by EMRAD.⁶⁵ Regional imaging networks are increasingly able to facilitate the rapid testing and application of machine learning technologies within large databases.⁶⁶
4	Yorkshire Imaging Collaborative	Partnered with Leeds University to develop a Trusted Research Environment that extracts data from across the network to predict future capacity and demand requirements for each imaging modality across the region.⁶⁷

Given the wide-ranging initiatives that already exist in the UK, there is an opportunity to establish national-level linkage and discoverability to support combining such networks and datasets to enable, promote and support public health research.

TREs: Building infrastructure to enable secure data access for research

Ensuring that data can be accessed and analysed in a safe and secure way is crucial to the successful development of AI technologies. Data access can be organised by bringing “the data to the users” or “the users to the data”. The NCCID, due to the urgency presented by the pandemic and the need to make data available at pace, adopted the former approach, pushing data to users that are General Data Protection Regulation (GDPR) and security compliant. Bringing users to the data is a long-term ambition of the NCCID, as it can help reassure both controllers of healthcare data and the patient community that data is being managed in a secure environment. This model is increasingly adopted through the use of TREs, otherwise known as Data Safe Havens. TREs ensure that all data remains in infrastructure approved by the controller, thereby making it easier to audit and ensure compliance with IG standards. A number of initiatives in the UK have supported research into TREs in recent years, and these have accelerated as a result of the data response to COVID-19. The Scotland National Safe Haven programme, Genomics England Research Environment and the NHS Digital TRE for England are all examples of advances in TRE infrastructure design executed against the ‘five safes’: safe people, projects, settings, data and outputs. However, work remains to enhance TRE models so that they better cater for the requirements of researchers and technology developers, and to adapt their functionality for scalability. Users must be provided with the computational resource, data storage and utilities necessary to do useful work within the TRE itself, while preventing extraction of provided data. TRE users should also have access to statistical, data visualisation and annotation software tools, the ability to access large virtual machines equipped with graphics processing units (GPUs), and high-performing, flexible compute power to manage large volumes of data, including imaging.
Fair licensing and costing models are also necessary to make data more widely available to the public and commercial companies (the complexities of such commercial models are explored further in the ‘Benefit share models’ section).[36,37] All these factors will require consideration in the future development of national infrastructure to facilitate access to medical imaging data in the UK.

Availability of validation datasets: Promoting creation of large-scale high-quality validation datasets

Automation can support data collection for national-level validation datasets, which endeavour to capture an unbiased representation of the UK population. Validation datasets should ideally remain dedicated to the purpose of assessing the performance of established AI models. However, within the expansive research environment of the UK, it can be challenging to ensure that a subset of the data within validation datasets has never previously been utilised for training the same AI models that are to be validated. The creation of an entirely segregated validation dataset to assess AI technologies was a core objective at the centre of the NCCID, which has yet to be fully realised. Due to the urgency presented by the pandemic and the requests from technology developers to share the NCCID data as soon as it became available, assignment to the NCCID validation dataset occurs via random sampling of patients. This initial pragmatic choice allowed users access to validation data while the NCCID was growing in scale, but once of a sufficient size, the aim remains the provision of comprehensive, tailored validation sets. This process will also need to include image annotation, which is crucial for curating high-quality datasets for medical image analysis. Hospital sites participating in the NCCID are asked to confirm if they are sharing the data with other entities as a way of understanding what data might have been used in the development of an AI algorithm for COVID-19 research. If data has been shared with other entities, it is excluded from the validation dataset. A consequence of this policy is that it takes longer and requires numerous participating hospital sites to build up a large enough validation sample that can be used to assess algorithms for robustness. There is currently no easy solution to this problem. 
Asking a hospital site to set aside data that would have immediate value in local studies, for the express purpose of allowing long-term creation of a national AI model validation dataset, is a significant and probably unrealistic requirement. Hospitals may also be reluctant to share data that could be used for commercial purposes. There are a small number of examples of UK studies that have managed to curate hold-out unseen validation datasets, such as via the OPTIMAM study, NCIMI, and the PROSPECTS trial, but it is by no means common practice.[39,40] However, one could argue that the UK, with national oversight and systematic coordination of health and care networks in the NHS, would be in a unique position to develop hold-out validation datasets. A nationally coordinated effort to define and implement best practices for generating validation datasets would significantly advance the path to independent robust validation, which is currently lacking for CE-marked solutions. Robust validation in turn could speed up regulatory processes for the licensing and deployment of AI technologies in the UK.
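The two rules described in this section, patient-level assignment to the validation set and exclusion of data already shared with other entities, can be sketched as below. The source states that assignment uses random sampling of patients; this sketch swaps in a deterministic hash of the patient pseudonym so that the same patient always lands in the same set as the database grows. The function names, the salt and the 20% fraction are hypothetical.

```python
import hashlib


def in_validation_set(pseudonym: str, fraction: float, salt: str = "demo-salt") -> bool:
    """Deterministically place roughly `fraction` of patients in validation.

    The first 8 bytes of a salted SHA-256 digest are read as an integer
    in [0, 2**64) and compared against the requested fraction of that range.
    """
    digest = hashlib.sha256(f"{salt}:{pseudonym}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") < fraction * 2**64


def split_patients(patients, fraction=0.2):
    """Split (pseudonym, shared_elsewhere) pairs into (training, validation).

    Patients whose data was shared with other entities never enter the
    validation set, because it may already have been used for training.
    """
    training, validation = [], []
    for pseudonym, shared_elsewhere in patients:
        if not shared_elsewhere and in_validation_set(pseudonym, fraction):
            validation.append(pseudonym)
        else:
            training.append(pseudonym)
    return training, validation


patients = [(f"p{i}", i % 3 == 0) for i in range(30)]
training, validation = split_patients(patients, fraction=0.5)
```

Splitting by patient rather than by image keeps all studies from one person on the same side of the split, which avoids leakage between training and validation data.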

Funding: Defining mechanisms to support data collection activities and infrastructure engineering as a research discipline

Understanding which funding mechanisms were the most appropriate for the NCCID to pursue was difficult because data collection as a pursuit in itself is not widely recognised as clinical research in the UK. As the NCCID required support from hospital Research & Development (R&D) departments to facilitate collection of clinical data, a number of sites recommended applying for inclusion on the National Institute for Health Research (NIHR) Urgent Public Health portfolio of studies, whereby hospital resources can be diverted towards supporting prioritised initiatives. However, the NCCID was deemed unsuitable for Urgent Public Health badging, resulting in some sites declining participation due to a lack of capacity among local research staff during the pandemic. There are also challenges with sourcing research nurses or research coordinators for radiology studies specifically and with setting up such studies in radiology departments, as there is a significant shortage of radiology staff members. To date, apart from hospital sites that reallocated or sourced in-house funding, the NCCID initiative has only received funding from NHSX and has predominantly relied on voluntary support from hospital sites to contribute data. This model has sufficed due to the urgency presented by the pandemic, but would be unsustainable for long-term mass collection of imaging data. Further support and collaboration at a national level are required to establish funding models that are specifically designed for national data collection infrastructure, including imaging, as a research discipline in its own right.

Patient and public engagement: Consulting patients and the wider public on the requirements and benefits of algorithmic development in medical imaging

The adoption of AI technologies in medical imaging, particularly with the input of imaging specialists such as radiologists, can have considerable benefit for patients and the healthcare sector where over 100,000 imaging studies are performed daily in the UK.[43,44] The 2019 Clinical Radiology Workforce census highlights increases in workforce shortages, with seven out of 10 clinical directors of UK radiology departments expressing the view that there were insufficient numbers of clinical radiologists to be able to deliver safe and effective patient care. AI tools may improve the accuracy of diagnosis and the estimation of likely prognosis for a patient in some settings, as well as potentially improve efficiencies by speeding up screening processes and reducing the number of human readers required. However, despite data used in AI software development being de-identified, there are understandable concerns with how patient data is stored and used, particularly for purposes that result in profit for commercial organisations. For the NCCID specifically, the Data Access Committee includes ethical and patient advisors, who participate in the review and approval of all applications to access the database. This ensures that the patient viewpoint is always considered before a decision is made to share data with the requesting applicant. The public are also kept informed on the projects that the NCCID data is currently supporting through study publications on the NCCID website, which is important for patients to understand how their data is being used. The Patient and Public Involvement and Engagement (PPIE) methods adopted for the NCCID will be expanded in future. NHSX is working with the National Data Guardian (NDG) and the Department of Health and Social Care to use medical imaging as a case study for exploring public attitudes towards the sharing of health and social care data for data-driven research and innovation in England. 
NHSX also plans to collaborate with the Ada Lovelace Institute to trial a model for algorithmic impact assessments. Such assessments will involve public engagement on AI solutions developed using medical imaging data and encourage innovators to review and refine their products based on discussions with citizens about their ethical and societal implications. These mechanisms are useful for understanding public perceptions of using data to develop and test algorithms, and the level of trust felt towards algorithms that support the role of human readers in clinical care. More targeted research and public consultation on this impact is needed to support the successful adoption of AI technologies in medical imaging at scale in the UK.

Benefit share models: Defining frameworks to ensure that data sharing directly benefits the NHS system

NHS data used in the development and validation of AI imaging algorithms should result in improved patient care. At the same time, the NHS needs to receive fair value from the data assets that it holds. There has been much discussion of the forms this value might take, which can include clinical, social, economic development, environmental and commercial value. The commercial value can itself take a range of forms, including monetary, intellectual property (IP) ownership, and equity and royalties from any products developed, all of which can be achieved via payment or in the form of beneficial access terms for the NHS. It is important to create the right frameworks to identify and realise the benefits for patients and the NHS where data underpins innovation, with an increasing focus on establishing a new ‘social contract’ to achieve this. Due to the severity of the COVID-19 pandemic, the NCCID has established a contractual arrangement with approved data users whereby data is provided to the user free of charge, but any related technologies developed as a result must be provided to the NHS at zero licence-fee cost for the duration of the Data Access Agreement signed by the data user. Importantly, all applicants must be able to demonstrate that their proposal will directly benefit the NHS to be granted access. This model aims to expedite the development of potentially useful tools, which is crucial in the context of COVID-19 but may not be appropriate for non-COVID-19-related technologies. Another benefit share model that has the potential to accelerate the development of AI technologies contractually requires data users to share derived data, such as image annotations from their analysis, for the benefit of all researchers. High-quality image labelling, often performed as an internal exercise within a research group, can be time-consuming and expensive to create.
Trained experts may be necessary to annotate the images or check the annotations created by automated tools. High-quality annotation is vital for supervised training, testing and validation of AI algorithms, and there may be inherent benefits in openly sharing the outputs of image annotation from multiple expert sources with the wider research community. The national influence and scale of the NHS mean that it is suitably positioned to encourage the open sharing of annotated datasets in the UK through collaborations with research consortiums and industry partners. It is also important to consider a more direct form of benefit share model for hospital sites at a local level. Hospital sites may show greater willingness to commit resources and expertise to support commercial companies if they are able to directly benefit from the technology in return. Existing research databases have utilised direct income share models, with hospital sites gaining income through commercial licensing of data and images. Collection sites sign an assignment agreement giving the study a non-exclusive licence to include the images and data in the research database, and the proportion of income shared is calculated based on contributions from the participating sites and the characteristics of the data submitted. NHSX is currently focusing on defining the terms of such benefit share models. The Life Sciences Industrial Strategy has set out guiding principles for sharing health and care data that emphasise that data has to be used for the purpose of improving patient care and that any arrangements need to be clearly and transparently communicated. Providing assurance to the public on the ethical basis for sharing data, involving them in a transparent dialogue and informing them of the direct public benefit are important strategies that need to be widely adopted.
The NHSX-led Centre for Improving Data Collaboration has been set up to provide direct support to the health and care sector to meet these guiding principles and establish frameworks for achieving fair value for the NHS. Importantly, further research is needed to define the different types of benefit share models that should apply to commercial developers in the UK depending on the different use cases that the data is being shared for, for example, whether it is a public health emergency or a rare disease, and the intended scale of the deployment.

Future aspirations

The experiences of the NCCID have resulted in valuable learnings to inform a national-level data collection approach for AI development in medical imaging. There are several areas that will need focus, which will be key to ensure that the benefits of data collection are realised through the implementation of AI technologies in real-world settings (see Table 4).

Table 4.

Future aspirations to support the end-goal of implementing artificial intelligence (AI) technologies in the NHS.

	Future aspiration	Explanation
1	Central cloud-based infrastructure, managed by a neutral national body	To facilitate the automated collection of medical imaging data into comprehensive training and validation datasets.
2	Improved collaboration between the entrepreneurial community and the NHS	To encourage the identification of new life-saving technologies through more widely accessible data.
3	Greater engagement with the radiology community as users of AI	To ensure that the AI tools being prioritised for deployment are designed for optimal utility and address the needs of radiologists.
4	Further guidance and support on regulatory approvals and AI model evaluation criteria	For both technology developers and commissioners of AI technology – to ensure that AI products are deemed safe and effective for use in a clinical setting. NHSX has already published an ‘AI Buyers Guide’ to advise commissioners of AI technology on this matter, which is a useful starting point.⁶⁸
5	A centralised, vendor-agnostic deployment infrastructure	To implement AI technologies for use in complex environments, and support the pathway from innovation to deployment.

NHSX will be working closely with research groups, technology companies and NHS Trusts to progress the above focus areas through the work of the NHS AI Lab.

Conclusions

Establishing the NCCID during the COVID-19 pandemic has highlighted the challenges that exist when setting up national data collection infrastructure in the UK, but has also revealed future opportunities for advancing the development and adoption of AI technologies in medical imaging. These opportunities relate to (a) improving access to high-quality data, (b) supporting impactful research objectives, and (c) facilitating better integration between the NHS and the entrepreneurial community to identify life-saving solutions. Robust and clear data governance structures are a prerequisite to support the legal and ethical basis for data collection. Automated data pipelines to build large-scale datasets for training and validating AI technologies will require national coordination and infrastructure expertise. TRE models will need to be enhanced through targeted support and closer integration to facilitate easier access to national datasets for research and technology development. All of this work will require consistent funding at a national level and support from the general public, along with patient advocacy groups, to realise impact. As one of the few countries to offer a publicly funded, single-payer, nationwide healthcare system, the UK is optimally positioned to be a global leader in national data infrastructure for medical imaging, to support AI development and deployment into clinical pathways, and to demonstrate how care can be improved as a result. Ultimately, the success of this endeavour will depend on national collaboration between NHS executives, clinicians, patients, researchers and technology companies, and the NHS AI Lab is well placed to coordinate these diverse stakeholders to drive this initiative forward. If you work at a hospital site and are interested in contributing data to this initiative, please reach out directly to imaging@nhsx.nhs.uk. 
If you are involved in research or technology development and would like to apply for access to the NCCID training dataset, please follow the instructions at this link: https://nhsx.github.io/covid-chest-imaging-database/.

Declarations of interest/conflicts

JCLR has received consultancy fees from NHSX for work completed on the NCCID. CB and FG have received consultancy fees from NHSX for work completed on the NCCID. NJS – nil. JJ reports consultancy fees from NHSX for work on the NCCID and consultancy fees from Boehringer Ingelheim, Roche and GlaxoSmithKline unrelated to the submitted work.

Statement of contributorship

All authors have contributed significantly to the concept, design and writing of the work. All authors were involved in drafting and revision, and all approved the article for submission.

Guarantor

Dominic Cushnan (dominic.cushnan@nhsx.nhs.uk)

NCCID collaborative:

Daniel Alexander, Melissa Alexander, Mr Nicholas Ashley, Hena Aziz, Dr Judith L Babar, Dr Ramona-Rita Barbar, Angela Bates, Oscar Bennett, Alison Bettany, Dr Bahadar Bhatia, Angela Bowen, Hannah Bown, Doreen Brookes, Stephen Buckle, Paul Burfield, Paul Burford, Sarah Cardona, Dr Harmeet Chana, Dr Madalina Chifu, Rachel Clark, Dr Jordan Colman, Hayley Connoley, John Curtin, Rachel Darrah, Mrs Janet Deane, Dr Dileep Duvva, Esme Easter, Dr Kieran Foley, Amy J Frary, Samantha Gan, Dr Nemi Gandy, Tara Ganepol, Avneet Gill, Fergus Gleeson, Natasha Greig, Peter Halson, Anisha Harrar, Rachael Harrison, Chris Heafey, Angela Heeney, Dr Benjamin Hudson, John Hurst, Zahida Hussain, Dr Mark Ingram, Dr Leila Ismail, Mary Jones, Joanne Kellett, Rumi Kidwai, Dr Daniel Kim, Dr Fiona Kirkham, Dr Ross Kruger, Chinnoi Law, Francois Lemarchand, Emma Lewis, Dr Hannah Lewis, Gerald Lip, Katy Lomas, Berenice Lufton, Dr James MacKay, Peter Manser, Nigel Marchbank, Dr Giles Maskell, Liz Mathers, Violet Matthews, Dr Caroline McCann, Dr David McCreavy, Kate Milne, Jacqueline Monaghan, Annette Moreton, Andrew Moth, Dr Edward Neil-Gallacher, Dr Helen Oliver, Nnenna Omeje, Bethan Wyn Owen, Susan Palmer, Lisa Patterson, Dr Thomas Payne, Dr Emily Pearlman, Lindsay Van Pelt, Carla Pothecary, Philip Quinlan, Dawn Redwood, Rowena Reyes, Claire Ridgeon, Lisa Roche, Fiona Rotherham, Dr Timothy J Sadler, Dr Alexander Sanchez-Cabello, Anastasios Sarellas, Daniel Schofield, Simon Seal, Dr Aarti Shah, Dr Ban Sharif, Smita Shetty, Dr Melissa Sia, Dr Tze Siah, Marcelle de Sousa, Susanne Spas, James Sutherland, Andrew Swift, Dr Matthew Thorley, Joanna Tilley, Jenna Tugwell-Allsup, Keri Turner, Dr Debbie Wai, Gayle Warren, Janet Watkins, Dr Tom Welsh, Adele Wilson, Richard Wood, Dr Sarah Yusuf

