Thomas Birk Kristiansen, Kent Kristensen, Jakob Uffelmann, Ivan Brandslund.
Abstract
This paper reviews the dilemmas and implications of erroneous data for the clinical implementation of AI. It is well known that if erroneous and biased data are used to train AI, there is a risk of systematic error. However, even perfectly trained AI applications can produce faulty outputs if fed erroneous inputs. To counter such problems, we suggest three steps: (1) AI should focus on data of the highest quality, in essence paraclinical data and digital images; (2) patients should be granted simple access to the input data that feed the AI, and a right to request changes to erroneous data; and (3) automated high-throughput methods for error correction should be implemented in domains with faulty data when possible. We also conclude that erroneous data are a reality even for highly reputable Danish data sources, and thus a legal framework for the correction of errors is universally needed.
Keywords: AI; artificial intelligence; data quality; deep learning; machine learning (ML); personalized medicine
Year: 2022 PMID: 35937419 PMCID: PMC9355416 DOI: 10.3389/fdgth.2022.862095
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
Figure 1. Danish digital infrastructure for health data. The figure shows different data categories divided into regional, interregional and national systems. The source systems making up the digital infrastructure can be grouped into four data categories: (1) data collected for use in the patient record, (2) data collected for administration of the patient course, (3) data on drugs for use in the administration of drugs, and (4) paraclinical data. As shown, it is a basic principle that the same health data are usually stored regionally and nationally, and in some cases also interregionally. In this way, data control is divided between the regions and the state. This data redundancy may be deliberate but may also have arisen by chance, as the infrastructure has grown organically over many years. The figure is included in a textbook by Kristensen (24). EHR, electronic health record; PRO data, patient recorded outcome; CQD, clinical quality databases; NPR, national patient register; PAS, patient administrative systems; CMC, common medicine card; DAR, drug administration register; DSR, drug statistics register; NGC, national genome center.