Empirical assessment of bias in machine learning diagnostic test accuracy studies.

Ryan J Crowley, Yuan Jin Tan, John P A Ioannidis

Abstract

OBJECTIVE: Machine learning (ML) diagnostic tools have significant potential to improve health care. However, methodological pitfalls may affect diagnostic test accuracy studies used to appraise such tools. We aimed to evaluate the prevalence and reporting of design characteristics within the literature. Further, we sought to empirically assess whether design features may be associated with different estimates of diagnostic accuracy.
MATERIALS AND METHODS: We systematically retrieved 2 × 2 tables (n = 281) describing the performance of ML diagnostic tools, derived from 114 publications in 38 meta-analyses, from PubMed. Data extracted included test performance, sample sizes, and design features. A mixed-effects metaregression was run to quantify the association between design features and diagnostic accuracy.
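As an illustration of the quantities a 2 × 2 table yields, the following minimal sketch computes sensitivity, specificity, the diagnostic odds ratio (DOR), and the standard error of log(DOR) for a single table. The function name, the example counts, and the 0.5 continuity correction for zero cells are assumptions made here for illustration; the abstract does not describe how the authors handled these details, and this is not their mixed-effects model.

```python
import math

def diagnostic_summary(tp, fp, fn, tn, correction=0.5):
    """Summarize one 2 x 2 table (hypothetical helper, not from the study).

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives, and true negatives.
    """
    # Assumption: add a continuity correction to every cell only when
    # at least one cell is zero, so the odds ratio stays finite.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # DOR: odds of a positive test in the diseased group divided by
    # the odds of a positive test in the non-diseased group.
    dor = (tp * tn) / (fp * fn)
    # Standard large-sample standard error of log(DOR).
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "dor": dor,
        "log_dor": math.log(dor),
        "se_log_dor": se_log_dor,
    }

# Made-up counts, not data from the study.
summary = diagnostic_summary(tp=90, fp=20, fn=10, tn=80)
print(summary)
```

In a metaregression of this kind, log(DOR) per table would typically serve as the outcome and design features as covariates; the exact model specification is not given in the abstract.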
RESULTS: Participant ethnicity and blinding in test interpretation were unreported in 90% and 60% of studies, respectively. Reporting was occasionally lacking for rudimentary characteristics such as study design (28% unreported). Internal validation without appropriate safeguards was used in 44% of studies. Several design features were associated with larger estimates of accuracy, including having unreported (relative diagnostic odds ratio [RDOR], 2.11; 95% confidence interval [CI], 1.43-3.1) or case-control study designs (RDOR, 1.27; 95% CI, 0.97-1.66), and recruiting participants for the index test (RDOR, 1.67; 95% CI, 1.08-2.59).
DISCUSSION: Significant underreporting of experimental details was present. Study design features may affect estimates of diagnostic performance in the ML diagnostic test accuracy literature.
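To make the reported RDOR values easier to read, the sketch below converts a point estimate and its 95% CI to the log scale. An RDOR is a ratio of two diagnostic odds ratios, so a value of 2.11 would correspond to a DOR roughly 2.11 times as large in studies with the feature as in the comparison group. The function name is hypothetical, and the calculation assumes the interval is a Wald-type interval symmetric on the log scale, which the abstract does not state.

```python
import math

def rdor_log_scale(rdor, ci_low, ci_high, z=1.96):
    """Back out log-scale quantities from a reported RDOR and 95% CI.

    Assumption: the CI was built as exp(log(RDOR) +/- z * SE).
    """
    log_rdor = math.log(rdor)
    # Approximate standard error implied by the width of the interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # Geometric midpoint of the interval; it should sit close to the
    # point estimate if the symmetry assumption holds.
    midpoint = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return log_rdor, se, midpoint

# Values reported in the abstract for unreported study design.
log_rdor, se, midpoint = rdor_log_scale(2.11, 1.43, 3.1)
print(round(log_rdor, 3), round(se, 3), round(midpoint, 2))
```

For the unreported-design estimate, the geometric midpoint of the interval lands near the reported 2.11, which is consistent with the symmetry assumption but does not confirm how the interval was computed.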
CONCLUSIONS: The present study identifies pitfalls that threaten the validity, generalizability, and clinical value of ML diagnostic tools and provides recommendations for improvement.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  bias; diagnostic techniques and procedures; machine learning; research design; sensitivity and specificity

Year:  2020        PMID: 32548642      PMCID: PMC7647361          DOI: 10.1093/jamia/ocaa075

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  5 in total

Review 1.  Extracellular MicroRNAs as Intercellular Mediators and Noninvasive Biomarkers of Cancer.

Authors:  Blanca Ortiz-Quintero
Journal:  Cancers (Basel)       Date:  2020-11-20       Impact factor: 6.639

2.  Clinician Preimplementation Perspectives of a Decision-Support Tool for the Prediction of Cardiac Arrhythmia Based on Machine Learning: Near-Live Feasibility and Qualitative Study.

Authors:  Stina Matthiesen; Søren Zöga Diederichsen; Mikkel Klitzing Hartmann Hansen; Christina Villumsen; Mats Christian Højbjerg Lassen; Peter Karl Jacobsen; Niels Risum; Bo Gregers Winkel; Berit T Philbert; Jesper Hastrup Svendsen; Tariq Osman Andersen
Journal:  JMIR Hum Factors       Date:  2021-11-26

3.  Machine Learning Decomposition of the Anatomy of Neuropsychological Deficit in Alzheimer's Disease and Mild Cognitive Impairment.

Authors:  Ningxin Dong; Changyong Fu; Renren Li; Wei Zhang; Meng Liu; Weixin Xiao; Hugh M Taylor; Peter J Nicholas; Onur Tanglay; Isabella M Young; Karol Z Osipowicz; Michael E Sughrue; Stephane P Doyen; Yunxia Li
Journal:  Front Aging Neurosci       Date:  2022-05-03       Impact factor: 5.750

4.  Trial-level characteristics associate with treatment effect estimates: a systematic review of meta-epidemiological studies.

Authors:  Huan Wang; Jinlu Song; Yali Lin; Wenjie Dai; Yinyan Gao; Lang Qin; Yancong Chen; Wilson Tam; Irene Xy Wu; Vincent Ch Chung
Journal:  BMC Med Res Methodol       Date:  2022-06-15       Impact factor: 4.612

5.  Machine Learning Models for Predicting Neonatal Mortality: A Systematic Review.

Authors:  Cheyenne Mangold; Sarah Zoretic; Keerthi Thallapureddy; Axel Moreira; Kevin Chorath; Alvaro Moreira
Journal:  Neonatology       Date:  2021-07-14       Impact factor: 4.035
