Preventing dataset shift from breaking machine-learning biomarkers.

Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline.

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.
© The Author(s) 2021. Published by Oxford University Press GigaScience.

Keywords:  biomarker; dataset shift; generalization; machine learning

Year:  2021        PMID: 34585237      PMCID: PMC8478611          DOI: 10.1093/gigascience/giab055

Source DB:  PubMed          Journal:  Gigascience        ISSN: 2047-217X            Impact factor:   6.524


