How doppelgänger effects in biomedical data confound machine learning.

Li Rong Wang, Limsoon Wong, Wilson Wen Bin Goh.

Abstract

Machine learning (ML) models have been increasingly adopted in drug development for faster identification of potential targets. Cross-validation techniques are commonly used to evaluate these models. However, the reliability of such validation methods can be affected by the presence of data doppelgängers. Data doppelgängers occur when independently derived data are very similar to each other, causing models to perform well regardless of how they are trained (i.e., the doppelgänger effect). Despite the abundance of data doppelgängers in biomedical data and their inflationary effects, they remain uncharacterized. We show their prevalence in biomedical data, demonstrate how doppelgängers arise, and provide proof of their confounding effects. To mitigate the doppelgänger effect, we recommend identifying data doppelgängers before the training-validation split.
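The recommendation to identify data doppelgängers before the training-validation split can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name a similarity measure, so the use of pairwise Pearson correlation between samples, the 0.95 threshold, the synthetic data, and the helper names `find_doppelganger_pairs`, `group_samples`, and `split_by_group` are all assumptions made for illustration. The idea is to link highly similar samples into groups and then assign whole groups to one side of the split, so that no near-duplicate of a training sample ends up in validation.

```python
import numpy as np


def find_doppelganger_pairs(X, threshold=0.95):
    """Return sample index pairs (i, j), i < j, whose Pearson correlation
    exceeds the threshold. Measure and threshold are illustrative assumptions."""
    corr = np.corrcoef(X)  # rows of X are samples, so corr is sample-by-sample
    i_idx, j_idx = np.triu_indices_from(corr, k=1)  # upper triangle, no diagonal
    mask = corr[i_idx, j_idx] > threshold
    return [(int(i), int(j)) for i, j in zip(i_idx[mask], j_idx[mask])]


def group_samples(n_samples, pairs):
    """Merge samples linked by flagged pairs into groups (simple union-find)."""
    parent = list(range(n_samples))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    for i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
    return [find(i) for i in range(n_samples)]


def split_by_group(groups, val_fraction=0.25, seed=0):
    """Assign whole groups to validation so linked samples never straddle the split."""
    rng = np.random.default_rng(seed)
    unique = np.array(sorted(set(groups)))
    rng.shuffle(unique)
    n_val = max(1, int(round(val_fraction * len(unique))))
    val_groups = set(unique[:n_val].tolist())
    train = [i for i, g in enumerate(groups) if g not in val_groups]
    val = [i for i, g in enumerate(groups) if g in val_groups]
    return train, val


# Synthetic demo data (hypothetical): 6 independent samples plus a
# near-copy of sample 0 that plays the role of a doppelgänger.
rng = np.random.default_rng(42)
base = rng.normal(size=(6, 50))
near_copy = base[0] + 0.01 * rng.normal(size=50)
X = np.vstack([base, near_copy])

pairs = find_doppelganger_pairs(X)
groups = group_samples(len(X), pairs)
train, val = split_by_group(groups)
print("flagged pairs:", pairs)
print("train:", train, "val:", val)
```

In this sketch, samples 0 and 6 are flagged as a pair and always land on the same side of the split. Whether to keep such pairs together, drop one member, or report performance separately for them is a design choice the abstract leaves open.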

Entities: Chemical

Keywords: Computational biology; Data science; Doppelgänger effect; Machine learning

Year: 2021
PMID: 34743902
DOI: 10.1016/j.drudis.2021.10.017

Source DB: PubMed
Journal: Drug Discov Today
ISSN: 1359-6446
Impact factor: 7.851

Cited

1 in total

1. Doppelgänger spotting in biomedical gene expression data.

Authors: Li Rong Wang; Xin Yun Choy; Wilson Wen Bin Goh
Journal: iScience Date: 2022-07-19
