
Systematic auditing is essential to debiasing machine learning in biology.

Fatma-Elzahraa Eid, Haitham A Elmarakeby, Yujia Alina Chan, Nadine Fornelos, Mahmoud ElHefnawi, Eliezer M Van Allen, Lenwood S Heath, Kasper Lage.

Abstract

Biases in data used to train machine learning (ML) models can inflate their prediction performance and confound our understanding of how and what they learn. Although biases are common in biological data, systematic auditing of ML models to identify and eliminate these biases is not a common practice when applying ML in the life sciences. Here we devise a systematic, principled, and general approach to audit ML models in the life sciences. We use this auditing framework to examine biases in three ML applications of therapeutic interest and identify unrecognized biases that hinder the ML process and result in substantially reduced model performance on new datasets. Ultimately, we show that ML models tend to learn primarily from data biases when there is insufficient signal in the data to learn from. We provide detailed protocols, guidelines, and examples of code to enable tailoring of the auditing framework to other biomedical applications.
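The abstract's central claim is that a model can post strong performance by exploiting a data bias rather than biological signal. The sketch below is a hypothetical toy illustration of one such check. It is not the authors' auditing framework. It builds synthetic pairwise-interaction data (for example, protein pairs) in which positives are enriched for a few "hub" proteins. It then scores pairs with a "shortcut" baseline that sees only how often each protein appeared in positive versus negative training pairs. All names (`make_biased_pairs`, `shortcut_scores`, `auc`, `audit`) and the even/odd protein split are assumptions made for illustration.

```python
import random
from collections import Counter


def make_biased_pairs(n_proteins=200, n_pairs=2000, seed=0):
    """Hypothetical toy data: every positive pair includes one of a few 'hub'
    proteins, while negatives are sampled uniformly, so protein identity alone
    carries label information (a deliberately built-in data bias)."""
    rng = random.Random(seed)
    proteins = list(range(n_proteins))
    hubs = proteins[:20]
    pairs = []
    for _ in range(n_pairs // 2):
        pairs.append((rng.choice(hubs), rng.choice(proteins), 1))
        pairs.append((rng.choice(proteins), rng.choice(proteins), 0))
    rng.shuffle(pairs)
    return pairs


def shortcut_scores(train, test):
    """Score test pairs using only how often each protein appeared in positive
    vs. negative training pairs; no biological features are used."""
    pos, neg = Counter(), Counter()
    for a, b, label in train:
        (pos if label else neg).update((a, b))

    def protein_score(p):
        # Laplace-smoothed share of p's training appearances that were positive;
        # a protein never seen in training gets the uninformative value 0.5.
        return (pos[p] + 1) / (pos[p] + neg[p] + 2)

    return [protein_score(a) + protein_score(b) for a, b, _ in test]


def auc(scores, labels):
    """ROC AUC by exhaustive positive/negative comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def audit(pairs):
    """Compare the shortcut baseline on a random split and on a split whose
    test proteins never appear in training (even vs. odd protein IDs)."""
    cut = int(0.8 * len(pairs))
    random_train, random_test = pairs[:cut], pairs[cut:]
    unseen_train = [t for t in pairs if t[0] % 2 == 0 and t[1] % 2 == 0]
    unseen_test = [t for t in pairs if t[0] % 2 == 1 and t[1] % 2 == 1]
    results = {}
    for name, train, test in [("random split", random_train, random_test),
                              ("unseen-protein split", unseen_train, unseen_test)]:
        labels = [y for _, _, y in test]
        results[name] = auc(shortcut_scores(train, test), labels)
    return results


if __name__ == "__main__":
    for name, value in audit(make_biased_pairs()).items():
        print(f"{name}: shortcut AUC = {value:.2f}")
```

On the random split the shortcut baseline should score well above 0.5 even though it uses no biological features. On the unseen-protein split every test protein is absent from training, so all scores tie and the AUC is exactly 0.5. If a real model degrades in a similar way when evaluated on unseen entities, that is a hint, not proof, that it may be learning from the bias rather than from the data's biological signal.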


Year: 2021 | PMID: 33568741 | PMCID: PMC7876113 | DOI: 10.1038/s42003-021-01674-5

Source DB: PubMed | Journal: Commun Biol | ISSN: 2399-3642

