Navigating the pitfalls of applying machine learning in genomics.

Sean Whalen, Jacob Schreiber, William S Noble, Katherine S Pollard.

Abstract

The scale of genetic, epigenomic, transcriptomic, cheminformatic and proteomic data available today, coupled with easy-to-use machine learning (ML) toolkits, has propelled the application of supervised learning in genomics research. However, the assumptions behind the statistical models and performance evaluations in ML software frequently are not met in biological systems. In this Review, we illustrate the impact of several common pitfalls encountered when applying supervised ML in genomics. We explore how the structure of genomics data can bias performance evaluations and predictions. To address the challenges associated with applying cutting-edge ML methods to genomics, we describe solutions and appropriate use cases where ML modelling shows great potential.
© 2021. Springer Nature Limited.

Year: 2021
PMID: 34837041
DOI: 10.1038/s41576-021-00434-9
Source DB: PubMed
Journal: Nat Rev Genet
ISSN: 1471-0056
Impact factor: 53.242