Data-driven advice for applying machine learning to bioinformatics problems.

Randal S Olson, William La Cava, Zairah Mustahsan, Akshay Varik, Jason H Moore.

Abstract

As the bioinformatics field grows, it must keep pace not only with new data but with new algorithms. Here we contribute a thorough analysis of 13 state-of-the-art, commonly used machine learning algorithms on a set of 165 publicly available classification problems in order to provide data-driven algorithm recommendations to current researchers. We present a number of statistical and visual comparisons of algorithm performance and quantify the effect of model selection and algorithm tuning for each algorithm and dataset. The analysis culminates in the recommendation of five algorithms with hyperparameters that maximize classifier performance across the tested problems, as well as general guidelines for applying machine learning to supervised classification problems.

Year:  2018        PMID: 29218881      PMCID: PMC5890912     

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  5 in total

Review 1.  Machine learning in bioinformatics: a brief survey and recommendations for practitioners.

Authors:  Harish Bhaskar; David C Hoyle; Sameer Singh
Journal:  Comput Biol Med       Date:  2005-10-13       Impact factor: 4.589

2.  Machine learning for detecting gene-gene interactions: a review.

Authors:  Brett A McKinney; David M Reif; Marylyn D Ritchie; Jason H Moore
Journal:  Appl Bioinformatics       Date:  2006

3.  A balanced accuracy function for epistasis modeling in imbalanced datasets using multifactor dimensionality reduction.

Authors:  Digna R Velez; Bill C White; Alison A Motsinger; William S Bush; Marylyn D Ritchie; Scott M Williams; Jason H Moore
Journal:  Genet Epidemiol       Date:  2007-05       Impact factor: 2.135

4.  Data mining in bioinformatics using Weka.

Authors:  Eibe Frank; Mark Hall; Len Trigg; Geoffrey Holmes; Ian H Witten
Journal:  Bioinformatics       Date:  2004-04-08       Impact factor: 6.937

5.  PMLB: a large benchmark suite for machine learning evaluation and comparison.

Authors:  Randal S Olson; William La Cava; Patryk Orzechowski; Ryan J Urbanowicz; Jason H Moore
Journal:  BioData Min       Date:  2017-12-11       Impact factor: 2.522

  48 in total

1.  Deep learning for cardiovascular medicine: a practical primer.

Authors:  Chayakrit Krittanawong; Kipp W Johnson; Robert S Rosenson; Zhen Wang; Mehmet Aydar; Usman Baber; James K Min; W H Wilson Tang; Jonathan L Halperin; Sanjiv M Narayan
Journal:  Eur Heart J       Date:  2019-07-01       Impact factor: 29.983

2.  Automatic Machine Learning to Differentiate Pediatric Posterior Fossa Tumors on Routine MR Imaging.

Authors:  H Zhou; R Hu; O Tang; C Hu; L Tang; K Chang; Q Shen; J Wu; B Zou; B Xiao; J Boxerman; W Chen; R Y Huang; L Yang; H X Bai; C Zhu
Journal:  AJNR Am J Neuroradiol       Date:  2020-07       Impact factor: 3.825

3.  Identification of Patients with Nontraumatic Intracranial Hemorrhage Using Administrative Claims Data.

Authors:  Rohit B Sangal; Samah Fodeh; Andrew Taylor; Craig Rothenberg; Emily B Finn; Kevin Sheth; Charles Matouk; Andrew Ulrich; Vivek Parwani; John Sather; Arjun Venkatesh
Journal:  J Stroke Cerebrovasc Dis       Date:  2020-10-15       Impact factor: 2.136

4.  Identification of DNA motifs that regulate DNA methylation.

Authors:  Mengchi Wang; Kai Zhang; Vu Ngo; Chengyu Liu; Shicai Fan; John W Whitaker; Yue Chen; Rizi Ai; Zhao Chen; Jun Wang; Lina Zheng; Wei Wang
Journal:  Nucleic Acids Res       Date:  2019-07-26       Impact factor: 16.971

Review 5.  A guide to machine learning for biologists.

Authors:  Joe G Greener; Shaun M Kandathil; Lewis Moffat; David T Jones
Journal:  Nat Rev Mol Cell Biol       Date:  2021-09-13       Impact factor: 94.444

6.  Comparative analysis of molecular fingerprints in prediction of drug combination effects.

Authors:  B Zagidullin; Z Wang; Y Guan; E Pitkänen; J Tang
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 11.622

Review 7.  Preparing next-generation scientists for biomedical big data: artificial intelligence approaches.

Authors:  Jason H Moore; Mary Regina Boland; Pablo G Camara; Hannah Chervitz; Graciela Gonzalez; Blanca E Himes; Dokyoon Kim; Danielle L Mowery; Marylyn D Ritchie; Li Shen; Ryan J Urbanowicz; John H Holmes
Journal:  Per Med       Date:  2019-02-14       Impact factor: 2.512

8.  Specific histone modifications associate with alternative exon selection during mammalian development.

Authors:  Qiwen Hu; Casey S Greene; Elizabeth A Heller
Journal:  Nucleic Acids Res       Date:  2020-05-21       Impact factor: 16.971

9.  Random forest-based prediction of stroke outcome.

Authors:  Carlos Fernandez-Lozano; Pablo Hervella; Virginia Mato-Abad; Manuel Rodríguez-Yáñez; Sonia Suárez-Garaboa; Iria López-Dequidt; Ana Estany-Gestal; Tomás Sobrino; Francisco Campos; José Castillo; Santiago Rodríguez-Yáñez; Ramón Iglesias-Rey
Journal:  Sci Rep       Date:  2021-05-12       Impact factor: 4.379

10.  Predicting critical state after COVID-19 diagnosis: model development using a large US electronic health record dataset.

Authors:  Mike D Rinderknecht; Yannick Klopfenstein
Journal:  NPJ Digit Med       Date:  2021-07-20