Predicting protein function and other biomedical characteristics with heterogeneous ensembles.

Sean Whalen, Om Prakash Pandey, Gaurav Pandey

Abstract

Prediction problems in biomedical sciences, including protein function prediction (PFP), are generally difficult. This is due in part to incomplete knowledge of the cellular phenomenon of interest, uncertainty about the appropriateness and quality of the variables and measurements used for prediction, and a lack of consensus regarding the ideal predictor for specific problems. In such scenarios, a powerful approach to improving prediction performance is to construct heterogeneous ensemble predictors that combine the outputs of diverse individual predictors capturing complementary aspects of the problems and/or datasets. In this paper, we demonstrate the potential of such heterogeneous ensembles, derived from stacking and ensemble selection methods, for addressing PFP and other similar biomedical prediction problems. Deeper analysis of these results shows that the superior predictive ability of these methods, especially stacking, can be attributed to their attention to the following aspects of the ensemble learning process: (i) better balance of diversity and performance, (ii) more effective calibration of outputs and (iii) more robust incorporation of additional base predictors. Finally, to make the effective application of heterogeneous ensembles to large complex datasets (big data) feasible, we present DataSink, a distributed ensemble learning framework, and demonstrate its scalability on the examined datasets. DataSink is publicly available from https://github.com/shwhalen/datasink.
Copyright © 2015 Elsevier Inc. All rights reserved.
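The abstract names two ways of combining base predictors into a heterogeneous ensemble: ensemble selection and stacking. The sketch below illustrates both in plain Python on the outputs of already-trained base predictors. It is not DataSink's code or API: the function names (`auc`, `ensemble_selection`, `fit_stacker`, `stack_predict`) are hypothetical, the ensemble-selection variant shown is the common greedy forward selection with replacement, and the logistic-regression meta-learner is one assumed choice of stacker. It also assumes the base predictions passed in are held-out (for example, out-of-fold predictions from nested cross-validation), since fitting the combiner on the same data used to train the base predictors would leak label information.

```python
import math
from typing import List, Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ensemble_selection(preds: List[List[float]], labels: Sequence[int], rounds: int = 10) -> List[int]:
    """Greedy forward selection with replacement (assumed variant): each round adds the
    base predictor (repeats allowed) whose inclusion maximizes the AUC of the averaged
    ensemble output on held-out data. Returns the selected indices in order."""
    chosen: List[int] = []
    total = [0.0] * len(labels)  # running sum of the selected predictors' outputs
    for _ in range(rounds):
        best_i, best_score = 0, -1.0
        for i, p in enumerate(preds):
            candidate = [(t + q) / (len(chosen) + 1) for t, q in zip(total, p)]
            score = auc(candidate, labels)
            if score > best_score:  # strict '>' keeps the lowest index on ties
                best_i, best_score = i, score
        chosen.append(best_i)
        total = [t + q for t, q in zip(total, preds[best_i])]
    return chosen


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stacker(preds: List[List[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Stacking meta-learner (assumed choice): logistic regression with one feature per
    base predictor, fit by batch gradient descent on the log-loss."""
    k, n = len(preds), len(labels)
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for j in range(n):
            z = b + sum(w[i] * preds[i][j] for i in range(k))
            err = sigmoid(z) - labels[j]  # derivative of the log-loss w.r.t. z
            for i in range(k):
                grad_w[i] += err * preds[i][j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def stack_predict(w: List[float], b: float, preds: List[List[float]]) -> List[float]:
    """Combined probability for each sample from the fitted meta-learner."""
    return [sigmoid(b + sum(wi * p[j] for wi, p in zip(w, preds)))
            for j in range(len(preds[0]))]


if __name__ == "__main__":
    # Toy held-out outputs of three hypothetical base predictors on 8 samples.
    labels = [1, 1, 1, 1, 0, 0, 0, 0]
    base = [
        [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1],  # strong predictor, AUC 0.9375
        [0.6, 0.3, 0.9, 0.8, 0.2, 0.1, 0.4, 0.5],  # errs on different samples, AUC 0.875
        [0.5] * 8,                                  # uninformative, AUC 0.5
    ]
    print(ensemble_selection(base, labels, rounds=2))  # → [0, 1]
    w, b = fit_stacker(base, labels)
    print(auc(stack_predict(w, b, base), labels))
```

In the toy data, the first two predictors misrank different samples, so their average ranks every positive above every negative (AUC 1.0), which is why ensemble selection picks both: a small instance of the complementarity the abstract describes. Ensemble selection only averages base outputs, so its scores stay on the scale of the inputs, whereas the logistic meta-learner maps them to fitted probabilities; this is one plausible way stacking can produce better-calibrated outputs, though the sketch makes no claim about how the paper itself measured calibration.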

Keywords:  Distributed machine learning; Diversity-performance tradeoff; Ensemble calibration; Heterogeneous ensembles; Nested cross-validation; Protein function prediction

Year:  2015        PMID: 26342255      PMCID: PMC4718788          DOI: 10.1016/j.ymeth.2015.08.016

Source DB:  PubMed          Journal:  Methods        ISSN: 1046-2023            Impact factor:   3.608

Cited by (11 in total):

1.  Learning parsimonious ensembles for unbalanced computational genomics problems.

Authors:  Ana Stanescu; Gaurav Pandey
Journal:  Pac Symp Biocomput       Date:  2017

2.  MetaClean: a machine learning-based classifier for reduced false positive peak detection in untargeted LC-MS metabolomics data.

Authors:  Kelsey Chetnik; Lauren Petrick; Gaurav Pandey
Journal:  Metabolomics       Date:  2020-10-21       Impact factor: 4.290

3.  Objective risk stratification of prostate cancer using machine learning and radiomics applied to multiparametric magnetic resonance images.

Authors:  Bino Varghese; Frank Chen; Darryl Hwang; Suzanne L Palmer; Andre Luis De Castro Abreu; Osamu Ukimura; Monish Aron; Manju Aron; Inderbir Gill; Vinay Duddalwar; Gaurav Pandey
Journal:  Sci Rep       Date:  2019-02-07       Impact factor: 4.379

4.  Gene function finding through cross-organism ensemble learning.

Authors:  Gianluca Moro; Marco Masseroli
Journal:  BioData Min       Date:  2021-02-12       Impact factor: 2.522

5.  Integrating multimodal data through interpretable heterogeneous ensembles.

Authors:  Yan Chak Li; Linhua Wang; Jeffrey N Law; T M Murali; Gaurav Pandey
Journal:  bioRxiv       Date:  2022-07-25

Review 6.  Prediction of Genetic Interactions Using Machine Learning and Network Properties.

Authors:  Neel S Madhukar; Olivier Elemento; Gaurav Pandey
Journal:  Front Bioeng Biotechnol       Date:  2015-10-26

7.  A Nasal Brush-based Classifier of Asthma Identified by Machine Learning Analysis of Nasal RNA Sequence Data.

Authors:  Gaurav Pandey; Om P Pandey; Angela J Rogers; Mehmet E Ahsen; Gabriel E Hoffman; Benjamin A Raby; Scott T Weiss; Eric E Schadt; Supinda Bunyavanich
Journal:  Sci Rep       Date:  2018-06-11       Impact factor: 4.379

8.  Large-scale protein function prediction using heterogeneous ensembles.

Authors:  Linhua Wang; Jeffrey Law; Shiv D Kale; T M Murali; Gaurav Pandey
Journal:  F1000Res       Date:  2018-09-28

9.  Evaluation of Combined Artificial Intelligence and Radiologist Assessment to Interpret Screening Mammograms.

Authors:  Thomas Schaffter; Diana S M Buist; Christoph I Lee; Yaroslav Nikulin; Dezso Ribli; Yuanfang Guan; William Lotter; Zequn Jie; Hao Du; Sijia Wang; Jiashi Feng; Mengling Feng; Hyo-Eun Kim; Francisco Albiol; Alberto Albiol; Stephen Morrell; Zbigniew Wojna; Mehmet Eren Ahsen; Umar Asif; Antonio Jimeno Yepes; Shivanthan Yohanandan; Simona Rabinovici-Cohen; Darvin Yi; Bruce Hoff; Thomas Yu; Elias Chaibub Neto; Daniel L Rubin; Peter Lindholm; Laurie R Margolies; Russell Bailey McBride; Joseph H Rothstein; Weiva Sieh; Rami Ben-Ari; Stefan Harrer; Andrew Trister; Stephen Friend; Thea Norman; Berkman Sahiner; Fredrik Strand; Justin Guinney; Gustavo Stolovitzky; Lester Mackey; Joyce Cahoon; Li Shen; Jae Ho Sohn; Hari Trivedi; Yiqiu Shen; Ljubomir Buturovic; Jose Costa Pereira; Jaime S Cardoso; Eduardo Castro; Karl Trygve Kalleberg; Obioma Pelka; Imane Nedjar; Krzysztof J Geras; Felix Nensa; Ethan Goan; Sven Koitka; Luis Caballero; David D Cox; Pavitra Krishnaswamy; Gaurav Pandey; Christoph M Friedrich; Dimitri Perrin; Clinton Fookes; Bibo Shi; Gerard Cardoso Negrie; Michael Kawczynski; Kyunghyun Cho; Can Son Khoo; Joseph Y Lo; A Gregory Sorensen; Hwejin Jung
Journal:  JAMA Netw Open       Date:  2020-03-02

10.  A crowdsourced analysis to identify ab initio molecular signatures predictive of susceptibility to viral infection.

Authors:  Slim Fourati; Aarthi Talla; Mehrad Mahmoudian; Joshua G Burkhart; Riku Klén; Ricardo Henao; Thomas Yu; Zafer Aydın; Ka Yee Yeung; Mehmet Eren Ahsen; Reem Almugbel; Samad Jahandideh; Xiao Liang; Torbjörn E M Nordling; Motoki Shiga; Ana Stanescu; Robert Vogel; Gaurav Pandey; Christopher Chiu; Micah T McClain; Christopher W Woods; Geoffrey S Ginsburg; Laura L Elo; Ephraim L Tsalik; Lara M Mangravite; Solveig K Sieberts
Journal:  Nat Commun       Date:  2018-10-24       Impact factor: 14.919
