
LEARNING PARSIMONIOUS ENSEMBLES FOR UNBALANCED COMPUTATIONAL GENOMICS PROBLEMS.

Ana Stanescu, Gaurav Pandey.

Abstract

Prediction problems in the biomedical sciences are generally difficult, partly because of incomplete knowledge of how the phenomenon of interest is influenced by the variables and measurements used for prediction, and partly because there is no consensus on the ideal predictor(s) for specific problems. In these situations, a powerful approach to improving prediction performance is to construct ensembles that combine the outputs of many individual base predictors; such ensembles have been successful for many biomedical prediction tasks. Moreover, selecting a parsimonious ensemble can be of even greater value in the biomedical sciences, where it is important not only to learn an accurate predictor but also to interpret what novel knowledge it can provide about the target problem. Ensemble selection is a promising approach for this task because it can select a collectively predictive, and often relatively small, subset of all input base predictors. One of the best-known algorithms for ensemble selection, CES (Caruana et al.'s Ensemble Selection), generally performs well in practice but is difficult to tune, because the right values of its various parameters are hard to choose. Since these parameter choices are usually ad hoc, good performance of CES is difficult to guarantee across a variety of problems or datasets. To address these challenges with CES and other such algorithms, we propose a novel heterogeneous ensemble selection approach based on the paradigm of reinforcement learning (RL), which offers a more systematic and mathematically sound methodology for exploring the many possible combinations of base predictors that can be selected into an ensemble. We develop three RL-based strategies for constructing ensembles and analyze their results on two unbalanced computational genomics problems, namely the prediction of protein function and of splice sites in eukaryotic genomes. We show that the resultant ensembles are indeed substantially more parsimonious than the full set of base predictors, yet still offer almost the same classification power, especially for larger datasets. The RL ensembles also yield a better combination of parsimony and predictive performance than CES.
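
To make the idea of ensemble selection concrete, the following is a minimal Python sketch of greedy forward selection with replacement, in the spirit of CES. It is not the RL-based approach proposed in the paper, whose details the abstract does not give. The function names (`auc`, `greedy_ensemble_selection`), the `max_rounds` parameter, the stopping rule and the choice of AUC as the validation metric are all assumptions made for illustration.

```python
import numpy as np


def auc(y_true, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. Assumes binary labels in {0, 1} with both classes present.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


def greedy_ensemble_selection(preds, y_val, max_rounds=20):
    """Greedy forward selection with replacement (illustrative, CES-like).

    preds: array of shape (n_models, n_samples) holding each base predictor's
           scores on a held-out validation set (hypothetical input layout).
    Returns the list of selected model indices (repeats allowed) and the
    validation AUC of the averaged ensemble.
    """
    preds = np.asarray(preds, dtype=float)
    selected = []
    running_sum = np.zeros(preds.shape[1])
    best_score = -np.inf
    for _ in range(max_rounds):
        # Evaluate the averaged ensemble obtained by adding each candidate.
        trial = [auc(y_val, (running_sum + p) / (len(selected) + 1)) for p in preds]
        k = int(np.argmax(trial))
        # Assumed stopping rule: stop when no candidate strictly improves the score.
        if trial[k] <= best_score:
            break
        selected.append(k)
        running_sum += preds[k]
        best_score = trial[k]
    return selected, best_score
```

Calling `greedy_ensemble_selection(preds, y_val)` on a matrix of validation scores returns a typically small list of base predictor indices. The number of rounds, the stopping rule and the metric are the kind of ad hoc parameter choices the abstract identifies as a weakness of CES.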

Year: 2017; PMID: 27896983; PMCID: PMC5147733; DOI: 10.1142/9789813207813_0028

Source DB: PubMed; Journal: Pac Symp Biocomput; ISSN: 2335-6928


