A Multi-Label Supervised Topic Model Conditioned on Arbitrary Features for Gene Function Prediction.

Lin Liu, Lin Tang, Xin Jin, Wei Zhou.

Abstract

With the continuous accumulation of biological data, a growing number of machine learning algorithms have been introduced into gene function prediction, a field of great significance for understanding the mechanisms of life. Recently, a multi-label supervised topic model named labeled latent Dirichlet allocation (LLDA) has been applied to gene function prediction and has obtained more accurate and explainable predictions than conventional methods. Nonetheless, the LLDA model can only use a bag of amino acid words as its classification feature and does not support other features, such as hydrophobicity, which has a profound impact on gene function. To achieve more accurate probabilistic modeling of gene function, we propose a multi-label supervised topic model conditioned on arbitrary features, named Dirichlet multinomial regression LLDA (DMR-LLDA), which introduces multiple types of features into the topic modeling process. Based on the DMR framework, DMR-LLDA places an exponential prior, constructed from weighted features, on the hyperparameters of the gene-topic distribution, so as to reflect the effects of the extra features on the function probability distribution. In five-fold cross-validation experiments on a yeast dataset, DMR-LLDA significantly outperforms the compared model. These experiments demonstrate the effectiveness and potential value of DMR-LLDA for predicting gene function.
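The exponential prior described above can be illustrated with a minimal sketch. The abstract does not give the formula, so the code assumes the standard Dirichlet-multinomial regression form, alpha[k] = exp(x · lambda[k]), combined with the usual LLDA restriction of topics to a gene's annotated labels during training. The function names, the feature layout (a bias term followed by a hydrophobicity score), and all numeric values are hypothetical and chosen for illustration only.

```python
import math


def dmr_llda_alpha(features, weights, label_mask):
    """Per-gene Dirichlet hyperparameters under an assumed DMR-style prior.

    features:   feature vector x for one gene (hypothetical layout)
    weights:    weights[k] is the weight vector lambda_k for topic/function k
    label_mask: 1 if function k is annotated to this gene, else 0
    """
    alpha = []
    for lam_k, allowed in zip(weights, label_mask):
        if allowed:
            # Exponential link: alpha_k = exp(x . lambda_k), always positive.
            score = sum(x * w for x, w in zip(features, lam_k))
            alpha.append(math.exp(score))
        else:
            # LLDA-style restriction: unannotated functions get no prior mass.
            alpha.append(0.0)
    return alpha


def expected_topic_proportions(alpha):
    """Prior mean of the gene-topic distribution, alpha_k / sum(alpha)."""
    total = sum(alpha)
    if total == 0.0:
        raise ValueError("gene has no permitted labels")
    return [a / total for a in alpha]


# Hypothetical gene: bias term 1.0 plus a hydrophobicity score of 0.5.
x = [1.0, 0.5]
# Hypothetical weights for three candidate functions.
lam = [[0.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
# The gene is annotated with functions 0 and 1 only.
mask = [1, 1, 0]

alpha = dmr_llda_alpha(x, lam, mask)
prior_mean = expected_topic_proportions(alpha)
```

In this sketch, a larger weight on the hydrophobicity feature raises the prior mass of function 1 relative to function 0, which is the sense in which extra features shift the function probability distribution. In a full model the weights would be learned from the training data rather than fixed by hand.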

Keywords:  Dirichlet-multinomial Regression; gene function; multi-label classification; probability distribution; topic model

Year:  2019        PMID: 30658497      PMCID: PMC6356783          DOI: 10.3390/genes10010057

Source DB:  PubMed          Journal:  Genes (Basel)        ISSN: 2073-4425            Impact factor:   4.096


