Towards enhanced and interpretable clustering/classification in integrative genomics.

Yang Young Lu, Jinchi Lv, Jed A Fuhrman, Fengzhu Sun.

Abstract

High-throughput technologies have produced large collections of different types of biological data, providing unprecedented opportunities to unravel the molecular heterogeneity of biological processes. Nevertheless, integrating data from multiple sources into a holistic, biologically meaningful interpretation remains challenging. In this work, we propose a scalable and tuning-free preprocessing framework, Heterogeneity Rescaling Pursuit (Hetero-RP), which weighs important features more highly than less important ones in accord with implicitly existing auxiliary knowledge. We demonstrate the effectiveness of Hetero-RP in diverse clustering and classification applications. More importantly, Hetero-RP offers an interpretation of feature importance, shedding light on the driving forces of the underlying biology. In metagenomic contig binning, Hetero-RP automatically weighs abundance and composition profiles according to the varying number of samples, resulting in markedly improved binning performance. In RNA-binding protein (RBP) binding site prediction, Hetero-RP not only improves prediction performance, as measured by the area under the receiver operating characteristic curve (AUC), but also uncovers evidence supported by independent studies, including the distribution of the binding sites of IGF2BP and PUM2, the binding competition between hnRNPC and U2AF2, and the intron-exon boundary of U2AF2. Hetero-RP is available at https://github.com/younglululu/Hetero-RP.
© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2017        PMID: 28977511      PMCID: PMC5714251          DOI: 10.1093/nar/gkx767

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


