Xu Shi (1), Xiao Wang (1), Tian-Li Wang (2), Leena Hilakivi-Clarke (3), Robert Clarke (3), Jianhua Xuan (1)
1. Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA 22203, USA.
2. Department of Pathology, Johns Hopkins Medical Institutions, Baltimore, MD 21231, USA.
3. Department of Oncology and Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC 20057, USA.
Abstract
Motivation: Recent advances in high-throughput RNA sequencing (RNA-seq) technologies have made it possible to reconstruct the full transcriptome of various types of cells. Accurate transcript assembly and isoform identification are important for understanding molecular mechanisms in biological systems.

Results: We have developed a novel Bayesian method, SparseIso, to reliably identify spliced isoforms from RNA-seq data. A spike-and-slab prior is incorporated into the Bayesian model to enforce sparsity in isoform identification, which alleviates overfitting. A Gibbs sampling procedure is further developed to identify and quantify transcripts simultaneously from RNA-seq data. With the sampling approach, SparseIso estimates the joint distribution of all candidate transcripts, resulting in significantly improved detection of lowly expressed transcripts and of genes with multiple expressed isoforms. Both a simulation study and real data analysis demonstrate that SparseIso significantly outperforms existing methods in transcript assembly and isoform identification.

Availability and implementation: The SparseIso package is available at http://github.com/henryxushi/SparseIso.

Contact: xuan@vt.edu.

Supplementary information: Supplementary data are available at Bioinformatics online.
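To make the idea of a spike-and-slab prior combined with Gibbs sampling concrete, the following is a minimal generic sketch for sparse linear regression. It is not the SparseIso model or code: the function name `spike_slab_gibbs`, the Gaussian slab, the known noise variance `sigma2`, and the hyperparameters `tau2` and `pi` are all assumptions made for illustration. Each coefficient is either exactly zero (the spike) or drawn from a normal distribution (the slab), and the sampler alternates between deciding whether a coefficient is included and drawing its value.

```python
import numpy as np


def spike_slab_gibbs(X, y, sigma2=1.0, tau2=4.0, pi=0.2,
                     n_iter=500, burn_in=100, seed=0):
    """Generic spike-and-slab Gibbs sampler for y = X @ beta + noise.

    Illustrative assumptions (not taken from the SparseIso paper):
    noise variance sigma2 is known, the slab is N(0, tau2), and each
    coefficient is included a priori with probability pi.
    Returns posterior inclusion probabilities and posterior mean of beta.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = np.sum(X ** 2, axis=0)   # x_j' x_j for every column
    resid = y - X @ beta              # residual under the current beta
    incl_sum = np.zeros(p)
    beta_sum = np.zeros(p)

    for it in range(n_iter):
        for j in range(p):
            # Residual with coefficient j's contribution removed.
            r = resid + X[:, j] * beta[j]
            # Conditional posterior of beta_j under the slab: N(m, v).
            v = 1.0 / (col_ss[j] / sigma2 + 1.0 / tau2)
            m = v * (X[:, j] @ r) / sigma2
            # Log posterior odds of slab vs. spike, with beta_j integrated out.
            log_odds = (np.log(pi / (1.0 - pi))
                        + 0.5 * np.log(v / tau2)
                        + m ** 2 / (2.0 * v))
            p_incl = 1.0 / (1.0 + np.exp(-log_odds))
            included = rng.random() < p_incl
            beta[j] = rng.normal(m, np.sqrt(v)) if included else 0.0
            resid = r - X[:, j] * beta[j]
            if it >= burn_in:
                incl_sum[j] += included
                beta_sum[j] += beta[j]

    kept = n_iter - burn_in
    return incl_sum / kept, beta_sum / kept
```

In this sketch, the inclusion probabilities play the role of "identification" and the averaged coefficients the role of "quantification", which is the sense in which a single sampler can do both at once. A model for transcript abundances would additionally need non-negativity constraints and a read-count likelihood, neither of which is attempted here.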