Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation.

Jingyi Jessica Li, Ci-Ren Jiang, James B Brown, Haiyan Huang, Peter J Bickel.

Abstract

Since the inception of next-generation mRNA sequencing (RNA-Seq) technology, various attempts have been made to use RNA-Seq data to assemble full-length mRNA isoforms de novo and to estimate isoform abundance. However, for genes with more than a few exons, the problem is challenging and often involves identifiability issues in statistical modeling. We have developed a statistical method called "sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation" (SLIDE) that takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has greater power to detect novel isoforms. Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data such as RACE, CAGE, and EST into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. After isoform discovery, SLIDE uses the same linear model to estimate the abundance of the discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The SLIDE software package is available at https://sites.google.com/site/jingyijli/SLIDE.zip.
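The idea of a sparse linear model over candidate isoforms can be illustrated with a small sketch. This is not the SLIDE implementation: it uses a plain non-negative Lasso solved by coordinate descent rather than the modified Lasso the abstract refers to, and the toy gene, the exon lengths, the bin definition (one bin per exon plus one per exon-exon junction), the junction weight, the penalty `lam`, and the reporting threshold are all assumptions chosen for illustration. With three exons there are seven candidate isoforms but only six bins, so an unpenalized least-squares fit has no unique solution, which is the kind of identifiability problem described above.

```python
from itertools import combinations

# Toy gene with three exons. Lengths are in arbitrary units (assumed values).
EXON_LENGTHS = [2.0, 1.0, 2.0]
JUNCTION_WEIGHT = 1.0  # assumed relative chance of a read spanning a junction


def candidate_isoforms(n_exons):
    """Every non-empty ordered subset of exons is a candidate isoform."""
    return [c for k in range(1, n_exons + 1)
            for c in combinations(range(n_exons), k)]


def read_bins(n_exons):
    """Bins: one per exon, plus one per exon-exon junction (i, j) with i < j."""
    return [(i,) for i in range(n_exons)] + list(combinations(range(n_exons), 2))


def design_matrix(isoforms, bins):
    """Column j holds P(read lands in bin | read comes from isoform j)."""
    columns = []
    for iso in isoforms:
        junctions = set(zip(iso, iso[1:]))  # adjacent exons within the isoform
        weights = []
        for b in bins:
            if len(b) == 1:
                weights.append(EXON_LENGTHS[b[0]] if b[0] in iso else 0.0)
            else:
                weights.append(JUNCTION_WEIGHT if b in junctions else 0.0)
        total = sum(weights)
        columns.append([w / total for w in weights])
    return columns  # stored column-wise: columns[j][bin_index]


def nonnegative_lasso(columns, b, lam, sweeps=20000, tol=1e-12):
    """Coordinate descent for min over p >= 0 of 0.5*||b - F p||^2 + lam*sum(p)."""
    p = [0.0] * len(columns)
    residual = list(b)  # equals b - F p; p starts at zero
    for _ in range(sweeps):
        biggest_change = 0.0
        for j, col in enumerate(columns):
            norm_sq = sum(f * f for f in col)
            # Correlation of column j with the residual that excludes p[j].
            rho = sum(f * r for f, r in zip(col, residual)) + norm_sq * p[j]
            new_pj = max(0.0, (rho - lam) / norm_sq)
            delta = new_pj - p[j]
            if delta != 0.0:
                residual = [r - delta * f for r, f in zip(residual, col)]
                p[j] = new_pj
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return p


isoforms = candidate_isoforms(3)
bins = read_bins(3)
F = design_matrix(isoforms, bins)

# Simulated noise-free bin proportions: 70% isoform (0, 1, 2), 30% isoform (0, 2).
truth = {(0, 1, 2): 0.7, (0, 2): 0.3}
b = [sum(truth.get(iso, 0.0) * col[k] for iso, col in zip(isoforms, F))
     for k in range(len(bins))]

p_hat = nonnegative_lasso(F, b, lam=1e-3)
estimates = dict(zip(isoforms, p_hat))
# Report candidates above an arbitrary threshold (an assumption, not from the paper).
reported = {iso: round(p, 3) for iso, p in estimates.items() if p > 0.01}
print(reported)
```

In this sketch the non-negativity constraint and the penalty together single out a sparse set of candidates, and the same fitted coefficients double as rough abundance estimates. Real data would add sampling noise, and the paper's own bins and penalty would replace the simplified ones assumed here.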

Year: 2011; PMID: 22135461; PMCID: PMC3250192; DOI: 10.1073/pnas.1113972108

Source DB: PubMed; Journal: Proc Natl Acad Sci U S A; ISSN: 0027-8424; Impact factor: 11.205
