Sparse linear modeling of next-generation mRNA sequencing (RNA-Seq) data for isoform discovery and abundance estimation.

Jingyi Jessica Li, Ci-Ren Jiang, James B Brown, Haiyan Huang, Peter J Bickel.

Abstract

Since the inception of next-generation mRNA sequencing (RNA-Seq) technology, various attempts have been made to use RNA-Seq data to assemble full-length mRNA isoforms de novo and to estimate isoform abundance. However, for genes with more than a few exons, the problem tends to be challenging and often involves identifiability issues in statistical modeling. We have developed a statistical method called "sparse linear modeling of RNA-Seq data for isoform discovery and abundance estimation" (SLIDE) that takes exon boundaries and RNA-Seq data as input to discern the set of mRNA isoforms that are most likely to be present in an RNA-Seq sample. SLIDE is based on a linear model with a design matrix that models the sampling probability of RNA-Seq reads from different mRNA isoforms. To tackle the model unidentifiability issue, SLIDE uses a modified Lasso procedure for parameter estimation. Compared with deterministic isoform assembly algorithms (e.g., Cufflinks), SLIDE considers the stochastic aspects of RNA-Seq reads in exons from different isoforms and thus has greater power to detect novel isoforms. Another advantage of SLIDE is its flexibility in incorporating other transcriptomic data such as RACE, CAGE, and EST into its model to further increase isoform discovery accuracy. SLIDE can also work downstream of other RNA-Seq assembly algorithms to integrate newly discovered genes and exons. After isoform discovery, SLIDE uses the same linear model to estimate the abundance of the discovered isoforms. Simulation and real data studies show that SLIDE performs as well as or better than major competitors in both isoform discovery and abundance estimation. The SLIDE software package is available at https://sites.google.com/site/jingyijli/SLIDE.zip.
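
The linear-model idea in the abstract can be illustrated with a small sketch. This is not the SLIDE implementation: the abstract does not specify the bins, the design matrix entries, or the form of the "modified Lasso", so everything below is an assumption chosen for illustration. The sketch assumes a toy three-exon gene, bins made of exons and exon pairs, made-up exon lengths and junction weight, and a non-negative Lasso whose penalty weight is 1/(number of exons) as a stand-in for the modification. All function names are hypothetical.

```python
from itertools import combinations


def candidate_isoforms(n_exons):
    """All non-empty ordered exon subsets (2^n - 1 candidates)."""
    return [c for size in range(1, n_exons + 1)
            for c in combinations(range(n_exons), size)]


def design_matrix(isoforms, exon_lengths, junction_weight):
    """Toy design matrix, returned as a list of columns (one per isoform).

    Bins are the exons followed by every exon pair (i, j) with i < j.
    Each column is normalized to sum to 1, so it acts as the probability
    that a read from that isoform falls in each bin (an assumption).
    """
    n = len(exon_lengths)
    pairs = list(combinations(range(n), 2))
    columns = []
    for iso in isoforms:
        col = [float(exon_lengths[i]) if i in iso else 0.0 for i in range(n)]
        adjacent = set(zip(iso, iso[1:]))  # exon pairs spliced together
        col += [junction_weight if pr in adjacent else 0.0 for pr in pairs]
        total = sum(col)
        columns.append([v / total for v in col])
    return columns


def weighted_nonneg_lasso(columns, b, lam, weights, sweeps=10000, tol=1e-12):
    """Coordinate descent for
    min_p 0.5 * ||b - F p||^2 + lam * sum_k weights[k] * p[k],  p >= 0.
    """
    p = [0.0] * len(columns)
    resid = list(b)  # b - F p, kept up to date incrementally
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        biggest_change = 0.0
        for k, col in enumerate(columns):
            # Inner product of column k with the residual that excludes
            # column k's own current contribution.
            rho = sum(c * r for c, r in zip(col, resid)) + norms[k] * p[k]
            new = max(0.0, (rho - lam * weights[k]) / norms[k])
            delta = new - p[k]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                p[k] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return p


# Toy data: 3 exons, 7 candidate isoforms, 6 bins (more unknowns than bins).
exon_lengths = [100, 50, 100]           # assumed lengths
isoforms = candidate_isoforms(3)
F = design_matrix(isoforms, exon_lengths, junction_weight=25.0)

# Simulated noise-free bin proportions from two "true" isoforms.
truth = {(0, 1, 2): 0.7, (0, 2): 0.3}
b = [sum(F[k][j] * truth.get(iso, 0.0) for k, iso in enumerate(isoforms))
     for j in range(len(F[0]))]

# Assumed penalty weights: isoforms with fewer exons are penalized more.
weights = [1.0 / len(iso) for iso in isoforms]
p_hat = weighted_nonneg_lasso(F, b, lam=0.01, weights=weights)

discovered = {iso: round(p, 3) for iso, p in zip(isoforms, p_hat) if p > 1e-6}
print(discovered)
```

On this noise-free toy input the fit keeps only the two isoforms used to simulate the data, with abundances shifted slightly by the penalty. With real reads, the bins, design matrix, and tuning of `lam` would all need to follow the published method rather than these placeholders.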

Year:  2011        PMID: 22135461      PMCID: PMC3250192          DOI: 10.1073/pnas.1113972108

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Cited by (62 in total)

1.  RNA Sequencing and Analysis.

Authors:  Kimberly R Kukurba; Stephen B Montgomery
Journal:  Cold Spring Harb Protoc       Date:  2015-04-13

2.  Characterization of the human ESC transcriptome by hybrid sequencing.

Authors:  Kin Fai Au; Vittorio Sebastiano; Pegah Tootoonchi Afshar; Jens Durruthy Durruthy; Lawrence Lee; Brian A Williams; Harm van Bakel; Eric E Schadt; Renee A Reijo-Pera; Jason G Underwood; Wing Hung Wong
Journal:  Proc Natl Acad Sci U S A       Date:  2013-11-26       Impact factor: 11.205

3.  Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads.

Authors:  Wei Li; Tao Jiang
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937

4.  A robust method for transcript quantification with RNA-seq data.

Authors:  Yan Huang; Yin Hu; Corbin D Jones; James N MacLeod; Derek Y Chiang; Yufeng Liu; Jan F Prins; Jinze Liu
Journal:  J Comput Biol       Date:  2013-03       Impact factor: 1.479

5.  Navigating and mining modENCODE data. (Review)

Authors:  Nathan Boley; Kenneth H Wan; Peter J Bickel; Susan E Celniker
Journal:  Methods       Date:  2014-03-15       Impact factor: 3.608

6.  On the complexity of Minimum Path Cover with Subpath Constraints for multi-assembly.

Authors:  Romeo Rizzi; Alexandru I Tomescu; Veli Mäkinen
Journal:  BMC Bioinformatics       Date:  2014-09-10       Impact factor: 3.169

7.  Targeted sequencing for gene discovery and quantification using RNA CaptureSeq.

Authors:  Tim R Mercer; Michael B Clark; Joanna Crawford; Marion E Brunck; Daniel J Gerhardt; Ryan J Taft; Lars K Nielsen; Marcel E Dinger; John S Mattick
Journal:  Nat Protoc       Date:  2014-04-03       Impact factor: 13.491

8.  Genome-guided transcriptome assembly in the age of next-generation sequencing. (Review)

Authors:  Liliana D Florea; Steven L Salzberg
Journal:  IEEE/ACM Trans Comput Biol Bioinform       Date:  2013 Sep-Oct       Impact factor: 3.710

9.  IsoDOT Detects Differential RNA-isoform Expression/Usage with respect to a Categorical or Continuous Covariate with High Sensitivity and Specificity.

Authors:  Wei Sun; Yufeng Liu; James J Crowley; Ting-Huei Chen; Hua Zhou; Haitao Chu; Shunping Huang; Pei-Fen Kuan; Yuan Li; Darla R Miller; Ginger D Shaw; Yichao Wu; Vasyl Zhabotynsky; Leonard McMillan; Fei Zou; Patrick F Sullivan; Fernando Pardo-Manuel de Villena
Journal:  J Am Stat Assoc       Date:  2015-11-07       Impact factor: 5.033

10.  CLASS2: accurate and efficient splice variant annotation from RNA-seq reads.

Authors:  Li Song; Sarven Sabunciyan; Liliana Florea
Journal:  Nucleic Acids Res       Date:  2016-03-14       Impact factor: 16.971
