The impact of RNA-seq aligners on gene expression estimation.

Cheng Yang, Po-Yen Wu, Li Tong, John H Phan, May D Wang.

Abstract

While numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of both differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis, since the accuracy of the estimates profoundly affects all subsequent analysis. Gene expression estimation generally involves two steps, sequence alignment and quantification, so accurate estimation requires accurate alignment. However, the impact of aligners on gene expression estimation remains unclear. We address this gap by constructing nine pipelines, each pairing one of nine spliced aligners with the same quantifier, and then use simulated data to investigate the impact of aligners on gene expression estimation. To evaluate alignment, we introduce three alignment performance metrics: (1) the percentage of reads aligned, (2) the percentage of reads aligned with zero mismatches (ZeroMismatchPercentage), and (3) the percentage of reads aligned with at most one mismatch (ZeroOneMismatchPercentage). We then evaluate the impact of alignment performance on gene expression estimation using three metrics: (1) gene detection accuracy, (2) the number of genes falsely quantified (FalseExpNum), and (3) the number of genes with falsely estimated fold changes (FalseFcNum). We found that, across the pipelines, FalseExpNum and FalseFcNum are correlated. Moreover, FalseExpNum is linearly correlated with the percentage of reads aligned and with ZeroMismatchPercentage, and FalseFcNum is linearly correlated with ZeroMismatchPercentage. Because of these correlations, the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
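The three alignment metrics and the linear-correlation check described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `alignment_metrics` and `pearson_r`, the input representation (one mismatch count per read, `None` for an unaligned read), and the choice of the total read count as the denominator for all three percentages are assumptions, since the abstract does not specify them.

```python
from math import sqrt
from typing import Dict, Optional, Sequence


def alignment_metrics(mismatches: Sequence[Optional[int]]) -> Dict[str, float]:
    """Compute the three alignment metrics named in the abstract.

    mismatches[i] is the number of mismatches of read i, or None if the
    read was not aligned. ASSUMPTION: every percentage is taken over the
    total number of reads; the abstract does not state the denominator.
    """
    total = len(mismatches)
    if total == 0:
        raise ValueError("no reads given")
    aligned = [m for m in mismatches if m is not None]
    zero = sum(1 for m in aligned if m == 0)
    zero_or_one = sum(1 for m in aligned if m <= 1)
    return {
        "PercentAligned": 100.0 * len(aligned) / total,
        "ZeroMismatchPercentage": 100.0 * zero / total,
        "ZeroOneMismatchPercentage": 100.0 * zero_or_one / total,
    }


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient, one value per pipeline in x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy input, not data from the paper: five reads, one unaligned.
    print(alignment_metrics([0, 0, 1, 2, None]))
```

In use, one would compute `alignment_metrics` once per pipeline and then pass, for example, the per-pipeline ZeroMismatchPercentage values and the corresponding FalseFcNum counts to `pearson_r` to check for the kind of linear relationship the abstract reports.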

Year:  2015        PMID: 27583310      PMCID: PMC5003035          DOI: 10.1145/2808719.2808767

Source DB:  PubMed          Journal:  ACM BCB


