The case for using mapped exonic non-duplicate reads when reporting RNA-sequencing depth: examples from pediatric cancer datasets.

Holly C Beale, Jacquelyn M Roger, Matthew A Cattle, Liam T McKay, Drew K A Thompson, Katrina Learned, A Geoffrey Lyle, Ellen T Kephart, Rob Currie, Du Linh Lam, Lauren Sanders, Jacob Pfeil, John Vivian, Isabel Bjork, Sofie R Salama, David Haussler, Olena M Vaske.

Abstract

BACKGROUND: The reproducibility of gene expression measured by RNA sequencing (RNA-Seq) depends on sequencing depth. Unmapped and non-exonic reads do not contribute to gene expression quantification; duplicate reads do contribute to quantification but are not informative for reproducibility. We show that mapped, exonic, non-duplicate (MEND) reads are a useful measure of reproducibility of RNA-Seq datasets used for gene expression analysis.
FINDINGS: In bulk RNA-Seq datasets from 2,179 tumors in 48 cohorts, the fraction of reads that contribute to the reproducibility of gene expression analysis varies greatly. Unmapped reads constitute 1-77% of all reads (median [IQR], 3% [3-6%]); duplicate reads constitute 3-100% of mapped reads (median [IQR], 27% [13-43%]); and non-exonic reads constitute 4-97% of mapped, non-duplicate reads (median [IQR], 25% [16-37%]). MEND reads constitute 0-79% of total reads (median [IQR], 50% [30-61%]).
CONCLUSIONS: Because not all reads in an RNA-Seq dataset are informative for reproducibility of gene expression measurements and the fraction of reads that are informative varies, we propose reporting a dataset's sequencing depth in MEND reads, which definitively inform the reproducibility of gene expression, rather than total, mapped, or exonic reads. We provide a Docker image containing (i) the existing required tools (RSeQC, sambamba, and samblaster) and (ii) a custom script to calculate MEND reads from RNA-Seq data files. We recommend that all RNA-Seq gene expression experiments, sensitivity studies, and depth recommendations use MEND units for sequencing depth.
© The Author(s) 2021. Published by Oxford University Press GigaScience.
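
The nested fractions reported in the FINDINGS can be illustrated with a small calculation. The following is a minimal sketch, not the authors' script: it assumes the four read counts have already been obtained from an alignment pipeline, and the names `ReadCounts` and `depth_fractions`, as well as the example counts, are hypothetical. Each fraction uses the same denominator as in the abstract (unmapped over total, duplicate over mapped, non-exonic over mapped non-duplicate, MEND over total).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCounts:
    """Hypothetical per-sample read counts; field names are illustrative."""
    total: int          # all sequenced reads
    mapped: int         # reads aligned to the reference
    mapped_nondup: int  # mapped reads not marked as duplicates
    mend: int           # mapped, exonic, non-duplicate reads


def depth_fractions(c: ReadCounts) -> dict:
    """Return each read category as a fraction of its parent category."""
    if not (c.total >= c.mapped >= c.mapped_nondup >= c.mend >= 0):
        raise ValueError("counts must be nested: total >= mapped >= mapped_nondup >= mend >= 0")

    def ratio(num: int, den: int) -> float:
        # Avoid division by zero for empty categories.
        return num / den if den else float("nan")

    return {
        "unmapped_of_total": ratio(c.total - c.mapped, c.total),
        "duplicate_of_mapped": ratio(c.mapped - c.mapped_nondup, c.mapped),
        "nonexonic_of_mapped_nondup": ratio(c.mapped_nondup - c.mend, c.mapped_nondup),
        "mend_of_total": ratio(c.mend, c.total),
    }


# Made-up counts for illustration only; not taken from the study.
example = ReadCounts(total=100_000_000, mapped=95_000_000,
                     mapped_nondup=70_000_000, mend=50_000_000)
fractions = depth_fractions(example)
for name, value in fractions.items():
    print(f"{name}: {value:.1%}")
```

In this made-up sample, a dataset nominally sequenced to 100 million reads has a depth of 50 million MEND reads, which is the figure the authors propose reporting.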

Keywords:  RNA-Seq; depth; duplicate; exonic; quality; sequencing; unmapped

Year: 2021; PMID: 33712853; PMCID: PMC7955155; DOI: 10.1093/gigascience/giab011

Source DB: PubMed; Journal: Gigascience; ISSN: 2047-217X; Impact factor: 6.524


