Choice of library size normalization and statistical methods for differential gene expression analysis in balanced two-group comparisons for RNA-seq studies.

Xiaohong Li, Nigel G F Cooper, Timothy E O'Toole, Eric C Rouchka.

Abstract

BACKGROUND: High-throughput RNA sequencing (RNA-seq) has evolved into an important analytical tool in molecular biology. Although the utility and importance of this technique have grown, uncertainties regarding the proper analysis of RNA-seq data remain. Of primary concern, there is no consensus on which normalization and statistical methods are most appropriate for analyzing these data. The lack of standardized analytical methods leads to uncertainties in data interpretation and study reproducibility, especially in studies reporting high false discovery rates. In this study, we compared a recently developed normalization method, UQ-pgQ2, with three of the most frequently used alternatives, RLE (relative log expression), TMM (trimmed mean of M-values) and UQ (upper quartile normalization), in the analysis of RNA-seq data. We evaluated the performance of these methods for gene-level differential expression analysis by considering three factors: 1) normalization combined with the choice of a Wald test from DESeq2 and an exact test/QL (quasi-likelihood) F-test from edgeR; 2) sample sizes in two balanced two-group comparisons; and 3) sequencing read depths.
RESULTS: Using the MAQC RNA-seq datasets with small sample replicates, we found that UQ-pgQ2 normalization combined with an exact test can achieve better performance in terms of power and specificity in differential gene expression analysis. However, using an intra-group analysis of false positives from real and simulated data, we found that a Wald test performs better than an exact test when the number of sample replicates is large, and that a QL F-test performs best given sample sizes of 5, 10 and 15 for any normalization. The RLE, TMM and UQ methods performed similarly given a desired sample size.
CONCLUSION: We found that the UQ-pgQ2 method combined with an exact test/QL F-test is the best choice for controlling false positives when the sample size is small. When the sample size is large, UQ-pgQ2 with a QL F-test is a better choice for type I error control in an intra-group analysis. We observed that read depth has minimal impact on differential gene expression analysis based on the simulated data.
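
As a rough illustration of what library size normalization does, the following minimal Python sketch computes per-sample scaling factors in the spirit of two of the methods named in the abstract: UQ (upper quartile) and RLE (median of ratios). It is a simplified sketch under stated assumptions, not the edgeR or DESeq2 implementation and not the authors' UQ-pgQ2 method. The function names, the toy count matrix, and the choice to rescale the UQ factors to a geometric mean of 1 are assumptions made for illustration only.

```python
import numpy as np


def upper_quartile_factors(counts):
    """Simplified UQ scaling factors (illustrative, not edgeR's exact procedure).

    counts: genes x samples matrix of raw read counts.
    Assumes every sample has a nonzero 75th percentile.
    """
    counts = np.asarray(counts, dtype=float)
    # Ignore genes with zero counts in every sample.
    expressed = counts.sum(axis=1) > 0
    # 75th percentile of the remaining counts, one value per sample.
    uq = np.percentile(counts[expressed], 75, axis=0)
    # Rescale so the factors have a geometric mean of 1 (an assumed convention).
    return uq / np.exp(np.mean(np.log(uq)))


def rle_size_factors(counts):
    """Simplified median-of-ratios (RLE-style) size factors.

    Uses only genes with positive counts in all samples, so that the
    per-gene geometric mean is well defined.
    """
    counts = np.asarray(counts, dtype=float)
    positive = (counts > 0).all(axis=1)
    log_counts = np.log(counts[positive])
    # Log geometric mean of each gene across samples (a pseudo-reference sample).
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Median log ratio to the reference, per sample.
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))


# Toy data (hypothetical): sample 2 was sequenced exactly twice as deeply as sample 1.
toy_counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80], [0, 0]])
uq_factors = upper_quartile_factors(toy_counts)
rle_factors = rle_size_factors(toy_counts)
normalized = toy_counts / rle_factors  # divide each column by its size factor
```

On this toy matrix both sets of factors differ between the two samples by a factor of two, so dividing each column by its factor puts the samples on a common scale before any statistical test is applied.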

Keywords:  Differentially expressed genes; Normalization; RNA-seq; Sample sizes; Statistical test

Year:  2020        PMID: 31992223     DOI: 10.1186/s12864-020-6502-7

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


  13 in total

1.  A zebrafish screen reveals Renin-angiotensin system inhibitors as neuroprotective via mitochondrial restoration in dopamine neurons.

Authors:  Gha-Hyun J Kim; Han Mo; Harrison Liu; Zhihao Wu; Steven Chen; Jiashun Zheng; Xiang Zhao; Daryl Nucum; James Shortland; Longping Peng; Mannuel Elepano; Benjamin Tang; Steven Olson; Nick Paras; Hao Li; Adam R Renslo; Michelle R Arkin; Bo Huang; Bingwei Lu; Marina Sirota; Su Guo
Journal:  Elife       Date:  2021-09-22       Impact factor: 8.140

2.  SAREV: A review on statistical analytics of single-cell RNA sequencing data.

Authors:  Dorothy Ellis; Dongyuan Wu; Susmita Datta
Journal:  Wiley Interdiscip Rev Comput Stat       Date:  2021-05-20

3.  Depth normalization of small RNA sequencing: using data and biology to select a suitable method.

Authors:  Yannick Düren; Johannes Lederer; Li-Xuan Qin
Journal:  Nucleic Acids Res       Date:  2022-06-10       Impact factor: 19.160

4.  Rapid single cell evaluation of human disease and disorder targets using REVEAL: SingleCell™.

Authors:  Namit Kumar; Ryan Golhar; Kriti Sen Sharma; James L Holloway; Srikant Sarangi; Isaac Neuhaus; Alice M Walsh; Zachary W Pitluk
Journal:  BMC Genomics       Date:  2021-01-06       Impact factor: 3.969

5.  The overexpression of DNA repair genes in invasive ductal and lobular breast carcinomas: Insights on individual variations and precision medicine.

Authors:  Ruwaa I Mohamed; Salma A Bargal; Asmaa S Mekawy; Iman El-Shiekh; Nurcan Tuncbag; Alaa S Ahmed; Eman Badr; Menattallah Elserafy
Journal:  PLoS One       Date:  2021-03-04       Impact factor: 3.240

6.  High heterogeneity undermines generalization of differential expression results in RNA-Seq analysis.

Authors:  Weitong Cui; Huaru Xue; Lei Wei; Jinghua Jin; Xuewen Tian; Qinglu Wang
Journal:  Hum Genomics       Date:  2021-01-28       Impact factor: 4.639

7.  Flimma: a federated and privacy-aware tool for differential gene expression analysis.

Authors:  Olga Zolotareva; Reza Nasirigerdeh; Julian Matschinske; Reihaneh Torkzadehmahani; Mohammad Bakhtiari; Tobias Frisch; Julian Späth; David B Blumenthal; Amir Abbasinejad; Paolo Tieri; Georgios Kaissis; Daniel Rückert; Nina K Wenke; Markus List; Jan Baumbach
Journal:  Genome Biol       Date:  2021-12-14       Impact factor: 13.583

8.  The multicellular signalling network of ovarian cancer metastases.

Authors:  Leah Sommerfeld; Florian Finkernagel; Julia M Jansen; Uwe Wagner; Andrea Nist; Thorsten Stiewe; Sabine Müller-Brüsselbach; Anna M Sokol; Johannes Graumann; Silke Reinartz; Rolf Müller
Journal:  Clin Transl Med       Date:  2021-11

9.  Integrated Analysis of miR-430 on Steroidogenesis-Related Gene Expression of Larval Rice Field Eel Monopterus albus.

Authors:  Lihan Zhang; Qiushi Yang; Weitong Xu; Zhaojun Wu; Dapeng Li
Journal:  Int J Mol Sci       Date:  2021-06-29       Impact factor: 5.923

10.  Short-term high fat diet alters genes associated with metabolic and vascular dysfunction during adolescence in rats: a pilot study.

Authors:  Alex E Mohr; Rebecca A Reiss; Monique Beaudet; Johnny Sena; Jay S Naik; Benjimen R Walker; Karen L Sweazea
Journal:  PeerJ       Date:  2021-07-09       Impact factor: 2.984
