
Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster.

Yanzhu Lin, Kseniya Golovnina, Zhen-Xia Chen, Hang Noh Lee, Yazmin L Serrano Negron, Hina Sultana, Brian Oliver, Susan T Harbison.

Abstract

BACKGROUND: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set.
RESULTS: We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, the underlying data distribution assumption, and the number of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different from those obtained under the negative binomial distribution assumption. In addition, at least three biological replicates per condition were required for sufficient statistical power to detect expression differences attributable to the three-way interaction of genotype, environment, and sex.
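The abstract does not say why Total Count normalization failed to align samples. One commonly cited failure mode, offered here only as an illustration and not as the authors' diagnosis, is that a single highly expressed gene can dominate a sample's total. The sketch below uses invented toy counts and a hypothetical helper, `total_count_normalize`, to show the effect:

```python
def total_count_normalize(counts):
    """Scale each sample by its library size relative to the mean library size.

    counts: list of genes, each a list of raw read counts per sample.
    This is a generic Total Count scaling, not the paper's exact procedure.
    """
    n_samples = len(counts[0])
    totals = [sum(row[j] for row in counts) for j in range(n_samples)]
    mean_total = sum(totals) / n_samples
    factors = [t / mean_total for t in totals]
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Invented toy data: genes 0-2 are identical in both samples;
# only the last gene differs, and it dominates sample 2's total.
toy_counts = [
    [100, 100],
    [200, 200],
    [300, 300],
    [400, 4400],
]

tc_normalized = total_count_normalize(toy_counts)

# Fold difference that Total Count scaling introduces for an unchanged gene.
apparent_fold = tc_normalized[0][0] / tc_normalized[0][1]
```

In this toy case the three unchanged genes appear to differ between samples after scaling, because the dominant gene inflates the second sample's library size.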
CONCLUSIONS: The best analysis approach for our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution, using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting them to the negative binomial distribution. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data.
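For readers unfamiliar with the DESeq normalization named above, the following is a minimal sketch of the median-of-ratios idea behind its size factors, written in plain Python. It is a simplified illustration under stated assumptions, not the DESeq implementation: the function names `size_factors` and `normalize` and the toy counts are hypothetical, and genes with a zero count in any sample are simply skipped when estimating the factors.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors by the median-of-ratios approach.

    counts: list of genes, each a list of raw read counts per sample.
    """
    n_samples = len(counts[0])
    # Use only genes with nonzero counts in every sample so logs are defined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no gene has nonzero counts in every sample")
    # Log geometric mean of each usable gene across samples (pseudo-reference).
    log_ref = [sum(math.log(c) for c in row) / n_samples for row in usable]
    # Size factor of sample j: median over genes of count / pseudo-reference.
    return [
        math.exp(median(math.log(row[j]) - ref for row, ref in zip(usable, log_ref)))
        for j in range(n_samples)
    ]


def normalize(counts):
    """Divide each raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Invented toy data: sample 2 is sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # skipped for factor estimation because of the zero
]

toy_factors = size_factors(toy_counts)
toy_normalized = normalize(toy_counts)
```

Because the factor is a median over genes, a handful of extreme genes has little influence on it, which is one reason this style of normalization is often preferred over scaling by total counts.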

Year: 2016; PMID: 26732976; PMCID: PMC4702322; DOI: 10.1186/s12864-015-2353-z

Source DB: PubMed; Journal: BMC Genomics; ISSN: 1471-2164; Impact factor: 3.969
