| Literature DB >> 35643435 |
Elizabeth A Wynn1, Brian E Vestal2, Tasha E Fingerlin2, Camille M Moore3.
Abstract
BACKGROUND: As the cost of RNA-sequencing decreases, complex study designs, including paired, longitudinal, and other correlated designs, become increasingly feasible. These studies often include multiple hypotheses and thus multiple degree of freedom tests, or tests that evaluate multiple hypotheses jointly, are often useful for filtering the gene list to a set of interesting features for further exploration while controlling the false discovery rate. Though there are several methods which have been proposed for analyzing correlated RNA-sequencing data, there has been little research evaluating and comparing the performance of multiple degree of freedom tests across methods.Entities:
Keywords: Correlated data; Multiple DF testing; RNA-seq
Mesh:
Substances:
Year: 2022 PMID: 35643435 PMCID: PMC9148455 DOI: 10.1186/s12874-022-01615-8
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.612
Analysis methods with their associated R packages and details concerning their implementation
| Method | R-Package(s) | Multiple DF Test | Library Adjustment Offset | Details |
|---|---|---|---|---|
| DESeq2 | DESeq2 [ | LRT | DESeq2 size factors | Default settings used, correlation ignored. |
| DESeq2 ∗ | DESeq2 [ | LRT | DESeq2 size factors | Default settings used, subjects treated as fixed effects to account for correlation. |
| edgeR | edgeR [ | LRT | TMM offset | Default settings used, correlation ignored. |
| edgeR ∗ | edgeR [ | LRT | TMM offset | Default settings used, subjects treated as fixed effects to account for correlation. |
| limma | limma [ | Moderated F-test | NA | Count data transformed using the voom function. The duplicateCorrelation function was used with subject id as a blocking factor to account for correlation. |
| GEE | Custom R Functions, geepack [ | Wald | DESeq2 size factors | Models fit using exchangeable working correlation structure. For small sample estimators, custom functions were created by modifying code from the geesmv package [ |
| LMM | lmerTest [ | F-test | NA | Data transformed using the variance stabilizing transformation from the DESeq2 package. |
| NBMM-AGQ | GLMMadaptive [ | LRT | DESeq2 size factors | Model fit using the mixed_model function with a negative binomial distribution. Default settings used for all other parameters. |
| NBMM-LP | glmmADMB [ | LRT | DESeq2 size factors | Model fit using the glmmadmb function with a negative binomial distribution. Default settings used for all other parameters. |
| NBMM-PL | Custom R Function | LRT | DESeq2 size factors | Custom function was created drawing from the glmm.nb function in the NBZIMM package [ |
| rmRNAseq | rmRNAseq [ | Moderated F-statistic with bootstrapped | NA | Model fit using the TC_CAR1 function with the default parameters. |
Summary of simulated datasets
| 10 | |
| ∼ 15,000 | |
| 3, 5, and 10 per group | |
| 4 | |
| 0 (all genes) | |
| 0 (all genes) | |
| 0 (80% of genes), | |
| Drawn from an empirical distribution based on human samples in real RNA-seq data sets with repeated measures [ | |
| Between-subject | Are there differences in expression between the treatment and control at any of the time points? |
| Within-subject | Is there a change in gene expression between any timepoints for the treatment group? |
| Interaction | Are there any significant interaction effects? |
| Global | Are there any significant model coefficients? |
Fig. 1Percentage of non-converged models from selected methods. Methods in which less than 1% of models failed to converge are not included in the figure. For NBMM-LP, which uses a likelihood ratio test, the solid portion of the bar represents the proportion of models in which the full model did not converge and the transparent portion represents genes for which the reduced model for one or more tests failed to converge in which case results for those tests could not be obtained
Fig. 2FDR versus power across different sample sizes for four tests of interest. FDR and power were calculated using a 0.05 FDR significance level and were averaged across 10 simulations for each method and sample size. Points that lie to the left of the dashed vertical line represent methods that have an observed FDR less than the nominal rate of 5%, while points to the right represent methods with FDR inflation. Points located in the bottom left-hand corner with an FDR and power of 0 represent instances in which no genes were found significant. A log scale is used on the x-axis to better differentiate between methods with close to nominal FDR
Non-convergence rate, analysis run time, and number of DEGs for 4 hypothesis tests in the shock dataset. The run time for fitting the full model for each gene, as well as the total time to fit models and perform hypothesis testing is displayed. There were 14,340 genes in the dataset and genes were labelled as a DEG if the Benjamini Hochberg adjusted p-value was < 0.05. For NBMM-LP, the percentage of genes in which one or more reduced models failed to converge is shown in parentheses after the full model non-convergence rate
| DESeq2 | DESeq2* | edgeR | edgeR* | GEE | limma | LMM | NBMM-AGQ | NBMM-LP | NBMM-PL | rmRNAseq | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| % Non-converged | 0% | 0% | 0% | 0% | 0.04% | 0% | 0.31% | 0% | 4.33% (9.07%) | 0.45% | 0% |
| Model Fit Time | 1.14 | 6.44 | 0.61 | 5.8 | 11.01 | 2.16 | 8.91 | 24.83 | 326.74 | 62.72 | - |
| Total Time (Model Fit & Hypothesis Testing) | 6.05 | 28.49 | 0.72 | 6.76 | 18.19 | 2.2 | 11.76 | 104.78 | 1450.86 | 65.6 | 415.12 |
| SS vs. CS (Any timepoint) | 5,323 | - | 5,179 | - | 4,034 | 3,801 | 3,009 | 3,303 | 3,633 | 3,351 | 3,461 |
| Change over time - SS group | 5,252 | 8,331 | 5,155 | 8,319 | 7,910 | 9,796 | 8,048 | 8,600 | 7,772 | 8,281 | 8,081 |
| Change over time - CS group | 128 | 1,381 | 123 | 1,313 | 872 | 638 | 1,003 | 1,836 | 1,280 | 1,229 | 168 |
| Any Significant Interaction Effect | 369 | 1,842 | 358 | 1,783 | 1,416 | 1,962 | 1,502 | 2,356 | 1,801 | 1,769 | 1,045 |
Non-convergence rate, analysis run time, and number of DEGs for 4 hypothesis tests in the reduced shock dataset in which ten subjects from each group were randomly selected. The run time for fitting the full model for each gene, as well as the total time to fit models and perform hypothesis testing is displayed. There were 14,340 genes in the dataset and genes were labelled as a DEG if the Benjamini Hochberg adjusted p-value was < 0.05. For NBMM-LP, the percentage of genes in which one or more reduced models failed to converge is shown in parentheses after the full model non-convergence rate
| DESeq2 | DESeq2* | edgeR | edgeR* | GEE | limma | LMM | NBMM-AGQ | NBMM-LP | NBMM-PL | rmRNAseq | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| % Non-converged | 0% | 0% | 0% | 0% | 0.01% | 0% | 1.12% | 0.03% | 4.35% (9.07%) | 1.28% | 0% |
| Model Fit Time | 0.88 | 2.16 | 0.27 | 1.23 | 8.91 | 1.47 | 8.62 | 20.54 | 226.53 | 70.43 | - |
| Total Time (Model Fit & Hypothesis Testing) | 4.62 | 9.71 | 0.32 | 1.41 | 15.74 | 1.51 | 11.51 | 80.02 | 1067.07 | 73.26 | 327.84 |
| SS vs. CS (Any timepoint) | 4,201 | - | 3,337 | - | 3,108 | 1,743 | 1,387 | 2,422 | 2,290 | 1,696 | 1,355 |
| Change over time - SS group | 1,684 | 4,466 | 1,397 | 3,960 | 4,875 | 5,894 | 3,925 | 5,124 | 4,390 | 4,323 | 2,994 |
| Change over time - CS group | 115 | 756 | 30 | 592 | 767 | 187 | 237 | 1,291 | 732 | 480 | 0 |
| Any Significant Interaction Effect | 166 | 719 | 117 | 701 | 773 | 740 | 398 | 1,291 | 748 | 549 | 129 |
Fig. 3Heatmap of predicted gene expression (row scaled) across the three study timepoints for genes that were significant in a test for differential expression between any two timepoints in the CS group. Predicted values and significance results came from the LMM analysis. Genes are clustered using a correlation distance metric and complete linkage clustering methods and are split into seven clusters indicated by the color bars along the rows
Functional enrichment analysis results. The 25 GO terms with the smallest Benjamini Hochberg (BH) adjusted p-values were selected for each cluster. The lists were then reduced to include only the most specific subclass for each ontology. All GO terms had a BH adjusted p-value < 0.01
| GO Term | Description | # Genes in Set | # Genes in Cluster | # Expected | Fold Enrichment |
|---|---|---|---|---|---|
| GO:0002523 | leukocyte migration involved in inflammatory response | 12 | 6 | 0.30 | 20.00 |
| GO:0050729 | positive regulation of inflammatory response | 86 | 13 | 2.15 | 6.05 |
| GO:0051092 | positive regulation of NF-kappaB transcription factor activity | 136 | 16 | 3.40 | 4.71 |
| GO:1990266 | neutrophil migration | 79 | 12 | 1.98 | 6.06 |
| GO:0002755 | MyD88-dependent toll-like receptor signaling pathway | 33 | 7 | 0.83 | 8.43 |
| GO:0045623 | negative regulation of T-helper cell differentiation | 14 | 5 | 0.35 | 14.29 |
| GO:0060142 | regulation of syncytium formation by plasma membrane fusion | 14 | 5 | 0.35 | 14.29 |
| GO:0071260 | cellular response to mechanical stimulus | 61 | 9 | 1.53 | 5.88 |
| GO:0060396 | growth hormone receptor signaling pathway | 16 | 5 | 0.40 | 12.50 |
| GO:0032651 | regulation of interleukin-1 beta production | 68 | 9 | 1.70 | 5.29 |
| GO:0032695 | negative regulation of interleukin-12 production | 17 | 5 | 0.43 | 11.63 |
| GO:0071354 | cellular response to interleukin-6 | 28 | 6 | 0.70 | 8.57 |
| GO:0006958 | complement activation, classical pathway | 83 | 40 | 2.11 | 18.96 |
| GO:0030449 | regulation of complement activation | 71 | 29 | 1.80 | 16.11 |
| GO:0002377 | immunoglobulin production | 128 | 34 | 3.25 | 10.46 |
| GO:0038096 | Fc-gamma receptor signaling pathway involved in phagocytosis | 117 | 29 | 2.97 | 9.76 |
| GO:0006910 | phagocytosis, recognition | 54 | 21 | 1.37 | 15.33 |
| GO:0050871 | positive regulation of B cell activation | 111 | 25 | 2.82 | 8.87 |
| GO:0006910 | phagocytosis, recognition | 54 | 9 | 0.22 | 40.91 |
| GO:0006958 | complement activation, classical pathway | 83 | 9 | 0.34 | 26.47 |
| GO:0006911 | phagocytosis, engulfment | 87 | 9 | 0.35 | 25.71 |
| GO:0016584 | nucleosome positioning | 11 | 4 | 0.04 | 100.00 |
| GO:0030261 | chromosome condensation | 35 | 5 | 0.14 | 35.71 |
| GO:0045910 | negative regulation of DNA recombination | 37 | 4 | 0.15 | 26.67 |