| Literature DB >> 32322368 |
Wenan Chen1, Silu Zhang2, Justin Williams3, Bensheng Ju3, Bridget Shaner3, John Easton3, Gang Wu1, Xiang Chen3.
Abstract
Accounting for batch effects, especially latent batch effects, in differential expression (DE) analysis is critical for identifying true biological effects. Single-cell RNA sequencing (scRNA-seq) is a powerful tool for quantifying cell-to-cell variation in transcript abundance and characterizing cellular dynamics. Although many scRNA-seq DE analysis methods accommodate known batch variables, their performance has not been systematically evaluated. Moreover, the challenge of accounting for latent batch variables in scRNA-seq DE analysis is largely unmet. In contrast, many methods have been developed to account for batch variables (either known or latent) in other high-dimensional data, especially bulk RNA-seq. We extensively evaluate 11 methods for batch variables in different scRNA-seq DE analysis scenarios, with a primary focus on latent batch variables. We demonstrate that for known batch variables, incorporating them as covariates into a regression model outperformed approaches using a batch-corrected matrix. For latent batches, fixed effects models have inflated FDRs, whereas aggregation-based methods and mixed effects models have significant power loss. Surrogate variable based methods generally control the FDR well while achieving good power with small group effects. However, their performance (except that of SVA) deteriorated substantially in scenarios involving large group effects and/or group label impurity. In these settings, SVA achieves relatively good performance despite an occasionally inflated FDR (up to 0.2). Finally we make the following recommendations for scRNA-seq DE analysis: 1) incorporate known batch variables instead of using batch-corrected data; and 2) employ SVA for latent batch correction. However, better methods are still needed to fully unleash the power of scRNA-seq.Entities:
Keywords: Aggregation-based methods; Batch effects; Differential expression analysis; Fixed effect model; Latent batch effects; Mixed effect model; Surrogate variable based methods; scRNA-seq
Year: 2020 PMID: 32322368 PMCID: PMC7163294 DOI: 10.1016/j.csbj.2020.03.026
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1Schematic diagram of the evaluation of different methods accounting for the batch effects in scRNA-seq DE analysis. Two different batch effects scenarios were simulated. Eleven major methods with different configurations were compared in terms of FDR, statistical power, F1-score and the area under the curve (AUC). We provide comparison summaries and recommendations.
Different simulation settings.
| Batches | Group effect | Total number of cells | Impurity level | Number of replicates |
|---|---|---|---|---|
| Matched | Small, FC = 1.5 | Small, 600 | 0 | 50 |
| Matched | Small, FC = 1.2 | Large, 12,000 | 0 | 50 |
| Independent | Small, FC = 1.5 | Small, 600 | 0 | 50 |
| Independent | Small, FC = 1.2 | Large, 12,000 | 0 | 50 |
| Matched | Large, FC = 20 | Small, 600 | 0 | 10 |
| Matched | Large, FC = 20 | Large, 12,000 | 0 | 10 |
| Independent | Large, FC = 20 | Small, 600 | 0 | 10 |
| Independent | Large, FC = 20 | Large, 12,000 | 0 | 10 |
| Matched | Large, FC = 25 | Small, 600 | 5% | 10 |
| Matched | Large, FC = 25 | Large, 12,000 | 5% | 10 |
| Splatter | Default setting | Small, 600 | 0 | 50 |
| Splatter | Default setting | Large, 6000 | 0 | 50 |
Evaluated methods, package versions, and parameter configurations.
| Methods | Version | Batch type | Description |
|---|---|---|---|
| scImpute | 0.0.9 | – | Cluster is set to 6 to reflect 6 batches. Impute threshold is default 0.5 |
| batch, batch_scran | edgeR: 3.23.5scran: 1.10.2 | Known | Include the batches directly in the DE analysis using edgeR. “_scran” means scran is used to estimate the size factor, otherwise the total UMI count is used. |
| ComBat | 3.34.0 | Known matched | Use the default parametric adjustments. The input is the log transformed matrix. The f.pvalue from the R package sva is used to calculate the P values based on the corrected matrix. This can be applied only on known matched batches. |
| MNNCorrect | batchelor 1.2.4 | Known matched | Correct all the genes based on the 2000 high variable genes selected using the function modelGeneVar. |
| scMerge | 1.2.0 | Known matched | Unsupervised gene selection is used by choosing the top 2000 stably expressed genes using the function scSEGIndex. kmeansK is set to two clusters per batch. For the supervised version, the group information is used as the “cell type”; this is similar to using the RUV method. |
| zinbwave | 1.8.0 | Latent | zinbwave_normalized fits a default intercept model and then uses the corrected matrix for DE analysis.zinbwave fits a model with the group variable as the covariates and uses the extracted 20 components as surrogate batch variables. We set the zero inflation to false so that only negative binomial distribution is used. |
| CorrConf | 2.1 | Latent | The name has the pattern CorrConf<_k20><_scran><_ns>. “_k20” means setting the number of surrogate variables to 20; otherwise it is automatically estimated by ChooseK. “_scran” means that scran is used to estimate the size factor; otherwise the total UMI count is used. “_ns” means using the original count matrix without summing; otherwise, 20 cells are summed into a “summed cell” to form the new count matrix. |
| cate | 1.0.4 | Latent | Similar method name pattern as for CorrConf. When the number of surrogate variables is not specified, CBCV from CorrConf is used to automatically estimate the number used. |
| dSVA | 1.0 | Latent | Similar method name pattern as for CorrConf. When the number of surrogate variables is not specified, it is automatically estimated. |
| SVA | 3.29.1 | Latent | Similar method name pattern as for CorrConf. When the number of surrogate variables is not specified, it is automatically estimated. |
| pseudo_bulk | edgeR: 3.23.5 | Latent | Aggregate all cell counts within each batch to generate a pseudo bulk sample. Then perform the DE analysis using a quasi-likelihood (QL) based method using edgeR. |
| fixed_effect | edgeR: 3.23.5 | Latent | The batch effects are nested within each group by using the formula in edgeR ~ group + group:batch. We set the contrast to contr.sum and test whether the group effect is 0. The likelihood based test is used. Scran is used to estimate the size factor. |
| mixed model | SAS 9.4 | Latent | The counts are modeled using negative binomial distribution, and the batch effects are modeled using a random Gaussian distribution in SAS. Four different combinations of test options are used: laplace_ChiSq, quad_ChiSq, PL_default_F, and PL_KR_F. laplace_ChiSq is shown as mixed_model in the results. laplace and quad means the approach uses Laplace approximation and adaptive quadrature, respectively, when using the maximum likelihood estimation. PL means pseudo-likelihood estimation, default_F means the default F test, KR_F means the F test with the Kenward and Roger adjustment on the degree of freedom. quad_ChiSq and PL_KR_F failed to finish on several data sets, and we use the rest for FDR and power estimation. |
FDR and relative power of representative methods.
| Methods | Small group effect | Large group effect | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Matched | Independent | Splatter | Matched | Independent | Impure | |||||||
| S | L | S | L | S | L | S | L | S | L | S | L | |
| FDR | ||||||||||||
| batch_scran | 0.041 | 0.042 | 0.042 | 0.043 | 0.064 | 0.054 | 0.047 | 0.073 | 0.039 | 0.044 | 0.045 | 0.073 |
| scImpute_batch_scran | 0.044 | 0.049 | 0.525 | NA | NA | NA | ||||||
| ComBat | 0.078 | NA | NA | 0.071 | 0.062 | NA | NA | |||||
| MNNCorrect | 0.034 | 0.048 | NA | NA | 0.042 | 0.048 | NA | NA | ||||
| scMerge | NA | NA | NA | NA | NA | |||||||
| zinbwave_normalized | 0.046 | 0.040 | ||||||||||
| zinbwave | 0.064 | 0.070 | ||||||||||
| CorrConf_k20_scran | 0.063 | 0.045 | 0.046 | 0.042 | 0.074 | 0.048 | 0.062 | 0.051 | ||||
| cate_k20_scran | 0.061 | 0.068 | 0.058 | 0.049 | 0.054 | 0.057 | ||||||
| dSVA_k20_scran | 0.057 | 0.064 | 0.057 | 0.047 | 0.058 | 0.121 | ||||||
| SVA_k20_scran | 0.044 | 0.051 | 0.042 | 0.075 | 0.069 | 0.044 | 0.049 | 0.039 | 0.048 | |||
| pseudo_bulk | 0.000 | 0.000 | 0.033 | 0.001 | 0.007 | 0.000 | 0.002 | 0.000 | 0.014 | 0.069 | 0.003 | 0.000 |
| fixed_effect | 0.050 | 0.042 | 0.059 | 0.055 | 0.069 | 0.056 | 0.072 | |||||
| mixed_effect | 0.056 | NA | NA | 0.028 | NA | NA | NA | NA | NA | NA | NA | |
| Relative power | ||||||||||||
| batch_scran | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| scImpute_batch_scran | 0.969 | 1.161 | 0.992 | 1.834 | 1.182 | 1.000 | NA | 1.000 | NA | 1.000 | NA | |
| ComBat | 0.964 | 0.940 | NA | NA | 0.996 | 0.968 | 1.000 | 1.000 | NA | NA | 1.000 | 1.000 |
| MNNCorrect | NA | NA | 0.998 | 0.966 | 1.000 | 1.000 | NA | 1.000 | 1.000 | |||
| scMerge | NA | NA | NA | 1.000 | 1.000 | NA | 1.000 | |||||
| zinbwave_normalized | 1.073 | 1.004 | 1.022 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |||
| zinbwave | 1.182 | 0.980 | 1.138 | 1.003 | 1.007 | 1.003 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| CorrConf_k20_scran | 1.012 | 0.950 | 0.967 | 0.987 | 0.973 | 0.981 | 0.948 | 0.987 | 1.000 | |||
| cate_k20_scran | 1.079 | 0.920 | 1.046 | 0.997 | 1.004 | 0.997 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | |
| dSVA_k20_scran | 1.085 | 0.995 | 1.045 | 1.001 | 0.996 | 0.996 | 1.000 | 0.999 | 0.959 | 1.000 | ||
| SVA_k20_scran | 0.956 | 0.998 | 0.906 | 0.972 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | ||
| pseudo_bulk | 1.000 | 1.000 | 0.998 | 0.998 | 1.000 | 1.000 | ||||||
| fixed_effect | 0.973 | 1.010 | 1.108 | 0.995 | 1.025 | 1.001 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| mixed_effect | NA | 0.956 | NA | NA | NA | NA | NA | NA | NA | NA | ||
S: small number of cells; L: large number of cells; NA: not computed.
For FDR, regular font indicates FDR ≤ 0.08, underlined font indicates 0.08 < FDR ≤ 0.2, bold font indicates FDR > 0.2
For power, regular font indicates relative power ≥0.9, underlined font indicates 0.8 ≤ relative power <0.9, bold font indicates relative power <0.8.
Fig. 2FDR and power from the simulation of the large sample size and small group effects. a) FDR of matched batches; b) FDR of independent batches; c) power of matched batches; d) power of independent batches; e) F1-score of matched batches; f) F1-score of independent batches; g) AUC of the Precision-Recall curve of matched batches; h) AUC of the Precision-Recall curve of independent batches. The FDR, power, F1-score, and AUC of each method is plotted as a boxplot based on replications. For the FDR, the redline is the nominal threshold of 0.05. A large deviation from this line indicates either inflation or deflation of the FDR.
Fig. 3FDR and power from the simulation of the large sample size and large group effects. a) FDR of matched batches; b) FDR of independent batches; c) power of matched batches; d) power of independent batches; e) F1-score of matched batches; f) F1-score of independent batches; g) AUC of the Precision-Recall curve of matched batches; h) AUC of the Precision-Recall curve of independent batches. The FDR, power, F1-score, and AUC of each method is plotted as a boxplot based on replications. For the FDR, the redline is the nominal threshold of 0.05. A large deviation from this line indicates either inflation or deflation of the FDR.
Fig. 4FDR (a), power (b), F1-score (c) and AUC of the Precision-Recall curve (d) from the simulation of the large sample size and impure group labels with matched batches. The FDR, power, F1-score, and AUC of each method is plotted as a boxplot based on replications. For the FDR, the redline is the nominal threshold of 0.05. A large deviation from this line indicates either inflation or deflation of the FDR.
Summary of evaluated methods.
| Methods | Advantage | Limitation | Recommend application |
|---|---|---|---|
| ComBat, MNNCorrect, scMerge | Good for combining data sets from different sources for visualization and clustering | It is suboptimal to use the batch corrected matrix for DE analysis | Clustering, visualization of data from different sources/batches |
| zinbwave | Useful for modeling non-UMI based scRNA-seq | Large inflated FDR or reduced power in DE analysis with latent batches | DE analysis for non-UMI based scRNA-seq with no need for latent batch correction |
| CorrConf | Good control of FDR and high power when the group effects are small | Inflated FDR or reduced power when the group effects are large or the group is impure | DE analysis for moderate effects or when the group information is highly accurate. Can be used together with SVA for a robust check |
| cate | Good or slightly inflated FDR and high power when the group effects are small | Inflated FDR or reduced power when the group effects are large or the group is impure | DE analysis when the group information is highly accurate. Can be used together with SVA for a robust check |
| dSVA | Good or slightly inflated FDR and high power when the group effects are small | Inflated FDR or reduced power when the group effects are large or the group is impure | DE analysis for moderate effects or the group information is highly accurate. Can be used together with SVA for a robust check |
| SVA | Good control of FDR and high power when the group effects are small; it is also little affected by the group label purity | Occasionally not very stable | Good candidate for DE analysis. Can be used together with cate/CorrConf /dSVA for a robust check |
| pseudo_bulk | Superfast, easy to apply | Low power | Good for identifying strong DE genes |
| fixed_effect | fast | Need to assume that the average batch effects are similar between groups | DE analysis when we are sure the average batch effects per group are similar, as in a paired/blocked design |
| mixed_effect | Can have higher power than pseudo bulk | Very slow for a large number of cells, and the power is low | When the cell number per batch is small (e.g., 〈1 0 0) and the number of batches is large (e.g., ≥5) and a mixed model is strongly preferred because of other modeling aspects. |
Comparison on real data with two batches as discovery and one batch as validation. TPM ≥ 1 is applied to the single-cell results.
| Methods | DECount | TPCount | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| batch_scran | 10,090 | 7711 | 0.764 | 0.788 | 0.776 |
| pseudo_bulk | 403 | 378 | 0.938 | 0.039 | 0.074 |
| CorrConf_scran | 3260 | 2320 | 0.712 | 0.237 | 0.356 |
| cate_scran | 7843 | 5604 | 0.715 | 0.573 | 0.636 |
| dSVA_scran | 3139 | 2430 | 0.774 | 0.248 | 0.376 |
| SVA_scran | 9904 | 7432 | 0.750 | 0.759 | 0.755 |
# of DE genes in validation set: 9788.
Fig. 5UpSet plot showing the intersections of DE genes among different methods. In each UpSet plot, the bar height in the top panel indicates the size of a specific intersection. The bubbles below each bar with non-gray color indicate which sets are in the intersection. A line is drawn to connect those non-gray bubbles when there are at least two different sets in the intersection. The columns of bars and bubbles are sorted by the number of sets in the intersection. a) UpSet plot showing the number of DE genes for each method and their intersections when using the third single-cell RNA-seq data set used as the validation data set; b) UpSet plot showing the number of DE genes for each method and their intersections when using the bulk RNA-seq data used as the validation data set.
Comparison on real data with three batches, using bulk RNA-seq as the ground truth. TPM ≥ 1 is applied to single-cell results and FPKM ≥ 1 is applied to the bulk RNA-seq results, with FDR cutoffs of 0.05 and 0.1.
| Methods | DECount | TPCount | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| FDR in bulk < 0.05 (#DE genes in bulk: 3322) | |||||
| batch_scran | 10,606 | 2958 | 0.279 | 0.890 | 0.425 |
| pseudo_bulk | 1324 | 1042 | 0.787 | 0.314 | 0.449 |
| CorrConf_scran | 3079 | 1115 | 0.362 | 0.336 | 0.348 |
| cate_scran | 7056 | 2639 | 0.374 | 0.794 | 0.502 |
| dSVA_scran | 3130 | 1419 | 0.453 | 0.427 | 0.440 |
| SVA_scran | 10,344 | 2970 | 0.287 | 0.894 | 0.435 |
| FDR in bulk < 0.1 (#DE genes in bulk: 4475) | |||||
| batch_scran | 10,606 | 3899 | 0.368 | 0.871 | 0.517 |
| pseudo_bulk | 1324 | 1093 | 0.826 | 0.244 | 0.377 |
| CorrConf_scran | 3079 | 1361 | 0.442 | 0.304 | 0.360 |
| cate_scran | 7056 | 3299 | 0.468 | 0.737 | 0.572 |
| dSVA_scran | 3130 | 1711 | 0.547 | 0.382 | 0.450 |
| SVA_scran | 10,344 | 3928 | 0.380 | 0.878 | 0.530 |