Tristan Zindler, Helge Frieling, Alexandra Neyazi, Stefan Bleich, Eva Friedel.
Abstract
BACKGROUND: Systematic technical effects, also called batch effects, are a considerable challenge when analyzing DNA methylation (DNAm) microarray data, because they can lead to false results when confounded with the variable of interest. Methods to correct these batch effects are error-prone, as previous findings have shown.
Keywords: 450 K array; Batch effects; ComBat; DNA methylation; EPIC array; Illumina; Simulation
Year: 2020 PMID: 32605541 PMCID: PMC7328269 DOI: 10.1186/s12859-020-03559-6
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 a Different case and control sample distributions of n = 48 on six chips (red, case; dark grey, control). b Systematic increase of the number of factor levels for one corrected factor. The vertical red line shows the factor level with the first false FDR-significant result. c Mean p-value decrease for different sample sizes in combination with a systematic increase of the number of factor levels. d Number of FDR-significant CpG sites with a systematic increase of factor levels for the different Illumina arrays
Fig. 2 “mod” refers to a model matrix for the outcome of interest. a Boxplots of p-value distributions without added batch effects and under various conditions. The dotted line indicates the expected mean p-value of 0.5. b Boxplots of p-value distributions with simulated batch effects and under various conditions. c Boxplots of false FDR-significant sites without batch effects. d Boxplots of false FDR-significant sites with simulated batch effects. Boxplots for “without ComBat” indicate the false positive sites due to uncorrected batch effects for the respective sample distributions
ComBat (ChAMP) applied to increasing sample sizes
| batch factors | balanced: mean p | balanced: λ+ | balanced: FDR* | balanced: BF** | random: mean p | random: λ+ | random: FDR* | random: BF** | unbalanced: mean p | unbalanced: λ+ | unbalanced: FDR* | unbalanced: BF** |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8 rows + 6 chips | 0.4131827 | 1.720126 | 3540 | 15 | 0.3735538 | 1.990719 | 27,057 | 132 | 0.2872 | 3.880235 | 156,084 | 2803 |
| 8 rows + 12 chips | 0.4395739 | 1.465350 | 265 | 4 | 0.4254060 | 1.538365 | 1412 | 19 | 0.2609 | 4.999723 | 214,009 | 11,048 |
| 8 rows + 18 chips | 0.4487983 | 1.391675 | 84 | 2 | 0.4308005 | 1.453891 | 1218 | 8 | 0.2335 | 6.362690 | 273,773 | 25,522 |
| 8 rows + 24 chips | 0.4533012 | 1.355385 | 47 | 6 | 0.4438228 | 1.428494 | 218 | 2 | 0.2133315 | 7.746447 | 318,618 | 42,427 |
| 8 rows + 30 chips | 0.4565824 | 1.334509 | 27 | 1 | 0.4424769 | 1.419485 | 226 | 4 | 0.1978271 | 9.738140 | 352,805 | 59,949 |
| 8 rows + 36 chips | 0.4588096 | 1.320805 | 9 | 2 | 0.4443718 | 1.406706 | 181 | 4 | 0.1854536 | 11.319961 | 379,740 | 77,903 |
| 8 rows + 42 chips | 0.4580821 | 1.310937 | 11 | 2 | 0.4453380 | 1.388399 | 144 | 4 | 0.1733198 | 13.128076 | 406,856 | 97,847 |
| 8 rows + 48 chips | 0.4589804 | 1.303460 | 28 | 1 | 0.4512571 | 1.370874 | 41 | 3 | 0.1574975 | 15.871418 | 440,828 | 127,703 |
| 8 rows + 54 chips | 0.4609161 | 1.296584 | 2 | 0 | 0.4532645 | 1.360342 | 84 | 4 | 0.1565864 | 16.385741 | 442,872 | 130,896 |
| 8 rows + 60 chips | 0.4629813 | 1.290615 | 0 | 0 | 0.4545829 | 1.352150 | 77 | 3 | 0.1495949 | 18.068155 | 457,389 | 146,684 |
| 8 rows + 66 chips | 0.4640470 | 1.286172 | 4 | 1 | 0.4541513 | 1.347141 | 89 | 4 | 0.1444482 | 19.435062 | 469,246 | 159,929 |
| 8 rows + 72 chips | 0.4643993 | 1.282560 | 5 | 0 | 0.4562415 | 1.346690 | 67 | 3 | 0.1392911 | 21.116253 | 480,228 | 174,133 |
| 8 rows + 78 chips | 0.4647209 | 1.279449 | 1 | 1 | 0.4546367 | 1.351416 | 97 | 3 | 0.1319790 | 22.583813 | 495,269 | 193,145 |
| 8 rows + 84 chips | 0.4646304 | 1.277031 | 10 | 0 | 0.4550306 | 1.344902 | 99 | 1 | 0.1285275 | 24.769395 | 501,777 | 201,233 |
| 8 rows + 90 chips | 0.4629008 | 1.275070 | 5 | 1 | 0.4552286 | 1.342806 | 94 | 3 | 0.1250251 | 26.836670 | 509,759 | 213,140 |
| 8 rows + 96 chips | 0.4631314 | 1.273107 | 0 | 0 | 0.4545927 | 1.341437 | 118 | 3 | 0.1209156 | 28.445529 | 517,941 | 224,289 |
+ Genomic Inflation Factor λ
* Significant CpG sites with False Discovery Rate 5%
** Significant CpG sites with Bonferroni correction 5%
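The three summary statistics in the table footnotes can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes that λ is the usual ratio of the median observed 1-df chi-square statistic to its expected median (about 0.4549), and that the FDR counts come from the Benjamini-Hochberg procedure, which the record does not state explicitly. The function names `genomic_inflation`, `count_bonferroni` and `count_fdr_bh` are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(pvalues):
    """Genomic inflation factor: median observed chi-square / expected median."""
    nd = NormalDist()
    chi2 = []
    for p in pvalues:
        # Clamp to avoid inv_cdf(0); a two-sided p-value maps to z = inv_cdf(p / 2).
        p = min(max(p, 1e-300), 1.0)
        chi2.append(nd.inv_cdf(p / 2.0) ** 2)
    return median(chi2) / CHI2_1DF_MEDIAN


def count_bonferroni(pvalues, alpha=0.05):
    """Number of tests with p <= alpha / m (Bonferroni correction)."""
    m = len(pvalues)
    return sum(p <= alpha / m for p in pvalues)


def count_fdr_bh(pvalues, alpha=0.05):
    """Number of rejections under Benjamini-Hochberg (assumed FDR method)."""
    m = len(pvalues)
    k_max = 0
    for k, p in enumerate(sorted(pvalues), start=1):
        # Keep the largest rank k whose p-value is below its step-up threshold.
        if p <= alpha * k / m:
            k_max = k
    return k_max
```

For p-values that are uniformly distributed, as expected when no true effect and no batch artifact is present, `genomic_inflation` returns a value close to 1 and both counts stay near zero. The unbalanced columns of the table show the opposite pattern: λ far above 1 together with large FDR and Bonferroni counts.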
Fig. 3 Detection of added significant effects for different use cases of ComBat. The X axes show the uncorrected p-value of the simulated effects before adding the batch effects. The Y axes show the percentage of CpG sites detected at FDR p < 0.05 after adding the batch effects for the different forms of batch effect correction
Fig. 4 Heatmap of FDR-significant CpG sites after ComBat application. The dotted line indicates the number of corrected factors when “correcting” for a technical batch. Grey tiles were not assessed due to the sample–factor ratio. The redder the cell, the more false significant CpG sites were found after applying ComBat