Xuan Li, Yuejiao Fu, Xiaogang Wang, Weiliang Qiu.
Abstract
BACKGROUND: Differential variability has recently been shown to be valuable in evaluating the association of DNA methylation with the risk of complex human diseases. Statistical tests based on both differential methylation level and differential variability can be more powerful than those based only on differential methylation level. Ahn and Wang (2013) proposed a joint score test (AW) to simultaneously detect differential methylation and differential variability. However, the AW method seems to be quite conservative and has not been fully compared with existing joint tests.
Keywords: Joint score tests; Methylation data; Variability
Year: 2018 PMID: 29776330 PMCID: PMC5960098 DOI: 10.1186/s12859-018-2185-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The empirical Type I error rates (× 100) and power (× 100) for the six tests when methylation values were generated from normal distributions without an outlier (Outlier=No) or with an outlier (Outlier=Yes). The numbers of non-diseased and diseased samples are (100, 100)
| Scenarios | Outlier | α (× 100) | jointLRT | KS | AW | iAW.Lev | iAW.BF | iAW.TM |
|---|---|---|---|---|---|---|---|---|
| eqM&eqV (Type I error) | No | 5 | 5.1 | 3.4 | 5.1 | 5.1 | 5.0 | 5.1 |
| | No | 1 | 1.0 | 0.5 | 1.1 | 1.0 | 1.0 | 1.1 |
| | No | 0.5 | 0.5 | 0.4 | 0.6 | 0.6 | 0.5 | 0.6 |
| diffM&eqV | No | 5 | 97.3 | 95.5 | 97.1 | 97.1 | 97.2 | 97.2 |
| | No | 1 | 90.2 | 84.9 | 89.4 | 89.8 | 90.0 | 89.7 |
| | No | 0.5 | 85.3 | 75.0 | 84.3 | 83.1 | 83.8 | 83.6 |
| eqM&diffV | No | 5 | 90.0 | 25.1 | 87.3 | 84.1 | 83.8 | 83.8 |
| | No | 1 | 74.3 | 6.1 | 65.7 | 63.5 | 62.9 | 62.6 |
| | No | 0.5 | 66.3 | 2.4 | 55.2 | 51.6 | 52.0 | 52.5 |
| diffM&diffV | No | 5 | 83.2 | 63.9 | 81.0 | 79.3 | 79.2 | 79.3 |
| | No | 1 | 63.7 | 36.8 | 59.9 | 56.9 | 56.8 | 56.3 |
| | No | 0.5 | 53.9 | 24.5 | 48.8 | 45.5 | 46.3 | 46.2 |
| eqM&eqV (Type I error) | Yes | 5 | 12.2 | 3.2 | 3.7 | 4.8 | 4.8 | 4.8 |
| | Yes | 1 | 3.7 | 0.5 | 0.5 | 0.9 | 0.9 | 1.0 |
| | Yes | 0.5 | 2.3 | 0.4 | 0.3 | 0.4 | 0.4 | 0.4 |
| diffM&eqV | Yes | 5 | 95.6 | 94.9 | 98.4 | 98.1 | 98.1 | 98.1 |
| | Yes | 1 | 83.0 | 86.6 | 94.5 | 92.3 | 92.7 | 92.4 |
| | Yes | 0.5 | 77.5 | 76.9 | 91.0 | 89.4 | 90.0 | 89.3 |
| eqM&diffV | Yes | 5 | 46.3 | 16.7 | 54.3 | 69.3 | 68.8 | 68.9 |
| | Yes | 1 | 20.3 | 5.3 | 31.5 | 43.0 | 43.5 | 43.2 |
| | Yes | 0.5 | 15.1 | 2.2 | 22.8 | 36.0 | 36.8 | 36.1 |
| diffM&diffV | Yes | 5 | 54.6 | 58.4 | 75.5 | 78.4 | 78.5 | 78.7 |
| | Yes | 1 | 26.5 | 38.2 | 56.9 | 56.1 | 57.2 | 57.0 |
| | Yes | 0.5 | 20.3 | 25.8 | 47.5 | 48.4 | 50.4 | 49.1 |
The empirical Type I error rates (× 100) and power (× 100) of the six tests when methylation values were generated from Beta distributions. The numbers of non-diseased and diseased samples are (100, 100)
| Scenarios | Outlier | α (× 100) | jointLRT | KS | AW | iAW.Lev | iAW.BF | iAW.TM |
|---|---|---|---|---|---|---|---|---|
| eqM&eqV (Type I error) | No | 5 | 5.7 | 3.5 | 5.4 | 5.4 | 5.4 | 5.5 |
| | No | 1 | 1.5 | 0.5 | 1.0 | 1.1 | 1.2 | 1.1 |
| | No | 0.5 | 0.8 | 0.3 | 0.5 | 0.5 | 0.5 | 0.5 |
| diffM&eqV | No | 5 | 96.8 | 94.7 | 97.5 | 97.2 | 97.4 | 97.4 |
| | No | 1 | 88.4 | 86.7 | 91.7 | 90.5 | 91.0 | 90.9 |
| | No | 0.5 | 83.1 | 77.5 | 87.8 | 86.6 | 87.9 | 87.4 |
| eqM&diffV | No | 5 | 88.1 | 18.9 | 86.8 | 83.1 | 82.7 | 83.0 |
| | No | 1 | 68.6 | 6.2 | 65.8 | 62.2 | 60.6 | 61.3 |
| | No | 0.5 | 60.4 | 2.5 | 55.6 | 53.5 | 52.6 | 53.2 |
| diffM&diffV | No | 5 | 83.5 | 64.5 | 88.6 | 84.9 | 85.8 | 85.9 |
| | No | 1 | 58.1 | 42.8 | 70.4 | 63.0 | 64.5 | 64.8 |
| | No | 0.5 | 48.9 | 30.2 | 60.6 | 54.6 | 57.6 | 56.8 |
| eqM&eqV (Type I error) | Yes | 5 | 11.0 | 3.6 | 3.8 | 5 | 4.9 | 4.9 |
| | Yes | 1 | 3.3 | 0.6 | 0.7 | 1.0 | 1.0 | 1.0 |
| | Yes | 0.5 | 1.8 | 0.3 | 0.3 | 0.5 | 0.5 | 0.5 |
| diffM&eqV | Yes | 5 | 97.6 | 95.9 | 98.8 | 98.6 | 98.8 | 98.7 |
| | Yes | 1 | 89.2 | 87.7 | 94.8 | 93.4 | 94.0 | 93.7 |
| | Yes | 0.5 | 82.6 | 79.6 | 91.6 | 89.3 | 89.9 | 89.8 |
| eqM&diffV | Yes | 5 | 31.9 | 15.7 | 24.9 | 61.2 | 59.8 | 60.6 |
| | Yes | 1 | 11.5 | 5.1 | 6.7 | 33.0 | 31.3 | 32.1 |
| | Yes | 0.5 | 6.6 | 2.0 | 4.0 | 23.2 | 21.3 | 22.0 |
| diffM&diffV | Yes | 5 | 26.4 | 59.9 | 36.6 | 52.6 | 53.4 | 53.6 |
| | Yes | 1 | 8.4 | 38.3 | 15.4 | 24.9 | 25.7 | 25.5 |
| | Yes | 0.5 | 4.5 | 26.0 | 10.6 | 16.5 | 17.2 | 17.2 |
The empirical Type I error rates (× 100) and power (× 100) for the six tests when methylation values were generated from chi-squared distributions. The numbers of non-diseased and diseased samples are (100, 100)
| Scenarios | Outlier | α (× 100) | jointLRT | KS | AW | iAW.Lev | iAW.BF | iAW.TM |
|---|---|---|---|---|---|---|---|---|
| eqM&eqV (Type I error) | No | 5 | 13.8 | 4.2 | 5.0 | 6.3 | 5.3 | 5.2 |
| | No | 1 | 6.3 | 0.7 | 0.9 | 1.5 | 1.3 | 1.2 |
| | No | 0.5 | 4.4 | 0.4 | 0.4 | 0.8 | 0.5 | 0.5 |
| diffM&eqV | No | 5 | 90.2 | 99.7 | 99.8 | 99.6 | 99.9 | 99.9 |
| | No | 1 | 53.8 | 97.1 | 99.0 | 97.1 | 99.4 | 99.4 |
| | No | 0.5 | 40.9 | 95.9 | 98.1 | 94.9 | 99.2 | 99.0 |
| eqM&diffV | No | 5 | 18.6 | 10.2 | 29.2 | 29.6 | 35.4 | 34.6 |
| | No | 1 | 5.8 | 2.1 | 10.3 | 11.4 | 14.7 | 15.0 |
| | No | 0.5 | 3.9 | 1.3 | 6.9 | 7.0 | 11.1 | 10.4 |
| diffM&diffV | No | 5 | 18.4 | 42.2 | 59.9 | 54.9 | 70.6 | 69.0 |
| | No | 1 | 3.7 | 17.9 | 35.7 | 27.5 | 45.5 | 43.8 |
| | No | 0.5 | 2.1 | 13.9 | 27.9 | 18.9 | 38.9 | 35.6 |
| eqM&eqV (Type I error) | Yes | 5 | 20.1 | 4.0 | 4.8 | 6.7 | 5.5 | 5.3 |
| | Yes | 1 | 10.3 | 0.7 | 0.7 | 1.7 | 1.1 | 1.1 |
| | Yes | 0.5 | 7.8 | 0.5 | 0.2 | 0.8 | 0.5 | 0.5 |
| diffM&eqV | Yes | 5 | 67.9 | 99.5 | 99.9 | 99.4 | 99.9 | 99.9 |
| | Yes | 1 | 23.7 | 96.5 | 99.1 | 96.4 | 99.4 | 99.3 |
| | Yes | 0.5 | 12.9 | 95.0 | 98.7 | 94.0 | 99.0 | 98.8 |
| eqM&diffV | Yes | 5 | 27.5 | 9.5 | 34.0 | 39.7 | 41.0 | 41.5 |
| | Yes | 1 | 9.9 | 1.8 | 11.9 | 16.6 | 19.3 | 19.0 |
| | Yes | 0.5 | 6.1 | 1.1 | 7.3 | 11.2 | 14.0 | 13.7 |
| diffM&diffV | Yes | 5 | 21.9 | 39.8 | 65.2 | 60.4 | 73.2 | 72.1 |
| | Yes | 1 | 6.3 | 16.3 | 39.9 | 31.7 | 49.7 | 47.7 |
| | Yes | 0.5 | 3.4 | 12.2 | 32.5 | 23.9 | 41.8 | 39.7 |
The empirical Type I error rates (× 100) and power (× 100) for the six tests when methylation values were generated from mixtures of two normal distributions. The numbers of non-diseased and diseased samples are (100, 100)
| Scenarios | Outlier | α (× 100) | jointLRT | KS | AW | iAW.Lev | iAW.BF | iAW.TM |
|---|---|---|---|---|---|---|---|---|
| eqM&eqV (Type I error) | No | 5 | 2.4 | 3.8 | 4.9 | 9.4 | 5.4 | 12.3 |
| | No | 1 | 0.4 | 0.7 | 0.8 | 3.2 | 1.3 | 4.5 |
| | No | 0.5 | 0.2 | 0.4 | 0.4 | 2.0 | 0.8 | 2.8 |
| diffM&eqV | No | 5 | 16.6 | 58.4 | 74.9 | 56.2 | 87.0 | 53.6 |
| | No | 1 | 4.0 | 30.8 | 55.1 | 26.6 | 65.8 | 25.5 |
| | No | 0.5 | 2.3 | 25.5 | 45.1 | 17.8 | 53.9 | 19.6 |
| eqM&diffV | No | 5 | 34.5 | 98.1 | 55.1 | 88.8 | 57.8 | 69.9 |
| | No | 1 | 10.5 | 81.1 | 36.1 | 71.6 | 32.4 | 47.7 |
| | No | 0.5 | 6.4 | 72.7 | 28.9 | 62.5 | 23.6 | 40.5 |
| diffM&diffV | No | 5 | 37.7 | 98.7 | 61.1 | 92.0 | 68.3 | 76.4 |
| | No | 1 | 12.0 | 85.2 | 41.5 | 77.2 | 42.6 | 54.3 |
| | No | 0.5 | 7.8 | 78.1 | 34.0 | 68.3 | 32.2 | 46.7 |
| eqM&eqV (Type I error) | Yes | 5 | 25.0 | 3.9 | 2.8 | 6.5 | 4.8 | 8.1 |
| | Yes | 1 | 6.8 | 0.7 | 0.4 | 1.4 | 1.0 | 2.1 |
| | Yes | 0.5 | 3.7 | 0.4 | 0.2 | 0.7 | 0.6 | 1.3 |
| diffM&eqV | Yes | 5 | 4.2 | 59.4 | 16.3 | 21.5 | 78.1 | 34.9 |
| | Yes | 1 | 1.1 | 32.1 | 5.1 | 5.2 | 55.7 | 9.8 |
| | Yes | 0.5 | 0.5 | 26.5 | 3.3 | 3.5 | 44.7 | 5.2 |
| eqM&diffV | Yes | 5 | 0.6 | 97.4 | 14.4 | 80.2 | 49.6 | 63.3 |
| | Yes | 1 | 0.1 | 79.5 | 5.1 | 59.8 | 27.4 | 39.4 |
| | Yes | 0.5 | 0.0 | 71.2 | 3.5 | 54.1 | 19.7 | 31.5 |
| diffM&diffV | Yes | 5 | 1.0 | 98.1 | 19.5 | 84.6 | 61.0 | 71.1 |
| | Yes | 1 | 0.2 | 83.6 | 7.5 | 65.7 | 37.5 | 47.0 |
| | Yes | 0.5 | 0.1 | 76.8 | 5.6 | 60.1 | 27.9 | 38.1 |
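An empirical Type I error rate of the kind reported in the simulation tables is the percentage of data sets simulated under the null scenario (eqM&eqV) in which a test rejects at the nominal level. The following is a minimal sketch of that Monte Carlo estimate for the KS test only, using the Python standard library. It is not the authors' code: the standard normal null distribution, the number of replicates, the seed, and the asymptotic p-value approximation are all assumptions made for illustration, and the function names are hypothetical.

```python
import math
import random


def ks_two_sample_pvalue(x, y):
    """Two-sample Kolmogorov-Smirnov test with an asymptotic p-value (an approximation)."""
    xs, ys = sorted(x), sorted(y)
    n1, n2 = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through both sorted samples and track the largest gap between the two ECDFs.
    while i < n1 and j < n2:
        v = min(xs[i], ys[j])
        while i < n1 and xs[i] <= v:
            i += 1
        while j < n2 and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n1 - j / n2))
    en = math.sqrt(n1 * n2 / (n1 + n2))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is essentially 1.
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            break
    return min(1.0, max(0.0, total))


def empirical_rejection_rate(n_rep=1000, n=100, alpha=0.05, seed=1):
    """Percentage of simulated null data sets in which the KS test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_rep):
        # Both groups come from the same distribution, so every rejection is a false positive.
        controls = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cases = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if ks_two_sample_pvalue(controls, cases) < alpha:
            rejections += 1
    return 100.0 * rejections / n_rep


print(empirical_rejection_rate())
```

Drawing the two groups from distributions that differ in mean or variance instead turns the same loop into an empirical power estimate.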
The performance of the six joint tests on the HM27k data sets GSE37020 and GSE20080
| Test | nSig | nValidation | nTV | pTV(%) | nFV | pFV(%) |
|---|---|---|---|---|---|---|
| jointLRT | 4556 | 2213 | 1705 | 77.0 | 508 | 23.0 |
| KS | 1288 | 60 | 47 | 78.3 | 13 | 21.7 |
| AW | 1850 | 262 | 220 | 84.0 | 42 | 16.0 |
| iAW.Lev | 2041 | 747 | 666 | 89.2 | 81 | 10.8 |
| iAW.BF | 1843 | 339 | 296 | 87.3 | 43 | 12.7 |
| iAW.TM | 1838 | 387 | 342 | 88.4 | 45 | 11.6 |
nSig: the number of significant CpG sites detected in GSE37020 based on FDR adjusted p-value < 0.05;
nValidation: the number of validated CpG sites in GSE20080 based on unadjusted p-value < 0.05;
nTV: the number of truly validated CpG sites with the same difference directions in means and variances between the two groups;
pTV: nTV/nValidation × 100%, the proportion of significant CpG sites detected in GSE37020 and truly validated in GSE20080;
nFV: the number of falsely validated CpG sites in GSE20080 with inconsistent difference direction in means or variances between the two groups;
pFV: nFV/nValidation × 100%, the proportion of significant CpG sites detected in GSE37020 but falsely validated in GSE20080
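The percentage columns in the table are consistent with pTV = 100 × nTV / nValidation and pFV = 100 × nFV / nValidation, with nFV = nValidation − nTV. This reading is inferred from the tabulated counts and is not stated in the footnotes. A short sketch of the arithmetic for the jointLRT row (the function name is hypothetical):

```python
def validation_percentages(n_validation, n_tv):
    """Return (nFV, pTV, pFV) from the validated and truly validated counts."""
    n_fv = n_validation - n_tv  # assumption: every validated site is either truly or falsely validated
    p_tv = 100.0 * n_tv / n_validation
    p_fv = 100.0 * n_fv / n_validation
    return n_fv, p_tv, p_fv


# jointLRT row of the HM27k table: nValidation = 2213, nTV = 1705
n_fv, p_tv, p_fv = validation_percentages(2213, 1705)
print(n_fv, round(p_tv, 1), round(p_fv, 1))  # 508 77.0 23.0
```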
Fig. 1 Paired parallel boxplots of DNA methylation levels (y axis) versus case-control status (x axis) for the 5 unique top CpG sites acquired by the 6 joint tests based on HM27k data sets. The dots indicate subjects. 1A and 1B are for cg26363196 (jointLRT). 2A and 2B are for cg2196766 (KS). 3A and 3B are for cg00321478 (AW). 4A and 4B are for cg21303386 (iAW.Lev). 5A and 5B are for cg06784466 (iAW.BF, iAW.TM). 1A, 2A, 3A, 4A, 5A are based on GSE37020. 1B, 2B, 3B, 4B, 5B are based on GSE20080
The performance of the six joint tests on the EPIC data set GSE107080
| Test | nSig | nValidation | nTV | pTV(%) | nFV | pFV(%) |
|---|---|---|---|---|---|---|
| jointLRT | 51994 | 19806 | 5652 | 28.5 | 14154 | 71.5 |
| KS | 10 | 3 | 1 | 33.3 | 2 | 66.7 |
| AW | 12 | 5 | 2 | 40.0 | 3 | 60.0 |
| iAW.Lev | 709 | 201 | 89 | 44.3 | 112 | 55.7 |
| iAW.BF | 22 | 7 | 4 | 57.1 | 3 | 42.9 |
| iAW.TM | 22 | 9 | 5 | 55.6 | 4 | 44.4 |
nSig: the number of significant CpG sites detected in the training set of GSE107080 based on FDR adjusted p-value < 0.05;
nValidation: the number of validated CpG sites in the validation set of GSE107080 based on unadjusted p-value < 0.05;
nTV: the number of truly validated CpG sites with the same difference directions in means and variances between the two groups;
pTV: nTV/nValidation × 100%, the proportion of significant CpG sites detected in the training set and truly validated in the validation set;
nFV: the number of falsely validated CpG sites in the validation set with inconsistent difference direction in means or variances between the two groups;
pFV: nFV/nValidation × 100%, the proportion of significant CpG sites detected in the training set but falsely validated in the validation set
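The footnotes separate truly from falsely validated sites by whether the differences in means and in variances between the two groups point the same way in both data sets. One way to express that check is sketched below. The function names and the toy methylation values are hypothetical, and the use of the sample mean and sample variance as summaries is an assumption rather than the authors' stated procedure.

```python
from statistics import mean, variance


def direction(cases, controls):
    """Signs of the (mean difference, variance difference), cases minus controls."""
    def sign(v):
        return (v > 0) - (v < 0)
    return (sign(mean(cases) - mean(controls)),
            sign(variance(cases) - variance(controls)))


def is_truly_validated(discovery, validation):
    """True when both data sets agree on the direction of the mean and variance differences.

    Each argument is a (cases, controls) pair of methylation values for one CpG site.
    """
    return direction(*discovery) == direction(*validation)


# Toy values for a single CpG site (made up for illustration).
discovery = ([0.6, 0.7, 0.9, 0.4], [0.30, 0.35, 0.30, 0.32])
agreeing = ([0.5, 0.8, 0.65, 0.3], [0.20, 0.25, 0.22, 0.24])
conflicting = ([0.1, 0.4, 0.0, 0.3], [0.20, 0.25, 0.22, 0.24])

print(is_truly_validated(discovery, agreeing))     # True
print(is_truly_validated(discovery, conflicting))  # False
```

In the second call the variance difference keeps its sign but the mean difference flips, which is enough to count the site as falsely validated under the "means or variances" wording of the footnote.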