Kui Wang, Shu Kay Ng, Geoffrey J McLachlan.
Abstract
BACKGROUND: Time-course gene expression data, such as yeast cell cycle data, may be periodically expressed. Fourier series approximations of periodic gene expression, currently used to cluster such data, have been found inadequate for modelling the complexity of time-course data, partly because they ignore the dependence between expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of the models available in the literature and propose a new mixture model with first-order autoregressive random effects for clustering time-course gene-expression profiles. Simulations and real examples are given to demonstrate the usefulness of the proposed models.
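As a rough illustration of the two ingredients named in the abstract, the sketch below pairs a first-order Fourier mean curve with a first-order autoregressive, AR(1), error process. It is a generic textbook construction, not the exact EMMIX-WIRE specification: the function names, the parameter names (`a0`, `a1`, `b1`, `period`, `rho`, `sigma`) and the example values are all assumptions made for this example.

```python
import math
import random


def fourier_mean(t, period, a0, a1, b1):
    """First-order Fourier approximation of a periodic mean at time t."""
    angle = 2.0 * math.pi * t / period
    return a0 + a1 * math.cos(angle) + b1 * math.sin(angle)


def ar1_correlation(n_times, rho):
    """Correlation matrix of an AR(1) process: entry (s, t) is rho ** |s - t|."""
    return [[rho ** abs(s - t) for t in range(n_times)] for s in range(n_times)]


def simulate_ar1(n_times, rho, sigma, rng):
    """Draw one stationary AR(1) series u_t = rho * u_{t-1} + e_t (requires |rho| < 1)."""
    series = [rng.gauss(0.0, sigma / math.sqrt(1.0 - rho ** 2))]
    for _ in range(n_times - 1):
        series.append(rho * series[-1] + rng.gauss(0.0, sigma))
    return series


def simulate_profile(times, period, a0, a1, b1, rho, sigma, rng):
    """Periodic mean plus serially correlated noise for one hypothetical gene."""
    noise = simulate_ar1(len(times), rho, sigma, rng)
    return [fourier_mean(t, period, a0, a1, b1) + u for t, u in zip(times, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    times = list(range(0, 120, 10))  # hypothetical sampling times
    profile = simulate_profile(times, 60.0, 0.0, 1.0, 0.5, 0.6, 0.2, rng)
    print([round(y, 2) for y in profile])
```

With `rho = 0` the noise terms are independent, which corresponds to the situation in which a plain Fourier-series model ignores dependence over time.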
Year: 2012 PMID: 23151154 PMCID: PMC3574839 DOI: 10.1186/1471-2105-13-300
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Bias and RMSE in brackets from 1000 simulated datasets (generated from new EMMIX-WIRE (EM-W) model with equal to 0.5)
| -0.002 | 0.016 | -0.009 | -0.001 | 0.011 | -0.015 | |
| 0.1,0.315) | (0.045) | (0.052) | (0.033) | (0.029) | (0.051) | (0.051) |
| 0.002 | 0.008 | -0.006 | -0.036 | -0.003 | -0.009 | |
| 1,0.2) | (0.135) | (0.137) | (0.175) | (0.186) | (0.186) | (0.182) |
| -0.001 | -0.018 | 0.024 | 0.004 | 0.004 | -0.001 | |
| 1,0.02) | (0.119) | (0.124) | (0.272) | (0.160) | (0.175) | (0.152) |
| 0.009 | -0.015 | -0.164 | 0.031 | 0.027 | 0.008 | |
| 0.9,0.01) | (0.119) | (0.132) | (0.223) | (0.160) | (0.149) | (0.183) |
| 0.055 | 1.543 | 0.089 | 1.346 | 0.110 | 1.443 | |
| 0.5,0.5) | (0.082) | (1.547) | (0.164) | (1.349) | (0.152) | (1.446) |
| -0.023 | -0.395 | -0.043 | -0.372 | -0.043 | -0.392 | |
| 0.6,0.6) | (0.036) | (0.397) | (0.082) | (0.374) | (0.058) | (0.394) |
| 0.0171 | | -0.017 | | 0.011 | | |
| 1.0,1.0) | (0.055) | | (0.127) | | (0.088) | |
| -0.112 | | -0.091 | | -0.118 | | |
| 0.2,0.3) | (0.145) | | (0.102) | | (0.134) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is better) |
| Error rate | 0.036 (0.044) 0.026 | 0.099 (0.108) 0.044 | 986/1000 |
| Rand index | 0.954 (0.056) 0.032 | 0.863 (0.149) 0.060 | 993/1000 |
| Adjusted Rand index | 0.906 (0.113) 0.064 | 0.726 (0.299) 0.120 | 993/1000 |
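The bias and RMSE figures reported for each parameter summarise how far the estimates from the simulated datasets fall from the true value. A minimal sketch of that summary follows, assuming the usual definitions (bias as the mean error, RMSE as the root of the mean squared error); the function name and the toy numbers are illustrative and are not taken from the study.

```python
import math


def bias_and_rmse(estimates, true_value):
    """Return (bias, RMSE) of a list of estimates around a known true value."""
    n = len(estimates)
    errors = [e - true_value for e in estimates]
    bias = sum(errors) / n                              # mean error
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)  # root mean squared error
    return bias, rmse


if __name__ == "__main__":
    # Toy estimates of a parameter whose true value is 0.5 (made-up numbers).
    bias, rmse = bias_and_rmse([0.48, 0.55, 0.51, 0.46], 0.5)
    print(f"bias={bias:.3f} RMSE={rmse:.3f}")
```

Because RMSE² equals bias² plus the variance of the estimates, an RMSE much larger than the absolute bias indicates that variability, rather than systematic error, dominates.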
Bias and RMSE in brackets from 1000 simulated datasets (generated from new EMMIX-WIRE (EM-W) model with equal to 1.3)
| -0.006 | 0.035 | -0.009 | -0.002 | 0.015 | -0.033 | |
| 0.1,0.315) | (0.061) | (0.080) | (0.047) | (0.045) | (0.070) | (0.074) |
| 0.001 | 0.018 | -0.004 | -0.069 | -0.00 | -0.014 | |
| 1,0.2) | (0.137) | (0.147) | (0.173) | (0.197) | (0.186) | (0.178) |
| 0.010 | -0.062 | 0.017 | -0.031 | 0.001 | -0.002 | |
| 1,0.02) | (0.162) | (0.227) | (0.388) | (0.236) | (0.230) | (0.199) |
| 0.009 | -0.042 | -0.180 | 0.073 | 0.032 | 0.009 | |
| 0.9,0.01) | (0.124) | (0.166) | (0.235) | (0.188) | (0.163) | (0.213) |
| -0.042 | 1.671 | -0.030 | 1.449 | 0.008 | 1.549 | |
| 1.3,1.3) | (0.097) | (1.677) | (0.223) | (1.460) | (0.153) | (1.556) |
| 0.009 | -0.249 | -0.001 | -0.228 | 0.002 | -0.250 | |
| 0.6,0.6) | (0.020) | (0.251) | (0.055) | (0.235) | (0.025) | (0.252) |
| 0.131 | | 0.121 | | 0.141 | | |
| 1.0,1.0) | (0.155) | | (0.219) | | (0.186) | |
| -0.151 | | -0.124 | | -0.160 | | |
| 0.2,0.3) | (0.172) | | (0.129) | | (0.168) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is better) |
| Error rate | 0.094 (0.102) 0.039 | 0.184 (0.192) 0.053 | 988/1000 |
| Rand index | 0.881 (0.129) 0.049 | 0.758 (0.252) 0.069 | 1000/1000 |
| Adjusted Rand index | 0.760 (0.259) 0.097 | 0.518 (0.500) 0.133 | 1000/1000 |
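The clustering comparisons rest on three agreement measures between the true and the estimated grouping: the error rate, the Rand index and the adjusted Rand index. The sketch below uses the standard definitions in plain Python. Treating the error rate as the misclassification rate minimised over relabellings of the clusters is an assumption of this example, and the brute-force search over permutations is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import combinations, permutations
from math import comb


def error_rate(true_labels, pred_labels):
    """Misclassification rate, minimised over all relabellings of the predictions."""
    labels = sorted(set(true_labels) | set(pred_labels))
    best = len(true_labels)
    for perm in permutations(labels):
        relabel = dict(zip(labels, perm))
        wrong = sum(t != relabel[p] for t, p in zip(true_labels, pred_labels))
        best = min(best, wrong)
    return best / len(true_labels)


def rand_index(a, b):
    """Fraction of item pairs on which the two partitions agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def adjusted_rand_index(a, b):
    """Rand index corrected for chance agreement (Hubert and Arabie form)."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = 0.5 * (sum_rows + sum_cols)
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    guess = [0, 0, 1, 0]
    print(error_rate(truth, guess), rand_index(truth, guess),
          adjusted_rand_index(truth, guess))
```

The bracketed RMSE values in these comparisons appear to be measured against the ideal value of each criterion (0 for the error rate, 1 for the two indices), since mean distance from the ideal squared plus SD squared roughly reproduces the reported RMSE squared; for example, 0.036² + 0.026² ≈ 0.044².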
Bias and RMSE in brackets from 1000 simulated datasets (generated from new EMMIX-WIRE (EM-W) model with equal to 0.5 and equal to 0)
| 0.001 | 0.008 | -0.001 | -0.003 | -0.001 | -0.005 | |
| 0.1,0.315) | (0.009) | (0.012) | (0.008) | (0.008) | (0.010) | (0.011) |
| 0.001 | 0.008 | -0.001 | -0.018 | 0.003 | -0.014 | |
| 1,0.2) | (0.017) | (0.019) | (0.018) | (0.026) | (0.016) | (0.016) |
| -0.002 | -0.023 | -0.001 | -0.005 | 0.003 | -0.006 | |
| 1,0.02) | (0.049) | (0.060) | (0.059) | (0.062) | (0.049) | (0.049) |
| -0.001 | -0.014 | 0.016 | 0.019 | 0.002 | 0.004 | |
| 0.9,0.01) | (0.026) | (0.031) | (0.033) | (0.038) | (0.032) | (0.033) |
| 0.071 | 1.162 | 0.081 | 1.158 | 0.078 | 1.159 | |
| 0.5,0.5) | (0.081) | (1.162) | (0.119) | (1.160) | (0.090) | (1.159) |
| -0.032 | -0.337 | -0.037 | -0.339 | -0.036 | -0.339 | |
| 0.6,0.6) | (0.038) | (0.337) | (0.062) | (0.340) | (0.045) | (0.340) |
| -0.059 | | -0.069 | | -0.064 | | |
| 1.0,1.0) | (0.068) | | (0.106) | | (0.077) | |
| 0 | | 0.001 | | 0.000 | | |
| 0,0) | (0.000) | | (0.001) | | (0.001) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is better) |
| Error rate | 0.078 (0.078) 0.008 | 0.081 (0.081) 0.009 | 738/1000 |
| Rand index | 0.891 (0.110) 0.012 | 0.886 (0.115) 0.012 | 806/1000 |
| Adjusted Rand index | 0.780 (0.222) 0.023 | 0.769 (0.232) 0.025 | 802/1000 |
Bias and RMSE in brackets from 1000 simulated datasets (generated from new EMMIX-WIRE (EM-W) model with equal to 1.3 and equal to 0)
| -0.001 | 0.024 | 0.002 | -0.005 | -0.001 | -0.019 | |
| 0.1,0.315) | (0.014) | (0.029) | (0.016) | (0.017) | (0.017) | (0.026) |
| -0.001 | 0.018 | 0.003 | -0.046 | 0.000 | -0.005 | |
| 1,0.2) | (0.027) | (0.035) | (0.026) | (0.053) | (0.021) | (0.021) |
| 0.001 | -0.068 | 0.005 | -0.041 | 0.001 | 0.008 | |
| 1,0.02) | (0.085) | (0.146) | (0.108) | (0.127) | (0.086) | (0.085) |
| 0.003 | -0.031 | 0.005 | 0.047 | 0.002 | 0.004 | |
| 0.9,0.01) | (0.042) | (0.063) | (0.054) | (0.072) | (0.050) | (0.054) |
| -0.059 | 1.254 | -0.076 | 1.251 | -0.052 | 1.242 | |
| 1.3,1.3) | (0.087) | (1.254) | (0.178) | (1.257) | (0.104) | (1.243) |
| 0.012 | -0.198 | -0.013 | -0.201 | 0.009 | -0.203 | |
| 0.6,0.6) | (0.019) | (0.199) | (0.039) | (0.206) | (0.023) | (0.204) |
| 0.046 | | 0.056 | | 0.039 | | |
| 1.0,1.0) | (0.070) | | (0.145) | | (0.084) | |
| 0.000 | | 0.001 | | 0.000 | | |
| 0.,0.) | (0.000) | | (0.001) | | (0.000) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is better) |
| Error rate | 0.154 (0.154) 0.011 | 0.161 (0.162) 0.012 | 835/1000 |
| Rand index | 0.796 (0.204) 0.014 | 0.783 (0.217) 0.016 | 912/1000 |
| Adjusted Rand index | 0.590 (0.411) 0.028 | 0.566 (0.435) 0.031 | 896/1000 |
Bias and RMSE in brackets from 1000 simulated datasets (generated from [14] with equal to 0.5)
| -0.003 | 0.000 | -0.008 | 0.001 | 0.010 | -0.000 | |
| 0.1,0.315) | (0.004) | (0.003) | (0.023) | (0.003) | (0.024) | (0.004) |
| 0.002 | 0.000 | 0.003 | 0.001 | 0.001 | 0.001 | |
| 1,0.2) | (0.013) | (0.013) | (0.010) | (0.010) | (0.010) | (0.010) |
| 0.015 | 0.001 | -0.236 | -0.002 | 0.047 | 0.003 | |
| 1,0.02) | (0.041) | (0.036) | (0.333) | (0.037) | (0.073) | (0.035) |
| 0.014 | -0.000 | -0.308 | -0.001 | 0.058 | 0.001 | |
| 0.9,0.01) | (0.026) | (0.021) | (0.345) | (0.023) | (0.067) | (0.025) |
| -0.034 | -0.000 | -0.006 | -0.001 | -0.021 | -0.000 | |
| 0.5,0.5) | (0.036) | (0.006) | (0.027) | (0.015) | (0.025) | (0.009) |
| 0.020 | -0.000 | 0.013 | -0.001 | 0.023 | -0.001 | |
| 0.6,0.6) | (0.021) | (0.007) | (0.025) | (0.017) | (0.028) | (0.009) |
| 0.025 | | 0.014 | | 0.022 | | |
| 0.0,0.0) | (0.026) | | (0.015) | | (0.023) | |
| 0.000 | | 0.045 | | 0.042 | | |
| 0,0) | (0.000) | | (0.095) | | (0.056) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is not worse) |
| Error rate | 0.018 (0.019) 0.006 | 0.016 (0.017) 0.004 | 422/1000 |
| Rand index | 0.978 (0.023) 0.006 | 0.980 (0.021) 0.005 | 365/1000 |
| Adjusted Rand index | 0.955 (0.046) 0.012 | 0.959 (0.042) 0.011 | 363/1000 |
Bias and RMSE in brackets from 1000 simulated datasets (generated from [14] with equal to 1.3)
| -0.009 | 0.001 | -0.007 | 0.005 | 0.016 | -0.001 | |
| 0.1,0.315) | (0.013) | (0.010) | (0.012) | (0.011) | (0.020) | (0.013) |
| -0.002 | -0.000 | 0.015 | 0.001 | 0.003 | -0.000 | |
| 1,0.2) | (0.023) | (0.023) | (0.024) | (0.019) | (0.016) | (0.016) |
| -0.005 | -0.001 | 0.054 | -0.000 | 0.003 | 0.000 | |
| 1,0.02) | (0.071) | (0.074) | (0.0928) | (0.083) | (0.068) | (0.064) |
| 0.015 | -0.000 | -0.131 | 0.001 | 0.020 | 0.000 | |
| 0.9,0.01) | (0.036) | (0.036) | (0.135) | (0.045) | (0.041) | (0.043) |
| -0.195 | -0.000 | -0.185 | -0.003 | -0.186 | -0.002 | |
| 1.3,1.3) | (0.196) | (0.016) | (0.192) | (0.049) | (0.189) | (0.025) |
| 0.043 | -0.000 | 0.037 | -0.002 | 0.044 | -0.001 | |
| 0.6,0.6) | (0.043) | (0.007) | (0.042) | (0.022) | (0.045) | (0.010) |
| 0.144 | | 0.131 | | 0.143 | | |
| 0.0,0.0) | (0.145) | | (0.133) | | (0.144) | |
| 0.000 | | 0.000 | | 0.001 | | |
| 0.,0.) | (0.000) | | (0.001) | | (0.001) | |
| Measure | EM-W: Mean (RMSE) SD | Kim: Mean (RMSE) SD | Proportion (EM-W is not worse) |
| Error rate | 0.103 (0.104) 0.009 | 0.102 (0.103) 0.010 | 426/1000 |
| Rand index | 0.864 (0.137) 0.012 | 0.866 (0.135) 0.012 | 360/1000 |
| Adjusted Rand index | 0.725 (0.276) 0.025 | 0.729 (0.272) 0.025 | 352/1000 |
Figure 1. Clustering of gene expression profiles into four groups for the yeast dataset 1.
Estimation of parameters for the yeast cell cycle dataset 1 (237 genes)
| 0.104 | 0.054 | 0.118 | 0.724 | |
| -0.107 | 0.400 | -0.807 | 0.298 | |
| 1.009 | -0.119 | -0.053 | 0.079 | |
| 0.027 | 0.011 | 0.025 | 0.278 | |
| 0.174 | 0.417 | 0.443 | 0.307 | |
| 0.278 | 0.717 | 0.435 | 0.053 | |
| 0.191 | 0.001 | 0.031 | 0.310 | |
| 85 | 85 | 85 | 85 |
Figure 2. Clustering of gene expression profiles into five groups for the yeast dataset 2.
Estimation of parameters for the yeast cell cycle dataset 2 (384 genes)
| 0.238 | 0.290 | 0.151 | 0.165 | 0.157 | |
| 0.643 | -0.061 | -0.736 | -0.616 | 0.329 | |
| -0.062 | 1.019 | 0.285 | -0.772 | -1.001 | |
| 0.011 | 0.046 | 0.037 | 0.028 | 0.006 | |
| 0.498 | 0.296 | 0.470 | 0.309 | 0.244 | |
| 0.503 | 0.269 | 0.364 | 0.379 | 0.550 | |
| 0.062 | 0.052 | 0.044 | 0.065 | 0.030 | |
| 85 | 85 | 85 | 85 | 85 |
Figure 3. Clustering of gene expression profiles into twenty-one groups for the complete yeast dataset: (a) eight clusters of periodic genes; (b) thirteen clusters of non-periodic genes.
Distribution of five phases of peak expression over eight clusters obtained (complete yeast data)
| 1 | 1 | 40 | 0 | 1 | 42 |
| 2 | 98 | 0 | 19 | 0 | 1 |
| 3 | 24 | 24 | 31 | 3 | 2 |
| 4 | 16 | 1 | 0 | 20 | 13 |
| 5 | 0 | 7 | 30 | 0 | 0 |
| 6 | 72 | 1 | 3 | 3 | 1 |
| 7 | 0 | 51 | 1 | 0 | 2 |
| 8 | 12 | 34 | 8 | 20 | 31 |
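A cluster-by-phase table of this kind is a cross-tabulation of two labellings of the same genes. The sketch below shows one way such counts, and the share of each cluster taken by its most frequent phase, could be computed. The function names and the toy labels (phases "A" and "B") are hypothetical and do not correspond to the phase names or data of the study.

```python
from collections import Counter


def cross_tabulate(clusters, phases):
    """Count genes for every (cluster, phase) combination as a nested dict."""
    counts = Counter(zip(clusters, phases))
    rows = sorted(set(clusters))
    cols = sorted(set(phases))
    return {r: {c: counts.get((r, c), 0) for c in cols} for r in rows}


def dominant_share(row):
    """Return the most frequent phase in a cluster and its share of that cluster."""
    total = sum(row.values())
    phase, count = max(row.items(), key=lambda item: item[1])
    return phase, count / total


if __name__ == "__main__":
    # Toy labels: five genes, two clusters, two phases.
    clusters = [1, 1, 2, 2, 2]
    phases = ["A", "B", "A", "A", "B"]
    table = cross_tabulate(clusters, phases)
    for cluster, row in table.items():
        print(cluster, row, dominant_share(row))
```

A cluster whose dominant share is close to 1 is concentrated in a single phase, whereas a share near 1 divided by the number of phases indicates that the cluster is spread evenly across phases.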
Figure 4. Simulated gene expression profiles for the three models.