Tommi Suvitaival, Simon Rogers, Samuel Kaski.
Abstract
MOTIVATION: Data analysis for metabolomics suffers from uncertainty because of the noisy measurement technology and the small sample size of experiments. Noise and the small sample size lead to a high probability of false findings. Further, individual compounds have natural variation between samples, which in many cases renders them unreliable as biomarkers. However, the levels of similar compounds are typically highly correlated, which is a phenomenon that we model in this work.
Year: 2014 PMID: 25161234 PMCID: PMC4147908 DOI: 10.1093/bioinformatics/btu455
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Plate diagrams of the one-level peak-clustering model (a) (Suvitaival ) and the two-level compound-clustering model (b) (proposed in this work). The two-level model has a second level of hierarchy for modeling coherently responding groups of compounds. The shaded variables are observed: the intensity data X, the covariate vector a and the peak-clustering matrix V, which is acquired from the peak-clustering stage. White variables are inferred: the compound-specific latent variable, the peak-specific variance, the compound-clustering W, the compound group-specific latent variable z and the covariate effects. The compound-level variance parameter is selected via cross-validation.
Design in the simulated experiments 1, 2 and 3 (columns in the table; Sections 3.1.1, 3.1.2 and 3.1.3, respectively)
| Experiment | 1 | 2 | 3 |
|---|---|---|---|
| Number of samples (‘case’ + ‘control’) | 10 + 10 | 10 + 10 | 10 + 10 |
| Number of observed variables (peaks) per lower-level cluster | 7 | 2 | 2 |
| Number of lower-level clusters (compounds) per higher-level cluster | 7 | 7 | 1, 3, … , 19 |
| Number of higher-level clusters (groups of similarly responding compounds) | 7 | 1 | 1 |
| Covariate effects of the higher-level clusters | 0.2 | | |
| Validation range of the number of higher-level clusters | - | - | |
| Validation range of the higher-level variance parameter | | | |
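The layered design in the table can be made concrete with a small generator. The following is only a sketch under stated assumptions: the Gaussian noise at every level, the three standard deviations, and the name `simulate_group` are hypothetical choices for illustration and are not taken from the paper. The default dimensions follow the second experiment (10 + 10 samples, 2 peaks per compound, 7 compounds, one group), and the default effect of 0.2 is the value quoted in the table.

```python
import random


def simulate_group(n_case=10, n_control=10, n_compounds=7, n_peaks=2,
                   effect=0.2, group_sd=1.0, compound_sd=0.5, peak_sd=0.5,
                   seed=0):
    """Simulate peak intensities for one group of similarly responding compounds.

    Assumed (hypothetical) generative chain, all Gaussian:
    covariate -> group-level latent -> compound-level latent -> peak intensity.
    """
    rng = random.Random(seed)
    # Binary covariate: 0 for 'control' samples, 1 for 'case' samples.
    covariate = [0] * n_control + [1] * n_case
    data = []
    for a in covariate:
        # Group-level latent variable shifted by the covariate effect.
        z = effect * a + rng.gauss(0.0, group_sd)
        row = []
        for _ in range(n_compounds):
            # Each compound follows the group-level latent with its own noise.
            y = z + rng.gauss(0.0, compound_sd)
            # Each peak is a noisy observation of its compound.
            row.extend(y + rng.gauss(0.0, peak_sd) for _ in range(n_peaks))
        data.append(row)
    return covariate, data


covariate, data = simulate_group()
print(len(data), "samples x", len(data[0]), "peaks")
```

With the defaults, each sample has 7 × 2 = 14 observed peaks; changing `n_compounds` over 1, 3, … , 19 would mimic the dimensions of the third experiment.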
The two-level model was more accurate at small effect sizes of the covariate on simulated data
| True covariate effect | RMSE (Single) | RMSE (1-level) | RMSE (2-level) | Corrected P-value (to Single) | Corrected P-value (to 1-level) |
|---|---|---|---|---|---|
| 0 | 1.16 | 0.53 | | | |
| +0.5 | 1.42 | 0.68 | | | |
| −1.0 | 1.03 | 0.58 | | 1.7 × 10⁻²⁷** | 4.4 × 10⁻¹ |
| +2.0 | 1.22 | 1.13 | | | |
Note: The two-level and one-level models, and the single-peak approach (‘2-level’, ‘1-level’ and ‘Single’, respectively), were compared by their RMSE between the inferred and the true covariate effect. The smallest RMSE for each true effect is highlighted in bold. The significance of the difference between the two-level model and the two comparison approaches was tested with the two-sided paired t-test with the Benjamini–Hochberg control (Benjamini and Hochberg, 1995) for the false discovery rate. The result is from the first simulated experiment (Section 3.1.1).
*/** Significant difference at confidence level 95/99%.
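The testing procedure named in the note can be sketched in plain Python. This is an illustrative sketch, not the authors' code: `paired_t_statistic` and `benjamini_hochberg` are hypothetical helper names, and the example P-values are made up. Converting the t statistic into a two-sided P-value requires the Student t distribution, which is assumed to come from a statistics library and is not shown here.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """t statistic of the paired differences between two matched error lists."""
    if len(errors_a) != len(errors_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample standard deviation).
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity and a cap at 1.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Made-up raw P-values, one per tested comparison.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

An adjusted P-value below 0.05 or 0.01 would then correspond to the single and double asterisks used in the table.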
Fig. 2. The peak-clustering and the compound-clustering models (‘1-level’ and ‘2-level’) reduced uncertainty around the covariate effect compared with the single-peak approach. The hierarchical models have a bias toward zero, which follows from the model assumption incorporated into the prior of the covariate effect. The prior-induced bias led to a slight increase in the error of the peak-clustering models as the true effect increased but acted to prevent the models from overfitting and thus from false findings at normal effect sizes. (a) Pairwise difference in the error between the single-peak approach and each of the two clustering models shown as a function of the magnitude of the true effect. (b) Inferred effect as a function of the magnitude of the true effect. Result from the second simulated experiment (Section 3.1.2), where the true covariate effect was varied from 0 to 2.
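The bias toward zero described in the caption can be illustrated with the simplest conjugate case. This is a stand-in, not the paper's model: it assumes a single observed effect estimate with known Gaussian noise variance and a zero-mean Gaussian prior on the true effect, and the function name `posterior_mean_effect` is hypothetical.

```python
def posterior_mean_effect(observed_effect, noise_var, prior_var):
    """Posterior mean of an effect under a zero-mean Gaussian prior.

    Assumes observed_effect ~ Normal(true_effect, noise_var) and
    true_effect ~ Normal(0, prior_var). The posterior mean is the observed
    value multiplied by a shrinkage factor between 0 and 1.
    """
    shrinkage = prior_var / (prior_var + noise_var)
    return shrinkage * observed_effect


# A tighter prior (smaller prior_var) pulls the estimate closer to zero.
for prior_var in (4.0, 1.0, 0.25):
    print(prior_var, posterior_mean_effect(2.0, noise_var=1.0, prior_var=prior_var))
```

Because the shrinkage factor is multiplicative, the absolute bias grows with the size of the true effect, while estimates of a truly null effect are pulled toward zero, which matches the trade-off described in the caption.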
Fig. 3. The error in the covariate effect inferred by the compound-clustering model (‘2-level’) decreased when more coherently responding compounds were observed. The accuracy of the peak-clustering model and the data-based single-peak approach (‘1-level’ and ‘Single peak’, respectively) remained constant. Root mean squared error (RMSE) is shown as a function of the number of compounds (i.e. lower-level clusters) per higher-level cluster. Result from the third simulated experiment (Section 3.1.3), where a weak covariate effect of 0.2 was generated and the number of compounds was gradually increased from 1 to 19.
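The RMSE used in the figure, and the intuition for why observing more coherently responding compounds helps, can be sketched as follows. This is a simplified illustration under stated assumptions: naive averaging of independent, equally noisy per-compound estimates stands in for the two-level model, and `pooled_rmse`, the unit noise level, and the number of repeats are hypothetical choices.

```python
import math
import random


def rmse(inferred, true):
    """Root mean squared error between inferred and true values."""
    if len(inferred) != len(true):
        raise ValueError("inputs must have equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(inferred, true)) / len(true))


def pooled_rmse(n_compounds, true_effect=0.2, noise_sd=1.0, n_repeats=2000, seed=0):
    """RMSE of the averaged effect estimate over n_compounds noisy compounds."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        # One noisy effect estimate per compound, all sharing the same true effect.
        per_compound = [rng.gauss(true_effect, noise_sd) for _ in range(n_compounds)]
        estimates.append(sum(per_compound) / n_compounds)
    return rmse(estimates, [true_effect] * n_repeats)


# Number of compounds per higher-level cluster: 1, 3, ..., 19.
for k in range(1, 20, 2):
    print(k, round(pooled_rmse(k), 3))
```

Under these assumptions the error of the pooled estimate falls roughly as one over the square root of the number of compounds, whereas an estimate based on a single compound would stay at the noise level.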
The two-level model is most accurate at small covariate effects, and both Bayesian models are more accurate than the single-peak approach on the metabolomic benchmark data (Section 3.2)
| True covariate effect (%) | RMSE (Single) | RMSE (1-level) | RMSE (2-level) | Corrected P-value (to Single) | Corrected P-value (to 1-level) |
|---|---|---|---|---|---|
| (a) Positive ion mode | | | | | |
| +0 | 0.42 | 0.31 | | | |
| +20 | 0.41 | 0.22 | | | |
| +40 | 0.44 | 0.33 | | | |
| +100 | 1.06 | 0.92 | | | |
| (b) Negative ion mode | | | | | |
| +0 | 0.42 | 0.31 | | | |
| +20 | 0.54 | 0.26 | | | |
| +40 | 0.45 | 0.35 | | | |
| +100 | 0.82 | 0.88 | | | |
Note: The two-level and one-level models, and the single-peak approach (‘2-level’, ‘1-level’ and ‘Single’, respectively), are compared by their RMSE between the inferred and the true covariate effect. The smallest RMSE for each true effect is highlighted in bold. The significance of the difference between the two-level model and the two comparison approaches is tested with the two-sided paired t-test with the Benjamini–Hochberg correction (Benjamini and Hochberg, 1995) for the P-values. A near-zero value below the machine accuracy is denoted by ‘ε’.
*/** Significant difference at confidence level 95/99%.