Manuela Hische, Abdelhalim Larhlimi, Franziska Schwarz, Antje Fischer-Rosinský, Thomas Bobbert, Anke Assmann, Gareth S Catchpole, Andreas Fh Pfeiffer, Lothar Willmitzer, Joachim Selbig, Joachim Spranger.
Abstract
BACKGROUND: High blood glucose and diabetes are amongst the conditions causing the greatest losses in years of healthy life worldwide. Numerous studies therefore aim to identify reliable risk markers for the development of impaired glucose metabolism and type 2 diabetes. However, the molecular basis of impaired glucose metabolism is still insufficiently understood. The development of so-called 'omics' approaches in recent years promises to identify molecular markers and to improve understanding of the molecular basis of impaired glucose metabolism and type 2 diabetes. Although univariate statistical approaches are often applied, we demonstrate here that multivariate statistical approaches are highly recommended to fully capture the complexity of data obtained with high-throughput methods.
Year: 2012 PMID: 22300499 PMCID: PMC3298809 DOI: 10.1186/2043-9113-2-3
Source DB: PubMed Journal: J Clin Bioinforma ISSN: 2043-9113
Characterisation of the investigated MESY-BEPO sub-cohort
| Clinical Characteristics | Baseline | Follow-up |
|---|---|---|
| Age [years] | 55.7 ± 11.7 | 61.5 ± 11.5 |
| Gender [% female] | 62.2 | |
| Waist circumference [cm] | 93.8 ± 13.8 | 94.6 ± 17.3 |
| Body mass index [kg/m2] | 28.6 ± 5.2 | 29.1 ± 5.3 |
| Fasting glucose [mg/dl] | 92.1 ± 11.6 | 100.5 ± 13.6 |
| Δglucose | 1.0 ± 2.3 | |
| Time between baseline and follow-up [years] | 5.6 ± 0.7 | |
Characterisation of the investigated MESY-BEPO sub-cohort (n = 172) at baseline and follow-up. Data are presented as mean ± standard deviation.
Spearman's rank correlation
| Metabolite | Spearman correlation | p-value | % variance explained |
|---|---|---|---|
| Hypoxanthine | 0.40 | | 16.20 |
| Aspartic acid | 0.30 | 0.0001 | 9.18 |
| Pyroglutamic acid | 0.29 | 0.0001 | 8.34 |
| 2-methyl-Malic acid | 0.27 | 0.0004 | 7.21 |
| NA 1033 (trisaccharide) | -0.27 | 0.0004 | 7.05 |
| NA 1034 | -0.23 | 0.0026 | 5.23 |
| NA 1052 (carbohydrate) | -0.22 | 0.0046 | 4.64 |
| Myo-inositol | 0.21 | 0.0052 | 4.51 |
| NA 1027 (sterol phosphate) | 0.21 | 0.0061 | 4.34 |
| Threonic acid | 0.21 | 0.0063 | 4.31 |
| NA 997 (Uridine-5'-monophosphate) | 0.20 | 0.0078 | 4.09 |
| Glutamic acid | 0.19 | 0.0104 | 3.80 |
| NA 831 | 0.19 | 0.0119 | 3.67 |
| NA 653 | 0.18 | 0.0179 | 3.25 |
| Ketopentose | 0.18 | 0.0181 | 3.24 |
| Fucose | 0.18 | 0.0192 | 3.18 |
| Uracil | 0.18 | 0.0213 | 3.08 |
| Fructose | -0.17 | 0.0233 | 2.99 |
| NA 631 (D-Glucopyranose) | -0.17 | 0.0235 | 2.98 |
| NA 613 | 0.17 | 0.0251 | 2.91 |
| NA 597 (pyranose) | -0.16 | 0.0312 | 2.70 |
| Isoleucine | 0.16 | 0.0358 | 2.57 |
| NA 275 | 0.16 | 0.0379 | 2.51 |
| NA 639 | 0.16 | 0.0388 | 2.49 |
| NA 442 | 0.16 | 0.0395 | 2.47 |
| NA 560 | 0.16 | 0.0402 | 2.45 |
| NA 854 (carbohydrate) | 0.16 | 0.0413 | 2.43 |
| Tartaric acid | 0.15 | 0.0429 | 2.39 |
| Fructose | -0.15 | 0.0466 | 2.31 |
| NA 632 | -0.15 | 0.0486 | 2.27 |
Spearman's rank correlation coefficients, p-values and percentage of explained variance for all metabolites significantly correlated with Δglucose (significance level α = 0.05). Metabolites not yet identified are marked NA; putative biochemical structures are given in parentheses.
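The quantities in this table can be illustrated with a short sketch. It assumes that the "% variance explained" column is the squared correlation coefficient times 100, which appears consistent with the tabulated values (for example, 0.30 squared gives roughly 9%). The function names are illustrative and not taken from the study; p-values are not computed here, although a library routine such as `scipy.stats.spearmanr` returns both the coefficient and a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def percent_variance_explained(rho):
    """Assumed definition: squared correlation expressed as a percentage."""
    return 100.0 * rho ** 2
```

Because the coefficient depends only on ranks, any monotonic relationship between a metabolite level and Δglucose yields a coefficient of magnitude 1, regardless of whether the relationship is linear.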
Figure 1. Correlation matrix. This correlation matrix visualises not only the significant correlations between Δglucose and the metabolites (first row/column) but also the correlations among the metabolites. The colour intensity and tile size indicate the strength of correlation. Positive correlations are marked blue, negative correlations red.
Figure 2. Random Forest feature selection. The number of metabolites was iteratively halved by removing the 50% of metabolites with the smallest importance measure. The remaining metabolites were used to build the Random Forest regression model. Shown is the median cross-validation accuracy, which remains stable down to a pattern of nine metabolites.
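The bisection schedule described in this caption might be sketched as follows. This is a minimal illustration, not the authors' implementation: the importance measure is passed in as a callable (`importance_fn`, a hypothetical name) that would in practice wrap a Random Forest fit, and rounding up when the number of features is odd is an assumption.

```python
from math import ceil


def bisect_features(features, importance_fn, min_size=1):
    """Repeatedly keep the more important half of the features.

    importance_fn(features) must return a dict mapping each feature name to
    a score. It is called again in every round, because importances depend
    on which features are still in the model.
    Returns the list of feature sets, from the full set down to min_size.
    """
    current = list(features)
    history = [current]
    while len(current) > min_size:
        scores = importance_fn(current)
        # Rank from most to least important.
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        # Drop the lower half; round up for odd counts (assumption).
        keep = max(min_size, ceil(len(ranked) / 2))
        current = ranked[:keep]
        history.append(current)
    return history


if __name__ == "__main__":
    # Toy importance scores, purely illustrative.
    toy_scores = {"m%d" % i: float(i) for i in range(8)}
    for subset in bisect_features(list(toy_scores), lambda fs: toy_scores):
        print(len(subset), subset)
```

Each feature set in the returned history would then be scored by cross-validation, and the smallest set whose accuracy has not yet dropped would be chosen as the pattern.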
Random Forest Importance of the highest ranked metabolites
| Metabolite | Random Forest Importance |
|---|---|
| Hypoxanthine | 12.52 |
| Pyroglutamic acid | 9.37 |
| NA 1027 (sterol phosphate) | 8.13 |
| NA 611 (Allantoin) | 7.80 |
| NA 718 (carboxylic acid) | 7.56 |
| NA 1033 (trisaccharide) | 6.28 |
| Aspartic acid | 5.34 |
| NA 1034 | 5.26 |
| Citric acid | 4.60 |
Metabolites belonging to the pattern identified using Random Forests, with their importance. Metabolites not yet identified are marked NA; putative biochemical structures are given in parentheses.
Regression model accuracy
| Metabolite selection | Model | All samples | CV |
|---|---|---|---|
| Spearman Correlation | Linear model | 0.57 | 0.22 |
| Spearman Correlation | Random Forest | 0.97 | 0.41 |
| RF importance | Random Forest | 0.97 | 0.47 |
| RF importance + Established markers | Random Forest | 0.97 | 0.46 |
| Established markers | Random Forest | 0.90 | 0.05 |
Accuracy of the models (median Pearson correlation between real and estimated Δglucose levels) based on metabolites and/or established risk markers was calculated using all samples of the training set and after tenfold cross-validation (CV). The established risk markers are: gender, waist circumference, BMI, age and baseline fasting glucose levels.
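The accuracy measure described in this caption, the median Pearson correlation between observed and estimated values over the cross-validation folds, can be sketched as below. Several choices are assumptions made for illustration only: a single-predictor least-squares line stands in for the linear and Random Forest models of the study, folds are formed by interleaving sample indices rather than by random shuffling, and all function names are hypothetical.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_line(xs, ys):
    """Stand-in model: least-squares line y = a + b*x; returns a predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def cv_median_pearson(xs, ys, fit, k=10):
    """Median over k folds of the Pearson correlation between the observed
    and predicted values of the held-out samples."""
    n = len(xs)
    scores = []
    for fold in range(k):
        # Interleaved folds (assumption): sample i is held out in fold i % k.
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        predicted = [predict(xs[i]) for i in test_idx]
        observed = [ys[i] for i in test_idx]
        scores.append(pearson(observed, predicted))
    return median(scores)
```

Fitting and scoring on the same samples corresponds to the "All samples" column, whereas scoring only held-out samples corresponds to the "CV" column; the gap between the two columns in the table indicates how much of the apparent accuracy does not generalise.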