Moa Lugner, Soffia Gudbjörnsdottir, Naveed Sattar, Ann-Marie Svensson, Mervete Miftaraj, Katarina Eeg-Olofsson, Björn Eliasson, Stefan Franzén.
Abstract
AIMS/HYPOTHESIS: Research using data-driven cluster analysis has proposed five novel subgroups of diabetes based on six variables measured in individuals with newly diagnosed diabetes. Our aim was (1) to validate the existence of differing clusters within type 2 diabetes and (2) to compare the cluster method with an alternative strategy based on traditional methods for predicting diabetes outcomes.
Keywords: Cardiovascular diseases; Cluster analysis; Diabetes complications; Diabetes mellitus type 2; Epidemiology; Mortality
Year: 2021 PMID: 34059937 PMCID: PMC8382658 DOI: 10.1007/s00125-021-05485-5
Source DB: PubMed Journal: Diabetologia ISSN: 0012-186X Impact factor: 10.122
Descriptive statistics for the total population and for four clusters
| Variable | Total population | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|---|
| Sex, female, % | 43.1 | 32.0 | 44.5 | 49.5 | 38.0 |
| Age at diagnosis, years | 62.8 ± 12.78 | 57.0 ± 11.68 | 65.4 ± 9.70 | 71.5 ± 9.01 | 51.8 ± 9.99 |
| BMI, kg/m2 | 30.5 ± 5.65 | 31.6 ± 5.97 | 30.8 ± 5.19 | 28.0 ± 4.26 | 32.9 ± 6.17 |
| HbA1c, mmol/mol | 54.3 ± 17.10 | 89.5 ± 21.12 | 50.9 ± 10.88 | 49.0 ± 10.09 | 50.9 ± 10.48 |
| HbA1c, % | 7.1 | 10.3 | 6.8 | 6.6 | 6.8 |
| Systolic BP, mmHg | 137.2 ± 17.37 | 137.8 ± 15.98 | 155.8 ± 14.88 | 132.0 ± 13.07 | 127.9 ± 11.69 |
| Diastolic BP, mmHg | 79.6 ± 10.04 | 82.9 ± 9.60 | 88.6 ± 8.20 | 73.3 ± 7.69 | 78.7 ± 7.82 |
| Triacylglycerol, mmol/l | 2.0 ± 1.33 | 3.8 ± 2.55 | 1.8 ± 0.90 | 1.6 ± 0.78 | 2.0 ± 0.98 |
| HDL-cholesterol, mmol/l | 1.2 ± 0.38 | 1.0 ± 0.28 | 1.3 ± 0.35 | 1.4 ± 0.43 | 1.1 ± 0.27 |
| LDL-cholesterol, mmol/l | 3.1 ± 1.00 | 3.5 ± 1.08 | 3.4 ± 0.99 | 2.8 ± 0.92 | 3.1 ± 0.94 |
| eGFR, ml min−1 [1.73 m]−2 | 84.8 ± 24.57 | 97.1 ± 26.99 | 80.7 ± 19.85 | 71.9 ± 18.35 | 100.0 ± 23.69 |
| Country of birth, % | | | | | |
| Sweden | 80.1 | 77.1 | 72.5 | 82.9 | 85.5 |
| Nordic countries (excl. Sweden) | 5.3 | 4.9 | 4.4 | 6.3 | 5.5 |
| Europe (excl. EU27 & Nordic countries) | 3.0 | 3.7 | 4.0 | 2.9 | 2.2 |
| EU27 (excl. Nordic countries) | 2.5 | 2.4 | 2.3 | 2.7 | 2.7 |
| Mediterranean countries | 0.4 | 0.5 | 0.5 | 0.4 | 0.4 |
| Middle East | 4.5 | 5.6 | 8.8 | 2.7 | 1.9 |
| Asia | 1.5 | 2.2 | 2.9 | 0.6 | 0.7 |
| South America | 0.7 | 1.0 | 1.1 | 0.5 | 0.4 |
| North America & Oceania | 0.2 | 0.3 | 0.4 | 0.2 | 0.1 |
| Africa | 1.6 | 2.5 | 3.0 | 0.8 | 0.5 |
Data are means ± SD or percentages
Clinical characteristics of the NDR (Swedish National Diabetes Register) population, overall and divided into four clusters
The cluster analysis is based on the k-means algorithm applied to imputed and normalised observations of age, BMI, HbA1c, systolic BP, diastolic BP, triacylglycerol, HDL-cholesterol, LDL-cholesterol and eGFR from individuals with newly diagnosed type 2 diabetes
The regions represent countries of birth grouped in larger areas
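The footnote names k-means on imputed, normalised variables but gives no implementation details. The sketch below is a minimal illustration under our own assumptions. "Normalised" is read as z-score standardisation, imputation is assumed to be already done, and the clustering is plain Lloyd's k-means with random restarts. The helper names (`standardise`, `kmeans`), the restart count and the two-variable toy data are hypothetical and are not the authors' code.

```python
import math
import random


def standardise(rows):
    """Z-score each column (our assumed reading of 'normalised')."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1)) for j in range(d)]
    return [[(r[j] - means[j]) / sds[j] for j in range(d)] for r in rows]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centroids, wcss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Start from k distinct observations chosen at random.
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest centroid.
            new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
            if new_labels == labels:
                break  # assignments are stable, so the run has converged
            labels = new_labels
            # Update step: move each centroid to the mean of its members.
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                if members:
                    centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        # Keep the restart with the lowest within-cluster sum of squares.
        if best is None or wcss < best[2]:
            best = (labels, centroids, wcss)
    return best


# Hypothetical toy data: two variables, two well-separated groups.
rng = random.Random(1)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
data += [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(50)]
labels, centroids, wcss = kmeans(standardise(data), k=2)
```

With the nine clinical variables listed above, each row would hold one individual's standardised values. The restarts guard against the local optima that a single k-means run can settle into.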
Fig. 1 Elbow plot for the real dataset (a) and for a simulated dataset (b). (a) The elbow plot shows the within-cluster sum of squares for k from 1 to 10 in the NDR population. (b) A multivariate normal distribution is fitted to the same data as used for the clustering, yielding a vector of estimated means and a covariance matrix. These estimates are used to simulate a dataset of the same size and structure as the data used in the real cluster analysis, except that the simulated data contain only one cluster. The simulated data are then clustered with the same k-means algorithm as the observed data
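The single-cluster benchmark in Fig. 1b can be sketched as follows, assuming Euclidean k-means and a Cholesky-based multivariate normal sampler. The sketch estimates the mean vector and covariance matrix, simulates a dataset of the same size, and computes the within-cluster sum of squares (WCSS) curve for both datasets. All function names, `k_max = 4` (the figure goes up to k = 10) and the toy data are hypothetical, and standardisation is skipped for brevity.

```python
import math
import random


def mean_and_cov(rows):
    """Estimate the mean vector and the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
            for j in range(d)] for i in range(d)]
    return mu, cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][m] * low[j][m] for m in range(j))
            low[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / low[j][j]
    return low


def simulate_mvn(mu, cov, n, rng):
    """Draw n rows from N(mu, cov) as mu + L z, with z ~ N(0, I)."""
    low = cholesky(cov)
    d = len(mu)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]
        rows.append([mu[i] + sum(low[i][m] * z[m] for m in range(i + 1)) for i in range(d)])
    return rows


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def wcss(points, k, rng, n_init=5, max_iter=50):
    """Lowest within-cluster sum of squares over n_init runs of Lloyd's k-means."""
    best = math.inf
    for _ in range(n_init):
        cents = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            groups = [[] for _ in range(k)]
            for p in points:
                groups[min(range(k), key=lambda c: sq_dist(p, cents[c]))].append(p)
            new = [[sum(col) / len(g) for col in zip(*g)] if g else cents[c]
                   for c, g in enumerate(groups)]
            if new == cents:
                break
            cents = new
        best = min(best, sum(min(sq_dist(p, c) for c in cents) for p in points))
    return best


def elbow(points, k_max, seed=0):
    """WCSS for k = 1..k_max, the quantity plotted in an elbow plot."""
    rng = random.Random(seed)
    return [wcss(points, k, rng) for k in range(1, k_max + 1)]


# Hypothetical 'real' data with two groups, plus a one-cluster simulated twin.
rng = random.Random(42)
real = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
real += [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(50)]
mu, cov = mean_and_cov(real)
simulated = simulate_mvn(mu, cov, len(real), rng)
real_curve = elbow(real, k_max=4)
sim_curve = elbow(simulated, k_max=4)
```

If the observed data have genuine group structure, the WCSS drop from k = 1 to k = 2 in `real_curve` should be markedly steeper than in `sim_curve`. If the two curves look similar, any apparent elbow may reflect the shape of a single distribution rather than distinct clusters.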
Fig. 2 Silhouette method (a) and gap statistic (b), each on a sampled dataset. (a) The silhouette method was applied to a sample of 20,000 observations. It evaluates how well each individual lies within their cluster, based on the mean distances within their own cluster and to the nearest neighbouring cluster. Silhouette coefficients range from −1 to +1; a high value indicates that an individual is well matched to their own cluster and poorly matched to neighbouring clusters. The vertical dashed line marks k = 2, the value with the highest mean silhouette width in this study. (b) The gap statistic was computed on a sample of 20,000 observations using 50 bootstrap samples. It compares the total within-cluster variation of the observed data with that of reference data drawn from a uniform distribution (a distribution with no obvious clustering) for different values of k, the number of clusters. The optimal k is taken to be the one that maximises the gap, indicated by the vertical dashed line
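Both criteria in Fig. 2 can be sketched in plain Python under a few assumptions: Euclidean distance, Lloyd's k-means with random restarts, and reference data drawn uniformly over each variable's observed range. The gap is computed as Gap(k) = (mean over reference sets of log W*_k) − log W_k, where W_k is the pooled within-cluster sum of squares. Function names, restart counts, `n_ref = 20` in the usage (the caption used 50) and the toy data are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def within_ss(points, labels):
    """W_k: pooled sum of squared distances from points to their cluster mean."""
    total = 0.0
    for c in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = [sum(col) / len(members) for col in zip(*members)]
        total += sum(dist(p, centre) ** 2 for p in members)
    return total


def kmeans_labels(points, k, rng, n_init=5, max_iter=50):
    """Labels from the lowest-W_k run among n_init runs of Lloyd's k-means."""
    best_labels, best_w = None, math.inf
    for _ in range(n_init):
        cents = [list(p) for p in rng.sample(points, k)]
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda c: dist(p, cents[c])) for p in points]
            new = []
            for c in range(k):
                members = [p for p, lab in zip(points, labels) if lab == c]
                new.append([sum(col) / len(members) for col in zip(*members)] if members else cents[c])
            if new == cents:
                break
            cents = new
        w = within_ss(points, labels)
        if w < best_w:
            best_labels, best_w = labels, w
    return best_labels


def mean_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); needs at least two non-empty clusters."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        dists = {}  # cluster label -> distances from point i to that cluster's members
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                dists.setdefault(lab, []).append(dist(p, q))
        if own not in dists:  # singleton cluster: s(i) is conventionally 0
            scores.append(0.0)
            continue
        a = sum(dists[own]) / len(dists[own])  # mean distance within own cluster
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != own)  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def gap_statistic(points, k, n_ref=50, seed=0):
    """Gap(k) = mean of log W*_k over n_ref uniform reference sets minus log W_k."""
    rng = random.Random(seed)
    log_w = math.log(within_ss(points, kmeans_labels(points, k, rng)))
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    ref_logs = []
    for _ in range(n_ref):
        # Reference data: uniform over each variable's observed range (no clustering).
        ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
        ref_logs.append(math.log(within_ss(ref, kmeans_labels(ref, k, rng))))
    return sum(ref_logs) / n_ref - log_w


# Hypothetical toy data with two groups; the caption's analysis used 20,000 observations.
rng = random.Random(7)
data = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(30)]
data += [[rng.gauss(8, 1), rng.gauss(8, 1)] for _ in range(30)]
sil = {k: mean_silhouette(data, kmeans_labels(data, k, random.Random(k))) for k in range(2, 5)}
gaps = {k: gap_statistic(data, k, n_ref=20) for k in range(1, 5)}
```

The silhouette is undefined for a single cluster, so it is evaluated from k = 2 upwards. The gap statistic can also be computed for k = 1.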
Fig. 3 Cluster characteristics in four clusters. (a–i) Distribution of (a) age at diagnosis, (b) BMI, (c) HbA1c, (d) LDL-C (LDL-cholesterol), (e) HDL-C (HDL-cholesterol), (f) triacylglycerol, (g) SBP (systolic BP), (h) DBP (diastolic BP) and (i) eGFR at baseline for each of the four clusters
Concordance for prediction models for mortality and CVD events using 2–7 clusters, an ordinary Cox model and a Cox model with smoothing splines
| No. of clusters/model | Mortality, concordance | Mortality, 95% CI | CVD, concordance | CVD, 95% CI |
|---|---|---|---|---|
| 2 clusters | 0.63 | 0.63, 0.64 | 0.64 | 0.63, 0.65 |
| 3 clusters | 0.63 | 0.63, 0.64 | 0.64 | 0.63, 0.65 |
| 4 clusters | 0.66 | 0.65, 0.66 | 0.66 | 0.65, 0.67 |
| 5 clusters | 0.66 | 0.65, 0.66 | 0.68 | 0.67, 0.69 |
| 6 clusters | 0.65 | 0.65, 0.66 | 0.68 | 0.67, 0.68 |
| 7 clusters | 0.66 | 0.66, 0.66 | 0.68 | 0.67, 0.69 |
| PH Cox | 0.77 | 0.76, 0.77 | 0.77 | 0.77, 0.78 |
| Spline Cox | 0.78 | 0.77, 0.78 | 0.78 | 0.77, 0.78 |
PH Cox, Cox proportional hazards model with all variables entered untransformed (linear terms)
Spline Cox, Cox model with smoothing splines for all variables
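The table does not say which concordance estimator was used. A common choice for Cox models is Harrell's C-index, sketched below as an assumption. In this estimator a higher predicted risk should go with an earlier event. `harrell_c` and the five-subject example are hypothetical.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is usable when subject i had the event and a strictly shorter
    follow-up time than subject j. It is concordant when i also has the higher
    predicted risk; tied risks count 1/2. Pairs with tied times are skipped here
    (a simplification; tie handling differs between implementations).
    """
    concordant, usable = 0.0, 0
    for t_i, e_i, r_i in zip(times, events, risk_scores):
        if not e_i:
            continue  # a censored subject cannot anchor a usable pair
        for t_j, r_j in zip(times, risk_scores):
            if t_i < t_j:
                usable += 1
                if r_i > r_j:
                    concordant += 1.0
                elif r_i == r_j:
                    concordant += 0.5
    return concordant / usable


# Hypothetical follow-up times (years), event indicators and predicted risks.
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
print(harrell_c(times, events, [0.9, 0.7, 0.5, 0.3, 0.1]))  # 1.0: risk order matches event order
print(harrell_c(times, events, [1, 1, 1, 0, 0]))  # 0.75: cluster-like tied risks
```

The second call suggests one plausible reason why the cluster-based rows score lower than the Cox rows. A model that uses only cluster membership gives every member of a cluster the same predicted risk. Pairs within a cluster are therefore ties, and each contributes only 1/2.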