Martina Troll, Stefan Brandmaier, Sandra Reitmeier, Jonathan Adam, Sapna Sharma, Alice Sommer, Marie-Abèle Bind, Klaus Neuhaus, Thomas Clavel, Jerzy Adamski, Dirk Haller, Annette Peters, Harald Grallert.
Abstract
The analysis of the gut microbiome for preventive health care and diagnostic purposes is increasingly the focus of current research. We analyzed around 2000 stool samples from the KORA (Cooperative Health Research in the Region of Augsburg) cohort using high-throughput 16S rRNA gene amplicon sequencing, which captured a total microbial diversity of 2089 operational taxonomic units (OTUs). To assess how well obesity is reflected in microbiota profiles, we evaluated combinations of three components: (i) four prediction methods (partial least squares (PLS), support vector machine regression (SVMReg), random forest (RF), and M5Rules); (ii) five OTU data transformation approaches (no transformation, relative abundance with and without log-transformation, and centered and isometric log-ratio transformations); and (iii) nine obesity measurements as prediction targets (body mass index, three measures of body shape, and five measures of body composition). All three components had a substantial impact on predictive performance. SVMReg and PLS combined with logarithmic data transformations yielded the best-performing models, particularly for waist circumference-related endpoints. At best, these combinations reached a correlation of almost 0.40 between predicted and observed obesity measurements using stool microbiota data (i.e., OTUs) alone, which corresponds to roughly 15% of explained variance. After sex stratification, waist–height ratio lost less predictive performance than the other waist-related measurements. Moreover, less prevalent and less abundant OTUs contributed only marginally to the predictive power of our models.
Keywords: 16S rRNA; gut microbiota; machine learning; obesity; waist–height ratio
Year: 2020 PMID: 32290101 PMCID: PMC7232268 DOI: 10.3390/microorganisms8040547
Source DB: PubMed Journal: Microorganisms ISSN: 2076-2607
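The five OTU transformations compared in this study can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' code: the natural logarithm, a pseudocount of one added to the raw counts (as stated for the logarithmic transformations in the comparison table), and, for ILR, a pivot (Helmert-type) orthonormal basis are choices made here; this section does not say which ILR basis was used. All function names are hypothetical.

```python
import numpy as np


def relative_abundance(counts):
    """Per-sample relative abundance in percent (rows = samples, columns = OTUs)."""
    counts = np.asarray(counts, dtype=float)
    return 100.0 * counts / counts.sum(axis=1, keepdims=True)


def ra_log(counts, pseudocount=1.0):
    """'RA + Log': add a pseudocount to raw counts, convert to percent, take the natural log."""
    return np.log(relative_abundance(np.asarray(counts, dtype=float) + pseudocount))


def clr(counts, pseudocount=1.0):
    """Centered log-ratio: log of each OTU minus the mean log over all OTUs of that sample."""
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)


def ilr_basis(n_parts):
    """Orthonormal n_parts x (n_parts - 1) basis of pivot (Helmert-type) contrasts.

    Each column sums to zero and has unit length; this is one of many valid ILR bases
    (an assumption, since the basis used in the study is not stated).
    """
    V = np.zeros((n_parts, n_parts - 1))
    for i in range(n_parts - 1):
        k = n_parts - i - 1  # number of parts after part i
        V[i, i] = np.sqrt(k / (k + 1))
        V[i + 1:, i] = -1.0 / np.sqrt(k * (k + 1))
    return V


def ilr(counts, pseudocount=1.0):
    """Isometric log-ratio coordinates, computed as CLR coordinates times the basis."""
    c = clr(counts, pseudocount)
    return c @ ilr_basis(c.shape[1])


if __name__ == "__main__":
    # Hypothetical toy data: 2 samples x 4 OTUs, including zero counts.
    counts = np.array([[10, 0, 5, 85], [3, 7, 0, 90]])
    print(relative_abundance(counts))
    print(clr(counts))
    print(ilr(counts))
```

Two properties may help when reading the results: RA + Log and CLR both differ from log(count + 1) only by a per-sample constant, and ILR is an orthonormal rotation of the CLR coordinates, so Euclidean distances and dot products are unchanged. This is consistent with the identical SVMReg and PLS figures for CLR and ILR in most rows of the comparison table, whereas the tree-based RF and M5Rules, which split on individual coordinates, differ.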
Benchmark data on the KORA (Cooperative Health Research in the Region of Augsburg) FF4 study population (mean (standard deviation)).
| Characteristics | Overall (n = 1923) | Men | Women |
|---|---|---|---|
| Age (years) | 60.0 (12.1) | 60.3 (12.3) | 59.7 (11.9) |
| Body mass index (kg/m²) | 27.9 (5.0) | 28.3 (4.5) | 27.5 (5.4) |
| Waist circumference (cm) | 97.0 (14.2) | 102.8 (12.4) | 91.5 (13.6) |
| Waist–hip ratio | 0.91 (0.09) | 0.96 (0.07) | 0.85 (0.07) |
| Waist–height ratio | 0.58 (0.08) | 0.59 (0.07) | 0.56 (0.09) |
| Body adiposity index | 31.0 (6.0) | 27.8 (4.0) | 34.0 (5.9) |
| Fat mass index (kg/m²) | 9.4 (3.4) | 8.1 (2.8) | 10.6 (3.5) |
| Lean body mass index (kg/m²) | 18.5 (2.6) | 20.1 (2.1) | 16.9 (2.1) |
| Appendicular muscle mass index (kg/m²) | 7.7 (1.3) | 8.6 (1.0) | 6.8 (1.0) |
| Body fat (%) | 32.9 (7.1) | 28.1 (5.3) | 37.5 (5.5) |
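For reference, the indices in the table above can be computed from basic measurements as sketched below. The formulas are standard definitions assumed here, since this section does not state them; this includes the commonly used body adiposity index, BAI = hip circumference (cm) / height (m)^1.5 − 18. How fat mass, lean mass, and appendicular lean mass were measured is not described in this section. The function name and the example values are hypothetical.

```python
def anthropometric_indices(weight_kg, height_m, waist_cm, hip_cm,
                           fat_mass_kg, lean_mass_kg, appendicular_lean_kg):
    """Standard adiposity indices (assumed definitions, not taken from the study)."""
    h2 = height_m ** 2
    return {
        "BMI": weight_kg / h2,                   # kg/m²
        "WHR": waist_cm / hip_cm,                # dimensionless
        "WHtR": waist_cm / (height_m * 100.0),   # waist and height both in cm
        "BAI": hip_cm / height_m ** 1.5 - 18.0,  # common BAI definition, in %
        "FMI": fat_mass_kg / h2,                 # kg/m²
        "LBMI": lean_mass_kg / h2,               # kg/m²
        "AMMI": appendicular_lean_kg / h2,       # kg/m²
        "BF%": 100.0 * fat_mass_kg / weight_kg,  # %
    }


if __name__ == "__main__":
    # Hypothetical person, not taken from the KORA data.
    for name, value in anthropometric_indices(80, 1.75, 95, 100, 24, 56, 24).items():
        print(f"{name}: {value:.2f}")
```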
Figure 1. Comparison of adiposity measures: (A) sex-stratified distribution of adiposity measures and (B) Spearman correlations in the study population (n = 1923). Adiposity measurements were z-transformed (mean = 0, standard deviation = 1). AMMI: appendicular muscle mass index; BAI: body adiposity index; BF (%): body fat percentage; BMI: body mass index; FMI: fat mass index; LBMI: lean body mass index; WC: waist circumference; WHR: waist–hip ratio; WHtR: waist–height ratio.
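Figure 1 relies on two simple operations: z-transformation of each adiposity measure and Spearman rank correlation between measures. A NumPy-only sketch follows, assuming the sample standard deviation (ddof = 1) and average ranks for ties; the helper names are hypothetical.

```python
import numpy as np


def zscore(x):
    """z-transform: subtract the mean, divide by the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)


def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    rank_sums = np.bincount(inverse, weights=ranks)
    return (rank_sums / counts)[inverse]


def spearman_matrix(columns):
    """Spearman correlation matrix = Pearson correlation of the column-wise ranks."""
    ranks = np.column_stack([average_ranks(c) for c in columns])
    return np.corrcoef(ranks, rowvar=False)


if __name__ == "__main__":
    # Hypothetical values for three measures in five participants.
    bmi = [24.1, 31.5, 27.0, 22.3, 29.8]
    waist = [85.0, 108.0, 97.5, 80.2, 101.0]
    lbmi = [17.0, 19.5, 18.2, 16.1, 20.3]
    print(spearman_matrix([zscore(bmi), zscore(waist), zscore(lbmi)]))
```

Because the z-transformation is strictly increasing, it does not change the Spearman correlations; it only puts all measures on a common scale for panel (A).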
Comparison of adiposity measures, transformation methods for operational taxonomic unit (OTU) count data, and machine learning algorithms in the study population (n = 1923). Each method was evaluated with 10-fold cross-validation, and for each logarithmic transformation, a pseudocount of one was added to the raw counts. CC: correlation coefficient; CLR: centered log-ratio transformation; ILR: isometric log-ratio transformation; NPK: normalized poly kernel; PLS: partial least squares regression; PLS4 and PLS20: PLS with four and 20 components, respectively; RA: relative abundance; RF: random forest; RMSE: root mean squared error; SVMReg: support vector machine regression.
| Data transformation | SVMReg NPK: CC | SVMReg NPK: RMSE | RF: CC | RF: RMSE | M5Rules: CC | M5Rules: RMSE | PLS20: CC | PLS20: RMSE | PLS4: CC | PLS4: RMSE |
|---|---|---|---|---|---|---|---|---|---|---|
| Body mass index | ||||||||||
| Raw counts | 0.18 | 1.01 | 0.26 | 0.97 | 0.19 | 1.06 | 0.17 | 1.24 | 0.21 | 0.99 |
| RA (100%) | 0.18 | 1.01 | 0.26 | 0.97 | 0.23 | 1.02 | 0.21 | 1.11 | 0.22 | 0.99 |
| RA + Log | 0.30 | 0.97 | 0.25 | 0.97 | 0.26 | 1.01 | 0.22 | 1.22 | 0.32 | 0.96 |
| Raw counts + CLR | 0.33 | 0.95 | 0.21 | 0.98 | 0.25 | 1.01 | 0.22 | 1.23 | 0.31 | 0.97 |
| Raw counts + ILR | 0.33 | 0.95 | 0.19 | 0.98 | 0.28 | 0.99 | 0.22 | 1.23 | 0.31 | 0.97 |
| Waist circumference | ||||||||||
| Raw counts | 0.24 | 0.98 | 0.32 | 0.95 | 0.22 | 1.07 | 0.22 | 1.19 | 0.25 | 0.98 |
| RA (100%) | 0.24 | 0.98 | 0.33 | 0.95 | 0.23 | 1.08 | 0.27 | 1.09 | 0.28 | 0.97 |
| RA + Log | 0.36 | 0.94 | 0.29 | 0.96 | 0.34 | 0.97 | 0.27 | 1.18 | 0.37 | 0.94 |
| Raw counts + CLR | 0.39 | 0.93 | 0.28 | 0.96 | 0.34 | 0.97 | 0.27 | 1.19 | 0.37 | 0.94 |
| Raw counts + ILR | 0.39 | 0.93 | 0.28 | 0.96 | 0.30 | 0.99 | 0.27 | 1.19 | 0.37 | 0.94 |
| Waist–hip ratio | ||||||||||
| Raw counts | 0.23 | 0.99 | 0.31 | 0.96 | 0.18 | 1.06 | 0.21 | 1.16 | 0.23 | 0.98 |
| RA (100%) | 0.23 | 0.99 | 0.29 | 0.96 | 0.25 | 1.02 | 0.24 | 1.11 | 0.26 | 0.97 |
| RA + Log | 0.34 | 0.94 | 0.26 | 0.97 | 0.29 | 0.99 | 0.28 | 1.16 | 0.35 | 0.95 |
| Raw counts + CLR | 0.36 | 0.94 | 0.25 | 0.97 | 0.28 | 1.00 | 0.28 | 1.17 | 0.35 | 0.95 |
| Raw counts + ILR | 0.36 | 0.94 | 0.25 | 0.97 | 0.28 | 1.00 | 0.28 | 1.17 | 0.35 | 0.95 |
| Waist–height ratio | ||||||||||
| Raw counts | 0.21 | 1.00 | 0.31 | 0.95 | 0.22 | 1.06 | 0.20 | 1.23 | 0.23 | 0.98 |
| RA (100%) | 0.21 | 1.00 | 0.30 | 0.96 | 0.23 | 1.03 | 0.25 | 1.09 | 0.26 | 0.98 |
| RA + Log | 0.33 | 0.95 | 0.27 | 0.96 | 0.30 | 0.99 | 0.24 | 1.21 | 0.36 | 0.94 |
| Raw counts + CLR | 0.37 | 0.94 | 0.25 | 0.97 | 0.29 | 1.00 | 0.24 | 1.22 | 0.36 | 0.94 |
| Raw counts + ILR | 0.37 | 0.94 | 0.26 | 0.97 | 0.28 | 1.00 | 0.24 | 1.22 | 0.36 | 0.95 |
| Body adiposity index | ||||||||||
| Raw counts | 0.13 | 1.02 | 0.14 | 0.99 | 0.13 | 1.07 | 0.10 | 1.28 | 0.13 | 1.00 |
| RA (100%) | 0.13 | 1.02 | 0.14 | 0.99 | 0.09 | 1.15 | 0.12 | 1.15 | 0.13 | 1.01 |
| RA + Log | 0.23 | 0.99 | 0.14 | 0.99 | 0.18 | 1.03 | 0.13 | 1.29 | 0.23 | 0.99 |
| Raw counts + CLR | 0.24 | 0.99 | 0.12 | 0.99 | 0.15 | 1.06 | 0.13 | 1.30 | 0.24 | 1.00 |
| Raw counts + ILR | 0.24 | 0.99 | 0.12 | 0.99 | 0.13 | 1.06 | 0.13 | 1.30 | 0.24 | 1.00 |
| Fat mass index | ||||||||||
| Raw counts | 0.14 | 1.02 | 0.17 | 0.99 | 0.11 | 1.07 | 0.12 | 1.21 | 0.16 | 1.00 |
| RA (100%) | 0.14 | 1.02 | 0.18 | 0.98 | 0.16 | 1.05 | 0.14 | 1.14 | 0.16 | 1.01 |
| RA + Log | 0.26 | 0.98 | 0.15 | 0.99 | 0.18 | 1.04 | 0.17 | 1.26 | 0.26 | 0.98 |
| Raw counts + CLR | 0.28 | 0.97 | 0.17 | 0.99 | 0.19 | 1.03 | 0.17 | 1.27 | 0.26 | 0.99 |
| Raw counts + ILR | 0.28 | 0.97 | 0.16 | 0.99 | 0.17 | 1.05 | 0.17 | 1.27 | 0.26 | 0.99 |
| Lean body mass index | ||||||||||
| Raw counts | 0.25 | 0.99 | 0.32 | 0.95 | 0.18 | 1.12 | 0.19 | 1.26 | 0.24 | 0.98 |
| RA (100%) | 0.25 | 0.99 | 0.33 | 0.95 | 0.25 | 1.02 | 0.25 | 1.10 | 0.27 | 0.97 |
| RA + Log | 0.34 | 0.94 | 0.27 | 0.96 | 0.30 | 0.99 | 0.28 | 1.18 | 0.33 | 0.96 |
| Raw counts + CLR | 0.36 | 0.94 | 0.26 | 0.97 | 0.28 | 1.00 | 0.28 | 1.19 | 0.33 | 0.96 |
| Raw counts + ILR | 0.36 | 0.94 | 0.28 | 0.96 | 0.29 | 1.00 | 0.28 | 1.19 | 0.33 | 0.96 |
| Appendicular muscle mass index | ||||||||||
| Raw counts | 0.25 | 0.99 | 0.32 | 0.95 | 0.22 | 1.07 | 0.20 | 1.24 | 0.23 | 0.98 |
| RA (100%) | 0.25 | 0.99 | 0.29 | 0.96 | 0.21 | 1.06 | 0.24 | 1.10 | 0.27 | 0.97 |
| RA + Log | 0.34 | 0.94 | 0.26 | 0.97 | 0.29 | 0.99 | 0.28 | 1.17 | 0.33 | 0.96 |
| Raw counts + CLR | 0.35 | 0.94 | 0.29 | 0.96 | 0.29 | 1.00 | 0.28 | 1.18 | 0.33 | 0.96 |
| Raw counts + ILR | 0.35 | 0.94 | 0.28 | 0.96 | 0.32 | 0.98 | 0.28 | 1.18 | 0.33 | 0.96 |
| Body fat percentage | ||||||||||
| Raw counts | 0.14 | 1.02 | 0.19 | 0.98 | 0.08 | 1.12 | 0.10 | 1.19 | 0.15 | 1.00 |
| RA (100%) | 0.14 | 1.02 | 0.18 | 0.98 | 0.04 | 1.24 | 0.12 | 1.16 | 0.15 | 1.00 |
| RA + Log | 0.23 | 0.97 | 0.16 | 0.99 | 0.18 | 1.04 | 0.18 | 1.25 | 0.26 | 0.99 |
| Raw counts + CLR | 0.28 | 0.97 | 0.14 | 0.99 | 0.15 | 1.05 | 0.18 | 1.25 | 0.25 | 1.00 |
| Raw counts + ILR | 0.28 | 0.97 | 0.12 | 0.99 | 0.19 | 1.04 | 0.18 | 1.25 | 0.26 | 1.00 |
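The table reports, for each model, the correlation coefficient (CC) and root mean squared error (RMSE) between cross-validated predictions and the observed adiposity measure. The sketch below shows such an evaluation loop with 10-fold cross-validation. It uses ridge regression as a simple stand-in learner and does not reproduce SVMReg, RF, M5Rules, or PLS; the target is z-scored first, which we assume matches the scaling behind RMSE values near 1. Fold assignment, the ridge penalty, and all names are assumptions. In practice, `X` would be a transformed OTU matrix.

```python
import numpy as np


def standardize(y):
    """z-score the target so that RMSE is expressed in standard-deviation units."""
    y = np.asarray(y, dtype=float)
    return (y - y.mean()) / y.std(ddof=1)


def ridge_fit_predict(X_train, y_train, X_test, alpha=1.0):
    """Stand-in learner: ridge regression, centered on training-fold statistics only."""
    x_mean = X_train.mean(axis=0)
    y_mean = y_train.mean()
    Xc = X_train - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(Xc.shape[1]), Xc.T @ (y_train - y_mean))
    return (X_test - x_mean) @ w + y_mean


def cross_validate(X, y, k=10, alpha=1.0, seed=0):
    """k-fold CV; returns (CC, RMSE) between out-of-fold predictions and observations."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    pred = np.empty(n)
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        pred[test_idx] = ridge_fit_predict(X[train_idx], y[train_idx], X[test_idx], alpha)
    cc = float(np.corrcoef(pred, y)[0, 1])
    rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
    return cc, rmse


if __name__ == "__main__":
    # Hypothetical data: 200 samples x 5 features with a known linear signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 5))
    y = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.0]) + rng.normal(scale=0.1, size=200)
    cc, rmse = cross_validate(X, standardize(y))
    print(f"CC = {cc:.2f}, RMSE = {rmse:.2f}, CC^2 = {cc ** 2:.2f}")
```

On a z-scored target, always predicting the mean gives an RMSE of about 1, so RMSE values above 1 (as for several PLS20 entries) indicate predictions worse than that baseline. The squared CC gives a rough share of explained variance: the best entry in the table, CC = 0.39, corresponds to about 0.39² ≈ 0.15.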
Figure 2. Dependency of predictive capacity on OTU prevalence and abundance for waist–height ratio, using the combinations SVMReg NPK with CLR and PLS4 with RA + Log. In each step, (A) OTUs present in less than x% of the samples were excluded (prevalence); for example, at the 80% step, OTUs with a prevalence below 80% were excluded from the calculations. (B) OTUs with an across-sample relative abundance of x% or less were excluded (abundance).
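The two filtering schemes in Figure 2 can be sketched as column filters on a samples × OTUs count matrix. The caption does not define "across-sample relative abundance"; here it is assumed to be the OTU's share of all reads pooled over samples (an alternative would be the mean of per-sample relative abundances). Function names are hypothetical.

```python
import numpy as np


def prevalence_filter(counts, min_prevalence):
    """Keep OTUs (columns) with count > 0 in at least `min_prevalence` (0-1) of samples."""
    counts = np.asarray(counts)
    prevalence = (counts > 0).mean(axis=0)
    keep = prevalence >= min_prevalence
    return counts[:, keep], keep


def abundance_filter(counts, min_abundance_pct):
    """Drop OTUs whose pooled relative abundance (percent of all reads) is <= the threshold."""
    counts = np.asarray(counts, dtype=float)
    pooled_pct = 100.0 * counts.sum(axis=0) / counts.sum()
    keep = pooled_pct > min_abundance_pct
    return counts[:, keep], keep


if __name__ == "__main__":
    # Hypothetical 3 samples x 4 OTUs.
    counts = np.array([[5, 0, 1, 0], [3, 2, 0, 0], [4, 0, 0, 1]])
    for threshold in (0.0, 0.5, 0.8):
        _, keep = prevalence_filter(counts, threshold)
        print(f"prevalence >= {threshold:.0%}: keep {keep.sum()} of {keep.size} OTUs")
    _, keep = abundance_filter(counts, 6.25)
    print(keep)
```

In this sketch, filtering is applied to raw counts before any transformation; the order used in the study is not stated in this section.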
Comparison of predictive performance in the study population and sex-stratified subsets: within each (sub)set, the adiposity measures were scaled and the OTU data were transformed separately. CLR: centered log-ratio; NPK: normalized poly kernel; RA: relative abundance (100%); PLS4: partial least squares with four components; SVMReg: support vector machine regression.
| Abdominal adiposity measure | Total population (n = 1923): CC | Total population: RMSE | Men: CC | Men: RMSE | Women: CC | Women: RMSE |
|---|---|---|---|---|---|---|
| SVMReg NPK and CLR | ||||||
| Waist circumference | 0.39 | 0.93 | 0.24 | 0.98 | 0.32 | 0.96 |
| Waist–hip ratio | 0.36 | 0.94 | 0.24 | 0.98 | 0.26 | 0.98 |
| Waist–height ratio | 0.37 | 0.94 | 0.26 | 0.97 | 0.34 | 0.95 |
| PLS4 and RA + Log | ||||||
| Waist circumference | 0.37 | 0.94 | 0.27 | 0.99 | 0.31 | 0.98 |
| Waist–hip ratio | 0.35 | 0.95 | 0.24 | 1.01 | 0.27 | 1.00 |
| Waist–height ratio | 0.36 | 0.94 | 0.28 | 0.99 | 0.34 | 0.97 |
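The statement that waist–height ratio loses less predictive performance after sex stratification can be checked directly against the table above. The sketch below copies the CC values from that table and computes the absolute drop from the total population to each sex; treating the absolute CC difference as the measure of "loss" is an assumption, and the variable and function names are hypothetical.

```python
# CC values copied from the sex-stratified comparison table above.
SVMREG_NPK_CLR = {
    "Waist circumference": {"total": 0.39, "men": 0.24, "women": 0.32},
    "Waist–hip ratio": {"total": 0.36, "men": 0.24, "women": 0.26},
    "Waist–height ratio": {"total": 0.37, "men": 0.26, "women": 0.34},
}
PLS4_RA_LOG = {
    "Waist circumference": {"total": 0.37, "men": 0.27, "women": 0.31},
    "Waist–hip ratio": {"total": 0.35, "men": 0.24, "women": 0.27},
    "Waist–height ratio": {"total": 0.36, "men": 0.28, "women": 0.34},
}


def cc_drop(table):
    """Absolute drop in CC from the total population to each sex, per measure."""
    return {
        measure: {sex: round(cc["total"] - cc[sex], 2) for sex in ("men", "women")}
        for measure, cc in table.items()
    }


def smallest_drop(table, sex):
    """Name of the measure whose CC drops least for the given sex."""
    drops = cc_drop(table)
    return min(drops, key=lambda measure: drops[measure][sex])


if __name__ == "__main__":
    for label, table in (("SVMReg NPK + CLR", SVMREG_NPK_CLR), ("PLS4 + RA + Log", PLS4_RA_LOG)):
        print(label, cc_drop(table))
        for sex in ("men", "women"):
            print(f"  smallest drop ({sex}): {smallest_drop(table, sex)}")
```

For both combinations and both sexes, waist–height ratio shows the smallest drop; for example, with SVMReg NPK and CLR in women, CC falls from 0.37 to 0.34 (a drop of 0.03), compared with drops of 0.07 for waist circumference and 0.10 for waist–hip ratio.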