| Literature DB >> 32039191 |
Flávia Luísa Dias-Audibert1, Luiz Claudio Navarro2, Diogo Noin de Oliveira1, Jeany Delafiori1, Carlos Fernando Odir Rodrigues Melo1, Tatiane Melina Guerreiro1, Flávia Troncon Rosa3, Diego Lima Petenuci4, Maria Angelica Ehara Watanabe4, Licio Augusto Velloso5, Anderson Rezende Rocha2, Rodrigo Ramos Catharino1.
Abstract
Weight gain is a metabolic disorder that often culminates in obesity and other comorbidities such as diabetes. Obesity is characterized by chronic, subclinical systemic inflammation and is regarded as a major factor contributing to the development of such comorbidities. Laboratory methods that identify subjects at higher risk of severe weight-associated morbidity are therefore of utmost importance for the health and safety of populations. This study analyzed the plasma of 180 Brazilian individuals, equally divided into a eutrophic control group and a case group, to assess the presence of biomarkers related to weight gain and to characterize the phenotype of this population. Samples were analyzed by mass spectrometry, and the most discriminant features were determined by a machine learning approach using the Random Forest algorithm. Five biomarkers related to the pathogenesis and chronicity of inflammation in weight gain were identified. Two metabolites of arachidonic acid were upregulated in the case group, indicating the presence of inflammation, as were two other molecules related to dysfunctions in the nitric oxide (NO) cycle and to increased superoxide production. Finally, a fifth case group marker observed in this study may indicate the trigger for diabetes in overweight and obese individuals. The use of mass spectrometry combined with machine learning to prospect and characterize biomarkers associated with weight gain will pave the way for elucidating potential therapeutic and prognostic targets.
Keywords: biomarkers; machine learning; metabolomics; obesity; random forest
Year: 2020 PMID: 32039191 PMCID: PMC6993102 DOI: 10.3389/fbioe.2020.00006
Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN: 2296-4185
Figure 1. Diagnosis classifier training and testing procedure. Using data from different patients, we train the proposed method and rank the most important features through an analysis of feature distribution. A classifier is then trained with the selected features of interest, yielding a diagnosis classifier ready to evaluate data from other patients.
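The Figure 1 workflow can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' code: synthetic data from `make_classification` stands in for the mass spectra, the helpers `rank_features` and `train_diagnosis_classifier` are hypothetical, and ranking uses the forest's impurity-based `feature_importances_` rather than the feature-distribution analysis described in the caption. The defaults of 18 features and 58 trees mirror the values reported in the results tables.

```python
# Sketch of the Figure 1 workflow (assumptions: synthetic data, impurity-based
# importances for ranking, hypothetical helper names).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split


def rank_features(X, y, n_trees=58, seed=0):
    """Return feature indices ordered from most to least important."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    return np.argsort(forest.feature_importances_)[::-1]


def train_diagnosis_classifier(X, y, k=18, n_trees=58, seed=0):
    """Rank features, keep the k best, and retrain a forest on them only."""
    top_k = rank_features(X, y, n_trees, seed)[:k]
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    clf.fit(X[:, top_k], y)
    return clf, top_k


# 180 synthetic "patients" (90 per class) with 200 hypothetical ion intensities.
X, y = make_classification(n_samples=180, n_features=200, n_informative=10,
                           flip_y=0.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Ranking and training use only the training split.
clf, top_k = train_diagnosis_classifier(X_train, y_train, k=18)
test_f1 = f1_score(y_test, clf.predict(X_test[:, top_k]))
print(f"features kept: {len(top_k)}, test F1 = {test_f1:.3f}")
```

Ranking only on the training split keeps the held-out patients out of feature selection, so the test score is not optimistically biased.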
Figure 2. Identification of potential biomarkers through the proposed machine-learning process, using the most discriminant features as a proxy.
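Statistical metrics used to evaluate classification results (TP, true positives; TN, true negatives; FP, false positives; FN, false negatives).
| Metric | Formula |
|---|---|
| Accuracy | (TP + TN) / (TP + TN + FP + FN) |
| Sensitivity | TP / (TP + FN) |
| Specificity | TN / (TN + FP) |
| Precision | TP / (TP + FP) |
| F1-score | 2 · (Precision · Sensitivity) / (Precision + Sensitivity) |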
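The five metrics reported in the results tables follow their standard confusion-matrix definitions, which can be checked with a few lines of Python. A minimal sketch: `classification_metrics` is a hypothetical helper, and the confusion-matrix counts below are not reported in the source. They are one assumed set (36 cases, 36 controls) that happens to reproduce the final-test percentages of the 18-feature diagnosis classifier.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, specificity, precision and F1, in percent."""
    sensitivity = tp / (tp + fn)                 # recall / true-positive rate
    specificity = tn / (tn + fp)                 # true-negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    values = {"accuracy": accuracy, "sensitivity": sensitivity,
              "specificity": specificity, "precision": precision, "f1": f1}
    return {name: round(100 * value, 1) for name, value in values.items()}


# Hypothetical counts for a balanced test set (36 cases, 36 controls);
# NOT taken from the source, only chosen to match its final-test percentages.
metrics = classification_metrics(tp=34, fn=2, tn=28, fp=8)
print(metrics)
# {'accuracy': 86.1, 'sensitivity': 94.4, 'specificity': 77.8, 'precision': 81.0, 'f1': 87.2}
```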
Anthropometric data of subjects (mean and standard deviation, SD).
| Parameter | Group | All, mean | All, SD | Male, mean | Male, SD | Female, mean | Female, SD |
|---|---|---|---|---|---|---|---|
| No. of patients | Control | 90 | | 31 | | 59 | |
| | Case | 90 | | 18 | | 72 | |
| Age (years) | Control | 35.1 | 12.5 | 34.5 | 13.2 | 35.4 | 12.2 |
| | Case | 39.0 | 15.1 | 44.3 | 17.2 | 37.7 | 14.4 |
| Weight (kg) | Control | 64.0 | 11.1 | 75.3 | 8.2 | 57.8 | 6.7 |
| | Case | 88.6 | 18.9 | 103.5 | 22.5 | 84.8 | 16.0 |
| BMI (kg/m²) | Control | 22.3 | 2.8 | 24.7 | 2.3 | 21.0 | 2.0 |
| | Case | 33.3 | 6.5 | 34.4 | 9.3 | 33.0 | 5.7 |
| Body fat percentage (%) | Case | 40.2 | 7.6 | 33.7 | 5.9 | 41.7 | 7.2 |
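In the anthropometric table, each row lists an overall mean and SD followed by two subgroup means and SDs (31 and 59 controls, 18 and 72 cases). A short check supports this reading: the overall mean should equal the subgroup means weighted by subgroup size. The helper `pooled_mean` is hypothetical; the numbers are taken from the table.

```python
def pooled_mean(groups):
    """Size-weighted mean of subgroup means; groups is a list of (n, mean)."""
    total = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total


# (subgroup size, subgroup mean) pairs from the anthropometric table.
age_control = pooled_mean([(31, 34.5), (59, 35.4)])   # reported overall: 35.1
age_case = pooled_mean([(18, 44.3), (72, 37.7)])      # reported overall: 39.0
bmi_control = pooled_mean([(31, 24.7), (59, 21.0)])   # reported overall: 22.3
bmi_case = pooled_mean([(18, 34.4), (72, 33.0)])      # reported overall: 33.3
print(round(age_control, 1), round(age_case, 1),
      round(bmi_control, 1), round(bmi_case, 1))
```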
Figure 3. Number of trees given by the grid-search procedure as a function of vector length. Cross marks denote the values evaluated during the grid search. The red line is the function used later in the method to compute the number of trees during the training stage that determines the most discriminant features.
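The grid search behind Figure 3 might look like the sketch below. Everything here is an assumption: the synthetic data, the candidate tree counts, using the first columns as a stand-in for the top-ranked features, F1 as the selection criterion, and a straight line as the fitted function (the source does not state the form of the red curve). `trees_for` is a hypothetical helper.

```python
# Sketch of a tree-count grid search per vector length (all values hypothetical).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=180, n_features=60, n_informative=8,
                           flip_y=0.0, random_state=0)

vector_lengths = [10, 20, 40, 60]                 # candidate vector lengths
tree_grid = {"n_estimators": [10, 30, 60, 120]}   # candidate numbers of trees

best_trees = []
for length in vector_lengths:
    # The first `length` columns stand in for the top-ranked features.
    search = GridSearchCV(RandomForestClassifier(random_state=0), tree_grid,
                          scoring="f1", cv=5)
    search.fit(X[:, :length], y)
    best_trees.append(search.best_params_["n_estimators"])

# Fit n_trees ~ slope * length + intercept through the grid-search optima.
slope, intercept = np.polyfit(vector_lengths, best_trees, deg=1)


def trees_for(length):
    """Hypothetical helper: number of trees for a vector length (at least 1)."""
    return max(1, int(round(slope * length + intercept)))


print(best_trees, trees_for(18))
```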
Figure 4. Search for the most discriminant features (ions) by reducing the length of the spectral data vector and analyzing how the F1-score is affected. The best classifier was obtained with 18 ranked features, including the 5 markers (discriminant features corresponding to ions that are more prevalent in positive samples) described in Table 4.
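The vector-length search in Figure 4 can be approximated as follows: rank the features once with a Random Forest, then score shrinking top-k subsets by cross-validated F1-score and keep the best k. This is a sketch under assumptions; the synthetic data, the candidate lengths, and 5-fold cross-validation are hypothetical choices, while 58 trees mirrors the results tables.

```python
# Sketch of a top-k feature search driven by cross-validated F1 (hypothetical data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=180, n_features=100, n_informative=10,
                           flip_y=0.0, random_state=0)

# Rank all features once by Random Forest importance (most important first).
forest = RandomForestClassifier(n_estimators=58, random_state=0).fit(X, y)
ranking = np.argsort(forest.feature_importances_)[::-1]

# Shrink the vector to the k top-ranked features and record the mean CV F1-score.
scores = {}
for k in [100, 50, 30, 18, 10, 5]:
    clf = RandomForestClassifier(n_estimators=58, random_state=0)
    scores[k] = cross_val_score(clf, X[:, ranking[:k]], y, cv=5,
                                scoring="f1").mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

Because the ranking here is computed on all samples before cross-validation, the scores are optimistic; a stricter variant would re-rank inside each training fold.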
The 18 most discriminant features found by the Random Forest analysis, highlighting the eight markers found herein (Marker = Yes).
| Rank | Marker | m/z | Rank | Marker | m/z |
|---|---|---|---|---|---|
| 1 | Yes | 299 | 10 | Yes | 283 |
| 2 | No | 673 | 11 | Yes | 389 |
| 3 | Yes | 278 | 12 | No | 252 |
| 4 | No | 672 | 13 | No | 656 |
| 5 | Yes | 263 | 14 | No | 395 |
| 6 | No | 379 | 15 | Yes | 270 |
| 7 | Yes | 304 | 16 | No | 315 |
| 8 | No | 657 | 17 | No | 462 |
| 9 | No | 250 | 18 | Yes | 308 |
Classification results of the validation tests and of the final test of the diagnosis classifier using the 18 most discriminant features.
| Parameter | Validation, mean | Validation, SD | Final test |
|---|---|---|---|
| Vector length | 18 | | 18 |
| Number of trees | 58 | | 58 |
| Accuracy (%) | 96.2 | 3.0 | 86.1 |
| Sensitivity (%) | 96.6 | 5.2 | 94.4 |
| Specificity (%) | 95.8 | 3.5 | 77.8 |
| Precision (%) | 95.9 | 3.3 | 81.0 |
| F1-score (%) | 96.2 | 3.2 | 87.2 |
Figure 5. Distribution analysis of feature X299 (m/z = 299).
Validation results of 10 experiments with Random Forest classifiers trained with the eight markers.
| Parameter | Mean | SD |
|---|---|---|
| Vector length | 8 | |
| Number of trees | 58 | |
| Accuracy (%) | 90.9 | 4.0 |
| Sensitivity (%) | 93.5 | 3.9 |
| Specificity (%) | 88.4 | 7.3 |
| Precision (%) | 89.5 | 6.2 |
| F1-score (%) | 91.3 | 3.6 |
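The F1-score is the harmonic mean of precision and sensitivity. Applying that formula to the mean precision and sensitivity reported in the two validation tables gives 96.2 for the 18-feature classifier (matching the reported mean) and 91.5 for the eight-marker classifier (reported: 91.3). A likely explanation, assumed here rather than stated in the source, is that the tables average per-run F1-scores, and the mean of a nonlinear function is not the function of the means. `f1_from` is a hypothetical helper.

```python
def f1_from(precision, sensitivity):
    """Harmonic mean of precision and sensitivity (both in percent)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Mean precision and sensitivity from the two validation tables.
eighteen = f1_from(95.9, 96.6)   # reported mean F1-score: 96.2
eight = f1_from(89.5, 93.5)      # reported mean F1-score: 91.3
print(round(eighteen, 1), round(eight, 1))  # 96.2 91.5
```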
Comparison of validation results for the 18 most discriminant features using different classifiers.
| Metric | Mean | SD | Mean | SD | Mean | SD | Mean | SD |
|---|---|---|---|---|---|---|---|---|
| Accuracy (%) | 96.6 | 2.3 | 97.2 | 2.3 | 96.5 | 3.3 | 96.2 | 3.5 |
| Sensitivity (%) | 97.4 | 4.5 | 98.6 | 3.0 | 97.1 | 5.0 | 95.9 | 5.7 |
| Specificity (%) | 95.8 | 5.0 | 95.8 | 5.0 | 95.8 | 4.8 | 96.5 | 3.7 |
| Precision (%) | 96.2 | 4.3 | 96.2 | 4.3 | 96.0 | 4.7 | 96.6 | 3.7 |
| F1-score (%) | 96.6 | 2.2 | 97.3 | 2.1 | 96.5 | 3.5 | 96.1 | 3.7 |
SVM with two different optimization algorithms: SMO, Sequential Minimal Optimization (Fan et al., .
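A classifier comparison of this kind could be set up with scikit-learn's `cross_validate`. The classifier choices below are illustrative assumptions, not necessarily those in the table: `SVC` wraps LIBSVM, whose solver is an SMO-type algorithm, and `LinearSVC` wraps LIBLINEAR. The data are synthetic, and specificity is computed as the recall of the negative (control) class.

```python
# Sketch: compare classifiers on an 18-feature matrix with 10-fold CV
# (data and classifier choices are assumptions).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=180, n_features=18, n_informative=8,
                           flip_y=0.0, random_state=0)

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=58, random_state=0),
    "svm_libsvm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "svm_liblinear": make_pipeline(StandardScaler(), LinearSVC(dual=False)),
}
# Specificity is the recall of the negative class (label 0 here).
scoring = {
    "accuracy": "accuracy",
    "sensitivity": "recall",
    "specificity": make_scorer(recall_score, pos_label=0),
    "precision": "precision",
    "f1": "f1",
}

results = {}
for name, clf in classifiers.items():
    cv = cross_validate(clf, X, y, cv=10, scoring=scoring)
    results[name] = {m: (cv[f"test_{m}"].mean(), cv[f"test_{m}"].std())
                     for m in scoring}

for name, res in results.items():
    print(name, {m: f"{mean:.3f} ± {sd:.3f}" for m, (mean, sd) in res.items()})
```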
Markers selected by Random Forest from the plasma analysis of the case group.
| Marker | Experimental m/z | Fragments | Adduct | Theoretical m/z | Error (ppm) | | Identification |
|---|---|---|---|---|---|---|---|
| 278 | 278.0655 | 232–260–236–246 | [M+K]+ | 278.0650 | 1.79 | 65,872 | Dihydrobiopterin |
| 299 | 299.2022 | 263–271–213–281 | [M+H-2H2O]+ | 299.2017 | 1.67 | 3,466 | PGB2 |
| 308 | 308.1571 | 290–248–209 | [M+NH4]+ | 308.1565 | 1.94 | 389 | Argininosuccinate |
| 389 | 389.1942 | 353–319–371–285 | [M+Na]+ | 389.1935 | 1.79 | 36,247 | Carboxy-LTB4 |
| 263 | 263.0885 | 245 [227]* | [M+Na]+ | 263.0890 | −1.90 | 45,041 | CMPF |
* MS3.
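The error column of the marker table appears to be the mass accuracy in parts per million, (experimental m/z − theoretical m/z) / theoretical m/z × 10⁶. The sketch below recomputes it from the table's own values; the results agree with the reported errors to within about 0.01 ppm (the reported values look truncated rather than rounded, though that is an inference). `ppm_error` is a hypothetical helper.

```python
def ppm_error(experimental_mz, theoretical_mz):
    """Mass accuracy in parts per million (signed)."""
    return (experimental_mz - theoretical_mz) / theoretical_mz * 1e6


# (experimental m/z, theoretical m/z, reported error in ppm) from the marker table.
markers = {
    "Dihydrobiopterin": (278.0655, 278.0650, 1.79),
    "PGB2": (299.2022, 299.2017, 1.67),
    "Argininosuccinate": (308.1571, 308.1565, 1.94),
    "Carboxy-LTB4": (389.1942, 389.1935, 1.79),
    "CMPF": (263.0885, 263.0890, -1.90),
}
for name, (exp_mz, theo_mz, reported) in markers.items():
    print(f"{name}: {ppm_error(exp_mz, theo_mz):.2f} ppm (reported {reported})")
```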
Figure 6. The imbalance of cofactors and substrates in the NO cycle, together with the oxidation of tetrahydrobiopterin (TH4) to dihydrobiopterin, leads to NOS uncoupling and increased superoxide production. This state of oxidative stress induces and increases inflammatory mediators, such as arachidonic acid metabolites. These, in turn, can induce reactive oxygen species (ROS) and the oxidation of TH4, exacerbating the inflammatory state and making it chronic in obesity. FFA, free fatty acids; TCA, tricarboxylic acid; NADPH, nicotinamide adenine dinucleotide phosphate; NOS, nitric oxide synthase; AA, arachidonic acid; PGB2, prostaglandin B2; Carboxy-LTB4, carboxy-leukotriene B4.