Xin Hu, Jie Wang, Yingjiao Ju, Xiuli Zhang, Wushou'er Qimanguli, Cuidan Li, Liya Yue, Bahetibieke Tuohetaerbaike, Ying Li, Hao Wen, Wenbao Zhang, Changbin Chen, Yefeng Yang, Jing Wang, Fei Chen.
Abstract
BACKGROUND: Tuberculosis (TB) was the leading cause of death from an infectious disease worldwide for years (2014-2019) until the COVID-19 pandemic, and it remains one of the top 10 causes of death worldwide. One important reason for the large number of TB cases and deaths is the difficulty of diagnosing TB precisely with common detection methods, especially in smear-negative pulmonary tuberculosis (SNPT) cases. The rapid development of metabolomics and machine learning offers a great opportunity for the precision diagnosis of TB. However, the metabolite biomarkers for the precision diagnosis of smear-positive and smear-negative pulmonary tuberculosis (SPPT/SNPT) remain to be uncovered. In this study, we combined metabolomics and clinical indicators with machine learning to screen for new diagnostic biomarkers for the precise identification of SPPT and SNPT patients.
Keywords: Diagnostic biomarkers; Machine learning; Metabolite; Metabolome; Mycobacterium tuberculosis (Mtb); Random forest; Smear-positive/negative pulmonary tuberculosis; Tuberculosis (TB)
Year: 2022 PMID: 36008772 PMCID: PMC9403968 DOI: 10.1186/s12879-022-07694-8
Source DB: PubMed Journal: BMC Infect Dis ISSN: 1471-2334 Impact factor: 3.667
Baseline characteristics of SPPT and SNPT patients
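| Characteristic | Total (N = 100) | SPPT (N = 27) | SNPT (N = 37) | Control (N = 36) | *Adjusted p-value (SPPT/Ctrl) | *Adjusted p-value (SNPT/Ctrl) | *Adjusted p-value (SPPT/SNPT) | χ2, F or H value (SPPT/SNPT/Ctrl) | p-value (SPPT/SNPT/Ctrl) |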
|---|---|---|---|---|---|---|---|---|---|
| Gender (%) | | | | | 0.162 | 0.43 | 0.449 | 4.868$ | 0.088 |
| Male | 52 (52.0) | 18 (66.7) | 20 (54.1) | 14 (38.9) | |||||
| Female | 48 (48.0) | 9 (33.3) | 17 (45.9) | 22 (61.1) | |||||
| Age (years, median [Q1–Q3]) | 53.50 (35.00–67.25) | 51.00 (32.50–71.00) | 60.00 (49.00–71.00) | 43.50 (34.00–59.25) | 0.087 | 0.011 | 0.315 | 6.369 | 0.041 |
| Occupations (%) | | | | | – | – | 0.934 | – | – |
| Farmer | 54 (54.0) | 23 (85.2) | 31 (83.8) | 0 (0.0) | |||||
| Retiree | 11 (11.0) | 1 (3.7) | 1 (2.7) | 9 (25.0) | |||||
| Student | 4 (4.0) | 1 (3.7) | 3 (8.1) | 0 (0.0) | |||||
| Other | 6 (6.0) | 2 (7.4) | 2 (5.4) | 2 (5.6) | |||||
| (Missing value) | 25 (25.0) | 0 (0.0) | 0 (0.0) | 25 (69.4) | |||||
| Marital status (%) | | | | | – | – | 0.183 | – | – |
| Single | 7 (7.0) | 5 (18.5) | 2 (5.4) | 0 (0.0) | |||||
| Married | 92 (92.0) | 22 (81.5) | 35 (94.6) | 35 (97.2) | |||||
| (Missing value) | 1 (1.0) | 0 (0.0) | 0 (0.0) | 1 (2.8) | |||||
| BMI (kg/m², mean [SD]) | 23.19 (4.44) | 20.22 (3.95) | 22.82 (3.93) | 25.33 (4.13) | < 0.001 | 0.022 | 0.027 | 10.02# | < 0.001 |
| Smoking status (%) | | | | | – | – | – | – | – |
| Never | 79 (79.0) | 24 (88.9) | 34 (91.9) | 21 (58.3) | |||||
| Current | 11 (11.0) | 3 (11.1) | 2 (5.4) | 6 (16.7) | |||||
| Former | 1 (1.0) | 0 (0.0) | 1 (2.7) | 0 (0.0) | |||||
| (Missing value) | 9 (9.0) | 0 (0.0) | 0 (0.0) | 9 (25.0) | |||||
| Drinking status (%) | | | | | – | – | – | – | – |
| Never | 80 (80.0) | 25 (92.6) | 37 (100.0) | 18 (50.0) | |||||
| Current | 11 (11.0) | 2 (7.4) | 0 (0.0) | 9 (25.0) | |||||
| Former | 1 (1.0) | 0 (0.0) | 0 (0.0) | 1 (2.8) | |||||
| (Missing value) | 8 (8.0) | 0 (0.0) | 0 (0.0) | 8 (22.2) | |||||
| TB contact (%) | | | | | – | – | < 0.001 | – | – |
| Yes | 44 (68.8) | 8 (29.6) | 36 (97.3) | – | |||||
| No | 16 (25.0) | 16 (59.3) | 0 (0.0) | – | |||||
| (Missing value) | 4 (6.3) | 3 (11.1) | 1 (2.7) | ||||||
| TB treatment (%) | | | | | – | – | 0.836 | – | – |
| New cases of TB | 17 (26.6) | 7 (25.9) | 10 (27.0) | – | |||||
| Previously treated | 41 (64.1) | 14 (51.9) | 27 (73.0) | – | |||||
| (Missing value) | 6 (9.4) | 6 (22.2) | 0 (0.0) | ||||||
| Cavitary pulmonary TB (%) | 33 (51.6) | 19 (70.4) | 14 (37.8) | – | – | – | 0.02 | – | |
| Symptoms (%) | |||||||||
| Cough | 60 (92.3) | 22 (81.5) | 37 (100.0) | – | – | – | 0.024 | – | – |
| Expectoration | 60 (92.3) | 22 (81.5) | 37 (100.0) | – | – | – | 0.024 | – | – |
| Dyspnea | 35 (54.7) | 6 (22.2) | 29 (78.4) | – | – | – | < 0.001 | – | – |
| Chest discomfort | 13 (20.3) | 5 (18.5) | 8 (21.6) | – | – | – | 1 | – | – |
| Fever | 5 (7.8) | 3 (11.1) | 2 (5.4) | – | – | – | 0.713 | – | – |
| Hemoptysis | 2 (3.1) | 1 (3.7) | 1 (2.7) | – | – | – | 1 | – | – |
| Chest pain | 2 (3.1) | 1 (3.7) | 1 (2.7) | – | – | – | 1 | – | – |
| Nausea | 1 (1.6) | 0 (0.0) | 1 (2.7) | – | – | – | – | – | – |
| Fatigue | 2 (3.1) | 2 (7.4) | 0 (0.0) | – | – | – | 0.174 | – | – |
| Night sweats | 1 (1.6) | 0 (0.0) | 1 (2.7) | – | – | – | – | – | – |
| Short of breath | 3 (4.7) | 3 (11.1) | 0 (0.0) | – | – | – | – | – | – |
BMI: body mass index. Data are shown as n (%), mean (SD) or median (Q1–Q3). p-values are calculated after exclusion of missing data for that variable. *Adjusted p-value for multiple comparisons using Bonferroni-Holm correction. SD: standard deviation; (Q1–Q3): first quartile–third quartile (25th–75th percentile). $Chi-square test; #One-way ANOVA.
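The footnote above attributes the adjusted p-values to a Bonferroni-Holm correction. The following is a minimal sketch of that step-down procedure, assuming one family of three pairwise comparisons per variable. The function name `holm_adjust` and the raw p-values are hypothetical illustrations, not values from the study.

```python
def holm_adjust(p_values):
    """Return Holm-Bonferroni adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values, smallest p-value first.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for three pairwise comparisons (illustration only).
raw = [0.010, 0.040, 0.030]
adjusted = holm_adjust(raw)
print(adjusted)  # approximately [0.03, 0.06, 0.06]
```

The monotonicity step is why two comparisons can end up with the same adjusted value, as in the last two entries here.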
Clinical indicators of SPPT and SNPT patients
| Indicator | Normal range | SPPT: patients (n) | SPPT: median (Q1–Q3) | SNPT: patients (n) | SNPT: median (Q1–Q3) | Control: patients (n) | Control: median (Q1–Q3) | *Adjusted p-value (SPPT/Ctrl) | *Adjusted p-value (SNPT/Ctrl) | *Adjusted p-value (SPPT/SNPT) | H value (SPPT/SNPT/Ctrl) | p-value (SPPT/SNPT/Ctrl) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Blood routine | ||||||||||||
| Leucocytes, × 10^9/L | 3.5–9.5 | 27 | 7.20 (5.89–8.61) | 37 | 7.65 (6.32–8.74) | 36 | 6.49 (5.66–7.54) | 0.009 | 0.032 | 0.413 | 6.86 | 0.032 |
| Neutrophils, × 10^9/L | 1.8–6.3 | 10 | 7.83 (4.62–11.61) | 37 | 5.10 (3.92–5.94) | 36 | 3.82 (3.18–4.51) | < 0.001 | < 0.001 | 0.068 | 19.6 | < 0.001 |
| Erythrocytes, × 10^12/L | 4.3–5.8 | 27 | 4.28 (3.88–4.66) | 37 | 4.78 (4.35–4.98) | 36 | 4.77 (4.46–5.25) | < 0.001 | 0.682 | 0.003 | 14.06 | 0.001 |
| Hemoglobin, g/L | 130–175 | 27 | 117.00 (104.50–134.00) | 37 | 139.00 (132.00–151.00) | 36 | 140.00 (132.00–151.50) | < 0.001 | 0.961 | < 0.001 | 29.01 | < 0.001 |
| Platelets, × 10^9/L | 125–300 | 26 | 305.50 (238.50–371.75) | 37 | 225.00 (201.00–279.00) | 36 | 265.00 (234.50–348.25) | 0.15 | 0.003 | 0.003 | 10.31 | 0.006 |
| Eosinophils, × 10^9/L | 0.02–0.52 | 27 | 0.12 (0.04–0.34) | 37 | 0.15 (0.10–0.22) | 36 | 0.11 (0.07–0.24) | 0.12 | 0.174 | 0.12 | 0.61 | 0.739 |
| Basophils, × 10^9/L | 0–0.06 | 27 | 0.02 (0.00–0.07) | 37 | 0.02 (0.01–0.05) | 36 | 0.02 (0.01–0.02) | < 0.001 | 0.005 | < 0.001 | 0.73 | 0.693 |
| Blood biochemistry | ||||||||||||
| Total protein, g/L | 65–85 | 27 | 66.50 (58.75–0.30) | 37 | 66.30 (62.70–68.70) | 34 | 73.40 (70.93–77.18) | < 0.001 | < 0.001 | 0.402 | 31.68 | < 0.001 |
| Albumin, g/L | 40–55 | 27 | 35.30 (31.00–38.75) | 37 | 39.20 (36.00–43.00) | 34 | 44.59 (42.73–45.66) | < 0.001 | < 0.001 | 0.002 | 47.21 | < 0.001 |
| Globulin, g/L | 20–40 | 16 | 30.65 (26.00–35.75) | 37 | 26.40 (24.00–29.30) | 34 | 28.53 (26.36–32.54) | 0.189 | 0.087 | 0.043 | 7.63 | 0.022 |
| Triglyceride, mmol/L | 0.5–1.9 | 25 | 0.99 (0.80–1.22) | 35 | 1.14 (0.89–1.60) | 35 | 1.08 (0.76–1.63) | 0.83 | 0.696 | 0.696 | 1.8 | 0.408 |
| Total cholesterol, mmol/L | 2.3–5.2 | 25 | 3.45 (2.87–3.69) | 35 | 3.63 (3.27–4.58) | 35 | 4.15 (3.76–4.98) | < 0.001 | 0.048 | 0.302 | 12.12 | 0.002 |
| ASP, IU/L | 9–60 | 27 | 23.00 (18.00–36.50) | 37 | 21.00 (18.00–31.00) | 35 | 19.80 (17.65–22.30) | 0.014 | 0.043 | 0.508 | 3.75 | 0.153 |
| ALT, IU/L | 9–50 | 27 | 21.50 (12.55–43.15) | 37 | 19.00 (15.00–29.00) | 35 | 24.20 (15.20–30.40) | 0.269 | 0.269 | 0.586 | 0.51 | 0.776 |
| AKP, IU/L | 45–125 | 27 | 68.00 (53.65–102.50) | 37 | 98.00 (74.00–129.00) | 32 | 76.85 (65.80–89.80) | 0.155 | 0.005 | 0.005 | 10.13 | 0.006 |
| γ-GT, IU/L | 10–60 | 27 | 33.70 (22.00–55.00) | 37 | 26.00 (20.00–59.00) | 32 | 21.50 (15.75–33.50) | 0.189 | 0.189 | 0.559 | 3.79 | 0.15 |
| Creatinine, μmol/L | 57–97 | 27 | 55.20 (44.80–70.70) | 37 | 54.00 (47.00–72.00) | 36 | 65.77 (60.15–74.81) | 0.025 | 0.022 | 0.961 | 7.93 | 0.019 |
| Total bilirubin, μmol/L | 0–26 | 27 | 9.60 (7.90–11.85) | 37 | 11.93 (9.20–19.60) | 34 | 12.13 (10.27–13.95) | 0.042 | 0.123 | 0.042 | 6.61 | 0.037 |
| Direct bilirubin, μmol/L | 0–8 | 27 | 3.39 (2.30–4.50) | 37 | 2.18 (0.30–3.30) | 6 | 2.31 (2.12–3.82) | 0.421 | < 0.001 | < 0.001 | 6.48 | 0.039 |
| Indirect bilirubin, μmol/L | 0–14 | 15 | 6.30 (4.23–8.37) | 37 | 7.90 (6.11–12.15) | 6 | 6.40 (6.20–6.55) | 0.402 | 0.115 | 0.115 | 4.53 | 0.104 |
| Inflammatory-related biomarkers | ||||||||||||
| ESR, mm/h | 0–15 | 26 | 67.50 (43.75–94.25) | 35 | 43.00 (14.00–62.00) | 0 | – | – | – | 0.003 | – | |
| C-reactive protein, mg/L | 0–4 | 27 | 16.53 (9.69–65.08) | 33 | 1.67 (0.80–3.67) | 0 | – | – | – | < 0.001 | – | |
| Procalcitonin, ng/mL | 0–0.05 | 15 | 0.10 (0.07–0.36) | 34 | 0.02 (0.01–0.11) | 0 | – | – | – | 0.013 | – | |
Data are shown as median (Q1–Q3). Missing data of variables are omitted here and shown in Additional file 1: Table S1. *Adjusted p-value for multiple comparisons using Bonferroni-Holm correction. SD: standard deviation; (Q1–Q3): first quartile–third quartile (25th–75th percentile). ASP: aspartate aminotransferase; ALT: alanine aminotransferase; AKP: alkaline phosphatase; γ-GT: γ-glutamyl transpeptidase; ESR: erythrocyte sedimentation rate.
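The "H value" column in the table above is presumably the Kruskal-Wallis H statistic for the three-group comparison; the text here does not name the test, so that reading is an assumption. The following is a minimal pure-Python sketch of how H is computed with average ranks and the standard tie correction. The function name `kruskal_wallis_h` and the sample values are hypothetical.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with tie correction for two or more groups."""
    # Pool all observations, remembering which group each one came from.
    pooled = sorted((value, g) for g, group in enumerate(groups) for value in group)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        rs ** 2 / len(group) for rs, group in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical measurements for three groups (illustration only).
h_value = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
print(round(h_value, 2))  # 7.2
```

A p-value would then come from comparing H against a chi-square distribution with (number of groups - 1) degrees of freedom, which is left out of this sketch.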
Fig. 1 Plasma metabolomic analysis for the SPPT patients, SNPT patients and controls. A Heatmap showing 103 differentially abundant metabolites (DAMs, VIP > 1, p < 0.05) among the three groups. The colored bar above the heatmap represents the SPPT (red), SNPT (orange) and Ctrl (green) samples. The color key indicates the scaled expression levels of the 103 metabolites for the three groups. B Venn diagram showing the differential metabolites among the three groups. C Pie chart showing the chemical classification of the 103 significantly differentially abundant metabolites according to the HMDB database
Fig. 2 Scatter plots showing the significantly enriched metabolic pathways among the three groups. The size and color of circles indicate the impact score and p-value of the enriched pathways, respectively
Fig. 3 Principal component analyses of clinical indicators (A), DAMs (B) and their combination (C) among SPPT, SNPT and controls. The color key indicates the contribution of the top 5 variables from high (reddish arrows) to low contribution (bluish arrows)
Fig. 4 Importance of the screened features for distinguishing SPPT and SNPT patients from controls. A Importance of the clinical and metabolic features from different optimized combinations for precise binary classification of SPPT/Ctrl, SNPT/Ctrl and SPPT/SNPT groups (from top to bottom) using the random forest model. B Importance of the clinical and metabolic features from the four optimized combinations for simultaneous classification of SPPT, SNPT and Ctrl groups
Classification performance of binary classifications with selected feature combinations on test sets
| Model and comparison (feature set) | Accuracy | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|
| RF | |||||
| SPPT/Ctrl (F1) | 83.33% | 80.00% | 85.71% | 80.00% | 85.71% |
| SNPT/Ctrl (F2) | 92.86% | 85.71% | 100% | 100% | 87.50% |
| SPPT/SNPT (F3) | 83.33% | 80.00% | 85.71% | 80.00% | 85.71% |
| SVM | |||||
| SPPT/Ctrl (F1) | 91.67% | 80.00% | 100% | 100% | 87.50% |
| SNPT/Ctrl (F2) | 92.86% | 85.71% | 100% | 100% | 87.50% |
| SPPT/SNPT (F3) | 91.67% | 80.00% | 100% | 100% | 87.50% |
| MLP | |||||
| SPPT/Ctrl (F1) | 83.33% | 60.00% | 100% | 100% | 77.78% |
| SNPT/Ctrl (F2) | 92.86% | 85.71% | 100% | 100% | 87.50% |
| SPPT/SNPT (F3) | 91.67% | 80.00% | 100% | 100% | 87.50% |
F1 set: albumin and 9-OxoODE; F2 set: l-Pyroglutamic acid, Enterostatin human and 9-OxoODE; F3 set: Val-Ser, Methoxyacetic acid and Ethyl 3-hydroxybutyrate
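The five metrics in the table above all follow from the four cells of a binary confusion matrix. The sketch below shows the arithmetic. The counts are back-calculated as small integers that reproduce the reported percentages for the SPPT/Ctrl (F1) rows, assuming a 12-sample test set with SPPT as the positive class; they are an assumption for illustration, not counts given in the source, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute standard diagnostic metrics from confusion-matrix counts.

    Raises ZeroDivisionError if a denominator is zero (for example, no
    positive predictions), which this sketch does not handle.
    """
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }


# Assumed counts consistent with the RF row for SPPT/Ctrl (F1).
rf_f1 = binary_metrics(tp=4, fn=1, tn=6, fp=1)
# Assumed counts consistent with the SVM row for SPPT/Ctrl (F1).
svm_f1 = binary_metrics(tp=4, fn=1, tn=7, fp=0)

for name, metrics in (("RF", rf_f1), ("SVM", svm_f1)):
    print(name, {key: f"{value:.2%}" for key, value in metrics.items()})
```

With test sets this small, a single reclassified sample moves accuracy by about 8 percentage points, which is worth keeping in mind when comparing the rows.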
Fig. 5 Confusion matrices for discriminating SPPT, SNPT and controls with the F4 set in the test sets. Confusion matrices from left to right show the classification performance for the SPPT/SNPT/Ctrl groups in the test sets using the RF, SVM and MLP models, respectively. F4 set: 9-OxoODE, PGA, Val-Ser, Ethyl 3-hydroxybutyrate, MAA, Enterostatin human, DL-Norvaline, His-Pro and Eicosapentaenoic acid
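A three-class confusion matrix of the kind described in this caption can be tallied directly from true and predicted labels. The sketch below is illustrative only: the label lists are made up, the helper names are hypothetical, and none of the numbers come from the figure.

```python
LABELS = ["SPPT", "SNPT", "Ctrl"]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, prediction in zip(y_true, y_pred):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def per_class_sensitivity(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    return {label: matrix[i][i] / sum(matrix[i]) for i, label in enumerate(labels)}


# Made-up labels for eight test samples (illustration only).
y_true = ["SPPT", "SPPT", "SNPT", "SNPT", "SNPT", "Ctrl", "Ctrl", "Ctrl"]
y_pred = ["SPPT", "SNPT", "SNPT", "SNPT", "Ctrl", "Ctrl", "Ctrl", "Ctrl"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
sensitivity = per_class_sensitivity(matrix, LABELS)
accuracy = sum(matrix[i][i] for i in range(len(LABELS))) / len(y_true)

print(matrix)  # [[1, 1, 0], [0, 2, 1], [0, 0, 3]]
print(accuracy)  # 0.75
```

Reading along a row shows where a given true class gets misassigned, which is the main thing a multi-class confusion matrix adds over a single accuracy figure.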