Eun-Hye Kim1,2, Seunghoon Kim3,4, Hyun-Joo Kim5,6, Hyoung-Oh Jeong3,4, Jaewoong Lee3,4, Jinho Jang3,4, Ji-Young Joo5,6, Yerang Shin1, Jihoon Kang1, Ae Kyung Park2, Ju-Youn Lee5,6, Semin Lee3,4.
Abstract
Periodontitis is a widespread chronic inflammatory disease caused by interactions between periodontal bacteria and homeostasis in the host. We aimed to investigate the performance and reliability of machine learning models in predicting the severity of chronic periodontitis. Mouthwash samples from 692 subjects (144 healthy controls and 548 generalized chronic periodontitis patients) were collected, the genomic DNA was isolated, and the copy numbers of nine pathogens were measured using multiplex qPCR. The nine pathogens are as follows: Porphyromonas gingivalis (Pg), Tannerella forsythia (Tf), Treponema denticola (Td), Prevotella intermedia (Pi), Fusobacterium nucleatum (Fn), Campylobacter rectus (Cr), Aggregatibacter actinomycetemcomitans (Aa), Peptostreptococcus anaerobius (Pa), and Eikenella corrodens (Ec). By adding the species one by one in order of high accuracy to find the optimal combination of input features, we developed an algorithm that predicts the severity of periodontitis using four machine learning techniques. The accuracy was the highest when the models classified "healthy" and "moderate or severe" periodontitis (H vs. M-S, average accuracy of four models: 0.93, AUC = 0.96, sensitivity of 0.96, specificity of 0.81, and diagnostic odds ratio = 112.75). One or two red complex pathogens were used in three models to distinguish slight chronic periodontitis patients from healthy controls (average accuracy of 0.78, AUC = 0.82, sensitivity of 0.71, and specificity of 0.84, diagnostic odds ratio = 12.85). Although the overall accuracy was slightly reduced, the models showed reliability in predicting the severity of chronic periodontitis from 45 newly obtained samples. Our results suggest that a well-designed combination of salivary bacteria can be used as a biomarker for classifying between a periodontally healthy group and a chronic periodontitis group.
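The feature search described in the abstract (adding species one at a time in order of accuracy) can be illustrated with a minimal sketch. This is one possible reading of that sentence, not the authors' implementation: features are ranked by their single-feature score, added in that order, and the best-scoring prefix is kept. The function `rank_then_add`, the feature names `A` to `D`, and the scoring function `toy_score` are all hypothetical; in practice the score would be a cross-validated accuracy from one of the four models.

```python
from typing import Callable, List, Sequence, Tuple


def rank_then_add(
    features: Sequence[str],
    score: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Rank features by single-feature score, then grow the set in that order.

    Returns the best prefix, its score, and the score of every prefix tried.
    """
    # Step 1: order the candidates by how well each one performs alone.
    ranked = sorted(features, key=lambda f: score([f]), reverse=True)

    # Step 2: evaluate the first 1, 2, ..., n ranked features.
    history = []
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]
        history.append((subset, score(subset)))

    # Step 3: keep the prefix with the highest score.
    best_subset, best_score = max(history, key=lambda item: item[1])
    return best_subset, best_score, history


# Hypothetical stand-in for a cross-validated accuracy: each feature adds a
# fixed gain, and every extra feature costs a small penalty. Made-up numbers.
TOY_GAIN = {"A": 0.20, "B": 0.15, "C": 0.04, "D": 0.01}


def toy_score(subset: Sequence[str]) -> float:
    return 0.5 + sum(TOY_GAIN[f] for f in subset) - 0.03 * len(subset)


best_subset, best_score, history = rank_then_add(list(TOY_GAIN), toy_score)
print(best_subset, round(best_score, 2))
```

With the toy scores, the fourth feature lowers the score, so the search stops favoring larger sets after three features. A greedy variant that re-scores every remaining candidate at each step is an equally plausible reading of the abstract.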
Keywords: chronic periodontitis; machine learning; multiplex qPCR; salivary bacterial copy number; severity prediction; slight periodontitis
Year: 2020 PMID: 33304856 PMCID: PMC7701273 DOI: 10.3389/fcimb.2020.571515
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Figure 1. Machine learning workflow for predicting the severity of chronic periodontitis using qPCR data.
Characteristics of healthy subjects and periodontitis patients (n = 692).
| Characteristics | Healthy (n = 144) | Chronic periodontitis, slight (n = 95) | Chronic periodontitis, moderate (n = 245) | Chronic periodontitis, severe (n = 208) | P-value |
|---|---|---|---|---|---|
| | | | | | |
| Male | 70 (48.6%) | 42 (44.2%) | 126 (51.4%) | 123 (59.1%) | 0.065^a |
| Female | 74 (51.4%) | 53 (55.8%) | 119 (48.6%) | 85 (40.9%) | |
| | | | | | |
| 20-29 | 79 (54.9%) | 6 (6.3%) | 3 (1.2%) | 0 (0.0%) | <0.001^a |
| 30-39 | 42 (29.2%) | 20 (21.1%) | 18 (7.3%) | 11 (5.3%) | |
| 40-49 | 3 (2.1%) | 20 (21.1%) | 45 (18.4%) | 48 (23.1%) | |
| 50-59 | 10 (6.9%) | 26 (27.4%) | 85 (34.7%) | 97 (46.6%) | |
| ≥60 | 10 (6.9%) | 23 (24.2%) | 94 (38.4%) | 52 (25.0%) | |
| | | | | | |
| Mean ± SD | 2.46 ± 0.29 | 2.82 ± 0.40 | 3.49 ± 0.67 | 4.47 ± 1.18 | <0.001^b |
| Median (IQR) | 2.47 (2.24-2.66) | 2.80 (2.53-3.03) | 3.40 (3.05-3.79) | 4.26 (3.71-5.09) | |
| | | | | | |
| Mean ± SD | 2.43 ± 0.29 | 2.67 ± 0.36 | 3.16 ± 0.60 | 3.90 ± 0.89 | <0.001^b |
| Median (IQR) | 2.44 (2.23-2.64) | 2.68 (2.44-2.88) | 3.13 (2.82-3.47) | 3.85 (3.29-4.32) | |
| | | | | | |
| Mean ± SD | 17.00 ± 14.68 | 32.36 ± 21.33 | 46.12 ± 25.47 | 52.78 ± 25.76 | <0.001^b |
| Median (IQR) | 13.84 (7.14-22.24) | 30.95 (15.83-48.28) | 46.88 (24.50-61.61) | 50.00 (31.96-71.76) | |
| | | | | | |
| Mean ± SD | 0.08 ± 0.16 | 0.39 ± 0.40 | 0.70 ± 0.49 | 1.01 ± 0.55 | <0.001^b |
| Median (IQR) | 0.03 (0.00-0.10) | 0.27 (0.14-0.51) | 0.63 (0.29-1.05) | 0.99 (0.50-1.50) | |
| | | | | | |
| Never | 120 (83.3%) | 69 (72.6%) | 144 (58.8%) | 97 (46.6%) | <0.001^a |
| Former | 19 (13.2%) | 17 (17.9%) | 52 (21.2%) | 51 (24.5%) | |
| Daily | 5 (3.5%) | 9 (9.5%) | 49 (20.0%) | 60 (28.8%) | |
| | | | | | |
| Yes | 113 (78.5%) | 69 (72.6%) | 147 (60.0%) | 113 (54.3%) | <0.001^a |
| No | 31 (21.5%) | 26 (27.4%) | 98 (40.0%) | 95 (45.7%) | |
| | | | | | |
| ≥3 times | 114 (79.2%) | 53 (55.8%) | 116 (47.3%) | 95 (45.7%) | <0.001^a |
| ≤2 times | 30 (20.8%) | 42 (44.2%) | 129 (52.7%) | 113 (54.3%) | |

^a Chi-squared test; ^b Kruskal-Wallis one-way analysis of variance on ranks.
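As an illustration of the chi-squared test named in the footnote, the sketch below recomputes the statistic for the Male/Female counts in the table above. It assumes a plain Pearson chi-squared test of independence with no continuity correction, which the source does not specify. The helper names are illustrative, and the closed-form tail probability is valid only for 3 degrees of freedom.

```python
import math


def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-squared variable with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


# Rows: Male, Female. Columns: Healthy, Slight, Moderate, Severe (from the table).
sex_counts = [
    [70, 42, 126, 123],
    [74, 53, 119, 85],
]

stat, dof = chi_squared_statistic(sex_counts)
p_value = chi2_sf_3dof(stat)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.3f}")
```

Under these assumptions the result is close to the tabulated P-value of 0.065 for this comparison.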
Detection frequency and copy numbers of nine pathogens from clinical samples.
Detection rate: no. of subjects (%). Copy numbers: mean ± SD, median (IQR).
| Pathogens | Detection, healthy (n = 144) | Detection, slight (n = 95) | Detection, moderate (n = 245) | Detection, severe (n = 208) | P-value^a | Copy number, healthy (n = 144) | Copy number, slight (n = 95) | Copy number, moderate (n = 245) | Copy number, severe (n = 208) | P-value^b |
|---|---|---|---|---|---|---|---|---|---|---|
| | 130 (90.3%) | 90 (94.7%) | 242 (98.8%) | 205 (98.6%) | <0.001 | 3.73 ± 1.43 | 4.75 ± 1.37 | 5.51 ± 0.92 | 5.70 ± 0.90 | <0.001 |
| | 71 (49.3%) | 84 (88.4%) | 239 (97.6%) | 207 (99.5%) | <0.001 | 1.71 ± 1.58 | 3.64 ± 1.46 | 4.45 ± 0.94 | 4.78 ± 0.64 | <0.001 |
| | 101 (70.1%) | 78 (82.1%) | 236 (96.3%) | 204 (98.1%) | <0.001 | 2.45 ± 1.39 | 3.47 ± 1.72 | 4.55 ± 1.08 | 4.74 ± 0.91 | <0.001 |
| | 64 (44.4%) | 70 (73.7%) | 215 (87.8%) | 202 (97.1%) | <0.001 | 1.81 ± 2.04 | 3.52 ± 2.15 | 4.51 ± 1.76 | 5.00 ± 1.11 | <0.001 |
| | 144 (100%) | 95 (100%) | 245 (100%) | 208 (100%) | NA | 5.02 ± 0.61 | 5.30 ± 0.55 | 5.47 ± 0.48 | 5.56 ± 0.45 | <0.001 |
| | 130 (90.3%) | 94 (98.9%) | 244 (99.6%) | 208 (100%) | <0.001 | 3.52 ± 1.15 | 4.36 ± 0.74 | 4.72 ± 0.58 | 4.88 ± 0.51 | <0.001 |
| | 11 (7.6%) | 15 (15.8%) | 35 (14.3%) | 52 (25.0%) | <0.001 | 0.45 ± 0.93 | 0.59 ± 1.30 | 0.55 ± 1.28 | 0.96 ± 1.61 | 0.107 |
| | 137 (95.1%) | 89 (93.7%) | 232 (94.7%) | 206 (99.0%) | 0.055 | 3.94 ± 1.11 | 4.17 ± 1.30 | 4.53 ± 1.28 | 4.92 ± 0.87 | <0.001 |
| | 139 (96.5%) | 94 (98.9%) | 243 (99.2%) | 207 (99.5%) | 0.076 | 3.75 ± 0.85 | 4.23 ± 0.80 | 4.49 ± 0.67 | 4.51 ± 0.65 | <0.001 |

^a Chi-squared test; ^b Kruskal-Wallis one-way analysis of variance on ranks.
NA, not applicable; IQR, interquartile range.
Figure 2. Quantification of the copy numbers of periodontal pathogens in mouthwash samples from healthy controls and chronic periodontitis patients. Whisker box plots indicate the distributions of copy numbers in each group. Pg, Porphyromonas gingivalis; Tf, Tannerella forsythia; Td, Treponema denticola; Pi, Prevotella intermedia; Fn, Fusobacterium nucleatum; Cr, Campylobacter rectus; Aa, Aggregatibacter actinomycetemcomitans; Pa, Peptostreptococcus anaerobius; Ec, Eikenella corrodens. H, Healthy; Sli, Slight; M, Moderate; S, Severe. *p < 0.05, **p < 0.001, NS, not statistically significant.
Feature combinations and their predictive accuracy with different machine learning methods.
| Group | Comparison | Model | Feature combination | Accuracy | Balanced accuracy | AUC (95% CI) | Sensitivity (95% CI) | Specificity (95% CI) | Odds ratio (95% CI) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | H vs. M-S | Neural Network | | 0.93 | 0.91 | 0.96 (0.95–0.98) | 0.95 (0.92–0.98) | 0.87 (0.78–0.95) | 127.2 (67.08–241.01) |
| 1 | H vs. M-S | Random Forest | | 0.93 | 0.89 | 0.96 (0.95–0.97) | 0.96 (0.92–0.99) | 0.83 (0.75–0.91) | 117.2 (61.56–223.04) |
| 1 | H vs. M-S | Support Vector Machine | | 0.92 | 0.86 | 0.96 (0.94–0.99) | 0.97 (0.95–1.00) | 0.74 (0.65–0.83) | 92 (48.03–176.33) |
| 1 | H vs. M-S | Regularized Logistic Regression | | 0.92 | 0.88 | 0.97 (0.95–0.98) | 0.97 (0.95–0.99) | 0.78 (0.66–0.91) | 114.6 (59.16–222.12) |
| 2 | H vs. Sli-M-S | Neural Network | | 0.90 | 0.86 | 0.94 (0.91–0.97) | 0.93 (0.90–0.97) | 0.79 (0.68–0.89) | 50.0 (29.71–84.07) |
| 2 | H vs. Sli-M-S | Random Forest | | 0.91 | 0.84 | 0.94 (0.92–0.96) | 0.95 (0.91–1.00) | 0.72 (0.65–0.80) | 48.9 (28.71–83.14) |
| 2 | H vs. Sli-M-S | Support Vector Machine | | 0.89 | 0.79 | 0.94 (0.91–0.96) | 0.97 (0.93–1.00) | 0.61 (0.53–0.70) | 50.6 (27.76–92.12) |
| 2 | H vs. Sli-M-S | Regularized Logistic Regression | | 0.90 | 0.80 | 0.94 (0.91–0.96) | 0.96 (0.93–1.00) | 0.64 (0.59–0.69) | 42.7 (24.73–73.62) |
| 3 | H vs. Sli | Neural Network | | 0.80 | 0.77 | 0.82 (0.74–0.89) | 0.67 (0.55–0.80) | 0.88 (0.82–0.93) | 14.89 (7.67–28.91) |
| 3 | H vs. Sli | Random Forest | | 0.78 | 0.77 | 0.81 (0.75–0.88) | 0.71 (0.57–0.84) | 0.83 (0.76–0.89) | 12.00 (6.42–22.26) |
| 3 | H vs. Sli | Support Vector Machine | | 0.78 | 0.77 | 0.83 (0.78–0.88) | 0.72 (0.55–0.89) | 0.82 (0.77–0.86) | 11.70 (6.33–21.68) |
| 3 | H vs. Sli | Regularized Logistic Regression | | 0.79 | 0.78 | 0.82 (0.77–0.87) | 0.75 (0.58–0.91) | 0.81 (0.73–0.90) | 12.80 (6.85–23.87) |

AUC, area under the curve; CI, confidence interval.
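The odds ratios in the table above can be related to the sensitivity and specificity columns through the standard diagnostic odds ratio, DOR = (sensitivity × specificity) / ((1 − sensitivity) × (1 − specificity)). The sketch below applies that formula to the rounded Group 1 (H vs. M-S) values and averages the four tabulated odds ratios. The function name `diagnostic_odds_ratio` is illustrative, and agreement with the table does not establish how the authors actually computed their values.

```python
from statistics import mean


def diagnostic_odds_ratio(sensitivity, specificity):
    """Standard diagnostic odds ratio from sensitivity and specificity."""
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))


# Group 1 (H vs. M-S): (sensitivity, specificity, tabulated odds ratio).
group1 = {
    "Neural Network": (0.95, 0.87, 127.2),
    "Random Forest": (0.96, 0.83, 117.2),
    "Support Vector Machine": (0.97, 0.74, 92.0),
    "Regularized Logistic Regression": (0.97, 0.78, 114.6),
}

# Recompute each odds ratio from the rounded sensitivity and specificity.
recomputed = {
    model: round(diagnostic_odds_ratio(sens, spec), 1)
    for model, (sens, spec, _) in group1.items()
}

# Average the four tabulated odds ratios, as the abstract reports an average.
mean_tabulated = mean(odds for _, _, odds in group1.values())

for model, (_, _, tabulated) in group1.items():
    print(f"{model}: recomputed {recomputed[model]}, tabulated {tabulated}")
print(f"mean of tabulated odds ratios: {mean_tabulated:.2f}")
```

The mean of the four tabulated odds ratios is 112.75, the diagnostic odds ratio quoted in the abstract for H vs. M-S.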
Figure 3. Optimal accuracy and balanced accuracy based on the number of features in Group 3 (H vs. Sli). Big circles represent the feature combination with the highest accuracy in each model.
Characteristics of healthy subjects and periodontitis patients (n = 45).
| Characteristics | Healthy (n = 22) | Chronic periodontitis, slight (n = 8) | Chronic periodontitis, moderate-severe (n = 15)^c | P-value |
|---|---|---|---|---|
| | | | | |
| Male | 14 (63.6%) | 4 (50.0%) | 6 (40.0%) | 0.360^a |
| Female | 8 (36.4%) | 4 (50.0%) | 9 (60.0%) | |
| | | | | |
| 20-29 | 2 (9.1%) | 1 (12.5%) | 0 (0.0%) | <0.001^a |
| 30-39 | 17 (77.3%) | 2 (25.0%) | 0 (0.0%) | |
| 40-49 | 0 (0.0%) | 4 (50.0%) | 3 (20.0%) | |
| 50-59 | 2 (9.1%) | 0 (0.0%) | 7 (46.7%) | |
| ≥60 | 1 (4.5%) | 1 (12.5%) | 5 (33.3%) | |
| | | | | |
| Mean ± SD | 2.46 ± 0.19 | 2.92 ± 0.52^d | 3.96 ± 1.22 | <0.001^b |
| Median (IQR) | 2.49 (2.33–2.57) | 2.84 (2.43–3.31)^d | 3.72 (3.47–4.29) | |
| | | | | |
| Mean ± SD | 2.36 ± 0.19 | 2.52 ± 0.34^d | 3.36 ± 0.63 | <0.001^b |
| Median (IQR) | 2.30 (2.22–2.49) | 2.54 (2.27–2.68)^d | 3.34 (2.94–3.74) | |
| | | | | |
| Mean ± SD | 25.60 ± 11.42 | 57.04 ± 16.98^d | 53.28 ± 20.07 | <0.001^b |
| Median (IQR) | 25.89 (19.26–29.48) | 57.41 (45.69–63.39)^d | 50.00 (35.71–65.52) | |
| | | | | |
| Mean ± SD | 0.29 ± 0.16 | 0.43 ± 0.39^d | 0.86 ± 0.53 | 0.003^b |
| Median (IQR) | 0.30 (0.18–0.38) | 0.23 (0.09–0.85)^d | 0.92 (0.38–1.18) | |
| | | | | |
| Never | 14 (63.6%) | 5 (62.5%) | 9 (60.0%) | 0.333^a |
| Former | 2 (9.1%) | 0 (0.0%) | 3 (20.0%) | |
| Daily | 6 (27.3%) | 2 (25.0%) | 3 (20.0%) | |
| Unknown | 0 (0.0%) | 1 (12.5%) | 0 (0.0%) | |
| | | | | |
| Yes | 16 (72.7%) | 5 (62.5%) | 6 (40.0%) | 0.057^a |
| No | 6 (27.3%) | 2 (25.0%) | 9 (60.0%) | |
| Unknown | 0 (0.0%) | 1 (12.5%) | 0 (0.0%) | |
| | | | | |
| ≥3 times | 15 (68.2%) | 6 (75.0%) | 6 (40.0%) | 0.046^a |
| ≤2 times | 7 (31.8%) | 1 (12.5%) | 9 (60.0%) | |
| Unknown | 0 (0.0%) | 1 (12.5%) | 0 (0.0%) | |

^a Fisher's exact test; ^b Kruskal-Wallis one-way analysis of variance on ranks; ^c two moderate and 13 severe chronic periodontitis patients; ^d these values were determined from 7 patients because data were missing for one patient.
Validation of machine learning classification with new data set (n = 45, 22 healthy controls, 8 slight, 15 moderate or severe chronic periodontitis patients).
| Group | Comparison | Model | Feature combination | Accuracy | Balanced accuracy | Sensitivity | Specificity |
|---|---|---|---|---|---|---|---|
| 1 | H vs. M-S | Neural Network | | 0.84 | 0.85 | 0.93 | 0.77 |
| 1 | H vs. M-S | Random Forest | | 0.86 | 0.88 | 0.93 | 0.82 |
| 1 | H vs. M-S | Support Vector Machine | | 0.78 | 0.81 | 0.93 | 0.68 |
| 1 | H vs. M-S | Regularized Logistic Regression | | 0.81 | 0.83 | 0.93 | 0.73 |
| 2 | H vs. Sli-M-S | Neural Network | | 0.69 | 0.69 | 0.74 | 0.64 |
| 2 | H vs. Sli-M-S | Random Forest | | 0.76 | 0.75 | 0.78 | 0.73 |
| 2 | H vs. Sli-M-S | Support Vector Machine | | 0.67 | 0.66 | 0.87 | 0.45 |
| 2 | H vs. Sli-M-S | Regularized Logistic Regression | | 0.71 | 0.71 | 0.83 | 0.59 |
| 3 | H vs. Sli | Neural Network | | 0.63 | 0.51 | 0.25 | 0.77 |
| 3 | H vs. Sli | Random Forest | | 0.73 | 0.66 | 0.50 | 0.82 |
| 3 | H vs. Sli | Support Vector Machine | | 0.63 | 0.51 | 0.25 | 0.77 |
| 3 | H vs. Sli | Regularized Logistic Regression | | 0.60 | 0.53 | 0.38 | 0.68 |
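The four validation metrics are linked through the confusion matrix: sensitivity is the fraction of patients correctly classified, specificity the fraction of healthy controls correctly classified, and balanced accuracy their mean. The sketch below computes them from counts. The counts used (14 of 15 moderate-or-severe patients and 17 of 22 healthy controls classified correctly) are not reported in the source; they are an assumption chosen because they reproduce the rounded Neural Network values for H vs. M-S.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, balanced accuracy, sensitivity and specificity from confusion counts."""
    sensitivity = tp / (tp + fn)  # patients correctly called patients
    specificity = tn / (tn + fp)  # healthy controls correctly called healthy
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


# Assumed counts for the H vs. M-S Neural Network row (not given in the source):
# 15 moderate-or-severe patients, 22 healthy controls.
nn_group1 = classification_metrics(tp=14, fn=1, tn=17, fp=5)
rounded = {name: round(value, 2) for name, value in nn_group1.items()}
print(rounded)
```

Because the healthy group is larger than the patient group in this comparison, plain accuracy weights specificity more heavily, which is why accuracy and balanced accuracy differ in the table.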
Figure 4. T-distributed Stochastic Neighbor Embedding (t-SNE) plot of the 692 subjects using random forest model. (A) t-SNE plot of periodontally healthy people (H) and patients with moderate or severe chronic periodontitis (M-S); (B) t-SNE plot of periodontally healthy people (H) and patients with slight, moderate or severe chronic periodontitis (Sli-M-S); (C) t-SNE plot of periodontally healthy people (H) and patients with slight chronic periodontitis (Sli).