Hawazin W Elani, André F M Batista, W Murray Thomson, Ichiro Kawachi, Alexandre D P Chiavegatto Filho.
Abstract
INTRODUCTION: Little is understood about the socioeconomic predictors of tooth loss, a condition that can negatively impact an individual's quality of life. The goal of this study is to develop machine-learning algorithms to predict complete and incremental tooth loss among adults and to compare the predictive performance of these models.
Year: 2021 PMID: 34143814 PMCID: PMC8213149 DOI: 10.1371/journal.pone.0252873
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Study flow diagram.
Demographic characteristics of the study sample: National Health and Nutrition Examination Survey (2011–2014).
| Characteristic | Full sample (N = 11,977): n | Full sample: (%) | Edentulous sample (N = 736): survey-weighted % | Having fewer than 21 teeth (N = 2,663): survey-weighted % | Missing any tooth (N = 6,919): survey-weighted % |
|---|---|---|---|---|---|
| Male | 5,813 | (48.2) | 48.0 | 48.1 | 48.3 |
| Female | 6,164 | (51.8) | 52.0 | 51.9 | 51.7 |
| Age, mean ± SD | 46.4 | ± 17.5 | 65.7 ± 13.7 | 61.7 ± 15.0 | 52.9 ± 16.8 |
| Less than high school degree | 2,578 | (16.0) | 40.1 | 31.0 | 20.7 |
| High school graduate | 2,472 | (21.0) | 29.1 | 31.2 | 24.6 |
| Some college/college graduate | 6,267 | (63.0) | 30.9 | 37.7 | 54.7 |
| Non-Hispanic White | 4,679 | (65.7) | 74.1 | 64.6 | 64.8 |
| Non-Hispanic Black | 2,809 | (11.6) | 12.5 | 17.0 | 13.3 |
| Hispanic | 2,594 | (14.7) | 6.3 | 11.5 | 14.4 |
| Other | 1,895 | (7.9) | 7.1 | 6.9 | 7.4 |
| US-born | 8,463 | (82.4) | 89.2 | 84.8 | 82.2 |
| Foreign born | 3,505 | (17.6) | 10.8 | 15.2 | 17.8 |
| <100 | 2,770 | (17.4) | 29.8 | 24.7 | 18.8 |
| 100–200 | 2,851 | (21.5) | 35.1 | 32.3 | 24.7 |
| >200 | 5,339 | (61.2) | 35.1 | 43.0 | 56.5 |
| Insured | 9,279 | (81.1) | 89.7 | 83.0 | 80.2 |
| Uninsured | 2,680 | (18.9) | 10.3 | 17.0 | 19.8 |
| Yes | 4,379 | (44.7) | 35.4 | 19.9 | 37.3 |
| No | 7,388 | (55.3) | 96.5 | 80.1 | 62.7 |
| ≤24.99 | 3,695 | (31.1) | 29.6 | 25.9 | 27.5 |
| 25.0–29.99 | 3,595 | (32.9) | 33.4 | 31.7 | 33.7 |
| ≥30 | 4,082 | (36.0) | 37.0 | 42.3 | 38.8 |
| Asthma | 1,815 | (15.5) | 17.3 | 15.6 | 14.7 |
| Arthritis | 2,873 | (24.9) | 52.2 | 45.8 | 31.9 |
| Diabetes | 1,432 | (9.4) | 23.5 | 20.6 | 13.4 |
| Hypertension | 4,180 | (32.5) | 60.5 | 56.0 | 41.6 |
| High cholesterol levels | 3,846 | (33.7) | 52.8 | 49.5 | 39.5 |
| Stroke | 431 | (2.9) | 13.9 | 9.0 | 4.2 |
| Heart attack | 433 | (3.3) | 14.2 | 9.7 | 5.2 |
a Education is based on individuals aged 20 years and older. Wisdom teeth were excluded, and all analyses were based on a maximum of 28 teeth.
* Survey-Weighted Proportion. SD is standard deviation.
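The gap between raw counts and survey-weighted proportions in the table can be illustrated with a short sketch. For example, 4,679 of 11,977 respondents (about 39%) are Non-Hispanic White, while the weighted proportion is 65.7%. The function name `weighted_proportion` and the toy weights below are hypothetical and are not NHANES values. The sketch also leaves out the stratum and cluster information that a full survey analysis would need for variance estimation.

```python
def weighted_proportion(flags, weights):
    """Share of the total survey weight carried by respondents whose flag is True."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for f, w in zip(flags, weights) if f) / total


# Hypothetical toy data: four respondents with made-up sampling weights.
in_group = [True, False, True, False]
weights = [1500.0, 500.0, 1000.0, 1000.0]

unweighted = sum(in_group) / len(in_group)        # 2 of 4 respondents
weighted = weighted_proportion(in_group, weights)  # 2500 of 4000 weight units

print(f"unweighted: {unweighted:.3f}, weighted: {weighted:.3f}")
```

Respondents with larger weights stand in for more people in the target population, so the weighted share can differ substantially from the raw share.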
Performance of the machine-learning algorithms on the test data for each study outcome.
| Model | AUC (95% CI) | ACC | Sensitivity | Specificity | F1 | PPV | NPV | Harmonic mean |
|---|---|---|---|---|---|---|---|---|
| Outcome: edentulism | | | | | | | | |
| Extreme gradient boosting trees | 88.7 (87.1, 90.2) | 83.8 | 74.3 | 84.5 | 39.4 | 26.8 | 97.7 | 79.0 |
| Random forests | 88.5 (86.9, 90.0) | 84.3 | 73.7 | 85.1 | 40.1 | 27.5 | 97.7 | 78.9 |
| Neural networks | 87.7 (86.0, 89.3) | 82.2 | 78.5 | 82.5 | 38.6 | 25.6 | 98.1 | 80.4 |
| Light gradient boosting machine | 88.4 (86.7, 89.9) | 82.7 | 76.4 | 83.1 | 38.5 | 25.7 | 97.9 | 79.6 |
| Logistic regression | 86.5 (84.7, 88.3) | 83.7 | 71.9 | 84.6 | 38.5 | 26.3 | 97.5 | 77.7 |
| Outcome: having fewer than 21 teeth | | | | | | | | |
| Extreme gradient boosting trees | 88.3 (87.3, 89.3) | 81.5 | 74.1 | 84.2 | 68.1 | 62.9 | 90.0 | 78.8 |
| Random forests | 87.6 (86.5, 88.6) | 81.7 | 48.4 | 93.7 | 58.4 | 73.5 | 83.4 | 63.8 |
| Neural networks | 88.1 (87.0, 89.1) | 82.6 | 56.4 | 92.0 | 63.2 | 71.9 | 85.4 | 69.9 |
| Light gradient boosting machine | 87.7 (86.7, 88.7) | 82.5 | 58.0 | 91.3 | 63.7 | 70.7 | 85.7 | 70.9 |
| Logistic regression | 87.2 (86.2, 88.3) | 81.9 | 53.9 | 92.1 | 61.3 | 71.0 | 84.7 | 68.0 |
| Outcome: missing any tooth | | | | | | | | |
| Extreme gradient boosting trees | 83.2 (82.0, 84.4) | 74.0 | 95.9 | 29.6 | 83.2 | 73.4 | 77.8 | 45.2 |
| Random forests | 82.7 (81.4, 83.8) | 77.0 | 89.5 | 55.6 | 83.6 | 80.0 | 68.6 | 68.5 |
| Neural networks | 83.1 (81.9, 84.3) | 77.2 | 85.8 | 59.7 | 83.5 | 81.2 | 67.5 | 70.4 |
| Light gradient boosting machine | 81.9 (80.6, 83.0) | 73.9 | 93.7 | 33.8 | 82.8 | 74.2 | 72.6 | 49.6 |
| Logistic regression | 83.1 (81.9, 84.3) | 76.9 | 85.6 | 59.4 | 83.3 | 81.1 | 67.0 | 70.1 |
Test data: National Health and Nutrition Examination Survey (NHANES 2013–2014). AUC = area under the receiver operating characteristic curve; ACC = accuracy; PPV = positive predictive value; NPV = negative predictive value; F1 = F1 score; Harmonic mean = harmonic mean of sensitivity and specificity.
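The threshold-based metrics defined in the note above can all be derived from a single confusion matrix. The following is a minimal sketch using their standard definitions. The function name `classification_metrics` and the counts are hypothetical and are not taken from the study, and the sketch assumes every denominator is nonzero. AUC is omitted because it depends on the full ranking of predicted scores rather than on one threshold.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F1 is the harmonic mean of PPV and sensitivity.
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    # The "harmonic mean" column instead combines sensitivity and specificity.
    harmonic = 2 * sensitivity * specificity / (sensitivity + specificity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "ppv": ppv,
        "npv": npv,
        "harmonic_mean": harmonic,
    }


# Hypothetical counts for a low-prevalence outcome (100 positives in 1,000).
m = classification_metrics(tp=80, fp=150, tn=750, fn=20)
for name, value in m.items():
    print(f"{name}: {100 * value:.1f}")
```

With a rare outcome, even a modest false-positive rate yields many more false positives than true positives. A model can therefore show high accuracy and NPV alongside a low PPV and F1.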
Fig 2. Receiver-operating characteristic curves for the five analyzed predictive models for each outcome.
Fig 3. Variable importance plot in the extreme gradient boosting trees models for each outcome.
Performance of the machine-learning algorithms on the test data for each study outcome when including clinical dental predictors.
| Model | AUC (95% CI) | ACC | Sensitivity | Specificity | F1 | PPV | NPV | Harmonic mean |
|---|---|---|---|---|---|---|---|---|
| Outcome: edentulism | | | | | | | | |
| Extreme gradient boosting trees | 83.9 (82.1, 85.5) | 83.9 | 52.1 | 86.9 | 35.9 | 27.4 | 95.0 | 65.1 |
| Random forests | 78.0 (75.3, 80.6) | 80.7 | 61.9 | 82.5 | 35.7 | 25.1 | 95.8 | 70.7 |
| Neural networks | 83.7 (82.0, 85.3) | 77.1 | 73.4 | 77.4 | 35.7 | 23.6 | 96.8 | 75.3 |
| Light gradient boosting machine | 83.0 (81.2, 84.8) | 81.5 | 61.3 | 83.4 | 36.5 | 25.9 | 95.8 | 70.6 |
| Logistic regression | 84.6 (83.0, 86.1) | 76.6 | 77.7 | 76.5 | 36.4 | 23.8 | 97.3 | 77.1 |
| Outcome: having fewer than 21 teeth | | | | | | | | |
| Extreme gradient boosting trees | 80.4 (78.9, 81.7) | 75.7 | 45.0 | 89.6 | 53.6 | 66.3 | 78.2 | 59.9 |
| Random forests | 80.0 (78.5, 81.4) | 75.1 | 42.3 | 90.0 | 51.4 | 65.7 | 77.4 | 57.5 |
| Neural networks | 80.3 (78.9, 81.7) | 75.5 | 50.4 | 87.0 | 56.3 | 63.8 | 79.4 | 63.8 |
| Light gradient boosting machine | 79.1 (77.7, 80.5) | 71.3 | 72.3 | 70.8 | 61.1 | 53.0 | 84.9 | 71.5 |
| Logistic regression | 79.3 (77.9, 80.7) | 74.9 | 47.3 | 87.4 | 54.1 | 63.0 | 78.5 | 61.3 |
| Outcome: missing any tooth | | | | | | | | |
| Extreme gradient boosting trees | 79.8 (78.2, 81.2) | 76.9 | 93.1 | 29.6 | 85.7 | 79.4 | 59.6 | 44.9 |
| Random forests | 79.8 (78.2, 81.2) | 75.6 | 98.8 | 8.0 | 85.8 | 75.8 | 69.5 | 14.8 |
| Neural networks | 79.4 (77.8, 80.8) | 76.7 | 89.3 | 40.1 | 85.1 | 81.3 | 56.3 | 55.3 |
| Light gradient boosting machine | 78.3 (76.7, 79.8) | 75.7 | 98.6 | 8.7 | 85.8 | 75.9 | 68.7 | 15.9 |
| Logistic regression | 79.5 (78.0, 81.0) | 76.8 | 91.3 | 34.6 | 85.5 | 80.3 | 57.8 | 50.2 |
Test data: National Health and Nutrition Examination Survey (NHANES 2013–2014). AUC = area under the receiver operating characteristic curve; ACC = accuracy; PPV = positive predictive value; NPV = negative predictive value; F1 = F1 score; Harmonic mean = harmonic mean of sensitivity and specificity.
a Predictor variables included are the number of decayed teeth, periodontal disease, age, gender, and race.