Enrico Barelli1, Ennio Ottaviani1,2, Pietro Auconi3, Guido Caldarelli4,5,6, Veronica Giuntini7, James A McNamara8,9, Lorenzo Franchi7,8.
Abstract
The aim of the study was to investigate how to improve the forecasting of the risk of craniofacial unbalance during growth among patients affected by Class III malocclusion. To this end, we used computational methodologies such as Transductive Learning (TL), Boosting (B), and Feature Engineering (FE) instead of the traditional statistical analysis based on classification trees and logistic models. These techniques were applied to cephalometric data from 728 cross-sectional untreated Class III subjects (6-14 years of age) and from 91 untreated Class III subjects followed longitudinally during the growth process. A cephalometric analysis comprising 11 variables was also performed. The subjects followed longitudinally were divided into two subgroups, favourable and unfavourable growth, in comparison with normal craniofacial growth. Compared with traditional statistical predictive analytics, TL increased the accuracy in identifying subjects at risk of unfavourable growth. The TL algorithm was useful in diffusing information from the longitudinal to the cross-sectional subjects. The accuracy in identifying subjects at high risk of growth worsening increased from 63% to 78%. Finally, FE produced a further increase in identification accuracy, up to 83%. A ranking of the variables most important for identifying subjects at risk of growth worsening was thereby obtained.
Year: 2019 PMID: 30996304 PMCID: PMC6470156 DOI: 10.1038/s41598-019-42384-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Cephalometric landmarks.
Figure 4. Histograms of the variables used.
Figure 5. Logical scheme of the analysis sequence.
Comparison of the experimental configurations under K-fold cross-validation (K = 10). The mean and standard deviation (SD) across the folds are reported.
| Configuration | Mean CV accuracy | SD CV accuracy | Mean CV F1 score | SD CV F1 score | Mean CV precision | SD CV precision | Mean CV recall | SD CV recall |
|---|---|---|---|---|---|---|---|---|
| LASSO | 0.626 | 0.123 | 0.536 | 0.165 | 0.433 | 0.142 | 0.710 | 0.210 |
| LASSO (TL) | 0.620 | 0.186 | 0.506 | 0.244 | 0.450 | 0.253 | 0.666 | 0.365 |
| LASSO (TL + FE) | 0.684 | 0.128 | 0.553 | 0.159 | 0.505 | 0.117 | 0.666 | 0.258 |
| GB | 0.648 | 0.187 | 0.445 | 0.297 | 0.444 | 0.307 | 0.495 | 0.336 |
| GB (TL) | 0.782 | 0.147 | 0.628 | 0.287 | 0.627 | 0.305 | 0.650 | 0.300 |
| GB (FE + TL) | 0.834 | 0.088 | 0.710 | 0.164 | 0.768 | 0.201 | 0.700 | 0.221 |
Gradient Boosting (GB) is more accurate than L1-regularized logistic regression (LASSO). In the TL and FE + TL configurations, its higher F1 score shows that it also balances errors better. Direct inspection of the logistic model gives a clear interpretation of how a prediction is made, but the accuracy and balance of that model are much poorer. The addition of TL reduces the standard deviation of the results of the GB algorithm.
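
As a reminder of how the columns of the table are defined, the sketch below computes accuracy, precision, recall and F1 for each fold from confusion counts, then the mean and SD across folds. The confusion counts are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the record does not say which estimator was used.

```python
from statistics import mean, stdev

def fold_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from one fold's confusion counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical (tp, fp, tn, fn) counts for three folds; not the study's data.
folds = [(3, 1, 5, 0), (2, 2, 4, 1), (1, 0, 6, 2)]
per_fold = [fold_metrics(*counts) for counts in folds]

# Mean and SD of each metric across the folds, as in the table columns.
summary = {
    name: (mean([m[name] for m in per_fold]), stdev([m[name] for m in per_fold]))
    for name in per_fold[0]
}

# F1 recomputed from the mean precision and mean recall of the GB (FE + TL) row.
f1_from_means = 2 * 0.768 * 0.700 / (0.768 + 0.700)
```

Here `f1_from_means` is about 0.732, whereas the table reports a mean F1 of 0.710 for GB (FE + TL). The two need not agree, because the mean of per-fold F1 scores is generally not the F1 of the mean precision and recall.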
Figure 6. Mean ROC curve of GB (with TL and FE) across the K-fold cross-validation (K = 10). The blue shaded band represents the standard deviation.
Figure 2. On the x-axis: normalized importance ranking of cephalometric variables in predicting good/bad Class III growers. The ranking was obtained by averaging the information gain obtained through splitting on a specific variable across all trees, and then across all the folds of the K-fold cross-validation (K = 10). Error bars corresponding to the standard deviation across the folds are reported. Although the error is significant, it is not large enough to completely change the ranking.
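
The averaging described in the caption can be sketched as follows. The per-fold importance values are invented for illustration (only the variable names come from the tables), and normalizing each fold so that its importances sum to 1 is an assumption about what "normalized" means here.

```python
from statistics import mean, stdev

# Hypothetical per-fold importances (e.g. information gain already averaged
# over the trees of that fold). The numbers are made up.
fold_importances = [
    {"SNB": 4.0, "CoGo": 3.0, "PPMP": 1.0},
    {"SNB": 5.0, "CoGo": 2.0, "PPMP": 1.0},
    {"SNB": 3.0, "CoGo": 4.0, "PPMP": 1.0},
]

def normalise(importances):
    """Scale one fold's importances so that they sum to 1."""
    total = sum(importances.values())
    return {name: value / total for name, value in importances.items()}

normalised = [normalise(fold) for fold in fold_importances]
variables = list(fold_importances[0])

# Mean across folds gives the bar length, SD across folds the error bar.
ranking = sorted(
    ((v, mean([f[v] for f in normalised]), stdev([f[v] for f in normalised]))
     for v in variables),
    key=lambda row: row[1],
    reverse=True,
)
for name, avg, sd in ranking:
    print(f"{name:5s} {avg:.3f} +/- {sd:.3f}")
```

Comparing each error bar with the gap between adjacent means is one way to judge whether the ranking is stable across folds.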
Mean and standard deviation (Std, in parentheses) of each variable in the sample, grouped by age of the patient.
| Age | ArGoMe | CoGo | NMe | NSAr | PPMP | PPSN | Sar | SN | SNA | SNB | GoGn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 130.54 (5.50) | 46.64 (3.68) | 105.19 (5.87) | 120.91 (5.47) | 26.92 (4.47) | 7.95 (3.58) | 29.17 (2.65) | 68.04 (3.47) | 80.30 (3.48) | 79.45 (3.35) | 72.43 (4.81) |
| 8 | 129.78 (6.04) | 48.64 (3.97) | 108.86 (6.28) | 122.09 (5.01) | 26.46 (4.95) | 8.36 (4.95) | 30.30 (2.88) | 69.50 (3.62) | 79.72 (3.57) | 79.05 (3.46) | 74.69 (4.63) |
| 9 | 129.24 (6.36) | 49.56 (4.12) | 110.26 (6.73) | 121.42 (4.90) | 26.38 (4.96) | 8.34 (3.11) | 30.66 (3.22) | 69.56 (3.81) | 80.06 (3.39) | 79.46 (3.32) | 75.43 (4.69) |
| 10 | 130.52 (6.33) | 50.41 (4.59) | 112.52 (5.95) | 121.45 (5.04) | 26.10 (5.00) | 7.90 (2.86) | 31.96 (2.76) | 70.52 (3.73) | 81.14 (3.94) | 80.61 (3.61) | 77.21 (4.92) |
| 11 | 130.57 (5.17) | 52.89 (4.74) | 117.85 (6.66) | 122.66 (4.68) | 27.91 (5.40) | 8.65 (2.86) | 32.22 (2.75) | 71.01 (3.82) | 80.00 (3.85) | 79.63 (3.25) | 80.20 (4.85) |
| 12 | 130.56 (6.73) | 54.66 (6.42) | 120.30 (7.51) | 122.47 (4.75) | 27.16 (6.18) | 9.05 (3.75) | 32.62 (3.70) | 72.42 (3.75) | 80.29 (4.16) | 79.99 (3.73) | 82.05 (6.21) |
| 13 | 129.45 (6.09) | 55.69 (5.75) | 122.32 (8.18) | 122.36 (5.49) | 26.66 (4.88) | 8.79 (2.92) | 33.91 (3.60) | 72.10 (3.62) | 80.65 (3.56) | 80.36 (3.08) | 82.77 (5.88) |
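
A table of this shape can be produced by grouping the records by age and summarizing each group. The sketch below uses a handful of hypothetical (age, SNA) records rather than the study's measurements, and assumes the sample standard deviation.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (age, SNA) records; not the study's data.
records = [(7, 80.0), (7, 82.0), (7, 78.0), (8, 79.0), (8, 81.0)]

# Collect the values of the variable for each age group.
by_age = defaultdict(list)
for age, value in records:
    by_age[age].append(value)

# One "mean (Std)" cell per age group, in ascending age order.
summary = {age: (mean(vals), stdev(vals)) for age, vals in sorted(by_age.items())}
for age, (m, s) in summary.items():
    print(f"| {age} | {m:.2f} ({s:.2f}) |")
```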
Figure 3. On the x-axis: normalized importance ranking of cephalometric variables in predicting good/bad Class III growers. The ranking was obtained by averaging the information gain obtained through splitting on a specific variable across all trees, and then across all the folds of the K-fold cross-validation (K = 10). Error bars corresponding to the standard deviation across the folds are reported.
Correlation matrix between the variables.
| | ArGoMe | CoGo | NMe | NSAr | PPMP | PPSN | Sar | SN | SNA | SNB | GoGn |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ArGoMe | 1 | −0.22 | 0.16 | 0.07 | 0.56 | 0.10 | −0.09 | 0.04 | −0.11 | −0.22 | −0.28 |
| CoGo | −0.22 | 1 | 0.62 | 0.04 | −0.25 | −0.01 | 0.50 | 0.50 | 0.11 | 0.23 | 0.54 |
| NMe | 0.16 | 0.62 | 1 | 0.09 | 0.32 | 0.21 | 0.56 | 0.52 | −0.14 | −0.21 | 0.62 |
| NSAr | 0.07 | 0.04 | 0.09 | 1 | −0.05 | 0.38 | −0.05 | −0.08 | −0.41 | −0.43 | 0.02 |
| PPMP | 0.56 | −0.25 | 0.32 | −0.05 | 1 | −0.21 | −0.09 | −0.11 | −0.22 | −0.42 | −0.05 |
| PPSN | 0.10 | −0.01 | 0.21 | 0.38 | −0.21 | 1 | −0.16 | 0.06 | −0.35 | −0.47 | 0.01 |
| Sar | −0.09 | 0.50 | 0.56 | −0.05 | −0.09 | −0.16 | 1 | 0.31 | 0.11 | 0.16 | 0.45 |
| SN | 0.04 | 0.50 | 0.52 | −0.08 | −0.11 | 0.06 | 0.31 | 1 | −0.13 | −0.03 | 0.52 |
| SNA | −0.11 | 0.11 | −0.14 | −0.41 | −0.22 | −0.35 | 0.11 | −0.13 | 1 | 0.79 | 0.17 |
| SNB | −0.22 | 0.23 | −0.21 | −0.43 | −0.42 | −0.47 | 0.16 | −0.03 | 0.79 | 1 | 0.28 |
| GoGn | −0.28 | 0.54 | 0.62 | 0.02 | −0.05 | 0.01 | 0.45 | 0.52 | 0.17 | 0.28 | 1 |
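
For reference, a matrix like the one above can be built from pairwise correlation coefficients. The sketch below assumes the entries are Pearson coefficients, which the record does not state, and uses hypothetical measurements for three of the variables.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements for four subjects; not the study's data.
data = {
    "SNA": [80.0, 82.0, 78.0, 81.0],
    "SNB": [79.0, 81.5, 78.0, 80.0],
    "PPMP": [27.0, 24.0, 29.0, 26.0],
}
names = list(data)

# Symmetric matrix with ones on the diagonal.
matrix = {a: {b: pearson(data[a], data[b]) for b in names} for a in names}
for a in names:
    print(a, " ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Because the matrix is symmetric, only the upper or lower triangle carries independent information.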
Figure 7. Histograms and scatterplots of some of the variables by age group.