Seoung Bum Kim, Jung Woo Lee, Sin Young Kim, Deok Won Lee.
Abstract
Relevant statistical modeling and analysis of dental data can improve diagnostic and treatment procedures. The purpose of this study is to demonstrate the use of various data mining algorithms to characterize patients with dentofacial deformities. A total of 72 patients with skeletal malocclusions who had completed orthodontic and orthognathic surgical treatments were examined. Each patient was characterized by 22 measurements related to dentofacial deformities. Clustering analysis and visualization grouped the patients into three different patterns of dentofacial deformities. A feature selection approach based on a false discovery rate was used to identify a subset of the 22 measurements important in categorizing these three clusters. Finally, classification was performed to evaluate the quality of the measurements selected by the feature selection approach. The results showed that feature selection improved classification accuracy while simultaneously determining which measurements were relevant.
Year: 2013 PMID: 23940512 PMCID: PMC3734183 DOI: 10.1371/journal.pone.0067862
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Landmark points (left figure) and planes (right figure) of lateral cephalometric radiograph.
Results of the Rand index and adjusted Rand index methods to determine k.
| k | 2 | 3 | 4 | 5 |
| --- | --- | --- | --- | --- |
| Rand index | 0.92 | 0.92 | 0.74 | 0.73 |
| Adjusted Rand index | 0.80 | 0.82 | 0.34 | 0.26 |
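The Rand index and the adjusted Rand index both score the agreement between two partitions of the same items. This record does not state which pairs of partitions were compared for each k, so the following is only a minimal sketch of the two indices themselves. The function name `rand_indices` and the example label lists are hypothetical.

```python
from collections import Counter
from math import comb


def rand_indices(labels_a, labels_b):
    """Return (Rand index, adjusted Rand index) for two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to form a pair")
    total_pairs = comb(n, 2)

    # Pairs placed together in both labelings, in A only, and in B only.
    joint = Counter(zip(labels_a, labels_b))
    together_both = sum(comb(c, 2) for c in joint.values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    # Rand index: fraction of pairs on which the two labelings agree
    # (together in both, or apart in both).
    agreements = total_pairs + 2 * together_both - together_a - together_b
    rand = agreements / total_pairs

    # Adjusted Rand index: subtract the agreement expected by chance.
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        ari = 1.0  # degenerate case, e.g. both labelings are a single cluster
    else:
        ari = (together_both - expected) / (maximum - expected)
    return rand, ari


# Hypothetical labelings of six items (not data from the study).
run_1 = [0, 0, 0, 1, 1, 2]
run_2 = [1, 1, 0, 0, 0, 2]
print(rand_indices(run_1, run_2))
```

Both indices ignore the cluster names, so relabeling a partition leaves the scores unchanged. The adjusted index can fall to zero or below when agreement is no better than chance, which is why it separates the k values in the table more sharply than the plain Rand index.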
Basic statistics of features (measurements) in each cluster (n = the number of patients).
| No. | Feature (Measurement) | Cluster 1 (Mean±SD) | Cluster 2 (Mean±SD) | Cluster 3 (Mean±SD) |
| --- | --- | --- | --- | --- |
| 1 | S-N to FH | 9.3±3.0 | 8.8±3.3 | 8.8±2.9 |
| 2 | S-N to palatal | 8.1±2.9 | 8.6±3.3 | 9.6±3.1 |
| 3 | S-N to mandibular | 34.7±4.4 | 39.7±4.9 | 38.7±3.7 |
| 4 | FH to occlusal | 6.7±4.2 | 9.7±3.6 | 11.8±3.0 |
| 5 | FH to mandibular | 25.4±5.0 | 30.9±4.6 | 30.0±4.1 |
| 6 | SNA | 83.1±2.3 | 79.4±3.2 | 80.4±2.5 |
| 7 | FH to NA | 92.4±2.4 | 88.2±2.5 | 89.1±1.9 |
| 8 | Convexity | −0.3±8.0 | −4.7±6.5 | 8.8±4.0 |
| 9 | SNB | 83.2±2.5 | 81.3±3.2 | 76.0±2.6 |
| 10 | SN Pog | 83.2±2.7 | 81.7±3.2 | 76.4±2.6 |
| 11 | FH to NB | 92.5±3.1 | 90.0±2.7 | 84.8±2.0 |
| 12 | Facial angle | 92.5±3.4 | 90.5±2.9 | 85.1±1.9 |
| 13 | Y axis | 59.3±3.4 | 61.5±2.9 | 65.4±2.5 |
| 14 | Gonial angle | 126.5±8.1 | 131.5±6.4 | 124.2±7.1 |
| 15 | ANB difference | −0.1±3.3 | −1.8±2.7 | 4.4±1.7 |
| 16 | Palatal to mandibular | 26.6±5.2 | 31.1±4.7 | 29.2±4.1 |
| 17 | FH to U1 | 122.8±9.1 | 116.6±7.2 | 113.7±9.8 |
| 18 | FH to L1 | 59.2±6.4 | 63.8±6.8 | 52.1±4.8 |
| 19 | Interincisal angle | 116.4±8.7 | 127.2±8.7 | 118.4±11.6 |
| 20 | Mandibular to L1 | 95.4±5.6 | 85.2±5.8 | 97.9±4.9 |
| 21 | NP to U1 (mm) | 9.1±4.3 | 4.8±2.5 | 12.1±3.1 |
| 22 | NP to L1 (mm) | 9.3±3.8 | 5.9±3.4 | 7.3±2.5 |
Figure 2. The photos and X-ray images representing three clusters identified by a k-means clustering algorithm.
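The clusters were identified with k-means. As a rough illustration of that algorithm, here is a minimal sketch of the standard Lloyd iteration in plain Python. The initialization (random sampling of k data points), the iteration cap, and the toy two-dimensional points are all assumptions; the study used 22 measurements per patient and its exact settings are not given in this record.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: return (labels, centroids) for the given points."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy points in two well-separated groups (not data from the study).
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(toy, k=2)
print(labels)
print(centroids)
```

In practice the measurements would usually be standardized before clustering, because k-means relies on Euclidean distance and features measured on larger scales would otherwise dominate.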
Figure 3. A scree plot to determine the number of PCs.
Figure 4. Three-dimensional PCA score plot of 72 patients with facial deformities.
Selected features by FDR procedures for α = 0.01 and 0.05.
| Feature (α = 0.01) | p-value | Feature (α = 0.05) | p-value |
| --- | --- | --- | --- |
| FH to occlusal | 0.000 | FH to occlusal | 0.000 |
| Convexity | 0.000 | Convexity | 0.000 |
| SNB | 0.000 | SNB | 0.000 |
| SN Pog | 0.000 | SN Pog | 0.000 |
| FH to NB | 0.000 | FH to NB | 0.000 |
| Facial angle | 0.000 | Facial angle | 0.000 |
| Y axis | 0.000 | Y axis | 0.000 |
| ANB difference | 0.000 | ANB difference | 0.000 |
| FH to L1 | 0.000 | FH to L1 | 0.000 |
| | | FH to NA | 0.001 |
| | | FH to U1 | 0.002 |
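This record says only that features were selected with a procedure based on the false discovery rate. One widely used procedure of that kind is the Benjamini-Hochberg step-up rule, sketched below as an assumption rather than as the authors' exact method. The function name and the p-values are hypothetical and are not taken from the table.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the indices of the hypotheses rejected at FDR level alpha."""
    m = len(p_values)
    # Rank the hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value
        # lies at or below rank * alpha / m.
        if p_values[i] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical per-feature p-values (illustration only).
p = [0.001, 0.008, 0.039, 0.040, 0.041, 0.60]
print(benjamini_hochberg(p, alpha=0.05))
print(benjamini_hochberg(p, alpha=0.01))
```

A stricter α can only shrink the selected set, which matches the pattern in the table: the features selected at α = 0.01 are a subset of those selected at α = 0.05.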
Figure 5. PCA plots using the features selected by (a) FDR level = 0.01 and (b) FDR level = 0.05.
Misclassification rate of KNN (k = 2, 4, 8, 16) for the datasets used with different numbers of features.
| Dataset | KNN (k = 2) | KNN (k = 4) | KNN (k = 8) | KNN (k = 16) |
| --- | --- | --- | --- | --- |
| All Features | 0.22(0.10) | | 0.20(0.11) | 0.25(0.12) |
| FDR(α = 0.05) | 0.23(0.10) | 0.19(0.10) | | 0.20(0.11) |
| FDR(α = 0.01) | 0.23(0.10) | 0.19(0.10) | | 0.21(0.10) |
Average standard errors from 1,000 experiments are shown in parentheses; boldface values indicate, for each dataset, the KNN model with the minimum error rate.
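A misclassification rate of this kind can be estimated by repeatedly splitting the data at random, fitting KNN on one part, and counting errors on the other. The sketch below shows one way to do that in plain Python. The split fraction, the number of repeats, the tie-breaking rule, and the toy data are assumptions; the record does not describe the authors' protocol beyond the 1,000 experiments, and the sketch reports the standard deviation across repeats as its spread measure.

```python
import random
import statistics
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    # Ties go to the label encountered first, i.e. the one with the nearer
    # neighbor (an assumption; the source does not say how ties were handled).
    return votes.most_common(1)[0][0]


def misclassification_rate(x, y, k, n_repeats=50, test_fraction=0.3, seed=0):
    """Return (mean, standard deviation) of the test error over random splits."""
    rng = random.Random(seed)
    n = len(x)
    n_test = max(1, int(n * test_fraction))
    rates = []
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        errors = sum(knn_predict(train_x, train_y, x[i], k) != y[i] for i in test_idx)
        rates.append(errors / n_test)
    return statistics.mean(rates), statistics.stdev(rates)


# Toy data: two well-separated classes (not data from the study).
x = [(i * 0.1, 0.0) for i in range(10)] + [(100 + i * 0.1, 0.0) for i in range(10)]
y = [0] * 10 + [1] * 10
for k in (2, 4):
    print(k, misclassification_rate(x, y, k))
```

Running the same function on the full feature set and on each FDR-selected subset would give rows comparable in spirit to the table, although the numbers depend on the split protocol chosen.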
Figure 6. Box plots of different clusters using nine features selected by the FDR-based feature selection approach using α = 0.01.