Kang-Woo Lee, Hyung-Jin Lee, Hyewon Hu, Hee-Jin Kim.
Abstract
Transfer learning from models pre-trained on the ImageNet database is frequently used in medical imaging, where obtaining large datasets is challenging. We estimated the value of deep learning for facial ultrasound (US) images by assessing the classification performance of current representative deep learning models on facial US images through transfer learning and by analyzing their classification criteria. For this clinical study, we recruited 86 individuals and acquired US images of nine facial regions from each. To classify these facial regions, 15 deep learning models were trained on augmented and non-augmented datasets, and their performance was evaluated. The average F-measure across all models was about 93% regardless of augmentation, and the best-performing models were the classic VGG networks. The models treated the contours of skin and bone, rather than muscles and blood vessels, as the distinguishing features of regions in facial US images. These results can serve as reference data for future deep learning research on facial US images and for related content development.
Year: 2022 PMID: 36182939 PMCID: PMC9526737 DOI: 10.1038/s41598-022-20969-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
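As a rough illustration of the approach the abstract describes, here is a minimal transfer-learning sketch, assuming PyTorch/torchvision as the framework (the excerpt does not say which toolbox the authors used): an ImageNet-pretrained backbone is given a new nine-class head and fine-tuned on the facial US images.

```python
# Minimal transfer-learning sketch (assumed framework: PyTorch/torchvision).
# An ImageNet-pretrained VGG-16 gets a new 9-class head for the facial regions.
import torch.nn as nn
import torch.optim as optim
from torchvision import models

NUM_REGIONS = 9  # forehead, supraorbital, infraorbital, nose, lateral nose,
                 # oral, mentum, anterior cheek, posterior cheek

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
in_features = model.classifier[6].in_features               # 4096 for VGG-16
model.classifier[6] = nn.Linear(in_features, NUM_REGIONS)   # swap the 1000-class head

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# ...standard fine-tuning loop over the US image dataset goes here.
```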
Figure 1. Nine facial regions, their landmarks, and the US images corresponding to each landmark. Transverse US images at each landmark were used for the deep learning models.
- Forehead: 1, trichion (hairline at the midline); 2, metopion (midpoint of the bilateral frontal eminences); 3, midpoint between 2 and 4; 4, glabella; 5, frontal eminence; 6, meeting point between lines passing through 3 and the medial canthus; 7, meeting point between lines passing through 3 and the mid-pupil; 8, meeting point between lines passing through 3 and the lateral canthus.
- Oral: 9, midpoint between the subnasale and 10; 10, lower point of the Cupid's bow; 11, stomion; 12, midpoint of the lower vermillion border.
- Mentum: 13, deepest point of the chin at the midline; 14, pogonion; 15, gnathion.
- Nose: 16, sellion; 17, rhinion; 18, pronasale.
- Supraorbital: 19, meeting point between lines passing through 20 and the medial canthus; 20, superior orbital rim at the mid-pupillary line; 21, meeting point between lines passing through 20 and the lateral canthus; 22, meeting point between lines passing through 20 and the lateral orbital rim.
- Lateral nose: 23, meeting point between lines passing through 26 and the medial canthus; 24, point between 23 and 25; 25, alare.
- Infraorbital: 26, inferior orbital rim at the mid-pupillary line; 27, meeting point between lines passing through 26 and the lateral canthus; 28, meeting point between lines passing through 26 and the lateral orbital rim; 29, point between 26 and 32; 30, point between 27 and 33; 31, point between 28 and 34; 32, meeting point between lines passing through the alare and the mid-pupil; 33, meeting point between lines passing through the alare and the lateral canthus; 34, meeting point between lines passing through the alare and the lateral orbital rim.
- Anterior cheek: 35, meeting point between the line passing through 9 and the nasolabial fold; 36, meeting point between lines passing through the stomion and the mid-pupil; 37, meeting point between lines passing through the stomion and the lateral canthus.
- Posterior cheek: 38–41, points that divide the masseter between its upper and lower boundaries.
Deep learning models pre-trained on the ImageNet database.
| Model | Depth | Size (MB) | Parameters (millions) | Image input size |
|---|---|---|---|---|
| AlexNet | 8 | 227 | 61 | 227 × 227 |
| DenseNet-201 | 201 | 77 | 20 | 224 × 224 |
| GoogLeNet | 22 | 27 | 7 | 224 × 224 |
| Inception-ResNet-v2 | 164 | 209 | 55.9 | 299 × 299 |
| Inception-v3 | 48 | 89 | 23.9 | 299 × 299 |
| MobileNet-v2 | 53 | 13 | 3.5 | 224 × 224 |
| NasNet-Mobile | a | 20 | 5.3 | 224 × 224 |
| ResNet-18 | 18 | 44 | 11.7 | 224 × 224 |
| ResNet-50 | 50 | 96 | 25.6 | 224 × 224 |
| ResNet-101 | 101 | 167 | 44.6 | 224 × 224 |
| ShuffleNet | 50 | 5.4 | 1.4 | 224 × 224 |
| SqueezeNet | 18 | 5.2 | 1.24 | 227 × 227 |
| VGG-16 | 16 | 515 | 138 | 224 × 224 |
| VGG-19 | 19 | 535 | 144 | 224 × 224 |
| Xception | 71 | 85 | 22.9 | 299 × 299 |
a: NasNet-Mobile does not consist of a linear sequence of modules, so no single depth is listed. MB, megabyte.
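The parameter column can be sanity-checked against a public model zoo. Below is a small sketch assuming torchvision (0.14+ for `get_model`); counts may differ slightly from the table because of implementation details such as auxiliary classifiers, and Inception-ResNet-v2, NasNet-Mobile, and Xception have no torchvision counterparts.

```python
# Count parameters of the torchvision counterparts of the models above.
from torchvision import models

NAMES = ["alexnet", "densenet201", "googlenet", "inception_v3", "mobilenet_v2",
         "resnet18", "resnet50", "resnet101", "shufflenet_v2_x1_0",
         "squeezenet1_0", "vgg16", "vgg19"]

for name in NAMES:
    net = models.get_model(name, weights=None)  # architecture only, no download
    n_params = sum(p.numel() for p in net.parameters())
    print(f"{name:>18}: {n_params / 1e6:5.1f} M parameters")
```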
Figure 2. Scatter plot of the sizes of the facial US images.
Figure 3. BRISQUE score as a function of facial US image resizing. 224: 224 × 224 × 3; 227: 227 × 227 × 3; 299: 299 × 299 × 3.
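A sketch of the Figure 3 measurement, assuming the `piq` package for BRISQUE (the excerpt does not name the implementation used) and a hypothetical image file; the candidate sizes are the network input sizes from the model table.

```python
# Score one facial US image at each candidate network input size with BRISQUE
# (lower scores indicate better perceptual quality). Assumes `piq` and `Pillow`.
import torchvision.transforms.functional as TF
from PIL import Image
import piq

img = Image.open("facial_us.png").convert("RGB")   # hypothetical file name
for side in (224, 227, 299):                       # input sizes from the model table
    x = TF.to_tensor(TF.resize(img, [side, side])).unsqueeze(0)  # 1x3xSxS in [0, 1]
    score = piq.brisque(x, data_range=1.0)
    print(f"{side} x {side} x 3: BRISQUE = {score.item():.2f}")
```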
Final training accuracy and loss of the models trained on the non-augmented dataset and on the augmented dataset (mean ± standard deviation; accuracy in %).
| Model | Accuracy (non-augmented) | Loss (non-augmented) | Accuracy (augmented) | Loss (augmented) |
|---|---|---|---|---|
| AlexNet | 94.68 ± 1.40 | 0.21 ± 0.06 | 94.12 ± 1.13 | 0.19 ± 0.07 |
| DenseNet-201 | 93.88 ± 1.94 | 0.19 ± 0.07 | 95.31 ± 2.60 | 0.15 ± 0.08 |
| GoogLeNet | 91.90 ± 2.14 | 0.23 ± 0.06 | 92.22 ± 3.03 | 0.23 ± 0.07 |
| Inception-ResNet-v2 | 92.53 ± 2.45 | 0.24 ± 0.07 | 93.65 ± 3.26 | 0.20 ± 0.07 |
| Inception-v3 | 93.73 ± 2.68 | 0.20 ± 0.07 | 94.92 ± 2.05 | 0.16 ± 0.07 |
| MobileNet-v2 | 92.77 ± 2.09 | 0.19 ± 0.04 | 94.52 ± 2.06 | 0.16 ± 0.05 |
| NasNet-Mobile | 91.50 ± 3.36 | 0.28 ± 0.08 | 93.88 ± 1.35 | 0.21 ± 0.06 |
| ResNet-18 | 94.04 ± 2.40 | 0.19 ± 0.07 | 94.52 ± 2.13 | 0.18 ± 0.08 |
| ResNet-50 | 93.17 ± 1.84 | 0.21 ± 0.06 | 94.36 ± 2.06 | 0.18 ± 0.08 |
| ResNet-101 | 94.12 ± 1.80 | 0.18 ± 0.04 | 94.52 ± 1.88 | 0.17 ± 0.05 |
| ShuffleNet | 93.49 ± 2.50 | 0.22 ± 0.08 | 94.12 ± 2.54 | 0.17 ± 0.07 |
| SqueezeNet | 93.17 ± 1.80 | 0.24 ± 0.08 | 93.01 ± 3.05 | 0.28 ± 0.19 |
| VGG-16 | 95.71 ± 1.98 | 0.17 ± 0.11 | 96.03 ± 2.01 | 0.16 ± 0.10 |
| VGG-19 | 96.74 ± 1.60 | 0.13 ± 0.07 | 95.63 ± 2.15 | 0.20 ± 0.10 |
| Xception | 91.82 ± 2.89 | 0.26 ± 0.04 | 92.85 ± 2.84 | 0.25 ± 0.06 |
Figure 4. Training results for the 10 folds of each deep learning model.
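The fold statistics in the table above can be reproduced with a standard stratified 10-fold protocol. Here is a runnable skeleton assuming scikit-learn, with the actual fine-tuning pipeline reduced to a placeholder; the class counts and fold accuracies are dummy stand-ins.

```python
# 10-fold evaluation skeleton covering the augmented vs. non-augmented conditions.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.repeat(np.arange(9), 50)   # hypothetical: 9 regions x 50 images

def train_and_evaluate(train_idx, test_idx, augment):
    """Placeholder for fine-tuning a pretrained model on one fold."""
    return np.random.uniform(90, 97)   # stand-in fold accuracy (%)

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for augment in (False, True):
    accs = [train_and_evaluate(tr, te, augment)
            for tr, te in skf.split(np.zeros((len(labels), 1)), labels)]
    tag = "augmented" if augment else "non-augmented"
    print(f"{tag}: {np.mean(accs):.2f} ± {np.std(accs, ddof=1):.2f} %")
```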
Test-set performance of the models trained on the non-augmented dataset and on the augmented dataset, broken down by model (mean ± standard deviation, %).
| Model | Precision (non-augmented) | Recall (non-augmented) | F-measure (non-augmented) | Precision (augmented) | Recall (augmented) | F-measure (augmented) |
|---|---|---|---|---|---|---|
| AlexNet | 95.51 ± 0.74 | 95.22 ± 0.79 | 95.23 ± 0.79 | 94.51 ± 1.29 | 94.11 ± 1.64 | 94.12 ± 1.58 |
| DenseNet-201 | 94.35 ± 0.88 | 94.22 ± 0.88 | 94.20 ± 0.87 | 94.99 ± 0.90 | 94.72 ± 0.95 | 94.74 ± 0.98 |
| GoogLeNet | 93.65 ± 1.15 | 93.28 ± 1.06 | 93.24 ± 1.06 | 93.58 ± 0.74 | 93.06 ± 0.92 | 93.00 ± 0.96 |
| Inception-ResNet-v2 | 91.70 ± 1.31 | 91.17 ± 1.45 | 91.13 ± 1.49 | 93.74 ± 1.22 | 93.33 ± 1.31 | 93.30 ± 1.33 |
| Inception-v3 | 94.24 ± 0.82 | 94.00 ± 0.86 | 93.96 ± 0.85 | 93.86 ± 0.99 | 93.56 ± 0.99 | 93.48 ± 1.00 |
| MobileNet-v2 | 93.92 ± 0.84 | 93.67 ± 0.88 | 93.66 ± 0.86 | 95.03 ± 0.80 | 94.83 ± 0.87 | 94.80 ± 0.87 |
| NasNet-Mobile | 91.06 ± 1.27 | 90.17 ± 1.28 | 90.13 ± 1.30 | 91.32 ± 1.17 | 90.44 ± 1.30 | 90.41 ± 1.30 |
| ResNet-18 | 93.82 ± 1.12 | 93.50 ± 1.17 | 93.49 ± 1.17 | 93.58 ± 1.29 | 94.44 ± 1.39 | 93.04 ± 1.44 |
| ResNet-50 | 95.07 ± 0.97 | 94.83 ± 1.02 | 94.79 ± 1.01 | 96.31 ± 0.62 | 96.06 ± 0.67 | 96.05 ± 0.66 |
| ResNet-101 | 93.71 ± 1.00 | 93.33 ± 1.14 | 93.20 ± 1.18 | 94.97 ± 1.14 | 96.06 ± 1.48 | 94.41 ± 1.51 |
| ShuffleNet | 92.06 ± 1.18 | 91.56 ± 1.36 | 91.59 ± 1.30 | 92.52 ± 0.94 | 91.94 ± 1.12 | 91.98 ± 1.09 |
| SqueezeNet | 93.95 ± 1.39 | 93.50 ± 1.53 | 93.47 ± 1.52 | 93.86 ± 1.86 | 93.28 ± 2.33 | 93.24 ± 2.44 |
| VGG-16 | 96.88 ± 0.50 | 96.78 ± 0.51 | 96.76 ± 0.51 | 96.85 ± 0.84 | 96.67 ± 0.94 | 96.64 ± 0.97 |
| VGG-19 | 96.69 ± 0.57 | 96.56 ± 0.57 | 96.54 ± 0.58 | 95.99 ± 1.68 | 95.56 ± 1.98 | 95.58 ± 1.92 |
| Xception | 91.63 ± 0.77 | 91.44 ± 0.84 | 91.39 ± 0.84 | 91.71 ± 0.57 | 91.56 ± 0.63 | 91.44 ± 0.59 |
Figure 5. Test results for the 10 folds of each deep learning model.
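For reference, a sketch of how such scores are typically computed with scikit-learn; that the authors macro-averaged over the nine regions is an assumption, but `average=None` also yields the per-region breakdown shown in the next table.

```python
# Macro-averaged precision/recall/F-measure over the regions, plus the
# per-region breakdown. Labels and predictions below are hypothetical.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 3, 3]   # hypothetical region labels
y_pred = [0, 1, 2, 1, 1, 0, 3, 2]   # hypothetical model outputs

p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average="macro",
                                             zero_division=0)
print(f"precision {p:.2%}  recall {r:.2%}  F-measure {f:.2%}")

# One entry per region instead of the macro average:
per_region = precision_recall_fscore_support(y_true, y_pred, average=None,
                                             zero_division=0)
```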
Test-set performance of the models trained on the non-augmented dataset and on the augmented dataset, broken down by facial region (mean ± standard deviation, %).
| Region | Precision (non-augmented) | Recall (non-augmented) | F-measure (non-augmented) | Precision (augmented) | Recall (augmented) | F-measure (augmented) |
|---|---|---|---|---|---|---|
| Anterior cheek | 93.54 ± 5.59 | 82.30 ± 6.33 | 87.31 ± 4.11 | 93.82 ± 5.30 | 84.50 ± 7.25 | 88.64 ± 4.37 |
| Forehead | 95.03 ± 5.01 | 95.50 ± 5.54 | 95.13 ± 4.06 | 96.90 ± 4.52 | 93.23 ± 6.29 | 94.88 ± 4.23 |
| Lateral nose | 99.80 ± 0.93 | 98.46 ± 2.31 | 99.11 ± 1.21 | 99.93 ± 0.54 | 99.33 ± 1.70 | 99.62 ± 0.90 |
| Mentum | 93.88 ± 4.96 | 94.30 ± 2.89 | 94.01 ± 3.13 | 94.31 ± 5.26 | 94.00 ± 3.17 | 94.05 ± 3.10 |
| Nose | 98.95 ± 2.18 | 98.73 ± 2.60 | 98.81 ± 1.69 | 99.08 ± 2.08 | 99.26 ± 1.95 | 99.15 ± 1.43 |
| Oral | 87.85 ± 5.35 | 96.93 ± 3.56 | 92.05 ± 3.41 | 89.31 ± 6.36 | 98.56 ± 2.54 | 93.56 ± 3.57 |
| Infraorbital | 94.13 ± 5.45 | 90.13 ± 7.16 | 91.88 ± 4.97 | 93.33 ± 6.30 | 89.20 ± 8.33 | 90.83 ± 5.14 |
| Supraorbital | 87.97 ± 7.36 | 87.86 ± 6.24 | 87.71 ± 5.48 | 86.77 ± 7.58 | 87.56 ± 6.67 | 86.84 ± 5.14 |
| Posterior cheek | 93.74 ± 4.72 | 97.70 ± 3.20 | 95.60 ± 3.09 | 94.14 ± 4.43 | 98.26 ± 2.95 | 96.09 ± 2.84 |
Figure 6. Test results for the 10 folds of each deep learning model in each facial region.
Figure 7. LIME analysis of the classification criteria of VGG-16 and Xception for each facial region. The top row shows the local features considered by VGG-16, the middle row shows the original image, and the bottom row shows the local features considered by Xception. Red areas mark strongly weighted local regions, and blue areas mark weakly weighted local regions.
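A rough, runnable reconstruction of the Figure 7 analysis, assuming the `lime` package; the model wrapper and input image below are dummy stand-ins for the trained network and a real US image.

```python
# LIME on one image: find superpixels that push the prediction toward or away
# from the top class (cf. the red/blue areas in Figure 7).
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

us_image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)  # dummy image

def predict_fn(batch):
    """Dummy stand-in for the trained classifier: (N, H, W, 3) -> (N, 9)."""
    return np.full((len(batch), 9), 1.0 / 9.0)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(us_image, predict_fn,
                                         top_labels=1, hide_color=0,
                                         num_samples=1000)
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=False,  # keep weak/negative regions
    num_features=10, hide_rest=False)
overlay = mark_boundaries(img / 255.0, mask)         # superpixel map to display
```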