Xu Wang1, Jingwei Liu1, Chaoyong Wu1, Junhong Liu2, Qianqian Li1, Yufeng Chen1, Xinrong Wang1, Xinli Chen1, Xiaohan Pang1, Binglong Chang1, Jiaying Lin1, Shifeng Zhao3, Zhihong Li1, Qingqiong Deng3, Yi Lu4, Dongbin Zhao4, Jianxin Chen1.
Abstract
Tongue diagnosis has played a pivotal role in traditional Chinese medicine (TCM) for thousands of years. As one of the most important tongue characteristics, the tooth-marked tongue is related to spleen deficiency and can contribute greatly to symptom differentiation and treatment selection. Yet recognition of the tooth-marked tongue by TCM practitioners is subjective and challenging. Most previous studies concentrated on subjectively selected features of the tooth-marked region and achieved accuracies below 80%. In the present study, we propose an artificial intelligence framework using a deep convolutional neural network (CNN) to recognize the tooth-marked tongue. First, we constructed a relatively large dataset of 1548 tongue images captured with different devices. Then, we used the ResNet34 CNN architecture to extract features and perform classification. The overall accuracy of the models was over 90%. Interestingly, the models generalized successfully to images captured by other devices under different illumination. The effectiveness and generalizability of our framework may provide an objective and convenient computer-aided tongue diagnosis method for tracking disease progression and evaluating pharmacological effects from an informatics perspective.
Keywords: Artificial intelligence; Convolutional neural network; Tongue diagnosis; Tooth-marked tongue; Traditional Chinese Medicine
Year: 2020 PMID: 32368332 PMCID: PMC7186367 DOI: 10.1016/j.csbj.2020.04.002
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1 Overview of dataset construction and the main processing procedures. (A) Illustration of tongue image capture with standard equipment. (B) Construction of the raw tongue image dataset, with exemplars of tooth-marked and non-tooth-marked tongues. (C) Construction of the tongue region image dataset, with exemplars of tooth-marked and non-tooth-marked tongues. (D) Training, validation, and testing of the convolutional neural network model. (E) Testing of the models on a new dataset of tongue images captured with an ordinary camera.
Fig. 2 Visualization of the ResNet34 model structure. “Conv” and “pool” stand for convolution and pooling, respectively. A pooling or convolution stride of 2 is denoted by “/2”. “7 × 7 conv, 64” means the convolutional kernel size is 7 × 7 and the number of filters is 64. Solid lines indicate that the input and output have identical dimensions; dashed lines indicate that they have different dimensions.
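The solid vs. dashed shortcut lines in the figure correspond to identity vs. projection shortcuts in the residual blocks; a self-contained sketch of one such block (PyTorch assumed; this is a generic ResNet basic block, not the authors' code):

```python
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    """ResNet basic block: two 3x3 convs plus a shortcut connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        if stride != 1 or in_ch != out_ch:
            # "Dashed line": dimensions change, so project with a strided 1x1 conv.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_ch))
        else:
            # "Solid line": identity shortcut, dimensions already match.
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return torch.relu(out + self.shortcut(x))

x = torch.randn(1, 64, 56, 56)
print(BasicBlock(64, 64)(x).shape)             # torch.Size([1, 64, 56, 56]): identity shortcut
print(BasicBlock(64, 128, stride=2)(x).shape)  # torch.Size([1, 128, 28, 28]): /2 projection
```

Stacking such blocks (3, 4, 6, 3 per stage) after the initial 7 × 7 conv and max-pool yields the 34-layer network in the figure.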
Five-fold cross-validation results of the ResNet34 architecture (Raw: raw tongue image dataset, n = 1548; Region: tongue region image dataset, n = 1548).

| | Raw Acc | Raw Sens | Raw Spec | Region Acc | Region Sens | Region Spec |
|---|---|---|---|---|---|---|
| Fold 1 | 88.71% | 82.22% | 93.71% | 90.97% | 86.67% | 94.29% |
| Fold 2 | 93.23% | 90.48% | 95.11% | 92.58% | 88.10% | 95.65% |
| Fold 3 | 89.35% | 84.85% | 92.70% | 90.97% | 84.09% | 96.07% |
| Fold 4 | 91.26% | 90.70% | 91.67% | 92.88% | 88.37% | 96.11% |
| Fold 5 | 89.97% | 88.00% | 91.82% | 89.97% | 87.33% | 92.45% |
| Average (SD) | 90.50% (1.60%) | 87.25% (3.29%) | 93.00% (1.28%) | 91.47% (1.09%) | 86.91% (1.53%) | 94.91% (1.40%) |

Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; SD, standard deviation. Averages are reported as mean (population SD) over the five folds; average accuracies are recomputed from the fold values shown.
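The fold-level summary statistics above can be reproduced from confusion-matrix counts; a small pure-Python sketch (the population-SD convention, denominator n, is inferred from recomputing the table's sensitivity column, not stated by the authors):

```python
import math

def metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity (recall on tooth-marked), specificity from counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return acc, sens, spec

def mean_sd(values):
    """Mean and population standard deviation (denominator n, not n - 1)."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m, sd

# Raw-dataset sensitivities from the five folds in the table above.
sens_folds = [82.22, 90.48, 84.85, 90.70, 88.00]
m, sd = mean_sd(sens_folds)
print(f"{m:.2f}% ({sd:.2f}%)")  # 87.25% (3.29%), matching the table
```

The same `mean_sd` reproduces the other Sens/Spec averages in the table to two decimals.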
Classification results on a new testing dataset (Raw: new raw tongue image dataset, n = 50; Region: new tongue region image dataset, n = 50).

| | Raw Acc | Raw Sens | Raw Spec | Region Acc | Region Sens | Region Spec |
|---|---|---|---|---|---|---|
| Model 1 | 78.00% | 66.67% | 91.30% | 90.00% | 92.59% | 86.96% |
| Model 2 | 86.00% | 74.07% | 100% | 80.00% | 85.19% | 95.65% |
| Model 3 | 86.00% | 77.78% | 95.65% | 88.00% | 92.59% | 82.61% |
| Model 4 | 88.00% | 85.19% | 91.30% | 92.00% | 85.19% | 100% |
| Model 5 | 78.00% | 85.19% | 69.57% | 84.00% | 96.30% | 69.57% |
| Average (SD) | 83.20% (4.31%) | 77.78% (7.03%) | 89.56% (10.50%) | 86.80% (4.31%) | 90.37% (4.44%) | 86.96% (10.65%) |

Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; SD, standard deviation. Averages are reported as mean (population SD) over the five models; average accuracies are recomputed from the per-model values shown.
Five-fold cross-validation results of the VGG16 architecture (Raw: raw tongue image dataset, n = 1548; Region: tongue region image dataset, n = 1548).

| | Raw Acc | Raw Sens | Raw Spec | Region Acc | Region Sens | Region Spec |
|---|---|---|---|---|---|---|
| Fold 1 | 88.39% | 81.48% | 93.71% | 90.97% | 82.22% | 97.71% |
| Fold 2 | 90.97% | 85.71% | 94.57% | 91.29% | 88.10% | 93.48% |
| Fold 3 | 88.71% | 83.33% | 92.70% | 90.32% | 86.36% | 93.26% |
| Fold 4 | 91.26% | 86.05% | 95.00% | 92.88% | 93.02% | 92.78% |
| Fold 5 | 87.70% | 84.00% | 91.19% | 89.32% | 89.33% | 89.31% |
| Average (SD) | 89.41% (1.44%) | 84.11% (1.67%) | 93.43% (1.37%) | 90.96% (1.17%) | 87.81% (3.55%) | 93.31% (2.67%) |

Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; SD, standard deviation. Averages are reported as mean (population SD) over the five folds; average accuracies are recomputed from the fold values shown.
Five-fold cross-validation results of Sun’s architecture (Raw: raw tongue image dataset, n = 1548; Region: tongue region image dataset, n = 1548).

| | Raw Acc | Raw Sens | Raw Spec | Region Acc | Region Sens | Region Spec |
|---|---|---|---|---|---|---|
| Fold 1 | 71.61% | 54.07% | 85.17% | 70.65% | 48.89% | 87.43% |
| Fold 2 | 74.19% | 56.35% | 86.41% | 75.16% | 63.49% | 83.15% |
| Fold 3 | 67.42% | 43.18% | 85.39% | 70.65% | 54.55% | 82.58% |
| Fold 4 | 74.11% | 57.36% | 86.11% | 73.46% | 58.91% | 83.89% |
| Fold 5 | 65.70% | 48.67% | 81.76% | 68.93% | 61.33% | 76.10% |
| Average (SD) | 70.61% (3.47%) | 51.93% (5.31%) | 84.96% (1.67%) | 71.77% (2.23%) | 57.43% (5.20%) | 82.63% (3.68%) |

Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; SD, standard deviation. Averages are reported as mean (population SD) over the five folds; average accuracies are recomputed from the fold values shown.
Five-fold cross-validation results of Sun’s architecture with an input image size of 416 (Raw: raw tongue image dataset, n = 1548; Region: tongue region image dataset, n = 1548).

| | Raw Acc | Raw Sens | Raw Spec | Region Acc | Region Sens | Region Spec |
|---|---|---|---|---|---|---|
| Fold 1 | 68.06% | 44.44% | 86.29% | 69.35% | 57.78% | 78.29% |
| Fold 2 | 73.87% | 56.35% | 85.87% | 74.84% | 53.17% | 89.67% |
| Fold 3 | 68.06% | 50.00% | 81.46% | 70.65% | 56.82% | 80.90% |
| Fold 4 | 73.79% | 65.12% | 80.00% | 73.79% | 51.16% | 90.00% |
| Fold 5 | 66.67% | 58.67% | 74.21% | 68.61% | 60.00% | 76.73% |
| Average (SD) | 70.09% (3.10%) | 54.92% (7.13%) | 81.57% (4.41%) | 71.45% (2.45%) | 55.79% (3.20%) | 83.12% (5.64%) |

Abbreviations: Acc, accuracy; Sens, sensitivity; Spec, specificity; SD, standard deviation. Averages are reported as mean (population SD) over the five folds; average accuracies are recomputed from the fold values shown.
Fig. 3 Comparison with other tooth-marked tongue recognition methods. Our models with the ResNet34 and VGG16 architectures improve the accuracy of tooth-marked tongue classification by about 20 percentage points.
Fig. 4 Grad-CAM visualization examples for tongue images with tooth marks. The upper panel shows the tongue region images; the lower panel shows Grad-CAM heatmaps of the indicative regions overlaid on the tongue region images.