Huimin Xian1, Yanyan Xie1, Zizhu Yang1, Linzi Zhang1, Shangxuan Li1, Hongcai Shang2, Wu Zhou1, Honglai Zhang1.
Abstract
The quality of tongue images has a significant influence on the performance of tongue diagnosis in Chinese medicine. During acquisition, image quality is easily affected by factors such as illumination, camera parameters, and how far the subject extends the tongue. To ensure that the collected images meet the diagnostic criteria of traditional Chinese medicine practitioners, we propose a deep learning model to evaluate the quality of tongue images. First, we acquired tongue images of patients under different lighting, exposure, and tongue extension conditions using the inspection instrument, and experienced Chinese physicians manually screened them into high-quality and unqualified tongue datasets. We then designed a multi-task deep learning network that classifies tongue image quality with tongue segmentation as an auxiliary task, as the two tasks are related and can promote each other. Finally, because the two tasks contribute differently to the total loss, we adaptively learned the task weight coefficients of the multi-task network to obtain better tongue image quality assessment (IQA) performance. Experimental results show that the proposed method outperforms the traditional deep learning tongue IQA method. As an additional task, the network also outputs the tongue segmentation area, which is convenient for follow-up clinical tongue diagnosis. In addition, we used network visualization to verify the effectiveness of the proposed method qualitatively.
Keywords: deep learning; multi-task learning model; tongue image quality assessment; tongue segmentation; traditional Chinese medicine
Year: 2022 PMID: 36203936 PMCID: PMC9531121 DOI: 10.3389/fphys.2022.966214
Source DB: PubMed Journal: Front Physiol ISSN: 1664-042X Impact factor: 4.755
FIGURE 1. High-quality and unqualified tongue images. (A) High-quality tongue images; (B)-(E) unqualified images: (B) blurred tongue images; (C) too brightly lit tongue images; (D) too dimly lit tongue images; (E) tongue images with insufficient tongue extension.
FIGURE 2. (A) Original image; (B) segmentation mask; (C) extracted tongue image; (D) resized image.
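The four panels of Figure 2 suggest a mask-then-crop-then-resize preprocessing step. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function names `extract_tongue` and `resize_nearest`, the bounding-box crop, the nearest-neighbour interpolation, and the 224 x 224 target size are all assumptions made for illustration.

```python
import numpy as np

def extract_tongue(image, mask):
    """Zero out non-tongue pixels and crop to the bounding box of the mask."""
    mask = np.asarray(mask).astype(bool)
    if not mask.any():
        raise ValueError("mask contains no tongue pixels")
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    masked = image * mask[..., None]  # broadcast the mask over colour channels
    return masked[r0:r1, c0:c1]

def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize (assumed; the interpolation used is not stated)."""
    in_h, in_w = image.shape[:2]
    row_idx = np.arange(out_h) * in_h // out_h
    col_idx = np.arange(out_w) * in_w // out_w
    return image[row_idx[:, None], col_idx]

# Toy data standing in for panels (A) and (B).
image = np.full((6, 8, 3), 200, dtype=np.uint8)
mask = np.zeros((6, 8), dtype=np.uint8)
mask[1:4, 2:6] = 1
mask[1, 2] = 0  # one corner of the box is background

extracted = extract_tongue(image, mask)         # panel (C)
resized = resize_nearest(extracted, 224, 224)   # panel (D), hypothetical size
```

In practice a library resize with bilinear or bicubic interpolation would likely be preferred; nearest-neighbour is used here only to keep the sketch dependency-free.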
FIGURE 3. Proposed multi-task deep learning framework.
FIGURE 4. Structure of U-Net based on VGG16.
Ablation models. STL: single task learning; MTL: multi-task learning; Cla: classification; Seg: segmentation.
| Models | Task | Input images |
|---|---|---|
| STL_OTI | Cla | Original tongue images ( |
| STL_ETI | Cla | Extracted tongue image ( |
| MTL_equal_weight | Cla + Seg | Original images ( |
| MTL_adaptive_weight | Cla + Seg | Original images ( |
Performance comparison of different methods (mean ± sd). Accuracy, Precision, Recall, and F1-score are classification metrics; DSC, JI, MIoU, and FWIoU are segmentation metrics.
| Models | Accuracy | Precision | Recall | F1-score | DSC | JI | MIoU | FWIoU |
|---|---|---|---|---|---|---|---|---|
| ResNet_base ( | 0.813 ± 0.041 | 0.811 ± 0.027 | 0.801 ± 0.048 | 0.807 ± 0.031 | -- | -- | -- | -- |
| Deeptongue ( | -- | -- | -- | -- | 0.9647 ± 0.0402 | 0.9581 ± 0.2373 | 0.9569 ± 0.0715 | 0.9573 ± 0.1964 |
| Deeplabv3 ( | -- | -- | -- | -- | 0.9651 ± 0.0136 | 0.9617 ± 0.4013 | 0.9577 ± 0.0116 | 0.9579 ± 0.2399 |
| MTL_adaptive_weight (ours) | 0.890 ± 0.018 | 0.873 ± 0.034 | 0.899 ± 0.035 | 0.870 ± 0.017 | 0.9673 ± 0.0015 | 0.9711 ± 0.0044 | 0.9681 ± 0.0604 | 0.9693 ± 0.0170 |
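For reference, the four classification columns can be computed from binary predictions as sketched below. This uses the standard definitions; which class the paper treats as positive is not stated in this record, so the sketch assumes label 1 (for example, "high quality") is the positive class, and the function name `classification_metrics` is hypothetical.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))
    tn = int(np.sum(~y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    accuracy = (tp + tn) / y_true.size
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
cls_metrics = classification_metrics(y_true, y_pred)
```

The "mean ± sd" entries in the table would then be the mean and standard deviation of each metric over repeated runs or folds; the exact protocol is not given in this record.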
Performance of the ablation study of the proposed method (mean ± sd). Accuracy, Precision, Recall, and F1-score are classification metrics; DSC, JI, MIoU, and FWIoU are segmentation metrics.
| Models | Accuracy | Precision | Recall | F1-score | DSC | JI | MIoU | FWIoU |
|---|---|---|---|---|---|---|---|---|
| STL_OTI ( | 0.816 ± 0.035 | 0.819 ± 0.015 | 0.803 ± 0.019 | 0.810 ± 0.023 | 0.9657 ± 0.0008 | 0.9691 ± 0.3395 | 0.9672 ± 0.2613 | 0.9677 ± 0.0686 |
| STL_ETI | 0.878 ± 0.027 | 0.864 ± 0.031 | 0.822 ± 0.025 | 0.842 ± 0.014 | -- | -- | -- | -- |
| MTL_equal_weight | 0.879 ± 0.021 | 0.833 ± 0.028 | 0.885 ± 0.037 | 0.858 ± 0.026 | 0.9662 ± 0.0020 | 0.9698 ± 0.3347 | 0.9675 ± 0.1300 | 0.9678 ± 0.2701 |
| MTL_adaptive_weight (ours) | 0.890 ± 0.018 | 0.873 ± 0.034 | 0.899 ± 0.035 | 0.870 ± 0.017 | 0.9673 ± 0.0015 | 0.9711 ± 0.0044 | 0.9679 ± 0.0604 | 0.9707 ± 0.1278 |
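The segmentation columns (DSC, JI, MIoU, FWIoU) can be illustrated for a single binary mask as follows. This is a sketch using the common definitions of these metrics, assuming two classes (background and tongue); the function name `segmentation_metrics` is hypothetical and the paper's exact evaluation code is not part of this record.

```python
import numpy as np

def segmentation_metrics(pred, target):
    """Dice (DSC), Jaccard (JI), mean IoU, and frequency-weighted IoU."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    ious, freqs = [], []
    for cls in (False, True):  # background first, then tongue
        p, t = (pred == cls), (target == cls)
        inter = np.logical_and(p, t).sum()
        union = np.logical_or(p, t).sum()
        ious.append(inter / union if union else 1.0)
        freqs.append(t.mean())  # share of ground-truth pixels in this class
    tp = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    dsc = 2 * tp / denom if denom else 1.0
    return {
        "DSC": float(dsc),
        "JI": float(ious[1]),  # IoU of the tongue class
        "MIoU": float(np.mean(ious)),
        "FWIoU": float(np.sum(np.array(freqs) * np.array(ious))),
    }

# Toy masks: the prediction covers the true 2x2 region plus one extra column.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:4] = 1
seg_metrics = segmentation_metrics(pred, target)
# DSC = 0.8, JI = 2/3, MIoU = 0.75, FWIoU = 19/24
```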
FIGURE 5. Accuracy and loss curves for the different methods on tongue images.
FIGURE 6. Visualization results of the MTL_adaptive_weight strategy. (A) Weight values (Sigma_Cla = σ1, Sigma_Seg = σ2) changing during training. (B) Loss curves for the segmentation and classification tasks.
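This record does not give the formula behind σ1 and σ2. One widely used scheme with learnable per-task σ values is homoscedastic-uncertainty weighting, where the total loss is the sum over tasks of L_i / (2σ_i²) + log σ_i. The sketch below assumes that form purely for illustration; the function name `weighted_loss`, the toy loss values, and the plain gradient-descent update are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_loss(task_losses, log_vars):
    """Assumed form: sum_i L_i / (2 * sigma_i^2) + log(sigma_i).

    log_vars holds s_i = log(sigma_i^2), a common parameterisation that
    keeps sigma_i^2 positive without constraints.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    weights = 0.5 * np.exp(-log_vars)  # 1 / (2 * sigma_i^2)
    total = float(np.sum(weights * task_losses + 0.5 * log_vars))
    grad = 0.5 - weights * task_losses  # d(total) / d(log_vars)
    return total, weights, grad

# Toy demonstration with fixed, hypothetical task losses:
# index 0 = classification, index 1 = segmentation.
losses = np.array([0.60, 0.10])
log_vars = np.zeros(2)  # start with sigma_1 = sigma_2 = 1
for _ in range(500):
    _, _, grad = weighted_loss(losses, log_vars)
    log_vars -= 0.1 * grad  # gradient descent on the weights only

total, weights, _ = weighted_loss(losses, log_vars)
sigmas = np.exp(0.5 * log_vars)  # sigma_1 (Cla), sigma_2 (Seg)
```

Under this assumed form, each σ_i² settles near the size of its task loss, so the task with the smaller loss ends up with the larger weight. In real training the network parameters and the σ values would be updated jointly, so both the losses and the weights drift over time, as panel (A) plots.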
FIGURE 7. Visualization of saliency maps. Image1: high quality tongue image; and Image2. True indicates that the prediction is correct, and False indicates that the prediction is wrong.
FIGURE 8. Visualization of tongue segmentation. Prediction: prediction results. Total uncertainty: data uncertainty and model uncertainty. The value range of the uncertainty maps is between 0 and 1, and a larger value represents a higher degree of uncertainty.
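How the uncertainty maps in Figure 8 were produced is not described in this record. A common way to obtain per-pixel total, data, and model uncertainty is to run several stochastic forward passes (for example, with dropout kept active) and decompose the predictive entropy. The sketch below assumes that approach; `uncertainty_maps` and the sample values are hypothetical. With base-2 logarithms, binary entropy lies in [0, 1], which matches the value range stated in the caption.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Entropy in bits of a Bernoulli(p) variable; values lie in [0, 1]."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def uncertainty_maps(prob_samples):
    """Decompose uncertainty from T stochastic passes of shape (T, H, W).

    total = entropy of the mean prediction
    data  = mean entropy of the individual predictions
    model = total - data (disagreement between passes)
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    total = binary_entropy(prob_samples.mean(axis=0))
    data = binary_entropy(prob_samples).mean(axis=0)
    model = np.clip(total - data, 0.0, None)
    return total, data, model

# Two hypothetical passes over a 1x3 image of tongue probabilities.
samples = np.array([
    [[0.5, 0.0, 1.0]],
    [[0.5, 1.0, 1.0]],
])
total_u, data_u, model_u = uncertainty_maps(samples)
```

In this toy input, the first pixel is uncertain because every pass is ambiguous (data uncertainty), the second because the passes disagree (model uncertainty), and the third is confidently predicted.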