Xiaodong Huang1,2,3, Hui Zhang1,2, Li Zhuo1,2, Xiaoguang Li1,2, Jing Zhang1,2.
Abstract
Extracting the tongue body accurately from a digital tongue image is a challenge for automated tongue diagnosis because of the blurred edge of the tongue body, interference from pathological details, and the large variation in tongue size and shape. In this study, an automated tongue image segmentation method based on an enhanced fully convolutional network with an encoder-decoder structure was presented. Within the proposed network, a deep residual network was adopted as the encoder to obtain dense feature maps, and a Receptive Field Block (RFB) was attached after the encoder. The RFB can capture adequate global contextual prior because it is built from multibranch convolution layers with varying kernels. Moreover, a Feature Pyramid Network (FPN) was used as the decoder to fuse multiscale feature maps, gathering sufficient positional information to recover a clear contour of the tongue body. Quantitative evaluation of the segmentation results for 300 tongue images from the SIPL-tongue dataset showed that the average Hausdorff Distance, average Symmetric Mean Absolute Surface Distance, average Dice Similarity Coefficient, average precision, average sensitivity, and average specificity were 11.2963 pixels, 3.4737 pixels, 97.26%, 95.66%, 98.97%, and 98.68%, respectively. The proposed method achieved the best performance compared with four other deep-learning-based segmentation methods (SegNet, FCN, PSPNet, and DeepLab v3+). Similar results were obtained on the HIT-tongue dataset. The experimental results demonstrated that the proposed method can achieve accurate tongue image segmentation and meet the practical requirements of automated tongue diagnosis.
Year: 2020 PMID: 32831901 PMCID: PMC7428885 DOI: 10.1155/2020/6029258
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Schematic diagram of a typical computer-aided tongue diagnosis system.
Figure 2. Difficult cases for automated tongue image segmentation. (a) Tongue bodies with different size and shape. (b) Severe fissures and teeth marks on the surface of a tongue body. (c) Similar color between tongue body and lips.
Figure 3. The structures of Receptive Field Block.
Figure 4. The structures of Receptive Field Block.
Figure 5. The architecture of the tongue image segmentation network.
Training parameters of the proposed method.
| Hyperparameters | Interpretation |
|---|---|
| Normalization | Mean centering and standard deviation normalization of the intensities were performed |
| Cropping | All the images were center-cropped to a 512 × 512 pixel size |
| Optimization | SGD optimizer with the base learning rate 0.01 |
| Learning rate scheduling | The poly learning rate policy was used, where the current learning rate equals the base learning rate multiplied by $(1 - \mathrm{iter}/\mathrm{max\_iter})^{\mathrm{power}}$ |
| Batch size | 4 |
| Number of epochs | 50 |
| Momentum | 0.9 |
| Weight decay | 0.0005 |
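The poly learning rate policy listed in the table can be illustrated with a minimal sketch. The base learning rate, batch size, and number of epochs are taken from the table; the exponent `power = 0.9` and the training-set size `NUM_TRAIN_IMAGES` are assumptions made only for illustration, since neither value is given in this record.

```python
def poly_lr(base_lr: float, step: int, max_steps: int, power: float = 0.9) -> float:
    """Poly policy: base_lr * (1 - step / max_steps) ** power.

    power = 0.9 is an assumed default; this record does not state the exponent.
    """
    if max_steps <= 0 or not 0 <= step <= max_steps:
        raise ValueError("step must lie in [0, max_steps] with max_steps > 0")
    return base_lr * (1.0 - step / max_steps) ** power


BASE_LR = 0.01            # from the table
BATCH_SIZE = 4            # from the table
EPOCHS = 50               # from the table
NUM_TRAIN_IMAGES = 1000   # hypothetical value, not stated in this record

steps_per_epoch = NUM_TRAIN_IMAGES // BATCH_SIZE
max_steps = EPOCHS * steps_per_epoch

# Learning rate at the start of each epoch.
schedule = [poly_lr(BASE_LR, epoch * steps_per_epoch, max_steps)
            for epoch in range(EPOCHS)]
```

With this policy the rate starts at the base value and decays smoothly to zero at the final step, which avoids the abrupt drops of a step schedule.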
Description of the evaluation metrics.
| Metric name | Abbr. | Range | Interpretation | Category |
|---|---|---|---|---|
| Dice similarity coefficient | DSC | 0–1 | Similarity between masks | Overlap |
| Hausdorff distance | HD | >0 | Longest Euclidean distance between mask contours (absolute error) | Distance |
| Symmetric mean absolute surface distance | MSD | >0 | Mean Euclidean distance between mask contours (mean error) | Distance |
| Precision | PPV | 0–1 | Low values mean that the method tends to over segment | Statistical |
| Sensitivity | TPR | 0–1 | Low values mean that the method tends to under segment | Statistical |
| Specificity | TNR | 0–1 | Quality of segmented background | Statistical |
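The overlap and statistical metrics in the table follow from the pixel-wise confusion counts between a predicted mask and a ground-truth mask. The sketch below uses the standard definitions of these quantities; the function names and the toy masks are illustrative and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # 2-D binary mask: 1 = tongue, 0 = background


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def _ratio(num: int, den: int) -> float:
    # Returning NaN for an empty denominator is a convention chosen here.
    return num / den if den else float("nan")


def overlap_metrics(pred: Mask, truth: Mask) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(pred, truth)
    return {
        "DSC": _ratio(2 * tp, 2 * tp + fp + fn),  # Dice similarity coefficient
        "PPV": _ratio(tp, tp + fp),               # precision
        "TPR": _ratio(tp, tp + fn),               # sensitivity
        "TNR": _ratio(tn, tn + fp),               # specificity
    }


# Toy example: one over-segmented pixel and one missed pixel.
pred = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0]]
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 0, 0]]
scores = overlap_metrics(pred, truth)
```

A false positive lowers precision and a false negative lowers sensitivity, which matches the over- and under-segmentation readings given in the table.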
Evaluation of the segmentation results for 100 tongue images from the HIT-tongue dataset.
| Method | DSC | HD (pixel) | MSD (pixel) | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| SegNet | 0.9821 ± 0.0097 | 14.8461 ± 4.1231 | 3.0021 ± 2.0801 | 0.9814 ± 0.0153 | 0.9832 ± 0.0168 | 0.9893 ± 0.0082 |
| FCN | 0.9700 ± 0.0148 | 17.9651 ± 7.0000 | 4.8904 ± 2.8970 | 0.9646 ± 0.0246 | 0.9762 ± 0.0184 | 0.9792 ± 0.0143 |
| PSPNet | 0.9800 ± 0.0071 | 12.9046 ± 5.6969 | 3.2129 ± 1.0758 | 0.9806 ± 0.0119 | 0.9797 ± 0.0138 | 0.9885 ± 0.0075 |
| DeepLab v3+ | 0.9867 ± 0.0060 | 10.8410 ± 4.0000 | 2.1777 ± 1.0120 | 0.9834 ± 0.0104 | 0.9901 ± 0.0103 | 0.9903 ± 0.0064 |
| Ours | 0.9869 ± 0.0067 | 10.7215 ± 4.0000 | 2.1107 ± 1.0312 | 0.9862 ± 0.0096 | 0.9878 ± 0.0124 | 0.9921 ± 0.0053 |
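The HD and MSD columns are distances in pixels between the contours of the predicted and ground-truth masks. The following is a minimal brute-force sketch under two assumptions that this record does not confirm: contours are taken as foreground pixels with a 4-connected background (or out-of-image) neighbour, and MSD is averaged over the points of both contours. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]
Point = Tuple[int, int]


def contour_points(mask: Mask) -> List[Point]:
    """Foreground pixels that touch background or the image border."""
    h, w = len(mask), len(mask[0])
    points = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            if any(not (0 <= nr < h and 0 <= nc < w) or not mask[nr][nc]
                   for nr, nc in neighbours):
                points.append((r, c))
    return points


def _nearest_distances(src: List[Point], dst: List[Point]) -> List[float]:
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hausdorff_distance(a: List[Point], b: List[Point]) -> float:
    """Largest nearest-neighbour distance, taken over both directions."""
    return max(max(_nearest_distances(a, b)), max(_nearest_distances(b, a)))


def mean_surface_distance(a: List[Point], b: List[Point]) -> float:
    """Symmetric mean of nearest-neighbour distances over both contours."""
    d_ab = _nearest_distances(a, b)
    d_ba = _nearest_distances(b, a)
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


def shifted_square(offset: int) -> List[List[int]]:
    """6 x 6 mask with a 3 x 3 square whose left edge starts at column `offset`."""
    return [[1 if 1 <= r <= 3 and offset <= c <= offset + 2 else 0
             for c in range(6)] for r in range(6)]


contour_a = contour_points(shifted_square(1))
contour_b = contour_points(shifted_square(2))
hd = hausdorff_distance(contour_a, contour_b)
msd = mean_surface_distance(contour_a, contour_b)
```

HD reports the single worst contour error, so it is sensitive to isolated outliers, whereas MSD averages over the whole contour. This is consistent with the much larger standard deviations in the HD column than in the MSD column. The brute-force search is quadratic in the number of contour points; a spatial index would be preferable for full-resolution masks.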
Figure 6Segmentation results of tongue images from HIT-tongue dataset.
Evaluation of the segmentation results for 300 tongue images from the SIPL-tongue dataset.
| Method | DSC | HD (pixel) | MSD (pixel) | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| SegNet | 0.9645 ± 0.0194 | 26.3156 ± 33.4805 | 5.9745 ± 6.3669 | 0.9429 ± 0.0382 | 0.9871 ± 0.0114 | 0.9831 ± 0.0118 |
| FCN | 0.9646 ± 0.0148 | 14.4019 ± 5.4493 | 4.4256 ± 1.6332 | 0.9466 ± 0.0324 | 0.9843 ± 0.0144 | 0.9843 ± 0.0144 |
| PSPNet | 0.9680 ± 0.0138 | 12.9473 ± 5.1630 | 4.0266 ± 1.6854 | 0.9519 ± 0.0298 | 0.9854 ± 0.0132 | 0.9854 ± 0.0132 |
| DeepLab v3+ | 0.9699 ± 0.0148 | 12.5066 ± 6.2588 | 3.8472 ± 1.8831 | 0.9483 ± 0.0299 | 0.9931 ± 0.0060 | 0.9840 ± 0.0103 |
| Ours | 0.9726 ± 0.0136 | 11.2963 ± 5.7781 | 3.4737 ± 1.7573 | 0.9566 ± 0.0294 | 0.9897 ± 0.0085 | 0.9868 ± 0.0096 |
Figure 7Segmentation results of tongue images from SIPL-tongue dataset.
Ablation study on the SIPL-tongue dataset.
| Method | DSC | HD (pixel) | MSD (pixel) | Precision | Sensitivity | Specificity |
|---|---|---|---|---|---|---|
| Baseline | 0.9611 ± 0.0141 | 17.0623 ± 13.9580 | 5.1126 ± 2.2745 | 0.9387 ± 0.0324 | 0.9856 ± 0.0146 | 0.9809 ± 0.0110 |
| +RFB | 0.9654 ± 0.0124 | 13.8587 ± 5.0418 | 4.3694 ± 1.5632 | 0.9485 ± 0.0279 | 0.9836 ± 0.0148 | 0.9841 ± 0.0095 |
| +FPN | 0.9689 ± 0.0128 | 13.2633 ± 7.6874 | 3.9478 ± 1.6488 | 0.9515 ± 0.0292 | 0.9878 ± 0.0116 | 0.9850 ± 0.0099 |
| +MS | 0.9726 ± 0.0136 | 11.2963 ± 5.7781 | 3.4737 ± 1.7573 | 0.9566 ± 0.0294 | 0.9897 ± 0.0085 | 0.9868 ± 0.0096 |
+: adding a new module or strategy on top of the previous row rather than the baseline. RFB: embedding the RFB block into the segmentation network. FPN: employing the FPN structure. MS: fusing multiscale feature maps before the final pixel prediction.
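To make the FPN row more concrete, the top-down pathway of a generic Feature Pyramid Network can be sketched on single-channel feature maps: each coarse map is upsampled by a factor of two and added to the next finer map. This is a simplified illustration, not the authors' implementation. The 1 × 1 lateral and 3 × 3 smoothing convolutions of a real FPN are omitted, and the exact form of the MS fusion step is not described in this record. All names below are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]  # single-channel 2-D feature map


def upsample2x(fmap: FeatureMap) -> FeatureMap:
    """Nearest-neighbour upsampling by a factor of two in both directions."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    """Element-wise sum of two maps of identical size."""
    if len(a) != len(b) or len(a[0]) != len(b[0]):
        raise ValueError("feature maps must have the same size")
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down_fuse(pyramid: List[FeatureMap]) -> List[FeatureMap]:
    """Fuse a pyramid ordered coarse to fine, each level twice the previous size."""
    fused = [pyramid[0]]
    for lateral in pyramid[1:]:
        fused.append(add_maps(upsample2x(fused[-1]), lateral))
    return fused


# Toy pyramid: 1 x 1, 2 x 2, and 4 x 4 maps.
coarse = [[1.0]]
middle = [[1.0, 2.0], [3.0, 4.0]]
fine = [[0.0] * 4 for _ in range(4)]
fused_levels = top_down_fuse([coarse, middle, fine])
```

The finest fused level combines context from the coarse levels with the spatial detail of the fine level, which is the property the decoder relies on to recover the tongue contour.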