Deep learning-based pelvic levator hiatus segmentation from ultrasound images.

Zeping Huang¹, Enze Qu¹, Yishuang Meng², Man Zhang¹, Qiuwen Wei², Xianghui Bai², Xinling Zhang¹.

Abstract

Purpose: To automatically segment and measure the levator hiatus with a deep learning approach and to compare performance across algorithms, sonographers, and ultrasound devices.
Methods: Three deep learning models (UNet-ResNet34, HR-Net, and SegNet) were trained with 360 images and validated with 42 images. The trained models were evaluated on two test sets. The first set included 138 images and was used to compare the algorithms with sonographers. An independent dataset of 679 images was used to assess algorithm performance across different ultrasound devices. Four metrics were used for evaluation: Dice similarity coefficient (DSC), Hausdorff distance (HDD), the relative error of segmentation area, and the absolute error of segmentation area.
Results: The UNet model outperformed HR-Net and SegNet. It achieved a mean DSC of 0.964 on the first test set and 0.952 on the independent test set. In noninferiority tests on the first test set, UNet was noninferior to three senior sonographers, and in equivalence tests its performance was equivalent across the two test sets collected by different devices. On average, it took 2 s to process one case with a GPU and 2.4 s with a CPU.
Conclusions: The deep learning approach performs well for levator hiatus segmentation and generalizes well to an independent test set. This automatic levator hiatus segmentation approach could help shorten the clinical examination time and improve consistency.

Keywords: DSC, Dice similarity coefficient; Deep learning; HDD, Hausdorff distance; Levator hiatus segmentation; PFD, pelvic floor dysfunction or disorder; POP, pelvic organ prolapses; Pelvic floor ultrasound; UNet, UNet-ResNet34

Year: 2022 PMID: 35345817 PMCID: PMC8956942 DOI: 10.1016/j.ejro.2022.100412

Source DB: PubMed Journal: Eur J Radiol Open ISSN: 2352-0477

Introduction

Female pelvic floor dysfunction or disorder (PFD) is prevalent in postpartum women, with increasing morbidity, and can reduce quality of life. This condition is related to weakness of the levator ani muscles that support the pelvic organs. PFD conditions include disorders of defecation, urination, and sexual activity, pelvic organ prolapses (POP), and pelvic cavity pain [1]. Transperineal pelvic floor ultrasound is currently well accepted in clinical practice, and the levator hiatus dimension is an essential diagnostic index since it has been established as a standardized parameter [2], [3], [4], [5]. However, it is difficult to depict the levator hiatus with conventional 2D ultrasound imaging, since the muscles lie in the true axial view of the pelvis from the pubic symphysis to the puborectalis loop. 3D ultrasound technology can visualize the pelvic floor muscles and, with reconstruction, evaluate their integrity and function. After a static 3D volume image is acquired, a 2D axial (C-plane) levator hiatus image is reconstructed by rendering technology. The levator hiatus contour is usually outlined manually in this 2D C-plane rendered ultrasound image to compute the hiatus area for clinical diagnosis [6]. A semiautomatic approach to outlining the levator hiatus was first proposed in 2016 to reduce clinicians' workload [7]. However, it required manual input of the pubic symphysis and pubovisceral muscle positions in the C-plane as an initial step, and it achieved a Dice similarity coefficient (DSC) of 0.92. Artificial intelligence is increasingly applied in clinical practice, and many machine learning methods have been developed to support clinical workflow optimization and decision making [8], [9]. Deep learning is a subfield of machine learning that can automatically extract and learn features for further analysis [10].
Machine learning and deep learning have been extensively applied to medical image analysis and have succeeded in image classification, object detection, and segmentation [11], [12], [13], [14], [15], [16]. In the past five years, several machine learning and deep learning methods have been developed for fully automatic hiatus segmentation on C-plane ultrasound images [13], [14], [15], [16]. In 2019, Li et al. [13] applied convolutional neural networks to segment the levator hiatus automatically with a DSC of 0.964. Most of these studies used a training set and a test set split from the same dataset. Van Den Noort et al. [14] trained a deep learning model and tested it with two independent datasets, with mean DSCs of 0.94 and 0.93. The two test sets were collected from the same ultrasound device with the same settings but by different operators, so there are still considerable similarities between them. In clinical practice, however, algorithm performance is usually affected by differences between ultrasound systems and between operators. Clinical practice therefore requires a more generalized model that can adapt to different situations, and a developed model needs to be further evaluated for its robustness and generalizability. This study developed three deep learning algorithms to fully automatically segment the levator hiatus on C-plane rendered ultrasound images. We evaluated their performance with two test sets independently collected by different devices and operators. The algorithms were compared with three senior sonographers on the first test set, and their generalizability was evaluated on the second test set.

Material and methods

Data

The data used in this study comprised two independently collected datasets. Both datasets consist of pelvic floor ultrasound data from postpartum women or older women at the Third Affiliated Hospital of Sun Yat-sen University. Informed consent was obtained from all patients. The two datasets were collected by different operators using different ultrasound devices. Access to both datasets was approved by the institutional review board of the Third Affiliated Hospital of Sun Yat-sen University. Dataset1 was collected with a Philips EPIQ 7 with a V9-2 transducer and the VM 5.2 software platform (Andover, MA, United States), and includes 540 hiatus C-plane ultrasound images from 90 women in DICOM format. It was split into a training set (360 images from 60 women), a validation set (42 images from 7 women), and a test set (138 images from 23 women). In this study, Dataset1 was used to train the deep learning models and to compare the algorithms with sonographers. Dataset2 was collected with GE Voluson E8 and E10 systems (with a RAB 4–8 MHz volume probe) from GE Medical Systems (Tiefenbach, Austria), and includes 679 hiatus images from 368 women in JPEG format. In this study, Dataset2 was used as an independent test set to further evaluate the algorithms' performance across different ultrasound devices and their generalizability. All data were annotated by three senior sonographers (each with more than 10 years of ultrasound experience and more than 5 years of pelvic floor ultrasound experience). A majority-vote mechanism was applied to fuse the three annotations and obtain the final ground truth.
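The label fusion step can be illustrated with a minimal sketch. It assumes that each annotation is available as a binary mask of the same shape and that the vote is taken pixel by pixel; the function name `majority_vote` and the toy masks are hypothetical and are not taken from the study.

```python
import numpy as np

def majority_vote(masks):
    """Fuse binary masks: a pixel is foreground if more than half of the annotators marked it."""
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks], axis=0)
    votes = stack.sum(axis=0)            # number of annotators marking each pixel
    return votes * 2 > stack.shape[0]    # strict majority

# Toy 2 x 3 annotations standing in for A1, A2, and A3.
a1 = np.array([[1, 1, 0], [0, 1, 0]])
a2 = np.array([[1, 0, 0], [0, 1, 1]])
a3 = np.array([[1, 1, 0], [0, 0, 1]])
fused = majority_vote([a1, a2, a3])
print(fused.astype(int))
```

With three annotators there are no ties, so a strict majority is equivalent to "at least two of three".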

Algorithm

The ultrasound images were pre-processed automatically in Python by cropping the uninformative background and resizing to a fixed target size (768 × 544). Each image was converted to the HSV color space and then binarized with a fixed threshold (80 on a grayscale range of 0 to 255). The bounding box of the connected region in this binary image was used for cropping. The training set was augmented in various ways, such as image rotation and zooming, to add data diversity. Three deep learning networks were used to train hiatus segmentation models: U-Net with ResNet34 (UNet-ResNet34), HR-Net, and SegNet. The trained segmentation models were tested on two test sets from Dataset1 (n = 138) and Dataset2 (n = 679). Post-processing was then applied to keep only the largest connected region of each segmentation result. Fig. 1 shows the flow chart of algorithm training and testing.
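The cropping and post-processing steps can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the input is taken to be a single-channel intensity image (for example the V channel after HSV conversion), pixels strictly above the threshold count as foreground, the bounding box is taken over all foreground pixels rather than one connected region, and 4-connectivity is used for the largest-region filter. The resize to 768 × 544 is omitted, and the function names are hypothetical.

```python
from collections import deque
import numpy as np

def crop_to_foreground(gray, threshold=80):
    """Crop to the bounding box of pixels brighter than the threshold."""
    mask = gray > threshold
    if not mask.any():
        return gray  # nothing above threshold: leave the image unchanged
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return gray[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def keep_largest_component(mask):
    """Keep only the largest 4-connected foreground region of a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    best = []
    for r0, c0 in zip(*np.nonzero(mask)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, region = deque([(r0, c0)]), []
        while queue:  # breadth-first flood fill of one region
            r, c = queue.popleft()
            region.append((r, c))
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < h and 0 <= nc < w and mask[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if len(region) > len(best):
            best = region
    out = np.zeros_like(mask, dtype=bool)
    if best:
        rr, cc = zip(*best)
        out[list(rr), list(cc)] = True
    return out

# Toy image: a bright 4 x 5 block on a dark background.
image = np.zeros((6, 8), dtype=np.uint8)
image[1:5, 2:7] = 120
cropped = crop_to_foreground(image)

# Toy prediction: a 4-pixel region plus one isolated pixel.
pred = np.array([[1, 1, 0, 0, 0],
                 [1, 1, 0, 0, 1],
                 [0, 0, 0, 0, 0]])
cleaned = keep_largest_component(pred)
```

In practice a library routine such as connected-component labeling from an image-processing package would replace the explicit flood fill; the pure NumPy version is used here only to keep the sketch dependency-free.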

Fig. 1

Flow chart of levator hiatus segmentation.

The UNet, HR-Net, and SegNet architectures are shown in Fig. 2. UNet is based on the widely used U-Net image segmentation architecture [17] and uses the ResNet architecture [18] as the encoder in the downsampling path. This study used U-Net with a pre-trained ResNet34 as the backbone for levator hiatus segmentation. HR-Net starts with a high-resolution convolution stream, gradually adds high-to-low resolution streams, and connects these streams in parallel to realize multiresolution fusion. Its essential advantage is that it maintains high resolution throughout the whole process [19]. SegNet consists of an encoder network and a corresponding decoder network, and uses max-pooling indices to upsample [20].

Fig. 2

Architecture of UNet-ResNet34, HR-Net and SegNet.

These deep learning networks were implemented in PyTorch on Ubuntu 18.04 with an NVIDIA GTX 1080Ti GPU and an Intel Core i7-6800K @ 3.40 GHz CPU. The same loss function and stochastic gradient optimizer were applied to all three architectures. The loss function combined focal loss [21] and Dice loss, the initial learning rate was 1e-3, and the number of epochs was 40. The batch sizes used for UNet, HR-Net, and SegNet were 6, 3, and 2, respectively.
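As an illustration of how a focal loss and a Dice loss can be combined, the following NumPy sketch computes the forward value of such a loss for a binary segmentation map. The study does not state the weighting between the two terms or the focal loss hyperparameters, so the equal weighting, `gamma=2.0`, `alpha=0.25`, and the smoothing constant below are assumptions, and the function names are hypothetical. A training implementation would use the equivalent differentiable tensor operations in PyTorch.

```python
import numpy as np

def focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels (gamma and alpha are assumed values)."""
    prob = np.clip(prob, eps, 1 - eps)
    p_t = np.where(target == 1, prob, 1 - prob)        # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t)))

def dice_loss(prob, target, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed Dice overlap."""
    inter = np.sum(prob * target)
    return float(1 - (2 * inter + smooth) / (np.sum(prob) + np.sum(target) + smooth))

def combined_loss(prob, target, dice_weight=1.0):
    """Sum of focal and Dice terms; the weighting is an assumption."""
    return focal_loss(prob, target) + dice_weight * dice_loss(prob, target)

target = np.array([[1, 0], [1, 0]])
good = np.array([[0.9, 0.1], [0.8, 0.2]])   # close to the target
bad = np.array([[0.2, 0.9], [0.1, 0.7]])    # mostly wrong
print(combined_loss(good, target), combined_loss(bad, target))
```

The focal term down-weights pixels that are already classified confidently, while the Dice term directly rewards region overlap, which is why the two are often paired for segmentation with unbalanced foreground and background.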

Evaluation metrics

Four metrics were used for algorithm performance evaluation: DSC, HDD (Hausdorff distance), relative error of segmentation area, and absolute error of segmentation area [22], [23], [24], [25]. DSC and HDD are two widely used segmentation metrics. DSC measures the degree of overlap between the segmentation and the ground truth, and HDD measures the maximum distance between the segmentation contour and the ground truth contour [23]. Because the levator hiatus area is an essential diagnostic index in ultrasound pelvic floor assessments [24], [25], the relative error and absolute error of the segmentation area were also included in the evaluation. HDD and absolute area error were converted to cm and cm², respectively, using the pixel spacing. For HDD and relative error of area, noninferiority tests and equivalence tests were conducted to compare the algorithm with the sonographers and to compare algorithm performance between Dataset1 and Dataset2.
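The four metrics can be sketched with their standard definitions: DSC = 2|A ∩ B| / (|A| + |B|), the symmetric Hausdorff distance between the two contours, and the absolute and relative differences between the two areas. The sketch below assumes binary masks with non-empty foreground, isotropic pixel spacing given in cm per pixel, and contours approximated by foreground pixels that touch the background; the function names are hypothetical, and the study's exact implementation may differ.

```python
import numpy as np

def dice(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    denom = pred.sum() + truth.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, truth).sum() / denom

def boundary_points(mask):
    """Coordinates of foreground pixels with at least one 4-neighbor outside the mask."""
    mask = mask.astype(bool)
    padded = np.pad(mask, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    return np.argwhere(mask & ~interior)

def hausdorff(pred, truth, spacing=1.0):
    """Symmetric Hausdorff distance between the two contours, scaled by pixel spacing."""
    a = boundary_points(pred) * spacing
    b = boundary_points(truth) * spacing
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # all pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

def area_errors(pred, truth, spacing=1.0):
    """Return (relative error, absolute error) of the segmented area."""
    area_pred = pred.astype(bool).sum() * spacing ** 2
    area_true = truth.astype(bool).sum() * spacing ** 2
    abs_err = abs(area_pred - area_true)
    return float(abs_err / area_true), float(abs_err)

# Toy masks: a 4 x 4 ground truth and a 4 x 5 prediction shifted one pixel to the right.
truth = np.zeros((10, 10), dtype=bool)
truth[2:6, 2:6] = True
pred = np.zeros((10, 10), dtype=bool)
pred[2:6, 3:8] = True

print(dice(pred, truth), hausdorff(pred, truth), area_errors(pred, truth))
```

Passing the pixel spacing from the image header as `spacing` gives the Hausdorff distance in cm and the absolute area error in cm², matching the units used in the result tables.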

Results

The quantitative results for the test set from Dataset1 are listed in Table 1 (mean ± SD). A1, A2, and A3 represent the manually contoured annotations of the three senior sonographers. The reference ground truth was obtained by fusing A1, A2, and A3 with a voting mechanism. On all metrics, the UNet architecture outperformed the other two models. UNet achieved a mean DSC of 0.964 ± 0.02, HR-Net had a mean DSC of 0.930 ± 0.04, and SegNet had a mean DSC of 0.952 ± 0.02. The HDD results also showed that the UNet segmentation had the smallest maximum distance from the ground truth. The relative error and absolute error of the levator hiatus area led to the same conclusion. The box plot of the evaluation metrics on Dataset1 is shown in Fig. 3.

Table 1

Performance of UNet-ResNet34, HR-Net, and SegNet on Dataset1.

	DSC	HDD (cm)	Relative error of area (%)	Absolute error of area (cm²)
A1	0.972 ± 0.03	0.25 ± 0.13	4.50 ± 0.06	0.97 ± 1.24
A2	0.968 ± 0.02	0.30 ± 0.15	5.95 ± 0.05	1.28 ± 1.05
A3	0.972 ± 0.03	0.26 ± 0.14	4.46 ± 0.08	0.88 ± 1.20
UNet-ResNet34	0.964 ± 0.02	0.30 ± 0.17	4.40 ± 3.30	0.98 ± 0.89
HR-Net	0.930 ± 0.04	0.55 ± 0.33	9.87 ± 7.54	2.09 ± 1.61
SegNet	0.952 ± 0.02	0.46 ± 0.40	6.04 ± 4.65	1.32 ± 1.16

Fig. 3

Algorithm segmentation and manual contouring performance in Dataset1. A1, A2 and A3 represent three manually contoured annotations.

Considering the significance of these metrics to clinical diagnosis, a noninferiority test was conducted on HDD and the relative error of area to compare the algorithms' segmentation results with the sonographers' manual contouring results, with noninferiority margins of 0.2 cm and 0.05 (5%), respectively. The p-values are listed in Table 2. For UNet, all the p-values are < 0.05: its HDD is no more than 0.2 cm higher than that of the senior sonographers, and its relative area error is no more than 5% higher. The same conclusion holds for the relative area error of SegNet.

Table 2

P-value of noninferiority test between algorithm and human.

	HDD (cm)a			Relative error of area (%)b
	A1	A2	A3	A1	A2	A3
UNet-ResNet34	0	0	0	0	0	0
HR-Net	1	0.944	0.999	0.689	0.072	0.702
SegNet	0.656	0.127	0.507	0	0	0

a H0: Algorithm - Human ≥ 0.2; H1: Algorithm - Human < 0.2.

b H0: Algorithm - Human ≥ 0.05; H1: Algorithm - Human < 0.05.
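The hypotheses in the footnotes above can be tested with a one-sided test on the per-image difference between the algorithm and a sonographer. The study does not name the test statistic it used, so the sketch below is only one possible formulation: it assumes paired measurements on the same images and uses a large-sample normal approximation. The function name and the synthetic data are hypothetical and do not reproduce the study's measurements.

```python
import math
import numpy as np

def noninferiority_p(algorithm, human, margin):
    """One-sided p-value for H0: mean(algorithm - human) >= margin vs H1: < margin."""
    diff = np.asarray(algorithm, dtype=float) - np.asarray(human, dtype=float)
    se = diff.std(ddof=1) / math.sqrt(diff.size)   # standard error of the mean difference
    z = (diff.mean() - margin) / se
    return 0.5 * math.erfc(-z / math.sqrt(2))      # standard normal CDF evaluated at z

# Synthetic per-image errors, for illustration only.
rng = np.random.default_rng(0)
human = rng.normal(0.27, 0.14, size=138)
close_algo = human + rng.normal(0.03, 0.10, size=138)   # slightly worse than human
far_algo = human + rng.normal(0.40, 0.10, size=138)     # clearly worse than human

p_close = noninferiority_p(close_algo, human, margin=0.2)
p_far = noninferiority_p(far_algo, human, margin=0.2)
print(p_close, p_far)
```

A small p-value rejects H0 and supports noninferiority within the chosen margin; a p-value near 1, as for the clearly worse synthetic algorithm, means noninferiority cannot be claimed.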

To test the generalization of the models and evaluate their performance on data collected from different devices, the trained models were tested on Dataset2. The test results are listed in Table 3. The mean DSC of UNet reached 0.952 ± 0.03, and the mean HDD was 0.38 cm. Fig. 4 shows the box plot of the evaluation metrics for the two test sets.

Table 3

Performance of UNet-ResNet34, HR-Net, and SegNet on Dataset2.

	DSC	HDD (cm)	Relative error of area (%)	Absolute error of area (cm²)
UNet-ResNet34	0.952 ± 0.03	0.38 ± 0.27	6.40 ± 0.06	1.30 ± 1.68
HR-Net	0.894 ± 0.08	0.91 ± 0.59	14.88 ± 0.16	2.77 ± 2.67
SegNet	0.924 ± 0.06	0.70 ± 0.69	10.5 ± 0.12	2.08 ± 2.61

Fig. 4

Algorithm performance in two test sets.

An equivalence test was conducted on the HDD and relative error of area results between the test sets from Dataset1 and Dataset2. Table 4 indicates that UNet's performance can be regarded as equivalent in these two test sets when the equivalence margins are an HDD difference of less than 0.2 cm and a relative area error difference of less than 5%. The performance of HR-Net and SegNet cannot be regarded as equivalent under the same margins.

Table 4

P-value of equivalence test between two test sets.

	HDD (cm)a	Relative error of area (%)b
UNet-ResNet34	0	0
HR-Net	1	0.502
SegNet	0.809	0.187

a H0: |metric difference| ≥ 0.2; H1: |metric difference| < 0.2.

b H0: |metric difference| ≥ 0.05; H1: |metric difference| < 0.05.
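An equivalence hypothesis of this form is commonly tested with two one-sided tests (TOST), where equivalence is claimed only if both one-sided null hypotheses are rejected. The study does not state which procedure it used, so the following is a sketch under assumptions: the two test sets are treated as independent samples with unequal variances, and a large-sample normal approximation is used. The function name and the synthetic data are hypothetical.

```python
import math
import numpy as np

def equivalence_p(sample_a, sample_b, margin):
    """TOST p-value for H0: |mean_a - mean_b| >= margin vs H1: |mean_a - mean_b| < margin."""
    a = np.asarray(sample_a, dtype=float)
    b = np.asarray(sample_b, dtype=float)
    diff = a.mean() - b.mean()
    se = math.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    # Test 1, H0: diff <= -margin. Upper-tail probability of the shifted statistic.
    p_lower = 0.5 * math.erfc((diff + margin) / se / math.sqrt(2))
    # Test 2, H0: diff >= +margin. Lower-tail probability of the shifted statistic.
    p_upper = 0.5 * math.erfc(-(diff - margin) / se / math.sqrt(2))
    return max(p_lower, p_upper)   # both tests must reject, so report the larger p-value

# Synthetic metric values for two test sets, for illustration only.
rng = np.random.default_rng(1)
set1 = rng.normal(0.30, 0.17, size=138)
set2_similar = rng.normal(0.38, 0.27, size=679)
set2_shifted = rng.normal(0.90, 0.27, size=679)

p_similar = equivalence_p(set1, set2_similar, margin=0.2)
p_shifted = equivalence_p(set1, set2_shifted, margin=0.2)
print(p_similar, p_shifted)
```

Unlike an ordinary two-sample test, a large p-value here means equivalence could not be shown, not that the two test sets differ.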

Fig. 5 shows examples of segmentation results using UNet on the test sets from Dataset1 and Dataset2. The worst case in Dataset2 has a DSC of only 0.654; all other cases had a DSC higher than 0.75. In this specific case, a vast enlargement of the levator hiatus led to the loss of the pubic symphysis from the image, which caused UNet to place the contour incorrectly. With the deep learning approach, the levator hiatus segmentation model takes an average of 2 s to process one case with a GPU and 2.4 s with a CPU. It can help reduce pelvic floor ultrasound scanning time compared with manually outlining the levator hiatus.

Fig. 5

Segmentation results of UNet-ResNet34 in the 0th, 25th, 50th, 75th and 100th DSC percentiles in the test set from Dataset1 and Dataset2. The green lines represent the ground truth contour; the red lines represent the algorithm segmentation contour.

Discussion

In this study, we developed a deep learning-based method to automatically segment the levator hiatus. It can help optimize the current pelvic floor ultrasound scanning workflow by replacing manual contouring with an automatic approach, saving clinicians' time and improving scanning efficiency. We implemented three deep learning-based levator hiatus segmentation algorithms and tested their performance with two test datasets. The experimental results show that UNet outperforms the other two models, achieving mean DSCs of 0.964 ± 0.02 and 0.952 ± 0.03 in the two test sets. With 0.2 cm for HDD and 5% for the relative error of the levator hiatus area as the acceptable differences, UNet can be regarded as noninferior to three senior sonographers. The results showed that this fully automatic levator hiatus segmentation algorithm has high accuracy and can assist clinical operation. UNet's performance can also be regarded as equivalent in the two different test sets with the same acceptable differences. This indicates that this deep learning approach has good robustness and generalizability, and the model may be adaptable to different clinical situations. In prior studies, Bonmati et al. [15] trained a deep learning algorithm and achieved a DSC of 0.90 in leave-one-patient-out cross-validation in 2018. The training and test sets in that study were split from the same data source. However, algorithm performance in pelvic floor ultrasound is usually affected by the device and operator, so further evaluation with an independent set is needed to demonstrate the algorithm's robustness. In 2019, Van Den Noort et al. [14] trained U-Net and further evaluated it with an independent test set, achieving a DSC of 0.93. This independent test set was collected by another operator but with the same ultrasound device and settings, which resulted in many similarities between the training set and the independent test set.
In contrast to prior works, we evaluated algorithm performance not only with a test set drawn from the same dataset as the training set but also with a test set collected by different ultrasound devices and different operators, to simulate different clinical scenarios and further assess algorithm robustness. The differences between the training set and the independent test set are therefore larger, and a more solid experimental conclusion can be drawn. In 2021, Williams et al. [26] proposed a method to measure levator hiatus area from 3D pelvic ultrasound images, including C-plane extraction and levator hiatus segmentation. Compared with an expert, the mean HDD was 11.26 ± 5.95 mm and the mean absolute area error was 2.66 ± 2.78 cm². Williams' study covered not only levator hiatus segmentation but also the preceding step, C-plane extraction. However, the computer-observer differences were evaluated against only one clinician, which makes the result highly operator-dependent. Compared with Williams' study, we focused on levator hiatus segmentation and achieved higher performance and more reliable results. This automatic segmentation tool can be applied to levator hiatus analysis in pelvic floor ultrasound scanning in clinical practice. The deep learning-based segmentation system has been installed on an experimental ultrasound device to optimize the practice workflow. With the automatic contouring function, there is no need to trace the margin of the levator hiatus manually, which is time-consuming. In addition to being integrated into an ultrasound device, it can also be deployed on an offline post-processing workstation. It may also generalize to different clinical sites and different ultrasound devices and help reduce inconsistencies between operators. There are several limitations to this study. First, this is a single-site study: both datasets were collected from the same hospital and lacked ethnic diversity.
In the future, a global multisite study can be conducted to add data diversity and further evaluate algorithm robustness across different ethnicities. Second, the current segmentation algorithms are applied to 2D C-plane rendered ultrasound images, and a 3D segmentation approach can be explored in future research. Moreover, muscle analysis could be explored in addition to levator hiatus dimension analysis. In conclusion, we have developed deep learning levator hiatus segmentation algorithms and evaluated them with two independent test sets. The experimental results showed that one deep learning algorithm, UNet-ResNet34, has good performance and generalizability. This automatic levator hiatus segmentation approach may be applied in clinical practice to help shorten examination time and improve consistency.

Funding

The authors did not report any funding source.

CRediT authorship contribution statement

Zeping Huang: Conceptualization, Writing – original draft. Enze Qu: Writing – original draft, Formal analysis, Writing – review & editing. Yishuang Meng: Software, Methodology. Man Zhang: Validation, Formal analysis. Qiuwen Wei: Software, Data curation. Xianghui Bai: Supervision. Xinling Zhang: Conceptualization, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

19 in total

1. Automatic segmentation of levator hiatus from ultrasound images using U-net with dense connections.

Authors: Xu Li; Yuan Hong; Dexing Kong; Xinling Zhang
Journal: Phys Med Biol Date: 2019-04-04 Impact factor: 3.609

2. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation.

Authors: Vijay Badrinarayanan; Alex Kendall; Roberto Cipolla
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2017-01-02 Impact factor: 6.226

3. Deep High-Resolution Representation Learning for Visual Recognition.

Authors: Jingdong Wang; Ke Sun; Tianheng Cheng; Borui Jiang; Chaorui Deng; Yang Zhao; Dong Liu; Yadong Mu; Mingkui Tan; Xinggang Wang; Wenyu Liu; Bin Xiao
Journal: IEEE Trans Pattern Anal Mach Intell Date: 2020-04-01 Impact factor: 6.226

Review 4. Translabial US and Dynamic MR Imaging of the Pelvic Floor: Normal Anatomy and Dysfunction.

Authors: Luciana P Chamié; Duarte Miguel Ferreira Rodrigues Ribeiro; Angela H M Caiado; Gisele Warmbrand; Paulo C Serafini
Journal: Radiographics Date: 2018 Jan-Feb Impact factor: 5.333

5. Variability of the pubic arch architecture and its influence on the minimal levator hiatus area.

Authors: Ghazaleh Rostaminia; Michael Machiorlatti; S Abbas Shobeiri; Lieschen H Quiroz
Journal: Int J Gynaecol Obstet Date: 2016-04-22 Impact factor: 3.561

6. Automatic Extraction of Hiatal Dimensions in 3-D Transperineal Pelvic Ultrasound Recordings.

Authors: Helena Williams; Laura Cattani; Dominique Van Schoubroeck; Mohammad Yaqub; Carole Sudre; Tom Vercauteren; Jan D'Hooge; Jan Deprest
Journal: Ultrasound Med Biol Date: 2021-09-15 Impact factor: 2.998

7. The effect of levator avulsion on hiatal dimension and function.

Authors: Zeelha Abdool; Ka Lai Shek; Hans Peter Dietz
Journal: Am J Obstet Gynecol Date: 2009-05-08 Impact factor: 8.661

8. Deep learning enables automatic quantitative assessment of puborectalis muscle and urogenital hiatus in plane of minimal hiatal dimensions.

Authors: F van den Noort; C H van der Vaart; A T M Grob; M K van de Waarsenburg; C H Slump; M van Stralen
Journal: Ultrasound Obstet Gynecol Date: 2019-06-26 Impact factor: 7.299

9. Automatic segmentation method of pelvic floor levator hiatus in ultrasound using a self-normalizing neural network.

Authors: Ester Bonmati; Yipeng Hu; Nikhil Sindhwani; Hans Peter Dietz; Jan D'hooge; Dean Barratt; Jan Deprest; Tom Vercauteren
Journal: J Med Imaging (Bellingham) Date: 2018-01-10

Review 10. Three-dimensional/four-dimensional transperineal ultrasound: clinical utility and future prospects.

Authors: Ginevra Salsi; Ilaria Cataneo; Gaia Dodaro; Nicola Rizzo; Gianluigi Pilu; Mar Sanz Gascón; Aly Youssef
Journal: Int J Womens Health Date: 2017-09-12