Nicolette Taku¹, Kareem A Wahid¹, Lisanne V van Dijk¹,², Jaakko Sahlsten³, Joel Jaskari³, Kimmo Kaski², Clifton D Fuller¹, Mohamed A Naser¹
Abstract
Purpose: Segmentation of involved lymph nodes on head and neck computed tomography (HN-CT) scans is necessary for radiotherapy planning in early-stage human papillomavirus (HPV)-associated oropharyngeal cancer (OPC). We aimed to train a deep learning convolutional neural network (DL-CNN) to segment involved lymph nodes on HN-CT scans.
Year: 2022 PMID: 35782963 PMCID: PMC9240370 DOI: 10.1016/j.ctro.2022.06.007
Source DB: PubMed Journal: Clin Transl Radiat Oncol ISSN: 2405-6308
Fig. 1. Schematic representation of the pre-processing workflow. Head and neck computed tomography scans were cropped using the mandible, sternum, and external contours as boundaries (A & B). Each scan was then divided into four patches of 96 × 96 × 96 voxels (C).
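The cropping and patching steps in Fig. 1 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the axial limits standing in for the mandible and sternum landmarks (`z_start`, `z_stop`) are hypothetical inputs, and because the caption does not say how the four 96-voxel cubes are arranged, the sketch assumes the cropped volume is center-padded or center-cropped to 96 × 192 × 192 voxels (z, y, x) and tiled as a 2 × 2 in-plane grid.

```python
import numpy as np

PATCH = 96  # patch edge length in voxels, from the Fig. 1 caption


def crop_to_body(ct, body_mask, z_start, z_stop):
    """Crop a (z, y, x) CT volume to the axial slab [z_start, z_stop) and to the
    in-plane bounding box of the external body contour within that slab.
    z_start / z_stop are hypothetical stand-ins for the mandible and sternum limits."""
    slab = body_mask[z_start:z_stop]
    ys, xs = np.nonzero(slab.any(axis=0))  # in-plane footprint of the body
    return ct[z_start:z_stop, ys.min():ys.max() + 1, xs.min():xs.max() + 1]


def pad_or_crop(vol, shape, fill):
    """Center-pad (with `fill`) or center-crop `vol` along each axis to `shape`."""
    out = np.full(shape, fill, dtype=vol.dtype)
    src, dst = [], []
    for n_in, n_out in zip(vol.shape, shape):
        if n_in >= n_out:  # too large: keep the central n_out voxels
            start = (n_in - n_out) // 2
            src.append(slice(start, start + n_out))
            dst.append(slice(0, n_out))
        else:  # too small: place the volume in the center of the output
            start = (n_out - n_in) // 2
            src.append(slice(0, n_in))
            dst.append(slice(start, start + n_in))
    out[tuple(dst)] = vol[tuple(src)]
    return out


def split_into_patches(vol, patch=PATCH):
    """Tile a volume whose dimensions are multiples of `patch` into patch^3 cubes."""
    nz, ny, nx = (s // patch for s in vol.shape)
    return [vol[i * patch:(i + 1) * patch,
                j * patch:(j + 1) * patch,
                k * patch:(k + 1) * patch]
            for i in range(nz) for j in range(ny) for k in range(nx)]


# Synthetic example: random "CT" values with a box-shaped body contour
rng = np.random.default_rng(0)
ct = rng.standard_normal(size=(150, 260, 260), dtype=np.float32)
body = np.zeros(ct.shape, dtype=bool)
body[:, 40:220, 30:230] = True
cropped = crop_to_body(ct, body, z_start=20, z_stop=130)    # (110, 180, 200)
fixed = pad_or_crop(cropped, (96, 192, 192), fill=-1000.0)  # assumed target grid
patches = split_into_patches(fixed)
print(len(patches), patches[0].shape)  # 4 (96, 96, 96)
```

Padding with -1000 (the Hounsfield value of air) keeps padded regions looking like background; whether the study padded, resampled, or cropped to reach a multiple of 96 is not stated here.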
Fig. 2. Schematic representation of the U-Net architecture used for the deep learning convolutional neural network, annotated with the number of channels, batch normalization (BN) layers, and Parametric Rectified Linear Unit (PReLU) layers.
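The BN and PReLU layers annotated in Fig. 2 are simple per-channel operations. The NumPy sketch below shows training-mode batch normalization and PReLU on a 3-D feature map laid out as (N, C, D, H, W). The layout, the per-channel slope, and the slope value 0.25 (the default initial slope of PyTorch's `nn.PReLU`) are illustrative assumptions; framework BN layers also keep running statistics for inference, which this sketch omits.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization for a (N, C, D, H, W) tensor: each channel
    is standardized with the mean and variance over the batch and all voxels,
    then scaled by gamma and shifted by beta (one value per channel)."""
    axes = (0, 2, 3, 4)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1, 1)


def prelu(x, alpha):
    """Parametric ReLU: passes positive inputs unchanged and multiplies negative
    inputs by a learnable slope alpha (one slope per channel here)."""
    return np.where(x > 0, x, alpha.reshape(1, -1, 1, 1, 1) * x)


rng = np.random.default_rng(0)
feat = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 8, 8, 8))  # (N, C, D, H, W)
normed = batch_norm(feat, gamma=np.ones(4), beta=np.zeros(4))
out = prelu(normed, alpha=np.full(4, 0.25))  # illustrative slope, learned in practice
```

Unlike a plain ReLU, PReLU keeps a small gradient for negative activations, and the slope is trained along with the convolution weights.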
Fig. 3. Five sub-model testing segmentation masks and one consensus segmentation mask were generated for each head and neck computed tomography scan in the testing dataset. The red contour corresponds to the ground-truth mask, the blue contours to the sub-model testing segmentation masks, and the yellow contour to the consensus segmentation mask, generated by combining the five sub-model testing segmentation masks with the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
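STAPLE treats each sub-model mask as a rater with unknown sensitivity and specificity and estimates these jointly with a probabilistic consensus by expectation-maximization. The sketch below is a from-scratch binary version in NumPy; the original work may have used an existing library implementation. The fixed global foreground prior, the 0.99 initial sensitivities and specificities, and the 0.5 threshold on the posterior are all assumptions.

```python
import numpy as np


def staple_binary(masks, max_iter=100, tol=1e-8, eps=1e-6):
    """Binary STAPLE (expectation-maximization) for R non-empty binary masks.

    masks: array-like of shape (R, ...) holding R segmentations of one image.
    Returns (W, sensitivity, specificity): W is the per-voxel posterior
    probability that the true label is foreground."""
    masks = np.asarray(masks, dtype=bool)
    D = masks.reshape(masks.shape[0], -1)  # (R, N) rater decisions
    prior = D.mean()                       # fixed global foreground prior
    p = np.full(D.shape[0], 0.99)          # initial sensitivities
    q = np.full(D.shape[0], 0.99)          # initial specificities
    for _ in range(max_iter):
        # E-step (log space): how likely each voxel's decisions are if the
        # true label is foreground (log_a) or background (log_b)
        log_a = np.log(prior) + np.where(D, np.log(p)[:, None],
                                         np.log1p(-p)[:, None]).sum(axis=0)
        log_b = np.log1p(-prior) + np.where(D, np.log1p(-q)[:, None],
                                            np.log(q)[:, None]).sum(axis=0)
        W = 1.0 / (1.0 + np.exp(np.clip(log_b - log_a, -700, 700)))
        # M-step: re-estimate each rater's sensitivity and specificity from W
        p_new = np.clip((D * W).sum(axis=1) / W.sum(), eps, 1 - eps)
        q_new = np.clip((~D * (1 - W)).sum(axis=1) / (1 - W).sum(), eps, 1 - eps)
        done = max(np.abs(p_new - p).max(), np.abs(q_new - q).max()) < tol
        p, q = p_new, q_new
        if done:
            break
    return W.reshape(masks.shape[1:]), p, q


# Toy example: five 1-D "masks" of 20 voxels; the true node spans voxels 0-7
truth = np.zeros(20, dtype=bool)
truth[:8] = True
raters = np.stack([truth.copy() for _ in range(5)])
raters[3, 8] = True   # sub-model 4 over-segments by one voxel
raters[4, 0] = False  # sub-model 5 under-segments by one voxel
W, sens, spec = staple_binary(raters)
consensus = W >= 0.5
```

In the toy example, each of two sub-models disagrees with the others on a single voxel; STAPLE recovers the majority mask and assigns the lower specificity (sub-model 4) or sensitivity (sub-model 5) to the sub-model that erred.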
Patient and tumor clinical characteristics for all patients (N = 110), patients in the training/validation dataset (n = 70), and patients in the testing dataset (n = 40).
| Characteristic | All patients (N = 110), n (%) | Training/validation (n = 70), n (%) | Testing (n = 40), n (%) |
|---|---|---|---|
| Median age [IQR], y | 60 [53–65] | 60 [54–65] | 59 [53–67] |
| Sex | |||
| Male | 101 (91.8) | 66 (94.3) | 35 (87.5) |
| Female | 9 (8.2) | 4 (5.7) | 5 (12.5) |
| Smoking Status | |||
| Never | 72 (65.5) | 44 (62.9) | 28 (70.0) |
| Former | 34 (30.9) | 24 (34.3) | 10 (25.0) |
| Current | 4 (3.6) | 2 (2.9) | 2 (5.0) |
| Oropharynx subsite | |||
| Base of tongue | 51 (46.4) | 38 (54.3) | 13 (32.5) |
| Tonsil | 59 (53.6) | 32 (45.7) | 27 (67.5) |
| Clinical tumor classification | |||
| cT1 | 63 (57.3) | 44 (62.9) | 19 (47.5) |
| cT2 | 47 (42.7) | 26 (37.1) | 21 (52.5) |
| Clinical lymph node classification | |||
| cN0 | 20 (18.2) | 0 (0.0) | 20 (50.0) |
| cN1 | 90 (81.8) | 70 (100.0) | 20 (50.0) |
| Median number of removed lymph nodes [IQR] | 26 [21–34.8] | 26.5 [21–35] | 26 [23–29] |
| Number of involved lymph nodes | |||
| 0 | 20 (18.2) | 0 (0.0) | 20 (50.0) |
| 1 | 68 (61.8) | 53 (75.7) | 16 (40.0) |
| 2 | 18 (16.4) | 16 (22.9) | 2 (5.0) |
| 3 | 3 (2.7) | 0 (0.0) | 2 (5.0) |
| 4 | 1 (0.9) | 1 (1.4) | 0 (0.0) |
| Median individual lymph node volume [IQR], cc | 6.7 [2.8–10.3] | 7.7 [3.0–11.1] | 5.3 [2.1–8.5] |
| Median ground truth segmentation mask volume [IQR], cc | 8.4 [5.6–12.9] | 9.6 [7.3–14.9] | 6.6 [2.2–15.5] |
Abbreviations: IQR, interquartile range; cc, cubic centimeters; y, years.
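The table uses two summary formats: counts with percentages for categorical rows and medians with interquartile ranges for continuous rows. A minimal sketch of how such cells could be produced is below. The authors' software and quartile convention are not stated, so using `np.percentile` with its default linear interpolation is an assumption. The two printed examples reproduce cells from the table (101 of 110 male patients; 2 of 70 current smokers in the training/validation set).

```python
import numpy as np


def count_pct(n, total):
    """Categorical cell: 'n (%)' with the percentage to one decimal place."""
    return f"{n} ({100 * n / total:.1f})"


def median_iqr(values, decimals=1):
    """Continuous cell: 'median [Q1–Q3]'. Uses np.percentile's default linear
    interpolation; other quartile conventions can shift Q1 and Q3 slightly."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return f"{med:.{decimals}f} [{q1:.{decimals}f}–{q3:.{decimals}f}]"


print(count_pct(101, 110))  # 101 (91.8)
print(count_pct(2, 70))     # 2 (2.9)
```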
Minimum, maximum, median, and interquartile range (IQR) values of the overlap-based (Dice similarity coefficient, DSC) and volume-based (volume similarity, VS) metrics for the five sub-model validation segmentation masks compared with the ground-truth masks.
| Sub-model | DSC Min. | DSC Max. | DSC Med. | DSC IQR | VS Min. | VS Max. | VS Med. | VS IQR |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.83 | 0.94 | 0.91 | 0.88–0.92 | 0.84 | 0.99 | 0.96 | 0.95–0.98 |
| 2 | 0.81 | 0.96 | 0.92 | 0.90–0.94 | 0.86 | 1.00 | 0.96 | 0.94–0.98 |
| 3 | 0.83 | 0.94 | 0.91 | 0.88–0.93 | 0.85 | 0.99 | 0.97 | 0.93–0.98 |
| 4 | 0.80 | 0.95 | 0.91 | 0.88–0.94 | 0.80 | 0.99 | 0.95 | 0.93–0.97 |
| 5 | 0.67 | 0.97 | 0.90 | 0.85–0.91 | 0.70 | 1.00 | 0.95 | 0.92–0.97 |
Abbreviations: DSC, Dice similarity coefficient; IQR, interquartile range; Max., maximum; Min., minimum; VS, volume similarity.
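For a predicted mask A and a ground-truth mask B, the Dice similarity coefficient is DSC = 2|A ∩ B| / (|A| + |B|) and volume similarity is VS = 1 − ||A| − |B|| / (|A| + |B|). VS compares volumes only, so two masks of equal size score VS = 1 even when they overlap poorly; and because |A ∩ B| ≤ min(|A|, |B|), VS is never smaller than DSC, consistent with the higher VS values in the table. A minimal NumPy sketch of both metrics, assuming binary voxel masks of equal shape:

```python
import numpy as np


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())


def volume_similarity(a, b):
    """Volume similarity: 1 - abs(|A| - |B|) / (|A| + |B|); ignores location."""
    va, vb = np.asarray(a, bool).sum(), np.asarray(b, bool).sum()
    return 1.0 - abs(int(va) - int(vb)) / (va + vb)


pred = np.array([1, 1, 1, 0])
truth = np.array([0, 1, 1, 1])
print(round(dice(pred, truth), 3), volume_similarity(pred, truth))  # 0.667 1.0
```

The example shows the gap between the two metrics: the masks share only two of their three voxels (DSC ≈ 0.667) yet have identical volumes (VS = 1.0).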
Minimum, maximum, median, and interquartile range (IQR) values of the overlap-based (Dice similarity coefficient, DSC), volume-based (volume similarity, VS), spatial distance-based (Hausdorff distance, HD), and probabilistic (Cohen Kappa Coefficient, CKC) metrics for the five sub-model testing segmentation masks and the STAPLE consensus segmentation masks compared with the ground-truth masks.
| Mask | DSC Min. | DSC Max. | DSC Med. | DSC IQR | VS Min. | VS Max. | VS Med. | VS IQR | HD Min. | HD Max. | HD Med. | HD IQR | CKC Min. | CKC Max. | CKC Med. | CKC IQR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Sub-model 1 | 0.55 | 0.95 | 0.92 | 0.89–0.94 | 0.64 | 1.00 | 0.97 | 0.95–0.98 | 1.11 | 92.0 | 4.92 | 1.11–16.0 | 0.55 | 0.95 | 0.92 | 0.89–0.94 |
| Sub-model 2 | 0.66 | 0.95 | 0.92 | 0.88–0.94 | 0.70 | 1.00 | 0.96 | 0.92–0.99 | 1.22 | 90.0 | 5.78 | 1.22–17.4 | 0.66 | 0.95 | 0.92 | 0.88–0.94 |
| Sub-model 3 | 0.58 | 0.95 | 0.92 | 0.87–0.94 | 0.59 | 1.00 | 0.97 | 0.93–0.99 | 1.65 | 86.9 | 5.08 | 1.64–18.5 | 0.58 | 0.95 | 0.92 | 0.87–0.95 |
| Sub-model 4 | 0.62 | 0.96 | 0.91 | 0.87–0.94 | 0.73 | 1.00 | 0.97 | 0.91–0.99 | 1.22 | 90.7 | 4.15 | 1.22–9.04 | 0.62 | 0.96 | 0.91 | 0.87–0.94 |
| Sub-model 5 | 0.69 | 0.96 | 0.92 | 0.88–0.94 | 0.72 | 1.00 | 0.97 | 0.92–0.99 | 1.22 | 91.4 | 5.56 | 1.22–11.7 | 0.69 | 0.96 | 0.92 | 0.88–0.94 |
| STAPLE consensus | 0.61 | 0.96 | 0.92 | 0.89–0.95 | 0.68 | 1.00 | 0.97 | 0.94–0.99 | 1.22 | 90.9 | 4.52 | 1.22–8.38 | 0.61 | 0.96 | 0.92 | 0.89–0.95 |
Abbreviations: CKC, Cohen Kappa Coefficient; DSC, Dice similarity coefficient; HD, Hausdorff distance (in mm); IQR, interquartile range; Max., maximum; Min., minimum; STAPLE, Simultaneous Truth and Performance Level Estimation; VS, volume similarity.
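The Hausdorff distance is the largest distance from a point in either mask to the nearest point in the other, and Cohen's kappa corrects voxel-wise agreement for the agreement expected by chance. When background voxels vastly outnumber foreground voxels, kappa converges to DSC, which is why the CKC row closely tracks the DSC row. The brute-force sketch below assumes the maximum (not 95th-percentile) Hausdorff distance over all foreground voxels, scaled by a voxel `spacing` in mm; the variant and implementation used in the study are not stated here.

```python
import numpy as np


def hausdorff(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric Hausdorff distance (in units of `spacing`) between the
    foreground voxels of two non-empty binary masks. Brute force: memory grows
    with |A| x |B|, so this is for illustration, not full-resolution CT."""
    pa = np.argwhere(a) * np.asarray(spacing)
    pb = np.argwhere(b) * np.asarray(spacing)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)  # |A| x |B|
    return max(d.min(axis=1).max(), d.min(axis=0).max())


def cohen_kappa(a, b):
    """Cohen's kappa for two binary masks, treating every voxel as a rating."""
    a, b = np.asarray(a, bool).ravel(), np.asarray(b, bool).ravel()
    po = np.mean(a == b)  # observed agreement
    pe = a.mean() * b.mean() + (1 - a.mean()) * (1 - b.mean())  # chance agreement
    return (po - pe) / (1 - pe)


a = np.zeros((1, 1, 5), bool)
a[0, 0, 0] = True
b = np.zeros((1, 1, 5), bool)
b[0, 0, [0, 3]] = True
print(hausdorff(a, b))                          # 3.0
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Because HD is a maximum, a single stray false-positive voxel far from the node can dominate it, which helps explain maxima near 90 mm alongside medians of about 4 to 6 mm.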
Fig. 4. Comparison of consensus segmentation masks (yellow) with ground-truth masks (red) for a subset of testing-dataset patients whose Dice similarity coefficients were greater than or equal to (A, B, C; 1, 3, and 2 involved lymph nodes, respectively), slightly lower than (D, E; 2 involved lymph nodes and 1 involved lymph node, respectively), or much lower than (F; 1 involved lymph node) the median Dice similarity coefficient of 0.92. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 5. Receiver operating characteristic (ROC) curves for discriminating node-positive from node-negative head and neck computed tomography scans at three resampled image resolutions (high, 1.0 mm; medium, 1.5 mm; low, 2.0 mm), with the corresponding confusion matrices.
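An ROC curve is traced by sweeping a decision threshold over a per-scan score and plotting the true-positive rate against the false-positive rate; each confusion matrix corresponds to one chosen threshold. How each scan was scored for node positivity is not stated in this excerpt, so the score below (for example, the total predicted lymph-node volume) and the data are hypothetical. The sketch computes ROC points, the area under the curve by the trapezoidal rule, and a 2 × 2 confusion matrix in NumPy; repeating it for scans resampled to 1.0, 1.5, and 2.0 mm would give one curve per resolution.

```python
import numpy as np


def roc_curve(labels, scores):
    """ROC points (FPR, TPR), sweeping the threshold over every distinct score
    from highest to lowest; a scan is called positive when score >= threshold."""
    labels = np.asarray(labels, bool)
    scores = np.asarray(scores, float)
    thresholds = np.unique(scores)[::-1]
    tpr = [0.0] + [np.mean(scores[labels] >= t) for t in thresholds]
    fpr = [0.0] + [np.mean(scores[~labels] >= t) for t in thresholds]
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))


def confusion_matrix(labels, scores, threshold):
    """2 x 2 counts [[TN, FP], [FN, TP]] at a fixed score threshold."""
    labels = np.asarray(labels, bool)
    pred = np.asarray(scores, float) >= threshold
    return np.array([[np.sum(~labels & ~pred), np.sum(~labels & pred)],
                     [np.sum(labels & ~pred), np.sum(labels & pred)]])


# Hypothetical data: 1 = node-positive scan; score = e.g. predicted node volume
y = [0, 0, 1, 1]
s = [0.1, 0.4, 0.35, 0.8]
fpr, tpr = roc_curve(y, s)
print(auc(fpr, tpr))  # 0.75
cm = confusion_matrix(y, s, threshold=0.5)
```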
Physician observer evaluations of manually generated (i.e., “human”) and model-generated (i.e., “computer”) segmentations.
| Quality rating | Computer, n (proportion) | Human, n (proportion) | Perceived source | Computer, n (proportion) | Human, n (proportion) | Confidence | Computer, n (proportion) | Human, n (proportion) |
|---|---|---|---|---|---|---|---|---|
| 0.83 | | | 0.66 | | | 0.72 | | |
| Clinically acceptable, highly accurate | 13 (0.65) | 13 (0.65) | Computer | 8 (0.40) | 9 (0.45) | Very confident | 2 (0.10) | 6 (0.30) |
| Clinically acceptable, errors not significant | 3 (0.15) | 3 (0.15) | Human | 12 (0.60) | 11 (0.55) | Somewhat confident | 14 (0.70) | 9 (0.45) |
| Requires correction, minor errors | 2 (0.10) | 3 (0.15) | | | | Somewhat unconfident | 4 (0.20) | 4 (0.20) |
| Requires correction, large errors | 2 (0.10) | 1 (0.05) | | | | Very unconfident | 0 (0.0) | 1 (0.05) |
| 0.75 | | | 0.82 | | | 0.046 | | |
| Clinically acceptable, highly accurate | 4 (0.20) | 8 (0.40) | Computer | 11 (0.55) | 11 (0.55) | Very confident | 0 (0.0) | 1 (0.05) |
| Clinically acceptable, errors not significant | 8 (0.40) | 10 (0.50) | Human | 9 (0.45) | 9 (0.45) | Somewhat confident | 4 (0.20) | 9 (0.45) |
| Requires correction, minor errors | 6 (0.30) | 1 (0.05) | | | | Somewhat unconfident | 15 (0.75) | 10 (0.50) |
| Requires correction, large errors | 2 (0.10) | 1 (0.05) | | | | Very unconfident | 1 (0.05) | 0 (0.0) |
| 0.41 | | | 0.26 | | | 0.28 | | |
| Clinically acceptable, highly accurate | 14 (0.70) | 10 (0.50) | Computer | 7 (0.35) | 7 (0.35) | Very confident | 0 (0.0) | 0 (0.0) |
| Clinically acceptable, errors not significant | 5 (0.25) | 10 (0.50) | Human | 13 (0.65) | 13 (0.65) | Somewhat confident | 3 (0.15) | 1 (0.05) |
| Requires correction, minor errors | 1 (0.05) | 0 (0.0) | | | | Somewhat unconfident | 7 (0.35) | 6 (0.30) |
| Requires correction, large errors | 0 (0.0) | 0 (0.0) | | | | Very unconfident | 10 (0.50) | 13 (0.65) |