Manuel Schultheiss1,2, Philipp Schmette3, Jannis Bodden4, Juliane Aichele4, Christina Müller-Leisse4, Felix G Gassert4, Florian T Gassert4, Joshua F Gawlitza4, Felix C Hofmann4, Daniel Sasse4, Claudio E von Schacky4, Sebastian Ziegelmayer4, Fabio De Marco3, Bernhard Renger4, Marcus R Makowski4, Franz Pfeiffer3,4, Daniela Pfeiffer4.
Abstract
We present a method to generate synthetic thorax radiographs with realistic nodules from CT scans, together with perfect ground-truth knowledge. We evaluated the detection performance of nine radiologists and two convolutional neural networks (CNNs) in a reader study. Nodules were artificially inserted into the lung of a CT volume, and synthetic radiographs were obtained by forward-projecting the volume. Because accurate ground-truth labels are available for every inserted nodule, the framework allows a detailed evaluation of the performance of both CAD systems and radiologists. Radiographs for network training (U-Net and RetinaNet) were generated from 855 CT scans of a public dataset. For the reader study, 201 radiographs were generated from 21 nodule-free CT scans with varying positions, sizes and counts of inserted nodules. On average, the nine radiologists achieved 248.8 true positive, 51.7 false positive and 121.2 false negative detections. The best performing CAD system achieved 268 true positives, 66 false positives and 102 false negatives. The corresponding weighted alternative free-response operating characteristic figures of merit (wAFROC FOM) for the radiologists ranged from 0.54 to 0.87, compared to 0.81 (CI 0.75-0.87) for the best performing CNN. The CNN did not perform significantly better than the combined average of the nine readers (p = 0.49). Paramediastinal nodules accounted for most false positive and false negative detections by readers, which can be explained by the presence of more tissue in this area.
Year: 2021 PMID: 34349135 PMCID: PMC8339004 DOI: 10.1038/s41598-021-94750-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Synthetic radiographs, ground truth masks and results of reader and computer-based detection. (A) Synthetic input radiograph as shown to the reader and evaluated by the CNNs. (B) Corresponding ground-truth radiograph with nodules marked green. (C) Center position of nodules marked by a reader. (D) U-Net prediction. (E) RetinaNet bounding-box predictions with scores.
Figure 2. Localization of false negative and false positive predictions in the reader study. Backgrounds were determined by averaging over all reader study radiographs. (A) All inserted nodules of different sizes in all radiographs, marked blue. (B–J) False negative and false positive predictions by reader. (K) False negative and false positive predictions of RetinaNet. (L) False negative and false positive predictions of U-Net.
True positives (TP), false positives (FP) and false negatives (FN) for RetinaNet, U-Net and the readers. For RetinaNet, a nodule with a confidence score greater than 0.5 was counted as positive.
| | TP | FP | FN |
|---|---|---|---|
| RetinaNet | 268 | 66 | 102 |
| U-Net | 256 | 279 | 114 |
| Reader 1 | 244 | 5 | 126 |
| Reader 2 | 278 | 15 | 92 |
| Reader 3 | 207 | 29 | 163 |
| Reader 4 | 185 | 9 | 185 |
| Reader 5 | 201 | 9 | 169 |
| Reader 6 | 294 | 35 | 76 |
| Reader 7 | 273 | 52 | 97 |
| Reader 8 | 281 | 276 | 89 |
| Reader 9 | 276 | 35 | 94 |
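
The counts in this table can be turned into per-reader sensitivity and precision with a few lines of Python. The following is a minimal sketch, not part of the original analysis: the function names are hypothetical, and the metrics are the standard definitions TP / (TP + FN) and TP / (TP + FP). Every row satisfies TP + FN = 370, which suggests that 370 nodules were inserted in total, and the reader averages reproduce the values quoted in the abstract.

```python
# Counts copied from the table above: (TP, FP, FN) per system or reader.
counts = {
    "RetinaNet": (268, 66, 102),
    "U-Net": (256, 279, 114),
    "Reader 1": (244, 5, 126),
    "Reader 2": (278, 15, 92),
    "Reader 3": (207, 29, 163),
    "Reader 4": (185, 9, 185),
    "Reader 5": (201, 9, 169),
    "Reader 6": (294, 35, 76),
    "Reader 7": (273, 52, 97),
    "Reader 8": (281, 276, 89),
    "Reader 9": (276, 35, 94),
}


def sensitivity(tp, fn):
    """Fraction of inserted nodules that were detected."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of marks that corresponded to a real (inserted) nodule."""
    return tp / (tp + fp)


# Column-wise mean over the nine human readers only.
readers = [v for name, v in counts.items() if name.startswith("Reader")]
reader_mean = tuple(sum(col) / len(readers) for col in zip(*readers))

for name, (tp, fp, fn) in counts.items():
    print(f"{name:10s} sensitivity={sensitivity(tp, fn):.2f} "
          f"precision={precision(tp, fp):.2f}")
print("reader mean (TP, FP, FN):", tuple(round(x, 1) for x in reader_mean))
```

Note that Reader 8 and U-Net have a sensitivity comparable to the other entries but a much lower precision, which reflects their large number of false positives.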
Figures of merit (FOM), 95% confidence intervals (CI) and standard errors (StdErr) for the wAFROC metric. The weighted lesion localization fraction (wLLF) was read off the wAFROC curve at an operating point of 0.2 on the x-axis for all readers and CNNs.
| wAFROC | FOM | CI Lower | CI Upper | StdErr | wLLF |
|---|---|---|---|---|---|
| RetinaNet | 0.81 | 0.75 | 0.87 | 0.028 | 0.71 |
| U-Net | 0.58 | 0.47 | 0.68 | 0.052 | 0.41 |
| Reader 1 | 0.82 | 0.79 | 0.86 | 0.017 | 0.71 |
| Reader 2 | 0.87 | 0.83 | 0.90 | 0.017 | 0.79 |
| Reader 3 | 0.74 | 0.68 | 0.79 | 0.029 | 0.57 |
| Reader 4 | 0.74 | 0.70 | 0.78 | 0.022 | 0.59 |
| Reader 5 | 0.78 | 0.75 | 0.81 | 0.015 | 0.65 |
| Reader 6 | 0.87 | 0.83 | 0.91 | 0.020 | 0.79 |
| Reader 7 | 0.83 | 0.79 | 0.88 | 0.022 | 0.74 |
| Reader 8 | 0.54 | 0.44 | 0.63 | 0.048 | 0.31 |
| Reader 9 | 0.84 | 0.79 | 0.88 | 0.023 | 0.75 |
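
As a plausibility check, the tabulated confidence intervals can be compared with a simple normal approximation, FOM ± 1.96 · StdErr. This is only a sketch under that assumption; the software used for the analysis may compute its intervals differently (for example with a t-distribution), and the inputs here are already rounded to two or three decimals, so small deviations are expected.

```python
# (FOM, CI lower, CI upper, StdErr) copied from the table above.
rows = {
    "RetinaNet": (0.81, 0.75, 0.87, 0.028),
    "U-Net": (0.58, 0.47, 0.68, 0.052),
    "Reader 1": (0.82, 0.79, 0.86, 0.017),
    "Reader 2": (0.87, 0.83, 0.90, 0.017),
    "Reader 3": (0.74, 0.68, 0.79, 0.029),
    "Reader 4": (0.74, 0.70, 0.78, 0.022),
    "Reader 5": (0.78, 0.75, 0.81, 0.015),
    "Reader 6": (0.87, 0.83, 0.91, 0.020),
    "Reader 7": (0.83, 0.79, 0.88, 0.022),
    "Reader 8": (0.54, 0.44, 0.63, 0.048),
    "Reader 9": (0.84, 0.79, 0.88, 0.023),
}

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumption)


def normal_ci(fom, std_err, z=Z_95):
    """Normal-approximation confidence interval around a figure of merit."""
    return fom - z * std_err, fom + z * std_err


# Largest absolute difference between tabulated and approximated bounds.
max_dev = 0.0
for name, (fom, lo, hi, se) in rows.items():
    approx_lo, approx_hi = normal_ci(fom, se)
    max_dev = max(max_dev, abs(approx_lo - lo), abs(approx_hi - hi))
    print(f"{name:10s} table=({lo:.2f}, {hi:.2f}) "
          f"approx=({approx_lo:.3f}, {approx_hi:.3f})")

# Unweighted mean FOM of the nine readers, for comparison with RetinaNet.
reader_foms = [v[0] for k, v in rows.items() if k.startswith("Reader")]
mean_reader_fom = sum(reader_foms) / len(reader_foms)
print(f"max deviation: {max_dev:.3f}, mean reader FOM: {mean_reader_fom:.2f}")
```

Under this approximation all tabulated bounds are reproduced to within about 0.01, which is within what rounding of the inputs can explain.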
Figure 3. Comparison of CNN- and reader-based diagnostic performance. (A) FROC plot with lesion localization fraction (LLF) plotted against non-lesion fraction (NLF). (B) wAFROC plot with weighted LLF on the ordinate. The plot was generated using RJafroc[26].
Figure 4. Comparison of detection performance by nodule size. (A) Relative true positive fraction. (B) Absolute number of true positives for RetinaNet CNN, U-Net CNN and readers R1–R9. The plot was generated using Matplotlib[27].
Figure 5. Workflow for generating synthetic radiographs containing tumor nodules with perfect ground-truth knowledge. Based on natural shapes, tumors of various sizes are generated and subsequently inserted into clean CT scans at different locations. The 3D CT data set is then forward projected to generate p.a. thorax radiographs. In parallel, the tumors alone are forward projected to obtain perfect ground-truth masks. These ground-truth masks are later used to compare the radiologist's findings with the expected findings.
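
The workflow in Figure 5 can be illustrated with a heavily simplified NumPy sketch. Everything here is an assumption made for illustration: a parallel-beam geometry instead of a realistic X-ray geometry, a spherical nodule instead of a natural shape, a homogeneous toy volume instead of a CT scan, and arbitrary attenuation values. The function names `sphere_mask` and `forward_project` are hypothetical. The point is only the two parallel projections: the full volume yields the radiograph, and the nodule alone yields the ground-truth mask.

```python
import numpy as np


def sphere_mask(shape, center, radius):
    """Boolean mask of a sphere inside a 3D volume (toy nodule shape)."""
    zz, yy, xx = np.indices(shape)
    dist2 = ((zz - center[0]) ** 2 + (yy - center[1]) ** 2
             + (xx - center[2]) ** 2)
    return dist2 <= radius ** 2


def forward_project(mu, axis, voxel_size_cm):
    """Parallel-beam projection: Beer-Lambert transmission along one axis."""
    line_integral = mu.sum(axis=axis) * voxel_size_cm
    return np.exp(-line_integral)


# Toy "clean CT" as attenuation coefficients in 1/cm (hypothetical values).
shape = (64, 64, 64)            # (z, y, x); y is the projection direction
clean = np.full(shape, 0.02)

# Insert one nodule with a higher attenuation (hypothetical value).
nodule = sphere_mask(shape, center=(32, 30, 40), radius=5)
volume = clean.copy()
volume[nodule] = 0.2

# Project the full volume to obtain the synthetic radiograph.
radiograph = forward_project(volume, axis=1, voxel_size_cm=0.1)

# Project the nodule alone to obtain the ground-truth mask.
nodule_only = np.where(nodule, 0.2, 0.0)
ground_truth = nodule_only.sum(axis=1) > 0

print(radiograph.shape, int(ground_truth.sum()))
```

Because the mask is derived from the same inserted object that appears in the radiograph, every marked pixel is known to belong to a nodule, which is what makes the ground truth exact in this kind of setup.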
Figure 6. Projected segmentations of 19 nodules, which were artificially inserted into the synthetic radiographs.
Figure 7. Architecture of the utilized U-Net. Input and output were a single-channel 512 × 512 matrix. Downsampling was performed using convolutional layers with a stride of 2. A second network (lesion scoring network) was used to retrieve a per-lesion score of the segmented nodules. Numbers above layers indicate convolution filters for convolution layers and number of neurons for dense layers.
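
To see how strided convolutions shrink the 512 × 512 input, the standard output-size formula for a convolution can be evaluated directly. This is a sketch only: the kernel size of 3, the padding of 1 and the number of four downsampling steps are assumptions chosen for illustration and are not stated in the caption; only the stride of 2 and the input size come from the source.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial output size of a convolution along one dimension.

    Standard formula: floor((size + 2 * padding - kernel) / stride) + 1.
    Kernel and padding defaults are assumptions, not taken from the paper.
    """
    return (size + 2 * padding - kernel) // stride + 1


# Feature-map side length after each stride-2 downsampling step.
NUM_DOWNSAMPLING_STEPS = 4  # hypothetical depth, for illustration only
sizes = [512]
for _ in range(NUM_DOWNSAMPLING_STEPS):
    sizes.append(conv_output_size(sizes[-1]))

print(sizes)
```

With these assumed settings each stride-2 layer halves the side length, so the decoder has to upsample by the same factors to return to a 512 × 512 output.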