Machine Friendly Machine Learning: Interpretation of Computed Tomography Without Image Reconstruction
Hyunkwang Lee, Chao Huang, Sehyo Yune, Shahein H. Tajmir, Myeongchan Kim, Synho Do.
Abstract
Recent advancements in deep learning for automated image processing and classification have accelerated many new applications for medical image analysis. However, most deep learning algorithms have been developed using reconstructed, human-interpretable medical images. While image reconstruction from raw sensor data is required for the creation of medical images, the reconstruction process uses only a partial representation of all the data acquired. Here, we report the development of a system to directly process raw computed tomography (CT) data in sinogram-space, bypassing the intermediary step of image reconstruction. Two classification tasks were evaluated for their feasibility of sinogram-space machine learning: body region identification and intracranial hemorrhage (ICH) detection. Our proposed SinoNet, a convolutional neural network optimized for interpreting sinograms, performed favorably compared to conventional reconstructed image-space-based systems for both tasks, regardless of scanning geometries in terms of projections or detectors. Further, SinoNet performed significantly better when using sparsely sampled sinograms than conventional networks operating in image-space. As a result, sinogram-space algorithms could be used in field settings for triage (presence of ICH), especially where low radiation dose is desired. These findings also demonstrate another strength of deep learning: it can analyze and interpret sinograms, a task that is virtually impossible for human experts.
Year: 2019 PMID: 31664075 PMCID: PMC6820559 DOI: 10.1038/s41598-019-51779-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary of the 12 different models evaluated in this study.
| | Fully sampled | Moderately sampled | Sparsely sampled |
|---|---|---|---|
| Inception-v3 (reconstructed images) | I1: recon360x729; I2: wrecon360x729 | I3: recon120x240; I4: wrecon120x240 | I5: recon40x80; I6: wrecon40x80 |
| SinoNet (sinograms) | S1: sino360x729; S2: wsino360x729 | S3: sino120x240; S4: wsino120x240 | S5: sino40x80; S6: wsino40x80 |
Inception-v3 was used for analyzing reconstructed images (I1–I6), and SinoNet was used for interpreting sinograms (S1–S6). ‘sino360x729’, ‘sino120x240’, and ‘sino40x80’ represent sinograms with 360 projection views by 729 detector pixels, 120 projection views by 240 detector pixels, and 40 projection views by 80 detector pixels, respectively. ‘wsino360x729’, ‘wsino120x240’, and ‘wsino40x80’ represent sinograms created from windowed CT images. ‘recon360x729’, ‘recon120x240’, and ‘recon40x80’ are images reconstructed from the corresponding sinograms, and ‘wrecon360x729’, ‘wrecon120x240’, and ‘wrecon40x80’ are images reconstructed from the windowed sinograms.
Figure 1. Performance of 12 different models trained on reconstructed images and sinograms with varying numbers of projections and detectors for body part recognition. 95% confidence intervals (CIs) are indicated by black error bars. The purple and blue bars (I1–I6) compare the test accuracy of Inception-v3 trained with full dynamic range reconstructed images versus abdominal-window reconstructed images (window-level = 40 HU, window-width = 400 HU). The green and red bars (S1–S6) compare the performance of SinoNet models trained with sinograms generated from full-range and windowed reconstructed images, respectively.
Figure 2. ROC curves for the 12 different models trained with reconstructed images and sinograms at various sparsity configurations in numbers of projections and detectors. The purple and blue curves (I1–I6) correspond to the performance of Inception-v3 trained with reconstructed images with a full dynamic range of HU values and with a brain window setting (window-level = 50 HU, window-width = 100 HU), respectively. The green and red curves (S1–S6) show the performance of SinoNet models trained with sinograms generated from full-range and windowed reconstructed images, respectively. The areas under the curve (AUCs) for the 12 models are presented in the legend with their 95% CIs. Statistical significance of the difference between AUCs of paired models (Ix vs. Sx) was evaluated. n.s., p > 0.05; *p < 0.05; **p < 0.01.
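Both captions reference CT display windowing. For concreteness, the operation amounts to clipping Hounsfield units to a band around a center level and rescaling; below is a minimal Python sketch, assuming normalization to [0, 1] (the helper name `window_ct` is ours, not from the paper):

```python
import numpy as np

def window_ct(hu_image: np.ndarray, level: float, width: float) -> np.ndarray:
    """Clip a CT slice (in Hounsfield units) to a display window and
    rescale to [0, 1]. Brain window: level=50, width=100 (Figure 2);
    abdominal window: level=40, width=400 (Figure 1)."""
    lo, hi = level - width / 2.0, level + width / 2.0
    return (np.clip(hu_image, lo, hi) - lo) / (hi - lo)
```

The windowed inputs (‘wrecon…’, ‘wsino…’) are then derived from such windowed images rather than the full-range ones.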
Comparison of Inception-v3 and SinoNet performance when both networks are trained on full-range sinograms at varying sampling densities, for body part recognition and intracranial hemorrhage (ICH) detection.
| Input | Body part recognition, test accuracy (Inception-v3) | Body part recognition, test accuracy (SinoNet) | ICH detection, AUC (Inception-v3) | ICH detection, AUC (SinoNet) |
|---|---|---|---|---|
| sino360x729 | 93.9% (93.4%–94.4%) | 96.6% (96.2%–96.9%) | 0.873 (0.849–0.895) | 0.918* (0.899–0.935) |
| sino120x240 | 93.5% (93.0%–94.0%) | 96.3% (95.9%–96.7%) | 0.874 (0.851–0.896) | 0.915* (0.897–0.932) |
| sino40x80 | 93.4% (92.9%–93.9%) | 96.2% (95.8%–96.6%) | 0.852 (0.828–0.876) | 0.899* (0.879–0.917) |
Body part recognition performance is reported as test accuracy and ICH detection performance as AUC, with 95% CIs in parentheses. *p < 0.0001.
Figure 3. Examples of reconstructed images and sinograms with different labels for (a) body part recognition and (b) ICH detection. From left to right: original CT images, windowed CT images, sinograms with 360 projections by 729 detector pixels, and windowed sinograms (360 × 729). In the last row, an example CT with hemorrhage is annotated with a dotted circle in image-space, with the region of interest converted into the sinogram domain using the Radon transform. This area is highlighted in red on the sinogram in the fifth column.
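The ROI-to-sinogram conversion described for the last row of Figure 3 can be reproduced by forward-projecting the binary ROI mask itself: every sinogram bin whose ray intersects the ROI receives nonzero signal. A minimal sketch using `skimage.transform.radon` (the helper name and the simple nonzero threshold are our assumptions, not the authors' code):

```python
import numpy as np
from skimage.transform import radon

def roi_support_in_sinogram(roi_mask: np.ndarray, n_views: int = 360) -> np.ndarray:
    """Forward-project a binary image-space ROI mask and return a boolean
    mask of the sinogram bins receiving signal from the ROI. In the skimage
    convention the output is (detector_pixels, projection_views); the
    nonzero region traces the sinusoidal band highlighted in Figure 3."""
    theta = np.linspace(0.0, 180.0, n_views, endpoint=False)
    roi_sino = radon(roi_mask.astype(float), theta=theta, circle=False)
    return roi_sino > 0
```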
Distribution of training, validation, and test datasets for body part recognition.
| | Train | Validation | Test |
|---|---|---|---|
| No. Cases | 140 | 30 | 30 |
| No. Images | 39,472 | 8,383 | 8,479 |
| L1: Head | 1,980 | 483 | 435 |
| L2: Eye lens | 878 | 189 | 188 |
| L3: Nose | 1,449 | 309 | 323 |
| L4: Salivary gland | 1,803 | 361 | 349 |
| L5: Thyroid | 1,508 | 312 | 333 |
| L6: Upper lung | 1,632 | 345 | 392 |
| L7: Thymus | 3,213 | 727 | 672 |
| L8: Heart | 3,360 | 707 | 762 |
| L9: Chest | 4,647 | 914 | 935 |
| L10: Upper abdomen | 4,943 | 1,008 | 1,103 |
| L11: Lower abdomen | 1,736 | 342 | 368 |
| L12: Upper pelvis | 2,524 | 617 | 545 |
| L13: Lower pelvis | 2,230 | 563 | 422 |
| L14: Bladder | 3,144 | 609 | 766 |
| L15: Upper leg | 2,607 | 563 | 532 |
| L16: Lower leg | 1,818 | 334 | 354 |
Distribution of training, validation, and test datasets for ICH detection.
| | Train: No. Cases | Train: No. Images | Validation: No. Cases | Validation: No. Images | Test: No. Cases | Test: No. Images |
|---|---|---|---|---|---|---|
| No ICH | 141 | 2,202 | 30 | 474 | 30 | 475 |
| ICH | 337 | 1,915 | 91 | 490 | 91 | 475 |
| Total | 478 | 4,117 | 121 | 964 | 121 | 950 |
Figure 4. (a) Schematic of sinogram generation with 360 projection views and 729 detector pixels (‘sino360x729’) from original CT images (converted into linear attenuation coefficients). (b) Sparse sinograms were created from ‘sino360x729’ by downsampling in the horizontal dimension and signal averaging in the vertical dimension to simulate the effect of acquiring an image with 120 projection views and 240 detector pixels (‘sino120x240’) or 40 projection views and 80 detector pixels (‘sino40x80’). R, Radon transform.
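A minimal sketch of the Figure 4 pipeline under a parallel-beam assumption, using `skimage.transform.radon`; the function names and the exact binning scheme are our assumptions (for instance, 729 detector pixels do not divide evenly into 240, so the paper's detector averaging may differ in detail):

```python
import numpy as np
from skimage.transform import radon

def make_sinogram(mu_image: np.ndarray, n_views: int = 360) -> np.ndarray:
    """Simulate a parallel-beam sinogram from a CT slice already converted
    to linear attenuation coefficients. Output shape: (views, detectors)."""
    theta = np.linspace(0.0, 180.0, n_views, endpoint=False)
    return radon(mu_image, theta=theta, circle=False).T

def sparsify(sino: np.ndarray, view_step: int, det_bin: int) -> np.ndarray:
    """Simulate a sparser acquisition: keep every `view_step`-th projection
    view and average groups of `det_bin` adjacent detector pixels
    (remainder pixels are dropped)."""
    sub = sino[::view_step]                  # fewer projection views
    n_views, n_det = sub.shape
    n_det -= n_det % det_bin                 # make width divisible by det_bin
    return sub[:, :n_det].reshape(n_views, -1, det_bin).mean(axis=2)

# e.g. sino360 = make_sinogram(mu_slice, 360); sparse = sparsify(sino360, 3, 3)
```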
Figure 5. (a) Overall network architecture of SinoNet. (b) Detailed diagram of the Inception modules that include rectangular convolutional filters and pooling layers. The modified Inception module contains multiple rectangular convolution filters of varying sizes: height-wise rectangular filters (projection dominant) in red and width-wise rectangular filters (detector dominant) in orange; “Conv3x3/s2” indicates a convolutional layer with 3 × 3 filters and stride 2, and “Conv3x2” a convolutional layer with 3 × 2 filters and stride 1. (c) Dense-Inception layers contain two densely connected Inception modules. (d) Transition modules situated between Dense-Inception modules reduce the size of the feature maps. Conv = convolution layer, MaxPool = max pooling layer, AvgPool = average pooling layer.
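To make the rectangular-filter idea in Figure 5b concrete, here is a hedged PyTorch sketch of one Inception-style module with parallel projection-dominant and detector-dominant branches; the channel counts, odd kernel sizes, and module name are illustrative assumptions rather than the published configuration:

```python
import torch
import torch.nn as nn

def _conv(cin: int, cout: int, k: tuple[int, int]) -> nn.Sequential:
    """Conv + BN + ReLU with 'same' padding (odd kernel sizes only)."""
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, padding=(k[0] // 2, k[1] // 2), bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True),
    )

class RectInception(nn.Module):
    """Inception-style module in the spirit of Figure 5b. Input is
    (N, C, views, detectors); all sizes here are illustrative."""
    def __init__(self, in_ch: int, branch_ch: int = 32):
        super().__init__()
        self.proj = _conv(in_ch, branch_ch, (7, 3))  # projection-dominant (tall)
        self.det = _conv(in_ch, branch_ch, (3, 7))   # detector-dominant (wide)
        self.sq = _conv(in_ch, branch_ch, (3, 3))    # square baseline branch
        self.pool = nn.Sequential(                   # pooling branch
            nn.MaxPool2d(3, stride=1, padding=1),
            _conv(in_ch, branch_ch, (1, 1)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel-wise concatenation of all branches, as in standard Inception.
        return torch.cat(
            [self.proj(x), self.det(x), self.sq(x), self.pool(x)], dim=1)
```

Running tall and wide kernels in parallel lets the module aggregate context along the projection (angle) axis and the detector axis separately, which matches the stated motivation for the rectangular filters.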