Sripad Krishna Devalla, Giridhar Subramanian, Tan Hung Pham, Xiaofei Wang, Shamira Perera, Tin A Tun, Tin Aung, Leopold Schmetterer, Alexandre H Thiéry, Michaël J A Girard.
Abstract
Optical coherence tomography (OCT) has become an established clinical routine for in vivo imaging of the optic nerve head (ONH) tissues, which is crucial in the diagnosis and management of various ocular and neuro-ocular pathologies. However, the presence of speckle noise degrades the quality of OCT images and hampers their interpretation. Although recent frame-averaging techniques have been shown to enhance OCT image quality, they require longer scanning durations, resulting in patient discomfort. Using a custom deep learning network trained with 2,328 'clean B-scans' (multi-frame B-scans; signal averaged) and their corresponding 'noisy B-scans' (clean B-scans + Gaussian noise), we were able to denoise 1,552 unseen single-frame (without signal averaging) B-scans. The denoised B-scans were qualitatively similar to their corresponding multi-frame B-scans, with enhanced visibility of the ONH tissues. The mean signal to noise ratio (SNR) increased from 4.02 ± 0.68 dB (single-frame) to 8.14 ± 1.03 dB (denoised). Across all the ONH tissues, the mean contrast to noise ratio (CNR) increased from 3.50 ± 0.56 (single-frame) to 7.63 ± 1.81 (denoised). The mean structural similarity index (MSSIM) increased from 0.13 ± 0.02 (single-frame) to 0.65 ± 0.03 (denoised) when compared with the corresponding multi-frame B-scans. Our deep learning algorithm can denoise a single-frame OCT B-scan of the ONH in under 20 ms, thus offering a framework to obtain superior quality OCT B-scans with reduced scanning times and minimal patient discomfort.
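The construction of a 'noisy B-scan' described in the abstract (clean B-scan + Gaussian noise) can be sketched as follows. This is a minimal illustration and not the authors' code: the function name `add_gaussian_noise`, the noise level `sigma`, the [0, 1] intensity range, and the clipping step are all assumptions, since the abstract does not state the noise parameters.

```python
import random


def add_gaussian_noise(clean, sigma=0.1, seed=None):
    """Return a noisy copy of `clean`, a list of rows of floats in [0, 1].

    `sigma`, the [0, 1] range, and the clipping are illustrative assumptions.
    """
    rng = random.Random(seed)
    noisy = []
    for row in clean:
        # Add zero-mean Gaussian noise per pixel, then clip back into [0, 1].
        noisy.append([min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row])
    return noisy


# Hypothetical 4x4 "clean" patch with uniform mid-grey intensity.
clean_patch = [[0.5] * 4 for _ in range(4)]
noisy_patch = add_gaussian_noise(clean_patch, sigma=0.1, seed=0)
```

Each (noisy, clean) pair produced this way could serve as one input/target example for a denoising network; the clean patch itself is left unmodified.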
Year: 2019 PMID: 31595006 PMCID: PMC6783551 DOI: 10.1038/s41598-019-51062-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Single-frame (A), denoised (B), and multi-frame (C) B-scans for a healthy subject are shown. The denoised B-scan is qualitatively similar to its corresponding multi-frame B-scan. Specifically, the visibility of the retinal layers, choroid, and lamina cribrosa was prominently improved. Sharp and clear boundaries were also obtained for the retinal layers and the choroid-scleral interface.
Figure 2. Single-frame, denoised and multi-frame B-scans for four healthy subjects (1–4) are shown. The signal to noise ratio (SNR), contrast to noise ratio (CNR; mean of all tissues) and the structural similarity index (SSIM) for the respective B-scans are shown as well. In all cases, the denoised B-scans (2nd column) were consistently similar (qualitatively) to their corresponding multi-frame B-scans (3rd column).
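As an illustration of the kind of metric reported in Figure 2, the sketch below computes a contrast to noise ratio between a tissue region and a background region. The formula used, |μ_r − μ_b| / sqrt(0.5 (σ_r² + σ_b²)), is one widely used definition and is an assumption here; this record does not state the exact definition the authors used. The function name and the pixel values are hypothetical.

```python
import math
from statistics import fmean, pvariance


def contrast_to_noise_ratio(roi, background):
    """CNR between two flat lists of pixel intensities (assumed definition).

    Raises ZeroDivisionError if both regions have zero variance.
    """
    mu_r, mu_b = fmean(roi), fmean(background)
    var_r, var_b = pvariance(roi), pvariance(background)
    return abs(mu_r - mu_b) / math.sqrt(0.5 * (var_r + var_b))


# Hypothetical intensities sampled from a tissue region and a background region.
tissue_pixels = [0.8, 0.6, 0.7, 0.9, 0.5]
background_pixels = [0.1, 0.2, 0.0, 0.1, 0.1]
cnr = contrast_to_noise_ratio(tissue_pixels, background_pixels)
```

Under this definition, a higher CNR means the tissue stands out more clearly from the background relative to the intensity spread in both regions.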
Mean CNR for all ONH tissues computed for the single-frame, denoised and multi-frame B-scans.
| Tissue | Single-frame | Denoised | Multi-frame |
|---|---|---|---|
| RNFL | 2.97 ± 0.42 | 7.28 ± 0.63 | 5.18 ± 0.76 |
| GCL + IPL | 3.83 ± 0.43 | 12.09 ± 4.22 | 11.62 ± 1.85 |
| All other retinal layers | 2.71 ± 0.33 | 5.61 ± 1.46 | 4.62 ± 0.86 |
| RPE | 5.62 ± 0.72 | 9.25 ± 2.25 | 8.10 ± 1.44 |
| Choroid | 2.99 ± 0.43 | 5.99 ± 0.45 | 5.75 ± 0.63 |
| Sclera | 2.42 ± 0.39 | 6.40 ± 1.68 | 6.00 ± 0.96 |
| LC | 4.02 ± 1.23 | 6.81 ± 1.99 | 6.46 ± 1.81 |
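The per-tissue values in the table can be cross-checked against the overall means quoted in the abstract. The sketch below takes an unweighted mean over the seven tissue rows, which is an assumption about how the overall figure was obtained; the variable and function names are illustrative. The denoised column averages to about 7.63, in line with the abstract, and the single-frame column to about 3.51, close to the reported 3.50.

```python
from statistics import fmean

# Mean CNR values copied from the table (standard deviations omitted):
# (single-frame, denoised, multi-frame)
cnr_table = {
    "RNFL": (2.97, 7.28, 5.18),
    "GCL + IPL": (3.83, 12.09, 11.62),
    "All other retinal layers": (2.71, 5.61, 4.62),
    "RPE": (5.62, 9.25, 8.10),
    "Choroid": (2.99, 5.99, 5.75),
    "Sclera": (2.42, 6.40, 6.00),
    "LC": (4.02, 6.81, 6.46),
}


def column_mean(index):
    """Unweighted mean of one column across all tissues."""
    return fmean([row[index] for row in cnr_table.values()])


single_mean = column_mean(0)
denoised_mean = column_mean(1)
multi_mean = column_mean(2)

# Per-tissue ratio of denoised to single-frame CNR.
gain = {tissue: row[1] / row[0] for tissue, row in cnr_table.items()}
```

In this table the denoised CNR exceeds the single-frame CNR for every tissue, with the largest relative gain in the GCL + IPL row.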
Figure 3. Single-frame (A), denoised (with data augmentation) (B), denoised (without data augmentation) (C), and multi-frame (D) B-scans for a healthy subject are shown. The denoised B-scan (B) obtained from a network trained with data augmentation is qualitatively similar to its corresponding multi-frame B-scan (D). However, when the network is trained with limited training data (without data augmentation), it still reduces the speckle noise, but the denoised B-scan is over-smoothed (C), with smudged and unclear tissue boundaries.
Figure 4. The architecture comprised two towers: (1) a downsampling tower, to capture the contextual information (i.e., spatial arrangement of the tissues), and (2) an upsampling tower, to capture the local information (i.e., tissue texture). Each tower consisted of two types of blocks: (1) a standard block, and (2) a residual block. The latent space was implemented as a standard block. The multi-scale hierarchical feature extraction unit helped better recover tissue edges eroded by speckle noise. The network consisted of 900 k trainable parameters.
Figure 5. Exhaustive offline data augmentation was performed to circumvent the scarcity of training data. (A–E) represent the original and the data augmented ‘clean’ B-scans (multi-frame); (F–J) represent the same for the corresponding ‘noisy’ B-scans. The occluding patches (B,G; red boxes) were added to make the network robust in the presence of blood vessel shadows. Elastic deformations (C,H; cyan boxes) were used to make the network invariant to atypical morphologies[104]. A total of 23,280 B-scans of each type (clean/noisy) were generated from 2,328 baseline B-scans.
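The occluding-patch augmentation in Figure 5 can be sketched as below, with the same rectangle applied to a clean B-scan and its noisy counterpart so that the pair stays aligned. This is a hedged illustration only: the function name `add_occluding_patch`, the patch position and size, the fill value, and the toy 6x6 images are all assumptions, and the caption does not describe how the authors chose patch locations.

```python
def add_occluding_patch(image, top, left, height, width, fill=0.0):
    """Return a copy of `image` (a list of rows) with a rectangle set to `fill`."""
    patched = [list(row) for row in image]
    for r in range(top, min(top + height, len(patched))):
        for c in range(left, min(left + width, len(patched[r]))):
            patched[r][c] = fill
    return patched


# Hypothetical aligned clean/noisy pair.
clean = [[0.5] * 6 for _ in range(6)]
noisy = [[0.6] * 6 for _ in range(6)]

# Use the same box for both images so the occlusion appears in the same place.
box = dict(top=1, left=2, height=3, width=2)
clean_aug = add_occluding_patch(clean, **box)
noisy_aug = add_occluding_patch(noisy, **box)

# From the caption: 23,280 B-scans per type from 2,328 baseline B-scans.
augmentation_factor = 23280 // 2328
```

The counts in the caption correspond to ten B-scans of each type per baseline B-scan.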