Maram Mahmoud A Monshi, Josiah Poon, Vera Chung.
Abstract
Substantial progress has been made towards implementing automated radiology reporting models based on deep learning (DL). This is due to the introduction of large medical text/image datasets. Generating coherent radiology paragraphs that do more than traditional medical image annotation, or single-sentence description, has been the subject of recent academic attention. This presents a more practical and challenging application and moves towards bridging visual medical features and radiologist text. So far, the most common approach has been to utilize publicly available datasets and develop DL models that integrate convolutional neural networks (CNN) for image analysis alongside recurrent neural networks (RNN) for natural language processing (NLP) and natural language generation (NLG). This is an area of research that we anticipate will grow in the near future. We focus our investigation on the following critical challenges: understanding radiology text/image structures and datasets, applying DL algorithms (mainly CNN and RNN), generating radiology text, and improving existing DL-based models and evaluation metrics. Lastly, we include a critical discussion and future research recommendations. This survey will be useful for researchers interested in DL, particularly those interested in applying DL to radiology reporting.
Keywords: Convolutional neural network; Deep learning; Natural language processing; Radiology; Recurrent neural network
Year: 2020 PMID: 32425358 PMCID: PMC7227610 DOI: 10.1016/j.artmed.2020.101878
Source DB: PubMed Journal: Artif Intell Med ISSN: 0933-3657 Impact factor: 5.326
Fig. 1. Example of a radiology report and associated images (obtained from the IU X-Ray dataset) [21].
Fig. 2. Radiology imaging modalities and characteristics. Note: X-ray (a), CT (b), MRI (c), US (d), image characteristics (e).
Radiology image/text datasets (available online).
| Dataset | Description | Base annotation | Employed by |
|---|---|---|---|
| IU X-Ray | 7470 chest x-rays | Thorax diseases | [ |
| ChestX-ray14 | 112,120 chest x-rays | Atelectasis, consolidation, infiltration, pneumothorax, edema, emphysema, fibrosis, effusion, pneumonia, pleural thickening, cardiomegaly, nodule, mass and hernia | [ |
| CheXpert | 224,316 chest x-rays | No finding, enlarged cardiomediastinum, cardiomegaly, lung opacity, lung lesion, edema, consolidation, pneumonia, atelectasis, pneumothorax, pleural effusion, pleural other, fracture, support devices | – |
| MIMIC-CXR | 371,920 chest x-rays | Same 14 labels as CheXpert | [ |
| PadChest | 160,868 chest x-rays | 174 radiology findings, 19 diagnoses and 104 anatomic locations | [ |
| PEIR Digital Library | 4732 images in 20 categories | Multiple (e.g. abdomen, adrenal, aorta, breast, chest, head and kidney) | [ |
| DDSM | 2620 breast mammography | Normal, benign and malignant | [ |
https://openi.nlm.nih.gov/faq.php.
https://nihcc.app.box.com/v/ChestXray-NIHCC.
https://stanfordmlgroup.github.io/competitions/chexpert/.
https://archive.physionet.org/physiobank/database/mimiccxr/.
http://bimcv.cipf.es/bimcv-projects/padchest/.
http://peir.path.uab.edu/library/index.php?/category/106.
http://marathon.csee.usf.edu/Mammography/Database.html.
Fig. 3. Deep learning.
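Activation functions for DL.
| Name | Equation | Plot | Characteristics |
|---|---|---|---|
| Sigmoid | $\sigma(x) = \frac{1}{1 + e^{-x}}$ | | Range [0, 1] |
| TanH | $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$ | | Range [-1, 1] |
| ReLU [ | $f(x) = \max(0, x)$ | | Does not saturate for positive inputs |
| Leaky ReLU [ | $f(x) = \max(\alpha x, x)$, with a small slope $\alpha$ | | Overcomes the dead ReLU problem |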
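
The four activation functions listed in the table above can be written in a few lines. This is a minimal sketch using the standard textbook definitions; the leaky ReLU slope of 0.01 is an assumed default for illustration and is not taken from the survey.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid; outputs lie between 0 and 1."""
    # Branch on the sign so that exp() is only called on non-positive
    # arguments, which avoids overflow for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh(x: float) -> float:
    """Hyperbolic tangent; outputs lie between -1 and 1."""
    return math.tanh(x)


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """Leaky ReLU: keeps a small slope `alpha` (assumed 0.01) for x < 0,
    so the gradient never becomes exactly zero on negative inputs."""
    return x if x > 0 else alpha * x


if __name__ == "__main__":
    for v in (-2.0, 0.0, 2.0):
        print(v, sigmoid(v), tanh(v), relu(v), leaky_relu(v))
```

The nonzero negative slope in `leaky_relu` is what addresses the "dead ReLU" problem: a unit whose inputs are always negative still receives a gradient.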
CNN architectures (ILSVRC winners).
| Winner by year | No. of layers | Top-5 error rate (%) |
|---|---|---|
| 2012 - AlexNet [ | 8 | 16.4 |
| 2013 - ZFNet [ | 8 | 11.7 |
| 2014 second - VGG-16 [ | 16 | 7.4 |
| 2014 first - GoogLeNet [ | 22 | 6.67 |
| 2015 - ResNet [ | 152 | 3.57 |
| 2016 second – ResNeXt [ | 101 | 3.03 |
| 2016 first – CUImage Team [ | 152 | 2.99 |
| 2017 - SENets [ | 152 | 2.25 |
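
The top-5 error rate reported in the table above counts a prediction as correct when the true class appears among the five highest-scoring classes. The following is a minimal sketch of that computation; the function name `top_k_error` and the toy scores are hypothetical and only illustrate the metric.

```python
def top_k_error(scores, labels, k=5):
    """Percentage of samples whose true label is NOT among the k
    highest-scoring classes.

    scores: list of per-sample lists of class scores
    labels: list of true class indices, one per sample
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return 100.0 * misses / len(labels)


if __name__ == "__main__":
    # Toy example: 4 images, 6 classes (values are made up).
    toy_scores = [
        [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],
        [0.04, 0.11, 0.40, 0.20, 0.15, 0.10],
        [0.12, 0.30, 0.05, 0.25, 0.20, 0.08],
        [0.50, 0.20, 0.10, 0.09, 0.06, 0.05],
    ]
    toy_labels = [0, 0, 5, 3]
    print(top_k_error(toy_scores, toy_labels, k=5))
    print(top_k_error(toy_scores, toy_labels, k=1))
```

With `k=1` the same function gives the stricter top-1 error, which is always at least as high as the top-5 error on the same predictions.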
Fig. 4. Framework of the radiology reporting models.
DL models for generating radiology reports. An empty cell repeats the value of the nearest filled cell above it in the same column.
| Model | Proposed by | Image modality | Dataset | Organ | Pathology | Software | CNN architecture | Base technique | Base task |
|---|---|---|---|---|---|---|---|---|---|
| Deep mining model | Shin, et al. [ | CT | PACS of NIH clinical centre [ | Multiple (e.g., neck, bone, liver, brain and heart) | Multiple (e.g. adenopathy, metastasis and sinus diseases) | Caffe [ | AlexNet [ | LDA & RNN | Generate semantic labels |
| | | | | | | | | CNN | Map from images to label spaces |
| LDPO: looped deep pseudo task optimization network | Wang, et al. [ | | | | | Caffe [ | AlexNet [ | CNN | Initialize looped optimization |
| | | | | | | | | K-means/RIM | Cluster images |
| | | | | | | | | NLP | Extract semantically relevant words |
| | | | | | | | | PCA | Reduce dimensionality |
| CNN-based classification model | Dong, et al. [ | X-Ray | PACS of the fourth people’s hospital (Chinese reports) | Chest | 9 diseases (e.g. emphysema & bronchitis) | Caffe [ | VGG-16 [ | NLP | Extract disease labels from reports |
| | | | | | | | | CNN | Classify images |
| | | | | | | | | RNN | Describe a detected disease |
| CheXNet | Rajpurkar, et al. [ | | ChestX-ray14 [ | | Pneumonia & 13 other pathologies | - | DenseNet [ | CNN | Classify images |
| | | | | | | | | CAM [ | Produce heatmaps |
| ChestNet | Wang and Xia [ | | | | | Caffe [ | ResNet-152 [ | CNN | Perform feature extraction-classification |
| | | | | | | | | Attention mechanism (Grad-CAM [ | Exploit correlation between class labels & pathology locations |
| DualNet | Rubin, et al. [ | | MIMIC-CXR [ | | 14 Thorax diseases (e.g. pneumonia & edema) | PyTorch [82] | DenseNet-121 [ | NLP (NegBio [ | Map reports into UMLS concept ids |
| | | | | | | | | CNN | Recognize multiple diseases |
| Multi-view model | Monshi, et al. [ | | | | 12 Thorax diseases | | ResNet-50 [ | CNN | Detect diseases |
| | | | | | | | | Discriminative learning rates [ | Tune each layer with various learning rates |
| Recurrent neural cascade model | Shin, et al. [ | X-Ray | IU X-Ray [ | Chest | Thorax diseases (e.g. cardiomegaly, and granuloma) | - | NIN [ | CNN | Classify images |
| | | | | | | | | LSTM-RNN [ | Describe disease contexts |
| Multi-task-loss CNN model | Kisilev, et al. [ | Mammography | DDSM | Breast | Tumour | Caffe [79] | AlexNet (5 conv. layers) [ | CNN | Produce ranked ROI |
| Multi-task learning model | Jing, et al. [ | Multiple | PEIR Gross | 21 organ categories (e.g. kidney) | Multiple | - | VGG-19 [ | CNN | Learn visual features |
| | | | | | | | | MLC | Predict relevant tags |
| Multi-task learning model | Jing, et al. [ | X-Ray | IU X-Ray [ | Chest | Thorax diseases | - | VGG-19 [ | CNN | Learn visual features |
| | | | | | | | | Hierarchical LSTM | Generate long paragraphs |
| | | | | | | | | MLC | Predict relevant tags |
| Multimodal recurrent model with attention | Xue, et al. [ | | IU X-Ray [ | | | - | ResNet-152 [ | CNN | Extract visual features |
| | | | | | | | | Single-layer LSTM | Sentence decoding |
| | | | | | | | | Bi-LSTM and 1D CNN | Sentence encoding |
| TieNet: text-image embedding network | Wang, et al. [ | | IU X-Ray [ | | | TensorFlow [ | ResNet-50 [ | NLP | Mine disease labels |
| | | | | | | | | CNN-RNN | Link words with image regions |
| | | | | | | | | LSTM-RNN | Produce reports |
| HRGR-Agent: hybrid retrieval-generation reinforced agent | Li, et al. [ | | IU X-Ray [ | | | PyTorch [ | DenseNet [ | CNN | Extract visual features |
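
The table above lists discriminative learning rates as a technique that tunes each layer with a different learning rate. One common way to do this is to give early layers a small rate and later layers a larger one. The sketch below spaces the rates geometrically between two bounds; the function name, the geometric spacing, and the example values are assumptions for illustration, not the configuration used by any model in the table.

```python
def discriminative_lrs(min_lr, max_lr, n_groups):
    """Return one learning rate per layer group, spaced geometrically
    from `min_lr` (earliest group) to `max_lr` (last group).

    Hypothetical helper: the spacing scheme is an assumption.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if min_lr <= 0 or max_lr < min_lr:
        raise ValueError("require 0 < min_lr <= max_lr")
    if n_groups == 1:
        return [max_lr]
    # Constant multiplicative step between consecutive groups.
    ratio = (max_lr / min_lr) ** (1.0 / (n_groups - 1))
    return [min_lr * ratio ** i for i in range(n_groups)]


if __name__ == "__main__":
    # Three layer groups, e.g. early conv blocks, late conv blocks, head.
    for group, lr in enumerate(discriminative_lrs(1e-5, 1e-3, 3)):
        print(f"group {group}: lr={lr:.0e}")
```

The intuition is that pretrained early layers already encode generic visual features and need only small updates, while the newly added classification layers need larger ones.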
Quantitative evaluation of generated radiology reports based on DL models.
| Model | Database | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE | ROUGE_L | CIDEr | |
|---|---|---|---|---|---|---|---|---|---|---|
| Sentence-level | ||||||||||
| Recurrent neural cascade model [ | LSTM | IU X-Ray [ | 9.1 | 0.0 | 0.0 | – | – | – | – | |
| GRU | 78.5 | 0.0 | – | – | – | – | ||||
| Multi-task learning model [ | PEIR | 0.300 | 0.218 | 0.165 | 0.113 | 0.149 | 0.279 | – | 0.329 | |
| Paragraph-level | ||||||||||
| Multi-task learning model [ | IU X-Ray [ | 0.217 | – | 0.327 | ||||||
| Multimodal recurrent model with attention [ | 0.464 | 0.358 | 0.270 | 0.195 | 0.366 | – | – | |||
| TieNet [ | 0.2860 | 0.1597 | 0.1038 | 0.0736 | 0.1076 | – | 0.2263 | – | ||
| HRGR-Agent [ | 0.438 | 0.298 | 0.208 | 0.151 | – | 0.322 | – | |||
| CX-CHR | 0.673 | 0.587 | 0.530 | 0.486 | – | 0.612 | – | 2.895 | ||
Evaluation metrics (image caption measures).
| Metric | Purpose | Algorithm | Strengths | Weaknesses |
|---|---|---|---|---|
| BLEU [ | Machine translation | n-gram precision | Correlates with human judgments | Lacks explicit word matching |
| ROUGE [ | Document summarization | n-gram recall | Favours long sentences | Works only in single document summarization |
| METEOR [ | Machine translation | n-gram matching with synonyms | Benefits from synonym and paraphrase matching | Does not capture semantic similarity |
| CIDEr [ | Image captioning | TF-IDF weighted n-gram similarity | Works in linguistics means | May give weight to irrelevant sentence details |
| SPICE [ | Image captioning | Scene-graph matching | Can match nouns/objects between captions | Relies on parsing performance |
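
To make the n-gram overlap behind these metrics concrete, here is a minimal sketch of two building blocks: the clipped n-gram precision used inside BLEU, and a longest-common-subsequence recall of the kind ROUGE-L is based on. It is deliberately simplified: full BLEU also combines several n-gram orders and applies a brevity penalty, and ROUGE-L is usually reported as an F-measure. The function names and example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference.
    Each n-gram is credited at most as often as it appears in the
    reference (the 'clipping' step of BLEU)."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
    return overlap / total


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_recall(candidate, reference):
    """LCS length divided by reference length (recall side of ROUGE-L)."""
    if not reference:
        return 0.0
    return lcs_length(candidate, reference) / len(reference)


if __name__ == "__main__":
    ref = "the heart size is normal".split()
    cand = "the heart is normal in size".split()
    print(clipped_ngram_precision(cand, ref, 1))
    print(clipped_ngram_precision(cand, ref, 2))
    print(lcs_recall(cand, ref))
```

The example also shows a limitation noted in the table: the candidate reorders words without changing the clinical meaning much, yet the bigram precision drops sharply because only exact surface n-grams are matched.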