Djamila-Romaissa Beddiar, Mourad Oussalah, Tapio Seppänen.
Abstract
Automatically understanding the content of medical images and delivering accurate descriptions is an emerging field of artificial intelligence that combines computer vision and natural language processing. Medical image captioning is involved in various applications related to diagnosis, treatment, report generation and computer-aided diagnosis, where it facilitates decision making and clinical workflows. Unlike generic image captioning, medical image captioning highlights the relationships between image objects and clinical findings, which makes it a very challenging task. Although a few review papers have already been published in this field, their coverage is still quite limited and only particular problems are addressed. This motivates the current paper, in which a rapid review protocol was adopted to review the latest achievements in automatic medical image captioning from the medical domain perspective. Through this review, we aim to provide the reader with an up-to-date view of the literature by summarizing the key findings and approaches, including the related datasets, applications and limitations, and by highlighting the main competitions, challenges and future directions.
Keywords: Automatic image captioning; Caption; Diagnosis generation; Medical images; PRISMA; Rapid review; Report generation
Year: 2022 PMID: 36160365 PMCID: PMC9483422 DOI: 10.1007/s10462-022-10270-w
Source DB: PubMed Journal: Artif Intell Rev ISSN: 0269-2821 Impact factor: 9.588
Fig. 1 Taxonomy of MIC-related aspects discussed in this survey
Analysis of some state-of-the-art comprehensive surveys on MIC
| Survey | Year | Image modality | Datasets | Methods | Evaluation measures | Observations |
|---|---|---|---|---|---|---|
| Allaouzi et al. ( | 2018 | No specific modality | IU chest X-ray, PEIR gross, ICLEFCaption, BCIDR | Generative-based models and retrieval-based methods | BLEU, METEOR, CIDEr, SPICE, ROUGE | Too short; does not accurately analyze the state of the art. |
| Pavlopoulos et al. ( | 2019 | No specific modality | IU chest X-ray, PEIR gross, ICLEFCaption | Encoder-decoder based methods and retrieval-based methods | BLEU, METEOR, CIDEr, SPICE, ROUGE | Does not accurately analyze the literature; datasets and metrics are missing and methods are not structured. |
| Pavlopoulos et al. | 2021 | No specific modality | IU chest X-ray, MIMIC-CXR | Early approaches, DL-based generative models, template-based and retrieval-based | Word overlap metrics: BLEU, METEOR, CIDEr, SPICE, ROUGE and clinical correctness measures: keyword accuracy | Not well structured; many aspects related to MIC are missing, few datasets are highlighted and the study methodology is not mentioned. |
| Monshi et al. ( | 2020 | Radiology | IU chest X-ray, CheXpert, X-ray14, MIMIC-CXR, PadChest, PEIR gross, DDSM | DL-based methods | ROUGE, BLEU, METEOR, CIDEr, SPICE and qualitative evaluation | DL-driven review that ignores other techniques; emphasis is on radiology only, passing over other modalities. |
| Ayesha et al. ( | 2021 | No specific modality | ICLEFCaption, BCIDR, IU chest X-ray, X-ray14, PEIR gross, PadChest, CheXpert, MIMIC-CXR | DL-based methods | BLEU, METEOR, CIDEr, SPICE, ROUGE-L | Focuses on DL-based methods, ignoring other techniques. |
| Ours | 2022 | No specific modality | ICLEFCaption, BCIDR, IU chest X-ray, PEIR gross, PadChest, CheXpert, MIMIC-CXR, ROCO | Generative-based models, retrieval-based methods, template-based methods and hybrid models | ROUGE, BLEU, METEOR, CIDEr, SPICE | Deeper analysis. |
BLEU stands for Bilingual Evaluation Understudy, METEOR for Metric for Evaluation of Translation with Explicit Ordering, ROUGE-L for Recall-Oriented Understudy for Gisting Evaluation-Longest common subsequence, CIDEr for Consensus-based Image Description Evaluation and SPICE for Semantic Propositional Image Caption Evaluation. For the datasets' full names, refer to section 5.2.
Search keywords and their substitutes
| Keyword | Can be substituted by |
|---|---|
| Captioning | Caption generation, Report generation, Diagnosis generation, Description, Annotation, Diagnostic captioning |
| Medical | Biomedical, Ultrasound, MRI, CT, Radiology, PET, XRay |
| Imaging | Image, Radiograph, Figure |
| Automatic | Automated |
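One way to turn this keyword table into a database query is sketched below. It assumes that a keyword and its substitutes are joined with OR and that the four groups are joined with AND; the source does not state the exact query syntax, so `KEYWORD_GROUPS` and `build_query` are hypothetical names used only for illustration.

```python
# Keyword groups copied from the table above.
# Assumption: OR within a group, AND across groups (not stated in the source).
KEYWORD_GROUPS = {
    "Captioning": ["Caption generation", "Report generation", "Diagnosis generation",
                   "Description", "Annotation", "Diagnostic captioning"],
    "Medical": ["Biomedical", "Ultrasound", "MRI", "CT", "Radiology", "PET", "XRay"],
    "Imaging": ["Image", "Radiograph", "Figure"],
    "Automatic": ["Automated"],
}


def build_query(groups):
    """Build a boolean search string from {keyword: [substitutes]}."""
    clauses = []
    for keyword, substitutes in groups.items():
        terms = [keyword, *substitutes]
        # Quote every term so multi-word phrases are searched as phrases.
        clauses.append("(" + " OR ".join(f'"{term}"' for term in terms) + ")")
    return " AND ".join(clauses)


print(build_query(KEYWORD_GROUPS))
```

Individual databases differ in how they treat quoting and boolean operators, so the generated string would likely need adapting per search engine.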
Fig. 2 Flow diagram of our review methodology
Fig. 3 Results of our rapid review: (a) number of records studied per year, (b) number of included studies that used each of the existing benchmark datasets
Fig. 4 Word cloud for our rapid review results
Fig. 5 Different sections of the medical reports. Example from the IU X-Ray dataset, retrieved from (Jing et al. 2017)
Fig. 6 Samples of (a) a normal chest from the Chest X-Ray Images dataset (Kermany et al. 2018), (b) a brain MRI image with meningioma, retrieved from the Brain MRI Images for Brain Tumor dataset (Cheng 2017), (c) CT scans of COVID-19 patients from the COVID-CT-Dataset (Zhao et al. 2020), (d) PET scans for Alzheimer's disease from the TADPOLE challenge PET data (Marinescu et al. 2018) and (e) an image from the Ultrasound breast images dataset (Al-Dhabyani et al. 2020)
Fig. 7 Percentage of included publications in this review according to the imaging modality
Fig. 8 Categorization of MIC methods
Performance results on some state-of-the-art methods for MIC and natural image captioning (rows in italics)
| Method | Datasets | B1 | B2 | B3 | B4 | M | R | C |
|---|---|---|---|---|---|---|---|---|
| Template-based models | ||||||||
| | ||||||||
| Retrieval-based models | ||||||||
| Syeda-Mahmood et al. ( | Own created | 0.56 | 0.51 | 0.50 | 0.49 | 0.55 | 0.58 | – |
| Merge models | ||||||||
| Mishra et al. ( | Stare | 0.87 | 0.66 | 0.52 | 0.44 | – | – | – |
| Alsharid et al. ( | Own created | 0.27 | – | 0.42 | – | |||
| Rahman et al. ( | ImageCLEF | 0.17 | – | – | – | |||
| Alsharid et al. ( | Own created | 0.11 | – | 0.59 | – | |||
| Wang et al. ( | IU X-ray | 0.34 | 0.22 | 0.15 | 0.10 | 0.14 | 0.30 | 0.32 |
| Encoder-decoder models | ||||||||
| Lyndon et al. ( | ImageCLEF | 0.10 | – | – | – | |||
| Pelka et al. ( | ImageCLEF | 0.07 | – | – | – | |||
| Shin et al. ( | IU X-ray | 0.78 | 0.40 | 0.00 | 0.00 | – | – | – |
| Zheng et al. ( | Own created | 0.63 | 0.55 | 0.47 | 0.42 | 0.76 | – | 4.42 |
| Sun et al. ( | Own created | 0.61 | 0.41 | 0.33 | 0.24 | – | – | 0.62 |
| Zeng et al. ( | IU X-ray | 0.47 | 0.40 | 0.30 | – | 0.45 | 0.26 | 3.41 |
| own created | 0.65 | 0.56 | 0.45 | – | 0.45 | 0.79 | 4.67 | |
| Chelaramani et al. ( | Own created | 0.32 | – | – | – | |||
| Zeng et al. ( | Own created | 0.30 | 0.22 | 0.18 | – | 0.19 | 0.29 | 0.99 |
| Harzig et al. ( | IU X-ray | 0.39 | 0.27 | 0.19 | 0.14 | 0.18 | 0.33 | 0.39 |
| Attention-based Encoder-decoder models | ||||||||
| Gajbhiye et al. ( | IU X-ray | 0.50 | 0.38 | 0.32 | 0.28 | 0.28 | 0.44 | 1.07 |
| Rodin et al. ( | Mimic CXR | 0.68 | 0.61 | 0.54 | 0.48 | – | – | – |
| Tian et al. ( | IU X-ray | 0.88 | 0.87 | 0.87 | 0.86 | – | 0.93 | – |
| Van Sonsbeek et al. ( | Mimic CXR | 0.36 | 0.24 | 0.16 | 0.093 | 0.32 | 0.34 | – |
| Hasan et al. ( | ImageCLEF | 0.32 | – | – | – | |||
| Park et al. ( | IU X-ray | 0.33 | 0.20 | 0.14 | 0.09 | – | 0.27 | 0.19 |
| Huang et al. ( | IU X-ray | 0.48 | 0.34 | 0.24 | 0.17 | – | 0.35 | 0.30 |
| Yin et al. ( | IU X-ray | 0.45 | 0.29 | 0.20 | 0.15 | 0.18 | 0.34 | 0.34 |
| Xiong et al. ( | IU X-ray | 0.35 | 0.23 | 0.14 | 0.10 | – | – | 0.32 |
| Yang et al. ( | BCD 2018 | 0.47 | 0.36 | 0.27 | 0.21 | 0.31 | 0.46 | 0.65 |
| Yuan et al. ( | ChexPert | 0.65 | 0.50 | 0.41 | 0.30 | 0.42 | 0.50 | – |
| Yang et al. ( | IU X-ray | 0.44 | 0.31 | 0.22 | 0.15 | – | 0.37 | 0.50 |
| Gu et al. ( | own created | 0.76 | 0.72 | 0.68 | 0.65 | 0.49 | 0.81 | – |
| Xue et al. ( | IU X-ray | 0.46 | 0.36 | 0.27 | 0.20 | 0.27 | 0.37 | – |
| Spinks et al. ( | Own created | 0.49 | 0.35 | 0.25 | 0.18 | 0.27 | 0.40 | 0.60 |
| Xue et al. ( | IU X-ray | 0.49 | 0.34 | 0.25 | 0.20 | 0.23 | 0.48 | 0.57 |
| Hybrid models | ||||||||
| Xie et al. ( | IU X-ray | 0.44 | 0.34 | 0.24 | 0.18 | – | 0.35 | 0.37 |
| Wang et al. ( | IU X-ray | 0.50 | 0.33 | 0.24 | 0.18 | – | 0.36 | 0.33 |
| CX-CHR | 0.71 | 0.64 | 0.59 | 0.55 | – | 0.68 | 3.25 | |
| Li et al. ( | IU X-ray | 0.44 | 0.30 | 0.21 | 0.15 | – | 0.32 | 0.34 |
| ChexPert | 0.67 | 0.59 | 0.53 | 0.49 | – | 0.61 | 2.90 | |
| Li et al. ( | IU X-ray | 0.48 | 0.33 | 0.23 | 0.16 | – | 0.34 | 0.28 |
| ChexPert | 0.67 | 0.59 | 0.53 | 0.47 | – | 0.62 | 2.85 | |
B stands for BLEU, M for METEOR, R for ROUGE-L and C for CIDEr
Fig. 9 General architecture of encoder-decoder models with attention
Fig. 10 General architecture of merge models (during the training phase)
Fig. 11 Categorization of studied publications according to the method used for caption generation
State-of-the-art datasets with their sizes (number of image-caption pairs), body parts diagnosed, image modalities, image sources and the annotation technique used to create the captions. "Different body parts" and "Different modalities" mean that the dataset covers several body parts and several modalities, respectively
| Dataset | Size | Body part | Image modality | Source of images | Annotations |
|---|---|---|---|---|---|
| IU Chest X-Ray dataset (Demner-Fushman et al. | 3,996 | Chest | X-Ray | Real images from the Indiana university hospital | Manual |
| CheXpert dataset (Irvin et al. | 224,316 | Chest | X-Ray | Real images from the Stanford hospital | Automatic |
| MIMIC-CXR dataset (Johnson et al. | 371,920 | Chest | X-Ray | Real images from the Beth Israel Deaconess Medical Center | Automatic |
| PadChest dataset (Bustos et al. | 160,868 | Chest | X-Ray | Real images from the San Juan hospital, Spain | Automatic, 27% Manual |
| BCIDR dataset (Zhang et al. | 5,000 | Bladder tissues | – | – | Manual |
| The PEIR Gross (a) | 7,443 | Different body parts | Different modalities | Pathology Education Informational Resource digital library | Manual |
| ImageCLEF caption 2017 dataset (Eickhoff et al. | 184,614 | Different body parts | Different modalities | Open-access biomedical literature database PubMed Central | – |
| ImageCLEF caption 2018 dataset (Garcia Seco De Herrera et al. | 232,305 | Different body parts | Different modalities | Open-access biomedical literature database PubMed Central | – |
| ROCO dataset (Pelka et al. | 81,000 | Different body parts | Different modalities | Open-access biomedical literature database PubMed Central | – |
(a) https://peir.path.uab.edu/library/
Fig. 12 Samples of chest X-ray image-report pairs of two patients and two views (lateral, frontal) from the IU X-Ray dataset (Demner-Fushman et al. 2016)
Fig. 13 Example study from the MIMIC-CXR dataset: (a) the radiology report, (b) the frontal view and (c) the lateral view of the chest radiographs
Fig. 14 Samples of image-caption pairs from the nervous class of the PEIR Gross subset
Main symbols and notations used for the different evaluation metrics
| Symbol | Description |
|---|---|
| c | Candidate sentence |
| | Reference sentence |
| | Set of reference sentences |
| N (by default = 4) | The number of n-grams (uni-gram, bi-gram, 3-gram and 4-gram) |
| | Number of words of a given reference sentence |
| | Number of words of the candidate sentence |
| | Number of sentences in the set of reference sentences |
| R | Recall |
| P | Precision |
Symbols and notations of equations related to the BLEU metric
| Symbol | Description |
|---|---|
| BP | Brevity penalty |
| | The weight of each modified precision |
| | The modified precision |
| | Clipped n-gram counts of the candidate sentence in the corpus |
| | The number of candidate n-grams |
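The quantities in this table fit together as follows in the standard BLEU formulation. The sketch below is a minimal sentence-level version that assumes pre-tokenized input, uniform weights (1/N for each modified precision), no smoothing, and the closest reference length for the brevity penalty; the function names are illustrative and are not taken from the survey.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram count of the candidate divided by its total n-gram count."""
    cand_counts = ngrams(candidate, n)
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each n-gram is clipped to its maximum count in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / total


def brevity_penalty(candidate, references):
    """BP = 1 if the candidate is longer than the closest reference, else exp(1 - r/c)."""
    c = len(candidate)
    if c == 0:
        return 0.0
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (assumptions)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)
```

Without smoothing, any candidate shorter than N tokens scores 0, which is one reason short medical captions are often reported with BLEU-1 to BLEU-4 separately.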
Symbols and notations of equations related to the ROUGE-L metric
| Symbol | Description |
|---|---|
| LCS | The longest common subsequence |
| | LCS-based precision |
| | LCS-based recall |
| | Length of the longest common subsequence of X and Y |
| | LCS score of the union longest common subsequence between a reference sentence and the candidate sentence |
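To illustrate how the LCS-based precision and recall combine, here is a minimal sentence-level ROUGE-L sketch. It assumes pre-tokenized sentences and a balance parameter `beta` defaulting to 1; it does not implement the union-LCS variant listed in the last row, and the function names are hypothetical.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.0):
    """Return (precision, recall, F-score) based on the LCS of the two sentences."""
    lcs = lcs_length(reference, candidate)
    if lcs == 0:
        return 0.0, 0.0, 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    f_score = (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)
    return precision, recall, f_score
```

Because the LCS only requires matching words to appear in the same order, not contiguously, ROUGE-L rewards a caption that keeps the reference's word order even when words are inserted or dropped in between.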
Symbols and notations of equations related to the METEOR metric
| Symbol | Description |
|---|---|
| | Penalty |
| | Number of chunks |
| | Number of unigrams that matched between the candidate and the reference |
| M(c) | Number of unigrams in the candidate sentence that are mapped |
| U(r) | Total number of unigrams in the reference sentence |
| U(c) | Total number of unigrams in the candidate sentence |
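The sketch below shows how these counts yield a METEOR score once the unigram alignment is known. It assumes the parameters of the original METEOR formulation (recall weighted nine times more than precision, penalty 0.5 × (chunks / matches)³); the survey may use a different parameterization, and the alignment step itself (exact, stem and synonym matching) is not implemented here.

```python
def meteor_score(matches, chunks, cand_len, ref_len, alpha=0.9, gamma=0.5, beta=3.0):
    """METEOR from pre-computed alignment counts.

    matches  : matched unigrams between candidate and reference
    chunks   : number of contiguous matched chunks
    cand_len : total unigrams in the candidate sentence
    ref_len  : total unigrams in the reference sentence
    alpha, gamma, beta are assumed defaults from the original METEOR metric.
    """
    if matches == 0:
        return 0.0
    precision = matches / cand_len
    recall = matches / ref_len
    # Weighted harmonic mean; alpha = 0.9 gives 10PR / (R + 9P).
    f_mean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # Fewer, longer chunks mean better word order and a smaller penalty.
    penalty = gamma * (chunks / matches) ** beta
    return f_mean * (1 - penalty)
```

A candidate whose matches form a single chunk is penalized least, while one whose every matched word is isolated loses up to half of its F-mean.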
Symbols and notations of equations related to the CIDEr metric
| Symbol | Description |
|---|---|
| | Vector formed by all n-grams of length n of the candidate sentence |
| | Magnitude of the vector |
| | Vector formed by all n-grams of length n of the set of reference sentences |
| | Magnitude of the vector |
| | TF-IDF weighting for each n-gram |
| | TF-IDF weighting for each n-gram |
| | Number of occurrences of an n-gram |
| | Number of occurrences of an n-gram |
| | Vocabulary of all n-grams |
| I | Set of all images of the dataset |
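For orientation, the commonly used CIDEr definition can be written with the quantities listed above. The symbol names below are assumptions, since the symbol column of this table was not recovered: $g^n(\cdot)$ is the TF-IDF vector over n-grams of length $n$, $h_k(s)$ the number of occurrences of n-gram $\omega_k$ in sentence $s$, $\Omega$ the n-gram vocabulary, $I$ the set of images, $s_{ij}$ the $j$-th of $m$ reference sentences for image $i$, and $w_n = 1/N$ uniform weights.

```latex
% TF-IDF weight of n-gram \omega_k in reference sentence s_{ij}
g_k(s_{ij}) = \frac{h_k(s_{ij})}{\sum_{\omega_l \in \Omega} h_l(s_{ij})}
  \log\!\left( \frac{|I|}{\sum_{I_p \in I} \min\!\left(1, \sum_{q} h_k(s_{pq})\right)} \right)

% Average cosine similarity between candidate c_i and its m references, for n-grams of length n
\mathrm{CIDEr}_n(c_i, S_i) = \frac{1}{m} \sum_{j=1}^{m}
  \frac{g^n(c_i) \cdot g^n(s_{ij})}{\lVert g^n(c_i) \rVert \, \lVert g^n(s_{ij}) \rVert}

% Combination over n-gram orders
\mathrm{CIDEr}(c_i, S_i) = \sum_{n=1}^{N} w_n \, \mathrm{CIDEr}_n(c_i, S_i),
  \qquad w_n = \frac{1}{N}
```

The inverse document frequency term down-weights n-grams that occur in the references of many images, so boilerplate phrases contribute little to the score.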
Symbols and notations of equations related to the SPICE metric
| Symbol | Description |
|---|---|
| G(c) | Scene graph of the candidate sentence |
| | Scene graph of each reference sentence |
| | Scene graph of all reference sentences |
| O(c) | Set of objects in the candidate sentence |
| E(c) | Set of attributes in the candidate sentence |
| K(c) | Set of relations in the candidate sentence |
| | The function that allows us to return logical tuples |
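Once the candidate and reference scene graphs have been flattened into logical tuples, SPICE reduces to an F1 score over those tuples. The sketch below assumes the tuples are already extracted and are matched exactly; the actual metric relies on a dependency parser to build the scene graphs and also matches synonyms, neither of which is shown here. The function name and the example tuples are illustrative only.

```python
def spice_f1(candidate_tuples, reference_tuples):
    """F1 over logical tuples (objects, attributes, relations), exact matching assumed."""
    cand = set(candidate_tuples)
    ref = set(reference_tuples)
    matched = len(cand & ref)
    if matched == 0:
        return 0.0
    precision = matched / len(cand)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tuples: 1-tuples are objects, 2-tuples attributes, 3-tuples relations.
candidate = [("heart",), ("heart", "enlarged"), ("effusion",)]
reference = [("heart",), ("heart", "normal"), ("effusion",), ("effusion", "in", "lung")]
print(spice_f1(candidate, reference))
```

Scoring tuples rather than n-grams means a caption is rewarded for stating the right objects, attributes and relations regardless of surface wording.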
Fig. 15 Example of a medical image and its caption, annotated with part-of-speech (POS) tags and the relations between them, together with its scene graph. Objects (nouns) are shown in blue, attributes (adjectives) in orange and relationships between objects in green. POS tags: DT for determiner, NN for noun, VBZ and VBG for verbs, IN for preposition and JJ for adjective. Dependencies between words: det for determiner, compound for compound words, nsubj for nominal subject, aux for auxiliary, obj for object, obl for oblique nominal, case for case marking and amod for adjectival modifier
Summary of main limitations of MIC systems and potential associated solutions
| | Limitations | Potential solutions and ways forward |
|---|---|---|
| Data issues | Few and small MIC datasets, each limited to a particular image modality, which makes the developed systems difficult to generalize | Construction of large-scale datasets of real medical images covering various body parts and different modalities. |
| | Class-imbalanced datasets and rarity of abnormal data | Data augmentation for abnormal class expansion, which takes into account the nature of medical images and preserves their contents. |
| | Lack of resources for use of complex deep learning models and transfer learning | Construction of large-scale labeled datasets and intelligent feature selection when transferring knowledge from any domain to the medical domain. |
| | Privacy issues for acquiring medical data | Advanced anonymization and data preprocessing could be implemented to hide patient identity and personal information. In addition, federated learning could be investigated to hide raw data and hence ensure privacy of medical data. |
| Model issues | Complex nature of medical images, which require deep expertise and extensive experience | Domain-specific generative or retrieval-based models to deal with discrete features of the medical images. Also, promoting tailored preprocessing tools for medical images to simplify interpretability. |
| | Different styles, templates and specific terminology for medical report generation, in addition to human errors | Implementation of unified templates with specific terminology to improve the quality of the generated reports. Also, involving the physician in the process of report generation to correct and approve the automatically generated sentences. |
| | Incoherent sentences and incorrect word order in automatically generated text, which are not clinically acceptable | Involving the physician in the report generation by allowing them to see, correct and approve the automatically generated report and providing them with evidence for each highlighted finding. In addition, the use of a specific and limited vocabulary in the generation process could help to provide coherent and efficient reports. |
| | Difficulty in distinguishing purely local features from purely global features | Datasets should include annotated bounding boxes for abnormal regions to allow the system to get fine-grained features (Singh et al. |
| Evaluation issues | Lack of efficiency of evaluation metrics | Development of appropriate domain-specific evaluation metrics to deal with medical report generation. |
| | Lack of consensus among clinicians on reporting | Promote standards and template reporting in different image modalities and diagnoses. Also, promote the use of explainable AI solutions, which provide approximations and visualizations of complex deep learning models, to ease understanding of the results and thereby promote consensus among clinicians. |
| | Ambiguity and incorrect detection of objects from medical images | Increasing the human interaction by incorporating manual evaluation by qualified physicians for better reports. |
| | Increasing costs of human evaluation and annotations | Development and implementation of specific crowd-sourcing tools in addition to human-like evaluation mechanisms for medical domain use. This would allow generation of automatic annotations for medical images and facilitate the process of evaluation. |
Analysis of some state-of-the-art comprehensive studies on MIC
| Paper | Year | Approaches | Imaging modality | Metrics | Inputs | Outputs | Datasets | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ED | ED+att | Merged | T+R | R+G | T+G | R | T | X-ray | Ultrasound | MRI | Fundus | Others | Automatic Eval | Human Eval | |||||
| Zeng et al. ( | 2020 | + | + | + | + | Medical images | Diagnosis | IU X-ray + own created dataset | |||||||||||
| Xiong et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Yuan et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray and ChexPert datasets | ||||||||||||
| Han et al. ( | 2021 | + | + | + | Spinal images | Medical report | Own created dataset | ||||||||||||
| Syeda-Mahmood et al. ( | 2020 | + | + | + | Chest X-ray images | Medical report | Own created dataset | ||||||||||||
| Singh et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Lyndon et al. ( | 2017 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Beddiar et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Castro et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Charalampakos et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Wang et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Tsuneda et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Nicolson et al. ( | 2021 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Sun et al. ( | 2019 | + | + | + | Breast mammography images | Medical report | Own created dataset | ||||||||||||
| Alsharid et al. ( | 2019 | + | + | + | + | Fetal ultrasound images | Description | Own created dataset | |||||||||||
| Li et al. ( | 2018 | + | + | + | + | Lateral and frontal chest X-ray images | Medical report | IU X-ray + ChexPert dataset | |||||||||||
| Huang et al. ( | 2019 | + | + | + | Chest X-ray images + patient’s indication | Medical report | IU X-ray dataset | ||||||||||||
| Alsharid et al. ( | 2020 | + | + | + | Fetal ultrasound images + audio transcribed into text | Description | Own created dataset | ||||||||||||
| Benzarti et al. ( | 2021 | + | + | + | Medical images | Diagnosis | PEIR dataset | ||||||||||||
| Gajbhiye et al. ( | 2020 | + | + | + | Lateral and frontal chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Park et al. ( | 2020 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Rodin et al. ( | 2019 | + | + | + | Lateral and frontal chest X-ray images + patient’s indication | Medical report | Mimic CXR dataset | ||||||||||||
| Kisilev et al. ( | 2015 | + | + | + | Breast mammography images | Medical report | Own created dataset | ||||||||||||
| Yin et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Yang et al. ( | 2020 | + | + | + | + | Lateral and frontal chest X-ray images | Medical report | IU X-ray dataset | |||||||||||
| Li et al. ( | 2019 | + | + | + | + | Lateral and frontal chest X-ray images | Medical report | IU X-ray dataset | |||||||||||
| Tian et al. ( | 2020 | + | + | + | Chest X-ray images + patient’s indication and doctor’s findings | Medical report | IU X-ray dataset | ||||||||||||
| Yang et al. ( | 2021 | + | + | + | Ultrasound images | Medical report | PEIR, BCD 2018 and own created dataset | ||||||||||||
| Shin et al. ( | 2016 | + | + | + | Chest X-ray images | Annotation | IU X-ray dataset | ||||||||||||
| Xue et al. ( | 2018 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Wang et al. | 2020 | + | + | + | + | Chest X-ray images | Medical report | IU X-ray + CX-CHR dataset | |||||||||||
| Harzig et al. ( | 2020 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Gu et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray + own created dataset | ||||||||||||
| Xue et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Harzig et al. ( | 2019 | + | + | + | Endoscopic gastrointestinal videos | Medical report | KvasirV2 + Medico2018 datasets | ||||||||||||
| Wang et al. ( | 2019 | + | + | + | Chest X-ray images | Medical report | IU X-ray dataset | ||||||||||||
| Han et al. ( | 2018 | + | + | + | Spinal images | Medical report | Own created dataset | ||||||||||||
| Hasan et al. ( | 2018 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Zeng et al. ( | 2018 | + | + | + | Ultrasound images | Description | Own created dataset | ||||||||||||
| Wu et al. ( | 2017 | + | + | + | Fundus images | Diagnosis | DIARETDB0 dataset | ||||||||||||
| Xu et al. ( | 2019 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| van Sonsbeek et al. ( | 2020 | + | + | + | Chest X-ray images + patient’s indication | Diagnosis | Mimic CXR dataset | ||||||||||||
| Wang ( | 2019 | + | + | + | Medical images | Description | imageCLEF + ROCO datasets | ||||||||||||
| Spinks et al. ( | 2019 | + | + | + | Chest X-ray images | Diagnosis | Own created dataset | ||||||||||||
| Xie et al. ( | 2019 | + | + | + | Lateral and frontal chest X-ray images | Description | IU X-ray dataset | ||||||||||||
| Mishra et al. ( | 2020 | + | + | + | Fundus images | Description | Stare dataset | ||||||||||||
| Ambati and Reddy Dudyala ( | 2018 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Chelaramani et al. ( | 2020 | + | + | + | Fundus images | Diagnosis | Own created dataset | ||||||||||||
| Onita et al. ( | 2020 | + | + | + | Chest X-ray images | Description | PADChest dataset | ||||||||||||
| Rahman et al. ( | 2018 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Pelka et al. ( | 2017 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
| Zeng et al. ( | 2020 | + | + | + | Ultrasound images | Annotation | Own created dataset | ||||||||||||
| Hasan et al. ( | 2017 | + | + | + | Medical images | Medical report | imageCLEF dataset | ||||||||||||
| Kougia et al. ( | 2019 | + | + | + | Medical images | Description | imageCLEF dataset | ||||||||||||
ED: encoder-decoder architecture; ED+att: ED with attention; Merged: merged architecture; T: template-based technique; R: retrieval-based technique; T+R: fusion of template-based and retrieval-based techniques; T+G: fusion of template-based and generative-based techniques; R+G: fusion of retrieval-based and generative-based techniques
Others: modality not mentioned, other modalities, or different modalities at once
Automatic Eval: automatic metrics such as BLEU, CIDEr, METEOR, keyword accuracy, ARS, etc.; Human Eval: human evaluation
+ indicates that the method, the modality or the metric was adopted; inside the Inputs, Outputs and Datasets columns, it denotes addition