Jason J Lau, Soumya Gayen, Asma Ben Abacha, Dina Demner-Fushman.
Abstract
Radiology images are an essential part of clinical decision making and population screening, e.g., for cancer. Automated systems could help clinicians cope with large amounts of images by answering questions about the image contents. An emerging area of artificial intelligence, Visual Question Answering (VQA) in the medical domain explores approaches to this form of clinical decision support. Success of such machine learning tools hinges on availability and design of collections composed of medical images augmented with question-answer pairs directed at the content of the image. We introduce VQA-RAD, the first manually constructed dataset where clinicians asked naturally occurring questions about radiology images and provided reference answers. Manual categorization of images and questions provides insight into clinically relevant tasks and the natural language to phrase them. Evaluating with well-known algorithms, we demonstrate the rich quality of this dataset over other automatically constructed ones. We propose VQA-RAD to encourage the community to design VQA tools with the goals of improving patient care.
Year: 2018 PMID: 30457565 PMCID: PMC6244189 DOI: 10.1038/sdata.2018.251
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Flow diagram of the VQA-RAD build.
Naturally occurring question and answer types about radiology images.
| Question type | Description |
|---|---|
| Modality (MODALITY) | How an image is taken: CT, x-ray, T2-weighted MRI, etc. |
| Plane (PLANE) | Orientation of an image slicing through the body: axial, sagittal, coronal |
| Organ system (ORGAN) | Categorization that connects anatomical structures with pathophysiology, diagnosis, and treatment: pulmonary, cardiac, musculoskeletal system |
| Abnormality (ABN) | Normalcy of an image or object, e.g., “Is there something wrong with the image?”, “What is abnormal about the lung?”, “Does the liver look normal?” |
| Object/condition presence (PRES) | Objects may be normal structures such as organs or body parts, but may also be abnormal objects such as masses or lesions. Clinicians may also ask about the presence of conditions in an image or patient: fractures, midline shift, infarction |
| Position (POS) | Position or location of an object or organ, including which side of the patient, with respect to the image borders, or relative to other objects in the image |
| Color (COLOR) | Signal intensity, including enhancement or opaqueness |
| Size (SIZE) | Measurement of the size of an object, e.g., enlargement, atrophy |
| Other attribute (ATTRIB) | Other types of descriptive questions |
| Counting (COUNT) | Questions focusing on a quantity of objects, e.g., number of lesions |
| Other (OTHER) | Catch-all category for questions that do not fall into the previous categories |
| Answer type | |
| Closed-ended | Yes/no and other limited choices, e.g., “Is the mass on the left or right?” |
| Open-ended | No limited question structure; may have multiple correct answers |
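The answer-type definitions above can be approximated automatically. The following is a hypothetical heuristic sketch, not the procedure used to build VQA-RAD (where questions and answers were categorized manually): it treats yes/no answers, and answers picked from an explicit “X or Y” choice in the question, as closed-ended, and everything else as open-ended. The question-type codes are the ones used in the results tables; the enum labels and the function name `classify_answer_type` are illustrative assumptions.

```python
import re
from enum import Enum

# Question-type codes as used in the results tables (labels are shorthand).
class QuestionType(Enum):
    MODALITY = "Modality"
    PLANE = "Plane"
    ORGAN = "Organ system"
    ABN = "Abnormality"
    PRES = "Object/condition presence"
    POS = "Position"
    COLOR = "Color"
    SIZE = "Size"
    ATTRIB = "Other attribute"
    COUNT = "Counting"
    OTHER = "Other"

class AnswerType(Enum):
    CLOSED = "closed-ended"
    OPEN = "open-ended"

def _tokens(text: str) -> list[str]:
    """Lowercase word tokens with punctuation stripped."""
    return re.findall(r"[a-z0-9]+", text.lower())

def classify_answer_type(question: str, answer: str) -> AnswerType:
    """Hypothetical heuristic: yes/no answers, and answers whose words all
    appear in a question offering an explicit 'or' choice, are closed-ended."""
    answer_tokens = _tokens(answer)
    if answer_tokens in (["yes"], ["no"]):
        return AnswerType.CLOSED
    question_tokens = _tokens(question)
    if "or" in question_tokens and answer_tokens and all(
        tok in question_tokens for tok in answer_tokens
    ):
        return AnswerType.CLOSED
    return AnswerType.OPEN

print(classify_answer_type("Is the mass on the left or right?", "left"))
print(classify_answer_type("What plane is the above image acquired in?", "axial"))
```

A rule like this can only pre-sort questions; borderline cases (for instance, a free-text answer that happens to reuse words from an “or” question) would still need a human decision, which is why a manual categorization is more trustworthy.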
Figure 2. Breakdown of closed-ended vs. open-ended free-form questions by question type, showing that certain question types are more likely to be open-ended: positional, counting, and other questions.
Accuracy of the systems’ answers to the closed-ended free-form questions, grouped by question type. Overall, MCB_RAD and SAN_RAD performed best. MCB_VQA1.0, one of the baseline systems, does not outperform all of the yes/no baselines.
| Question type | YES | NO | MCB_VQA1.0 | MCB_CLEF | SAN_CLEF | MCB_CLEF + RAD | SAN_CLEF + RAD | MCB_RAD | SAN_RAD |
|---|---|---|---|---|---|---|---|---|---|
| ABN | 33.3% | 58.3% | 58.3% | 4.2% | 29.2% | 58.3% | 66.7% | 62.5% | |
| ATTRIB | 33.3% | 50.0% | 0% | 16.7% | 50.0% | 50.0% | 50.0% | 16.7% | |
| COLOR | 50.0% | 50.0% | 50.0% | 0% | 50.0% | 50.0% | 50.0% | 50.0% | |
| COUNT | 0% | 0% | 0% | 0% | 0% | 0% | 0% | ||
| MODALITY | 33.3% | 46.7% | 46.7% | 0% | 40.0% | 60.0% | 60.0% | ||
| ORGAN | 100.0% | 0% | 100.0% | 0% | 0% | 50.0% | 100.0% | 100.0% | 100.0% |
| OTHER | 11.1% | 33.3% | 0% | 11.1% | 44.4% | 22.2% | 33.3% | 11.1% | |
| PLANE | 41.7% | 41.7% | 25.0% | 0% | 50.0% | 50.0% | 41.7% | 50.0% | |
| POS | 33.3% | 0% | 33.3% | 0% | 0% | 33.3% | 66.7% | ||
| PRES | 41.8% | 58.2% | 41.8% | 40.5% | 50.6% | 60.8% | 62.0% | 58.2% | |
| SIZE | 57.7% | 34.6% | 50.0% | 7.7% | 42.3% | 57.7% | 61.5% | 50.0% | |
| 42.8% | 48.9% | 44.4% | 19.4% | 41.1% | 57.8% | 58.9% | 57.2% | ||
| 43.6% | 37.9% | 44.4% | 4.8% | 30.9% | 50.4% | 51.3% | 54.2% | ||
| 0.804 | 0.812 | 0.003 | 0.020 | 0.167 | 0.167 | 0.168 | 0.622 |
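Per-type accuracy of the kind reported above, including the YES and NO columns, can be computed with a few lines of code. This is a minimal sketch under stated assumptions: the YES and NO baselines are assumed to answer every question with a constant “yes” or “no”, and correctness is judged here by case- and whitespace-insensitive exact match, whereas the reported accuracies may rest on manual judgment. The record format and the toy questions are hypothetical and not taken from VQA-RAD.

```python
from collections import defaultdict
from typing import Callable

Example = dict[str, str]  # hypothetical record: question, answer, question_type

def normalize(answer: str) -> str:
    """Case- and whitespace-insensitive form used for exact-match scoring."""
    return " ".join(answer.lower().split())

def accuracy_by_type(
    examples: list[Example], predict: Callable[[Example], str]
) -> tuple[dict[str, float], float]:
    """Return (accuracy per question type, overall accuracy)."""
    correct: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for ex in examples:
        qtype = ex["question_type"]
        total[qtype] += 1
        if normalize(predict(ex)) == normalize(ex["answer"]):
            correct[qtype] += 1
    per_type = {t: correct[t] / total[t] for t in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_type, overall

# Constant-answer baselines (assumed meaning of the YES and NO columns).
def always_yes(ex: Example) -> str:
    return "yes"

def always_no(ex: Example) -> str:
    return "no"

# Hypothetical toy records, for illustration only.
toy = [
    {"question": "Is there a mass?", "answer": "Yes", "question_type": "PRES"},
    {"question": "Is there a fracture?", "answer": "no", "question_type": "PRES"},
    {"question": "Is this axial or coronal?", "answer": "axial", "question_type": "PLANE"},
    {"question": "Is the spleen enlarged?", "answer": "yes", "question_type": "SIZE"},
]

print(accuracy_by_type(toy, always_yes))
print(accuracy_by_type(toy, always_no))
```

On these toy records the constant-“yes” baseline scores 0.5 overall and the constant-“no” baseline 0.25. Constant baselines of this kind set a floor that a trained model must beat on closed-ended questions, which is the comparison the YES and NO columns make possible.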
Accuracy of the systems’ answers to the open-ended free-form questions, grouped by question type. Overall, modality and plane questions are predicted well; deficiencies are seen in other areas such as abnormality, color, and counting. MCB_VQA1.0 shows how a baseline system’s BLEU scores disagree with manual judgments.
| Question type | MCB_VQA1.0 | MCB_CLEF | SAN_CLEF | MCB_CLEF + RAD | SAN_CLEF + RAD | MCB_RAD | SAN_RAD |
|---|---|---|---|---|---|---|---|
| ABN | 0% | 0% | 0% | 0% | 0% | 0% | |
| ATTRIB | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| COLOR | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| COUNT | 0% | 0% | 0% | 0% | 0% | 0% | 0% |
| MODALITY | 0% | 32.1% | 0% | 71.4% | 50.0% | 57.1% | |
| ORGAN | 0% | 7.1% | 0% | 14.3% | 0% | 14.3% | |
| OTHER | 0% | 10.0% | 0% | 10.0% | 10.0% | 10.0% | |
| PLANE | 0% | 0% | 0% | 72.7% | 72.7% | ||
| POS | 5.9% | 2.9% | 4.4% | 16.2% | 22.1% | 25.0% | |
| PRES | 1.7% | 3.4% | 0% | 0.0% | 0% | 0% | |
| SIZE | 0% | 0% | 0% | 0.0% | 0% | 0% | |
| 0% | 6.7% | 1.7% | 23.8% | 24.6% | 24.2% | ||
| 0.7% | 5.1% | 0.9% | 18.5% | 15.5% | 18.6% | ||
| 0.0058 | 0.0113 | 0.0031 | 0.0339 | 0.0308 | 0.0576 |
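The observation that BLEU scores can disagree with manual judgments is easier to see with a concrete implementation. Below is a minimal, unsmoothed, single-reference sentence-level BLEU sketch (clipped n-gram precision, geometric mean, brevity penalty) written from the standard definition. The n-gram order, the lowercase whitespace tokenization, and the absence of smoothing are assumptions, since the BLEU settings used in the study are not stated here. The example answers are taken from the paraphrasing samples.

```python
import math
from collections import Counter

def _ngrams(tokens: list[str], n: int) -> Counter:
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(reference: str, candidate: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU over lowercase whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not cand:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = _ngrams(cand, n)
        total = sum(cand_counts.values())
        if total == 0:
            # Candidate is too short to contain any n-gram of this order.
            return 0.0
        ref_counts = _ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: penalize candidates that are not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_precision_sum / max_n)

print(sentence_bleu("Yes", "yes", max_n=1))
print(sentence_bleu("Calcified atherosclerosis", "calcifications", max_n=1))
print(sentence_bleu("axial", "axial"))
```

Under these assumptions, a plausible but differently worded answer (“calcifications” for “Calcified atherosclerosis”) scores 0. With the default 4-gram order, even an exact one-word match such as “axial” scores 0, because it contains no higher-order n-grams. Short clinical answers are therefore a poor fit for unsmoothed n-gram overlap, whichever settings are used.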
Effect of paraphrasing on models. Sample of free-form and paraphrased questions that resulted in differing predicted answers. Predicted answers come from MCB_RAD. GS denotes the gold-standard answer provided by the clinical annotators.
| Question | Answer (GS) | Answer (MCB_RAD) |
|---|---|---|
| Is the heart enlarged? | Yes | yes |
| Is cardiomegaly present? | Yes | no |
| The image is taken in what plane? | axial | dwi |
| What plane is the above image acquired in? | axial | axial |
| Is this a CT or an MRI? | MRI | ct |
| Was a CT or MRI used to take the above image? | MRI | t2 weighted |
| Does the liver show an enhancing mass or lesion? | No | normal |
| Is there an enhancing lesion in the liver? | No | no |
| What are the hyperdense lesions noted at the edges of the aorta? | Calcified atherosclerosis | calcifications |
| What are the hyperintensities surrounding the aorta? | Calcified atherosclerosis | ribs |
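The paraphrasing samples can be summarized with two simple checks: how often the model gives the same answer to a question and its paraphrase, and how often correctness flips between the two. The sketch below applies case-insensitive exact match to the five question pairs in the table. The scoring rule and the names `ParaphrasePair` and `paraphrase_report` are illustrative assumptions. Exact match is strict: it rejects “t2 weighted” for “MRI” and “calcifications” for “Calcified atherosclerosis”, answers a human grader might accept.

```python
from typing import NamedTuple

class ParaphrasePair(NamedTuple):
    gold: str             # gold-standard (GS) answer
    pred_original: str    # MCB_RAD answer to the original question
    pred_paraphrase: str  # MCB_RAD answer to the paraphrased question

def normalize(answer: str) -> str:
    return " ".join(answer.lower().split())

def is_correct(gold: str, pred: str) -> bool:
    """Assumed scoring rule: case- and whitespace-insensitive exact match."""
    return normalize(gold) == normalize(pred)

def paraphrase_report(pairs: list[ParaphrasePair]) -> dict:
    """Answer agreement across paraphrases and number of correctness flips."""
    agree = sum(
        normalize(p.pred_original) == normalize(p.pred_paraphrase) for p in pairs
    )
    flips = sum(
        is_correct(p.gold, p.pred_original) != is_correct(p.gold, p.pred_paraphrase)
        for p in pairs
    )
    return {"answer_agreement": agree / len(pairs), "correctness_flips": flips}

# The five question pairs from the paraphrasing table.
pairs = [
    ParaphrasePair("Yes", "yes", "no"),
    ParaphrasePair("axial", "dwi", "axial"),
    ParaphrasePair("MRI", "ct", "t2 weighted"),
    ParaphrasePair("No", "normal", "no"),
    ParaphrasePair("Calcified atherosclerosis", "calcifications", "ribs"),
]

print(paraphrase_report(pairs))
```

Agreement is 0.0 by construction, since the table lists only pairs with differing answers. Under exact match, three of the five pairs flip between correct and incorrect. The remaining two are wrong both times only because of the strict scoring rule.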