Arlene Casey, Emma Davidson, Michael Poon, Hang Dong, Daniel Duma, Andreas Grivas, Claire Grover, Víctor Suárez-Paniagua, Richard Tobin, William Whiteley, Honghan Wu, Beatrice Alex.
Abstract
BACKGROUND: Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in the application of NLP to radiology is important, but recent reviews of this area are limited. This study systematically assesses and quantifies recent literature on NLP applied to radiology reports.
Keywords: Natural language processing; Radiology; Systematic review
Year: 2021 PMID: 34082729 PMCID: PMC8176715 DOI: 10.1186/s12911-021-01533-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Metadata enrichment steps undertaken for each publication
| Metadata enrichment steps |
|---|
| 1. Match the paper with its DOI via the Crossref API |
| 2. If a DOI is matched, check Semantic Scholar for metadata and abstract |
| 3. If there is no DOI match and no abstract, search PubMed for the abstract |
| 4. Search arXiv |
| 5. If there is no PDF link, search Unpaywall for available open-access versions |
| 6. If there is a PDF but no separate abstract from Semantic Scholar/PubMed, extract the abstract from the PDF |
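The enrichment steps amount to a cascade of fallbacks. The sketch below expresses that cascade as a plain Python function; it is an illustration, not the authors' pipeline. The six lookup callables (`crossref_doi`, `s2_abstract`, `pubmed_abstract`, `arxiv_pdf`, `unpaywall_pdf`, `pdf_to_abstract`) are hypothetical stand-ins for calls to the named services, and treating the arXiv search as a source of PDF links is an assumption, since step 4 does not say what was retrieved.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# HYPOTHETICAL lookup signature: a key goes in, a value (or None) comes out.
Lookup = Callable[[str], Optional[str]]


@dataclass
class Record:
    title: str
    doi: Optional[str] = None
    abstract: Optional[str] = None  # may already come from the search result
    pdf_url: Optional[str] = None


def enrich(rec: Record, crossref_doi: Lookup, s2_abstract: Lookup,
           pubmed_abstract: Lookup, arxiv_pdf: Lookup,
           unpaywall_pdf: Lookup, pdf_to_abstract: Lookup) -> Record:
    # Step 1: match the paper to a DOI (Crossref in the review).
    rec.doi = crossref_doi(rec.title)
    # Step 2: with a DOI, look up metadata/abstract (Semantic Scholar).
    if rec.doi is not None:
        rec.abstract = s2_abstract(rec.doi) or rec.abstract
    # Step 3: no DOI match and still no abstract -> fall back to PubMed.
    if rec.doi is None and rec.abstract is None:
        rec.abstract = pubmed_abstract(rec.title)
    # Step 4: search arXiv (ASSUMPTION: it yields a PDF link or None).
    rec.pdf_url = rec.pdf_url or arxiv_pdf(rec.title)
    # Step 5: no PDF link yet -> look for an open-access copy.
    # Unpaywall is queried by DOI, so this step needs a matched DOI.
    if rec.pdf_url is None and rec.doi is not None:
        rec.pdf_url = unpaywall_pdf(rec.doi)
    # Step 6: a PDF but no abstract -> extract the abstract from the PDF.
    if rec.pdf_url is not None and rec.abstract is None:
        rec.abstract = pdf_to_abstract(rec.pdf_url)
    return rec
```

Passing the lookups in as functions keeps the fallback order testable offline; in practice each would wrap a client for the corresponding service.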
Automated filtering steps to remove irrelevant publications
| Automated filtering steps |
|---|
| 1. Document language is not English |
| 2. Word 'patent' in title or URL |
| 3. Year of publication out of range (<2015) |
| 4. The words 'review' or 'overview' in the title, or 'this review' in the abstract |
| 5. Image keywords in title or abstract with no NLP terminology in abstract |
| 6. No radiology keywords in title or abstract |
| 7. No NLP terminology in abstract |
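The filters can be read as an ordered list of removal rules, where the first rule that fires explains why a record was dropped. A minimal sketch, assuming simple case-insensitive whole-word keyword matching; the keyword lists (`RADIOLOGY_TERMS`, `NLP_TERMS`, `IMAGE_TERMS`) and the function name `removal_step` are hypothetical, because the review's actual term lists are not reproduced here.

```python
import re
from typing import Iterable, Optional

# HYPOTHETICAL keyword lists, for illustration only.
RADIOLOGY_TERMS = ("radiology", "radiologist", "ct", "mri", "x-ray",
                   "mammography", "ultrasound")
NLP_TERMS = ("natural language processing", "nlp", "text mining",
             "information extraction")
IMAGE_TERMS = ("segmentation", "pixel", "voxel")


def has_any(text: str, terms: Iterable[str]) -> bool:
    """Case-insensitive whole-word (or whole-phrase) match."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text, re.IGNORECASE)
               for t in terms)


def removal_step(lang: str, title: str, url: str, year: int,
                 abstract: str) -> Optional[int]:
    """Return the number of the first filter that removes the record, or None."""
    title_abs = f"{title} {abstract}"
    if lang != "en":
        return 1
    if "patent" in f"{title} {url}".lower():  # plain substring, so 'patents' also matches
        return 2
    if year < 2015:
        return 3
    if has_any(title, ("review", "overview")) or has_any(abstract, ("this review",)):
        return 4
    if has_any(title_abs, IMAGE_TERMS) and not has_any(abstract, NLP_TERMS):
        return 5
    if not has_any(title_abs, RADIOLOGY_TERMS):
        return 6
    if not has_any(abstract, NLP_TERMS):
        return 7
    return None  # record survives automated filtering
```

Returning the step number rather than a boolean makes it easy to tally how many records each filter removed, which is the kind of count a PRISMA diagram reports.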
Fig. 1 PRISMA diagram of publication search and retrieval
Fig. 2 Clinical application of publications by year
Scan modality
| Scan modality | No. studies |
|---|---|
| Multiple modalities | 46 |
| MRI | 16 |
| CT | 38 |
| X-Ray | 8 |
| Mammogram | 5 |
| Ultrasound | 4 |
| Not specified | 47 |
| Total | 164 |
Image sampling method
| Sampling method | No. studies |
|---|---|
| Consecutive images | 33 |
| Non-consecutive images | 38 |
| Not specified | 93 |
| Total | 164 |
Anatomical region scanned
| Anatomical region | No. studies |
|---|---|
| Mixed | 43 |
| Thorax | 32 |
| Head/neck | 25 |
| Abdomen | 15 |
| Breast | 15 |
| Extremities | 9 |
| Spine | 5 |
| Other | 1 |
| Unspecified | 19 |
| Total | 164 |
Disease category
| Disease category | No. studies |
|---|---|
| Not specific disease related | 40 |
| Oncology | 39 |
| Various | 20 |
| Musculoskeletal | 10 |
| Cerebrovascular | 13 |
| Other | 13 |
| Respiratory | 10 |
| Trauma | 7 |
| Cardiovascular | 6 |
| Gastrointestinal | 3 |
| Hepatobiliary | 2 |
| Genitourinary | 1 |
| Total | 164 |
Radiology report language
| Report language | No. studies |
|---|---|
| English | 141 |
| Chinese | 5 |
| Spanish | 4 |
| German | 3 |
| Italian | 2 |
| French | 2 |
| Hebrew | 1 |
| Polish | 1 |
| Brazilian Portuguese | 1 |
| Unspecified | 4 |
| Total | 164 |
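Each of the five study-characteristic tables ends in a Total of 164, which implies that every included study is assigned to exactly one row per table. A quick arithmetic check, with the counts copied from the tables (the dictionary keys are shortened labels chosen here):

```python
# Counts copied from the five study-characteristic tables, in table order.
tables = {
    "scan_modality": [46, 16, 38, 8, 5, 4, 47],
    "sampling_method": [33, 38, 93],
    "anatomical_region": [43, 32, 25, 15, 15, 9, 5, 1, 19],
    "disease_category": [40, 39, 20, 10, 13, 13, 10, 7, 6, 3, 2, 1],
    "report_language": [141, 5, 4, 3, 2, 2, 1, 1, 1, 4],
}

totals = {name: sum(counts) for name, counts in tables.items()}
print(totals)  # every table sums to 164

# Share of studies using English-language reports.
english_share = 100 * 141 / totals["report_language"]
print(f"{english_share:.1f}%")  # 86.0%
```

The same arithmetic shows that English-language reports account for about 86% of the included studies (141 of 164).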
Clinical application category by technical objective
| Application category | Information extraction (n = 73) | Report/sentence classification (n = 81) | Lexicon/ontology discovery (n = 9) | Clustering (n = 1) |
|---|---|---|---|---|
| Disease information & classification | 15 | 31 | – | – |
| Diagnostic surveillance | 28 | 17 | – | – |
| Quality compliance | 5 | 15 | – | – |
| Cohort-Epid. | 6 | 10 | – | – |
| Language discovery & knowledge | 13 | 4 | 9 | 1 |
| Technical NLP | 6 | 4 | – | – |
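The n values attached to each technical objective can be checked against the cells by summing each column. A minimal sketch, assuming a dash means zero studies; it shows that the column totals match the stated n values and together add up to 164, consistent with each study being assigned a single technical objective.

```python
# Rows copied from the table; a dash is read as 0 (assumption).
rows = {
    "Disease information & classification": (15, 31, 0, 0),
    "Diagnostic surveillance": (28, 17, 0, 0),
    "Quality compliance": (5, 15, 0, 0),
    "Cohort-Epid.": (6, 10, 0, 0),
    "Language discovery & knowledge": (13, 4, 9, 1),
    "Technical NLP": (6, 4, 0, 0),
}
objectives = ("Information extraction", "Report/sentence classification",
              "Lexicon/ontology discovery", "Clustering")

# Transpose rows into columns and sum each technical objective.
col_totals = dict(zip(objectives, (sum(col) for col in zip(*rows.values()))))
print(col_totals)
print(sum(col_totals.values()))  # 164, the number of included studies

# Row totals: studies per clinical application category.
row_totals = {name: sum(cells) for name, cells in rows.items()}
print(row_totals)
```

The row totals give the number of studies per clinical application category, for example 45 for diagnostic surveillance.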
Fig. 3 NLP method breakdown
Fig. 4 NLP method by year
Breakdown of NLP methods
| Machine learning (n = 74) | No. studies | Deep learning (n = 36) | No. studies |
|---|---|---|---|
| SVM | 34 | RNN variants | 14 |
| Logistic regression | 23 | CNN | 10 |
| Random forest | 18 | Other | 5 |
| Naïve Bayes | 17 | Compare CNN, RNN | 4 |
| Maximum entropy | 7 | Combine CNN + RNN | 3 |
| Decision trees | 4 | | |
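The machine learning sub-counts add up to more than the group size (103 against n = 74), whereas the deep learning sub-counts add up to exactly 36. One reading, assumed here rather than stated in the table, is that a study comparing several classical classifiers is counted once under each of them:

```python
ml = {"SVM": 34, "Logistic regression": 23, "Random forest": 18,
      "Naïve Bayes": 17, "Maximum entropy": 7, "Decision trees": 4}
dl = {"RNN variants": 14, "CNN": 10, "Other": 5,
      "Compare CNN, RNN": 4, "Combine CNN + RNN": 3}
N_ML, N_DL = 74, 36  # group sizes from the table header

ml_mentions = sum(ml.values())
dl_mentions = sum(dl.values())
print(ml_mentions, dl_mentions)  # 103 36

# Average classifiers per ML study, ASSUMING multi-counting across rows.
print(round(ml_mentions / N_ML, 2))  # 1.39

# Share of ML studies that used an SVM, under the same assumption.
print(round(100 * ml["SVM"] / N_ML, 1))  # 45.9
```

The deep learning side appears to avoid double counting by giving comparisons and combinations of CNNs and RNNs their own rows, which is why its rows partition the 36 studies exactly.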
Data size by NLP method (minimum, maximum and median), for studies reporting data size as a number of radiology reports
| NLP method | Min. size | Max. size | Median |
|---|---|---|---|
| Compare methods | 513 | 2,167,445 | 2,845 |
| Hybrid methods | 40 | 34,926 | 918 |
| Deep learning (only) | 120 | 1,567,581 | 5,000 |
| Machine learning (only) | 101 | 2,977,739 | 2,531 |
| Rules (only) | 31 | 10,000,000 | 8,000 |
| Other | 25 | 12,377,743 | 10,000 |
Number of studies by data size group, for studies reporting data size as a number of radiology reports
| Data size group | No. studies (%) |
|---|---|
| <200 | 9 (6.7) |
| 200 to <500 | 6 (4.4) |
| 500 to <1000 | 18 (13.3) |
| 1000 to <2000 | 17 (12.6) |
| 2000 to <5000 | 17 (12.6) |
| 5000 to <10,000 | 12 (8.9) |
| 10,000+ | 53 (39.3) |
| Unspecified | 3 (2.2) |
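The grouping bins each study's report count into ranges. A minimal sketch, assuming each range includes its lower bound and excludes its upper bound (the table does not say how boundary values were assigned); `size_group` and the label strings are illustrative names. The second part recomputes the percentages from the counts, using the 135 studies the rows add up to.

```python
from bisect import bisect_right

# Bin edges; a value equal to an edge falls into the higher bin (ASSUMPTION).
EDGES = [200, 500, 1_000, 2_000, 5_000, 10_000]
LABELS = ["<200", "200 to <500", "500 to <1000", "1000 to <2000",
          "2000 to <5000", "5000 to <10,000", "10,000+"]


def size_group(n_reports: int) -> str:
    """Map a number of radiology reports to its data size group."""
    return LABELS[bisect_right(EDGES, n_reports)]


# Study counts per group, copied from the table.
counts = dict(zip(LABELS + ["Unspecified"], [9, 6, 18, 17, 17, 12, 53, 3]))
total = sum(counts.values())  # 135 studies
percent = {g: round(100 * c / total, 1) for g, c in counts.items()}
print(total, percent["10,000+"])  # 135 39.3
```

For example, `size_group(2_845)` returns `"2000 to <5000"`, the group containing the median data size for studies comparing methods.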
Number of studies reporting total dataset size and training, validation, test and annotation set sizes
| Dataset type | No. of studies | Comments |
|---|---|---|
| Total dataset size | 151 | 5 |
| Training set size | 129 | |
| Validation set size | 52 | 27 report set size; 25 report k-fold cross-validation |
| Test set size | 81 | |
| Annotation set size | 97 | |
Fig. 5 Application category and NLP method: mean and median summaries. The mean is indicated by a vertical bar, the box shows error bars, and the asterisk marks the median.