| Literature DB >> 35120466 |
Gunvant R Chaudhari1, Yeshwant R Chillakuru1,2, Timothy L Chen1,3, Valentina Pedoia1, Thienkhai H Vu1, Christopher P Hess1, Youngho Seo1, Jae Ho Sohn4.
Abstract
BACKGROUND: The comprehensiveness and maintenance of the American College of Radiology (ACR) Appropriateness Criteria (AC) make it a unique resource for evidence-based clinical imaging decision support, but it is underutilized by clinicians. To facilitate the use of imaging recommendations, we developed a natural language processing (NLP) search algorithm that automatically matches the clinical indications that physicians write into imaging orders to appropriate AC imaging recommendations.
Keywords: Appropriateness criteria; Information retrieval; Natural language processing; Term frequency-inverse document frequency
Year: 2022 PMID: 35120466 PMCID: PMC8815252 DOI: 10.1186/s12880-022-00740-6
Source DB: PubMed Journal: BMC Med Imaging ISSN: 1471-2342 Impact factor: 1.930
Fig. 1 Algorithm flow diagram. This figure is a flow diagram of the algorithm’s backend from input to output. TF-IDF (Term frequency-inverse document frequency); AC (Appropriateness Criteria)
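Fig. 1 describes a TF-IDF-based retrieval backend that scores AC documents against a free-text indication. A minimal sketch of that kind of pipeline in pure Python follows; the `tfidf_rank` function and the toy document snippets are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents against a free-text query by TF-IDF cosine similarity."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n = len(docs)
    df = Counter()                      # document frequency per term
    for toks in tokens.values():
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) for t in df}

    def vectorize(toks):
        tf = Counter(toks)
        return {t: (tf[t] / len(toks)) * idf.get(t, 0.0) for t in tf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vectorize(query.lower().split())
    scores = {name: cosine(q, vectorize(toks)) for name, toks in tokens.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy stand-ins for AC narrative text (illustrative only)
docs = {
    "acute_chest_pain_dissection": "acute chest pain suspected aortic dissection ct angiography chest",
    "acute_headache": "acute headache suspected subarachnoid hemorrhage ct head noncontrast",
    "low_back_pain": "low back pain mri lumbar spine without contrast",
}
ranking = tfidf_rank("chest pain concerning for aortic dissection", docs)
print(ranking[0])  # the dissection document ranks first
```

Query terms absent from the corpus (e.g. "concerning") get zero IDF weight and drop out of the score, which is one reason distracting history in an indication can dilute a match.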
Fig. 2 Flowchart of the institutional radiology report dataset. This figure is a flow diagram of patient indication inclusion, exclusion, and processing in evaluating our algorithm on the clinical radiology report indications
Patient and study characteristics of annotated institutional radiology report dataset
| Characteristic | Proportion of dataset |
|---|---|
| *Body region* | |
| Abdomen/pelvis | 0.32 |
| Chest/breast | 0.15 |
| Extremity | 0.08 |
| Head | 0.30 |
| Spine | 0.13 |
| Full body | 0.02 |
| *Age (years)* | |
| Under 13 | 0.07 |
| 13–65 | 0.63 |
| Over 65 | 0.30 |
| *Sex* | |
| Male | 0.44 |
| Female | 0.56 |
| *Imaging modality* | |
| CT | 0.43 |
| MRI | 0.47 |
| US | 0.08 |
| NM | 0.02 |
Total size of the dataset was 100 indications. Ultrasound (US), Nuclear medicine (NM)
Fig. 3 Accuracy on simulated indications dataset. This figure is a cumulative graph of the percentage of indications with ground truths ranked as the top search result up through those ranked within the top 60 search results. It shows a significant difference between algorithm performance on simple and complex simulated indications (p < 1e−10, two-sample Kolmogorov–Smirnov test)
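The cumulative curve in Fig. 3 can be computed directly from the per-indication rank of each ground-truth document. A small sketch; the rank lists here are invented for illustration, not taken from the paper's data.

```python
def cumulative_topk(ranks, max_k):
    """Fraction of indications whose ground-truth document falls within the
    top k search results, for k = 1 .. max_k (ranks are 1-based)."""
    n = len(ranks)
    return [sum(r <= k for r in ranks) / n for k in range(1, max_k + 1)]

# Hypothetical ground-truth ranks for five simulated indications
simple_ranks = [1, 1, 2, 1, 3]
curve = cumulative_topk(simple_ranks, 3)
print(curve)  # [0.6, 0.8, 1.0]
```

Comparing two such curves (simple vs. complex indications) as empirical distributions is what the two-sample Kolmogorov–Smirnov test in Fig. 3 formalizes.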
Simulated indications dataset results
| Analysis metric | Simple indications | Complex indications |
|---|---|---|
| Proportion of ground truth documents in top 3 | 0.985 | 0.849 |
| Average ground truth rank | 1.36 | 2.66 |
| Average NDCG | 0.841 | 0.801 |
Chi-squared test was used to calculate significance of the proportion of ground truth documents in top 3, and Mann–Whitney U test was used for ground truth rank and NDCG. All metrics show that the algorithm performed significantly better on simple indications than on complex ones. Normalized discounted cumulative gain (NDCG), Appropriateness criteria (AC)
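NDCG, reported above, rewards relevant documents near the top of the ranking by discounting relevance with the log of the position and normalizing against the ideal ordering. A minimal binary-relevance sketch; the relevance list is an illustrative assumption.

```python
import math

def dcg(relevances):
    # rel at position p gets weight 1 / log2(p + 1), with positions 1-based
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0

# Ground-truth AC document returned at rank 2 of 5 (binary relevance)
score = ndcg([0, 1, 0, 0, 0])
print(round(score, 3))  # 0.631
```

An NDCG of 1.0 means every relevant document is ranked as high as possible, so the averages of 0.841 (simple) vs. 0.801 (complex) quantify how much complex indications push relevant AC documents down the ranking.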
Fig. 4 Search ranking relevance on generated indications dataset. This figure is a bar graph of the average NDCG on simple and complex queries in all 12 categories. Error bars indicate 1 standard deviation. The number of documents (and therefore queries) for each category is in parentheses. There is a significant difference between the NDCG values among categories (p = 0.00054, Kruskal–Wallis H-test). Note: ‘Major Trauma’ category was excluded from statistical analysis due to sample size of 1. NDCG (normalized discounted cumulative gain)
Institutional radiology report clinical indications dataset results
| Classifications and metrics | Number of documents (proportion) |
|---|---|
| Correct doc ranked top 3 | 51 (0.864) |
| Correct doc ranked top 10 | 57 (0.966) |
| All correct docs ranked top 5 | 14 (0.777) |
| All correct docs ranked top 10 | 18 (1.0) |
| Inadequate indication | 11 (0.478) |
| No AC doc for indication | 12 (0.522) |
Normalized discounted cumulative gain (NDCG), Appropriateness criteria (AC)
Fig. 5 Comparison to a custom Google search. This figure shows the relative accuracies of our proposed Sent2Vec-based algorithm and a custom Google search engine on a subset of the simulated indications dataset (n = 100). A lower-ranked search result was defined as the ground truth AC document being ranked 4th highest or worse. The difference in document retrieval performance between the search engines is statistically significant for both simple and complex indications (p < 0.0001 for both indication types, Friedman rank test)
Error analysis
| Dataset and indications | Ground truth document | Ground truth rank | Main cause of error |
|---|---|---|---|
| 1. “64yo woman with history of obesity and alcohol use disorder presents with chronic onset of progressive big toe pain and swelling.” | Chronic extremity joint pain, suspected inflammatory arthritis | 10 | Semantic matching of “big toe pain” |
| 2. “60yo female with history of hypertension presents with right groin pain, fatigue, and weight loss for past 3 months. Concerning for sarcoma” | Soft-tissue masses | 54 | No clinical context in ground truth document |
| 3. “21yo G2P0A1 with history of recent termination procedure presenting with vaginal bleeding and vomiting for a week” | Gestational trophoblastic disease | 15 | Vague indication |
| 1. “Aortic dissection suspected. Cancer metastatic pt with pancreatic cancer stage IV currently on treatment and needs restaging scan” | Acute chest pain–suspected aortic dissection | Not top 10 | Overly specific indication with distracting medical history |