C Hamel, S E Kelly, K Thavorn, D B Rice, G A Wells, B Hutton.
Abstract
BACKGROUND: Systematic reviews often require substantial resources, partly because of the large number of records identified during searching. Although artificial intelligence may not be ready to fully replace human reviewers, it may accelerate screening and reduce the screening burden. Using DistillerSR (May 2020 release), we evaluated the performance of the prioritization simulation tool to determine the reduction in screening burden and the time savings.
Keywords: Artificial intelligence; Automation; Efficiency; Machine learning; Natural language processing; Prioritization; Rapid reviews; Systematic reviews; Time savings; True recall
Year: 2020 PMID: 33059590 PMCID: PMC7559198 DOI: 10.1186/s12874-020-01129-1
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Terminology and descriptions
| Terminology | Description |
|---|---|
| Estimated recall | The estimated percentage of the records that will be passed through to full-text screening that have already been identified at title/abstract level. As this is calculated based on a set of records that have not been completely screened, the estimated recall may differ from the true recall. |
| Final include | A primary study included in the completed systematic review. |
| Iteration | A set of records that is used to assign a score for the likelihood of inclusion and to prioritize the remaining unscreened records in order from highest relevance to lowest relevance. |
| Modified screening approach | An approach to modify how screening is being performed. For example, changing from: (i) dual-independent screening to liberal accelerated screening; (ii) dual-independent screening to single-reviewer screening; or (iii) assigning the remaining records to the AI reviewer to exclude, with a human reviewer(s) also screening these records as a second reviewer. |
| Prioritized screening | Through active machine learning, the presentation of records to reviewers is continually adjusted based on the AI’s estimated likelihood of relevance. The frequency of adjustment may differ by software application. |
| Screening burden | The total number of records at title/abstract to be screened. |
| Stop screening approach | An approach to screening whereby the remaining records are not screened once a certain threshold has been achieved (e.g., estimated recall @ 95%). These records are assumed to be excluded. |
| Record not yet identified [i.e., title/abstract false negative (FN)] | When an estimated recall (at any %) or true recall of less than 100% is used, these are the records that would have been included based on the title/abstract to be further reviewed at full-text screening, but were not yet identified. Had these records been screened at title/abstract and further screened based on the full text, they may have been excluded or included in the final review (i.e., a final include). |
| Title/abstract include [i.e., title/abstract true positive (TP)] | Records included based on the title/abstract to be further reviewed based on the full text. These records may then be excluded at full-text review or included in the final review. |
| Training set | One or more iterations which inform the machine learning to score and prioritize the remaining unscreened records. |
| Title/abstract exclude [i.e., true negative (TN)] | Records considered excluded based on title/abstract screening. |
| True recall | Known only once all references have been screened: the percentage of all actual title/abstract includes that were identified. True recall (%) is calculated as: [title/abstract TP / (title/abstract TP + title/abstract FN)] × 100 |
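The true recall definition and the "stop screening" threshold above can be illustrated with a minimal Python sketch. This is not DistillerSR's implementation: the function names `true_recall` and `records_to_reach_recall` are hypothetical, and the sketch assumes the complete set of title/abstract decisions is already known (1 = include, 0 = exclude) and listed in the order the records would be screened.

```python
from typing import Sequence


def true_recall(tp: int, fn: int) -> float:
    """True recall (%) = TP / (TP + FN) * 100 at title/abstract level."""
    total_includes = tp + fn
    if total_includes == 0:
        raise ValueError("no title/abstract includes; recall is undefined")
    return 100.0 * tp / total_includes


def records_to_reach_recall(labels: Sequence[int], target_pct: float = 95.0) -> int:
    """Number of records screened, in the given order, when true recall
    first reaches target_pct. labels: 1 = include, 0 = exclude."""
    total_includes = sum(labels)
    if total_includes == 0:
        raise ValueError("no title/abstract includes; recall is undefined")
    found = 0
    for n_screened, is_include in enumerate(labels, start=1):
        found += is_include
        # Compare without dividing to avoid rounding at the threshold.
        if found * 100 >= target_pct * total_includes:
            return n_screened
    return len(labels)


# Toy ordering (made up for illustration): 4 includes among 10 records.
toy_order = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
print(true_recall(tp=95, fn=5))                  # 95.0
print(records_to_reach_recall(toy_order, 75.0))  # 4
print(records_to_reach_recall(toy_order, 95.0))  # 7
```

With only four includes, a 95% target can only be met by finding all four, which is why the toy example must screen to the seventh record.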
Fig. 1 AI simulation flow
Study results
| Project | Project details (a) | Iteration details | # of records needed to screen to achieve true recall @ 95%: mean (%) [SD]; median (range) (b) | Title/abstract includes not yet identified: mean (%) [SD]; median (range) (b) | Hours saved at title/abstract (c): mean [SD]; median (range) (b) | Final included studies missed |
|---|---|---|---|---|---|---|
| Hot flashes | 2569; 451 (17.6%); 38 (1.48%) | 51 records; 17 or 18 iterations | 892.5 (34.7%) [26.9]; 892.5 (867–918) | 19.2 (4.3%) [2.04]; 19 (15–22) | 27.9 [0.45]; 27.9 (27.5–28.4) | 0 |
| Opioid use disorder | 16,282; 984 (6.0%); 71 (0.44%) | 200 records; 23 or 23 iterations | 4480 (27.5%) [103.3]; 4400 (4400–4600) | 46.1 (4.7%) [3.38]; 48 (41–49) | 196.7 [1.72]; 198.0 (194.7–198.0) | 0 |
| Meniere’s disease | 2889; 332 (11.5%); 23 (0.80%) | 57 records; 19–22 iterations | 1168.5 (40.5%) [55.4]; 1140 (1083–1254) | 15.0 (4.5%) [1.33]; 15.5 (12–16) | 28.7 [0.92]; 29.2 (27.3–30.1) | 0 |
| Non-small cell lung cancer | 3145; 795 (25.3%); 13 (0.40%) | 62 records; 29 or 30 iterations | 1829 (58.2%) [0.01]; 1829 (1798–1860) | 33.7 (4.2%) [3.53]; 32.5 (29–39) | 21.9 [0.54]; 21.9 (21.4–22.5) | 0 |
| Prophylaxis for influenza | 8278; 395 (4.8%); 104 (1.26%) | 165 records; 18 or 19 iterations | 3019.5 (36.5%) [79.7]; 2970 (2970–3135) | 18.8 (4.8%) [0.42]; 19 (18–19) | 87.6 [1.33]; 88.5 (85.7–88.5) | 0 |
| Smoking cessation | 2250; 881 (39.2%); 14 (0.62%) | 45 records; 35 iterations | 1575 (70.0%) [0]; 1575 (0) | 39.9 (4.5%) [2.60]; 40 (34–44) | 11.3 [0]; 11.3 (0) | 0 |
| Asthma/ Urticaria | 3265; 482 (14.8%); 12 (0.36%) | 65 records; 22 or 23 iterations | 1488.5 (45.6%) [20.55]; 1495 (1430–1495) | 22.5 (4.7%) [1.51]; 23 (20–24) | 29.6 [0.34]; 29.5 (29.5–30.6) | 0 |
| Depression screening | 4174; 126 (3.0%); 1 (0.02%) | 83 records; 23–26 iterations | 2025 (48.5%) [70]; 1992 (1909–2158) | 5.8 (4.6%) [0.42]; 6 (5–6) | 35.8 [1.17]; 36.4 (33.6–37.8) | 0 |
| Prophylaxis for HIV | 4502; 1184 (26.4%); 46 (1.02%) | 90 records; 30 iterations | 2700 (60.0%) [0]; 2700 (0) | 53.7 (4.5%) [1.49]; 53.5 (52–56) | 30.0 [0]; 30.0 (0) | 0 |
| SSBs | 22,309; 4993 (22.4%); 127 (0.57%) | 200 records; 64 iterations | 12,800 (57.4%) [0]; 12,800 (0) | 242.7 (4.9%) [2.06]; 243 (238–246) | 158.5 [0]; 158.5 (0) | 0 |
HIV Human immunodeficiency virus, SD Standard deviation, SSB Sugar-sweetened beverage
a Total number of records; Number of includes at title/abstract (% of all records); Number of included studies in the SR (% of all records)
b Where there was no SD or range, this is identified by a 0
c Hours saved at title/abstract = (Total records - # of records needed to screen to identify 95% of includes) / 60
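Footnote c can be checked against the table with a short Python sketch. The function names `hours_saved` and `pct_of_records_screened` are hypothetical, and the `records_per_hour=60` default is an assumption read off the division by 60 in the footnote (equivalent to one record per minute); the input figures are the mean values from the Hot flashes and Opioid use disorder rows.

```python
def hours_saved(total_records: float, records_screened: float,
                records_per_hour: float = 60) -> float:
    """Hours saved = (total records - records screened) / records per hour.

    records_per_hour=60 is an assumption taken from the "/60" in footnote c.
    """
    return (total_records - records_screened) / records_per_hour


def pct_of_records_screened(total_records: float, records_screened: float) -> float:
    """Share of all records (%) that had to be screened to reach the target."""
    return 100.0 * records_screened / total_records


# Hot flashes: 2569 records, mean of 892.5 screened to reach true recall @ 95%
print(round(hours_saved(2569, 892.5), 1))              # 27.9
print(round(pct_of_records_screened(2569, 892.5), 1))  # 34.7

# Opioid use disorder: 16,282 records, mean of 4480 screened
print(round(hours_saved(16282, 4480), 1))              # 196.7
```

Both results match the "Hours saved at title/abstract" means reported in the table, which suggests the footnote formula is applied to the mean number of records screened.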
Fig. 2 a Title/abstract includes and excludes and screening burden reduction. b Relationship of mean % reduction in screening burden and % of title/abstract includes
Fig. 3 Title/abstract includes not yet identified (i.e., title/abstract false negatives)
Fig. 4 Mean hours saved in title/abstract screening using a true recall @ 95% modified approach
Fig. 5 Estimated total time saved
Fig. 6 Screening burden to achieve true recall @ 95% and @ 100%