Gerald Gartlehner, Gernot Wagner, Linda Lux, Lisa Affengruber, Andreea Dobrescu, Angela Kaminski-Hartenthaler, Meera Viswanathan.
Abstract
BACKGROUND: Web applications that employ natural language processing technologies to support systematic reviewers during abstract screening have become more common. The goal of our project was to conduct a case study to explore a screening approach that temporarily replaces a human screener with a semi-automated screening tool.
Keywords: Accuracy; Machine-learning; Methods study; Rapid reviews; Systematic reviews
Year: 2019 PMID: 31727159 PMCID: PMC6857277 DOI: 10.1186/s13643-019-1221-3
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Definitions of commonly used terms
Accuracy: the proportion of correctly classified records.
False negatives (FNs): the number of records incorrectly classified as excludes. Also referred to as “missed studies.”
False positives (FPs): the number of records incorrectly classified as includes.
Prediction: a forecast of whether a record is relevant (include) or irrelevant (exclude) for a given systematic review.
Semi-automated screening tool: any web-based application that employs a combination of text mining and text classification to assist systematic reviewers during the title and abstract screening process.
Sensitivity: the ability of a screening tool to correctly classify relevant records as includes.
Specificity: the ability of a screening tool to correctly classify irrelevant records as excludes.
Text classification: a standard machine-learning process in which the aim is to categorize texts into groups of interest.
Text mining: the process of discovering knowledge and structure from unstructured data.
True negatives (TNs): the number of records correctly identified as excludes.
True positives (TPs): the number of records correctly identified as includes.
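The counts defined above combine into the reported metrics in the usual way: accuracy over all records, sensitivity over relevant records, specificity over irrelevant records. A minimal sketch, with hypothetical counts that are illustrative only and not taken from the study:

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, and specificity from screening counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,    # proportion of correctly classified records
        "sensitivity": tp / (tp + fn),    # relevant records correctly kept as includes
        "specificity": tn / (tn + fp),    # irrelevant records correctly excluded
    }

# Hypothetical example: 100 relevant records (90 caught, 10 missed)
# and 900 irrelevant records (50 incorrectly included).
m = screening_metrics(tp=90, fp=50, tn=850, fn=10)
print({k: round(v, 2) for k, v in m.items()})
# → {'accuracy': 0.94, 'sensitivity': 0.9, 'specificity': 0.94}
```

With these conventions, a tool that excludes almost everything (like DistillerAI alone in the table below) can still show high specificity while its sensitivity collapses, which is why both measures are reported side by side.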
Fig. 1: Graphical presentation of the study flow
Different performance measures for the machine-assisted screening approach, single-reviewer screening, and screening with DistillerAI alone
| Screening approach | Sensitivity (95% CI) | Specificity (95% CI) | Area under the curve (95% CI) | FNs (missed studies), n/N (%) | FPs, n/N (%) | | |
|---|---|---|---|---|---|---|---|
| Team 1 | |||||||
| Machine-assisted screening | 0.78 (0.59 to 0.90) | 0.96 (0.96 to 0.97) | 0.87 (0.80 to 0.95) | 7/32 (22%) | 97/2172 (4%) | 126/2172 (6%) | 10/300 |
| Single-reviewer screening | 0.78 (0.59 to 0.90) | 0.96 (0.95 to 0.97) | 0.87 (0.80 to 0.94) | 7/32 (22%) | 110/2172 (5%) | ||
| DistillerAI screening | 0.03 (0.00 to 0.21) | 0.99 (0.98 to 0.99) | 0.51 (0.48 to 0.54) | 31/32 (97%) | 27/2172 (1%) | ||
| Team 2 | |||||||
| Machine-assisted screening | 0.89 (0.70 to 0.97) | 0.92 (0.91 to 0.93) | 0.90 (0.84 to 0.96) | 3/27 (11%) | 232/2172 (11%) | 226/2172 (10%) | 15/300 |
| Single-reviewer screening | 0.89 (0.69 to 0.97) | 0.91 (0.89 to 0.92) | 0.90 (0.84 to 0.96) | 3/27 (11%) | 221/2172 (10%) | ||
| DistillerAI screening | 0.00 | 0.99 (0.99 to 0.99) | 0.50 (0.49 to 0.50) | 27/27 (100%) | 18/2172 (1%) | ||
| Team 3 | |||||||
| Machine-assisted screening | 0.65 (0.44 to 0.82) | 0.96 (0.95 to 0.97) | 0.81 (0.71 to 0.90) | 9/26 (35%) | 130/2172 (6%) | 100/2172 (5%) | 16/300 |
| Single-reviewer screening | 0.65 (0.44 to 0.82) | 0.96 (0.95 to 0.97) | 0.81 (0.71 to 0.90) | 9/26 (35%) | 104/2172 (5%) | ||
| DistillerAI screening | 0.23 (0.10 to 0.44) | 0.99 (0.98 to 0.99) | 0.61 (0.53 to 0.69) | 20/26 (77%) | 30/2172 (1%) | ||
| Team 4 | |||||||
| Machine-assisted screening | 0.86 (0.66 to 0.95) | 0.94 (0.93 to 0.95) | 0.90 (0.83 to 0.96) | 4/28 (14%) | 199/2172 (9%) | 194/2172 (9%) | 14/300 |
| Single-reviewer screening | 0.82 (0.62 to 0.93) | 0.93 (0.92 to 0.94) | 0.88 (0.80 to 0.95) | 5/28 (18%) | 165/2172 (8%) | ||
| DistillerAI screening | 0.32 (0.17 to 0.52) | 0.97 (0.96 to 0.98) | 0.65 (0.56 to 0.73) | 19/28 (68%) | 69/2172 (3%) | ||
| Team 5 | |||||||
| Machine-assisted screening | 0.74 (0.55 to 0.87) | 0.95 (0.94 to 0.96) | 0.84 (0.77 to 0.92) | 8/31 (26%) | 187/2172 (9%) | 181/2172 (8%) | 11/300 |
| Single-reviewer screening | 0.74 (0.55 to 0.87) | 0.95 (0.94 to 0.95) | 0.84 (0.77 to 0.92) | 8/31 (26%) | 138/2172 (6%) | ||
| DistillerAI screening | 0.13 (0.05 to 0.31) | 0.97 (0.96 to 0.98) | 0.55 (0.49 to 0.61) | 27/31 (87%) | 65/2172 (3%) | ||
| Combined | |||||||
| Machine-assisted screening | 0.78 (0.66 to 0.90) | 0.95 (0.92 to 0.97) | 0.87 (0.83 to 0.90) | 6/30 (22%) | 8% | 165/2172 (8%) | 13/300 |
| Single-reviewer screening | 0.78 (0.66 to 0.89) | 0.94 (0.91 to 0.97) | 0.86 (0.82 to 0.89) | 6/30 (22%) | 7% | ||
| DistillerAI screening | 0.14 (0.00 to 0.31) | 0.98 (0.97 to 1.00) | 0.56 (0.53 to 0.59) | 25/30 (86%) | 2% | ||
CI = confidence interval; n/N = number of records/total number of records
Fig. 2: Sensitivities and specificities of machine-assisted screening, single-reviewer screening, and screening with DistillerAI alone
Fig. 3: Receiver operating characteristics curve for DistillerAI