Brian E Howard1, Jason Phillips2, Arpit Tandon2, Adyasha Maharana2, Rebecca Elmore2, Deepak Mav2, Alex Sedykh2, Kristina Thayer3, B Alex Merrick4, Vickie Walker4, Andrew Rooney4, Ruchir R Shah2.
Abstract
BACKGROUND: In the screening phase of systematic review, researchers use detailed inclusion/exclusion criteria to decide whether each article in a set of candidate articles is relevant to the research question under consideration. A typical review may require screening thousands or tens of thousands of articles and can consume hundreds of person-hours of labor.
Keywords: Active learning; Document screening; Evidence mapping; Machine learning; Recall estimation; Systematic review
Year: 2020 PMID: 32203803 PMCID: PMC8082972 DOI: 10.1016/j.envint.2020.105623
Source DB: PubMed Journal: Environ Int ISSN: 0160-4120 Impact factor: 9.621
Summary of datasets used to assess performance of active learning and recall estimation methods.
| Dataset | Source | Label Type | Records from Search | Included | Excluded |
|---|---|---|---|---|---|
| PFOA/PFOS and immunotoxicity | NIEHS | Full text | 6331 | 95 (1.5%) | 6236 (98.5%) |
| Bisphenol A (BPA) and obesity | NIEHS | Full text | 7700 | 111 (1.4%) | 7589 (98.6%) |
| Transgenerational inheritance of health effects | NIEHS | Tiab | 48,638 | 765 (1.6%) | 47,873 (98.4%) |
| Fluoride and neurotoxicity in animal models | NIEHS | Full text | 4479 | 51 (1.1%) | 4428 (98.9%) |
| Neuropathic pain | CAMARADES | Tiab | 29,207 | 5011 (17.2%) | 24,196 (82.8%) |
| Skeletal muscle relaxants | | Full text | 1643 | 9 (0.6%) | 1634 (99.4%) |
| Opioids | | Full text | 1915 | 15 (0.8%) | 1900 (99.2%) |
| Antihistamines | | Full text | 310 | 16 (5.2%) | 294 (94.8%) |
| ADHD | | Full text | 851 | 20 (2.4%) | 831 (97.6%) |
| Triptans | | Full text | 671 | 24 (3.6%) | 647 (96.4%) |
| Urinary Incontinence | | Full text | 327 | 40 (12.2%) | 287 (87.8%) |
| Ace Inhibitors | | Full text | 2544 | 41 (1.6%) | 2503 (98.4%) |
| Nonsteroidal anti-inflammatory | | Full text | 393 | 41 (10.4%) | 352 (89.6%) |
| Beta blockers | | Full text | 2072 | 42 (2.0%) | 2030 (98.0%) |
| Proton pump inhibitors | | Full text | 1333 | 51 (3.8%) | 1282 (96.2%) |
| Estrogens | | Full text | 368 | 80 (21.7%) | 288 (78.3%) |
| Statins | | Full text | 3465 | 85 (2.5%) | 3380 (97.5%) |
| Calcium-channel blockers | | Full text | 1218 | 100 (8.2%) | 1118 (91.8%) |
| Oral hypoglycemics | | Full text | 503 | 136 (27.0%) | 367 (73.0%) |
| Atypical antipsychotics | | Full text | 1120 | 146 (13.0%) | 974 (87.0%) |
| Mammalian | EBTC | Tiab | 1442 | 263 (18.2%) | 1179 (81.8%) |
| USDA 1 | USDA | Tiab | 1776 | 225 (12.7%) | 1551 (87.3%) |
| USDA 2 | USDA | Tiab | 9103 | 382 (4.2%) | 8721 (95.8%) |
| USDA 3 | USDA | Tiab | 608 | 9 (1.5%) | 599 (98.5%) |
| USDA 4 | USDA | Tiab | 104 | 12 (11.5%) | 92 (88.5%) |
| USDA 5 | USDA | Tiab | 1570 | 25 (1.6%) | 1545 (98.4%) |
Results of simulated screening experiments on 26 datasets using active learning and recall estimation with δ = 2. Mean and standard deviation over 30 trials with initially randomized order.
| Dataset | Records from Search | % Screened | Cost | Theoretical WSS@95 | Obtained WSS@95 | Estimated Recall | Obtained True Recall |
|---|---|---|---|---|---|---|---|
| Transgenerational inheritance of health effects | 48,638 | 0.371 | 0.128 | 0.742 (0.003) | 0.613 (0.001) | 0.950 | 0.986 (0.001) |
| Neuropathic pain | 29,207 | 0.402 | 0.040 | 0.613 (0.001) | 0.573 (0.022) | 0.950 | 0.976 (0.014) |
| USDA 2 | 9,103 | 0.332 | 0.099 | 0.755 (0.004) | 0.655 (0.010) | 0.950 | 0.987 (0.001) |
| Bisphenol A (BPA) and obesity | 7,700 | 0.354 | 0.161 | 0.807 (0.010) | 0.646 (0.013) | 0.950 | 1.000 (0.000) |
| PFOA/PFOS and immunotoxicity | 6,331 | 0.448 | 0.280 | 0.833 (0.009) | 0.552 (0.010) | 0.950 | 1.000 (0.000) |
| Fluoride and neurotoxicity in animal models | 4,479 | 0.443 | 0.324 | 0.862 (0.018) | 0.538 (0.009) | 0.950 | 0.981 (0.004) |
| Statins | 3,465 | 0.576 | 0.024 | 0.399 (0.035) | 0.375 (0.015) | 0.950 | 0.951 (0.002) |
| Ace inhibitors | 2,544 | 0.550 | 0.333 | 0.758 (0.023) | 0.425 (0.013) | 0.950 | 0.976 (0.000) |
| Beta blockers | 2,072 | 0.629 | 0.262 | 0.586 (0.016) | 0.324 (0.014) | 0.950 | 0.953 (0.004) |
| Opioids | 1,915 | 0.856 | 0.114 | 0.257 (0.028) | 0.144 (0.038) | 0.950 | 1.000 (0.000) |
| USDA 1 | 1,776 | 0.445 | 0.116 | 0.659 (0.007) | 0.543 (0.009) | 0.950 | 0.988 (0.004) |
| Skeletal muscle relaxants | 1,643 | 0.902 | 0.191 | 0.289 (0.065) | 0.098 (0.013) | 0.950 | 1.000 (0.000) |
| USDA 5 | 1,570 | 0.704 | 0.328 | 0.624 (0.040) | 0.296 (0.018) | 0.950 | 1.000 (0.000) |
| Mammalian | 1,442 | 0.580 | 0.121 | 0.529 (0.015) | 0.408 (0.014) | 0.950 | 0.988 (0.004) |
| Proton pump inhibitors | 1,333 | 0.743 | 0.139 | 0.397 (0.018) | 0.257 (0.009) | 0.950 | 1.000 (0.000) |
| Calcium Channel Blockers | 1,218 | 0.620 | 0.194 | 0.563 (0.021) | 0.369 (0.024) | 0.950 | 0.989 (0.005) |
| Atypical Antipsychotics | 1,120 | 0.680 | −0.070 | 0.165 (0.020) | 0.235 (0.014) | 0.950 | 0.915 (0.016) |
| ADHD | 851 | 0.694 | 0.474 | 0.734 (0.046) | 0.259 (0.017) | 0.950 | 0.953 (0.013) |
| Triptans | 671 | 0.782 | 0.240 | 0.458 (0.030) | 0.218 (0.012) | 0.950 | 1.000 (0.000) |
| USDA 3 | 608 | 0.863 | 0.094 | 0.231 (0.068) | 0.137 (0.018) | 0.950 | 1.000 (0.000) |
| Oral Hypoglycemics | 503 | 0.835 | −0.009 | 0.092 (0.018) | 0.101 (0.019) | 0.951 | 0.936 (0.039) |
| Nonsteroidal anti-inflammatory | 393 | 0.632 | 0.254 | 0.621 (0.019) | 0.368 (0.013) | 0.950 | 1.000 (0.000) |
| Estrogens | 368 | 0.664 | 0.129 | 0.454 (0.021) | 0.325 (0.013) | 0.951 | 0.989 (0.006) |
| Urinary Incontinence | 327 | 0.798 | 0.199 | 0.401 (0.018) | 0.202 (0.020) | 0.951 | 1.000 (0.000) |
| Antihistamines | 310 | 0.829 | −0.042 | 0.072 (0.034) | 0.115 (0.025) | 0.951 | 0.944 (0.019) |
| USDA 4 | 104 | 0.846 | 0.325 | 0.479 (0.070) | 0.154 (0.019) | 0.952 | 1.000 (0.000) |
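The columns of the results table appear to be linked by simple arithmetic. The sketch below assumes the commonly used definition of work saved over sampling, WSS = (1 − fraction screened) − (1 − recall), and assumes that Cost is the theoretical WSS@95 minus the obtained WSS@95. Both relations are inferred from the tabulated values rather than stated in this excerpt, and the function names `wss` and `cost` are hypothetical.

```python
def wss(fraction_screened: float, recall: float) -> float:
    """Work saved over sampling: the unscreened fraction minus the recall shortfall."""
    return (1.0 - fraction_screened) - (1.0 - recall)


def cost(theoretical_wss: float, obtained_wss: float) -> float:
    """Assumed definition: WSS lost relative to the theoretical optimum."""
    return theoretical_wss - obtained_wss


# Rows copied from the table above:
# (fraction screened, obtained true recall, theoretical WSS@95, obtained WSS@95, reported cost)
rows = {
    "Transgenerational inheritance": (0.371, 0.986, 0.742, 0.613, 0.128),
    "Neuropathic pain": (0.402, 0.976, 0.613, 0.573, 0.040),
    "Atypical Antipsychotics": (0.680, 0.915, 0.165, 0.235, -0.070),
}

for name, (screened, recall, theoretical, obtained, reported_cost) in rows.items():
    recomputed_wss = wss(screened, recall)
    recomputed_cost = cost(theoretical, obtained)
    print(
        f"{name}: WSS {recomputed_wss:.3f} (table {obtained:.3f}), "
        f"cost {recomputed_cost:.3f} (table {reported_cost:.3f})"
    )
```

Under these assumptions the recomputed values agree with the tabulated ones to within rounding. A negative cost, as for Atypical Antipsychotics, corresponds to a run that stopped with obtained true recall below 95%.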
Fig. A2. Performance (WSS) vs dataset size. The log-linear trendline (R² = 0.61) indicates that work saved over random sampling is an increasing function of the number of references in the project. However, the relationship is too weak on its own for accurate prediction of recall.
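A log-linear trend of this kind can be sketched with an ordinary least-squares fit of WSS against log10 of the number of records. The example below uses the obtained WSS@95 values from the results table. Which WSS variant and which fitting procedure the figure actually used is not stated in this excerpt, so the resulting coefficients and R² are illustrative and need not reproduce the reported 0.61. The function name `fit_log_linear` is hypothetical.

```python
import math

# (records from search, obtained WSS@95), copied from the results table
data = [
    (48638, 0.613), (29207, 0.573), (9103, 0.655), (7700, 0.646),
    (6331, 0.552), (4479, 0.538), (3465, 0.375), (2544, 0.425),
    (2072, 0.324), (1915, 0.144), (1776, 0.543), (1643, 0.098),
    (1570, 0.296), (1442, 0.408), (1333, 0.257), (1218, 0.369),
    (1120, 0.235), (851, 0.259), (671, 0.218), (608, 0.137),
    (503, 0.101), (393, 0.368), (368, 0.325), (327, 0.202),
    (310, 0.115), (104, 0.154),
]


def fit_log_linear(points):
    """Least-squares fit of y = slope * log10(n) + intercept; returns (slope, intercept, r2)."""
    xs = [math.log10(n) for n, _ in points]
    ys = [y for _, y in points]
    count = len(points)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r2


slope, intercept, r2 = fit_log_linear(data)
print(f"WSS ~ {slope:.3f} * log10(records) + {intercept:.3f}, R^2 = {r2:.2f}")
```

A positive slope means that larger projects tend to save more work, while an R² well below 1 reflects the wide scatter among datasets of similar size.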
Fig. A3. Violin plot of obtained true recall. The figure shows the obtained recall for simulated screening of the 26 datasets, given an estimated recall of 95%. The median obtained recall is 99%, indicating that the recall estimate tends to be conservative. In fact, the majority of the obtained true recall values (23/26) are above 95%. An outlier occurs at an obtained recall of 91.5%.
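These summary statistics can be checked against the mean obtained true recall values listed in the results table. The short sketch below assumes that the caption summarizes those 26 per-dataset means; the variable names are illustrative.

```python
from statistics import median

# Mean obtained true recall for each of the 26 datasets, in table order
true_recall = [
    0.986, 0.976, 0.987, 1.000, 1.000, 0.981, 0.951, 0.976, 0.953,
    1.000, 0.988, 1.000, 1.000, 0.988, 1.000, 0.989, 0.915, 0.953,
    1.000, 1.000, 0.936, 1.000, 0.989, 1.000, 0.944, 1.000,
]

target = 0.95
median_recall = median(true_recall)
n_above = sum(r > target for r in true_recall)
lowest = min(true_recall)

print(f"median obtained recall: {median_recall:.1%}")
print(f"datasets above {target:.0%}: {n_above}/{len(true_recall)}")
print(f"lowest obtained recall: {lowest:.1%}")
```

The three datasets that fall below the 95% target are Atypical Antipsychotics (0.915), Oral Hypoglycemics (0.936), and Antihistamines (0.944).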
Fig. A4. Actual and simulated score densities. Red line is for excluded documents; green is for included documents. All simulated data sets used overall inclusion rate of 0.05 and 10,000 total documents. Datasets shown are as follows: (a) BPA (actual); (b) BPA (simulated); (c) PFOS/PFOA (actual); (d) PFOS/PFOA (simulated); (e) Transgenerational health (actual); (f) Transgenerational health (simulated); (g) Neuropathic pain (actual); (h) Neuropathic pain (simulated). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. A5. Estimated recall using simulated score densities. True recall versus estimated recall. Datasets shown are (a) BPA; (b) PFOS/PFOA; (c) Transgenerational; (d) Neuropathic pain.
Performance of recall estimation method on simulated score distributions.
| Dataset | Lookback (δ) | Theoretical WSS@95 | Obtained WSS@95 | “Cost” | Actual Recall |
|---|---|---|---|---|---|
| | 2 | 0.637 | 0.520 | 0.117 | 0.979 |
| | 2 | 0.693 | 0.594 | 0.099 | 0.982 |
| | 5 | 0.349 | 0.316 | 0.033 | 0.959 |
| | 5 | 0.259 | 0.261 | −0.002 | 0.948 |
Effect of list length on performance of recall estimation method. The dataset shown is BPA; the overall inclusion rate, p = 0.015; and the lookback, δ = 2. Cost is averaged over 5 trials.
| List length | Cost |
|---|---|
| | 0.453 |
| | 0.167 |
| | 0.110 |
| | 0.048 |
| | 0.037 |
Effect of inclusion rate on performance of recall estimation method. The dataset shown is BPA; the list length = 10,000; and the lookback, δ = 2. Cost is averaged over 5 trials.
| p | Cost |
|---|---|
| | 0.200 |
| | 0.110 |
| | 0.104 |
| | 0.079 |
| | 0.075 |
| | 0.052 |
Effect of lookback, δ, on performance of recall estimation method. The dataset shown is neuropathic pain; the list length = 30,000; and p = 0.17. Cost is averaged over 5 trials.
| δ | Cost |
|---|---|
| | −0.063 |
| | −0.005 |
| | 0.008 |
| | 0.017 |
| | 0.062 |
| | 0.072 |
Fig. A6. Effect of lookback, δ, on recall estimate variability. In panel (a) δ = 1 and in panel (b) δ = 100. The results illustrate that increasing lookback, δ, decreases variability in the recall estimate.
Fig. A1. SWIFT-Active Screener user interface. The review summary screen (A) shows the progress so far on the review and includes the overall estimated recall along with number of included and excluded documents for each screener. The Screen References window (B) displays the current title and abstract to the screener for review.