Ian Shemilt, Anna Noel-Storr, James Thomas, Robin Featherstone, Chris Mavergames.
Abstract
BACKGROUND: This study developed, calibrated and evaluated a machine learning (ML) classifier designed to reduce study identification workload in maintaining the Cochrane COVID-19 Study Register (CCSR), a continuously updated register of COVID-19 research studies.
Keywords: Automation; COVID-19; Cochrane Library; Crowdsourcing; Information retrieval; Machine learning; Methods/methodology; Searching; Study classifiers; Systematic reviews
Year: 2022 PMID: 35065679 PMCID: PMC8783177 DOI: 10.1186/s13643-021-01880-6
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Fig. 1 Distribution of classifier scores among ‘included’ and ‘excluded’ calibration records (N=16,123) and related performance metrics
| Classifier score | 90–99 | 80–89 | 70–79 | 60–69 | 50–59 | 40–49 | 30–39 | 20–29 | 10–19 | 0–9 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Included (n) | 2853 | 1059 | 610 | 402 | 284 | 202 | 195 | 180 | 129 | 91 | 6005 |
| Excluded (n) | 83 | 156 | 190 | 237 | 290 | 364 | 578 | 885 | 1736 | 5599 | 10,118 |
| Total (n) | 2936 | 1215 | 800 | 639 | 574 | 566 | 773 | 1065 | 1865 | 5690 | 16,123 |
| Precision (within band) | 0.97 | 0.87 | 0.76 | 0.63 | 0.49 | 0.36 | 0.25 | 0.17 | 0.07 | 0.02 | |
| Cumulative recall (score ≥ band minimum) | 0.48 | 0.65 | 0.75 | 0.82 | 0.87 | 0.90 | 0.93 | 0.96 | 0.98 | 1.00 | |
| Cumulative precision (score ≥ band minimum) | 0.97 | 0.94 | 0.91 | 0.88 | 0.84 | 0.80 | 0.75 | 0.68 | 0.57 | 0.37 | |

Performance at threshold score = 7 (recall >0.99):

| Metric | Value |
|---|---|
| Threshold score | 7 |
| True positives (included, score ≥7) | 5950 |
| False positives (excluded, score ≥7) | 5487 |
| Precision | 0.52 |
| False negatives (included, score <7) | 55 |
| True negatives (excluded, score <7) | 4631 |
| Records below threshold (n) | 4686 |
| Workload reduction | 29.1% |
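The band-level and threshold-level figures above follow directly from the included/excluded counts. A minimal sketch, using the Fig. 1 calibration counts (the threshold-level confusion counts are taken from the figure, since the cutoff of 7 falls inside the 0–9 band and cannot be derived from the banded counts alone):

```python
# Included/excluded counts per score band, highest band (90-99) first,
# as reported in the Fig. 1 calibration table.
included = [2853, 1059, 610, 402, 284, 202, 195, 180, 129, 91]
excluded = [83, 156, 190, 237, 290, 364, 578, 885, 1736, 5599]

total_inc, total_exc = sum(included), sum(excluded)  # 6005 and 10118

# Cumulative recall/precision when accepting every record scoring at or
# above each band's minimum (reading the table left to right).
cum_inc = cum_exc = 0
cum_recall, cum_precision = [], []
for inc, exc in zip(included, excluded):
    cum_inc += inc
    cum_exc += exc
    cum_recall.append(round(cum_inc / total_inc, 2))
    cum_precision.append(round(cum_inc / (cum_inc + cum_exc), 2))

# Confusion counts at the chosen threshold (score = 7), from Fig. 1.
tp, fp, fn, tn = 5950, 5487, 55, 4631
precision = tp / (tp + fp)                             # ~0.52
recall = tp / (tp + fn)                                # >0.99
# "Workload reduction" = share of records scoring below the threshold,
# i.e. records that no longer need manual screening.
workload_reduction = (fn + tn) / (tp + fp + fn + tn)   # ~29.1%
```

Running this reproduces the cumulative recall (0.48 … 1.00) and cumulative precision (0.97 … 0.37) rows of the table, and the 29.1% workload reduction.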
Fig. 2 Distribution of classifier scores among ‘included’ and ‘excluded’ evaluation records (N=4722) and related performance metrics
| Classifier score | 90–99 | 80–89 | 70–79 | 60–69 | 50–59 | 40–49 | 30–39 | 20–29 | 10–19 | 0–9 | Totals |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Included (n) | 1037 | 417 | 256 | 157 | 122 | 85 | 74 | 66 | 63 | 33 | 2310 |
| Excluded (n) | 23 | 39 | 62 | 62 | 69 | 87 | 149 | 188 | 395 | 1338 | 2412 |
| Total (n) | 1060 | 456 | 318 | 219 | 191 | 172 | 223 | 254 | 458 | 1371 | 4722 |
| Precision (within band) | 0.98 | 0.91 | 0.81 | 0.72 | 0.64 | 0.49 | 0.33 | 0.26 | 0.14 | 0.02 | |
| Cumulative recall (score ≥ band minimum) | 0.45 | 0.63 | 0.74 | 0.81 | 0.86 | 0.90 | 0.93 | 0.96 | 0.99 | 1.00 | |
| Cumulative precision (score ≥ band minimum) | 0.98 | 0.96 | 0.93 | 0.91 | 0.89 | 0.86 | 0.81 | 0.77 | 0.68 | 0.49 | |

Performance at threshold score = 7:

| Metric | Value |
|---|---|
| Threshold score | 7 |
| True positives (included, score ≥7) | 2285 |
| False positives (excluded, score ≥7) | 1299 |
| Precision | 0.64 |
| False negatives (included, score <7) | 25 |
| True negatives (excluded, score <7) | 1113 |
| Recall | 0.99 |
| Records below threshold (n) | 1138 |
| Workload reduction | 24.1% |
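Calibration amounts to choosing the highest score cutoff that still meets a target recall on a labelled sample: a higher cutoff auto-excludes more records (greater workload reduction), a lower one misses fewer eligible studies. A hypothetical sketch of that selection step, assuming integer classifier scores from 0 to 100 (the function name and sample data are illustrative, not from the paper):

```python
def pick_threshold(scored, target_recall=0.99):
    """Return the highest integer cutoff (0-100) whose recall on the
    labelled sample meets target_recall. `scored` holds (score,
    is_included) pairs. Scanning downward, the first qualifying cutoff
    is the highest one, i.e. the one auto-excluding the most records."""
    total_included = sum(1 for _, inc in scored if inc)
    for cutoff in range(100, -1, -1):
        tp = sum(1 for score, inc in scored if inc and score >= cutoff)
        if tp / total_included >= target_recall:
            return cutoff
    return 0

# Illustrative labelled sample: (classifier score, eligible?)
sample = [(95, True), (80, True), (60, True), (10, True), (5, True),
          (3, True), (90, False), (40, False), (2, False)]
```

With `target_recall=0.99` every eligible record must score at or above the cutoff, so the sketch returns 3 for this sample; relaxing the target to 0.80 allows one miss and returns 5.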
Key characteristics of development, calibration and evaluation data sets
| Data set (classifier development stage) | Size | Number of eligible records (%) | Number of title-only records (%) | Number of title-only records that were eligible (% of eligible records) | Provenance of records |
|---|---|---|---|---|---|
| Development | 59,513 | 20,878 (35.1%) | 18,669 (31.4%) | 4495 (21.5%) | Embase: 3229 (5.4%); preprint: 2083 (3.5%); PubMed: 54,201 (91.1%) |
| Calibration | 16,123 | 6005 (37.2%) | 3626 (22.5%) | 821 (13.7%) | Embase: 1994 (12.4%); preprint: 287 (1.8%); PubMed: 13,842 (85.8%) |
| Evaluation | 4722 | 2310 (48.9%) | 896 (19.0%) | 285 (12.3%) | Embase: 89 (1.9%); preprint: 202 (4.3%); PubMed: 4431 (93.8%) |