John Zimmerman¹, Robin E Soler², James Lavinder³, Sarah Murphy³, Charisma Atkins², LaShonda Hulbert², Richard Lusk³, Boon Peng Ng²,⁴.
Abstract
BACKGROUND: Systematic reviews (SRs), or studies of studies, use a formal process to evaluate the quality of the scientific literature and to determine the effectiveness reported in qualifying articles, establishing consensus findings around a hypothesis. Their value is increasing as the conduct and publication of research and evaluation have expanded and identifying key insights has become more time-consuming. Text analytics and machine learning (ML) techniques may help overcome this problem of scale while still maintaining the level of rigor expected of SRs.
Keywords: Applied case study; Machine learning; Machine learning configurations; Natural language processing; Systematic review screening; Transfer learning
Year: 2021 PMID: 33810798 PMCID: PMC8017891 DOI: 10.1186/s13643-021-01640-6
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Fig. 1 Flow chart of the proposed approach to operationalizing machine learning in a down-selection process
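The flow chart itself is not reproduced here. As a reading aid, the following is a minimal sketch of one plausible iterative down-selection loop, inferred from the iteration results table rather than taken from the authors' code: start from a human-labeled seed sample, score the unreviewed articles, send those at or above a chosen threshold to human review, add the new labels, and repeat; a random sample of the never-reviewed articles then serves as a quality check. The `score` and `oracle` callables are hypothetical stand-ins for a retrained classifier and a human reviewer, and the rule "stop once an iteration surfaces no new relevant articles" is an assumption.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces: `score` stands in for a classifier retrained on the
# current labels; `oracle` stands in for the human reviewer's decision.
Scorer = Callable[[Dict[str, bool], str], float]
Oracle = Callable[[str], bool]


def down_select(
    articles: List[str],
    seed_labels: Dict[str, bool],
    score: Scorer,
    thresholds: List[float],
    oracle: Oracle,
) -> Tuple[Dict[str, bool], Set[str]]:
    """Iteratively send above-threshold articles to human review."""
    labels = dict(seed_labels)  # article id -> True if judged relevant
    for threshold in thresholds:
        unreviewed = [a for a in articles if a not in labels]
        selected = [a for a in unreviewed if score(labels, a) >= threshold]
        if not selected:
            break  # nothing clears the threshold, so stop early
        new_relevant = 0
        for a in selected:
            labels[a] = oracle(a)
            new_relevant += int(labels[a])
        if new_relevant == 0:
            break  # assumed stopping rule: no new relevant articles found
    remaining = {a for a in articles if a not in labels}
    return labels, remaining


def random_check(remaining: Set[str], k: int, oracle: Oracle, seed: int = 0) -> int:
    """Review a random sample of unreviewed articles; return how many are relevant."""
    rng = random.Random(seed)
    sample = rng.sample(sorted(remaining), min(k, len(remaining)))
    return sum(oracle(a) for a in sample)


# Toy usage with fixed, made-up scores (this stand-in scorer ignores the labels).
ids = [f"a{i}" for i in range(10)]
fixed = {"a1": 0.9, "a2": 0.6, "a3": 0.35, "a4": 0.32}
toy_score: Scorer = lambda labels, a: fixed.get(a, 0.1)
toy_oracle: Oracle = lambda a: int(a[1:]) < 4  # a0..a3 are relevant
labels, remaining = down_select(
    ids, {"a0": True, "a9": False}, toy_score, [0.5, 0.3, 0.3], toy_oracle
)
print(sorted(a for a, rel in labels.items() if rel))  # ['a0', 'a1', 'a2', 'a3']
print(random_check(remaining, k=2, oracle=toy_oracle))  # 0
```

In a real pipeline `score` would retrain a model on `labels` at each call; keeping it as an injected callable leaves the bookkeeping testable without any ML library.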
Average post hoc model performance
| Data set | Prediction threshold, 1st iteration | Prediction threshold, 2nd iteration | Prediction threshold, 3rd iteration | Total articles | % of total articles human-reviewed to return 95% of relevant articles | % of total articles human-reviewed to return 98% of relevant articles |
|---|---|---|---|---|---|---|
| 1st SR | 50.0% | 20.0% | 20% | 14,655 | 19.3% | 24% |
| 2nd SR | 50.1% | 30.3% | 44% | 15,234 | 18.9% | 25% |
| 3rd SR | 75.0% | 20.0% | 20% | 7,670 | 10.0% | 34% |
| 4th SR | 70.0% | 27.5% | 19.5% | 1,820 | 30.0% | 41.8% |
| Weighted average | 57.6% | 26.0% | 29.5% | N/A | 20.9% | 29.8% |
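The last two columns express review effort at a fixed recall. As a hedged illustration, assuming articles are human-reviewed in descending order of predicted relevance (the exact post hoc procedure is not described in this excerpt), such a figure could be computed as below; the function name and the toy data are hypothetical.

```python
import math
from typing import Sequence


def review_fraction_for_recall(
    scores: Sequence[float], relevant: Sequence[bool], target_recall: float
) -> float:
    """Fraction of all articles read, highest score first, until
    `target_recall` of the relevant articles have been found."""
    if len(scores) != len(relevant):
        raise ValueError("scores and relevant must have the same length")
    total_relevant = sum(relevant)
    if total_relevant == 0:
        raise ValueError("at least one relevant article is required")
    # Small epsilon guards against floating-point error in the product.
    needed = math.ceil(target_recall * total_relevant - 1e-9)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    found = 0
    for reviewed, i in enumerate(order, start=1):
        found += int(relevant[i])
        if found >= needed:
            return reviewed / len(scores)
    return 1.0  # only reached if target_recall > 1


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
relevant = [True, False, True, False, False, True, False, False, False, False]
print(review_fraction_for_recall(scores, relevant, 0.95))  # 0.6
```

With three relevant articles, 95% recall requires all three, and the third appears at rank 6 of 10, so 60% of the pool must be read.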
Results for each iteration and the random sample
| Step | Relevant training sample (iteration) | Non-relevant training sample (iteration) | Threshold selected (iteration) | Articles for review (iteration) | Relevant articles (iteration) | Non-relevant articles (iteration) | Total articles reviewed (cumulative) | Total articles not reviewed (cumulative) |
|---|---|---|---|---|---|---|---|---|
| First iteration | 15 | 40 | 0.4 | 458 | 155 | 303 | 513 | 2624 |
| Second iteration | 170 | 343 | 0.3 | 260 | 43 | 217 | 773 | 2364 |
| Third iteration | 213 | 560 | 0.3 | 45 | 0 | 45 | 818 | 2319 |
| Fourth iteration | 213 | 605 | Not selected | N/A | N/A | N/A | 818 | 2319 |
| Random sample | N/A | N/A | N/A | 156 | 1 | 155 | 974 | 2163 |
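The cumulative columns follow from the per-iteration counts: each iteration's training sample is the seed sample plus every article labeled in earlier iterations, and the reviewed and not-reviewed totals shift by the number of articles sent for review. The sketch below reproduces that arithmetic. The pool size of 3,137 is inferred from the final results (974 reviewed + 2,163 not reviewed), and the `Step` type and `cumulative_rows` function are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    name: str
    reviewed: int  # articles sent to human review in this step
    relevant: int  # how many of those were judged relevant


def cumulative_rows(
    seed_relevant: int, seed_nonrelevant: int, total: int, steps: List[Step]
) -> List[Tuple[str, int, int, int, int]]:
    """Return (name, relevant training, non-relevant training,
    cumulative reviewed, cumulative not reviewed) for each step."""
    rel_train, nonrel_train = seed_relevant, seed_nonrelevant
    reviewed_total = seed_relevant + seed_nonrelevant  # seed sample is human-labeled
    rows = []
    for s in steps:
        reviewed_total += s.reviewed
        rows.append((s.name, rel_train, nonrel_train, reviewed_total, total - reviewed_total))
        # Newly labeled articles join the training sample for the next step.
        rel_train += s.relevant
        nonrel_train += s.reviewed - s.relevant
    return rows


STEPS = [
    Step("First iteration", 458, 155),
    Step("Second iteration", 260, 43),
    Step("Third iteration", 45, 0),
    Step("Fourth iteration", 0, 0),  # no threshold selected, nothing reviewed
    Step("Random sample", 156, 1),   # table lists N/A for its training columns
]

for row in cumulative_rows(15, 40, 974 + 2163, STEPS):
    print(row)
```

Every cumulative figure in the table is recovered, which is a quick way to confirm the rows are internally consistent.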
Final results after quality check
| Measure | Number of articles | Percent of total |
|---|---|---|
| Total articles reviewed during the ML down-selection process | 974 | 31.0% |
| Total articles not reviewed | 2163 | 69.0% |
| Total relevant articles meeting inclusion criteria (ML iterative review only) | 213 | 6.8% |
| Total relevant articles meeting inclusion criteria (iterative review plus random-sample error check) | 214 | 6.8% |
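The percentages in this table are shares of the 3,137-article pool (974 reviewed + 2,163 not reviewed). The short check below recomputes them and also expresses the ML-only count as a share of the final relevant set; that last ratio is simple arithmetic on the table, not a figure reported by the authors, and it is relative only to the 214 articles actually identified.

```python
TOTAL = 974 + 2163  # reviewed + not reviewed = 3,137 articles


def pct(count: int, total: int = TOTAL) -> float:
    """Percent of the full article pool, rounded to one decimal place."""
    return round(100 * count / total, 1)


counts = {
    "reviewed": 974,
    "not reviewed": 2163,
    "relevant (ML iterative review only)": 213,
    "relevant (after random-sample check)": 214,
}
for label, n in counts.items():
    print(f"{label}: {n} ({pct(n)}%)")

# Share of the final relevant set already found before the random-sample check.
ml_share = 213 / 214
print(f"ML-only share of final relevant set: {ml_share:.1%}")  # 99.5%
```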