Alexandra Bannach-Brown1,2,3, Piotr Przybyła4, James Thomas5, Andrew S C Rice6, Sophia Ananiadou4, Jing Liao7, Malcolm Robert Macleod7.
Abstract
BACKGROUND: Here, we outline a method of applying existing machine learning (ML) approaches to aid citation screening in an ongoing broad and shallow systematic review of preclinical animal studies. The aim is to achieve a high-performing algorithm, comparable to human screening, that can reduce the human resources required for this step of a systematic review.
Keywords: Analysis of human error; Automation tools; Citation screening; Machine learning; Systematic review
Year: 2019 PMID: 30646959 PMCID: PMC6334440 DOI: 10.1186/s13643-019-0942-7
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Fig. 1 Diagram of the layout of the study
Equations used to assess performance of machine learning algorithms
| Measure | Equation |
|---|---|
| Sensitivity or recall | TP/(TP + FN) |
| Specificity | TN/(TN + FP) |
| Precision | TP/(TP + FP) |
| Accuracy | (TP + TN)/(TP + FP + FN + TN) |
| WSS@95% | ((TN + FN)/N) – (1.0 – 0.95) |
| Positive likelihood ratio (LR+) | Sensitivity/(1 – Specificity) |
Sensitivity, specificity, precision, accuracy and WSS@95% equations from [5]. Positive likelihood ratio equation from [45]
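The equations above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the `ConfusionCounts` class, the function names, and the example counts are all assumptions made for this sketch, and N is taken to be the total number of screened records (TP + FP + FN + TN).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Screening decisions compared with the reference standard (hypothetical container)."""
    tp: int  # true positives: relevant records the algorithm included
    fp: int  # false positives: irrelevant records the algorithm included
    fn: int  # false negatives: relevant records the algorithm excluded
    tn: int  # true negatives: irrelevant records the algorithm excluded

    @property
    def n(self) -> int:
        # N is assumed to be the total number of screened records.
        return self.tp + self.fp + self.fn + self.tn


def sensitivity(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    return c.tp / (c.tp + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / c.n


def wss(c: ConfusionCounts, recall_level: float = 0.95) -> float:
    # Work saved over sampling: fraction of records excluded by the algorithm,
    # minus the fraction that may be missed at the chosen recall level.
    return (c.tn + c.fn) / c.n - (1.0 - recall_level)


def positive_likelihood_ratio(c: ConfusionCounts) -> float:
    return sensitivity(c) / (1.0 - specificity(c))


# Illustrative counts only; these are not taken from the study.
example = ConfusionCounts(tp=90, fp=20, fn=10, tn=880)
print(f"sensitivity={sensitivity(example):.3f}, WSS@95%={wss(example):.3f}")
```

The functions divide without guarding against empty denominators, so a real pipeline would need to handle validation sets with no positive or no negative records.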
Fig. 2 Error analysis. The methodology for using cross-validation to assign ML-predicted probability scores. The ML-predicted probability scores for the records were checked against the original human inclusion decision
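The comparison described in this caption might look like the following sketch. It assumes each record already carries a probability score from a model that was not trained on that record (the point of cross-validation). The names `ScoredRecord`, `assign_folds`, and `flag_discrepancies`, the round-robin fold assignment, and the number of folds are hypothetical choices for illustration, not details given in the source.

```python
from typing import Dict, List, NamedTuple, Sequence


class ScoredRecord(NamedTuple):
    record_id: str
    human_included: bool  # original human screening decision
    ml_score: float       # out-of-fold predicted probability of inclusion


def assign_folds(record_ids: Sequence[str], k: int = 5) -> Dict[str, int]:
    """Round-robin fold assignment (k is an assumed value).

    Each record is scored by the model trained on the other k - 1 folds.
    """
    return {rid: i % k for i, rid in enumerate(record_ids)}


def flag_discrepancies(records: Sequence[ScoredRecord], cutoff: float) -> List[ScoredRecord]:
    """Return records where the ML decision at `cutoff` disagrees with the human decision.

    Results are ordered by the size of the disagreement, largest first,
    so that the most suspicious human decisions can be re-checked first.
    """
    flagged = [r for r in records if (r.ml_score >= cutoff) != r.human_included]
    flagged.sort(
        key=lambda r: abs(r.ml_score - (1.0 if r.human_included else 0.0)),
        reverse=True,
    )
    return flagged


# Illustrative records only; identifiers and scores are made up.
records = [
    ScoredRecord("a", True, 0.02),   # human included, ML says exclude
    ScoredRecord("b", False, 0.95),  # human excluded, ML says include
    ScoredRecord("c", True, 0.80),   # agreement
    ScoredRecord("d", False, 0.01),  # agreement
]
for r in flag_discrepancies(records, cutoff=0.1):
    print(r.record_id, r.ml_score)
```

Flagged records are candidates for a second human look; the sketch does not decide which side was wrong.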
Performance of machine learning approaches on depression training dataset
| | Approach 1 | Approach 2 |
|---|---|---|
| Training set size | 5749 | 5749 |
| Optimal cut-off score | 0.1 | 0.07 |
| Sensitivity | 98.7% | 98.7% |
| Upper 95% CI of sensitivity | 0.997 | 0.997 |
| Lower 95% CI of sensitivity | 0.949 | 0.949 |
| Specificity | 86.0% | 84.7% |
| Precision | 50% | 47.66% |
| Accuracy | 1096/1251 = 87.6% | 1081/1251 = 86.4% |
| WSS@95% | 0.705 | 0.693 |
| LR+ | 7.421 | 9.451 |
Fig. 3 Performance of machine learning approaches. For the interactive version of this plot with cut-off values, see code and data at https://github.com/abannachbrown/The-use-of-text-mining-and-machine-learning-algorithms-in-systematic-reviews/blob/master/ML-fig3.html
Reclassification of records in validation after error analysis
| | | Test 1 (original machine learning algorithm results): In | Test 1: Out | Total |
|---|---|---|---|---|
| Test 2 (post-error analysis ML results) | In | 153 | 153 | 306 |
| | | 160 | 116 | 276 |
| | Out | 2 | 943 | 945 |
| | | 3 | 972 | 975 |
| | Total | 155 | 1096 | 1251 |
| | | 163 | 1088 | |
Performance of machine learning approach after error analysis
| | Updated approach 1 | Original approach 1 |
|---|---|---|
| Cut-off | 0.09 | 0.10 |
| Sensitivity | 98.7% | 98.7% |
| Upper 95% CI of sensitivity | 0.997 | 0.997 |
| Lower 95% CI of sensitivity | 0.949 | 0.949 |
| Specificity | 88.3% | 86.7% |
| Precision | 55.9% | 52.61% |
| Accuracy | 89.7% | 88.2% |
| WSS@95% | 961/1251 – (0.05) = 0.718 | 945/1251 – (0.05) = 0.705 |
| LR+ | 8.436 | 7.421 |
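As a worked check, the WSS@95% and LR+ rows of this table can be recomputed from the other reported values. The sketch below uses only figures that appear in the table; the helper names are hypothetical, and because the inputs for LR+ are the rounded percentages, agreement to three decimals is a consistency check rather than a reproduction of the authors' calculation.

```python
def wss_at_95(records_excluded: int, total_records: int) -> float:
    # WSS@95% = (TN + FN)/N - (1.0 - 0.95)
    return records_excluded / total_records - (1.0 - 0.95)


def lr_plus(sensitivity: float, specificity: float) -> float:
    # LR+ = sensitivity / (1 - specificity)
    return sensitivity / (1.0 - specificity)


# Values as reported in the table above.
updated = {"excluded": 961, "total": 1251, "sensitivity": 0.987, "specificity": 0.883}
original = {"excluded": 945, "total": 1251, "sensitivity": 0.987, "specificity": 0.867}

for name, row in (("updated", updated), ("original", original)):
    wss = wss_at_95(row["excluded"], row["total"])
    lr = lr_plus(row["sensitivity"], row["specificity"])
    print(f"{name}: WSS@95% = {wss:.3f}, LR+ = {lr:.3f}")
```

Rounded to three decimals, both rows match the table: 0.718 and 8.436 for the updated approach, 0.705 and 7.421 for the original approach.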
Fig. 4 Performance of approach 1 after error analysis. The updated approach is retrained on the training set as corrected by the error analysis. Performance of both the original and the updated approach is measured on the corrected validation set (with error analysis correction). For the interactive version of this plot with the ability to read off performance at all cut-off values, see code and data at https://github.com/abannachbrown/The-use-of-text-mining-and-machine-learning-algorithms-in-systematic-reviews/blob/master/error-analysis-plot.html