E Popoff, M Besada, J P Jansen, S Cope, S Kanters.
Abstract
BACKGROUND: Despite existing research on text mining and machine learning (ML) for title and abstract screening, the role of ML within systematic literature reviews (SLRs) for health technology assessment (HTA) remains unclear, given the lack of extensive testing and of guidance from HTA agencies. We sought to address two knowledge gaps: to extend ML algorithms to provide a reason for exclusion, in line with current practices, and to determine optimal parameter settings for feature-set generation and ML algorithms.
Keywords: Classification; Downsampling; Machine learning; Methods; Reasons for exclusion; Study selection; Systematic literature reviews; Text mining; Updates
Year: 2020 PMID: 33308292 PMCID: PMC7734810 DOI: 10.1186/s13643-020-01520-5
Source DB: PubMed Journal: Syst Rev ISSN: 2046-4053
Datasets and model parameters considered across 870 simulations
| Datasets (size) | Data scenarios | Downsampling | Word frequencies | Classification algorithms | Model metrics |
|---|---|---|---|---|---|
| Psoriasis (4442) | Abstract screening | With | Removing words appearing < 5 times across all citations | SVM | ROC |
| Lung cancer (12,769) | Full-text screening | Without | Removing words appearing < 10 times across all citations | Naïve Bayes | Sensitivity |
| Liver cancer (8507) | Removing full-text excludes | | Removing words appearing < 100 times across all citations | Bagged CART | |
| Melanoma (3089) | | | Removing words appearing < 500 times across all citations | | |
| Obesity (5187) | | | Keeping top 50 words in terms of variable importance^a | | |
| | | | Keeping top 100 words in terms of variable importance^a | | |
| | | | Keeping top 500 words in terms of variable importance^a | | |
^a Not applicable to the SVM algorithm
^b Not applicable to the bagged CART algorithm
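The word-frequency settings in the table can be illustrated with a minimal sketch. It assumes that "appearing < N times across all citations" refers to the total occurrence count of a word over the whole corpus, and that tokens are obtained by lowercasing and splitting on whitespace; neither detail is stated in this record. The function name `filter_vocabulary` and the toy citations are hypothetical.

```python
from collections import Counter


def filter_vocabulary(citations, min_count):
    """Drop words that occur fewer than `min_count` times across all citations.

    Assumption: frequency means total occurrences in the corpus,
    not the number of citations containing the word.
    """
    tokenized = [text.lower().split() for text in citations]
    counts = Counter(word for tokens in tokenized for word in tokens)
    vocabulary = {word for word, n in counts.items() if n >= min_count}
    filtered = [[w for w in tokens if w in vocabulary] for tokens in tokenized]
    return filtered, vocabulary


# Toy corpus (hypothetical); the study used thresholds of 5, 10, 100 and 500.
citations = [
    "randomized trial of psoriasis treatment",
    "psoriasis treatment in adults",
    "observational study of psoriasis",
]
filtered, vocab = filter_vocabulary(citations, min_count=2)
print(sorted(vocab))
```

A higher threshold shrinks the feature set, which is consistent with the frequency-500 setting being the most aggressive filter among those listed.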
Summary of systematic literature reviews included
| Datasets | Selection criteria: population | Selection criteria: study design | Abstract screening: abstracts screened, n | Abstract screening: abstracts included, n (%) | Full-text screening: full-texts included, n (%)^ |
|---|---|---|---|---|---|
| Psoriasis | Adult patients with moderate to severe psoriasis | RCTs and observational studies | 4442 | 613 (13.8) | 171 (27.9) |
| Lung cancer | Adult patients with advanced or metastatic lung cancer | RCTs | 12,769 | 215 (1.7) | 66 (30.7) |
| Liver cancer | Adult patients with unresectable liver cancer | RCTs and observational studies | 8507 | 1141 (13.4) | 294 (25.8) |
| Melanoma | Adult patients with unresectable stage III or IV melanoma | RCTs | 3089 | 124 (4.0) | 41 (33.1) |
| Obesity | Adult patients with BMI ≥ 25 kg/m² | RCTs with trial duration ≥ 12 months | 5187 | 228 (4.4) | 47 (20.6) |
Abbreviations: RCTs, randomized controlled trials
^Percentage of included abstracts selected for full-text screening
Fig. 1. Complete study selection process to identify studies to exclude with reasons, including algorithm training and parameter setting
Fig. 2. Comparison of the best-fitting algorithm and settings for full-text decisions (SVM, frequency 5, ROC, downsampling) and for abstract decisions (SVM, frequency 5, sensitivity, downsampling)
Model results for each dataset while varying all model characteristics, reported as mean (SD)
| Datasets | Sensitivity | Specificity | Precision | Accuracy | Correct reason for exclusion |
|---|---|---|---|---|---|
| Psoriasis | 84.97% (14.99%) | 75.72% (15.84%) | 19.82% (9.55%) | 76.17% (14.63%) | 88.66% (7.76%) |
| Lung cancer | 77.01% (21.36%) | 90.98% (9.05%) | 10.94% (7.19%) | 90.87% (8.88%) | 93.78% (5.89%) |
| Liver cancer | 84.23% (12.81%) | 67.66% (18.69%) | 19.38% (8.65%) | 68.73% (16.87%) | 82.58% (7.60%) |
| Melanoma | 88.05% (22.11%) | 87.05% (17.09%) | 27.60% (21.71%) | 87.07% (16.67%) | 89.31% (7.84%) |
| Obesity | 78.45% (23.95%) | 84.80% (15.10%) | 13.25% (10.25%) | 84.71% (14.68%) | 82.18% (16.02%) |
Sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP), and accuracy = (TP + TN)/(TP + FP + FN + TN); where TP (true positive) is a true included citation identified as an include or no decision, FN (false negative) is a true included citation identified as an exclude with a reason for exclusion, TN (true negative) is a true excluded citation identified as an exclude with a reason for exclusion, and FP (false positive) is a true excluded citation identified as an include or identified as having no decision
Correct reason for exclusion was defined as the number of citations whose true reason for exclusion fell above the 90% threshold over the total number of citations with any reason for exclusion. Sensitivity, specificity, precision, and accuracy were calculated by holding each factor constant while averaging over all other model characteristics (e.g., downsampling and performance metric)
Abbreviations: SD, standard deviation
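The metric definitions in the footnote can be sketched in a few lines. Note the convention used here: an "include" or "no decision" outcome counts as a positive, and an "exclude with a reason" outcome counts as a negative. The function names and the confusion counts below are hypothetical and are not taken from the study.

```python
def confusion_cell(truly_included, decision):
    """Map one citation to TP/FN/TN/FP using the footnote's definitions.

    `decision` is one of "include", "no decision", or "exclude"
    (an exclude always carries a reason for exclusion).
    """
    flagged_exclude = decision == "exclude"
    if truly_included:
        return "FN" if flagged_exclude else "TP"
    return "TN" if flagged_exclude else "FP"


def screening_metrics(tp, fp, fn, tn):
    """Compute the four metrics exactly as defined in the table footnote."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for illustration only.
metrics = screening_metrics(tp=90, fp=200, fn=10, tn=700)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

With these made-up counts, sensitivity is 90% while precision is about 31%, which mirrors the general pattern in the table: when true includes are rare, even a modest false-positive rate keeps precision low.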
Fig. 3. Comparative model results across all tested model characteristics
Model results for each factor while varying all other model characteristics, reported as mean (SD)
| Factor | Sensitivity | Specificity | Precision | Accuracy | Correct reason for exclusion |
|---|---|---|---|---|---|
| Abstract decisions | 89.76% (10.74%) | 70.50% (15.48%) | 11.20% (4.53%) | 71.43% (14.65%) | 84.77% (6.12%) |
| Full-text decisions | 76.49% (15.74%) | 83.07% (13.02%) | 20.11% (9.49%) | 83.04% (12.27%) | 85.63% (8.97%) |
| Modified full-text | 81.37% (13.18%) | 84.86% (12.07%) | 24.23% (11.05%) | 84.90% (11.42%) | 87.52% (9.45%) |
| CART | 83.63% (16.56%) | 70.35% (15.20%) | 14.92% (7.91%) | 71.00% (14.04%) | 78.64% (7.72%) |
| NB | 79.17% (11.41%) | 88.45% (6.85%) | 22.59% (10.14%) | 88.30% (6.42%) | 91.38% (4.41%) |
| SVM | 87.20% (15.26%) | 74.05% (16.39%) | 15.43% (10.62%) | 74.78% (15.67%) | 84.76% (7.54%) |
| Frequency = 5 | 85.67% (15.31%) | 81.94% (13.09%) | 20.07% (11.47%) | 82.21% (12.23%) | 87.40% (7.25%) |
| Frequency = 10 | 85.54% (14.29%) | 82.13% (13.27%) | 20.70% (12.26%) | 82.39% (12.46%) | 87.23% (7.29%) |
| Frequency = 100 | 83.52% (11.73%) | 80.49% (13.38%) | 18.19% (10.08%) | 80.83% (12.59%) | 86.82% (8.25%) |
| Frequency = 500 | 78.81% (19.22%) | 67.38% (17.50%) | 10.93% (6.62%) | 68.20% (16.68%) | 80.31% (6.91%) |
| Importance = 50 | 82.19% (9.54%) | 85.44% (8.91%) | 19.08% (7.17%) | 85.51% (8.29%) | 89.49% (6.29%) |
| Importance = 100 | 83.59% (9.19%) | 85.10% (10.75%) | 20.49% (8.27%) | 85.22% (10.06%) | 89.46% (7.47%) |
| Importance = 500 | 82.60% (10.75%) | 84.73% (13.78%) | 23.82% (11.18%) | 84.84% (12.94%) | 89.95% (7.45%) |
| ROC | 83.90% (13.67%) | 77.09% (15.32%) | 17.09% (8.81%) | 77.52% (14.44%) | 84.02% (8.73%) |
| Sensitivity | 80.07% (15.43%) | 83.81% (13.34%) | 21.10% (12.27%) | 83.90% (12.59%) | 89.53% (6.23%) |
| Without downsampling | 75.46% (15.87%) | 85.55% (11.24%) | 23.27% (10.93%) | 85.45% (10.43%) | 88.93% (5.90%) |
| With downsampling | 89.62% (7.97%) | 73.40% (15.79%) | 13.76% (7.00%) | 74.12% (15.06%) | 83.01% (9.35%) |
Sensitivity = TP/(TP + FN), specificity = TN/(TN + FP), precision = TP/(TP + FP), and accuracy = (TP + TN)/(TP + FP + FN + TN); where TP (true positive) is a true included citation identified as an include or no decision, FN (false negative) is a true included citation identified as an exclude with a reason for exclusion, TN (true negative) is a true excluded citation identified as an exclude with a reason for exclusion, and FP (false positive) is a true excluded citation identified as an include or identified as having no decision
Correct reason for exclusion was defined as the number of citations whose true reason for exclusion fell above the 90% threshold over the total number of citations with any reason for exclusion. Sensitivity, specificity, precision, and accuracy were calculated by holding each factor constant while averaging over all other model characteristics (e.g., downsampling and performance metric)
Abbreviations: SD, standard deviation
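In the table above, downsampling raises mean sensitivity (89.62% versus 75.46%) at the cost of specificity (73.40% versus 85.55%). The sketch below shows one common form of downsampling, assuming it means randomly discarding majority-class citations until both classes are the same size; the exact procedure used in the study is not described in this record. The function name `downsample` and the toy labels are hypothetical.

```python
import random


def downsample(labels, seed=0):
    """Return sorted indices of a class-balanced subset of `labels`.

    Assumption: the larger class is randomly reduced to the size
    of the smaller class, without replacement.
    """
    rng = random.Random(seed)
    include_idx = [i for i, y in enumerate(labels) if y == "include"]
    exclude_idx = [i for i, y in enumerate(labels) if y == "exclude"]
    minority, majority = sorted([include_idx, exclude_idx], key=len)
    kept = minority + rng.sample(majority, len(minority))
    return sorted(kept)


# Toy training labels: includes are rare, as in the screened datasets.
labels = ["include"] * 3 + ["exclude"] * 17
kept = downsample(labels)
balanced = [labels[i] for i in kept]
print(balanced.count("include"), balanced.count("exclude"))
```

Training on a balanced subset stops the classifier from favouring the dominant "exclude" class, which is one plausible explanation for the higher sensitivity and lower specificity seen with downsampling.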
Fig. 4. Sensitivity and specificity of model variations showcasing differences in classification algorithms and sampling
Fig. 5. Sensitivity and specificity of model variations showcasing differences in classification algorithms and data decisions