Tanja Bekhuis, Eugene Tseytlin, Kevin J Mitchell, Dina Demner-Fushman.
Abstract
OBJECTIVES: Evidence-based medicine depends on the timely synthesis of research findings. An important source of synthesized evidence resides in systematic reviews. However, a bottleneck in review production involves dual screening of citations with titles and abstracts to find eligible studies. For this research, we tested the effect of various kinds of textual information (features) on performance of a machine learning classifier. Based on our findings, we propose an automated system to reduce screening burden, as well as offer quality assurance.
Year: 2014 PMID: 24475099 PMCID: PMC3903545 DOI: 10.1371/journal.pone.0086277
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Computer-assisted screening task.
Depicts a computer-assisted, decision support system for systematic reviewers. Instead of screening an entire set of citations twice, reviewers divide the labor. The system could further reduce screening burden, as well as offer quality assurance by confirming concordant decisions and naming studies that need to be reconsidered. A and B are random halves of the citations from a review. A|B = independent test of classifier on A dataset given model from training on B; B|A = independent test of classifier on B dataset given model from training on A. TN = true negative; FN = false negative; TP = true positive; FP = false positive; m = confusion matrix that displays classification results for an independent test.
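The tallies in the confusion matrix m can be sketched in a few lines of Python. This is a minimal illustration, not the authors' system: the function names, the label strings "include" and "exclude", and the toy data are assumptions made for the example. It counts TP, FP, TN, and FN for one independent test (for example A|B), derives recall, precision, and an F-measure weighted toward recall (beta = 3), and lists the citations on which reviewer and classifier disagree, which are the ones a reviewer might be asked to reconsider.

```python
def confusion_matrix(reviewer_labels, classifier_labels):
    """Tally TP, FP, TN, FN, treating "include" as the positive class."""
    m = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, predicted in zip(reviewer_labels, classifier_labels):
        if truth == "include":
            m["TP" if predicted == "include" else "FN"] += 1
        else:
            m["FP" if predicted == "include" else "TN"] += 1
    return m


def recall(m):
    """Share of truly eligible citations that the classifier found."""
    denom = m["TP"] + m["FN"]
    return m["TP"] / denom if denom else 0.0


def precision(m):
    """Share of classifier "include" decisions that were correct."""
    denom = m["TP"] + m["FP"]
    return m["TP"] / denom if denom else 0.0


def f_beta(m, beta=3.0):
    """F-measure; beta > 1 weights recall more heavily than precision."""
    p, r = precision(m), recall(m)
    if p == 0.0 and r == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


def discordant(citation_ids, reviewer_labels, classifier_labels):
    """Citations where reviewer and classifier disagree."""
    return [cid
            for cid, truth, predicted
            in zip(citation_ids, reviewer_labels, classifier_labels)
            if truth != predicted]


# Toy data (hypothetical citation ids and decisions).
ids = [101, 102, 103, 104, 105]
reviewer = ["include", "include", "exclude", "exclude", "exclude"]
classifier = ["include", "exclude", "include", "exclude", "exclude"]

m = confusion_matrix(reviewer, classifier)
print(m)
print(round(f_beta(m), 3))
print(discordant(ids, reviewer, classifier))
```

Running the same tally once for A|B and once for B|A would give the two confusion matrices the caption describes.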
Figure 2. EDDA workflow.
An overview of the project workflow. EDDA = Evidence in Documents, Discovery, and Analysis. Reference Filer = in-house Java program that sorts citations into folders; the resultant datasets A and B are random halves of the citations, stratified with respect to eligibility for provisional inclusion in a systematic review; citations include titles, abstracts, and metadata. RapidMiner is an open source data mining suite. cNB = Weka complement naïve Bayes classifier available in RapidMiner; suitable for the imbalanced data typical of systematic reviews. The Grid Parameter Optimization operator searches for the best performance over a grid whose dimensions are based on combinations of parameter settings.
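The stratified split into halves A and B can be illustrated as follows. This is a sketch under stated assumptions, not the Reference Filer itself (an in-house Java program that is not shown here): the function name, the dictionary layout of a citation, and the fixed seed are all hypothetical. It shuffles the include and exclude citations separately and cuts each group in half, so both halves keep roughly the same proportion of eligible citations.

```python
import random


def stratified_halves(citations, seed=0):
    """Split citations into two random halves, stratified by label.

    Each citation is assumed to be a dict with a "label" key whose
    value is "include" or "exclude" (a hypothetical layout).
    """
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    half_a, half_b = [], []
    for label in ("include", "exclude"):
        group = [c for c in citations if c["label"] == label]
        rng.shuffle(group)
        mid = len(group) // 2
        half_a.extend(group[:mid])
        half_b.extend(group[mid:])
    return half_a, half_b


# Toy collection: 10 eligible and 90 ineligible citations.
citations = [{"id": i, "label": "include" if i < 10 else "exclude"}
             for i in range(100)]
a, b = stratified_halves(citations, seed=42)
print(len(a), sum(c["label"] == "include" for c in a))
print(len(b), sum(c["label"] == "include" for c in b))
```

A classifier would then be trained on one half and tested on the other, in both directions.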
Number and allocation of citations per systematic review.
| | Influenza | Malaria | Galacto | Organ Trans | Ameloblastoma |
| A exclude | 2593 | 1245 | 1052 | 5155 | 811 |
| A include | 154 | 177 | 47 | 243 | 57 |
| Subtotal (% eligible) | 2747 (5.6%) | 1424 (12.4%) | 1100 (4.3%) | 5398 (4.5%) | 868 (6.7%) |
| B exclude | 2575 | 1246 | 1053 | 5154 | 890 |
| B include | 163 | 178 | 47 | 244 | 58 |
| Subtotal (% eligible) | 2738 (6.0%) | 1422 (12.5%) | 1099 (4.3%) | 5398 (4.5%) | 948 (6.1%) |
| Total (% eligible) | 5485 (5.8%) | 2846 (12.5%) | 2199 (4.3%) | 10796 (4.5%) | 1816 (6.3%) |
Galacto = Galactomannan.
Organ Trans = Organ Transplant.
% eligible = percentage provisionally eligible for inclusion in a review; judgments based on screening citations (titles and abstracts) by domain experts.
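As a worked check of the subtotal arithmetic, the short sketch below recomputes the subtotals and the percentage eligible for the influenza column from its include and exclude counts. The function name is an assumption; the counts are taken from the table.

```python
def percent_eligible(n_include, n_exclude):
    """Percentage of citations judged provisionally eligible."""
    return 100.0 * n_include / (n_include + n_exclude)


# Influenza counts from the table: (include, exclude) per half.
influenza = {"A": (154, 2593), "B": (163, 2575)}

for half, (inc, exc) in influenza.items():
    print(f"{half}: {inc + exc} ({percent_eligible(inc, exc):.1f}%)")

total_inc = sum(inc for inc, _ in influenza.values())
total_exc = sum(exc for _, exc in influenza.values())
print(f"Total: {total_inc + total_exc} "
      f"({percent_eligible(total_inc, total_exc):.1f}%)")
```

The printed subtotals and percentages agree with the influenza column: 2747 (5.6%), 2738 (6.0%), and 5485 (5.8%).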
Feature set size by systematic review before and after filtering for information gain.
| | N features (n if IG≥0.001) | | | | |
| | Alphabetic | Alphanumeric+ | Indexing | Topic model | SR concepts |
| | | | | | |
| A | 6880 (4759) | 52013 (10404) | 5392 (5251) | 1602 (1601) | 821 (681) |
| B | 6982 (4740) | 52231 (13043) | 5361 (5226) | 1602 (1601) | 821 (697) |
| | | | | | |
| A | 4901 (3274) | 35947 (10481) | 1391 (1353) | 902 (901) | 575 (519) |
| B | 4937 (3249) | 36375 (10272) | 1382 (1318) | 902 (901) | 575 (531) |
| | | | | | |
| A | 4026 (2757) | 27960 (6423) | 1012 (1000) | 602 (594) | 449 (359) |
| B | 4057 (2761) | 27561 (6422) | 1060 (1048) | 602 (601) | 449 (352) |
| | | | | | |
| A | 11765 (6680) | 72807 (15005) | 5577 (5435) | 1902 (1894) | 793 (571) |
| B | 11856 (6294) | 72529 (13872) | 5521 (4617) | 1902 (1901) | 793 (587) |
| | | | | | |
| A | 3247 (2303) | 19311 (5043) | 563 (556) | 602 (601) | 351 (279) |
| B | 3352 (2422) | 20712 (5311) | 611 (601) | 602 (600) | 351 (290) |
IG = information gain.
SR = systematic review.
A and B refer to random halves of the data.
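Filtering on information gain can be sketched as below. The example assumes each feature is recorded as present or absent per citation, which the text above does not specify, and the function names are hypothetical; only the 0.001 threshold comes from the table. Information gain is computed as the entropy of the class labels minus the entropy that remains once the feature value is known.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_present, labels):
    """IG of a present/absent feature with respect to the labels."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        if subset:
            gain -= len(subset) / total * entropy(subset)
    return gain


def filter_features(columns, labels, threshold=0.001):
    """Keep only features whose information gain meets the threshold."""
    return {name: col for name, col in columns.items()
            if information_gain(col, labels) >= threshold}


# Toy data: one feature separates the classes, the other does not.
labels = ["include", "include", "exclude", "exclude"]
columns = {
    "informative": [True, True, False, False],
    "uninformative": [True, False, True, False],
}
kept = filter_features(columns, labels)
print(sorted(kept))
```

In this toy case the informative feature has a gain of 1 bit and is kept, while the uninformative one has a gain of 0 and is dropped.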
Sample of features by type for the influenza review.
| Alphabetic | Alphanumeric+ | Indexing | Topic model | SR concepts |
| ag | aged | *Aged | | Old_age |
| elderli | elderly | Aged | cells | Elderly_population_group |
| influenza | influenza | *Influenza Vaccines | autologous | Influenza_vaccination |
| vaccin | vaccines | Influenza Vaccines | virology | Aged_80_and_over |
| influenza_vaccin | influenza_vaccination | elderly | measured | prevention_control |
| epidemiolog | epidemiology/prevention | 80 and over | virus-specific | Nursing_Homes |
| agent | vaccines/adverse | immunology | t-lymphocytes | therapeutic_aspects |
| advers | agents/ae | Influenza A virus/ | cytotoxicity | Vaccines |
| epidem | h3n2_epidemic | (Antigens, Viral) | activity | Sudden_death |
| Case | case-control | Serology and Transplantation | ctl | Mortality_Vital_Statistics |
| Control | 1990–1991 | Case-Control Studies | cytotoxic | Respiratory_Tract_Infections |
| Commun | community-dwelling | case report | … | historical_cohort_design |
| Sydnei | a/Sydney/05/97 | adverse effects | | case_comparison_design |
| Journal | new_england_journal_of_medicine | 147205-72-9 (CD40 Ligand) | observed | Chronic_obstructive_asthma_with … |
| Blind | double_blinded | Interferon-gamma/bi | cytokines/bl | |
| | | | cytokine | |
| | | | il-10 | |
| | | | obtained | |
| | | | il-6 | |
| | | | assay | |
| | | | results | |
| | | | blood | |
| | | | cytokines | |
| | | | … | |
| | | | | |
| | | | … | |
| | | | include_divergence | |
| | | | exclude_divergence | |
Note: For the alphabetic and alphanumeric+ sets, features with an underscore between pairs of words came from titles. For the indexing set, features came mainly from MeSH and Emtree; an asterisk indicates a major concept. For the topic model set, the number of topics was determined prior to training (see Methods) and topics were based on alphanumeric+ features; the divergence features are Kullback-Leibler (KL) divergences from the medoids of the include and exclude classes. For SR concepts, the lexicon consists of UMLS concepts (including parent and children) in SRs and study design terms. SR = systematic review.
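The divergence features named in the note can be illustrated with a small sketch. It assumes each citation is represented by a topic distribution, picks the medoid of each class as the member with the smallest summed divergence from all members, and measures KL(citation || medoid). The direction of the divergence, the smoothing constant, and all names other than include_divergence and exclude_divergence are assumptions; the note does not give these details.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions of equal length.

    eps is a small smoothing constant (an assumption) that guards
    against division by zero when q has empty topics.
    """
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def medoid(distributions):
    """Member with the smallest summed divergence from all members."""
    return min(distributions,
               key=lambda cand: sum(kl_divergence(d, cand)
                                    for d in distributions))


def divergence_features(doc_topics, include_docs, exclude_docs):
    """Two features: divergence from each class medoid."""
    return {
        "include_divergence": kl_divergence(doc_topics, medoid(include_docs)),
        "exclude_divergence": kl_divergence(doc_topics, medoid(exclude_docs)),
    }


# Toy two-topic distributions (hypothetical values).
include_docs = [[0.8, 0.2], [0.7, 0.3], [0.6, 0.4]]
exclude_docs = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
features = divergence_features([0.75, 0.25], include_docs, exclude_docs)
print(features)
```

A citation whose topic mix resembles the include medoid gets a small include_divergence and a larger exclude_divergence, which is the signal a classifier could use.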
Mean performance of the cNB classifier by systematic review and feature set.
| | Alphanumeric+ | Alphabetic | Topics | SR concepts | Indexing |
| F3 | | | | | |
| Ameloblastoma | 75.11 | 74.52 | 71.51 | 68.22 | 68.68 |
| Influenza | 65.52 | 57.16 | 61.97 | 59.38 | 63.11 |
| Galactomannan | 87.31 | 90.73 | 74.73 | 78.88 | 74.13 |
| Malaria | 88.09 | 89.30 | 86.42 | 83.33 | 81.85 |
| Organ transplant | 57.82 | 64.39 | 59.17 | 54.24 | 52.52 |
| Recall | | | | | |
| Ameloblastoma | 80.01 | 79.98 | 78.27 | 87.78 | 81.76 |
| Influenza | 76.44 | 59.68 | 73.63 | 76.33 | 77.70 |
| Galactomannan | 89.37 | 96.81 | 96.81 | 95.74 | 92.55 |
| Malaria | 90.98 | 95.77 | 93.80 | 92.67 | 90.69 |
| Organ transplant | 59.77 | 71.87 | 74.95 | 74.14 | 80.31 |
| Precision | | | | | |
| Ameloblastoma | 49.15 | 46.47 | 40.39 | 26.89 | 28.21 |
| Influenza | 30.08 | 42.00 | 25.83 | 19.82 | 23.50 |
| Galactomannan | 72.38 | 58.12 | 24.51 | 30.51 | 26.64 |
| Malaria | 68.75 | 55.55 | 50.60 | 43.73 | 43.78 |
| Organ transplant | 46.05 | 33.25 | 20.45 | 17.05 | 13.82 |
| Classification error | | | | | |
| Ameloblastoma | 6.66 | 7.15 | 8.73 | 20.31 | 14.39 |
| Influenza | 12.31 | 7.16 | 13.82 | 19.24 | 15.90 |
| Galactomannan | 1.91 | 3.14 | 12.92 | 9.51 | 11.33 |
| Malaria | 6.33 | 10.09 | 12.20 | 15.78 | 15.74 |
| Organ transplant | 5.06 | 7.78 | 14.29 | 18.90 | 26.08 |
Baseline F3 (%): Ameloblastoma = 40.20; Influenza = 38.96; Galactomannan = 31.00; Malaria = 58.82; Organ transplant = 32.03. All mean F3 values surpassed the baseline values, one-tailed Z-tests, P<0.001.
Higher ranks associated with better performance.
Lower ranks associated with better performance.
Mean ranks significantly different for F3, precision, and classification error: Friedman's test of mean F3 ranks (4 df) = 9.760, P = .045; Friedman's test of mean precision ranks (4 df) = 16.480, P = .002; Friedman's test of mean classification error ranks (4 df) = 16.480, P = .002.
Friedman's test of mean recall ranks (4 df) = 1.980, P = .739, NS.
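Friedman's statistic can be recomputed from the performance table. The sketch below ranks the five feature sets within each review and applies the standard chi-square form of the statistic with a correction for ties; the function names are assumptions, and this is not the authors' analysis script. Applied to the first block of values in the table it gives 9.76 on 4 degrees of freedom, in agreement with the value reported for F3, and the second block gives about 1.98, matching the value reported for recall.

```python
def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Friedman chi-square for n blocks (rows) and k treatments (columns)."""
    n, k = len(blocks), len(blocks[0])
    ranked = [average_ranks(row) for row in blocks]
    rank_sums = [sum(row[c] for row in ranked) for c in range(k)]
    chi2 = (12.0 / (n * k * (k + 1))) * sum(s * s for s in rank_sums) \
        - 3.0 * n * (k + 1)
    # Tie correction: sum of (t^3 - t) over each group of t tied values.
    ties = sum(row.count(v) ** 3 - row.count(v)
               for row in blocks for v in set(row))
    return chi2 / (1.0 - ties / (n * k * (k * k - 1)))


# Rows are reviews, columns are feature sets, as in the table.
first_block = [
    [75.11, 74.52, 71.51, 68.22, 68.68],
    [65.52, 57.16, 61.97, 59.38, 63.11],
    [87.31, 90.73, 74.73, 78.88, 74.13],
    [88.09, 89.30, 86.42, 83.33, 81.85],
    [57.82, 64.39, 59.17, 54.24, 52.52],
]
second_block = [
    [80.01, 79.98, 78.27, 87.78, 81.76],
    [76.44, 59.68, 73.63, 76.33, 77.70],
    [89.37, 96.81, 96.81, 95.74, 92.55],
    [90.98, 95.77, 93.80, 92.67, 90.69],
    [59.77, 71.87, 74.95, 74.14, 80.31],
]
print(round(friedman_statistic(first_block), 3))
print(round(friedman_statistic(second_block), 3))
```

The direction of ranking does not affect the statistic, so the same function applies whether higher or lower values indicate better performance.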