Makoto Miwa, James Thomas, Alison O'Mara-Eves, Sophia Ananiadou.
Abstract
In systematic reviews, the growing number of published studies imposes a significant screening workload on reviewers. Active learning is a promising approach to reducing this workload by automating some of the screening decisions, but it has been evaluated in only a limited number of disciplines. The suitability of active learning for complex topics in disciplines such as social science has not been studied, and the selection of useful criteria and enhancements to address the data imbalance problem in systematic reviews remains an open problem. We applied active learning with two criteria (certainty and uncertainty) and several enhancements in both clinical medicine and social science (specifically, public health), and compared the results across the two areas. The results show that the certainty criterion is useful for finding relevant documents, and that weighting positive instances is a promising way to overcome the data imbalance problem in both data sets. Latent Dirichlet allocation (LDA) is also shown to be promising when little manually assigned information is available. Active learning is effective for complex topics, although its efficiency is limited by the difficulty of text classification. The most promising criterion and weighting method are the same regardless of the review topic, and unsupervised techniques such as LDA may boost the performance of active learning without manual annotation.
Keywords: Active learning; Certainty; Systematic reviews; Text mining
Year: 2014 PMID: 24954015 PMCID: PMC4199186 DOI: 10.1016/j.jbi.2014.06.005
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
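The two selection criteria named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: the certainty criterion picks the unscreened documents the classifier scores as most likely relevant, and the uncertainty criterion picks those whose scores are closest to the decision boundary. The function names, the document IDs, and the probabilities are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_certainty(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k unlabeled documents scored as most likely relevant."""
    return sorted(scores, key=lambda doc: scores[doc], reverse=True)[:k]


def select_uncertainty(scores: Dict[str, float], k: int) -> List[str]:
    """Pick the k unlabeled documents whose scores lie closest to 0.5."""
    return sorted(scores, key=lambda doc: abs(scores[doc] - 0.5))[:k]


# Hypothetical relevance probabilities for five unscreened documents.
scores = {"d1": 0.95, "d2": 0.52, "d3": 0.10, "d4": 0.45, "d5": 0.80}

certain_batch = select_certainty(scores, 2)      # highest-scoring documents
uncertain_batch = select_uncertainty(scores, 2)  # documents nearest the boundary
```

In an active learning loop, the selected batch would be shown to a reviewer, the new labels added to the training set, and the classifier retrained before the next selection round.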
Fig. 1. The active learning process.
Fig. 2. Stages and conditions evaluated.
The characteristics of the Clinical data sets.
| | Proton beam | Micro nutrients | COPD |
|---|---|---|---|
| #Positives | 243 | 258 | 196 |
| #Negatives | 4,508 | 3,752 | 1,410 |
| Title | | | |
| Abstract | | | |
| Title concepts | | | |
| Keywords | | | |
The characteristics of the Social science data sets.
| | Cooking Skills | Sanitation | Tobacco Packaging | Youth Development |
|---|---|---|---|---|
| #Positives | 220 | 498 | 149 | 1,537 |
| #Negatives | 11,295 | 4,966 | 3,061 | 14,007 |
| Title | | | | |
| Abstract | | | | |
| Title concepts | x | x | x | x |
| Keywords | x | x | x | x |
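The data imbalance mentioned in the abstract can be quantified directly from the counts in the two tables above. The sketch below computes the share of positive (relevant) documents per data set, plus a negatives-to-positives ratio. Using that ratio as a weight for positive instances is an assumption made here for illustration; the record does not state which weighting scheme the authors used.

```python
# (#positives, #negatives) copied from the two tables above.
counts = {
    "Proton beam": (243, 4508),
    "Micro nutrients": (258, 3752),
    "COPD": (196, 1410),
    "Cooking Skills": (220, 11295),
    "Sanitation": (498, 4966),
    "Tobacco Packaging": (149, 3061),
    "Youth Development": (1537, 14007),
}


def positive_rate(pos: int, neg: int) -> float:
    """Fraction of documents in the data set that are relevant."""
    return pos / (pos + neg)


def positive_weight(pos: int, neg: int) -> float:
    """Assumed inverse-frequency weight for positives (illustrative only)."""
    return neg / pos


rates = {name: positive_rate(p, n) for name, (p, n) in counts.items()}
weights = {name: positive_weight(p, n) for name, (p, n) in counts.items()}

# List the data sets from most to least imbalanced.
for name in sorted(rates, key=rates.get):
    print(f"{name}: {rates[name]:.1%} positive, weight {weights[name]:.1f}")
```

By these counts, Cooking Skills has the lowest share of relevant documents and COPD the highest.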
Fig. 3. Evaluation on the micro nutrients corpus with different criteria and weighting methods (Stage 1 of Fig. 2).
Fig. 4. Evaluation on the micro nutrients corpus with different enhancements (Stage 2 of Fig. 2).
Fig. 5. Evaluation of different criteria and weighting methods with a previous analysis carried out on the micro nutrients corpus (Stage 1 of Fig. 2).
Fig. 6. Evaluation of different enhancements with a previous analysis carried out on the micro nutrients corpus (Stage 2 of Fig. 2).
Fig. 7. Evaluation on the Cooking Skills corpus with different criteria and weighting methods (Stage 1 of Fig. 2).
Fig. 8. Evaluation on the Cooking Skills corpus with different enhancements (Stage 2 of Fig. 2).
Fig. 9. Evaluation on the micro nutrients corpus with different views.