Georgios Kontonatsios, Austin J Brockmeier, Piotr Przybyła, John McNaught, Tingting Mu, John Y Goulermas, Sophia Ananiadou.
Abstract
Citation screening, an integral process within systematic reviews that identifies citations relevant to the underlying research question, is a time-consuming and resource-intensive task. During screening, analysts manually assign a label to each citation to designate whether it is eligible for inclusion in the review. Recently, several studies have explored the use of active learning in text classification to reduce the human workload involved in the screening task. However, existing approaches require a significant number of manually labelled citations for the text classifier to achieve robust performance. In this paper, we propose a semi-supervised method that identifies relevant citations as early as possible in the screening process by exploiting the pairwise similarities between labelled and unlabelled citations, improving classification performance without additional manual labelling effort. Our approach is based on the hypothesis that similar citations share the same label (e.g., if one citation should be included, then other similar citations should also be included). To calculate the similarity between labelled and unlabelled citations, we investigate two different feature spaces, namely a bag-of-words and a spectral embedding based on the bag-of-words. The semi-supervised method propagates the classification codes of manually labelled citations to neighbouring unlabelled citations in the feature space. The automatically labelled citations are combined with the manually labelled citations to form an augmented training set. For evaluation purposes, we apply our method to clinical and public health reviews. The results show that our semi-supervised method with label propagation achieves statistically significant improvements over two state-of-the-art active learning approaches across both clinical and public health reviews.
Keywords: Active learning; Citation screening; Label propagation; Semi-supervised learning; Text classification
Year: 2017 PMID: 28648605 PMCID: PMC5726085 DOI: 10.1016/j.jbi.2017.06.018
Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
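The label-propagation idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokeniser, the cosine similarity, the nearest-labelled-neighbour rule, the `threshold` parameter and the toy citations are all assumptions chosen for illustration. The sketch only shows how labels of manually screened citations could be copied to sufficiently similar unlabelled citations in a bag-of-words space to form an augmented training set.

```python
import numpy as np


def bag_of_words(docs):
    """Build a simple term-count matrix (one row per citation)."""
    vocab = sorted({tok for d in docs for tok in d.lower().split()})
    index = {tok: i for i, tok in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for row, d in enumerate(docs):
        for tok in d.lower().split():
            X[row, index[tok]] += 1
    return X


def propagate_labels(X, labels, threshold=0.5):
    """Copy the label of the most similar labelled citation to each
    unlabelled one, if their cosine similarity reaches `threshold`.

    labels: 1 = eligible, 0 = ineligible, -1 = unlabelled.
    `threshold` is a hypothetical parameter, not taken from the paper.
    """
    labels = np.asarray(labels)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.where(norms == 0, 1.0, norms)  # avoid division by zero
    sim = Xn @ Xn.T                            # pairwise cosine similarity
    labelled = np.flatnonzero(labels != -1)
    augmented = labels.copy()
    for i in np.flatnonzero(labels == -1):
        j = labelled[np.argmax(sim[i, labelled])]  # nearest labelled citation
        if sim[i, j] >= threshold:
            augmented[i] = labels[j]
    return augmented


# Toy citations (invented for illustration only).
docs = [
    "proton beam therapy for prostate cancer",  # manually labelled: eligible
    "cooking skills intervention in schools",   # manually labelled: ineligible
    "proton beam therapy for lung cancer",      # unlabelled
    "tobacco packaging warnings survey",        # unlabelled
]
labels = [1, 0, -1, -1]
augmented = propagate_labels(bag_of_words(docs), labels)
print(augmented)
```

In this toy run the third citation shares most of its terms with the first and inherits its label, while the fourth has no similar labelled neighbour and stays unlabelled. A classifier would then be trained on the manually and automatically labelled citations together.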
Fig. 1. Architecture of the semi-supervised active learning approach used for citation screening.
Fig. 2. Smoothed density function of the distances between pairs of citations in a spectral embedded feature space, which shows that the distance between two eligible citations is typically smaller than the distance between arbitrary pairs of citations.
Fig. 3. t-SNE visualisation of citations, encoded in a spectral embedded feature space, of a clinical and a public health review. Solid blue dots indicate eligible citations and red crosses indicate ineligible citations. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
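The comparison behind the Fig. 2 caption (distances between eligible pairs versus distances between arbitrary pairs) can be sketched as follows. The embedding `Z` below is a hypothetical 2-D toy example, not data from the paper, and the helper name `pairwise_distances` is an assumption; any embedding matrix with one row per citation could be substituted.

```python
from itertools import combinations

import numpy as np


def pairwise_distances(Z, indices):
    """Euclidean distances between all unordered pairs drawn from `indices`."""
    return np.array(
        [np.linalg.norm(Z[i] - Z[j]) for i, j in combinations(indices, 2)]
    )


# Hypothetical embedded citations: the first three are marked eligible.
Z = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],   # eligible
    [1.0, 1.0], [2.0, 0.5], [0.5, 2.0],   # ineligible
])
eligible = np.array([True, True, True, False, False, False])

d_eligible = pairwise_distances(Z, np.flatnonzero(eligible))
d_all = pairwise_distances(Z, range(len(Z)))

print("mean distance, eligible pairs:", d_eligible.mean())
print("mean distance, all pairs:     ", d_all.mean())
```

If eligible citations cluster together, as the caption reports, the first mean is smaller than the second; a density plot of the two distance sets would give a picture analogous to the one described.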
Characteristics of the employed systematic review datasets.
| Dataset | Domain | # Instances | # eligible / # ineligible |
|---|---|---|---|
| Proton Beam | Clinical | 4751 | 0.05 |
| COPD | Clinical | 1606 | 0.14 |
| Cooking Skills | Public health | 11,515 | 0.02 |
| Sanitation | Public health | 5464 | 0.10 |
| Tobacco Packaging | Public health | 3210 | 0.05 |
| Youth Development | Public health | 15,544 | 0.11 |
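To make the class imbalance in the table above concrete, the sketch below estimates the absolute number of eligible citations from the reported ratio. It assumes the last column is the ratio r = (# eligible) / (# ineligible), so that eligible ≈ N · r / (1 + r). Because the ratios are rounded to two decimals, the resulting counts are approximations only, and the function name `estimate_counts` is hypothetical.

```python
def estimate_counts(n_instances, ratio):
    """Estimate (eligible, ineligible) counts from the eligible/ineligible ratio."""
    eligible = round(n_instances * ratio / (1 + ratio))
    return eligible, n_instances - eligible


# Values copied from the dataset table.
datasets = {
    "Proton Beam": (4751, 0.05),
    "COPD": (1606, 0.14),
    "Cooking Skills": (11515, 0.02),
    "Sanitation": (5464, 0.10),
    "Tobacco Packaging": (3210, 0.05),
    "Youth Development": (15544, 0.11),
}

for name, (n, r) in datasets.items():
    eligible, ineligible = estimate_counts(n, r)
    print(f"{name}: ~{eligible} eligible, ~{ineligible} ineligible")
```

For Proton Beam, for instance, this gives roughly 226 eligible citations out of 4751, which illustrates how few positive examples are available to a classifier early in screening.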
Fig. 4. Utility performance achieved by certainty and uncertainty-based active learning models when applied to a clinical (Proton Beam) and a public health (Tobacco Packaging) review.
Average utility performance (%) of certainty-based and uncertainty-based active learning models () when 5%, 10%, 25% and 100% of the instances are used for training, across two clinical reviews (COPD and Proton Beam) and four public health reviews (Cooking Skills, Sanitation, Tobacco Packaging and Youth Development). Emboldened values indicate the highest utility performance for a given seed size and dataset. The table also reports the average standard deviation (average SD) of utility values across 10 runs, while the last two rows report the average gain in utility over the baseline AL method achieved by the two semi-supervised methods, SemiBow and SemiSpectral, across all six systematic review datasets. The superscript ★ indicates that the corresponding semi-supervised method significantly outperformed the AL method (across the datasets with a one-tailed sign test with at a level of ).
| Dataset | Method | 5% screened | 10% screened | 25% screened | 100% screened |
|---|---|---|---|---|---|
| COPD | AL | 60.92/73.71 | 64.37/80.63 | 78.88/90.10 | 92.35/95.14 |
| COPD | SemiBow | 65.33/77.26 | 75.41/80.57 | 86.19/89.36 | 94.19/94.91 |
| COPD | SemiSpectral | 65.30/ | 74.35/ | 85.65/ | 94.06/ |
| Proton Beam | AL | 47.57/79.23 | 62.57/88.31 | 82.65/94.39 | 93.33/96.21 |
| Proton Beam | SemiBow | 50.68/ | 68.34/88.90 | 84.94/94.72 | 93.84/96.31 |
| Proton Beam | SemiSpectral | 53.65/79.57 | 70.49/ | 86.03/ | 94.12/ |
| Cooking Skills | AL | 46.66/47.59 | 59.40/60.56 | 75.17/73.69 | 89.68/88.59 |
| Cooking Skills | SemiBow | 56.26/57.13 | 68.05/66.11 | 80.75/76.77 | 91.43/89.07 |
| Cooking Skills | SemiSpectral | | | | |
| Sanitation | AL | 24.44/ | 32.10/32.23 | 52.09/48.49 | 82.49/80.63 |
| Sanitation | SemiBow | 24.27/24.68 | 35.37/32.82 | 54.54/48.36 | 83.18/80.59 |
| Sanitation | SemiSpectral | 24.37/17.30 | | | |
| Tobacco Packaging | AL | 45.70/43.48 | 53.96/55.79 | 75.35/72.70 | 90.85/90.06 |
| Tobacco Packaging | SemiBow | 50.27/55.61 | 61.92/62.79 | 78.66/75.78 | 91.68/90.56 |
| Tobacco Packaging | SemiSpectral | 54.70/ | 63.98/ | | |
| Youth Development | AL | 22.71/28.09 | 31.34/36.52 | 51.97/56.34 | 82.81/83.66 |
| Youth Development | SemiBow | 32.61/41.43 | 42.62/46.48 | 61.77/60.02 | 85.69/84.43 |
| Youth Development | SemiSpectral | 36.40/ | 44.13/ | 62.29/ | |
| Average SD | AL | | | | |
| Average SD | SemiBow | | | | |
| Average SD | SemiSpectral | | | | |
| Average gain over AL | SemiBow | 5.23/6.33 | 7.99★/3.93 | 5.12★/1.55★ | 1.42★/0.26★ |
| Average gain over AL | SemiSpectral | 7.85/8.62 | 9.59★/6.73 | 6.07★/4.11★ | 1.70★/0.99★ |
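The one-tailed sign test mentioned in the table caption can be sketched on values taken from the table itself. The example below uses the first value of each pair in the 10% column for the AL and SemiBow rows of the six datasets. The function name `one_tailed_sign_test` and the tie handling (ties are dropped) are assumptions; the paper's exact test procedure is not given in this excerpt.

```python
from math import comb


def one_tailed_sign_test(baseline, method):
    """Return (wins, n, p) for H1: `method` tends to exceed `baseline`.

    Under H0 each non-tied difference is positive with probability 0.5,
    so p = P(X >= wins) with X ~ Binomial(n, 0.5).
    """
    diffs = [m - b for b, m in zip(baseline, method) if m != b]  # drop ties
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    p = sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n
    return wins, n, p


# First value of each pair, 10% column. Dataset order: COPD, Proton Beam,
# Cooking Skills, Sanitation, Tobacco Packaging, Youth Development.
al_10 = [64.37, 62.57, 59.40, 32.10, 53.96, 31.34]
semibow_10 = [75.41, 68.34, 68.05, 35.37, 61.92, 42.62]

wins, n, p_value = one_tailed_sign_test(al_10, semibow_10)
mean_gain = sum(m - b for b, m in zip(al_10, semibow_10)) / len(al_10)

print(f"SemiBow beats AL on {wins} of {n} datasets, p = {p_value:.4f}")
print(f"mean gain = {mean_gain:.3f}")
```

SemiBow exceeds AL on all six datasets for this column, so the one-tailed p-value is 1/64, and the mean gain computed from the rows agrees with the 7.99 reported in the "Average gain over AL" row up to rounding.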