Seunghee Kim1, Jinwook Choi2. 1. Department of Biomedical Engineering, College of Medicine, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul 110-799, Republic of Korea. 2. Department of Biomedical Engineering, College of Medicine, Seoul National University, 103 Daehak-ro, Jongno-gu, Seoul 110-799, Republic of Korea. Electronic address: jinchoi@snu.ac.kr.
Abstract
OBJECTIVE: To determine whether SVM-based classifiers trained on a combination of included and commonly excluded articles are useful to experts screening journal articles for inclusion in new systematic reviews. METHODS: Test collections were built from the annotated reference files of 19 procedure and 4 drug systematic reviews. The classifiers were trained on balanced data sets drawn by random sampling. Two kinds of balanced data set were compared: one combining included articles with commonly excluded articles, and one combining included articles with excluded articles. The area under the ROC curve (AUC) was used as the evaluation metric. RESULTS: Classifiers trained on the balanced data set of included and commonly excluded articles achieved significantly higher AUCs than classifiers trained on the balanced data set of included and excluded articles. CONCLUSION: Automatic, high-quality article classifiers based on machine learning could reduce the workload of experts performing systematic reviews when topic-specific data are scarce. In particular, a combination of included and commonly excluded articles is more helpful as training data than a combination of included and excluded articles.
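The balanced sampling and AUC evaluation described in the METHODS can be illustrated with a minimal sketch. The abstract does not give the sampling details, so this assumes that balancing means randomly undersampling the larger class to the size of the smaller one. The function names `balanced_sample` and `auc` and the toy article identifiers are hypothetical, and the SVM training step itself is omitted.

```python
import random


def balanced_sample(included, excluded, seed=0):
    """Randomly undersample the larger class so both classes are the same size.

    Assumption: 'balanced' means equal numbers of positive and negative
    examples; the abstract does not state the exact procedure.
    `excluded` can be either the excluded or the commonly excluded pool.
    """
    rng = random.Random(seed)
    n = min(len(included), len(excluded))
    positives = rng.sample(included, n)
    negatives = rng.sample(excluded, n)
    return [(doc, 1) for doc in positives] + [(doc, 0) for doc in negatives]


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data: 5 included articles and 50 candidate negatives (hypothetical IDs).
included = [f"inc-{i}" for i in range(5)]
excluded = [f"exc-{i}" for i in range(50)]
train = balanced_sample(included, excluded)

# Hypothetical classifier scores for four held-out articles.
example_auc = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

Under these assumptions, `train` holds five positive and five negative examples, and the same `auc` function can be applied to the scores of classifiers trained on either negative pool to compare them.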