Yannis Papanikolaou¹, Grigorios Tsoumakas², Manos Laliotis³, Nikos Markantonatos⁴, Ioannis Vlahavas².
Abstract
BACKGROUND: In this paper we present the approach that we employed to deal with large-scale multi-label semantic indexing of biomedical papers. This work was mainly implemented within the context of the BioASQ challenge (2013-2017), a challenge concerned with biomedical semantic indexing and question answering.
Keywords: BioASQ; Machine learning; Multi-label ensemble; Multi-label learning; Semantic indexing; Supervised learning
Year: 2017 PMID: 28938902 PMCID: PMC5610407 DOI: 10.1186/s13326-017-0150-0
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Number of articles added to MEDLINE each year from 1950 to 2017
Fig. 2 Labels per document for a subset of 4.3 million MEDLINE references
Fig. 3 MeSH concept frequencies for a subset of 4.3 million MEDLINE references
Characteristics of the aforementioned multi-label ensemble methods and MULE
| Ensemble method | Composition | Combination scheme | Combination level |
|---|---|---|---|
| ECC [ | Homogeneous | Fusion | Global |
| EPS [ | Homogeneous | Fusion | Global |
| HTD, HBAYES, HTPR [ | Homogeneous | Fusion | Global |
| Bayesian Networks Ensemble [ | Homogeneous | Fusion | Global |
| Tahir et al. [ | Heterogeneous | Fusion | Global |
| Yepes et al. [ | Heterogeneous | Selection | Local |
| MULE | Heterogeneous | Selection | Local |
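Table 1 places MULE, like Yepes et al., among heterogeneous ensembles with local selection: a different component model may be chosen for each label. The sketch below illustrates only that general idea, under simplifying assumptions: each component model's validation predictions are available as one label set per document, and for each label the model with the highest per-label F1 is kept. The names `select_model_per_label` and `combine` are hypothetical, and the sketch omits the statistical test that MULE adds, so it should not be read as the authors' algorithm.

```python
from typing import Dict, List, Set


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_model_per_label(
    gold: List[Set[str]],
    predictions: Dict[str, List[Set[str]]],
    labels: List[str],
) -> Dict[str, str]:
    """For every label, return the name of the model with the best validation F1.

    Hypothetical helper: ties keep the model that appears first in `predictions`.
    """
    choice: Dict[str, str] = {}
    for label in labels:
        best_name, best_score = None, -1.0
        for name, preds in predictions.items():
            tp = fp = fn = 0
            for g, p in zip(gold, preds):
                in_gold, in_pred = label in g, label in p
                tp += int(in_gold and in_pred)
                fp += int(in_pred and not in_gold)
                fn += int(in_gold and not in_pred)
            score = f1(tp, fp, fn)
            if score > best_score:
                best_name, best_score = name, score
        choice[label] = best_name
    return choice


def combine(choice: Dict[str, str], doc_predictions: Dict[str, Set[str]]) -> Set[str]:
    """Build one document's label set, taking each label only from its chosen model."""
    return {label for label, name in choice.items() if label in doc_predictions[name]}
```

Selecting per label rather than per data set is what makes the scheme "local": a model that is weak overall can still be used for the few labels where it does best.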
Chronological period covered by the training, validation and test sets for both data sets
| | Period |
|---|---|
| Data set A | |
| Training set | October 2007 - January 2012 |
| Validation set | December 2012 - July 2013 |
| Test set | July 2013 - January 2014 |
| Data set B | |
| Training set | July 2013 - October 2013 |
| Validation set | October 2013 - December 2013 |
| Test set | December 2013 - January 2014 |
Performance of component models for the test sets of data sets A and B
| Model | Micro-F (A) | Micro-F (B) | Macro-F (A) | Macro-F (B) |
|---|---|---|---|---|
| Meta-Labeler | 0.58555 | 0.49853 | 0.54884 | 0.43381 |
| Vanilla SVM | 0.55675 | 0.41254 | 0.47891 | 0.35355 |
| Tuned SVM | 0.56653 | 0.45631 | 0.51022 | 0.37922 |
| LLDA | 0.36983 | 0.38873 | 0.30100 | 0.37140 |
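The component models are scored with both micro-F and macro-F. Micro-F pools true positives, false positives and false negatives over all labels before computing F1, so frequent MeSH headings dominate it; macro-F averages the per-label F1 scores, so rare headings weigh as much as frequent ones. A minimal sketch of both measures, assuming gold and predicted annotations are given as one label set per document; scoring a label that never occurs in either as 0 is a convention chosen here, not taken from the paper.

```python
from typing import List, Set, Tuple


def micro_macro_f(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (micro-F, macro-F) over the given label vocabulary."""
    tp = dict.fromkeys(labels, 0)
    fp = dict.fromkeys(labels, 0)
    fn = dict.fromkeys(labels, 0)
    for g, p in zip(gold, pred):
        for label in labels:
            if label in p and label in g:
                tp[label] += 1
            elif label in p:
                fp[label] += 1
            elif label in g:
                fn[label] += 1

    def f1(t: int, f_p: int, f_n: int) -> float:
        denom = 2 * t + f_p + f_n
        return 2 * t / denom if denom else 0.0

    # Micro: sum counts over labels first, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    return micro, macro
```

For example, with gold sets `[{"A"}, {"A"}, {"A"}, {"B"}]` and a model that always predicts `{"A"}`, micro-F is 0.75 while macro-F is about 0.43, because the missed rare label B pulls only the macro average down.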
Comparison of the three ensemble methods for both data sets with respect to the micro-F measure
| Data set | Component models combined | Improve micro-F | Improve F [ | MULE |
|---|---|---|---|---|
| A | | | | |
| | ✓ ✓ | 0.58546 | 0.58127 △ | 0.58705 |
| | ✓ ✓ | 0.58601 | 0.58260 △ | 0.58734 |
| | ✓ ✓ | 0.55522 | 0.52144 △ | 0.55675 |
| | ✓ ✓ ✓ | 0.57246 | 0.54166 △ | 0.57458 |
| | ✓ ✓ ✓ ✓ | 0.58695 | 0.55836 △ | 0.58919 |
| B | | | | |
| | ✓ ✓ | 0.50136 | 0.49445 △ | 0.50435 |
| | ✓ ✓ | 0.50144 | 0.49329 △ | 0.50522 |
| | ✓ ✓ | 0.44159 | 0.42726 △ | 0.44304 |
| | ✓ ✓ ✓ | 0.46247 | 0.45685 △ | 0.45868 |
| | ✓ ✓ ✓ ✓ | 0.50058 | 0.49227 △ | 0.50353 |
“Improve micro-F” is the initial version of MULE, without the statistical test. “Improve F” is the method proposed by [13]. A △ symbol indicates that the difference from the best-performing model is statistically significant according to a z-test at a significance level of 0.05.
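The note above says only that significance is assessed with a z-test at the 0.05 level; it does not state which quantity is tested. As an illustration, the sketch below assumes a standard two-proportion z-test with a pooled standard error and a two-sided p-value, for instance comparing the fraction of correct label decisions made by two methods. The function names are hypothetical.

```python
import math


def two_proportion_z(successes_a: int, n_a: int, successes_b: int, n_b: int) -> float:
    """z statistic for H0: the two underlying proportions are equal (pooled variance)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se if se else 0.0


def is_significant(z: float, alpha: float = 0.05) -> bool:
    """Two-sided test: p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))."""
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_value < alpha
```

At alpha = 0.05 the two-sided rule is equivalent to |z| > 1.96. With 60 versus 40 successes out of 100 each, z is about 2.83 and the difference is significant; with 52 versus 48 it is not.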
Comparison of the three ensemble methods for both data sets with respect to the macro-F measure
| Data set | Component models combined | Improve F [ | MULE Macro |
|---|---|---|---|
| A | | | |
| | ✓ ✓ | 0.53390 △ | 0.54820 |
| | ✓ ✓ | 0.53221 △ | 0.54921 |
| | ✓ ✓ | 0.42563 △ | 0.47918 |
| | ✓ ✓ ✓ | 0.49437 △ | 0.51099 |
| | ✓ ✓ ✓ ✓ | 0.52487 △ | 0.54847 |
| B | | | |
| | ✓ ✓ | 0.42573 △ | 0.43342 |
| | ✓ ✓ | 0.42429 △ | 0.43212 |
| | ✓ ✓ | 0.37556 | 0.37335 |
| | ✓ ✓ ✓ | 0.38149 | 0.38058 |
| | ✓ ✓ ✓ ✓ | 0.42240 △ | 0.43324 |
Comparison of the three ensemble methods regarding the number of labels predicted by each model
| Ensemble method | # of labels predicted by each model | | | |
|---|---|---|---|---|
| Data set A | | | | |
| Improve micro-F | 10751 | 15002 | | |
| Improve F [ | 11256 | 14497 | | |
| MULE | 25192 | 561 | | |
| Improve micro-F | 19549 | 6204 | | |
| Improve F [ | 15293 | 10460 | | |
| MULE | 25322 | 431 | | |
| Improve micro-F | 18862 | 6891 | | |
| Improve F [ | 12900 | 12853 | | |
| MULE | 25702 | 51 | | |
| Improve micro-F | 8213 | 17037 | 503 | |
| Improve F [ | 8723 | 16351 | 679 | |
| MULE | 25210 | 526 | 17 | |
| Improve micro-F | 10066 | 2938 | 2499 | 250 |
| Improve F [ | 10887 | 2815 | 11782 | 269 |
| MULE | 24814 | 174 | 760 | 5 |
| Data set B | | | | |
| Improve micro-F | 4252 | 12059 | | |
| Improve F [ | 4699 | 11612 | | |
| MULE | 16053 | 258 | | |
| Improve micro-F | 9342 | 6969 | | |
| Improve F [ | 10920 | 5391 | | |
| MULE | 15826 | 485 | | |
| Improve micro-F | 1500 | 14811 | | |
| Improve F [ | 801 | 15510 | | |
| MULE | 15998 | 313 | | |
| Improve micro-F | 1804 | 12774 | 1733 | |
| Improve F [ | 1732 | 12688 | 1891 | |
| MULE | 16121 | 38 | 152 | |
| Improve micro-F | 3817 | 494 | 11331 | 669 |
| Improve F [ | 4198 | 400 | 11053 | 660 |
| MULE | 15736 | 144 | 117 | 43 |
The numbers are given for the micro-F optimization (first series of experiments)
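Each row of the table above counts how many labels an ensemble method assigned to each component model in a given combination. Assuming the outcome of per-label selection is stored as a mapping from each label to the name of the chosen component model (a hypothetical representation, as is the function name), the counts for one row could be tallied like this:

```python
from collections import Counter
from typing import Dict, List


def labels_per_model(choice: Dict[str, str], models: List[str]) -> List[int]:
    """Number of labels assigned to each model, in the order given by `models`."""
    counts = Counter(choice.values())
    # Counter returns 0 for models that were never selected.
    return [counts[m] for m in models]
```

A model that is selected for very few labels, as with several MULE rows above, simply receives a small count rather than being dropped from the row.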
Average frequency of labels for the labelsets selected by each algorithm
| Average label frequency |
|---|
| 16.98 |
| 182.87 |
| 208.54 |
| 129.35 |
The results shown are for data set A and the combination of all models
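The statistic in the table above is the average frequency of the labels in a selected labelset. A minimal sketch, assuming a label's frequency is the number of training documents annotated with it; the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def average_label_frequency(labelset: Iterable[str], train_docs: List[Set[str]]) -> float:
    """Mean number of training documents carrying each label in `labelset`."""
    freq = Counter(label for doc in train_docs for label in doc)
    labels = list(labelset)
    # An empty labelset has no defined average; 0.0 is a convention chosen here.
    return sum(freq[l] for l in labels) / len(labels) if labels else 0.0
```

Because each document's annotations are a set, a label contributes at most once per document, so the count equals its document frequency.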
Performance for training sets going back in time
| Size | Date | Micro-F | Macro-F |
|---|---|---|---|
| 100,000 | December 2012 - July 2013 | 0.5591 | 0.3616 |
| 250,000 | January 2012 - July 2013 | 0.5827 | 0.4567 |
| 500,000 | August 2010 - July 2013 | 0.5941 | 0.5130 |
| 750,000 | January 2009 - July 2013 | 0.5977 | 0.5358 |
| 1,000,000 | August 2007 - July 2013 | 0.5993 | 0.5480 |
| 1,500,000 | July 2004 - July 2013 | 0.5995 | 0.5637 |
| 2,000,000 | August 2001 - July 2013 | 0.5963 | 0.5652 |
| 4,300,000 | December 1946 - July 2013 | 0.58646 | 0.56014 |
A fixed test set of 50k abstracts, covering July 2013 to January 2014, is used for every training-set size.
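Every training set in the table above ends in July 2013, just before the fixed test period, and larger sets reach further back in time. A sketch of how such nested training sets could be drawn, assuming each document carries a date and the N most recent documents before the test period are kept; the exact sampling procedure is not given here, and the function name is hypothetical.

```python
from datetime import date
from typing import List, Tuple


def most_recent_training_set(
    docs: List[Tuple[date, str]], size: int, cutoff: date
) -> List[str]:
    """Return the IDs of the `size` most recent documents dated strictly before `cutoff`."""
    # Keep only documents that precede the test period, so training and test never overlap.
    eligible = [(d, doc_id) for d, doc_id in docs if d < cutoff]
    # Newest first, so a larger `size` extends further back in time.
    eligible.sort(key=lambda item: item[0], reverse=True)
    return [doc_id for _, doc_id in eligible[:size]]
```

Taking the newest documents first means each smaller training set is contained in every larger one, which keeps the comparison across sizes focused on how far back the data reaches.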
Fig. 4 Micro-F and macro-F measures (left and right figures, respectively) against the number of training documents (in thousands)
Fig. 5 Micro-F and macro-F measures (left and right, respectively) for 20 equal-sized test sets ranging from 2007 to 2013