Vincent Vigneron, Hichem Maaref.
Abstract
The goal of classifier combination can be stated briefly as combining the decisions of individual classifiers to obtain a better classifier. In this paper, we propose a method based on the combination of weak rank classifiers, because rankings contain more information than unique choices for a many-class problem. We consider the problem of combining the decisions of more than one classifier whose raw outputs take the form of candidate class rankings, and formulate it as a general discrete optimization problem whose objective function is based on the distance between the data and the consensus decision. This formulation uses certain performance statistics about the joint behavior of the ensemble of classifiers. Assuming that each classifier produces a ranking list of classes, an initial approach leads to a binary linear programming problem with a simple, globally optimal solution. The consensus function can be viewed as a mapping from a set of individual rankings to a combined ranking that leads to the most relevant decision. We also propose an information measure that quantifies the degree of consensus between the classifiers, in order to assess the strength of the combination rule used. The method is easy to implement and does not require any training. The main conclusion is that the classification rate is strongly improved by combining rank classifiers globally. The proposed algorithm is tested on real cytology image data for the detection of cervical cancer.
Keywords: HPV; aggregation; binary linear programming; cervical cancer; classifier combination; data fusion; independence; mutual information; plurality voting; rank; total order
Year: 2019 PMID: 33267154 PMCID: PMC7514928 DOI: 10.3390/e21050440
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
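The consensus idea described in the abstract can be sketched in a few lines, here as a brute-force Kemeny-style search that minimizes the summed pairwise disagreement between a candidate ranking and the classifiers' rankings. The names and the toy data are illustrative; the paper solves this kind of objective via binary linear programming rather than enumeration:

```python
from itertools import combinations, permutations

def disagreement(r1, r2):
    """Kendall-tau-style disagreement: number of class pairs that the
    two rankings order differently (rank dicts: class -> position)."""
    return sum(
        1
        for a, b in combinations(r1, 2)
        if (r1[a] - r1[b]) * (r2[a] - r2[b]) < 0
    )

def consensus(rankings):
    """Brute-force consensus ranking: the permutation of classes that
    minimizes the summed disagreement with every classifier's ranking.
    (Exhaustive search, so only feasible for small class counts.)"""
    classes = sorted(rankings[0])
    best, best_cost = None, float("inf")
    for perm in permutations(classes):
        cand = {c: i for i, c in enumerate(perm)}
        cost = sum(disagreement(cand, r) for r in rankings)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best

# Three hypothetical classifiers rank classes A, B, C (0 = top choice).
votes = [
    {"A": 0, "B": 1, "C": 2},
    {"A": 0, "C": 1, "B": 2},
    {"B": 0, "A": 1, "C": 2},
]
cons = consensus(votes)
print(sorted(cons, key=cons.get))  # consensus order, best class first
```

The exhaustive search makes the global-optimum property of the objective easy to see on toy data; the binary linear program in the paper is what makes it tractable at scale.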
Figure 1. General framework for classifier combination: each classifier produces an output vector, and the combination function produces a final decision vector from these outputs.
Confusion matrix of a classifier, used in the Bayesian approach to estimate the probability that the classifier ranks a given true class jth.
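One plausible reading of this confusion matrix is that row-normalizing the counts yields the per-class rank probabilities used in the Bayesian approach. A minimal sketch, assuming (as the caption suggests) that entry (i, j) counts how often true class i was ranked jth:

```python
import numpy as np

def rank_probabilities(confusion):
    """Row-normalize a rank confusion matrix so that entry (i, j)
    estimates P(true class i is ranked jth) by this classifier."""
    counts = np.asarray(confusion, dtype=float)
    return counts / counts.sum(axis=1, keepdims=True)

# toy 3-class example: rows = true classes, columns = rank positions
P = rank_probabilities([[8, 1, 1],
                        [2, 6, 2],
                        [1, 3, 6]])
```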
Figure 2. Permutation matrix for the ranking produced by a classifier.
Figure 3. Condorcet matrices.
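A Condorcet matrix of this kind can be built from the individual rankings as pairwise "who beats whom" counts. A minimal sketch with illustrative names:

```python
import numpy as np

def condorcet_matrix(rankings):
    """Pairwise-preference (Condorcet) counts: entry (i, j) is the
    number of classifiers ranking class i ahead of class j.
    Each ranking is a list where rankings[m][k] is the rank of class k
    for classifier m (0 = best)."""
    n = len(rankings[0])
    N = np.zeros((n, n), dtype=int)
    for r in rankings:
        for i in range(n):
            for j in range(n):
                if r[i] < r[j]:
                    N[i, j] += 1
    return N

# three classifiers ranking three classes (0 = best)
N = condorcet_matrix([[0, 1, 2], [0, 2, 1], [1, 0, 2]])
```

By construction N[i, j] + N[j, i] equals the number of classifiers whenever all rankings are total orders, which is the property the Condorcet combination rule exploits.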
Dataset characteristics.
| HPV Test | Total Number of Cells | Number (or %) of Debris | Number (or %) of Cancer Cells |
|---|---|---|---|
| positive | 405 | 78 | 49 |
| positive | 114 | 19 | 8 |
| positive | 206 | 31 | 13 |
| positive | 448 | 30 | 2 |
| positive | 519 | 70 | 33 (0.06) |
| negative | 137 | 13 | – |
| negative | 76 | 5 | – |
| negative | 211 | 84 | – |
| negative | 251 | 31 | – |
| negative | 251 | 52 | – |
| negative | 257 | 40 | – |
| negative | 223 | 24 | – |
| negative | 691 | 155 | – |
| negative | 67 | 23 | – |
| Total | 3857 | 655 | 105 |
Figure 4. Images of cervical cells colored with Papanicolaou stain. (a) Clumps of abnormal cells with large nuclei. (b) Abnormal cells with dense nuclei.
Overview of the studied dataset.
| | No. of Patients | No. of Nuclei | No./Type of Data |
|---|---|---|---|
| control patients | 9 | 2165 | 427/noisy objects |
| risky patients | 5 | 1692 | 105/atypical nuclei |
| | – | – | 228/noisy objects |
| Total | 14 | 3857 | 760 objects |
Figure 5. Overview of the processing chain.
Classification results with disagreement and Condorcet combination rules using a set of M classifiers.
| Disagreement Distance | | | | | Condorcet Distance | | | | |
|---|---|---|---|---|---|---|---|---|---|
| | | Error Rate | FPR | FNR | | | Error Rate | FPR | FNR |
| 0.873 | 1321 | 0.0800 | 0.0934 | 0.0644 | 0.901 | 1321 | 0.0820 | 0.0870 | 0.0777 |
| 0.901 | 1073 | 0.0544 | 0.0576 | – | 0.906 | 1073 | 0.0572 | 0.0566 | 0.0561 |
| 0.866 | 907 | 0.0560 | 0.0553 | 0.0562 | 0.966 | 907 | 0.0529 | – | – |
| 0.895 | 845 | 0.0524 | 0.0593 | – | 0.920 | 845 | 0.0508 | 0.0465 | – |
| 0.870 | 765 | 0.0502 | 0.0500 | – | 0.822 | 765 | 0.0516 | 0.0555 | 0.0493 |
| 0.800 | 538 | 0.0636 | 0.0675 | 0.0600 | 0.817 | 538 | 0.0664 | 0.0718 | 0.0592 |
| 0.792 | 302 | 0.0728 | 0.0766 | 0.0710 | 0.744 | 302 | 0.1020 | 0.0923 | 0.1126 |
| 0.781 | 205 | 0.0896 | 0.0906 | 0.0864 | 0.757 | 205 | 0.1472 | 0.1015 | 0.1968 |
| 0.759 | 120 | 0.1292 | 0.1157 | 0.1439 | 0.776 | 120 | 0.1472 | 0.1118 | 0.1846 |
| 0.697 | 95 | 0.1424 | 0.1023 | 0.1809 | 0.660 | 95 | 0.1424 | 0.0985 | 0.1888 |
| 0.706 | 66 | 0.1392 | 0.1098 | 0.1667 | 0.689 | 66 | 0.1548 | 0.1055 | 0.2090 |
| 0.739 | 49 | 0.1328 | 0.0840 | 0.1854 | 0.672 | 49 | 0.1568 | 0.1117 | 0.2067 |
| 0.694 | 38 | 0.1460 | 0.1152 | 0.1770 | 0.516 | 38 | 0.1772 | 0.1404 | 0.2233 |
| 0.643 | 19 | 0.1540 | 0.1294 | 0.1801 | 0.484 | 19 | 0.1696 | 0.1323 | 0.2032 |
| 0.496 | 13 | 0.1644 | 0.1127 | 0.2224 | 0.455 | 13 | 0.1728 | 0.1261 | 0.2248 |
| 0.561 | 4 | 0.1580 | 0.1238 | 0.1962 | 0.477 | 4 | 0.1668 | 0.1234 | 0.2130 |
Figure 6. Graphic representations of the classification results for disagreement (blue) and Condorcet (red) distances.
Results obtained for the sparse k-means (SkM), general sparse multi-class linear discriminant analysis (GSM-LDA), and sparse EM (sEM) algorithms: average and standard error of the clustering error rate, false positive rate (FPR), and false negative rate (FNR) over 20 simulations.
| Algorithm | Error Rate | FPR | FNR |
|---|---|---|---|
| SkM | 0.192 | 0.205 | 0.165 |
| GSM-LDA | 0.159 | 0.133 | 0.118 |
| sEM | 0.090 | 0.077 | 0.062 |
Proposed rank classifier combination using disagreement and Condorcet distances.
| Digits | Classifier Ranks | | | | Proposed Rank | |
|---|---|---|---|---|---|---|
| 0 | 1 | 4 | 3 | 10 | 3 | 3 |
| 1 | 2 | 2 | 1 | 2 | 2 | 2 |
| 2 | 3 | 1 | 2 | 1 | 1 | 1 |
| 3 | 4 | 6 | 4 | 3 | 4 | 4 |
| 4 | 5 | 5 | 7 | 5 | 5 | 6 |
| 5 | 6 | 3 | 6 | 4 | 6 | 5 |
| 6 | 7 | 8 | 5 | 6 | 7 | 7 |
| 7 | 8 | 7 | 8 | 9 | 8 | 8 |
| 8 | 9 | 10 | 10 | 8 | 10 | 10 |
| 9 | 10 | 9 | 9 | 7 | 9 | 9 |
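For intuition, a simple Borda-style rank sum over the four classifier columns of this table already recovers much of the combined ordering. This is only a baseline aggregator for comparison, not the paper's disagreement/Condorcet consensus:

```python
# Borda-style baseline on the four classifier rankings from the table
# above (classifier-rank columns). Lower rank sum = stronger consensus
# that the digit is the true class.
ranks = {
    0: [1, 4, 3, 10], 1: [2, 2, 1, 2],  2: [3, 1, 2, 1],
    3: [4, 6, 4, 3],  4: [5, 5, 7, 5],  5: [6, 3, 6, 4],
    6: [7, 8, 5, 6],  7: [8, 7, 8, 9],  8: [9, 10, 10, 8],
    9: [10, 9, 9, 7],
}
order = sorted(ranks, key=lambda d: sum(ranks[d]))
print(order)  # digits from most to least plausible
```

Note that rank sums cannot break ties in a principled way (digits 1 and 2 tie here), which is one reason distance-based consensus formulations are preferable.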