Luisa Cutillo, Annamaria Carissimo, Silvia Figini.
Abstract
We consider the problem of finding the set of rankings that best represents a given group of orderings on the same collection of elements (preference lists). This problem arises from social choice and voting theory, in which each voter gives a preference on a set of alternatives, and a system outputs a single preference order based on the observed voters' preferences. In this paper, we observe that, if the given set of preference lists is not homogeneous, a unique true underlying ranking might not exist. Moreover, only the lists that share the highest amount of information should be aggregated, and thus multiple rankings might provide a more feasible solution to the problem. In this light, we propose Network Selection, an algorithm that, given a heterogeneous group of rankings, first discovers the different communities of homogeneous rankings and then combines only the rank orderings belonging to the same community into a single final ordering. Our novel approach is inspired by graph theory; indeed, our set of lists can be loosely read as the nodes of a network. As a consequence, only the lists populating the same community in the network are aggregated. To highlight the strength of our proposal, we show an application on both simulated data and two real datasets, namely a financial and a biological dataset. Experimental results on simulated data show that Network Selection can significantly outperform existing related methods. In addition, the empirical evidence obtained on real financial data reveals that Network Selection is also able to select the most relevant variables in data mining predictive models, yielding clearly superior predictive power in the models built. Furthermore, we show the potential of our proposal in the bioinformatics field, providing an application to a biological microarray dataset.
Year: 2012 PMID: 22937075 PMCID: PMC3427185 DOI: 10.1371/journal.pone.0043678
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
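The two-stage idea in the abstract (first group homogeneous rankings into communities, then aggregate only within each community) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the normalized Kendall tau distance, the fixed `threshold` used to link rankings, connected components as the community definition, and a Borda-style sum of positions as the aggregation rule. The function names are hypothetical.

```python
from itertools import combinations


def kendall_distance(a, b):
    """Normalized Kendall tau distance between two rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(b)}
    seq = [pos_b[item] for item in a]
    n = len(seq)
    # Count item pairs that the two rankings order differently.
    discordant = sum(1 for i, j in combinations(range(n), 2) if seq[i] > seq[j])
    return discordant / (n * (n - 1) / 2)


def communities(rankings, threshold):
    """Connected components of the graph linking rankings closer than threshold.

    Assumption: a simple stand-in for a community detection step.
    """
    n = len(rankings)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if kendall_distance(rankings[i], rankings[j]) < threshold:
            adj[i].add(j)
            adj[j].add(i)
    seen, groups = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        groups.append(sorted(comp))
    return groups


def borda(rankings):
    """Aggregate by summed positions (lower is better); ties broken by item name."""
    score = {}
    for r in rankings:
        for pos, item in enumerate(r):
            score[item] = score.get(item, 0) + pos
    return sorted(score, key=lambda item: (score[item], item))


def network_selection_sketch(rankings, threshold=0.3):
    """Return one aggregated ranking per detected community."""
    return [borda([rankings[i] for i in group])
            for group in communities(rankings, threshold)]


if __name__ == "__main__":
    lists = [list("abcde"), list("abced"), list("bacde"),
             list("edcba"), list("edcab")]
    for ranking in network_selection_sketch(lists):
        print(ranking)
```

With two clearly opposed groups of lists, as in the usage example, the sketch returns two consensus rankings instead of forcing a single compromise ordering, which is the behavior the abstract argues for.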
Scenario 1A average within distance and relative standard deviation for communities simulated with and .
| Average within distance | Relative standard deviation |
| 0.210159967 | 0.02889782 |
| 0.209791857 | 0.02889778 |
| 0.210035548 | 0.02895248 |
| 0.209975231 | 0.02892337 |
Scenario 1A average between distance and relative standard deviation for communities simulated with and .
| Average between distance | Relative standard deviation |
| 0.64152464 | 0.028919 |
| 0.64156696 | 0.028832 |
| 0.64155527 | 0.028768 |
| 0.64174671 | 0.028965 |
| 0.64177845 | 0.028846 |
| 0.64156148 | 0.028891 |
Scenario 2 average within distance and relative standard deviation for communities simulated with and .
| Average within distance | Relative standard deviation |
| 0.209805957 | 0.028877625 |
| 0.209939113 | 0.028840451 |
| 0.210040162 | 0.028895175 |
| 0.210083126 | 0.028848004 |
Scenario 2 average between distance and relative standard deviation for communities simulated with and .
| Average between distance | Relative standard deviation |
| 0.641569346 | 0.028915927 |
| 0.641501867 | 0.028864567 |
| 0.641801049 | 0.028848633 |
| 0.641560463 | 0.028739462 |
| 0.641760131 | 0.028849104 |
| 0.641406958 | 0.028983586 |
Predictive models on the whole data set.
| Model | decile | LIFT | CCR |
| Tree | 1 | 2.77 | 27.71 |
| Tree | 2 | 2.54 | 53.07 |
| Tree | 3 | 1.56 | 68.70 |
| Tree | 4 | 0.57 | 74.36 |
| Tree | 5 | 0.46 | 78.93 |
| Tree | 6 | 0.46 | 83.49 |
| Tree | 7 | 0.46 | 88.05 |
| Tree | 8 | 0.46 | 92.62 |
| Tree | 9 | 0.46 | 97.18 |
| Tree | 10 | 0.28 | 100.00 |
| Log Reg | 1 | 0.09 | 0.86 |
| Log Reg | 2 | 0.69 | 7.76 |
| Log Reg | 3 | 0.34 | 11.21 |
| Log Reg | 4 | 0.60 | 17.24 |
| Log Reg | 5 | 0.86 | 25.86 |
| Log Reg | 6 | 1.38 | 39.66 |
| Log Reg | 7 | 2.07 | 60.34 |
| Log Reg | 8 | 0.95 | 69.83 |
| Log Reg | 9 | 1.55 | 85.34 |
| Log Reg | 10 | 1.47 | 100.00 |
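The two metric columns in the table above appear to be linked: each LIFT value matches the increase in CCR over the previous decile divided by 10 (for the Tree model, 27.71 / 10 ≈ 2.77 in decile 1 and (53.07 − 27.71) / 10 ≈ 2.54 in decile 2). The sketch below shows one way such a decile table could be computed from model scores and binary outcomes. It reads CCR as the cumulative percentage of positive cases captured, which is an assumption, and the function name `decile_lift_table` is hypothetical.

```python
def decile_lift_table(scores, labels):
    """Return (decile, lift, cumulative capture %) rows, best-scored decile first.

    Assumptions: labels are 0/1 with at least one positive, and higher scores
    mean a higher predicted probability of the positive class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n = len(order)
    positives = sum(labels)
    rows, captured = [], 0
    for d in range(10):
        chunk = order[d * n // 10:(d + 1) * n // 10]
        hits = sum(labels[i] for i in chunk)
        captured += hits
        share = hits / positives         # fraction of all positives in this decile
        lift = share / (len(chunk) / n)  # relative to a random sample of equal size
        rows.append((d + 1, round(lift, 2), round(100 * captured / positives, 2)))
    return rows


if __name__ == "__main__":
    demo_scores = list(range(20, 0, -1))
    demo_labels = [1, 1, 1, 0, 0, 1] + [0] * 14
    for row in decile_lift_table(demo_scores, demo_labels):
        print(row)
```

Under this reading, a model that ranks well concentrates its lift in the first deciles, while a lift below 1 in the top deciles means the model captures fewer positives there than random selection would.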
NetSel communities extraction result on the proposed set of financial ratios.
| Supplier target days | Liquidity ratio | Cost income ratio | Trade payable ratio |
| Outside capital structure | Cash ratio | | |
| Capital tied up | Equity ratio | | |
| Cash flow to effective debt | | | |
| Liabilities ratio | | | |
| Result ratio | | | |
NetSel extracted communities within distance and relative standard deviation.
| Average within distance | Relative standard deviation |
| 0.341511596 | 0.006955172 |
| 0.33071787 | 0.11191681 |
Average distance between the NetSel extracted communities and relative standard deviation.
| Average between distance | Relative standard deviation |
| 0.605087116 | 0.04946876 |
| 0.500418 | 0.105093445 |
| 0.394367666 | 0.141355664 |
| 0.500584162 | 0.060114893 |
| 0.560789719 | 0.093720458 |
| 0.662595709 | NA |
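The within and between distance tables can be read as summaries of a pairwise distance matrix grouped by community labels. The following is a minimal sketch of that summary, assuming a symmetric distance matrix given as a list of lists and one label per item; the function name `within_between` and the use of the sample standard deviation are assumptions, not details from the paper. A standard deviation is undefined when only one pairwise distance is available, which would be consistent with the NA entry above if that row compares two single-member communities.

```python
from itertools import combinations
from statistics import mean, stdev


def within_between(dist, labels):
    """Summarize pairwise distances within and between communities.

    dist: symmetric matrix as a list of lists; labels: community label per item.
    Returns two dicts mapping a community (or a sorted pair of communities)
    to (mean distance, standard deviation or None if fewer than two distances).
    """
    within, between = {}, {}
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            within.setdefault(labels[i], []).append(dist[i][j])
        else:
            key = tuple(sorted((labels[i], labels[j])))
            between.setdefault(key, []).append(dist[i][j])

    def summarize(values):
        spread = stdev(values) if len(values) > 1 else None
        return mean(values), spread

    return ({k: summarize(v) for k, v in within.items()},
            {k: summarize(v) for k, v in between.items()})


if __name__ == "__main__":
    demo_dist = [[0.0, 0.2, 0.6, 0.7],
                 [0.2, 0.0, 0.8, 0.5],
                 [0.6, 0.8, 0.0, 0.9],
                 [0.7, 0.5, 0.9, 0.0]]
    w, b = within_between(demo_dist, ["A", "A", "B", "C"])
    print(w)
    print(b)
```

Well-separated communities show small within distances and large between distances, which is the pattern in the simulated scenarios (about 0.21 within versus about 0.64 between).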
Predictive models on .
| Model | decile | LIFT | CCR |
| Tree | 1 | 4.32 | 43.21 |
| Tree | 2 | 1.25 | 55.70 |
| Tree | 3 | 0.84 | 64.06 |
| Tree | 4 | 0.84 | 72.42 |
| Tree | 5 | 0.84 | 80.78 |
| Tree | 6 | 0.81 | 88.91 |
| Tree | 7 | 0.29 | 91.85 |
| Tree | 8 | 0.29 | 94.79 |
| Tree | 9 | 0.29 | 97.73 |
| Tree | 10 | 0.23 | 100.00 |
| Log Reg | 1 | 3.00 | 30.00 |
| Log Reg | 2 | 2.17 | 51.72 |
| Log Reg | 3 | 0.95 | 61.21 |
| Log Reg | 4 | 0.69 | 68.10 |
| Log Reg | 5 | 0.86 | 76.72 |
| Log Reg | 6 | 0.52 | 81.90 |
| Log Reg | 7 | 0.86 | 90.52 |
| Log Reg | 8 | 0.34 | 93.97 |
| Log Reg | 9 | 0.34 | 97.41 |
| Log Reg | 10 | 0.26 | 100.00 |
Predictive models on .
| Model | decile | LIFT | CCR |
| Tree | 1 | 3.17 | 31.70 |
| Tree | 2 | 2.08 | 52.47 |
| Tree | 3 | 1.94 | 71.85 |
| Tree | 4 | 0.94 | 81.29 |
| Tree | 5 | 0.72 | 88.51 |
| Tree | 6 | 0.37 | 92.23 |
| Tree | 7 | 0.37 | 95.94 |
| Tree | 8 | 0.37 | 99.66 |
| Tree | 9 | 0.03 | 100.00 |
| Tree | 10 | 0.00 | 100.00 |
| Reg | 1 | 3.10 | 31.03 |
| Reg | 2 | 1.98 | 50.86 |
| Reg | 3 | 1.72 | 68.10 |
| Reg | 4 | 1.21 | 80.17 |
| Reg | 5 | 0.60 | 86.21 |
| Reg | 6 | 0.60 | 92.24 |
| Reg | 7 | 0.34 | 95.69 |
| Reg | 8 | 0.34 | 99.14 |
| Reg | 9 | 0.09 | 100.00 |
| Reg | 10 | 0.00 | 100.00 |
Percentage of tissue samples assigned by to each community.
| tissue/community |
| 0.78 | 0.03 | 0.19 |
| 0.03 | 0.61 | 0.36 |
| 0.03 | 0.555 | 0.415 |