Vasyl Pihur, Susmita Datta, Somnath Datta.
Abstract
BACKGROUND: Researchers in the field of bioinformatics often face a challenge of combining several ordered lists in a proper and efficient manner. Rank aggregation techniques offer a general and flexible framework that allows one to objectively perform the necessary aggregation. With the rapid growth of high-throughput genomic and proteomic studies, the potential utility of rank aggregation in the context of meta-analysis becomes even more apparent. One of the major strengths of rank-based aggregation is the ability to combine lists coming from different sources and platforms, for example different microarray chips, which may or may not be directly comparable otherwise.
Year: 2009 PMID: 19228411 PMCID: PMC2669484 DOI: 10.1186/1471-2105-10-62
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Clustering algorithms ranks
| Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| APN | SM | FN | ST | KM | PM | HR | AG | CL | DI | MO |
| AD | SM | FN | KM | PM | CL | ST | DI | HR | AG | MO |
| ADM | FN | SM | ST | KM | CL | PM | DI | HR | AG | MO |
| FOM | SM | CL | KM | PM | FN | ST | DI | HR | AG | MO |
| Connectivity | HR | AG | DI | KM | MO | SM | FN | CL | PM | ST |
| Dunn | HR | AG | KM | PM | DI | SM | CL | MO | FN | ST |
| Silhouette | HR | AG | KM | SM | CL | PM | ST | DI | FN | MO |
Ten clustering algorithms ranked by 7 validation measures (in rows). A rank of 1 means that the algorithm received the best score from a particular validation measure. For example, SOM (abbreviated SM in the table) is deemed to be the best algorithm by the APN, AD, and FOM measures, while MO is ranked last by 5 out of 7 measures.
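As a rough illustration of what aggregating these seven lists involves, the sketch below computes a plain mean-rank (Borda-style) ordering from the table. This is only a simple baseline and not the CE or GA optimization discussed in the paper; the function names `rank_sums` and `mean_rank_order` are hypothetical helpers introduced for this example.

```python
# Ranked lists copied from the table above (best first), one per validation measure.
RANKED_LISTS = {
    "APN": "SM FN ST KM PM HR AG CL DI MO".split(),
    "AD": "SM FN KM PM CL ST DI HR AG MO".split(),
    "ADM": "FN SM ST KM CL PM DI HR AG MO".split(),
    "FOM": "SM CL KM PM FN ST DI HR AG MO".split(),
    "Connectivity": "HR AG DI KM MO SM FN CL PM ST".split(),
    "Dunn": "HR AG KM PM DI SM CL MO FN ST".split(),
    "Silhouette": "HR AG KM SM CL PM ST DI FN MO".split(),
}


def rank_sums(ranked_lists):
    """Sum the 1-based positions each algorithm receives across all lists."""
    totals = {}
    for ordering in ranked_lists.values():
        for position, name in enumerate(ordering, start=1):
            totals[name] = totals.get(name, 0) + position
    return totals


def mean_rank_order(ranked_lists):
    """Order algorithms by total (equivalently mean) rank; ties broken by name."""
    totals = rank_sums(ranked_lists)
    return sorted(totals, key=lambda name: (totals[name], name))


if __name__ == "__main__":
    totals = rank_sums(RANKED_LISTS)
    for name in mean_rank_order(RANKED_LISTS):
        # Print each algorithm with its mean rank over the 7 measures.
        print(name, round(totals[name] / len(RANKED_LISTS), 2))
```

A mean-rank baseline like this treats every measure equally and ignores the underlying scores, which is one reason an optimization-based aggregate can give a different ordering.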
Validation scores
| Measure | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| APN | 0.11 | 0.12 | 0.15 | 0.15 | 0.17 | 0.17 | 0.17 | 0.18 | 0.18 | 0.26 |
| AD | 1.63 | 1.67 | 1.70 | 1.71 | 1.71 | 1.74 | 1.83 | 1.85 | 1.85 | 2.50 |
| ADM | 0.28 | 0.31 | 0.37 | 0.47 | 0.48 | 0.48 | 0.57 | 0.63 | 0.63 | 0.85 |
| FOM | 0.57 | 0.58 | 0.58 | 0.59 | 0.59 | 0.59 | 0.60 | 0.68 | 0.68 | 0.80 |
| Connectivity | 23.91 | 23.91 | 35.44 | 36.09 | 37.49 | 38.40 | 38.82 | 39.58 | 39.84 | 49.93 |
| Dunn | 0.17 | 0.17 | 0.12 | 0.11 | 0.11 | 0.11 | 0.08 | 0.08 | 0.08 | 0.06 |
| Silhouette | 0.39 | 0.39 | 0.38 | 0.36 | 0.35 | 0.35 | 0.33 | 0.31 | 0.30 | 0.16 |
Clustering validation scores for each validation measure as produced by the clValid package. Note that each row is sorted in ascending or descending order, depending on whether smaller or larger scores are desirable for that validation measure.
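The direction-dependent ordering described in this caption can be sketched as follows. The example assumes that the score in column j of this table belongs to the algorithm in column j of the ranks table, which is how the two tables appear to line up; `order_by_score` is a hypothetical helper, not a clValid function.

```python
def order_by_score(scores, larger_is_better):
    """Return algorithm names from best to worst.

    Python's sort is stable (also with reverse=True), so algorithms with
    tied scores keep their input order.
    """
    return sorted(scores, key=lambda name: scores[name], reverse=larger_is_better)


# Assumed pairing of algorithm labels with the APN and Dunn rows of the tables.
APN = {"SM": 0.11, "FN": 0.12, "ST": 0.15, "KM": 0.15, "PM": 0.17,
       "HR": 0.17, "AG": 0.17, "CL": 0.18, "DI": 0.18, "MO": 0.26}
DUNN = {"HR": 0.17, "AG": 0.17, "KM": 0.12, "PM": 0.11, "DI": 0.11,
        "SM": 0.11, "CL": 0.08, "MO": 0.08, "FN": 0.08, "ST": 0.06}

if __name__ == "__main__":
    # APN: smaller is better, so sort ascending.
    print(order_by_score(APN, larger_is_better=False))
    # Dunn: larger is better, so sort descending.
    print(order_by_score(DUNN, larger_is_better=True))
```

Because the published scores are rounded to two decimals, several ties appear here that may not exist in the unrounded values.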
Figure 1. Rank aggregation in the clustering context using the CE algorithm. Visual representation of the aggregation results through the plot() function for the clustering example using the CE algorithm and the Spearman footrule distance. The first plot in the top row shows the path of minimum values of the objective function over time. The global minimum is shown in the top right corner. The histogram of the objective function scores at the last iteration is displayed in the second plot. Together, these two plots give a general idea of the rate of convergence and the distribution of candidate lists at the last iteration. The third plot at the bottom shows the individual lists and the obtained solution, along with the optional average ranking.
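To make the objective function in this caption concrete, here is a minimal sketch of an unweighted Spearman footrule objective: the sum, over all input lists, of the absolute differences in position between a candidate list and each input list. It assumes full lists over the same items and does not reproduce any weighting or normalization the package may apply. The exhaustive search stands in for the CE algorithm only because the toy lists are tiny; all function names are hypothetical.

```python
from itertools import permutations


def footrule(list_a, list_b):
    """Unweighted Spearman footrule: sum of |position in a - position in b|."""
    pos_b = {item: i for i, item in enumerate(list_b)}
    return sum(abs(i - pos_b[item]) for i, item in enumerate(list_a))


def objective(candidate, lists):
    """Total footrule distance from the candidate to every input list."""
    return sum(footrule(candidate, ranked) for ranked in lists)


def brute_force_aggregate(lists):
    """Try every permutation; only feasible for very short lists."""
    items = lists[0]
    return min(permutations(items), key=lambda cand: objective(cand, lists))


if __name__ == "__main__":
    toy_lists = [list("ABCD"), list("ACBD"), list("BACD")]
    best = brute_force_aggregate(toy_lists)
    print(best, objective(best, toy_lists))
```

With n items there are n! candidate lists, so exhaustive search breaks down quickly; that growth is the motivation for stochastic searches such as the CE and GA algorithms named in the figure captions.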
Figure 2. Rank aggregation in the clustering context using the GA algorithm. Visual representation of rank aggregation for the clustering example using the GA algorithm with the weighted Spearman distance.
Top-25 prostate cancer gene lists
| Rank | Luo | Welsh | Dhana | True | Singh |
| 1 | HPN | HPN | OGT | AMACR | HPN |
| 2 | AMACR | AMACR | AMACR | HPN | SLC25A6 |
| 3 | CYP1B1 | OACT2 | FASN | NME2 | EEF2 |
| 4 | ATF5 | GDF15 | HPN | CBX3 | SAT |
| 5 | BRCA1 | FASN | UAP1 | GDF15 | NME2 |
| 6 | LGALS3 | ANK3 | GUCY1A3 | MTHFD2 | LDHA |
| 7 | MYC | KRT18 | OACT2 | MRPL3 | CANX |
| 8 | PCDHGC3 | UAP1 | SLC19A1 | SLC25A6 | NACA |
| 9 | WT1 | GRP58 | KRT18 | NME1 | FASN |
| 10 | TFF3 | PPIB | EEF2 | COX6C | SND1 |
| 11 | MARCKS | KRT7 | STRA13 | JTV1 | KRT18 |
| 12 | OS-9 | NME1 | ALCAM | CCNG2 | RPL15 |
| 13 | CCND2 | STRA13 | GDF15 | AP3S1 | TNFSF10 |
| 14 | NME1 | DAPK1 | NME1 | EEF2 | SERP1 |
| 15 | DYRK1A | TMEM4 | CALR | RAN | GRP58 |
| 16 | TRAP1 | CANX | SND1 | PRKACA | ALCAM |
| 17 | FMO5 | TRA1 | STAT6 | RAD23B | GDF15 |
| 18 | ZHX2 | PRSS8 | TCEB3 | PSAP | TMEM4 |
| 19 | RPL36AL | ENTPD6 | EIF4A1 | CCT2 | CCT2 |
| 20 | ITPR3 | PPP1CA | LMAN1 | G3BP | SLC39A6 |
| 21 | GCSH | ACADSB | MAOA | EPRS | RPL5 |
| 22 | DDB2 | PTPLB | ATP6V0B | CKAP1 | RPS13 |
| 23 | TFCP2 | TMEM23 | PPIB | LIG3 | MTHFD2 |
| 24 | TRAM1 | MRPL3 | FMO5 | SNX4 | G3BP2 |
| 25 | YTHDF3 | SLC19A1 | SLC7A5 | NSMAF | UAP1 |
Top-25 upregulated genes from 5 different prostate microarray experiments (as reported in [1]). HPN is the sole gene that appears in all five lists.
Figure 3. Rank aggregation of gene lists using the GA algorithm. Plots created by the plot() function for the GA rank aggregation of the gene lists. The algorithm stabilized after roughly 500 iterations. The distribution of the population in the final generation is concentrated just to the right of the optimal solution. The bottom plot shows why the solution makes sense: genes ranked high in the final list usually come from several individual lists, as indicated by the multiple intersecting lines, whereas the genes at the end of the final list appear in only a single list, though close to its top. The rank of 26 is artificial in our procedure; it simply indicates that the gene is not present in that individual list.
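The artificial rank for absent genes can be sketched as below. To keep the example short, it uses only the first five genes of each list from the table, so the placeholder rank is 6 (k + 1 with k = 5) rather than 26; `rank_vector` and `position_or_placeholder` are hypothetical helper names.

```python
# First five genes of each top-25 list in the table above (truncated for brevity).
TOP5 = {
    "Luo": ["HPN", "AMACR", "CYP1B1", "ATF5", "BRCA1"],
    "Welsh": ["HPN", "AMACR", "OACT2", "GDF15", "FASN"],
    "Dhana": ["OGT", "AMACR", "FASN", "HPN", "UAP1"],
    "True": ["AMACR", "HPN", "NME2", "CBX3", "GDF15"],
    "Singh": ["HPN", "SLC25A6", "EEF2", "SAT", "NME2"],
}


def position_or_placeholder(gene, ranked, k):
    """1-based position of the gene in a top-k list, or k + 1 if it is absent."""
    return ranked.index(gene) + 1 if gene in ranked else k + 1


def rank_vector(gene, lists, k):
    """Ranks of one gene across all studies, in the order the studies are given."""
    return [position_or_placeholder(gene, ranked, k) for ranked in lists.values()]


if __name__ == "__main__":
    for gene in ("HPN", "AMACR", "GDF15"):
        print(gene, rank_vector(gene, TOP5, k=5))
```

Assigning k + 1 to every missing gene treats all absences as a tie just below the end of the list, which is why such genes plot on a common line beyond the last real rank.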