Faisal Saeed, Naomie Salim, Ammar Abdo.
Abstract
BACKGROUND: Although many consensus clustering methods have been used successfully to combine multiple classifiers in areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few have been applied to combining multiple clusterings of chemical structures. No individual clustering method gives the best results for every type of application. In this paper, therefore, three voting-based and graph-based consensus clustering methods were used to combine multiple clusterings of chemical structures, with the aim of improving the separation of biologically active molecules from inactive ones in each cluster.
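To make the idea of combining multiple clusterings concrete, the following is a minimal sketch in Python. It is not the CVAA, CSPA, or HGPA procedure evaluated in the paper. It only illustrates the general co-association idea: count how often each pair of molecules is placed in the same cluster across partitions, then group pairs that agree in a majority of them. The function names, the 0.5 threshold, and the toy label vectors are all assumptions made for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def consensus_by_threshold(partitions, threshold=0.5):
    """Link items co-clustered in more than `threshold` of the partitions,
    then return the connected components as consensus labels."""
    n = len(partitions[0])
    sim = co_association(partitions)
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0..k-1 in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical toy input: three clusterings of six items. Cluster labels are
# arbitrary per partition, and the second partition places item 4 differently.
partitions = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 0, 2],
    [0, 0, 1, 1, 2, 2],
]
consensus = consensus_by_threshold(partitions)
print(consensus)
```

Because the grouping depends only on whether two items share a label within each partition, the arbitrary numbering of clusters in the individual clusterings does not matter. Graph-based methods typically replace the simple threshold step with a graph or hypergraph partitioner.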
Year: 2012 PMID: 23244782 PMCID: PMC3541359 DOI: 10.1186/1758-2946-4-37
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
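MDDR dataset activity classes
| Activity index | Activity class | Active molecules | Mean pairwise similarity |
|---|---|---|---|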
| 31420 | Renin Inhibitors | 1130 | 0.290 |
| 71523 | HIV Protease Inhibitors | 750 | 0.198 |
| 37110 | Thrombin Inhibitors | 803 | 0.180 |
| 31432 | Angiotensin II AT1 Antagonists | 943 | 0.229 |
| 42731 | Substance P Antagonists | 1246 | 0.149 |
| 06233 | 5HT3 Antagonists | 752 | 0.140 |
| 06245 | 5HT Reuptake Inhibitors | 359 | 0.122 |
| 07701 | D2 Antagonists | 395 | 0.138 |
| 06235 | 5HT1A Agonists | 827 | 0.133 |
| 78374 | Protein Kinase C Inhibitors | 453 | 0.120 |
| 78331 | Cyclooxygenase Inhibitors | 636 | 0.108 |
Effectiveness of clustering of MDDR dataset using F-Measure: ALOGP Fingerprint
| Clustering type | Method | Distance measure | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Consensus clustering | CVAA | Correlation | 26.80 | 21.96 | 18.96 | 18.49 | 17.6 | 15.45 |
| | | Cosine | 24.79 | 21.72 | 19.01 | 18.19 | 16.46 | 14.81 |
| | | Euclidean | | | | | | |
| | | Hamming | 24.02 | 20.48 | 16.31 | 16.85 | 14.95 | 14.68 |
| | | Jaccard | 23.58 | 21.96 | 18.01 | 18.46 | 16.72 | 15.35 |
| | | Manhattan | 27.03 | 25.23 | 21.16 | 20.36 | 19.10 | 19.05 |
| | CSPA | Correlation | 5.06 | 4.65 | 4.16 | 3.56 | 3.35 | 3.04 |
| | | Cosine | 5.17 | 4.65 | 4.08 | 3.62 | 3.37 | 3.05 |
| | | Euclidean | 5.12 | 4.64 | 4.04 | 3.61 | 3.38 | 3.00 |
| | | Hamming | 5.30 | 4.74 | 4.16 | 3.62 | 3.54 | 3.13 |
| | | Jaccard | 5.31 | 4.82 | 4.15 | 3.77 | 3.48 | 3.13 |
| | | Manhattan | 5.33 | 4.80 | 4.21 | 3.62 | 3.45 | 3.05 |
| | HGPA | Correlation | 7.13 | 5.48 | 5.45 | 4.65 | 4.35 | 4.37 |
| | | Cosine | 8.06 | 6.04 | 5.03 | 4.52 | 4.45 | 4.08 |
| | | Euclidean | 7.08 | 6.55 | 5.65 | 4.67 | 4.56 | 4.60 |
| | | Hamming | 8.37 | 5.73 | 4.94 | 5.29 | 4.97 | 4.93 |
| | | Jaccard | 7.63 | 6.22 | 5.98 | 4.53 | 5.24 | 3.92 |
| | | Manhattan | 7.72 | 6.48 | 5.23 | 5.35 | 4.90 | 4.12 |
| Individual clustering | Ward's method | | 9.93 | 9.19 | 8.19 | 7.17 | 6.67 | 6.44 |
Effectiveness of clustering of MDDR dataset using F-Measure: ECFP_4 Fingerprint
| Clustering type | Method | Distance measure | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Consensus clustering | CVAA | Correlation | 33.58 | 29.81 | 24.44 | 20.09 | 18.41 | 17.43 |
| | | Cosine | 34.75 | 31.32 | 24.97 | 20.26 | 18.46 | 17.73 |
| | | Euclidean | 25.43 | 23.34 | 20.51 | 19.13 | 16.47 | 14.64 |
| | | Hamming | 25.48 | 24.04 | 20.23 | 19.62 | 17.31 | 14.73 |
| | | Jaccard | | | | | | |
| | | Manhattan | 25.41 | 23.98 | 20.30 | 19.53 | 17.25 | 14.65 |
| | CSPA | Correlation | 5.53 | 4.88 | 4.23 | 3.85 | 3.6 | 3.18 |
| | | Cosine | 5.43 | 4.88 | 4.28 | 3.91 | 3.55 | 3.10 |
| | | Euclidean | 5.47 | 4.87 | 4.17 | 3.79 | 3.53 | 3.33 |
| | | Hamming | 5.45 | 4.82 | 4.23 | 3.87 | 3.58 | 3.19 |
| | | Jaccard | 5.51 | 4.99 | 4.25 | 3.99 | 3.62 | 3.20 |
| | | Manhattan | 5.44 | 4.85 | 4.23 | 3.89 | 3.62 | 3.20 |
| | HGPA | Correlation | 7.01 | 6.2 | 5.21 | 4.5 | 4.16 | 3.68 |
| | | Cosine | 6.83 | 5.95 | 5.29 | 4.47 | 4.21 | 3.93 |
| | | Euclidean | 7.29 | 5.82 | 5.29 | 4.39 | 4.48 | 3.94 |
| | | Hamming | 7.01 | 5.83 | 5.29 | 4.50 | 4.37 | 3.69 |
| | | Jaccard | 6.87 | 5.91 | 5.31 | 4.81 | 4.80 | 3.66 |
| | | Manhattan | 7.81 | 5.17 | 5.38 | 4.61 | 4.66 | 3.68 |
| Individual clustering | Ward's method | | 11.61 | 10.71 | 9.04 | 8.29 | 7.64 | 7.02 |
Effectiveness of clustering of MDDR dataset using QPI: ALOGP Fingerprint
| Clustering type | Method | Distance measure | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Consensus clustering | CVAA | Correlation | 43.84 | 47.38 | 48.72 | 50.70 | 53.41 | 54.06 |
| | | Cosine | 45.60 | 46.08 | 47.56 | 50.46 | 53.79 | 54.50 |
| | | Euclidean | 44.43 | 45.54 | 47.95 | 48.65 | 52.68 | 54.86 |
| | | Hamming | 53.13 | 56.08 | 59.07 | 60.58 | 64.02 | 67.76 |
| | | Jaccard | | | | | | |
| | | Manhattan | 56.01 | 58.10 | 60.99 | 61.86 | 64.56 | 65.97 |
| | CSPA | Correlation | 46.81 | 50.04 | 51.72 | 51.78 | 54.23 | 56.36 |
| | | Cosine | 46.04 | 49.49 | 51.42 | 52.11 | 54.48 | 55.92 |
| | | Euclidean | 46.20 | 49.86 | 51.05 | 51.88 | 54.36 | 56.33 |
| | | Hamming | 54.67 | 58.50 | 60.27 | 61.78 | 62.33 | 65.66 |
| | | Jaccard | 55.03 | 59.13 | 60.84 | 61.03 | 63.73 | 67.44 |
| | | Manhattan | 55.08 | 59.00 | 59.10 | 60.84 | 61.78 | 64.61 |
| | HGPA | Correlation | 47.59 | 49.51 | 52.39 | 54.45 | 56.86 | 58.56 |
| | | Cosine | 45.58 | 48.44 | 52.78 | 54.42 | 56.36 | 58.70 |
| | | Euclidean | 46.92 | 51.41 | 53.20 | 54.75 | 57.00 | 58.97 |
| | | Hamming | 55.24 | 58.48 | 60.30 | 63.99 | 68.21 | 69.22 |
| | | Jaccard | 55.71 | 59.89 | 64.10 | 65.15 | 70.48 | 71.60 |
| | | Manhattan | 54.84 | 58.98 | 62.73 | 63.58 | 65.85 | 69.97 |
| Individual clustering | Ward's method | | 52.33 | 54.86 | 56.90 | 59.00 | 61.33 | 63.17 |
Effectiveness of clustering of MDDR dataset using QPI: ECFP_4 Fingerprint
| Clustering type | Method | Distance measure | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Consensus clustering | CVAA | Correlation | 74.86 | 78.02 | 82.39 | 84.16 | 85.71 | 87.04 |
| | | Cosine | 74.79 | 78.12 | 81.85 | 84.78 | 85.91 | 87.18 |
| | | Euclidean | 71.04 | 74.92 | 78.41 | 81.91 | 84.47 | 86.80 |
| | | Hamming | 70.99 | 74.36 | 78.47 | 81.68 | 84.24 | 86.28 |
| | | Jaccard | | | | | | |
| | | Manhattan | 70.74 | 74.26 | 78.52 | 81.74 | 84.12 | 86.09 |
| | CSPA | Correlation | 70.58 | 73.29 | 74.86 | 76.86 | 79.17 | 82.03 |
| | | Cosine | 71.23 | 71.85 | 76.43 | 76.55 | 78.06 | 81.21 |
| | | Euclidean | 65.33 | 67.09 | 72.49 | 72.73 | 74.50 | 78.75 |
| | | Hamming | 64.68 | 66.82 | 69.88 | 71.25 | 74.17 | 76.64 |
| | | Jaccard | 69.91 | 71.73 | 74.20 | 76.01 | 77.72 | 79.26 |
| | | Manhattan | 63.07 | 65.77 | 68.83 | 71.50 | 74.06 | 77.33 |
| | HGPA | Correlation | 72.61 | 74.85 | 76.4 | 78.32 | 80.22 | 82.26 |
| | | Cosine | 72.06 | 74.25 | 77.21 | 79.54 | 81.02 | 83.31 |
| | | Euclidean | 70.71 | 72.82 | 75.02 | 76.80 | 80.50 | 82.66 |
| | | Hamming | 69.45 | 72.21 | 74.08 | 77.71 | 79.67 | 82.36 |
| | | Jaccard | 67.88 | 70.58 | 73.93 | 76.56 | 77.65 | 79.67 |
| | | Manhattan | 72.74 | 72.14 | 75.68 | 77.94 | 81.42 | 82.97 |
| Individual clustering | Ward's method | | 75.83 | 79.88 | 83.34 | 84.25 | 86.49 | 88.25 |
T-test statistical significance testing using F-measure
| Pair | Mean difference | Std. deviation | Std. error of mean | 95% CI lower | 95% CI upper | p-value |
|---|---|---|---|---|---|---|
| a) ALOGP: | | | | | | |
| Pair 1: CVAA - Ward | 15.37 | 1.77 | 0.72 | 13.50 | 17.23 | 0.000004 |
| Pair 2: CVAA - CSPA | 19.16 | 2.11 | 0.86 | 16.95 | 21.38 | 0.000003 |
| Pair 3: CVAA - HGPA | 17.24 | 1.75 | 0.71 | 15.40 | 19.08 | 0.000002 |
| b) ECFP_4: | | | | | | |
| Pair 1: CVAA - Ward | 17.25 | 5.48 | 2.24 | 11.49 | 23.01 | 0.000589 |
| Pair 2: CVAA - CSPA | 22.01 | 6.41 | 2.62 | 15.27 | 28.75 | 0.000391 |
| Pair 3: CVAA - HGPA | 20.84 | 5.95 | 2.43 | 14.58 | 27.09 | 0.000357 |
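
The numeric columns of the t-test rows can be sanity-checked with a few lines of arithmetic. The sketch below takes the first ALOGP row (CVAA versus Ward's method) and reads its first two values as the mean and standard deviation of the paired differences. It assumes six paired observations, inferred from the six value columns of the effectiveness tables rather than stated in this record. The critical value 2.5706 is the standard two-sided 95% Student's t quantile for 5 degrees of freedom.

```python
import math

n = 6                # assumption: one paired observation per value column
mean_diff = 15.37    # Pair 1 (CVAA vs. Ward), ALOGP, F-measure
sd_diff = 1.77       # standard deviation of the paired differences
T_CRIT_DF5 = 2.5706  # two-sided 95% critical value of Student's t, df = n - 1 = 5

# Standard error of the mean difference.
se = sd_diff / math.sqrt(n)

# 95% confidence interval for the mean difference.
half_width = T_CRIT_DF5 * se
lower = mean_diff - half_width
upper = mean_diff + half_width

# Paired t statistic.
t_stat = mean_diff / se

print(f"SE = {se:.2f}, 95% CI = ({lower:.2f}, {upper:.2f}), t = {t_stat:.1f}")
```

Under these assumptions the computed standard error and interval agree with the tabulated 0.72, 13.50 and 17.23 to within rounding of the inputs, which supports reading those columns as the standard error and the 95% confidence bounds.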
T-test statistical significance testing using QPI measure
| Pair | Mean difference | Std. deviation | Std. error of mean | 95% CI lower | 95% CI upper | p-value |
|---|---|---|---|---|---|---|
| a) ALOGP: | | | | | | |
| Pair 1: CVAA - Ward | 7.61 | 1.92 | 0.78 | 5.58 | 9.63 | 0.000199 |
| Pair 2: CVAA - CSPA | 4.20 | 2.08 | 0.85 | 2.02 | 6.39 | 0.004290 |
| Pair 3: CVAA - HGPA | 1.62 | 1.06 | 0.43 | 0.49 | 2.74 | 0.013842 |
| b) ECFP_4: | | | | | | |
| Pair 1: CVAA - Ward | 5.81 | 1.60 | 0.65 | 4.12 | 7.49 | 0.000301 |
| Pair 2: CVAA - CSPA | 12.31 | 1.49 | 0.61 | 10.74 | 13.88 | 0.000005 |
| Pair 3: CVAA - HGPA | 10.77 | 1.50 | 0.61 | 9.20 | 12.34 | 0.000010 |