Federico Fogolari, Alessandra Corazza, Paolo Viglino, Gennaro Esposito.
Abstract
BACKGROUND: For many predictive applications, a large number of models is generated and later clustered into subsets based on structural similarity. Most clustering algorithms perform an all-vs-all root mean square deviation (RMSD) comparison, and most of the time is typically spent comparing dissimilar structures. For sets with more than, say, 10,000 models, this procedure is very time-consuming, and faster alternative algorithms that restrict comparisons to the most similar structures would be useful.
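The cost argument above rests on two quantities: the RMSD between two models after optimal superposition, and the number of pairs, n(n − 1)/2, that an all-vs-all comparison must superpose. A minimal NumPy sketch of both follows. It assumes each model is given as an (N, 3) array of corresponding atoms and uses the standard SVD-based (Kabsch) superposition, which may differ from the authors' implementation; `kabsch_rmsd` and `all_vs_all_pairs` are hypothetical names.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid-body superposition."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translations by centring each set on its centroid.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The singular values of the 3x3 covariance matrix determine the best rotation.
    u, s, vt = np.linalg.svd(p_c.T @ q_c)
    # If the optimal orthogonal matrix would be a reflection, flip the smallest
    # singular value so that only proper rotations are allowed.
    if np.linalg.det(u) * np.linalg.det(vt) < 0:
        s[-1] = -s[-1]
    # Minimal sum of squared deviations: |P|^2 + |Q|^2 - 2 * sum(s).
    e = float((p_c ** 2).sum() + (q_c ** 2).sum() - 2.0 * s.sum())
    return float(np.sqrt(max(e, 0.0) / p.shape[0]))


def all_vs_all_pairs(n: int) -> int:
    """Number of distinct model pairs an all-vs-all comparison must evaluate."""
    return n * (n - 1) // 2
```

With 10,000 models, `all_vs_all_pairs(10_000)` gives 49,995,000 superpositions, and the count grows quadratically with the number of models.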
Year: 2012 PMID: 22642815 PMCID: PMC3403935 DOI: 10.1186/1748-7188-7-16
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
Number of RMSD computations for the 4state_reduced decoy dataset with varying RMSD threshold.
| Decoy set | t = 2.4 Å | t = 2.8 Å | t = 3.2 Å | t = 3.6 Å | All-vs-all |
|---|---|---|---|---|---|
| 1ctf | 87,219 | 103,780 | 122,095 | 140,094 | 198,765 |
| 1r69 | 130,510 | 150,415 | 171,407 | 189,659 | 228,150 |
| 1sn3 | 106,523 | 124,587 | 147,101 | 174,527 | 217,470 |
| 2cro | 183,135 | 206,133 | 189,277 | 221,219 | 227,475 |
| 3icb | 130,653 | 119,774 | 139,425 | 176,184 | 213,531 |
| 4pti | 106,581 | 131,345 | 175,432 | 191,091 | 236,328 |
| 4rxn | 95,306 | 117,515 | 143,481 | 190,997 | 228,826 |
| Total | 839,927 | 953,549 | 1,088,220 | 1,283,751 | 1,550,545 |
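Each entry in the last column of the 4state_reduced table, which holds the all-vs-all count, is a triangular number n(n − 1)/2, so the number of decoys in each set can be recovered from it, and each threshold column can be expressed as a fraction of it. A small Python sketch of that arithmetic, using values copied from the table; `models_from_pairs` and `fraction_computed` are hypothetical helpers.

```python
import math

# All-vs-all counts copied from the last column of the 4state_reduced table.
ALL_VS_ALL = {
    "1ctf": 198_765, "1r69": 228_150, "1sn3": 217_470, "2cro": 227_475,
    "3icb": 213_531, "4pti": 236_328, "4rxn": 228_826,
}


def models_from_pairs(pairs: int) -> int:
    """Invert pairs = n(n - 1)/2; raise if pairs is not a triangular number."""
    # n(n - 1)/2 = P  is equivalent to  (2n - 1)^2 = 8P + 1.
    root = math.isqrt(8 * pairs + 1)
    n = (root + 1) // 2
    if n * (n - 1) // 2 != pairs:
        raise ValueError(f"{pairs} is not of the form n(n - 1)/2")
    return n


def fraction_computed(done: int, pairs: int) -> float:
    """Fraction of the all-vs-all comparisons that were actually performed."""
    return done / pairs
```

For example, `models_from_pairs(198_765)` returns 631 for 1ctf, the implied set sizes range from 631 to 688 decoys, and the t = 2.4 Å total (839,927 of 1,550,545) corresponds to a fraction of about 0.54.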
Number of RMSD computations for the semfold decoy dataset.
| Decoy set | This work | All-vs-all | Ratio |
|---|---|---|---|
| 1ctf | 11,753,426 | 64,997,101 | 0.18 |
| 1e68 | 9,039,397 | 64,541,841 | 0.14 |
| 1eh2 | 7,332,361 | 65,453,961 | 0.11 |
| 1khm | 22,047,014 | 222,193,740 | 0.10 |
| 1nkl | 7,966,008 | 67,995,291 | 0.12 |
| 1pgb | 13,465,834 | 63,636,121 | 0.21 |
| Total | 71,604,689 | 548,818,055 | 0.13 |
For this table the RMSD threshold was 3.0 Å.
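The ratio column divides the number of RMSD computations performed by the all-vs-all count. A short Python sketch reproducing it from the tabulated values, and showing that the Total ratio is consistent with pooling counts across decoy sets (sum over sum) rather than averaging the per-set ratios; the helper names are hypothetical.

```python
from typing import Dict, Tuple

# (this work, all-vs-all) counts copied from the semfold table.
SEMFOLD: Dict[str, Tuple[int, int]] = {
    "1ctf": (11_753_426, 64_997_101),
    "1e68": (9_039_397, 64_541_841),
    "1eh2": (7_332_361, 65_453_961),
    "1khm": (22_047_014, 222_193_740),
    "1nkl": (7_966_008, 67_995_291),
    "1pgb": (13_465_834, 63_636_121),
}


def per_set_ratios(rows: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Computations performed divided by all-vs-all computations, per decoy set."""
    return {name: done / total for name, (done, total) in rows.items()}


def pooled_ratio(rows: Dict[str, Tuple[int, int]]) -> float:
    """Sum of computations performed over sum of all-vs-all computations."""
    done = sum(d for d, _ in rows.values())
    total = sum(t for _, t in rows.values())
    return done / total


def mean_ratio(rows: Dict[str, Tuple[int, int]]) -> float:
    """Unweighted mean of the per-set ratios, for comparison."""
    ratios = per_set_ratios(rows)
    return sum(ratios.values()) / len(ratios)
```

Here the pooled ratio rounds to 0.13, matching the Total row, whereas the unweighted mean of the six per-set ratios is about 0.14, because the largest set, 1khm, also has the lowest ratio.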
Figure 1. Ratio of the number of RMSD computations performed over .
Fragment clustering.
| RMSD threshold (Å) | Starting fragments | RMSD computations (this work) | All-vs-all computations | Ratio | Representative fragments |
|---|---|---|---|---|---|
| 0.05 | 107,184 | 57,978,627 | 5,744,151,336 | 0.0100 | 79,994 |
| 0.1 | 79,994 | 28,610,140 | 3,199,480,021 | 0.0089 | 47,502 |
| 0.2 | 47,502 | 19,089,084 | 1,128,196,251 | 0.0169 | 13,066 |
| 0.4 | 13,066 | 4,479,481 | 85,353,645 | 0.0525 | 1,853 |
| 0.8 | 1,853 | 393,824 | 1,715,878 | 0.2295 | 131 |
| Total | 107,184 | 110,551,156 | 5,744,151,336 | 0.0193 | |
Columns report the RMSD threshold chosen for clustering, the number of starting fragments, the number of RMSD computations performed, the number of computations required by an all-vs-all comparison, the ratio of the last two, and the number of representative fragments, which are used as starting fragments at the next iteration. CA atoms were used for superposition.
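The caption describes an iterative scheme in which the representatives found at one threshold become the input at the next, larger threshold. The sketch below reproduces only that bookkeeping (starting items, distance computations, all-vs-all count and representatives per level), using a simple greedy leader clustering as a stand-in. The authors' clustering procedure is not given in this excerpt, and `leader_cluster`, `cascade` and the toy 1-D distance are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def leader_cluster(items: Sequence[T], dist: Callable[[T, T], float],
                   threshold: float) -> Tuple[List[int], int]:
    """Greedy leader clustering (illustrative stand-in, not the paper's method).

    Each item is compared with the current leaders in order; it joins the first
    leader within `threshold`, otherwise it becomes a new leader.
    Returns (indices of leaders, number of distance evaluations)."""
    leaders: List[int] = []
    calls = 0
    for i, item in enumerate(items):
        for j in leaders:
            calls += 1
            if dist(item, items[j]) <= threshold:
                break  # assigned to the cluster of leader j
        else:
            leaders.append(i)  # no leader close enough: start a new cluster
    return leaders, calls


def cascade(items: Sequence[T], dist: Callable[[T, T], float],
            thresholds: Sequence[float]) -> List[Tuple[float, int, int, int, int]]:
    """Cluster at increasing thresholds, feeding each level's leaders forward.

    Returns one row per level: (threshold, starting items, computations,
    all-vs-all computations, representatives), mirroring the table columns."""
    current = list(items)
    rows = []
    for t in thresholds:
        leaders, calls = leader_cluster(current, dist, t)
        n = len(current)
        rows.append((t, n, calls, n * (n - 1) // 2, len(leaders)))
        current = [current[j] for j in leaders]
    return rows
```

For example, `cascade([0.0, 0.01, 0.02, 1.0, 1.03, 5.0], lambda a, b: abs(a - b), [0.05, 0.5, 2.0])` performs 7 distance evaluations instead of 15 at the first level and ends with 2 representatives. A leader scheme depends on input order and never needs more than n(n − 1)/2 evaluations; for structures, an RMSD after optimal superposition would replace the 1-D distance.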
Fragment clustering on backbone.
| RMSD threshold (Å) | Starting fragments | RMSD computations (this work) | All-vs-all computations | Ratio | Representative fragments |
|---|---|---|---|---|---|
| 0.05 | 107,184 | 53,793,796 | 5,744,151,336 | 0.0094 | 105,299 |
| 0.1 | 105,299 | 124,506,221 | 5,543,887,051 | 0.0225 | 87,195 |
| 0.2 | 87,195 | 57,001,567 | 3,801,440,415 | 0.0150 | 63,829 |
| 0.4 | 63,829 | 65,723,340 | 2,037,038,706 | 0.0322 | 21,637 |
| 0.8 | 21,637 | 33,375,313 | 234,069,066 | 0.1426 | 2,445 |
| 1.6 | 2,445 | 2,586,436 | 2,987,790 | 0.8657 | 33 |
| 3.2 | 33 | 528 | 528 | 1.0000 | 2 |
| Total | 107,184 | 336,987,202 | 5,744,151,336 | 0.0587 | |
Columns report the RMSD threshold chosen for clustering, the number of starting fragments, the number of RMSD computations performed, the number of computations required by an all-vs-all comparison, the ratio of the last two, and the number of representative fragments, which are used as starting fragments at the next iteration. Backbone atoms N, CA, C and O were used for superposition.
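The two fragment tables differ only in the atoms used for superposition: CA atoms alone, or the backbone atoms N, CA, C and O. A minimal sketch of that selection from PDB-format `ATOM` records follows. It assumes standard fixed-column PDB files and keeps only blank or `A` alternate locations; `select_coords` is a hypothetical helper, not part of the authors' code. For a fragment with a complete backbone, the backbone selection yields four times as many points as the CA selection.

```python
from typing import Iterable, List, Tuple

CA_ONLY = ("CA",)
BACKBONE = ("N", "CA", "C", "O")


def select_coords(pdb_text: str,
                  atom_names: Iterable[str]) -> List[Tuple[float, float, float]]:
    """Collect (x, y, z) of ATOM records whose atom name is in `atom_names`.

    Uses the fixed PDB columns: atom name 13-16, altLoc 17, x/y/z 31-54.
    Keeping only blank or 'A' alternate locations is an assumption."""
    wanted = set(atom_names)
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skips HETATM and all non-coordinate records
        name = line[12:16].strip()
        alt_loc = line[16:17]
        if name not in wanted or alt_loc not in (" ", "A"):
            continue
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords
```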
Figure 2. Fragments superposed on the representative fragment of each of the 16 clusters populated by more than 1% of the whole dataset.