Dorota Herman, David Ochoa, David Juan, Daniel Lopez, Alfonso Valencia, Florencio Pazos.
Abstract
BACKGROUND: The prediction and study of protein interactions and functional relationships based on the similarity of phylogenetic trees, exemplified by the mirrortree method and related approaches, is widely used. Although the performance of these methods was suspected to depend on the set of organisms used to build the trees, this dependence had not been assessed exhaustively, and previous work generally used as many organisms as possible. In this work we assess the effect of using different sets of organisms (chosen according to various phylogenetic criteria) on the performance of this methodology in detecting protein interactions of different types.
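As background on the mirrortree idea: for two protein families, it compares the matrices of evolutionary distances between their orthologs in a common set of organisms and scores the pair by the linear (Pearson) correlation between the two matrices. The sketch below assumes the distance matrices are already computed and list the same organisms in the same order; `mirrortree_score` and the toy matrices are hypothetical, and the ortholog detection, alignment and distance estimation that a real pipeline needs are omitted.

```python
import math


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def mirrortree_score(dist_a, dist_b):
    """Correlate two inter-ortholog distance matrices (hypothetical helper).

    Both matrices must be square and list the same organisms in the same order.
    """
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same organisms")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))


# Toy distances over four organisms (invented numbers).
family_a = [[0, 1, 2, 3],
            [1, 0, 2, 3],
            [2, 2, 0, 3],
            [3, 3, 3, 0]]
# family_b is family_a scaled by 2, i.e. perfectly "mirrored" trees.
family_b = [[2 * d for d in row] for row in family_a]
print(round(mirrortree_score(family_a, family_b), 3))  # 1.0
```

Only the upper triangle is used because distance matrices are symmetric with a zero diagonal, so the remaining entries carry no extra information.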
Year: 2011 PMID: 21910884 PMCID: PMC3179974 DOI: 10.1186/1471-2105-12-363
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Schema of the methodology. From an initial set of organisms with completely sequenced genomes (left), a number of subsets (red) are constructed according to two taxonomic criteria: "nearest" (blue), in which the taxonomy of the reference organism (E. coli K12) is followed back to the root of the taxonomic tree and all the genomes belonging to each visited node (E. coli species, Enterobacteriaceae family, etc.) are taken; and "level" (purple), in which the tree is successively cut at each taxonomic level (superkingdom, phylum, ...) and one organism, the one with the largest proteome, is taken from each of the resulting groups. In addition, a number of "gold standard" interaction datasets representing physical and functional interactions of different types are used (top). For each combination of interaction dataset and organism subset, the performance of the three mirrortree-based methodologies is assessed with a partial ROC analysis (colored curves).
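The two organism-selection criteria described in Figure 1 can be sketched as follows. This is an illustrative reconstruction from the caption only: the `ORGANISMS` table (lineages cut at species level, invented proteome sizes), the lineage-as-tuple representation and the function names are assumptions, not the authors' code or data.

```python
# Hypothetical input: organism -> (lineage from superkingdom down, proteome size).
# Lineages are cut at species level and sizes are invented for illustration.
GAMMA = ("Bacteria", "Proteobacteria", "Gammaproteobacteria")
ORGANISMS = {
    "E_coli_K12": (GAMMA + ("Enterobacteriaceae", "E. coli"), 4300),
    "E_coli_O157": (GAMMA + ("Enterobacteriaceae", "E. coli"), 5400),
    "S_enterica": (GAMMA + ("Enterobacteriaceae", "S. enterica"), 4500),
    "P_aeruginosa": (GAMMA + ("Pseudomonadaceae", "P. aeruginosa"), 5600),
    "B_subtilis": (("Bacteria", "Firmicutes", "Bacilli", "Bacillaceae", "B. subtilis"), 4200),
    "M_jannaschii": (("Archaea", "Euryarchaeota", "Methanococci",
                      "Methanocaldococcaceae", "M. jannaschii"), 1800),
}


def nearest_sets(reference, organisms):
    """'nearest': walk from the reference's deepest taxon back to the root,
    taking every genome under each visited node (the root gives all genomes)."""
    ref_lineage = organisms[reference][0]
    subsets = []
    for depth in range(len(ref_lineage), -1, -1):
        node = ref_lineage[:depth]
        subsets.append(sorted(name for name, (lineage, _) in organisms.items()
                              if lineage[:depth] == node))
    return subsets


def level_sets(organisms):
    """'level': cut the taxonomy at each rank (superkingdom first) and keep,
    from every resulting group, the organism with the largest proteome."""
    max_depth = max(len(lineage) for lineage, _ in organisms.values())
    subsets = []
    for depth in range(1, max_depth + 1):
        best = {}  # taxon prefix -> name of largest-proteome member so far
        for name, (lineage, size) in organisms.items():
            group = lineage[:depth]
            if group not in best or size > organisms[best[group]][1]:
                best[group] = name
        subsets.append(sorted(best.values()))
    return subsets


for subset in nearest_sets("E_coli_K12", ORGANISMS):
    print(len(subset), subset)
```

In this toy data the "level" criterion can leave out the reference strain itself (E_coli_O157 has the larger proteome within E. coli); the caption does not say how such cases are handled in the paper.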
Figure 2. Matrix of partial ROC curves. The partial ROC curves evaluate the performance of a given method on a given set of interactions using a given set of reference genomes. Rows correspond to the interaction datasets and columns to the methods. For each method-dataset combination, the colored curves correspond to the organism sets listed in the legend, where the number of organisms in each subset is given in brackets. The dotted line marks the diagonal of the ROC plot, which represents the background performance expected from a random method.
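A partial ROC curve of the kind shown in Figure 2 can be computed by ranking all protein pairs by co-evolution score, lowering the threshold step by step, and recording the false-positive and true-positive rates up to a chosen false-positive cutoff. In this sketch the cutoff `max_fpr`, the function names and the toy scores are assumptions; the caption does not state the cutoff used in the paper. A random ranking would give points close to the diagonal TPR = FPR.

```python
def partial_roc(scores, labels, max_fpr=1.0):
    """(FPR, TPR) points of a ROC curve, keeping only points with FPR <= max_fpr.

    scores: co-evolution scores of protein pairs (higher = more likely to interact).
    labels: True for pairs in the gold-standard interaction set.
    Pairs with tied scores are added as a single threshold step.
    """
    n_pos = sum(1 for lab in labels if lab)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every pair tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        if fpr > max_fpr:
            break  # stop once the false-positive cutoff is exceeded
        points.append((fpr, tpr))
    return points


def area_under(points):
    """Trapezoidal area under (x, y) points already sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy example: five scored pairs, two of them gold-standard interactions.
scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [True, False, True, False, False]
print(round(area_under(partial_roc(scores, labels)), 3))               # 0.833
print(round(area_under(partial_roc(scores, labels, max_fpr=0.4)), 3))  # 0.167
```

Restricting the curve to low false-positive rates focuses the comparison on the top of the ranking, which is the part that matters when only the highest-scoring pairs are taken as predictions.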
Examples of potentially "new" and "old" interacting pairs of proteins whose co-evolution was evaluated using two sets of organisms
| Pair type | Protein | Description | Level9 (= all): Tot/+ | Level9: AUC | Level9: Interactor (corr) | Nearest2: Tot/+ | Nearest2: AUC | Nearest2: Interactor (corr) |
|---|---|---|---|---|---|---|---|---|
| "recent" | MINE_ECOLI | Cell division topological specificity factor | 846/1 | 0.12 | MIND_ECOLI (0.52) | 223/1 | | |
| "recent" | PABA_ECOLI | Para-aminobenzoate synthase glutamine amidotransferase component II | 671/1 | 0.28 | PABB_ECOLI (0.49) | 106/1 | | |
| "recent" | DHAS_ECOLI | Aspartate-semialdehyde dehydrogenase | 760/1 | 0.17 | DNAK_ECOLI (0.48) | 384/1 | | |
| "recent" | GSHB_ECOLI | Glutathione synthetase | 755/1 | 0.30 | AMPM_ECOLI (0.61) | 375/1 | | |
| "old" | DPO3A_ECOLI | DNA polymerase III subunit alpha | 306/1 | | | 128/1 | 0.11 | DPO3E_ECOLI (0.57) |
| "old" | DPO3E_ECOLI | DNA polymerase III subunit epsilon | 357/1 | | | 123/1 | 0.22 | (0.57) |
| "old" | RPOB_ECOLI | DNA-directed RNA polymerase subunit beta | 280/7 | | | 126/4 | 0.48 | (0.93) |
| "old" | RPOA_ECOLI | DNA-directed RNA polymerase subunit alpha | 258/6 | | | 90/3 | 0.48 | (0.93) |
| "old" | ZNUB_ECOLI | High-affinity zinc uptake system membrane protein znuB | 370/2 | | | 129/1 | 0.36 | ZNUC_ECOLI (0.74) |
| "old" | ZNUC_ECOLI | Zinc import ATP-binding protein ZnuC | 386/2 | | | 123/2 | 0.41 | (0.74) |
| "old" | ZNUA_ECOLI | High-affinity zinc uptake system protein znuA | 395/2 | | | 39/1 | 0.79 | ZNUC_ECOLI (0.74) |
The co-evolution between these proteins was evaluated using the "level9" and "nearest2" sets of organisms. For each protein, the total number of pairs for which scores could be computed (Tot) and the number of positives (+) among them are given. The co-evolutionary score with the interactor is also shown (corr); when the list contains more than one positive, the highest score (max) is shown. Finally, the AUC of the list of scores is included.
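The per-protein values in the table (Tot/+, the interactor's score and the AUC of the score list) can be sketched as below, assuming the standard rank-based AUC: the fraction of (positive, negative) partner pairs in which the positive partner scores higher, with ties counted as one half. The caption does not specify how the AUC was computed, and the function name and the partner names and scores in the example are hypothetical.

```python
def protein_summary(partner_scores, interactors):
    """Summarise one protein's list of co-evolution scores (hypothetical helper).

    partner_scores: dict partner -> score, one entry per pair that could be computed.
    interactors: set of partners that are gold-standard positives.
    Returns (tot, n_pos, best_positive_score, auc), where auc is the fraction of
    (positive, negative) partner pairs in which the positive scores higher,
    counting ties as one half.
    """
    pos = [s for p, s in partner_scores.items() if p in interactors]
    neg = [s for p, s in partner_scores.items() if p not in interactors]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative partner")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    auc = wins / (len(pos) * len(neg))
    return len(partner_scores), len(pos), max(pos), auc


# Invented scores for one protein with four computable partners, one positive.
scores = {"partner_A": 0.52, "partner_B": 0.60, "partner_C": 0.70, "partner_D": 0.10}
tot, n_pos, best, auc = protein_summary(scores, {"partner_A"})
print(f"{tot}/{n_pos}", best, round(auc, 2))  # 4/1 0.52 0.33
```

Under this definition, an AUC well below 0.5, such as 0.12 for MINE_ECOLI with level9, means the true interactor is ranked below most of the non-interacting partners of that protein.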