Kwangbom Choi, Shawn M Gomez.
Abstract
BACKGROUND: Understanding evolutionary relationships is a fundamental aspect of modern biology, and the phylogenetic tree is a primary tool for describing these relationships. However, comparing trees, both to assess their similarity and to quantify various biological processes, remains a significant challenge.
Year: 2009 PMID: 20003527 PMCID: PMC3087345 DOI: 10.1186/1471-2105-10-423
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The three types of embedded structure alignment described in this work. (a) rCEED aligns two target structures indirectly through a reference structure, using classical Procrustes superimposition. (b) To detect outliers and/or common substructures, vCEED performs a local alignment (rather than the global alignment of rCEED). (c) If neither a reference structure nor correspondence information is available, the structures can be aligned with gCEED, which adapts a Gaussian mixture model approach for accurate superimposition.
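rCEED's superimposition step is described as classical Procrustes superimposition. The following is a minimal NumPy sketch of orthogonal Procrustes (rotation/reflection plus translation), assuming the two embedded structures are n×d coordinate arrays whose rows are already in correspondence. The function name `procrustes_superimpose` is hypothetical, and omitting a scaling factor is an assumption of this sketch, not a detail taken from the paper.

```python
import numpy as np


def procrustes_superimpose(X, Y):
    """Superimpose Y onto X by rotation/reflection plus translation.

    X and Y are n x d coordinate arrays whose rows are already in
    correspondence. No scaling is applied (an assumption of this sketch).
    Returns the aligned copy of Y, the orthogonal matrix R, and the
    per-row Euclidean residuals.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    # The SVD of the cross-covariance Yc^T Xc gives the orthogonal R
    # minimizing ||Yc @ R - Xc|| in the Frobenius norm.
    U, _, Vt = np.linalg.svd(Yc.T @ Xc)
    R = U @ Vt
    Y_aligned = Yc @ R + mx
    residuals = np.linalg.norm(X - Y_aligned, axis=1)
    return Y_aligned, R, residuals
```

Because only rigid motions are allowed, the residuals after alignment reflect genuine differences in shape between the two embedded structures rather than differences in position or orientation.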
Figure 2. Schematic overview of the rCEED approach. (a) Genetic distances obtained from sequence alignments, or patristic distances obtained from phylogenetic trees, are mapped into Euclidean space by multidimensional scaling. Two orthologous protein families are embedded in a Euclidean space along with two identical reference structures (16S rRNA orthologs). (b) Next, each reference structure is superimposed onto its respective protein family. (c) All four structures are then superimposed based on the estimated transformation between the two reference structures. Since both reference structures were orthogonally transformed in (b), they match exactly at this step. (d) The final superimposition after removal of the reference structures.
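Step (a) maps a distance matrix into Euclidean coordinates by multidimensional scaling. Below is a minimal sketch of classical (Torgerson) MDS, assuming a symmetric n×n matrix of genetic or patristic distances. The function name `classical_mds` and the embedding dimension `dim` are illustrative; the paper may use a different MDS variant or dimensionality.

```python
import numpy as np


def classical_mds(D, dim=2):
    """Embed an n x n distance matrix into `dim` Euclidean dimensions
    using classical (Torgerson) multidimensional scaling."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n            # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                     # double-centered Gram matrix
    evals, evecs = np.linalg.eigh(B)                # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:dim]           # keep the largest `dim` eigenvalues
    scale = np.sqrt(np.clip(evals[order], 0.0, None))  # clip small negative eigenvalues
    return evecs[:, order] * scale                  # n x dim coordinates
```

For a tree-derived (non-Euclidean) distance matrix, some eigenvalues of B can be negative; this sketch clips them to zero, so pairwise distances are only approximately preserved. The resulting coordinate sets are what a Procrustes step would then superimpose.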
AUCs of tested approaches for detecting protein interactions via coevolution.
| Methods | AUC (PR curve)¹ | AUC (ROC curve)² | p-value³ |
|---|---|---|---|
| rCEED⁴ | 0.069 | 0.763 ± 0.013 | N/A |
| rCEED⁵ | 0.083 | 0.7965 | |
| vCEED | 0.763 ± 0.013 | 0.9919 | |
| mirrortree | 0.048 | 0.687 ± 0.013 | <0.0001 |
| tol-mirrortree | 0.063 | 0.722 ± 0.014 | <0.0001 |
| phylogenetic vector projection | 0.053 | 0.704 ± 0.013 | <0.0001 |
| partial correlation | 0.050 | 0.687 ± 0.013 | <0.0001 |
| Interactions identified in DIP⁶ | 388 | | |
¹ Area under the precision-recall curve.
² Area under the receiver operating characteristic curve.
³ Significance was computed with rCEED (observed distances) as the reference, following [24].
⁴ Based on observed distances.
⁵ Based on patristic distances after reconstruction of neighbor-joining trees.
⁶ August 2009 version of DIP.
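Both AUC columns summarize how well a ranked list of predicted protein pairs recovers the known interactions. Here is a minimal pure-Python sketch, assuming binary labels (1 = pair listed in DIP): ROC AUC computed as the Mann-Whitney rank statistic, and PR AUC approximated by average precision. The function names are hypothetical, and because the paper's exact PR interpolation is not stated here, using average precision is an assumption.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive pair scores higher
    than a random negative pair (Mann-Whitney U); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pr_auc(scores, labels):
    """Area under the precision-recall curve, estimated as average precision:
    the mean precision at the rank of each true positive (ties not handled)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

A random ranking gives a ROC AUC near 0.5 but a PR AUC near the fraction of pairs that are true interactions, which is one reason PR AUC values can be far smaller than ROC AUC values when true interactions are sparse.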
Figure 3. Schematic of the difference between classical Procrustes alignment and Verboonian robust alignment (vCEED). In classical Procrustes alignment (a), errors are distributed across all corresponding pairs during global alignment. In contrast, with vCEED (b) an outlier becomes clearly distinguishable because a matching (local) substructure is aligned.
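The contrast in Figure 3 can be illustrated with a generic robust alignment. The following sketch swaps in iteratively reweighted least squares (IRLS) with L1-type weights; it is not necessarily Verboon's algorithm as used by vCEED. The cut-off `c` in `flag_outliers` mirrors the threshold mentioned for Figure 4, but since Equation (6) is not reproduced here, treating it as a simple cut on per-pair residuals is an assumption. All function names are hypothetical.

```python
import numpy as np


def weighted_procrustes(X, Y, w):
    """Rotation/reflection + translation of Y onto X minimizing
    sum_i w_i * ||x_i - T(y_i)||^2 (rows of X and Y correspond)."""
    w = w / w.sum()
    mx, my = w @ X, w @ Y                      # weighted centroids
    Xc, Yc = X - mx, Y - my
    U, _, Vt = np.linalg.svd((Yc * w[:, None]).T @ Xc)
    return Yc @ (U @ Vt) + mx


def robust_procrustes(X, Y, n_iter=50, eps=1e-6):
    """IRLS Procrustes with L1-type weights 1/r_i (a generic robust scheme).

    Pairs with large residuals are progressively down-weighted, so the fit
    is driven by the largest matching substructure, and outliers keep large
    residuals instead of spreading error over every pair.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    w = np.ones(len(X))                        # first pass = classical fit
    for _ in range(max(n_iter, 1)):
        Y_aligned = weighted_procrustes(X, Y, w)
        r = np.linalg.norm(X - Y_aligned, axis=1)
        w = 1.0 / np.maximum(r, eps)           # eps avoids division by zero
    return Y_aligned, r


def flag_outliers(residuals, c):
    """Indices of pairs whose alignment error exceeds the cut-off c."""
    return np.flatnonzero(residuals > c)
```

With a classical (equal-weight) fit, a single displaced pair would pull every other pair off target; after reweighting, the consistent pairs align almost exactly and only the displaced pair retains a large error.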
Figure 4. Horizontal gene transfer (HGT) detection via vCEED for RuvB. The phylogenetic tree of the RuvB (COG2255) family is shown on the left (redrawn from [27]). The vCEED alignment errors between COG2255 and 16S rRNA are shown on the right. The vertical line at 0.01 marks the threshold c used in this analysis (see Equation (6)).
Figure 5. HGT detection via vCEED for UppS. The phylogenetic tree of the UppS (COG0020) family is shown on the left (redrawn from [27]). In addition to RP425 and RC0590, which were previously identified, an archaeal gene, APE1385, clusters within a group of bacterial genes. Also observable is a bacterial branch consisting of DR2447, Cgl0966, Rv1086, and ML2467, with an abnormal affinity for archaeal species. Both examples appear as outliers under vCEED (right) and indicate possible horizontal gene transfer. See Results for further details.
Figure 6. Prediction of interaction specificity with gCEED. (a) The phylogenetic trees and binding specificity of two multigene families, GyrA/parC and GyrB/parE (redrawn from [28]). (b) A series of probability matrices visualizing the recursive prediction of individual interaction specificities. Each colored box/arrow marks the indeterminate block chosen for further alignment via gCEED. (c) The final probability matrix, with predicted mappings in black/grey. A perfect prediction (assuming no cross-interactions) would show black squares along the diagonal and white squares everywhere else in the matrix.
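gCEED is described as adapting a Gaussian mixture model approach when neither a reference structure nor correspondences are available. The sketch below swaps in a rigid, Coherent-Point-Drift-style EM registration (no scaling, no uniform outlier component); this is an assumption about the general technique, not the authors' exact procedure, and the function name `gmm_rigid_align` is hypothetical. Its posterior matrix `P` plays a role loosely analogous to the probability matrices in Figure 6.

```python
import numpy as np


def gmm_rigid_align(X, Y, n_iter=200, tol=1e-12):
    """Rigidly align Y (M x d) to X (N x d) without known correspondences.

    The transformed rows of Y act as centroids of an equal-weight isotropic
    Gaussian mixture fitted to X by EM (rigid Coherent-Point-Drift style,
    without scaling or a uniform outlier component). Returns the aligned Y
    and the M x N matrix P of posterior correspondence probabilities.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    (N, d), M = X.shape, Y.shape[0]
    R, t = np.eye(d), np.zeros(d)

    def sq_dists(TY):
        # M x N matrix of squared distances between transformed Y and X
        return ((TY[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)

    sigma2 = sq_dists(Y).sum() / (d * M * N)
    for _ in range(n_iter):
        # E-step: posterior that X[n] was generated by component Y[m]
        sq = sq_dists(Y @ R.T + t)
        K = np.exp(-(sq - sq.min(axis=0)) / (2.0 * sigma2))  # shift for stability
        P = K / K.sum(axis=0)
        # M-step: weighted Procrustes using P as soft correspondences
        Np = P.sum()
        mu_x = P.sum(axis=0) @ X / Np
        mu_y = P.sum(axis=1) @ Y / Np
        A = (X - mu_x).T @ P.T @ (Y - mu_y)
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(d)
        C[-1, -1] = np.linalg.det(U @ Vt)     # forbid reflections
        R = U @ C @ Vt
        t = mu_x - R @ mu_y
        new_sigma2 = max((P * sq_dists(Y @ R.T + t)).sum() / (Np * d), 1e-12)
        converged = abs(sigma2 - new_sigma2) < tol
        sigma2 = new_sigma2
        if converged:
            break
    return Y @ R.T + t, P
```

The variance starts large, so early iterations behave like coarse centroid and orientation matching; as it shrinks, the posteriors sharpen and `P.argmax(axis=1)` gives a predicted partner for each row of `Y`.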
Stringent accuracy of interaction specificity prediction.
| Protein family | Size | Correlation | MATRIX (%) | MORPH (%) | gCEED (%) |
|---|---|---|---|---|---|
| GyrA/B, ParC/E ( | 20 | 0.9932 | 50.0 | 50.0 | 50.0 |
| ParC/ParE ( | 12 | 0.9921 | 50.0 | 66.7 | 66.7 |
| Lyt-type regulator/sensors (E. coli/B. subtilis) | 4 | 0.9709 | 50.0 | 50.0 | 50.0 |
| GyrA/GyrB (Gram positive bacteria) | 18 | 0.9795 | 33.3 | 44.4 | 55.5 |
| Acetyl CoA carboxylase | 16 | 0.9756 | 75.0 | 75.0 | 62.5 |
| ParC/ParE (bacteria) | 26 | 0.9757 | 46.2 | 38.5 | 61.5 |
| GyrA/GyrB ( | 20 | 0.9723 | 90.0 | 80.0 | 50.0 |
| ParC/ParE (Gram positive bacteria) | 14 | 0.9634 | 14.3 | 28.6 | 28.6 |
| CheA/CheB (bacteria) | 8 | 0.9712 | 100.0 | 100.0 | 75.0 |
| Pyruvate dehydrogenase | 17 | 0.9599 | 64.7 | 70.6 | 35.3 |
| GyrA/B, ParC/E (Gram positive bacteria) | 28 | 0.9484 | 10.7 | 7.1 | 10.7 |
| DNA polymerase III E2/E3 (bacteria) | 20 | 0.9378 | 20.0 | 40.0 | 70.0 |
| Succinate CoA synthetase | 13 | 0.9182 | 7.7 | 30.8 | 23.1 |
| Ntr-type regulator/sensors (8 bacteria) | 14 | 0.9025 | 28.6 | 42.9 | 21.4 |
| Succinate CoA synthetase | 22 | 0.8959 | 54.6 | 50.0 | 54.5 |
| Omp-type regulator/sensors (5 bacteria) | 16 | 0.9307 | 0.0 | 68.8 | 31.3 |
| CCR-type chemokine/receptor (mouse/human) | 6 | 0.8790 | 66.7 | 66.7 | 33.3 |
| Acetyl CoA carboxylase | 9 | 0.8818 | 55.6 | 55.6 | 77.8 |
| Chemokine/receptor (mouse/human/rat) | 31 | 0.8789 | 19.4 | 16.1 | 3.2 |
| CKR-type chemokine/receptor (mouse/human/rat) | 18 | 0.8511 | 22.2 | 0.0 | 11.1 |
| CheA/CheY (11 bacteria) | 13 | 0.8370 | 23.1 | 15.4 | 23.1 |
| Nar-type regulator/sensors (8 bacteria) | 22 | 0.8488 | 18.2 | 9.1 | 13.6 |
| GyrA/GyrB (archaea) | 10 | 0.7948 | 20.0 | 20.0 | 10.0 |
| Cit-type regulator/sensors (E. coli/B. subtilis) | 5 | 0.7497 | 60.0 | 60.0 | 60.0 |
| ABC transporter membrane/binding protein (E. coli) | 17 | 0.4203 | 5.9 | 5.9 | 0.0 |
| ABC transporter membrane protein 1/2 (E. coli) | 19 | 0.6219 | 0.0 | 10.5 | 10.5 |
| ABC transporter membrane binding protein (H. influenzae) | 13 | 0.0427 | 15.4 | 23.1 | 7.7 |
| Two-component sensor/regulators (E. coli) | 27 | 0.6028 | 14.8 | 14.8 | 11.1 |
| Chemokine/receptor (human) | 13 | 0.5004 | 23.1 | 15.4 | 0.0 |
| ABC transporter membrane protein 1/2 (H. influenzae) | 14 | 0.3916 | 21.4 | 21.4 | 21.4 |
| Omp-type regulator/sensors (E. coli/B. subtilis) | 27 | 0.5314 | 7.4 | 33.3 | 3.7 |
| Omp-type regulator/sensors (E. coli) | 14 | 0.4295 | 28.6 | 14.3 | 14.3 |
| Omp-type regulator/sensors (B. subtilis) | 13 | 0.5628 | 15.4 | 7.7 | 15.4 |
| Lyt, Ple, and other type regulator/sensors (8 bacteria) | 20 | 0.4899 | 5.0 | 20.0 | 30.0 |
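The "stringent accuracy" and "correlation" columns are not defined in this section. Below is a minimal pure-Python sketch that relies on two assumptions: stringent accuracy is taken to be the percentage of proteins whose single most probable predicted partner is the true one, and "correlation" is taken to be a mirrortree-style Pearson correlation between the two families' matched distance matrices. Both function names are hypothetical.

```python
def stringent_accuracy(prob, true_partner):
    """Percentage of proteins whose single most probable partner is correct.

    prob[i][j] is the predicted probability that protein i of one family
    interacts with protein j of the other; true_partner[i] is the index of
    its known partner.
    """
    hits = 0
    for i, row in enumerate(prob):
        best = max(range(len(row)), key=lambda j: row[j])
        if best == true_partner[i]:
            hits += 1
    return 100.0 * hits / len(prob)


def distance_matrix_correlation(D1, D2):
    """Pearson correlation of the upper-triangular entries of two
    matched n x n distance matrices (mirrortree-style)."""
    n = len(D1)
    a = [D1[i][j] for i in range(n) for j in range(i + 1, n)]
    b = [D2[i][j] for i in range(n) for j in range(i + 1, n)]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5
```

Under these assumptions, a high correlation means the two families' distance matrices vary together, which is the signal that distance-based methods such as MATRIX, MORPH, and gCEED exploit when mapping interaction partners.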
Figure 7. Comparison of trees of different sizes. The larger tree is a 20-node GyrB tree. The smaller is a GyrA tree formed by randomly sampling nodes, with sizes ranging from 19 down to 10 nodes (x-axis). For each size of the smaller tree, a histogram of the vicinity hit rate is shown on the y-axis, based on 100 randomly formed trees of that size. The dark line indicates the average hit rate. (a) Accuracy of comparison without any known interaction information. (b) Accuracy of comparison when a single correct protein interaction pair is used as prior information.