Vichetra Sam, Chin-Hsien Tai, Jean Garnier, Jean-Francois Gibrat, Byungkook Lee, Peter J Munson.
Abstract
BACKGROUND: Current classifications of protein folds are based, ultimately, on visual inspection of similarities. Previous attempts to use computerized structure comparison methods show only partial agreement with curated databases, but have failed to provide detailed statistical and structural analysis of the causes of these divergences.
Year: 2006 PMID: 16613604 PMCID: PMC1513609 DOI: 10.1186/1471-2105-7-206
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. ROC curves. ROC curves of VAST (dotted line) and SHEBA (solid line) obtained by plotting the True Positive Rate (TPR, eq. 1, see Methods) against the False Positive Rate (FPR, eq. 2, see Methods). The Area Under the ROC Curve (AUC) is 0.90 for VAST and 0.93 for SHEBA.
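As an illustration of how such a curve is built, the following minimal Python sketch sweeps a score cutoff over a list of labelled domain pairs and integrates the resulting curve with the trapezoidal rule. It assumes the usual definitions TPR = TP/(TP+FN) and FPR = FP/(FP+TN). The function names and the toy scores are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, sweeping the cutoff from high to low scores.

    A pair is called similar when its score is >= the cutoff.
    labels[i] is True when pair i belongs to the same fold (a true positive if called).
    """
    n_pos = sum(1 for flag in labels if flag)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pair")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        cutoff = ranked[i][0]
        # Pairs with tied scores cross the cutoff together.
        while i < len(ranked) and ranked[i][0] == cutoff:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): four pairs, two of them from the same fold.
toy_scores = [0.9, 0.8, 0.7, 0.6]
toy_labels = [True, False, True, False]
toy_auc = auc(roc_points(toy_scores, toy_labels))
```

With these toy values, three of the four (positive, negative) combinations are ranked correctly, so `toy_auc` equals 0.75.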
Figure 2. Confusion matrix heat map. Confusion matrix heat map for VAST with a Pcli cutoff value of 2.5 and for SHEBA with a Zscore cutoff value of 2.7. The cutoffs correspond to an overall average FPR of 0.01, and result in an overall average TPR of 0.616 and 0.748 for VAST and SHEBA, respectively. The x (target folds) and y (query folds) axes of the heat maps are labeled by the SCOP folds, grouped into classes A, B, C, D, E, F and G. Each class is delimited by a vertical line (for the x axis) and a horizontal line (for the y axis). Each pixel within the heat maps represents a fold-specific true or false positive rate and takes a value between 0 and 1. Diagonal and off-diagonal pixels correspond to the fold-specific true positive rate TPR(c) (eq. 4, see Methods) and the fold-specific false positive rate FPR(c) (eq. 3, see Methods), respectively. To improve the visibility of the heat maps, rates between 0 and 0.2 are represented in grey scale, where white corresponds to a rate of 0 and black to a rate at or above 0.2. For high-resolution heat maps of VAST and SHEBA, see Additional file 1.
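A fold-by-fold matrix of this kind can be sketched as follows. Equations 3 and 4 are not reproduced in this record, so the sketch assumes a simple pair-counting definition: the entry for (query fold, target fold) is the fraction of ordered domain pairs, query domain in the first fold and target domain in the second, whose score reaches the cutoff. All names and the toy scores are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple


def confusion_rates(
    fold_of: Dict[str, str],
    score: Dict[Tuple[str, str], float],
    cutoff: float,
) -> Dict[Tuple[str, str], float]:
    """Return {(query_fold, target_fold): rate} at the given score cutoff.

    Diagonal keys (same fold twice) play the role of fold-specific true
    positive rates; off-diagonal keys play the role of false positive rates.
    Pairs missing from `score` are treated as not detected (an assumption).
    """
    hits: Dict[Tuple[str, str], int] = defaultdict(int)
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for query, query_fold in fold_of.items():
        for target, target_fold in fold_of.items():
            if query == target:
                continue  # a domain is never compared with itself
            key = (query_fold, target_fold)
            totals[key] += 1
            if score.get((query, target), float("-inf")) >= cutoff:
                hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Toy data (hypothetical domains and scores), using the SHEBA cutoff of 2.7.
toy_folds = {"d1": "b.1", "d2": "b.1", "d3": "b.2"}
toy_scores = {
    ("d1", "d2"): 3.0, ("d2", "d1"): 3.0,
    ("d1", "d3"): 2.8, ("d3", "d1"): 1.0,
    ("d2", "d3"): 0.5, ("d3", "d2"): 0.2,
}
toy_rates = confusion_rates(toy_folds, toy_scores, cutoff=2.7)
```

Keeping the matrix as a dictionary keyed by fold pairs avoids fixing a fold ordering; a plotting step could map the keys to pixel positions in whatever order the heat map uses.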
Figure 3. Distribution of true positive rates. Distribution of fold-specific true positive rates within each SCOP class (A to G) for VAST and SHEBA. TPR values (eq. 4, see Methods) are obtained using the same cutoff values as in Figure 2. The y axes of the VAST and SHEBA distributions use the same scale within each fold class. Histogram bar height represents the number of folds for a given range of TPR. The x axis is divided into 20 bins. The class-specific average TPR is reported within each subplot. For the list of TPR values obtained by each fold with VAST and SHEBA, see Additional file 2.
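The binning described in this caption can be illustrated with a short Python sketch. It assumes 20 equal-width bins over the interval from 0 to 1, with a rate of exactly 1 placed in the last bin; the caption does not state the bin edges, so this is an assumption, and the function names and sample rates are hypothetical.

```python
from typing import List, Sequence


def tpr_histogram(tprs: Sequence[float], n_bins: int = 20) -> List[int]:
    """Count fold-specific TPR values in n_bins equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for rate in tprs:
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must lie between 0 and 1")
        # A rate of exactly 1.0 would index one past the end, so clamp it.
        index = min(int(rate * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


def class_average(tprs: Sequence[float]) -> float:
    """Unweighted mean of the fold-specific TPR values of one class."""
    return sum(tprs) / len(tprs)


# Hypothetical fold-specific rates for one class.
sample_tprs = [0.0, 0.26, 0.5, 1.0]
sample_counts = tpr_histogram(sample_tprs)
sample_mean = class_average(sample_tprs)
```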
Folds containing domain pairs whose similarity is undetected by both VAST and SHEBA.
| A | a.4(1576/13572), a.118(777/2550), a.39(282/1640), a.60(238/812), a.138(166/272), a.24(77/930), a.1(62/930), a.2(47/272), a.100(39/90), a.25(37/182), a.3(37/992), a.29(25/132), a.26(20/650), a.23(10/20), a.28(10/72), a.69(9/20), a.7(9/342), a.93(8/42), a.102(7/600), a.112(4/20), a.127(4/30), a.35(4/110), a.61(4/30), a.5(3/90), a.55(3/20), a.74(3/272), a.116(2/20), a.126(2/30), a.133(2/20), a.137(2/6), a.64(2/20), a.128(1/42), a.144(1/12), a.27(1/72), a.48(1/6), a.6(1/42). |
| B | b.1(2973/57840), b.40(1382/7482), b.34(436/2652), b.82(341/930), b.10(323/1640), b.2(164/702), b.29(163/1056), b.85(91/156), b.43(69/702), b.84(49/182), b.30(32/110), b.50(16/132), b.18(14/552), b.13(12/110), b.35(11/72), b.7(11/182), b.19(10/30), b.6(8/1406), b.80(8/110), b.92(7/56), b.3(6/110), b.60(6/420), b.106(5/6), b.52(4/132), b.49(3/12), b.58(3/20), b.21(2/6), b.45(2/12), b.53(2/6), b.83(2/2). |
| C | c.37(6218/14762), c.1(1152/32942), c.55(929/2756), c.26(255/1722), c.52(228/506), c.2(197/9702), c.23(161/4160), c.69(92/2550), c.94(90/600), c.66(87/1190), c.56(38/552), c.47(17/2550), c.58(16/110), c.92(16/110), c.3(13/2070), c.10(12/306), c.53(12/72), c.8(12/90), c.14(9/110), c.51(9/156), c.72(6/210), c.43(4/42), c.61(3/272), c.36(2/342), c.19(1/6), c.63(1/20), c.78(1/132), c.87(1/30), c.9(1/2), c.97(1/12). |
| D | d.58(2052/17556), d.92(235/552), d.3(221/380), d.142(164/380), d.15(104/3080), d.169(74/552), d.26(74/306), d.17(59/552), d.81(54/210), d.153(49/600), d.166(42/90), d.211(40/132), d.144(33/650), d.110(26/306), d.129(23/182), d.68(23/90), d.2(22/132), d.14(14/240), d.79(14/210), d.108(12/210), d.16(12/182), d.87(10/156), d.4(8/12), d.104(5/210), d.109(4/182), d.122(4/110), d.143(4/6), d.41(4/90), d.67(4/20), d.10(3/20), d.50(3/72), d.184(2/2), d.52(2/90), d.18(1/2), d.74(1/56), d.82(1/6). |
| E | e.8(110/182), e.26(3/6) |
| F | f.1(58/110), f.4(46/182), f.21(12/42), f.23(5/20), f.7(4/6). |
| G | g.3(357/1406), g.41(96/420), g.15(5/90), g.17(4/132), g.39(2/132). |
Folds from classes A, B, C, D, E, F and G are reported in rows labeled by the name of the class. Within a class, folds are ordered by decreasing number of domain pairs with undetected similarity. For each fold, the number of such pairs and the total number of pairs are given in parentheses. Similarity between the domains of a pair was considered undetected when both their Pcli and their Zscore were below the 5% FPR cutoffs of 1 for Pcli and 1.6 for Zscore.
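The counting rule in this legend can be sketched in Python. The totals in the table are consistent with counting ordered pairs, n(n-1) for a fold of n domains (for example, 13572 = 117 × 116 for a.4), so the sketch counts ordered pairs as well. The example scores are the three values quoted in the Figure 4 caption; treating each score as symmetric in the two domains is an assumption made here for brevity, and all function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

Pair = Tuple[str, str]


def symmetric(values: Dict[Pair, float]) -> Dict[Pair, float]:
    """Copy each score to the reversed pair (assumes symmetric scores)."""
    out: Dict[Pair, float] = {}
    for (first, second), value in values.items():
        out[(first, second)] = value
        out[(second, first)] = value
    return out


def undetected_pairs(
    domains: Sequence[str],
    pcli: Dict[Pair, float],
    zscore: Dict[Pair, float],
    pcli_cutoff: float = 1.0,   # 5% FPR cutoff for VAST, from the legend
    z_cutoff: float = 1.6,      # 5% FPR cutoff for SHEBA, from the legend
) -> Tuple[int, int]:
    """Return (undetected, total) over ordered pairs of distinct domains."""
    total = undetected = 0
    for first in domains:
        for second in domains:
            if first == second:
                continue
            total += 1
            pair = (first, second)
            # Undetected means missed by both methods.
            if pcli[pair] < pcli_cutoff and zscore[pair] < z_cutoff:
                undetected += 1
    return undetected, total


# The three b.1 domains and pair scores quoted in the Figure 4 caption.
b1_domains = ["d1c5ch2", "d1akjd_", "d1pama1"]
b1_pcli = symmetric({
    ("d1c5ch2", "d1akjd_"): 4.7,
    ("d1akjd_", "d1pama1"): 4.2,
    ("d1c5ch2", "d1pama1"): -0.5,
})
b1_zscore = symmetric({
    ("d1c5ch2", "d1akjd_"): 3.09,
    ("d1akjd_", "d1pama1"): 3.34,
    ("d1c5ch2", "d1pama1"): 0.11,
})
b1_result = undetected_pairs(b1_domains, b1_pcli, b1_zscore)
```

For these three domains only the pair d1c5ch2 and d1pama1 falls below both cutoffs, so `b1_result` is `(2, 6)`: two ordered pairs out of six.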
Figure 4. Structural variations within fold b.1. Domains (a) d1c5ch2, (b) d1akjd_, and (c) d1pama1 belong to the fold b.1 (Immunoglobulin-like beta-sandwich; 7 strands in 2 sheets; greek-key; some members of the fold have additional strands). Domain pair (a) and (b) has Pcli = 4.7 and Zscore = 3.09; domain pair (b) and (c) has Pcli = 4.2 and Zscore = 3.34; and domain pair (a) and (c) has Pcli = -0.5 and Zscore = 0.11. Domains d1c5ch2, d1akjd_, and d1pama1 have 103, 114, and 86 residues, respectively. The helices are colored in green, the strands in red, and the other regions in blue. This and all other structure figures were prepared using PyMOL [43].
Figure 5. Repeat of a structural motif within fold a.118. The color scheme is the same as in Figure 4. Structures of domains (a) d1a17_, (b) d1kula_ and (c) d1qbkb_ from fold a.118 (alpha-alpha superhelix, multihelical; 2 (curved) layers: alpha/alpha; right-handed superhelix). The domains have 159 residues and 7 helices, 211 residues and 10 helices, and 888 residues and 48 helices, respectively. The VAST similarity score Pcli assigned to the domain pair (a) and (b) is -2.3, to (a) and (c) is -3, and to (b) and (c) is -8. The SHEBA Zscores are 1.8, 1.3, and 1.3, respectively. The negative Pcli values should be interpreted as values very close to or equal to zero (no similarity); they result from the use of an approximation in the computation of Pcli.
Figure 6. Decoration of a common core. Structures of domains d1e9ga_ (a) and d1enfa1 (b) of SCOP fold b.40 (barrel, closed or partly opened n = 5, S = 10 or S = 8; greek-key). The color scheme is the same as in Figure 4. Domain (a) has 284 residues, and (b) has 100 residues. The Pcli and Zscore values assigned by VAST and SHEBA to this pair are 0.1 and -1.3, respectively.
Sets of folds confused by both VAST and SHEBA.
| | Sets of confused folds, S | Number of domains in S | SHEBA | SHEBA | VAST | VAST | Explanation for confusion | ||
| 1 | a.28, a.39 | 50 | 29 | 57 | 10 | 16 | 4 helix bundle up-and-down (a.28), and 4 helix array of 2 hairpins folds. Confusion is caused by match of helices oriented similarly. Folds confused mostly by SHEBA. | ||
| 2 | a.46, a.52 | 9 | 45 | 97 | 7 | 36 | 4 helix bundle left and right-handed super helix (a.46), and 4 helix right-handed super helix folds. Confusion is caused by match of helices oriented similarly. Folds confused mostly by SHEBA. | ||
| 3 | a.47, a.7 | 24 | 87 | 88 | 8 | 20 | 3 helix bundle (a.7) and 4 helix bundle (a.47) folds. Confusion due to match of very similar structure. Folds confused mostly by SHEBA. | ||
| 4 | b.68, b.69, b.66, b.67, b.70 | 45 | 92 | 98 | 40 | 83 | Beta-propellers (repetitive 4-stranded blades) folds, of 4, 5, 6, 7 or 8 blades depending on the fold. Confusion is caused by match of several 4-stranded blades among domains of these folds. | ||
| 5 | b.1, b.2, b.3, b.7, b.12 | 297 | 19 | 66 | 32 | 68 | Beta sandwich folds with sheets of 7, 8 or 9 strands, with Greek-key topology. The motif causing the confusion among folds is a sandwich, which is rather well matched between domains of these folds. | ||
| 6 | b.24, b.71 | 24 | 69 | 97 | 27 | 93 | Sandwich fold, with 10 strands in 2 sheets, and "folded meander topology" fold (b.24), and folded sheet with Greek-key topology. Confusion is due to match of parts of the sheets of the common core of these folds. | ||
| 7 | b.60, b.61 | 30 | 63 | 90 | 57 | 78 | Closed barrel, with meander topology. Confusion caused by good match between barrel motifs of the common core. | ||
| 8 | b.43, b.49, b.58, b.44 | 39 | 42 | 71 | 32 | 72 | Folds of closed barrel with Greek-key topology. Confusion is due to the match of a substantial part of the barrel common core among domains of these folds. | ||
| 9 | b.107, b.4 | 4 | 100 | 100 | 25 | 100 | Sandwich fold (b.4), and closed barrel fold (b.107). Confusion is caused by the good match between a deformed barrel motif and a sandwich motif. | ||
| 10 | b.34, b.38 | 62 | 69 | 67 | 19 | 49 | Barrel folds, with meander topology. Confusion is caused by the match between the barrel common cores. | ||
| 11 | b.38, b.56 | 12 | 52 | 100 | 65 | 93 | Open barrel (b.38) and closed barrel (b.56) folds. Confusion is caused by the match of the barrel. | ||
| 12 | b.10, b.19, b.13, b.18, b.22, b.23 | 91 | 42 | 76 | 16 | 54 | Folds with common core motif of beta sandwich; the 2 sheets are made of 8, 9 or 10 strands depending on the fold, and with jelly roll topology. The confusion among these folds is caused by the match of the strands of the beta sandwich common core. | ||
| 13 | c.1, c.6 | 185 | 62 | 75 | 78 | 87 | TIM barrel (c.1) and variant of beta/alpha barrel, with closed parallel beta-sheet barrel (c.6) folds. Confusion is caused by the match of almost the whole TIM barrel. | ||
| 14 | c.8, c.98 | 14 | 50 | 75 | 30 | 54 | 3 layer beta/beta/alpha (c.8) and 3 layer alpha/beta/alpha (c.98) folds. Confusion is caused by the match between common beta/alpha layers. | ||
| 15 | c.84, c.95 | 19 | 65 | 91 | 55 | 92 | 3 layer alpha/beta/alpha of 4 strands (c.84), and of 5 strands (c.95) folds. Match of the 3 layer alpha/beta/alpha common core causes the confusion. | ||
| 16 | c.101, c.73, c.27 | 7 | 11 | 100 | 49 | 100 | 3 layer alpha/beta/alpha folds, with 5, 6 or 8 strands depending on the fold. Confusion is caused by the match of the 3 layer alpha/beta/alpha common core. | ||
| 17 | c.100, c.28, c.25, c.24, c.30, c.78, c.108, c.116, c.31, c.114, c.3, c.4, c.49, c.59, c.16, c.57, c.44, c.48, c.2, c.33, c.32, c.34, c.23, c.62, c.65, c.5 | 334 | 24 | 80 | 51 | 92 | 3 layer alpha/beta/alpha folds, with beta sheet of 4, 5, 6 or 7 strands depending on the fold. 3 layer beta/beta/alpha with central sheet of 5 strands for c.3. Confusion among 3 layer alpha/beta/alpha folds is caused by the match of the 3 layer alpha/beta/alpha common core. Confusion between 3 layer alpha/beta/alpha and beta/beta/alpha is caused by the match of the 2 layer beta/alpha. | ||
| 18 | d.13, d.173 | 7 | 26 | 93 | 43 | 86 | Fold containing the 3 layer alpha/beta/alpha common core (d.13) and unusual fold containing a common core of beta-alpha-beta-alpha-beta-alpha-beta (d.173). Confusion caused by the match of some strands and helices. | ||
| 19 | d.65, d.67 | 7 | 47 | 46 | 60 | 64 | 2 layer alpha/beta sandwich fold. Confusion caused by the match of 2 layer alpha/beta sandwich common core. | ||
| 20 | d.181, d.212 | 5 | 50 | 60 | 17 | 60 | Folds containing beta-alpha-beta units. Confusion caused by match on the alpha/beta layers. | ||
| 21 | d.10, d.50 | 14 | 34 | 66 | 40 | 61 | 2 layer alpha/beta folds. Confusion caused by match on the 2 layer alpha/beta common cores. | ||
| 22 | d.140, d.68 | 12 | 34 | 68 | 40 | 52 | Fold with 2 layer beta/alpha sandwich common core. Confusion is caused by match of the 2 layer beta/alpha sandwich. | ||
| 23 | d.151, d.160 | 7 | 75 | 100 | 58 | 100 | Beta-sandwich; duplication of alpha+beta (d.151), 4 layers: alpha/beta/beta/alpha; mixed beta sheets (d.160) folds. Confusion due to match of the alpha beta sandwich. | ||
| 24 | d.95, d.206, d.64 | 12 | 18 | 96 | 34 | 79 | 2 layer alpha/beta sandwich folds. Confusion caused by the match of the 2 layer alpha/beta sandwich. | ||
| 25 | d.11, d.40 | 5 | 100 | 100 | 67 | 100 | 2 layer alpha/beta sandwich folds. Confusion caused by match of the 2 layer alpha/beta sandwich. | ||
| 26 | d.130, d.80, d.52 | 19 | 53 | 90 | 51 | 62 | 2 layer alpha/beta sandwich folds. Confusion is caused by the match of the 2 layer alpha/beta sandwich. | ||
| 27 | d.45, d.74, d.58, d.51, d.94, d.141, d.105 | 160 | 43 | 58 | 48 | 59 | 2 layer alpha/beta sandwich, and two beta-sheets and one alpha-helix packed around single core (d.141) folds. Confusion caused by match of the sheet and strands of the 2 layer alpha/beta sandwich core motif. | ||
| 28 | e.24, c.16, c.57, c.44, c.23, c.5 | 79 | 47 | 73 | 68 | 85 | A domain component of a "multi-domain" domain of fold e.24 matches the full domain of another fold which does not belong to the E class. | ||
| 29 | e.4, c.48, c.2, c.32, c.33, c.34, c.23 | 178 | 35 | 74 | 74 | 87 | A domain component of a "multi-domain" domain of fold e.4 matches the full domain of another fold which does not belong to the E class |
Clusters of confused folds in the VAST and SHEBA heat maps are reported. Rows 1 to 27 are intra-class clusters of confused folds found along the diagonal of the heat map. Only confusions in classes A, B, C and D are reported. Rows 28 and 29 are two off-diagonal clusters involving multi-domain folds. Clusters and confused folds are listed in the order of appearance in the heat map. The heat maps of both methods obtained at 1% overall FPR were used to determine these clusters. Column 3 is the total number of domains within the set S. Columns 4 to 6 report the FPR, the TPR (see Methods) and their ratio (in bold) for SHEBA; similarly, columns 7 to 9 report the FPR, the TPR and their ratio (in bold) for VAST.
Figure 7. Confusion matrix for the B class. Confusion matrix heat map for VAST and SHEBA showing confusion among some SCOP folds of class B (mainly beta domains). Fold identifiers appear on the x and y axes. The grey scale runs from white to black for positive rates from 0 to 1.
Figure 8. Similar structures in different SCOP folds. Structures of domains (a) d1gyha_ of fold b.67 and (b) d1loqa2 of fold b.69, with 318 residues and 295 residues, respectively. They correspond to beta propeller domains with 5 and 7 four-stranded blades, respectively. The Pcli and Zscore values are 5.2 and 7.2, respectively. The color scheme is the same as in Figure 4.
Figure 9. Superposition of two structures. Superposition by VAST of two structures from different 3-layer alpha/beta/alpha SCOP folds of class C. View of the backbones of domains (a) d1a8p_2 and (b) d1a9xa2, from folds c.25 and c.24, respectively. The parts of both structures superposed by VAST are in red and the unmatched residues in green. The superposition aligned 71 residues; d1a8p_2 has 158 residues and d1a9xa2 has 138; RMSD = 2.7, Pcli = 6.0. The SHEBA Zscore is 3.4. The SCOP definition of fold c.25 is: Ferredoxin reductase-like, C-terminal NADP-linked domain; 3 layers, alpha/beta/alpha; parallel beta-sheet of 5 strands, order 32145. The SCOP definition of fold c.24 is: Methylglyoxal synthase-like; 3 layers, alpha/beta/alpha; parallel beta-sheet of 5 strands, order 32145.
Figure 10. Confusion matrix heat map for the D class. Confusion matrix heat map for the D class for VAST and SHEBA showing clusters of confused SCOP folds. The fold identifiers appear on the x and y axes. The grey scale runs from white to black for positive rates from 0 to 1.
Figure 11. Confusion between SCOP folds of class B. The color scheme is the same as in Figure 4. Domain (a) d1tvda_ and domain (b) d1pama1 belong to the same fold, b.1 (sandwich; 7 strands in 2 sheets; greek-key), and are found similar with Pcli = 3 and Zscore = 3.38. Domain (c) d1ep3b1 and domain (d) d1d2ea1 belong to the same fold, b.43 (barrel, closed; n = 6, S = 10; greek-key), and are found similar with Pcli = 4.5 and Zscore = 3.7. Domains (b) and (c) belong to folds defined by different folding patterns, yet both VAST and SHEBA found them similar, with Pcli = 3.1 and Zscore = 3.32. Domains (a) and (d) were found dissimilar by VAST and SHEBA, with Pcli = -1.8 and Zscore = 0.73.
The four possible outcomes of ROC analysis for a particular domain.
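Under the standard ROC terminology (assumed here, since the Methods equations are not part of this record), each domain compared with a query domain falls into one of four outcomes: a true positive (same fold, score at or above the cutoff), a false negative (same fold, score below the cutoff), a false positive (different fold, score at or above the cutoff), or a true negative (different fold, score below the cutoff). A minimal Python sketch with hypothetical names and toy scores:

```python
from typing import Dict, Tuple


def roc_outcomes(
    query: str,
    fold_of: Dict[str, str],
    score: Dict[Tuple[str, str], float],
    cutoff: float,
) -> Dict[str, int]:
    """Count TP, FP, FN and TN for one query domain against all other domains."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for other, other_fold in fold_of.items():
        if other == query:
            continue
        same_fold = other_fold == fold_of[query]
        called_similar = score[(query, other)] >= cutoff
        if same_fold and called_similar:
            counts["TP"] += 1
        elif same_fold:
            counts["FN"] += 1
        elif called_similar:
            counts["FP"] += 1
        else:
            counts["TN"] += 1
    return counts


def rates(counts: Dict[str, int]) -> Tuple[float, float]:
    """Return (TPR, FPR) using TPR = TP/(TP+FN) and FPR = FP/(FP+TN)."""
    tpr = counts["TP"] / (counts["TP"] + counts["FN"])
    fpr = counts["FP"] / (counts["FP"] + counts["TN"])
    return tpr, fpr


# Toy data (hypothetical): query q shares a fold with s1 and s2 only.
toy_fold_of = {"q": "b.1", "s1": "b.1", "s2": "b.1", "o1": "b.43", "o2": "c.1"}
toy_score = {("q", "s1"): 3.4, ("q", "s2"): 0.1, ("q", "o1"): 3.3, ("q", "o2"): 0.7}
toy_counts = roc_outcomes("q", toy_fold_of, toy_score, cutoff=2.7)
toy_tpr, toy_fpr = rates(toy_counts)
```

Here each outcome occurs exactly once, so both `toy_tpr` and `toy_fpr` equal 0.5.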