Shanshan Cheng, Charles L Brooks.
Abstract
Viral capsid proteins assemble into large, symmetrical architectures that are not found in complexes formed by their cellular counterparts. Given the prevalence of the signature jelly-roll topology in viral capsid proteins, we are interested in whether these functionally unique capsid proteins are also structurally unique in terms of folds. To explore this question, we applied a structure-alignment based clustering of all protein chains in VIPERdb filtered at 40% sequence identity to identify distinct capsid folds, and compared the cluster medoids with a non-redundant subset of protein domains in the SCOP database, not including the viral capsid entries. This comparison, using Template Modeling (TM)-score, identified 2078 structural "relatives" of capsid proteins from the non-capsid set, covering altogether 210 folds following the definition in SCOP. The statistical significance of the 210 folds shared by two sets of the same sizes, estimated from 10,000 permutation tests, is less than 0.0001, which is an upper bound on the p-value. We thus conclude that viral capsid proteins are segregated in structural fold space. Our result provides novel insight into how the structural folds of capsid proteins, as opposed to their surface chemistry, might be constrained during evolution by the requirements of the assembled cage-like architecture. Importantly, our work also highlights a guiding principle for virus-based nanoplatform design in a wide range of biomedical applications and materials science.
Year: 2013 PMID: 23408879 PMCID: PMC3567143 DOI: 10.1371/journal.pcbi.1002905
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Capsid shells and the folded topology of a typical capsid protein.
A) Representative icosahedral viral capsid structures of varying sizes. Satellite Tobacco Mosaic Virus, a T = 1 virus, has a radius of 8.8 nm, and Paramecium bursaria Chlorella virus 1 (PBCV-1), a pT = 169 virus, has a radius of 92.9 nm. Here pT stands for ‘pseudo T number’, which simply means the subunits are not chemically identical (the primary sequences are different). These protein shells are large in that they are assembled from tens to hundreds of protein monomers, and they are highly symmetrical. B) The signature jelly-roll of viral capsid proteins, with 8 β-strands forming two antiparallel sheets. The wedge or trapezoidal shape of this particular fold immediately reveals six flat surfaces for monomer-monomer interaction: the sides, the two loop ends, and the top and the bottom. The prevalence of the jelly-roll fold among capsid proteins might be related to its relative ease of tiling.
Figure 2. Comparison in structural fold space of capsid proteins and non-capsid ones.
Capsid proteins form large, highly symmetric protein shells (left), while generic proteins form other types of complexes (right), exemplified here by an RNA polymerase elongation complex. The overlap between the structural space of viral capsid proteins and that of generic proteins represents the set of non-capsid ‘relatives’ of capsid proteins. The figure is for illustration purposes only and is not drawn to scale.
Figure 3. Domain size distribution.
Shown in pink is the density distribution of the lengths of non-capsid proteins, and that of capsid proteins is shown in blue. Viral capsid proteins appear to have larger domains overall than their cellular counterparts, with a few exceptionally complex domains having more than 600 residues. A size cutoff of 600 residues was later applied so that the two sets examined are of comparable sizes.
Figure 4. Clustering to find representative capsid folds.
Shown here are all pairwise distances between members from the same cluster (grey) and between members from different clusters (blue). Partitioning was chosen such that each cluster is maximally homogeneous, with no members within the same cluster being farther than 0.6 apart.
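The constraint described in this caption, that no two members of a cluster are farther than 0.6 apart, can be illustrated with a small sketch. The caption does not name the clustering algorithm, so the complete-linkage agglomerative scheme below is an assumption chosen because it enforces exactly that constraint. The function names and the toy distance matrix are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def complete_linkage_clusters(dist, cutoff=0.6):
    """Greedy agglomerative clustering with complete linkage (an assumed scheme).

    dist: symmetric matrix (list of lists) of pairwise distances.
    Merging stops once the closest pair of clusters would exceed `cutoff`,
    so no two members of a final cluster are farther apart than `cutoff`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (x, y), d = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because y > x
    return clusters


def max_intra_distance(cluster, dist):
    """Largest pairwise distance inside one cluster (0.0 for singletons)."""
    return max((dist[i][j] for i, j in combinations(cluster, 2)), default=0.0)


# Hypothetical toy distances between five structures.
toy = [
    [0.0, 0.2, 0.3, 0.9, 0.8],
    [0.2, 0.0, 0.4, 0.85, 0.9],
    [0.3, 0.4, 0.0, 0.7, 0.95],
    [0.9, 0.85, 0.7, 0.0, 0.5],
    [0.8, 0.9, 0.95, 0.5, 0.0],
]
clusters = complete_linkage_clusters(toy, cutoff=0.6)
print(clusters)
print([max_intra_distance(c, toy) for c in clusters])
```

On the toy matrix the first three structures end up in one cluster and the last two in another, and every within-cluster distance stays at or below the cutoff.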
Figure 5. The 56 representative capsid folds.
Domains within one cluster are superimposed on one another to show good structural alignment, with number of members in each cluster indicated. The prevalence of singlet clusters reflects the scarcity of structural data for many viral families.
The 21 folds covered by structural relatives of capsid proteins.
| Fold (as in SCOP) | Name of fold | Description of fold | Contains capsid proteins | Example of non-capsid relative | ID of example |
|---|---|---|---|---|---|
| b.1 | Immunoglobulin-like beta-sandwich | | Yes | Titin, I27 | d1tiua_ |
| b.2 | Common fold of diphtheria toxin/transcription factors/cytochrome f | | No | Runt-related transcription factor 1 | d1eaqa_ |
| b.6 | Cupredoxin-like | | No | Auracyanin | d1qhqa_ |
| b.7 | C2 domain-like | | No | Chaperone protein Caf1M | d1p5va2 |
| b.14 | Calpain large subunit, middle domain (domain III) | | No | M-Calpain | d1df0a2 |
| b.18 | Galactose-binding domain-like | | No | Xyn10B carbohydrate-binding module | d1h6ya_ |
| b.22 | TNF-like | | No | Tumor necrosis factor superfamily member 4 | d2hewf1 |
| b.23 | CUB-like | | No | Acidic seminal fluid protein (spermadhesin) | d1sfpa_ |
| b.29 | Concanavalin A-like lectins/glucanases | | Yes | Sugar binding protein | d1is3a_ |
| b.47 | Trypsin-like serine proteases | | Yes | human alpha-thrombin | d1h8d.1 |
| b.71 | Glycosyl hydrolase domain | | No | alpha-galactosidase | d1uasa1 |
| b.82 | Double-stranded beta-helix | | No | transcriptional regulator, HTH_3 family | d1y9qa2 |
| b.121 | Nucleoplasmin-like/VP (viral coat and capsid proteins) | | Yes | Nucleoplasmin-like protein (histone chaperone) | d1nlqa_ |
| b.132 | Supernatant protein factor (SPF), C-terminal domain | | No | Lipid Binding Protein | d1olma2 |
| b.135 | Superantigen (mitogen) Ypm | | No | superantigen from Yersinia pseudotuberculosis | d1pm4a_ |
| c.2 | NAD(P)-binding Rossmann-fold domains | | No | Shikimate dehydrogenase | d1nyta1 |
| c.16 | Lumazine synthase | | No | lumazine synthase | d1ejba_ |
| c.23 | Flavodoxin-like | | No | Lysine aminomutase | d1xrsb1 |
| c.37 | P-loop containing nucleoside triphosphate hydrolases | | No | elongation factor SelB | d1wb1a4 |
| c.44 | Phosphotyrosine protein phosphatases I-like | | No | IIBcellobiose | d1iiba_ |
| c.66 | S-adenosyl-L-methionine-dependent methyltransferases | | No | salicylic acid carboxyl methyltransferase | d1m6ex_ |
Of these 21 folds, 14 are either Greek-key or jelly-roll (the latter being a specific variation of the former). Remarkably, 17 folds are specific to non-capsid proteins and are only marginally similar to capsid proteins in structure.
Figure 6. Capsid proteins are structurally distant from generic proteins.
Each curve plots the empirical cumulative fraction distribution of distances between one set of 56 proteins and their nearest neighbor in the complementary set. The comparison between the capsid set and the non-capsid proteins is colored in blue, while those from the 10,000 permutation tests are colored in grey. The average empirical cumulative fraction distribution of the 10,000 permutation tests is colored in red. The capsid set is clearly farther from its complementary set than would be expected by chance.
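The curves described here can be reproduced in outline with two small helpers: one that finds, for each member of a set, the distance to its nearest neighbor in the complementary set, and one that turns those distances into an empirical cumulative fraction. This is a minimal sketch under stated assumptions. The helper names and the toy 4×4 distance matrix are hypothetical, and the distance measure itself is left abstract.

```python
def nearest_neighbor_distances(query_ids, reference_ids, dist):
    """For each query index, the smallest distance to any reference index."""
    return [min(dist[q][r] for r in reference_ids) for q in query_ids]


def ecdf(values):
    """Sorted values and the cumulative fraction of data at or below each."""
    xs = sorted(values)
    n = len(xs)
    return xs, [(k + 1) / n for k in range(n)]


# Hypothetical toy distances: indices 0-1 play the role of the chosen set,
# indices 2-3 the role of its complement.
toy = [
    [0.0, 0.3, 0.7, 0.9],
    [0.3, 0.0, 0.6, 0.8],
    [0.7, 0.6, 0.0, 0.4],
    [0.9, 0.8, 0.4, 0.0],
]
chosen, complement = [0, 1], [2, 3]

nn = nearest_neighbor_distances(chosen, complement, toy)
xs, fractions = ecdf(nn)
print(nn, xs, fractions)
```

Repeating the same two calls for randomly drawn sets of the same size would give the permutation curves against which the chosen set is compared. A curve shifted toward larger distances indicates a set that sits farther from its complement.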
Figure 7. Statistical significance of test statistic.
Not one of the 10,000 permutations resulted in 210 or fewer shared folds between the set of 56 protein domains and its complement set, which puts the p-value of our test statistic below 0.0001, an upper bound on the statistical significance.
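The logic of this one-tailed permutation test can be sketched as follows. The sketch assumes that a fold counts as "shared" when at least one domain outside the chosen set lies within some distance threshold of a domain inside it. That threshold, the function names, and the toy data are all hypothetical, and the paper's actual definition of a structural relative may differ.

```python
import random


def shared_fold_count(subset, fold_of, dist, threshold):
    """Number of distinct folds among outside domains within `threshold` of the subset."""
    inside = set(subset)
    shared = set()
    for j in range(len(fold_of)):
        if j in inside:
            continue
        if any(dist[i][j] <= threshold for i in inside):
            shared.add(fold_of[j])
    return len(shared)


def permutation_test(subset, fold_of, dist, threshold, n_perm=10_000, seed=0):
    """One-tailed test: how often does a random set of the same size share
    as few folds with its complement as the observed set does?"""
    rng = random.Random(seed)
    observed = shared_fold_count(subset, fold_of, dist, threshold)
    population = range(len(fold_of))
    hits = sum(
        shared_fold_count(rng.sample(population, len(subset)),
                          fold_of, dist, threshold) <= observed
        for _ in range(n_perm)
    )
    return observed, hits, hits / n_perm


# Hypothetical toy data: domains 0-1 form one tight group, 2-5 another.
group = [0, 0, 1, 1, 1, 1]
fold_of = ["a", "a", "b", "c", "d", "e"]
toy_dist = [
    [0.0 if i == j else (0.3 if group[i] == group[j] else 0.9)
     for j in range(6)]
    for i in range(6)
]

observed, hits, p = permutation_test([0, 1], fold_of, toy_dist,
                                     threshold=0.5, n_perm=200)
print(observed, hits, p)
```

When `hits` is zero, as in the case described in the caption, the empirical fraction is 0 and the test can only report that the p-value lies below `1 / n_perm`, which is 0.0001 for 10,000 permutations.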
The seven functional classes of proteins we studied are not significantly distinguished in their folded topology.
| Functional class | Size of class | Subgroups, if any, included | One-tailed p-value |
|---|---|---|---|
| Kinase | 213 | | 0.1449 |
| Globin | 32 | | 0.4154 |
| Dehydrogenase | 297 | | 0.3461 |
| Polymerase | 67 | | 0.0572 |
| Chaperone | 33 | | 0.2925 |
| Antigen | 49 | | 0.4411 |
| Muscle | 18 | | 0.1972 |
For each functional class, the number of folds shared between the class and its complement is not significantly smaller than would be expected by chance, with a one-tailed p-value greater than 0.05 in every case, suggesting that these cellular proteins are highly connected in structural fold space.