Taushif Khan, Shailesh Kumar Panday, Indira Ghosh.
Abstract
BACKGROUND: In protein design, correct use of topology is among the first and most critical considerations. Meticulous selection of backbone topology drastically reduces the structure search space. With ProLego, we present a server application to explore the component aspect of protein structures and provide an intuitive and efficient way to scan the protein topology space. RESULT: We have implemented an in-house "topological representation" in an automated pipeline that extracts the protein topology from a given protein structure. Using the topology string, ProLego compares the topology against an extensive non-redundant topology database (ProLegoDB) and extracts its constituent topological modules. The platform offers interactive topology visualization graphs.
Keywords: Protein graph; Protein topology; Server application; Topology comparisons; Visualization
Year: 2018 PMID: 29728050 PMCID: PMC5935970 DOI: 10.1186/s12859-018-2171-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Comparison of topology visualization using ProLego (a) and PTGL (b) for the photosynthetic reaction centre (Photosystem 1; PDB ID: 1JB0; chain: L). The chain has an anti-parallel beta sheet at the N-terminus followed by seven alpha-helices. Panel a.ii shows a cartoon representation of the protein chain generated with VMD. In the linear topology (a.iv), strands are represented as triangles (with relative orientation shown as an up or down triangle) and helices as rectangles. The length of each helical rectangle is scaled to the number of residues in the corresponding helix. The protein chain is coloured from red through green to blue as it passes from the N- to the C-terminus. The straight lines connecting secondary structure (SS) blocks show chain connectivity, whereas the arcs represent spatial connectivity and the type of SS contact (colour coded as labelled in Additional file 1: Table S4). The secondary structure contact map (a.i) shows all spatial contacts between pairs of SS. The 3D cartoon representation (a.ii, generated with VMD) and the 2D topology cartoon (a.iii) are both produced by ProLego. The 2D ProLego cartoon shows contacts between two SS blocks as red dotted lines and chain connectivity as a continuous black line. Panel b shows the topology representation of the same protein generated using the Protein Topology Graph Library (http://ptgl.uni-frankfurt.de/api/index.php/pg/1jb0/L/albe/json), the alpha-beta graph. The graph represents SSEs from the N- to the C-terminus, left to right. Helices are represented as circles and strands as rectangles. PTGL also counts 3₁₀ helices as helices, hence the additional 1st and 7th helices, which bring the total number of helices to 9 instead of the 7 alpha-helices that ProLego reports for this protein. PTGL misses the N-terminal sheet, which ProLego represents as up-down triangles (for anti-parallel orientation)
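The difference in helix counts described in the caption can be illustrated with a small sketch. It assumes a secondary-structure string in DSSP-style one-letter codes (`E` strand, `H` alpha-helix, `G` 3₁₀ helix). The string itself is hypothetical: it was chosen only to match the counts given in the caption, with a two-strand N-terminal sheet as an assumption, and it is not output from ProLego or PTGL.

```python
# Hypothetical N-to-C secondary-structure string for illustration only.
# E = strand, H = alpha-helix, G = 3-10 helix (DSSP-style codes).
# Assumption: two N-terminal strands, then helices in the order a reader
# would infer from the caption (3-10 helices at helix positions 1 and 7).
SSE_CODES = "EEGHHHHHGHH"


def count_helices(codes, include_310):
    """Count helices, optionally treating 3-10 helices as helices too."""
    helix_codes = {"H", "G"} if include_310 else {"H"}
    return sum(1 for code in codes if code in helix_codes)


def positions_of_310(codes):
    """Return 1-based positions of 3-10 helices among all helices."""
    helices = [code for code in codes if code in ("H", "G")]
    return [i for i, code in enumerate(helices, start=1) if code == "G"]


print(count_helices(SSE_CODES, include_310=False))  # 7, alpha-helices only
print(count_helices(SSE_CODES, include_310=True))   # 9, with 3-10 helices
print(positions_of_310(SSE_CODES))                  # [1, 7]
```

Under this reading, the two tools disagree on the helix count only because of which helix types they include, not because of a different segmentation of the chain.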
Description of ProLegoDB
| Structure Class | Topologies (a) | Proteins (b) | Domains (c) |
|---|---|---|---|
| A | 3315 | 6064 | 2134 |
| B | 2485 | 3955 | 1520 |
| Mix AB | 1401 | 48,167 | 10,754 |
| Total | 7201 | 58,186 | 14,408 |
The topology database, ProLegoDB, describes the protein topology space. Representative datasets of non-redundant protein chains and domains were constructed as described in (S1.3). The table above summarises the database by structure class (A: all-alpha, B: all-beta and Mix AB: alpha-beta). (a) Number of statistically significant topology groups in each structure class. (b) Number of protein chains in each structure class, taken from the extracted non-redundant dataset of the PDB. (c) Domains are protein entries from the curated domain databases CATH (3.5) and Astral (SCOP v1.75). The maximum pairwise sequence identity between chains is < 40%
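As a quick consistency check, the column totals in the table can be recomputed from the per-class rows. The sketch below uses only the numbers printed in the table. The variable names are illustrative, and the class share at the end is a derived figure rather than one reported in the source.

```python
# Per-class counts copied from the table: (topologies, proteins, domains).
rows = {
    "A": (3315, 6064, 2134),
    "B": (2485, 3955, 1520),
    "Mix AB": (1401, 48167, 10754),
}
reported_total = (7201, 58186, 14408)

# Sum each column across the three structure classes.
computed_total = tuple(sum(column) for column in zip(*rows.values()))
print(computed_total == reported_total)

# Share of protein chains that fall in the mixed alpha-beta class.
mix_ab_protein_share = rows["Mix AB"][1] / reported_total[1]
print(f"Mix AB share of protein chains: {mix_ab_protein_share:.1%}")
```

The per-class counts sum to the reported totals, and the mixed alpha-beta class contributes most of the protein chains while holding the fewest significant topologies.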
Fig. 2 Distribution of topologies and proteins in the “Non-Prevalent” (left of the dashed line) and “Prevalent” (right of the dashed line) groups, shown as violin plots. The plot was generated for the statistically significant topologies (P-value < 0.001; Additional file 1: Table S3) from the representative PDB dataset (58,186 protein chains). The dataset is described in the text and the supplementary material. The shape of each violin shows the kernel density estimate of the distribution of the data for topologies and proteins. The inner boxplot summarises the statistics: the white dot represents the median, the thick bar the interquartile range and the thin line the 95% confidence interval. The distributions of both proteins and topologies differ clearly between the “Prevalent” and “Non-Prevalent” groups. The distributions were compared with the non-parametric Wilcoxon rank-sum test, and P-values are indicated at the bottom as ‘*’ (‘****’: P-value < 0.001 and ‘**’: P-value < 0.01)
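The group comparison named in the caption can be sketched as follows. This is a minimal standard-library version of the two-sided Wilcoxon rank-sum test using the normal approximation without tie correction. It is not the authors' analysis code, and the input values are synthetic placeholders rather than ProLegoDB data. The star thresholds follow the caption; the `"ns"` label for larger P-values is an assumption.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided rank-sum test via the normal approximation.

    Returns (z, p). No tie or continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


def stars(p):
    """Map a P-value to the caption's labels ('ns' is an assumption)."""
    if p < 0.001:
        return "****"
    if p < 0.01:
        return "**"
    return "ns"


# Synthetic placeholder groups, NOT values from the paper.
non_prevalent = list(range(1, 21))
prevalent = list(range(21, 41))
z, p = rank_sum_test(non_prevalent, prevalent)
print(stars(p))
```

For real data, an established implementation such as `scipy.stats.ranksums` would be the more usual choice. The version here only shows how the statistic and the star labels relate.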