Mikhail Yu Lobanov, Ilya V Likhachev, Oxana V Galzitskaya.
Abstract
We created a new library of disordered patterns and disordered residues in the Protein Data Bank (PDB). To obtain these datasets, we clustered the PDB into groups of chains at different identity levels and marked the disordered residues. We developed a new procedure for finding disordered patterns and built a new version of the library. The library includes three sets of patterns: unique patterns, patterns consisting of two kinds of amino acids, and homo-repeats. Using this database, the user can: (1) find homologues in the entire Protein Data Bank; (2) perform a statistical analysis of disordered residues in protein structures; (3) search for disordered patterns and homo-repeats; (4) search for disordered regions in different chains of the same protein; (5) download clusters of protein chains at different identity levels, together with the library of disordered patterns; and (6) view the 3D structure interactively using MView. The new library of disordered patterns will help improve the accuracy of predicting whether residues in a given region will be structured or unstructured.
Keywords: disordered residues; homo-repeats; identity; low complexity regions; protein structure
Year: 2020 PMID: 32230759 PMCID: PMC7180803 DOI: 10.3390/molecules25071522
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Figure 1. Dependence of the number of clusters on the identity between protein chains for the different years of Protein Data Bank (PDB) clusterization.
Comparison of data for different AC2 patterns (PDB).
| № | AC2 Pattern | Nu | Nf | Nu–Nf | NC75 | Nprot |
|---|---|---|---|---|---|---|
| | Homo-repeat and adjacent amino acid | | | | | |
| 1 | 00000001 | 342.6 | 166.6 | 176.0 | 73 | 101 |
| 2 | 01111111 | 349.3 | 175.8 | 173.6 | 79 | 113 |
| | Sum | 467.3 | 225.3 | 242.0 | 80 | 114 |
| | Internal repeat in the patterns | | | | | |
| 3 | 00010001 | 499.1 | 17.3 | 481.8 | 56 | 66 |
| 4 | 01000100 | 384.9 | 46.4 | 338.6 | 61 | 75 |
| 5 | 00100010 | 355.3 | 47.7 | 307.6 | 56 | 65 |
| 6 | 01010101 | 369.7 | 66.3 | 303.4 | 53 | 80 |
| 7 | 01110111 | 336.1 | 46.3 | 289.8 | 51 | 62 |
| 8 | 01100110 | 35.4 | 12.6 | 22.8 | 7 | 10 |
| 9 | 00110011 | 32.4 | 19.6 | 12.8 | 7 | 11 |
| | Sum | 1040.0 | 228.2 | 811.9 | 147 | 196 |
| | Other patterns (part) | | | | | |
| 10 | 00010000 | 1159.5 | 142.3 | 1017.2 | 125 | 163 |
| 11 | 00001000 | 1199.2 | 194.4 | 1004.8 | 131 | 171 |
| 12 | 00100001 | 982.8 | 81.4 | 901.3 | 84 | 111 |
| 13 | 01111011 | 865.0 | 63.7 | 801.3 | 87 | 110 |
| 14 | 01000010 | 765.8 | 53.7 | 712.2 | 87 | 114 |
| 15 | 00011111 | 663.7 | 105.1 | 558.6 | 122 | 163 |
| 16 | 00000111 | 257.9 | 33.5 | 224.4 | 48 | 60 |
| 17 | 00001111 | 236.2 | 38.0 | 198.2 | 42 | 52 |
| 18 | 00100100 | 307.6 | 167.7 | 139.9 | 50 | 78 |
| 19 | 01001001 | 261.5 | 132.3 | 129.3 | 38 | 49 |
| 20 | 01111100 | 127.6 | 8.2 | 119.5 | 22 | 24 |
| 21 | 00111101 | 182.9 | 65.1 | 117.8 | 32 | 37 |
| 22 | 01010100 | 132.9 | 22.7 | 110.2 | 23 | 27 |
| 23 | 00111110 | 129.6 | 24.0 | 105.6 | 24 | 27 |
| 24 | 01101101 | 228.4 | 124.8 | 103.6 | 35 | 54 |
| | Sum | 5014.3 | 2491.5 | 2522.8 | 857 | 1230 |
| | Total sum | 6033.1 | 2827.9 | 3205.2 | 971 | 1398 |
Notes: Nu and Nf are the sums of the average numbers of unfolded and folded residues, respectively, in the clusters with 75% identity (C75); NC75 is the number of such clusters.
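The binary strings in the AC2 Pattern column can be read as eight-residue windows built from two kinds of amino acids. The following is a minimal sketch of one plausible encoding, assuming (this is not stated in the source) that the residue type at the first position is coded as 0 and the other residue type as 1. The function name `encode_ac2` and the example windows are hypothetical.

```python
def encode_ac2(window: str) -> str:
    """Encode a window made of at most two kinds of residues as a 0/1 string.

    Assumption (not from the source): the residue type found at the first
    position maps to '0' and the other residue type maps to '1'.
    """
    if not window or len(set(window)) > 2:
        raise ValueError("window must be non-empty with at most two residue kinds")
    first = window[0]
    return "".join("0" if residue == first else "1" for residue in window)


# Hypothetical one-letter-code windows, for illustration only.
print(encode_ac2("GGGGGGGS"))  # 00000001
print(encode_ac2("GSGSGSGS"))  # 01010101
```

Under this assumption every encoded pattern starts with 0, which is consistent with the patterns listed in the table.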
Figure 2. Dependence of the difference in the number of disordered and ordered residues in sequences covered by the patterns on the pattern length: (A) histidine tags, (B) homo-repeats, and (C) AC2-sequences in the clustered PDB.
Comparison of data for the different AC2 patterns.
| № | Pair a.a. | Nu | Nf | Nu–Nf | NC75 | Nprot |
|---|---|---|---|---|---|---|
| 1 | GlySer | 2539.9 | 319.4 | 2220.6 | 248 | 312 |
| 2 | GluAsp | 258.9 | 42.5 | 216.4 | 32 | 59 |
| 3 | GlyHis | 162.7 | 9.2 | 153.6 | 28 | 31 |
| 4 | SerHis | 156.6 | 3.7 | 152.9 | 25 | 27 |
| 5 | AlaPro | 171.4 | 41.8 | 129.6 | 25 | 57 |
| 6 | GlyLys | 94.5 | 2.5 | 92.0 | 12 | 18 |
| 7 | GlnPro | 88.0 | 3.2 | 84.9 | 9 | 13 |
| 8 | SerAsp | 92.0 | 16.3 | 75.7 | 11 | 13 |
| 9 | GluLys | 86.0 | 11.0 | 75.1 | 11 | 14 |
| 10 | GlyArg | 79.1 | 14.4 | 64.6 | 8 | 12 |
| 11 | AsnHis | 58.6 | 0.0 | 58.6 | 6 | 11 |
| 12 | AsnAsp | 61.8 | 6.3 | 55.5 | 6 | 6 |
| 104 | IleGlu | 0.0 | 42.0 | −42.0 | 5 | 7 |
| 105 | LeuGlu | 9.9 | 57.4 | −47.5 | 9 | 12 |
| 106 | AlaGlu | 73.8 | 129.9 | −56.1 | 25 | 35 |
| 107 | IleAla | 10.0 | 76.8 | −66.8 | 12 | 15 |
| 108 | ValAla | 17.0 | 84.1 | −67.1 | 14 | 22 |
| 109 | AlaSer | 53.7 | 135.3 | −81.7 | 18 | 27 |
| 110 | ValGly | 7.3 | 91.0 | −83.7 | 12 | 16 |
| 111 | AlaArg | 18.8 | 108.9 | −90.1 | 15 | 21 |
| 112 | LeuAla | 94.9 | 226.0 | −131.1 | 44 | 75 |
| 113 | GlyPro | 14.2 | 190.9 | −176.7 | 16 | 21 |
| | Total sum | 6033.1 | 2827.9 | 3205.2 | 971 | 1398 |
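The Nu–Nf column is the difference between the Nu and Nf columns, and its sign separates amino acid pairs that are mostly disordered from pairs that are mostly ordered. The following is a minimal sketch that recomputes the difference for a few rows copied from the table above. The tolerance of 0.15 is an assumption made here to absorb the rounding of the published one-decimal values, and the labels returned by `classify` are illustrative wording, not terms from the source.

```python
# (pair, Nu, Nf, published Nu-Nf), copied from the table above.
ROWS = [
    ("GlySer", 2539.9, 319.4, 2220.6),
    ("GluAsp", 258.9, 42.5, 216.4),
    ("AsnHis", 58.6, 0.0, 58.6),
    ("IleGlu", 0.0, 42.0, -42.0),
    ("GlyPro", 14.2, 190.9, -176.7),
]

# Assumed tolerance for values that were rounded to one decimal place.
TOLERANCE = 0.15


def classify(nu: float, nf: float) -> str:
    """Label a pair by the sign of Nu - Nf (illustrative labels)."""
    return "mostly disordered" if nu - nf > 0 else "mostly ordered"


for pair, nu, nf, published in ROWS:
    diff = nu - nf
    # The recomputed difference should agree with the table up to rounding.
    assert abs(diff - published) <= TOLERANCE, pair
    print(f"{pair}: Nu-Nf = {diff:.1f} ({classify(nu, nf)})")
```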
Figure 3. Scheme for server processing.
Comparison of data for the different versions of libraries of disordered patterns (Lmin is the minimum length of pattern and Lmax is the maximum length of pattern).
| № | Year | Number of Unique Patterns | Lmin | Lmax |
|---|---|---|---|---|
| 1 | 2010 | 109 | 6 | 24 |
| 2 | 2011 | 141 | 4 | 17 |
| 3 | 2012 | 171 | 4 | 21 |
| 4 | 2018 | 384 | 4 | 28 |
| 5 | 2019 | 518 | 4 | 90 |
Figure 4. Statistics of all disordered residues from the clustered PDB at the 75% level of identity for the different positions of protein chains.
Connection between patterns and histidine tag.
| Region | Number of Patterns |
|---|---|
| PC = 0% (all patterns are far from H4) | 186 |
| 0% < PC < 33% | 226 |
| 33% ≤ PC ≤ 67% | 33 |
| 67% < PC < 100% | 40 |
| PC = 100% (all patterns are near H4) | 33 |
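The regions in the table above split the percentage PC into five bins. The following is a minimal sketch of that binning, with the counts copied from the table. The function name `pc_region` is hypothetical, and PC is assumed to be a percentage between 0 and 100.

```python
def pc_region(pc: float) -> str:
    """Return the table region that a PC value (in percent) falls into."""
    if not 0 <= pc <= 100:
        raise ValueError("PC is assumed to lie between 0 and 100")
    if pc == 0:
        return "PC = 0%"
    if pc < 33:
        return "0% < PC < 33%"
    if pc <= 67:
        return "33% ≤ PC ≤ 67%"
    if pc < 100:
        return "67% < PC < 100%"
    return "PC = 100%"


# Number of patterns per region, copied from the table above.
COUNTS = {
    "PC = 0%": 186,
    "0% < PC < 33%": 226,
    "33% ≤ PC ≤ 67%": 33,
    "67% < PC < 100%": 40,
    "PC = 100%": 33,
}

total = sum(COUNTS.values())
print(total)  # 518
```

The counts add up to 518, which coincides with the number of unique patterns listed for 2019 in the comparison of library versions.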
Figure 5. Common view of the database.