Swagata Das, Uttam Pal, Supriya Das, Khyati Bagga, Anupam Roy, Arpita Mrigwani, Nakul C Maiti.
Abstract
An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of ARs present in intrinsically disordered human proteins. More than 80% of the human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. AR content in the protein sequence increased as protein disorder decreased. A probability density distribution analysis and a discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs and that an AR was, on average, 8 residues long. Residues in ARs were high in sequence complexity, and ARs seldom overlapped with low-complexity regions (LCRs), which were abundant in disordered proteins. The sequences in ARs showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations.
Year: 2014 PMID: 24594841 PMCID: PMC3940659 DOI: 10.1371/journal.pone.0089781
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Some of the intrinsically disordered human proteins from the DisProt database.
| DisProt ID | Protein | Localization/source | Function/role | pI | Sequence length | aa (−, +, 0) |
|---|---|---|---|---|---|---|
| DP00004_C002 | Antibacterial protein LL-37 | Secreted | Antibacterial activity | 10.61 | 37 | 5, 11, 16 |
| DP00016 | Cyclin-dependent kinase inhibitor 1 | Cytoplasm, Nucleus | Regulation of cyclin-dependent kinase activity | 8.69 | 164 | 23, 26, 81 |
| DP00017 | Cyclin-dependent kinase inhibitor 1C | | Negative regulator of cell proliferation | 5.39 | 316 | 31, 27, 202 |
| DP00028 | Eukaryotic translation initiation factor 4E-binding protein 1 | Cytosol | Regulates eIF4E activity | 5.32 | 118 | 14, 12, 56 |
| DP00039 | Non-histone chromosomal protein HMG-17 | Cytoplasm, Nucleus | Binds to nucleosomal DNA | 10.00 | 89 | 14, 26, 40 |
| DP00040 | High mobility group protein HMG-I/HMG-Y | Chromosome, Nucleus | Processing of mRNA transcripts | 10.31 | 107 | 15, 27, 36 |
| DP00069 | Vesicle-associated membrane protein 2 | Synaptic vesicles | Membrane transport | 7.84 | 116 | 12, 13, 66 |
| DP00070 | α-synuclein | Membrane-bound in dopaminergic neurons | Dopamine release and transport | 4.67 | 140 | 24, 15, 73 |
| DP00126 | Tau [Isoform Tau-F] | Axons | Microtubule assembly and stability | 8.24 | 441 | 56, 58, 200 |
| DP00174 | Stathmin | Cytoplasm | Regulation of microtubules (MT) | 5.76 | 149 | 36, 32, 52 |
| DP00199 | β-casein | Secreted | Modulates surface properties of casein micelles | 5.52 | 226 | 20, 15, 131 |
| DP00214 | Osteopontin | Secreted | Cell-matrix interaction | 4.37 | 314 | 75, 29, 103 |
| DP00219 | Protein phosphatase 1 regulatory subunit 11 | Widely expressed | Inhibitor of protein phosphatase 1 | 6.52 | 126 | 20, 19, 51 |
| DP00287 | Tumor suppressor [Isoform 1] | Cytoplasm | Involved in ubiquitination | 4.70 | 213 | 41, 23, 114 |
| DP00332 | Bone sialoprotein 2 | Secreted | Cell attachment | 4.12 | 317 | 76, 23, 103 |
| DP00357 | Thymosin β-4 | Cytoplasm | Organization of the cytoskeleton | 5.02 | 44 | 11, 9, 13 |
| DP00372 | Uncharacterized protein C8orf4 | | Apoptosis | 10.14 | 106 | 13, 24, 42 |
| DP00510 | Nuclear protein 1 | Nucleus | Proapoptotic stimuli | 9.98 | 82 | 10, 15, 34 |
| DP00521 | Securin | Cytoplasm, Nucleus | Chromosome stability | 6.18 | 202 | 27, 26, 105 |
| DP00546 | Huntingtin-interacting protein K [Isoform 1] | | | 5.35 | 175 | 34, 29, 81 |
| DP00555 | β-synuclein | Cytoplasm | Regulator of the SNCA aggregation process | 4.41 | 134 | 28, 13, 68 |
| DP00592 | Purkinje cell protein 4 | Cytoplasm, Nucleus | Nervous system development | 6.21 | 62 | 11, 11, 23 |
| DP00617 | 26S proteasome complex subunit DSS1 | | Proteolysis | 3.81 | 70 | 27, 5, 26 |
| DP00630 | γ-synuclein | Cytoplasm | Neurofilament network integrity | 4.89 | 127 | 23, 17, 56 |
| Aβ42 | APP (amyloid precursor protein) | Cytoplasm | Alzheimer disease | 5.31 | 42 | 6, 3, 25 |
aa: −, + and 0 represent the numbers of negative (−), positive (+) and neutral amino acids in the protein sequence, respectively.
From the UniProt database and references therein.
The localization, function, pI, sequence length and amino acid composition of each protein are listed.
Content of AR and LCR sequences in different classes of disordered proteins.
| Database/Type | Class | Total number of proteins | Amyloidogenic proteins (count) | Amyloidogenic proteins (%) | AR (count) | AR (%) | LCR (count) | LCR (%) | Overlap regions (count) |
|---|---|---|---|---|---|---|---|---|---|
| | LDP | 56 | 39 | 69.64 | 102 | 3.35 | 269 | 21.49 | 15 |
| | MDP | 58 | 53 | 91.38 | 248 | 6.45 | 223 | 13.52 | 11 |
| | PDP | 107 | 99 | 92.52 | 544 | 9.35 | 146 | 5.51 | 27 |
| | Total | 221 | 191 | 84.51 | 894 | 7.22 | 638 | 13.49 | 53 |
| | LDP | 124 | 70 | 56.45 | 556 | 3.47 | 542 | 27.08 | 22 |
| | MDP | 101 | 82 | 81.19 | 325 | 6.51 | 286 | 16.52 | 17 |
| | PDP | 207 | 188 | 90.82 | 1008 | 9.89 | 243 | 8.56 | 35 |
| | Total | 432 | 340 | 78.70 | 1889 | 7.26 | 1071 | 15.74 | 74 |
| | LDP | 45 | 39 | 86.67 | 176 | 3.18 | 325 | 16.86 | 9 |
| | MDP | 65 | 61 | 93.85 | 311 | 6.16 | 248 | 11.56 | 20 |
| | PDP | 76 | 75 | 98.68 | 384 | 7.78 | 137 | 5.56 | 7 |
| | Total | 186 | 175 | 93.07 | 871 | 6.10 | 710 | 10.39 | 36 |
| | LDP | 8 | 8 | 100.00 | 19 | 3.40 | 27 | 12.00 | 3 |
| | MDP | 7 | 7 | 100.00 | 33 | 7.75 | 24 | 19.84 | 0 |
| | PDP | 10 | 9 | 90.00 | 60 | 9.09 | 15 | 4.26 | 2 |
| | Total | 25 | 24 | 96.00 | 112 | 6.89 | 66 | 11.10 | 5 |
LDP, 71–100% disordered protein; MDP, 31–70% disordered protein; PDP, <30% disordered protein.
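The class boundaries in this legend can be made concrete as a small classifier. The following is a minimal Python sketch, not the authors' code: the function names are hypothetical, and assigning exactly 30% disorder to PDP and exactly 70% to MDP is an assumption for values that fall between the stated integer bins. The helper `percent` shows how the "Amyloidogenic proteins (%)" column follows from the counts in the class rows (for example, 39 of 56 LDP proteins).

```python
def disorder_class(disorder_pct: float) -> str:
    """Assign PDP, MDP or LDP from a protein's percent disorder.

    Bins follow the legend (PDP <30%, MDP 31-70%, LDP 71-100%); treating
    exactly 30% as PDP and exactly 70% as MDP is an assumption.
    """
    if not 0.0 <= disorder_pct <= 100.0:
        raise ValueError("percent disorder must lie in [0, 100]")
    if disorder_pct <= 30.0:
        return "PDP"
    if disorder_pct <= 70.0:
        return "MDP"
    return "LDP"


def percent(part: int, whole: int) -> float:
    """Percentage rounded to two decimals, as reported in the table."""
    return round(100.0 * part / whole, 2)


# LDP row of the first group: 39 of 56 proteins contain at least one AR.
print(disorder_class(85.0), percent(39, 56))  # LDP 69.64
```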
Figure 1. Content of AR and LCR sequences in different classes of disordered proteins.
(A) DisProt human; (B) IDEAL human; (C) DisProt nonhuman; (D) IDEAL nonhuman. White bars represent LCRs, gray bars represent ARs and black bars represent regions where an AR and an LCR overlap. (E, F) Percentage of AR and of LCR sequences, respectively, in the different groups of disordered proteins. The x-axis in all plots shows the three groups of disordered proteins with different degrees of disorder: PDP (0–30% disorder), MDP (31–70% disorder) and LDP (71–100% disorder). In (E) and (F), asterisks indicate statistically significant differences from the other groups (see Table S5).
LCRs, ARs (*) and overlap regions (†) in some of the human disordered proteins in DisProt.
| DisProt ID | LCR/AR | Protein length | AR (%) | LCR (%) |
|---|---|---|---|---|
| DP00016 | | 164 | 0 | 10 |
| DP00017 | | 316 | 0 | 43 |
| DP00039 | | 89 | 0 | 62 |
| DP00040 | | 107 | 0 | 66 |
| DP00069 | | 116 | 14 | 33 |
| DP00070 | | 140 | 4 | 21 |
| DP00126 | | 441 | 1 | 17 |
| DP00174 | | 149 | 3 | 0 |
| DP00199 | | 226 | 0 | 38 |
| DP00214 | | 314 | 0 | 20 |
| DP00219 | | 126 | 0 | 37 |
| DP00287 | | 213 | 8 | 23 |
| DP00332 | | 317 | 3 | 41 |
| DP00372 | | 106 | 17 | 0 |
| DP00510 | | 82 | 0 | 31 |
| DP00521 | | 202 | 3 | 5 |
| DP00546 | | 175 | 5 | 21 |
| DP00555 | | 134 | 8 | 28 |
| DP00592 | | 62 | 10 | 0 |
| DP00617 | | 70 | 0 | 36 |
| DP00630 | | 127 | 0 | 30 |
| Aβ42 | | 42 | 29 | 0 |
Sequence positions are given in parentheses. Single-letter codes are used to represent individual aa residues.
Figure 2. Probability distributions of LCR and AR lengths and percentages.
Distribution of LCR length (A) and LCR percentage (B) in LCR-containing disordered proteins. (C) and (D) show the probability distributions of AR length and AR content (%) in IDPs, respectively. Fitted statistical parameters are given in Table 4. Histograms of the data are shown with a suitable bin size.
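To compare a histogram with a fitted probability density, as in Figure 2, bar heights are usually normalized so that the bar areas sum to 1. The following is a minimal Python sketch of that normalization. The AR lengths and the 5-residue bin width are hypothetical, since the caption does not state the bin size used.

```python
from collections import Counter


def histogram_density(values, bin_width):
    """Normalized histogram mapping bin start -> probability density.

    Each bin [k*w, (k+1)*w) receives count / (n * w), so that the
    bar areas (density * w) sum to 1.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    n = len(values)
    counts = Counter(int(v // bin_width) for v in values)
    return {k * bin_width: c / (n * bin_width) for k, c in sorted(counts.items())}


# Hypothetical AR lengths (residues), for illustration only.
lengths = [6, 6, 7, 7, 8, 9, 10, 12, 13, 20]
dens = histogram_density(lengths, 5)
print(dens)  # {5: 0.12, 10: 0.06, 20: 0.02}
```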
Statistical analysis of AR and LCR length and content.
| Stable distribution parameter | AR length distribution | AR percentage distribution | LCR length distribution | LCR percentage distribution |
|---|---|---|---|---|
| | 1.02 | 1.34 | 0.92 | 1.08 |
| | 0.99 | 0.99 | 0.99 | 0.99 |
| | 6.55 | 9.73 | 14.99 | 9.73 |
| | 0.94 | 2.24 | 4.67 | 2.24 |
Stable distribution function fitting parameters.
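Each column of the table above holds four fitted parameters, but the row labels did not survive extraction. For orientation only: a stable law is conventionally specified by a stability index α, a skewness β, a scale γ and a location δ. In Nolan's S1 parameterization (one of several conventions; which one was used for these fits is not stated here) its characteristic function is

$$
\varphi(t) =
\begin{cases}
\exp\left\{-\gamma^{\alpha}\lvert t\rvert^{\alpha}\left[1 - i\beta\tan\left(\frac{\pi\alpha}{2}\right)\operatorname{sign}(t)\right] + i\delta t\right\}, & \alpha \neq 1,\\
\exp\left\{-\gamma\lvert t\rvert\left[1 + i\beta\,\frac{2}{\pi}\operatorname{sign}(t)\ln\lvert t\rvert\right] + i\delta t\right\}, & \alpha = 1,
\end{cases}
$$

with $0 < \alpha \le 2$, $-1 \le \beta \le 1$, $\gamma > 0$ and $\delta \in \mathbb{R}$. Setting α = 2 gives a Gaussian; α < 2 gives power-law (heavy) tails, which suits long-tailed length data.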
Figure 3. Smoothed kernel density estimates of LCR and AR content in a protein.
The left and right panels show the densities for LCR and AR, respectively. The plots are shown at two different clipping planes. The bottom panels show the smoothed 3D histograms for AR and LCR.
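Figure 3 rests on kernel density estimation, which smooths a histogram by placing a kernel on every observation and averaging. The caption does not state the kernel or bandwidth, so the following Python sketch assumes a one-dimensional Gaussian kernel with Silverman's rule-of-thumb bandwidth. The function names and the input values are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(data):
    """Silverman's rule of thumb: h = 0.9 * min(sd, IQR / 1.34) * n ** (-1/5)."""
    if len(data) < 2:
        raise ValueError("need at least two observations")
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    spread = min(sd, iqr / 1.34) if iqr > 0 else sd
    if spread <= 0:
        raise ValueError("data have zero spread")
    return 0.9 * spread * len(data) ** (-0.2)


def gaussian_kde(data, bandwidth=None):
    """Return a function x -> estimated density (mean of Gaussian kernels)."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(data)
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / h) ** 2) for d in data)

    return density


# Hypothetical per-protein AR percentages, for illustration only.
ar_percent = [2.1, 4.3, 5.9, 6.5, 7.0, 8.4, 9.9, 12.0, 15.2]
f = gaussian_kde(ar_percent)
print(round(f(7.0), 4))
```

A two-dimensional version, such as the smoothed 3D histograms, would use a product of Gaussian kernels with one bandwidth per axis.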
Figure 4. Correlation of LCR and AR content with protein length.
(A) Correlation of LCR content with protein length. No significant correlation was found between LCR content and protein length. The figure shows a negative hyperbolic fit (y = 9.44056 + 1926.61/x; R² = 0.113058) with standard deviation bands at 1σ, 2σ and 3σ. (B) Correlation of AR content with protein length. No significant correlation was found between AR content and protein length. The figure shows a negative hyperbolic fit (y = 6.05937 + 651.62/x; R² = 0.112173) with standard deviation bands at 1σ, 2σ and 3σ.
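The fits in Figure 4 have the form y = a + b/x, with x the protein length and y the percentage content. Because the model is linear in u = 1/x, ordinary least squares on (u, y) gives a and b in closed form, and R² = 1 − SS_res/SS_tot is the share of variance the curve explains (about 11% here). The following Python sketch assumes an unweighted least-squares fit. The caption does not say how the curves or the σ bands were computed, and the bands are not reproduced here.

```python
def fit_hyperbola(xs, ys):
    """Least-squares fit of y = a + b / x.

    The model is linear in u = 1/x, so ordinary least squares on (u, y)
    gives a and b in closed form. Returns (a, b, r_squared).
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three paired observations")
    us = [1.0 / x for x in xs]
    n = len(us)
    mean_u = sum(us) / n
    mean_y = sum(ys) / n
    s_uu = sum((u - mean_u) ** 2 for u in us)
    s_uy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
    b = s_uy / s_uu
    a = mean_y - b * mean_u
    ss_res = sum((y - (a + b * u)) ** 2 for u, y in zip(us, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


# Sanity check: points lying exactly on the reported LCR curve are recovered.
lengths = [50, 100, 200, 400, 800]
lcr_pct = [9.44056 + 1926.61 / x for x in lengths]
a, b, r2 = fit_hyperbola(lengths, lcr_pct)
print(f"a={a:.5f}, b={b:.2f}, R^2={r2:.3f}")  # a=9.44056, b=1926.61, R^2=1.000
```

On real, scattered data, the same function would return an R² well below 1, as in panels (A) and (B).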
Overlapping regions in DisProt human proteins.
| DisProt ID | LCR/AR overlap region |
|---|---|
Lengths and sequence positions are given in parentheses. Single-letter codes are used to represent individual aa residues. Overlapping regions are aligned. Only the proteins with AR/LCR overlapping regions are shown.
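Detecting an LCR/AR overlap reduces to intersecting residue ranges. Two 1-based inclusive ranges [s1, e1] and [s2, e2] share residues max(s1, s2) through min(e1, e2) whenever that start does not exceed that end. The following Python sketch is illustrative rather than the authors' pipeline. The function name is hypothetical, and the P04156 (prion protein) regions are those listed in this article's table of known amyloidogenic proteins.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # 1-based, inclusive residue positions


def overlap_regions(lcrs: List[Region], ars: List[Region]) -> List[Region]:
    """Residue ranges shared by at least one LCR and one AR."""
    shared = []
    for l_start, l_end in lcrs:
        for a_start, a_end in ars:
            start, end = max(l_start, a_start), min(l_end, a_end)
            if start <= end:  # the two ranges intersect
                shared.append((start, end))
    return sorted(set(shared))


# LCR and AR positions listed for P04156.
prion_lcr = [(50, 94), (113, 135), (188, 201), (237, 252)]
prion_ar = [(8, 17), (171, 176), (178, 185), (222, 227), (231, 235), (240, 253)]
print(overlap_regions(prion_lcr, prion_ar))  # [(240, 252)]
```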
Figure 5. Content of individual aa residues in LCRs, ARs and total proteins.
The panel compares the percentage of individual aa residues in LCRs (Series 1, blue), ARs (Series 2, red) and total protein (Series 3, green). The x-axis starts with the residues most abundant in ARs. Amino acid residues are labeled with single-letter codes along the x-axis.
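Figure 5 compares residue composition within ARs against whole proteins. The following Python sketch shows one way to compute it: extract the AR residues, count each of the 20 standard residues, and order the x-axis by AR abundance. The protein sequence and AR position here are hypothetical, and ignoring non-standard letters is an assumption.

```python
from collections import Counter

STANDARD_AA = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard residues


def region_sequence(seq, regions):
    """Concatenate 1-based, inclusive regions of seq (e.g. all ARs of a protein)."""
    return "".join(seq[start - 1:end] for start, end in regions)


def composition(seq):
    """Percentage of each standard residue in seq; other letters are ignored."""
    counts = Counter(c for c in seq.upper() if c in STANDARD_AA)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues in sequence")
    return {aa: 100.0 * counts[aa] / total for aa in STANDARD_AA}


# Hypothetical protein with one hypothetical AR at positions 8-13.
protein = "GSSGSSGVLYVAIGSS"
ar = region_sequence(protein, [(8, 13)])
ar_comp, total_comp = composition(ar), composition(protein)
# Order residues as on the x-axis: most abundant in ARs first (ties keep A-Z order).
order = sorted(STANDARD_AA, key=lambda aa: -ar_comp[aa])
print(ar, order[:2])  # VLYVAI ['V', 'A']
```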
Figure 6. Comparison of the conformational preferences of residues in ARs with those of total proteins.
The 3D plot shows the percentage of residues with a conformational preference for α-helix (green), β-strand/sheet (red) and coil (blue) in total proteins and in their ARs, as indicated on the x-axis. The lower panel shows the same data as a 2D plot with error limits.
Discrete analysis.
| Protein type | AR range (%) | AR mean (%) | AR median (%) | LCR range (%) | LCR mean (%) | LCR median (%) |
|---|---|---|---|---|---|---|
| | 0.43–31.50 | 8.36 | 6.98 | 1.41–91.94 | 15.86 | 10.21 |
| | 1.20–44.00 | 9.27 | 7.50 | 1.30–96.80 | 16.80 | 12.20 |
| | 0.69–22.37 | 6.56 | 5.93 | 1.09–70.80 | 13.74 | 10.93 |
| | 1.08–17.53 | 7.03 | 6.69 | 1.67–70.67 | 13.15 | 8.14 |
Range, mean and median of AR and LCR sequence percentages in different groups of proteins.
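The discrete analysis summarizes per-protein AR (or LCR) percentages by their range, mean and median. The following minimal Python sketch uses the standard `statistics` module. The function name and the five input values are hypothetical.

```python
import statistics


def discrete_summary(percentages):
    """Range, mean and median of per-protein AR (or LCR) percentages."""
    if not percentages:
        raise ValueError("no values")
    return {
        "range": (min(percentages), max(percentages)),
        "mean": round(statistics.mean(percentages), 2),
        "median": round(statistics.median(percentages), 2),
    }


# Hypothetical AR percentages for five proteins, for illustration only.
print(discrete_summary([0.43, 3.1, 6.98, 9.5, 31.5]))
# {'range': (0.43, 31.5), 'mean': 10.3, 'median': 6.98}
```

A mean well above the median, as in every row of the table, indicates a right-skewed distribution.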
Content of ARs and LCRs in a group of known amyloidogenic proteins.
| Name | UniProt ID | Sequence length | LCR | LCR (%) | AR | AR (%) | Overlapping sequences |
|
| P01308 | 110 | 2–24 | 20.91 | 36–42 | 17.30 | |
| 99–110 | |||||||
|
| P02647 | 267 | 8–15 | 3.00 | |||
|
| P32081 | 67 | 14–20 | 8.20 | |||
| 26–34 | |||||||
| 47–52 | |||||||
|
| P14621 | 99 | |||||
|
| P06654 | 448 | 69–114 | 24.55 | |||
| 241–253 | |||||||
| 379–413 | |||||||
| 427–442 | |||||||
|
| P37840-1 | 140 | 10–23 | 35–40 | |||
| 63–78 | |||||||
|
| P27986 | 724 | 79–102 | 7.18 | 72–78 | 6.40 | |
| 303–314 | 263–269 | ||||||
| 533–548 | 290–296 | ||||||
| 331–336 | |||||||
| 401–406 | |||||||
| 483–495 | |||||||
|
| P10636 | 441 | 274–279 | 1.36 | |||
|
| P01034 | 146 | 2–33 | 21.92 | 10–20 | 22.60 | 10–20 |
| 56–61 | |||||||
| 84–92 | |||||||
| 124–130 | |||||||
|
| P01607 | 108 | 32–37 | 20.40 | |||
| 45–53 | |||||||
| 71–77 | |||||||
|
| P00698 | 147 | 52–62 | 11.60 | |||
| 142–147 | |||||||
|
| P04156 | 253 | 50–94 | 38.74 | 8–17 | 19.40 | 240–252 |
| 113–135 | 171–176 | ||||||
| 188–201 | 178–185 | ||||||
| 237–252 | 222–227 | ||||||
| 231–235 | |||||||
| 240–253 | |||||||
|
| P05453 | 685 | 5–64 | 27.88 | 9–18 | 20.00 | 9–18 |
| 68–113 | 31–36 | 31–36 | |||||
| 130–142 | 45–56 | 45–56 | |||||
| 164–209 | 69–74 | 69–74 | |||||
| 241–253 | 102–108 | 102–108 | |||||
| 398–410 | 260–266 | ||||||
| 278–285 | |||||||
| 304–313 | |||||||
| 426–445 | |||||||
| 471–476 | |||||||
| 527–538 | |||||||
| 566–571 | |||||||
| 584–596 |
Proteins were selected from reference 56.
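The LCR (%) and AR (%) columns above can be recomputed from the listed positions by counting the residues covered by the regions and dividing by the sequence length. The following Python sketch takes the union of 1-based inclusive regions, so overlapping regions are not double-counted; that is an assumption about how the percentages were counted. The function name is hypothetical.

```python
def coverage_percent(regions, length):
    """Percent of residues covered by the union of 1-based inclusive regions."""
    covered = set()
    for start, end in regions:
        if not 1 <= start <= end <= length:
            raise ValueError(f"bad region {start}-{end} for length {length}")
        covered.update(range(start, end + 1))
    return round(100.0 * len(covered) / length, 2)


# P01034 (cystatin C), sequence length 146.
ar_p01034 = [(10, 20), (56, 61), (84, 92), (124, 130)]
lcr_p01034 = [(2, 33)]
print(coverage_percent(ar_p01034, 146), coverage_percent(lcr_p01034, 146))  # 22.6 21.92
```

The same function gives 1.36 for the P10636 AR (274–279 of 441) and 38.74 for the four P04156 LCRs, matching the table.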