Sequence complexity of amyloidogenic regions in intrinsically disordered human proteins.

Swagata Das1, Uttam Pal1, Supriya Das1, Khyati Bagga1, Anupam Roy1, Arpita Mrigwani1, Nakul C Maiti1.   

Abstract

An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We investigated the sequence complexity of the ARs present in intrinsically disordered human proteins. More than 80% of the human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. The AR content of a protein sequence decreased as protein disorder increased. A probability density distribution analysis and a discrete analysis of AR sequences showed that ∼8% of the residues in a protein sequence were in ARs and that an AR was on average 8 residues long. The residues in ARs were high in sequence complexity and seldom overlapped with low complexity regions (LCRs), which were abundant in disordered proteins. The AR sequences showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations.

Year:  2014        PMID: 24594841      PMCID: PMC3940659          DOI: 10.1371/journal.pone.0089781

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

The available genome sequences and several computational methods have revealed the presence of proteins that remain disordered under physiological conditions, with the disordered form being their functional state [1]–[9]. These proteins are known by different names, such as intrinsically disordered [10], natively denatured [11], natively unfolded and intrinsically unstructured proteins [3], [10]. The accepted convention, however, is intrinsically disordered protein (IDP). IDPs comprise 25–30% of the eukaryotic proteome, and ∼50% of eukaryotic proteins contain long disordered regions [12]. IDPs lack a well-defined three-dimensional folded structure in solution and exist as an ensemble of interconverting conformations under physiological conditions [13]–[15]. The lack of a rigid, stably folded structure may give IDPs the plasticity to interact efficiently with different targets, in contrast to globular proteins with limited conformational flexibility [16], [17]. These characteristics possibly enable IDPs to take part in diverse pathological and biochemical functions [5], [6], [13], [16], [18]–[20]. Their functions range from DNA binding to cell cycle regulation, membrane transport, molecular recognition, and other important cellular processes [19], [21]–[23]. In addition to the role of IDPs in cellular activity, their inherent structural disorder plays an important role in the formation of protein assemblies [24]. The structural disorder and flexibility of IDPs are also linked to the formation of amyloid aggregates, which are implicated in several human disorders such as Parkinson's disease, Alzheimer's disease, type II diabetes and others [25]–[30]. The major protein component of the fibrillar deposits found in Parkinson's disease is a disordered protein, α-synuclein [25]–[30]. Alzheimer's disease is directly linked to the production of ordered fibrillar structures of the peptide Aβ42. 
Thus several neurological disorders are linked to the formation of amyloid fibrils and their deposition in various cellular organs. However, it is not clear how normally soluble disordered proteins/peptides are converted into amyloid fibres that possess a compact β-sheet structure. Many in vitro experiments have also shown that some structured proteins convert to amyloid fibrils under solution conditions in which the proteins attain a partially disordered structure [31], [32]. Experimental studies and many computational analyses have shown that short sequence stretches in proteins may act as nucleating centres for amyloid fibril formation [33]–[36]. These regions are often known as amyloidogenic regions (ARs). When amyloidogenic sequences of six to eight residues are inserted in the C-terminal hinge loop of RNase A, the enzyme becomes amyloidogenic and forms amyloid fibres [34]–[36]. The presence of such regions in many water-soluble proteins has been suggested by Dobson [36], [37] and others [38]. According to the ‘amyloid stretch hypothesis’ [35], a short amyloid stretch (equivalent to an AR) triggers the aggregation process under certain solution conditions. Mutation or reshuffling of these regions decreases or abolishes such aggregation [33], [39]. Thus an AR often acts as a nucleation centre and governs protein aggregation, which eventually leads to the formation of β-sheet-rich amyloid fibres. IDPs are also rich in sequence stretches with a biased amino acid composition, often known as low complexity regions (LCRs). These regions may also play a critical role in protein stability and the energetics of fibril formation [1], [40]–[47]. LCRs are usually of two types. The majority are composed of mixed polar and charged amino acid (aa) residues, and their presence enhances protein solubility and mobility in solution. The second type is a repeat of one or two residues, which is prone to form amyloid fibres. 
A good example of such a region is a stretch of Glu (polyGlu) [48]. Thus the presence of LCRs modulates the solubility and amyloidogenicity of disordered proteins [45], [49], [50]. The composition, content and distribution of ARs and LCRs in a protein sequence may therefore have a role in protein aggregation and amyloidogenicity. However, no major investigation has addressed the sequence complexity of ARs and their spacing among the LCRs that are commonly found in IDP sequences. In the present investigation, we computationally detected and analyzed the sequence composition and complexity, distribution pattern and structural aspects of ARs and LCRs in proteins deposited in the DisProt and IDEAL databases [4], [50], [51]. About 8% of residues were found to be in ARs, and the average length of an AR was 8 residues. Further, we found that AR sequences are highly complex and rarely overlap with LCRs. Among the many recently developed computational approaches and algorithms, we used the Waltz method developed by Maurer-Stroh et al. [52]–[56] to predict ARs. The Waltz algorithm combines a position-specific scoring matrix (PSSM) with physical properties and structural aspects of protein residues to identify ARs [40], [41], [57], [58]. The computational tool SMART was used to predict the sequence complexity parameters. We measured the structural propensity of the residues in ARs with the APSSP2 algorithm, which is freely available on the World Wide Web [59], [60].

Materials and Methods

Selection of Intrinsically Disordered Proteins

DisProt database release 5.6 (http://www.disprot.org/) provides a set of proteins with different degrees of disorder [4]. It gives the name of each protein, accession codes, aa sequence, location of the disordered region(s), and the methods used for structural (disorder) characterization. DisProt also lists the biological function(s) of each disordered region. The sequence of each protein was retrieved in FASTA format. Length, aa composition, residue characteristics such as the total number of positive and negative residues, and theoretical isoelectric point (PI) were computed using the ProtParam tool of the ExPASy Proteomic server (http://us.expasy.org/tools/protparam.html). The total charge of the proteins was calculated with the ‘protein calculator’ server (http://www.scripps.edu/~cdputnam/protcalc.html). Additional disordered proteins were selected from the IDEAL data set, which contains experimentally verified IDPs [51]. The structural disorder of the proteins varied from 0 to 100%. Proteins with (−1)% disorder were excluded. Structural disorder was further calculated using the IUPred algorithm, which is available at http://iupred.enzim.hu [61]. Percent disorder was estimated by counting the residues in the disordered regions predicted by IUPred, dividing by the length of the protein sequence and multiplying by 100.
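The percent-disorder calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `percent_disorder`, the per-residue score list and the 0.5 cutoff are assumptions made for the example (the text does not state which IUPred threshold was applied), and the scores are made up.

```python
def percent_disorder(scores, threshold=0.5):
    """Percentage of residues whose disorder score exceeds the threshold.

    scores: per-residue disorder scores (assumed to resemble IUPred output).
    threshold: assumed cutoff above which a residue counts as disordered.
    """
    if not scores:
        raise ValueError("scores must contain at least one residue")
    disordered = sum(1 for s in scores if s > threshold)
    return 100.0 * disordered / len(scores)


# Hypothetical 10-residue example; the values are illustrative only.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.55, 0.1, 0.05]
print(percent_disorder(example_scores))
```

Five of the ten illustrative scores exceed the cutoff, so the example protein would be reported as 50% disordered.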

Calculating LCR and AR

Protein sequences obtained from DisProt and IDEAL were used to calculate both LCRs and ARs. The LCR content of an individual protein was predicted by the SEG method as implemented in SMART (simple modular architecture research tool) [40], [62], a web-based server available at http://www.bork.embl-heidelberg.de/Modules/sinput.shtml. Default SEG parameters were used. The SEG method detects LCRs by measuring the information content of the complexity state vector [40]. The LCR content of a protein was calculated as the ratio of the total number of aa residues in all its LCRs to the protein sequence length. Amyloidogenic regions were identified with the web-based computational tool Waltz [56] (http://waltz.switchlab.org). The AR content (%) of a protein was calculated as the ratio of the number of residues in all its ARs to the sequence length of the protein.
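As a rough illustration of the content calculation, the sketch below computes the percentage of a sequence covered by a list of regions. The helper `region_content` is hypothetical, and counting each residue once when predicted regions overlap is an assumption, since the text does not say how overlapping predictions were handled. The coordinates are those listed for α-synuclein (DP00070, 140 residues) in Table 3.

```python
def region_content(regions, protein_length):
    """Percentage of residues covered by 1-based, inclusive (start, end) regions.

    Residues covered by more than one region are counted once (an assumption).
    """
    covered = set()
    for start, end in regions:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


# alpha-synuclein (DP00070), 140 residues; coordinates as listed in Table 3.
ar_regions = [(35, 40)]
lcr_regions = [(10, 23), (63, 78)]

ar_percent = region_content(ar_regions, 140)
lcr_percent = region_content(lcr_regions, 140)
print(round(ar_percent), round(lcr_percent))
```

Rounded to whole numbers, these values agree with the AR (4%) and LCR (21%) entries given for DP00070 in Table 3.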

Prediction of Secondary Structure

APSSP2 was used to predict the secondary structure of each protein from its aa sequence [59]. The algorithm takes an amino acid sequence as query input and predicts the corresponding secondary structure with a confidence level. The percentages of residues that prefer α-helix, β-strand and coil conformations were calculated as the ratio of the total residues in a particular conformation to the sequence length of the protein. Structural preferences of the residues in ARs and LCRs were obtained by selecting the respective sequence regions in the predicted structure of the protein. The percentage of AR/LCR residues with a preference for a particular conformation was measured against the total number of AR/LCR residues in the protein.
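The region-wise bookkeeping described above might look like the following sketch. It assumes the prediction is available as a string with one letter per residue (H for helix, E for strand, C for coil); the actual APSSP2 output format is not described in the text, and the prediction string and region below are invented for illustration.

```python
from collections import Counter


def conformation_percentages(ss, regions=None):
    """Percentage of residues in each state ('H', 'E', 'C').

    ss: predicted secondary structure, one letter per residue (assumed format).
    regions: optional 1-based, inclusive (start, end) pairs; when given, only
             residues inside these regions are counted.
    """
    if regions is None:
        selected = ss
    else:
        selected = "".join(ss[start - 1:end] for start, end in regions)
    counts = Counter(selected)
    total = len(selected)
    return {state: 100.0 * counts.get(state, 0) / total for state in "HEC"}


# Hypothetical 20-residue prediction and a hypothetical AR at residues 9-12.
predicted = "CCHHHHHHEEEECCCCEEEE"
whole_protein = conformation_percentages(predicted)
within_ar = conformation_percentages(predicted, regions=[(9, 12)])
print(whole_protein, within_ar)
```

Passing the AR or LCR coordinates as `regions` restricts the denominator to residues inside those regions, which mirrors the last sentence of the paragraph above.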

Statistical Analysis

All statistical analyses were performed in Wolfram Mathematica 8. Mean, standard error of the mean (SEM) and standard deviation (SD) were calculated for AR/LCR length and content. A stable distribution function (Text S1) with index of stability α, skewness parameter β, location parameter μ, and scale parameter σ was fitted to the data to show the distribution pattern of AR/LCR length and AR/LCR content in a protein. A bivariate probability distribution, the smoothed kernel density distribution, was used to show the distribution of AR/LCR content with protein length. To test for correlation between AR/LCR content and protein sequence length, negative hyperbolic equations were fitted to the data.
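The descriptive statistics named above can be reproduced with the Python standard library, as in the sketch below. The original analysis used Mathematica; the use of the sample (n − 1) standard deviation is an assumption, and the list of lengths is illustrative rather than taken from the data set.

```python
import math
import statistics


def summarize(values):
    """Return (mean, sample SD, SEM) for a list of numbers."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)          # sample SD (n - 1 denominator), assumed
    sem = sd / math.sqrt(len(values))      # SEM = SD / sqrt(n)
    return mean, sd, sem


# Illustrative region lengths only; not the values analysed in the paper.
lengths = [6, 6, 5, 16, 12, 6, 6, 9]
mean, sd, sem = summarize(lengths)
print(mean, sd, sem)
```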

Results

Content of AR and LCR in Different Classes of IDPs

The DisProt database analysis revealed 221 human proteins and 432 nonhuman proteins with different degrees of disorder. Table 1 and Tables S1 and S2 list some of these proteins with their physicochemical properties. An additional 186 unstructured human proteins and 25 nonhuman proteins were obtained from the IDEAL database (Tables S3 and S4). Tables S1, S2, S3, and S4 show the protein name, database ID and the % protein disorder measured by IUPred. The tables also show the content (%) of AR and LCR in a particular group of proteins. The last two columns in the tables give the number of ARs found within 15 residues of the C- and N-termini of the protein sequence; these are marked as the ‘C’ and ‘N’ columns, respectively. The DisProt database provides the content of structural disorder; however, the disorder of all proteins in the IDEAL and DisProt databases was calculated using the IUPred server. The proteins from both databases were arranged in descending order of disorder. The content (%) of AR sequences decreased with increasing structural disorder. However, fewer LCR sequences were present in proteins with a high content of structural elements.
Table 1

Some of the intrinsically disordered human proteins from DisProt database.

| Sl No. | DisProt ID | Protein | Localization/source | Function/role | PI | Sequence length | aa# (−, +, 0) |
|---|---|---|---|---|---|---|---|
| 1 | DP00004_C002 | Antibacterial protein LL-37 | Secreted | Antibacterial activity | 10.61 | 37 | 5, 11, 16 |
| 2 | DP00016 | Cyclin-dependent kinase inhibitor 1 | Cytoplasm, Nucleus | Role cyclin-dependent kinase activity | 8.69 | 164 | 23, 26, 81 |
| 3 | DP00017 | Cyclin-dependent kinase inhibitor 1C | | Negative regulator of cell proliferation | 5.39 | 316 | 31, 27, 202 |
| 4 | DP00028 | Eukaryotic translation initiation factor 4E-binding protein 1 | Cytosol | Regulates eIF4E activity | 5.32 | 118 | 14, 12, 56 |
| 5 | DP00039 | Non-histone chromosomal protein HMG-17 | Cytoplasm, Nucleus | Binds to nucleosomal DNA | 10.00 | 89 | 14, 26, 40 |
| 6 | DP00040 | High mobility group protein HMG-I/HMG-Y | Chromosome, Nucleus | Processing of mRNA transcripts | 10.31 | 107 | 15, 27, 36 |
| 7 | DP00069 | Vesicle-associated membrane protein 2 | Synaptic vesicles | Membrane transport | 7.84 | 116 | 12, 13, 66 |
| 8 | DP00070 | α-synuclein | Membrane-bound in dopaminergic neurons | Dopamine release and transport | 4.67 | 140 | 24, 15, 73 |
| 9 | DP00126 | Tau [Isoform Tau-F] | Axons | Microtubule assembly and stability | 8.24 | 441 | 56, 58, 200 |
| 10 | DP00174 | Stathmin | Cytoplasm | Regulation of the microtubule (MT) | 5.76 | 149 | 36, 32, 52 |
| 11 | DP00199 | β-casein | Secreted | Modulate surface properties of the casein micelles | 5.52 | 226 | 20, 15, 131 |
| 12 | DP00214 | Osteopontin | Secreted | Cell-matrix interaction | 4.37 | 314 | 75, 29, 103 |
| 13 | DP00219 | Protein phosphatase 1 regulatory subunit 11 | Widely expressed | Inhibitor of protein phosphatase 1 | 6.52 | 126 | 20, 19, 51 |
| 14 | DP00287 | Tumor suppressor [Isoform 1] | Cytoplasm | Involved in the ubiquitination | 4.70 | 213 | 41, 23, 114 |
| 15 | DP00332 | Bone sialoprotein 2 | Secreted | Cell attachment | 4.12 | 317 | 76, 23, 103 |
| 16 | DP00357 | Thymosin β-4 | Cytoplasm | Organization of the cytoskeleton | 5.02 | 44 | 11, 9, 13 |
| 17 | DP00372 | Uncharacterized protein C8orf4 | | Apoptosis | 10.14 | 106 | 13, 24, 42 |
| 18 | DP00510 | Nuclear protein 1 | Nucleus | Proapoptotic stimuli | 9.98 | 82 | 10, 15, 34 |
| 19 | DP00521 | Securin | Cytoplasm, Nucleus | Chromosome stability | 6.18 | 202 | 27, 26, 105 |
| 20 | DP00546 | Huntingtin-interacting protein K [Isoform 1] | | | 5.35 | 175 | 34, 29, 81 |
| 21 | DP00555 | β-synuclein | Cytoplasm | Regulator of SNCA aggregation process | 4.41 | 134 | 28, 13, 68 |
| 22 | DP00592 | Purkinje cell protein 4 | Cytoplasm, Nucleus | Nervous system development | 6.21 | 62 | 11, 11, 23 |
| 23 | DP00617 | 26S proteasome complex subunit DSS1 | | Proteolysis | 3.81 | 70 | 27, 5, 26 |
| 24 | DP00630 | γ-synuclein | Cytoplasm | Neurofilament network integrity | 4.89 | 127 | 23, 17, 56 |
| 25 | Aβ42 | APP (Amyloid precursor protein) | Cytoplasm | Alzheimer disease | 5.31 | 42 | 6, 3, 25 |

−, + and 0 represent the number of negative (−), positive (+) and neutral amino acids in the protein sequence, respectively.

Data from the UniProt database and references therein.

Their localization, function, PI, sequence length and amino acid compositions are listed.

Based on the calculated disorder, the proteins of each type (human/nonhuman) were grouped into three categories, as suggested in a previous report [63]. Proteins with 71–100% structural disorder were grouped as largely disordered proteins (LDPs). Moderately disordered proteins (MDPs) had 31–70% of their sequence in disordered region(s), and the remaining proteins, with less than 30% of their sequence in disordered segments, were grouped as partially disordered proteins (PDPs). Sequence details of the ARs and LCRs in these groups of proteins are shown in Table 2. Figure 1 displays a graphical view of the analysis. There were fewer LDPs than MDPs or PDPs. The percentage of amyloidogenic proteins (proteins that contained at least one AR) was also lower in the LDP group. To gain confidence in this analysis, a t-test was performed on the sequence content (%) of individual proteins in each group (LDP, MDP and PDP). The respective p-values are given in Table S5.
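The three-way grouping can be written as a small function. The stated ranges (71–100%, 31–70%, below 30%) leave the handling of values such as 30% or 70.5% unspecified, so the cutoffs used here (above 70 is LDP, above 30 is MDP, otherwise PDP) are an assumption made for the sketch, and the function name is hypothetical.

```python
def disorder_class(percent):
    """Assign a protein to LDP, MDP or PDP from its percent disorder.

    Boundary handling (> 70 and > 30) is an assumption; the text gives
    only the ranges 71-100%, 31-70% and below 30%.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent disorder must lie between 0 and 100")
    if percent > 70:
        return "LDP"   # largely disordered protein
    if percent > 30:
        return "MDP"   # moderately disordered protein
    return "PDP"       # partially disordered protein


# Illustrative values only.
for value in (85.0, 50.0, 12.5):
    print(value, disorder_class(value))
```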
Table 2

Content of AR and LCR sequences in different class of disordered proteins.

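| Database/Type | Class | Total number of proteins | Amyloidogenic proteins (count) | Amyloidogenic proteins (%) | AR (count) | AR (%) | LCR (count) | LCR (%) | Overlap regions (count) |
|---|---|---|---|---|---|---|---|---|---|
| DisProt/Human | LDP | 56 | 39 | 69.64 | 102 | 3.35 | 269 | 21.49 | 15 |
| | MDP | 58 | 53 | 91.38 | 248 | 6.45 | 223 | 13.52 | 11 |
| | PDP | 107 | 99 | 92.52 | 544 | 9.35 | 146 | 5.51 | 27 |
| | Total | 221 | 191 | 84.51 | 894 | 7.22 | 638 | 13.49 | 53 |
| DisProt/Nonhuman | LDP | 124 | 70 | 56.45 | 556 | 3.47 | 542 | 27.08 | 22 |
| | MDP | 101 | 82 | 81.19 | 325 | 6.51 | 286 | 16.52 | 17 |
| | PDP | 207 | 188 | 90.82 | 1008 | 9.89 | 243 | 8.56 | 35 |
| | Total | 432 | 340 | 78.70 | 1889 | 7.26 | 1071 | 15.74 | 74 |
| IDEAL/Human | LDP | 45 | 39 | 86.67 | 176 | 3.18 | 325 | 16.86 | 9 |
| | MDP | 65 | 61 | 93.85 | 311 | 6.16 | 248 | 11.56 | 20 |
| | PDP | 76 | 75 | 98.68 | 384 | 7.78 | 137 | 5.56 | 7 |
| | Total | 186 | 175 | 93.07 | 871 | 6.10 | 710 | 10.39 | 36 |
| IDEAL/Nonhuman | LDP | 8 | 8 | 100.00 | 19 | 3.40 | 27 | 12.00 | 3 |
| | MDP | 7 | 7 | 100.00 | 33 | 7.75 | 24 | 19.84 | 0 |
| | PDP | 10 | 9 | 90.00 | 60 | 9.09 | 15 | 4.26 | 2 |
| | Total | 25 | 24 | 96.00 | 112 | 6.89 | 66 | 11.10 | 5 |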

LDP, 71–100% disordered protein; MDP, 31–70% disordered protein; PDP, <30% disordered protein.

Figure 1

Content of AR and LCR sequences in different classes of disordered proteins.

(A), DisProt human; (B), IDEAL human; (C), DisProt nonhuman and (D), IDEAL nonhuman. White bar signifying the LCR region, gray bar signifying the AR region and black bar signifying the overlapped region of AR and LCR. (E and F), Percentage of AR and percentage of LCR sequences in different group of disordered proteins, respectively. Bottom-axis in all the plots represents the three groups of disordered proteins with different degree of disorderness, PDP (0–30% disorder), MDP (31–70% disorder) and LDP (71–100% disorder). In (E) and (F), asterisks indicate the statistically significant difference from that of other groups (see Table S5).

Table 2 and Tables S1, S2, S3, and S4 show that some of the proteins in each group contained no AR. For instance, among the 221 human proteins in the DisProt database, 191 (∼86%) were amyloidogenic, each containing at least one AR; the remaining 30 contained no ARs. The proportion of amyloidogenic proteins was highest (93%) for PDPs and decreased to 70% for LDPs. A similar trend was observed for nonhuman proteins (Table 2 and Table S2). Analysis of protein sequences from the IDEAL database revealed a similar trend in the content of amyloidogenic proteins across the groups (Table 2 and Table S3). The percentage of sequence in low complexity regions (LCRs) for each individual protein in the DisProt and IDEAL databases is also given in Tables S1, S2, S3, and S4. A group-wise distribution of the LCRs is presented in Figure 1 and Table 2. The LCR content (%) was highest in LDPs: a little more than 20% of the sequence of human DisProt LDPs was in LCRs. The LCR content decreased as structural disorder decreased. Nonhuman DisProt proteins contained a slightly higher percentage (16%) of LCR sequence than the human proteins. 
The LCR content of proteins in the IDEAL database was lower than that of the DisProt proteins. LCR content was lowest in PDPs. P-values from the t-tests of some of the above comparisons are given in Table S5. The sequence length of the ARs/LCRs and their content varied from protein to protein. Table 3 and Table S6 provide sequence details of the ARs, the LCRs and the regions of overlap between them. The tables give the AR/LCR length, the sequence position of each region and the percentage of AR/LCR sequence in an individual protein. Individual AR lengths varied from 5 to 34 residues. The content of AR sequences ranged from 0 to 44% (Tables S1, S2, S3, and S4). For example, the shortest protein, the 37-residue antibacterial protein LL-37 (DP00004_C002), contained no AR, whereas tau, with 441 amino acids, contained 1.3% AR residues. DP00069, with a sequence length of 116, was very rich in AR sequences (14%).
Table 3

LCRs, ARs (*) and overlap regions (†) in some of the human disordered proteins from DisProt data.

| DisProt ID | LCR/AR | Protein length | AR (%) | LCR (%) |
|---|---|---|---|---|
| DP00016 | GPRRGRDELG GGRRPG (81–96) | 164 | 0 | 10 |
| DP00017 | RLLLAPRPVA VAVAVSPPLE PAAES (101–125) | 316 | 0 | 43 |
| | PSVPVPAPAS TPPPVPVLAP APAPAPAPVA APVAAPVAVA VLAPAPAPAP APAPAPAPVA APAPAPAPAP APAPAPAPAP DAAP (137–220) | | | |
| | AAGTAAASAN GAA (251–263) | | | |
| | VPAPCPSPSA APGVGSV (291–307) | | | |
| DP00039 | KRKAEGDAKG DKAKVKDE (2–19) | 89 | 0 | 62 |
| | AKPAPPKPEP KPKKAPAKKG EKVPKGKKGK ADAGKEG (29–65) | | | |
| DP00040 | SESSSKSS (2–9) | 107 | 0 | 66 |
| | KRGRGRPRKQ PP (23–34) | | | |
| | PKRPRGRPKG SKNKG (54–68) | | | |
| | KTRKTTTTPG RKPRGRPKKL EKEEEEGISQ ESSEEE (71–106) | | | |
| DP00069 | ATAATAPPAA PAGEGGPPAP PP (3–24) | 116 | 14 | 33 |
| | IILGVICAII LIIIIV (97–112) | | | |
| | VICAIILIII IVYFSS (101–116)* | | | |
| | VICAIILIII IV (101–112)† | | | |
| DP00070 | KAKEGVVAAA EKTK (10–23) | 140 | 4 | 21 |
| | EGVLYV (35–40)* | | | |
| | VTNVGGAVVT GVTAVA (63–78) | | | |
| DP00126 | SKSKDGTGSD DKKAKGADGK TKIAT (129–153) | 441 | 1 | 17 |
| | PAKTPPAPKT PPSSGEPPKS GDRSGYSSPG SPGTPGSRSR TPSLPTPPTR EP (172–223) | | | |
| | KVQIIN (274–279)* | | | |
| DP00174 | AFELI (19–23)* | 149 | 3 | 0 |
| DP00199 | VLILACLVAL A (3–15) | 226 | 0 | 38 |
| | ETIESLSSSE ESITE (17–31) | | | |
| | HEDQQQGEDE HQD (41–53) | | | |
| | LPLAQPAVVL PVPQP (82–96) | | | |
| | LHLPLPLLQP LMQQVPQPIP Q (139–159) | | | |
| | LLLNQELLLN (196–205) | | | |
| DP00214 | SHDHMDDMDD EDDDDHVDSQ DSIDSNDSDD VDDTDDSHQS DESHHSDESD E (81–131) | 314 | 0 | 20 |
| | EFHSHEFHSH E (272–282) | | | |
| DP00219 | ETVTETTVTV TTE (10–22) | 126 | 0 | 37 |
| | ESSTESDEEE EE (72–83) | | | |
| | PTPTTPPQPP DPSQPPPGPM Q (105–125) | | | |
| DP00287 | EAEVGAEEAG VEEYGPEEDG GEESGAEESG PEESGPEELG AEEEMEAG (10–57) | 213 | 8 | 23 |
| | SQVIF (72–76)* | | | |
| | IFANITLPVY TL (147–158)* | | | |
| DP00332 | GSSDSSEENG DDSSEEEEEE EETSNEGEN NEESNEDEDS EAENTT (62–106) | 317 | 3 | 41 |
| | KEKESDEEEE EEEEGNENEE SEAEVDENE (145–173) | | | |
| | TGANAEGTTE TGGQGKGTSK TTTSPNGG (207–234) | | | |
| | GKTTTVEYEG EYEYTG (252–267) | | | |
| | GQGYDGYDGQ NYY (302–314) | | | |
| | GQNYYHHQ (310–317)* | | | |
| | GQNYY (310–314)† | | | |
| DP00372 | HQAIIM (7–12)* | 106 | 17 | 0 |
| | AVGNIF (35–40)* | | | |
| | IIFAID (66–71)* | | | |
| DP00510 | EDEDSSLDES DLYSL (18–32) | 82 | 0 | 31 |
| | GGGGRKGRTK RE (38–48) | | | |
| DP00521 | ATLIYV (2–7)* | 202 | 3 | 5 |
| | PPSPVKMPSP P (163–173) | | | |
| DP00546 | GAERRCGPGP APPPPRAEA (16–34) | 175 | 5 | 21 |
| | RRSREQKAKQ EREKELAK (116–133) | | | |
| | VEALIALTN (167–175)* | | | |
| DP00555 | EGVLYV (35–40)* | 134 | 8 | 28 |
| | GAGNIA (73–78)* | | | |
| | EEVAQEAAEE PLIEPLMEPE GESYEDPPQE EYQEYEPE (96–133) | | | |
| DP00592 | AAVAIQ (42–47)* | 62 | 10 | 0 |
| DP00617 | LLEEDDEFEE F (12–22) | 70 | 0 | 36 |
| | VWEDNWDDDN VEDD (38–51) | | | |
| DP00630 | AVSEAVVSSV NTVATKTV (65–82) | 127 | 0 | 30 |
| | QQEGEASKEK EEVAEEAQSG (106–125) | | | |
| Aβ42 | KLVFFA (16–21)* | 42 | 29 | 0 |
| | GGVVIA (37–42)* | | | |

Sequence positions are given in the parentheses. Single letter code is used to represent individual aa residues.

In contrast to ARs, most of the LCRs were 8–40 residues long. The shortest LCR was 8 residues long; one such region was detected in DP00040. The longest LCR, 84 residues, was detected in DP00017. LCRs in tau (DP00126), for instance, occupied 17% of its total sequence. More than 35% of the residues in β-casein (DP00199) and protein phosphatase 1 regulatory subunit 11 (DP00219) were in LCRs. Statistical analysis was carried out to determine the average AR/LCR content (%) and the average length of the two regions in human proteins. To obtain the statistical parameters, the AR/LCR content of all human proteins from the DisProt and IDEAL databases (Tables S1 and S2) was combined. The total number of proteins examined was 407, and the combined numbers of ARs and LCRs were 1765 and 1348, respectively (Table 2). A stable distribution function (see Materials and Methods and Text S1) was fitted to the data (detected ARs and LCRs). Figure 2 shows the frequency histograms and the fitted distribution functions for both LCRs and ARs. Table 4 reports the parameter values estimated from the fits. The statistical population (% of AR/LCR sequences) was characterized by a positive skewness coefficient (much larger than zero). The mean value was ∼8% of the sequence for ARs. A similar distribution fit was made to the lengths of the ARs/LCRs, as shown in Figure 3, and the mean value was about 8 residues for ARs and 34 residues for LCRs.
Figure 2

Probability distribution of LCR and AR lengths and percentages.

Distribution of LCR lengths (A) and percentage of LCR (B) in LCR-containing disordered proteins. (C) and (D) represent the probability distributions of AR lengths and AR content (%) of IDPs, respectively. Fitted statistical parameters are given in Table 4. Histograms of the data are shown with a suitable bin size.

Table 4

Statistical analysis on AR/LCR length/content.

| Stable distribution parameters | AR length distribution | AR percentage distribution | LCR length distribution | LCR percentage distribution |
|---|---|---|---|---|
| Index of stability, α | 1.02 | 1.34 | 0.92 | 1.08 |
| Skewness parameter, β | 0.99 | 0.99 | 0.99 | 0.99 |
| Location parameter, μ | 6.55 | 9.73 | 14.99 | 9.73 |
| Scale parameter, σ | 0.94 | 2.24 | 4.67 | 2.24 |

Stable distribution function fitting parameters.

Figure 3

Smoothed kernel density estimation for the LCR and AR content in a protein.

The left and right panels represent the density for LCR and AR, respectively. The plots are shown in two different clipping planes. The bottom figures show the smoothed 3D histograms for the AR and LCR.

Figure 3 shows the smoothed kernel density estimation for the LCR and AR content in a protein (left and right panels, respectively). The plots are shown in two different clipping planes, and the bottom figures show the smoothed 3D histograms. The smoothed kernel density estimation plot shows a distinct peak suggesting ∼8% AR content in a ∼400 aa long protein, indicating that the proteins in the two databases were concentrated around ∼400 aa in length and contributed most to the estimated average AR and LCR content. No correlation could be observed between AR/LCR content and protein length (Figure 4). At a deeper clipping plane the data suggested a negative hyperbolic relationship, i.e. AR/LCR content decreasing with increasing protein length; however, no significant fit could be obtained to validate this assumption.
Figure 4

Correlations between content of LCR and AR sequence with the protein length.

(A) Correlations between content of LCR sequence with the protein length. No significant correlation could be obtained for the LCR content in a protein sequence. The figure shows a negative hyperbolic fit (y = 9.44056 + 1926.61/x; R² = 0.113058) with standard deviation bands (at 1σ, 2σ, and 3σ). (B) Correlations between content of AR sequence with the protein length. No significant correlation could be obtained for the AR content in a protein sequence. The figure shows a negative hyperbolic fit (y = 6.05937 + 651.62/x; R² = 0.112173) with standard deviation bands (at 1σ, 2σ, and 3σ).
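A fit of the form y = a + b/x is linear in 1/x, so it can be obtained by ordinary least squares. The sketch below shows one way to do this and to compute R²; it is not the Mathematica procedure used for Figure 4, the function names are hypothetical, and the data points are synthetic. The second function simply evaluates the LCR curve reported in the caption.

```python
def fit_hyperbola(x, y):
    """Least-squares fit of y = a + b / x; returns (a, b, r_squared)."""
    u = [1.0 / xi for xi in x]                 # substitute u = 1/x
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    s_uu = sum((ui - mean_u) ** 2 for ui in u)
    s_uy = sum((ui - mean_u) * (yi - mean_y) for ui, yi in zip(u, y))
    b = s_uy / s_uu
    a = mean_y - b * mean_u
    ss_res = sum((yi - (a + b / xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


def reported_lcr_curve(length):
    """Evaluate the LCR fit quoted in the Figure 4 caption."""
    return 9.44056 + 1926.61 / length


# Synthetic, noise-free points generated from y = 6 + 650 / x (illustrative).
lengths = [100, 200, 400, 800]
contents = [6 + 650 / L for L in lengths]
a, b, r2 = fit_hyperbola(lengths, contents)
print(a, b, r2, reported_lcr_curve(400))
```

With noise-free input the fit recovers the generating parameters and R² is essentially 1; the reported R² values of about 0.11 indicate that the curves in Figure 4 explain only a small fraction of the variance.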

Sequence Aspects of AR and LCR

One interesting observation was that most proteins contained both ARs and LCRs, yet the two regions rarely overlapped (Figure 1, Tables S1, S2, S3, and S4, Table 3 and Table 5). For instance, DisProt human proteins contained 894 ARs and 638 LCRs, but only 53 overlaps between the two kinds of region were observed, and in most cases the overlap was partial (Table 5). An LCR at residues 97–112 in DP00069 overlapped with the C-terminal AR at residues 101–116, and the overlapping region contained 12 residues. In DP00332, an LCR at residues 302–314 overlapped with an AR (310–317); only five residues (310–314) were in the overlapping region. Similarly, four ARs from DP00119, DP00551, DP00643_A002 and DP00683 partially overlapped with LCRs. Similar results were obtained for the other groups of proteins. Among 1889 ARs in DisProt nonhuman proteins, only 74 overlapped with LCRs. On average, ∼3% of the AR sequences overlapped with LCR sequences. These observations indicated that the residues in ARs were highly complex and rarely overlapped with LCRs.
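Detecting an AR/LCR overlap amounts to intersecting two closed intervals of residue positions. The following sketch uses hypothetical helper names and the 1-based, inclusive coordinates listed in Table 3; it illustrates the bookkeeping and is not the procedure used to build Table 5.

```python
def overlap(region_a, region_b):
    """Intersection of two 1-based, inclusive (start, end) regions, or None."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    if start > end:
        return None
    return (start, end)


def overlap_length(region_a, region_b):
    """Number of residues shared by the two regions (0 if they are disjoint)."""
    shared = overlap(region_a, region_b)
    return 0 if shared is None else shared[1] - shared[0] + 1


# Coordinates as listed in Table 3.
print(overlap((97, 112), (101, 116)), overlap_length((97, 112), (101, 116)))    # DP00069
print(overlap((302, 314), (310, 317)), overlap_length((302, 314), (310, 317)))  # DP00332
print(overlap((10, 23), (35, 40)))                                              # DP00070, disjoint
```

For DP00069 the shared segment is 101–112 (12 residues), and for DP00332 it is 310–314, matching the overlap rows marked with † in Table 3.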
Table 5

Overlapping regions in DisProt human proteins.

DisProt ID    LCR/AR overlap region
DP00069 LCR IILGVICAIILIIIIV---- (97–112)
AR ----VICAIILIIIIVYFSS (101–116)
DP00332 LCR GQGYDGYDGQNYY--- (302–314)
AR --------GQNYYHHQ (310–317)
DP00119 LCR LLILLSVALLALSSAESSSEDVSQEESL---- (2–29)
AR --------------------------SLFLIS (28–33)
DP00551 LCR ALLLLLFLHLAFL (10–22)
AR --LLLLFLHLAFL (12–22)
DP00643_A002 LCR VILRLLRYIVRLVWR-- (122–136)
AR ----LLRYIVRLVWRMH (126–138)
DP00683 LCR LVSVYNSYPYYPYLY- (210–224)
AR LVSVYNSYPYYPYLYC (210–225)
DP00012 LCR -----FNSSAFFFSGFFVVFLSV----------- (305–322)
AR AYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIIL (300–333)
LCR IQLLLIVIGAIAVVAVLQ (995–1012)
AR -QLLLIVIGAIA------ (996–1006)
LCR -IFVIFFIAVTFISI-- (1106–1119)
AR MIFVIFFIAVTFISILT (1105–1121)
DP00074 LCR AAYEFNAAAAANA (58–70)
AR AAYEFN------- (58–63)
LCR LTLQQQHQRLAQLLLIL- (495–511)
AR -----------QLLLILS (506–512)
DP00099 LCR ---TIITPPTPIIP (336–346)
AR AGWTIIT------- (333–339)
DP00162 LCR TTGVVTVIVILIAIAALGALILG----- (9–31)
AR -------IVILIAIAALGALILGCWCYL (16–36)
DP00191 LCR LLLLLFL-- (8–14)
AR -LLLLFLKS (9–16)
DP00231 LCR ------QTPQGQQGLLQAQNLLTQLPQQ (210–231)
AR AQFIISQ--------------------- (204–210)
DP00272 LCR --------LALADALATSTL (112–123)
AR ATNIYIFNLA---------- (104–113)
DP00282 LCR KNNWNIEDNNIKN (1132–1144)
AR -NNWNIE------ (1133–1138)
DP00306 LCR ----ITILIIALIAL------ (51–61)
AR NVVFITILIIALIALSVGQYN (47–67)
DP00307 LCR LEQILEYELLLIQQL------ (140–154)
AR -------ELLLIQQLNFHLIV (147–160)
DP00311 LCR AVAGLVLVALLAILV---- (232–246)
AR --------ALLAILVENWH (240–250)
DP00314 LCR PKLPDDTTFPLPPPRPK----- (149–165)
AR ----------------KNVIFE (165–170)
DP00317 LCR TEKRKKRSTKKE---------- (301–312)
AR -----------EVFNILQAAYV (312–322)
DP00324 LCR GGNFGGRSSGPYGGGG--- (329–344)
AR --------------GGQYF (343–347)
DP00338 LCR MILFLIMLVLVLF--- (20–32)
AR -ILFLIMLVLVLFGYG (21–35)
DP00339 LCR MILFLIMLVLVLF--- (20–32)
AR -ILFLIMLVLVLFGYG (21–35)
LCR GDFYYLGGFFGG (261–272)
AR GDFYYLGGFFG- (261–271)
DP00356 LCR NNQYFNHHPYPHNHYMP (120–136)
AR NNQYFN----------- (120–125)
DP00381 LCR -----NNTQTTTHLQPLHHP (819–833)
AR ELNNINNTQ----------- (814–822)
DP00406 LCR LQALYALQALVVTL- (1522–1535)
AR LQALYALQALVVTLE (1522–1536)
DP00428 LCR -------LELCRRRSLLEL (130–141)
AR NDFVFVVLEL--------- (123–132)
DP00448 LCR LVVKTALKLLLVFV--- (217–230)
AR --------LLLVFVEYS (225–233)
DP00464 LCR KKLKEKKDELD--------- (45–55)
AR ---------LDSLITAITTN (54–64)
DP00466 LCR SPPVILLISFLIFLIV- (237–252)
AR ---VILLISFLIFLIVG (240–253)
DP00467 LCR AKPNATTANGNTALAIA (785–801)
AR -----------TALAIA (796–801)
DP00503 LCR ---------LLIILFIIVPIFLLL (167–181)
AR KDGIIMIQTLLIILFIIVPIFLL- (158–180)
DP00508 LCR LAVLILAIILL------ (7–17)
AR LAVLILAIILLQGTLAQ (7–23)
DP00519 LCR -----SSGAKSPSKSGA (1355–1366)
AR KAVEFSS---------- (1350–1356)
LCR LEELEKERSLLLADLDKEEKEKD----------- (134–156)
AR ---------------------KDWYYAQLQNLTK (155–167)
DP00520 LCR KSPKGSGKPPGVPASSKSGK------ (332–351)
AR -------------------KAFSYYL (351–357)
DP00553 LCR ASLLFLNVLAFAAL- (716–729)
AR ASLLFLNVLAFAALY (716–730)
DP00574 LCR GPGRLEREAAAAAATTPAPTAGAL--- (52–75)
AR --------------------AGALYSG (72–78)
LCR -----SGSEGDSESGEEEELGAE (77–94)
AR AGALYSG---------------- (72–78)
DP00616 LCR LVFLVLLFLGALGLCLA (3–19)
AR ---LVLLFLGA------ (6–13)
DP00628 LCR LRELSELSLLSL-- (235–246)
AR --------LLSLYG (243–248)
DP00632 LCR YSTYSQAAAQQGYSAYTAQ (6–24)
AR -----------GYSAYTA- (17–23)
LCR ---SYTQAQTTATYGQTAYATSYGQPPTGYTTPTAPQA (51–85)
AR TDVSYTQAQTTATYGQTAYATSYG-------------- (48–71)
LCR QPVTAPPSYPPTSYSSTQPTSYDQSSYSQQNTYG…QSS (182–266)
AR ----------------------------QQNTYG------ (210–215)
DP00633 LCR -LQAYQQRLLQQQ (2257–2268)
AR SLQAYQ------- (2256–2261)
DP00641 LCR AALLWLLLIAAA-- (5–16)
AR AALLWLLLIAAAFS (5–18)
DP00666 LCR IILLLLVLLIL-- (1130–1140)
AR -----LVLLILCF (1135–1142)
DP00670 LCR AVAAAAIFVIIIF- (314–326)
AR --AAAAIFVIIIFY (316–327)
DP00706 LCR GKGDSSGFSSYSGSSSSGSSISSARSSGGGSSG…AGS (58–105)
AR ------GFSSYS--------------------------- (64–69)
LCR GYSQVSYSSGSGSSLQGASGSSQLGSSSSHSGNSGS…GSA (111–175)
AR --SQVSYSS--------------------------------- (113–119)

Length and sequence positions are given in the parentheses. Single letter codes are used to represent individual aa residues. Overlapping regions are aligned. Only the proteins with AR/LCR overlapping regions are shown.
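The aligned layout used in the table above can be reproduced from the region coordinates alone. The following is a minimal Python sketch, not the procedure used in this study; the function names `overlap` and `aligned_rows` are hypothetical, and the example values are the DP00332 entry from the table.

```python
from typing import Optional, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end) sequence positions


def overlap(a: Region, b: Region) -> Optional[Region]:
    """Return the shared span of two regions, or None if they do not overlap."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start <= end else None


def aligned_rows(lcr_seq: str, lcr: Region, ar_seq: str, ar: Region) -> Tuple[str, str]:
    """Pad both sequences with '-' so that equal sequence positions line up."""
    left, right = min(lcr[0], ar[0]), max(lcr[1], ar[1])

    def pad(seq: str, region: Region) -> str:
        return "-" * (region[0] - left) + seq + "-" * (right - region[1])

    return pad(lcr_seq, lcr), pad(ar_seq, ar)


# DP00332 entry as listed in the table: LCR 302–314, AR 310–317.
lcr_region, ar_region = (302, 314), (310, 317)
print(overlap(lcr_region, ar_region))
for row in aligned_rows("GQGYDGYDGQNYY", lcr_region, "GQNYYHHQ", ar_region):
    print(row)
```

Positions are treated as inclusive at both ends, which matches the way the table reports them (a region 302–314 is 13 residues long).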

We also calculated the average content of each type of amino acid residue in both the ARs and the LCRs. Figure 5 displays the average content of the different residue types in the ARs, the LCRs and the total proteins. A major fraction of the AR residues was hydrophobic, and Leu was the most abundant residue (12.6%). Other major residues in the ARs were Ile (11.2%), Phe (8.8%), Tyr (8.6%), Val (8.1%) and Ala (7.3%). The ARs were depleted in Pro, Lys, His and others. Most residues in the LCRs were hydrophilic, and these regions were enriched in Ser (13.1%), Pro (12.1%), Gly (9.8%) and Ala (9.2%).
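As an illustration of how such a residue composition can be tallied, the following Python sketch counts residues over a set of segments and reports percentages. It is a sketch under stated assumptions rather than the analysis pipeline of this study: `composition` is a hypothetical helper, and the three example segments are AR sequences taken from the overlap table (DP00332, DP00119, DP00551), so the printed values describe only these three segments and not the data set averages reported above.

```python
from collections import Counter
from typing import Dict, Iterable


def composition(segments: Iterable[str]) -> Dict[str, float]:
    """Percentage of each residue over all segments, most abundant first.

    Alignment padding ('-') and any other non-letter characters are ignored.
    """
    residues = [aa for seg in segments for aa in seg.upper() if aa.isalpha()]
    counts = Counter(residues)
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.most_common()}


# Example AR segments copied from the overlap table (illustrative subset only).
example_ars = ["GQNYYHHQ", "SLFLIS", "LLLLFLHLAFL"]
for aa, pct in composition(example_ars).items():
    print(f"{aa}: {pct:.1f}%")
```

Pooling residues over all segments, as done here, weights long regions more heavily than short ones; averaging per-region percentages instead would be an equally defensible choice.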
Figure 5

Content of different types of aa residues present in the LCR, AR and total proteins.

The panel compares the percentage of individual aa residues in the LCR (Series 1, blue), AR (Series 2, red) and total protein (Series 3, green). The x-axis starts with the residue most abundant in the AR. The amino acid residues are shown by single-letter code along the bottom axis.

The structural propensities of residues in the ARs were measured using the APSSP2 algorithm (see Materials and Methods). The analysis showed that the conformational preference of the AR residues was not confined to any particular structure; rather, on average, a mixed structural preference was observed in all three groups of proteins. Figure 6 displays the overall structural heterogeneity of the AR sequences present in human (DisProt) proteins. On average, ∼38% of the AR residues preferred α-helical conformation, while preferences for β-sheet/strand and coil conformations were ∼31% and ∼32%, respectively. This result indicated that not all of the sequences in the ARs favoured β-conformation. In the total protein sequences of the same group of proteins, by comparison, about 56% of residues preferred coil conformation, ∼30% showed structural propensity towards α-helical conformation, and the remaining 14% favoured β-sheet/strand conformations. The fraction of residues that preferred β-sheet/strand conformation thus increased substantially in the ARs (from 14% to ∼31%); however, a large fraction of the AR residues (38%) still favoured α-helical conformation.
Figure 6

Comparison of the conformational preferences of residues in the ARs with that of total protein.

A 3D plot shows the percentage of residues with conformational preference for α-helix (green), β-strand/sheet (red) and coil (blue) for total proteins and their ARs, as indicated on the x-axis. The lower panel shows a 2D plot of the same data along with the error limits.


Discussion

Previous investigations have established that ARs act as key determinants of protein aggregation and amyloid fibril formation. In this report we detected ARs using the Waltz algorithm and computationally analyzed the sequence complexity, conformational preference and distribution of ARs in the disordered human proteins present in the DisProt and IDEAL databases. There are several methods to detect ARs [56], [64]–[66]. Important algorithms and software for predicting the aggregation behaviour of proteins include Tango [55], Waltz [56], PASTA [67]–[70], Aggrescan [71], SALSA [72], Zyggregator [73], AmylPred [64] and FoldAmyloid [74]. The ability of a protein sequence to form β-strands/sheets is the predominant feature in most of these algorithms. PASTA was developed based on the hidden β-propensity of protein sequences [67]–[70]. Aggrescan is based on an aggregation propensity scale for the 20 natural amino acids [71]; this method stresses that short and specific sequence stretches are responsible for protein aggregation. Based on the average packing density of aa residues, FoldAmyloid identifies sequence patterns that could promote amyloid fibril formation [34]. Waltz was used in this investigation because many of its selected regions were experimentally verified and because the method is better able to distinguish amyloid fiber formation from amorphous aggregation [56].

The investigation revealed that more than 80% of disordered human proteins (DisProt and IDEAL databases) possessed at least one AR, indicating that a significant number of disordered proteins were amyloidogenic. Waltz detected ARs in a large number of proteins in the DisProt and IDEAL databases. This large data set made it possible to derive, alongside the discrete analysis (Table 6), statistical averages of the AR and LCR sequence percentages and of the AR and LCR sequence lengths. The discrete analysis results for all groups of proteins are given in Table 2 and Table 6. The average values did not differ much from the statistical analysis results (Table 4). However, the statistical values may be more suitable for representing the average properties and composition of the LCRs and ARs.
Table 6

Discrete analysis.

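| Protein type | AR (%) range | AR (%) mean | AR (%) median | LCR (%) range | LCR (%) mean | LCR (%) median |
|---|---|---|---|---|---|---|
| DisProt human | 0.43–31.50 | 8.36 | 6.98 | 1.41–91.94 | 15.86 | 10.21 |
| DisProt nonhuman | 1.20–44.00 | 9.27 | 7.50 | 1.30–96.80 | 16.80 | 12.20 |
| IDEAL human | 0.69–22.37 | 6.56 | 5.93 | 1.09–70.80 | 13.74 | 10.93 |
| IDEAL nonhuman | 1.08–17.53 | 7.03 | 6.69 | 1.67–70.67 | 13.15 | 8.14 |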

Range, Mean, Median and Mode of AR and LCR sequence percentage in different groups of proteins.
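The kind of discrete summary reported in Table 6 can be sketched in a few lines of Python. This is an illustrative sketch only: `discrete_summary` is a hypothetical helper, and the per-protein AR percentages below are made-up placeholder values, not data from this study.

```python
import statistics
from typing import Dict, Sequence


def discrete_summary(percentages: Sequence[float]) -> Dict[str, object]:
    """Range, mean and median of per-protein region percentages."""
    return {
        "range": (min(percentages), max(percentages)),
        "mean": statistics.mean(percentages),
        "median": statistics.median(percentages),
    }


# Hypothetical per-protein AR percentages (placeholders for illustration).
ar_percent = [1.2, 3.5, 5.9, 6.4, 7.0, 9.8, 22.4]
summary = discrete_summary(ar_percent)
print(summary["range"], round(summary["mean"], 2), summary["median"])
```

In Table 6 the mean exceeds the median for both AR and LCR percentages in every protein group, the pattern expected when a few proteins with very high content pull the mean upward, which is one reason to report both values.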

The percentage of amyloidogenic proteins was higher in the PDP groups. Thus the content of AR sequences was greater in proteins with less structural disorder, or in structured proteins. A similar observation was made by Linding et al. [75]. These proteins contained fewer LCRs, which were composed of fewer hydrophobic amino acids. LCRs may thus have a significant role in the protein aggregation process and amyloid formation: an AR may be exposed to start the aggregation process, and the LCRs could play a certain role in it. However, in largely disordered proteins, a large number of LCRs together with a high content of polar amino acids and attenuated hydrophobicity may not allow the protein to misfold or fold further into a β-sheet rich amyloid aggregate [3]. Therefore, the content of ARs and LCRs, and the balance between the two types of region, are crucial for protein stability (for disordered proteins) and amyloid formation. Depending on the AR/LCR content, a suitable solution condition may be needed to unfold the relevant region of a structured protein, partially or fully, and so trigger amyloid fiber formation [76]. Nature may have designed the disordered proteins with a unique balance of AR and LCR sequences to provide stability and the ability to perform multiple functions. However, an external disturbance or a change in the internal cellular condition may break this balance and could enhance protein aggregation and amyloid formation.

Most of the detected ARs in amyloidogenic proteins were six to eight residues long. We detected a six-residue AR (residues 35–40) in α-synuclein. It was significantly shorter than the aggregation-prone segment obtained by Der-Sarkissian et al. Zhang et al. showed four additional segments that might be involved in α-synuclein aggregation [72].
However, the methods used did not adequately define the characteristics of the nucleation site of amyloid formation. Waltz allows better distinction between amyloid-forming sequences and protein segments that promote β-sheet rich amorphous aggregates, which could be one reason for the smaller number of ARs found in this investigation.

The statistical analysis and the discrete analysis (Tables S1, S2, S3 and S4; Table 6) established that the content of AR sequences was not always proportional to the protein sequence length. Instead, there was a negative hyperbolic correlation between protein sequence length and the percentage of AR/LCR sequence (Figure 4). The reason for this was not known. Chiti et al. observed a lower aggregation propensity in longer proteins than in shorter ones [77]. The longer proteins may thus have evolved with an attenuated (low) content of ARs to reduce unwanted aggregation and fibril formation. It would be interesting, however, to test whether increasing the number of ARs could enhance the aggregation kinetics or the quality of fibril formation in longer proteins.

In this regard, it was also important to know the conformational preferences of the AR residues. We observed that aa residues in the ARs showed propensity towards α-helix, β-sheet/strand and coil conformations, and that not all of the residues were very hydrophobic. Waltz, used in this investigation, does not rely solely on the β-sheet structural propensity of the residues but is built on a position-specific scoring matrix (PSSM) and on consideration of other physicochemical properties of the protein sequences. It allows some tolerance towards charged and polar residues with different hidden structural propensity.
Proteins with diverse structural domains (β-sheet, α-helix or random coil), including globular proteins, were found to produce aggregates with fibrillar structure under certain solution conditions [23]; however, a crucial structural rearrangement often occurred during conversion of these proteins into amyloid fiber [78]. Thus slightly polar amino acids or the presence of LCRs may play an important role in structural reorganization. Aggregation propensity and overall protein aggregation may also depend on the location of the AR in the protein sequence and on how the ARs are surrounded by a local excess of polar/charged amino acids or LCRs. Kar et al. recently showed that addition of a polyproline sequence to the C-terminal side of polyGln slowed aggregation of the peptide [48], whereas insertion of the same residues on the N-terminal side had very little effect on overall aggregation. N-terminal residues of the Huntingtin protein situated adjacent to the polyGln sequence dramatically altered the aggregation property of the peptide. However, the position-dependent role of LCRs rich in polar and charged residues in aggregation propelled by ARs was not known with certainty. According to the amyloid stretch hypothesis, AR-containing proteins need to be locally or partially unfolded to initiate and promote amyloid fiber formation [35]. Thus the presence of LCRs in a protein with less disorder may significantly alter the amyloid formation kinetics.

IDPs play a vital role in molecular recognition, and such interactions have been found to lead to the formation of structured protein complexes. A model of molecular recognition features or elements (MoRFs) has been proposed to define this interaction and the reorganization processes [79]–[82]. The MoRF model recognizes, in a disordered protein sequence, a linear region that undergoes a disorder-to-order transition upon binding to its partner. These regions are often referred to as MoRFs.
The regions can form α-helices, β-strands (β-MoRFs), irregular structures (ι-MoRFs) or a combination of these structural elements upon binding to a partner. However, our analysis was largely directed at finding the amyloid-forming regions and the regions of protein sequence that are less complex. Both ARs and LCRs could be part of MoRFs and may be involved in the molecular reorganization process; further analysis is needed to address this issue.

One of the significant observations was that the AR sequences were highly complex. Our analysis of IDPs showed that ∼20% of the sequence was in LCRs, a value close to the overall predicted value for the SWISS-PROT database [41]. However, most of the AR sequences (more than 97%, Table 2) were not within the LCRs. This indicated the complexity of the AR sequences and confirmed that few compositionally biased aa residues were present in the ARs. Some LCRs, in which one or more aa residues form stretches of a single amino acid, produce homopolymeric structures [40], [41], [49], [83] and become amyloidogenic [84]. However, in the IDPs we could not detect any LCR that was both homopolymeric and amyloidogenic. Many prion proteins, e.g. mammalian PrP and the yeast prions Ure2p and Sup35, contain disordered stretches that also form β-sheet rich aggregates. These aggregation-prone domains also contain segments of low sequence complexity and are often enriched in Gln/Asn [85]–[88]. Thus prion proteins also contained both ARs and LCRs. A test was performed with prion protein (P04156) and Huntingtin (P42858); however, Waltz could detect the palindromic region (residues 112–119) in P04156 and the polyQ region in Huntingtin only when ‘custom’ was used as the threshold in the analysis [56]. In our analysis, ‘best overall performance’ was used as the threshold, and it missed these two amyloidogenic regions.
We also analysed the content of ARs and LCRs in a group of proteins whose amyloidogenicity has been experimentally proven [56]. The list of proteins and the analysis results are shown in Table 7. It includes proteins such as insulin, prion protein (P04156) and the yeast protein Sup35 (P05453). Sequence overlap between the ARs and LCRs was again very small (Table 7), indicating that the ARs are compositionally highly complex. The sequence complexity and structural heterogeneity of the AR sequences was therefore a key observation. In addition, the small percentage of residues that overlapped with the LCRs showed mixed structural propensity. The C-terminal LCR in DP00069 that overlapped with the AR contained seven Ile residues (not in a continuous stretch), and these residues showed a preference for α-helical conformation. The overlapping AR and LCR sequences in DP00332, however, showed propensity towards random coil structure. Being part of an AR, both of these overlapping regions were expected to induce aggregation under certain solution conditions. However, the LCR component may modulate the aggregation process in a different way, and the content may change depending on the solution condition [89]. Future experiments starting with these overlapping ARs and LCRs would enhance our understanding of how a sequence region composed of an AR with low-complexity sequence modulates the protein aggregation process that leads to the eventual formation of amyloid fiber.
Table 7

Content of ARs and LCRs in a group of known amyloidogenic proteins.

| Name | UniProt ID | Sequence length | LCR | LCR (%) | AR | AR (%) | Overlapping sequences |
|---|---|---|---|---|---|---|---|
| Insulin | P01308 | 110 | 2–24 | 20.91 | 36–42, 99–110 | 17.30 | |
| Apolipoprotein A1 | P02647 | 267 | | | 8–15 | 3.00 | |
| Cold shock protein cspB | P32081 | 67 | | | 14–20, 26–34, 47–52 | 8.20 | |
| Acylphosphatase2 | P14621 | 99 | | | | | |
| Immunoglobulin G-binding protein G | P06654 | 448 | 69–114, 241–253, 379–413, 427–442 | 24.55 | | | |
| Alpha-synuclein | P37840-1 | 140 | 10–23, 63–78 | | 35–40 | | |
| PI3-kinase alpha | P27986 | 724 | 79–102, 303–314, 533–548 | 7.18 | 72–78, 263–269, 290–296, 331–336, 401–406, 483–495 | 6.40 | |
| Microtubule-associated protein Tau | P10636 | 441 | | | 274–279 | 1.36 | |
| Cystatin-C | P01034 | 146 | 2–33 | 21.92 | 10–20, 56–61, 84–92, 124–130 | 22.60 | 10–20 |
| Ig kappa chain V-I region Rei | P01607 | 108 | | | 32–37, 45–53, 71–77 | 20.40 | |
| Lysozyme C | P00698 | 147 | | | 52–62, 142–147 | 11.60 | |
| Major prion protein PrP | P04156 | 253 | 50–94, 113–135, 188–201, 237–252 | 38.74 | 8–17, 171–176, 178–185, 222–227, 231–235, 240–253 | 19.40 | 240–252 |
| Sup35 | P05453 | 685 | 5–64, 68–113, 130–142, 164–209, 241–253, 398–410 | 27.88 | 9–18, 31–36, 45–56, 69–74, 102–108, 260–266, 278–285, 304–313, 426–445, 471–476, 527–538, 566–571, 584–596 | 20.00 | 9–18, 31–36, 45–56, 69–74, 102–108 |

Proteins were selected from the reference 56.
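The percentage columns of Table 7 follow directly from the listed residue ranges and the sequence length. The short Python sketch below illustrates that arithmetic for the Cystatin-C (P01034) entry; it is an illustration rather than the authors' code, and `covered_positions` and `percent_covered` are hypothetical helper names.

```python
from typing import Iterable, Set, Tuple

Region = Tuple[int, int]  # 1-based, inclusive (start, end) sequence positions


def covered_positions(regions: Iterable[Region]) -> Set[int]:
    """Set of residue positions covered by at least one region."""
    positions: Set[int] = set()
    for start, end in regions:
        positions.update(range(start, end + 1))
    return positions


def percent_covered(regions: Iterable[Region], length: int) -> float:
    """Percentage of the sequence covered by the given regions."""
    return 100.0 * len(covered_positions(regions)) / length


# Cystatin-C (P01034) values as listed in Table 7.
length = 146
lcr = [(2, 33)]
ar = [(10, 20), (56, 61), (84, 92), (124, 130)]

shared = covered_positions(lcr) & covered_positions(ar)
print(round(percent_covered(lcr, length), 2))  # LCR (%)
print(round(percent_covered(ar, length), 2))   # AR (%)
print(min(shared), max(shared))                # extent of the AR/LCR overlap
```

Collecting positions in a set means that residues are counted once even if two listed regions overlap each other, so the percentage cannot exceed 100.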


Conclusion

The current investigation focused on the sequence complexity and content of ARs present in proteins that are partially or fully disordered. The study observed very high sequence complexity in the ARs, and these regions seldom overlapped with the LCRs, which were abundant in the protein sequences. Future investigations may examine experimentally whether a balance between AR and LCR content provides a monomeric disordered protein with sufficient stability to remain in solution. It would be interesting to examine how the spacing of LCRs and ARs, and the swapping of AR positions, influence the energetics of amyloid fiber formation. This would enhance our understanding of why some proteins favor aggregation in a certain environment and may add information about the mechanism of amyloid formation, which is linked to several pathological human disorders.

Supporting Information

Stable distribution function. Details of the statistical distribution function applied to AR/LCR length/content distribution. (DOCX)

DisProt human proteins. Protein name, database IDs and AR/LCR content measured by IUPred are listed. The last two columns in the tables display the number of ARs found within 15 residues of the C- and N-termini of the protein sequence; these are marked as the ‘C’ and ‘N’ columns, respectively. (XLSX)

DisProt nonhuman proteins. Protein name, database IDs and AR/LCR content measured by IUPred are listed. The last two columns in the tables display the number of ARs found within 15 residues of the C- and N-termini of the protein sequence; these are marked as the ‘C’ and ‘N’ columns, respectively. (XLSX)

IDEAL human proteins. Protein name, database IDs and AR/LCR content measured by IUPred are listed. The last two columns in the tables display the number of ARs found within 15 residues of the C- and N-termini of the protein sequence; these are marked as the ‘C’ and ‘N’ columns, respectively. (XLSX)

IDEAL nonhuman proteins. Protein name, database IDs and AR/LCR content measured by IUPred are listed. The last two columns in the tables display the number of ARs found within 15 residues of the C- and N-termini of the protein sequence; these are marked as the ‘C’ and ‘N’ columns, respectively. (XLSX)

The t-test results for the comparison of different groups of proteins. The p-values were obtained using the built-in program in Microsoft Office Excel. (XLSX)

AR sequences and their positions in DisProt proteins. UniProt and DisProt IDs and the AR/LCR percentages in the respective proteins are given. (XLSX)