NetCSSP: web application for predicting chameleon sequences and amyloid fibril formation.

Changsik Kim, Jiwon Choi, Seong Joon Lee, William J Welsh, Sukjoon Yoon.

Abstract

The calculation of contact-dependent secondary structure propensity (CSSP) is a unique and sensitive method that detects non-native secondary structure propensities in protein sequences. This method has applications in predicting local conformational change, which is typically observed in the core sequences of protein aggregation and amyloid fibril formation. NetCSSP implements the latest version of the CSSP algorithm and provides a Flash chart-based graphic interface that enables interactive calculation of CSSP values for any user-selected region in a given protein sequence. This feature can also quantitatively estimate the effect of mutations on native or non-native secondary structure propensities in local sequences. In addition, this web tool provides precalculated non-native secondary structure propensities for over 1,400,000 fragments, each seven residues long, collected from PDB structures. They are searchable for chameleon subsequences that can serve as the core of amyloid fibril formation. The NetCSSP web tool is available at http://cssp2.sookmyung.ac.kr/.

Year:  2009        PMID: 19468045      PMCID: PMC2703942          DOI: 10.1093/nar/gkp351

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The sequence potential for non-native β-strand formation and the presence of chameleon sequences have been investigated extensively because such structural features are implicated in the induction of fatal amyloid-related diseases (1–3). Our previous studies have shown that the α-helix and β-strand share similar sequence contexts and that the tertiary interaction is an important determinant of local secondary structure formation (4,5). Conventional secondary structure prediction methods, however, rely heavily on the intrinsic propensity of local sequences (6,7), and consequently they are not sensitive enough to predict non-native secondary structure formation. Thus, we have developed a computational method that quantifies the influence of tertiary interaction on secondary structural preference (4). Artificial neural network (ANN)-based algorithms that combine preparameterized tertiary interactions with sequence inputs from users are designed to predict contact-dependent secondary structure propensities (CSSPs) (5,8).
Many attempts have been made to predict the aggregation-prone or amyloidogenic regions in protein sequences. The role of the physico-chemical properties of amino acids in determining the aggregation rate of a given sequence was investigated (9–11), and an optimal combination of these properties provided a predictor, Zyggregator (12). Aggregation-prone fragments of amino-acid sequences were also predicted by using a statistical mechanics algorithm, TANGO (13). More recently, Trovato and his colleagues developed residue-based potentials for forming parallel or anti-parallel beta structure and used them to predict the core of amyloids (14,15). Structural features of core amyloid (16) have also been considered to evaluate amyloid fibril formation.
The aggregation propensity of an amino acid inserted in the middle of the β-amyloid sequence has been experimentally investigated and parameterized to predict the amyloidogenic propensities of other peptides (17). Despite these efforts, however, no rapid sequence-based methods have been reported to predict non-native secondary structure propensity in a globular protein and to pinpoint the aggregation-prone core sequences of amyloid fibril formation. Thus, CSSP algorithms were proposed to evaluate the secondary structure propensity of a local sub-sequence in terms of tertiary interaction energies (4,5,8). The CSSP methods, which adopt a fast machine learning algorithm, allow non-native secondary structure propensity in local sequences to be systematically evaluated with a step-wise increase of tertiary interaction energies. The trained single ANN exhibits 74% accuracy in predicting the native secondary structure of test sequences in their native tertiary interaction energy state, and the dual ANN-based predictor has an 83% accuracy (440 884 SCOP20 fragments were used for the training, while 22 707 exclusive fragments from unique fold proteins were used for the tests) (8). In order to investigate the ability of NetCSSP to predict amyloid fibril formation, we also retrieved two test sets of amino-acid fragments with experimental aggregation data from the literature (13), and tested how well the output of the single ANN predicts aggregation-prone sequences (Figure 1). CSSP methods predict the secondary structure propensity for the center residue in a seven-residue sequence. Therefore, we selected fragments of ≥10 amino acids in length from the original test sets in order to obtain CSSP values for at least four residues in the middle. The sequence potential for aggregation was calculated from CSSP-derived P(helix), P(β) and P(coil) values in the form ln(P(β)/[P(helix) × P(coil)]) (5).
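The score and the window arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the NetCSSP code: the function names and the example propensities are assumptions, and only the score form and the seven-residue window come from the text.

```python
import math

WINDOW = 7  # CSSP is predicted for the center residue of a seven-residue window


def aggregation_potential(p_helix, p_beta, p_coil):
    """Score in the form quoted in the text: ln(P(beta) / [P(helix) * P(coil)])."""
    return math.log(p_beta / (p_helix * p_coil))


def scorable_centers(sequence, window=WINDOW):
    """0-based indices of residues that sit at the center of a full window."""
    half = window // 2
    return list(range(half, len(sequence) - half))


# Hypothetical propensities for a single residue (not NetCSSP output).
score = aggregation_potential(p_helix=0.25, p_beta=0.50, p_coil=0.25)

# A 10-residue fragment leaves four central residues with a full window.
centers = scorable_centers("A" * 10)
```

With a seven-residue window, a fragment of length n has n − 6 scorable center residues, which is why a minimum length of 10 yields at least four CSSP values.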
The ROC plots in Figure 1 represent the sensitivity and specificity of the CSSP method in predicting the aggregation-prone sequences. Considering that the single ANN exhibits 74% accuracy in predicting secondary structures, the observed accuracy for predicting aggregation-prone fragments (i.e. AUC of 0.77 and 0.88 for test sets 1 and 2, respectively) indicates that the CSSP provides an effective measure of the aggregation propensity. In our previous studies, we have already shown that CSSP methods can pinpoint the core of amyloid fibrils in many sequences (4,5,8). In addition, calculated CSSPs were shown to have a quantitative correlation with the aggregation rate of test fragments (5). All of these validation data are presented on the ‘Intro’ page of the NetCSSP website.
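For readers unfamiliar with the AUC values quoted above, the following sketch shows one standard way to compute an AUC from scores and binary labels: the fraction of (aggregating, non-aggregating) pairs in which the aggregating fragment receives the higher score. The scores and labels are made up for illustration, and the source does not state how its ROC curves were computed.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical aggregation scores; label 1 = aggregates, 0 = does not.
example_scores = [2.0, 0.5, 1.0, 0.2]
example_labels = [1, 1, 0, 0]
auc = roc_auc(example_scores, example_labels)
```

An AUC of 0.5 corresponds to random ordering and 1.0 to perfect prioritization of the aggregating fragments.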
Figure 1.

ROC plot validation of the NetCSSP algorithm on two data sets. The sequence potential for aggregation was calculated from CSSP-derived P(helix), P(β) and P(coil) values in the form ln(P(β)/[P(helix) × P(coil)]) (5). Test set 1 includes a total of 104 fragments of ≥10 amino acids in length, and test set 2 includes 70 fragments of 10 amino acids in length. Both test sets were retrieved from the literature (13). ROC plots represent the prioritization of aggregation-prone sequences over non-aggregates based on the CSSP values. AUC (Area Under the Curve) is in the range of 0–1 and represents the predictive power of the method.

We integrated these single and dual ANN methods into the current NetCSSP with a user-friendly web interface (Figure 2). Because it returns CSSP profiles quickly, NetCSSP can be used for very long sequences or various combinations of amino-acid substitutions at particular sites. The Flash chart-based interface enables the interactive calculation of CSSP values for any user-selected region in a given protein sequence. In addition, it compares experimental native secondary structures and predicted CSSPs when a PDB structure is submitted to the server. A third-party validation has also been reported and demonstrates that the CSSP calculation uniquely reveals local changes in β-strand propensity caused by mutations (18). This web tool also provides precalculated CSSPs with native secondary structure information for over 1 400 000 fragments, each seven residues long, collected from SCOP90 domains. The collection is searchable for comparative evaluation of native and non-native secondary structure propensities and thus helps predict amyloidogenic or chameleon sequences. We believe that the current NetCSSP is a unique tool for systematically predicting the sequence potential for local secondary structural changes. It has applications in protein engineering in addition to studies of amyloid fibrils.
Figure 2.

Workflow of CSSP calculation in the NetCSSP web server. Only sequence information is required for the CSSP calculation. When the 3D structure (in PDB format) is submitted, the predicted CSSP will be displayed in comparison with the native secondary structure information.


NetCSSP PROFILE

NetCSSP provides a simple user interface to load input sequences. When a 3D structure file is loaded (in PDB format), the server automatically extracts sequence information for CSSP calculation and also runs the DSSP (Dictionary of Protein Secondary Structure) program (19) to define native secondary structures (Figure 2). For CSSP calculation, the selected ANN runs multiple times with stepwise increases in preparameterized tertiary interaction energies [see ref. (8) for details]. A typical output of the single network-based calculation for a 3D structure input file of horse myoglobin (PDB ID: 1DWRa) shows the CSSP profiles and native secondary structure together (Figure 3). It provides the residue-based profile of secondary structure propensities at diverse tertiary interaction energies. Thus, users can intuitively identify the potential amyloidogenic subsequences from the CSSP profile. In the present display, the entire myoglobin sequence adopts a primarily helical conformation in the native structure. The myoglobin sequence has been reported to form amyloid fibrils by switching its helices to β-strand conformations (2). In particular, the N-terminal region (1–29) is known to have a high propensity for β-aggregation (20). The CSSP profile in Figure 3 shows high helical and beta propensities that are consistent with the previous experimental observation. The propensity for each of three secondary structure elements, helix, β-strand and coil, is calculated at 20 different levels of (i, i ± 4) interaction energy. Most of the N-terminal regions show both strong native helical propensity at high (i, i ± 4) interaction energies and non-native beta propensity at low (i, i ± 4) interaction energies.
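The pattern described above (helix favored at high (i, i ± 4) interaction energy, β-strand favored at low energy) can be expressed as a simple check over a residue's 20-level profile. The sketch below is an assumption-laden illustration: the data layout, the low-to-high ordering of levels, the half-and-half split and the synthetic numbers are all hypothetical and are not taken from NetCSSP.

```python
N_LEVELS = 20  # the text states that propensities are computed at 20 energy levels


def dominant_state(p_helix, p_beta, p_coil):
    """Return 'H', 'E' or 'C' for the largest of the three propensities."""
    return max((("H", p_helix), ("E", p_beta), ("C", p_coil)), key=lambda kv: kv[1])[0]


def looks_ambivalent(levels):
    """levels: 20 (p_helix, p_beta, p_coil) tuples, assumed ordered low -> high energy.

    True when beta dominates somewhere in the lower half of the energy range
    and helix dominates somewhere in the upper half.
    """
    if len(levels) != N_LEVELS:
        raise ValueError("expected one propensity triple per energy level")
    half = N_LEVELS // 2
    low = [dominant_state(*lv) for lv in levels[:half]]
    high = [dominant_state(*lv) for lv in levels[half:]]
    return "E" in low and "H" in high


# Synthetic residue: helix propensity rises and beta propensity falls with energy.
switching = []
for k in range(N_LEVELS):
    p_h = 0.10 + 0.03 * k
    p_b = 0.60 - 0.025 * k
    switching.append((p_h, p_b, 1.0 - p_h - p_b))

# Synthetic residue that is helical at every level.
always_helix = [(0.7, 0.1, 0.2)] * N_LEVELS
```

Here `looks_ambivalent(switching)` is true and `looks_ambivalent(always_helix)` is false; a real analysis would read the per-level values from the server output rather than generate them.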
Figure 3.

Output of the single ANN mode NetCSSP profile of horse myoglobin (PDB ID: 1DWR). Only the N-terminal region (sequence 4–53) is displayed. The native helical conformation is displayed in red bars. The CSSP is predicted at 20 different energy steps of (i, i ± 4) interaction for helical, beta and coil propensities. The bottom diagram shows the sum of energy step-wise CSSPs. The additive CSSP values for the entire sequence and the residue-average values are given in the upper panel. One can also interactively calculate the CSSPs for any user-specified residues and energy steps. The light pink box shows the CSSP values for the seventh residue, W, at an intermediate (i, i ± 4) energy level. The blue-shaded region represents a selection of the 25-GQEVLI-30 sub-sequence, and its CSSPs are presented in the upper panel.

Users can also quantitatively analyze the CSSP profile interactively. For example, Figure 3 shows a selection, 25-GQEVLI-30, which is included in the highly amyloidogenic N-terminal sequence of horse myoglobin. NetCSSP returns detailed CSSP values for the selected sequence in the upper panel. GQEVLI shows similar propensities to form a helix and a β-strand [i.e. P(helix) = 0.37 and P(β) = 0.34], although the entire myoglobin sequence shows a higher helical propensity than β-strand propensity [i.e. P(helix) = 0.320, P(β) = 0.229]. The diagram at the bottom of Figure 3 shows the residue-based sum of CSSPs, which clearly shows that GQEVLI is a potential hot spot for accelerating amyloid fibril formation.
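The residue-average values reported for a selected region can be pictured as a plain mean over the selected residues. The sketch below assumes that reading; the source does not spell out the exact averaging, and the per-residue numbers and function name are hypothetical.

```python
def region_average(profile, first, last):
    """Mean (P(helix), P(beta), P(coil)) over residues first..last.

    `profile` maps 1-based residue numbers to propensity triples; the range is
    inclusive at both ends, matching labels such as 25-GQEVLI-30.
    """
    rows = [profile[i] for i in range(first, last + 1)]
    return tuple(sum(row[k] for row in rows) / len(rows) for k in range(3))


# Hypothetical per-residue propensities (not NetCSSP output).
toy_profile = {
    1: (0.5, 0.3, 0.2),
    2: (0.3, 0.5, 0.2),
    3: (0.4, 0.4, 0.2),
}
avg_helix, avg_beta, avg_coil = region_average(toy_profile, 1, 2)
```

Comparing such a regional average with the whole-sequence average is what lets a selection like GQEVLI stand out against the rest of the protein.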

CHAMELEON SEQUENCES

Non-native secondary structure propensities that are predicted from CSSP profiles can be directly confirmed by searching for chameleon sequences, a function that is also provided by the present NetCSSP server. For example, the search output for the GQEVLI query that is selected in Figure 3 shows that GQEVL and QEVL are found in both helical and beta contexts in the native structures of various proteins (Table 1). This search result implies that the GQEVLI sequence in myoglobin can form non-native beta conformations by altering tertiary interactions during the course of amyloid fibril formation.
Table 1.

Search of chameleon sequences

Sequence  Secondary structure  PDB   Chain  SCOP       CSSP^a (for native structure)  Non-native P(helix)  Non-native P(β)
GQEVLLT   CCEEEEE              1o89  A      b.35.1.2   0.48                           0.3                  -
QEVLLVQ   HHHHHHH              1a8o  -      a.28.3.1   0.43                           -                    0.33
QEVLLWL   HHHHHHH              1csh  -      a.103.1.1  0.46                           -                    0.36
TLAQEVL   HHHHHHH              1e1o  A      d.104.1.1  0.52                           -                    0.23
AQEVLLA   EEEEEEE              1exs  A      b.60.1.1   0.28                           0.51                 -
KPIQEVL   CCHHHHH              1het  A      c.2.1.1    0.55                           -                    0.21
QEVLKSI   HHHHHHH              1mg7  A      d.14.1.6   0.53                           -                    0.22
NLQEVLG   CCCEEEC              1n3l  A      c.26.1.1   0.33                           0.41                 -
LQEVLNT   HHHHHHH              1odf  A      c.37.1.6   0.55                           -                    0.24
QEVLLPR   CEEEECC              1ojq  A      d.166.1.1  0.51                           0.19                 -
AHQEVLF   EEEEEEE              1p9l  A      d.81.1.3   0.31                           0.34                 -
IQEVLEV   HHHHHCC              1qgu  B      c.92.2.3   0.53                           -                    0.34
QEVLETM   HHHHHHH              1tml  -      c.6.1.1    0.58                           -                    0.27

The subsequence (GQEVLI) in the shaded box in Figure 3 has both strong helical and beta propensities. Searching the fragment database, which includes precalculated CSSP values, shows that GQEVL and QEVL are found in both helical and beta contexts in various native proteins. The native secondary structure is represented by C (coil), E (extended β) and H (helix).

^a CSSP represents the calculated propensity for the native secondary structure for a seven-residue sequence. For example, when a residue adopts ‘coil’ for the native structure, P(coil) of calculated CSSPs was selected.

This chameleon sequence database includes 1 424 079 seven-residue fragments that are extracted from 2339 unique fold SCOP20 domains. The native 3D structure of each fragment was obtained from the PDB structure, and the CSSP was calculated within the context of the complete 3D structure of the protein. User-defined query sequences can include up to seven residues. By searching this database, users can directly compare the experimental native secondary structures with the calculated CSSPs for native and non-native secondary structures. Table 1 shows that GQEVL and QEVL are found in both helical and beta contexts in the native structure. Consistently, the calculated CSSPs show that they have similar propensities for native and non-native secondary structures in many different protein contexts. One can also search the database using a cutoff for CSSP values. Table 2 shows search outputs of the top-five lists for the highest non-native helical and beta propensities when the database is searched using cutoffs for non-native P(helix) and non-native P(β). Complete information on the seven-residue sequences, PDB ID, SCOP ID, native secondary structures and CSSP values is available. The information in the chameleon database is useful for designing new ambivalent or non-ambivalent peptide sequences, as well as identifying amyloidogenic chameleon subsequences in a given protein.
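The kind of lookup behind Table 1 can be sketched as a substring search over seven-residue fragments, followed by a check of the native secondary structure across the matched residues. The fragments below are transcribed from Table 1; the function names and the majority-state rule for classifying a hit are assumptions made for illustration and are not the server's documented logic.

```python
from collections import Counter

# (sequence, native secondary structure, PDB ID): a few rows from Table 1.
FRAGMENTS = [
    ("GQEVLLT", "CCEEEEE", "1o89"),
    ("QEVLLVQ", "HHHHHHH", "1a8o"),
    ("AQEVLLA", "EEEEEEE", "1exs"),
    ("KPIQEVL", "CCHHHHH", "1het"),
    ("NLQEVLG", "CCCEEEC", "1n3l"),
]


def contexts(query, fragments=FRAGMENTS):
    """Map each native state ('H', 'E', 'C') to the PDB IDs where `query` adopts it.

    A hit is assigned the most common state over the matched residues
    (an illustrative rule, not necessarily what NetCSSP uses).
    """
    if not 1 <= len(query) <= 7:
        raise ValueError("query must contain between one and seven residues")
    found = {}
    for sequence, structure, pdb_id in fragments:
        start = sequence.find(query)
        if start < 0:
            continue
        matched = structure[start:start + len(query)]
        state = Counter(matched).most_common(1)[0][0]
        found.setdefault(state, []).append(pdb_id)
    return found


def is_chameleon(query, fragments=FRAGMENTS):
    """True when the query occurs in both helical and beta native contexts."""
    seen = contexts(query, fragments)
    return "H" in seen and "E" in seen


qevl_contexts = contexts("QEVL")
```

With these five rows, QEVL appears in β contexts (1o89, 1exs, 1n3l) and in helical contexts (1a8o, 1het), so `is_chameleon("QEVL")` is true.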
Table 2.

Output of search of chameleon sequences with the highest non-native P(helix) and non-native P(β) values

Sequence  Secondary structure  PDB   Chain  SCOP       CSSP (for native structure)  Non-native P(helix)  Non-native P(β)  Relative P(helix)  Relative P(β)
LRRARAA   CCCCCCC              1cer  O      d.81.1.1   0.18                         0.69                 0.13             3.93               0.73
KQMLAKA   CCCCCCC              1goj  A      c.37.1.9   0.13                         0.68                 0.18             5.10               1.33
QEQLEKA   CCCCCCC              1gx5  A      e.8.1.4    0.17                         0.68                 0.13             3.89               0.77
AKEAAQK   CCCCCCC              1g9l  A      a.144.1.1  0.2                          0.68                 0.1              3.50               0.53
ARAQARQ   CCCEEEE              1omh  A      d.89.1.5   0.14                         0.68                 -                4.77               -
AVIVVFD   CCCCCCC              1bgx  T      c.120.1.2  0.15                         0.22                 0.58             1.45               3.80
VTVTVFD   CCCCCCC              1eu1  A      b.52.2.2   0.3                          0.12                 0.57             0.41               1.88
VFEVNIR   HHHHHHH              1nxc  A      a.102.2.1  0.23                         -                    0.57             -                  2.52
VYWFTVE   HHHCCCC              1toh  -      d.178.1.1  0.22                         -                    0.57             -                  2.54
VYVVFSV   CCCCCCC              1vho  A      c.56.5.4   0.16                         0.23                 0.57             1.45               3.52

By searching the fragment DB, one can quantitatively analyze non-native secondary structure propensities in comparison with native secondary structure patterns.
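A cutoff search of the kind that produces Table 2 amounts to filtering fragments on one non-native propensity and keeping the highest-scoring hits. The sketch below uses a handful of rows transcribed from Table 2, with `None` standing for an empty cell; the function name, the column constants and the tuple layout are illustrative assumptions.

```python
# (sequence, native SS, CSSP for native structure,
#  non-native P(helix), non-native P(beta)); None marks an empty cell.
ROWS = [
    ("LRRARAA", "CCCCCCC", 0.18, 0.69, 0.13),
    ("ARAQARQ", "CCCEEEE", 0.14, 0.68, None),
    ("AVIVVFD", "CCCCCCC", 0.15, 0.22, 0.58),
    ("VTVTVFD", "CCCCCCC", 0.30, 0.12, 0.57),
    ("VFEVNIR", "HHHHHHH", 0.23, None, 0.57),
]
HELIX, BETA = 3, 4  # tuple positions of the two non-native propensity columns


def search_by_cutoff(rows, column, cutoff, top=5):
    """Rows whose value in `column` is at least `cutoff`, highest first."""
    hits = [r for r in rows if r[column] is not None and r[column] >= cutoff]
    hits.sort(key=lambda r: r[column], reverse=True)  # stable: ties keep input order
    return hits[:top]


top_beta = [r[0] for r in search_by_cutoff(ROWS, BETA, cutoff=0.5)]
top_helix = [r[0] for r in search_by_cutoff(ROWS, HELIX, cutoff=0.6)]
```

On these rows the β search returns AVIVVFD, VTVTVFD and VFEVNIR, and the helix search returns LRRARAA and ARAQARQ.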


FUNDING

Basic Research Program of the Korea Science & Engineering Foundation [grant No. R01-2006-000-10515-0]; SRC program of MOST/KOSEF (Research Center for Women's Diseases); Korea Research Foundation Grant funded by the Korean Government (MOEHRD) [KRF-2006-311-C00582]; grant from the KRIBB Research Initiative Program. Funding for open access charge: SRC program of MOST/KOSEF (Research Center for Women's Diseases). Conflict of interest statement. None declared.
REFERENCES (10 of 20 listed)

1.  Amyloid fibrils from muscle myoglobin.

Authors:  M Fändrich; M A Fletcher; C M Dobson
Journal:  Nature       Date:  2001-03-08       Impact factor: 49.962

2.  Rationalization of the effects of mutations on peptide and protein aggregation rates.

Authors:  Fabrizio Chiti; Massimo Stefani; Niccolò Taddei; Giampietro Ramponi; Christopher M Dobson
Journal:  Nature       Date:  2003-08-14       Impact factor: 49.962

3.  Prediction of the absolute aggregation rates of amyloidogenic polypeptide chains.

Authors:  Kateri F DuBay; Amol P Pawar; Fabrizio Chiti; Jesús Zurdo; Christopher M Dobson; Michele Vendruscolo
Journal:  J Mol Biol       Date:  2004-08-27       Impact factor: 5.469

4.  Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins.

Authors:  Ana-Maria Fernandez-Escamilla; Frederic Rousseau; Joost Schymkowitz; Luis Serrano
Journal:  Nat Biotechnol       Date:  2004-09-12       Impact factor: 54.908

Review 5.  The Zyggregator method for predicting protein aggregation propensities.

Authors:  Gian Gaetano Tartaglia; Michele Vendruscolo
Journal:  Chem Soc Rev       Date:  2008-05-27       Impact factor: 54.564

6.  PHD: predicting one-dimensional protein structure by profile-based neural networks.

Authors:  B Rost
Journal:  Methods Enzymol       Date:  1996       Impact factor: 1.600

7.  Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features.

Authors:  W Kabsch; C Sander
Journal:  Biopolymers       Date:  1983-12       Impact factor: 2.505

8.  Rapid assessment of contact-dependent secondary structure propensity: relevance to amyloidogenic sequences.

Authors:  Sukjoon Yoon; William J Welsh
Journal:  Proteins       Date:  2005-07-01

Review 9.  Therapeutic strategies for human amyloid diseases.

Authors:  James C Sacchettini; Jeffery W Kelly
Journal:  Nat Rev Drug Discov       Date:  2002-04       Impact factor: 84.694

10.  Detecting hidden sequence propensity for amyloid fibril formation.

Authors:  Sukjoon Yoon; William J Welsh
Journal:  Protein Sci       Date:  2004-08       Impact factor: 6.725
