DiANNA: a web server for disulfide connectivity prediction.

Abstract

Correctly predicting the disulfide bond topology in a protein is of crucial importance for the understanding of protein function and can be of great help for tertiary structure prediction methods. The web server http://clavius.bc.edu/~clotelab/DiANNA/ outputs the disulfide connectivity prediction given a protein sequence as input. The following procedure is performed. First, PSIPRED is run to predict the protein's secondary structure, then PSIBLAST is run against the non-redundant SwissProt to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, cysteine oxidation state is predicted, then each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is performed by means of a novel architecture (diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores in order to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, those of the current state-of-the-art methods.

Year: 2005 PMID: 15980459 PMCID: PMC1160173 DOI: 10.1093/nar/gki412

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Disulfide bonds are covalent bonds between the sulfur atoms of nonadjacent cysteine residues; they stabilize the protein structure and are often found in extracytoplasmic proteins. Knowledge of cysteine connectivity (i.e. which, if any, pairs of cysteines form a bond in a given protein sequence) can greatly reduce the conformational space that protein structure prediction algorithms must search. Moreover, as shown by Chuang and co-workers (1), a similar disulfide connectivity pattern frequently implies structural similarity even when sequence similarity is undetectable. Nevertheless, only a few attempts have been made to solve this problem. In contrast, many methods have been developed for the related but simpler problem of cysteine oxidation state prediction, i.e. determining which cysteines are involved in a disulfide bond without predicting the connectivity pattern. Recent methods based on machine learning techniques have reached an accuracy of 90% on certain test data (2–5). In spite of this, accuracy for the disulfide connectivity problem remains modest. The reason is simple: amino acids that flank half-cystines (disulfide-bonded cysteines) are quite different from those that flank free cysteines (non-bonded cysteines) (6,7), whereas the residues that flank two incorrectly paired half-cystines are quite similar to those that flank the half-cystines in a disulfide bond.
Two recent and remarkable papers based on different approaches (8,9) outperform early attempts by Fariselli and co-workers (10,11). The Vullo and Frasconi method (9) uses recursive neural networks (12) to score undirected graphs that represent cysteine connectivity. The method of Zhao and co-workers (8) is based on recurrent patterns of sequence separation between bonded half-cystines. Web servers that allow online disulfide connectivity prediction are available for the Vullo/Frasconi method and, as a prototype, for the Fariselli/Casadio method.
Here, we describe a web server for disulfide connectivity prediction that implements our novel approach, which achieves results comparable to, and sometimes better than, those of the state-of-the-art methods (8,9). Algorithm details and performance of the method have been described previously by Ferrè and Clote (13).

METHODS

The stand-alone program for disulfide connectivity prediction, implemented in our web server DiANNA (for DiAminoacid Neural Network Application), uses a three-step procedure. First, a neural network is trained to recognize cysteines in an oxidized state (sulfur covalently bonded) as distinct from cysteines in a reduced state (sulfur occurring in the reactive sulfhydryl group SH), based on the previous work by Fariselli et al. (14). Only those monomers that have at least two predicted half-cystines are submitted to the second step. The neural network input is a window of size w centered at each cysteine in the sequence. This first filtering step is called Module A. Then, a second neural network (Module B) is used to score each pair of symmetric windows of size w, each one centered at a cysteine in the input sequence. The network input contains evolutionary information, i.e. each residue is encoded by 20 input units corresponding to the PSIBLAST-computed profile row (obtained from the multiple alignment of the input sequence against the non-redundant SwissProt), and secondary structure information, computed using PSIPRED (15) and encoded in unary format by the addition of three input units (e.g. helix is encoded 1 0 0, coil is 0 1 0 and sheet is 0 0 1). Using secondary structure information leads to a marked improvement and is justified by the bias in the secondary structure preference of free cysteines and half-cystines (16).
The architecture of the Module B neural network is as follows. Given an encoded input containing secondary structure information, thus having w × 23 input units, we designed a first hidden layer containing w (w − 1)/2 units, one for each pair 1 ≤ i < j ≤ w of positions, with connections to the input units representing the profile for the residues at positions i and j and the secondary structures at those positions. Thus, each of the w (w − 1)/2 hidden units in the first hidden layer (the diresidue layer) is connected to 2 (20 + 3) = 46 input units (Figure 1).
A second hidden layer, containing five units, all fully connected with those of the first hidden layer, is then fully connected to the single output unit. We designed this unusual neural network architecture with the aim of emphasizing the signal that arises when using diresidue position-specific scoring matrices (13), i.e. for all windows of length w, for positions 1 ≤ i < j ≤ w and amino acids a, b, we consider the frequency of occurrence of amino acid a in position i when amino acid b is found in position j. Moreover, although there are many hidden units, the training phase is still reasonably fast, since the diresidue layer is not fully connected with the input layer.
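The wiring of the diresidue layer described above can be illustrated with a minimal Python sketch. It is not the DiANNA implementation: the function name `diresidue_wiring`, the 0-based indexing, the assumption that the 23 units of each window position are laid out contiguously, and the window size `w = 5` are all chosen here for illustration. Only the figures 20, 3, 46 and w (w − 1)/2 come from the text.

```python
from itertools import combinations

PROFILE_UNITS = 20  # PSIBLAST profile row per residue (from the text)
SS_UNITS = 3        # unary secondary structure code (from the text)
UNITS_PER_RESIDUE = PROFILE_UNITS + SS_UNITS  # 23 input units per position


def diresidue_wiring(w):
    """Map each position pair (i, j), i < j, to the input indices it sees.

    Positions are 0-based here, whereas the text uses 1 <= i < j <= w.
    Assumes (hypothetically) that the 23 units of position k occupy the
    contiguous input indices k*23 .. k*23 + 22.
    """
    wiring = {}
    for i, j in combinations(range(w), 2):
        inputs_i = range(i * UNITS_PER_RESIDUE, (i + 1) * UNITS_PER_RESIDUE)
        inputs_j = range(j * UNITS_PER_RESIDUE, (j + 1) * UNITS_PER_RESIDUE)
        wiring[(i, j)] = list(inputs_i) + list(inputs_j)
    return wiring


w = 5  # hypothetical window size, used only for this illustration
wiring = diresidue_wiring(w)

hidden_units = len(wiring)                          # w * (w - 1) / 2
sparse_links = sum(len(v) for v in wiring.values())  # 46 per hidden unit
full_links = w * UNITS_PER_RESIDUE * hidden_units    # if fully connected

print(hidden_units, sparse_links, full_links)
```

Comparing `sparse_links` with `full_links` shows why training stays tractable despite the large number of hidden units: each diresidue unit has 46 incoming connections regardless of the window size.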

Figure 1

A toy example of the diresidue neural network architecture. Six input units (named 1, …, 6) are connected to the units of the first hidden layer (7, …, 21), called the diresidue layer. Each pair of input units is connected to a distinct unit in the diresidue layer. The units of the diresidue layer are then fully connected to the five units (22, …, 26) of the second hidden layer, which are fully connected to the single output unit. Using the second hidden layer provided a better performance than connecting the diresidue layer units directly to the output unit. In the DiANNA application, each residue is encoded by 23 input units (20 encoding the evolutionary information and 3 for the secondary structure information); therefore, each unit in the diresidue layer is connected to 23 + 23 = 46 input units that code a pair of residues.

Finally, following Fariselli and Casadio (10), our algorithm applies the Edmonds–Gabow maximum weight matching algorithm (17,18), using Ed Rothberg's implementation wmatch, to the weighted complete graph whose nodes are half-cystines and whose edge weights are the values output by the neural network of Module B. This last step is called Module C.
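To make the matching step concrete, the following Python sketch selects the set of disjoint cysteine pairs with the largest total score. It deliberately does not implement the Edmonds–Gabow algorithm or call wmatch; it substitutes an exhaustive search, which gives the same optimum and is practical only for the small numbers of half-cystines considered here. The function name `best_matching` and all score values are hypothetical; the cysteine positions are those quoted in the Figure 2 caption.

```python
from itertools import combinations


def best_matching(nodes, score):
    """Exhaustive maximum weight matching (a stand-in for Edmonds-Gabow).

    nodes: list of cysteine positions.
    score: dict mapping frozenset({a, b}) to a pair score.
    Returns (total weight, list of chosen pairs).
    """
    if len(nodes) < 2:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    # Option 1: leave the first node unpaired.
    best_total, best_pairs = best_matching(rest, score)
    # Option 2: pair the first node with each remaining node in turn.
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        total, pairs = best_matching(remaining, score)
        total += score[frozenset((first, partner))]
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + pairs
    return best_total, best_pairs


# Cysteine positions from the Figure 2 caption; the scores below are
# invented for illustration and are NOT actual Module B outputs.
cysteines = [6, 16, 33, 44, 58, 72]
scores = {frozenset(p): 0.1 for p in combinations(cysteines, 2)}
scores[frozenset((6, 16))] = 0.95
scores[frozenset((33, 44))] = 0.90
scores[frozenset((58, 72))] = 0.92

total, bonds = best_matching(cysteines, scores)
print(bonds)  # [(6, 16), (33, 44), (58, 72)]
```

For larger graphs a polynomial-time matching algorithm such as the one used by the server is the appropriate choice; the exhaustive search above grows exponentially with the number of cysteines.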

SERVER DESCRIPTION

The web server takes as input a protein sequence in FASTA format and can output the following: (i) oxidation state prediction for all the cysteines in the input sequence, using our implementation of the neural network described in (14) (Module A); (ii) a score for each pair of cysteines in the input, obtained by our diresidue neural network (Module B); (iii) the disulfide connectivity prediction obtained using the maximum weighted matching algorithm (Module C) applied to the scores of Module B. The user is warned if Module A predicts fewer than two half-cystines in the input sequence. A statistical evaluation of the connectivity prediction is not attempted. A sample output is shown in Figure 2.
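As an illustration of the input handling described above, the Python sketch below reads a single FASTA record, locates the cysteines and enumerates the candidate pairs that a pair-scoring step would have to consider. It is a sketch under stated assumptions, not the server's code: the function names are hypothetical, positions are 1-based, and the toy sequence is invented and does not correspond to any real protein.

```python
from itertools import combinations


def read_fasta_sequence(text):
    """Return the sequence of the first record in a FASTA-formatted string."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    parts = []
    for line in lines[1:]:
        if line.startswith(">"):  # stop at the next record, if any
            break
        parts.append(line)
    return "".join(parts).upper()


def cysteine_positions(sequence):
    """1-based positions of all cysteines (letter C) in the sequence."""
    return [i + 1 for i, residue in enumerate(sequence) if residue == "C"]


def candidate_pairs(sequence):
    """All unordered pairs of cysteine positions."""
    return list(combinations(cysteine_positions(sequence), 2))


toy_fasta = ">toy_sequence (invented)\nACDCEFC\nGHC\n"
sequence = read_fasta_sequence(toy_fasta)
positions = cysteine_positions(sequence)
pairs = candidate_pairs(sequence)

if len(positions) < 2:
    print("warning: fewer than two cysteines, no bond can be predicted")
print(positions)   # [2, 4, 7, 10]
print(len(pairs))  # 6
```

Note that the server issues its warning on the basis of the half-cystines predicted by Module A, whereas this sketch simply counts all cysteines in the sequence.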

Figure 2

Output from DiANNA when given as input the sequence for human growth hormone receptor (SwissProt ID GHR_HUMAN, PDB code 1kf9 chain F). This protein has 6 cysteines that form 3 disulfide bonds, with connectivity pattern 1–2, 3–4, 5–6 (between cysteines 6 and 16, 33 and 44, 58 and 72). The upper portion of the output page reports the Module B score (see text) for each pair of cysteines, ranging from 0 to 1 (scores >0.9 are highlighted). In the lower portion, the proposed connectivity (i.e. the Module C output) is shown.

DISCUSSION

Trained and tested on a list of proteins having at most five and at least two bonds, equivalent to those used in (9,11), the software achieves a rate Qp of 49% for perfect predictions (i.e. the fraction of proteins for which there are no false-positive or false-negative predictions made), 86% accuracy and a Matthews' correlation coefficient of 51% (13). For proteins having two and four bonds, the fraction of perfect predictions improves to 62 and 55%, respectively. Although further improvement for disulfide connectivity prediction is still desired, our approach is nonetheless reliable when used on proteins having a relatively small number of disulfide bonds.
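The three performance measures quoted above can be computed as in the following Python sketch. The definition of Qp follows the text. Counting true and false positives and negatives over all cysteine pairs of each protein is an assumption made here, since the text does not spell out how accuracy and the Matthews' correlation coefficient are tallied. The function name `evaluate` and the two toy proteins are hypothetical.

```python
from itertools import combinations
from math import sqrt


def evaluate(proteins):
    """Compute Qp, accuracy and MCC.

    proteins: list of (cysteines, true_bonds, predicted_bonds) tuples.
    Pair-level counting over all cysteine pairs is an assumption.
    """
    tp = fp = tn = fn = perfect = 0
    for cysteines, true_bonds, predicted_bonds in proteins:
        true_set = {frozenset(b) for b in true_bonds}
        pred_set = {frozenset(b) for b in predicted_bonds}
        if true_set == pred_set:
            perfect += 1  # no false positives and no false negatives
        for pair in combinations(cysteines, 2):
            key = frozenset(pair)
            in_true, in_pred = key in true_set, key in pred_set
            if in_true and in_pred:
                tp += 1
            elif in_pred:
                fp += 1
            elif in_true:
                fn += 1
            else:
                tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "Qp": perfect / len(proteins),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Two invented proteins with four cysteines and two bonds each:
# the first is predicted perfectly, the second entirely wrongly.
toy = [
    ([1, 2, 3, 4], [(1, 2), (3, 4)], [(1, 2), (3, 4)]),
    ([1, 2, 3, 4], [(1, 2), (3, 4)], [(1, 3), (2, 4)]),
]
metrics = evaluate(toy)
print(metrics["Qp"])   # 0.5
print(metrics["mcc"])  # 0.25
```

The toy example also shows why Qp is the strictest of the three measures: a protein with a single misplaced bond contributes nothing to Qp, while many of its cysteine pairs still count as correct for the pair-level accuracy.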

15 in total

1. Protein secondary structure prediction based on position-specific scoring matrices.

Authors: D T Jones
Journal: J Mol Biol Date: 1999-09-17 Impact factor: 5.469

2. Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins.

Authors: P Fariselli; P Riccobelli; R Casadio
Journal: Proteins Date: 1999-08-15

3. Prediction of disulfide connectivity in proteins.

Authors: P Fariselli; R Casadio
Journal: Bioinformatics Date: 2001-10 Impact factor: 6.937

4. A general framework for adaptive processing of data structures.

Authors: P Frasconi; M Gori; A Sperduti
Journal: IEEE Trans Neural Netw Date: 1998

5. Relationship between protein structures and disulfide-bonding patterns.

Authors: Chao-Chun Chuang; Chun-Yin Chen; Jinn-Moon Yang; Ping-Chiang Lyu; Jenn-Kang Hwang
Journal: Proteins Date: 2003-10-01

6. Different sequence environments of cysteines and half cystines in proteins. Application to predict disulfide forming residues.

Authors: A Fiser; M Cserzö; E Tüdös; I Simon
Journal: FEBS Lett Date: 1992-05-11 Impact factor: 4.124

7. Cysteine separations profiles on protein sequences infer disulfide connectivity.

Authors: East Zhao; Hsuan-Liang Liu; Chi-Hung Tsai; Huai-Kuang Tsai; Chen-hsiung Chan; Cheng-Yan Kao
Journal: Bioinformatics Date: 2004-12-07 Impact factor: 6.937

8. Disulfide connectivity prediction using secondary structure information and diresidue frequencies.

Authors: F Ferrè; P Clote
Journal: Bioinformatics Date: 2005-03-01 Impact factor: 6.937

9. Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences.

Authors: Yu-Ching Chen; Yeong-Shin Lin; Chih-Jen Lin; Jenn-Kang Hwang
Journal: Proteins Date: 2004-06-01

10. Disulfide connectivity prediction using recursive neural networks and evolutionary information.

Authors: Alessandro Vullo; Paolo Frasconi
Journal: Bioinformatics Date: 2004-01-22 Impact factor: 6.937
