Abstract
Correctly predicting the disulfide bond topology in a protein is of crucial importance for understanding protein function and can be of great help to tertiary structure prediction methods. The web server http://clavius.bc.edu/~clotelab/DiANNA/ outputs a disulfide connectivity prediction given a protein sequence as input. The following procedure is performed. First, PSIPRED is run to predict the protein's secondary structure; then PSIBLAST is run against the non-redundant SwissProt database to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, the cysteine oxidation state is predicted, and each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is performed by means of a novel architecture (diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores in order to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, those of current state-of-the-art methods.
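The final step turns the per-pair scores into a connectivity by choosing a set of disjoint cysteine pairs with maximum total score. The sketch below is not Gabow's algorithm or Rothberg's implementation; it swaps in a plain exhaustive search, memoized over the set of still-unassigned cysteines, which reaches the same optimum as a maximum weighted matching but runs in exponential time, so it is only meant for the handful of cysteines in a typical chain. The function name `best_connectivity`, the 0-based cysteine indices and the score dictionary are hypothetical choices for illustration.

```python
from functools import lru_cache
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(n: int, scores: Dict[Pair, float]) -> Tuple[float, List[Pair]]:
    """Highest-scoring set of disjoint cysteine pairs (exhaustive search).

    Cysteines are numbered 0..n-1; `scores[(i, j)]` with i < j is the
    predicted bonding score of that pair; missing pairs count as 0.
    Exponential in n: an illustration, not Gabow's algorithm.
    """

    def weight(i: int, j: int) -> float:
        return scores.get((min(i, j), max(i, j)), 0.0)

    @lru_cache(maxsize=None)
    def solve(remaining: FrozenSet[int]) -> Tuple[float, Tuple[Pair, ...]]:
        if len(remaining) < 2:
            return 0.0, ()
        first = min(remaining)
        rest = remaining - {first}
        # Option 1: leave the lowest-numbered cysteine unpaired.
        best = solve(rest)
        # Option 2: bond it to each other unassigned cysteine in turn.
        for j in sorted(rest):
            sub_total, sub_pairs = solve(rest - {j})
            total = weight(first, j) + sub_total
            if total > best[0]:
                best = (total, ((first, j),) + sub_pairs)
        return best

    total, pairs = solve(frozenset(range(n)))
    return total, list(pairs)


if __name__ == "__main__":
    # Made-up scores for four cysteines.
    scores = {(0, 1): 0.9, (2, 3): 0.8, (0, 2): 0.95,
              (1, 3): 0.1, (0, 3): 0.2, (1, 2): 0.3}
    total, pairs = best_connectivity(4, scores)
    print(pairs)  # [(0, 1), (2, 3)]
```

With these made-up scores, greedily taking the single best pair (0, 2) would force (1, 3) for a total of 1.05, whereas the matching selects (0, 1) and (2, 3) for a total of 1.7. This illustrates why a global matching, rather than per-pair thresholding, is applied to the scores.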
Year: 2005 PMID: 15980459 PMCID: PMC1160173 DOI: 10.1093/nar/gki412
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A toy example of the diresidue neural network architecture. Six input units (numbered 1, …, 6) are connected to the units of the first hidden layer (7, …, 21), called the diresidue layer. Each of the 15 possible pairs of input units is connected to a distinct unit in the diresidue layer. The units of the diresidue layer are fully connected to the five units (22, …, 26) of the second hidden layer, which in turn are fully connected to the single output unit. Using the second hidden layer gave better performance than connecting the diresidue layer units directly to the output unit. In the DiANNA application, each residue is encoded by 23 input units (20 encoding evolutionary information and 3 encoding secondary structure information); therefore, each unit in the diresidue layer is connected to the 23 + 23 = 46 input units that code a pair of residues.
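The wiring described in Figure 1 can be made concrete with a small forward pass. The sketch below is a minimal pure-Python reading of the caption under stated assumptions: sigmoid activations, a bias on every unit, random untrained weights, and a one-hot H/E/C encoding of the three secondary-structure inputs appended to the 20 profile values of each residue. The names `DiresidueNet` and `encode_residue` are hypothetical, and training is not shown.

```python
import itertools
import math
import random
from typing import List, Sequence

SS_STATES = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def encode_residue(profile_row: Sequence[float], ss: str) -> List[float]:
    """23 inputs per residue: 20 profile values + one-hot secondary structure (assumed)."""
    if len(profile_row) != 20:
        raise ValueError("expected 20 profile values")
    return [float(v) for v in profile_row] + [1.0 if ss == s else 0.0 for s in SS_STATES]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class DiresidueNet:
    """Input groups -> one unit per pair of groups -> hidden layer -> one output.

    A 'group' is the block of inputs for one residue: 1 input in the toy
    Figure 1, 23 inputs in DiANNA's encoding.
    """

    def __init__(self, n_groups: int, group_size: int, n_hidden: int = 5, seed: int = 0):
        rng = random.Random(seed)
        self.n_groups = n_groups
        self.group_size = group_size
        # One diresidue unit for every unordered pair of groups.
        self.pairs = list(itertools.combinations(range(n_groups), 2))

        def layer(n_units: int, fan_in: int) -> List[List[float]]:
            # Random untrained weights; the last entry of each row is the bias.
            return [[rng.uniform(-0.5, 0.5) for _ in range(fan_in + 1)] for _ in range(n_units)]

        self.w_pair = layer(len(self.pairs), 2 * group_size)  # sees only its own pair
        self.w_hidden = layer(n_hidden, len(self.pairs))  # fully connected
        self.w_out = layer(1, n_hidden)[0]  # single output unit

    @staticmethod
    def _unit(weights: Sequence[float], inputs: Sequence[float]) -> float:
        # zip stops before the bias, which is added separately.
        return sigmoid(sum(w * v for w, v in zip(weights, inputs)) + weights[-1])

    def forward(self, x: Sequence[float]) -> float:
        g = self.group_size
        if len(x) != self.n_groups * g:
            raise ValueError("input length must be n_groups * group_size")
        groups = [list(x[i * g:(i + 1) * g]) for i in range(self.n_groups)]
        diresidue = [self._unit(w, groups[a] + groups[b])
                     for w, (a, b) in zip(self.w_pair, self.pairs)]
        hidden = [self._unit(w, diresidue) for w in self.w_hidden]
        return self._unit(self.w_out, hidden)


if __name__ == "__main__":
    toy = DiresidueNet(n_groups=6, group_size=1)  # Figure 1: 6 inputs
    print(len(toy.pairs))  # 15 diresidue units (7, ..., 21)
    x = encode_residue([0.05] * 20, "E") + encode_residue([0.05] * 20, "C")
    pair_net = DiresidueNet(n_groups=2, group_size=23)
    print(len(x), pair_net.forward(x))  # 46 inputs, a score in (0, 1)
```

The key design point is that each diresidue unit sees only the inputs of its own pair of residues, so the first hidden layer is sparse, while the later layers are dense.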
Figure 2. Output from DiANNA when given as input the sequence of the human growth hormone receptor (SwissProt ID GHR_HUMAN, PDB code 1kf9 chain F). This protein has 6 cysteines that form 3 disulfide bonds, with connectivity pattern 1–2, 3–4, 5–6 (bonds between the cysteines at positions 6 and 16, 33 and 44, and 58 and 72). The upper portion of the output page reports the Module B score (see text) for each pair of cysteines; scores range from 0 to 1, and those >0.9 are highlighted. The lower portion shows the proposed connectivity (i.e. the Module C output).
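Figure 2 gives the bonds both by residue position and as a pattern over cysteine ranks. The short sketch below, with hypothetical helper names, converts the former into the latter and counts how many complete pairings exist for an even number n of cysteines, (n - 1)!! = (n - 1)(n - 3)...1. For the six cysteines of GHR_HUMAN this gives 15 candidate connectivity patterns, among which the matching step must choose. The code writes an ASCII hyphen where the caption uses an en dash.

```python
from typing import List, Tuple


def connectivity_pattern(cys_positions: List[int], bonds: List[Tuple[int, int]]) -> str:
    """Rewrite bonds given as residue positions as 1-based cysteine ranks."""
    rank = {pos: i + 1 for i, pos in enumerate(sorted(cys_positions))}
    pairs = sorted(tuple(sorted((rank[a], rank[b]))) for a, b in bonds)
    return ", ".join(f"{i}-{j}" for i, j in pairs)


def n_patterns(n_cys: int) -> int:
    """Number of ways to pair up all n_cys cysteines: (n_cys - 1)!!."""
    if n_cys % 2:
        raise ValueError("a complete pairing needs an even number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k
    return count


if __name__ == "__main__":
    positions = [6, 16, 33, 44, 58, 72]  # cysteine positions from Figure 2
    bonds = [(6, 16), (33, 44), (58, 72)]
    print(connectivity_pattern(positions, bonds))  # 1-2, 3-4, 5-6
    print(n_patterns(len(positions)))  # 15
```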