Abstract
DiANNA is a recent state-of-the-art artificial neural network and web server, which determines the cysteine oxidation state and disulfide connectivity of a protein, given only its amino acid sequence. Version 1.0 of DiANNA uses a feed-forward neural network to determine which cysteines are involved in a disulfide bond, and employs a neural network with a novel architecture to predict which half-cystines are covalently bound to which other half-cystines. In version 1.1 of DiANNA, described here, we extend functionality by applying a support vector machine with spectrum kernel to the cysteine classification problem, that is, to determine whether a cysteine is reduced (free in sulfhydryl state), half-cystine (involved in a disulfide bond) or bound to a metallic ligand. In the last case, DiANNA predicts the ligand among iron, zinc, cadmium and carbon. Available at: http://bioinformatics.bc.edu/clotelab/DiANNA/.
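To make the spectrum kernel mentioned above concrete, here is a minimal sketch of the standard k-spectrum kernel: each sequence is mapped to its k-mer counts, and the kernel value is the inner product of the two count vectors. The window extraction around each cysteine, the flank width and the value of k are illustrative assumptions; the abstract does not state the settings DiANNA actually uses.

```python
from collections import Counter


def spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous k-mer in seq (the k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    return sum(count * sb[kmer] for kmer, count in sa.items())


def cysteine_windows(seq: str, flank: int = 5) -> list[tuple[int, str]]:
    """Return (1-based position, window) for each cysteine.

    Hypothetical helper: the flank width is an assumption, and windows
    are simply clipped at the sequence ends.
    """
    return [
        (i + 1, seq[max(0, i - flank): i + flank + 1])
        for i, aa in enumerate(seq)
        if aa == "C"
    ]
```

A kernel function of this kind could be handed to any SVM implementation that accepts a precomputed Gram matrix.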
Year: 2006 PMID: 16844987 PMCID: PMC1538812 DOI: 10.1093/nar/gkl189
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Total number of different cysteine species in datasets considered in this paper
| Dataset | | | | | Total |
|---|---|---|---|---|---|
| | 624 | 60 | 2 | 546 | 1230 |
| | 216 | 1481 | 37 | 2412 | 4109 |
| | 624 | 608 | 24 | 1199 | 2455 |
The description of each dataset can be found in the section ‘Dataset’. Legend: IA, intra-chain disulfide bonds; IE, inter-chain disulfide bonds; HC, half-cystines; FC, free cysteines; LC, ligand-bound cysteines.
Breakdown of protein chains which contain at the same time half-cystines (HC), free cysteines (FC) and ligand-bound cysteines (LC), for each of the three datasets considered in this paper
| Chains | UPMA | UP | MA |
|---|---|---|---|
| Total | 526 | 202 | 967 |
| w/ HC | 140 | 19 | 291 |
| w/ FC | 363 | 139 | 716 |
| w/ LC | 189 | 202 | 52 |
| w/ both HC and FC | 28 | 9 | 65 |
| w/ both HC and LC | 17 | 19 | 1 |
| w/ both FC and LC | 128 | 139 | 26 |
| w/ HC, FC and LC | 7 | 9 | 0 |
Performance measure (Q3 and Q scores) for the three-class prediction of LC, HC, FC using different kernels and input representation
| Kernel | Q3 (SpR) | Q3 (MmR) | Q3 (PrfR) | Q (SpR) | Q (MmR) | Q (PrfR) |
|---|---|---|---|---|---|---|
| Linear | 0.75 | 0.64 | 0.63 | 0.45 | 0.43 | 0.43 |
| Polynomial (2) | 0.78 | 0.74 | 0.74 | 0.53 | 0.46 | 0.45 |
| Polynomial (3) | 0.76 | 0.72 | 0.72 | 0.5 | 0.47 | 0.47 |
| RBF | 0.75 | 0.73 | 0.72 | 0.43 | 0.43 | 0.43 |
The Q3 score is the ratio of correctly predicted cysteines to the total number of examples. The Q score is the fraction of proteins for which all cysteines are correctly predicted. Q3 and Q scores are obtained by averaging the results of a 5-fold cross-validation. Optimal values of the C parameter, and of the γ parameter for the radial basis function (RBF) kernel, are estimated by a grid search. Legend: SpR, spectrum representation; MmR, mismatch representation; PrfR, profile representation.
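The two scores defined in the legend can be sketched as follows. This is a minimal illustration on a single evaluation set, with hypothetical function names and made-up labels; the values reported in the table are averages over five cross-validation folds, which this sketch does not reproduce.

```python
def q3_score(proteins):
    """Fraction of cysteines whose predicted class equals the true class.

    `proteins` is a list of (true_labels, predicted_labels) pairs,
    one pair per protein chain.
    """
    correct = sum(t == p for true, pred in proteins for t, p in zip(true, pred))
    total = sum(len(true) for true, _ in proteins)
    return correct / total


def q_score(proteins):
    """Fraction of proteins in which every cysteine is predicted correctly."""
    perfect = sum(list(true) == list(pred) for true, pred in proteins)
    return perfect / len(proteins)


# Illustrative labels only (not taken from the paper's datasets).
example = [
    (["HC", "HC", "FC"], ["HC", "HC", "FC"]),  # all three cysteines correct
    (["LC", "FC"], ["LC", "HC"]),              # one of two correct
    (["FC"], ["FC"]),                          # correct
]
```

On this toy set, five of six cysteines are correct (Q3 = 5/6) while two of three proteins are fully correct (Q = 2/3), which shows why Q is the stricter measure.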
Total number of distinct atomic ligands found covalently bound to cysteine residues in the UPMA dataset.
| Cys-bound atom | Examples |
|---|---|
| As | 2 |
| Au | 1 |
| C | 89 |
| Cd | 39 |
| Cu | 10 |
| Fe | 185 |
| H | 1 |
| Hg | 24 |
| Mn | 1 |
| Ni | 6 |
| Pb | 1 |
| S | 27 |
| U | 2 |
| Zn | 225 |
Performance measures for the prediction of cysteines bound to specific ligands
| Measure | Zn | Cd | Fe | C |
|---|---|---|---|---|
| Acc | 0.93 | 0.99 | 0.91 | 0.96 |
| Sen | 0.8 | 0.97 | 0.67 | 0.74 |
| Spe | 0.99 | 1 | 0.98 | 0.99 |
| MCC | 0.84 | 0.99 | 0.74 | 0.83 |
| AUC | 0.97 | 0.97 | 0.94 | 0.94 |
Legend: Acc, accuracy; Sen, sensitivity; Spe, specificity; MCC, Matthews correlation coefficient; AUC, area under the ROC curve.
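For reference, the first four measures in the legend follow from the counts of a binary confusion matrix using their standard definitions. The sketch below uses a hypothetical function name, and the counts in the usage note are made up for illustration; they are not the counts behind the table.

```python
import math


def binary_measures(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sen = tp / (tp + fn) if tp + fn else 0.0  # true-positive rate
    spe = tn / (tn + fp) if tn + fp else 0.0  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "Sen": sen, "Spe": spe, "MCC": mcc}
```

For example, `binary_measures(tp=8, fp=1, tn=89, fn=2)` describes a classifier that finds 8 of 10 positives with a single false positive among 90 negatives. Accuracy and specificity stay high on such imbalanced data even when sensitivity is modest, which is the pattern visible in the Fe and C columns.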
Figure 1. ROC curves for the prediction of cysteines covalently bound to specific ligands. [For an explanation of receiver operating characteristic (ROC) curves see (20)].
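As a brief illustration of what the area under such a curve measures, the sketch below uses the standard equivalence between the AUC and the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name and the scores are hypothetical, and this is not necessarily how the curves in the figure were produced.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are 1 for positive and 0 for negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration; a rank-based computation would be preferable on large datasets.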
Figure 2. DiANNA ternary cysteine classification prediction input and output example. Upper panel: The DiANNA web-server update allows the user to choose between disulfide connectivity prediction and cysteine classification (ternary cysteine classification is only available in the 1.1 update). In the latter case, the user can type or paste a FASTA sequence in a text box, then choose among four different classification predictions by means of a drop-down menu (i.e. the ternary LC versus HC versus FC classification, and the three binary classifications LC versus HC, LC versus FC and HC versus FC). Lower panel: Output for the ternary classification. For each cysteine in the submitted sequence, the SVM model predicts the probability of being half-cystine, free cysteine or ligand-bound. The class having the highest probability is highlighted. If a specific cysteine is predicted as ligand-bound, a tentative prediction of the putative ligand (out of four possible ligands) is attempted.
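The output logic described in the lower panel can be sketched roughly as below. The function name, the dictionary layout and the rule for picking the ligand (highest score among the four candidates) are assumptions made for illustration; the caption only states that the most probable class is highlighted and that a ligand prediction is attempted for ligand-bound cysteines.

```python
def classify_cysteine(class_probs, ligand_probs=None):
    """Pick the most probable state and, for LC, a tentative ligand.

    class_probs:  e.g. {"HC": 0.2, "FC": 0.1, "LC": 0.7}
    ligand_probs: e.g. {"Zn": 0.6, "Fe": 0.3, "Cd": 0.05, "C": 0.05}
                  (hypothetical scores; the selection rule is assumed)
    """
    state = max(class_probs, key=class_probs.get)
    ligand = None
    if state == "LC" and ligand_probs:
        ligand = max(ligand_probs, key=ligand_probs.get)
    return state, ligand
```

With the example dictionaries from the docstring, the call returns the state `"LC"` together with the ligand `"Zn"`; for a cysteine whose highest probability is HC or FC, no ligand is reported.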