Denis C Bauer, Mikael Bodén, Ricarda Thier, Elizabeth M Gillam.
Abstract
BACKGROUND: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone.
Year: 2006 PMID: 17026775 PMCID: PMC1624854 DOI: 10.1186/1471-2105-7-437
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The STAR-score prediction accuracy.
| Configuration | Correlation (r) |
|---|---|
| FFNN (0 hidden) | 0.56 |
| FFNN (20 hidden) | 0.56 |
| FFNN (40 hidden) | 0.56 |
| BRNN (7+7 hidden) | 0.66 |
The STAR-score prediction accuracy (established from test data using the correlation with the calculated STAR-score) when sequence data was presented directly to the machine learning algorithm.
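The accuracy figure in these tables is a correlation between predicted and calculated STAR-scores over the test data. The record does not say which correlation coefficient was used, so the sketch below assumes the Pearson coefficient. The function name `pearson_r` and the example values are hypothetical and are not taken from the paper.

```python
import math


def pearson_r(predicted, calculated):
    """Pearson correlation between two equal-length score profiles.

    Assumption: the reported r is a Pearson coefficient; the source
    only says "correlation with the calculated STAR-score".
    """
    if len(predicted) != len(calculated) or len(predicted) < 2:
        raise ValueError("need two profiles of equal length >= 2")

    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_c = sum(calculated) / n

    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(predicted, calculated))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_c = sum((c - mean_c) ** 2 for c in calculated)

    if var_p == 0 or var_c == 0:
        raise ValueError("correlation is undefined for a constant profile")

    return cov / math.sqrt(var_p * var_c)


if __name__ == "__main__":
    # Made-up per-residue scores, for illustration only.
    predicted_scores = [0.10, 0.40, 0.35, 0.80, 0.60]
    calculated_scores = [0.20, 0.50, 0.30, 0.90, 0.55]
    print(round(pearson_r(predicted_scores, calculated_scores), 3))
```

A value near 1 means the predicted profile rises and falls with the calculated one. The coefficient ignores scale and offset, so it measures agreement in profile shape rather than in absolute score.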
The STAR-score prediction accuracy.
| Configuration | Correlation (r) |
|---|---|
| FFNN (0 hidden) | 0.86 |
| FFNN (20 hidden) | 0.86 |
| FFNN (40 hidden) | 0.86 |
| BRNN (7+7 hidden) | 0.89 |
| 0.82 | |
| 0.80 | |
| 0.83 |
The STAR-score prediction accuracy (established from test data using the correlation with the calculated STAR-score) when sequence data was presented as the predicted 3-class Continuum Secondary Structure.
Figure 1. The SCHEMA-profile, the STAR-profile, and post-processed profiles for Conseq, MUpro and I-Mutant2.0 for the protein 1BLS. (a) The multi-parent S-scores (normalised) along with the calculated and predicted STAR-scores for 1BLS. The profiles indicate the number of disrupted connections (y-axis) at each sequence position (x-axis). (b) The post-processed score from MUpro for 1BLS. The profile indicates the structural stability change caused by mutation (y-axis) for each sequence residue (x-axis). (c) The post-processed score from I-Mutant2.0 for 1BLS. The profile indicates the structural stability change caused by mutation (y-axis) for each sequence residue (x-axis). (d) The post-processed score from Conseq. The profile indicates the level of amino acid conservation (y-axis) for each sequence residue (x-axis). The successful recombination sites from a random DNA shuffling experiment are added to each graph and plotted as vertical lines [1].
Figure 2. The 1BLS protein structure. Three residues are identified, each translating into divergent SCHEMA- and STAR-scores.
Figure 3. The SCHEMA-profile and the STAR-profile for the protein 1CDD. The multi-parent S-score for 1CDD and GART (normalised) and the single-parent S-score for 1CDD (normalised), along with the predicted STAR-score for 1CDD. The gap is caused by the lack of information in the PDB file for residues 110 to 133. The prediction for the complete sequence accurately disqualifies recombination in this area, and it agrees with the prediction generated for a sequence in which these 23 residues were removed. The successful recombination sites from a DNA shuffling experiment are added and plotted as vertical lines [23, 24].