Alessandro Vullo, Ian Walsh, Gianluca Pollastri.
Abstract
BACKGROUND: Protein topology representations such as residue contact maps are an important intermediate step towards ab initio prediction of protein structure. Although predictors have improved in recent years, accurately predicting residue contact maps from primary sequences remains largely unsolved. The reasons include the unbalanced nature of the problem (far fewer examples of contacts than of non-contacts), the formidable challenge of capturing long-range interactions in the maps, and the intrinsic difficulty of mapping one-dimensional input sequences into two-dimensional output maps. To alleviate these problems and improve contact map predictions, in this paper we split the task into two stages: the prediction of a map's principal eigenvector (PE) from the primary sequence, and the reconstruction of the contact map from the PE and the primary sequence. Predicting the PE from the primary sequence amounts to mapping a vector into a vector. This task is less complex than mapping vectors directly into two-dimensional matrices, since the size of the problem is drastically reduced and so is the length scale of the interactions that need to be learned.
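The first-stage target can be made concrete with a small sketch. Assuming a binary contact map built from Cα coordinates with a distance threshold, the PE is the eigenvector associated with the largest eigenvalue of the symmetric map. The 8 Å default, the random-walk coordinates and the function names below are illustrative assumptions, not the authors' pipeline.

```python
import numpy as np

def contact_map(coords, threshold=8.0):
    """Binary contact map: 1.0 where two residues are closer than `threshold` (in Å).

    The diagonal is kept at 1.0 here; that is a choice made for this sketch.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist < threshold).astype(float)

def principal_eigenvector(cmap):
    """Return (largest eigenvalue, matching unit-norm eigenvector) of a symmetric map."""
    # eigh is meant for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(cmap)
    pe = eigenvectors[:, -1]
    # An eigenvector is defined up to sign; pick the orientation with positive sum
    if pe.sum() < 0:
        pe = -pe
    return eigenvalues[-1], pe

# Hypothetical 30-residue chain: a random walk standing in for real Cα coordinates
rng = np.random.default_rng(0)
coords = np.cumsum(rng.normal(scale=2.0, size=(30, 3)), axis=0)
cmap = contact_map(coords)
lam, pe = principal_eigenvector(cmap)
print(cmap.shape, pe.shape)
```

The map has N × N entries while the PE has only N, one per residue, which is what turns the first stage into a vector-to-vector problem.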
Year: 2006 PMID: 16573808 PMCID: PMC1484494 DOI: 10.1186/1471-2105-7-180
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
PE prediction: two-class problem. Accuracy estimates with 95% confidence intervals and SOV. A * in one of the first three columns of a row indicates that the results were obtained by augmenting the network input with secondary structure predicted by Porter (P), by adding a hydrophobicity profile based on the interactivity scale of [20] (H), or by using a second-stage filtering network (F).
| P | H | F | | | | SOV |
| - | - | - | 72.0 ± .6 | 73.1 ± .6 | 70.8 ± .6 | 44.4 |
| - | - | * | 72.1 ± .6 | 73.4 ± .6 | 70.7 ± .6 | 46.0 |
| * | - | - | 72.3 ± .6 | 73.1 ± .6 | 71.4 ± .6 | 47.6 |
| * | - | * | 72.3 ± .6 | 72.8 ± .6 | 71.9 ± .6 | 49.6 |
| * | * | - | 72.5 ± .6 | 74.0 ± .6 | 71.0 ± .6 | 47.2 |
| * | * | * | 72.6 ± .6 | 73.8 ± .6 | 71.2 ± .6 | 49.8 |
| baseline | | | 56.8 ± .5 | 58.4 ± .5 | 55.1 ± .5 | - |
PE prediction: three-class problem. Accuracy estimates with 95% confidence intervals and SOV. A * in one of the first three columns of a row indicates that the results were obtained by augmenting the network input with secondary structure predicted by Porter (P), by adding a hydrophobicity profile based on the interactivity scale of [20] (H), or by using a second-stage filtering network (F).
| P | H | F | | | | | SOV |
| - | - | - | 55.8 ± .5 | 63.7 ± .5 | 35.5 ± .4 | 67.6 ± .6 | 38.3 |
| - | - | * | 56.2 ± .5 | 61.6 ± .6 | 40.6 ± .5 | 65.8 ± .6 | 40.3 |
| * | - | - | 56.3 ± .6 | 65.8 ± .6 | 36.9 ± .5 | 65.5 ± .6 | 42.1 |
| * | - | * | 56.6 ± .6 | 64.1 ± .6 | 41.1 ± .4 | 63.9 ± .6 | 43.6 |
| * | * | - | 56.4 ± .5 | 65.2 ± .6 | 36.8 ± .4 | 66.4 ± .6 | 42.6 |
| * | * | * | 56.7 ± .5 | 63.3 ± .6 | 41.4 ± .4 | 64.6 ± .6 | 44.0 |
| baseline | | | 39.8 ± .4 | 50.6 ± .5 | 8.5 ± .2 | 59.3 ± .6 | - |
PE prediction: four-class problem. Accuracy estimates with 95% confidence intervals and SOV. A * in one of the first three columns of a row indicates that the results were obtained by augmenting the network input with secondary structure predicted by Porter (P), by adding a hydrophobicity profile based on the interactivity scale of [20] (H), or by using a second-stage filtering network (F).
| P | H | F | | | | | | SOV |
| - | - | - | 45.6 ± .5 | 59.9 ± .6 | 29.4 ± .4 | 26.8 ± .4 | 65.0 ± .6 | 33.2 |
| - | - | * | 46.0 ± .5 | 58.5 ± .5 | 30.6 ± .4 | 30.5 ± .4 | 63.3 ± .6 | 34.6 |
| * | - | - | 46.2 ± .5 | 62.1 ± .6 | 30.5 ± .4 | 27.3 ± .4 | 63.1 ± .6 | 37.3 |
| * | - | * | 46.5 ± .5 | 60.7 ± .5 | 32.3 ± .4 | 29.6 ± .4 | 61.8 ± .5 | 37.4 |
| * | * | - | 45.9 ± .5 | 61.5 ± .5 | 30.0 ± .4 | 27.3 ± .3 | 63.3 ± .6 | 37.1 |
| * | * | * | 46.5 ± .5 | 60.0 ± .5 | 32.2 ± .4 | 30.4 ± .4 | 61.9 ± .6 | 37.8 |
| baseline | | | 30.7 ± .4 | 48.1 ± .5 | 5.4 ± .2 | 8.5 ± .2 | 59.0 ± .6 | - |
Figure 2. 4-class principal eigenvector prediction for protein 1A2P (108 amino acids). Solid line: exact eigenvector class. Dashed line: predicted eigenvector class. The class value is averaged over a moving window of 5 residues.
Performance results for contact map prediction. Contact threshold: 8 Å. Accuracy, coverage and F1 (as %) of the 8 Å contact map predictor for sequence separations greater than 5, 11 and 23 amino acids.
| | Separation > 5 | | | | Separation > 11 | | | | Separation > 23 | | | |
| P | R | F1 | P | P | R | F1 | P | P | R | F1 | P | |
| MA | 0 | 0 | 0 | 97.2 | 0 | 0 | 0 | 97.6 | 0 | 0 | 0 | 97.9 |
| MA_PE | 39.4 | 12.2 | 18.6 | 97.5 | 36.2 | 8.4 | 13.5 | 97.7 | 27.8 | 2.0 | 3.7 | 97.9 |
| MA_SS_ACC | 50.5 | 7.4 | 12.9 | 97.4 | 48.8 | 4.0 | 7.4 | 97.6 | 25.7 | .2 | .3 | 97.9 |
| MA_SS_ACC_PE | 43.3 | 11.3 | 17.9 | 97.5 | 38.9 | 7.2 | 12.1 | 97.7 | 25.5 | 2.2 | 4.1 | 97.9 |
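As a quick consistency check, and assuming the P and R columns are the accuracy (precision) and coverage (recall) named in the caption, F1 is their harmonic mean. The sketch below recomputes it for the first column group of the table above; the function name and the zero-division convention are assumptions made for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 by convention when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# (P, R, reported F1) for the first column group, copied from the table above
rows = {
    "MA": (0.0, 0.0, 0.0),
    "MA_PE": (39.4, 12.2, 18.6),
    "MA_SS_ACC": (50.5, 7.4, 12.9),
    "MA_SS_ACC_PE": (43.3, 11.3, 17.9),
}

for name, (p, r, reported) in rows.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.1f}, reported F1 = {reported}")
```

For these four rows the recomputed values agree with the reported F1 to one decimal place.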
Performance results for contact map prediction. Contact threshold: 8 Å. Accuracy and coverage (as %) of the 8 Å contact map predictor for sequence separations greater than 5, 11 and 23 amino acids, when only the top N/5 and top N/2 predicted contacts are considered (where N is the length of the protein).
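| | Separation > 5, top N/5 | | Separation > 5, top N/2 | | Separation > 11, top N/5 | | Separation > 11, top N/2 | | Separation > 23, top N/5 | | Separation > 23, top N/2 | |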
| P | R | P | R | P | R | P | R | P | R | P | R | |
| MA | 30.8 | 3.9 | 26.2 | 8.5 | 23.9 | 3.8 | 19.2 | 7.8 | 14.1 | 3.3 | 11.1 | 6.6 |
| MA_PE | 43.0 | 5.5 | 34.6 | 11.2 | 34.2 | 5.5 | 26.6 | 10.8 | 19.6 | 4.6 | 15.0 | 8.9 |
| MA_SS_ACC | 44.4 | 5.7 | 36.0 | 11.6 | 34.2 | 5.5 | 26.6 | 10.8 | 17.8 | 4.2 | 14.7 | 8.7 |
| MA_SS_ACC_PE | 46.4 | 5.9 | 36.6 | 11.8 | 35.4 | 5.7 | 27.0 | 11.0 | 19.8 | 4.6 | 15.7 | 9.3 |
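The top-N/5 and top-N/2 evaluation described in the caption can be sketched as follows. This is a minimal illustration that assumes the predictor outputs a real-valued score per residue pair and that pairs are ranked by that score; the function name, the toy 10-residue map and the scores are hypothetical.

```python
import numpy as np

def top_k_precision_recall(scores, true_map, min_sep, k):
    """Precision and recall (as %) of the k highest-scoring residue pairs
    whose sequence separation |i - j| is greater than min_sep."""
    n = true_map.shape[0]
    # Upper-triangle pairs with j - i > min_sep, so each pair is counted once
    i, j = np.triu_indices(n, k=min_sep + 1)
    pair_scores = scores[i, j]
    pair_truth = true_map[i, j].astype(bool)
    k = min(k, pair_scores.size)
    if k == 0 or pair_truth.sum() == 0:
        return 0.0, 0.0
    top = np.argsort(-pair_scores)[:k]          # indices of the k best-scored pairs
    true_positives = pair_truth[top].sum()
    precision = 100.0 * true_positives / k
    recall = 100.0 * true_positives / pair_truth.sum()
    return precision, recall

# Toy example with N = 10 residues and four true long-range contacts
n = 10
true_map = np.zeros((n, n))
for a, b in [(0, 7), (1, 8), (2, 9), (0, 9)]:
    true_map[a, b] = true_map[b, a] = 1.0

scores = np.zeros((n, n))
scores[0, 7], scores[1, 8] = 0.9, 0.8   # true contacts, ranked first
scores[3, 9] = 0.7                      # a false positive
scores[2, 9], scores[0, 9] = 0.6, 0.5   # remaining true contacts

print(top_k_precision_recall(scores, true_map, min_sep=5, k=n // 5))
print(top_k_precision_recall(scores, true_map, min_sep=5, k=n // 2))
```

Restricting the evaluation to the best-ranked pairs trades coverage for accuracy, which matches the pattern in the table: the top N/5 columns show higher P and lower R than the top N/2 columns.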
Performance results for contact map prediction. Contact threshold: 12 Å. Accuracy, coverage and F1 (as %) of the 12 Å contact map predictor for sequence separations greater than 5, 11 and 23 amino acids.
| | Separation > 5 | | | | Separation > 11 | | | | Separation > 23 | | | |
| P | R | F1 | P | P | R | F1 | P | P | R | F1 | P | |
| MA | 60.4 | 10.6 | 18.1 | 87.2 | 55.8 | 0.1 | 0.1 | 87.8 | 38.9 | 0.03 | 0.06 | 88.8 |
| MA_PE | 49.5 | 24.5 | 32.8 | 88.6 | 39.4 | 16.8 | 23.6 | 89.3 | 34.5 | 13.6 | 19.5 | 89.9 |
| MA_SS_ACC | 61.6 | 19.6 | 29.7 | 88.2 | 48.9 | 7.5 | 13.1 | 88.5 | 40.2 | 2.8 | 5.3 | 89.0 |
| MA_SS_ACC_PE | 54.2 | 23.5 | 32.8 | 88.6 | 42.2 | 14.6 | 21.7 | 89.2 | 36.7 | 10.9 | 16.8 | 89.7 |
Performance results for contact map prediction. Contact threshold: 12 Å. Accuracy and coverage (as %) of the 12 Å contact map predictor for sequence separations greater than 5, 11 and 23 amino acids, when only the top N/5 and top N/2 predicted contacts are considered (where N is the length of the protein).
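| | Separation > 5, top N/5 | | Separation > 5, top N/2 | | Separation > 11, top N/5 | | Separation > 11, top N/2 | | Separation > 23, top N/5 | | Separation > 23, top N/2 | |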
| P | R | P | R | P | R | P | R | P | R | P | R | |
| MA | 79.6 | 2.0 | 71.9 | 4.6 | 50.1 | 1.6 | 46.2 | 3.8 | 43.3 | 1.9 | 38.1 | 4.2 |
| MA_PE | 87.5 | 2.2 | 81.6 | 5.2 | 59.8 | 1.9 | 54.1 | 4.4 | 47.9 | 2.1 | 42.4 | 4.7 |
| MA_SS_ACC | 89.7 | 2.3 | 85.3 | 5.5 | 61.3 | 2.0 | 54.9 | 4.5 | 43.7 | 1.9 | 39.4 | 4.4 |
| MA_SS_ACC_PE | 89.9 | 2.3 | 85.5 | 5.5 | 62.5 | 2.0 | 55.6 | 4.6 | 49.9 | 2.2 | 43.8 | 4.9 |
Figure 3. Examples of contact map predictions at 12 Å for protein 1A2P (108 amino acids). Exact map in the top-right half, predicted map in the bottom-left half. Prediction by MA_SS_ACC on the left, MA_SS_ACC_PE on the right (see text for details).
Figure 1. Distribution of values in the training set. See text for details.