Irini A Doytchinova, Darren R Flower.
Abstract
BACKGROUND: Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach, such as speed and cost efficiency, its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties.
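The ACC transformation named in the abstract maps a protein of any length onto a vector of fixed length. The following is a minimal sketch of one common formulation, not the authors' implementation: the function name `acc_transform`, the normalisation by (n - lag), and the toy descriptor values are all assumptions made for illustration, since this record gives neither the descriptor scales nor the lag range.

```python
def acc_transform(z, max_lag):
    """Auto and cross covariance of per-residue property descriptors.

    z       : list of residues, each a list of p property values
              (hypothetical descriptors; the record does not list them)
    max_lag : largest residue separation to include (assumed parameter)

    Returns a flat list of p * p * max_lag values, so the output length
    depends only on p and max_lag, never on the sequence length.
    """
    n = len(z)
    p = len(z[0])
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < sequence length")
    vector = []
    for lag in range(1, max_lag + 1):
        for j in range(p):          # property at position i
            for k in range(p):      # property at position i + lag
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                vector.append(total / (n - lag))
    return vector


# Two toy "proteins" of different length, two made-up properties per residue.
short_protein = [[0.1 * i, -0.2 * i] for i in range(10)]
long_protein = [[0.05 * i, 0.3 - 0.01 * i] for i in range(40)]

print(len(acc_transform(short_protein, 3)), len(acc_transform(long_protein, 3)))
```

Terms with j equal to k are auto covariances and the remaining terms are cross covariances; both sequences above yield vectors of the same length, which is what allows proteins of unequal length to be compared without alignment.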
Year: 2007 PMID: 17207271 PMCID: PMC1780059 DOI: 10.1186/1471-2105-8-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
VaxiJen models validation.
| Model | Validation | AUC_ROC a | Threshold b | Accuracy % c | Sensitivity % d | Specificity % e |
| Bacterial | LOO-CV | 0.883 | 0.5 | 80 | 79 | 81 |
| Bacterial | test set | 0.726 | 0.5 | 70 | 76 | 64 |
| Bacterial | LOO-CV (mean) g | 0.899 | 0.5 | 83 | 81 | 85 |
| Viral | LOO-CV | 0.937 | 0.5 | 87 | 91 | 82 |
| Viral | test set | 0.743 | 0.4 | 70 | 84 | 56 |
| Viral | LOO-CV (mean) g | 0.810 | 0.5 | 73 | 74 | 71 |
| Tumour | LOO-CV | 0.964 | 0.5 | 89 | 94 | 84 |
| Tumour | test set | 0.930 | 0.5 | 86 | 96 | 76 |
| Tumour | LOO-CV (mean) g | 0.911 | 0.5 | 82 | 78 | 86 |
a The area under the curve (AUC) is a quantitative measure of predictive ability and varies from 0.5 for a random prediction to 1.0 for a perfect prediction.
b The threshold of the highest accuracy.
c Accuracy = (true antigens + true non-antigens)/total.
d Sensitivity = true antigens/all antigens.
e Specificity = true non-antigens/all non-antigens.
g Mean values of five training sets.
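The footnote definitions of accuracy, sensitivity, specificity and AUC can be sketched in a few lines. The function names and the counts below are hypothetical: the counts assume a set of 100 antigens and 100 non-antigens and are chosen only to illustrate the arithmetic, and the pairwise AUC is a standard rank-based estimate that may differ from the procedure the authors used.

```python
def classification_metrics(tp, tn, fp, fn):
    """Percent accuracy, sensitivity and specificity from confusion counts.

    tp: antigens predicted as antigens      fn: antigens predicted as non-antigens
    tn: non-antigens predicted as such      fp: non-antigens predicted as antigens
    """
    total = tp + tn + fp + fn
    return {
        "accuracy": 100 * (tp + tn) / total,
        "sensitivity": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


def auc_roc(antigen_scores, non_antigen_scores):
    """Fraction of (antigen, non-antigen) pairs ranked correctly.

    Ties count as half, so identical score distributions give 0.5
    and a perfect separation gives 1.0.
    """
    wins = 0.0
    for a in antigen_scores:
        for b in non_antigen_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(antigen_scores) * len(non_antigen_scores))


# Hypothetical counts for 100 antigens and 100 non-antigens.
metrics = classification_metrics(tp=79, tn=81, fp=19, fn=21)
print(metrics)
```

A prediction counts as positive when its score is at or above the chosen threshold, so moving the threshold trades sensitivity against specificity while leaving the AUC unchanged.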
Figure 1. ROC curves for VaxiJen bacterial model.
Figure 2. ROC curves for VaxiJen viral model.
Figure 3. ROC curves for VaxiJen tumour model.
Similarities between sequences in the three training sets.
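| Training set | Number of clusters | Minimum cluster size | Maximum cluster size | Singleton clusters | Average cluster size | Non-singleton clusters a |
| Bacterial | 84 | 1 | 4 | 74 | 1.19 | 6,2,2 |
| Viral | 87 | 1 | 5 | 78 | 1.15 | 7,1,0,1 |
| Tumour | 76 | 1 | 7 | 66 | 1.32 | 4,2,2,1,0,1 |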
For a given cut-off, a perfectly diverse set of sequences will have a number of clusters equal to the number of sequences, a maximum and minimum cluster size of one, and an average cluster size of one.
a For non-singleton clusters of two or more members. Cluster counts are listed in order of ascending cluster size.
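The cluster statistics in this table can be recomputed from a list of cluster sizes. The sketch below is illustrative only: the function name `cluster_summary` is hypothetical, and the example sizes assume that the Bacterial row describes 74 singleton clusters plus 6, 2 and 2 clusters of sizes 2, 3 and 4, which is a reading of the row rather than something the record states.

```python
from collections import Counter


def cluster_summary(sizes):
    """Summarise a clustering given the size of every cluster."""
    counts = Counter(sizes)
    n_clusters = len(sizes)
    return {
        "sequences": sum(sizes),
        "clusters": n_clusters,
        "min_size": min(sizes),
        "max_size": max(sizes),
        "singletons": counts[1],
        "mean_size": sum(sizes) / n_clusters,
        # Number of clusters of size 2, 3, ... up to the largest size.
        "non_singletons_by_size": [counts[s] for s in range(2, max(sizes) + 1)],
    }


# Assumed reading of the Bacterial row.
bacterial = cluster_summary([1] * 74 + [2] * 6 + [3] * 2 + [4] * 2)
print(bacterial["clusters"], round(bacterial["mean_size"], 2))

# A perfectly diverse set: every sequence is its own cluster.
diverse = cluster_summary([1] * 10)
print(diverse["clusters"], diverse["max_size"], diverse["mean_size"])
```

Under this reading the average cluster size is simply the number of sequences divided by the number of clusters, so values close to one indicate little redundancy within a training set.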