A E Lobley, T Nugent, C A Orengo, D T Jones.
Abstract
One of the challenges of the post-genomic era is to provide accurate function annotations for large volumes of data resulting from genome sequencing projects. Most function prediction servers utilize methods that transfer existing database annotations between orthologous sequences. In contrast, there are few methods that are independent of homology and can annotate distant and orphan protein sequences. The FFPred server adopts a machine-learning approach to perform function prediction in protein feature space using feature characteristics predicted from amino acid sequence. The features are scanned against a library of support vector machines representing over 300 Gene Ontology (GO) classes, and probabilistic confidence scores are returned for each annotation term. The GO term library has been modelled on human protein annotations; however, benchmark performance testing showed robust performance across higher eukaryotes. FFPred offers important advantages over traditional function prediction servers in its ability to annotate distant homologues and orphan protein sequences, and achieves greater coverage and classification accuracy than other feature-based prediction servers. A user may upload an amino acid sequence and receive annotation predictions via email. Feature information is provided as easy-to-interpret graphics displayed on the sequence of interest, allowing for back-interpretation of the associations between features and function classes.
Year: 2008 PMID: 18463141 PMCID: PMC2447771 DOI: 10.1093/nar/gkn193
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Server process flow diagram.
Figure 2. Sample output from the server for sequence IPI00745501.
Classification performance for six eukaryotic proteomes
| Proteome | MCC | Sensitivity | Specificity | Precision | No. of Proteins | No. of Categories |
|---|---|---|---|---|---|---|
| Human | 0.66 | 0.67 | 0.99 | 0.68 | 32 528 | 197 |
| Mouse | 0.57 | 0.48 | 0.98 | 0.52 | 26 557 | 196 |
| Zebrafish | 0.65 | 0.58 | 0.97 | 0.64 | 12 684 | 186 |
| Worm | 0.47 | 0.47 | 0.97 | 0.56 | 11 770 | 165 |
| Fly | 0.44 | 0.40 | 0.98 | 0.57 | 13 107 | 175 |
| Yeast | 0.42 | 0.34 | 0.97 | 0.61 | 5527 | 99 |
Each performance statistic represents the mean value over all GO term classifiers. MCC represents the Matthews correlation coefficient, a measure of overall classifier accuracy. A value of 0 indicates random performance, whilst a value of 1 implies perfect classification. Sensitivity represents the proportion of positive examples recovered by the classifier, i.e. TP/(TP + FN). Specificity represents the proportion of negative examples recovered by the classifier, i.e. TN/(FP + TN). Precision represents the proportion of positive assignments made by the classifier that were correct, i.e. TP/(TP + FP). TP, true positives; TN, true negatives; FP, false positives; FN, false negatives.
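
To make the definitions in the table note concrete, here is a minimal Python sketch that computes the four statistics from confusion-matrix counts and averages them over a set of classifiers. The function names `classification_metrics` and `mean_metrics` are illustrative and are not part of FFPred. The MCC expression is the standard definition, which the note does not spell out, and returning 0.0 when a denominator is zero is an assumed convention rather than something stated in the source.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return MCC, sensitivity, specificity and precision for one classifier.

    tp, tn, fp, fn are the counts of true positives, true negatives,
    false positives and false negatives.
    """
    def ratio(numerator, denominator):
        # Assumed convention: an undefined ratio (zero denominator) becomes 0.0.
        return numerator / denominator if denominator else 0.0

    # Standard Matthews correlation coefficient denominator.
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
        "sensitivity": ratio(tp, tp + fn),   # TP / (TP + FN)
        "specificity": ratio(tn, fp + tn),   # TN / (FP + TN)
        "precision": ratio(tp, tp + fp),     # TP / (TP + FP)
    }


def mean_metrics(counts_per_classifier):
    """Average each statistic over (tp, tn, fp, fn) tuples, one per classifier."""
    results = [classification_metrics(*counts) for counts in counts_per_classifier]
    return {
        name: sum(r[name] for r in results) / len(results)
        for name in results[0]
    }


if __name__ == "__main__":
    # Hypothetical counts for two classifiers, for illustration only.
    example_counts = [(40, 40, 10, 10), (5, 5, 0, 0)]
    for name, value in mean_metrics(example_counts).items():
        print(f"{name}: {value:.2f}")
```

Averaging per-classifier statistics in this way weights every GO term classifier equally, regardless of how many proteins carry the term, which is consistent with the note's description of each figure as a mean over all classifiers.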