Craig E Jones, Alfred L Brown, Ute Baumann.
Abstract
BACKGROUND: Annotations that describe the function of sequences are enormously important to researchers during laboratory investigations and when making computational inferences. However, there has been little investigation into the data quality of sequence function annotations. Here we have developed a new method of estimating the error rate of curated sequence annotations, and applied this to the Gene Ontology (GO) sequence database (GOSeqLite). This method involved artificially adding errors to sequence annotations at known rates, and using regression to model the impact on the precision of annotations based on BLAST-matched sequences.
Year: 2007 PMID: 17519041 PMCID: PMC1892569 DOI: 10.1186/1471-2105-8-170
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Maximal precision estimates, regression coefficients and annotation error rate estimates by experiment type.
| Experiment | Mp1 (a) | Mp2 (b) | Regression intercept | Regression slope | E1, % (c) | E2, % (d) |
|---|---|---|---|---|---|---|
| Non-ISS annotation error | | | | | | |
| Cross-validation group 1 | 0.882 | 0.841 | 0.760 | -0.710 | 17 | 11 |
| Cross-validation group 2 | 0.880 | 0.849 | 0.756 | -0.703 | 18 | 13 |
| Cross-validation group 3 | 0.885 | 0.845 | 0.767 | -0.712 | 17 | 11 |
| Cross-validation group 4 | 0.882 | 0.855 | 0.757 | -0.704 | 18 | 14 |
| Cross-validation group 5 | 0.886 | 0.836 | 0.757 | -0.699 | 19 | 11 |
| Cross-validation group 6 | 0.876 | 0.850 | 0.749 | -0.698 | 18 | 15 |
| Cross-validation group 7 | 0.891 | 0.867 | 0.768 | -0.717 | 17 | 14 |
| Cross-validation group 8 | 0.884 | 0.864 | 0.758 | -0.709 | 18 | 15 |
| Cross-validation group 9 | 0.885 | 0.873 | 0.760 | -0.710 | 18 | 16 |
| Cross-validation group 10 | 0.873 | 0.833 | 0.748 | -0.697 | 18 | 12 |
| ISS annotation error | 0.442 | 0.443 | 0.305 | -0.282 | 49 | 49 |
a Mp1, the maximal precision estimate derived from the ratio of non-zero precision cases to total in the highest precision sample.
b Mp2, the maximal precision estimate derived from the highest precision UniProt-UniProt sequence matches.
c E1, the annotation error estimate based on Mp1.
d E2, the annotation error estimate based on Mp2.
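The error estimates in the table are consistent with the following reading. This is an inference from the numbers, not a formula quoted from the paper. Suppose the two regression columns are the intercept b0 and slope b1 of a straight line P(x) = b0 + b1·x fitted to precision P against the artificially added error rate x. Extrapolating that line backwards until it reaches the maximal precision Mp gives x = (Mp − b0)/b1. This value is negative, and its magnitude is the error already present in the annotations. The sketch below recomputes E1 and E2 for every row under that assumption. The function name `error_rate_estimate` is hypothetical.

```python
# Rows transcribed from the table:
# (experiment, Mp1, Mp2, intercept b0, slope b1, reported E1 %, reported E2 %).
# Reading columns 3 and 4 as b0 and b1 of a straight-line fit is an assumption.
ROWS = [
    ("Cross-validation group 1", 0.882, 0.841, 0.760, -0.710, 17, 11),
    ("Cross-validation group 2", 0.880, 0.849, 0.756, -0.703, 18, 13),
    ("Cross-validation group 3", 0.885, 0.845, 0.767, -0.712, 17, 11),
    ("Cross-validation group 4", 0.882, 0.855, 0.757, -0.704, 18, 14),
    ("Cross-validation group 5", 0.886, 0.836, 0.757, -0.699, 19, 11),
    ("Cross-validation group 6", 0.876, 0.850, 0.749, -0.698, 18, 15),
    ("Cross-validation group 7", 0.891, 0.867, 0.768, -0.717, 17, 14),
    ("Cross-validation group 8", 0.884, 0.864, 0.758, -0.709, 18, 15),
    ("Cross-validation group 9", 0.885, 0.873, 0.760, -0.710, 18, 16),
    ("Cross-validation group 10", 0.873, 0.833, 0.748, -0.697, 18, 12),
    ("ISS annotation error", 0.442, 0.443, 0.305, -0.282, 49, 49),
]


def error_rate_estimate(max_precision, intercept, slope):
    """Solve intercept + slope * x = max_precision for x (a negative added
    error rate) and return -x as a percentage."""
    x = (max_precision - intercept) / slope
    return -100.0 * x


for name, mp1, mp2, b0, b1, e1, e2 in ROWS:
    print(f"{name}: E1 = {error_rate_estimate(mp1, b0, b1):.1f}% (table {e1}%), "
          f"E2 = {error_rate_estimate(mp2, b0, b1):.1f}% (table {e2}%)")
```

For cross-validation group 1, this gives about 17.2% and 11.4%, compared with 17 and 11 in the table. Every row agrees with the reported values to within one percentage point. That is the level of agreement expected when the inputs are rounded to three decimals.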
Figure 1. Insertion of annotation errors. Errors were randomly inserted into the reference set annotations at a fixed error rate. The precision of the reference set annotations in predicting the annotations of query sequences was determined, and the average precision at that error rate was recorded. This process was repeated 100 times for each error rate value, after which the error rate was incremented. The process continued until data had been obtained for artificially increased error rates of between 2% and 40%.
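The loop described in the caption can be sketched as a small simulation. This is a toy model built on several assumptions, none of which come from the paper:

- Each sequence carries a single term drawn from a 50-term vocabulary.
- Each query is paired with one reference "best hit" instead of real BLAST output.
- A query shares its hit's term 85% of the time.
- The error rate steps by 2%.
- An ordinary least-squares line stands in for whatever regression the authors fitted.

All names (`insert_errors`, `precision`, `fit_line`, `simulate`) are hypothetical. The sketch shows only the shape of the procedure:

1. Corrupt the reference set at a fixed rate.
2. Measure precision.
3. Average over 100 repeats.
4. Step the rate from 2% to 40%.
5. Regress mean precision on the added error rate.

```python
import random

# Hypothetical toy vocabulary of GO-like terms (not real GO identifiers).
VOCABULARY = [f"GO:{i:07d}" for i in range(50)]
# For each term, the terms an inserted error may replace it with.
ALTERNATIVES = {t: [u for u in VOCABULARY if u != t] for t in VOCABULARY}


def insert_errors(annotations, error_rate, rng):
    """Copy `annotations`, replacing each term with a different random term
    with probability `error_rate`."""
    return {
        seq_id: rng.choice(ALTERNATIVES[term]) if rng.random() < error_rate else term
        for seq_id, term in annotations.items()
    }


def precision(queries, references, best_hit):
    """Fraction of queries whose matched reference term equals the query term."""
    correct = sum(references[best_hit[q]] == term for q, term in queries.items())
    return correct / len(queries)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def simulate(n_seqs=300, agreement=0.85, repeats=100, seed=0):
    rng = random.Random(seed)
    references = {f"ref{i}": rng.choice(VOCABULARY) for i in range(n_seqs)}
    # Stand-in for BLAST: query i is matched to reference i.
    best_hit = {f"q{i}": f"ref{i}" for i in range(n_seqs)}
    # A query shares its hit's term with probability `agreement`,
    # otherwise it gets a random term.
    queries = {
        q: references[r] if rng.random() < agreement else rng.choice(VOCABULARY)
        for q, r in best_hit.items()
    }
    rates = [round(0.02 * k, 2) for k in range(1, 21)]  # 2%, 4%, ..., 40%
    mean_precisions = []
    for rate in rates:
        runs = [precision(queries, insert_errors(references, rate, rng), best_hit)
                for _ in range(repeats)]
        mean_precisions.append(sum(runs) / repeats)
    return rates, mean_precisions


if __name__ == "__main__":
    rates, precisions = simulate()
    intercept, slope = fit_line(rates, precisions)
    print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this toy model, precision falls roughly linearly as errors are added, so the fitted slope comes out negative. The intercept and slope it prints have the same role as the regression columns in the table. They say nothing about the real GOSeqLite data.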