Andrey M Leontovich, Konstantin Y Tokmachev, Hans C van Houwelingen.
Abstract
BACKGROUND: This paper discusses the problem of automated annotation. It is a continuation of the previous work on the A4-algorithm (Adaptive algorithm of automated annotation) developed by Leontovich and others.
Year: 2008 PMID: 18211675 PMCID: PMC2267706 DOI: 10.1186/1471-2105-9-31
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A4 algorithm scheme.
Certain characteristics of sequences used for the testing
| Total number of query sequences, i.e., sequences used during the main testing | 308 |
| Total number of words that belong to query sequences (n1) | 1176 |
| Including: non-degenerate words | 824 |
| degenerate words | 352 |
| Total number of words for which the prediction was performed (n) | 9236 |
| Including: non-degenerate words | 7207 |
| degenerate words | 2029 |
Table 2. Testing results for different basic variants of the statistics.
| Statistic | N1 | N2 | N | P(1) + P(2) | P(1) + P(+) |
| | 130 | 104 | 234 | 0.123 | 0.201 |
| | 136 | 106 | 242 | 0.129 | 0.208 |
| | 164 | 103 | 267 | 0.152 | 0.232 |
| | 216 | 103 | 319 | 0.196 | 0.281 |
| | 180 | 141 | 321 | 0.171 | 0.277 |
| | 200 | 124 | 324 | 0.185 | 0.283 |
| | 234 | 111 | 345 | 0.213 | 0.304 |
| | 205 | 148 | 353 | 0.193 | 0.307 |
| | 142 | 217 | 359 | 0.148 | 0.294 |
| | 164 | 224 | 388 | 0.167 | 0.321 |
The table shows the testing results of the different statistics for n = 9236 'predictions'. In n1 = 1176 cases the "predicted word" belonged to the sequence; in n - n1 = 8060 cases it did not. N1 denotes the number of false negatives and N2 the number of false positives. N = N1 + N2, P(1) = N1/n1, P(2) = N2/(n - n1), and P(+) = N2/(N2 + (n1 - N1)). See the main text for the description of the variants.
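The definitions in this caption can be checked numerically. The following is a minimal sketch, assuming that 1176 is the count of cases where the word is present and 8060 the count where it is absent, as the caption states; the function and constant names are illustrative and do not come from the paper. Under these assumptions, the last two numeric columns of each row appear to equal P(1) + P(2) and P(1) + P(+); this is an inference from the arithmetic, not something the caption says.

```python
# Counts quoted in the caption: 9236 predictions in total,
# 1176 of them with the word present in the sequence.
N_TOTAL = 9236
N_PRESENT = 1176
N_ABSENT = N_TOTAL - N_PRESENT  # 8060


def error_rates(fn, fp):
    """Return (P1, P2, P_plus) for fn false negatives (N1) and fp false positives (N2)."""
    p1 = fn / N_PRESENT                     # P(1) = N1 / 1176
    p2 = fp / N_ABSENT                      # P(2) = N2 / 8060
    p_plus = fp / (fp + (N_PRESENT - fn))   # P(+) = N2 / (N2 + (1176 - N1))
    return p1, p2, p_plus


# First row of the results table: N1 = 130, N2 = 104.
p1, p2, p_plus = error_rates(130, 104)
print(round(p1 + p2, 3), round(p1 + p_plus, 3))  # prints: 0.123 0.201
```

The printed values match the last two entries of the first table row (0.123 and 0.201), and the same check reproduces the final row for N1 = 164, N2 = 224.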
Layout of the confusion table for the results of Table 2.
| | word "predicted" | word "not predicted" | total |
| word present | 1176 - N1 | N1 | 1176 |
| word absent | N2 | 8060 - N2 | 8060 |
| total | 1176 - N1 + N2 | 8060 - N2 + N1 | 9236 |
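As an illustration of this layout, the sketch below fills in the four cells and the marginal totals from a pair of error counts. It assumes that N1 counts false negatives (word present but not predicted) and N2 counts false positives (word absent but predicted), with 1176 present and 8060 absent cases. The function name and dictionary keys are hypothetical.

```python
def confusion_table(fn, fp, n_present=1176, n_absent=8060):
    """Build the 2x2 confusion table with marginal totals from error counts."""
    tp = n_present - fn   # word present and predicted
    tn = n_absent - fp    # word absent and not predicted
    return {
        "present": {"predicted": tp, "not_predicted": fn, "total": n_present},
        "absent": {"predicted": fp, "not_predicted": tn, "total": n_absent},
        "total": {
            "predicted": tp + fp,
            "not_predicted": fn + tn,
            "total": n_present + n_absent,
        },
    }


# Example with N1 = 130 false negatives and N2 = 104 false positives.
table = confusion_table(130, 104)
for row, cells in table.items():
    print(row, cells)
```

For these counts, the "predicted" column sums to 1046 + 104 = 1150 and the grand total is 9236.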
Figure 2. ROC curves for the best variants of the statistics. The figure shows the ROC curves (1–5) for the best variants of the statistics (i.e., for variants marked in Table 2). Curve 1 corresponds to η[1,1]; curve 2 corresponds to T(1)[1,0]; curve 3 corresponds to T(2)[1,1]; curve 4 corresponds to [0,1]; curve 5 corresponds to q.