E Michael Gertz, Yi-Kuo Yu, Richa Agarwala, Alejandro A Schäffer, Stephen F Altschul.
Abstract
BACKGROUND: TBLASTN is a mode of operation for BLAST that aligns protein sequences to a nucleotide database translated in all six frames. We present the first description of the modern implementation of TBLASTN, focusing on new techniques that were used to implement composition-based statistics for translated nucleotide searches. Composition-based statistics use the composition of the sequences being aligned to generate more accurate E-values, which in turn allows true and false matches to be distinguished more reliably. Until recently, composition-based statistics were available only for protein-protein searches. They are now available as a command line option for recent versions of TBLASTN and as an option for TBLASTN on the NCBI BLAST web server.
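TBLASTN compares a protein query with every reading frame of each database sequence: three frames on the given strand and three on its reverse complement. The sketch below illustrates six-frame translation under simplifying assumptions. It uses only the standard genetic code (NCBI translation table 1), maps any codon containing a non-ACGT character to `X`, and labels frames with hypothetical keys `+1` to `-3`. The function names are illustrative and are not part of BLAST's code or API; BLAST handles alternative genetic codes and nucleotide ambiguity codes in its own way.

```python
# Minimal six-frame translation sketch (standard genetic code only).
BASES = "TCAG"
# Amino acids for NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Reverse complement; characters other than A, C, G, T are left unchanged."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_frame(dna: str) -> str:
    """Translate codon by codon from position 0; an incomplete trailing codon is dropped.
    Codons containing anything other than A, C, G, T are translated as 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the three forward (+1..+3) and three reverse (-1..-3) frame translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate_frame(dna[offset:])
        frames[f"-{offset + 1}"] = translate_frame(rc[offset:])
    return frames


print(six_frame_translation("ATGGCCTAA"))
```

For example, frame `+1` of `ATGGCCTAA` translates to `MA*`, and the first reverse-strand frame (read from `TTAGGCCAT`) translates to `LGH`. A protein query would then be aligned against each of the six translated strings.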
Year: 2006 PMID: 17156431 PMCID: PMC1779365 DOI: 10.1186/1741-7007-4-41
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Figure 1: Statistical accuracy of three variants of TBLASTN. One thousand queries were randomly selected from mouse proteins, permuted, and aligned to human nuclear DNA. For each variant, we plot, as a function of x, the number of queries with a P-value less than or equal to x. The solid line is the theoretically ideal distribution of these values.
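The ideal curve in Figure 1 follows from a calibration argument: if the P-values reported for unrelated (here, permuted) queries are accurate, they are uniformly distributed on [0, 1], so out of N queries about N·x should have a P-value at or below x. A minimal sketch of that comparison follows. It assumes each query contributes a single P-value (for example, that of its best hit; the caption does not say how each query's P-value was chosen), and the function names are hypothetical.

```python
import random
from bisect import bisect_right


def cumulative_counts(p_values, thresholds):
    """For each threshold x, count the queries whose P-value is <= x."""
    ordered = sorted(p_values)
    # bisect_right returns how many sorted values are <= x
    return [bisect_right(ordered, x) for x in thresholds]


def ideal_counts(n_queries, thresholds):
    """Expected counts if P-values are exactly uniform on [0, 1]: n_queries * x."""
    return [n_queries * x for x in thresholds]


if __name__ == "__main__":
    random.seed(1)
    # Simulated, perfectly calibrated P-values for 1000 unrelated queries
    simulated = [random.random() for _ in range(1000)]
    xs = [0.001, 0.01, 0.1, 1.0]
    print(cumulative_counts(simulated, xs))  # should track the ideal counts closely
    print(ideal_counts(len(simulated), xs))
```

A variant whose curve lies above the ideal line assigns too many queries small P-values, so it overstates significance; a curve below the line indicates conservative P-values.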
Figure 2: A portion of the ROC curves for three variants of TBLASTN. The ROC curves were generated by analyzing the results of aligning 102 queries against the yeast genome. The ROC-250 score for each variant of TBLASTN is given in parentheses after its name in the legend. True positives are plotted against false positives on a linear scale. The total number of true positives possible in this test set was 988. Inset: part of the same ROC curves, plotted on a different scale to show the separation between curves.
Figure 3: A semi-log plot of a portion of the ROC curves for three variants of TBLASTN. The data are the same as in Figure 2, replotted on a semi-log scale in terms of coverage and errors per query.
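Figure 3 re-expresses the ROC curves in normalized units. Assuming the usual definitions, coverage is the fraction of the 988 possible true positives that have been found, and errors per query is the number of false positives divided by the 102 queries. The conversion is simple arithmetic; the function names below are hypothetical.

```python
# Counts taken from the Figure 2 caption.
TOTAL_TRUE_POSITIVES = 988
NUM_QUERIES = 102


def to_coverage_epq(tp, fp, total_tp=TOTAL_TRUE_POSITIVES, n_queries=NUM_QUERIES):
    """Convert raw (true positive, false positive) counts to (coverage, errors per query)."""
    return tp / total_tp, fp / n_queries


def curve_to_coverage_epq(points):
    """Convert a ROC curve given as (tp, fp) pairs into (coverage, errors per query) pairs."""
    return [to_coverage_epq(tp, fp) for tp, fp in points]


print(to_coverage_epq(494, 51))  # (0.5, 0.5)
```

Normalizing this way makes curves from test sets of different sizes comparable, since coverage runs from 0 to 1 and errors per query does not grow with the number of queries.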
ROC scores for three variants of TBLASTN at several thresholds of false positive matches (ROC-50, ROC-150 and ROC-250). These scores were generated by analyzing the results of aligning 102 queries against the yeast genome.
| Program | ROC-50 | ROC-150 | ROC-250 |
|---|---|---|---|
| B-TBLASTN | 0.458 ± 0.004 | 0.478 ± 0.002 | 0.484 ± 0.001 |
| S-TBLASTN | 0.454 ± 0.004 | 0.471 ± 0.002 | 0.478 ± 0.001 |
| C-TBLASTN | 0.455 ± 0.004 | 0.474 ± 0.002 | 0.481 ± 0.001 |
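The ROC-50, ROC-150 and ROC-250 columns summarize how many true positives are ranked ahead of the first 50, 150 or 250 false positives. A commonly used definition is ROC_n = (1/(nT)) Σ_{i=1..n} t_i, where t_i is the number of true positives ranked ahead of the i-th false positive and T is the total number of true positives (988 in this test set). The sketch below applies that definition to a single ranked list of hits. This record does not describe how the authors pooled the 102 queries or how the ± uncertainties were estimated, so the sketch is illustrative only, and the function name `roc_n` is hypothetical.

```python
def roc_n(is_true_positive, n, total_true):
    """ROC_n score for hits sorted from most to least significant.

    is_true_positive: sequence of booleans, True for a true positive hit.
    n: number of false positives at which to stop (e.g. 50, 150, 250).
    total_true: total number of true positives possible in the test set.
    """
    if n <= 0 or total_true <= 0:
        raise ValueError("n and total_true must be positive")
    tp_seen = 0
    fp_seen = 0
    area = 0  # running sum of t_i over the false positives seen so far
    for hit_is_true in is_true_positive:
        if hit_is_true:
            tp_seen += 1
        else:
            fp_seen += 1
            area += tp_seen  # t_i: true positives ranked ahead of this false positive
            if fp_seen == n:
                break
    # Convention: missing false positives are treated as ranked after every listed hit
    area += (n - fp_seen) * tp_seen
    return area / (n * total_true)


print(roc_n([True, True, False, True, False], n=2, total_true=4))  # 0.625
```

Hits must already be sorted from most to least significant (for instance by increasing E-value). When the list contains fewer than n false positives, the sketch treats the missing ones as ranked after every listed hit, which is one common convention; other conventions would give slightly different scores.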