Estienne C Swart, Winston A Hide, Cathal Seoighe.
Abstract
BACKGROUND: Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However, the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates, provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account.
Year: 2004 PMID: 15005802 PMCID: PMC344743 DOI: 10.1186/1471-2105-5-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flow diagram illustrating the strategy used to generate in-frame alignments from fragmentary sequences.
Figure 2. Screenshot of the user interface to the MySQL database. The screenshot shows a subset of the data available from the substitution statistics table and also illustrates the query functionality available from the web-based phpMyAdmin interface [41].
Table 1. Substitution rates derived from unconcatenated alignments. The table shows values obtained by averaging over individual in-frame alignments of sequence fragments. The alignments were derived from the chimpanzee BESs with and without the removal of low-quality sequence regions, annotated coding sequences retrieved from Genbank and coding sequence alignments derived from ESTs deposited in Genbank. Standard deviations from the mean are given in brackets.
| BES (quality filtered) | 13930 | 0.932 | 0.012 (0.023) | 0.090 (0.520) | 0.301 (0.43) |
| BES (unfiltered) | 42477 | 0.935 | 0.022 (0.021) | 0.076 (0.314) | 0.500 (0.53) |
| Genbank CDSs | 25488 | 0.966 | 0.010 (0.014) | 0.028 (0.038) | 0.461 (0.55) |
| Genbank ESTs | 114759 | 0.972 | 0.008 (0.002) | 0.048 (0.210) | 0.235 (0.01) |
Table 2. Substitution rates derived from concatenated alignments. Results derived from concatenated alignments from the same datasets as in Table 1 are shown. Sampling error, in parentheses, was estimated using the bootstrap method described in the Implementation section.
| BES (quality filtered) | 19490 | 0.959 | 0.012 (0.001) | 0.033 (0.002) | 0.374 (0.04) |
| BES (unfiltered) | 45702 | 0.935 | 0.022 (0.001) | 0.046 (0.002) | 0.491 (0.02) |
| Genbank CDSs | 26019 | 0.966 | 0.009 (0.001) | 0.020 (0.001) | 0.473 (0.05) |
| Genbank ESTs | 125198 | 0.958 | 0.006 (0.0003) | 0.026 (0.001) | 0.235 (0.01) |
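The caption for the concatenated alignments refers to a bootstrap estimate of sampling error, but the Implementation section that describes it is not reproduced here. As a rough illustration only, the following Python sketch shows one generic way to bootstrap a pooled estimate: resample whole alignments with replacement, recompute the pooled value for each replicate, and take the standard deviation across replicates. It assumes a simple proportion of differing sites as the statistic rather than the substitution-rate estimators used in the paper, and the function names (`pooled_rate`, `bootstrap_se`) and the toy data are hypothetical.

```python
import random
import statistics


def pooled_rate(alignments):
    """Pooled estimate: total differences divided by total sites.

    Each alignment is a (differences, sites) pair. This simple proportion
    is an assumption for illustration, not the paper's estimator.
    """
    total_diffs = sum(diffs for diffs, _ in alignments)
    total_sites = sum(sites for _, sites in alignments)
    return total_diffs / total_sites


def bootstrap_se(alignments, n_replicates=1000, seed=0):
    """Standard deviation of the pooled estimate over bootstrap replicates.

    Each replicate resamples whole alignments with replacement, keeping
    the number of alignments fixed.
    """
    rng = random.Random(seed)
    n = len(alignments)
    estimates = []
    for _ in range(n_replicates):
        sample = [alignments[rng.randrange(n)] for _ in range(n)]
        estimates.append(pooled_rate(sample))
    return statistics.stdev(estimates)


# Hypothetical toy data: (differences, sites) for five alignments.
toy = [(2, 150), (0, 90), (5, 300), (1, 120), (3, 210)]

print(pooled_rate(toy))
print(bootstrap_se(toy, n_replicates=200, seed=1))
```

Resampling whole alignments, rather than individual sites, treats each alignment as the unit of observation, which is one plausible reading of a bootstrap over a concatenated dataset; the paper's actual procedure may differ.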