Andrew Ndhlovu, Pierre M Durand, Scott Hazelhurst.
Abstract
The evolutionary rate at codon sites across protein-coding nucleotide sequences represents a valuable tier of information for aligning sequences, inferring homology and constructing phylogenetic profiles. However, no comprehensive resource has been developed for cataloguing the evolutionary rate at codon sites together with the corresponding nucleotide and protein domain sequence alignments. To address this gap, EvoDB (an Evolutionary rates DataBase) was compiled. Nucleotide sequences and their corresponding protein domain data, including the associated seed alignments from the PFAM-A (protein family) database, were used to estimate evolutionary rate (ω = dN/dS) profiles at codon sites for each entry. EvoDB contains 98.83% of the gapped nucleotide sequence alignments and 97.1% of the evolutionary rate profiles for the corresponding information in PFAM-A. Identifying codon sites under positive selection, and their positions in a sequence profile, is usually the information most sought after by molecular evolutionary biologists. For this reason, evolutionary rate profiles were determined under the M2a model using the CODEML program in the PAML (Phylogenetic Analysis by Maximum Likelihood) software suite. Nucleotide sequences were validated against the amino acid data to ensure high data quality. EvoDB catalogues the evolutionary rate profiles and provides the corresponding phylogenetic trees, PFAM-A alignments and annotated accession identifier data. In addition, the database can be explored and queried using known evolutionary rate profiles to identify domains under similar evolutionary constraints and pressures. EvoDB is a resource for evolutionary and phylogenetic studies and presents a tier of information untapped by current databases.
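Under M2a, each codon site is assumed to fall into one of three classes with ω0 < 1, ω1 = 1 and ω2 > 1, in proportions p0, p1 and p2. A site-wise ω profile can then be summarised as the posterior mean ω at each site, using P(k | x_h) = p_k f(x_h | ω_k) / Σ_j p_j f(x_h | ω_j). The sketch below illustrates only this naive empirical Bayes (NEB) step. It assumes the per-site, per-class likelihoods f(x_h | ω_k) have already been computed. CODEML computes these internally and also reports a Bayes empirical Bayes (BEB) variant, which is not reproduced here. The function names and all numbers are hypothetical.

```python
import math
from typing import Sequence


def neb_site_posteriors(
    proportions: Sequence[float],
    site_log_likelihoods: Sequence[Sequence[float]],
) -> list[list[float]]:
    """Posterior probability of each M2a site class at each codon site (NEB).

    proportions[k] is p_k; site_log_likelihoods[h][k] is log f(x_h | omega_k),
    assumed to be precomputed (hypothetical input, not read from CODEML).
    """
    posteriors = []
    for site in site_log_likelihoods:
        # log(p_k) + log f(x_h | omega_k); a class with p_k = 0 gets -inf.
        log_joint = [
            (math.log(p) if p > 0 else -math.inf) + ll
            for p, ll in zip(proportions, site)
        ]
        # Subtract the maximum before exponentiating to avoid underflow.
        top = max(log_joint)
        weights = [math.exp(v - top) for v in log_joint]
        total = sum(weights)
        posteriors.append([w / total for w in weights])
    return posteriors


def posterior_mean_omega(
    posteriors: Sequence[Sequence[float]], omegas: Sequence[float]
) -> list[float]:
    """Posterior mean omega per site: sum_k P(k | x_h) * omega_k."""
    return [sum(pk * wk for pk, wk in zip(post, omegas)) for post in posteriors]


# Made-up M2a estimates (p0, p1, p2) and (omega0, omega1 = 1, omega2).
proportions = [0.7, 0.2, 0.1]
omegas = [0.05, 1.0, 3.5]
site_ll = [[-10.0, -12.0, -15.0], [-22.0, -19.0, -14.0]]

post = neb_site_posteriors(proportions, site_ll)
profile = posterior_mean_omega(post, omegas)
# Sites (1-based) with P(omega2 class) > 0.95, a commonly used cutoff.
flagged = [h + 1 for h, pr in enumerate(post) if pr[2] > 0.95]
print(profile, flagged)
```

With these illustrative inputs, only the second site is flagged as likely under positive selection. Its posterior mean ω is close to ω2, whereas the first site's is close to ω0.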
Year: 2015 PMID: 26140928 PMCID: PMC4492416 DOI: 10.1093/database/bav065
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Workflow for the development and compilation of EvoDB.
Statistics on EvoDB coverage of the PFAM-A seed alignments database
| Sequence data | Pandit (number) | EvoDB (number) | Pandit (%) | EvoDB (%) |
|---|---|---|---|---|
| Evolutionary rate (ω MLE) profiles | — | 13 277 | — | 97.1 |
| Nucleotide sequence alignments | 7738 | 13 512 | 56.6 | 98.83 |
| Nucleotide sequences | 174 760 | 501 375 | 25.8 | 74 |
Corresponding counts for Pandit (Pandit-Plus) are given for comparison. Percentages are relative to the total number of each item in the PFAM-A seed alignments database.
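The PFAM-A totals behind these percentages are not listed in the table. As a rough consistency check, the sketch below back-calculates an approximate total from each EvoDB count and its rounded percentage. It then recomputes the Pandit percentage against that total. The helper names are hypothetical, and because the published percentages are rounded, the derived totals are approximations rather than reported figures.

```python
# Counts from the table: (Pandit count, EvoDB count, EvoDB % of PFAM-A).
rows = {
    "alignments": (7738, 13_512, 98.83),
    "sequences": (174_760, 501_375, 74.0),
}


def implied_total(count: float, percent: float) -> float:
    """Approximate PFAM-A total implied by a count and its (rounded) percentage."""
    return count / (percent / 100)


def coverage(count: float, total: float) -> float:
    """Percentage of the total represented by count."""
    return 100 * count / total


for name, (pandit, evodb, evodb_pct) in rows.items():
    total = implied_total(evodb, evodb_pct)
    print(f"{name}: PFAM-A total ~{total:,.0f}, "
          f"Pandit coverage ~{coverage(pandit, total):.1f}%")
```

The recomputed Pandit coverages (about 56.6% and 25.8%) agree with the table. This supports the reading that both columns are expressed against the same PFAM-A totals.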
Figure 2. The EvoDB web interface allows data to be queried and downloaded easily. The database can be queried using PFAM-A domain identifiers and accession identifiers; the results shown here are for the tumor suppressor p53 domain. The CODEML ‘mlc’ and ‘rst’ output files for the M1a and M2a models are provided, together with a summary of the results for viewing. Plots of evolutionary rate profiles can be viewed or downloaded in several image file formats. EvoDB also provides an interface for downloading the nucleotide sequences corresponding to PFAM protein domain families.
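Because results are provided for both M1a and M2a, a common next step is the likelihood ratio test between the two nested models. It uses the statistic 2Δℓ = 2(ℓ_M2a − ℓ_M1a), compared against a χ² distribution with 2 degrees of freedom, since M2a adds two free parameters. For 2 degrees of freedom the χ² survival function is exactly exp(−x/2), so no statistics library is needed. This is a sketch: the regular expression assumes the log-likelihood line in an ‘mlc’ file looks like `lnL(ntime: 15  np: 20):  -1234.567890  +0.000000`, which should be checked against real EvoDB downloads. The helper names and log-likelihood values are hypothetical.

```python
import math
import re

# Assumed format of the log-likelihood line in a CODEML 'mlc' file, e.g.
# "lnL(ntime: 15  np: 20):  -1234.567890      +0.000000"
LNL_PATTERN = re.compile(r"lnL\(ntime:\s*\d+\s+np:\s*(\d+)\):\s+(-?\d+(?:\.\d+)?)")


def read_lnl(mlc_text: str) -> tuple[int, float]:
    """Return (number of parameters, log-likelihood) from mlc text."""
    match = LNL_PATTERN.search(mlc_text)
    if match is None:
        raise ValueError("no lnL line found")
    return int(match.group(1)), float(match.group(2))


def m1a_vs_m2a_lrt(lnl_m1a: float, lnl_m2a: float) -> tuple[float, float]:
    """LRT statistic and p-value against a chi-square with 2 df."""
    # Clamp at zero: a slightly negative value only reflects optimiser noise.
    stat = max(0.0, 2.0 * (lnl_m2a - lnl_m1a))
    # Chi-square survival function for 2 df: P(X > x) = exp(-x / 2).
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


# Made-up mlc lines for illustration only.
m1a = "lnL(ntime: 15  np: 18):  -2310.402000      +0.000000"
m2a = "lnL(ntime: 15  np: 20):  -2301.902000      +0.000000"
np1, l1 = read_lnl(m1a)
np2, l2 = read_lnl(m2a)
stat, p = m1a_vs_m2a_lrt(l1, l2)
print(f"2dlnL = {stat:.2f}, df = {np2 - np1}, p = {p:.2e}")
```

A small p-value favours M2a over M1a, that is, evidence for a class of sites with ω > 1. Per-site identification would then rely on the posterior probabilities reported in the ‘rst’ file.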