Jaime Huerta-Cepas, Anibal Bueno, Joaquín Dopazo, Toni Gabaldón.
Abstract
The complete collection of evolutionary histories of all genes in a genome, also known as the phylome, constitutes a valuable source of information. The reconstruction of phylomes was previously hindered by large demands on time and computing power, but is now feasible thanks to recent developments in computers and algorithms. To provide a publicly available repository of complete phylomes that allows researchers to access and store large-scale phylogenomic analyses, we have developed PhylomeDB. PhylomeDB is a database of complete phylomes derived for different genomes within a specific taxonomic range. All phylomes in the database are built using a high-quality phylogenetic pipeline that includes evolutionary model testing and alignment trimming phases. For each genome, PhylomeDB provides the alignments, phylogenetic trees and tree-based orthology predictions for every encoded protein. The current version of PhylomeDB includes the phylomes of human, the yeast Saccharomyces cerevisiae and the bacterium Escherichia coli, comprising a total of 32 289 seed sequences with their corresponding alignments and 172 324 phylogenetic trees. PhylomeDB can be publicly accessed at http://phylomedb.bioinfo.cipf.es.
Year: 2007 PMID: 17962297 PMCID: PMC2238872 DOI: 10.1093/nar/gkm899
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Screenshot of a PhylomeDB entry page. The entry page of an individual seed protein (Hsa0017176) of the human phylome (Hsapiens001) includes information on the sequence and links to (i) the feature page containing all necessary information on the phylogenetic pipeline used; (ii) all the phylogenetic trees derived by the different programs and models used in the pipeline, in this case an NJ tree, ML trees derived from the JTT, WAG, RtREV and BLOSUM62 models, and the consensus tree derived from the Bayesian analyses; (iii) the raw and trimmed (clean) alignments; and (iv) the list of tree-based orthology predictions for the species included in the tree.
Current content in PhylomeDB
| Phylome code | Seed proteins | Species content | Total trees | Phylogenetic methods | Brief description |
|---|---|---|---|---|---|
| Hsapiens001 | 21 588 | 38 | 157 233 | NJ, Bayesian, ML (JTT, WAG, B62, RtREV, MtREV) | 38 eukaryotic species from Ensembl, Integr8 and 3 other sources. |
| Ecoli001 | 4604 | 421 | 9280 | NJ, ML (JTT, WAG) | 421 eukaryotic, archaeal and bacterial species from Integr8. |
| Scerevisiae001 | 5811 | 421 | 5811 | NJ, ML (JTT) | The same species set as Ecoli001. |
| Total | 32 003 | 443 | 172 324 | | |
For each phylome included in the current release of PhylomeDB, the PhylomeDB internal code, the number of seed proteins, the number of species included, the total number of phylogenetic trees, the phylogenetic reconstruction methods and a brief description are provided. Phylogenetic methods are indicated as follows: Neighbor Joining (NJ), Bayesian analysis (Bayesian) and Maximum Likelihood (ML), which can be performed using the JTT, WAG, BLOSUM62 (B62), RtREV and MtREV evolutionary models. Bayesian analysis was always performed using the evolutionary model that rendered the best likelihood in the ML analysis.
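The rule in the last sentence, running the Bayesian analysis with whichever model gave the best ML likelihood, amounts to taking a maximum over per-model log-likelihoods. A minimal Python sketch follows; the function name `best_model` and the log-likelihood values are hypothetical illustrations, not PhylomeDB output or part of its pipeline code.

```python
def best_model(log_likelihoods):
    """Return the model name with the highest (least negative) log-likelihood."""
    if not log_likelihoods:
        raise ValueError("no ML results to choose from")
    return max(log_likelihoods, key=log_likelihoods.get)


# Hypothetical log-likelihoods for one seed protein (made-up numbers).
example_lnl = {
    "JTT": -10234.7,
    "WAG": -10198.2,
    "BLOSUM62": -10310.5,
    "RtREV": -10402.9,
    "MtREV": -10655.1,
}

chosen = best_model(example_lnl)
print(chosen)
```

With these invented values the sketch would pick WAG, so under the stated rule the Bayesian run for that protein would use WAG.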
Figure 3. Visualization of phylogenetic trees: different views of the ML phylogenetic tree, built with the JTT evolutionary model, for the seed protein (Hsa0017176), as displayed by the Environment for Tree Exploration (ETE) tool. Trees can be represented in rectangular (A), circular (B) or radial (C) modes. Information labels such as support values or branch distances (A) can be displayed on nodes and edges, which can also be colored to indicate different evolutionary events. In (A) an example is shown where the lineages leading to the seed protein (Hsa0017176) are colored in blue or red to mark duplication and speciation events, respectively. This coloring is shown after running the orthology prediction algorithm by clicking on the ‘orthologs’ icon. Using the ETE toolbar (icons at the top of section A), the user can navigate within the tree and browse it using different visualization options.
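The caption distinguishes duplication from speciation nodes but does not say how the orthology prediction algorithm makes that call. As an illustration only, the sketch below assumes a species-overlap rule: a node is labelled a duplication when its two child subtrees share at least one species, and a speciation otherwise. The nested-tuple tree encoding, the function names and the toy leaf names are hypothetical; species are read from the three-letter prefix of each leaf name.

```python
def species_of(leaf_name):
    """Species code = first three letters of the leaf name (assumed convention)."""
    return leaf_name[:3]


def label_events(tree, events=None):
    """Walk a binary tree given as nested 2-tuples with string leaves.

    Returns (species_set, events), where events lists 'D' (duplication) or
    'S' (speciation) for every internal node in post-order.
    """
    if events is None:
        events = []
    if isinstance(tree, str):  # leaf
        return {species_of(tree)}, events
    left, right = tree
    left_species, _ = label_events(left, events)
    right_species, _ = label_events(right, events)
    # Shared species on both sides -> duplication under the assumed rule.
    events.append("D" if left_species & right_species else "S")
    return left_species | right_species, events


# Toy tree with made-up leaf names: two "Hsa" genes, each paired with an "Mmu" gene.
toy = (("Hsa0000001", "Mmu0000001"), ("Hsa0000002", "Mmu0000002"))
species, events = label_events(toy)
print(events)  # ['S', 'S', 'D']
```

In the toy tree the two inner nodes separate different species and are labelled speciations, while the root joins two subtrees that both contain `Hsa` and `Mmu` and is labelled a duplication.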
Figure 2. Visualization of multiple sequence alignments: the trimmed (clean) multiple sequence alignment of the example entry (Hsa0017176) as displayed by JalView (17). Sequence names are indicated by PhylomeID names, which consist of a three-letter species code followed by a number. The level of sequence conservation and the quality of each column, as well as a consensus sequence, are indicated below the alignment. Colors, sequence format and visualization options can be changed using the JalView interface.
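The naming scheme described in the caption (a three-letter species code followed by a number) can be split mechanically. The sketch below assumes identifiers look exactly like the example `Hsa0017176`; the helper name `parse_phylome_id` and the regular expression are illustrative assumptions, not part of PhylomeDB.

```python
import re

# Assumed pattern: three letters (species code) followed by one or more digits.
_ID_PATTERN = re.compile(r"^([A-Za-z]{3})(\d+)$")


def parse_phylome_id(identifier):
    """Split an identifier such as 'Hsa0017176' into (species_code, number)."""
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognised identifier: {identifier!r}")
    species_code, digits = match.groups()
    return species_code, int(digits)


print(parse_phylome_id("Hsa0017176"))  # ('Hsa', 17176)
```

Converting the numeric part to an integer drops the leading zeros; keep `digits` as a string instead if the zero-padded form is needed for lookups.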