Diego Fuentes1,2, Manuel Molina1,2, Uciel Chorostecki1,2, Salvador Capella-Gutiérrez1, Marina Marcet-Houben1,2, Toni Gabaldón1,2,3.
Abstract
PhylomeDB is a unique knowledge base providing public access to minable and browsable catalogues of pre-computed genome-wide collections of annotated sequences, alignments and phylogenies (i.e. phylomes) of homologous genes, as well as to their corresponding phylogeny-based orthology and paralogy relationships. In addition, PhylomeDB trees and alignments can be downloaded for further processing to detect and date gene duplication events, infer past events of inter-species hybridization and horizontal gene transfer, as well as to uncover footprints of selection, introgression, gene conversion, or other relevant evolutionary processes in the genes and organisms of interest. Here, we describe the latest evolution of PhylomeDB (version 5). This new version includes a newly implemented web interface and several new functionalities such as optimized searching procedures, the possibility to create user-defined phylome collections, and a fully redesigned data structure. This release also represents a significant core data expansion, with the database providing access to 534 phylomes, comprising over 8 million trees, and homology relationships for genes in over 6000 species. This makes PhylomeDB the largest and most comprehensive public repository of gene phylogenies. PhylomeDB is available at http://www.phylomedb.org.
Year: 2022 PMID: 34718760 PMCID: PMC8728271 DOI: 10.1093/nar/gkab966
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) Example of the search functionality in PhylomeDB. Users can choose among four search approaches, shown in different tabs: search by gene (left-most tab), in which users can search for a gene tree by PhylomeID, gene name or external identifiers; search by sequence (middle-left), in which users provide a protein sequence that is used for a similarity search against sequences in PhylomeDB; search by phylome (middle-right), in which users can search for specific phylomes among those publicly available; and collection search (right-most), which allows users to restrict gene and phylome searches to specific collections. (B) Example of the integrated tree visualization showing the gene SHP1 from the yeast Saccharomyces cerevisiae. Several items can be distinguished. The top panel (I) allows the user to switch among available trees, including those in which the target protein sequence is the seed as well as those in which that sequence is present but is not the seed (i.e. collateral trees). The tool panel (II) above the tree has multiple elements: it can open a drop-down list of tree features to interact with during tree visualization, generate a hard link to the tree, download the tree image in .PNG format, or download the orthology relationships within the tree in OrthoXML format (48). The tree features pop-up (III) allows the user to change which attributes are displayed in the image. The search pop-up (IV) allows highlighting specific nodes that match the query term for different categories, such as the species name. In addition, clicking on nodes and leaves opens a pop-up menu with multiple options such as collapsing nodes, switching sister branches, rerooting, and more. There is also a domain and sequence panel (V) in which PFAM motifs are represented by different shapes, lengths and colors. Clicking a motif follows a direct link to the original PFAM entry. Finally, the tree legend (VI) indicates the rooting strategy followed in this tree and the classification of the other evolutionary events.
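As an illustration of how the OrthoXML download mentioned in the caption might be processed, the following minimal Python sketch extracts orthologous pairs from a small OrthoXML document. It assumes the general OrthoXML layout (`gene` elements carrying `id` and `protId`, and nested `orthologGroup` / `paralogGroup` elements containing `geneRef` entries). The embedded example document, its identifiers, and the function name `ortholog_pairs` are hypothetical, and files exported by PhylomeDB may contain additional elements that this sketch simply ignores.

```python
import itertools
import xml.etree.ElementTree as ET

# Hypothetical OrthoXML fragment: gene A_sp1 is orthologous to two
# in-paralogs (B_sp2, C_sp2) of another species.
EXAMPLE = """
<orthoXML xmlns="http://orthoXML.org/2011/" version="0.3">
  <species name="sp1" NCBITaxId="1">
    <database name="demo" version="1">
      <genes><gene id="1" protId="A_sp1"/></genes>
    </database>
  </species>
  <species name="sp2" NCBITaxId="2">
    <database name="demo" version="1">
      <genes><gene id="2" protId="B_sp2"/><gene id="3" protId="C_sp2"/></genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="g1">
      <geneRef id="1"/>
      <paralogGroup>
        <geneRef id="2"/>
        <geneRef id="3"/>
      </paralogGroup>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def ortholog_pairs(xml_text):
    """Return the set of orthologous protein pairs encoded in an OrthoXML string."""
    root = ET.fromstring(xml_text)
    # Map internal gene ids to protein identifiers.
    names = {g.get("id"): g.get("protId")
             for g in root.iter() if local(g.tag) == "gene"}
    pairs = set()

    def walk(node):
        # Collect the genes found under each child of this group.
        child_sets = []
        for child in node:
            kind = local(child.tag)
            if kind == "geneRef":
                child_sets.append({names[child.get("id")]})
            elif kind in ("orthologGroup", "paralogGroup"):
                child_sets.append(walk(child))
        # Genes in different children of an orthologGroup are orthologs;
        # genes separated by a paralogGroup are not recorded here.
        if local(node.tag) == "orthologGroup":
            for a, b in itertools.combinations(child_sets, 2):
                pairs.update(frozenset((x, y)) for x in a for y in b)
        return set().union(*child_sets)

    for node in root.iter():
        if local(node.tag) == "groups":
            for group in node:
                walk(group)
    return pairs
```

Calling `ortholog_pairs(EXAMPLE)` yields the pairs A_sp1/B_sp2 and A_sp1/C_sp2, while B_sp2 and C_sp2 are treated as paralogs and omitted.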
Figure 2. Plots depicting the results obtained in two test sets of the QFO 2020 benchmark when comparing different approaches to orthology prediction based on phylome information. The graphs correspond to the generalized species tree discordance test (G_STD2) run on the set of fungal (graphs A and C) and vertebrate (graphs B and D) orthology predictions. This test compares a gene tree reconstructed from the submitted orthology predictions to a pre-computed, binary species tree. The x-axis shows the number of completed gene trees obtained from the submitted orthologs, and the y-axis shows the Robinson–Foulds (RF) measure, which counts the bipartitions not shared between the species tree and the gene tree, normalized by the total number of bipartitions in both trees. Graphs A and B compare results obtained using four different rooting methods: in pink, rooting at the farthest sequence from an outgroup taxon (oldest); in orange, rooting at the leaf located farthest from the seed; in cyan, rooting based on minimizing the reconciliation cost; and in purple, midpoint rooting. Graphs C and D compare different ways to filter orthology predictions: in green, all orthologous pairs found; in brown, all orthologous pairs with a consistency score above 0.5; in yellow, all possible orthologous pairs involving the seed protein; and in blue, all orthologous pairs involving the seed and with a consistency score above 0.5. Grey dots represent results obtained by other methods and were extracted from the QFO public results 2020 (https://orthology.benchmarkservice.org/proxy/). The size of each dot is relative to the number of orthologs in the dataset. The square in graph D indicates which datasets fall in that region, since their dots overlap.
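To make the y-axis measure concrete, the sketch below computes a normalized Robinson–Foulds distance in its standard form: the number of non-trivial bipartitions present in one tree but not the other, divided by the total number of non-trivial bipartitions in both trees. Trees are represented as nested Python tuples of leaf names, which is an assumption made for illustration only. The function names are hypothetical, and the benchmark's own implementation may differ in detail.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def bipartitions(tree):
    """Return the non-trivial bipartitions (splits) induced by internal branches."""
    all_leaves = leaf_set(tree)
    splits = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(visit(child) for child in node))
        other = all_leaves - below
        # Keep only splits with at least two leaves on each side.
        # Storing both sides as an unordered pair makes the result
        # independent of where the tree happens to be rooted.
        if len(below) > 1 and len(other) > 1:
            splits.add(frozenset((below, other)))
        return below

    visit(tree)
    return splits


def normalized_rf(tree_a, tree_b):
    """Normalized RF distance: 0.0 for identical topologies, 1.0 for no shared splits."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must share the same leaf set")
    a, b = bipartitions(tree_a), bipartitions(tree_b)
    total = len(a) + len(b)
    return len(a ^ b) / total if total else 0.0


# Hypothetical five-taxon example.
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
print(normalized_rf(species_tree, gene_tree))  # 0.5
```

In this example each tree has two non-trivial bipartitions and the trees share one of them (ABC|DE), so two of the four bipartitions are unshared and the normalized distance is 0.5.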
Representative projects where PhylomeDB has been coupled to annotation and first analysis of newly sequenced genomes
| Species (common name) | PhylomeDB ID | Reference |
|---|---|---|
| | | |
| | 215–222 | |
| | 817 | |
| | 152 | |
| | 8–11 | |
| | 147 | |
| | | |
| 48 bird species | 225–230 | |
| | 18 | |
| | 583 and 584 | |
| | 277 and 278 | |
| | | |
| | 701–706 | |
| | 134–136 | |
| | 177 | |
| | 196 | |
| | 787 | |
| | | |
| | 279–283 | |
| | 252–255 | |
| | 233–236 | |
| | 777 | |
The first column gives the name of the species of interest for the project, the second column lists the phylome IDs of the phylomes reconstructed as part of the project, and the third column gives the reference to the publication.