Peifen Zhang, Tanya Z Berardini, Dustin Ebert, Qian Li, Huaiyu Mi, Anushya Muruganujan, Trilok Prithvi, Leonore Reiser, Swapnil Sawant, Paul D Thomas, Eva Huala.
Abstract
We aim to enable the accurate and efficient transfer of knowledge about gene function gained from Arabidopsis thaliana and other model organisms to other plant species. This knowledge transfer is frequently challenging in plants due to duplications of individual genes and whole genomes in plant lineages. Such duplications result in complex evolutionary relationships between related genes, which may have similar sequences but highly divergent functions. In such cases, functional inference requires more than a simple sequence similarity calculation. We have developed an online resource, PhyloGenes (phylogenes.org), that displays precomputed phylogenetic trees for plant gene families along with experimentally validated function information for individual genes within the families. A total of 40 plant genomes and 10 non-plant model organisms are represented in over 8,000 gene families. Evolutionary events such as speciation and duplication are clearly labeled on gene trees to distinguish orthologs from paralogs. Nearly 6,000 families have at least one member with an experimentally supported annotation to a Gene Ontology (GO) molecular function or biological process term. By displaying experimentally validated gene functions associated with individual genes within a tree, PhyloGenes enables functional inference for genes of uncharacterized function, based on their evolutionary relationships to experimentally studied genes, in a visually traceable manner. For the many families containing genes that have evolved to perform different functions, PhyloGenes facilitates the use of evolutionary history to determine the most likely function of genes that have not been experimentally characterized. Future work will enrich the resource by incorporating additional gene function datasets such as plant gene expression atlas data.
Year: 2020 PMID: 33392435 PMCID: PMC7773024 DOI: 10.1002/pld3.293
Source DB: PubMed Journal: Plant Direct ISSN: 2475-4455
Represented taxa and representation among gene families
| Category | Count |
|---|---|
| Number of species in PhyloGenes | 50 |
| Dicots | 25 |
| Monocots | 10 |
| Basal flowering plants | 1 |
| Spike mosses | 1 |
| Mosses | 1 |
| Green algae | 2 |
| Animals | 6 |
| Fungi | 2 |
| Other Eukaryotes | 1 |
| Bacteria | 1 |
| Genes (protein‐coding) | 1,259,624 |
| Taxonomic range of gene families | |
| All kingdoms | 1,061 |
| Eukaryotes | 3,608 |
| Viridiplantae | 858 |
| Embryophyta | 1,172 |
| Tracheophyta | 176 |
| Magnoliophyta | 461 |
| Fabids | 7 |
| Brassicaceae | 9 |
| Solanaceae | 2 |
| Poaceae | 72 |
| Chlorophyta | 45 |
| Other | 1,048 |
FIGURE 1. Species tree of the 50 genomes included in PhyloGenes 2.1 (a) and the distribution of gene family sizes (b).
FIGURE 2. A screenshot of a gene family page showing the tree graph panel on the left and the gene information panel on the right. Red arrows indicate (a) an ‘expand all’ icon; (b) column settings control; (c) toggle to view multiple sequence alignment; (d) option for pruning tree by removing species; and (e) operations button for accessing tools to highlight, prune, download, or save the tree. Yellow flasks indicate genes with experimentally determined function; green squares indicate phylogenetically inferred functions from PAINT annotation.
Comparison of PhyloGenes to similar resources
| | PhyloGenes | PANTHER | PLAZA | EnsemblPlants | Phytozome | OrthoDB |
|---|---|---|---|---|---|---|
| No. of plant genomes | 40 | 40 | 127 | 79 | 93 | 117 |
| No. of non‐plant genomes | 10 | 102 | 0 | 5 | 0 | 7,167 |
| Displays phylogenetic tree | | | | | | |
| User can add sequences to tree | | | | | | |
| User can remove sequences from tree | | | | | | |
| Displays known function of family members next to tree | | | | | | |
| Other gene information displayed next to tree | MSA | MSA | protein domain, gene structure | aligned region | | |
PhyloGenes and PANTHER can add user sequences (one at a time) by tree grafting without altering the tree's original topology. PLAZA can add multiple user sequences to a tree by reconstructing the tree.
PhyloGenes lets users remove sequences from species not of interest without changing the tree's original topology. PLAZA removes sequences and then reconstructs the tree.
PLAZA and Ensembl Plants both display known gene functions as GO terms with experimental evidence codes. However, the information is shown on pages separate from the gene trees.
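The species-based pruning described in the notes above can be illustrated with a minimal sketch. This is not the PhyloGenes implementation: the `Node` class, the `prune` and `newick` helpers, and the gene names are all hypothetical, and collapsing single-child internal nodes after removal is an assumption about how the remaining branching order could be preserved without rebuilding the tree.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """A gene-tree node (hypothetical structure); `species` is set only on leaves."""
    name: str = ""
    species: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a copy of the tree restricted to leaves whose species is in `keep`.

    The tree is not re-inferred: surviving leaves keep their relative
    branching order, and internal nodes left with one child are collapsed.
    """
    if not node.children:
        return node if node.species in keep else None
    kept = [p for p in (prune(c, keep) for c in node.children) if p is not None]
    if not kept:
        return None       # whole subtree removed
    if len(kept) == 1:
        return kept[0]    # collapse a node that no longer branches
    return Node(name=node.name, children=kept)


def newick(node: Node) -> str:
    """Serialize the topology (no branch lengths) in Newick-like form."""
    if not node.children:
        return node.name
    return "(" + ",".join(newick(c) for c in node.children) + ")" + node.name


# Toy family with made-up gene names from four species.
tree = Node(children=[
    Node(children=[Node("At1", "Arabidopsis"), Node("Vv1", "grape")]),
    Node(children=[Node("Dm1", "fruit fly"), Node("Si1", "foxtail millet")]),
])
pruned = prune(tree, {"Arabidopsis", "grape", "fruit fly"})
print(newick(pruned))  # ((At1,Vv1),Dm1)
```

Because the function only drops leaves and collapses nodes, the relationships among the retained genes are exactly those of the original tree, which is the contrast with a remove-then-reconstruct approach.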
FIGURE 3. A use case illustrating how PhyloGenes can be used to predict gene function for a grape gene (gene name in red text on the gene tree). For simplicity, this figure shows a pruned tree view including only genes from grape, Arabidopsis and fruit fly. Genes from Arabidopsis and fruit fly in this family have been experimentally characterized. Red dotted arrows indicate the ancestral node where the annotated function is likely to have arisen.
FIGURE 4. Screenshots of pruned tree views of gene families showing only foxtail millet and Arabidopsis genes. Foxtail millet orthologs of the Arabidopsis IRX9, IRX9H, and IRX14/IRX14H are indicated within red boxes, whereas paralogs of IRX9H are indicated within a blue box (a). Foxtail millet orthologs of the Arabidopsis IRX7 and IRX10 are indicated within red boxes in (b).
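The ortholog/paralog distinction that the labeled trees make visible can be sketched in code. The snippet below uses the standard definition (two genes are orthologs if their most recent common ancestor node is a speciation event, paralogs if it is a duplication event). The `GeneNode` class, the helper functions, and the gene names are hypothetical and only illustrate the idea; they are not taken from PhyloGenes or from the families shown in the figures.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GeneNode:
    """Gene-tree node (hypothetical); internal nodes carry an event label."""
    name: str
    event: Optional[str] = None  # "speciation" or "duplication"
    children: List["GeneNode"] = field(default_factory=list)


def path_to(root: GeneNode, leaf: str) -> Optional[List[GeneNode]]:
    """Return the nodes from `root` down to the leaf named `leaf`, or None."""
    if not root.children:
        return [root] if root.name == leaf else None
    for child in root.children:
        sub = path_to(child, leaf)
        if sub is not None:
            return [root] + sub
    return None


def relationship(root: GeneNode, a: str, b: str) -> str:
    """Classify two distinct genes by the event at their last common ancestor."""
    if a == b:
        raise ValueError("need two distinct genes")
    pa, pb = path_to(root, a), path_to(root, b)
    if pa is None or pb is None:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(pa, pb):
        if x is not y:
            break
        lca = x  # deepest node shared by both root-to-leaf paths
    if lca.event == "speciation":
        return "orthologs"
    if lca.event == "duplication":
        return "paralogs"
    raise ValueError(f"unlabeled or unsupported event: {lca.event!r}")


# Toy family: an ancient duplication followed by speciation in each copy.
family = GeneNode("n0", "duplication", [
    GeneNode("n1", "speciation", [GeneNode("At_copyA"), GeneNode("Si_copyA")]),
    GeneNode("n2", "speciation", [GeneNode("At_copyB"), GeneNode("Si_copyB")]),
])
print(relationship(family, "At_copyA", "Si_copyA"))  # orthologs
print(relationship(family, "At_copyA", "Si_copyB"))  # paralogs
```

In this toy family the two Arabidopsis copies are paralogs of each other, so a function established experimentally for one copy would more plausibly transfer to the foxtail millet gene in the same post-duplication clade than to the gene in the other clade.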