Yi Wang, Devin Coleman-Derr, Guoping Chen, Yong Q Gu.
Abstract
Genome-wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that is useful for genome-wide comparison and visualization of orthologous clusters. OrthoVenn provides coverage of vertebrates, metazoa, protists, fungi, plants and bacteria for the comparison of orthologous clusters and also supports uploading of customized protein sequences from user-defined species. An interactive Venn diagram, summary counts, and functional summaries of the disjunction and intersection of clusters shared between species are displayed as part of the OrthoVenn result. OrthoVenn also includes in-depth views of the clusters using various sequence analysis tools. Furthermore, OrthoVenn identifies orthologous clusters of single-copy genes and allows for a customized search of clusters of specific genes through keywords or BLAST. OrthoVenn is an efficient and user-friendly web server freely accessible at http://probes.pw.usda.gov/OrthoVenn or http://aegilops.wheat.ucdavis.edu/OrthoVenn.
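The Venn-diagram counts and the single-copy cluster selection mentioned in the abstract can be illustrated with a minimal Python sketch. It is not OrthoVenn's implementation: the input layout, the function names `venn_region_counts` and `single_copy_clusters`, and the toy species and protein identifiers are all assumptions made for illustration. Each cluster is assigned to the Venn region defined by the exact set of species it contains, and a cluster counts as single copy when every listed species contributes exactly one protein.

```python
from collections import Counter

# Hypothetical toy input: cluster id -> list of (species, protein id) members.
clusters = {
    "cluster1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
    "cluster2": [("A", "a2"), ("A", "a3"), ("B", "b2")],
    "cluster3": [("C", "c2"), ("C", "c3")],
    "cluster4": [("A", "a4"), ("B", "b3"), ("C", "c4"), ("C", "c5")],
}


def venn_region_counts(clusters):
    """Count clusters per Venn region, keyed by the exact set of species present."""
    counts = Counter()
    for members in clusters.values():
        counts[frozenset(species for species, _ in members)] += 1
    return counts


def single_copy_clusters(clusters, species):
    """Return ids of clusters with exactly one protein from every listed species."""
    wanted = set(species)
    selected = []
    for cluster_id, members in clusters.items():
        per_species = Counter(sp for sp, _ in members)
        if set(per_species) == wanted and all(n == 1 for n in per_species.values()):
            selected.append(cluster_id)
    return sorted(selected)


regions = venn_region_counts(clusters)
for species_set, n in sorted(regions.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(species_set), n)
print(single_copy_clusters(clusters, ["A", "B", "C"]))
```

Keying each region by a `frozenset` of species keeps the regions disjoint, so the per-region counts sum to the total number of clusters, as they do in a Venn diagram.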
Year: 2015 PMID: 25964301 PMCID: PMC4489293 DOI: 10.1093/nar/gkv487
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Total numbers of categorized protein sequences in OrthoVenn
| Category | Number of species | Number of protein sequences |
|---|---|---|
| Vertebrates | 69 | 1 276 453 |
| Metazoa | 55 | 1 020 988 |
| Protists | 32 | 458 802 |
| Fungi | 52 | 567 086 |
| Plants | 38 | 1 321 298 |
| Bacteria | 26 | 56 830 |
| Total | 272 | 4 701 457 |
Figure 1. A results page in OrthoVenn. (A) Venn diagram showing the distribution of shared gene families (orthologous clusters) among Aegilops tauschii, Brachypodium distachyon, Oryza sativa, Sorghum bicolor, Hordeum vulgare and Zea mays. The cluster number in each component is listed. (B) Color selector and display mode setting for the Venn diagram. (C) Counts of clusters in each genome. (D) Keyword search and BLAST links for finding specific clusters in the result.
Figure 2. Distribution of Aegilops tauschii-specific gene sets across biological process GO slim terms.
Figure 3. Annotation of cluster602 using different methods. (A) Composition of the cluster. (B) Multiple sequence alignment viewer. (C) Motifs in the proteins of the cluster. (D) Phylogenetic tree showing the inferred evolutionary relationships among the sequences in cluster602. (E) Network layout of the cluster. Nodes represent proteins, and edge width indicates the similarity between the connected proteins. (F) Relationships between cluster602 and other clusters. Each node is a cluster, and node size represents the number of proteins in the cluster. Edge weight is the number of similar sequence pairs between the two clusters.
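The edge weighting described for panel (F) of Figure 3 can be sketched in Python as a count of similar sequence pairs that span two clusters. This is an illustration under stated assumptions, not the method used by OrthoVenn: the function name `cluster_edge_weights`, the protein identifiers, and the neighbouring clusters `cluster17` and `cluster88` are hypothetical, and the criterion that makes two sequences "similar" is not given in the caption, so the pairs are taken as precomputed input.

```python
from collections import Counter

# Hypothetical assignment of proteins to clusters.
protein_to_cluster = {
    "p1": "cluster602",
    "p2": "cluster602",
    "p3": "cluster17",
    "p4": "cluster17",
    "p5": "cluster88",
}

# Hypothetical pairs of proteins already judged similar by some cutoff.
similar_pairs = [
    ("p1", "p3"),
    ("p2", "p3"),
    ("p2", "p4"),
    ("p1", "p2"),  # both proteins in cluster602, so no between-cluster edge
    ("p5", "p1"),
]


def cluster_edge_weights(protein_to_cluster, similar_pairs):
    """Count similar protein pairs whose members fall in two different clusters."""
    weights = Counter()
    for first, second in similar_pairs:
        c1 = protein_to_cluster.get(first)
        c2 = protein_to_cluster.get(second)
        # Skip unassigned proteins and pairs inside a single cluster.
        if c1 is None or c2 is None or c1 == c2:
            continue
        # Sort the cluster ids so (x, y) and (y, x) share one undirected edge.
        weights[tuple(sorted((c1, c2)))] += 1
    return weights


weights = cluster_edge_weights(protein_to_cluster, similar_pairs)
for (c1, c2), weight in sorted(weights.items()):
    print(c1, c2, weight)
```

Node size in such a network would simply be the number of proteins assigned to each cluster, which can be obtained with `Counter(protein_to_cluster.values())`.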