Sven Heinicke, Michael S Livstone, Charles Lu, Rose Oughtred, Fan Kang, Samuel V Angiuoli, Owen White, David Botstein, Kara Dolinski.
Abstract
Many biological databases that provide comparative genomics information and tools are now available on the internet. Although these resources are useful, to our knowledge none of them combines results from multiple comparative genomics methods with manually curated information from the literature. Here we describe the Princeton Protein Orthology Database (P-POD, http://ortholog.princeton.edu), a user-friendly database system that allows users to find and visualize the phylogenetic relationships among predicted orthologs (based on the OrthoMCL method) to a query gene from any of eight eukaryotic organisms, and to see the orthologs in a wider evolutionary context (based on the Jaccard clustering method). In addition to the phylogenetic information, the database contains experimental results manually collected from the literature that can be compared to the computational analyses, as well as links to relevant human disease and gene information via the OMIM, model organism, and sequence databases. Our aim is for the P-POD resource to be useful to experimental biologists who want to learn more about the evolutionary context of their favorite genes. P-POD is based on the commonly used Generic Model Organism Database (GMOD) schema and can be downloaded in its entirety for installation on one's own system. Thus, bioinformaticians and software developers may also find P-POD useful, because they can build on the P-POD database infrastructure when developing their own comparative genomics resources and database tools.
Year: 2007 PMID: 17712414 PMCID: PMC1942082 DOI: 10.1371/journal.pone.0000766
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparative genomics web resources.
| Name | Description | Ortholog prediction | Larger seq. families | Disease information | Curated literature |
| Clusters of Orthologous Groups (COGs/KOGs) | Provides groups of orthologous proteins for seven eukaryotic species; the construction protocol involves manual curation | Yes | Yes | No | No |
| Eukaryotic Gene Orthologs (EGO) | Displays predicted orthologs derived from several eukaryotic genomes based on gene alignments | Yes | No | No | No |
| Homologene | Provides automated predictions of homologs among the genes of several eukaryotes | No | Yes | Yes | No |
| Inparanoid | Houses pair-wise groups of orthologous proteins for multiple species | Yes | No | No | No |
| OrthoDisease | Uses the Inparanoid algorithm to generate pair-wise orthologs between human disease genes and genes from other species | Yes | No | Yes | No |
| OrthoMCL-DB | Utilizes a Markov Cluster algorithm to predict orthologous groups of proteins for multiple species simultaneously | Yes | No | No | No |
| Sybil (S. Angiuoli and O. White, in preparation) | Uses Jaccard clustering to group sequences based on pair-wise BLAST analysis | No | Yes | No | No |
| YOGY | Retrieves orthologous proteins from four different resources: KOGs, Inparanoid, Homologene, and OrthoMCL-DB | Yes | No | No | Yes (only budding and fission yeast) |
| P-POD (This study) | Orthologs and Jaccard clusters | Yes | Yes | Yes | Yes |
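The Jaccard clustering listed above for Sybil and P-POD groups sequences by how strongly their pair-wise BLAST hit sets overlap. The following Python sketch illustrates the general idea only and is not the Sybil implementation: the protein names, hit sets, 0.6 cutoff, and single-linkage grouping are all illustrative assumptions.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient |A & B| / |A | B| of two hit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(hits: dict, cutoff: float) -> list:
    """Link proteins whose coefficient is >= cutoff and return the
    resulting connected groups (single linkage via union-find)."""
    parent = {p: p for p in hits}

    def find(p):
        # Follow parent pointers to the group representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(hits, 2):
        if jaccard(hits[p], hits[q]) >= cutoff:
            parent[find(p)] = find(q)

    groups = {}
    for p in hits:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Hypothetical BLAST hit sets (each protein's hits include itself).
hits = {
    "protA": {"protA", "protB", "protC"},
    "protB": {"protA", "protB", "protC"},
    "protC": {"protA", "protB", "protC", "protD"},
    "protD": {"protD", "protE"},
    "protE": {"protD", "protE"},
}
clusters = cluster(hits, cutoff=0.6)
print(clusters)
```

With these made-up inputs, protC stays with protA and protB (coefficient 0.75) even though it also hits protD, because the protC/protD overlap (0.2) falls below the cutoff.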
Figure 1. Steps in the analysis pipeline.
Components of the analysis pipeline.
| Program | Version | Source |
| GMOD::Loader | This study | |
| WU-BLAST | 2.0MP-WashU 10-May-2005 | |
| OrthoMCL | Version 1.2 14-March-2005 | |
| MCL | Version 1.005, 05-118 | |
| Jaccard Clustering | NA | S. Angiuoli and O. White (in preparation) |
| Clustal W | Version 1.83 | |
| PHYLIP | Version 3.64 | |
| createTree | This study | |
Sources and numbers of sequences analyzed.
| Organism | Proteins | Database | Filename |
| S. cerevisiae | 6704 | SGD | orf_trans_all.fasta.gz |
| H. sapiens | 33869 | ENSEMBL | Homo_sapiens.NCBI35.nov.pep.fa.gz |
| M. musculus | 36471 | ENSEMBL | Mus_musculus.NCBIM34.nov.pep.fa |
| D. rerio | 32143 | ENSEMBL | Danio_rerio.ZFISH5.nov.pep.fa |
| D. melanogaster | 19178 | FlyBase | dmel-all-translation-r4.2.1.fa |
| C. elegans | 22858 | WormBase | wormpep150.fa |
| A. thaliana | 30690 | TAIR | TAIR6_pep_20051108.fa |
| P. falciparum | 5363 | PlasmoDB | Pfa3D7_WholeGenome_Annotated_PEP_2005.2.11.fa |
Figure 2. Screenshots of the P-POD web interface.
(A) A portion of the results page for the DPM1 OrthoMCL family is shown superimposed on the search form. Results from OrthoMCL are provided, and a link to the larger Jaccard family (B) is also available. Disease information from OMIM is displayed, as well as any relevant disease or cross-complementation literature.
Number of proteins in each organism found in OrthoMCL or Jaccard families.
| Organism | OrthoMCL | Jaccard | Orphan (% of total proteome) |
| S. cerevisiae | 4,333 | 3,660 | 2,176 (32%) |
| H. sapiens | 27,606 | 29,315 | 3,193 (9%) |
| M. musculus | 29,214 | 31,388 | 3,902 (11%) |
| D. rerio | 27,602 | 28,968 | 1,903 (6%) |
| D. melanogaster | 16,015 | 15,048 | 2,503 (13%) |
| C. elegans | 18,070 | 16,308 | 4,078 (7%) |
| A. thaliana | 27,987 | 25,819 | 2,279 (13%) |
| P. falciparum | 3,909 | 2,293 | 1,284 (33%) |
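The last column can be read as the orphan count divided by the proteome size given in the table of sequence sources. The short Python check below covers the first two rows only, assuming they correspond to the SGD and human ENSEMBL entries (6,704 and 33,869 proteins); the function name is illustrative.

```python
def orphan_percent(orphans: int, proteome: int) -> int:
    """Orphans as a whole-number percentage of the proteome."""
    return round(100 * orphans / proteome)


yeast = orphan_percent(2176, 6704)    # first row, SGD proteome
human = orphan_percent(3193, 33869)   # second row, human ENSEMBL proteome
print(yeast, human)  # prints "32 9"
```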
Functional conservation vs. ortholog prediction: comparing experimental results with the OrthoMCL ortholog predictions for disease-related families.
| OrthoMCL | Experimental | Yeast gene | Protein(s) tested | Citation |
| No | No |
|
|
|
| No | No |
|
|
|
| No | No |
|
|
|
| No | No |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| Yes | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| No | Yes |
|
|
|
| Yes | No |
|
|
|
| Yes | No |
|
|
|
| Yes | No |
|
|
|
| Yes | No |
|
|
|
| Yes | No |
|
|
|
In all but one of these experiments, the yeast gene was mutated and the gene from the other organism was tested for the ability to complement the mutant phenotype. In the one exception, the yeast gene DPM1 was expressed in mouse. In the OrthoMCL column, “Yes” indicates that the OrthoMCL algorithm placed the two proteins in the same ortholog family, while “No” indicates it did not. In the Experimental column, “Yes” indicates functional complementation, while “No” indicates none. Thus, when both columns agree, the OrthoMCL prediction is consistent with the experimental result: where both are “Yes,” the predicted orthologs are functionally conserved, and where both are “No,” the proteins are neither predicted to be orthologs nor functionally conserved.
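The consistency rule just described amounts to comparing the two columns row by row. A minimal Python sketch of that tally follows; the row counts are taken from the table as it appears here (4 No/No, 56 Yes/Yes, 14 No/Yes, 5 Yes/No), and the function name is illustrative.

```python
from collections import Counter


def agreement(rows):
    """Return (consistent, total, fraction) for (OrthoMCL, Experimental)
    pairs, where a pair is consistent when the two calls match."""
    tally = Counter(rows)
    consistent = sum(n for (pred, exp), n in tally.items() if pred == exp)
    return consistent, len(rows), consistent / len(rows)


# Row counts as tallied from the table above (an assumption of this sketch).
rows = (
    [("No", "No")] * 4
    + [("Yes", "Yes")] * 56
    + [("No", "Yes")] * 14
    + [("Yes", "No")] * 5
)
consistent, total, fraction = agreement(rows)
print(f"{consistent}/{total} consistent ({fraction:.0%})")
```

The remaining rows split into two kinds of disagreement: functional complementation without a shared OrthoMCL family (No/Yes) and a shared family without complementation (Yes/No).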
Figure 3. OrthoMCL family of the alpha tubulins.
This OrthoMCL family contains only the alpha tubulins, while the tubulin family generated by the Jaccard clustering method (too large to be shown here) contains the alpha, beta, and gamma tubulins.
Figure 4. The MET3/MET14 families.
(A) MET14 Jaccard family, and (B) MET3/MET14 OrthoMCL family.
Figure 5. OrthoMCL and Jaccard clustering results for the second largest RNA polymerase subunit families of S. cerevisiae.
The second largest subunits of RNA polymerase I, II, and III in yeast are named RPA135, RPB2, and RET1, respectively. (A) Phylogenetic tree display of OrthoMCL results showing the yeast subunit RPA135 and its predicted orthologs resolved into a distinct family. (B, C) OrthoMCL results showing yeast RNA polymerase subunits RET1 (B) and RPB2 (C) resolved into separate families of orthologs. (D) Jaccard clustering results showing a “super family” of related RNA polymerase subfamilies. Arrows from each OrthoMCL family on the left point to the corresponding subfamilies in the Jaccard results. The numerals I to IV on the right of each tree indicate the RNA polymerase subfamily. The second largest subunits of a fourth RNA polymerase, Pol IV, which is unique to plants, were resolved into their own two-member family by OrthoMCL (not shown) and were appropriately clustered with this superfamily by the Jaccard clustering method. (Adapted from figure 2 of [15].)
Conservation of yeast proteins involved in N-linked glycosylation.
| Function | Yeast gene | Human gene | CDG (OMIM) | Other organisms with a family member |
| | | DHDDS | | x x x x |
| | | TMEM15 | | x x x x x |
| | | DPM1 | Ie (608799) | x x x x x x |
| | | ALG5 | | x x x x x |
| | | DOLPP1 | | x x x x |
| | | DPAGT1 | Ij (608093) | x x x x x x |
| | | GLT28D1 | | x x x x x x |
| | | unnamed | | x x x x x x |
| | | ALG1 | Ik (608540) | x x x x x |
| | | ALG2 | Ii (607906) | x x x x |
| | | unnamed | | x x x x x |
| | | RFT1 | | x x x x |
| | | ALG3 | Id (601110) | x x x x |
| | | ALG9 | Il (608776) | x x x x x |
| | | ALG12 | Ig (607143) | x x x x |
| | | ALG6 | Ic (603147) | x x x x |
| | | ALG8 | Ih (608104) | x x x x x |
| | | ALG10/KCR1 | | x x x |
| | | RPN1 | | x x x x x |
| | | DAD1 | | x x x x x |
| | | TUSC3 | | x x |
| | | ITM1 | | x x x x x x |
| | | DDOST | | x x x x x x |
| | | GCS1 | IIb (606056) | x x x x x |
| | | GANAB | | x x x x x |
| | | MAN1B1 | | x x x x x |
Genes are broadly categorized by function. Human genes are identified by name when possible and the corresponding congenital disorders of glycosylation (CDG, with OMIM ID) are shown. For A. thaliana, C. elegans, D. melanogaster, D. rerio, M. musculus, and P. falciparum, boxes marked with “x” indicate that a peptide from this organism was placed in the same OrthoMCL family with the yeast gene. Not shown: SWP1 is homologous to human ribophorin II [30], and SWP1, OST4, OST5, and OST6 do not lie in ortholog families.