Daniel Park, Rohit Singh, Michael Baym, Chung-Shou Liao, Bonnie Berger.
Abstract
We describe IsoBase, a database identifying functionally related proteins across five major eukaryotic model organisms: Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Mus musculus and Homo sapiens. Nearly all existing algorithms for orthology detection are based on sequence comparison. Although these have been successful in orthology prediction to some extent, we seek to go beyond them by integrating sequence data with protein-protein interaction (PPI) networks to help identify proteins that are truly functionally related. With that motivation, we introduce IsoBase, the first publicly available ortholog database that focuses on functionally related proteins. The groupings were computed using the IsoRankN algorithm, which uses spectral methods to combine sequence and PPI data and produce clusters of functionally related proteins. These clusters compare favorably with those from existing approaches: proteins within an IsoBase cluster are more likely to share similar Gene Ontology (GO) annotation. A total of 48,120 proteins were clustered into 12,693 functionally related groups. The IsoBase database may be browsed for functionally related proteins across two or more species and may also be queried by accession number, species-specific identifier, gene name or keyword. The database is freely available for download at http://isobase.csail.mit.edu/.
Year: 2011 PMID: 21177658 PMCID: PMC3013743 DOI: 10.1093/nar/gkq1234
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Comparative consistency on the five eukaryotic networks
| IsoRankN | Homologene | OrthoMCL | |
|---|---|---|---|
| Mean entropy | 0.284 | 0.241 | |
| Mean normalized entropy | 0.255 | 0.215 | |
| Exact cluster ratio | 0.355 (4470/12 579) | 0.237 (1973/8326) | |
| Exact protein ratio | 0.469 (13 134/27 988) | 0.364 (5796/15 940) | |
Mean entropy and mean normalized entropy of predicted clusters. Note that the boldface numbers represent the best performance with respect to each measure.
a. Exact cluster ratio: the fraction of predicted clusters that are 'exact', that is, clusters in which all contained proteins have the same GO term.
b. Exact protein ratio: the fraction of proteins in exact clusters.
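The four consistency measures in the table can be illustrated with a minimal sketch. It rests on several assumptions not stated in the source: each protein is given a single GO label (real annotations are many-to-many), cluster entropy is taken as the Shannon entropy of the label fractions, and normalization divides by the logarithm of the number of distinct labels in the cluster. The function names and the toy labels are hypothetical and are not part of IsoBase.

```python
import math
from collections import Counter


def cluster_entropy(go_terms):
    """Shannon entropy (natural log) of the GO labels in one cluster."""
    counts = Counter(go_terms)
    n = len(go_terms)
    # Sum of p * log(1/p); equals 0.0 when every protein has the same label.
    return sum((c / n) * math.log(n / c) for c in counts.values())


def normalized_entropy(go_terms):
    """Entropy divided by log(d), d = number of distinct labels (assumed form)."""
    d = len(set(go_terms))
    if d <= 1:
        return 0.0
    return cluster_entropy(go_terms) / math.log(d)


def consistency_summary(clusters):
    """Compute the four table measures for a list of clusters.

    Each cluster is a list holding one GO label per protein (an assumption).
    """
    n_clusters = len(clusters)
    n_proteins = sum(len(c) for c in clusters)
    # A cluster is 'exact' when all of its proteins share the same GO label.
    exact = [c for c in clusters if len(set(c)) == 1]
    return {
        "mean_entropy": sum(cluster_entropy(c) for c in clusters) / n_clusters,
        "mean_normalized_entropy":
            sum(normalized_entropy(c) for c in clusters) / n_clusters,
        "exact_cluster_ratio": len(exact) / n_clusters,
        "exact_protein_ratio": sum(len(c) for c in exact) / n_proteins,
    }


# Toy data with made-up labels: two exact clusters and one mixed cluster.
toy_clusters = [
    ["GO:A", "GO:A", "GO:A"],
    ["GO:A", "GO:B"],
    ["GO:C"],
]
summary = consistency_summary(toy_clusters)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, lower entropy and higher exact ratios both indicate more functionally coherent clusters. The ratios reported in the table follow directly from the counts given in parentheses, for example 4470/12 579 ≈ 0.355 and 5796/15 940 ≈ 0.364.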
Figure 1. Web interface and output of IsoBase. (A and B) Webserver entry page. (C) Example of an output page when choosing to browse through all ortholog clusters predicted over the PPI network alignment of two species, D. melanogaster and S. cerevisiae. Mean entropy scores normalized by the number of distinct GO terms for an ortholog cluster are displayed along with external sequence database links for each ortholog and associated KEGG and GO annotations.