Noam Kaplan, Ori Sasson, Uri Inbar, Moriah Friedlich, Menachem Fromer, Hillel Fleischer, Elon Portugaly, Nathan Linial, Michal Linial.
Abstract
ProtoNet is an automatic hierarchical classification of the protein sequence space. In 2004, ProtoNet (version 4.0) presents an analysis of over one million proteins merged from the SwissProt and TrEMBL databases. In addition to rich visualization and analysis tools for navigating the clustering hierarchy, we incorporated several improvements that allow a simplified view of the scaffold of the proteins. An unsupervised, biologically valid method was developed that condenses the ProtoNet hierarchy to only 12% of its clusters. A large portion of these clusters were automatically assigned high-confidence biological names according to their correspondence with functional annotations. ProtoNet is available at http://www.protonet.cs.huji.ac.il.
Year: 2005 PMID: 15608180 PMCID: PMC539961 DOI: 10.1093/nar/gki007
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Representation of selected species in ProtoNet
| Species | Proteins in ProtoNet 2.1 | Proteins in ProtoNet 4.0 |
|---|---|---|
| | 8,507 | 47,641 |
| | 5,678 | 41,813 |
| | 2,049 | 22,603 |
| | 1,680 | 39,367 |
| | 153 | 8,434 |
Figure 1. The ProtoBrowser allows viewing the near vicinity of a cluster in the ProtoNet hierarchy. Blue triangle-shaped icons represent protein clusters. The cluster currently being viewed, A268586, appears in the center in red. Clusters that include proteins with solved 3D structures are marked PDB.
Figure 2. Example of a cluster similarity matrix. Colored cells represent different degrees of similarity, ranging from white (no similarity: BLAST E-score higher than 100) to dark blue (high similarity: BLAST E-score close to 0). Cluster A222801 is roughly divided into three subsets: at the upper left of the diagonal are proteins that show no similarity to each other or to any other protein in the cluster; in the center of the diagonal is a subset of proteins that are similar to each other but to no other proteins; and at the bottom right of the diagonal is another subset of proteins that are similar to each other but not to other proteins.
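The coloring scheme described in the Figure 2 caption can be illustrated with a minimal Python sketch. Only the two endpoints come from the caption (white for E-scores higher than 100, dark blue for E-scores close to 0). The logarithmic interpolation between them, the `floor` value, the treatment of missing pairs as "no similarity", the full shade on the diagonal, and the function names `similarity_shade` and `shade_matrix` are all assumptions made for illustration, not ProtoNet's actual rendering code.

```python
import math

# From the Figure 2 caption: E-scores higher than 100 are drawn white.
NO_SIMILARITY_E = 100.0


def similarity_shade(e_score, floor=1e-180):
    """Map a BLAST E-score to a shade in [0, 1] (0 = white, 1 = dark blue).

    Assumption: shades are interpolated on a log10 scale between
    NO_SIMILARITY_E and a hypothetical lower bound `floor`.
    """
    if e_score > NO_SIMILARITY_E:
        return 0.0
    e = max(e_score, floor)  # avoid log10(0) for E-scores reported as 0
    span = math.log10(NO_SIMILARITY_E) - math.log10(floor)
    return (math.log10(NO_SIMILARITY_E) - math.log10(e)) / span


def shade_matrix(proteins, e_scores):
    """Build a symmetric matrix of shades for the proteins of one cluster.

    `e_scores` maps (protein_a, protein_b) pairs to E-scores. Pairs that
    are absent are treated as "no similarity" (assumption), and the
    diagonal is given the full shade (assumption).
    """
    n = len(proteins)
    matrix = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(proteins):
        for j, b in enumerate(proteins):
            if i == j:
                matrix[i][j] = 1.0
                continue
            e = e_scores.get((a, b), e_scores.get((b, a)))
            matrix[i][j] = 0.0 if e is None else similarity_shade(e)
    return matrix


if __name__ == "__main__":
    # Hypothetical protein identifiers and E-scores, for illustration only.
    proteins = ["p1", "p2", "p3"]
    e_scores = {("p2", "p3"): 1e-60}
    for row in shade_matrix(proteins, e_scores):
        print(" ".join(f"{shade:.2f}" for shade in row))
```

In this toy input, `p1` has no recorded similarity to the other proteins, while `p2` and `p3` are similar only to each other, so the matrix shows the kind of block structure along the diagonal that the caption describes for cluster A222801.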