Abstract
We present DAhunter, a web-based server that identifies homologous proteins by comparing domain architectures, that is, the organization of protein domains. A major obstacle in comparing domain architectures is the existence of 'promiscuous' domains, which carry out auxiliary functions and appear in many unrelated proteins. To distinguish these promiscuous domains from other protein domains, we assigned a weight score to each domain extracted from RefSeq proteins, based on its abundance and versatility. A domain's score represents its importance in the 'protein world' and is used in the comparison of domain architectures. In scoring domains, DAhunter considers domain combinations as well as single domains. To measure the similarity of two domain architectures, we developed several methods based on algorithms used in information retrieval (the cosine similarity, the Goodman-Kruskal gamma function and the domain duplication index) and then combined these into a single similarity score. Compared with other domain architecture algorithms, DAhunter is better at identifying homology. The server is available at http://www.dahunter.kr and http://localodom.kobic.re.kr/dahunter/index.htm.
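Two of the measures named in the abstract, cosine similarity and the Goodman-Kruskal gamma, can be illustrated with a minimal Python sketch. This is not DAhunter's implementation: the function names, the treatment of an architecture as an ordered list of domain names, the use of first occurrences for ordering, and the example weights are all assumptions made for illustration. The domain duplication index and the way the measures are combined into one score are not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def weighted_cosine(arch_a, arch_b, weights):
    """Cosine similarity of two architectures viewed as weighted bags of domains.

    `weights` maps a domain name to a hypothetical weight score;
    domains without an entry default to 1.0.
    """
    vec_a = {d: n * weights.get(d, 1.0) for d, n in Counter(arch_a).items()}
    vec_b = {d: n * weights.get(d, 1.0) for d, n in Counter(arch_b).items()}
    dot = sum(vec_a[d] * vec_b[d] for d in vec_a.keys() & vec_b.keys())
    norm_a = sqrt(sum(v * v for v in vec_a.values()))
    norm_b = sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def order_gamma(arch_a, arch_b):
    """Goodman-Kruskal gamma over the order of shared domains.

    Assumption: each shared domain is placed at its first occurrence.
    Returns (concordant - discordant) / (concordant + discordant),
    or 0.0 when fewer than two domains are shared.
    """
    shared = sorted(set(arch_a) & set(arch_b))
    pos_a = {d: arch_a.index(d) for d in shared}
    pos_b = {d: arch_b.index(d) for d in shared}
    concordant = discordant = 0
    for x, y in combinations(shared, 2):
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Illustrative architectures and made-up weights (not values from DAhunter).
query = ["SH3", "SH2", "Pkinase_Tyr"]
hit = ["SH2", "SH3", "Pkinase_Tyr"]
weights = {"SH3": 0.2, "SH2": 0.3, "Pkinase_Tyr": 1.0}

print(weighted_cosine(query, hit, weights))
print(order_gamma(query, hit))
```

In this toy case the two architectures contain the same domains, so the cosine term cannot tell them apart, while the gamma term drops below 1 because two domains are swapped. This shows why a content measure and an order measure are complementary.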
Year: 2008 PMID: 18411203 PMCID: PMC2447808 DOI: 10.1093/nar/gkn172
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of proteins, architectures and domain units of eukaryota, bacteria and archaea (RefSeq proteins)
| Kingdom | Total proteins | Proteins with Pfam domains | Unique architectures | Domain units (single) | Domain units (double) |
|---|---|---|---|---|---|
| Eukaryote | 1 193 766 | 750 267 | 32 737 | 4764 | 2435 |
| Bacteria | 2 781 568 | 2 170 351 | 25 913 | 4441 | 2002 |
| Archaea | 108 190 | 77 785 | 4399 | 1301 | 229 |
a Due to the small number of archaeal organisms in RefSeq, the total number of archaeal proteins is relatively small compared with those of eukaryotes and bacteria. The numbers of eukaryotic, bacterial and archaeal organisms in RefSeq are 1470, 1079 and 67, respectively.
Figure 1. Schematic of the DAhunter workflow. The DAhunter pipeline consists of three major steps: (a) query processing, (b) comparing domain architectures and (c) sorting matched domain architectures.
Figure 2. Screenshot of DAhunter results: (a) domain architecture of a query protein, (b) matched domain architectures and (c) domain unit information.