Andreas Heger, Christopher Andrew Wilton, Ashwin Sivakumar, Liisa Holm.
Abstract
We used the Automatic Domain Decomposition Algorithm (ADDA) to generate a database of protein domain families with complete coverage of all protein sequences. Sequences are split into domains and domains are grouped into protein domain families in a completely automated process. The current database contains domains for more than 1.5 million sequences in more than 40,000 domain families. In particular, there are 3828 novel domain families that do not overlap with the curated domain databases Pfam, SCOP and InterPro. The data are freely available for downloading and querying via a web interface (http://ekhidna.biocenter.helsinki.fi:9801/sqgraph/pairsdb).
Year: 2005 PMID: 15608174 PMCID: PMC540050 DOI: 10.1093/nar/gki096
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of domain families in ADDA. The number of families is given in the last row of each category label. (A) Mobile modules, domain families that co-occur with a variety of different domain families, constitute only a fraction of all domain families. Many domains only occur in single-domain proteins or are always associated with the same domain family (associated families). The majority of domain families contain only a single representative sequence on the 40% similarity level (singletons). (B) Taxonomic distribution of domain families over the three superkingdoms (Archaea, Bacteria and Eukaryota). Left: only associated domain families excluding singletons. Right: only mobile modules. Mobile modules tend to be more widely distributed than associated domains. (C) Annotation of domain families. Left: only associated domain families excluding singletons. Right: only mobile modules. Novel domain families do not overlap with domain families from Pfam, SCOP and InterPro. Mobile modules are well known to curated domain databases, but there are many novel domain families left to be explored.
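The three categories in panel (A) can be illustrated with a minimal Python sketch. It is not the ADDA implementation: the function name `classify_families`, the input layout, the `min_partners` cutoff used to stand in for "a variety of different domain families", and the choice to label singletons before the other two categories are all assumptions made for illustration, and the toy data are invented.

```python
from collections import defaultdict


def classify_families(proteins, representatives, min_partners=2):
    """Label each domain family as singleton, mobile module or associated.

    proteins: dict mapping a protein id to the list of domain family ids
        found in that protein (hypothetical input layout).
    representatives: dict mapping a family id to its number of
        representative sequences at the 40% similarity level.
    min_partners: assumed cutoff; a family co-occurring with at least this
        many distinct other families is treated as a mobile module.
    """
    # Collect, for every family, the set of other families it co-occurs with.
    partners = defaultdict(set)
    for families in proteins.values():
        distinct = set(families)
        for fam in distinct:
            partners[fam] |= distinct - {fam}

    labels = {}
    for fam, others in partners.items():
        if representatives.get(fam, 1) == 1:
            # Single representative sequence: singleton (checked first by assumption).
            labels[fam] = "singleton"
        elif len(others) >= min_partners:
            # Co-occurs with several different families.
            labels[fam] = "mobile module"
        else:
            # Only in single-domain proteins, or always with the same partner family.
            labels[fam] = "associated"
    return labels


# Invented toy data, for illustration only.
proteins = {
    "p1": ["A", "B"],
    "p2": ["A", "C"],
    "p3": ["D"],
    "p4": ["B", "A"],
    "p5": ["E"],
}
representatives = {"A": 5, "B": 3, "C": 2, "D": 4, "E": 1}
labels = classify_families(proteins, representatives)
```

In this toy input, family A co-occurs with two different families and is labelled a mobile module, B, C and D are labelled associated, and E, with a single representative, is labelled a singleton. A different `min_partners` value would shift the boundary between mobile modules and associated families.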