Arun Prasad Pandurangan, Jonathan Stahlhacke, Matt E Oates, Ben Smithers, Julian Gough.
Abstract
Here, we present a major update to the SUPERFAMILY database and webserver. We describe the addition of a new SUPERFAMILY 2.0 profile HMM library containing a total of 27 623 HMMs. The database now includes Superfamily domain annotations for millions of protein sequences taken from the Universal Protein Resource Knowledgebase (UniProtKB) and the National Center for Biotechnology Information (NCBI). This addition comprises about 51 and 45 million distinct protein sequences obtained from UniProtKB and NCBI, respectively. Currently, the database contains annotations for 63 244 and 102 151 complete genomes taken from UniProtKB and NCBI, respectively. The current sequence collection and genome update is the biggest so far in the history of SUPERFAMILY updates. To deal with this massive wealth of information, we introduce a new SUPERFAMILY 2.0 webserver (http://supfam.org). Currently, the webserver mainly focuses on the search, retrieval and display of Superfamily annotation for the entire sequence and genome collection in the database.
Year: 2019 PMID: 30445555 PMCID: PMC6324026 DOI: 10.1093/nar/gky1130
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
SUPERFAMILY annotation statistics for the UniProtKB and NCBI protein sequence collection
| | No. of proteomes (UniProtKB) | No. of proteomes (NCBI) | No. of proteins (UniProtKB) | No. of proteins (NCBI) | Proteins with assignments, % (UniProtKB) | Proteins with assignments, % (NCBI) | Amino acid coverage, % (UniProtKB) | Amino acid coverage, % (NCBI) |
|---|---|---|---|---|---|---|---|---|
| Eukaryota | 1272 | 781 | 19 481 055 | 17 857 765 | 56 | 67 | 38 | 39 |
| Archaea | 793 | 671 | 2 136 652 | 1 822 967 | 62 | 63 | 59 | 60 |
| Bacteria | 17 277 | 93 480 | 66 475 668 | 346 500 943 | 67 | 67 | 62 | 64 |
| Viruses | 43 902 | 7194 | 1 025 062 | 303 337 | 39 | 21 | 39 | 31 |
| Complete proteome | 63 244 | 102 151 | 89 118 437 | 90 495 662 | 64 | 67 | 55 | 62 |
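
The counts in the table use spaces as thousands separators, which most parsers will not read as numbers directly. As a minimal sketch (the names `parse_count` and `uniprot_proteomes` are illustrative assumptions, not part of SUPERFAMILY or its webserver), the following parses the UniProtKB proteome counts copied from the table and compares their sum with the reported "Complete proteome" value:

```python
# Proteome counts copied from the UniProtKB column of the table above.
# The table groups digits with spaces, so the values are kept as strings here.
uniprot_proteomes = {
    "Eukaryota": "1272",
    "Archaea": "793",
    "Bacteria": "17 277",
    "Viruses": "43 902",
}
reported_total = "63 244"  # "Complete proteome" row, UniProtKB column


def parse_count(text: str) -> int:
    """Convert a space-grouped count such as '17 277' to an int."""
    # str.split() with no arguments splits on any whitespace,
    # including non-breaking and thin spaces.
    return int("".join(text.split()))


computed_total = sum(parse_count(v) for v in uniprot_proteomes.values())
print(computed_total, computed_total == parse_count(reported_total))
# prints: 63244 True
```

The same helper can be applied to any other column of the table before doing arithmetic on it.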
Figure 1. SUPERFAMILY webserver 2.0. (A) Genome summary page, showing SUPERFAMILY domain annotation statistics for the Homo sapiens genome. The page also provides links to view and download SUPERFAMILY domain assignments. (B) Domain annotation page showing SUPERFAMILY domain predictions for the protein sequence ID ENSP00000257468 of the Homo sapiens genome.