Sarah Hunter, Rolf Apweiler, Teresa K Attwood, Amos Bairoch, Alex Bateman, David Binns, Peer Bork, Ujjwal Das, Louise Daugherty, Lauranne Duquenne, Robert D Finn, Julian Gough, Daniel Haft, Nicolas Hulo, Daniel Kahn, Elizabeth Kelly, Aurélie Laugraud, Ivica Letunic, David Lonsdale, Rodrigo Lopez, Martin Madera, John Maslen, Craig McAnulla, Jennifer McDowall, Jaina Mistry, Alex Mitchell, Nicola Mulder, Darren Natale, Christine Orengo, Antony F Quinn, Jeremy D Selengut, Christian J A Sigrist, Manjula Thimma, Paul D Thomas, Franck Valentin, Derek Wilson, Cathy H Wu, Corin Yeats.
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) integrates predictive models or 'signatures' representing protein domains, families and functional sites from multiple, diverse source databases: Gene3D, PANTHER, Pfam, PIRSF, PRINTS, ProDom, PROSITE, SMART, SUPERFAMILY and TIGRFAMs. Integration is performed manually, and approximately half of the roughly 58 000 signatures available in the source databases belong to an InterPro entry. Recently, we have also started to display the remaining un-integrated signatures via our web interface. Other developments include the provision of non-signature data, such as structural data, in new XML files on our FTP site, as well as the inclusion of matchless UniProtKB proteins in the existing match XML files. The web interface has been extended and now links out to the ADAN predicted protein-protein interaction database and the SPICE and Dasty viewers. The latest public release (v18.0) covers 79.8% of UniProtKB (v14.1) and consists of 16 549 entries. InterPro data may be accessed via the web address above, via web services, by downloading files by anonymous FTP or by using the InterProScan search software (http://www.ebi.ac.uk/Tools/InterProScan/).
Year: 2008 PMID: 18940856 PMCID: PMC2686546 DOI: 10.1093/nar/gkn785
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Coverage of the major sequence databases UniProtKB, UniParc and UniMES by InterPro signatures
| Sequence database | Number of proteins in database | Number of proteins with >0 matches to InterPro | Number of proteins with >0 matches to combined member database signatures |
|---|---|---|---|
| UniProtKB/Swiss-Prot | 397 539 | 369 830 (93.0%) | 379 897 (95.6%) |
| UniProtKB/TrEMBL | 6 212 793 | 4 628 221 (74.5%) | 4 894 258 (78.8%) |
| UniProtKB (Total) | 6 610 332 | 4 998 051 (75.6%) | 5 274 155 (79.8%) |
| UniParc | 17 718 252 | 12 211 006 (68.9%) | 13 290 858 (75.0%) |
| UniMES | 6 028 191 | 4 132 464 (68.6%) | 4 461 935 (74.0%) |
The number of proteins matching signatures from InterPro and those matching the full set of member database signatures are shown.
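The percentages in the coverage table above can be rechecked from the raw counts. The following is a minimal sketch, not part of the original paper: the names `COVERAGE`, `percent` and `combine` are hypothetical, and the counts are copied directly from the table. It recomputes the coverage figures and rebuilds the "UniProtKB (Total)" row as the sum of the Swiss-Prot and TrEMBL rows.

```python
# Counts copied from the coverage table above, per sequence database:
# (proteins in database, proteins matching InterPro,
#  proteins matching any member database signature).
COVERAGE = {
    "UniProtKB/Swiss-Prot": (397_539, 369_830, 379_897),
    "UniProtKB/TrEMBL": (6_212_793, 4_628_221, 4_894_258),
    "UniParc": (17_718_252, 12_211_006, 13_290_858),
    "UniMES": (6_028_191, 4_132_464, 4_461_935),
}


def percent(matched: int, total: int) -> float:
    """Return coverage as a percentage rounded to one decimal place."""
    return round(100.0 * matched / total, 1)


def combine(*rows: tuple) -> tuple:
    """Sum several (total, interpro, member_db) rows column by column."""
    return tuple(sum(column) for column in zip(*rows))


# The "UniProtKB (Total)" row is Swiss-Prot plus TrEMBL.
uniprotkb_total = combine(
    COVERAGE["UniProtKB/Swiss-Prot"], COVERAGE["UniProtKB/TrEMBL"]
)

if __name__ == "__main__":
    rows = dict(COVERAGE, **{"UniProtKB (Total)": uniprotkb_total})
    for name, (total, interpro, member_db) in rows.items():
        print(
            f"{name}: InterPro {percent(interpro, total)}%, "
            f"member databases {percent(member_db, total)}%"
        )
```

The difference between the two percentages in each row corresponds to proteins matched only by signatures that have not yet been integrated into an InterPro entry.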
Figure 1. Trends in the number of signatures integrated into a single entry, categorized by the year the entry was first created. Initially, these entries would have contained only signatures from the four founding consortium members. However, as other member databases joined, they may also have had signatures covering the same families and domains, which were consequently integrated into these entries, leading to the totals we see today. Note that the number of signatures integrated in a single year can vary (between 1000 and 5000 signatures) depending on the member databases' release cycles.