Sarah Hunter, Philip Jones, Alex Mitchell, Rolf Apweiler, Teresa K Attwood, Alex Bateman, Thomas Bernard, David Binns, Peer Bork, Sarah Burge, Edouard de Castro, Penny Coggill, Matthew Corbett, Ujjwal Das, Louise Daugherty, Lauranne Duquenne, Robert D Finn, Matthew Fraser, Julian Gough, Daniel Haft, Nicolas Hulo, Daniel Kahn, Elizabeth Kelly, Ivica Letunic, David Lonsdale, Rodrigo Lopez, Martin Madera, John Maslen, Craig McAnulla, Jennifer McDowall, Conor McMenamin, Huaiyu Mi, Prudence Mutowo-Muellenet, Nicola Mulder, Darren Natale, Christine Orengo, Sebastien Pesseat, Marco Punta, Antony F Quinn, Catherine Rivoire, Amaia Sangrador-Vegas, Jeremy D Selengut, Christian J A Sigrist, Maxim Scheremetjew, John Tate, Manjulapramila Thimmajanarthanan, Paul D Thomas, Cathy H Wu, Corin Yeats, Siew-Yit Yong.
Abstract
InterPro (http://www.ebi.ac.uk/interpro/) is a database that integrates diverse information about protein families, domains and functional sites, and makes it freely available to the public via Web-based interfaces and services. Central to the database are diagnostic models, known as signatures, against which protein sequences can be searched to determine their potential function. InterPro has utility in the large-scale analysis of whole genomes and meta-genomes, as well as in characterizing individual protein sequences. Herein we give an overview of new developments in the database and its associated software since 2009, including updates to database content, curation processes and Web and programmatic interfaces.
Year: 2011 PMID: 22096229 PMCID: PMC3245097 DOI: 10.1093/nar/gkr948
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Coverage of the major sequence databases UniProtKB, UniParc and UniMES by InterPro signatures
| Sequence database | Number of proteins in database | Number of proteins with one or more matches to InterPro (%) |
|---|---|---|
| UniProtKB/Swiss-Prot | 532 146 | 507 297 (95.3 %) |
| UniProtKB/TrEMBL | 16 886 838 | 13 365 742 (79.1 %) |
| UniProtKB (Total) | 17 418 984 | 13 873 039 (79.6 %) |
| UniParc | 28 628 639 | 20 974 897 (73.3 %) |
| UniMES | 6 028 191 | 4 442 162 (73.7 %) |
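As a quick arithmetic check on the coverage table, the Python sketch below recomputes each percentage as matched / total × 100, rounded to one decimal place, and confirms that the UniProtKB (Total) row is the sum of the Swiss-Prot and TrEMBL rows (UniProtKB consists of these two sections). The counts are copied from the table; the helper names `coverage_percent` and `uniprotkb_total` are illustrative only.

```python
# Counts copied from the coverage table: (proteins in database, proteins with >= 1 InterPro match)
COUNTS = {
    "UniProtKB/Swiss-Prot": (532_146, 507_297),
    "UniProtKB/TrEMBL": (16_886_838, 13_365_742),
    "UniParc": (28_628_639, 20_974_897),
    "UniMES": (6_028_191, 4_442_162),
}


def coverage_percent(total: int, matched: int) -> float:
    """Percentage of proteins with at least one InterPro match, rounded to one decimal."""
    return round(100.0 * matched / total, 1)


def uniprotkb_total(counts: dict) -> tuple:
    """UniProtKB = Swiss-Prot + TrEMBL, so both columns of the total row are simple sums."""
    sp_total, sp_matched = counts["UniProtKB/Swiss-Prot"]
    tr_total, tr_matched = counts["UniProtKB/TrEMBL"]
    return sp_total + tr_total, sp_matched + tr_matched


if __name__ == "__main__":
    for name, (total, matched) in COUNTS.items():
        print(f"{name}: {coverage_percent(total, matched)} %")
    total, matched = uniprotkb_total(COUNTS)
    print(f"UniProtKB (Total): {total} proteins, {matched} matched, "
          f"{coverage_percent(total, matched)} %")
```

Every recomputed value agrees with the table, so the rounding used there is ordinary half-up rounding to one decimal place.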
Figure 1. The 'Overview' page in the new set of InterPro entry pages, showing the family hierarchy for this entry, an extensive description of the family and cross-references to the three GO terms associated with this family. In this case, the entry comprises a single integrated PRINTS signature. Note the red 'F' icon, which indicates that this entry describes a protein family.
Figure 2. The InterPro BioMart. This example illustrates how the BioMart can return a large set of data: here, a query has been built to return all proteins in *Drosophila melanogaster* that are predicted to be members of the rhodopsin-like G-protein-coupled receptors (GPCRs; IPR000276).
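The Figure 2 query can also be expressed programmatically. The sketch below does not use BioMart: as an assumption, it targets the URL layout of the later InterPro REST API (`/api/protein/uniprot/entry/interpro/<accession>/taxonomy/uniprot/<taxon>/`), which this paper does not describe. 7227 is the NCBI Taxonomy identifier for *Drosophila melanogaster*. The JSON keys followed during pagination (`results`, `next`, `metadata.accession`) are likewise assumptions to check against the current API documentation.

```python
import json
import re
import urllib.request
from typing import Iterator

# Assumption: base URL and path layout of the later InterPro REST API (not the BioMart in Figure 2).
API_BASE = "https://www.ebi.ac.uk/interpro/api"
DROSOPHILA_MELANOGASTER_TAXID = 7227  # NCBI Taxonomy ID


def proteins_in_entry_url(interpro_acc: str, taxon_id: int) -> str:
    """Build a URL listing UniProtKB proteins that match an InterPro entry, filtered by taxon."""
    if not re.fullmatch(r"IPR\d{6}", interpro_acc):
        raise ValueError(f"not an InterPro accession: {interpro_acc!r}")
    return (f"{API_BASE}/protein/uniprot/entry/interpro/{interpro_acc}"
            f"/taxonomy/uniprot/{taxon_id}/")


def iter_protein_accessions(url: str) -> Iterator[str]:
    """Follow paginated JSON pages (assumed keys: 'results', 'next', 'metadata.accession')."""
    while url:
        with urllib.request.urlopen(url) as resp:
            body = resp.read()
        if not body:  # an empty response body means there is nothing to parse
            break
        page = json.loads(body)
        for item in page.get("results", []):
            yield item["metadata"]["accession"]
        url = page.get("next")


if __name__ == "__main__":
    # Only builds and prints the URL; no network request is made here.
    print(proteins_in_entry_url("IPR000276", DROSOPHILA_MELANOGASTER_TAXID))
```

Calling `iter_protein_accessions(proteins_in_entry_url("IPR000276", 7227))` would then yield UniProtKB accessions page by page, provided the endpoint behaves as assumed. Following the `next` link, rather than requesting everything at once, mirrors the "large set of data" use case that the BioMart example addresses.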
InterPro DAS data sources
| DAS registry ID | Data source name | URL | Provision |
|---|---|---|---|
| DS_327 | InterPro | | Details of InterPro signature matches coordinated on UniProtKB protein sequences |
| DS_1028 | InterPro-matches-overview | | Summary matches of InterPro entries coordinated on UniProtKB protein sequences; also a default data source on the Dasty3 DAS client [ |
| DS_1029 | InterPro-UniParc-matches | | Details of InterPro member database signature matches coordinated on UniParc (UniProt Archive) protein sequences |
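For readers unfamiliar with DAS, a client retrieves annotations by issuing a `features` command for a sequence segment and parsing the DASGFF XML that comes back. The Python sketch below illustrates that pattern under stated assumptions: the server root, the data source name `interpro` used in the URL, the accession `P12345` and the sample response are hypothetical placeholders (not real InterPro output), and the element names (`FEATURE`, `TYPE`, `START`, `END`) follow our reading of the DAS 1.6 specification.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class DasFeature:
    feature_id: str
    type_id: str
    start: int
    end: int


def das_features_url(server_root: str, dsn: str, segment: str) -> str:
    """Build a DAS 'features' command URL: <server_root>/das/<dsn>/features?segment=<segment>."""
    # Keep ':' and ',' unescaped so segments like 'P12345:1,100' retain the DAS syntax.
    return f"{server_root.rstrip('/')}/das/{quote(dsn)}/features?segment={quote(segment, safe=':,')}"


def parse_dasgff(xml_text: str) -> list[DasFeature]:
    """Extract feature IDs, type IDs and start/end coordinates from a DASGFF document."""
    root = ET.fromstring(xml_text)
    features = []
    for feat in root.iter("FEATURE"):
        type_el = feat.find("TYPE")
        features.append(DasFeature(
            feature_id=feat.get("id", ""),
            type_id=type_el.get("id", "") if type_el is not None else "",
            start=int(feat.findtext("START", "0")),
            end=int(feat.findtext("END", "0")),
        ))
    return features


# Hypothetical response: invented identifiers and coordinates, for illustration only.
SAMPLE_DASGFF = """<DASGFF>
  <GFF href="https://das.example.org/das/interpro/features?segment=P12345">
    <SEGMENT id="P12345" start="1" stop="300">
      <FEATURE id="example-match-1" label="example">
        <TYPE id="example-type">example</TYPE>
        <START>10</START>
        <END>120</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""

if __name__ == "__main__":
    print(das_features_url("https://das.example.org", "interpro", "P12345"))
    for f in parse_dasgff(SAMPLE_DASGFF):
        print(f)
```

Separating URL construction from parsing keeps the sketch testable offline; a real client would fetch the URL and pass the response body to `parse_dasgff`.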