Alex Mitchell1, Hsin-Yu Chang1, Louise Daugherty1, Matthew Fraser1, Sarah Hunter1, Rodrigo Lopez1, Craig McAnulla1, Conor McMenamin1, Gift Nuka1, Sebastien Pesseat1, Amaia Sangrador-Vegas1, Maxim Scheremetjew1, Claudia Rato1, Siew-Yit Yong1, Alex Bateman1, Marco Punta1, Teresa K Attwood2, Christian J A Sigrist3, Nicole Redaschi3, Catherine Rivoire3, Ioannis Xenarios4, Daniel Kahn5, Dominique Guyot5, Peer Bork6, Ivica Letunic6, Julian Gough7, Matt Oates7, Daniel Haft8, Hongzhan Huang9, Darren A Natale9, Cathy H Wu10, Christine Orengo11, Ian Sillitoe11, Huaiyu Mi12, Paul D Thomas12, Robert D Finn13.
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures) since 2012.
Year: 2014 PMID: 25428371 PMCID: PMC4383996 DOI: 10.1093/nar/gku1243
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. InterPro matches for UniProtKB entry Q3JCG5 showing predicted protein family membership, domains and sites.
Figure 2. Detailed InterPro member database match data for UniProtKB entry Q3JCG5.
Coverage of the major sequence databases UniProtKB and UniParc (the non-redundant protein sequence archive) by InterPro signatures
| Sequence database | Number of proteins in database | Number of proteins with one or more matches to InterPro |
|---|---|---|
| UniProtKB/Swiss-Prot | 546 000 | 525 376 (96.2%) |
| UniProtKB/TrEMBL | 79 824 243 | 66 591 418 (83.4%) |
| UniProtKB (total) | 80 370 243 | 67 116 794 (83.5%) |
| UniParc | 67 862 204 | 55 078 104 (81.2%) |
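
The percentages in the coverage table follow directly from its two count columns. As a minimal sketch, the arithmetic can be reproduced as below; the helper name `coverage_percent` and the dictionary layout are illustrative assumptions, not part of InterPro or its software.

```python
def coverage_percent(matched: int, total: int) -> float:
    """Share of proteins with one or more InterPro matches, in percent (1 decimal)."""
    return round(100.0 * matched / total, 1)


# (proteins in database, proteins with one or more matches), copied from the table.
counts = {
    "UniProtKB/Swiss-Prot": (546_000, 525_376),
    "UniProtKB/TrEMBL": (79_824_243, 66_591_418),
    "UniParc": (67_862_204, 55_078_104),
}

# The "UniProtKB (total)" row is the sum of the Swiss-Prot and TrEMBL rows.
uniprot_total = counts["UniProtKB/Swiss-Prot"][0] + counts["UniProtKB/TrEMBL"][0]
uniprot_matched = counts["UniProtKB/Swiss-Prot"][1] + counts["UniProtKB/TrEMBL"][1]

coverage = {name: coverage_percent(m, t) for name, (t, m) in counts.items()}
coverage["UniProtKB (total)"] = coverage_percent(uniprot_matched, uniprot_total)

for name, pct in coverage.items():
    print(f"{name}: {pct}%")
```

Running the sketch is a quick way to check that the summed UniProtKB row is consistent with its two component rows.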
Figure 3. Number of entries provided by InterPro and its member databases per year.
Release version and number of member database signatures integrated into InterPro release 48.0
| Database | Release number | Total signatures | Integrated signatures | Integrated signatures (%) |
|---|---|---|---|---|
| CATH-Gene3D | 3.5.0 | 2626 | 1718 | 65.4 |
| HAMAP | 201311.27 | 1916 | 1912 | 99.8 |
| PANTHER | 9.0 | 59 948 | 3673 | 6.1 |
| PIRSF | 2.84 | 3251 | 3225 | 99.2 |
| PRINTS | 42 | 2106 | 2024 | 96.1 |
| PROSITE patterns | 20.97 | 1308 | 1290 | 98.6 |
| PROSITE profiles | 20.97 | 1062 | 1038 | 97.7 |
| Pfam | 27 | 14 831 | 14 134 | 95.3 |
| ProDom | 2006.1 | 1894 | 1117 | 59.0 |
| SMART | 6.2 | 1008 | 998 | 99.0 |
| SUPERFAMILY | 1.75 | 2019 | 1372 | 68.0 |
| TIGRFAMs | 13 | 4284 | 4265 | 99.6 |
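
The per-database integration percentages, and the overall count of integrated signatures quoted in the abstract (36,766), can both be derived from the table. The following is a minimal sketch of that check; the variable names are illustrative assumptions, and the numbers are copied from the table above.

```python
# (total signatures, integrated signatures) per member database, from the table.
signatures = {
    "CATH-Gene3D": (2626, 1718),
    "HAMAP": (1916, 1912),
    "PANTHER": (59_948, 3673),
    "PIRSF": (3251, 3225),
    "PRINTS": (2106, 2024),
    "PROSITE patterns": (1308, 1290),
    "PROSITE profiles": (1062, 1038),
    "Pfam": (14_831, 14_134),
    "ProDom": (1894, 1117),
    "SMART": (1008, 998),
    "SUPERFAMILY": (2019, 1372),
    "TIGRFAMs": (4284, 4265),
}

# Sum of the "Integrated signatures" column across all member databases.
integrated_total = sum(integrated for _, integrated in signatures.values())

# Percentage of each database's signatures that are integrated (1 decimal).
integrated_percent = {
    name: round(100.0 * integrated / total, 1)
    for name, (total, integrated) in signatures.items()
}

print(integrated_total)
for name, pct in integrated_percent.items():
    print(f"{name}: {pct}%")
```

The column sum agrees with the figure of 36,766 integrated member database signatures given in the abstract.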
Figure 4. The InterPro Domain Architecture tool add/remove domains pop-up window. The list of domains can be refined using either the search box (A) or drop-down menu (B). Domains can be added or removed from the query using plus or minus buttons (C). The number of copies of a particular domain to add to the query is indicated (D). Selecting the Apply button (E) performs the query.
Figure 5. The InterPro Domain Architecture tool showing the results of searching with a VIT and 14-3-3 domain. Checking the ‘Order sensitivity’ option (A) means that domain order is taken into account in the results section (B). The domains can be reordered by dragging and dropping their graphical representations (C), or removed from the query by dragging them to the dustbin (D) or clicking on the [x] icon next to their name and accession (E). The InterPro accession string (F) summarizes the domain architecture composition.
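
The difference between an order-sensitive and an order-insensitive domain architecture query can be illustrated with a small sketch. This is not the InterPro implementation: the function `matches_architecture` is hypothetical, domains are represented by plain name strings rather than InterPro accessions, and treating an order-sensitive match as "the query domains occur in the same relative order, with other domains allowed in between" is an assumption made for illustration.

```python
from collections import Counter
from typing import Sequence


def matches_architecture(
    query: Sequence[str],
    architecture: Sequence[str],
    order_sensitive: bool = False,
) -> bool:
    """Return True if a protein's domain architecture satisfies the query.

    Repeating a domain in the query requires that many copies in the protein.
    """
    if order_sensitive:
        # Assumed semantics: query must be a subsequence of the architecture.
        remaining = iter(architecture)
        return all(domain in remaining for domain in query)
    # Order ignored: compare required and available copy numbers per domain.
    needed = Counter(query)
    available = Counter(architecture)
    return all(available[domain] >= n for domain, n in needed.items())


# Hypothetical protein with three domains, listed from N- to C-terminus.
protein = ["14-3-3", "other", "VIT"]
query = ["VIT", "14-3-3"]

print(matches_architecture(query, protein))                        # order ignored
print(matches_architecture(query, protein, order_sensitive=True))  # order enforced
```

In this example the protein satisfies the query when order is ignored, but not when the order VIT followed by 14-3-3 is enforced.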
Figure 6. Growth of the manually annotated Swiss-Prot and automatically annotated TrEMBL sections of UniProtKB over the last decade.