Alex L Mitchell, Teresa K Attwood, Patricia C Babbitt, Matthias Blum, Peer Bork, Alan Bridge, Shoshana D Brown, Hsin-Yu Chang, Sara El-Gebali, Matthew I Fraser, Julian Gough, David R Haft, Hongzhan Huang, Ivica Letunic, Rodrigo Lopez, Aurélien Luciani, Fabio Madeira, Aron Marchler-Bauer, Huaiyu Mi, Darren A Natale, Marco Necci, Gift Nuka, Christine Orengo, Arun P Pandurangan, Typhaine Paysan-Lafosse, Sebastien Pesseat, Simon C Potter, Matloob A Qureshi, Neil D Rawlings, Nicole Redaschi, Lorna J Richardson, Catherine Rivoire, Gustavo A Salazar, Amaia Sangrador-Vegas, Christian J A Sigrist, Ian Sillitoe, Granger G Sutton, Narmada Thanki, Paul D Thomas, Silvio C E Tosatto, Siew-Yit Yong, Robert D Finn.
Abstract
The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.
Year: 2019 PMID: 30398656 PMCID: PMC6323941 DOI: 10.1093/nar/gky1100
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Member database versions integrated into InterPro since release 61.0
| InterPro release | Member database update |
|---|---|
| 61.0 | SFLD (2), PANTHER (11.1) |
| 62.0 | CATH-Gene3D (4.1), HAMAP (201701.18), PROSITE patterns (20.132), PROSITE profiles (20.132) |
| 63.0 | Pfam (31.0) |
| 64.0 | CDD (3.16) |
| 65.0 | SFLD (3), PANTHER (12.0) |
| 66.0 | HAMAP (2017_10), PROSITE patterns (2017_09), PROSITE profiles (2017_09) |
| 67.0 | CATH-Gene3D (4.2) |
| 68.0 | HAMAP (2018_03), PROSITE patterns (2018_02), PROSITE profiles (2018_02) |
| 69.0 | (MobiDB-lite update) |
| 70.0 | SFLD (4) |
Coverage of UniProtKB by InterPro signatures
| Sequence database | Number of proteins in database | Number of proteins with one or more matches to InterPro |
|---|---|---|
| UniProtKB/reviewed | 558 125 | 539 742 (96.7%) |
| UniProtKB/unreviewed | 124 797 108 | 100 920 355 (80.9%) |
| UniProtKB (total) | 125 355 233 | 101 460 097 (80.9%) |
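The percentages and the totals row in the coverage table can be re-derived from the reviewed and unreviewed counts. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of InterPro or UniProtKB.

```python
# Counts copied from the coverage table: (proteins in database, proteins with >= 1 InterPro match).
coverage = {
    "UniProtKB/reviewed": (558_125, 539_742),
    "UniProtKB/unreviewed": (124_797_108, 100_920_355),
}


def percent_covered(matched: int, total: int) -> float:
    """Return the share of proteins with at least one match, as a percentage to one decimal place."""
    return round(100.0 * matched / total, 1)


# The "UniProtKB (total)" row is the sum of the reviewed and unreviewed rows.
total_proteins = sum(total for total, _ in coverage.values())
total_matched = sum(matched for _, matched in coverage.values())

for name, (total, matched) in coverage.items():
    print(f"{name}: {percent_covered(matched, total)}%")
print(f"UniProtKB (total): {percent_covered(total_matched, total_proteins)}%")
```

Because unreviewed entries make up the vast majority of UniProtKB, the overall coverage figure is essentially the unreviewed figure.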
Figure 1. InterPro coverage of amino acid residues in UniProtKB. (A) Unique residue coverage of UniProtKB by signatures integrated into InterPro, member database signatures awaiting integration, intrinsically disordered regions, and regions predicted to be signal peptides, transmembrane domains or coiled-coils. (B) Residue coverage of InterPro's contributing member databases. Residues matched by signatures integrated into InterPro are shown in green, and residues found only in signatures not yet integrated are shown in blue.
Figure 2. Example API queries. From top to bottom, the first example returns a count of the total number of entries in InterPro and its member databases. The second retrieves information on all InterPro entries. The third and fourth examples return information specific to InterPro entry IPR023411 and PANTHER entry PTHR10000, respectively. The fifth returns InterPro information for all UniProtKB sequences matching InterPro entry IPR00009. The final request returns details of the match between Pfam entry PF00020 and UniProtKB sequence O00220. Further details about the structure of the API URLs are given in Supplementary Data Table S1.
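The figure itself is not reproduced in this record, so the sketch below only illustrates how some of the requests described in the Figure 2 caption might be assembled as URLs. The base URL `https://www.ebi.ac.uk/interpro/api` and the path layout (`entry/<database>/<accession>`, optionally followed by `protein/uniprot/<accession>`) are assumptions not stated in the text above; Supplementary Data Table S1 remains the authoritative description. The helper `api_url` is hypothetical, and the code builds strings only, without sending any request.

```python
from urllib.parse import quote

# Assumed base URL for the InterPro API; not given in the text above.
BASE_URL = "https://www.ebi.ac.uk/interpro/api"


def api_url(*segments: str) -> str:
    """Join path segments onto the assumed base URL, percent-encoding each one."""
    path = "/".join(quote(segment, safe="") for segment in segments)
    return f"{BASE_URL}/{path}"


# Assumed URL shapes for four of the requests described in the caption.
examples = {
    "counts for InterPro and member databases": api_url("entry"),
    "InterPro entry IPR023411": api_url("entry", "interpro", "IPR023411"),
    "PANTHER entry PTHR10000": api_url("entry", "panther", "PTHR10000"),
    "Pfam PF00020 match on UniProtKB O00220": api_url(
        "entry", "pfam", "PF00020", "protein", "uniprot", "O00220"
    ),
}

for description, url in examples.items():
    print(f"{description}: {url}")
```

Keeping URL construction separate from the HTTP call makes the path structure easy to inspect before any request is made.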
Figure 3. Selecting data to download from the Browse page creates a link to an appropriately pre-filled form and API request on the Download page.
Figure 4. Intersecting (A) and non-intersecting (B) InterPro matches for the purpose of calculating homologous superfamily relationships.
Figure 5. Reciprocal ‘overlapping homologous superfamilies’ and ‘overlapping entries’ links on the homologous superfamily entry (left) and other InterPro entry (right) pages which display the relationships between these entry types.
Figure 6. The homologous superfamilies annotation track on the ProtVista view on the proteins page allows structural information to be placed in context with other annotations.
Figure 7. (A) Pfam, CATH-Gene3D and SUPERFAMILY domain matches for UniProtKB sequence A0A0Q0BJI4. The segments A1 and A2 form a discontinuous domain and segment B is an independent nested domain. (B) Example InterProScan XML output for the Pfam matches shown in (A).
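The arrangement described in the Figure 7 caption, where one domain is split into segments A1 and A2 and a second domain B sits between them, can be sketched as a small data structure. This is an illustrative model only, not InterProScan's data model or XML schema; the class names are hypothetical and the residue coordinates are invented for the example rather than taken from A0A0Q0BJI4.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A contiguous stretch of residues (1-based, inclusive)."""
    start: int
    end: int


@dataclass
class DomainMatch:
    """A domain match made of one or more segments."""
    label: str
    segments: List[Segment]

    @property
    def is_discontinuous(self) -> bool:
        # More than one segment means the domain is interrupted in sequence.
        return len(self.segments) > 1

    def gaps(self) -> List[Segment]:
        """Return the regions lying between consecutive segments."""
        ordered = sorted(self.segments, key=lambda s: s.start)
        return [
            Segment(a.end + 1, b.start - 1)
            for a, b in zip(ordered, ordered[1:])
            if b.start - a.end > 1
        ]


def is_nested(inner: DomainMatch, outer: DomainMatch) -> bool:
    """True if every segment of `inner` falls inside some gap of `outer`."""
    return bool(inner.segments) and all(
        any(gap.start <= seg.start and seg.end <= gap.end for gap in outer.gaps())
        for seg in inner.segments
    )


# Hypothetical coordinates, for illustration only.
domain_a = DomainMatch("A", [Segment(10, 80), Segment(200, 260)])  # A1 and A2
domain_b = DomainMatch("B", [Segment(90, 190)])                    # nested domain

print(domain_a.is_discontinuous, is_nested(domain_b, domain_a))
```

Modelling a match as a list of segments, rather than a single start and end, avoids reporting the inserted domain's residues as part of the surrounding domain.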