Sarah Burge, Elizabeth Kelly, David Lonsdale, Prudence Mutowo-Muellenet, Craig McAnulla, Alex Mitchell, Amaia Sangrador-Vegas, Siew-Yit Yong, Nicola Mulder, Sarah Hunter.
Abstract
InterPro amalgamates predictive protein signatures from a number of well-known partner databases into a single resource. To aid interpretation of results, InterPro entries are manually annotated with terms from the Gene Ontology (GO). The InterPro2GO mappings comprise the cross-references between these two resources and are the largest source of GO annotation predictions for proteins. Here, we describe the protocol by which InterPro curators integrate GO terms into the InterPro database. We discuss the unique challenges involved in integrating specific GO terms with entries that may describe a diverse set of proteins, and we illustrate, with examples, how InterPro hierarchies reflect GO terms of increasing specificity. We describe a revised protocol for GO mapping that enables us to assign GO terms to domains based on the function of the individual domain, rather than the function of the families in which the domain is found. We also discuss how taxonomic constraints are handled and those cases where we are unable to add any appropriate GO terms. Expert manual annotation of InterPro entries with GO terms enables users to infer function, process or subcellular information for uncharacterized sequences based on sequence matches to predictive models. Database URL: http://www.ebi.ac.uk/interpro. The complete InterPro2GO mappings are available at: ftp://ftp.ebi.ac.uk/pub/databases/GO/goa/external2go/interpro2go
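For readers who want to work with the mapping file directly, the following is a minimal sketch of a parser. It assumes the usual external2go line layout (`InterPro:<accession> <entry name> > GO:<term name> ; <GO id>`, with comment lines starting with `!`); that layout, the function name `parse_interpro2go` and the sample lines are assumptions made for illustration and are not taken from the abstract.

```python
import re
from collections import defaultdict

# Assumed layout of one mapping line in the external2go format:
#   InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677
LINE_RE = re.compile(
    r"^InterPro:(IPR\d{6})\s+(.*?)\s+>\s+GO:(.*?)\s*;\s*(GO:\d{7})\s*$"
)


def parse_interpro2go(lines):
    """Return {InterPro accession: [(GO id, GO term name), ...]}.

    Hypothetical helper: skips blank lines, '!' comment lines and any
    line that does not match the assumed layout.
    """
    mappings = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        if not line or line.startswith("!"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        accession, _entry_name, go_name, go_id = match.groups()
        mappings[accession].append((go_id, go_name))
    return dict(mappings)


# Illustrative input only; real content should be read from the mapping file.
SAMPLE_LINES = [
    "!illustrative header comment",
    "InterPro:IPR000003 Retinoid X receptor/HNF4 > GO:DNA binding ; GO:0003677",
    "this line does not follow the layout and is ignored",
]

sample_mappings = parse_interpro2go(SAMPLE_LINES)
```

In practice the same function could be fed an open file handle for a downloaded copy of `interpro2go`, since it only iterates over lines.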
Year: 2012 PMID: 22301074 PMCID: PMC3270475 DOI: 10.1093/database/bar068
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Flowchart outlining the decision process taken by InterPro curators in order to assign GO terms.
InterPro GO annotation coverage as of InterPro v34
| InterPro2GO, v 34.0 | Entries | Coverage (%) |
|---|---|---|
| Number of InterPro entries | 22 245 | 100 |
| Associated with at least one GO term | 10 721 | 46.2 |
| Unmapped entries | 11 974 | 54.8 |
| Of which conserved sites | 634 | 2.9 |
| Other unmappable entries | 3335 | 15.0 |
| Number of unique GO terms | 3568 | |
| Number of individual sequences annotated | 11 515 689 | |
Figure 2. Application of GO molecular function terms to IPR002201 and its child entries. IPR002201 is a more general entry, which encompasses the proteins matched by its three child entries, IPR011908, IPR011910 and IPR011916. The increased specificity of the child entry can be reflected in the GO annotation; IPR011908 has a more specific Molecular Function term than the parent entry IPR002201.
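The parent and child relationship in this caption can be sketched in code. The example below assumes that any GO term valid for a parent entry also applies to proteins matched by its children, so a child's effective annotation is its own terms plus those of its ancestors. The GO labels are placeholders rather than the real annotations of these entries, and `effective_terms` is a hypothetical helper, not part of any InterPro tooling.

```python
# Parent -> children relationships as stated in the Figure 2 caption.
CHILDREN = {"IPR002201": ["IPR011908", "IPR011910", "IPR011916"]}

# Invert to child -> parent for upward traversal.
PARENT = {child: parent for parent, kids in CHILDREN.items() for child in kids}

# Placeholder GO labels (hypothetical, NOT the real terms for these entries).
DIRECT_TERMS = {
    "IPR002201": {"GO:PLACEHOLDER_GENERAL"},
    "IPR011908": {"GO:PLACEHOLDER_SPECIFIC"},
}


def effective_terms(entry):
    """Collect an entry's own terms plus those of every ancestor.

    Assumption: terms assigned to a more general entry hold for all
    proteins matched by its more specific child entries.
    """
    terms = set(DIRECT_TERMS.get(entry, set()))
    parent = PARENT.get(entry)
    while parent is not None:
        terms |= DIRECT_TERMS.get(parent, set())
        parent = PARENT.get(parent)
    return terms
```

Under these assumptions, `effective_terms("IPR011908")` returns both the general and the more specific placeholder term, while a child with no terms of its own, such as `IPR011910` in this sketch, carries only the parent's term.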
Figure 3. Complementary domain and family GO mapping for InterPro entries that match the human cellular tumour antigen p53. Domain GO annotation enables the function(s) of the family to be attributed to individual domains within the protein.