
The IntAct database: efficient access to fine-grained molecular interaction data.

Noemi Del Toro, Anjali Shrivastava, Eliot Ragueneau, Birgit Meldal, Colin Combe, Elisabet Barrera, Livia Perfetto, Karyn How, Prashansa Ratan, Gautam Shirodkar, Odilia Lu, Bálint Mészáros, Xavier Watkins, Sangya Pundir, Luana Licata, Marta Iannuccelli, Matteo Pellegrini, Maria Jesus Martin, Simona Panni, Margaret Duesbury, Sylvain D Vallet, Juri Rappsilber, Sylvie Ricard-Blum, Gianni Cesareni, Lukasz Salwinski, Sandra Orchard, Pablo Porras, Kalpana Panneerselvam, Henning Hermjakob.

Abstract

The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2022        PMID: 34761267      PMCID: PMC8728211          DOI: 10.1093/nar/gkab1006

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Biomolecular interactions are the fabric underlying almost all processes in living organisms, and they are determined by a broad array of experimental approaches, from focussed studies of pairwise interactions to large-scale determination of 10 000s of interactions in standardised high throughput experiments. However, observed molecular interactions are highly dependent on the biological and experimental conditions under which they are determined. Cellular systems, experimental protein tags, sequence modifications, and experimental approaches all heavily influence the observed interaction. Since its inception in 2005, members of the International Molecular Exchange Consortium (IMEx) (1) have collaboratively curated molecular interaction data from the scientific literature and from direct data depositions, emphasizing a deep curation model aiming to capture interaction reports in sufficient detail to support subsequent comprehensive data presentation, aggregation, and analysis. In 2017, the IMEx Consortium became an ELIXIR core data resource (2), recognising it as part of the fundamental infrastructure for life sciences. For an in-depth review of the current IMEx data model, curation strategies and collaborations, see (3). The IntAct database of molecular interactions is used by all currently active IMEx partners (IntAct, DIP (4), UniProt (5), MINT (6), MatrixDB (7), UCL ICS, IID (8)) as a common curation platform, and also acts as a common data dissemination platform, in parallel to the partners’ own websites. While the detailed IMEx interaction data has always been available through download in the feature-rich PSI-MI XML format (9,10), many annotation details were not conveniently accessible through the IntAct website, and users are often not aware of the depth of available annotations.
We are increasingly addressing this issue through the release of targeted datasets, in particular for sequence variations impacting interactions, and through a completely redeveloped website, which provides comprehensive filter and display tools to make optimal use of the rich annotation available in the IntAct database.

Data Content

Since the last IntAct NAR publication (11), data content has grown from 408 000 (January 2014) to 1 114 500 (June 2021) interaction evidences, and the number of referenced publications has risen from 12 500 to 22 500. This rapid increase is based on the integration of previously curated data from IMEx partners, as well as the ongoing curation work. The faster rise in interaction numbers compared to publication numbers reflects the increasing trend towards large-scale interaction studies. In the same period, interactions from 21 publications have been retracted, usually due to retraction of the supporting publication. Several new datasets have been released, including two key collections: the ‘Mutations dataset’ (12) and the ‘Coronavirus interactome’ (13).
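The growth figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the four numbers given in the text; the variable and function names are illustrative and not part of any IntAct tool.

```python
# Figures quoted in the text (January 2014 vs. June 2021).
evidences_2014, evidences_2021 = 408_000, 1_114_500
publications_2014, publications_2021 = 12_500, 22_500


def growth_factor(before: int, after: int) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before


evidence_growth = growth_factor(evidences_2014, evidences_2021)
publication_growth = growth_factor(publications_2014, publications_2021)

# Average number of interaction evidences contributed per publication.
per_publication_2014 = evidences_2014 / publications_2014
per_publication_2021 = evidences_2021 / publications_2021

print(f"interaction evidences grew {evidence_growth:.2f}-fold")
print(f"publications grew {publication_growth:.2f}-fold")
print(f"evidences per publication: {per_publication_2014:.1f} -> {per_publication_2021:.1f}")
```

Interaction evidences grew roughly 2.7-fold while publications grew 1.8-fold, so the average number of evidences per publication rose from about 33 to about 50, consistent with the shift towards large-scale studies described above.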

Mutations dataset

This dataset contains annotations describing the effect of small sequence changes on protein interactions. Captured changes comprise both natural variants and experimentally introduced sequence changes. This dataset is continuously maintained and updated, and since the original publication in February 2019 (12) it has grown from 28 000 to 72 000 mutation annotations. In order to fully reflect the importance of this data and to improve accessibility to it for users, in addition to web interface changes (see below), we have also introduced a dedicated tab-delimited download file format (https://www.ebi.ac.uk/intact/download/datasets#mutations).
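As an illustration of how a tab-delimited download of this kind might be summarised, the sketch below counts annotations per protein and per effect with the Python standard library. The column names, identifiers and rows are invented for the example and are not the actual header of the IntAct mutations file, which should be inspected before adapting the code.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt: column names, identifiers and rows are invented for
# illustration and are NOT the real header of the IntAct mutations export.
SAMPLE_TSV = (
    "affected_protein\teffect\tpublication\n"
    "PROT_A\tdecreasing\tPUB_1\n"
    "PROT_A\tdisrupting\tPUB_1\n"
    "PROT_B\tincreasing\tPUB_2\n"
    "PROT_A\tdecreasing\tPUB_3\n"
)


def count_by_column(handle, column):
    """Count rows of a tab-delimited file grouped by the value in `column`."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row[column] for row in reader)


per_protein = count_by_column(io.StringIO(SAMPLE_TSV), "affected_protein")
per_effect = count_by_column(io.StringIO(SAMPLE_TSV), "effect")

print(per_protein.most_common())
print(per_effect.most_common())
```

For a downloaded file, the `io.StringIO` object would be replaced by a handle from `open(path, newline="", encoding="utf-8")`, and the column names by those found in the file's header line.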

Coronavirus interactome

After the outbreak of the COVID-19 pandemic in Europe in early March 2020, we initiated an IMEx-wide initiative to record molecular interaction data related to SARS-CoV-2 and other members of the Coronaviridae family of viruses, along with human protein interactions of potential relevance for the disease's aetiopathology. Since its publication in November 2020 (13), the Coronavirus interactome dataset has grown from 4400 interaction evidences derived from 151 publications to 9100 interaction evidences from 332 publications in June 2021, and is accessible at https://www.ebi.ac.uk/intact/resources/datasets#coronavirus. Work is still actively ongoing to capture novel interactions, and details of known interactions such as the effects of variants, to further enhance this dataset.

Curation Policies

Given the fast pace at which COVID-19-related data has been generated, the IMEx Consortium decided to allow the curation of preprints when the scientific interest contained in these publications justifies it. We will periodically review and update these datasets to ensure only data from peer-reviewed publications is maintained in the database long-term. More information about IMEx's curation policy regarding preprints is provided at www.imexconsortium.org/curation/. Curation practices and controlled vocabularies/ontologies are continuously updated, driven by the development of new methods like BioID (14) (term in PSI-MI Ontology https://www.ebi.ac.uk/ols/ontologies/mi/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMI_1314). The IntAct sibling resource, the Complex Portal (15), now provides a reference resource for biomolecular complexes, and we are annotating complexes with Complex Portal identifiers as interacting objects where possible, in addition to interactions of proteins with small molecules, nucleic acids and polysaccharides such as glycosaminoglycans.

Web Site

We have redeveloped the IntAct web site (https://www.ebi.ac.uk/intact) to provide efficient, user-friendly access to IntAct data content, with a focus on filter and display functionality to make the detailed interaction data accessible and useful through the user interface. The quick search provides autocompletion to facilitate selection of molecules of interest based on gene names, protein names, and accession numbers. The batch search supports multiple simultaneous query terms and subsequent result refinement. Results are shown both graphically and in tabular format, can be modified through comprehensive filter and visualisation options, and exported in both tabular and graphical formats. Figure 1 provides a view of the new IntAct web interface and its functionality, and Figure 2 demonstrates the high level of detail provided for a single interaction. In addition to queries, species-specific interactomes (Figure 3) and datasets like ‘Alzheimers’ (16) are available from tiles on the home page.
Figure 1.

IntAct search results for UniProtKB:P0C6X7-PRO_0000037322. The option ‘Affected by Mutation’ has been selected in facet (A), highlighting corresponding interactions in bold pink. The minimum MI score (label B) has been set to 0.43. One interaction between SARS-CoV proteins, nsp10-nsp16, is highlighted (through mouse click) in purple (edge C). The interaction table (D) automatically shows only the highlighted interaction. The legend on the right (E) documents the representation of species, type of biomolecule, mutations and edge types. Clicking on the magnifying glass (icon F) provides detailed information on an interaction, as shown in Figure 2. Interactor positions can be manually rearranged through drag-and-drop, as done here. Figure is a modified screen capture from https://www.ebi.ac.uk/intact/search?query=P0C6X7-PRO_0000037322&minMIScore=0.43&mutationStyle=true.

Figure 2.

This figure shows the interaction viewer for the highlighted edge from Figure 1. Features of the participants including the N-terminal tags (A) and all the mutations annotated for this interaction are displayed in the viewer (B) and also in the legend (C). All the features are mapped at the amino acid level of the proteins. Further details on the features are available from the features tab (D) below the Interaction viewer. Figure is a modified screen capture from https://www.ebi.ac.uk/intact/details/interaction/EBI-25506442.

Figure 3.

Species-specific interactomes are easily accessible from the ‘Interactomes’ tile of the home page.
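The search address quoted in the Figure 1 caption encodes the query term, the minimum MI score and the mutation highlighting as URL parameters. The sketch below rebuilds that address with the Python standard library. Only the three parameter names visible in the caption are used; the helper name `build_search_url` is hypothetical, and whether the site accepts other values or parameters is an assumption that is not tested here.

```python
from urllib.parse import urlencode

# Base address taken from the Figure 1 caption.
SEARCH_ENDPOINT = "https://www.ebi.ac.uk/intact/search"


def build_search_url(query, min_mi_score=None, mutation_style=False):
    """Assemble a search URL from the parameters seen in the Figure 1 caption."""
    params = {"query": query}
    if min_mi_score is not None:
        params["minMIScore"] = str(min_mi_score)
    if mutation_style:
        params["mutationStyle"] = "true"
    # urlencode percent-escapes any characters that are unsafe in a query string.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url(
    "P0C6X7-PRO_0000037322", min_mi_score=0.43, mutation_style=True
)
print(url)
```

Building the address from a dictionary rather than by string concatenation keeps the identifier intact and avoids the kind of character corruption that affects hand-copied accessions.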

Implementation

The new IntAct public instance is deployed on the EMBL-EBI cloud using Kubernetes to manage the different containerized applications (the images have been built with Docker). The IntAct public interface is based on a Neo4j graph database and Apache Solr to enable the search and navigation features. An externally accessible API (https://www.ebi.ac.uk/intact/documentation/technical_corner#apis), developed in Java™ with the Spring framework to ease the implementation of the microservices architecture, serves data to both the web application and the Cytoscape app (17). The web frontend is a single page application implemented with the Angular framework together with the EMBL-EBI Visual framework for general styling (https://www.ebi.ac.uk/style-lab/websites/). The network display (Figure 1) is based on Cytoscape.js (18), and the interaction detail view uses the ComplexViewer (19). The IntAct web-based user interface has been specified using two rounds of user testing based on mockups. To provide a consistent user experience, we are co-ordinating the visualisation of interacting molecules in terms of shape (for molecule types) and default colour (for species) between the IntAct web interface and the IntAct Cytoscape app.

Perspectives

Tissue specificity

Recent research emphasizes fundamental differences among cell type specific interactomes (20). Detailed annotation of cell types/tissue has been standard practice in IMEx curation for a long time, but the information is currently partially in free text form and will benefit from standardisation and integration with ontologies like Experimental Factor Ontology (21), Brenda (22), Uberon (23), Cell Line Ontology (24), and Cellosaurus (25). We are currently working on the restructuring of cell type/tissue annotation and increasing exposure of these data through download files and user interfaces.

Rare diseases dataset

As part of our commitment to the clinical community, we are currently populating a rare disease dataset, with a focus on interactions affected by rare disease mutations. Approximately 5500 rare disease interactions have been annotated to date, assigning details such as kinetic parameters, variable experimental conditions or construct details, including binding surfaces and mutations that affect the interactions. The data include information about the amino acid changes, their effect on the interaction, and a full reference to the experimental interaction evidence from which each annotation was extracted. Currently, around 98% of the annotations are mapped to human proteins, providing high-quality experimental evidence of sequence change effects which directly relate to existing variation data.

Credit attribution

The data presented here has been carefully curated over almost two decades by professional curators from twelve IMEx partners. To value scientific database curation as a key scientific activity in its own right, we are working on credit attribution for past and future IMEx curators through APICURON (26) and ORCID (https://orcid.org/).

Box text: key concepts

IMEx: The International Molecular Exchange Consortium (IMEx), founded in 2005, is an international collaboration of twelve interaction data resources which coordinate their curation strategies. IntAct is an IMEx founding member, and provides the web-based curation platform used by all current IMEx partners.

Interaction evidence: Interactions may have two or more participating molecules, and the number of observable interactors may depend on both biological and experimental constraints. As an example, the yeast-two-hybrid array technology (27) typically identifies only pairs of interactors (binary interactions), while techniques like tandem affinity purification (TAP) (28) and BioID (29) may identify two or more interacting molecules (n-ary interactions). Observed n-ary interactions are stored as such in the IntAct database, but for some download files and for visualisation, counting and comparison purposes, they are expanded into multiple binary interactions. In addition, one publication may use more than one experimental method to determine an interaction. One interaction evidence is one pair of interacting molecules, observed by one experimental approach, reported by one publication. In this manuscript, we use ‘interaction’ as a synonym for the technically more correct term of ‘interaction evidence’.

MI Score: The MI Score (30) is a quantitative estimate of the confidence in a given interaction. It is a normalized and weighted count of independent interaction evidence and associated experimental methods.
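To make the expansion of n-ary interactions and the counting of interaction evidences concrete, the following sketch shows two commonly used expansion schemes (spoke and matrix) and a counter that follows the definition above: one pair, one experimental approach, one publication. The text does not state which expansion scheme is applied, so both are shown for comparison; the molecule names, method labels and publication labels are placeholders, and the code is not taken from the IntAct code base.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey: one binary interaction per prey."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(participants):
    """Pair every participant with every other participant."""
    return list(combinations(participants, 2))


def count_interaction_evidences(records):
    """Count distinct (unordered pair, method, publication) combinations."""
    return len(
        {(frozenset(pair), method, publication)
         for pair, method, publication in records}
    )


# Placeholder n-ary observation: bait A purified together with B, C and D.
spoke_pairs = spoke_expansion("A", ["B", "C", "D"])
matrix_pairs = matrix_expansion(["A", "B", "C", "D"])

# Placeholder records; the second one repeats the first with the pair reversed.
records = [
    (("A", "B"), "two hybrid", "pub1"),
    (("B", "A"), "two hybrid", "pub1"),
    (("A", "B"), "tandem affinity purification", "pub1"),
    (("A", "B"), "two hybrid", "pub2"),
]
evidence_count = count_interaction_evidences(records)

print(len(spoke_pairs), len(matrix_pairs), evidence_count)
```

Spoke expansion yields one pair per prey, whereas matrix expansion yields every possible pair and therefore grows quadratically with complex size. Using an unordered pair as part of the key ensures that A-B and B-A observed by the same method in the same publication count as a single evidence.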

DATA AVAILABILITY

IntAct is open source, open data. The source code is available from https://github.com/intact-portal, all data is freely available through the web interface, API, and from https://www.ebi.ac.uk/intact/download under the CC BY 4.0 licence.
  30 in total

1.  The necdin interactome: evaluating the effects of amino acid substitutions and cell stress using proximity-dependent biotinylation (BioID) and mass spectrometry.

Authors:  Matthea R Sanderson; Katherine E Badior; Richard P Fahlman; Rachel Wevrick
Journal:  Hum Genet       Date:  2020-06-11       Impact factor: 4.132

2.  MINT, the molecular interaction database: 2012 update.

Authors:  Luana Licata; Leonardo Briganti; Daniele Peluso; Livia Perfetto; Marta Iannuccelli; Eugenia Galeota; Francesca Sacco; Anita Palma; Aurelio Pio Nardozza; Elena Santonico; Luisa Castagnoli; Gianni Cesareni
Journal:  Nucleic Acids Res       Date:  2011-11-16       Impact factor: 16.971

3.  Uberon, an integrative multi-species anatomy ontology.

Authors:  Christopher J Mungall; Carlo Torniai; Georgios V Gkoutos; Suzanna E Lewis; Melissa A Haendel
Journal:  Genome Biol       Date:  2012-01-31       Impact factor: 13.583

4.  Merging and scoring molecular interactions utilising existing community standards: tools, use-cases and a case study.

Authors:  J M Villaveces; R C Jiménez; P Porras; N Del-Toro; M Duesbury; M Dumousseau; S Orchard; H Choi; P Ping; N C Zong; M Askenazi; B H Habermann; Henning Hermjakob
Journal:  Database (Oxford)       Date:  2015-02-04       Impact factor: 3.451

5.  ComplexViewer: visualization of curated macromolecular complexes.

Authors:  Colin W Combe; Marine Dumousseau Sivade; Henning Hermjakob; Joshua Heimbach; Birgit H M Meldal; Gos Micklem; Sandra Orchard; Juri Rappsilber
Journal:  Bioinformatics       Date:  2017-11-15       Impact factor: 6.937

6.  UniProt: the universal protein knowledgebase in 2021.

Authors: 
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

7.  Broadening the horizon--level 2.5 of the HUPO-PSI format for molecular interactions.

Authors:  Samuel Kerrien; Sandra Orchard; Luisa Montecchi-Palazzi; Bruno Aranda; Antony F Quinn; Nisha Vinod; Gary D Bader; Ioannis Xenarios; Jérôme Wojcik; David Sherman; Mike Tyers; John J Salama; Susan Moore; Arnaud Ceol; Andrew Chatr-Aryamontri; Matthias Oesterheld; Volker Stümpflen; Lukasz Salwinski; Jason Nerothin; Ethan Cerami; Michael E Cusick; Marc Vidal; Michael Gilson; John Armstrong; Peter Woollard; Christopher Hogue; David Eisenberg; Gianni Cesareni; Rolf Apweiler; Henning Hermjakob
Journal:  BMC Biol       Date:  2007-10-09       Impact factor: 7.431

8.  Encompassing new use cases - level 3.0 of the HUPO-PSI format for molecular interactions.

Authors:  M Sivade Dumousseau; D Alonso-López; M Ammari; G Bradley; N H Campbell; A Ceol; G Cesareni; C Combe; J De Las Rivas; N Del-Toro; J Heimbach; H Hermjakob; I Jurisica; M Koch; L Licata; R C Lovering; D J Lynn; B H M Meldal; G Micklem; S Panni; P Porras; S Ricard-Blum; B Roechert; L Salwinski; A Shrivastava; J Sullivan; N Thierry-Mieg; Y Yehudi; K Van Roey; S Orchard
Journal:  BMC Bioinformatics       Date:  2018-04-11       Impact factor: 3.169

9.  Capturing variation impact on molecular interactions in the IMEx Consortium mutations data set.

Authors:  N Del-Toro; M Duesbury; M Koch; L Perfetto; A Shrivastava; D Ochoa; O Wagih; J Piñero; M Kotlyar; C Pastrello; P Beltrao; L I Furlong; I Jurisica; H Hermjakob; S Orchard; P Porras
Journal:  Nat Commun       Date:  2019-01-02       Impact factor: 14.919

10.  MatrixDB: integration of new data with a focus on glycosaminoglycan interactions.

Authors:  Olivier Clerc; Madeline Deniaud; Sylvain D Vallet; Alexandra Naba; Alain Rivet; Serge Perez; Nicolas Thierry-Mieg; Sylvie Ricard-Blum
Journal:  Nucleic Acids Res       Date:  2019-01-08       Impact factor: 16.971
