INTERSPIA: a web application for exploring the dynamics of protein-protein interactions among multiple species.

Daehong Kwon¹, Daehwan Lee¹, Juyeon Kim¹, Jongin Lee¹, Mikang Sim¹, Jaebum Kim¹.

Abstract

Proteins perform biological functions through cascading interactions with each other by forming protein complexes. As a result, interactions among proteins, called protein-protein interactions (PPIs), are not completely free from selection constraint during evolution. Therefore, the identification and analysis of PPI changes during evolution can give us new insight into the evolution of functions. Although many algorithms, databases and websites have been developed to help the study of PPIs, most of them are limited to visualizing the structure and features of PPIs in a single chosen species, with limited visualization functions. This makes it difficult to identify different patterns of PPIs in different species and their functional consequences. To resolve these issues, we developed a web application called INTER-Species Protein Interaction Analysis (INTERSPIA). Given a set of proteins of interest to the user, INTERSPIA first discovers additional proteins that are functionally associated with the input proteins and searches for different patterns of PPIs in multiple species through a server-side pipeline, and then visualizes the dynamics of PPIs in multiple species using an easy-to-use web interface. INTERSPIA is freely available at http://bioinfo.konkuk.ac.kr/INTERSPIA/.

Year: 2018 PMID: 29746660 PMCID: PMC6031021 DOI: 10.1093/nar/gky378

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

A cell consists of diverse materials, such as DNAs, RNAs, proteins and metabolites, which are interlinked to carry out diverse functions. In particular, proteins perform most biological functions through cascading interactions with each other by forming protein complexes (1–3). Hence, understanding interactions among proteins, called protein-protein interactions (PPIs), enables a better understanding of biological functions. With recent advances in experimental techniques for screening PPIs (4), a huge amount of PPI information in various species has been accumulated, and many PPI databases including STRING (5), DIP (6), MINT (7) and BioGRID (8) have been published. PPIs are not completely free from selection constraint during evolution, and as a result different PPI structures are observed in different species (9). The dynamics of PPIs during evolution are attributed to changes from two distinct sources: the proteins and the PPIs themselves. Gain of new proteins leads to the creation of new interactions (10–12). Conversely, loss of proteins or alteration of protein structure removes interactions between functionally linked protein pairs (1,12,13). PPIs can also change without any gain or loss of proteins during evolution, as shown in a recent study of the rewiring of PPIs in the MAPK pathway between yeast and higher organisms (14). Therefore, the identification and analysis of PPI changes during evolution can give us new insight into the evolution of biological functions. A whole collection of PPIs in a cell, called an interactome, has been analysed in many recent studies to identify evolutionarily conserved PPIs among bacterial species or among human, yeast, and fly (15–17). In addition, many studies have developed alignment algorithms for protein interaction networks (18–23), which can be used to compare PPI networks in different species.
However, these alignment algorithms do not provide a web interface with functions to easily retrieve, visualize and compare PPIs among multiple species. Many web interfaces for PPIs, such as PINV (24) and NetworkAnalyst (25), have been developed to search for interactions linked with queried proteins using a specific PPI database, and to visualize them with various options for the analysis of a PPI network. Many PPI databases also contain their own web-based network viewer to visualize their PPI networks. However, most of these web applications focus on visualizing protein interactions associated with queried proteins in a single chosen species. A recently published web application, mentha (26), supports the visualization of PPIs of queried proteins in multiple species together. However, mentha only shows separate PPI networks for the different species, which is not suitable for a comprehensive understanding of the differences among species. These limitations make it difficult to identify different patterns of PPIs in different species and their functional consequences. To resolve these issues, we developed a web application called INTER-Species Protein Interaction Analysis (INTERSPIA). Given a set of proteins of interest to the user, INTERSPIA discovers additional proteins that are functionally associated with the input proteins, searches for different patterns of PPIs in multiple species through a server-side pipeline, and visualizes the dynamics of PPIs in multiple species using an easy-to-use web interface.

MATERIALS AND METHODS

Web application overview

Given a target species, a set of proteins of the target species, and additional species to be compared (top panel of Figure 1), INTERSPIA searches for additional proteins associated with the input proteins using the STRING database (5) and the random walk with restart algorithm (27,28). The interactions among the extended proteins in the other species are then identified using the STRING and orthoDB (29) databases (middle panel of Figure 1). Finally, the dynamics of the PPIs among multiple species are visualized in an easy-to-use web interface with several features (bottom panel of Figure 1). The details of the workflow are described in the following subsections.

Figure 1.

INTERSPIA workflow for inter-species protein interaction analysis. As user input, INTERSPIA takes a target species (orange-colored species on the right side of the top panel) and a set of proteins of interest from the target species (left side of the top panel), together with additional species to be compared (black-colored species on the right side of the top panel). In the server-side step (middle panel), additional proteins interacting with the input target proteins are discovered by the random walk with restart algorithm using the PPIs of the target species in the STRING database. The extended proteins are next used to identify their interactions in the other chosen species using the STRING and orthoDB databases. Functions related to the extended proteins are identified by the GO enrichment test. In the client-side step (bottom panel), the dynamics of the protein-protein interactions among the selected species are visualized in an easy-to-use web interface with various features.

Database

PPIs and orthologous protein information of 28 mammalian species (armadillo, bat, cat, chimpanzee, cow, dog, elephant, ferret, flying fox, galago, gibbon, gorilla, gray mouse lemur, ground squirrel, guinea pig, horse, human, kangaroo rat, marmoset, mouse, orangutan, pig, rat, rhesus macaque, rock hyrax, tarsier, tenrec, tree shrews) were collected from the STRING database v10.5 (5) and orthoDB v8 (29) respectively. The STRING database contains direct (physical) as well as indirect (functional) interactions, and both types of interactions are used in INTERSPIA. Gene Ontology (GO) information of 19 mammalian species (armadillo, cat, chimpanzee, cow, dog, elephant, gibbon, gorilla, guinea pig, horse, human, marmoset, mouse, orangutan, pig, rat, tarsier, tenrec, tree shrews) was acquired from the Gene Ontology consortium (release 31 March 2018) (30) and the Ensembl database (release 91) (31). The GO information of the remaining nine species was not found in the database. The current version of INTERSPIA only supports mammalian species.

User input

As its main input data, INTERSPIA requires a target species, a set of proteins of the target species, additional species to be compared with the target species, a percentage cut-off, and a STRING PPI confidence score cut-off (top panel of Figure 1; the percentage cut-off and the STRING PPI confidence score cut-off are not shown). The percentage cut-off limits the number of extended proteins that are actually used in the analysis (details in the following subsection). In the STRING database, the reliability of a PPI is represented as a confidence score. The user can change which PPIs are treated as existing in a species by adjusting the STRING PPI confidence score cut-off (details in the following subsection).

Server-side pipeline

Given the user input data, the following three main steps are performed by the server-side pipeline of INTERSPIA (middle panel of Figure 1). First, additional proteins that are potentially functionally associated with the input proteins are discovered by the random walk with restart (RWR) algorithm (27,28) using the PPI information of the target species in the STRING database. Starting from the input proteins, the RWR algorithm randomly explores the protein interaction network of the target species; at each step of the exploration it either moves to a neighbouring protein or goes back to the input proteins, according to a given restarting probability. To obtain proteins closer to the input proteins, 0.95 is used as the restarting probability in this pipeline. All explored proteins have a score representing their relative association with the input proteins, and the percentage cut-off described in the previous subsection is used to extract the highest-ranked ones, which are then merged with the input proteins to make the final extended proteins. This step is optional, so the user can skip it to use just the input proteins. In this case, the final extended proteins consist of only the input proteins. Second, interactions among the extended proteins in the target species as well as in the additional species are identified using the STRING and orthoDB databases. Specifically, for each of the extended proteins, orthologous proteins in every other species are identified using the orthologous protein information in the orthoDB database. An extended protein is not used in downstream analysis if its orthologous proteins are not found in all species. This prevents the absence of an orthologous protein in another species from being interpreted as the absence of a protein interaction in that species.
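The first step above can be illustrated with a minimal Python sketch of random walk with restart by power iteration. It is not the INTERSPIA implementation (which the paper states was written in Perl and C). The function names, the toy network, the unweighted transition probabilities, and the way the percentage cut-off is applied (top percentage of the non-input proteins, rounded up) are all assumptions made for illustration; only the restarting probability of 0.95 comes from the text.

```python
import math


def random_walk_with_restart(adjacency, seeds, restart=0.95, tol=1e-12, max_iter=1000):
    """Return the steady-state visiting probability of every node.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected, so every edge is listed in both directions).
    seeds:     input proteins; on restart the walker jumps to one of them
               uniformly at random (an assumption of this sketch).
    """
    nodes = list(adjacency)
    seed_list = [s for s in dict.fromkeys(seeds) if s in adjacency]
    if not seed_list:
        raise ValueError("none of the seeds is in the network")
    start = {n: 0.0 for n in nodes}
    for s in seed_list:
        start[s] = 1.0 / len(seed_list)

    prob = dict(start)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to the seeds.
        nxt = {n: restart * start[n] for n in nodes}
        for n in nodes:
            moving = (1.0 - restart) * prob[n]
            # Isolated nodes have no neighbour, so their mass goes back to the seeds.
            targets = adjacency[n] or seed_list
            for m in targets:
                nxt[m] += moving / len(targets)
        change = sum(abs(nxt[n] - prob[n]) for n in nodes)
        prob = nxt
        if change < tol:
            break
    return prob


def extend_proteins(scores, seeds, percent):
    """Merge the seeds with the top `percent` % of the other scored proteins."""
    seed_set = set(seeds)
    ranked = sorted((n for n in scores if n not in seed_set),
                    key=lambda n: (-scores[n], n))
    keep = math.ceil(len(ranked) * percent / 100.0)
    return list(seeds) + ranked[:keep]


# Hypothetical toy network: a chain A - B - C - D with A as the only input protein.
network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(network, ["A"])
extended = extend_proteins(scores, ["A"], percent=30)
```

With a restarting probability this high, almost all of the probability mass stays on or next to the input proteins, which matches the stated aim of obtaining proteins close to the input.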
Finally, the presence and absence of PPIs are identified in all species using the extended proteins of the target species, their orthologous proteins in other species, and PPI information in the STRING database. In this step, a PPI is treated as present when the following two conditions hold: (i) its information is present in the STRING database and (ii) its STRING confidence score is greater than or equal to the STRING PPI confidence score cut-off described in the previous subsection. A PPI is considered absent when the following two conditions hold: (i) its information is present in the STRING database but (ii) its STRING confidence score is below the cut-off. Therefore, PPIs whose information is not observed in the STRING database for at least one of the chosen species are excluded from further analyses. This reduces biases resulting from the uneven amount of PPI information for different species in the STRING database. In some cases, for each protein of a target protein pair, multiple orthologous proteins can be found in other species. In this case, interactions of all possible protein pairs between the two identified orthologous protein sets are examined. The interaction between the two target proteins is defined as existing in the other species when more than half of the examined protein pairs in the orthologous protein sets have an interaction. To identify functions related to the extended proteins, GO enrichment analysis is performed by the hypergeometric test using the phyper function in R (https://www.r-project.org/), and q-values are calculated to correct for multiple testing using the qvalue function in the qvalue R package (http://github.com/jdstorey/qvalue). In the GO enrichment analysis, only proteins or genes that have interaction data in the STRING database are used.
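The presence/absence rule and the majority rule for multiple orthologs can be sketched as follows. This is a hedged illustration, not the pipeline's code: the function name, the dictionary layout of the STRING scores (scaled to 0–1), and the protein identifiers are hypothetical. The text does not say how ortholog pairs without a STRING record are counted, so this sketch assumes they are simply not among the "examined" pairs.

```python
from itertools import product


def call_interaction(orthologs_a, orthologs_b, string_scores, cutoff):
    """Classify one PPI in one species.

    Returns True (present), False (absent), or None when no pair has a
    STRING record, in which case the PPI would be left out of the analysis.

    string_scores: dict mapping frozenset({protein_1, protein_2}) to a
                   confidence score between 0 and 1 (layout assumed here).
    """
    scored = []
    for a, b in product(orthologs_a, orthologs_b):
        score = string_scores.get(frozenset((a, b)))
        if score is not None:          # assumption: unrecorded pairs are skipped
            scored.append(score)
    if not scored:
        return None
    supporting = sum(1 for s in scored if s >= cutoff)
    # Present only when more than half of the examined pairs interact.
    return 2 * supporting > len(scored)


# Hypothetical ortholog sets and scores for a single species.
toy_scores = {
    frozenset(("p1", "q1")): 0.9,
    frozenset(("p1", "q2")): 0.8,
    frozenset(("p2", "q1")): 0.4,
}
present = call_interaction(["p1", "p2"], ["q1", "q2"], toy_scores, cutoff=0.7)
absent = call_interaction(["p1", "p2"], ["q1", "q2"], toy_scores, cutoff=0.85)
unknown = call_interaction(["p3"], ["q3"], toy_scores, cutoff=0.7)
```

For a one-to-one ortholog pair the function reduces to the two-condition rule in the text: a recorded score at or above the cut-off gives present, a recorded score below it gives absent.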

Client-side web interface

The PPIs found by the server-side pipeline are visualized as an undirected network as shown in Figure 2, where nodes and edges represent proteins and protein interactions, respectively. The size of a node indicates the relative size of the orthologous protein group to which the protein belongs. To distinguish between the input proteins and the additional ones discovered by the server-side pipeline, the input proteins are drawn as filled nodes while unfilled nodes are used for the additional proteins. An edge represents the existence of an interaction, and it is drawn using multiple colors, each of which indicates the existence of the interaction in a certain species. The network in the web interface is drawn using a force-directed layout, which is useful for aggregating nodes according to their interactions (32,33). The layout enables easy identification of dense regions in a network, which may correspond to functional protein complexes (34).

Figure 2.

Examples of the client-side web interface of INTERSPIA showing protein interactions observed in different sets of species. (A) Protein interactions observed in cow or pig. (B) Protein interactions commonly present in cow and pig but absent in horse, shown with the ‘Only species-specific interactions’ option. Phylogenetic trees on the left side represent the evolutionary relationships of the species used; the species chosen for displaying the observed protein interactions are drawn with thick branches. Edge color indicates a species having the interaction. If an edge is observed in multiple species, multiple colors are used to draw the edge.

The main feature of the web interface is an intuitive visualization of PPI differences among different species. By selecting species using the species selection buttons or a phylogenetic tree, the user can show PPIs present in the chosen species. Furthermore, when the ‘Only species-specific interactions’ option is selected, only PPIs commonly present in all selected species but absent in all unselected species are visualized. For example, Figure 2A shows PPIs present in cow or pig, while Figure 2B displays PPIs commonly observed in both cow and pig but absent in horse using the ‘Only species-specific interactions’ option. The second feature of the web interface is to show diverse information about the visualized proteins and their interactions. The functions and orthologous proteins of a specific protein can be shown by right-clicking on the protein (Figure 3A). The names of functions and orthologous proteins are linked to the description pages of the GO database and the Ensembl database, respectively. Gene symbol, UniProtKB-AC ID, Ensembl protein and gene ID, and Entrez Gene ID are also shown with a link to the description page of the related database. The list of species having a specific PPI can be displayed in the ‘Interactions’ panel by selecting a node and then clicking an associated edge. For example, Figure 3B shows that an interaction between two proteins ENSBTAP00000001704 (LTF) and ENSBTAP00000022763 (ALB) is present in both cow and pig but absent in horse. Another example in Figure 3B displays an interaction between two proteins ENSBTAP00000019203 (PSMA4) and ENSBTAP00000003523 (ORC1) that is present in pig but not observed in cow and horse. The user can also identify functions related to the proteins displayed in the current network by GO enrichment analysis, whose results are listed in the ‘Functions’ panel with features to filter them by q-value and sort them by q-value, GO ID and GO name (Figure 3C). The user can highlight proteins with a specific function by selecting a specific GO term in the ‘Functions’ panel.
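The species selection logic described above can be written down as a small set-based filter. The sketch below is in Python for illustration only (the actual interface is implemented in JavaScript), and the function name and data layout are assumptions. The two example interactions and their species are the ones reported for Figure 3B.

```python
def visible_interactions(presence, selected, species_specific=False):
    """Return the PPIs to draw for the currently selected species.

    presence: dict mapping a PPI (a pair of gene symbols) to the set of
              species in which it is present; every analysed species is
              either in that set (present) or not (absent).
    selected: species picked with the buttons or the phylogenetic tree.
    """
    selected = set(selected)
    shown = []
    for ppi, species in presence.items():
        if species_specific:
            # Present in all selected species and absent in all unselected ones.
            keep = species == selected
        else:
            # Present in at least one selected species.
            keep = bool(species & selected)
        if keep:
            shown.append(ppi)
    return shown


# The two interactions described for Figure 3B (cow, pig and horse analysed).
presence = {
    ("LTF", "ALB"): {"cow", "pig"},
    ("PSMA4", "ORC1"): {"pig"},
}
either = visible_interactions(presence, {"cow", "pig"})
shared_only = visible_interactions(presence, {"cow", "pig"}, species_specific=True)
pig_only = visible_interactions(presence, {"pig"}, species_specific=True)
```

Because every analysed species is either in the presence set or not, "present in all selected and absent in all unselected" collapses to a plain set equality.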

Figure 3.

Examples of the client-side web interface of INTERSPIA showing various features. (A) Diverse protein identifiers, a gene symbol, and function and orthologous protein information of a chosen protein, shown by right-clicking on a node in the network. (B) Information on two interactions and the species containing (green color) or not containing (grey color) the interactions, shown by left-clicking on an edge. The numbers next to the species names represent the STRING confidence score of the PPI in the case of the target species, or the average of the STRING confidence scores of all orthologous protein pairs in the case of other species. (C) Results of the GO enrichment test for the proteins shown in the current network. GO terms can be filtered by q-value and sorted by q-value, GO ID, and GO name. The user can highlight proteins with a specific function by selecting a specific GO term. In the figure, proteins associated with the SMAD protein complex assembly (GO:0007183) function are highlighted with a black and magnified label.

The web interface also supports various additional features for better understanding of the dynamics of PPIs in multiple species. The layout of the network can be changed to focus either on protein interactions, using the force-directed layout described above (default setting), or on orthologous protein relationships, using a grid layout. Reorganizing proteins based on orthologous protein relationships allows the user to better identify interaction patterns among different orthologous protein groups. The user can also control which proteins are visualized in terms of their reachability from the input proteins. For instance, only proteins directly connected to the input proteins can be shown with a button click. Using the search function, the user can easily search for a specific protein in a network based on gene or protein identifier, function, or orthologous protein information. Finally, the visualized network can be downloaded in various image formats including PDF, PNG, SVG and JPG, and the information on proteins, their interactions, and functions is provided as text files for further downstream analyses. In addition, the user can save a current session and completely restore it when revisiting the web interface.
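The reachability control mentioned above amounts to a depth-limited breadth-first search from the input proteins. The following Python sketch shows one way to express it. The function name, the toy network, and the use of a step count as the control are assumptions, since the text only gives the "directly connected" case as an example.

```python
from collections import deque


def within_reach(adjacency, inputs, max_steps):
    """Map each protein reachable within `max_steps` edges to its distance.

    adjacency: dict mapping each protein to the set of its neighbours.
    inputs:    the input proteins (distance 0).
    """
    depth = {p: 0 for p in inputs if p in adjacency}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == max_steps:
            continue                   # do not expand beyond the limit
        for neighbour in adjacency.get(node, ()):
            if neighbour not in depth:
                depth[neighbour] = depth[node] + 1
                queue.append(neighbour)
    return depth


# Hypothetical chain A - B - C - D with A as the input protein.
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
direct_only = within_reach(chain, ["A"], max_steps=1)
```

Setting `max_steps=1` keeps the input proteins and their direct neighbours, which corresponds to the "only directly connected proteins" example in the text.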

Implementation

The server-side pipeline was written in Perl (version 5.10.1) and C. MySQL (version 5.1.73, https://www.mysql.com/) was used to construct the database of PPIs of multiple species. The client-side web interface was implemented using HTML5 (https://www.w3.org/TR/html5/), Bootstrap (version 3.3.7, https://getbootstrap.com/), and JavaScript (https://www.javascript.com/), along with several libraries, including jQuery (http://jquery.com/) and jQuery-UI (https://jqueryui.com/) for the user interface, and d3.js (http://d3js.org/) and PHP (version 5.3.3, http://php.net/) for interactive data processing. Scalable Vector Graphics (https://www.w3.org/Graphics/SVG/) and Canvas (https://www.w3.org/TR/html5/) were combined to draw the network structure.

RESULTS

We illustrate the capability and usefulness of INTERSPIA using 72 Hanwoo-specific proteins associated with the traits of cow. Hanwoo is the indigenous cow breed in Korea. The details of the example protein set can be found in (35). Additional examples for other species are also available on the INTERSPIA website. Given the Hanwoo-specific proteins, we ran INTERSPIA to extract the top 1.5% of extended proteins and their interactions, and compared the dynamics of their PPIs using two additional related species, pig and horse, with 0.7 as the STRING confidence score cut-off. In total, 187 proteins were obtained, and 489 PPIs were identified as having different interaction patterns among the three species. Supplementary Figure S1 shows cow-specific protein interactions not present in pig and horse. One hundred forty-seven cow-specific PPIs were discovered, forming one major cluster (left rectangle in Supplementary Figure S1). Many proteins in the cluster (labelled proteins in the right-bottom rectangle in Supplementary Figure S1) are known to be related to the meat quality of cow (35). This example shows how INTERSPIA can be used to discover species-specific protein interactions, which was validated using known cow traits. Supplementary Figure S2 displays example protein interactions observed only in cow but not in pig and horse (panel A), and only in pig but not in cow and horse (panel B), with functional analysis by the GO enrichment test. In Supplementary Figure S2A, 17 proteins related to the negative regulation of apoptotic process (GO:0043066) and SMAD protein complex assembly (GO:0007183) at the q-value cut-off of 0.05, and 72 cow-specific interactions connected to them, are easily identified by INTERSPIA.
In Supplementary Figure S2B, 13 proteins involved in five GO functions (Isoprenoid biosynthetic process (GO:0008299), Neutrophil apoptotic process (GO:0001781), Regulation of transmembrane transporter activity (GO:0022898), Peptidyl-tyrosine autophosphorylation (GO:0038083), Retina homeostasis (GO:0001895)) at the q-value cut-off of 0.05 are highlighted by INTERSPIA. This example shows the search capability of INTERSPIA and the ease of using the web interface to identify different interaction patterns of proteins among multiple species.
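The GO terms reported above come from the hypergeometric enrichment test named in the server-side pipeline. As a minimal sketch of that test, the Python function below computes the upper-tail probability directly from binomial coefficients. The paper uses R's phyper and does not give the exact call, so the "at least this many hits" convention, the function name, and the toy numbers are assumptions. The q-value correction performed by the qvalue R package is not reproduced here.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(X >= hits) for a hypergeometric draw without replacement.

    population: number of background proteins (e.g. those with STRING data).
    annotated:  background proteins carrying the GO term.
    sample:     proteins in the displayed network.
    hits:       proteins in the network carrying the GO term.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical toy numbers: 3 of 10 background proteins carry the term,
# and 2 of the 4 proteins in the network carry it.
p_value = enrichment_p_value(population=10, annotated=3, sample=4, hits=2)
```

For the toy numbers the tail is (3·21 + 1·7) / 210 = 1/3, which can be checked by hand. In practice one such p-value is computed per GO term before the multiple-testing correction.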

CONCLUSION

INTERSPIA is a web-based application for detecting and visualizing the dynamics of PPIs among user-provided proteins in multiple species. It discovers additional proteins functionally associated with the input proteins using the random walk with restart algorithm and visualizes their interactions in an easy-to-use web interface to help the user understand the evolutionary changes of PPIs among the chosen species. INTERSPIA will serve as a highly valuable tool for the evolutionary analysis of protein interactions and their functional consequences.

35 in total

1. Duplication models for biological networks.

Authors: Fan Chung; Linyuan Lu; T Gregory Dewey; David J Galas
Journal: J Comput Biol Date: 2003 Impact factor: 1.479

Review 2. Network biology: understanding the cell's functional organization.

Authors: Albert-László Barabási; Zoltán N Oltvai
Journal: Nat Rev Genet Date: 2004-02 Impact factor: 53.242

3. Pairwise alignment of protein interaction networks.

Authors: Mehmet Koyutürk; Yohan Kim; Umut Topkara; Shankar Subramaniam; Wojciech Szpankowski; Ananth Grama
Journal: J Comput Biol Date: 2006-03 Impact factor: 1.479

4. Modularity of MAP kinases allows deformation of their signalling pathways.

Authors: Areez Mody; Joan Weiner; Sharad Ramanathan
Journal: Nat Cell Biol Date: 2009-03-22 Impact factor: 28.824

Review 5. Evolution of biological interaction networks: from models to real data.

Authors: Mark G F Sun; Philip M Kim
Journal: Genome Biol Date: 2011-12-28 Impact factor: 13.583

6. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software.

Authors: Evgenia V Kriventseva; Fredrik Tegenfeldt; Tom J Petty; Robert M Waterhouse; Felipe A Simão; Igor A Pozdnyakov; Panagiotis Ioannidis; Evgeny M Zdobnov
Journal: Nucleic Acids Res Date: 2014-11-26 Impact factor: 16.971

7. Expansion of the Gene Ontology knowledgebase and resources.

Authors:
Journal: Nucleic Acids Res Date: 2016-11-29 Impact factor: 16.971

8. Bacterial protein meta-interactomes predict cross-species interactions and protein function.

Authors: J Harry Caufield; Christopher Wimble; Semarjit Shary; Stefan Wuchty; Peter Uetz
Journal: BMC Bioinformatics Date: 2017-03-16 Impact factor: 3.169

9. Ensembl 2017.

Authors: Bronwen L Aken; Premanand Achuthan; Wasiu Akanni; M Ridwan Amode; Friederike Bernsdorff; Jyothish Bhai; Konstantinos Billis; Denise Carvalho-Silva; Carla Cummins; Peter Clapham; Laurent Gil; Carlos García Girón; Leo Gordon; Thibaut Hourlier; Sarah E Hunt; Sophie H Janacek; Thomas Juettemann; Stephen Keenan; Matthew R Laird; Ilias Lavidas; Thomas Maurel; William McLaren; Benjamin Moore; Daniel N Murphy; Rishi Nag; Victoria Newman; Michael Nuhn; Chuang Kee Ong; Anne Parker; Mateus Patricio; Harpreet Singh Riat; Daniel Sheppard; Helen Sparrow; Kieron Taylor; Anja Thormann; Alessandro Vullo; Brandon Walts; Steven P Wilder; Amonida Zadissa; Myrto Kostadima; Fergal J Martin; Matthieu Muffato; Emily Perry; Magali Ruffier; Daniel M Staines; Stephen J Trevanion; Fiona Cunningham; Andrew Yates; Daniel R Zerbino; Paul Flicek
Journal: Nucleic Acids Res Date: 2016-11-28 Impact factor: 16.971

10. Edgetic perturbation models of human inherited disorders.

Authors: Quan Zhong; Nicolas Simonis; Qian-Ru Li; Benoit Charloteaux; Fabien Heuze; Niels Klitgord; Stanley Tam; Haiyuan Yu; Kavitha Venkatesan; Danny Mou; Venus Swearingen; Muhammed A Yildirim; Han Yan; Amélie Dricot; David Szeto; Chenwei Lin; Tong Hao; Changyu Fan; Stuart Milstein; Denis Dupuy; Robert Brasseur; David E Hill; Michael E Cusick; Marc Vidal
Journal: Mol Syst Biol Date: 2009-11-03 Impact factor: 11.429