Vivian Robin1, Antoine Bodein1, Marie-Pier Scott-Boyer1, Mickaël Leclercq1, Olivier Périn2, Arnaud Droit1.
Abstract
Protein-protein interactions (PPIs) lie at the heart of the cellular machinery and play a significant role in regulating cellular functions. PPIs can be analyzed with network approaches: taken together, all PPIs form a network, and constructing it requires predicting the interactions. Biases such as lack of data, recurrence of information, and false interactions make the network unstable. Integrated strategies help address these challenges and have shown encouraging results for understanding molecular mechanisms and drug action mechanisms and for identifying target genes. To give more weight to an interaction, it is evaluated with different confidence scores. These scores allow the network to be filtered, which in turn makes it easier to represent; both steps are essential for identifying and understanding molecular mechanisms. In this review, we discuss the main computational methods for predicting PPIs, including methods that confirm an interaction, the integration of PPIs into a network, and the visualization of these complex data.
Keywords: biological network; computational prediction; graphic view; integrated strategies; interactome; protein-protein interaction
Year: 2022 PMID: 36158572 PMCID: PMC9494275 DOI: 10.3389/fmolb.2022.962799
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Summary table of computational methods for the prediction of a protein–protein interaction. Computational methods for predicting PPIs are grouped into three distinct categories: genomic context–based methods, machine learning, and text mining. Within each of these approaches, several sub-methods exist. A database can be composed of interactions obtained by several prediction methods.
| Category | Main method | Main advantage | Main disadvantage | Database |
|---|---|---|---|---|
| Genomic context | Domain fusion, conserved gene neighborhood, phylogenetic profiles, and co-evolution | Interspecies comparison requires few IT resources, fast calculation | Low coverage rate, prediction using only genomic features | STRING |
| Machine learning algorithm | Supervised learning: support vector machine, artificial neural networks, naïve Bayes learning, decision trees | Handling multi-dimensional and multi-variety data, high efficiency | Data acquisition (massive datasets), high error susceptibility, requires significant IT resources | STRING, BioGRID, IID |
| | Unsupervised learning: K-means, hierarchical clustering | | | |
| Text mining | Extracting information from scientific studies and reference databases such as PubMed | Many publications are available, rapidity of execution, inexpensive, easily accessible data | Requires that the interactions be cited in the articles | STRING, BioGRID, MINT |
| | Using natural language processing (NLP) technology | | | |
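To make the genomic context row more concrete, here is a minimal sketch of the idea behind phylogenetic profiles: proteins that are present or absent in the same set of genomes are treated as candidate interaction partners. The Jaccard similarity, the threshold of 0.8, and the toy profiles (`P1`, `g1`, and so on) are illustrative assumptions and are not taken from the review.

```python
def jaccard(profile_a, profile_b):
    """Jaccard similarity between two presence profiles (sets of genome IDs)."""
    union = profile_a | profile_b
    if not union:
        return 0.0
    return len(profile_a & profile_b) / len(union)


def candidate_pairs(profiles, threshold=0.8):
    """Return (protein_a, protein_b, score) for pairs with similar profiles."""
    names = sorted(profiles)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = jaccard(profiles[a], profiles[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Hypothetical toy data: for each protein, the genomes in which a homolog is found.
profiles = {
    "P1": {"g1", "g2", "g4"},
    "P2": {"g1", "g2", "g4"},
    "P3": {"g3"},
}

predicted = candidate_pairs(profiles)
```

With these toy profiles, only `P1` and `P2` share an identical presence pattern, so they are the only pair retained. This also illustrates the low coverage noted in the table: a protein with a unique profile, such as `P3`, yields no prediction at all.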
FIGURE 1. Workflow of the key steps in assembling a PPI network. PPI networks can be integrated horizontally and/or vertically. Horizontal integration creates a PPI network by concatenating interaction information from different PPI databases (here, networks 1 and 2 represent two PPI networks from two different databases), while vertical integration gathers information from different omics databases for a given interaction. In the vertical integration box, each omics network represents a different interactome, such as protein–protein, drug–protein, or RNA–protein. Once the networks are generated, the confidence of their interactions must be evaluated in order to filter the network. Interactions shown in red have a high confidence score. After the network is narrowed down, specialized tools can be used to visualize it and to explore information about the connected entities (e.g., to identify proteins with a central role in the mechanisms).
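The horizontal integration and confidence filtering steps of this workflow can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure of any particular database: the source names (`db1`, `db2`), the edge lists, the shared 0 to 1 score scale, and the choice to keep the maximum score per interaction are all hypothetical.

```python
def merge_networks(*networks):
    """Horizontal integration: union of the edges from several PPI sources.

    Each network is a (source_name, edges) pair, where edges is a list of
    (protein_a, protein_b, score) tuples. Edges are treated as undirected.
    For each interaction, the highest score is kept (an arbitrary choice)
    and the reporting sources are recorded.
    """
    merged = {}
    for source_name, edges in networks:
        for a, b, score in edges:
            key = frozenset((a, b))  # A-B and B-A are the same interaction
            entry = merged.setdefault(key, {"score": 0.0, "sources": set()})
            entry["score"] = max(entry["score"], score)
            entry["sources"].add(source_name)
    return merged


def filter_by_confidence(merged, min_score):
    """Keep only the interactions whose score reaches min_score."""
    return {k: v for k, v in merged.items() if v["score"] >= min_score}


# Hypothetical toy networks from two databases, scores assumed comparable.
network_1 = ("db1", [("A", "B", 0.9), ("B", "C", 0.4)])
network_2 = ("db2", [("B", "A", 0.7), ("C", "D", 0.95)])

merged = merge_networks(network_1, network_2)
high_confidence = filter_by_confidence(merged, min_score=0.7)
```

In this toy case the A-B interaction is reported by both sources and is stored once, while the low-scoring B-C interaction is dropped by the filter. In practice, scores from different databases are rarely on the same scale, so a normalization step would be needed before they are compared.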
Summary table of tools for visualizing protein–protein interaction networks. Visualization methods for network analysis are grouped into three distinct categories: downloadable tools, libraries integrated with programming languages, and graph-oriented databases. Users should choose their tools according to the context of their study. For the analysis of high-dimensional data containing a large amount of information, tools based on graph databases are advisable. Conversely, for a quick representation, we recommend visualization libraries or downloadable software.
| Category | Tool | Advantage | Disadvantage |
|---|---|---|---|
| Visualization through downloadable tools | Cytoscape | Many add-on features, flexibility for network analysis, easy to handle, open source and free | Automation interface is difficult to set up; working with big networks requires large memory and computing power |
| Visualization by libraries integrated with languages | igraph | Open source and free, well documented, accessible, import and export graphs easily, easy to implement | Limited graphic possibilities, restricted number of nodes |
| Visualization through graph-oriented databases | Neo4j | Speed of calculation, suited to big networks, integrated search engine, flexible and agile structures | Requires computing servers; not very scalable, as it is designed for a single-server architecture |
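As a small illustration of handing a filtered network to a downloadable tool, the sketch below serializes an edge list in the Simple Interaction Format (SIF) that Cytoscape can import, where each line holds a source node, an interaction type, and a target node separated by tabs. The function name `to_sif` and the toy edges are assumptions made for this example; `pp` is the interaction type conventionally used for protein-protein edges.

```python
def to_sif(edges, interaction_type="pp"):
    """Serialize (protein_a, protein_b) pairs as tab-separated SIF lines."""
    return "".join(
        f"{a}\t{interaction_type}\t{b}\n" for a, b in sorted(edges)
    )


# Hypothetical filtered network.
edges = [("B", "C"), ("A", "B")]
sif_text = to_sif(edges)

# Writing the text to disk is left to the caller, for example:
# with open("network.sif", "w", encoding="utf-8") as handle:
#     handle.write(sif_text)
```

A plain-text exchange format like this keeps the prediction and filtering steps independent of the visualization tool, so the same edge list can also be loaded by a graph library or a graph database.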