Marianna Milano1,2, Giuseppe Agapito2,3, Mario Cannataro1,2.
Abstract
High-throughput technologies are producing an increasing volume of data that requires large amounts of storage, effective data models and efficient, possibly parallel, analysis algorithms. Pathway and interactomics data are represented as graphs and add a new dimension of analysis, allowing, among other things, graph-based comparison of organisms' properties. For instance, in a biological pathway representation, the nodes can represent proteins, RNA and fat molecules, while the edges represent the interactions between molecules. Biological networks such as Protein-Protein Interaction (PPI) networks instead represent the biochemical interactions among proteins: nodes model the proteins of a given organism, and edges model the protein-protein interactions. Pathway networks, in turn, represent the biochemical-reaction cascades that take place within cells or tissues. In this paper, we discuss the main models for the standard representation of pathways and PPI networks, the data models for the representation and exchange of pathway and protein interaction data, the main databases in which they are stored and the alignment algorithms for comparing the pathways and PPI networks of different organisms. Finally, we discuss the challenges and limitations of pathway and PPI network representation and analysis. We find that network alignment still presents many open problems worthy of further investigation, especially concerning pathway alignment.
Keywords: biological pathways; local and global alignment; networks alignment; protein–protein interaction
Year: 2022 PMID: 35892929 PMCID: PMC9326688 DOI: 10.3390/biotech11030024
Source DB: PubMed Journal: BioTech (Basel) ISSN: 2673-6284
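As a minimal sketch of the graph model described in the abstract, a PPI network can be held as an undirected adjacency structure in which each protein maps to the set of proteins it interacts with. The function name `build_ppi_network` and the protein identifiers below are hypothetical placeholders chosen for illustration; they do not come from any of the databases discussed in the paper.

```python
from collections import defaultdict


def build_ppi_network(interactions):
    """Build an undirected graph: protein -> set of interacting proteins."""
    network = defaultdict(set)
    for protein_a, protein_b in interactions:
        # An interaction is symmetric, so record it in both directions.
        network[protein_a].add(protein_b)
        network[protein_b].add(protein_a)
    return dict(network)


# Hypothetical toy interaction list (placeholder protein names).
toy_interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("P3", "P4")]
toy_network = build_ppi_network(toy_interactions)

# Node degree = number of distinct interaction partners of each protein.
degrees = {protein: len(partners) for protein, partners in toy_network.items()}
```

Sets are used for the neighbor lists so that an interaction reported more than once is stored only once.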
The table summarizes the main characteristics of the listed pathway databases. The column Pathways gives the total number of pathways stored in each database; PathType gives the type of pathways handled; Proteins/Genes gives the total number of proteins/genes covered; Interactions gives the total number of interactions available; Organism gives the organism of origin of the pathways; Cur specifies how pathway data are curated in each database. In the table, SMT stands for Signaling, Metabolic and Transduction pathways. MO indicates Multi-Organism. H is short for Human. LC and EC are short for Literature Curation and Electronic Curation.
| Pathway DB | Pathways | PathType | Proteins/Genes | Interactions | Organism | Cur |
|---|---|---|---|---|---|---|
| Biocarta | 254 | SMT | 1396 | × | H,M | LC |
| BioCyc | 20,005 | MT | 14,544 | × | MO | LC |
| INOH | 9606 | SMT | 22,799 | 5625 | MO | LC |
| KEGG | 551 | M | 19,618 | 3568 | H | LC |
| NetPath | 36 | S | 5337 | 7214 | H | LC |
| PathwayCommons | 5772 | SMT | 18,490 | 2,424,055 | H | EC |
| Reactome | 2553 | SMT | 31,506 | 16,017 | MO | LC |
| SMPDB | 908 | MT | 1576 | × | H | LC |
| WikiPathways | 3053 | SMT | 7858 | × | MO | LC |
The table summarizes the main characteristics of the listed PPI databases. The column Proteins gives the total number of proteins stored in each database; the column Interactions gives the total number of protein interactions stored; the column Organism lists the organisms for which the PPIs are inferred; the column Exper. Det. records whether the interactions have been experimentally determined.
| PPI Networks DB | Proteins | Interactions | Organism | Exper. Det. |
|---|---|---|---|---|
| DIP | 28,850 | 81,923 | SC, DM, EC, CE, HS, HP, MM, RN, BT, AT | Yes |
| BioGRID | 2,467,140 | 1,740,000 | SC, DM, EC, CE, HS, HP, MM, RN, BT, AT, SC, SARS-CoV-2, XL | Yes |
| MINT | 27,069 | 132,249 | HS, SC, MM, RN, HP, DM, CE | Yes |
| IntAct | 118,759 | 1,184,144 | HS, SC, MM, RN, EC, DM, CE, HP, SP, BS, SARS-CoV-2 | Yes |
| I2D | 687,072 | 1,279,157 | SC, CE, DM, RN, MM, HS | No |
| STRING | 67,592,464 | 20,052,394,041 | HS, MM, AT, SC, EC, CE, RN, DM, BS, PAO1 | No |
Pathway alignment algorithms.
| Algorithm | one2Many | Many2Many | Method |
|---|---|---|---|
| MP-Align | | × | Maximum weighted bipartite matching algorithm, largest conserved sub-hypergraph |
| PathAligner | | × | Hierarchical alignment |
| SubMAP | | × | Maximum weight independent set and eigenvalue problem |
| MetaPathwayHunter | | × | Subtree homeomorphism and weighted assignments in bipartite graphs |
| MetaPAT | | × | Subgraph homeomorphism |
| MetNetAligner | | × | Dynamic programming |
| CAMPways | | × | Constrained alignment |
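Several methods in the table above are built on weighted matching in bipartite graphs. The sketch below illustrates the underlying idea only: given similarity scores between the nodes of two pathways, it searches for the one-to-one assignment with the highest total score. It is an exhaustive search suitable for tiny inputs, not the procedure used by MP-Align or any other listed tool; the function name `max_weight_matching`, the node names and the similarity values are all hypothetical.

```python
from itertools import permutations


def max_weight_matching(left, right, similarity):
    """Exhaustive maximum-weight one-to-one matching (tiny inputs only).

    `similarity` maps (left_node, right_node) pairs to a score;
    missing pairs are treated as score 0.0.
    """
    if len(left) > len(right):
        raise ValueError("left must not have more nodes than right")
    best_score, best_pairs = float("-inf"), []
    # Try every way of assigning a distinct right node to each left node.
    for candidate in permutations(right, len(left)):
        pairs = list(zip(left, candidate))
        score = sum(similarity.get(pair, 0.0) for pair in pairs)
        if score > best_score:
            best_score, best_pairs = score, pairs
    return best_score, best_pairs


# Hypothetical similarity scores between nodes of two toy pathways.
toy_similarity = {
    ("a1", "b1"): 0.9,
    ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.85,
    ("a2", "b3"): 0.1,
}
best_score, best_pairs = max_weight_matching(
    ["a1", "a2"], ["b1", "b2", "b3"], toy_similarity
)
```

In this toy case a greedy choice would pair `a1` with `b1` first (score 0.9) and leave `a2` with a poor partner, whereas the optimal matching pairs `a1` with `b2` and `a2` with `b1`. Real tools replace the exhaustive search with polynomial-time matching algorithms.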
PPI Network alignment algorithms.
| Algorithm | GNA or LNA | PNA or MNA | One-to-One or Many-to-Many | Method |
|---|---|---|---|---|
| GLAlign | LNA | PNA | Many-to-many | Seed node and alignment graph |
| NetworkBLAST | LNA | PNA | Many-to-many | Score function |
| NetAligner | LNA | PNA | Many-to-many | Evolutionary method |
| AlignNemo | LNA | PNA | Many-to-many | Score function |
| AlignMCL | LNA | PNA | Many-to-many | Seed node and alignment graph |
| H-GRAAL | GNA | PNA | One-to-one | Graphlet |
| MI-GRAAL | GNA | PNA | One-to-one | Graphlet |
| C-GRAAL | GNA | PNA | One-to-one | Graphlet |
| L-GRAAL | GNA | PNA | One-to-one | Graphlet |
| IsoRank | GNA | PNA | One-to-one | Node similarity |
| GHOST | GNA | PNA | One-to-one | Spectral signature |
| WAVE | GNA | PNA | One-to-one | Seed and extend strategy |
| MAGNA | GNA | PNA | One-to-one | Genetic algorithm |
| MAGNA++ | GNA | PNA | One-to-one | Genetic algorithm |
| SANA | GNA | PNA | One-to-one | Simulated annealing |
| IGLOO | GNA | PNA | One-to-one | Seed and extend strategy |
| MultiMAGNA++ | GNA | MNA | One-to-one | Genetic algorithm |
| GEDEVO-M | GNA | MNA | One-to-one | Evolutionary algorithm |
| IsoRankN | GNA | MNA | Many-to-many | Node similarity |
| SMETANA | GNA | MNA | Many-to-many | Greedy approach |
| LocalAli | LNA | MNA | Many-to-many | Maximum parsimony evolutionary model |
| FUSE | GNA | MNA | One-to-one | Non-negative matrix trifactorization |
| NetCoffee | GNA | MNA | Many-to-many | Simulated annealing |
| BEAMS | GNA | MNA | Many-to-many | Node cost function |
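To make the one-to-one column of the table concrete, the sketch below represents a pairwise alignment as a dictionary from nodes of one network to nodes of another, checks that it is one-to-one, and counts how many edges of the first network are conserved under the mapping. The ratio of conserved edges to the edges of the first network is a measure often called edge correctness; it is shown here as one plausible way to assess an alignment, not as the scoring function of any algorithm in the table. All names and toy data are hypothetical.

```python
def is_one_to_one(mapping):
    """True if no two source nodes are mapped to the same target node."""
    return len(set(mapping.values())) == len(mapping)


def conserved_edges(edges_a, edges_b, mapping):
    """Count edges of network A whose mapped endpoints form an edge of B."""
    # Undirected edges: store each as a frozenset so order does not matter.
    edge_set_b = {frozenset(edge) for edge in edges_b}
    count = 0
    for u, v in edges_a:
        if u in mapping and v in mapping:
            if frozenset((mapping[u], mapping[v])) in edge_set_b:
                count += 1
    return count


# Hypothetical toy networks and alignment.
edges_a = [("a1", "a2"), ("a2", "a3")]
edges_b = [("b1", "b2"), ("b2", "b3"), ("b3", "b4")]
alignment = {"a1": "b1", "a2": "b2", "a3": "b4"}

n_conserved = conserved_edges(edges_a, edges_b, alignment)
edge_correctness = n_conserved / len(edges_a)
```

A many-to-many alignment would instead map each node to a set of nodes, so the dictionary values would become sets and the one-to-one check would no longer apply.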