Marianna A Zolotovskaia, Victor S Tkachev, Anastasia A Guryanova, Alexander M Simonov, Mikhail M Raevskiy, Victor V Efimov, Ye Wang, Marina I Sekacheva, Andrew V Garazha, Nicolas M Borisov, Denis V Kuzmin, Maxim I Sorokin, Anton A Buzdin.
Abstract
OncoboxPD (Oncobox pathway databank), available at https://open.oncobox.com, is a collection of 51 672 uniformly processed human molecular pathways. Superposition of all pathways formed an interactome graph of protein-protein interactions and metabolic reactions containing 361 654 interactions and 64 095 molecular participants. Pathways are uniformly classified by biological process, and each pathway node is algorithmically annotated with a specific functional role (activator or repressor). This enables online calculation of statistically supported pathway activation levels (PALs) from custom RNA or protein expression profiles using the built-in bioinformatic tool. Each pathway can be visualized as a static or dynamic graph, where vertices are molecules participating in the pathway and edges are interactions or reactions between them. Differentially expressed nodes in a pathway can be visualized in two-color mode with a user-defined color scale. For every comparison, OncoboxPD also generates a graph summarizing the top up- and downregulated pathways.
Keywords: ARR, activation/repressor role coefficient; BEL, Biological Expression Language; Interactomics; KEGG, Kyoto Encyclopedia of Genes and Genomes; Metabolomics; Molecular pathway database; PAL, pathway activation level; Pathway activation level; Pathway visualization; Protein–protein interactions; SPIKE, Signaling Pathways Integrated Knowledge Engine; SPOKE, Scalable Precision Medicine Open Knowledge Engine; VM, virtual machine
Year: 2022 PMID: 35615022 PMCID: PMC9120235 DOI: 10.1016/j.csbj.2022.05.006
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
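Composition of source pathway knowledge bases.
| Source database (version) | Number of pathways | Number of genes | Number of metabolites | Types of interactions |
|---|---|---|---|---|
| Biocarta 1.2 | 337 | 1 082 | – | protein–protein |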
| KEGG 1.2 | 288 | 4 345 | – | protein–protein |
| HumanCyc 1.0 | 300 | 980 | 1 040 | protein–protein, biochemical reactions, transport |
| NCI 1.2 | 775 | 2 214 | – | protein–protein |
| Qiagen 1.4 (SABiosciences) | 379 | 2 493 | – | protein–protein |
| Reactome 1.3 | 945 | 6 105 | – | protein–protein |
| Pathbank 1.0 | 48 648 | 1 405 | 55 571 | protein–protein, biochemical reactions, transport |
| Total | 51 672 | 9 117 | 56 596 | protein–protein, biochemical reactions, transport |
*Numbers are shown for unique items only.
Types of pathway graph edges in OncoboxPD.
| Edge type in source databases | Effect assigned in OncoboxPD |
|---|---|
| Direct interaction, activation or inhibition | activation or inhibition, respectively |
| SubPathwayInteraction | other |
| ComplexAssembly | other |
| Molecular interaction (between participants from SubPathwayControl item) | activation or inhibition, respectively |
| Catalysis, activation or inhibition | activation or inhibition, respectively |
| Modulation (activation-allosteric, activation-nonallosteric, activation, inhibition-competitive, inhibition-other, inhibition-noncompetitive, inhibition-allosteric, inhibition-irreversible, inhibition) | activation or inhibition, respectively |
| Transport | activation, because it promotes further molecular interaction |
| BiochemicalReaction | activation, because it promotes further molecular interaction |
| Indirect | other |
| Compound | other |
| Others | other |
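The edge-type table above can be read as a lookup rule. The following is a minimal sketch of that rule, not the OncoboxPD implementation: the function name `classify_edge`, the `Effect` enum, and the lowercase string keys are all hypothetical, and the only logic taken from the table is which source edge types become activation, inhibition, or other.

```python
from enum import Enum
from typing import Optional


class Effect(Enum):
    ACTIVATION = "activation"
    INHIBITION = "inhibition"
    OTHER = "other"


# Source edge types whose effect follows the annotated sign (per the table).
SIGNED_TYPES = {"direct interaction", "molecular interaction", "catalysis", "modulation"}
# Source edge types always treated as activation (per the table).
ALWAYS_ACTIVATING = {"transport", "biochemicalreaction"}


def classify_edge(source_type: str, sign: Optional[str] = None) -> Effect:
    """Map a source edge type (hypothetical string keys) to an effect category."""
    key = source_type.strip().lower()
    if key in ALWAYS_ACTIVATING:
        return Effect.ACTIVATION
    if key in SIGNED_TYPES and sign:
        # Modulation subtypes such as "inhibition-competitive" keep their prefix.
        label = sign.strip().lower()
        if label.startswith("activation"):
            return Effect.ACTIVATION
        if label.startswith("inhibition"):
            return Effect.INHIBITION
    # SubPathwayInteraction, ComplexAssembly, Indirect, Compound, Others.
    return Effect.OTHER


print(classify_edge("Modulation", "inhibition-allosteric"))
```

Unsigned or unrecognized edges fall through to `Effect.OTHER`, which mirrors the catch-all "Others" row.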
Type and composition of pathway graph nodes in OncoboxPD.
| Node with one or more gene products/components (proteins, nucleic acid molecules, small molecules, protein complexes, bound complexes) | participant | one or several gene products/components | +/+ if node contains gene product | participant node |
| Node with name of biological effect, or with another crosslinking molecular pathway (entire pathway as single item) | participant | empty | +/- | participant node |
| Auxiliary transport node | transport | empty | +/- | transport node |
| Auxiliary biochemical reaction node | biochemical reaction | empty | +/- | reaction node |
a “+” and “-” indicate whether or not the node is involved in the evaluation of ARR coefficients or PAL values.
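This record does not give the PAL formula. To illustrate how ARR coefficients and expression ratios could combine into one pathway score, here is a sketch under an explicitly assumed form: an ARR-weighted mean of the natural logarithm of each gene's CNR value. The formula, the function name `pathway_activation_level`, the input layout, and the absence of any scaling factor are assumptions for illustration and may differ from what OncoboxPD computes.

```python
import math


def pathway_activation_level(genes):
    """ASSUMED form: sum(ARR * ln(CNR)) / sum(|ARR|) over scored genes.

    genes: dict mapping gene symbol -> (arr, cnr), where arr is the
    activator/repressor role coefficient and cnr is a positive expression
    ratio. Both the layout and the formula are illustrative assumptions.
    """
    weighted = sum(arr * math.log(cnr) for arr, cnr in genes.values())
    norm = sum(abs(arr) for arr, _ in genes.values())
    # No scored genes (or all ARR equal to zero): report a neutral level.
    return weighted / norm if norm else 0.0


example = {
    "ACTIVATOR_UP": (1.0, 2.0),    # activator with doubled expression
    "REPRESSOR_DOWN": (-1.0, 0.5),  # repressor with halved expression
}
print(pathway_activation_level(example))
```

In this assumed form, an upregulated activator and a downregulated repressor both push the score in the positive direction, which matches the intuition behind activator/repressor annotation.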
Fig. 1. Schematic representation of the functional classification of molecular pathways according to GO term enrichment. Each functional group corresponds to a specific GO term and includes the pathways in which this GO term is statistically significantly enriched (adjusted p-value less than 0.05). In this analysis, only pathways with unique gene compositions were considered.
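The grouping rule in Fig. 1 (a pathway joins a GO group when that term is enriched at adjusted p < 0.05) can be sketched as follows. The caption does not name the enrichment test or the adjustment method, so the one-sided hypergeometric test and the Benjamini-Hochberg correction used here are assumptions, as are all function names and the toy data.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for a hypergeometric variable (inclusive upper tail)."""
    upper = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, draws)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def enriched_terms(pathway_genes, go_annotations, background_size, alpha=0.05):
    """GO terms (hypothetical dict: term -> gene set) enriched in one pathway."""
    terms = sorted(go_annotations)
    raw = [
        hypergeom_sf(
            len(pathway_genes & go_annotations[t]),
            background_size,
            len(go_annotations[t]),
            len(pathway_genes),
        )
        for t in terms
    ]
    adjusted = benjamini_hochberg(raw)
    return [t for t, q in zip(terms, adjusted) if q < alpha]


pathway = {f"g{i}" for i in range(5)}
annotations = {
    "T1": {f"g{i}" for i in range(10)},                    # contains the whole pathway
    "T2": {"g0", "g1"} | {f"h{i}" for i in range(48)},   # broad term, weak overlap
}
print(enriched_terms(pathway, annotations, background_size=100))
```

Here the correction is applied across the terms tested for a single pathway; correcting across all pathway-term pairs would be an equally plausible reading of the caption.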
Fig. 2. Size distribution of GO functional groups of pathways (number of pathways included). Groups with more than 800 pathways are shown to the right of the dashed red line. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Functional OncoboxPD pathway groups with more than 800 members.
| Functional group (GO term) | Number of pathways |
|---|---|
| Activation of protein kinase activity | 1021 |
| Peptidyl-serine phosphorylation | 937 |
| Fc receptor signaling pathway | 896 |
| Response to peptide hormone | 883 |
| Regulation of MAP kinase activity | 882 |
| Immune response-activating cell surface receptor signaling pathway | 869 |
| Neuronal death | 858 |
| Regulation of neuron death | 834 |
| Positive regulation of cellular protein localization | 833 |
| Blood coagulation | 817 |
| Gland development | 810 |
| Positive regulation of cell adhesion | 802 |
Fig. 3. Example of a pathway activation chart. Green lines show the 10 most strongly activated pathways (ordered top to bottom); red lines show the 10 most strongly inhibited pathways (ordered bottom to top). Line thickness is proportional to the absolute value of PAL. In this example, RNA sequencing gene expression profiles of three thyroid cancer samples [39] were compared with six healthy thyroid samples from the ANTE collection [40]. PAL values, t-test p-values and FDR-adjusted p-values are shown to the right of the pathway names. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
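The ordering described in the Fig. 3 caption can be expressed as a small selection routine. This is a sketch only: the function name `activation_chart_rows` and the input dictionary are hypothetical, and the PAL values in the example are made up for illustration.

```python
def activation_chart_rows(pal_by_pathway, n=10):
    """Return chart rows from top to bottom as (pathway, PAL) pairs.

    Activated pathways come first, strongest at the top; inhibited pathways
    follow, with the strongest inhibition at the bottom.
    """
    items = list(pal_by_pathway.items())
    activated = sorted(
        (kv for kv in items if kv[1] > 0), key=lambda kv: kv[1], reverse=True
    )[:n]
    inhibited = sorted((kv for kv in items if kv[1] < 0), key=lambda kv: kv[1])[:n]
    # Reverse the inhibited block so the most negative PAL lands on the last row.
    return activated + inhibited[::-1]


# Made-up PAL values, for illustration only.
pals = {"P1": 4.0, "P2": -7.5, "P3": 9.1, "P4": -0.3, "P5": 0.0}
for name, pal in activation_chart_rows(pals, n=2):
    print(f"{name}\t{pal:+.1f}\tline width ~ {abs(pal):.1f}")
```

Pathways with a PAL of exactly zero are left out of both blocks, since they are neither activated nor inhibited.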
Principle of p-value calculation for pathway activation chart.
| 3 or more | 3 or more | |
| 3 or more | 1 | |
| less than 3 | any number |
Fig. 4. TRAF molecular pathway visualization using OncoboxPD software. Nodes correspond to individual pathway components or to their complexes. The color of each node reflects the logarithm of the mean CNR of all node components on the given scale, with upregulated nodes shown in green and downregulated nodes in red. Grey nodes have no gene components and therefore no CNR values. Unaffected nodes (CNR ∼ 1) are shown in white. Red and green arrows stand for inhibitory and activating interactions, respectively. Ovals denote gene products, octagons denote metabolites, hexagons denote complexes, and other nodes (text labels without recognized molecular components) are shown as rectangles. A bold black border indicates outlier values that fall outside the scale limits. In this example, the RNA sequencing gene expression profile of thyroid papillary cancer sample TC15 [39] was compared against six healthy thyroid samples from the ANTE collection [40]. A) Static pathway graph. B) Projection of the dynamic interactive pathway graph. The figure can be found by following the path: result file in Folders (“example” with green icon) -> sample TC15 in Sample table -> Pathway activation level tab -> TRAF Pathway -> static and dynamic graphs. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
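The node coloring rule in the Fig. 4 caption can be sketched as a function that turns the CNR values of a node's components into a position on the color scale. The function name `node_color_value`, the natural logarithm, and the default scale limit of 2.0 are assumptions; the caption does not state the logarithm base or the numeric limits.

```python
import math


def node_color_value(component_cnrs, scale_limit=2.0):
    """Return (value, is_outlier) for one node.

    value is None for nodes without gene components (drawn grey), otherwise
    log(mean CNR) clipped to [-scale_limit, scale_limit]. Positive values map
    to green, negative to red, and values near 0 to white. is_outlier marks
    nodes that would get a bold black border.
    """
    if not component_cnrs:
        return None, False
    mean_cnr = sum(component_cnrs) / len(component_cnrs)
    raw = math.log(mean_cnr)  # assumed natural log; CNR must be positive
    is_outlier = abs(raw) > scale_limit
    clipped = max(-scale_limit, min(scale_limit, raw))
    return clipped, is_outlier


print(node_color_value([]))          # node with no gene components
print(node_color_value([1.0, 1.0]))  # unaffected node
print(node_color_value([100.0]))     # strongly upregulated node
```

Clipping keeps the fill color within the scale while the separate flag preserves the information that the true value lies beyond it.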
Fig. 5. OncoboxPD visualization of the GDP-mannose biosynthesis molecular pathway. Nodes correspond to pathway participants. Color reflects node activation according to the color scale at the bottom. For this example, the RNA sequencing profile of thyroid papillary cancer sample 5 [39] was normalized to six healthy thyroid samples from the ANTE collection [40]. Nodes that do not correspond to known gene products are shown in grey because no CNR value can be calculated for them. Auxiliary central nodes of biochemical reactions (BR) are rhombic and filled in black. Ovals denote gene products and octagons denote metabolites. Green arrows denote activating interactions. The figure can be found by following the path: result file in Folders (“example” with green icon) -> sample TC15 in Sample table -> Pathway activation level tab -> De novo triacylglycerol biosynthesis pathway -> static graph. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Human interactome model of protein interactions and metabolic reactions. Graph vertices represent pathway participants: gene products (green), metabolites (blue) and auxiliary nodes or nodes labeled with a biological effect (grey). Graph edges are interactions between pathway participants. Edges inherit their color from donor nodes. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
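The superposition of pathways into one interactome graph amounts to taking the union of their nodes and edges, so that an interaction shared by several pathways is counted once. A minimal sketch, assuming a hypothetical representation in which each pathway is a list of `(source, target, effect)` triples; the pathway and molecule names are invented:

```python
def build_interactome(pathways):
    """Merge pathways into one graph of unique nodes and unique edges.

    pathways: dict mapping pathway name -> list of (source, target, effect).
    Returns (nodes, edges) as sets, so duplicates across pathways collapse.
    """
    nodes, edges = set(), set()
    for pathway_edges in pathways.values():
        for source, target, effect in pathway_edges:
            nodes.update((source, target))
            edges.add((source, target, effect))
    return nodes, edges


# Invented toy pathways that share one interaction.
toy = {
    "pathway_a": [("X", "Y", "activation"), ("Y", "Z", "inhibition")],
    "pathway_b": [("X", "Y", "activation"), ("Z", "W", "activation")],
}
nodes, edges = build_interactome(toy)
print(len(nodes), "participants,", len(edges), "interactions")
```

Because directed edges are stored with their donor (source) node first, an edge color can be looked up from the source node's type, as the caption describes.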
Major characteristics of selected pathway aggregating databanks.
| Number of DBs/ pathway DBs | 7/7 | 22/8 | 31/12 | 47/4 | 103/11 | 3/3 |
| Human pathway databases included | Biocarta, HumanCyc, KEGG, NCI, PathBank, Reactome, Qiagen | Reactome, NCI, | Reactome, KEGG, Humancyc, NCI, Biocarta, Netpath, INOH, Ehmn, Pharmgkb, Smpdb, Signalink, Wikipathways | NCI and Reactome from Common pathways, KEGG (number of pathways is not available), | AlzPathway, Ma'ayan 2005, CancerCellMap/NetPath, CST, Macrophage, KEGG, | Reactome, KEGG, |
| Number of pathways | 51,672 | 5,772 | 5,578 | 1,822 | Not annotated | 3,095 |
| Number of interactions | 391,327 | 2,424,055 | 864,683 | 2,250,197 | 507,997 | greater than 215,000 |
| Non-pathway interactions | – | + | + | + | + | – |
| Participant type | molecular | molecular | molecular | 11 types (molecular and others) | molecular | molecular |
| Web built-in visualization tool for: | canonical pathways | canonical pathways, interactions | canonical pathways, their fragments, interactions | interactions, custom pathways | Not available | merged canonical pathways, interactions |
| Local applications | Python | R, Java, Cytoscape | Cytoscape | – | Python, R, Cytoscape | Python |
| Functional analysis by pathway classification | + | – | – | – | – | – |
| Intracellular localization of pathway participants | – | + | – | – | – | – |
| Web analysis of custom gene expression data | Scoring of pathway activation | – | Enrichment/over-representation analysis | – | – | – |
| Construction of custom pathways | – | – | + | + | + | – |
| Annotated effect of interactions (activation, inhibition, neutral) | + | – | – | – | + | + |