Atanas Kamburov, Ralf Herwig.
Abstract
Molecular interactions are key drivers of biological function. Providing interaction resources to the research community is important since they allow functional interpretation and network-based analysis of molecular data. ConsensusPathDB (http://consensuspathdb.org) is a meta-database combining interactions of diverse types from 31 public resources for humans, 16 for mice and 14 for yeasts. Using ConsensusPathDB, researchers commonly evaluate lists of genes, proteins and metabolites against sets of molecular interactions defined by pathways, Gene Ontology and network neighborhoods and retrieve complex molecular neighborhoods formed by heterogeneous interaction types. Furthermore, the integrated protein-protein interaction network is used as a basis for propagation methods. Here, we present the 2022 update of ConsensusPathDB, highlighting content growth, additional functionality and improved database stability. For example, the number of human molecular interactions increased to 859 848 connecting 200 499 unique physical entities such as genes/proteins, metabolites and drugs. Furthermore, we integrated regulatory datasets in the form of transcription factor-, microRNA- and enhancer-gene target interactions, thus providing novel functionality in the context of overrepresentation and enrichment analyses. We specifically emphasize the use of the integrated protein-protein interaction network as a scaffold for network inferences, present topological characteristics of the network and discuss strengths and shortcomings of such approaches.
Year: 2022 PMID: 34850110 PMCID: PMC8728246 DOI: 10.1093/nar/gkab1128
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Growth figures describing the increase in content of human interactions with respect to the last database publication in 2013 (5)
| Interaction type (human) | 2013: version 25 (# interactions) | 2022: version 35 (# interactions) | Content growth (# interactions) |
|---|---|---|---|
| Protein–protein | 155 855 | 616 304 | 460 449 |
| Signaling or metabolic | 20 682 | 25 046 | 4364 |
| Gene regulatory | 5658 | 18 912 | 13 254 |
| Genetic | 265 | 7936 | 7671 |
| Drug–target | 33 081 | 191 650 | 158 569 |
| | | | |
| Pathways | 4601 | 5578 | 977 |
| Protein complex-derived sets | 39 685 | 244 987 | 205 302 |
| miRNA–gene target | 0 | 5474 | 5474 |
| Transcription factor–gene target | 0 | 800 | 800 |
| Enhancer–gene target sets | 0 | 217 790 | 217 790 |
It should be noted that enhancer–gene target sets are highly redundant across different cell types.
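The five interaction-type rows of the table can be checked against the total of 859 848 human interactions quoted in the abstract. The following is a minimal sketch of that arithmetic; the counts are copied from the table, while the variable names and the fold-change figure are illustrative additions, not values reported by the authors. The set-based rows (pathways, complex-derived sets, regulatory target sets) are left out because they count sets rather than interactions.

```python
# Human interaction counts per type, copied from the table above:
# (2013 version 25, 2022 version 35, reported growth)
interactions = {
    "Protein-protein": (155_855, 616_304, 460_449),
    "Signaling or metabolic": (20_682, 25_046, 4_364),
    "Gene regulatory": (5_658, 18_912, 13_254),
    "Genetic": (265, 7_936, 7_671),
    "Drug-target": (33_081, 191_650, 158_569),
}

# Each reported growth value should equal the 2022 count minus the 2013 count.
for name, (v25, v35, growth) in interactions.items():
    assert v35 - v25 == growth, name

total_2013 = sum(v25 for v25, _, _ in interactions.values())
total_2022 = sum(v35 for _, v35, _ in interactions.values())
fold_change = total_2022 / total_2013  # illustrative, not stated in the source

print(total_2013, total_2022, round(fold_change, 2))
# prints: 215541 859848 3.99
```

The 2022 sum reproduces the 859 848 interactions stated in the abstract, which suggests that figure is the sum of these five interaction types.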
Figure 1. ConsensusPathDB content. (A) Number of interactions (Y-axis) shared by number of source databases (X-axis). The rightmost tail of the histogram is magnified. The colors within each bar represent the different interaction types. (B) Venn diagram of overlapping gene sets for the apoptosis pathway annotated by three prominent pathway databases: KEGG (pathway identifier: hsa04210), Reactome (R-HSA-109581) and WikiPathways (WP254). (C) Interaction display for the TOP1 (DNA topoisomerase I) gene. Binary interactions are scored for confidence and the confidence value is displayed by a traffic light symbol. Each interaction assigns a specific role to the molecule under study (e.g. ‘I’ interactor, ‘T’ target) and has an external link to the annotating source database. Interactions can be selected for further visualization. (D) Visualization of selected interactions of TOP1. Interactions are displayed with colored nodes indicating the interaction type and interacting molecules are displayed with colored squares indicating their type. Each connection represents a source database that has annotated the interaction. Clicking on each interaction (or molecule) displays further information about the interaction, the confidence score and supporting publications.
Figure 2. PPI network characteristics. (A) Histogram of confidence scores for the 522 618 human binary interactions in ConsensusPathDB. X-axis: confidence score in bins of 0.1; Y-axis: number of interactions. (B) Histogram of shortest path lengths connecting pairs of nodes in the PPI network. (C) Node degree distribution of the PPI network on a log–log scale. X-axis: node degree; Y-axis: number of nodes. Graphs (B) and (C) were generated with the network analysis function (60) within the Cytoscape software (61). (D) Scatter plot of degree (X-axis) and core (Y-axis) of all 19 610 nodes in the PPI network. (E) Box plot of the node core distribution of 3347 recently annotated cancer genes from the Network of Cancer Genes, NCG version 7 (orange) and 3347 randomly selected genes. The P-value was computed with the unpaired Wilcoxon rank-sum test.
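Panel (D) contrasts two per-node measures, degree and core. Assuming "core" denotes the standard k-core number (the largest k such that the node belongs to a subgraph in which every node has at least k neighbors), the difference can be sketched on a toy graph. The function name, the node labels and the graph itself are hypothetical; this is a plain peeling procedure, not the implementation used by the authors or by Cytoscape.

```python
from collections import defaultdict

def core_numbers(edges):
    """Return {node: core number} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        n = min(remaining, key=degree.__getitem__)
        k = max(k, degree[n])
        core[n] = k
        remaining.remove(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core

# Toy graph: a 4-clique (A, B, C, D) plus a hub H that touches A and three leaves.
toy_edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("H", "A"), ("H", "L1"), ("H", "L2"), ("H", "L3"),
]

node_degree = defaultdict(int)
for u, v in toy_edges:
    node_degree[u] += 1
    node_degree[v] += 1

core = core_numbers(toy_edges)
print(node_degree["A"], core["A"])  # prints: 4 3
print(node_degree["H"], core["H"])  # prints: 4 1
```

A and H have the same degree, but only A sits in a densely connected region, so degree alone does not determine core.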
Top 30 hub proteins in ConsensusPathDB 2022
| Protein | Gene symbol | Node degree | Node core | Cancer gene (NCG V7) |
|---|---|---|---|---|
| PKHA4_HUMAN | PLEKHA4 | 2932 | 92 | No |
| A4_HUMAN | APP | 2554 | 92 | No |
| ESR2_HUMAN | ESR2 | 2296 | 92 | No |
| ESR1_HUMAN | ESR1 | 2200 | 92 | Yes |
| NTRK1_HUMAN | NTRK1 | 1958 | 92 | Yes |
| MYC_HUMAN | MYC | 1932 | 92 | Yes |
| KIF14_HUMAN | KIF14 | 1707 | 92 | No |
| H4_HUMAN | H4C1 | 1685 | 92 | No |
| JUN_HUMAN | JUN | 1580 | 92 | Yes |
| EGFR_HUMAN | EGFR | 1436 | 92 | Yes |
| CTRO_HUMAN | CIT | 1383 | 92 | No |
| NR2C2_HUMAN | NR2C2 | 1358 | 92 | No |
| RECQ4_HUMAN | RECQL4 | 1353 | 92 | Yes |
| BRD4_HUMAN | BRD4 | 1345 | 92 | Yes |
| U5S1_HUMAN | EFTUD2 | 1345 | 92 | Yes |
| RNF4_HUMAN | RNF4 | 1331 | 92 | Yes |
| BIRC3_HUMAN | BIRC3 | 1324 | 92 | Yes |
| UBC_HUMAN | UBC | 1324 | 92 | No |
| XPO1_HUMAN | XPO1 | 1310 | 92 | Yes |
| P53_HUMAN | TP53 | 1281 | 92 | Yes |
| EGLN3_HUMAN | EGLN3 | 1279 | 92 | No |
| CUL3_HUMAN | CUL3 | 1229 | 92 | Yes |
| BRCA1_HUMAN | BRCA1 | 1096 | 92 | Yes |
| TIF1B_HUMAN | TRIM28 | 1085 | 92 | Yes |
| GRB2_HUMAN | GRB2 | 1056 | 92 | Yes |
| HD_HUMAN | HTT | 1036 | 92 | No |
| PHB_HUMAN | PHB | 1017 | 92 | No |
| KI20A_HUMAN | KIF20A | 999 | 92 | No |
| HSP7C_HUMAN | HSPA8 | 994 | 92 | No |
| CSN5_HUMAN | COPS5 | 985 | 92 | No |
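As an illustration of the overrepresentation analyses mentioned in the abstract, the cancer-gene column of this table can be put through a one-sided hypergeometric test. This is a rough sketch under two assumptions not stated in the source: that the background is the 19 610 PPI nodes from Figure 2, and that all 3347 NCG version 7 cancer genes are among those nodes. The function name is hypothetical, and the resulting P-value is not a result reported by the authors.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from N items, K of them marked."""
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Gene symbols flagged "Yes" in the cancer gene column of the table above.
cancer_hubs = [
    "ESR1", "NTRK1", "MYC", "JUN", "EGFR", "RECQL4", "BRD4", "EFTUD2",
    "RNF4", "BIRC3", "XPO1", "TP53", "CUL3", "BRCA1", "TRIM28", "GRB2",
]

n_hubs = 30            # rows in the table
n_background = 19_610  # ASSUMPTION: all PPI nodes form the background
n_cancer = 3_347       # ASSUMPTION: all NCG v7 genes are present in the network

expected = n_hubs * n_cancer / n_background
p_value = hypergeom_sf(len(cancer_hubs), n_background, n_cancer, n_hubs)

print(len(cancer_hubs), round(expected, 1))  # prints: 16 5.1
print(f"{p_value:.1e}")
```

Under these assumptions, 16 of the 30 hubs are cancer genes where about 5 would be expected by chance. Intensively studied genes tend to accumulate more annotated interactions, so such an enrichment should be read with that study bias in mind.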