Anthony Federico, Stefano Monti.
Abstract
Protein-protein interaction (PPI) databases are an important bioinformatics resource, yet existing literature-curated databases usually represent cell-type-agnostic interactions, which is at variance with our understanding that protein dynamics are context specific and highly dependent on their environment. Here, we provide a resource derived through data mining to infer disease- and tissue-relevant interactions by annotating existing PPI databases with cell-contextual information extracted from reporting studies. This resource is applicable to the reconstruction and analysis of disease-centric molecular interaction networks. We have made the data and method publicly available and plan to release scheduled updates in the future. We expect these resources to be of interest to a wide audience of researchers in the life sciences.
Keywords: context-relevant PPI; network biology; protein-protein interaction
Year: 2020 PMID: 33511361 PMCID: PMC7815950 DOI: 10.1016/j.patter.2020.100153
Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
Figure 1. A Graphical Overview
The schematic describes the organization of existing bioinformatics resources to create three mapping tables—(A) the PPI table which maps interactions to reporting publications, (B) the PID table which maps publications to extracted cell lines, and (C) the CLA table which maps cell-line accessions to official cell-line names and associated cell-type information—to generate (D) the presented dataset of contextualized PPIs.
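The way the three mapping tables combine can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `contextualize`, the field names, and every identifier in the toy tables (`P1`, `PUB_A`, `ACC_1`, and so on) are hypothetical placeholders chosen for illustration. The sketch only assumes what the caption states, namely that interactions map to publications, publications map to cell-line accessions, and accessions map to cell-line names and cell-type information.

```python
# Toy stand-ins for the three mapping tables (all values are hypothetical).
# (A) PPI table: interaction -> reporting publication
ppi_table = [
    ("P1", "P2", "PUB_A"),
    ("P1", "P3", "PUB_B"),
    ("P4", "P5", "PUB_C"),  # publication with no extracted cell line
]
# (B) PID table: publication -> extracted cell-line accessions
pid_table = {
    "PUB_A": ["ACC_1", "ACC_2"],
    "PUB_B": ["ACC_1"],
}
# (C) CLA table: accession -> official name and cell-type information
cla_table = {
    "ACC_1": {"cell_name": "LineOne", "species": "human", "category": "cancer"},
    "ACC_2": {"cell_name": "LineTwo", "species": "mouse", "category": "fibroblast"},
}


def contextualize(ppi, pid, cla):
    """Join A -> B -> C so that each interaction gets one row per cell line."""
    rows = []
    for protein_a, protein_b, publication in ppi:
        # An interaction whose publication has no extracted cell line yields no rows.
        for accession in pid.get(publication, []):
            info = cla.get(accession)
            if info is None:
                continue  # accession without an official annotation is skipped
            rows.append({
                "protein_a": protein_a,
                "protein_b": protein_b,
                "publication": publication,
                "accession": accession,
                **info,
            })
    return rows


contextualized = contextualize(ppi_table, pid_table, cla_table)
for row in contextualized:
    print(row["protein_a"], row["protein_b"], row["cell_name"], row["category"])
```

In this sketch an interaction reported in a publication that mentions several cell lines produces several contextualized rows, and interactions without any extracted cell line are dropped. Whether the actual pipeline handles these cases the same way is an assumption.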
Figure 2. The Main Routine Behind PPI Context
The pseudocode includes the main routine executed in the data pre-processing pipeline for creating contextualized PPI entries from the three mapping tables. The tool can be downloaded from GitHub, which includes example commands for installing the required Python dependencies and fetching the raw data.
Figure 3. Summary of Contextualized PPIs
The processed dataset provides cell-line information for each contextualized PPI. The summary plots compare the frequencies of annotations for contextualized PPIs, including (A) the most frequently annotated cell-line names, (B) cell-line species of origin, (C) cell-line sex, and (D) cell-line category. The majority of annotations were human cancer-derived cell lines.
Network Propagation of Disease Genes
| Cell name | Nodes | Edges | Density | Assortativity | Mean AUROC | SD | Delta |
|---|---|---|---|---|---|---|---|
| BRCA | 4,645 | 9,015 | 8.4 × 10−4 | −1.8 × 10−1 | 0.693 | 0.028 | |
| Breast Expressed | 10,850 | 180,342 | 3.1 × 10−3 | −6.8 × 10−2 | 0.691 | 0.014 | 0.002 |
| HEK293 | 11,069 | 79,207 | 1.3 × 10−3 | −7.4 × 10−2 | 0.664 | 0.017 | 0.029 |
| HEK293T | 13,884 | 116,709 | 1.2 × 10−3 | −8.2 × 10−2 | 0.659 | 0.014 | 0.034 |
| HeLa | 13,824 | 149,136 | 1.6 × 10−3 | −5.4 × 10−2 | 0.658 | 0.015 | 0.034 |
| MEF (C57BL/6) | 4,189 | 8,689 | 9.9 × 10−4 | −2.2 × 10−1 | 0.643 | 0.023 | 0.049 |
| DU145 | 3,219 | 8,075 | 1.6 × 10−3 | −5.5 × 10−1 | 0.637 | 0.031 | 0.055 |
| Jurkat | 2,467 | 5,953 | 2.0 × 10−3 | −4.1 × 10−1 | 0.637 | 0.034 | 0.056 |
| HCT 116 | 11,936 | 82,956 | 1.2 × 10−3 | 1.1 × 10−2 | 0.633 | 0.018 | 0.060 |
| Schneider 2 | 4,228 | 18,745 | 2.1 × 10−3 | −7.2 × 10−2 | 0.632 | 0.028 | 0.060 |
| U2OS | 6,667 | 25,309 | 1.1 × 10−3 | −2.7 × 10−1 | 0.631 | 0.020 | 0.061 |
| SW480 | 1,763 | 4,316 | 2.8 × 10−3 | −4.0 × 10−1 | 0.629 | 0.029 | 0.063 |
| MCF-10A | 11,469 | 61,722 | 9.4 × 10−4 | −3.7 × 10−2 | 0.627 | 0.018 | 0.066 |
| Hep-G2 | 1,304 | 3,728 | 4.4 × 10−3 | −3.0 × 10−1 | 0.626 | 0.046 | 0.066 |
| BL-21 | 5,135 | 11,433 | 8.7 × 10−4 | −2.0 × 10−1 | 0.625 | 0.021 | 0.068 |
| NCI-H1975 | 1,295 | 3,546 | 4.2 × 10−3 | −4.3 × 10−1 | 0.618 | 0.041 | 0.074 |
| LS513 | 1,246 | 3,486 | 4.5 × 10−3 | −4.4 × 10−1 | 0.603 | 0.038 | 0.089 |
| NIH 3T3 | 2,914 | 4,806 | 1.1 × 10−3 | −2.9 × 10−1 | 0.603 | 0.030 | 0.090 |
| HT-29 | 1,693 | 4,219 | 2.9 × 10−3 | −3.7 × 10−1 | 0.601 | 0.034 | 0.092 |
| HeLa Kyoto | 4,992 | 16,901 | 1.4 × 10−3 | −1.1 × 10−1 | 0.601 | 0.022 | 0.092 |
| MCF-10AT | 11,112 | 57,754 | 9.4 × 10−4 | −1.4 × 10−1 | 0.597 | 0.018 | 0.096 |
| K-562 | 1,922 | 3,687 | 2.0 × 10−3 | −3.8 × 10−1 | 0.596 | 0.036 | 0.096 |
| MRC-5 | 2,015 | 3,538 | 1.7 × 10−3 | −3.7 × 10−1 | 0.583 | 0.022 | 0.109 |
| T-REx-293 | 5,395 | 19,558 | 1.3 × 10−3 | −4.1 × 10−1 | 0.581 | 0.022 | 0.111 |
| HeLa S3 | 8,756 | 39,157 | 1.0 × 10−3 | 1.0 × 10−1 | 0.580 | 0.019 | 0.112 |
| Sf9 | 1,819 | 3,159 | 1.9 × 10−3 | −2.0 × 10−1 | 0.575 | 0.029 | 0.118 |
| JON | 1,354 | 3,629 | 4.0 × 10−3 | −4.1 × 10−1 | 0.574 | 0.036 | 0.119 |
| SH-SY5Y | 8,422 | 27,864 | 7.9 × 10−4 | −1.5 × 10−1 | 0.571 | 0.023 | 0.122 |
| HEK | 6,161 | 19,569 | 1.0 × 10−3 | −2.2 × 10−1 | 0.546 | 0.022 | 0.147 |
| 293T/AT1 | 1,994 | 3,315 | 1.7 × 10−3 | −3.1 × 10−1 | 0.518 | 0.028 | 0.174 |
| hTERT-RPE1 | 2,553 | 6,577 | 2.0 × 10−3 | −4.2 × 10−1 | 0.486 | 0.035 | 0.206 |
A comparison of the recovery of breast cancer disease genes in a breast cancer-centric network (BRCA) and in networks built from non-breast cancer interactions, alongside measured graph properties: nodes, edges, density, and assortativity. Delta is the difference in mean AUROC (area under the receiver operating characteristic curve) over 100 repeats between the BRCA network and each of the other networks.
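Two of the quantities in this table can be checked with a short sketch. The density values reported above are consistent with the standard formula for an undirected simple graph, 2E / (N(N − 1)); that the authors used exactly this definition is an assumption. The `auroc` helper uses the rank-based (Mann-Whitney) formulation on toy scores that are purely illustrative and are not taken from the study. Assortativity and the propagation step itself are not reproduced here.

```python
def density(n_nodes, n_edges):
    """Density of an undirected simple graph: observed edges / possible edges."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (Mann-Whitney formulation of the AUROC).
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Node and edge counts copied from the BRCA and Breast Expressed rows.
print(f"{density(4645, 9015):.1e}")     # compare with 8.4 x 10^-4 in the table
print(f"{density(10850, 180342):.1e}")  # compare with 3.1 x 10^-3 in the table

# Hypothetical propagation scores; label 1 marks a held-out disease gene.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 0, 1, 0]
print(auroc(toy_scores, toy_labels))

# Delta as defined in the caption: mean AUROC of BRCA minus that of another network.
print(round(0.693 - 0.691, 3))
```

An AUROC of 0.5 corresponds to random ranking, so the rows near the bottom of the table recover the disease genes little better than chance.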
Targeted Enrichment by Cell Line
| Cell name | Interactions | BRCA1/2 | Fraction | p | FDR |
|---|---|---|---|---|---|
| MDA-MB-231 | 4,185 | 152 | 0.036 | 2.6 × 10−133 | 7.9 × 10−132 |
| MCF-7 | 6,577 | 67 | 0.010 | 7.0 × 10−26 | 1.0 × 10−24 |
| MCF-10A | 62,019 | 174 | 0.003 | 1.9 × 10−5 | 1.9 × 10−4 |
| MCF-10AT | 57,832 | 163 | 0.003 | 2.8 × 10−5 | 2.1 × 10−4 |
| U2OS | 26,221 | 83 | 0.003 | 8.5 × 10−5 | 5.1 × 10−4 |
| BL-21 | 11,981 | 27 | 0.002 | 3.3 × 10−1 | 1.0 |
| MEF (C57BL/6) | 9,265 | 21 | 0.002 | 3.4 × 10−1 | 1.0 |
| NIH 3T3 | 5,031 | 10 | 0.002 | 5.7 × 10−1 | 1.0 |
| K-562 | 4,197 | 7 | 0.002 | 7.5 × 10−1 | 1.0 |
| JON | 3,641 | 6 | 0.002 | 7.5 × 10−1 | 1.0 |
| HCT 116 | 85,366 | 164 | 0.002 | 8.0 × 10−1 | 1.0 |
| Sf9 | 3,414 | 4 | 0.001 | 9.2 × 10−1 | 1.0 |
| Hep-G2 | 3,854 | 2 | 0.001 | 1.0 | 1.0 |
| SW480 | 4,351 | 1 | 0.000 | 1.0 | 1.0 |
| DU145 | 8,123 | 4 | 0.000 | 1.0 | 1.0 |
| Jurkat | 6,245 | 1 | 0.000 | 1.0 | 1.0 |
| HeLa | 179,407 | 285 | 0.002 | 1.0 | 1.0 |
| HeLa S3 | 39,911 | 35 | 0.001 | 1.0 | 1.0 |
| SH-SY5Y | 27,964 | 17 | 0.001 | 1.0 | 1.0 |
| T-REx-293 | 19,912 | 6 | 0.000 | 1.0 | 1.0 |
| HEK | 20,382 | 4 | 0.000 | 1.0 | 1.0 |
| HeLa Kyoto | 17,093 | 1 | 0.000 | 1.0 | 1.0 |
| HEK293T | 140,112 | 132 | 0.001 | 1.0 | 1.0 |
| HEK293 | 85,737 | 45 | 0.001 | 1.0 | 1.0 |
| Schneider 2 | 18,789 | 0 | 0.000 | 1.0 | 1.0 |
| hTERT-RPE1 | 6,739 | 0 | 0.000 | 1.0 | 1.0 |
| HT-29 | 4,271 | 0 | 0.000 | 1.0 | 1.0 |
| MRC-5 | 3,597 | 0 | 0.000 | 1.0 | 1.0 |
| NCI-H1975 | 3,550 | 0 | 0.000 | 1.0 | 1.0 |
| LS513 | 3,487 | 0 | 0.000 | 1.0 | 1.0 |
A comparison of the annotation of BRCA1/2 interactions across the most frequent cell lines. Significance was computed with a hypergeometric test for over-representation, and p values were adjusted for multiple comparisons using the Benjamini-Hochberg method (FDR).
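The two statistical steps named in this caption can be sketched in plain Python. The hypergeometric example uses small toy counts, because the background totals behind the table are not given here; those counts are hypothetical. The Benjamini-Hochberg step is applied to the 30 p values as printed in the table (already rounded), so it reproduces the FDR column only up to rounding. Function names are illustrative and do not come from the authors' tool.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing without replacement (over-representation test)."""
    top = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return favourable / comb(population, draws)


def benjamini_hochberg(pvals):
    """Return BH-adjusted p values in the original order, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy counts (hypothetical): 10 interactions in total, 3 involve the genes of
# interest, a cell line is annotated on 4 interactions, and 2 of those are hits.
print(hypergeom_upper_tail(2, 10, 3, 4))

# p values copied from the table, in row order (12 listed values, then 18 rows at 1.0).
table_p = [2.6e-133, 7.0e-26, 1.9e-5, 2.8e-5, 8.5e-5, 3.3e-1,
           3.4e-1, 5.7e-1, 7.5e-1, 7.5e-1, 8.0e-1, 9.2e-1] + [1.0] * 18
fdr = benjamini_hochberg(table_p)
for p, q in list(zip(table_p, fdr))[:6]:
    print(f"p={p:.1e}  FDR={q:.1e}")
```

With 30 tests, the smallest p value is multiplied by 30 and the second smallest by 15, which is why only the first five rows of the table stay below an FDR of 0.05.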