Karine Audouze, Agnieszka Sierakowska Juncker, Francisco J S S A Roque, Konrad Krysiak-Baltyn, Nils Weinhold, Olivier Taboureau, Thomas Skøt Jensen, Søren Brunak.
Abstract
Exposure to environmental chemicals and drugs may have a negative effect on human health. A better understanding of the molecular mechanisms of such compounds is needed to determine the risk. We present a high confidence human protein-protein association network built upon the integration of chemical toxicology and systems biology. This computational systems chemical biology model reveals uncharacterized connections between compounds and diseases, thus predicting which compounds may be risk factors for human health. Additionally, the network can be used to identify unexpected potential associations between chemicals and proteins. Examples are shown for chemicals associated with breast cancer, lung cancer and necrosis, and for potential protein targets of di-ethylhexyl-phthalate, 2,3,7,8-tetrachlorodibenzo-p-dioxin, pirinixic acid and permethrin. The chemical-protein associations are supported by recently published studies, which illustrates the power of our approach of integrating toxicogenomics data with other data types.
Year: 2010 PMID: 20502671 PMCID: PMC2873901 DOI: 10.1371/journal.pcbi.1000788
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Workflow of the strategy for generating a human P-PAN and predicting novel associations.
DATA: Extraction and filtering of human protein-chemical associations from the CTD. Visualization of the chemical space by Principal Component Analysis projection confirms that drugs (D) and environmental chemicals (E) share structural properties and may therefore affect similar protein targets. The first two principal components, which explain about 44% of the variance in the calculated properties, are shown (green: pharmaceutical actions; red: toxic actions; blue: specialty uses of chemicals). All proteins (P) were mapped to Ensembl gene identifiers to facilitate further data integration.
MODEL GENERATION: Construction of the P-PAN. The P-PAN was created from the associations between chemicals and proteins present in the CTD (dashed edge lines). In the P-PAN, two proteins are connected to each other (edge lines) if they share a common chemical. A weighted score, represented by the width of the black edges, was assigned to each protein-protein association; it represents the strength of the association between two proteins, as defined by the number of compounds shared by both molecular targets. A scoring function and a high confidence P-PAN were selected after comparing overlaps with two human interactomes (PPIs) based on experimental evidence. The P-PAN was then clustered, and the biological meaningfulness of the clusters was evaluated using Gene Ontology annotations.
PREDICTION: (1) Prediction of novel molecular targets for a chemical using a neighbor protein procedure. DEHP (orange) is known to be connected with the blue proteins and is predicted to be associated with the green proteins. A confidence score was calculated for each protein and is represented by the width of the edges, from thick (high score) to thin (low score). (2) Prediction of diseases associated with a chemical after integration of protein-disease information from GeneCards into the clusters. As an example, apocarotenal, a compound found in spinach, is predicted to be linked to necrosis.
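The construction rule in the MODEL GENERATION step (two proteins are linked when they share a chemical, with a weight reflecting the number of shared compounds) can be illustrated with a minimal Python sketch. The raw shared-chemical count used here is only the simplest reading of the caption and is an assumption; the caption notes that the authors selected a scoring function after comparison with two interactomes, and that function is not reproduced here. The function name `build_ppan` and the toy associations are hypothetical and are not taken from the CTD.

```python
from collections import defaultdict
from itertools import combinations


def build_ppan(chem_to_proteins):
    """Return {(protein_a, protein_b): number of shared chemicals}.

    Assumption: the edge weight is the raw count of shared chemicals,
    not the scoring function actually selected in the paper.
    """
    weights = defaultdict(int)
    for proteins in chem_to_proteins.values():
        # Every pair of proteins associated with the same chemical gains one unit of weight.
        for a, b in combinations(sorted(set(proteins)), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Hypothetical chemical-protein associations, for illustration only.
toy_associations = {
    "chem_A": ["P1", "P2", "P3"],
    "chem_B": ["P2", "P3"],
    "chem_C": ["P3", "P4"],
}

ppan = build_ppan(toy_associations)
for (a, b), weight in sorted(ppan.items()):
    print(a, b, weight)
```

Protein pairs are stored with their names in sorted order, so each undirected edge appears once.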
Mining the P-PAN for chemicals associated with breast cancer, lung cancer and necrosis, using a clustering procedure.
| Cluster ID | Disease | Chemical name | p-Value |
| 1 (462 proteins) | Breast cancer (128 proteins) | | 7.68e-134 |
| | | | 4.46e-92 |
| | | | 1.15e-88 |
| | | | 2.20e-78 |
| | | | 7.05e-63 |
| 12 (59 proteins) | Lung cancer (29 proteins) | | 1.57e-26 |
| | | (10 proteins) | |
| | | | 3.29e-22 |
| | | (12 proteins) | |
| | | | 7.78e-06 |
| 2 (433 proteins) | Necrosis (122 proteins) | | 4.76e-35 |
| | | | 1.63e-29 |
| | | (8 proteins) | |
| | | | 2.66e-26 |
Chemicals already known from the literature to be associated with the disease are shown in italics. Chemicals in bold are significantly associated with the disease but are not reported in the literature as disease-causing. The number of proteins is shown in brackets for each cluster, disease and novel association. As an example, among the 433 proteins in cluster 2, 122 are known to be linked to necrosis; of these 122, 8 are connected to apocarotenal in the CTD.
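The text here does not state which statistical test produced the p-values in this table. As a hedged illustration only, the sketch below assumes a one-sided hypergeometric test, a common choice for this kind of over-representation question. The values 433, 122 and 8 come from the note above; the number of cluster proteins connected to the chemical (`n = 10`) is hypothetical, so the result is not expected to match any p-value in the table.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes' (here: disease-linked proteins)."""
    upper = min(K, n)
    total = comb(N, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# N = proteins in cluster 2, K = necrosis-linked proteins (both from the table note).
# k = necrosis-linked proteins connected to the chemical (from the table note).
# n = cluster proteins connected to the chemical: HYPOTHETICAL, not given in the text.
p_value = hypergeom_sf(k=8, N=433, K=122, n=10)
print(f"{p_value:.3g}")
```

Exact integer arithmetic with `math.comb` avoids floating-point underflow until the final division, which matters when p-values are very small.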
Predicting novel molecular targets for chemicals.
| Chemical | Known protein | Cpscore | Novel protein | Cpscore | Literature |
| DEHP | CDO1 | 13.23 | | 5.46 | Yes |
| | PPARA | 9.48 | | 5.44 | Yes |
| | SUOX | 4.35 | | 5.40 | Yes |
| | (15 proteins) | | | 4.32 | Yes |
| | | | | 4.32 | Yes |
| | | | | 4.26 | Yes |
| TCDD | HSPA9B | 82.69 | | 10.17 | Yes |
| | SLC2A4 | 82.69 | | 8.97 | Yes |
| | TRIP11 | 82.69 | | 6.96 | Yes |
| | TSP1 | 82.69 | | 6.39 | Yes |
| | EPHX2 | 75.77 | | 6.77 | No |
| | MT2A | 10.85 | | 5.61 | Yes |
| | (90 proteins) | | | | |
| PA | CYP4X1 | 5.67 | | 5.19 | No |
| | PPARA | 2.53 | | 5.19 | No |
| | CES1 | 1.45 | | 3.19 | Yes |
| | SULT2A1 | 0.87 | | 2.61 | No |
| | CYP1A1 | 0.37 | | 2.80 | Yes |
| | | | | 1.34 | Yes |
| | | | | 1.21 | No |
| | | | | 1.08 | Yes |
| | | | | 1.04 | No |
| | | | | 0.93 | No |
| | | | | 0.91 | Yes |
| | (5 proteins) | | | | |
| Permethrin | AR | 4.67 | | 4.43 | Yes |
| | WNT10B | 4.12 | | 3.51 | Yes |
| | PGR | 3.75 | | 2.89 | No |
| | ESR1 | 3.31 | | 2.64 | Yes |
| | TFF1 | 3.15 | | | |
| | NR1I2 | 2.94 | | | |
| | (17 proteins) | | | | |
*Proteins known to be associated with a compound were extracted from the CTD. The total number of known proteins used to query the P-PAN is given in brackets. To find novel protein targets (in bold) associated with a chemical, a neighbor protein procedure was used, which scores the association between proteins and chemicals (cpscore). Some of the novel predicted proteins (which were not part of the input data) are supported by the literature, highlighting the usefulness of the P-PAN for identifying new chemical-protein associations.
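The cpscore formula is not given in this text, so the following Python sketch does not implement it. It only illustrates the general idea of a neighbor protein procedure under a stated assumption: each candidate protein is scored by the summed weight of its P-PAN edges to the proteins already known for the chemical. The function name `rank_neighbors`, the scoring rule and the toy network are all hypothetical.

```python
from collections import defaultdict


def rank_neighbors(ppan, known_proteins):
    """Rank proteins adjacent to the known set by summed edge weight.

    Assumption: this summed-weight score is a stand-in for illustration,
    not the cpscore used in the paper.
    """
    known = set(known_proteins)
    scores = defaultdict(float)
    for (a, b), weight in ppan.items():
        # Only edges that cross from the known set to a candidate contribute.
        if a in known and b not in known:
            scores[b] += weight
        elif b in known and a not in known:
            scores[a] += weight
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical weighted protein-protein associations.
toy_ppan = {
    ("P1", "P2"): 3,
    ("P1", "P3"): 1,
    ("P2", "P3"): 2,
    ("P2", "P5"): 1,
    ("P3", "P4"): 5,
}

candidates = rank_neighbors(toy_ppan, known_proteins=["P1", "P2"])
print(candidates)
```

In this toy network, P4 is never proposed because it is connected only to P3, which is not among the known proteins.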
Figure 2. Cross-species comparative toxicogenomics for bisphenol A (BPA).
Molecular targets are represented as nodes and are colored by gene family. The presence of a node indicates that information is available in the CTD; the absence of a node indicates that the information is unknown. A colored node indicates that BPA affects the protein, whereas an uncolored node indicates that BPA does not affect it. This figure highlights the similarities and differences between animal model and human responses to chemical exposure.