Benjamin L King, Allan Peter Davis, Michael C Rosenstein, Thomas C Wiegers, Carolyn J Mattingly.
Abstract
Exposure to chemicals in the environment is believed to play a critical role in the etiology of many human diseases. To enhance understanding about environmental effects on human health, the Comparative Toxicogenomics Database (CTD; http://ctdbase.org) provides unique curated data that enable development of novel hypotheses about the relationships between chemicals and diseases. CTD biocurators read the literature and curate direct chemical-gene, gene-disease and chemical-disease relationships. These direct relationships are then computationally integrated to create additional inferred relationships; for example, a direct chemical-gene statement can be combined with a direct gene-disease statement to generate a chemical-disease inference (inferred via the shared gene). In CTD, the number of inferences has increased exponentially as the number of direct chemical, gene and disease interactions has grown. To help users navigate and prioritize these inferences for hypothesis development, we implemented a statistic to score and rank them based on the topology of the local network consisting of the chemical, the disease and each of the genes used to make an inference. In this network, chemicals, diseases and genes are nodes connected by edges representing the curated interactions. As in other biological networks, node connectivity is an important consideration when evaluating the CTD network, because the connectivity of nodes follows a power-law distribution. Topological methods reduce the influence of the highly connected nodes present in biological networks. We evaluated published methods that use local network topology to determine the reliability of protein-protein interactions derived from high-throughput assays. We developed a new metric that combines and weights two of these methods and uniquely takes into account the number of common neighbors and the connectivity of each entity involved.
We present several CTD inferences as case studies to demonstrate the value of this metric and the biological relevance of the inferences.
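The transitive step described in the abstract amounts to joining curated chemical-gene pairs with curated gene-disease pairs on the shared gene. The sketch below is a minimal illustration of that join, not CTD's actual pipeline; the helper `infer_chemical_disease` and the toy identifiers (`X`, `Y`, `Z`, `V`, `g1`, ...) are hypothetical, loosely following the notation of Figure 1.

```python
from collections import defaultdict


def infer_chemical_disease(chem_gene, gene_disease):
    """Join curated (chemical, gene) and (gene, disease) pairs on the gene.

    Hypothetical helper: returns a dict mapping (chemical, disease) to the
    set of shared genes that support that transitive inference.
    """
    # Index gene -> diseases so each chemical-gene edge is expanded once.
    diseases_by_gene = defaultdict(set)
    for gene, disease in gene_disease:
        diseases_by_gene[gene].add(disease)

    inferences = defaultdict(set)
    for chem, gene in chem_gene:
        # .get() does not insert empty entries for genes with no disease.
        for disease in diseases_by_gene.get(gene, ()):
            inferences[(chem, disease)].add(gene)
    return dict(inferences)


# Toy data: chemical X reaches disease Y through genes g1 and g2.
chem_gene = [("X", "g1"), ("X", "g2"), ("X", "g3"), ("Z", "g3")]
gene_disease = [("g1", "Y"), ("g2", "Y"), ("g3", "V")]
inferred = infer_chemical_disease(chem_gene, gene_disease)
print(sorted(inferred[("X", "Y")]))  # ['g1', 'g2']
```

The size of each gene set is the number of common neighbors of the chemical and the disease, which is the quantity that the local-topology scores discussed above start from.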
Year: 2012 PMID: 23144783 PMCID: PMC3492369 DOI: 10.1371/journal.pone.0046524
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Transitive chemical-disease inferences and the computational approaches used to score inferences.
A) Diagram of the local network for the transitive chemical-disease inference (dotted line) between a chemical, X, and a disease, Y, using a set of genes, A, that have both curated chemical-gene interactions and gene-disease associations (solid lines). The chemical, the disease and each gene involved also have interactions and relationships with other nodes (chemicals, genes, diseases) in the database. Chemical X interacts with other genes (grey circles) and is associated with other diseases (grey squares). Disease Y has other associated genes and curated relationships with other chemicals (grey triangles). Each gene in A is known to interact with other chemicals (grey triangles) and is associated with other diseases (grey squares). B) Diagrams showing three methods for scoring inferences. The first, C and p, is based on the number of genes (circles) used to make the inference and the connectivity (bold lines) of the chemical (triangle) and the disease (square). The second, p, takes into account the number of genes (circles) used to make the inference and their connectivity (bold lines). The third, S and W, takes into account the number of genes as well as the connectivity of the chemical, the disease and each of the genes.
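This record does not give the formulas behind C, p, S or W. As an illustration only, one widely used local-topology statistic that depends on exactly the quantities named for the first method (the number of shared genes and the connectivity of the chemical and the disease) is the hypergeometric tail probability of observing at least that many shared genes by chance. The sketch assumes a fixed background of `n_genes` genes; the function name is hypothetical, and the output is not claimed to reproduce the paper's C or p.

```python
import math


def neg_log10_hypergeom(n_genes, k_chem, k_dis, shared):
    """-log10 P(overlap >= shared) under a hypergeometric null (sketch).

    n_genes: genes in the network (assumed background size)
    k_chem:  genes linked to the chemical (its gene degree)
    k_dis:   genes linked to the disease (its gene degree)
    shared:  genes linked to both (the genes used for the inference)
    """
    if k_chem > n_genes or k_dis > n_genes:
        raise ValueError("degrees cannot exceed the background size")
    upper = min(k_chem, k_dis)
    if not 0 <= shared <= upper:
        raise ValueError("shared must lie between 0 and min(k_chem, k_dis)")
    # Ways to draw k_dis genes with at least `shared` of them among the
    # k_chem chemical genes; exact integers avoid floating-point underflow.
    tail = sum(
        math.comb(k_chem, i) * math.comb(n_genes - k_chem, k_dis - i)
        for i in range(shared, upper + 1)
    )
    total = math.comb(n_genes, k_dis)
    # math.log10 accepts arbitrarily large Python ints.
    return math.log10(total) - math.log10(tail)


print(round(neg_log10_hypergeom(10, 3, 3, 3), 3))  # 2.079, i.e. log10(120)
print(neg_log10_hypergeom(10, 3, 3, 0))            # 0.0, since P = 1
```

Larger overlaps between poorly connected nodes give larger scores, while the same overlap between two hubs is less surprising and scores lower, which is the behavior the caption attributes to the connectivity-aware methods.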
Top 20 hub chemicals, genes and diseases in the CTD network.
| Chemical Name (ID) | Edges | Gene Symbol | Edges | Disease Name (ID) | Edges |
|---|---|---|---|---|---|
| Tetrachlorodibenzodioxin (D013749) | 7176 | | 835 | Prostatic Neoplasms (D011471) | 515 |
| Acetaminophen (D000082) | 6362 | | 581 | Breast Neoplasms (D001943) | 442 |
| pirinixic acid (C006253) | 5664 | | 553 | Autistic Disorder (D001321) | 303 |
| Ammonium Chloride (D000643) | 5271 | | 551 | Lung Neoplasms (D008175) | 240 |
| Ethinyl Estradiol (D004997) | 5066 | | 546 | Liver Cirrhosis, Experimental (D008106) | 230 |
| Cyclosporine (D016572) | 4601 | | 521 | Stomach Neoplasms (D013274) | 210 |
| Benzo(a)pyrene (D001564) | 3397 | | 517 | Colorectal Neoplasms (D015179) | 197 |
| 7,8-Dihydro-7,8-dihydroxybenzo(a)pyrene 9,10-oxide (D015123) | 2918 | | 492 | Craniofacial Abnormalities (D019465) | 179 |
| 4,4′-diaminodiphenylmethane (C009505) | 2702 | | 473 | Carcinoma, Hepatocellular (D006528) | 173 |
| 2,4-dinitrotoluene (C016403) | 2647 | | 458 | Drug-Induced Liver Injury (D056486) | 167 |
| 2,6-dinitrotoluene (C023514) | 2628 | | 456 | Melanoma (D008545) | 157 |
| Estradiol (D004958) | 2620 | | 453 | Colonic Neoplasms (D003110) | 126 |
| Tamoxifen (D013629) | 2259 | | 443 | Inflammation (D007249) | 124 |
| Carbon Tetrachloride (D002251) | 2237 | | 414 | Liver Diseases (D008107) | 122 |
| Diethylnitrosamine (D004052) | 2153 | | 404 | Liver Neoplasms (D008113) | 122 |
| Tretinoin (D014212) | 1957 | | 380 | Neoplasms (D009369) | 118 |
| arsenic trioxide (C006632) | 1938 | | 375 | Schizophrenia (D012559) | 115 |
| sodium arsenite (C017947) | 1910 | | 374 | Alzheimer Disease (D000544) | 109 |
| Dietary Fats (D004041) | 1907 | | 368 | Leukemia, Myeloid, Acute (D015470) | 107 |
| Phenobarbital (D010634) | 1831 | | 367 | Adenocarcinoma (D000230) and Seizures (D012640) | 95 |
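The "Edges" columns are node degrees, and the hubs are simply the nodes with the highest degree. A minimal sketch of ranking hubs from an edge list follows; the helper `top_hubs` and the toy edges are hypothetical, and counting each distinct pair once is an assumption, since the record does not say how repeated curations of the same pair are counted. In a real chemical-gene-disease network, node names should carry a type prefix so that a chemical and a gene with the same label are not merged.

```python
from collections import Counter


def top_hubs(edges, k):
    """Return the k nodes with the most distinct edges, highest first.

    edges: iterable of (node_a, node_b) pairs with a consistent orientation;
    an exact duplicate pair is counted only once (an assumption).
    """
    unique_edges = dict.fromkeys(edges)  # de-duplicate, keep first-seen order
    degree = Counter()
    for a, b in unique_edges:
        degree[a] += 1
        degree[b] += 1
    return degree.most_common(k)


edges = [("X", "g1"), ("X", "g2"), ("X", "g3"), ("g1", "Y"), ("X", "g1")]
print(top_hubs(edges, 2))  # [('X', 3), ('g1', 2)]
```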
Disease inferences for BPA that are based on five interacting genes.
| Disease (# edges) | Gene Symbols (# edges) | C | p (first method) | p (second method) | S | W |
|---|---|---|---|---|---|---|
| Disorders of Sex Development (MESH:D012734) (5) | | 7.48 | 17.21 | 27.02 | 22.04 | 17.10 |
| Muscular Dystrophy, Facioscapulohumeral (MESH:D020391) (9) | | 5.42 | 12.51 | 27.77 | 20.21 | 12.45 |
| Osteosarcoma (MESH:D012516) (11) | | 4.88 | 11.27 | 22.54 | 16.91 | 11.38 |
| Metabolic Syndrome X (MESH:D024821) (16) | | 3.96 | 9.19 | 26.30 | 18.36 | 10.27 |
| Precancerous Conditions (MESH:D011230) (20) | | 3.46 | 8.05 | 23.44 | 15.74 | 8.19 |
| Myocardial Reperfusion Injury (MESH:D015428) (22) | | 3.25 | 7.59 | 25.19 | 16.39 | 7.75 |
| Drug Hypersensitivity (MESH:D004342) (22) | | 3.25 | 7.59 | 24.13 | 15.86 | 7.74 |
| Limb Deformities, Congenital (MESH:D017880) (25) | | 2.98 | 6.98 | 30.32 | 18.98 | 7.41 |
| Carcinoma (MESH:D002277) (25) | | 2.98 | 6.98 | 24.73 | 15.86 | 7.14 |
| Endometrial Neoplasms (MESH:D016889) (34) | | 2.37 | 5.62 | 29.24 | 17.79 | 6.07 |
| Cleft Lip (MESH:D002971) (28) | | 2.75 | 6.46 | 28.09 | 17.28 | 5.83 |
| Leukemia, Promyelocytic, Acute (MESH:D015473) (34) | | 2.37 | 5.62 | 26.32 | 15.97 | 5.81 |
| Dermatitis, Atopic (MESH:D003876) (37) | | 2.21 | 5.27 | 24.37 | 14.82 | 5.44 |
| Neuroblastoma (MESH:D009447) (39) | | 2.11 | 5.05 | 28.09 | 16.57 | 5.26 |
| Glioblastoma (MESH:D005909) (42) | | 1.97 | 4.76 | 24.17 | 14.46 | 4.94 |
| Cardiovascular Diseases (MESH:D002318) (43) | | 1.93 | 4.67 | 25.86 | 15.27 | 4.86 |
| Liver Cirrhosis (MESH:D00103) (49) | | 1.70 | 4.18 | 25.90 | 15.04 | 4.38 |
| Colitis, Ulcerative (MESH:D003093) (50) | | 1.67 | 4.11 | 26.77 | 15.44 | 4.32 |
| Cell Transformation, Neoplastic (MESH:D002471) (60) | | 1.37 | 3.48 | 26.52 | 15.00 | 3.70 |
| Lupus Erythematosus, Systemic (MESH:D008180) (62) | | 1.32 | 3.38 | 25.71 | 14.55 | 3.58 |
| Kidney Diseases (MESH:D007674) (71) | | 1.11 | 2.97 | 28.15 | 15.56 | 3.20 |
Figure 2. Example chemical-disease inference networks with similar numbers of genes, but with different node degrees.
A) Malathion-Breast Neoplasms inference network with C = 7.49, p (first method) = 17.30, p (second method) = 40.32, S = 28.81 and W = 17.31, made using the following genes (degrees are listed in parentheses and used to set gene node diameter): CENPF (29), CYP3A4 (414), HRAS (95), HRAS1 (42), IFNB1 (48), IFNG (347), SOD2 (191), TP53 (458) and TYMS (113). B) Pioglitazone-Breast Neoplasms inference network with C = 7.24, p (first method) = 16.72, p (second method) = 32.10, S = 24.46 and W = 16.73, made using the following genes: CDKN1B (167), CYP3A4 (414), IFNG (347), IL1B (492), NOS2 (456), PTGS2 (521), RB1 (209), RELA (368) and TNF (835).
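Figure 2 shows that two inferences built from the same number of genes (nine each) can score differently depending on how connected those genes are. The formula for W is not given in this record, so the sketch below swaps in a well-known stand-in, the Adamic-Adar index (a sum of 1/ln(degree) over common neighbors), applied to the gene degrees listed in the caption. It is not the paper's W; it only illustrates the same direction of effect, in which the network built from less connected genes (A) outscores the one built from hub genes (B), as W does (17.31 vs 16.73). The helper name `hub_penalized_score` is hypothetical.

```python
import math

# Gene degrees exactly as listed in the Figure 2 caption.
malathion_genes = {"CENPF": 29, "CYP3A4": 414, "HRAS": 95, "HRAS1": 42,
                   "IFNB1": 48, "IFNG": 347, "SOD2": 191, "TP53": 458,
                   "TYMS": 113}
pioglitazone_genes = {"CDKN1B": 167, "CYP3A4": 414, "IFNG": 347, "IL1B": 492,
                      "NOS2": 456, "PTGS2": 521, "RB1": 209, "RELA": 368,
                      "TNF": 835}


def hub_penalized_score(gene_degrees):
    """Adamic-Adar-style sum: each shared gene contributes 1/ln(degree).

    Every shared gene links to both the chemical and the disease, so its
    degree is at least 2 and ln(degree) > 0 (no division by zero).
    """
    return sum(1.0 / math.log(d) for d in gene_degrees.values())


score_a = hub_penalized_score(malathion_genes)
score_b = hub_penalized_score(pioglitazone_genes)
print(score_a > score_b)  # True: same gene count, but A's genes are less connected
```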
Top ten C–D inferences by W (a) and S (b).
(a)

| Chemical | Disease | | S | W | Comment |
|---|---|---|---|---|---|
| decitabine (C014347) | Stomach Neoplasms (D013274) | 102 | 435.03 | 241.34 | Curated |
| Dimethylnitrosamine (D004128) | Liver Cirrhosis, Experimental (D008106) | 80 | 343.09 | 224.11 | Curated (several references in CTD) |
| Estradiol (D004958) | Prostatic Neoplasms (D011471) | 186 | 628.38 | 196.16 | Curated |
| Arsenic (D001151) | Arsenic Poisoning (D020261) | 55 | 271.92 | 192.09 | Curated |
| nitrofen (C007350) | Hernia, Diaphragmatic (D006548) | 35 | 194.42 | 189.09 | Curated (several references in CTD) |
| Ammonium Chloride (D000643) | Prostatic Neoplasms (D011471) | 242 | 795.14 | 174.28 | Novel inference |
| Tetrachlorodibenzodioxin (D013749) | Prostatic Neoplasms (D011471) | 280 | 921.97 | 172.74 | Novel inference |
| Arsenic (D001151) | Skin Diseases (D012871) | 55 | 259.37 | 171.16 | Curated |
| Nitrobenzenes (D009578) | Liver Diseases (D008107) | 29 | 175.48 | 170.82 | Novel inference |
| vorinostat (C111237) | Leukemia, Myeloid, Acute (D015470) | 38 | 201.93 | 164.18 | Novel inference |
Comparison of Specific Inferences in Non-Shuffled and Shuffled Networks.
| Evidence for Inference | Number of Inferences in Real Network | Shuffled Network #1: Matching Inferences | Shuffled Network #1: Mean Difference in Wxy | Shuffled Network #2: Matching Inferences | Shuffled Network #2: Mean Difference in Wxy | Shuffled Network #3: Matching Inferences | Shuffled Network #3: Mean Difference in Wxy |
|---|---|---|---|---|---|---|---|
| Curated | 3,542 | 2,000 | 6.68 (p<0.0001) | 1,977 | 6.96 (p<0.0001) | 1,986 | 6.86 (p<0.0001) |
| Novel | 334,942 | 92,222 | 0.80 (p<0.0001) | 91,338 | 0.79 (p<0.0001) | 92,384 | 0.78 (p<0.0001) |
| TOTAL | 338,484 | 94,222 | 0.92 (p<0.0001) | 93,365 | 0.92 (p<0.0001) | 94,370 | 0.91 (p<0.0001) |
In total, 232,072, 232,347 and 233,515 inferences were made in the first, second and third shuffled networks, respectively.
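This record does not say how the three shuffled networks were built. A common way to build such null networks is a degree-preserving edge swap, which randomizes who connects to whom while keeping every node's edge count, so hubs stay hubs. The sketch below assumes that approach for a bipartite chemical-gene edge list; `degree_preserving_shuffle` is a hypothetical helper, and re-running the transitive join on shuffled edges would yield a null set of inferences to compare against the real ones.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomize a bipartite edge list while keeping every node's degree.

    Each accepted step swaps the partners of two edges,
    (a, b), (c, d) -> (a, d), (c, b), and rejects swaps that would create
    a duplicate edge. Edges must be oriented consistently (e.g. chemical
    first, gene second); exact duplicates in the input are dropped.
    """
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Skip no-op swaps and swaps that would duplicate an existing edge.
        if a == c or b == d or (a, d) in edge_set or (c, b) in edge_set:
            continue
        edge_set.difference_update({(a, b), (c, d)})
        edge_set.update({(a, d), (c, b)})
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


chem_gene = [("X", "g1"), ("X", "g2"), ("Z", "g3"), ("Q", "g4"), ("Q", "g1")]
shuffled = degree_preserving_shuffle(chem_gene, n_swaps=100, seed=1)
print(Counter(a for a, _ in shuffled) == Counter(a for a, _ in chem_gene))  # True
```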
Figure 3. Network of gene regulatory and protein-protein interactions from Ingenuity Pathways Analysis for two chemical-disease inferences.
A) Network of 73 genes used to make the curated inference between BPA and breast neoplasms, and B) network of 43 genes used to make the novel inference between arsenic and breast neoplasms.
Figure 4. CTD web interface data tables with ranked C–D relationships.
A) All curated and inferred C–D relationships for BPA (first page only), and B) all curated and inferred C–D relationships for breast neoplasms (first page only), sorted by descending values of W (“Network Score”). Chemical and disease names and gene symbols are hyperlinked to the CTD detail pages for the chemical, disease and gene, respectively. The direct evidence column displays an “M” and/or “T” symbol to indicate whether a C–D relationship is curated and, if so, its type. The “M” symbol indicates that the chemical correlates with the disease (marker) or plays a role in its etiology (mechanism), and the “T” symbol indicates that the chemical has a known or potential therapeutic role in the disease. The number of references in the last column is a hyperlink to the list of references that document the C–G, G–D or C–D relationship. Any references used to make a curated relationship are marked with an “M” or “T” symbol. Users can sort the tables by clicking on the column headings and can export them as Excel files or as comma-separated, tab-separated or XML text files.