Mohammad Nazmol Hasan1,2, Md Masud Rana1, Anjuman Ara Begum1, Moizur Rahman3, Md Nurul Haque Mollah1.
Abstract
Detecting biomarker genes and their regulatory doses of chemical compounds (DCCs) is one of the most important tasks in toxicogenomic studies, as well as in drug design and development. The online computational platform "Toxygates" identifies biomarker genes and their regulatory DCCs by a co-clustering approach. However, its algorithm is based on hierarchical clustering (HC), which does not share gene-DCC two-way information simultaneously while co-clustering genes and DCCs, and it is sensitive to outlying observations. The platform may therefore produce misleading results in some cases. The probabilistic hidden variable model (PHVM) is a more effective co-clustering approach that shares two-way information simultaneously, but it is also sensitive to outlying observations. Because gene expression data are often contaminated by outlying observations, in this paper we propose the logistic probabilistic hidden variable model (LPHVM) for robust co-clustering of genes and DCCs. We compared the proposed LPHVM approach with the conventional PHVM on simulated datasets and with the Toxygates co-clustering approach on real-life TGP gene expression datasets. Simulation results show that the proposed method outperforms the conventional PHVM in the presence of outliers and performs equally well otherwise. In the real-life TGP data analysis, the Toxygates online platform incorrectly co-clustered three DCCs (glibenclamide-low, perhexilline-low, and hexachlorobenzene-medium) for the glutathione metabolism pathway dataset and two DCCs (acetaminophen-medium and methapyrilene-low) for the PPAR signaling pathway dataset, whereas the proposed LPHVM approach incorrectly co-clustered only one DCC (hexachlorobenzene-low), for the glutathione metabolism pathway.
Our findings from the real data analysis are also supported by other findings in the literature.
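The robustness claim rests on passing expression values through a logistic transformation so that a single extreme observation cannot dominate the fit. The sketch below illustrates that general idea only. The centering on the median, the scaling by the median absolute deviation, and the name `logistic_transform` are assumptions made for illustration, not the form used in the paper.

```python
import numpy as np

def logistic_transform(x, scale=None):
    """Squash values into the unit interval with a logistic curve (illustrative only)."""
    x = np.asarray(x, dtype=float)
    center = np.median(x)
    if scale is None:
        # Median absolute deviation as a robust spread estimate (an assumption).
        scale = np.median(np.abs(x - center))
        if scale == 0:
            scale = 1.0
    return 1.0 / (1.0 + np.exp(-(x - center) / scale))

# Made-up fold-change values; the last one plays the role of an outlier.
fold_changes = np.array([-0.4, -0.1, 0.0, 0.2, 0.3, 25.0])
squashed = logistic_transform(fold_changes)
```

After the transformation the ordering of the values is preserved, but the outlier sits at most one unit away from the rest instead of roughly 25.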
Keywords: co-clustering; doses of chemical compounds (DCCs); logistic probabilistic hidden variable model (LPHVM); logistic transformation; outlying observations; probabilistic hidden variable model (PHVM); toxicogenomic biomarker
Year: 2018 PMID: 30450112 PMCID: PMC6225736 DOI: 10.3389/fgene.2018.00516
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. A typical toxicogenomic experimental model for a single time point, according to which gene expression data of the animal samples can be collected. In the figure there is a treatment group of animals and a control group of animals, from which the fold change gene expression data can be obtained.
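As a rough illustration of how fold change data could be derived from a treatment group and a control group, the following sketch computes a per-gene log2 ratio of group means. The use of log2, the averaging over animals, and the function name `log2_fold_change` are assumptions, and the intensities are made up.

```python
import numpy as np

def log2_fold_change(treated, control):
    """Per-gene log2 ratio of mean treated to mean control expression.

    Both inputs have shape (genes, animals) and hold positive intensities.
    """
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    return np.log2(treated.mean(axis=1) / control.mean(axis=1))

# Hypothetical intensities for 3 genes with 3 animals per group.
treated = np.array([[200.0, 220.0, 180.0],
                    [50.0, 55.0, 45.0],
                    [100.0, 90.0, 110.0]])
control = np.full((3, 3), 100.0)
lfc = log2_fold_change(treated, control)
```

With these made-up numbers the first gene is upregulated, the second is downregulated, and the third is unchanged.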
Figure 3. Co-clustered view of genes and doses of chemical compounds retrieved from the LPHVM-generated gene and DCC joint probability. (A) Glutathione metabolism pathway dataset at the 24 h time point. (B) PPAR signaling pathway dataset at the 24 h time point. (C) Glutathione metabolism pathway dataset for all time points of the Toxygates data.
Figure 2. Average gene and DCC co-clustering ER plotted against the rate of outliers, when each dataset is simulated 100 times and outliers are introduced using THCM. (A) D1 dataset. (B) D2 dataset.
Average values of the gene and DCC co-clustering ER for the simulated datasets D1 and D2, when each dataset is simulated 100 times and contaminated by outliers using ICM.
| PHVM | 0.175 | 24.675 | 28.950 | 32.912 | 33.500 | 35.125 | 38.487 | |
| Proposed | 0.025 | 0.387 | 0.612 | 0.725 | 1.0 | 1.862 | 2.500 | |
| PHVM | 0.00 | 25.390 | 26.563 | 29.554 | 32.172 | 39.754 | ||
| Proposed | 0.00 | 0.163 | 0.945 | 1.481 | 1.600 | 2.072 | ||
Average values of the gene and DCC joint probabilities within the co-clusters generated by the proposed LPHVM algorithm for the simulated and real-life datasets.
| 0.0006095721 | 0.0010120670 | 0.0010117088 | |
| 0.0005162618 | 0.0005163485 | 0.0003147069 | |
| Glutathione metabolism pathway | 0.0006196723 | 0.0005331547 | |
| PPAR signaling pathway | 0.0004471087 | 0.0003704091 |
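To make the idea of a gene and DCC joint probability averaged within co-clusters more concrete, here is a minimal sketch. It assumes the hidden variable model has the familiar latent-class form P(g, c) = Σ_z P(z) P(g|z) P(c|z) fitted by EM, which the text here does not confirm. The function name, the toy data, and the cluster assignment rule are all hypothetical.

```python
import numpy as np

def fit_latent_class(X, k, n_iter=200, seed=0):
    """EM for P(g, c) = sum_z P(z) P(g|z) P(c|z) on a non-negative matrix X."""
    rng = np.random.default_rng(seed)
    n_genes, n_dccs = X.shape
    pz = np.full(k, 1.0 / k)
    pg = rng.random((n_genes, k))
    pg = pg / pg.sum(axis=0)
    pc = rng.random((n_dccs, k))
    pc = pc / pc.sum(axis=0)
    for _ in range(n_iter):
        # E-step: responsibility of each hidden class for each (gene, DCC) cell.
        joint = pz[None, None, :] * pg[:, None, :] * pc[None, :, :]
        resp = joint / joint.sum(axis=2, keepdims=True)
        # M-step: re-estimate the factors from responsibility-weighted data.
        weighted = X[:, :, None] * resp
        pg = weighted.sum(axis=1)
        pc = weighted.sum(axis=0)
        pz = pg.sum(axis=0)
        pg = pg / pg.sum(axis=0)
        pc = pc / pc.sum(axis=0)
        pz = pz / pz.sum()
    return pz, pg, pc

# Toy block-structured data: 2 gene groups x 2 DCC groups (made-up values).
X = np.full((6, 4), 0.1)
X[:3, :2] = 5.0
X[3:, 2:] = 5.0

pz, pg, pc = fit_latent_class(X, k=2)
joint = np.einsum("k,ik,jk->ij", pz, pg, pc)

# Assign each gene and DCC to its most probable hidden class;
# P(z|g) is proportional to P(g|z) P(z), and likewise for DCCs.
gene_cluster = (pg * pz).argmax(axis=1)
dcc_cluster = (pc * pz).argmax(axis=1)

# Average joint probability inside each co-cluster.
within = [joint[np.ix_(gene_cluster == z, dcc_cluster == z)].mean() for z in range(2)]
```

Because the joint probabilities sum to one over all gene and DCC pairs, the per-cell averages are small numbers, and a cell inside a co-cluster should average higher than a cell taken over the whole matrix.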
Upregulated and downregulated biomarker genes and their regulatory doses of chemical compounds for real life datasets.
| Glutathione metabolism pathway | Gsta4, Gstm1, Sms, Rrm1, Odc1, Gsta2/Gsta5, Gss, Gstm4, | hexachlorobenzene_Low |
| PPAR signaling pathway | Dbi, Acsl1, Acadl, Hmgcs2, Plin2, Slc27a2, Acadm, Fads2, Fabp3, Me1, Sorbs1, Acsl3, Cyp4a2, Aqp7, Cpt1a, Cyp8b1, OC100365047, LOC100910385, Angptl4, Cpt1b, Cpt2, Plin5, Cyp4a3, Acaa1a, Cyp4a1, Ehhadh, Pdpk1, Apoa5, Fabp4, Cyp27a1, Cpt1c, Fabp5 | benzbromarone_Middle |
Ranking of the regulatory doses of chemical compounds of the biomarker genes for the real datasets (glutathione metabolism and PPAR signaling pathways).
| Glutathione metabolism pathway | acetaminophen_High | 100.00 |
| | nitrofurazone_High | 99.59 |
| | acetaminophen_Middle | 95.98 |
| | methapyrilene_High | 88.66 |
| | nitrofurazone_Middle | 82.24 |
| | acetaminophen_Low | 77.84 |
| | hexachlorobenzene_Low | 74.57 |
| PPAR signaling pathway | WY14643_High | 100.00 |
| | WY14643_Middle | 97.59 |
| | clofibrate_High | 93.25 |
| | aspirin_High | 92.91 |
| | benzbromarone_High | 92.25 |
| | WY14643_Low | 91.19 |
| | aspirin_Middle | 87.93 |
| | aspirin_Low | 86.41 |
| | gemfibrozil_High | 85.51 |
| | gemfibrozil_Middle | 84.52 |
| | benzbromarone_Middle | 79.07 |
Top 20 ranked relationships between biomarker genes and their regulatory doses of chemical compounds for the glutathione metabolism pathway and PPAR signaling pathway datasets.
| acetaminophen_High | Gsta5 | 100.00 | WY14643_High | Ehhadh | 100.00 |
| nitrofurazone_High | Gsta5 | 96.26 | WY14643_High | Cyp4a1 | 97.29 |
| acetaminophen_Middle | Gsta5 | 91.69 | WY14643_Middle | Ehhadh | 95.32 |
| acetaminophen_High | G6pd | 90.85 | WY14643_Middle | Cyp4a1 | 93.17 |
| acetaminophen_High | Gpx2 | 89.67 | WY14643_High | Acaa1a | 92.41 |
| nitrofurazone_High | G6pd | 89.48 | clofibrate_High | Ehhadh | 88.93 |
| nitrofurazone_High | Gpx2 | 89.29 | WY14643_Middle | Acaa1a | 88.47 |
| acetaminophen_Middle | Gpx2 | 86.05 | clofibrate_High | Cyp4a1 | 87.34 |
| acetaminophen_Middle | G6pd | 85.91 | benzbromarone_High | Ehhadh | 87.04 |
| acetaminophen_High | Gsr | 85.19 | WY14643_High | Cyp4a3 | 86.68 |
| acetaminophen_High | Gstp1 | 83.54 | WY14643_Low | Ehhadh | 86.65 |
| nitrofurazone_High | Gsr | 83.25 | WY14643_High | Plin5 | 85.99 |
| nitrofurazone_High | Gstp1 | 81.53 | benzbromarone_High | Cyp4a1 | 85.67 |
| acetaminophen_High | Mgst2 | 80.46 | WY14643_Low | Cyp4a1 | 85.17 |
| acetaminophen_High | Gclc | 80.38 | WY14643_High | Cpt2 | 84.46 |
| methapyrilene_High | Gsta5 | 80.23 | WY14643_High | Cpt1b | 84.45 |
| acetaminophen_Middle | Gsr | 79.71 | WY14643_High | Angptl4 | 83.99 |
| acetaminophen_High | Gclm | 79.56 | aspirin_High | Ehhadh | 83.60 |
| methapyrilene_High | Gpx2 | 79.47 | WY14643_Middle | Cyp4a3 | 83.54 |
| nitrofurazone_High | Gclc | 78.93 | aspirin_High | Cyp4a1 | 83.10 |
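The scores in the ranking above top out at 100.00, which suggests a relative scale. One way such a ranking could be produced is sketched below. It assumes the score is a gene and DCC joint probability rescaled so that the strongest pair equals 100, which is a guess rather than the documented procedure. The function `rank_pairs`, the labels, and the probabilities are hypothetical.

```python
import numpy as np

def rank_pairs(joint, genes, dccs, top=3):
    """Return the top (DCC, gene, score) triples, scaled so the maximum is 100."""
    joint = np.asarray(joint, dtype=float)
    scaled = 100.0 * joint / joint.max()
    # Flattened indices of the largest scores, in descending order.
    order = np.argsort(scaled, axis=None)[::-1][:top]
    rows, cols = np.unravel_index(order, scaled.shape)
    return [(dccs[c], genes[r], round(float(scaled[r, c]), 2))
            for r, c in zip(rows, cols)]

# Made-up joint probabilities: rows are genes, columns are DCCs.
genes = ["gene_a", "gene_b"]
dccs = ["dcc_1", "dcc_2"]
joint = np.array([[0.40, 0.30],
                  [0.20, 0.10]])
ranking = rank_pairs(joint, genes, dccs, top=3)
```

Each returned triple follows the same DCC, gene, score layout as the rows of the table.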
| Gene group-11 | + | 0 | 0 | ||
| Gene group-12 | – | 0 | 0 | ||
| Gene group-21 | 0 | + | 0 | + | |
| Gene group-22 | 0 | – | 0 | ||
| Gene group-3 | 0 | 0 | 0 | (2) |
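The sign pattern listed for the gene groups can be turned into a small simulated fold-change matrix, as sketched below. The reading of the three columns as three DCC groups, the group sizes, the effect size, and the Gaussian noise are all assumptions chosen for illustration. Only the +, – and 0 entries per gene group are taken from the rows above.

```python
import numpy as np

# Rows are gene groups; columns are assumed to be three DCC groups.
SIGNS = {
    "Gene group-11": (+1, 0, 0),
    "Gene group-12": (-1, 0, 0),
    "Gene group-21": (0, +1, 0),
    "Gene group-22": (0, -1, 0),
    "Gene group-3": (0, 0, 0),
}

def simulate(genes_per_group=4, dccs_per_group=3, effect=2.0, noise_sd=0.1, seed=0):
    """Expand the sign pattern into blocks and add Gaussian noise (hypothetical settings)."""
    rng = np.random.default_rng(seed)
    pattern = np.array(list(SIGNS.values()), dtype=float)
    # Each sign becomes a genes_per_group x dccs_per_group block of mean values.
    block = np.ones((genes_per_group, dccs_per_group))
    mean = np.kron(pattern * effect, block)
    return mean + rng.normal(0.0, noise_sd, size=mean.shape)

X_sim = simulate()
```

With the default settings this yields a 20 x 9 matrix whose first four rows are upregulated under the first three columns, and whose last four rows show no regulation at all.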