Ming-Hua Chung, Yuping Wang, Hailin Tang, Wen Zou, John Basinger, Xiaowei Xu, Weida Tong
Abstract
The advancement of high-throughput screening technologies facilitates the generation of massive amounts of biological data, a big-data phenomenon in biomedical science. Yet researchers still rely heavily on keyword search and/or literature review to navigate the databases, and analyses are often done on a rather small scale. As a result, the rich information in a database is not fully utilized, particularly the information embedded in the interactions between data points, which is largely ignored and buried. Over the past 10 years, probabilistic topic modeling has been recognized as an effective machine learning approach for annotating the hidden thematic structure of massive collections of documents. The analogy between a text corpus and large-scale genomic data enables the application of text mining tools, such as probabilistic topic models, to explore hidden patterns in genomic data and, by extension, altered biological functions. In this paper, we developed a generalized probabilistic topic model to analyze a toxicogenomics dataset consisting of a large number of gene expression profiles from the livers of rats treated with drugs at multiple doses and time points. We discovered hidden patterns in gene expression associated with the effects of dose and time point of treatment. Finally, we illustrated the ability of our model to identify evidence for a potential reduction in animal use.
Keywords: TG-GATEs; author-topic model; bioinformatics; machine learning; probabilistic topic modeling; toxicogenomics
Year: 2015 PMID: 25941488 PMCID: PMC4403303 DOI: 10.3389/fphar.2015.00081
Source DB: PubMed Journal: Front Pharmacol ISSN: 1663-9812 Impact factor: 5.810
Summary of different feature specifications of asymmetric author-topic model.
| Model | Feature specification | Count |
| --- | --- | --- |
| 1 | Treatment | 1554 |
| 2 | Drug | 131 |
| 3 | Time-dose | 12 |
The probability of latent biological processes for acetaminophen under model 1.
| Treatment | Dose | Time | Topic 1 | Probability | Topic 2 | Probability | Topic 3 | Probability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 36 | Low | 3 | 2 | 0.149 | 36 | 0.124 | 181 | 0.122 |
| 37 | Middle | 3 | **161** | 0.279 | 111 | 0.168 | 116 | 0.098 |
| 38 | High | 3 | **161** | 0.139 | 39 | 0.1 | 169 | 0.1 |
| 39 | Low | 7 | 68 | 0.305 | 162 | 0.211 | 69 | 0.165 |
| 40 | Middle | 7 | **161** | 0.366 | 149 | 0.12 | 57 | 0.079 |
| 41 | High | 7 | **161** | 0.275 | 27 | 0.08 | 39 | 0.066 |
| 42 | Low | 14 | 69 | 0.153 | 134 | 0.138 | 63 | 0.138 |
| 43 | Middle | 14 | **161** | 0.342 | 128 | 0.104 | 37 | 0.098 |
| 44 | High | 14 | **161** | 0.274 | 113 | 0.082 | 128 | 0.074 |
| 45 | Low | 28 | 69 | 0.175 | 96 | 0.175 | 160 | 0.153 |
| 46 | Middle | 28 | **161** | 0.278 | 96 | 0.152 | 14 | 0.085 |
| 47 | High | 28 | **161** | 0.366 | 197 | 0.091 | 164 | 0.07 |
Only the top three topics for each treatment (drug-dose-time) are shown. For the full table, see the Supplementary Material.
Topic 161 (in bold) is significantly associated with glutathione metabolism.
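Selecting the top three topics for a treatment amounts to sorting that treatment's topic probability vector. The following is a minimal sketch, not the authors' code: the function name `top_topics`, the 1-based topic numbering, and the toy probability vector are all assumptions made for illustration.

```python
import numpy as np

def top_topics(theta, k=3):
    """Return the k most probable topics as (topic_number, probability) pairs.

    theta: 1-D sequence of topic probabilities for one treatment.
    Topic numbers are 1-based here (an assumption, chosen to resemble the table).
    """
    theta = np.asarray(theta, dtype=float)
    # Sort indices by descending probability; a stable sort keeps ties in index order.
    order = np.argsort(-theta, kind="stable")[:k]
    return [(int(i) + 1, float(theta[i])) for i in order]

# Toy distribution over 5 hypothetical topics (not data from the paper).
theta = [0.05, 0.40, 0.10, 0.30, 0.15]
print(top_topics(theta))  # [(2, 0.4), (4, 0.3), (5, 0.15)]
```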
Functional annotation of KEGG pathways on latent biological process 161 under model 1.
| rno00480:Glutathione metabolism | 8 | 1.55E-05 | 1.65E-08 | GPX2, GSR, GCLC, G6PD, GSTA5, GCLM, GSTP1, MGST2 |
| rno00980:Metabolism of xenobiotics by cytochrome P450 | 7 | 0.00142 | 1.51E-06 | GSTA5, ADH4, UGT2B1, EPHX1, CYP3A9, GSTP1, MGST2 |
| rno00982:Drug metabolism | 7 | 0.00420 | 4.47E-06 | GSTA5, ADH4, UGT2B1, AOX1, CYP3A9, GSTP1, MGST2 |
Functional annotation was performed with the online database DAVID. Only the top three annotated KEGG pathway terms are shown here. For the full table, see the Supplementary Material.
The probability of latent biological processes for acetaminophen, bromobenzene, chlormezanone, coumarin, methimazole, and ticlopidine under model 2.
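| Drug ID | Drug | Topic 1 | Probability | Topic 2 | Probability | Topic 3 | Probability |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | Acetaminophen | **92** | 0.201 | 17 | 0.190 | 1 | 0.118 |
| 16 | Bromobenzene | **92** | 0.318 | 1 | 0.138 | 17 | 0.125 |
| 27 | Chlormezanone | 9 | 0.341 | **92** | 0.192 | 1 | 0.128 |
| 37 | Coumarin | 98 | 0.293 | **92** | 0.193 | 1 | 0.142 |
| 81 | Methimazole | **92** | 0.211 | 21 | 0.185 | 32 | 0.143 |
| 123 | Ticlopidine | 9 | 0.248 | **92** | 0.093 | 1 | 0.089 |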
Again, only the top three latent processes for each drug are shown. For the full table, see the Supplementary Material.
Topic 92 (in bold) is significantly associated with glutathione metabolism.
Functional annotation of KEGG pathways on latent biological process 92 under model 2.
| rno00480:Glutathione metabolism | 11 | 5.67E-07 | 5.18E-10 | GSTM1, GPX2, GSR, GCLC, GSTM4, G6PD, GSTA5, GSTT1, GCLM, GSTP1, GSTM7, MGST2 |
| rno00980:Metabolism of xenobiotics by cytochrome P450 | 9 | 9.31E-04 | 8.51E-07 | GSTM1, GSTM4, GSTA5, ADH4, UGT2B1, EPHX1, GSTT1, GSTP1, GSTM7, MGST2 |
| rno00982:Drug metabolism | 9 | 0.00384 | 3.51E-06 | GSTM1, GSTM4, GSTA5, ADH4, UGT2B1, AOX1, GSTT1, GSTP1, GSTM7, MGST2 |
Functional annotation was performed with the online database DAVID. Only the top three annotated KEGG pathway terms are shown here. For the full table, see the Supplementary Material.
Most similar drugs to acetaminophen based on sKL scores.
| Drug | sKL score |
| --- | --- |
| Bromobenzene | 3.04238 |
| Phenacetin | 4.47157 |
| Bucetin | 4.51243 |
| Cimetidine | 5.46445 |
| Disopyramide | 5.85482 |
| Cephalothin | 5.89109 |
| Papaverine | 5.92761 |
| Erythromycin ethylsuccinate | 5.92976 |
| Coumarin | 6.03134 |
| Nitrofurantoin | 6.03479 |
The smaller the sKL score, the more similar the two drugs are. Only the 10 top-ranked drugs are shown here. For the full table, see the Supplementary Material.
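A ranking of this kind can be sketched with a symmetrized Kullback-Leibler divergence between per-drug topic distributions. This record does not give the exact sKL formula the authors used, so the version below, KL(p||q) + KL(q||p), is an assumption (some definitions halve the sum). The smoothing constant `eps`, the helper names, and the three toy drug profiles are likewise hypothetical.

```python
import numpy as np

def skl(p, q, eps=1e-12):
    """Symmetrized KL divergence KL(p||q) + KL(q||p) for discrete distributions.

    eps is a hypothetical smoothing constant that avoids log(0).
    """
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p /= p.sum()  # renormalize after smoothing
    q /= q.sum()
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return float(np.sum((p - q) * np.log(p / q)))

def rank_by_similarity(query, profiles):
    """Rank all other drugs by ascending sKL to the query drug (smaller = more similar)."""
    scores = {name: skl(profiles[query], theta)
              for name, theta in profiles.items() if name != query}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy topic distributions for three hypothetical drugs (not data from the paper).
profiles = {
    "drug_A": [0.7, 0.2, 0.1],
    "drug_B": [0.6, 0.3, 0.1],
    "drug_C": [0.1, 0.2, 0.7],
}
ranking = rank_by_similarity("drug_A", profiles)
print(ranking)  # drug_B is listed before drug_C
```

Because the score is symmetric, the distance from drug A to drug B equals the distance from B to A, which a plain KL divergence would not guarantee.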
Generalized linear models for sKL scores under three (Adjusted R-square, AIC, and BIC) criteria, with best outcomes bolded.
| Model 3 | 0.437 | 0.076 | 93.771 | 117.212 | 106.909 | 121.591 |
| Acetaminophen | 0.453 | 0.051 | 216.462 | 246.815 | 229.600 | 251.194 |
| Coumarin | 0.583 | 0.016 | 258.487 | 296.490 | 273.814 | 300.869 |
| Benzbromarone | 0.813 | 0.004 | 225.281 | 340.736 | 240.609 | 345.115 |
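For reference, the three criteria named in the caption can be computed as follows for a linear model with Gaussian errors and an identity link. This is a sketch under stated assumptions: the record does not say which family or link the authors used, the parameter count `k` (coefficients plus intercept plus error variance) follows one common convention, and the synthetic data are invented for illustration. A higher adjusted R-square is better, while a lower AIC or BIC is better.

```python
import numpy as np

def fit_criteria(X, y):
    """Least-squares fit with intercept; returns adjusted R-square, AIC and BIC.

    Assumes Gaussian errors (an assumption, not taken from the source).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape                      # n observations, p predictors
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    rss = float(resid @ resid)          # residual sum of squares
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    # Gaussian log-likelihood evaluated at the maximum-likelihood estimates.
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1.0)
    k = p + 2                           # p slopes + intercept + error variance
    return {"adj_r2": adj_r2,
            "aic": 2 * k - 2 * loglik,
            "bic": k * np.log(n) - 2 * loglik}

# Synthetic data (hypothetical): y depends on the first predictor only.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=50)

small = fit_criteria(X[:, :1], y)       # model with the informative predictor
full = fit_criteria(X, y)               # model with both predictors
print(small)
print(full)
```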