Yuhua Su, Lei Zhu, Alan Menius, Jason Osborne.
Abstract
Cross-species research in drug development is novel and challenging. A bivariate mixture model that uses information across two species was proposed to address the fundamental problem of identifying differentially expressed genes in microarray experiments, with the aim of improving the understanding of translation between preclinical and clinical studies in drug development. The proposed approach models the joint distribution of treatment effects estimated from independent linear models. The mixture model posits up to nine components, four of which are groups in which genes are differentially expressed in both species. A comprehensive simulation study of model performance and an application to a real-world data set, a mouse and human type II diabetes experiment, suggest that the proposed model, though highly structured, can handle various configurations of differential gene expression and is practically useful for identifying differentially expressed genes, especially when the magnitude of differential expression due to treatment is weak. In the mouse and human application, the proposed mixture model eliminated unimportant genes and identified a list of genes that were differentially expressed in both species and could be potential gene targets for drug development.
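The abstract states that treatment effects are estimated from independent linear models in each species. As a minimal sketch, assuming the simplest such model (a per-gene regression of log expression on a 0/1 treatment indicator, fitted separately in each species), the slope estimate equals the difference in group means and its t statistic uses the residual variance. The function `treatment_effect` and the toy values are hypothetical; the paper's actual design matrices may include further covariates.

```python
import math
from typing import Sequence, Tuple


def treatment_effect(y: Sequence[float], treated: Sequence[int]) -> Tuple[float, float]:
    """Fit y = b0 + b1 * treated by ordinary least squares for one gene.

    Returns (b1_hat, t_statistic). With a single 0/1 indicator, b1_hat is the
    difference between the treated and control group means.
    """
    if len(y) != len(treated):
        raise ValueError("y and treated must have the same length")
    n = len(y)
    x_bar = sum(treated) / n
    y_bar = sum(y) / n
    sxx = sum((x - x_bar) ** 2 for x in treated)
    sxy = sum((x - x_bar) * (v - y_bar) for x, v in zip(treated, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual variance on n - 2 degrees of freedom (intercept + slope).
    rss = sum((v - (b0 + b1 * x)) ** 2 for x, v in zip(treated, y))
    sigma2 = rss / (n - 2)
    se_b1 = math.sqrt(sigma2 / sxx)
    return b1, b1 / se_b1


# Hypothetical log-expression values for one ortholog, fitted separately per species.
mouse = treatment_effect([5.1, 5.3, 4.9, 6.0, 6.2, 5.8], [0, 0, 0, 1, 1, 1])
human = treatment_effect([7.0, 7.2, 6.8, 7.1, 6.9, 7.0], [0, 0, 0, 1, 1, 1])
pair = (mouse[0], human[0])  # the bivariate observation passed to the mixture model
print(pair)
```

Fitting the two species independently is what makes each gene's pair of estimates a single bivariate data point; the mixture then models how those pairs cluster.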
Year: 2014 PMID: 25085578 PMCID: PMC4135333 DOI: 10.1186/1479-7364-8-12
Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639
Possible categories of
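| Category | Status (species 1, species 2) | Direction of effect (species 1, species 2) | |
|---|---|---|---|
| 0 | (NDE,NDE) | (0,0) | 0 |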
| 1 | (pDE,pDE) | (+,+) | |
| 2 | (nDE,nDE) | (-,-) | |
| 3 | (pDE,nDE) | (+,-) | |
| 4 | (nDE,pDE) | (-,+) | |
| 5 | (NDE,pDE) | (0,+) | 0 |
| 6 | (NDE,nDE) | (0,-) | 0 |
| 7 | (pDE,NDE) | (+,0) | 0 |
| 8 | (nDE,NDE) | (-,0) | 0 |
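The nine categories can be read as components of a bivariate mixture over each gene's pair of estimated treatment effects. A minimal sketch, assuming each component is bivariate normal with a zero mean in any species marked NDE and zero correlation whenever either species is NDE (our reading of the table's last column), computes posterior category probabilities by Bayes' rule and assigns each gene to its most probable category. All parameter values in `COMPONENTS` are hypothetical placeholders, not the paper's estimates.

```python
import math
from typing import Dict, Tuple

# Hypothetical component parameters: (weight, mu1, mu2, sd1, sd2, rho).
# Category numbering follows the table: 0 = (NDE,NDE), 1 = (pDE,pDE), ..., 8 = (nDE,NDE).
Params = Tuple[float, float, float, float, float, float]
COMPONENTS: Dict[int, Params] = {
    0: (0.80, 0.0, 0.0, 0.2, 0.2, 0.0),
    1: (0.03, 1.5, 1.5, 0.3, 0.3, 0.3),
    2: (0.03, -1.5, -1.5, 0.3, 0.3, 0.3),
    3: (0.01, 1.5, -1.5, 0.3, 0.3, -0.3),
    4: (0.01, -1.5, 1.5, 0.3, 0.3, -0.3),
    5: (0.03, 0.0, 1.5, 0.2, 0.3, 0.0),
    6: (0.03, 0.0, -1.5, 0.2, 0.3, 0.0),
    7: (0.03, 1.5, 0.0, 0.3, 0.2, 0.0),
    8: (0.03, -1.5, 0.0, 0.3, 0.2, 0.0),
}


def log_bvn(x1: float, x2: float, mu1: float, mu2: float,
            s1: float, s2: float, rho: float) -> float:
    """Log density of a bivariate normal with correlation rho."""
    z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
    one_minus = 1.0 - rho * rho
    q = (z1 * z1 - 2.0 * rho * z1 * z2 + z2 * z2) / one_minus
    return -0.5 * q - math.log(2.0 * math.pi * s1 * s2 * math.sqrt(one_minus))


def posterior(x1: float, x2: float,
              components: Dict[int, Params] = COMPONENTS) -> Dict[int, float]:
    """Posterior probability of each category, normalized with log-sum-exp for stability."""
    logs = {
        k: math.log(w) + log_bvn(x1, x2, m1, m2, s1, s2, r)
        for k, (w, m1, m2, s1, s2, r) in components.items()
    }
    top = max(logs.values())
    unnorm = {k: math.exp(v - top) for k, v in logs.items()}
    total = sum(unnorm.values())
    return {k: v / total for k, v in unnorm.items()}


def classify(x1: float, x2: float, components: Dict[int, Params] = COMPONENTS) -> int:
    """Assign the category with the largest posterior probability."""
    post = posterior(x1, x2, components)
    return max(post, key=post.get)


print(classify(1.4, 1.6), classify(0.05, -0.02), classify(-1.5, 0.1))
```

Working on the log scale avoids underflow when a gene sits far from every component mean.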
Combination of parameters for simulation studies case I and case II
| Case I | sim1 | sim2 | sim3 | sim4 | sim5 | sim6 | sim7 | sim8 |
|---|---|---|---|---|---|---|---|---|
| | 10 | 10 | 10 | 10 | 100 | 100 | 100 | 100 |
| | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| | 0.25 | 0.75 | 0.25 | 0.75 | 0.25 | 0.75 | 0.25 | 0.75 |
| | 0.25 | 0.75 | 0.25 | 0.75 | 0.25 | 0.75 | 0.25 | 0.75 |
| | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 | 0.1 |
| | 0.1 | 0.1 | 0.3 | 0.3 | 0.1 | 0.1 | 0.3 | 0.3 |
| Case II | sim9 | sim10 | sim11 | sim12 | sim13 | sim14 | sim15 | sim16 |
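Reading the eight parameter columns in order as the eight simulation settings, the rows that vary (first row: 10 or 100; third and fourth rows: 0.25 or 0.75; sixth row: 0.1 or 0.3) cross to form a full 2 × 2 × 2 factorial, consistent with the three main effects, three two-way interactions, and one three-way interaction, each on 1 degree of freedom, in the ANOVA table. The sketch below reproduces that column order. The row names were lost in extraction, so the keys `row1`, `row3_row4`, and `row6` are hypothetical labels, and which simulation factor each row encodes is left unstated.

```python
from itertools import product

# Levels read off the parameter table; the factor each row encodes is not recoverable.
ROW1_LEVELS = (10, 100)      # slowest-varying: first four columns vs last four
ROW6_LEVELS = (0.1, 0.3)     # middle: changes every two columns
ROW3_LEVELS = (0.25, 0.75)   # fastest-varying: alternates every column (rows 3 and 4 agree)

# itertools.product varies its last argument fastest, which matches the column order.
design = [
    {"sim": f"sim{i}", "row1": a, "row3_row4": c, "row6": b}
    for i, (a, b, c) in enumerate(product(ROW1_LEVELS, ROW6_LEVELS, ROW3_LEVELS), start=1)
]

for setting in design:
    print(setting)
```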
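Number of genes selected by (a) the bivariate mixture model and (b) the conventional one-species approach
| Simulation | (a) Genes selected | (a) Observed FDR | (a) Genes (observed FDR) | (b) Genes selected | (b) Observed FDR | (b) Genes (observed FDR) |
|---|---|---|---|---|---|---|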
| sim1 | 132 | 0.034 | 100(0.070) | 129 | 0.029 | 96(0.063) |
| sim2 | 224 | 0.003 | 145(0.021) | 223 | 0.012 | 188(0.016) |
| sim3 | 113 | 0.246 | 85(0.318) | 135 | 0.048 | 109(0.073) |
| sim4 | 166 | 0.050 | 115(0.043) | 222 | 0.012 | 188(0.016) |
| sim5 | 132 | 0.028 | 98(0.041) | 234 | 0.004 | 238(0.012) |
| sim6 | 227 | 0.011 | 194(0.021) | 289 | 0.003 | 289(0.007) |
| sim7 | 112 | 0.235 | 78(0.282) | 241 | 0.011 | 246(0.020) |
| sim8 | 167 | 0.048 | 124(0.065) | 288 | 0.002 | 288(0.007) |
| Tukey’s HSD | 30.908 | 0.021 | | 52.958 | 0.016 | |
| sim9 | 92 | 0.126 | 81(0.296) | 88 | 0.129 | 75(0.307) |
| sim10 | 118 | 0.033 | 104(0.048) | 118 | 0.036 | 99(0.061) |
| sim11 | 141 | 0.470 | 109(0.670) | 80 | 0.128 | 70(0.214) |
| sim12 | 103 | 0.116 | 68(0.118) | 118 | 0.049 | 102(0.088) |
| sim13 | 84 | 0.140 | 67(0.169) | 180 | 0.349 | 167(0.407) |
| sim14 | 120 | 0.023 | 88(0.023) | 152 | 0.022 | 151(0.101) |
| sim15 | 119 | 0.480 | 99(0.636) | 168 | 0.358 | 157(0.369) |
| sim16 | 96 | 0.093 | 57(0.105) | 147 | 0.020 | 148(0.061) |
| Tukey’s HSD | 3.947 | 0.011 | | 2.337 | 0.004 | |
Results under simulation studies Case I and Case II, averaged over the 500 simulated datasets. Numbers in parentheses are the observed FDRs. Tukey’s HSD at α = 0.05 is given beneath each set of eight simulation cases.
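Each observed FDR in the table is an average over the 500 simulated datasets. For a single dataset, the observed false discovery proportion is the number of selected genes that are truly not differentially expressed divided by the number selected. A minimal sketch with hypothetical gene IDs; treating an empty selection as a proportion of 0 is a common convention and our assumption here.

```python
from statistics import mean
from typing import Iterable, Set, Tuple

Run = Tuple[Set[str], Set[str]]  # (selected genes, truly DE genes) for one dataset


def observed_fdp(selected: Set[str], truly_de: Set[str]) -> float:
    """False discovery proportion for one simulated dataset."""
    if not selected:
        return 0.0  # convention: no selections means no false discoveries
    false_discoveries = selected - truly_de
    return len(false_discoveries) / len(selected)


def average_fdr(runs: Iterable[Run]) -> float:
    """Average the per-dataset proportions across simulated datasets."""
    return mean(observed_fdp(sel, truth) for sel, truth in runs)


# Two hypothetical simulated datasets.
runs = [
    ({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"}),  # 1 of 4 selections is false
    ({"g1", "g5"}, {"g1", "g5", "g7"}),               # no false selections
]
print(average_fdr(runs))  # 0.125
```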
ANOVA table to quantify variability
| Case | Source of variation | df | (a) Bivariate mixture model | (b) One-species approach |
|---|---|---|---|---|
| Case I | Replicates | 1 | 1,141(0.006) | 7,319,401(0.408) ∗ |
| | Mean magnitude | 1 | 5,415,341(11.560) ∗ | 4,982,736(0.252) ∗ |
| | Array noise | 1 | 1,561,198(15.950) ∗ | 6,353(0.042) |
| | Replicates × Mean magnitude | 1 | 1,197(0.032) | 392,099(0.122) ∗ |
| | Replicates × Array noise | 1 | 697(0.012) | 351(0.011) |
| | Mean magnitude × Array noise | 1 | 400,720(7.021) ∗ | 16,601(0.043) ∗ |
| | Replicates × Mean magnitude × Array noise | 1 | 145(0.002) | 214(0.010) |
| | Error | 3,992 | 1,689,667(12.887) | 592,184(1.817) |
| | Total | 3,999 | 9,070,107(47.471) | 13,309,941(2.706) |
| Case II | Replicates | 1 | 73,917(0.006) | 3,676,664(2.888) |
| | Mean magnitude | 1 | 6(56.346) | 22,274(2.830) |
| | Array noise | 1 | 130,794(43.770) ∗ | 865,448(0.001) ∗ |
| | Replicates × Mean magnitude | 1 | 37,277(0.190) | 36,778(1.038) |
| | Replicates × Array noise | 1 | 34,404(0.015) | 5,065(0.049) |
| | Mean magnitude × Array noise | 1 | 929,915(17.537) | 19,128(0.043) |
| | Replicates × Mean magnitude × Array noise | 1 | 1,467(0.004) | 58(0.000) |
| | Error | 3,992 | 103,604,971(47.182) | 304,161,196(27.427) |
| | Total | 3,999 | 104,812,751(165.052) | 308,786,610(34.276) |
ANOVA was performed separately for simulation studies Case I and Case II to quantify the variability among the results (gene counts and observed FDRs). Each cell gives the sum of squares for gene counts, with the sum of squares for the observed FDRs in parentheses. For gene counts, * marks the sources of variability declared significant at α = 0.05.
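Because every source has 1 degree of freedom and the error term has 3,992, each significance mark can be checked from the sums of squares as F = SS_source / (SS_error / 3,992). The sketch below does this for the first data column of Case I (gene counts). It uses the large-denominator approximation F(1, ν) ≈ χ²₁, so p ≈ erfc(√(F/2)), instead of the exact F tail (`scipy.stats.f.sf` would give that); with ν = 3,992 the difference is negligible at α = 0.05.

```python
import math

# Case I, first data column: sums of squares for gene counts, taken from the ANOVA table.
SS = {
    "Replicates": 1_141,
    "Mean magnitude": 5_415_341,
    "Array noise": 1_561_198,
    "Replicates x Mean magnitude": 1_197,
    "Replicates x Array noise": 697,
    "Mean magnitude x Array noise": 400_720,
    "Replicates x Mean magnitude x Array noise": 145,
}
SS_ERROR, DF_ERROR = 1_689_667, 3_992


def f_test(ss_source: float) -> tuple:
    """Return (F, approximate p) for a 1-df source tested against the error mean square."""
    mse = SS_ERROR / DF_ERROR
    f_stat = ss_source / mse
    # With 1 numerator df and a large error df, F is approximately chi-square(1),
    # so P(F > f) ~= P(|Z| > sqrt(f)) = erfc(sqrt(f / 2)).
    p_approx = math.erfc(math.sqrt(f_stat / 2))
    return f_stat, p_approx


for source, ss in SS.items():
    f_stat, p = f_test(ss)
    print(f"{source:45s} F = {f_stat:10.2f}  p ~ {p:.3g}  {'*' if p < 0.05 else ''}")
```

Running it marks exactly the three sources starred in that column (mean magnitude, array noise, and their interaction).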
Parameter estimates of the bivariate mixture model
| Category | Proportion | Mean, species 1 | Mean, species 2 | | | |
|---|---|---|---|---|---|---|
| 0 | 0.889(0.013) | NE | NE | 0.117(0.002) | NE | 0.052(0.001) |
| 1 | 0.002(0.001) | 1.440(0.190) | 0.139(0.203) | 0.112(0.064) | -0.070(0.040) | 0.154(0.061) |
| 2 | 0.001(0.001) | -1.174(0.228) | -0.853(0.128) | 0.156(0.132) | 0.033(0.053) | 0.048(0.039) |
| 3 | 0.000(0.001) | 2.104(0.319) | -1.231(0.361) | 0.006(0.124) | -0.005(0.078) | 0.003(0.111) |
| 4 | 0.002(0.001) | -1.047(0.081) | 0.660(0.128) | 0.027(0.018) | -0.005(0.023) | 0.087(0.047) |
| 5 | 0.021(0.001) | NE | 0.669(0.003) | 0.144(0.000) | NE | 0.050(0.006) |
| 6 | 0.049(0.002) | NE | -0.787(0.001) | 0.149(0.000) | NE | 0.055(0.004) |
| 7 | 0.003(0.008) | 1.528(0.310) | NE | 0.543(0.201) | NE | 0.084(0.022) |
| 8 | 0.034(0.009) | -1.038(0.146) | NE | 0.225(0.051) | NE | 0.055(0.006) |
Bootstrap (B = 1,000) standard errors in parentheses. NE, not estimated.
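The standard errors above come from B = 1,000 bootstrap replicates. A generic sketch of the nonparametric bootstrap, assuming genes are resampled with replacement and the estimator is recomputed on each resample. A simple proportion (the share of genes with a hypothetical category label of 0) stands in for refitting the full mixture model, which is not reproduced here.

```python
import random
import statistics
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_se(
    data: Sequence[T],
    estimator: Callable[[Sequence[T]], float],
    b: int = 1_000,
    seed: int = 0,
) -> float:
    """Standard deviation of the estimator across b resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(b)]
    return statistics.stdev(replicates)


def share_null(labels: Sequence[int]) -> float:
    """Stand-in estimator: proportion of genes labelled category 0."""
    return sum(1 for lab in labels if lab == 0) / len(labels)


# Hypothetical labels for 1,000 genes: 900 in category 0, 100 spread over categories 1-8.
labels = [0] * 900 + [k for k in range(1, 9) for _ in range(12)] + [5] * 4
print(share_null(labels), round(bootstrap_se(labels, share_null), 4))
```

For this stand-in estimator the bootstrap value should land close to the binomial standard error √(0.9 · 0.1 / 1000) ≈ 0.0095, which is a quick sanity check on the resampling loop.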
Figure 1. Scatter plots of the estimated treatment effects before and after gene membership identification. From left to right: all orthologs, orthologs differentially expressed in either species (categories 1 to 8), and orthologs differentially expressed in both species (categories 1 to 4).
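The three panels of Figure 1 are nested subsets defined by each ortholog's assigned category. A small sketch, assuming every ortholog already carries a category label from 0 to 8 (for example, its maximum-posterior category); the function name and gene IDs are hypothetical.

```python
from typing import Dict, List

DE_EITHER = set(range(1, 9))  # categories 1-8: DE in at least one species
DE_BOTH = {1, 2, 3, 4}        # categories 1-4: DE in both species


def panel_subsets(assigned: Dict[str, int]) -> Dict[str, List[str]]:
    """Split orthologs into the three Figure 1 panels by assigned category."""
    return {
        "all": sorted(assigned),
        "either": sorted(g for g, k in assigned.items() if k in DE_EITHER),
        "both": sorted(g for g, k in assigned.items() if k in DE_BOTH),
    }


assigned = {"geneA": 0, "geneB": 1, "geneC": 6, "geneD": 3, "geneE": 8}
panels = panel_subsets(assigned)
print(panels["both"])  # ['geneB', 'geneD']
```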
Figure 2. Histograms of p values from tests of no treatment effect. Left: p value histogram for mice; right: p value histogram for humans.
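Under the null hypothesis of no treatment effect, p values are uniform on [0, 1], so a histogram that is roughly flat apart from excess mass near 0 indicates a subset of genes with real effects. A sketch of the binning step, assuming two-sided tests whose statistics are approximately standard normal (a simplification of the per-species linear-model tests). The statistics are simulated with hypothetical settings, not taken from the study data.

```python
import math
import random
from typing import List, Sequence


def two_sided_p(z: float) -> float:
    """Two-sided p value for a standard-normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def histogram(pvalues: Sequence[float], bins: int = 20) -> List[int]:
    """Counts of p values in equal-width bins on [0, 1]; p = 1 goes in the last bin."""
    counts = [0] * bins
    for p in pvalues:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts


rng = random.Random(1)
# Simulated statistics: 900 null genes and 100 genes with a shifted mean.
z_stats = [rng.gauss(0.0, 1.0) for _ in range(900)] + [rng.gauss(3.0, 1.0) for _ in range(100)]
counts = histogram([two_sided_p(z) for z in z_stats])
print(counts)
```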