Nicholas P Tatonetti, Joel T Dudley, Hersh Sagreiya, Atul J Butte, Russ B Altman.
Abstract
BACKGROUND: A key challenge in pharmacogenomics is the identification of genes whose variants contribute to drug response phenotypes, which can include severe adverse effects. Pharmacogenomics GWAS attempt to elucidate genotypes predictive of drug response. However, the size of these studies has severely limited their power and potential application. We propose a novel knowledge integration and SNP aggregation approach for identifying genes impacting drug response. Our SNP aggregation method characterizes the degree to which uncommon alleles of a gene are associated with drug response. We first use pre-existing knowledge sources to rank pharmacogenes by their likelihood of affecting drug response. We then define a summary score for each gene based on allele frequencies and train linear and logistic regression classifiers to predict drug response phenotypes.
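The abstract does not give the formula for the per-gene summary score, so the following is only a minimal sketch of one way such a score could be built. It assumes a rarity-weighted burden in which each minor allele a patient carries contributes the negative log of its cohort frequency. The function name `gene_score`, the weighting, and the example frequencies are all hypothetical and are not taken from the paper.

```python
import math

def gene_score(genotypes, minor_allele_freqs):
    """Rarity-weighted burden of minor alleles across one gene's SNPs.

    genotypes: minor-allele counts (0, 1, or 2) for one patient, one per SNP.
    minor_allele_freqs: cohort minor-allele frequency of each SNP, in (0, 1).
    The -log(frequency) weighting is an illustrative assumption.
    """
    return sum(g * -math.log(f) for g, f in zip(genotypes, minor_allele_freqs))

# Hypothetical three-SNP gene: rarer alleles contribute more to the score.
freqs = [0.5, 0.1, 0.01]
common_carrier = gene_score([2, 0, 0], freqs)  # two copies of a common allele
rare_carrier = gene_score([0, 0, 1], freqs)    # one copy of a rare allele
print(common_carrier, rare_carrier)
```

Under this weighting a single copy of the rare allele outweighs two copies of the common one, which matches the stated aim of emphasizing uncommon alleles. One score per gene per patient could then serve as a feature for the regression models.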
Year: 2010 PMID: 21044367 PMCID: PMC2967750 DOI: 10.1186/1471-2105-11-S9-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Knowledge filtered significant SNPs
| SNP | Gene | Name | P-Value |
|---|---|---|---|
| rs4086116 | CYP2C9 | cytochrome P450, family 2, subfamily C | 7.54E-05 |
| rs4917639 | CYP2C9 | cytochrome P450, family 2, subfamily C | 9.47E-05 |
| rs9332169 | CYP2C9 | cytochrome P450, family 2, subfamily C | 2.33E-04 |
| rs9332214 | CYP2C9 | cytochrome P450, family 2, subfamily C | 2.33E-04 |
| rs10509680 | CYP2C9 | cytochrome P450, family 2, subfamily C | 2.33E-04 |
| rs12445568 | HSD3B7 | 3-beta-HSD VII | 3.84E-04 |
| rs12357515 | EXOC6 | Exocyst complex component Sec15A | 3.86E-04 |
| rs11187215 | EXOC6 | Exocyst complex component Sec15A | 3.87E-04 |
| rs7294 | VKORC1 | Vitamin K1 2,3-epoxide reductase subunit 1 | 4.15E-04 |
Top 10 of 3,856 SNPs fitting a univariate linear regression model of warfarin dose for 181 patients. SNPs in bold are significant after correcting for multiple hypothesis testing. Only rs10871454 (a SNP in 100% LD with VKORC1) was significant after this correction.
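The caption does not name the correction method. As a rough check, and assuming a Bonferroni correction at a family-wise level of 0.05 (both are assumptions), the per-SNP threshold for 3,856 tests can be compared against the p-values listed in the table above:

```python
# P-values copied from the table above, keyed by SNP.
p_values = {
    "rs4086116": 7.54e-05,
    "rs4917639": 9.47e-05,
    "rs9332169": 2.33e-04,
    "rs9332214": 2.33e-04,
    "rs10509680": 2.33e-04,
    "rs12445568": 3.84e-04,
    "rs12357515": 3.86e-04,
    "rs11187215": 3.87e-04,
    "rs7294": 4.15e-04,
}

n_tests = 3856               # number of SNPs tested, from the caption
alpha = 0.05                 # assumed family-wise error rate
threshold = alpha / n_tests  # Bonferroni threshold, about 1.3e-05

significant = [snp for snp, p in p_values.items() if p <= threshold]
print(f"threshold = {threshold:.2e}, significant = {significant}")
```

None of the listed SNPs falls below this threshold, which is consistent with the caption's statement that only rs10871454 survived correction.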
Univariate gene linear regression
| Gene | Name | P-Value |
|---|---|---|
| NSUN6 | NOL1/NOP2/Sun domain family 6 | 1.07E-02 |
| UBE3A | E6AP ubiquitin-protein ligase | 1.35E-02 |
| BRF1 | B - related factor 1 | 1.39E-02 |
| QTRTD1 | queuine tRNA-ribosyltransferase domain containing 1 | 1.54E-02 |
| F8 | Procoagulant component | 2.40E-02 |
| BAT5 | HLA-B associated transcript 5 | 3.02E-02 |
| COL1A2 | Alpha-2 type I collagen | 3.23E-02 |
| RCN2 | E6-binding protein | 3.27E-02 |
Genes with candidate gene-scores that significantly predict dose in a univariate linear regression model (p≤0.05). Genes in bold are significant after correcting for multiple hypothesis testing.
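For readers unfamiliar with the test behind this table, here is a minimal sketch of a univariate least-squares fit of dose on a single gene-score, with the t-statistic for the slope. The function name `univariate_fit` and the toy data are hypothetical. Converting the statistic into a p-value requires the t distribution with n - 2 degrees of freedom, which is left out here.

```python
import math

def univariate_fit(x, y):
    """Ordinary least squares of y on x; returns (slope, intercept, t_statistic).

    The t-statistic tests slope == 0 and has n - 2 degrees of freedom.
    Assumes x and y are not perfectly correlated.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / math.sqrt(sxx * syy)            # Pearson correlation
    t = r * math.sqrt((n - 2) / (1 - r * r))  # equivalent to slope / SE(slope)
    return slope, intercept, t

# Toy data: hypothetical gene-scores (x) and doses (y) for four patients.
slope, intercept, t = univariate_fit([0, 1, 2, 3], [1, 3, 2, 4])
print(slope, intercept, t)
```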
Low/not low model
| Gene | Name | P-Value |
|---|---|---|
| SLA2 | Src-like adapter protein 2 | 2.11E-03 |
| DICER1 | Dicer1, Dcr-1 homolog | 3.76E-03 |
| CYP2C9 | cytochrome P450, family 2, subfamily C | 5.49E-03 |
| SLC22A1 | solute carrier family 22 | 1.12E-02 |
| BBC3 | BCL2 binding component 3 | 1.49E-02 |
| F8 | Procoagulant component | 1.74E-02 |
| HMOX2 | heme oxygenase (decycling) 2 | 1.79E-02 |
| MUTED | muted homolog | 1.99E-02 |
| VWF | coagulation factor VIII VWF | 2.40E-02 |
| HSD3B7 | 3-beta-HSD VII | 2.56E-02 |
| SPIN1 | spindlin 1 | 2.58E-02 |
| SELPLG | selectin P ligand | 2.89E-02 |
| FAM113B | family with sequence similarity 113, member B | 2.92E-02 |
| F13B | Fibrin-stabilizing factor B subunit | 3.36E-02 |
| MVP | major vault protein | 3.39E-02 |
| UGT2B7 | UDP glucuronosyltransferase 2B7 | 3.83E-02 |
| AKR7A2 | aflatoxin beta1 aldehyde reductase | 4.17E-02 |
| SERTAD1 | CDK4-binding protein p34SEI | 4.96E-02 |
Top 20 genes with candidate gene-scores that significantly distinguish between low-dose patients and non-low-dose patients (p≤0.05). P-values are the result of a t-test between the low-dose and non-low-dose distributions of gene-scores. Genes in bold are significant after correcting for multiple hypothesis testing.
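The caption does not say which variant of the t-test was used. The sketch below assumes Welch's unequal-variance form and computes the statistic and its approximate degrees of freedom for two groups of gene-scores. The function name `welch_t` and the toy values are hypothetical.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy gene-scores for a hypothetical low-dose and non-low-dose group.
low_dose = [2.0, 4.0, 6.0]
not_low_dose = [1.0, 2.0, 3.0]
t, df = welch_t(low_dose, not_low_dose)
print(t, df)
```

The p-value follows from the t distribution with `df` degrees of freedom. In practice a statistics library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) returns it directly.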
Figure 1. Low Dose Classification ROC Curve. Receiver Operating Characteristic Curve for the low dose classification algorithms. Two classifiers were trained: the first (dotted line) on all 20 genes for which the gene-scores significantly distinguish low-dose and non-low-dose patients (AUROC=0.886, p≤0.05, Table 3), and the second (dashed line) on only those genes that were significant after multiple hypothesis testing correction (AUROC=0.721, p≤0.001, Table 3). Both classifiers have empirical p-value significance of less than 0.01 when tested using bootstrapping.
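The caption reports AUROC values and an empirical p-value but does not describe the resampling procedure. The sketch below computes AUROC from its pairwise definition and estimates an empirical p-value by shuffling the class labels. Label shuffling is a substitute technique chosen for illustration and may differ from the authors' bootstrapping scheme. The function names and toy data are hypothetical.

```python
import random

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. labels are 1 (positive) or 0 (negative).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def empirical_p(scores, labels, n_resamples=1000, seed=0):
    """Fraction of label shuffles whose AUROC is at least the observed AUROC."""
    rng = random.Random(seed)
    observed = auroc(scores, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if auroc(scores, shuffled) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)

# Toy classifier outputs for eight hypothetical patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(auroc(scores, labels), empirical_p(scores, labels))
```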
High/not high model
| Gene | Name | P-Value |
|---|---|---|
| NAT13 | N-acetyltransferase 13 | 5.25E-03 |
| BAT5 | HLA-B associated transcript 5 | 6.76E-03 |
| CRP | C-reactive protein, pentraxin-related | 1.02E-02 |
| COL1A2 | Alpha-2 type I collagen | 1.37E-02 |
| GGCX | Vitamin K gamma glutamyl carboxylase | 1.44E-02 |
| A2M | C3 and PZP-like alpha-2-macroglobulin | 2.06E-02 |
| FBXO28 | F-box protein 28 | 2.14E-02 |
| HSD3B7 | 3-beta-HSD VII | 2.54E-02 |
| ITGA5 | Fibronectin receptor subunit alpha | 4.07E-02 |
| ALS2 | amyotrophic lateral sclerosis 2 | 4.13E-02 |
| NSUN6 | NOL1/NOP2/Sun domain family 6 | 4.21E-02 |
| SORBS3 | sorbin and SH3 domain containing 3 | 4.99E-02 |
Top 15 genes with candidate gene-scores that significantly distinguish between high-dose patients and non-high-dose patients (p≤0.05). P-values are the result of a t-test between the high-dose and non-high-dose distributions of gene-scores. Genes in bold are significant after correcting for multiple hypothesis testing.
Figure 2. High Dose Classification ROC Curve. Receiver Operating Characteristic Curve for the high dose classification algorithms. Two classifiers were trained: the first (dotted line) on all 15 genes for which the gene-scores significantly distinguish high-dose and non-high-dose patients (AUROC=0.764, p≤0.05, Table 3), and the second (dashed line) on only those genes that were significant after multiple hypothesis testing correction (AUROC=0.693, p≤0.001, Table 3). Both classifiers have empirical p-value significance of less than 0.01 when tested using bootstrapping.