Abstract
Expression Quantitative Trait Locus (eQTL) analysis is a powerful tool for studying the biological mechanisms that link genotype to gene expression. Such analyses can identify genomic locations where genotypic variants influence the expression of genes, both in close proximity to the variant (cis-eQTL) and on other chromosomes (trans-eQTL). Many traditional eQTL methods are based on a linear regression model. In this study, we propose a novel method for identifying eQTL associations using information theory and machine learning approaches. Mutual Information (MI) is used to describe the association between a genetic marker and gene expression. MI can detect both linear and non-linear associations. Moreover, it can capture the heterogeneity of the population. Advanced feature selection methods, Maximum Relevance Minimum Redundancy (mRMR) and Incremental Feature Selection (IFS), were applied to optimize the selection of the genes affected by each genetic marker. When we applied our method to a study of apoE-deficient mice, we found that cis-acting eQTLs are stronger than trans-acting eQTLs, but that there are more trans-acting eQTLs than cis-acting eQTLs. We compared our results (mRMR.eQTL) with those of R/qtl and MatrixEQTL (modelLINEAR and modelANOVA). In female mice, 67.9% of mRMR.eQTL results could be confirmed by at least two other methods, whereas only 14.4% of R/qtl results could be. In male mice, 74.1% of mRMR.eQTL results could be confirmed by at least two other methods, whereas only 18.2% of R/qtl results could be. Our method provides a new way to identify associations between genetic markers and gene expression. Our software is available from the supporting information.
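The claim that MI detects non-linear associations a linear model misses can be illustrated with a minimal sketch. It assumes a plug-in MI estimator in nats and equal-frequency discretization of expression into three bins; both are illustrative choices, since the paper's exact estimator and binning are not given here. The toy data are hypothetical: expression is high for genotypes 0 and 2 and low for genotype 1.

```python
import math
from collections import Counter


def discretize(values, n_bins=3):
    """Equal-frequency binning (an assumed scheme): rank values, split into n_bins groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in nats from two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


# Hypothetical toy data: genotype coded 0/1/2, expression U-shaped in genotype,
# so a straight-line fit finds almost no trend while MI is clearly above zero.
genotype = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [5.0, 5.1, 4.9, 1.0, 1.2, 0.9, 5.2, 4.8, 5.05]
mi = mutual_information(genotype, discretize(expression))
print(f"MI = {mi:.3f} nats")  # prints "MI = 0.674 nats"
```

Discretizing expression keeps the estimator simple; continuous MI estimators (for example, k-nearest-neighbour based ones) are an alternative with different bias and variance trade-offs.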
Year: 2013 PMID: 23825689 PMCID: PMC3692482 DOI: 10.1371/journal.pone.0067899
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. The workflow of mRMR.eQTL.
(A) The input of mRMR.eQTL consists of genotype and gene expression data from the same samples. (B) For each SNP, the SNP status is treated as the class label and the gene expression levels as features. (C) mRMR feature selection ranks the genes by their relevance to the genotype and their redundancy with other genes. (D) Incremental feature selection selects the optimal gene set that best discriminates the genotype status. (E) The eQTL tables are generated from the mRMR and IFS results.
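Steps (C) and (D) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes discretized expression levels, the mRMR "difference" criterion (relevance minus mean redundancy), and a leave-one-out 1-nearest-neighbour classifier as a hypothetical IFS evaluator. The gene names and data are hypothetical.

```python
import math
from collections import Counter


def mi(x, y):
    """Plug-in mutual information (nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features, label):
    """Greedy mRMR ranking: each step adds the gene maximizing
    I(gene; genotype) minus its mean MI with the genes already ranked."""
    remaining, ranked = list(features), []
    while remaining:
        def score(name):
            relevance = mi(features[name], label)
            if not ranked:
                return relevance
            redundancy = sum(mi(features[name], features[s]) for s in ranked) / len(ranked)
            return relevance - redundancy
        best = max(remaining, key=score)
        ranked.append(best)
        remaining.remove(best)
    return ranked


def loo_1nn_accuracy(features, names, label):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier using genes `names`."""
    n, correct = len(label), 0
    for i in range(n):
        nearest = min((j for j in range(n) if j != i),
                      key=lambda j: sum((features[f][i] - features[f][j]) ** 2 for f in names))
        correct += label[nearest] == label[i]
    return correct / n


def ifs(features, ranked, label):
    """Incremental feature selection: try the top-k genes for k = 1..K, keep the best k."""
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranked) + 1):
        acc = loo_1nn_accuracy(features, ranked[:k], label)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc


# Hypothetical data: genotype of one SNP across 8 mice and three discretized genes.
genotype = [0, 0, 0, 0, 1, 1, 1, 1]
genes = {
    "geneA": [0, 0, 0, 0, 1, 1, 1, 1],  # tracks the genotype exactly
    "geneB": [0, 0, 0, 1, 1, 1, 1, 0],  # noisy partial association
    "geneC": [0, 0, 1, 1, 0, 0, 1, 1],  # independent of the genotype
}
ranked = mrmr_rank(genes, genotype)
selected, accuracy = ifs(genes, ranked, genotype)
print(ranked, selected, accuracy)
```

Keeping the smallest k that reaches the best accuracy (strict `>`) is one reasonable tie-breaking choice; it favours compact gene sets per SNP.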
Figure 2. Venn diagrams of the results of mRMR.eQTL, R/qtl, MatrixEQTL.modelLINEAR and MatrixEQTL.modelANOVA in female and male mice.
(A) Female mice. (B) Male mice.
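The Venn diagrams summarize how the methods' eQTL calls overlap, and the abstract's "confirmed by at least two other methods" percentages can be derived from the same overlaps. A minimal sketch, assuming each method's output is a set of (SNP, gene) pairs matched exactly across methods; the method keys mirror the four methods compared, but the pairs are hypothetical toy data, not the paper's results.

```python
def confirmed_fraction(results, method, min_support=2):
    """Fraction of `method`'s eQTL pairs found in at least `min_support` other methods."""
    own = results[method]
    others = [pairs for name, pairs in results.items() if name != method]
    confirmed = sum(1 for pair in own if sum(pair in o for o in others) >= min_support)
    return confirmed / len(own)


# Hypothetical (SNP, gene) calls; names and pairs are illustrative only.
results = {
    "mRMR.eQTL": {("snp1", "g1"), ("snp1", "g2"), ("snp2", "g3"), ("snp3", "g4")},
    "R/qtl": {("snp1", "g1"), ("snp2", "g3"), ("snp9", "g9")},
    "MatrixEQTL.modelLINEAR": {("snp1", "g1"), ("snp1", "g2"), ("snp2", "g3")},
    "MatrixEQTL.modelANOVA": {("snp1", "g2"), ("snp3", "g4")},
}
for name in results:
    print(f"{name}: {confirmed_fraction(results, name):.1%}")
```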
SNPs with significantly more Apoe partners in female mice.
| SNP | Gene located close to the SNP | P value | Number of Apoe partners | Apoe partners |
| --- | --- | --- | --- | --- |
| rs6350987 | Kcna4 | 0.003421984 | 3 | Cat, Cd44, Rbm45 |
| rs13476656 | Gm13803 | 0.005736386 | 3 | Cat, Cd44, Rbm45 |
| rs13476672 | Cd44 | 0.005736386 | 3 | Cat, Cd44, Rbm45 |
| rs3689502 | Gm13803 | 0.005736386 | 3 | Cat, Cd44, Rbm45 |
| rs6246565 | Hsd17b12 | 0.005736386 | 3 | Cat, Cd44, Rbm45 |
| rs13478827 | Gm8992 | 0.030289618 | 2 | Gpnmb, Apobec1 |
SNPs with significantly more Apoe partners in male mice.
| SNP | Gene located close to the SNP | P value | Number of Apoe partners | Apoe partners |
| --- | --- | --- | --- | --- |
| rs13480712 | Hal | 0.007091998 | 12 | Ebp, Npc2, Pla2g2e, Lta4h, Vapb, Enpp1, Irak1, Ncor2, Gla, Ccl24, Cbx3, Hecw1 |
| rs13481811 | BB123696 | 0.007278187 | 3 | 2010111I01Rik, Sptlc1, Nrip1 |
| rs13480667 | Ikbip | 0.008178776 | 9 | Npc2, Ngb, Pla2g2e, Lipg, Lta4h, Enpp1, Tax1bp1, Nr0b2, Il6st |
| rs13481820 | Gm19516 | 0.011111922 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs13481821 | Slc25a48 | 0.011111922 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs3698807 | Gm19516 | 0.011111922 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs8273881 | Slc34a1 | 0.011111922 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs13480704 | Mir135a-2 | 0.014548555 | 7 | Npc2, Pla2g2e, Lipg, Lta4h, Vapb, Enpp1, Il6st |
| rs13480695 | Nr1h4 | 0.014572157 | 11 | Plat, Npc2, Pla2g2e, Lipg, Usp12, Lta4h, Enpp1, Plek, Nr0b2, Apaf1, Il6st |
| rs13481896 | LOC101055640 | 0.015906918 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs13478738 | Cntnap2 | 0.019373037 | 4 | Dfna5, Armc9, Pnlip, Gpnmb |
| rs13481850 | Ercc6l2 | 0.021689728 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
| rs3705446 | Arrdc3 | 0.021689728 | 3 | Stab2, 2010111I01Rik, Sptlc1 |
Figure 3. The network of Apoe partners and their upstream SNPs.
(A) Male mice. (B) Female mice. In both panels, the red node is Apoe, the grey nodes are Apoe partners, and the orange nodes are their upstream SNPs; grey edges are protein-protein interactions, and orange edges are eQTL relationships between SNPs and genes.
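The networks in Figure 3 combine two edge types. A minimal sketch of how such a subnetwork, and the per-SNP "Number of Apoe partners" column in the tables, could be assembled, assuming Apoe partners are the protein-protein interaction neighbours of Apoe. Gene and SNP names are hypothetical placeholders, and the significance test behind the tables' P values is not reproduced.

```python
from collections import defaultdict


def apoe_subnetwork(ppi_edges, eqtl_edges, hub="Apoe"):
    """Return the hub's PPI partners and, for each SNP, the partners it has eQTL edges to."""
    # Grey nodes: genes sharing a PPI edge (grey edge) with the hub.
    partners = {b if a == hub else a for a, b in ppi_edges if hub in (a, b)}
    snp_to_partners = defaultdict(set)
    for snp, gene in eqtl_edges:
        if gene in partners:
            snp_to_partners[snp].add(gene)  # orange edge from an upstream SNP into a grey node
    return partners, dict(snp_to_partners)


# Hypothetical edges.
ppi = [("Apoe", "g1"), ("g2", "Apoe"), ("Apoe", "g3"), ("g4", "g5")]
eqtl = [("snp1", "g1"), ("snp1", "g2"), ("snp1", "g4"), ("snp2", "g3"), ("snp3", "g5")]
partners, snp_to_partners = apoe_subnetwork(ppi, eqtl)
for snp, genes in sorted(snp_to_partners.items(), key=lambda kv: -len(kv[1])):
    print(snp, len(genes), sorted(genes))
```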
Figure 4. The precision-recall curves of our method, R/qtl, and MatrixEQTL (modelLINEAR and modelANOVA).
The red, green, brown, purple lines represent the precision-recall curves of our method, R/qtl, MatrixEQTL_modelLINEAR and MatrixEQTL_modelANOVA, respectively.
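Each curve in Figure 4 is summarized by the area under the precision-recall curve (AUPR). A minimal sketch, assuming each method assigns a score to every candidate (SNP, gene) pair and that a binary reference label exists per pair. The area is computed here as average precision (precision weighted by recall increments), one common convention that may differ from the interpolation the authors used; the scores and labels are hypothetical.

```python
def precision_recall_points(scores, labels):
    """Sweep a threshold down the ranked scores; return (recall, precision) points."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])  # ties keep input order
    tp = fp = 0
    points = []
    for _, label in ranked:
        if label:
            tp += 1
        else:
            fp += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def aupr(scores, labels):
    """Step-wise area: sum of precision times the recall gained at each rank."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in precision_recall_points(scores, labels):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical scores for four candidate pairs; 1 marks a reference-positive pair.
print(aupr([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```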
AUPR comparison of our method, R/qtl, MatrixEQTL modelLINEAR and MatrixEQTL modelANOVA. RAUPR is each method's AUPR relative to that of our method.
| Metric | Our method | R/qtl | MatrixEQTL modelLINEAR | MatrixEQTL modelANOVA |
| --- | --- | --- | --- | --- |
| AUPR | 0.131679926 | 0.12847128 | 0.108051418 | 0.116587322 |
| RAUPR | 1 | 0.975632993 | 0.820561052 | 0.885384173 |
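The RAUPR row is consistent with dividing each method's AUPR by the AUPR of our method (for example, 0.12847128 / 0.131679926 ≈ 0.97563). A short check using the AUPR values from the table:

```python
# AUPR values copied from the table above.
aupr = {
    "Our method": 0.131679926,
    "R/qtl": 0.12847128,
    "MatrixEQTL modelLINEAR": 0.108051418,
    "MatrixEQTL modelANOVA": 0.116587322,
}


def relative_aupr(values, reference):
    """Divide every AUPR by the reference method's AUPR."""
    ref = values[reference]
    return {name: v / ref for name, v in values.items()}


raupr = relative_aupr(aupr, "Our method")
for name, r in raupr.items():
    print(f"{name}: {r:.9f}")
```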