Wei Cheng, Xiang Zhang, Zhishan Guo, Yu Shi, Wei Wang.
Abstract
MOTIVATION: As a promising tool for dissecting the genetic basis of complex traits, expression quantitative trait loci (eQTL) mapping has attracted increasing research interest. An important issue in eQTL mapping is how to effectively integrate networks representing interactions among genetic markers and among genes. Recently, several Lasso-based methods have been proposed to leverage such network information. Despite their success, existing methods share three limitations: (i) a preprocessing step is usually needed to cluster the networks; (ii) the incompleteness of the networks and the noise in them are not considered; and (iii) other available information, such as the locations of genetic markers and pathway information, is not integrated.
Year: 2014 PMID: 24931977 PMCID: PMC4058913 DOI: 10.1093/bioinformatics/btu293
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
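The comparisons below include a plain Lasso baseline. Here is a minimal sketch of that baseline as a multi-output regression of gene expression on SNPs. It is not the authors' implementation, and it rests on three assumptions. First, samples are placed in rows, so X is samples × SNPs and Z is samples × genes. Second, the objective is ½‖Z − XW‖²_F + λ‖W‖₁. Third, the problem is solved by ISTA (proximal gradient), chosen here for simplicity. The function names `soft_threshold` and `lasso_eqtl` are hypothetical.

```python
import numpy as np


def soft_threshold(a, t):
    """Elementwise proximal operator of t * ||.||_1 (shrinks entries toward zero by t)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)


def lasso_eqtl(X, Z, lam, n_iter=1000):
    """Multi-output Lasso solved with ISTA (proximal gradient descent).

    Assumed (hypothetical) shapes:
      X: (n_samples, n_snps) genotype matrix
      Z: (n_samples, n_genes) expression matrix
    Minimizes 0.5 * ||Z - X W||_F^2 + lam * ||W||_1 over W of shape (n_snps, n_genes).
    """
    W = np.zeros((X.shape[1], Z.shape[1]))
    # Step size 1/L, where L = ||X||_2^2 is the Lipschitz constant of the gradient
    L = np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ W - Z)  # gradient of the squared-error term
        W = soft_threshold(W - grad / L, lam / L)
    return W


# Synthetic data: two true SNP-gene associations; the third gene is pure noise
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
W_true = np.zeros((10, 3))
W_true[2, 0], W_true[7, 1] = 1.5, -2.0
Z = X @ W_true + 0.1 * rng.standard_normal((200, 3))
W_hat = lasso_eqtl(X, Z, lam=5.0)
```

The ℓ1 penalty drives most entries of `W_hat` to exactly zero, so each nonzero entry is read as a candidate SNP–gene association. Network-aware methods add further penalty terms to this objective.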
Fig. 1. Examples of prior knowledge: the genetic-interaction network S and the gene–gene interactions represented by a PPI network or a gene co-expression network G. W is the matrix of regression coefficients to be learned.
Summary of notations
| Symbol | Description |
|---|---|
| | Number of SNPs |
| | Number of genes |
| | Number of samples |
| | SNP data matrix |
| | Gene expression data matrix |
| | Low-rank matrix |
| | Input affinity matrix of the genetic-interaction network |
| | Input affinity matrix of the network of traits |
| | Refined affinity matrix of the genetic-interaction network |
| | Refined affinity matrix of the network of traits |
| W | Coefficient matrix to be inferred |
| | Graph regularizer from the genetic-interaction network |
| | Graph regularizer from the PPI network |
| | Non-negative distance measure |
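The notation table lists graph regularizers built from affinity matrices, but their formulas did not survive extraction. One common form of such a regularizer is the Laplacian penalty tr(WᵀLW) with L = D − A. It is assumed here for illustration and is not confirmed as the paper's exact definition. This penalty equals ½ Σᵢⱼ Aᵢⱼ‖wᵢ − wⱼ‖² over the rows wᵢ of W, so it is small when strongly connected SNPs (rows) have similar coefficient profiles. The sketch below checks numerically that the two forms agree. A gene-side penalty would apply the same function to Wᵀ with the trait network. All function names are hypothetical.

```python
import numpy as np


def laplacian(A):
    """Unnormalized graph Laplacian L = D - A of a symmetric affinity matrix A."""
    return np.diag(A.sum(axis=1)) - A


def graph_penalty_trace(W, A):
    """Laplacian penalty tr(W^T L W); the rows of W are the nodes of graph A."""
    return float(np.trace(W.T @ laplacian(A) @ W))


def graph_penalty_pairwise(W, A):
    """Same penalty written as 0.5 * sum_ij A_ij * ||w_i - w_j||^2 over rows w_i of W."""
    diffs = W[:, None, :] - W[None, :, :]  # (n, n, m) pairwise row differences
    return float(0.5 * np.sum(A * np.sum(diffs ** 2, axis=2)))


rng = np.random.default_rng(1)
A = rng.random((5, 5))
A = (A + A.T) / 2                # symmetric, non-negative affinities (e.g. among SNPs)
W = rng.standard_normal((5, 3))  # 5 SNPs x 3 genes
```

Identical rows of W incur zero penalty. This is why such a term encourages interacting SNPs to share associated genes.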
Fig. 2. Ground truth of matrix W and the estimates obtained by different methods. The x-axis represents traits and the y-axis represents SNPs. Normalized absolute values of the regression coefficients are shown; darker color indicates stronger association.
Fig. 3. The ground-truth networks, the prior partial networks and the refined networks.
Fig. 4. Power curves for synthetic data. The left plots show the ROC curves; our model GDL achieves the highest power. The black solid line shows the performance of random guessing. The right plots show the areas under the precision-recall curves (AUCs) of the different methods.
Fig. 5. The areas under the TPR-FPR curve (AUCs) of Lasso, LORS, G-Lasso and GDL. In each panel, we vary the percentage of noise in the prior networks and
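Figs 4 and 5 summarize how well each method ranks the true associations. The sketch below shows one way to compute such an AUC. It assumes, though the captions do not say so, that every SNP–gene pair is scored by |W| and that the nonzero entries of the ground-truth W are the positives. The AUC is computed as the normalized Mann–Whitney U statistic, which equals the area under the ROC (TPR-FPR) curve. Function names are hypothetical.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties get average ranks)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        # Extend j to the end of the block of tied scores starting at i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank of the tie block
        i = j + 1
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return float(u / (n_pos * n_neg))


def association_auc(W_est, W_true):
    """Score each SNP-gene pair by |estimated coefficient|; positives are nonzero true entries."""
    return roc_auc(np.abs(W_est).ravel(), (np.asarray(W_true) != 0).ravel())


print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

An AUC of 0.5 corresponds to the random-guessing line in Fig. 4, and 1.0 means every true association outranks every non-association.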
Pairwise comparison of different models using cis-enrichment and trans-enrichment analysis
Cis-enrichment
| | GDL | G-Lasso | SIOL | Mtlasso2G | Multi-task | Sparse group | LORS | Lasso |
|---|---|---|---|---|---|---|---|---|
| GGDL | 0.0003 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| GDL | – | 0.0009 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| G-Lasso | – | – | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| SIOL | – | – | – | 0.1213 | 0.0331 | 0.0173 | <0.0001 | <0.0001 |
| Mtlasso2G | – | – | – | – | 0.0487 | 0.0132 | <0.0001 | <0.0001 |
| Multi-task | – | – | – | – | – | 0.4563 | 0.4132 | <0.0001 |
| Sparse group | – | – | – | – | – | – | 0.4375 | <0.0001 |
| LORS | – | – | – | – | – | – | – | <0.0001 |

Trans-enrichment
| | GDL | G-Lasso | SIOL | Mtlasso2G | Multi-task | Sparse group | LORS | Lasso |
|---|---|---|---|---|---|---|---|---|
| GGDL | 0.0881 | 0.0119 | 0.0102 | 0.0063 | 0.0006 | 0.0003 | <0.0001 | <0.0001 |
| GDL | – | 0.0481 | 0.0253 | 0.0211 | 0.0176 | 0.0004 | <0.0001 | <0.0001 |
| G-Lasso | – | – | 0.0312 | 0.0253 | 0.0183 | 0.0007 | <0.0001 | <0.0001 |
| SIOL | – | – | – | 0.1976 | 0.1053 | 0.0044 | 0.0005 | <0.0001 |
| Mtlasso2G | – | – | – | – | 0.1785 | 0.0061 | 0.0009 | <0.0001 |
| Multi-task | – | – | – | – | – | 0.0235 | 0.0042 | 0.0011 |
| Sparse group | – | – | – | – | – | – | 0.0075 | 0.0041 |
| LORS | – | – | – | – | – | – | – | 0.2059 |
Fig. 6. The top-1000 significant associations identified by different methods. In each plot, the x-axis represents SNPs and the y-axis represents genes. Both SNPs and genes are ordered by their genomic locations.
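Fig. 6 plots each method's top-1000 associations. The sketch below extracts such a list from an estimated coefficient matrix. It assumes, though the caption does not give the ranking criterion, that associations are ranked by absolute coefficient, with SNPs in rows and genes in columns. `top_k_associations` is a hypothetical helper.

```python
import numpy as np


def top_k_associations(W, k):
    """Return (snp_index, gene_index, coefficient) for the k largest |W| entries, strongest first."""
    flat = np.abs(W).ravel()
    k = min(k, flat.size)
    if k <= 0:
        return []
    idx = np.argpartition(-flat, k - 1)[:k]           # unordered indices of the k largest
    idx = idx[np.argsort(-flat[idx], kind="stable")]  # order them by decreasing |W|
    rows, cols = np.unravel_index(idx, W.shape)
    return [(int(r), int(c), float(W[r, c])) for r, c in zip(rows, cols)]


W = np.array([[0.1, -3.0],
              [2.0, 0.0]])
print(top_k_associations(W, 2))  # [(0, 1, -3.0), (1, 0, 2.0)]
```

`np.argpartition` avoids fully sorting all SNP–gene pairs, which matters when W has millions of entries and only the top 1000 are needed.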
Fig. 7. Ratio of correct interactions in the refined networks when varying κ. The initial input networks contain only 25% correct interactions.
Summary of the top-15 hotspots detected by GGDL
| ID | Sizeᵃ | Lociᵇ | GOᶜ | Hitsᵈ | GDL (all)ᵉ | GDL (hits)ᶠ | G-Lasso (all)ᵍ | G-Lasso (hits)ʰ | SIOL (all)ⁱ | SIOL (hits)ʲ | LORS (all)ᵏ | LORS (hits)ˡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 31 | XII:1056097 | (1)*** | 7 | 31 | 7 | 32 | 7 | 8 | 6 | 31 | 7 |
| 2 | 28 | III:81832..92391 | (2)** | 5 | 29 | 5 | 28 | 5 | 58 | 5 | 22 | 4 |
| 4 | 27 | III:79091 | (2)*** | 6 | 29 | 6 | 28 | 6 | 28 | 7 | 10 | 2 |
| 5 | 27 | III:175799..177850 | (3)* | 3 | 26 | 3 | 23 | 3 | 9 | 2 | 18 | 4 |
| 7 | 25 | III:105042 | (2)*** | 6 | 23 | 6 | 25 | 6 | 5 | 3 | 19 | 4 |
| 8 | 23 | III:201166..201167 | (3)*** | 3 | 23 | 3 | 22 | 3 | 13 | 2 | 23 | 3 |
| 9 | 22 | XII:1054278..1054302 | (1)*** | 7 | 26 | 7 | 24 | 7 | 24 | 5 | 12 | 4 |
| 10 | 21 | III:100213 | (2)** | 5 | 23 | 5 | 23 | 5 | 5 | 3 | 5 | 1 |
| 11 | 20 | III:209932 | (3)* | 3 | 21 | 3 | 19 | 3 | 16 | 4 | 15 | 4 |
| 13 | 19 | III:210748..210748 | (5)* | 4 | 24 | 4 | 18 | 4 | 2 | 3 | 11 | 4 |
| 14 | 19 | VIII:111679..111680 | (6)* | 3 | 20 | 3 | 19 | 3 | 3 | 3 | 12 | 2 |
| 15 | 19 | VIII:111682..111690 | (7)** | 5 | 21 | 5 | 20 | 5 | 57 | 6 | 22 | 3 |
| Total hits | | | | 75 | | 74 | | 70 | | 59 | | 49 |
ᵃ Number of genes associated with the hotspot.
ᵇ Chromosome position of the hotspot.
ᶜ The most significant GO category enriched in the associated gene set. The enrichment test was performed using DAVID (Huang et al.). Gene function is defined by GO category. The GO categories involved are: (1) telomere maintenance via recombination; (2) branched chain family amino acid biosynthetic process; (3) regulation of mating-type specific transcription, DNA-dependent; (4) sterol biosynthetic process; (5) pheromone-dependent signal transduction involved in conjugation with cellular fusion; (6) cytogamy; (7) response to pheromone.
ᵈ Number of genes in enriched GO categories.
ᵉ ᵍ ⁱ ᵏ Number of associated genes that are also identified by GDL, G-Lasso, SIOL and LORS, respectively.
ᶠ ʰ ʲ ˡ Number of genes in enriched GO categories that are also identified by GDL, G-Lasso, SIOL and LORS, respectively.
Among these hotspots, hotspot (12) in bold cannot be detected by G-Lasso, hotspot (6) in italic cannot be detected by SIOL, and hotspot (3) in teletype cannot be detected by LORS.
P-values are adjusted using permutation tests: * 10⁻² to 10⁻³; ** 10⁻³ to 10⁻⁵; *** 10⁻⁵ to 10⁻¹⁰.
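Footnote c cites DAVID for the GO enrichment test. DAVID's EASE score is a modified Fisher exact test. The sketch below computes only the plain one-sided Fisher (hypergeometric upper-tail) P-value for a hotspot's gene set. It reproduces neither the EASE modification nor the permutation-based adjustment. The argument names and the toy numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(hits, set_size, category_size, population_size):
    """One-sided P(X >= hits) for X ~ Hypergeometric(population_size, category_size, set_size).

    hits: genes in the hotspot's gene set that belong to the GO category
    set_size: genes associated with the hotspot
    category_size: genes annotated to the GO category in the whole background
    population_size: all genes considered (the background)
    """
    total = comb(population_size, set_size)
    upper = min(set_size, category_size)
    # math.comb returns 0 when k > n, so impossible configurations contribute nothing
    tail = sum(
        comb(category_size, x) * comb(population_size - category_size, set_size - x)
        for x in range(hits, upper + 1)
    )
    return tail / total


# Toy example: both genes of a 2-gene set fall in a 2-gene category out of 10 genes
print(hypergeom_enrichment_p(2, 2, 2, 10))  # equals 1/45
```

Exact integer arithmetic via `math.comb` avoids overflow in the binomial coefficients. For genome-scale backgrounds, a log-space or library implementation would be faster.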
Hotspots detected by different methods
| | GGDL | GDL | G-Lasso | SIOL | LORS |
|---|---|---|---|---|---|
| Number of significantly enriched hotspots (top-15 hotspots) | 15 | 14 | 13 | 10 | 9 |
| Total number of reported hotspots (size > 10) | 65 | 82 | 96 | 89 | 64 |
| Number of significantly enriched hotspots | 45 | 56 | 61 | 53 | 41 |
| Ratio of significantly enriched hotspots (%) | 70 | 68 | 64 | 60 | 56 |