Miguel Pérez-Enciso, José R Quevedo, Antonio Bahamonde.
Abstract
BACKGROUND: Genetical genomics is a very powerful tool to elucidate the basis of complex traits and disease susceptibility. Despite its relevance, however, statistical modeling of expression quantitative trait loci (eQTL) has not received the attention it deserves. Based on two reasonable assertions, (i) a good model should consider all available variables as potential effects and (ii) gene expression levels are highly interconnected, we suggest that an eQTL model should consider the remaining expression levels as potential regressors, in addition to the markers.
Year: 2007 PMID: 17352813 PMCID: PMC1828729 DOI: 10.1186/1471-2164-8-69
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Approach proposed in this paper. Schematic representation of the analysis of a genetical genomics experiment. Blue circles represent the markers tested and yellow circles the transcript levels analysed; an arrow signifies that the effect has been included in the model for the transcript. The left cartoon (a) represents the current strategy for eQTL searching, which consists of including the most significant marker in the model when testing each transcript independently. Several arrows pointing to a transcript mean that the transcript is affected by several QTL, while many arrows starting from a single marker represent an eQTL hotspot (H). The right cartoon (b) presents the strategy proposed here, in which other expression levels can be included as covariates in the model for the expression level studied (the arrows that start and end at the cDNA circles). Including cDNAs in the model can dramatically affect the final eQTL map: some positions may shift, some previous eQTL may disappear, and some new ones may appear. The bottom line of this approach is that all markers and all expression levels are potential regressors to be considered. The optimum model could be chosen using one of the available criteria, such as AIC, BIC, DIC or AUC, among others.
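The idea of treating both markers and transcript levels as candidate regressors and choosing among models with a criterion such as AIC can be sketched as follows. This is only an illustration on simulated data, not the authors' procedure: the helper names `ols_rss` and `aic`, the simulated effect sizes, and the use of ordinary least squares with the Gaussian AIC (up to an additive constant) are all assumptions made for the example.

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of y regressed on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def aic(columns, y):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    n = len(y)
    k = len(columns) + 2  # slopes + intercept + residual variance
    return n * math.log(ols_rss(columns, y) / n) + 2 * k


# Simulated (hypothetical) data: the marker acts on y only through a transcript.
rng = random.Random(0)
n = 100
marker = [float(rng.choice([0, 1])) for _ in range(n)]
transcript = [m + rng.gauss(0, 1) for m in marker]
y = [2 * t + rng.gauss(0, 0.5) for t in transcript]

candidates = {
    "marker": [marker],
    "transcript": [transcript],
    "marker + transcript": [marker, transcript],
}
scores = {name: aic(cols, y) for name, cols in candidates.items()}
best = min(scores, key=scores.get)  # lower AIC is preferred
print(scores, best)
```

In this simulation the models that include the transcript fit far better than the marker-only model, which mirrors the caption's point that restricting the search to markers can miss the most informative regressors.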
Figure 2. Comparison of eQTL profiles. P-value profiles with model 1 (red dots) or model 2 (blue line) for two genes, Prdx2 (top) and Lin7c (bottom). P-values are on a log10 scale. Model 1 includes only the marker, whereas model 2 also includes the most associated transcript level.
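The contrast between model 1 and model 2 can be illustrated with a small simulation. The sketch below assumes simple linear regression and computes the t statistic of the marker effect with and without adjusting for one transcript, using the Frisch-Waugh-Lovell residualisation; the function names and simulated data are hypothetical, and converting the t statistic into a P-value (which needs the t distribution) is left out.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope_and_residuals(x, y):
    """Simple regression of y on x (with intercept): slope and residuals."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return b, [c - my - b * (a - mx) for a, c in zip(x, y)]


def marker_t_stat(marker, y, covariate=None):
    """t statistic of the marker effect, optionally adjusted for one covariate."""
    df = len(y) - 2  # model 1: intercept + marker
    if covariate is not None:
        # Frisch-Waugh-Lovell: residualise y and the marker on the covariate.
        _, y = slope_and_residuals(covariate, y)
        _, marker = slope_and_residuals(covariate, marker)
        df -= 1  # model 2 estimates one extra coefficient
    b, res = slope_and_residuals(marker, y)
    sxx = sum((m - mean(marker)) ** 2 for m in marker)
    se = (sum(r * r for r in res) / df) ** 0.5 / sxx ** 0.5
    return b / se


# Simulated (hypothetical) data: the marker affects y only via the transcript.
rng = random.Random(1)
n = 200
marker = [float(rng.choice([0, 1])) for _ in range(n)]
transcript = [2 * m + rng.gauss(0, 1) for m in marker]
y = [t + rng.gauss(0, 0.5) for t in transcript]

t_model1 = marker_t_stat(marker, y)                        # marker only
t_model2 = marker_t_stat(marker, y, covariate=transcript)  # marker + transcript
print(t_model1, t_model2)
```

Here the marker looks strongly associated under model 1, but its effect largely vanishes once the transcript is in the model, which is one way an eQTL profile can change between the two models.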
Associated P-values and AUC for some of the most significant QTL reported by Chesler et al. (2005)
| | Best marker a | | | Best transcript b | | | AUC50 (%) c | | |
| Gene | Name | -log10 P-value d | AUC (%) | Name | -log10 P-value d | AUC (%) | Marker | Transcript | All |
| Trans-QTL | | | | | | | | | |
| | | 23 | 72 | | 6 | 65 | 76 | 88 | 93 |
| | | 14 | 71 | | 7 | 66 | 77 | 88 | 91 |
| | | 12 | 72 | | 11 | 57 | 74 | 89 | 89 |
| | | 5 | 61 | | 18 | 91 | 71 | 89 | 91 |
| | | 6 | 63 | | 20 | 77 | 71 | 88 | 88 |
| Cis-QTL | | | | | | | | | |
| | | 15 | 71 | | 20 | 74 | 69 | 94 | 94 |
| | | 7 | 62 | | 24 | 77 | 62 | 95 | 96 |
| | | 13 | 70 | | 10 | 71 | 73 | 89 | 89 |
| | | 5 | 57 | | 39 | 80 | 63 | 84 | 85 |
| | | 11 | 73 | | 23 | 67 | 77 | 90 | 91 |
| Largest-clique | | | | | | | | | |
| | | 5 | 62 | | 53 | 90 | 80 | 97 | 97 |
a The marker shown is the one most strongly associated with the cDNA level of the gene in the first column.
b The gene name shown is that of the gene whose cDNA level is most strongly associated with the cDNA level of the gene in the first column.
c AUC50 is the AUC obtained with the best 50 variables; the three columns refer to the AUC obtained when only markers, only transcripts, or all variables, respectively, are considered as predictors.
d Values reported are -log10(P-value); that is, a value of x means that the significance is 10^-x.
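As a quick check on how to read the -log10 scale used in the table, the conversion in both directions is shown below. The function names are illustrative only.

```python
import math


def to_p_value(neg_log10_p):
    """Convert a -log10(P-value) entry back to the P-value itself."""
    return 10.0 ** (-neg_log10_p)


def to_neg_log10(p_value):
    """Convert a P-value to the -log10 scale used in the table."""
    return -math.log10(p_value)


# A table entry of 23 corresponds to a P-value of 10^-23.
print(to_p_value(23))
print(to_neg_log10(1e-5))
```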
Figure 3. AUCs for gene expression levels. Comparison between AUCs for 67 gene expression levels considering the best 50 predictive variables chosen among all markers and cDNA levels (red solid squares), the best 50 variables chosen among all markers (green solid triangles), and the best 50 variables chosen among all transcript levels (blue open circles). The three AUCs for each expression level share the same position on the abscissa; genes were ranked according to the AUC obtained using all variables. Using only markers results in consistently lower AUC, whereas there are no large differences between using all variables and using only transcript levels. For some genes (23 out of 67), AUCs using only cDNAs were slightly better than those using all variables. This occurred because the RFE algorithm [10] may not completely remove redundant information from all variables and thus does not always guarantee the absolute maximum. The 67 genes shown were chosen from among those with the most significant QTLs in Chesler et al. (2005). Thus, one should expect markers to be better predictors, and consequently to yield higher AUC, for these genes than for a random gene. Note that an AUC of 50% means that the criterion is no better than a random ordering.
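The remark that an AUC of 50% corresponds to a random ordering suggests reading AUC here as a ranking measure. The sketch below assumes, without confirmation from this excerpt, that AUC is the fraction of sample pairs whose observed expression levels are ordered correctly by the predictions (pairwise concordance); `ranking_auc` is a hypothetical name.

```python
def ranking_auc(observed, predicted):
    """Fraction of pairs with distinct observed values that the predictions
    order correctly; tied predictions count as half a correct pair."""
    concordant = 0.0
    pairs = 0
    n = len(observed)
    for i in range(n):
        for j in range(i + 1, n):
            if observed[i] == observed[j]:
                continue  # pairs with equal observed values carry no ordering
            pairs += 1
            agreement = (observed[i] - observed[j]) * (predicted[i] - predicted[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5
    return concordant / pairs


perfect = ranking_auc([1, 2, 3, 4], [10, 20, 30, 40])    # every pair ordered correctly
reversed_order = ranking_auc([1, 2, 3, 4], [4, 3, 2, 1])  # every pair ordered wrongly
uninformative = ranking_auc([1, 2, 3, 4], [0, 0, 0, 0])   # constant prediction
print(perfect, reversed_order, uninformative)
```

Under this reading, a constant or random predictor scores about 0.5, matching the 50% baseline mentioned in the caption.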
QTL results for a subset of genes pertaining to a QTL hotspot localised around marker D6Mit254 (chr. 6).
| Gene | -log10 P-value of QTL (model 1) a,b | Best transcript | P-value of best transcript a | Position of QTL (model 2) c | -log10 P-value of QTL (model 2) a,d |
| 4.4 | 30.0 | 21.0 | |||
| 4.1 | 32.3 | 5.6 | |||
| 3.7 | 35.5 | 5.9 | |||
| 3.6 | 37.8 | 7.2 | |||
| 3.5 | 41.7 | 10.9 | |||
| 3.3 | 30.9 | 3.6 | |||
| 3.3 | 34.4 | 6.2 | |||
| 3.0 | 45.3 | 3.2 | |||
| 2.5 | 45.5 | 11.7 |
a Values reported are -log10(P-value); that is, a value of x means that the significance is 10^-x.
b P-value when only the marker is included in the model.
c QTL position when the best transcript is also included in the model.
d QTL P-value when the best transcript is also included in the model.