Dimitrios V Vavoulis, Jenny C Taylor, Anna Schuh.
Abstract
MOTIVATION: The identification of genetic variants influencing gene expression (known as expression quantitative trait loci or eQTLs) is important in unravelling the genetic basis of complex traits. Detecting multiple eQTLs simultaneously in a population based on paired DNA-seq and RNA-seq assays employs two competing types of models: models which rely on appropriate transformations of RNA-seq data (and are powered by a mature mathematical theory), or count-based models, which represent digital gene expression explicitly, thus rendering such transformations unnecessary. The latter constitutes an immensely popular methodology, which is however plagued by mathematical intractability.
Year: 2017 PMID: 28575251 PMCID: PMC5637939 DOI: 10.1093/bioinformatics/btx355
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Workflow and statistical models. (a) A matrix of genotypes X and a matrix of untransformed (Z) or transformed (Y) expression data are used in the estimation of a matrix of coefficients B, which captures the associations between M variants and K transcripts in a population of N samples. (b, c, d) Dependencies between random variables in Normal (b), over-dispersed Poisson (c), over-dispersed Binomial (c) and Negative Binomial (d) models. observed data (genotypes and gene expression); latent (unobserved) data; thick circle: the matrix of association coefficients B.
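As a reading aid for panel (a), the dimensions implied by the caption can be written out. This is only a sketch of a generic linear form for the transformed-data case: the orientation of the matrices (samples in rows), the intercept $\beta_{0k}$ and the noise term $\varepsilon_{ik}$ are assumptions made for illustration and are not taken from the source, which does not spell out the model equations here.

```latex
% Assumed orientation: N samples in rows, M variants, K transcripts.
X \in \mathbb{R}^{N \times M}, \qquad
B \in \mathbb{R}^{M \times K}, \qquad
Y \in \mathbb{R}^{N \times K}

% Assumed generic linear relation for sample i and transcript k:
y_{ik} = \beta_{0k} + \sum_{j=1}^{M} x_{ij}\, B_{jk} + \varepsilon_{ik}
```

Under this reading, a nonzero entry $B_{jk}$ marks an association between variant $j$ and transcript $k$.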
Fig. 2. Model performance on simulated data. (a) Model comparison using Matthews correlation coefficient (MCC) as performance metric. (b) Model comparison using the root mean square error (RMSE) among true positives as performance metric. For each model at each sample size, we performed 252 simulations (grey dots). The black dots and whiskers indicate the mean and three standard errors on either side of the mean over these simulations.
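The two metrics named in this caption, and the mean with three standard errors used for the whiskers, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names, the flat-list representation of the coefficients, and the rule that a coefficient counts as "called" when its absolute value exceeds `tol` are all assumptions.

```python
import math
import statistics


def mcc_and_tp_rmse(b_true, b_est, tol=0.0):
    """Return (MCC, RMSE among true positives) for flat coefficient lists.

    Assumption: an entry is a true/called association when |value| > tol.
    """
    tp = fp = tn = fn = 0
    sq_err = 0.0
    for t, e in zip(b_true, b_est):
        is_true = abs(t) > tol
        is_called = abs(e) > tol
        if is_true and is_called:
            tp += 1
            sq_err += (e - t) ** 2  # error accumulated over true positives only
        elif is_called:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom > 0 else 0.0
    rmse = math.sqrt(sq_err / tp) if tp > 0 else float("nan")
    return mcc, rmse


def mean_with_3se(values):
    """Return (mean - 3*SE, mean, mean + 3*SE) over repeated simulations."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - 3 * se, m, m + 3 * se


# Hypothetical toy example: six coefficients, three truly nonzero.
b_true = [0.0, 0.0, 1.0, -2.0, 0.0, 0.5]
b_est = [0.0, 0.3, 0.8, -2.0, 0.0, 0.0]
mcc, rmse = mcc_and_tp_rmse(b_true, b_est)
```

Restricting the RMSE to true positives separates estimation accuracy from detection accuracy, which the MCC already summarises.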
Fig. 3. Model performance on gEUVADIS data. (a) Model comparison on mRNA and miRNA datasets using the concordance correlation coefficient (CCC) as performance metric. (b) Model comparison with respect to the number of gene/variant associations they identify. In (a) and (b), a Monte Carlo cross-validation protocol with 10 repetitions (grey points) was followed for each model and each group (mRNAs or miRNAs). (c) Candidate eQTLs and number of supporting models based on all gEUVADIS samples (see also Table 1). There are 5 eQTLs identified by more than half of the models (dashed line).
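For readers unfamiliar with the evaluation in panels (a) and (b), the sketch below shows Lin's concordance correlation coefficient and a repeated random train/test split, which is what Monte Carlo cross-validation usually means. It is an illustration under stated assumptions: the helper names, the 20% test fraction and the fixed seed are hypothetical, and only the 10 repetitions come from the caption.

```python
import random


def ccc(x, y):
    """Lin's concordance correlation coefficient between two equal-length lists.

    Uses population (1/n) moments; undefined if both inputs are constant and equal.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * cov / (vx + vy + (mx - my) ** 2)


def monte_carlo_splits(n_samples, n_repeats=10, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from independent random shuffles.

    test_fraction and seed are assumed values, not taken from the source.
    """
    rng = random.Random(seed)
    n_test = max(1, int(round(n_samples * test_fraction)))
    for _ in range(n_repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)
        yield idx[n_test:], idx[:n_test]


# Hypothetical usage: a constant shift lowers the CCC even though the
# Pearson correlation would still be 1.
shifted = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
splits = list(monte_carlo_splits(n_samples=50))
```

Unlike Pearson correlation, the CCC penalises systematic offsets between predicted and observed expression, which makes it a stricter measure of predictive agreement on held-out samples.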
Table 1. Variants identified as eQTLs in the gEUVADIS data
| ID | dbSNP | CHROM | POS | REF | ALT | MAF | Consequence |
|---|---|---|---|---|---|---|---|
| 3 | rs4639011 | 3 | 32 030 998 | C | T | 0.081 | stop gained |
| 4 | rs6535531 | 4 | 76 474 866 | T | C | 0.496 | stop lost |
| 5 | rs3217313 | 5 | 121 488 635 | AT | A | 0.186 | frameshift |
| 22 | rs35575803 | 19 | 20 807 177 | G | GA | 0.730 | frameshift |
| 23 | rs35999740 | 19 | 22 116 015 | A | T | 0.135 | stop gained |