Sharon Marie Lutz, Annie Thwing, Tasha Fingerlin.
Abstract
Expression quantitative trait loci (eQTL) provide insight into transcription regulation and illuminate the molecular basis of phenotypic outcomes. High-throughput RNA sequencing (RNA-seq) is becoming a popular technique to measure gene expression abundance. Traditional eQTL mapping methods for microarray expression data often assume that the expression data follow a normal distribution. As a result, for RNA-seq data, total read count measurements can be normalized by normal quantile transformation in order to fit the data using a linear regression. Other approaches model the total read counts using a negative binomial regression. While these methods work well for common variants (minor allele frequencies > 5% or 1%), an extension of existing methodology is needed to accommodate a collection of rare variants in RNA-seq data. Here, we examine two approaches that are direct applications of existing methodology and apply them to RNA-seq studies: 1) collapsing the rare variants in the region and using either negative binomial regression or Poisson regression and 2) using the normalized read counts with the Sequence Kernel Association Test (SKAT), the burden test for SKAT (SKAT-Burden), or an optimal combination of these two tests (SKAT-O). We evaluated these approaches via simulation studies under numerous scenarios and applied them to the 1000 Genomes Project.
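
The normal quantile transformation of read counts mentioned in the abstract can be illustrated with a rank-based inverse normal transform. This is a minimal sketch rather than the authors' code: the function name `normal_quantile_transform` and the Blom offset of 3/8 are assumptions, since the abstract does not say which variant of the transformation was used.

```python
from statistics import NormalDist


def normal_quantile_transform(counts, offset=0.375):
    """Map read counts to normal scores via their ranks.

    `offset` is the Blom constant (an assumption; the source does not
    specify one). Tied counts receive the average of their ranks.
    """
    n = len(counts)
    order = sorted(range(n), key=lambda i: counts[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and counts[order[j + 1]] == counts[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - offset) / (n - 2 * offset + 1))
            for r in ranks]


# Hypothetical total read counts for five subjects.
scores = normal_quantile_transform([12, 50, 50, 3, 200])
print([round(z, 3) for z in scores])
```

The transformed scores depend only on the ordering of the counts, which is why they can be passed to methods that assume a normally distributed outcome, such as linear regression or SKAT.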
Year: 2019 PMID: 31581212 PMCID: PMC6776318 DOI: 10.1371/journal.pone.0223273
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Simulation results for Scenario A.
The read counts were generated from a Poisson distribution for row 1 and a negative binomial distribution for row 2. All plots were generated from Scenario A with the average number of reads μ = 50. Columns 1-3 are for 30, 100, and 500 subjects, respectively. For data generated under the Poisson distribution (row 1), all methods preserved the type 1 error rate (Fold Change = 1) and the Poisson regressions had substantial gains in power over the other methods for smaller sample sizes (n = 30, 100). For data generated under the negative binomial distribution (row 2), the Poisson regressions had an inflated type 1 error rate (Fold Change = 1). For a sample size of 500, SKAT and SKAT-O had the highest power.
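
The inflation of the type 1 error rate for Poisson models on overdispersed counts, described in the caption, can be reproduced in a small simulation. The sketch below is not the authors' simulation code. It assumes a Poisson score test in place of their fitted Poisson regressions, a carrier probability of 0.2 for the collapsed indicator, a negative binomial size parameter of 5, and n = 100 subjects with 300 replicates; only the mean read count of 50 comes from the caption. It uses the standard library only, so the samplers are written out by hand.

```python
import math
import random


def draw_poisson(rng, lam):
    """Draw one Poisson(lam) count with Knuth's method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def draw_negative_binomial(rng, mu, size):
    """Gamma-Poisson mixture with mean mu and variance mu + mu**2 / size."""
    return draw_poisson(rng, rng.gammavariate(size, mu / size))


def poisson_score_pvalue(x, y):
    """Score test of no effect of x in a Poisson log-linear model with intercept."""
    n = len(y)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    score = sum((xi - x_bar) * yi for xi, yi in zip(x, y))
    variance = y_bar * sum((xi - x_bar) ** 2 for xi in x)
    if variance == 0:  # no carriers or no reads: the test is uninformative
        return 1.0
    # Upper tail of a chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)).
    return math.erfc(math.sqrt(score ** 2 / variance / 2))


def type1_error(rng, draw, n=100, reps=300, carrier_prob=0.2, alpha=0.05):
    """Fraction of null replicates (fold change = 1) rejected at level alpha."""
    rejections = 0
    for _ in range(reps):
        x = [1 if rng.random() < carrier_prob else 0 for _ in range(n)]
        y = [draw(rng) for _ in range(n)]  # counts are independent of x
        rejections += poisson_score_pvalue(x, y) < alpha
    return rejections / reps


rng = random.Random(0)
rate_poisson = type1_error(rng, lambda r: draw_poisson(r, 50))
rate_nb = type1_error(rng, lambda r: draw_negative_binomial(r, 50, 5))
print(f"Poisson data: {rate_poisson:.3f}  negative binomial data: {rate_nb:.3f}")
```

Under these assumptions the Poisson model understates the variance of negative binomial counts by a factor of about 1 + 50/5 = 11, so the rejection rate on negative binomial data should land well above the nominal 0.05 while staying near it on Poisson data.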
Using the total read counts for each of the 87 Yoruba subjects of African ancestry and 266 subjects of European ancestry from the 1000 Genomes Project, we applied the 7 approaches to determine whether the rare variants in these genes are associated with the overall expression levels of these genes.
Below are the p-values for all approaches and 5 genes (LCT, PRICKLE4, FADS, SLC24A5, HERC2). Sum refers to the sum of the rare variants in the region and Indicator refers to the indicator function which equals one if the subject has at least one rare variant in the region.
| Population | Method | LCT (Chr 2) | PRICKLE4 (Chr 6) | FADS (Chr 11) | SLC24A5 (Chr 15) | HERC2 (Chr 15) |
|---|---|---|---|---|---|---|
| African Ancestry | SKAT-Burden | 0.30 | 0.04 | 0.68 | 0.22 | |
| | SKAT-O | 0.46 | 0.07 | 0.48 | 0.34 | |
| | SKAT | 0.04 | 0.74 | 0.15 | 0.30 | 0.27 |
| | Negative Binomial: Sum | 0.04 | 0.55 | 0.01 | 0.52 | 0.25 |
| | Negative Binomial: Indicator | 0.52 | 0.37 | 0.08 | 0.52 | 0.57 |
| | Poisson: Sum | 2.2E-19 | 1.0E-42 | 0 | 3.5E-08 | 2.2E-198 |
| | Poisson: Indicator | 0.01 | 2.1E-100 | 0 | 3.0E-07 | 1.4E-34 |
| European Ancestry | SKAT-Burden | 0.51 | 0.51 | 0.77 | 0.50 | |
| | SKAT-O | 0.56 | 0.60 | 1.00 | 0.73 | |
| | SKAT | 0.18 | 0.43 | 0.39 | 0.97 | 0.57 |
| | Negative Binomial: Sum | 0.12 | 0.62 | 0.38 | 0.62 | 0.63 |
| | Negative Binomial: Indicator | 0.41 | 0.62 | 0.41 | 0.21 | 0.70 |
| | Poisson: Sum | 1.5E-19 | 1.4E-38 | 1.9E-164 | 8.5E-05 | 1.2E-40 |
| | Poisson: Indicator | 4.7E-05 | 1.4E-38 | 1.5E-141 | 4.5E-19 | 1.9E-23 |
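
The Sum and Indicator collapsing scores used in the table can be sketched as follows. This is an illustrative example, not the authors' pipeline: the function name `collapse_rare_variants`, the 0/1/2 minor-allele coding, and the toy genotype matrix are hypothetical, and the MAF cutoff is left as a parameter because the source mentions both 5% and 1%.

```python
def collapse_rare_variants(genotypes, maf_threshold=0.05):
    """Collapse rare variants in a region into per-subject scores.

    genotypes: one list per subject holding minor-allele counts (0, 1, or 2)
    for each variant in the region (an assumed coding).
    Returns the indices of the rare variants, the Sum score, and the
    Indicator score for every subject.
    """
    n_subjects = len(genotypes)
    n_variants = len(genotypes[0])
    # Minor allele frequency: minor-allele copies over 2 alleles per subject.
    maf = [sum(row[j] for row in genotypes) / (2 * n_subjects)
           for j in range(n_variants)]
    rare = [j for j in range(n_variants) if 0 < maf[j] < maf_threshold]
    sums = [sum(row[j] for j in rare) for row in genotypes]
    indicators = [1 if s > 0 else 0 for s in sums]
    return rare, sums, indicators


# Toy data: 6 subjects, 3 variants. The cutoff of 0.2 is chosen only so that
# this tiny sample contains "rare" variants; it is not a recommended value.
toy_genotypes = [
    [0, 1, 0],
    [0, 1, 1],
    [1, 2, 1],
    [0, 0, 0],
    [0, 1, 0],
    [0, 2, 0],
]
rare, sums, indicators = collapse_rare_variants(toy_genotypes, maf_threshold=0.2)
print(rare, sums, indicators)
```

Either score then serves as the single predictor in a negative binomial or Poisson regression of the total read counts. The third subject shows how the two scores differ: carrying two rare alleles gives a Sum of 2 but an Indicator of 1.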