Keyan Zhao, Zhi-xiang Lu, Juw Won Park, Qing Zhou, Yi Xing.
Abstract
To characterize the genetic variation of alternative splicing, we develop GLiMMPS, a robust statistical method for detecting splicing quantitative trait loci (sQTLs) from RNA-seq data. GLiMMPS takes into account the individual variation in sequencing coverage and the noise prevalent in RNA-seq data. Analyses of simulated and real RNA-seq datasets demonstrate that GLiMMPS outperforms competing statistical models. Quantitative RT-PCR tests of 26 randomly selected GLiMMPS sQTLs yielded a validation rate of 100%. As population-scale RNA-seq studies become increasingly affordable and popular, GLiMMPS provides a useful tool for elucidating the genetic variation of alternative splicing in humans and model organisms.
Year: 2013 PMID: 23876401 PMCID: PMC4054007 DOI: 10.1186/gb-2013-14-7-r74
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Schematic outline of GLiMMPS. (a) RNA-seq reads mapped to splice junctions of alternatively spliced exons are used to estimate exon inclusion levels ψ. The skipped exon (SE) type of alternative splicing event is shown as an example. White, sQTL target exon; black and gray, flanking exons. The inclusion junction (IJ) reads are reads mapped to the upstream and downstream splice junctions of the exon inclusion isoform, while the skipping junction (SJ) reads are reads mapped to the skipping splice junction of the exon skipping isoform. (b) Illustration of the GLiMMPS statistical model. The SNP genotype effect is modeled as a fixed effect β. The overdispersion is modeled as an individual-level random effect.
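The quantities named in this caption can be illustrated with a minimal Python sketch. It is not the GLiMMPS implementation. Two assumptions are made here that the caption does not state: IJ reads are halved because the inclusion isoform contributes two junctions while the skipping isoform contributes one, and the fixed effect β and the individual-level random effect act on the logit scale. The function names and the parameters `beta0` and `sigma` are hypothetical.

```python
import math
import random


def psi_from_junction_reads(ij_reads, sj_reads):
    """Estimate the exon inclusion level of a skipped exon from junction reads.

    Assumption: IJ reads come from two junctions (upstream and downstream),
    so they are halved before being compared with the single skipping junction.
    Returns None when there are no junction reads at all.
    """
    inclusion = ij_reads / 2.0
    total = inclusion + sj_reads
    if total == 0:
        return None
    return inclusion / total


def simulate_psi(genotype, beta0, beta, sigma, rng):
    """Draw one individual's inclusion level under an assumed logit-normal model.

    logit(psi) = beta0 + beta * genotype + u, with u ~ Normal(0, sigma^2).
    beta is the fixed SNP genotype effect; u is the individual-level random
    effect that absorbs overdispersion. The logit link is an assumption.
    """
    u = rng.gauss(0.0, sigma)
    eta = beta0 + beta * genotype + u
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    rng = random.Random(1)
    print(psi_from_junction_reads(20, 10))
    for g in (0, 1, 2):  # genotype coded as allele count
        print(g, simulate_psi(g, beta0=0.0, beta=0.8, sigma=0.5, rng=rng))
```

Individuals with few junction reads give noisy ψ estimates, which is why a model working on read counts rather than on ψ alone can weight them differently.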
Figure 2. Performance evaluation of different statistical models using simulated data. (a) The observed false positive rate at the significance level of 0.05 for the linear model (lm), generalized linear model (glm), and GLiMMPS. Data were simulated at different sequencing depths, with mean total junction reads ranging from 5 to 80, as described in Supplementary Methods in Additional file 1. (b) Receiver operating characteristic (ROC) curve analysis demonstrates that GLiMMPS outperforms the lm and glm models. The ROC curve plots the fraction of true positives called correctly against the fraction of false positives called incorrectly, using P values from each model. The zoomed-in figure shows the part of the ROC curve where the false positive rate is in the range of (0, 0.2).
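As a rough illustration of how an ROC curve like the one in panel (b) can be built from P values, the sketch below sweeps a threshold over the sorted P values of simulated tests whose true status is known. The function names and the toy inputs are hypothetical, and ties between P values are not treated specially.

```python
def roc_points(p_values, is_true):
    """Return (false positive rate, true positive rate) pairs.

    p_values: one P value per simulated test.
    is_true:  matching booleans, True when the test is a real association.
    Tests are called significant in order of increasing P value.
    """
    n_pos = sum(1 for t in is_true if t)
    n_neg = len(is_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one true and one null test")
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i in order:
        if is_true[i]:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    pts = roc_points([0.001, 0.2, 0.03, 0.8], [True, False, True, False])
    print(pts)
    print(area_under_curve(pts))
```

Restricting the points to those with a false positive rate below 0.2 gives the zoomed-in region described in the caption.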
Figure 3. Concordance of sQTLs in two RNA-seq datasets of the Caucasian (CEU) population as obtained by different statistical models. (a) Comparison of GLiMMPS P values for the most significant SNP of each alternatively spliced exon in the CEU and CEU2 datasets. The x-axis shows the -log10(P value) in CEU. The y-axis shows the -log10(P value) in CEU2. Red lines show the FDR cutoff of 10%. (b) Concordance of sQTL rankings between CEU and CEU2 based on different statistical models. The x-axis represents the number of top n ranked sQTLs in each dataset, while the y-axis represents the percentage of sQTLs in common between the two datasets among the top n sQTLs in CEU, based on P value rankings calculated by the linear model (lm), generalized linear model (glm), and GLiMMPS.
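The concordance measure in panel (b) can be sketched as follows: rank exons by P value in each dataset, take the top n from each, and report the percentage of the first dataset's top n that also appear in the second dataset's top n. This is an illustrative reading of the caption, with hypothetical exon identifiers and P values, not the authors' code.

```python
def top_n_concordance(p_first, p_second, n):
    """Percentage of the top-n exons in the first dataset shared with the second.

    p_first, p_second: dicts mapping an exon identifier to the P value of its
    most significant SNP in each dataset. Smaller P values rank higher.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    top_first = sorted(p_first, key=p_first.get)[:n]
    if not top_first:
        raise ValueError("first dataset is empty")
    top_second = set(sorted(p_second, key=p_second.get)[:n])
    shared = sum(1 for exon in top_first if exon in top_second)
    return 100.0 * shared / len(top_first)


if __name__ == "__main__":
    ceu = {"e1": 1e-8, "e2": 1e-5, "e3": 0.01, "e4": 0.5}   # made-up values
    ceu2 = {"e1": 1e-6, "e2": 0.3, "e3": 1e-4, "e4": 0.2}   # made-up values
    for n in (1, 2, 4):
        print(n, top_n_concordance(ceu, ceu2, n))
```

Computing this for a range of n and for each model's P values yields one concordance curve per model.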
Figure 4. (a) The fraction of sQTL exons with significant SNPs within 300 bp of the splice sites as a function of the P value cutoff for significant sQTLs. The x-axis is the -log10(P value) cutoff used to define significant sQTLs. The y-axis is the fraction of sQTL exons with any significant SNP within 300 bp of the splice sites. (b) Boxplot of GLiMMPS P values for all SNPs around the 140 significant sQTLs (FDR ≤ 0.1), grouped into five categories based on the positions of SNPs with respect to the splice sites.
The list of sQTL signals linked to GWAS signals.
| Gene | AS typeᵃ | Target exonᵇ (hg19) | sQTL SNPᶜ | SNP type | GWAS trait (SNP) | GWAS references |
|---|---|---|---|---|---|---|
| | SE | +chr1:76194085-76194173 | rs7524467 | ≤ 300 bp | Metabolic traits (rs211718) | |
| | SE | -chr1:111682122-111682288 | rs3762374 | 5' SS | Liver enzyme levels (gamma-glutamyl transferase) (rs1335645) | |
| SP140 | SE | +chr2:231110577-231110655 | rs28445040 | Exon | Chronic lymphocytic leukemia (rs13397985) | |
| | | | | | Multiple sclerosis (rs10201872) | |
| | | | | | Crohn's disease (rs7423615) | |
| | SE | +chr5:96076448-96076487 | rs7724759 | 5' SS | Alcohol dependence (rs13160562) | |
| ERAP2 | A5SSᵈ | +chr5:96235824-96235949 | rs2248374 | 5' SS | Crohn's disease (rs2549794) | |
| | | | | | Ankylosing spondylitis (rs30187) | |
| MRPL11 | A5SSᵈ | -chr11:66206102-66206319 | rs11110 | Exon | Bipolar disorder (rs2242663) | |
| ARL6IP4 | A3SSᵉ | +chr12:123466117-123466426 | rs55742290 | 3' SS | Platelet counts (rs7296418, rs1727307) | |
| ULK3 | MXEᶠ | -chr15:75130091-75130139 | rs12898397 | 5' SS | Coffee consumption (rs6495122) | |
| | | | | | Coronary heart disease (rs2472299) | |
| | SE | -chr19:41939176-41939339 | rs1043413 | Exon | Height (rs17318596) | |
| | SE | +chr20:3193814-3193872 | rs1127354 | Exon | Response to hepatitis C treatment (rs11697186, rs6139030) | |
| | | | | | Ribavirin-induced anemia (rs1127354) | |
ᵃ AS type: SE, skipped exon; A5SS, alternative 5' splice site; A3SS, alternative 3' splice site; MXE, mutually exclusive exons.
ᵇ Exon coordinates are in hg19, with a 0-based start position and a 1-based end position. The direction (+/-) of transcription is denoted before the coordinates.
ᶜ The significant sQTL SNP (FDR ≤ 0.1) closest to the target exon. SNP position and P value from GLiMMPS can be found in Additional file 2.
ᵈ Alternative 5' SS: ERAP2, chr5:96235893; MRPL11, chr11:66206180.
ᵉ Alternative 3' SS: ARL6IP4, chr12:123466141.
ᶠ Mutually exclusive alternative exon: ULK3, chr15:75130492-75130533.
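The coordinate convention in the table notes (0-based start, 1-based end, strand prefixed) means that exon length is simply end minus start, and the 1-based start is the listed start plus one. A small parsing sketch under that convention follows; the function name and the returned field names are hypothetical.

```python
import re

# Matches labels such as "+chr1:76194085-76194173" from the table.
_EXON_LABEL = re.compile(r"^([+-])(chr\w+):(\d+)-(\d+)$")


def parse_target_exon(label):
    """Split a strand-prefixed exon label into its parts.

    The start is taken as 0-based and the end as 1-based, as stated in the
    table notes, so length = end - start with no +1 correction.
    """
    match = _EXON_LABEL.match(label.strip())
    if match is None:
        raise ValueError("unrecognized exon label: %r" % label)
    strand, chrom, start0, end1 = match.groups()
    start0, end1 = int(start0), int(end1)
    if end1 <= start0:
        raise ValueError("end must be greater than start")
    return {
        "strand": strand,
        "chrom": chrom,
        "start0": start0,        # 0-based start, as listed
        "start1": start0 + 1,    # same position in 1-based numbering
        "end": end1,             # 1-based inclusive end
        "length": end1 - start0,
    }


if __name__ == "__main__":
    print(parse_target_exon("+chr1:76194085-76194173"))
```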
Figure 5. An example of an sQTL signal overlapping with a GWAS signal near gene SP140. (a) The distribution of GLiMMPS P values around the sQTL exon (exon 7) in gene SP140. The black horizontal dashed line reflects the 10% FDR cutoff and red vertical lines mark the location of the sQTL exon. SNPs in linkage disequilibrium (r² > 0.8 in the CEU population) with the GWAS SNPs (blue asterisks) are shown as solid black dots, while other SNPs are shown as grey circles. The causal splicing SNP in exon 7 is shown as a red triangle. The exon-intron structure is shown at the bottom, with the GWAS SNPs and the causal splicing SNP (rs28445040) marked at the corresponding locations. (b) Boxplot showing the significant association of rs28445040 with the exon inclusion level (ψ) of SP140 exon 7, as estimated from the CEU RNA-seq dataset. The size of each dot is scaled by the total number of splice junction reads for that individual. (c) The same boxplot using the exon inclusion level (ψ) measured by quantitative RT-PCR.