Rong Lu, Ryan M Smith, Michal Seweryn, Danxin Wang, Katherine Hartmann, Amy Webb, Wolfgang Sadee, Grzegorz A Rempala.
Abstract
BACKGROUND: Measuring allele-specific RNA expression provides valuable insights into cis-acting genetic and epigenetic regulation of gene expression. The widespread adoption of high-throughput sequencing for studying RNA expression (RNA-Seq) permits measurement of allelic RNA expression imbalance (AEI) at heterozygous single nucleotide polymorphisms (SNPs) across the entire transcriptome, and this approach has become especially popular with the emergence of large databases such as GTEx. However, the existing binomial-type methods used to model allelic expression from RNA-Seq data assume a strong negative correlation between reference and variant allele reads, which may not be biologically reasonable.
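To make the contrast in the last sentence concrete, here is a toy simulation (hypothetical depths and means, not data from the study). Under a binomial-type model the variant count at a SNP is the observed total minus the reference count, so for a fixed total the two counts are perfectly negatively correlated. If the two allele counts are instead treated as independent Poisson variables (the setting of the Skellam distribution, which describes the difference of two independent Poisson counts), their correlation is close to zero. `poisson_sample` and `pearson` are helper functions written for this sketch only.

```python
import math
import random

def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n_snps, depth = 2000, 40   # hypothetical: 2000 heterozygous SNPs, total depth fixed at 40

# Binomial-type model: reference ~ Binomial(depth, 0.5), variant = depth - reference
ref_b = [sum(rng.random() < 0.5 for _ in range(depth)) for _ in range(n_snps)]
var_b = [depth - r for r in ref_b]

# Independent-Poisson model: each allele has its own mean (here 20 reads each)
ref_p = [poisson_sample(20, rng) for _ in range(n_snps)]
var_p = [poisson_sample(20, rng) for _ in range(n_snps)]

r_binom = pearson(ref_b, var_b)
r_pois = pearson(ref_p, var_p)
print(f"binomial, fixed total: {r_binom:+.3f}")
print(f"independent Poisson:   {r_pois:+.3f}")
```

In practice the total depth varies from SNP to SNP, so the unconditional binomial correlation is weaker than −1, but the constraint reference + variant = total is what drives the negative dependence.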
Year: 2015 PMID: 26231172 PMCID: PMC4521363 DOI: 10.1186/s12864-015-1749-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Poisson mixture model parameter estimates and SNP classification results
| Mixture component | Proportion | Poisson mean | No. of SNPs | No. of genes |
|---|---|---|---|---|
| Comp.1 |  |  | 18367 |  |
| Comp.2 | 0.0011 (0.0010, 0.0012) | 152.37 (146.08, 166.13) | 519 | 37 |
| Comp.3 | 0.186 (0.182, 0.190) | 20.34 (20.20, 20.49) | 82963 | 3892 |
| Comp.4 | 0.003 (0.0025, 0.0033) | 108.14 (105.13, 115.60) | 2073 | 89 |
| Comp.5 | 0.0006 (0.0004, 0.0008) | 201.01 (196.15, 209.71) | 425 | 27 |
| Comp.6 | 0.0073 (0.0069, 0.0077) | 74.60 (72.56, 78.08) | 5156 | 202 |
| Comp.7 | 0.771 (0.769, 0.775) | 7.82 (7.78, 7.85) | 198889 | 11174 |
The Poisson mixture model was fitted to the averaged total reads within tissue-specific genes (62326 tissue-specific genes in total, i.e., sample size = 62326; overall log-likelihood = -216846; BIC = 433836). Genes with the same rs number but from different brain regions were treated as different tissue-specific genes. The optimal number of mixture components was found to be 7, so all SNPs could be classified into 7 “comparable” SNP groups. Most SNPs in the gene of interest (SLC1A3) were classified into mixture component Comp.1, and the SNPs in Comp.1 were used to fit the folded Skellam mixture model.
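The note does not state how the mixture was fitted or how the 7 components were chosen. A common approach, sketched below as an assumption rather than the authors' procedure, is EM for a K-component Poisson mixture followed by choosing K with BIC = −2 log L + k log n, where k = 2K − 1 (K means plus K − 1 free weights). With K = 7 and the reported log-likelihood and sample size, this formula reproduces the reported BIC of 433836. The toy data and every function name here are hypothetical; each observation is assigned to the component with the largest posterior probability, mirroring the hard classification of SNPs into groups.

```python
import math
import random

def log_poisson_pmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def log_terms(x, weights, lams):
    """log(w_j) + log P(x | lam_j) for every component j."""
    return [math.log(w) + log_poisson_pmf(x, l) for w, l in zip(weights, lams)]

def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def em_poisson_mixture(data, n_comp, n_iter=100):
    """Fit a Poisson mixture by EM; returns weights, means, log-likelihood and hard labels."""
    xs = sorted(data)
    # start the means at evenly spaced sample quantiles (offsets keep them distinct and > 0)
    lams = [xs[int((j + 0.5) * len(xs) / n_comp)] + 0.5 + 0.01 * j for j in range(n_comp)]
    weights = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        # E-step: responsibilities P(component j | x_i) under the current parameters
        resp = []
        for x in data:
            terms = log_terms(x, weights, lams)
            norm = logsumexp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: closed-form updates; small floors stop an emptied component from breaking log()
        for j in range(n_comp):
            rj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(rj / len(data), 1e-12)
            lams[j] = max(sum(r[j] * x for r, x in zip(resp, data)) / rj, 1e-9)
    loglik = sum(logsumexp(log_terms(x, weights, lams)) for x in data)
    # hard classification: each observation goes to the component with the largest posterior
    labels = [max(range(n_comp), key=lambda j: t[j]) for t in (log_terms(x, weights, lams) for x in data)]
    return weights, lams, loglik, labels

def bic(loglik, n_params, n_obs):
    """Schwarz criterion, BIC = -2 log L + k log n; smaller is better."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# Check against the table note: K = 7 -> 7 means + 6 free weights = 13 parameters, n = 62326
print(round(bic(-216846, 2 * 7 - 1, 62326)))

# Hypothetical toy data: 70 % of sites near 8 reads, 30 % near 60 reads
rng = random.Random(42)
data = [poisson_sample(8, rng) if rng.random() < 0.7 else poisson_sample(60, rng) for _ in range(600)]
fits = {k: em_poisson_mixture(data, k) for k in (1, 2, 3)}
bics = {k: bic(fit[2], 2 * k - 1, len(data)) for k, fit in fits.items()}
best_k = min(bics, key=bics.get)
print(best_k, [round(l, 1) for l in sorted(fits[best_k][1])])
```

Quantile-based starting values are one simple way to break the symmetry between components; with real data, several random restarts per K would be a more robust choice.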
Poisson mixture Comp.1 SNP counts by gene regions
|  | 3’ UTR | Exon | Intron | 5’ UTR |
|---|---|---|---|---|
| No. of SNPs | 10702 | 4694 | 2142 | 269 |
| No. of genes | 531 | 405 | 236 | 43 |
In total, 18367 SNPs were classified into Poisson mixture component 1, and 10702 of them were in the 3’ UTRs of 531 genes. Only these 10702 3’ UTR SNPs were used to fit the folded Skellam mixture model.
Folded Skellam mixture parameter estimates and results of LRTs for equal Poisson mean values
| Parameter | Mix1 | Mix2 | Mix3 | Mix4 | Mix5 | Mix6 |
|---|---|---|---|---|---|---|
| Mixing proportion | 0.54 (0.54, 0.55) | 0.1 (0.10, 0.11) | 0.0065 (0.0064, 0.0066) | 0.037 (0.036, 0.038) | 0.0003 (0.0003, 0.00035) | 0.3 (0.3, 0.31) |
| λ̂1 | 65.7 (65.4, 66.5) | 83.8 (82.6, 84.2) | 268 (263.3, 269.4) | 92.7 (91.4, 93.1) | 214.8 (212.2, 216.3) | 4.81 (4.75, 4.84) |
| λ̂2 | 69.2 (69.2, 70.2) | 106 (105, 107) | 80.3 (79.9, 81.5) | 166.0 (165.9, 169.1) | 78.1 (77.0, 78.5) | 5.39 (5.29, 5.40) |
| Log-likelihood, H0: λ1 = λ2 | −17852 | −2074 | NA | −650 | NA | −7860 |
| Log-likelihood, H1: λ1 ≠ λ2 | −17864 | −1967 | NA | −522 | NA | −8233 |
| LRT p-value | 1 | <0.00001 | NA | <0.00001 | NA | 1 |
| No. of SNPs | 5459 | 482 | 3 | 130 | 2 | 4626 |
| No. of genes | 471 | 165 | 3 | 72 | 2 | 407 |
Only SNPs in the 3’ UTR that were classified into Poisson mixture component 1 were used to fit the folded Skellam mixture (overall log-likelihood = -34979; BIC = 70117; sample size = 10702). (λ̂1, λ̂2) is the estimate of the ordered pair (λ1, λ2). NAs indicate sample sizes insufficient for LRTs.
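A minimal sketch of the quantities behind this table, assuming reference and variant read counts are independent Poisson(λ1) and Poisson(λ2) variables and that the model is fitted to the absolute difference |X − Y| (the folded Skellam distribution). The pmf is computed by direct convolution of Poisson probabilities rather than through the Bessel-function form, only to stay within the standard library. Because the folded pmf is unchanged when λ1 and λ2 are swapped, the sketch checks that symmetry. For the LRT of λ1 = λ2 (one restriction, hence one degree of freedom) it reads −2074 (Mix2) as the equal-means log-likelihood and −1967 as the unrestricted one, and likewise for Mix1; that reading is an assumption, but it reproduces the reported p-values, since a statistic clipped at zero gives p = 1. The BIC check uses 17 parameters (6 × 2 means plus 5 free weights). All function names are hypothetical.

```python
import math

def log_poisson_pmf(k, lam):
    """log P(X = k) for X ~ Poisson(lam)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def skellam_pmf(k, lam1, lam2):
    """P(X - Y = k) for independent X ~ Poisson(lam1), Y ~ Poisson(lam2), by direct convolution."""
    y_max = int(lam2 + 12 * math.sqrt(lam2) + 30)        # mass of Y beyond y_max is negligible
    logs = [log_poisson_pmf(y + k, lam1) + log_poisson_pmf(y, lam2)
            for y in range(max(0, -k), y_max + 1)]        # need X = y + k >= 0
    if not logs:
        return 0.0
    m = max(logs)
    return math.exp(m) * sum(math.exp(v - m) for v in logs)

def folded_skellam_pmf(z, lam1, lam2):
    """P(|X - Y| = z): the Skellam masses at +z and -z are folded onto z >= 0."""
    if z == 0:
        return skellam_pmf(0, lam1, lam2)
    return skellam_pmf(z, lam1, lam2) + skellam_pmf(-z, lam1, lam2)

def lrt_pvalue_1df(loglik_equal, loglik_free):
    """LRT of lambda1 = lambda2 (one restriction): P(chi2_1 > 2 * (l_free - l_equal))."""
    stat = max(0.0, 2.0 * (loglik_free - loglik_equal))   # a non-positive statistic gives p = 1
    return math.erfc(math.sqrt(stat / 2.0))              # chi-square(1) survival function

def bic(loglik, n_params, n_obs):
    """BIC = -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)

# The folded pmf sums to one and is symmetric in (lam1, lam2); Mix1 estimates used as an example
total_mass = sum(folded_skellam_pmf(z, 65.7, 69.2) for z in range(0, 200))
swap_gap = abs(folded_skellam_pmf(5, 65.7, 69.2) - folded_skellam_pmf(5, 69.2, 65.7))
print(round(total_mass, 6), swap_gap < 1e-12)

# Log-likelihoods read as (equal means, unrestricted): an assumption about the table rows
print(lrt_pvalue_1df(-2074, -1967) < 1e-5)     # Mix2: p-value reported as < 0.00001
print(lrt_pvalue_1df(-17852, -17864))          # Mix1: p-value reported as 1

# 6 components x 2 means + 5 free weights = 17 parameters, n = 10702 SNPs
print(round(bic(-34979, 17, 10702)))           # reported BIC = 70117; the log-likelihood is rounded
```

The BIC computed from the rounded log-likelihood lands within about 1.3 of the reported 70117, which is consistent with a 17-parameter model once rounding of the log-likelihood is allowed for.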
Fig. 1 Histogram of a simulation from the folded Skellam mixture (sample size = 10⁵). Different mixture components are indicated by different colors. The two mixture components closest to zero, Mix1 and Mix6, are considered the two no AEI signal components. The right tail (>50), which has relatively small frequencies, is enlarged in the inner panel.
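A simulation like the one in Fig. 1 can be sketched by first drawing a component with the estimated mixing proportions and then drawing |X − Y| with X ~ Poisson(λ̂1) and Y ~ Poisson(λ̂2). The parameter values are the point estimates from the folded Skellam table; the seed, the sample size of 10,000 (smaller than the figure's, to keep a pure-Python run short) and the helper names are assumptions made for this sketch.

```python
import math
import random
from collections import Counter

# Point estimates (mixing proportion, lambda_hat_1, lambda_hat_2) for Mix1..Mix6 from the table
COMPONENTS = {
    "Mix1": (0.54, 65.7, 69.2),
    "Mix2": (0.1, 83.8, 106.0),
    "Mix3": (0.0065, 268.0, 80.3),
    "Mix4": (0.037, 92.7, 166.0),
    "Mix5": (0.0003, 214.8, 78.1),
    "Mix6": (0.3, 4.81, 5.39),
}

def poisson_sample(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (slow but exact for these means)."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_folded_skellam_mixture(n, rng):
    """Draw n (component, |X - Y|) pairs: pick a component, then fold a Skellam draw."""
    names = list(COMPONENTS)
    weights = [COMPONENTS[c][0] for c in names]   # rounded estimates; random.choices normalises them
    draws = []
    for name in rng.choices(names, weights=weights, k=n):
        _, lam1, lam2 = COMPONENTS[name]
        draws.append((name, abs(poisson_sample(lam1, rng) - poisson_sample(lam2, rng))))
    return draws

rng = random.Random(2015)
sample = simulate_folded_skellam_mixture(10_000, rng)
by_component = Counter(name for name, _ in sample)
tail = [value for _, value in sample if value > 50]
print(sorted(by_component.items()))
print(f"share of draws in the right tail (> 50): {len(tail) / len(sample):.3f}")
```

Keeping the component label alongside each draw is what allows a histogram to be colored by mixture component, as in the figure.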
Percentiles of absolute read ratios
| SNP category | Min | 10 % | 20 % | 30 % | 40 % | 50 % | 60 % | 70 % | 80 % | 90 % | Max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AEI signal SNPs (n = 617) | 1.14 |  | 1.71 | 1.88 | 2.08 | 2.32 |  | 3.06 | 3.67 | 4.85 | 9 |
| Remaining SNPs (n = 10,085) | 1 | 1.05 | 1.13 | 1.2 | 1.29 | 1.4 |  | 1.71 | 2 |  | 9.67 |
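Absolute read ratios were calculated as Max(reference, variant)/Min(reference, variant) from the reference and variant read counts at each SNP. The 617 AEI signal SNPs were designated according to their largest mixture probability. The remaining 10,085 SNPs comprised uncertain AEI signal SNPs and no AEI signal SNPs, which account for about 10 % and 84 % of all 10,702 SNPs, respectively (the 617 AEI signal SNPs make up the remaining ~6 %).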
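The ratio and percentile summaries can be sketched as follows. The read counts are hypothetical, and the percentile convention (linear interpolation between order statistics) is an assumption, since the note does not say which one was used; it also does not say how SNPs with zero reads on one allele were handled, so the sketch rejects them. The last lines check the SNP counts: summing the folded Skellam table's SNP counts for the components other than Mix1 and Mix6 (the two no AEI signal components in Fig. 1) gives 617, leaving 10,085, and 617 of 10,702 is about 6 %, which together with 10 % and 84 % accounts for all SNPs.

```python
import math

def absolute_read_ratio(ref_reads, var_reads):
    """Max(reference, variant) / Min(reference, variant); always >= 1."""
    lo, hi = sorted((ref_reads, var_reads))
    if lo == 0:
        # handling of zero-count alleles is not described in the source; this sketch refuses them
        raise ValueError("ratio undefined when one allele has zero reads")
    return hi / lo

def percentile(values, q):
    """q-th percentile (0-100) by linear interpolation between order statistics (assumed convention)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = min(lower + 1, len(xs) - 1)
    return xs[lower] + (xs[upper] - xs[lower]) * (pos - lower)

# Hypothetical (reference, variant) read counts for five SNPs
snps = [(30, 28), (12, 40), (55, 50), (9, 27), (100, 20)]
ratios = [absolute_read_ratio(ref, var) for ref, var in snps]
levels = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]   # Min, 10 %, ..., 90 %, Max
print([round(percentile(ratios, q), 2) for q in levels])

# SNP counts of the components other than Mix1 and Mix6, from the folded Skellam table
aei_counts = {"Mix2": 482, "Mix3": 3, "Mix4": 130, "Mix5": 2}
n_total = 10702
n_aei = sum(aei_counts.values())
print(n_aei, n_total - n_aei, round(100 * n_aei / n_total, 1))
```

Because the ratio divides the larger count by the smaller, its minimum possible value is 1 (equal allele counts), which is why the percentile table starts at 1 for the non-AEI SNPs.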