Helena Kilpinen, Alan Hodgkinson, Anna Saukkonen.
Abstract
Analysis of allele-specific gene expression (ASE) is a powerful approach for studying gene regulation, particularly when sample sizes are small, such as for rare diseases, or when studying the effects of rare genetic variation. However, detection of ASE events relies on accurate alignment of RNA sequencing reads, where challenges still remain, particularly for reads containing genetic variants or those that align to many different genomic locations. We have developed the Personalised ASE Caller (PAC), a tool that combines multiple steps to improve the quantification of allelic reads, including personalized (i.e., diploid) read alignment with improved allocation of multimapping reads. Using simulated RNA sequencing data, we show that PAC outperforms standard alignment approaches for ASE detection, reducing the number of sites with incorrect biases (>10%) by ∼80% and increasing the number of sites that can be reliably quantified by ∼3%. Applying PAC to real RNA sequencing data from 670 whole-blood samples, we show that genetic regulatory signatures inferred from ASE data more closely match those from population-based methods that are less prone to alignment biases. Finally, we use PAC to characterize cell type-specific ASE events that would be missed by standard alignment approaches, and in doing so identify disease-relevant genes that may modulate their effects through the regulation of gene expression. PAC can be applied to the vast quantity of existing RNA sequencing data sets to better understand a wide array of fundamental biological and disease processes.
Year: 2022 PMID: 35794008 PMCID: PMC9435737 DOI: 10.1101/gr.276296.121
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.438
Figure 1. Overview of the PAC pipeline. (A) A schematic describing the main steps, features, and outputs of PAC. (B) Correlation of reference allele ratios (RARs) between the three different methods (standard alignment, WASP-filtered alignment, PAC) and the ground truth data. Genome-wide Pearson correlation coefficients (R²) are shown (P < 0.05 for all comparisons). (C) Site-level summary statistics for the different analysis methods. Statistics are reported for sites with at least 20× coverage in all three methods. Panel A was created with BioRender (https://biorender.com).
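The quantities named in this caption can be illustrated with a minimal sketch. It assumes the reference allele ratio at a heterozygous site is ref / (ref + alt) read counts, applies the 20× coverage threshold from the caption, and treats a site as "biased" when its ratio differs from the ground truth by more than 0.10 in absolute terms. That last reading of the abstract's ">10%" figure is an assumption, as are the function names and the read counts, which are invented for illustration and are not PAC output.

```python
from math import sqrt

MIN_COVERAGE = 20  # coverage threshold quoted in the caption


def reference_allele_ratio(ref_reads, alt_reads):
    """Return ref / (ref + alt), or None when the site is below MIN_COVERAGE."""
    total = ref_reads + alt_reads
    if total < MIN_COVERAGE:
        return None
    return ref_reads / total


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (ref, alt) read counts per heterozygous site.
observed = [(30, 10), (12, 12), (5, 5), (18, 22)]
truth = [(28, 12), (8, 16), (6, 4), (20, 20)]

# Keep only sites that pass the coverage filter in both data sets.
pairs = []
for (obs_ref, obs_alt), (true_ref, true_alt) in zip(observed, truth):
    obs_rar = reference_allele_ratio(obs_ref, obs_alt)
    true_rar = reference_allele_ratio(true_ref, true_alt)
    if obs_rar is not None and true_rar is not None:
        pairs.append((obs_rar, true_rar))

r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
# Assumed definition of a biased site: |observed RAR - true RAR| > 0.10.
biased = sum(abs(o - t) > 0.10 for o, t in pairs)
print(f"sites kept: {len(pairs)}, Pearson r: {r:.3f}, biased sites: {biased}")
```

Squaring `r` gives the R² value that the caption reports. Requiring the coverage filter to pass in every method before comparing, as the caption describes, keeps the summary statistics on a shared set of sites.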
Figure 2. Performance of PAC compared to other methods. (A) Genome-wide correlation of reference allele ratios at heterozygous sites that PAC and standard alignment detect but that are discarded by WASP-filtering (Pearson's correlation R² = 0.956, P = 2.6 × 10⁻²⁶⁶). Sites with at least 20× coverage were considered. (B) The difference in reference allele ratio of sites that are within 500 bp of an at least 6-bp indel, within 25 bp of another variant or a rare (MAF < 1%) variant in different analyses against the ground truth. Sites shared between all methods and with at least 20× coverage were considered. A Mann–Whitney U test was performed with Bonferroni correction to adjust for multiple testing. (****) P ≤ 1 × 10⁻⁴, (**) 1.00 × 10⁻³ < P ≤ 1.00 × 10⁻², and stars above each box plot refer to the comparison against PAC. (C) Correlation of allelic fold change (aFC) values derived from ASE and eQTL analyses from 670 GTEx whole-blood samples. Genes with a significant eQTL (Q-value < 5%) and gene-level ASE information for at least 10 individuals were selected. Pearson correlation coefficients are shown for eQTL versus ASE aFCs derived using PAC (see also Supplemental Fig. 3).
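As a rough illustration of panel C, the sketch below derives a gene-level allelic fold change from ASE counts. It assumes the common convention aFC = log2(alternative / reference) allelic expression, adds a pseudocount to avoid division by zero, and summarises a gene by the median across individuals, keeping only genes with at least 10 individuals as in the caption. The sign convention, the pseudocount, the median summary, and the function names are all assumptions made for illustration; this record does not state the estimator actually used.

```python
from math import log2
from statistics import median

MIN_INDIVIDUALS = 10  # minimum individuals with gene-level ASE data (from the caption)


def allelic_fold_change(ref_count, alt_count, pseudocount=1.0):
    """log2 ratio of alternative to reference allelic counts (assumed convention)."""
    return log2((alt_count + pseudocount) / (ref_count + pseudocount))


def gene_level_afc(per_individual_counts):
    """Median aFC across individuals, or None if too few individuals have data."""
    if len(per_individual_counts) < MIN_INDIVIDUALS:
        return None
    return median(allelic_fold_change(ref, alt) for ref, alt in per_individual_counts)


# Hypothetical gene-level (ref, alt) haplotype counts, one tuple per individual.
gene_counts = [(9, 39)] * 10
print(gene_level_afc(gene_counts))
```

Values produced this way for each gene could then be correlated with eQTL-derived aFCs, which is the comparison the panel reports.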