Zong Miao (1,2), Marcus Alvarez (1), Päivi Pajukanta (1,2,3), Arthur Ko (1,3). (1) Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA 90024, USA. (2) Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA 90024, USA. (3) Molecular Biology Institute, UCLA, Los Angeles, CA 90024, USA.
Abstract
Motivation: Mapping bias causes preferential alignment to the reference allele and is a major obstacle in allele-specific expression (ASE) analysis. Existing methods, such as simulation and SNP-aware alignment, are either inaccurate or relatively slow. To count allelic reads for ASE analysis quickly and accurately, we developed a novel approach, ASElux, which uses personal single nucleotide polymorphism (SNP) information and counts allelic reads directly from unmapped RNA-sequence (RNA-seq) data. ASElux significantly reduces runtime by disregarding reads outside SNPs during the alignment.
Results: When compared to other tools on simulated and experimental data, ASElux achieves higher accuracy in ASE estimation than non-SNP-aware aligners and requires much less time than the benchmark SNP-aware aligner, GSNAP, with just a slight loss in performance. ASElux can process 40 million read-pairs from an RNA-seq sample and count allelic reads within 10 min, which is comparable to directly counting the allelic reads from alignments based on other tools. Furthermore, processing an RNA-seq sample using ASElux in conjunction with a general aligner, such as STAR, is more accurate and still ∼4× faster than STAR + WASP, and ∼33× faster than the leading SNP-aware aligner, GSNAP, making ASElux ideal for ASE analysis of large-scale transcriptomic studies. We applied ASElux to 273 lung RNA-seq samples from GTEx and identified a splice-QTL, rs11078928, in lung that explains the mechanism underlying an asthma GWAS SNP, rs11078927. Thus, our analysis demonstrates that ASE is a highly powerful tool complementary to cis-expression quantitative trait locus (eQTL) analysis.
Availability and implementation: The software can be downloaded from https://github.com/abl0719/ASElux.
Contact: zmiao@ucla.edu or a5ko@ucla.edu.
Supplementary information: Supplementary data are available at Bioinformatics online.
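The idea of counting allelic reads directly from unmapped reads, while ignoring reads that do not cover a SNP, can be illustrated with a toy sketch. This is not the ASElux algorithm or its code: the function names, the exact-substring matching, the toy reference sequence and the flank length are all assumptions made for illustration, and the sketch ignores reverse-complement reads, sequencing errors and paired-end information.

```python
from collections import Counter

def allele_kmers(ref_seq, snp_pos, ref, alt, flank):
    """Build one k-mer per allele, centered on a 0-based SNP position.

    Hypothetical helper for illustration only.
    """
    if ref_seq[snp_pos] != ref:
        raise ValueError("reference base does not match the SNP record")
    left = ref_seq[snp_pos - flank:snp_pos]
    right = ref_seq[snp_pos + 1:snp_pos + 1 + flank]
    return left + ref + right, left + alt + right

def count_allelic_reads(reads, ref_kmer, alt_kmer):
    """Count reads carrying exactly one allele; skip all other reads."""
    counts = Counter()
    for read in reads:
        has_ref = ref_kmer in read
        has_alt = alt_kmer in read
        if has_ref and not has_alt:
            counts["ref"] += 1
        elif has_alt and not has_ref:
            counts["alt"] += 1
        # Reads that do not span the SNP are never examined further.
    return counts["ref"], counts["alt"]

# Toy data (made up): a G>A SNP at position 16 of a short reference.
reference = "ACGTTGCAAGGCTTACGATCGGATCCTAGGCATG"
ref_kmer, alt_kmer = allele_kmers(reference, 16, "G", "A", flank=5)

reads = [
    "GGCTTACGATCGGAT",  # carries the reference allele
    "GCTTACAATCGGATC",  # carries the alternate allele
    "AGGCTTACGATCGGA",  # carries the reference allele
    "ACGTTGCAAGGCTTA",  # does not span the SNP, so it is ignored
    "CTTACAATCGGATCC",  # carries the alternate allele
]

ref_count, alt_count = count_allelic_reads(reads, ref_kmer, alt_kmer)
ref_fraction = ref_count / (ref_count + alt_count)
print(ref_count, alt_count, ref_fraction)
```

Because both alleles are matched symmetrically, neither is favoured by construction, which is the property a SNP-aware approach aims for; a real tool would also need to handle mismatches, splicing and multi-mapping reads.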