Motivation: Allele-specific expression (ASE) refers to the differential abundance of the allelic copies of a transcript. RNA sequencing (RNA-seq) can provide quantitative estimates of ASE for genes with transcribed polymorphisms. When short-read sequences are aligned to a diploid transcriptome, read-mapping ambiguities confound our ability to directly count reads. Multi-mapping reads aligning equally well to multiple genomic locations, isoforms or alleles can comprise the majority (>85%) of reads. Discarding them can result in biases and substantial loss of information. Methods have been developed that use weighted allocation of read counts but these methods treat the different types of multi-reads equivalently. We propose a hierarchical approach to allocation of read counts that first resolves ambiguities among genes, then among isoforms, and lastly between alleles. We have implemented our model in EMASE software (Expectation-Maximization for Allele Specific Expression) to estimate total gene expression, isoform usage and ASE based on this hierarchical allocation.

Results: Methods that align RNA-seq reads to a diploid transcriptome incorporating known genetic variants improve estimates of ASE and total gene expression compared to methods that use reference genome alignments. Weighted allocation methods outperform methods that discard multi-reads. Hierarchical allocation of reads improves estimation of ASE even when data are simulated from a non-hierarchical model. Analysis of RNA-seq data from F1 hybrid mice using EMASE reveals widespread ASE associated with cis-acting polymorphisms and a small number of parent-of-origin effects.

Availability and implementation: EMASE software is available at https://github.com/churchill-lab/emase.

Supplementary information: Supplementary data are available at Bioinformatics online.
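
The hierarchical allocation described in the Motivation can be illustrated with a minimal Python sketch. It is not the EMASE implementation: the function name `allocate_read`, the dictionary layouts and the abundance values are hypothetical, and the abundances are assumed to be strictly positive current estimates, as they would be within one expectation step. A single multi-read's unit count is split first among the genes it aligns to, then among isoforms within each gene, and lastly between alleles within each isoform.

```python
from collections import defaultdict


def allocate_read(alignments, gene_abund, iso_prop, allele_prop):
    """Split one read's unit count across its (gene, isoform, allele) alignments.

    Hypothetical inputs (assumptions for illustration only):
      alignments  - iterable of (gene, isoform, allele) tuples the read maps to
      gene_abund  - {gene: current gene-level abundance estimate}
      iso_prop    - {(gene, isoform): isoform proportion within the gene}
      allele_prop - {(gene, isoform, allele): allele proportion within the isoform}
    All abundance values are assumed to be strictly positive.
    """
    # Arrange the aligned targets as a gene -> isoform -> alleles tree.
    tree = defaultdict(lambda: defaultdict(set))
    for gene, iso, allele in alignments:
        tree[gene][iso].add(allele)

    weights = {}
    # Level 1: resolve ambiguity among the genes the read aligns to.
    gene_total = sum(gene_abund[g] for g in tree)
    for g, isoforms in tree.items():
        w_gene = gene_abund[g] / gene_total
        # Level 2: within a gene, resolve ambiguity among aligned isoforms.
        iso_total = sum(iso_prop[(g, i)] for i in isoforms)
        for i, alleles in isoforms.items():
            w_iso = iso_prop[(g, i)] / iso_total
            # Level 3: within an isoform, resolve ambiguity between alleles.
            allele_total = sum(allele_prop[(g, i, a)] for a in alleles)
            for a in alleles:
                w_allele = allele_prop[(g, i, a)] / allele_total
                weights[(g, i, a)] = w_gene * w_iso * w_allele
    return weights


# Illustrative values only: a read aligning to both alleles of one isoform
# of gene G1 and to one allele of an isoform of gene G2.
example = allocate_read(
    alignments=[("G1", "T1", "A"), ("G1", "T1", "B"), ("G2", "T2", "A")],
    gene_abund={"G1": 3.0, "G2": 1.0},
    iso_prop={("G1", "T1"): 0.5, ("G2", "T2"): 1.0},
    allele_prop={("G1", "T1", "A"): 0.75, ("G1", "T1", "B"): 0.25,
                 ("G2", "T2", "A"): 0.5},
)
```

Because each level renormalizes over only the targets the read actually aligns to, the weights for one read sum to one. A non-hierarchical weighted scheme would instead normalize once over all aligned targets, treating gene-level, isoform-level and allele-level ambiguity equivalently.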