Ernest Turro [1], William J Astle, Simon Tavaré.
[1] Cancer Research UK Cambridge Institute, University of Cambridge, Robinson Way, Cambridge CB2 0RE, UK; Department of Haematology, University of Cambridge, NHS Blood and Transplant, Long Road, Cambridge CB2 0PT, UK; and Department of Epidemiology, Biostatistics and Occupational Health, McGill University, 1020 Pine Avenue West, Montreal QC H3A 1A2, Canada.
Abstract
MOTIVATION: Most methods for estimating differential expression from RNA-seq are based on statistics that compare normalized read counts between treatment classes. Unfortunately, reads are in general too short to be mapped unambiguously to features of interest, such as genes, isoforms or haplotype-specific isoforms. There are methods for estimating expression levels that account for this source of ambiguity. However, the resulting uncertainty is not generally accounted for in downstream analysis of gene expression experiments. Moreover, at the individual transcript level, the uncertainty can sometimes be too large to allow useful comparisons between treatment groups.

RESULTS: In this article, we make two proposals that improve the power, specificity and versatility of expression analysis using RNA-seq data. First, we present a Bayesian method for model selection that accounts for read mapping ambiguities using random effects. This polytomous model selection approach can be used to identify many interesting patterns of gene expression and is not confined to detecting differential expression between two groups. For illustration, we use our method to detect imprinting, different types of regulatory divergence in cis and in trans, and differential isoform usage, but many other applications are possible. Second, we present a novel collapsing algorithm for grouping transcripts into inferential units that exploits the posterior correlation between transcript expression levels. The aggregate expression levels of these units can be estimated with useful levels of uncertainty. Our algorithm can improve the precision of expression estimates when uncertainty is large, with only a small reduction in biological resolution.

AVAILABILITY AND IMPLEMENTATION: We have implemented our software in the mmdiff and mmcollapse multithreaded C++ programs as part of the open-source MMSEQ package, available at https://github.com/eturro/mmseq.
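The intuition behind collapsing transcripts into inferential units can be illustrated with a toy simulation. The sketch below is not the mmcollapse algorithm and does not use MMSEQ output. It assumes a hypothetical pair of transcripts whose combined expression is well determined but whose split is not, so that their simulated posterior draws are strongly negatively correlated. All names and numbers (`simulate_draws`, the mean of 100, the share range) are illustrative assumptions.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def simulate_draws(n=5000, seed=0):
    """Toy 'posterior draws' for two transcripts sharing ambiguous reads.

    Assumption: the total expression is tightly determined, while the
    share attributed to each transcript is poorly determined.
    """
    rng = random.Random(seed)
    t1, t2 = [], []
    for _ in range(n):
        total = rng.gauss(100.0, 2.0)  # well-determined combined expression
        share = rng.uniform(0.2, 0.8)  # poorly determined split
        t1.append(total * share)
        t2.append(total * (1.0 - share))
    return t1, t2


t1, t2 = simulate_draws()
unit = [a + b for a, b in zip(t1, t2)]  # aggregate of the collapsed pair

corr = pearson(t1, t2)
sd_t1 = statistics.stdev(t1)
sd_t2 = statistics.stdev(t2)
sd_unit = statistics.stdev(unit)

print(f"correlation between transcripts: {corr:.3f}")
print(f"sd transcript 1: {sd_t1:.2f}, sd transcript 2: {sd_t2:.2f}")
print(f"sd of aggregated unit: {sd_unit:.2f}")
```

Because Var(a + b) = Var(a) + Var(b) + 2 Cov(a, b), a strongly negative covariance makes the aggregate far more precise than either component, which is the trade of a little biological resolution for precision that the abstract describes.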