Cynthia A Kalita¹, Gregory A Moyerbrailean¹, Christopher Brown², Xiaoquan Wen³, Francesca Luca¹,⁴, Roger Pique-Regi¹,⁴
1. Center for Molecular Medicine and Genetics, School of Medicine, Wayne State University, Detroit, MI 48201, USA.
2. Department of Genetics, University of Pennsylvania, Philadelphia, PA 19104, USA.
3. Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA.
4. Department of Obstetrics and Gynecology, Wayne State University, Detroit, MI 48201, USA.
Abstract
Motivation: The majority of the human genome is composed of non-coding regions containing regulatory elements such as enhancers, which are crucial for controlling gene expression. Many variants associated with complex traits lie in these regions and may disrupt gene regulatory sequences. Consequently, it is important not only to identify true enhancers but also to test whether a variant within an enhancer affects gene regulation. Recently, allele-specific analysis in high-throughput reporter assays, such as massively parallel reporter assays (MPRAs), has been used to functionally validate non-coding variants. However, high-quality, robust data analysis tools for these datasets are still lacking.
Results: We have further developed our method for allele-specific analysis, QuASAR (quantitative allele-specific analysis of reads), to analyze allele-specific signals in barcoded read-count data from MPRAs. This approach accounts for uncertainty in the original plasmid proportions, over-dispersion, and sequencing errors. The allelic skew estimate and its standard error provided by the method also simplify meta-analysis of replicate experiments. Additionally, we show that a beta-binomial distribution better models the variability in the allelic imbalance of these synthetic reporters and yields a test that is statistically well calibrated under the null. Applying this approach to the MPRA data, we found 602 SNPs with significant (false discovery rate 10%) allele-specific regulatory function in lymphoblastoid cell lines (LCLs). We also show that MPRA data combined with QuASAR estimates can be used to validate existing experimental and computational annotations of regulatory variants. Our study shows that appropriate data analysis tools improve the power to detect allelic effects in high-throughput reporter assays.
Availability and implementation: http://github.com/piquelab/QuASAR/tree/master/mpra.
Contact: fluca@wayne.edu or rpique@wayne.edu.
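The Results paragraph names three nuisance factors (plasmid proportions, over-dispersion, sequencing error) and a beta-binomial model that yields an allelic skew estimate with a standard error. The following is a minimal sketch of how such a per-SNP test could look; it is not QuASAR's implementation. It assumes that the skew β shifts the plasmid (DNA) reference fraction r on the logit scale, that sequencing error flips an allele at a fixed symmetric rate ε, and that the beta-binomial precision M is known rather than estimated. The function names and defaults (`eps=0.001`, `M=100`) are hypothetical.

```python
import math


def log_betabinom(k, n, a, b):
    """Log PMF of a beta-binomial BB(n, a, b) evaluated at k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + math.lgamma(k + a) + math.lgamma(n - k + b) - math.lgamma(n + a + b)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


def expected_ref_fraction(beta, r, eps):
    # Hypothetical parameterisation: shift the plasmid reference proportion r
    # by the allelic skew beta on the logit scale...
    rho = 1.0 / (1.0 + math.exp(-(math.log(r / (1.0 - r)) + beta)))
    # ...then let each read report the wrong allele with symmetric rate eps.
    return rho * (1.0 - eps) + (1.0 - rho) * eps


def log_lik(beta, ref, alt, r, eps, M):
    # Beta-binomial with mean p and precision M (larger M = less over-dispersion).
    p = expected_ref_fraction(beta, r, eps)
    return log_betabinom(ref, ref + alt, p * M, (1.0 - p) * M)


def fit_allelic_skew(ref, alt, r, eps=0.001, M=100.0, lo=-10.0, hi=10.0, tol=1e-8):
    """Return (beta_hat, standard error, two-sided Wald p-value) for one SNP."""

    def f(beta):
        return log_lik(beta, ref, alt, r, eps, M)

    # Golden-section search for the maximum of the log-likelihood on [lo, hi].
    g = (math.sqrt(5.0) - 1.0) / 2.0
    left, right = lo, hi
    x1 = right - g * (right - left)
    x2 = left + g * (right - left)
    while right - left > tol:
        if f(x1) > f(x2):  # maximum lies in [left, x2]
            right, x2 = x2, x1
            x1 = right - g * (right - left)
        else:  # maximum lies in [x1, right]
            left, x1 = x1, x2
            x2 = left + g * (right - left)
    beta_hat = 0.5 * (left + right)

    # Observed information from a central finite-difference second derivative.
    h = 1e-4
    curvature = (f(beta_hat + h) - 2.0 * f(beta_hat) + f(beta_hat - h)) / h ** 2
    se = 1.0 / math.sqrt(-curvature)
    p_value = math.erfc(abs(beta_hat / se) / math.sqrt(2.0))
    return beta_hat, se, p_value
```

For example, 80 reference and 20 alternate RNA reads give a skew of roughly 1.4 on the logit scale when the plasmid pool is balanced (`r=0.5`), but a skew near zero when the plasmid pool itself was 80% reference (`r=0.8`). This is why the plasmid proportion has to enter the model rather than being assumed to be 50:50.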
Supplementary information: Supplementary data are available online at Bioinformatics.
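Because each SNP comes with an estimate and a standard error, replicate experiments can be pooled by fixed-effect inverse-variance weighting, and the resulting p-values can then be thresholded at a 10% false discovery rate. The sketch below assumes independent replicates and uses the Benjamini-Hochberg procedure as a stand-in; the abstract does not state which FDR method the authors applied, and the helper names are hypothetical.

```python
import math


def meta_analyze(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis of replicate skew estimates.

    Returns (pooled estimate, pooled standard error, two-sided Wald p-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))
    return beta, se, p_value


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q
```

A SNP would then be called significant at a 10% FDR when its q-value is below 0.1; with p-values `[0.01, 0.04, 0.03, 0.5]`, the first three pass and the last does not.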