William H Majoros (1,2,3), Young-Sook Kim (3,4), Alejandro Barrera (3), Fan Li (5), Xingyan Wang (6), Sarah J Cunningham (7), Graham D Johnson (3,8), Cong Guo (7), William L Lowe (9), Denise M Scholtens (10), M Geoffrey Hayes (9), Timothy E Reddy (1,2,3), Andrew S Allen (1,2,3).
1. Duke Center for Statistical Genetics and Genomics, Duke University.
2. Division of Integrative Genomics, Department of Biostatistics and Bioinformatics, Duke University Medical School.
3. Center for Genomic and Computational Biology, Duke University Medical School.
4. Program in Computational Biology & Bioinformatics, Duke University, Durham, NC 27710.
5. Department of Biostatistics, Yale University, New Haven, CT 06520.
6. Masters Program in Biostatistics, Department of Biostatistics and Bioinformatics, Duke University Medical School, Durham, NC 27710.
7. University Program in Genetics and Genomics, Duke University.
8. Department of Biostatistics and Bioinformatics, Duke University, Durham, NC 27710.
9. Division of Endocrinology Metabolism and Molecular Medicine, Northwestern University Feinberg School of Medicine, Chicago.
10. Division of Biostatistics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA.
Abstract
MOTIVATION: High-throughput reporter assays dramatically improve our ability to assign function to noncoding genetic variants by measuring allelic effects on gene expression in the controlled setting of a reporter gene. Unlike genetic association tests, such assays are not confounded by linkage disequilibrium when loci are independently assayed. These methods can thus improve the identification of causal disease mutations. While work continues on improving experimental aspects of these assays, less effort has gone into developing methods for assessing the statistical significance of assay results, particularly in the case of rare variants captured from patient DNA.
RESULTS: We describe a Bayesian hierarchical model, called Bayesian Inference of Regulatory Differences, which integrates prior information and explicitly accounts for variability between experimental replicates. The model produces substantially more accurate predictions than existing methods when allele frequencies are low, which is a clear advantage in the search for disease-causing variants in DNA captured from patient cohorts. Using the model, we demonstrate a clear tradeoff between variant sequencing coverage and the number of biological replicates, and we show that the use of additional biological replicates decreases variance in estimates of effect size, due to the properties of the Poisson-binomial distribution. We also provide a power and sample size calculator, which facilitates decisions about experimental design parameters.
AVAILABILITY AND IMPLEMENTATION: The software is freely available from www.geneprediction.org/bird. The experimental design web tool can be accessed at http://67.159.92.22:8080.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
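The tradeoff between sequencing coverage and biological replicates described above can be illustrated with a small simulation. The sketch below is not the Bayesian hierarchical model from this work; it is a naive pooled odds-ratio estimator under assumed settings, intended only to show why spreading a fixed read budget over more replicates can reduce the variance of an effect-size estimate when replicates differ from one another. All names and parameter values (`dna_alt_freq`, `true_effect`, `concentration`, the 0.5 pseudocount, the Beta model for replicate variability) are illustrative assumptions, not taken from the paper. Given the per-replicate allele frequencies, the pooled RNA alternate-allele count is a sum of Bernoulli trials with non-identical success probabilities, which is the setting in which the Poisson-binomial distribution arises.

```python
import math
import random
import statistics


def binomial(n, p, rng):
    """Draw one Binomial(n, p) count as a sum of n Bernoulli trials."""
    return sum(rng.random() < p for _ in range(n))


def simulate_log_effect(n_reps, reads_per_rep, rng, dna_alt_freq=0.1,
                        true_effect=2.0, concentration=20.0):
    """Simulate one experiment; return a naive pooled log effect-size estimate.

    All default parameter values are hypothetical and chosen for illustration.
    """
    # RNA alt-allele frequency implied by the true effect on the odds scale.
    rna_odds = true_effect * dna_alt_freq / (1.0 - dna_alt_freq)
    rna_alt_freq = rna_odds / (1.0 + rna_odds)
    dna_alt = 0
    rna_alt = 0
    for _ in range(n_reps):
        # Assumed Beta model: each replicate has its own RNA allele frequency.
        q_rep = rng.betavariate(rna_alt_freq * concentration,
                                (1.0 - rna_alt_freq) * concentration)
        dna_alt += binomial(reads_per_rep, dna_alt_freq, rng)
        rna_alt += binomial(reads_per_rep, q_rep, rng)
    total = n_reps * reads_per_rep
    # Pseudocounts of 0.5 keep the odds finite when a count is zero.
    dna_odds_hat = (dna_alt + 0.5) / (total - dna_alt + 0.5)
    rna_odds_hat = (rna_alt + 0.5) / (total - rna_alt + 0.5)
    return math.log(rna_odds_hat / dna_odds_hat)


def estimate_variance(n_reps, reads_per_rep, trials=300, seed=0):
    """Variance of the log effect-size estimate across simulated experiments."""
    rng = random.Random(seed)
    estimates = [simulate_log_effect(n_reps, reads_per_rep, rng)
                 for _ in range(trials)]
    return statistics.variance(estimates)


if __name__ == "__main__":
    # Same total read budget (800 reads per library type), split differently.
    for n_reps, reads in [(1, 800), (4, 200), (8, 100)]:
        var = estimate_variance(n_reps, reads)
        print(f"replicates={n_reps} reads/replicate={reads} variance={var:.4f}")
```

Under these assumed settings, between-replicate variability dominates the sampling noise, so the variance of the estimate falls as the same read budget is divided among more replicates. With a much larger `concentration` (nearly identical replicates), the benefit of extra replicates at fixed total coverage would shrink.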