Haoyang Zeng (1), Tatsunori Hashimoto (1), Daniel D Kang (1), David K Gifford (2). (1) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA. (2) Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02142, USA, and Department of Stem Cell and Regenerative Biology, Harvard University and Harvard Medical School, Cambridge, MA 02138, USA.
Abstract
MOTIVATION: The majority of disease-associated variants identified in genome-wide association studies reside in noncoding regions of the genome with regulatory roles. Interpreting the functional consequence of a variant is therefore essential for identifying causal variants in the analysis of genome-wide association studies. RESULTS: We present GERV (generative evaluation of regulatory variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer-based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change in predicted ChIP-seq reads between the reference and alternate alleles. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor's canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting single-nucleotide polymorphisms associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked single-nucleotide polymorphisms and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis. AVAILABILITY AND IMPLEMENTATION: The implementation of GERV and related data are available at http://gerv.csail.mit.edu/.
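To illustrate the scoring idea (predict reads for each allele and take the difference), here is a minimal sketch under simplifying assumptions. It is not GERV's actual model. It assumes a hypothetical additive k-mer model: the log-rate of reads at each position is the sum of hypothetical weights for the k-mers starting within `window` bp of that position. It also omits the DNase-seq covariate and any learned spatial effect profiles. The function names, the `weights` dictionary, and the `window` parameter are all illustrative.

```python
import math


def predicted_reads(seq, weights, k, window, baseline=0.0):
    """Predicted read rate at each position of seq (hypothetical simplified model).

    The log-rate at position i is baseline plus the summed weights of every
    k-mer starting within `window` bp of i; unseen k-mers contribute 0.
    """
    n = len(seq)
    # Weight of the k-mer starting at each valid start position j
    contrib = [weights.get(seq[j:j + k], 0.0) for j in range(n - k + 1)]
    rates = []
    for i in range(n):
        lo = max(0, i - window)
        hi = min(len(contrib), i + window + 1)
        rates.append(math.exp(baseline + sum(contrib[lo:hi])))
    return rates


def variant_score(ref_seq, pos, alt, weights, k, window):
    """Change in total predicted reads when ref_seq[pos] is replaced by alt."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    ref_total = sum(predicted_reads(ref_seq, weights, k, window))
    alt_total = sum(predicted_reads(alt_seq, weights, k, window))
    return alt_total - ref_total


# Hypothetical example: a single informative 4-mer "ACGT" with weight 1.0
weights = {"ACGT": 1.0}
seq = "AAAAACGTAAAAA"
print(variant_score(seq, 5, "T", weights, k=4, window=2))  # negative: variant disrupts ACGT
print(variant_score(seq, 0, "C", weights, k=4, window=2))  # 0.0: no weighted k-mer changes
```

A negative score means that the alternate allele is predicted to reduce binding signal. Summing over the whole region, rather than scoring only the k-mers that overlap the variant, follows the abstract's description of scoring by the change in predicted ChIP-seq reads.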