Zhong Zhuang (1), Xiaotong Shen (2), Wei Pan (3)
(1) Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA.
(2) School of Statistics, University of Minnesota, Minneapolis, MN, USA.
(3) Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA.
Abstract
MOTIVATION: Enhancer-promoter interactions (EPIs) in the genome play an important role in transcriptional regulation. EPIs can be useful in boosting statistical power and enhancing mechanistic interpretation for disease- or trait-associated genetic variants in genome-wide association studies. Compared with expensive and time-consuming biological experiments, computational prediction of EPIs from DNA sequence and other genomic data is a fast and viable alternative. In particular, deep learning and other machine learning methods have shown promising performance.

RESULTS: First, using a published human cell line dataset, we demonstrate that a simple convolutional neural network (CNN) performs as well as, if not better than, a more complicated state-of-the-art architecture, a hybrid of a CNN and a recurrent neural network. More importantly, although EPIs (and the corresponding gene expression) are well known to be cell line-specific, and in contrast to the standard practice of training and predicting for each cell line separately, we propose two transfer learning approaches that train a model using all cell lines to various extents, leading to substantially improved predictive performance.

AVAILABILITY AND IMPLEMENTATION: Computer code is available at https://github.com/zzUMN/Combine-CNN-Enhancer-and-Promoters.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.