Yupeng Wang 1,2, Rosario B Jaime-Lara 3,4, Abhrarup Roy 4, Ying Sun 5, Xinyue Liu 5, Paule V Joseph 6,7.
1. BDX Research and Consulting LLC, Herndon, VA, 20171, USA. ywang@bdxconsult.com.
2. Division of Intramural Research, National Institute of Nursing Research, National Institutes of Health, Bethesda, MD, 20892, USA. ywang@bdxconsult.com.
3. Division of Intramural Clinical and Biological Research (DICBR), National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, 20892, USA.
4. Division of Intramural Research, National Institute of Nursing Research, National Institutes of Health, Bethesda, MD, 20892, USA.
5. BDX Research and Consulting LLC, Herndon, VA, 20171, USA.
6. Division of Intramural Clinical and Biological Research (DICBR), National Institute on Alcohol Abuse and Alcoholism, National Institutes of Health, Bethesda, MD, 20892, USA. paule.joseph@nih.gov.
7. Division of Intramural Research, National Institute of Nursing Research, National Institutes of Health, Bethesda, MD, 20892, USA. paule.joseph@nih.gov.
Abstract
OBJECTIVE: To address the challenge of computationally identifying cell type-specific regulatory elements on a genome-wide scale. RESULTS: We propose SeqEnhDL, a deep learning framework for classifying cell type-specific enhancers based on sequence features. DNA sequences of "strong enhancer" chromatin states in nine cell types from the ENCODE project were retrieved to build and test enhancer classifiers. For any DNA sequence, the features used by the deep learning models were positional k-mer (k = 5, 7, 9 and 11) fold changes, computed at each nucleotide position relative to randomly selected non-coding sequences. Three deep learning models were implemented: a multi-layer perceptron (MLP), a convolutional neural network (CNN) and a recurrent neural network (RNN). All models in SeqEnhDL outperform state-of-the-art enhancer classifiers (including gkm-SVM and DanQ) in distinguishing cell type-specific enhancers from randomly selected non-coding sequences. Moreover, SeqEnhDL can directly discriminate between enhancers from different cell types, which other enhancer classifiers have not achieved. Our analysis suggests that both enhancers and their tissue specificity can be accurately identified based on their sequence features. SeqEnhDL is publicly available at https://github.com/wyp1125/SeqEnhDL.
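The positional k-mer fold-change features described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: a fold change is taken as the log2 ratio of pseudocount-smoothed k-mer frequencies in enhancer sequences versus background non-coding sequences; each nucleotide position takes the value of the k-mer centered on it (natural for the odd k values used); and positions whose window runs off the sequence are padded with 0. The function names `fold_change_table` and `positional_features` are hypothetical and are not part of SeqEnhDL's code.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count every k-mer made only of A/C/G/T across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def fold_change_table(fg_seqs, bg_seqs, k, pseudocount=1.0):
    """Hypothetical helper: log2 fold change of each k-mer, enhancer vs background.

    Returns (table, default), where default is the value for a k-mer seen
    in neither set (with equal pseudocounts it works out to 0.0).
    """
    fg, bg = kmer_counts(fg_seqs, k), kmer_counts(bg_seqs, k)
    fg_total, bg_total = sum(fg.values()), sum(bg.values())
    n_kmers = 4 ** k  # size of the k-mer vocabulary, used for smoothing

    def log2_fc(fg_count, bg_count):
        p_fg = (fg_count + pseudocount) / (fg_total + pseudocount * n_kmers)
        p_bg = (bg_count + pseudocount) / (bg_total + pseudocount * n_kmers)
        return math.log2(p_fg / p_bg)

    table = {kmer: log2_fc(fg[kmer], bg[kmer]) for kmer in set(fg) | set(bg)}
    return table, log2_fc(0, 0)


def positional_features(seq, tables):
    """Hypothetical helper: one row per nucleotide, one column per k (sorted).

    `tables` maps k -> (table, default) as returned by fold_change_table.
    """
    seq = seq.upper()
    rows = []
    for pos in range(len(seq)):
        row = []
        for k, (table, default) in sorted(tables.items()):
            start = pos - k // 2  # k-mer centered on this position
            if start < 0 or start + k > len(seq):
                row.append(0.0)  # window runs off the sequence: neutral padding
            else:
                row.append(table.get(seq[start:start + k], default))
        rows.append(row)
    return rows
```

Under these assumptions, a sequence of length L yields an L × 4 matrix for k = 5, 7, 9 and 11. It could be flattened as input to an MLP or treated as a four-channel 1-D signal for a CNN or RNN.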
Keywords:
Cell type; Classification; DNA sequence; Deep learning; Enhancer