Hailin Hu (1), An Xiao (2), Sai Zhang (3), Yangyang Li (4), Xuanling Shi (4), Tao Jiang (5,6,7), Linqi Zhang (4), Lei Zhang (1), Jianyang Zeng (2)
1. School of Medicine, Tsinghua University, Beijing, China.
2. Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
3. Department of Genetics, Stanford Center for Genomics and Personalized Medicine, Stanford University School of Medicine, Stanford, CA, USA.
4. Comprehensive AIDS Research Center, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, School of Life Sciences and School of Medicine, Tsinghua University, Beijing, China.
5. Department of Computer Science and Engineering, University of California, Riverside, CA, USA.
6. Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China.
7. Institute of Integrative Genome Biology, University of California, Riverside, CA, USA.
Abstract
MOTIVATION: Human immunodeficiency virus type 1 (HIV-1) genome integration is closely related to clinical latency and viral rebound. In addition to human DNA sequences that directly interact with the integration machinery, the selection of HIV integration sites has also been shown to depend on the heterogeneous genomic context across a large surrounding region, which greatly hinders the prediction and mechanistic study of HIV integration.
RESULTS: We have developed an attention-based deep learning framework, named DeepHINT, to simultaneously provide accurate prediction of HIV integration sites and mechanistic explanations of the detected sites. Extensive tests on a high-density HIV integration site dataset showed that DeepHINT can outperform conventional modeling strategies by automatically learning the genomic context of HIV integration from primary DNA sequence alone or together with epigenetic information. Systematic analyses of diverse known factors of HIV integration further validated the biological relevance of the prediction results. More importantly, in-depth analyses of the attention values output by DeepHINT revealed intriguing mechanistic implications for the selection of HIV integration sites, including potential roles of several DNA-binding proteins. These results establish DeepHINT as an effective and explainable deep learning framework for the prediction and mechanistic study of HIV integration.
AVAILABILITY AND IMPLEMENTATION: DeepHINT is available as open-source software and can be downloaded from https://github.com/nonnerdling/DeepHINT.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
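To make the idea of "attention values" over a DNA sequence concrete, the following is a minimal sketch of one-hot sequence encoding followed by softmax attention pooling over positions. It is an illustration only and is not DeepHINT's actual architecture: the function names `one_hot` and `attention_pool`, the random weight vector `w`, and the toy sequence are all assumptions introduced here, and the real model learns its features and attention parameters during training.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as an (L, 4) array; bases outside ACGT (e.g. N) become all-zero rows."""
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            out[i, j] = 1.0
    return out

def attention_pool(features, w):
    """Softmax attention over sequence positions (hypothetical helper).

    features: (L, d) per-position features; w: (d,) scoring vector.
    Returns the (L,) attention weights and the (d,) attention-weighted summary.
    """
    scores = features @ w
    scores = scores - scores.max()          # subtract the max for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ features             # weighted sum of position features
    return weights, pooled

# Toy example: random scoring vector stands in for learned parameters.
rng = np.random.default_rng(0)
x = one_hot("ACGTNACGGT")
w = rng.normal(size=4)
weights, pooled = attention_pool(x, w)
```

In a sketch like this, positions with larger entries in `weights` contribute more to the pooled summary, which is the general sense in which attention values can point to sequence positions worth inspecting.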