Jiuxing Liang1, Zifeng Cui2, Canbiao Wu1, Yao Yu3,4, Rui Tian5, Hongxian Xie6, Zhuang Jin2, Weiwen Fan2, Weiling Xie2, Zhaoyue Huang2, Wei Xu2, Jingjing Zhu2, Zeshan You2, Xiaofang Guo7, Xiaofan Qiu1, Jiahao Ye1,8, Bin Lang9, Mengyuan Li2, Songwei Tan10, Zheng Hu2,11.
1. Key Laboratory of Brain, Cognition and Education Sciences, Ministry of Education, China; Institute for Brain Research and Rehabilitation, South China Normal University, Guangzhou 510631, China.
2. Department of Gynaecological Oncology, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, Guangdong, China.
3. Department of Urology, The First Medical Center of Chinese PLA General Hospital, Beijing 100853, China.
4. School of Medicine, Nankai University, Tianjin 300071, China.
5. Center for Translational Medicine, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510080, Guangdong, China.
6. STech Company Bio-X Lab, Zhuhai 519000, Guangdong, China.
7. Department of Medical Oncology of the Eastern Hospital, The First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510700, China.
8. School of Computer Science, South China Normal University, Guangzhou 510631, China.
9. School of Health Sciences and Sports, Macao Polytechnic Institute, China.
10. School of Pharmacy, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China.
11. Department of Obstetrics and Gynaecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, Hubei, China.
Abstract
MOTIVATION: Epstein-Barr virus (EBV) is one of the most prevalent oncogenic DNA viruses. The integration of EBV into the host genome has been reported to play an important role in cancer development. EBV integration preference depends strongly on the local genomic environment, which makes it possible to predict EBV integration sites.
RESULTS: An attention-based deep learning model, DeepEBV, was developed to predict EBV integration sites by automatically learning local genomic features. First, DeepEBV was trained and tested using data from the dsVIS database. The results showed that DeepEBV trained on EBV integration sequences plus Repeat peaks with 2-fold data augmentation performed best on the training dataset. The performance of the model was then validated on an independent dataset. In addition, the motifs of DNA-binding proteins could influence the selection preference of viral insertional mutagenesis. Finally, the results showed that DeepEBV can accurately predict EBV integration hotspot genes. In summary, DeepEBV is a robust, accurate and explainable deep learning model that provides novel insights into EBV integration preferences and mechanisms.
AVAILABILITY: DeepEBV is available as open-source software and can be downloaded from https://github.com/JiuxingLiang/DeepEBV.git
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.