Junghwan Baek (1), Byunghan Lee (2), Sunyoung Kwon (2), Sungroh Yoon (1, 2). Affiliations: (1) Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea; (2) Electrical and Computer Engineering, Seoul National University, Seoul, Korea.
Abstract
Motivation: Long non-coding RNAs (lncRNAs) are important regulatory elements in biological processes. LncRNAs share sequence characteristics with messenger RNAs but play completely different roles, and thus provide novel insights for biological studies. The development of next-generation sequencing has helped in the discovery of lncRNA transcripts. However, experimentally verifying the large number of transcripts found in transcriptomes is time-consuming and costly. To alleviate these issues, a computational approach is needed to distinguish lncRNAs from the other transcripts in a transcriptome.
Results: We present lncRNAnet, a deep learning-based approach to identifying lncRNAs. It incorporates recurrent neural networks for RNA sequence modeling and convolutional neural networks that detect stop codons to obtain an open reading frame indicator. lncRNAnet performed clearly better than the other tools on short sequences, the length range in which most lncRNAs fall. In addition, lncRNAnet successfully learned features and showed 7.83%, 5.76%, 5.30% and 3.78% improvements over the alternatives on a human test set in terms of specificity, accuracy, F1-score and area under the curve, respectively.
Availability and implementation: Data and code are available at http://data.snu.ac.kr/pub/lncRNAnet.
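To make the idea of an open reading frame (ORF) indicator concrete, the following is a minimal rule-based sketch in Python. It is not the authors' implementation: lncRNAnet learns stop codon detection with a convolutional network, whereas this sketch simply scans the three forward reading frames for the standard stop codons and marks the longest ATG-to-stop stretch. The function names (`stop_codon_mask`, `longest_orf`, `orf_indicator`) and the choice of the longest ORF are assumptions made for illustration only.

```python
# Illustrative sketch only; names and the longest-ORF rule are assumptions,
# not taken from lncRNAnet.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def _normalize(seq):
    """Upper-case the sequence and map RNA 'U' to DNA 'T'."""
    return seq.upper().replace("U", "T")


def stop_codon_mask(seq, frame):
    """Return one 0/1 flag per codon in the given forward frame (0, 1 or 2)."""
    seq = _normalize(seq)
    return [1 if seq[i:i + 3] in STOP_CODONS else 0
            for i in range(frame, len(seq) - 2, 3)]


def longest_orf(seq):
    """Find the longest ATG..stop stretch over the three forward frames.

    Returns (frame, start, end) with end exclusive and including the stop
    codon, or None if no complete ORF exists.
    """
    seq = _normalize(seq)
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if best is None or end - start > best[2] - best[1]:
                    best = (frame, start, end)
                start = None  # look for the next ORF in this frame
    return best


def orf_indicator(seq):
    """Per-nucleotide 0/1 vector that marks the longest ORF, if any."""
    flags = [0] * len(seq)
    orf = longest_orf(seq)
    if orf is not None:
        _, start, end = orf
        flags[start:end] = [1] * (end - start)
    return flags
```

A per-position vector like the one returned by `orf_indicator` could be supplied alongside the encoded nucleotides as an extra input channel to a sequence model; how lncRNAnet actually combines the two is described in the paper, not in this sketch.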