
Characterization and identification of long non-coding RNAs based on feature relationship.

Guangyu Wang, Hongyan Yin, Boyang Li, Chunlei Yu, Fan Wang, Xingjian Xu, Jiabao Cao, Yiming Bao, Liguo Wang, Amir A Abbasi, Vladimir B Bajic, Lina Ma, Zhang Zhang.

Abstract

MOTIVATION: The significance of long non-coding RNAs (lncRNAs) in many biological processes and diseases has attracted intense interest over the past several years. However, computational identification of lncRNAs across a wide range of species remains challenging: it requires either prior knowledge of well-established sequences and annotations or species-specific training data, yet only a limited number of species have high-quality sequences and annotations.
RESULTS: Here we first characterize lncRNAs in contrast to protein-coding RNAs based on feature relationship and find that the relationship between open reading frame (ORF) length and guanine-cytosine (GC) content diverges substantially and universally between lncRNAs and protein-coding RNAs, as observed in a broad variety of species. Based on this feature relationship, we further present LGC, a novel algorithm for identifying lncRNAs that accurately distinguishes lncRNAs from protein-coding RNAs in a cross-species manner without any prior knowledge. As validated on large-scale empirical datasets, comparative results show that LGC outperforms existing algorithms, achieving higher accuracy and well-balanced sensitivity and specificity, and is robustly effective (>90% accuracy) in discriminating lncRNAs from protein-coding RNAs across diverse species that range from plants to mammals. To our knowledge, this study is the first to differentially characterize lncRNAs and protein-coding RNAs based on feature relationship and to apply that relationship to the computational identification of lncRNAs. Taken together, our study represents a significant advance in the characterization and identification of lncRNAs, and LGC thus bears broad potential utility for computational analysis of lncRNAs in a wide range of species.
AVAILABILITY AND IMPLEMENTATION: LGC web server is publicly available at http://bigd.big.ac.cn/lgc/calculator. The scripts and data can be downloaded at http://bigd.big.ac.cn/biocode/tools/BT000004.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
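
The RESULTS paragraph above rests on two per-transcript features, ORF length and GC content. As a rough illustration only, the following Python sketch computes both for a single sequence. It is not the LGC algorithm: the abstract does not state how LGC defines an ORF or how it combines the two features, so the ORF rule used here (first ATG to the next in-frame stop codon, forward strand, three frames) and the function names `gc_content` and `longest_orf_length` are assumptions made for this example.

```python
# Illustrative feature extraction only; NOT the LGC scoring method.
# Assumption: an ORF runs from an ATG to the next in-frame stop codon
# on the forward strand, and its length includes the stop codon.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_orf_length(seq: str) -> int:
    """Return the length in nucleotides of the longest ATG...stop ORF.

    All three forward reading frames are scanned. RNA input is accepted
    by converting U to T. Returns 0 if no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


if __name__ == "__main__":
    transcript = "GGATGAAACCCTAGGG"  # toy sequence, not real data
    print(longest_orf_length(transcript), gc_content(transcript))
```

Computing these two values for every transcript in a set of annotated lncRNAs and protein-coding RNAs would give the kind of paired measurements on which a feature relationship could be examined; the classification step itself is left to the published LGC scripts.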

Year:  2019        PMID: 30649200     DOI: 10.1093/bioinformatics/btz008

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  19 in total

1.  Database Resources of the National Genomics Data Center in 2020.

Authors: 
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971

2.  Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2022.

Authors: 
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

3.  A survey of transcriptome complexity using full-length isoform sequencing in the tea plant Camellia sinensis.

Authors:  Dongna Ma; Jingping Fang; Qiansu Ding; Liufeng Wei; Yiying Li; Liwen Zhang; Xingtan Zhang
Journal:  Mol Genet Genomics       Date:  2022-06-28       Impact factor: 2.980

4.  Integrated SMRT and Illumina Sequencing Provide New Insights into Crocin Biosynthesis of Gardenia jasminoides.

Authors:  Tengfei Shen; Yongjie Zheng; Qian Liu; Caihui Chen; Lili Huang; Shaoyong Deng; Meng Xu; Chunxia Yang
Journal:  Int J Mol Sci       Date:  2022-06-05       Impact factor: 6.208

5.  Computational Analysis Predicts Hundreds of Coding lncRNAs in Zebrafish.

Authors:  Shital Kumar Mishra; Han Wang
Journal:  Biology (Basel)       Date:  2021-04-26

6.  Identification of a novel anthocyanin synthesis pathway in the fungus Aspergillus sydowii H-1.

Authors:  Congfan Bu; Qian Zhang; Jie Zeng; Xiyue Cao; Zhaonan Hao; Dairong Qiao; Yi Cao; Hui Xu
Journal:  BMC Genomics       Date:  2020-01-08       Impact factor: 3.969

7.  IC4R-2.0: Rice Genome Reannotation Using Massive RNA-seq Data.

Authors:  Jian Sang; Dong Zou; Zhennan Wang; Fan Wang; Yuansheng Zhang; Lin Xia; Zhaohua Li; Lina Ma; Mengwei Li; Bingxiang Xu; Xiaonan Liu; Shuangyang Wu; Lin Liu; Guangyi Niu; Man Li; Yingfeng Luo; Songnian Hu; Lili Hao; Zhang Zhang
Journal:  Genomics Proteomics Bioinformatics       Date:  2020-07-16       Impact factor: 7.691

8.  lncRNADetector: a bioinformatics pipeline for long non-coding RNA identification and MAPslnc: a repository of medicinal and aromatic plant lncRNAs.

Authors:  Bhaskar Shukla; Sanchita Gupta; Gaurava Srivastava; Ashok Sharma; Ashutosh K Shukla; Ajit K Shasany
Journal:  RNA Biol       Date:  2021-03-18       Impact factor: 4.652

9.  Genome-wide profiling of long noncoding RNAs involved in wheat spike development.

Authors:  Pei Cao; Wenjuan Fan; Pengjia Li; Yuxin Hu
Journal:  BMC Genomics       Date:  2021-07-02       Impact factor: 3.969

10.  LncExpDB: an expression database of human long non-coding RNAs.

Authors:  Zhao Li; Lin Liu; Shuai Jiang; Qianpeng Li; Changrui Feng; Qiang Du; Dong Zou; Jingfa Xiao; Zhang Zhang; Lina Ma
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971
