Shengli Zhang1, Huijuan Qiao2.
Abstract
Long non-coding RNAs (lncRNAs) are functional RNA molecules longer than 200 nucleotides that have little or no protein-coding capacity. In recent years, a growing number of studies have shown that the subcellular localization of lncRNAs offers valuable clues to their biological functions, so identifying that localization is important. In this paper, a novel statistical model named KD-KLNMF is constructed to predict lncRNA subcellular localization. First, k-mer and dinucleotide-based spatial autocorrelation features are combined into the feature vector. Then, the Synthetic Minority Over-sampling Technique (SMOTE) is used to deal with the imbalanced dataset. Next, Kullback-Leibler divergence-based nonnegative matrix factorization (KLNMF) is applied to select the optimal features. A support vector machine is then chosen as the classifier after comparison with other classifiers. Finally, the jackknife test is performed to evaluate the model. The overall accuracies reach 97.24% and 92.86% on the training dataset and the independent dataset, respectively. These results are better than those of previous methods, which indicates that our model will be a useful and feasible tool for identifying lncRNA subcellular localization. The datasets and source code are freely available at https://github.com/HuijuanQiao/KD-KLNMF.
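The k-mer part of the feature vector can be illustrated with a minimal sketch. The abstract does not state the value of k, the normalization, or how ambiguous bases are handled, so the choices below (k = 2, relative frequencies, skipping windows that contain non-ACGT characters) and the function name `kmer_frequencies` are assumptions for illustration, not the authors' implementation.

```python
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return relative k-mer frequencies, ordered lexicographically by k-mer.

    Illustrative only: k, the alphabet, and the normalization are assumptions.
    """
    # Treat RNA input as DNA-style letters so U and T are counted together.
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence, one position at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# A toy sequence; real lncRNAs are longer than 200 nucleotides.
features = kmer_frequencies("ACGUACGU", k=2)
```

For k = 2 this yields a 16-dimensional vector per sequence. In a pipeline like the one described, such vectors would be concatenated with the spatial autocorrelation features before oversampling, feature selection, and classification.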
Keywords: K-mer; KLNMF; LncRNAs subcellular localization; SMOTE; Spatial autocorrelation
Year: 2020 PMID: 33080214 DOI: 10.1016/j.ab.2020.113995
Source DB: PubMed Journal: Anal Biochem ISSN: 0003-2697 Impact factor: 3.365