Young-Jun Jeon, Md Mehedi Hasan, Hyun Woo Park, Ki Wook Lee, Balachandran Manavalan.
Abstract
Long noncoding RNAs (lncRNAs) are primarily regulated by their cellular localization, which is responsible for their molecular functions, including cell cycle regulation and genome rearrangements. Accurately identifying the subcellular location of lncRNAs from sequence information is crucial for a better understanding of their biological functions and mechanisms. Compared with traditional experimental methods, computational methods can annotate the subcellular locations of human lncRNAs more effectively. Several machine learning-based methods have been developed to identify lncRNA subcellular localization, but work on identifying the cell-specific localization of human lncRNAs remains limited. In this study, we present TACOS, the first application of a tree-based stacking approach to this task, which allows users to identify the subcellular localization of human lncRNAs in 10 different cell types. Specifically, we conducted comprehensive evaluations of six tree-based classifiers with 10 different feature descriptors, using a newly constructed balanced training dataset for each cell type. Subsequently, the strengths of the AdaBoost baseline models were integrated via a stacking approach, with an appropriate tree-based classifier used for the final prediction. TACOS displayed consistent performance in both the cross-validation and independent assessments compared with the other two approaches employed in this study. The user-friendly online TACOS web server can be accessed at https://balalab-skku.org/TACOS.
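The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the TACOS implementation: the abstract does not name the 10 feature descriptors or the final classifier, so the k-mer composition descriptor (`kmer_freq`), the synthetic two-class sequences, the random forest meta-classifier, and all parameter values below are assumptions chosen only for illustration, using scikit-learn. One AdaBoost baseline model is trained per descriptor, its out-of-fold probabilities become meta-features, and a tree-based classifier makes the final prediction.

```python
from itertools import product

import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)


def kmer_freq(seq, k):
    """Normalized k-mer composition (an assumed, illustrative descriptor)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total else counts


def random_seq(gc, length):
    """Synthetic sequence with a given GC fraction (toy data, not real lncRNAs)."""
    probs = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]  # A, C, G, T
    return "".join(rng.choice(list("ACGT"), size=length, p=probs))


def make_dataset(n_per_class, length=200):
    """Balanced toy dataset: two hypothetical localization classes (0 and 1)."""
    seqs, labels = [], []
    for label, gc in enumerate((0.35, 0.65)):
        for _ in range(n_per_class):
            seqs.append(random_seq(gc, length))
            labels.append(label)
    return seqs, np.array(labels)


def encode(seqs, k):
    return np.array([kmer_freq(s, k) for s in seqs])


train_seqs, y_train = make_dataset(60)
test_seqs, y_test = make_dataset(20)

# One AdaBoost baseline model per descriptor (here: k = 1, 2, 3).
meta_train, meta_test = [], []
for k in (1, 2, 3):
    X_tr, X_te = encode(train_seqs, k), encode(test_seqs, k)
    base = AdaBoostClassifier(n_estimators=50, random_state=0)
    # Out-of-fold probabilities keep the meta-classifier from seeing
    # predictions made on samples the baseline was trained on.
    oof = cross_val_predict(base, X_tr, y_train, cv=5, method="predict_proba")
    meta_train.append(oof[:, 1])
    base.fit(X_tr, y_train)
    meta_test.append(base.predict_proba(X_te)[:, 1])

meta_train = np.column_stack(meta_train)
meta_test = np.column_stack(meta_test)

# Tree-based meta-classifier for the final prediction (assumed choice).
meta = RandomForestClassifier(n_estimators=100, random_state=0)
meta.fit(meta_train, y_train)
accuracy = float((meta.predict(meta_test) == y_test).mean())
print(f"Independent-set accuracy on toy data: {accuracy:.2f}")
```

In this sketch the meta-classifier only ever sees one probability per baseline model, so adding a descriptor adds a single column to the meta-feature matrix. The toy classes differ in GC content purely to make the example learnable; real localization signals would be far subtler.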
Keywords: bioinformatics; feature extraction; long noncoding RNAs; sequence analysis; stacking strategy; tree-based algorithms
Year: 2022 PMID: 35753698 PMCID: PMC9294414 DOI: 10.1093/bib/bbac243
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 13.994