
Feature selection for RNA cleavage efficiency at specific sites using the LASSO regression model in Arabidopsis thaliana.

Daishin Ueno, Harunori Kawabe, Shotaro Yamasaki, Taku Demura, Ko Kato.

Abstract

BACKGROUND: RNA degradation is important for the regulation of gene expression. Despite the identification of proteins and sequences related to deadenylation-dependent RNA degradation in plants, endonucleolytic cleavage-dependent RNA degradation has not been studied in detail. Here, we developed truncated RNA end sequencing in Arabidopsis thaliana to identify cleavage sites and evaluate the efficiency of cleavage at each site. Although several features are related to RNA cleavage efficiency, the effect of each feature on cleavage efficiency has not been evaluated by considering multiple putative determinants in A. thaliana.
RESULTS: Cleavage site information was acquired from a previous study, and cleavage efficiency at the site level (CSsite value), which indicates the number of reads at each cleavage site normalized to RNA abundance, was calculated. To identify features related to cleavage efficiency at the site level, multiple putative determinants (features) were used to perform feature selection using the Least Absolute Shrinkage and Selection Operator (LASSO) regression model. The results indicated that whole RNA features were important for the CSsite value, in addition to features around cleavage sites. Whole RNA features related to the translation process and nucleotide frequency around cleavage sites were major determinants of cleavage efficiency. The results were verified in a model constructed using only sequence features, which showed that the prediction accuracy was similar to that determined using all features including the translation process, suggesting that cleavage efficiency can be predicted using only sequence information. The LASSO regression model was validated in exogenous genes, which showed that the model constructed using only sequence information can predict cleavage efficiency in both endogenous and exogenous genes.
CONCLUSIONS: Feature selection using the LASSO regression model in A. thaliana identified 155 features. Correlation coefficients revealed that whole RNA features are important for determining cleavage efficiency in addition to features around the cleavage sites. The LASSO regression model can predict cleavage efficiency in endogenous and exogenous genes using only sequence information. The model revealed the significance of the effect of multiple determinants on cleavage efficiency, suggesting that sequence features are important for RNA degradation mechanisms in A. thaliana.
© 2021. The Author(s).
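
As an illustration of the feature-selection idea described in the abstract, the following is a minimal sketch of LASSO regression fitted by cyclic coordinate descent. It does not reproduce the authors' pipeline: the data are synthetic, the six features and the response are hypothetical stand-ins for the putative determinants and the CSsite value, and the penalty `alpha=0.2` is an arbitrary choice. The point is only to show how the L1 penalty sets the coefficients of uninformative features exactly to zero, which is what makes LASSO usable as a feature selector.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within +/- threshold becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def standardise_columns(X):
    """Return a copy of X (list of rows) with zero-mean, unit-variance columns."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / std if std > 0 else 0.0
    return out


def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xw||^2 + alpha*||w||_1 (no intercept).

    Assumes the columns of X are standardised and y is centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                w[j] = new_w
    return w


# Synthetic example (hypothetical data, not from the study):
# only features 0 and 3 influence the response.
rng = random.Random(0)
n_sites, n_features = 200, 6
X_raw = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_sites)]
y_raw = [2.0 * row[0] - 1.5 * row[3] + rng.gauss(0, 0.1) for row in X_raw]

X = standardise_columns(X_raw)
y_mean = sum(y_raw) / n_sites
y = [v - y_mean for v in y_raw]

weights = lasso_coordinate_descent(X, y, alpha=0.2)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print("selected feature indices:", selected)
```

In practice the penalty strength would be tuned, for example by cross-validation, and a library implementation would normally be preferred over a hand-written solver; the sketch is kept dependency-free only so that the shrinkage step is visible.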

Keywords: Degradome sequencing; LASSO; RNA degradation

Year: 2021; PMID: 34294042; DOI: 10.1186/s12859-021-04291-5

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169


  2 in total

1.  Changes in Polysome Association of mRNA Throughout Growth and Development in Arabidopsis thaliana.

Authors:  Shotaro Yamasaki; Hideyuki Matsuura; Taku Demura; Ko Kato
Journal:  Plant Cell Physiol       Date:  2015-09-26       Impact factor: 4.927

2.  i6mA-Fuse: improved and robust prediction of DNA 6 mA sites in the Rosaceae genome by fusing multiple feature representation.

Authors:  Md Mehedi Hasan; Balachandran Manavalan; Watshara Shoombuatong; Mst Shamima Khatun; Hiroyuki Kurata
Journal:  Plant Mol Biol       Date:  2020-03-05       Impact factor: 4.076

  3 in total

1.  Identification of a Novel Glycosyltransferase Prognostic Signature in Hepatocellular Carcinoma Based on LASSO Algorithm.

Authors:  Zhiyang Zhou; Tao Wang; Yao Du; Junping Deng; Ge Gao; Jiangnan Zhang
Journal:  Front Genet       Date:  2022-03-09       Impact factor: 4.599

2.  RSNET: inferring gene regulatory networks by a redundancy silencing and network enhancement technique.

Authors:  Xiaohan Jiang; Xiujun Zhang
Journal:  BMC Bioinformatics       Date:  2022-05-06       Impact factor: 3.307

3.  Glycosyltransferase-related long non-coding RNA signature predicts the prognosis of colon adenocarcinoma.

Authors:  Jiawei Zhang; Yinan Wu; Jiayi Mu; Dijia Xin; Luyao Wang; Yili Fan; Suzhan Zhang; Yang Xu
Journal:  Front Oncol       Date:  2022-09-20       Impact factor: 5.738

