Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Md Mehedi Hasan, Sho Tsukiyama, Jae Youl Cho, Hiroyuki Kurata, Md Ashad Alam, Xiaowen Liu, Balachandran Manavalan, Hong-Wen Deng.

Abstract

As one of the most prevalent post-transcriptional epigenetic modifications, N5-methylcytosine (m5C) plays an essential role in various cellular processes and disease pathogenesis. Therefore, it is important to accurately identify m5C modifications in order to gain a deeper understanding of cellular processes and other possible functional mechanisms. Although a few computational methods have been proposed, their models were developed using small training datasets, so their practical application to genome-wide detection is quite limited. To overcome these limitations, we propose Deepm5C, a bioinformatics method for identifying RNA m5C sites throughout the human genome. To develop Deepm5C, we constructed a novel benchmarking dataset and investigated a mixture of three conventional feature-encoding algorithms and a feature derived from word-embedding approaches. Afterward, four variants of deep-learning classifiers and four commonly used conventional classifiers were each trained with the four encodings, yielding 32 baseline models (8 classifiers × 4 encodings). A stacking strategy was then applied: the predicted outputs of the optimal baseline models were integrated and used to train a one-dimensional (1D) convolutional neural network. As a result, the Deepm5C predictor achieved excellent performance during cross-validation, with a Matthews correlation coefficient and an accuracy of 0.697 and 0.855, respectively. The corresponding metrics on the independent test were 0.691 and 0.852, respectively. Overall, Deepm5C achieved more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, Deepm5C is expected to assist community-wide efforts in identifying putative m5C sites and in formulating novel testable biological hypotheses.
Copyright © 2022 The American Society of Gene and Cell Therapy. Published by Elsevier Inc. All rights reserved.
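The abstract reports performance as a Matthews correlation coefficient (MCC) and an accuracy. As a reading aid, the minimal Python sketch below shows how these two metrics are conventionally computed from binary predictions. It is not the authors' evaluation code: the function names and the toy labels are hypothetical, and the labelling convention (1 = m5C site, 0 = non-site) is an assumption.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels.

    Assumed convention (not from the paper): 1 = m5C site, 0 = non-site.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels for illustration only (hypothetical data, not from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("MCC:", mcc(*counts))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative classes are imbalanced. This is one reason the two metrics are often reported together.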

Keywords:  RNA N5-methylcytosine; baseline models; bioinformatics; deep learning; epigenetic regulation; machine learning; prediction model; sequence analysis; stacking framework; systematic evaluation

Year:  2022        PMID: 35526094      PMCID: PMC9372321          DOI: 10.1016/j.ymthe.2022.05.001

Source DB:  PubMed          Journal:  Mol Ther        ISSN: 1525-0016            Impact factor:   12.910


