Computational prediction and interpretation of cell-specific replication origin sites from multiple eukaryotes by exploiting stacking framework.

Leyi Wei, Wenjia He, Adeel Malik, Ran Su, Lizhen Cui, Balachandran Manavalan.

Abstract

Origins of replication sites (ORIs), the locations at which genomic DNA replication is initiated, play essential roles in the DNA replication process. Detecting the genome-scale distribution of ORIs is a key step toward an in-depth understanding of their regulatory mechanisms. In this study, we present a novel machine learning-based approach called Stack-ORI, which encompasses 10 cell-specific prediction models for identifying ORIs in four eukaryotic species (Homo sapiens, Mus musculus, Drosophila melanogaster and Arabidopsis thaliana). For each cell-specific model, we employed 12 feature encoding schemes that cover nucleic acid composition, position-specific information and physicochemical properties. The optimal feature set was identified for each encoding individually and used to develop the corresponding baseline model with the eXtreme Gradient Boosting (XGBoost) classifier. Subsequently, the predicted scores of the 12 baseline models were integrated into a novel feature vector to train XGBoost and develop the final model. Extensive experimental results show that Stack-ORI achieves significantly better performance than its baseline models on both training and independent datasets. Stack-ORI also consistently outperforms the existing predictor in all cell-specific models, on both the training and independent test datasets. Moreover, our approach provides interpretations that help explain the model's success by leveraging the SHapley Additive exPlanation algorithm, thus highlighting the feature encoding schemes that are most important for predicting cell-specific ORIs.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
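
The stacking idea summarized in the abstract (one baseline model per feature encoding, with the baseline scores concatenated into a new feature vector for a final model) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: the synthetic sequences, the use of k-mer composition as the only encoding family (3 encodings instead of 12), the toy `CentroidScorer` standing in for XGBoost, and the use of out-of-fold scores to build the stacked features.

```python
import math
import random
from itertools import product


def kmer_composition(seq, k):
    """Normalized k-mer frequencies: a simple nucleic acid composition encoding."""
    counts = dict.fromkeys(("".join(p) for p in product("ACGT", repeat=k)), 0)
    n_windows = max(1, len(seq) - k + 1)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:  # skip windows containing non-ACGT symbols
            counts[sub] += 1
    return [c / n_windows for c in counts.values()]


class CentroidScorer:
    """Toy stand-in for XGBoost (assumption): scores by distance to class centroids."""

    def fit(self, X, y):
        self.c1 = self._mean([x for x, t in zip(X, y) if t == 1])
        self.c0 = self._mean([x for x, t in zip(X, y) if t == 0])
        return self

    @staticmethod
    def _mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    def predict_score(self, X):
        scores = []
        for x in X:
            d1, d0 = math.dist(x, self.c1), math.dist(x, self.c0)
            # Closer to the positive centroid gives a score nearer to 1.
            scores.append(d0 / (d0 + d1) if d0 + d1 > 0 else 0.5)
        return scores


def out_of_fold_scores(X, y, n_folds=5):
    """Score each sample with a baseline model that never saw it during training."""
    scores = [0.0] * len(X)
    for fold in range(n_folds):
        test_idx = [i for i in range(len(X)) if i % n_folds == fold]
        train_idx = [i for i in range(len(X)) if i % n_folds != fold]
        model = CentroidScorer().fit([X[i] for i in train_idx],
                                     [y[i] for i in train_idx])
        for i, s in zip(test_idx, model.predict_score([X[i] for i in test_idx])):
            scores[i] = s
    return scores


def stack_features(seqs, y, encoders):
    """One baseline per encoding; its scores become one column of the new vector."""
    columns = [out_of_fold_scores([enc(s) for s in seqs], y) for enc in encoders]
    return [list(row) for row in zip(*columns)]


# Synthetic data (assumption): AT-rich "positives" and GC-rich "negatives".
random.seed(0)
labels = [i % 2 for i in range(40)]
seqs = ["".join(random.choices("ACGT",
                               weights=(4, 1, 1, 4) if t == 1 else (1, 4, 4, 1),
                               k=60))
        for t in labels]

# Three hypothetical encodings (k = 1, 2, 3) in place of the paper's 12 schemes.
encoders = [lambda s, k=k: kmer_composition(s, k) for k in (1, 2, 3)]

meta = stack_features(seqs, labels, encoders)     # 40 rows x 3 baseline scores
final_model = CentroidScorer().fit(meta, labels)  # final (meta) model
predictions = [int(s >= 0.5) for s in final_model.predict_score(meta)]
# Resubstitution accuracy on the toy data, not an estimate of real performance.
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Each row of `meta` holds one score per baseline model, so the final model learns from the baselines' outputs rather than from the raw encodings. In a realistic setting the final model would be evaluated on held-out data rather than on the same samples used to fit it.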

Keywords:  eXtreme Gradient Boosting; feature extraction; model interpretability; origin of replication site; stacking strategy

Year:  2021        PMID: 33152766     DOI: 10.1093/bib/bbaa275

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622


  19 in total

1.  Computational prediction of species-specific yeast DNA replication origin via iterative feature representation.

Authors:  Balachandran Manavalan; Shaherin Basith; Tae Hwan Shin; Gwang Lee
Journal:  Brief Bioinform       Date:  2021-07-20       Impact factor: 11.622

2.  STALLION: a stacking-based ensemble learning framework for prokaryotic lysine acetylation site prediction.

Authors:  Shaherin Basith; Gwang Lee; Balachandran Manavalan
Journal:  Brief Bioinform       Date:  2022-01-17       Impact factor: 11.622

3.  TACOS: a novel approach for accurate prediction of cell-specific long noncoding RNAs subcellular localization.

Authors:  Young-Jun Jeon; Md Mehedi Hasan; Hyun Woo Park; Ki Wook Lee; Balachandran Manavalan
Journal:  Brief Bioinform       Date:  2022-07-18       Impact factor: 13.994

4.  Deepm5C: A deep-learning-based hybrid framework for identifying human RNA N5-methylcytosine sites using a stacking strategy.

Authors:  Md Mehedi Hasan; Sho Tsukiyama; Jae Youl Cho; Hiroyuki Kurata; Md Ashad Alam; Xiaowen Liu; Balachandran Manavalan; Hong-Wen Deng
Journal:  Mol Ther       Date:  2022-05-06       Impact factor: 12.910

5.  PUP-Fuse: Prediction of Protein Pupylation Sites by Integrating Multiple Sequence Representations.

Authors:  Firda Nurul Auliah; Andi Nur Nilamyani; Watshara Shoombuatong; Md Ashad Alam; Md Mehedi Hasan; Hiroyuki Kurata
Journal:  Int J Mol Sci       Date:  2021-02-20       Impact factor: 5.923

6.  4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism.

Authors:  Rao Zeng; Song Cheng; Minghong Liao
Journal:  Front Cell Dev Biol       Date:  2021-05-10

7.  Porpoise: a new approach for accurate prediction of RNA pseudouridine sites.

Authors:  Fuyi Li; Xudong Guo; Peipei Jin; Jinxiang Chen; Dongxu Xiang; Jiangning Song; Lachlan J M Coin
Journal:  Brief Bioinform       Date:  2021-11-05       Impact factor: 13.994

8.  Assessing Dry Weight of Hemodialysis Patients via Sparse Laplacian Regularized RVFL Neural Network with L2,1-Norm.

Authors:  Xiaoyi Guo; Wei Zhou; Qun Lu; Aiyan Du; Yinghua Cai; Yijie Ding
Journal:  Biomed Res Int       Date:  2021-02-04       Impact factor: 3.411

9.  PredNTS: Improved and Robust Prediction of Nitrotyrosine Sites by Integrating Multiple Sequence Features.

Authors:  Andi Nur Nilamyani; Firda Nurul Auliah; Mohammad Ali Moni; Watshara Shoombuatong; Md Mehedi Hasan; Hiroyuki Kurata
Journal:  Int J Mol Sci       Date:  2021-03-08       Impact factor: 5.923

10.  Identification of Helicobacter pylori Membrane Proteins Using Sequence-Based Features.

Authors:  Mujiexin Liu; Hui Chen; Dong Gao; Cai-Yi Ma; Zhao-Yue Zhang
Journal:  Comput Math Methods Med       Date:  2022-01-12       Impact factor: 2.238
