Improved recognition of splice sites in A. thaliana by incorporating secondary structure information into sequence-derived features: a computational study.

Prabina Kumar Meher, Subhrajit Satpathy.

Abstract

Identifying splice sites is important for predicting gene structure. Most existing splice site prediction studies have successfully combined machine learning algorithms with sequence-derived features. However, splice site identification that incorporates secondary structure information is lacking, particularly in plant species. In this study, we therefore evaluated the effect of structural features on splice site prediction accuracy in Arabidopsis thaliana. Prediction accuracies were evaluated with the sequence-derived features alone and with the structural features added to the sequence-derived features, using a support vector machine (SVM) as the prediction algorithm. Both short (40 base pairs) and long (105 base pairs) sequence datasets were considered. After incorporating the secondary structure features, accuracy improved only for the longer sequence dataset, and the improvement was larger for the sequence-derived features that accounted for nucleotide dependencies. For the short sequence dataset, little or no improvement in accuracy was found. The performance of SVM was further compared with that of the LogitBoost, Random Forest (RF), AdaBoost and XGBoost machine learning methods. The prediction accuracies of SVM, AdaBoost and XGBoost were on par with one another and higher than those of the RF and LogitBoost algorithms. When all the sequence-derived features were used together with the structural features, accuracy improved only slightly compared with combinations of individual sequence-based features and structural features. To the best of our knowledge, this is the first attempt at computational prediction of splice sites using machine learning methods that incorporate secondary structure information into the sequence-derived features. All the source codes are available at https://github.com/meher861982/SSFeature.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s13205-021-03036-8.

© King Abdulaziz City for Science and Technology 2021.
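
The idea of appending structural features to sequence-derived features can be illustrated with a small sketch. The abstract does not name the specific features used, so everything here is an assumption made for illustration: dinucleotide frequencies stand in for a sequence-derived feature that accounts for nucleotide dependencies, and two simple summaries of a dot-bracket string (fraction of paired bases and a count of runs of opening brackets) stand in for structural features. The dot-bracket string is assumed to come from an external RNA folding tool, and the function names are hypothetical. The authors' actual implementation is in the repository linked above.

```python
from itertools import product

NUCLEOTIDES = "ACGT"
# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]


def dinucleotide_frequencies(seq):
    """Relative frequency of each overlapping dinucleotide in seq."""
    seq = seq.upper()
    counts = {d: 0 for d in DINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs with ambiguous bases such as N
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCLEOTIDES]


def structure_features(dot_bracket):
    """Illustrative structural summary of a dot-bracket string.

    Returns [fraction of paired positions, number of runs of '('].
    These two features are assumptions, not the authors' feature set.
    """
    n = len(dot_bracket)
    paired = sum(ch in "()" for ch in dot_bracket)
    runs = 0
    prev = "."
    for ch in dot_bracket:
        if ch == "(" and prev != "(":
            runs += 1
        prev = ch
    return [paired / n if n else 0.0, float(runs)]


def combined_features(seq, dot_bracket):
    """Concatenate sequence-derived and structural features."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    return dinucleotide_frequencies(seq) + structure_features(dot_bracket)


if __name__ == "__main__":
    # Toy window and structure; real inputs would be 40 bp or 105 bp windows.
    vector = combined_features("GGGAAACCC", "(((...)))")
    print(len(vector), vector)
```

A vector built this way could be passed to any of the classifiers named in the abstract. Comparing a model trained on `dinucleotide_frequencies` alone against one trained on `combined_features` mirrors the with-and-without-structure comparison the study describes.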

Keywords:  Computational biology; Machine learning; Nucleotide dependencies; Secondary structure; Splice junction

Year:  2021        PMID: 34790508      PMCID: PMC8558126          DOI: 10.1007/s13205-021-03036-8

Source DB:  PubMed          Journal:  3 Biotech        ISSN: 2190-5738            Impact factor:   2.406

