Md Rafsan Jani, Md Toha Khan Mozlish, Sajid Ahmed, Niger Sultana Tahniat, Dewan Md Farid, Swakkhar Shatabda.
Abstract
Meiotic recombination plays an important role in genetic evolution. It introduces genetic variation, is a vital source of biodiversity, and acts as a driving force in evolutionary development. Local regions of chromosomes where recombination events tend to be concentrated are known as hotspots, and regions with relatively low recombination frequencies are called coldspots. Predicting hotspots and coldspots can shed light on the structure of recombination and on genome evolution. In this paper, we propose a predictor, called iRecSpot-EF, to predict recombination hotspots and coldspots. iRecSpot-EF uses a novel set of features extracted from the genome sequences: the frequencies of (l,k,p)-mers in the sequence. The proposed feature extraction method relies solely on the nucleotide sequences, which makes it cost-effective and robust. After feature extraction, the most informative features are selected using the AdaBoost algorithm, and logistic regression is used as the classification algorithm. iRecSpot-EF was tested on a standard benchmark dataset using cross-validation. It achieved an accuracy of 95.14% and an area under the Receiver Operating Characteristic curve (auROC) of 0.985. The performance of iRecSpot-EF is significantly better than that of the state-of-the-art methods. iRecSpot-EF is readily available for use at http://iRecSpot.pythonanywhere.com/server. All relevant code is available in an open repository at https://github.com/mrzResearchArena/iRecSpot.
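The abstract does not define the (l,k,p)-mer features, so the sketch below illustrates only the general idea of sequence-derived frequency features, using plain contiguous k-mer frequencies as a stand-in. The function name `kmer_frequencies`, its parameters, and the example sequence are assumptions made for illustration and are not taken from iRecSpot-EF.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=2, alphabet="ACGT"):
    """Return the relative frequency of every possible k-mer in a DNA sequence.

    Hypothetical helper: a simplified stand-in for the paper's (l,k,p)-mer
    features, which the abstract does not define.
    """
    seq = sequence.upper()
    # Enumerate all possible k-mers so the feature vector has a fixed layout.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Normalise only over windows made of valid alphabet symbols.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


# Illustrative sequence, not benchmark data.
features = kmer_frequencies("ACGTACGT", k=2)
```

A fixed-length vector of this kind depends only on the nucleotide sequence, which is the property the abstract highlights. Such vectors could then be passed to a feature-selection step and a classifier, which the abstract names as AdaBoost and logistic regression respectively.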
Keywords: Classification; DNA sequences; Feature selection; Novel feature extraction; Recombination hotspots; Web application
Year: 2018 PMID: 30336361 DOI: 10.1016/j.compbiomed.2018.10.005
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589