Fu-Ying Dao, Hao Lv, Hasan Zulfiqar, Hui Yang, Wei Su, Hui Gao, Hui Ding, Hao Lin.
Abstract
The locations at which genomic DNA replication is initiated are defined as origins of replication sites (ORIs). ORIs regulate the onset of DNA replication and play significant roles in the replication process. The study of ORIs is essential for understanding the cell-division cycle and the regulation of gene expression. Accurate identification of ORIs by computational methods will provide important clues for DNA replication research and drug development. In this paper, the first integrated predictor, named iORI-Euk, was built to identify ORIs in multiple eukaryotes and multiple cell types. ORI data for seven eukaryotes (Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana, Pichia pastoris, Schizosaccharomyces pombe and Kluyveromyces lactis) were collected from public databases to construct benchmark datasets. Subsequently, three feature extraction strategies (k-mer, binary encoding, and a combination of k-mer and binary encoding) were used to represent the DNA sequence samples. We also compared the performance of different classification algorithms. The best results were obtained with a support vector machine in both the 5-fold cross-validation test and the independent dataset test. Based on the optimal model, an online web server called iORI-Euk (http://lin-group.cn/server/iORI-Euk/) was established for identifying novel ORIs.
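The three feature extraction strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the value of k, the normalization, or the exact binary code, so the choices below (k = 2, frequencies normalized by the number of valid windows, a 4-bit one-hot code per nucleotide) and all function names are assumptions made for illustration, not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGT"

# Assumed one-hot code; the paper's actual binary mapping is not given in the abstract.
ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def kmer_frequencies(seq, k=2):
    """Return normalized frequencies of all 4**k k-mers in lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def binary_encoding(seq):
    """Concatenate a 4-bit one-hot vector per base; unknown bases map to all zeros."""
    vec = []
    for base in seq.upper():
        vec.extend(ONE_HOT.get(base, [0, 0, 0, 0]))
    return vec


def combined_features(seq, k=2):
    """Concatenate the k-mer frequency vector and the binary encoding."""
    return kmer_frequencies(seq, k) + binary_encoding(seq)
```

Note that the binary encoding grows with sequence length, so it only yields fixed-length feature vectors when all samples have the same length; the k-mer vector always has 4**k entries.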
Keywords: classification algorithm; eukaryote; feature extraction; origins of replication site; webserver
Year: 2021 PMID: 32065211 DOI: 10.1093/bib/bbaa017
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622