Faegheh Golabi, Elnaz Mehdizadeh Aghdam, Mousa Shamsi, Mohammad Hossein Sedaaghi, Abolfazl Barzegar, Mohammad Saeid Hejazi.
Abstract
Introduction: Riboswitches are short regulatory elements generally found in the untranslated regions of prokaryotic mRNAs and are classified into several families. Because riboswitches can bind antibiotics, can serve as engineered regulatory elements, and are of evolutionary interest, the need for bioinformatics tools for riboswitch detection is increasing. We have previously introduced an alignment-independent algorithm for identifying frequent sequential blocks in riboswitch families. Herein, we report the application of a block location-based feature extraction strategy (BLBFE), which uses the locations of the detected blocks on riboswitch sequences as features for classifying seed sequences. In addition, mono- and dinucleotide frequencies, k-mer, DAC, DCC, DACC, PC-PseDNC-General and SC-PseDNC-General methods were investigated as other feature extraction strategies.
Keywords: BLBFE; Block-finding algorithm; Classification; Feature extraction; Riboswitches
Year: 2020; PMID: 33842280; PMCID: PMC8022236; DOI: 10.34172/bi.2021.17
Source DB: PubMed; Journal: Bioimpacts; ISSN: 2228-5652
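The abstract names mono- and dinucleotide frequencies and k-mer composition among the compared feature extraction strategies. The following is a minimal sketch of such a frequency vector, not the authors' implementation: the function name `kmer_frequencies`, the overlapping-window counting, the normalization to relative frequencies, and the toy sequence are all assumptions made for illustration.

```python
from itertools import product


def kmer_frequencies(seq, k):
    """Return relative frequencies of all 4**k RNA k-mers, in lexicographic order.

    Assumption: overlapping windows are counted and windows containing
    characters outside A/C/G/U (for example N) are skipped.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


example = "GGAUGACCGUCCC"  # hypothetical toy sequence, not taken from Rfam
mono = kmer_frequencies(example, 1)  # 4 features, order A, C, G, U
di = kmer_frequencies(example, 2)    # 16 features, order AA, AC, ..., UU
```

With k = 1 and k = 2 this gives the mono- and dinucleotide frequency vectors; larger k yields the general k-mer representation at the cost of 4^k features.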
The seed data related to five families of riboswitches employed in this study, obtained from the Rfam 12.0 database
| Rfam accession | Riboswitch family | Number of seed sequences |
|---|---|---|
| RF00050 | FMN | 144 |
| RF00522 | PreQ1 | 41 |
| RF00167 | Purine | 133 |
| RF00162 | SAM | 433 |
| RF00059 | TPP | 115 |
The frequent blocks identified for the five riboswitch families using the block-finding algorithm.
| Frequent blocks | Riboswitch family |
|---|---|
| ACCG, CCGAC, CGGU, GGAUG, GGGC, GGUG, UCCC | FMN |
| AAAAAACUA, CCC, GGUUC | PreQ1 |
| UAUA, UCUACC | Purine |
| AGA, AUC, GAGGGA, GCAACC, GCCC, GUGC | SAM |
| ACCUG, CUGAGA, GGG | TPP |
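BLBFE is described as using the locations of the detected blocks on each sequence as classification features. A minimal sketch of one way to build such a feature vector from the blocks listed above follows. The encoding is an assumption, since the source does not specify it here: each block contributes the 1-based position of its first occurrence, with 0 when the block is absent. The function name `block_location_features` and the toy sequence are hypothetical.

```python
# Frequent blocks per family, copied from the table above.
BLOCKS = {
    "FMN": ["ACCG", "CCGAC", "CGGU", "GGAUG", "GGGC", "GGUG", "UCCC"],
    "PreQ1": ["AAAAAACUA", "CCC", "GGUUC"],
    "Purine": ["UAUA", "UCUACC"],
    "SAM": ["AGA", "AUC", "GAGGGA", "GCAACC", "GCCC", "GUGC"],
    "TPP": ["ACCUG", "CUGAGA", "GGG"],
}


def block_location_features(seq, blocks=BLOCKS, missing=0):
    """Return one location feature per block (21 in total for the table above).

    Assumption: a feature is the 1-based index of the block's first
    occurrence in `seq`, or `missing` if the block does not occur.
    """
    seq = seq.upper().replace("T", "U")
    features = []
    for family in blocks:  # dict order is insertion order
        for block in blocks[family]:
            pos = seq.find(block)  # -1 when the block is absent
            features.append(pos + 1 if pos >= 0 else missing)
    return features


toy = "GGAUGACCGUCCC"  # hypothetical toy sequence, not taken from Rfam
feats = block_location_features(toy)
```

The resulting fixed-length vectors can be passed to any standard classifier. Other encodings, such as positions normalized by sequence length, would fit the same description equally well.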
Fig. 1
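Fig. 2
The confusion matrix for the decision tree classifier when using the BLBFE method
| Actual class | Predicted FMN | Predicted PreQ1 | Predicted Purine | Predicted SAM | Predicted TPP | TP | TN | FP | FN |
|---|---|---|---|---|---|---|---|---|---|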
| FMN | 141 | 2 | 0 | 0 | 1 | 141 | 675 | 14 | 3 |
| PreQ1 | 1 | 36 | 3 | 1 | 0 | 36 | 780 | 8 | 5 |
| Purine | 6 | 5 | 112 | 6 | 5 | 112 | 704 | 6 | 21 |
| SAM | 0 | 1 | 1 | 426 | 5 | 426 | 390 | 12 | 7 |
| TPP | 7 | 0 | 2 | 5 | 101 | 101 | 715 | 10 | 14 |
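The last four columns of each row appear to be per-class counts derived from the 5 × 5 matrix. A minimal sketch of that derivation follows, assuming rows are actual classes and columns are predicted classes (an assumption that reproduces the tabulated counts for FMN). The key `other_correct` is a hypothetical name for the number of correctly classified sequences from the other classes, which is what the fourth-from-last column equals in every row.

```python
CLASSES = ["FMN", "PreQ1", "Purine", "SAM", "TPP"]

# Matrix values copied from the table above (rows: actual, columns: predicted).
CONFUSION = [
    [141, 2, 0, 0, 1],
    [1, 36, 3, 1, 0],
    [6, 5, 112, 6, 5],
    [0, 1, 1, 426, 5],
    [7, 0, 2, 5, 101],
]


def per_class_counts(matrix):
    """Derive TP, FP and FN for each class from a square confusion matrix."""
    n = len(matrix)
    correct = sum(matrix[i][i] for i in range(n))  # all diagonal entries
    rows = []
    for i in range(n):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                          # rest of the row
        fp = sum(matrix[r][i] for r in range(n)) - tp     # rest of the column
        rows.append({"TP": tp, "FP": fp, "FN": fn, "other_correct": correct - tp})
    return rows


counts = per_class_counts(CONFUSION)
for name, c in zip(CLASSES, counts):
    sensitivity = c["TP"] / (c["TP"] + c["FN"])
    precision = c["TP"] / (c["TP"] + c["FP"])
    print(f"{name}: {c} sensitivity={sensitivity:.3f} precision={precision:.3f}")
```

Recomputed this way, the counts match the tabulated values for FMN, PreQ1 and SAM. The Purine FN (22 versus 21) and the TPP FP (11 versus 10) each differ by one, which would be consistent with a one-count slip in the Purine/TPP cell of the matrix as printed.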
Performance measures for 8 feature extraction methods used in 4 classifiers
| Decision Tree |  | 68.94 | 73.09 | 62.36 | 69.63 | 68.13 | 66.74 | 69.4 |
| KNN |  | 84.18 | 85.22 | 81.64 | 86.61 | 86.72 | 83.14 | 87.3 |
| LDA |  | 84.18 | 84.3 | 85.68 | 92.61 | 94.69 | 87.07 | 89.38 |
| Naïve Bayes | 63.97 | 83.49 | 83.6 | 74.13 |  | 83.49 | 76.91 | 82.45 |

| Decision Tree |  | 85.22 | 87.52 | 81.27 | 85.54 | 84.6 | 83.87 | 85.39 |
| KNN |  | 93.22 | 93.72 | 91.94 | 94.27 | 94.31 | 92.7 | 94.64 |
| LDA |  | 93.23 | 93.3 | 93.88 | 96.94 | 97.83 | 94.54 | 95.54 |
| Naïve Bayes | 82.72 | 92.87 | 92.97 | 88.24 |  | 92.79 | 89.58 | 92.42 |

| Decision Tree |  | 62.49 | 68.79 | 55.22 | 62.81 | 58.59 | 60.85 | 62.85 |
| KNN |  | 80.68 | 81.29 | 76.43 | 82.27 | 82.58 | 79.45 | 83.68 |
| LDA | 93.75 | 85.44 | 86.09 | 86.96 | 92.21 |  | 89.25 | 90.33 |
| Naïve Bayes | 59.66 | 80.22 |  | 65.02 | 78.77 | 77.1 | 71.48 | 77.36 |

| Decision Tree |  | 88.15 | 90.44 | 85.06 | 88.68 | 87.83 | 87.2 | 88.71 |
| KNN |  | 94.43 | 94.7 | 93.46 | 95.51 | 95.63 | 94.24 | 95.56 |
| LDA |  | 95.38 | 95.41 | 95.85 | 97.84 | 98.48 | 96.39 | 97.04 |
| Naïve Bayes | 85.57 | 94.05 | 94.14 | 89.94 |  | 94.1 | 91.49 | 93.63 |

| Decision Tree |  | 62.89 | 68.35 | 54.69 | 63.53 | 59.64 | 60.19 | 62.6 |
| KNN |  | 81.88 | 83.25 | 78.6 | 84.03 | 83.98 | 80.64 | 85.2 |
| LDA | 93.85 | 83.81 | 84.33 | 85.28 | 92.09 |  | 87.32 | 88.23 |
| Naïve Bayes | 60.69 | 82.14 |  | 68.12 | 81.86 | 79.48 | 72.36 | 79.05 |
The highest value in each row is bolded.
Fig. 3
Fig. 4