Mbulayi Onesime, Zhenyu Yang, Qi Dai.
Abstract
Genomic islands are associated with microbial adaptation and have genomic characteristics that differ from those of the host genome. Many methods have therefore been proposed to distinguish genomic islands from the rest of the genome by evaluating sequence composition. Many sequence features have been proposed, but many of them have not yet been applied to the identification of genomic islands. In this paper, we present a scheme for predicting genomic islands using the chi-square test and the random forest algorithm. We extract seven kinds of sequence features and select the important ones with the chi-square test. The selected features are then fed into a random forest to predict genomic islands. Three experiments and comparisons with existing methods show that the proposed method achieves the best performance. These findings can help in designing more powerful methods for genomic island prediction.
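As a rough illustration of the feature-selection step described in the abstract, the sketch below scores features with a chi-square statistic and keeps the highest-scoring ones. It is not the authors' implementation. The abstract does not name the seven feature types, so dinucleotide counts stand in as a hypothetical feature. The names `kmer_counts`, `chi2_scores`, and `select_top`, the toy windows, and their labels are all assumptions made for the example. The random forest step is not shown.

```python
from collections import Counter
from itertools import product

# All dinucleotides over the DNA alphabet (hypothetical feature choice).
KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence window (non-negative features)."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts.get(kmer, 0) for kmer in KMERS]


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the labels.

    Observed = per-class sum of the feature; expected = class frequency
    times the feature's total over all samples.
    """
    classes = sorted(set(y))
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        stat = 0.0
        for c in classes:
            class_prob = sum(1 for label in y if label == c) / len(y)
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_prob * total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
        scores.append(stat)
    return scores


def select_top(scores, n):
    """Indices of the n highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:n]


# Toy data (assumed): label 1 = island-like window, 0 = host-like window.
windows = ["ATATATATAT", "TATATATATA", "GCGCGCGCGC", "CGCGCGCGCG"]
labels = [0, 0, 1, 1]

X = [kmer_counts(w) for w in windows]
scores = chi2_scores(X, labels)
selected = select_top(scores, 4)
print([KMERS[j] for j in selected])
```

In a full pipeline, the selected feature columns would then be passed to a random forest classifier, for example `RandomForestClassifier` from scikit-learn.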
Year: 2021 PMID: 34122622 PMCID: PMC8169257 DOI: 10.1155/2021/9969751
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Figure 1. Comparison of the accuracy, precision, recall, F1, AUC, and MCC on the PICK108, CF15, and RGP104 datasets.
Comparison of the proposed method with other reported results on the PICK108 dataset.
| Method | Accuracy | Precision | Recall |
|---|---|---|---|
| Centroid | 82.4 | 61.4 | 27.6 |
| INDeGenIUS | 82.4 | 67.9 | 19.9 |
| MTGIpick | 86.2 | 72.8 | 47.2 |
| SigHunt | 80.5 | 51.0 | 24.0 |
| Zisland Explorer | 83.8 | 75.9 | 25.5 |
| This paper | 94.6 | 95.1 | 85.7 |
Comparison of the proposed method with other reported results on the RGP104 dataset.
| Method | MCC | F1 | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| PanRGP | 77.8 | 80.9 | 92.4 | 94.9 | 76.4 |
| IslandViewer | 76.2 | 82.0 | 91.1 | 90.8 | 78.8 |
| IslandPath | 52.3 | 57.0 | 78.1 | 89.1 | 47.7 |
| IslandCafe | 37.7 | 44.4 | 76.1 | 76.9 | 35.5 |
| SIGI-HMM | 33.8 | 45.5 | 75.6 | 65.5 | 37.6 |
| This paper | 88.8 | 94.4 | 95.6 | 94.8 | 94.0 |
Comparison of the proposed method with other reported results on the CF15 dataset.
| Method | Recall | Precision | F1 | MCC |
|---|---|---|---|---|
| IslandCafe | 71.0 | 61.0 | 66.0 | 62.0 |
| IslandViewer | 72.0 | 59.0 | 65.0 | 59.0 |
| IslandPath-Dimob | 53.0 | 67.0 | 59.0 | 55.0 |
| Zisland Explorer | 45.0 | 56.0 | 50.0 | 46.0 |
| SIGI-HMM | 24.0 | 57.0 | 33.0 | 32.0 |
| This paper | 95.4 | 95.4 | 95.4 | 90.9 |
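For reference, the threshold-based metrics in the tables above have standard definitions over a binary confusion matrix. The sketch below uses those textbook formulas. The function names and the example counts are hypothetical, and the paper's own evaluation code is not part of this record. AUC is omitted because it needs ranked prediction scores rather than counts. The last lines check that the F1 reported for the proposed method on RGP104 is consistent with its reported precision and recall.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts (as fractions)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from(precision, recall)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=80, fn=20)
print({name: round(value, 3) for name, value in metrics.items()})

# Consistency check on the RGP104 "This paper" row: precision 94.8, recall 94.0.
rgp104_f1 = round(f1_from(94.8, 94.0), 1)
print(rgp104_f1)  # 94.4, matching the F1 column
```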
Figure 2. Comparison of the overall prediction accuracies of the seven kinds of sequence features.
Figure 3. Comparison of the overall accuracies of all experiments with the selected feature sets on the three datasets.
Figure 4. Comparison of the overall accuracies of different prediction algorithms with the selected feature sets on the three datasets.