Abstract
Clusters of genes acquired by lateral gene transfer in microbial genomes are broadly referred to as genomic islands (GIs). GIs often carry genes important for genome evolution and adaptation to niches, such as genes involved in pathogenesis and antibiotic resistance. Therefore, GI prediction has gradually become an important part of microbial genome analysis. Despite the inherent difficulties in identifying GIs, many computational methods have been developed and show good performance. In this mini-review, we first summarize the general challenges in predicting GIs. We then group existing GI detection methods by their input, briefly describe representative methods in each group, and discuss their advantages and limitations. Finally, we consider potential improvements for better GI prediction.
Keywords: Comparative genomics; Genome segmentation; Outlier detection; Pathogenicity islands; Sequence composition
Year: 2016 PMID: 27293536 PMCID: PMC4887561 DOI: 10.1016/j.csbj.2016.05.001
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Schematic representation of several GI-associated features. A GI is often absent from closely related genomes. It may also have atypical compositional characteristics compared with the core genome, such as lower GC content. The presence of several sequence elements is indicative of a GI: flanking conserved regions, direct repeats (DRs), insertion sequence (IS) elements, and mobility-related genes encoding integrase and transposase.
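The compositional signal described in the caption can be illustrated with a toy sliding-window scan. The following is a minimal sketch, not the method of any program listed in this review: the function names, the window and step sizes, and the `max_diff` threshold are all hypothetical choices made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def atypical_gc_windows(genome, window=1000, step=500, max_diff=0.1):
    """Return (start, end, gc) for windows whose GC content differs from the
    genome-wide GC content by more than max_diff.

    Coordinates are 0-based and half-open. The threshold is an illustrative
    assumption, not a value taken from the literature.
    """
    genome_gc = gc_content(genome)
    flagged = []
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        end = min(start + window, len(genome))
        win_gc = gc_content(genome[start:end])
        if abs(win_gc - genome_gc) > max_diff:
            flagged.append((start, end, win_gc))
    return flagged


# Toy genome: GC-rich background with a short AT-rich insert in the middle.
toy_genome = "GC" * 50 + "AT" * 10 + "GC" * 50
print(atypical_gc_windows(toy_genome, window=20, step=20, max_diff=0.1))
```

On real genomes a single GC threshold is a weak signal on its own, which is why the caption lists it alongside structural features such as flanking repeats and mobility genes.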
The available datasets related to genomic islands.
| Name | Feature | Availability |
|---|---|---|
| PAIDB | The only database including most reported PAIs and REIs | |
| Islander | Intended to be gold standard dataset for accurately mapped GIs | |
| ICEberg | Providing comprehensive information about ICEs | |
| RVM datasets | 331 GIs and 337 non-GIs from 37 bacteria of 3 genera | Not available |
| IslandPick datasets | 771 GIs and 3770 non-GIs from 118 bacteria of 12 orders | |
Fig. 2. Hierarchical overview of the computational methods for predicting genomic islands that are discussed in this paper.
The summary of selected programs for predicting genomic islands.
| Program | Form | Availability |
|---|---|---|
| PAI-IDA | Command line | Upon request |
| SIGI-HMM | Graphical interface | |
| Window-based methods | | |
| AlienHunter | Command line | |
| Centroid | Command line | Upon request |
| Design-Island | Command line | |
| INDeGenIUS | Command line | Upon request |
| GI-SVM | Command line | |
| Windowless methods | | |
| GC Profile | Web-based | |
| MJSD | Command line | |
| Direct integration methods | | |
| IslandPath | Web-based | |
| Machine learning methods | | |
| GIDetector | Command line | |
| GIHunter | Command line | |
| tRNAcc | Web-based | |
| IslandPick | Command line | |
| IslandViewer | Web-based | |
| EGID | Command line | |
| GIST | Graphical interface | |
| PredictBias | Web-based | |
| PIPS | Command line | |
| GI-POP | Web-based | |
Comparison of selected programs for predicting genomic islands on the S. typhi CT18 genome.
| Program | Category | Recall | Precision | F1 |
|---|---|---|---|---|
| GI-SVM | Methods based on DNA composition | 0.895 | 0.446 | 0.596 |
| EGID | Ensemble methods | 0.779 | 0.535 | 0.634 |
| SIGI-HMM | Methods based on gene composition | 0.241 | 0.556 | 0.337 |
| IslandViewer | Ensemble methods | 0.654 | 0.670 | 0.662 |
| GIHunter | Methods based on GI structure | 0.827 | 0.676 | 0.744 |
| IslandPath-DIMOB | Methods based on GI structure | 0.553 | 0.788 | 0.650 |
| tRNAcc | Methods based on several genomes | 0.286 | 0.993 | 0.444 |
| IslandPick | Methods based on several genomes | 0.060 | 1.000 | 0.114 |
The evaluations were based on 19 reference GIs obtained from [39], excluding two GIs smaller than 5 kb. The predictions of each program were either downloaded from the corresponding website (IslandViewer, including the predictions from SIGI-HMM, IslandPath-DIMOB, and IslandPick; tRNAcc; GIHunter) or obtained by running the program on a local machine with optimal parameters (GI-SVM, EGID). The evaluation metrics (Recall, Precision, F1) were computed as in [36]. All the relevant data and scripts can be found at https://github.com/icelu/GI_Prediction.
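For readers who want to check the F1 column or score their own predictions, the sketch below shows one common way to compute these metrics. It assumes that recall and precision are measured at the nucleotide level from the overlap between predicted and reference intervals; the exact definition used in [36] is not given here, so this is an assumption, and the function names are hypothetical rather than taken from the linked repository.

```python
def f1_score(recall, precision):
    """Harmonic mean of recall and precision (0.0 when both are zero)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def overlap_bp(a, b):
    """Base pairs shared by two lists of inclusive (start, end) intervals.

    Assumes the intervals within each list do not overlap one another.
    """
    return sum(
        max(0, min(e1, e2) - max(s1, s2) + 1)
        for s1, e1 in a
        for s2, e2 in b
    )


def evaluate(predicted, reference):
    """Return (recall, precision, F1) under the nucleotide-overlap assumption."""
    shared = overlap_bp(predicted, reference)
    pred_len = sum(e - s + 1 for s, e in predicted)
    ref_len = sum(e - s + 1 for s, e in reference)
    recall = shared / ref_len if ref_len else 0.0
    precision = shared / pred_len if pred_len else 0.0
    return recall, precision, f1_score(recall, precision)


# Made-up coordinates: two 100 bp reference islands, one 100 bp prediction
# that covers the second half of the first island.
reference = [(100, 199), (500, 599)]
predicted = [(150, 249)]
print(evaluate(predicted, reference))

# Consistency check against the GIHunter row of the table.
print(round(f1_score(0.827, 0.676), 3))
```

Recomputing F1 from the rounded recall and precision in the table can differ from the reported value in the third decimal place, presumably because the reported values were computed before rounding.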