Claire Bertelli, Keith E Tilley, Fiona S L Brinkman.
Abstract
Horizontal gene transfer (also called lateral gene transfer) is a major mechanism for microbial genome evolution, enabling rapid adaptation and survival in specific niches. Genomic islands (GIs), commonly defined as clusters of bacterial or archaeal genes of probable horizontal origin, are of particular medical, environmental and/or industrial interest, as they disproportionately encode virulence factors and some antimicrobial resistance genes and may harbor entire metabolic pathways that confer a specific adaptation (solvent resistance, symbiosis properties, etc.). As large-scale analyses of microbial genomes increase, such as for genomic epidemiology investigations of infectious disease outbreaks in public health, there is increased appreciation of the need to accurately predict and track GIs. Over the past decade, numerous computational tools have been developed to tackle the challenges inherent in accurate GI prediction. We review here the main types of GI prediction methods and discuss their advantages and limitations for a routine analysis of microbial genomes in this era of rapid whole-genome sequencing. An assessment is provided of 20 GI prediction software methods that use sequence-composition bias to identify the GIs, using a reference GI data set from 104 genomes obtained using an independent comparative genomics approach. Finally, we present guidelines to assist researchers in effectively identifying these key genomic regions.
Keywords: antimicrobial resistance; genomic island; horizontal gene transfer; interactive visualization; microbial genomics
Year: 2019 PMID: 29868902 PMCID: PMC6917214 DOI: 10.1093/bib/bby042
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
GI prediction tools listed by descending year of last publication
| Predictors | Interface: command line (C), GUI (G), webserver (W), database (D) | Genome: complete (C) / draft (D) | Input file | Year | Link |
|---|---|---|---|---|---|
| IslandPath-DIMOB | C (W D) | C | gbk or embl | 2018, 2005 | |
| IslandViewer | C W D | C / D | gbk or embl | 2017, 2009 | |
| VRprofile | W D | C / D | gbk or fna | 2017 | |
| MTGIpick | G W | C / D | fna | 2016 | |
| Zisland Explorer | C G W D | C | fna (optional ptt) | 2016 | |
| MSGIP | C G | C | fna | 2016 | |
| Islander | C D | C | fna | 2015, 2004 | |
| GI-SVM | C | C | fna | 2015 | |
| PAI Finder-PAIDB | W D | C | ffn (specific format) | 2015 | |
| PIPS and GIPSy | C G W | C | gbk or embl | 2015, 2012 | |
| GIHunter | C D | C | fna, ptt, and rnt | 2014 | |
| SigHunt | C | C | fna | 2014 | |
| GC-Profile | C W | C | fna | 2014 | |
| SVM-AGP (HGT) | C | embl | 2014 | | |
| GI-POP | W | C / D | – | 2013 | |
| CGS (HGT) | C | C | – | 2012 | available on request |
| EGID and GIST | C G | C | fna, faa, ffn, gbk, and ptt | 2012, 2011 | |
| IGIPT | C W | C | fna/ffn | 2011 | |
| GIDetector | G (C) | C | fna, ptt, rnt | 2010 | |
| INDeGenIUS | C | C | fna | 2010 | available on request |
| MJSD | C | C | fna | 2009 | |
| Design-Island | G | C | fna | 2008 | |
| PredictBias | W | C | gbk | 2008 | |
| RVM | – | C | – | 2008 | no implementation available |
| IslandPick | W D | C | gbk or embl | 2008 | |
| DarkHorse | C D | C / D | ffn | 2007 | |
| tRNAcc and MobilomeFinder | C W D | C | fna, tRNA, ptt | 2007, 2006 | |
| Centroid | C | C | ffn and ptt | 2007 | available on request |
| Colombo | C G | C | embl, gbk, fasta | 2006 | |
| SIGI-CRF (HGT) | C (G) | C | embl | 2006 | |
| SIGI-HMM (HGT) | C (G) | C | embl | 2006 | |
| AlienHunter | C | C | fna | 2006 | |
| Wn-SVM (HGT) | C | C | ffn and ptt | 2005 | available on request |
| PAI-IDA | C | C | gbk | 2003 | available on request |
Could not be successfully used for the GI predictor assessment analysis (Supplementary Table S1).
Comparative genomics method and so excluded from the GI predictor assessment analysis.
Parentheses are used when the GUI, webserver or database has been published as a separate tool.
GI features used by GI predictors
| Predictors | Method | Sequence composition bias | Machine learning | Other | Insertion site/ direct repeats | Mobility gene | Phage genes | Other genes |
|---|---|---|---|---|---|---|---|---|
| IslandPath-DIMOB | Seq composition | Dinucleotide | – | – | – | Integrase, transposase, recombinase, insertion sequences | Yes | – |
| IslandViewer | Hybrid | Dinucleotide, codon usage | – | Incorporates Islander in pre-computed analyses, and so contains its features | Incorporates Islander in pre-computed analyses, and so contains its features | Integrase, transposase, recombinase, insertion sequences | Yes | VFs, AMRs |
| VRprofile | Hybrid | GC, dinucleotide, codon usage | – | BLASTP | tRNA, tmRNA, repeats | MobilomeDB | Yes | VFs, AMRs, secretion systems |
| MTGIpick | Seq composition | Tetranucleotide | – | – | – | – | – | – |
| Zisland Explorer | Seq composition | GC, codon usage, amino acid | – | – | – | – | – | – |
| MSGIP | Seq composition | Oligonucleotide | – | – | – | – | – | – |
| Islander | Structure | – | – | Negative filter: GC, length, integrase to end distance | tRNA, tmRNA, repeats | Integrase | – | – |
| GI-SVM | Seq composition | k-mer | SVM | – | – | – | – | – |
| PAI Finder-PAIDB | Seq composition | GC, codon usage | – | – | – | – | – | VFs, AMRs |
| PIPS and GIPSy | Hybrid | GC, codon usage | – | BLASTP | – | Transposase, tyr/ser recombinase | – | VFs, AMRs, metabolism, symbiosis, hypothetical proteins |
| GIHunter | Seq composition | k-mers, IVOM | Decision tree | Gene density, intergenic distance | tRNA | Integrase, transposase | Yes | – |
| SigHunt | Seq composition | Tetranucleotides | – | – | – | – | – | – |
| GC-Profile | Seq composition | GC | – | – | – | – | – | – |
| SVM-AGP (HGT) | Seq composition | GC, oligonucleotide, codon usage, amino acid, position-based frequency | SVM | – | – | – | – | – |
| GI-POP | Seq composition | GC, oligonucleotide, codon usage, codon adaptation index | SVM | – | tRNA, repeats | Mobile genetic element from ACLAME database | – | – |
| CGS (HGT) | Seq composition | k-mers | – | – | – | – | – | – |
| EGID and GIST | Seq composition | Dinucleotide, codon usage, k-mers, IVOM | – | – | – | Integrase, transposase, recombinase | – | – |
| IGIPT | Seq composition | k-mers (2–6), codon usage, amino acid | – | – | – | – | – | – |
| GIDetector | Seq composition | k-mers, IVOM | Decision tree | Gene density, intergenic distance | tRNA | Integrase, transposase | Yes | – |
| INDeGenIUS | Seq composition | k-mers | – | – | – | – | – | – |
| MJSD | Seq composition | k-mers | – | – | – | – | – | – |
| DarkHorse | Comparative | BLAST, phylogeny | – | |||||
| Design-Island | Seq composition | GC, oligonucleotide | – | – | – | – | – | – |
| PredictBias | Seq composition | GC, dinucleotides, codon usage | – | – | tRNA | Insertion sequences | – | VFs |
| RVM | Seq composition | k-mers, IVOM | RVM | Region size, gene density, insertion point | ncRNA, repeats | Integrase | Yes | – |
| IslandPick | Comparative | – | – | Multiple-sequence alignment | – | – | – | – |
| tRNAcc and MobilomeFinder | Comparative | – | – | Multiple-sequence alignment | tRNA | – | – | – |
| Centroid | Seq composition | k-mers | – | – | – | – | – | – |
| Colombo | Seq composition | Codon usage, tetranucleotides | – | – | – | – | – | – |
| SIGI-CRF (HGT) | Seq composition | Tetranucleotides | – | – | – | – | – | – |
| SIGI-HMM (HGT) | Seq composition | Codon usage | – | – | – | – | – | – |
| AlienHunter | Seq composition | k-mers, IVOM | – | – | – | – | – | – |
| Wn-SVM (HGT) | Seq composition | k-mers | SVM | – | – | – | – | – |
| PAI-IDA | Seq composition | GC, dinucleotide, codon usage | – | – | – | – | – | – |
Could not be successfully used for the GI predictor assessment analysis (Supplementary Table S1).
Comparative genomics method, and so excluded from the GI predictor assessment analysis.
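Most predictors in the table above rely on sequence composition bias, with GC content as the simplest signal. The following is a minimal sketch of that general idea, not the algorithm of any listed tool: it flags fixed-size windows whose GC content differs from the genome-wide value. The function names, window size, step and `max_diff` threshold are all hypothetical choices made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq) if seq else 0.0


def atypical_windows(genome, window=1000, step=1000, max_diff=0.1):
    """Return (start, end, gc) for windows whose GC content deviates
    from the genome-wide GC content by more than max_diff.

    The threshold and window size are illustrative assumptions; real
    tools use more robust statistics and additional features.
    """
    genome_gc = gc_content(genome)
    hits = []
    for start in range(0, len(genome) - window + 1, step):
        window_gc = gc_content(genome[start:start + window])
        if abs(window_gc - genome_gc) > max_diff:
            hits.append((start, start + window, window_gc))
    return hits


# Toy sequence: an AT-rich background with a short GC-rich insert.
toy_genome = "AT" * 50 + "GC" * 10 + "AT" * 50
print(atypical_windows(toy_genome, window=20, step=20))
```

On the toy sequence only the GC-rich insert is reported. A single-feature scan like this is expected to miss islands whose composition has drifted toward that of the host, which is one reason many tools in the table combine several features.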
Figure 1. Accuracy assessment of genomic island prediction methods. Accuracy of genomic island (GI) predictors was assessed using a data set of GIs identified by comparative genomics analysis of 104 genomes [39]. Each genome is represented by a value, with the median and the first and third quartiles represented in the boxplot as the center line and the lower and upper hinges, respectively. Values more than 1.5 times the interquartile range beyond the hinges are shown as outliers (black dots).
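The boxplot convention described in the caption can be sketched as follows. This is an illustration only: the caption does not state which quartile estimator was used, so the `inclusive` method of Python's `statistics.quantiles` is an assumption, and the function name and sample values are hypothetical.

```python
import statistics


def boxplot_summary(values):
    """Quartiles and 1.5 x IQR outliers for a list of per-genome scores."""
    # Quartile estimator is an assumption; plotting libraries differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical per-genome scores for one predictor.
scores = [0.05, 0.60, 0.62, 0.65, 0.70, 0.72, 0.75, 0.78, 0.80]
print(boxplot_summary(scores))
```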
Mean GI prediction accuracy (%) assessed using the reference comparative genomics-based data set listed by descending MCC
| Predictors | MCC | F-score | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| IslandViewer 4 | 0.703 | 0.780 | 0.880 | 0.904 | 0.732 |
| GIHunter | 0.645 | 0.713 | 0.849 | 0.934 | 0.634 |
| IslandPath-DIMOB v1.0.0 | 0.486 | 0.554 | 0.768 | 0.874 | 0.469 |
| VRprofile | 0.470 | 0.511 | 0.774 | 0.940 | 0.417 |
| SIGI-CRF | 0.441 | 0.498 | 0.784 | 0.920 | 0.400 |
| SIGI-HMM | 0.353 | 0.374 | 0.729 | 0.919 | 0.264 |
| PredictBias | 0.347 | 0.526 | 0.715 | 0.577 | 0.623 |
| AlienHunter | 0.342 | 0.540 | 0.734 | 0.594 | 0.570 |
| MTGIpick | 0.324 | 0.559 | 0.704 | 0.551 | 0.675 |
| MJSD | 0.259 | 0.497 | 0.614 | 0.521 | 0.638 |
| INDeGenIUS | 0.236 | 0.356 | 0.699 | 0.651 | 0.275 |
| GI-SVM | 0.221 | 0.382 | 0.685 | 0.603 | 0.317 |
| Wn-SVM | 0.204 | 0.354 | 0.685 | 0.552 | 0.286 |
| Zisland Explorer | 0.203 | 0.239 | 0.688 | 0.853 | 0.177 |
| Centroid | 0.196 | 0.292 | 0.678 | 0.627 | 0.217 |
| Islander | 0.191 | 0.204 | 0.697 | 0.971 | 0.140 |
| SigHunt | 0.186 | 0.466 | 0.619 | 0.457 | 0.605 |
| PAI-IDA | 0.177 | 0.220 | 0.671 | 0.685 | 0.149 |
| MSGIP | 0.150 | 0.199 | 0.678 | 0.865 | 0.163 |
| GC-Profile | 0.091 | 0.205 | 0.620 | 0.637 | 0.225 |
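The metrics reported in the table above follow from a confusion matrix between predicted and reference GI regions. The sketch below uses the standard definitions of MCC, accuracy, precision and recall, and additionally computes the F-score (harmonic mean of precision and recall). The counting unit (for example nucleotides inside or outside reference GIs) and the function name are assumptions, and the table reports means over genomes, so per-genome values would be averaged afterwards.

```python
import math


def gi_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives for one genome;
    the counting unit is left to the caller (an assumption here).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f_score = (2 * precision * recall / (precision + recall)
               if (precision + recall) else 0.0)
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"mcc": mcc, "f_score": f_score, "accuracy": accuracy,
            "precision": precision, "recall": recall}


# Hypothetical counts for a single genome.
print(gi_metrics(tp=40, fp=10, tn=40, fn=10))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix and stays near zero for a predictor that labels almost everything as negative, which may explain why the tables are sorted by it.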
Mean GI prediction accuracy (%), assessed using the reference literature-based data set and listed by descending MCC
| Predictor | MCC | F-score | Accuracy | Precision | Recall |
|---|---|---|---|---|---|
| GIHunter | 0.734 | 0.832 | 0.847 | 0.981 | 0.745 |
| PredictBias | 0.643 | 0.838 | 0.820 | 0.868 | 0.817 |
| IslandViewer 4 | 0.640 | 0.753 | 0.788 | 0.998 | 0.619 |
| VRprofile | 0.574 | 0.643 | 0.751 | 0.993 | 0.542 |
| IslandPath-DIMOB v1.0.0 | 0.541 | 0.669 | 0.720 | 0.979 | 0.521 |
| MTGIpick | 0.504 | 0.775 | 0.753 | 0.819 | 0.744 |
| SIGI-CRF | 0.426 | 0.522 | 0.689 | 0.993 | 0.436 |
| AlienHunter | 0.398 | 0.642 | 0.705 | 0.753 | 0.570 |
| SIGI-HMM | 0.359 | 0.420 | 0.600 | 0.998 | 0.285 |
| MJSD | 0.347 | 0.694 | 0.653 | 0.742 | 0.697 |
| Islander | 0.321 | 0.354 | 0.560 | 1.000 | 0.226 |
| MSGIP | 0.306 | 0.439 | 0.620 | 0.947 | 0.353 |
| SigHunt | 0.234 | 0.649 | 0.620 | 0.692 | 0.624 |
| Zisland Explorer | 0.175 | 0.264 | 0.520 | 0.833 | 0.171 |
| INDeGenIUS | 0.152 | 0.293 | 0.515 | 0.716 | 0.189 |
| GI-SVM | 0.150 | 0.312 | 0.512 | 0.721 | 0.200 |
| Wn-SVM | 0.129 | 0.320 | 0.499 | 0.715 | 0.208 |
| PAI-IDA | 0.077 | 0.118 | 0.461 | 0.667 | 0.067 |
| Centroid | 0.075 | 0.202 | 0.481 | 0.636 | 0.129 |
| GC-Profile | −0.015 | 0.063 | 0.414 | 0.667 | 0.034 |
Figure 2. Summary of GI predictor characteristics with a multi-entry decision table. GI predictors were classified according to their interface and their requirements. Sensitivity (recall) and precision are color coded using a gradient from low to high, as assessed using the comparative genomics data set. Methods that were not assessed (comparative genomics) or that we were unable to assess are shown in white.