Dongsheng Che, Mohammad Shabbir Hasan, Bernard Chen.
Abstract
High-throughput sequencing technologies have made it possible to study bacteria by analyzing their genome sequences. For instance, comparative genome sequence analyses can reveal phenomena such as gene loss, gene gain, or gene exchange in a genome. Analyses of pathogenic bacterial genomes have shown that the pathogenic genomic regions of many pathogenic bacteria were horizontally transferred from other bacteria; these regions are known as pathogenicity islands (PAIs). PAIs have some detectable properties, such as genomic signatures that differ from those of the rest of the host genome, and mobility genes that allow them to be integrated into the host genome. In this review, we discuss various pathogenicity island-associated features and current computational approaches for the identification of PAIs. We also discuss existing pathogenicity island databases and related computational resources, which researchers may find useful for studies of bacterial evolution and pathogenicity mechanisms.
Year: 2014 PMID: 25437607 PMCID: PMC4235732 DOI: 10.3390/pathogens3010036
Source DB: PubMed Journal: Pathogens ISSN: 2076-0817
Figure 1. A schematic view of a pathogenicity island with associated features. The PAI region has a biased sequence composition. PAI regions are associated with virulence genes (vir1, vir2, vir3, and vir4), phage-related genes (phag1 and phag2), mobile genes (int and trans), hypothetical protein genes (hypo1, hypo2, and hypo3), insertion sequence elements, direct repeats, and a tRNA gene.
A list of pathogenicity island (PAI)-associated features and measurement methods.
| PAI-Associated Features | Feature Measurement Methods |
|---|---|
| Different genomic sequence signature | Compute G+C content, GC skew, or codon usage, or use other sequence signature tools |
| Presence of virulence factors | Search through a virulence factor database such as VFDB |
| Presence of mobility genes | Search through the NCBI-nr/nt, UniProtKB, Pfam, or COG database |
| High percentage of phage-related genes | Search through the NCBI-nr/nt, UniProtKB, Pfam, or COG database |
| Presence of tRNA genes | Use the tRNA gene search tool tRNAscan-SE |
| High percentage of hypothetical protein genes | Search through the NCBI-nr/nt, UniProtKB, Pfam, or COG database |
| Presence of direct repeats | Use the repeat finder software REPuter |
| Presence of insertion sequences | Search through ISfinder database |
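
The first row of the table lists G+C content and GC skew as sequence signature measurements. A minimal sketch of how these two values could be computed over sliding windows is shown below. The function names, the window size, and the step size are illustrative assumptions, not taken from any tool discussed in this review; the GC skew formula used is the common definition (G - C) / (G + C).

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases in seq (0.0 if none)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew(seq: str) -> float:
    """GC skew, (G - C) / (G + C); 0.0 when the sequence has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int):
    """Yield (start, subsequence) for every full-length window."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


def composition_profile(seq: str, window: int = 1000, step: int = 500):
    """Return a list of (start, G+C content, GC skew) per window.

    window and step are hypothetical defaults chosen for illustration.
    """
    return [
        (start, gc_content(sub), gc_skew(sub))
        for start, sub in sliding_windows(seq, window, step)
    ]
```

Windows whose values differ markedly from the genome-wide values would be candidates for regions with an atypical sequence signature; how large a difference counts as atypical is a separate design choice.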
Figure 2. A schematic view of genomic region alignment in the comparative genomics-based approach for island prediction. Three phylogenetically closely related reference genomes (G1, G2, and G3) are used to detect the island region in the query genome (G4).
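
The idea in the figure can be sketched in code under a simplifying assumption: a candidate island is a stretch of the query genome that aligns to none of the reference genomes. The function below is a hypothetical illustration, not the method of any specific tool. It assumes the alignments have already been computed by an external aligner and are supplied as half-open `(start, end)` intervals in query coordinates.

```python
def uncovered_regions(genome_length, aligned_intervals, min_length=1):
    """Return query intervals [start, end) not covered by any alignment.

    aligned_intervals: iterable of (start, end) pairs in query coordinates,
    pooled from all reference genomes (an assumed input format).
    min_length: shortest uncovered stretch to report (hypothetical filter).
    """
    regions = []
    position = 0  # end of the region covered so far
    for start, end in sorted(aligned_intervals):
        # A gap before this alignment is a stretch with no reference match.
        if start - position >= min_length and start > position:
            regions.append((position, start))
        position = max(position, end)
    # Check the tail of the genome after the last alignment.
    if genome_length - position >= min_length and genome_length > position:
        regions.append((position, genome_length))
    return regions
```

For the setup in the figure, the intervals from the alignments of G4 against G1, G2, and G3 would be concatenated into one list before the call, and `min_length` would be set to the smallest island size of interest.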
Summary of sequence composition-based software.
| Software a | Main Principle | System Setup b | Website |
|---|---|---|---|
| AlienHunter | HMMs on various mer words | Unix/Linux OS, Java and Perl environment setup | |
| Centroid | Centroid on | Unix/Linux OS and C++ environment setup | Upon request |
| EGID | Combines the results of AlienHunter, IslandPath, SIGI-HMM, INDeGenIUS, and PAI-IDA | Unix/Linux OS, Java, C++ and Perl environment setup | |
| GIDetector | Decision-tree based bagging on IVOM score, insertion point, size, gene density, repeats, integrase, phage and non-coding RNA | Windows OS, C# with the support of Perl and Cygwin | |
| GI-GPS | SVMs on sequence composition (including GC content, dinucleotide frequency, codon usage, and codon adaption usage), and with filtering steps including length of candidate segment, tRNA and repeat elements | Not available | |
| GIHunter | Decision tree based bagging model using sequence composition, gene information and inter-genic distance, mobile genes, phage genes, tRNA, and gene density | Unix/Linux OS, Java, C++ and Perl environment setup | |
| INDeGenIUS | Clustering/Centroid on | Unix/Linux OS and C++ environment setup | Upon request |
| IslandPath | G+C, dinucleotide, mobile genes, and codon usage | Unix/Linux OS and Perl environment setup | |
| PAI-IDA | Discriminant analysis on G+C, dinucleotide and codon usage | Unix/Linux OS, C++ and Perl environment setup | |
| PIPS | G+C content, codon usage deviation, virulence factors, hypothetical proteins, transposases, flanking tRNA and its absence in nonpathogenic organisms | Unix/Linux OS and Perl environment setup | |
| SIGI-HMM | HMM on codon usage | Unix/Linux OS and Java environment setup | |
a PIPS is used for predicting PAIs specifically; the other software tools are used for predicting GIs in general, including PAIs. b System setup includes the operating systems on which the software tools run; additional software such as Java, Perl, or C++ environments may need to be installed.
Summary of public island databases and web resources.
| Resource | Description | Website |
|---|---|---|
| DGI | A database that contains genomic islands of more than 2,000 bacterial genomes, many of which are PAIs, and displays GIs in circular graphic images | |
| GI-POP | A database that provides ongoing microbial genome annotation, including ORF annotation, non-coding RNAs, and GIs. GIs are predicted using GI-GPS | |
| IGIPT | A web server that identifies islands based on standard deviation from sequence composition average | |
| Islander | A database that contains a list of 89 islands in 106 bacterial genomes that harbor tRNA and tmRNA genes, and integrase genes | |
| IslandViewer | A database that contains GIs predicted by IslandPick, IslandPath-DIMOB, and SIGI-HMM, and displays GIs in circular graphic images | |
| MOSAIC | A database that contains conserved segments and various regions ( | |
| PAIDB | A database that contains known PAIs and candidate PAIs that are homologous to known PAIs | |
| PredictBias | A web server that predicts PAIs based on %G+C, dinucleotide frequency, codon usage, virulence factors, and absence in non-pathogenic species | |
| MvirDB | A database that contains a collection of publicly available and organized sequences representing known toxins, virulence factors, and antibiotic resistance genes | |
| VFDB | A database that contains all known virulence factors, as well as homologous genes through similarity search | |
| VirulentPred | A web server that predicts virulence factors based on input protein sequences | |
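
The IGIPT entry above describes identifying islands by their standard deviation from the sequence composition average. The general idea can be sketched as follows; this is not IGIPT's actual implementation. The function name and the threshold of 1.5 standard deviations are assumptions made for illustration, and the input is assumed to be one composition value (for example, G+C content) per genomic window.

```python
from statistics import mean, pstdev


def flag_atypical_windows(values, threshold=1.5):
    """Return indices of windows far from the average composition.

    values: one composition measurement per window (assumed input).
    threshold: number of standard deviations from the mean that counts
    as atypical (hypothetical default, not taken from any tool).
    """
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation over all windows
    if sigma == 0:
        return []  # all windows are identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) > threshold * sigma]
```

Consecutive flagged windows would then typically be merged into one candidate region. A lower threshold flags more windows at the cost of more false positives.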