Daniel M de Brito, Vinicius Maracaja-Coutinho, Savio T de Farias, Leonardo V Batista, Thaís G do Rêgo.
Abstract
Genomic islands (GIs) are regions of bacterial genomes acquired from other organisms through horizontal gene transfer. These regions are often responsible for many important acquired adaptations of the bacteria, with great impact on their evolution and behavior. These adaptations are usually associated with pathogenicity, antibiotic resistance, degradation and metabolism, so identifying such regions is of medical and industrial interest. For this reason, different approaches to genomic island prediction have been proposed. However, none of them can precisely predict the complete repertoire of GIs in a genome. The difficulty arises because the performance of different algorithms varies with the nucleotide distribution of different species. In this paper, we present a novel method to predict GIs that is built upon the mean shift clustering algorithm. It does not require any information regarding the number of clusters, and the bandwidth parameter is calculated automatically by a heuristic approach. The method was implemented in a new user-friendly tool named MSGIP (Mean Shift Genomic Island Predictor). Genomes of bacteria with GIs discussed in other papers were used to evaluate the proposed method. The tool revealed the same GIs predicted by other methods and also novel, previously unpredicted islands. A detailed investigation of the features typical of GI elements in these new regions confirmed its effectiveness. Stand-alone and user-friendly versions of this new methodology are available at http://msgip.integrativebioinformatics.me.
Year: 2016 PMID: 26731657 PMCID: PMC4711805 DOI: 10.1371/journal.pone.0146352
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Mean shift algorithm procedure for a data point x.
The bold filled circles with arrows represent the iterations, while the dotted circles represent the window used in density estimation, until convergence is reached at the nth iteration.
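The procedure in Fig 1 can be sketched in a few lines. This is a minimal one-dimensional illustration with a flat (uniform) kernel, not the MSGIP implementation: the function names, the `merge_tol` rule for grouping modes, and the explicit `bandwidth` argument are all assumptions made for the example (the paper derives the bandwidth from a heuristic that is not described in this excerpt).

```python
def mean_shift_point(x, data, bandwidth, tol=1e-6, max_iter=500):
    """Shift one 1-D point x toward a local density mode (flat kernel)."""
    for _ in range(max_iter):
        # Window: all data points within `bandwidth` of the current position.
        window = [p for p in data if abs(p - x) <= bandwidth]
        if not window:
            return x
        new_x = sum(window) / len(window)  # mean of the window
        if abs(new_x - x) < tol:           # converged: the shift is negligible
            return new_x
        x = new_x
    return x


def mean_shift_cluster(data, bandwidth, merge_tol=None):
    """Label each point by the mode it converges to.

    merge_tol is a hypothetical rule for this sketch: modes closer than
    merge_tol are treated as the same cluster.
    """
    merge_tol = bandwidth / 2 if merge_tol is None else merge_tol
    modes, labels = [], []
    for p in data:
        m = mean_shift_point(p, data, bandwidth)
        for i, existing in enumerate(modes):
            if abs(existing - m) <= merge_tol:
                labels.append(i)
                break
        else:
            # No existing mode is close enough, so start a new cluster.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


points = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
modes, labels = mean_shift_cluster(points, bandwidth=1.0)
```

Note that the number of clusters is never passed in: it falls out of how many distinct modes the points converge to, which is the property the abstract highlights.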
Genomes used for testing mean shift and other related algorithms.
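| Genome | Length (Mb) | Accession Number | References |
|---|---|---|---|
| Corynebacterium glutamicum ATCC 13032 | 3.309 | NC_003450.3 | [ |
| Vibrio vulnificus CMCP6 chromosome I | 3.281 | NC_004459.3 | [ |
| Rhodopseudomonas palustris CGA009 | 5.459 | NC_005296.1 | [ |
| Streptococcus mutans UA159 | 2.032 | NC_004350.2 | [ |
| Vibrio cholerae chromosome II | 1.072 | NC_002506.1 | [ |
| Vibrio vulnificus YJ016 chromosome I | 3.354 | NC_005139.1 | [ |
| | 0.580 | NC_000908.2 | [ |
| | 1.109 | NC_020993.1 | [ |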
Genomic islands detected by mean shift, compared with those reported by other methods previously used for each species.
| Genome | Identifier | Detected GI (Mb) (This work) | Corresponding GI (Mb) (Other Methods) | Characteristics |
|---|---|---|---|---|
| Corynebacterium glutamicum ATCC 13032 | CGGI01 | 1.800–2.000 | 1.776–1.987 [ | Hypothetical proteins (with unknown function) |
| Vibrio vulnificus CMCP6 chromosome I | VVCGI01 | 0.350–0.400 | 0.355–0.395 [ | Hypothetical proteins and invasion-associated proteins |
| Vibrio vulnificus CMCP6 chromosome I | VVCGI02 | 2.450–2.600 | 2.438–2.605 [ | Invasion-associated proteins |
| Vibrio vulnificus CMCP6 chromosome I | VVCGI03 | 3.250–3.281 | 3.248–3.281 [ | Transporter protein, transposase, phage and hypothetical proteins |
| Rhodopseudomonas palustris CGA009 | RPGI01 | — | 2.481–2.564 [ | Type IV secretion genes for conjugal transfer of DNA, arsenate reductase pump modifier and an arsenical pump membrane protein |
| Rhodopseudomonas palustris CGA009 | RPGI02 | 3.750–3.800 | 3.729–3.807 [ | Hypothetical proteins |
| Rhodopseudomonas palustris CGA009 | RPGI03 | 4.400–4.450 | — | Hypothetical proteins and flagellar proteins |
| Rhodopseudomonas palustris CGA009 | RPGI04 | 4.550–4.650 | 4.578–4.678 [ | Multidrug efflux and transporter-related genes |
| Streptococcus mutans UA159 | SMGI01 | 1.250–1.300 | 1.250–1.300 [ | TnSMU2 (nonribosomal peptide synthetases (NRPS), polyketide synthases (PKS), accessory proteins, transporters, and transcription regulators) |
| Vibrio cholerae chromosome II | VCGI01 | 0.300–0.450 | 0.302–0.436 [ | Chloramphenicol acetyltransferase, killer protein, antidote protein, haemagglutinin, other copies of acetyltransferase and hypothetical protein |
| Vibrio vulnificus YJ016 chromosome I | VVYGI01 | — | 0.159–0.167 [ | Lactoglutathione lyase |
| Vibrio vulnificus YJ016 chromosome I | VVYGI02 | 1.800–1.950 | 1.757–1.936 [ | Hypothetical proteins and transposases |
| Vibrio vulnificus YJ016 chromosome I | VVYGI03 | 2.200–2.250 | — | Hypothetical proteins, region started and finished |
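One way to read the paired coordinates in this table is to compute how much of each previously reported island falls inside the region detected by mean shift. The sketch below is only an illustration of that arithmetic. The helper names `overlap_mb` and `reference_coverage` are hypothetical, and the excerpt does not say that the authors scored agreement this way.

```python
def overlap_mb(a, b):
    """Length (Mb) of the intersection of two (start, end) intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def reference_coverage(detected, reference):
    """Fraction of the reference island covered by the detected island."""
    return overlap_mb(detected, reference) / (reference[1] - reference[0])


# Coordinates (Mb) copied from the table rows for CGGI01 and SMGI01.
cggi01_detected, cggi01_reference = (1.800, 2.000), (1.776, 1.987)
smgi01_detected, smgi01_reference = (1.250, 1.300), (1.250, 1.300)

cggi01_overlap = overlap_mb(cggi01_detected, cggi01_reference)
cggi01_cov = reference_coverage(cggi01_detected, cggi01_reference)
smgi01_cov = reference_coverage(smgi01_detected, smgi01_reference)
```

For CGGI01 the shared span is 1.987 − 1.800 = 0.187 Mb out of a 0.211 Mb reference island, so roughly 89% of the reported island lies inside the detected window; for SMGI01 the two intervals coincide.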
Fig 2. Plots of the z′ curves for the following six bacterial genomes.
(A) Corynebacterium glutamicum ATCC 13032, (B) Vibrio vulnificus CMCP6 chromosome I, (C) Rhodopseudomonas palustris CGA009, (D) Streptococcus mutans UA159, (E) Vibrio cholerae chromosome II and (F) Vibrio vulnificus YJ016 chromosome I. Black lines represent the genomic islands identified by the mean shift method.
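This excerpt does not define the z′ curve. The sketch below assumes it is the detrended AT/GC disparity component of the Z-curve: z_n = (A_n + T_n) − (G_n + C_n) accumulated along the sequence, minus a least-squares line through the origin. That definition, the sign convention, and the function name `z_prime_curve` are assumptions for illustration and may differ from what MSGIP computes.

```python
def z_prime_curve(seq):
    """Detrended cumulative AT-minus-GC profile of a DNA sequence.

    Assumed definition (not taken from the excerpt):
      z_n  = (A_n + T_n) - (G_n + C_n), counted over the first n bases
      z'_n = z_n - k * n, with k the least-squares slope through the origin
    """
    z, acc = [], 0
    for base in seq.upper():
        if base in "AT":
            acc += 1
        elif base in "GC":
            acc -= 1
        # Ambiguous symbols such as N leave the running total unchanged.
        z.append(acc)
    if not z:
        return []
    positions = range(1, len(z) + 1)
    k = sum(n * zn for n, zn in zip(positions, z)) / sum(n * n for n in positions)
    return [zn - k * n for n, zn in zip(positions, z)]


# Toy input: an AT-only stretch followed by a GC-only stretch.
profile = z_prime_curve("ATATGCGC")
```

Under this convention the curve rises through AT-rich stretches and falls through GC-rich ones, so a region whose composition differs from the genome background shows up as a change in slope, which is the kind of signal a clustering step could then pick up.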