Raghunath Chatterjee, Keya Chaudhuri, Probal Chaudhuri.
Abstract
BACKGROUND: Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes rely on markers such as transposons, proximal tRNAs and flanking repeats, or use supervised techniques that require training datasets. Most of these methods are based primarily on biases in the GC content or in the codon and amino acid usage of the islands. However, they either do not use any formal statistical test of significance or use statistical tests whose critical values and P-values are not adequately justified. We propose an unsupervised method that uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory; consequently, the resulting P-values are quite reliable for deciding which regions are GIs.
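To make the idea of a Monte-Carlo test "based on randomly selected segments of a chromosome" concrete, here is a minimal sketch. It assumes, purely for illustration, that the test statistic is the absolute deviation of a segment's GC content from the whole-chromosome GC content; the abstract does not state Design-Island's actual statistic, and the function names `gc_fraction` and `monte_carlo_p_value` are hypothetical. The null distribution is built from segments of the same length placed at random positions on the same chromosome.

```python
import random


def gc_fraction(seq: str) -> float:
    """Fraction of G/C bases in a DNA string (case-insensitive)."""
    if not seq:
        return 0.0
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def monte_carlo_p_value(chromosome: str, start: int, end: int,
                        n_samples: int = 999, seed: int = 0) -> float:
    """Monte-Carlo P-value for the GC content of chromosome[start:end].

    Hypothetical sketch: statistic = |GC(segment) - GC(chromosome)|;
    the null distribution uses randomly placed segments of equal length
    from the same chromosome (they may overlap the tested segment).
    """
    rng = random.Random(seed)
    length = end - start
    if length <= 0 or length > len(chromosome):
        raise ValueError("segment must be non-empty and fit in the chromosome")
    genome_gc = gc_fraction(chromosome)
    observed = abs(gc_fraction(chromosome[start:end]) - genome_gc)
    exceed = 0
    for _ in range(n_samples):
        s = rng.randrange(0, len(chromosome) - length + 1)
        stat = abs(gc_fraction(chromosome[s:s + length]) - genome_gc)
        if stat >= observed:
            exceed += 1
    # Usual (1 + k) / (1 + n) Monte-Carlo estimate: the observed segment is
    # counted among the samples, so the P-value is never exactly 0.
    return (1 + exceed) / (1 + n_samples)
```

For example, inserting a 1 kb pure-GC stretch into a random 20 kb background and testing that stretch should give a P-value at or near the smallest attainable value, 1/(1 + n_samples). The (1 + k)/(1 + n) form is a common choice because it never reports a P-value of exactly zero from a finite number of random segments.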
Year: 2008 PMID: 18380895 PMCID: PMC2362129 DOI: 10.1186/1471-2164-9-150
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Algorithmic flow-charts of the first phase (Fig. 1A) and the refinement phase (Fig. 1B) of Design-Island.
Figure 2. Fig. 2A shows the influence of different choices of the cut-off P-value (P0) used in the refinement phase on the sensitivity (SN), specificity (SP) and accuracy (AC) of Design-Island, applied to a manually curated data set of 1560 putative horizontally transferred genes of Salmonella typhi CT18 generated by Vernikos et al. [21]. Fig. 2B shows the corresponding variation in the slopes of the SN, SP and AC curves across the choices of P0.
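The caption does not define SN, SP and AC or say how the slopes are obtained. The sketch below assumes the common confusion-matrix definitions SN = TP/(TP+FN), SP = TN/(TN+FP) and AC = (TP+TN)/N, with genes as the unit of evaluation, and approximates the slopes by finite differences between consecutive P0 values. Note that some genomics papers use "specificity" for TP/(TP+FP) instead, so this is an assumption. The helpers `sn_sp_ac` and `slopes` are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def sn_sp_ac(predicted: Set[str], reference: Set[str],
             all_genes: Set[str]) -> Dict[str, float]:
    """SN, SP and AC of predicted GI genes against a curated reference set.

    Assumes predicted and reference are subsets of all_genes; a metric
    whose denominator is zero is reported as 0.0.
    """
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = len(all_genes - predicted - reference)
    return {
        "SN": tp / (tp + fn) if tp + fn else 0.0,
        "SP": tn / (tn + fp) if tn + fp else 0.0,
        "AC": (tp + tn) / len(all_genes) if all_genes else 0.0,
    }


def slopes(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Forward finite-difference slopes dy/dx between consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
            for i in range(len(xs) - 1)]
```

For instance, with 10 genes, a reference set {g1, g2, g3, g4} and predictions {g1, g2, g3, g5}, this gives SN = 0.75, SP = 5/6 and AC = 0.8. Evaluating `sn_sp_ac` once per P0 and passing the resulting series to `slopes` yields a curve analogous in spirit to Fig. 2B.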
Figure 3. Upper panels: 3D plots of the P-values for a window of variable size that slides across the chromosome of Salmonella typhi CT18, (i) from the start (1 bp) to 2.5 Mbp (Fig. 3A) and (ii) from 2.5 Mbp to the end of the chromosome at 4.8 Mbp (Fig. 3B). The P-value at each location and window size is plotted on a gray scale that changes gradually from black (P-value = 0) to white (P-value = 1). White dots, corresponding to high P-values, are almost invisible against the white background, whereas dark dots, corresponding to low P-values, are prominently visible and mark the GIs in the chromosome. Lower panels: representative 1D plots generated in the refinement phase for some of the 'putative GIs' detected in the first phase of Design-Island (enclosed in gray blocks and labeled 1, 2, ... in the 3D plots). For the region of each such GI, the quantity (P0 - P-value)+ is plotted, where (P0 - P-value)+ = P0 - P-value if P-value < P0, and 0 otherwise; that is, (P0 - P-value)+ = max(P0 - P-value, 0).
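The quantity plotted in the lower panels of Figure 3 is the positive part of P0 - P-value. The following sketch computes that profile for a list of P-values along a putative GI and, as a hypothetical extra, reports the contiguous stretches where the profile is nonzero (P-value < P0). The helper names are illustrative, and this run-finding step is not claimed to be how Design-Island fixes island boundaries.

```python
from typing import List, Sequence, Tuple


def positive_part_profile(p_values: Sequence[float], p0: float) -> List[float]:
    """(P0 - P-value)+ = max(P0 - P-value, 0) at each position."""
    return [max(p0 - p, 0.0) for p in p_values]


def significant_runs(p_values: Sequence[float],
                     p0: float) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open index ranges where P-value < P0."""
    runs: List[Tuple[int, int]] = []
    start = None
    for i, p in enumerate(p_values):
        if p < p0 and start is None:
            start = i                      # a significant stretch begins
        elif p >= p0 and start is not None:
            runs.append((start, i))        # the stretch ends before i
            start = None
    if start is not None:                  # stretch runs to the last position
        runs.append((start, len(p_values)))
    return runs
```

With P-values [0.5, 0.01, 0.001, 0.2, 0.03] and P0 = 0.05, the profile is approximately [0.0, 0.04, 0.049, 0.0, 0.02], and the nonzero stretches are the index ranges (1, 3) and (4, 5).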
Figure 4. Bar diagram and corresponding data table of the sensitivity (SN), specificity (SP) and accuracy (AC) of Design-Island compared with other methods, evaluated on the manually curated data set of 1560 putative horizontally transferred genes of Salmonella typhi CT18 generated by Vernikos et al. [21].