Sarbashis Das, Priyanka Duggal, Rahul Roy, Vithal P Myneedu, Digamber Behera, Hanumanthappa K Prasad, Alok Bhattacharya.
Abstract
The organization of genomic sequences is dynamic and changes during the process of evolution. Many of the variations arise spontaneously, and the observed genomic changes can either be distributed uniformly throughout the genome or be preferentially localized to some regions (hot spots) compared to others. Conversely, cold spots tend to accumulate very few variations or none at all. To identify such regions statistically, we have developed a method based on the Shewhart Control Chart. The method was used to identify hot and cold spots of single-nucleotide variations (SNVs) in Mycobacterium tuberculosis genomes. The predictions have been validated by sequencing some of these regions derived from clinical isolates. This method can be used to analyze other genome sequences, particularly those of infectious microbes.
Year: 2012 PMID: 22389766 PMCID: PMC3291883 DOI: 10.1038/srep00297
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Distribution of SNVs across the whole genome.
Pink dots indicate the frequency of SNVs identified by comparing M. tuberculosis CDC1551 with M. tuberculosis H37Rv, mapped on the H37Rv genome using a bin size of 2000 nucleotides. Blue dots indicate the distribution of randomly generated SNVs on the H37Rv genome. The X-axis represents position along the whole genome. The Y-axis represents SNV frequency.
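The binning described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the 1-based coordinate convention, the uniform sampling scheme for the random SNVs, and the toy positions are all assumptions made for the example.

```python
import random


def bin_snv_counts(positions, genome_length, bin_size=2000):
    """Count SNVs per fixed-width bin; positions are assumed 1-based."""
    n_bins = -(-genome_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if not 1 <= pos <= genome_length:
            raise ValueError(f"position {pos} lies outside the genome")
        counts[(pos - 1) // bin_size] += 1
    return counts


def random_snv_positions(n_snvs, genome_length, seed=0):
    """Draw distinct SNV positions uniformly at random (assumed null model)."""
    rng = random.Random(seed)
    return rng.sample(range(1, genome_length + 1), n_snvs)


# Toy data: made-up positions on a hypothetical 10,000-nt genome.
observed_positions = [5, 1999, 2000, 2001, 4100, 4150, 4199, 9999]
observed_counts = bin_snv_counts(observed_positions, genome_length=10_000)

# Same number of SNVs scattered at random, binned the same way.
random_counts = bin_snv_counts(
    random_snv_positions(len(observed_positions), 10_000), genome_length=10_000
)
```

Plotting the observed and random counts against bin index would give the two point series that the caption describes.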
Figure 2. Kolmogorov-Smirnov test to check whether the SNV distribution follows a Poisson distribution.
The function F1 is the empirically observed cumulative distribution of the SNVs and the function F2 is the cumulative distribution of a Poisson random variable with parameter 0.4954.
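The comparison of F1 and F2 can be illustrated with a short sketch that computes the Kolmogorov-Smirnov statistic, i.e. the largest absolute gap between the empirical cumulative distribution of per-bin SNV counts and a Poisson cumulative distribution. The helper names and toy counts are hypothetical, and estimating the Poisson parameter as the mean count per bin is an assumption (the caption gives the value 0.4954 but not how it was obtained). The sketch returns only the statistic, not a p-value; the standard Kolmogorov-Smirnov p-value is conservative for discrete data such as counts.

```python
import math
from collections import Counter


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def ks_statistic_poisson(counts, lam=None):
    """Return (D, lam): max |F1(k) - F2(k)| over integer k, and the rate used.

    F1 is the empirical CDF of `counts`; F2 is the Poisson CDF.
    If `lam` is not given, it is estimated as the mean count (an assumption).
    """
    n = len(counts)
    if lam is None:
        lam = sum(counts) / n
    freq = Counter(counts)
    cumulative = 0
    d = 0.0
    # Both CDFs are step functions that jump only at integers,
    # so checking integer points up to the largest count is sufficient.
    for k in range(max(counts) + 1):
        cumulative += freq.get(k, 0)
        d = max(d, abs(cumulative / n - poisson_cdf(k, lam)))
    return d, lam


# Toy per-bin counts (made up for illustration).
toy_counts = [0, 0, 1, 0, 2, 0, 1, 0]
d_stat, lam_hat = ks_statistic_poisson(toy_counts)
```

A small statistic indicates that the binned counts are close to what a Poisson model with that rate would produce.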
Figure 3. Shewhart Control Chart. (a) Chart derived using SNV frequencies from Fig. 1. (b) Average SNV frequencies in all the strains and isolates. Red and black dots indicate out-of-control (“hot spots”) and in-control points, respectively. Yellow dots indicate violating runs.
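A control chart of this kind can be sketched for count data as below. The record does not state which chart variant or run rule was used, so this example assumes a c-chart (center line at the mean count, limits at the mean plus or minus three times its square root, lower limit floored at zero) and a run rule of eight consecutive points on one side of the center line. All names, thresholds, and toy counts are illustrative assumptions.

```python
import math


def c_chart_limits(counts, n_sigma=3.0):
    """Return (center, lower limit, upper limit) for an assumed c-chart."""
    center = sum(counts) / len(counts)
    spread = n_sigma * math.sqrt(center)
    return center, max(0.0, center - spread), center + spread


def classify_bins(counts, run_length=8, n_sigma=3.0):
    """Label each bin 'out-of-control', 'violating-run', or 'in-control'."""
    center, lcl, ucl = c_chart_limits(counts, n_sigma)
    # +1 above the center line, -1 below it, 0 exactly on it.
    sides = [(c > center) - (c < center) for c in counts]
    labels = ["in-control"] * len(counts)

    # Run rule (assumed): run_length or more consecutive points on one side.
    start = 0
    for i in range(1, len(counts) + 1):
        if i == len(counts) or sides[i] != sides[start]:
            if sides[start] != 0 and i - start >= run_length:
                for j in range(start, i):
                    labels[j] = "violating-run"
            start = i

    # Points beyond the control limits take precedence over run labels.
    for i, c in enumerate(counts):
        if c > ucl or c < lcl:
            labels[i] = "out-of-control"
    return labels


# Toy per-bin counts: a long low stretch followed by one very high bin.
toy_counts = [1, 0, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 9, 1, 2]
toy_labels = classify_bins(toy_counts)
```

With a low mean count the lower limit is zero, so under these assumptions a sparsely varying stretch can only be flagged through the run rule, while a bin with an unusually high count is flagged by the upper limit.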
Figure 4. Multiple sequence alignment of representative amplified and re-sequenced cold spot and hot spot regions from clinical isolates.
Isolate names are shown to the left of the alignments, with F/R indicating forward/reverse strands. (A) Cold spot; (B) Hot spot.