Abstract
BACKGROUND: Microevolution is the study of short-term changes in alleles within a population and their effects on the phenotype of organisms. The result of below-species-level evolution is heterogeneity: populations consist of subpopulations carrying a large number of structural variations. Heterogeneity analysis is therefore essential to understanding how selective and neutral forces shape bacterial populations over a short period of time. The Solexa Genome Analyzer, a next-generation sequencing platform, produces millions of short sequencing reads with great accuracy, making it possible to study the dynamics of a bacterial population at the whole-genome level. The tool GenHtr was developed for genome-wide heterogeneity analysis.
Year: 2010 PMID: 20939910 PMCID: PMC2967562 DOI: 10.1186/1471-2105-11-508
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. The four-step procedure for heterogeneity analysis of a bacterial population. The first step (I) establishes genome-wide heterogeneity phenotypes for the newly sequenced bacterial strain (a clone population). It first creates a database of Reference Genome DNA Fragments (RGDF), a set of non-overlapping DNA fragments from the isogenic reference genome (IRG) (I.a). Once established, the RGDF is used to search the database of Solexa short reads of the strain via MegaBlast to identify its candidate trace sequences (I.b), which are then mapped to the IRG to define genetic heterogeneity (I.c). The second step (II) is a simulation procedure to study the genome complexity of the IRG. It creates an IRG-specific DNA database covering all possible N-base-pair "Solexa read-like" DNA fragments from the IRG (II.a); the analysis then follows the same procedure as I.b to I.c to identify candidate sequences from the IRG-specific DNA database (II.b) and to establish genome-wide "heterogeneity" genotypes (II.c). In the third step (III), the genotypes from steps I and II are compared. In the fourth step (IV), the genetic heterogeneity sites are analyzed with genewisedb for synonymous/non-synonymous mutations.
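Steps I.a and II.a differ only in how the reference is cut: I.a uses non-overlapping fragments, while II.a enumerates every N-bp window so that the simulated read set covers the reference as densely as possible. The minimal Python sketch below illustrates both, plus one possible reading of the comparison in step III (dropping observed sites that also appear in the error-free simulation, on the assumption that those reflect genome complexity such as repeats rather than population heterogeneity). The function names and lengths are hypothetical, and the MegaBlast search and mapping (I.b to I.c) are not reproduced.

```python
def rgdf_fragments(genome: str, frag_len: int) -> list[tuple[int, str]]:
    """Step I.a (sketch): cut the reference genome into non-overlapping
    fragments, returned as (0-based start, sequence) pairs.
    The last fragment may be shorter than frag_len."""
    return [(start, genome[start:start + frag_len])
            for start in range(0, len(genome), frag_len)]


def read_like_fragments(genome: str, n: int) -> list[tuple[int, str]]:
    """Step II.a (sketch): enumerate every possible n-bp "read-like"
    fragment of the reference, i.e. a sliding window with step 1."""
    return [(start, genome[start:start + n])
            for start in range(len(genome) - n + 1)]


def true_heterogeneity_sites(observed: set[int], simulated: set[int]) -> set[int]:
    """Step III (sketch, assumed rule): sites that also show up when the
    error-free reference itself is analysed are attributed to genome
    complexity and removed from the sites observed in the sequenced strain."""
    return observed - simulated


if __name__ == "__main__":
    ref = "ACGTACGTTGCA"
    print(rgdf_fragments(ref, 5))          # [(0, 'ACGTA'), (5, 'CGTTG'), (10, 'CA')]
    print(len(read_like_fragments(ref, 4)))  # 9 windows of length 4
    print(sorted(true_heterogeneity_sites({10, 42, 77}, {42})))  # [10, 77]
```

In practice the read-like set for a bacterial genome of a few megabases is large, which is why the caption frames step II as a database search rather than an in-memory list.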
Characterization of the genetic heterogeneity sites and SNPs in large gene families
| Maq | Chrom Position | Genotype profile at selected loci of SRX007711 | Genotype profile at selected loci of FPR3757 | Read Depth | SRX007711 Mean Phred Value | Max Phred Value | Functional Description |
|---|---|---|---|---|---|---|---|
| **I. Heterogeneity sites that passed the SNPfilter and have an average per-base Phred value greater than 13** | | | | | | | |
| * | 778416 | T:6 G:140 | G:37 | 6 | 31.5 | 40 | sulfatase family protein |
| * | 2512836 | C:7 G:115 | G:37 | 8 | 31.25 | 40 | sensor histidine kinase |
| * | 107624 | A:5 C:110 | C:37 | 5 | 29.8 | 40 | hypothetical protein |
| * | 435033 | T:5 G:98 | G:37 | 5 | 28.2 | 38 | hypothetical protein |
| * | 1021087 | T:5 G:151 | G:37 | 5 | 26.2 | 40 | lipoate-protein ligase A family protein |
| * | 1662849 | A:5 C:134 | C:37 | 6 | 24.66 | 40 | penicillin-binding protein 3 |
| * | 2648343 | A:5 C:192 | C:37 | 5 | 24.6 | 40 | drug transporter |
| * | 2674216 | A:5 C:170 | C:37 | 5 | 24.6 | 40 | phosphotransferase system, glucose-specific IIABC component |
| * | 1542366 | T:6 G:129 | G:37 | 6 | 24.5 | 40 | hypothetical protein |
| * | 1950547 | A:5 C:164 | C:37 | 5 | 23.4 | 40 | lantibiotic epidermin biosynthesis protein EpiC |
| * | 105211 | A:6 G:132 | G:37 | 8 | 20.3 | 40 | hypothetical protein |
| * | 1857182 | A:14 G:85 | G:37 | 14 | 19.42 | 40 | hypothetical protein |
| * | 1558524 | T:5 G:56 | G:37 | 7 | 17.57 | 40 | tail tape measure protein |
| * | 2122182 | C:122 G:6 | C:37 | 6 | 17.28 | 38 | repressor protein |
| * | 1383603 | A:5 G:128 | G:37 | 5 | 15.8 | 20 | oxacillin resistance-related FmtC protein |
| * | 2333470 | (A:2) T:5 C:164 | C:37 | 7 | 15.28 | 40 | lactose phosphotransferase system repressor |
| * | 1206348 | A:1 T:6 G:145 | G:37 | 7 | 14.71 | 25 | putative fibronectin/fibrinogen binding protein |
| * | 2262790 | A:153 (T:1) G:6 | A:37 | 7 | 13.14 | 29 | cation efflux family protein |
| **II. Heterogeneity sites that did not pass the SNPfilter but have an average per-base Phred value greater than 13** | | | | | | | |
| | 1180638 | T:5 G:97 | G:37 | 5 | 40 | 40 | cell division protein ftsA |
| | 1437922 | C:149 G:5 | C:37 | 5 | 39.2 | 40 | 2-oxoglutarate dehydrogenase E1 component |
| | 2212436 | A:5 C:107 | C:37 | 5 | 37.8 | 40 | thiamine-phosphate pyrophosphorylase |
| | 257712 | T:5 G:101 | G:37 | 5 | 35.6 | 40 | sensor histidine kinase family protein |
| | 861340 | A:9 C:136 | C:49 | 10 | 34 | 40 | clumping factor A |
| | 1714319 | T:6 C:95 | C:37 | 6 | 34 | 40 | acetyl-CoA carboxylase, biotin carboxyl carrier protein |
| | 1252956 | T:9 G:125 | G:37 | 9 | 33 | 40 | DNA topoisomerase I |
| | 1948255 | A:5 C:169 | C:37 | 5 | 32.8 | 40 | lantibiotic epidermin leader peptide processing serine protease EpiP |
| | 955972 | A:5 C:145 | C:37 | 5 | 32.6 | 40 | hypothetical protein |
| | 2123183 | A:25 X:29 | A:37 | 29 | 31.44 | 40 | putative phage transcriptional regulator |
| | 2638027 | A:5 C:112 | C:36 | 5 | 31 | 40 | gluconate kinase |
| | 1829558 | T:5 G:118 | G:37 | 5 | 30.8 | 40 | septation ring formation regulator EzrA |
| | 247386 | A:5 C:163 | C:37 | 5 | 30.4 | 40 | putative maltose ABC transporter, maltose-binding protein |
| | 2262622 | C:5 G:156 | G:37 | 5 | 30.4 | 40 | cation efflux family protein |
| | 344352 | C:460 G:7 | C:131 | 8 | 29.5 | 40 | hypothetical protein |
| | 346978 | C:383 G:7 | C:113 | 8 | 29.5 | 40 | hypothetical protein |
| | 472492 | T:5 G:194 | G:37 | 5 | 29.4 | 40 | hypothetical protein |
| | 2175831 | A:585 G:8 | A:229 | 9 | 28 | 40 | 5S ribosomal RNA |
| | 1753468 | C:96 G:5 | C:37 | 5 | 27.2 | 40 | hypothetical protein |
| | 1503664 | A:113 T:6 | A:37 | 6 | 27.16 | 40 | hypothetical protein |
| | 617974 | A:374 T:5 | A:99 | 5 | 26.2 | 38 | sdrD protein |
| | 154794 | A:5 C:136 | C:37 | 5 | 25.6 | 40 | Fe/Mn family superoxide dismutase |
| | 1943616 | C:5 G:110 | G:37 | 5 | 25.2 | 32 | serine protease SplA |
| | 2064321 | C:113 G:6 | C:37 | 6 | 24.8 | 40 | hypothetical protein |
| | 910508 | A:177 T:6 | A:37 | 7 | 24 | 40 | lipoyl synthase |
| | 408863 | T:6 G:160 | G:37 | 6 | 23.16 | 40 | 5-methyltetrahydropteroyltriglutamate--homocysteine S-methyltransferase |
| | 2417570 | A:5 G:109 | G:37 | 5 | 22.8 | 39 | Na+/H+ antiporter NhaC |
| | 1311574 | A:5 C:181 | C:37 | 5 | 22.6 | 40 | glycerol uptake operon antiterminator regulatory protein |
| | 175115 | X:6 G:136 | G:37 | 6 | 21.33 | 40 | capsular polysaccharide biosynthesis protein Cap5B |
| | 2775087 | A:14 T:107 | T:49 | 14 | 20.4 | 29 | clumping factor B |
| | 2678195 | A:5 G:92 | G:37 | 5 | 19.4 | 27 | LysR family transcriptional regulator |
| | 451366 | C:5 G:119 | G:37 | 5 | 19.2 | 33 | Superantigen-like protein 5 |
| | 1633215 | A:5 C:83 | C:37 | 5 | 18.4 | 28 | putative traG membrane protein |
| | 2114835 | A:6 C:136 | C:37 | 7 | 17.14 | 27 | phiPVL ORF046-like protein |
| | 1859648 | C:5 X:1 G:121 | G:37 | 6 | 16 | 24 | FtsK/SpoIIIE family protein |
| | 467549 | A:89 T:23 | A:58 | 23 | 15.56 | 40 | Staphylococcal tandem lipoprotein |
| | 2123177 | A:57 X:5 | A:37 | 5 | 15.4 | 21 | putative phage transcriptional regulator |
| | 36501 | A:29 G:391 | G:192 | 29 | 15.13 | 40 | putative transposase |
| | 1857109 | A:162 G:12 | A:37 | 12 | 14 | 40 | hypothetical protein |
| | 950365 | A:5 C:113 | C:37 | 5 | 13.8 | 17 | exonuclease RexB |
| | 801123 | T:5 G:91 | G:37 | 5 | 13.4 | 28 | transferrin receptor |
| | 2481059 | T:5 G:148 | G:37 | 5 | 13.4 | 21 | response regulator protein |
| | 1545118 | A:15 T:109 | T:37 | 15 | 13.2 | 40 | putative lipoprotein |
| **IV. Heterogeneity sites at RNA genes that passed the SNPfilter when single RNA genes were used as the reference sequence** | | | | | | | |
| * | 1997102 | A:69 T:13 | A:37 | 13 | 7.0 | 30 | Leu tRNA |
| | 1996261 | T:7 C:76 | C:55 | 8 | 14.25 | 30 | Met tRNA |
| * | 1961354 | T:12 C:25 | C:55 | 12 | 20.16 | 40 | Met tRNA |
| | 517898 | T:581 C:8 | T:231 | 8 | 31.37 | 40 | 5S ribosomal RNA |
| * | 556291 | T:560 C:16 | T:225 | 16 | 20.56 | 40 | 5S ribosomal RNA |
| | 561501 | T:580 C:8 | T:229 | 8 | 31.37 | 40 | 5S ribosomal RNA |
| | 1997607 | A:569 G:8 | A:218 | 8 | 31.37 | 40 | 5S ribosomal RNA |
| | 2292385 | A:549 G:8 | A:218 | 8 | 31.37 | 40 | 5S ribosomal RNA |
| | 516288 | A:34 G:408 | G:185 | 34 | 14.11 | 40 | 23S ribosomal RNA |
| | 517172 | T:6 G:495 | G:185 | 6 | 13.83 | 26 | 23S ribosomal RNA |
| | 559891 | A:34 G:408 | G:185 | 34 | 14.1 | 40 | 23S ribosomal RNA |
| | 560775 | T:6 G:495 | G:185 | 6 | 13.83 | 26 | 23S ribosomal RNA |
| | 1998333 | A:6 C:495 | C:185 | 6 | 13.8 | 26 | 23S ribosomal RNA |
| | 1999217 | T:34 C:408 | C:185 | 34 | 14.11 | 40 | 23S ribosomal RNA |
| | 2176557 | A:6 C:495 | C:185 | 6 | 13.83 | 26 | 23S ribosomal RNA |
| | 2177441 | T:34 C:408 | C:185 | 34 | 14.1 | 40 | 23S ribosomal RNA |
| | 2293111 | A:6 C:495 | C:185 | 6 | 13.83 | 26 | 23S ribosomal RNA |
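Each genotype profile above lists read counts per base at a site (for example, `T:6 G:140`), and the section criteria rest on Phred values, where a quality score Q relates to an error probability P by Q = -10 log10(P); the threshold of 13 therefore corresponds to an error probability of roughly 5%. The Python sketch below is a hypothetical helper, not part of GenHtr: it parses such profiles and computes the fraction of reads that do not support the majority base. It treats `X` and parenthesised counts as ordinary entries, since the table does not define them.

```python
import re

# One "base:count" entry; optional parentheses, as in "(A:2)", are tolerated.
ENTRY = re.compile(r"\(?([ACGTX]):(\d+)\)?")


def parse_profile(profile: str) -> dict[str, int]:
    """Parse a profile such as 'T:6 G:140' into {'T': 6, 'G': 140}."""
    return {base: int(count) for base, count in ENTRY.findall(profile)}


def non_majority_fraction(profile: str) -> float:
    """Fraction of reads that do not support the most frequent base."""
    counts = parse_profile(profile)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - max(counts.values())) / total


def phred_to_error(q: float) -> float:
    """Convert a Phred score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


if __name__ == "__main__":
    print(parse_profile("(A:2) T:5 C:164"))               # {'A': 2, 'T': 5, 'C': 164}
    print(round(non_majority_fraction("T:6 G:140"), 3))  # 0.041
    print(round(phred_to_error(13), 3))                   # 0.05
```

Applied to `A:1 T:6 G:145` (site 1206348), the helper counts both the A and the T reads as non-majority reads.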
Synonymous and non-synonymous analysis of mutations in the heterogeneity sites
| Chrom Position | Type of Mutation | Genotype at SRX007711 | Genotype at FPR3757 | Alignment with Orthologous Proteins | Gene and Function |
|---|---|---|---|---|---|
| 778416 | NONSYN | T:6 G:140 | G:37 | | sulfatase family protein |
| 1021087 | NONSYN | T:5 G:151 | G:37 | | lipoate-protein ligase A family protein |
| 1950547 | NONSYN | A:5 C:164 | C:37 | | lantibiotic epidermin biosynthesis protein EpiC |
| 2333470 | NONSYN | T:5 C:164 | C:37 | | lactose phosphotransferase system repressor |
| 2674216 | TRUNC | A:5 C:170 | C:37 | | phosphotransferase system, glucose-specific IIABC component |
| 2648343 | NONSYN | A:5 C:192 | C:37 | | drug transporter |
| 2262790 | NONSYN | A:153 G:6 | A:37 | | cation efflux family protein |
| 861340 | SYN | A:9 C:136 | C:49 | | clumping factor A |
| 1542366 | NONSYN | A:14 G:85 | G:37 | | hypothetical protein |
| 408863 | NONSYN | T:6 G:160 | G:37 | | 5-methyltetrahydropteroyltriglutamate--homocysteine S-methyltransferase |
| 1180638 | NONSYN | T:5 G:97 | G:37 | | cell division protein ftsA |
| 1714319 | NONSYN | T:6 C:95 | C:37 | | acetyl-CoA carboxylase, biotin carboxyl carrier protein |
| 2638027 | NONSYN | A:5 C:112 | C:36 | | gluconate kinase |
| 2481059 | NONSYN | T:5 G:148 | G:37 | | response regulator protein |
| 2212436 | NONSYN | A:5 C:107 | C:37 | | thiamine-phosphate pyrophosphorylase |
| 1252956 | NONSYN | T:9 G:125 | G:37 | | DNA topoisomerase I |
| 1948255 | NONSYN | A:5 C:169 | C:37 | tataggaggtag / ctaattctgcca | lantibiotic epidermin leader peptide processing serine protease EpiP |
| 1309034 | TRUNC | C:115 G:5 | C:37 | | DNA mismatch repair protein MutS |
| 950365 | NONSYN | A:5 C:113 | C:37 | | exonuclease RexB |
| 2512836 | TRUNC | C:7 G:115 | G:37 | | sensor histidine kinase |
| 2678195 | TRUNC | A:5 G:92 | G:37 | | LysR family transcriptional regulator |
Note. TRUNC: truncated protein; NONSYN: non-synonymous mutation; SYN: synonymous mutation. Alignments were made with either Genewisedb or Blastx.
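The SYN/NONSYN/TRUNC labels come from comparing the reference codon with the codon that carries the minor allele. The authors used Genewisedb and Blastx alignments for this; the Python sketch below is a simplified substitute that assumes the coding sequence is already known (written 5' to 3' on the coding strand) along with the site's 0-based offset within it. It translates both codons with the standard genetic code, whose amino-acid assignments bacterial translation table 11 shares, and treats a newly created stop codon as TRUNC, which is an assumption about how that label was assigned. `codon_at` and `classify` are hypothetical names.

```python
# Standard genetic code in TCAG order; '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_at(cds: str, offset: int) -> tuple[str, int]:
    """Return the codon containing a 0-based CDS offset and the offset's
    position (0, 1 or 2) within that codon."""
    start = offset - offset % 3
    return cds[start:start + 3].upper(), offset % 3


def classify(ref_codon: str, pos: int, alt_base: str) -> str:
    """Label a single-base change as SYN, NONSYN or TRUNC (assumed rule:
    TRUNC means the change creates a premature stop codon)."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos] + alt_base.upper() + ref_codon[pos + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if alt_aa == "*" and ref_aa != "*":
        return "TRUNC"
    if alt_aa == ref_aa:
        return "SYN"
    # Amino-acid substitutions (and stop-to-sense changes) end up here.
    return "NONSYN"


if __name__ == "__main__":
    cds = "ATGGAACTTTAA"            # toy CDS: Met-Glu-Leu-stop
    codon, pos = codon_at(cds, 3)   # first base of the GAA codon
    print(codon, pos, classify(codon, pos, "T"))  # GAA 0 TRUNC
    print(classify("CTT", 2, "C"))  # CTT -> CTC, both Leu: SYN
    print(classify("GAA", 1, "C"))  # GAA -> GCA, Glu -> Ala: NONSYN
```

For genes on the minus strand, the genomic base and its neighbours would first have to be reverse-complemented before `codon_at` applies.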