Stacy S Duncan, Pieter L Valk, Mark S McClain, Carrie L Shaffer, Jason A Metcalf, Seth R Bordenstein, Timothy L Cover.
Abstract
Helicobacter pylori infection is a risk factor for the development of gastric adenocarcinoma, a disease that has a high incidence in East Asia. Genes that are highly divergent in East Asian H. pylori strains compared to non-Asian strains are predicted to encode proteins that differ in functional activity and could represent novel determinants of virulence. To identify such proteins, we undertook a comparative analysis of sixteen H. pylori genomes, selected equally from strains classified as East Asian or non-Asian. As expected, the deduced sequences of two known virulence determinants (CagA and VacA) are highly divergent, with 77% and 87% mean amino acid sequence identities between East Asian and non-Asian groups, respectively. In total, we identified 57 protein sequences that are highly divergent between East Asian and non-Asian strains, but relatively conserved within East Asian strains. The most highly represented functional groups are hypothetical proteins, cell envelope proteins and proteins involved in DNA metabolism. Among the divergent genes with known or predicted functions, population genetic analyses indicate that 86% exhibit evidence of positive selection. McDonald-Kreitman tests further indicate that about one third of these highly divergent genes, including cagA and vacA, are under diversifying selection. We conclude that, similar to cagA and vacA, most of the divergent genes identified in this study evolved under positive selection, and represent candidate factors that may account for the disproportionately high incidence of gastric cancer associated with East Asian H. pylori strains. Moreover, these divergent genes represent robust biomarkers that can be used to differentiate East Asian and non-Asian H. pylori strains.
Year: 2013 PMID: 23383074 PMCID: PMC3561388 DOI: 10.1371/journal.pone.0055120
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. MLST analysis of H. pylori strains included in this study.
Nucleotide sequences of 7 conserved housekeeping genes (atpA, efp, mutY, ppa, trpC, ureI, and yphC) from 16 strains of H. pylori were concatenated and compared to corresponding loci from 445 reference strains (see Methods). Eight strains (98-10, 35A, 51, 52, F16, F30, F32 and F57) were classified as hspEAsia, six strains (26695, HPAG1, G27, P12, B8 and B38) were classified as hpEurope and two strains (J99 and 908) were classified as hspWAfrica.
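The concatenation step described in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the locus order is taken from the caption, while the function name, the input layout, and the toy sequences are assumptions made for the example.

```python
# Locus order as listed in the Figure 1 caption.
MLST_LOCI = ["atpA", "efp", "mutY", "ppa", "trpC", "ureI", "yphC"]


def concatenate_mlst(loci_by_strain):
    """Join the seven locus sequences of each strain in a fixed order.

    loci_by_strain is a hypothetical mapping {strain: {locus: sequence}}.
    A KeyError is raised if a strain lacks one of the seven loci.
    """
    concatenated = {}
    for strain, loci in loci_by_strain.items():
        concatenated[strain] = "".join(loci[locus] for locus in MLST_LOCI)
    return concatenated


# Toy placeholder sequences (not real data) for one strain.
toy = {"26695": {locus: "ATG" for locus in MLST_LOCI}}
mlst = concatenate_mlst(toy)
```

Using one fixed locus order for every strain keeps the concatenated sequences comparable position by position once the individual loci have been aligned.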
Classification of CagA and VacA in 16 H. pylori strains.
| Strain | cag PAI | CagA type (EPIYA) | VacA type |
| 98-10 | + | EPIYA-D | s1c/i1/m1 |
| 35A | + | EPIYA-D | s1c/i1/m1 |
| F16 | + | EPIYA-D | s1c/i1/m1 |
| F30 | + | EPIYA-D | s1c/i1/m1 |
| F32 | + | EPIYA-D | s1c/i1/m1 |
| F57 | + | EPIYA-D | s1a/i1/m1 |
| 51 | + | EPIYA-D | s1c/i1/m1 |
| 52 | + | EPIYA-D | s1c/i1/m1 |
| 26695 | + | EPIYA-C | s1a/i1/m1 |
| J99 | + | EPIYA-C | s1a/i1/m1 |
| HPAG1 | + | EPIYA-C | s1a/i1/m1 |
| G27 | + | EPIYA-C | s1b/i1/m1 |
| P12 | + | EPIYA-C | s1a/i1/m1 |
| B8 | + | EPIYA-C | s1a/i2/m2 |
| 908 | + | EPIYA-C | s1b/i1/m1 |
| B38 | − | Not applicable | s2/i2/m2 |
VacA is truncated in strains 52 and B8.
The cag PAI is absent from strain B38.
Proteins that are highly divergent in East Asian and non-Asian strains of H. pylori.
| Main role | Subrole | Gene ID | Annotation |
| Cell envelope | Other | HP0009 | outer membrane protein HopZ (omp1) |
| Cell envelope | Other | HP0025 | outer membrane protein HopD (omp2) |
| Cell envelope | Other | HP1243 | outer membrane protein BabA (omp28) |
| Cell envelope | Other | HP0373 | outer membrane protein HomC/HomD |
| Cell envelope | Other | NA | outer membrane protein HomB |
| Cell envelope | Other | HP0725 | outer membrane protein SabA/HopP (omp17) |
| Cell envelope | Other | HP0923 | outer membrane protein HopK (omp12) |
| Cell envelope | Other | HP0229 | outer membrane protein HopA (omp6) |
| Cell envelope | Other | HP1157 | outer membrane protein HopL (omp26) |
| Cell envelope | Other | HP0609/0610 | vacuolating cytotoxin (VacA)-like protein |
| Cell envelope | Other | HP0922 | vacuolating cytotoxin (VacA)-like protein |
| Cell envelope | Other | HP0492 | HpaA-like protein |
| Cell envelope | Biosynthesis and degradation of surface polysaccharides and lipopolysaccharides | HP0651 | alpha-(1,3)-fucosyltransferase |
| Cell envelope | Biosynthesis and degradation of surface polysaccharides and lipopolysaccharides | HP0159 | lipopolysaccharide 1,2-glucosyltransferase (RfaJ) |
| Cell envelope | Biosynthesis and degradation of murein sacculus and peptidoglycan | HP0160 | cysteine-rich protein D/beta-lactamase HcpD |
| Cellular processes | Pathogenesis | HP0547 | cytotoxin associated protein A (CagA) |
| Cellular processes | Toxin production and resistance | HP0887 | vacuolating cytotoxin A (VacA) |
| Cellular processes | Chemotaxis and motility | HP0906 | flagellar hook-length control protein |
| DNA metabolism | DNA replication, recombination, and repair | HP1553 | recombination protein RecB/helicase |
| DNA metabolism | DNA replication, recombination, and repair | HP0661 | ribonuclease H (RnhA) |
| DNA metabolism | DNA replication, recombination, and repair | HP1323 | ribonuclease HII (RnhB) |
| DNA metabolism | Restriction/modification | HP0463 | type I restriction enzyme M protein/HsdM |
| DNA metabolism | Restriction/modification | HP0850 | type I restriction enzyme M protein (HsdM) |
| DNA metabolism | Restriction/modification | HP1354 | type IIG restriction-modification enzyme/adenine specific DNA methyltransferase |
| DNA metabolism | Restriction/modification | HP1371 | type III restriction enzyme R protein |
| Protein fate | Degradation of proteins, peptides, and glycopeptides | HP0806 | metalloprotease |
| Protein fate | Protein and peptide secretion and trafficking | HP1255 | preprotein translocase subunit SecG |
| Protein synthesis | tRNA and rRNA base modification | HP1415 | tRNA delta(2)-isopentenylpyrophosphate transferase (MiaA) |
| Protein synthesis | tRNA aminoacylation | HP1513 | selenocysteine synthase (SelA)/L-seryl-tRNA(Sec)selenium transferase |
| Purines, pyrimidines, nucleosides, and nucleotides | Purine ribonucleotide biosynthesis | HP1530 | purine nucleoside phosphorylase (PunB) |
| Transcription | RNA processing | HP0640 | poly(A) polymerase (PapS) |
| Unknown function | General | HP0322 | poly E-rich protein |
| Hypothetical | Conserved | HP0728 | tRNA(Ile)-lysidine synthase (TilS) |
| Hypothetical | Conserved | HP0729 | probable ATP/GTP binding protein |
| Hypothetical | Conserved | HP1250 | bacterial SH3 domain protein |
| Hypothetical | Conserved | HP0852 | excinuclease ATPase subunit |
| Hypothetical | Conserved | HP1265 | NADH-ubiquinone oxidoreductase chain F (NuoF) |
| Hypothetical | Conserved | HP0721 | hypothetical protein |
| Hypothetical | Conserved | HP0636 | hypothetical protein |
| Hypothetical | Conserved | HP1579 | hypothetical protein |
| Hypothetical | Conserved | HP0861 | hypothetical protein |
| Hypothetical | Conserved | HP0384 | hypothetical protein |
| Hypothetical | Conserved | HP0635 | hypothetical protein |
| Hypothetical | Conserved | HP0897 | hypothetical protein |
| Hypothetical | Conserved | HP0398 | hypothetical protein |
| Hypothetical | Conserved | HP0629 | hypothetical protein |
| Hypothetical | Conserved | HP0973 | hypothetical protein |
| Hypothetical | Conserved | HP0167 | hypothetical protein |
| Hypothetical | Conserved | HP0120 | hypothetical protein |
| Hypothetical | Conserved | HP0583 | hypothetical protein |
| Hypothetical | Conserved | HP0119 | hypothetical protein |
| Hypothetical | Conserved | HP0681 | hypothetical protein |
| Hypothetical | Conserved | HP1321 | hypothetical protein |
| Hypothetical | Conserved | HP0833 | hypothetical protein |
| Hypothetical | Conserved | HP0338 | hypothetical protein |
| Hypothetical | Conserved | HP0061 | hypothetical protein |
| Hypothetical | Conserved | HP1322 | hypothetical protein |
| Control housekeeping genes | | | |
| Energy metabolism | ATP-proton motive force interconversion | HP1134 | ATP synthase F0F1 subunit alpha (AtpA) |
| Protein synthesis | Translation factors | HP0177 | elongation factor P (Efp) |
| DNA metabolism | DNA replication, recombination, and repair | HP0142 | A/G-specific adenine glycosylase (MutY) |
| Central intermediary metabolism | Phosphorus compounds | HP0620 | inorganic pyrophosphatase (Ppa) |
| Tryptophan biosynthesis | Aromatic amino acid family | HP1279 | anthranilate isomerase (TrpC) |
| Central intermediary metabolism | Other | HP0071 | urease accessory protein (UreI) |
| Unknown function | General | HP0834 | GTP-binding protein (YphC) |
Assignment of genes into functional groups is based on classifications of H. pylori 26695 genes reported in the JCVI Comprehensive Microbial Resource database, based on analysis of three H. pylori genomes (26695, J99 and HPAG1).
Gene numbers in H. pylori reference strain 26695 are shown.
Annotations are based on data reported in the JCVI Comprehensive Microbial Resource database or data reported in Genbank at the time when this study was undertaken.
Three proteins initially classified as “hypothetical” were subsequently found to exhibit similarity to proteins of known function. These include HP0861 [corresponding to a heavy metal (copper tolerance) protein in Shewanella and an integral membrane protein in Campylobacter], HP0635 (corresponding to hydrogenase E in Campylobacter) and HP1321 (corresponding to an ATPase in Wolinella and other species). Conserved domain analysis indicates that HP0861 belongs to the Dsb superfamily, HP0384 belongs to the SPOR superfamily, and HP1321 belongs to both the P-loop-containing nucleoside triphosphate hydrolase superfamily and the helix-turn-helix superfamily.
NA, not applicable; HomB is absent from strain 26695.
Analysis of nucleotide diversity.
| Annotation | Gene ID (26695) | Mean % aa identity (EA vs. Non-EA) | πa-EA | πa-Non EA | πs-EA | πs-Non EA | Ka/Ks (EA-NEA) |
| HopZ (omp1) | HP0009 | 73.61 | 0.069 | 0.039 | 0.228 | 0.186 | 0.264 |
| HopD (omp2) | HP0025 | 88.96 | 0.015 | 0.042 | 0.097 | 0.212 | 0.224 |
| BabA (omp28) | HP1243 | 87.65 | 0.050 | 0.053 | 0.186 | 0.266 | 0.226 |
| HomC/HomD | HP0373 | 80.21 | 0.012 | 0.068 | 0.073 | 0.272 | 0.293 |
| HomB | NA | 86.77 | 0.053 | 0.048 | 0.195 | 0.236 | 0.220 |
| SabA/HopP (omp17) | HP0725 | 82.78 | 0.070 | 0.039 | 0.171 | 0.183 | 0.299 |
| HopK (omp12) | HP0923 | 89.23 | 0.021 | 0.040 | 0.100 | 0.205 | 0.197 |
| HopA (omp6) | HP0229 | 89.24 | 0.040 | 0.043 | 0.116 | 0.170 | 0.259 |
| HopL (omp26) | HP1157 | 89.58 | 0.032 | 0.036 | 0.132 | 0.202 | 0.210 |
| VacA-like protein | HP0609/0610 | 91.09 | 0.024 | 0.037 | 0.144 | 0.251 | 0.162 |
| VacA-like protein | HP0922 | 89.93 | 0.020 | 0.027 | 0.083 | 0.167 | 0.197 |
| HpaA-like protein | HP0492 | 71.03 | 0.014 | 0.038 | 0.053 | 0.128 | 0.403 |
| alpha-(1,3)-fucosyltransferase | HP0651 | 81.68 | 0.023 | 0.057 | 0.174 | 0.294 | 0.218 |
| lipopolysaccharide 1,2-glucosyltransferase (rfaJ) | HP0159 | 86.99 | 0.020 | 0.057 | 0.063 | 0.193 | 0.331 |
| cysteine-rich protein D/beta-lactamase (hcpD) | HP0160 | 89.61 | 0.022 | 0.035 | 0.088 | 0.141 | 0.320 |
| cytotoxin associated protein A (cagA) | HP0547 | 77.74 | 0.018 | 0.054 | 0.067 | 0.116 | 0.414 |
| vacuolating cytotoxin A (vacA) | HP0887 | 87.39 | 0.013 | 0.057 | 0.097 | 0.203 | 0.260 |
| flagellar hook-length control protein | HP0906 | 85.76 | 0.030 | 0.049 | 0.108 | 0.178 | 0.279 |
| recombination protein RecB/helicase | HP1553 | 89.52 | 0.020 | 0.032 | 0.102 | 0.101 | 0.229 |
| ribonuclease H (rnhA) | HP0661 | 79.16 | 0.020 | 0.020 | 0.149 | 0.076 | 0.206 |
| ribonuclease HII (rnhB) | HP1323 | 87.90 | 0.035 | 0.042 | 0.165 | 0.173 | 0.222 |
| type I restriction enzyme M protein (hsdM) | HP0463 | 89.23 | 0.025 | 0.037 | 0.091 | 0.147 | 0.250 |
| type I restriction enzyme M protein (hsdM) | HP0850 | 87.75 | 0.026 | 0.044 | 0.131 | 0.208 | 0.213 |
| type IIG restriction-modification enzyme | HP1354 | 82.75 | 0.031 | 0.072 | 0.098 | 0.242 | 0.321 |
| type III restriction enzyme R protein | HP1371 | 83.71 | 0.031 | 0.036 | 0.093 | 0.155 | 0.264 |
| metalloprotease | HP0806 | 87.69 | 0.020 | 0.047 | 0.090 | 0.231 | 0.251 |
| preprotein translocase subunit secG | HP1255 | 89.15 | 0.012 | 0.024 | 0.087 | 0.169 | 0.179 |
| tRNA delta(2)-isopentenylpyrophosphate transferase (miaA) | HP1415 | 80.77 | 0.020 | 0.055 | 0.100 | 0.195 | 0.268 |
| selenocysteine synthase (SelA)/L-seryl-tRNA(Sec) selenium transferase | HP1513 | 89.50 | 0.024 | 0.038 | 0.093 | 0.195 | 0.221 |
| purine nucleoside phosphorylase (punB) | HP1530 | 89.03 | 0.016 | 0.037 | 0.091 | 0.200 | 0.223 |
| poly(A) polymerase (papS) | HP0640 | 89.11 | 0.021 | 0.034 | 0.106 | 0.187 | 0.190 |
| poly E-rich protein | HP0322 | 72.27 | 0.027 | 0.048 | 0.105 | 0.189 | 0.243 |
| tRNA(Ile)-lysidine synthase | HP0728 | 89.96 | 0.017 | 0.034 | 0.078 | 0.150 | 0.236 |
| probable ATP/GTP binding protein | HP0729 | 88.15 | 0.035 | 0.031 | 0.151 | 0.158 | 0.190 |
| bacterial SH3 domain protein | HP1250 | 77.58 | 0.038 | 0.056 | 0.122 | 0.131 | 0.445 |
| Excinuclease ATPase subunit | HP0852 | 83.85 | 0.038 | 0.053 | 0.100 | 0.194 | 0.284 |
| NADH-ubiquinone oxidoreductase chain F | HP1265 | 88.97 | 0.018 | 0.038 | 0.078 | 0.186 | 0.218 |
| Control housekeeping genes | | | | | | | |
| ATP synthase F0F1 subunit alpha (atpA) | HP1134 | 98.00 | 0.003 | 0.003 | 0.076 | 0.105 | 0.027 |
| elongation factor P (efp) | HP0177 | 98.00 | 0.002 | 0.003 | 0.107 | 0.159 | 0.020 |
| A/G-specific adenine glycosylase (mutY) | HP0142 | 94.00 | 0.011 | 0.026 | 0.095 | 0.223 | 0.114 |
| inorganic pyrophosphatase (ppa) | HP0620 | 96.00 | 0.004 | 0.005 | 0.060 | 0.123 | 0.092 |
| anthranilate isomerase (trpC) | HP1279 | 94.00 | 0.020 | 0.032 | 0.088 | 0.188 | 0.173 |
| urease accessory protein (ureI) | HP0071 | 97.00 | 0.001 | 0.007 | 0.047 | 0.103 | 0.061 |
| GTP-binding protein (yphC) | HP0834 | 96.00 | 0.008 | 0.018 | 0.078 | 0.154 | 0.096 |
Outlier sequences were not removed prior to these analyses.
The mean % amino acid identity when comparing East Asian (EA) and non-EA sequences was significantly higher for the control group of housekeeping genes than for the group of divergent genes.
The mean Ka/Ks ratio, calculated based on comparison of East Asian sequences with non-EA (NEA) sequences, was significantly higher for the group of divergent genes than for the control group of housekeeping genes.
NA, not applicable; HomB is absent from strain 26695.
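The "Mean % aa identity (EA vs. Non-EA)" column can be read as an average over all East Asian versus non-East Asian sequence pairs. The sketch below shows one way such a value could be computed from aligned protein sequences. It is an assumption-laden illustration: the function names are hypothetical, and skipping alignment columns that contain a gap is a choice made here, not a documented detail of the study.

```python
from itertools import product


def percent_identity(seq_a, seq_b):
    """Percent identical residues between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def mean_between_group_identity(group_1, group_2):
    """Average percent identity over every pair drawn from the two groups."""
    values = [percent_identity(a, b) for a, b in product(group_1, group_2)]
    return sum(values) / len(values)


# Toy aligned peptides (not real data).
east_asian = ["MKLV", "MKLV"]
non_asian = ["MKLI", "MKAV"]
mean_identity = mean_between_group_identity(east_asian, non_asian)
```

Only between-group pairs enter the average, so a gene can score low here while still being well conserved within each group.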
Figure 2. Bayesian phylogenies for six representative proteins that are highly divergent in East Asian H. pylori strains compared to non-Asian strains.
These include CagA (HP0547), VacA (HP0887), HpaA-like protein (HP0492), HopK (HP0923), HopL (HP1157), and a VacA-like protein (HP0922). The best available model of evolution was determined with ProtTest and phylogenies were inferred using MrBayes. Asterisks indicate posterior probabilities greater than 0.75. Sequences from East Asian strains (boxed) are highly divergent when compared to corresponding amino acid sequences from non-Asian strains of H. pylori. Scale bars show number of substitutions per site.
Figure 3. Analyses of Ka/Ks values for 37 divergent genes with predicted functions and seven housekeeping genes.
Gene-wide Ka/Ks ratios were calculated, comparing sequences from East Asian strains with corresponding sequences from non-Asian strains, without the removal of outliers. (A) Distribution of Ka/Ks values. Ka/Ks values of highly divergent genes were significantly higher than Ka/Ks values of housekeeping genes. (B) Simple linear regression analysis comparing Ka/Ks values with mean % amino acid identity values (East Asian vs. non-Asian) for the 37 highly divergent proteins.
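A simple linear regression of the kind described for panel B can be sketched with ordinary least squares. The function below is a generic illustration rather than the software used in the study. The five data points are taken from the nucleotide diversity table (HopZ, HpaA-like protein, cagA, vacA, and the HP0609/0610 VacA-like protein) purely as a small example subset, so the fit does not reproduce the published 37-gene analysis.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Returns (slope, intercept, r), where r is Pearson's correlation.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance cannot be fitted")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Example subset: mean % aa identity (x) and Ka/Ks (y) for five genes.
identity = [73.61, 71.03, 77.74, 87.39, 91.09]
ka_ks = [0.264, 0.403, 0.414, 0.260, 0.162]
slope, intercept, r = linear_fit(identity, ka_ks)
```

A negative slope for this subset means that the proteins with lower East Asian versus non-Asian identity tend to have higher Ka/Ks values.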
Analysis of positive selection using McDonald-Kreitman test.
| Annotation | Gene ID (26695) | Dn | Ds | Pn | Ps | P value | NI | α-Value |
| HopZ (omp1) | HP0009 | 12.07 | 16.44 | 285 | 346 | 0.765 | 1.122 | −0.122 |
| HopD (omp2) | HP0025 | 22.21 | 25.98 | 95 | 141 | 0.561 | 0.788 | 0.211 |
| BabA (omp28) | HP1243 | 28.33 | 28.08 | 127 | 244 | 0.020 | 0.515 | 0.484 |
| HomC/HomD | HP0373 | 38.63 | 37 | 177 | 212 | 0.373 | 0.799 | 0.200 |
| HomB | NA | 5.01 | 9.13 | 198 | 291 | 0.702 | 1.239 | −0.239 |
| SabA/HopP (omp17) | HP0725 | 21.21 | 15.39 | 177 | 215 | 0.137 | 0.597 | 0.402 |
| HopK (omp12) | HP0923 | 13.13 | 14.62 | 58 | 77 | 0.672 | 0.838 | 0.161 |
| HopA (omp6) | HP0229 | 4.01 | 9.18 | 126 | 113 | 0.114 | 2.552 | −1.552 |
| HopL (omp26) | HP1157 | 29.20 | 24.52 | 201 | 307 | 0.036 | 0.549 | 0.450 |
| VacA-like protein | HP0609/0610 | 36.11 | 38.47 | 743 | 1321 | 0.028 | 0.599 | 0.400 |
| VacA-like protein | HP0922 | 58.41 | 57.43 | 325 | 488 | 0.032 | 0.654 | 0.345 |
| HpaA-like protein | HP0492 | 79.15 | 57.12 | 45 | 41 | 0.399 | 0.792 | 0.207 |
| alpha-(1,3)-fucosyltransferase | HP0651 | 27.51 | 24.45 | 145 | 220 | 0.070 | 0.585 | 0.414 |
| lipopolysaccharide 1,2-glucosyltransferase (rfaJ) | HP0159 | 12.11 | 6.10 | 76 | 71 | 0.232 | 0.539 | 0.460 |
| cysteine-rich protein D/beta-lactamase HcpD | HP0160 | 8.06 | 3.03 | 45 | 63 | 0.047 | 0.268 | 0.731 |
| cytotoxin associated protein A (cagA) | HP0547 | 144.15 | 79.93 | 209 | 165 | 0.041 | 0.702 | 0.297 |
| vacuolating cytotoxin A (vacA) | HP0887 | 64.97 | 51.03 | 183 | 271 | 0.002 | 0.530 | 0.469 |
| flagellar hook-length control protein | HP0906 | 13.09 | 9.18 | 106 | 103 | 0.469 | 0.721 | 0.278 |
| recombination protein RecB/helicase | HP1553 | 40.50 | 28.97 | 132 | 167 | 0.033 | 0.565 | 0.434 |
| ribonuclease H (rnhA) | HP0661 | 4.04 | 2.04 | 9 | 17 | 0.150 | 0.267 | 0.732 |
| ribonuclease HII (rnhB) | HP1323 | 6.05 | 7.29 | 34 | 50 | 0.736 | 0.819 | 0.180 |
| type I restriction enzyme M protein (hsdM) | HP0463 | 11.06 | 11.25 | 149 | 142 | 0.881 | 1.067 | −0.067 |
| type I restriction enzyme M protein (hsdM) | HP0850 | 17.16 | 14.43 | 93 | 122 | 0.242 | 0.641 | 0.358 |
| type IIG restriction-modification enzyme | HP1354 | 8.02 | 6.04 | 404 | 365 | 0.738 | 0.834 | 0.165 |
| type III restriction enzyme R protein | HP1371 | 10.03 | 13.25 | 228 | 188 | 0.269 | 1.602 | −0.602 |
| metalloprotease | HP0806 | 8.09 | 4.08 | 41 | 52 | 0.141 | 0.398 | 0.601 |
| preprotein translocase subunit SecG | HP1255 | 5.03 | 4.09 | 23 | 38 | 0.314 | 0.491 | 0.508 |
| tRNA delta(2)-isopentenylpyrophosphate transferase (miaA) | HP1415 | 9.09 | 9.34 | 55 | 63 | 0.828 | 0.897 | 0.102 |
| selenocysteine synthase (SelA)/L-seryl-tRNA(Sec) selenium transferase | HP1513 | 13.13 | 12.44 | 60 | 94 | 0.237 | 0.604 | 0.395 |
| purine nucleoside phosphorylase (punB) | HP1530 | 7.07 | 4.10 | 30 | 40 | 0.202 | 0.434 | 0.565 |
| poly(A) polymerase (papS) | HP0640 | 7.03 | 10.29 | 52 | 88 | 0.778 | 0.864 | 0.135 |
| poly E-rich protein | HP0322 | 14.15 | 6.12 | 64 | 62 | 0.111 | 0.446 | 0.553 |
| tRNA(Ile)-lysidine synthase | HP0728 | 8.05 | 7.16 | 51 | 55 | 0.725 | 0.825 | 0.174 |
| probable ATP/GTP binding protein | HP0729 | 10.08 | 21.39 | 90 | 90 | 0.062 | 2.121 | −1.121 |
| bacterial SH3 domain protein | HP1250 | 17.52 | 8.44 | 39 | 26 | 0.505 | 0.722 | 0.277 |
| excinuclease ATPase subunit | HP0852 | 6.03 | 6.14 | 95 | 82 | 0.779 | 1.180 | −0.180 |
| NADH-ubiquinone oxidoreductase chain F | HP1265 | 7.05 | 3.03 | 54 | 58 | 0.186 | 0.400 | 0.599 |
| Control housekeeping genes | | | | | | | | |
| ATP synthase F0F1 subunit alpha (atpA) | HP1134 | 0 | 2 | 5 | 79 | 0.721 | Null | Null |
| elongation factor P (efp) | HP0177 | 0 | 1 | 3 | 47 | 0.800 | Null | Null |
| A/G-specific adenine glycosylase (mutY) | HP0142 | 1 | 3.03 | 32 | 81 | 0.023 | 1.197 | −0.197 |
| inorganic pyrophosphatase (ppa) | HP0620 | 5.04 | 5.16 | 1 | 22 | 0.001 | 0.046 | 0.953 |
| anthranilate isomerase (trpC) | HP1279 | 5.01 | 7.11 | 62 | 109 | 0.722 | 0.807 | 0.192 |
| urease accessory protein (ureI) | HP0071 | 0 | 2.01 | 8 | 43 | 0.541 | Null | Null |
| GTP-binding protein (yphC) | HP0834 | 2 | 4.06 | 17 | 50 | 0.680 | 0.689 | 0.310 |
Outlier sequences were not removed prior to these analyses.
The neutrality index (NI) was calculated from the ratio of the number of polymorphisms to the number of substitutions as follows: NI = (Pn/Ps)/(Dn/Ds), where P is polymorphic within the population, D is divergence or fixed difference between populations, n is nonsynonymous, and s is synonymous.
The α-value is the proportion of adaptive substitutions; it ranges from −∞ to 1 and is estimated as 1 − NI.
NA, not applicable; HomB is absent from strain 26695.
Asterisks indicate genes showing signatures of diversifying selection.
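The neutrality index and α defined in these footnotes can be checked against the table with a few lines of arithmetic. The sketch below reads the four count columns of a row, in order, as Dn, Ds, Pn and Ps. That reading is an assumption, although it reproduces the tabulated NI and α for HopZ. Returning `None` when a ratio is undefined is also a choice made here, meant to mirror the "Null" entries for rows with no fixed nonsynonymous differences.

```python
def neutrality_index(dn, ds, pn, ps):
    """NI = (Pn/Ps) / (Dn/Ds).

    Returns None when the ratio is undefined (Ps, Ds or Dn equal to zero).
    """
    if ps == 0 or ds == 0 or dn == 0:
        return None
    return (pn / ps) / (dn / ds)


def alpha(dn, ds, pn, ps):
    """Proportion of adaptive substitutions, estimated as 1 - NI."""
    ni = neutrality_index(dn, ds, pn, ps)
    return None if ni is None else 1.0 - ni


# HopZ (HP0009) row, columns read as Dn, Ds, Pn, Ps.
hopz_ni = neutrality_index(12.07, 16.44, 285, 346)
hopz_alpha = alpha(12.07, 16.44, 285, 346)
print(round(hopz_ni, 3), round(hopz_alpha, 3))  # 1.122 -0.122
```

An NI below 1 (α above 0) indicates an excess of fixed nonsynonymous differences relative to polymorphism, which is the pattern expected under positive selection. An NI above 1 indicates the opposite.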
Figure 4. Correlation between Ka/Ks values and πs values among 12 genes under diversifying selection, based on MKT analysis.
(A,B) A linear regression analysis showed a significant correlation between Ka/Ks values and πs values when analyzing sequences from non-Asian strains (p<0.05). The correlation was not significant when analyzing these sequences from East Asian strains (p = 0.07). (C) There was a strong positive correlation when comparing πs values of these 12 genes from East Asian strains with the corresponding πs values from non-Asian strains (p<0.001).
Figure 5. Lack of correlation between Ka/Ks values and πs values among 25 genes that were not under diversifying selection, based on MKT analysis.
(A,B) Linear regression analyses showed non-significant trends when comparing Ka/Ks values to πs values (p>0.05). (C) There was no significant correlation between the πs values of these sequences from East Asian strains and the corresponding πs values from non-Asian strains (p>0.05).
Figure 6. Sliding window analysis of positive selection (Ka/Ks) within selected genes.
Sliding window analysis was performed to analyze the sequences of representative housekeeping genes (trpC and yphC) and several representative highly divergent genes, including vacA and cagA (under positive selection by MKT analysis) and an hpaA-like gene, hopK, rnhB, and HP1265 (not under positive selection by MKT analysis). Sequences from strain F16 (East Asian) and 26695 (non-Asian) were aligned and Ka/Ks ratios were calculated using DnaSP. In cases where sequences were not available from strains F16 or 26695, other representative East Asian or non-Asian sequences were analyzed. Parameters for the sliding window analysis were set at 50 bases (window size) and a step size of 10 bases. A Ka/Ks value of >1 indicates positive selection.
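The window and step parameters given in this caption can be illustrated with a generic sliding-window scan over two aligned sequences. This sketch does not reimplement the Ka/Ks calculation performed in DnaSP. The per-window statistic is passed in as a function, and the `proportion_different` example is a simple stand-in. Dropping a trailing partial window is an assumption of the sketch.

```python
def sliding_windows(length, window=50, step=10):
    """Yield (start, end) half-open index pairs for full windows only."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    for start in range(0, length - window + 1, step):
        yield start, start + window


def scan(seq_a, seq_b, statistic, window=50, step=10):
    """Apply statistic(sub_a, sub_b) to each window of two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (start, end, statistic(seq_a[start:end], seq_b[start:end]))
        for start, end in sliding_windows(len(seq_a), window, step)
    ]


def proportion_different(sub_a, sub_b):
    """Stand-in statistic: fraction of differing positions (not Ka/Ks)."""
    return sum(x != y for x, y in zip(sub_a, sub_b)) / len(sub_a)


# Toy aligned sequences (not real data): the last 10 bases differ.
seq_1 = "A" * 70
seq_2 = "A" * 60 + "C" * 10
profile = scan(seq_1, seq_2, proportion_different)
```

With a 50-base window and a 10-base step, consecutive windows overlap by 40 bases, so a short divergent region shows up in several adjacent windows rather than one.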