Mark S McClain, Carrie L Shaffer, Dawn A Israel, Richard M Peek, Timothy L Cover
Abstract
BACKGROUND: Persistent colonization of the human stomach by Helicobacter pylori is associated with asymptomatic gastric inflammation (gastritis) and an increased risk of duodenal ulceration, gastric ulceration, and non-cardia gastric cancer. In previous studies, the genome sequences of H. pylori strains from patients with gastritis or duodenal ulcer disease have been analyzed. In this study, we analyzed the genome sequences of an H. pylori strain (98-10) isolated from a patient with gastric cancer and an H. pylori strain (B128) isolated from a patient with gastric ulcer disease.
Year: 2009 PMID: 19123947 PMCID: PMC2627912 DOI: 10.1186/1471-2164-10-3
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Features of H. pylori genomes
| Strain | 26695 | J99 | HPAG1 | 98-10 | B128 |
| Origin | U.K. | U.S. | Sweden | Japan | U.S. |
| Disease state (a) | Gastritis only | DU | AG | GC | GU |
| | Yes | Yes | Yes | Yes | Yes |
| vacA type | s1a/m1 | s1b/m1 | s1b/m1 | s1c/m1 | s1a/m2 (h) |
| Genome size (Mb) | 1.67 | 1.64 | 1.61 (b) | 1.6 (c) | 1.6 (c) |
| Total no. of ORFs | 1564 (d) | 1491 (e) | 1544 (f) | 1527 | 1731 |
| No. of strain-specific genes (g) | 69 | 23 | 38 | 22 | 51 |
a DU, duodenal ulcer; AG, atrophic gastritis; GC, gastric cancer; GU, gastric ulcer
b Includes a 9.3 kb plasmid.
c The genome size of strain 98-10 is based on analysis of 51 large contigs, as defined in Methods. The genome size of strain B128 is based on analysis of 73 large contigs.
d The current analysis is based on data downloaded from TIGR, comprising 1564 ORFs. In contrast, a table on the TIGR website lists 1587 ORFs in strain 26695, and Genbank sequence files include 1566 ORFs from strain 26695.
e Additional ORFs, not included in this total, were subsequently detected in strain J99 [43].
f The HPAG1 chromosome contains 1,536 predicted protein-coding genes, and the remainder are contained on a plasmid.
g Present in only one of five strains analyzed in this study.
h vacA is truncated in strain B128.
Figure 1. Phylogenetic structure based on sequence analysis of 8 . H. pylori strains analyzed in this figure include strains 98-10, B128, three strains for which genome sequences were previously determined (26695, J99, HPAG1), and representative strains isolated from patients in diverse geographic locations [18]. The figure lists the strain designations and the countries where strains were isolated. The nucleotide sequences of the concatenated MLST loci were aligned and compared, as described in Methods. All positions containing gaps and missing data were eliminated from the dataset. There were a total of 3041 positions in the final dataset. Neighbor-joining trees were constructed based on distances estimated by the Kimura 2-parameter model of nucleotide substitution [57,58]. The bootstrap consensus tree inferred from 1000 replicates is taken to represent the evolutionary history of the strains analyzed [59]. Branches corresponding to partitions reproduced in fewer than 50% bootstrap replicates are collapsed. The tree is drawn to scale, with the branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. Phylogenetic analyses were conducted in MEGA4 [63]. Five H. pylori strains for which genome sequences were available are denoted by diamonds. Three main H. pylori population groups (East Asian, European, and West African) are identifiable.
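The Kimura 2-parameter distance named in the caption can be illustrated for a single pair of aligned sequences. The following is a minimal sketch, not the MEGA4 implementation: the function name `k2p_distance` is hypothetical, and gaps are skipped pair by pair here, whereas the caption describes removing gapped positions from the whole dataset before analysis.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned nucleotide sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Skip gaps and ambiguous bases (assumption: pairwise deletion)
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A transition stays within purines or within pyrimidines
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

A matrix of such pairwise distances is the usual input to neighbor-joining tree construction; the tree-building and bootstrap steps are not shown.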
Figure 2. Comparison of predicted proteomes by BLAST-score ratio (BSR) analysis. The left panel shows a BSR analysis of proteins encoded by strain J99 and HPAG1, with strain 26695 as the reference strain. The right panel shows a BSR analysis of proteins encoded by strain 98-10 and B128, with strain 26695 as the reference strain. The BSR approach analyzes all proteins predicted to be encoded by three genomes, using a measure of similarity based on the ratio of BLAST scores, as described in the Methods. Proteins depicted within the box at the lower left corner (BSR <0.4) correspond to proteins present in the reference proteome (strain 26695) but absent from the two query proteomes. The upper right quadrant represents proteins conserved in all three proteomes.
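The BSR classification described in the caption can be sketched as follows. This assumes the commonly used definition in which a query hit's BLAST bit score is divided by the score of the reference protein aligned against itself; the Methods are not reproduced here, so that normalization, the function names, the use of a single 0.4 cutoff on both axes, and the treatment of missing hits as a score of 0 are all assumptions for illustration.

```python
def blast_score_ratio(query_score, self_score):
    """Ratio of a query hit's bit score to the reference protein's self-hit score."""
    if self_score <= 0:
        raise ValueError("self-hit score must be positive")
    return query_score / self_score

def classify(ref_self, hits_q1, hits_q2, cutoff=0.4):
    """Label each reference protein by its BSR against two query proteomes.

    ref_self: {protein: self-hit bit score} for the reference proteome.
    hits_q1, hits_q2: {protein: best-hit bit score} in each query proteome.
    """
    labels = {}
    for protein, self_score in ref_self.items():
        # Assumption: a protein with no hit in a query proteome scores 0
        bsr1 = blast_score_ratio(hits_q1.get(protein, 0.0), self_score)
        bsr2 = blast_score_ratio(hits_q2.get(protein, 0.0), self_score)
        if bsr1 < cutoff and bsr2 < cutoff:
            labels[protein] = "absent from both queries"
        elif bsr1 >= cutoff and bsr2 >= cutoff:
            labels[protein] = "present in all three"
        else:
            labels[protein] = "present in one query"
    return labels

# Hypothetical bit scores, for illustration only
reference = {"HP0001": 500.0, "HP0002": 300.0, "HP0003": 200.0}
query_a = {"HP0001": 480.0, "HP0003": 190.0}
query_b = {"HP0001": 450.0, "HP0003": 20.0}
labels = classify(reference, query_a, query_b)
```

Plotting `bsr1` against `bsr2` for every reference protein gives the kind of scatter shown in each panel, with the lower left box holding proteins labeled as absent from both queries.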
Figure 3. Relatedness of core proteins predicted to be encoded by . A set of 1237 genes present in all 5 H. pylori strains was identified, as described in the Methods. The deduced amino acid sequences of the corresponding proteins encoded by strain 98-10 were used to search a database of sequences from strain 26695 using FastA. The best match was identified, and the percent amino acid identity was calculated. The histogram shows the number of ORFs exhibiting the indicated level of amino acid identity.
Highly divergent alleles in East Asian strain 98-10
| Gene number (strain 98-10) | Gene number (strain 26695) | Annotation | % aa identity, 98-10 vs. other strains (a) | % aa identity, among other strains (b) | % divergent sites in 98-10 (c) |
| HP9810_903g20 | HP0061 (d) | Hypothetical | 67 | 86 | 21 |
| HP9810_889g5 | HP0492 (d) | | 72 | 92 | 21 |
| HP9810_889g32 | HP0519 (d) | | 73 | 92 | 15 |
| HP9810_905g13 | HP0547 | | 79 | 87 | 11 |
| HP9810_868g41 | HP0806 (d) | Hypothetical | 86 | 92 | 6 |
| HP9810_899g75 | HP1322 (d) | Hypothetical | 75 | 90 | 18 |
| HP9810_899g76 | HP1323 (d) | Ribonuclease | 88 | 92 | 6 |
| HP9810_885g15 | HP1524 (d) | Hypothetical | 80 | 95 | 13 |
a The sequences of the indicated gene products in strain 98-10 were compared with corresponding sequences in each of the other 4 strains (26695, J99, HPAG1 and B128), and mean % amino acid identities were calculated as described in Methods.
b The sequences of the indicated gene products in each strain were compared in all permutations, except that comparisons involving strain 98-10 were excluded from analysis. Mean % amino acid identities were calculated as described in Methods.
c Percentage of aligned sites in which the protein from strain 98-10 contained an amino acid different from the corresponding amino acids in proteins from 4 other strains.
d Reported to be a constituent of the H. pylori core genome, based on at least one array analysis [18-20].
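The three quantities defined in footnotes a to c can be sketched for one set of aligned protein sequences. This is an illustrative sketch only: the function names are hypothetical, the alignment step is assumed to have been done already, gapped columns are ignored, and footnote c is read here as "differs from all four other strains at that site", which is an interpretation rather than something the footnote states outright.

```python
from itertools import combinations

def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned positions")
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def divergence_summary(focal, others):
    """Return (mean identity focal vs. others, mean identity among others,
    percent of ungapped sites where the focal residue differs from all others).

    focal: aligned sequence from the strain of interest.
    others: list of at least two aligned sequences from the remaining strains.
    """
    focal_vs_others = sum(percent_identity(focal, o) for o in others) / len(others)
    pairs = list(combinations(others, 2))
    among_others = sum(percent_identity(x, y) for x, y in pairs) / len(pairs)
    # Keep only alignment columns without gaps in any strain
    cols = [c for c in zip(focal, *others) if "-" not in c]
    unique = sum(all(c[0] != r for r in c[1:]) for c in cols)
    return focal_vs_others, among_others, 100.0 * unique / len(cols)

# Toy 4-residue alignment, for illustration only
summary = divergence_summary("MATL", ["MKTL", "MKTL", "MKTL", "MKTV"])
```

In the toy alignment only the second column counts as divergent, because the fourth column differs from just one of the other sequences.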
Highly divergent alleles in strain J99
| Gene number (strain J99) | Gene number (strain 26695) | Annotation | % aa identity, J99 vs. other strains (a) | % aa identity, among other strains (b) | % divergent sites in J99 (c) |
| jhp0028 | HP0032 | Hypothetical | 68 | 91 | 24 |
| jhp0080 | HP0087 (d) | Hypothetical | 89 | 96 | 8 |
| jhp0173 | HP0185 (d) | Hypothetical | 88 | 93 | 7 |
| jhp0395 | HP1029 (d) | Hypothetical | 88 | 95 | 7 |
a The sequences of the indicated gene products in strain J99 were compared with corresponding sequences in each of the other 4 strains (26695, HPAG1, B128, and 98-10), and mean % amino acid identities were calculated.
b The sequences of the indicated gene products in each strain were compared in all permutations, except that comparisons involving strain J99 were excluded from analysis. Mean % amino acid identities were calculated.
c Percentage of aligned sites in which the protein from strain J99 contained an amino acid different from the corresponding amino acids in proteins from 4 other strains.
d Reported to be a constituent of the H. pylori core genome, based on at least one array analysis [18-20].
Strain-specific H. pylori genes present exclusively in strain 98-10 or B128
| | 98-10 | B128 | 98-10 and B128 |
| Total number of strain-specific genes (a) | 22 | 51 | 16 |
| Functional class | | | |
| Transposase | 2 | 3 | 6 |
| Type IV secretion gene cluster (b) | 0 | 7 | 0 |
| Hypothetical | 17 | 37 | 9 |
| No database match | 8 | 8 | 2 |
| Closest match lacks known function | 9 | 29 | 7 |
| Other | 3 | 4 | 1 |
| Gene islands containing strain-specific genes (c) | 2 | 11 | 3 |
a Present in the indicated strain(s), but not in any of the other four strains for which genome sequences are available.
b This group of genes was not detected in the original analysis of the genome from strain J99, but was subsequently detected in strain J99 [43].
c For this analysis, an island was considered to be present if two or more strain-specific genes were in contiguous chromosomal loci.
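The island criterion in footnote c amounts to counting runs of adjacent strain-specific genes. A minimal sketch, assuming genes are supplied in chromosomal order on a single linear stretch of sequence; the function name is hypothetical, and contig boundaries and chromosome circularity are not handled.

```python
def count_islands(is_strain_specific, min_genes=2):
    """Count runs of at least `min_genes` consecutive strain-specific genes.

    is_strain_specific: booleans in chromosomal order, one per gene.
    """
    islands = 0
    run = 0
    for flag in is_strain_specific:
        if flag:
            run += 1
        else:
            # A run ends here; count it if it is long enough
            if run >= min_genes:
                islands += 1
            run = 0
    # Account for a run that extends to the end of the list
    if run >= min_genes:
        islands += 1
    return islands

# Toy gene order: two qualifying runs and one isolated strain-specific gene
example = [False, True, True, False, True, False, True, True, True]
n_islands = count_islands(example)
```

An isolated strain-specific gene does not form an island under this criterion, which is why the toy example yields two islands rather than three.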
Strain-specific genes present exclusively in H. pylori strains associated with gastric cancer or premalignant lesions
| Total number of strain-specific genes (a) | 10 | 16 | 2 | 14 |
| Functional class | | | | |
| Restriction/modification | 5 | 6 | 2 | 4 |
| Hypothetical | 3 | 9 | 0 | 4 |
| No database match | 0 | 2 | 0 | 0 |
| Closest match lacks known function | 3 | 7 | 0 | 4 |
| Other | 2 | 1 | 0 | 6 |
a Present in the indicated strains, but not in the other strains for which genome sequences are available.