
A variable gene in a conserved region of the Helicobacter pylori genome: isotopic gene replacement or rapid evolution?

Armelle Ménard, Antoine Danchin, Sandrine Dupouy, Francis Mégraud, Philippe Lehours.

Abstract

The present study concerns the identification of a novel coding sequence in a region of the Helicobacter pylori genome, located between JHP1069/HP1141 and JHP1071/HP1143 according to the numbering of the J99 and 26695 reference strains, respectively, and spanning three different coding DNA sequences (CDSs). The CDSs located at the centre of this locus were highly polymorphic, as determined by the analysis of 24 European, 3 Asian, and 3 African isolates. Phylogenetic and molecular evolutionary analyses showed that the CDSs were not restricted to the geographical origin of the strains. Despite the very high variability observed in the deduced protein sequences, significant similarity was observed, always with the same protein families, i.e. ATPase and bacteriophage receptor/invasion proteins. Although this variability could be explained by isotopic gene replacement via horizontal transfer of a gene with the same function but coming from a variety of sources, it seems more likely that the very high sequence variation observed at this locus is the result of a strong selection pressure exerted on the corresponding gene product. The CDSs identified in the present study could be used as strain-specific markers.

Year:  2008        PMID: 18442984      PMCID: PMC2650637          DOI: 10.1093/dnares/dsn006

Source DB:  PubMed          Journal:  DNA Res        ISSN: 1340-2838            Impact factor:   4.458


Comparative analyses conducted on Helicobacter pylori genome sequences, i.e. from H. pylori strain J99 associated with peptic ulcer,[1,2] strain 26695[3] associated with gastritis, and strain HPAG1 associated with atrophic gastritis,[4] revealed a significant macrodiversity (presence or absence of genes) and microdiversity (high polymorphism among orthologous genes).[5,6,7] The plasticity zones and the cag pathogenicity island (cag PAI) are considered to be the main variable genomic areas. The remaining variable genes are distributed throughout the H. pylori genome, and some of them have been grouped into clusters of instability comprising blocks of 5–8 coding DNA sequences (CDSs).[5,8,9] Subtractive hybridization is a powerful tool for comparative prokaryotic genomics and was validated on H. pylori by several authors.[10,11] In a previous study, we used subtractive hybridization to compare the genetic content of one H. pylori strain isolated from a gastric MALT lymphoma patient (strain B34) and one strain isolated from a patient with chronic gastritis only.[12] One original 1092 bp sequence was identified, with no significant nucleotide similarity to the genomes of the H. pylori reference strains 26695 and J99, which were the genomes available at the time. The aim of the present study was to localize this sequence in the H. pylori genome, to determine its prevalence, and to analyze its genetic diversity in H. pylori.
Using an in-house genome walking method as previously described,[13] the original region was localized in the H. pylori genome and a new CDS was subsequently identified using the CDS finder website (http://www.ncbi.nlm.nih.gov/gCDS/CDSig.cgi). This new CDS, called CDS2, is located between two CDSs homologous to JHP1069/HP1141 and JHP1071/HP1143 according to the numbering of the J99 and 26695 reference strains, respectively.[1,3] CDS2 occupies the position held by JHP1070/HP1142, called CDS1, in the H. pylori reference strains J99 and 26695.
The percentage of identity between the nucleotide sequences of CDS1 and CDS2 was determined using the LALIGN software,[14] which identifies multiple matching subsegments in two sequences (http://www.ch.embnet.org/software/LALIGN_form.html). CDS2 showed 54.9% identity in a 2046-nucleotide overlap with JHP1070 and 55.5% identity in a 2083-nucleotide overlap with HP1142. CDS2 encodes a putative polypeptide of 820 residues (GenBank accession number EF492441, EMBL Nucleotide Sequence AM902682). At the protein level, CDS2 shared 23.6% identity with JHP1070 in a 628-amino-acid overlap and 24.4% identity with HP1142 in a 630-amino-acid overlap. Finally, a strong nucleotide identity was found with the HPAG1_1080 sequence,[4] with 89.3% identity in a 2469-nucleotide overlap.
The prevalence and the genetic diversity of the identified genomic locus were first determined for 24 previously described H. pylori strains:[12,15] 13 strains isolated from gastric MALT lymphoma patients enrolled in two multicentre French protocols and 11 strains isolated from French patients with chronic gastritis only. The locus was amplified by PCR using primers hybridizing to the conserved sequences of the flanking genes (JHP1069/HP1141 and JHP1071/HP1143, according to the numbering of the J99 and 26695 strains, respectively). The primers were designed using the web Primer3 software (http://www.broad.mit.edu/cgi-bin/primer/primer3_www.cgi).[16] Direct sequencing was carried out on both strands, and nucleotide and deduced protein sequences were compared with the NCBI Blast program (http://www.ncbi.nlm.nih.gov/BLAST/). A CDS was always present at this locus: CDS1 was found in 54% of the strains, CDS2 in 29% of the strains, and an additional CDS, called CDS3, was identified in 17% of the strains. In the chronic gastritis only H. pylori strain G2, CDS3 had 53.4% identity in a 2005-nucleotide overlap with CDS1 and 52.9% identity in a 2063-nucleotide overlap with CDS2, and it encodes a putative polypeptide of 861 residues (GenBank accession number EF492442, EMBL Nucleotide Sequence AM902683). CDS3 still has no counterpart in databases. Considering the three CDSs, no significant association with a virulence factor was found, nor with a pathology (data not shown). The presence or absence of these CDSs was also verified by dot blot hybridization, as previously described.[12] It showed that the three CDSs were mutually exclusive at this locus (no local duplication; data not shown).
We first focused on the role of the genes present around the polymorphic locus. According to the revised annotation of the H. pylori genome,[17] JHP1069/HP1141 encodes a methionyl-tRNA formyltransferase (fmt) and JHP1071/HP1143 a conserved hypothetical protein. fmt is considered to be an essential gene which links general metabolism with the translation process (protein biosynthesis).[18,19] As shown in Fig. 1, JHP1069/HP1141 and JHP1071/HP1143 are surrounded by genes of hypothetical function. All of the CDSs in the region had a G + C content similar to that of the rest of the H. pylori genome (∼39%), except for the variable CDSs: CDS1, CDS2, and CDS3 had G + C contents of 29, 30, and 31%, respectively. The lower G + C content suggests an external origin of these CDSs or a rapid adaptation.[20] Indeed, Saunders et al.,[21] using a tetranucleotide and hexanucleotide signature analysis, identified substantial differences between the JHP1070 and HP1142 genes and hypothesized that they were horizontally transferred.
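The G + C comparison described above can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used in the study: the function names `gc_percent` and `low_gc_windows`, the window and step sizes, and the 5-point margin below the genome mean are all assumptions chosen for illustration; only the ∼39% genome-wide value comes from the text.

```python
def gc_percent(seq: str) -> float:
    """Return the G + C content (%) of a nucleotide sequence.

    Symbols other than A, C, G, T (gaps, N, etc.) are ignored.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


def low_gc_windows(seq, window=500, step=100, genome_gc=39.0, margin=5.0):
    """List (start, gc) for windows whose G + C% is below genome_gc - margin.

    genome_gc=39.0 follows the approximate genome-wide value quoted in the
    text; window, step, and margin are hypothetical illustration parameters.
    """
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_percent(seq[start:start + window])
        if gc < genome_gc - margin:
            flagged.append((start, gc))
    return flagged
```

Applied to a CDS extracted from a genome record, `gc_percent` would give the kind of per-gene value quoted for CDS1, CDS2, and CDS3, while `low_gc_windows` is one simple way to scan a region for segments that deviate from the genome mean. A compositional deviation alone does not distinguish horizontal transfer from rapid adaptation, which is why the text also cites oligonucleotide signature analysis.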
Figure 1

Representation of the genomic area of interest, according to the genome sequences of the two Helicobacter pylori reference strains J99 and 26695, which contain the variable CDS identified in the present study. Each CDS is represented by an arrow with the direction indicating the translational direction. The numbering under each CDS corresponds to the number of CDS in H. pylori J99 (top line) and 26695 (bottom line). The function of each CDS is indicated according to the revised annotation of Boneca et al.[17] Fmt, methionyl-tRNA formyltransferase; BirA, biotin ligase bifunctional protein; ParB, replication/partition-related protein; ParA, chromosome partition protein (Soj).

CDS1 has been annotated as a predicted coding region JHP1070 with no homolog in the databases. It codes for a putative polypeptide of 759 residues. Using a Blastp search, significant homologies were found with (i) Rlo proteins (R-linked ORF) from Campylobacter (e.g. RloG, E = e-11 in Campylobacter jejuni strain RM1167, or RloC, E = 7e-13 in C. jejuni strain RM11221),[22,23] (ii) an ATP/GTP binding-site (GXXXXGKT), and (iii) a putative phage murein transglycosylase YomI (SPbeta phage protein; lytic transglycosylase, E = 4.52e-05 in Bacillus subtilis).[24] The same significant homologies with Rlo and YomI were also found in CDS2 and CDS3. Finally, another interesting point to consider is that the three CDSs shared significant homology with chromosome partition protein SMC: for example, Treponema denticola ATCC 35405 chromosome NC_002967 E = 1.11e-11 for CDS1, Fusobacterium nucleatum subsp. nucleatum ATCC 25586 E = 8.66e-10 for CDS2, Fusobacterium nucleatum subsp. nucleatum ATCC 25586 E = 3.86e-08 for CDS3.[25] We did not find any significant motifs indicating that the proteins could be secreted and/or present in the membrane, but this does not preclude an association with the membrane via interaction with an integral membrane protein partner.
How can one explain the apparent variability of the locus identified in the present study? One potential hypothesis is that the region is a hot spot for gene insertion/deletion, with a specific selection pressure maintaining a particular function at that precise location in the genome. Suerbaum and Josenhans[26] recently reviewed the current data on the genetic diversity of H. pylori and argued that this bacterium uses mutation and recombination processes to adapt to its individual host by modifying molecules that interact with the host. Because the three CDSs retain the same similarities, it is likely that (i) these proteins share the same function or (ii) the gene is subjected to a specific selection pressure making it evolve at a very rapid rate. We propose that such a protein could be a phage receptor/translocator or that it could allow phage DNA to enter host cells by remodelling the cell wall.[27-29] Indeed, as already described in Escherichia coli, this kind of protein is subjected to a strong positive selection.[30] Helicobacter pylori genotypes vary markedly with their geographical region, and this is particularly the case for genes under positive selection. Therefore, the corresponding genes were looked for in three East Asian strains and three African strains. All three CDSs were found: CDS1 was found in one Asian (strain 8038) and one African strain (strain TALLAN), CDS2 in two Asian strains (strains 12001 and 8033), and CDS3 in one Asian (strain 19A) and one African strain (strain BAPOOI) (Fig. 2). A phylogenetic analysis was conducted on the deduced amino acid sequences of the CDSs.
Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 4.[31] Phylogenetic trees were generated by the neighbour-joining method.[32] Molecular distances were determined using the Kimura two-parameter model.[33] The tree showed three independent, clearly separated clusters corresponding to CDS1, CDS2, and CDS3, respectively (Fig. 2). However, the exact organization of these different CDSs cannot be determined since this consensus tree cannot be rooted to other species. Indeed, no CDS with significant homology has ever been found in other species (in databases). Interestingly, even though the testing was performed on a limited number of non-European strains, these results indicate that the presence of any one of the three CDSs is not restricted to a particular geographical origin of the strains.
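For readers unfamiliar with the Kimura two-parameter model mentioned above, the distance between two aligned nucleotide sequences can be sketched as follows. This is an illustrative reimplementation of the standard formula d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitional and transversional differences; it is not the MEGA code, and the function name `k2p_distance` and the choice to skip gapped or ambiguous columns are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Columns containing anything other than A, C, G, T in either sequence
    (gaps, ambiguity codes) are skipped; this handling is an assumption.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine<->pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0.0 or term2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(term1 * math.sqrt(term2))
```

A matrix of such pairwise distances is the kind of input a neighbour-joining algorithm takes. The correction matters because it weights transitions and transversions differently, so the resulting distance is always at least as large as the raw proportion of differing sites.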
Figure 2

Phylogenetic analysis of the CDS1, CDS2, and CDS3 proteins, generated with the neighbour-joining method. The phylogeny presented is based on the alignment of the entire deduced protein. The bootstrap values are indicated next to each node. Nucleotide and protein sequences are available in GenBank (EMBL) for the Helicobacter pylori strains B34, G2, 8038, TALLAN, 12001, 8033, 19A, and BAPOO1 under the accession numbers EF492441 (AM902682), EF492442 (AM902683), EU553483 (AM946633), EU553485 (AM946634), EU553505 (AM946635), EU553482 (AM946636), EU553481 (AM946637), and EU556504 (AM946638), respectively. Nucleotide and protein sequences of H. pylori reference strains 26695, J99, and HPAG1 are available in GenBank under the accession numbers AE000511, AE001439, and ABF85147, respectively.

The type of selection operating at the amino acid level was also evaluated by comparing non-synonymous substitutions (Ka) and synonymous substitutions (Ks).[34] The overall mean of Ks and Ka substitutions was determined using the Nei–Gojobori method.[35] The codon-based Z-test of selection[36] was used to evaluate the significance of Ka/Ks substitution values. Bootstrap confidence levels were determined by randomly resampling the sequencing data 1000 times. The results are indicated for each CDS in Table 1. Since Ka/Ks was <1 for the three CDSs analyzed, the purifying selection hypothesis was tested, and the significant P-value obtained supports the hypothesis of conservation at the protein level for each CDS (Z-test P < 0.001).
Table 1

Analysis of molecular distances and synonymous and non-synonymous nucleotide substitutions within CDSs, CDS1 (n = 4), CDS2 (n = 4), and CDS3 (n = 3), in different Helicobacter pylori strains

                      CDS1              CDS2              CDS3
Mol. distance (nt)    0.045 ± 0.003&    0.080 ± 0.005     0.069 ± 0.004
No. differences (nt)  98.167 ± 7.246    182.167 ± 9.753   170.00 ± 9.334
Ks                    0.088 ± 0.012     0.161 ± 0.015     0.123 ± 0.013
Ka                    0.035 ± 0.004     0.061 ± 0.004     0.057 ± 0.004
Ka/Ks†                0.398 ± 0.071     0.379 ± 0.043     0.463 ± 0.059

nt, nucleotides; Ks, synonymous substitutions; Ka, non-synonymous substitutions.

†P Z-Test < 0.001 for purifying selection hypothesis (Ka/Ks < 1).

&Value ± standard error.

The GenBank accession numbers of the sequences used in this study are listed in Fig. 2.
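The relationship between the Ks, Ka, and Ka/Ks rows of Table 1 can be checked with a few lines of arithmetic. The sketch below recomputes the ratios from the rounded values in the table and forms an approximate Z statistic, Z = (Ks - Ka) / sqrt(SE_Ks^2 + SE_Ka^2), for the purifying selection hypothesis. Treating the two standard errors as independent is a simplifying assumption made here; MEGA estimates the variance of the difference by bootstrap, so this is not a reproduction of the published test. The names `TABLE1`, `ka_ks_ratio`, `purifying_z`, and `one_sided_p` are hypothetical.

```python
import math

# Values copied from Table 1: (Ks, SE of Ks, Ka, SE of Ka)
TABLE1 = {
    "CDS1": (0.088, 0.012, 0.035, 0.004),
    "CDS2": (0.161, 0.015, 0.061, 0.004),
    "CDS3": (0.123, 0.013, 0.057, 0.004),
}


def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of non-synonymous to synonymous substitution rates."""
    return ka / ks


def purifying_z(ks: float, se_ks: float, ka: float, se_ka: float) -> float:
    """Approximate Z statistic for H1: Ks > Ka (purifying selection).

    Assumes the two standard errors are independent, which is a
    simplification of the bootstrap-based variance used by MEGA.
    """
    return (ks - ka) / math.sqrt(se_ks ** 2 + se_ka ** 2)


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for name, (ks, se_ks, ka, se_ka) in TABLE1.items():
        z = purifying_z(ks, se_ks, ka, se_ka)
        print(name, round(ka_ks_ratio(ka, ks), 3), round(z, 2), one_sided_p(z))
```

With the rounded table values, the recomputed ratios agree with the Ka/Ks row to three decimal places, and the approximate one-sided P-values all fall below 0.001, which is consistent with the reported Z-test result. The standard errors of the ratios themselves are not recomputed here.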

Finally, we propose that the very high variation observed in the protein sequences reflects the permanent selection pressure exerted by phages or other elements interacting with the organism's cell envelope. If this is the case, this locus could be used as a marker for constraints operating in the environmental niches in which particular H. pylori strains evolve. The presence of phages in H. pylori has rarely been described.[37] For example, Marsich et al.[38] postulated that the H. pylori lysozyme gene (lys) had a prophage origin. Numerous other explanations cannot be excluded, such as interaction between the bacterium and its mammalian host, protozoan predation, or porin specificity. Indeed, several publications have focused on genes that vary markedly among H. pylori isolates. One example is the replacement of babA by babB as reported by Solnick et al.[39] Helicobacter pylori BabA is the ABO blood group antigen binding adhesin, which has a closely related paralogue (BabB) whose function is unknown. An extensive genotypic diversity in babA and babB has been shown across different strains, as well as within a strain colonizing an individual patient, in line with the hypothesis that diverse profiles of babA and babB reflect selective pressures for adherence, which may differ across different hosts and within an individual over time.[40] In summary, a novel polymorphic locus comprising a single gene was identified in the H. pylori genome.
Although this variation could be explained by isotopic gene replacement via horizontal transfer of a gene with the same function but coming from a variety of sources, it seems more likely that the very high sequence variation observed at this locus is the result of a strong selection pressure exerted on the corresponding gene product. We propose that the evolution of CDS1, CDS2, and CDS3 is due to the occurrence of a specific environmental event, such as interaction with a biological structure, e.g. bacteriophages, which are involved in surface cell secretion. The genes identified in the present study could be used as strain-specific markers for particular niches. The predicted function of the gene products, although highly speculative, should encourage investigators to explore the presence of phages in the H. pylori environment and to study their relationship with pathogenicity.
  37 in total

1.  Identification of four families of peptidoglycan lytic transglycosylases.

Authors:  N T Blackburn; A J Clarke
Journal:  J Mol Evol       Date:  2001-01       Impact factor: 2.395

2.  Base composition bias might result from competition for metabolic resources.

Authors:  Eduardo P C Rocha; Antoine Danchin
Journal:  Trends Genet       Date:  2002-06       Impact factor: 11.639

3.  Traces of human migrations in Helicobacter pylori populations.

Authors:  Daniel Falush; Thierry Wirth; Bodo Linz; Jonathan K Pritchard; Matthew Stephens; Mark Kidd; Martin J Blaser; David Y Graham; Sylvie Vacher; Guillermo I Perez-Perez; Yoshio Yamaoka; Francis Mégraud; Kristina Otto; Ulrike Reichard; Elena Katzowitsch; Xiaoyan Wang; Mark Achtman; Sebastian Suerbaum
Journal:  Science       Date:  2003-03-07       Impact factor: 47.728

4.  Bacteriophage PRD1 DNA entry uses a viral membrane-associated transglycosylase activity.

Authors:  P S Rydman; D H Bamford
Journal:  Mol Microbiol       Date:  2000-07       Impact factor: 3.501

5.  Genome sequence and analysis of the oral bacterium Fusobacterium nucleatum strain ATCC 25586.

Authors:  Vinayak Kapatral; Iain Anderson; Natalia Ivanova; Gary Reznik; Tamara Los; Athanasios Lykidis; Anamitra Bhattacharyya; Allen Bartman; Warren Gardner; Galina Grechkin; Lihua Zhu; Olga Vasieva; Lien Chu; Yakov Kogan; Oleg Chaga; Eugene Goltsman; Axel Bernal; Niels Larsen; Mark D'Souza; Theresa Walunas; Gordon Pusch; Robert Haselkorn; Michael Fonstein; Nikos Kyrpides; Ross Overbeek
Journal:  J Bacteriol       Date:  2002-04       Impact factor: 3.490

6.  A revised annotation and comparative analysis of Helicobacter pylori genomes.

Authors:  Ivo G Boneca; Hilde de Reuse; Jean-Charles Epinat; Maude Pupin; Agnès Labigne; Ivan Moszer
Journal:  Nucleic Acids Res       Date:  2003-03-15       Impact factor: 16.971

7.  Identification of strain-specific genes located outside the plasticity zone in nine clinical isolates of Helicobacter pylori.

Authors:  Grettel Chanto; Alessandra Occhialini; Nathalie Gras; Richard A Alm; Francis Mégraud; Armelle Marais
Journal:  Microbiology       Date:  2002-11       Impact factor: 2.777

8.  Functional organization and insertion specificity of IS607, a chimeric element of Helicobacter pylori.

Authors:  D Kersulyte; A K Mukhopadhyay; M Shirai; T Nakazawa; D E Berg
Journal:  J Bacteriol       Date:  2000-10       Impact factor: 3.490

9.  Evaluation of the association of nine Helicobacter pylori virulence factors with strains involved in low-grade gastric mucosa-associated lymphoid tissue lymphoma.

Authors:  Philippe Lehours; Armelle Ménard; Sandrine Dupouy; Bernard Bergey; Fréderique Richy; Frank Zerbib; Agnès Ruskoné-Fourmestraux; Jean Charles Delchier; Francis Mégraud
Journal:  Infect Immun       Date:  2004-02       Impact factor: 3.441

10.  Helicobacter pylori expresses an autolytic enzyme: gene identification, cloning, and theoretical protein structure.

Authors:  Eleonora Marsich; Pierfrancesco Zuccato; Sonia Rizzi; Amedeo Vetere; Enrico Tonin; Sergio Paoletti
Journal:  J Bacteriol       Date:  2002-11       Impact factor: 3.490
