
Genome-wide survey of mutual homologous recombination in a highly sexual bacterial species.

Koji Yahara, Mikihiko Kawai, Yoshikazu Furuta, Noriko Takahashi, Naofumi Handa, Takeshi Tsuru, Kenshiro Oshima, Masaru Yoshida, Takeshi Azuma, Masahira Hattori, Ikuo Uchiyama, Ichizo Kobayashi.

Abstract

The nature of a species remains a fundamental and controversial question. The era of genome/metagenome sequencing has intensified the debate in prokaryotes because of extensive horizontal gene transfer. In this study, we conducted a genome-wide survey of outcrossing homologous recombination in the highly sexual bacterial species Helicobacter pylori. We conducted multiple genome alignment and analyzed the entire data set of one-to-one orthologous genes for its global strains. We detected mosaic structures due to repeated recombination events and discordant phylogenies throughout the genomes of this species. Most of these genes including the "core" set of genes and horizontally transferred genes showed at least one recombination event. Taking into account the relationship between the nucleotide diversity and the minimum number of recombination events per nucleotide, we evaluated the recombination rate in every gene. The rate appears constant across the genome, but genes with a particularly high or low recombination rate were detected. Interestingly, genes with high recombination included those for DNA transformation and for basic cellular functions, such as biosynthesis and metabolism. Several highly divergent genes with a high recombination rate included those for host interaction, such as outer membrane proteins and lipopolysaccharide synthesis. These results provide a global picture of genome-wide distribution of outcrossing homologous recombination in a bacterial species for the first time, to our knowledge, and illustrate how a species can be shaped by mutual homologous recombination.

Year:  2012        PMID: 22534164      PMCID: PMC3381677          DOI: 10.1093/gbe/evs043

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Introduction

The nature of a species has been a fundamental and controversial question in biology for centuries (Darwin 1859; Mayr 1942). The biological species concept defines a species as a reproductively isolated group of organisms that exchange genetic material by interbreeding, and this definition has been widely accepted for eukaryotes since the mid-20th century. However, the era of genome/metagenome sequencing has intensified the debate in prokaryotes (Achtman and Wagner 2008) because extensive horizontal gene transfer across species boundaries (Nakamura et al. 2004; Fraser et al. 2009) makes the very existence of separate species debatable (Achtman and Wagner 2008; Doolittle and Zhaxybayeva 2009). More than a dozen attempts have been made to establish a conceptual framework for defining prokaryote species, including the ecotype model (Cohan and Perry 2007). However, none of these models is based on genome-wide sequence data. Recently, the effect of homologous recombination between lineages in maintaining cohesion within a bacterial species has been pointed out (Fraser et al. 2007, 2009; Didelot et al. 2011; Takuno et al. 2012). From this point of view, it is important to reveal the flux of mutual homologous recombination between lineages and how it shapes a bacterial species. The extent of outcrossing homologous recombination throughout the entire genome has not been quantitatively analyzed using genome-wide sequence data in bacteria (Konstantinidis et al. 2006). It thus remains a challenge to reveal the genome-wide distribution of the homologous recombination rate in a bacterial species. A related issue is the relation between genome diversity and homologous recombination. The homologous recombination rate is known to be correlated with DNA diversity in Drosophila melanogaster (Begun and Aquadro 1992), although such a correlation is questionable in humans (Spencer et al. 2006). 
Therefore, it is also important to reveal a genome-wide relationship between homologous recombination rate and DNA diversity in a bacterial species. It will provide a basis to detect genes with high or low recombination rates that deviate from the relationship, which may be a characteristic of the species. From this perspective, Helicobacter pylori is of great interest. This bacterium is present in the stomach of over half the human population, where it is linked with gastritis (stomach inflammation), ulcers, and gastric (stomach) cancer (Yamaoka 2008). It exhibits a remarkable allelic diversity (an “allele” indicates one of the alternative sequences that is possible at a locus in a genome). It is a highly sexual bacterial species, and the allelic diversity is primarily attributed to high homologous recombination between coinfecting lineages following natural transformation in the stomach (Suerbaum and Josenhans 2007). Homologous recombination is much more frequent than point mutation (Suerbaum et al. 1998). One homologous recombination event imports a cluster of small nucleotide polymorphisms into the genome, which increases the relative effect of recombination compared with mutation (Kennemann et al. 2011). Previous population genetic studies on homologous recombination in H. pylori, however, used a relatively small number of loci, in particular the seven genes used for multilocus sequence typing (MLST) (Falush et al. 2001, 2003; Linz et al. 2007; Moodley et al. 2009). In this study, we performed a genome-wide analysis of outcrossing homologous recombination using entire genome sequences of global H. pylori strains.

Materials and Methods

Helicobacter pylori Genome Sequences

Helicobacter pylori strain names and accession numbers from GenBank were as follows (Furuta et al. 2011): J99, NC_000921.1; P12, NC_011498.1 and NC_011499.1; G27, NC_011333.1 and NC_011334.1; HPAG1, NC_008086.1 and NC_008087.1; 26695, NC_000915.1; Shi470, NC_010698.2; F16, DDBJ:AP011940; F30, DDBJ:AP011941 and AP011942; F32, DDBJ: AP011943 and AP011944; F57, DDBJ:AP011945.

Sequence Alignment of Orthologous Genes

An entire data set of orthologous genes (one-to-one orthologous groups in supplementary table S1, Supplementary Material online, which were almost equivalent to the core genes) was prepared by clustering using DomClust (Uchiyama 2006) and RECOG (http://mbgd.genome.ad.jp/RECOG/). The core genes were then extracted using CoreAligner (Uchiyama 2008). Protein sequences were aligned using ClustalW (Thompson et al. 1994). The aligned sequences were then replaced with the corresponding DNA sequences to ensure that gaps occurred only at codon boundaries. We automatically classified orthologous genes based on the functional categories in MBGD (Uchiyama 2003). We then treated outer membrane protein (OMP) and restriction–modification (RM) system as separate categories.
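The back-translation step described above (threading each DNA sequence onto its aligned protein so that gaps fall only at codon boundaries) can be sketched as follows. This is a minimal illustration, not the authors' pipeline; the function name `thread_codons` and the toy sequences are assumptions made for the example.

```python
def thread_codons(aligned_protein: str, cds: str) -> str:
    """Map an aligned protein (with '-' gaps) back onto its coding DNA.

    Each residue consumes one codon from `cds`; each protein gap becomes a
    three-nucleotide gap, so gaps can only fall on codon boundaries.
    """
    residues = aligned_protein.replace("-", "")
    if len(cds) < 3 * len(residues):
        raise ValueError("coding sequence is shorter than the protein it should encode")
    out = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")            # one protein gap = one codon-sized gap
        else:
            out.append(cds[pos:pos + 3])  # copy the codon for this residue
            pos += 3
    return "".join(out)


# Hypothetical toy input: protein "MK" aligned as "M-K".
codon_alignment = thread_codons("M-K", "ATGAAA")
```

Any trailing stop codon in `cds` is simply ignored here, which is one possible design choice among several.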

Phylogenetic Analysis of Core, MLST, and Individual Genes

The phylogenetic analysis of core genes and concatenated MLST genes (atpA, efp, mutY, ppa, trpC, ureI, and yphC) was conducted using MOLPHY (Adachi and Hasegawa 1996) and Neighbor-Net (Bryant and Moulton 2004). Bayesian phylogenetic analyses of individual genes were conducted using MrBayes 3.1.2, with a GTR + G + I nucleotide substitution model in a partitioning scheme with three subsets, which corresponded to the three codon positions (Huelsenbeck and Ronquist 2001). All the parameters were unlinked. In the Markov chain Monte Carlo procedure, the number of generations was 500,000, and the first 1,250 generations were discarded as a burn-in while the sampling frequency was 100. Individual genes that did not fit significantly well to the core tree were examined using the Shimodaira–Hasegawa test (Shimodaira and Hasegawa 1999). We calculated the Robinson and Foulds distance (Robinson and Foulds 1981) of an individual tree relative to the core tree, and the overall picture was obtained by multidimensional scaling (Cox TF and Cox MAA 1994) and clustering (Hartigan 1975).
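To make the Robinson and Foulds distance concrete, the sketch below counts the bipartitions (splits) that occur in one unrooted tree but not in the other. Trees are written as nested tuples of leaf names, which is an assumption of this example, as are the function names and the toy topologies; it is not the software used in the study.

```python
def clades(tree, out):
    """Return the leaf set under `tree`, recording every internal clade in `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(clades(child, out) for child in tree))
    out.add(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of an unrooted tree, each stored as the side
    that does NOT contain an arbitrary anchor leaf (so equal splits compare equal)."""
    found = set()
    all_leaves = clades(tree, found)
    anchor = min(all_leaves)
    result = set()
    for clade in found:
        side = clade if anchor not in clade else all_leaves - clade
        # Skip trivial splits (a single leaf versus the rest) and the root clade.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
    return result


def robinson_foulds(tree_a, tree_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


# Hypothetical five-leaf topologies, for illustration only.
core_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "C"), "B"), ("D", "E"))
distance = robinson_foulds(core_tree, gene_tree)
```

A distance of zero means the gene tree has the same unrooted topology as the core tree; larger values mean more conflicting splits.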

Genome-Wide Detection of Recombination-Derived Mosaics

To obtain an overall view of recombination-derived mosaic structures throughout the entire genome, we extended the bootscan analysis (Salminen et al. 1995) using Hyphy version 2 (Kosakovsky Pond et al. 2005) with a window size of 800 bp and a step size of 30 bp using multiple genome alignments generated by Mauve (Darling et al. 2004). For each window in the genome, the bootscan values were calculated from a bootstrapped phylogenetic tree. To eliminate noise during phylogenetic estimation, we did not use bootscan values that were less than 90. Columns containing gaps were not used in the phylogenetic estimation. We validated the settings by confirming that the turnover of bootscan values, as an indicator of a mosaic boundary, was not found in pseudosequence alignments where the columns had been randomly shuffled.
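The window layout, gap filtering, and column-shuffled control described above can be sketched as follows. The tree estimation and bootstrapping themselves are outside this sketch. The 800 bp window and 30 bp step come from the text; the function names, the fixed random seed, and the toy alignment are assumptions of this example.

```python
import random


def window_starts(alignment_length, window=800, step=30):
    """Start coordinates of every full window along the alignment."""
    return list(range(0, alignment_length - window + 1, step))


def drop_gap_columns(rows):
    """Keep only alignment columns in which no sequence has a gap."""
    kept = [col for col in zip(*rows) if "-" not in col]
    if not kept:
        return ["" for _ in rows]
    return ["".join(chars) for chars in zip(*kept)]


def shuffle_columns(rows, seed=0):
    """Control alignment: the same columns in random order, which destroys any
    positional (mosaic) signal while keeping the overall site patterns."""
    cols = list(zip(*rows))
    random.Random(seed).shuffle(cols)
    return ["".join(chars) for chars in zip(*cols)]


# Hypothetical toy alignment of two sequences.
toy = ["AC-TGA", "A-GTGC"]
clean = drop_gap_columns(toy)
control = shuffle_columns(clean)
```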

Estimation of Minimum Number of Recombination Events and Recombination Rate

For each orthologous gene, the minimum number of recombination events (rmin) was calculated using the four-gamete test (Hudson and Kaplan 1985). This test locates pairs of closest segregating sites with four haplotypes that could not have arisen without recombination or a recurrent mutation. Homologous recombination is much more frequent than mutation in H. pylori (Suerbaum et al. 1998; Morelli et al. 2010; Kennemann et al. 2011), which makes the four-gamete test method suitable for the estimation of recombination events. We used the method implemented in the PGEToolbox (Cai 2008), which filters gaps in advance. We used the minimum number of recombination events divided by gene length in nucleotides (rmin/nt) as a measure of recombination rate.
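As an illustration of the four-gamete test, the sketch below follows the usual Hudson and Kaplan procedure: find every pair of segregating sites that shows all four haplotypes, then count the largest set of such intervals that do not overlap. It is a simplified stand-in for the PGEToolbox implementation, not a copy of it. Restricting the test to gap-free biallelic sites and dividing by alignment length are assumptions of this example.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Indices of gap-free columns with exactly two alleles (assumed biallelic)."""
    return [i for i, col in enumerate(zip(*seqs))
            if "-" not in col and len(set(col)) == 2]


def rmin(seqs):
    """Minimum number of recombination events by the four-gamete test."""
    sites = segregating_sites(seqs)
    intervals = []
    for i, j in combinations(sites, 2):
        gametes = {(s[i], s[j]) for s in seqs}
        if len(gametes) == 4:           # all four haplotypes present
            intervals.append((i, j))
    # Greedy selection of non-overlapping intervals, sorted by right end.
    # Intervals that only share an endpoint count as non-overlapping.
    intervals.sort(key=lambda iv: iv[1])
    count, last_end = 0, -1
    for start, end in intervals:
        if start >= last_end:
            count += 1
            last_end = end
    return count


def rmin_per_nt(seqs):
    """rmin divided by alignment length, as a per-nucleotide measure."""
    return rmin(seqs) / len(seqs[0])


# Hypothetical toy alignments.
no_recombination = ["AA", "AT", "TT"]
one_event = ["AA", "AT", "TA", "TT"]
three_events = ["AAAA", "ATAT", "TATA", "TTTT"]
```

Because the count is a lower bound, real recombination events that leave no four-haplotype pair are not counted.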

Identification of Genes with a Particularly High or Low Recombination Rate

We identified these using two approaches. The first approach (method A) was based on the regression of the minimum number of recombination events on π (nucleotide diversity) without intercept. The Poisson regression was conducted using the rmin as the response variable, whereas the linear regression was conducted using the rmin/nt as the response variable. We used the no-intercept models so that the regression line includes the origin because recombination cannot be detected without nucleotide diversity. The π for each orthologous gene was calculated using DnaSP (Librado and Rozas 2009). Using the regression, we detected genes that deviated significantly from the regression line. We did not use highly diverged genes where π > 0.08 during fitting because most of these did not fit the regression model and because we thought that they require another analysis (see below) and biological explanations (see text). We excluded three exceptional genes (HP0462, HP1438, and HP1439) where π > 0.2 for the same reason. The second approach (method B) was used for those highly diverged genes where π > 0.08. Of these genes, we extracted those with rmin or rmin/nt in the top or bottom 2.5% of all the genes. In both approaches, we did not use genes with more than 50% of gaps in the alignment.
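The linear variant of method A can be sketched as a least-squares fit through the origin followed by ranking of the residuals. Only the no-intercept fit, the π ≤ 0.08 cutoff, and the 2.5% tails come from the text; the Poisson regression on rmin is not covered, and the function names, the residual ranking rule, and the toy numbers are assumptions of this example.

```python
def slope_through_origin(x, y):
    """Least-squares slope for the model y = b * x (no intercept)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def flag_deviants(pi, rate, cutoff=0.08, tail=0.025):
    """Fit rate ~ pi through the origin on genes with pi <= cutoff and return
    (slope, indices with the largest residuals, indices with the smallest)."""
    used = [i for i, p in enumerate(pi) if p <= cutoff]
    slope = slope_through_origin([pi[i] for i in used], [rate[i] for i in used])
    residual = {i: rate[i] - slope * pi[i] for i in used}
    ranked = sorted(used, key=residual.get)
    k = max(1, round(tail * len(ranked)))   # at least one gene per tail
    return slope, ranked[-k:], ranked[:k]


# Hypothetical values: five genes, the last one too diverged to be fitted.
toy_pi = [0.01, 0.02, 0.03, 0.04, 0.10]
toy_rate = [0.01, 0.02, 0.06, 0.04, 0.50]
slope, high, low = flag_deviants(toy_pi, toy_rate)
```

Genes above the cutoff are left out of both the fit and the ranking, mirroring the separate treatment of highly diverged genes in method B.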

Identification of Horizontally Transferred Genes from Distantly Related Organisms

We also determined the minimum number of probable recombination events in horizontally transferred genes. Horizontally transferred genes from distantly related organisms were inferred using a Bayesian inference program with training models based on the nucleotide composition. This method exploits fixed order 5-mer nucleotide “words” that deviate from the background genome, and it has been applied successfully to many bacterial genomes including H. pylori 26695 and J99 (Nakamura et al. 2004). Horizontally transferred “alien” genes such as genomic islands were inferred using Alien Hunter (Vernikos and Parkhill 2006), which identifies atypical nucleotide compositions based on variable order motif distributions. The transferred genes in the H. pylori 26695 genome, which had a one-to-one orthologous relationship, were used to calculate the minimum number of recombination events during comparisons with other genes.
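The idea of comparing a gene's 5-mer "word" usage with the genome background can be illustrated with a simple frequency comparison. This is deliberately cruder than the Bayesian method of Nakamura et al. (2004) or Alien Hunter and does not reproduce either. The function names and the L1 distance are assumptions of this example.

```python
from collections import Counter


def kmer_freqs(seq, k=5):
    """Relative frequency of every overlapping k-mer in `seq`."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()} if total else {}


def composition_distance(gene, genome, k=5):
    """Sum of absolute frequency differences (L1 distance) between the k-mer
    profiles of a gene and of the background genome; 0 means identical usage."""
    g, bg = kmer_freqs(gene, k), kmer_freqs(genome, k)
    return sum(abs(g.get(w, 0.0) - bg.get(w, 0.0)) for w in set(g) | set(bg))


# Hypothetical toy sequences with k = 2 to keep the example readable.
same = composition_distance("ACGTACGT", "ACGTACGT", k=2)
different = composition_distance("AAAA", "CCCC", k=2)
```

A gene whose profile lies far from the background would be a candidate for closer inspection, not a confirmed transfer.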

Visualization of a Genome Map

A genome map of H. pylori strain 26695 was constructed to show the distribution of the rmin/nt (as a measure of the homologous recombination rate), which classified the genes into three categories according to the rmin/nt: >top 25%, from bottom 25% to top 25%, and <bottom 25%.

Identification of Functional Motifs/Domains

We searched for PROSITE motifs (Sigrist et al. 2002) in genes with a particularly high recombination rate using the ps_scan program (de Castro et al. 2006). We also searched for conserved domains in the CDD (Conserved Domain Database) using the NCBI Batch Web CD-Search Tool (Marchler-Bauer et al. 2009).

Results

Phylogenetic Analysis of All Genes Suggested Their Mutual Recombination

Complete genome sequences were obtained for ten global H. pylori strains (see Materials and Methods). Figure 1a and b show the maximum likelihood phylogenetic trees for the concatenated MLST genes (3,406 bp) and the core genes (1,097,937 bp), respectively. The tree for the core genes (fig. 1b) had visibly better resolution with higher bootstrap values. However, different patterns were observed in the phylogenetic network analysis (fig. 1c), where the topology was polytomous with no tree-like structures in either of the two major clusters from the east (Japan and Amerind) and west (Europe and West Africa), suggesting substantial recombination between the seven genes within each cluster (fig. 1c). Recombination appeared to have had a similar influence on the core genes (fig. 1d).
Fig. 1. Phylogenetic trees of MLST genes, core genes, and individual genes. (a) Maximum likelihood trees of MLST genes (3,406 bp) and (b) core genes (1,097,937 bp). (c) Phylogenetic networks of MLST genes and (d) core genes. (e) An example of trees with clear topology differences with the core tree for ubiA, prenyltransferase (HP1360). Scale bars indicate the number of substitutions per nucleotide site. Numbers indicate the bootstrap values (in a and b) or posterior probabilities (in e).

The probable effect of genetic information transfer among different phylogeographic lineages was also apparent in the individual gene trees, which did not fit significantly well to the core tree (P < 0.001, Shimodaira–Hasegawa test, as shown in supplementary table S2, Supplementary Material online). The example shown in figure 1e suggests horizontal transfer from a European/African lineage to a Japanese lineage (F32). We summarized the topological distances for each of the 1,224 trees relative to the core tree (supplementary table S2, Supplementary Material online) using a multidimensional scaling plot and a clustering diagram (fig. 2). The topological diversity in the gene trees of H. pylori was clearly greater than that in Rickettsia, which is an endosymbiont bacterium where recombination is expected to be rare. In H. pylori, gene trees with an identical topology to the core tree were rare, whereas those with a deviant topology were conspicuous and scattered. Most of the H. pylori gene trees had different topologies. Thus, all the phylogenetic lines of evidence suggested frequent recombination between H. pylori lineages.
Fig. 2. Variable topology of individual gene trees. (a and b) Multidimensional scaling. (c and d) Clustering. The colors indicate topological distances to the core tree. The red tree has an identical topology to the core tree. Each branch in the clustering represents one topology. The length of the bar below each branch indicates the number, on a logarithmic scale, of genes with that topology.

Genome-Wide Mosaics due to Mutual Homologous Recombination

Next, we visualized the consequences of the probable frequent mutual homologous recombinations throughout the entire genome (fig. 3). Using the bootscan analysis (Salminen et al. 1995), a sliding window approach, phylogenetic trees were estimated with bootstrapping throughout the genome. Figure 3 shows the bootscan values (i.e., bootstrap values on a branch grouping the query strain with other strains) for each of the H. pylori genomes that were used as queries. Recombination can alter phylogenetic relationships between the query and other genomes, so the turnover of the bootscan values between windows can indicate a recombination-derived mosaic structure (Lole et al. 1999). The turnover of the bootscan values indicative of mosaic boundaries was common throughout the entire genome. This overall view of mosaic structures demonstrated that there had been frequent homologous recombinations between H. pylori genomes. Furthermore, this approach revealed more cohesive connections due to recombination within a subgroup, for example, east and west.
Fig. 3. Genome-wide mosaic structure indicating recombination. The bootscan values, which are indicative of phylogenetic similarity to the query genome (shown in horizontal axis), were plotted for each of the other nine genome sequences.

Minimum Number of Recombination Events in Each Gene

Next, we calculated the minimum number of recombination events (rmin) for every gene (supplementary table S2, Supplementary Material online) using the four-gamete test, a simple method locating a pair of segregating sites with four haplotypes (Materials and Methods). Almost all genes (>99%) showed at least one indicator of recombination. Only three genes with low nucleotide diversity showed no sign of recombination (supplementary table S2, Supplementary Material online).

Minimum Number of Recombination Events, Recombination Rate, and Nucleotide Diversity

Before examining genes with a particularly high or low frequency of recombination, we examined properties of the minimum number of recombination events. The minimum number of recombination events (rmin) is an indicator of per-gene recombination, which conceivably depends on gene length. A relationship between the minimum number of recombination events (rmin) and the gene length among all the genes is shown in figure 4. Clearly, the relationship is linear, indicating rmin increases proportionally to gene length. In order to control this effect, we used the minimum number of recombination events divided by gene length in base pairs (rmin/nt).
Fig. 4. Linear relationship between the minimum number of recombination events (rmin) and gene length. The line represents linear regression without an intercept.

The distribution of rmin/nt is summarized in a genome map in which all genes were classified into three categories: >top 25%, from bottom 25% to top 25%, and <bottom 25% (fig. 5, line 2).
Fig. 5. Genome map of Helicobacter pylori (strain 26695) featuring the level of recombination. 1: Genes with a high or low recombination rate considering gene (nucleotide) diversity. 2: Genes classified according to the recombination rate (minimum number of recombination events/gene length). 3: Horizontally transferred genes from a distance.

Meanwhile, it is also expected that the minimum number of recombination events depends on nucleotide diversity, even after it is divided by gene length. The relationship between nucleotide diversity and the minimum number of recombination events per nucleotide (rmin/nt) is shown in figure 6. The figure indicates a linear relationship between π and rmin/nt in genes with π ≤ 0.08, which fit the regression well. Together with figure 4, these genome-wide analyses indicate that the “true” recombination rate is nearly constant across the genome. Meanwhile, another relationship seemed to emerge when π > 0.08, which we did not explore by regression.
Fig. 6. Linear relationship between the minimum number of recombination events per gene length in nucleotide (rmin/nt) and gene (nucleotide) diversity (π). The broken line indicates nucleotide diversity (π) = 0.08. Red: genes with a particularly high recombination rate in table 1. Green: genes with a particularly low recombination rate in table 2.

Table 1

Genes with High Recombination

Locus Tag | Gene | Description | Category | π | rmin/nt | rmin | Length (nt) | Method^a
HP1277 | trpA | Tryptophan synthase subunit alpha | Biosynthesis | 0.070 | 0.058 | 46 | 789 | A
HP0723 | ansB | L-asparaginase II | Virulence | 0.059 | 0.050 | 50 | 993 | A
mHP1361 | comE | Competence locus E | Transformation | 0.057 | 0.049 | 65 | 1,314 | A
HP0808 | acpS | 4′-phosphopantetheinyl transferase | Fatty acid biosynthesis | 0.048 | 0.047 | 17 | 360 | A
mHP0333 | dprA | Hypothetical protein involved in transformation | Transformation | 0.048 | 0.045 | 36 | 801 | A
HP1290 | pnuC | Nicotinamide mononucleotide transporter | Transport | 0.049 | 0.044 | 29 | 663 | A
HP0785 | lolA | Outer membrane lipoprotein carrier protein | Membrane | 0.049 | 0.043 | 24 | 555 | A
HP0132 | sdaA | L-serine deaminase | Metabolism | 0.046 | 0.041 | 56 | 1,368 | A
HP0809 | fliL | Flagellar basal body–associated protein FliL | Cellular processes | 0.032 | 0.038 | 21 | 552 | A
HP1170 | glnP | Glutamine ABC transporter, permease protein | Transport | 0.038 | 0.037 | 25 | 672 | A
mHP0514 | rplI | 50S ribosomal protein L9 | Translation | 0.038 | 0.036 | 16 | 450 | A
HP1261 | nuoB | NADH dehydrogenase subunit B | Metabolism | 0.036 | 0.035 | 17 | 480 | A
mHP1262 | nuoC | NADH dehydrogenase subunit C | Metabolism | 0.036 | 0.034 | 27 | 798 | A
HP1476 | ubiD | 3-octaprenyl-4-hydroxybenzoate carboxy-lyase | Biosynthesis of cofactors | 0.035 | 0.034 | 19 | 564 | A
HP0389 | sodF | Iron-dependent superoxide dismutase | Cellular processes | 0.033 | 0.033 | 21 | 642 | A
HP0125 | rpmI | 50S ribosomal protein L35 | Translation | 0.022 | 0.026 | 5 | 195 | A
HP1196 | rpsG | 30S ribosomal protein S7 | Translation | 0.022 | 0.026 | 12 | 468 | A
HP0651 | futB | Alpha-(1,3)-fucosyltransferase | Cell envelope | 0.108 | 0.058 | 83 | 1,431 | B
HP0523 | cag4 | Peptidoglycan hydrolase, Cag island protein (caggamma) | Cellular processes | 0.086 | 0.055 | 28 | 510 | B
HP0009 | hopZ | OMP | OMP | 0.144 | 0.055 | 104 | 1,905 | B
HP1243 | babA | OMP | OMP | 0.096 | 0.054 | 119 | 2,202 | B
HP1250 | - | Bacterial SH3 domain | Hypothetical | 0.091 | 0.050 | 29 | 579 | B
HP0374 | - | Hypothetical protein | Hypothetical (other categories)^b | 0.063 | 0.056 | 38 | 681 | A
mHP1384 | - | Hypothetical protein | Hypothetical | 0.053 | 0.054 | 11 | 204 | A
HP1225 | crcB | Hypothetical protein | Hypothetical | 0.045 | 0.048 | 19 | 393 | A
mHP0568 | - | Hypothetical protein | Hypothetical (translation)^b | 0.056 | 0.048 | 42 | 873 | A
mHP0614 | - | Hypothetical protein | Hypothetical | 0.047 | 0.042 | 14 | 333 | A
HP1548 | - | Hypothetical protein | Hypothetical | 0.042 | 0.038 | 13 | 339 | A
HP0920 | - | Hypothetical protein | Hypothetical (other categories)^b | 0.039 | 0.036 | 25 | 693 | A
HP1234 | - | Hypothetical protein | Hypothetical (cell envelope) | 0.038 | 0.036 | 32 | 897 | A
HP1203a | secE | Preprotein translocase subunit SecE | Hypothetical (no functional assignment)^b | 0.028 | 0.033 | 6 | 180 | A
HP1423 | - | Hypothetical protein | Hypothetical (other categories)^b | 0.031 | 0.031 | 8 | 255 | A
HP1391 | - | Hypothetical protein | Hypothetical | 0.030 | 0.030 | 9 | 297 | A
HP0730 | - | Hypothetical protein | Hypothetical | 0.155 | 0.095 | 29 | 306 | B
HP0338 | - | Hypothetical protein | Hypothetical | 0.148 | 0.076 | 43 | 567 | B
HP0350 | - | Hypothetical protein | Hypothetical | 0.084 | 0.055 | 37 | 669 | B
HP0065 | - | Hypothetical protein | Hypothetical | 0.129 | 0.054 | 19 | 354 | B
mHP1322 | - | Hypothetical protein | Hypothetical | 0.114 | 0.050 | 29 | 579 | B

Note: ABC, ATP-binding cassette.

^a A: Top 2.5% of the distribution of deviation from the regression line (fig. 6); B: Top 2.5% of the distribution of rmin/nt.

^b Category in MBGD.

Table 2

Genes with Low Recombination

Locus Tag | Gene | Description | Category | π | rmin/nt | rmin | Length (nt) | Method^a
HP0200 | rpmF | 50S ribosomal protein L32 | Translation | 0.037 | 0.007 | 1 | 147 | A
HP1016 | pgsA | Phosphatidylglycerophosphate synthase | Lipid metabolism | 0.034 | 0.008 | 5 | 603 | A
HP0653 | pfr | Nonheme iron-containing ferritin | Transport | 0.032 | 0.010 | 5 | 504 | A
HP1448 | rnpA | Ribonuclease P, protein component | Transcription | 0.065 | 0.010 | 5 | 486 | A
HP0032 | clpS | Hypothetical protein | Other categories | 0.066 | 0.011 | 3 | 276 | A
HP0320 | tatA | Sec-independent protein translocase protein | Translocation | 0.041 | 0.013 | 3 | 240 | A
HP0799 | mogA | Molybdenum cofactor biosynthesis protein | Biosynthesis of cofactors | 0.045 | 0.017 | 9 | 531 | A
HP1512 | frpB-4 | Putative iron-regulated OMP | OMP | 0.052 | 0.019 | 51 | 2,634 | A
HP0326(2) | neuA | CMP-N-acetylneuraminic acid synthetase | Cell envelope | 0.060 | 0.024 | 38 | 1,554 | A
HP1287 | - | Putative transcriptional regulator | Transcription | 0.063 | 0.024 | 16 | 654 | A
HP0566 | dapF | Diaminopimelate epimerase | Biosynthesis | 0.057 | 0.026 | 21 | 822 | A
HP0805 | - | Putative lipopolysaccharide biosynthesis protein | Cell envelope | 0.058 | 0.027 | 23 | 855 | A
HP1177 | hopQ | OMP | OMP | 0.074 | 0.029 | 56 | 1,926 | A
HP1157(1) | hopL | OMP | OMP | 0.078 | 0.030 | 109 | 3,693 | A
HP1551 | yajC | Preprotein translocase subunit YajC | Cellular processes | 0.066 | 0.031 | 12 | 384 | A
HP1286 | - | Conserved hypothetical secreted protein | Cell envelope | 0.070 | 0.033 | 18 | 549 | A
HP1502 | - | Hypothetical protein | Hypothetical | 0.034 | 0.011 | 5 | 438 | A
HP0552 | - | Hypothetical protein | Hypothetical (other categories)^b | 0.052 | 0.016 | 14 | 864 | A
mHP0608 | - | Hypothetical protein | Hypothetical | 0.044 | 0.018 | 10 | 570 | A
HP0203 | - | Hypothetical protein | Hypothetical | 0.063 | 0.018 | 5 | 276 | A
mHP0836 | - | Hypothetical protein | Hypothetical | 0.053 | 0.020 | 7 | 354 | A
HP0863 | - | Hypothetical protein | Hypothetical | 0.050 | 0.021 | 35 | 1,629 | A
HP0495 | - | Hypothetical protein | Hypothetical (other categories)^b | 0.059 | 0.027 | 7 | 261 | A
HP1424 | - | Hypothetical protein | Hypothetical | 0.061 | 0.027 | 17 | 621 | A
aHP26695_005 | rfaJ-2 | Putative lipopolysaccharide biosynthesis protein | Cell envelope | 0.062 | 0.029 | 34 | 1,155 | A
HP0861 | - | Putative thiol:disulfide interchange protein | Hypothetical | 0.066 | 0.032 | 24 | 741 | A
HP0902 | - | Hypothetical protein | Hypothetical | 0.076 | 0.033 | 10 | 300 | A
HP0644 | - | Hypothetical protein | Hypothetical (cell envelope)^b | 0.068 | 0.034 | 10 | 294 | A

^a A: Top 2.5% of the distribution of deviation from the regression line (fig. 6); B: Top 2.5% of the distribution of rmin/nt.

^b Category in MBGD.

The relationship between nucleotide diversity (π) and the minimum number of recombination events per gene (rmin) is shown in supplementary figure S1 (Supplementary Material online). The figure indicates a positive relationship for genes with π ≤ 0.08. The relationship appears exponential (supplementary fig. S1, Supplementary Material online).

Detection of Genes with High or Low Recombination Based on the Regression

Using these relationships (fig. 6 and supplementary fig. S1, Supplementary Material online), we identified genes with a particularly high or low recombination rate as those that deviated significantly from the regression lines (red and green dots in fig. 6 and supplementary fig. S1, Supplementary Material online) (method A). The genes with particularly high recombination are listed in table 1 (based on “rmin/nt”) and supplementary table S5 (Supplementary Material online) (based on “rmin”), whereas those with particularly low recombination are listed in table 2 (based on “rmin/nt”) and supplementary table S6 (Supplementary Material online) (based on “rmin”). Hereafter, we mainly examine the results using the minimum number of recombination events per nucleotide (rmin/nt) as a measure of the recombination rate. These high and low recombination genes are mapped on the genome (fig. 5, line 1).

Genes with High Recombination

Among the genes with a particularly high recombination rate (table 1, red bars in fig. 5, line 1) are several genes responsible for basic cellular functions, such as biosynthesis and metabolism. For example, genes of tryptophan synthase subunit alpha (trpA, HP1277) and L-asparaginase II (ansB, HP0723) showed a high rate of recombination. Recombination breakpoints detected by the four-gamete test were found throughout these genes including functional motifs, as shown in figure 7. L-asparaginase, a putative virulence factor, inhibits host cell function and allows evasion from the immune system (Scotti et al. 2010; Shibayama et al. 2011). Also included is sdaA (HP0132) for L-serine deaminase.
Fig. 7. Recombination breakpoints and functional motifs/domains of genes with a high recombination rate. A purple bar indicates a recombination breakpoint. A red bar indicates a functional motif. A black belt indicates a functional domain. The locus tags are as follows. (a) HP1277, (b) HP0723, (c) HP0651, (d) HP0009, and (e) HP1243.

Multiple genes for DNA transformation preceding mutual homologous recombination show a high rate of recombination. comE3 (HP1361) produces a homologue of Bacillus subtilis ComE3, which is essential for DNA transformation (Yeh et al. 2003). HP0333 is a member of the dprA family required for transformation by chromosomal DNA (Ando et al. 1999). Genes for a transporter and a membrane protein were also included. pnuC (HP1290) produces a membrane-associated protein involved in transport of nicotinamide mononucleotide (Zhu et al. 1991), a key intermediate of NAD biosynthesis. lolA (HP0785) produces an outer membrane lipoprotein carrier protein. glnP (HP1170) produces a glutamine ATP-binding cassette transporter, permease protein. Also included were three genes of ribosomal proteins, which represent about 6% of ribosomal protein genes in the genome. They are characterized by short gene length, and their rmin is not very large.

Genes with Low Recombination

Of the genes with a particularly low recombination rate (table 2, green bars in fig. 5, line 1), there are genes involved in translation and transcription, such as rpmF (50S ribosomal protein L32), tatA (sec-independent protein translocase protein), rnpA (ribonuclease P, protein component), and HP1287 (putative transcriptional regulator). Also included were genes for lipid metabolism (pgsA), protease (clpS), and molybdenum cofactor synthesis (mogA). There are also three genes for OMPs (frpB-4, hopQ, and hopL), which represent about 6% of OMP genes in the genome. Genes of OMPs are known to have a higher frequency of recombination (Kennemann et al. 2011). Their large gene length makes the rmin/nt value smaller. rnpA, clpS, HP1286 (gene of conserved hypothetical secreted protein), and two hypothetical genes (HP0902 and HP0644) are also listed in supplementary table S6 (Supplementary Material online), indicating that both rmin/nt and rmin are particularly low in these genes.

Highly Divergent Genes with a High Recombination Rate

Among the highly diverged genes (π > 0.08), we identified genes with a particularly high recombination rate as those in the top 2.5% of all genes (red dots in fig. 6 and supplementary fig. S1, Supplementary Material online, where π > 0.08, and orange bars in fig. 5, line 1). Sequence divergence inhibits homologous recombination (Fujitani and Kobayashi 1999), but these results show that some genes are highly divergent and yet have a high rate of recombination. In contrast, no gene with π > 0.08 had a recombination rate in the bottom 2.5% of all genes. The genes with high divergence and high recombination are listed in table 1 (method B).

Among them, futB, hopZ, and babA are all related to the cell surface and are expected to be important for host interaction. futB is a fucosyltransferase gene responsible for lipopolysaccharide (LPS) synthesis. A previous study reported the high hpEurope–hspEAsia divergence of futB (Kawai et al. 2011). Genetic modifications attributable to recombination events within the futA and futB genes and between the two genes were detected under laboratory conditions (Nilsson et al. 2008). hopZ is a phase-variable adhesion gene and plays an important role in colonization (Kennemann et al. 2011). babA is responsible for adhesion of H. pylori to human gastric epithelium. Recombination in the babA locus is unique in that three allele groups are mutually replaced (Hennig et al. 2006). futB, babA, and hopZ were also listed in supplementary table S5 (Supplementary Material online) based on rmin, indicating that both rmin/nt and rmin are particularly high in these genes. Recombination breakpoints detected by the four-gamete test were found throughout the genes, including functional domains and motifs, as shown in figure 7.

Fig. 7. Recombination breakpoints and functional motifs/domains of genes with a high recombination rate. A purple bar indicates a recombination breakpoint. A red bar indicates a functional motif. A black belt indicates a functional domain. The locus tags are as follows: (a) HP1277, (b) HP0723, (c) HP0651, (d) HP0009, and (e) HP1243.

Another example is an SH3-domain–containing protein (HP1250). It would be interesting if this divergent protein interacts with CagA (an SH3-binding oncoprotein), with cag4 (also in the list), and with other cag pathogenicity island proteins.
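
The four-gamete test used above to locate recombination breakpoints can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names and the toy alignment are hypothetical, gaps and multiallelic columns are simply skipped, and the lower bound is computed with the standard Hudson–Kaplan idea of counting non-overlapping intervals between incompatible site pairs.

```python
from itertools import combinations


def segregating_sites(seqs):
    """Indices of biallelic columns containing only A/C/G/T."""
    sites = []
    for i in range(len(seqs[0])):
        column = {s[i] for s in seqs}
        if len(column) == 2 and column <= set("ACGT"):
            sites.append(i)
    return sites


def four_gametes(seqs, i, j):
    """True if all four two-site haplotypes occur at columns i and j."""
    return len({(s[i], s[j]) for s in seqs}) == 4


def rmin_intervals(seqs):
    """Greedy set of disjoint intervals, each requiring >= 1 recombination.

    Every incompatible pair (i, j) implies at least one event between
    columns i and j; the number of disjoint intervals is a lower bound
    on the number of recombination events.
    """
    sites = segregating_sites(seqs)
    incompatible = [(i, j) for i, j in combinations(sites, 2)
                    if four_gametes(seqs, i, j)]
    incompatible.sort(key=lambda pair: pair[1])  # earliest right end first
    chosen, last_end = [], -1
    for i, j in incompatible:
        if i >= last_end:  # intervals may share an endpoint
            chosen.append((i, j))
            last_end = j
    return chosen


# Hypothetical toy alignment (four aligned sequences of equal length).
toy = ["AACGG", "ATCGT", "TACTG", "TTCTT"]
breaks = rmin_intervals(toy)
print(len(breaks), breaks)  # prints: 3 [(0, 1), (1, 3), (3, 4)]
```

Dividing the count by the alignment length gives a per-nucleotide value comparable in spirit to rmin/nt, although the exact normalization used in the study is an assumption here.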

Homologous Recombination in Horizontally Transferred Genes

We examined the effects of homologous recombination on horizontally transferred genes. Candidate genes horizontally transferred from distantly related organisms (supplementary table S3, Supplementary Material online) and aliens, such as genomic islands (supplementary table S4, Supplementary Material online), are indicated on the genome map (brown and purple rectangles in fig. 5, line 3). Among those with a one-to-one orthologous relationship throughout the ten strains, homologous recombination events were found in all of these genes, as explained above for supplementary table S2 (Supplementary Material online). There is no significant difference between their average homologous recombination rate (rmin/nt) and that of other genes (P = 0.15, Welch’s t-test). Thus, even horizontally transferred genes from distantly related organisms appear to have been shared among H. pylori strains via active homologous recombination.
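
The comparison above rests on Welch's t-test, which does not assume equal variances in the two groups. A minimal sketch using only the Python standard library is given below; the function name and the two lists of rmin/nt values are hypothetical and are not data from this study. It returns the test statistic and the Welch–Satterthwaite degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    se2 = va / na + vb / nb            # squared standard error of the difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Hypothetical rmin/nt values for two groups of genes.
transferred = [0.012, 0.015, 0.009, 0.014, 0.011]
other = [0.013, 0.010, 0.016, 0.012, 0.011, 0.014]
t, df = welch_t(transferred, other)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic together with a P value.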

Discussion

Examination of a few genes in H. pylori indicated that homologous recombination is much more frequent than point mutation, with an estimated rate as high as 6.9 × 10⁻⁵/nt/year (95% credibility region = 3.5 × 10⁻⁵–1.2 × 10⁻⁴) (Falush et al. 2001) or 5.5 (range = 0.5–16.5) × 10⁻⁵/initiation site/year (Kennemann et al. 2011). It was also shown that clusters of polymorphisms were effectively imported into the genome via recombination, which raised the ratio of the effect of recombination-derived imports to that of mutations to 4.3–26.7 (Kennemann et al. 2011).

We thus used the four-gamete test to estimate the minimum number of recombination events and the recombination rate. This method is suitable when the recombination rate is significantly higher than the mutation rate and the pairs of segregating sites with four haplotypes arise mainly from recombination. It has been suggested that this test has low statistical power for detecting recombination events. For example, even if the sample size is 1,000 and mutations are dense, ≤69% of all recombination events may be picked up using this test (Hein et al. 2005). However, we successfully detected recombination events in almost all (>99%) of the orthologous genes, including the “core” set of genes and genes horizontally transferred from distantly related organisms.

To the best of our knowledge, this is the first genome-wide quantitative analysis of homologous recombination in a prokaryote species using population genomic sequence data. Previous genome-wide surveys of other bacteria (Mau et al. 2006; Lefebure and Stanhope 2007; Orsi et al. 2008; Xu et al. 2011) focused on the presence or absence of recombination. In contrast, we quantitatively analyzed intragenic recombination events in each gene. We recognized the dependence of the minimum number of recombination events on nucleotide sequence length and diversity and utilized the linear relationship between nucleotide diversity and recombination rate.
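
As a rough illustration of this relationship, the sketch below computes nucleotide diversity π as the average number of pairwise differences per site and fits an ordinary least-squares line of a per-gene recombination rate on π. It is written under stated assumptions only: the function names and the per-gene values are hypothetical, gaps are not handled, and the regression procedure actually used in the study is not described in this section.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per site among aligned sequences."""
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs)
    return diffs / (len(pairs) * length)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def residuals(xs, ys, slope, intercept):
    """Vertical distance of each point from the fitted line."""
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


# Hypothetical per-gene values: diversity (pi) and recombination rate (rmin/nt).
pi_values = [0.02, 0.03, 0.04, 0.05, 0.09]
rates = [0.004, 0.006, 0.008, 0.011, 0.030]
slope, intercept = fit_line(pi_values, rates)
res = residuals(pi_values, rates, slope, intercept)
# Genes with the largest positive residuals lie furthest above the line.
ranked = sorted(range(len(res)), key=lambda k: res[k], reverse=True)
print(ranked[0])
```

Ranking genes by residual is only one plausible way to flag those with an unusually high or low rate relative to their diversity.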
Such a linear relationship was reported in several genes of D. melanogaster (Begun and Aquadro 1992), but we, for the first time, revealed the relationship using all the orthologous genes in a genome. Our results clearly indicated that the “true” recombination rate is nearly constant across the genome. We then identified several genes with a particularly high or low recombination rate based on the relationship and the regression.

The highly divergent genes (π > 0.08) with a particularly high recombination rate were those encoding OMPs and those related to LPS synthesis, which are important for host interaction. Interestingly, genes responsible for DNA transformation, a step preceding mutual homologous recombination, showed high recombination. Genes with high recombination also included those for basic cellular functions, such as biosynthesis and metabolism. Active homologous recombination may generate diversity to promote the adaptive evolution of these genes. An examination of this hypothesis is the subject of another publication (Yahara K, Furuta Y, Kawai M, Matelska D, Dunin-Horkawicz S, Bujnicki J, Uchiyama I, Kobayashi I, unpublished data).

From the genomic locations of the genes with a particularly high or low recombination rate (fig. 5, line 1), we cannot detect any obvious recombination hot spot or cold spot regions. Two types of hot spots of homologous recombination have been well characterized in bacterial genomes. One is a site for a DNA double-strand breakage (Takahashi and Kobayashi 1990), which initiates homologous recombination just as at hot spots in eukaryotic meiotic recombination. The other is chi (5′-GCTGGTGG) in Escherichia coli and analogous sequences in other bacterial groups (Dillingham and Kowalczykowski 2008). The chi sequence on DNA triggers switching of the RecBCD enzyme from DNA degradation to recombination repair and thus serves as an ID sequence of a genome. A homolog of the RecBCD enzyme (AddAB) has been characterized in H. pylori (Amundsen et al. 2008), but a cognate chi-equivalent sequence has not been identified. Helicobacter pylori carries many RM systems. Their recognition sites may serve as recombination hot spots by DNA double-strand breakage or may activate a hot spot elsewhere by providing an entry site for a RecBCD-like recombinase (Stahl et al. 1983) or a restriction enzyme (Ishikawa et al. 2009). Because the repertoire of RM systems and, therefore, their recognition sites along the genome are highly variable among H. pylori strains (Furuta et al. 2011), hot spot activities related to them may not have been detected by the present method of genome comparison between the ten global strains. Recombination hot/cold spots in H. pylori genomes should be analyzed at higher resolution in the future.

A recent analysis of sequentially sampled H. pylori from the same individual (Kennemann et al. 2011) detected a wide distribution of recombination events in several parts of the genome. They found a high frequency of recombination imports in genes in the Hop family of OMPs, such as babA and hopZ (table 1 and supplementary table S5, Supplementary Material online). That study and our current study are complementary with respect to the time scale, that is, tens of years of evolution in Homo sapiens versus tens of thousands of years of evolution in H. sapiens. Moreover, using the entire data set of one-to-one orthologous genes, we identified genes with a particularly high or low recombination rate, which have not been reported previously.

In conclusion, this study provides a genome-wide gene-by-gene view of homologous recombination in this highly sexual bacterial species. From this viewpoint, a species can be considered as a cohesive group of genomes that are closely connected by homologous recombination. We expect that this survey will have implications for evolutionary and population genomic studies of bacteria, which may lead to a reexamination of the species concept.

Supplementary Material

Supplementary figure S1 and tables S1–S7 are available at Genome Biology and Evolution online (http://www.gbe.oxfordjournals.org/).
  56 in total

1.  Population genomics in bacteria: a case study of Staphylococcus aureus.

Authors:  Shohei Takuno; Tomoyuki Kado; Ryuichi P Sugino; Luay Nakhleh; Hideki Innan
Journal:  Mol Biol Evol       Date:  2011-10-17       Impact factor: 16.240

2.  HyPhy: hypothesis testing using phylogenies.

Authors:  Sergei L Kosakovsky Pond; Simon D W Frost; Spencer V Muse
Journal:  Bioinformatics       Date:  2004-10-27       Impact factor: 6.937

3.  Multiple chromosomal loci for the babA gene in Helicobacter pylori.

Authors:  Ewa E Hennig; Johnna M Allen; Timothy L Cover
Journal:  Infect Immun       Date:  2006-05       Impact factor: 3.441

4.  Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands.

Authors:  Georgios S Vernikos; Julian Parkhill
Journal:  Bioinformatics       Date:  2006-07-12       Impact factor: 6.937

5.  Identification of breakpoints in intergenotypic recombinants of HIV type 1 by bootscanning.

Authors:  M O Salminen; J K Carr; D S Burke; F E McCutchan
Journal:  AIDS Res Hum Retroviruses       Date:  1995-11       Impact factor: 2.205

6.  HP0333, a member of the dprA family, is involved in natural transformation in Helicobacter pylori.

Authors:  T Ando; D A Israel; K Kusugami; M J Blaser
Journal:  J Bacteriol       Date:  1999-09       Impact factor: 3.490

7.  Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination.

Authors:  K S Lole; R C Bollinger; R S Paranjape; D Gadkari; S S Kulkarni; N G Novak; R Ingersoll; H W Sheppard; S C Ray
Journal:  J Virol       Date:  1999-01       Impact factor: 5.103

8.  Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes.

Authors:  Ikuo Uchiyama
Journal:  Nucleic Acids Res       Date:  2006-01-25       Impact factor: 16.971

9.  ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins.

Authors:  Edouard de Castro; Christian J A Sigrist; Alexandre Gattiker; Virginie Bulliard; Petra S Langendijk-Genevaux; Elisabeth Gasteiger; Amos Bairoch; Nicolas Hulo
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

10.  Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli.

Authors:  Bob Mau; Jeremy D Glasner; Aaron E Darling; Nicole T Perna
Journal:  Genome Biol       Date:  2006-05-31       Impact factor: 13.583

