Jack Aidley, Joseph J Wanford, Luke R Green, Samuel K Sheppard, Christopher D Bayliss.
Abstract
Hypermutable simple sequence repeats (SSRs) are drivers of phase variation (PV) whose stochastic, high-frequency, reversible switches in gene expression are a common feature of several pathogenic bacterial species, including the human pathogen Campylobacter jejuni. Here we examine the distribution and conservation of known and putative SSR-driven phase variable genes - the phasome - in the genus Campylobacter. PhasomeIt, a new program, was specifically designed for rapid identification of SSR-mediated PV. This program detects the location, type and repeat number of every SSR. Each SSR is linked to a specific gene and its putative expression state. Other outputs include conservation of SSR-driven phase-variable genes and the 'core phasome' - the minimal set of PV genes in a phylogenetic grouping. Analysis of 77 complete Campylobacter genome sequences detected a 'core phasome' of conserved PV genes in each species and a large number of rare PV genes with few, or no, homologues in other genome sequences. Analysis of a set of partial genome sequences, with food-chain-associated metadata, detected evidence of a weak link between phasome and source host for disease-causing isolates of sequence type (ST)-828 but not the ST-21 or ST-45 complexes. Investigation of the phasomes in the genus Campylobacter provided evidence of overlapping but distinctive mechanisms of PV-mediated adaptation to specific niches. This suggests that the phasome could be involved in host adaptation and spread of campylobacters. Finally, this tool is malleable and will have utility for studying the distribution and genic effects of other repetitive elements in diverse bacterial species.
Keywords: Campylobacter; PhasomeIt; phase variation; simple sequence repeat
Year: 2018 PMID: 30351264 PMCID: PMC6321876 DOI: 10.1099/mgen.0.000228
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Overview of the processes and outputs of PhasomeIt. This flow diagram depicts the four major processes (steps 1–4) applied by PhasomeIt to detection of PV genes in genome sequences and the major outputs of this program (step 5). Step 1, input genome sequences are subject to a search for SSRs using Bossref, which allows for user-defined inputs of repeat numbers for each repeat unit type and outputs the genome position and type of SSR. Bossref converts the DNA sequence into a binary code (2 bits per base), shifts the sequence to the left by twice the repeat unit length, performs an eXclusive OR (xor) with the original sequence, and identifies the repeats as a run of 0’s equal to twice the repeat length minus one. Step 2, PhasomeIt utilizes these SSR data to link the SSRs to genes, now defined as a phase variation (PV) gene, and to determine if the SSR mediates translational or transcriptional switching of gene expression. For transcriptional PV wherein an SSR is close to two genes, a bias (indicated by the magnifying-glass) is introduced to link the SSR to the 5′ end of a gene and hence in the putative promoter region. Step 3, PhasomeIt performs homology searches to identify known or putative functions of this gene followed by Step 4 wherein homology searches are performed against all genes in all submitted genome sequences in order to identify homology groups of related genes. Step 5, a range of output datasets (indicated in boxes) and visual outputs are generated providing both summary data and enabling interactive exploration of specific genes and specific amino acid sequences.
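The shift-and-XOR search described for step 1 can be illustrated with a minimal Python sketch. This is not Bossref's code: the function name `find_tracts`, the `CODE` table and the example sequences are hypothetical, only unambiguous bases (A, C, G, T) are handled, and the tract boundaries are recovered by scanning the XOR result slot by slot rather than by counting zero bits directly.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # 2 bits per base (illustrative encoding)


def find_tracts(seq, unit_len, min_copies):
    """Return (start, unit, copies) for tandem repeats with the given unit length."""
    n = len(seq)
    packed = 0
    for base in seq:  # the first base ends up in the most significant bits
        packed = (packed << 2) | CODE[base]

    # Shift left by 2 bits per base of the repeat unit and XOR with the original:
    # the 2-bit slot of base j is 00 exactly when seq[j] == seq[j + unit_len].
    diff = (packed << (2 * unit_len)) ^ packed

    def same_as_next_unit(j):
        return ((diff >> (2 * (n - 1 - j))) & 0b11) == 0

    results = []
    run_start = None
    for j in range(n - unit_len + 1):  # the final step only flushes an open run
        matched = j < n - unit_len and same_as_next_unit(j)
        if matched and run_start is None:
            run_start = j
        elif not matched and run_start is not None:
            # A run of r matching slots spans r + unit_len bases of sequence.
            tract_len = (j - run_start) + unit_len
            copies = tract_len // unit_len
            if copies >= min_copies:
                unit = seq[run_start:run_start + unit_len]
                results.append((run_start, unit, copies))
            run_start = None
    return results


poly_g = find_tracts("ACGGGGGGGGTA", unit_len=1, min_copies=7)
dinuc = find_tracts("GCATATATATATGC", unit_len=2, min_copies=5)
```

In this sketch a tract of n copies of a unit produces zeros in the slots of all but its last copy, which is why the unit length is added back when the tract length is computed. A homopolymer also matches when `unit_len` is 2 or more, so a real tool would need to report each tract under its shortest unit.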
Fig. 2. Example gene group graphic with C. coli and C. fetus. This HTML output of PhasomeIt illustrates the homology of PV genes between isolates with each numbered row representing a different homology group. Coloured blocks show the presence or absence of PV genes: green is a tract located within the ORF, orange is a tract located nearby and grey is a non-PV homologue. Coloured bars at the start indicate the repeat unit on PV tracts. Faint beige bars are present to help readability and indicate absence of any homologue in that isolate. Numbered points shown in red indicate the following: 1, PV gene conserved in all C. coli strains but absent in C. fetus; 2, PV gene conserved in all C. coli and C. fetus strains; 3, PV gene present in only one isolate; 4, PV homology group present in all C. coli and C. fetus strains but only phase-variable in one isolate; 5, block of PV genes present in the majority of C. fetus isolates; 6, intragenic SSR present in only one isolate.
Fig. 3. Number of PV genes per genome sequence in each species. This graph shows the number of putative PV genes identified per genome for each species. For species with five or more isolates, box-and-whisker plots are shown; for those with fewer, each point is shown individually. The boxes indicate the median and the interquartile range (IQR; 25th to 75th percentile). The whiskers extend to the furthest point within 1.5 IQRs of the ends of the box, and outlying points beyond this range are marked individually. The labelled point is C. jejuni subsp. doylei strain 269.97.
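The box-and-whisker convention in this caption can be written out as a short Python sketch. The function name `box_stats` and the example counts are hypothetical, and the caption does not say which quartile interpolation was used, so the "inclusive" method of the standard library is an assumption.

```python
from statistics import quantiles


def box_stats(counts):
    """Median, quartiles, whisker ends and outliers for a list of PV-gene counts."""
    data = sorted(counts)
    # Quartile method is an assumption; the source does not specify one.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the furthest data point still inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


# Hypothetical counts for one species, with one unusually high genome.
example = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 20])
```

With these made-up counts the upper whisker ends at 9 and the value 20 is reported as an outlier, the same treatment the caption describes for individually marked points.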
Fig. 4. Neighbour-joining tree of phasome similarity for Campylobacter species. The tree depicts the phasome similarities created by applying a Manhattan distance metric to binary lists of presence and absence of homology groups. The tree is coloured by species. Numbers indicate strains; a full list is given in Table S4. Selected strains are: 1, C. jejuni NCTC 11168; 2, C. jejuni 81-176; 12, C. jejuni subsp. doylei 269.97; 25, C. jejuni NCTC 11351.
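The distance underlying this tree can be sketched in a few lines of Python. The isolate names and presence/absence vectors below are invented for illustration, and `manhattan` and `distance_matrix` are hypothetical helper names; for binary vectors the Manhattan distance is simply the number of homology groups present in one isolate but not the other.

```python
def manhattan(a, b):
    """Manhattan distance between two equal-length presence/absence vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same homology groups")
    return sum(abs(x - y) for x, y in zip(a, b))


def distance_matrix(profiles):
    """Pairwise distances keyed by (isolate, isolate) for a dict of profiles."""
    names = sorted(profiles)
    return {(p, q): manhattan(profiles[p], profiles[q]) for p in names for q in names}


# Hypothetical data: 1 = homology group present in the isolate, 0 = absent.
profiles = {
    "isolate_A": [1, 1, 0, 1],
    "isolate_B": [1, 0, 0, 1],
    "isolate_C": [0, 0, 1, 1],
}
distances = distance_matrix(profiles)
```

A matrix of this kind would then be passed to a neighbour-joining implementation to build the tree; that step is not shown here.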
Core phasome of C. jejuni, C. coli, C. fetus and C. lari
| Species | Genomes (%)* | Group name† | Function‡ | Repeat tract§ |
|---|---|---|---|---|
| C. jejuni | 100.0 | Maf7 | Carbonic anhydrase‖ | G (7–11) |
| | | Cj1295 | Hypothetical protein (DUF2172 domain), putative M28 family zinc peptidase‖ | G (7–10) |
| | 94.3 | Cj0045c | Hemerythrin-like iron-binding protein | G (9–11) |
| | 88.6 | CipA | Invasion protein CipA | C (8–10) |
| | 82.9 | UbiE_3 | SAM-dependent methyltransferase | G (8–10) |
| | 71.4 | HxuB_1 | Heme/hemopexin transporter protein HuxB precursor | C (8–11) |
| | 68.6 | Cj1421c | Putative sugar transferase¶ | G (8–11) |
| | 65.7 | Maf1 | Motility accessory factor‖ | G (7–13) |
| | | AnsA | | C (9–11) |
| C. coli | 100.0 | Cj0045c | Hemerythrin-like iron-binding protein | G (9–11) |
| | | Maf7 | Carbonic anhydrase‖ | G (7–11) |
| | 90.0 | Cj1295 | Hypothetical protein (DUF2172 domain), putative M28 family zinc peptidase‖ | G (7–10) |
| | | MurD | UDP- | C (9–11) |
| | | N149_0842 | Hypothetical protein | G (9–11) |
| | | N149_0993 | Phosphoglycerol transferase | G (8–9) |
| | | HxuA | Filamentous haemagglutinin | G (8–10) |
| | | VacA | Autotransporter | G (8–10) |
| | 70.0 | Cj0170 | SAM-dependent methyltransferase | G (8–13) |
| | | Maf1 | Motility accessory factor‖ | G (7–13) |
| | | lgrA | Formyl transferase domain protein | G (8–10) |
| | | PJ17_06935 | 3-Oxoacyl-ACP synthase | G (7–11) |
| C. fetus | 100.0 | CFT03427_1684 | Hypothetical protein | G (9–10) |
| | | Cj1295 | Hypothetical protein, putative M28 family zinc peptidase‖ | G (7–10) |
| | | UbiE_3 | SAM-dependent methyltransferase | G (7–10) |
| | | CFT03427_0876 | SAM-dependent methyltransferase | G (8–9) |
| | | CFT03427_0951 | Hypothetical protein | G (8–11) |
| | | CFT03427_1021 | MCP-domain signal transduction protein | G (8–10) |
| | | CFT03427_1099 | Putative membrane protein | G (9–11) |
| | | CFT03427_1115 | Autotransporter domain protein | C (8–11) |
| | | CFT03427_1510 | ATP-grasp domain protein | G (9–10) |
| | | CFT03427_1512 | SAM-dependent methyltransferase | G (8–9) |
| | | CFT03427_1562 | Probable 3-demethylubiquinone-9 3-methyltransferase | G (9–10) |
| | | CFT03427_1573 | Hypothetical protein | G (9) |
| | | CFT03427_1574 | Hypothetical membrane protein | G (9–10) |
| | | CFT03427_1581 | SAM-dependent methyltransferase | G (8–10) |
| | 87.5 | MenA | 1,4-Dihydroxy-2-naphthoate octaprenyltransferase | G (7–9) |
| | | CFT03427_1442 | Transformation system protein | G (8–10) |
| | | CFT03427_1545 | Radical SAM superfamily enzyme, MoaA/NifB/PqqE/SkfB | G (9–10) |
| | | CFT03427_1551 | Short-chain dehydrogenase/reductase family protein | G (8–9) |
| | | CFT03427_1554 | Methyltransferase | G (8–9) |
| | | CFT03427_1558 | Radical SAM superfamily enzyme, MoaA/NifB/PqqE/SkfB family | G (9) |
| | | CFT03427_1559 | Hypothetical protein | G (8–9) |
| | | CFT03427_1565 | Hypothetical protein | G (9) |
| | | CFT03427_1566 | Hypothetical protein | G (9–10) |
| | | CFT03427_1577 | Hypothetical membrane protein | G (9–10) |
| | 75.0 | CFT03427_1556 | Formyltransferase domain-containing protein | G (9–10) |
| | 62.5 | CFF04554_0871 | Putative type II secretion system protein | G (9) |
| | | CFF04554_1255 | 4HB_MCP sensor-containing MCP-domain signal transduction | G (8–9) |
| C. lari | 100.0 | UPTC4110_0710 | Hypothetical protein | G (9–11) |
| | 71.4 | UPTC4110_1471 | MCP-domain signal transduction protein | C (9–11) |
*Homology groups present in 60 % or more of the genome sequences from each species with the actual percentage for each homology group being shown in the first column.
†Each group is assigned a name based on the first gene name found in the PV genes in the group, or failing that the first locus name found. These names are preferentially chosen from a manually curated order that favours well-studied species.
‡The functional assignment is based on annotation data associated with the genome sequences and is automatically obtained as described in the main text.
§Indicates the repeat tract associated with at least 90 % of genes in each function group; the range of repeat numbers is given in parentheses.
‖Known or putative flagella-modifying protein.
¶Known or putative capsular-modifying protein.
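The 60 % cut-off used to define the core phasome in this table can be illustrated with a small Python sketch. It is not PhasomeIt's implementation: the function name `core_phasome`, the genome labels and the group labels are all hypothetical, and the input is assumed to be the set of PV homology groups already detected in each genome of one species.

```python
from collections import Counter


def core_phasome(groups_by_genome, threshold=60.0):
    """Map each homology group found in >= threshold % of genomes to its percentage."""
    n_genomes = len(groups_by_genome)
    # Count each group at most once per genome.
    counts = Counter(g for groups in groups_by_genome.values() for g in set(groups))
    percentages = {g: 100.0 * c / n_genomes for g, c in counts.items()}
    core = {g: p for g, p in percentages.items() if p >= threshold}
    # Most widely conserved groups first, ties broken by name.
    return dict(sorted(core.items(), key=lambda item: (-item[1], item[0])))


# Hypothetical PV homology groups detected in five genomes of one species.
groups_by_genome = {
    "genome_1": {"groupX", "groupY", "groupZ"},
    "genome_2": {"groupX", "groupY"},
    "genome_3": {"groupX", "groupY", "groupZ"},
    "genome_4": {"groupX"},
    "genome_5": {"groupX"},
}
core = core_phasome(groups_by_genome)
```

In this made-up example `groupX` (5 of 5 genomes) and `groupY` (3 of 5) pass the threshold, while `groupZ` (2 of 5) does not and would count among the rarer PV genes.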
Fig. 5. Tree based on homology groups shows clear separation by ST-complex. A neighbour-joining tree was derived using a Manhattan distance between isolates based on the presence or absence of homology groups. The tree is coloured to indicate the ST-complex for each isolate. A group of genomes with poor coverage across the highly conserved PV genes is indicated in red. Numbers are isolate numbers and carry no biological significance (see Table S5).
Fig. 6. Host association in the C. jejuni STs. A neighbour-joining tree (derived as for Fig. 5) is shown for C. jejuni isolates from the ST-21 (a), ST-45 (b) and ST-828 (c) complexes. Tree branches are coloured according to the source of each isolate. ST-828 isolates from cattle show significant grouping within the tree (P<0.05, tree-based scan statistic with conditional Poisson model). All other differences were non-significant.