Guillaume Méric1, Koji Yahara2, Leonardos Mageiros1, Ben Pascoe1, Martin C J Maiden3, Keith A Jolley3, Samuel K Sheppard1.
Abstract
The increasing availability of hundreds of whole bacterial genomes provides opportunities for enhanced understanding of the genes and alleles responsible for clinically important phenotypes and how they evolved. However, it is a significant challenge to develop easy-to-use and scalable methods for characterizing these large and complex data and relating them to disease epidemiology. Existing approaches typically focus either on homologous sequence variation in genes that are shared by all isolates, or on non-homologous sequence variation, that is, genes that are differentially present in the population. Here we present a comparative genomics approach that simultaneously approximates core and accessory genome variation in pathogen populations and apply it to pathogenic species in the genus Campylobacter. A total of 7 published Campylobacter jejuni and Campylobacter coli genomes were selected to represent diversity across these species, and a list of all loci that were present at least once was compiled. After filtering duplicates, a 7-isolate reference pan-genome of 3,933 loci was defined. A core genome of 1,035 genes was ubiquitous in the sample, accounting for 59% of the genes in each isolate (average genome size of 1.68 Mb). The accessory genome contained 2,792 genes. A Campylobacter population sample of 192 genomes was screened for the presence of reference pan-genome loci, with gene presence defined as a BLAST match of ≥70% identity over ≥50% of the locus length, and sequences were aligned using MUSCLE on a gene-by-gene basis. A total of 21 genes were present only in C. coli and 27 only in C. jejuni, providing information about functional differences associated with species and novel epidemiological markers for population genomic analyses. Homologs of these genes were found in several of the genomes used to define the pan-genome and, therefore, would not have been identified using a single reference strain approach.
Year: 2014 PMID: 24676150 PMCID: PMC3968026 DOI: 10.1371/journal.pone.0092798
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
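The presence rule and the core/accessory split described in the abstract can be illustrated with a minimal Python sketch. It assumes the BLAST results have already been reduced to one best hit per genome and locus; the tuple layout, the function names `is_present` and `split_core_accessory`, and the toy values are hypothetical and not taken from the paper.

```python
from typing import Dict, Set, Tuple


def is_present(identity_pct: float, aligned_len: int, locus_len: int,
               min_identity: float = 70.0, min_coverage: float = 0.5) -> bool:
    """Call a locus present if the hit has >=70% identity over >=50% of the locus length."""
    return identity_pct >= min_identity and aligned_len >= min_coverage * locus_len


def split_core_accessory(presence: Dict[str, Set[str]],
                         n_genomes: int) -> Tuple[Set[str], Set[str]]:
    """Core loci occur in every genome; accessory loci occur in some but not all."""
    core = {locus for locus, genomes in presence.items() if len(genomes) == n_genomes}
    accessory = {locus for locus, genomes in presence.items()
                 if 0 < len(genomes) < n_genomes}
    return core, accessory


# Toy best hits: (genome, locus, % identity, aligned length, locus length).
hits = [
    ("g1", "locusA", 98.0, 900, 900),
    ("g2", "locusA", 85.0, 700, 900),
    ("g1", "locusB", 72.0, 300, 500),
    ("g2", "locusB", 65.0, 500, 500),  # identity below threshold
    ("g2", "locusC", 90.0, 100, 400),  # coverage below threshold
]

presence: Dict[str, Set[str]] = {}
for genome, locus, identity, aligned, length in hits:
    if is_present(identity, aligned, length):
        presence.setdefault(locus, set()).add(genome)

core, accessory = split_core_accessory(presence, n_genomes=2)
print(sorted(core), sorted(accessory))
```

In this toy example `locusA` passes both thresholds in both genomes and is classed as core, `locusB` passes in only one genome and is classed as accessory, and `locusC` is never called present.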
Figure 1. The reference pan-genome approach.
Conceptual pipeline showing the approach to generate a list of unique genes from more than one reference strain. Step 1: Compiling a gene list from reference genomes reflecting strain diversity from public repositories such as NCBI, or after automatic annotation on assembled contiguous segments (for example using RAST). Step 2: Comparative list analysis to remove duplicate genes that show ≥70% sequence identity over ≥50% of the sequence of another gene in the list. Step 3: Creating a final reference pan-genome list.
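Step 2 of this pipeline can be sketched as a greedy filter, shown here in Python under stated assumptions. The pairwise comparison results are assumed to be precomputed and supplied as a dictionary; coverage is measured against the length of the gene being tested, which is one possible reading of the caption; the function name `dedupe` and all gene identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def dedupe(genes: List[Tuple[str, int]],
           hits: Dict[Tuple[str, str], Tuple[float, int]],
           min_identity: float = 70.0,
           min_coverage: float = 0.5) -> List[str]:
    """Keep a gene unless it matches an already kept gene.

    genes: (gene_id, length) pairs in the order the references were compiled.
    hits:  (query_id, subject_id) -> (% identity, aligned length), assumed precomputed.
    """
    kept: List[str] = []
    for gene_id, length in genes:
        duplicate = False
        for kept_id in kept:
            hit = hits.get((gene_id, kept_id))
            if hit is None:
                continue
            identity, aligned = hit
            # Duplicate if >=70% identity over >=50% of this gene's length.
            if identity >= min_identity and aligned >= min_coverage * length:
                duplicate = True
                break
        if not duplicate:
            kept.append(gene_id)
    return kept


# Toy gene list compiled from three hypothetical reference genomes.
genes = [("refA_0001", 900), ("refB_0001", 880), ("refB_0002", 600), ("refC_0001", 400)]
hits = {
    ("refB_0001", "refA_0001"): (92.0, 870),  # duplicate of refA_0001
    ("refB_0002", "refA_0001"): (75.0, 120),  # aligned region too short
    ("refC_0001", "refB_0002"): (60.0, 400),  # identity too low
}

pan_genome = dedupe(genes, hits)
print(pan_genome)
```

A greedy filter of this kind depends on the order of the input list: the first copy of each gene encountered is the one retained in the final list.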
Publicly-available genomes used to produce a Campylobacter reference pan-genome.
| Strain name | Lineage | Annotated genes | Genome size (Mbp) | NCBI Accession |
| | ST-21 complex | 1,670 | 1.64 | NC_002163.1 |
| | ST-42 complex | 1,812 | 1.7 | NC_008787.1 |
| | ST-283 complex | 1,681 | 1.63 | NC_009839.1 |
| | ST-45 complex | 1,675 | 1.62 | NC_017280.1 |
| | Clade 3 | 1,556 | 1.58 | NC_022132.1 |
| | Clade 1 | 1,747 | 1.73 | NC_022347.1 |
| | ST-1845 | 2,037 | 1.85 | NC_009707.1 |
| Total size | - | 12,178 | 11.75 | - |
| | - | 3,933 | 3.72 | - |
Figure 2. Phylogenetic tree of 192 Campylobacter genomes and novel epidemiological markers.
Maximum-likelihood tree of 130 C. jejuni and 62 C. coli genomes. Isolates belonging to C. jejuni are shown in blue, and those belonging to C. coli clade 1 are shown in red, clade 2 in yellow, and clade 3 in green. The scale bar indicates the estimated number of substitutions per site. Example genomes from C. coli clades 1-3 and C. jejuni ST-21, ST-45, ST-353 and ST-61 clonal complexes were used to define the 7-isolate reference pan-genome gene list. The number of epidemiological markers from this list is indicated for each lineage. The asterisk indicates that markers were not absolutely specific to that lineage, but were also present at low frequency in other lineages. Details about the markers are shown in Table 2 and Table 3.
Functional categories of genes present in the reference pan-genome and in the reference genome of C. jejuni NCTC11168.
| Functional category | Reference pan-genome | C. jejuni NCTC11168 |
| Protein Metabolism | 328 (12.2%) | 213 (14.9%) |
| Cell Wall and Capsule | 318 (11.8%) | 125 (8.7%) |
| Cofactors, Vitamins, Prosthetic Groups, Pigments | 285 (10.6%) | 134 (9.4%) |
| Amino Acids and Derivatives | 283 (10.5%) | 181 (12.6%) |
| Virulence, Disease and Defense | 167 (6.2%) | 67 (4.7%) |
| Respiration | 155 (5.8%) | 72 (5.0%) |
| Motility and Chemotaxis | 155 (5.8%) | 86 (6.0%) |
| DNA Metabolism | 148 (5.5%) | 66 (4.6%) |
| RNA Metabolism | 132 (4.9%) | 65 (4.5%) |
| Carbohydrates | 111 (4.1%) | 62 (4.3%) |
| Membrane Transport | 106 (3.9%) | 51 (3.6%) |
| Iron acquisition and metabolism | 96 (3.6%) | 43 (3.0%) |
| Fatty Acids, Lipids, and Isoprenoids | 92 (3.4%) | 64 (4.5%) |
| Stress Response | 76 (2.8%) | 42 (2.9%) |
| Nucleosides and Nucleotides | 63 (2.3%) | 52 (3.6%) |
| Cell Division and Cell Cycle | 38 (1.4%) | 21 (1.5%) |
| Regulation and Cell signaling | 33 (1.2%) | 16 (1.1%) |
| Phosphorus Metabolism | 30 (1.1%) | 20 (1.4%) |
| Nitrogen Metabolism | 21 (0.8%) | 13 (0.9%) |
| Potassium metabolism | 20 (0.7%) | 16 (1.1%) |
| Regulons | 12 (0.4%) | 5 (0.3%) |
| Secondary Metabolism | 7 (0.3%) | 4 (0.3%) |
| Sulfur Metabolism | 5 (0.2%) | 5 (0.3%) |
| Miscellaneous | 4 (0.1%) | 4 (0.3%) |
| Metabolism of Aromatic Compounds | 3 (0.1%) | 3 (0.2%) |
| Dormancy and Sporulation | 1 (0.0%) | 1 (0.1%) |
Prevalence of C. coli and C. jejuni associated genes from a comparison of 192 genomes.
| Gene identifier | Description | Detailed functional categories | Gene prevalence | | | | | | | | | Species association |
| | | | C. coli clade 1 (n = 47) | C. coli clade 2 (n = 4) | C. coli clade 3 (n = 5) | C. jejuni ST-21 (n = 41) | C. jejuni ST-45 (n = 28) | C. jejuni ST-353 (n = 7) | C. jejuni ST-61 (n = 6) | All C. coli | All C. jejuni | |
| Cc76339__00005c | Methyl-accepting chemotaxis protein, putative | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01340 | Cytolethal distending toxin subunit C | Cytolethal distending toxins | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01460c | 2-methylcitrate dehydratase (EC 4.2.1.79) | Methylcitrate cycle | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01470c | 2-methylcitrate synthase (EC 2.3.3.5) | Methylcitrate cycle | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01480c | Methylisocitrate lyase (EC 4.1.3.30) | Methylcitrate cycle | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01490c | Propionate-CoA ligase (EC 6.2.1.17) / Acetyl-coenzyme A synthetase (EC 6.2.1.1) | Methylcitrate cycle; Pyruvate metabolism | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__01750 | Highly acidic protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__02240 | Integral membrane protein TerC | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__03250 | hypothetical protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__04670 | probable periplasmic protein Cj0093, putative | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__09670 | Hypothetical protein Cj1162c | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__10710 | Small hydrophobic protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__10950 | FIG00469427: hypothetical protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__11130 | Putative periplasmic protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__11470 | Uncharacterized protein Cj0990c | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__11500c | Surface-exposed lipoprotein JlpA | Adhesion | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__12660c | Zinc ABC transporter, periplasmic-binding protein ZnuA | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__12670 | Peroxide stress regulator / Ferric uptake regulation protein | Oxidative stress; Iron Metabolism | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__12940 | CoA-binding domain protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__15800 | Methionine synthase II (cobalamin-independent) | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| Cc76339__15900c | FIG00469900: hypothetical protein | - | 47 | 4 | 5 | 0 | 0 | 0 | 0 | 62 | 0 | |
| 11168_Cj0011c | Periplasmic dsDNA and ssDNA-binding protein contributing to transformation | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0090 | Putative lipoprotein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0135 | Hypothetical protein Cj0135 | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0186c | Integral membrane protein TerC | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0327 | Putative translation initiation inhibitor, yjgF family | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0339 | Putative transmembrane transport protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0340 | Inosine-uridine preferring nucleoside hydrolase (EC 3.2.2.1) | Purine conversions; Queuosine-Archaeosine Biosynthesis | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0414 | FIG00471287: hypothetical protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0454c | membrane protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0494 | FIG00469900: hypothetical protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0873c | Cytochrome c family protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj0900c | Small hydrophobic protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1021c | Putative periplasmic protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1036c | FIG00469427: hypothetical protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1060c | | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1162c | Hypothetical protein Cj1162c | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1666c | CopG protein | Copper homeostasis | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_Cj1714 | | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_ctsT | Transformation system protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_kdpD | Osmosensitive K+ channel histidine kinase KdpD (EC 2.7.3.-) | Potassium homeostasis | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| 11168_tonB2 | Ferric siderophore transport system, periplasmic binding protein TonB | Iron Metabolism | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cj_81-176_1820 | Putative transmembrane transport protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cj_81-176_6530 | FIG00469465: hypothetical protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cj_81-176_8530 | | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cj_81-176_8535 | | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cj81116_1523 | | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
| Cjdoleyi_26997_0913 | Small hydrophobic protein | - | 0 | 0 | 0 | 41 | 28 | 7 | 6 | 0 | 130 | |
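The kind of screen that yields species-associated genes like those tabulated above can be sketched in Python as follows. This is an illustration only: it assumes a presence/absence mapping has already been built, it uses the strictest criterion (present in every isolate of one species and in none of the other), and the function name `species_specific` and the toy identifiers are hypothetical.

```python
from typing import Dict, List, Set


def species_specific(presence: Dict[str, Set[str]],
                     species_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Return loci found in all isolates of exactly one species and in no others.

    presence:   locus -> set of genome ids in which the locus was called present.
    species_of: genome id -> species label.
    """
    members: Dict[str, Set[str]] = {}
    for genome, species in species_of.items():
        members.setdefault(species, set()).add(genome)

    markers: Dict[str, List[str]] = {species: [] for species in members}
    for locus, genomes in sorted(presence.items()):
        for species, isolates in members.items():
            # Equality means: every isolate of this species, and nothing else.
            if genomes == isolates:
                markers[species].append(locus)
    return markers


# Toy sample of two C. coli and three C. jejuni genomes.
species_of = {"cc1": "C. coli", "cc2": "C. coli",
              "cj1": "C. jejuni", "cj2": "C. jejuni", "cj3": "C. jejuni"}
presence = {
    "locusA": {"cc1", "cc2"},                       # C. coli only
    "locusB": {"cj1", "cj2", "cj3"},                # C. jejuni only
    "locusC": {"cc1", "cc2", "cj1", "cj2", "cj3"},  # shared by all
    "locusD": {"cc1", "cj1"},                       # mixed
    "locusE": {"cc1"},                              # incomplete within species
}

markers = species_specific(presence, species_of)
print(markers)
```

Lineage-level markers that are also present at low frequency elsewhere would need a relaxed criterion, for example a prevalence threshold per lineage, which this sketch does not implement.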
Figure 3. Rarefaction and accumulation curve estimates of C. jejuni and C. coli core and pan-genomes.
The number of shared genes (A), and the total number of genes (B and C), were determined as genome sampling increased. Comparisons were made based on matrices of gene presence/absence, derived from the reference pan-genome list, for C. coli (62 genomes), C. jejuni (130 genomes) and the two species combined (192 genomes). Randomized genome sampling was carried out 100 times to obtain the average number of genes for each sample comparison number (plain lines) and standard deviations (dotted lines). Pan-genome size estimates were calculated using the reference pan-genome (B) or the NCTC11168 annotation (C).
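The randomized sampling behind curves of this kind can be sketched in Python as follows. This is not the authors' code: it assumes each genome is represented by the set of reference pan-genome loci called present in it, it uses the population standard deviation (the caption does not say which estimator was used), and the function name `accumulation_curves` and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

Curve = List[Tuple[float, float]]  # (mean, standard deviation) per sample size


def accumulation_curves(genome_genes: Dict[str, Set[str]],
                        n_permutations: int = 100,
                        seed: int = 0) -> Tuple[Curve, Curve]:
    """Average shared (core) and total (pan) gene counts as genomes are added."""
    rng = random.Random(seed)
    genomes = list(genome_genes)
    n = len(genomes)
    core_counts: List[List[int]] = [[] for _ in range(n)]
    pan_counts: List[List[int]] = [[] for _ in range(n)]

    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # random order in which genomes are added
        core = set(genome_genes[order[0]])
        pan: Set[str] = set()
        for k, genome in enumerate(order):
            core &= genome_genes[genome]  # genes shared by all genomes so far
            pan |= genome_genes[genome]   # genes seen in any genome so far
            core_counts[k].append(len(core))
            pan_counts[k].append(len(pan))

    core_curve = [(mean(c), pstdev(c)) for c in core_counts]
    pan_curve = [(mean(p), pstdev(p)) for p in pan_counts]
    return core_curve, pan_curve


# Toy presence sets for three genomes.
genome_genes = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "d"},
    "g3": {"a", "e"},
}
core_curve, pan_curve = accumulation_curves(genome_genes, n_permutations=50)
print(core_curve[-1], pan_curve[-1])
```

Whatever the sampling order, the last point of each curve is fixed by the full sample: here one shared gene and five genes in total, both with zero spread.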
Lineage associated genes in C. coli and C. jejuni from a comparison of 192 genomes.
| Gene identifier | Description | Detailed functional categories | Gene prevalence | | | | | | | | | Species/clade association |
| | | | C. coli clade 1 (n = 47) | C. coli clade 2 (n = 4) | C. coli clade 3 (n = 5) | C. jejuni ST-21 (n = 41) | C. jejuni ST-45 (n = 28) | C. jejuni ST-353 (n = 7) | C. jejuni ST-61 (n = 6) | All C. coli | All C. jejuni | |
| | 3-oxoacyl-[acyl-carrier protein] reductase | Fatty Acid Biosynthesis | 47 | 4 | 0 | 0 | 0 | 0 | 0 | 53 | 0 | |
| | Hypothetical protein | - | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 4 | 0 | |
| | Biotin sulfoxide reductase | - | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 | 1 | |
| | Putative secreted serine protease | - | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 | 1 | |
| | Putative cytochrome C-type haem-binding periplasmic protein | - | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 5 | 1 | |
| | Aldehyde dehydrogenase | L-rhamnose utilization | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Transcriptional regulator | Aromatic compound degradation | 38 | 0 | 0 | 41 | 0 | 0 | 0 | 43 | 58 | |
| | Putative oxidoreductase | - | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Fucose permease | L-fucose utilization | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Predicted metal-dependent hydrolase of the TIM-barrel fold | - | 42 | 0 | 0 | 41 | 0 | 0 | 0 | 47 | 59 | |
| | Hypothetical protein | - | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Putative lyase | - | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Altronate hydrolase | D-Galacturonate and D-Glucuronate Utilization | 43 | 0 | 0 | 41 | 0 | 0 | 0 | 48 | 59 | |
| | Putative periplasmic protein | - | 0 | 0 | 0 | 0 | 28 | 0 | 0 | 0 | 48 | |
| | Hypothetical protein | - | 15 | 0 | 0 | 0 | 0 | 7 | 0 | 16 | 27 | |
| | hypothetical protein | - | 11 | 0 | 0 | 0 | 0 | 7 | 0 | 12 | 21 | |
| | Death-on-curing family protein | - | 5 | 0 | 0 | 0 | 0 | 7 | 0 | 6 | 21 | |
| | Hypothetical protein | - | 33 | 3 | 0 | 1 | 0 | 0 | 6 | 42 | 11 | |
| | Membrane protein | - | 40 | 1 | 0 | 2 | 0 | 0 | 6 | 46 | 20 | |
| | Membrane protein | - | 38 | 1 | 0 | 2 | 0 | 0 | 6 | 44 | 20 | |