Catherine D Carrillo, Peter Kruczkiewicz, Steven Mutschall, Andrei Tudor, Clifford Clark, Eduardo N Taboada.
Abstract
Tracking of sources of sporadic cases of campylobacteriosis remains challenging, as commonly used molecular typing methods have limited ability to unambiguously link genetically related strains. Genomics has become increasingly prominent in the public health response to enteric pathogens as methods enable characterization of pathogens at an unprecedented level of resolution. However, the cost of sequencing and the expertise required for bioinformatic analyses remain prohibitive, and these comprehensive analyses are limited to a few priority strains. Although several molecular typing methods are currently widely used for epidemiological analysis of campylobacters, it is not clear how accurately these methods reflect true strain relationships. To address this, we have developed a framework and associated computational tools to rapidly analyze draft genome sequence data for the assessment of molecular typing methods against a "gold standard" based on the phylogenetic analysis of highly conserved core (HCC) genes with high sequence quality. We analyzed 104 publicly available whole genome sequences (WGS) of C. jejuni and C. coli. In addition to in silico determination of multi-locus sequence typing (MLST), flaA, and porA type, as well as comparative genomic fingerprinting (CGF) type, we inferred a "reference" phylogeny based on 389 HCC genes. Molecular typing data were compared to the reference phylogeny for concordance using the adjusted Wallace coefficient (AWC) with confidence intervals. Although MLST targets the sequence variability in core genes and CGF targets insertions/deletions of accessory genes, both methods are based on multi-locus analysis and provided better estimates of true phylogeny than methods based on single loci (porA, flaA).
A larger WGS dataset including additional genetically related strains, both epidemiologically linked and unlinked, will be necessary to assess the performance of subtyping methods for outbreak investigations and surveillance activities more comprehensively. Analyses of the strengths and weaknesses of widely used typing methodologies in inferring true strain relationships will provide guidance in the interpretation of these data for epidemiological purposes.
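The concordance measure named in the abstract can be illustrated with a short sketch. It assumes the standard definition of the adjusted Wallace coefficient: the Wallace coefficient of a typing method towards a reference partition, corrected for the agreement expected by chance. The function names and the toy labels are hypothetical and are not taken from the authors' tools, and confidence intervals are not computed here.

```python
from collections import Counter
from itertools import combinations


def wallace(typing, reference):
    """Fraction of strain pairs sharing a type that also share a reference cluster."""
    same_type = 0
    same_both = 0
    for i, j in combinations(range(len(typing)), 2):
        if typing[i] == typing[j]:
            same_type += 1
            if reference[i] == reference[j]:
                same_both += 1
    if same_type == 0:
        raise ValueError("no strain pair shares a type; Wallace is undefined")
    return same_both / same_type


def adjusted_wallace(typing, reference):
    """Wallace coefficient corrected for chance agreement (typing -> reference)."""
    if len(typing) != len(reference):
        raise ValueError("both partitions must cover the same strains")
    observed = wallace(typing, reference)
    n = len(reference)
    counts = Counter(reference)
    # Chance agreement: probability that two random strains share a reference
    # cluster, i.e. 1 minus Simpson's index of diversity of the reference.
    expected = sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    if expected == 1.0:
        raise ValueError("reference has a single cluster; adjustment is undefined")
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six strains (not data from the study).
st_types = ["ST1", "ST1", "ST2", "ST2", "ST3", "ST3"]
phylo_clusters = ["c1", "c1", "c1", "c1", "c2", "c2"]
print(adjusted_wallace(st_types, phylo_clusters))
print(adjusted_wallace(phylo_clusters, st_types))
```

The coefficient is directional: in the toy data every pair sharing a sequence type also shares a phylogenetic cluster, while the reverse does not hold, so the two calls give different values.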
Keywords: CGF; Campylobacter spp.; MLST; flaA; genome; molecular epidemiology; porA
Year: 2012 PMID: 22919648 PMCID: PMC3417556 DOI: 10.3389/fcimb.2012.00057
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 5.293
Figure 1. Finishing quality is reflected in the number of truncated predicted genes. The C. jejuni genomes used in this analysis were classified according to finishing level (reference sequence or draft) and/or sequencing platform used [Sanger (open diamonds with a central period), 454 (closed diamonds), or Illumina (open diamonds)]. The number of predicted genes with truncations relative to C. jejuni NCTC11168 was calculated for each genome. The mean number of truncations for each group is represented by a line. Closed genomes had significantly fewer truncations than RefSeq genomes that were not closed and draft genomes generated with either 454 or Illumina sequencing technologies.
Figure 2. The core genome phylogeny. The 104 C. jejuni and C. coli WGS were analyzed using an automated pipeline for core genome analysis; a core gene phylogeny derived from 389 core genes is shown here. This phylogeny was compared to the underlying (A) SNP rates and (B) accessory gene content differences. High SNP rates and accessory genome content differences between C. coli and C. jejuni genomes support a deep split between the species. Conversely, small phylogenetic clusters comprised of highly similar strains are supported by lower differences in SNPs and accessory gene content. A high resolution image of this figure is available in the supplementary material.
Figure 3. Publicly available WGS data for 104 C. jejuni (A) and C. coli (B) strains were used to derive typing profiles using an in silico typing pipeline. Although the dataset is comprised of highly genetically diverse strains, there is concordance between molecular typing and phylogenetic data: strains sharing similar/identical molecular fingerprints were found clustered in the dendrogram and increasing similarity led to shorter branch lengths. CGF cluster numbers are based on 90 and 95% fingerprint identity; green is positive, red is negative. MLST alleles that could not be determined are noted as “nd.”
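As an illustration of what clustering at 90 or 95% fingerprint identity can mean, the sketch below treats a CGF fingerprint as a list of gene presence/absence calls and scores identity as the fraction of matching positions. Grouping strains by single linkage at a threshold is an assumption made for this example only; the record above does not state which clustering algorithm was used, and all names and fingerprints here are hypothetical.

```python
def fingerprint_identity(fp_a, fp_b):
    """Fraction of marker positions with the same presence/absence call."""
    if not fp_a or len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(fp_a, fp_b))
    return matches / len(fp_a)


def single_linkage_clusters(fingerprints, threshold):
    """Map each strain to a cluster representative (assumed single linkage)."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the cluster root, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if fingerprint_identity(fingerprints[a], fingerprints[b]) >= threshold:
                parent[find(a)] = find(b)
    return {name: find(name) for name in names}


# Hypothetical 40-marker fingerprints (1 = gene present, 0 = gene absent).
strains = {
    "s1": [1] * 40,
    "s2": [1] * 38 + [0] * 2,   # differs from s1 at 2 of 40 markers
    "s3": [0] * 40,
}
print(single_linkage_clusters(strains, 0.95))
print(single_linkage_clusters(strains, 1.00))
```

With these toy fingerprints, s1 and s2 fall into one cluster at the 95% threshold and into separate clusters when 100% identity is required.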
Comparison of metrics of subtyping method performance.
| Method | Partitions | Simpson’s ID (CI) | Phylogenetic clusters: 15 SNPs per 1000 bp (CI) | Phylogenetic clusters: 10 SNPs per 1000 bp (CI) | Phylogenetic clusters: 5 SNPs per 1000 bp (CI) |
|---|---|---|---|---|---|
| CGF40 (100%) | 82 | 0.997 (0.995−0.999) | 0.813 (0.682−0.945) | 0.833 (0.715−0.951) | 0.610 (0.411−0.810) |
| CGF40 (95%) | 53 | 0.984 (0.978−0.990) | 0.654 (0.520−0.787) | 0.644 (0.527−0.761) | 0.320 (0.203−0.437) |
| ST | 77 | 0.994 (0.990−0.998) | 1.000 (1.000−1.000) | 1.000 (1.000−1.000) | 0.727 (0.583−0.872) |
| CC | 35 | 0.860 (0.797−0.922) | 0.299 (0.146−0.452) | 0.227 (0.123−0.330) | 0.071 (0.038−0.104) |
| porA | 73 | 0.993 (0.989−0.997) | 0.515 (0.318−0.712) | 0.494 (0.311−0.678) | 0.325 (0.169−0.480) |
| flaA | 68 | 0.988 (0.980−0.997) | 0.347 (0.195−0.499) | 0.270 (0.138−0.402) | 0.221 (0.100−0.342) |
| ST-porA | 89 | 0.998 (0.995−1.000) | 1.000 (1.000−1.000) | 1.000 (1.000−1.000) | 1.000 (1.000−1.000) |
Because of missing data, only 94 isolates could be included in the analysis.
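The "Partitions" and "Simpson’s ID" columns can be reproduced in principle from a list of type assignments. The sketch below assumes the usual definition of Simpson's index of diversity, D = 1 − Σ n_j(n_j − 1) / (N(N − 1)), where n_j is the number of strains of type j and N is the total number of strains. The function names and labels are hypothetical, and the confidence intervals reported in the table are not computed.

```python
from collections import Counter


def count_partitions(labels):
    """Number of distinct types assigned by a typing method."""
    return len(set(labels))


def simpsons_id(labels):
    """Probability that two strains drawn without replacement differ in type."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two strains are required")
    counts = Counter(labels)
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical type assignments for four strains (not data from the study).
types = ["ST21", "ST21", "ST45", "ST48"]
print(count_partitions(types), simpsons_id(types))
```

A value close to 1 means that almost every pair of strains is separated by the method, which is why the index is read together with the concordance columns and not on its own.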
Figure 4. Genotypic clusters are comprised of strains with high genomic similarity. Genotypic clusters obtained using five different molecular typing methods were analyzed for (A) average SNP rates or (B) accessory gene content differences in order to assess intra- and inter-cluster core and accessory genetic similarities, respectively. Although all methods generated genotypic clusters sharing higher core and accessory genetic similarities than expected by chance alone, multi-locus methods led to genotypic clusters with higher genetic similarity.