Literature DB >> 28348813

Redefining the differences in gene content between Yersinia pestis and Yersinia pseudotuberculosis using large-scale comparative genomics.

Katy J Califf1, Paul S Keim1,2, David M Wagner1, Jason W Sahl2,1.   

Abstract

Yersinia pestis, the causative agent of plague, is best known for historical pandemics, but still actively causes disease in many parts of the world. Y. pestis is a recently derived clone of the pathogenic species Yersinia pseudotuberculosis, but is more associated with human infection. Numerous studies have documented genomic changes since the two species differentiated, although all of these studies used a relatively small sample set for defining these differences. In this study, we compared the complete genomic content between a diverse set of Y. pestis and Y. pseudotuberculosis genomes, and identified unique loci that could serve as diagnostic markers or for better understanding the evolution and pathogenesis of each group. Comparative genomics analyses also identified subtle variations in gene content between individual monophyletic clades within these species, based on a core genome single nucleotide polymorphism phylogeny that would have been undetected in a less comprehensive genome dataset. We also screened loci that were identified in other published studies as unique to either species and generally found a non-uniform distribution, suggesting that the assignment of these unique genes to either species should be re-evaluated in the context of current sequencing efforts. Overall, this study provides a high-resolution view into the genomic differences between Y. pestis and Y. pseudotuberculosis, demonstrating fine-scale differentiation and unique gene composition in both species.

Entities:  

Keywords:  Yersinia pestis; comparative genomics; evolution; genomics

Year:  2015        PMID: 28348813      PMCID: PMC5320571          DOI: 10.1099/mgen.0.000028

Source DB:  PubMed          Journal:  Microb Genom        ISSN: 2057-5858


Data Summary

1. De novo sequencing data of 118 Yersinia pestis strains are deposited in the Sequence Read Archive: http://www.ncbi.nlm.nih.gov/Traces/sra/?study = SRP003808

Impact Statement

This study employed large-scale comparative genomics to understand the genomic differences between Yersinia pseudotuberculosis and Yersinia pestis, a pathogen responsible for global human pandemics. We identified that many results from previous comparative studies were not reproduced when using a larger sample set. These results change the understanding of the evolution of Y. pestis, specifically the timing for when genomic components were gained and lost since the most recent common ancestor between the two species. Additionally, the identification of coding regions conserved in Y. pestis and missing from Y. pseudotuberculosis provides targets for functional studies to better understand the pathogenicity and spread of Y. pestis. Whilst the results of this study will primarily appeal to researchers who study Yersinia spp., the methods employed provide a framework to better understand the evolution of the pan-genome in any bacterial species.

Introduction

Yersinia pestis, the causative agent of plague, is a recently evolved clone of Yersinia pseudotuberculosis serotype O : 1b (Achtman ). Y. pestis is best known for causing three pandemics, including the black death (Achtman ), the plague of Justinian (Wagner ) and the third pandemic, where Y. pestis was spread worldwide (Cui ). Y. pestis still actively causes disease in many parts of the world, including North America (Wagner ) and Madagascar (Ratovonjato ). Disease caused by Y. pestis is generally treatable with antimicrobials (Bonacorsi ), although acquired resistance has been observed in isolated cases in Madagascar (Galimand ); details of the plasmid that confers the multidrug-resistant phenotype have been published (Welch ). A lot of work has focused on genomic and phenotypic differences between Y. pestis and Y. pseudotuberculosis. One of the most striking differences is the presence in most Y. pestis of two plasmids, pMT1 and pCP1, that are absent from all Y. pseudotuberculosis (Hu ). Y. pseudotuberculosis is still pathogenic (Rosqvist ), and Y. pestis and Y. pseudotuberculosis share genes associated with pathogenesis, including the pYV plasmid (also known as pCD1) and the ail locus (Reuter ), which is associated with attachment and invasion. A filamentous phage, YpfΦ, has been characterized in some Y. pestis, but is absent from all known Y. pseudotuberculosis genomes (Derbise & Carniel, 2014); this phage has been associated with dissemination in mice (Derbise ). Additionally, mutations have been described in Y. pestis compared with Y. pseudotuberculosis that have led to flea-borne transmission of Y. pestis (Sun ) due to enhanced biofilm formation. In this study, we compared the entire pan-genomes of both species with the large-scale blast score ratio (LS-BSR) pipeline (Sahl ) to better understand the distribution and conservation of identified coding regions. Smaller-scale comparisons between Y. pestis and Y. pseudotuberculosis have been published (Achtman ; Chain ; Duan ; Pouillot ), but LS-BSR on a large set of genomes enabled us to rapidly compare the genomic content between species. These comparisons enabled us to identify fine-scale resolution in the global propagation of the Y. pestis clone. We also screened previously published genes against this comprehensive dataset to determine how well these previous studies captured the gene distribution across Y. pestis and Y. pseudotuberculosis.

Methods

Genomes analysed

Except where noted otherwise, our comparisons included 133 Y. pestis genomes and 13 Y. pseudotuberculosis genomes, as well as five plasmids (Table S1, available in the online Supplementary Material). To improve the assembly contiguity for 118 genomes recently published (Cui ) and assembled with Velvet (Zerbino & Birney, 2008), raw reads were downloaded and assembled with SPAdes version 3.1.0 (Bankevich ). This method improved the contiguity of all but one assembly and increased the assembly size (Table S2). The SPAdes assemblies are publicly available and named by the Sequence Read Archive accession number (https://github.com/jasonsahl/YP_genomes.git).

Core genome single nucleotide polymorphism (SNP) phylogeny

Genome assemblies from external genomes were aligned against the finished genome of Colorado 92 (CO92) (GenBank accession number NC_003143) with NUCmer (Delcher ). The reference was also aligned against itself to identify duplicated regions, which were then removed for SNP comparisons. SNPs were identified from genome assemblies by a direct mapping of each query base to the corresponding reference base. Raw reads from a previously published study (Cui ) were mapped to CO92 with bwa-mem (Li, 2013) and SNPs were called with the UnifiedGenotyper method in the Genome Analysis Toolkit (DePristo ; McKenna ); any SNP with a coverage of < 10 or an allele proportion < 90 % were filtered from downstream analyses. These methods were wrapped by the Northern Arizona SNP Pipeline (NASP) (http://tgennorth.github.io/NASP/) (Engelthaler ). A phylogeny was inferred on the concatenated SNP alignment using RAxML version 8 (Stamatakis, 2014) and the ASC_GTRGAMMA model (Lewis correction) that incorporates an ascertainment bias correction. The retention index (RI) was calculated with phangorn (Schliep, 2011).

Comparative genomics

All coding regions were predicted from all genome assemblies with Prodigal (Hyatt ). These coding regions were concatenated, then clustered and de-replicated using usearch (Edgar, 2010) at a pairwise identity of 0.9. Each cluster representative, or the cluster centroid, was then aligned against each genome using either tblastn (Altschul ) or blat (Kent, 2002) in conjunction with LS-BSR (Sahl ); the different methods were used to examine the conservation of either nucleotide or peptide sequences. LS-BSR values can range from 0 (no alignment) to 1 (100 % identity). To identify correlations between coding region conservation and metadata (e.g. isolation location), LS-BSR values were first multiplied by 100 in order to convert all float values to integers and these adjusted BSR (Rasko ) values were then correlated with categorical data using a Kruskal–Wallis test implemented in qiime version 1.9.0 (Caporaso ), which also incorporated false detection rate (FDR) adjusted P values. Categories tested included: branch location within the Y. pestis phylogeny (0–4), subclade in the Y. pestis phylogeny (e.g. 0.PE4Ba), country of strain origin, latitude of strain origin, number of nodes to the most recent common ancestor (MRCA; Y. pseudotuberculosis) and number of SNPs to the MRCA. Coding regions that showed positive correlations (FDR P < 0.05) were aligned against the GenBank (Benson ) nr database with tblastn (Altschul ) for annotation. Pan-genome stats were calculated with the pan_genome_stats.py script in the LS-BSR repository (https://github.com/jasonsahl/LS-BSR/tree/master/tools).

Distribution of annotated genes

From reviewing the relevant literature, we identified a panel of genes associated with Y. pestis virulence (Chain ; Derbise ; Lillard ; Pouillot ; Radnedge ; Reuter ; Schubert ; Sodeinde ; Sun ) (Table S3), and screened these for conservation across 133 contemporary genomes of Y. pestis and 13 Y. pseudotuberculosis genomes using LS-BSR in conjunction with tblastn. In general, a gene was considered conserved if it had BSR>0.8, which is ∼80 % peptide identity over 100 % of the peptide length (Rasko ). The conservation of genes was visualized as a heatmap correlated with a phylogeny using the Interactive Tree Of Life (iTOL) (Letunic & Bork, 2007).

Read mapping to confirm gene presence/absence

To confirm the presence or absence of specific coding regions, raw sequence reads were mapped to the reference sequence with bwa-mem. The coverage across each base was calculated with the GenomeCoverageBed method in BEDTools (Quinlan & Hall, 2010). Presence was dependent on the breadth of coverage across the reference at a minimum depth of coverage (DOC) (default value of × 3 was used); a script to wrap these methods is available (https://gist.github.com/3206314afe510b9e2cbb.git). Differences in mean DOC were visualized with Circos (Krzywinski ).

Multiple sequence alignment analysis

For cases where individual mutations were investigated, the gene sequence was pulled out of all genome assemblies directly from blastn alignments (Altschul ), aligned with muscle (Edgar, 2004) and visualized with Jalview (Waterhouse ).

Results

Core genome phylogeny

A core genome SNP phylogeny was inferred to demonstrate the relationship between Y. pestis genomes screened in the current study. The resulting phylogeny (Fig. 1) was similar in clade membership to a previous study that used the same set of genomes (Cui ). The RI (Farris, 1989) of the concatenated SNP alignment was 0.99, indicating very little homoplasy in the underlying data. A phylogeny was also inferred from an alignment including 13 Y. pseudotuberculosis genomes as well as the 133 Y. pestis genomes, rooted with Yersinia enterolitica 8081 (GenBank accession number NC_008800) (Fig. S1); for visualization, SNPs specific to Y. enterolitica were removed from the alignment. The phylogeny demonstrated the increased phylogenetic diversity in Y. pseudotuberculosis relative to Y. pestis. The RI of this alignment was 0.85, demonstrating significant homoplasy, likely resulting from recombination.
Fig. 1.

Loss of genes across Branch 2 genomes across the Y. pestis phylogeny. Differentially conserved genes were identified through associations of metadata and the LS-BSR (Sahl ) matrix using qiime (Caporaso ). The phylogeny was inferred with RAxML (Stamatakis, 2014) on a concatenation of SNPs identified with NASP (Engelthaler ). The phylogeny was associated with the LS-BSR values using iTOL (Letunic & Bork, 2007).

Loss of genes across Branch 2 genomes across the Y. pestis phylogeny. Differentially conserved genes were identified through associations of metadata and the LS-BSR (Sahl ) matrix using qiime (Caporaso ). The phylogeny was inferred with RAxML (Stamatakis, 2014) on a concatenation of SNPs identified with NASP (Engelthaler ). The phylogeny was associated with the LS-BSR values using iTOL (Letunic & Bork, 2007).

In silico screen of the Y. pseudotuberculosis pan-genome across 133 Y. pestis genomes

To understand the pan-genome differences between Y. pseudotuberculosis and Y. pestis, the pan-genome was identified with LS-BSR for all Y. pseudotuberculosis genomes (n = 13), resulting in 5469 unique coding regions (Fig. 2a). Of these regions, seven were found to be highly conserved (BSR>0.8) in all Y. pseudotuberculosis genomes and missing (BSR < 0.4) from all Y. pestis genomes (Table 1); this demonstrated that most of the Y. pseudotuberculosis core genome was also conserved in at least one Y. pestis genome. Y. pestis has been associated with genome reduction (Chain ) based on host adaptation, but this small number of regions represents either the limited number of ancestral genes that are no longer required by Y. pestis, genes that are no longer needed but have not been purged or the relatively short time since the split between Y. pestis and Y. pseudotuberculosis.
Fig. 2.

Pan-genome plots demonstrating the acquisition of genes with additional genomic sequencing in (a) Y. pseudotuberculosis and (b) Y. pestis. For each analysis, a number of randomly selected genomes were parsed at different genome depths from the LS-BSR (Sahl ) matrix, and the number of coding regions with BSR>0.8 in all genomes was identified and plotted.

Table 1

Annotation of coding regions unique to either Y. pestis or Y. pseudotuberculosis

AnnotationGenBank accession no.Group
Hypothetical proteinWP_038824878Y. pseudotuberculosis
Uracil transporterWP_038824557Y. pseudotuberculosis
Hypothetical proteinWP_011192493Y. pseudotuberculosis
Aldehyde dehydrogenaseWP_011192485Y. pseudotuberculosis
TonB-dependent vitamin B12 receptorACA70332Y. pseudotuberculosis
Hypothetical proteinWP_038824929Y. pseudotuberculosis
Hypothetical proteinABS48865Y. pseudotuberculosis
Hypothetical proteinYPO0396Y. pestis
IntegraseYOP2083Y. pestis
Hypothetical proteinYPO0387Y. pestis
Hypothetical proteinYPO0388Y. pestis
Hypothetical proteinYPO3437Y. pestis
XRE family transcriptional regulatorWP_002214362Y. pestis
Transcriptional regulatorYPO4031Y. pestis
TransposaseWP_002214360Y. pestis
Outer membrane receptorYPO3910Y. pestis
Hypothetical proteinYPO3948Y. pestis
Hypothetical proteinYPO0394aY. pestis
Hypothetical proteinYPO0394Y. pestis
IntegraseYPO4033Y. pestis
PseudogeneYPO4029Y. pestis
Hypothetical proteinYPO0397Y. pestis
Pan-genome plots demonstrating the acquisition of genes with additional genomic sequencing in (a) Y. pseudotuberculosis and (b) Y. pestis. For each analysis, a number of randomly selected genomes were parsed at different genome depths from the LS-BSR (Sahl ) matrix, and the number of coding regions with BSR>0.8 in all genomes was identified and plotted. The pan-genome of Y. pseudotuberculosis was screened across 133 Y. pestis genomes (Table S1) using the LS-BSR pipeline. Of all coding regions screened, 21 were found to have significantly lower BSR values (fdr adjusted P < 0.05) among Branch 2 genomes in the Y. pestis phylogeny when compared with other branches (Fig. 1), indicating loss of these coding regions since evolving from Y. pseudotuberculosis. Annotation of these regions using the CO92 annotation demonstrated that 19 out of 21 were known products, many involved with flagellar composition and function (Table 2); many of these coding regions have also been lost in Pestoides F (Fig. 1). One coding region was conserved in the 2.ANT lineage, but was absent from the 2.MED lineage (Fig. 1). Annotation of this sequence is associated with a toxin described in Y. pseudotuberculosis (locus tag WP_011192526); similar types of genes have recently been described as atypical toxins (Koskiniemi ). The 2.MED lineage contains genomes collected over a long time frame (1958–2006), with almost all of them isolated from China, suggesting local adaptation, niche specialization or a lack of sampling from other parts of the world.
Table 2

Annotation of regions lost by Branch 2 genomes

LocusAnnotationLocus tag
fliDFlagellar hook-associated proteinYPO0740
fliSFlagellar proteinYPO0741
YPO0742Hypothetical proteinYPO0742
YPO0744Flagellar biogenesis proteinYPO0744
fliAFlagellar biosynthesis sigma factorYPO0745
motAFlagellar motor protein MotAYPO0746
motBHypothetical proteinYPO0747
YPO0749Hypothetical proteinYPO0749
YPO0750Hypothetical proteinYPO0750
YPO0751Hypothetical proteinYPO0751
YPO0754Hypothetical proteinYPO0754
YPO1380MFS family transporter proteinYPO1380
YPO2315Hypothetical proteinYPO2315
PcpLipoproteinYPO2373
YPO2375Aldo/keto reductaseYPO2375
YPO2376Hypothetical proteinYPO2376
sepCInsecticial toxinYPO2380
YPO2493Dioxygenase subunit alphaYPO2493

In silico screen of the Y. pestis pan-genome across Y. pseudotuberculosis genomes

Based on default values in LS-BSR, the Y. pestis pan-genome was found to consist of 5227 unique coding regions (Fig. 2b). A screen of these coding regions against all Y. pseudotuberculosis genomes demonstrated that there were 15 coding regions present in all Y. pestis genomes (BSR>0.8) and missing from all Y. pseudotuberculosis genomes (BSR < 0.4) (Table 1); none of these regions was associated with any of the known Y. pestis plasmids. In the pan-genome, there were 725 coding regions that were conserved in one or more Y. pestis genome and missing from all Y. pseudotuberculosis genomes. Of the total Y. pestis-specific coding regions, 156 of these coding regions were associated with Y. pestis plasmids (BSR>0.8). Of the remaining coding regions, 294 were only conserved in a single genome, suggesting possible contamination, sequencing artefacts or horizontally acquired genomic regions. For example, 228 of these unique regions were identified in the draft genome assembly of Orientalis IP275 and are most likely associated with a plasmid that confers resistance to multiple antimicrobials (Welch ). The remaining Y. pestis-specific coding regions (n = 275) were conserved in two or more genomes (Table S4). Chain identified a total of 112 Y. pestis-specific genes and screened them across a panel of 19 Y. pestis strains. They found 32 of these regions to be conserved in all of their Y. pestis strains, but in none of the nine screened Y. pseudotuberculosis strains. An in silico screen of these regions (Table S3) against our collection of genomes demonstrated that only five genes were completely conserved across the Y. pestis phylogeny (BSR>0.8 in 133 genomes) (Table S5). One of the regions (WP_002213869) was conserved (BSR = 1) in two Y. pseudotuberculosis genomes (B-7194, CBKS0000000; B-7195, CBKR00000000) in our collection.

In silico screen of previously annotated genes

LS-BSR analysis demonstrated that the 11 genes from a previously described Y. pestis filamentous phage (Derbise ) associated with mouse dissemination were conserved in all genomes within the 1.ORI group (Fig. 3); this region was also identified in one genome (YN472) from the 1.IN3 group and a portion of the phage was identified in a genome (H1958004) from the 2.MED group (Fig. 3). A read mapping analysis confirmed the presence of portions of this phage in these two genomes, although the DOC was significantly lower compared with the coverage in 1.ORI genomes. For example, in genome fig1946001 from the 1.ORI group, the mean DOC across the genome was × 58 and the coverage across the phage was × 243. In contrast, the DOC across the genome of YN472 from the 1.IN group was × 108 and only × 5 across the phage. This lower level of coverage could be due to copy number variation or cross-sample contamination during sequencing and would need to be confirmed with targeted assays. If confirmed, the presence of this phage in the 1.IN3 group would suggest that this lineage gave rise to 1.ORI, as has been proposed previously (Cui ). All coding regions from the filamentous phage were missing (BSR < 0.4) from all Y. pseudotuberculosis genomes. Li identified three of these phage ORFs in a strain that was included in our panel (A1956001), but our analysis identified these ORFs as missing (BSR = 0.0) in the genome assembly. We validated the absence of these ORFs in this assembly with read mapping.
Fig. 3.

Variable distribution of previously characterized genes across the Y. pestis phylogeny (Table S3). The phylogeny was inferred with RAxML (Stamatakis, 2014) on a concatenation of SNPs identified with NASP (Engelthaler ). The phylogeny was associated with the LS-BSR values using iTOL (Letunic & Bork, 2007).

Variable distribution of previously characterized genes across the Y. pestis phylogeny (Table S3). The phylogeny was inferred with RAxML (Stamatakis, 2014) on a concatenation of SNPs identified with NASP (Engelthaler ). The phylogeny was associated with the LS-BSR values using iTOL (Letunic & Bork, 2007). From a pathogenesis perspective, one of the genomic regions associated with virulence is the HPI (high pathogenicity island), a component of the pgm locus (Schubert ). Prodigal was used to predict coding regions from this region (AL031866) and all coding regions (n = 93) were screened against all genomes with LS-BSR. The results demonstrate that a large section (∼65 coding regions) of the pgm locus was consistently missing from multiple (∼30) Y. pestis genomes (Fig. S2); the pgm was largely conserved across screened Y. pseudotuberculosis genomes. A subset of the pgm locus, the hmsHFRS operon, was reported to be necessary for the synthesis of Y. pestis biofilm in fleas (Sun ) and therefore flea-borne transmission of the pathogen. Sun further stated that both Y. pestis and Y. pseudotuberculosis contained a fully functional hmsHFRS operon. However, our LS-BSR analysis of the four hmsHFRS loci (hmsH, hmsF, hmsR and hmsS; GenBank accession number U22837) (Lillard ) demonstrated that these genes were conserved in only a portion (106/133) of our Y. pestis panel. In contrast, these four loci were present in all of our screened Y. pseudotuberculosis genomes. Reuter reported that the chromosomal type III secretion system (abbreviated Ygt for the Yersinia genus type III secretion system) is present in all Yersinia species, but in the process of being lost from the pathogenic lineages (Reuter ). They screened 32 regions (Table S3) of Ygt across three and 31 Y. pestis and Y. pseudotuberculosis strains, respectively, and found them to be variably distributed across these two species (range 0–84.68 %; mean ± sd 45.68 ± 32.45 %). In our global LS-BSR analysis of these 32 regions in our panel of genomes, we found 30 of 32 regions to be missing (BSR < 0.1) from all screened Y. pestis and Y. pseudotuberculosis genomes. The remaining two regions (strain YE21202, coordinates 3177984:3178778 and 3178768:3179844) were not highly conserved across the majority of genomes in both species (BSR < 0.5). These two regions were categorized as putative type III secretion proteins. The pla gene (locus tag YPPCP1.07) on the pPCP1 plasmid encodes a cell surface protease/plasminogen activator that is essential for invasiveness at the flea-bite site following Y. pestis transmission (Sodeinde ; Sun ). The LS-BSR analysis demonstrated the conservation (BSR = 1) of this gene in most contemporary Y. pestis strains (n = 125/133) (Fig. 3); the absence of this locus in some genomes may represent the loss of the pPCP1 plasmid due to laboratory passage (Perry & Fetherston, 1997). The pla gene was missing from all Y. pseudotuberculosis genomes surveyed (BSR = 0). Pouillot screened 47 Y. pestis and 31 Y. pseudotuberculosis strains for five regions and four ORFs (Table S3) that were specific to Y. pseudotuberculosis. They concluded that despite their loss in Y. pestis, five out of nine of these (ORF2, ORF3, ORF4, R1, R3) were important to survival, growth or virulence in Y. pseudotuberculosis. An in silico screen of these five loci demonstrated that R1 was conserved in Y. pestis Branches 0.PE2, 0.PE3 and 0.PE4A only, suggesting independent loss in both 0.PE7 and all other branches of Y. pestis (Fig. 3). ORF2 was present in Branch 0.PE7 only (Fig. 3), suggesting that the deletion of this region happened after the split between Y. pseudotuberculosis and Y. pestis. In Y. pseudotuberculosis, ORF2 and R1 were highly conserved (BSR>0.98) across all sequenced genomes, but ORF3 and ORF4 were only conserved (BSR>0.8) in a portion of Y. pseudotuberculosis genomes (ORF3 in four of 13; ORF4 in seven of 13). Chain reported the loss of a lipid A acyltransferase gene htrB (locus tag YPTB2490) in all Y. pestis since its divergence from Y. pseudotuberculosis. However, our LS-BSR analysis demonstrated that this gene was conserved in the two most ancestral contemporary strains of Y. pestis on Branch 0.PE7 (620024 and CMCC05009) (Fig. 3). This gene was also highly conserved in all Y. pseudotuberculosis genomes (BSR ≥ 0.99). Chain also reported nine coding regions (YPTB3450–YPTB3459) conserved in Y. pseudotuberculosis, representing a region that encodes several haemolysin genes absent from Y. pestis. Our in silico screen of these nine regions in our panel of Y. pestis assemblies demonstrated variation in the conservation of these regions across assemblies in both species (Table S6). Five of these regions were missing across all screened Y. pestis genomes (BSR = 0) and the remaining four demonstrate low homology across Y. pestis genomes (BSR = 0–0.86). In our screened Y. pseudotuberculosis genomes, all of these regions were completely conserved (BSR = 1) in only two of 13 genomes. Chain also reported frameshift mutations in two virulence factor genes characterized in Y. pestis, srfA and srfB (YPTB2212 and YPTB2213, respectively, both discontinued in the National Center for Biotechnology Information database) that are conserved in Y. pseudotuberculosis. The authors speculated that if these mutations influence protein function, these loci might be involved in species-specific virulence. Our in silico screen of these regions demonstrated that both loci are conserved in all Y. pestis genomes, although small deletions observed in the multiple sequence alignment could result in frameshift mutations; these deletions were observed in only a portion of Y. pseudotuberculosis genomes (Fig. S3). In Y. pseudotuberculosis, srfA was conserved in all screened genomes (mean BSR = 0.98), but srfB was not conserved (BSR < 0.51) in six out of 13 genomes. In their 2014 paper, Sun detailed three loss-of-function mutations that increased the transmissibility of Y. pestis via flea bite. We screened both the Y. pestis Kim10 and Y. pseudotuberculosis IP32953 homologues for PDE2, PDE3 and rcsA (Table S3). Whilst Sun reported that PDE3 was conserved in all Y. pestis genomes except for Angola and Pestoides F strains, we found that PDE3 was also missing from genome Antigua UG05-0454, Branch 0.PE2 and Branch 0.ANT3 (Fig. S4); we also found that PDE3 was missing (BSR < 0.2) from six of 13 Y. pseudotuberculosis genomes. Sun also reported a 30 bp tandem duplication in rcsA that was present in all Y. pestis except for Branch 0 strain Pestoides A. We screened the Y. pestis version of this gene (YP02449) in a multiple sequence alignment and found the duplication to be missing in additional genomes across the Y. pestis phylogeny (C1975003, M0000002, SHAN11, YN1683, Antigua, Antigua UG05, CMCC96007, CMCC11001, CMCC03001) (Fig. S5); the tandem duplication was confirmed to be missing from all Y. pseudotuberculosis genomes. Whilst PDE2 was highly conserved across all Y. pestis genomes, it was missing from one Y. pseudotuberculosis genome (B-6863), although this may be have been due to an assembly error. The reported frameshift was found in all Y. pestis genomes and none of the Y. pseudotuberculosis genomes. Difference region (DFR) 4 is a 15 kb genomic island that is known to be lost in some Y. pestis strains (i.e. CO92) and contains genes that may play a role in virulence. Radnedge investigated the presence of DFR4 in 78 Y. pestis and four Y. pseudotuberculosis strains, and found it to be conserved in only 13 of the Y. pestis strains but in all four Y. pseudotuberculosis strains. LS-BSR of 16 coding regions of DFR4 demonstrated them all to be missing from 48 genomes, including the entire 0.PE2, 0.PE3, 0.PE4A, 1.IN3, 1.ORI and 2.MED1 branches. One coding region (AAL27378) was not highly conserved in all 133 screened Y. pestis genomes (BSR < 0.7) and one (AAL27386) was conserved in 27 (33 %) of the genomes; the remaining 14 coding regions were found in 85 (64 %) Y. pestis genomes. Fourteen out of 16 coding regions were conserved in all screened Y. pseudotuberculosis genomes, whilst the other two coding regions were variably distributed. Our LS-BSR analysis of the remaining DFRs (DFR1–3, 5 and 6) described by Radnedge demonstrated that DFR5 was completely lost in all Y. pestis genomes (Table 3). In DFR2, three out of four coding regions had been lost across all Y. pestis genomes. For the remaining DFRs, two out of three regions of each DFR were demonstrated to be lost across all Y. pestis genomes, but the third was found to be conserved in a portion of Y. pestis genomes: 65 % in DFR1 (AF333796), 87 % in DFR3 (AF333802), 80 % in DFR6 (AF333812) and 99 % in DFR2 (AF333801). In Y. pseudotuberculosis, DFR5 was missing from all screened genomes and aff333802 of DFR3 was conserved in all Y. pseudotuberculosis genomes. However, similar to what was found in Y. pestis, three out of four regions in DFR2 were missing across all Y. pseudotuberculosis genomes and two out of three regions of the remaining DFRs were missing across all Y. pseudotuberculosis genomes. The remaining one region within each DFR was present in 46 % in DFR1 (AF333796), 54 % in DFR2 (AF333801) and 92 % in DFR6 (AF333812) across all Y. pseudotuberculosis genomes.
Table 3

Comparison of screening results between previous studies and the current study

Gene/regionLocus tag(s)ReferencePrevious results (%)Results from current study (%)
Y. pestisY. pseudotuberculosisY. pestis* (n = 133)Y. pseudotuberculosis* (n = 13)
Filamentous phageYPO2271–YPO2280Derbise et al. (2007)nana100 (1.ORI)0
Y. pestis specific genesSee Table S3Chain et al. (2004)100 (n = 19)0 (n = 9)18–1002–15
YgtSee Table S3Reuter et al. (2014)nana00
hmsHFRSU22837Sun et al. (2014)10010080100
PlaYPPCP1.07nanana940
Y. pseudotuberculosisSpecific genesSee Table S3Pouillot et al. (2008)0 (n = 47)100 (n = 31)0–140–100
htrBYPTB2490Chain et al. (2004)0 (n = 19)100 (n = 9)1.50100
HaemolysinYPTB3450–YPTB3459Chain et al. (2004)0 (n = 19)100 (n = 9)0–923–46
DFR4AF426171Radnedge et al. (2002)17 (n = 78)100 (n = 4)0–8538–100
DFR1See Table S3Radnedge et al. (2002)79 (n = 78)100 (n = 4)2214
DFR2AF333798–AF333801Radnedge et al. (2002)97 (n = 78)0 (n = 4)2516
DFR3AF333802–AF333804Radnedge et al. (2002)72 (n = 78)100 (n = 4)2832
DFR5AF333808–AF333810Radnedge et al. (2002)68 (n = 78)0 (n = 4)00
DFR6AF333811–AF333813Radnedge et al. (2002)90 (n = 78)100 (n = 4)2630
PDE2 frameshiftYPTB1308Sun et al. (2014)100 (n = 29)01000
PDE3 presenceYPTB3308Sun et al. (2014)93 (n = 29)na9354
rcsA duplicationYPTB2486Sun et al. (2014)97 (n = 29)0930
ail attachment locusYE1820Reuter et al. (2014)70–90 (n = 3)70–90 (n = 31)00
YmtY1069Sun et al. (2014)nana0–1000
srfA frameshiftYPTB2212Chain et al. (2004)nana100100
srfB frameshiftYPTB2213Chain et al. (2004)nana10054

na, Results not reported.

* Presence based on BSR>0.8.

na, Results not reported. * Presence based on BSR>0.8. Although the focus of this work was on contemporary strains, ancient genomes are now available from the first and second pandemics (Table S1); the genome sequences from the second pandemic were generated with a capture array using CO92 as the reference, which may generate false negatives for genes not present in this genome. Reads from ancient genomes were mapped across the genes screened in this study and variable distribution was observed (Table S7). With ancient genomes, the low and variable coverage prohibits any firm conclusions about the presence or absence of screened genes.

Distribution of plasmids across Y. pestis

One of the defining features of Y. pestis is the acquisition of multiple plasmids that are absent from Y. pseudotuberculosis. We screened each coding region for each plasmid (Table S1) against 133 Y. pestis genomes to look at their distribution; by only considering coding regions, we could make no comment about the synteny of plasmid structure between different isolates. The results demonstrated that pPCP1 and pCD1 are broadly conserved across Y. pestis genomes (Table S8), although they have been intermittently lost in select genomes, likely due to laboratory passage. The pMT plasmid was also intermittently conserved, although a large region (∼16.7 kb) of the plasmid appears to be missing (Supplementary Data File 1) in a monophyletic clade within the 1.IN2 group (1.IN2h–1IN2p); annotation of missing coding regions was associated with phage-related genes. Deleted regions can easily be visualized by mapping reads across the plasmid between a genome with the region (Y. pestis C1975003) and a closely related genome missing the region (Y. pestis 5) (Fig. 4).
Fig. 4.

Variable distribution of raw sequence reads from two 1.IN genomes across the pMT plasmid. Reads were mapped to the plasmid with bwa-mem (Li, 2013) and the mean DOC was calculated with the GenomeCoverageBed method in BEDTools (Quinlan & Hall, 2010). The differences in DOC were visualized with Circos (Krzywinski ).

Variable distribution of raw sequence reads from two 1.IN genomes across the pMT plasmid. Reads were mapped to the plasmid with bwa-mem (Li, 2013) and the mean DOC was calculated with the GenomeCoverageBed method in BEDTools (Quinlan & Hall, 2010). The differences in DOC were visualized with Circos (Krzywinski ).

Discussion

Y. pestis is the causative agent of plague and a recently derived clone of Y. pseudotuberculosis. The evolution of Y. pestis has been associated with the acquisition of virulence plasmids, mutations that allow for transmission via flea bites and the loss of genes due to host adaptation. In this study, we used publicly available genomes to comprehensively characterize the gain, loss and variable composition in the Y. pestis pan-genome following its separation from the common ancestor with Y. pseudotuberculosis. The evolution of Y. pestis has been described as largely clonal (Achtman, 2008), likely due to the specialized habitat of Y. pestis. The core genome SNP phylogeny of Y. pestis confirmed this observation, with very little homoplasy detected. However, an analysis of the pan-genome demonstrated that the gene content can be widely variable and does not necessarily resemble the evolution of other clonal pathogens, such as Bacillus anthracis (Van Ert ). Therefore, the evolution of Y. pestis can be thought of as a clonal expansion with regard to the core genome, with a pan-genome characterized by differential gene loss and acquisition of mobile genetic elements including plasmids and bacteriophages. A comparison of the pan-genomes between Y. pestis and Y. pseudotuberculosis demonstrated that the Y. pseudotuberculosis pan-genome is larger (Fig. 2), suggesting either that coding regions have been lost in Y. pestis due to adaptation or the genomic diversity of Y. pseudotuberculosis is much greater. Comparative genomics identified coding regions both lost by all Y. pestis compared with Y. pseudotuberculosis, as well as coding regions unique to Y. pestis. Whilst some of these regions were associated with acquired plasmids, others were associated with the chromosome, suggesting acquisition after the differentiation from Y. pseudotuberculosis. These unique regions could serve as diagnostic markers for either field or clinical differentiation between the two species. These species-specific markers did not overlap well with previously published markers, suggesting that our larger sample set provides greater resolution into the dynamics of pan-genome structures. Annotation of Y. pestis specific coding regions was largely associated with hypothetical proteins (Table 2), which presents targets for functional studies investigating the pathogenesis, evolution and spread of Y. pestis. Other studies have investigated the genomic differences between Y. pestis and Y. pseudotuberculosis based on smaller-scale analyses (Achtman ; Chain ; Duan ; Pouillot ). Our results demonstrate that several genomic features reported to be missing from Y. pestis, such as htrB, are conserved in a subset of Y. pestis genomes (Fig. 3). Other genomic features, such as coding regions of the pMT plasmid, are variably conserved (Fig. 4), but this result was concordant with the core genome SNP phylogeny. These observations would not have been possible without analysing a comprehensive panel of Y. pestis genomes. Additional sequencing of Y. pestis isolates from under-sampled geographical regions may further refine gene conservation and distribution in Y. pestis. In addition to screening previously characterized genes, we also performed a de novo analysis to identify genes that were associated with different categorical data. The most associated pattern of gene conservation was in Branch 2 genomes, where several genes have been lost compared to other contemporary Y. pestis isolates. Many of these genes were associated with the flagellum, which suggests that these genes are being lost because Y. pestis does not appear to be motile (Brubaker, 1991). Continued surveillance of this pathogen will determine if these genes will be lost by additional lineages in the future. A recent study characterized three mutations in Y. pestis compared with Y. pseudotuberculosis that result in enhanced biofilm formation and transmissibility (Sun ). However, our analysis demonstrated variation in the presence of these mutations in the Y. pestis panel screened, suggesting that the lack of these mutations does not completely shut down the transmissibility of these isolates, other mechanisms are also responsible for biofilm formation or these strains are still transmitted by fleas, but just less efficiently. These questions will need to be addressed with additional experimentation. Overall, this study highlights the necessity of sequencing and analysing a large number of genomes from a given species to understand the distribution of genes, especially in relation to close near-neighbours of a given species. These types of comparative studies can focus functional studies that aim at understanding the global evolution of a given species. Although this study is focused on Y. pestis, the methods can be applied to any organism to characterize the gene flow, including gain and loss, across a comprehensive set of sequenced genomes.
  52 in total

1.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

Review 2.  Resistance of Yersinia pestis to antimicrobial agents.

Authors:  Marc Galimand; Elisabeth Carniel; Patrice Courvalin
Journal:  Antimicrob Agents Chemother       Date:  2006-10       Impact factor: 5.191

Review 3.  Evolution, population structure, and phylogeography of genetically monomorphic bacterial pathogens.

Authors:  Mark Achtman
Journal:  Annu Rev Microbiol       Date:  2008       Impact factor: 15.500

4.  Historical variations in mutation rate in an epidemic pathogen, Yersinia pestis.

Authors:  Yujun Cui; Chang Yu; Yanfeng Yan; Dongfang Li; Yanjun Li; Thibaut Jombart; Lucy A Weinert; Zuyun Wang; Zhaobiao Guo; Lizhi Xu; Yujiang Zhang; Hancheng Zheng; Nan Qin; Xiao Xiao; Mingshou Wu; Xiaoyi Wang; Dongsheng Zhou; Zhizhen Qi; Zongmin Du; Honglong Wu; Xianwei Yang; Hongzhi Cao; Hu Wang; Jing Wang; Shusen Yao; Alexander Rakin; Yingrui Li; Daniel Falush; Francois Balloux; Mark Achtman; Yajun Song; Jun Wang; Ruifu Yang
Journal:  Proc Natl Acad Sci U S A       Date:  2012-12-27       Impact factor: 11.205

5.  Prevalence of the "high-pathogenicity island" of Yersinia species among Escherichia coli strains that are pathogenic to humans.

Authors:  S Schubert; A Rakin; H Karch; E Carniel; J Heesemann
Journal:  Infect Immun       Date:  1998-02       Impact factor: 3.441

6.  Sequence and genetic analysis of the hemin storage (hms) system of Yersinia pestis.

Authors:  J W Lillard; J D Fetherston; L Pedersen; M L Pendrak; R D Perry
Journal:  Gene       Date:  1997-07-01       Impact factor: 3.688

7.  Complete genome sequence of Yersinia pestis strains Antiqua and Nepal516: evidence of gene reduction in an emerging pathogen.

Authors:  Patrick S G Chain; Ping Hu; Stephanie A Malfatti; Lyndsay Radnedge; Frank Larimer; Lisa M Vergez; Patricia Worsham; May C Chu; Gary L Andersen
Journal:  J Bacteriol       Date:  2006-06       Impact factor: 3.490

8.  Jalview Version 2--a multiple sequence alignment editor and analysis workbench.

Authors:  Andrew M Waterhouse; James B Procter; David M A Martin; Michèle Clamp; Geoffrey J Barton
Journal:  Bioinformatics       Date:  2009-01-16       Impact factor: 6.937

9.  MUSCLE: a multiple sequence alignment method with reduced time and space complexity.

Authors:  Robert C Edgar
Journal:  BMC Bioinformatics       Date:  2004-08-19       Impact factor: 3.169

10.  Different region analysis for genotyping Yersinia pestis isolates from China.

Authors:  Yanjun Li; Erhei Dai; Yujun Cui; Min Li; Yujiang Zhang; Mingshou Wu; Dongsheng Zhou; Zhaobiao Guo; Xiang Dai; Baizhong Cui; Zhizhen Qi; Zuyun Wang; Hu Wang; Xingqi Dong; Zhizhong Song; Junhui Zhai; Yajun Song; Ruifu Yang
Journal:  PLoS One       Date:  2008-05-14       Impact factor: 3.240

View more
  4 in total

1.  Identification of Microorganisms by Liquid Chromatography-Mass Spectrometry (LC-MS1) and in Silico Peptide Mass Libraries.

Authors:  Peter Lasch; Andy Schneider; Christian Blumenscheit; Joerg Doellinger
Journal:  Mol Cell Proteomics       Date:  2020-09-30       Impact factor: 5.911

2.  Crohn's Disease: Is the Cold Chain Hypothesis Still Hot?

Authors:  Jean-Pierre Hugot; Anne Dumay; Frédérick Barreau; Ulrich Meinzer
Journal:  J Crohns Colitis       Date:  2021-04-06       Impact factor: 9.071

3.  The Evolution of a Specialized, Highly Virulent Fish Pathogen through Gene Loss and Acquisition of Host-Specific Survival Mechanisms.

Authors:  Laura Baseggio; Oleksandra Silayeva; Jan Engelstädter; Andrew C Barnes
Journal:  Appl Environ Microbiol       Date:  2022-07-11       Impact factor: 5.005

4.  The Effects of Signal Erosion and Core Genome Reduction on the Identification of Diagnostic Markers.

Authors:  Jason W Sahl; Adam J Vazquez; Carina M Hall; Joseph D Busch; Apichai Tuanyok; Mark Mayo; James M Schupp; Madeline Lummis; Talima Pearson; Kenzie Shippy; Rebecca E Colman; Christopher J Allender; Vanessa Theobald; Derek S Sarovich; Erin P Price; Alex Hutcheson; Jonas Korlach; John J LiPuma; Jason Ladner; Sean Lovett; Galina Koroleva; Gustavo Palacios; Direk Limmathurotsakul; Vanaporn Wuthiekanun; Gumphol Wongsuwan; Bart J Currie; Paul Keim; David M Wagner
Journal:  mBio       Date:  2016-09-20       Impact factor: 7.867

  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.