Literature DB >> 21559501

Genome sequencing and analysis of Yersina pestis KIM D27, an avirulent strain exempt from select agent regulation.

Liliana Losada1, John J Varga, Jessica Hostetler, Diana Radune, Maria Kim, Scott Durkin, Olaf Schneewind, William C Nierman.   

Abstract

Yersinia pestis is the causative agent of the plague. Y. pestis KIM 10+ strain was passaged and selected for loss of the 102 kb pgm locus, resulting in an attenuated strain, KIM D27. In this study, whole genome sequencing was performed on KIM D27 in order to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing both strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIMD27. SNP and INDELS calls were most accurate when both Newbler and CLC Genomics Workbench were employed. For purposes of obtaining high quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes.

Entities:  

Mesh:

Substances:

Year:  2011        PMID: 21559501      PMCID: PMC3084740          DOI: 10.1371/journal.pone.0019054

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Whole genome sequencing (WGS) has broadened the field of microbial forensics by allowing the identification of bacteria not only to the subspecies level, but also to specific sequence types [1]. With the advent of next generation sequencing (NGS) technologies, the application of whole genome sequencing data promises to be even more applicable since it is much more affordable and rapid than traditional Sanger sequencing. The data generated by NGS can be used to fully understand the lineage of naturally or maliciously constructed disease isolates, and could even yield information about past propagation and culturing conditions due to mutations that occur during passage [2], [3]. For forensic purposes, it is necessary to use sequencing methods that yield very accurate results, as well as bioinformatics tools that reveal the similarities and differences with a reference genome such as single nucleotide polymorphisms (SNPs), insertion and deletion events (INDELs), and differences in tandem repeats (VNTRs). To date, genome resequencing efforts with NGS have been used to study the evolutionary lineage of different strains within a single species [2], [4], [5], or to study the types of adaptations that occur during growth in specific conditions [3]. Furthermore, Y. pestis has also been subject of NGS sequencing studies to characterize the phylogeographic diversity within the species [6]. The study found that Y. pestis strains can be grouped into 4 specific lineages (1 root and 3 branches), that can be characterized further by lineage-specific SNPs [6]. Although the conclusions from these studies depend on the accuracy of the data, any errors inherent in the technology used can be tolerated in some applications. In contrast, for forensic applications, NGS data must be practically error-free since the data could potentially be used to identify and prosecute perpetrators of criminal or terrorist acts. Y. pestis KIM D27, the attenuated laboratory strain, is an isogenic derivative of KIM 10+ (biovar Mediaevalis) that was passaged until the pigmentation phenotype was lost due to loss of the pgm locus [7]. The virulent Y. pestis KIM 10+ was originally isolated from a human pneumonic plague infection in Iran in 1961 [8]. The 102 kb pgm locus is composed of a pathogenicity island and genes involved in pigment synthesis, and its loss results in the attenuated phenotype [7]. The pathogenicity island contains genes for biosynthesis of yersiniabactin [9] and hemin storage (hms) genes [10], that are used for iron scavenging [11], biofilm formation [12], and prevention of damage due to reactive oxygen species [13]. Since loss of the pgm locus renders a strain avirulent [14], [15], it is exempt from the US Department of Health and Human Services (HHS) Select Agent rules [16], and is commonly used in the lab as a model for Yersinia pestis studies. Prior to this report, the genome sequence of Y. pestis KIM D27 has not been reported nor has any comprehensive analysis of the details of its genome alterations relative to the fully virulent KIM 10+ parent been considered as they pertain to avirulent phenotype of KIM D27. This issue has assumed some relevance due to the death of a researcher possibly due to an accidental infection by such an attenuated strain of Y. pestis [17]. We took advantage of the fact that the genome sequence of Y. pestis KIM 10+ was available since its publication in 2002 [18], to test our NGS platforms for forensic purposes by sequencing KIM D27 genomic DNA. Whole genome sequence of KIM D27 confirms the loss of the pgm locus and allowed a detailed comparison with the published KIM 10+ sequence. Initial comparisons indicated that KIM D27 contained numerous SNPs and INDELs, but upon further sequencing and examination, these were shown to be errors in the reference KIM 10+ sequence. However, KIM D27 did possess 2 authentic SNPs and 1 repeat expansion relative to the KIM 10+ parent. The data were used to correct the genome sequence of KIM 10+. It was also concluded that the most accurate sequence for genome comparison purposes results from hybrid 454-Illumina sequencing coupled with the use of multiple bioinformatics tools for identifying differences between genome sequences from closely related strains.

Materials and Methods

Genome sequencing and assembly

Y. pestis KIM D27 genomic DNA was obtained from Olaf Schneewind at the University of Chicago. A 454 Titanium 8 kb paired-end library was constructed and used for sequencing on a full plate of 454 XLR Titanium platform following manufacturer's instructions (Roche, Branford, CT). The overall paired end ratio was 53.9% and the average coverage of the chromosome was 29x. An Illumina paired-end library was constructed and 100 bp sequencing was conducted on Illumina Genome Analyzer IIx following manufacturer's instructions (Illumina, San Diego, CA). Raw sequences from 454 only, 454 and Illumina, or 454, Illumina and Sanger from PCR verification products, were assembled de novo using the 454 Newbler assembler [19] and the Celera Assembler 6.1 [20], with default parameters. In addition, mapping assemblies using the published KIM 10+ (accession no. NC_004088) were performed with 454 Newbler, CLC Genomics Workbench (CLC bio, Cambridge, MA), or Amoscmp -shortReads [21] using default parameters. Mate pair information was used to predict the order and orientation of the contigs, confirming that no genomic rearrangements were present relative to KIM 10+.

Annotation

The KIM D27 genome was annotated through JCVI's auto-annotation pipeline. Whenever possible, annotation followed the nomenclature of KIM 10+. All ORFs ≥ 30 amino acids in length and annotated in other Yersinia genomes were added to the KIM D27 annotation.

SNP, INDELS, and VNTR analysis

The resulting hybrid 454, Illumina, and Sanger assembly was used to align against the Y. pestis KIM 10+ nucleotide sequence in CLC Genomics Workbench or Newbler assembler to detect single base-pair polymorphisms (SNP) or insertion/deletion (INDELS) using a 4 read minimum and 2000 read maximum coverage with a frequency of at least 35%, allowing for at most 3 different variants at any site. The results were filtered to exclude any SNPs or INDELs that had low confidence after assembly. All remaining changes were verified by PCR amplification and Sanger sequencing using primers listed in Table S1. SNP or INDELS were corrected to match Sanger read if it did not match the consensus sequence. Repeats were detected with REputer [22] using default parameters and variations in tandem repeat numbers were characterized with Tandem Repeat Finder [23] and CLC genomics workbench. Errors in the Y. pestis KIM 10+ genome sequence were identified by PCR amplification and Sanger sequencing. KIM 10+ genomic DNA was purchased from Biodefense and Emerging Infections Research Resources Repository (Cat No. NR-2645; Manassas, VA).

Nucleotide sequence accession numbers

The Y. pestis KIM D27 genome was deposited in GenBank under accession number ADDC00000000.

Results

Summary of sequencing and assembly results for Y. pestis KIM D27

The genomic DNA of Y. pestis KIM D27 was purified and sequenced on a full-plate of 454 Life Sciences Genome Sequencer FLX machine (Roche, Branford, CT). The 454 sequencing from 8 kb paired end libraries yielded 677,705 reads that passed filter wells with a 53.9% paired end ratio and a total chromosomal coverage of 29X. 454 sequencing data resulted in highly fragmented assemblies using either Celera Assembler 6.1 [20] or Newbler assembler [19], with slightly better results for AMOScmp (12 contigs) [21] (Table 1). The fragmentation was due in part to the complex repeat structure of Yersinia pestis genome [24]. The effect of repeats on assembly is evident because 123,000 bases of the KIM 10+ were not covered by a uniquely matching read during reference assemblies, most of these associated with repeat regions or multi-copy sequences like rRNA or transposable elements (not including the known pgm deletion discussed below [9]; data not shown). The assemblies showed that the KIM 10+ and KIM D27 genomes had the same overall structure. However, a close inspection of the assembly of 454 sequences resulted in a high number of discrepancies from the KIM 10+ reference genome, including SNPs and INDELs (Figure 1). A majority of these (82%) involved either an A or T residue and most (72%) were primarily associated with poly(A) or poly(T) sequences.
Table 1

Summary of sequencing results for Y. pestis KIM D27 and comparison with KIM10.

Sequencing technologyType of assemblyTotal number of contigs
de novo Reference
454 CA135-
Newbler306126
AMOScmp12
CLC
Illumina Newbler
CLC3784
454 + Illumina Hybrid Newbler120
AMOScmp1
CLC Reference4
Figure 1

Comparison of SNP and INDELs detection tools based on different sequence assemblies.

Assemblies of KIM D27 sequences from either 454 only, Illumina only, or hybrid assemblies (454 + Illumina) were analyzed for SNPs and INDELs as described in Materials and Methods. The Venn diagram represents the number of identified differences in either CLC Genomics Workbench or Newbler either singly (purple or blue, respectively) or the overlap between both programs (darkened area).

Comparison of SNP and INDELs detection tools based on different sequence assemblies.

Assemblies of KIM D27 sequences from either 454 only, Illumina only, or hybrid assemblies (454 + Illumina) were analyzed for SNPs and INDELs as described in Materials and Methods. The Venn diagram represents the number of identified differences in either CLC Genomics Workbench or Newbler either singly (purple or blue, respectively) or the overlap between both programs (darkened area). Because of the large number of SNPs and INDELs identified, additional sequencing was performed with a second platform to eliminate technology-specific errors, such as those associated with homopolymer tracts with the 454 platform. One lane of 100 bp Illumina sequencing was done on a paired end library, yielding over 18 million reads and 371X chromosomal coverage. Hybrid mapping assemblies were generated using Newbler reference assembler (120 contigs) [19], or CLC Genomics Workbench (3 contigs; CLC bio, Cambridge, MA) reference assembler and are summarized in Table 1. In our experience, the consensus built by AMOScmp was faulty due to its inability to efficiently handle homopolymeric bases. This assembly, while structurally matching the KIM 10+ reference, contained hundreds of consensus discrepancies with KIM 10+, and thus was not further analyzed. The sequence of KIM D27 showed that the chromosomal region from 2580534 to 2679386 of KIM 10+ that contains the pgm locus and genes involved in iron acquisition was entirely missing as reported previously [9]. The genetic loss was a consequence of genetic rearrangement across insertion (IS) elements IS100 that border the region. Other organisms, such as Burkholderia mallei, have also lost virulence due to rearrangements across IS element while cultured in the laboratory [25]–[26]. From the coverage data, the ratio between the plasmids was determined to be 1∶2∶4 for pMT-1, pCD1, and pPCP1, compared to the chromosome, respectively.

SNP, INDELs, and repeat analysis and confirmation

A major goal for forensic sequencing and other applications requiring high sequence accuracy is to identify both large and small changes in the genome that could explain the source of a strain or have implications for gene function, for example. Therefore, a detailed analysis of SNPs, INDELs, changes in numbers of repeats and gene-scale gains or losses was conducted. SNPs and INDELs were identified for the chromosome and all three plasmids using CLC Genomics Workbench or Newbler and are summarized in Figure 1. Analysis of 454 data alone produced the most SNP and INDELs (364 by CLC alone, 15 by Newbler alone, 33 by both). The Illumina data alone gave fewer SNPs (59 for CLC alone, 55 by Newbler alone, 48 by both). Hybrid assemblies including both types of data resulted in slightly higher SNPs than Illumina data alone (67 by CLC, 51 by Newbler, 51 by both). For all assemblies, alignment to the KIM 10+ reference sequence identified 3–5 additional SNPs and INDELs, leading to a total of 18 unambiguous SNPs and 38 INDELs. All of these SNPs and INDELs were amplified by PCR and confirmed by Sanger sequencing from KIM D27 source DNA. In addition to SNP/INDELs, Nucmer alignment [27] of KIM D27 de novo contigs from the CA6.1 assembly and KIM 10+ identified multiple breaks in regions of suspected differences in numbers of repeat units. REputer [22] analysis showed that 21 of these regions contained repeat elements (e.g., VNTR). After PCR and Sanger sequencing, 16 of these sequences matched the published KIM 10+ sequence. However, 4 of the repeats in KIM D27 appeared to have expanded while 1 repeat had contracted relative to the KIM 10+ published genome (Table 2). Since KIM D27 is a laboratory derived isolate of KIM 10+, the 56 SNP/INDELs and 5 repeat differences seemed suspiciously high, and it was possible that the reference KIM 10+ sequence had some errors. Therefore, Y. pestis KIM 10+ DNA was purchased from the Biodefense and Emerging Infections Research Resources Repository (Cat No. NR-2645; Manassas, VA) and used as template for PCR and Sanger sequencing. 55 of 56 SNP/INDELs and 2 repeat expansions and 1 contraction were found to be errors in the KIM 10+ sequence, since the PCR products had the same sequence as KIM D27. Because the trace reads from the KIM 10+ sequence were no longer available we are unable to determine the source of the error. One true SNP between KIM D27 and KIM 10+ was confirmed at coordinate 1505634 on KIM 10+, which results in an Alanine to Valine change in hisS (Table 2); one true additional 17 bp tandem repeat expansion was found at coordinate 3346611 upstream of the modE gene (Table 3). Lastly, one large expansion (∼3 kb) had occurred in KIM D27 within a gene for a putative outer membrane surface protein (invasin homolog, y3884) that is 9 kb in KIM 10+ and 12 kb in KIM D27 (Table 3 and Figure 2). The difference in size was due to multiple copies of a 289 bp repeat as identified by Tandem Repeat Finder [23]. Interestingly, the KIM 10+ PCR reactions appeared mixed (Figure 2), suggesting that either the genomic DNA was derived from a mixed population of repeats resulting in gene lengths from 7–13 kb or that the repeat is unstable, resulting from copy number variation on replication.
Table 2

Repeat expansion or reduction in Y. pestis KIM D27.

TypeKIM 10+ coordinatesSize in KIM 10+ (bp)KIM D27 coordinatesSize in KIM D27 (bp)CodingComment
VNTR125066–125542476 125066–125429363NTandem repeat. Five copies in KIM 10+; four in KIM D27.
VNTR704203–704423220 704091–704433342NRepeat area expanded in D27.
Expansion* 3346556–3346654983245775–3245890115N17 bp insertion (TTTCTATCTATGTTGTTA) extends N-term of modE(y3040)
IS Element4232115–4232834719 4131351–41327811430YTandem repeat of IS154. One copy in reference, two copies in PCR.
VNTR* 4328042–432922211804227881–42333355454YMultiple copies of a 289 bp motif; PCR results in ladder effect in y3884

Based on published sequence. PCR reactions showed these sites were the same length in KIM 10+ DNA as was observed for KIM D27.

These sequences were different between KIM 10+ and KIM D27.

Table 3

SNP/INDEL confirmed in Y. pestis KIM10+.

Type of substitutionCoordinateRefVarNearest geneAnnotationCodingResultHomopolymer or repetivite region
SNP (pCD1)65595AGy0087.1Nputative transposaseNrepetitive region
SNP121683AGy0111transposaseYNo amino acid change
SNP142073CGy0129 gltD; glutamate synthase beta subunitYP to A at residue 2234 C
INDEL276834Gy0258hypothetical proteinYExtends N-term of ORF3 G
INDEL327388Gy0305transposaseNPossibly extends ORF
INDEL655265Cy0579tyrB; tyrosine biosynthesisN
INDEL887688Gy0795 mrcB; murein sacculusN3 G
INDEL897819Ay0800chloride channel proteinYMoves stop 2 aa up4 A
SNP988057GTy0879putative sugar transportN
SNP1086724TCy0962hypothetical proteinYNo amino acid change
INDEL1176674Gy1043 tes; acyl-CoA thioesterase IIN5 G
SNP1184200AGy1049 acrB; multidrug efflux pumpYNo amino acid change6 A
INDEL1375636Cy1224NrdE truncationNExtends N-term of ORF7 C
INDEL/SNP1415467Ay1265 kdpA; potassium transporterYChanges balance outrepetitive region
INDEL/SNP1415469Gy1265 kdpA; potassium transporterYChanges balance outrepetitive region
SNP*1505634CTy1354hisSYA to V at residue 88
INDEL1544038Ay1389Transcriptional regulatorN3 A
INDEL1616527Gy1457hypothetical protein, putative peroxidaseN
SNP1758361GTy1590 fabB NMay change promoter
INDEL1830493Cy1655hypothetical proteinN
INDEL2006614Cy1821mgtA; magnesium transporterNrepetitive region
INDEL2006622Ay1821mgtA; magnesium transporterNrepetitive region
INDEL2006743Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006767Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006775Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006778Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006781Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006785Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006791Ty1821mgtA; magnesium transporterNrepetitive region
INDEL2006801Ty1821mgtA; magnesium transporterNrepetitive region
SNP2006807CAy1821mgtA; magnesium transporterNrepetitive region
INDEL2018833Ay1834hypothetical proteinsYUnites 2 ORFs3 A
SNP2021584CTy1834hypothetical proteinYNo amino acid change
SNP2078371CTy1880hypothetical proteinYK to N at residue 33
INDEL2253705Gy2047Tryptophan synthase alpha subunitYShorter C-term of trpA
INDEL2377088Ay2150 purU NExtends N-term of ORF3 A
INDEL2377158Ty2150purUY2nd aa, restores reading frame due to other purU change4 T
SNP2470103ACy2242chaperoneYNo amino acid change
INDEL2563999Ay2328hypothetical proteinYAnother stop available5 A
INDEL2564024Gy2328hypothetical proteinYAnother stop available3 G
SNP2786829GAy2524ftnYNo amino acid changerepetitive region
SNP2786831GAy2524ftnYS to F at residue 11repetitive region
SNP2786834GAy2524ftnYA to G at residue 10repetitive region
INDEL2959407Gy2681hypothetical proteinYJoins both ORFs3 G
SNP2978605AGy2697hypothetical proteinYV to G at residue 289
SNP2981487CTy2698hypothetical proteinYNo amino acid change
SNP2981565CGy2698hypothetical proteinYNo amino acid change
INDEL3231270Gy2925hypothetical proteinN
SNP3533369AGy3211hypothetical proteinYN to S at residue 188
INDEL3546848Ay3221PTS permeaseYExtends ORFs at C-term
INDEL3782193Cy3410Non-ribosomal peptide synthaseYChanges balance out3 C
INDEL3782201Gy3410Non-ribosomal peptide synthaseYChanges balance out4 G
INDEL3782212Ay3410Non-ribosomal peptide synthaseYChanges balance out
INDEL3824911Cy3437hypothetical proteinYJoins both ORFs
INDEL4154719Gy3736NadR disruptedYFunctional NadR3 G
INDEL4363489Gy3907hypothetical proteinYJoins both ORFs5 G
INDEL4470283Ty4034phosphoethanolamine transferaseYNo amino acid change
Figure 2

A. Nucmer alignement of putative invasin gene from KIM 10+ and KIM D27.

Regions of homology are depicted as diagonal lines. The 289 bp repeat aligns with itself numerous times and results in the square pattern. B. PCR amplification of the invasin gene from KIM 10+ and KIM D27. The chromosomal region 4228610 – 4234900 was amplified from genomic DNA from either strain. The predicted 6,290 bp product was observed as the major product from KIM D27 (lanes 1 and 2), but a ladder effect was observed when KIM 10+ DNA was used (lanes 3 and 4).

A. Nucmer alignement of putative invasin gene from KIM 10+ and KIM D27.

Regions of homology are depicted as diagonal lines. The 289 bp repeat aligns with itself numerous times and results in the square pattern. B. PCR amplification of the invasin gene from KIM 10+ and KIM D27. The chromosomal region 4228610 – 4234900 was amplified from genomic DNA from either strain. The predicted 6,290 bp product was observed as the major product from KIM D27 (lanes 1 and 2), but a ladder effect was observed when KIM 10+ DNA was used (lanes 3 and 4). Based on published sequence. PCR reactions showed these sites were the same length in KIM 10+ DNA as was observed for KIM D27. These sequences were different between KIM 10+ and KIM D27. Tables 2 and 3 show the effects of the corrections to the KIM 10+ genome annotation as a result of the identified errors. 23 of the 55 sequence corrections occurred in non-coding regions of the chromosome, while 9 others were silent substitutions within coding regions. 7 of the corrections result in amino acid substitutions. Three of the corrections introduce stop codons, although two of these occur within 2 amino acids of the C-terminus of the protein, and thus are predicted to have little effect on the structure or function of the peptide. In every case where the ORF was extended or truncated, the corrected sequence matched the published sequence of other Yersinia genomes (data not shown).

Determination of the source of the errors in the 454-only assembly

In order to ascertain the potential universality of the problems with the 454-only assembly, an analysis of candidate INDELs conflicts from other 454-only genomes sequenced at the JCVI was performed (Table 4). For each species, the closest finished genome available was chosen as the reference. SNPs were not analyzed because the reference genome did not come from the same strain. The genome sequences of query strains were compared with completed reference sequences published in GenBank, and all INDELs that could result from 454 sequencing errors were identified (Table 4). Y. pestis had among the highest average coverage (i.e. the number of reads that represent any given nucleotide) and number of INDELs, but all genomes contained potential 454-introduced errors. There was no relation between the number of homopolymeric-INDELs and % GC content of the genome. As expected, in general the higher the sequencing coverage, the lower the number of INDELs (P<0.05), although this was not true for Y. pestis.
Table 4

Search for conflicting INDELs in 454-only genomes.

OrganismTechnology% Genome Covered%GCAverage CoverageConflicting INDELsShort Sequence Repeats
E. coli strain HMP paired end99%51%29x1257
P. gingivalis W83 fragment100%48%18x2158
E. coli strain HMP fragment99%51%19x9564
M. tuberculosis strain fragment99%66%16x11878
M. tuberculosis strain fragment98%66%13x18997
M. tuberculosis strain fragment98%66%13x20694
Y. pestis KIM D27 paired end98%48%29x411246
Since the errors were mostly present at poly(A) and poly(T) sequences it is possible that the large number of errors in the KIM D27 454 sequence were due to the strain having an aberrantly high amount of these homopolymer sequences in its genome. The genomes of several species were analyzed for incidence of 4–9 base A or T sequences in CLC Genomics Workbench. Y. pestis had an A or T homopolymer frequency of 11.00 per kb, close to the average (13.23) (Figure 3) and an overall number of A or T homopolymers of 49,536 compared to the average of 52,200. This indicates that the number of 454 errors was not simply an effect of having a high frequency or occurrence of A and T homopolymers. However, the number of simple sequence repeats (SSR; i.e., motifs between 2 and 25 nucleotides that are repeated in tandem) had a positive correlation with the number of INDELs (P<0.005), suggesting that genomes with highly repetitive sequences pose significant difficulties for sequencing and assembly (Table 4). However, increasing sequencing coverage and incorporating Illumina data yielded a better genome assembly than either the Sanger only [18], or 454 only data.
Figure 3

Instances of poly(A) and poly(T) in sequenced genomes.

All 4–9 bp poly(A) and poly(T) in published genomes were counted and plotted per kilobase in each genome.

Instances of poly(A) and poly(T) in sequenced genomes.

All 4–9 bp poly(A) and poly(T) in published genomes were counted and plotted per kilobase in each genome.

Discussion

In this study we tested our next generation sequencing (NGS) platforms for the purposes of rapidly and accurately sequencing and characterizing disease causing microorganisms. Using the 454 sequencing platform we obtained the genomic sequence of Y. pestis KIM D27 within days of obtaining the genomic DNA. After various attempts at assembling the genome, we discovered that 454-only sequencing yielded large numbers of SNP, INDELS, and VNTR differences when compared to the parental genome, most within poly(A) or poly(T) homopolymer tracts. Consequently, we complemented the 454 data with Illumina sequencing data to improve the consensus sequence. We found that optimal sequencing of Y.pestis KIM D27 required hybrid assemblies of 454 paired end data and Illumina reads. The hybrid assembly also resulted in highly effective SNP identification, with neither CLC Genomics workbench nor Newbler making false positive calls. The inclusion of multiple SNP calling programs resulted in no false positives called by more than one tool, providing the most confidence in the SNPs that were called. In our experience, the primary strengths of 454 sequencing were that the long reads allow de novo assembly and detection of structural changes in a genome, and rapid sequencing and assembly (less than 1 week in the case of Y. pestis KIM D27). However, the consensus quality is not optimal due to artifacts in homopolymer tracts, especially in genomes with high numbers of short repeats. In turn, the coverage depth acquired by Illumina sequencing provides very accurate consensus sequence generation and analysis, including mapping assemblies, SNP and INDEL calling. However, Illumina sequencing requires more time to generate data, and in general, de novo assemblies are more fragmented than 454-sequence assemblies. We found that the Celera Assembler was able to handle both 454 and hybrid assemblies better and resulted in more robust, accurate assemblies than any of the other programs used. However, mapping assemblies were best performed by CLC Genomics Workbench. We also found that SNP and INDELs were best identified by using several bioinformatic tools, comparing their results, and curating the list of candidate modifications manually. Analysis of other 454-only genomes from JCVI indicated that, while the severity of the A and T homopolymer error rate was excessive for Y. pestis, the problem is likely to be present in all of the genomes. While the problem can be partially obviated by increased 454 sequencing depth, we found that the error rate was not correlated with the level of coverage, but rather with the number of simple sequence repeats present in the genome. Indeed, a recent study using 454 only sequencing of B. anthracis found that the number of potential SNP/INDELs variants among 8 derivatives from either the Ames or Sterne lineages could be as high as 97 differences [28], and was not related to the level of coverage. The authors were able to verify roughly one half of those differences as true differences using PCR and Sanger sequencing. However, there still remain one half of those differences that can be attributed to sequencing errors. Thus, it might be more cost effective to add Illumina sequencing instead of additional 454 sequencing. The Y. pestis KIM D27 genome sequence did not reveal any additional genome alterations that would contribute to its attenuated phenotype beyond those already discussed [9], [11], [14], [15]. This analysis, however, revealed that KIM 10+ and to a lesser extent KIM D27, encode a highly variable region within a putative invasin gene (y3040; Figure 2), which may confer adaptation to different hosts, or immune reactions. An additional variable region was detected in the molybdate transport transciriptional regulator, modE, which extended the N-terminus of the peptide by 17 residues. A similar extension is observed in Y. pestis Antigua and Y. pestsis Angola. Interestingly, the additional residues add a secretion signal as predicted by SignalP [29], suggesting there might be a different response to molybdates in KIM D27 than KIM 10+. Lastly, the modification to the hisS gene at residue A88V is not predicted to have a major effect on the protein, because valine is not an uncommon residue at that location among members of the HisS family. However, a modification in HisS could result in differences in the histidine biosynthesis pathway regulation between the two strains that could results through adaptation to laboratory growth. Importantly, the fact that out of the 56 SNPs and INDELs and 5 repeat size changes initially validated in KIM D27, all but 1 SNPs and 2 repeat expansions (Tables 2 and 3) were found to be sequence errors in the reference genome has significant implications for forensic microbiology and applications involving comparisons with published reference genomes. Specifically, newly observed alterations must be analyzed in both the new isolate as well as the reference genome to discard the possibility of errors in published sequences. Our predicted error levels in the recently 454-only sequenced genomes (see Table 3) combined with those found in older Sanger-only genome sequences would make it virtually impossible to achieve high confidence SNP determinations using solely 454-sequence data, eliminating the possibility of definitive conclusions about the source of disease-causing strains in an investigation such as the Anthrax attacks. Additionally, the limitations of observing large chromosomal rearrangements using only Illumina (or other short read technology), obligates the use of a combined sequencing approach for forensic investigations. List of primers used to validate SNPs and INDELs in Yersinia pestis KIM10 and KIM D27. (XLS) Click here for additional data file.
  28 in total

1.  Improved prediction of signal peptides: SignalP 3.0.

Authors:  Jannick Dyrløv Bendtsen; Henrik Nielsen; Gunnar von Heijne; Søren Brunak
Journal:  J Mol Biol       Date:  2004-07-16       Impact factor: 5.469

2.  Isolation and confirmation of Yersinia pestis mutants exempt from select agent regulations.

Authors:  Robert D Perry; Scott W Bearden
Journal:  Curr Protoc Microbiol       Date:  2008-11

3.  Tandem repeats finder: a program to analyze DNA sequences.

Authors:  G Benson
Journal:  Nucleic Acids Res       Date:  1999-01-15       Impact factor: 16.971

4.  Sequence and genetic analysis of the hemin storage (hms) system of Yersinia pestis.

Authors:  J W Lillard; J D Fetherston; L Pedersen; M L Pendrak; R D Perry
Journal:  Gene       Date:  1997-07-01       Impact factor: 3.688

5.  Identification of critical amino acid residues in the plague biofilm Hms proteins.

Authors:  Stanislav Forman; Alexander G Bobrov; Olga Kirillina; Susannah K Craig; Jennifer Abney; Jacqueline D Fetherston; Robert D Perry
Journal:  Microbiology       Date:  2006-11       Impact factor: 2.777

6.  Rapid identification of genetic modifications in Bacillus anthracis using whole genome draft sequences generated by 454 pyrosequencing.

Authors:  Peter E Chen; Kristin M Willner; Amy Butani; Shakia Dorsey; Matroner George; Andrew Stewart; Shannon M Lentz; Christopher E Cook; Arya Akmal; Lance B Price; Paul S Keim; Alfred Mateczun; Trupti N Brahmbhatt; Kimberly A Bishop-Lilly; Michael E Zwick; Timothy D Read; Shanmuga Sozhamannan
Journal:  PLoS One       Date:  2010-08-25       Impact factor: 3.240

7.  Short-term signatures of evolutionary change in the Salmonella enterica serovar typhimurium 14028 genome.

Authors:  Tyler Jarvik; Chris Smillie; Eduardo A Groisman; Howard Ochman
Journal:  J Bacteriol       Date:  2009-11-06       Impact factor: 3.490

8.  Yersinia pestis genome sequencing identifies patterns of global phylogenetic diversity.

Authors:  Giovanna Morelli; Yajun Song; Camila J Mazzoni; Mark Eppinger; Philippe Roumagnac; David M Wagner; Mirjam Feldkamp; Barica Kusecek; Amy J Vogler; Yanjun Li; Yujun Cui; Nicholas R Thomson; Thibaut Jombart; Raphael Leblois; Peter Lichtner; Lila Rahalison; Jeannine M Petersen; Francois Balloux; Paul Keim; Thierry Wirth; Jacques Ravel; Ruifu Yang; Elisabeth Carniel; Mark Achtman
Journal:  Nat Genet       Date:  2010-10-31       Impact factor: 38.330

9.  Aggressive assembly of pyrosequencing reads with mates.

Authors:  Jason R Miller; Arthur L Delcher; Sergey Koren; Eli Venter; Brian P Walenz; Anushka Brownley; Justin Johnson; Kelvin Li; Clark Mobarry; Granger Sutton
Journal:  Bioinformatics       Date:  2008-10-24       Impact factor: 6.937

10.  Characterization of clinically-attenuated Burkholderia mallei by whole genome sequencing: candidate strain for exclusion from Select Agent lists.

Authors:  Steven E Schutzer; Linda R K Schlater; Catherine M Ronning; David DeShazer; Benjamin J Luft; John J Dunn; Jacques Ravel; Claire M Fraser-Liggett; William C Nierman
Journal:  PLoS One       Date:  2008-04-30       Impact factor: 3.240

View more
  9 in total

1.  Metagenomic analysis of the viral flora of pine marten and European badger feces.

Authors:  Judith M A van den Brand; Marije van Leeuwen; Claudia M Schapendonk; James H Simon; Bart L Haagmans; Albert D M E Osterhaus; Saskia L Smits
Journal:  J Virol       Date:  2011-12-14       Impact factor: 5.103

2.  In vivo-induced InvA-like autotransporters Ifp and InvC of Yersinia pseudotuberculosis promote interactions with intestinal epithelial cells and contribute to virulence.

Authors:  Fabio Pisano; Annika Kochut; Frank Uliczka; Rebecca Geyer; Tatjana Stolz; Tanja Thiermann; Manfred Rohde; Petra Dersch
Journal:  Infect Immun       Date:  2011-12-12       Impact factor: 3.441

3.  Adhesive properties of YapV and paralogous autotransporter proteins of Yersinia pestis.

Authors:  Manoj K M Nair; Leon De Masi; Min Yue; Estela M Galván; Huaiqing Chen; Fang Wang; Dieter M Schifferli
Journal:  Infect Immun       Date:  2015-02-17       Impact factor: 3.441

Review 4.  Plague gives surprises in the first decade of the 21st century in the United States and worldwide.

Authors:  Thomas Butler
Journal:  Am J Trop Med Hyg       Date:  2013-09-16       Impact factor: 2.345

5.  Improving indel detection specificity of the Ion Torrent PGM benchtop sequencer.

Authors:  Zhen Xuan Yeo; Maurice Chan; Yoon Sim Yap; Peter Ang; Steve Rozen; Ann Siew Gek Lee
Journal:  PLoS One       Date:  2012-09-19       Impact factor: 3.240

6.  SNIT: SNP identification for strain typing.

Authors:  Ravi Vijaya Satya; Nela Zavaljevski; Jaques Reifman
Journal:  Source Code Biol Med       Date:  2011-09-08

7.  Metatranscriptome analysis of fungal strains Penicillium camemberti and Geotrichum candidum reveal cheese matrix breakdown and potential development of sensory properties of ripened Camembert-type cheese.

Authors:  Marie-Hélène Lessard; Catherine Viel; Brian Boyle; Daniel St-Gelais; Steve Labrie
Journal:  BMC Genomics       Date:  2014-03-26       Impact factor: 3.969

8.  Metagenomic analysis of the ferret fecal viral flora.

Authors:  Saskia L Smits; V Stalin Raj; Minoushka D Oduber; Claudia M E Schapendonk; Rogier Bodewes; Lisette Provacia; Koert J Stittelaar; Albert D M E Osterhaus; Bart L Haagmans
Journal:  PLoS One       Date:  2013-08-20       Impact factor: 3.240

9.  Naturally occurring recombination in ferret coronaviruses revealed by complete genome characterization.

Authors:  Mart M Lamers; Saskia L Smits; Gadissa B Hundie; Lisette B Provacia; Marion Koopmans; Albert D M E Osterhaus; Bart L Haagmans; V Stalin Raj
Journal:  J Gen Virol       Date:  2016-06-07       Impact factor: 3.891

  9 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.