Liliana Losada, John J Varga, Jessica Hostetler, Diana Radune, Maria Kim, Scott Durkin, Olaf Schneewind, William C Nierman.
Abstract
Yersinia pestis is the causative agent of plague. Y. pestis strain KIM 10+ was passaged and selected for loss of the 102 kb pgm locus, resulting in the attenuated strain KIM D27. In this study, whole-genome sequencing was performed on KIM D27 to identify any additional differences. Initial assemblies of 454 data were highly fragmented, and various bioinformatic tools detected between 15 and 465 SNPs and INDELs when comparing the two strains, the vast majority associated with A or T homopolymer sequences. Consequently, Illumina sequencing was performed to improve the quality of the assembly. Hybrid sequence assemblies were performed, and a total of 56 validated SNP/INDELs and 5 repeat differences were identified in the D27 strain relative to the published KIM 10+ sequence. However, further analysis showed that 55 of these SNP/INDELs and 3 repeats were errors in the KIM 10+ reference sequence. We conclude that both 454 and Illumina sequencing were required to obtain the most accurate and rapid sequence results for Y. pestis KIM D27. SNP and INDEL calls were most accurate when both Newbler and CLC Genomics Workbench were employed. To obtain high-quality genome sequence differences between strains, any identified differences should be verified in both the new and reference genomes.
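The abstract attributes most discrepant 454 calls to A or T homopolymers, and the SNP/INDEL table records the run length at each affected site. A minimal sketch of that check, assuming a 0-based coordinate into a plain sequence string; the function names and the default minimum run length of 3 (the shortest run listed in that table) are illustrative choices, not the authors' pipeline:

```python
def homopolymer_run(seq: str, pos: int) -> tuple[str, int]:
    """Return (base, run_length) for the single-base run covering 0-based pos."""
    if not 0 <= pos < len(seq):
        raise IndexError("position outside sequence")
    base = seq[pos].upper()
    # Walk left and right while the same base continues.
    start = pos
    while start > 0 and seq[start - 1].upper() == base:
        start -= 1
    end = pos
    while end + 1 < len(seq) and seq[end + 1].upper() == base:
        end += 1
    return base, end - start + 1


def is_at_homopolymer(seq: str, pos: int, min_len: int = 3) -> bool:
    """True if pos lies in an A or T run of at least min_len bases (min_len is an assumption)."""
    base, length = homopolymer_run(seq, pos)
    return base in "AT" and length >= min_len


if __name__ == "__main__":
    toy = "GCGAAAAACGT"  # toy sequence, not Y. pestis data
    print(homopolymer_run(toy, 5))
    print(is_at_homopolymer(toy, 5), is_at_homopolymer(toy, 1))
```

A call flagged this way is not necessarily wrong; it only marks the sequence context in which 454 length errors are most likely and where confirmation with a second technology matters most.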
Year: 2011 PMID: 21559501 PMCID: PMC3084740 DOI: 10.1371/journal.pone.0019054
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Summary of sequencing results for Y. pestis KIM D27 and comparison with KIM 10+.
| Sequencing technology | Type of assembly | Total number of contigs (de novo) | Total number of contigs (reference) |
| 454 | CA | 135 | − |
| 454 | Newbler | 306 | 126 |
| 454 | AMOScmp | − | 12 |
| 454 | CLC | − | − |
| Illumina | Newbler | − | − |
| Illumina | CLC | 378 | 4 |
| 454 + Illumina (hybrid) | Newbler | − | 120 |
| 454 + Illumina (hybrid) | AMOScmp | − | 1 |
| 454 + Illumina (hybrid) | CLC Reference | − | 4 |
Figure 1. Comparison of SNP and INDEL detection tools based on different sequence assemblies.
Assemblies of KIM D27 sequences from 454 only, Illumina only, or hybrid (454 + Illumina) data were analyzed for SNPs and INDELs as described in Materials and Methods. The Venn diagram shows the number of differences identified by CLC Genomics Workbench alone (purple), by Newbler alone (blue), or by both programs (darkened overlap).
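The three regions of each Venn diagram amount to set operations on the two programs' call sets. A minimal sketch, assuming each call is keyed by reference coordinate plus alleles with "-" for an empty allele (the convention of the SNP/INDEL table); the `Variant` type and `venn_partition` helper are hypothetical names, and the calls in the usage block are toy values:

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """One difference call: reference coordinate plus alleles ('-' = empty allele)."""
    pos: int
    ref: str
    alt: str


def venn_partition(clc: set[Variant], newbler: set[Variant]) -> dict[str, set[Variant]]:
    """Split two call sets into the three regions of a two-set Venn diagram."""
    return {
        "clc_only": clc - newbler,
        "newbler_only": newbler - clc,
        "both": clc & newbler,  # the darkened overlap: calls made by both programs
    }


if __name__ == "__main__":
    # Hypothetical toy calls, not data from this study.
    clc_calls = {Variant(100, "A", "G"), Variant(250, "-", "T"), Variant(400, "C", "-")}
    newbler_calls = {Variant(100, "A", "G"), Variant(250, "-", "T"), Variant(900, "A", "-")}
    for region, calls in venn_partition(clc_calls, newbler_calls).items():
        print(region, len(calls))
```

Exact matching assumes both programs place an INDEL at the same coordinate. Inside a homopolymer the same insertion or deletion can be written at any position of the run, so calls would need to be normalized (for example, left-aligned) before intersecting.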
Repeat expansion or reduction in Y. pestis KIM D27.
| Type | KIM 10+ coordinates | Size in KIM 10+ (bp) | KIM D27 coordinates | Size in KIM D27 (bp) | Coding | Comment |
| VNTR | 125066–125542 | 476 | 125066–125429 | 363 | N | Tandem repeat. Five copies in KIM 10+; four in KIM D27. |
| VNTR | 704203–704423 | 220 | 704091–704433 | 342 | N | Repeat area expanded in D27. |
| Expansion | 3346556–3346654 | 98 | 3245775–3245890 | 115 | N | 17 bp insertion ( |
| IS Element | 4232115–4232834 | 719 | 4131351–4132781 | 1430 | Y | Tandem repeat of IS154. One copy in the reference sequence; two copies by PCR. |
| VNTR | 4328042–4329222 | 1180 | 4227881–4233335 | 5454 | Y | Multiple copies of a 289 bp motif; PCR gives a ladder effect in y3884. |
Based on published sequence. PCR reactions showed these sites were the same length in KIM 10+ DNA as was observed for KIM D27.
These sequences were different between KIM 10+ and KIM D27.
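The size columns of the repeat table are consistent with size = end − start for every row, and the difference between the two sizes gives the net expansion or reduction at each site (the 98 bp to 115 bp change at the expansion site is the 17 bp insertion noted in the table). A short check, where the end − start convention is inferred from the numbers rather than stated in the source, and `RepeatSite`/`size_change` are hypothetical names:

```python
from typing import NamedTuple


class RepeatSite(NamedTuple):
    kind: str
    kim10_start: int
    kim10_end: int
    d27_start: int
    d27_end: int


# Coordinates copied from the repeat table; sizes below are derived, not copied.
SITES = [
    RepeatSite("VNTR", 125066, 125542, 125066, 125429),
    RepeatSite("VNTR", 704203, 704423, 704091, 704433),
    RepeatSite("Expansion", 3346556, 3346654, 3245775, 3245890),
    RepeatSite("IS Element", 4232115, 4232834, 4131351, 4132781),
    RepeatSite("VNTR", 4328042, 4329222, 4227881, 4233335),
]


def size_change(site: RepeatSite) -> tuple[int, int, int]:
    """Return (KIM 10+ size, KIM D27 size, D27 minus KIM 10+), using end - start."""
    kim10 = site.kim10_end - site.kim10_start
    d27 = site.d27_end - site.d27_start
    return kim10, d27, d27 - kim10


if __name__ == "__main__":
    for site in SITES:
        print(site.kind, *size_change(site))
```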
SNP/INDELs confirmed in Y. pestis KIM 10+.
| Type of substitution | Coordinate | Ref | Var | Nearest gene | Annotation | Coding | Result | Homopolymer or repetitive region |
| SNP (pCD1) | 65595 | A | G | y0087.1N | putative transposase | N | | repetitive region |
| SNP | 121683 | A | G | y0111 | transposase | Y | No amino acid change | |
| SNP | 142073 | C | G | y0129 | | Y | P to A at residue 223 | 4 C |
| INDEL | 276834 | − | G | y0258 | hypothetical protein | Y | Extends N-term of ORF | 3 G |
| INDEL | 327388 | G | − | y0305 | transposase | N | Possibly extends ORF | |
| INDEL | 655265 | − | C | y0579 | tyrB; tyrosine biosynthesis | N | ||
| INDEL | 887688 | − | G | y0795 | | N | | 3 G |
| INDEL | 897819 | A | − | y0800 | chloride channel protein | Y | Moves stop 2 aa up | 4 A |
| SNP | 988057 | G | T | y0879 | putative sugar transport | N | ||
| SNP | 1086724 | T | C | y0962 | hypothetical protein | Y | No amino acid change | |
| INDEL | 1176674 | G | − | y1043 | | N | | 5 G |
| SNP | 1184200 | A | G | y1049 | | Y | No amino acid change | 6 A |
| INDEL | 1375636 | − | C | y1224 | NrdE truncation | N | Extends N-term of ORF | 7 C |
| INDEL/SNP | 1415467 | A | − | y1265 | | Y | Changes balance out | repetitive region |
| INDEL/SNP | 1415469 | − | G | y1265 | | Y | Changes balance out | repetitive region |
| SNP* | 1505634 | C | T | y1354 | hisS | Y | A to V at residue 88 | |
| INDEL | 1544038 | − | A | y1389 | Transcriptional regulator | N | | 3 A |
| INDEL | 1616527 | − | G | y1457 | hypothetical protein, putative peroxidase | N | ||
| SNP | 1758361 | G | T | y1590 | | N | May change promoter | |
| INDEL | 1830493 | C | − | y1655 | hypothetical protein | N | ||
| INDEL | 2006614 | − | C | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006622 | A | − | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006743 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006767 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006775 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006778 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006781 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006785 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006791 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2006801 | − | T | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| SNP | 2006807 | C | A | y1821 | mgtA; magnesium transporter | N | | repetitive region |
| INDEL | 2018833 | A | − | y1834 | hypothetical proteins | Y | Unites 2 ORFs | 3 A |
| SNP | 2021584 | C | T | y1834 | hypothetical protein | Y | No amino acid change | |
| SNP | 2078371 | C | T | y1880 | hypothetical protein | Y | K to N at residue 33 | |
| INDEL | 2253705 | G | − | y2047 | Tryptophan synthase alpha subunit | Y | Shorter C-term of | |
| INDEL | 2377088 | A | − | y2150 | | N | Extends N-term of ORF | 3 A |
| INDEL | 2377158 | T | − | y2150 | purU | Y | 2nd aa, restores reading frame due to other | 4 T |
| SNP | 2470103 | A | C | y2242 | chaperone | Y | No amino acid change | |
| INDEL | 2563999 | − | A | y2328 | hypothetical protein | Y | Another stop available | 5 A |
| INDEL | 2564024 | − | G | y2328 | hypothetical protein | Y | Another stop available | 3 G |
| SNP | 2786829 | G | A | y2524 | ftn | Y | No amino acid change | repetitive region |
| SNP | 2786831 | G | A | y2524 | ftn | Y | S to F at residue 11 | repetitive region |
| SNP | 2786834 | G | A | y2524 | ftn | Y | A to G at residue 10 | repetitive region |
| INDEL | 2959407 | G | − | y2681 | hypothetical protein | Y | Joins both ORFs | 3 G |
| SNP | 2978605 | A | G | y2697 | hypothetical protein | Y | V to G at residue 289 | |
| SNP | 2981487 | C | T | y2698 | hypothetical protein | Y | No amino acid change | |
| SNP | 2981565 | C | G | y2698 | hypothetical protein | Y | No amino acid change | |
| INDEL | 3231270 | − | G | y2925 | hypothetical protein | N | ||
| SNP | 3533369 | A | G | y3211 | hypothetical protein | Y | N to S at residue 188 | |
| INDEL | 3546848 | − | A | y3221 | PTS permease | Y | Extends ORFs at C-term | |
| INDEL | 3782193 | − | C | y3410 | Non-ribosomal peptide synthase | Y | Changes balance out | 3 C |
| INDEL | 3782201 | − | G | y3410 | Non-ribosomal peptide synthase | Y | Changes balance out | 4 G |
| INDEL | 3782212 | A | − | y3410 | Non-ribosomal peptide synthase | Y | Changes balance out | |
| INDEL | 3824911 | C | − | y3437 | hypothetical protein | Y | Joins both ORFs | |
| INDEL | 4154719 | G | − | y3736 | NadR disrupted | Y | Functional NadR | 3 G |
| INDEL | 4363489 | G | − | y3907 | hypothetical protein | Y | Joins both ORFs | 5 G |
| INDEL | 4470283 | T | − | y4034 | phosphoethanolamine transferase | Y | No amino acid change | |
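Several entries in the Result column reason about codons (for example "P to A at residue 223") or about whether INDELs shift the reading frame. A minimal sketch of those two checks, assuming coding-strand codons and the standard genetic code (bacterial translation table 11 assigns the same amino acids and differs only in start codons). The codon pair CCT/GCT is hypothetical, since the actual codon at y0129 is not given here, and whether the authors judged "Changes balance out" this way is not stated:

```python
# Standard genetic code; codons ordered by bases T, C, A, G at positions 1, 2, 3.
STANDARD_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
BASES = "TCAG"


def translate_codon(codon: str) -> str:
    """Translate one coding-strand codon to a one-letter amino acid ('*' = stop)."""
    c = codon.upper()
    if len(c) != 3 or any(b not in BASES for b in c):
        raise ValueError(f"not a valid codon: {codon!r}")
    index = 16 * BASES.index(c[0]) + 4 * BASES.index(c[1]) + BASES.index(c[2])
    return STANDARD_CODE[index]


def net_length_change(changes: list[tuple[str, str]]) -> int:
    """Sum of (variant length - reference length) over (ref, var) pairs; '-' is an empty allele."""
    total = 0
    for ref, var in changes:
        ref_len = 0 if ref == "-" else len(ref)
        var_len = 0 if var == "-" else len(var)
        total += var_len - ref_len
    return total


def preserves_frame(changes: list[tuple[str, str]]) -> bool:
    """True if the combined INDELs change length by a multiple of three.

    Ignores spacing: codons between two compensating INDELs are still read out of frame.
    """
    return net_length_change(changes) % 3 == 0


if __name__ == "__main__":
    print(translate_codon("CCT"), "->", translate_codon("GCT"))  # hypothetical codons
    y1265 = [("A", "-"), ("-", "G")]  # alleles from the rows at 1415467 and 1415469
    print(net_length_change(y1265), preserves_frame(y1265))
    print(preserves_frame([("G", "-")]))  # a lone 1 bp deletion
```

For the paired y1265 changes, an A deletion and a G insertion two bases apart give a net length change of zero, which is one way two INDELs can offset each other.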
Figure 2. A. Nucmer alignment of the putative invasin gene from KIM 10+ and KIM D27.
Regions of homology are depicted as diagonal lines. The 289 bp repeat aligns with itself numerous times, producing the square pattern. B. PCR amplification of the invasin gene from KIM 10+ and KIM D27. The chromosomal region 4228610–4234900 was amplified from genomic DNA of either strain. The predicted 6,290 bp product was observed as the major product from KIM D27 (lanes 1 and 2), but a ladder effect was observed when KIM 10+ DNA was used (lanes 3 and 4).
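The predicted product size follows from the amplified coordinates (4234900 − 4228610 = 6,290 bp). If the KIM 10+ ladder reflects products differing by whole copies of the 289 bp motif, an assumption since the band sizes are not reported here, the expected band positions can be sketched as follows; `amplicon_size` and `ladder_sizes` are hypothetical helpers:

```python
def amplicon_size(start: int, end: int) -> int:
    """Product length from region bounds, using end - start (matches 6,290 in the caption)."""
    return end - start


def ladder_sizes(base_size: int, motif_len: int, max_copies: int) -> list[int]:
    """Expected band sizes if products differ from base_size by whole motif copies.

    Assumption: bands gain or lose k copies, for k in -max_copies..max_copies.
    """
    sizes = (base_size + k * motif_len for k in range(-max_copies, max_copies + 1))
    return sorted((s for s in sizes if s > 0), reverse=True)


if __name__ == "__main__":
    predicted = amplicon_size(4228610, 4234900)
    print(predicted)
    print(ladder_sizes(predicted, 289, 2))
```

Comparing such a list against observed band sizes would be one way to test whether the ladder arises from motif copy-number variation rather than from nonspecific priming.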
Search for conflicting INDELs in 454-only genomes.
| Organism | Technology | % Genome Covered | %GC | Average Coverage | Conflicting INDELs | Short Sequence Repeats |
| | paired end | 99% | 51% | 29x | 12 | 57 |
| | fragment | 100% | 48% | 18x | 21 | 58 |
| | fragment | 99% | 51% | 19x | 95 | 64 |
| | fragment | 99% | 66% | 16x | 118 | 78 |
| | fragment | 98% | 66% | 13x | 189 | 97 |
| | fragment | 98% | 66% | 13x | 206 | 94 |
| | paired end | 98% | 48% | 29x | 411 | 246 |
Figure 3. Instances of poly(A) and poly(T) in sequenced genomes.
All poly(A) and poly(T) tracts of 4–9 bp in the published genomes were counted and plotted per kilobase for each genome.
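The counting behind Figure 3 can be sketched as follows. This is an illustrative reading, not the authors' script: it assumes maximal runs (a 6 bp poly(A) counts once, at length 6, rather than as several overlapping 4-mers), counts A and T runs on a single strand (a poly(A) on one strand is a poly(T) on the other, so each tract is counted once), and reports runs per kilobase by run length. The function name `at_runs_per_kb` is hypothetical:

```python
from itertools import groupby


def at_runs_per_kb(seq: str, min_len: int = 4, max_len: int = 9) -> dict[int, float]:
    """Maximal poly(A)/poly(T) runs per kilobase, keyed by run length."""
    if not seq:
        raise ValueError("empty sequence")
    counts = {length: 0 for length in range(min_len, max_len + 1)}
    # groupby yields one group per maximal run of identical characters.
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if base in "AT" and length in counts:
            counts[length] += 1
    kb = len(seq) / 1000
    return {length: n / kb for length, n in counts.items()}


if __name__ == "__main__":
    toy = "AAAAC" * 200  # toy 1 kb sequence with 200 poly(A) runs of length 4
    print(at_runs_per_kb(toy))
```

Runs longer than `max_len` are excluded rather than truncated, so a 10 bp tract contributes nothing to the 4–9 bp counts.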