John P Jakupciak, Jeffrey M Wells, Richard J Karalus, David R Pawlowski, Jeffrey S Lin, Andrew B Feldman.
Abstract
Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders, respectively. Accurate characterization of metagenomic samples depends on accurate measurement of genetic variation between isolates, with resolution down to the strain level. The sensitivity of a single biomarker is often augmented by using multiple biomarkers or biomarker panels. In parallel with single-biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population sequencing. Potentially, direct sequencing could be used to analyze an entire genome, which would then serve as the biomarker for genome identification. However, genome variation and population diversity complicate the use of direct sequencing, as do differences introduced by sample preparation protocols, including sequencing artifacts and errors. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness, and sought to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate the use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Year: 2013 PMID: 24455204 PMCID: PMC3877622 DOI: 10.1155/2013/801505
Source DB: PubMed Journal: J Nucleic Acids ISSN: 2090-0201
Figure 1. A schematic diagram of the experimental design. Theoretical accumulation of mutational variations among the 12 bacterial culture lineages.
Figure 2. Accumulated genomic diversity expected from different passaging approaches. (a) Imposing a single cell genetic bottleneck at each passage step causes a gradual mutational shift with all descendent cells being closely related to one another. (b) By passaging a random subset of microbes at each step, accumulated mutational diversity within the lineage population is expected to be much greater.
Figure 3. Pathogen culturing protocol in selective media. Following the seventh passage step, six (6) colonies from each of the twelve cultures were selected, amplified in liquid media, and the DNA was isolated from each for a total of 72 DNA isolations per strain tested. A frozen archive sample of each clone selected for DNA isolation will also be maintained for potential future analysis.
Two genome positions analyzed by pipeline A (GNUPMap/Soap), shown as examples of the pipeline processing. The SNP data file has 15 columns: (1) reference genotype; (2) consensus genotype; (3) quality score of consensus genotype; (4) best base; (5) average quality score of best base; (6) count of uniquely mapped best base; (7) count of all mapped best base; (8) second best base; (9) average quality score of second best base; (10) count of uniquely mapped second best base; (11) count of all mapped second best base; (12) sequencing depth of the site; (13) rank sum test P value; (14) average copy number of nearby region; and (15) whether the site is a dbSNP (1, yes; 0, new SNP). The quality score in field 3 is the posterior probability (PP) of the consensus genotype, which ranges from 0 to 1; the reported score is PP × 100, so its range is 0–100. The quality scores in fields 5 and 9 are the average quality scores of the best and second best base, respectively. They correspond to the Illumina quality scores from the original QSEQ or FASTQ files and range from 0 to 40.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | T | 36* | T | 37 | 4 | 4 | A | 36 | 3 | 3 | 7 | 1 | 1 | 0 |
| A | G | 99 | G | 38 | 6 | 6 | A | 0 | 0 | 0 | 6 | 1 | 1 | 0 |
*This example SNP is called with a posterior probability of 36%. The reference genotype is A and the consensus is T. The best base identified is T with a q-score of 37, while the 2nd best base is the same as reference with a q-score of 36. The total depth at the site is seven, with four reads supporting T and three reads supporting A. This SNP should be rejected by setting an appropriate posterior probability cutoff and not used to determine diversity or phylogenetic distance relationships.
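The cutoff described in the footnote can be sketched in Python as follows. This is a minimal illustration, not the pipeline's actual code: the names `SnpRecord`, `parse_snp_fields`, and `passes_posterior_cutoff` are hypothetical, the threshold of 90 is an assumed value (the source does not state one), and the input is taken as an already-split list of the 15 fields so that no file delimiter is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpRecord:
    """One row of the 15-column SNP data described in the table caption."""
    ref: str                  # (1) reference genotype
    consensus: str            # (2) consensus genotype
    consensus_quality: int    # (3) posterior probability x 100, range 0-100
    best_base: str            # (4) best base
    best_avg_quality: int     # (5) average Illumina quality of best base, 0-40
    best_unique_count: int    # (6) uniquely mapped reads with best base
    best_all_count: int       # (7) all mapped reads with best base
    second_base: str          # (8) second best base
    second_avg_quality: int   # (9) average Illumina quality of second best base
    second_unique_count: int  # (10) uniquely mapped reads with second best base
    second_all_count: int     # (11) all mapped reads with second best base
    depth: int                # (12) sequencing depth of the site
    rank_sum_p: float         # (13) rank sum test P value
    copy_number: float        # (14) average copy number of nearby region
    is_dbsnp: bool            # (15) True if known dbSNP, False if new SNP


def parse_snp_fields(fields):
    """Build a SnpRecord from a list of 15 string fields."""
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    f = [x.strip() for x in fields]
    return SnpRecord(
        f[0], f[1], int(f[2]), f[3], int(f[4]), int(f[5]), int(f[6]),
        f[7], int(f[8]), int(f[9]), int(f[10]), int(f[11]),
        float(f[12]), float(f[13]), f[14] == "1",
    )


def passes_posterior_cutoff(record, min_quality=90):
    """Keep a call only if it differs from the reference and its
    posterior-probability score reaches the cutoff.

    min_quality=90 is an assumed threshold chosen for illustration.
    """
    return (record.consensus != record.ref
            and record.consensus_quality >= min_quality)


# The two example rows from the table (footnote asterisk removed),
# written space-separated here only for brevity.
EXAMPLE_ROWS = [
    "A T 36 T 37 4 4 A 36 3 3 7 1 1 0",
    "A G 99 G 38 6 6 A 0 0 0 6 1 1 0",
]

records = [parse_snp_fields(row.split()) for row in EXAMPLE_ROWS]
kept = [r for r in records if passes_posterior_cutoff(r)]
for r in kept:
    print(r.ref, "->", r.consensus, "quality", r.consensus_quality)
```

With the assumed cutoff, the first example row (score 36) is rejected and the second (score 99) is retained, matching the footnote's recommendation for the low-confidence call.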
Figure 4. Five clones of the same lineage after passage 8 were sequenced and compared for SNPs. Posterior probabilities were calculated by the program SOAPsnp. This includes SNPs detected against the progenitor culture that was sequenced right after the first passage. (a) Illustrates chromosome 1. (b) Illustrates chromosome 2.
Figure 5. Data analysis pipeline. SOAPsnp was used to find SNPs in the data. The criteria for SNP validation in SOAPsnp are rather lenient. Variant validation is more critical in metagenomic samples than in homogeneous samples. False variants created by sequencer error can quickly change the results of forensic analysis in metagenomic samples, where lower coverage depth and partial base consensus are expected, whereas full base consensus can be demanded in homogeneous sample data sets.
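The contrast between the two validation regimes can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the validation step of the pipeline in Figure 5: the function name `call_is_supported`, the sample-type strings, and the default thresholds `min_fraction=0.5` and `min_depth=5` are all hypothetical values chosen for the example.

```python
def call_is_supported(best_count, depth, sample_type,
                      min_fraction=0.5, min_depth=5):
    """Decide whether the reads at a site support the best base.

    best_count  -- number of reads carrying the best base
    depth       -- total sequencing depth at the site
    sample_type -- "homogeneous" or "metagenomic" (hypothetical labels)

    Homogeneous samples: demand full base consensus (every read agrees).
    Metagenomic samples: accept partial consensus, but only above an
    assumed minimum depth and an assumed minimum supporting fraction.
    """
    if depth <= 0 or not 0 <= best_count <= depth:
        raise ValueError("need depth > 0 and 0 <= best_count <= depth")
    if sample_type == "homogeneous":
        return best_count == depth
    if sample_type == "metagenomic":
        return depth >= min_depth and best_count / depth >= min_fraction
    raise ValueError(f"unknown sample type: {sample_type!r}")


# Read counts taken from the two example SNP rows in the table above:
# 4 of 7 reads support the best base in the first, 6 of 6 in the second.
for best, depth in [(4, 7), (6, 6)]:
    print(best, depth,
          call_is_supported(best, depth, "homogeneous"),
          call_is_supported(best, depth, "metagenomic"))
```

Under these assumed thresholds the 4-of-7 site fails the homogeneous rule but passes the metagenomic one, which is exactly the situation where a sequencer error can masquerade as a minor variant and why stricter validation matters for mixed samples.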
Pipeline B alignment-based mapping statistics for the threat agents and our Bg calibrant, for comparison to nonalignment Z values.
| Organism | Z value | Nonreference base-call fraction | Mapped read fraction | Unmapped reference bases | SNP calls |
|---|---|---|---|---|---|
| B. mallei | 3.06 | 0.025 | 0.94 | 166,439 | 431 |
| B. pseudomallei | 2.87 | 0.027 | 0.98 | 859 | 365 |
| B. globigii* | 0.02 | 0.018 | 0.99 | 0 | 0 |
The Z values for Burkholderia reflect the higher fraction of non-reference-matching base calls in these samples and are indicative of greater population diversity compared to the Bg calibrant sample. The larger unmapped base count along the reference genome (column 5) for B. mallei is due to insertion elements that are highly mobile within these genomes and promote rearrangements. These gaps resulted from the BFAST default parameters for assigning candidate locations to reads when there is a high multiplicity of candidate alignments across the reference genome. Representative mapping statistics: 10,000,000 Illumina reads, 100 bp. *Calibrant.
SNP calls for Burkholderia pseudomallei on chromosome 1. Progenitors are labeled 0_1 to 0_12. Clones from the 8th passage are labeled 8_x_y, where x is the lineage (x = 1–12) and y is the clone selected for comparison (y = 1–6).
| Genome position | Reference | 0_1 | 0_2 | 0_3 | 0_4 | 0_5 | 0_8 | 0_10 | 0_11 | 0_12 | 8_1_1 | 8_1_2 | 8_1_3 | 8_1_4 | 8_1_5 | 8_1_6 | 8_3_1 | 8_5_1 | 8_6_1 | 8_8_1 | 8_9_1 | 8_10_1 | 8_11_1 | 8_12_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 27694 | C | A | A | |||||||||||||||||||||
| 30272 | G | A | C | |||||||||||||||||||||
| 198367 | C | T | T | A | T | |||||||||||||||||||
| 208896 | G | C | ||||||||||||||||||||||
| 208898 | G | C | ||||||||||||||||||||||
| 208901 | G | A | ||||||||||||||||||||||
| 209253 | G | A | A | |||||||||||||||||||||
| 238223 | G | A | A | |||||||||||||||||||||
| 238226 | T | A | A | A | ||||||||||||||||||||
| 238248 | G | A | ||||||||||||||||||||||
| 254143 | C | |||||||||||||||||||||||
| 260159 | C | G | ||||||||||||||||||||||
| 309954 | G | C | ||||||||||||||||||||||
| 387701 | T | G | ||||||||||||||||||||||
| 417834 | C | T | ||||||||||||||||||||||
| 417940 | T | A | ||||||||||||||||||||||
| 439654 | C | A | A | G | G | |||||||||||||||||||
| 468747 | G | A | C | C | ||||||||||||||||||||
| 473445 | T | G | ||||||||||||||||||||||
| 502149 | T | |||||||||||||||||||||||
| 522021 | C | A | ||||||||||||||||||||||
| 535552 | G | C | ||||||||||||||||||||||
| 575316 | C | A | ||||||||||||||||||||||
| 575340 | T | A | ||||||||||||||||||||||
| 636702 | T | G | ||||||||||||||||||||||
| 637354 | C | A | ||||||||||||||||||||||
| 637366 | T | C | C | |||||||||||||||||||||
| 719754 | G | A | ||||||||||||||||||||||
| 748773 | T | C | ||||||||||||||||||||||
| 748778 | G | A | ||||||||||||||||||||||
| 752001 | G | A | ||||||||||||||||||||||
| 755855 | G | C | ||||||||||||||||||||||
| 827221 | G | A | A | A | ||||||||||||||||||||
| 830204 | C | |||||||||||||||||||||||
| 840516 | C | T | ||||||||||||||||||||||
| 846684 | C | T | T | |||||||||||||||||||||
| 856204 | G | A | ||||||||||||||||||||||
| 856229 | G | T | T | |||||||||||||||||||||
| 867860 | T | G | ||||||||||||||||||||||
| 879922 | G | A | ||||||||||||||||||||||
| 908149 | C | G | G | |||||||||||||||||||||
| 918155 | G | A | ||||||||||||||||||||||
| 918163 | G | A |
SNP calls for Burkholderia pseudomallei on chromosome 2. Progenitors are labeled 0_1 to 0_12. Clones from the 8th passage are labeled 8_x_y, where x is the lineage (x = 1–12) and y is the clone selected for comparison (y = 1–6).
| Genome position | Reference | 0_1 | 0_2 | 0_3 | 0_4 | 0_5 | 0_8 | 0_10 | 0_11 | 0_12 | 8_1_1 | 8_1_2 | 8_1_3 | 8_1_4 | 8_1_5 | 8_1_6 | 8_3_1 | 8_5_1 | 8_6_1 | 8_8_1 | 8_9_1 | 8_10_1 | 8_11_1 | 8_12_1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 16302 | C | A | A | |||||||||||||||||||||
| 63867 | T | C | C | |||||||||||||||||||||
| 112549 | A | G | ||||||||||||||||||||||
| 112589 | A | G | ||||||||||||||||||||||
| 125529 | T | C | C | |||||||||||||||||||||
| 153871 | C | A | A | |||||||||||||||||||||
| 255127 | C | T | T | T | T | T | T | |||||||||||||||||
| 257976 | A | |||||||||||||||||||||||
| 289329 | T | |||||||||||||||||||||||
| 334349 | G | A | A | |||||||||||||||||||||
| 371920 | G | A | A | |||||||||||||||||||||
| 401983 | C | T | T | |||||||||||||||||||||
| 429628 | G | A | ||||||||||||||||||||||
| 438505 | A | G | G | |||||||||||||||||||||
| 446580 | T | |||||||||||||||||||||||
| 479883 | T | G | ||||||||||||||||||||||
| 482145 | G | A | ||||||||||||||||||||||
| 563692 | A | G | G | |||||||||||||||||||||
| 568009 | G | A | A | |||||||||||||||||||||
| 656370 | C | A | ||||||||||||||||||||||
| 673008 | G | T | C | C | T | T | ||||||||||||||||||
| 767675 | C | T | T | |||||||||||||||||||||
| 767835 | T | C | C | |||||||||||||||||||||
| 769495 | C | A | ||||||||||||||||||||||
| 770544 | C | A | ||||||||||||||||||||||
| 790672 | C | A | A | |||||||||||||||||||||
| 793808 | G | A | A | |||||||||||||||||||||
| 794151 | G | A | A | |||||||||||||||||||||
| 794210 | G | A | ||||||||||||||||||||||
| 794352 | G | T | ||||||||||||||||||||||
| 794608 | T | G | ||||||||||||||||||||||
| 794942 | G | A | A | |||||||||||||||||||||
| 821535 | G | A | ||||||||||||||||||||||
| 859546 | A | G | G | |||||||||||||||||||||
| 859585 | G | T | T |
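The sample labels used as column headers in the two SNP tables follow a fixed scheme (0_x for progenitors, 8_x_y for passage-8 clones), so they can be decoded and grouped by lineage programmatically. The Python sketch below is one way to do this; the names `SampleLabel`, `parse_label`, and `group_by_lineage` are hypothetical and not part of either analysis pipeline.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class SampleLabel(NamedTuple):
    passage: int          # 0 for progenitors, 8 for passage-8 clones
    lineage: int          # x, expected range 1-12
    clone: Optional[int]  # y, expected range 1-6; None for progenitors


def parse_label(label):
    """Decode a column label such as '0_12' or '8_1_6'."""
    parts = label.split("_")
    if not all(p.isdigit() for p in parts):
        raise ValueError(f"malformed label: {label!r}")
    numbers = [int(p) for p in parts]
    if len(numbers) == 2 and numbers[0] == 0:
        sample = SampleLabel(0, numbers[1], None)
    elif len(numbers) == 3 and numbers[0] == 8:
        sample = SampleLabel(8, numbers[1], numbers[2])
    else:
        raise ValueError(f"malformed label: {label!r}")
    if not 1 <= sample.lineage <= 12:
        raise ValueError(f"lineage out of range in {label!r}")
    if sample.clone is not None and not 1 <= sample.clone <= 6:
        raise ValueError(f"clone out of range in {label!r}")
    return sample


def group_by_lineage(labels):
    """Map each lineage number to the labels that belong to it,
    so a progenitor can be compared with its passage-8 clones."""
    groups = defaultdict(list)
    for label in labels:
        groups[parse_label(label).lineage].append(label)
    return dict(groups)


# A few of the column headers from the tables above.
headers = ["0_1", "0_3", "8_1_1", "8_1_2", "8_3_1"]
print(group_by_lineage(headers))
```

Grouping by lineage places each progenitor next to its descendants, which is the comparison needed to separate SNPs already present in the progenitor from those that accumulated during passaging.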