Pimlapas Leekitcharoenphon, Rolf S Kaas, Martin Christen Frølund Thomsen, Carsten Friis, Simon Rasmussen, Frank M Aarestrup.
Abstract
BACKGROUND: The advances in and decreasing cost of whole genome sequencing (WGS) will soon make this technology available for routine infectious disease epidemiology. In epidemiological studies, outbreak isolates show very little diversity, so extensive genomic analysis is required to differentiate and classify them. One successful and widely used method is the analysis of single nucleotide polymorphisms (SNPs). Currently, different tools and methods exist for identifying SNPs, each with various options and cut-off values. Furthermore, all current methods require bioinformatic skills. Thus, we lack a standard, simple, automatic tool to determine SNPs and construct a phylogenetic tree from WGS data.
Year: 2012 PMID: 23281601 PMCID: PMC3521233 DOI: 10.1186/1471-2164-13-S7-S6
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. snpTree server implementation. (A) SNP tree construction from raw reads. Pre-processing (shown in blue) filters and trims the raw data to remove low-quality bases. Trimmed reads are aligned against a reference genome with BWA, using a default mapping quality of 30. The SNP calling and filtering process (shown in purple) identifies informative SNPs with SAMtools and filters them with two cut-offs, minimum coverage and minimum distance between SNPs (the default for both is 10); in addition, all heterozygous SNPs are filtered out. The SNP tree construction step (shown in orange) converts the multiple alignment of concatenated SNPs into a phylogenetic tree using FastTree and a Perl script. (B) SNP tree construction from assembled genomes. Contigs or assembled genomes are aligned to a reference genome using Nucmer. SNP calling and filtering are performed by the 'show-snps' application from MUMmer. The SNP tree construction step is carried out in the same way as for raw reads.
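The filtering stage described in panel (A) can be illustrated with a minimal Python sketch. It is not the snpTree implementation: the `SnpCall` record, the `filter_snps` function, and the greedy pruning rule (keep the first SNP of a close pair, drop the later one) are assumptions made for illustration. Only the two cut-offs, their default value of 10, and the removal of heterozygous SNPs come from the caption.

```python
from dataclasses import dataclass


@dataclass
class SnpCall:
    """Hypothetical record for one candidate SNP (not a SAMtools type)."""
    position: int        # 1-based position on the reference genome
    ref: str             # reference base
    alt: str             # alternative base
    depth: int           # read coverage at this position
    heterozygous: bool   # True if the call is heterozygous


def filter_snps(calls, min_depth=10, min_distance=10):
    """Drop heterozygous and low-coverage calls, then prune close neighbours.

    Pruning here is greedy (an assumption): walking along the genome, a SNP
    is discarded if it lies fewer than `min_distance` bp after the last SNP
    that was kept.
    """
    candidates = sorted(
        (c for c in calls if not c.heterozygous and c.depth >= min_depth),
        key=lambda c: c.position,
    )
    kept = []
    for call in candidates:
        if kept and call.position - kept[-1].position < min_distance:
            continue  # too close to the previously kept SNP
        kept.append(call)
    return kept


# Made-up example calls, for illustration only.
example_calls = [
    SnpCall(100, "A", "G", 30, False),
    SnpCall(105, "C", "T", 40, False),  # within 10 bp of position 100
    SnpCall(200, "G", "A", 5, False),   # coverage below 10
    SnpCall(300, "T", "C", 25, True),   # heterozygous
    SnpCall(400, "A", "C", 12, False),
]
kept_positions = [c.position for c in filter_snps(example_calls)]
```

With these made-up calls, only the SNPs at positions 100 and 400 survive. A different pruning policy, such as discarding both members of a close pair, would change which SNPs are kept.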
Figure 2. snpTree output. An example of the output from the snpTree server using Illumina paired-end reads as input data.
Evaluation table
| Data set | Exact match (%) | Cluster match (%) |
|---|---|---|
| | 91 | 100 |
| | 88 | 96 |
| | 61 | 100 |
| | 58 | 78 |
The percentage of concordance obtained by comparing SNP trees from the snpTree server against the four published data sets.
Figure 3. Comparison between phylogenetic trees from a published data set. These trees (34 WGS from the published data set) show a comparison of tree topology between the tree from the original publication (left) and the tree from the snpTree server (right). The linking lines connect each genome across the two trees. Relative to the tree from the published data, blue lines indicate an exact match and red lines an inexact match.
Figure 4. Percentage of identified SNPs. Venn diagrams showing the percentage of overlapping and non-overlapping SNPs identified by the snpTree server and by the original publications, for both raw reads (A) and assembled genomes (B). The purple, blue and green circles represent the percentage of SNPs identified in the original publications, from raw reads by the snpTree server, and from assembled genomes by the snpTree server, respectively.
Sensitivity and specificity
| Variable and cut-off value | Sensitivity (%) | Specificity (%) |
|---|---|---|
| prune = 0 | 97.8 | 100 |
| prune = 10 | 97.2 | 99.99988 |
| prune = 25 | 96.6 | 99.99975 |
| prune = 50 | 95.8 | 99.99959 |
| prune = 75 | 94.6 | 99.99935 |
| prune = 100 | 93.8 | 99.99918 |
| e = 0 | 97.8 | 100 |
| e = 10 | 97.8 | 100 |
| e = 25 | 97.8 | 100 |
| e = 50 | 97.8 | 100 |
| e = 75 | 97.8 | 100 |
| e = 100 | 97.7 | 100 |
Evaluation of sensitivity (SN) and specificity (SP) for SNP detection using different settings of the minimum number of bp between SNPs (prune) and the minimum number of bp from a sequence end (e), on a simulated data set consisting of a genome of 4,878,012 bp with 1,000 randomly placed, artificially inserted SNPs.
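To make the two measures concrete, the following Python sketch computes sensitivity and specificity from a set of true SNP positions and a set of called positions. It assumes that every reference position without an inserted SNP counts as a negative; the source does not state this definition, although the reported specificity values are of the order it would produce. The function name and the counts in the example (972 recovered SNPs, 6 false calls) are hypothetical and chosen only for illustration; the genome length and the number of inserted SNPs are taken from the caption.

```python
def sensitivity_specificity(true_positions, called_positions, genome_length):
    """Return (sensitivity %, specificity %) for SNP detection.

    Assumption: each of the `genome_length` reference positions is either a
    positive (an inserted SNP) or a negative (no SNP).
    """
    truth = set(true_positions)
    called = set(called_positions)
    tp = len(truth & called)   # inserted SNPs that were called
    fn = len(truth - called)   # inserted SNPs that were missed
    fp = len(called - truth)   # calls at positions without an inserted SNP
    tn = genome_length - tp - fn - fp
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    return sensitivity, specificity


GENOME_LENGTH = 4_878_012                       # from the caption
truth = list(range(1000, 4_001_000, 4000))      # 1,000 illustrative SNP positions
false_calls = [2, 3, 4, 6, 7, 8]                # hypothetical false positives
called = truth[:972] + false_calls              # hypothetical caller output

sn, sp = sensitivity_specificity(truth, called, GENOME_LENGTH)
print(f"sensitivity = {sn:.1f}%, specificity = {sp:.5f}%")
```

Because negatives outnumber positives by several thousand to one, a handful of false calls moves specificity only in the fourth decimal place, which is why the specificity column stays so close to 100.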