Ji Qi, Fangqing Zhao, Anne Buboltz, Stephan C Schuster.
Abstract
SUMMARY: We have developed a novel mining pipeline, the Integrative Next-generation Genome Analysis Pipeline (inGAP), which is guided by a Bayesian principle to detect single nucleotide polymorphisms (SNPs) and insertions/deletions (indels) by comparing high-throughput pyrosequencing reads with a reference genome from a related organism. inGAP can map both Roche/454 and Illumina reads, with no restriction on read length. Experiments on simulated and experimental data show that the pipeline achieves an overall accuracy of 97% in SNP detection and 94% in indel detection. All detected SNPs and indels can be further evaluated in the graphical editor included in the pipeline. inGAP also provides functions for comparing multiple genomes and for assisting bacterial genome assembly.
AVAILABILITY: inGAP is available at http://sites.google.com/site/nextgengenomics/ingap
Year: 2009 PMID: 19880367 PMCID: PMC2796817 DOI: 10.1093/bioinformatics/btp615
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The SNP/indel calling workflow of inGAP. First, reads are assigned to a reference genome. Second, precise multiple alignments are performed for gapped regions. Third, a Bayesian algorithm is used to call SNPs and indels.
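The third step of the workflow in Fig. 1 can be illustrated with a generic Bayesian base call at a single haploid site. This is a minimal sketch and not inGAP's actual model, which this record does not describe: the function name `call_site`, the uniform per-base `error_rate`, and the `prior_variant` value are all assumptions chosen for illustration.

```python
import math

BASES = "ACGT"

def call_site(pileup, ref_base, error_rate=0.01, prior_variant=0.001):
    """Return (most probable base, its posterior) for one haploid site.

    Hypothetical illustration only: assumes independent reads, a uniform
    sequencing error rate, and a fixed prior probability of a variant.
    """
    log_post = {}
    for candidate in BASES:
        # Prior: the reference base is favoured; the other three bases
        # share the (assumed) prior probability of a variant equally.
        if candidate == ref_base:
            prior = 1.0 - prior_variant
        else:
            prior = prior_variant / 3.0
        # Likelihood: each read base matches the true base unless a
        # sequencing error replaced it with one of the three others.
        log_like = 0.0
        for observed in pileup:
            p = 1.0 - error_rate if observed == candidate else error_rate / 3.0
            log_like += math.log(p)
        log_post[candidate] = math.log(prior) + log_like

    # Normalise in log space to avoid underflow at high coverage.
    peak = max(log_post.values())
    weights = {b: math.exp(v - peak) for b, v in log_post.items()}
    total = sum(weights.values())
    posterior = {b: w / total for b, w in weights.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior[best]
```

Under these assumptions, a pileup such as `"GGGGGGGGGA"` over reference base `A` is called as `G`, while a single mismatching read in `"AAAAAAAAAG"` leaves the call at the reference base, because the prior outweighs one discordant observation.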
Fig. 2. Performance comparison between MAQ and inGAP on simulated datasets. (A) Sensitivity of SNP/indel calling on Illumina reads at different levels of divergence. The green line shows the indels (1–10 bp) identified by inGAP. (B) Positive predictive value (PPV) of MAQ and inGAP on Illumina reads at different levels of divergence. (C) Performance on simulated Illumina reads with coverage ranging from 5× to 100×. (D) PPVs on simulated Illumina reads at different sequence coverage. (E) Performance on simulated 454 reads with coverage ranging from 5× to 100×. (F) PPVs on simulated 454 reads at different sequence coverage.
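The two metrics plotted in Fig. 2 can be computed from a set of called variants and a set of known (simulated) variants. The sketch below uses the standard definitions, sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP), which are assumed to be the ones used in the figure; the function name and the `(position, base)` representation of a variant are hypothetical.

```python
def sensitivity_and_ppv(called, truth):
    """Compare called variants against the known truth set.

    Both arguments are iterables of hashable variant records, here
    assumed to be (position, alternate_base) tuples.
    """
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    # Sensitivity: fraction of real variants that were recovered.
    sensitivity = true_positives / len(truth) if truth else float("nan")
    # PPV: fraction of calls that are real variants.
    ppv = true_positives / len(called) if called else float("nan")
    return sensitivity, ppv


# Made-up example data: four simulated variants, three calls, two correct.
truth = {(100, "G"), (250, "T"), (400, "C"), (900, "A")}
called = {(100, "G"), (250, "T"), (777, "C")}
sens, ppv = sensitivity_and_ppv(called, truth)
```

In this made-up example two of the four simulated variants are recovered, so the sensitivity is 0.5, and two of the three calls are correct, so the PPV is 2/3.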