Mauricio O Carneiro, Carsten Russ, Michael G Ross, Stacey B Gabriel, Chad Nusbaum, Mark A DePristo.
Abstract
BACKGROUND: Pacific Biosciences technology provides a fundamentally new data type with the potential to overcome some limitations of current next-generation sequencing platforms by offering significantly longer reads, single-molecule sequencing, low composition bias and an error profile that is orthogonal to other platforms. With these potential advantages in mind, we here evaluate the utility of the Pacific Biosciences RS platform for human medical amplicon resequencing projects.
Year: 2012 PMID: 22863213 PMCID: PMC3443046 DOI: 10.1186/1471-2164-13-375
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Characterization of Pacific Biosciences data. a) Base error mode rate for deletions, insertions and mismatches. b) Length distribution of reads in the Pacific Biosciences discovery dataset (here some raw reads are as long as 5,000 bases). c) Pacific Biosciences error rate by position. Shown are all errors (mismatch, insertion and deletion) by base position, including every base sequenced despite any previously known variation (this is why the average is slightly higher than 15%). Due to the diminishing number of reads with bases beyond 1000, we only plot positions up to 1000. d-f) GC bias of the Pacific Biosciences instrument, represented by the genomes of P. falciparum (low GC), E. coli (average GC) and R. sphaeroides (high GC), shows good balance in GC coverage where there is sufficient data in the genome, regardless of GC content.
Figure 2. Error profile of Pacific Biosciences data. a) A chart showing the number of observations of the alternate allele in all heterozygous sites and how reference bias pulls the median significantly below the expected 0.5. This combination creates multiple possible alignments with the highest alignment score, allowing the aligner in some cases to hide the true alternate allele inside an insertion to maximize the alignment score at the cost of reference bias. b) IGV browser (http://www.broadinstitute.org/igv/) screenshot of the validation dataset showing an example of aligner-created reference bias on Pacific Biosciences RS data. The true SNPs (C) are correctly called in individual reads. c) An IGV browser[18,19] screenshot of a region in the discovery dataset where Illumina HiSeq data suffers from context-specific errors that make the region appear to be a true heterozygous site, whereas Pacific Biosciences RS data (with errors that are nearly random, though more frequent) clearly shows no event in this region.
Validation calls for Pacific Biosciences and Illumina MiSeq
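| | Pacific Biosciences RS: called Poly | Pacific Biosciences RS: called Mono | Illumina MiSeq: called Poly | Illumina MiSeq: called Mono |
| --- | --- | --- | --- | --- |
| True Poly | 37 | 1 | 38 | 0 |
| True Mono | 1 | 59 | 5 | 55 |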
Number of sites called polymorphic and monomorphic by Pacific Biosciences RS and Illumina MiSeq in the validation experiment. Datasets were sequenced from the same amplicons and were downsampled to 70x average coverage for comparison. Pacific Biosciences shows good accuracy, with consistently high percentages in all metrics and only 2 out of 98 wrong calls, while MiSeq shows excellent sensitivity and negative predictive value but lower specificity and positive predictive value, with 5 out of 98 wrong calls, all of the same class.
Poly = polymorphic site; Mono = monomorphic site.
Validation metrics for Pacific Biosciences and Illumina MiSeq
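| Platform | Sensitivity | Specificity | PPV | NPV |
| --- | --- | --- | --- | --- |
| Pacific Biosciences RS | 97% | 98% | 97% | 98% |
| Illumina MiSeq | 100% | 91% | 88% | 100% |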
PPV = positive predictive value; NPV = negative predictive value.
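The validation metrics can be recomputed from the validation call counts. The sketch below is illustrative only and is not code from the paper: the `Confusion` class and its field names are hypothetical, and the assignment of each count to true/false positive/negative is an assumption inferred from the caption (2 wrong calls for Pacific Biosciences, 5 wrong calls of a single class for MiSeq, and 100% MiSeq sensitivity).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Validation call counts for one platform (hypothetical helper).

    Assumed mapping: a "positive" is a site that is truly polymorphic.
    """
    tp: int  # truly polymorphic, called polymorphic
    fn: int  # truly polymorphic, called monomorphic
    fp: int  # truly monomorphic, called polymorphic
    tn: int  # truly monomorphic, called monomorphic

    def metrics(self) -> dict:
        """Return the four standard metrics as fractions in [0, 1]."""
        return {
            "sensitivity": self.tp / (self.tp + self.fn),
            "specificity": self.tn / (self.tn + self.fp),
            "ppv": self.tp / (self.tp + self.fp),
            "npv": self.tn / (self.tn + self.fn),
        }


# Counts taken from the validation calls table (98 sites per platform).
PACBIO = Confusion(tp=37, fn=1, fp=1, tn=59)
MISEQ = Confusion(tp=38, fn=0, fp=5, tn=55)

if __name__ == "__main__":
    for name, counts in (("Pacific Biosciences RS", PACBIO),
                         ("Illumina MiSeq", MISEQ)):
        formatted = {k: f"{v:.1%}" for k, v in counts.metrics().items()}
        print(name, formatted)
```

Under this assumed mapping the computed values agree with the reported percentages to within a point. The MiSeq specificity works out to 55/60, about 91.7%, which the table lists as 91%; this may indicate that the published percentages were truncated rather than rounded.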