Funda Orkunoglu-Suer, Arthur F Harralson, David Frankfurter, Paul Gindoff, Travis J O'Brien.
Abstract
BACKGROUND: Two of the most significant issues surrounding next generation sequencing are its cost and the difficulty of assembling short reads. Targeted capture enrichment of longer fragments using single molecule sequencing (SMS) is expected to improve both sequence assembly and base-call accuracy but, at present, there are very few examples of successful application of these technological advances in translational research and clinical testing. We developed a targeted single molecule sequencing (T-SMS) panel for genes implicated in ovarian response to controlled ovarian hyperstimulation (COH) for infertility.
Year: 2015 PMID: 25888426 PMCID: PMC4397691 DOI: 10.1186/s12864-015-1451-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Data pipeline
| Step | Details |
|---|---|
| BLASR de novo CCS aligner algorithm filter v1 (hg19) | The BLASR de novo CCS aligner algorithm was used for SNP calling on CCS reads. Reads were filtered by length and quality and mapped to the reference sequence (UCSC, hg19). Base quality scores were recalibrated, and consensus Filter v1: Min Read Length bp: 50, Minimum Sub Read Length 50 |
| Overview | Variants were identified using the GATK Unified Genotyper for Bayesian diploid and haploid SNP calling, with base quality score recalibration and default settings. Indel calling was not included in the SMRT pipeline |
| Functional annotation of variants | |
| Overview | SMRT View, UCSC Genome Browser, R circos plot, Partek |
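The length thresholds listed for Filter v1 can be illustrated with a minimal sketch. This is not the SMRT Portal implementation: the `(read_id, subreads)` data layout and the function name `filter_reads` are assumptions made for illustration, and the quality filter is omitted because the table does not state its threshold.

```python
from typing import Iterable, List, Tuple

# Thresholds taken from the "Filter v1" row of the table.
MIN_READ_LENGTH = 50      # "Min Read Length bp: 50"
MIN_SUBREAD_LENGTH = 50   # "Minimum Sub Read Length 50"

# Hypothetical layout: a read is an ID plus its list of subread sequences.
Read = Tuple[str, List[str]]


def filter_reads(
    reads: Iterable[Read],
    min_read_length: int = MIN_READ_LENGTH,
    min_subread_length: int = MIN_SUBREAD_LENGTH,
) -> List[Read]:
    """Keep reads whose surviving subreads satisfy both length thresholds."""
    kept: List[Read] = []
    for read_id, subreads in reads:
        # Drop subreads shorter than the subread threshold.
        long_subreads = [s for s in subreads if len(s) >= min_subread_length]
        # Keep the read only if the remaining bases reach the read threshold.
        total_bases = sum(len(s) for s in long_subreads)
        if long_subreads and total_bases >= min_read_length:
            kept.append((read_id, long_subreads))
    return kept
```

In this sketch a read with one 60-base subread and one 10-base subread is kept with only the 60-base subread, while a read whose single subread is 49 bases long is discarded.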
Characteristics of captured sequence
| Characteristic | Value |
|---|---|
| Sequence yield per run (pre-filter bases) | 800 Mb |
| Sequence run per sample | 2 chips |
| Run time | 2 movies, 45 min each |
| Mean accuracy | 10X CCS, 97.3% |
| Targeted accuracy | 10X CCS, 100% |
| Mean read length | 3200 nt |
| Mean mapped read length | 900 bp |
| Insert size | 1 kb |
| DNA requirements | 500 ng/µL |
Figure 1. Read length vs. GC coverage. A. Representative example of typical read length and coverage of GC-rich regions (VEGFA gene), with sequence results imported as a custom track in the UCSC Genome Browser. Fragments (1 kb) were tiled to cover the entire genomic region of the VEGFA gene (grey bars above sequence). There were a total of 42 amplicons (intragenic sequences totaled 34,268 bp) with 50–150 bp overlap. B. Screen capture showing the high GC content (72%) of the VEGFA gene, which was successfully sequenced.
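The tiling described in the Figure 1 caption can be sketched as follows. This is a simplified illustration, not the authors' amplicon design procedure: it assumes a single fixed overlap (the caption reports a 50–150 bp range), half-open coordinates, and a hypothetical function name `tile_region`.

```python
from typing import List, Tuple


def tile_region(
    start: int,
    end: int,
    fragment_len: int = 1000,  # 1 kb fragments, as in the caption
    overlap: int = 100,        # assumed fixed value inside the 50-150 bp range
) -> List[Tuple[int, int]]:
    """Cover the half-open interval [start, end) with overlapping tiles."""
    if end <= start:
        raise ValueError("end must be greater than start")
    if not 0 <= overlap < fragment_len:
        raise ValueError("overlap must be in [0, fragment_len)")

    step = fragment_len - overlap
    tiles: List[Tuple[int, int]] = []
    pos = start
    while True:
        # The last tile is truncated at the region boundary.
        tile_end = min(pos + fragment_len, end)
        tiles.append((pos, tile_end))
        if tile_end == end:
            break
        pos += step
    return tiles
```

For a 2,500 bp region with the defaults, the sketch yields three tiles, each sharing 100 bp with its neighbour. Real amplicon designs also have to account for primer placement, which is why the overlap varies in practice.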
Figure 2. Uniform coverage between amplicons using droplet PCR with SMS technology. Circos plot illustrating the relative coverage of target sequences in 3 samples. The outermost blue track displays the target genomic sequence with respect to chromosomal location. For each target sequence, the percentage GC content is provided in purple (scale 0–100). The bases covered for bait regions are shown in green (scale 1–1000). The coverage of SMS for 3 representative samples is provided in red.
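The per-target GC percentage plotted on the 0–100 scale can be computed as in the minimal sketch below. The function name `gc_percent` is hypothetical, and excluding ambiguous bases such as N from the denominator is an assumption, not something the source specifies.

```python
def gc_percent(sequence: str) -> float:
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    # Count only unambiguous bases so that N and other symbols do not dilute the value.
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted
```

Applied to each target sequence in turn, this gives one value per target on the same 0–100 scale used in the plot.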
Figure 3. Validation of rs12470652 identified by T-SMS. SMRT® View screenshot of secondary data analysis from SMRT® Portal (A and B). Validation by Sanger DNA sequencing of the heterozygous LHCGR rs12470652 variant discovered using T-SMS (C). The figure depicts the process of biomarker identification by SMS (A and B) and validation by conventional Sanger sequencing (C).