Xiaoge Guo, Kevin Lehner, Karen O'Connell, Jenny Zhang, Sandeep S Dave, Sue Jinks-Robertson.
Abstract
Single-molecule real-time (SMRT) sequencing generates much longer reads than other widely used next-generation (next-gen) sequencing methods, but its application to whole genome/exome analysis has been limited. Here, we describe the use of SMRT sequencing coupled with barcoding to simultaneously analyze one or a small number of genomic targets derived from multiple sources. In the budding yeast system, SMRT sequencing was used to analyze strand-exchange intermediates generated during mitotic recombination and to analyze genetic changes in a forward mutation assay. The general barcoding-SMRT approach was then extended to diffuse large B-cell lymphoma primary tumors and cell lines, where detected changes agreed with prior Illumina exome sequencing. A distinct advantage afforded by SMRT sequencing over other next-gen methods is that it immediately provides the linkage relationships between SNPs in the target segment sequenced. The strength of our approach for mutation/recombination studies (as well as linkage identification) derives from its inherent computational simplicity coupled with a lack of reliance on sophisticated statistical analyses.
Keywords: PacBio; SNP linkage; barcoded sequencing; rare variant; single-molecule sequencing
Year: 2015 PMID: 26497143 PMCID: PMC4683651 DOI: 10.1534/g3.115.023317
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. SMRT sequencing pipeline. Each amplicon was barcoded with unique forward and reverse primer pairs in a 96-well format. Complementary amplicon strands are dark and light blue bars; black and red bars correspond to forward and reverse barcodes, respectively, conjugated to target-specific primers (arrows). Following amplicon pooling, hairpin adapters (orange) were attached during SMRT library construction, converting linear into circular molecules. A single DNA polymerase (purple) reads each circular template in real time during the sequencing reaction. Circular consensus sequence (CCS) reads obtained from a SMRT cell were sorted by barcodes and aligned to a reference sequence.
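The barcode-sorting step in this pipeline can be sketched as follows. This is a minimal illustration, not the authors' software: it assumes each CCS read begins with an exact copy of a forward barcode and ends with the reverse complement of a reverse barcode, and it checks both orientations because a CCS read may represent either amplicon strand. The function names and any barcode names or sequences passed in are hypothetical; real barcode calling would also need to tolerate sequencing errors within the barcodes.

```python
from collections import defaultdict


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string (case preserved)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def demultiplex(reads, fwd_barcodes, rev_barcodes):
    """Group CCS reads by (forward, reverse) barcode name.

    fwd_barcodes / rev_barcodes map hypothetical barcode names to sequences.
    A read is assigned when, in either orientation, it starts with a forward
    barcode and ends with the reverse complement of a reverse barcode.
    Matches must be exact. Assigned reads are stored in forward orientation.
    """
    bins = defaultdict(list)
    for read in reads:
        for seq in (read, revcomp(read)):
            fwd = next((name for name, bc in fwd_barcodes.items()
                        if seq.startswith(bc)), None)
            rev = next((name for name, bc in rev_barcodes.items()
                        if seq.endswith(revcomp(bc))), None)
            if fwd is not None and rev is not None:
                bins[(fwd, rev)].append(seq)
                break
        else:  # neither orientation carried a complete barcode pair
            bins["unassigned"].append(read)
    return dict(bins)
```

Each (forward, reverse) pair then identifies one well, so reads from different sources can be pooled on one SMRT cell and separated afterwards.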
Figure 2. Mapping strand-exchange intermediates during recombination. (A) Model of double-strand break (DSB) repair that generates mismatched SNPs (blue paired with orange strand) on only one side of the initiating break. (B) SNP linkages in 123 CCS reads. SNPs are spaced at ∼50-bp intervals and are indicated by orange or blue squares. Isolated blue or black squares within orange regions represent random errors that matched the blue SNP or did not match either SNP, respectively. (C) Sum of orange/blue SNPs (circles) at each position. In all panels, the region of strand exchange where orange and blue strands pair is boxed.
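Panels B and C amount to simple counting over per-read SNP calls. The sketch below assumes each CCS read has already been reduced to a string with one character per SNP position: `O` for the orange allele, `B` for the blue allele and `X` for a base matching neither (a hypothetical encoding). Following the caption, a lone blue call inside an orange region is treated as a random error; the `min_run=2` threshold for calling a strand-exchange tract is an assumption, not a value taken from the study.

```python
def per_position_counts(reads):
    """Return (orange, blue) counts at each SNP position (cf. panel C).

    reads: equal-length strings, one character per SNP:
    'O' = orange allele, 'B' = blue allele, 'X' = neither (hypothetical encoding).
    """
    return [(column.count("O"), column.count("B")) for column in zip(*reads)]


def blue_tracts(read, min_run=2):
    """Return (first, last) SNP indices of consecutive-'B' runs of length >= min_run.

    Shorter blue runs (e.g. a lone 'B' inside an orange region) are treated
    as random errors; min_run=2 is an assumed threshold.
    """
    tracts, start = [], None
    for i, call in enumerate(read + "O"):  # sentinel 'O' closes a trailing run
        if call == "B":
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_run:
                tracts.append((start, i - 1))
            start = None
    return tracts
```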
Figure 3. CAN1 sequencing data. (A) Alignment of CCS reads for a CanR mutant harboring a 2-bp deletion. Dots and commas denote matches on the two complementary strands; letters are deviations from the reference sequence and asterisks are deletions. (B) Base-substitution error frequency at each position in the CAN1 amplicon. The enlarged inset compares error frequencies derived from two independent SMRT libraries, indicated by open and filled circles. See also Figure S1.
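The per-position error frequency in panel B can be computed by counting over alignment columns. The sketch below assumes each aligned CCS read is a string with one character per reference position, using the symbols from panel A (`.` or `,` for a match, a letter for a substituted base, `*` for a deletion); it is an illustration, not the authors' code. Excluding deletions from the denominator is one reasonable choice among several.

```python
def substitution_error_frequency(aligned_reads):
    """Per-position base-substitution frequency from aligned read strings.

    Each string has one character per reference position: '.' or ',' = match,
    a letter = substituted base, '*' = deletion. Deletions are left out of the
    denominator; a position with no covering base yields NaN.
    """
    freqs = []
    for column in zip(*aligned_reads):
        matches = sum(c in ".," for c in column)
        subs = sum(c.isalpha() for c in column)
        covered = matches + subs
        freqs.append(subs / covered if covered else float("nan"))
    return freqs
```

Comparing the resulting lists from two independent libraries position by position corresponds to the comparison shown in the inset.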
Comparison of SMRT sequences and Illumina exome sequences for MYD88 exon 5 (chromosome 3)
| DNA source | 19t (SMRT) | 19t (Illumina) | +58g (SMRT) | +58g (Illumina) | +1245a (SMRT) | +1245a (Illumina) | +1524t (SMRT) | +1524t (Illumina) |
|---|---|---|---|---|---|---|---|---|
| Tumor DLBCL773 | 8t + 65 | 8t + 22 | 73g | All g | 3a + 70 | 1 | 73t | No reads |
| Tumor DLBCL778 | 17t + 73 | 5t + 44 | 90g | All g | 90a | No reads | 90t | No reads |
| Tumor DLBCL799 | 26t + 29 | 16t + 21 | 37g + 18 | 5g | 55a | No reads | 55t | No reads |
| Tumor DLBCL816 | 88t + 19 | 8t + 3 | 107g | All g | 53a + 54 | No reads | 107t | No reads |
| Tumor DLBCL894 | 165t + 50 | 27t + 14 | 215g | All g | 315a | No reads | 315t | No reads |
| Tumor DLBCL832 | 75t | All t | 75g | All g | 75a | No reads | 75t | No reads |
| Cell line Ly3 | 79 | 2t + 10 | 79g | All g | 79 | No reads | 63t + 16 | No reads |
| Cell line Ly10 | 16t + 28 | 7t + 5 | 44g | All g | 16a + 28 | No reads | 44t | No reads |
| Cell line SKI | 310t | All t | 310g | All g | 151a + 159 | No reads | 310t | No reads |
| Cell line Karpas422 | 11t | All t | 11g | All g | 11a | No reads | 11t | 2t |
| Cell line Ly1 | 121t | All t | 121g | All g | 121a | No reads | 121t | No reads |
Bases deviating from the reference are uppercase bold. The variant position within exon 5 is numbered relative to the start of the exon. Variants detected downstream of exon 5 are designated “+” and numbered relative to the end of the exon. “No reads” in an Illumina column denotes that no sequence mapped to that region.
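The SMRT columns appear to report CCS read counts per base: for tumor DLBCL773, 8 + 65 = 73 reads at position 19, matching the 73g, 3a + 70 and 73t entries at the other three positions. In the sketch below, a term tagged with a lowercase reference base (e.g. `8t`) is read as reference-allele reads and a bare number as reads carrying the non-reference base; this reading and the function names are assumptions for illustration. The parser handles only SMRT cells, not Illumina entries such as "All g" or "No reads".

```python
import re


def parse_smrt_cell(cell):
    """Return (reference_reads, variant_reads) for a SMRT table cell.

    Assumed reading: '8t' = 8 reads with the reference base t;
    a bare number such as '65' = 65 reads with a non-reference base.
    """
    ref = var = 0
    for term in cell.split("+"):
        term = term.strip()
        m = re.fullmatch(r"(\d+)([acgt])?", term)
        if m is None:
            raise ValueError(f"unrecognized cell term: {term!r}")
        count = int(m.group(1))
        if m.group(2):
            ref += count
        else:
            var += count
    return ref, var


def variant_fraction(cell):
    """Fraction of CCS reads at a position that carry the non-reference base."""
    ref, var = parse_smrt_cell(cell)
    return var / (ref + var)
```

For example, `variant_fraction("8t + 65")` gives 65/73 ≈ 0.89, while a cell such as "73g" gives 0.0.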
Figure 4. SMRT sequence data from human lymphomas. (A) MYD88 exon 5 CCS reads from tumor DLBCL260. CCS reads with the t-g-a-t haplotype are shaded yellow. (B) Crossover event that results in four haplotypes for the EZH2 exon 16 sequence in tumor DLBCL799.
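Because each CCS read spans every SNP in the amplicon, linkage can be read off directly by tallying allele combinations per read, without statistical phasing. A minimal sketch, assuming per-read allele calls have already been extracted into dictionaries keyed by position label (a hypothetical input format):

```python
from collections import Counter


def count_haplotypes(read_alleles, positions):
    """Tally haplotypes such as 't-g-a-t' from per-read allele calls.

    read_alleles: one dict per CCS read mapping position label -> base.
    Reads lacking a call at any requested position are skipped.
    """
    tally = Counter()
    for alleles in read_alleles:
        if all(p in alleles for p in positions):
            tally["-".join(alleles[p] for p in positions)] += 1
    return tally
```

A crossover like the one in panel B would appear as four distinct keys in the resulting tally.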
Summary of CCS reads from lymphoma cell lines
Bases deviating from the reference are uppercase bold, and the site of the causative mutation in each exon is highlighted gray. Haplotypes are color-coded. A variant within a targeted exon is assigned a number based on its position relative to the start of the exon. Noncoding upstream and downstream variant positions are designated − and +, respectively.
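The position-numbering rule used in the table notes can be expressed as a small helper. The sketch assumes 1-based genomic coordinates, a gene on the plus strand, and that the first exon base is numbered 1, the first base after the exon +1 and the last base before it −1; these offsets are assumptions, since the notes do not state them, and the function name is hypothetical.

```python
def exon_relative_label(pos, exon_start, exon_end):
    """Label a 1-based genomic position relative to the exon [exon_start, exon_end].

    Assumes a plus-strand gene: exonic positions count from the exon start (1-based),
    downstream positions get '+' and count from the exon end, upstream positions
    get a minus sign (U+2212) and count back from the exon start.
    """
    if exon_start <= pos <= exon_end:
        return str(pos - exon_start + 1)
    if pos > exon_end:
        return "+" + str(pos - exon_end)
    return "\u2212" + str(exon_start - pos)
```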