Jörg A Bachmann, Andrew Tedder, Benjamin Laenen, Kim A Steige, Tanja Slotte.
Abstract
Rapid advances in short-read DNA sequencing technologies have revolutionized population genomic studies, but there are genomic regions where this technology reaches its limits. Limitations mostly arise due to the difficulties in assembly or alignment to genomic regions of high sequence divergence and high repeat content, which are typical characteristics for loci under strong long-term balancing selection. Studying genetic diversity at such loci therefore remains challenging. Here, we investigate the feasibility and error rates associated with targeted long-read sequencing of a locus under balancing selection. For this purpose, we generated bacterial artificial chromosomes (BACs) containing the Brassicaceae S-locus, a region under strong negative frequency-dependent selection which has previously proven difficult to assemble in its entirety using short reads. We sequence S-locus BACs with single-molecule long-read sequencing technology and conduct de novo assembly of these S-locus haplotypes. By comparing repeated assemblies resulting from independent long-read sequencing runs on the same BAC clone we do not detect any structural errors, suggesting that reliable assemblies are generated, but we estimate an indel error rate of 5.7×10⁻⁵. A similar error rate was estimated based on comparison of Illumina short-read sequences and BAC assemblies. Our results show that, until de novo assembly of multiple individuals using long-read sequencing becomes feasible, targeted long-read sequencing of loci under balancing selection is a viable option with low error rates for single nucleotide polymorphisms or structural variation. We further find that short-read sequencing is a valuable complement, allowing correction of the relatively high rate of indel errors that result from this approach.
Keywords: Brassicaceae; Capsella; assembly; bacterial artificial chromosomes; self-incompatibility locus; sequencing errors; single-molecule real-time sequencing
Year: 2018 PMID: 29476024 PMCID: PMC5873921 DOI: 10.1534/g3.117.300467
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Capsella S-locus sequencing summary
| CgrS-BAC1 | pb_126-1 | 178,980 | 2,690 | 56,575 | 19,187 | 11,836 | 156,636 |
| CgrS-BAC1 | pb_192-4 | 180,680 | 136 | 1,787 | 25,340 | 17,241 | 156,640 |
| CgrS-BAC2 | pb_274-14 | 164,087 | 160 | 1,421 | 28,120 | 20,433 | 153,560 |
Figure 1 (A) S-locus sequence assemblies, with two measures of indel errors indicated by black bars. Indel errors were inferred from a comparison of two independent SMRT sequencing runs and assemblies of CgrS-BAC1 (upper) and from alignment of Illumina short reads to the assembly of CgrS-BAC2 (lower). Annotated exons are shown as colored arrows and simple repeat sequences in red; blue boxes indicate the positions of transposable elements. The genes flanking the S-locus are ARK3 (light blue) and U-box (light green). SCR was only annotated in CgrS-BAC2. (B) S-locus sequence conservation between the two Capsella S-locus BACs, obtained by aligning the S-locus regions with LASTZ and computing sequence similarity (in %, from 0 to 100) using a fixed window size of 250 bp. Sequence similarity between CgrS-BAC1 and CgrS-BAC2 drops steeply at the borders of the S-locus, corresponding to the genes ARK3 and U-box, respectively, although some sequence similarity is also found at SRK. (C) Maximum-likelihood (ML) phylogeny of all alignable SRK alleles (exon 1) longer than 500 bp from GenBank. Bootstrap support over 70% is indicated with an asterisk (*). Our newly identified sequences, indicated with arrows, are broadly distributed across the phylogeny.
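The two comparisons described in the caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes the two sequences have already been aligned (for example with LASTZ) and are supplied as equal-length gapped strings with "-" for gaps. The function names, the use of non-overlapping windows, and the choice of alignment columns as the denominator of the indel rate are all assumptions made for illustration.

```python
def windowed_identity(aln_a, aln_b, window=250):
    """Percent identity in consecutive, non-overlapping windows of a
    pairwise alignment (window layout is an assumption of this sketch)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    values = []
    for start in range(0, len(aln_a), window):
        pairs = list(zip(aln_a[start:start + window],
                         aln_b[start:start + window]))
        # A column matches only if both bases are identical and not a gap.
        matches = sum(1 for x, y in pairs if x == y and x != "-")
        values.append(100.0 * matches / len(pairs))
    return values


def count_indel_events(aln_a, aln_b):
    """Count indel events, treating each run of consecutive gap columns
    in one sequence as a single event."""
    events = 0
    prev = 0  # 0 = no gap, 1 = gap in aln_a, 2 = gap in aln_b
    for x, y in zip(aln_a, aln_b):
        if x == "-" and y != "-":
            state = 1
        elif y == "-" and x != "-":
            state = 2
        else:
            state = 0
        if state and state != prev:
            events += 1
        prev = state
    return events


def indel_rate(aln_a, aln_b):
    """Indel events per alignment column (denominator is an assumption)."""
    return count_indel_events(aln_a, aln_b) / len(aln_a)


if __name__ == "__main__":
    # Toy alignment, not real S-locus data.
    a = "ACGT-ACGTAC"
    b = "ACGTTACG--C"
    print(windowed_identity(a, b, window=4))
    print(count_indel_events(a, b), indel_rate(a, b))
```

Applied to two independent assemblies of the same BAC clone, an event count of this kind gives a rough estimate of the indel error rate. Applied to two different haplotypes, the windowed values trace sequence conservation along the locus.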