Ou Wang1,2,3, Robert Chin4, Xiaofang Cheng1,2, Michelle Ka Yan Wu4, Qing Mao4, Jingbo Tang5, Yuhui Sun1,2, Radoje Drmanac1,2,4,5, Brock A Peters1,2,4,5, Ellis Anderson4, Han K Lam4, Dan Chen1,2, Yujun Zhou1,2, Linying Wang1,2, Fei Fan1,2, Yan Zou1,2, Yinlong Xie5, Rebecca Yu Zhang4, Snezana Drmanac4, Darlene Nguyen4, Chongjun Xu1,2,4, Christian Villarosa4, Scott Gablenz4, Nina Barua4, Staci Nguyen4, Wenlan Tian4, Jia Sophie Liu4, Jingwan Wang1,2, Xiao Liu1,2, Xiaojuan Qi1,2, Ao Chen1,2, He Wang1,2, Yuliang Dong1,2, Wenwei Zhang1,2, Andrei Alexeev4, Huanming Yang1,6, Jian Wang1,6, Karsten Kristiansen1,2,3, Xun Xu1,2.
Abstract
Here, we describe single-tube long fragment read (stLFR), a technology that enables sequencing of data from long DNA molecules using economical second-generation sequencing technology. It is based on adding the same barcode sequence to subfragments of the original long DNA molecule (DNA cobarcoding). To achieve this efficiently, stLFR uses the surface of microbeads to create millions of miniaturized barcoding reactions in a single tube. Using a combinatorial process, up to 3.6 billion unique barcode sequences were generated on beads, enabling practically nonredundant cobarcoding with 50 million barcodes per sample. Using stLFR, we demonstrate efficient unique cobarcoding of more than 8 million 20- to 300-kb genomic DNA fragments. Analysis of the human genome NA12878 with stLFR demonstrated high-quality variant calling and phase block lengths up to N50 34 Mb. We also demonstrate detection of complex structural variants and complete diploid de novo assembly of NA12878. These analyses were all performed using single stLFR libraries, and their construction did not significantly add to the time or cost of whole-genome sequencing (WGS) library preparation. stLFR represents an easily automatable solution that enables high-quality sequencing, phasing, SV detection, scaffolding, cost-effective diploid de novo genome assembly, and other long DNA sequencing applications.
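The claim of "practically nonredundant" cobarcoding can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only: it assumes each of the 50 million barcodes in a sample is drawn uniformly and independently from a pool of 3.6 billion, which the abstract does not state, and the function name is hypothetical.

```python
import math

def shared_barcode_probability(n_beads: int, n_barcodes: int) -> float:
    """Probability that one bead's barcode is also carried by at least one
    other bead, ASSUMING uniform, independent barcode draws (an assumption
    made for illustration, not a statement from the paper)."""
    # (1 - 1/N)^(n-1) is the chance that none of the other beads match;
    # log1p/expm1 keep precision when 1/N is tiny.
    log_no_match = (n_beads - 1) * math.log1p(-1.0 / n_barcodes)
    return -math.expm1(log_no_match)

N_BARCODES = 3_600_000_000  # "up to 3.6 billion unique barcode sequences"
N_BEADS = 50_000_000        # "50 million barcodes per sample"

p_shared = shared_barcode_probability(N_BEADS, N_BARCODES)
print(f"Probability a bead shares its barcode: {p_shared:.4f}")
```

Under these assumptions, only about 1.4% of beads would share a barcode with another bead, which is consistent with the abstract's description.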
Year: 2019 PMID: 30940689 PMCID: PMC6499310 DOI: 10.1101/gr.245126.118
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
stLFR equipment and reagent cost
Figure 1. Overview of stLFR. (A) The first step of stLFR involves inserting a hybridization sequence approximately every 200–1000 bp on long genomic DNA molecules. This is achieved using transposons. The transposon-integrated DNA is then mixed with beads that each contain ∼400,000 copies of an adapter sequence that contains a unique barcode shared by all adapters on the bead, a common PCR primer site, and a common capture sequence that is complementary to the sequence on the integrated transposons. After the genomic DNA is captured to the beads, the transposons are ligated to the barcode adapters. There are a few additional library processing steps, and then the cobarcoded subfragments are sequenced on a BGISEQ-500 or equivalent sequencer. (B) Mapping read data by barcode results in clustering of reads within 10- to 350-kb regions of the genome. Total coverage and barcode coverage from four barcodes are shown for the 1-ng stLFR-1 library across a small region on Chromosome 11. Most barcodes are associated with only one read cluster in the genome. (C) The number of original long DNA fragments per barcode is plotted for the 1-ng libraries stLFR-1 (blue) and stLFR-2 (orange) and the 10-ng stLFR libraries stLFR-3 (yellow) and stLFR-4 (gray). More than 80% of the fragments from the 1-ng stLFR libraries are cobarcoded by a single unique barcode. (D) The fractions of each original long DNA fragment covered by nonoverlapping sequence reads (blue) and by captured subfragments (orange) are plotted for the 1-ng stLFR-1 library.
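The read clustering in panel B and the fragments-per-barcode count in panel C can be illustrated with a small sketch. It groups mapped reads by barcode and starts a new inferred fragment whenever consecutive reads lie farther apart than a gap threshold. The function name, the input tuple layout, and the 50-kb default gap are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import defaultdict

def infer_fragments(reads, max_gap=50_000):
    """Group reads by barcode and split them into inferred long fragments.

    reads: iterable of (barcode, chrom, pos) tuples.
    max_gap: hypothetical threshold; a larger distance between neighboring
             reads of the same barcode starts a new fragment.
    Returns {barcode: [(chrom, start, end, n_reads), ...]}.
    """
    positions_by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        positions_by_key[(barcode, chrom)].append(pos)

    fragments = defaultdict(list)
    for (barcode, chrom), positions in positions_by_key.items():
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current fragment, open a new one.
                fragments[barcode].append((chrom, start, prev, count))
                start, count = pos, 0
            prev = pos
            count += 1
        fragments[barcode].append((chrom, start, prev, count))
    return dict(fragments)

# Toy input (made-up coordinates, not data from the paper).
toy_reads = [
    ("bc1", "chr11", 1_000),
    ("bc1", "chr11", 20_000),
    ("bc1", "chr11", 900_000),
    ("bc2", "chr11", 5_000),
]
toy_fragments = infer_fragments(toy_reads)
fragments_per_barcode = {bc: len(frags) for bc, frags in toy_fragments.items()}
print(fragments_per_barcode)
```

In this toy input, `bc1` splits into two inferred fragments and `bc2` yields one. A histogram of `fragments_per_barcode` is the kind of summary shown in panel C.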
Variant calling statistics
Figure 2. stLFR-1 phasing performance. The 221 phased blocks from the stLFR-1 library are depicted on chromosomes as alternating colors of gray and purple. Unphased regions are depicted in white. The inset table shows the performance of phasing with different sequence read coverage levels.
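Phase block contiguity is commonly summarized as an N50, the block length at which blocks of that size or larger together contain at least half of the total phased length. A minimal sketch of that standard definition follows. The function name and the toy lengths are illustrative, and the paper's own computation may differ in detail.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    Blocks are visited from longest to shortest; the N50 is the length of
    the block at which the running total first reaches half of the sum.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")

# Made-up block lengths (arbitrary units), not values from the paper.
toy_blocks = [10, 8, 5, 3, 2]
print(n50(toy_blocks))
```

For the toy list, the total is 28 and the two longest blocks (10 + 8) already cover more than half of it, so the function returns 8.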
Figure 3. SV detection. (A) Previously reported deletions in NA12878 were also found using stLFR data. Heat maps of barcode sharing for each deletion can be found in Supplemental Figure S3. (B) A heat map of barcode sharing within windows of 2 kb for a region with a ∼150-kb heterozygous deletion on Chromosome 8 was plotted using a Jaccard index as previously described (Zhang et al. 2017). Regions of high overlap are depicted in dark red; those with no overlap are depicted in beige. Arrows demonstrate how regions that are spatially distant from each other on Chromosome 8 have increased overlap, marking the locations of the deletion. (C) Cobarcoded reads are separated by haplotype and plotted by unique barcode on the y-axis and Chromosome 8 position on the x-axis. The heterozygous deletion is found in a single haplotype. Heat maps were also plotted for overlapping barcodes between Chromosomes 5 and 12 for a patient cell line with a known translocation (Dong et al. 2016) (D) and for GM20759, a cell line with a known transversion in Chromosome 2 (Dong et al. 2017) (E).
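The barcode-sharing heat map in panel B rests on a simple idea: bin reads into fixed-size windows, collect the set of barcodes seen in each window, and score each pair of windows by the Jaccard index of their barcode sets. The sketch below uses the textbook Jaccard index. The exact procedure of Zhang et al. (2017) is not given here, so the function names, input layout, and toy data are assumptions for illustration.

```python
from collections import defaultdict

def window_barcodes(reads, window=2_000):
    """Map window index -> set of barcodes with a read in that window.

    reads: iterable of (barcode, pos) tuples on a single chromosome.
    window: bin size in bp (2 kb, matching the caption).
    """
    sets = defaultdict(set)
    for barcode, pos in reads:
        sets[pos // window].add(barcode)
    return dict(sets)

def jaccard(a, b):
    """Textbook Jaccard index |A ∩ B| / |A ∪ B|; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Toy reads (made-up): bc1 and bc2 span two distant windows, as would be
# expected for molecules that bridge a deletion.
toy_reads = [
    ("bc1", 500), ("bc2", 900), ("bc3", 1_500),
    ("bc1", 150_200), ("bc2", 150_900), ("bc4", 151_000),
]
bins = window_barcodes(toy_reads)
score = jaccard(bins[0], bins[75])
print(score)
```

Here windows 0 and 75 share two of four distinct barcodes, so the score is 0.5. Elevated scores between distant windows are the off-diagonal signal that the arrows in panel B point to.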
NA12878 de novo assembly statistics
Figure 4. Dot plots of de novo–assembled NA12878. The scaffolds from the de novo assemblies of stLFR-1 (A) and stLFR-2 (B) were compared against chromosomes from GRCh38 using dot plots.