Zhoutao Chen, Long Pham, Tsai-Chin Wu, Guoya Mo, Yu Xia, Peter L Chang, Devin Porter, Tan Phan, Huu Che, Hao Tran, Vikas Bansal, Justin Shaffer, Pedro Belda-Ferre, Greg Humphrey, Rob Knight, Pavel Pevzner, Son Pham, Yong Wang, Ming Lei.
Abstract
Long-range sequencing information is required for haplotype phasing, de novo assembly, and structural variation detection. Current long-read sequencing technologies can provide valuable long-range information but at a high cost with low accuracy and high DNA input requirements. We have developed a single-tube Transposase Enzyme Linked Long-read Sequencing (TELL-seq) technology, which enables a low-cost, high-accuracy, and high-throughput short-read second-generation sequencer to generate over 100 kb of long-range sequencing information with as little as 0.1 ng input material. In a PCR tube, millions of clonally barcoded beads are used to uniquely barcode long DNA molecules in an open bulk reaction without dilution and compartmentation. The barcoded linked-reads are used to successfully assemble genomes ranging from microbes to human. These linked-reads also generate megabase-long phased blocks and provide a cost-effective tool for detecting structural variants in a genome, which are important to identify compound heterozygosity in recessive Mendelian diseases and discover genetic drivers and diagnostic biomarkers in cancers.
Year: 2020 PMID: 32540955 PMCID: PMC7370886 DOI: 10.1101/gr.260380.119
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Figure 1. Overview of TELL-seq library workflow and structure. (A) Diagram of TELL-seq library preparation procedure. In a 0.2-mL PCR tube, 0.1 ng to 5 ng genomic DNA was mixed with 3–10 million barcoded TELL beads and transpososomes for the clonal barcoding reaction. Genomic DNA fragments were captured on the barcoded TELL beads via connecting strand transfer complexes (STCs) to barcode oligos on the bead surface. A tagging between STCs by a second transpososome introduced a second priming site for library amplification. After breaking the STCs and washing the magnetic TELL beads, sequencing library molecules were amplified off beads with P5 and P7 adaptor sequences incorporated at the same time. The total library procedure took ∼3 h. (B) TELL-seq library structure for Illumina sequencing systems. Index 1 comprises an 18-base TELL-seq molecular barcode; Index 2 comprises an 8-base barcode for sample indexing.
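The library structure in panel B implies that reads sharing the same 18-base Index 1 barcode (within one 8-base Index 2 sample) can be treated as linked. The following is a minimal sketch of that grouping step only, not the authors' pipeline. The function name and the `(read_id, index1, index2)` record layout are hypothetical; only the two index lengths come from the caption.

```python
from collections import defaultdict

TELL_BARCODE_LEN = 18  # Index 1 length, as stated in the Figure 1B caption
SAMPLE_INDEX_LEN = 8   # Index 2 length, as stated in the Figure 1B caption


def group_reads_by_barcode(records):
    """Group read IDs by (sample index, molecular barcode).

    `records` is a hypothetical iterable of (read_id, index1, index2)
    tuples. Records whose index lengths do not match the caption's
    18-base / 8-base layout are skipped rather than corrected.
    """
    groups = defaultdict(list)
    for read_id, index1, index2 in records:
        if len(index1) != TELL_BARCODE_LEN or len(index2) != SAMPLE_INDEX_LEN:
            continue  # malformed index read; ignored in this sketch
        groups[(index2, index1)].append(read_id)
    return dict(groups)


# Toy input: two reads share a barcode, one has a truncated Index 1.
example = [
    ("read1", "ACGTACGTACGTACGTAC", "AAAACCCC"),
    ("read2", "ACGTACGTACGTACGTAC", "AAAACCCC"),
    ("read3", "TTTTTTTTTTTTTTTTTT", "AAAACCCC"),
    ("read4", "ACGT", "AAAACCCC"),
]
linked = group_reads_by_barcode(example)
```

A real workflow would also need barcode error correction and would split reads with the same barcode into separate molecules by mapping distance; both are outside what the caption describes.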
Comparison of de novo assembly results of E. coli K12 MG1655 using sequencing data from Illumina standard fragment library, TELL-seq Illumina library (with different assemblers), and Oxford Nanopore R10.3 chemistry
Summary of de novo assembly results using TuringAssembler on bacterial samples
Figure 2. TELL-seq linked-read molecule analyses. (A) Calculated molecule length based on the TELL-seq sequencing data from microbial samples. To compare results from different microbial samples, the calculated DNA mass was normalized as follows: (Individual DNA mass at specified molecule length/Total DNA mass of the microbial sample) × 1000. (B) Calculated molecule length based on the TELL-seq sequencing data from human cell line samples. To compare results from different samples, the calculated DNA mass was normalized as follows: (Individual DNA mass at specified molecule length/Total DNA mass of the cell line sample) × 1000. (C) Distribution of linked-read sequencing coverage per molecule. Average sequencing coverage per molecule was 13%, 10%, 18%, 24%, and 14% for E. coli DH10B (0.5 ng genomic DNA input for library prep), E. coli DH10B (0.1 ng), E. coli K12 MG1655 (0.5 ng), C. jejuni (0.5 ng), and R. sphaeroides (0.1 ng), respectively.
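The normalization used in panels A and B can be written out as a short sketch. The function name and the dictionary layout (molecule length mapped to calculated DNA mass) are assumptions for illustration, and the input values are made up; only the formula (individual mass / total mass × 1000) comes from the caption.

```python
def normalize_dna_mass(mass_by_length):
    """Scale per-length DNA mass so that each sample sums to 1000.

    `mass_by_length` is a hypothetical mapping from molecule length
    (any hashable bin label) to calculated DNA mass in arbitrary units.
    Implements: (individual mass / total mass of the sample) * 1000.
    """
    total = sum(mass_by_length.values())
    if total <= 0:
        raise ValueError("total DNA mass must be positive")
    return {
        length: mass * 1000.0 / total
        for length, mass in mass_by_length.items()
    }


# Made-up masses for three length bins (kb); not data from the paper.
sample = {10: 2.0, 50: 6.0, 100: 2.0}
normalized = normalize_dna_mass(sample)
```

Because every sample is rescaled to the same total, curves from samples with different input amounts can be drawn on one axis.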
Summary of TELL-seq phasing results on NA12878 and NA24385 samples
Figure 3. Diagram of phased heterozygous SNVs on nine HLA genes. The major histocompatibility complex region in the NA12878 sample was phased into two complete phasing blocks: one for the maternal haplotype (orange), another for the paternal haplotype (blue). Compared with the HLA reference on nine well-characterized genes, SNVs with switch errors are shown in the opposite color on each haplotype in the HLA-A, HLA-DRB1, and HLA-DQA1 genes.
Summary of TELL-seq phasing results on nine HLA genes in comparison with the reference data
Comparison of 10 large deletion calls reported by different linked-read methods from 10x, Illumina's CPTv2-seq, and stLFR
Figure 4. Detection of structural variations in NA12878. (A) Phased read graph from TELL-seq data showed a 19-kb homozygous deletion (Hom) within a 114-kb heterozygous deletion (Het) on Chromosome 3: 162,512,134–162,626,335. For the same location, a 10x study (Zheng et al. 2016) only reported a 114-kb heterozygous deletion, whereas an stLFR study (Wang et al. 2019) only identified a 19-kb homozygous deletion. However, for 10x data, we confirmed the presence of the 19-kb homozygous deletion when we manually examined the data. GRCh38 (hg38) coordinates were used for visualization data. (B) Heat map of the same region from TELL-seq data clearly showed the presence of both the small homozygous deletion and the large heterozygous deletion.