Zirui Dong1,2, Xia Zhao3,4,5, Qiaoling Li3,4,5, Zhenjun Yang1,2, Yang Xi3,4,5, Andrei Alexeev6, Hanjie Shen3,4,5, Ou Wang3,4, Jie Ruan3,4, Han Ren3,4, Hanmin Wei5, Xiaojuan Qi3,4, Jiguang Li3,4,5, Xiaofan Zhu1,2, Yanyan Zhang5, Peng Dai7, Xiangdong Kong7, Killeen Kirkconnell6, Oleg Alferov6, Shane Giles6, Jennifer Yamtich6, Bahram G Kermani6, Chao Dong3,4, Pengjuan Liu3,4,5, Zilan Mi3,4, Wenwei Zhang3,4,8, Xun Xu3,4,9, Radoje Drmanac3,4,5,6, Kwong Wai Choy1,2,10, Yuan Jiang6.
Abstract
The diversity of disease presentations warrants a single assay for the detection and delineation of various genomic disorders. Herein, we describe a gel-free and biotin-capture-free mate-pair method based on coupling Controlled Polymerizations by Adapter-Ligation (CP-AL). We first demonstrated the feasibility and ease of use of monitoring DNA nick translation and primer extension by limiting the nucleotide input. By coupling these two controlled polymerizations through a previously reported non-conventional adapter-ligation reaction, 3′-branch ligation, we showed that CP-AL significantly increased DNA circularization efficiency (by 4-fold) and was applicable to different sequencing methods at a fraction of the current cost. Its advantages were further demonstrated by the elimination of small-insert contamination (a 39.3-fold reduction) with a ∼50% increase in physical coverage, and by uniform genome/exome coverage and the lowest chimeric rate. It achieved single-nucleotide variant detection with sensitivity and specificity of up to 97.3% and 99.7%, respectively, compared with data from small-insert libraries. In addition, this method can provide a comprehensive delineation of structural rearrangements, as evidenced by a potential diagnosis in a patient with oligo-astheno-terato-spermia. Moreover, it enables accurate mutation identification by integrating genomic variants from different aberration types. Overall, it provides a potential single integrated solution for detecting various genomic variants, facilitating genetic diagnosis in human diseases.
Keywords: adapter-ligation; controlled polymerization; integrated platform; mate-pair sequencing
Year: 2019 PMID: 31173071 PMCID: PMC6704401 DOI: 10.1093/dnares/dsz011
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Method comparison
| Method | Reagent cost/sample (USD) | Size selection (i.e. gel) | Biotin label and enrichment | Turn-around-time (days) | DNA input amount (µg) | Chimeric rate (%) | Adapter-contaminated reads (%) | Longest read-length (bp) |
|---|---|---|---|---|---|---|---|---|
| BLBEC | 350 | Optional | Yes | 3 | 3\|20 | ∼8.7 | NA | >100 |
| Custom Jumping Library | 188 | Yes | Yes | 3 | 5∼20 | — | NA | ∼26 |
| SOLiD | ∼330 | Yes | Yes | 2 | 1∼5 | 24.0–58.4 | NA | 60 |
| Nextera | ∼90\|∼350 | Optional | Yes | 1.5∼2 | 1\|4 | ∼3.0 | 23.1∼23.7 | >100 |
| CP-AL | ∼40 | No | No | 2.5 | 1 | ∼2.9 | — | >100 |
Library construction methods: BLBEC (biotin-labeled blunt-end circularization); custom jumping library; SOLiD mate-pair library construction; Nextera mate-pair library construction; and CP-AL (the mate-pair method coupling Controlled Polymerizations with a non-conventional Adapter-Ligation), demonstrated in this study.
Reagent costs were obtained from a published study (BLBEC and Custom Jumping Library) and from supplier websites: Nextera, 4,233 USD for 48 gel-free samples or for 12 gel-plus samples; SOLiD, ∼4,000 USD for 12 samples.
Adapter-contaminated reads were identified by searching for the Tn5 sequence in data with read lengths of at least 100 bp.
The numbers before and after the vertical line reflect DNA input for non-size-selected and size-selected library construction, respectively.
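The per-sample reagent costs listed for Nextera and SOLiD appear to follow directly from the kit prices in the footnote. A minimal sketch of that arithmetic, assuming the cost is simply the kit price divided by the number of libraries the kit yields (the function name `cost_per_sample` is illustrative and not from the study):

```python
# Kit prices and sample counts as quoted in the table footnote.
NEXTERA_KIT_USD = 4233
SOLID_KIT_USD = 4000  # the footnote gives this figure as approximate


def cost_per_sample(kit_price_usd, samples_per_kit):
    """Reagent cost per library, assuming the whole kit is consumed."""
    if samples_per_kit <= 0:
        raise ValueError("samples_per_kit must be positive")
    return kit_price_usd / samples_per_kit


nextera_gel_free = cost_per_sample(NEXTERA_KIT_USD, 48)
nextera_gel_plus = cost_per_sample(NEXTERA_KIT_USD, 12)
solid = cost_per_sample(SOLID_KIT_USD, 12)

for label, cost in [
    ("Nextera, gel-free", nextera_gel_free),
    ("Nextera, gel-plus", nextera_gel_plus),
    ("SOLiD", solid),
]:
    print(f"{label}: {cost:.0f} USD per sample")
```

Rounded to whole dollars these come to roughly 88, 353 and 333 USD, consistent with the ∼90, ∼350 and ∼330 entries in the table.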
Figure 1. Schematic representation of mate-pair library construction by coupling controlled polymerizations with a non-conventional adapter-ligation (CP-AL). The first adapter (Ad1, blue) with/without a barcode sequence (indicated by grey bar) is ligated to genomic DNA fragments (black lines). Various adapter designs may be used as the first adapter [e.g. blunt-end adapters, Y-shaped Illumina adapters, bubble adapters (Supplementary Fig. S2), etc.]. After Ad1 ligation and PCR, DNA ends are ligated together to form dsCirs containing a gap. naCNT is performed and results in DNA polymerization (dashed lines) and the movement of the gap into a selected length of the genomic DNA. 3′-branch ligation is used to ligate a 3′-end of the second adapter (Ad2_3′, yellow). The two strands of the dsCir are separated, and the single-stranded DNA (ssDNA) with Ad2_3′ at the 3′-end is used as a template for CPE (ntCPE or ttCPE). The Ad2_5′ sequence (green) is added to the 3′-end of the CPE product through 3′-branch ligation. After PCR, this results in genomic DNA with half of Ad2 at each end separated by Ad1.
Performance comparison with paired-end 50 or 26 bp reads
| Sample/data | Karyotype | Results from whole-genome sequencing | Library construction method and sequencing platform | DNA input (µg) | Chimeric rate (%) | Duplication rate (%) | Aligned distance <1 kb (%) | Aligned distance <1 kb (% uniquely aligned and non-duplicated) | Fraction of genome physically covered (%) |
|---|---|---|---|---|---|---|---|---|---|
| BCA01 | 46,XX,t(2;20)(p13;p13) | 46,XX,t(2;20)(p15;p13) | BLBEC+HS | 3 | 9.04 | 15.7 | 27.6 | 5.2 | 98.6\|91.1 |
| BCA02 | 46,XX,t(2;3)(p21;p25) | 46,XX,t(2;3)(p22.1;p26.1) | BLBEC+HS | 3 | 10.92 | 15.9 | 22.3 | 3.9 | 98.5\|93.7 |
| BCA11 | 46,XX,t(4;5; 14)(q31.3;q33; q13) | 46,XX,t(4;5;14)(q32.3;q34;q21.3) | BLBEC+HS | 3 | 6.70 | 12.5 | 29.7 | 6.0 | 96.7\|91.9 |
| BCA16 | 46,XX,t(14;18)(q24;q23) | 46,XX,t(14;18)(q13.2;q22.1) | BLBEC+HS | 3 | 8.04 | 12.0 | 30.5 | 8.0 | 96.7\|91.2 |
| Sample01 | 46,XX,ins(10;13)(q11.2;q31q33) | 46,XX,ins(10;13)(q21.3;q21.2q31.3) | CP-AL+BS | 1 | 2.84 | 16.8 | 0.6 | 0.1 | 97.7\|96.6 |
| Sample02 | 46,XY,ins(6;2)(q23;p13p22) | 46,XY,ins(6;2)der(6)(6pter->6q13:: 6q13->6q21.3:: 2p16.1<-2p16.1:: 6q21->6q22.31:: 2p16.1->2p11.2:: 2p22.2->2p16.1:: 6q22.31->6q26:: 6q13<-6q13:: 6q26->6qter); der(2)(2pter->2p22.2:: 2p11.2->2qter) | CP-AL+BS | 1 | 2.82 | 17.8 | 0.8 | 0.2 | 96.2\|94.9 |
| Sample03 | 46,XX,ins(2;18)(q31;q21.1q23) | 46, XX, ins(2; 18)(q32.2; q21.1q22.1); inv(18)(q21.1) | CP-AL+BS | 1 | 3.52 | 17.5 | 0.5 | 0.1 | 98.0\|97.0 |
| Sample04 | 46,XY, der(4)ins(4)(q21.1;q31.1q31.3)inv(4)(p12q13.3) | 46,XY,der(4)(pter->p12:: q13.3<-p12:: q31.1->q31.23:: q13.3->q31.1:: q31.23->qter) | CP-AL+BS | 1 | 2.74 | 19.0 | 0.9 | 0.4 | 96.3\|95.0 |
| Sample05 | 46,XY,ins(6;3)(q13; q21q24) | 46,XY,ins(6;3)der(6)(6pter->6q14.3:: 3q21.1<-3q21.1:: 3q24<-3q21.3:: 3q21.1->3q21.3:: 6q14.3->6qter); der(3)(3pter->3q21.1:: 3q21.1->3q21.1:: 3q24->3qter)seq[hg19] del(3q24)chr3: g.146055006_148300124del | CP-AL+BS | 1 | 2.87 | 18.9 | 0.7 | 0.2 | 96.1\|95.0 |
| Sample06 | 46,XX | 46,XX | CP-AL+BS | 1 | 2.81 | 18.9 | 0.7 | 0.2 | 98.1\|97.0 |
BLBEC and CP-AL indicate library construction based on biotin-labeled blunt-end circularization and through coupling Controlled Polymerizations with Adapter-Ligation, respectively; HS and BS refer to HiSeq 2000 (Illumina) and BGISEQ-500 (BGI-Wuhan), respectively.
Chimeric read-pairs were defined as the read-pairs aligned to different chromosomes or to the same chromosomes but with a distance >10 kb.
Sequenced with non-size-selected BLBEC libraries, published in our pilot study.
The numbers before and after each vertical line reflect the fraction of the genome physically covered by the data with paired-end 50 bp reads and with reads trimmed to 26 bp, respectively.
Sequenced with CP-AL libraries; the detection results shown here are based on the data with 100 bp reads.
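The chimeric-rate definition in the footnotes above (read-pairs aligned to different chromosomes, or to the same chromosome more than 10 kb apart) can be sketched in code. The example below is a hedged illustration only: the `ReadPair` type, the function names, and the choice to measure distance between the two alignment start coordinates are assumptions, and any upstream filtering used in the study (for example unique alignment or duplicate removal) is not modelled.

```python
from dataclasses import dataclass

# ">10 kb" threshold taken from the table footnote.
CHIMERIC_DISTANCE_BP = 10_000


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical minimal record for an aligned mate-pair."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def is_chimeric(pair, max_distance=CHIMERIC_DISTANCE_BP):
    """True if mates hit different chromosomes or lie > max_distance apart."""
    if pair.chrom1 != pair.chrom2:
        return True
    # Assumption: distance is measured between alignment start positions.
    return abs(pair.pos2 - pair.pos1) > max_distance


def chimeric_rate(pairs):
    """Percentage of read-pairs classified as chimeric."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return 100.0 * sum(is_chimeric(p) for p in pairs) / len(pairs)


example_pairs = [
    ReadPair("chr2", 1_000, "chr2", 4_000),    # 3 kb apart: not chimeric
    ReadPair("chr2", 1_000, "chr2", 16_000),   # 15 kb apart: chimeric
    ReadPair("chr2", 1_000, "chr20", 1_000),   # different chromosomes: chimeric
    ReadPair("chr6", 5_000, "chr6", 15_000),   # exactly 10 kb: not chimeric
]
print(f"Chimeric rate: {chimeric_rate(example_pairs):.1f}%")
```

The strict inequality mirrors the ">10 kb" wording, so a pair exactly 10 kb apart is not counted as chimeric in this sketch.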
Evaluation of different parameters/conditions for mate-pair library construction
| Library | Nick translation | Primer extension | Autosomal exome by GC with <60% of mean coverage (%) | Fraction of the genome fully called (%) | Fraction of the coding region fully called (%) |
|---|---|---|---|---|---|
| #1 | PolI | ttCPE by | 6.6 | 95.0 | 95.9 |
| #2 | PolI+2xAT | ttCPE by | 1.8 | 96.2 | 97.8 |
| #3 | PolI+2xAT | naCPE by | 0.7 | 96.7 | 98.1 |
| #4 | PolI/ | naCPE by | 1.4 | 96.1 | 97.8 |
ttCPE and naCPE refer to controlled primer extension achieved by adjusting the reaction temperature and duration, or by limiting the nucleotide input only, respectively.
% Autosomal exome by GC with <60% of mean coverage measures the fraction of the autosomal exome regions where the normalized coverage of exome by cumulative base GC percentage is <60% of the mean coverage. Higher percentage indicates higher GC bias in the autosomal exome regions.
Fully called genome and exome fractions: positions at which both alleles can be confidently called by Trait-o-Matic software.
Additional 2-fold of dATP and dTTP relative to dGTP and dCTP.
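The GC-bias metric defined in the footnotes above can be illustrated with a small calculation. This is a simplified reading rather than the study's pipeline: it assumes exome regions are pooled into 1% GC bins, each bin's mean coverage is compared with the overall mean, and the reported value is the share of exome bases falling in bins below 60% of that mean. The input format and the function name `low_coverage_gc_fraction` are hypothetical.

```python
from collections import defaultdict


def low_coverage_gc_fraction(regions, threshold=0.6, bin_width=1):
    """Percentage of bases in GC bins whose coverage is < threshold * mean.

    regions: iterable of (gc_percent, mean_coverage, n_bases) tuples,
    one per exome region (hypothetical input format).
    """
    regions = [(gc, cov, n) for gc, cov, n in regions if n > 0]
    total_bases = sum(n for _, _, n in regions)
    if total_bases == 0:
        return 0.0

    # Base-weighted mean coverage across all regions.
    mean_cov = sum(cov * n for _, cov, n in regions) / total_bases
    if mean_cov == 0:
        return 0.0

    # Pool regions into GC bins and accumulate base-weighted coverage.
    bin_bases = defaultdict(int)
    bin_cov_sum = defaultdict(float)
    for gc, cov, n in regions:
        gc_bin = int(gc // bin_width)
        bin_bases[gc_bin] += n
        bin_cov_sum[gc_bin] += cov * n

    low_bases = sum(
        n_bases
        for gc_bin, n_bases in bin_bases.items()
        if bin_cov_sum[gc_bin] / n_bases < threshold * mean_cov
    )
    return 100.0 * low_bases / total_bases


# Toy data: (GC %, coverage, bases). Only the 75% GC bin is under-covered.
toy_regions = [(30, 40.0, 100), (50, 50.0, 800), (75, 10.0, 100)]
print(f"{low_coverage_gc_fraction(toy_regions):.1f}% of exome bases under-covered")
```

In the toy data the overall mean coverage is 45x, so the cutoff is 27x; only the high-GC bin falls below it, giving 10% of bases. A lower percentage corresponds to less GC bias, which is how the table values are read.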
Figure 2. GC concordance and insert-size distribution of data from CP-AL. (A–F) The correlation of GC percentage between the human reference genome and sequencing reads in Sample01 to Sample06, respectively. The detailed method is described in Materials and methods. Each point indicates a particular window with the GC percentage (%) determined from the sequencing reads (X-axis) and from the human reference genome (Y-axis). The dotted line shows the ideal 100% correlation. The Pearson correlation coefficient with P-value is shown on the left side of each panel. (G) The size distributions of four samples prepared with BLBEC and six prepared with CP-AL, respectively. Each sample was non-size-selected and sequenced with paired-end 50 bp reads. The X-axis shows the insert size, and the Y-axis indicates the percentage of paired-end reads with a given insert size. The panel on the left side with the red dotted frame is the zoomed-in view. (H) Physical coverage distribution for the data from different library construction methods (BLBEC and CP-AL) for 50 million read-pairs per sample.
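The GC concordance shown in panels A–F compares, window by window, the GC percentage of the reads with that of the reference and summarizes the agreement with a Pearson correlation coefficient. The sketch below illustrates that comparison only; the windowing scheme and read-to-window assignment are described in the study's Materials and methods and are not reproduced here, and the function names and toy values are assumptions.

```python
import math


def gc_percent(seq):
    """GC percentage of a sequence, ignoring non-ACGT characters such as N."""
    seq = seq.upper()
    called = sum(seq.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / called


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy per-window GC percentages (illustrative values, not study data).
read_gc = [35.0, 42.0, 50.0, 61.0]       # from reads aligned to each window
reference_gc = [36.0, 41.0, 52.0, 60.0]  # from the reference sequence
print(f"Pearson r = {pearson_r(read_gc, reference_gc):.3f}")
```

Points lying on the dotted diagonal in the figure correspond to windows where the two GC percentages are equal; a coefficient close to 1 indicates that the library shows little GC-dependent distortion.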
Figure 3. Detection of chromosomal structural rearrangements. (A) Spectrum of chromosomal structural rearrangements. Each sample is indicated by linkages in one particular colour, with the event stated next to the linkages. All events are illustrated by the data from CP-AL with paired-end 100 bp sequencing and were validated by Sanger sequencing. The events missed by the data from the same sample using shorter read lengths are indicated by red arrows (26 bp) and blue arrows (50 bp, outermost), respectively. Chromosomal nucleotide positions and bands are shown according to the University of California, Santa Cruz Genome Viewer Table Browser. The compositions of the original chromosomes and the derivative chromosomes illustrated by the sequencing data from CP-AL with paired-end 100 bp reads in Sample03 and Sample05 are shown in (B) and (C), respectively. Each DNA/chromosome segment is shown in a different colour with an arrow indicating the genomic orientation.
Figure 4. Detection of structural rearrangements in Sample02, a male subject with oligo-astheno-terato-spermia. (A) The compositions of the normal chromosomes (green dotted frame) and the derivative chromosomes (red dotted frame) illustrated by the sequencing data from CP-AL with paired-end 100 bp reads in Sample02. Each DNA/chromosome segment is shown in a different colour with an arrow indicating the genomic orientation. (B) Visualization (http://www.kobic.kr/3div/) of the interactions between the gene SLC17A5 and other locations in PrEC (normal prostate epithelial cells; BglII), the most relevant cell line in the reported database. Bi. Distribution of topologically associated domains (blue triangles); Bii. Distributions of distance-normalized interaction frequency and bias-removed interaction frequency; Biii. Presence of super-enhancers; Biv. Distribution of genes (RefSeq). The disruption location is indicated by a red line (SLC17A5 is indicated by a red arrow), while the location of the topologically associated domain containing SLC17A5 is shown by orange dotted lines. The black arrow and green indicate the super-enhancer of the gene EEF1A1 and the gene itself, respectively.
Figure 5. Whole-genome analysis of genomic variants in a sample with trisomy 2. The distributions of allelic ratio (window size: 100 kb, in log2 scale; Supplementary Material), copy ratio and structural variants are shown from the outer to the inner circles, respectively. An allelic ratio of on average 4 is shown on chromosome 2, indicating two copies of one base type against only one copy of another base type. Karyotypical structures and cytogenetic band colours are shown according to the University of California, Santa Cruz Genome Viewer Table Browser and chromosome colour schemes (outermost circle). Rectangles in red and blue indicate copy-number losses and gains, respectively, as do lines in red and blue. Black lines in the SV analysis show two inversions detected on chromosomes 6 and X, respectively.