Hiroshi Shinozuka¹, Shimna Sudheesh¹, Maiko Shinozuka¹, Noel O I Cogan¹,².
Abstract
The current Illumina HiSeq and MiSeq platforms can generate paired-end reads of up to 2 × 250 bp and 2 × 300 bp, respectively. These read lengths may be substantially longer than the genomic regions of interest when a DNA sequencing library is prepared through a target enrichment-based approach. A sequencing library preparation method based on homology-based enzymatic DNA fragment assembly has been developed so that multiple PCR products can be sequenced within a single read. Target sequences were amplified using locus-specific PCR primers carrying 8 bp tags, and these tags were then used to direct homology-based enzymatic DNA assembly with DNA polymerase, T7 exonuclease and T4 DNA ligase. In this way, short PCR amplicons can be assembled into a single molecule together with sequencing adapters specific to the Illumina platforms. As a proof-of-concept experiment, short PCR amplicons (57-66 bp in length) containing variable nucleotide positions were amplified from field pea genomic DNA templates, assembled and sequenced on the MiSeq platform. The results were validated against other genotyping methods. When five PCR amplicons were assembled, an average of 4.3 targeted sequences (single-nucleotide polymorphisms) were successfully identified within each read. The utility of this method for sequencing short fragments has therefore been demonstrated.
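The per-read metric reported in the abstract (the number of targeted SNPs identified within each read) can be illustrated with a minimal Python sketch. It assumes, as a simplification of aligning reads to reference sequences, that each target SNP is recognised by exact matches to short flanking sequences on the forward strand. The locus names, flank sequences and the helpers `call_targets` and `mean_targets_per_read` are hypothetical placeholders, not the field pea loci or the authors' pipeline.

```python
import re
from statistics import mean

# Hypothetical target loci: (left flank, right flank) around each SNP site.
# These sequences are invented for illustration; they are not real pea loci.
LOCI = {
    "locus_1": ("GATTACA", "TGCATGC"),
    "locus_2": ("CCGGTTA", "AACCGGT"),
    "locus_3": ("TTAGGCA", "GGATCCA"),
}

def call_targets(read: str) -> dict:
    """Return {locus: observed base} for each target SNP found in the read.

    Only exact, forward-strand flank matches are considered (an assumption).
    """
    calls = {}
    for name, (left, right) in LOCI.items():
        match = re.search(left + "([ACGT])" + right, read)
        if match:
            calls[name] = match.group(1)
    return calls

def mean_targets_per_read(reads: list) -> float:
    """Average number of target SNP sites recovered per read."""
    return mean(len(call_targets(r)) for r in reads)
```

For example, a read carrying locus_1 and locus_2 plus a read carrying only locus_3 give an average of 1.5 targets per read.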
Keywords: Gibson assembly; next-generation sequencing (NGS); single nucleotide polymorphisms (SNPs); synthetic biology; target enrichment
Year: 2018 PMID: 32161795 PMCID: PMC6994068 DOI: 10.1093/biomethods/bpy001
Source DB: PubMed Journal: Biol Methods Protoc ISSN: 2396-8923
PCR primers for homology-based enzymatic DNA fragment assembly-based library preparation
| Primer set | Locus (adapter) | Primer name | Primer sequence (5ʹ→3ʹ) |
|---|---|---|---|
| | Psy_KP1_SNP_100000290 | Forward | |
| | Psy_KP1_SNP_100000290 | Reverse | |
| | Psy_KP1_SNP_100000228 | Forward | |
| | Psy_KP1_SNP_100000228 | Reverse | |
| | Psy_KP2_SNP_100000360 | Forward | |
| | Psy_KP2_SNP_100000360 | Reverse | |
| | Psy_KP3_SNP_100000258 | Forward | |
| | Psy_KP3_SNP_100000258 | Reverse | |
| | Psy_KP4_SNP_100000576 | Forward | |
| | Psy_KP4_SNP_100000576 | Reverse | |
| | Psy_KP4_SNP_100000076 | Forward | |
| | Psy_KP4_SNP_100000076 | Reverse | |
| | Psy_KP4_SNP_100000577 | Forward | |
| | Psy_KP4_SNP_100000577 | Reverse | |
| | Psy_KP4_SNP_100000137 | Forward | |
| | Psy_KP4_SNP_100000137 | Reverse | |
| | Psy_KP4_SNP_100000293 | Forward | |
| | Psy_KP4_SNP_100000293 | Reverse | |
| | Psy_KP4_SNP_100000267 | Forward | |
| | Psy_KP4_SNP_100000267 | Reverse | |
| Sequencing library adapter | Positive strand of adapter | mpxPE1(+).adpPsytag1 | A*C*A*CTTTCCCTACACGACGCTCTTCCGATCT |
| Sequencing library adapter | Positive strand of adapter | mpxPE1(+).adpPsytag2 | A*C*A*CTTTCCCTACACGACGCTCTTCCGATCT |
| Sequencing library adapter | Negative strand of adapter | mpxPE2(-).GA-adp | A*G*ATCGGAAGAGCACACGTCTGAACTCCA*G*T*C |
The sequence corresponding to the assembly tag is shown underlined. An asterisk (*) indicates a phosphorothioate (S-bond) modification between the adjacent nucleotides. The PCR primers were synthesised by Integrated DNA Technologies and GeneWorks.
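The asterisk notation in the adapter table can be decoded programmatically. The minimal sketch below strips the `*` markers to recover the plain 5ʹ→3ʹ sequence and records after which base each phosphorothioate bond sits; it then checks that the last 13 nt of mpxPE1(+) are the reverse complement of the first 13 nt of mpxPE2(-), so the two adapter strands can partially anneal. The function names `parse_ps_oligo` and `revcomp` are illustrative helpers, not part of any vendor tool.

```python
def parse_ps_oligo(seq: str) -> tuple:
    """Split an asterisk-annotated oligo into (plain sequence, bond indices).

    Each index is the 0-based position of the base that is followed by a
    phosphorothioate (*) linkage.
    """
    bases, bonds = [], []
    for ch in seq:
        if ch == "*":
            bonds.append(len(bases) - 1)  # bond lies after the previous base
        else:
            bases.append(ch)
    return "".join(bases), bonds

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

# Adapter strands transcribed from the primer table.
pe1, pe1_bonds = parse_ps_oligo("A*C*A*CTTTCCCTACACGACGCTCTTCCGATCT")
pe2, pe2_bonds = parse_ps_oligo("A*G*ATCGGAAGAGCACACGTCTGAACTCCA*G*T*C")

# The 3' end of the positive strand pairs with the 5' end of the negative strand.
print(revcomp(pe1[-13:]) == pe2[:13])
```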
Figure 1: Visualised DNA fragments following the T7 and T5 exonuclease activity assay. (a) Effect of S-bond modification on exonuclease activity. The sizes of the DNA ladder fragments are shown on the right side of the T7 exonuclease activity assay image. (b) Time-course assay of the exonucleases. The purple and green lines show the positions of the upper and lower markers of the Agilent D1000 kit, respectively. ‘T7’ and ‘T5’ stand for T7 and T5 exonuclease, respectively, and ‘NC’ stands for ‘no-enzyme control’, in which molecular biology grade water was used instead of exonuclease. ‘NEBuf2’ and ‘NEBuf4’ denote reactions performed in NEBuffer 2 and NEBuffer 4, respectively. The position of the target DNA fragments is indicated with a red arrow. Although slight size differences can be observed between the DNA fragments, they are within the size resolution of the instrument and kit (15% for DNA fragments between 35 and 300 bp).
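The size-resolution argument in the Figure 1 caption can be made explicit with a small check. This is a sketch under a stated assumption: the 15% figure is applied relative to the smaller of the two apparent sizes, and only inside the 35-300 bp range quoted for the kit. The function name and its parameters are hypothetical.

```python
def within_sizing_tolerance(size_a_bp: float, size_b_bp: float,
                            tolerance: float = 0.15,
                            valid_range: tuple = (35, 300)) -> bool:
    """Return True if two apparent fragment sizes are indistinguishable.

    Assumption: sizes are indistinguishable when they differ by no more than
    `tolerance` times the smaller size, within `valid_range` (bp).
    """
    low, high = valid_range
    for size in (size_a_bp, size_b_bp):
        if not low <= size <= high:
            raise ValueError(f"{size} bp is outside the {low}-{high} bp range")
    smaller = min(size_a_bp, size_b_bp)
    return abs(size_a_bp - size_b_bp) <= tolerance * smaller
```

Under this reading, apparent sizes of 100 and 112 bp fall within resolution, whereas 100 and 120 bp do not.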
Figure 2: Short DNA fragment assembly-based Illumina library preparation method. (a) Procedure of the library preparation method. The target sequences (five regions, shown with blue, light green, purple, brown and aqua lines) are amplified from gDNA templates using locus-specific primers with assembly tags (dark blue, orange, red, yellow, dark green and pink boxes). Following the PCR, partial DNA digestion is performed with T7 exonuclease. The S-bond modifications in the PCR primers inhibit exonuclease cleavage, protecting the DNA fragments from excessive digestion. Using DNA polymerase and ligase, the DNA fragments and sequencing adapters (grey boxes) are assembled into a single molecule, and the assembled DNA is used as the template for the second PCR. (b) PCR amplicons generated for preparation of the PsySNP_Set 1 library. The signal peaks between the 50 and 100 bp positions show the PCR amplicons from the Kaspa genotype. (c) DNA molecules after assembly of the five PCR amplicons. (d) DNA molecules after the second PCR. In (b), (c) and (d), the y and x axes denote fluorescence intensity and DNA fragment size, respectively. The desired DNA molecules are indicated with a red arrow. LM and UM indicate the lower and upper markers of the Agilent D1000 kit, respectively.
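The sequence-level outcome of the assembly in Figure 2 can be modelled in a few lines. Six tag colours for five fragments plus two adapters (seven pieces, six junctions) is consistent with one 8 bp tag being shared by each pair of neighbouring pieces, and the sketch assumes exactly that. It ignores the enzymology (exonuclease chew-back, annealing, fill-in, ligation) and only shows which molecule results. The helper `assemble` and all example sequences are hypothetical.

```python
TAG_LEN = 8  # assembly tag length stated in the text

def assemble(pieces: list, tag_len: int = TAG_LEN) -> str:
    """Join pieces in order, assuming each piece's 3' tag equals the next
    piece's 5' tag; one copy of every shared tag is kept."""
    if not pieces:
        raise ValueError("nothing to assemble")
    assembled = pieces[0]
    for nxt in pieces[1:]:
        left_tag, right_tag = assembled[-tag_len:], nxt[:tag_len]
        if left_tag != right_tag:
            raise ValueError(f"tag mismatch: {left_tag} vs {right_tag}")
        assembled += nxt[tag_len:]  # drop the duplicated tag copy
    return assembled

# Hypothetical tags and inserts (not the study's sequences).
t1, t2, t3 = "ACGTACGT", "TTGGCCAA", "GATCGATC"
pieces = ["CCCCC" + t1,            # 3' end of adapter 1
          t1 + "AAAAAA" + t2,      # amplicon 1
          t2 + "GGGGGG" + t3,      # amplicon 2
          t3 + "TTTTT"]            # 5' end of adapter 2
molecule = assemble(pieces)
```

Because each junction keeps a single copy of its tag, the product length is the sum of the piece lengths minus 8 bp per junction (70 - 3 × 8 = 46 bp here).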
Figure 3: Alignment of sequencing reads to the reference sequences. Sequencing reads (Seq. reads) from the PsySNP_Set 1 library of the Psy_RIL677 genotype (a) and the PsySNP_Set 2 library of the Psy_RIL99 genotype (b) were visualised in the Sequencher software. The reference sequences (Ref.) are shown at the bottom of the alignment. The A, C, G and T bases are shown in green, dark blue, black and red, respectively. Gaps (:) are shown in light blue. The position of each target SNP is indicated with a blue arrow, and ambiguity codes (light blue) are used to show the candidate nucleotides at the SNP sites of the reference sequences. Under the reference sequence, the region corresponding to each PCR fragment, including the 8-bp tags, is shown with a blue line.
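The ambiguity codes shown at the SNP sites follow the standard IUPAC nucleotide codes for two-base sets. A minimal lookup is sketched below; the function name `ambiguity_code` is illustrative, and the allele pairs it accepts use the same "X/Y" notation as the genotyping table.

```python
# Standard IUPAC codes for every two-base combination.
IUPAC = {
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
}

def ambiguity_code(allele_pair: str) -> str:
    """Map an allele pair such as 'C/G' to its IUPAC code.

    A homozygous pair such as 'C/C' returns the base itself.
    """
    bases = frozenset(allele_pair.split("/"))
    if len(bases) == 1:
        return next(iter(bases))
    return IUPAC[bases]
```

For the five loci in the genotyping table, the candidate allele pairs C/G, A/G, C/T, C/G and A/T map to S, R, Y, S and W, respectively.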
Genotyping results from the PsySNP_Set 1, PsySNP_Set 2 sequencing libraries and KASP-based methods
| Genotype | Psy_KP1_SNP_100000290 GBS C reads | Psy_KP1_SNP_100000290 GBS G reads | Psy_KP1_SNP_100000290 GBS alleles | Psy_KP1_SNP_100000290 KASP alleles | Psy_KP1_SNP_100000228 GBS A reads | Psy_KP1_SNP_100000228 GBS G reads | Psy_KP1_SNP_100000228 GBS alleles | Psy_KP1_SNP_100000228 KASP alleles | Psy_KP2_SNP_100000360 GBS C reads | Psy_KP2_SNP_100000360 GBS T reads | Psy_KP2_SNP_100000360 GBS alleles | Psy_KP2_SNP_100000360 KASP alleles | Psy_KP3_SNP_100000258 GBS C reads | Psy_KP3_SNP_100000258 GBS G reads | Psy_KP3_SNP_100000258 GBS alleles | Psy_KP3_SNP_100000258 KASP alleles | Psy_KP4_SNP_100000576 GBS A reads | Psy_KP4_SNP_100000576 GBS T reads | Psy_KP4_SNP_100000576 GBS alleles | Psy_KP4_SNP_100000576 KASP alleles |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Kaspa | | 32 (1%) | C/C | C/C | 51 (1%) | | G/G | G/G | | 33 (1%) | C/C | C/C | | 11 (0%) | C/C | C/C | 21 (0%) | | T/T | T/T |
| PBAOura | 238 (2%) | | G/G | G/G | | 108 (1%) | A/A | A/A | 151 (1%) | | T/T | T/T | 135 (1%) | | G/G | G/G | 53 (0%) | | T/T | T/T |
| Psy_GenoX | | 235 (2%) | C/C | NA | | 136 (1%) | A/A | NA | 150 (1%) | | T/T | NA | 208 (1%) | | G/G | NA | | 217 (1%) | A/A | NA |
| Psy_RIL99 | | 116 (1%) | C/C | C/C | 73 (1%) | | G/G | G/G | | 85 (1%) | C/C | C/C | | 13 (0%) | C/C | C/C | | | A/T | A/T |
| Psy_RIL195 | 247 (2%) | | G/G | G/G | | 118 (1%) | A/A | A/A | 192 (1%) | | T/T | T/T | | 61 (0%) | C/C | C/C | 90 (0%) | | T/T | T/T |
| Psy_RIL268 | 76 (1%) | | G/G | G/G | | 54 (0%) | A/A | A/A | | 38 (0%) | C/C | C/C | | 28 (0%) | C/C | C/C | 33 (0%) | | T/T | T/T |
| Psy_RIL614 | | 54 (0%) | C/C | C/C | 56 (0%) | | G/G | G/G | | 46 (0%) | C/C | C/C | | 23 (0%) | C/C | C/C | | 120 (1%) | A/A | A/A |
| Psy_RIL656 | | 54 (0%) | C/C | C/C | 63 (0%) | | G/G | G/G | | 43 (0%) | C/C | C/C | | | C/G | C/G | 56 (0%) | | T/T | T/T |
| Psy_RIL677 | | | C/G | C/G | | | A/G | A/G | | 6 (0%) | C/C | C/C | | 3 (0%) | C/C | C/C | 15 (0%) | | T/T | T/T |
| Psy_RIL678 | | | C/G | C/G | | | A/G | A/G | | 22 (0%) | C/C | C/C | | 20 (0%) | C/C | C/C | 15 (0%) | | T/T | T/T |
The number of reads corresponding to each allele is shown for the GBS-based method, with the major (positive) allele(s) shown in bold. An asterisk (*) denotes that the GBS-based genotyping result is not consistent with that from the KASP-based method, but the PCR-RFLP-based method supported the GBS-based result. NA stands for ‘not analysed’.
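Calling a genotype from per-allele GBS read counts, as summarised in the table, can be sketched as a minor-allele-fraction rule. This is an illustration only: the cut-offs `het_min_fraction` and `min_depth` are assumed values, not thresholds reported for this study, and the read counts in the examples are hypothetical.

```python
def call_genotype(allele1: str, n1: int, allele2: str, n2: int,
                  het_min_fraction: float = 0.2, min_depth: int = 10) -> str:
    """Call a biallelic SNP genotype from per-allele read counts.

    Assumptions: a site with fewer than `min_depth` reads gets no call, and a
    minor-allele fraction of at least `het_min_fraction` means heterozygous.
    """
    depth = n1 + n2
    if depth < min_depth:
        return "no call"
    minor_fraction = min(n1, n2) / depth
    if minor_fraction >= het_min_fraction:
        return f"{allele1}/{allele2}"
    major = allele1 if n1 >= n2 else allele2
    return f"{major}/{major}"
```

With hypothetical counts, 3100 C reads against 32 G reads (about 1% minor allele) give C/C, whereas 500 A reads against 480 T reads give A/T.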
Enrichment efficiency of the short fragment assembly-based library preparation method
| Genotype | PsySNP_Set 1 | PsySNP_Set 2 | PsySNP_All | Total |
|---|---|---|---|---|
| Kaspa | 4.04 | 4.60 | 4.52 | |
| PBAOura | 3.38 | 4.57 | 4.01 | |
| Psy_GenoX | 2.81 | 4.21 | 4.30 | |
| Psy_RIL99 | 4.01 | 4.70 | 4.55 | |
| Psy_RIL195 | 3.12 | 4.61 | 4.17 | |
| Psy_RIL268 | 3.96 | 4.41 | 4.48 | |
| Psy_RIL614 | 3.71 | 4.73 | 4.53 | |
| Psy_RIL656 | 3.78 | 4.62 | 4.62 | |
| Psy_RIL677 | 4.52 | 4.56 | 4.39 | |
| Psy_RIL678 | 4.49 | 4.63 | 4.56 | |
| Average | 3.78 | 4.56 | 4.41 | 4.25 |
| SD | 0.55 | 0.15 | 0.20 | 0.48 |
The average number of SNP sites sequenced per read is shown for each sequencing library. ‘SD’ stands for standard deviation.
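The summary rows of the enrichment-efficiency table can be reproduced from the per-genotype values. In the sketch below, the per-library means and SDs match the table when the SD is the sample standard deviation (n - 1 denominator), and the overall 4.25 ± 0.48 matches pooling all 30 values across the three libraries. The helper `summarise` is illustrative.

```python
from statistics import mean, stdev

# Average SNP sites per read, transcribed from the table (one value per genotype,
# Kaspa through Psy_RIL678).
SET_1 = [4.04, 3.38, 2.81, 4.01, 3.12, 3.96, 3.71, 3.78, 4.52, 4.49]
SET_2 = [4.60, 4.57, 4.21, 4.70, 4.61, 4.41, 4.73, 4.62, 4.56, 4.63]
SET_ALL = [4.52, 4.01, 4.30, 4.55, 4.17, 4.48, 4.53, 4.62, 4.39, 4.56]

def summarise(values: list) -> tuple:
    """Mean and sample standard deviation (n - 1), rounded to 2 d.p."""
    return round(mean(values), 2), round(stdev(values), 2)

for name, values in [("PsySNP_Set 1", SET_1), ("PsySNP_Set 2", SET_2),
                     ("PsySNP_All", SET_ALL),
                     ("Pooled (30 values)", SET_1 + SET_2 + SET_ALL)]:
    avg, sd = summarise(values)
    print(f"{name}: mean {avg:.2f}, SD {sd:.2f}")
```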
Figure 4: Sequencing libraries prepared through penta-plexed PCR. PCR products and sequencing libraries after PCR enrichment and size selection were visualised on the 2200 TapeStation instrument using the D1000 kit. The target DNA is indicated with a red arrow. NTC stands for no-template control.