Andrew Currin, Neil Swainston, Mark S Dunstan, Adrian J Jervis, Paul Mulherin, Christopher J Robinson, Sandra Taylor, Pablo Carbonell, Katherine A Hollywood, Cunyu Yan, Eriko Takano, Nigel S Scrutton, Rainer Breitling.
Abstract
Synthetic biology uses the Design-Build-Test-Learn pipeline to engineer biological systems. Typically, this requires the construction of specifically designed, large and complex DNA assemblies. The availability of cheap DNA synthesis and automation enables high-throughput assembly approaches, which generate a heavy demand for DNA sequencing to verify correctly assembled constructs. Next-generation sequencing is ideally positioned to perform this task; however, because of its expensive hardware and bespoke data analysis requirements, few laboratories use this technology in-house. Here a workflow for highly multiplexed sequencing is presented, capable of fast and accurate sequence verification of DNA assemblies using nanopore technology. A novel sample barcoding system using polymerase chain reaction is introduced, and sequencing data are analyzed through a bespoke analysis algorithm. Crucially, this algorithm overcomes the high error rate of nanopore data (which typically prevents identification of single nucleotide variants) through statistical analysis of strand bias, permitting accurate sequence analysis with single-base resolution. As an example, 576 constructs (6 × 96-well plates) were processed in a single workflow in 72 h (from Escherichia coli colonies to analyzed data). Given its low hardware costs and highly multiplexed capability, the procedure provides cost-effective access to powerful DNA sequencing for any laboratory, with applications beyond synthetic biology including directed evolution, single nucleotide polymorphism analysis and gene synthesis.
Keywords: DNA assembly; nanopore sequencing; next-generation sequencing; strand bias; synthetic biology
Year: 2019 PMID: 32995546 PMCID: PMC7445882 DOI: 10.1093/synbio/ysz025
Source DB: PubMed Journal: Synth Biol (Oxf) ISSN: 2397-7000
Figure 1. Allocation of primer pairs to enable the identification of individual wells from highly multiplexed samples, using well B1 from plate 1 as an example. Each well is allocated a forward primer, identifying the source plate, and a reverse primer, identifying the well. This enables the accurate identification of each individual well by data analysis after sequencing.
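The combinatorial scheme in Figure 1 can be illustrated with a short Python sketch. It is only an illustration of the idea, not the authors' implementation: the barcode labels (`F1`, `R13`, and so on), the row-major well ordering and the function name `build_lookup` are all assumptions made for this example. It shows how 6 plate-specific forward primers and 96 well-specific reverse primers (102 primers in total) are enough to address all 576 wells.

```python
from string import ascii_uppercase

# Standard 96-well layout: rows A-H, columns 1-12, listed row by row.
ROWS = ascii_uppercase[:8]
COLUMNS = range(1, 13)
WELLS = [f"{row}{col}" for row in ROWS for col in COLUMNS]


def build_lookup(n_plates=6):
    """Map (forward label, reverse label) -> (plate number, well name).

    The labels are hypothetical: "F<k>" stands for the forward primer of
    plate k, "R<i>" for the reverse primer of the i-th well (row-major).
    """
    lookup = {}
    for plate in range(1, n_plates + 1):
        for index, well in enumerate(WELLS, start=1):
            lookup[(f"F{plate}", f"R{index}")] = (plate, well)
    return lookup


lookup = build_lookup()

# Number of distinct primers needed versus number of wells addressed.
forward_primers = {pair[0] for pair in lookup}
reverse_primers = {pair[1] for pair in lookup}
n_primers = len(forward_primers) + len(reverse_primers)
n_wells = len(lookup)

# Well B1 is the 13th well in row-major order, so on plate 1 it carries
# the (hypothetical) pair F1 + R13.
plate, well = lookup[("F1", "R13")]
```

In practice a read would be assigned to a well by detecting the two barcode sequences at its ends and looking the pair up in such a table; the barcode detection itself is not shown here.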
Figure 2. Overview of the construct-sequencing workflow. Colonies harbouring assembled plasmids are first (A) picked and (B) cultured in deep-well plates, prior to (C) dilution to create the PCR template. (D) PCR amplification of the construct generates 5′ (red) and 3′ (green) barcoded amplicons, which are (E) analyzed by capillary electrophoresis. (F) Pooled amplicons are prepared for NGS by adapter ligation and (G) sequenced using the MinION device. (H) Bioinformatics processing of the data identifies mutations and removes systematic errors by probabilistic analysis, and (I) data metrics are output.
Figure 3. Examples of strand bias and a genuine SNV in nanopore data. Each row corresponds to a different read. Bases correctly aligned to the target sequence are shown in gray and potential mutations are highlighted in color (A = yellow, G = blue, T = pink, C = green, deletion = red). The consensus basecall is identified at the relevant position. Strand bias is shown by the inconsistent SNV basecalling between the (A) forward and (B) reverse strand reads from the same sample. Our statistical analysis of this alignment prevents erroneous SNV identification. In contrast, genuine SNVs are identified by an agreement between the (C) forward and (D) reverse strand read data.
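The distinction drawn in Figure 3 can be made concrete with a small Python sketch. The text above does not specify which statistical test the workflow uses, so this example substitutes a two-sided Fisher's exact test on a 2 × 2 table of read counts (strand versus variant/reference basecall), a common way to quantify strand bias. The function names, the read counts and the `alpha` threshold are assumptions chosen for illustration only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    p_observed = prob(a)
    p_value = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = prob(x)
        if p <= p_observed * (1 + 1e-9):  # tolerance for float rounding
            p_value += p
    return p_value


def is_strand_biased(fwd_alt, fwd_ref, rev_alt, rev_ref, alpha=0.01):
    """Flag a position where the variant call depends on read strand.

    alpha is an illustrative threshold, not a value taken from the paper.
    """
    return fisher_exact_two_sided(fwd_alt, fwd_ref, rev_alt, rev_ref) < alpha


# Hypothetical counts resembling panels A/B: the variant appears almost
# only in forward-strand reads, so it is treated as a systematic error.
biased = is_strand_biased(fwd_alt=18, fwd_ref=2, rev_alt=1, rev_ref=19)

# Hypothetical counts resembling panels C/D: both strands support the
# variant, so the position remains a candidate genuine SNV.
genuine = not is_strand_biased(fwd_alt=17, fwd_ref=3, rev_alt=18, rev_ref=2)
```

A variant supported by both strands yields a large p-value and is kept, whereas a variant seen on one strand only yields a very small p-value and can be discarded as a basecalling artefact.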