Mark Matzas, Peer F Stähler, Nathalie Kefer, Nicole Siebelt, Valesca Boisguérin, Jack T Leonard, Andreas Keller, Cord F Stähler, Pamela Häberle, Baback Gharizadeh, Farbod Babrzadeh, George M Church.
Abstract
The construction of synthetic biological systems involving millions of nucleotides is limited by the lack of high-quality synthetic DNA. Consequently, the field requires advances in the accuracy and scale of chemical DNA synthesis and in the processing of longer DNA assembled from short fragments. Here we describe a highly parallel and miniaturized method, called megacloning, for obtaining high-quality DNA by using next-generation sequencing (NGS) technology as a preparative tool. We demonstrate our method by processing both chemically synthesized and microarray-derived DNA oligonucleotides with a robotic system for imaging and picking beads directly off of a high-throughput pyrosequencing platform. The method can reduce error rates by a factor of 500 compared to the starting oligonucleotide pool generated by microarray. We use DNA obtained by megacloning to assemble synthetic genes. In principle, millions of DNA fragments can be sequenced, characterized and sorted in a single megacloner run, enabling constructive biology up to the megabase scale.
Year: 2010 PMID: 21113166 PMCID: PMC3579223 DOI: 10.1038/nbt.1710
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1. Strategy overview. The general approach accepts DNA from a variety of sources. After next-generation sequencing (NGS), the DNA is sorted and selectively retrieved; the technologies used for these steps depend on the NGS platform. The particular approach described here uses microarrays as well as conventional sources of oligonucleotides. The GS FLX platform (454/Roche) was used for sequencing prior to sorting and selection.
Figure 2.
(A) Comparison of the initial microarray oligonucleotide pool (blue) and the pool enriched with the Megacloner technology (red), based on the results of the Illumina GAII runs. The bars in Set 1 represent the fraction of reads that could be mapped allowing up to three errors; the bars in Set 2 show the fraction of reads that perfectly match the sequence set of the initial pool (3918 sequences). The difference between the blue and red bars in Set 2 represents the enrichment of correct sequences by megacloning. The bars in Set 3 and Set 4 show the fractions of reads mapping to sequences from the selected pool (319 sequences). The difference between the blue and red bars in Set 3 shows the enrichment of the 319 selected sequences before vs. after megacloning; the blue and red bars in Set 4 represent the enrichment of sequences that are both in the set of 319 selected sequences and correct.
(B) Histogram of read counts in the Illumina GAII data of the initial pool (blue) and the enriched megacloned sample (red). Only reads mapping without errors to one of the 319 selected target sequences were taken into account. To compare the two NGS runs on the basis of read counts, the counts were converted into parts per million (ppm), using the number of filtered reads as the basis.
(C) Composition of reads from the Illumina GAII data for the 319 selected sequences in the initial pool (top) and the enriched pool (bottom). The oligonucleotides are sorted by the fraction of correct reads. Green: correct reads. Red: reads containing errors (compartments in the red bars represent single sequences with a read count of 0.1% or more of total reads for the particular sequence). Light blue: sum of non-unique reads containing errors, where each sequence represents less than 0.1% of total reads for the particular sequence. Blue: unique reads. In the Illumina GAII dataset from the enriched sample, only 315 of the 319 selected sequences could be detected.
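The read statistics described in the Figure 2 caption can be illustrated with a minimal sketch. It is not the authors' pipeline, which this record does not describe. It assumes reads and targets are plain strings, counts substitutions only (reads whose length differs from a target are treated as unmapped, so insertions and deletions are ignored), and uses hypothetical function names (`to_ppm`, `mismatches`, `read_fractions`). The default threshold of three errors and the ppm normalization by the number of filtered reads follow the caption.

```python
def to_ppm(count, total_filtered_reads):
    """Convert a raw read count to parts per million of filtered reads."""
    if total_filtered_reads <= 0:
        raise ValueError("total_filtered_reads must be positive")
    return count * 1_000_000 / total_filtered_reads


def mismatches(read, target):
    """Number of substitutions between read and target.

    Returns None when the lengths differ, because this sketch does not
    model insertions or deletions (an assumption, not the paper's method).
    """
    if len(read) != len(target):
        return None
    return sum(a != b for a, b in zip(read, target))


def read_fractions(reads, targets, max_errors=3):
    """Fraction of reads mapped with <= max_errors, and fraction perfect."""
    if not reads:
        raise ValueError("reads must not be empty")
    target_set = set(targets)
    mapped = 0
    perfect = 0
    for read in reads:
        if read in target_set:
            # An exact match counts as both perfect and mapped.
            perfect += 1
            mapped += 1
            continue
        for target in targets:
            d = mismatches(read, target)
            if d is not None and d <= max_errors:
                mapped += 1
                break  # count each read at most once
    n = len(reads)
    return {"mapped": mapped / n, "perfect": perfect / n}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy_targets = ["ACGT"]
    toy_reads = ["ACGT", "ACGA", "TTTT", "ACG"]
    print(read_fractions(toy_reads, toy_targets, max_errors=1))
    print(to_ppm(5, 1_000_000))
```

Normalizing to ppm makes counts from two sequencing runs of different depth comparable, which is the purpose stated for panel B. The brute-force comparison against every target is adequate for a few thousand short sequences but would need an indexed aligner at larger scale.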