Harish Nagarajan, Jessica E Butler, Anna Klimes, Yu Qiu, Karsten Zengler, Joy Ward, Nelson D Young, Barbara A Methé, Bernhard Ø Palsson, Derek R Lovley, Christian L Barrett.
Abstract
State-of-the-art DNA sequencing technologies are transforming the life sciences because they generate nucleotide sequence information at a speed and in a quantity unattainable with traditional Sanger sequencing. Genome sequencing is a principal application of this technology, where the ultimate goal is the full and complete sequence of the organism of interest. Due to the nature of the raw data produced by these technologies, a full genomic sequence attained without the aid of Sanger sequencing has yet to be demonstrated.
We have successfully developed a four-phase strategy for using only next-generation sequencing technologies (Illumina and 454) to assemble a complete microbial genome de novo. We applied this approach to completely assemble the 3.7 Mb genome of a rare Geobacter variant (KN400) that is capable of unprecedented current production at an electrode. Two key components of our strategy enabled us to achieve this result. First, we integrated the two data types early in the process to maximally leverage their complementary characteristics. Second, we used the output of different short-read assembly programs so as to exploit the complementary nature of their different underlying algorithms or of their different implementations of the same underlying algorithm.
The significance of our result is that it demonstrates a general approach for maximizing the efficiency and success of genome assembly projects as new sequencing technologies and new assembly algorithms are introduced. The general approach is a meta strategy, wherein sequencing data are integrated as early as possible and in particular ways, and wherein multiple assembly algorithms are judiciously applied such that the deficiencies in one are complemented by another.
Year: 2010 PMID: 20544019 PMCID: PMC2882325 DOI: 10.1371/journal.pone.0010922
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Assembly Strategy.
A) Hybrid Assembly Phase; B) Scaffold Bridging and Finishing Phase; C) Scaffold Ordering Phase: the left branch of the decision tree consists of permutations that can be confirmed by the PCR performed, while the right branch consists of those permutations that cannot be confirmed by that particular PCR. Faded permutations have been eliminated by the PCRs; permutations in bold remain. (Gel inset: PCR products for all 9 PCRs performed in the search strategy to confirm the correct orientation of the scaffolds.) D) Genome Finishing Phase.
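The elimination logic described for panel C can be illustrated with a short sketch. This is not the authors' implementation: the scaffold names, the `L`/`R` end labels, the assumption of a single circular chromosome, and the functions `candidate_layouts`, `junctions`, and `apply_pcr` are all hypothetical and exist only to show how each PCR result prunes the set of remaining permutations.

```python
from itertools import permutations, product


def candidate_layouts(scaffolds):
    """Enumerate orders and orientations of scaffolds on a circular molecule.

    The first scaffold is pinned at position 0 on the '+' strand so that
    rotations and whole-molecule reverse complements are counted once.
    """
    first, rest = scaffolds[0], scaffolds[1:]
    layouts = []
    for order in permutations(rest):
        for strands in product("+-", repeat=len(rest)):
            layouts.append(((first, "+"),) + tuple(zip(order, strands)))
    return layouts


def junctions(layout):
    """Return the set of scaffold-end pairs that are adjacent in a layout."""
    joins = set()
    for i, (name, strand) in enumerate(layout):
        next_name, next_strand = layout[(i + 1) % len(layout)]
        tail = (name, "R" if strand == "+" else "L")
        head = (next_name, "L" if next_strand == "+" else "R")
        joins.add(frozenset((tail, head)))
    return joins


def apply_pcr(layouts, end_a, end_b, amplified):
    """Keep only the layouts consistent with one PCR outcome.

    A product (amplified=True) means the two ends are adjacent; no product
    is treated here as evidence that they are not adjacent.
    """
    join = frozenset((end_a, end_b))
    return [lay for lay in layouts if (join in junctions(lay)) == amplified]


# Hypothetical example with four scaffolds (names are placeholders).
layouts = candidate_layouts(["S1", "S2", "S3", "S4"])
remaining = apply_pcr(layouts, ("S1", "R"), ("S2", "L"), amplified=True)
print(len(layouts), len(remaining))  # 48 8
```

Each call to `apply_pcr` corresponds to one node of the decision tree: a positive result keeps the branch of layouts containing that junction, and a negative result keeps the other branch. Repeating the call with further primer pairs narrows the list until a single layout remains.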
Summary and statistics of different stages of Meta-Assembly.
| Phase | Assembler | Number of Contigs/Scaffolds | N50 (kb) | Degenerate Positions | Assembly Length (Mb) |
| A | EULER-SR (Illumina alone) | 4233 | 1.487 | 0 | 3.51 |
| A | Newbler (454 reads + Illumina) | 270 | 92.67 | 0 | 3.72 |
| A | Newbler Scaffolder (mate pairs) | 3 | 3184.3 | 41421 | 3.71 |
| B | Scaffold Bridger/Finisher | 4 | 3184.3 | 0 | 3.71 |
| C | Scaffold Ordering | 1 | 3714.2 | 0 | 3.71 |
| D | Finisher | 1 | 3714.2 | 0 | 3.71 |
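For readers unfamiliar with the N50 column: N50 is commonly defined as the length of the contig at which the cumulative length, taken from the longest contig downward, first reaches at least half of the total assembly length. The sketch below assumes that definition; the function name `n50` and the toy contig lengths are illustrative and are not taken from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig that brings the running sum to >= 50%.
        if running * 2 >= total:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Toy lengths in kb (made up for illustration).
print(n50([80, 70, 50, 40, 30, 20]))  # 70
# A single-piece assembly has an N50 equal to its full length.
print(n50([3714.2]))  # 3714.2
```

The second call shows why the N50 in phases C and D equals the full assembly length: once the genome is a single scaffold, that scaffold alone covers more than half of the total.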
Changes made due to alignment of Illumina reads in the genome-finishing phase.
| Changes | Number of Changes |
| SNPs | 101 |
| Deletions | 18 |
| Insertions | 7 |
Figure 2. Assembly validation approaches.
A) Sanger sequencing approach; B) Comparative Genomics approach.
Figure 3. Genome-level comparison of KN400 and PCA.
Dot plot of the genome-wide alignment of KN400 and PCA. The KN400 genome is shown along the X axis and the PCA genome along the Y axis.
Figure 4. Gel picture confirming the 79 kb deletion (Region 2) in KN400.
PCR was performed with primer sets designed to amplify across the break (shown in panel B). The expected product size is 207 bp. Panel A shows that the product amplifies only in KN400 and not in PCA, confirming the deletion of Region 2 in KN400.
Figure 5. Density of five different genomic properties in the space of microbial genomes.
A) GC Content; B) Genome Size; C) Number of rRNAs; D) Number of Replicons; E) Number of tRNAs. The red circle marks the value of KN400's genomic property.
Comparison of Meta-Assembly to other assembly programs.
| Assembler | Number of Contigs/Scaffolds | N50 (kb) | Degenerate Positions | Assembly Length (Mb) |
| Meta-Assembly | 1 | 3714.2 | 0 | 3.71 |
| EULER-SR | 150 | 58 | 0 | 3.70 |
| Velvet | 329 | 45 | 486532 | 3.89 |
| Newbler alone | 5 | 3184.3 | 90449 | 3.72 |