Steven L Salzberg, Daniel D Sommer, Daniela Puiu, Vincent T Lee.
Abstract
Recent improvements in technology have made DNA sequencing dramatically faster and more efficient than ever before. The new technologies produce highly accurate sequences, but one drawback is that the most efficient technology produces the shortest read lengths. Short-read sequencing has been applied successfully to resequence the human genome and those of other species but not to whole-genome sequencing of novel organisms. Here we describe the sequencing and assembly of a novel clinical isolate of Pseudomonas aeruginosa, strain PAb1, using very short read technology. From 8,627,900 reads, each 33 nucleotides in length, we assembled the genome into one scaffold of 76 ordered contiguous sequences containing 6,290,005 nucleotides, including one contig spanning 512,638 nucleotides, plus an additional 436 unordered contigs containing 416,897 nucleotides. Our method includes a novel gene-boosting algorithm that uses amino acid sequences from predicted proteins to build a better assembly. This study demonstrates the feasibility of very short read sequencing for the sequencing of bacterial genomes, particularly those for which a related species has been sequenced previously, and expands the potential application of this new technology to most known prokaryotic species.
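The read counts in the abstract allow a rough estimate of sequencing depth. The sketch below divides the total number of sequenced bases by the total assembled length (ordered scaffold plus unordered contigs). It assumes that the assembled length is a reasonable stand-in for the genome size, which the abstract does not state directly, so the result should be read as an approximation only.

```python
# Back-of-the-envelope read coverage for the PAb1 data set.
# Assumption: the total assembled length (scaffold + unordered contigs)
# approximates the true genome size, which is not given directly.

NUM_READS = 8_627_900      # reads, from the abstract
READ_LENGTH = 33           # nucleotides per read
SCAFFOLD_NT = 6_290_005    # nucleotides in the 76-contig scaffold
UNORDERED_NT = 416_897     # nucleotides in the 436 unordered contigs


def estimated_coverage(num_reads: int, read_length: int, genome_length: int) -> float:
    """Average depth = total sequenced bases / genome length."""
    return num_reads * read_length / genome_length


total_bases = NUM_READS * READ_LENGTH
assembled_length = SCAFFOLD_NT + UNORDERED_NT
coverage = estimated_coverage(NUM_READS, READ_LENGTH, assembled_length)

print(f"sequenced bases:    {total_bases:,}")
print(f"assembled length:   {assembled_length:,}")
print(f"estimated coverage: {coverage:.1f}x")
```

Under this assumption the 284,720,700 sequenced bases correspond to roughly 42-fold average coverage of the 6,706,902 assembled nucleotides.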
Year: 2008 PMID: 18818729 PMCID: PMC2529408 DOI: 10.1371/journal.pcbi.1000186
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Major steps in the assembly of P. aeruginosa from 33 bp Solexa reads.
| Assembly step | Input | Number of contigs | Contigs >200 bp | Largest contig (bp) | Singletons (reads) |
| --- | --- | --- | --- | --- | --- |
| AMOScmp with PA14 | 8,627,900 reads | 2,053 | 428 | 170,485 | 1,127,399 |
| AMOScmp with PAO1 | 8,627,900 reads | 2,797 | 865 | 75,626 | 1,592,525 |
| Merged comparative assemblies | 4,850 contigs | 1,850 | 306 | 236,472 | 1,066,226 |
| Gene-boosted assembly | 306 contigs | 120 | 120 | 512,638 | NA |
| De novo assembly by Velvet | 8,627,900 reads | 10,684 | 7,382 | 16,239 | 1,241,079 |
| Merged gene-boosted and Velvet assemblies | 120 contigs + 7,382 contigs | 76 | 76 | 512,638 | 822,210 |
The first column lists the assembly strategies described in the text. Singletons is the number of reads that were not incorporated into the contigs produced by each method.
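Because the singleton column counts unplaced reads, it can be turned into the share of all input reads that each step incorporated. The sketch below assumes every singleton count is relative to the full set of 8,627,900 reads, including the rows whose input is a set of contigs; the gene-boosted row reports NA and is skipped.

```python
# Share of the 8,627,900 input reads incorporated into contigs at each step,
# computed as 1 - singletons / total reads. Values are copied from the table.
# Assumption: every singleton count is relative to the full read set.

TOTAL_READS = 8_627_900

singletons = {
    "AMOScmp with PA14": 1_127_399,
    "AMOScmp with PAO1": 1_592_525,
    "Merged comparative assemblies": 1_066_226,
    "Gene-boosted assembly": None,  # reported as NA in the table
    "De novo assembly by Velvet": 1_241_079,
    "Merged gene-boosted and Velvet assemblies": 822_210,
}


def fraction_placed(unplaced: int, total: int = TOTAL_READS) -> float:
    """Fraction of reads that ended up in some contig."""
    return 1 - unplaced / total


placed = {
    step: fraction_placed(count)
    for step, count in singletons.items()
    if count is not None
}

for step, frac in placed.items():
    print(f"{step:<45} {frac:6.1%}")

# Consistency check on the table: the 4,850 contigs fed into the merge step
# equal the PA14 and PAO1 contig counts combined.
assert 2_053 + 2_797 == 4_850
```

Read this way, the final merged assembly places about 90% of the reads, more than any single comparative or de novo step.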
Figure 1. Comparative assembly using multiple genomes.
The target genome is shown in the center, aligned to two related genomes, A and B. The DNA sequence of the target diverges from the reference genomes at distinct loci, labeled X, Y, and Z. The comparative assembly based on genome A contains a gap corresponding to region Y, while the assembly based on genome B contains two gaps, corresponding to X and Z. Because each gap in one assembly is covered by the other, the merged assembly covers the entire target genome with no gaps.
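The gap-filling logic of Figure 1 can be modeled as interval arithmetic on target-genome coordinates. In this minimal sketch the genome length and the positions of X, Y, and Z are hypothetical; each comparative assembly covers the target except at its divergent loci, and the union of the two coverages has no gaps.

```python
# Toy model of Figure 1. All coordinates are hypothetical.
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the target genome

TARGET_LENGTH = 100
X, Y, Z = (10, 20), (45, 55), (80, 90)  # hypothetical divergent loci


def complement(gaps: List[Interval], length: int) -> List[Interval]:
    """Regions of [0, length) outside the given sorted, non-overlapping gaps."""
    covered, pos = [], 0
    for start, end in gaps:
        if start > pos:
            covered.append((pos, start))
        pos = max(pos, end)
    if pos < length:
        covered.append((pos, length))
    return covered


def union(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into sorted disjoint ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps_in(covered: List[Interval], length: int) -> List[Interval]:
    """Uncovered regions of the target, given a list of covered intervals."""
    return complement(union(covered), length)


contigs_a = complement([Y], TARGET_LENGTH)     # assembly on genome A: gap at Y
contigs_b = complement([X, Z], TARGET_LENGTH)  # assembly on genome B: gaps at X, Z
merged = union(contigs_a + contigs_b)

print("A gaps:", gaps_in(contigs_a, TARGET_LENGTH))
print("B gaps:", gaps_in(contigs_b, TARGET_LENGTH))
print("merged:", merged, "gaps:", gaps_in(merged, TARGET_LENGTH))
```

The model tracks coverage only. A real merge of contig sets, such as the 4,850-contig merge in the table, must also reconcile the overlapping sequence between contigs, which this sketch ignores.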
Figure 2. Gene-boosted assembly.
All contigs are aligned with predicted gene sequences to identify genes that span two or more contigs. The DNA sequences of these spanning genes are cut out with a small buffer on each end. The amino acid translation of each gene fragment is then searched against a translated database of all singleton reads that have not yet been placed in the assembly. Finally, the reads identified by this process are assembled together with the contigs flanking the gap to fill it in.
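The translated search step in Figure 2 can be sketched as follows. This is a simplified stand-in, not the authors' implementation: exact amino-acid k-mer matching replaces a real translated alignment search, the helper names (`translate`, `six_frames`, `boost_reads`) are hypothetical, and the gene fragment and reads are invented for illustration. The sketch translates the gene fragment with the standard genetic code, translates each singleton read in all six frames, and reports reads that share an amino-acid k-mer with the fragment. The later steps (buffer extraction and assembly with the flanking contigs) are not modeled.

```python
# Simplified translated search for gene-boosted assembly (hypothetical helpers).
# Exact amino-acid k-mer matching stands in for a real translated alignment.
from itertools import product
from typing import Dict, List, Set

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE: Dict[str, str] = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate frame 0; codons with ambiguous bases (e.g. N) become 'X',
    and a trailing partial codon is dropped."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(read: str) -> List[str]:
    """Translations of the three forward and three reverse-complement frames."""
    rc = read.translate(COMPLEMENT)[::-1]
    return [translate(s[f:]) for s in (read, rc) for f in range(3)]


def kmers(protein: str, k: int) -> Set[str]:
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def boost_reads(gene_fragment: str, singletons: Dict[str, str], k: int = 6) -> List[str]:
    """IDs of singleton reads whose six-frame translation shares an amino-acid
    k-mer with the translated gene fragment spanning a gap."""
    target = kmers(translate(gene_fragment), k)
    return [
        read_id
        for read_id, seq in singletons.items()
        if any(kmers(frame, k) & target for frame in six_frames(seq))
    ]


if __name__ == "__main__":
    # Hypothetical gene fragment and singleton reads.
    gene = "ATGGCTGAACGTATTCTGGATCAGTGCCATCGTGAAGTTTACCCGTGGAGCTAA"
    reads = {
        "r1": gene[9:42],                               # forward strand, in frame
        "r2": gene[15:48].translate(COMPLEMENT)[::-1],  # reverse strand
        "r3": "A" * 33,                                 # unrelated read
    }
    print(boost_reads(gene, reads))  # → ['r1', 'r2']
```

Searching at the protein level tolerates synonymous nucleotide differences between the predicted gene and the reads, which is the motivation the caption gives for translating both sides. A production search would also score mismatches rather than requiring exact k-mer hits.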