Niranjan Nagarajan, Christopher Cook, Mariapia Di Bonaventura, Hong Ge, Allen Richards, Kimberly A Bishop-Lilly, Robert DeSalle, Timothy D Read, Mihai Pop.
Abstract
While new sequencing technologies have ushered in an era where microbial genomes can be easily sequenced, the goal of routinely producing high-quality draft and finished genomes in a cost-effective fashion has remained elusive. Due to shorter read lengths and limitations in library construction protocols, shotgun sequencing and assembly based on these technologies often result in fragmented assemblies. Correspondingly, while draft assemblies can be obtained in days, finishing can take many months, and hence the time and effort can only be justified for high-priority genomes and in large sequencing centers. In this work, we revisit this issue in light of our own experience in producing finished and nearly-finished genomes for a range of microbial species in a small-lab setting. These genomes were finished with surprisingly little investment of time, computational effort and lab work, suggesting that increased access to sequencing might eventually lead to a greater proportion of finished genomes from small labs and genomics cores.
Year: 2010 PMID: 20398345 PMCID: PMC2864248 DOI: 10.1186/1471-2164-11-242
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Statistics from some contemporary finishing projects.
| Genome | Size (Mbp) | Sequencing Center | Release Date | Finishing Reads |
|---|---|---|---|---|
| | 4.68 | TIGR | 12/12/07 | 5,642 |
| | 5.4 | JGI | 02/06/06 | 2,417 |
| | 4.1 | TIGR | 05/08/07 | 4,521 |
| | 4.99 | JCVI | 07/24/08 | 799 |
| | - | Baylor | Ongoing | 524 |
| | 8.2 | JCVI | Ongoing | 3,828 |
| | 4.6 | Sanger | 10/01/08 | 2,033 |
Data was collected from NCBI's Trace Archive and Genomes Database. The V. cholerae genome was sequenced using a 454/Sanger hybrid approach while the rest of the genomes were sequenced by Sanger sequencing. Note that the P. stewartii genome was found to be particularly hard to finish, despite the high sequence coverage using Sanger sequencing, because of the presence of numerous plasmids in the sequenced strain.
Figure 1. Summary of the finishing effort for . As can be seen from the figure, much of the finishing effort (PCR experiments indicated by bars in the outermost ring) was devoted to disambiguating the neighborhood of the rRNA operon.
Assembly and Map statistics for the Yersinia genomes.
| Strain | 454 contigs: average coverage | 454 contigs: number (large contigs) | 454 contigs: N50 | Optical map: enzyme and total size | Optical map: fragments | Optical map: fragment N50 |
|---|---|---|---|---|---|---|
| | 29× | 488 (50) | 93 | AflII: 4.63 | 350 | 5.5 |
| | 25× | 375 (83) | 86 | AflII: 4.30 | 360 | 11.4 |
| | 36× | 1637 (74) | 85 | AflII: 4.93 | 397 | 19.0 |
| | | | | NheI: 4.92 | 556 | 6.4 |
| | 29× | 1244 (46) | 150 | AflII: 5.34 | 467 | 24.7 |
| | | | | NheI: 5.39 | 611 | 3.6 |
| | 26× | 1219 (84) | 59 | AflII: 4.54 | 415 | 2.6 |
| | | | | NheI: 4.50 | 591 | 13.9 |
| | 35× | 1242 (60) | 124 | AflII: 4.95 | 436 | 6.6 |
| | | | | NheI: 5.06 | 537 | 10.2 |
| | 22× | 281 (59) | 116 | AflII: 4.65 | 413 | 11.3 |
| | | | | NheI: 4.64 | 458 | 12.7 |
| | 36× | 419 (63) | 79 | AflII: 3.90 | 142 | 29.6 |
| | | | | NheI: 3.95 | 457 | 7.9 |
For the 454 contigs we report the average coverage of the contigs, the number of contigs (with large contigs in parentheses) and the N50 size. For the optical maps, we report the total size, number of fragments and N50 size of fragments for AflII and NheI based maps on separate lines.
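The N50 statistic reported above can be computed as in the following minimal sketch. It assumes the common definition (the largest length L such that contigs or fragments of length at least L account for at least half of the total); the function name `n50` is illustrative and is not taken from the authors' software.

```python
def n50(lengths):
    """Return the N50 of a collection of lengths (0 for an empty input).

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running sum first reaches half of the total.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths, for illustration only.
print(n50([2, 3, 4, 5, 6]))
```

The same function applies unchanged to optical-map fragment sizes, since only the list of lengths matters.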
Scaffolding results for the Yersinia genomes.
| Strain | AflII based: size in Mbp (% of genome) | NheI based: size in Mbp (% of genome) | Both maps: size in Mbp (% of genome) | Draft genome: size in Mbp (% of genome) | Draft genome: # of gaps (>10 |
|---|---|---|---|---|---|
| | 3.91 (83.9) | | | 4.36 (93.7) | 37 (8) |
| | 3.51 (82.9) | | | 3.64 (86.2) | 39 (14) |
| | 3.73 (76.1) | 3.95 (80.6) | 4.15 (84.8) | 4.30 (87.8) | 56 (24) |
| | 4.34 (81.0) | 4.54 (84.8) | 4.63 (86.3) | 4.72 (88.1) | 33 (20) |
| | 3.32 (72.8) | 3.50 (76.7) | 3.62 (79.2) | 3.87 (84.8) | 57 (25) |
| | 4.38 (86.9) | 4.14 (82.0) | 4.35 (86.3) | 4.62 (91.5) | 53 (15) |
| | 3.91 (85.3) | 3.81 (82.9) | 4.01 (87.3) | 4.15 (90.4) | 40 (16) |
| | 1.93 (49.4) | 3.13 (80.1) | 3.20 (82.0) | 3.34 (85.5) | 39 (17) |
Here we report the size of the scaffolds obtained by combining each of the optical maps and the 454 contigs. These scaffolds were then merged and augmented with contig graph information (see Methods) to obtain the draft genome and we report the results after both stages. For the draft genome we also report the number of gaps in the final scaffold.
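To make the reported quantities concrete, the sketch below shows one way that scaffold size, percentage of the genome and gap counts could be derived from contig placements on a map. It is an illustration under stated assumptions, not the authors' procedure: the genome is treated as linear, placements are hypothetical half-open `(start, end)` intervals, only gaps between placed contigs are counted, and the large-gap threshold is left as a parameter (`large_gap`).

```python
def scaffold_stats(placements, genome_size, large_gap):
    """Summarise contig placements on a linear map.

    placements: iterable of (start, end) half-open intervals (hypothetical).
    genome_size: assumed total genome length, in the same units.
    large_gap: threshold above which a gap is counted as large (assumption).
    """
    # Merge overlapping or touching placements so bases are counted once.
    merged = []
    for start, end in sorted(placements):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    covered = sum(end - start for start, end in merged)
    # Gaps are the uncovered stretches between consecutive merged blocks.
    gaps = [right[0] - left[1] for left, right in zip(merged, merged[1:])]
    return {
        "covered": covered,
        "percent": 100.0 * covered / genome_size,
        "gaps": len(gaps),
        "large_gaps": sum(1 for gap in gaps if gap > large_gap),
    }


# Hypothetical placements, for illustration only.
stats = scaffold_stats([(0, 100), (150, 300), (290, 400), (1000, 1200)], 2000, 500)
print(stats)
```

A circular chromosome would additionally need the wrap-around gap between the last and first placement, which this sketch omits.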
Figure 2. Note that the comments for Figure 3 are also valid here. This graph can be resolved into a unique in silico reconstruction of the genome.
Figure 3. Partial Contig Graph of . The pointed boxes represent contigs while the edges mark the presence of reads that span the corresponding contigs. The arrows on both ends of an edge indicate the orientation of the adjacent contigs. An arrow "out" of a contig indicates that the end of the contig is adjacent and an arrow "in" indicates that the beginning of the contig is adjacent.
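The arrow convention in this caption can be encoded as a small data structure. The sketch below is illustrative only: the `Edge` type and `relative_orientation` function are hypothetical names, not part of the pipeline described in the paper. It relies on the observation that when one contig's end abuts the other's beginning ("out" paired with "in") the two contigs lie on the same strand, whereas end-to-end or beginning-to-beginning adjacency implies that one of them is reverse-complemented.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """A contig-graph edge; arrows follow the caption's 'out'/'in' convention."""
    contig_a: str
    arrow_a: str  # "out": end of contig_a is adjacent; "in": its beginning
    contig_b: str
    arrow_b: str  # same convention for contig_b


def relative_orientation(edge):
    """Return 'same' or 'opposite' for the two contigs joined by the edge."""
    for arrow in (edge.arrow_a, edge.arrow_b):
        if arrow not in ("in", "out"):
            raise ValueError(f"unknown arrow label: {arrow!r}")
    # end-to-beginning adjacency keeps both contigs on the same strand;
    # end-to-end or beginning-to-beginning flips one of them.
    return "same" if edge.arrow_a != edge.arrow_b else "opposite"


# Hypothetical edges, for illustration only.
print(relative_orientation(Edge("contig1", "out", "contig2", "in")))
print(relative_orientation(Edge("contig1", "out", "contig3", "out")))
```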
Figure 4. AMOS-Hybrid pipeline. Circles are used to represent input/output and intermediate datasets. Names in parentheses refer to the programs used to perform the corresponding tasks in the boxes.
Figure 5. The optical mapping process. To generate a whole-genome optical map, DNA is sheared into fragments that are stretched and fixed onto an optical mapping surface and then digested using a restriction enzyme. The resulting pieces are optically analyzed and assembled into a genome-wide map.
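The digestion step has a simple computational counterpart: cutting a known sequence at every occurrence of an enzyme's recognition site yields an ordered list of fragment sizes that can be compared with an optical map. The sketch below is a minimal illustration and not the software used in the paper. It assumes a linear sequence, exact site matches and the standard recognition sites for AflII (C^TTAAG) and NheI (G^CTAGC), each cut one base into the site; the function name `digest` is hypothetical.

```python
# Recognition site and cut offset within the site (assumed standard values).
ENZYMES = {
    "AflII": ("CTTAAG", 1),
    "NheI": ("GCTAGC", 1),
}


def digest(sequence, enzyme):
    """Return ordered fragment lengths from cutting a linear sequence."""
    site, cut_offset = ENZYMES[enzyme]
    seq = sequence.upper()

    # Record the cut position for every occurrence of the site.
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)

    # Fragment lengths are the distances between consecutive boundaries.
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Hypothetical toy sequence with two AflII sites, for illustration only.
print(digest("AAAACTTAAGGGGGCTTAAGTT", "AflII"))
```

Both sites are palindromic, so scanning a single strand finds every cut; a non-palindromic site would also require searching the reverse complement.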