Matthew R Henn, Matthew B Sullivan, Nicole Stange-Thomann, Marcia S Osburne, Aaron M Berlin, Libusha Kelly, Chandri Yandava, Chinnappa Kodira, Qiandong Zeng, Michael Weiand, Todd Sparrow, Sakina Saif, Georgia Giannoukos, Sarah K Young, Chad Nusbaum, Bruce W Birren, Sallie W Chisholm.
Abstract
BACKGROUND: Bacterial viruses (phages) play a critical role in shaping microbial populations as they influence both host mortality and horizontal gene transfer. As such, they have a significant impact on local and global ecosystem function and human health. Despite their importance, little is known about the genomic diversity harbored in phages, as methods to capture complete phage genomes have been hampered by the lack of knowledge about the target genomes, and difficulties in generating sufficient quantities of genomic DNA for sequencing. Of the approximately 550 phage genomes currently available in the public domain, fewer than 5% are marine phage.
Year: 2010 PMID: 20140207 PMCID: PMC2816706 DOI: 10.1371/journal.pone.0009083
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Impact of lysate nuclease treatment on 454 assembly.
| Library ID | Nuclease +/− | No. of Phage Contigs | Total Contig Length (bp) | Largest Contig Length (bp) | Fraction of bases >Q40 |
| 519 | + | 1 | 175,091 | 174,078 | 99.7% |
| 520 | + | 1 | 174,060 | 174,060 | 99.9% |
| 521 | + | 1 | 175,170 | 174,079 | 99.9% |
| 522 | − | 1 | 174,079 | 174,079 | 100.0% |
| 523 | − | 1 | 174,081 | 174,081 | 99.9% |
| 524 | − | 1 | 174,070 | 174,070 | 99.1% |
*Genome size of P-SM1 is 174,079 bp.
Figure 1. Overview of Linker Amplified Shotgun Library (LASL), Whole Genome Shotgun (WGSL), and 454 library construction strategies.
Figure 2. Comparison of sequence coverage across genomes sequenced using the 454, LASL, or WGSL approaches.
Coverage plots for T7 (A), P-SSP7 (B), P-SSM2 (C), and P-SS2 (D) are shown. Sequence coverage is binned by 100 nt windows.
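The 100 nt binning mentioned in the caption can be illustrated with a minimal sketch. It assumes aligned reads are available as 0-based, half-open `(start, end)` intervals on the reference; the function name `binned_coverage` and that input format are hypothetical and are not taken from the paper, which does not describe how the plots were computed.

```python
def binned_coverage(read_intervals, genome_length, window=100):
    """Return the mean per-base coverage for consecutive windows.

    read_intervals: iterable of (start, end) pairs, 0-based and half-open
    (an assumed input format, not the paper's).
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        start = max(0, start)
        end = min(genome_length, end)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    # Prefix sum turns the difference array into per-base depth.
    depth = []
    running = 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)

    # Average the depth inside each window (the last window may be shorter).
    bins = []
    for offset in range(0, genome_length, window):
        chunk = depth[offset:offset + window]
        bins.append(sum(chunk) / len(chunk))
    return bins
```

For example, two 100 nt reads at positions 0 and 50 on a 200 nt reference give a mean coverage of 1.5 in the first window and 0.5 in the second.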
LASL and WGSL assembly metrics.
| Reference Phage | Reference Phage Size (bp) | Library Type | Input DNA quantity (ng) | Average Sequence Coverage | % Reference Covered | No. of Phage Contigs | Largest Contig Length (bp) | Contig N50 (bp) | Percent of bases ≥Q20 |
| P-SS2 | 107,530 | LASL | 0.5 | 5.8±3.0 | 98.1% | 7 | 42,908 | 37,263 | 94.0% |
| P-SS2 | 107,530 | LASL | 1.0 | 5.7±2.9 | 97.4% | 12 | 39,421 | 31,931 | 91.0% |
| P-SS2 | 107,530 | WGS | 50.0 | 5.3±2.6 | 98.3% | 22 | 10,683 | 4,457 | 98.0% |
| P-SSM2 | 252,401 | LASL | <0.5 | 4.3±2.9 | 93.0% | 49 | 9,729 | 3,849 | 99.2% |
| P-SSM2 | 252,401 | LASL | 1.0 | 3.1±2.3 | 88.6% | 27 | 4,631 | 3,159 | 99.0% |
| P-SSP7 | 44,970 | LASL | 1.0 | 13.6±6.1 | 99.9% | 13 | 14,001 | 4,483 | 99.4% |
| P-SSP7 | 44,970 | WGS | 400.0 | 16.9±7.4 | 100.0% | 7 | 16,602 | 13,164 | 99.2% |
| T7 | 39,937 | LASL | 1.0 | 15.9±13.1 | 89.7% | 6 | 16,857 | 16,857 | 99.6% |
| T7 | 39,937 | WGS | >5000 | 66.3±56.5 | 91.8% | 9 | 14,369 | 6,148 | 99.2% |
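The Contig N50 column can be read with the commonly used definition: the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembled length. The sketch below assumes that definition; the source does not state which variant the authors used, and `contig_n50` is a hypothetical helper name.

```python
def contig_n50(lengths):
    """Return the N50 of a list of contig lengths (common definition)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise ValueError("contig_n50 requires at least one contig")
```

With illustrative lengths `[10, 8, 5, 3, 2]` the total is 28, so the running sum passes 14 at the second contig and the N50 is 8.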
Figure 3. Bias at low and high sequence coverage of P-SSP7 genome sequenced using the WGSL approach.
Sequence coverage is binned by 100 nt windows.
454 assembly metrics.
| Phage | DNA template | Input DNA quantity (ng) | Average Aligned Sequence Coverage | No. of Contigs | No. of Phage Contigs | Largest Contig Length (bp) | Total Contig Length (bp) | Percent of the Reference Covered | Percent of bases ≥Q40 | 454 Platform (assembled read average length in bp) |
| P-SS2 | CsCl purified particles | 1 | 65.2±14.4 | 1 | 1 | 105,532 | 105,532 | 99.4% | 99.9% | GS20 & FLX (161.5) |
| P-SSM2 | Lysate | 4 | 33.1±9.0 | 23 | 1 | 252,407 | 252,407 | 100.0% | 99.5% | GS20 & FLX (239.6) |
| P-SSP7 | CsCl purified particles | 4 | 22.5±5.7 | 3 | 3 | 39,777 | 44,935 | 99.9% | 99.4% | GS20 (103.6) |
| T7 | Epicentre Biotechnologies | 500 | 15.3±5.2 | 1 | 1 | 39,778 | 39,778 | 100.0% | 99.8% | GS20 & FLX (145.8) |
*See Table 1 for reference genome size.
454 assembly quality as a function of sequence coverage.
| Sequence Coverage | Total Large Contigs (>500 nt) | Total Large Contig Length | Largest Contig Length | Largest Contig Sequence Coverage | Percent of bases >Q40 |
| 8.5 | 11 | 174,742 | 65,299 | 9.24 | 97.9 |
| 10.2 | 6 | 175,215 | 84,315 | 10.98 | 98.8 |
| 11.5 | 3 | 175,483 | 173,969 | 11.61 | 99.1 |
| 13.2 | 4 | 176,170 | 174,079 | 13.36 | 99.5 |
| 16.6 | 10 | 179,923 | 174,080 | 17.06 | 98.9 |
| 20.5 | 16 | 184,250 | 174,079 | 21.5 | 98.7 |
| 30.1 | 100 | 239,442 | 174,079 | 40.47 | 93.5 |
| 30.8 | 123 | 254,225 | 174,079 | 43.86 | 92.6 |
*Only lower sequence coverages are shown.
Results of in silico mixed sample assembly experiments.
| Input Genomes | Fragment length | Ratio | No. of Large Contigs | Genomes assembled |
| PSSM2/PSSM4/PSSP7 | 100 | 1∶1∶1 | 3 | PSSM2/PSSM4/PSSP7 |
| PSSM2/PSSM4/PSSP7 | 425 | 1∶1∶1 | 3 | PSSM2/PSSM4 |
| PSSM2/PSSM4/PSSP7/MED4-259/MED4-247 | 100 | 1∶1∶1∶1∶1 | 83 | PSSM2/PSSM4/PSSP7 |
| PSSM2/PSSM4/PSSP7/MED4-259/MED4-247 | 425 | 1∶1∶1∶1∶1 | 28 | MED4-259/MED4-247/PSSM2/PSSM4 |
| **PSSM2**/PSSM4/PSSP7 | 100 | 2∶1∶1 | 3 | |
| **PSSM2**/PSSM4/PSSP7 | 425 | 2∶1∶1 | 3 | |
| PSSM2/**PSSM4**/**PSSP7** | 100 | 1∶2∶2 | 3 | PSSM2/ |
| PSSM2/**PSSM4**/**PSSP7** | 425 | 1∶2∶2 | 3 | PSSM2/ |
Genome names in bold were represented twice as often in the mixed samples with non-uniform ratios of sequences.
*If a genome is included in the Genomes assembled column, then a large contig from the Newbler assembly output was matched to a genome of the appropriate size using MUMmer.
Performance of phage annotation pipeline.
| Genome | Genome Size (bp) | No. ORFs GenBank RefSeq | No. ORFs Annotated Broad | No. ORFs Same Start & Stop | No. ORFs Same Start Different Stop | No. ORFs Same Stop Different Start | No. ORFs in GenBank Only | No. ORFs in Broad Only | No. blastLoci Not In GenBank | No. blastLoci Not In Broad |
| T7 | 39,778 | 60 | 49 | 46 | 3 | 4 | 11 | 0 | 0 | 0 |
| P-SSP7 | 44,935 | 53 | 52 | 47 | 1 | 3 | 2 | 1 | 1 | 0 |
| P-SSM2 | 252,407 | 329 | 331 | 300 | 0 | 21 | 7 | 9 | 1 | 1 |
*GenBank predictions contain multiple overlapping hypothetical proteins at the same locus.
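The first five comparison categories in the table above (same start and stop, same start only, same stop only, present in one annotation only) can be sketched as a simple coordinate comparison. This is only an illustration under stated assumptions: ORFs are represented as `(start, stop, strand)` tuples, and `compare_orfs` is a hypothetical function. The source does not describe how the reported counts were actually derived, so this sketch is not expected to reproduce them.

```python
def compare_orfs(reference, predicted):
    """Classify predicted ORFs against reference ORFs by shared coordinates.

    Each ORF is a (start, stop, strand) tuple (an assumed representation).
    """
    ref = set(reference)
    pred = set(predicted)

    # ORFs with identical start, stop, and strand in both annotations.
    exact = ref & pred
    ref_left = ref - exact
    pred_left = pred - exact

    ref_starts = {(start, strand) for start, _, strand in ref_left}
    ref_stops = {(stop, strand) for _, stop, strand in ref_left}
    pred_starts = {(start, strand) for start, _, strand in pred_left}
    pred_stops = {(stop, strand) for _, stop, strand in pred_left}

    counts = {
        "same_start_and_stop": len(exact),
        "same_start_diff_stop": 0,
        "same_stop_diff_start": 0,
        "reference_only": 0,
        "predicted_only": 0,
    }

    # Predicted ORFs that share exactly one boundary with a reference ORF.
    for start, stop, strand in pred_left:
        if (start, strand) in ref_starts:
            counts["same_start_diff_stop"] += 1
        elif (stop, strand) in ref_stops:
            counts["same_stop_diff_start"] += 1
        else:
            counts["predicted_only"] += 1

    # Reference ORFs that share no boundary with any remaining predicted ORF.
    for start, stop, strand in ref_left:
        if (start, strand) not in pred_starts and (stop, strand) not in pred_stops:
            counts["reference_only"] += 1

    return counts
```

Matching on a shared stop coordinate is a natural choice here because gene callers more often disagree on the start codon than on the stop.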