Jason T Ladner, Brett Beitzel, Patrick S G Chain, Matthew G Davenport, Eric F Donaldson, Matthew Frieman, Jeffrey R Kugelman, Jens H Kuhn, Jules O'Rear, Pardis C Sabeti, David E Wentworth, Michael R Wiley, Guo-Yun Yu, Shanmuga Sozhamannan, Christopher Bradburne, Gustavo Palacios.
Abstract
Thanks to high-throughput sequencing technologies, genome sequencing has become a common component in nearly all aspects of viral research; thus, we are experiencing an explosion in both the number of available genome sequences and the number of institutions producing such data. However, there are currently no common standards used to convey the quality, and therefore utility, of these various genome sequences. Here, we propose five "standard" categories that encompass all stages of viral genome finishing, and we define them using simple criteria that are agnostic to the technology used for sequencing. We also provide genome finishing recommendations for various downstream applications, keeping in mind the cost-benefit trade-offs associated with different levels of finishing. Our goal is to define a common vocabulary that will allow comparison of genome quality across different research groups, sequencing platforms, and assembly techniques.
Year: 2014 PMID: 24939889 PMCID: PMC4068259 DOI: 10.1128/mBio.01360-14
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Graphical representation of viral genome standards. Bullets on the left represent primary distinctions between categories. Bullets on the right indicate potential downstream applications of genomes in each category.
Overview of viral genome standards
| Feature | Standard draft^a | High quality^a | Coding complete^a | Complete | Finished |
|---|---|---|---|---|---|
| No. of contigs | >1 for some segments | 1 per segment | 1 per segment | 1 per segment | 1 per segment |
| Open reading frames | Incomplete | Incomplete | Complete | Complete | Complete |
| Estimated % of genome covered^b | ≥50% | ~80-90% | ~90-99% | 100% | 100% |
| Population-level characterization | Optional | Optional | Optional | Optional | Required |
| Contaminant analysis | Optional | Optional | Optional | Optional | Optional |

^a It is suggested that all bases included in any incomplete genome meet a minimum quality standard: ≥5 reads supporting the consensus base call, each with an individual base quality of ≥20 on the Phred scale.
^b Percentages of genome covered are not meant to serve as criteria for categorizing a genome; they are simply estimates of expected levels of coverage.
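The categories in the table above reduce to a short chain of yes/no criteria, and the note on minimum base quality gives a concrete per-position threshold. The Python sketch below encodes both as we read them; it is an illustration, not code from the authors. The names `GenomeAssembly`, `classify`, `base_call_passes`, and `phred_to_error_prob`, and the boolean fields, are hypothetical. Treating a segment with zero contigs as a draft is our assumption. Estimated coverage is deliberately not an input, because the coverage note states that those percentages are not categorization criteria.

```python
from dataclasses import dataclass
from typing import Sequence

# Minimum quality standard for incomplete genomes, taken from the table note:
# >=5 reads supporting the consensus base, each with base quality >= Q20.
MIN_READS = 5
MIN_PHRED = 20


def phred_to_error_prob(q: int) -> float:
    """Phred scale: Q = -10 * log10(P), so P = 10 ** (-Q / 10); Q20 means a 1% error rate."""
    return 10 ** (-q / 10)


def base_call_passes(consensus: str, read_bases: Sequence[str], read_quals: Sequence[int]) -> bool:
    """Return True if enough high-quality reads support the consensus base.

    read_bases / read_quals form a hypothetical pileup at one position:
    the base each read shows there and that base's Phred quality.
    """
    support = sum(
        1 for base, qual in zip(read_bases, read_quals)
        if base == consensus and qual >= MIN_PHRED
    )
    return support >= MIN_READS


@dataclass
class GenomeAssembly:
    contigs_per_segment: Sequence[int]  # contigs assembled for each genome segment
    all_orfs_complete: bool             # every open reading frame fully determined
    sequence_complete: bool             # entire genome sequence determined
    population_characterized: bool      # population-level characterization done


def classify(g: GenomeAssembly) -> str:
    """Assign one of the five categories, checking criteria from least to most finished."""
    # Assumption: anything other than exactly one contig per segment
    # (split or missing segments) is treated as a standard draft.
    if any(n != 1 for n in g.contigs_per_segment):
        return "Standard draft"
    if not g.all_orfs_complete:
        return "High quality"
    if not g.sequence_complete:
        return "Coding complete"
    if not g.population_characterized:
        return "Complete"
    return "Finished"


if __name__ == "__main__":
    two_segment = GenomeAssembly([1, 1], all_orfs_complete=True,
                                 sequence_complete=False, population_characterized=False)
    print(classify(two_segment))  # Coding complete
    print(base_call_passes("A", ["A"] * 5, [30, 30, 30, 30, 19]))  # False: only 4 reads reach Q20
```

The checks run in the same order as the table's columns, so each category is simply the first criterion a genome fails. Contaminant analysis is omitted because the table lists it as optional at every level.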