Ana Laura Grazziotin, Eugene V Koonin, David M Kristensen.
Abstract
Viruses are the most abundant and diverse biological entities on earth, and while most of this diversity remains completely unexplored, advances in genome sequencing have provided unprecedented glimpses into the virosphere. The Prokaryotic Virus Orthologous Groups (pVOGs, formerly called Phage Orthologous Groups, POGs) resource has aided in this task over the past decade by using automated methods to keep pace with the rapid increase in genomic data. The uses of pVOGs include functional annotation of viral proteins, identification of genes and viruses in uncharacterized DNA samples, phylogenetic analysis, large-scale comparative genomics projects, and more. The pVOGs database represents a comprehensive set of orthologous gene families shared across multiple complete genomes of viruses that infect bacterial or archaeal hosts (viruses of eukaryotes will be added at a future date). The pVOGs are constructed within the Clusters of Orthologous Groups (COGs) framework that is widely used for orthology identification in prokaryotes. Since the previous release of the POGs, the size has tripled to nearly 3000 genomes and 300 000 proteins, and the number of conserved orthologous groups doubled to 9518. User-friendly webpages are available, including multiple sequence alignments and HMM profiles for each VOG. These changes provide major improvements to the pVOGs database, at a time of rapid advances in virus genomics. The pVOGs database is hosted jointly at the University of Iowa at http://dmk-brain.ecn.uiowa.edu/pVOGs and the NCBI at ftp://ftp.ncbi.nlm.nih.gov/pub/kristensen/pVOGs/home.html. Published by Oxford University Press on behalf of Nucleic Acids Research 2016. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Year: 2016 PMID: 27789703 PMCID: PMC5210652 DOI: 10.1093/nar/gkw975
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Coverage of protein sequences in VOGs for each viral family. The number of genomes in each family is shown in parentheses. Families with representatives covered by VOGs are shown. Globuviridae is not shown since its representative viruses had no proteins present in VOGs.
Figure 2. Sizes and functions of the largest VOGs. A single function for these was manually defined and reported based on the consistency of individual protein annotations within each VOG.
Figure 3. Screenshot of pVOGs webpages showing the main access to database content: (A) genome table with information about all virus genomes in the database; (B) VOG table describing a list of all VOGs present in the pVOGs database, protein annotations of sequences present in each VOG and their mapping to the previous POGs.
Figure 4. Screenshot of pVOGs webpages showing additional types of information available: (A) individual genome table showing a list of VOGs present in a particular genome and respective protein annotations; (B) individual VOG table, describing detailed information about each VOG and protein content. This page also provides tabular files, multiple sequence alignments and HMM profiles for downloading.
Mapping of the current pVOGs to the older POGs
| | POGs 2013 | pVOGs 2016 |
|---|---|---|
| Complete datasets | | |
| Number of clusters | 4542 | 9518 |
| Number of genomes in clusters | 1018 | 2976 |
| Number of conserved proteins in clusters | 58 276 | 195 002 |
| Reduced intersection datasets | | |
| Number of clusters | 4201 | 3549 |
| Number of genomes | 982 | 982 |
| Number of conserved proteins | 48 246 | 48 246 |
| Number of exactly identical clusters | 2773 | 2773 |
| Number of overlapping clusters | 1153 | 509 |
| Number of disjoint clusters | 275 | 267 |
| Agreement between datasets (%)^a | 94% | 92% |

^a Percentage of cluster agreement between reduced intersection datasets of POG 2013 and pVOGs 2016.
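
The counts in the table can be cross-checked with a little arithmetic. The sketch below is illustrative only. It assumes that the first numeric column is POGs 2013 and the second is pVOGs 2016, as the footnote and the abstract's totals suggest. It also assumes that "agreement" can be approximated by the share of clusters that are identical or overlapping. That reading is an assumption and is not stated in the text. The names `table`, `full`, `growth` and `non_disjoint_share` are hypothetical.

```python
# Values copied from the table; column order (POGs 2013, pVOGs 2016) is assumed.
table = {
    "POGs 2013": {"clusters": 4201, "identical": 2773, "overlapping": 1153, "disjoint": 275},
    "pVOGs 2016": {"clusters": 3549, "identical": 2773, "overlapping": 509, "disjoint": 267},
}


def non_disjoint_share(row):
    """Share of clusters that are identical or overlapping.

    This is an assumed reading of 'agreement', not the authors' stated formula.
    """
    return (row["identical"] + row["overlapping"]) / row["clusters"]


for name, row in table.items():
    # The three categories should partition the reduced cluster set.
    assert row["identical"] + row["overlapping"] + row["disjoint"] == row["clusters"]
    print(f"{name}: {non_disjoint_share(row):.1%} identical or overlapping")

# Growth of the complete datasets between the two releases: (older, newer).
full = {
    "clusters": (4542, 9518),
    "genomes": (1018, 2976),
    "conserved proteins": (58276, 195002),
}
growth = {key: new / old for key, (old, new) in full.items()}
for key, factor in growth.items():
    print(f"{key}: x{factor:.2f}")
```

In both columns the identical, overlapping and disjoint counts sum exactly to the number of clusters. Under the assumed definition the shares come to about 93.5% and 92.5%, close to the reported 94% and 92%. The small gap in the first value may reflect rounding or a different formula. The growth factors (about 2.1 for clusters and 2.9 for genomes) match the abstract's description of a doubling of orthologous groups and a tripling of genomes.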