David Paez-Espino1, Simon Roux1, I-Min A Chen2, Krishna Palaniappan2, Anna Ratner2, Ken Chu2, Marcel Huntemann1, T B K Reddy1, Joan Carles Pons3, Mercè Llabrés3, Emiley A Eloe-Fadrosh1, Natalia N Ivanova1, Nikos C Kyrpides1.
Abstract
The Integrated Microbial Genome/Virus (IMG/VR) system v.2.0 (https://img.jgi.doe.gov/vr/) is the largest publicly available data management and analysis platform dedicated to viral genomics. Since the last report, published in the 2016 NAR Database Issue, the data has tripled in size and currently contains genomes of 8389 cultivated reference viruses, 12 498 previously published curated prophages derived from cultivated microbial isolates, and 735 112 viral genomic fragments computationally predicted from assembled shotgun metagenomes. Nearly 60% of the viral genomes and genome fragments are clustered into 110 384 viral Operational Taxonomic Units (vOTUs) with two or more members. To improve data quality and predictions of host specificity, IMG/VR v.2.0 now separates prokaryotic and eukaryotic viruses, utilizes known prophage sequences to improve taxonomic assignments, and provides viral genome quality scores based on the estimated genome completeness. New features also include enhanced BLAST search capabilities for external queries. Finally, geographic map visualization to locate user-selected viral genomes or genome fragments has been implemented and download options have been extended. All of these features make IMG/VR v.2.0 a key resource for the study of viruses.
Year: 2019 PMID: 30407573 PMCID: PMC6323928 DOI: 10.1093/nar/gky1127
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Growth rate of predicted viral genomes and genome fragments from publicly available assembled metagenomes. Growth during IMG/VR update cycles for total (UViGs) and unique (vOTUs) viral genomes and genome fragments from the Integrated Microbial Genome & Microbiomes (IMG/M) system using the JGI’s metagenomic virus discovery pipeline (16). Previous reports include the Earth's virome project (3) and the first release of IMG/VR (10).
Predicted bacterial and archaeal host phyla with corresponding number of UViGs. Archaeal phyla are indicated with (A).
| Host phylum | Viral contig count |
|---|---|
| Euryarchaeota (A) | 218 |
| Crenarchaeota (A) | 58 |
| ca. Micrarchaeota (A) | 40 |
| Thaumarchaeota (A) | 6 |
| ca. Bathyarchaeota (A) | 4 |
| Aigarchaeota (A) | 4 |
| Nanoarchaeota (A) | 2 |
| Thermotogae (A) | 1 |
| Firmicutes | 8123 |
| Proteobacteria | 5911 |
| Bacteroidetes | 3583 |
| Actinobacteria | 1971 |
| Fusobacteria | 1801 |
| Spirochaetes | 130 |
| Verrucomicrobia | 127 |
| Synergistetes | 58 |
| Thermotogae | 53 |
| Chloroflexi | 52 |
| Cyanobacteria | 47 |
| Chlorobi | 47 |
| Deinococcus-Thermus | 32 |
| Aquificae | 26 |
| Fibrobacteres | 21 |
| Planctomycetes | 15 |
| Chlamydiae | 14 |
| Ignavibacteriae | 12 |
| Caldiserica | 9 |
| ca. Atribacteria | 9 |
| Gemmatimonadetes | 8 |
| ca. Desantisbacteria | 7 |
| Armatimonadetes | 5 |
| ca. Marinimicrobia | 5 |
| ca. Fervidibacteria | 4 |
| ca. Cloacimonetes | 4 |
| ca. Microgenomates | 3 |
| Marinimicrobia | 3 |
| ca. Moranbacteria | 3 |
| ca. Parcubacteria | 3 |
| ca. Aminicenantes | 2 |
| ca. Saccharibacteria | 2 |
| Nitrospirae | 2 |
| Tenericutes | 2 |
| ca. Wildermuthbacteria | 2 |
| ca. Daviesbacteria | 2 |
| ca. Omnitrophica | 1 |
| Lentisphaerae | 1 |
| Acidobacteria | 1 |
| ca. Gracilibacteria | 1 |
| Thermodesulfobacteria | 1 |
*Host phyla without a previous connected virus. Microbial phyla classification according to the IMG/M system.
†Candidate Phyla from the CPR.
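As an illustration of how the table above can be summarized, the following minimal sketch sums viral contig counts per host domain, relying only on the "(A)" suffix that the caption uses to flag archaeal phyla. The function name `tally_by_domain` and the four-row sample are hypothetical and chosen for this example; the sample rows are copied from the table, and this is not part of IMG/VR.

```python
def tally_by_domain(table_text):
    """Sum viral contig counts per host domain from a two-column markdown table."""
    totals = {"Archaea": 0, "Bacteria": 0}
    for line in table_text.strip().splitlines():
        # Split a row such as "| Firmicutes | 8123 |" into its two cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip header and separator rows: keep rows whose second cell is a number.
        if len(cells) != 2 or not cells[1].isdigit():
            continue
        phylum, count = cells[0], int(cells[1])
        # The table caption marks archaeal phyla with a trailing "(A)".
        domain = "Archaea" if phylum.endswith("(A)") else "Bacteria"
        totals[domain] += count
    return totals


# Hypothetical excerpt: four rows copied from the table above.
sample = """
| Host phylum | Viral contig count |
|---|---|
| Euryarchaeota (A) | 218 |
| Crenarchaeota (A) | 58 |
| Firmicutes | 8123 |
| Proteobacteria | 5911 |
"""

print(tally_by_domain(sample))
```

Passing the full table text instead of the excerpt would give per-domain totals for all listed phyla.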
Figure 2. Distribution and example of the different viral genome quality categories in IMG/VR v.2.0. (A) Distribution of the number of sequences identified as ‘finished genomes’, ‘high-quality draft genomes’, or ‘genome fragments’. (B) Comparison of three contigs from vOTU_00079: two ‘high-quality draft genomes’ and one ‘genome fragment’. Genome quality category was based on estimated genome completeness (Roux et al., in press, Nature Biotech). Genes are colored according to their functional annotation. The starting coordinate of the circular contig map was shifted to match that of the linear contig maps.
Figure 3. Example of analysis features in IMG/VR v.2.0. (1) UViGs were selected from a ‘Viral Datasets’ sample and added to the ‘Scaffold Cart’. (2) From the ‘Find Functions’ tab, protein families (pfams) were filtered by the text ‘terminase’, returning the pfams associated with this predicted viral function. (3) In the ‘Scaffold Cart’, the ‘Scaffold Function Profile’ option allows users to see the distribution of the selected functions (pfams in this example) across the selected list of UViGs. (4) Additionally, all genes from the selected UViGs can be added to the ‘Gene Cart’ by clicking ‘Add Genes from Selected Scaffolds to Cart’. (5) The ‘Gene Cart’ functionality allows users to (6) perform gene or protein alignments of selected sequences and visualize them as a phylogram, or (7) display the gene neighborhood of the selected genes (underscored in red). The location of the tools or the steps necessary to recreate this example are indicated by red boxes.
Figure 4. Visualization of the geographic location of selected viral genomes. Uncultivated viral genomes and genome fragments (UViGs) can be accessed in different ways. (1) Here, UViGs from the viral cluster ‘vc_2912’ were selected from the ‘Viral Clusters’ link in the IMG/VR Home Page and added to the ‘Scaffold Cart’. (2) 14 UViGs were retrieved and a feature table displayed (not shown in the figure). (3) To obtain a Google Map with the location of the selected UViGs, users need to return to the Home Page and select ‘Scaffold Cart’ from the ‘Ecosystem’ drop-down box above the map. Map pins (in red) represent location counts of viral contigs and may contain multiple samples. Map pins are grouped into clusters (bold number in a coloured square based on the number of members within the cluster) according to the Google Maps JavaScript API utility library. (4) Zooming into any of the cluster locations decreases the number on the cluster and reveals the individual markers on the map, from which specific UViGs can be selected. Zooming out of the map consolidates the markers into clusters again.
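The zoom-dependent grouping of map pins described in the Figure 4 caption can be illustrated with a simplified grid-based scheme. This is only a sketch of the general idea and not the implementation used by the Google Maps utility library or by IMG/VR: the function `cluster_markers`, the `base_cell_deg` parameter, the halving of cell size per zoom level, and the sample coordinates are all assumptions made for this example.

```python
from collections import defaultdict


def cluster_markers(points, zoom, base_cell_deg=45.0):
    """Group (lat, lon) points into square grid cells whose side halves per zoom level."""
    cell = base_cell_deg / (2 ** zoom)  # assumed cell size in degrees at this zoom
    clusters = defaultdict(list)
    for lat, lon in points:
        # Points falling in the same grid cell are drawn as one cluster.
        key = (int(lat // cell), int(lon // cell))
        clusters[key].append((lat, lon))
    return list(clusters.values())


# Hypothetical sampling locations (latitude, longitude), not taken from IMG/VR.
samples = [(37.87, -122.27), (37.90, -122.30), (38.50, -121.70), (48.85, 2.35)]

for zoom in (0, 6, 10):
    sizes = sorted(len(c) for c in cluster_markers(samples, zoom))
    print(f"zoom {zoom}: {len(sizes)} clusters with sizes {sizes}")
```

With these sample points, the number of clusters grows as the zoom level increases, mirroring the behaviour in the caption where cluster counts shrink and individual markers appear when zooming in.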
Figure 5. Searches against IMG/VR v.2.0 databases. (A) Location of the BLAST tool in IMG/VR v.2.0 (dashed red box). (B) User interface for BLAST searches. Users can choose between nucleotide and protein searches by selecting the blastn or blastp program when querying external ‘Viral Sequences’. For nucleotide searches, users can additionally use the displayed spacer ‘Blast Databases’.