Simon Roux1, Evelien M Adriaenssens2, Bas E Dutilh3,4, Eugene V Koonin5, Andrew M Kropinski6, Mart Krupovic7, Jens H Kuhn8, Rob Lavigne9, J Rodney Brister5, Arvind Varsani10,11, Clara Amid12, Ramy K Aziz13, Seth R Bordenstein14, Peer Bork15, Mya Breitbart16, Guy R Cochrane12, Rebecca A Daly17, Christelle Desnues18, Melissa B Duhaime19, Joanne B Emerson20, François Enault21, Jed A Fuhrman22, Pascal Hingamp23, Philip Hugenholtz24, Bonnie L Hurwitz25,26, Natalia N Ivanova1, Jessica M Labonté27, Kyung-Bum Lee28, Rex R Malmstrom1, Manuel Martinez-Garcia29, Ilene Karsch Mizrachi5, Hiroyuki Ogata30, David Páez-Espino1, Marie-Agnès Petit31, Catherine Putonti32,33,34, Thomas Rattei35, Alejandro Reyes36, Francisco Rodriguez-Valera37, Karyna Rosario16, Lynn Schriml38, Frederik Schulz1, Grieg F Steward39, Matthew B Sullivan40,41, Shinichi Sunagawa42, Curtis A Suttle43,44,45,46, Ben Temperton47, Susannah G Tringe1, Rebecca Vega Thurber48, Nicole S Webster24,49, Katrine L Whiteson50, Steven W Wilhelm51, K Eric Wommack52, Tanja Woyke1, Kelly C Wrighton17, Pelin Yilmaz53, Takashi Yoshida54, Mark J Young55, Natalya Yutin5, Lisa Zeigler Allen56,57, Nikos C Kyrpides1, Emiley A Eloe-Fadrosh1.
Abstract
We present an extension of the Minimum Information about any (x) Sequence (MIxS) standard for reporting sequences of uncultivated virus genomes. Minimum Information about an Uncultivated Virus Genome (MIUViG) standards were developed within the Genomic Standards Consortium framework and include virus origin, genome quality, genome annotation, taxonomic classification, biogeographic distribution and in silico host prediction. Community-wide adoption of MIUViG standards, which complement the Minimum Information about a Single Amplified Genome (MISAG) and Metagenome-Assembled Genome (MIMAG) standards for uncultivated bacteria and archaea, will improve the reporting of uncultivated virus genomes in public databases. In turn, this should enable more robust comparative studies and a systematic exploration of the global virosphere.
Year: 2018 PMID: 30556814 PMCID: PMC6871006 DOI: 10.1038/nbt.4306
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1 Size of virus genome databases over time[4,7,22,45,83,84,85,86,87,88,89].
Genome sequences from isolates (blue and green) or from UViGs (yellow) are shown. For genomes from isolates, the total number of genomes (blue) and the number of 'reference' genomes (green) are shown. Data were downloaded using the queries “Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC]” for reference genomes and “Viruses[Organism] NOT cellular organisms[ORGN] NOT wgs[PROP] NOT AC_000001:AC_999999[pacc] NOT gbdiv syn[prop] AND nuccore genome samespecies[Filter]” for total number of virus genomes, on the NCBI nucleotide database portal (https://www.ncbi.nlm.nih.gov/nuccore) in January 2018. Genomes from the influenza virus database (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi?go=genomeset) were also added to the total number of virus genomes. UViGs can be assembled from metagenomes, from proviruses identified in microbial genomes, or from single-virus genomes, and estimated total UViG numbers were obtained by compiling data from the literature and from the total number of sequences in the IMG/VR database in January 2017, January 2018 and July 2018 (https://img.jgi.doe.gov/vr/)[11]. UpViG, uncultivated provirus.
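The two counts in Figure 1 come from Entrez queries run on the NCBI nucleotide portal. As a minimal sketch, assuming one wants to re-run the same queries programmatically, the snippet below builds NCBI E-utilities ESearch URLs that return only the hit count. The helper names `count_url` and `fetch_count` are hypothetical; counts obtained today will differ from the January 2018 values shown in the figure, and the influenza virus database genomes that were added to the total are not included.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Queries quoted verbatim from the Figure 1 legend.
REFERENCE_QUERY = (
    "Viruses[Organism] AND srcdb_refseq[PROP] NOT wgs[PROP] "
    "NOT cellular organisms[ORGN] NOT AC_000001:AC_999999[PACC]"
)
TOTAL_QUERY = (
    "Viruses[Organism] NOT cellular organisms[ORGN] NOT wgs[PROP] "
    "NOT AC_000001:AC_999999[pacc] NOT gbdiv syn[prop] "
    "AND nuccore genome samespecies[Filter]"
)


def count_url(term: str) -> str:
    """Build an ESearch URL that asks nuccore for the hit count of `term`."""
    params = {"db": "nuccore", "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def fetch_count(term: str) -> int:
    """Query NCBI (network access required) and return the hit count."""
    with urlopen(count_url(term), timeout=30) as resp:
        data = json.load(resp)
    return int(data["esearchresult"]["count"])
```

For example, `fetch_count(REFERENCE_QUERY)` would return the current number of RefSeq virus genomes matching the legend's query. NCBI asks that unauthenticated clients stay under a few requests per second.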
List of mandatory metadata for UViGs
| Mandatory metadata | Description |
|---|---|
| Source of UViGs | Type of dataset from which the UViG was obtained |
| Assembly software | Tool(s) used for assembly and/or binning, including version number and parameters |
| Virus identification software | Tool(s) used for the identification of the UViG as a viral genome: software or protocol name, including version number, parameters and cutoffs used (see …) |
| Predicted genome type | Type of genome predicted for the UViG |
| Predicted genome structure | Expected structure of the viral genome |
| Detection type | Type of UViG detection |
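| Assembly quality | Assembly quality category (genome fragment(s), high-quality draft genome or finished genome), assigned using virus-specific criteria summarized under "Summary of required characteristics for each category" |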
| Number of contigs | Total number of contigs composing the UViG |
For a complete list and description of mandatory and optional metadata, see Supplementary Table 1.
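As an illustration of how a submission pipeline might enforce the mandatory fields listed above, here is a minimal sketch, assuming each UViG's metadata is held in a Python dictionary. The snake_case keys are hypothetical names chosen for this example, not official MIxS/MIUViG term identifiers, and the only value check performed beyond presence is that the number of contigs is a positive integer.

```python
from typing import List, Mapping

# Mandatory MIUViG fields from the table above. These keys are hypothetical
# snake_case names for this sketch, not official MIxS term identifiers.
MANDATORY_FIELDS = (
    "source_of_uvigs",
    "assembly_software",
    "virus_identification_software",
    "predicted_genome_type",
    "predicted_genome_structure",
    "detection_type",
    "assembly_quality",
    "number_of_contigs",
)


def check_mandatory(record: Mapping[str, object]) -> List[str]:
    """Return mandatory fields that are missing, blank or invalid in `record`."""
    problems = []
    for field in MANDATORY_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            problems.append(field)  # absent or empty string
        elif field == "number_of_contigs" and not (
            isinstance(value, int) and value >= 1
        ):
            problems.append(field)  # must be a positive contig count
    return problems
```

An empty return value means every mandatory field is filled in; optional fields from Supplementary Table 1 are not checked here.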
Figure 2 Identification of UViGs.
Schematic of methods used to obtain UViGs. Steps adapted from those used to assemble MAGs and SAGs[12], or added specifically for UViGs, are shown for sample preparation (orange) and bioinformatics analysis (blue). Steps specifically required for virus targeting and identification are highlighted in bold. *For viruses with short genomes, long-read technologies can provide complete genomes from shotgun sequencing in a single read, bypassing the assembly step[24]. **Targeted sequence capture can be used to recover viral genomes from a known virus group, even from samples in which these genomes represent a small fraction of the templates (for example, clinical samples[20]).
Figure 3 UViG classification and associated sequence analyses.
“Functional potential” is functional annotation used in gene content analysis. “Host prediction” is the application of different in silico host prediction tools. “Taxonomic classification” is classification of the contig to established groups using marker genes or gene content comparison. “Diversity and distribution” includes vOTU clustering and relative abundance estimation through metagenome read mapping, at the geographical scale or across anatomical sites for host-associated datasets. “New taxonomic groups” concerns the delineation of new proposed groups (for example, families or genera) based exclusively on UViG sequences. “New reference species” refers to the proposal of a new entry in ICTV (https://talk.ictvonline.org/files/taxonomy-proposal-templates/). *Some of these approaches require a minimum contig size—for example, contigs ≥10 kb for taxonomic classification based on gene content[59] or diversity estimation[47]—and will not be applicable to every genome fragment.
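The "Diversity and distribution" step groups UViGs into vOTUs before abundance estimation. The sketch below shows one common approach, greedy centroid clustering, assuming pairwise average nucleotide identity (ANI) and aligned fraction (AF) values have already been computed by an external aligner. The 95% ANI over 85% AF cutoff is an assumed, commonly used default rather than something stated in this legend; the 10 kb minimum length follows the size limit mentioned above for diversity estimation. The function name `cluster_votus` and the convention that AF is the fraction of the member sequence covered are hypothetical choices for this example.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (member contig, centroid contig)


def cluster_votus(
    lengths: Dict[str, int],
    ani: Dict[Pair, float],
    af: Dict[Pair, float],
    min_ani: float = 95.0,   # assumed default, percent identity
    min_af: float = 85.0,    # assumed default, percent of member aligned
    min_length: int = 10_000,
) -> Dict[str, List[str]]:
    """Greedy centroid clustering of contigs into vOTUs.

    ani[(member, centroid)] and af[(member, centroid)] are percentages
    precomputed by an external aligner; missing pairs count as unrelated.
    """
    # Drop short contigs, then visit the longest first so that the longest
    # sequence of each vOTU becomes its centroid (ties broken by name).
    candidates = sorted(
        (c for c, n in lengths.items() if n >= min_length),
        key=lambda c: (-lengths[c], c),
    )
    votus: Dict[str, List[str]] = {}
    for contig in candidates:
        for centroid in votus:  # centroids in order of decreasing length
            key = (contig, centroid)
            if ani.get(key, 0.0) >= min_ani and af.get(key, 0.0) >= min_af:
                votus[centroid].append(contig)
                break
        else:
            votus[contig] = [contig]  # no match: start a new vOTU
    return votus
```

Greedy clustering is order dependent; processing the longest sequences first is a design choice that makes the centroid the most complete representative of each vOTU.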
Summary of required characteristics for each category
| Category | Genome fragment(s) | High-quality draft genome | Finished genome |
|---|---|---|---|
| Assembly | Single or multiple fragments | Single or multiple fragments where gaps span (mostly) repetitive regions | Single contiguous sequence (per segment) without gaps or ambiguities |
| Completeness | <90% expected genome size or no expected genome size | Complete or ≥90% of expected genome size | Complete |
| Required features | Minimal annotation | Minimal annotation | Comprehensive manual review and editing |
Complete genomes include sequences detected as circular, those with terminal inverted repeats, or those for which an integration site is identified.
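The categories above can be expressed as a small decision procedure. The following is a minimal sketch, assuming a non-segmented genome whose features (circularity, terminal inverted repeats, integration site, gaps or ambiguities, manual review) have already been determined. The `UViG` record and its field names are hypothetical, and the sketch does not check whether gaps in a high-quality draft span mostly repetitive regions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UViG:
    # Hypothetical record for this sketch; lengths are in base pairs.
    length: int
    expected_length: Optional[int] = None  # None: no expected genome size
    circular: bool = False
    terminal_inverted_repeats: bool = False
    integration_site_identified: bool = False
    gaps_or_ambiguities: bool = True  # multiple contigs, Ns or ambiguities
    manually_reviewed: bool = False


def is_complete(g: UViG) -> bool:
    """Complete: circular, with terminal inverted repeats, or integrated."""
    return g.circular or g.terminal_inverted_repeats or g.integration_site_identified


def quality_category(g: UViG) -> str:
    complete = is_complete(g)
    if complete and not g.gaps_or_ambiguities and g.manually_reviewed:
        return "Finished genome"
    # Integer arithmetic avoids floating-point error in the >=90% test.
    near_complete = (
        g.expected_length is not None and 10 * g.length >= 9 * g.expected_length
    )
    if complete or near_complete:
        return "High-quality draft genome"
    return "Genome fragment(s)"
```

For example, a 46 kb contig with an expected genome size of 50 kb is classified as a high-quality draft genome, whereas the same contig with no expected genome size and no completeness evidence remains a genome fragment.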