Julien Andreani, Rania Francis, Frederik Schulz, Hadjer Boudjemaa, Jacques Yaacoub Bou Khalil, Janey Lee, Bernard La Scola, Tanja Woyke.
Abstract
Giant viruses have large genomes, often within the size range of cellular organisms. This distinguishes them from most other viruses and demands additional effort for the successful recovery of their genomes from environmental sequence data. Here, we tested the performance of genome-resolved metagenomics on a recently isolated giant virus, Fadolivirus, by spiking it into an environmental sample from which two other giant viruses were isolated. At high spike-in levels, metagenome assembly and binning led to the successful genomic recovery of Fadolivirus from the sample. A complementary survey of the major capsid protein indicated the presence of other giant viruses in the sample matrix but did not detect the two isolated from this sample. Our results indicate that genome-resolved metagenomics is a valid approach for the recovery of near-complete giant virus genomes given that sufficient clonal particles are present. However, our data also underline that a vast majority of giant viruses remain currently undetected, even in an era of terabase-scale metagenomics.
IMPORTANCE The discovery of large and giant nucleocytoplasmic large DNA viruses (NCLDV) with genomes in the megabase range and equipped with a wide variety of features typically associated with cellular organisms was one of the most unexpected, intriguing, and spectacular breakthroughs in virology. Recent studies suggest that these viruses are highly abundant in the oceans, freshwater, and soil, impact the biology and ecology of their eukaryotic hosts, and ultimately affect global nutrient cycles. Genome-resolved metagenomics is becoming an increasingly popular tool to assess the diversity and coding potential of giant viruses, but this approach is currently lacking validation.
Keywords: NCLDV; giant viruses; metagenomics
Year: 2020 PMID: 32576649 PMCID: PMC7311315 DOI: 10.1128/mSystems.00048-20
Source DB: PubMed Journal: mSystems ISSN: 2379-5077 Impact factor: 6.496
FIG 1 (Left) Benchmarking approach to giant virus metagenomics. Three giant viruses were isolated from wastewater samples by cocultivation with amoebae; Mimivirus-like particles (dark green), Phoenician Marseillevirus (turquoise), and Fadolivirus (light green) are shown. Giant virus particles were identified using a specific PCR assay (Mimivirus-like particles) or using whole-genome sequencing (Fadolivirus, Phoenician Marseillevirus). Fadolivirus particles were purified and spiked into the initial sample at different concentrations (low, medium, and high; see Materials and Methods for more details). Samples with and without viral spike-in were subjected to shotgun metagenome sequencing, quality control (QC), assembly, and binning. The Fadolivirus metagenome assembled genome (MAG) was then compared to the Fadolivirus reference genome. (a and b) Scanning electron micrographs of isolated giant virus obtained with the TM4000 Plus tabletop microscope. (a) Mimivirus-like particles (white arrows). (b) Phoenician Marseillevirus particles (black arrows). Scale bars are indicated on each micrograph.
FIG 2 Metagenomic assembly and binning to generate the Fadolivirus metagenome assembled genome (MAG). (a) Bars indicate the total number of low-quality (LQ), medium-quality (MQ), and high-quality (HQ) MAG, as defined by MIMAG standards, after differential coverage binning of the metagenome assembly derived from the sample with the highest virus spike-in. Colors indicate domain-level taxonomic assignment of MAG according to CheckM. Boxplots show different assembly metrics for MAG. Center lines of box plots represent the median, bounds of boxes represent the lower and upper quartile, and whiskers extend to points that lie within 1.5 times the interquartile range of the lower and upper quartile. Green arrows indicate the Fadolivirus MAG. (b) Whole-genome synteny plot of the Fadolivirus MAG (light green) compared to the Fadolivirus reference assembly (black). Areas with >99% alignment identity between the two assemblies are highlighted in dark gray. For each assembly, high-identity structural repeats (>95% nucleic acid similarity) with a length of 80 to 200 bp are connected to each other with orange links. Yellow links connect the repeats between both assemblies.
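As a rough illustration of how the box plots in FIG 2a are read, the sketch below computes the median, the quartiles, and the whisker ends for a list of values, taking the whiskers to reach the most extreme points within 1.5 times the interquartile range of the quartiles. The contig counts are hypothetical and are not values from this study, and the quartile method (`inclusive`) is an assumption; plotting software may interpolate quartiles differently.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Return median, quartiles, whisker ends, and outliers for a box plot."""
    data = sorted(values)
    # Quartile interpolation varies between tools; "inclusive" is an assumption.
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": median(data),
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical contig counts per MAG (illustrative only, not study data).
contigs_per_mag = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = boxplot_summary(contigs_per_mag)
print(summary)
```

In this toy input the value 100 falls outside the upper fence, so it would be drawn as an individual point beyond the whisker.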
Assembly metrics of the Fadolivirus metagenome assembled genome (MAG) compared to the Fadolivirus reference assembly
| Parameter | MetaBAT 2-dc | MetaBAT 2 | CONCOCT | MaxBin 2 |
|---|---|---|---|---|
| Bin size (bp) (Fadolivirus MAG) | 1,623,616 | 1,583,180 | 1,941,890 | 1,712,889 |
| Total aligned length (bp) | 1,590,159 | 1,567,605 | 1,590,159 | 1,590,159 |
| Unaligned length (bp) | 33,031 | 15,575 | 351,731 | 122,730 |
| Genome fraction (%) | 99.707 | 98.297 | 99.707 | 99.707 |
|  | 481,715 | 481,715 | 481,715 | 481,715 |
| No. of contigs | 12 | 8 | 31 | 21 |
| Largest contig (bp) | 535,783 | 535,783 | 535,783 | 535,783 |
| No. of misassemblies | 0 | 0 | 0 | 0 |
| No. of aligned contigs | 11 | 7 | 11 | 11 |
| No. of unaligned contigs | 1 | 1 | 20 | 4 |
| Duplication ratio | 1.001 | 1.001 | 1.001 | 1.001 |
| No. of N’s per 100 kb | 0 | 0 | 0 | 0 |
| No. of mismatches per 100 kb | 16.61 | 14.87 | 16.33 | 14.88 |
| No. of indels per 100 kb | 1.64 | 1.66 | 1.64 | 1.64 |
MAG from 4 different binning methods are compared. N, unidentified nucleotide.
MetaBAT 2-dc, MetaBAT 2-differential coverage binning.
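One way to compare the four bins is the share of each bin that does not align to the Fadolivirus reference. The sketch below divides the unaligned length by the bin size, using the values listed in the table; this derived percentage is an illustration and is not a metric reported in the table itself.

```python
# Bin size and unaligned length (bp) as listed in the table.
bins = {
    "MetaBAT 2-dc": (1_623_616, 33_031),
    "MetaBAT 2": (1_583_180, 15_575),
    "CONCOCT": (1_941_890, 351_731),
    "MaxBin 2": (1_712_889, 122_730),
}


def unaligned_percent(bin_size_bp, unaligned_bp):
    """Percentage of a bin that does not align to the reference."""
    return 100.0 * unaligned_bp / bin_size_bp


unaligned = {name: unaligned_percent(*v) for name, v in bins.items()}
for name, pct in unaligned.items():
    print(f"{name}: {pct:.1f}% of the bin is unaligned")
```

By this measure the CONCOCT and MaxBin 2 bins carry noticeably more unaligned sequence than the two MetaBAT 2 bins, even though the total aligned length is identical for three of the four methods.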
FIG 3 Detection of giant viruses in metagenomic data. (a) Mapping of metagenomic reads from samples with and without viral spike-in to the Fadolivirus reference genome; 98.3% and 99.7% of the Fadolivirus genome could be reconstructed in the metagenome with the highest virus spike-in using MetaBAT 2 (29) and differential coverage binning (*), respectively. (b) Presence of contigs which contained the giant virus MCP gene in samples with and without viral spike-in. Contigs are shown as filled circles which are colored based on the taxonomic origin of the MCP gene. Circle diameter correlates with the total number of MCP genes present on the respective contig. Each contig contained only one MCP gene, with the exception of a single contig in the sample with the high viral spike-in which contained 4 copies of the MCP gene.
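The per-contig MCP counts that panel b encodes as circle diameter can be tallied with a simple counter. The sketch below is a minimal illustration, assuming MCP hits are already available as (contig, gene) pairs; the contig and gene identifiers are hypothetical, and the step that detects the MCP genes is not shown.

```python
from collections import Counter

# Hypothetical (contig_id, mcp_gene_id) pairs, e.g. from a protein search.
mcp_hits = [
    ("contig_001", "mcp_a"),
    ("contig_002", "mcp_b"),
    ("contig_003", "mcp_c"),
    ("contig_003", "mcp_d"),
    ("contig_003", "mcp_e"),
    ("contig_003", "mcp_f"),
]

# Number of MCP genes on each contig (the quantity the circle diameter encodes).
mcp_per_contig = Counter(contig for contig, _ in mcp_hits)

# Contigs that carry more than one MCP gene.
multi_copy = {c: n for c, n in mcp_per_contig.items() if n > 1}

print(mcp_per_contig)
print(multi_copy)
```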