Andrea Garretto, Thomas Hatzopoulos, Catherine Putonti.
Abstract
Metagenomics has enabled sequencing of viral communities from a wide range of environments. Viral metagenomic studies routinely uncover sequences with no recognizable homology to known coding regions or genomes. Nevertheless, complete viral genomes have been constructed directly from complex community metagenomes, often through tedious manual curation. To address this, we developed virMine, a software tool that identifies viral genomes from raw reads representative of viral or mixed (viral and bacterial) communities. virMine automates read quality control, assembly, and annotation, and researchers can easily refine the search for a specific study system and/or feature(s) of interest. Other viral genome detection tools often rely on recognizing viral signature sequences. In contrast, virMine is not restricted by the insufficient representation of viral diversity in public data repositories. Instead, viral genomes are identified through an iterative approach that first omits non-viral sequences. Thus, both relatives of previously characterized viruses and novel species can be detected, including eukaryotic viruses and bacteriophages. Here we present virMine and its analysis of synthetic communities as well as metagenomic data sets from three distinctly different environments: the gut microbiota, the urinary microbiota, and freshwater viromes. Several new viral genomes were identified and annotated, contributing to our understanding of viral genetic diversity in these three environments.
Keywords: Bacteriophage; Freshwater virome; Human microbiome; Metagenomics; Virome
Year: 2019 PMID: 30993039 PMCID: PMC6462185 DOI: 10.7717/peerj.6695
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. Overview of the virMine pipeline.
Tools integrated into the pipeline are listed in red. The sequences for viral contigs predicted with high confidence (“viral_contigs”) and putative viral contigs (“unkn_contigs”) are written to file.
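The caption separates high-confidence viral contigs ("viral_contigs") from putative ones ("unkn_contigs"). The abstract describes an approach that first omits non-viral sequences. The sketch below shows a single pass of that exclusion-first partitioning. It assumes each contig has already been labeled with the category of its best database hit, or None when it had no hit. The `Contig` type, the category strings, and `partition_contigs` are hypothetical illustrations, not virMine's actual code or iterative procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    sequence: str
    # Hypothetical label for the best database hit: "viral", "non-viral",
    # or None when the contig had no significant hit.
    best_hit_category: Optional[str]


def partition_contigs(contigs):
    """Split contigs into (viral, unknown) lists, dropping non-viral ones.

    Non-viral contigs are excluded first. Of the remainder, contigs whose
    best hit is viral are called viral; everything else (including contigs
    with no hit at all) is kept as unknown rather than discarded, so that
    putatively novel viruses are not lost.
    """
    viral, unknown = [], []
    for contig in contigs:
        if contig.best_hit_category == "non-viral":
            continue  # omit sequences recognized as non-viral
        if contig.best_hit_category == "viral":
            viral.append(contig)
        else:
            unknown.append(contig)
    return viral, unknown
```

Keeping no-hit contigs in a separate "unknown" bin, rather than discarding them, is what allows sequences without database homologs to survive filtering.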
Software integrated into the virMine pipeline.
| Software | Version | Function |
|---|---|---|
| Sickle | 1.33 | Read trimming |
| SPAdes | 3.10.1 | Assembly |
| metaSPAdes | 3.10.1 | Assembly |
| MEGAHIT | 1.1.4 | Assembly |
| BBMap | 37.36 | Coverage |
| GLIMMER | 3.02 | Gene prediction |
| BLAST+ | 2.6.0 | Sequence analysis |
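Sickle, listed above for read trimming, decides where to trim using sliding windows together with quality and length thresholds. The sketch below is a simplified form of 3'-end sliding-window trimming only. It assumes Phred+33 quality strings. The window size, thresholds, and function names are illustrative assumptions, and the code does not reproduce Sickle's exact windowing rules or its 5'-end trimming.

```python
from typing import Optional, Tuple


def phred33_scores(quality: str) -> list:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string."""
    return [ord(ch) - 33 for ch in quality]


def trim_3prime(seq: str, quality: str, window: int = 5,
                min_mean_q: float = 20.0,
                min_length: int = 20) -> Optional[Tuple[str, str]]:
    """Cut a read at the first window whose mean quality falls below
    min_mean_q; return None if fewer than min_length bases remain."""
    if not seq:
        return None
    scores = phred33_scores(quality)
    cut = len(seq)
    # Slide a fixed-size window from the 5' end toward the 3' end.
    for start in range(max(len(scores) - window + 1, 1)):
        win = scores[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            cut = start  # keep only the bases before the poor window
            break
    if cut < min_length:
        return None
    return seq[:cut], quality[:cut]
```

For a 40-base read whose last 10 bases have quality 2 (`#`) and whose other bases have quality 40 (`I`), the first failing 5-base window starts at position 28. The read is therefore cut to 28 bases.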
Complex community microbiomes examined for the virMine proof-of-concept study.
| Microbiome | Data set | Sequencing platform | No. of data sets | Size |
|---|---|---|---|---|
| Synthetic | | N/A | 22 | 4.4 |
| Gut microbiomes | A subset of faecal microbiota samples from monozygotic twins and their mothers | 454 FLX | 3 | 0.66 |
| | A subset of faecal samples from 124 European individuals | Illumina Genome Analyzer | 55 | 1141.33 |
| Urinary viromes | UTI-positive urine samples | Ion Torrent PGM | 10 | 6.22 |
| Freshwater viromes | A subset of samples from Lake Michigan nearshore waters | Illumina MiSeq | 4 | 13.46 |
| | Viral community of Lough Neagh (Skvortsov et al., 2016) | Illumina MiSeq | 1 | 4.60 |
Figure 2. Number of contigs assembled for each of the synthetic data sets predicted as viral (black bars) or of unknown origin (gray bars).
The N50 score of the assembled contigs in each group is indicated within the corresponding bars.
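N50, shown within the bars, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. Below is a minimal Python sketch of that definition. The function name `n50` is ours.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half of all bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, for contigs of 2, 3, 4, 5 and 6 bp (20 bp total), the two longest contigs contribute 6 + 5 = 11 bp. That is at least half the total, so the N50 is 5.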
Figure 3. Coverage of crAssphage by contigs predicted by virMine as viral or unknown.
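One way to measure how much of a reference genome such as crAssphage is spanned by predicted contigs is to merge overlapping alignment intervals and count distinct covered bases. This sketch assumes contig-to-reference coordinates are already available. An example source is the 1-based, inclusive `sstart`/`send` fields of BLAST tabular output, where `sstart > send` for minus-strand hits. The function name and inputs are illustrative, and this is not necessarily how Figure 3 was produced.

```python
def covered_fraction(hits, ref_length):
    """Fraction of a reference covered by at least one alignment.

    hits: iterable of (start, end) reference coordinates, 1-based and
    inclusive; start may exceed end for minus-strand alignments.
    """
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")
    # Normalize orientation, then sort intervals by start coordinate.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hits)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / ref_length
```

Calling the function separately on the alignments of "viral" and "unknown" contigs gives the covered fraction for each category.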
BLAST homology for longer (>5,000 bp) contigs predicted as viral.
| Data set | Contig length (bp) | Top BLAST hit | Identity (%) | Query coverage (%) |
|---|---|---|---|---|
| MGM4568637 | 14,157 | | 73 | 0 |
| | 11,424 | | 66 | 15 |
| MGM4568639 | 12,310 | | 73 | 8 |
| | 5,156 | | 71 | 5 |
| MGM4568640 | 7,987 | | 69 | 2 |
| | 5,479 | | 96 | 88 |
| MGM4568641 | 16,416 | | 95 | 95 |
| | 13,087 | Uncultured Mediterranean phage uvMED | 79 | 1 |
| | 7,825 | | 83 | 0 |
| | 5,086 | | 99 | 100 |
| MGM4568642 | 9,301 | | 66 | 27 |
| | 5,312 | | 83 | 1 |
| MGM4568645 | 8,302 | | 66 | 11 |
| | 8,215 | | 68 | 19 |
Notes.
Indicates BLAST homologies to annotated prophage regions.
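The table pairs each long predicted-viral contig with its BLAST homology. The sketch below extracts the highest-bitscore hit per contig from BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are listed in the code). The default format does not report query length, so the sketch assumes contig lengths are supplied separately. Query coverage is approximated from the single best HSP. The function name and the strict >5,000 bp filter, which mirrors the table title, are our assumptions.

```python
import csv
import io

# Column order of BLAST+ default tabular output (-outfmt 6).
OUTFMT6_FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch",
                  "gapopen", "qstart", "qend", "sstart", "send",
                  "evalue", "bitscore"]


def best_hits(blast_tsv, contig_lengths, min_contig_length=5000):
    """Map each contig longer than min_contig_length to
    (subject, percent identity, percent query coverage) of its best HSP."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue
        rec = dict(zip(OUTFMT6_FIELDS, row))
        query = rec["qseqid"]
        qlen = contig_lengths[query]
        if qlen <= min_contig_length:
            continue  # keep only contigs longer than the cutoff
        bitscore = float(rec["bitscore"])
        if query in best and best[query][0] >= bitscore:
            continue  # an equal or better HSP was already seen
        # Coverage of the query by this single HSP (an approximation).
        span = abs(int(rec["qend"]) - int(rec["qstart"])) + 1
        best[query] = (bitscore, rec["sseqid"], float(rec["pident"]),
                       100.0 * span / qlen)
    return {q: v[1:] for q, v in best.items()}
```

Ties keep the first HSP encountered. Merging all HSPs per subject would give a more accurate coverage value than the single-HSP approximation used here.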
Figure 4. Viral species most frequently detected within the Lake Michigan data sets.
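One plausible way to rank species by detection frequency across data sets is to count each species at most once per data set. This sketch assumes a hypothetical input that maps data set IDs to the species names detected in each. It is not necessarily how Figure 4 was computed.

```python
from collections import Counter


def detection_frequency(detections):
    """Count, for each species, the number of data sets it was detected in.

    detections maps a data set ID to an iterable of species names;
    repeated names within one data set are counted once.
    """
    counts = Counter()
    for species in detections.values():
        counts.update(set(species))
    return counts
```

`detection_frequency(...).most_common(n)` then returns the n most frequently detected species with their counts.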
Viral genome sequences identified by virMine from the Lough Neagh virome (Skvortsov et al., 2016).
| Contig | Length (bp) | Predicted ORFs | Top BLAST hit | Query coverage (%) | Identity (%) | Isolation source of top hit |
|---|---|---|---|---|---|---|
| contig_11 | 46,867 | 71 | | 1 | 80 | Rhizosphere |
| contig_12 | 46,702 | 74 | Uncultured marine virus isolate CBSM-242 | 0 | 83 | Chesapeake Bay sediment |
| contig_13 | 46,245 | 60 | Bacteriophage 11b | 1 | 68 | Arctic sea ice |
| contig_17 | 40,578 | 56 | | 6 | 67 | Automobile air-conditioning evaporator |
| contig_18 | 40,568 | 61 | | 0 | 72 | Soda dam hot springs |
| contig_2 | 70,520 | 92 | Uncultured virus YBW_Contig_50752 | 1 | 72 | North Sea Surface Water Virome |
| contig_5 | 56,143 | 55 | Uncultured virus SERC 372681 | 2 | 73 | Rhode River surface water |
| contig_6 | 55,961 | 75 | | 1 | 71 | Freshwater |
| contig_7 | 55,939 | 77 | Uncultured virus SERC Contig 695464 | 0 | 76 | Rhode River surface water |
Notes.
Contig also predicted as viral by VirSorter (Roux et al., 2015).