Patrick W Laffy, Elisha M Wood-Charlson, Dmitrij Turaev, Karen D Weynberg, Emmanuelle S Botté, Madeleine J H van Oppen, Nicole S Webster, Thomas Rattei.
Abstract
Abundant bioinformatics resources are available for the study of complex microbial metagenomes; however, their utility in viral metagenomics is limited. HoloVir is a robust and flexible data analysis pipeline that provides an optimized and validated workflow for taxonomic and functional characterization of viral metagenomes derived from invertebrate holobionts. Simulated viral metagenomes comprising varying levels of viral diversity and abundance were used to determine the optimal assembly and gene prediction strategy, and multiple sequence assembly methods and gene prediction tools were tested to optimize the analysis workflow. HoloVir performs pairwise comparisons of single-read and predicted-gene datasets against the viral RefSeq database to assign taxonomy; an additional comparison to phage-specific and cellular markers supports the taxonomic assignments and identifies potential cellular contamination. Broad functional classification of the predicted genes is provided by assigning COG functional categories using EggNOG, and higher-resolution functional analysis is achieved by searching for enrichment of specific Swiss-Prot keywords within the viral metagenome. Application of HoloVir to viral metagenomes from the coral Pocillopora damicornis and the sponge Rhopaloeides odorabile demonstrated that HoloVir provides a valuable tool to characterize holobiont viral communities across species, environments, or experiments.
Keywords: Bioinformatics tools; host-associated communities; marine ecology; marine invertebrates; symbiosis; viral metagenomics
Year: 2016 PMID: 27375564 PMCID: PMC4899465 DOI: 10.3389/fmicb.2016.00822
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Assemblies of simulated viral metagenomes with and without contig size filtering.
| # bases | 601,595 | 585,524 | 10,281,842 | 421,252 | 968,069 | 960,610 | 686,987 | |
| Total number of contigs | 92 | 50 | 64,966 | 25 | 116 | 106 | 45 | 16 |
| Longest contig (bp) | 179,062 | 179,062 | 97,990 | 97,990 | 177,419 | 177,419 | 182,047 | |
| N50 | 15,944 | 15,944 | 187 | 32,637 | 24,173 | 14,026 | 86,038 | |
| % of reference genomes covered | 76.9 | 76.0 | 98.8 | 58.8 | 97.9 | 98.0 | ||
| # bases | 2,218,909 | 2,185,321 | 9,635,750 | 2,016,524 | 3,027,437 | 2,988,389 | 2,361,691 | |
| Total number of contigs | 203 | 73 | 49,720 | 95 | 358 | 308 | 326 | 64 |
| Longest contig (bp) | 276,216 | 276,216 | 868,737 | 868,737 | 130,081 | 130,081 | 747,574 | |
| N50 | 129,841 | 129,841 | 199 | 14,000 | 24,473 | 131,252 | 133,117 | |
| % of reference genomes covered | 88.3 | 87.4 | 99.0 | 59.9 | 94.7 | 91.2 | 98.1 | |
Assembly statistics are provided for two mock viral metagenomes using four different assemblers: Ray, IDBA-UD, Trinity, and the CLC Genomics Workbench de novo assembler. For each assembly, statistics are listed for all contigs and for contigs with a minimum size of 1000 bp. The total coverage of the reference genomes was calculated using run_mummer3. The best values for longest contig, N50, and percentage of reference genomes covered, as well as the total number of bases most closely matching the source genome size, are indicated in bold.
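The per-assembly statistics in this table (total bases, contig count, longest contig, N50) can be recomputed from a list of contig lengths. The following is a minimal sketch, not the script used in the study: the function names `n50` and `assembly_stats` and the example contig lengths are hypothetical, and N50 is taken in its usual sense as the length of the contig at which the cumulative sum of lengths, sorted longest first, reaches half of the total assembly size.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 for empty input)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lengths, min_size=0):
    """Summarize an assembly, optionally keeping only contigs >= min_size bp."""
    kept = [n for n in lengths if n >= min_size]
    return {
        "bases": sum(kept),
        "contigs": len(kept),
        "longest": max(kept, default=0),
        "n50": n50(kept),
    }


# Hypothetical contig lengths in bp, for illustration only.
contigs = [5000, 3000, 1200, 800, 400, 150]
all_stats = assembly_stats(contigs)                      # all contigs
filtered_stats = assembly_stats(contigs, min_size=1000)  # contigs >= 1000 bp
```

Filtering out short contigs can only keep N50 the same or raise it, while the longest contig is unchanged, which is the pattern to expect when comparing the unfiltered and size-filtered columns.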
Figure 1. Graphical overview of HoloVir, the computational workflow for predicting taxonomic composition and gene functions from invertebrate-associated metaviromes.
Figure 2. Taxonomic overview of the P. damicornis and R. odorabile metaviromes. (A) Normalized taxonomic assignments of the metavirome data sets from BLAST searches against NCBI's viral RefSeq database, using single read analysis (light blue) and predicted genes from assembled data (dark blue). The size of each colored circle indicates the relative abundance of reads in the metavirome assigned to that taxonomic level (square root scaled). (B) Normalized taxonomic assignments of the metavirome data sets from BLAST searches against the phage-specific orthologous group (POG) and cellular marker database, using single read analysis (light gray) and predicted genes from assembled data (dark gray). The MEGAN5 lowest common ancestor classification was used to assign all taxonomy. Data sets were normalized against the total number of significant assignments using a minimum bitscore threshold of 80, with taxonomic assignments based on an 80% consensus of the best BLAST matches.
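The assignment rule in this caption (discard matches below a bitscore of 80, then assign the most specific taxon on which 80% of the remaining matches agree) can be illustrated with a simplified common-ancestor sketch. This is not MEGAN5's implementation, which exposes further parameters. The function `consensus_lca`, the lineage tuples, and the example hits are hypothetical, and the sketch assumes a consensus fraction above 0.5 so that at most one taxon can qualify at each rank.

```python
from collections import Counter


def consensus_lca(hits, min_bitscore=80.0, consensus=0.8):
    """Assign a taxon from BLAST-like hits.

    hits: iterable of (bitscore, lineage), where lineage is a root-to-leaf
    tuple of taxon names. Returns the deepest taxon shared by at least
    `consensus` of the hits that pass `min_bitscore`, or None if no hit passes.
    """
    lineages = [lineage for score, lineage in hits if score >= min_bitscore]
    if not lineages:
        return None
    needed = consensus * len(lineages)
    assigned = None
    depth = 0
    while True:
        # Count lineage prefixes at this depth so that agreement is on the
        # whole path from the root, not just on a taxon name.
        counts = Counter(lin[: depth + 1] for lin in lineages if len(lin) > depth)
        if not counts:
            break
        prefix, support = counts.most_common(1)[0]
        if support < needed:
            break
        assigned = prefix[-1]
        depth += 1
    return assigned


# Hypothetical hits for one read: (bitscore, lineage).
hits = [
    (250.0, ("Viruses", "Caudovirales", "Myoviridae")),
    (180.0, ("Viruses", "Caudovirales", "Myoviridae")),
    (120.0, ("Viruses", "Caudovirales", "Siphoviridae")),
    (95.0, ("Viruses", "Caudovirales", "Podoviridae")),
    (90.0, ("Viruses", "Phycodnaviridae")),
    (60.0, ("Viruses", "Mimiviridae")),  # below the bitscore threshold
]
taxon = consensus_lca(hits)
```

In this example five hits pass the threshold and four of them agree down to Caudovirales, so the read is placed at that level; no single family reaches 80% support, so the assignment goes no deeper.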
Figure 3. A gene-centric comparison of the taxonomic composition of viral metagenomes from P. damicornis and R. odorabile. Output is based on BLAST analysis of MetaGeneAnnotator-predicted genes from assembled metaviromes, with taxonomy assigned using the MEGAN5 lowest common ancestor classification, a minimum bitscore threshold of 80, and a minimum 80% consensus of the best BLAST matches. The size of each colored circle indicates the relative abundance of reads in the metavirome assigned to that taxonomic level (square root scaled). The number of genes assigned to each taxon is listed to the right of the taxon name (P. damicornis on the left, R. odorabile on the right).
Figure 4. Functional assignment of predicted viral genes based on COG functional category classification. A total of 6560 P. damicornis and 1041 R. odorabile COG functional category classifications were made based on BLAST comparisons to the EggNOG 4.5 database. Of these classifications, 51.6% of P. damicornis genes and 56.4% of R. odorabile genes were assigned "Function unknown." The relative proportions of each known COG functional category for genes predicted from the viral metagenomes of P. damicornis and R. odorabile are shown.
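The two quantities in this caption, the share of "Function unknown" classifications and the relative proportions of the remaining categories, can be sketched as follows. The sketch assumes that the proportions of known categories are computed after excluding "Function unknown" (COG category letter S); the caption does not state the denominator explicitly. The function name and the example calls are hypothetical.

```python
from collections import Counter

UNKNOWN = "S"  # COG category letter for "Function unknown"


def known_category_proportions(categories):
    """Return (fraction unknown, {category: share among known categories})."""
    counts = Counter(categories)
    unknown = counts.pop(UNKNOWN, 0)
    known_total = sum(counts.values())
    total = unknown + known_total
    unknown_fraction = unknown / total if total else 0.0
    proportions = (
        {cat: n / known_total for cat, n in counts.items()} if known_total else {}
    )
    return unknown_fraction, proportions


# Hypothetical COG category calls, one letter per classified gene.
calls = ["S"] * 5 + ["L"] * 3 + ["K"] * 1 + ["M"] * 1
unknown_fraction, proportions = known_category_proportions(calls)
```

Applied to the reported totals, 51.6% of 6560 classifications leaves roughly 3175 P. damicornis genes with a known category, and 56.4% of 1041 leaves roughly 454 for R. odorabile.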
Keyword assignments were identified for the best significant UniProtKB/Swiss-Prot BLAST match for each predicted gene.
| Degradation of host chromosome by virus | 126.3 | 548.6 |
| Evasion of bacteria-mediated translation shutoff by virus | 202 | 365.7 |
| Degradation of host lipopolysaccharides during virus entry | 101 | 365.7 |
| Bacterial host gene expression shutoff by virus | 84.2 | 365.7 |
| Viral DNA replication | 82.6 | 359.1 |
| Viral long flexible tail ejection system | 256.4 | 337.6 |
| Viral short tail ejection system | 314.2 | 243.8 |
| Latency-replication switch | N/A | 274.3 |
| Viral genome ejection through host cell envelope | 156.2 | 205.7 |
| Viral latency | N/A | 182.9 |
| Viral genome excision | 15.2 | 164.6 |
| Viral contractile tail ejection system | 67.3 | 162.5 |
| Viral genome packaging | 103.7 | 151.2 |
| Restriction system | 23.2 | 130.3 |
| Viral capsid assembly | 125.5 | 125.4 |
| Viral baseplate protein | 48.9 | 106.2 |
| Viral tail assembly | 83.9 | 44.5 |
| DNA invertase | 79.7 | 57.7 |
| Viral tail protein | 43.9 | 71.8 |
| Viral tail fiber protein | 60.6 | 62.7 |
Enriched functions were determined by comparing the relative keyword frequency in each dataset with the frequency in the UniProtKB/Swiss-Prot database. The fold enrichments of the 20 most enriched functions are displayed for each host species.
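The fold enrichment described here is a ratio of two relative frequencies: how often a keyword occurs in the metavirome divided by how often it occurs in the reference database. The sketch below is one plausible reading, not the authors' code. It assumes each frequency is a keyword's count divided by the total number of keyword assignments in the same collection, and it skips keywords absent from the reference because the ratio is then undefined. The function names and all counts are hypothetical.

```python
def fold_enrichment(sample_counts, reference_counts):
    """Return {keyword: sample frequency / reference frequency}."""
    sample_total = sum(sample_counts.values())
    reference_total = sum(reference_counts.values())
    result = {}
    for keyword, count in sample_counts.items():
        ref = reference_counts.get(keyword, 0)
        if ref == 0:
            continue  # enrichment is undefined without a reference frequency
        result[keyword] = (count / sample_total) / (ref / reference_total)
    return result


def top_enriched(enrichment, n=20):
    """Return the n most enriched (keyword, fold) pairs, highest first."""
    return sorted(enrichment.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical keyword counts, for illustration only.
sample = {"Viral DNA replication": 10, "Transferase": 90}
reference = {"Viral DNA replication": 10, "Transferase": 9990}
enrichment = fold_enrichment(sample, reference)
ranked = top_enriched(enrichment)
```

Here "Viral DNA replication" makes up 10% of the sample's keyword assignments but 0.1% of the reference, giving about a 100-fold enrichment. A keyword that never occurs in a dataset has no entry for that dataset, which would correspond to an N/A cell in the table.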