WIsH: who is the host? Predicting prokaryotic hosts from metagenomic phage contigs.

Clovis Galiez, Matthias Siebert, François Enault, Jonathan Vincent, Johannes Söding

Abstract

SUMMARY: WIsH predicts prokaryotic hosts of phages from their genomic sequences. It achieves 63% mean accuracy when predicting the host genus among 20 genera for 3 kbp-long phage contigs. Compared with the best current tool, WIsH is markedly more accurate on phage sequences a few kbp long and runs hundreds of times faster, making it well suited to metagenomics studies.
AVAILABILITY AND IMPLEMENTATION: OpenMP-parallelized GPL-licensed C++ code available at https://github.com/soedinglab/wish.
CONTACT: clovis.galiez@mpibpc.mpg.de or soeding@mpibpc.mpg.de.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author 2017. Published by Oxford University Press.

Year:  2017        PMID: 28957499      PMCID: PMC5870724          DOI: 10.1093/bioinformatics/btx383

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Viruses are key components of almost all known ecosystems (Edwards and Rohwer, 2005). They regulate biological diversity in various environments, from oceans to the human gut, by depleting dominant species (De Paepe et al.; Lehahn et al.) and are even estimated to be responsible for the death of 20% of the living ocean biomass per day (Suttle, 2007). Viruses are therefore central to understanding microbial ecology and dynamics. Even though phages (i.e. viruses infecting bacteria and archaea) represent the majority of the global virosphere, their comprehensive study has been hampered by the necessity of isolating and cultivating their hosts. Viral metagenomics circumvents this limitation, increasingly unveiling new viral genomic sequences from a wide range of environments (Bolduc et al.; Edwards and Rohwer, 2005). As a drawback, the identity of the hosts remains unknown for these newly discovered viruses, limiting our ecological understanding of the microbiome. Different methods exist to predict prokaryotic hosts for phage sequences in metagenomes, based either on co-abundance, sequence homology, similarity to other phages (Villarroel et al.) or sequence composition similarity between viruses and their hosts (Edwards et al.). Among tools taking this last approach, VirHostMatcher (Ahlgren et al.) has reported the best accuracy (proportion of correct predictions) on full-length viral genomes: between 33 and 64% at the genus level, depending on the dataset. However, its performance drops notably for shorter sequences, falling by 36% at 5 kbp length, and contigs of a few kbp are common in viral metagenomic data due to shallow coverage and intra-population variation (Smits et al.). In addition, the running time of VirHostMatcher hinders its use on large datasets (Supplementary Table S5). Here we introduce WIsH, a tool that predicts the prokaryotic host of viral contigs with good accuracy for contigs as short as 3 kbp and runs several hundred times faster than VirHostMatcher.

2 Materials and methods

The estimated k-mer frequencies classically used for host prediction from genomic composition become very noisy for short phage contigs. We therefore adopted a probabilistic approach better suited to short sequences. First, we train a homogeneous Markov model of order 8 (Supplementary Fig. S2) for each potential host genome (WIsH -c build -g prokaryoteGenomesDir -m modelDir). We then compute the likelihood of a contig under each of the trained Markov models (WIsH -c predict -g phageContigsDir -m modelDir -r outputResultDir) and predict de novo (i.e. without relying on any known phage-host interaction) the host whose model yields the highest likelihood (details in Supplementary Material). To evaluate the performance of WIsH and VirHostMatcher, we used the 3780 full prokaryotic genomes of the KEGG database (Kanehisa et al.) and the 1420 phages in the RefSeq Virus database (Brister et al.) for which a host was annotated in this database. WIsH can compute P-values when provided with the parameters of the Gaussian null distributions of each Markov model (option -n KeggGaussianFits.tsv -b). The Gaussian parameters were precomputed for each model as explained in Supplementary Material Section S1.2.
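The train-then-score procedure described above can be illustrated with a minimal Python sketch. It is not the WIsH implementation (which is C++): the function names, the additive pseudocount smoothing, the single-strand counting and the per-base normalization of the log-likelihood are all assumptions made for illustration, and the toy example uses order 2 rather than order 8 so that it runs on very short strings.

```python
import math
from collections import Counter

ALPHABET = set("ACGT")


def train_markov_model(genome, order=8):
    """Count (context, next base) pairs and context totals in one host genome."""
    genome = genome.upper()
    transitions, contexts = Counter(), Counter()
    for i in range(order, len(genome)):
        context, base = genome[i - order:i], genome[i]
        if ALPHABET.issuperset(context + base):  # skip windows with N or other symbols
            transitions[context, base] += 1
            contexts[context] += 1
    return {"order": order, "transitions": transitions, "contexts": contexts}


def log_likelihood(model, contig, pseudocount=1.0):
    """Mean per-base log-likelihood of a contig under one host model.

    The pseudocount (an assumption of this sketch) keeps unseen
    transitions from producing log(0).
    """
    contig = contig.upper()
    order = model["order"]
    total, n = 0.0, 0
    for i in range(order, len(contig)):
        context, base = contig[i - order:i], contig[i]
        if not ALPHABET.issuperset(context + base):
            continue
        num = model["transitions"][context, base] + pseudocount
        den = model["contexts"][context] + 4 * pseudocount
        total += math.log(num / den)
        n += 1
    return total / n if n else float("-inf")


def predict_host(models, contig):
    """Return the host whose model gives the highest likelihood, plus all scores."""
    scores = {host: log_likelihood(m, contig) for host, m in models.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy genomes (hypothetical): one AT-only, one GC-only.
    models = {
        "hostA": train_markov_model("ATTATAATTAAT" * 20, order=2),
        "hostB": train_markov_model("GCCGCGGCCGGC" * 20, order=2),
    }
    best, scores = predict_host(models, "ATTAATAT")
    print(best, scores)
```

Dividing by the number of scored positions does not change which host wins for a given contig; it only makes scores comparable across contigs of different lengths.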

3 Results

WIsH outperforms VirHostMatcher at every taxonomic level (Fig. 1A, and ROC curves in Supplementary Fig. S4). Although the accuracy for long contigs is improved only by a few percentage points, predictions for contigs of 3 kbp have 60% higher accuracy than those of VirHostMatcher. Similar results were obtained on the original VirHostMatcher benchmark set (Ahlgren et al.) (Supplementary Table S1). At a P-value threshold of 0.06, WIsH predicts hosts for 50% of the phage sequences with 75% accuracy at the family level (Supplementary Fig. S1). Furthermore, these accuracies can be considered lower bounds, since in practice the user can restrict the set of host genomes to those actually present in the sample. For contigs of length 3 kbp, WIsH accuracy reaches 63% for 20 potential host genera per sample and 52% for 80 genera per sample (Fig. 1B).
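How a P-value threshold turns scores into confident calls can be sketched as follows. The sketch assumes that the P-value is the upper-tail probability of the best host's score under that host's Gaussian null distribution; the exact definition is given in Supplementary Material Section S1.2 and is not reproduced here. The function names, the score values and the null parameters below are hypothetical, and no assumption is made about the layout of KeggGaussianFits.tsv.

```python
import math


def gaussian_upper_tail(score, mean, std):
    """P(X >= score) for X ~ Normal(mean, std)."""
    z = (score - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))


def call_host(scores, null_params, threshold=0.06):
    """Return (host, p) if the best-scoring host passes the threshold, else (None, p).

    scores:      {host: log-likelihood score of the contig}
    null_params: {host: (mean, std)} of each model's Gaussian null (hypothetical values)
    """
    best = max(scores, key=scores.get)
    mean, std = null_params[best]
    p = gaussian_upper_tail(scores[best], mean, std)
    return (best if p <= threshold else None), p


if __name__ == "__main__":
    null_params = {"hostA": (-1.40, 0.04), "hostB": (-1.40, 0.04)}
    print(call_host({"hostA": -1.30, "hostB": -1.40}, null_params))  # confident call
    print(call_host({"hostA": -1.38, "hostB": -1.40}, null_params))  # no call
```

Lowering the threshold trades coverage for accuracy: fewer contigs receive a prediction, but those that do are less likely to be explained by the null distribution.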
Fig. 1

Solid lines for WIsH (error bars showing 95% confidence interval) and dashed lines for VirHostMatcher. (A) Prediction accuracy over phage contig length for 3780 potential bacterial and archaeal host genomes from 965 genera. (B) Accuracy for 3 kbp phage contigs for various numbers of prokaryotic host genera per sample, estimated by randomly drawing (300 replications) potential hosts from the indicated number of genera.

Paez-Espino et al. describe a set of 125,842 metagenomic viral contigs (mVCs) of 11 kbp median length from various environments. The original host prediction mainly used CRISPR and tRNA sequence matches and made predictions for only 7.7% of the mVCs. With a P-value threshold of 0.1, WIsH annotated 59% of the mVCs, and the predicted host families matched the previous annotation in 70% of the cases, giving a lower bound on the accuracy (Supplementary Fig. S10).

Runtime measurements of WIsH on a 16-core 2.60 GHz Intel Xeon yielded a speed of 55 kbp/s, several hundred times faster than VirHostMatcher (Supplementary Table S5).

Prokaryotic taxonomy usually follows subjective, historic criteria that can differ markedly among phyla, limiting the observed prediction accuracies. Using the fraction of identical nucleotides in 16S rRNA genes as a quantitative measure of evolutionary relatedness (Yarza et al.), accuracies improve drastically, e.g. from 47 to 63% at the family level when using the full set of 3780 host reference genomes (Supplementary Table S3).

The phages that show the poorest predictions tend to have longer genomes and to encode more tRNAs (Supplementary Material Section S7.1 and Supplementary Figs S11–S14). These phages may be more independent of their hosts and may be under less selective pressure to adapt their genomes to their hosts.

4 Conclusion

WIsH predicts hosts for short phage sequences with good accuracy and very high speed. We hope that it will help in the investigation of microbial ecology through metagenomic shotgun sequencing of microbiomes.
References (10 of 13 shown)

1.  Decoupling physical from biological processes to assess the impact of viruses on a mesoscale algal bloom.

Authors:  Yoav Lehahn; Ilan Koren; Daniella Schatz; Miguel Frada; Uri Sheyn; Emmanuel Boss; Shai Efrati; Yinon Rudich; Miri Trainic; Shlomit Sharoni; Christian Laber; Giacomo R DiTullio; Marco J L Coolen; Ana Maria Martins; Benjamin A S Van Mooy; Kay D Bidle; Assaf Vardi
Journal:  Curr Biol       Date:  2014-08-21       Impact factor: 10.834

Review 2.  Viral metagenomics.

Authors:  Robert A Edwards; Forest Rohwer
Journal:  Nat Rev Microbiol       Date:  2005-06       Impact factor: 60.633

Review 3.  Marine viruses--major players in the global ecosystem.

Authors:  Curtis A Suttle
Journal:  Nat Rev Microbiol       Date:  2007-10       Impact factor: 60.633

4.  Uncovering Earth's virome.

Authors:  David Paez-Espino; Emiley A Eloe-Fadrosh; Georgios A Pavlopoulos; Alex D Thomas; Marcel Huntemann; Natalia Mikhailova; Edward Rubin; Natalia N Ivanova; Nikos C Kyrpides
Journal:  Nature       Date:  2016-08-17       Impact factor: 49.962

5.  Alignment-free $d_2^*$ oligonucleotide frequency dissimilarity measure improves prediction of hosts from metagenomically-derived viral sequences.

Authors:  Nathan A Ahlgren; Jie Ren; Yang Young Lu; Jed A Fuhrman; Fengzhu Sun
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

Review 6.  Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences.

Authors:  Pablo Yarza; Pelin Yilmaz; Elmar Pruesse; Frank Oliver Glöckner; Wolfgang Ludwig; Karl-Heinz Schleifer; William B Whitman; Jean Euzéby; Rudolf Amann; Ramon Rosselló-Móra
Journal:  Nat Rev Microbiol       Date:  2014-09       Impact factor: 60.633

7.  NCBI viral genomes resource.

Authors:  J Rodney Brister; Danso Ako-Adjei; Yiming Bao; Olga Blinkova
Journal:  Nucleic Acids Res       Date:  2014-11-26       Impact factor: 16.971

8.  Assembly of viral genomes from metagenomes.

Authors:  Saskia L Smits; Rogier Bodewes; Aritz Ruiz-Gonzalez; Wolfgang Baumgärtner; Marion P Koopmans; Albert D M E Osterhaus; Anita C Schürch
Journal:  Front Microbiol       Date:  2014-12-18       Impact factor: 5.640

9.  KEGG: new perspectives on genomes, pathways, diseases and drugs.

Authors:  Minoru Kanehisa; Miho Furumichi; Mao Tanabe; Yoko Sato; Kanae Morishima
Journal:  Nucleic Acids Res       Date:  2016-11-28       Impact factor: 16.971

Review 10.  Computational approaches to predict bacteriophage-host relationships.

Authors:  Robert A Edwards; Katelyn McNair; Karoline Faust; Jeroen Raes; Bas E Dutilh
Journal:  FEMS Microbiol Rev       Date:  2015-12-09       Impact factor: 16.408
