Ahmed A Zayed1,2, Dominik Lücking3, Mohamed Mohssen1,2,4, Dylan Cronin1, Ben Bolduc1, Ann C Gregory5,6, Katherine R Hargreaves1,7, Paul D Piehowski8, Richard A White9,10,11,12, Eric L Huang8, Joshua N Adkins8, Simon Roux13, Cristina Moraru14, Matthew B Sullivan1,2,4,15.
1. Department of Microbiology, The Ohio State University, Columbus, Ohio 43210, USA.
2. Center of Microbiome Science, Ohio State University, Columbus, OH 43210, USA.
3. Max-Planck-Institut fuer Marine Mikrobiologie, Bremen 28359, Germany.
4. The Interdisciplinary Biophysics Graduate Program, The Ohio State University, Columbus, Ohio 43210, USA.
5. Department of Microbiology and Immunology, Rega Institute, KU Leuven, Leuven, Belgium.
6. VIB-KU Leuven Center for Microbiology, Leuven, Belgium.
7. Department of Life Sciences, Manchester Metropolitan University, John Dalton Building, Chester Street, Manchester, M1 5GD, UK.
8. Earth and Biological Sciences Directorate, PNNL, Richland, WA 99354, USA.
9. Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 9201 University City Boulevard, Charlotte, NC 28223, USA.
10. Department of Bioinformatics and Genomics, The University of North Carolina at Charlotte, 150 Research Campus Drive, Kannapolis, NC 28081, USA.
11. Australian Centre for Astrobiology, University of New South Wales, Sydney, Australia.
12. RAW Molecular Systems (RAW), INC, Concord, NC 28025, USA.
13. DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
14. The Institute for Chemistry and Biology of the Marine Environment (ICBM), University of Oldenburg, Oldenburg 26111, Germany.
15. Department of Civil, Environmental and Geodetic Engineering, The Ohio State University, Columbus, Ohio 43210, USA.
Abstract
MOTIVATION: Viruses infect, reprogram, and kill microbes, leading to profound ecosystem consequences, from elemental cycling in oceans and soils to microbiome-modulated diseases in plants and animals. Although metagenomic datasets are increasingly available, identifying viruses in them is challenging due to poor representation and annotation of viral sequences in databases.

RESULTS: Here we establish efam, an expanded collection of Hidden Markov Model (HMM) profiles that represent viral protein families conservatively identified from the Global Ocean Virome 2.0 dataset. This resulted in 240,311 HMM profiles, each with at least 2 protein sequences, making efam >7-fold larger than the next largest, pan-ecosystem viral HMM profile database. Adjusting the criteria for viral contig confidence from "conservative" to "eXtremely Conservative" resulted in 37,841 HMM profiles in our efam-XC database. To assess the value of this resource, we integrated efam-XC into VirSorter viral discovery software to discover viruses from less-studied, ecologically distinct oxygen minimum zone (OMZ) marine habitats. This expanded database led to an increase in viruses recovered from every tested OMZ virome by ∼24% on average (up to ∼42%) and especially improved the recovery of often-missed shorter contigs (<5 kb). Additionally, to help elucidate lesser-known viral protein functions, we annotated the profiles using multiple databases from the DRAM pipeline and virion-associated metaproteomic data, which doubled the number of annotations obtainable by standard, single-database annotation approaches. Together, these marine resources (efam and efam-XC) are provided as searchable, compressed HMM databases that will be updated bi-annually to help maximize viral sequence discovery and study from any ecosystem.

AVAILABILITY: The resources are available on the iVirus platform at doi.org/10.25739/9vze-4143.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
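As an illustration of what "searchable" HMM databases can mean in practice, the following is a minimal sketch of how predicted proteins might be searched against a decompressed profile database with HMMER's `hmmsearch`. It is not the authors' pipeline: the file names (`efam-XC.hmm`, `contig_proteins.faa`, `hits.tbl`), the E-value cutoff, and the helper function are all hypothetical, and the sketch only builds the command line rather than running it.

```python
import shlex


def build_hmmsearch_command(hmm_db, proteins_faa, tblout, evalue=1e-5, cpus=4):
    """Build an argv list for HMMER's hmmsearch (hypothetical helper).

    hmm_db       -- path to a decompressed HMM profile file (assumed name)
    proteins_faa -- FASTA file of predicted proteins to search (assumed name)
    tblout       -- path for the per-target tabular hit summary
    evalue       -- reporting E-value threshold (illustrative, not from the paper)
    cpus         -- number of worker threads
    """
    return [
        "hmmsearch",
        "--tblout", str(tblout),   # tabular per-sequence output
        "-E", str(evalue),         # report hits at or below this E-value
        "--cpu", str(cpus),        # parallel worker threads
        str(hmm_db),               # first positional argument: profile file
        str(proteins_faa),         # second positional argument: sequence file
    ]


if __name__ == "__main__":
    # All file names here are placeholders chosen for illustration.
    cmd = build_hmmsearch_command("efam-XC.hmm", "contig_proteins.faa", "hits.tbl")
    print(shlex.join(cmd))
    # With HMMER installed, the list could be passed to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces. Any threshold used in a real analysis should be chosen and validated for the dataset at hand.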