Jiarong Guo (1), Ben Bolduc (1), Ahmed A Zayed (1), Arvind Varsani (2,3), Guillermo Dominguez-Huerta (1), Tom O Delmont (4), Akbar Adjie Pratama (1), M Consuelo Gazitúa (5), Dean Vik (1), Matthew B Sullivan (6,7,8), Simon Roux (9)

1. Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA.
2. The Biodesign Center for Fundamental and Applied Microbiomics, Center for Evolution and Medicine, School of Life Sciences, Arizona State University, Tempe, AZ, 85287, USA.
3. Structural Biology Research Unit, Department of Integrative Biomedical Sciences, University of Cape Town, Observatory, Cape Town, 7701, South Africa.
4. Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France.
5. Viromica, 7870582, Santiago, Chile.
6. Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA. sullivan.948@osu.edu.
7. Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, 43210, USA. sullivan.948@osu.edu.
8. Center of Microbiome Science, Ohio State University, Columbus, OH, 43210, USA. sullivan.948@osu.edu.
9. DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA. sroux@lbl.gov.
Abstract
BACKGROUND: Viruses are significant players in many biosphere and human ecosystems, but most viral signals remain "hidden" in metagenomic/metatranscriptomic sequence datasets because of the lack of universal gene markers and database representatives, and because identification tools are insufficiently advanced.

RESULTS: Here, we introduce VirSorter2, a DNA and RNA virus identification tool that leverages genome-informed database advances across a collection of customized automatic classifiers to improve the accuracy and range of virus sequence detection. When benchmarked against genomes from both isolated and uncultivated viruses, VirSorter2 was the only tool that performed consistently with high accuracy (F1-score > 0.8) across viral diversity, while all other tools under-detected viruses outside of the group most represented in reference databases (i.e., those in the order Caudovirales). Among the tools evaluated, VirSorter2 was also the only one able to minimize errors associated with atypical cellular sequences, including eukaryotic genomes and plasmids. Finally, as exploration of the virosphere uncovers novel viral sequences, VirSorter2's modular design allows it to be extended to new types of viruses through the design of new classifiers, maintaining maximal sensitivity and specificity.

CONCLUSION: With its multi-classifier and modular design, VirSorter2 demonstrates higher overall accuracy across major viral groups and will advance our knowledge of virus evolution, diversity, and virus-microbe interaction in various ecosystems. Source code of VirSorter2 is freely available (https://bitbucket.org/MAVERICLab/virsorter2), and VirSorter2 is also available both on bioconda and as an iVirus app on CyVerse (https://de.cyverse.org/de).
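For readers unfamiliar with the accuracy metric quoted in the RESULTS, the F1-score is the harmonic mean of precision and recall. The sketch below is a minimal illustration of that standard definition only; the function name `f1_score` and the counts are hypothetical and are not taken from the VirSorter2 benchmark or its code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1-score (harmonic mean of precision and recall).

    tp: viral sequences correctly called viral (true positives)
    fp: non-viral sequences wrongly called viral (false positives)
    fn: viral sequences that were missed (false negatives)
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined),
        # so the score is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, chosen only to illustrate the calculation.
score = f1_score(tp=90, fp=10, fn=20)
print(f"F1 = {score:.3f}")
```

Because the harmonic mean is pulled toward the smaller of the two values, a tool that misses many viruses (low recall) or calls many cellular sequences viral (low precision) cannot reach a high F1-score on the strength of the other metric alone.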