Alexa B R McIntyre (1,2,3), Rachid Ounit (4), Ebrahim Afshinnekoo (2,3,5), Robert J Prill (6), Elizabeth Hénaff (2,3), Noah Alexander (2,3), Samuel S Minot (7), David Danko (1,2,3), Jonathan Foox (2,3), Sofia Ahsanuddin (2,3), Scott Tighe (8), Nur A Hasan (9,10), Poorani Subramanian (9), Kelly Moffat (9), Shawn Levy (11), Stefano Lonardi (4), Nick Greenfield (7), Rita R Colwell (9,12), Gail L Rosen (13), Christopher E Mason (14,15,16).
1. Tri-Institutional Program in Computational Biology and Medicine, New York, NY, USA.
2. Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA.
3. The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA.
4. Department of Computer Science and Engineering, University of California, Riverside, CA, 92521, USA.
5. School of Medicine, New York Medical College, Valhalla, NY, 10595, USA.
6. Accelerated Discovery Lab, IBM Almaden Research Center, San Jose, CA, 95120, USA.
7. One Codex, Reference Genomics, San Francisco, CA, 94103, USA.
8. University of Vermont, Burlington, VT, 05405, USA.
9. CosmosID, Inc, Rockville, MD, 20850, USA.
10. Center for Bioinformatics and Computational Biology, University of Maryland Institute for Advanced Computer Studies (UMIACS), College Park, MD, 20742, USA.
11. HudsonAlpha Institute for Biotechnology, Huntsville, AL, 35806, USA.
12. Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, USA.
13. Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, 19104, USA. gail.l.rosen@gmail.com.
14. Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, 10021, USA. chm2042@med.cornell.edu.
15. The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, New York, NY, 10021, USA. chm2042@med.cornell.edu.
16. The Feil Family Brain and Mind Research Institute, New York, NY, 10065, USA. chm2042@med.cornell.edu.
Abstract
BACKGROUND: One of the main challenges in metagenomics is the identification of microorganisms in clinical and environmental samples. While an extensive and heterogeneous set of computational tools is available to classify microorganisms using whole-genome shotgun sequencing data, comprehensive comparisons of these methods are limited.

RESULTS: In this study, we use the largest-to-date set of laboratory-generated and simulated controls across 846 species to evaluate the performance of 11 metagenomic classifiers. Tools were characterized on the basis of their ability to identify taxa at the genus, species, and strain levels, quantify relative abundances of taxa, and classify individual reads to the species level. Strikingly, the number of species identified by the 11 tools can differ by over three orders of magnitude on the same datasets. Various strategies can ameliorate taxonomic misclassification, including abundance filtering, ensemble approaches, and tool intersection. Nevertheless, these strategies were often insufficient to completely eliminate false positives from environmental samples, which are especially important where they concern medically relevant species. Overall, pairing tools with different classification strategies (k-mer, alignment, marker) can combine their respective advantages.

CONCLUSIONS: This study provides positive and negative controls, titrated standards, and a guide for selecting tools for metagenomic analyses by comparing ranges of precision, accuracy, and recall. We show that proper experimental design and analysis parameters can reduce false positives, provide greater resolution of species in complex metagenomic samples, and improve the interpretation of results.
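The mitigation strategies named in the abstract (abundance filtering, ensemble voting, and tool intersection) and the precision and recall metrics can be illustrated with a small Python sketch. This is not the study's pipeline: the function names, the 2% abundance cutoff, and the three toy profiles are all hypothetical and chosen only to show how each strategy trades false positives against missed species.

```python
from typing import Dict, Iterable, Set, Tuple

# Species name -> relative abundance as a fraction (hypothetical toy format).
Profile = Dict[str, float]


def abundance_filter(profile: Profile, min_abundance: float) -> Set[str]:
    """Keep only taxa whose relative abundance meets the cutoff."""
    return {taxon for taxon, ab in profile.items() if ab >= min_abundance}


def vote(calls: Iterable[Set[str]], min_tools: int) -> Set[str]:
    """Ensemble call: keep taxa reported by at least `min_tools` tools.

    Setting `min_tools` to the number of tools gives a strict intersection.
    """
    counts: Dict[str, int] = {}
    for call_set in calls:
        for taxon in call_set:
            counts[taxon] = counts.get(taxon, 0) + 1
    return {taxon for taxon, n in counts.items() if n >= min_tools}


def precision_recall(predicted: Set[str], truth: Set[str]) -> Tuple[float, float]:
    """Precision = TP / predicted, recall = TP / truth (0.0 if undefined)."""
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical mock community and made-up tool outputs (not data from the study).
truth = {"Escherichia coli", "Staphylococcus aureus", "Bacillus subtilis"}
tool_a = {  # stands in for a k-mer based classifier
    "Escherichia coli": 0.50, "Staphylococcus aureus": 0.30,
    "Bacillus subtilis": 0.15, "Shigella flexneri": 0.04,
    "Bacillus anthracis": 0.01,
}
tool_b = {  # stands in for a marker based classifier
    "Escherichia coli": 0.60, "Staphylococcus aureus": 0.40,
}
tool_c = {  # stands in for an alignment based classifier
    "Escherichia coli": 0.45, "Staphylococcus aureus": 0.30,
    "Bacillus subtilis": 0.20, "Shigella flexneri": 0.05,
}

# Step 1: abundance filtering per tool (assumed 2% cutoff).
filtered = [abundance_filter(p, 0.02) for p in (tool_a, tool_b, tool_c)]

# Step 2: majority vote (2 of 3 tools) versus strict intersection (3 of 3).
majority = vote(filtered, min_tools=2)
intersection = vote(filtered, min_tools=3)

majority_pr = precision_recall(majority, truth)
intersection_pr = precision_recall(intersection, truth)
```

In this toy case the low-abundance Bacillus anthracis call is removed by the filter, the majority vote still admits the Shigella flexneri false positive shared by two tools, and the strict intersection removes it at the cost of missing Bacillus subtilis. That mirrors the abstract's point that these strategies reduce false positives but do not guarantee their elimination.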