Pranvera Hiseni (1,2), Knut Rudi (3,4), Robert C Wilson (4), Finn Terje Hegge (5), Lars Snipen (3)
1. Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway. ph@genetic-analysis.com
2. Genetic Analysis AS, Kabelgaten 8, 0580, Oslo, Norway. ph@genetic-analysis.com
3. Department of Chemistry, Biotechnology and Food Sciences, Norwegian University of Life Sciences, P.O. Box 5003, 1432, Ås, Norway.
4. Department of Biotechnology, Inland Norway University of Applied Sciences, 2318, Hamar, Norway.
5. Genetic Analysis AS, Kabelgaten 8, 0580, Oslo, Norway.
Abstract
BACKGROUND: A major bottleneck in the use of metagenome sequencing for human gut microbiome studies has been the lack of a comprehensive genome collection to be used as a reference database. Several recent efforts have been made to reconstruct genomes from human gut metagenome data, resulting in a huge increase in the number of relevant genomes. In this work, we aimed to create a collection of the most prevalent healthy human gut prokaryotic genomes, to be used as a reference database, including both MAGs from the human gut and ordinary RefSeq genomes. RESULTS: We screened > 5,700 healthy human gut metagenomes for the containment of > 490,000 publicly available prokaryotic genomes sourced from RefSeq and the recently announced UHGG collection. This resulted in a pool of > 381,000 genomes that were subsequently scored and ranked based on their prevalence in the healthy human metagenomes. The genomes were then clustered at a 97.5% sequence identity resolution, and cluster representatives (30,691 in total) were retained to comprise the HumGut collection. Using the Kraken2 software for classification, we found superior performance in the assignment of metagenomic reads, classifying on average 94.5% of the reads in a metagenome, as opposed to 86% with UHGG and 44% with the standard Kraken2 database. A coarser HumGut collection, consisting of genomes dereplicated at 95% sequence identity, similar to UHGG, classified 88.25% of the reads. HumGut, half the size of the standard Kraken2 database and directly comparable in size to UHGG, outperforms them both. CONCLUSIONS: The HumGut collection contains > 30,000 genomes clustered at a 97.5% sequence identity resolution and ranked by human gut prevalence. We demonstrate that metagenomes from IBD patients map equally well to this collection, indicating that this reference is also relevant for studies well outside the metagenome reference set used to obtain HumGut.
All data and metadata, as well as helpful code, are available at http://arken.nmbu.no/~larssn/humgut/ . Video Abstract.
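The ranking and clustering steps described in the RESULTS can be illustrated with a small sketch. The abstract does not state which clustering algorithm was used, so the greedy scheme below is an assumption made only for illustration: genomes are visited in order of decreasing prevalence, and each one either joins the first existing representative it matches at the identity threshold or becomes a new representative. The function `dereplicate`, the toy genome names, and all prevalence and identity values are hypothetical.

```python
from typing import Callable, Dict, List


def dereplicate(
    prevalence: Dict[str, float],
    identity: Callable[[str, str], float],
    threshold: float = 0.975,
) -> Dict[str, List[str]]:
    """Greedy, prevalence-ranked clustering (illustrative assumption only).

    Returns a mapping from each cluster representative to its members.
    """
    # Rank genomes by decreasing prevalence; ties are broken by name.
    ranked = sorted(prevalence, key=lambda g: (-prevalence[g], g))
    clusters: Dict[str, List[str]] = {}
    for genome in ranked:
        for rep in clusters:
            # Join the first representative that is similar enough.
            if identity(genome, rep) >= threshold:
                clusters[rep].append(genome)
                break
        else:
            # No representative is close enough: start a new cluster.
            clusters[genome] = [genome]
    return clusters


# Hypothetical toy data: prevalence scores and pairwise identities.
toy_prevalence = {"gA": 0.90, "gB": 0.40, "gC": 0.75, "gD": 0.10}
toy_identity = {
    frozenset(("gA", "gB")): 0.980,
    frozenset(("gA", "gC")): 0.960,
    frozenset(("gA", "gD")): 0.955,
    frozenset(("gB", "gC")): 0.955,
    frozenset(("gB", "gD")): 0.810,
    frozenset(("gC", "gD")): 0.976,
}


def lookup_identity(a: str, b: str) -> float:
    """Look up the (symmetric) toy identity between two genomes."""
    return 1.0 if a == b else toy_identity[frozenset((a, b))]


fine = dereplicate(toy_prevalence, lookup_identity, threshold=0.975)
coarse = dereplicate(toy_prevalence, lookup_identity, threshold=0.95)
print(len(fine), "representatives at 97.5%")
print(len(coarse), "representatives at 95%")
```

On this toy input the 97.5% threshold retains more representatives than the 95% threshold, which mirrors the finer resolution of HumGut relative to a collection dereplicated at 95%. In practice the pairwise identities would come from a genome comparison tool rather than a lookup table.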