Olexiy Kyrgyzov (1), Vincent Prost (1,2), Stéphane Gazut (2), Bruno Farcy (3), Thomas Brüls (1). (1) Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France. (2) Laboratoire Sciences des Données et de la Décision, LIST, CEA, Bâtiment 565, 91191 Gif-sur-Yvette, France. (3) Atos Bull Technologies, 68 avenue Jean Jaurès, 78340 Les Clayes-sous-Bois, France.
Abstract
BACKGROUND: Sequence-binning techniques enable the recovery of an increasing number of genomes from complex microbial metagenomes and typically require prior metagenome assembly, incurring the computational cost and drawbacks of the latter, e.g., biases against low-abundance genomes and inability to conveniently assemble multi-terabyte datasets. RESULTS: We present here a scalable pre-assembly binning scheme (i.e., operating on unassembled short reads) enabling latent genome recovery by leveraging sparse dictionary learning and elastic-net regularization, and its use to recover hundreds of metagenome-assembled genomes, including very low-abundance genomes, from a joint analysis of microbiomes from the LifeLines DEEP population cohort (n = 1,135, >10^10 reads). CONCLUSION: We showed that sparse coding techniques can be leveraged to carry out read-level binning at large scale and that, despite lower genome reconstruction yields compared to assembly-based approaches, bin-first strategies can complement the more widely used assembly-first protocols by targeting distinct genome segregation profiles. Read enrichment levels across 6 orders of magnitude in relative abundance were observed, indicating that the method has the power to recover genomes consistently segregating at low levels.