Emanuele Bosi1, Beatrice Donati2, Marco Galardini3, Sara Brunetti4, Marie-France Sagot5, Pietro Lió6, Pierluigi Crescenzi7, Renato Fani1, Marco Fondi1.
1. Department of Biology, ComBo, Florence Computational Biology Group, and Department of Biology, LEMM, Laboratory of Microbial and Molecular Evolution, University of Florence, I-50019 Sesto F.no, Italy.
2. INRIA Rhône-Alpes, Villeurbanne Cedex, France; Université de Lyon, F-69000 Lyon, France; Dipartimento di Ingegneria dell'Informazione, University of Florence, I-50139 Firenze, Italy.
3. EMBL-EBI - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, CB10 1SD Cambridge, UK.
4. Dipartimento di Ingegneria dell'Informazione e Scienze Matematiche, University of Siena, Siena I-53100, Italy.
5. INRIA Rhône-Alpes, Villeurbanne Cedex, France; Université de Lyon, F-69000 Lyon, France; Université Lyon 1, CNRS, UMR5558, 69622 Villeurbanne Cedex, France.
6. Computer Laboratory, University of Cambridge, CB3 0FD Cambridge, UK.
7. Dipartimento di Ingegneria dell'Informazione, University of Florence, I-50139 Firenze, Italy.
Abstract
MOTIVATION: Completing the genome sequence of an organism is an important task in comparative, functional and structural genomics. However, it remains challenging from both a computational and an experimental viewpoint. Genome scaffolding (i.e. the process of ordering and orienting contigs) of de novo assemblies is usually the first step in most genome finishing pipelines. RESULTS: In this article we present MeDuSa (Multi-Draft based Scaffolder), an algorithm for genome scaffolding. MeDuSa exploits information obtained from a set of (draft or closed) genomes of related organisms to determine the correct order and orientation of the contigs. MeDuSa formalizes the scaffolding problem as a combinatorial optimization problem on graphs and implements an efficient constant-factor approximation algorithm to solve it. In contrast to currently used scaffolders, it requires neither prior knowledge of the microorganisms under analysis (e.g. their phylogenetic relationships) nor paired-end read libraries. This makes usability and running time two additional important features of our method. Moreover, benchmarks and tests on real bacterial datasets showed that MeDuSa is highly accurate and, in most cases, outperforms traditional scaffolders. The applicability of MeDuSa to eukaryotic datasets has also been evaluated, leading to interesting results.
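To make the graph formulation more concrete, here is a minimal sketch under several stated assumptions. It is not MeDuSa's actual implementation.

Contigs are treated as nodes. An edge between two contigs is weighted by how many related genomes place them side by side. Scaffolds are then read off as vertex-disjoint paths chosen greedily by descending edge weight. The greedy maximum-weight path cover shown here is a stand-in technique of our own choosing. It is not claimed to reproduce the paper's exact algorithm or its approximation bound.

The following are hypothetical and not taken from the source:
- the `orders` input, which would in practice come from mapping contigs onto each related genome;
- the helper names;
- the decision to ignore contig orientation.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_adjacency_weights(orders: List[List[str]]) -> Counter:
    """Count, for each unordered contig pair, how many hypothetical reference
    orders place the two contigs next to each other (orientation ignored)."""
    weights: Counter = Counter()
    for order in orders:
        for a, b in zip(order, order[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1
    return weights


def greedy_path_cover(contigs: List[str], weights: Counter) -> List[Tuple[str, str]]:
    """Greedily keep the heaviest edges such that every contig has degree <= 2
    and no cycle is formed, so the kept edges form vertex-disjoint paths."""
    parent: Dict[str, str] = {c: c for c in contigs}

    def find(x: str) -> str:
        # Union-find with path halving, used to detect would-be cycles.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    degree = {c: 0 for c in contigs}
    chosen: List[Tuple[str, str]] = []
    # Heaviest edges first; ties broken by contig names for determinism.
    for pair, _w in sorted(weights.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        a, b = sorted(pair)
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # a contig can have at most two neighbours in a scaffold
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # adding this edge would close a cycle
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        chosen.append((a, b))
    return chosen


def paths_from_edges(contigs: List[str], edges: List[Tuple[str, str]]) -> List[List[str]]:
    """Walk each path component from one of its endpoints to list the scaffolds."""
    adj: Dict[str, List[str]] = {c: [] for c in contigs}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen = set()
    scaffolds: List[List[str]] = []
    for c in sorted(contigs):
        if c in seen or len(adj[c]) == 2:
            continue  # start only from path endpoints (degree 0 or 1)
        path, prev, cur = [c], None, c
        seen.add(c)
        while True:
            nxt = [n for n in adj[cur] if n != prev]
            if not nxt:
                break
            prev, cur = cur, nxt[0]
            path.append(cur)
            seen.add(cur)
        scaffolds.append(path)
    return scaffolds


def scaffold(contigs: List[str], orders: List[List[str]]) -> List[List[str]]:
    """Hypothetical end-to-end helper: weights -> greedy path cover -> scaffolds."""
    weights = build_adjacency_weights(orders)
    return paths_from_edges(contigs, greedy_path_cover(contigs, weights))


if __name__ == "__main__":
    refs = [["c1", "c2", "c3", "c4"], ["c1", "c2", "c3", "c4"], ["c4", "c3", "c1", "c2"]]
    print(scaffold(["c1", "c2", "c3", "c4", "c5"], refs))
```

With the three toy reference orders above, the adjacencies c1-c2 and c3-c4 receive weight 3, c2-c3 weight 2 and c1-c3 weight 1. The greedy step keeps the first three edges and joins c1-c2-c3-c4 into a single scaffold. The unplaced contig c5 is left as a singleton. Using agreement counts across several related genomes as edge weights is what lets such a graph approach work without paired-end reads or a known phylogeny.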