Improved gap size estimation for scaffolding algorithms.

Kristoffer Sahlin, Nathaniel Street, Joakim Lundeberg, Lars Arvestad.

Abstract

MOTIVATION: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates of the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can be misleading in subsequent analysis, it is important to provide unbiased estimation of contig distance.
RESULTS: In this article, we show that state-of-the-art programs for scaffolding use an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, structural variation detection and library insert-size estimation as is commonly performed by read aligners.
AVAILABILITY: A reference implementation is provided at https://github.com/SciLifeLab/gapest.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
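
The abstract does not spell out the model, so the following is only an illustrative sketch of the kind of bias it describes, not the authors' derivation and not the gapest implementation. It assumes (none of this is from the article) that insert sizes are normal with known mean and standard deviation, that fragment starts are uniform over the first contig, that read length can be ignored, and that a plain grid search is good enough. All function names are hypothetical. The naive estimator subtracts the mean observed span from the library mean; the conditioned estimator also accounts for the fact that only fragments long enough to bridge the gap are ever observed.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def span_weight(x, d, c1, c2):
    """Length of the range of start offsets in contig 1 for which a fragment
    of length x ends inside contig 2, given a gap d (assumed uniform starts)."""
    lo = max(0.0, c1 + d - x)
    hi = min(c1, c1 + d + c2 - x)
    return max(0.0, hi - lo)


def norm_const(d, mu, sigma, c1, c2):
    """Z(d): unnormalised probability that a random fragment spans the gap,
    approximated by a sum over integer fragment lengths within 6 sigma."""
    lo, hi = int(mu - 6 * sigma), int(mu + 6 * sigma)
    return sum(normal_pdf(x, mu, sigma) * span_weight(x, d, c1, c2)
               for x in range(lo, hi + 1))


def naive_gap(observations, mu):
    """Naive estimate: library mean minus mean observed span on the contigs."""
    return mu - sum(observations) / len(observations)


def ml_gap(observations, mu, sigma, c1, c2, d_max):
    """Grid-search maximiser of the conditioned log-likelihood.
    Per observation, up to terms constant in d:
        -(mean_obs + d - mu)^2 / (2 sigma^2) - log Z(d)
    """
    mean_obs = sum(observations) / len(observations)
    best_d, best_ll = 0, float("-inf")
    for d in range(0, d_max + 1):
        z = norm_const(d, mu, sigma, c1, c2)
        if z <= 0.0:
            continue  # no fragment can span a gap this large
        ll = -((mean_obs + d - mu) ** 2) / (2.0 * sigma ** 2) - math.log(z)
        if ll > best_ll:
            best_d, best_ll = d, ll
    return best_d


def simulate_observations(n, d, mu, sigma, c1, c2, seed=0):
    """Rejection-sample n spanning fragments; return the observed spans x - d."""
    rng = random.Random(seed)
    obs = []
    while len(obs) < n:
        x = rng.gauss(mu, sigma)
        start = rng.uniform(0.0, c1)
        end = start + x
        if c1 + d <= end <= c1 + d + c2:
            obs.append(x - d)
    return obs


if __name__ == "__main__":
    true_gap = 400
    obs = simulate_observations(1000, true_gap, 500.0, 100.0, 2000, 2000, seed=1)
    print("naive estimate:", naive_gap(obs, 500.0))
    print("conditioned ML estimate:", ml_gap(obs, 500.0, 100.0, 2000, 2000, 800))
```

In this toy setting the gap is close to the library mean, so spanning fragments are drawn mostly from the long tail of the insert-size distribution and the naive estimate lands well below the true gap, while the conditioned estimate stays near it. Real scaffolders must also handle read lengths, short contigs and an insert-size distribution estimated from data, which this sketch leaves out.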

Year:  2012        PMID: 22923455     DOI: 10.1093/bioinformatics/bts441

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  8 in total

1.  SWALO: scaffolding with assembly likelihood optimization.

Authors:  Atif Rahman; Lior Pachter
Journal:  Nucleic Acids Res       Date:  2021-11-18       Impact factor: 16.971

2.  RegScaf: a regression approach to scaffolding.

Authors:  Mengtian Li; Lei M Li
Journal:  Bioinformatics       Date:  2022-05-13       Impact factor: 6.931

3.  The Norway spruce genome sequence and conifer genome evolution.

Authors:  Björn Nystedt; Nathaniel R Street; Anna Wetterbom; Andrea Zuccolo; Yao-Cheng Lin; Douglas G Scofield; Francesco Vezzi; Nicolas Delhomme; Stefania Giacomello; Andrey Alexeyenko; Riccardo Vicedomini; Kristoffer Sahlin; Ellen Sherwood; Malin Elfstrand; Lydia Gramzow; Kristina Holmberg; Jimmie Hällman; Olivier Keech; Lisa Klasson; Maxim Koriabine; Melis Kucukoglu; Max Käller; Johannes Luthman; Fredrik Lysholm; Totte Niittylä; Ake Olson; Nemanja Rilakovic; Carol Ritland; Josep A Rosselló; Juliana Sena; Thomas Svensson; Carlos Talavera-López; Günter Theißen; Hannele Tuominen; Kevin Vanneste; Zhi-Qiang Wu; Bo Zhang; Philipp Zerbe; Lars Arvestad; Rishikesh Bhalerao; Joerg Bohlmann; Jean Bousquet; Rosario Garcia Gil; Torgeir R Hvidsten; Pieter de Jong; John MacKay; Michele Morgante; Kermit Ritland; Björn Sundberg; Stacey Lee Thompson; Yves Van de Peer; Björn Andersson; Ove Nilsson; Pär K Ingvarsson; Joakim Lundeberg; Stefan Jansson
Journal:  Nature       Date:  2013-05-22       Impact factor: 49.962

4.  Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools.

Authors:  Andrey Alexeyenko; Björn Nystedt; Francesco Vezzi; Ellen Sherwood; Rosa Ye; Bjarne Knudsen; Martin Simonsen; Benjamin Turner; Pieter de Jong; Cheng-Cang Wu; Joakim Lundeberg
Journal:  BMC Genomics       Date:  2014-06-06       Impact factor: 3.969

5.  BESST--efficient scaffolding of large fragmented assemblies.

Authors:  Kristoffer Sahlin; Francesco Vezzi; Björn Nystedt; Joakim Lundeberg; Lars Arvestad
Journal:  BMC Bioinformatics       Date:  2014-08-15       Impact factor: 3.169

6.  OPERA-LG: efficient and exact scaffolding of large, repeat-rich eukaryotic genomes with performance guarantees.

Authors:  Song Gao; Denis Bertrand; Burton K H Chia; Niranjan Nagarajan
Journal:  Genome Biol       Date:  2016-05-11       Impact factor: 13.583

7.  MaGuS: a tool for quality assessment and scaffolding of genome assemblies with Whole Genome Profiling™ Data.

Authors:  Mohammed-Amin Madoui; Carole Dossat; Léo d'Agata; Jan van Oeveren; Edwin van der Vossen; Jean-Marc Aury
Journal:  BMC Bioinformatics       Date:  2016-03-03       Impact factor: 3.169

8.  Functional divergence of duplicate genes several million years after gene duplication in Arabidopsis.

Authors:  Kousuke Hanada; Ayumi Tezuka; Masafumi Nozawa; Yutaka Suzuki; Sumio Sugano; Atsushi J Nagano; Motomi Ito; Shin-Ichi Morinaga
Journal:  DNA Res       Date:  2018-02-21       Impact factor: 4.458
