RegScaf: a regression approach to scaffolding.

Mengtian Li, Lei M Li.

Abstract

MOTIVATION: The correctness of a genome assembly depends critically on the accuracy of the underlying scaffolds, which specify the order and orientation of contigs together with the gap distances between them. Current methods construct scaffolds from the alignments of 'linking' reads against contigs. We found that some 'optimal' alignments are mistaken because of factors such as the contig boundary effect, particularly in the presence of repeats. Occasionally, the incorrect alignments can even overwhelm the correct ones. Detecting incorrect linking information is challenging for existing methods.
RESULTS: In this study, we present RegScaf, a novel scaffolding method. It first examines the distribution of inter-contig distances derived from read alignments, using kernel density estimation. When a density shows multiple modes, orientation-supported links are grouped into clusters, each of which defines a linking distance corresponding to one mode. A linear model parameterizes contigs by their positions on the genome; each linking distance between a pair of contigs is then taken as an observation of the difference between their positions. The parameters are estimated by minimizing a global loss function, a version of the trimmed sum of squares. The least trimmed squares estimate has a breakdown value high enough that mistaken linking distances are removed automatically. Results on both synthetic and real datasets demonstrate that RegScaf outperforms some popular scaffolders, especially in the accuracy of gap estimates, by substantially reducing extreme errors. Its strength in resolving repeat regions is illustrated by a real case. Its adaptability to large genomes and third-generation sequencing (TGS) long reads is validated as well.
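The regression step described above can be illustrated with a minimal sketch. It is not RegScaf's implementation: the function name `lts_contig_positions`, its parameters, the toy link data, and the optimizer (random small starts followed by concentration steps, a common heuristic for least trimmed squares) are all assumptions made for illustration. Each link `(i, j, d)` is read as one observation `position[j] - position[i] ≈ d`, and only the best-fitting fraction of links enters the loss.

```python
import numpy as np

def lts_contig_positions(links, n_contigs, keep_frac=0.75,
                         n_starts=50, n_csteps=20, seed=0):
    """Estimate contig positions from linking distances by least trimmed squares.

    links: list of (i, j, d), read as position[j] - position[i] ~ d.
    Contig 0 is pinned at position 0 so the model is identifiable.
    Hypothetical helper for illustration; not RegScaf's actual code.
    """
    rng = np.random.default_rng(seed)
    m = len(links)
    X = np.zeros((m, n_contigs))
    y = np.empty(m)
    for row, (i, j, d) in enumerate(links):
        X[row, i], X[row, j], y[row] = -1.0, 1.0, d
    X = X[:, 1:]                       # drop the column of pinned contig 0
    p = n_contigs - 1                  # number of free position parameters
    h = int(np.ceil(keep_frac * m))    # number of links kept in the loss

    best_loss, best_beta = np.inf, None
    for _ in range(n_starts):
        # Start from a small random subset of links.
        subset = rng.choice(m, size=p, replace=False)
        for _ in range(n_csteps):
            # Concentration step: fit on the subset, then keep the h links
            # with the smallest squared residuals under that fit.
            beta, *_ = np.linalg.lstsq(X[subset], y[subset], rcond=None)
            res2 = (y - X @ beta) ** 2
            new_subset = np.sort(np.argsort(res2)[:h])
            if np.array_equal(new_subset, subset):
                break
            subset = new_subset
        loss = np.sort(res2)[:h].sum()  # trimmed sum of squares for this fit
        if loss < best_loss:
            best_loss, best_beta = loss, beta

    positions = np.concatenate(([0.0], best_beta))
    residuals = y - X @ best_beta
    return positions, residuals

# Made-up toy data: four contigs with ten roughly consistent links
# and two mistaken ones (the last two entries).
links = [
    (0, 1, 1002), (0, 1, 998), (1, 2, 1497), (1, 2, 1503),
    (2, 3, 1501), (2, 3, 1499), (0, 2, 2504), (0, 2, 2496),
    (1, 3, 2998), (1, 3, 3002),
    (1, 2, 300), (0, 3, 900),
]
positions, residuals = lts_contig_positions(links, n_contigs=4)
```

In this sketch the mistaken links are never deleted explicitly. They simply fall outside the kept fraction and show up with large entries in `residuals`, which is the sense in which a trimmed loss discards bad linking distances automatically. The `keep_frac` value is an arbitrary choice here, not a setting taken from the paper.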
AVAILABILITY AND IMPLEMENTATION: RegScaf is publicly available at https://github.com/lemontealala/RegScaf.git.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2022. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Year: 2022; PMID: 35561180; PMCID: PMC9326850; DOI: 10.1093/bioinformatics/btac174

Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.931

