
Assembly scaffolding with PE-contaminated mate-pair libraries.

Kristoffer Sahlin¹, Rayan Chikhi², Lars Arvestad³.

Abstract

MOTIVATION: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although simple in principle, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than expected. This contamination has been discussed before in relation to integrated scaffolders, but existing solutions rely on the orientation being observable, e.g. by finding the junction adapter sequence in the reads. This is not always possible, which makes the orientation and insert size of a read pair stochastic. To our knowledge, there is neither previous work on modeling PE-contamination nor a study of the effect PE-contamination has on scaffolding quality.
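Because the origin of a pair (true mate-pair or PE contaminant) cannot always be read off the sequence, it can be treated as a latent variable in a two-component mixture. The following is a minimal Python sketch of that idea only, not BESST's model: it assumes Gaussian insert-size distributions with known means and standard deviations for both components, treats each pair's insert size as already observed, and ignores orientation and gap estimation entirely. The function names `posterior_pe` and `estimate_pe_rate` and all parameter values are hypothetical.

```python
import math
from typing import Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of N(mean, sd**2) at x (assumed insert-size model)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_pe(insert_size: float, pe_rate: float,
                 mp_mean: float, mp_sd: float,
                 pe_mean: float, pe_sd: float) -> float:
    """Hypothetical helper: posterior probability that a pair is PE
    contamination, given its insert size and the contamination rate."""
    w_pe = pe_rate * normal_pdf(insert_size, pe_mean, pe_sd)
    w_mp = (1.0 - pe_rate) * normal_pdf(insert_size, mp_mean, mp_sd)
    total = w_pe + w_mp
    if total == 0.0:
        # Both densities underflowed (extreme outlier): fall back to the prior.
        return pe_rate
    return w_pe / total


def estimate_pe_rate(inserts: Sequence[float], mp_mean: float, mp_sd: float,
                     pe_mean: float, pe_sd: float,
                     init: float = 0.5, iters: int = 50) -> float:
    """Hypothetical helper: EM estimate of the contamination rate only.
    Component means and SDs are held fixed (an assumption of this sketch)."""
    if not inserts:
        raise ValueError("need at least one insert size")
    if not 0.0 < init < 1.0:
        raise ValueError("init must lie strictly between 0 and 1")
    rate = init
    for _ in range(iters):
        # E-step: responsibility of the PE component for each pair.
        resp = [posterior_pe(x, rate, mp_mean, mp_sd, pe_mean, pe_sd)
                for x in inserts]
        # M-step: the new mixing weight is the mean responsibility.
        rate = sum(resp) / len(resp)
    return rate


if __name__ == "__main__":
    # Illustrative values only: a ~3 kb mate-pair library with ~300 bp PE contaminants.
    inserts = [290, 310, 2900, 2950, 3000, 3050, 3100, 2800, 3200, 3000]
    rate = estimate_pe_rate(inserts, mp_mean=3000, mp_sd=300, pe_mean=300, pe_sd=50)
    print(f"estimated PE-contamination rate: {rate:.3f}")
    print(f"P(PE | 320 bp): {posterior_pe(320, rate, 3000, 300, 300, 50):.3f}")
```

A scaffolder could use such posteriors to weight each pair's evidence instead of discarding or trusting it outright; the soft weighting is the design point illustrated here, while the Gaussian components and fixed parameters are simplifications.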
RESULTS: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding and inflated assembly sizes.
AVAILABILITY AND IMPLEMENTATION: The model is implemented in BESST. Source code and usage instructions are found at https://github.com/ksahlin/BESST. BESST can also be downloaded using PyPI.
CONTACT: ksahlin@kth.se
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Year: 2016  PMID: 27153683  DOI: 10.1093/bioinformatics/btw064

Source DB: PubMed  Journal: Bioinformatics  ISSN: 1367-4803  Impact factor: 6.937

Cited by: 23 in total

1. SWALO: scaffolding with assembly likelihood optimization.

Authors: Atif Rahman; Lior Pachter
Journal: Nucleic Acids Res Date: 2021-11-18 Impact factor: 16.971

2. Genome Sequence of the Edible Green Alga Ulva prolifera, Originating from the Yoshinogawa River in Japan.

Authors: Keita Tamura; Hidemasa Bono
Journal: Microbiol Resour Announc Date: 2022-08-29

3. Fast-SG: an alignment-free algorithm for hybrid assembly.

Authors: Alex Di Genova; Gonzalo A Ruz; Marie-France Sagot; Alejandro Maass
Journal: Gigascience Date: 2018-05-01 Impact factor: 6.524

4. Molecular mechanisms of mutualistic and antagonistic interactions in a plant-pollinator association.

Authors: Rong Wang; Yang Yang; Yi Jing; Simon T Segar; Yu Zhang; Gang Wang; Jin Chen; Qing-Feng Liu; Shan Chen; Yan Chen; Astrid Cruaud; Yuan-Yuan Ding; Derek W Dunn; Qiang Gao; Philip M Gilmartin; Kai Jiang; Finn Kjellberg; Hong-Qing Li; Yuan-Yuan Li; Jian-Quan Liu; Min Liu; Carlos A Machado; Ray Ming; Jean-Yves Rasplus; Xin Tong; Ping Wen; Huan-Ming Yang; Jing-Jun Yang; Ye Yin; Xing-Tan Zhang; Yuan-Ye Zhang; Hui Yu; Zhen Yue; Stephen G Compton; Xiao-Yong Chen
Journal: Nat Ecol Evol Date: 2021-05-17 Impact factor: 15.460

5. Chromosome Level Assembly of the Comma Butterfly (Polygonia c-album).

Authors: Maria de la Paz Celorio-Mancera; Pasi Rastas; Rachel A Steward; Soren Nylin; Christopher W Wheat
Journal: Genome Biol Evol Date: 2021-05-07 Impact factor: 3.416

6. Sequencing and Reconstructing Helminth Mitochondrial Genomes Directly from Genomic Next-Generation Sequencing Data.

Authors: Nikola Palevich; Paul Haydon Maclean
Journal: Methods Mol Biol Date: 2021

7. GSER (a Genome Size Estimator using R): a pipeline for quality assessment of sequenced genome libraries through genome size estimation.

Authors: Braulio Valdebenito-Maturana; Gonzalo Riadi
Journal: Interface Focus Date: 2021-06-11 Impact factor: 4.661

8. Whole-genome assembly of the coral reef Pearlscale Pygmy Angelfish (Centropyge vrolikii).

Authors: Iria Fernandez-Silva; James B Henderson; Luiz A Rocha; W Brian Simison
Journal: Sci Rep Date: 2018-01-24 Impact factor: 4.379

9. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter.

Authors: Shaun D Jackman; Benjamin P Vandervalk; Hamid Mohamadi; Justin Chu; Sarah Yeo; S Austin Hammond; Golnaz Jahesh; Hamza Khan; Lauren Coombe; Rene L Warren; Inanc Birol
Journal: Genome Res Date: 2017-02-23 Impact factor: 9.043

10. Dog10K_Boxer_Tasha_1.0: A Long-Read Assembly of the Dog Reference Genome.

Authors: Vidhya Jagannathan; Christophe Hitte; Jeffrey M Kidd; Patrick Masterson; Terence D Murphy; Sarah Emery; Brian Davis; Reuben M Buckley; Yan-Hu Liu; Xiang-Quan Zhang; Tosso Leeb; Ya-Ping Zhang; Elaine A Ostrander; Guo-Dong Wang
Journal: Genes (Basel) Date: 2021-05-30 Impact factor: 4.096