Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms.

Nadège Guiglielmoni, Antoine Houtain, Alessandro Derzelle, Karine Van Doninck, Jean-François Flot

Abstract

BACKGROUND: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible, both technically and financially, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, which makes generating a haploid reference from a diploid genome difficult. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines. Yet this serious issue is rarely acknowledged or addressed in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking.
RESULTS: We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups.
CONCLUSIONS: We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre- and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users improve haploid assemblies of non-model organisms and evaluate the quality of their own assemblies.
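
As a rough illustration of how a user might screen their own assembly, the sketch below computes two common summary statistics from a list of contig lengths: N50 (a standard continuity metric) and the ratio of total assembly size to an expected haploid genome size. It is not taken from the study; the function names, the toy contig lengths and the expected genome size are hypothetical, and the assumption that a ratio well above 1 hints at uncollapsed haplotypes is a general rule of thumb rather than a result reported in this abstract.

```python
from typing import Iterable, List


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50: the largest length L such that contigs of
    length >= L together cover at least half of the assembly."""
    lengths: List[int] = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("contig_lengths must not be empty")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return lengths[-1]  # not reached for non-empty input


def size_ratio(contig_lengths: Iterable[int], expected_haploid_size: int) -> float:
    """Ratio of total assembly size to the expected haploid genome size.
    Assumption: values well above 1 may indicate uncollapsed haplotypes."""
    if expected_haploid_size <= 0:
        raise ValueError("expected_haploid_size must be positive")
    return sum(contig_lengths) / expected_haploid_size


# Hypothetical toy data, in arbitrary length units.
toy_contigs = [50, 40, 30, 20, 10]
toy_expected_size = 100  # hypothetical expected haploid genome size

print("N50:", n50(toy_contigs))
print("size ratio:", size_ratio(toy_contigs, toy_expected_size))
```

Neither statistic identifies which contigs are redundant; dedicated tools such as those benchmarked here (HaploMerger2, purge_haplotigs, purge_dups) are needed for that.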

Keywords:  Genome assembly; Haplotype collapsing; Long reads

Year: 2021; PMID: 34090340; DOI: 10.1186/s12859-021-04118-3

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169


