
DENTIST-using long reads for closing assembly gaps at high accuracy.

Arne Ludwig, Martin Pippel, Gene Myers, Michael Hiller.

Abstract

BACKGROUND: Long sequencing reads make it possible to increase the contiguity and completeness of fragmented, short-read-based genome assemblies by closing assembly gaps, ideally at high accuracy. While several gap-closing methods have been developed, they often fill an assembly gap with sequence that does not accurately represent the true sequence.
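In FASTA scaffolds, assembly gaps are conventionally written as runs of `N`. As an illustration only (this is not DENTIST's code), the Python sketch below locates such gaps and extracts the sequence flanking each one, which is where a long read has to align to span the gap. The function names `find_gaps` and `gap_flanks` and the parameters `min_len` and `flank_len` are hypothetical.

```python
import re
from typing import List, Tuple


def find_gaps(scaffold: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N-runs.

    Assumes gaps are encoded as runs of 'N' or 'n' (the usual FASTA
    convention); runs shorter than min_len are ignored.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", scaffold)
        if m.end() - m.start() >= min_len
    ]


def gap_flanks(scaffold: str, gap: Tuple[int, int], flank_len: int) -> Tuple[str, str]:
    """Return up to flank_len bases immediately left and right of a gap."""
    start, end = gap
    left = scaffold[max(0, start - flank_len):start]
    right = scaffold[end:end + flank_len]
    return left, right


if __name__ == "__main__":
    seq = "ACGTNNNNNTTGCANNGG"
    for gap in find_gaps(seq):
        print(gap, gap_flanks(seq, gap, 3))
```

For example, `find_gaps("ACGTNNNNNTTGCANNGG")` returns `[(4, 9), (14, 16)]`, and `gap_flanks` on the first gap with `flank_len=3` returns `("CGT", "TTG")`. The half-open coordinates slice directly in Python, so the gap itself is `seq[4:9]`.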
FINDINGS: Here, we present DENTIST, a sensitive, highly accurate, and automated pipeline to close gaps in short-read assemblies with long error-prone reads. DENTIST comprehensively determines repetitive assembly regions to identify reliable and unambiguous alignments of long reads to the correct loci, integrates a consensus sequence computation step to obtain a high base accuracy for the inserted sequence, and validates the accuracy of closed gaps. Unlike previous benchmarks, we generated test assemblies that have gaps at the exact positions where real short-read assemblies have gaps. Using such realistic benchmarks for Drosophila (134 Mb genome), Arabidopsis (119 Mb), hummingbird (1 Gb), and human (3 Gb) together with simulated or real PacBio continuous long reads, we show that DENTIST consistently achieves substantially higher accuracy than previous methods while having similar sensitivity.
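The abstract does not detail how DENTIST computes its consensus. As a hedged illustration of why a consensus step raises base accuracy over individual error-prone reads, the sketch below uses a naive column-wise majority vote over read segments that are assumed to be already aligned to one another (equal length, with `-` marking alignment gaps). This is a simplified stand-in, not DENTIST's consensus algorithm, and `majority_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import List


def majority_consensus(rows: List[str], gap: str = "-") -> str:
    """Column-wise majority vote over the rows of a multiple alignment.

    Assumes all rows are already aligned and of equal length. Ties are
    broken by the smallest character; in ASCII '-' sorts before the bases,
    so a gap wins a tie against a base. Columns won by the gap character
    are dropped from the output.
    """
    if not rows:
        raise ValueError("need at least one aligned row")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("rows must all have the same length")

    out = []
    for column in zip(*rows):
        counts = Counter(column)
        best = max(counts.values())
        # Deterministic tie-break: smallest symbol among the most frequent.
        symbol = min(s for s, c in counts.items() if c == best)
        if symbol != gap:
            out.append(symbol)
    return "".join(out)


if __name__ == "__main__":
    reads = ["ACGT-A", "ACCTGA", "ACGT-A", "TCGT-A"]
    print(majority_consensus(reads))
```

Here two of the four aligned reads carry errors (a substitution plus an insertion in the second row, a substitution in the fourth), yet the vote returns `ACGTA`. A real pipeline must first produce the alignment and handle low-coverage columns, which this sketch deliberately leaves out.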
CONCLUSION: DENTIST provides an accurate approach to improve the contiguity and completeness of fragmented assemblies with long reads. DENTIST's source code including a Snakemake workflow, conda package, and Docker container is available at https://github.com/a-ludi/dentist. All test assemblies as a resource for future benchmarking are at https://bds.mpi-cbg.de/hillerlab/DENTIST/.
© The Author(s) 2022. Published by Oxford University Press GigaScience.


Keywords:  assembly gaps; genome assembly; long sequencing reads


Year: 2022 | PMID: 35077539 | PMCID: PMC8848313 | DOI: 10.1093/gigascience/giab100

Source DB: PubMed | Journal: GigaScience | ISSN: 2047-217X | Impact factor: 7.658
