Lauren Coombe, Vladimir Nikolić, Justin Chu, Inanc Birol, René L Warren.
Abstract
SUMMARY: The ability to generate high-quality genome sequences is a cornerstone of modern biological research. Even with recent advances in sequencing technologies, many genome assemblies still fall short of reference grade. Here, we introduce ntJoin, a tool that leverages structural synteny between a draft assembly and one or more reference sequences to contiguate and correct the draft with respect to the reference(s). Instead of alignments, ntJoin uses a lightweight mapping approach based on a graph data structure generated from ordered minimizer sketches. The tool can be used in a variety of applications, including improving a draft assembly with a reference-grade genome, a short-read assembly with a draft long-read assembly, and a draft assembly with an assembly from a closely related species. When scaffolding a human short-read assembly using the reference human genome or a long-read assembly, ntJoin improves the NGA50 length 23- and 13-fold, respectively, in under 13 min, using <11 GB of RAM. Compared with existing reference-guided scaffolders, ntJoin generates highly contiguous assemblies faster and with less memory.
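The abstract names the core idea (a graph built from ordered minimizer sketches instead of alignments) without detailing it. The Python below is a minimal, hypothetical sketch of that idea under stated assumptions, not ntJoin's implementation: a BLAKE2b hash of canonical k-mers stands in for the tool's own hashing; only minimizers that occur exactly once in every assembly become graph nodes; and an edge joins two such minimizers that are adjacent along a contig, with a weight summing per-assembly weights (the reference weighted higher). The names `kmer_hash`, `ordered_minimizers` and `build_minimizer_graph`, and the weights 2.0 and 1.0, are illustrative only.

```python
import hashlib
import random
from collections import Counter, defaultdict

# Hypothetical sketch, not ntJoin's code. Assumption: BLAKE2b over the canonical
# k-mer stands in for whatever hash function the real tool uses.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_hash(kmer: str) -> int:
    """Deterministic 64-bit hash of the canonical form of a k-mer
    (the lexicographically smaller of the k-mer and its reverse complement)."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    canonical = min(kmer, revcomp)
    digest = hashlib.blake2b(canonical.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def ordered_minimizers(seq: str, k: int, w: int) -> list[int]:
    """Minimizer hashes in the order they occur along seq.

    Each window of w consecutive k-mers selects its smallest hash (leftmost on
    ties); a position chosen by several consecutive windows is emitted once.
    """
    seq = seq.upper()
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    sketch, last_pos = [], -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        pos = start + min(range(w), key=window.__getitem__)
        if pos != last_pos:
            sketch.append(hashes[pos])
            last_pos = pos
    return sketch


def build_minimizer_graph(assemblies, weights, k, w):
    """assemblies: name -> list of contig sequences; weights: name -> edge weight.

    Assumption: nodes are minimizers seen exactly once in every assembly; an
    undirected edge joins two such minimizers that are adjacent along a contig,
    and its weight sums the weights of the assemblies that support it.
    """
    sketches = {name: [ordered_minimizers(c, k, w) for c in contigs]
                for name, contigs in assemblies.items()}
    counts = [Counter(m for sk in sks for m in sk) for sks in sketches.values()]
    # Keep only minimizers that are unique within each assembly and present in all.
    shared = set.intersection(*({m for m, n in cnt.items() if n == 1} for cnt in counts))
    graph = defaultdict(float)
    for name, sks in sketches.items():
        for sk in sks:
            kept = [m for m in sk if m in shared]
            for u, v in zip(kept, kept[1:]):
                graph[(min(u, v), max(u, v))] += weights[name]
    return shared, dict(graph)


if __name__ == "__main__":
    rng = random.Random(0)
    ref = "".join(rng.choice("ACGT") for _ in range(2000))
    draft = [ref[:1000], ref[1000:]]  # the same sequence broken into two contigs
    shared, graph = build_minimizer_graph(
        {"ref": [ref], "draft": draft}, {"ref": 2.0, "draft": 1.0}, k=15, w=10)
    # Edges backed only by the reference (weight 2.0) link contigs of the draft.
    bridging = [edge for edge, wt in graph.items() if wt == 2.0]
    print(len(shared), "shared minimizers;", len(bridging), "reference-only edge(s)")
```

In this toy run the draft is simply the reference cut into two contigs, so every edge is supported by both assemblies except the one that spans the cut, which only the reference supports; that edge is the kind of evidence a reference-guided scaffolder uses to join contigs. Turning graph paths into ordered, oriented scaffolds, as ntJoin does, is not attempted here.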
Year: 2020 PMID: 32311025 PMCID: PMC7320612 DOI: 10.1093/bioinformatics/btaa253
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Comparing (a) the contiguity and correctness and (b) the benchmarking results of ntJoin (orange), Ragoo (blue) and Ragout (green) runs on various H. sapiens (NA12878) assemblies, on (a) linear and (b) log–log scales. The reference genomes are the human reference genome (‘Ref’) and an ntEdit-polished Shasta assembly (‘Shasta’). The target assemblies being improved are an NA12878 ABySS assembly scaffolded with MPET data (‘ABySS’) and an ntEdit-polished Shasta assembly (‘Shasta’). In each panel of (a), the ‘Baseline’ statistics are shown for the corresponding target assemblies prior to scaffolding.