LazyB: fast and cheap genome assembly.

Thomas Gatter¹, Sarah von Löhneysen², Jörg Fallmann², Polina Drozdova³, Tom Hartmann², Peter F Stadler⁴,⁵,⁶,⁷,⁸.

Abstract

BACKGROUND: Advances in genome sequencing over the last years have led to a fundamental paradigm shift in the field. With steadily decreasing sequencing costs, genome projects are no longer limited by the cost of raw sequencing data, but rather by computational problems associated with genome assembly. There is an urgent demand for more efficient and more accurate methods, in particular with regard to the highly complex and often very large genomes of animals and plants. Most recently, "hybrid" methods that integrate short and long read data have been devised to address this need.
RESULTS: LazyB is such a hybrid genome assembler. It has been designed specifically with an emphasis on utilizing low-coverage short and long reads. LazyB starts from a bipartite overlap graph between long reads and restrictively filtered short-read unitigs. This graph is translated into a long-read overlap graph G. Instead of the more conventional approach of removing tips, bubbles, and other local features, LazyB extracts, step by step, subgraphs whose global properties approach a disjoint union of paths. First, a consistently oriented subgraph is extracted, which in a second step is reduced to a directed acyclic graph. In the next step, properties of proper interval graphs are used to extract contigs as maximum weight paths. These paths are translated into genomic sequences only in the final step. A prototype implementation of LazyB, written entirely in Python, not only yields significantly more accurate assemblies of the yeast and fruit fly genomes compared to state-of-the-art pipelines but also requires much less computational effort.
CONCLUSIONS: LazyB is a new low-cost genome assembler that copes well with large genomes and low coverage. It is based on a novel approach for reducing the overlap graph to a collection of paths, thus opening new avenues for future improvements.
AVAILABILITY: The LazyB prototype is available at https://github.com/TGatter/LazyB .
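
ILLUSTRATION: The RESULTS paragraph mentions extracting contigs as maximum weight paths from a directed acyclic graph. The sketch below shows only the generic textbook version of that idea (dynamic programming over a topological order) and is not taken from the LazyB code base. The function name `max_weight_path`, the read labels, and the edge weights are hypothetical, and positive weights are assumed. It does not use the proper-interval-graph properties that the abstract says LazyB relies on.

```python
from collections import defaultdict, deque


def max_weight_path(edges):
    """Return (total_weight, path) for a heaviest path in a weighted DAG.

    edges: iterable of (u, v, weight) tuples with positive weights.
    This is a generic illustration, not LazyB's actual procedure.
    """
    succ = defaultdict(list)   # node -> list of (successor, weight)
    indeg = defaultdict(int)   # node -> number of incoming edges
    nodes = set()
    for u, v, w in edges:
        succ[u].append((v, w))
        indeg[v] += 1
        nodes.update((u, v))
    if not nodes:
        return 0, []

    # Kahn's algorithm: repeatedly remove nodes without incoming edges.
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _ in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")

    # best[n] is the weight of the heaviest path ending at n.
    best = {n: 0 for n in nodes}
    pred = {n: None for n in nodes}
    for u in order:
        for v, w in succ[u]:
            if best[u] + w > best[v]:
                best[v] = best[u] + w
                pred[v] = u

    # Walk back from the best end node to recover the path.
    end = max(order, key=lambda n: best[n])
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = pred[node]
    return best[end], path[::-1]


# Toy overlap graph: nodes stand for long reads, weights for overlap scores.
toy_edges = [("r1", "r2", 5), ("r2", "r3", 3), ("r1", "r3", 4), ("r3", "r4", 2)]
weight, contig_path = max_weight_path(toy_edges)
print(weight, contig_path)
```

In this toy graph the heavier route through r2 is preferred over the direct r1 to r3 edge. A real assembler would additionally have to remove the chosen path and repeat the search to obtain further contigs, which this sketch leaves out.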

Keywords: Anchors; Genome assembly; Illumina sequencing; Nanopore sequencing; Spanning tree; Unitigs

Year: 2021 PMID: 34074310 PMCID: PMC8168326 DOI: 10.1186/s13015-021-00186-5

Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405

