Dmitry A Kuzmin1,2, Sergey I Feranchuk1,3,4, Vadim V Sharov1,2, Alexander N Cybin1,2, Stepan V Makolov1,2, Yuliya A Putintseva1, Natalya V Oreshkova1,5, Konstantin V Krutovsky6,7,8,9.
1. Laboratory of Forest Genomics, Genome Research and Education Center, Siberian Federal University, 660036, Krasnoyarsk, Russia.
2. Department of High Performance Computing, Institute of Space and Information Technologies, Siberian Federal University, 660074, Krasnoyarsk, Russia.
3. Department of Informatics, National Research Technical University, 664074, Irkutsk, Russia.
4. Limnological Institute, Siberian Branch of Russian Academy of Sciences, 664033, Irkutsk, Russia.
5. Laboratory of Forest Genetics and Selection, V. N. Sukachev Institute of Forest, Siberian Branch of Russian Academy of Sciences, 660036, Krasnoyarsk, Russia.
6. Laboratory of Forest Genomics, Genome Research and Education Center, Siberian Federal University, 660036, Krasnoyarsk, Russia. konstantin.krutovsky@forst.uni-goettingen.de.
7. Department of Forest Genetics and Forest Tree Breeding, Georg-August University of Göttingen, 37077, Göttingen, Germany.
8. Laboratory of Population Genetics, N. I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, 119333, Russia.
9. Department of Ecosystem Science and Management, Texas A&M University, College Station, TX, 77843-2138, USA.
Abstract
BACKGROUND: De novo assembly of large genomes, such as those of conifers (~12-30 Gbp), which also consist of ~80% repetitive DNA, is a very complex and computationally intensive endeavor. One of the main problems in assembling such genomes lies in the computing limitations of nucleotide sequence assembly programs (DNA assemblers). Modern assemblers are usually designed to assemble genomes no longer than the human genome (3.24 Gbp). Most assemblers cannot handle the amount of input sequence data required to provide the coverage needed for a high-quality assembly. RESULTS: This paper presents an original stepwise method of de novo assembly by parts (sets), which makes it possible to bypass the limitations of modern assemblers associated with the huge amount of data being processed. Numerical assembly experiments conducted using the model plant Arabidopsis thaliana, Prunus persica (peach), and four of the most popular assemblers (ABySS, SOAPdenovo, SPAdes, and CLC Assembly Cell) showed the validity and effectiveness of the proposed stepwise assembly method. CONCLUSION: Using the new stepwise de novo assembly method presented in the paper, the genome of Siberian larch, Larix sibirica Ledeb. (12.34 Gbp), was completely assembled de novo by the CLC Assembly Cell assembler. It is the first genome assembly for a larch species, in addition to only five other conifer genomes sequenced and assembled: Picea abies, Picea glauca, Pinus taeda, Pinus lambertiana, and Pseudotsuga menziesii var. menziesii.
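To give a sense of the data volumes behind the assembler limitation described in the abstract, the following minimal sketch estimates how many read bases a given coverage depth implies for the larch genome, and how many parts the input would have to be split into if a single assembler run could only handle a human-genome-sized workload. The genome sizes come from the abstract; the coverage depth of 50x, the per-run limit, and the function names are illustrative assumptions, not values or procedures taken from the paper, and the paper's actual partitioning scheme is not described here.

```python
import math

# Genome sizes quoted in the abstract (base pairs).
LARCH_GENOME_BP = 12_340_000_000   # Larix sibirica, 12.34 Gbp
HUMAN_GENOME_BP = 3_240_000_000    # human genome, 3.24 Gbp

# ASSUMPTION: illustrative sequencing depth, not a figure from the paper.
ASSUMED_COVERAGE = 50


def required_read_bases(genome_size_bp: int, coverage: int) -> int:
    """Total read bases needed to cover a genome at the given depth."""
    return genome_size_bp * coverage


def parts_needed(total_bases: int, max_bases_per_run: int) -> int:
    """Smallest number of equal-or-smaller parts that fit the per-run limit."""
    return math.ceil(total_bases / max_bases_per_run)


larch_bases = required_read_bases(LARCH_GENOME_BP, ASSUMED_COVERAGE)

# ASSUMPTION: one assembler run can process at most the read volume of a
# human-sized genome at the same depth (a hypothetical limit for illustration).
per_run_limit = required_read_bases(HUMAN_GENOME_BP, ASSUMED_COVERAGE)

n_parts = parts_needed(larch_bases, per_run_limit)

print(f"Read bases at {ASSUMED_COVERAGE}x: {larch_bases:,}")
print(f"Parts needed under the assumed per-run limit: {n_parts}")
```

Under these assumptions the larch input is roughly four times what a human-sized run would take, which is the kind of gap a stepwise, part-by-part approach is meant to bridge.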
Keywords:
Larix sibirica; Siberian larch; de novo genome assembly