David Eccles (1), Jodie Chandler (1), Mali Camberis (1), Bernard Henrissat (2,3,4), Sergey Koren (5), Graham Le Gros (6), Jonathan J Ewbank (1,7)

1. Malaghan Institute of Medical Research, Wellington, New Zealand.
2. Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia.
3. CNRS UMR 7257, Aix-Marseille University, Marseille, France.
4. INRA, USC 1408 AFMB, Marseille, France.
5. Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, 20892, USA.
6. Malaghan Institute of Medical Research, Wellington, New Zealand. glegros@malaghan.org.nz.
7. Centre d'Immunologie de Marseille-Luminy, Aix-Marseille University, CNRS, INSERM, Marseille, France.
Abstract
BACKGROUND: Eukaryotic genome assembly remains a challenge in part due to the prevalence of complex DNA repeats. This is a particularly acute problem for holocentric nematodes because of the large number of satellite DNA sequences found throughout their genomes. These have been recalcitrant to most genome sequencing methods. At the same time, many nematodes are parasites and some represent a serious threat to human health. There is a pressing need for better molecular characterization of animal and plant parasitic nematodes. The advent of long-read DNA sequencing methods offers the promise of resolving complex genomes.

RESULTS: Using Nippostrongylus brasiliensis as a test case, applying improved base-calling algorithms and assembly methods, we demonstrate the feasibility of de novo genome assembly matching current community standards using only MinION long reads. In doing so, we uncovered an unexpected diversity of very long and complex DNA sequences repeated throughout the N. brasiliensis genome, including massive tandem repeats of tRNA genes.

CONCLUSION: Base-calling and assembly methods have improved sufficiently that de novo genome assembly of large complex genomes is possible using only long reads. The method has the added advantage of preserving haplotypic variants and so has the potential to be used in population analyses.
Keywords:
Base-calling; DNA repeat; Genome assembly; Helminths; Next-generation sequencing; Population analysis