Assembling large genomes with single-molecule sequencing and locality-sensitive hashing.

Konstantin Berlin, Sergey Koren, Chen-Shan Chin, James P Drake, Jane M Landolin, Adam M Phillippy.

Abstract

Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.
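The MinHash idea named in the abstract can be illustrated with a minimal sketch. It is not the MHAP implementation: the hash function, the k-mer size `k`, the sketch size `num_hashes`, and all function names below are assumptions chosen for illustration. Each read is reduced to a fixed-length sketch holding the minimum hash value of its k-mers under several hash functions, and the fraction of sketch positions that agree between two reads estimates the Jaccard similarity of their k-mer sets, which serves as a cheap signal that the reads may overlap.

```python
import hashlib
import random


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def _hash(kmer, seed):
    """Deterministic 64-bit hash of a k-mer under a seed (illustrative choice)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "little")


def minhash_sketch(seq, k=8, num_hashes=256):
    """Sketch a sequence as the minimum k-mer hash under each seeded hash."""
    ks = kmers(seq, k)
    if not ks:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimate_jaccard(sketch_a, sketch_b):
    """Estimate Jaccard similarity as the fraction of matching sketch positions."""
    matches = sum(1 for x, y in zip(sketch_a, sketch_b) if x == y)
    return matches / len(sketch_a)


def exact_jaccard(seq_a, seq_b, k=8):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    ka, kb = kmers(seq_a, k), kmers(seq_b, k)
    return len(ka & kb) / len(ka | kb)


if __name__ == "__main__":
    rng = random.Random(0)
    genome = "".join(rng.choices("ACGT", k=600))
    read_a, read_b = genome[:400], genome[200:]  # two reads sharing 200 bases
    est = estimate_jaccard(minhash_sketch(read_a), minhash_sketch(read_b))
    print(f"estimated Jaccard: {est:.3f}")
    print(f"exact Jaccard:     {exact_jaccard(read_a, read_b):.3f}")
```

Comparing two sketches costs time proportional to the sketch size rather than to the read length, which is what makes this kind of filter attractive for all-against-all overlapping. The sketch above ignores sequencing errors and reverse complements, both of which a real overlapper has to handle.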

Year: 2015    PMID: 26006009    DOI: 10.1038/nbt.3238

Source DB: PubMed    Journal: Nat Biotechnol    ISSN: 1087-0156    Impact factor: 54.908

