HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads.

Sergey Nurk, Brian P Walenz, Arang Rhie, Mitchell R Vollger, Glennis A Logsdon, Robert Grothe, Karen H Miga, Evan E Eichler, Adam M Phillippy, Sergey Koren.

Abstract

Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
© 2020 Nurk et al.; Published by Cold Spring Harbor Laboratory Press.
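
The abstract names homopolymer compression and reports results as NG50 and a Phred-scaled consensus quality (QV). As a minimal illustrative sketch of those three ideas, and not HiCanu's actual implementation, the Python below uses hypothetical function names (`homopolymer_compress`, `phred_qv`, `ng50`) chosen for this example. It assumes NG50 is computed against a caller-supplied genome size and that QV is derived from a per-base accuracy given as a fraction.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse every run of identical bases to a single base.

    Illustrative only: e.g. "AAACCGTTTT" becomes "ACGT".
    """
    return "".join(base for base, _run in groupby(seq))


def phred_qv(accuracy: float) -> float:
    """Convert per-base accuracy (a fraction below 1) to a Phred-scaled QV.

    QV = -10 * log10(error rate), where error rate = 1 - accuracy.
    """
    return -10.0 * math.log10(1.0 - accuracy)


def ng50(contig_lengths, genome_size):
    """Return the contig length at which the cumulative length of contigs,
    taken from longest to shortest, first reaches half of genome_size.

    Returns 0 when the contigs together cover less than half of the
    genome size (an assumption of this sketch for the undefined case).
    """
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if 2 * total >= genome_size:
            return length
    return 0
```

Under these definitions, a per-base accuracy of 99.999% corresponds to an error rate of 1 in 100,000, which is the QV50 figure quoted in the abstract.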

Year: 2020    PMID: 32801147    PMCID: PMC7545148    DOI: 10.1101/gr.263566.120

Source DB: PubMed    Journal: Genome Res    ISSN: 1088-9051    Impact factor: 9.043


