David Porubsky, Peter Ebert, Peter A Audano, Mitchell R Vollger, William T Harvey, Pierre Marijon, Jana Ebler, Katherine M Munson, Melanie Sorensen, Arvis Sulovari, Marina Haukness, Maryam Ghareghani, Peter M Lansdorp, Benedict Paten, Scott E Devine, Ashley D Sanders, Charles Lee, Mark J P Chaisson, Jan O Korbel, Evan E Eichler, Tobias Marschall.
Abstract
Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing [1,2] with continuous long-read or high-fidelity [3] sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.
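The abstract reports three assembly metrics: quality value (QV), contig N50 and switch error rate. The sketch below shows how these are conventionally defined. It is an illustration only, not the authors' evaluation pipeline. The function names and toy inputs are hypothetical, and the switch error calculation assumes both haplotypes are given as 0/1 alleles at the same ordered heterozygous sites.

```python
import math


def contig_n50(lengths):
    """Largest length L such that contigs of length >= L hold at least half of all bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def phred_qv(errors, bases):
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    if errors == 0:
        return math.inf
    return -10 * math.log10(errors / bases)


def switch_error_rate(truth, called):
    """Fraction of adjacent heterozygous site pairs whose phase flips relative to truth.

    Assumption: `truth` and `called` list the allele (0 or 1) carried by one
    haplotype at the same ordered heterozygous sites.
    """
    if len(truth) != len(called) or len(truth) < 2:
        raise ValueError("need two equally long haplotypes with at least two sites")
    agree = [t == c for t, c in zip(truth, called)]
    # A switch occurs wherever agreement with the truth changes between neighbours.
    switches = sum(1 for a, b in zip(agree, agree[1:]) if a != b)
    return switches / (len(truth) - 1)
```

Under these definitions, a QV above 40 corresponds to fewer than one error per 10,000 bases, and a switch error rate of 0.17% corresponds to roughly one phase flip per 590 adjacent heterozygous site pairs.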
Year: 2020 PMID: 33288906 PMCID: PMC7954704 DOI: 10.1038/s41587-020-0719-5
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908