Erich D Jarvis, Giulio Formenti, Arang Rhie, Andrea Guarracino, Chentao Yang, Jonathan Wood, Alan Tracey, Francoise Thibaud-Nissen, Mitchell R Vollger, David Porubsky, Haoyu Cheng, Mobin Asri, Glennis A Logsdon, Paolo Carnevali, Mark J P Chaisson, Chen-Shan Chin, Sarah Cody, Joanna Collins, Peter Ebert, Merly Escalona, Olivier Fedrigo, Robert S Fulton, Lucinda L Fulton, Shilpa Garg, Jennifer L Gerton, Jay Ghurye, Anastasiya Granat, Richard E Green, William Harvey, Patrick Hasenfeld, Alex Hastie, Marina Haukness, Erich B Jaeger, Miten Jain, Melanie Kirsche, Mikhail Kolmogorov, Jan O Korbel, Sergey Koren, Jonas Korlach, Joyce Lee, Daofeng Li, Tina Lindsay, Julian Lucas, Feng Luo, Tobias Marschall, Matthew W Mitchell, Jennifer McDaniel, Fan Nie, Hugh E Olsen, Nathan D Olson, Trevor Pesout, Tamara Potapova, Daniela Puiu, Allison Regier, Jue Ruan, Steven L Salzberg, Ashley D Sanders, Michael C Schatz, Anthony Schmitt, Valerie A Schneider, Siddarth Selvaraj, Kishwar Shafin, Alaina Shumate, Nathan O Stitziel, Catherine Stober, James Torrance, Justin Wagner, Jianxin Wang, Aaron Wenger, Chuanle Xiao, Aleksey V Zimin, Guojie Zhang, Ting Wang, Heng Li, Erik Garrison, David Haussler, Ira Hall, Justin M Zook, Evan E Eichler, Adam M Phillippy, Benedict Paten, Kerstin Howe, Karen H Miga.
Abstract
The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society [1,2]. However, it still has many gaps and errors, and does not represent a biological genome, as it is a blend of multiple individuals [3,4]. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome [5]. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity [6]. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yields the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Year: 2022 PMID: 36261518 DOI: 10.1038/s41586-022-05325-5
Source DB: PubMed Journal: Nature ISSN: 0028-0836 Impact factor: 69.504