Literature DB >> 36261518

Semi-automated assembly of high-quality diploid human reference genomes.

Erich D Jarvis1,2, Giulio Formenti3, Arang Rhie4, Andrea Guarracino5, Chentao Yang6, Jonathan Wood7, Alan Tracey7, Francoise Thibaud-Nissen8, Mitchell R Vollger9, David Porubsky9, Haoyu Cheng10,11, Mobin Asri12, Glennis A Logsdon9, Paolo Carnevali13, Mark J P Chaisson14, Chen-Shan Chin15, Sarah Cody16, Joanna Collins7, Peter Ebert17, Merly Escalona18, Olivier Fedrigo19, Robert S Fulton16, Lucinda L Fulton16, Shilpa Garg20, Jennifer L Gerton21, Jay Ghurye22, Anastasiya Granat23, Richard E Green12, William Harvey9, Patrick Hasenfeld24, Alex Hastie25, Marina Haukness12, Erich B Jaeger23, Miten Jain12, Melanie Kirsche26, Mikhail Kolmogorov27, Jan O Korbel24, Sergey Koren4, Jonas Korlach28, Joyce Lee25, Daofeng Li29,30, Tina Lindsay16, Julian Lucas12, Feng Luo31, Tobias Marschall17, Matthew W Mitchell32, Jennifer McDaniel33, Fan Nie34, Hugh E Olsen12, Nathan D Olson33, Trevor Pesout12, Tamara Potapova21, Daniela Puiu35, Allison Regier36, Jue Ruan37, Steven L Salzberg35, Ashley D Sanders38, Michael C Schatz26, Anthony Schmitt39, Valerie A Schneider8, Siddarth Selvaraj39, Kishwar Shafin12, Alaina Shumate35, Nathan O Stitziel16,29,40, Catherine Stober24, James Torrance7, Justin Wagner33, Jianxin Wang34, Aaron Wenger28, Chuanle Xiao41, Aleksey V Zimin35, Guojie Zhang42, Ting Wang16,29,30, Heng Li10, Erik Garrison43, David Haussler44,45, Ira Hall46, Justin M Zook33, Evan E Eichler44,9, Adam M Phillippy4, Benedict Paten12, Kerstin Howe47, Karen H Miga48.   

Abstract

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society1,2. However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals3,4. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome5. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity6. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
© 2022. The Author(s).

Entities:  

Year:  2022        PMID: 36261518     DOI: 10.1038/s41586-022-05325-5

Source DB:  PubMed          Journal:  Nature        ISSN: 0028-0836            Impact factor:   69.504


  73 in total

1.  Initial sequencing and analysis of the human genome.

Authors:  E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal:  Nature       Date:  2001-02-15       Impact factor: 49.962

Review 2.  Computing genetic imprinting expressed by haplotypes.

Authors:  Yun Cheng; Arthur Berg; Song Wu; Yao Li; Rongling Wu
Journal:  Methods Mol Biol       Date:  2009

Review 3.  Long-read human genome sequencing and its applications.

Authors:  Glennis A Logsdon; Mitchell R Vollger; Evan E Eichler
Journal:  Nat Rev Genet       Date:  2020-06-05       Impact factor: 53.242

Review 4.  Pan-genomics in the human genome era.

Authors:  Rachel M Sherman; Steven L Salzberg
Journal:  Nat Rev Genet       Date:  2020-02-07       Impact factor: 53.242

Review 5.  The Human Pangenome Project: a global resource to map genomic diversity.

Authors:  Ting Wang; Lucinda Antonacci-Fulton; Kerstin Howe; Heather A Lawson; Julian K Lucas; Adam M Phillippy; Alice B Popejoy; Mobin Asri; Caryn Carson; Mark J P Chaisson; Xian Chang; Robert Cook-Deegan; Adam L Felsenfeld; Robert S Fulton; Erik P Garrison; Nanibaa' A Garrison; Tina A Graves-Lindsay; Hanlee Ji; Eimear E Kenny; Barbara A Koenig; Daofeng Li; Tobias Marschall; Joshua F McMichael; Adam M Novak; Deepak Purushotham; Valerie A Schneider; Baergen I Schultz; Michael W Smith; Heidi J Sofia; Tsachy Weissman; Paul Flicek; Heng Li; Karen H Miga; Benedict Paten; Erich D Jarvis; Ira M Hall; Evan E Eichler; David Haussler
Journal:  Nature       Date:  2022-04-20       Impact factor: 69.504

6.  Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Authors:  Peter Ebert; Peter A Audano; Qihui Zhu; Bernardo Rodriguez-Martin; Charles Lee; Jan O Korbel; Tobias Marschall; Evan E Eichler; David Porubsky; Marc Jan Bonder; Arvis Sulovari; Jana Ebler; Weichen Zhou; Rebecca Serra Mari; Feyza Yilmaz; Xuefang Zhao; PingHsun Hsieh; Joyce Lee; Sushant Kumar; Jiadong Lin; Tobias Rausch; Yu Chen; Jingwen Ren; Martin Santamarina; Wolfram Höps; Hufsah Ashraf; Nelson T Chuang; Xiaofei Yang; Katherine M Munson; Alexandra P Lewis; Susan Fairley; Luke J Tallon; Wayne E Clarke; Anna O Basile; Marta Byrska-Bishop; André Corvelo; Uday S Evani; Tsung-Yu Lu; Mark J P Chaisson; Junjie Chen; Chong Li; Harrison Brand; Aaron M Wenger; Maryam Ghareghani; William T Harvey; Benjamin Raeder; Patrick Hasenfeld; Allison A Regier; Haley J Abel; Ira M Hall; Paul Flicek; Oliver Stegle; Mark B Gerstein; Jose M C Tubio; Zepeng Mu; Yang I Li; Xinghua Shi; Alex R Hastie; Kai Ye; Zechen Chong; Ashley D Sanders; Michael C Zody; Michael E Talkowski; Ryan E Mills; Scott E Devine
Journal:  Science       Date:  2021-02-25       Impact factor: 47.728

7.  The complete sequence of a human genome.

Authors:  Sergey Nurk; Sergey Koren; Arang Rhie; Mikko Rautiainen; Andrey V Bzikadze; Alla Mikheenko; Mitchell R Vollger; Nicolas Altemose; Lev Uralsky; Ariel Gershman; Sergey Aganezov; Savannah J Hoyt; Mark Diekhans; Glennis A Logsdon; Michael Alonge; Stylianos E Antonarakis; Matthew Borchers; Gerard G Bouffard; Shelise Y Brooks; Gina V Caldas; Nae-Chyun Chen; Haoyu Cheng; Chen-Shan Chin; William Chow; Leonardo G de Lima; Philip C Dishuck; Richard Durbin; Tatiana Dvorkina; Ian T Fiddes; Giulio Formenti; Robert S Fulton; Arkarachai Fungtammasan; Erik Garrison; Patrick G S Grady; Tina A Graves-Lindsay; Ira M Hall; Nancy F Hansen; Gabrielle A Hartley; Marina Haukness; Kerstin Howe; Michael W Hunkapiller; Chirag Jain; Miten Jain; Erich D Jarvis; Peter Kerpedjiev; Melanie Kirsche; Mikhail Kolmogorov; Jonas Korlach; Milinn Kremitzki; Heng Li; Valerie V Maduro; Tobias Marschall; Ann M McCartney; Jennifer McDaniel; Danny E Miller; James C Mullikin; Eugene W Myers; Nathan D Olson; Benedict Paten; Paul Peluso; Pavel A Pevzner; David Porubsky; Tamara Potapova; Evgeny I Rogaev; Jeffrey A Rosenfeld; Steven L Salzberg; Valerie A Schneider; Fritz J Sedlazeck; Kishwar Shafin; Colin J Shew; Alaina Shumate; Ying Sims; Arian F A Smit; Daniela C Soto; Ivan Sović; Jessica M Storer; Aaron Streets; Beth A Sullivan; Françoise Thibaud-Nissen; James Torrance; Justin Wagner; Brian P Walenz; Aaron Wenger; Jonathan M D Wood; Chunlin Xiao; Stephanie M Yan; Alice C Young; Samantha Zarate; Urvashi Surti; Rajiv C McCoy; Megan Y Dennis; Ivan A Alexandrov; Jennifer L Gerton; Rachel J O'Neill; Winston Timp; Justin M Zook; Michael C Schatz; Evan E Eichler; Karen H Miga; Adam M Phillippy
Journal:  Science       Date:  2022-03-31       Impact factor: 63.714

8.  Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly.

Authors:  Valerie A Schneider; Tina Graves-Lindsay; Kerstin Howe; Nathan Bouk; Hsiu-Chuan Chen; Paul A Kitts; Terence D Murphy; Kim D Pruitt; Françoise Thibaud-Nissen; Derek Albracht; Robert S Fulton; Milinn Kremitzki; Vincent Magrini; Chris Markovic; Sean McGrath; Karyn Meltz Steinberg; Kate Auger; William Chow; Joanna Collins; Glenn Harden; Timothy Hubbard; Sarah Pelan; Jared T Simpson; Glen Threadgold; James Torrance; Jonathan M Wood; Laura Clarke; Sergey Koren; Matthew Boitano; Paul Peluso; Heng Li; Chen-Shan Chin; Adam M Phillippy; Richard Durbin; Richard K Wilson; Paul Flicek; Evan E Eichler; Deanna M Church
Journal:  Genome Res       Date:  2017-04-10       Impact factor: 9.043

9.  Towards complete and error-free genome assemblies of all vertebrate species.

Authors:  Arang Rhie; Shane A McCarthy; Olivier Fedrigo; Joana Damas; Giulio Formenti; Sergey Koren; Marcela Uliano-Silva; William Chow; Arkarachai Fungtammasan; Juwan Kim; Chul Lee; Byung June Ko; Mark Chaisson; Gregory L Gedman; Lindsey J Cantin; Francoise Thibaud-Nissen; Leanne Haggerty; Iliana Bista; Michelle Smith; Bettina Haase; Jacquelyn Mountcastle; Sylke Winkler; Sadye Paez; Jason Howard; Sonja C Vernes; Tanya M Lama; Frank Grutzner; Wesley C Warren; Christopher N Balakrishnan; Dave Burt; Julia M George; Matthew T Biegler; David Iorns; Andrew Digby; Daryl Eason; Bruce Robertson; Taylor Edwards; Mark Wilkinson; George Turner; Axel Meyer; Andreas F Kautt; Paolo Franchini; H William Detrich; Hannes Svardal; Maximilian Wagner; Gavin J P Naylor; Martin Pippel; Milan Malinsky; Mark Mooney; Maria Simbirsky; Brett T Hannigan; Trevor Pesout; Marlys Houck; Ann Misuraca; Sarah B Kingan; Richard Hall; Zev Kronenberg; Ivan Sović; Christopher Dunn; Zemin Ning; Alex Hastie; Joyce Lee; Siddarth Selvaraj; Richard E Green; Nicholas H Putnam; Ivo Gut; Jay Ghurye; Erik Garrison; Ying Sims; Joanna Collins; Sarah Pelan; James Torrance; Alan Tracey; Jonathan Wood; Robel E Dagnew; Dengfeng Guan; Sarah E London; David F Clayton; Claudio V Mello; Samantha R Friedrich; Peter V Lovell; Ekaterina Osipova; Farooq O Al-Ajli; Simona Secomandi; Heebal Kim; Constantina Theofanopoulou; Michael Hiller; Yang Zhou; Robert S Harris; Kateryna D Makova; Paul Medvedev; Jinna Hoffman; Patrick Masterson; Karen Clark; Fergal Martin; Kevin Howe; Paul Flicek; Brian P Walenz; Woori Kwak; Hiram Clawson; Mark Diekhans; Luis Nassar; Benedict Paten; Robert H S Kraus; Andrew J Crawford; M Thomas P Gilbert; Guojie Zhang; Byrappa Venkatesh; Robert W Murphy; Klaus-Peter Koepfli; Beth Shapiro; Warren E Johnson; Federica Di Palma; Tomas Marques-Bonet; Emma C Teeling; Tandy Warnow; Jennifer Marshall Graves; Oliver A Ryder; David Haussler; Stephen J O'Brien; Jonas Korlach; Harris A Lewin; Kerstin Howe; Eugene W Myers; Richard Durbin; Adam M Phillippy; Erich D Jarvis
Journal:  Nature       Date:  2021-04-28       Impact factor: 49.962

Review 10.  Long walk to genomics: History and current approaches to genome sequencing and assembly.

Authors:  Alice Maria Giani; Guido Roberto Gallo; Luca Gianfranceschi; Giulio Formenti
Journal:  Comput Struct Biotechnol J       Date:  2019-11-17       Impact factor: 7.271

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.