Lu Zhang (1,2,3), Xin Zhou (3), Ziming Weng (2), Arend Sidow (2,4)
1. Department of Computer Science, Hong Kong Baptist University.
2. Department of Pathology, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA.
3. Department of Computer Science, Stanford University, Stanford, CA 94305 USA.
4. Department of Genetics, 300 Pasteur Dr, Stanford University, Stanford, CA 94305 USA.
Abstract
BACKGROUND: Producing cost-effective haplotype-resolved personal genomes remains challenging. 10x Linked-Read sequencing, with its high base quality and long-range information, has been demonstrated to facilitate de novo assembly of human genomes and variant detection. In this study, we investigate in depth how the parameter space of 10x library preparation and sequencing affects assembly quality, on the basis of both simulated and real libraries.
RESULTS: We prepared and sequenced eight 10x libraries with a diverse set of parameters from standard cell lines NA12878 and NA24385 and performed whole-genome assembly on the data. We also developed the simulator LRTK-SIM to follow the workflow of 10x data generation and produce realistic simulated Linked-Read data sets. We found that assembly quality could be improved by increasing the total sequencing coverage (C) and keeping physical coverage of DNA fragments (CF) or read coverage per fragment (CR) within broad ranges. The optimal physical coverage was between 332× and 823×, and assembly quality worsened if it increased to >1,000× for a given C. Long DNA fragments could significantly extend phase blocks but decreased contig contiguity. The optimal length-weighted fragment length ($W\mu_{FL}$) was ∼50–150 kb. When broadly optimal parameters were used for library preparation and sequencing, ∼80% of the genome was assembled in a diploid state.
CONCLUSIONS: The Linked-Read libraries we generated and the parameter space we identified provide theoretical considerations and practical guidelines for personal genome assemblies based on 10x Linked-Read sequencing.
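The coverage parameters named in the abstract can be illustrated with a small sketch. It assumes that total coverage factors as C = CF × CR, a relationship the abstract implies ("for a given C") but does not state outright. The function names are hypothetical, and the thresholds are simply the values reported above, not a general rule.

```python
# Minimal sketch, assuming C = CF * CR (an assumption; the abstract does not
# state this formula explicitly). Function names are illustrative only.

# Physical-coverage range reported as optimal in the abstract.
OPTIMAL_CF_RANGE = (332.0, 823.0)
# Physical coverage above which the abstract reports worse assemblies.
CF_UPPER_WARNING = 1000.0


def read_coverage_per_fragment(total_coverage: float, physical_coverage: float) -> float:
    """Return CR for a given C and CF under the assumed relation C = CF * CR."""
    if total_coverage <= 0 or physical_coverage <= 0:
        raise ValueError("coverages must be positive")
    return total_coverage / physical_coverage


def classify_physical_coverage(physical_coverage: float) -> str:
    """Compare CF against the ranges reported in the abstract."""
    low, high = OPTIMAL_CF_RANGE
    if low <= physical_coverage <= high:
        return "within reported optimum"
    if physical_coverage > CF_UPPER_WARNING:
        return "above 1,000x: assembly quality reported to worsen"
    return "outside reported optimum"


if __name__ == "__main__":
    # Hypothetical library: C = 56x total coverage, CF = 560x physical coverage.
    c_total, cf = 56.0, 560.0
    print(read_coverage_per_fragment(c_total, cf))
    print(classify_physical_coverage(cf))
```

Under this assumption, fixing C and raising CF necessarily lowers CR, which is one way to read the observation that very high physical coverage hurts assembly at a given sequencing depth.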