Yulia Mostovoy1, Michal Levy-Sakin1, Jessica Lam1, Ernest T Lam2, Alex R Hastie2, Patrick Marks3, Joyce Lee2, Catherine Chu1, Chin Lin1, Željko Džakula2, Han Cao2, Stephen A Schlebusch4, Kristina Giorda3, Michael Schnall-Levin3, Jeffrey D Wall5, Pui-Yan Kwok1,5,6.
Abstract
Despite tremendous progress in genome sequencing, the basic goal of producing a phased (haplotype-resolved) genome sequence with end-to-end contiguity for each chromosome at reasonable cost and effort is still unrealized. In this study, we describe an approach to performing de novo genome assembly and experimental phasing by integrating the data from Illumina short-read sequencing, 10X Genomics linked-read sequencing, and BioNano Genomics genome mapping to yield a high-quality, phased, de novo assembled human genome.
Year: 2016 PMID: 27159086 PMCID: PMC4927370 DOI: 10.1038/nmeth.3865
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Figure 1. Flowchart depicting the genome sequence assembly strategy.
Summary of assembly statistics for human sample NA12878
The different rows correspond to the results from the initial de novo short-read-based assembly, the 10XG-scaffolded assembly, the BNG map assembly, and the final hybrid assembly, respectively. Statistics for the Illumina assembly were calculated after filtering for scaffolds that were at least 3 kb in length, since those served as input to the next step of the assembly.
| Assembly | Total map length (Gb) | Number of Scaffolds | Scaffold N50 (Mb) | Longest Scaffold (Mb) |
|---|---|---|---|---|
| Illumina short-read (scaffolds ≥ 3 kb) | 2.79 | 14,047 | 0.59 | 5.57 |
| 10XG-scaffolded | 2.81 | 5,697 | 7.03 | 37.9 |
| BNG map assembly | 2.93 | 1,079 | 4.59 | 26.6 |
| Final hybrid | 2.86 | 170 | 33.5 | 99.96 |
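The per-assembly statistics above (total length, number of scaffolds, scaffold N50, longest scaffold) follow standard definitions: N50 is the length L such that scaffolds of length L or longer together contain at least half of the total assembled sequence. The minimal Python sketch below shows how such a row could be computed, assuming scaffold lengths are available as integers in base pairs. The function names are hypothetical and the toy lengths are invented; this is not the authors' pipeline. The 3-kb cutoff mirrors the filter described for the Illumina row.

```python
from typing import Dict, Iterable, List


def filter_scaffolds(lengths: Iterable[int], min_len: int = 3_000) -> List[int]:
    """Keep scaffolds at least min_len bp long (3 kb, as for the Illumina row)."""
    return [n for n in lengths if n >= min_len]


def n50(lengths: Iterable[int]) -> int:
    """Return the length L such that scaffolds of length >= L cover >= 50% of the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffolds to summarize")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def summarize(lengths: Iterable[int]) -> Dict[str, int]:
    """Compute one table row (lengths in bp) after applying the length filter."""
    kept = filter_scaffolds(lengths)
    return {
        "total_length": sum(kept),
        "num_scaffolds": len(kept),
        "n50": n50(kept),
        "longest": max(kept),
    }


if __name__ == "__main__":
    # Toy scaffold lengths in bp (invented for illustration, not NA12878 data).
    toy = [1_000, 2_500, 4_000, 6_000, 10_000, 15_000]
    print(summarize(toy))
```

For the toy input, the two scaffolds shorter than 3 kb are dropped, leaving 35 kb in four scaffolds; the 15-kb scaffold alone covers less than half, so the N50 falls on the 10-kb scaffold.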
Figure 2. Schematic from the UCSC Genome Browser showing the relative sizes of the scaffolds produced at each step of the assembly process, as well as haplotype blocks, for the hybrid scaffold (64 Mb) aligned to the q arm of reference chromosome X.
(a) Assembly based on short-read Illumina data, filtered for scaffolds longer than 3 kb; (b) the short-read assembly scaffolded together using barcode information from 10XG data; (c) assembled BNG genome maps; (d) hybrid scaffold produced by merging b and c; (e) barcode-based haplotype blocks for this region; (f) dot plot of the region against reference genome hg38.
Figure 3. Alignment and phasing of the hybrid assembly.
(a) Ideograms of the hybrid scaffold assembly aligned to the reference genome hg38, with each colored block representing an assembled scaffold. (b) A 23-Mb phase block (super scaffold 259, aligned to chromosome 3 at 50-73 Mb) shown at increasing resolution, with the alleles on the two haplotypes (green vertical line: assembly allele; grey vertical line: alternate allele). Where a green or grey vertical line is not matched by a corresponding mark on the other haplotype, the allele is indeterminate on that haplotype.
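The reading rule in the Figure 3b caption can be stated as a small classification: at each variant site, each haplotype either carries a mark (assembly allele or alternate allele) or does not, and a haplotype without a mark is indeterminate at that site. The Python sketch below encodes that rule, assuming each site is represented as a pair of optional alleles. The `Allele` enum, `site_status`, and `summarize_block` are hypothetical names introduced for illustration, and the toy block is invented rather than taken from the paper's data.

```python
from collections import Counter
from enum import Enum
from typing import Iterable, Optional, Tuple


class Allele(Enum):
    ASSEMBLY = "assembly"    # drawn as a green vertical line
    ALTERNATE = "alternate"  # drawn as a grey vertical line


# A variant site in a phase block: (allele on haplotype 1, allele on haplotype 2).
# None means no mark was drawn on that haplotype (hypothetical representation).
Site = Tuple[Optional[Allele], Optional[Allele]]


def site_status(site: Site) -> str:
    """Label a site following the caption's rule for unmatched marks."""
    hap1, hap2 = site
    if hap1 is not None and hap2 is not None:
        return "determined on both"
    if hap1 is None and hap2 is None:
        return "no call"
    # Exactly one haplotype carries a mark, so the other is indeterminate.
    return "indeterminate on hap2" if hap2 is None else "indeterminate on hap1"


def summarize_block(sites: Iterable[Site]) -> Counter:
    """Count site statuses across one phase block."""
    return Counter(site_status(s) for s in sites)


if __name__ == "__main__":
    A, G = Allele.ASSEMBLY, Allele.ALTERNATE
    # Toy phase block (invented, not data from the paper).
    block = [(A, G), (G, A), (A, None), (None, G)]
    print(summarize_block(block))
```

The "no call" branch is included only so the function is total; a site with no mark on either haplotype would not appear in the figure.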
Comparison with other NA12878 assemblies.
| | This study | Ref | ALLPATHS-LG[ |
|---|---|---|---|
| Data types | Illumina paired-end and mate-pair reads; 10XG reads; BNG genome maps | PacBio reads; BNG genome maps | Illumina paired-end, mate-pair, and fosmid-based short reads |
| Scaffold N50 (Mb) | 33.5 | 31.1 | 11.5 |
| Number of scaffolds | 170 | 202 | 23,634 |
| Total length (Gb) | 2.86 | 2.76 | 2.78 |
| | 95.2 | 97.5 | 93.5 |
| | 10.2 | 4.61 | 5.90 |
| | 4.7 Mb | 145 kb | N/A |
| | 2,783,119 | 2,421,740 | N/A |