Karen M Moll, Peng Zhou, Thiruvarangan Ramaraj, Diego Fajardo, Nicholas P Devitt, Michael J Sadowsky, Robert M Stupar, Peter Tiffin, Jason R Miller, Nevin D Young, Kevin A T Silverstein, Joann Mudge.
Abstract
BACKGROUND: Third-generation sequencing technologies, with sequencing reads in the tens of kilobases, facilitate genome assembly by spanning ambiguous regions and improving continuity. This has been critical for plant genomes, which are difficult to assemble due to high repeat content, gene family expansions, segmental and tandem duplications, and polyploidy. Recently, high-throughput mapping and scaffolding strategies have further improved continuity. Together, these long-range technologies enable quality draft assemblies of complex genomes in a cost-effective and timely manner.
Keywords: BioNano; Dovetail; Genome assembly; Medicago truncatula; Next generation sequencing; PacBio
Year: 2017 PMID: 28778149 PMCID: PMC5545040 DOI: 10.1186/s12864-017-3971-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Number and characteristics of contigs and scaffolds for each of the five assemblies
| | PacBio (Pb) | PacBio BioNano (PbBn) | PacBio Dovetail (PbDt) | PacBio BioNano Dovetail (PbBnDt) | PacBio Dovetail BioNano (PbDtBn) |
|---|---|---|---|---|---|
| Assembly software | FALCON | FALCON Irys | FALCON HiRise | FALCON Irys HiRise | FALCON HiRise Irys |
| Contigs | 1,073 | 1,073 | 1,121 | 1,125 | 1,121 |
| Contig Length | 396,973,838 | 396,973,942 | 396,973,838 | 396,973,942 | 396,973,934 |
| Contig N50 (a) | 3,768,504 | 3,768,512 | 3,768,504 | 3,768,512 | 3,768,504 |
| Scaffolds | 1,073 | 993 | 1,005 | 965 | 942 |
| Scaffold Length | 396,973,838 | 401,421,527 | 396,985,438 | 401,429,527 | 399,955,467 |
| Maximum Scaffold Length | 13,488,151 | 22,885,216 | 19,275,758 | 12,137,306 | 12,557,854 |
| Scaffold N50 (a) | 3,768,504 | 6,819,834 | 6,895,511 | 12,137,306 | 12,557,854 |
(a) N50s were also adjusted to use an assembly length of 400 Mb for all assemblies in order to facilitate comparisons across assemblies. Scaffold and contig N50s adjusted for a 400 Mb assembly size were identical to the unadjusted N50s shown above, except for the PbDt scaffold N50, for which the adjusted N50 was 6,348,449 nt.
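The size-adjusted N50 described in the footnote can be illustrated with a minimal sketch. It assumes the usual definition of N50 (the length of the sequence at which the running total, taken from longest to shortest, first reaches half of the reference size) and is not the authors' actual script. The function name `n50`, its `target_size` parameter, and the toy lengths are hypothetical.

```python
from typing import Iterable, Optional


def n50(lengths: Iterable[int], target_size: Optional[int] = None) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    If target_size is given, half of that fixed size is used as the
    threshold instead of half of the summed lengths (the "adjusted" N50).
    Returns 0 if the sequences never reach the threshold.
    """
    ordered = sorted(lengths, reverse=True)
    reference = target_size if target_size is not None else sum(ordered)
    threshold = reference / 2
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return 0


# Toy lengths for illustration only (not data from the paper).
toy = [50, 40, 30, 20, 10]
print(n50(toy))                   # threshold is 75 (half of 150)
print(n50(toy, target_size=200))  # threshold is 100 (half of 200)
```

Because 400 Mb is larger than the 396,985,438 nt PbDt scaffold total, the adjusted threshold sits further down the sorted list, which is consistent with an adjusted N50 that is equal to or smaller than the unadjusted value.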
Characteristics of input scaffolds that were joined by BioNano and/or Dovetail
| Assembly | Pb -> PbDt | Pb -> PbBn | PbDt -> PbDtBn | PbBn -> PbBnDt |
|---|---|---|---|---|
| Scaffolds | 172 | 140 | 96 | 114 |
| Max Scaffold | 13,488,151 | 13,488,151 | 19,275,758 | 22,885,216 |
| Scaffold N50 | 3,957,684 | 3,698,567 | 6,895,511 | 6,819,834 |
| Scaffold N90 | 854,372 | 929,179 | 1,425,957 | 1,427,073 |
| Min Scaffold | 4,765 | 172,295 | 98,093 | 4,765 |
| Total Scaffold Length | 307,402,024 | 293,002,927 | 260,974,793 | 289,680,947 |
Characteristics of the gaps introduced into the assemblies by BioNano and Dovetail. Note that there are no gaps in the Pb-only base assembly, so it is not included.
| | PbBn | PbDt | PbBnDt | PbDtBn |
|---|---|---|---|---|
| Captured Gaps | 80 | 116 | 160 | 179 |
| Max Gap | 647,836 | 100 | 647,836 | 647,022 |
| Min Gap | 500 | 100 | 100 | 100 |
| Mean Gap | 55,595 | 100 | 27,847 | 16,657 |
| Gap N50 | 171,515 | 100 | 171,515 | 105,896 |
| Total Gap Length | 4,447,585 | 11,600 | 4,455,585 | 2,981,533 |
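The gap table and the assembly table can be cross-checked with simple arithmetic: the mean gap should equal the total gap length divided by the number of gaps, and the scaffold length should equal the contig length plus the total gap length. The sketch below uses only values copied from the two tables. The helper names `mean_gap` and `gap_bases` are hypothetical, and rounding the mean to the nearest integer is an assumption.

```python
# assembly -> (captured gaps, total gap length, reported mean gap)
GAPS = {
    "PbBn": (80, 4_447_585, 55_595),
    "PbDt": (116, 11_600, 100),
    "PbBnDt": (160, 4_455_585, 27_847),
    "PbDtBn": (179, 2_981_533, 16_657),
}

# assembly -> (contig length, scaffold length)
LENGTHS = {
    "PbBn": (396_973_942, 401_421_527),
    "PbDt": (396_973_838, 396_985_438),
    "PbBnDt": (396_973_942, 401_429_527),
    "PbDtBn": (396_973_934, 399_955_467),
}


def mean_gap(count: int, total: int) -> int:
    """Mean gap size, rounded to the nearest nucleotide (assumed rounding)."""
    return round(total / count)


def gap_bases(contig_length: int, scaffold_length: int) -> int:
    """Bases in a scaffolded assembly that are not contig sequence."""
    return scaffold_length - contig_length


for name, (count, total, reported_mean) in GAPS.items():
    contig_length, scaffold_length = LENGTHS[name]
    mean_ok = mean_gap(count, total) == reported_mean
    total_ok = gap_bases(contig_length, scaffold_length) == total
    print(f"{name}: mean matches={mean_ok}, total matches={total_ok}")
```

For the four scaffolded assemblies, both checks agree with the reported figures.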
Assembly Statistics for R108 version 1.0 (PbDtBn PBJelly gap filled) and its input assembly (PbDtBn)
| | R108 v 1.0 | PbDtBn |
|---|---|---|
| Contigs | 1,016 | 1,121 |
| Contig Length | 399,348,944 | 396,973,934 |
| Contig N50 | 5,925,378 | 3,768,504 |
| Scaffolds | 909 | 942 |
| Scaffold Length | 402,065,285 | 399,955,467 |
| Scaffold N50 | 12,848,239 | 12,557,854 |
R108 v 1.0 assembly characteristics in comparison to the A17 reference assembly
| | Nucleotides | % Nucleotides |
|---|---|---|
| Total Bases | 399,348,955 | 100.00% |
| Repetitive | 96,760,262 | 24.23% |
| Alignable to A17 | 366,489,898 | 91.77% |
| Bases in Synteny with A17 | 283,853,354 | 71.08% |
| Novel Sequences vs A17 | 22,763,508 | 5.70% |
| Novel Coding Sequences vs A17 | 1,623,097 | 0.41% |
Fig. 1 Synteny alignment of partial chromosomes 4 and 8 between A17 and R108 confirms rearrangement of the long arms of the chromosomes
Fig. 2 Synteny alignment of partial A17 chromosomes 4 and 8 against syntenic regions in the R108 Illumina-based assembly (top panel), the PacBio-based assembly (Pb, middle panel), and the gap-filled PbDtBn (v1.0) assembly (bottom panel)
Fig. 3 Schematic of the rearrangement between chromosomes 4 and 8 in A17 (left) compared to R108 (right). Green segments indicate homology to A17 chromosome 4, while blue segments indicate homology to A17 chromosome 8. Red segments indicate sequences not present in the A17 reference. Breakpoint 1 (br1) is pinpointed to a 104 bp region (chr4:39,021,788-39,021,891) and includes a 100 bp gap. Breakpoint 2 (br2) is pinpointed to a 7,665 bp region (chr8:33,996,308-34,003,972) and includes a 7,663 bp gap. Breakpoint 3 (br3) is pinpointed to a 708 bp region (chr8:34,107,285-34,107,992) and includes a 100 bp gap. Breakpoint 4 (br4) is pinpointed to a 277 bp region (chr8:34,275,249-34,275,525) and includes a 100 bp gap.
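The breakpoint region sizes in the Fig. 3 caption follow from the listed coordinates if the intervals are read as 1-based and inclusive, which is an assumption here rather than something the caption states. A minimal sketch of that arithmetic follows. The names `region_length` and `BREAKPOINTS` are hypothetical, and the label br4 for the fourth breakpoint is inferred from the pattern of the first three.

```python
def region_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


# breakpoint -> (chromosome, start, end, gap size in bp), from the Fig. 3 caption
BREAKPOINTS = {
    "br1": ("chr4", 39_021_788, 39_021_891, 100),
    "br2": ("chr8", 33_996_308, 34_003_972, 7_663),
    "br3": ("chr8", 34_107_285, 34_107_992, 100),
    "br4": ("chr8", 34_275_249, 34_275_525, 100),
}

for name, (chrom, start, end, gap) in BREAKPOINTS.items():
    size = region_length(start, end)
    # Bases in the breakpoint region that are not part of the gap
    resolved = size - gap
    print(f"{name} {chrom}: region={size} bp, gap={gap} bp, non-gap={resolved} bp")
```

Under this reading, the computed sizes are 104, 7,665, 708, and 277 bp, matching the caption. For br1 and br2 the region is almost entirely gap, with only 4 bp and 2 bp of non-gap sequence respectively.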