Volodymyr Kuleshov, Michael P Snyder, Serafim Batzoglou.
Abstract
MOTIVATION: Despite rapid progress in sequencing technology, assembling the genomes of new species de novo and reconstructing complex metagenomes remain major technological challenges. New synthetic long read (SLR) technologies promise significant advances towards these goals; however, their applicability is limited by high sequencing requirements and the inability of current assembly paradigms to cope with combinations of short and long reads.
Year: 2016 PMID: 27307620 PMCID: PMC4908351 DOI: 10.1093/bioinformatics/btw267
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. High-level overview of SLR and read cloud technologies. DNA (1) is sheared into kilobase-long fragments (2), which are then diluted and placed into multiple containers, typically with 0.1–2% of the genome per container (3). Within each container, fragments may be amplified before being cut into short fragments and barcoded (4). The barcoded fragments are finally pooled together and sequenced (5); reads can be demultiplexed computationally into their original compartments via the barcodes to form read clouds or SLRs.
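The demultiplexing step (5) can be illustrated with a minimal sketch. It assumes each read has already been parsed into a `(barcode, sequence)` pair; the function name `demultiplex`, the barcode labels, and the sequences are hypothetical and are not taken from the paper or from any sequencing toolkit.

```python
from collections import defaultdict


def demultiplex(reads):
    """Group (barcode, sequence) pairs into read clouds keyed by barcode."""
    clouds = defaultdict(list)
    for barcode, sequence in reads:
        clouds[barcode].append(sequence)
    return dict(clouds)


# Hypothetical toy input: barcodes and sequences are made up for illustration.
reads = [
    ("BC01", "ACGTAC"),
    ("BC02", "TTGACA"),
    ("BC01", "GTACCA"),
]
clouds = demultiplex(reads)
```

Each key of `clouds` then stands for one container, and its reads form one read cloud.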
Fig. 2. Scaffolding using read clouds. A genome contains a repeat R flanked by unique sequences (A, B) and (C, D) (top). With short reads, the correct assembly is ambiguous (middle). If two read clouds (marked as red and orange) map, respectively, to ARB and CRD, this provides signal that may be used to correctly resolve the repeat structure (bottom).
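The intuition in Fig. 2 can be sketched as a toy calculation: flanks that lie on the same molecule should share barcodes. The greedy pairing below is an illustrative assumption, not Architect's actual algorithm; `hits`, `shared_barcodes`, and `resolve_repeat` are hypothetical names, and the barcode labels mirror the colours in the figure.

```python
def shared_barcodes(hits, left, right):
    """Count barcodes whose reads map to both contigs."""
    return len(hits[left] & hits[right])


def resolve_repeat(hits, lefts, rights):
    """Pair each left flank with the right flank that shares the most barcodes."""
    return {
        left: max(rights, key=lambda right: shared_barcodes(hits, left, right))
        for left in lefts
    }


# Contig -> set of barcodes with reads mapping to it (toy data from Fig. 2).
hits = {
    "A": {"red"},
    "B": {"red"},
    "C": {"orange"},
    "D": {"orange"},
    "R": {"red", "orange"},  # the repeat is covered by both clouds
}
pairing = resolve_repeat(hits, lefts=["A", "C"], rights=["B", "D"])
```

Here `pairing` links A to B and C to D, which corresponds to the paths ARB and CRD. A real scaffolder would also need to handle ties, noise from barcodes shared by chance, and contig orientation.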
Assembly evaluation of Architect on four de novo assembly datasets.
| Genome + sequencing method | Scaffolds | Largest scaffold (kb) | Mb assembled | % assembled | N50 (kb) | NA50 (kb) | Misassemblies |
|---|---|---|---|---|---|---|---|
| Shotgun reads | 65 510 | 314.5 | 143.7 | 100.0 | 44.8 | 43.1 | 2265 |
| Long reads | 5064 | 341.5 | 127.5 | 88.7 | 45.3 | 43.2 | 1742 |
| Shotgun and long reads | 29 809 | 649.4 | 117.4 | 81.7 | 123.9 | 115.1 | 2024 |
| FragScaff† | 63 018 | 567.8 | 55.3 | 38.6 | 56.8 | 55.2 | 2289 |
| Shotgun and read clouds† | 57 567 | 1767.4 | 143.7 | 100.0 | 262.8 | 252.2 | 2341 |
| Shotgun reads | 32 092 | 383.1 | 100.1 | 99.9 | 35.6 | 31.9 | 307 |
| Long reads | 2345 | 555.0 | 96.3 | 96.4 | 81.2 | 76.0 | 363 |
| Shotgun and long reads | 2423 | 569.0 | 83.3 | 83.5 | 95.6 | 68.7 | 771 |
| FragScaff† | 29 320 | 510.2 | 40.3 | 40.4 | 51.1 | 50.2 | 321 |
| Shotgun and read clouds† | 4235 | 630.9 | 99.6 | 99.7 | 120.2 | 113.4 | 331 |
| Shotgun reads | 36 081 | 414.0 | 34.0 | 41.1 | 19.1 | 18.8 | 34 |
| Long reads | 914 | 405.1 | 17.6 | 21.2 | 24.6 | 24.2 | 29 |
| Shotgun and long reads | 22 562 | 553.3 | 42.5 | 51.2 | 35.1 | 34.3 | 113 |
| FragScaff† | 33 180 | 510.1 | 10.2 | 12.3 | 33.2 | 31.1 | 37 |
| Shotgun and read clouds† | 17 688 | 743.4 | 34.0 | 41.1 | 173.7 | 173.7 | 39 |
| Bona fide | | | | | | | |
| Shotgun reads | 128 131 | 34.1 | 230.1 | — | 5.3 | — | — |
| Long reads | 12 432 | 89.2 | 170.2 | — | 8.2 | — | — |
| Shotgun and long reads | 121 319 | 101.9 | 289.5 | — | 15.3 | — | — |
| FragScaff† | 127 943 | 40.2 | 100.3 | — | 6.2 | — | — |
| Shotgun and read clouds† | 123 975 | 91.4 | 288.1 | — | 13.3 | — | — |
Note: Metrics reported for FragScaff and Architect correspond to orderings of contigs rather than scaffolds (indicated by †).
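For readers unfamiliar with the N50 column, the following is a minimal sketch of the standard definition: the length L such that sequences of length at least L together cover at least half of the total assembled length. The function name `n50` and the example lengths are hypothetical; the paper's own evaluation tooling is not shown in this record.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical scaffold lengths in kb (total 300, so the halfway point is 150).
example = [80, 70, 50, 40, 30, 20, 10]
result = n50(example)
```

With these lengths the two longest sequences reach 150 kb, so `result` is 70. NA50 is commonly computed the same way after first breaking sequences at misassembly points, which is why it never exceeds N50 in the table.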
Effect of cloud sparsity on assembly quality
| Subsample | Number of reads (M) | N50 (kb) | NA50 (kb) | Size (Mb) | Max (kb) |
|---|---|---|---|---|---|
| 25% | 53.1 | 262.8 | 252.2 | 143.7 | 1767.4 |
| 15% | 31.9 | 261.4 | 250.8 | 143.7 | 1340.2 |
| 10% | 21.2 | 242.2 | 224.5 | 143.7 | 961.3 |
| 5% | 10.6 | 178.8 | 160.4 | 143.7 | 611.3 |
Note: Results are reported for orderings of Drosophila input scaffolds produced by Architect.
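One way to read the sparsity table is to express each N50 as a fraction of the value at the densest (25%) subsample. The sketch below does that arithmetic with the N50 values copied from the table; the helper name `retention` is hypothetical.

```python
# N50 (kb) per subsample, copied from the table above.
n50_kb = {"25%": 262.8, "15%": 261.4, "10%": 242.2, "5%": 178.8}


def retention(values, baseline="25%"):
    """Express each value as a percentage of the baseline subsample."""
    base = values[baseline]
    return {name: 100.0 * value / base for name, value in values.items()}


kept = retention(n50_kb)
```

On these numbers, dropping from the 25% to the 15% subsample leaves N50 nearly unchanged, while the 5% subsample keeps roughly two thirds of it.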