Lina Yuan1,2,3, Yang Yu4, Yanmin Zhu1, Yulai Li5, Changqing Li6, Rujiao Li1, Qin Ma7, Gilman Kit-Hang Siu8, Jun Yu1, Taijiao Jiang2,3, Jingfa Xiao9, Yu Kang10.
Abstract
BACKGROUND: Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, the highly fragmented assemblies that result from short NGS reads still limit insight into genome biology. Reference-assisted tools are promising for genome assembly, but tend to produce false assemblies when the assigned reference has extensive rearrangements.
Keywords: Core-gene-defined Genome Organizational Framework (cGOF); Prokaryotic genome; Rearrangement; Scaffolding
Year: 2017 PMID: 28198678 PMCID: PMC5310280 DOI: 10.1186/s12864-016-3267-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 The schematic framework of GAAP. seg, segment of cGOF; ref, reference; sc, scaffold/contig (assemblies). Head (filled circle) and tail (empty circle) vertices of the syntenic seg in each reference are sequentially connected with a dashed line indicating the seg permutation (order and orientation). The sc are indexed with seg and merged into ordered sc “strings”. The graph for local scaffolding of ordered sc is built by connecting seg-ordered sc and unordered sc wherever the number of paired-end (PE) links is higher than a certain cut-off. The line widths indicate the link count. The pseudo-genome of a draft-quality assembly is constructed by combining the indexed scaffolds and the closest relevant seg permutation of the references.
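The link-graph step in the caption can be illustrated with a minimal Python sketch. It is not GAAP's implementation: the function name `build_link_graph`, the input format (one scaffold pair per read pair whose mates map to different scaffolds), and the scaffold names are assumptions made for illustration. It only shows the idea of keeping an edge between two sc when their PE link count is higher than a cut-off, with the count retained as the edge weight (the line width in the figure).

```python
from collections import defaultdict


def build_link_graph(pe_links, cutoff):
    """Build an undirected scaffold graph from paired-end (PE) links.

    pe_links: iterable of (sc_a, sc_b) tuples, one per read pair whose two
              mates map to different scaffolds/contigs (hypothetical format).
    cutoff:   an edge is kept only if its link count is strictly higher.
    Returns a dict: scaffold -> {neighbour: link_count}.
    """
    counts = defaultdict(int)
    for a, b in pe_links:
        if a == b:
            continue  # pairs within one scaffold carry no scaffolding signal
        counts[tuple(sorted((a, b)))] += 1  # direction-independent key

    graph = defaultdict(dict)
    for (a, b), n in counts.items():
        if n > cutoff:
            graph[a][b] = n
            graph[b][a] = n
    return dict(graph)


# Illustrative input: seven links join sc1 and sc2, two join sc2 and sc3.
links = [("sc1", "sc2")] * 5 + [("sc2", "sc1")] * 2 + [("sc2", "sc3")] * 2
graph = build_link_graph(links, cutoff=3)
```

With a cut-off of 3, only the sc1/sc2 edge survives in this toy input; the weakly supported sc2/sc3 connection is dropped.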
Performance of reference-assisted assembly tools
| | | | |
|---|---|---|---|
| Ref complete genome, Mb | 1.85 | 1.67 | 3.08 |
| Original assemblies | |||
| Number (>300 bp) | 38 | 40 | 41 |
| N50, kb | 124 | 134 | 236 |
| Final scaffolds | |||
| GAAP | 1 | 1 | 1 |
| AlignGraph | 11 | 7 | 15 |
| Ragout | 1 | 1 | 1 |
| MeDuSa | 1 | 1 | 1 |
| Coverage, recovered % (falsely located %) | |||
| GAAP | 98.65 (0.15) | 98.57 (0.06) | 97.85 (0) |
| AlignGraph | 81.93 (8.78) | 61.75 (0) | 79.13 (0) |
| Ragout | 97.35 (17.7) | 98.28 (0.02) | 98.05 (0) |
| MeDuSa | 98.2 (19.7) | 98.95 (0.10) | 98.19 (11.15) |
| Errors | |||
| GAAP | 19 | 5 | 8 |
| AlignGraph | 3 | 2 | 1 |
| Ragout | 6 | 5 | 5 |
| MeDuSa | 9 | 11 | 14 |
| Corrected N50, kb | |||
| GAAP | 273 | 277 | 2,026 |
| AlignGraph | 124 | 138 | 351 |
| Ragout | 103,6 | 1,121 | 1,323 |
| MeDuSa | 252 | 207 | 245 |
Performance on species of single-segment cGOF
| | | |
|---|---|---|
| Reference genome, Mb | 2.82 | 2.81 |
| Original assemblies | ||
| Number (>300 bp) | 12 | 26 |
| N50, kb | 1,416 | 262 |
| Coverage, recovered % (falsely located %) | ||
| GAAP | 99.0 (0) | 98.75 (0) |
| MeDuSa | 99.06 (17.6) | 99.2 (3.01) |
| Ragout | 99.05 (0) | 98.62 (0) |
| Errors | ||
| GAAP | 9 | 3 |
| MeDuSa | 11 | 6 |
| Ragout | 10 | 4 |
| Corrected N50, kb | ||
| GAAP | 1,519 | 2,276 |
| MeDuSa | 499 | 637 |
| Ragout | 1,534 | 1,757 |
Performance on species with symmetrical cGOF
| | | | | | | |
|---|---|---|---|---|---|---|
| Reference genome, Mb | 1.86 | 2.18 | 2.12 | 2.26 | 2.12 | 2.14 |
| Original scaffolds | ||||||
| Number (>300 bp) | 27 | 32 | 41 | 74 | 86 | 67 |
| N50, kb | 123 | 170 | 166 | 73 | 45 | 71 |
| Coverage, recovered % (falsely located %) | ||||||
| GAAP | 97.17 (8.78) | 97.28 (1.53) | 94.53 (1.56) | 94.04 (0) | 98.16 (0.69) | 96.10 (0) |
| MeDuSa | 98.99 (9.13) | 98.70 (0.41) | 98.61 (0.04) | 91.37 (37.12) | 94.50 (37.79) | 95.10 (0.20) |
| Ragout | 97.09 (8.78) | 95.76 (38.08) | 93.34 (1.56) | 94.90 (3.05) | 94.36 (6.12) | 95.60 (38.41) |
| Errors | ||||||
| GAAP | 6 | 14 | 17 | 29 | 20 | 24 |
| MeDuSa | 9 | 7 | 10 | 26 | 16 | 13 |
| Ragout | 5 | 12 | 7 | 16 | 12 | 12 |
| Corrected N50, kb | ||||||
| GAAP | 1,292 | 478 | 258 | 284 | 304 | 244 |
| MeDuSa | 964 | 1,433 | 311 | 142 | 208 | 368 |
| Ragout | 1,330 | 414 | 1,212 | 338 | 654 | 398 |
Performance on species of asymmetrical cGOF
| | | | | |
|---|---|---|---|---|
| Reference genome, Mb | 4.71 | 4.71 | 1.61 | 1.58 |
| Original assemblies | ||||
| Number (>300 bp) | 105 | 85 | 29 | 39 |
| N50, kb | 105 | 176 | 165 | 142 |
| Coverage, recovered % (falsely located %) | ||||
| GAAP | 91.89 (0) | 95.29 (0) | 98.91 (0) | 89.30 (5.23) |
| MeDuSa | 97.25 (0.96) | 97.89 (34.82) | 99.29 (10.13) | 89.61 (16.01) |
| Ragout | 96.19 (0.05) | 95.45 (0) | 98.48 (4.76) | 89.08 (5.43) |
| Errors | ||||
| GAAP | 20 | 33 | 13 | 14 |
| MeDuSa | 29 | 13 | 9 | 7 |
| Ragout | 21 | 21 | 25 | 9 |
| Corrected N50, kb | ||||
| GAAP | 411 | 313 | 1,292 | 190 |
| MeDuSa | 268 | 308 | 964 | 348 |
| Ragout | 689 | 689 | 1,330 | 348 |
a Empirical PE reads data downloaded from NCBI SRA (SRR001665)
b Simulated PE reads data
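The tables report N50 and corrected N50. As a reading aid, the sketch below computes N50 in the standard way (the length of the sequence at which the cumulative length, taken from longest to shortest, first reaches half of the total). For corrected N50 it assumes the common convention of splitting each scaffold at its misassembly positions before recomputing N50; the source does not state the exact procedure used, and the function names, inputs, and numbers here are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (0 if empty)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:  # reached at least half of the total length
            return length
    return 0


def corrected_n50(scaffolds, breakpoints):
    """N50 after breaking scaffolds at misassembly positions (assumed convention).

    scaffolds:   dict of scaffold name -> length
    breakpoints: dict of scaffold name -> list of error positions
    """
    pieces = []
    for name, length in scaffolds.items():
        # Keep only distinct cut positions strictly inside the scaffold.
        cuts = sorted({p for p in breakpoints.get(name, []) if 0 < p < length})
        prev = 0
        for cut in cuts:
            pieces.append(cut - prev)
            prev = cut
        pieces.append(length - prev)
    return n50(pieces)


# Illustrative numbers only, not taken from the tables.
raw = n50([100, 80, 50, 30, 20, 20])
fixed = corrected_n50({"s1": 1000}, {"s1": [400, 700]})
```

Under this convention, a single scaffold spanning a whole genome can still have a modest corrected N50 if it contains several errors, since each error splits it into shorter pieces before N50 is computed.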