Basil Britto Xavier, Julia Sabirova, Pieter Moons, Jean-Pierre Hernalsteens, Henri de Greve, Herman Goossens, Surbhi Malhotra-Kumar.
Abstract
BACKGROUND: De novo genome assembly can be challenging due to inherent properties of the reads, even when using current state-of-the-art assembly tools based on de Bruijn graphs. Users are often not bioinformaticians and, in a black-box approach, rely on assembly statistics such as contig length and N50 to generate whole genome sequences, potentially resulting in mis-assemblies.
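N50, one of the statistics mentioned above, is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. The minimal Python sketch below computes it; the function name `n50` and the contig lengths are hypothetical illustrations, not data from this study. The second call shows why N50 alone says nothing about correctness: in this example, joining two contigs, whether rightly or wrongly, raises it.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    if not lengths:
        raise ValueError("at least one contig length is required")
    half = sum(lengths) / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= half:  # first contig that brings coverage to 50%
            return length
    raise AssertionError("unreachable: covered ends at the total length")


# Hypothetical contig lengths (kb), not taken from the study.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))  # 70

# Same bases, but the 50 and 40 contigs joined into one (possibly a mis-join).
joined = [90, 80, 70, 30, 20]
print(n50(joined))  # 80
```

Both lists contain 290 kb in total, so only the join changes the result.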
Year: 2014 PMID: 25077983 PMCID: PMC4118782 DOI: 10.1186/1756-0500-7-484
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Assembly statistics of Velvet applied to methicillin-resistant Staphylococcus aureus (MRSA) strain E-MRSA15-CC22-SCC IV, showing an increase in contig size and N50 at higher k-mer sizes, but revealing by whole genome mapping a mis-assembly starting from k-mer size 97
| k-mer size | N50 (bp) | | | Mis-assemblies* | |
|---|---|---|---|---|---|
| 81 | 162295 | 40 | 340060 | 1 (10) | 122303 |
| 83 | 170447 | 38 | 351373 | 1 (9) | 122303 |
| 85 | 170449 | 37 | 351321 | 0 (10) | |
| 87 | 173763 | 33 | 351326 | 0 (10) | |
| 89 | 173765 | 33 | 351394 | 0 (10) | |
| 91 | 173767 | 33 | 351330 | 0 (10) | |
| 93 | 173769 | 35 | 340092 | 0 (10) | |
| 97 | 175770 | 33 | 365247 | 1 (9) | 130273 |
| 99 | 175776 | 33 | 365260 | 1 (10) | 130273 |
| 101 | 187438 | 32 | 365623 | 1 (9) | 130273 |
| 103 | 187448 | 32 | 365625 | 1 (9) | 130273 |
| 105 | 187458 | 32 | 365638 | 1 (9) | 130273 |
| 107 | 187465 | 33 | 365647 | 1 (9) | 130273 |
| 109 | 212189 | 32 | 365656 | 1 (9) | 130273 |
| 111 | 212287 | 33 | 349286 | 2 (8) | 93632 & 153207 |
| 113 | 212292 | 34 | 349288 | 1 (10) | 93634 |
| 115 | 212294 | 34 | 349290 | 1 (10) | 118928 |
| 117 | 174074 | 35 | 349419 | 1 (11) | 93634 |
| 119 | 174076 | 35 | 349423 | 1 (11) | 93638 |
| 121 | 170642 | 37 | 349435 | 0 (11) | |
| 123 | 170654 | 38 | 340456 | 0 (10) | |
*Number of mapped contigs given in brackets.
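The table can also be read programmatically to contrast the k-mer size with the highest N50 against the best k-mer size without mis-assemblies. Because the header row is incomplete in this copy, the sketch below assumes that the second column is N50 and that the first number in the fifth column is the count of mis-assemblies; this reading agrees with Figure 1, panels A and B. `VelvetRun` and the helper functions are hypothetical names.

```python
from typing import List, NamedTuple


class VelvetRun(NamedTuple):
    k: int              # k-mer size
    n50: int            # assumed: second column of the table (bp)
    misassemblies: int  # assumed: first number of the fifth column


# Transcribed from the table above: (k, N50, mis-assemblies).
RUNS: List[VelvetRun] = [VelvetRun(*row) for row in [
    (81, 162295, 1), (83, 170447, 1), (85, 170449, 0), (87, 173763, 0),
    (89, 173765, 0), (91, 173767, 0), (93, 173769, 0), (97, 175770, 1),
    (99, 175776, 1), (101, 187438, 1), (103, 187448, 1), (105, 187458, 1),
    (107, 187465, 1), (109, 212189, 1), (111, 212287, 2), (113, 212292, 1),
    (115, 212294, 1), (117, 174074, 1), (119, 174076, 1), (121, 170642, 0),
    (123, 170654, 0),
]]


def best_by_n50(runs: List[VelvetRun]) -> VelvetRun:
    """Naive choice: the run with the highest N50, ignoring correctness."""
    return max(runs, key=lambda run: run.n50)


def best_without_misassembly(runs: List[VelvetRun]) -> VelvetRun:
    """Highest N50 among runs whose contigs mapped without mis-assemblies."""
    return max((run for run in runs if run.misassemblies == 0), key=lambda run: run.n50)


print(best_by_n50(RUNS).k)               # 115
print(best_without_misassembly(RUNS).k)  # 93
```

Ranking by N50 alone picks k = 115. Filtering on the mapping result first picks k = 93, at the cost of roughly 38 kb of N50.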
Figure 1. Alignment of contigs to the corresponding whole genome map: A) Velvet-derived assembly using k-mer size 93, revealing no mis-assemblies; B) Velvet-derived assembly using k-mer size 115, corresponding to the highest N50 but revealing mis-assemblies; C) SPAdes-derived assembly using a multi-k-mer approach up to k-mer size 83, yielding the optimal N50 for this sequence and showing no mis-assemblies; D) SPAdes-derived assembly using a multi-k-mer approach up to k-mer size 77, yielding the optimal N50 for this sequence but showing mis-assemblies.
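A toy de Bruijn graph illustrates why a single k-mer size forces a trade-off, and why multi-k assemblers such as SPAdes combine several sizes. The sketch below is not the Velvet or SPAdes implementation. It assumes error-free reads from a single strand, and the reads, the toy genome and the function names are hypothetical. A short k keeps the graph connected across a short read overlap but branches at a 4 bp repeat; a longer k resolves the repeat but can break the graph where two reads overlap by fewer than k − 1 bases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def de_bruijn(reads: Iterable[str], k: int) -> Dict[str, Set[str]]:
    """Toy de Bruijn graph: nodes are (k-1)-mers; each k-mer in a read adds
    an edge prefix -> suffix. Errors, reverse complements and coverage are ignored."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


def branching_nodes(graph: Dict[str, Set[str]]) -> int:
    """Nodes with more than one successor: the walk is ambiguous there (e.g. a repeat)."""
    return sum(1 for successors in graph.values() if len(successors) > 1)


def dead_ends(graph: Dict[str, Set[str]]) -> int:
    """Nodes reached by an edge but without an outgoing edge: a walk (contig) stops there."""
    reached = set().union(*graph.values())
    return len(reached - set(graph))


# Hypothetical reads from the toy genome TTACGTCCACGTGG, which contains the
# 4 bp repeat ACGT twice; the first two reads overlap by only 4 bp (GTCC).
READS = ["TTACGTCC", "GTCCACGT", "CACGTGG"]

for k in (4, 6):
    g = de_bruijn(READS, k)
    print(k, branching_nodes(g), dead_ends(g))
# prints "4 1 1", then "6 0 2"
```

At k = 4 the only dead end is the end of the genome, but the repeat makes one node branch. At k = 6 no node branches, yet a second dead end appears at the 4 bp overlap between the first two reads.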