Veli Mäkinen, Leena Salmela, Johannes Ylinen.
Abstract
BACKGROUND: For the development of genome assembly tools, comprehensive and efficiently computable validation measures are required to assess the quality of an assembly. The most commonly used measure, N50, summarizes the assembly results by the length of the scaffold (or contig) overlapping the midpoint of the length-ordered concatenation of scaffolds (contigs). Especially for scaffold assemblies, it is non-trivial to combine a correctness measure with the N50 values, and the current methods for doing this are rather involved.
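As an illustration of the plain N50 definition above, the following minimal Python sketch sorts the scaffold (or contig) lengths in decreasing order and returns the length of the piece that covers the midpoint of their concatenation. The function name `n50` and the example lengths are hypothetical and not taken from the paper. The sketch covers only the standard N50, not the normalized N50 reported in the table below.

```python
def n50(lengths):
    """Return the length of the piece overlapping the midpoint of the
    length-ordered concatenation of all pieces (the plain N50)."""
    if not lengths:
        raise ValueError("need at least one scaffold or contig length")
    # Concatenate pieces from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first piece whose cumulative end reaches the midpoint
        # is the one overlapping it. If the midpoint falls exactly on a
        # boundary, this sketch reports the piece ending there.
        if running >= half:
            return length


# Hypothetical scaffold lengths, for illustration only.
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```

Here the total length is 290, so the midpoint is at 145. The first scaffold ends at 80 and the second at 150, so the second scaffold (length 70) overlaps the midpoint.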
Year: 2012 PMID: 23031320 PMCID: PMC3556137 DOI: 10.1186/1471-2105-13-255
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Normalized N50 on original assembly versus varying randomized assemblies with the same N50
| Normalized N50 | 183891 | 92212 | 56964 | 43461 | 33533 | 30403 |
| Genome coverage | 0.9333 | 0.6410 | 0.4778 | 0.4153 | 0.3421 | 0.3311 |
| Scaffold coverage | 0.9859 | 0.6847 | 0.5071 | 0.4414 | 0.3642 | 0.3522 |
Figure 1. An example where greedy extraction of gap-restricted co-linear chains may result in more pieces than the optimal solution. Greedy selection would align blocks 2, 3, and 4 with dashed edges, but then, with a suitable gap restriction, blocks 1 and 5 could not be aligned together, and the assembly would be split into 3 parts. An optimal algorithm can choose blocks 2 and 4 with dashed edges and then blocks 1, 3, and 5 together, resulting in only 2 parts. It is possible to construct such an example even without multiple mappings for the blocks.