Martin Hunt, Chris Newbold, Matthew Berriman, Thomas D Otto.
Abstract
BACKGROUND: Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics.
Year: 2014 PMID: 24581555 PMCID: PMC4053845 DOI: 10.1186/gb-2014-15-3-r42
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Scaffolding tools
| Scaffolder | Version | Date [a] | Citations [b] | Algorithm | Install | Run | Mapper | Comments |
|---|---|---|---|---|---|---|---|---|
| ABySS | 1.3.6 | 27/02/2009 | 764 [c] | Graph | + | o | User’s choice | |
| Bambus2 | 3.1.0 [d] | 16/09/2011 | 37 | Graph | - | - | User’s choice | Hard to install and run. Scripting work needed to prepare input for running scaffolding |
| GRASS | 0.003 | 06/04/2012 | 1 | Graph | - | - | BWA/NovoAlign | Hard to install and did not produce scaffolds |
| MIP | 0.5 | 13/10/2011 | 16 | Graph | o | - | User’s choice | A few dependencies required, significant work needed to prepare input files |
| Opera | 1.2 | 19/09/2011 | 30 | Graph | + | o | Bowtie/BWA | |
| SCARPA | 0.22 | 29/12/2012 | 3 | Graph | + | o | User’s choice | |
| SGA | 0.9.43 | 07/12/2011 | 74 [c] | Graph | - | o | User’s choice | Several dependencies to install |
| SOAPdenovo2 | r223 | 27/12/2012 | 47 [c] | Graph | o | - | SOAP2 | Little documentation on how to run scaffolder module alone |
| SOPRA | 1.4.6 | 24/06/2010 | 41 | Graph | + | - | User’s choice | Scripting work needed to prepare input for running scaffolding |
| SSPACE | 2 (basic) | 07/12/2010 | 151 | Greedy | + | + | Bowtie/BWA [e] | Extremely easy to install and run |
[a] Date the article was first available online.
[b] Retrieved from Google Scholar 11 December 2013.
[c] Citations will include general assembly, not just scaffolding.
[d] Bambus2 is part of the AMOS package. AMOS version 3.1.0 was used, but with the latest goBambus2 script from the AMOS git repository.
[e] BWA only available with a paid version of the software.
Figure 1. Data generation and results of test case 11. (a) Generation of contigs and read pairs for the test. (b) The test in graph form and the output of each scaffolder. Each node represents a 5 kb contig and each edge represents read pair evidence and is labelled with the read depth. Green nodes and edges mark the correct solution. Incorrect paths are coloured black. Numbers in brackets after each tool indicate the number of times that configuration was output by that tool. Tools with no number produced the same output on all runs.
Summary of datasets
| | 2.8 | 32 | 27 | 0.76 | 76 | 505 | 96 | 96 | 95 | 95 | 97 | 97 | 97 | 97 | 40 |
| | | | 27 | 0.76 | 76 | 2,795 | 96 | 96 | 95 | 95 | 97 | 97 | 97 | 97 | 40 |
| | | | 940 | 0.76 | 76 | 505 | 95 | 94 | 94 | 94 | 96 | 96 | 96 | 96 | 40 |
| | | | 940 | 0.76 | 76 | 2,995 | 95 | 94 | 94 | 94 | 97 | 96 | 96 | 96 | 40 |
| | 2.9 | 32 | 167 | 3.5 | 37 | 3,385 | 35 | 35 | 56 | 54 | 55 | 54 | 53 | 52 | 49 |
| | 4.6 | 68 | 570 | 2.1 | 101 | 3,695 | 17 | 11 | 36 | 32 | 69 | 68 | 35 | 30 | 62 |
| | 23.3 | 19 | 9,302 | 52.5 | 76 | 645 | 70 | 67 | 73 | 70 | 79 | 77 | 76 | 74 | 267 |
| | | | 9,302 | 12.0 | 75 | 2,705 | 27 | 25 | 31 | 30 | 43 | 42 | 33 | 32 | 33 |
| Human chromosome 14 GAGE [b] | 88.2 | 40 | 19,935 | 22.7 | 101 | 2,865 | 46 | 19 | 68 | 29 | 90 | 55 | 69 | 30 | 38 |
| | | | 19,935 | 2.4 | 57-82 | 34,500 | 47 | 6 | 73 | 48 | 93 | 85 | 79 | 56 | 3 |
[a] Calculated from Bowtie 2 mapping to contigs.
[b] Excluding the 19 Mb of Ns at the start of the sequence.
Results summary of simulated data
| Scaffolder | Mapper | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ABySS | abyss-map | 100.0 | 1.00 | 66.7 | 0.91 | 98.9 | 0.0 | 1.00 | 99.5 | 0.0 | 1.00 |
| Bambus2 | Bowtie 2 | 100.0 | 1.00 | 66.7 | 0.91 | 63.7 | 0.0 | 0.84 | 95.9 | 0.0 | 0.99 |
| | BWA | 48.2 | 0.82 | 48.2 | 0.87 | 59.6 | 0.0 | 0.82 | 96.2 | 0.0 | 0.99 |
| MIP | Bowtie -v 0 | 100.0 | 1.00 | 96.3 | 0.99 | 98.9 | 0.0 | 1.00 | 98.4 | 0.0 | 1.00 |
| | Bowtie -v 3 | 100.0 | 1.00 | 96.3 | 0.99 | 98.4 | 0.0 | 0.99 | 97.9 | 0.0 | 1.00 |
| | Bowtie 2 | 100.0 | 1.00 | 29.6 | 0.82 | 96.3 | 0.5 | 0.74 | 98.7 | 0.5 | 0.96 |
| | BWA | 100.0 | 1.00 | 33.3 | 0.83 | 98.0 | 1.1 | 0.54 | 98.2 | 0.4 | 0.96 |
| Opera | Bowtie | 100.0 | 1.00 | 92.6 | 0.98 | 98.4 | 0.0 | 0.99 | 1.0 | 10.0 | 0.79 |
| | BWA | 100.0 | 1.00 | 92.6 | 0.98 | 99.8 | 0.2 | 0.93 | 1.2 | 80.0 | 0.37 |
| SCARPA | Bowtie -v 0 | 100.0 | 1.00 | 96.3 | 0.99 | 98.9 | 0.0 | 1.00 | 95.0 | 0.0 | 0.99 |
| | Bowtie -v 3 | 100.0 | 1.00 | 96.3 | 0.99 | 98.6 | 0.0 | 0.99 | 96.3 | 0.0 | 0.99 |
| | Bowtie 2 | 85.2 | 0.95 | 96.3 | 0.99 | 96.8 | 0.0 | 0.99 | 76.3 | 0.7 | 0.92 |
| | BWA | 85.2 | 0.95 | 92.6 | 0.98 | 96.6 | 0.0 | 0.98 | 77.9 | 0.4 | 0.93 |
| SGA | Bowtie 2 | 100.0 | 1.00 | 96.3 | 0.99 | 97.3 | 0.0 | 0.99 | 97.6 | 0.0 | 1.00 |
| | BWA | 100.0 | 1.00 | 92.6 | 0.98 | 99.0 | 0.0 | 1.00 | 96.2 | 0.0 | 0.99 |
| SOAP2 | SOAP2 | 96.3 | 0.99 | 96.3 | 0.99 | 98.6 | 0.0 | 0.99 | 99.5 | 0.0 | 1.00 |
| SOPRA | Bowtie -v 0 | 100.0 | 1.00 | 96.3 | 0.99 | 98.3 | 0.0 | 0.99 | 98.2 | 0.0 | 1.00 |
| | Bowtie -v 3 | 100.0 | 1.00 | 96.3 | 0.99 | 97.2 | 0.0 | 0.99 | 97.2 | 0.0 | 1.00 |
| | Bowtie 2 | 74.1 | 0.91 | 100.0 | 1.00 | 91.5 | 0.5 | 0.82 | 85.6 | 0.5 | 0.93 |
| | BWA | 74.1 | 0.91 | 88.9 | 0.97 | 92.9 | 0.2 | 0.89 | 83.1 | 0.4 | 0.93 |
| SSPACE | Bowtie -v 0 | 100.0 | 1.00 | 92.6 | 0.98 | 99.1 | 0.0 | 1.00 | 99.6 | 0.0 | 1.00 |
| | Bowtie -v 3 | 100.0 | 1.00 | 92.6 | 0.98 | 98.7 | 0.0 | 0.99 | 99.3 | 0.0 | 1.00 |
[a] No incorrect joins were made by any of the tools.
Figure 2. Simulated contigs, artificial contigs and sequence tags. (a) Generation of simulated contigs and reads from the S. aureus reference sequence. (b) Generation of artificial contigs from assembler output. (c) Tag types. Tags 1 and 2 are a correct join. Tags 2 and 4 demonstrate a skipped tag because the output scaffold jumps over tag 3. Tag 3 also does not appear in the output and is therefore a lost tag. Tags 4 and 5 are in the wrong orientation and tags 5 and 6 belong to different sequences in the reference.
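The tag types in Figure 2(c) can be illustrated with a small sketch. This is not the authors' implementation: the `Tag` record, the category names and the helper functions are hypothetical, and the sketch assumes each tag is already annotated with its reference sequence, its rank along that reference and the strand on which it appears in the output scaffold.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """Hypothetical record for one sequence tag placed in an output scaffold."""
    ref: str     # reference sequence the tag was taken from
    index: int   # rank of the tag along that reference
    strand: str  # '+' or '-', orientation of the tag in the scaffold


def classify_join(left: Tag, right: Tag) -> str:
    """Classify the join between two adjacent tags in a scaffold."""
    if left.ref != right.ref:
        return "different_sequence"
    if left.strand != right.strand:
        return "wrong_orientation"
    # On the reverse strand the reference ranks are expected to decrease.
    step = right.index - left.index
    if left.strand == "-":
        step = -step
    if step == 1:
        return "correct"
    if step > 1:
        return "skipped"
    return "wrong_order"  # assumption: any other arrangement is misordered


def lost_tags(reference_tags, scaffolds):
    """Return (ref, index) pairs that appear in no output scaffold."""
    placed = {(t.ref, t.index) for scaffold in scaffolds for t in scaffold}
    return [rt for rt in reference_tags if rt not in placed]


# The scenario described in Figure 2(c): tags 1-5 on one reference
# sequence ("A"), tag 6 on another ("B"), tag 3 missing from the output.
scaffold = [
    Tag("A", 1, "+"), Tag("A", 2, "+"), Tag("A", 4, "+"),
    Tag("A", 5, "-"), Tag("B", 6, "+"),
]
joins = [classify_join(a, b) for a, b in zip(scaffold, scaffold[1:])]
reference = [("A", i) for i in range(1, 6)] + [("B", 6)]
lost = lost_tags(reference, [scaffold])
```

Here `joins` holds one label per adjacent tag pair and `lost` lists the tags absent from the output, mirroring the cases named in the caption.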
Figure 3. Genome-scale data results. (a) S. aureus GAGE data, (b) P. falciparum combined short and long data and (c) human chromosome 14 combined short and long insert data. Scatterplots show the relationship between correct and incorrect joins made by each scaffolder. Boxplots show the distribution of summary scores when iterating over different score combinations. The white circles in the boxplots denote the score from our chosen weighting system that focuses on penalising errors (with weights: correct join = 80, incorrect join = 160, lost tag = 160, skipped tag = 40, running time = 1).
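The caption gives the weights but not the formula that combines them. As a rough illustration only, the sketch below assumes a normalised weighted mean in which every metric has first been rescaled to the range 0 to 1 with 1 being best (for example, the fraction of possible correct joins made, or one minus the fraction of incorrect joins). The function name `summary_score` and the rescaling are assumptions, not the published method.

```python
# Weights quoted in the Figure 3 caption.
WEIGHTS = {
    "correct_join": 80,
    "incorrect_join": 160,
    "lost_tag": 160,
    "skipped_tag": 40,
    "running_time": 1,
}


def summary_score(goodness, weights=WEIGHTS):
    """Weighted mean of per-metric 'goodness' values.

    Assumption: each value in `goodness` is already rescaled to [0, 1],
    where 1 is the best possible outcome for that metric.
    """
    missing = set(weights) - set(goodness)
    if missing:
        raise KeyError(f"missing metrics: {sorted(missing)}")
    total = sum(weights.values())
    return sum(weights[name] * goodness[name] for name in weights) / total


# A hypothetical scaffolder that is perfect except that every join it
# makes is incorrect loses the full incorrect-join weight.
perfect = {name: 1.0 for name in WEIGHTS}
all_wrong_joins = dict(perfect, incorrect_join=0.0)
best = summary_score(perfect)
penalised = summary_score(all_wrong_joins)
```

Under this assumed scheme, incorrect joins and lost tags each carry twice the weight of correct joins, which matches the caption's stated focus on penalising errors.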
Results summary of genome scale data
| Scaffolder | Mapper | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| ABySS | abyss-map | 59.3 | 2.0 | 0.87 | 61.5 | 1.2 | 0.85 | 47.1 | 0.5 | 0.83 |
| Bambus2 | Bowtie 2 | 56.9 | 2.1 | 0.82 | NA [a] | NA [a] | NA [a] | 51.6 | 1.5 | 0.72 |
| | BWA | 57.5 | 0.0 | 0.86 | NA [a] | NA [a] | NA [a] | 54.0 | 2.1 | 0.69 |
| MIP | Bowtie -v 0 | 1.2 | 0.0 | 0.82 | 89.8 | 4.0 | 0.75 | 34.5 | 4.7 | 0.43 |
| | Bowtie -v 3 | 0.0 | NA | 0.82 | 89.3 | 4.8 | 0.68 | 42.8 | 7.5 | 0.10 |
| | Bowtie 2 | 0.0 | NA | 0.82 | 86.9 | 6.0 | 0.65 | NA [a] | NA [a] | NA [a] |
| | BWA | 0.0 | NA | 0.82 | 86.0 | 8.0 | 0.42 | NA [a] | NA [a] | NA [a] |
| Opera | Bowtie | 67.1 | 8.9 | 0.65 | 69.2 | 2.7 | 0.74 | 64.5 | 0.4 | 0.88 |
| | BWA | 64.7 | 12.2 | 0.56 | 69.1 | 3.0 | 0.72 | 63.9 | 1.1 | 0.84 |
| SCARPA | Bowtie -v 0 | 50.3 | 8.7 | 0.72 | 79.6 | 4.7 | 0.39 | 52.8 | 1.4 | 0.80 |
| | Bowtie -v 3 | 49.1 | 10.9 | 0.31 | 80.8 | 4.5 | 0.41 | 52.6 | 1.4 | 0.80 |
| | Bowtie 2 | 46.1 | 17.2 | 0.53 | 78.9 | 4.8 | 0.39 | 53.7 | 1.5 | 0.79 |
| | BWA | 46.7 | 7.1 | 0.77 | 79.7 | 4.8 | 0.40 | 53.6 | 1.4 | 0.80 |
| SGA | Bowtie 2 | 49.7 | 1.2 | 0.88 | 52.8 | 0.9 | 0.80 | 49.0 | 0.0 | 0.85 |
| | BWA | 43.1 | 1.4 | 0.88 | 51.1 | 5.2 | 0.65 | 41.1 | 0.4 | 0.82 |
| SOAP2 | SOAP2 | 78.4 | 8.4 | 0.68 | 64.3 | 3.7 | 0.76 | 79.0 | 2.4 | 0.80 |
| SOPRA | Bowtie -v 0 | 53.3 | 2.2 | 0.85 | 87.5 | 0.8 | 0.97 | 66.1 | 0.4 | 0.90 |
| | Bowtie -v 3 | 41.3 | 4.2 | 0.80 | 83.6 | 0.8 | 0.95 | 52.3 | 2.2 | 0.74 |
| | Bowtie 2 | 24.0 | 4.8 | 0.80 | 75.4 | 0.8 | 0.91 | NA [a] | NA [a] | NA [a] |
| | BWA | 25.1 | 6.7 | 0.77 | 81.5 | 0.6 | 0.95 | NA [a] | NA [a] | NA [a] |
| SSPACE | Bowtie -v 0 | 65.9 | 7.6 | 0.72 | 63.0 | 1.4 | 0.85 | 44.0 | 0.3 | 0.83 |
| | Bowtie -v 3 | 62.9 | 11.0 | 0.62 | 63.3 | 2.0 | 0.83 | 46.4 | 0.4 | 0.84 |
[a] Data not available because scaffolder required more than 30 GB of memory or did not finish within 12 days.