Abdul Rafay Khan, Muhammad Tariq Pervez, Masroor Ellahi Babar, Nasir Naveed, Muhammad Shoaib.
Abstract
BACKGROUND: Current advancements in next-generation sequencing technology have made it possible to sequence whole genomes, but assembling a large number of short sequence reads is still a big challenge. In this article, we present a comparative study of seven assemblers, namely, ABySS, Velvet, Edena, SGA, Ray, SSAKE, and Perga, using prokaryotic and eukaryotic paired-end as well as single-end data sets from the Illumina platform.
Keywords: DBG (de Bruijn graph); ENA (European Nucleotide Archive); NGS (next-generation sequencing); OLC (overlap layout consensus); bps (base pairs)
Year: 2018 PMID: 29511353 PMCID: PMC5826002 DOI: 10.1177/1176934318758650
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Prokaryotic data sets used in this study.
| S. no. | Data set | ENA run accession | Data set type | No. of reads |
|---|---|---|---|---|
| 1 | | ERR353143 | Paired-end | 137 022 |
| 2 | | ERR490828 | Paired-end | 321 004 |
| 3 | | ERR490638 | Paired-end | 737 008 |
| 4 | | ERR495003 | Paired-end | 770 994 |
| 5 | | DRR015798 | Paired-end | 1 218 573 |
| 6 | | DRR015726 | Paired-end | 2 267 875 |
| 7 | | DRR015851 | Paired-end | 4 098 002 |
| 8 | | DRR015872 | Single-end | 113 512 |
| 9 | | SRR1148216 | Single-end | 724 546 |
| 10 | | ERR233905 | Single-end | 1 490 584 |
| 11 | | SRR969383 | Single-end | 1 840 438 |
| 12 | | SRR1736648 | Single-end | 3 099 636 |
| 13 | | ERR465798 | Single-end | 5 094 314 |
| 14 | | ERR1596542 | Single-end | 7 466 661 |
| 15 | | SRR1038047 | Single-end | 9 129 274 |
Eukaryotic data sets used in this study.
| S. no. | Data set | ENA run accession | Data set type | No. of reads |
|---|---|---|---|---|
| 1 | | DRR002191 | Paired-end | 126 605 856 |
| 2 | | DRR016722 | Paired-end | 95 461 377 |
| 3 | | ERR1224454 | Paired-end | 30 841 688 |
| 4 | | ERR052652 | Paired-end | 17 584 902 |
| 5 | Fungi | SRR1614243 | Paired-end | 22 344 195 |
| 6 | | DRR002191 | Single-end | 126 605 856 |
| 7 | | DRR002191 | Single-end | 95 461 377 |
| 8 | | ERR1224454 | Single-end | 30 841 688 |
| 9 | | ERR052652 | Single-end | 17 584 902 |
| 10 | Fungi | SRR1614243 | Single-end | 22 344 195 |
De novo assemblers selected for this study.
| S. no. | Assembler | Programming language | Algorithm | Input reads |
|---|---|---|---|---|
| 1 | ABySS | C++ | De Bruijn graph (DBG) | Paired-end and single-end |
| 2 | Velvet | C | De Bruijn graph (DBG) | Paired-end and single-end |
| 3 | Edena | C++ | Overlap/layout/consensus (OLC) | Paired-end and single-end |
| 4 | SGA | C++ | String graph | Paired-end |
| 5 | Ray | C++ | Hybrid | Paired-end and single-end |
| 6 | SSAKE | Perl | Greedy | Paired-end and single-end |
| 7 | Perga | C | Greedy | Paired-end and single-end |
Figure 1. Comparison of the total median assembly time of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.
Figure 2. Comparison of the mean memory usage and CPU usage of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.
Figure 3. Comparison of the median total number of contigs of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.
Figure 4. Comparison of the median N50 contig length of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.
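The N50 contig length compared in Figure 4 is conventionally defined as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The following is a minimal sketch of that standard definition, not the study's own evaluation script; the function name `n50` and the example lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    # Walk the contigs from longest to shortest.
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in base pairs (not taken from the study).
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

A higher N50 indicates that more of the assembly sits in long contigs, which is why it is usually read together with the total number of contigs.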
List of all assemblers with their mean genome fraction (%).
| Assembler | Prokaryotic single-end | Prokaryotic paired-end |
|---|---|---|
| ABySS | 69.8 | 66.3 |
| Velvet | 59.6 | 57.1 |
| Edena | 43.8 | 51.4 |
| SGA | — | 50.4 |
| Ray | 48.7 | 58.8 |
| SSAKE | 44.3 | 13.2 |
| Perga | 57.6 | 51.9 |

| Assembler | Eukaryotic single-end | Eukaryotic paired-end |
|---|---|---|
| ABySS | 85.4 | 82.4 |
| Velvet | 82.6 | 85.6 |
| Edena | 62.2 | 90.4 |
| Perga | 82.0 | 83.2 |
| SGA | — | 52.4 |
| SSAKE | 49.2 | 74.0 |
Figure 5. Comparison of the mean genome fraction of each assembler for (A) paired-end and single-end prokaryotic data sets and (B) paired-end and single-end eukaryotic data sets.
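This section does not state how genome fraction was computed. A common definition, used by assembly evaluation tools such as QUAST, is the percentage of reference bases covered by at least one aligned contig. The sketch below illustrates that definition under the assumption that contig alignments are already available as half-open `(start, end)` intervals on the reference; the function name `genome_fraction` and the example values are hypothetical.

```python
def genome_fraction(aligned_intervals, reference_length):
    """Percentage of reference bases covered by at least one aligned contig.

    aligned_intervals: iterable of half-open (start, end) reference positions.
    reference_length: length of the reference genome in base pairs.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")

    covered = 0
    current_start = current_end = None
    # Merge overlapping or touching intervals so no base is counted twice.
    for start, end in sorted(aligned_intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start

    return 100.0 * covered / reference_length


# Hypothetical alignments on a 1000 bp reference (not data from the study).
alignments = [(0, 400), (300, 600), (800, 1000)]
print(genome_fraction(alignments, 1000))
```

Merging the intervals first matters because overlapping contigs would otherwise inflate the covered length.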