Ruibang Luo, Binghang Liu, Yinlong Xie, Zhenyu Li, Weihua Huang, Jianying Yuan, Guangzhu He, Yanxiang Chen, Qi Pan, Yunjie Liu, Jingbo Tang, Gengxiong Wu, Hao Zhang, Yujian Shi, Yong Liu, Chang Yu, Bo Wang, Yao Lu, Changlei Han, David W Cheung, Siu-Ming Yiu, Shaoliang Peng, Zhu Xiaoqian, Guangming Liu, Xiangke Liao, Yingrui Li, Huanming Yang, Jian Wang, Tak-Wah Lam, Jun Wang.
Abstract
BACKGROUND: There is a rapidly increasing amount of de novo genome assembly using next-generation sequencing (NGS) short reads; however, several big challenges remain to be overcome in order for this to be efficient and accurate. SOAPdenovo has been successfully applied to assemble many published genomes, but it still needs improvement in continuity, accuracy and coverage, especially in repeat regions.
Year: 2012 PMID: 23587118 PMCID: PMC3626529 DOI: 10.1186/2047-217X-1-18
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Evaluation of Assemblathon1 dataset assemblies
| V1 | 207,783 | 13,357 | 329,384 | 13,539 | 14,306 | 5.40E-05 | 9.14E-03 | 98.8 | 46 | 7 |
| V1.05* | 343,889 | 82,264 | 1,684,436 | 116,651 | 1,878 | 1.20E-05 | 6.75E-03 | 98.8 | 20 | 8 |
| V2.0 | 357,238 | 111,365 | 15,077,357 | 170,432 | 1,414 | 4.25E-06 | 2.79E-03 | 98.8 | 20 | 10§ |
| ALLPATHS-LG* | 163,633 | 72,480 | 8,185,650 | 210,649 | 1,244 | 2.92E-06 | 6.71E-02 | 98.3 | 100 | 12 |
Contig and scaffold path NG50 were defined in Assemblathon1 [1].
*Evaluation results for SOAPdenovo v1.05 and ALLPATHS-LG were taken from [1].
§Time spent on filtering contamination was not included.
Figure 1. A comparison of the scaffold N10 to N90 between the assemblies based on the Assemblathon 1 dataset.
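For readers unfamiliar with the N10 to N90 curve, the sketch below computes it under the conventional definition of Nx: the length L such that sequences of length L or longer cover at least x percent of the total assembly size. This definition is an assumption, since the record here does not spell it out, and it is not the path NG50 metric of Assemblathon 1 [1]. The function names `nx` and `nx_curve` and the toy lengths are hypothetical.

```python
def nx(lengths, x):
    """Return Nx: the length L such that sequences of length >= L
    together cover at least x percent of the summed assembly length."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Cross-multiplied comparison avoids dividing by the total.
        if running * 100 >= total * x:
            return length
    return ordered[-1]


def nx_curve(lengths):
    """Return a dict mapping x = 10, 20, ..., 90 to the Nx value."""
    return {x: nx(lengths, x) for x in range(10, 100, 10)}


# Hypothetical scaffold lengths in bp, for illustration only.
toy_scaffolds = [100, 80, 60, 40, 20]
curve = nx_curve(toy_scaffolds)
```

Plotting `curve` for each assembler on the same axes would give a comparison of the kind the figure describes; an assembly whose curve lies higher at every x has longer scaffolds throughout.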
Assemblies of and
| | SOAPdenovo1 | 79 | 148.6 | 156 | 23 | 49 | 342 | 0 | 342 |
| | SOAPdenovo2 | 80 | 98.6 | 25 | 71.5 | 38 | 1,086 | 2 | 1,078 |
| | ALLPATHS-LG* | 37 | 149.7 | 13 | 117.6 | 10 | 1,477 | 1 | 1,093 |
| | SOAPdenovo1 | 2,242 | 3.5 | 392 | 2.8 | 956 | 105 | 18 | 70 |
| | SOAPdenovo2 | 721 | 18 | 106 | 14.1 | 333 | 2,549 | 4 | 2,540 |
| | ALLPATHS-LG* | 190 | 41.9 | 31 | 36.7 | 32 | 3,191 | 0 | 3,310 |
All datasets were downloaded from http://gage.cbcb.umd.edu/data/.
*ALLPATHS-LG was run with the latest version, 42807.
Assemblies of
| SOAPdenovo1 | 64,361 | 7.9 | 10.4 | 52,041 | 12 | 25 |
| SOAPdenovo2 | 12,550 | 75.7 | 91.1 | 5,084 | 1,352 | 1,596 |
| ALLPATHS-LG* | - | - | - | - | - | - |
*The published ALLPATHS-LG could not be used to assemble this genome because it requires at least one library with overlapping paired-end reads.
Summary of YH dataset assemblies
| SOAPdenovo YH old data | v1 | 25 | 2,837,024,602 | 455,380 | 2,327,931,678 | 4,933 | 80.51% | 48^ | 140 |
| SOAPdenovo YH new data | v1 | 31 | 2,901,125,426 | 5,806,495 | 2,661,982,498 | 12,709 | 81.16% | 58^ | 107 |
| | v2 Multi- | 45-61 | 2,905,148,690 | 22,297,138 | 2,799,723,051 | 20,926 | 93.91% | 74^ | 155 |
| | v2 Sparse | 35 | 2,874,598,201 | 18,033,622 | 2,767,141,367 | 18,856 | 93.17% | 78^ | 35 |
| | v2 Sparse & Multi- | 35-49 | 2,888,094,847 | 17,576,272 | 2,776,209,134 | 18,960 | 93.20% | 81^ | 35 |
| ALLPATHS-LG§ YH new data | 42807 | 96 | 2,809,141,261 | 16,195,684 | 2,600,792,533 | 31,101 | 88.53% | 249* | 343 |
To be consistent with the ALLPATHS-LG results, contigs and scaffolds shorter than 1 kb were filtered out of the SOAPdenovo assemblies.
§ Without ‘FixLocal’ due to the module failure (see Additional file 1: Supplementary Method 7).
^ Time consumption includes SOAPdenovo’s error correction, assembly and gap closure modules.
* Time consumption includes ALLPATHS-LG’s preparation and assembly modules.
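The 1 kb length filter mentioned in the table notes can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes 1 kb means 1,000 bp, that a sequence of exactly 1,000 bp is kept, and that lengths are already available in a name-to-length mapping. The function name `filter_short` and the example records are hypothetical.

```python
def filter_short(lengths_by_name, min_length=1000):
    """Keep only sequences whose length is at least min_length (in bp)."""
    return {
        name: length
        for name, length in lengths_by_name.items()
        if length >= min_length
    }


# Hypothetical scaffold names and lengths in bp, for illustration only.
example = {"scaffold1": 5000, "scaffold2": 999, "scaffold3": 1000}
kept = filter_short(example)
```

Applying the same cutoff to every assembly before computing summary statistics keeps many very short sequences in one assembly from skewing the comparison.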
Figure 2. A comparison of the scaffold N10 to N90 between the assemblies based on the new YH sequencing data.