Vitor C Piro, Helisson Faoro, Vinicius A Weiss, Maria B R Steffens, Fabio O Pedrosa, Emanuel M Souza, Roberto T Raittz.
Abstract
BACKGROUND: The rapid fall in DNA sequencing costs has allowed genome data to accumulate quickly. However, obtaining complete genome sequences is still very time-consuming and labor-intensive. In addition, data produced by different sequencing technologies or alternative assemblies remain underexplored as a means of improving the assembly of incomplete genome sequences.
Year: 2014 PMID: 24938749 PMCID: PMC4091766 DOI: 10.1186/1756-0500-7-371
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Overview of a gap handled by FGAP. Lowercase characters represent the aligned sequences; diagonal lines represent the BLAST alignment.
assemblies
| Illumina(pe) + 454(se) [Draft] | 81 | 123 | 41(s)/32(c) | 4554392 | 172167 |
| 454(se) [Dataset] | 99 | 0 | 12407(c) | 6274970 | 531 |
| Illumina(se) [Dataset] | 81 | 0 | 564(c) | 4615235 | 63640 |
Datasets were assembled with single-end reads, generating only contigs; pe: paired-end; se: single-end; s: scaffolds; c: contigs.
Human chromosome assemblies
| ALLPATHS-LG [Draft] | 4307 | 418(s) | 87688255 | 81646936 |
| CABOG [Dataset] | 0 | 3541(c) | 86255201 | 46694 |
All data were obtained from GAGE evaluation [4]. s: scaffolds; c: contigs.
Software comparison in assembly
| Nº of gaps | 123 | 26 | 2 | 22 | 25 | 19 |
| Nº contigs (≥ 1000 bp) | 116 | 80 | 73 | 82 | 85 | 87 |
| Local misassemblies | 2 | 9 | 2 | 12 | 12 | 21 |
| Complete + partial genes | 4325 + 44 | 4377 + 34 | 4388 + 27 | 4375 + 35 | 4367 + 35 | 4389 + 67 |
| N50 | 66462 | 132608 | 172148 | 112396 | 132608 | 110934 |
| Inserted bases (bp) | - | 3133 | 6931 | 6140 | 3098 | 37217 |
| Execution time | - | 42 s | 2 m 55 s | 1 m 19 s | 19 m 23 s | 2 h 46 m 29 s |
The evaluation was performed by QUAST script v2.3 [16] (all metrics are in Additional file 1). The gene number was calculated based on a reference list with 4497 genes. *FGAP + Long stands for PacBio’s long reads used directly as datasets.
Software comparison in human chromosome 14 assembly
| Nº of gaps | 4307 | 2780 | 2799 | 3690 | 3840 |
| Nº contigs (≥ 1000 bp) | 4386 | 2880 | 2930 | 3796 | 3979 |
| Local misassemblies | 215 | 296 | 386 | 339 | 301 |
| Complete + partial genes | 1064 + 497 | 1141 + 423 | 1121 + 448 | 1093 + 468 | 1078 + 488 |
| N50 | 38359 | 61874 | 58014 | 45825 | 42385 |
| Inserted bases (bp) | - | 244379 | 1165698 | 421831 | 373900 |
| Execution time | - | 3 h 11 m | 1 h 10 m | 8 h 09 m | 50 h 45 m |
The evaluation was performed by QUAST script v2.3 [16] (all metrics are in Additional file 1). The gene number was calculated based on a reference list with 1655 genes.
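The N50 values in the tables above summarize assembly contiguity. As a minimal sketch, assuming the usual definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length), N50 can be computed as follows. The function name `n50` and the example lengths are illustrative only and are not taken from the article or from QUAST's source code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running total first reaches at
    least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    # Only reached when no lengths were supplied.
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths, for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
example_n50 = n50(example_lengths)  # total is 290, so half is 145
```

Closing a gap merges two contigs into one longer sequence, which is why N50 tends to rise as the number of gaps falls.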