Sagar M Utturkar, Dawn M Klingeman, Miriam L Land, Christopher W Schadt, Mitchel J Doktycz, Dale A Pelletier, Steven D Brown.
Abstract
MOTIVATION: To assess the potential of different types of sequence data combined with de novo and hybrid assembly approaches to improve existing draft genome sequences.
Year: 2014 PMID: 24930142 PMCID: PMC4173024 DOI: 10.1093/bioinformatics/btu391
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Summary of sequence data coverage
| NGS Technology | Illumina PE | Illumina MP | Roche 454 SE | PacBio |
|---|---|---|---|---|
| Avg. Read Length (bp) | 100 | 150 | 565 | 5456 |
| BT03 | 240x* | 24x | 15x | 18x |
| CF080 | 475x | 41x | 26x | 20x |
| GM41 | 520x | 46x | 24x | 32x |
| GM30 | 520x | 36x | 26x | NA |
Note: *The suffix x denotes raw read coverage.
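The coverage values above express sequencing depth relative to genome size. The source does not spell out the formula, so the following is a minimal sketch assuming raw coverage is total sequenced bases divided by genome size. The function name and the read count in the example are hypothetical; only the 5456 bp average read length comes from the table.

```python
def raw_coverage(num_reads: int, avg_read_length: float, genome_size: int) -> float:
    """Return raw read coverage (fold), assuming coverage = total bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * avg_read_length / genome_size


# Hypothetical example: 25,000 reads at the PacBio average read length
# from the table (5456 bp) over an assumed 7 Mb genome.
depth = raw_coverage(num_reads=25_000, avg_read_length=5456, genome_size=7_000_000)
print(f"{depth:.1f}x")
```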
Summary of de novo and hybrid assembly results
| Strain | Library type | No. of contigs | Max contig size (kb) | Contig N50 (kb) | Genome size, contigs (Mb) | No. of scaffolds | Max scaffold size (kb) | Scaffold N50 (kb) | Genome size, scaffolds (Mb) | Software |
|---|---|---|---|---|---|---|---|---|---|---|
| CF080 | PE | 1039 | 335 | 75 | 7.54 | 897 | 631 | 383 | 7.56 | CLC |
| | PE* | 90 | 694 | 237 | 8.20 | 69 | 646 | 331 | 7.20 | ABySS |
| | 454 | 71 | 1058 | 236 | 7.01 | — | — | — | — | Newbler |
| | Pacbio-454 | 102 | 799 | 187 | 7.06 | — | — | — | — | PBcR |
| | PE-454 | 57 | 1225 | 483 | 7.02 | — | — | — | — | Newbler |
| | PE-MP | 163 | 1413 | 597 | 7.12 | 103 | 4100 | 4100 | 7.21 | MaSuRCA |
| | PE-MP* | 40 | 1535 | 626 | 7.04 | 12 | 4813 | 4813 | 7.10 | ALLPATHS-LG |
| | PE-MP-454 | 252 | 4095 | 4095 | 7.23 | 249 | 4095 | 4095 | 7.23 | MaSuRCA |
| | PE-MP-454* | 32 | 1341 | 615 | 7.01 | — | — | — | — | Newbler |
| | PE-MP-454-Pacbio | — | — | — | — | 6 | 4102 | 4102 | 7.04 | AHA |
| | PE-MP-Pacbio | 25 | 2395 | 1779 | 7.04 | 23 | 2395 | 1844 | 7.04 | SPAdes |
| GM41 | PE | 164 | 308 | 75 | 6.61 | 89 | 599 | 137 | 6.64 | CLC |
| | PE* | 101 | 436 | 165 | 6.64 | 96 | 679 | 183 | 6.64 | SPAdes |
| | 454 | 112 | 236 | 89 | 6.61 | — | — | — | — | Newbler |
| | Pacbio-454 | 80 | 371 | 140 | 6.79 | — | — | — | — | PBcR |
| | PE-454 | 96 | 345 | 143 | 6.63 | — | — | — | — | Newbler |
| | PE-MP | 157 | 621 | 279 | 6.70 | 117 | 2057 | 1560 | 6.71 | MaSuRCA |
| | PE-MP | 86 | 436 | 183 | 6.71 | 80 | 681 | 183 | 6.72 | SPAdes |
| | PE-MP* | 62 | 415 | 107 | 6.65 | 5 | 3919 | 3919 | 6.72 | ALLPATHS-LG |
| | PE-MP-454 | 66 | 345 | 159 | 6.62 | — | — | — | — | Newbler |
| | PE-MP-454-Pacbio | — | — | — | — | 17 | 1007 | 666 | 6.67 | AHA |
| | PE-MP-Pacbio | 73 | 653 | 292 | 6.68 | 68 | 1070 | 292 | 6.69 | SPAdes |
| GM30 | PE | 180 | 184 | 59 | 6.14 | 55 | 567 | 227 | 6.17 | CLC |
| | PE* | 61 | 662 | 186 | 6.15 | 52 | 662 | 208 | 6.16 | SPAdes |
| | 454 | 74 | 326 | 133 | 6.14 | — | — | — | — | Newbler |
| | PE-454 | 54 | 801 | 183 | 6.15 | — | — | — | — | Newbler |
| | PE-MP | 50 | 661 | 240 | 6.20 | 45 | 661 | 333 | 6.20 | SPAdes |
| | PE-MP* | 44 | 472 | 229 | 6.16 | 4 | 6208 | 6208 | 6.21 | ALLPATHS-LG |
| BT03 | PE | 690 | 155 | 29 | 10.64 | 422 | 295 | 63 | 10.77 | CLC |
| | PE* | 397 | 363 | 80 | 10.82 | 386 | 363 | 85 | 10.83 | SPAdes |
| | 454 | 305 | 344 | 59 | 10.75 | — | — | — | — | Newbler |
| | Pacbio-454 | 235 | 565 | 99 | 11.40 | — | — | — | — | PBcR |
| | PE-454 | 315 | 344 | 70 | 10.82 | — | — | — | — | Newbler |
| | PE-MP | 806 | 240 | 59 | 10.95 | 457 | 1997 | 1161 | 11.04 | MaSuRCA |
| | PE-MP | 362 | 364 | 77 | 11.16 | 355 | 364 | 85 | 11.17 | SPAdes |
Note: *Marks the optimal assembly statistics for a particular combination of library types when that combination was assembled by more than one assembler. The best assembly is shown in bold.
Hybrid assemblies whose statistics were worse than those of the PE assemblies are not included in the table above. The complete table of de novo and hybrid assemblies is available in Supplementary Table S3.
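N50 is reported in the assembly table for both contigs and scaffolds. For reference, here is a minimal sketch of the commonly used definition: the length of the sequence at which the running sum of lengths, taken from longest to shortest, first reaches half of the total assembly size. The source does not say which tool produced its N50 values, so the function below is illustrative only.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half the total is the N50.
        if running >= half_total:
            return length
    raise AssertionError("unreachable: running sum always reaches half the total")


# Toy lengths (not from the study): total 20, half 10; 6 + 5 = 11 reaches it.
print(n50([2, 3, 4, 5, 6]))  # prints 5
```

A higher N50 with a similar total genome size generally indicates a more contiguous assembly, which is how the rows in the table can be compared.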
Fig. 1. Overview of 454 and Illumina hybrid assembly. Representation of the shredding approach used to generate the 454 and Illumina hybrid assembly.
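The caption refers to a shredding approach without giving its parameters. As a rough illustration of the general idea, the sketch below cuts an assembled contig into overlapping fragments that could then be passed to another assembler as pseudo-reads. The function name, fragment length, and overlap are hypothetical and are not taken from the paper.

```python
from typing import List


def shred(sequence: str, fragment_length: int, overlap: int) -> List[str]:
    """Cut a sequence into fragments of fragment_length that overlap by `overlap` bases."""
    if fragment_length <= 0:
        raise ValueError("fragment_length must be positive")
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be in [0, fragment_length)")
    step = fragment_length - overlap
    fragments = []
    for start in range(0, len(sequence), step):
        fragments.append(sequence[start:start + fragment_length])
        # Stop once a fragment reaches the end of the sequence.
        if start + fragment_length >= len(sequence):
            break
    return fragments


# Toy example with made-up parameters.
print(shred("ABCDEFGHIJ", fragment_length=4, overlap=2))
# prints ['ABCD', 'CDEF', 'EFGH', 'GHIJ']
```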
Summary of PBJelly gap-filling results
| Description | BT03 | CF080 | GM41 |
|---|---|---|---|
| Number of gaps (a) | 96 | 7 | 5 |
| Total gap length (bp) (a) | 195,912 | 2,880 | 3,475 |
| Number of gaps (b) | 26 | 2 | 3 |
| Total gap length (bp) (b) | 70,100 | 30 | 232 |

Note: (a) Gap statistics for the best scaffold assembly.
(b) Gap statistics after application of the PBJelly algorithm.
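To make the effect of gap filling easier to read off, the following sketch computes how many gaps were closed and by what percentage the total gap length shrank for each strain. It assumes the first pair of rows in the table describes the best scaffold assembly before PBJelly and the second pair describes the result after PBJelly, as the two footnotes suggest. The variable and function names are illustrative.

```python
# (number of gaps, total gap length in bp), copied from the table above.
before_pbjelly = {"BT03": (96, 195_912), "CF080": (7, 2_880), "GM41": (5, 3_475)}
after_pbjelly = {"BT03": (26, 70_100), "CF080": (2, 30), "GM41": (3, 232)}


def gap_summary(before: dict, after: dict) -> dict:
    """Return gaps closed and percent reduction in gap length for each strain."""
    summary = {}
    for strain, (gaps_before, length_before) in before.items():
        gaps_after, length_after = after[strain]
        summary[strain] = {
            "gaps_closed": gaps_before - gaps_after,
            "length_reduction_pct": 100 * (length_before - length_after) / length_before,
        }
    return summary


summary = gap_summary(before_pbjelly, after_pbjelly)
for strain, stats in summary.items():
    print(f"{strain}: {stats['gaps_closed']} gaps closed, "
          f"{stats['length_reduction_pct']:.1f}% less gap sequence")
```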
Comparison of Open Reading Frames (ORFs) predicted in draft and improved genome assemblies
| Strains | CF080 | BT03 | GM30 | GM41 |
|---|---|---|---|---|
| | 6684 | 10 056 | 5511 | 5975 |
| | 5819 | 9385 | 5424 | 5881 |
| No. of longer ORFs | 786 | 413 | 77 | 71 |
| No. of shorter ORFs | 64 | 205 | 10 | 15 |
| No. of new ORFs | 15 | 53 | 0 | 8 |
Note: (a) Total number of open reading frames predicted in the improved genome assembly by the Prodigal gene-calling algorithm.
(b) Number of open reading frames in the improved genome assemblies as compared with the draft assemblies.
Fig. 2. Alignment of predicted CF080 rDNA operons tested via PCR and Sanger sequencing. The operon names denote the corresponding assembly algorithm (ALLPATHS-LG is displayed as APLG) and contig ID. Alignment mismatches are highlighted in black and matches in grey. The identity of overlapping sequences is shown above the alignment as a colored bar; positions with 100% identity are in green and positions with lower identity are in yellow. The annotation and the genomic position are shown on the consensus sequence.