Veljo Kisand, Teresa Lettieri.
Abstract
BACKGROUND: De novo genome sequencing of previously uncharacterized microorganisms has the potential to open up new frontiers in microbial genomics by providing insight into both functional capabilities and biodiversity. Until recently, Roche 454 pyrosequencing was the NGS method of choice for de novo assembly because it generates hundreds of thousands of long reads (<450 bps), which are presumed to aid in the analysis of uncharacterized genomes. The tools for processing NGS data are increasingly free and open source, and are often adopted both for their high quality and for their role in promoting academic freedom.
Year: 2013 PMID: 23547799 PMCID: PMC3618134 DOI: 10.1186/1471-2164-14-211
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Reference mapping results of type strain SK2 using pyrosequencing reads and comparison with assembly (MIRA3)
| | Nucleotides covered | Nucleotides not mapped | | | | Deletions |
| MOSAIK | 3 099 937 | 20 206 | 1 535 | 2 | 10 225 | 7 733 |
| NEWBLER | 3 104 799 | 15 344 | 1 544 | 0 | 11 493 | 10 346 |
| MIRA3 | 3 119 125 | 1 018 | 905 | 2 | 1 867 | 5 187 |
| | 3 079 251 | 40 892 | 323 | 9 814 | 1 491 | 1 590 |
The difference between NC_008260 (3,120,143 bps) and the number of nucleotides covered corresponds to the total length of the genome not mapped. Disagreements are indicated as conflicting positions in the consensus sequence between reference genome and mapped reads. Ns – number of fully ambiguous nucleotides within mapped regions.
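The relationship stated in this note can be checked with a few lines of Python. The sketch below is illustrative only: it takes the reference length and the "nucleotides covered" values for the three named programs from the table, and the function and variable names are assumptions introduced here, not part of the original analysis.

```python
# Reference genome length as stated in the table note (NC_008260).
REFERENCE_LENGTH_BP = 3_120_143

# Nucleotides covered, copied from the table rows that carry a program name.
covered_bp = {
    "MOSAIK": 3_099_937,
    "NEWBLER": 3_104_799,
    "MIRA3": 3_119_125,
}


def unmapped_length(covered: int, reference_length: int = REFERENCE_LENGTH_BP) -> int:
    """Return the number of reference bases not covered by mapped reads."""
    if not 0 <= covered <= reference_length:
        raise ValueError("covered must lie between 0 and the reference length")
    return reference_length - covered


# Hypothetical helper dictionaries for illustration.
unmapped_bp = {name: unmapped_length(n) for name, n in covered_bp.items()}
coverage_pct = {name: 100 * n / REFERENCE_LENGTH_BP for name, n in covered_bp.items()}

for name in covered_bp:
    print(f"{name}: {unmapped_bp[name]} bp not mapped, {coverage_pct[name]:.2f}% covered")
```

The computed differences (20 206, 15 344 and 1 018 bps) match the second value listed for each program in the table.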
Figure 1. Venn diagrams of matching (based on 100% similarity) CDS in the re-sequenced SK2 genome when annotated by IMG, PGAAP and RAST and compared to re-annotated CDS of the finished SK2 genome downloaded as a raw sequence from GenBank. a – annotations based on genome assembly using reference mapping by MIRA3; b – annotations based on de novo genome assembly by MIRA3. Numbers represent the count of CDS annotated by the different annotation pipelines (IMG, RAST and PGAAP), while SK2 denotes CDS from re-annotation of NC_008260 in GenBank.
Figure 2. Comparison between the genome sequence of NC_008260 in GenBank and the genome sequence of the re-sequenced strain SK2, obtained after reference mapping or after de novo assembly. Numbered labels indicate exact identity in percentages.
Annotation of assemblies
| Strain | | | | | |
| SK2 | Total COGs | | 2050 | 2111 | 2113 |
| | Missing COGs | | 116 | 82 | 80 |
| | Proportion of missing COGs | | 8.1 | 5.8 | 5.6 |
| | Predicted genome size, Mbs | 3.4 | | | |
| 209 | Total COGs | | 1383 | 1449 | 1476 |
| | Missing COGs | | 116 | 80 | 66 |
| | Proportion of missing COGs | | 10.5 | 7.3 | 6.0 |
| | Predicted genome size, Mbs | 2.4 | | | |
| C103-3 | Total COGs | | 2123 | 2212 | 2256 |
| | Missing COGs | | 203 | 154 | 142 |
| | Proportion of missing COGs | | 14.8 | 11.3 | 10.4 |
| | Predicted genome size, Mbs | 4.2 | | | |
| 320 | Total COGs | | 2964 | 3079 | 3072 |
| | Missing COGs | | 132 | 95 | 97 |
| | Proportion of missing COGs | | 7.9 | 5.7 | 5.8 |
| | Predicted genome size, Mbs | 4.8 | | | |
SK2 – Alcanivorax borkumensis SK2; 209 – Flavobacterium sp. GOBB3-209; C103-3 – Flavobacterium sp. GOBB3-C103-3; 320 – Marinomonas sp. GOBB3-320. Genome size is given as Newbler default prediction. Missing COGs are COGs that are absent from the annotation compared to NC_008260 annotation in GenBank.