Mathias C Walter, Caroline Öhrman, Mats Forsman, Dimitrios Frangoulidis, Kerstin Myrtennäs, Andreas Sjödin, Mona Byström, Pär Larsson, Anna Macellaro.
Abstract
We present the whole genome sequence and annotation of the Coxiella burnetii strain Namibia. This strain was isolated from an aborting goat in 1991 in Windhoek, Namibia. The plasmid type QpRS was confirmed in our work. Further genomic typing placed the strain into a unique genomic group. The genome sequence is 2,101,438 bp long and contains 1,979 protein-coding and 51 RNA genes, including one rRNA operon. To overcome the poor yield from cell culture systems, an additional DNA enrichment with whole genome amplification (WGA) methods was applied. We describe a bioinformatics pipeline for improved genome assembly including several filters with a special focus on WGA characteristics.
Keywords: Annotation; Assembly; Coxiella burnetii; Next generation sequencing (NGS); Q fever; Whole genome amplification; Whole genome sequencing
Year: 2014 PMID: 25593636 PMCID: PMC4286197 DOI: 10.1186/1944-3277-9-22
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Classification and general features of strain Namibia according to the MIGS recommendations [33]
| MIGS ID | Property | Term | Evidence code |
| | Classification | Domain: | TAS [ |
| | | Phylum: | TAS [ |
| | | Class: | TAS [ |
| | | Order: | TAS [ |
| | | Family: | TAS [ |
| | | Genus: | TAS [ |
| | | Species: | TAS [ |
| | | Strain: Namibia | NAS |
| | Gram stain | Negative | TAS [ |
| | Cell shape | Coccobacillary rod | TAS [ |
| | Motility | None | TAS [ |
| | Sporulation | No* | TAS [ |
| | Temperature range | 35 – 37°C | TAS [ |
| | Optimum temperature | 37°C | TAS [ |
| | pH range; Optimum | 4.5-5.3; 4.5 | TAS [ |
| | Carbon source | Glutamate, citrate | TAS [ |
| MIGS-6 | Habitat | Intracellular, polyhostal; long persistence in the environment | TAS [ |
| MIGS-6.3 | Salinity | Unknown | NAS |
| MIGS-22 | Oxygen | Microaerophilic (2.5%) | TAS [ |
| MIGS-15 | Biotic relationship | Endosymbiont | NAS |
| MIGS-14 | Pathogenicity | highly pathogenic | TAS [ |
| MIGS-4 | Geographic location | Windhoek, Namibia | NAS |
| MIGS-5 | Sample collection | 1991 | NAS |
| MIGS-4.1 | Latitude | Unknown | NAS |
| MIGS-4.2 | Longitude | Unknown | NAS |
| MIGS-4.4 | Altitude | Unknown | NAS |
a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [45].
* A morphologically distinct variant with enhanced stability in harsh environmental conditions has been described (SCV = small cell variant).
Figure 1. Infected BGM cells displaying the typical intracellular vacuoles (400× Hoffman modulation contrast image; E. Schröpfer and D. Frangoulidis).
Figure 2. Phylogenetic tree highlighting the position of strain Namibia (shown in bold) relative to the other strains with whole genome sequences available. The average linkage (UPGMA) tree was inferred from 5,010 aligned positions of conserved blocks (determined using Gblocks [27]) of the rRNA operon sequences using the Jukes & Cantor model, calculated with the R packages ape [28] and phangorn [29]. Bootstrap values (expressed as percentages of 1,000 replicates) are shown at branch points. Based on a BLAST [30] search with the rRNA operon sequence against bacterial genomes in the National Center for Biotechnology Information (NCBI) non-redundant (nr) database [31], the closest related species is currently Thioalkalivibrio sulfidophilus HL-EbGr7 [32], a species commonly isolated from soda lakes; it was used as the outgroup to root the tree.
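The Jukes & Cantor model named in the caption converts the observed proportion of differing sites, p, into an evolutionary distance d = -(3/4) ln(1 - (4/3) p). The tree itself was computed with the R packages ape and phangorn; the Python below is only a minimal sketch of the distance formula, and the function name and the choice to skip gapped or ambiguous columns are assumptions made for illustration.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor distance between two aligned nucleotide sequences.

    Columns where either sequence has a gap or an ambiguous base are
    skipped (an assumption of this sketch, not a detail from the paper).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    valid = set("ACGT")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                mismatches += 1

    if compared == 0:
        raise ValueError("no comparable positions")

    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        # The correction is undefined once sequences are saturated.
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

A pairwise matrix of such distances is the kind of input a UPGMA clustering step takes. For small p the corrected distance is only slightly larger than p itself.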
Project information
| MIGS ID | Property | Term |
| MIGS-31 | Finishing quality | Improved high-quality draft |
| MIGS-28 | Libraries used | Nextera DNA Sample Prep Kit |
| MIGS-29 | Sequencing platforms | Illumina MiSeq, 2 × 150 bp paired-end |
| MIGS-31.2 | Fold coverage | 91x |
| MIGS-30 | Assemblers | SPAdes, IDBA |
| MIGS-32 | Gene calling method | Prodigal, GeneMarkS, Glimmer |
| | Locus Tag | CBNA |
| | NCBI Taxonomy ID | 1321945 |
| | Genbank ID | CP007555, CP007556 |
| | Genbank Date of Release | October 16, 2014 |
| | GOLD ID | Gi0055848 |
| | BIOPROJECT | PRJNA197124 |
| | Project relevance | Medical, bioforensic, evolution |
| MIGS-13 | Source Material Identifier | SAMN02045684 |
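Fold coverage is conventionally the total number of sequenced bases divided by the genome size. As a rough back-of-the-envelope sketch, the figures in the table can be turned into an approximate read count. This assumes every read is a full 150 bases and contributes to the assembly, which is an idealization; the actual number of reads is not stated here, and the function names are hypothetical.

```python
import math

GENOME_SIZE_BP = 2_101_438  # genome size reported for strain Namibia
FOLD_COVERAGE = 91          # MIGS-31.2
READ_LENGTH_BP = 150        # 2 x 150 paired-end reads


def bases_for_coverage(genome_size_bp: int, fold_coverage: int) -> int:
    """Total sequenced bases implied by a given fold coverage."""
    return genome_size_bp * fold_coverage


def reads_for_coverage(genome_size_bp: int, fold_coverage: int,
                       read_length_bp: int) -> int:
    """Minimum number of full-length reads needed to reach that coverage."""
    total = bases_for_coverage(genome_size_bp, fold_coverage)
    return math.ceil(total / read_length_bp)


total_bases = bases_for_coverage(GENOME_SIZE_BP, FOLD_COVERAGE)
total_reads = reads_for_coverage(GENOME_SIZE_BP, FOLD_COVERAGE, READ_LENGTH_BP)
read_pairs = math.ceil(total_reads / 2)  # paired-end: two reads per fragment

if __name__ == "__main__":
    print(f"bases: {total_bases:,}  reads: {total_reads:,}  pairs: {read_pairs:,}")
```

Under these assumptions, 91x coverage of this genome corresponds to roughly 1.3 million reads.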
Summary of genome: one chromosome and one plasmid
| Label | Size (Mb) | Topology | INSDC identifier |
| Chromosome | 2.06 | circular | CP007555 |
| Plasmid | 0.04 | circular | CP007556 |
Genome statistics
| Attribute | Value | % of total |
| Genome size (bp) | 2,101,438 | 100.00 |
| DNA coding (bp) | 1,788,283 | 82.88 |
| DNA G + C (bp) | 865,056 | 41.16 |
| DNA scaffolds | 2 | 100.00 |
| Total genes | 2,030 | 100.00 |
| Protein coding genes | 1,979 | 97.49 |
| RNA genes | 51 | 2.51 |
| Pseudo genes | 98 | 4.83 |
| Genes in internal clusters | 21 | 1.03 |
| Genes with function prediction | 1,309 | 66.14 |
| Genes assigned to COGs | 1,294 | 65.39 |
| Genes with Pfam domains | 1,424 | 71.96 |
| Genes with signal peptides | 263 | 13.29 |
| Genes with transmembrane helices | 473 | 23.90 |
| CRISPR repeats | NA |
a The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome.
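The percentages in the table can be recomputed from the counts. The sketch below does this for most rows; the denominator chosen for each row is inferred from the listed values and is an assumption, not something the authors state row by row. Several gene-count rows match the total gene count (2,030) rather than the protein-coding count. The DNA coding row is left out because the denominator behind its listed percentage is not clear from the table.

```python
# Values copied from the genome statistics table.
GENOME_SIZE_BP = 2_101_438
TOTAL_GENES = 2_030
PROTEIN_CODING_GENES = 1_979

# (attribute, count, assumed denominator, percentage listed in the table)
ROWS = [
    ("DNA G + C (bp)", 865_056, GENOME_SIZE_BP, 41.16),
    ("Protein coding genes", 1_979, TOTAL_GENES, 97.49),
    ("RNA genes", 51, TOTAL_GENES, 2.51),
    ("Pseudo genes", 98, TOTAL_GENES, 4.83),
    ("Genes in internal clusters", 21, TOTAL_GENES, 1.03),
    ("Genes with function prediction", 1_309, PROTEIN_CODING_GENES, 66.14),
    ("Genes assigned to COGs", 1_294, PROTEIN_CODING_GENES, 65.39),
    ("Genes with Pfam domains", 1_424, PROTEIN_CODING_GENES, 71.96),
    ("Genes with signal peptides", 263, PROTEIN_CODING_GENES, 13.29),
    ("Genes with transmembrane helices", 473, PROTEIN_CODING_GENES, 23.90),
]


def percent_of(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to two decimals."""
    return round(100.0 * count / total, 2)


def check_rows(rows=ROWS):
    """Return (attribute, recomputed %, listed %, matches) for every row."""
    results = []
    for label, count, total, listed in rows:
        computed = percent_of(count, total)
        results.append((label, computed, listed, abs(computed - listed) < 1e-9))
    return results


if __name__ == "__main__":
    for label, computed, listed, ok in check_rows():
        print(f"{label:34s} {computed:6.2f} {listed:6.2f} {'ok' if ok else 'MISMATCH'}")
```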
Number of genes associated with general COG functional categories
| Code | Value | % of total | Description |
| J | 135 | 6.82 | Translation, ribosomal structure and biogenesis |
| A | 1 | 0.05 | RNA processing and modification |
| K | 50 | 2.53 | Transcription |
| L | 90 | 4.55 | Replication, recombination and repair |
| B | 0 | 0.00 | Chromatin structure and dynamics |
| D | 28 | 1.41 | Cell cycle control, cell division, chromosome partitioning |
| V | 23 | 1.16 | Defense mechanisms |
| T | 40 | 2.02 | Signal transduction mechanisms |
| M | 120 | 6.06 | Cell wall/membrane biogenesis |
| N | 12 | 0.61 | Cell motility |
| U | 37 | 1.87 | Intracellular trafficking and secretion |
| O | 59 | 2.98 | Posttranslational modification, protein turnover, chaperones |
| C | 89 | 4.50 | Energy production and conversion |
| G | 74 | 3.74 | Carbohydrate transport and metabolism |
| E | 105 | 5.31 | Amino acid transport and metabolism |
| F | 46 | 2.32 | Nucleotide transport and metabolism |
| H | 93 | 4.70 | Coenzyme transport and metabolism |
| I | 62 | 3.13 | Lipid transport and metabolism |
| P | 47 | 2.37 | Inorganic ion transport and metabolism |
| Q | 32 | 1.62 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 155 | 7.83 | General function prediction only |
| S | 104 | 5.26 | Function unknown |
| - | 685 | 34.61 | Not in COGs |
The total is based on the total number of protein coding genes in the genome.
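As a consistency check on the table, the short sketch below recomputes each percentage against the 1,979 protein-coding genes and sums the per-category counts. The counts are copied from the table; the variable names are illustrative. The category counts add up to more than the 1,294 genes assigned to COGs in the genome statistics, which suggests that some genes are counted in more than one functional category, so the percentages need not sum to 100.

```python
PROTEIN_CODING_GENES = 1_979
GENES_ASSIGNED_TO_COGS = 1_294  # from the genome statistics table

# COG code -> (gene count, percentage listed in the table)
COG_ROWS = {
    "J": (135, 6.82), "A": (1, 0.05), "K": (50, 2.53), "L": (90, 4.55),
    "B": (0, 0.00), "D": (28, 1.41), "V": (23, 1.16), "T": (40, 2.02),
    "M": (120, 6.06), "N": (12, 0.61), "U": (37, 1.87), "O": (59, 2.98),
    "C": (89, 4.50), "G": (74, 3.74), "E": (105, 5.31), "F": (46, 2.32),
    "H": (93, 4.70), "I": (62, 3.13), "P": (47, 2.37), "Q": (32, 1.62),
    "R": (155, 7.83), "S": (104, 5.26),
}
NOT_IN_COGS = (685, 34.61)


def percent_of(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Percentage of protein-coding genes, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Sum of per-category counts; a gene in two categories is counted twice.
category_total = sum(count for count, _ in COG_ROWS.values())

# Codes whose recomputed percentage differs from the listed one.
mismatches = [
    code for code, (count, listed) in COG_ROWS.items()
    if abs(percent_of(count) - listed) > 1e-9
]

if __name__ == "__main__":
    print("sum of category counts:", category_total)
    print("genes assigned to COGs:", GENES_ASSIGNED_TO_COGS)
    print("rows that do not match:", mismatches)
```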
Figure 3. Graphical circular map of the chromosome. From outside to the center: genes on forward strand (colored by ‘with function prediction’ turquoise or hypothetical magenta), genes on reverse strand (color scheme is the same as on forward strand), pseudogenes (blue), insertion elements (orange), gaps (gray), RNA genes (tRNAs green, rRNAs red), GC content, GC skew.
Figure 4. Graphical circular map of the plasmid. From outside to the center: genes on forward strand (colored by ‘with function prediction’ turquoise or hypothetical magenta), genes on reverse strand (color scheme is the same as on forward strand), pseudogenes (blue), GC content, GC skew.