Tina Strobel, Arwa Al-Dilaimi, Jochen Blom, Arne Gessner, Jörn Kalinowski, Marta Luzhetska, Alfred Pühler, Rafael Szczepanowski, Andreas Bechthold, Christian Rückert.
Abstract
BACKGROUND: The genus Saccharothrix is a representative of the family Pseudonocardiaceae, known to include producer strains of a wide variety of potent antibiotics. Saccharothrix espanaensis produces both saccharomicins A and B of the promising new class of heptadecaglycoside antibiotics, active against both bacteria and yeast.
Year: 2012 PMID: 22958348 PMCID: PMC3469384 DOI: 10.1186/1471-2164-13-465
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
General genome statistics and comparison of the completely sequenced Pseudonocardiaceae
| Chromosome size | 10,236,715 | 8,248,144 | 7,096,571 | 8,212,805 | 9,360,653 | 4,308,349 | 4,189,976 |
| G + C content [%] | 71.30 | 73.71 | 73.31 | 71.29 | 72.19 | 67.32 | 72.43 |
| Chromosomal CDS | 9,228 | 7,100 | 6,495 | 7,198 | 8,427 | 3,906 | 3,596 |
| rRNA operons | 4 | 5 | 3 | 4 | 4 | 3 | 3 |
| tRNAs | 52 | 74 | 47 | 50 | 80 | 64 | 63 |
| Plasmids | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
| Reference | [ | [ | [ | [ | this study | [ | [ |
Figure 1 Schematic representation of the genome. The genome scale is given in kilobases from the start of dnaA. The two outermost circles show all genes on the forward and the reverse strand, respectively, color-coded according to their predicted COG classes. The next five circles represent the genes of S. espanaensis, color-coded according to their conservation in the genomes of the other completely sequenced Pseudonocardiaceae: green denotes genes present in the core genome, red denotes genes conserved at least between the two compared species, and light blue indicates singletons. The comparison with S. espanaensis was done (from the outside in) with A. mirum, A. mediterranei, S. erythraea, S. viridis and T. bispora. The last two circles represent G + C content and G + C skew ((G - C)/(G + C)), both calculated for a 500 bp window with 100 bp stepping.
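The window-based calculation behind the two innermost circles can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `gc_windows` is hypothetical, windows are treated as linear (no wrap-around at the end of a circular chromosome), and a window without any G or C is assigned a skew of 0 by assumption.

```python
def gc_windows(seq, window=500, step=100):
    """Yield (start, gc_content, gc_skew) for every full window of seq.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C)
    Assumption: linear sequence; the last partial window is dropped.
    """
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc = g + c
        content = gc / window
        skew = (g - c) / gc if gc else 0.0  # assumed convention for empty G+C
        yield start, content, skew


# Toy sequence (not real genome data): 500 G followed by 500 C.
toy = "G" * 500 + "C" * 500
for start, content, skew in gc_windows(toy):
    print(start, round(content, 2), round(skew, 2))
```

On the toy sequence the skew falls from 1.0 to -1.0 as the window slides from the G-rich half into the C-rich half, which is the kind of sign change a skew plot makes visible.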
Number of genes associated with the general eggNOG functional categories
| Code | Genes | % of CDS | Functional category |
| J | 166 | 1.97 | Translation |
| A | 1 | 0.01 | RNA processing and modification |
| K | 475 | 5.64 | Transcription |
| L | 180 | 2.14 | Replication |
| B | 0 | 0.00 | Chromatin structure and dynamics |
| D | 21 | 0.25 | Cell cycle control, cell division, chromosome partitioning |
| Y | 0 | 0.00 | Nuclear structure |
| V | 112 | 1.33 | Defense mechanisms |
| T | 190 | 2.26 | Signal transduction mechanisms |
| M | 161 | 1.91 | Cell wall/membrane/envelope biogenesis |
| N | 0 | 0.00 | Cell motility |
| Z | 0 | 0.00 | Cytoskeleton |
| W | 0 | 0.00 | Extracellular structures |
| U | 27 | 0.32 | Intracellular trafficking, secretion, and vesicular transport |
| O | 142 | 1.69 | Posttranslational modification, protein turnover, chaperones |
| C | 308 | 3.66 | Energy production and conversion |
| G | 213 | 2.53 | Carbohydrate transport and metabolism |
| E | 376 | 4.46 | Amino acid transport and metabolism |
| F | 86 | 1.02 | Nucleotide transport and metabolism |
| H | 163 | 1.93 | Coenzyme transport and metabolism |
| I | 167 | 1.98 | Lipid transport and metabolism |
| P | 242 | 2.87 | Inorganic ion transport and metabolism |
| Q | 171 | 2.03 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 449 | 5.33 | General function prediction only |
| S | 737 | 8.75 | Function unknown |
| - | 4,037 | 47.92 | Not hit in eggNOG |
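The percentage column appears to be the gene count divided by the 8,427 chromosomal CDS listed for this study in the genome statistics table. That denominator is an inference from the numbers, not something the table states, so the sketch below only checks it on a few rows; `percent_of_cds` is a hypothetical helper name.

```python
# Assumed denominator: chromosomal CDS reported for this study (8,427).
CHROMOSOMAL_CDS = 8427


def percent_of_cds(count, total=CHROMOSOMAL_CDS):
    """Share of all CDS assigned to one category, in percent (two decimals)."""
    return round(100 * count / total, 2)


# Gene counts copied from the table for three categories.
counts = {"J": 166, "K": 475, "S": 737}
for code, n in counts.items():
    print(code, percent_of_cds(n))
```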
Figure 2 Phylogenetic distribution of proteins based on BlastP hits against the RefSeq database. The amino acid sequences of all predicted CDS in the genome of S. espanaensis were compared against the RefSeq protein database [22] (from August 2011) using BLASTP. The species for each best hit (e-value cutoff 1e-10, hit must cover at least 75% of query and subject) was retrieved and the results were plotted from the least to the most abundantly hit group in the respective taxonomic level. For reasons of clarity, groups with few hits were either lumped together (e.g. under "Other Bacteria") or omitted entirely.
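The best-hit filtering described in this caption can be sketched as below. The `Hit` record, its field names, and the choice of the highest bit score as "best" are assumptions made for illustration; the caption only fixes the e-value cutoff (1e-10) and the 75% coverage requirement for both query and subject.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one BLASTP alignment (1-based, inclusive coordinates).
    query: str
    species: str
    evalue: float
    bitscore: float
    q_start: int
    q_end: int
    q_len: int
    s_start: int
    s_end: int
    s_len: int


def passes(hit, max_evalue=1e-10, min_cov=0.75):
    """True if the hit meets the e-value cutoff and covers >= 75% of both sequences."""
    q_cov = (hit.q_end - hit.q_start + 1) / hit.q_len
    s_cov = (hit.s_end - hit.s_start + 1) / hit.s_len
    return hit.evalue <= max_evalue and q_cov >= min_cov and s_cov >= min_cov


def best_hit_species(hits):
    """Count, per species, how many queries have their best passing hit there."""
    best = {}
    for hit in hits:
        if not passes(hit):
            continue
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:  # assumed tie-break
            best[hit.query] = hit
    counts = Counter(h.species for h in best.values())
    # Sorted from the least to the most abundantly hit group, as in the figure.
    return sorted(counts.items(), key=lambda kv: kv[1])


# Toy hits (invented values, not from the study).
hits = [
    Hit("cds1", "A. mirum", 1e-50, 300.0, 1, 95, 100, 1, 95, 100),
    Hit("cds1", "S. erythraea", 1e-40, 250.0, 1, 95, 100, 1, 95, 100),
    Hit("cds2", "A. mirum", 1e-30, 200.0, 1, 90, 100, 5, 94, 100),
    Hit("cds3", "S. erythraea", 1e-20, 150.0, 1, 50, 100, 1, 50, 100),  # coverage too low
]
print(best_hit_species(hits))
```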
Figure 3 Whole genome comparison. To analyze gene synteny, the amino acid sequences of all predicted CDS in the genome of S. espanaensis were compared against those of (A) A. mirum (red) and S. erythraea (green) as well as A. mediterranei (purple) and S. viridis (orange) using the bidirectional BLAST comparison implemented in EDGAR. Aligning all genomes at dnaA, the position of each potential ortholog was then plotted against the position in the S. espanaensis genome. In order to accommodate different genome sizes, the relative position is used for the target genomes.
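The coordinate handling described here (rotate every genome so that dnaA sits at position 0, then use relative positions for the target genomes) can be sketched as follows. Function names and the toy coordinates are hypothetical; the ortholog pairs themselves would come from the bidirectional BLAST comparison, which is not reproduced.

```python
def relative_position(gene_start, dnaA_start, genome_length):
    """Gene position as a fraction of the circular genome, measured from dnaA."""
    return ((gene_start - dnaA_start) % genome_length) / genome_length


def synteny_points(ortholog_pairs, ref_dnaA, ref_len, tgt_dnaA, tgt_len):
    """Return (x, y) points for a synteny plot.

    x: position in the reference genome in bp, counted from dnaA
    y: relative position (0..1) in the target genome, counted from dnaA
    """
    points = []
    for ref_start, tgt_start in ortholog_pairs:
        x = (ref_start - ref_dnaA) % ref_len
        y = relative_position(tgt_start, tgt_dnaA, tgt_len)
        points.append((x, y))
    return points


# Toy coordinates (invented): reference of 1,000 bp, target of 500 bp.
pairs = [(150, 300), (50, 25)]
print(synteny_points(pairs, ref_dnaA=100, ref_len=1000, tgt_dnaA=50, tgt_len=500))
```

The modulo step is what makes genes located upstream of dnaA wrap around to the end of the rotated coordinate system instead of receiving negative positions.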
Figure 4 Development of the core genome of the Pseudonocardiaceae. Using EDGAR, the development of the core genome of the Pseudonocardiaceae was extrapolated by calculating the mean core genome numbers for all possible permutations of genomes (red crosses/line). By non-linear least squares curve fitting, an exponential decay function (dark blue curve and equation) was fitted to the mean core data. A 95% confidence interval was calculated for the fitted model, and the boundaries are displayed (light blue and purple curves). Using the genomes of A. mirum, A. mediterranei, P. dioxanivorans, S. erythraea, S. espanaensis, S. viridis, and T. bispora, a final core genome of approximately 810 genes is predicted, with the current core of the seven analyzed species consisting of 864 genes.
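The "mean core genome over all permutations" step can be illustrated with a small sketch. It assumes each genome is represented as a set of ortholog-group identifiers, which is a simplification and not necessarily how EDGAR works internally; the exponential decay fit and its confidence interval are not shown.

```python
from itertools import permutations
from statistics import mean


def mean_core_sizes(genomes):
    """Mean core genome size after adding 1, 2, ..., n genomes.

    genomes: dict mapping a genome name to a set of ortholog-group ids
    (hypothetical representation). Every ordering of the genomes is
    evaluated, so this is only practical for a handful of genomes.
    """
    names = list(genomes)
    sizes = [[] for _ in names]
    for order in permutations(names):
        core = set(genomes[order[0]])
        for i, name in enumerate(order):
            core &= genomes[name]  # keep only groups shared by all genomes so far
            sizes[i].append(len(core))
    return [mean(s) for s in sizes]


# Toy data (invented group ids, not from the study).
toy = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 5}, "C": {1, 2, 6}}
print(mean_core_sizes(toy))
```

The resulting series is non-increasing, and its last value is the core of all genomes considered; a decay curve fitted to such a series is what allows extrapolation to a predicted final core size.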
Figure 5 Principal component analysis of the dinucleotide frequencies of the CDS. (A) Using EDGAR, all CDS from S. espanaensis were divided into three groups: "core" (conserved in all six completely sequenced Pseudonocardiaceae; blue "*"), "other" (shared between S. espanaensis and at least one other Pseudonocardiaceae species; green "x") and singletons ("unique" in S. espanaensis; red "+"). For all genes the relative dinucleotide frequencies were calculated, a PCA was performed using the R package and the results for the two main components are plotted. In addition, the median values for all three distributions were calculated and plotted. (B) Using the same calculation as in A, the genes were divided according to their position in the genome relative to the origin of replication. Genes close to oriC (corresponding to the "top half" of the genome) are shown as red "x"; genes closer to the terminus ("bottom half" of the genome) are shown as green "+". Median points are denoted as black "*" and "+"; green and black circles mark the 90% boundaries.
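The per-gene input to such a PCA is a 16-element vector of relative dinucleotide frequencies. A minimal sketch of that step is given below; it assumes overlapping dinucleotides counted on the coding strand only and skips pairs containing ambiguous bases, none of which the caption specifies. The PCA itself (done in R in the study) is not reproduced.

```python
from itertools import product

# The 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_frequencies(cds):
    """Relative frequency of each dinucleotide in one coding sequence.

    Assumptions: overlapping windows, given strand only, and pairs that
    contain anything other than A, C, G or T are ignored.
    """
    cds = cds.upper()
    counts = dict.fromkeys(DINUCS, 0)
    total = 0
    for i in range(len(cds) - 1):
        pair = cds[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[d] / total if total else 0.0 for d in DINUCS]


# Toy sequence: the overlapping pairs are AT, TG, GC.
freqs = dict(zip(DINUCS, dinucleotide_frequencies("ATGC")))
print({d: round(f, 3) for d, f in freqs.items() if f > 0})
```

Stacking one such vector per gene gives the genes-by-16 matrix on which the principal components would be computed.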
Figure 6 The saccharomicin gene cluster from S. espanaensis. (A) Chemical structures of caffeic acid, taurine, saccharomicin A and B. Fuc, d-fucose; Sac, d-saccharosamine; Rha, l-rhamnose; Eva, l-4-epivancosamine; Dig, l-digitoxose [6]. (B) Organization of the saccharomicin cluster. Proposed functions for individual CDS are summarized in Additional file 1.
Secondary metabolite cluster comparison of the completely sequenced Pseudonocardiaceae
| 4 | 10 | 3 | 5 | 1 | ||
| 4 | 6 | 7 | 4 | 4 | ||
| 2 | 0 | 1 | 0 | 2 | ||
| 10 | 11 | 4 | 2 | 3 | ||
| 7 | 3 | 5 | 6 | 5 | ||
| 2 | 2 | 2 | 0 | 2 | ||
| 1 | 1 | 0 | 0 | 3 |
Figure 7 Gene clusters for nonribosomal peptide and polyketide biosynthesis. Genes encoding nonribosomal peptide synthases are depicted in dark blue, type I polyketide synthases in red and type II polyketide synthases in orange. The genes involved in the synthesis of putative precursors are highlighted in light green. The remaining genes of the clusters are presented in pale blue. All genes involved in the biosynthesis of an enediyne core in cluster 6 are framed brown. aao, l-amino-acid oxidase; abc, ABC transporter; acc, acyl-CoA carboxylase; acd, acyl-CoA dehydrogenase; acp, acyl carrier protein; acs, acyl-CoA synthetase; act, acyl-CoA transferase; amo, amine oxidase; amt, aminotransferase; ap, aminopeptidase; ask, adenylylsulfate kinase; asl, AMP-dependent synthetase and ligase; ass, sulfate adenylyltransferase; at, acyl transferase; bh, beta-hydroxylase; cbs, carbamoyltransferase; cd, cysteine desulfurase; cho, cholesterol oxidase; cl, chlorinating protein; clf, chain length factor; ct, carboxyltransferase; cys, cysteine synthase; dbp, DNA-binding protein; dc, decarboxylase; dgb, glyoxalase/bleomycin resistance protein/dioxygenase; dh, dehydratase; dhbas, protein involved in the synthesis of activated 2,3-dihydroxybenzoic acid; dhg, dehydrogenase; e/l, esterase/lipase; eci, enoyl-CoA hydratase/isomerase; eff, efflux protein; gsit, glutamine-scyllo-inositol transaminase; gt, glycosyltransferase; hal, histidine ammonia-lyase; hmacps, protein involved in the synthesis of hydroxymalonyl-ACP; hmbppr, 4-hydroxy-3-methylbut-2-enyl diphosphate reductase; hmg, hydroxymethylglutaryl-CoA synthase; hpah, 4-hydroxyphenylacetate-3-hydroxylase; hyd, hydrolase; int, integrase; kr, ketoreductase; ks II, FabF-like protein; ks III, FabH-like protein; lam, lysine 2,3-aminomutase like protein; llp, lipolytic protein; lys, protein involved in lysine synthesis via alpha-aminoadipate; mfs, transporter of the major facilitator superfamily; mmcd, methylmalonyl-CoA decarboxylase; mo, monooxygenase; mt, methyltransferase; mtr, methionyl-tRNA synthetase; npd, 2-nitropropane dioxygenase; ocd, ornithine cyclodeaminase; oxy, oxidoreductase; p450, cytochrome P450; phas, polyhydroxy alkanoic acid synthase; pkc, polyketide cyclase; ppph, 2-polyprenylphenol 6-hydroxylase; pro, protease; reg, regulatory protein; rsam, radical SAM protein; sarp, Streptomyces antibiotic regulatory protein; sip, siderophore-interacting protein; tcd, taurine catabolism dioxygenase; te, thioesterase; tetr, protein similar to the tetracycline repressor; tk, transketolase; tn, transposase.