Yue-Hong Wu, Hong Cheng, Ying-Yi Huo, Lin Xu, Qian Liu, Chun-Sheng Wang, Xue-Wei Xu.
Abstract
Croceicoccus marinus E4A9T was isolated from deep-sea sediment collected from the East Pacific polymetallic nodule area. The strain is able to produce esterase, an enzyme widely used in the food, perfume, cosmetic, chemical, agricultural and pharmaceutical industries. Here we describe the characteristics of strain E4A9T, including the genome sequence and annotation, the presence of esterases, and the metabolic pathways of the organism. The genome of strain E4A9T comprises 4,109,188 bp, with one chromosome (3,001,363 bp) and two large circular plasmids (761,621 bp and 346,204 bp, respectively). The complete genome contains 3653 coding sequences, 48 tRNAs, two 16S-23S-5S rRNA operons and three ncRNAs. Strain E4A9T encodes 10 esterase-related genes, and three of the esterases (E3, E6 and E10) were successfully cloned and expressed in soluble form in Escherichia coli Rosetta, indicating their potential for application in the biotechnology industry. Moreover, the genome provides clues to the metabolic pathways of strain E4A9T, reflecting its adaptations to the ambient environment. The genome sequence of C. marinus E4A9T provides fundamental information for future studies.
Keywords: Alphaproteobacteria; Croceicoccus marinus E4A9T; Esterase; Genome sequence
Year: 2017 PMID: 29299108 PMCID: PMC5740743 DOI: 10.1186/s40793-017-0300-0
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Fig. 1 Transmission electron micrographs showing the cell morphology (a) and ultrastructure (b) of Croceicoccus marinus E4A9T. Flagella are present. Bars represent 0.5 μm (a) and 0.2 μm (b)
Fig. 2 Phylogenetic tree based on 16S rRNA gene sequences, constructed with the neighbor-joining algorithm. Related sequences were aligned with Clustal W. Evolutionary distances were calculated with the Kimura two-parameter model. Bootstrap values (> 60%) based on 1000 replications are shown at branch nodes. Filled circles indicate nodes that were also recovered in the trees generated with the maximum-likelihood and maximum-parsimony algorithms. Bar, 0.01 substitutions per nucleotide position
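The Kimura two-parameter distance named in the caption can be illustrated with a minimal sketch. It assumes two pre-aligned nucleotide sequences and the standard formula d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. The function name `k2p_distance` is hypothetical and is not taken from the authors' pipeline.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Illustrative helper (hypothetical name). Columns containing gaps or
    ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

A distance matrix built from such pairwise values is the usual input to neighbor-joining; tree construction itself is not sketched here.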
Classification and general features of Croceicoccus marinus E4A9T according to the MIGS recommendations [30]
| MIGS ID | Property | Term | Evidence code (a) |
|---|---|---|---|
| | Classification | Domain | TAS |
| | | Phylum | TAS |
| | | Class | TAS |
| | | Order | TAS |
| | | Family | TAS |
| | | Genus | TAS |
| | | Species | TAS |
| | Gram stain | Negative | TAS |
| | Cell shape | Coccus | TAS |
| | Motility | Motile | TAS |
| | Sporulation | Non-sporulation | TAS |
| | Temperature range | 4–42 °C | TAS |
| | Optimum temperature | 28–30 °C | TAS |
| | pH range; Optimum | 6.0–9.0; 7.0 | TAS |
| | Carbon source | Organic carbon | TAS |
| MIGS-6 | Habitat | Deep-sea sediment | TAS |
| MIGS-6.3 | Salinity | Moderately halophilic, 0.5–10% NaCl | TAS |
| MIGS-22 | Oxygen requirement | Aerobic | TAS |
| MIGS-15 | Biotic relationship | Free-living | TAS |
| MIGS-14 | Pathogenicity | Non-pathogen | NAS |
| MIGS-4 | Geographic location | East Pacific polymetallic nodule area | TAS |
| MIGS-5 | Sample collection | Not reported | |
| MIGS-4.1 | Latitude | 8°22′38″ N | TAS |
| MIGS-4.2 | Longitude | 145°23′56″ W | TAS |
| MIGS-4.4 | Altitude | −5280 m | TAS |
(a) Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [32]
Genome sequencing project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | 10 kb |
| MIGS-29 | Sequencing platforms | PacBio RS II |
| MIGS-31.2 | Fold coverage | 248-fold |
| MIGS-30 | Assemblers | HGAP Assembly version 2, Pacific Biosciences |
| MIGS-32 | Gene calling method | GeneMarkS+ (NCBI) |
| | Locus Tag | A9D14 |
| | GenBank ID | CP019602, CP019603, and CP019604 |
| | GenBank Date of Release | June 13, 2017 |
| | GOLD ID | Go0030822 |
| | BIOPROJECT | PRJNA322659 |
| MIGS-13 | Source Material Identifier | CGMCC (China General Microbiological Culture Collection) |
| | Project relevance | Esterase production |
Summary of genome: one chromosome and two plasmids
| Label | Size (Mb) | Topology | INSDC identifier | RefSeq ID |
|---|---|---|---|---|
| Chromosome | 3.001363 | Circular | CP019602.1 | NZ_CP019602.1 |
| Plasmid 1 (pCME4A9I) | 0.761621 | Circular | CP019603.1 | NZ_CP019603.1 |
| Plasmid 2 (pCME4A9II) | 0.346204 | Circular | CP019604.1 | NZ_CP019604.1 |
Fig. 3 Circular map of the chromosome (a), plasmid pCME4A9I (b) and plasmid pCME4A9II (c). From the outside to the center: CDSs and RNA genes on the forward strand (colored by COG categories), CDSs and RNA genes on the reverse strand (colored by COG categories), G + C content (peaks outside/inside the circle indicate values higher or lower than the average G + C content, respectively), GC skew (calculated as (G-C)/(G + C); green/purple peaks outside/inside the circle indicate values higher or lower than 0, respectively), genome size
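The GC skew track in the map follows the formula (G-C)/(G + C), which always lies between -1 and 1. A minimal sketch of how such a track could be computed over sliding windows is given below. The function names, window size and step are illustrative assumptions, not the settings used to draw the figure.

```python
def gc_skew(window: str) -> float:
    """GC skew (G - C) / (G + C) of one window; 0.0 if it has no G or C."""
    window = window.upper()
    g = window.count("G")
    c = window.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def sliding_gc_skew(sequence: str, window_size: int, step: int):
    """Return (start, skew) pairs for full-length windows along the sequence.

    window_size and step are hypothetical parameters chosen by the caller.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    return [
        (start, gc_skew(sequence[start:start + window_size]))
        for start in range(0, len(sequence) - window_size + 1, step)
    ]
```

For example, `sliding_gc_skew("GGCC", 2, 2)` yields one window with only G and one with only C, giving the two extreme values of the skew.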
Genome statistics of Croceicoccus marinus E4A9T
| Attribute | Value | % of Total |
|---|---|---|
| Genome size (bp) | 4,109,188 | 100 |
| DNA coding (bp) | 3,565,753 | 86.78 |
| DNA G + C (bp) | 2,650,881 | 64.51 |
| DNA scaffolds | 3 | – |
| Total genes | 3842 | 100 |
| Protein coding genes | 3653 | 95.08 |
| RNA genes | 57 | 1.48 |
| Pseudo genes | 132 | 3.44 |
| Genes in internal clusters | 517 | 13.46 |
| Genes with function prediction | 2699 | 70.25 |
| Genes assigned to COGs | 2827 | 73.58 |
| Genes with Pfam domains | 1566 | 40.76 |
| Genes with signal peptides | 304 | 7.91 |
| Genes with transmembrane helices | 755 | 19.65 |
| CRISPR repeats | 1 | 0.03 |
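The "% of Total" column can be recomputed from the counts in the table. The sketch below assumes that base-pair rows are divided by the genome size and gene rows by the total gene count. The variable and function names are illustrative, and only a few rows are shown.

```python
# Values copied from the genome summary and genome statistics tables.
REPLICON_SIZES_BP = {
    "chromosome": 3_001_363,
    "pCME4A9I": 761_621,
    "pCME4A9II": 346_204,
}
GENOME_SIZE_BP = 4_109_188
TOTAL_GENES = 3842


def percent(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# The three replicons should add up to the reported genome size.
genome_size_check = sum(REPLICON_SIZES_BP.values())

# Base-pair rows use the genome size as denominator.
coding_pct = percent(3_565_753, GENOME_SIZE_BP)
gc_pct = percent(2_650_881, GENOME_SIZE_BP)

# Gene rows use the total gene count as denominator.
gene_counts = {
    "Protein coding genes": 3653,
    "RNA genes": 57,
    "Pseudo genes": 132,
}
gene_pcts = {name: percent(n, TOTAL_GENES) for name, n in gene_counts.items()}
```

The three gene classes listed in `gene_counts` sum to the total gene count, so their percentages should sum to 100 up to rounding.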
Number of genes associated with general COG functional categories
| Code | Value | %age (a) | Description |
|---|---|---|---|
| J | 156 | 4.73 | Translation, ribosomal structure and biogenesis |
| A | – | – | RNA processing and modification |
| K | 190 | 5.76 | Transcription |
| L | 212 | 6.43 | Replication, recombination and repair |
| B | 1 | 0.03 | Chromatin structure and dynamics |
| D | 30 | 0.91 | Cell cycle control, Cell division, chromosome partitioning |
| V | 46 | 1.40 | Defense mechanisms |
| T | 168 | 5.10 | Signal transduction mechanisms |
| M | 193 | 5.86 | Cell wall/membrane biogenesis |
| N | 44 | 1.33 | Cell motility |
| U | 101 | 3.06 | Intracellular trafficking and secretion |
| O | 124 | 3.76 | Posttranslational modification, protein turnover, chaperones |
| C | 228 | 6.92 | Energy production and conversion |
| G | 187 | 5.67 | Carbohydrate transport and metabolism |
| E | 220 | 6.67 | Amino acid transport and metabolism |
| F | 64 | 1.94 | Nucleotide transport and metabolism |
| H | 146 | 4.43 | Coenzyme transport and metabolism |
| I | 199 | 6.04 | Lipid transport and metabolism |
| P | 174 | 5.28 | Inorganic ion transport and metabolism |
| Q | 111 | 3.37 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 413 | 12.53 | General function prediction only |
| S | 289 | 8.77 | Function unknown |
| – | 770 | 23.36 | Not in COGs |
(a) The total is based on the total number of protein coding genes in the genome
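The percentage column of the COG table can be checked with a short sketch. The counts are copied from the table; the function name `cog_percentages` is hypothetical. As an observation from the numbers alone, and not a statement by the authors, the listed percentages seem to be reproduced when the denominator is the sum of the category counts (excluding "Not in COGs") rather than the 3653 protein coding genes.

```python
# Counts per COG category, copied from the table ("Not in COGs" excluded).
cog_counts = {
    "J": 156, "K": 190, "L": 212, "B": 1, "D": 30, "V": 46, "T": 168,
    "M": 193, "N": 44, "U": 101, "O": 124, "C": 228, "G": 187, "E": 220,
    "F": 64, "H": 146, "I": 199, "P": 174, "Q": 111, "R": 413, "S": 289,
}
PROTEIN_CODING_GENES = 3653


def cog_percentages(counts: dict, denominator: int) -> dict:
    """Percentage of `denominator` per category, rounded to two decimals."""
    return {code: round(100 * n / denominator, 2) for code, n in counts.items()}


# Denominator 1: sum of all category assignments in the table.
assigned_total = sum(cog_counts.values())
by_assignments = cog_percentages(cog_counts, assigned_total)

# Denominator 2: number of protein coding genes, as stated in the footnote.
by_protein_genes = cog_percentages(cog_counts, PROTEIN_CODING_GENES)
```

Comparing `by_assignments` and `by_protein_genes` against the table row by row shows which denominator matches the published column.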
Fig. 4 Maximum-likelihood phylogenetic tree based on esterase amino acid sequences. Bootstrap values (> 60%) based on 1000 replications are shown at branch nodes