High quality draft genomic sequence of Flavihumibacter solisilvae 3-3(T).

Gang Zhou¹, Chong Chen¹, Che Ok Jeon², Gejiao Wang¹, Mingshun Li¹.

Abstract

Flavihumibacter solisilvae 3-3(T) (= KACC 17917(T) = JCM 19891(T)) represents a type strain of the genus Flavihumibacter within the family Chitinophagaceae. This strain can use a variety of sole carbon sources, making it applicable in industry and bioremediation. In this study, the draft genomic information of F. solisilvae 3-3(T) is described. F. solisilvae 3-3(T) has a genome of 5.41 Mbp with 47 % GC content and a total of 4,698 genes, including 4,215 protein coding genes, 439 pseudo genes and 44 RNA encoding genes. Analysis of its genome reveals a high correlation between the genotypes and the phenotypes.

Keywords: Flavihumibacter; Flavihumibacter solisilvae; Genomic information; Genotype; Phenotype

Year: 2015 PMID: 26388970 PMCID: PMC4575449 DOI: 10.1186/s40793-015-0037-6

Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277

Introduction

The genus Flavihumibacter was established in 2010 [1] and comprises three recognized species, represented by the type strains T41T [1], WS16T [2] and 3-3T [3], which were isolated from a subtropical rainforest soil, a shallow stream sediment and a forest soil, respectively. Flavihumibacter members are Gram-positive, rod-shaped, strictly aerobic, non-motile, yellow-pigmented bacteria. The strains all contain phosphatidylethanolamine as the major polar lipid, menaquinone-7 as the major respiratory quinone, and iso-C15:0 and iso-C15:1 G as the principal fatty acids. In addition, the strains are oxidase- and catalase-positive and have a G + C content of 45.9-49.5 mol% [1-3]. To the best of our knowledge, no genomic information on Flavihumibacter members has been available. In this study, we present the draft genome information of F. solisilvae 3-3T. A polyphasic taxonomic study revealed that F. solisilvae 3-3T could utilize 33 sole carbon substrates, including 11 saccharides and 22 organic acids and amino acids [3]. In particular, this strain could utilize the aromatic compound 4-hydroxyphenylacetic acid as a sole carbon source, making it applicable to environmental bioremediation [4-6]. In addition, this strain could utilize quinic acid as a sole carbon source. Quinic acid is a substrate used to synthesize aromatic amino acids (phenylalanine, tyrosine and tryptophan) via the shikimate pathway. These aromatic amino acids are useful as food additives, sweeteners and pharmaceutical intermediates [6, 7]. The genome analysis of F. solisilvae 3-3T will provide the genomic basis for better understanding these mechanisms and for applying the strain to industry and bioremediation more efficiently.

Organism information

Classification and features

F. solisilvae 3-3T was isolated from forest soil in Bac Kan Province, Vietnam [3]. The classification and features of F. solisilvae 3-3T are shown in Table 1. A maximum-likelihood tree was constructed based on the 16S rRNA gene sequences using MEGA 5.0 [8]. The bootstrap values were calculated based on 1,000 replications and distances were calculated in accordance with Kimura's two-parameter method [9]. The phylogenetic tree showed that 3-3T clustered with the other Flavihumibacter members (Fig. 1).
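As a rough illustration of the Kimura two-parameter distance mentioned above, the sketch below computes d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)] for a pair of aligned sequences, where P and Q are the proportions of transitions and transversions. The function name and the toy sequences are hypothetical, and this is not the MEGA implementation; it only shows the arithmetic behind the model.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites containing gaps or ambiguous bases are skipped (an assumption;
    real tools offer several gap-handling options).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared    # proportion of transitions (P)
    q = transversions / compared  # proportion of transversions (Q)
    # math.log raises ValueError if the sequences are too divergent
    # for the model (argument <= 0).
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Hypothetical 10-bp toy alignment with a single A->G transition.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

With one transition in ten sites, P = 0.1 and Q = 0, so the distance reduces to -1/2 ln(0.8), slightly larger than the raw mismatch proportion of 0.1 because the model corrects for multiple substitutions at the same site.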

Table 1

Classification and general features of F. solisilvae 3-3T according to the MIGS recommendations [22]

MIGS ID	Property	Term	Evidence code^a
	Classification	Domain Bacteria	TAS [23]
		Phylum Bacteroidetes	TAS [24, 25]
		Class Sphingobacteriia	TAS [25, 26]
		Order Sphingobacteriales	TAS [25, 27]
		Family Chitinophagaceae	TAS [28]
		Genus Flavihumibacter	TAS [1]
		Species Flavihumibacter solisilvae	TAS [3]
		Strain: 3-3^T	TAS [3]
	Gram stain	Positive	TAS [3]
	Cell shape	Rod-shaped	TAS [3]
	Motility	Non-motile	TAS [3]
	Sporulation	Non-sporulating	NAS
	Temperature range	20–37 °C	TAS [3]
	Optimum temperature	28 °C	TAS [3]
	pH range; Optimum	5.5–9.5; 7.5	TAS [3]
	Carbon source	Sucrose, D-glucose, D-galactose, lactose, N-acetyl-glucosamine, L-arabinose, D-maltose, glycerol, dextrin, D-melibiose, glucuronamide, succinic acid mono-methyl ester, L-aspartic acid, D-galacturonic acid, D-glucosaminic acid, D-glucuronic acid, malonic acid, β-hydroxybutyric acid, 4-hydroxyphenylacetic acid, quinic acid, D-saccharic acid, bromosuccinic acid, succinamic acid, L-pyroglutamic acid, urocanic acid, γ-aminobutyric acid, L-alanine, L-serine, D-alanine, L-histidine, L-alanyl-glycine, L-alaninamide and L-asparagine	TAS [3]
MIGS-6	Habitat	Forest soil	TAS [3]
MIGS-6.3	Salinity	0–0.5 % NaCl (w/v)	TAS [3]
MIGS-22	Oxygen requirement	Aerobic	TAS [3]
MIGS-15	Biotic relationship	Free-living	NAS
MIGS-14	Pathogenicity	Non-pathogenic	NAS
MIGS-4	Geographic location	Bac Kan Province, Viet Nam	TAS [3]
MIGS-5	Sample collection	2012	TAS [3]
MIGS-4.1	Latitude	Not reported
MIGS-4.2	Longitude	Not reported
MIGS-4.4	Altitude	Not reported

^a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [29]

Fig. 1

A 16S rRNA gene based ML phylogenetic tree showing the phylogenetic position of F. solisilvae 3-3T. Bootstrap values (>50 %) based on 1,000 replications are shown at branch nodes. Bar, 1 substitution per 100 nucleotide positions. Sphingobacterium alimentarium DSM 22362T is used as the outgroup. The GenBank accession numbers are shown in parentheses

Cells of F. solisilvae 3-3T (Fig. 2) are Gram-positive, aerobic, non-motile and rod-shaped. Colonies are yellow due to the production of flexirubin-type pigments [10]. Strain 3-3T grows well on NA and R2A agar (optimum), but does not grow on LB or TSA agar [3]. It can hydrolyze aesculin, gelatin, casein and tyrosine [3]. Strain 3-3T can also utilize various carbohydrate substrates (Table 1) and produces several glycosyl hydrolases, such as β-N-acetylhexosaminidase, α-galactosidase, β-glucosidase, β-galactosidase, α-fucosidase, α-mannosidase and α-glucosidase [3].

Fig. 2

A transmission electron micrograph of F. solisilvae 3-3T cell. The bar indicates 0.5 μm

F. solisilvae 3-3T contains iso-C15:0, iso-C15:1 G and summed feature 3 (C16:1ω6c/C16:1ω7c) as the principal fatty acids, and MK-7 as the major respiratory quinone. The major polar lipids are PE, three unidentified aminolipids and three unidentified lipids [3].

Genome sequencing information

Genome project history

F. solisilvae 3-3T was selected for sequencing based on its taxonomic representativeness and its potential application in the food industry and bioremediation. The genome was sequenced at Wuhan Bio-Broad Co., Ltd, Hubei, China. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession JSVC00000000. The version described in this paper is version JSVC00000000.1. A summary of the genomic sequencing project information is given in Table 2.

Table 2

Project information of F. solisilvae 3-3T

MIGS ID	Property	Term
MIGS-31	Finishing quality	High-quality draft
MIGS-28	Libraries used	Illumina paired-end library (300 bp insert size)
MIGS-29	Sequencing platforms	Illumina HiSeq 2000
MIGS-31.2	Fold coverage	250 ×
MIGS-30	Assemblers	Velvet v.1.2.10
MIGS-32	Gene calling method	GeneMarkS+
	Locus Tag	OI18
	Genbank ID	JSVC00000000
	Genbank Date of Release	2015-01-05
	BIOPROJECT	PRJNA265817
MIGS-13	Source Material Identifier	F. solisilvae 3-3^T (= KACC 17917^T = JCM 19891^T)
	Project relevance	Genome comparison

Growth conditions and genomic DNA preparation

F. solisilvae 3-3T was grown aerobically in 50 ml R2A broth at 28 °C for 24 h with shaking at 160 r/min. About 20 mg of cells were harvested by centrifugation, suspended in normal saline and then lysed using lysozyme. The DNA was obtained using the QIAamp kit according to the manufacturer's instructions (Qiagen, Germany).

Genome sequencing and assembly

The genome of F. solisilvae 3-3T was sequenced using Illumina HiSeq 2000 technology [11] with a paired-end library strategy (300 bp insert size). TruSeq DNA Sample Preparation Kits are used to prepare DNA libraries with insert sizes of 300–500 bp for single, paired-end and multiplexed sequencing. The protocol supports shearing of 1 μg of DNA by either sonication or nebulization [12]. Sequencing generated 7,041,525 reads totaling 1,422,388,050 bp, with an average coverage of 250 ×. Sequence assembly and quality assessment were performed using Velvet v.1.2.10 [13]. Finally, all reads were assembled into 75 contigs (> 200 bp) with a genome size of 5.41 Mbp.
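The relationship between read yield and fold coverage can be sketched with simple arithmetic: coverage is the total number of sequenced bases divided by the genome size. The snippet below uses only the figures reported in the text; the function name is hypothetical, and the calculation uses raw bases against the final draft genome size, which may differ from how the reported value of about 250 × was derived (for example, after read filtering, which the text does not specify).

```python
# Figures taken from the text above.
READS = 7_041_525             # number of reads reported
TOTAL_BASES = 1_422_388_050   # total sequenced bases (bp)
GENOME_SIZE = 5_410_659       # draft genome size (bp)


def fold_coverage(total_bases: int, genome_size: int) -> float:
    """Average depth of coverage: sequenced bases per genome position."""
    return total_bases / genome_size


# Average number of bases per reported read.
mean_bases_per_read = TOTAL_BASES / READS

if __name__ == "__main__":
    print(f"bases per read: {mean_bases_per_read:.1f}")
    print(f"raw fold coverage: {fold_coverage(TOTAL_BASES, GENOME_SIZE):.1f} x")
```

On the raw numbers this gives a depth in the low 260s, the same order as the reported 250 ×.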

Genome annotation

Genome annotation was performed through the NCBI Prokaryotic Genome Annotation Pipeline, which combines the Best-Placed reference protein set and the gene caller GeneMarkS+. The WebMGA server [14] with an E-value cutoff of 1e-3 was used to assign COGs. The translated predicted CDSs were also searched against the Pfam protein families database [15]. TMHMM Server v.2.0 [16], SignalP 4.1 Server [17] and the CRISPRfinder program [18] were used to predict transmembrane helices, signal peptides and CRISPRs in the genome, respectively. Metabolic pathways were analyzed using KEGG (Kyoto Encyclopedia of Genes and Genomes) [19].

Genome properties

The draft genome of F. solisilvae 3-3T is 5,410,659 bp in size, with 47 % GC content, and consists of 75 contigs. Of a total of 4,698 genes, 4,215 (89.72 %) are protein coding genes, 439 (9.34 %) are pseudo genes and 44 (0.94 %) are RNA encoding genes. The genome properties and statistics are shown in Table 3 and Fig. 3. Altogether, 3,137 (74.42 %) protein coding genes are distributed into COG functional categories (Table 4).

Table 3

Genome statistics of F. solisilvae 3-3T

Attribute	Value	% of total^a
Genome size (bp)	5,410,659	100.00
DNA coding (bp)	4,540,989	83.93
DNA G + C (bp)	2,543,035	47.00
DNA scaffolds	75	-
Total genes	4,698	100.00
Protein coding genes	4,215	89.72
RNA genes	44	0.94
Pseudo genes	439	9.34
Frameshifted Genes	9	0.19
Genes with function prediction	1,893	44.91
Genes assigned to COGs	3,137	74.42
Genes with Pfam domains	3,511	83.30
Genes with signal peptides	670	15.89
Genes with transmembrane helices	919	21.80
CRISPR repeats	1	-

^a The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome
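The percentages in Table 3 use three different denominators: genome size for the base-pair rows, total genes for the gene-type rows, and protein coding genes for the annotation rows. The sketch below re-derives a few of them from the counts in the table; the helper name `pct` and the grouping into dictionaries are illustrative choices, not part of the original analysis.

```python
GENOME_SIZE_BP = 5_410_659
TOTAL_GENES = 4_698
PROTEIN_CODING_GENES = 4_215


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, to two decimals."""
    return round(100 * part / whole, 2)


# Rows expressed relative to genome size (bp).
by_genome = {"DNA coding": 4_540_989, "DNA G + C": 2_543_035}

# Rows expressed relative to the total number of genes.
by_total_genes = {
    "Protein coding genes": 4_215,
    "RNA genes": 44,
    "Pseudo genes": 439,
}

# Rows expressed relative to the number of protein coding genes.
by_protein_coding = {
    "Genes with function prediction": 1_893,
    "Genes assigned to COGs": 3_137,
    "Genes with Pfam domains": 3_511,
    "Genes with transmembrane helices": 919,
}

if __name__ == "__main__":
    for name, value in by_genome.items():
        print(f"{name}: {pct(value, GENOME_SIZE_BP)} % of genome")
    for name, value in by_total_genes.items():
        print(f"{name}: {pct(value, TOTAL_GENES)} % of total genes")
    for name, value in by_protein_coding.items():
        print(f"{name}: {pct(value, PROTEIN_CODING_GENES)} % of protein coding genes")
```

The three gene types (4,215 + 439 + 44) add up to the 4,698 total genes, which is a quick consistency check on the table.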

Fig. 3

A graphical circular map of the F. solisilvae 3-3T genome. From outside to center, rings 1 and 4 show protein-coding genes colored by COG categories on the forward or reverse strand; rings 2 and 3 denote genes on the forward or reverse strand; ring 5 shows the G + C content plot, and the innermost ring represents GC skew
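For readers unfamiliar with the two innermost rings, G + C content is the fraction of G and C bases in a stretch of sequence, and GC skew is commonly defined as (G - C)/(G + C). The sketch below computes both over fixed windows. The window scheme, function names and toy sequence are assumptions for illustration; the text does not state how the plots in Fig. 3 were generated.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of bases in `seq` that are G or C (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_skew(seq: str) -> float:
    """(G - C) / (G + C); defined here as 0.0 when the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_profile(seq: str, window: int, step: int) -> Iterator[Tuple[int, float, float]]:
    """Yield (start, GC content, GC skew) for each full window along `seq`."""
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        yield start, gc_content(chunk), gc_skew(chunk)


if __name__ == "__main__":
    toy = "GGGCATATGCCC"  # hypothetical sequence, not from the genome
    for start, gc, skew in windowed_profile(toy, window=4, step=4):
        print(start, gc, skew)
```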

Table 4

Number of genes associated with general COG functional categories of F. solisilvae 3-3T genome

Code	Value	% of total	Description
J	169	4.01	Translation, ribosomal structure and biogenesis
A	0	0.00	RNA processing and modification
K	278	6.60	Transcription
L	115	2.73	Replication, recombination and repair
B	2	0.05	Chromatin structure and dynamics
D	23	0.55	Cell cycle control, cell division, chromosome partitioning
V	93	2.21	Defense mechanisms
T	243	5.77	Signal transduction mechanisms
M	251	5.95	Cell wall/membrane/envelope biogenesis
N	9	0.21	Cell motility
U	58	1.38	Intracellular trafficking and secretion
O	110	2.61	Posttranslational modification, protein turnover, chaperones
C	206	4.89	Energy production and conversion
G	224	5.31	Carbohydrate transport and metabolism
E	239	5.67	Amino acid transport and metabolism
F	76	1.80	Nucleotide transport and metabolism
H	155	3.68	Coenzyme transport and metabolism
I	129	3.06	Lipid transport and metabolism
P	205	4.86	Inorganic ion transport and metabolism
Q	87	2.06	Secondary metabolites biosynthesis, transport and catabolism
R	466	11.06	General function prediction only
S	315	7.47	Function unknown
-	1078	25.58	Not in COGs

The total is based on the total number of protein coding genes in the genome
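The Table 4 percentages can be reproduced by dividing each category count by the 4,215 protein coding genes. The sketch below does this for the counts in the table and also sums the category counts. The sum comes to more than the 3,137 genes assigned to COGs; a plausible explanation, stated here as an assumption rather than something the text confirms, is that some genes are assigned to more than one category.

```python
PROTEIN_CODING_GENES = 4_215
GENES_ASSIGNED_TO_COGS = 3_137
NOT_IN_COGS = 1_078

# Category counts copied from Table 4.
COG_COUNTS = {
    "J": 169, "A": 0, "K": 278, "L": 115, "B": 2, "D": 23, "V": 93,
    "T": 243, "M": 251, "N": 9, "U": 58, "O": 110, "C": 206, "G": 224,
    "E": 239, "F": 76, "H": 155, "I": 129, "P": 205, "Q": 87,
    "R": 466, "S": 315,
}


def percent_of_protein_coding(count: int) -> float:
    """Share of protein coding genes, in percent, to two decimals."""
    return round(100 * count / PROTEIN_CODING_GENES, 2)


# Total of per-category counts (a gene may be counted in several categories).
category_total = sum(COG_COUNTS.values())

if __name__ == "__main__":
    for code, count in COG_COUNTS.items():
        print(code, count, percent_of_protein_coding(count))
    print("sum over categories:", category_total)
    print("assigned + not in COGs:", GENES_ASSIGNED_TO_COGS + NOT_IN_COGS)
```

The genes assigned to COGs and those not in COGs together account for all 4,215 protein coding genes.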

Insights from the genome sequence

F. solisilvae 3-3T can grow on 33 sole carbon substrates, including saccharides, organic acids and amino acids [3] (Table 1). Analysis of the genome reveals that this strain possesses putative enzymes for central carbohydrate metabolism to assimilate these carbon sources through different metabolic pathways [20]. Putative enzymes responsible for the utilization of 20 of these sole carbon sources were found in the genome (Table 5). All key enzymes of the Embden-Meyerhof-Parnas pathway (glucokinase, pyruvate kinase and 6-phosphofructokinase) and the TCA cycle are present in 3-3T. The key enzymes of the pentose phosphate pathway (glucose-6-phosphate dehydrogenase, 6-phosphogluconolactonase and 6-phosphogluconate dehydrogenase) were also found.

Table 5

Putative enzymes responsible for the utilization of different sole carbon sources in the genome of F. solisilvae 3-3T

Substrates	Enzymes	Accession no.
Sucrose	Alpha-glucosidase	KIC96373
D-maltose	Alpha-glucosidase	KIC96373
D-glucose	Glucokinase	KIC93940
D-galactose	Aldose epimerase	KIC96300
	Galactokinase	KIC95381
Lactose	Beta-galactosidase	KIC94337
Glycerol	Glycerol kinase	KIC93992
	Glycerol-3-phosphate dehydrogenase	KIC94583
N-acetyl-glucosamine	β-N-acetylhexosaminidase	KIC92674
L-arabinose	Arabinose isomerase	KIC96297
D-melibiose	Alpha-galactosidase	KIC96021
4-hydroxyphenylacetic acid	4-hydroxyphenylpyruvate dioxygenase	KIC95062
	Homogentisate 1,2-dioxygenase	KIC93392
Quinic acid	3-dehydroquinate dehydratase	KIC93382
	Shikimate dehydrogenase	KIC92987
	Shikimate kinase	KIC93265
	3-phosphoshikimate 1-carboxyvinyltransferase	KIC94147
	Chorismate synthase	KIC94148
Urocanic acid	Urocanate hydratase	KIC93805
L-asparagine	L-asparaginase	KIC93060
L-aspartic acid	Aspartate ammonia-lyase	KIC93059
L-histidine	Histidine ammonia-lyase	KIC93735
L-serine	Serine dehydratase	KIC94326
L-alanine	Alanine dehydrogenase	KIC92870
D-alanine	D-alanine--D-alanine ligase	KIC93315
D-glucuronic acid	Glucuronate isomerase	KIC95816
D-galacturonic acid	Glucuronate isomerase	KIC95816

The presence of 4-hydroxyphenylpyruvate dioxygenase (KIC95062), homogentisate 1,2-dioxygenase (KIC93392) and other related enzymes suggests that 4-hydroxyphenylacetic acid is degradable via the homogentisic acid pathway [21]. In addition, the presence of 3-dehydroquinate dehydratase (KIC93382), shikimate dehydrogenase (KIC92987), shikimate kinase (KIC93265), 3-phosphoshikimate 1-carboxyvinyltransferase (KIC94147) and chorismate synthase (KIC94148) indicates that 3-3T could probably utilize quinic acid to synthesize the three aromatic amino acids (tryptophan, tyrosine and phenylalanine) via the shikimate pathway [7].

Conclusion

To the best of our knowledge, this report provides the first genomic information for the genus Flavihumibacter. Analysis of the genome shows a high correlation between the genotypes and the phenotypes. The genome encodes many key proteins of central carbohydrate metabolism, which provide the genomic basis for utilizing the various carbon sources. In addition, analysis of the genome indicates that this strain has potential application in the production of aromatic amino acids and in environmental bioremediation.

References (22 in total)

1. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes.

Authors: A Krogh; B Larsson; G von Heijne; E L Sonnhammer
Journal: J Mol Biol Date: 2001-01-19

2. Proposed minimal standards for describing new taxa of the family Flavobacteriaceae and emended description of the family.

Authors: Jean-Francois Bernardet; Yasuyoshi Nakagawa; Barry Holmes
Journal: Int J Syst Evol Microbiol Date: 2002-05

3. SignalP 4.0: discriminating signal peptides from transmembrane regions.

Authors: Thomas Nordahl Petersen; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal: Nat Methods Date: 2011-09-29

4. Velvet: algorithms for de novo short read assembly using de Bruijn graphs.

Authors: Daniel R Zerbino; Ewan Birney
Journal: Genome Res Date: 2008-03-18

5. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods.

Authors: Koichiro Tamura; Daniel Peterson; Nicholas Peterson; Glen Stecher; Masatoshi Nei; Sudhir Kumar
Journal: Mol Biol Evol Date: 2011-05-04

6. Hydrotalea flava gen. nov., sp. nov., a new member of the phylum Bacteroidetes and allocation of the genera Chitinophaga, Sediminibacterium, Lacibacter, Flavihumibacter, Flavisolibacter, Niabella, Niastella, Segetibacter, Parasegetibacter, Terrimonas, Ferruginibacter, Filimonas and Hydrotalea to the family Chitinophagaceae fam. nov.

Authors: P Kämpfer; N Lodders; E Falsen
Journal: Int J Syst Evol Microbiol Date: 2010-04-09

7. Production of aromatic compounds by metabolically engineered Escherichia coli with an expanded shikimate pathway.

Authors: Daisuke Koma; Hayato Yamanaka; Kunihiko Moriyoshi; Takashi Ohmoto; Kiyofumi Sakai
Journal: Appl Environ Microbiol Date: 2012-06-29

8. Catabolism of 3- and 4-hydroxyphenylacetic acid by Klebsiella pneumoniae.

Authors: M Martín; A Gibello; J Fernández; E Ferrer; A Garrido-Pertierra
Journal: J Gen Microbiol Date: 1991-03

9. The minimum information about a genome sequence (MIGS) specification.

Authors: Dawn Field; George Garrity; Tanya Gray; Norman Morrison; Jeremy Selengut; Peter Sterk; Tatiana Tatusova; Nicholas Thomson; Michael J Allen; Samuel V Angiuoli; Michael Ashburner; Nelson Axelrod; Sandra Baldauf; Stuart Ballard; Jeffrey Boore; Guy Cochrane; James Cole; Peter Dawyndt; Paul De Vos; Claude DePamphilis; Robert Edwards; Nadeem Faruque; Robert Feldman; Jack Gilbert; Paul Gilna; Frank Oliver Glöckner; Philip Goldstein; Robert Guralnick; Dan Haft; David Hancock; Henning Hermjakob; Christiane Hertz-Fowler; Phil Hugenholtz; Ian Joint; Leonid Kagan; Matthew Kane; Jessie Kennedy; George Kowalchuk; Renzo Kottmann; Eugene Kolker; Saul Kravitz; Nikos Kyrpides; Jim Leebens-Mack; Suzanna E Lewis; Kelvin Li; Allyson L Lister; Phillip Lord; Natalia Maltsev; Victor Markowitz; Jennifer Martiny; Barbara Methe; Ilene Mizrachi; Richard Moxon; Karen Nelson; Julian Parkhill; Lita Proctor; Owen White; Susanna-Assunta Sansone; Andrew Spiers; Robert Stevens; Paul Swift; Chris Taylor; Yoshio Tateno; Adrian Tett; Sarah Turner; David Ussery; Bob Vaughan; Naomi Ward; Trish Whetzel; Ingio San Gil; Gareth Wilson; Anil Wipat
Journal: Nat Biotechnol Date: 2008-05

10. Comparison of environmental and isolate Sulfobacillus genomes reveals diverse carbon, sulfur, nitrogen, and hydrogen metabolisms.

Authors: Nicholas B Justice; Anders Norman; Christopher T Brown; Andrea Singh; Brian C Thomas; Jillian F Banfield
Journal: BMC Genomics Date: 2014-12-15