Paenibacillus phocaensis sp. nov., isolated from the gut microbiota of a healthy infant.

M Tidjani Alou¹, J Rathored¹, T T Nguyen¹, C Andrieu¹, C Couderc¹, S Brah², B A Diallo³, P-E Fournier¹, D Raoult⁴, G Dubourg¹.

Abstract

Paenibacillus phocaensis sp. nov. strain mt24T (= CSUR P2238 = DSM 101777) is a Gram-negative, facultatively anaerobic, spore-forming and motile bacillus. The strain was isolated from the stool sample of a healthy infant from Niger. Its genome was estimated at 5 521 415 bp with a G+C content of 53.54%. It contains 4835 protein-coding genes and 89 RNA genes, two of which are 16S rRNA genes. In addition, 101 genes (2.09%) were identified as ORFans.

Keywords: Culturomics; Paenibacillus phocaensis; genome; gut microbiota; taxonogenomics

Year: 2016 PMID: 28116105 PMCID: PMC5233791 DOI: 10.1016/j.nmni.2016.12.001

Source DB: PubMed Journal: New Microbes New Infect ISSN: 2052-2975

The human gut microbiota is a vast and complex ecosystem that contributes greatly to human health. Among other functions, its involvement in digestion, the immune system and the production of some micronutrients has been demonstrated [1]. Several pathologies, such as obesity, inflammatory bowel disease, irritable bowel syndrome and severe acute malnutrition, have been associated with a disruption of the gut microbiota [1], [2]. Its establishment begins at birth, when the newborn is exposed to the mother’s vaginal microbiota [3]. Great variability has been shown in the composition of the gut microbiota of healthy children [4], [5], but it stabilizes and reaches maturity around the age of 2 or 3 years [1]. Both culture and metagenomics approaches have been used to study the composition of the gut microbiota, although metagenomics has been the method of choice in the last decade [6]. Recently, a wide-array culture method, the microbial culturomics strategy, was developed in our laboratory. It consists of multiplying culture conditions by varying culture media and physicochemical parameters such as temperature, pH and atmosphere in order to characterize a bacterial ecosystem as exhaustively as possible [7]. This method was used to characterize by culture the gut microbiota of healthy African children. A new member of the genus Paenibacillus was thus isolated. This genus was created by Ash et al. in 1994 [8] and consists of 174 species [9]. The genus Paenibacillus consists of Gram-variable, spore-forming, motile bacilli. These aerobic or facultatively anaerobic species have a G+C content ranging from 40 to 59%. Its type species is Paenibacillus polymyxa. Paenibacillus species have not been associated with pathogenicity in humans; nevertheless, Paenibacillus larvae, Paenibacillus lentimorbus and Paenibacillus popilliae are insect pathogens [8].
Bacterial classification is currently based on phenotypic and genotypic characteristics [10], [11], [12]. The decreasing cost of genome sequencing has allowed more insight into the properties and capabilities of a species. A new concept of bacterial description, called taxonogenomics, was developed in our laboratory. It combines phenotypic characteristics, a proteomic description of the strain based on its matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) profile, and a presentation of the complete annotated genome sequence of the described strain [13]. Using this concept, we describe here a new member of the genus Paenibacillus called Paenibacillus phocaensis (= CSUR P2238 = DSM 101777), which is the 13th previously unknown Paenibacillus isolated as part of a culturomics study.

Materials and Methods

Sample information

A stool sample was collected from a 13-month-old girl in Niamey, Niger. A weight-for-height z score of −0.95 was estimated for this healthy girl, with no growth stunting, as shown by the calculated height-for-age z score of −0.88. No antibiotics were being administered at the time of sample collection. Upon arrival in Marseille, where the study was conducted, the stool sample was stored at −80°C. Consent was obtained from the girl’s parent, and approval for this study was provided by the Institut Fédératif de Recherche 48 (Faculty of Medicine, Marseille, France) under agreement 09-022.

Strain identification by MALDI-TOF MS and 16S rRNA sequencing

The bacterial diversity of the stool sample was characterized using the 18 culture conditions of standardized culturomics [14]. For each culture condition, a liquid preincubation of the sample was performed, and tenfold serial dilutions of this culture were seeded on 5% sheep’s blood–enriched Columbia agar (bioMérieux, Marcy l’Étoile, France) every 3 days for 30 days. Colonies were purified and identified using MALDI-TOF MS as previously described [15], [16]. The MALDI-TOF MS analysis was carried out using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany) with an MTP 96 MALDI-TOF target plate (Bruker). Each colony was tested in duplicate, and the obtained spectra were imported into MALDI BioTyper 2.0 software (Bruker). The spectra were then compared by standard pattern matching (with default parameter settings) to the 7567 references contained in our database, which consists of the Bruker database supplemented with data from species that were not already present in it. A strain is considered identified at the species level with an identification score of ≥1.9. With a score between 1.7 and 1.9, the strain is identified at the genus level. A score of <1.7 does not allow any identification. The 16S rRNA gene was sequenced in order to obtain an identification as previously described [17].
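The score interpretation described above can be sketched as a small function. This is a minimal illustration, not part of the BioTyper software; the function name and the handling of a score of exactly 1.9 (treated as species level, following the "≥1.9" wording) are assumptions.

```python
def interpret_maldi_score(score: float) -> str:
    """Map a MALDI-TOF MS identification score to the identification level.

    Thresholds follow the text: >= 1.9 species, 1.7 to < 1.9 genus,
    < 1.7 no identification. The function name is hypothetical.
    """
    if score >= 1.9:
        return "species"
    if score >= 1.7:
        return "genus"
    return "none"


# Example scores (illustrative values, not measured data).
for example_score in (2.1, 1.8, 1.5):
    print(example_score, interpret_maldi_score(example_score))
```

A colony whose score falls in the last category is the kind that would then go on to 16S rRNA gene sequencing.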

Growth conditions

Different atmospheres, temperatures, pH values and NaCl concentrations were tested in order to determine the ideal growth conditions of strain mt24T. Anaerobic and microaerophilic atmospheres were tested using GENbag anaer and GENbag microaer systems (bioMérieux), respectively. Aerobic growth was tested with or without 5% CO2. The following temperatures were tested in each atmosphere: 25, 28, 37, 45 and 56°C. Growth was also tested on Columbia agar at different pH values (6, 6.5, 7 and 8.5) and on Columbia agar supplemented with various NaCl concentrations (0.5, 1, 5, 7.5 and 10%).

Morphologic, biochemical and antibiotics susceptibility tests

Phenotypic characteristics such as Gram staining, oxidase, catalase, motility and sporulation were determined as described previously [14]. Morphologic observations were also carried out by negative staining. Formvar-coated detection grids were deposited on a 40 μL drop of bacterial suspension and incubated at 37°C for 30 minutes, followed by a 10-second incubation on 1% ammonium molybdate. The grids were dried on blotting paper and observed with a Tecnai G20 transmission electron microscope (FEI Company, Limeil-Brevannes, France). In order to determine the metabolic features of strain mt24T, API 50CH, ZYM and 20 NE strips were used according to the manufacturer’s instructions (bioMérieux). Starch hydrolysis was determined using the starch agar test. Colonies were grown on a starch agar plate; upon drenching the colonies with an iodine solution, circles of hydrolysis appear when amylase is produced by the tested strain, as previously described [18]. Gelatin hydrolysis was tested according to the gelatin stab method [19]. A medium with gelatin as a nutrient and solidifying agent was prepared, and the strain was inoculated and incubated under the appropriate conditions until optimal growth. After this incubation, the tubes containing the liquefied medium and the grown strain were incubated at 4°C for a few hours. Gelatin hydrolysis eliminates the solidifying agent in the medium. Consequently, gelatinase production results in a permanently liquefied medium, whereas solidification of the medium upon cooling indicates that gelatinase is not produced by the tested strain. Cellular fatty acid methyl ester (FAME) analysis was performed by gas chromatography/mass spectrometry (GC/MS). Two samples were prepared, each with approximately 50 mg of bacterial biomass harvested from several culture plates. FAMEs were prepared as described by Sasser [20]. GC/MS analyses were carried out as previously described [21].
Briefly, FAMEs were separated using an Elite 5-MS column and monitored by mass spectrometry (Clarus 500-SQ 8 S, PerkinElmer, Courtaboeuf, France). Spectral database search was performed using MS Search 2.0 operated with the Standard Reference Database 1A (National Institute of Standards and Technology, Gaithersburg, MD, USA) and the FAMEs mass spectral database (Wiley, Chichester, UK). Antibiotic susceptibility was determined using the disk diffusion method and according to the European Committee on Antimicrobial Susceptibility Testing 2015 recommendations [22].

Genomic DNA preparation

Strain mt24T was cultured on 5% sheep’s blood–enriched Columbia agar (bioMérieux) at 37°C aerobically. Bacteria grown on three petri dishes were resuspended in 4 × 100 μL Tris-EDTA (TE) buffer. Then 200 μL of this suspension was diluted in 1 mL TE buffer for lysis treatment that included a 30-minute incubation with 2.5 μg/μL lysozyme at 37°C, followed by an overnight incubation with 20 μg/μL proteinase K at 37°C. Extracted DNA was then purified using three successive phenol–chloroform extractions and ethanol precipitations at −20°C overnight. After centrifugation, the DNA was resuspended in 160 μL TE buffer.

Genome sequencing and assembly

Genomic DNA (gDNA) of strain mt24T was sequenced using MiSeq technology (Illumina, San Diego, CA, USA) with the mate pair strategy. The gDNA was barcoded with the Nextera Mate Pair sample prep kit (Illumina) in order to be mixed with 11 other projects. gDNA was quantified by a Qubit assay with the high sensitivity kit (Thermo Fisher Scientific, Waltham, MA, USA) at 66.2 ng/μL. The mate pair library was prepared with 1 μg of gDNA following the Nextera mate pair Illumina guide. The gDNA sample was simultaneously fragmented and tagged with a mate pair junction adapter. The fragmentation pattern was validated on an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA) with a DNA 7500 labchip. The DNA fragments ranged in size from 1 to 11 kb, with an optimal size at 3.927 kb. No size selection was performed, and 505 ng of tagmented fragments were circularized. The circularized DNA was mechanically sheared to small fragments with an optimum at 597 bp on the Covaris S2 device in microtubes (Covaris, Woburn, MA, USA). The library profile was visualized on a High Sensitivity Bioanalyzer LabChip (Agilent Technologies), and the final library concentration was measured at 59.2 nmol/L. The libraries were normalized to 2 nM and pooled. After a denaturation step and dilution to 15 pM, the pool of libraries was loaded onto the reagent cartridge and then onto the instrument along with the flow cell. Automated cluster generation and sequencing were performed in a single 39-hour run at a 2 × 251 bp read length.
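The normalization and loading dilutions quoted above follow from the usual relation C1·V1 = C2·V2. The sketch below only illustrates that arithmetic with the concentrations given in the text; the function names and the 10 μL final volume are hypothetical and do not come from the protocol.

```python
def dilution_factor(stock_conc: float, target_conc: float) -> float:
    """Fold dilution needed to go from stock_conc to target_conc (same units)."""
    if target_conc <= 0 or stock_conc < target_conc:
        raise ValueError("target must be positive and not exceed the stock concentration")
    return stock_conc / target_conc


def dilution_volumes(stock_conc: float, target_conc: float, final_volume: float):
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2."""
    stock_volume = final_volume * target_conc / stock_conc
    return stock_volume, final_volume - stock_volume


# Concentrations quoted in the text.
library_nM = 59.2      # measured library concentration
normalized_nM = 2.0    # normalization target
loading_pM = 15.0      # loading concentration

print(dilution_factor(library_nM, normalized_nM))           # fold dilution to reach 2 nM
print(dilution_factor(normalized_nM * 1000.0, loading_pM))  # 2 nM = 2000 pM, then to 15 pM

# Hypothetical final volume of 10 uL for the normalization step.
print(dilution_volumes(library_nM, normalized_nM, 10.0))
```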

Genome annotation and analysis

Open reading frames (ORFs) were predicted using Prodigal [23] with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank [24] and the Clusters of Orthologous Groups (COGs) databases using BLASTP (E value 1e-03, coverage 0.7 and identity percentage 30%). If no hit was found, the sequence was searched against the NR database using BLASTP with an E value of 1e-03, a coverage of 0.7 and an identity percentage of 30%; if the sequence length was smaller than 80 aa, an E value of 1e-05 was used. The tRNAScanSE tool [25] was used to find tRNA genes, whereas ribosomal RNAs were found using RNAmmer [26]. Lipoprotein signal peptides and the number of transmembrane helices were predicted using Phobius [27]. Mobile genetic elements were predicted using PHAST [28] and RAST [29]. ORFans were identified if none of the BLASTP searches gave positive results (E value smaller than 1e-03 for ORFs with a sequence length larger than 80 aa or E value smaller than 1e-05 for ORFs with a sequence length smaller than 80 aa). Such parameter thresholds have already been used in previous studies to define ORFans. Artemis [30] and DNA Plotter [31] were used for data management and visualization of genomic features, respectively. The Mauve alignment tool (version 2.3.1) was used for multiple genomic sequence alignment [32]. Comparator species for genomic comparison were identified in the 16S rRNA tree using PhyloPattern software [33]. The genome of strain mt24T was compared to those of Paenibacillus barengoltzii strain NBRC 101216, Paenibacillus macerans strain ZY18, Paenibacillus polymyxa strain CF05, Paenibacillus sanguinis strain 2301083, Paenibacillus sabinae strain T27 and Paenibacillus borealis strain DSM 13188.
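The hit-acceptance and ORFan rules described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `BlastHit` type and function names are hypothetical, the strict "smaller than" comparison follows the wording of the text, and applying the coverage and identity cut-offs in the ORFan decision is an assumption carried over from the annotation step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BlastHit:
    """One BLASTP hit (hypothetical container, not a BLAST+ data structure)."""
    evalue: float
    coverage: float   # fraction of the query covered, 0 to 1
    identity: float   # percent identity, 0 to 100


def evalue_cutoff(protein_length: int) -> float:
    """1e-05 for sequences shorter than 80 aa, 1e-03 otherwise (per the text)."""
    return 1e-5 if protein_length < 80 else 1e-3


def is_positive_hit(hit: BlastHit, protein_length: int) -> bool:
    """A hit counts when it passes the E value, coverage and identity cut-offs."""
    return (
        hit.evalue < evalue_cutoff(protein_length)
        and hit.coverage >= 0.7
        and hit.identity >= 30.0
    )


def is_orfan(protein_length: int, hits: Iterable[BlastHit]) -> bool:
    """An ORF is called an ORFan when no BLASTP search gave a positive hit."""
    return not any(is_positive_hit(h, protein_length) for h in hits)


# Illustrative values only.
print(is_orfan(120, []))                             # no hit at all
print(is_orfan(120, [BlastHit(1e-10, 0.9, 45.0)]))   # clear homolog found
print(is_orfan(60, [BlastHit(1e-4, 0.9, 45.0)]))     # short ORF, stricter E value
```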
For each selected genome, the complete genome sequence, proteome sequence and ORFeome sequence were retrieved from the FTP site of the National Center for Biotechnology Information. An annotation of the entire proteome was performed to define the distribution of functional classes of predicted genes according to the clusters of orthologous groups of proteins (using the same method as for the genome annotation). Annotation and comparison processes were performed in the multi-agent software system DAGOBAH [34], which includes Figenix [35] libraries that provide pipeline analysis. To evaluate the genomic similarity between the studied genomes, we determined two parameters: digital DNA-DNA hybridization (dDDH), which exhibits a high correlation with DDH [36], [37], and average genomic identity of orthologous gene sequences (AGIOS) [38], which was designed to be independent from DDH [38]. The AGIOS score [38] is the mean value of nucleotide similarity between all pairs of orthologous proteins shared by the two studied genomes [38].
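As a rough illustration of the AGIOS definition given above (a mean of per-ortholog nucleotide similarities), the sketch below averages percent identity over pre-aligned gene pairs. It is not the DAGOBAH implementation; the function names, the assumption that sequences are already aligned to equal length, and the treatment of gap columns as mismatches are all choices made for this example.

```python
from statistics import mean
from typing import Iterable, Tuple


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned nucleotide sequences.

    Gap columns ('-') are counted as mismatches (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)


def agios(ortholog_pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean percent identity over all orthologous gene pairs of two genomes."""
    return mean(percent_identity(a, b) for a, b in ortholog_pairs)


# Toy alignments, not real genes.
toy_pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
print(agios(toy_pairs))
```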

Results

Strain identification and phylogenetic analyses

Strain mt24T was isolated after 15 days of preincubation in aerobic marine broth at 37°C followed by seeding on 5% sheep’s blood–enriched Columbia agar, also under aerobic conditions at 37°C. MALDI-TOF MS analysis of strain mt24T yielded a score under 1.7, so the 16S rRNA gene was sequenced. The sequence, available under accession number LN998053, showed a similarity level of 98.2% with Paenibacillus timonensis (Fig. 1). Because this similarity level is under the 98.65% threshold defined by Stackebrandt and Ebers [39] to delineate a new species and later confirmed by a recent study [40], we created a new species within the genus Paenibacillus called Paenibacillus phocaensis, the general characteristics of which are presented in Table 1. The reference spectrum of Paenibacillus phocaensis (Fig. 2) was added to our database. The spectrum of P. phocaensis was also compared to those of other Paenibacillus species (Fig. 3).

Fig. 1

Phylogenetic tree highlighting position of Paenibacillus phocaensis strain mt24T (= CSUR P2238 = DSM 101777) relative to other close strains. Respective GenBank accession numbers for 16S rRNA genes are indicated in parentheses. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using maximum-likelihood method within MEGA6 software. Numbers at nodes are percentages of bootstrap values obtained by repeating analysis 1000 times to generate majority consensus tree. Clostridium felsineum strain NCIMB 10690 was used as outgroup. Scale bar represents 2% nucleotide sequence divergence.

Table 1

Classification and general features of Paenibacillus phocaensis strain mt24T

Property	Term
Current classification	Domain: Bacteria
	Phylum: Firmicutes
	Class: Bacilli
	Order: Bacillales
	Family: Paenibacillaceae
	Genus: Paenibacillus
	Species: Paenibacillus phocaensis
	Type strain: mt24
Gram stain	Negative
Cell shape	Bacilli
Motility	Motile
Sporulation	Sporulating
Temperature range	Mesophilic
Optimum temperature	37°C

Fig. 2

Reference mass spectrum from Paenibacillus phocaensis strain mt24T (= CSUR P2238 = DSM 101777). Spectra from 12 individual colonies were compared and reference spectrum generated.

Fig. 3

Comparison of the mass spectrum of Paenibacillus phocaensis strain mt24T (= CSUR P2238 = DSM 101777) with those of other Paenibacillus species.

Phenotypic description

Growth was observed between 25 and 56°C under aerobic, anaerobic and microaerophilic conditions. Strain mt24T also grew at all the tested pH values, but only at a 0.5% NaCl concentration. Ideal growth was observed under aerobic conditions at 37°C and pH 7 after 48 hours of incubation. Healthy growth was also observed at 37°C under anaerobic and microaerophilic conditions and at 28°C under aerobic conditions. Cells were Gram-negative, motile and sporulating (Fig. 4) and formed translucent concentric colonies. Negative staining of strain mt24T showed a mean diameter of 0.47 μm and a mean length of 3.44 μm (Fig. 5).

Fig. 4

Gram staining of Paenibacillus phocaensis strain mt24T (= CSUR P2238 = DSM 101777).

Fig. 5

Transmission electron microscopy of Paenibacillus phocaensis strain mt24T (= CSUR P2238 = DSM 101777) using Tecnai G20 transmission electron microscope (FEI Company) at operating voltage of 60 kV. Scale bar = 500 nm.

The most abundant fatty acid is 15:0 anteiso (52%). The fatty acids were mainly saturated. Only three unsaturated species were detected, at low abundances (<1%) (Table 2). In Paenibacillus timonensis, the phylogenetically closest species, the major fatty acid was the same. Nevertheless, the other fatty acids found in both species were present in different abundances. Only one unsaturated fatty acid was present in Paenibacillus timonensis, and three fatty acids present in trace amounts in strain mt24T were completely absent in Paenibacillus timonensis (Table 2).

Table 2

Cellular fatty acid composition (%) of Paenibacillus phocaensis strain mt24T

Fatty acid	IUPAC name	Paenibacillus phocaensis (mean relative %)^a
15:0 anteiso	12-methyl-Tetradecanoic acid	52.0 ± 1.1
16:0	Hexadecanoic acid	10.6 ± 0.4
16:0 iso	14-methyl-Pentadecanoic acid	8.7 ± 0.5
15:0 iso	13-methyl-Tetradecanoic acid	8.5 ± 0.1
17:0 anteiso	14-methyl-Hexadecanoic acid	7.5 ± 0.3
15:0	Pentadecanoic acid	3.9 ± 0.3
17:0 iso	15-methyl-Hexadecanoic acid	3.0 ± 0.1
14:0 iso	12-methyl-Tridecanoic acid	1.8 ± 0.1
17:0	Heptadecanoic acid	1.0 ± 1.1
18:1n9	9-Octadecenoic acid	TR
14:0	Tetradecanoic acid	TR
18:0	Octadecanoic acid	TR
18:2n6	9,12-Octadecadienoic acid	TR
5:0 anteiso	2-methyl-Butanoic acid	TR
18:1n6	12-Octadecenoic acid	TR
13:0	Tridecanoic acid	TR

IUPAC, International Union of Pure and Applied Chemistry; TR, trace amounts <1%.

^a Mean peak area percentage ± standard deviation.

Strain mt24T showed oxidase and catalase activities. Activities of the following enzymes were also detected using an API ZYM strip: esterase C4, esterase lipase C8, leucine arylamidase, naphthol-AS-BI-phosphohydrolase, α-galactosidase, β-galactosidase, α-glucosidase, β-glucosidase and N-acetyl-β-glucosaminidase. Negative reactions were obtained for alkaline phosphatase, lipase C14, valine arylamidase, cysteine arylamidase, trypsin, α-chymotrypsin, acid phosphatase, β-glucuronidase, α-mannosidase and α-fucosidase. Using an API 20 NE strip, positive reactions were observed for nitrate reduction, β-glucosidase and β-galactosidase. Acid was produced from the fermentation of glucose, mannitol and N-acetyl-glucosamine. Using an API 50CH strip, l-arabinose, d-ribose, d-galactose, d-glucose, d-fructose, d-mannose, l-rhamnose, d-mannitol, methyl-α-d-glucopyranoside, N-acetyl-glucosamine, amygdalin, esculin, salicin, d-cellobiose, d-maltose, d-lactose, d-melibiose, d-sucrose, d-trehalose, d-melezitose, d-raffinose, starch, glycogen, xylitol, gentiobiose, l-fucose, potassium gluconate and potassium 2-ketogluconate were metabolized. Paenibacillus timonensis was not able to metabolize l-rhamnose, l-fucose, glycogen, xylitol or potassium 2-ketogluconate. Negative reactions were observed for glycerol, erythritol, d-arabinose, d-ribose, d-xylose, d-adonitol, methyl-β-d-xylopyranoside, l-sorbose, dulcitol, inositol, d-sorbitol, methyl-α-d-mannopyranoside, arbutin, d-turanose, d-lyxose, d-tagatose, d-fucose, d-arabitol, l-arabitol and potassium 5-ketogluconate. Although strain mt24T gave negative reactions for d-sorbitol, arbutin and d-turanose, these substrates were found to be metabolized by Paenibacillus timonensis. Strain mt24T and Paenibacillus timonensis both showed negative reactions for gelatinase and positive reactions for starch hydrolysis.
Cells were sensitive to doxycycline, ceftriaxone, penicillin, gentamicin, trimethoprim/sulfamethoxazole, imipenem, vancomycin, rifamycin and teicoplanin but resistant to colistin, ciprofloxacin, erythromycin, oxacillin, nitrofurantoin and metronidazole. Phenotypic features of P. phocaensis were compared to those of close relatives (Table 3).

Table 3

Differential characteristics of Paenibacillus phocaensis strain mt24T DSM 101777, Paenibacillus timonensis strain 2301032T CCUG 48216, Paenibacillus sanguinis strain 2301083T CCUG 48214, Paenibacillus barengoltzii strain SAFN-016T ATCC BAA-1209, Paenibacillus xylanilyticus strain XIL14T LMG 21957, Paenibacillus phoenicis strain 3PO2SAT NRRL B-59348, Paenibacillus sabinae strain T27T DSM 17841, Paenibacillus puldeungensis strain CAU 9324T CCUG 59189, Paenibacillus terrae strain AM141T JCM 11466 and Paenibacillus jamilae strain B.3T DSM 13815 [41], [42], [43], [44], [45], [46], [47]

Property	Paenibacillus phocaensis	Paenibacillus timonensis	Paenibacillus sanguinis	Paenibacillus barengoltzii	Paenibacillus xylanilyticus	Paenibacillus phoenicis	Paenibacillus sabinae	Paenibacillus puldeungensis	Paenibacillus terrae	Paenibacillus jamilae
Cell diameter (μm)	0.4–0.5	0.5	0.5	0.5–0.8	1.5–1.55	1.0–1.5	0.7–0.8	0.3–0.4	1.3–1.8	0.5–1.2
Oxygen requirement	+	+	+	+	+	+	+	+	+	+
Gram stain	−	+	+	+	+	+	+	+	+/−	+/−
Salt requirement	−	−	−	−	+	+	+	+	+	−
Motility	+	+	+	+	+	+	+	−	+	+
Endospore formation	+	+	+	+	+	+	+	+	+	+
Indole	−	NA	NA	−	−	+	−	NA	NA	−
Production of:
Catalase	+	+	−	+	+	+	+	−	+	+
Oxidase	+	−	−	+	−	NA	−	+	−	−
Nitrate reductase	−	w	+	−	+	+	+	+	+	+
Urease	−	NA	NA	+	−	−	NA	NA	−	NA
β-Galactosidase	+	NA	NA	+	+	+	NA	NA	NA	+
N-Acetyl-glucosamine	+	+	+	+	+	+	NA	−	w	−
Acid from:
l-Arabinose	−	+	+	+	+	+	NA	+	+	+
Ribose	+	−	−	+	NA	+	NA	+	+	+
Mannose	+	+	−	+	+	+	NA	−	+	+
Mannitol	+	−	+	+	+	+	NA	−	+	+
Sucrose	+	−	+	−	+	+	−	+	+	+
d-Glucose	+	+	−	−	+	+	−	+	+	+
d-Fructose	+	+	+	+	NA	+	−	−	+	+
d-Maltose	+	+	+	+	+	+	−	+	+	+
d-Lactose	+	+	+	NA	NA	+	−	+	+	+
Habitat	Human gut	Blood	Blood	Clean room floor	Air	Floor	Soil	Soil	Soil	Olive oil

+, positive result; −, negative result; w, weakly positive result; NA, data not available.

Genome properties

The genome is 5 521 412 bp long with 53.54% GC content (Fig. 6). It is composed of seven scaffolds comprising 39 contigs. Of the 4924 predicted genes, 4835 were protein-coding genes and 89 were RNA genes (eight 5S rRNA genes, two 16S rRNA genes, three 23S rRNA genes and 76 tRNA genes). A total of 3560 genes (73.63%) were assigned a putative function (by COGs or by NR BLAST). A total of 101 genes (2.09%) were identified as ORFans. The remaining 902 genes (20.52%) were annotated as hypothetical proteins. The properties and statistics of the genome are summarized in Table 4, while the distribution of genes into COGs functional categories is presented in Table 5.
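The gene counts above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers quoted in this section; the variable and function names are illustrative, and which denominator the authors used for each percentage is an assumption.

```python
# Counts quoted in the text.
rna_genes = {"5S rRNA": 8, "16S rRNA": 2, "23S rRNA": 3, "tRNA": 76}
protein_coding_genes = 4835
orfan_genes = 101

total_rna_genes = sum(rna_genes.values())
total_genes = protein_coding_genes + total_rna_genes


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


print(total_rna_genes, total_genes)
# ORFan share relative to protein-coding genes versus relative to all genes.
print(percent(orfan_genes, protein_coding_genes))
print(percent(orfan_genes, total_genes))
```

The two ORFan shares (2.09 relative to protein-coding genes, 2.05 relative to all predicted genes) appear to correspond to the two ORFan percentages reported for this genome.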

Fig. 6

Graphical circular map of chromosome. From outside to centre: genes on forward strand colored by COGs categories (only genes assigned to COGs), RNA genes (tRNAs green, rRNAs red), G+C content and G+C skew. COGs, Clusters of Orthologous Groups database.

Table 4

Nucleotide content and gene count levels of genome

Attribute	Genome (total)
	Value	% of total^a
Size (bp)	5 521 412	100
G+C content (bp)	2 952 929	53.54
Coding region (bp)	4 699 807	85.11
Total genes	4924	100
RNA genes	89	1.80
Protein-coding genes	4835	98.19
Genes with function prediction	3755	76.25
Genes assigned to COGs	3338	67.79
Genes with peptide signals	707	14.35
CRISPR repeats	4	0.08
ORFan genes	101	2.05
Genes with transmembrane helices	1222	24.81
Genes associated with PKS or NRPS	9	0.10
No. of antibiotic resistance genes	0	0
No. of genes associated with Pfam-A domains	4484	91.06

COGs, Clusters of Orthologous Groups database.

^a Total is based on either size of the genome in base pairs or total number of protein-coding genes in annotated genome.

Table 5

Number of genes associated with 25 general COGs functional categories

Code	Value	% of total^a	Description
J	174	2.98	Translation
A	0	0.0	RNA processing and modification
K	476	8.16	Transcription
L	178	3.05	Replication, recombination and repair
B	1	0.02	Chromatin structure and dynamics
D	33	0.57	Cell cycle control, mitosis and meiosis
Y	0	0.0	Nuclear structure
V	113	1.94	Defense mechanisms
T	324	5.55	Signal transduction mechanisms
M	207	3.55	Cell wall/membrane biogenesis
N	74	1.27	Cell motility
Z	3	0.05	Cytoskeleton
W	0	0.0	Extracellular structures
U	56	0.96	Intracellular trafficking and secretion
O	106	1.82	Posttranslational modification, protein turnover, chaperones
C	161	2.76	Energy production and conversion
G	619	10.61	Carbohydrate transport and metabolism
E	309	5.30	Amino acid transport and metabolism
F	81	1.39	Nucleotide transport and metabolism
H	116	1.99	Coenzyme transport and metabolism
I	90	1.54	Lipid transport and metabolism
P	251	4.30	Inorganic ion transport and metabolism
Q	83	1.42	Secondary metabolites biosynthesis, transport and catabolism
R	550	9.43	General function prediction only
S	316	5.42	Function unknown
—	417	7.14	Not in COGs

COGs, Clusters of Orthologous Groups database.

^a Total is based on total number of protein-coding genes in annotated genome.

Genome comparison

Compared with the closely related species reported in Table 6, the genome of Paenibacillus phocaensis (5.52 Mb) is larger than those of Paenibacillus barengoltzii (4.78 Mb), Paenibacillus sanguinis (4.8 Mb) and Paenibacillus sabinae (5.27 Mb) but smaller than those of Paenibacillus polymyxa (5.76 Mb), Paenibacillus macerans (7.34 Mb) and Paenibacillus borealis (8.61 Mb). The G+C content of P. phocaensis (53.54%) is higher than that of Paenibacillus macerans, Paenibacillus sabinae, Paenibacillus barengoltzii, Paenibacillus borealis, Paenibacillus sanguinis and Paenibacillus polymyxa (52.6, 52.6, 51.9, 51.4, 49.3 and 45.5% respectively). Similarly, the number of protein-coding genes in P. phocaensis (4835) is higher than in P. polymyxa (4715), P. sabinae (4634), P. sanguinis (4238) and P. barengoltzii (4145) and lower than in P. macerans (6166) and P. borealis (6698). Likewise, the total number of genes (4924) is higher than that of P. barengoltzii (4284), P. sanguinis (4388) and P. sabinae (4857) and lower than that of P. polymyxa (4928), P. macerans (6474) and P. borealis (6945). Nevertheless, Fig. 7 shows that the distribution of genes into COGs categories is similar in all the genomes, except for the COGs category for extracellular structures, which was represented only in P. borealis.

Table 6

Genome comparison of closely related species to Paenibacillus phocaensis strain mt24T

Organism	INSDC	Size (Mb)	G+C (%)	Protein-coding genes	Total genes
Paenibacillus phocaensis strain mt24T	FCOQ00000000	5.52	53.54	4835	4924
Paenibacillus barengoltzii strain NBRC 101216	ASSZ00000000.1	4.78	51.9	4145	4284
Paenibacillus macerans strain ZY18	JMQA00000000.1	7.34	52.6	6166	6474
Paenibacillus polymyxa strain CF05	CP009909.1	5.76	45.5	4715	4928
Paenibacillus sanguinis strain 2301083	ARGO00000000.1	4.8	49.3	4238	4388
Paenibacillus sabinae strain T27	CP004078.1	5.27	52.6	4634	4857
Paenibacillus borealis strain DSM 13188	CP009285.1	8.61	51.4	6698	6945

INSDC, International Nucleotide Sequence Database Collaboration.
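The comparisons drawn from Table 6 can be reproduced by sorting the comparator genomes relative to P. phocaensis for each column. The sketch below is only an illustration; the dictionary simply transcribes the values of Table 6, and the function name is hypothetical.

```python
# (size in Mb, G+C %, protein-coding genes, total genes), transcribed from Table 6.
genomes = {
    "P. phocaensis": (5.52, 53.54, 4835, 4924),
    "P. barengoltzii": (4.78, 51.9, 4145, 4284),
    "P. macerans": (7.34, 52.6, 6166, 6474),
    "P. polymyxa": (5.76, 45.5, 4715, 4928),
    "P. sanguinis": (4.8, 49.3, 4238, 4388),
    "P. sabinae": (5.27, 52.6, 4634, 4857),
    "P. borealis": (8.61, 51.4, 6698, 6945),
}
FIELDS = ("size", "gc", "protein_coding", "total_genes")


def split_by(field: str, reference: str = "P. phocaensis"):
    """Return (species with a lower value, species with a higher value) than the reference."""
    index = FIELDS.index(field)
    ref_value = genomes[reference][index]
    lower = sorted(n for n, v in genomes.items() if n != reference and v[index] < ref_value)
    higher = sorted(n for n, v in genomes.items() if n != reference and v[index] > ref_value)
    return lower, higher


for field in FIELDS:
    lower, higher = split_by(field)
    print(field, "| lower:", lower, "| higher:", higher)
```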

Fig. 7

Distribution of functional classes of predicted genes according to COGs of protein. COGs, Clusters of Orthologous Groups database.

P. phocaensis shared 2192, 2217, 2526, 2651, 2922 and 2946 orthologous proteins with P. sabinae, P. polymyxa, P. sanguinis, P. borealis, P. barengoltzii and P. macerans respectively. Among species with standing in nomenclature, AGIOS ranged from 67.09% between P. macerans and P. polymyxa to 76.40% between P. macerans and P. barengoltzii. When P. phocaensis was compared to the other species, AGIOS values ranged from 67.18% with P. polymyxa to 83.07% with P. barengoltzii (Table 7). Values between 20.3% (similarity between P. phocaensis and P. sanguinis) and 25.6% (P. phocaensis and P. barengoltzii) were obtained for digital DDH (Table 8). These values are under the 70% threshold used to delineate a new species with DDH [36], [37], thus establishing the status of P. phocaensis as a new species.

Table 7

Numbers of orthologous proteins shared between genomes (upper right)^a

	Paenibacillus phocaensis	Paenibacillus barengoltzii	Paenibacillus borealis	Paenibacillus macerans	Paenibacillus polymyxa	Paenibacillus sanguinis	Paenibacillus sabinae
P. phocaensis	4916	2922	2651	2946	2217	2526	2192
P. barengoltzii	83.07	4305	2514	2793	2170	2421	2139
P. borealis	68.73	68.48	6968	2747	2453	2394	2481
P. macerans	77.60	76.40	68.68	6476	2414	2568	2254
P. polymyxa	67.18	67.44	67.79	67.09	4929	2093	2167
P. sanguinis	75.32	74.88	68.44	74.05	67.62	4395	1981
P. sabinae	69.32	68.85	74.05	69.49	67.57	68.31	4866

^a Average percentage similarity of nucleotides corresponding to orthologous proteins shared between genomes (lower left) and numbers of proteins per genome (diagonal).

Table 8

Pairwise comparison of Paenibacillus phocaensis strain mt24T with other species using GGDC, formula 2 (DDH estimates based on identities/HSP length)^a, upper right

	Paenibacillus phocaensis	Paenibacillus barengoltzii	Paenibacillus borealis	Paenibacillus macerans	Paenibacillus polymyxa	Paenibacillus sanguinis	Paenibacillus sabinae
P. phocaensis	100 ± 00	25.6 ± 2.41	22.3 ± 2.36	22.7 ± 2.37	25.8 ± 2.41	20.3 ± 2.23	22.2 ± 2.36
P. barengoltzii		100 ± 00	22.6 ± 2.36	21.1 ± 2.33	26.5 ± 2.42	19.3 ± 2.29	22.8 ± 2.37
P. borealis			100 ± 00	22.8 ± 2.37	25.7 ± 2.41	19.6 ± 2.30	20.4 ± 2.32
P. macerans				100 ± 00	27.4 ± 2.43	19.9 ± 2.31	23.1 ± 2.37
P. polymyxa					100 ± 00	19.6 ± 2.30	30.8 ± 2.45
P. sanguinis						100 ± 00	19.9 ± 2.30
P. sabinae							100 ± 00

DDH, DNA-DNA hybridization; GGDC, Genome-to-Genome Distance Calculator; HSP, high-scoring segment pairs.

^a Confidence intervals indicate inherent uncertainty in estimating DDH values from intergenomic distances based on models derived from empirical test data sets (which are always limited in size). These results are in accordance with the 16S rRNA (Fig. 1) and phylogenomic analyses as well as GGDC results.
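The species-delineation check based on digital DDH amounts to comparing each pairwise estimate with the 70% threshold cited in the text. The sketch below applies that check to the first row of Table 8; the names are illustrative and the confidence intervals are ignored, which is a simplification.

```python
# Digital DDH estimates (%) between P. phocaensis and each comparator, from Table 8.
ddh_with_phocaensis = {
    "P. barengoltzii": 25.6,
    "P. borealis": 22.3,
    "P. macerans": 22.7,
    "P. polymyxa": 25.8,
    "P. sanguinis": 20.3,
    "P. sabinae": 22.2,
}

DDH_SPECIES_THRESHOLD = 70.0  # threshold quoted in the text [36], [37]


def same_species_candidates(ddh_values: dict, threshold: float = DDH_SPECIES_THRESHOLD) -> list:
    """Comparators whose DDH estimate reaches the species threshold."""
    return [name for name, value in ddh_values.items() if value >= threshold]


candidates = same_species_candidates(ddh_with_phocaensis)
print(candidates if candidates else "no comparator reaches the 70% threshold")
```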

Conclusion

Given the 98.2% similarity level of the 16S rRNA sequence of strain mt24T with Paenibacillus timonensis, its MALDI-TOF MS profile and its annotated genome comparison with close species, we hereby create a new species within the genus Paenibacillus called Paenibacillus phocaensis (= CSUR P2238 = DSM 101777).

Description of Paenibacillus phocaensis sp. nov. strain mt24T

Paenibacillus phocaensis (pho.ce′ensis, ‘from Phocée,’ the city after which Marseille was named; strain mt24T was isolated in Marseille). The organism is facultatively anaerobic. The major fatty acid is 15:0 anteiso. It is oxidase and catalase positive. Nitrates were reduced to nitrites. Urease and alkaline phosphatase were negative. Positive reactions were obtained for esterase C4, esterase lipase C8, leucine arylamidase, naphthol-AS-BI-phosphohydrolase, α-galactosidase, β-galactosidase, α-glucosidase, β-glucosidase and N-acetyl-β-glucosaminidase. Negative reactions were obtained for lipase C14, valine arylamidase, cysteine arylamidase, trypsin, α-chymotrypsin, acid phosphatase, β-glucuronidase, α-mannosidase and α-fucosidase. Using an API 50CH strip, l-arabinose, d-ribose, d-galactose, d-glucose, d-fructose, d-mannose, l-rhamnose, d-mannitol, methyl-α-d-glucopyranoside, N-acetyl-glucosamine, amygdalin, esculin, salicin, d-cellobiose, d-maltose, d-lactose, d-melibiose, d-sucrose, d-trehalose, d-melezitose, d-raffinose, starch, glycogen, xylitol, gentiobiose, l-fucose, potassium gluconate and potassium 2-ketogluconate were metabolized. Negative reactions were observed for glycerol, erythritol, d-arabinose, d-ribose, d-xylose, d-adonitol, methyl-β-d-xylopyranoside, l-sorbose, dulcitol, inositol, d-sorbitol, methyl-α-d-mannopyranoside, arbutin, d-turanose, d-lyxose, d-tagatose, d-fucose, d-arabitol, l-arabitol and potassium 5-ketogluconate. Cells were sensitive to doxycycline, ceftriaxone, penicillin, gentamicin, trimethoprim/sulfamethoxazole, imipenem, vancomycin, rifamycin and teicoplanin but resistant to colistin, ciprofloxacin, erythromycin, oxacillin, nitrofurantoin and metronidazole. The G+C content is 53.54%. The whole genome shotgun sequence and 16S rRNA sequence are deposited in GenBank under accession numbers FCOQ00000000 and LN998053 respectively.
The type strain mt24T (= CSUR P2238 = DSM 101777) was isolated from the faeces of a 13-month-old girl from Niger.

