
Isolation, identification, and whole-genome sequencing of high-yield protease bacteria from Daqu of ZhangGong Laojiu.

Yanbo Liu^1,2,3,4, Junying Fu^1,3,4, Linlin Wang^1,3,4, Zhijun Zhao^1,3,4, Huihui Wang^1,3,4, Suna Han², Xiyu Sun^1,3,5, Chunmei Pan^1,3,4.

Abstract

A total of 296 strains of protease-producing bacteria were isolated and purified from medium-temperature Daqu produced by ZhangGong LaoJiu Wine Co. Ltd. After calculating the ratio of transparent ring diameter to colony diameter and measuring the protease activities, a strain of high-yield protease bacteria, called DW-7, was screened out with a protease activity of 99.54 U/mL. Through morphological observation, 16S rDNA sequence analysis, and physiological and biochemical tests, the isolated bacteria DW-7 was determined to be Bacillus velezensis. In addition, whole-genome sequencing (WGS), using PacBio and the Illumina platform, was performed. Gene annotation was then conducted using the Clusters of Orthologous Groups (COG), Kyoto Encyclopedia of Genes and Genomes (KEGG), Non-Redundant Protein Sequence Database (NR), and Gene Ontology (GO) databases. The results showed that the genome of DW-7 was 3,942,829 bp long with a GC content of 46.45%. A total of 3,662 protein-encoding genes were predicted, with a total length of 3,402,822 bp. Additionally, 2,283; 2,796; and 2,127 genes were annotated in the COG, KEGG, and GO databases, respectively. A total of 196 high-yield protease genes were mainly enriched in the metabolism of alanine, aspartic acid, glutamate, glycine, serine, and threonine, as well as ABC transporter and transporter pathways.

Entities: Chemical

Mesh：

Substances：
Peptide Hydrolases

Year: 2022 PMID： 35472204 PMCID： PMC9041807 DOI： 10.1371/journal.pone.0264677

Source DB: PubMed Journal: PLoS One ISSN： 1932-6203 Impact factor: 3.240

Introduction

Chinese Luzhou-flavored liquor is typically produced via a traditional solid-state fermentation process. Liquor fermentation is a complex process that takes place in an open environment and is coordinated by various microorganisms and enzymes. The microorganisms in liquor mainly come from the Daqu, the surrounding environment, pit mud, and yellow water [1]. Daqu is a unique saccharification starter for solid-state liquor brewing in China. It not only contains important crude enzymes and complex microflora, but is also a source of important biologically active substances, such as aroma precursors that determine flavor characteristics [2, 3]. The quality of Daqu directly affects the quality, yield, and style of the liquor product. The microorganisms in Daqu mainly include bacteria, yeasts, molds, and a small number of actinomycetes. Bacteria are vital microorganisms in Daqu, and their types and numbers directly determine its quality. Bacillus spp. are the dominant flora in Daqu and produce various enzymes that are closely linked to the formation of flavor substances during fermentation [4]. It has been reported that many flavor substances in liquor are derived from proteins, so these proteins and the enzymes that act on them are significant for developing the flavor of liquor. For example, protease is an enzyme that hydrolyzes peptide bonds to release amino acids or short peptides [5], and it can remain active under acidic, neutral, or alkaline conditions. Additionally, it is a metabolite of various microorganisms involved in the production of Daqu [6].
In the process of solid-state liquor fermentation, proteases in Daqu can promote the dissolution of raw materials, decompose proteins, produce amino acids, provide nitrogen sources, and flavor components, promote the growth of microorganisms, degrade yeast cells, improve the utilization of raw materials, cooperate in co-fermentation with other microbes, inhibit the production of fusel oil, and enhance the yield, quality, and aroma of the original liquor [7, 8]. Whole-genome sequencing (WGS) is a fast, low-cost, and highly efficient third-generation sequencing technology that can provide a complete bacterial genome sequence. The differences between species of the same genus can be better identified using WGS and by following gene annotations using online databases, such as GO, KEGG, COG, and NR. WGS has become a popular detection method and is widely used to identify microbial communities within intestinal flora, soil, and fungi [9, 10]. It is necessary for mining the core genome, identifying specific genes, and analyzing functional genomics, which ultimately contributes to the exploration of the diversity and biological characteristics of unknown microbial groups [11]. At present, domestic and foreign research mainly focuses on the physical and chemical properties and the purification technologies of proteases, as well as the methods to obtain high-yield protease strains and protease products through strain mutagenesis, genetic engineering, and other methods. For example, Lu et al. [12] screened out a high-yield, neutral protease bacterium from high-temperature yeast for making hard liquor, which was identified as B. amyloliquefaciens. Nie et al. [13] initially screened for microbes by calculating the transparent ring diameter, then re-screened by measuring protease activity, and finally identified two strains of high-yield proteases from Daqu in Jiang-flavor Chinese spirits, which were characterized as B. subtilis. 
However, high-yield protease-producing strains from the Daqu of Chinese Luzhou-flavored liquor, and the genetic basis of their protease production, are not well understood. In the present study, we screened a strain of high-yield protease bacteria from the Daqu of ZhangGong Laojiu. We obtained the whole-genome sequence by performing WGS using the PacBio and Illumina platforms. Following gene annotation using the GO, KEGG, COG, NR, and Swiss-Prot databases, we further identified the enriched target genes and metabolic pathways. Our findings provide theoretical references for the mining of high-yield proteases and their biological functions.

Materials and methods

Materials and reagents

A medium-temperature starter for our study was provided by Henan ZhangGong Laojiu Wine Co., Ltd. Sodium carbonate anhydrous was obtained from Tianjin Dingshengxin Chemical Co., Ltd; the reagents L-tyrosine, agar powder, casein, trichloroacetic acid, and sodium chloride were purchased from Tianjin Kemiou Chemical Reagent Co., Ltd; beef extract peptone was obtained from Beijing Aoboxing Biotechnology Co., Ltd; the Folin-Ciocalteu reagent was from Solarbio, Biotechnology Co., Ltd; the Ezup Column Bacteria Genomic DNA Purification Kit was from Sangon Biotech Co., Ltd.; DNA 1000 assay kit was from Lithuania Agilent Technologies; and the ABI StepOnePlus Real-Time PCR System was purchased from Life Technologies.

Instruments

A constant-temperature incubator was purchased from Shanghai Shcimo Medical Device Manufacturing Co., Ltd; a high-speed refrigerated centrifuge was provided by Shanghai Anting Device Manufacturing Co., Ltd; a Dk-8D digital thermostatic laboratory water bath was from Guangzhou Hezhong Biotechnology Co., Ltd; a three-dimensional autoclave was obtained from Shanghai Shenan Medical Device Manufacturing Co., Ltd; a 752 UV-VIS spectrophotometer was from Shanghai Jinghua Instruments; the biological purification table was from Suzhou Purification Equipment Co., Ltd; the PacBio single-molecule real-time (SMRT) sequencing system was acquired from Pacific Biosciences of California; the NovaSeq 6000 system was from Illumina; a 2100 Bioanalyzer Instrument was from Agilent Technologies; the Eppendorf 5427R centrifuge was from Eppendorf; the NanoDrop 2000 spectrophotometer was obtained from Thermo Fisher Scientific; the DYY-6C agarose gel electrophoresis kit was from Beijing Liuyi Biotechnology Co., Ltd; and finally, the PCR instrument was from Dongsheng Xingye Scientific Instrument Co., Ltd.

Culture media

The casein medium [14] contained: 3 g beef extract, 5 g sodium chloride, 10 g peptone, 20 g agar powder, and 4 g casein, made up to a constant volume of 1000 mL with distilled water at pH 7.0–7.2. The medium was sterilized at 121°C for 20 min. The beef extract peptone agar medium contained: 3 g beef extract, 5 g sodium chloride, 10 g peptone, 20 g agar powder, and 4 g casein in a constant volume of 1000 mL of distilled water at pH 7.0–7.2. The medium was sterilized at 121°C for 20 min. The fluid medium [15] contained: 3 g beef extract, 5 g sodium chloride, and 10 g peptone in a constant volume of 1000 mL of distilled water at pH 7.0–7.2, and was sterilized at 121°C for 20 min. The solid medium [16] contained 20 g bran and 20 mL distilled water, and was sterilized at 121°C for 20 min.

Isolation: Daqu was ground into powder, 10 g of which was added to an Erlenmeyer flask containing 90 mL of sterile distilled water. After gently shaking the suspension in a shaking incubator at 150 rpm for 30 min [17], 1 mL of the suspension was transferred to a tube containing 9 mL of sterile water, giving a 10-fold dilution. Aliquots (0.1 mL) of the 10^−3, 10^−4, 10^−5, and 10^−6 dilutions were plated and cultured at 37°C for 24 h in inverted position [18].

Initial screening: Under aseptic conditions, a sterile pipette tip was used to place a small amount of purified bacteria on the casein medium plate. Each strain was plated in triplicate and cultured for 24 h in inverted position to observe the formation of a transparent ring. The ratio (D/d) of transparent ring diameter (D) to colony diameter (d) was calculated, and strains with a larger D/d ratio were selected for re-screening [19].

Preparation of the standard curve: The standard curve was plotted using the absorbance value and the concentration of L-tyrosine as the ordinate and abscissa, respectively.
Preparation of crude enzyme solution: To prepare the crude enzyme solution, bacteria were inoculated onto solid beef extract peptone medium and cultured at 37°C for 24 h in inverted position. Subsequently, a single colony with distinct morphology was picked, inoculated into the liquid medium, and cultured on a shaker at 37°C for 24 hours. The bacterial suspension was inoculated into the solid culture medium at an inoculation volume of 10% and incubated at 37°C for 3 days. Solid fermentation product (10 g) was added to a 250 mL Erlenmeyer flask containing 90 mL of distilled water and incubated in a 40°C water bath for 1 hour, with stirring every 15 minutes. After centrifugation, the crude enzyme solution was obtained [20].

Measurement of protease activity using the Folin-Ciocalteu method: One unit (U) of protease activity was defined as the amount of enzyme in 1 mL of protease solution that hydrolyzes casein to produce 1 μg of L-tyrosine per minute at pH 7.5 and 40°C. Protease activity was calculated as: protease activity (U/mL) = A × OD × 4 × N / 10.

Screening of protease-producing bacteria

The constant A was obtained from the standard curve and is equal to the number of micrograms of L-tyrosine equivalent to OD680 = 1; 4 indicates that 1 mL of solution was taken for measurement from a total volume of 4 mL; N is the dilution factor of the protease solution to be measured; 10 is the reaction time of 10 min; and OD is the mean optical density of the parallel samples.

Morphology identification: The high-yield, protease-producing bacteria were activated on the plate, inoculated on the casein medium by the spot inoculation method, and subjected to Gram staining. The morphology was observed under a microscope [21].

Molecular characterization: Using the DNA extracted from the high-yield, protease-producing bacteria as a template, the 16S rDNA gene in the bacterial genomic DNA was amplified using the bacterial universal primers 27F (5’-AGA GTT TCA TCT GGC TCA G-3’) and 1492R (5’-GGT TAC CTT GTT ACG ACT T-3’). PCR was performed with an initial denaturation at 94°C for 5 minutes, followed by 35 cycles of denaturation at 94°C for 30 seconds, annealing at 55°C for 45 seconds, and extension at 72°C for 1.5 minutes. A final extension was performed at 72°C for 10 minutes, and the sample was stored at 4°C. A 50 μL PCR amplification solution was prepared, and the amplification results were checked by 1% agarose gel electrophoresis to observe the specific products and their relative molecular masses. The amplified products were sent to Shanghai Sangon Biotech Co., Ltd. for sequencing [22]. Spliced sequences were compared with data in the NCBI database, and reference strains with close genetic relationships were analyzed using MEGA 6.0 software to construct a phylogenetic tree showing the genus and species of the bacteria [23].
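The thermocycling program described above can be written down as data, which makes the total programmed hold time easy to check. This is a minimal sketch: the `Step` class and the `hold_time_seconds` helper are hypothetical names introduced here for illustration, and the calculation ignores ramp times between temperatures and the final 4°C hold, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One temperature hold in a PCR program (hypothetical helper type)."""
    name: str
    temp_c: float
    seconds: float


# Values copied from the Methods text.
INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = [
    Step("denaturation", 94, 30),
    Step("annealing", 55, 45),
    Step("extension", 72, 90),
]
FINAL = Step("final extension", 72, 10 * 60)
N_CYCLES = 35


def hold_time_seconds(initial, cycle, n_cycles, final):
    """Sum of programmed hold times only; ramping between steps is not included."""
    per_cycle = sum(step.seconds for step in cycle)
    return initial.seconds + n_cycles * per_cycle + final.seconds


total_s = hold_time_seconds(INITIAL, CYCLE, N_CYCLES, FINAL)
print(f"Programmed hold time: {total_s} s ({total_s / 60:.2f} min)")
```

The real run time on an instrument would be longer, since heating and cooling between steps depend on the thermocycler.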

Identification of protease-producing bacteria

Physiological and biochemical tests: Physiological and biochemical tests of the high-yield protease-producing bacteria were performed based on the 8th edition of Bergey’s Manual of Determinative Bacteriology and the Common Identification Methods of General Bacteria [24].

DNA extraction: Genomic DNA was extracted according to the instructions of the bacterial DNA extraction kit. After the purity and concentration of the DNA were checked to ensure the quality of WGS, the samples were sent to Guangzhou Gene Denovo Biotechnology Co., Ltd. for sequencing.

PacBio sequencing: Genomic DNA was sheared into 8–10 kbp fragments using a g-TUBE, followed by DNA end repair. The quality of the resulting SMRT Bell DNA template was checked, and an Agilent 2100 bioanalyzer was used to evaluate the insert size. Sequencing was performed using the PacBio platform.

Illumina sequencing: Genomic DNA fragments were prepared using dsDNA fragmentase, followed by end repair, phosphorylation, and addition of polyA tails and sequencing adapters. DNA fragments were purified using AMPure XP magnetic beads, and target fragments of 300–400 bp were subjected to PCR to establish the sequencing library. PCR products were further purified using AMPure XP magnetic beads and examined using an Agilent 2100 bioanalyzer. Qualified products were sequenced using the Illumina NovaSeq 6000 platform.

Bioinformatic analyses: Low-quality reads in the raw PacBio and Illumina sequencing data were filtered out to ensure the accuracy and reliability of the bioinformatic analyses.
Clean data were assembled, and the genomic components of the target strain were analyzed. Tandem repeats, tRNAs, rRNAs, sRNAs, genomic islands, transposons, and prophages were predicted using TRF (version 4.09) [25], tRNAscan (version 1.3.1) [26], RNAmmer (version 1.2) [27], cmscan (version 1.1.2) [28], IslandPath-DIMOB (version 1.0.0) [29], TransposonPSI (version: 20100822) [30], and PHAST (version 2.0) [31], respectively. Gene function annotation was performed using the following online databases: COG database (https://www.ncbi.nlm.nih.gov/COG/), KEGG database (https://www.genome.jp/kegg/), NR database (ftp://ftp.ncbi.nih.gov:21/blast/db/FASTA/), SwissProt database (https://www.uniprot.org/), and GO database (http://geneontology.org/).

Isolation: Through gradient dilution and plate coating, 296 strains of protease-producing bacteria were screened out from Daqu of ZhangGong Laojiu.

Initial screening: A total of 296 strains of protease-producing bacteria were inoculated on casein medium and cultured at 37°C for 24 h in inverted position. The calculated D/d values are listed in Table 1.

Table 1

Diameter of transparent ring and strain diameter ratio.

No.	d/cm	D/cm	D/d
DW-2	0.60±0.06	2.30±0.10	3.84±0.25
DW-7	0.57±0.15	2.50±0.06	4.39±0.98
DW-8	0.50±0.06	1.90±0.06	3.80±0.34
DW-22	0.60±0.06	2.20±0.12	3.69±0.62
DW-32	0.60±0.06	2.40±0.10	4.00±0.25
DW-40	0.50±0.00	1.80±0.06	3.60±0.12
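The D/d screening criterion can be sketched in a few lines. The example below uses the mean diameters from Table 1 and ranks the strains by the ratio of those means. This is an illustration only: the reported D/d values appear to be averaged per replicate, so ratios computed from the mean diameters can differ slightly from the tabulated ones (for example for DW-22). The function name `ring_ratio` is hypothetical.

```python
# Mean colony diameter d and transparent-ring diameter D (cm), copied from Table 1.
TABLE1 = {
    "DW-2": (0.60, 2.30),
    "DW-7": (0.57, 2.50),
    "DW-8": (0.50, 1.90),
    "DW-22": (0.60, 2.20),
    "DW-32": (0.60, 2.40),
    "DW-40": (0.50, 1.80),
}


def ring_ratio(d_cm, D_cm):
    """Return D/d: transparent-ring diameter divided by colony diameter."""
    if d_cm <= 0:
        raise ValueError("colony diameter must be positive")
    return D_cm / d_cm


ratios = {strain: ring_ratio(d, D) for strain, (d, D) in TABLE1.items()}

# Strains with the largest ratio come first; these are the re-screening candidates.
ranked = sorted(ratios, key=ratios.get, reverse=True)
for strain in ranked:
    print(f"{strain}\t{ratios[strain]:.2f}")
```

Ranking by ratio rather than by ring diameter alone normalizes for colony size, so a large colony with a proportionally small clearing zone is not favored.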

Results

Isolation of protease-producing bacteria

The D/d ratio of DW-7 was 4.39, the highest among all of the isolated strains, which is likely attributable to its strong protease activity and ability to degrade casein. Protease activity was then measured in the six strains with the highest D/d ratios to identify high-yield protease-producing bacteria.

Screening of high-yield protease-producing bacteria

As shown in Fig 1, the regression equation of the L-tyrosine standard curve was y = 0.0102x - 0.0057, and the correlation coefficient (R) was 0.9977, indicating a strong linear relationship between absorbance and L-tyrosine concentration.

Fig 1

L-tyrosine standard curve.

I. Measurement of protease activity using the Folin-Ciocalteu method

As shown in Table 2, DW-7 had the highest protease activity among the six strains tested, at 99.54 U/mL. We subsequently examined the morphology, molecular biology, and physiological and biochemical properties of DW-7.

Table 2

Determination of protease activity.

No.	OD-value	protease activity (U/mL)
DW-2	0.555±0.01	87.56±1.79
DW-7	0.631±0.01	99.54±2.23
DW-8	0.540±0.00	85.19±0.11
DW-22	0.525±0.00	82.82±0.00
DW-32	0.574±0.01	90.55±1.90
DW-40	0.497±0.03	78.41±4.24
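The activities in Table 2 can be cross-checked against the standard curve. The sketch below assumes the activity formula has the form A × OD × 4 × N / 10, as implied by the term definitions in the Methods, with A obtained by solving the reported regression line for OD = 1. The dilution factor N is not stated in the text; N = 4 is an assumption chosen here because it reproduces the tabulated values, and the function names are hypothetical.

```python
# Standard curve reported in the Results: y = 0.0102x - 0.0057
# (y = absorbance, x = L-tyrosine concentration).
SLOPE = 0.0102
INTERCEPT = -0.0057

# ASSUMPTION: the dilution factor is not given in the text; 4 reproduces Table 2.
DILUTION_N = 4


def tyrosine_at_od(od, slope=SLOPE, intercept=INTERCEPT):
    """Invert the standard curve: L-tyrosine amount corresponding to an absorbance."""
    return (od - intercept) / slope


# Constant A: micrograms of L-tyrosine equivalent to OD680 = 1.
A = tyrosine_at_od(1.0)


def protease_activity(od, a=A, n=DILUTION_N):
    """Activity in U/mL, assuming the form A * OD * 4 * N / 10."""
    return a * od * 4 * n / 10


# Mean OD values and reported activities (U/mL), copied from Table 2.
TABLE2 = {
    "DW-2": (0.555, 87.56),
    "DW-7": (0.631, 99.54),
    "DW-8": (0.540, 85.19),
    "DW-22": (0.525, 82.82),
    "DW-32": (0.574, 90.55),
    "DW-40": (0.497, 78.41),
}

for strain, (od, reported) in TABLE2.items():
    print(f"{strain}\tcomputed {protease_activity(od):.2f}\treported {reported:.2f}")
```

Under these assumptions every computed value agrees with Table 2 to within rounding, which suggests the reported activities are a fixed multiple of the measured OD.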

II. Identification of high-yield protease-producing bacteria

Cell colonies were milky white with a moist, viscous, and wrinkled surface. They were round with smooth edges but lacked flagella (Fig 2A). Gram staining was performed to observe the cell morphology under a microscope. The cells stained red, suggesting that the bacteria were Gram-negative. The individual bacterial cells were rod-shaped (Fig 2B).

Fig 2

(a) Colony morphology of DW-7; (b) Morphology of DW-7 cells (magnification 1000X).

Molecular characterization

The 16S rDNA fragment of DW-7 was amplified by PCR and visualized by agarose gel electrophoresis. As shown in Fig 3, the size of the amplified product was approximately 1,500 bp, which is consistent with that of the target product. A phylogenetic tree was constructed using MEGA 6.0.

Fig 3

Electrophoresis results of DW-7.

Comparative analysis using the NCBI database showed that DW-7 shared the closest relationship with Bacillus velezensis, at 97.81% similarity (Fig 4).

Fig 4

Phylogenetic tree of the DW-7 bacteria.

Physiological and biochemical test results

DW-7 was catalase-positive, was able to use D-mannitol to produce acid, and survived on a medium containing 10% NaCl or at pH 5.0. Citrate was not utilized, but propionate was (Table 3). Based on the physiological and biochemical test results, as well as 16S rDNA sequence analysis, DW-7 was confirmed to be Bacillus velezensis.

Table 3

Physiological and biochemical analysis of the DW-7 bacterial strain.

Experiment	Result
Tolerance to 10% NaCl	+
Tolerance to pH 5.0	+
D-mannitol acid production	+
Catalase test	+
Propionate	+
Citrate	-

+ indicates a positive result, and − indicates a negative result.

Genome assembly of DW-7

WGS of DW-7 was performed using the PacBio sequencing platform, and the results showed that there were 193,753 effective reads after filtration, with a mean sequence length of 8380.3 bp (Table 4).

Table 4

Genome statistics of the DW-7 strain.

Type	Length	Amount
Sequence	3942829 bp	1
Gene	3402822 bp	3662
GC content	46.46%
Total transposon length	4515 bp	64
Long scattered sequence	1869 bp	24
Short scattered sequence	1474 bp	22
Long terminal repeat sequence	185 bp	3
tandem repeat sequences	10086 bp	147
tRNA	6639 bp	86
rRNA	41367 bp	37
sRNA	2227 bp	17
Gene Islands	28970 bp	1
prophage	90882 bp	1

The DW-7 genome consisted of a circular sequence 3,942,829 bp in length. Protein-encoding genes were predicted using the NCBI database. A total of 3,662 genes were identified, with a combined length of 3,402,822 bp. The longest gene was 16,299 bp, and the shortest was 75 bp. The GC content was 46.46%. There were 64 transposons in the full genome of DW-7, with a total length of 4,515 bp. Additionally, there were 24 long interspersed repeats (LINEs) with a total length of 1,869 bp, accounting for 0.05% of the genome, and 22 short interspersed repeats (SINEs) with a total length of 1,474 bp, accounting for 0.04% of the genome. There were also three long terminal repeats (LTRs) with a combined length of 185 bp and 147 tandem repeats with a combined length of 10,068 bp, which together accounted for 0.26% of the genome. There were 86 tRNAs detected in the genome of DW-7, with a combined length of 6,639 bp and a mean length of 77 bp, as well as nine 5S rRNAs with a total length of 1,044 bp and a mean length of 116 bp. Nine 16S rRNAs with a combined length of 13,948 bp and a mean length of 1,549 bp and nine 23S rRNAs with a combined length of 26,375 bp and a mean length of 2,930 bp were also detected. A total of 17 sRNAs with a combined length of 2,227 bp and a mean length of 131 bp were detected. Finally, one genomic island with a total length of 28,970 bp and one prophage with a length of 90,882 bp were detected in DW-7.
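The mean lengths and genome fractions quoted above follow directly from the totals and counts. The short sketch below recomputes them from the figures in the text; the `summarize` helper is a hypothetical name, and the coding fraction printed at the end is a derived value that the text itself does not report.

```python
GENOME_BP = 3_942_829   # genome length from the text
CODING_BP = 3_402_822   # combined length of predicted genes
N_GENES = 3_662

# feature: (combined length in bp, count), copied from the text.
FEATURES = {
    "tRNA": (6_639, 86),
    "5S rRNA": (1_044, 9),
    "16S rRNA": (13_948, 9),
    "23S rRNA": (26_375, 9),
    "sRNA": (2_227, 17),
    "LINE": (1_869, 24),
    "SINE": (1_474, 22),
}


def summarize(total_bp, count, genome_bp=GENOME_BP):
    """Return (mean length in bp, percentage of the genome covered)."""
    return total_bp / count, 100 * total_bp / genome_bp


for name, (total_bp, count) in FEATURES.items():
    mean_bp, pct = summarize(total_bp, count)
    print(f"{name}\tmean {mean_bp:.1f} bp\t{pct:.2f}% of genome")

# Derived values, not stated in the text.
print(f"mean gene length: {CODING_BP / N_GENES:.1f} bp")
print(f"coding fraction: {100 * CODING_BP / GENOME_BP:.2f}%")
```

The three rRNA totals sum to 41,367 bp, which matches the rRNA row of Table 4.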

Bacterial genome components

Circular genome maps were depicted based on assembled genome sequences and prediction results of protein-encoding genes, which provided a visual of the genomic characteristics (Fig 5).

Fig 5

Complete map of the genome of DW-7 bacteria.

The outermost circle of the circular genome map of DW-7 is 3,402,822 bp in length. The second and third circles are positive- and negative-strand coding regions, respectively. Different colors represent different functions annotated by the COG database. A total of 3,662 protein-encoding genes were identified, of which 2,796 (76.35%) genes were annotated by COG. Metabolism, information storage and processing, cell function and signal transmission, and atypical features were the four main functional categories. A total of 25 types were annotated, including general function prediction only, amino acid transport and metabolism, unknown function, transcription, carbohydrate transport and metabolism, and several other functions. The fourth circle comprises ncRNAs, in which tRNAs and rRNAs are colored in black and red, respectively. The fifth circle is the GC content, in which orange and blue indicate that the GC content was higher or lower than the mean value of the full genome, respectively. The sixth circle represents the GC-skew value. Collectively, the circular genome map of DW-7 directly shows the distribution of features across the DW-7 genome.
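The GC-content and GC-skew tracks in a map like this are typically computed over sliding windows. The sketch below shows the idea under stated assumptions: GC skew is taken as (G − C)/(G + C), which is the usual definition but is not given in the text, and the window size, step, and toy sequence are hypothetical because the settings used for Fig 5 are not reported.

```python
def gc_content(seq):
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return 100 * (g + c) / total if total else 0.0


def gc_skew(seq):
    """GC skew, assumed here to be (G - C) / (G + C)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Hypothetical toy sequence; a real analysis would read the assembled genome.
toy = "GGGCATATGCCCATTAGGCGCGATATAT"
genome_mean = gc_content(toy)

for start, win in sliding_windows(toy, window=8, step=4):
    relation = "above" if gc_content(win) > genome_mean else "at or below"
    print(f"{start}\tGC {gc_content(win):.1f}% ({relation} mean)\tskew {gc_skew(win):+.2f}")
```

Comparing each window with the genome-wide mean mirrors the orange and blue coloring described for the fifth circle.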

Gene annotation using COG

After comparison of the function annotations between the DIAMOND and COG databases, annotations were classified and depicted in Fig 6.

Fig 6

Functional classification of COG.

Function annotation of identified protein-encoding genes in DW-7 was performed using COG, and a total of 2,796 protein-coding genes involving 25 categories were annotated. Among them, 445 (15.92%) protein-encoding genes were enriched in the general function prediction only, which was the most enriched category, followed by 336 (12.02%) protein-encoding genes enriched in amino acid transport and metabolism category. Furthermore, 274 (9.80%), 245 (8.76%), and 205 (7.33%) protein-encoding genes were enriched in the transcription, carbohydrate transport and metabolism, and inorganic ion transport and metabolism categories, respectively. A combined total of 302 (10.80%) protein-encoding genes were enriched in all functional positions, which necessitates further exploration.
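The category percentages above are fractions of the 2,796 COG-annotated genes, while the 76.35% annotation rate is relative to all 3,662 predicted genes. A minimal sketch to reproduce both (the `pct` helper is a hypothetical name):

```python
PREDICTED_GENES = 3_662
COG_ANNOTATED = 2_796

# Gene counts per COG category, copied from the text.
COG_COUNTS = {
    "General function prediction only": 445,
    "Amino acid transport and metabolism": 336,
    "Transcription": 274,
    "Carbohydrate transport and metabolism": 245,
    "Inorganic ion transport and metabolism": 205,
}


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)


print(f"COG annotation rate: {pct(COG_ANNOTATED, PREDICTED_GENES)}%")
for category, count in COG_COUNTS.items():
    print(f"{category}: {count} genes, {pct(count, COG_ANNOTATED)}%")
```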

Gene annotation using KEGG

The amino acid sequences of DW-7 were compared with data from the KEGG database, and annotation results were obtained by matching the genes of the target species with their corresponding functional annotation information. A total of 2,249 genes in DW-7 were enriched in 23 pathways across five major functional categories: metabolism, genetic information processing, environmental information processing, cell process, and biological system (Fig 7).

Fig 7

Bacterial gene functional annotation KEGG metabolic pathway.

In the category of metabolic processes, 12 pathways were annotated, and 242 and 206 genes were enriched in the carbohydrate metabolism and amino acid metabolism pathways, respectively. Four pathways in the genetic information processing category were annotated, with the translation pathway having the most annotations (85 genes). There were 147 and 124 genes enriched in the membrane transport and signal transduction pathways, respectively, both of which are subcategories of the environmental information processing category. In addition, three pathways were annotated in the cell process category, in which the prokaryotic cell community was the most frequently annotated (85 genes). In the biological systems category, four and three protein-encoding genes were enriched in the environmental adaptation and immune system subcategories, respectively. According to the KEGG annotation information, 196 protein-encoding genes in DW-7 that were associated with proteases were enriched in the metabolism of alanine, aspartic acid, glutamate, glycine, serine, and threonine, as well as in the ABC transporter and transporter pathways. These protein-encoding genes included KSG66-11125 (serine dehydratase), KSG66-13075 (aspartate kinase), KSG66-14990 (serine kinase), KSG66-10425 (aspartate aminotransferase), and several others.

Gene annotation using nr and Swiss-Prot

The gene sequence of DW-7 was translated into the corresponding amino acid sequence, then compared with data from the nr database within the NCBI database (Fig 8).

Fig 8

Annotated species statistics of Nr database (top 20 species).

A total of 3,662 genes in DW-7 were annotated using the nr database. Among them, 2,709; 389; 198; 107; 70; 53; and 41 genes were enriched in the primary seven identified species B. velezensis, B. amyloliquefaciens, B. subtilis, B. abscessus, and Bacillus sp.5B6, and Streptococcus pneumoniae, respectively. A total of 3,399 genes in DW-7 were functionally annotated in the Swiss Prot database, which is a selected protein sequence database that describes protein function, structure, post-translational modification, mutations, and other characteristics.

Gene annotation using GO

According to the annotation information in the nr database, GO function annotations were identified to comprehensively describe the attributes of genes and gene products in DW-7 (Fig 9).

Fig 9

GO classification of bacterial gene function annotation.

A total of 2,127 genes from DW-7 were annotated in the GO database. The GO terms fell into three categories, biological process (BP), cellular component (CC), and molecular function (MF), comprising 18, 10, and 10 terms, respectively. The top three annotated BPs were the metabolic processes, biological processes, and single-cell tissue processes, in which 1,287, 1,189, and 944 genes were enriched, respectively. There were 647, 528, 527, and 527 genes enriched in the top four CCs, namely the cell membranes, cell membrane compositions, cells, and genes related to cell components, respectively. A total of 1,152, 888, and 133 genes were enriched in the most annotated MFs, namely catalytic activity, binding, and transport activity, respectively.

Discussion

In this study, 296 strains of protease-producing bacteria were isolated from Daqu from ZhangGong Laojiu Wine Co., Ltd. One high protease-producing strain, DW-7, with an enzyme activity of 99.54 U/mL, was selected by combining the transparent-ring screening method with the Folin-Ciocalteu method. Morphological observation, 16S rDNA sequence analysis, and physiological and biochemical tests identified DW-7 as Bacillus velezensis. Few studies have examined high protease-producing strains using baijiu Daqu as samples. Yuan Xianling et al. [14] isolated and purified five strains of protease-producing bacteria from strong-flavor baijiu Daqu, including Bacillus thuringiensis, Bacillus subtilis, Bacillus cereus, Bacillus multilocularis and Bacillus subtilis, with enzyme activities of 41.75 U/mL, 31.90 U/mL, 53.40 U/mL, 46.10 U/mL and 44.55 U/mL, respectively; Zhao Qunli et al. [32] used Moutai-flavored baijiu Daqu as the raw material, from which they screened strain 3 J-1 with a protease activity of 70.15 U/mL; and Huang et al. [33] also screened a strain from Moutai-flavored baijiu Daqu, identified as A. hennebergii, with a maximum enzyme activity of 71.13 U/mL after optimization of pH, carbon source, nitrogen source and metal ions. By comparison, DW-7 in the present study reached a higher enzyme activity of 99.54 U/mL. Protease-producing bacteria are widely found in marine [34], animal intestinal [35], soil [36], and other environments, and screening for high protease-producing strains can be useful for industrial production and food processing. Proteases are a class of enzymes that catalyze the hydrolysis of proteins into peptides and amino acids. The small peptides and amino acids released from the raw materials serve as a nitrogen source for yeast and lactic acid bacteria during fermentation, promoting the growth of brewing microorganisms [37].
Some of the amino acids in the protein degradation products are flavoring substances themselves, while others are precursors of aroma components that ultimately affect the formation and quality of baijiu flavor [38]; proteases are thus indispensable enzymes for baijiu production. Studies have shown that the hydrolytic enzyme system in Jiang-flavor Daqu is mainly protease [39]. When the protease content is normal, it can significantly inhibit the production of fusel oil [40], and the protease activity of superior-grade Daqu is higher than that of normal-grade Daqu [41]. Deng et al. compared high-temperature Daqu of different colors and found that protease activity was higher in QW than in QR, QY, and QB, and that the abundance of bacterial enzyme-encoding genes was also higher in QW [42]. Numerous studies have shown that bacteria of the genus Bacillus have a strong protease production capacity [2, 43]. To further investigate the protease metabolism mechanism and metabolic pathways, we performed WGS of DW-7 and determined that the genome length of this strain was 3,942,829 bp, with a GC content of 46.45%. The combined length of DW-7’s coding genes was 3,402,822 bp. By gene function annotation, 2,796 genes were annotated in the COG database, mainly in general function prediction, amino acid transport and metabolism, transcription, carbohydrate transport and metabolism, and inorganic ion transport and metabolism. In the KEGG database, 2,283 genes were annotated, mainly in carbohydrate metabolism, amino acid metabolism, translation, membrane transport, and signal transduction pathways. In the GO database, 2,127 genes were annotated, mainly in metabolic function, cellular process, cell membrane, cell membrane components, catalytic activity, and binding.
The products of protease genes are not biologically functional immediately after transcription and translation; only after modification by a series of chemical reactions, such as ubiquitination, phosphorylation, acetylation, glycosylation, methylation, or lipidation, do proteins acquire complex structures, precise regulation, specific action, and full function [44]. In this study, COG function prediction, KEGG metabolic pathway annotation, and GO annotation of the genes encoded by this strain showed normal gene transcription and translation and a relatively high number of annotated genes, which indicated that the enzyme metabolism of this bacterium is active. The metabolic pathways associated with protease gene expression were mainly alanine, aspartate, glutamate, glycine, serine, and threonine metabolism, as well as the ABC transporter and transporter pathways. The main enzymes involved in these pathways were serine dehydratase, aspartate kinase, serine kinase, and aspartate aminotransferase. Prajapati et al. [45] resolved the protease gene of Bacillus amyloliquefaciens strain KCP2 by WGS, while Li et al. [46] sequenced the genome of Laceyella sacchari FBKL4 and found that the genome contains genes encoding key enzymes, such as proteases and peptidases, along the tetramethylpyrazine metabolic pathway. Proteases are classified into four categories based on their catalytic role in a variety of physiological pathways: serine proteases, aspartate proteases, cysteine proteases, and metalloproteases [47]. Among the proteases, serine proteases are the most common, accounting for nearly one-third of known proteases [48, 49]. The 196 protease-related gene annotations in this study were significantly enriched for serine dehydratase, aspartate kinase, serine kinase, and aspartate aminotransferase, which indirectly suggests that the bacterium was active in serine protease and aspartate protease synthesis.
This result also indicated that this high protease-producing Bacillus sp. metabolized mainly these two types of catalytic proteases.
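The text reports that the 196 protease-related genes were "significantly enriched" in certain pathways but does not state which test was used. A one-sided hypergeometric test is one common choice for this kind of pathway enrichment, and the sketch below shows it under that assumption. The totals (3,662 predicted genes, 196 protease-related genes) come from the paper; the pathway size and the number of hits are hypothetical placeholders, and the function name is illustrative.

```python
from math import comb


def hypergeom_tail(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) when N genes are drawn from M, of which n belong to the pathway."""
    total = comb(M, N)
    # Sum the probability mass for k, k+1, ..., up to the largest possible overlap.
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / total


TOTAL_GENES = 3_662     # predicted protein-encoding genes (reported)
PROTEASE_GENES = 196    # protease-related genes (reported)
PATHWAY_SIZE = 40       # HYPOTHETICAL: genes annotated to one pathway
PATHWAY_HITS = 12       # HYPOTHETICAL: protease-related genes in that pathway

p_value = hypergeom_tail(PATHWAY_HITS, TOTAL_GENES, PATHWAY_SIZE, PROTEASE_GENES)

if __name__ == "__main__":
    print(f"Enrichment p-value (hypothetical counts): {p_value:.3g}")
```

A small p-value here means the pathway contains more protease-related genes than expected by chance. In practice the p-values for all tested pathways would also be corrected for multiple testing, a step this sketch leaves out.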

Conclusion

In this experiment, DW-7, a high-yield protease-producing strain with a protease activity of 99.54 U/mL, was isolated from medium-temperature Daqu provided by ZhangGong LaoJiu Wine Co., Ltd. Through morphological observation, 16S rDNA sequence analysis, and physiological and biochemical tests, DW-7 was determined to be Bacillus velezensis. WGS analysis of DW-7 revealed 196 genes highly related to proteases, predominantly enriched in the metabolism of alanine, aspartic acid, glutamate, glycine, serine, and threonine, as well as ABC transporter and transporter pathways. Overall, our study preliminarily explored the biological characteristics and metabolic functions of the high-yield protease-producing strain DW-7 by WGS analysis. The abundant genomic information obtained provides a useful reference for identifying the functional characteristics of the strain and a foundation for elucidating the basis of its high protease yield. Our results are of great significance for the development of novel high-yield proteases from Daqu.

26 Nov 2021

PONE-D-21-33082

“Isolation, identification and whole genome sequencing of high-yield protease bacteria from daqu of Zhanggong Laojiu”

PLOS ONE

Dear Dr. Liu,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Jan 10 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

- A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
- A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
- An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Tofazzal Islam, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. We suggest you thoroughly copyedit your manuscript for language usage, spelling, and grammar. If you do not know anyone who can help you do this, you may wish to consider employing a professional scientific editing service. Whilst you may use any professional scientific editing service of your choice, PLOS has partnered with both American Journal Experts (AJE) and Editage to provide discounted services to PLOS authors. Both organizations have experience helping authors meet PLOS guidelines and can provide language editing, translation, manuscript formatting, and figure formatting to ensure your manuscript meets our submission guidelines. To take advantage of our partnership with AJE, visit the AJE website (http://learn.aje.com/plos/) for a 15% discount off AJE services. To take advantage of our partnership with Editage, visit the Editage website (www.editage.com) and enter referral code PLOSEDIT for a 15% discount off Editage services. If the PLOS editorial team finds any language issues in text that either AJE or Editage has edited, the service provider will re-edit the text for free.

Upon resubmission, please provide the following:

- The name of the colleague or the details of the professional service that edited your manuscript
- A copy of your manuscript showing your changes by either highlighting them or using track changes (uploaded as a *supporting information* file)
- A clean copy of the edited manuscript (uploaded as the new *manuscript* file)
3. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This work supported by the Major Science and Technology Projects of Henan Province of China (181100211400), Key Technologies Research and Development Program of Henan Province of China (202102110130), Scientific Research Foundation for Docotors of Henan University of Animal Husbandry and Economy (2018HNUAHEDF011) and Key Subject Projects of Henan University of Animal Husbandry and Economy.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form. Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“The authors received no specific funding for this work”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

4. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. PLOS defines a study's minimal data set as the underlying data used to reach the conclusions drawn in the manuscript and any additional data required to replicate the reported study findings in their entirety. All PLOS journals require that the minimal data set be made fully available. For more information about our data policy, please see http://journals.plos.org/plosone/s/data-availability.

Upon re-submitting your revised manuscript, please upload your study’s minimal underlying data set as either Supporting Information files or to a stable, public repository and include the relevant URLs, DOIs, or accession numbers within your revised cover letter. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. Any potentially identifying patient information must be fully anonymized.

Important: If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail. Please see our guidelines for more information on what we consider unacceptable restrictions to publicly sharing data: http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. Note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

We will update your Data Availability statement to reflect the information you provide in your cover letter.

5. PLOS ONE now requires that authors provide the original uncropped and unadjusted images underlying all blot or gel results reported in a submission’s figures or Supporting Information files. This policy and the journal’s other requirements for blot/gel reporting and figure preparation are described in detail at https://journals.plos.org/plosone/s/figures#loc-blot-and-gel-reporting-requirements and https://journals.plos.org/plosone/s/figures#loc-preparing-figures-from-image-files. When you submit your revised manuscript, please ensure that your figures adhere fully to these guidelines and provide the original underlying images for all blot or gel data reported in your submission. See the following link for instructions on providing the original image data: https://journals.plos.org/plosone/s/figures#loc-original-images-for-blots-and-gels.

In your cover letter, please note whether your blot/gel image data are in Supporting Information or posted at a public data repository, provide the repository URL if relevant, and provide specific details as to which raw blot/gel images, if any, are not available. Email us at plosone@plos.org if you have any questions.

6. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Check the Reference format. Some references are unavailable with the title search. The findings are interesting. Enough data has been represented to support the article. Overall, this is a nice article!

Reviewer #2: The manuscript is interesting and write up is good. However, the significance of identifying, screening and whole genome sequencing of high-yield protease bacteria are not clear. The author should add some sentences to justify their research. Some other corrections are listed below.

- Line 16: Correct spacing problem as U/mL (Abstract)
- Line 43: Bacillus should be Italic (Introduction, 2nd paragraph)
- No need to write Bacillus in every cases. Write the genus name first and then write only the short form like B. amyloliquefaciens, B. subtilis etc throughout the manuscript (Line 72, 75)
- Figure caption (Fig.4 or Fig 5 etc) is not similar. Please follow the exact rules of writing.
- Is it necessary to discuss what other did in their experiment in the conclusion sections? My recommendation is to transfer them in the discussion section. I surprised there is no discussion section. Please add discussion sections and discuss based on the previous findings. Highlighted the significant findings, suggestions, further research possibilities in conclusion section.

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
Submitted filename: identification and whole genome sequencing of bacteria.docx

12 Jan 2022

Dear Editor and Reviewers:

Thank you for your letter and for the reviewers' comments concerning our manuscript. The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. Revised portions are marked in yellow in the paper. The manuscript has been polished by Editage. The main corrections in the paper and the responses to the reviewers' comments are as follows:

Response to Reviewers

1. Comment: Believing ‘Zhanggong Laojiu’ as the name of an alcoholic beverage and isolated from a host (yeast), the place (origin of source) could be mentioned here.
Response: The medium-temperature Daqu (a traditional starter used to make Chinese baijiu) produced by ZhangGongLaoJiu Wine Co. Ltd. (a brand of alcoholic beverages).

2. Comment: Better to use the full form when mentioned for the first time.
Response: Already added.

3. Comment: Shorter keywords are preferable.
Response: The keywords have been shortened and summarized.

4. Comment: fusel oil?
Response: Some of the alcohols present in alcoholic beverages, including 1-propanol, iso-butanol, 1-butanol, 2-butanol, iso-amyl alcohol, active amyl alcohol, etc.

5. Comment: Was there any verification performed with protease-producing gene sequences obtained from annotation?
Response: The sequence of the protease-producing gene has not been verified yet; this is the next step we will take.

6. Comment: Words should be in italic where required. Some references are not available on google scholar (or even on google search) upon searching with the title.
Response: A correction has been made. Some references are only published in China and may not be found on Google search for this reason.

7. Comment: What is ‘spore spore’?
Response: There was an error in this area and changes have been made.

We tried our best to improve the manuscript and made some changes in the manuscript. These changes will not influence the content and framework of the paper. We earnestly appreciate the work of the Editor and Reviewers, and hope that the corrections will meet with approval. If there are any shortcomings in the article, please tell me immediately, and I will seriously revise it again. Once again, thank you very much for your comments and suggestions.

Thank you and best regards.

Yours sincerely,
Yanbo Liu
E-mail: yanboliu@hnuahe.edu.cn

Submitted filename: 20220112-Response to Reviewers.docx

16 Feb 2022

“Isolation, identification and whole genome sequencing of high-yield protease bacteria from daqu of Zhanggong Laojiu”

PONE-D-21-33082R1

Dear Dr. Yanbo Liu,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Tofazzal Islam, Ph.D.
Academic Editor
PLOS ONE

8 Apr 2022

PONE-D-21-33082R1

Isolation, identification, and whole-genome sequencing of high-yield protease bacteria from Daqu of ZhangGong Laojiu

Dear Dr. Liu:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Dr. Tofazzal Islam
Academic Editor
PLOS ONE

25 in total

1. Optimization of extracellular alkaline protease production from species of Bacillus.

Authors: Wei-Hua Chu
Journal: J Ind Microbiol Biotechnol Date: 2006-12-15 Impact factor: 3.346

2. Roles of aging in the production of light-flavored Daqu.

Authors: Guangsen Fan; Zhilei Fu; Baoguo Sun; Yuhang Zhang; Xinlei Wang; Yanqiu Xia; Mingquan Huang; Xiuting Li
Journal: J Biosci Bioeng Date: 2018-09-18 Impact factor: 2.894

Review 3. Navigating Microbiological Food Safety in the Era of Whole-Genome Sequencing.

Authors: J Ronholm; Neda Nasheri; Nicholas Petronella; Franco Pagotto
Journal: Clin Microbiol Rev Date: 2016-10 Impact factor: 26.132

4. Draft genome sequence of a thermostable, alkaliphilic α-amylase and protease producing Bacillus amyloliquefaciens strain KCP2.

Authors: Vimalkumar S Prajapati; Sanket Ray; Jitendra Narayan; Chaitanya C Joshi; Kamlesh C Patel; Ujjval B Trivedi; R M Patel
Journal: 3 Biotech Date: 2017-10-13 Impact factor: 2.406

10. Analysis of Difference in Microbial Community and Physicochemical Indices Between Surface and Central Parts of Chinese Special-Flavor Baijiu Daqu.

Authors: Yanru Chen; Kaimin Li; Ting Liu; Ruyi Li; Guiming Fu; Yin Wan; Fuping Zheng
Journal: Front Microbiol Date: 2021-01-14 Impact factor: 5.640