
A Novel LSSVM Based Algorithm to Increase Accuracy of Bacterial Growth Modeling.

Masoud Salehi Borujeni¹, Mostafa Ghaderi-Zefrehei², Farzan Ghanegolmohammadi³, Saeid Ansari-Mahyari⁴.

Abstract

BACKGROUND: Recent progress in advanced, accurate, and rigorously evaluated algorithms has transformed many aspects of predictive microbiology, including the modeling of bacterial growth.
OBJECTIVES: This study aimed to develop a more accurate hybrid algorithm for predicting the bacterial growth curve that is also applicable to predictive microbiology studies.
MATERIALS AND METHODS: Sigmoid functions (Logistic and Gompertz) and least squares support vector machine (LSSVM) based algorithms were employed to model the growth of two important bacteria, Listeria monocytogenes and Escherichia coli. Although cross-validation is generally used for tuning the LSSVM parameters, in this study the tuning parameters (i.e., 'c' and 'σ') were optimized using the non-dominated sorting genetic algorithm-II (NSGA-II); the resulting method is named NSGA-II-LSSVM. The results of each approach were then compared using the mean absolute error (MAE) and the mean absolute percentage error (MAPE).
RESULTS: LSSVM modeled bacterial growth more precisely than the sigmoid functions. Moreover, NSGA-II-LSSVM predicted growth more accurately than the LSSVM method.
CONCLUSION: Applying the NSGA-II-LSSVM hybrid algorithm to find precise values of the 'c' and 'σ' parameters resulted in better growth prediction. The ability of NSGA-II to estimate optimal coefficients revealed more of the predictive potential of the LSSVM.


Keywords: Bacterial growth curve; Hybrid algorithm; LSSVM; Modeling; NSGA-II

Year: 2018 PMID: 30805384 PMCID: PMC6371636 DOI: 10.21859/ijb.1542

Source DB: PubMed Journal: Iran J Biotechnol ISSN: 1728-3043 Impact factor: 1.671

1. Background

Prediction is part of a considerable number of scientific studies (1, 2), and time series analysis is an active research area in the modeling of growth curves (1, 3-5). However, the accuracy of the applied model is critical for many decision-making processes (2, 5). The advent of powerful computational approaches has enabled predictive microbiology to anticipate the behavior of a wide range of microorganisms under various environmental stimuli (6, 7). Elaborate extensions of predictive microbiology models, such as models describing individual cell behavior (8), individual lag time (9, 10), and secondary models considering environmental factors and their interactions (11, 12), have been developed to predict experimental growth curves accurately under different conditions (6, 7, 13). Bacterial growth comprises four stages: the lag phase, the log (exponential) phase, the stationary phase, and the death phase. Each phase has been investigated computationally (14-16), including comparison of bacterial growth parameters (17), formulation of the transition from the stationary to the exponential phase (18), estimation of the lag phase duration (14), approximation of the maximum growth rate (19), and predictive models of the growth curve (for review, see 20). The least squares support vector machine (LSSVM) is derived from the support vector machine (SVM) approach, as described previously (20), and offers considerable computational advantages over the standard SVM. By combining several mathematical approaches in a single framework, LSSVM bridges theory and real-world problems, which is an advantage of machine learning methods over purely statistical approaches in fields such as clinical trials (21, 22), financial studies (23), and engineering (1, 24, 25).
Nonetheless, the potential of LSSVM approaches has been largely neglected in bacterial research, and only a few studies have been published, such as those on adaptive colony segmentation (26), the ecotoxicity of ionic liquids (27), and Escherichia coli promoter gene sequences (22). Estimating the parameters that feed the predictive model is a key step in the modeling process. Different optimization algorithms, especially the genetic algorithm (GA), have been widely employed for parameter optimization on empirical data (2, 28-30). GA is a simple and accurate approach for calculating sigmoid function coefficients. It can also be extended to a multi-objective optimization (MOO) method for solving problems with two fitness functions (31).

2. Objectives

Even though predictive growth models are specifically developed to provide conservative predictions of a given microorganism’s growth under various conditions (11, 12), validation studies of the published models are not precise enough, which could be due to the sensitive nature of the applied algorithms. Considering the predictive potential of LSSVM for experimental data, LSSVM-based algorithms are applied here to model the bacterial growth curve more accurately. To this end, a novel algorithm, the non-dominated sorting genetic algorithm-II (NSGA-II)-LSSVM, was employed for learning the LSSVM tuning parameters. The general performance of the proposed algorithm on both the training and test datasets was then compared with that of GA-LSSVM, simplex-LSSVM, and the sigmoid functions (Logistic and Gompertz), using the mean absolute error (MAE) and the mean absolute percentage error (MAPE). All algorithms were fitted to data from two important bacteria: Listeria monocytogenes, an important food-borne pathogen (6, 13), and Escherichia coli, a renowned prokaryotic model organism in a vast range of biological studies.

3. Materials and Methods

Bacterial growth was measured as log colony-forming units (cfu.mL⁻¹) (18). Growth data for two L. monocytogenes datasets with different initial population sizes, collected over a period of 552 h, were obtained from Augustin et al. (11). The growth of E. coli was profiled by optical density (OD) using spectrophotometry at 610 nm, and the data were collected over an experimental period of 13 h. E. coli was batch cultured in liquid LB (Luria-Bertani) medium at 37 °C, and the OD was measured every 30 min until the death phase.

3.1. Growth Models

Two paradigms were used to model the bacterial growth data: sigmoid functions (Eqs. 1-3) and LSSVM-based algorithms (Eqs. 4-18).

3.1.1. Sigmoid Functions Formulation

Sigmoid functions are the conventional way of treating bacterial growth data computationally. The Logistic function is defined as follows (Eqs. 1 and 3):

$$y(t) = \frac{A}{1 + \exp\left[\frac{4\mu_m}{A}(\lambda - t) + 2\right]} \qquad (1)$$

The Gompertz function is defined as follows (Eqs. 2 and 3):

$$y(t) = A \exp\left\{-\exp\left[\frac{\mu_m e}{A}(\lambda - t) + 1\right]\right\} \qquad (2)$$

where

$$y(t) = \ln\frac{N(t)}{N_0} \qquad (3)$$

Here, ‘μm’ represents the maximum specific growth rate of the bacteria in a batch culture, ‘λ’ stands for the duration of the lag phase, which is in general small, ‘A’ is the upper asymptote that the curve approaches before the death phase, e = exp(1), ‘N0’ is the initial bacterial population, and ‘N(t)’ is the bacterial population at time ‘t’ (19) (for comprehensive information see 30). The coefficients of the Logistic and Gompertz functions were calculated via the trust-region optimization method (32) in the Curve Fitting Toolbox of MATLAB (version 8.5.0.197613 - R2015a).
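The two sigmoid curves can be sketched in a few lines of Python. This is a minimal illustration, assuming the widely used reparameterized forms in which ‘A’ is the asymptote, ‘μm’ the maximum specific growth rate, and ‘λ’ the lag time; the function names and the parameter values in the demo are hypothetical and are not taken from the fitted tables.

```python
import math


def logistic_growth(t, A, mu_m, lam):
    """Reparameterized logistic curve: A / (1 + exp(4*mu_m/A*(lam - t) + 2))."""
    return A / (1.0 + math.exp(4.0 * mu_m / A * (lam - t) + 2.0))


def gompertz_growth(t, A, mu_m, lam):
    """Reparameterized Gompertz curve: A * exp(-exp(mu_m*e/A*(lam - t) + 1))."""
    return A * math.exp(-math.exp(mu_m * math.e / A * (lam - t) + 1.0))


if __name__ == "__main__":
    # Illustrative parameters only (assumed, not the paper's estimates).
    A, mu_m, lam = 3.3, 1.0, 1.2
    for t in (0.0, 1.2, 3.0, 6.0, 13.0):
        print(t,
              round(logistic_growth(t, A, mu_m, lam), 3),
              round(gompertz_growth(t, A, mu_m, lam), 3))
```

Both functions return y(t) = ln(N(t)/N0), so the population itself would be recovered as N0 * exp(y(t)) under this assumption.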

3.1.2. LSSVM Mathematical Formulation

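The pivotal difference between SVM and LSSVM is that LSSVM solves a system of linear equations instead of a quadratic programming problem. In LSSVM for function estimation (here, prediction of the bacterial growth curve) with training data $\{(x_i, y_i)\}_{i=1}^{N}$, the optimization problem is formulated (20) as:

$$\min_{w,b,\varepsilon} J(w,\varepsilon) = \frac{1}{2} w^T w + \frac{c}{2}\sum_{i=1}^{N} \varepsilon_i^2 \qquad (4)$$

subject to the equality constraints:

$$y_i = w^T \Phi(x_i) + b + \varepsilon_i, \quad i = 1, \dots, N \qquad (5)$$

where ‘Φ(x)’ is a nonlinear function that maps the input space into a high-dimensional space, ‘w’ is the weight vector, ‘b’ is the bias, ‘c’ is the regularization (tuning) parameter determining the trade-off between training error minimization and smoothness, and ‘ε’ is the training error. The corresponding Lagrange function can be written as:

$$L(w,b,\varepsilon,a) = J(w,\varepsilon) - \sum_{i=1}^{N} a_i \left[ w^T \Phi(x_i) + b + \varepsilon_i - y_i \right] \qquad (6)$$

in which the ‘a_i’ are the Lagrange multipliers (also called support values). Taking the partial derivatives with respect to ‘w’, ‘b’, ‘ε’, and ‘a’ and equating them to zero gives the conditions for optimality:

$$w = \sum_{i=1}^{N} a_i \Phi(x_i), \quad \sum_{i=1}^{N} a_i = 0, \quad a_i = c\,\varepsilon_i, \quad w^T \Phi(x_i) + b + \varepsilon_i - y_i = 0 \qquad (7)$$

These conditions lead to the following linear system:

$$\begin{bmatrix} 0 & \mathbf{1}^T \\ \mathbf{1} & \Omega + c^{-1} I \end{bmatrix} \begin{bmatrix} b \\ a \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix} \qquad (8)$$

where

$$y = [y_1, \dots, y_N]^T, \quad \mathbf{1} = [1, \dots, 1]^T, \quad a = [a_1, \dots, a_N]^T, \quad \Omega_{ij} = \Phi(x_i)^T \Phi(x_j)$$

The solution can be written as:

$$b = \frac{\mathbf{1}^T (\Omega + c^{-1} I)^{-1} y}{\mathbf{1}^T (\Omega + c^{-1} I)^{-1} \mathbf{1}}, \quad a = (\Omega + c^{-1} I)^{-1} (y - \mathbf{1} b)$$

Using Mercer’s condition:

$$\Omega_{ij} = \Phi(x_i)^T \Phi(x_j) = K(x_i, x_j)$$

Therefore, the fitted LSSVM regression is:

$$y(x) = \sum_{i=1}^{N} a_i K(x, x_i) + b$$

Several choices are possible for the kernel K(.,.). Some typical choices are the linear kernel (20), $K(x_i, x_j) = x_i^T x_j$; the polynomial kernel (20) of degree d, $K(x_i, x_j) = (x_i^T x_j + 1)^d$; and the radial basis function (RBF) kernel (20), $K(x_i, x_j) = \exp(-\lVert x_i - x_j \rVert^2 / \sigma^2)$. RBF is a common and powerful kernel for regression problems (20, 21, 33). Unlike the linear kernel, the RBF kernel maps samples nonlinearly into a higher-dimensional space and can therefore handle cases in which the relation between the input and output vectors is nonlinear. Furthermore, compared with the polynomial kernel, the RBF kernel has fewer hyperparameters, which results in fewer numerical difficulties (see 34 for further information). To use the LSSVM model, the ‘a’ and ‘b’ coefficients must be calculated from the linear system of Eq. 8. Before that, the ‘c’ and ‘σ’ coefficients must be determined, and choosing them properly is critical for prediction accuracy.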
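The training step described above reduces to solving one linear system for ‘b’ and ‘a’. The following is a minimal sketch of that step, not the authors' MATLAB implementation. It assumes scalar inputs (time), an RBF kernel of the form exp(-(x - x')²/σ²), and a small hand-written Gaussian elimination so that no external library is needed; all function names and the toy data are hypothetical.

```python
import math


def rbf_kernel(x1, x2, sigma):
    """RBF kernel for scalar inputs (assumed form: exp(-(x1 - x2)^2 / sigma^2))."""
    return math.exp(-((x1 - x2) ** 2) / sigma ** 2)


def solve_linear(M, rhs):
    """Solve M z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z


def lssvm_fit(x, y, c, sigma):
    """Build [[0, 1^T], [1, Omega + I/c]] and solve for the bias b and multipliers a."""
    n = len(x)
    M = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [rbf_kernel(x[i], x[j], sigma) for j in range(n)]
        row[i + 1] += 1.0 / c  # regularization term on the diagonal
        M.append(row)
    sol = solve_linear(M, [0.0] + list(y))
    return sol[0], sol[1:]


def lssvm_predict(x_new, x, a, b, sigma):
    """Evaluate y(x_new) = sum_i a_i K(x_new, x_i) + b."""
    return sum(a_i * rbf_kernel(x_new, x_i, sigma) for a_i, x_i in zip(a, x)) + b


if __name__ == "__main__":
    # Toy growth-like data (assumed for illustration only).
    t = [0.0, 1.0, 2.0, 3.0, 4.0]
    g = [0.1, 0.4, 1.5, 2.6, 2.9]
    b, a = lssvm_fit(t, g, c=100.0, sigma=1.0)
    print([round(lssvm_predict(ti, t, a, b, 1.0), 3) for ti in t])
```

A useful sanity check on any such implementation is that the training residual at each point equals a_i / c, which follows directly from the optimality conditions.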

3.1.3. N-fold Cross-Validation

N-fold cross-validation is a common method for calculating the ‘c’ and ‘σ’ coefficients. The approach starts by dividing the data into modeling (80%) and test (20%) datasets (Fig. 1 - Stage I). The modeling dataset is used to estimate the ‘c’ and ‘σ’ coefficients via the GA or simplex algorithm, and the test dataset is used to test the accuracy of the final trained LSSVM model. The modeling dataset is then randomly divided into groups (5 groups in this study); in every iteration, one group serves as the validation dataset and the others as the training dataset. The LSSVM is trained with the ‘c’ and ‘σ’ coefficients proposed by the optimization algorithm, the trained model is tested on the validation data, and the fitness function (Eq. 18) is calculated (Fig. 1 - Stage II-I to IV). The cycle terminates when the minimum of the fitness function is found, and the corresponding ‘c’ and ‘σ’ coefficients are the final optimized coefficients (35).

Figure 1.

Flowchart of N-fold cross-validation method. Stage I: Dividing dataset into two sets: modeling (80 %) and test (20 %). Stage II: Calculation of ‘c’ and ‘σ’ coefficients via GA or simplex algorithms; II-I: Dividing dataset into two sets of training and validation through cross-validation method; II-II: Training of LSSVM per different ‘c’ and ‘σ’ coefficients; II-III: Test of trained LSSVM model and fitness function calculation (Eq. 18); II-IV: Iteration of stages II-I, II-II, and II-III for obtaining the best ‘c’ and ‘σ’ coefficients. Stage III: Training of LSSVM model using modeling dataset and calculation of modeling error. Stage IV: Test of LSSVM model using test dataset, and calculation of test error.

In Eq. 18, ‘n’ is the number of groups, ‘m’ is the number of data points in each group, ‘Ĝ(i,j)’ is the value calculated by the LSSVM model for data point ‘j’ of group ‘i’, and ‘G(i,j)’ is the corresponding experimental value. Generally, the ‘c’ and ‘σ’ coefficients are optimized using simplex-LSSVM. In this paper, GA was additionally used to calculate the ‘c’ and ‘σ’ coefficients with the same fitness function. Then, the modeling error was calculated from the modeling dataset and the trained LSSVM (Stage III), and the test error from the test dataset (Stage IV).
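The cross-validation loop of Stage II can be sketched as follows. This is an illustrative outline only: the exact form of the fitness function is not reproduced here, so the sketch assumes a mean absolute validation error averaged over all groups, and it uses a trivial stand-in model (the training mean) in place of an LSSVM. The names `k_fold_indices`, `cv_fitness`, `fit_mean`, and `predict_mean` are hypothetical.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle the sample indices and deal them into n_folds groups."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def cv_fitness(x, y, fit, predict, n_folds=5, seed=0):
    """Average absolute validation error over all folds (assumed fitness form)."""
    total, count = 0.0, 0
    for val_idx in k_fold_indices(len(x), n_folds, seed):
        held_out = set(val_idx)
        x_train = [x[i] for i in range(len(x)) if i not in held_out]
        y_train = [y[i] for i in range(len(y)) if i not in held_out]
        model = fit(x_train, y_train)          # train on the other groups
        for i in val_idx:                      # test on the held-out group
            total += abs(predict(model, x[i]) - y[i])
            count += 1
    return total / count


def fit_mean(x_train, y_train):
    """Stand-in model: remember the mean of the training targets."""
    return sum(y_train) / len(y_train)


def predict_mean(model, x_new):
    """Stand-in prediction: always return the stored mean."""
    return model


if __name__ == "__main__":
    t = [float(i) for i in range(10)]
    g = [0.3 * ti for ti in t]
    print(cv_fitness(t, g, fit_mean, predict_mean, n_folds=5))
```

In the setting described above, `fit` would train an LSSVM for one candidate (‘c’, ‘σ’) pair, and the simplex or GA optimizer would search for the pair that minimizes the returned value.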

3.1.4. Proposed Method

Our novel hybrid method, the so-called NSGA-II-LSSVM, was also applied in the present study. Optimization via NSGA-II-LSSVM begins by dividing the data into modeling and test datasets; the test dataset is used to test the results obtained with the modeling dataset. To preserve the smoothness of the model and improve learning accuracy, the modeling dataset was further divided into training and validation datasets. In our method, the LSSVM was trained using the training dataset and tested using the validation dataset. The training error was taken as the first fitness function (f1) (Eq. 19) and the validation error as the second fitness function (f2) (Eq. 20). We hypothesize that simultaneously minimizing the two fitness functions (Eqs. 19 and 20) yields ‘c’ and ‘σ’ coefficients that provide a more accurate and smoother LSSVM model of the bacterial growth curve. In Eq. 19, ‘n’ is the number of training data points, ‘Ĝ(t)’ stands for the logarithmic bacterial growth at time ‘t’ calculated by the model, and ‘G(t)’ is the original value of growth on the logarithmic scale at time ‘t’. In Eq. 20, ‘m’ is the number of validation data points, ‘Gvalidation(t)’ represents the logarithmic bacterial growth at time ‘t’ calculated for the validation data, and ‘G(t)’ is the original amount of bacterial growth on the logarithmic scale at time ‘t’. To calculate the ‘c’ and ‘σ’ coefficients, the toolbox "NSGA-II: A multi-objective optimization algorithm" (http://www.mathworks.com) was employed. Complementary information about NSGA-II is provided in the supplementary data file.
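The idea of minimizing two fitness functions at once can be illustrated with the non-dominated sorting step at the core of NSGA-II. The sketch below covers only that step (it omits crowding distance, selection, crossover, and mutation) and is not the toolbox used in the study. The candidate (‘c’, ‘σ’) pairs and their (f1, f2) values are invented for illustration.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_dominated_front(candidates):
    """Return the (params, objectives) pairs that no other candidate dominates."""
    front = []
    for i, (params_i, obj_i) in enumerate(candidates):
        dominated = any(
            dominates(obj_j, obj_i)
            for j, (_, obj_j) in enumerate(candidates)
            if j != i
        )
        if not dominated:
            front.append((params_i, obj_i))
    return front


if __name__ == "__main__":
    # Hypothetical candidates: ((c, sigma), (training error f1, validation error f2)).
    population = [
        ((100.0, 0.5), (0.10, 0.30)),
        ((500.0, 0.4), (0.05, 0.35)),
        ((1000.0, 0.3), (0.02, 0.50)),
        ((200.0, 0.8), (0.12, 0.32)),
        ((50.0, 1.0), (0.20, 0.25)),
    ]
    for params, objectives in non_dominated_front(population):
        print(params, objectives)
```

Here the pair (200.0, 0.8) is dropped because (100.0, 0.5) has both a lower training error and a lower validation error; the remaining four pairs are trade-offs from which a final (‘c’, ‘σ’) would still have to be chosen.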

3.2. Comparison of the Results

To compare the accuracy of the employed algorithms, the mean absolute error (MAE) (Eq. 21) and the mean absolute percentage error (MAPE, %) (Eq. 22) were used (36):

$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n} \left| \hat{G}(t) - G(t) \right| \qquad (21)$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{t=1}^{n} \left| \frac{\hat{G}(t) - G(t)}{G(t)} \right| \qquad (22)$$

where ‘n’ is the number of data points, ‘Ĝ(t)’ is the logarithmic bacterial growth at time ‘t’ calculated by the model, and ‘G(t)’ is the real logarithmic bacterial growth at time ‘t’. The MATLAB (version 8.5.0.197613 - R2015a) code implementing LSSVM with the optimized ‘c’ and ‘σ’ coefficients obtained from NSGA-II for the E. coli growth dataset is provided as a supplementary data file.
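Both error measures follow directly from their standard definitions. The short sketch below uses hypothetical function names and made-up numbers to show the arithmetic; it assumes that no observed value is zero, since MAPE divides by the observation.

```python
def mae(observed, predicted):
    """Mean absolute error between observed and model values."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def mape(observed, predicted):
    """Mean absolute percentage error (%); observed values must be non-zero."""
    return 100.0 * sum(abs((p - o) / o) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    g_obs = [2.0, 4.0]    # made-up observed log growth
    g_model = [2.2, 3.6]  # made-up model output
    print(round(mae(g_obs, g_model), 3), round(mape(g_obs, g_model), 3))
```

For these two points the absolute errors are 0.2 and 0.4 and both relative errors are 10%, so MAE is 0.3 and MAPE is 10%.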

4. Results

The coefficients of the Logistic and Gompertz functions for each dataset are presented in Table 1 and Table 2, respectively. The ‘c’ and ‘σ’ coefficients of the LSSVM-based models are shown in Table 3. Note that ‘c’ values must be greater than zero. Simplex-LSSVM gave more accurate results than the sigmoid functions (Table 4). In the first dataset, comparing simplex-LSSVM with the Logistic and Gompertz functions, the modeling MAPE indicated an improvement in prediction accuracy with simplex-LSSVM (1.601, 5.345, and 3.984 for the simplex-LSSVM, Logistic, and Gompertz models, respectively). Consistent results were also observed in the MAPE of the test dataset (Table 4). Moreover, analogous results for the other two datasets and for the MAE values (Table 4) indicated that LSSVM is clearly superior to the sigmoid functions.

Table 4.

The result of sigmoid and LSSVM based algorithms over three bacterial datasets.

Model	Listeria monocytogenes (Dataset No. 1)				Listeria monocytogenes (Dataset No. 2)				Escherichia coli (Dataset No. 3)
Ref.	(11)				(11)				This study
	Modeling		Test		Modeling		Test		Modeling		Test
	MAPE[1] (%)	MAE[2]	MAPE (%)	MAE	MAPE (%)	MAE	MAPE (%)	MAE	MAPE (%)	MAE	MAPE (%)	MAE
Logistic	5.345	0.360	4.554	0.239	4.142	0.194	3.746	0.175	3.686	0.091	4.37	0.109
Gompertz	3.984	0.268	3.818	0.208	2.036	0.095	2.821	0.131	2.318	0.057	3.070	0.076
Simplex-LSSVM	1.601	0.106	3.142	0.169	1.312	0.061	2.418	0.111	0.401	0.010	0.748	0.018
GA-LSSVM	1.579	0.106	3.110	0.169	1.218	0.059	2.402	0.109	0.391	0.009	0.748	0.018
NSGAII-LSSVM	1.565	0.105	3.091	0.168	1.214	0.056	2.346	0.109	0.358	0.008	0.722	0.018

1 Mean Absolute Percentage Error

2 Mean Absolute Error

Comparing the LSSVM-based algorithms using both MAPE and MAE showed that, although GA-LSSVM and NSGA-II-LSSVM are both superior to simplex-LSSVM, NSGA-II-LSSVM is the more advantageous approach for modeling the bacterial growth curve (lower MAPE and MAE values) (Table 4). However, no difference was observed in the MAE of the test dataset for the E. coli population (Table 4). In general, the modeling errors of the two optimization approaches (GA and NSGA-II), based on MAPE and MAE in the training and validation datasets, indicated that NSGA-II-LSSVM was the most accurate approach for bacterial growth prediction in this study. The proximity between the predicted and observed values demonstrates the accuracy of the applied model (Fig. 3-5 and Table 4). The errors of the predictive models on the test dataset are shown in the Test columns of Table 4.

5. Discussion

In the present study, sigmoid functions and LSSVM-based algorithms were employed to model the bacterial growth curve. Unlike the conventional LSSVM model, a MOO method (NSGA-II) was applied to calculate the coefficients (‘c’ and ‘σ’) for the LSSVM training procedure. The accuracy of the proposed hybrid model was verified by comparing the methods using MAE and MAPE. Even though the Logistic and Gompertz functions have been widely applied for modeling bacterial growth because of their simplicity (19, 37), they are less accurate than more sophisticated approaches. Indeed, nonlinear time series models such as LSSVM were developed to overcome the limitations of linear methods and improve forecasting performance (27). Consistent with our previous study (30), GA, owing to its simplicity and accuracy, is an appropriate parameter optimizer for calculating sigmoid function coefficients. However, for optimization problems with two fitness functions, MOO approaches are superior (31). In this study, using NSGA-II to calculate the two coefficients ‘c’ and ‘σ’ resulted in better optimization and, accordingly, in more accurate bacterial growth modeling. Hybrid models overcome the deficiencies of individual models by merging different methods, which improves prediction accuracy (2). The differences between the conventional LSSVM models and the hybrid LSSVM model, in line with various studies (1, 22, 26, 38), revealed the accuracy of hybrid models for bacterial growth prediction compared with single models. Although the developer of a prediction method should choose a proper model considering the characteristics of each method (2), finding the best scenario also depends on the nature of the underlying data distribution. For instance, the combination of feature selection (FS) and LSSVM achieved a 100% success rate in recognizing E. coli promoter gene sequences (22), and the combination of a self-organizing map (SOM) and LSSVM proved a promising alternative technique for river flow time series forecasting (1). In conclusion, by exploring the capability and effectiveness of hybrid modeling, we found that NSGA-II-LSSVM outperforms the other models and thus provides a promising alternative technique for predicting the bacterial growth curve. The proposed model offers better prediction because of its capability to fit appropriate coefficient values. Prior studies, using different prediction approaches, have addressed other bacterial population datasets; therefore, further work with other strains, different experimental datasets, and/or various environmental conditions is needed to check the robustness of the proposed hybrid model. Additionally, finding a more suitable kernel function for a given learning task remains an open question.

Table 1.

Estimated parameters of the Logistic function using the curve fitting method.

Dataset	A[1]	μ_m[2]	λ[3]	Ref.
Listeria monocytogenes (Dataset No. 1)	14.2	96.68	0.059	(11)
Listeria monocytogenes (Dataset No. 2)	10.15	86.25	0.059	(11)
Escherichia coli (Dataset No. 3)	3.26	1.06	1.33	This study

1 Upper asymptote of the curve (approached before the death phase)

2 Maximum specific growth rate in the batch culture (1/h)

3 Duration of the lag phase (h)

Table 2.

Estimated parameters of the Gompertz function using the curve fitting method.

Dataset	A[1]	μ_m[2]	λ[3]	Ref.
Listeria monocytogenes (Dataset No. 1)	14.91	0.054	78.39	(11)
Listeria monocytogenes (Dataset No. 2)	10.55	0.053	72.85	(11)
Escherichia coli (Dataset No. 3)	3.306	1.005	1.16	This study

1 Upper asymptote of the curve (approached before the death phase)

2 Maximum specific growth rate in the batch culture (1/h)

3 Duration of the lag phase (h)

Table 3.

LSSVM parameters over Simplex, GA and NSGAII optimization algorithms.

Algorithms	Simplex		GA		NSGA-II		Ref.
Dataset	c	σ	c	σ	c	σ	Ref.
Listeria monocytogenes (Dataset No. 1)	1542.1	0.71	390.7	0.52	449.9	0.36	(11)
Listeria monocytogenes (Dataset No. 2)	2016.4	0.99	318.5	0.53	912.3	0.62	(11)
Escherichia coli (Dataset No. 3)	860851.5	0.65	3889.9	0.41	117453.1	0.50	This study

16 in total

1. Significance of inoculum size in the lag time of Listeria monocytogenes.

Authors: J C Augustin; A Brouillaud-Delattre; L Rosso; V Carlier
Journal: Appl Environ Microbiol Date: 2000-04 Impact factor: 4.792

2. Improving artificial neural networks with a pruning methodology and genetic algorithms for their application in microbial growth prediction in food.

Authors: Rosa María García-Gimeno; César Hervás-Martínez; SilónizMariaIsabel de
Journal: Int J Food Microbiol Date: 2002-01-30 Impact factor: 5.277

3. Dynamic modeling of genetic networks using genetic algorithm and S-system.

Authors: Shinichi Kikuchi; Daisuke Tominaga; Masanori Arita; Katsutoshi Takahashi; Masaru Tomita
Journal: Bioinformatics Date: 2003-03-22 Impact factor: 6.937

4. Estimating the bacterial lag time: which model, which precision?

Authors: Florent Baty; Marie-Laure Delignette-Muller
Journal: Int J Food Microbiol Date: 2004-03-15 Impact factor: 5.277

5. Concepts and tools for predictive modeling of microbial dynamics.

Authors: Kristel Bernaerts; Els Dens; Karen Vereecken; Annemie H Geeraerd; Arnout R Standaert; Frank Devlieghere; Johan Debevere; Jan F Van Impe
Journal: J Food Prot Date: 2004-09 Impact factor: 2.077

6. Modeling of the bacterial growth curve.

Authors: M H Zwietering; I Jongenburger; F M Rombouts; K van 't Riet
Journal: Appl Environ Microbiol Date: 1990-06 Impact factor: 4.792

7. Modeling individual cell lag time distributions for Listeria monocytogenes.

Authors: Arnout R Standaert; Kjell Francois; Frank Devlieghere; Johan Debevere; Jan F Van Impe; Annemie H Geeraerd
Journal: Risk Anal Date: 2007-02 Impact factor: 4.000

8. Behaviour of Listeria monocytogenes under combined chilling processes.

Authors: J M Membré; T Ross; T McMeekin
Journal: Lett Appl Microbiol Date: 1999-03 Impact factor: 2.858

9. Modelling the individual cell lag time distributions of Listeria monocytogenes as a function of the physiological state and the growth conditions.

Authors: Laurent Guillier; Jean-Christophe Augustin
Journal: Int J Food Microbiol Date: 2006-07-18 Impact factor: 5.277

10. Growth rate and growth probability of Listeria monocytogenes in dairy, meat and seafood products in suboptimal conditions.

Authors: J-C Augustin; V Zuliani; M Cornu; L Guillier
Journal: J Appl Microbiol Date: 2005 Impact factor: 3.772
