Comparison of Optimization-Modelling Methods for Metabolites Production in Escherichia coli.

Mee K Lee¹, Mohd Saberi Mohamad²,³, Yee Wen Choon¹, Kauthar Mohd Daud¹, Nurul Athirah Nasarudin¹, Mohd Arfian Ismail⁴, Zuwairie Ibrahim⁵, Suhaimi Napis⁶, Richard O Sinnott⁷.

Abstract

A metabolic network is a reconstruction of the metabolic pathways of an organism that represents the interactions between enzymes and metabolites at the genome level. Metabolic engineering, in turn, modifies the metabolic network of a cell to increase the production of metabolites. However, metabolic networks are so complex that identifying near-optimal sets of gene or reaction knockouts for maximizing metabolite production is difficult. Therefore, various metaheuristic algorithms have been adapted, through constraint-based modelling, to optimize the desired phenotypes. In this paper, PSOMOMA was compared with CSMOMA and ABCMOMA for maximizing the production of succinic acid in E. coli. Furthermore, the results obtained from PSOMOMA were validated against results from a wet-lab experiment.

Entities: Chemical

Keywords: Artificial Intelligence; Bioinformatics; Metabolic Engineering; Metaheuristic algorithms; Minimization of Metabolic Adjustment

Substances: Succinic Acid

Year: 2020 PMID: 32374287 PMCID: PMC7734505 DOI: 10.1515/jib-2019-0073

Source DB: PubMed Journal: J Integr Bioinform ISSN: 1613-4516

Introduction

For many years, petroleum has been a primary resource for transportation, mining, industry and other sectors. However, because reserves are limited, petroleum cannot keep up with the growing demand for global and consumer products. Furthermore, the use of fossil fuels has serious implications for the environment, such as the greenhouse effect. This has encouraged the use of biomass as an alternative source, since it is renewable and locally available. Biomass resources include agronomic residues such as sugarcane waste, wheat or rice straw, and paper waste, and the corresponding bioprocess is known as biomass fermentation. Microorganisms such as E. coli and Saccharomyces cerevisiae are able to produce succinic acid and ethanol under anaerobic conditions. However, the amounts of succinic acid and ethanol produced are still below the desired threshold. A metabolic network consists of the reactions between enzymes and metabolites that occur in an organism, and it helps biologists and researchers understand the genotypic and phenotypic characteristics of a cell. With advances in genome sequencing, the detailed organization of an organism can be deciphered and then exploited for strain optimization. However, metabolic networks are highly complex, which results in a high-dimensional solution space and an exponential increase in computational time. Metabolic engineering has therefore become an important means of improving the production of various chemical substances by altering organisms. Recently, metabolic engineering has been extended by incorporating systems biology, an approach known as systems metabolic engineering. Systems biology provides a more conceptual understanding of metabolic enzymes and pathways, and thus accelerates the formation or modification of pathways to optimize the production of industrial metabolites [1]. One such modification is gene knockout, whereby a set of genes is removed to create a mutant and the phenotypic effect is analyzed.
The purpose of gene knockout is to direct flux towards the production of the desired metabolites [2]. However, it is difficult to obtain a near-optimal set of gene knockouts. The development of constraint-based methods has therefore been a major achievement in metabolic engineering, as they help to predict, analyze and interpret the biological functions in metabolic networks [3]. The first constraint-based method is Flux Balance Analysis (FBA), which characterizes the behavior of a metabolic network using mathematical computation [4]. A higher level of abstraction, however, needs new mathematical approaches to describe these biological processes. This eventually led to the development of Minimization of Metabolic Adjustment (MOMA) and Regulatory On/Off Minimization (ROOM) [5], [6]. Both MOMA and ROOM are used to predict the steady state of the mutant's metabolic network after gene knockouts. However, the steady state predicted by ROOM may be hard for the organism to reach. In this research, MOMA is chosen as the modelling algorithm because FBA assumes that the mutant organism has the same optimal metabolic state as the wild-type organism [7], whereas MOMA is more suitable for predicting the suboptimal flux distribution in mutant organisms. Still, MOMA lacks an optimization algorithm for identifying the knockout genes that maximize ethanol production. Hence, MOMA is hybridized with an optimization algorithm to analyze and predict the effect of gene knockouts on the overproduction of ethanol. Metaheuristic algorithms have been proposed to improve the production of ethanol in E. coli [8]. Different metaheuristic algorithms have been applied to identify near-optimal gene knockouts because they are computationally less expensive. The first method to apply a metaheuristic algorithm was OptGene.
OptGene applies a Genetic Algorithm (GA) to search for and identify a set of gene knockouts, which is evaluated by FBA [9]. Furthermore, OptGene introduced a new fitness function, the Biomass-Product Coupled Yield (BPCY). Following that, Simulated Annealing (SA) and the Set-based Evolutionary Algorithm (SEA) were proposed to identify sets of genetic manipulations that increase the desired phenotypes [10]. However, these methods suffer from over-optimistic solutions, solutions trapped in local optima and high computation time [9], [11], [12]. Several major advances in in silico metabolic engineering take different approaches. One such development is multiobjective optimization, which produces a set of non-dominated solutions between two competing objectives such as production rate and growth rate [13], [14], [15], [16]. Several methods have been developed to handle competing objectives, including Linear Physical Programming based Flux Balance Analysis (LPPFBA), Noninferior Set Estimation (NISE) with FBA, Genetic Design through Multi-objective Optimisation (GDMO) and others [17], [18], [19]. The advantage of these methods is that decision-makers, such as industrialists or biologists, can choose among various solutions instead of a single one. Furthermore, the suggested knockout genes may produce mutants with both a higher growth rate and a higher production rate. In this paper, a comparative study of PSOMOMA, ABCMOMA and CSMOMA is presented in terms of the succinic acid production rate and the growth rate of mutant E. coli. In each of these algorithms, MOMA is used for fitness function evaluation. The paper is organized as follows: Section 2 describes the metaheuristic algorithms, Section 3 provides the results and discussion, and the last section concludes the paper.
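
As a small illustration of the BPCY fitness function mentioned above, the sketch below assumes its usual definition: product rate multiplied by growth rate, divided by substrate uptake rate. The function name and the input values are hypothetical and chosen only for the example; they are not results from this paper.

```python
def bpcy(product_rate, growth_rate, substrate_uptake):
    """Biomass-Product Coupled Yield, assuming the usual definition:
    (product flux * growth rate) / substrate uptake rate."""
    if substrate_uptake <= 0:
        raise ValueError("substrate uptake rate must be positive")
    return product_rate * growth_rate / substrate_uptake


# Hypothetical mutant: 12 mmol/gDW/h product, 0.5 1/h growth, 10 mmol/gDW/h uptake
example_bpcy = bpcy(12.0, 0.5, 10.0)
print(round(example_bpcy, 3))
```

Because the two rates are multiplied, a mutant that produces well but barely grows scores low, which is why this measure couples production to growth.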

Swarm Intelligence

Swarm intelligence was inspired by the foraging behavior of animals such as bees, ants, birds and fish. The discipline focuses on how animals interact with one another and with the environment under a decentralized control system. At a high level, a swarm can be viewed as a group of agents cooperating to achieve a common goal [20]. Foraging behaviors describe the movement of animals around their food resources or when searching for their nests and mates. In addition, swarm intelligence provides global optimization methods that help to solve complex real-life problems.

Particle Swarm Optimization (PSO)

PSO is a population-based algorithm used to solve discrete and continuous optimization problems. Traditional PSO was inspired by social behaviors such as bird flocking and fish schooling, and was introduced in [21]. PSO involves simple concepts and mathematical operators. PSO is similar to the genetic algorithm (GA) in that the population is initialized randomly. The main difference is that PSO has particles, which are agents that move across the problem space. The population in PSO is known as a “swarm”. Each particle has its own velocity and position at any given instant. The different locations of the particles in the problem space represent different candidate solutions to a given optimization problem. Every particle searches for the best location in the problem space by adjusting its velocity towards the best solution found so far.
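
The velocity and position updates described above can be sketched as follows. This is a minimal, generic continuous PSO on a toy sphere function, not the PSOMOMA implementation: the inertia weight `w`, the acceleration coefficients `c1` and `c2`, the bounds and the function names are all assumptions made for illustration. In PSOMOMA the fitness would instead come from a MOMA evaluation of a candidate knockout set.

```python
import random


def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
                 lo=-5.0, hi=5.0, seed=0):
    """Generic PSO sketch: each particle is pulled towards its own best
    position (pbest) and the best position found by the swarm (gbest)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + cognitive pull (pbest) + social pull (gbest)
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # move the particle and keep it inside the search bounds
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(xi * xi for xi in x)


best_x, best_val = pso_minimize(sphere, dim=2)
print(best_x, best_val)
```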

Artificial Bee Colony (ABC)

The Artificial Bee Colony (ABC) algorithm was inspired by the foraging behavior of honeybee colonies [22]. ABC has two modes of behavior: recruitment to a nectar source and abandonment of a source. It consists of three main components:
Employed foragers: are associated with the food sources that are currently being exploited. They share information such as the distance and direction of the food sources with the other bees waiting in the hive.
Onlookers: acquire the information from the employed foragers and choose the food sources with higher nectar amounts.
Scouts: randomly search for new food sources (solutions) to replace those abandoned by the employed bees.
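
The onlooker and scout roles can be sketched as below. This is a simplified illustration, not the ABCMOMA code: it assumes that onlookers pick sources with probability proportional to nectar amount (roulette-wheel selection) and that a source is abandoned once its count of unsuccessful trials exceeds a `limit` parameter, as in common ABC formulations. All names are hypothetical.

```python
import random


def onlooker_probabilities(nectar):
    """Selection probability of each food source, proportional to its nectar."""
    total = sum(nectar)
    return [n / total for n in nectar]


def choose_source(nectar, rng):
    """Roulette-wheel selection: richer sources are chosen more often."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(onlooker_probabilities(nectar)):
        cumulative += p
        if r < cumulative:
            return index
    return len(nectar) - 1  # guard against floating-point round-off


def sources_to_abandon(trials, limit):
    """Indices of sources whose trial counters exceed the limit; scouts
    would replace these with new random solutions."""
    return [i for i, t in enumerate(trials) if t > limit]


rng = random.Random(0)
nectar = [1.0, 3.0, 6.0]
picks = [choose_source(nectar, rng) for _ in range(1000)]
print(onlooker_probabilities(nectar), sources_to_abandon([0, 5, 12], limit=10))
```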

Cuckoo Search (CS)

Cuckoo search is based on the parasitic behavior of cuckoos in nature [23]. It incorporates a Lévy flight strategy in finding the best solution. There are three rules in CS:
1. Only one egg can be laid in a nest at a time.
2. The nests with higher fitness survive to the next generations.
3. The probability that a cuckoo egg is discovered by the host and replaced is p_a ∈ [0,1].
These rules are used in the search operations of CS, where the selection process is driven by Lévy flight while the exploitation process is driven by the probability p_a ∈ [0,1]. The advantage of CS is the incorporation of Lévy flight, which allows new solutions to be generated far from the current best solution [23]. Because of this, solutions are less likely to become trapped in local optima. Therefore, a fraction of probability is imposed on the cuckoo egg. These metaheuristic algorithms have been compared, and their advantages and disadvantages are presented in Table 1.
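
A Lévy flight step and the discovery rule can be sketched as follows. The paper does not say how the Lévy steps are generated; this example assumes Mantegna's algorithm with exponent `beta = 1.5`, a common choice in cuckoo search implementations. The helper names and the per-nest discovery test are likewise assumptions.

```python
import math
import random


def mantegna_sigma(beta=1.5):
    """Scale of the numerator Gaussian in Mantegna's algorithm."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)


def levy_step(rng, beta=1.5):
    """One heavy-tailed step: mostly small moves, occasionally a long jump."""
    u = rng.gauss(0.0, mantegna_sigma(beta))
    v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def abandon_nests(nests, p_a, new_nest, rng):
    """Each nest is discovered with probability p_a and replaced by a new one."""
    return [new_nest(rng) if rng.random() < p_a else nest for nest in nests]


rng = random.Random(0)
steps = [levy_step(rng) for _ in range(5)]
nests = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
renewed = abandon_nests(nests, 0.25, lambda r: [r.uniform(-5, 5), r.uniform(-5, 5)], rng)
print(steps, renewed)
```

The heavy tail of the step distribution is what produces the occasional long jumps away from the current best solution.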

Table 1:

Comparisons of metaheuristic algorithms.

Algorithm	Advantages	Disadvantages	Ref.
PSO	Easy to implement; no overlapping and mutation calculation	Easily suffers from partial optimism	[24], [25], [26], [27]
ABC	Strong robustness; fast convergence; high flexibility	Premature convergence in the later search period; accuracy of the optimal value may not meet the requirements	[7], [28], [29]
CS	Dynamically applicable (adapts to changes); easy to implement	Easily trapped in local optima; Lévy flight affects the convergence rate	[22], [30], [31]

To improve metabolite production, the problem can be described as follows. MOMA is similar to FBA, so the metabolic network is represented by a stoichiometric matrix S of size m × n, where m is the number of metabolites and n is the number of reactions. The matrix S relates the reaction fluxes v (a vector of length n) to the metabolite concentrations x (a vector of length m). At steady state, the concentrations do not change over time, which gives the mass-balance constraint on the fluxes:

$$\frac{dx}{dt} = S \cdot v = 0$$

FBA is used to calculate the flux distributions of the wild type and the mutant, while MOMA is used to minimize the Euclidean distance between the wild-type fluxes and the mutant fluxes. Using linear programming, the objective of FBA is optimized as follows:

$$\max_{v} \; c^{T} v \quad \text{subject to} \quad S \cdot v = 0$$

where v is the flux vector, c is a vector of weights for the reactions to be optimized, and T denotes the transpose. Each flux is additionally restricted by its lower and upper bounds. After the FBA computation, MOMA uses quadratic programming to minimize the distance between the wild type and the mutant. The objective of MOMA is as follows:

$$\min_{v_m} \; (v_w - v_m)^{T} \, I \, (v_w - v_m) \quad \text{subject to} \quad S \cdot v_m = 0$$

where v_w and v_m are the flux distributions of the wild type and the mutant, respectively, and I is the identity matrix of size n × n, with n the length of v.
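
The idea of minimizing the Euclidean distance between wild-type and mutant fluxes can be illustrated on a toy network. The five-reaction network, its flux values and the function name below are invented for the example. The knockout leaves only one degree of freedom, so the quadratic program reduces to projecting the wild-type flux vector onto a line segment; a genome-scale model would need a proper quadratic programming solver.

```python
def moma_1d(v_wt, direction, a_min, a_max):
    """Closest point (in Euclidean distance) to v_wt on the segment
    {a * direction : a_min <= a <= a_max}."""
    dot_wd = sum(w * d for w, d in zip(v_wt, direction))
    dot_dd = sum(d * d for d in direction)
    a = min(a_max, max(a_min, dot_wd / dot_dd))  # project, then clip to bounds
    v_mut = [a * d for d in direction]
    dist2 = sum((w - m) ** 2 for w, m in zip(v_wt, v_mut))
    return a, v_mut, dist2


# Toy network: uptake -> A (v1); A -> B (v2); A -> C (v3); B -> out (v4); C -> out (v5)
# Steady state (S v = 0) requires v1 = v2 + v3, v2 = v4 and v3 = v5.
v_wt = [10.0, 8.0, 2.0, 8.0, 2.0]        # hypothetical wild-type fluxes

# Knocking out reaction v2 forces v2 = v4 = 0, so every feasible mutant
# flux vector has the form a * (1, 0, 1, 0, 1) with 0 <= a <= 10.
direction = [1.0, 0.0, 1.0, 0.0, 1.0]
a, v_mut, dist2 = moma_1d(v_wt, direction, 0.0, 10.0)
print(a, v_mut, dist2)
```

In this toy case the MOMA mutant routes about 4.67 flux units through the remaining branch, whereas a mutant FBA solution that maximizes v5 would push the uptake to its upper bound of 10. This shows how MOMA predicts a suboptimal state close to the wild type.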

Materials and Methods

In this paper, PSOMOMA, ABCMOMA and CSMOMA were applied to E. coli to maximize the production of succinic acid. Glucose is used as the sole carbon source and its uptake rate is set to 10 mmol gDW⁻¹ h⁻¹. The algorithms are implemented in MATLAB R2013b. The Constraint-Based Reconstruction and Analysis (COBRA) Toolbox is used to model and analyse the metabolic model with MOMA, and the SBML Toolbox is used to read the model file in SBML format. Table 2 shows the model used.

Table 2:

Numbers of reactions and metabolites involved before and after the model pre-processing.

Model	Number of reactions	Number of metabolites
Raw model	2583	1805
Pre-processed model	2342	1585

Results and Discussion

The results obtained from the hybrid PSOMOMA algorithm are compared with those of previous algorithms in terms of succinic acid production and the growth rate of E. coli. This section compares the growth rate and succinic acid production of E. coli from PSOMOMA with the previously reported results for CSMOMA and ABCMOMA, and with results from the wet laboratory [7], [11]. Table 3 shows the results obtained for PSOMOMA, CSMOMA and ABCMOMA. PSOMOMA achieves the highest growth rate compared to ABCMOMA and CSMOMA, while CSMOMA finds the mutant with the highest production rate of succinic acid. PSOMOMA reaches its highest production rate with 4 suggested gene knockouts, CSMOMA with 5 and ABCMOMA with 2.

Table 3:

Result comparison on succinate production for PSOMOMA, CSMOMA and ABCMOMA.

Method	Gene knockouts	Succinic production (mmol gDW⁻¹ h⁻¹)	Growth rate (h⁻¹)
PSOMOMA	ackA, pta, ghrA*, dctA*	15.27	0.7967
CSMOMA [7]	asnA, ghrA*, pykA, putP, dctA*	16.58	0.50898
ABCMOMA [11]	fum, zwf	6.69	0.44

The gene knockouts suggested by PSOMOMA are ackA, pta, ghrA and dctA (Table 3). Inactivation of the pta-ackA genes has been shown to improve the production of succinic acid [32]. According to the authors, the removal of these genes affects the fluxes towards ethanol formation. Their inactivation indirectly affects the production of ethanol, which depends on the adhE gene, so the mutant strain increases the production of succinate and D-lactate. Inactivation of ghrA, which encodes glyoxylate reductase, affects the metabolism of glycine and serine [33]. Meanwhile, the dctA gene is required for the transport of dicarboxylates [34]. The removal of these genes reduces the competition for the carbon source, glucose. Moreover, PSOMOMA finds two of the same gene knockouts as CSMOMA. Although CSMOMA found the highest production rate, it requires knocking out five genes, whereas PSOMOMA and ABCMOMA knock out only four and two genes, respectively. Furthermore, the knockout genes suggested by PSOMOMA generate a viable mutant with the highest growth rate. Nevertheless, the knockout genes suggested by these algorithms are limited to computer simulation. In a wet-lab experiment, various other factors need to be identified and considered, as it is difficult to target and identify a single gene. Overall, PSOMOMA finds a set of gene knockouts with the highest growth rate in E. coli compared to the other methods.
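
The trade-off between production rate and growth rate in Table 3 can be checked with a simple dominance test. The sketch below is only an illustration and is not part of the original study. It takes the (succinic production, growth rate) pairs from Table 3 and keeps the methods that no other method beats on both objectives. The function names are hypothetical.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly
    better on at least one (both objectives are maximized)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def non_dominated(points):
    """Keep the entries that are not dominated by any other entry."""
    return {name: p for name, p in points.items()
            if not any(dominates(q, p)
                       for other, q in points.items() if other != name)}


# (succinic production in mmol/gDW/h, growth rate in 1/h) from Table 3
table3 = {
    "PSOMOMA": (15.27, 0.7967),
    "CSMOMA": (16.58, 0.50898),
    "ABCMOMA": (6.69, 0.44),
}
front = non_dominated(table3)
print(sorted(front))
```

With these values, PSOMOMA and CSMOMA are mutually non-dominated, since each is better on one objective, while the ABCMOMA result is lower on both.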

Wet Laboratory

In this section, the production of ethanol in E. coli obtained by PSOMOMA is compared with results from the wet laboratory. The ethanol production results of PSOMOMA have been published in [35]. According to [36], three mutant strains of E. coli were created for maximizing ethanol production, namely SY03, SY04 and MG1655. The results for iJO1366 are compared with the MG1655 mutant strains because iJO1366 was constructed from this strain. Table 4 shows the ethanol production obtained from both the PSOMOMA algorithm and the wet-laboratory test.

Table 4:

Result comparison on ethanol production for PSOMOMA and Wet Laboratory Test.

Method	Number of knockouts / environmental condition	Gene knockouts / strain	Ethanol production (mmol gDW⁻¹ h⁻¹)
PSOMOMA	2	ACKr, PPS	17.2029
	3	pflA,frdB,ldhA	17.2270
	4	ACKr, ldhA, FUMt2_2, fdhF	16.4891
	5	ACKr, fumB, PPS, GND, GLUDy	16.4501
Wet Laboratory [36]	pH 7.5	MG1655 (pZSBlank)	7.8400
	pH 7.5	MG1655 (pZSKLMgldA)	8.7000
	pH 6.3	MG1655 (pZSKLMgldA)	11.1400

As shown in Table 4, every number of gene knockouts tested with PSOMOMA results in higher ethanol production than the wet-laboratory test. The highest ethanol production by MG1655 in the wet laboratory is only 11.14 mmol gDW⁻¹ h⁻¹, whereas the highest production by PSOMOMA is 17.227 mmol gDW⁻¹ h⁻¹, a difference of 6.087 mmol gDW⁻¹ h⁻¹. PSOMOMA therefore gives an overly optimistic estimate of ethanol production, and the suggested knockout genes are limited to computational simulation. It is thus advisable to test the knockout genes suggested by PSOMOMA in wet-lab experiments.

Conclusion

This paper compares metaheuristic algorithms for identifying near-optimal gene knockouts that optimize the production of succinic acid. Of the three tested algorithms, PSO performs better in terms of growth rate, while CS performs better at finding a mutant with a higher production rate. Although CSMOMA gives the highest production rate with 5 suggested gene knockouts, its growth rate is lower than that of PSOMOMA. In future work, multiobjective optimization can be included to optimize the two competing objectives.

22 in total

Review 1. Succinate production in Escherichia coli.

Authors: Chandresh Thakker; Irene Martínez; Ka-Yiu San; George N Bennett
Journal: Biotechnol J Date: 2011-09-20 Impact factor: 4.677

2. Regulatory on/off minimization of metabolic flux changes after genetic perturbations.

Authors: Tomer Shlomi; Omer Berkman; Eytan Ruppin
Journal: Proc Natl Acad Sci U S A Date: 2005-05-16 Impact factor: 11.205

Review 3. Metabolic systems modeling for cell factories improvement.

Authors: Po-Wei Chen; Matthew K Theisen; James C Liao
Journal: Curr Opin Biotechnol Date: 2017-04-04 Impact factor: 9.740

4. FOCuS: a metaheuristic algorithm for computing knockouts from genome-scale models for strain optimization.

Authors: Sarma Mutturi
Journal: Mol Biosyst Date: 2017-06-27

5. Analysis of optimality in natural and perturbed metabolic networks.

Authors: Daniel Segrè; Dennis Vitkup; George M Church
Journal: Proc Natl Acad Sci U S A Date: 2002-11-01 Impact factor: 11.205

6. What is flux balance analysis?

Authors: Jeffrey D Orth; Ines Thiele; Bernhard Ø Palsson
Journal: Nat Biotechnol Date: 2010-03 Impact factor: 54.908

7. Soft constraints-based multiobjective framework for flux balance analysis.

Authors: Deepak Nagrath; Marco Avila-Elchiver; François Berthiaume; Arno W Tilles; Achille Messac; Martin L Yarmush
Journal: Metab Eng Date: 2010-05-27 Impact factor: 9.783

8. Enhancement of lactate and succinate formation in adhE or pta-ackA mutants of NADH dehydrogenase-deficient Escherichia coli.

Authors: N-R Yun; K-Y San; G N Bennett
Journal: J Appl Microbiol Date: 2005 Impact factor: 3.772

9. Design of homo-organic acid producing strains using multi-objective optimization.

Authors: Tae Yong Kim; Jong Myoung Park; Hyun Uk Kim; Kwang Myung Cho; Sang Yup Lee
Journal: Metab Eng Date: 2014-12-23 Impact factor: 9.783

10. Evolutionary programming as a platform for in silico metabolic engineering.

Authors: Kiran Raosaheb Patil; Isabel Rocha; Jochen Förster; Jens Nielsen
Journal: BMC Bioinformatics Date: 2005-12-23 Impact factor: 3.169