
The automatic parameter-exploration with a machine-learning-like approach: Powering the evolutionary modeling on the origin of life.

Yuzhen Liang1, Chunwu Yu2, Wentao Ma1.   

Abstract

The origin of life involved complicated evolutionary processes. Computer modeling is a promising way to reveal the relevant mechanisms. However, due to the limitation of our knowledge on prebiotic chemistry, it is usually difficult to justify the parameter-setting for such modeling. Thus, typically, studies have been conducted in a reverse way: the parameter space was explored to find parameter values "supporting" a hypothetical scene (that is, leaving the parameter-justification as a later job, when sufficient knowledge is available). Exploring the parameter space manually is an arduous job (especially when the modeling becomes complicated) and, additionally, difficult to characterize as regular "Methods" in a paper. Here we show that a machine-learning-like approach may be adopted, optimizing the parameters automatically. With this efficient parameter-exploring approach, evolutionary modeling on the origin of life would become much more powerful. In particular, it is expected that more near-reality (complex) models could be introduced on this basis, and thereby theoretical research would be more tightly associated with experimental investigation in this field, hopefully leading to significant steps forward in our understanding of the origin of life.


Year:  2021        PMID: 34965249      PMCID: PMC8752021          DOI: 10.1371/journal.pcbi.1009761

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.475


Introduction

Starting from a prebiotic chemical world, the scenario concerning the origin of life should have included a series of complicated processes combining chemistry and evolution [1-4]. Since the pioneering work of Miller and Urey in the 1950s [5], experimental studies have provided considerable insight into the chemical aspect of these processes [6-8]; however, the evolutionary processes involved, perhaps having lasted quite a long time (e.g., years, tens of years, or even much longer), are difficult to mimic or study in the lab. Fortunately, theoretical modeling has proven to be a valid way to deal with the evolutionary aspect, as exemplified by Eigen's famous work concerning the "hypercycle" [9].

Computer modeling on evolutionary processes in the origin of life

In early years, theoretical modeling in the field of the origin of life was typically based upon chemical reaction kinetics, deriving a group of differential equations to describe the dynamics of the target system, hence the so-called "differential equation model". Most parameters of such a model are simply the rates of various reactions. The differential equation group is usually too complicated to admit an analytical solution, and therefore early studies along this line typically analyzed only the equilibrium point(s) of the model system [10-12]. The knowledge obtained about the relevant evolutionary process through such modeling was quite limited. Fortunately, we can also gain insight into the evolutionary mechanism if we can "see" the behavior of the model system under different conditions, that is, through theoretical mimicking or simulation. For a differential equation model, we can obtain its "numerical solution" through "numerical integration" and thereby "observe" the model system's behavior. Aided by computer technology, numerical solution of complex differential equation models became a conventional approach in this area, representing the early work of "computer simulation" (also termed "computer modeling") on the origin of life [13-18]. Later on, another sort of computer simulation arose, the so-called "Monte-Carlo simulation". Here, a probabilistic model is established and the model system "runs" by iteratively sampling random numbers; results are obtained by statistics on certain key variable(s), which reflect the behavior of the system. To date, Monte-Carlo simulation has become the most important method in the theoretical investigation of the evolutionary aspect of the origin of life [19-29] (the reference list here is certainly not a complete one).
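The two simulation styles described above can be contrasted in a minimal sketch. The following Python toy is purely illustrative (the growth/decay process, the rates, and all function names are our own stand-ins, not any model from the literature cited above): a deterministic Euler integration of a simple replicator equation, and a Monte-Carlo run of the same process in which each molecule replicates or decays by chance at every step.

```python
import random

# Deterministic view: Euler integration of dN/dt = (k_rep - k_dec) * N.
# (Illustrative rates; not taken from any cited model.)
def euler_growth(n0=100.0, k_rep=0.02, k_dec=0.01, dt=0.1, steps=1000):
    n = n0
    for _ in range(steps):
        n += (k_rep * n - k_dec * n) * dt
    return n

# Stochastic view: a Monte-Carlo run of the same process. Each molecule
# replicates or decays with some probability per step; conclusions come
# from statistics over repeated runs with different random seeds.
def monte_carlo_growth(n0=100, p_rep=0.002, p_dec=0.001, steps=1000, rng=None):
    rng = rng or random.Random(0)
    n = n0
    for _ in range(steps):
        births = sum(rng.random() < p_rep for _ in range(n))
        deaths = sum(rng.random() < p_dec for _ in range(n))
        n += births - deaths
        if n <= 0:
            return 0
    return n
```

Both views describe the same net growth, but only the stochastic version exhibits chance extinction and run-to-run fluctuation, which is why key variables must be summarized statistically.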

The significance of parameter-space exploration

A normal way to conduct computer modeling is to set the parameters of the model according to our knowledge of reality and run the model (conduct the "simulation") to see the outcome; we may then make predictions about the behavior of the target system. However, owing to the limitations of our knowledge concerning prebiotic environments and chemistry, it is usually difficult to justify the value-setting of the parameters used in relevant modeling studies. For example, the Miller-Urey experiment assumed a reducing atmosphere for the prebiotic earth [5]; however, this putative scene was later doubted in the field of geochemistry [30,31], and no clear conclusion has been reached so far. Therefore, in the field of the origin of life, computer simulation studies have usually been conducted in a reverse way: the model's parameter space is explored to find parameter values which may support a hypothetical evolutionary process. That is, we may expect to assess the "legitimacy" of the "favoring parameter values" later, when relevant chemical or environmental knowledge becomes available. On the other hand, if we are confident about the hypothetical process because it is "reasonable" or has "conspicuous supporting evidence" (most notably, e.g., "the RNA world" [32-35]), we may make some deductions about the relevant prebiotic conditions according to the "favoring parameter values" for that process, thereby possibly even improving our knowledge of prebiotic environments and chemistry. In previous studies, people used to explore the parameter space manually (typically not presented explicitly in their papers, perhaps because the route of the manual exploration is difficult to characterize), which is an arduous job, especially for complicated models comprising many parameters (notably, the parameter space grows exponentially with the number of parameters).
As a consequence, researchers in this area tend to adopt quite abstract models (thus involving fewer parameters); some studies even resorted to oversimplified models (e.g., the so-called "toy models" or even "artificial chemistry" [36-38]), whose relevance to the reality of the origin of life appears doubtful. This awkward situation raises an urgent question: can we perform the parameter-exploration in a more automatic way?

The application of machine-learning approaches

Machine-learning is a special class of computer algorithms (called the "master algorithm" [39], or the algorithms of artificial intelligence), which seeks to automatically train a target model (or function) for various tasks (e.g., classification, image recognition and natural language processing). In particular, one branch of machine-learning, "connectionism", uses the so-called "artificial neural network" as its target model [39], thereby mimicking the human brain in its basic structure and essential mechanisms, and is thus expected to realize veritable artificial intelligence. Later on, the power of such machine-learning turned out to be associated with the "depth" of the artificial neural network, and the resulting "deep learning" has achieved tremendous success in a variety of application fields [40]. In fact, the model training in machine-learning is, by and large, an automatic exploration of parameters (there are often many parameters as well, e.g., in the complex artificial neural networks of deep learning), seeking "appropriate" parameter values that make the model adept at the aimed task. Taking inspiration from this, we ask: can we conduct our parameter-exploration automatically in a similar way? That is, using the evolutionary model as the target model to train, can we explore the parameter space automatically and thereby find "appropriate parameter-settings" that result in the evolutionary behavior we suppose?

Results

Firstly, we investigated a case derived from one of our previous modeling studies, relevant to early evolution in the scenario of the RNA world [20]. In that study, we supposed that a ribozyme catalyzing the synthesis of nucleotides (nucleotide synthetase ribozyme, "NSR" for short) may have thrived in a prebiotic RNA pool, because this ribozyme could synthesize nucleotides around itself, thus favoring its own replication. Therein, we explored the parameter space manually and found "appropriate parameter values" supporting the prosperity of the NSR in the model system [20] (Fig 1A shows a typical case). Now we intend to perform the parameter-exploration automatically, based upon the idea of machine-learning.
Fig 1

The evolutionary dynamics of the molecular number of NSR in the modeling.

(a) A typical case for the spread of NSR. One NSR molecule is inoculated at step 1×104, and it replicates and becomes prosperous in the system [20]. Parameter values: PNF = 2×10−4, PNFR = 0.9, PND = 0.01, PRL = 2×10−6, PBB = 1×10−6, PAT = 0.1, PFP = 0.01, and PMV = 0.01 (note that for a concise description we omit the other five parameters in the model, which are not involved in the parameter-exploration here). (b) A case we assumed as the starting point for the machine-learning. One hundred NSR molecules are inoculated at step 1×104, and the molecular number decreases gradually. Parameter setting: PNF = 4×10−3, PNFR = 0.02, PND = 1×10−3, PRL = 2×10−5, PBB = 1×10−5, PAT = 0.5, PFP = 0.1, and PMV = 1×10−3. Obviously, such a parameter-setting does not favor the spread of the NSR. The NSR number at step 15×104 (i.e., the final step shown here) is adopted as the reference criterion, i.e., the objective function for the machine-learning, which is expected to be improved through the subsequent automatic parameter-adjustment (exploration).

There are in total thirteen parameters in the model [20], and here we concentrate on eight of them (S1 Table) for the automatic exploration. In that study, we inoculated one NSR molecule to see whether such molecules could spread (become thriving by replication) in the system (Fig 1A). In fact, the number of NSR molecules a definite period of time after the inoculation may serve as a target value reflecting the spread tendency of the NSR. That is to say, via automatic parameter-adjustment, this number is expected to increase; in terms of machine-learning, it is "the objective function". However, at the starting point of the machine-learning, assumedly with a "bad parameter set", one NSR molecule would tend to degrade by chance before it could give rise to more offspring (thus no learning would be feasible). Therefore, we inoculated 100 NSR molecules initially, so that we see a decline of NSR molecules rather than an immediate extinction (Fig 1B).
Then, the NSR number at a reference point on the declining curve (here, step 15×104) was chosen, and the automatic parameter-exploration was conducted to find parameter settings favoring the increase of this "objective function". The alteration of parameter settings was expected to reverse the declining tendency and ultimately favor the spread of the NSR in the system instead. This "machine-learning" used the optimization approach of gradient ascent: the parameters were changed simultaneously along the direction of steepest ascent of the objective function, the NSR number (see Methods). The approach turned out to be successful: the target NSR number rose dramatically as the parameter setting was adjusted automatically (Fig 2). The subfigures in Fig 2 correspond to four different starting parameter sets; within each, the curves in different colors correspond to cases adopting different learning rates. Generally, the final target NSR number tends towards a similar level (around 3500 here), and a larger learning rate means a faster learning process but greater fluctuation during the learning.
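A gradient-ascent loop of this general shape can be sketched in a few lines. This is only a minimal illustration under our own assumptions: the finite-difference gradient estimate, the relative perturbation size, the clamping bounds, and the smooth toy objective standing in for a Monte-Carlo run are not taken from the paper's Methods.

```python
def estimate_gradient(objective, params, delta=0.05):
    """Finite-difference gradient: perturb each parameter in turn and
    re-evaluate the objective (in the real setting, rerun the simulation)
    to see how the target value responds."""
    base = objective(params)
    grad = {}
    for name in params:
        probe = dict(params)
        probe[name] *= 1 + delta  # small relative perturbation
        grad[name] = (objective(probe) - base) / (params[name] * delta)
    return grad

def gradient_ascent(objective, params, rate=0.2, rounds=200):
    """One learning round adjusts all parameters simultaneously along the
    estimated direction of steepest ascent of the objective."""
    for _ in range(rounds):
        grad = estimate_gradient(objective, params)
        for name in params:
            params[name] += rate * grad[name]
            params[name] = min(max(params[name], 1e-8), 1.0)  # keep probabilities in (0, 1]
    return params

# Toy stand-in for a Monte-Carlo run's target value: a smooth objective
# peaking at p_a = 0.3, p_b = 0.1 (arbitrary illustrative values).
toy = lambda p: -((p["p_a"] - 0.3) ** 2 + (p["p_b"] - 0.1) ** 2)
learned = gradient_ascent(toy, {"p_a": 0.8, "p_b": 0.8})
```

In the actual setting, each objective evaluation is a stochastic simulation, so one would typically average several runs per evaluation to damp the noise in the gradient estimate.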
Fig 2

The improvement of the target NSR number by the automatic parameter-exploration following the principle of machine-learning.

The target NSR number refers to the number of NSR molecules at step 1.5×105 in the evolutionary dynamics (see Fig 1B; it serves as the "objective function"). "e" denotes the learning rate (the color legend in the lower-right subfigure applies to the whole figure). A round of learning means a round of parallel adjustment of all eight parameters along the direction of the steepest gradient (see Methods). The four subfigures show cases with different starting parameter values: (a) PNF = 4×10−3, PNFR = 0.02, PND = 1×10−3, PRL = 2×10−5, PBB = 1×10−5, PAT = 0.5, PFP = 0.1, and PMV = 1×10−3 (i.e., the same as those in Fig 1B); (b) PNF = 4×10−3, PNFR = 0.2, PND = 0.01, PRL = 2×10−5, PBB = 1×10−6, PAT = 0.1, PFP = 1×10−3, and PMV = 1×10−4; (c) PNF = 5×10−3, PNFR = 0.2, PND = 0.01, PRL = 1×10−5, PBB = 1×10−5, PAT = 0.05, PFP = 0.01, and PMV = 0.01; (d) PNF = 1×10−3, PNFR = 0.01, PND = 1×10−3, PRL = 1×10−5, PBB = 1×10−5, PAT = 0.1, PFP = 1×10−3, and PMV = 1×10−3.

Fig 3 demonstrates the details of the automatic adjustment of the eight parameters during the learning process. Notably, though starting from different parameter sets, the ultimate "good parameter sets" appear to be similar (see the curves in different colors). In general, through the learning, the parameters PRL, PBB, PFP and PNF tend towards rather small values (consistent with the results of manual exploration in the previous study [20]). The results concerning the other parameters (i.e., PNFR, PAT, PMV, PND) are less straightforward, indicating their more complicated influence on the evolutionary dynamics.
Fig 3

The automatic parameter-adjustment during the parameter-exploration following the principle of machine-learning.

The parameter-adjustment tendencies of the four cases with the learning rate of 0.2 in Fig 2A–2D (i.e., those represented by the red lines within the subfigures of Fig 2) are shown here in different colors (the legend in the lower-right subfigure applies to the whole figure). The vertical axis of each subfigure represents the value of the corresponding parameter, whose name is shown within the panel (in the PFP subfigure, the Case_b line is covered by the Case_d line). Note that the eight probabilities are in practice adjusted simultaneously in a learning round (see Methods).

Above we demonstrated the result of machine-learning for parameter settings favoring the increase of NSR after the inoculation of 100 molecules (Fig 4A shows the greatly "improved" dynamics, in comparison with Fig 1B). Indeed, under the same parameter setting, when only one NSR molecule is inoculated initially, it can also spread in the system (Fig 4B), manifesting the success of our automatic parameter-exploration (in fact, here there are about 800 NSR molecules at the ultimate balance, apparently better than the result in our previous study, i.e., fewer than 300 NSR molecules in Fig 1A).
Fig 4

The parameter setting resulting from the machine-learning turned out to favor both the spread of NSR and that of a non-functional RNA species.

With reference to the case shown in Fig 2A with the learning rate of 0.2 (the red line), the parameter set at the 200th round of the machine-learning is adopted: PNF = 3.04×10−4, PNFR = 5.51×10−3, PND = 0.0194, PRL = 7.25×10−8, PBB = 6.24×10−8, PAT = 0.0763, PFP = 3.51×10−4, PMV = 4.36×10−5. At step 1×104, (a) one hundred NSR molecules are inoculated; (b) one NSR molecule is inoculated; (c) one hundred control (Ctrl) molecules, which have no enzymatic activity, are inoculated; (d) one control molecule is inoculated.

However, we noticed a detail that looks abnormal. In the previous modeling study [20], we had supposed that NSR may have spread naturally in the system because this ribozyme could accumulate building blocks of RNA around itself, thus favoring its own replication. If so, a higher catalytic rate, which corresponds to a greater PNFR (refer to S1 Table), should favor the spread of the NSR. However, from the automatic parameter-adjustment curves (Fig 3), we noticed that PNFR does not increase as expected; it even tends to decrease. Does this mean that under the "learned" parameter setting, the catalytic function of NSR is in practice of no use, and any RNA sequence inoculated into the system could spread? This turned out to be the case: when we inoculated a non-functional RNA sequence (the control) instead of NSR into the system, it spread as well (Fig 4C and 4D). Interestingly, this abnormal situation offered us a chance to demonstrate the power of the automatic parameter-exploration further. As a modification, while inoculating 100 molecules of NSR together with 100 molecules of the control RNA species into the system, we chose the difference between their molecular numbers (NSR minus Control) as the target value to be improved in the process of machine-learning. As a result, the difference was enlarged via the machine-learning (Fig 5A). Then, with the learned parameter set, NSR could spread in the system but the control species could not (Fig 6).
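Retargeting the learning in this way requires changing only the objective function, not the search procedure. A minimal sketch, in which `run_simulation` is a hypothetical placeholder (not the paper's actual program) for one Monte-Carlo run that returns the final counts of both inoculated species:

```python
def difference_objective(run_simulation, params):
    """Retargeted objective: inoculate both species, then score the gap
    between their final molecular numbers (NSR minus control).
    `run_simulation` is a placeholder for one Monte-Carlo run of the
    model, returning (nsr_count, ctrl_count) at the reference step."""
    nsr_count, ctrl_count = run_simulation(params)
    return nsr_count - ctrl_count

# With a stand-in run reporting 800 NSR and 5 control molecules, the
# objective rewards parameter sets that widen the gap between the species.
print(difference_objective(lambda p: (800, 5), {}))  # prints 795
```

Maximizing this difference, rather than the NSR count alone, steers the search away from parameter regions where any RNA species, functional or not, would spread.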
Notably, here in the learned parameter set, PNFR (the rate of enzymatic nucleotide synthesis) becomes significantly greater than PNF (the rate of non-enzymatic nucleotide synthesis): PNFR = 0.036 and PNF = 8.36×10−7 (see the legend of Fig 6), in comparison with the former pair, PNFR = 5.51×10−3 and PNF = 3.04×10−4 (see the legend of Fig 4). That is, the machine-learning finally achieved a result showing that NSR may spread in the RNA pool on account of its enzymatic function, fully supporting the aforementioned hypothetical idea [20].
Fig 5

Adopting the difference between NSR and the control RNA species as the objective function.

The starting parameter values are the same as those in Fig 2A: i.e., PNF = 0.004, PNFR = 0.02, PND = 0.001, PRL = 2×10−5, PBB = 1×10−5, PAT = 0.5, PFP = 0.1, and PMV = 0.001. The learning rate e = 0.5. Here the objective function is the difference (Diff) between the numbers of the NSR and the control RNA species (i.e., NSR minus Ctrl), instead of the number of NSR per se. (a) NSR plays its normal role, as it is named. (b) NSR is assumed to lose its function. In fact, in case (b), the curve of NSR (black) is almost covered by that of Ctrl (yellowish-green) because there is nearly no difference between them throughout the whole learning process, and the curve of Diff (purple) lies very close to the horizontal axis.

Fig 6

Using the difference between the NSR and control molecules as the objective function resulted in a parameter set favoring the spread of NSR but not the control.

With reference to the case shown in Fig 5A, the parameter set at the 50th round of the machine-learning is adopted: PNF = 8.36×10−7, PNFR = 0.036, PND = 3.77×10−3, PRL = 9.89×10−8, PBB = 3.05×10−5, PAT = 0.373, PFP = 2.17×10−5, PMV = 2.89×10−4. At step 1×104, (a) one hundred NSR molecules are inoculated; (b) one NSR molecule is inoculated; (c) one hundred control (Ctrl) molecules, which have no enzymatic activity, are inoculated; (d) one control molecule is inoculated (the number symbols lie very close to the horizontal axis).


Doubtlessly, the power of such automatic parameter-exploration is impressive. It is then natural to wonder whether such an approach could bring about an illusion. In other words, is the automatic exploration "omnipotent", capable of seeking out any result one desires, regardless of the relevant evolutionary mechanisms? In terms of machine-learning, this is somewhat like the concept of "over-fitting". If so, the approach would be meaningless for our goal of investigating evolutionary processes. Hence, we chose to "knock out" the function of NSR in the modeling (i.e., the NSR no longer catalyzes nucleotide synthesis) and returned to the learning case aiming at enlarging the difference between the NSR and the control species. Then, no difference could be "learned out" (Fig 5B). That is to say, the aforementioned difference between the NSR and control species (Fig 5A) indeed stems from the function of NSR, and the worry about "over-fitting" is unnecessary here. Above we have seen that the objective function can be modified to serve a different goal of the parameter-exploration. To a degree, this illustrates the robustness of the present method. In fact, the method remains robust even when the learning strategy is modified.
For instance, in the strategy used originally, when a change of one parameter does not bring about a change in the value of the objective function, the parameter is adjusted upward in the next round of learning; if it is adjusted downward instead, the machine-learning method still effectively leads to the increase of the NSR (S1A Fig). Additionally, in regard to the learning rate, when we adopt a strategy of multiplication instead of a strategy of addition (see Methods for details), the machine-learning is still effective (S1B and S1C Fig). Furthermore, when we adopt the approach of coordinate ascent instead of gradient ascent (see Methods for details), the machine-learning can also find an "appropriate" parameter setting favoring the spread of NSR (S1D Fig). We were then curious whether the machine-learning method described here could apply to computer simulation studies from other groups in the field, which would serve as a verification test of our method. A well-known study by Szathmáry and coworkers [19], for which the source code of the simulation program was provided, offered us a chance to assess this. The work addressed an important issue concerning the plausibility of Darwinian evolution at the molecular level: without membranes, could RNA-like molecules evolve towards higher efficiency and greater complexity? The study assumed a relatively abstract model, in which the RNA-like polymers are referred to as "replicators". The results showed that with limited dispersal, the replicators could evolve towards higher efficiency and fidelity (in their replication). Fig 7A shows a typical case of such evolution in the computer simulation (Monte-Carlo simulation), in which the average length of the replicators in the system was monitored, marking the evolutionary process towards complexity. Obviously, such an average length may serve here as the objective function in our machine-learning.
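To illustrate the coordinate-ascent alternative mentioned above, here is a minimal sketch under our own assumptions (the multiplicative step, the acceptance rule, the clamping bounds, and the smooth toy objective are illustrative stand-ins, not the paper's Methods): instead of moving all parameters at once along a gradient, one parameter at a time is nudged, and a change is kept only when it improves the objective.

```python
def coordinate_ascent(objective, params, rate=0.2, rounds=50):
    """Adjust one parameter at a time: try a multiplicative step up,
    then down, accepting whichever (if either) improves the objective."""
    best = objective(params)
    for _ in range(rounds):
        for name in list(params):
            for factor in (1 + rate, 1 - rate):
                trial = dict(params)
                trial[name] = min(max(trial[name] * factor, 1e-8), 1.0)
                score = objective(trial)
                if score > best:
                    params, best = trial, score
                    break  # keep the improvement, move to the next parameter
    return params, best

# Toy stand-in for a simulation's target value, peaking at
# p_a = 0.3, p_b = 0.1 (arbitrary illustrative values).
toy = lambda p: -((p["p_a"] - 0.3) ** 2 + (p["p_b"] - 0.1) ** 2)
learned, score = coordinate_ascent(toy, {"p_a": 0.8, "p_b": 0.8})
```

Coordinate ascent needs only one or two objective evaluations per parameter per round and no gradient estimate, which can matter when each evaluation is an expensive Monte-Carlo run.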
Fig 7

The evolutionary dynamics of the average length of replicators in the modeling.

(a) A typical case demonstrating the hypothetical scenario in the original work [19]. The average length of the replicators in the system is initially 5 (in monomers), and it increases significantly along with the evolutionary process. Parameters are set according to the original work: αA = 0.1, βA = 3, γA = 200, αB = 0.1, βB = 3, γB = 200, βC = 2, γC = 5 (note that for clarity we omit the description of the other four parameters in the same model, which are not involved in the parameter-exploration here). (b) A case we assumed as the starting point for the machine-learning. Parameter setting: αA = 0.2, βA = 2, αB = 0.2, βB = 2, βC = 1, with the other three parameters the same as in (a). Obviously, such a parameter-setting does not support the hypothetical idea [19]: there is nearly no increase in the average length of the replicators. The situation is expected to "improve" via the automatic parameter-exploration following the principle of machine-learning presented here.

To set a starting point for the machine-learning, we changed several parameter values relative to the case of Fig 7A, such that the average length of the replicators could not increase during the evolution (Fig 7B). The replicator length at step 100×104 was chosen as the objective function. Then we started the parameter-exploration in favor of the increase of the replicator length. The result of the learning manifested a clear success: the replicator length at step 100×104 was improved from 5 to around 30 (monomer residues) (Fig 8A), and when the learned parameter setting was applied to the evolutionary simulation, the replicator length reached about 35 at step 400×104 (Fig 8B), no less than that in the typical case from the original work (Fig 7A). S2 Fig illustrates the automatic parameter-adjustment during the machine-learning.
Fig 8

The average length of replicators improves via the parameter-exploration following the principle of machine-learning.

(a) The objective function is the average length of replicators at step 100×104, with the starting point of learning assumed in the case of Fig 7B. The arrows indicate "odd points", denoting the rounds when the total number of replicators is zero, in which the average length of replicators is assumed (ad hoc) to adopt the starting value, i.e., 5. (b) The improved evolutionary dynamics (relative to that in Fig 7B) obtained by adopting the learned parameter set at the 60th learning round in (a): αA = 0.0727, βA = 2.57, γA = 393, αB = 0.0766, βB = 4.42, γB = 169, βC = 1.88, γC = 3.35. (c) The objective function is the average length of replicators at step 20×104, with the starting point of learning assumed in the case of Fig 7B. (d) The improved evolutionary dynamics (relative to that in Fig 7B) obtained by adopting the parameter set at the 80th learning round in (c): αA = 0.0358, βA = 2.97, γA = 452, αB = 0.251, βB = 1.44, γB = 143, βC = 2.22, γC = 1.23.

In fact, from the case in Fig 7A, we can see that the replicator length does not reach a stable level until step 300~400×104; thus a better choice for the objective function would seem to be the average length of the replicators after this balance is reached (instead of the length at step 100×104, as adopted in our learning). However, the computational burden would then be much greater. On the other hand, we found that a further attempt to save computational burden by adopting (as the objective function) the replicator length at an earlier stage may result in only limited success: e.g., when the replicator length at step 20×104 was adopted, it was merely improved to around 15 (Fig 8C), and when the learned parameter setting was applied, the replicator length could only reach a level around 18 at step 400×104 (Fig 8D). Obviously, the parameter values optimized for the replicator length at an early stage of the evolution may not necessarily be consistent with the optimal parameter values for the replicator length at a later stage.
Indeed, owing to the complexity of the evolutionary processes in the origin of life, the computational burden is always an issue deserving consideration in relevant modeling studies. Here, this issue is particularly noteworthy because many rounds of learning may be required to achieve an optimized parameter setting. Given that the "time gap" cannot be too great for an early-step target value to represent a later-step one (Fig 8), we designed a "progressive-learning" strategy: the replicator lengths at steps 10, 20, 40 and 100×10^4 are adopted as the objective functions at successive stages of the machine-learning, one after another (S3A Fig), and the parameter values learned at a previous stage are set as the starting point of the next stage. The strategy turned out to be successful: when the ultimately learned parameter setting was applied in the evolutionary simulation, the replicator length increased to a level no lower than (here, even higher than) in the case in which the progressive strategy was not adopted (S3B Fig). Obviously, by using this strategy, computational costs can be significantly reduced while keeping the learning effective.
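To make the staged procedure concrete, the progressive strategy might be sketched as follows (a minimal Python sketch of our own, not from the study's source code; the function `learn_stage` and its signature are assumptions standing in for whatever per-stage learning routine is used):

```python
def progressive_learning(learn_stage, params, checkpoints, rounds_per_stage):
    """Run the machine-learning in successive stages, each stage using the
    replicator length at a later simulation step as its objective function,
    and seeding each stage with the parameters learned in the previous one.

    learn_stage(params, objective_step, rounds) -> learned parameter dict
    checkpoints: simulation steps defining the staged objectives,
                 e.g. [10e4, 20e4, 40e4, 100e4] as in S3A Fig
    """
    for objective_step in checkpoints:
        # the parameters learned here become the starting point of the
        # next, more expensive stage
        params = learn_stage(params, objective_step, rounds_per_stage)
    return params
```

Each early stage is cheap, because the simulation is stopped sooner; most rounds of adjustment thus happen at low cost, and only the final stage pays for the full simulation length.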

Discussion

In the present study, we demonstrate that automatic parameter exploration with a machine-learning-like approach may be used in computer modeling studies adopting the reverse approach, that is, trying to find parameter values favoring a specific outcome. As mentioned already, computer modeling studies on the evolutionary processes during the origin of life have typically adopted the reverse approach, largely owing to the shortage of our knowledge about the prebiotic chemistry and environments with which the parameter setting is associated. On the other hand, why should this reverse approach be particularly effective in this area? In fact, the evolution during the origin of life is remarkably characterized by a tendency from simplicity to complexity, which is a special, rare phenomenon in nature [3,4]. Therefore, any relevant hypothetical scene in the area (suggesting a simplicity-to-complexity evolution), if supported by modeling, deserves our attention; exploring parameter settings in favor of such a scene is therefore valuable. Conversely, if simplicity-to-complexity evolution were ordinary, it would make little sense to find parameter values supporting one particular hypothetical scene, because we would have plenty of such hypotheses to choose from; perhaps we should instead focus our attention on which scenes really occurred in history. If so, it would appear more meaningful to wait for the accumulation of evidence and knowledge on prebiotic chemistry and environments, and then conduct computer modeling in the conventional way. It is noteworthy that after an optimal parameter setting in favor of a hypothetical scene is found, an obviously useful job would be to test the influence of the different parameters by altering them separately, especially to see in which cases the model fails to support the scene.
The detailed information on the optimal parameter setting, together with the results of such subsequent testing by separate parameter adjustments, would provide clues for judging the likelihood of the scene in history based on our extant knowledge concerning prebiotic chemistry and environments, or for making such a judgement in the future if we do not yet have the relevant knowledge. That is just where the significance of our "reverse" computer modeling lies. For the automatic parameter exploration, a large learning rate, though better for fast learning, may bring about instability in the learning process (Fig 2), especially when the optimum is being approached. In fact, this instability is common to optimization in general. An ordinary solution to this problem in machine-learning is to adjust the learning rate progressively, that is, to decrease e gradually as the optimum is approached. Here, however, such a strategy (or a similar one) is not necessary, because the aim of parameter exploration in evolutionary modeling studies on the origin of life is not to seek an optimum in a strict sense, but to learn roughly whether a good parameter set favoring a hypothetical scene exists. Certainly, should future work in this area aim at an absolute optimum, improvements along this line would be readily achievable, simply by drawing on the relevant techniques developed in the field of machine-learning [40]. Notably, for the automatic parameter-learning process, the selection of an objective function is a critical step; different objective functions may yield distinctly different results. When we chose merely the molecular number of NSR as the objective function (Fig 2), the resulting parameter setting favored the spread of the control species as well as that of NSR (Fig 4).
When the difference between the NSR and the control species was chosen as the objective function (Fig 5A), the resulting parameter setting favored the spread of NSR but not of the control species (Fig 6). When the average replicator length at step 100×10^4 was chosen as the objective function (Fig 8A), the resulting parameter setting was satisfactory with regard to increasing the replicator length in the evolution (Fig 8B). In contrast, when the average replicator length at step 20×10^4 was chosen as the objective function (Fig 8C), the resulting parameter setting was obviously "unsatisfactory" (Fig 8D), because a parameter setting optimized for the replicator length at an early stage does not necessarily favor long replicators at the final balance. To balance the effectiveness and the computational cost of the machine-learning, we designed a strategy of progressive learning, in which the parameter values learned at a previous stage are set as the starting point of the learning in the following one (S3 Fig). In fact, such a mechanism involves the notion of "transfer learning" [40]: a parameter set resulting from one learning process may be "lent" as the starting point of another learning process, provided that the two learning cases are similar or clearly related to each other, thereby speeding up the latter process. Furthermore, the notion of transfer learning may be quite useful across the whole area of evolutionary modeling concerning the origin of life. As mentioned already, the origin of life represented a remarkable developmental scenario from simplicity to complexity. The two examples used in the present study, for instance, deal merely with Darwinian evolution at the molecular level and are relatively simple [19,20]; there have already been quite a few modeling studies tackling Darwinian evolution at a later stage, i.e., at the "proto-cellular" level (e.g., see Refs. [21,24,27,28]), which involve many more parameters.
That is to say, on account of the intrinsic relationship between earlier and later scenes during the origin of life, a portion of the parameters in the complex models, i.e., those already explored in previous, simpler models, probably need not be explored ab initio. In addition to the optimization approach of gradient ascent, we have shown that the approach of coordinate ascent (see Methods) may also work for the automatic parameter exploration (S1D Fig), wherein the parameters are explored one by one, in iterative cycles. However, in machine-learning terms, coordinate ascent is less powerful than gradient ascent, especially when the parameters are strongly "interdependent". Indeed, here we have observed such interdependence between parameters: during the automatic parameter adjustment, some parameters increase in value at certain stages while decreasing at others (Figs 3 and S2). This inconsistency of tendency indicates that a "better" value for one parameter may depend on the values of the other parameters in use. Therefore, it is expected that the gradient-ascent approach would be more robust than coordinate ascent in this area, especially for complex models concerning the origin of life, in which the interdependence between parameters might be more significant. Certainly, there may also be other learning algorithms suitable for the automatic parameter exploration. For example, as a variant of the Metropolis algorithm, we could make small random changes to the parameters, accept all moves that improve the objective function, and accept moves that decrease the objective function with a smaller probability depending on the scale of the decrease (accepting decreases to an extent might avoid being trapped permanently in local optima). The idea is no doubt interesting, but the Metropolis algorithm is sometimes not very efficient in finding an optimum.
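The Metropolis-style variant just described might be sketched as follows (an illustrative Python sketch of our own, not part of the study's source code; `model` is assumed to map a parameter dict to the objective value, and `temperature` sets how readily decreases of the objective are accepted):

```python
import math
import random

def metropolis_step(model, params, step_scale, temperature, rng):
    """One step of a Metropolis-style parameter search: perturb all
    parameters by small random relative amounts, always accept a move
    that improves the objective, and accept a worsening move with
    probability exp(delta / temperature)."""
    trial = {name: value * (1 + step_scale * (2 * rng.random() - 1))
             for name, value in params.items()}
    delta = model(trial) - model(params)   # change of the objective function
    if delta >= 0 or rng.random() < math.exp(delta / temperature):
        return trial                       # accepted (possibly downhill)
    return params                          # rejected; keep current values
```

Unlike the gradient-ascent scheme, no extra model runs are needed to estimate a gradient, at the price of an undirected search.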
Concerning the automatic parameter exploration, further investigation of this method may be needed before a conclusion can be drawn. Another interesting idea is to apply evolutionary algorithms: we may treat the investigated parameters as genotypes and the objective function as the phenotype, with genotypes favoring improvement of the phenotype selected iteratively over many rounds of "reproduction" in which mutation and recombination of the genotypes are introduced. Then, starting from a "bad" parameter set, a good parameter set favoring the hypothetical scene might gradually emerge in the "parameters' evolution". Obviously, the effectiveness and the efficiency (if effective) of evolutionary algorithms for automatic parameter exploration also remain to be evaluated in future work. In any case, the automatic parameter exploration demonstrated in the present study is inspiring. With fewer scruples regarding the burden of searching for "appropriate" parameters, researchers in this area could be expected to establish models considering more details of prebiotic chemistry (and thus with more parameters). Therewith, the results and conclusions derived from the computer simulations would be more comparable with reality and more convincing. Then, perhaps, the two aspects of scientific effort concerning the origin of life, experimental and theoretical, would become much more cross-referenced and interdependent, and might finally even merge, thereby hopefully bringing about a breakthrough in the field. Finally, we note that the approach presented here may have a broader significance. Although, as mentioned above, the reverse way of modeling is particularly important (and effective) in the field of the origin of life, in other modeling fields we might also sometimes want to know what kind of parameter setting would bring about a definite outcome of interest, and thus tend to take the reverse way.
There, too, the presented machine-learning-like approach to automatic parameter exploration would be valuable.

Methods

The gradient ascent

Suppose that n parameters are involved in the parameter exploration: p_1, p_2, ……, p_n, and that a definite outcome of running the model (i.e., the simulation) with these parameters is O, which can be denoted as O = Model{p_1, p_2, ……, p_n}. This definite outcome, which may represent a hypothetical scene, is adopted as the objective function of our machine-learning, meaning that we aim to maximize O by adjusting the values of p_1, p_2, ……, p_n. Certainly, we could instead minimize O by modifying the parameters, which would be "gradient descent"; to be concise, we describe only the "ascent" direction here (the same below for the approach of coordinate ascent). Firstly, we obtain O[0] by running the model with the initial parameter values p_1[0], p_2[0], ……, p_n[0]:

O[0] = Model{p_1[0], p_2[0], ……, p_n[0]}   (1)

Then, we begin to test the influence of the parameters on the objective function. We change the value of each parameter p_x to p_x[0]*(1+e), where e denotes the learning rate (0 < e < 1), and obtain, for x = 1, 2, ……, n:

O[0]_x = Model{p_1[0], ……, p_x[0]*(1+e), ……, p_n[0]}   (2)

Having done this, we calculate the difference in the objective function brought about by the change of each parameter:

ΔO[0]_x = O[0]_x − O[0]   (3)

Then, the maximal absolute value of these differences is found:

Max_abs_ΔO[0] = max_x |ΔO[0]_x|   (4)

Based on this, the parameter values for the next learning round are calculated:

p_x[1] = p_x[0] * (1 + e · ΔO[0]_x / Max_abs_ΔO[0])   (5)

Subsequently, a new round of learning starts from the step marked by Formula (1) above, and so on, iteratively. In this approach, the notion of gradient ascent is manifested in the determination of the parameter values for a new round. For example, if the increase of a parameter (p_x) brings about the greatest improvement of the objective function (calculated in the testing stage represented by Formula (2), and judged via Formulas (3) and (4)), then in the new round (as demonstrated in Formula (5)) it is adjusted upwards at full scale, i.e., p_x[1] = p_x[0]*(1+e).
In contrast, the other parameters, which have less potential to improve the objective function, are adjusted on a scale proportional to their potential. Importantly, in the learning algorithm, the changing directions of a parameter and of the objective function are interrelated; for instance, if the increase of a parameter (p_x) brings about a decrease of the objective function, then in the new round (as calculated in Formula (5)) it is adjusted downwards, owing to the sign of ΔO[0]_x. For details, one can refer to the source code of our learning algorithm (see Code availability). In any case, the approach chooses the steepest direction towards maximization of the objective function: in mathematical terms, ΔO[0]_x represents the rate of change of the objective function O relative to the change of p_x, and the vector comprising all these rates of change just represents the "gradient". In practice, ΔO[0]_x / Max_abs_ΔO[0] is a normalized form of the rate of change ΔO[0]_x. Notably, we use the "learning rate" e in two places, i.e., in Formulas (2) and (5). Conceptually, however, while the e in (5) is the veritable learning rate, the e in (2) actually simply defines the small change of the parameters used to estimate the gradient. For a deterministic model (e.g., the differential-equation model mentioned in the Introduction), in which the gradient can be determined analytically, the use of e in the sense of (2) is often unnecessary. But for a Monte-Carlo model, as demonstrated here, we need a "testing rate" to estimate the gradient. Though in principle the "testing rate" and the true "learning rate" need not be the same, in practice they should adopt a similar scale.
For instance, if we want to conduct fast learning and thus use a large "learning rate" for adjusting the parameters, then using a very small "testing rate" for an accurate estimation of the gradient (which would incur great computational cost) would be unnecessary; on the other hand, if we would like to conduct smooth learning and thus use a small learning rate, then a relatively large "testing rate" would tend to be insufficient for estimating a gradient accurate enough for a valid learning step forward. Therefore, for simplicity in practice, here we adopt the same value and a unitary representation for the two rates.
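As an illustration, one learning round of the scheme above might be sketched as follows (a minimal Python sketch of our own, not the study's actual source code; `model` stands in for a function that runs the simulation with a given parameter set and returns the objective value O):

```python
def gradient_ascent_round(model, params, e):
    """One round of the gradient-ascent parameter adjustment.

    model  : callable mapping a parameter dict to the objective value O
    params : dict of current (positive) parameter values
    e      : learning rate, here also used as the testing rate (0 < e < 1)
    """
    base = model(params)                            # O[0], Formula (1)
    deltas = {}
    for name in params:                             # testing stage
        trial = dict(params)
        trial[name] = params[name] * (1 + e)        # Formula (2)
        deltas[name] = model(trial) - base          # Formula (3)
    max_abs = max(abs(d) for d in deltas.values())  # Formula (4)
    if max_abs == 0:
        return dict(params)     # flat landscape: keep the current values
    # Formula (5): each parameter moves on a scale proportional to its
    # normalized contribution; the sign of its delta sets the direction.
    return {name: params[name] * (1 + e * deltas[name] / max_abs)
            for name in params}
```

Iterating this round, as in the paper, moves the parameter set towards values favoring the objective function.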

Variations on learning strategies

In this approach, the "learning direction" in the testing stage is adjusted according to the result of the previous round. For example, if the increase of a parameter (p_x) brought about a decrease of the objective function in the previous round, then in the testing stage of the new round the parameter is adjusted downwards instead; that is, in Formula (2), the actual testing value would be p_x[0]*(1−e) instead of p_x[0]*(1+e). In the particular situation in which the increase (or decrease) of p_x does not alter the objective function, we typically assume an "upward" direction, but assuming a "downward" direction also poses no problem (S1A Fig). We may even change the implementation of the testing by using a strategy of "multiplication" rather than one of "addition": when the increase of a parameter (p_x) brought about a decrease of the objective function in the previous round, the parameter is adjusted downwards in the form p_x[0]/(1+e), instead of p_x[0]*(1−e) (S1B Fig). Furthermore, the approach remains robust when both strategies are used simultaneously, that is, when the downward adjustment takes the form p_x[0]/(1+e) and the downward learning direction is adopted in the particular situation in which the increase (or decrease) of p_x does not alter the objective function (S1C Fig).

The coordinate ascent

In the first round of learning, for the first parameter (p_1), while the other parameters are fixed, seven values are tested: its initial value (p_1[0]), three values adjusted upwards (p_1[0]*(1+e), p_1[0]*(1+e)^2, p_1[0]*(1+e)^3) and three values adjusted downwards (p_1[0]*(1−e), p_1[0]*(1−e)^2, p_1[0]*(1−e)^3). The one of the seven values that most favors the objective function is retained for testing the other parameters, and serves as the initial value of this parameter in the next round (i.e., p_1[1]). When all the parameters have been tested, a new round of exploration starts from the first parameter. That is to say, rather than being explored simultaneously, as in the gradient-ascent approach, the parameters are here explored one by one, iteratively, which is characteristic of coordinate ascent (S1D Fig).
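A round of this procedure might be sketched as follows (an illustrative Python sketch under the same assumptions as above, with `model` returning the objective value for a parameter dict):

```python
def coordinate_ascent_round(model, params, e):
    """One round of coordinate ascent: each parameter in turn is tested at
    seven candidate values (itself, three upward and three downward
    adjustments) while the others stay fixed, and the candidate that most
    favors the objective function is kept before moving on."""
    params = dict(params)
    for name in params:
        p0 = params[name]
        candidates = ([p0]
                      + [p0 * (1 + e) ** k for k in (1, 2, 3)]
                      + [p0 * (1 - e) ** k for k in (1, 2, 3)])
        # the winner is carried over while the remaining parameters are tested
        params[name] = max(candidates,
                           key=lambda v: model({**params, name: v}))
    return params
```

Note that each round requires seven model runs per parameter, against n + 1 runs in total for a gradient-ascent round.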

Parameters used in the automatic exploration


Variations on the machine-learning method.

The starting values of the parameters are the same as those in Fig 2A: PNF = 4×10^-3, PNFR = 0.02, PND = 1×10^-3, PRL = 2×10^-5, PBB = 1×10^-5, PAT = 0.5, PFP = 0.1, and PMV = 1×10^-3. The learning rate e = 0.5 (corresponding to the blue line in Fig 2A). As in the cases of Fig 2, the objective function is the number of NSR molecules at step 1.5×10^5 in the evolutionary dynamics (refer to Fig 1B). (a) When a change of one parameter does not alter the value of the objective function, in the next round of learning the parameter is adjusted downwards instead of upwards. (b) The change of a parameter according to the learning rate is implemented by the rule of multiplication instead of the rule of addition. (c) Adoption of the implementation strategy of (b) together with the adjusting strategy of (a). (d) Instead of gradient ascent, the approach of coordinate ascent is used. See Methods for a detailed description of all these variations.

The automatic parameter-adjustment during the machine-learning–optimizing for longer replicators.

The improvement of the replicator length in this case is shown in Fig 8A. The vertical axis of each subfigure represents the value of the corresponding parameter, whose name is shown within the panel. The eight parameters are adjusted simultaneously in each learning round.

The progressive parameter-exploration for the sake of saving computational costs.

(a) The black line is the same as that in Fig 8A; its objective function, throughout the learning process, is the average length of replicators at step 100×10^4. The learning process denoted by the magenta segment (rounds 1 to 15) uses as its objective function the replicator length at step 10×10^4, with the same starting point of parameter setting as the black-line case (refer to Fig 7B); the green segment (rounds 16 to 30) uses the replicator length at step 20×10^4, starting from the parameter setting achieved in the previous (magenta) learning process; the brown segment (rounds 31 to 45) uses the replicator length at step 40×10^4, starting from the parameter setting achieved in the green process; and the red segment (rounds 46 to 60) uses the replicator length at step 100×10^4, starting from the parameter setting achieved in the brown process. The arrows indicate "odd points" (see the legend of Fig 8 for an explanation). (b) The black dots represent the improved evolutionary dynamics (in comparison with Fig 7B) obtained by adopting the parameter set at the 60th learning round of the black-line case in (a), which does not use the progressive strategy (actually the same as shown in Fig 8B), while the red dots represent the improved evolutionary dynamics (in comparison with Fig 7B) obtained by adopting the parameter set at the 60th learning round of the red-line case in (a), which uses the progressive strategy.
The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work! With kind regards, Zsofia Freund PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
References cited: 34 in total

1.  Survival of replicators with parabolic growth tendency and exponential decay.

Authors:  I Scheuring; E Szathmáry
Journal:  J Theor Biol       Date:  2001-09-07       Impact factor: 2.691

2.  Self-organization and evolution in a simulated cross catalyzed network.

Authors:  A Hunding; R Engelhardt
Journal:  Orig Life Evol Biosph       Date:  2000-10       Impact factor: 1.950

3.  A production of amino acids under possible primitive earth conditions.

Authors:  S L Miller
Journal:  Science       Date:  1953-05-15       Impact factor: 47.728

4.  Nucleotide synthetase ribozymes may have emerged first in the RNA world.

Authors:  Wentao Ma; Chunwu Yu; Wentao Zhang; Jiming Hu
Journal:  RNA       Date:  2007-09-18       Impact factor: 4.942

5.  The origin of life: a problem of history, chemistry, and evolution.

Authors:  Wentao Ma
Journal:  Chem Biodivers       Date:  2014-12       Impact factor: 2.408

6.  The origins of life: old problems, new chemistries.

Authors:  Stephen Mann
Journal:  Angew Chem Int Ed Engl       Date:  2012-12-03       Impact factor: 15.336

7.  Attempts to define life do not help to understand the origin of life.

Authors:  Jack W Szostak
Journal:  J Biomol Struct Dyn       Date:  2012

8.  On the origin of DNA genomes: evolution of the division of labor between template and catalyst in model replicator systems.

Authors:  Nobuto Takeuchi; Paulien Hogeweg; Eugene V Koonin
Journal:  PLoS Comput Biol       Date:  2011-03-24       Impact factor: 4.475

9.  The origin of biological homochirality along with the origin of life.

Authors:  Yong Chen; Wentao Ma
Journal:  PLoS Comput Biol       Date:  2020-01-08       Impact factor: 4.475

10.  Evolution of linkage and genome expansion in protocells: The origin of chromosomes.

Authors:  András Szilágyi; Viktor Péter Kovács; Eörs Szathmáry; Mauro Santos
Journal:  PLoS Genet       Date:  2020-10-29       Impact factor: 5.917

[24 additional references not shown]
