Intuition-Enabled Machine Learning Beats the Competition When Joint Human-Robot Teams Perform Inorganic Chemical Experiments.

Vasilios Duros, Jonathan Grizou, Abhishek Sharma, S Hessam M Mehr, Andrius Bubliauskas, Przemysław Frei, Haralampos N Miras, Leroy Cronin.

Abstract

Traditionally, chemists have relied on years of training and accumulated experience in order to discover new molecules. But the space of possible molecules is so vast that only a limited exploration with the traditional methods can ever be possible. This means that many opportunities for the discovery of interesting phenomena have been missed, and in addition, the inherent variability of these phenomena can make them difficult to control and understand. The current state-of-the-art is moving toward the development of automated and eventually fully autonomous systems coupled with in-line analytics and decision-making algorithms. Yet even these, despite the substantial progress achieved recently, still cannot easily tackle large combinatorial spaces, as they are limited by the lack of high-quality data. Herein, we explore the utility of active learning methods for exploring the chemical space by comparing the collaboration between human experimenters and an algorithm-based search against their individual performance in probing the self-assembly and crystallization of the polyoxometalate cluster Na6[Mo120Ce6O366H12(H2O)78]·200H2O (1). We show that the robot-human teams are able to increase the prediction accuracy to 75.6 ± 1.8%, from 71.8 ± 0.3% with the algorithm alone and 66.3 ± 1.8% with only the human experimenters, demonstrating that human-robot teams can beat robots or humans working alone.

Year:  2019        PMID: 31025861      PMCID: PMC6593393          DOI: 10.1021/acs.jcim.9b00304

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


Introduction

The scientific exploration of the vast chemical space for the discovery of new molecules has always been a challenging endeavor, since it is estimated that there are approximately 10^60–10^100 synthetically feasible molecules.[1,2] As a result, the discovery of new chemical reactions can be a time-consuming process,[3] especially when relying on traditional synthesis methods.[4] A huge improvement in reactivity prediction came with the development of computational methods (such as density functional theory, DFT, and empirical force field methods), which can screen a large number of candidate compounds in silico, reducing the need for all experiments to actually be carried out.[2,5] However, these methods can be computationally demanding as the system grows in complexity and are limited in that only ground-state structures can be calculated, ignoring metastable and transient species.[5] The emergence of artificial intelligence (AI) methods, and their implementation in chemistry, offers another avenue of exploration for chemical reactivity, see Figure 1. These methods have been facilitated by the availability of both big data[6,7] and open-source code for the training of algorithms.[8,9] A subfield of AI that has recently found applications in chemistry is machine learning,[5,10] which relies on data in order to construct a model of the chemical space under investigation. An advantage of machine learning is that the mechanistic details of the system do not need to be explicitly known in order to predict the probability of a given outcome or property of interest. Recent progress in automated chemistry[11] and online analysis[12,13] has allowed experimenters to build robots capable of exploring chemical reactivity in a more autonomous way.[14,15] This means that robotic platforms can easily gather the data needed to implement machine learning algorithms. 
Nevertheless, the vast majority of algorithms (irrespective of their type) are not fully autonomous and require some guidance from the user, ranging from the choice of the internal parameters[16] (hyperparameters) for training to the selection of the algorithm for a specific chemical query[10] and the selection of the variables by the experimental scientist. Although deep-learning approaches (such as neural networks) have shown promise for predicting rules-of-thumb[17,18] in chemical synthesis, they suffer from two major drawbacks: First, they require large amounts of high-quality data in order to learn effectively, and second, they have difficulty operating outside their knowledge base.[10] In the former case, the problem stems from the fact that in chemistry, we are often limited to a relatively small number of high-quality data points (usually in the order of hundreds or thousands), while deep-learning methods are more attuned to systems with millions of data points (e.g., image recognition or text processing). In the latter case, machine learning methods can be predictive but not necessarily interpretable, since they tend to ignore molecular context because of the way that the data is represented in their model.[10] Therefore, the interaction with experimental scientists is important in order to assess these predictions, and in the end, it is chemical intuition that determines which outcomes are valuable and which may be ignored.[5,10,19]
Figure 1

The evolution process from the traditional synthesis of one-pot methods to the automated high-throughput methods and more recently the advent of the increased use of machine learning methods for the exploration of the vast chemical space. For the case of the machine learning methods, in the beginning a hypothesis is formed starting from the available data in the literature and theoretical calculations. Then, this information is used to train an algorithm in recognizing patterns in the data and subsequently suggest a series of experiments to be performed. After the data evaluation, it is possible to update the model of the chemical space, and the cycle can begin again with training of the algorithm in the new acquired information.

Intuition is generally described as heuristics, comprising strategies that human experimenters employ in problem solving and decision making by finding patterns, analogies, similarities, and rules-of-thumb in their data.[20,21] While automation has allowed generating, collecting, and storing data from scientific measurements in a very reliable and precise way, the field lacks uniform ways to process this data into concrete knowledge in the form of an analytical expression. The most significant advantage of intuition is that it does not require full information or knowledge of an unknown situation,[22−24] and in this way, it allows experimenters to perform well even in areas of high uncertainty.[25] Furthermore, the human mind is not able to process situations with a multitude of variables,[26] and as a result, it resorts to intuition and establishes a direction along which exploration can be performed in a consistent and meaningful way without getting lost in the details. In the context of chemistry, we can therefore only have a general overview of the system we are studying. 
Thus far, data mining methods are the closest approximations that have been developed as a means to substitute for human intuition in the experimental design.[10] Within this framework, we propose that strategies based on chemists’ intuition coupled with machine learning methodologies are a powerful alternative way to explore complex problems that involve large combinatorial spaces or nonlinear processes, where machine learning methods alone are unsuitable. Additionally, we propose that human intuition can help in guiding chemical synthesis, especially in cases where there is a lack of high-quality data. To our knowledge, there has been very little experimental work combining heuristics and machine learning methods. An algorithmic approach has been shown to detect nonlinear energy conservation laws without any prior knowledge of physics, kinematics, and geometry.[19] To achieve this, the algorithm automatically searched experimental motion-tracking data captured from various physical systems (ranging from simple harmonic oscillators to chaotic double-pendula) and built its own model of the physical space. Depending on the types of variables provided to the algorithm, different types of laws were derived. This dependence suggests that any analytical expression derived from a given computational method is amenable to human interpretation, and so close collaboration between the human factor and an algorithm can help in finding interesting phenomena more rapidly than before. In the field of chemistry, Raccuglia et al.,[27] implemented machine learning algorithms to predict reaction outcomes of vanadium compounds by using data from unsuccessful and unreported syntheses (labeled by the authors as “dark” data) and compared the efficiency of the algorithms with the typical strategies that human chemists apply. 
Additionally, they demonstrated how the prediction accuracy of the model provided by the algorithm is higher than that of the human chemical intuition, both for single-crystalline and polycrystalline products. Nevertheless, the comparison is indirect, since the authors use unreported data from their lab books as a database for their analysis and did not actively compare the methodologies that human experimenters employ when searching the chemical space of a given compound. In our previous work, we have demonstrated that we can push the envelope of both the synthesis and the crystallization process of a new polyoxometalate (POM) compound.[28] Our method is drawn from recent advances for active data acquisition in the field of machine learning, known as active learning. Active learning consists of methodologies that can decide what experiments should be performed next in order to improve the understanding of a system in the most efficient way. We studied how human experimenters approach the exploration and modeling of crystallization conditions for a given POM compound and directly compared the performance of their strategies to a machine learning approach. We hypothesized that this could be a first step to developing a new approach, which could combine the intuition of the chemists with machine learning in order to explore complex chemical systems and identify new phenomena. Additionally, in the work of Granda et al.,[15] and inspired by strategies based on chemists’ intuition, it is demonstrated that a reaction system controlled by a machine learning algorithm is capable of exploring the space of chemical reactions quickly, especially if trained by an expert. An organic synthesis robot can perform chemical reactions and analyses faster than they can be performed manually as well as predict the reactivity of possible reagent combinations after conducting a small number of experiments, thus effectively navigating a chemical reaction space. 
By using real-time data from this robot, the predictions of reactivity are followed up manually by a chemist, leading to the discovery of four reactions. Herein, we build on that previously reported work[28] of comparing an algorithm against the intuition of human experimenters, and we attempt to combine them to explore the chemical space of the compound with chemical formula Na6[Mo120Ce6O366H12(H2O)78]·200H2O (1) (hereafter also referred to as {Mo120Ce6}). In this context, the key question is whether we can quantify the way soft knowledge (i.e., heuristics and more concretely human intuition) and hard knowledge (i.e., the increased computational capability of a machine learning method) interact with each other as a team and, potentially, gain some insights into how this collaboration works. Ultimately, we want to benefit from these insights and improve the way we explore the vast chemical space. In Figure 2, we illustrate in a simplified conceptual scheme our observations of the evolution of the prediction accuracy in our previous experiments.[28] There are two areas of special interest for the performance of the combination of human intuition and machine learning that can be observed: area A and area B. In the case of the former, the resulting performance from the experiments is better than by simply utilizing an algorithm. In the case of the latter, the performance lies between that of the human experimenters and the algorithm.
Figure 2

A conceptual scheme which represents the general trends of the evolution of the prediction accuracy based on our previous study.[28] In area A, the performances are higher than the ones observed from the algorithm by itself. In area B, the performances lie between that of the human experimenters and the algorithm. Lastly, in area C, the performances are only marginally better than a random search. Color scheme: algorithm, red line; human experimenters, green line; and random search, blue line.

Therefore, we aim to see whether and how the team effort of human intuition and machine learning is able to increase its efficiency and lie in area A of Figure 2. The machine learning part is implemented by an algorithm described in the Supporting Information (SI), part 5. We are interested in crystallization for two reasons: first, because of its broad implementation in the pharmaceutical industry and materials chemistry (e.g., with the isolation of new molecules that can be used as active pharmaceutical ingredients in drugs), and second, because the crystal structure presents some inherent challenges, as it is difficult to find a format that represents a crystalline solid in such a way that it can be easily fed to a statistical learning procedure. We believe that finding a way to digitalize intuition can have an impact on both accelerating the discovery rate of new phenomena in more complex systems and on how young chemists are trained, since we can distill a vast body of seemingly random chemical information into an organized and interconnected web of knowledge.[20,28,29]

Methods

In our previous work,[28] we observed that the models computed by an algorithm improved their prediction accuracy more than the models suggested by the human experimenters (82.4% over 77.1%, respectively, with a baseline performance of 68.1%). The algorithm we implemented is a classifier assigning labels (e.g., crystal/no-crystal) to regions of a parameter space; the human experimenters were volunteer Ph.D. students in our group familiar with inorganic chemistry synthesis; and the baseline method used as a control was a random search, rendering it blind to both the initial and the subsequently collected data. As a result, this difference in performance between algorithm and human experimenters indicates the effect that the different strategies followed by the human experimenters can have when they are based solely on their intuition. The basis for the current work is the formation and crystallization of cluster (1), and the general experimental procedure is depicted in Figure 3, where teams are formed consisting of human experimenters and an algorithm with the objective to explore chemical space together. In order to start their exploration, these teams are provided with the experimental conditions for the formation of (1): first, the chemicals involved in its synthesis (SI, part 2); second, the experimental protocol for the synthesis and crystallization process; and third, an initial set of data consisting of successful and unsuccessful crystallization experiments (SI, part 3).
Figure 3

Experimental protocol describing the decision-making process during the exploration of the chemical space of {Mo120Ce6} that was implemented during this work. An initial set of data serves as a starting point for experiments to perform next after analysis and model calculation. The experiments are performed in a fully automated platform, and the outcome is observed and recorded in an updated version of the initial database. Coloring code of the building units found in {Mo120Ce6}: {Mo2}, red; {Mo1}, yellow; {Mo8}, blue with central atom in cyan; Ce, green. For the 3D plot of the initial set of data, the axes represent the following: A, Na2MoO4·2H2O 1 M and Ce(NO3)3·6H2O 0.1 M (mL); B, HClO4 1 M (mL); and C, NH2NH2·2HCl 0.25 M (mL).

The flowchart of Figure 4 describes the decision-making process of the human experimenter and algorithm teams. In the beginning, the initial set of data (SI, part 3 and Table S2) serves as the starting information used to decide what experiments to perform next. As a first step, the algorithm evaluates these experiments and builds a model of the chemical space. Based on that model, the algorithm provides us with a list of 20 suggested experiments. Then, these experiments are presented to the human experimenters, and they select 10 of them to perform on the platform. Finally, we perform the experiments on an automated platform (SI, part 4) and receive the information about the presence or the absence of crystals for each of the requested experiments by illuminating the samples under a strong white light-emitting diode (3300–3500 lux at a distance of 5 cm). The process is repeated 10 times per method for a total of 100 experiments each. At each iteration, all data collected previously are integrated in the decision process for generating the next set of 20 experiments.
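The loop described above (suggest 20, select 10, run, update, repeat 10 times) can be sketched in a few lines of Python. This is only an illustrative outline, not the authors' code: the function names `suggest`, `select`, and `run_experiment`, the toy outcome rule, and the random stand-ins for the algorithm and the human choice are all hypothetical placeholders.

```python
import random

def run_team_exploration(initial_data, suggest, select, run_experiment,
                         n_iterations=10, n_suggested=20, n_selected=10):
    """Suggest candidates, let a human keep a subset, run them, update the data."""
    data = list(initial_data)  # list of (conditions, crystal_observed) pairs
    for _ in range(n_iterations):
        candidates = suggest(data, n_suggested)   # algorithm proposes experiments
        chosen = select(candidates, n_selected)   # human keeps some, rest discarded
        for conditions in chosen:
            data.append((conditions, run_experiment(conditions)))
    return data

# Hypothetical stand-ins so that the sketch runs end to end.
_rng = random.Random(0)

def random_suggest(data, n):
    # Placeholder for the model-based suggestion step.
    return [tuple(round(_rng.uniform(0.0, 10.0), 2) for _ in range(3)) for _ in range(n)]

def first_n_select(candidates, n):
    # Placeholder for the human experimenter's choice.
    return candidates[:n]

def toy_outcome(conditions):
    # Placeholder for the observed crystal / no-crystal result.
    return sum(conditions) > 15.0

initial = [((5.0, 5.0, 5.0), False)]
result = run_team_exploration(initial, random_suggest, first_n_select, toy_outcome)
print(len(result) - len(initial))  # 100 new experiments (10 iterations x 10 selected)
```

Passing the three steps in as functions keeps the loop itself independent of how suggestions are generated or chosen, so the same skeleton covers the algorithm-only, human-only, and team settings.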
Figure 4

Decision-making process of the exploration protocol for the collaboration between the algorithm and the human experimenter. Starting from an initial set of data, the algorithm evaluates these results and suggests 20 experiments. For the next stage, the human experimenters select 10 experiments to run in the platform. The other 10 experiments that are not selected are discarded. After the reaction is finished, we wait for the crystallization time, and the database of experiments is updated with the outcome of the reaction before starting again, giving a loop that is repeated 10 times.

At this point, we should mention that our previous experience with this chemical system allowed us to perform 10 experiments per day and wait overnight for crystallization of the product.[28] For the investigations described herein, we needed to modify our experimental procedure, as shown in Figure 4, in order to accommodate the intuition of the human experimenters (through their suggestions) as an additional factor in the decision making of the algorithm. Therefore, the algorithm was altered in order to produce a list of 20 suggested experiments, and the human experimenters were instructed to select 10 to be performed on the platform. We used 10-fold cross-validation to search for the best C and γ hyperparameters of the algorithm, where C is the regularization parameter and γ is the kernel coefficient of the radial basis function (ESI, part 6). To do this, we ran a cross-validation with all possible combinations of C and γ within the set (10^-5, 10^-4.5, 10^-4, ..., 10^4.5, 10^5) and selected the C and γ values producing the smallest classification error, that is, the most accurate model. In our case, these values are C = 100 and γ = 10^(-3/2), and they are the same ones as before[28] since both are extremely important in order to tune the model provided by the algorithm. 
If different values were used, the algorithm could perform entirely differently,[16] and we would be unable to directly compare the results across methods. For this work, we use scikit-learn,[30,31] a machine learning library built for Python.
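As a small worked illustration of the hyperparameter search described above, the grid of candidate values can be built with the standard library alone. The sketch assumes half-decade steps between 10^-5 and 10^5, as the listed values suggest; the variable names are illustrative.

```python
import math

# Exponents -5, -4.5, ..., 4.5, 5 (21 values in half-decade steps).
exponents = [-5 + 0.5 * i for i in range(21)]
grid = [10.0 ** e for e in exponents]

# The same candidate values are searched for both hyperparameters.
param_grid = {"C": grid, "gamma": grid}
n_combinations = len(param_grid["C"]) * len(param_grid["gamma"])
print(len(grid), n_combinations)  # 21 441

# The reported optimum (C = 100, gamma = 10^(-3/2)) lies on this grid.
c_on_grid = any(math.isclose(value, 100.0) for value in grid)
gamma_on_grid = any(math.isclose(value, 10.0 ** -1.5) for value in grid)
print(c_on_grid, gamma_on_grid)  # True True
```

With scikit-learn installed, and assuming an RBF-kernel support vector classifier (which the description of C and γ suggests but the text does not name explicitly), a dictionary like `param_grid` could be passed to `GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10)` to run the 10-fold search over all 441 combinations.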

Results and Discussion

The data from the experiments, unless presented as an average result over multiple runs, are represented as H1 and H2 for the human experimenters, A1 and A2 for the algorithm runs, R1 and R2 for the random search, and T1, T2, and T3 for the teams of human experimenters and algorithm. The results shown are after the end of the 100 experiments requested at the beginning of our study. The exploration performed by all methods is quantified by using metrics such as the evolution of the prediction accuracy, the similarity of experiments, and the volume exploration. A brief theoretical background behind these metrics is provided in the SI, parts 8.1–8.4.

Prediction Accuracy

The evolution of the prediction accuracy of each method trained on the data collected in each run can be seen in Figure 5. The quality of the prediction (i.e., the percentage of the time a crystal prediction is accurate) is expected to increase as more data are collected. The initial prediction quality based on the initial set of data provided to all methods is 66.5%. We can observe that the team was able to collect better quality data than the algorithm (75.6 ± 1.8% over 71.8 ± 0.3%) and improved its classification accuracy the most. Since we used 989 experimental points (ESI, part 8.5, Tables S5 and S6) in order to compute the prediction quality, this means that a 3.8% difference represents on average 38 additional experiments correctly predicted in our data set. This difference is quite substantial both in terms of our model and in machine learning terms.
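The figure of roughly 38 additional correct predictions follows directly from the numbers quoted above, as this short check shows (variable names are illustrative):

```python
n_points = 989               # experimental points used to score the models
team_accuracy = 0.756        # team of human experimenters and algorithm
algorithm_accuracy = 0.718   # algorithm alone

extra_correct = (team_accuracy - algorithm_accuracy) * n_points
print(round(extra_correct, 1))  # 37.6, i.e. about 38 experiments
```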
Figure 5

Average of the prediction accuracies for all methods with error bars as implemented by RandomForest. We can observe the higher variability in the error of the team in comparison to the other methods, which can be attributed to the different methodologies that the human experimenters followed for their calculations during the exploration of the chemical space (see ESI, part 8.3, Table S4 and Figure S24).

In light of Figure , we can observe that the interaction of the human intuition and the algorithm manages to increase the performance of the individual parts and achieve higher efficiencies than the algorithm by itself. As for the existence of the larger variability of the standard deviation for the teams, we can only assume at this stage that it is the result of the different methodologies from the human experimenters in their interaction with the algorithm (see ESI, part 9).

Similarity of Experiments

For this metric, we calculate how many other experiments lie within a specific radius R (we use R = 2) in the parameter space (see ESI, part 8.4, Figure S26). This count serves as a similarity measure between experiments: A large value indicates similar experiments, while a small value indicates more explored chemical space. In Figure 6, we plot this similarity metric as more experiments are performed. First, we note that in the initial set, 95% of the experiments leading to crystals are within a radius of R = 2 of each other in the chemical space.
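One possible reading of this metric is sketched below: for each crystal-forming experiment, compute the fraction of the other crystal-forming experiments that lie within radius R, then average over all of them. The exact definition is given in the ESI (part 8.4) and may differ; the function name and the sample points are hypothetical.

```python
import math

def mean_neighbor_ratio(points, radius=2.0):
    """Average fraction of other points lying within `radius` of each point."""
    n = len(points)
    if n < 2:
        return 0.0
    ratios = []
    for i, p in enumerate(points):
        close = sum(
            1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        ratios.append(close / (n - 1))
    return sum(ratios) / n

# Three nearby crystal-forming conditions (reagent volumes in mL, made up).
clustered = [(5.0, 5.0, 5.0), (5.5, 5.0, 5.0), (5.0, 6.0, 5.0)]
# Adding one distant crystal lowers the ratio, signalling wider exploration.
spread = clustered + [(9.0, 1.0, 2.0)]

print(round(mean_neighbor_ratio(clustered), 3))  # 1.0
print(round(mean_neighbor_ratio(spread), 3))     # 0.5
```

Under this reading, a ratio near 1 means the crystals found so far are all close neighbors, and a falling ratio means new crystals are being found farther away from the known ones.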
Figure 6

Similarity metric of the experiments plotted as a comparison of the average ratio of crystals found within a given distance of other crystals. The faster this ratio drops, the wider the exploration. The data are represented as H1 and H2 for the human experimenters, A1 and A2 for the algorithm runs, R1 and R2 for the random search, and T1, T2, and T3 for the teams of human experimenters and algorithm. Note how two different groups emerge from this data: The first group consists of H2, R1, and R2, and the second group consists of H1, A1, A2, T1, T2, and T3.

We can observe that T1, T2, and T3 reduce this ratio faster than any other method, indicating a wider exploration and thus fewer data points in the vicinity of each other. A similar dynamic can be observed in the algorithm runs A1 and A2, which follow the same trend of fast exploration as the teams. As for the random search (R1 and R2), there is no improvement in their performance. In the case of the human experimenters H1 and H2, we can observe two different behaviors: H1, who shares the same trend as the algorithm and the teams, and H2, who is closer to the random search (R1 and R2). We have previously demonstrated[29] that this broad difference between these runs can be attributed to conservative strategies of exploration, where small steps of exploration are performed, that can limit the information that we can obtain about the chemical landscape. At this stage, we can make a ranking of which individual run is better in exploring the chemical space: T2 > T1 > T3 > A1 > A2 > H1 ≫ H2 > R2 > R1. This ranking resonates with the observations we made with the previous metric in Figure 5 and is the first clear evidence that the collaboration of the algorithm and the human intuition can perform better than each of these two parts individually. 
In light of this metric, we hypothesize that the effect of the different strategies adopted by the human experimenters as well as their inherent biases can be mitigated with the collaboration of the machine learning methods in exploring the chemical space.

Volume Exploration

Considering the crystallization area as a proxy for the volume of the parameter space of the chemicals involved in the experiments, a valuable metric is to estimate how much of the crystallization volume has been explored by each method. Following the results from our experiments, we plotted the average explored volume as a function of the number of experiments performed, see Figure 7. For the volume calculation, the volume of the experiments leading to crystals was computed (ESI, part 8.2).
Figure 7

Average explored volume of the crystallization space by the four methods (algorithm, human experimenter, random, and team) along with their respective error bars. We can observe two areas of interest between algorithm and team: area A (from the beginning until experiment 50), where they follow a similar behavior, and area B (experiments 50–100), where they start deviating from each other and become distinct following a different path. The respective values of volume and standard deviation are presented in the SI, part 8.2, Table S3 and Figure S23.

The y-axis in the results of Figure 7 corresponds to a four-dimensional volume of all crystal points in the parameter space of the chemical reagents. Since each parameter is in mL units, the y-axis unit is strictly speaking mL^4, which has no intuitive meaning, and therefore the results should be interpreted relative to each other as arbitrary units (a.u.) and not as absolute values. The error bars of the standard deviation depict the effect that the different methodologies can have in collecting useful data for improving the calculated model of the chemical space in each iteration. The respective values of volume and standard deviation for each individual run are provided in Table S3. From the values in this table, we can observe that from the sixth run onward, the team substantially increases the amount of space it covers (from 1.08 × 10^-2 a.u. to 2.12 × 10^-2 a.u.), while the algorithm exhibits a relatively slower pace of exploration (from 0.91 × 10^-2 a.u. to 1.56 × 10^-2 a.u.). The difference between algorithm and human experimenters can be explained by the fact that the algorithm is agnostic to the chemical environment and untied to prior chemical knowledge. This way it can perform jumps in the chemical space straight into the believed boundaries between crystal and no-crystal. 
On the contrary, human experimenters have drastically varied strategies depending on personal perceptions and biases of the particular chemistry involved in the system under study. A noticeable feature of Figure 7 is that the collaboration of the human experimenter and an algorithm seems to lift this difference between the two, and the teamwork allows for more chemical space to be covered in the same amount of time. Furthermore, the teamwork manages to outperform the algorithm despite the differences of the exploring strategies followed by the human experimenters.
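To make the mL^4 unit and the relative comparison concrete, the sketch below computes a four-dimensional volume for a set of crystal points. It deliberately uses an axis-aligned bounding box as a simple stand-in; the actual volume calculation is described in the ESI (part 8.2) and is not reproduced here. The function name and the sample points are hypothetical.

```python
from math import prod

def bounding_box_volume(points):
    """Hypervolume of the axis-aligned box enclosing `points` (stand-in metric)."""
    if not points:
        return 0.0
    # zip(*points) groups the coordinates by axis; each axis contributes its range.
    return prod(max(axis) - min(axis) for axis in zip(*points))

# Made-up crystal-forming conditions: four reagent volumes in mL per experiment.
narrow = [(4.0, 5.0, 1.0, 2.0), (5.0, 6.0, 1.5, 2.5)]
wide = narrow + [(8.0, 9.0, 3.0, 4.0)]

print(bounding_box_volume(narrow))  # 0.25  (1 x 1 x 0.5 x 0.5, in mL^4)
print(bounding_box_volume(wide))    # 64.0  (4 x 4 x 2 x 2, in mL^4)
```

The absolute numbers carry no chemical meaning on their own; only the comparison between methods, or between iterations of the same method, is informative.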

Interaction between Human Experimenters and Algorithm

We also attempted to understand the interaction between the human experimenter and the algorithm at a deeper level by depicting this interaction as a two-dimensional (2D) contour plot of the experiments over the different generations, see Figure 8. We observe that T1 is primarily focused on the amount of reducing agent (NH2NH2·2HCl) and perchloric acid (HClO4). In terms of the use of the reducing agent, we can observe a direction toward areas of higher amounts (left graph). Although the experimenter reports the use of perchloric acid as important for their protocol (SI, part 9, Team 1), it is not as evident from this plot since the selection of perchloric acid appears to be evenly distributed (middle graph). Additionally, the amounts of perchloric acid that are used for the experiments remain constrained between 2.5 and 7.5 mL. Another feature that we notice is the decrease of the ratio of Na2MoO4·2H2O/Ce(NO3)3·6H2O during the study (right graph). Given the provided experimental protocol, we are not able to comment on whether this is a feature that was also taken into account from the beginning but was not described, or if it occurred because of the specific selection of the experimental variables (i.e., the reducing agent and the perchloric acid) as guides for the exploration.
Figure 8

2D plots representing the choice of the selected experiments during the different runs of the teams. The data from the different runs are plotted against the same surface, and the runs are distinguished by a color scheme: the first run is shown in dark blue and the last in dark red. For Team 1 (T1), we can observe the strong preference for experiments with an increased amount of reducing agent (NH2NH2·2HCl) over the course of the experiments (left and right graphs). For Team 2 (T2), the middle graph shows how increasing amounts of perchloric acid are selected during this study, as also reported in the experimental protocol. For Team 3 (T3), notice the similarity of behavior to T1.

In the case of T2, the reported key guides for the exploration are the amount of reducing agent and the ratio of Mo/Ce (see SI, part 9, Team 2). In Figure , we can observe a more widespread search in terms of the reducing agent (left and right graphs). Furthermore, we can observe a tendency to use more perchloric acid (middle graph), as also reported in their experimental protocol. As for the ratio of Mo/Ce, it seems to decrease while remaining in a region between 4 and 8 mL (middle and right graphs). Finally, T3 shows many similarities with T1 in the direction of the reagents, although the only reported experimental guide is the ratio of Mo/Ce (SI, part 9, Team 3). A reason behind the choice of these common variables by the human experimenters is that small amounts of perchloric acid will not provide a low enough pH to reduce the system, whereas excessive amounts of NH2NH2·2HCl will cause overreduction. On the other hand, a small ratio of Mo/Ce will cause a deficiency in Mo, and the wheel will not be able to form.
The trends that we can observe in the nonselected experiments of Figure mirror the reasoning of the human experimenters, as described in their protocols. By identifying patterns in the experimental data, they allow us to derive preferred directions in the experimental procedure as a whole and to identify the specific variables used for the exploration of the chemical space of {Mo120Ce6}. Nevertheless, it is not possible to directly unveil the trends that we observed in Figures –7, since Figure offers only a qualitative perspective of the data.

Conclusions

In our previous study[28] we hypothesized that the combination of machine learning and intuition could be the first step toward a new approach for exploring complex chemical systems. This work demonstrates the significant impact that collaboration between human and machine can have: working together achieves a significantly higher performance than either the algorithm or the human experimenter could achieve individually. The most important advantage of intuition is its ability to perform well even in areas of high uncertainty. One such area is the lack of high-quality data in chemistry, and this is the framework around which this work was developed. The computational power of a machine learning model can allow us to identify hidden patterns in the data, while human intuition can set the direction for the experiments. In this way, the inherent personal and chemical biases of the human experimenter can be mitigated, and more “adventurous” studies of large combinatorial spaces or nonlinear processes can be accomplished. We were able to observe and quantify the effects of the teamwork between human and machine, but not without many problems arising from the different ways in which experimental procedures are documented. This reinforces the pressing need to find a way to digitize this knowledge. We believe that machine learning methods should be viewed as tools to assist human experimenters rather than replace them, and these results provide a proof of concept of how this interaction can work. There is much more ground to cover in this area, but we feel that bringing together advanced machine learning with human intuition will be transformative and lead to new methodologies for exploring complex problems.
  25 in total

1.  Chemical intuition or chemical institution?

Authors:  Bruce C Gibb
Journal:  Nat Chem       Date:  2012-03-22       Impact factor: 24.427

Review 2.  Heuristic decision making.

Authors:  Gerd Gigerenzer; Wolfgang Gaissmaier
Journal:  Annu Rev Psychol       Date:  2011       Impact factor: 24.137

Review 3.  Computer-based de novo design of drug-like molecules.

Authors:  Gisbert Schneider; Uli Fechner
Journal:  Nat Rev Drug Discov       Date:  2005-08       Impact factor: 84.694

4.  Neural-Symbolic Machine Learning for Retrosynthesis and Reaction Prediction.

Authors:  Marwin H S Segler; Mark P Waller
Journal:  Chemistry       Date:  2017-02-22       Impact factor: 5.236

5.  How many variables can humans process?

Authors:  Graeme S Halford; Rosemary Baker; Julie E McCredden; John D Bain
Journal:  Psychol Sci       Date:  2005-01

Review 6.  Dual-processing accounts of reasoning, judgment, and social cognition.

Authors:  Jonathan St B T Evans
Journal:  Annu Rev Psychol       Date:  2008       Impact factor: 24.137

7.  Machine-learning-assisted materials discovery using failed experiments.

Authors:  Paul Raccuglia; Katherine C Elbert; Philip D F Adler; Casey Falk; Malia B Wenny; Aurelio Mollo; Matthias Zeller; Sorelle A Friedler; Joshua Schrier; Alexander J Norquist
Journal:  Nature       Date:  2016-05-05       Impact factor: 49.962

8.  The Chemistry Development Kit (CDK): an open-source Java library for Chemo- and Bioinformatics.

Authors:  Christoph Steinbeck; Yongquan Han; Stefan Kuhn; Oliver Horlacher; Edgar Luttmann; Egon Willighagen
Journal:  J Chem Inf Comput Sci       Date:  2003 Mar-Apr

9.  An autonomous organic reaction search engine for chemical reactivity.

Authors:  Vincenza Dragone; Victor Sans; Alon B Henson; Jaroslaw M Granda; Leroy Cronin
Journal:  Nat Commun       Date:  2017-06-09       Impact factor: 14.919

10.  Controlling an organic synthesis robot with machine learning to search for new reactivity.

Authors:  Jarosław M Granda; Liva Donina; Vincenza Dragone; De-Liang Long; Leroy Cronin
Journal:  Nature       Date:  2018-07-18       Impact factor: 49.962
