Olga Kononova, Haoyan Huo, Tanjin He, Ziqin Rong, Tiago Botari, Wenhao Sun, Vahe Tshitoyan, Gerbrand Ceder.
Abstract
Materials discovery has been significantly facilitated and accelerated by high-throughput ab-initio computations. This ability to rapidly design interesting novel compounds has shifted the materials innovation bottleneck to the development of synthesis routes for the desired material. As there is no fundamental theory for materials synthesis, one might attempt a data-driven approach for predicting inorganic materials synthesis, but this is impeded by the lack of a comprehensive database containing synthesis processes. To overcome this limitation, we have generated a dataset of "codified recipes" for solid-state synthesis automatically extracted from scientific publications. The dataset consists of 19,488 synthesis entries retrieved from 53,538 solid-state synthesis paragraphs by using text mining and natural language processing approaches. Every entry contains information about the target material, starting compounds, operations used and their conditions, as well as the balanced chemical equation of the synthesis reaction. The dataset is publicly available and can be used for data mining of various aspects of inorganic materials synthesis.
Year: 2019 PMID: 31615989 PMCID: PMC6794279 DOI: 10.1038/s41597-019-0224-1
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Schematic representation of the synthesis “recipe” extraction pipeline. Top panel: The pipeline starts with retrieval of HTML content from major publishers, which is then parsed into raw text. Next, paragraphs describing synthesis are identified and classified according to synthesis type. Every paragraph is then processed to extract a synthesis “recipe”, i.e. materials, operations and conditions. The output is stored in a database for further data mining. Bottom panel: Example of processing a synthesis paragraph into a “recipe”. The key components of a “recipe”, such as target and starting materials, synthesis steps and their conditions, are found and extracted from the paragraph by different text mining algorithms (see Methods).
Format of each data record: description, key label, data type.
| Data description | Data Key Label | Data Type |
|---|---|---|
| DOI of the original paper | doi | |
| Snippet of the raw text | paragraph_string | |
| Chemical equation | reaction | Object ( |
| - element_substitution: | ||
| - left_side: | ||
| - right_side: | ||
| Chemical equation in string format | reaction_string | |
| Target material data | target | Object ( |
| - material_string: | ||
| - material_formula: | ||
| - composition: | ||
| - additives: | ||
| - elements_vars: {var: | ||
| - amounts_vars: {var: | ||
| - oxygen_deficiency: | ||
| - mp_id: | ||
| List of target formulas obtained after variables substitution | targets_string | |
| Precursor materials data | precursors | |
| Sequence of synthesis steps and corresponding conditions | operations | |
| - token: | ||
| - type: | ||
| - conditions: Object | ||
| –heating_temperature: | ||
| –heating_time: | ||
| –heating_atmosphere: | ||
| –mixing_device: | ||
| –mixing_media: | ||
a: {amount: float, material: string}.
b: {formula: string, elements: {element: amount of element}, amount: string}.
c: {max_value: float, min_value: float, values: list of floats}.
d: {max_value: float, min_value: float, values: list of floats, units: string}.
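As an illustration of how a record with these keys might be consumed, here is a minimal Python sketch. The key names come from the table above, but every value in `record` is invented, and which footnote shape (a to d) belongs to which key is an assumption, because the Data Type column did not survive in this copy. The operation `type` strings and the helpers `side_to_string` and `max_heating_temperature` are hypothetical names, not part of the dataset.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record: key names follow the table above; every value is invented.
record: Dict[str, Any] = {
    "doi": "10.0000/hypothetical",
    "reaction_string": "BaCO3 + TiO2 = BaTiO3 + CO2",
    "reaction": {
        "element_substitution": {},
        # Assumption: each side is a list of footnote-a objects.
        "left_side": [
            {"amount": 1.0, "material": "BaCO3"},
            {"amount": 1.0, "material": "TiO2"},
        ],
        "right_side": [
            {"amount": 1.0, "material": "BaTiO3"},
            {"amount": 1.0, "material": "CO2"},
        ],
    },
    "operations": [
        # The "type" strings below are placeholders, not documented values.
        {"token": "mixed", "type": "mixing", "conditions": {"mixing_media": "ethanol"}},
        {
            "token": "calcined",
            "type": "heating",
            "conditions": {
                # Assumption: a list of footnote-d objects.
                "heating_temperature": [
                    {"min_value": 900.0, "max_value": 1100.0,
                     "values": [900.0, 1100.0], "units": "C"}
                ],
            },
        },
    ],
}


def side_to_string(side: List[Dict[str, Any]]) -> str:
    """Join {amount, material} objects into one side of an equation."""
    terms = []
    for item in side:
        # Coefficients equal to 1 are omitted, as in the reaction strings.
        coefficient = "" if item["amount"] == 1 else f"{item['amount']:g}"
        terms.append(coefficient + item["material"])
    return " + ".join(terms)


def max_heating_temperature(rec: Dict[str, Any]) -> Optional[float]:
    """Return the highest max_value over all operations, or None if absent."""
    peaks = [
        t["max_value"]
        for op in rec.get("operations", [])
        for t in (op.get("conditions") or {}).get("heating_temperature") or []
        if t.get("max_value") is not None
    ]
    return max(peaks) if peaks else None


reaction = record["reaction"]
rebuilt = f"{side_to_string(reaction['left_side'])} = {side_to_string(reaction['right_side'])}"
print(rebuilt)
print(max_heating_temperature(record))
```

The defensive `.get(...) or {}` pattern is a design choice for records in which a condition was not extracted; whether the real dataset uses missing keys, nulls, or empty lists in that case is not stated here.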
Performance of data extraction for dataset entries.
| Data attribute | Precision | Recall | F1 score |
|---|---|---|---|
| - targets | 0.97 | / | / |
| - precursors | 0.99 | 0.99 | 0.99 |
| Operations | 0.86 | 0.95 | 0.90 |
| - temperature | 0.85 | 0.87 | 0.86 |
| - time | 0.90 | 0.88 | 0.89 |
| - atmosphere | 0.89 | 0.86 | 0.87 |
| - mixing media | 0.62 | 0.66 | 0.64 |
| - mixing device | 0.82 | 0.55 | 0.66 |
| Balanced reactions | 0.95 | / | / |
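The F1 column can be reproduced from the precision and recall columns, assuming the standard definition of F1 as the harmonic mean of the two, F1 = 2PR / (P + R). The short Python sketch below recomputes it for every row that reports both values; the names `f1_score` and `rows` are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (attribute, precision, recall, reported F1) copied from the table above.
rows = [
    ("precursors", 0.99, 0.99, 0.99),
    ("operations", 0.86, 0.95, 0.90),
    ("temperature", 0.85, 0.87, 0.86),
    ("time", 0.90, 0.88, 0.89),
    ("atmosphere", 0.89, 0.86, 0.87),
    ("mixing media", 0.62, 0.66, 0.64),
    ("mixing device", 0.82, 0.55, 0.66),
]

for name, precision, recall, reported in rows:
    computed = round(f1_score(precision, recall), 2)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Rows marked "/" for recall (targets and balanced reactions) are left out, since F1 is undefined without a recall value.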
Ten most common targets, precursors and reactions present in the dataset.
| Targets | Precursors | Reactions |
|---|---|---|
| LiFePO4 | TiO2 | BaCO3 + TiO2 = BaTiO3 + CO2 |
| LiMn2O4 | SrCO3 | 3CuO + 4TiO2 + CaCO3 = CaCu3Ti4O12 + CO2 |
| BaTiO3 | BaCO3 | 0.5Bi2O3 + 0.5Fe2O3 = BiFeO3 |
| BiFeO3 | La2O3 | SrCO3 + TiO2 = SrTiO3 + CO2 |
| CaCu3Ti4O12 | CaCO3 | 2Li2CO3 + 5TiO2 = Li4Ti5O12 + 2CO2 |
| SrTiO3 | Bi2O3 | TiO2 + CaCO3 = CaTiO3 + CO2 |
| Li4Ti5O12 | Fe2O3 | Nb2O5 + ZnO = ZnNb2O6 |
| Y3Al5O12 | Nb2O5 | 6Fe2O3 + BaCO3 = BaFe12O19 + CO2 |
| CaTiO3 | Li2CO3 | Li2CO3 + TiO2 = Li2TiO3 + CO2 |
| LiNi0.5Mn1.5O4 | Na2CO3 | 0.5Li2CO3 + 0.333Co3O4 + 0.083O2 = LiCoO2 + 0.5CO2 |
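The reactions in the right-hand column can be checked for element balance with a few lines of Python. The sketch below is not the balancing procedure used to build the dataset; it is a simple verifier written for the plain formulas in this table (no parentheses, hydrates or charges), and it uses a tolerance because coefficients such as 0.333 and 0.083 are rounded. The names `parse_formula`, `side_counts` and `is_balanced` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict

# Optional leading coefficient, then the formula, e.g. "0.5Bi2O3".
TERM = re.compile(r"^(\d+(?:\.\d+)?)?([A-Z].*)$")
# Element symbol followed by an optional (possibly fractional) subscript.
ELEMENT = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Count atoms in a flat formula such as 'LiNi0.5Mn1.5O4'."""
    counts: Dict[str, float] = defaultdict(float)
    for symbol, subscript in ELEMENT.findall(formula):
        counts[symbol] += float(subscript) if subscript else 1.0
    return dict(counts)


def side_counts(side: str) -> Dict[str, float]:
    """Sum atom counts over all '+'-separated terms of one reaction side."""
    totals: Dict[str, float] = defaultdict(float)
    for term in side.split("+"):
        match = TERM.match(term.strip())
        if match is None:
            raise ValueError(f"cannot parse term: {term!r}")
        coefficient = float(match.group(1)) if match.group(1) else 1.0
        for symbol, n in parse_formula(match.group(2)).items():
            totals[symbol] += coefficient * n
    return dict(totals)


def is_balanced(reaction: str, tol: float = 0.01) -> bool:
    """True if every element count agrees on both sides within tol."""
    left, right = (side_counts(s) for s in reaction.split("="))
    elements = set(left) | set(right)
    return all(abs(left.get(e, 0.0) - right.get(e, 0.0)) <= tol for e in elements)


print(is_balanced("2Li2CO3 + 5TiO2 = Li4Ti5O12 + 2CO2"))
print(is_balanced("0.5Li2CO3 + 0.333Co3O4 + 0.083O2 = LiCoO2 + 0.5CO2"))
```

A tolerance of 0.01 is an arbitrary choice that accommodates the three-decimal rounding in the last reaction; a stricter check would need exact fractions.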
Fig. 2 Map of chemical space covered by the dataset. For each element, the frame colored in a yellow-to-green gradient represents the total number of reactions that produce a target compound containing the element. The bar graph below each element shows the list of ions paired with the element in precursor compounds. The length of each bar corresponds to the firing temperature averaged over all the reactions using the given precursor (i.e. element + counter-ion). Elements occurring in five or fewer targets are faded in grey. “Ac” stands for the acetate radical CH3COO− in the compound formula.
Fig. 3 Correspondence between the choice of synthesis route and precursor counter-ions. The top table gives examples of the four synthesis types defined: one-step synthesis, solution-based synthesis, synthesis with intermediate heating steps, and synthesis including grinding of precursors in liquid media. The pie chart on the right displays the fraction of each synthesis route in the dataset. The donut charts represent the fractions of the four synthesis routes (given in the table) for each counter-ion used in precursors. “Ac” stands for the acetate radical CH3COO− in the compound formula. “Org” stands for an organic radical (–CH–) in the compound formula.
Fig. 4 Graphical representation of dataset entries queried for the Li-Mn-O system. Examples of the subset entries: target LMO material, synthesis reaction and route. The DOIs are provided for reference. The triangle shows the distribution of the LMO materials on the phase diagram. The size and color of the circles are scaled according to the number of reactions in the dataset with the given target material.
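A query of this kind might look like the following Python sketch, which keeps entries whose target contains only elements from a chosen chemical system and counts reactions per target formula (the quantity that Fig. 4 encodes as circle size and color). The records are invented, and the assumption that `target.composition` is a list of footnote-b objects with an `elements` mapping is not confirmed by the table; `target_elements` and `in_system` are hypothetical helper names.

```python
from collections import Counter
from typing import Any, Dict, Set


def make_record(formula: str, elements: Dict[str, str]) -> Dict[str, Any]:
    """Build a tiny invented record with only the keys this sketch needs."""
    return {
        "target": {
            "material_formula": formula,
            # Assumption: a list of {formula, elements, amount} objects.
            "composition": [{"formula": formula, "elements": elements, "amount": "1"}],
        }
    }


# Invented sample data, not taken from the dataset.
records = [
    make_record("LiMn2O4", {"Li": "1", "Mn": "2", "O": "4"}),
    make_record("LiMn2O4", {"Li": "1", "Mn": "2", "O": "4"}),
    make_record("Li2MnO3", {"Li": "2", "Mn": "1", "O": "3"}),
    make_record("LiFePO4", {"Li": "1", "Fe": "1", "P": "1", "O": "4"}),
    make_record("LiNi0.5Mn1.5O4", {"Li": "1", "Ni": "0.5", "Mn": "1.5", "O": "4"}),
]


def target_elements(rec: Dict[str, Any]) -> Set[str]:
    """Collect every element named in the target's composition parts."""
    found: Set[str] = set()
    for part in rec.get("target", {}).get("composition", []):
        found.update(part.get("elements", {}))
    return found


def in_system(rec: Dict[str, Any], system: Set[str]) -> bool:
    """True if the target has elements and all of them belong to the system."""
    elements = target_elements(rec)
    return bool(elements) and elements <= system


li_mn_o = {"Li", "Mn", "O"}
counts = Counter(
    rec["target"]["material_formula"] for rec in records if in_system(rec, li_mn_o)
)
print(counts.most_common())
```

Using a subset test rather than equality also admits binary targets such as manganese oxides; requiring all three elements would be an equally defensible reading of "the Li-Mn-O system".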
| Measurement(s) | solid-state synthesis data |
| Technology Type(s) | natural language processing |