Olivia P Pfeiffer, Haihao Liu, Luca Montanelli, Marat I Latypov, Fatih G Sen, Vishwanath Hegadekatte, Elsa A Olivetti, Eric R Homer.
Abstract
Researchers continue to explore and develop aluminum alloys with new compositions and improved performance characteristics. An understanding of the current design space can help accelerate the discovery of new alloys. We present two datasets: 1) chemical composition, and 2) mechanical properties for predominantly wrought aluminum alloys. The first dataset contains 14,884 entries on aluminum alloy compositions extracted from academic literature and US patents using text processing techniques, including 550 wrought aluminum alloys which are already registered with the Aluminum Association. The second dataset contains 1,278 entries on mechanical properties for aluminum alloys, where each entry is associated with a particular wrought series designation, extracted from tables in academic literature.
Year: 2022 PMID: 35354831 PMCID: PMC8967828 DOI: 10.1038/s41597-022-01215-7
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Overview of the methodology used to extract information from available literature and create useful visualizations of the aluminum alloy compositional and property spaces. The Aluminum Association is abbreviated as AA.
Number of unique paper DOIs or patent publication numbers contributing to each dataset after all cleaning.
| | Composition Dataset | Property Dataset |
|---|---|---|
| Article Database | 5,172 | — |
| Table Database | 2,882 | 349 |
| Patent Database | 310 | — |
Details of the attributes and values contained in the composition dataset.
| Attribute | Value Datatype | Description | Applicability by ‘source’ |
|---|---|---|---|
| source | String (class) | The original source of composition information, one of: (full text, table, named, patent) | — |
| ft_doi_list | String (list of) | Full text DOI list: List containing all DOIs associated with a given composition | full text |
| table_doi | String | DOI of table’s journal article | table |
| name | String | Determined by source: (named: Four-digit identifier code designated by AA; table: Original source table row name; patent: Patent publication number) | named, table, patent |
| table_extr_AA_des | Integer | Table-extracted AA designation: AA designation code (extracted from original source table row name or table caption via text matching digits of format 'XXXX') | table |
| comp_rule_based_series | Integer (class) | Composition rule-based series: Aluminum alloy series, assigned by applying a set of rules (based on Table | all |
| <element> | Decimal | Percent weight of this <element> within the Al alloy | all |
The csv file contains 6 descriptive attributes (columns) in addition to the element composition columns indicating the weight percent within the alloy.
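As a rough illustration of the layout described above, the following sketch reads composition rows and separates the six descriptive attributes from the element columns. It uses only Python's standard library. The sample rows, the column order, the placeholder DOI, and the convention that empty cells mean "not reported" are assumptions made for this example and are not taken from the published file.

```python
import csv
import io

# The six descriptive attributes listed in the table above.
DESCRIPTIVE = [
    "source", "ft_doi_list", "table_doi", "name",
    "table_extr_AA_des", "comp_rule_based_series",
]

# Made-up sample rows for illustration only; NOT taken from the dataset.
SAMPLE_CSV = """source,ft_doi_list,table_doi,name,table_extr_AA_des,comp_rule_based_series,Cu,Mg,Si
named,,,2024,,2000,4.4,1.5,0.5
table,,10.0000/example,Alloy A,6061,6000,0.3,1.0,0.6
"""

def split_row(row):
    """Return (descriptive fields, element weight percents) for one row."""
    meta = {key: row[key] for key in DESCRIPTIVE}
    # Every remaining column is treated as an element; empty cells are skipped.
    comp = {
        key: float(value)
        for key, value in row.items()
        if key not in DESCRIPTIVE and value != ""
    }
    return meta, comp

rows = [split_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for meta, comp in rows:
    print(meta["source"], meta["name"], comp)
```

To read the actual file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an open file handle; the real header names should be checked against the file before relying on `DESCRIPTIVE`.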
Description of wrought series composition.
| Series | Description (principal alloying element) |
|---|---|
| 1000 | Pure (99.0% or more aluminum) |
| 2000 | Copper |
| 3000 | Manganese |
| 4000 | Silicon |
| 5000 | Magnesium |
| 6000 | Magnesium and Silicon |
| 7000 | Zinc |
| 8000 | Other |
Wrought aluminum alloys are grouped into eight series, which are defined by the primary alloying element in the alloy, as shown in this table.
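The mapping from principal alloying element to series can be sketched in a few lines. The function below is a simplified, hypothetical rule written for illustration; it is not the rule set the authors used for `comp_rule_based_series`. In particular, the 1.0 wt% cutoff for the 1000 series assumes that every non-aluminum element is listed, and the "at least half as large" test for the 6000 series is an arbitrary assumption.

```python
# Principal alloying element -> series, as listed in the table above.
SERIES_BY_ELEMENT = {"Cu": 2000, "Mn": 3000, "Si": 4000, "Mg": 5000, "Zn": 7000}

def guess_series(comp):
    """Guess a wrought series from element weight percents.

    Simplified, hypothetical rule; NOT the authors' rule set.
    """
    alloying = {el: wt for el, wt in comp.items() if el != "Al" and wt > 0}
    if sum(alloying.values()) <= 1.0:
        return 1000  # 99.0% or more aluminum
    principal = max(alloying, key=alloying.get)
    # Assumption: Mg and Si count as joint principal elements when one of
    # them is the largest addition and the other is at least half as large.
    mg, si = alloying.get("Mg", 0.0), alloying.get("Si", 0.0)
    if principal in ("Mg", "Si") and min(mg, si) >= 0.5 * max(mg, si):
        return 6000
    # Any other principal element falls into the "Other" series.
    return SERIES_BY_ELEMENT.get(principal, 8000)

print(guess_series({"Zn": 5.6, "Mg": 2.5, "Cu": 1.6}))
print(guess_series({"Mg": 1.0, "Si": 0.6}))
```

A production rule set would need explicit thresholds per series and handling for ties; this sketch only shows the shape of such a classifier.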
Details of the attributes and values contained in the property dataset.
| Attribute | Value Datatype | Description | Notes |
|---|---|---|---|
| doi | String | Digital Object Identifier of the journal article | |
| name | String | Original table row name | |
| series | Integer (class) | Aluminum alloy series designation, one of: 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000 (see Table | The ‘series’ value is first based on the alloy composition associated with the same ‘doi’. It is then manually cleaned following validation processing. |
| caption | String | Original table caption | |
| table_extr_AA_des | Integer | AA designation code | Extracted from original source row name or table caption via text matching where available, otherwise, empty. |
| YS | Decimal | Yield strength (MPa) | When available |
| UTS | Decimal | Ultimate tensile strength (MPa) | When available |
| temper | String | Temper designation | When available |
| elong | Decimal | Percent elongation | When available |
| flag | True/False | Alloy undergoes special processing | |
| flag_note | String | Reason for flag | |
The csv file contains 11 attributes (columns), which are described here along with the datatypes of the column values.
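Because YS, UTS, temper, and elong are only present "when available", any analysis of the property dataset has to tolerate empty cells. The sketch below groups yield strength by series while skipping missing values and flagged rows. The sample rows are made up, only a subset of the 11 columns is shown, and the encoding of `flag` as the strings `True`/`False` is an assumption about the file.

```python
import csv
import io
from collections import defaultdict
from statistics import median

# Made-up rows for illustration; only a subset of the 11 columns is shown.
SAMPLE_CSV = """doi,name,series,YS,UTS,elong,flag
10.0000/example-a,Alloy A,6000,275,310,12,False
10.0000/example-a,Alloy B,6000,,290,14,False
10.0000/example-b,Alloy C,7000,500,570,11,True
10.0000/example-b,Alloy D,7000,460,,,False
"""

def to_float(cell):
    """Convert a CSV cell to float, mapping empty cells to None."""
    return float(cell) if cell.strip() else None

ys_by_series = defaultdict(list)
for row in csv.DictReader(io.StringIO(SAMPLE_CSV)):
    ys = to_float(row["YS"])
    # Skip rows without a yield strength and rows flagged for special processing.
    if ys is None or row["flag"] == "True":
        continue
    ys_by_series[int(row["series"])].append(ys)

# Map each series to (number of usable rows, median yield strength in MPa).
summary = {s: (len(v), median(v)) for s, v in ys_by_series.items()}
print(summary)
```

Whether flagged rows should be excluded depends on the analysis; they are dropped here only to show how the `flag` column might be used.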
Fig. 2. Validation of composition information via dimension reduction. This scatter plot shows a 2D projection of the high-dimensional composition space for aluminum alloys, obtained via t-distributed stochastic neighbor embedding (t-SNE). The shape of each point indicates the source type of the alloy composition: alloys registered with AA are diamonds, alloys from Journal Texts are vertical line segments, alloys from Journal Tables are horizontal line segments, and alloys from Patents are dots. The color of each point indicates key alloy composition information: for Registered Alloys, color corresponds to the alloy series (1000 is black, 2000 is red, 3000 is orange, 4000 is green, 5000 is purple, 6000 is pink, 7000 is brown, 8000 is yellow); for all other source types, color corresponds to the principal alloying element (Cu is red, Mn is orange, Si is green, Mg is purple, Zn is brown, Cr is blue, Fe is turquoise, Ti is grey). Coloring is consistent with the series definitions (e.g., the 2000 series is primarily alloyed with Cu, thus both are red).
Fig. 3. Verification of yield strength values. The swarm plot shows the alloy yield strengths extracted from journal article tables, grouped by the alloy’s series. The shaded regions define upper and lower yield strength bounds for each series (not available for the 4000 series), as provided by the educational software tool Ansys Granta Edupack, and they serve as validation for the points extracted from the literature.
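The range check behind this kind of verification can be sketched as follows. The helper name `outside_bounds` is hypothetical, and the numeric ranges are placeholders chosen for the example; they are not the values provided by Ansys Granta Edupack. Series with no reference range (as is the case for the 4000 series in the figure) are skipped rather than flagged.

```python
def outside_bounds(points, bounds):
    """Return the (series, ys) pairs that fall outside their series range.

    points: iterable of (series, yield_strength_mpa)
    bounds: dict mapping series -> (lower_mpa, upper_mpa)
    Series without a reference range are skipped rather than flagged.
    """
    outliers = []
    for series, ys in points:
        if series not in bounds:
            continue
        lower, upper = bounds[series]
        if not lower <= ys <= upper:
            outliers.append((series, ys))
    return outliers

# Placeholder ranges for illustration, NOT values from Ansys Granta Edupack.
PLACEHOLDER_BOUNDS = {6000: (50.0, 400.0), 7000: (100.0, 650.0)}
# Made-up extracted points.
points = [(6000, 275.0), (6000, 900.0), (4000, 120.0), (7000, 40.0)]
print(outside_bounds(points, PLACEHOLDER_BOUNDS))
```

Points reported by such a check are candidates for manual review, not necessarily extraction errors.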
Fig. 4. Verification of elongation and yield strength values for 5000, 6000, and 7000 series alloys. The shaded regions define bounding ellipses for each series, as provided by the educational software tool Ansys Granta Edupack, and serve as validation for the points extracted from the literature.
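Testing whether an (elongation, yield strength) point lies inside a bounding ellipse reduces to a standard geometric check, sketched below. The ellipse parameterization (center, semi-axes, rotation angle) and all numbers are assumptions for illustration; the source does not state how the ellipses are defined, nor whether they are drawn on linear or logarithmic axes.

```python
import math

def in_ellipse(x, y, cx, cy, a, b, angle_deg=0.0):
    """Return True if (x, y) lies inside or on a rotated ellipse.

    (cx, cy): ellipse center; a, b: semi-axis lengths along the ellipse's
    own axes; angle_deg: counterclockwise rotation of the ellipse.
    """
    t = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    # Rotate the point into the ellipse's own coordinate frame.
    u = dx * math.cos(t) + dy * math.sin(t)
    v = -dx * math.sin(t) + dy * math.cos(t)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0

# Made-up ellipse: x is yield strength (MPa), y is elongation (%).
print(in_ellipse(300.0, 10.0, cx=300.0, cy=10.0, a=100.0, b=5.0))
print(in_ellipse(450.0, 10.0, cx=300.0, cy=10.0, a=100.0, b=5.0))
```

If the reference ellipses are defined on logarithmic axes, the same function applies after taking logarithms of both the point and the ellipse parameters.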
| Measurement(s) | chemical composition of aluminum alloys • mechanical properties of aluminum alloys |
| Technology Type(s) | natural language processing |