
Machine Learning Approach to Community Detection in a High-Entropy Alloy Interaction Network.

Raheleh Ghouchan Nezhad Noor Nia1, Mehrdad Jalali1,2, Matthias Mail3,4, Yulia Ivanisenko3, Christian Kübel3,4,5.   

Abstract

There is a growing trend toward using interaction network methods and algorithms, including community detection methods, in various fields of science. The approach is already used in many applications, for example, in the social sciences and health informatics to analyze behavioral patterns during the COVID-19 pandemic, in protein-protein networks in the biological sciences, and in agricultural science and economics. This paper builds interaction networks based on high-entropy alloy (HEA) descriptors in order to discover HEA communities with similar functionality. In addition, these communities could be leveraged to discover new alloys not yet included in the data set without any experimental laboratory effort. This research uses two community detection algorithms, the Louvain algorithm and an enhanced particle swarm optimization (PSO) algorithm. The data set used in this paper includes 90 HEAs and 6 descriptors. The results reveal 13 alloy communities, and the quality of the communities is evaluated with the modularity. The experimental results show that the PSO-based community detection algorithm achieves alloy communities with an average accuracy improvement of 0.26 compared to the Louvain algorithm. Furthermore, some characteristics of HEAs, for example, their phase composition, could be predicted from the extracted communities; the proposed method predicted the HEA phase composition with about 93% precision.
© 2022 The Authors. Published by American Chemical Society.


Year:  2022        PMID: 35474778      PMCID: PMC9026177          DOI: 10.1021/acsomega.2c00317

Source DB:  PubMed          Journal:  ACS Omega        ISSN: 2470-1343


Introduction

Since ancient times, human civilization has attempted to discover new and unknown materials, for example, metals and alloys, which play a key role in the quality of human life. Since the Bronze Age, alloys have been produced following a “basic element” pattern containing one principal element, to which various elements are added to improve selected properties.[1] Over the past decades, a new approach to alloy design has been introduced, which involves mixing typically five or more elements in equimolar amounts to produce balanced alloys called high-entropy alloys (HEAs).[1] These alloys were initially introduced and developed by Cantor et al.[2] and Yeh et al.[3] The entropy of mixing of these complex alloys is high, and the constituent atoms have similar sizes.[6] HEAs have been widely investigated because of their attractive properties, for example, thermal and electrical conductivity,[4] high corrosion resistance,[5] and high strength combined with high ductility. Parametric methods are commonly used to understand and predict phase stability; the parameters are often used in pairs and presented in two-dimensional diagrams.[17] Although there are many parametric and statistical methods in materials science, machine learning (ML) is considered one of the most effective.[7] ML algorithms can learn models that explore communities and provide results efficiently. The purpose of the present study is to introduce a new model for the HEA interaction network, which is built from HEA descriptors. This model measures similarities among HEA descriptors and creates a network of interactions based on these similarities. Communities are then extracted from the interaction network so that each community comprises similar HEA compounds, and these communities can be interpreted.
The outcomes of this paper are communities that can help anticipate HEA phases and detect HEA functionalities by making the alloys easier to analyze. With these communities, it might be possible to suggest more efficient alloys for selected applications.

Review of Literature

HEAs, which contain at least five elements in equal or similar atomic percentages, can exhibit high strength.[32] These alloys differ from conventional alloys through four core effects: (1) the high-entropy effect, (2) the sluggish diffusion effect, (3) the lattice distortion effect, and (4) the cocktail effect.[32] These effects contribute to the strength and hardness of HEAs.[32] Ye et al.[1] discussed the phase formation of HEAs and their properties, such as strength, mechanical performance at high and cryogenic temperatures, ductility, hardness, magnetism, and electrical conductivity. Applying ML to HEAs can facilitate alloy design and help discover new compounds with desirable properties. Dai et al.[8] introduced a method that creates a low-dimensional descriptor for predicting the phase content of HEAs from their composition. Their method has several stages: first, a correlation coefficient analysis is used to select highly relevant descriptors; then, the dimensionality is increased by constructing new features from primary functions; and finally, the descriptors that best explain the material are selected.[8] The main focus is on predicting the alloy phase composition. Highly correlated descriptors are removed using the Pearson correlation, and a new nonlinear descriptor is created that analyzes the relationships between descriptors to eliminate redundant ones.[8] In another study, the authors proposed a framework for collecting HEA data for interpretation.[27] Features that cause weak phase predictions are removed, and various ML classifiers are used to predict the HEA phases; however, no HEA interaction network is created to extract similar compounds as a community.[27] Kaufmann and Vecchio[21] applied an ML method to HEAs to predict their solid solution forming ability.
A random forest model uses thermodynamic and chemical features to make the predictions.[21] However, the HEA descriptors are not used to compute a hybrid similarity for constructing an HEA interaction network that identifies similar compounds.[21] In another study, the authors presented a framework that uses a genetic algorithm to choose an effective ML model and feature set for HEA phase prediction.[22] The proposed technique does not consider content and structural similarity to build an HEA interaction network,[22] and similar HEA compositions are not detected as a cluster.[22] To the best of our knowledge, a full interaction network of HEAs has not yet been created and analyzed with social network tools. One such tool is community detection. Communities are groups of compounds that can support the discovery of functionality, new compounds, and new descriptors. Most studies address phase detection of alloys with ML methods that predict the phase composition of a single compound; HEA communities can be extracted to predict phases more accurately. In addition, each community consists of HEAs with similar descriptors and behavior. M’Barek et al.[26] explored biological interaction networks, such as gene or protein networks. The communities extracted from these networks are sets of proteins or genes that collaborate on a similar cellular function. Using a genetic algorithm, they presented a specific fitness function based on the amount of similarity and interaction between genes.[26] They used semantic similarity on a KEGG data set as a score for the structure of communities.
The semantic similarity is calculated with a method based on the Gene Ontology.[26] In another study, the authors presented a modified DeepWalk algorithm that predicts links in a protein-protein interaction network.[31] The feature dimensions are reduced to integrate the network structure and features for link prediction;[31] however, no HEA network is built from descriptors to detect communities of similar nodes.[31] In another study, the authors proposed a new tool named MOFSocialNet that builds social networks from a metal–organic framework (MOF) database. MOFSocialNet can guide MOF researchers through the vast chemical space of existing and hypothetical MOFs. As a demonstration, they used social network analysis to identify the most representative MOFs in their data set and to detect MOF communities.[33] Ahajjam et al.[25] proposed a scalable and deterministic approach, called the community leader recognition approach, that identifies communities using leader nodes. Their approach has two main steps: first, the leaders are retrieved, and second, the communities are identified using the similarities between nodes. Two important issues in their work are community recognition and leader detection in complex networks.[25] The leader nodes are responsible for disseminating influence in the network; the communities are then formed around each leader using the similarities between nodes. In social networks, the central nodes are responsible for spreading influence. The advantage of this method is that no prior knowledge of the number of leaders or communities is needed. They start by finding leaders, that is, the most influential nodes, and then extract the communities.
For each leader, a community is obtained by calculating the similarity between the nodes.[25] Communities are distinguished by the similarity of the nodes in the leader’s neighborhood to the leader. They used real social network data sets and compared the Jaccard, Salton, Hub Depressed Index (HDI), and Hub Promoted Index (HPI) similarity measures to find which works best for the leader-finding step of their method.[25] Zhao et al.[28] introduced a community detection algorithm based on graph compression that is effective for large networks. A compressed graph is first obtained by repeatedly merging nodes of degree one or two into their higher-degree neighbors. Then, two indexes, namely, the density and the quality of nodes, are defined to evaluate how likely each node is to seed a community. With these two criteria, the number of communities and their initial members are determined on the compressed social network.[28] They evaluated their method on real social network data sets. However, no similarity between nodes is computed to create an HEA interaction network. Rostami et al.[30] presented a feature selection method based on community detection and a genetic algorithm, which has three steps. It first calculates the similarities between features, then groups the features into clusters with a community detection algorithm, and finally selects features with a genetic algorithm that uses a new community-based repair operation. The community detection step divides the features into different groups, and feature similarity is measured with the Pearson correlation.[30] Clustering is performed on the features, and a threshold determines the number of features to keep from each cluster, which is reached using random repair or score repair.
Their method selects the optimal number of features automatically, based on the overall feature structure and the internal similarities of the features.[30] They reported accurate results with community detection combined with the genetic algorithm.[30] Ozaki et al.[20] added a pruning method to the Louvain algorithm to reduce computation time while maintaining the quality of community detection. The method simplifies the process because the cluster-quality calculations are not performed for all nodes in every phase, but only for the nodes that are used in the next phase of community detection. Ozaki et al.[20] applied the Louvain algorithm to networks in which the similarity between nodes is calculated with the cosine similarity.[20] The novelty of this paper lies in establishing an interaction network for HEAs, which has not been done in alloy metallurgy so far. In addition, interaction network analysis is used to analyze the HEAs: ML algorithms, namely, the Louvain algorithm and the particle swarm optimization algorithm, are applied for alloy community detection.

Proposed Method

In this section, we present our approach for community detection in interaction networks of HEAs, using the Louvain and modified particle swarm optimization (PSO) algorithms. The nodes of the network are HEAs and the edges represent their similarity; both the Louvain method and the modified PSO group the nodes into communities and search for the best number of communities. Our method solves the community detection problem by maximizing an objective function called modularity. First, descriptors for the HEAs are selected and preprocessed. Then, communities are extracted by the Louvain and modified PSO algorithms. Our approach consists of the following five steps:
1. The data set is preprocessed for the ML algorithms.
2. The descriptor content similarity is calculated.
3. The interaction network of HEAs is created.
4. The descriptor structural similarity is calculated.
5. The communities that maximize the objective function are extracted.
The data set used in this paper contains 90 HEAs, as listed in Table in ref (1), each characterized by six descriptors (Appendix A).[1] δ is the atomic size difference.[9] ΔHmix (in kJ/mol) is the mixing enthalpy, calculated using eq .[1] Sc (in kB/atom, where kB is Boltzmann’s constant) is the configurational entropy of mixing for an ideal solid solution.[1] φ is a single dimensionless thermodynamic parameter for designing HEAs.[6] εRMS is the root-mean-square residual strain, which is related to the elastic strain energy stored in the lattice.[10] VEC is the valence electron concentration, which is considered an important parameter for phase selection in the alloys.[1] Note that the class label of the data, the phase, is not used in these calculations; the main challenge in this article is to find the relationships among similar alloys.
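Two of the six descriptors follow directly from the composition. The Python sketch below assumes the standard definitions Sc = −Σ c_i ln c_i (in kB per atom) and VEC = Σ c_i VEC_i, where c_i is the mole fraction of element i; the elemental VEC values and the helper names (`parse_formula`, `config_entropy`, `valence_electron_concentration`) are illustrative assumptions, not the authors' code. δ, ΔHmix, φ, and εRMS are not covered because they need tabulated atomic radii and pairwise mixing enthalpies.

```python
import math
import re

# Assumed, commonly tabulated valence electron counts for the elements in the example.
VEC_TABLE = {"Al": 3, "Co": 9, "Cr": 6, "Cu": 11, "Fe": 8, "Mn": 7, "Ni": 10, "Ti": 4}


def parse_formula(formula: str) -> dict:
    """Parse a composition such as 'Al0.3CoCrFeNi' into molar ratios {'Al': 0.3, 'Co': 1.0, ...}."""
    ratios = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        ratios[element] = ratios.get(element, 0.0) + (float(amount) if amount else 1.0)
    return ratios


def mole_fractions(formula: str) -> dict:
    """Normalize the molar ratios so that the fractions c_i sum to 1."""
    ratios = parse_formula(formula)
    total = sum(ratios.values())
    return {element: r / total for element, r in ratios.items()}


def config_entropy(formula: str) -> float:
    """Ideal configurational entropy of mixing, Sc = -sum(c_i ln c_i), in units of kB per atom."""
    return -sum(c * math.log(c) for c in mole_fractions(formula).values())


def valence_electron_concentration(formula: str) -> float:
    """VEC = sum(c_i * VEC_i), using the assumed elemental values in VEC_TABLE."""
    return sum(c * VEC_TABLE[el] for el, c in mole_fractions(formula).items())


for composition in ["CoCrFeNi", "Al0.3CoCrFeNi"]:
    print(composition,
          round(config_entropy(composition), 2),
          round(valence_electron_concentration(composition), 2))
```

Rounded to two decimals, the printed values agree with Sc and VEC for compounds 4 and 2 in Table A1 (1.39 and 8.25 for CoCrFeNi; 1.54 and 7.88 for Al0.3CoCrFeNi), which is a useful sanity check on the data set.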
The six descriptors and a portion of the data set are listed in Table in Appendix A. Its first column gives the number of each HEA composition, as used in the results in Section . The second column in Table gives the HEA chemical composition, and the remaining six columns give the values of the six descriptors for each composition.
Table 1

Comparison of Quality of Community Detection and Extraction Using Modularity Criteria with the Developed Louvain Algorithm and the Particle Swarm Optimization Algorithm

community detection algorithm | modularity criterion
developed Louvain algorithm | 0.7130
PSO algorithm | 0.8912
Table A1

Precision of Louvain and PSO Algorithms in the Phase Prediction of HEA Alloys Indicates That Communities Can Improve the Phase Prediction Precision

compound number | HEA alloy composition | δ (%) | ΔHmix (kJ/mol) | Sc (kB/atom) | φ | εRMS | VEC
1 | Al0.5CoCrCuFeNiTi0.2 | 4.93 | −4.15 | 1.86 | 17.41 | 0.0487 | 8.12
2 | Al0.3CoCrFeNi | 3.76 | −7.27 | 1.54 | 19.99 | 0.037 | 7.88
3 | Al0.5CrCuFeNi2 | 4.2 | −2.51 | 1.52 | 20.44 | 0.0414 | 8.45
4 | CoCrFeNi | 0.3 | −3.75 | 1.39 | 3583.31 | 0.0039 | 8.25
5 | CoFeMnNi | 3.55 | −4 | 1.39 | 24.51 | 0.0353 | 8.5
6 | CoCrMnNi | 3.45 | −5.5 | 1.39 | 23.99 | 0.0343 | 8
7 | CoCrFeNiPd | 4.46 | −7.04 | 1.61 | 15.95 | 0.0446 | 8.8
8 | CoCrCu0.5FeNi | 0.84 | 0.49 | 1.58 | 627.5 | 0.0083 | 8.56
9 | CuNiCoFeCrAl0.5V0.2 | 4.15 | −2.5 | 1.86 | 26.49 | 0.0409 | 8.16
10 | CuNiFeCrMo | 3.58 | 4.64 | 1.61 | 28.97 | 0.0356 | 8.2
11 | CuNiCoFe | 1.14 | 5 | 1.39 | 223.55 | 0.0114 | 9.5
12 | CuNiCoFeMn | 3.18 | 1.76 | 1.61 | 41.04 | 0.0316 | 9
13 | CuNi2FeMn2Cr | 3.57 | −0.49 | 1.55 | 33.58 | 0.0356 | 8.43
14 | CuNi2FeCrAl0.2 | 2.94 | 0.12 | 1.44 | 44.59 | 0.0289 | 8.77
15 | CuNi2FeCrAl0.4 | 3.86 | −1.7 | 1.5 | 24.87 | 0.0381 | 8.56
16 | CuNi2FeCrAl0.5 | 4.2 | −2.51 | 1.52 | 20.44 | 0.0414 | 8.45
17 | Cu0.75NiCoFeCrAl0.25 | 3.25 | −0.71 | 1.72 | 42.39 | 0.032 | 8.4
18 | Cu0.5NiCoFeAl0.5Cr | 4.37 | −4.6 | 1.75 | 20.16 | 0.0431 | 8
19 | Cu0.5NiCoCrAl0.5Fe2 | 4.08 | −3.53 | 1.68 | 23.06 | 0.0403 | 8
20 | Cu0.5NiCoCrAl0.5Fe3 | 3.84 | −2.84 | 1.57 | 24.99 | 0.0379 | 8
21 | Cu0.5NiCoCrAl0.5Fe3.5 | 3.74 | −2.58 | 1.52 | 25.76 | 0.0368 | 8
22 | FeCoNiCrCu | 1.03 | 3.2 | 1.61 | 369.34 | 0.0103 | 8.8
23 | FeNi2CrCuAl0.2 | 2.94 | 0.12 | 1.44 | 44.59 | 0.0289 | 8.77
24 | FeCrMnNiCo | 3.27 | −4.16 | 1.61 | 34.71 | 0.0325 | 8
25 | FeCoNiCrCuAl0.3 | 3.42 | 0.16 | 1.79 | 44.96 | 0.0337 | 8.47
26 | FeCoNiCrCuAl0.5 | 4.17 | −1.52 | 1.77 | 25.77 | 0.0411 | 8.27
27 | FeNi2CrCuAl0.6 | 4.49 | −3.27 | 1.53 | 17.39 | 0.0443 | 8.36
28 | NiCoFeCrMo0.3 | 2.38 | −4.15 | 1.54 | 62.1 | 0.0235 | 8.09
29 | NiCoFeCrMo0.1Al0.3 | 3.9 | −7.26 | 1.62 | 20.05 | 0.0385 | 7.84
30 | NiCoFeCrAl0.25 | 3.48 | −6.75 | 1.53 | 23.78 | 0.0342 | 7.94
31 | NiCoFeCrAl0.3 | 3.76 | −7.27 | 1.54 | 19.99 | 0.037 | 7.88
32 | NiCoFeCrAl0.375 | 4.12 | −7.99 | 1.56 | 16.16 | 0.0406 | 7.8
33 | VCuFeCoNi | 2.2 | −2.24 | 1.61 | 84.95 | 0.022 | 8.6
34 | TaNbHfZrTi | 4.99 | 2.72 | 1.61 | 16.9 | 0.0499 | 4.4
35 | TaNbVTi | 3.93 | −0.25 | 1.39 | 26.07 | 0.0397 | 4.75
36 | TaNbVTiAl0.25 | 3.83 | −4.82 | 1.53 | 25.84 | 0.0387 | 4.65
37 | TaNbVTiAl0.5 | 3.74 | −8.4 | 1.58 | 24.28 | 0.0377 | 4.56
38 | TaNbVTiAl1.0 | 3.57 | −13.44 | 1.61 | 20.38 | 0.036 | 4.4
39 | WNbMoTa | 2.31 | −6.5 | 1.39 | 60.87 | 0.0231 | 5.5
40 | WNbMoTaV | 3.15 | −4.64 | 1.61 | 41.18 | 0.0315 | 5.4
41 | Al20Li20Mg10Sc20Ti30 | 5.16 | −0.4 | 1.56 | 16.17 | 0.0515 | 2.8
42 | GdTbDyTmLu | 5.07 | 0 | 1.61 | 18.76 | 0.0515 | 3
43 | HoDyYGdTb | 0.81 | 0 | 1.61 | 701.52 | 0.0081 | 3
44 | YGdTbDyLu | 1.37 | 0 | 1.61 | 245.87 | 0.0137 | 3
45 | AlCo3CrCu0.5FeNi | 4.88 | −7.25 | 1.62 | 12.52 | 0.0482 | 7.93
46 | Al0.8CrCuFeMnNi | 5.15 | −3.97 | 1.79 | 15.73 | 0.0512 | 7.66
47 | AlCo2CrCu0.5FeNi | 5.17 | −7.67 | 1.71 | 11.83 | 0.0511 | 7.77
48 | AlCrCuFeMnNi | 5.39 | −5.11 | 1.79 | 13.54 | 0.0536 | 7.5
49 | Al0.5CoCrFeNi | 4.6 | −9.09 | 1.58 | 12.23 | 0.0454 | 7.67
50 | Al0.5CoCrCuFeNiTi0.4 | 5.49 | −6.42 | 1.9 | 13.02 | 0.0543 | 7.98
51 | Al0.5CrFeNiCoCuTi0.6 | 5.92 | −8.4 | 1.92 | 10.36 | 0.0586 | 7.85
52 | Al0.5CrFeNiCoCuTi0.8 | 6.26 | −10.11 | 1.92 | 8.54 | 0.0621 | 7.73
53 | Al0.5CoCrCuFeNiTi1.0 | 6.53 | −11.6 | 1.93 | 7.23 | 0.0649 | 7.62
54 | Al0.5CoCrCuFeNiTi1.2 | 6.76 | −12.89 | 1.92 | 6.26 | 0.0671 | 7.51
55 | Al0.5CoCrCuFeNiTi1.4 | 6.94 | −14.02 | 1.91 | 5.47 | 0.069 | 7.41
56 | Al0.5CoCrCuFeNiTi1.6 | 7.09 | −15.01 | 1.9 | 4.85 | 0.0706 | 7.31
57 | Al0.5CoCrCuFeNiTi1.8 | 7.21 | −15.86 | 1.89 | 4.34 | 0.0719 | 7.22
58 | Al0.5CoCrCuFeNiTi2.0 | 7.31 | −16.6 | 1.88 | 3.91 | 0.0729 | 7.13
59 | CoCrFeNiTi0.5 | 5.33 | −11.56 | 1.58 | 7.91 | 0.0525 | 7.78
60 | CoCrFeNiAlNb0.25 | 6.1 | −14.66 | 1.72 | 5.26 | 0.0605 | 7.1
61 | CoCrFeNiAlNb0.75 | 6.5 | −18.03 | 1.79 | 3.95 | 0.0648 | 6.91
62 | CoCrCuFeNiTi0.8 | 5.7 | −6.75 | 1.79 | 11.12 | 0.0563 | 8.14
63 | CoCrCuFeNiTi | 6.12 | −8.44 | 1.79 | 8.92 | 0.0605 | 8
64 | CuAlNiCoCrFeSi | 6.13 | −18.86 | 1.95 | 4.15 | 0.061 | 7.29
65 | CuNi2FeCrAl0.9 | 5.15 | −5.22 | 1.56 | 12.08 | 0.0509 | 8.08
66 | CuNi2FeCrAl1.2 | 5.6 | −6.78 | 1.57 | 9.25 | 0.0556 | 7.83
67 | CuNi2FeCrAl1.5 | 5.93 | −8.05 | 1.57 | 7.47 | 0.0589 | 7.62
68 | Cu0.5Ti0.5CrFeCoNiAl0.5 | 5.97 | −10.84 | 1.89 | 8.79 | 0.0591 | 7.64
69 | CuCoNiCrAlFeTiV | 6.34 | −13.94 | 2.08 | 7.73 | 0.0631 | 7
70 | FeNi2CrCuAl | 5.32 | −5.78 | 1.56 | 10.94 | 0.0526 | 8
71 | FeNi2CrCuAl1.2 | 5.6 | −6.78 | 1.57 | 9.25 | 0.0555 | 7.84
72 | FeCoNiCrCuAl0.8 | 4.92 | −3.61 | 1.79 | 17.15 | 0.0487 | 8
73 | FeCoNiCrCuAl | 5.28 | −4.78 | 1.79 | 14.12 | 0.0523 | 7.83
74 | FeCoNiCrCuAl1.5 | 5.89 | −7.05 | 1.78 | 9.9 | 0.0585 | 7.46
75 | FeCoNiCrCuAl2.0 | 6.26 | −8.65 | 1.75 | 7.62 | 0.0623 | 7.14
76 | FeCoNiCrCuAl2.3 | 6.4 | −9.38 | 1.73 | 6.7 | 0.0638 | 6.97
77 | FeCoNiCrCuAl2.8 | 6.57 | −10.28 | 1.68 | 5.53 | 0.0656 | 6.71
78 | FeCoNiCrCuAl3.0 | 6.61 | −10.56 | 1.67 | 5.17 | 0.0661 | 6.63
79 | FeCoNiCuAl | 5.61 | −5.28 | 1.61 | 10.44 | 0.0556 | 8.2
80 | MnCrFe1.5Ni0.5Al0.3 | 4.7 | −5.51 | 1.48 | 13.89 | 0.047 | 7.19
81 | MnCrFe1.5Ni0.5Al0.5 | 5.16 | −7.26 | 1.52 | 10.62 | 0.051 | 7
82 | ErTbDyNiAl | 13.74 | −37.6 | 1.61 | −2.24 | 0.1429 | 4.4
83 | PdPtCuNiP | 9.29 | −23.68 | 1.61 | −1.26 | 0.0952 | 9.2
84 | SrCaYbMgZn | 15.25 | −13.12 | 1.61 | −0.017 | 0.1565 | 4.2
85 | SrCaYbMgZn0.5Cu0.5 | 16.37 | −10.6 | 1.75 | 0.61 | 0.1699 | 4.1
86 | SrCaYbLi0.55Mg0.45Zn | 15.71 | −12.15 | 1.75 | 0.2 | 0.1612 | 4.09
87 | TiZrCuNiBe | 12.53 | −30.24 | 1.61 | −0.9 | 0.1268 | 6.2
88 | ZrHfTiCuNi | 10.34 | −27.36 | 1.61 | −0.27 | 0.1049 | 6.6
89 | ZrHfTiCuFe | 10.43 | −15.84 | 1.61 | 1.73 | 0.1059 | 6.2
90 | ZrHfTiCuCo | 10.24 | −23.52 | 1.61 | 0.42 | 0.1039 | 6.4
Algorithm 1 shows the pseudocode of the proposed method, which detects communities using the Louvain and PSO algorithms. The flowchart of the proposed method is shown in Figure . The proposed method has three stages. The first stage prepares the data and consists of three steps: HEA feature selection, feature vector creation, and normalization. The second stage creates an HEA interaction network using similarity and graph-pruning methods. The third stage applies an ML algorithm that extracts communities from the network. Finally, the modularity is measured, which indicates the quality of the communities.
Figure 1

Flowchart of the proposed method with details. The process consists of three phases: preprocessing, creation of the HEA interaction network, and ML algorithms.

Data Normalization

Normalization is used when data values are not in the same range, to prevent properties and descriptors with large values from dominating the overall behavior of the system. Normalization also limits the impact of widely differing scales and keeps all inputs in a single interval. In this article, min–max normalization was used to map the property values to the interval [0, 1] using eq :[13]
$$v' = \frac{v - \min_A}{\max_A - \min_A}$$
where $\min_A$ and $\max_A$ are the minimum and maximum values of property A, and v and v′ are the original and normalized values, respectively. Consequently, the normalized values have a maximum of 1 and a minimum of 0.[13]
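A minimal sketch of column-wise min–max normalization, assuming the data are held as one list of descriptor values per compound; the function name and the choice to map a constant column to 0 (to avoid dividing by zero) are our assumptions, not details given in the text.

```python
def min_max_normalize(rows):
    """Scale every descriptor column to [0, 1] with v' = (v - min_A) / (max_A - min_A).

    rows: one list of descriptor values per compound (all rows the same length).
    A constant column is mapped to 0.0 to avoid division by zero (an assumption).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# delta (%), ΔHmix (kJ/mol), and VEC of compounds 2, 4, and 5 in Table A1
sample = [[3.76, -7.27, 7.88], [0.3, -3.75, 8.25], [3.55, -4.0, 8.5]]
print(min_max_normalize(sample))
```

Each column is scaled independently, so after normalization every descriptor contributes on the same [0, 1] scale to the similarity computations that follow.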

Content Cosine Similarity Criteria

Content cosine similarity is based on the angle between two vectors and measures how closely the two vectors point in the same direction.[11] As the data set in Appendix A shows, each property of one composition can be compared with the corresponding property of another compound.[1] Equation shows the content cosine similarity as follows:[11]
$$\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
where $x_i$ is the ith property of the first compound, $y_i$ is the ith property of the second compound, and n is the number of properties.
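The content similarity can be computed for every pair of normalized descriptor vectors. The sketch below is a plain-Python illustration; the names are hypothetical, and returning 0 for an all-zero vector is our assumption (after min–max normalization, a compound at the minimum of every descriptor would have such a vector, for which the cosine is undefined).

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between descriptor vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar to everything
    return dot / (norm_x * norm_y)


def cosine_matrix(rows):
    """Pairwise content-similarity matrix for a list of (normalized) descriptor vectors."""
    return [[cosine_similarity(a, b) for b in rows] for a in rows]
```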

Structural Jaccard Similarity Criteria

The Jaccard index is mainly used to compare the structural similarity of a data set.[12] The Jaccard similarity coefficient between two sets is obtained by dividing the number of elements the two sets have in common by the number of elements in their union.[12] Because the Jaccard criterion requires the graph structure of the interaction network as input, the matrix obtained from the content cosine similarity must first be thresholded, trying different thresholds to find an appropriate value, to create the network graph; the structural similarities are then measured on this graph. In the current study, the threshold on the content cosine similarity was set to 0.98. The structural Jaccard similarity is given by eq :[12]
$$J(v_i, v_j) = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$$
where $v_i$ and $v_j$ are two nodes representing HEA compounds, $N(v_i)$ and $N(v_j)$ are their neighbor sets in the graph, $|N(v_i) \cap N(v_j)|$ is the number of neighbors the two compounds share, and $|N(v_i) \cup N(v_j)|$ is the number of all their neighbors. This criterion can be computed for every pair of nodes.
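The two steps above, thresholding the cosine matrix into a graph and then comparing neighbor sets, can be sketched as follows. Whether the cutoff is inclusive (≥ 0.98) or strict is not stated in the text; the inclusive comparison and the convention J = 0 for two isolated nodes are assumptions, and the function names are hypothetical.

```python
def threshold_graph(sim, threshold=0.98):
    """Neighbor sets of the interaction network: i and j (i != j) are linked when sim[i][j] >= threshold."""
    n = len(sim)
    return [{j for j in range(n) if j != i and sim[i][j] >= threshold} for i in range(n)]


def jaccard(neighbors, i, j):
    """|N(v_i) ∩ N(v_j)| / |N(v_i) ∪ N(v_j)|; defined as 0 when both nodes have no neighbors."""
    union = neighbors[i] | neighbors[j]
    if not union:
        return 0.0
    return len(neighbors[i] & neighbors[j]) / len(union)


def jaccard_matrix(neighbors):
    """Structural similarity for every pair of nodes."""
    n = len(neighbors)
    return [[jaccard(neighbors, i, j) for j in range(n)] for i in range(n)]
```

Note that the structural similarity depends only on the graph produced by the threshold, so changing the 0.98 cutoff changes the Jaccard matrix as well.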

α Coefficient

Computing the content and structural similarities yields two similarity matrices. Community detection requires a single hybrid similarity matrix as input, which combines both. The α coefficient determines the relative weight of the content (cosine) similarity and the structural (Jaccard) similarity in this combination. The output of this phase is the hybrid similarity matrix required by the community detection algorithm.
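The text does not give the exact combination rule. The sketch below assumes a convex combination, S = α·S_content + (1 − α)·S_structural, which is one common way to weight two similarity matrices; both the formula and the helper names are assumptions for illustration. The second helper turns the hybrid matrix into the weighted edge list of the interaction network.

```python
def hybrid_similarity(content, structural, alpha=0.5):
    """Combine two similarity matrices; ASSUMED form: alpha * content + (1 - alpha) * structural."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(content)
    return [
        [alpha * content[i][j] + (1.0 - alpha) * structural[i][j] for j in range(n)]
        for i in range(n)
    ]


def weighted_edges(sim):
    """List (i, j, weight) for i < j with positive weight, i.e. the weighted interaction network."""
    n = len(sim)
    return [(i, j, sim[i][j]) for i in range(n) for j in range(i + 1, n) if sim[i][j] > 0]
```

With α = 1 only the content similarity is used and with α = 0 only the structural similarity, so α can be tuned between the two extremes.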

Community Detection

Each community in the interaction network groups alloys that are closely related to one another and are therefore expected to perform similar functions.

Theoretical Notation Definitions

A complex network can be mapped to a graph G(V,E), where V is the node set and E is the edge set. A network C(v,e) is a subnetwork of G if v is a subset of V and e is a subset of E. Let A be the adjacency matrix; two nodes are adjacent if there is an edge between them: if vertices i and j are linked, then $A_{ij} = 1$; otherwise, $A_{ij} = 0$. In a weighted network, each edge carries a real-valued weight $w_{ij}$. Communities are groups of nodes that are more densely connected to each other than to the rest of the network. Community detection is a key tool for extracting valuable information from networks.[29]
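Modularity, the objective that both algorithms maximize, can be computed directly from a weighted adjacency matrix. The sketch below uses the standard Newman formula Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where k_i is the strength (weighted degree) of node i; the list-of-lists representation and the function name are illustrative choices, not the authors' implementation.

```python
def modularity(adj, labels):
    """Newman modularity of a weighted undirected graph.

    adj: symmetric weight matrix (list of lists); labels: community label of each node.
    Q = (1 / 2m) * sum over i, j in the same community of [A_ij - k_i * k_j / (2m)]
    """
    n = len(adj)
    strength = [sum(row) for row in adj]  # k_i
    two_m = sum(strength)                 # 2m, the total weight counted in both directions
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                q += adj[i][j] - strength[i] * strength[j] / two_m
    return q / two_m
```

For two triangles joined by a single edge, the partition into the two triangles gives Q = 5/14 ≈ 0.357, whereas putting all six nodes into one community gives Q = 0.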

Louvain Algorithm

Many systems studied in science can be represented as complex networks, whose topology of interconnected vertices ranges from ordered to random.[14] The Louvain algorithm is a heuristic method for identifying communities and groups within a graph. Each extracted group represents a community, and the algorithm is an agglomerative (bottom-up) hierarchical clustering method.[15] A quality function called modularity is used to evaluate the obtained communities, and the algorithm seeks to maximize it: a vertex joins the community that increases the modularity the most, which produces highly modular communities.[15] Although computing the modularity of large graphs is expensive, the Louvain algorithm processes large graphs quickly,[15] and this property has made it popular.[16] It is considered one of the fastest and most effective community detection algorithms for maximizing modularity. The algorithm consists of two phases that are repeated alternately. Suppose the procedure starts from a weighted interaction network with N nodes. It first places each node in its own community, so that there are as many communities as nodes. Then, for each node i, it considers each neighbor j and evaluates the gain in modularity obtained by removing i from its community and placing it in the community of j. Node i is then placed in the community with the largest gain, provided that the gain is positive; otherwise, it remains in its current community.
This process is applied repeatedly and sequentially to all nodes of the HEA interaction network until no further improvement is achieved, which completes the first phase.[17] Although a node may be examined several times, the first phase ends when a local maximum of the modularity is reached and the modularity no longer changes. The order in which the nodes are examined can affect both the output and the computation time, which requires further study. The modularity gain ΔQ obtained by moving an isolated node i into a community C is given by eq :[17]
$$\Delta Q = \left[\frac{\Sigma_{in} + 2k_{i,in}}{2m} - \left(\frac{\Sigma_{tot} + k_i}{2m}\right)^2\right] - \left[\frac{\Sigma_{in}}{2m} - \left(\frac{\Sigma_{tot}}{2m}\right)^2 - \left(\frac{k_i}{2m}\right)^2\right]$$
where $\Sigma_{in}$ is the sum of the weights of the links inside C (summed over both directions, $\sum_{j,l \in C} A_{jl}$), $\Sigma_{tot}$ is the total weight of the links incident to nodes in C, $k_i$ is the total weight of the links incident to node i, $k_{i,in}$ is the sum of the weights of the links from node i to nodes in C, and m is the total weight of all links in the network. A similar expression evaluates the change in modularity when node i is removed from its community, so the effect of moving i from its community to a neighboring community can be measured.[17] The second phase builds a new network whose nodes are the communities found during the first phase. The weight of a link between two new nodes is the total weight of the links between nodes in the two corresponding communities, and links between nodes of the same community become self-loops on the new node.
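The first-phase decision can be sketched in Python as follows: `delta_q` evaluates the bracketed ΔQ expression term by term, and `best_community` applies it to one node, treating the node as removed from its community and comparing the gain of reinserting it with the gain of joining each neighboring community. This is an illustrative sketch, not the authors' implementation; the names, and the convention that ties keep the current community, are our assumptions.

```python
def delta_q(sigma_in, sigma_tot, k_i, k_i_in, m):
    """Modularity gain from moving an isolated node i into community C (bracketed formula).

    sigma_in counts each internal link of C in both directions (sum of A_jl over j, l in C).
    """
    two_m = 2.0 * m
    after = (sigma_in + 2.0 * k_i_in) / two_m - ((sigma_tot + k_i) / two_m) ** 2
    before = sigma_in / two_m - (sigma_tot / two_m) ** 2 - (k_i / two_m) ** 2
    return after - before


def best_community(i, adj, labels):
    """First-phase Louvain decision for node i.

    Node i is treated as removed from its community; the gain of joining each neighboring
    community is compared with the gain of returning to its own, and the label with the
    largest gain is returned. Ties keep the current label.
    """
    n = len(adj)
    strength = [sum(row) for row in adj]  # k_j for every node
    m = sum(strength) / 2.0

    def gain(c):
        members = [j for j in range(n) if j != i and labels[j] == c]
        k_i_in = sum(adj[i][j] for j in members)
        sigma_tot = sum(strength[j] for j in members)
        sigma_in = sum(adj[j][l] for j in members for l in members)
        return delta_q(sigma_in, sigma_tot, strength[i], k_i_in, m)

    best_label = labels[i]
    best_gain = gain(best_label)
    for c in sorted({labels[j] for j in range(n) if j != i and adj[i][j] > 0}):
        g = gain(c)
        if g > best_gain:
            best_label, best_gain = c, g
    return best_label
```

Expanding the brackets shows that Σ_in cancels and ΔQ reduces to k_{i,in}/m − Σ_tot·k_i/(2m²), which is why the gain is cheap to evaluate for every neighboring community.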
After the second phase is completed, the first phase can be reapplied to the resulting weighted network; the combination of the two phases is called a pass. The number of meta-communities decreases with each pass, and most of the computation occurs in the first pass. The passes are repeated until the maximum modularity is reached and no further changes occur. The algorithm can represent highly complex networks and operates hierarchically, so that the final communities are built through an iterative process of merging. The height of the hierarchy is determined by the number of passes, which is usually small. The algorithm has several advantages: its steps are intuitive and easy to implement, and it is unsupervised, reaching its result without any need for supervision or monitoring. It is also fast, because the modularity gain ΔQ can be computed cheaply, and repeated passes reduce the number of communities through merging. Most of the computation time is spent in the first iteration of the first phase.[17] Moreover, owing to the multilevel nature of the algorithm, the resolution limit of modularity is at least partially circumvented. Finally, in the first phase of the algorithm, individual nodes are moved from one community to another.[17] The probability that two distinct communities are merged by moving their nodes one by one is extremely low; however, such communities can be merged in a later pass, after the nodes have been aggregated.[17]
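The second phase, which turns each community into a single node, can be sketched as below. This is a hypothetical helper, not the authors' code: entry (C, D) of the new matrix is the sum of A_ij over i in C and j in D, so links inside a community become a self-loop counted in both directions. This convention preserves every community's total strength and the total weight 2m, so the modularity of the aggregated network with singleton communities equals the modularity of the partition it came from.

```python
def aggregate(adj, labels):
    """Second Louvain phase: build the weighted network of communities.

    Entry (C, D) is the sum of A_ij over i in C and j in D; links inside a community
    become a self-loop on the new node.
    """
    communities = sorted(set(labels))
    index = {c: pos for pos, c in enumerate(communities)}
    size = len(communities)
    new_adj = [[0.0] * size for _ in range(size)]
    for i, row in enumerate(adj):
        for j, w in enumerate(row):
            new_adj[index[labels[i]]][index[labels[j]]] += w
    return new_adj
```

A full pass then alternates the node-moving step on the current network with this aggregation until the partition stops changing.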

Community Detection Based on the Particle Swarm Optimization Algorithm

Kennedy and Eberhart introduced the PSO algorithm in 1995, inspired by the social behavior of flocking birds.[24] PSO is one of the most important and widely used swarm intelligence algorithms and often performs competitively with other optimization algorithms. In community detection, each particle is an array (90 × 1) of nodes, and the particles move through the search space by updating their positions.[19] Optimizing the algorithm can lead to rapid convergence and reduce the number of evaluations of the fitness function, which is the modularity criterion of community quality.[19] Consider an optimization problem in a d-dimensional space, where $X_i = (X_{i1}, X_{i2}, \ldots, X_{id})$ and $V_i = (V_{i1}, V_{i2}, \ldots, V_{id})$ are the position and velocity vectors of particle i, respectively. Let $pbest_i$ be the best solution found so far by particle i (i = 1, 2, ..., Psize) and gbest the best solution found by any particle. The particles learn cooperatively through each update of pbest and gbest. In each iteration of the PSO algorithm, the velocity and position of each particle are updated using eqs –8,[23] where t is the iteration number, w is the inertia coefficient, c1 and c2 are the learning rates, rand1 and rand2 are random numbers drawn uniformly from the interval [0, 1], and ρ is a predefined threshold.[23]
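For orientation, the sketch below implements the standard continuous PSO update, v(t+1) = w·v(t) + c1·rand1·(pbest − x(t)) + c2·rand2·(gbest − x(t)) followed by x(t+1) = x(t) + v(t+1), and uses it to minimize a simple test function. It does not reproduce the paper's discrete variant, in which the threshold ρ maps velocities to node choices; the parameter values (w = 0.7298, c1 = c2 = 1.49618, commonly used defaults) and the function names are assumptions.

```python
import random


def pso_step(x, v, pbest, gbest, w, c1, c2, rng):
    """One standard continuous PSO update for a single particle; returns (new_x, new_v)."""
    new_v = [
        w * vd + c1 * rng.random() * (pd - xd) + c2 * rng.random() * (gd - xd)
        for xd, vd, pd, gd in zip(x, v, pbest, gbest)
    ]
    new_x = [xd + vd for xd, vd in zip(x, new_v)]
    return new_x, new_v


def pso_minimize(f, dim, n_particles=20, iters=100, w=0.7298, c1=1.49618, c2=1.49618, seed=0):
    """Minimize f over R^dim with a basic global-best swarm (illustrative only)."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    vs = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in xs]
    gbest = min(pbest, key=f)[:]
    for _ in range(iters):
        for i in range(n_particles):
            xs[i], vs[i] = pso_step(xs[i], vs[i], pbest[i], gbest, w, c1, c2, rng)
            if f(xs[i]) < f(pbest[i]):  # update the personal best
                pbest[i] = xs[i][:]
                if f(pbest[i]) < f(gbest):  # update the global best
                    gbest = pbest[i][:]
    return gbest
```

In community detection, f would be the negative modularity of the partition decoded from a particle, and the continuous update would be replaced by a discrete one.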

Optimal PSO Algorithm and Group Learning

Independent communities are obtained by partitioning the set of HEA compounds in the line graph L(G) of the interaction network; these optimal communities are subgraphs of G. To identify independent communities in a network, one searches for them in the corresponding line graph. The developed PSO algorithm is combined with group learning techniques, resulting in LEPSO, which is used to optimize the partitions obtained from the line graph.[23]
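For readers unfamiliar with line graphs, the construction itself is simple: every edge of G becomes a node of L(G), and two such nodes are adjacent when the original edges share an endpoint. The sketch below illustrates only this standard definition (assuming a simple graph without parallel edges and sortable node labels); `line_graph` is a hypothetical helper, not part of the authors' code.

```python
from itertools import combinations


def line_graph(edges):
    """Return L(G) as an adjacency dict: each edge of G is a node, and two such
    nodes are adjacent when the original edges share an endpoint."""
    nodes = [tuple(sorted(e)) for e in edges]       # canonical (u, v) with u <= v
    lg = {e: set() for e in nodes}
    for e, f in combinations(nodes, 2):
        if set(e) & set(f):                         # shared endpoint in G
            lg[e].add(f)
            lg[f].add(e)
    return lg
```

For the path a–b–c–d, the edge (b, c) is adjacent to both (a, b) and (c, d) in L(G), while a triangle maps to another triangle.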

Presentation of Community Detection Using Optimal PSO

The line graph for the chemical composition of alloys is represented as L(G) = ⟨N, E⟩, where N = (n1, n2, ..., nk) and k = |N|; a partition of L(G) is encoded by a particle X = (X1, X2, ..., Xk). If Xi = m, there is a link e = ⟨ni, nm⟩ between the two compounds, and ni and nm are placed in the same community of L(G). To determine an initial community structure, each PSO particle is first treated as an array of alloy compounds, and the adjacency matrix of the primary interaction network supplies the connected (linked) nodes of each compound. A naive version of this design has drawbacks: particles are initialized randomly and their positions are updated frequently, so particle components may encode links that never existed in the network. To solve this problem, particles are encoded over each node's list of neighbors.[23] This design uses the neighbor distribution of each node (each node representing an alloy composition), which ensures that every newly created particle, whether produced by initialization or by a position update, is valid. Further advantages of this optimization in PSO are the complete removal of invalid particles and, through repeated binary division and automated community detection, the avoidance of locally optimal communities.[23]
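The neighbor-list encoding can be illustrated as follows. In this sketch (an assumption modelled on the common locus-based representation, not code from ref 23), entry X[i] stores a neighbor j of node i, meaning that i and j share a community; decoding takes the connected components of the graph formed by the links (i, X[i]). Because X[i] is always drawn from node i's neighbor list, every link a particle encodes exists in the interaction network. The names `init_particle` and `decode` are hypothetical.

```python
import random


def init_particle(neighbors, rng=random):
    """Draw X[i] from node i's neighbour list; an isolated node points to itself."""
    return {i: rng.choice(sorted(nbrs)) if nbrs else i for i, nbrs in neighbors.items()}


def decode(particle):
    """Turn a particle {i: X[i]} into communities via union-find over the links (i, X[i])."""
    parent = {i: i for i in particle}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i, j in particle.items():
        parent[find(i)] = find(j)           # i and X[i] belong to the same community
    groups = {}
    for i in particle:
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())
```

For example, the particle {0: 1, 1: 0, 2: 1, 3: 4, 4: 3} decodes into the two communities {0, 1, 2} and {3, 4}.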

Particle Fitness Function in the Optimal PSO

Because the notion of a community is defined only loosely, researchers have introduced various quality indices to evaluate a partition. The main assumption behind modularity is that the edge density within a cluster should be higher than the density expected for the same subgraph if the nodes were linked at random. To complete the discretization of the algorithm, each node and its relationships with the other nodes are analyzed individually: the links between each compound and the other compounds are obtained first, and the adjacency matrix is then constructed. Finally, the particle fitness function, which determines the quality of the final communities and is also known as modularity, is given by eq (23):

$$\mathrm{fit}(P) = \sum_{c \in C} \left[ \frac{l_c}{|E|} - \left( \frac{d_c}{2|E|} \right)^{2} \right] \tag{23}$$

In this equation, fit(P) is the fitness value of particle P, m is the number of communities in the partition C of the network G = ⟨N, E⟩, l_c is the number of edges linking vertices within community c ∈ C, d_c is the sum of the degrees of the nodes in c, and |E| is the total number of edges in G.
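As a worked check of eq (23), the sketch below evaluates fit(P) for an unweighted, undirected edge list and a partition given as a list of node sets. It is an illustration of the formula under those assumptions; `fitness` is a hypothetical name.

```python
def fitness(edges, communities):
    """fit(P) = sum over c of [ l_c/|E| - (d_c / (2|E|))**2 ] for an unweighted, undirected graph."""
    label = {node: k for k, comm in enumerate(communities) for node in comm}
    n_edges = len(edges)                          # |E|
    l = [0] * len(communities)                    # l_c: edges with both ends inside c
    d = [0] * len(communities)                    # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        d[label[u]] += 1
        d[label[v]] += 1
        if label[u] == label[v]:
            l[label[u]] += 1
    return sum(lc / n_edges - (dc / (2 * n_edges)) ** 2 for lc, dc in zip(l, d))
```

Two triangles joined by one bridge edge (|E| = 7, l_c = 3, d_c = 7 for each triangle) score 2 × (3/7 − 1/4) = 5/14 ≈ 0.357, whereas putting all six nodes into one community scores 1 − 1 = 0.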

Update of Particle Speed and Position

Particle Speed Update

To avoid premature convergence to local optima, an improved velocity-update algorithm called GbestGenerator is used; it applies a voting-based clustering technique to exploit valuable hidden community patterns in less fit particles as well as in the gbest values. If the fitness of gbest does not improve for Tmax consecutive iterations, that is, if the swarm is trapped in a local optimum, member particle clusters (MPS) are formed from the gbest particles of those Tmax consecutive iterations, and suitable MPS particles are combined to produce a new gbest particle. The velocity of each particle is bounded by a minimum and a maximum value.[23] In the velocity update, the inertia coefficient w plays a particularly important role. Its adjustment strategy follows ref 23, where wmax and wmin are the initial and final inertia coefficients, respectively, tmax is the maximum number of iterations, and t is the current iteration. At the initial stage (t = 0), w equals wmax; as t approaches tmax, w gradually decreases toward wmin. Larger coefficient values in the early stages let the particles move faster and explore the search space, whereas smaller values in the later stages gradually improve their stability as the algorithm converges.
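The inertia schedule and the GbestGenerator voting step can be sketched as below. Both parts are assumptions: the linear decrease from wmax to wmin is the common form that matches the behavior described (w = wmax at t = 0, w approaching wmin as t approaches tmax), with illustrative defaults 0.9 and 0.4, and `vote_gbest` reads the voting step as a simple component-wise majority vote over neighbor-list-encoded MPS particles, which is only one plausible interpretation of ref 23.

```python
from collections import Counter


def inertia_weight(t, t_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia coefficient (assumed form): w(0) = w_max, w(t_max) = w_min."""
    return w_max - (w_max - w_min) * t / t_max


def vote_gbest(mps):
    """Build a new gbest by majority vote on each component of the MPS particles.

    Each particle is a dict {node: neighbour}; ties go to the value seen first.
    """
    return {i: Counter(p[i] for p in mps).most_common(1)[0][0] for i in mps[0]}
```

With three MPS particles {0: 1, 1: 0}, {0: 1, 1: 2}, and {0: 2, 1: 0}, the vote yields {0: 1, 1: 0}, a particle that keeps the links most of the stalled gbest solutions agree on.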

Positional Particle Update

With the binary position update given earlier, each position-vector component takes the value 0 or 1, which is not well suited to representing particles in terms of node neighbors. Here, the previous position of a particle corresponds to the previous community assignment and the new position to the updated one. Therefore, the i-th component Xi is an integer between 1 and deg(ni), that is, Xi ∈ {1, 2, ..., deg(ni)}, which improves the search ability of the PSO. The particle position is updated following ref 23,[23] where k = rand × deg(ni), k ≠ Xi(t), deg(ni) is the degree of vertex ni, and ρ is a user-defined threshold. The new position value is generated according to the degree distribution: if the value of node v is greater than those of its neighbors, or if sig(Vi(t + 1)) exceeds ρ, the node is transferred to the newly selected neighbor. For this reason, the sigmoid function sig(·) in the position-update equation is modified. As particle velocities decrease, positions change less often, so the PSO gradually converges toward the global optimum.[23]
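A minimal sketch of this neighbor-index position update follows, assuming that k is drawn uniformly from the other neighbor indices (a stand-in for "k = rand × deg(ni), k ≠ Xi(t)") and that a node keeps its current neighbor whenever sig(Vi(t + 1)) ≤ ρ. The function names and the default ρ = 0.5 are hypothetical.

```python
import math
import random


def sig(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def update_position(x, v, degree, rho=0.5, rng=random):
    """x[i] in {1, ..., deg(n_i)} indexes a neighbour of node i.

    If sig(v[i]) > rho, node i switches to a different neighbour index k != x[i],
    chosen uniformly at random; otherwise it keeps its current neighbour.
    """
    new_x = {}
    for i, xi in x.items():
        choices = [k for k in range(1, degree[i] + 1) if k != xi]
        if sig(v[i]) > rho and choices:
            new_x[i] = rng.choice(choices)
        else:
            new_x[i] = xi
    return new_x
```

A node with a single neighbor can never move, and a node with a strongly negative velocity keeps its position, which is how shrinking velocities make the swarm settle.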

Results and Discussion

Given the novelty of this study, several experiments were carried out to evaluate the efficiency of the proposed method. To construct the HEA interaction network, the similarity of the alloy features must first be quantified. The weighted interaction network of the HEAs is shown in Figure 2, where the weight of each link between two compounds reflects their degree of similarity. This interaction network was initially formed from the content and structural similarity of the alloy descriptors, and all compositions were linked, forming a complete graph. The primary interaction network has 90 nodes corresponding to the HEA compounds, and each compound is labeled by the number defined in Appendix A. As shown in Figure 2, every compound is linked to every other compound through 3968 edges, and the interaction network is an undirected graph. The degree of a node in the HEA interaction network is the number of edges connecting it to other nodes, and the degree distribution gives the probability of each degree over the whole network. The degree distribution of the constructed graph is shown in Figure 3; the average degree read from this diagram is 14.20. In the next step, illustrated in Figure 4, an α coefficient of 0.9 and a threshold of 0.6 are applied to the network to remove weak connections between less similar nodes. As shown in Figure 4, the resulting interaction network retains 632 edges among all 90 nodes, and the size of each node reflects its degree.
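The network construction step can be sketched as follows. The text states that α weights content against structural similarity and that a threshold prunes weak links, but it does not give the combination rule here; the convex blend α·S_content + (1 − α)·S_struct and the "≥ threshold" comparison are assumptions, and the similarity matrices are hypothetical inputs.

```python
from itertools import combinations


def build_network(content_sim, struct_sim, alpha=0.9, threshold=0.6):
    """Blend two symmetric similarity matrices and keep only sufficiently strong links.

    Edge weight w_ij = alpha * content_sim[i][j] + (1 - alpha) * struct_sim[i][j]
    (assumed combination rule); links with w_ij < threshold are pruned.
    """
    n = len(content_sim)
    edges = {}
    for i, j in combinations(range(n), 2):
        w = alpha * content_sim[i][j] + (1 - alpha) * struct_sim[i][j]
        if w >= threshold:
            edges[(i, j)] = w
    return edges


def average_degree(n_nodes, edges):
    """Mean number of links per node in an undirected graph: 2|E| / |N|."""
    return 2 * len(edges) / n_nodes
```

With α = 0.9, content similarity dominates each weight; raising the threshold removes more edges and lowers the average degree.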
Figure 2

Weighted interaction network of 90 HEAs, where all compounds are connected to each other. The HEA interaction network is fully connected before applying the threshold.

Figure 3

Degree distribution of the HEA interaction network. The degree distribution presents the probability distribution of degrees over the whole network. The degree of each compound is the number of edges that it has to other compounds.

Figure 4

Impact of the threshold and α on the HEA interaction network. The threshold prunes weak connections in the HEA network, and the α coefficient sets the relative weight of content and structural similarity.

As presented in Figure 5, applying the Louvain algorithm to the HEA interaction network yields 13 communities with an overall quality (modularity) of approximately 0.71. In Figure 5, each community is indicated by a unique color, and the compounds in every community are fully connected. As shown in Figure 6, communities were also extracted from the HEA interaction network with the optimal PSO algorithm over 100 iterations. The 13 optimal communities found by PSO have a higher modularity of approximately 0.89. Because the compounds of each PSO community are not connected to compounds in other communities, these communities achieve a higher modularity. Analysis of each community shows that the neighbors of every compound share the same phase label and contain similar elements.
Figure 5

Community detection with the developed Louvain algorithm. Each color marks one community; the compounds in a community have similar functionality and descriptors.

Figure 6

Community detection with the optimal PSO algorithm. Each color marks one community of HEA compounds extracted by the PSO algorithm.

In this paper, modularity is the main criterion for assessing the quality of the communities produced by both community detection algorithms.[14] If the communities contain no more internal edges than expected in a random graph, the modularity is zero. The maximum modularity is obtained when all nodes within a community are connected to one another and there are no edges to other communities.[14] A basic feature of modularity is that it allows partitions produced by different methods to be compared. Because different algorithms do not necessarily produce the same partitions, many existing criteria cannot assess community quality on a common basis. Louvain's hierarchical bottom-up method therefore makes it possible to follow how modularity changes as communities are split or merged, and the maximum value is taken as the best outcome. Modularity is a scalar value between −1 and 1 that measures the density of links inside communities compared with the links between communities;[15,16] a modularity between 0.3 and 0.7 indicates strong community structure.[18] The closer modularity is to 1, the higher the quality of the communities. In the experiments, the modularity obtained with both optimized algorithms in this paper exceeds 0.7 (see the modularity table). Figure 7 shows four example communities extracted by the developed Louvain algorithm.
For example, in the blue community (Figure 7), SrCaYbMgZn, SrCaYbMgZn0.5Cu0.5, and SrCaYbLi0.55Mg0.45Zn are grouped together; they share the elements Sr, Ca, Yb, Mg, and Zn and have similar functionality. The alloys in the blue community contain five or six elements; SrCaYbMgZn0.5Cu0.5 and SrCaYbLi0.55Mg0.45Zn both contain six elements, five of which they share. All alloys in the blue community are amorphous. Their descriptor values lie in the ranges δ [15.25, 16.37], ΔHmix [−13.12, −10.6], Sc [1.61, 1.75], φ [−0.017, 0.61], εRMS [0.1565, 0.1699], and VEC [4.09, 4.2]. The yellow community (Figure 7) contains nine HEA compounds: CoCrFeNiAlNb0.75, Al0.5CoCrCuFeNiTi1.2, Al0.5CoCrCuFeNiTi1.4, Al0.5CoCrCuFeNiTi1.6, CuAlNiCoCrFeSi, Al0.5CoCrCuFeNiTi1.8, CoCrFeNiAlNb0.25, Al0.5CoCrCuFeNiTi2.0, and FeCoNiCrCuAl3.0. They share five elements: Al, Co, Cr, Fe, and Ni. The HEA compounds in the yellow community are multiphase alloys; they contain six or seven elements, and the descriptors of three of these compounds have similar ranges. The δ parameter is in the range [6.1, 7.31], ΔHmix in the range [−18.86, −10.56], Sc in the range [1.67, 1.95], φ in the range [3.91, 6.26], εRMS in the range [0.0605, 0.0729], and VEC in the range [6.63, 7.51]. The green community contains nine HEA compounds: Al0.5CrFeNiCoCuTi0.8, CuCoNiCrAlFeTiV, FeCoNiCrCuAl2.0, FeCoNiCrCuAl2.3, FeCoNiCrCuAl2.8, Al0.5CoCrCuFeNiTi1.0, CoCrFeNiTi0.5, CuNi2FeCrAl1.5, and Cu0.5Ti0.5CrFeCoNiAl0.5. They share three elements: Cr, Fe, and Ni. The HEA compounds in the green community are multiphase alloys. The δ parameter is in the range [5.33, 6.57], ΔHmix in the range [−13.94, −8.05], Sc in the range [1.57, 2.08], φ in the range [5.53, 8.79], εRMS in the range [0.0525, 0.0656], and VEC in the range [6.71, 7.78].
The red community (Figure 7) contains the PdPtCuNiP, TiZrCuNiBe, ZrHfTiCuCo, ErTbDyNiAl, and ZrHfTiCuNi alloys. Each alloy in the red community consists of five elements, and ZrHfTiCuNi and ZrHfTiCuCo share Zr, Hf, Ti, and Cu. All alloys in this community are amorphous. Their descriptor values lie in the ranges δ [9.29, 13.74], ΔHmix [−37.6, −23.52], φ [−2.24, 0.42], εRMS [0.0952, 0.1429], and VEC [4.4, 9.2], and Sc equals 1.61 for all alloys in the red community.
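The element-level comparisons above (shared elements, number of elements per alloy) can be reproduced directly from the formulas. This sketch parses each formula with a simple regular expression (an element symbol is a capital letter plus an optional lowercase letter, followed by an optional molar ratio); it ignores the ratios and is an illustration, not part of the authors' pipeline.

```python
import re

# Element symbol followed by an optional (possibly fractional) molar ratio, e.g. "Ti1.2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def elements(formula):
    """Set of element symbols in an alloy formula such as 'Al0.5CoCrCuFeNiTi1.2'."""
    return {symbol for symbol, _ in TOKEN.findall(formula)}


def common_elements(formulas):
    """Elements shared by every alloy in a community."""
    return set.intersection(*(elements(f) for f in formulas))
```

Applied to the nine alloys of the yellow Louvain community, `common_elements` returns {Al, Co, Cr, Fe, Ni}, and for the blue community it returns {Sr, Ca, Yb, Mg, Zn}, matching the shared elements listed above.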
Figure 7

Four example communities obtained with the developed Louvain algorithm. These four communities were selected from the results to illustrate the functionality of the proposed method.

Figure 8 shows four example communities obtained with the PSO algorithm. For example, the yellow community contains the FeCoNiCrCuAl, AlCrCuFeMnNi, NiCoFeCrAl0.375, MnCrFe1.5Ni0.5Al0.3, and CoCrFeNiPd compounds, which share Fe, Ni, and Cr. The alloys in the yellow community (Figure 8) contain five or six elements; FeCoNiCrCuAl and AlCrCuFeMnNi both contain six elements, five of which they share. Some alloys in the yellow community are multiphase, and others are single-phase (FCC). The descriptor values for the yellow community lie in the ranges δ [4.12, 5.39], ΔHmix [−7.99, −4.78], Sc [1.48, 1.79], φ [13.54, 16.16], εRMS [0.0406, 0.0536], and VEC [7.19, 8.8]. The blue community contains SrCaYbLi0.55Mg0.45Zn, SrCaYbMgZn, and SrCaYbMgZn0.5Cu0.5, the same alloys as the blue community (Figure 7) extracted by the Louvain method. The green community (Figure 8) contains 14 HEA compounds: AlCo3CrCu0.5FeNi, CoCrCuFeNiTi0.8, CoCrCuFeNiTi, CuNi2FeCrAl0.9, CuNi2FeCrAl1.2, MnCrFe1.5Ni0.5Al0.5, FeCoNiCrCuAl1.5, FeCoNiCuAl, FeNi2CrCuAl, FeNi2CrCuAl1.2, Al0.5CrFeNiCoCuTi0.6, Al0.5CoCrCuFeNiTi0.4, Al0.5CoCrFeNi, and AlCo2CrCu0.5FeNi. They share Fe and Ni and contain five to seven elements. All HEA compounds in the green community are multiphase. The δ parameter is in the range [4.6, 6.12], ΔHmix in the range [−9.09, −5.22], Sc in the range [1.52, 1.92], φ in the range [8.92, 13.02], εRMS in the range [0.0454, 0.0605], and VEC in the range [7.0, 8.2].
The red community (Figure 8) contains the PdPtCuNiP, TiZrCuNiBe, ZrHfTiCuCo, ErTbDyNiAl, and ZrHfTiCuNi alloys, the same alloys as the red community (Figure 7) extracted by the Louvain method.
Figure 8

Four example communities obtained with the PSO method. These four communities were selected from the results to illustrate the functionality of the proposed method.

The communities obtained in this study show high quality, as reflected by the modularity criterion. The modularity table reports the community quality obtained with the developed Louvain algorithm and the optimal PSO method. For the developed Louvain algorithm, the modularity is constant at 0.71. For the optimal PSO, the modularity starts at 0.87, rises to 0.89 after 30 iterations, and does not change from iteration 60 through iteration 150. The modularity obtained by the optimal PSO is therefore about 0.89. Finally, the benefits of HEA community detection are discussed. In biology, the analysis of protein networks is useful because proteins in the same community share behaviors and properties, so proteins of the same community behave similarly. Analogously, the purpose of community detection here is to group HEAs with similar elements and phases, such as the alloys in the colored communities. Because HEAs in a community have similar properties, the properties of an alloy can be inferred as soon as its community is determined; for example, the maximum number of elements in the alloy could be predicted from the extracted communities. Another advantage of community detection in HEA interaction networks is phase prediction using ML techniques, as presented in this paper. Because most extracted communities are dominated by a single phase, the unknown phase composition of a compound can be anticipated.
The phase of an unseen compound can be identified from the other compounds in the same community. As shown in Table 2, we counted the phases within each community. The resulting phase-prediction precision (Table 2) is approximately 88% for the Louvain algorithm and approximately 93% for PSO.
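Community-based phase prediction can be sketched as a majority vote. In this sketch, the predicted phase of a community is its most frequent phase label, an unseen compound inherits that label, and the per-community precision is the share of members whose phase matches it; this reading is an assumption, though it is consistent with the precision column of Table 2 being the ratio of its second to its first count column. The phase labels in the usage example ("FCC", "AM", "MP") are hypothetical.

```python
from collections import Counter


def community_phase(phases):
    """Most frequent phase label in a community (the first-seen label wins a tie)."""
    return Counter(phases).most_common(1)[0][0]


def community_precision(phases):
    """Share of members (in %) whose phase equals the community's majority phase."""
    majority = community_phase(phases)
    return 100.0 * sum(p == majority for p in phases) / len(phases)


def predict_phase(community_members, phase_of):
    """Predict an unseen compound's phase from the known phases of its community members."""
    return community_phase([phase_of[c] for c in community_members])
```

A community labeled ["FCC", "FCC", "MP"] predicts FCC with a precision of 2/3 ≈ 66.7%, and a single-member community always scores 100%.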
Table 2

Precision of Louvain and PSO Algorithms in the Phase Prediction of HEA Alloys Indicates That Communities Can Improve the Phase Prediction Precision

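community number | compound phase by PSO | community prediction phase by PSO | precision by PSO (%) | compound phase by Louvain | community prediction phase by Louvain | precision by Louvain (%)
1 | 38 | 30 | 78 | 19 | 18 | 94
2 | 19 | 17 | 89.5 | 18 | 13 | 72.2
3 | 1 | 1 | 100 | 5 | 2 | 40
4 | 2 | 1 | 50 | 1 | 1 | 100
5 | 1 | 1 | 100 | 2 | 1 | 50
6 | 1 | 1 | 100 | 1 | 1 | 100
7 | 1 | 1 | 100 | 1 | 1 | 100
8 | 9 | 9 | 100 | 1 | 1 | 100
9 | 9 | 9 | 100 | 14 | 14 | 100
10 | 5 | 5 | 100 | 18 | 18 | 100
11 | 3 | 3 | 100 | 5 | 5 | 100
12 | 1 | 1 | 100 | 3 | 3 | 100
total average precision | | | 93.125 | | | 88.019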
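The "total average precision" row can be checked from the PSO columns of Table 2. Reading the first two count columns as the number of compounds and the number whose phase matches the community prediction (an interpretation consistent with the precision column), the reported 93.125% is the unweighted mean of the 12 per-community precisions. For comparison, the sketch also computes a size-weighted (pooled) precision; both helper names are hypothetical.

```python
# (compounds, correctly predicted, reported precision %) per community, PSO columns of Table 2.
PSO = [(38, 30, 78), (19, 17, 89.5), (1, 1, 100), (2, 1, 50), (1, 1, 100), (1, 1, 100),
       (1, 1, 100), (9, 9, 100), (9, 9, 100), (5, 5, 100), (3, 3, 100), (1, 1, 100)]


def macro_average(rows):
    """Unweighted mean of the per-community precisions (the 'total average precision')."""
    return sum(p for _, _, p in rows) / len(rows)


def pooled_precision(rows):
    """Precision over all compounds, weighting each community by its size."""
    return 100.0 * sum(c for _, c, _ in rows) / sum(n for n, _, _ in rows)
```

`macro_average(PSO)` returns 93.125, matching the table; pooling the same rows (79 of 90 compounds) gives about 87.8%, because the two largest communities carry most of the mispredicted compounds.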

Data and Software Availability

In this study, we used MATLAB version R2019a (https://www.mathworks.com/). The data set and source code for this paper are available on GitHub: the data set at https://github.com/rghoochannejad/HEAs-Community-Detection/tree/Dataset and the source code at https://github.com/rghoochannejad/HEAs-Community-Detection/tree/main. The data set used in this article can also be found online at https://doi.org/10.1016/j.mattod.2015.11.026.

Conclusions and Suggestions for Future Research

The present study introduced a novel ML-based community detection approach for identifying HEA compounds that behave similarly to each other. First, the descriptors of each compound are analyzed, and the similarities among the alloys in terms of phase composition are calculated. Second, an interaction network of HEAs is constructed from these similarities. The quality and accuracy of the extracted communities, measured by their modularity, were then analyzed with two methods, the Louvain and PSO algorithms, indicating that the proposed method achieves high-quality community detection. This evaluation shows that the detected clusters have strong internal connections among the compounds. Although the results indicate high quality and precision, the method can still be developed further. Future studies could apply other methods to determine more advanced properties of alloys, and the present method could be extended to larger data sets while maintaining its quality. Other ML methods also have great potential for obtaining better results, and such statistical and ML approaches can accelerate research in materials science. The introduced method is not the only efficient way to detect communities, but it can be applied in other areas of materials science to identify other beneficial alloy compositions for industry. Finally, HEA community detection is useful for finding new common features of similar alloys, and, as shown in this study, phase prediction can be performed through community detection with a good precision rate.
