Literature DB >> 35535303

Use of machine learning for unraveling hidden correlations between particle size distributions and the mechanical behavior of granular materials.

Abstract

A data-driven framework was used to predict the macroscopic mechanical behavior of dense packings of polydisperse granular materials. The discrete element method, DEM, was used to generate 92,378 sphere packings that covered many different kinds of particle size distributions, PSD, lying within 2 particle sizes. These packings were subjected to triaxial compression and the corresponding stress-strain curves were fitted to Duncan-Chang hyperbolic models. An artificial neural network (NN) scheme was able to anticipate the value of the model parameters for all these PSDs, with an accuracy similar to the precision of the experiment and even when the NN was trained with a few hundred DEM simulations. The estimations were indeed more accurate than those given by multiple linear regressions (MLR) between the model parameters and common geotechnical and statistical descriptors derived from the PSD. This was achieved in spite of the presence of noise in the training data. Although the results of this massive simulation are limited to specific systems, ways of packing and testing conditions, the NN revealed the existence of hidden correlations between PSD of the macroscopic mechanical behavior.

Entities: Chemical

Keywords: Artificial neural networks; Discrete element method; Geotechnics; Machine learning; Triaxial

Year: 2021 PMID： 35535303 PMCID： PMC9050806 DOI： 10.1007/s11440-021-01420-5

Source DB: PubMed Journal: Acta Geotech ISSN： 1861-1125 Impact factor: 5.570

Introduction

The specific values of properties such as strength, compressibility and permeability of dry and cohesionless coarse grain materials (including sand, gravel, railway ballast or rockfill) depend on the features of the constituent particles (intrinsic properties) and on the way in which the particles are arranged (state parameters). Among the intrinsic properties of a sand, the surface friction, the compressibility and the strength of individual grains, the particle shape and particle size distributions are known to play a crucial role in its macroscopic properties [7, 20, 56, 60, 66]. Relative density and confining pressure are the most influent state variables for dry granular soils [5] and govern the mechanical behavior of the material to a large extent [3, 53, 63]. The relationship between the particle size distribution, PSD, and the mechanical behavior is not yet fully understood. On one hand, the effects of variations in the PSD are not independent from those produced by variations of other intrinsic properties or state parameters. For example, the state parameter , proposed within the theoretical framework of the critical state of sands [5], helps to distinguish between the contractive or dilatant behavior exhibited by a sand upon triaxial compression. However the critical state line, and hence the value of associated to given void ratio e, changes with the PSD [28, 38, 39, 43]. As another example, there is a complex interplay between size and shape polydispersity, as shown by numerical modeling [46]. On the other hand, linking single quantities (maximum and minimum dry density, critical state void ratio, macroscopic friction angle, stiffness, etc.) to a PSD is not immediate, since the latter is a highly variable curve that is many times long-tailed and/or multi-modal. Descriptors derived from the PSD are not enough to anticipate macroscopic (void ratio, stiffness, friction angle) or microscopic features (average coordination number, fraction of non-contributing particles, etc.) obtained after a given process. To the best of our knowledge, neither geotechnical descriptors, such as the (i.e., the sieve size passed by xx percent in weight of the sample), the coefficient of curvature or the uniformity coefficient , nor statistical descriptors (mean, variance, skewness, kurtosis, etc.) enable satisfying estimations. There is not clear procedure to work directly with the whole PSD curve. Even in the case of very idealized systems (e.g., packings of spheres) variations of the PSD may lead to considerably differences in the fabric resulting after a packing protocol [57, 65], in the relative density [20] or in the shear strength [17]. In the case of non-idealized systems this can be even worse, as several kinds of physico-chemical phenomena occur on different length and time scales. Relationships between geotechnical descriptors obtained from the PSD and geotechnical properties have been sought (e.g., [42, 66–68, 70]), but findings are always empirical and limited to a specific set of soils, conditions and stress paths. The use of large datasets enables promising techniques to understand how the complex behavior of granular systems can be anticipated from the microscopic features. For example, the use machine learning techniques, together with complex network theory, has allowed for the establishment of relationships between the fabric of a packing and some macroscopic geotechnical properties, such as the permeability [31, 64] or the effective heat transfer coefficient [14]. These techniques have been also applied for obtaining morphological information of granular materials, including their particle size distribution, from X-ray computed tomography images [36] or sample images [75]. The use of artificial neural networks, or just neural networks (NN), has been proposed as a potentially useful technique to model materials behavior [15, 49]. In the case of geotechnical applications, NNs have been used for unsaturated soils (to predict the shear strength [37], to model their mechanical behavior [29, 74] and to determine the effective stress parameter [2]), for fine-grain soils (to predict the compression index [47, 79], shear strength [50], unconfined compression strength [21], creep index [76, 77] and hydraulic conductivity [76] from index properties), for rocks (to predict the uniaxial compressive strength and the elastic modulus [11]) and for coarse-grain soils—sands and gravels—to model the mechanical behavior [4, 13, 16, 48, 78]. The inputs for these NN approaches included both intrinsic properties and state parameters. In some of these cases the target outputs were directly some model parameters (namely, the compression index [47, 79], creep index [76, 77], shear strength [50], unconfined compression strength [11, 21], apparent cohesion [37], effective stress parameter [2], elastic modulus [11], etc.). In other of the above mentioned cases (i.e., [4, 13, 48]), the purpose of NNs was to reproduce the stress–strain curve by anticipating new values of stress or strains obtained when some others were changed in a controlled way. The datasets were the result of a limited number of laboratory experiments (around several tens to a few hundred). Only in [47], the database included data from near 1 thousand experiments. In some cases, a single stress–strain curve measured in a laboratory experiment was used to gather the data. NNs have also been successfully used to link parameters used in DEM simulation to the macroscopic behavior of granular materials (angle of repose [6], hopper discharge rate [35] or even stress–strain curves in some contemporary works [61, 75], by anticipating the curves according to the instantaneous state of the fabric). These researches have shown how NNs enable for the understanding of the bulk behavior of granular materials with a reduced number of virtual experiments. One of the main issues in geotechnical engineering is dealing with uncertainty, especially to analyze an engineering-scale problem and conduct risk assessment. When the model parameters are estimated from intrinsic or state properties by means of data-driven frameworks, it seems necessary to take into account the uncertainty of input and output parameters. To understand how uncertainty propagates, NNs can be combined with Bayesian methods [71] or dropout regularization techniques [59]. For example, dropout has been used in some contemporary works to estimate the mechanical properties of soft clays [76] and of idealized granular materials [61] or machine learning combined with a Bayesian approach has been followed to quantify uncertainty in coarse-grained models [30]. Another contemporary work [79] has investigated the performance of five commonly used machine learning algorithms (namely, back-propagation NN, extreme learning machine, support vector machine, random forest—RF—and evolutionary polynomial regression) when predicting the compression index of remoulded clays from three input parameters (void ratio, water content and plasticity index). This study indicated that RF is recommended when the ranges of variation of input variables in database are large. The capacity of RF has been shown, for example, in risk prediction of deep foundation pits in subway stations [81] (a situation in which there is a frequent imbalance between low-risk and high-risk data). In this research the role played by the PSD in the mechanical behavior of an idealized system of polydisperse spheres has been investigated by means of massive numerical testing with the DEM and NNs. To do that, we have used simplified models that have been conceived to isolate the effect of the PSD on the mechanical behavior. These models are simple but still exhibit a non-obvious relationship between the PSD and stress versus strain curves. Although the usability of results is limited to specific systems and testing protocols, the approach may be useful for other cases and may shed light on the mechanical behavior of dry coarse-grain soils. This is a very timely moment since techniques such as computer vision [51] or X-ray tomography are allowing for an exhaustive characterization of the microstructure of granular packings (including PSD and fabric) [69]. Potential applications of the combination of DEM and NNs in the context of staggered multilevel material identification procedures [41] can also be interesting. There are some considerable differences with respect to the previously referenced works on the use of NNs for geotechnical applications. On one hand, the NN is built on a dataset that was the outcome of a series of more than 90 thousand virtual experiments, performed with samples of varying PSD. The set of PSDs is the outcome of a systematic exploration of possible cases lying within two particle sizes. The probability and size increments used during a discretization of the sample space determined the number of PSDs to explore. We simulated all the cases to have a sufficiently large data sample, to find out how the accuracy of the estimations depends on the size of the training dataset and to know what the Probability Distribution Functions, PDF, of the target outputs for the NN are. On the other hand, we exclusively focus on the role played by the PSD in the macroscopic behavior observed in a specific test. We use quite simple granular systems (made of elastic and frictional spheres) and experiments to exclusively focus on the differences in the mechanical behavior due to the PSD. Our NNs anticipate parameters for widely used constitutive relationships, which are a valuable input for continuum based approaches and which can be used independently. Therefore the approach is not to store data of the current state of a packing in order to anticipate its evolution (as in [4, 13, 48, 61, 75]), but to callibrate a constitutive model just by knowing the PSD. The proposed approach is ab initio as phenomenological laws are not used (except that for the contact mechanics interaction). Neither intrinsic parameters that cannot be defined on the grain scale (such as maximum or minimum dry density, etc.) nor state parameters related to packing features (void ratio, average coordination number, etc.) were introduced. The mechanical features of particles and the packing and compression protocols have always been the same and the only difference from one case to another was the PSD. These virtual samples shew the typical behavior of loose sands in triaxial compression but the stress versus strain curves changed from one case to another. The results of these experiments are not intended for other systems or tests, but they are used to illustrate a methodology based on NNs. Although these are simplified models, far from real soils and not capable for accounting for relevant aspects affecting the mechanical behavior (such as the packing ratio or the particle shape), they still exhibit complexity. Albeit the simplicity of these simplified models, the relationship between the stress versus strain curves and the PSD remains totally unknown. An additional difficulty is that the output data used to train ML are noisy because the limited number of particles in the simulation adds some uncertainty. This could be also the case in which the data are measured with limited precision. The data were fitted to the celebrated Duncan–Chang hyperbolic model [12], which has a successful history of application in soil mechanics despite its simplicity and is still being used (e.g., [25]). This model is defined by two model parameters, namely, the tangent elastic modulus and the ultimate deviatoric stress . Thus, the proposed NN receives as input a discrete description of the PSD of a granular material at hand, and returns as output and . The performance of NNs was also compared to that of two multivariate linear regressions, MLR, which explained model parameters as a linear combination of common geotechnical and statistical descriptors derived from the PSD. As it will be illustrated below, the network is able to predict the Duncan–Chang model’s parameters with higher accuracy than MLR, extremely fast, and even in the presence of noisy training data. Indeed, it proved itself to be a powerful tool for unraveling the existing correlations between PSD of granular materials and their macroscopic mechanical behavior, hidden to the naked eye. The rest of this paper is structured as follows: Initially, the discrete element method used for the generation of virtual triaxial experiments, as well as the considered PSDs and the obtained results are described in Sect. 2; secondly, in Sect. 3, we present the basic principles of artificial neural networks, together with the design of the networks used in this work and their training process; the results obtained with the NNs are presented and discussed in Sect. 4, as well as a study of the amount of required data to train them and their robustness with respect to noisy data; finally, conclusions are drawn in Sect. 5.

Massive DEM triaxial testing

The discrete element method [8], DEM, has been proven to be a very effective tool for the study of the macroscopic mechanical behavior of granular materials under drained [23, 27, 33, 34, 45, 55, 58, 63, 72, 73, 80] and undrained [18, 26, 54] triaxial or biaxial compression. In this work the DEM is used to perform virtual drained triaxial tests for a large number of sphere packings with different PSDs. In what follows we describe the model used for carrying out such simulations, as well as the obtained results.

Numerical setup

We performed 92,378 DEM simulations of triaxial compression tests on samples made of particles following varying PSDs. The different PSDs used in each case were selected according to a systematic exploration described as follows: particle diameters ranged between m and m. This narrow range of particle sizes was considered because of computational issues associated to the DEM (critical timestep decreasing with the particle size and computational time increasing with the number of particles in the simulation). This interval was divided into 10 equal size bins with , and . The central size of each bin is . The expected percentage in mass of the particles within each size bin i is denoted as . We consider that is a discrete variable that can take values from 0.0 to 1.0 and spaced by 0.1. All possible combinations satisfying are considered. This procedure led to the 92,378 cases of PSDs that were subsequently used in the triaxial tests. Once all the PSDs were defined, a random sample of particles was generated for each of them. The mass of the particles was uniformly distributed in each bin. The considered set of PSDs includes very different kinds of granular systems: Monodisperse, well graded, gap-graded multimodal distributions, etc. A few of them, which could be more recognizable by readers, have been particularly considered for illustrative purposes. These special PSDs are labeled and shown in Fig. 1a.

Fig. 1

9 special PSDs (out of 92,378) were selected for illustrative purposes. The upper figures show the cumulated percentage passings. The figures below show the stress–strain curves obtained through virtual testing 3D Models of YADE-DEM illustrating the steps of numerical experiments: (1) A random loose cloud of around 20,000 particles is located within a box; (2) the simulation box is reduced to achieve a packing that is in equilibrium under isotropic stress; and (3) the simulation box is reduced in one direction while the stress is maintained in 2 perpendicular directions For each PSD, a sample was generated by randomly locating a loose cloud of around 20,000 spherical particles within a cubic box (Fig. 2a). We imposed periodic boundary conditions and then the cubic box was isotropically shrunk to achieve a dense packing under isotropic compression conditions kPa, where , and are the principal stresses (Fig. 2b). This confining stress was pretty high considering the stiffness of the particles (Young’s modulus of MPa). This led to a loose sand behavior, which fits very well with the Duncan–Chang hyperbolic model. Then the stress was kept in 2 perpendicular directions ( and ), while the sample was shortened in the third perpendicular direction until reaching a unit strain (Fig. 2c). The corresponding average stress was measured at several strain levels. The deviatoric stress–strain curve, versus , was registered and fitted to a Duncan–Chang hyperbolic model [12], which is defined by 2 model parameters, namely, the tangent elastic modulus, and the ultimate deviatoric stress :

Fig. 2

3D Models of YADE-DEM illustrating the steps of numerical experiments: (1) A random loose cloud of around 20,000 particles is located within a box; (2) the simulation box is reduced to achieve a packing that is in equilibrium under isotropic stress; and (3) the simulation box is reduced in one direction while the stress is maintained in 2 perpendicular directions

Nonlinear least squares method was used to fit Eq. (1) and obtain the model parameters in each experiment. The fitting was done in the interval . A few examples of generated curves (corresponding to special PSDs) can be seen in Fig. 1b.

Numerical model

We used the DEM implemented in YADE-DEM [62].1 Particles behave as rigid solids that obey the laws of classical mechanics. The interaction between particles is produced through a soft contact model. In particular, we used a simple linear elastic and frictional contact law. This is a common choice in DEM simulation [8, 24]. Normal forces between particles are thus computed aswhere is the normal force exerted by particle j on particle i, is the distance overlap, and are the particles’ radii, is their relative position vector, is its associated unit vector, and is the normal contact stiffness. In this model, was related to the Young’s modulus of the material, MPa, as . If two particles that were previously in contact (i.e., ) are displaced in a direction perpendicular to , an opposite shear force appears. Shear forces are limited by the inter-particle friction:where is the shear force exerted by particle j on particle i, is the total tangential displacement of the contact, radians is the inter-particle friction angle and is the shear stiffness. The density of particles kg/m (as the size of the particles and the stiffness) was scaled to reduce the collision time and therefore the critical timestep used in the explicit integration of the equations of motion. The maximum strain rate imposed during the triaxial compression was fixed according to this critical timestep and updated on the fly to speedup simulations. A numerical damping was used to dissipate the kinetic energy. Details can be found in [62]. Coefficient of variation of the parameters for the Duncan–Chang hyperbolic model as a function of the number of particles in the sample. Samples followed the uniform PSD in Fig. 1a. The experiment was repeated 15 times for each number of particles

Precision and performance

As the generated samples include a finite number of particles, the computed stress–strain curves for a single PSD may fluctuate around the expected values. Accordingly, the values of the Duncan–Chang model parameters obtained from a single DEM triaxial test, are only punctual estimations, , , which are generally different from the expected values, and . There are several reasons for this variability: The size of the particles used in each simulation is randomly chosen according to the PSD, particles are randomly located within the simulation box and the system is chaotic. In any case, the larger the sample, the smaller the fluctuation. The expected variability of measurements was assessed through a series of virtual triaxial tests. These tests were performed with samples made of varying number of particles but that always followed the same PSD (the uniform PSD in Fig. 1). The experiment was repeated 15 times for each number of particles to gather a statistical sample of and values. A coefficient of variation was defined for each model parameter x as (where is the sample standard deviation, is the sample mean and M stands for measurement). Results are shown in Fig. 3. In order to achieve a good compromise between accuracy and computational cost, the size of samples was limited to around 20,000 particles in the DEM experiments used to train the NN. With this number of particles the of and are expected to remain around 0.05 and 0.10, respectively (see Fig. 3). These numbers indicate that the measurement of is more precise than that of .

Fig. 3

Coefficient of variation of the parameters for the Duncan–Chang hyperbolic model as a function of the number of particles in the sample. Samples followed the uniform PSD in Fig. 1a. The experiment was repeated 15 times for each number of particles

The numerical experiments were computed using the version 2018.02b of YADE-DEM [62], running on Ubuntu 18.04.4 64 bits, on a server machine with four processors Intel Xeon Gold 6148 2.40 GHz, with 20 physical cores each, and 1 TB of RAM memory. As a rough estimation, each single DEM simulation took on average 1 h and 20 min on a single core. Therefore, the total computation time for processing the 92,378 samples was around 5135 days. In order to speed up the process, many computer cores were used for running multiple independent simulations in parallel. Thus, the total process time was reduced to 4 and a half months of computation, approximately.

Virtual triaxial testing results

The 92,378 samples were virtually subjected to triaxial compression. The corresponding stress–strain curves presented the typical behavior of loose sands. A good matching between each series of data and a Duncan–Chang hyperbolic curve was achieved. The values of obtained from DEM after a flat sampling over the set of PSDs, are distributed as shown in Fig. 4a. The sample mean is , its standard deviation is and the maximum and minimum values are and , respectively. The coefficient of variation of this problem quantifies how the expected value of a specific PSD may separate from the mean value across all the PSDs. Regarding the tangent elastic modulus, the coefficient of variation is . With respect to the values of obtained from DEM, the distribution is shown in Fig. 4b, the sample mean is , its standard deviation is () and the maximum and minimum values are and , respectively.

Fig. 4

Histograms of and values obtained from virtual triaxial testing with the set of 92,378 PSDs explored, and variation between both values. The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2

Table 2

Values of , , obtained with virtual triaxial simulations, and other indicators for the cases in Fig. 1

PSD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{E}_{0}$$\end{document}E^0	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hat{\sigma }_{\mathrm{ult}}$$\end{document}σ^ult	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{D}$$\end{document}D¯	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_D$$\end{document}sD	Skew.	Kurt.	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\mathrm{u}}$$\end{document}Cu	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\mathrm{c}}$$\end{document}Cc
Big-small	6.59	2.48	0.109	0.01	2.66	5.11	1.0	1.00
Monodisperse	6.77	2.35	0.105	0.00	–	–	1.0	1.00
Decreasing	6.49	2.99	0.105	0.01	0.60	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.80$$\end{document}-0.80	1.1	0.91
Increasing	6.85	2.41	0.105	0.01	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.60$$\end{document}-0.60	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.80$$\end{document}-0.80	1.2	1.01
Bell	6.91	2.32	0.105	0.01	0.00	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.50$$\end{document}-0.50	1.2	1.01
5-modal	7.60	2.91	0.105	0.03	0.00	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-1.30$$\end{document}-1.30	1.6	1.05
Small-big	8.39	2.25	0.100	0.02	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-2.66$$\end{document}-2.66	5.11	1.8	1.83
Uniform	8.22	2.56	0.100	0.05	0.00	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-2.00$$\end{document}-2.00	1.8	0.97
2-modal	9.31	2.72	0.100	0.05	0.00	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-2.00$$\end{document}-2.00	2.5	0.40

The descriptors included in the table, correspond to the mean , standard deviation , skewness and excess kurtosis of the particle diameters; and the curvature and uniformity coefficients (geotechnical indicators). and . These values are presented graphically in Figs. 4, 5 and 6

These results evidence that, in these experiments, significative variations of the Duncan–Chang model parameters can be found depending on the PSD. Unfortunately, there is no sign of correlation between the two model parameters (see Fig. 4c). In addition, they neither correlate to the set of inspected statistical or geotechnical descriptors described in Table 1, as it can be observed in Figs. 5 and 6, respectively.

Table 1

Set of descriptors used to relate the parameters of the Duncan–Chang model to the PSD

Descriptor	Symbol	Definition
Geotechnical descriptors
Uniformity coefficient	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\text {u}}$$\end{document}Cu	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\text {u}} = \frac{D_{60}}{D_{10}}$$\end{document}Cu=D60D10
Coefficient of curvature	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\text {c}}$$\end{document}Cc	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{\text {c}} = \frac{D_{30}^2}{ D_{10} D_{60}}$$\end{document}Cc=D302D10D60
Statistical descriptors
Expected value	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{D}$$\end{document}D¯	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{D} = \sum _i p_i d_i$$\end{document}D¯=∑ipidi
Standard deviation	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_D$$\end{document}sD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_D = \sqrt{\sum _i p_i \left( d_i - \bar{D} \right) ^2}$$\end{document}sD=∑ipidi-D¯2
Skewness	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\mu }_3$$\end{document}μ~3	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\mu }_3 = \frac{\sum _i p_i \left( d_i - \bar{D} \right) ^3}{s_D^3}$$\end{document}μ~3=∑ipidi-D¯3sD3
Excess Kurtosis	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${K}_{\left[ D \right] } - 3$$\end{document}KD-3	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${K}_{\left[ D \right] }-3= \frac{\sum _i p_i \left( d_i - \bar{D} \right) ^4}{s_D^4} - 3$$\end{document}KD-3=∑ipidi-D¯4sD4-3

Fig. 5

Variation of , obtained from virtual triaxial testing, compared to different statistical descriptors, namely the mean diameter of particles, the standard deviation, skewness and excess kurtosis of the PSD (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2

Fig. 6

Variation of , obtained with virtual triaxial testing, with geotechnical descriptors of the PSD, namely the curvature and the uniformity (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2

Set of descriptors used to relate the parameters of the Duncan–Chang model to the PSD For the 9 special PSDs included in Fig. 1, the values of the considered statistical and geotechnical descriptors are gathered in Table 2 and also shown in Figs. 5 and 6. Variation of , obtained from virtual triaxial testing, compared to different statistical descriptors, namely the mean diameter of particles, the standard deviation, skewness and excess kurtosis of the PSD (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2 Variation of , obtained with virtual triaxial testing, with geotechnical descriptors of the PSD, namely the curvature and the uniformity (see Table 1). The cases reported in Fig. 1 are highlighted, while their actual values are gathered in Table 2 Values of , , obtained with virtual triaxial simulations, and other indicators for the cases in Fig. 1 The descriptors included in the table, correspond to the mean , standard deviation , skewness and excess kurtosis of the particle diameters; and the curvature and uniformity coefficients (geotechnical indicators). and . These values are presented graphically in Figs. 4, 5 and 6 In the light of these results, the establishment of relationships between PSD descriptors and Duncan–Chang model parameters does not seem feasible. Geotechnical descriptors are useless in these cases as they do not give too much information of the whole PSD and cannot distinguish, for example, between curves that have the same , and but are very different from each other. Statistical descriptors provide a better idea of the PSD curve, but they are also pointless to explain the mechanical behavior. This is probably due to the fact that the mechanical behavior is a direct consequence of the features of the contact network and each PSD creates its own complex topology with varying consequences [40].

Artificial neural networks

Artificial neural networks, or simply neural networks (NN), are biologically inspired computing systems able to learn from data. Data abundance, together with increasing computing power, are probably the two main factors behind the great success of these algorithms and their exponential growth during the last decade, despite the fact that their origin dates back to the early 40s of 20th century [22]. Artificial neural networks, together with other machine learning techniques, have been proven very successful tools for tackling tasks as image recognition, language processing or financial forecasting, to name just a few. Beyond doubt, machine learning in general, and neural networks in particular, are powerful tools for untangling complex patterns on large datasets. Motivated by the apparent lack of correlation between PSD descriptors and the Duncan–Chang model parameters evidenced in the previous section, in this work we present, as an accurate alternative, the use of NNs for inferring the macroscopic mechanical behavior of polydisperse granular packings. As it will be seen in the results presented in Sect. 4, this tool will help us to find hidden connections between the particle size distribution of spherical packings and their macroscopic mechanical behavior.

The multilayer perceptron

One of the most simple and commonly used NN architectures is the Multi-Layer Perceptron (MLP). The MLP can be seen as a nonlinear function that maps input data to output data. It consists of several layers: One input layer, one or more (intermediate) hidden layers, and one output layer. The input information is feed-forwarded from the input layer, through all the intermediate layers, up to the output layer. Each layer is composed of one or more nodes (or neurons) that are the basic computational units (see Fig. 7). At each layer, the neurons are fed with the output generated by the neurons of the previous layer, they process the data, filter it through a nonlinear activation function, and produce new output values that feed the neurons of the next layer (if any). The presence of nonlinear activation functions grants NNs the ability of approximating non-trivial functions. Indeed, as stated by the universal approximation theorem [10], feed-forward NNs with a single (finite) hidden layer and differentiable activation functions, can approximate any continuous function; and in the case of two hidden layers or more, any function [9].

Fig. 7

Multilayer perceptron architecture

Multilayer perceptron architecture Let us describe how a MLP generates output values from given input. Let be the number of layers in a NN, such that and , and let be the number or neurons of the l-th layer, with , where the layer 0 corresponds to the input layer and the L-th layer is the output one. We denote as the input vector of the l-th layer and, accordingly, is the output of that layer and the input of the next one. Thus, are the input values of the network and are the output ones. Starting from input vector , the values of layers 1 to L are computed through the recursive expression:where and are the weights matrix and bias vector, is a matrix-vector product that results in a vector of length and is the nonlinear activation function. Among others, the Rectified Linear Unit (ReLU) function [44], defined as , is one of the most commonly used activation functions. In the case in which z is a vector, as it is the case of Eq. (4), is applied to each vector component independently. On the other hand, the coefficients of the weights matrix and the bias vector are a collection of trainable parameters that describe the NN. Those parameters, initially unknown, are determined by means of a process known as training. The goal of the training is to find a set of values for those parameters that leads to an accurate input–output mapping of the network for the training dataset (a subset of the available input–output samples). Finding the locally optimal parameters is a minimization process of a (loss) function that measures the distance (in a certain norm) between the known output sample values and the ones predicted by the network. The mean squared error norm, used in this work, is one of the most commonly used loss functions. Such optimization is commonly carried out by means of gradient-based iterative algorithms, like the ones of the family of stochastic gradient descendent methods, as it is the case of Adam [32]. For an in-depth discussion of MLPs and other NN architectures we refer the interested reader to [19, 22].

NNs for predicting Duncan–Chang model’s parameters from PSDs

In this work we used MLP networks for predicting the parameters of the Duncan–Chang’s model from a discrete description of the particle size distribution of a given spherical packing. The use of MLP networks in the context of DEM simulation has allowed for the establishment of links between model features and bulk material behavior [6]. The definition, training and evaluation of the NNs was implemented using TensorFlow [1]. Thus, as it can be seen in Fig. 7, the designed network receives as input the ten PSD related values , defined in Sect. 2.1, and returns as output and . Therefore, the input and output layers present 10 and 2 neurons, respectively. The best network architecture (number of hidden layers and neurons) for the problem at hand is unknown a priori. Thus, in order to choose a good architecture, we systematically explored network configurations with different number of hidden layers and neurons per layer. In the results presented in Sect. 4, networks with 1, 2, 3 and 4 hidden layers and 8, 16, 32 or 64 neurons each (16 different architectures) were considered. For all of them, the ReLU activation function was used for all the layers, including the output one. The available virtual triaxials dataset (92,378 samples), was divided into three separated groups: the training dataset, composed of of the total samples available (66,152); the cross-validation dataset, of the total samples (7390); and the test dataset, constituted by the remaining samples (18,476). These three datasets were chosen randomly, nevertheless, they remain constant along the different analyses performed. Whereas the training dataset was used for training the NNs, the cross-validation dataset helped us to compare the networks’ performance and verifying the absence of undesired overfitting effects during the training process. Finally, the test dataset was used for measuring the accuracy of the chosen networks when predicting a series of cases that were not used during the training process. The chosen splitting () is inspired by the typical values that can be found in the literature (see, e.g., [19]). The training process of all the networks was carried out using Adam [32] with 1000 epochs (training iterations through the whole test dataset). And, in order to speedup the training process, the input and output data were previously normalized. Three different learning rates were considered for Adam, namely . The network’s training is an inherently random process for two main reasons: Adam is by definition a stochastic algorithm in which the training samples are processed in a random order at each iteration; and the network’s weights are randomly initialized. Thus, in order to overcome these two sources of randomness, each one of the 16 network architectures was trained 5 times for each learning rate. After this training process, the network with the lowest loss function value for the cross-validation dataset was chosen. No large differences were observed among the different architectures, nevertheless, a NN with a single hidden layer and 32 neurons on that layer presented a slightly better performance (network NN1 in Table 3). Its loss error training history is plotted in Fig. 8 for the training and validation datasets. As it can be appreciated in that figure, the validation error is slightly higher than the training error, as expected, and no signs of overfitting were noticed. In addition, it can be noticed that the choice of 1000 training epochs seems to be an overconservative choice.

Table 3

Different neural networks architectures considered in this work

Name	# Hidden layers	# Neurons per h. layer	r
NN1	1	32	100
NN2	4	8	10
NN3	4	8	5
NN4	1	8	1
NN5	1	8	0.5
NN6	1	8	0.2
NN7	1	8	0.1

Each NN consists of a different number of (#) hidden layers and neurons per hidden layer, while the input and output layers have 10 and 2 neurons, respectively. Each network in the table was trained with a different number of samples (r denotes the % of the full test dataset)

Fig. 8

Training and validation loss error histories for the chosen neural network, denoted as NN1 in Table 3. NN1 was trained using Adam with a learning rate

Different neural networks architectures considered in this work Each NN consists of a different number of (#) hidden layers and neurons per hidden layer, while the input and output layers have 10 and 2 neurons, respectively. Each network in the table was trained with a different number of samples (r denotes the % of the full test dataset) Training and validation loss error histories for the chosen neural network, denoted as NN1 in Table 3. NN1 was trained using Adam with a learning rate

Prediction of Duncan–Chang model parameters through neural networks

The ability of the NNs described in Sect. 3.2 to predict the values of and from PSD information, is discussed in this section. In order to assess the prediction ability we consider deviations between NN predictions and DEM measurements for the same PSD. The relative deviation for the model parameter x ( or ) associated to a PSD is evaluated according to:where is the DEM measurement and is the NN estimation. Similar definitions are used for other estimators different from NN (e.g. represents the estimation of model parameter x by a multiple linear regression). In contrast to deviations, errors are defined with respect to the expected value of each model parameter, ( or ), associated to a PSD. However the expected values are usually unknown. They would be obtained with an infinitely large sample or by averaging over many random realizations of the triaxial test with packings following the same PSD. The mean squared deviation for parameter x, , is defined as:where is the number of cases used in the testing set. As mentioned above, a randomly chosen test dataset of of the sample cases (18,476 out of 92,378) was used for testing the network’s accuracy. These data are new to the network, in the sense that they were used neither during the training nor the cross-validation processes. As presented below, different networks were trained using varying number of samples, nevertheless, the test dataset used for evaluating the networks’ performance was kept constant along all the cases considered.

Neural networks ability to predict the Duncan–Chang model parameters

Let us consider the network NN1 (see Table 3), trained using the full test dataset (80% of the 92,378 cases, see Sect. 3.2). For this specific NN the corresponding distributions of relative deviations within the test dataset are shown in Fig. 9. The NN1 anticipated values with deviations that fluctuated around 0. The standard deviations were and . The accuracy with the tangent elastic modulus is higher than with the ultimate deviatoric stress. This is consistent with the fact that the precision in the measurement is higher and the observed variation across the considered cases is lower in the case of .

Fig. 9

Histogram of the relative deviations between NN and DEM in the estimation of Duncan–Chang model parameters when 80% of the experiments were used to train the network. Results obtained with the network NN1 (see Table 3) For the sake of comparison, if the outcomes of the NN had been equal to the expected values, to those predicted by multiple linear regression or to random estimations, then the average deviations would have been considerably higher. In effect, we used two random estimations: Random 1, which followed the observed probability distribution functions, PDFs, of and (Fig. 4a, b); and Random 2, which followed uniform distributions lying between and (or between and ). We also compared the outcomes of NN to those obtained from multiple linear regressions, MLR. We used statistical and geotechnical descriptors as dependent values. The corresponding linear models are denoted as MLR and MLR, respectively. These models take the formwhere x is the dependent Duncan–Chang’s model parameter, either or , and and are the MLR coefficients (included in Table 4). The comparison between the predictions of the several methods and the measured values is summarized in Table 5. These results evidence the superior performance of the NN to predict the model parameters.

Table 4

Model parameters and coefficient of determination of the multiple linear regressions used to estimate Duncan–Chang models from statistical and geotechnical descriptors included in Table 1

MLR	x	Descriptors, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$x^{(0)}_i$$\end{document}xi(0)						\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$R^2$$\end{document}R2
MLR	x		Intercept	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{D}$$\end{document}D¯	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_D$$\end{document}sD	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\bar{\mu }_D$$\end{document}μ¯D	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${K}_{\left[ D \right] } - 3$$\end{document}KD-3
MLR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{{\text {s}},\, E_{0}}$$\end{document}s,E0	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$E_{0}/ E$$\end{document}E0/E	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _i$$\end{document}αi :	0.062	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.055$$\end{document}-0.055	0.944	− 0.007	0.003	0.667
MLR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{{\text {s}},\sigma _{\text {ult}}}$$\end{document}s,σult	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _{\text {ult}} / E$$\end{document}σult/E	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha _i$$\end{document}αi :	0.037	\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$-0.120$$\end{document}-0.120	0.096	0.000	− 0.000	0.326

Table 5

Comparison of the standard deviation of deviations and the mean square deviation with DEM for NN1 predictions, MLR and some random estimations (Mean: mean value of the training dataset; Random 1: Observed PDF; Random 2: Uniform PDF)

	Estimation
	NN1	MLR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{{\text {s}}, x}$$\end{document}s,x	MLR\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$_{{\text {g}}, x}$$\end{document}g,x	Mean	Random 1	Random 2
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{\varDelta _{E_{0}}}$$\end{document}sΔE0	0.046	0.067	0.065	0.106	0.097	0.131
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathrm{MSD}}_{E_{0}} \times 10^5$$\end{document}MSDE0×105	1.383	2.961	2.805	8.000	12.77	39.99
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$s_{\varDelta _{\sigma _{\text {ult}}}}$$\end{document}sΔσult	0.115	0.124	0.124	0.155	0.167	0.178
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\mathrm{MSD}}_{\sigma _{\text {ult}}} \times 10^5$$\end{document}MSDσult×105	1.171	1.295	1.274	1.750	2.153	3.355

Model parameters and coefficient of determination of the multiple linear regressions used to estimate Duncan–Chang models from statistical and geotechnical descriptors included in Table 1 Comparison of the standard deviation of deviations and the mean square deviation with DEM for NN1 predictions, MLR and some random estimations (Mean: mean value of the training dataset; Random 1: Observed PDF; Random 2: Uniform PDF) To correctly assess the accuracy of the NN, it is worth recalling that the data are pretty noisy (DEM measurements with a coefficient of variation for measurements of and , as seen in Sect. 2.3). It is also important to mention that the PDFs of and , when all PSDs are considered, are bell-shaped (cf. Fig. 4a, b) with and , respectively. Thus, despite the narrow margin left by the measurement precision and the distribution of values associated to this problem, the NN anticipated the Duncan–Chang model parameters from the PSD with the same accuracy than the precision of the DEM experiments with which it was trained. The accuracy was higher than that obtained from multiple linear regressions. The capability of NNs to anticipate the model parameters from the PSD unveils the existence of hidden and nonlinear correlations between the PSD and the macroscopic mechanical behavior of granular materials, which are unravelled by NNs. This result is even more interesting taking into account the fact that the DEM data used for training the NN are noisy, as discussed below in Sect. 4.3.

Neural network accuracy with respect to the size of the DEM training dataset

Once we knew the expected accuracy of the NN predictions, we progressively reduced the size of the training datasets. Along all the presented results, the test cases were always the same 20% subset of the total. Our goal was to estimate the number of DEM tests (out of the 73,902 possible) that are needed to effectively train the NN without significantly compromising its accuracy. To assess the accuracy of a NN trained with a certain subset of the training dataset, we evaluated the network for the test dataset and computed the mean squared deviations, , where x refers to the model parameter (either or ). The subscript r denotes the percentage of the potential training cases (out of the 73,902 possible) that were used in each training set. The exact number is referred to as , i.e.: E.g., means that only samples of the test dataset were used to train the network. Variation of the mean squared deviations between NN and DEM estimations, respect to the size of training dataset, for the parameters and . Dashed lines represent the mean squared deviation after the repetition of the most unlikely DEM simulations, whereas solid lines regard the first results. The neural networks used for each training dataset r are described (see Table 3) Figure 10 shows how the performance of the NN is barely affected by the size of the training dataset, even when this is drastically reduced. The networks used in Fig. 10 are defined in Table 3. With only of the potential cases (around 700 DEM experiments), the network NN4 was able to predict the Duncan–Chang model parameters for the test dataset (18,476 samples) with almost the same accuracy as NN1. Thus, we conclude that it is possible to train a NN that accurately predicts the Duncan–Chang parameters from PSDs by just using a dataset with less than one thousand DEM simulations. It is also important to remark that, to predict the model parameters for a new PSD would take more than 1 hour of computing time, using a DEM model analogous to the ones used in this work, whereas, using an already trained NN the time is in the order of the microseconds.

Fig. 10

Variation of the mean squared deviations between NN and DEM estimations, respect to the size of training dataset, for the parameters and . Dashed lines represent the mean squared deviation after the repetition of the most unlikely DEM simulations, whereas solid lines regard the first results. The neural networks used for each training dataset r are described (see Table 3)

On the other hand, by observing the asymptotic behavior of the mean squared deviations at , it seems worthless to generate a larger training dataset, or to reduce the test dataset in favor of the training one. Thus, the test dataset was consciously chosen as of the total samples, in order to guarantee that the network’s performance is evaluated using a reasonably large test dataset. Finally, as it can be seen in Fig. 10, the networks’ accuracy dropped for networks that were trained with less than 1% of the potential training samples. For these small datasets, the accuracy of the NN prediction tended to the value obtained when the sample means of and are used as estimations.

Neural network robustness with respect to noisy DEM training data

As it can be observed in Fig. 9, despite the good agreement between NN and DEM predictions for most of the test cases, the deviations were relatively high in some of them. Using network NN1 (see Table 3), the maximum absolute deviations were and . Nevertheless, a high deviation just means that DEM and NN estimations do not agree, but does not necessarily imply that the NN prediction is wrong. In order to determine whether these deviations were due to inability of the NN to predict the DEM estimation or to unlikely estimations of the model parameters from the DEM, we repeated the virtual triaxial testing in the cases with highest deviations. We considered the networks that were trained with 1%, 5%, 10% and 100% of potential training cases (networks NN1, NN2, NN3 and NN4 in Table 3). We identified the 100 predictions with the highest deviation in and , for the networks NN2, NN3 and NN4, and the worst 1000 deviations for the network NN1. Many of them overlapped, so we finally selected around 1450 experiments to repeat. It is worth emphasizing that we did not repeat some of the cases to train the NN again in order to achieve a better matching with different data, the NNs remained unchanged. After the repetition of these simulations, the relative deviation were considerably reduced in most of these cases (see Figs. 10, 11). The standard deviation of the relative deviations over the whole test dataset were also reduced: went from 0.046 to 0.037 and went from 0.115 to 0.090. This proves that for the repeated cases the first DEM measurement was very unlikely, whereas the NN prediction was much more accurate.

Fig. 11

Discrepancies between NN predictions and first and second DEM measurements in the cases that had given the highest deviations when comparing the first DEM measurement to the NN predictions. Results obtained with the network NN1 (see Table 3) This result does not come as a surprise: As already highlighted in some recent works (see, e.g., [52]), NNs are robust to a certain extent respect to mislabeled or noisy training data. In the context of this work, this can be understood based on the fact that the NN was trained using datasets that contain many test cases corresponding to PSDs that are very close to those being troublesome. Therefore the NN downplays the contribution of outliers. Thus, as it was done in this work, the trained NN can also be used as a tool for identifying unlikely estimations of the DEM. In order to further support this claim, one of the cases with the highest deviation between the NN estimation and the DEM measurement (PSD case 59861, see Fig. 12a), was more thoroughly analyzed. This PSD was used to randomly generate 1000 new packings to be subjected to DEM triaxial compression. The stress–strain curves were fitted to Duncan–Chang model, generating statistical samples of and values for this single PSD. With such large samples we could estimate the expected values of the model parameters for PSD 59861 from the samples means. Focusing on the tangent stiffness , the obtained sample mean was , its standard deviation was () and the minimum and maximum values were and , respectively.

Fig. 12

The case 59861 showed one of the highest relative deviations between DEM and NN estimations. The packing generation and triaxial test were repeated 1000 times for that specific PSD. Its PSD and the histogram of predicted values are shown together with the NN and MLR estimations and first and second DEM measurements

Figure 12b shows the histogram of for these 1000 triaxial tests. The value estimated in the first DEM test was . Therefore, the first DEM measurement provided very unlikely model parameters () and this is the reason why the deviation with NN estimation was so large. When the experiment was repeated for a second time, the new DEM estimation was , which is considerable closer to the expected value (). In contrast, the NN (which was trained from noisy data) predicted a tangent stiffness value of , which is really close to the expected value (). Multilinear regressions predicted values of and , which are closer to the expected value than the first DEM measurement but are not as good as the NN’s prediction ( and ). The case 59861 showed one of the highest relative deviations between DEM and NN estimations. The packing generation and triaxial test were repeated 1000 times for that specific PSD. Its PSD and the histogram of predicted values are shown together with the NN and MLR estimations and first and second DEM measurements

Conclusions

This research article illustrates how to use machine learning techniques to anticipate the relationship between the PSD and the mechanical behavior in specific conditions of virtual testing. The research is unprecedented because of the size of the database and because the simplified models used were conceived to isolate the effect of the PSD on the mechanical behavior. Albeit the simplicity of the systems, they still show complexity and retain geotechnical meaning (they behave as loose sand upon triaxial compression). We selected 92,378 Particle Size Distributions, PSD, lying within two particle sizes. We performed virtual triaxial tests with the DEM on samples that followed these PSDs. We fitted the resulting stress–strain curves to Duncan–Chang hyperbolic models, gathering a statistical sample of the two model parameters, namely, and . We found variations of these parameters across the statistical sample that are not easily associated to the PSD. The parameters followed bell-shaped distributions. In the case of , and . In the case of , and . Because of the finite number of particles used in each experiment (20,000), the parameters obtained from a single DEM simulation may fluctuate to some extent from the expected values (with coefficients of variation for measurements of and ). In order to relate the Duncan–Chang model parameters to the PSD, we set up a neural network, NN, which was trained with a dataset generated through DEM simulations. The input for this NN was directly the PSD and the output was the model parameters. We tried several NN architectures. 20% of the dataset was used to test the ability of networks and MLR to anticipate the model parameters, and . The size of the training dataset varied between 0.1 and 100% of the remaining DEM experiments. For the sake of comparison, we also performed a multivariate linear regression between the model parameters and common statistical and geotechnical descriptors. More precisely, we used the coefficient of uniformity, the coefficient of curvature, the mean size, the standard deviation, the skewness and the excess kurtosis. The NN was able to predict the model parameters for each experiment with higher accuracy than the MLR. In the case of , the best MLR achieved a standard deviation of the relative deviation of of 0.065, while the NN achieved a value of 0.046. In the case of , both MLRs achieved , whereas the value obtained from NN was 0.115. Considering the fact that using the expected values as predictions would end with and and that the precision of DEM experiments was limited, we conclude that NNs are providing significantly better estimations that MLRs. This is also evidenced by the differences found in the mean squared deviation measured for each model parameter and prediction method (Table 5). In order to know how many tests are necessary to train the NN, we varied the training dataset between 0.08 and 80% of the DEM experiments. We observed that NNs trained with less than one thousand triaxial experiments were still capable to accurately predict the macroscopic mechanical behavior of granular materials by just using their PSD. We also observed that the largest deviations between NN predictions and DEM measurements occurred precisely when the DEM experiments led to unlikely values in the first simulation. Therefore the NN was also useful to identify unlikely DEM results. The key to achieve more accurate estimations seems to be the reduction of the data noise. The PSD often affects the mechanical behavior of granular materials. In every sample subjected to some constraints and stress paths, there must exist relationships linking the mechanical behavior to its PSD that are hidden to the naked eye. Neural networks are capable of finding those relationships better than multivariate linear regressions using statistical or geotechnical descriptors derived from the PSD as independent variables. This research article presents a theoretical contribution supported with massive simulation with the DEM that illustrates how to relate the PSD to the mechanical behavior in a specific case. The outcomes of this research are limited for real soils, as they are constrained by the simplifications introduced in the DEM and the conditions of the experiments, but they illustrate the capabilities of the method. These capabilities can be exploited in any other research that relies on the use of DEM and has been limited to a single PSD. If the particular conditions of our experiment are those of interest, the network is already trained and can be made accessible in such a way that anyone can give the PSD as an input and anticipate the parameters governing the mechanical behavior. Advanced data acquisition techniques, such as computer vision [51] or X-ray computed tomography [36] can be used to obtain these inputs. Future works may consider variations in other intrinsic properties (e.g., the particle shape) or state parameters (e.g., void ratio), or in the kind of test (e.g., resonant column tests). In particular, particle morphology can be assessed through geotechnical descriptors (e.g. sphericity and roundness indices) or through direct inputs from the particles (e.g. statistical distribution of radii in the case of ellipsoids). In the context of a staggered multilevel material identification approach [41], a few laboratory experiments with real soil samples can be considered for the calibration of DEM models, which would be used in turn to expand the database with the help of NNs. This article thoroughly proves how neural networks can be powerful tools to explore the role of PSD in these models. The great advantage of the combination of DEM with NNs is that we can know much more by simulating much less.

4 in total

4. Machine learning framework for analysis of transport through complex networks in porous, granular media: A focus on permeability.

Authors: Joost H van der Linden; Guillermo A Narsilio; Antoinette Tordesillas
Journal: Phys Rev E Date: 2016-08-17 Impact factor: 2.529