The maximum (Shmax) and minimum (Shmin) horizontal stresses are essential parameters for well planning and hydraulic fracturing design. These stresses can be accurately measured using field tests such as the leak-off test, step-rate test, and so forth, or approximated using physics-based equations. These equations require measuring some in situ geomechanical parameters such as the static Poisson’s ratio and static elastic modulus via experimental tests on retrieved core samples. However, such measurements are not usually accessible for all drilled wells. In addition, the recently proposed machine learning (ML) models are based on expensive and destructive tests. Therefore, this study aims at developing a new approach to predict the least principal stresses in a time- and cost-effective way. New models have been developed using ML approaches, that is, artificial neural network (ANN) and support vector machine (SVM), to predict the Shmin and Shmax gradients (outputs) from well-log data (inputs). A wide-ranging set of actual field data was collected and extensively analyzed before being fed to the algorithms to train the models. The developed ANN-based models outperformed the SVM-based ones with a mean absolute percentage error (MAPE) not exceeding 0.30% between the actual and predicted output values. Besides, new equations have been developed to mimic the processing of the optimized networks. The new empirical equations were verified with another unseen data set, resulting in a remarkably close match to the actual stress-gradient values, confirmed by a prediction accuracy exceeding 90% in addition to a MAPE of 0.43%. The results’ statistics confirmed the robustness of the developed equations to predict the Shmin and Shmax gradients with a high degree of accuracy whenever the logging data are available.
The downhole formation stresses are key factors in different operations in the petroleum industry. How the stresses are concentrated in the vicinity of the wellbore directly affects the drilling operation since it controls the wellbore integrity and hence may cause many drilling-related incidents, that is, stuck bottom-hole assembly, pack-off, and lost circulation.[1] The availability of formation
stress data that describe the wellbore stress-state condition would
contribute to providing viable solutions to many integrity-related
wellbore problems that may be encountered during drilling. These solutions
include determining the optimum mud weight, defining the safe drilling
window, specifying stable trajectories, determining casing setting
depths, and so forth.[2] Furthermore, defining the
downhole stress condition or distribution is considered the cornerstone
for developing a representative geomechanical model of subsurface
formations whereby a broad suite of problems along different stages
of the reservoir life could be addressed and resolved.[3−7]

With a simplifying assumption, three mutually orthogonal principal
stress components can represent the downhole stress state, that is,
the overburden stress (Sv) and the least
principal stresses: the maximum (Shmax) and minimum (Shmin) horizontal stresses. Since the vertical stress (Sv) results from the compressive load of the overburden formations, it can be estimated from the overburden formation-bulk-density log.[8]

There are two types of techniques, that is, direct and indirect
methods, to determine the least principal stresses. The direct method
comprises the direct measurement of the stress state by conducting in situ field tests such as the leak-off test, mini-frac
test, step-rate test, and so forth.[2,9,10] Shmax cannot be directly measured using
these methods;[11] hence, theoretical (empirical)
correlations are developed to estimate Shmax depending
on the values of Sv and Shmin.[12,13] The main challenges of this method are being
time-consuming, expensive, and usually unavailable for most of the
wells. Making matters even more challenging, such tests are typically applied at specific depths, which means no continuous profile of these stresses would be available based solely on these direct tests.

On the other hand, the indirect methods
involve the determination
of the least principal stresses using the well-log data. Different
physics-informed theoretical models, that is, uniaxial strain theory
and poroelastic strain models, were developed to determine the downhole
formation stresses.[14−17] These models are based on lab measurements of some in situ geomechanical parameters, that is, static elastic moduli, strains,
and static Poisson’s ratio. These parameters can be accurately measured through lab tests (e.g., triaxial tests) conducted on cores retrieved from the downhole formations.[17]

Thereafter, the measured parameters would be presented in a continuous
profile form after correlating them to the conventional logging data.
Besides, there is still a need for at least one direct field test,
that is, the leak-off test, to incorporate the effect of tectonic
stresses on the generated profiles.[18−20] However, one main drawback
of this technique is the high cost of retrieving such core samples
to be subjected to such lab measurements. This, in turn, limits the
accessibility of this kind of information for most of the drilled
wells. Some recent studies introduced the application of machine learning
(ML) to estimate the downhole principal stresses using the breakout
data.[21,22] The breakout geometries can be derived from
the analysis of image logs.[23] However,
borehole breakouts are considered destructive techniques that are
based on failure models.[24] Besides, most
drilled wells lack such data due to the high cost and time consumption
of running these special logs. Accordingly, based on the literature,
direct nondestructive techniques to determine the formation stresses
are yet to be researched. Another approach was introduced by AlTammar and Alruwaili[25] to estimate Shmin and Shmax based on the caliper log data; however, an uncertainty analysis has to be incorporated into the model for geomechanical properties that are not readily available.

Therefore, a project was initiated
to investigate the feasibility
of ML to estimate the formation stresses using the available and easy-to-get
data such as mechanical data and logging data. The results of the
first phase demonstrated the ability of ML-based models to predict
the in situ stresses using the mechanical drilling
data.[26] The second phase of the project,
which is the subject of this paper, investigated the application of
ML to predict the least principal stresses using logging data in a
white-box version.

Therefore, this study aims at developing
a new, robust tool that
can estimate the gradients of the least principal stresses, Shmin and Shmax, from the conventional well-log data
by deploying ML approaches: artificial neural network (ANN) and support
vector machine (SVM). The ML approaches have been selected due to
the recent high computational capabilities of computers and the outstanding
performance of such approaches to mimic and solve highly complex problems.
Recently, different ML approaches have been successfully applied in
the field of petroleum-related geomechanics such as predicting unconfined
compressive strength,[27,28] elastic parameters,[29,30] and wellbore failures.[31]

The novelty
of this study was extended to develop state-of-the-art
equations to estimate Shmin and Shmax directly
from the logging data. These equations, with a detailed procedure
for application, introduce the developed ML models in a white-box
version to allow the reproducibility of the results, unlike the usual
black-box nature of the ML models.
Data Analysis
This section describes the data set used for this study with summarized
insights on the data preprocessing applied before proceeding with
the model development.
Data Description
Field measurements (2385 data points) were collected from two wells in a Middle East field
representing a complex carbonate reservoir. These data include well-logging
records and in situ maximum and minimum horizontal
stresses, Shmax and Shmin. The logging data
comprise gamma-ray (GR) log, formation bulk density (RHOB) log, compressional
(DTC) and shear (DTS) wave transit-time log, neutron porosity (Phi)
log, dynamic Poisson’s ratio (PRd), and dynamic
elastic modulus (Ed). The data collected
from well-A have been used for training and testing the models, while
the data gathered from well-B were directed to validate the developed
models and verify their performance.
Data
Acquisition
The stress magnitude
can be estimated either by employing field tests or using developed
theoretical-based equations. The equations developed based on the
poroelastic model are considered the most common and applicable method
to estimate the stress profile at the desired depth of the drilled
wells.[6,11,32] Blanton and Olson[16] were the first to introduce anisotropy in the in situ horizontal stress equation for different lithologies. Their model considers the effect of the tectonic stresses by introducing the tectonic strains into the equation. Accordingly, the least principal stresses can be estimated using eqs 1 and 2,[16,32] where PRs is the static Poisson’s ratio, Sv is the vertical stress component, α is Biot’s elastic coefficient, and ε_x and ε_y are the elastic strains in the Shmin and Shmax directions, respectively.

First, the vertical stress Sv was estimated from the RHOB log by integrating the formation density from the surface to the depth of interest using eq 3, where ρ(z) is the formation density at a certain depth z and g is the gravitational acceleration.

Then, the dynamic Poisson’s ratio (PRd) and dynamic elastic modulus (Ed) were estimated based on the acoustic and RHOB logs using the formulas listed in Appendix A. The calculated Ed and PRd were then correlated with Es and PRs obtained from the experimental tests conducted on the samples cored from the downhole formations.

To determine the elastic strain values ε_x and ε_y, an equal-strain assumption was initially considered for both directions before estimating Shmin using eq 1. A field test was then used to calibrate Shmin for tectonic effects. In the case of not achieving an accurate match with the measured value, Shmin was recalculated using different strain-ratio values (ε_x/ε_y). This step was repeated iteratively until an acceptable convergence of the Shmin values and an accurate match were achieved. Finally, the Shmin and Shmax profiles were estimated and considered as the outputs for the proposed ML models.
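For reference, the poroelastic strain model referenced above as eqs 1 and 2, together with the overburden integral of eq 3, is commonly published in the following form. This is a reconstruction consistent with the symbol definitions above; the pore-pressure term Pp belongs to the standard Biot effective-stress formulation even though it is not listed among the symbols here:

```latex
S_{hmin} = \frac{PR_s}{1 - PR_s}\,(S_v - \alpha P_p) + \alpha P_p
         + \frac{E_s}{1 - PR_s^{2}}\,(\varepsilon_x + PR_s\,\varepsilon_y) \qquad (1)

S_{hmax} = \frac{PR_s}{1 - PR_s}\,(S_v - \alpha P_p) + \alpha P_p
         + \frac{E_s}{1 - PR_s^{2}}\,(\varepsilon_y + PR_s\,\varepsilon_x) \qquad (2)

S_v = \int_{0}^{z} \rho(z)\, g \,\mathrm{d}z \qquad (3)
```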
Statistical Descriptive Analysis
The obtained data in this study were statistically analyzed and described by deploying different statistical measures, as listed in Table 1. This helps provide a better understanding of the data and their distribution. The descriptive measures listed indicated that both the logging and stress data cover a wide range with a representative distribution and hence give more confidence to capture the nature of the problem. The data ranges can be summarized as follows: GR: 3.34–90.47 API unit, DTC: 44.82–66.12 μs/ft, DTS: 81.28–132.47 μs/ft, RHOB: 2.32–3.04 g/cm3, Phi: 0.28–0.32 fraction, Ed: 5.70–14.79 Mpsi, PRd: 0.28–0.33 fraction, Shmin: 11 292.34–12 361.17 psi, and Shmax: 12 308.02–14 599.00 psi.
Table 1. Descriptive Statistical Summary of the Data Set Used in This Study

parameter       | minimum  | maximum  | mean     | std    | skewness
GR (API unit)   | 3.34     | 90.47    | 29.56    | 14.25  | 0.64
DTC (μs/ft)     | 44.82    | 66.12    | 48.43    | 2.89   | 2.66
DTS (μs/ft)     | 81.28    | 132.47   | 89.97    | 6.94   | 2.66
RHOB (g/cm3)    | 2.32     | 3.04     | 2.82     | 0.11   | –0.92
Phi (fraction)  | 0.28     | 0.32     | 0.29     | 0.01   | 1.09
Ed (Mpsi)       | 5.70     | 14.79    | 12.37    | 1.57   | –1.34
PRd (fraction)  | 0.28     | 0.33     | 0.30     | 0.01   | 1.55
Shmin (psi)     | 11292.34 | 12361.17 | 11886.61 | 274.67 | –0.07
Shmax (psi)     | 12308.02 | 14599.00 | 13778.29 | 450.26 | –0.50
Data Preprocessing
Data preprocessing is an essential step for developing ML-based models since the quality of the data has a considerable impact on the ability of the model to learn and give accurate predictions.[33] Therefore, the obtained data were initially preprocessed before being fed to the proposed models.[34] The data set was first cleaned of missing data, redundant or duplicated information, and contextual errors such as negative values and unreasonable values that do not make sense from an engineering point of view. Then, a MATLAB code was specially designed to detect and eliminate the outliers using several techniques, that is, quartiles, and so forth.
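As an illustrative sketch (in Python rather than the authors' MATLAB code, with the conventional 1.5×IQR fences assumed, since the exact quartile rule is not stated), the quartile-based outlier removal can look like this:

```python
import numpy as np

def remove_outliers_iqr(data, k=1.5):
    """Drop rows where any column falls outside the quartile fences.

    data : 2D array (rows = depth samples, columns = log curves).
    k    : fence multiplier (1.5 is the conventional Tukey choice).
    """
    q1 = np.percentile(data, 25, axis=0)
    q3 = np.percentile(data, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # keep a row only if every column lies inside its fences
    mask = np.all((data >= lower) & (data <= upper), axis=1)
    return data[mask]

# Example with hypothetical GR/DTC readings: one bad GR value is removed.
logs = np.array([[30.0, 48.0], [32.0, 49.0], [31.0, 47.5],
                 [29.0, 48.5], [500.0, 48.0]])   # last row is an outlier
clean = remove_outliers_iqr(logs)
```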
Dimensionality Reduction
Dimensionality reduction refers to the process of reducing the number of input features, that is, the logging data, to obtain a set of principal features. Accordingly, the redundant and irrelevant input information was identified by studying the collinearity between the input features and then removed. First, the input data were normalized between 0 and 1 for better representation. Then, the correlation coefficient (R-value) was calculated between the inputs to evaluate how strongly each input linearly correlates with the others (Table 2). In the case of having two or more features with an R-value of more than 0.95, only one of them would be considered, and the others would be excluded. Therefore, only GR, RHOB, and DTC were selected to feed the proposed models after excluding the others that have almost the same distribution as the selected ones (Figure 1).
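The collinearity screening described above can be sketched as follows. This is an illustrative Python version with synthetic data (not the field set), using the 0.95 R-value cutoff from the text:

```python
import numpy as np

def drop_collinear(names, X, threshold=0.95):
    """Normalize features to [0, 1], then keep only one representative of
    any group whose pairwise |R| exceeds the threshold (0.95 here)."""
    Xn = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
    R = np.corrcoef(Xn, rowvar=False)
    keep = []
    for j in range(len(names)):
        # keep feature j only if it is not highly correlated
        # with an already-kept feature
        if all(abs(R[j, k]) <= threshold for k in keep):
            keep.append(j)
    return [names[j] for j in keep]

# Toy example: "DTS" is an affine copy of "DTC", so only one survives.
rng = np.random.default_rng(0)
dtc = rng.uniform(44.8, 66.1, 200)
dts = 2.0 * dtc + 1.0            # perfectly collinear with DTC
gr = rng.uniform(3.3, 90.5, 200)
kept = drop_collinear(["DTC", "DTS", "GR"], np.column_stack([dtc, dts, gr]))
```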
Table 2. Correlation Coefficient Analysis among the Input Features (Logging Data)

parameter | GR    | DTC   | DTS   | RHOB  | Phi   | Ed    | PRd
GR        | 1.00  |       |       |       |       |       |
DTC       | –0.45 | 1.00  |       |       |       |       |
DTS       | –0.45 | 1.00  | 1.00  |       |       |       |
RHOB      | –0.21 | –0.35 | –0.35 | 1.00  |       |       |
Phi       | –0.49 | 0.95  | 0.95  | –0.27 | 1.00  |       |
Ed        | 0.38  | –0.95 | –0.95 | 0.51  | –0.96 | 1.00  |
PRd       | –0.48 | 0.98  | 0.98  | –0.31 | 0.99  | –0.96 | 1.00
Figure 1
Graphic display
of the distribution of the normalized logging data
where the y-axis represents the normalized data values
and the x-axis represents the data index.
Moreover, taking the square root (Sqrt) of the GR values reduced their skewness from 0.63 to −0.07, approaching zero, which indicates a distribution closer to normal. Therefore, Sqrt(GR) values were considered instead of GR values as an input feature.
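The effect of the square-root transformation on skewness can be reproduced on synthetic data. This Python sketch uses a gamma-distributed, GR-like sample (an assumption for illustration, not the field data) and a plain third-moment skewness estimate:

```python
import numpy as np

def sample_skewness(x):
    """Third standardized moment of a 1D sample."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Synthetic right-skewed sample: the square-root transform pulls the
# skewness toward zero, mirroring the reported drop from 0.63 to -0.07
# for the actual GR log.
rng = np.random.default_rng(42)
gr = rng.gamma(shape=2.0, scale=15.0, size=2385)  # right-skewed, API-like

skew_raw = sample_skewness(gr)
skew_sqrt = sample_skewness(np.sqrt(gr))
```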
Correlation Analytics
Pearson’s
correlation coefficient was used to investigate the relative importance of each input feature to the outputs. This correlation helps identify to what extent the output depends on each input feature.[35] The R-value between Shmin
to enhance this value, the formation depth was integrated into the
stress profile to express it as a stress gradient profile instead.
Studying the correlation between each feature and the Shmin gradient, Figure 2a shows a significant increase in the R-value from 0.29, −0.27, and 0.12 to −0.53, 0.60, and 0.21 for Sqrt(GR), DTC, and RHOB, respectively, compared to the initial case with the Shmin values.
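A minimal sketch of why the gradient form correlates better: dividing the stress by depth (an assumed definition of the gradient) removes the depth trend that dilutes the correlation with a depth-independent log response. The data here are synthetic, purely for illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two 1D arrays."""
    x, y = np.asarray(x), np.asarray(y)
    xd, yd = x - x.mean(), y - y.mean()
    return float(xd @ yd / np.sqrt((xd @ xd) * (yd @ yd)))

rng = np.random.default_rng(1)
depth = np.linspace(9000.0, 11000.0, 500)          # ft, assumed range
grad = 0.55 + 0.02 * rng.standard_normal(500)      # stress gradient, psi/ft
shmin = grad * depth                               # absolute stress, psi
log_feature = grad + 0.005 * rng.standard_normal(500)  # gradient-driven log

r_abs = pearson_r(log_feature, shmin)          # diluted by the depth trend
r_grad = pearson_r(log_feature, shmin / depth) # depth trend removed
```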
Figure 2
Correlation coefficient between (a) Shmin and Shmin-gradient and (b) Shmax and Shmax-gradient,
with each input feature [Sqrt(GR), DTC, and RHOB].
Similarly, the Shmax gradient was found to have
a relatively
higher R-value with the input features compared to the Shmax R-values, as shown in Figure 2b. Therefore, the Shmin and Shmax gradients were considered the proposed models’ outputs instead of the absolute Shmin and Shmax values. The formula used to calculate Pearson’s correlation coefficient is presented in Appendix A.
Model Development
The proposed models were
then developed using the preprocessed
data set by employing ANN and SVM techniques to predict the Shmin and Shmax gradients based on the selected conventional
logging data: GR, DTC, and RHOB.
Artificial
Neural Network
ANN is a supervised-learning technique that has recently become well-known for its high capability of modeling several engineering problems with a high degree of complexity. The basic architecture of a neural network typically
consists of three types of layers: the input layer, hidden layer(s),
and output layer.[41] The input features
are assigned to the input layer that has weighted connections with
the hidden layer(s). The neurons in the hidden layer process the input
data before being transferred through the network connections to the
output layer to ultimately produce the output in the output layer.[36] The optimization process of the network aims
at tuning the weights of the network connections as well as the biases
to yield the lowest possible error for a given network configuration.[37,38]
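A minimal sketch of such a single-hidden-layer network, using scikit-learn and synthetic data for illustration (the study itself used MATLAB; tanh stands in for tansig, and lbfgs for the Levenberg–Marquardt trainer, which scikit-learn does not provide):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(500, 3))            # normalized GR, DTC, RHOB
y = 0.3 * X[:, 0] - 0.5 * X[:, 1] + 0.2 * X[:, 2]   # synthetic target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# One hidden layer with a tanh (tansig-like) activation; the weights and
# biases are tuned to minimize the error, as described in the text.
ann = MLPRegressor(hidden_layer_sizes=(30,), activation="tanh",
                   solver="lbfgs", max_iter=2000, random_state=0)
ann.fit(X_tr, y_tr)
score = ann.score(X_te, y_te)   # R^2 on held-out data
```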
Support Vector Machine
SVM is one
of the most common ML techniques, well-known for its high capability
to deal with classification and regression applications with a high
degree of complexity.[41] It follows the
supervised learning approach while carrying out the transformation
of the input data set into a higher-degree dimensional (n-dimensional) feature space whereby more space would be available
for training instances to achieve the optimal hyper-plane.[42] Several parameters are required to be adequately optimized during SVM training to develop a robust model with optimal performance.[42−44] Recently, many studies employed the SVM technique
in estimating several petroleum-related parameters and in geomechanics-related
applications.[45−49]
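A minimal SVM-regression sketch with a Gaussian (RBF) kernel, the epsilon-insensitive loss, and a C penalty, mirroring the parameter names tuned later in this study; scikit-learn and synthetic data are used here purely for illustration:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 3))       # three normalized features
y = np.sin(3.0 * X[:, 0]) + 0.5 * X[:, 1]      # smooth synthetic target

# Gaussian (RBF) kernel; C controls the penalty on deviations larger
# than the epsilon-insensitive tube, as discussed in the text.
svr = SVR(kernel="rbf", C=400.0, epsilon=0.01, gamma="scale")
svr.fit(X, y)
r2 = svr.score(X, y)   # fit quality on the training data
```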
Results and Discussion
ANN-Based
Model Development
In this
study, ANN was employed to develop new models that can estimate the
Shmin and Shmax gradients based on the well-log
data as feeding inputs. The obtained data set was initially divided
into three main categories: training, validation, and testing sets.
Typically, multiple models are trained using the training set with
different hyper-parameters before being tested internally utilizing
the validation set to evaluate the selected hyper-parameters. The
developed model with those hyper-parameters that yield acceptable
prediction accuracy on the validation set is then tested using the
testing set to evaluate the generalization error of the trained model.[39]

Ratios ranging from 70 to 90% were tested
for the training set, and for each trial, the rest of the data was
split using a one-to-one ratio for the testing and validation sets.
Meanwhile, different combinations of the ANN parameters were tested
to optimize the model. Table 3 lists the ANN-parameter options that have been tested in
addition to the selected (optimized) ones.
Table 3. Tested Options for Optimizing the Developed ANN-Based Models

parameter                       | tested options/ranges                                                                  | optimized: Shmin gradient model | optimized: Shmax gradient model
number of hidden layers         | 1–4                                                                                    | single hidden layer             | single hidden layer
number of neurons in each layer | 5–40                                                                                   | 30                              | 15
split ratio (training/validation/testing) | 70–90% for the training set; the rest divided by a 1-to-1 ratio between validation and testing | 0.8/0.1/0.1 | 0.8/0.1/0.1
training algorithm              | trainlm, trainbfg, trainrp, trainscg, traincgb, traincgf, traincgp, trainoss, traingdx | trainlm                         | trainlm
transfer function               | tansig, logsig, elliotsig, radbas, hardlim, satlin                                     | tansig                          | tansig
learning rate                   | 0.01–0.9                                                                               | 0.05                            | 0.15
The gradient descent algorithm was implemented while iteratively updating the network parameters along the negative gradient direction of the objective function. The process includes considering random values of the model
function. The process includes considering random values of the model
hyperparameters and iteratively adjusting them using the available
options to eventually reduce the loss function over a series of trials
(epochs). The model’s hyper-parameters are updated through
each iteration to minimize the loss of the next iteration using the
back propagation technique.

A MATLAB code was developed to test
different scenarios while optimizing
the network. Each scenario includes different combinations of the
available options of the ANN parameters. The prediction for each case
was evaluated in terms of the R-value to assess the
correlation between the predicted and actual output values. In addition,
the prediction error was evaluated using the mean absolute percentage
error (MAPE) and root-mean-squared error (RMSE) between the predicted
and observed output values for the training, validation, and testing
processes. Achieving the highest R-value besides
the lowest MAPE and RMSE was the objective criteria to select the
optimized parameters of the network. The mathematical formulas used
to calculate MAPE and RMSE are stated in Appendix
A.
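The three evaluation metrics can be sketched directly from their standard definitions (the Appendix A formulas themselves are not reproduced here, so these are the conventional forms):

```python
import numpy as np

def r_value(actual, predicted):
    """Pearson correlation coefficient between actual and predicted."""
    return float(np.corrcoef(actual, predicted)[0, 1])

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def rmse(actual, predicted):
    """Root-mean-squared error."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Sanity check with a perfect prediction: R -> 1, MAPE = 0, RMSE = 0.
y = np.array([0.55, 0.56, 0.57, 0.58])
assert abs(r_value(y, y) - 1.0) < 1e-9
assert mape(y, y) == 0.0 and rmse(y, y) == 0.0
```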
Shmin Gradient
Prediction
The tuning process of the developed Shmin gradient model resulted in a network architecture of three layers: one input layer including the input features [Sqrt(GR), DTC, and RHOB], one hidden layer with 30 neurons, and one output (Shmin gradient) layer. The developed model was trained by the Levenberg–Marquardt algorithm (trainlm) with a learning rate of 0.05 using a transfer function of the tan-sigmoidal type for the hidden layer and a linear function for the output layer. Figure 3 shows a typical architecture schematic of the developed ANN-based models. The crossplots between the predicted and actual Shmin gradients (Figure 4) showed a significant match with an R-value of 0.90 and a MAPE not exceeding 0.14% for both the training and testing processes.
Figure 3
Typical architecture of the developed
ANN-based models.
Figure 4
Crossplots between the
actual and predicted Shmin gradients
for the developed ANN-based model for (a) training and (b) testing
processes.
After fitting a regression model,
the prediction residuals have
been checked to ensure reliable regression results. Therefore, the
residuals of the Shmin-gradient model were plotted versus
the fitted values, as depicted in Figure 5a, which shows the random scattering of the residuals around zero. The residual histogram was also found to be approximately normally distributed (Figure 5b), which demonstrates that all the fitted values have almost the same degree of scattering.[40]
Figure 5
Analysis of the prediction residuals of the Shmin-gradient
ANN-based model: (a) residuals vs fitted values and (b) histogram
of the prediction residuals.
Shmax Gradient Prediction
Similarly, the optimized network for predicting the Shmax gradient contained one hidden layer with 15 neurons. The model was
trained with a learning rate of 0.15 using trainlm as the learning algorithm.

The narrow scatter of the points along the 45° line in the crossplots shown in Figure 6 indicates the agreement between the observed Shmax gradients and the predicted ones for both the training and testing. This is further verified by the low MAPE of 0.30% between the observed and predicted values for the testing process. In addition, the average R-value is 0.98 for both. The evaluation metrics (R-value, MAPE, and RMSE) listed in Table 4 describe the accuracy of the ANN-based models. Furthermore, plotting the model prediction residuals versus the fitted values showed a scattered pattern around zero (Figure 7a), with approximately a normal distribution in the residual histogram plot depicted in Figure 7b. These measures indicate the stable prediction (regression) performance of the developed model.
Figure 6
Crossplots between the
actual and predicted Shmax gradients
for the developed ANN-based model for (a) training and (b) testing
processes.
Table 4. Summary of the Metrics Used for Evaluating the Accuracy of the Developed ANN-Based and SVM-Based Models

output parameter | model | training R-value | training MAPE (%) | training RMSE | testing R-value | testing MAPE (%) | testing RMSE
Shmin gradient   | ANN   | 0.92 | 0.12 | 0.0016 | 0.92 | 0.14 | 0.0013
Shmin gradient   | SVM   | 0.86 | 0.18 | 0.0019 | 0.86 | 0.16 | 0.0017
Shmax gradient   | ANN   | 0.98 | 0.28 | 0.0037 | 0.98 | 0.30 | 0.0038
Shmax gradient   | SVM   | 0.98 | 0.34 | 0.0041 | 0.97 | 0.41 | 0.0041
Figure 7
Analysis of the prediction residuals of the
Shmax-gradient
ANN-based model: (a) residuals vs fitted values and (b) histogram
of the prediction residuals.
SVM-Based
Model Development
The same
data set was used for building the SVM-based models to estimate the
Shmin and Shmax gradients using the same input
features. For optimizing the SVM-based models, both Gaussian and polynomial
kernel functions were tested with different SVM-model optimizing parameters:
epsilon, lambda, kernel option, C-parameter, and
verbose. The model was trained using 70% of the obtained data, while
the rest were used for the validation and testing processes with a
one-to-one ratio. For both the Shmin- and Shmax-gradient models, the sensitivity analysis shows that the epsilon,
lambda, and verbose parameters did not significantly impact prediction
accuracy. The Gaussian kernel function yielded better prediction performance
regarding the R-value between the predicted and actual
output values than the polynomial function. Varying kernel options
from one to nine showed that a kernel option of 3.5 gave the best
prediction performance with the lowest MAPE for both the Shmin- and Shmax-gradient models. A C-parameter
of 400 was selected for the Shmin gradient model, while
600 was chosen for the Shmax-gradient model. Increasing
the C-parameter value beyond the values chosen resulted
in an over-fitting problem in the developed models, indicated by a low training error but very high errors in the testing process. These selected values of the SVM-based model parameters yielded
the best prediction performance during the testing process in terms
of the R-value of 0.86 and 0.97 and MAPE values of
0.16 and 0.41% between the predicted and the actual values for the
Shmin and Shmax gradient models, respectively.
The statistical parameters (R-value, MAPE, and RMSE) describing the performance of the SVM-based models to estimate the Shmin and Shmax gradients are listed in Table 4.

Table 5 summarizes the selected SVM parameters for the developed Shmin- and Shmax-gradient models. Figures 8 and 9 show the crossplots between the predicted and observed output values for the model development processes (training and testing).
Table 5. Tested Options for Optimizing the Developed SVM-Based Models

parameter       | tested options/ranges            | selected: Shmin gradient model | selected: Shmax gradient model
kernel function | Gaussian, polynomial, htrbf, rbf | Gaussian                       | Gaussian
kernel option   | 1.5–7                            | 3.5                            | 3.5
lambda          | 1 × 10−7 to 1 × 10−1             | 1 × 10−5                       | 1 × 10−5
epsilon         | 0.00001–0.1                      | 0.1                            | 0.1
verbose         | 1                                | 1                              | 1
C-parameter     | 10–1000                          | 400                            | 600
Figure 8
Crossplots between the actual and predicted
Shmin gradients
for the developed SVM-based models for (a) training and (b) testing
processes.
Figure 9
Crossplots between the actual and predicted
Shmax gradients
for the developed SVM-based models for (a) training and (b) testing
processes.
Comparing the prediction performance of both ANN and
SVM models
in the testing data set showed that the developed ANN-based models
outperformed the SVM-based ones while predicting Shmin and
Shmax gradients. The developed ANN-based models yielded
better predictions for the testing process of the developed models
regarding higher R-values of 0.92 and 0.98 and lower MAPE values of 0.14 and 0.30% between the predicted and actual Shmin and Shmax gradients. In comparison, the predictions of the developed SVM-based models resulted in R-values of 0.86 and 0.97 with MAPE values of 0.16 and 0.41% for the Shmin and Shmax gradient models, respectively (Figures 10 and 11). Furthermore, the ANN approach has the additional advantage that equations imitating the neural-network processing can be extracted.
Figure 10
Comparison of the prediction performance between the developed
(Shmin gradient) ANN-based and SVM-based models in terms
of (a) R-value and (b) MAPE for training, validation,
and testing processes.
Figure 11
Comparison of the prediction
performance between the developed
(Shmax gradient) ANN-based and SVM-based models in terms
of (a) R-value and (b) MAPE for training, validation,
and testing processes.
Empirical
Equations for Estimating Shmin and Shmax Gradients
One of the primary outcomes of this study was the development of new empirical
equations that can be used to estimate the Shmin and Shmax gradients without
needing to run the MATLAB codes. Accordingly, the Shmin and Shmax gradients can
be calculated using the novel ANN-based eqs 4 and 5, respectively. The subscript
“normalized” refers to the normalized form of the Shmin and Shmax gradients, and
the input parameters should first be normalized using the point-slope form in
eq 6, where X is the actual value of the input parameter, Xmin and Xmax are the
minimum and maximum values of the input features, respectively, and Xnormalized
is the normalized form of the input parameter. The normalized forms of the Shmin
and Shmax gradients in eqs 4 and 5 can be calculated using eqs 7 and 8. The
[√GR]normalized, DTCnormalized, and RHOBnormalized terms represent the
normalized forms of the input parameters obtained using eq 6. These equations
were established to mimic the developed ANN-based models utilizing the tuned
weights and biases of the optimized networks. The weights and biases of the
developed Shmin and Shmax models in eqs 7 and 8 are listed in Tables 6 and 7,
respectively. The input parameters should be measured in the following units:
GR in API units, DTC in μs/ft, and RHOB in g/cm3.
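As a minimal sketch of the computation just described, the following assumes the single-hidden-layer architecture implied by the weight tables, with a tanh hidden activation and a linear output (the MATLAB fitting-network convention), and a min-max normalization to [−1, 1]; the paper's exact scaling range and activation are assumptions here, and all names are illustrative.

```python
import numpy as np

def normalize(x, x_min, x_max):
    """Min-max scale a raw input to [-1, 1].

    A common point-slope normalization (MATLAB mapminmax convention);
    the exact scaling range used in the paper is an assumption here.
    """
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def ann_output_normalized(x_norm, W1, b1, W2, b2):
    """Evaluate a single-hidden-layer network: tanh hidden layer, linear output.

    x_norm : length-3 vector of normalized inputs (sqrt(GR), DTC, RHOB)
    W1     : (n_hidden, 3) input-to-hidden weights (the j columns of Table 6 or 7)
    b1     : (n_hidden,) hidden-layer biases
    W2     : (n_hidden,) hidden-to-output weights
    b2     : scalar output bias
    """
    hidden = np.tanh(W1 @ x_norm + b1)
    return float(W2 @ hidden + b2)
```

The normalized output would then be converted back to an actual stress gradient by inverting the same min-max mapping for the output variable.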
Table 6
Extracted Weights and Biases to Be Used in Eq 7 for Estimating the Shmin Gradient
i     W1i,j=1   W1i,j=2   W1i,j=3   W2i      b1,i     b2
1     –3.921    0.527     1.072     0.298    4.579    –0.870
2     2.934     2.286     2.366     –0.298   –3.932
3     0.043     –6.361    –0.428    –0.876   –5.893
4     –3.070    –2.316    –2.405    0.664    2.966
5     2.943     1.749     –2.839    0.685    –3.041
6     –1.677    –4.048    0.278     0.127    2.709
7     –0.767    –4.177    –4.646    –0.340   1.410
8     –3.120    –3.061    0.288     0.945    1.909
9     0.968     –2.507    –2.958    0.441    –1.716
10    –3.968    1.687     1.129     –0.330   1.137
11    –1.476    4.656     2.314     0.339    2.230
12    3.189     2.455     –1.452    0.154    –1.175
13    –3.588    0.998     2.001     0.278    0.889
14    –0.228    1.546     3.449     0.625    1.180
15    –1.189    3.067     –2.984    0.213    0.542
16    1.895     –2.761    2.420     0.148    1.167
17    3.512     1.644     –2.954    0.251    0.944
18    –1.783    –1.520    –0.185    0.752    –1.164
19    –4.342    –0.693    –3.299    0.380    –0.803
20    –4.097    –0.671    –2.223    –0.496   –0.770
21    2.793     –3.729    0.564     –0.697   1.154
22    1.768     –2.186    3.675     –0.134   1.558
23    –0.492    –1.169    5.038     –0.751   –3.895
24    –3.288    0.868     –2.794    0.185    –2.401
25    3.086     0.805     3.362     0.112    1.596
26    4.075     –1.966    1.095     –0.105   2.999
27    –4.754    1.211     –1.468    –0.870   –2.993
28    –0.508    1.499     –4.257    0.431    –3.777
29    2.612     1.414     2.946     –0.492   4.200
30    0.698     –1.903    5.501     0.693    –4.498
Table 7
Extracted Weights and Biases to Be Used in Eq 8 for Estimating the Shmax Gradient
i     W1i,j=1   W1i,j=2   W1i,j=3   W2i      b1,i      b2
1     1.719     –2.422    12.895    –0.327   –10.487   –0.250
2     –1.273    1.409     –8.854    –0.437   7.050
3     –5.160    –1.220    –4.956    –1.429   2.775
4     –5.181    1.336     –2.033    0.170    3.534
5     0.431     1.159     3.291     0.184    0.899
6     5.567     –3.632    –6.719    –0.045   –5.231
7     –9.146    –4.834    –5.639    –0.201   2.091
8     –5.561    –1.786    –5.311    1.451    2.698
9     0.096     –4.575    –0.572    0.207    –0.478
10    –2.126    0.730     19.793    0.040    –6.191
11    –6.788    3.517     3.563     0.083    –0.822
12    –0.056    –3.909    –0.429    0.448    –2.425
13    –2.571    –4.794    3.664     1.154    –8.127
14    2.126     4.353     –3.162    1.428    7.309
15    2.434     –12.016   –1.286    0.129    8.345
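As a concrete illustration of how the table entries map onto the arrays used in the network equation, the first three rows of Table 7 can be assembled as follows; the assignment of the j = 1..3 columns to the normalized √GR, DTC, and RHOB inputs, in that order, follows the input ordering stated in the text.

```python
import numpy as np

# First three hidden neurons of the Shmax model (rows i = 1..3 of Table 7).
# Each W1 row holds one neuron's three input weights (columns j = 1..3).
W1 = np.array([
    [ 1.719, -2.422, 12.895],
    [-1.273,  1.409, -8.854],
    [-5.160, -1.220, -4.956],
])
W2 = np.array([-0.327, -0.437, -1.429])   # hidden-to-output weights
b1 = np.array([-10.487, 7.050, 2.775])    # hidden-layer biases
b2 = -0.250  # single output bias, listed once on the first row of the table
```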
Model Verification
For further investigation of the performance of the developed equations, 456
unseen data points from well B were used. These data involved the logging
measurements (GR, RHOB, and DTC) and the corresponding Shmin and Shmax
gradients. The logging data were fed as inputs to the developed ANN-based
equations, and the results were then compared with the actual stress-gradient
values. The predictions of both the Shmin and Shmax gradients remarkably
matched the actual values, with MAPE values of 0.18 and 0.43% and R-values
exceeding 0.90 for the Shmin and Shmax predictions, respectively (Figure 12).
These results demonstrate the outstanding performance of the developed
ANN-based equations in developing continuous profiles of the Shmin and Shmax
with high accuracy whenever the well-log data are available.
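The two verification metrics quoted above can be computed as in this minimal sketch; the function names and example values are illustrative only.

```python
import numpy as np

def mape(actual, predicted):
    # Mean absolute percentage error, reported in percent.
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)

def r_value(actual, predicted):
    # Pearson correlation coefficient between actual and predicted series.
    return float(np.corrcoef(actual, predicted)[0, 1])
```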
Figure 12
Prediction performance
of the developed ANN-based equations (actual
vs predicted stress gradients) for the verification process: (a) Shmin-gradient prediction and (b) Shmax-gradient prediction.
Having continuous profiles of the least principal
stresses for
the drilled wells could help provide practical solutions to several
wellbore instability issues that may affect the well integrity. Besides,
such data would help develop a comprehensive geomechanical model of
the subsurface formations. As a result, a broad suite of problems
along different stages of the well life could be addressed and avoided.

It should be highlighted that the developed correlations are recommended mainly
for carbonate formations, from which most of the data used in developing the
models were obtained. Other formation types may have different log responses to
the geomechanical properties that control the downhole stress distributions, so
some error might be expected when the equations are applied to different
lithologies. Moreover, it is recommended to employ the developed equations
using inputs within the ranges and with the same units listed in Table to
ensure reliable results.
Conclusions
New models were developed using two ML techniques, ANN and SVM, to predict the
maximum (Shmax) and minimum (Shmin) horizontal stress gradients. The developed
models used the conventional logging data (GR, RHOB, and DTC) as inputs to the
algorithms.
The findings of this research can be highlighted as follows:

- The prediction performance of the models developed by ANN surpassed that of
the SVM-based ones, with an accuracy exceeding 90% and a MAPE of 0.30%.
- Novel equations were established from the tuned weights and biases of the
optimized neural networks. These equations can estimate the Shmin and Shmax
gradients directly from the logging data.
- The new equations were validated using a different data set, achieving a
clear match between the predicted and actual stress-gradient values, with a
MAPE not exceeding 0.43%. These results reflect the robustness of the new
equations in accurately estimating the Shmin and Shmax gradients directly from
the well-logging data.