Yang Yu1, Shangqi Liu1, Yang Liu1, Yu Bao1, Lixia Zhang1, Yintao Dong2. 1. PetroChina Research Institute of Petroleum Exploration and Development, Beijing 100083, China. 2. CNOOC Research Institute, Beijing 100028, China.
Abstract
This study develops a data-driven proxy model for forecasting cumulative oil (Cum-oil) production during the steam-assisted gravity drainage process. In building the model, an artificial neural network (ANN) serves as a complementary and computationally efficient tool for the physics-driven model, and the von Bertalanffy performance indicator bridges the physics-driven model and the ANN. The accuracy of the model is then validated with blind-testing cases. The average absolute percentage error of the performance-indicator parameters in the testing data set is 0.77%, and the error of Cum-oil production after 20 years is 0.52%. The results show that integrating the performance indicator with the ANN makes it possible to solve time series problems efficiently. In addition, the data-driven proxy model can be applied to fast parametric studies, quick uncertainty analysis with the Monte Carlo method, and prediction of average daily oil production. The findings improve the understanding of how physics-driven and data-driven models can be combined and illustrate the potential of the data-driven proxy model to help reservoir engineers make better use of this important thermal recovery technology for oil sands and heavy oil reservoirs.
Viscous oil is an issue of global importance. As conventional oil resources are depleted, continuing demand for fossil fuels has promoted production from unconventional viscous-oil reservoirs over the last few decades.[1,2] Many oil sands and heavy oil reservoirs that are difficult to exploit, such as the MacKay River oil sands in Canada and the Fengcheng extra-heavy oil development area in China, have been discovered around the world.[3,4] The steam-assisted gravity drainage (SAGD) process, an effective thermal technique for exploiting oil sands and heavy oil reservoirs, generally achieves a higher recovery factor than traditional thermal recovery approaches (for instance, cyclic steam stimulation or steam flooding). With the development of SAGD technology (as depicted in Figure 1), large-scale commercial applications have been realized all over the world.[5,6]
Figure 1
Cross-sectional view of the typical SAGD process:
(A) preheating
phase, (B) ramp up phase, and (C) lateral expanding phase (modified
from Irani et al.[13]).
Although the concept of the SAGD process seems quite simple, it is in reality a multiphysical process involving simultaneous heat and mass transfer.[7−9] Consequently, conventional approaches, such as the empirical formula method and the analytic productivity formula method, cannot accurately predict SAGD performance. Reservoir numerical simulation is an effective way to predict performance over the full life cycle of the SAGD process, provided that adequate inputs are available.[10] It is an effective physics-driven modeling method and is considered a dependable research tool in the reservoir engineering field.[11] However, when the reservoir numerical simulation model is complex, runs become very time-consuming, and the storage requirement of big data raises additional challenges for complicated simulation models.[12] It is therefore of great value to build a more efficient proxy model that meets the requirements of today's fast-paced application scenarios. This work is an effort to establish a workflow that uses prepared data sets to construct the proxy model and offers accurate forecasts at lower computational and storage costs.

A data-driven proxy model, an alternative model to a
physics-driven model, has started to attract extensive attention because of its capability to learn thoroughly from appropriate data sets during training. Data-driven methods have great potential in the oil and gas industry, and their scope of application covers upstream and downstream fields, including exploration and development, storage and transportation, and so forth. In particular, many scholars have used various data-driven methods for performance forecasting tasks in the reservoir engineering field. Gupta et al. (2014) provided a workflow that uses the autoregressive integrated moving average (ARIMA) model to forecast production of shale gas reservoirs.[14] Kulga et al. (2017) proposed an artificial neural network (ANN)-based forecast model to predict daily gas production from tight-gas sand formations and found that the ANN model performs well.[15] Amirian et al. (2018) employed artificial and computational intelligence (ACI)-based learning algorithms for performance forecasting of polymer flooding in heavy oil reservoirs.[16] Sagheer et al. (2019) built a deep long short-term memory (LSTM) network to solve the time series prediction problem of petroleum production.[17] Negash and Yaw (2020) established an ANN model to forecast production of a hydrocarbon reservoir under water injection.[18] Xue et al. (2020) built a data-driven proxy model based on the multiobjective random forest method to forecast the dynamic behavior of shale gas production.[19] Zhong et al. (2020) proposed a data-driven proxy model based on a conditional deep convolutional generative adversarial network (CDC-GAN) to predict the field oil production of a reservoir developed by waterflooding.[20] Deng and Pan (2021) designed and implemented an echo state network (ESN)-based data-driven proxy model for prediction tasks in waterflooding fields.[21]

The aforementioned works reveal that the data-driven proxy model
could provide a powerful tool for solving performance forecasting problems, most of which involve changes over time. Several methods, including ANN, support vector machine, random forest regression, and their variants, can be used to construct a data-driven proxy model. Among them, ANN is the most popular approach to forecasting problems with time series.[22−26] ANNs can be roughly divided into two categories: feedforward neural networks and feedback neural networks. Most forecasting problems with time series have been solved with feedback neural networks, such as the Elman neural network, the recurrent neural network, and the LSTM neural network.[27−29] The feedback neural network, however, has a more complicated structure than the feedforward neural network as a result of bidirectional transmission, self-circulation, memory, or other functions. The feedforward neural network, by contrast, is widely used for performance forecasting problems without time series because it is easy to understand and to access.

In the performance
forecasting field of reservoir engineering, many scholars have used the feedforward neural network, support vector machine, random forest regression, or their variants to complete prediction tasks, with good results. However, these tasks usually involve only non-time series problems, such as recovery prediction at a given time step. Forecasting the cumulative oil (Cum-oil) production profile during the SAGD process is a time series problem (as shown in Figure 2). According to the literature review, when performance forecasting involves time series, many previous studies employ data-driven methods that can directly solve time series problems, or a combination of multiple data-driven methods. This increases the complexity of the data-driven model and the difficulty of applying it. Hence, it is necessary to explore a convenient approach to forecasting the Cum-oil production profile during the SAGD process.
Figure 2
General sketch of Cum-oil production forecasting problem.
This paper focuses on establishing a data-driven proxy model that takes full advantage of the feedforward neural network instead of using more complex neural networks or other methods. With an appropriate knowledge-based performance indicator, the data-driven proxy model can accurately and efficiently predict how Cum-oil production changes with time during the SAGD process. The integration of the selected reservoir model, the performance indicator, and the feedforward neural network makes it possible to solve time series problems efficiently. It is an attempt to combine the physics-driven method with the data-driven method. Ultimately, it can help reservoir engineers make better use of SAGD technology for oil sands and heavy oil reservoirs.

This paper is structured as follows: first, the
methodology for
reservoir modeling and data generation is explained; then, the methodology
for determination of performance indicator and data-driven proxy model
construction is presented; next, validation and application of data-driven
proxy model are elaborated; and finally, the related discussions are
shown and key conclusions are summarized.
Methodology
Reservoir Modeling and Data Generation
According to typical properties of the MacKay River oil sands, a 3D SAGD base model (Figure 3) was constructed for flow simulation in an oil sand reservoir. As shown in Figure 3, the horizontal injector and producer are placed parallel to each other in the lower part of the model in the vertical (Z) direction and in the middle of the model in the X-direction. The X-directional length of the 3D SAGD base model is set to 125 m, considering the actual distance between two adjacent SAGD well pairs in the field. A 200 m horizontal wellbore is modeled along the Y-direction. The 3D SAGD base model is therefore 125 m × 220 m × 25 m; the grid size is 1 m in both the X and Z directions, and the Y direction is divided into cells of 10 m, 50 m × 4, and 10 m. The preheating period lasts 150 days and the production period lasts 20 years, in line with realistic SAGD projects. All simulation cases are based on the 3D SAGD base model.
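As a quick consistency check of the stated dimensions, the cell sizes along each axis can be summed and multiplied. The sketch below is only an illustration; the variable names are chosen here and are not taken from the simulation input files.

```python
# Cell sizes along each axis of the base model, in metres.
dx = [1.0] * 125                   # X: 125 cells of 1 m
dy = [10.0] + [50.0] * 4 + [10.0]  # Y: 10 m, 50 m x 4, 10 m
dz = [1.0] * 25                    # Z: 25 cells of 1 m

# Overall model size and total number of grid cells.
model_size = (sum(dx), sum(dy), sum(dz))
n_cells = len(dx) * len(dy) * len(dz)

print("model size (m):", model_size)
print("number of cells:", n_cells)
```

The 220 m length in the Y direction is the 200 m wellbore section (50 m × 4) plus one 10 m cell at each end.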
Figure 3
3D SAGD base model (ΔX = ΔZ = 1 m; ΔY = 10 m, 50 m × 4,
10 m).
Attributes belonging to initial conditions or reservoir characteristics are uncontrollable factors associated with the reservoir or fluid, whereas attributes belonging to operating parameters are human-controlled factors related to the SAGD well pairs. All attributes jointly affect the performance of the SAGD process. The input parameters used to generate simulation cases should therefore cover these three categories, as presented in Figure 4. Based on a literature review and field experience, a series of typical attributes that can be considered in the numerical simulation model are chosen as input parameters.[30−34] Input parameters related to initial conditions include initial reservoir pressure, initial oil saturation, and thermal conductivity of rocks. Input parameters related to reservoir characteristics include effective thickness, porosity, horizontal permeability, and the ratio of vertical permeability to horizontal permeability. Operational pressure, steam rate of the production well, and steam quality are the operating input parameters. The ranges of these parameters are determined according to actual conditions and experience of the SAGD process in the MacKay River oil sands. Table 1 shows the ranges of all input parameters, which are divided into three categories, as also shown in Figure 4.
Figure 4
Input and output of the proxy model for Cum-oil
production profile
forecast during the SAGD process.
Table 1
Value Ranges of Inputs for the Data-Driven Proxy Model
category                    parameter   unit          minimum       maximum
initial conditions          Pi          kPa           200           600
initial conditions          Soi         fraction      0.65          0.85
initial conditions          λr          J/(m·d·°C)    1.56 × 10^5   4.5 × 10^5
reservoir characteristics   h           m             15            25
reservoir characteristics   ϕ           fraction      0.25          0.35
reservoir characteristics   kh          mD            2000          4000
reservoir characteristics   kv/kh       fraction      0.3           0.8
operating parameters        Pinj        kPa           1500          3000
operating parameters        QS          m^3/d         5             15
operating parameters        XS          fraction      0.75          0.95
As shown in Table 1, the initial reservoir pressure, Pi, varies from 200 to 600 kPa, and the initial oil saturation, Soi, ranges from 0.65 to 0.85. The thermal conductivity of rocks, λr, varies from 1.56 × 10^5 to 4.5 × 10^5 J/(m·d·°C), and the effective thickness, h, ranges from 15 to 25 m. Porosity, ϕ, varies from 0.25 to 0.35, and horizontal permeability, kh, ranges from 2000 to 4000 mD. The ratio of vertical permeability to horizontal permeability, kv/kh, varies from 0.3 to 0.8, and the operational pressure, Pinj, ranges from 1500 to 3000 kPa. The maximum steam rate of the production well, QS, varies from 5 to 15 m^3/d, and the steam quality, XS, ranges from 0.75 to 0.95. The Latin hypercube sampling method, one of the experimental design methods, is used to produce 524 sets of data that are uniformly distributed within the value ranges listed in Table 1. The corresponding simulation results are then obtained with a commercial numerical simulator (CMG STARS, 2020).
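The sampling step can be illustrated with a minimal, dependency-free sketch of Latin hypercube sampling over the ranges of Table 1. The function name `latin_hypercube`, the dictionary keys, and the seed are assumptions made for this illustration; the paper does not state which implementation was used.

```python
import random

# Input ranges from Table 1 as (minimum, maximum).
RANGES = {
    "Pi": (200.0, 600.0),         # kPa
    "Soi": (0.65, 0.85),          # fraction
    "lambda_r": (1.56e5, 4.5e5),  # J/(m d degC)
    "h": (15.0, 25.0),            # m
    "phi": (0.25, 0.35),          # fraction
    "kh": (2000.0, 4000.0),       # mD
    "kv_kh": (0.3, 0.8),          # fraction
    "Pinj": (1500.0, 3000.0),     # kPa
    "Qs": (5.0, 15.0),            # m^3/d
    "Xs": (0.75, 0.95),           # fraction
}


def latin_hypercube(ranges, n_samples, seed=0):
    """Return n_samples cases; each parameter hits every stratum exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (lo, hi) in ranges.items():
        strata = list(range(n_samples))
        rng.shuffle(strata)  # random pairing of strata across parameters
        columns[name] = [
            lo + (s + rng.random()) / n_samples * (hi - lo) for s in strata
        ]
    return [{name: columns[name][i] for name in ranges} for i in range(n_samples)]


cases = latin_hypercube(RANGES, n_samples=524)
print(len(cases), "cases; first case:", cases[0])
```

Each parameter range is cut into 524 equal strata and every stratum is sampled once, which is what gives the near-uniform coverage of the ranges with a limited number of simulation runs.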
Construction of the Knowledge-Based Performance
Indicator
To build a data-driven proxy model that can accurately and efficiently predict the Cum-oil production profile of the SAGD process, an appropriate knowledge-based performance indicator must be found and clearly defined. Various mathematical growth models derived from the biological growth field have been widely used to study population growth, cell growth, and other problems in the life, social, and economic sciences.[35,36] Growth is a common feature of many scenarios, including reservoir production. In these scenarios, a mathematical growth model offers an effective tool to account for growth under competing expansion and restraint forces.[37] This capacity makes it possible to describe the Cum-oil production profile that results from the interaction of complicated recovery mechanisms involving simultaneous heat and mass transfer across a connectivity network. Importantly, strong analogies can be established between the SAGD process and the tumor growth process, to which mathematical growth models have been successfully applied.[37] This is a significant motivation for using mathematical growth models to solve the Cum-oil production forecasting problem for the SAGD process.

Some successful application
cases of production forecasting with mathematical growth models have been reported.[37−39] To achieve better performance, different mathematical models, such as the logistic model, the Gompertz model, and the von Bertalanffy model, were used to fit the simulation results. In our comparisons, the von Bertalanffy model fitted the Cum-oil production profile of the SAGD process better than the other models. Therefore, the von Bertalanffy model is chosen as the performance indicator to fit Cum-oil production profiles. The general mathematical form of the von Bertalanffy model is[40−42]

$$z = a\left(1 - b e^{-kt}\right)^{3} \tag{1}$$

Then, dz/dt can be written as

$$\frac{dz}{dt} = 3abk\, e^{-kt}\left(1 - b e^{-kt}\right)^{2} \tag{2}$$

and d^2z/dt^2 is

$$\frac{d^{2}z}{dt^{2}} = 3abk^{2} e^{-kt}\left(1 - b e^{-kt}\right)\left(3b e^{-kt} - 1\right) \tag{3}$$

The main mathematical characteristics of the von Bertalanffy model are as follows:
(1) From eq 1, it can be seen that z → a as t → ∞, which proves the boundedness of the von Bertalanffy model. The term a refers to the maximum size or carrying capacity.
(2) According to eq 2, dz/dt ≥ 0, so the model is monotonically increasing (all constants of the model are greater than 0).
(3) In addition, according to eq 3, the second derivative changes sign from positive to negative at t = ln(3b)/k when b > 1/3, so the curve of the model is S-shaped.

In the SAGD process, the Cum-oil production profile is similar to the curve of the von Bertalanffy model. First, not all the resources can be extracted under the specific technical conditions, so the Cum-oil production is bounded. Second, the Cum-oil production curve is monotonically increasing. Third, the Cum-oil production curve is also S-shaped. All these features are consistent with the characteristics of the von Bertalanffy model.

Based on the abovementioned analogies and eq 1, the Cum-oil production indicator can be described as follows[39]

$$N_p = N_R\left(1 - b e^{-kt}\right)^{3} \tag{4}$$

where N_p is the Cum-oil production at time t and N_R (written NR in the text) plays the role of a in eq 1. Based on eq 4, NR and the coefficients b and k can be used to fit the Cum-oil production profile of the SAGD process. Introducing such an indicator makes it possible to solve the time series problem with the feedforward neural network, and the Cum-oil production can be obtained at any desired time step.

Based on the Cum-oil production profiles
which are extracted from the simulation results described in the Reservoir Modeling and Data Generation section, NR and the coefficients b and k of each data set are obtained according to eq 4. The Levenberg–Marquardt method is used in the fitting process to achieve better fitting performance. The results show that the Cum-oil performance indicator used in this paper fits the SAGD process very well: for all cases, the R-square, which represents the fitting accuracy, is almost equal to 1.00. Among the cases, NR ranges from 4.84 × 10^4 to 15.56 × 10^4 m^3, coefficient b ranges from 0.54 to 0.82, and coefficient k ranges from 0.11 to 0.49.
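The fitting step can be sketched as follows, assuming the indicator has the form N_p(t) = N_R (1 − b e^(−kt))^3, with t in years and N_R in units of 10^4 m^3 (both unit choices are assumptions). The paper uses the Levenberg–Marquardt method; to stay dependency-free, this sketch swaps in a plain grid search over b and k, with N_R obtained in closed form by linear least squares for each pair. The function names and the synthetic profile are hypothetical.

```python
import math


def cum_oil(t, n_r, b, k):
    """Cum-oil indicator: N_p(t) = N_R * (1 - b * exp(-k t))**3."""
    return n_r * (1.0 - b * math.exp(-k * t)) ** 3


def inflection_time(b, k):
    """Time at which the profile changes from convex to concave (needs b > 1/3)."""
    return math.log(3.0 * b) / k


def fit_indicator(times, values):
    """Grid search over (b, k); N_R follows by linear least squares for each pair."""
    best = None
    for i in range(41):          # b = 0.50, 0.51, ..., 0.90
        b = 0.50 + 0.01 * i
        for j in range(56):      # k = 0.05, 0.06, ..., 0.60
            k = 0.05 + 0.01 * j
            g = [(1.0 - b * math.exp(-k * t)) ** 3 for t in times]
            n_r = sum(y * gi for y, gi in zip(values, g)) / sum(gi * gi for gi in g)
            sse = sum((y - n_r * gi) ** 2 for y, gi in zip(values, g))
            if best is None or sse < best[0]:
                best = (sse, n_r, b, k)
    return best[1:]


# Synthetic "simulation" profile: yearly points over 20 years (hypothetical values).
times = list(range(1, 21))
values = [cum_oil(t, n_r=10.0, b=0.70, k=0.30) for t in times]

n_r_fit, b_fit, k_fit = fit_indicator(times, values)
print("fitted N_R, b, k:", n_r_fit, b_fit, k_fit)
print("inflection time (years):", inflection_time(b_fit, k_fit))
```

A gradient-based least-squares solver would replace the grid search in practice; the grid version is used here only because its behavior is easy to check by hand.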
Data-Driven Proxy Model
Related Basic Theories of the Feedforward
Neural Network
In this section, the feedforward neural network is employed to construct the data-driven proxy model. The feedforward neural network is a highly nonlinear mapping system with self-organization, self-learning, and self-adaptation capabilities, inspired by the biological nervous system. Generally, it is made up of an input layer, one or more hidden layers, and an output layer, and each layer can have a different number of neurons (Figure 5).[43,44] The numbers of neurons in the input layer and the output layer equal the numbers of input parameters and output parameters, respectively, while each hidden layer contains several highly interconnected neurons. The use of the neural network includes two parts: training and forecasting. As depicted in Figure 5, the training process of the feedforward neural network can be divided into two stages. In the first stage, the signal propagates forward from the input layer to the output layer. In the second stage, the error propagates backward from the output layer toward the input layer in order to correct the weight matrix. The well-trained neural network is obtained after a number of iterations.
Figure 5
Schematic diagram of the feedforward neural
network.
During the forward propagation of the signal, the algorithm of the single-layer perceptron is shown in Figure 6. After the data set is input, each input is multiplied by its connection weight, and the weighted inputs are summed. Once the sum, including the bias, is obtained, the output is computed with the activation function. Thus, the mathematical formula is given by[45,46]

$$y = f\left(\sum_{i=1}^{n} w_i x_i + \theta\right)$$

where x_i are the inputs, w_i are the connection weights, θ is the bias, f is the activation function, and y is the output.
Figure 6
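Algorithm of single-layered perceptron.
During the back propagation of the error, the updating formula of weight and bias is

$$w_i \leftarrow w_i - \eta\,\frac{\partial E}{\partial w_i}, \qquad \theta \leftarrow \theta - \eta\,\frac{\partial E}{\partial \theta}$$

where w_i is a connection weight, θ is a bias, E is the loss, and η is the learning rate.

For convenience, NN (neural network) is used to denote the feedforward neural network in the remainder of this paper.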
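The forward pass of a single neuron and one gradient-descent update can be sketched as below. This is a minimal illustration rather than the training code of the paper: it assumes a ReLU activation and a squared-error loss for one sample, and all numerical values and function names are hypothetical.

```python
def relu(z):
    return z if z > 0.0 else 0.0


def forward(x, w, bias):
    """Weighted sum plus bias, passed through the activation function."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return relu(z), z


def gradient_step(x, w, bias, target, lr):
    """One gradient-descent update for the loss E = 0.5 * (y - target)**2."""
    y, z = forward(x, w, bias)
    dy_dz = 1.0 if z > 0.0 else 0.0   # derivative of ReLU
    delta = (y - target) * dy_dz       # dE/dz by the chain rule
    new_w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
    new_bias = bias - lr * delta
    return new_w, new_bias


x = [0.5, -1.0, 2.0]
w, bias = [0.1, 0.2, 0.3], 0.05
target, lr = 1.0, 0.1

y_before, _ = forward(x, w, bias)
w, bias = gradient_step(x, w, bias, target, lr)
y_after, _ = forward(x, w, bias)
print("output before:", y_before, "after one update:", y_after)
```

In a multilayer network the same chain rule is applied layer by layer, from the output layer back toward the input layer.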
NN-Based Data-Driven Proxy Model
To build the NN-based data-driven proxy model, the first step is to construct the overall data set, which includes all the input and output parameters. The ten input parameters are shown in Table 1, and the three output parameters are NR, b, and k, which are derived from fitting the simulation outputs. Because the ten input parameters differ in magnitude, they are normalized before training. The overall data set, which consists of 524 sets of input and output data, is randomly divided into three subsets using an 80/10/10 percent split: a training data set, a validation data set, and a testing data set. The training data set is used for training the NN, while the validation data set is used for hyperparameter adjustment to find a suitable structure of the NN. Last but not least, the testing data set is used for blind testing (it is not involved in the training process at all) to evaluate the final forecasting performance of the data-driven proxy model.

In this paper, the NN is implemented in Python
3.7 with PyTorch. To find a suitable structure of the NN, the main hyperparameters that need to be adjusted are the learning rate, activation function, loss function, optimization algorithm, number of neurons in the hidden layers, and number of layers.

After the learning rate was tuned over the range 0.0001 to 0.8, 0.004 was selected as its initial value, and a learning rate scheduler named “ReduceLROnPlateau” is used.[47] This scheduler dynamically reduces the learning rate once the loss function stops decreasing. Specifically, the factor is set to 0.1 and the patience is set to 10. The ReLU activation function and the L1 loss function were selected as the final activation function and loss function, respectively. Different optimization algorithms, such as the stochastic gradient descent (SGD) method and the Adam method, were then applied in the training process of the NN.[48] The results reveal that the SGD method with appropriate settings outperforms the other methods, so the SGD method with a momentum of 0.75 is the final choice. In addition, L2 regularization is used to avoid overfitting.

Based on the aforementioned
settings, a variety of configurations with one or multiple hidden layers (5 to 100 neurons in each layer) are explored, and the best-performing NN structure is chosen. The results show that the forecast precision of b and k is relatively low compared with that of NR. To improve the performance, revised output parameters are adopted in the final NN: it uses NR, 10b, and 10k instead of NR, b, and k, to account for the difference in magnitude between the original output parameters. The performance improvement can be seen in Figure 7. Figure 7a shows that the average error of all sets is reduced, and Figure 7b shows that more accurate outputs are obtained.
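The data preparation described in this section (input normalization, the 80/10/10 split, and the rescaled outputs) can be sketched as follows. The normalization formula is not recoverable from the text, so min-max scaling to [0, 1] is assumed here; the rounding used for the subset sizes and all function names are likewise assumptions.

```python
import random


def min_max_normalize(column):
    """Scale a list of values to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]


def split_indices(n_cases, seed=0):
    """Random 80/10/10 split into training, validation, and testing indices."""
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    n_train = round(0.8 * n_cases)
    n_val = round(0.1 * n_cases)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def scale_outputs(n_r, b, k):
    """Targets of the revised NN: N_R, 10b, 10k."""
    return n_r, 10.0 * b, 10.0 * k


def unscale_outputs(n_r, b10, k10):
    """Convert NN outputs back to N_R, b, k before using the indicator."""
    return n_r, b10 / 10.0, k10 / 10.0


train_idx, val_idx, test_idx = split_indices(524)
print(len(train_idx), len(val_idx), len(test_idx))
print(min_max_normalize([2000.0, 3000.0, 4000.0]))
print(scale_outputs(10.0, 0.70, 0.30))
```

Multiplying b and k by 10 brings the three targets to a similar order of magnitude, so that no single output dominates a loss that is averaged over all outputs.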
Figure 7
Comparison of different choices of output parameters: (a) average error and (b) coefficient of determination (R^2) for three outputs.
Figure 8 shows the final topology of the NN, which has three hidden layers with 21, 16, and 15 neurons, respectively. Thus, the construction of the NN-based data-driven proxy model is completed.
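The final topology (10 inputs, hidden layers of 21, 16, and 15 neurons, and 3 outputs) can be written out as a dependency-free forward pass. The paper implements the network in PyTorch; the plain-Python version below is only a sketch of the structure, with random untrained weights and an assumed linear output layer.

```python
import random

LAYER_SIZES = [10, 21, 16, 15, 3]  # inputs, three hidden layers, outputs


def init_network(sizes, seed=0):
    """Create one (weights, biases) pair per layer with small random weights."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: ReLU on hidden layers, linear output layer (assumed)."""
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * xi for w, xi in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        x = z if is_output else [max(0.0, v) for v in z]
    return x


def count_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


network = init_network(LAYER_SIZES)
normalized_inputs = [0.5] * 10  # one case with all inputs at mid-range
print("outputs (untrained):", forward(network, normalized_inputs))
print("trainable parameters:", count_parameters(network))
```

The network is small, which is consistent with the short evaluation times reported for the proxy model.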
Figure 8
Final topology of the NN.
Overview of Workflow for Building the Data-Driven
Proxy Model
According to the above description, the workflow for building the data-driven proxy model for Cum-oil production profile forecasting of the SAGD process can be summarized as follows (Figure 9):
Figure 9
Proposed workflow in this paper.
(1) The first step is to choose appropriate variables as input parameters and construct the database of input parameters within the given ranges.
(2) After the data generation procedure, a variety of reservoir models with the input parameters from Step (1) are established. Each simulation run is then completed with the commercial simulator, and the Cum-oil profiles are extracted as performance characteristics.
(3) The third step is to construct the knowledge-based performance indicator from the simulation results of the models built in Step (2). In this step, the von Bertalanffy model is chosen to represent the Cum-oil production profiles. NR and the coefficients b and k of each data set are obtained with the Levenberg–Marquardt method.
(4) Based on these results, the initial framework of the feedforward neural network is designed.
(5) The aim of the fifth step is to find a suitable topology of the NN. In this step, the main hyperparameters are adjusted until the error is within the accepted range.
(6) After the training process, the adaptability and accuracy of the trained NN are validated with the blind-testing data set. The NN can then be used as a data-driven proxy model to forecast the Cum-oil production profile during the SAGD process.
Application of the Data-Driven Proxy Model
After the accuracy of the model has been verified, it can be put to practical use. The applications of the data-driven proxy model are illustrated in the Results section. First, efficient parametric studies are shown. Second, uncertainty analysis of the SAGD process is conducted under the assumptions listed in Table 2; all parameters are assumed to follow a normal distribution. Third, the ability of the data-driven proxy model to predict the average daily oil production is shown.
Table 2
Different Variable Parameters and Their Value Ranges
parameter   unit          expectation   standard deviation
Pi          kPa           400           20
Soi         fraction      0.75          0.02
λr          J/(m·d·°C)    3 × 10^5      3 × 10^4
h           m             20            1
ϕ           fraction      0.3           0.01
kh          mD            3000          200
kv/kh       fraction      0.55          0.06
Pinj        kPa           2250          150
QS          m^3/d         10            1
XS          fraction      0.85          0.02
Results
Performance of the Data-Driven Proxy Model
In this part, the performance of the data-driven proxy model is evaluated. The data-driven proxy model is mainly used to predict NR, b, and k. Once NR, b, and k of a given case are determined, the Cum-oil production profile can be obtained from the performance indicator. It is worth noting that the outputs of the revised NN are NR, 10b, and 10k, so a simple conversion is needed before the final calculation. The relative error (RE) and the average absolute percentage error (AAPE) are used to measure the forecasting accuracy.

The AAPE of NR, b, k, and the 20-year Cum-oil production between the proxy model and the numerical simulation model is shown in Figure 10. Figure 10 shows that the error of each set (training set, validation set, and testing set) is relatively low: the AAPE of all the parameters in each set is less than 2%. Therefore, the accuracy of the model is verified.
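The two error measures can be sketched as below. The definitions follow the usual conventions (RE as the absolute relative deviation in percent, AAPE as its mean over the cases); taking the absolute value in RE is an assumption, and the numbers are hypothetical NR values for three cases.

```python
def relative_error(predicted, reference):
    """Relative error in percent (the absolute value is an assumption)."""
    return abs(predicted - reference) / abs(reference) * 100.0


def aape(predicted, reference):
    """Average absolute percentage error over paired lists of values."""
    errors = [relative_error(p, r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical N_R values (10^4 m^3): proxy predictions vs. simulation results.
proxy = [10.1, 7.84, 12.0]
simulation = [10.0, 8.0, 12.0]
print([round(relative_error(p, s), 2) for p, s in zip(proxy, simulation)])
print(round(aape(proxy, simulation), 2))
```

The same functions apply unchanged to b, k, and the 20-year Cum-oil production.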
Figure 10
AAPE of NR, b, k, and 20-year Cum-oil production in each data
set.
The RE frequency distributions of the different parameters in each set are shown in Figure 11. It can be seen from Figure 11 that the error is less than 1% in most cases and less than 3% in almost all cases. It can also be observed from Figures 10 and 11 that the forecasting precision of k is lower than that of NR and b; the maximum RE of k is higher than 5%. However, the effect of k on the Cum-oil production profile is comparatively small, so the error of k is acceptable.
Figure 11
RE frequencies of output parameters and 20-year
Cum-oil production
for different sets: (a) NR, (b) b, (c) k, and (d) 20-year Cum-oil production.
Two cases with comparatively better forecasting performance are shown in Figure 12, where the Cum-oil production curves obtained from the two models agree well. Two cases with comparatively worse forecasting performance are shown in Figure 13. Although a certain deviation between the two curves can be observed in Figure 13, the general trend of the curve predicted by the data-driven proxy model is consistent with that of the numerical simulation. In addition, the forecasting precision of NR plays a significant role in the model. In Figure 13, the Cum-oil production profiles appear to be marginally overestimated, mainly because NR could not be forecast accurately, which also increases the error of the 20-year Cum-oil production.
Figure 12
Two cases with comparatively better forecasting
performance: (a)
average error of three outputs = 0.47%; (b) average error of three
outputs = 0.89%.
Figure 13
Two cases with comparatively
worse forecasting performance: (a)
average error of three outputs = 1.35%; (b) average error of three
outputs = 2.11%.
Each reservoir simulation run completed in this paper takes about 50 min on an Intel Core i7-3770 3.40 GHz CPU because of the complexity of the recovery mechanism, which amounts to roughly 437 h if the 524 cases are run one after another. By contrast, the data-driven proxy model takes only a few minutes for the overall data set (524 cases).
Efficient Parametric Studies
The data-driven proxy model is a powerful tool for sensitivity analysis of the input parameters because of the time it saves. In this section, Soi and kh are taken as examples. For instance, the value of Soi is changed while the remaining parameters are left unchanged so as to study the effect of Soi on the Cum-oil production profile of the SAGD process.

The sensitivity analysis results obtained from the data-driven proxy model and the numerical simulation model are shown in Figure 14, and the two agree well. Figure 14a shows the sensitivity of Cum-oil production to Soi: the greater Soi is, the more oil can be drained from the porous media and the higher the Cum-oil production is. Figure 14b shows the sensitivity of Cum-oil production to kh: the greater kh is, the faster the steam chamber expands and the higher the Cum-oil production is, although the 20-year Cum-oil production of the cases is relatively close.
Figure 14
Comparison
of sensitivity analysis results between the data-driven
proxy model and numerical model: (a) Soi; (b) kh.
Uncertainty Analysis of the SAGD Process
The Monte Carlo simulation method is adopted to conduct the uncertainty
analysis of the SAGD process through the data-driven proxy model.
The expectation curves of the Cum-oil recovery factor for two
production time periods, obtained from a Monte Carlo simulation of
10,000 samples, are shown in Figure 15. The simulation takes only a few
seconds. Such an application allows us to quantify the uncertainties of
different input parameters and observe their effect on the performance
of the SAGD process.
Figure 15
Expectation curves of the Cum-oil recovery factor for
two different
production time periods.
Figure 16 shows the P10, P50, and P90 estimates of the Cum-oil recovery
factor for 5 and 10 years. This workflow can be used to compare the
effect of different operating parameters on SAGD performance under
known uncertainties. Furthermore, the Cum-oil recovery factor could be
integrated into the economic evaluation process to help reservoir
engineers make decisions.
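The Monte Carlo workflow above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: `toy_recovery_factor` is a hypothetical placeholder for the proxy model with invented coefficients, the uniform input ranges in `sample_case` are made up, and the percentiles follow the exceedance convention common in reserves estimation (P90 is the value exceeded by about 90% of the samples), which the paper does not state explicitly.

```python
import math
import random
from typing import Callable, Dict, List

Case = Dict[str, float]


def toy_recovery_factor(case: Case, t_years: float) -> float:
    """Placeholder for the proxy model. The coefficients are invented, not fitted."""
    plateau = 0.6 * case["Soi"]
    k = 0.10 + 0.03 * case["kh"]
    return plateau * (1.0 - math.exp(-k * t_years)) ** 3


def sample_case(rng: random.Random) -> Case:
    """Draw one input set from assumed (hypothetical) uniform ranges."""
    return {"Soi": rng.uniform(0.7, 0.9), "kh": rng.uniform(1.0, 5.0)}


def monte_carlo(
    proxy: Callable[[Case, float], float],
    t_years: float,
    n_samples: int = 10_000,
    seed: int = 0,
) -> List[float]:
    """Return the sorted recovery factors of `n_samples` random cases."""
    rng = random.Random(seed)
    return sorted(proxy(sample_case(rng), t_years) for _ in range(n_samples))


def exceedance_value(sorted_values: List[float], p: float) -> float:
    """Value exceeded by roughly p percent of the samples (nearest rank)."""
    n = len(sorted_values)
    index = min(n - 1, max(0, int((1.0 - p / 100.0) * n)))
    return sorted_values[index]


results = {}
for horizon in (5.0, 10.0):
    samples = monte_carlo(toy_recovery_factor, horizon)
    results[horizon] = {p: exceedance_value(samples, p) for p in (10, 50, 90)}
    print(f"{horizon:.0f} years:", results[horizon])
```

Plotting the sorted samples against their exceedance probability gives an expectation curve of the kind shown in Figure 15. Reusing the same seed for both horizons means both curves are built from the same sampled inputs.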
Figure 16
P10, P50, and P90 estimations of the Cum-oil recovery
factor for
two different production time periods.
Prediction of Average Daily Oil Production
This model can also be used to forecast the average daily oil
production. For a given case, NR, b, and k are predicted using the
data-driven proxy model. The Cum-oil profile is then plotted using the
performance indicator with these parameters, and the average daily oil
production is calculated from it by a simple formula. This function
is illustrated in Figure 17 for a given case.
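The formula itself is not reproduced in this copy of the text. One plausible reading, shown here only as a sketch, is a finite difference: the oil produced over an interval divided by the number of days in that interval. The von Bertalanffy form Np(t) = NR(1 - b e^(-kt))^3 is assumed, and the values of `n_r`, `b`, and `k` below are hypothetical stand-ins for a proxy-model prediction.

```python
import math
from typing import List


def cum_oil(t_years: float, n_r: float, b: float, k: float) -> float:
    """Assumed von Bertalanffy form: Np(t) = NR * (1 - b * exp(-k * t)) ** 3."""
    return n_r * (1.0 - b * math.exp(-k * t_years)) ** 3


def average_daily_rate(
    n_r: float,
    b: float,
    k: float,
    t_start: float,
    t_end: float,
    days_per_year: float = 365.0,
) -> float:
    """Average daily oil rate over [t_start, t_end]: produced oil / elapsed days."""
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    produced = cum_oil(t_end, n_r, b, k) - cum_oil(t_start, n_r, b, k)
    return produced / ((t_end - t_start) * days_per_year)


# Hypothetical parameter values, standing in for the proxy model's prediction.
n_r, b, k = 8.0e4, 0.95, 0.11
yearly_rates: List[float] = [
    average_daily_rate(n_r, b, k, float(year), float(year + 1)) for year in range(20)
]
for year, rate in enumerate(yearly_rates, start=1):
    print(f"year {year:2d}: {rate:8.2f} per day (arbitrary volume units)")
```

Summing the yearly rates multiplied by 365 recovers the 20-year increase in Cum-oil, which is a quick consistency check on any rate profile derived this way.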
Figure 17
Average daily oil production profile calculated from the Cum-oil
production profile.
Discussion
The performance indicator is a convenient and effective tool to
characterize the production profile. Its fitted parameters can be used
as the outputs of the NN, which is a powerful approach in the
forecasting field. The integration of the two parts makes it possible
to solve time series problems in an efficient way. The data-driven
proxy model achieves the desired effect with a feedforward neural
network, without resorting to more complicated network architectures.
Therefore, it can help reservoir engineers make better use of the
SAGD technology for oil sands or heavy oil reservoirs.
It is noteworthy that errors from the reservoir numerical model and
the performance indicator are carried into the data-driven proxy model.
The reservoir simulation models rest on simplifying assumptions
relative to the actual reservoir, and the agreement between the
simulation results and the performance indicator cannot reach 100%.
However, this does not prevent the data-driven proxy model from being
a powerful forecasting tool.
In our study, ten attributes are selected as variables in the numerical
simulation model, and cases containing different combinations of those
attributes are generated. Because this feature dimension is not high,
all ten attributes are used as input parameters for the data-driven
proxy model in order to capture as many configurations as possible.
In a more complex situation with a large number of input features,
sensitivity analysis is a useful approach for selecting the input
parameters. It could help engineers reduce the computational cost while
keeping the performance of the data-driven model at an acceptable level.
In addition, to study a case whose parameter values lie outside the
ranges of our study, the training data set ought to be expanded and the
data-driven proxy model retrained to include the new conditions. The
workflow for building the data-driven proxy model presented in this
study would offer guidance for such research.
Summary and Conclusions
Based on the reservoir numerical simulation approach, the von
Bertalanffy performance indicator, and ANN, the data-driven proxy model
for Cum-oil production profile forecasting of the SAGD process is
established. The data-driven proxy model fully considers initial
conditions, reservoir characteristics, and operating parameters.
During the training process of the NN, several rounds of hyperparameter
adjustment were carried out to find a suitable network structure. For
this study, the combination of the “ReduceLROnPlateau” learning-rate
scheduler, ReLU activation function, L1 loss function, and SGD
algorithm is applied. To further improve the forecasting performance of
the neural network, additional strategies, such as L2 regularization
and output revision, are used. The final structure of the neural
network consists of three hidden layers with 21, 16, and 15 neurons,
respectively.
The reliability of the data-driven proxy model is verified on the
testing data set. Average absolute percentage error of related
parameters of the performance indicator in the testing data set is
0.77%, and the error of Cum-oil production after 20 years is 0.52%.
The data-driven proxy model could be employed to study large amounts of
data efficiently, as shown in the application of parametric studies and
uncertainty analysis, and it could also be used to forecast average
daily oil production of a given case. Such functions could help
engineers make decisions. Furthermore, the developed workflow can also
be extended to more complex situations of the SAGD process.