Helen Pinto¹, Ian Gates², Xin Wang¹.
Abstract
Bayesian Biclustering by Dynamics (BBCD) is a new clustering algorithm for Steam-Assisted Gravity Drainage (SAGD) oil recovery time series data [1]. In this companion paper, the BBCD algorithm is tested on synthetic data to demonstrate how it is used, to assess its robustness, and to compare its accuracy against Random Agglomeration. The supplementary information includes formulae for calculating the analytical steam and oil volume data used as background knowledge in the SAGD application. The advantages of the BBCD algorithm are:
• It incorporates background knowledge directly into the clustering process.
• It finds similarity both between series and over time.
• It allows a user-specified definition of the behaviour of interest, which reduces dependence on series shape. This is important when similar behavioural events do not necessarily occur in the same temporal order.
Keywords: Bayesian statistics; BBCD; Biclustering algorithm; Steam-assisted gravity drainage (SAGD) application
Year: 2020 PMID: 32382523 PMCID: PMC7199012 DOI: 10.1016/j.mex.2020.100897
Source DB: PubMed Journal: MethodsX ISSN: 2215-0161
Fig. 1. Flowchart of the BBCD algorithm.
Fig. 2. Observed and analytically calculated monthly steam injection and oil production volumes for one well.
Note: The shape of these calculated series is explained further in the supplementary material.
An example of steam-oil cstates.
| Steam | Oil | cstate |
|---|---|---|
| Either steam or oil is 0 | | 0 |
| High | Low | 1 |
| Medium | Low | 2 |
| Low | Low | 3 |
| High | Medium | 4 |
| Medium | Medium | 5 |
| Low | Medium | 6 |
| High | High | 7 |
| Medium | High | 8 |
| Low | High | 9 |
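The cstate numbering follows a regular pattern. Within each oil level (Low, Medium, High), the steam level runs High, Medium, Low. This gives cstate = 1 + (steam position) + 3 × (oil position), with 0 reserved for months where either volume is zero. The Python sketch below encodes that pattern. The three-level discretization in the hypothetical helper `to_level`, and its cut points, are assumptions: this section does not say how monthly volumes are assigned to Low, Medium or High.

```python
# Sketch: map monthly (steam, oil) volumes to the cstates 0-9 of the steam-oil cstate table.
# The cut points used by to_level are hypothetical, not taken from BBCD.

STEAM_ORDER = ("High", "Medium", "Low")  # order of steam levels within one oil band
OIL_ORDER = ("Low", "Medium", "High")    # order of the oil bands


def cstate(steam_level, oil_level):
    """Return the cstate for one month; a level of None means a zero volume."""
    if steam_level is None or oil_level is None:
        return 0
    return 1 + STEAM_ORDER.index(steam_level) + 3 * OIL_ORDER.index(oil_level)


def to_level(volume, low_cut, high_cut):
    """Hypothetical three-level discretization of one monthly volume."""
    if volume == 0:
        return None
    if volume < low_cut:
        return "Low"
    if volume < high_cut:
        return "Medium"
    return "High"


def encode(steam, oil, steam_cuts, oil_cuts):
    """Convert paired steam and oil series into a cstate sequence."""
    return [
        cstate(to_level(s, *steam_cuts), to_level(o, *oil_cuts))
        for s, o in zip(steam, oil)
    ]


print(encode([0, 50, 500], [10, 5, 200], (100, 300), (20, 100)))  # [0, 3, 7]
```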
Fig. 3. A matrix P of transition probabilities at time t.
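A transition probability matrix like P can be estimated from a state sequence by counting transitions between consecutive time steps. Because BBCD is Bayesian and uses background knowledge, the Python sketch below takes a generic Dirichlet-multinomial approach. Pseudo-counts from a prior series (for example, an analytically calculated one) are scaled by `prior_weight`, a small `base` count is added, observed counts are added on top, and each row is normalized to its posterior mean. The helper names, `prior_weight`, and `base` are hypothetical. Restricting `observed` to one time window would give a window-specific matrix, but whether BBCD defines P at time t this way is not stated in this section.

```python
# Sketch: Dirichlet posterior-mean estimate of a transition matrix P.
# Assumption: pseudo-counts come from a prior (e.g. analytical) state series,
# scaled by prior_weight, plus a small base count; BBCD's own prior may differ.


def transition_counts(seq, n_states=10):
    """Count transitions i -> j between consecutive time steps of one series."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    return counts


def posterior_transition_matrix(observed, prior, prior_weight=1.0, base=0.01, n_states=10):
    """Return P with P[i][j] = (alpha_ij + n_ij) / sum_k (alpha_ik + n_ik)."""
    n = transition_counts(observed, n_states)
    m = transition_counts(prior, n_states)
    P = []
    for i in range(n_states):
        # alpha_ij = base + prior_weight * m_ij (hypothetical Dirichlet parameters)
        row = [base + prior_weight * m[i][j] + n[i][j] for j in range(n_states)]
        total = sum(row)  # positive as long as base > 0
        P.append([x / total for x in row])
    return P


P = posterior_transition_matrix(observed=[1, 2, 1, 2], prior=[1, 2])
print(round(P[1][2], 3))  # 0.971
```

Rows with no observed or prior transitions, such as row 0 in the example, fall back to a uniform distribution, because only the `base` pseudo-counts remain.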
Fig. 4. Layouts tested.
Transition probability matrix MC1.
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.031 | 0.060 | 0.169 | 0.308 | 0.357 | 0.041 | 0.001 | 0.001 | 0.001 | 0.001 |
| 2 | 0.001 | 0.011 | 0.070 | 0.150 | 0.239 | 0.288 | 0.130 | 0.070 | 0.031 | 0.011 |
| 3 | 0.120 | 0.060 | 0.100 | 0.090 | 0.100 | 0.090 | 0.100 | 0.130 | 0.130 | 0.080 |
| 4 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.100 | 0.704 | 0.140 | 0.051 | 0.001 |
| 5 | 0.120 | 0.060 | 0.100 | 0.090 | 0.100 | 0.090 | 0.100 | 0.130 | 0.130 | 0.080 |
| 6 | 0.031 | 0.060 | 0.169 | 0.308 | 0.357 | 0.041 | 0.001 | 0.001 | 0.001 | 0.001 |
| 7 | 0.011 | 0.060 | 0.100 | 0.140 | 0.179 | 0.150 | 0.130 | 0.130 | 0.070 | 0.021 |
| 8 | 0.001 | 0.001 | 0.001 | 0.001 | 0.051 | 0.348 | 0.288 | 0.179 | 0.060 | 0.051 |
| 9 | 0.001 | 0.001 | 0.001 | 0.001 | 0.001 | 0.100 | 0.704 | 0.140 | 0.051 | 0.001 |
| 10 | 0.001 | 0.021 | 0.001 | 0.060 | 0.110 | 0.258 | 0.179 | 0.249 | 0.100 | 0.021 |
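BBCD is tested on synthetic data, so one plausible use of a matrix such as MC1 is to generate synthetic state sequences. The Python sketch below samples a sequence from MC1, indexing states 0 to 9. It assumes rows give the current state and columns the next state. The printed rows sum to between 0.970 and 1.001, which supports this reading. `simulate_chain` is a hypothetical helper; this section does not describe how the authors generated their synthetic series.

```python
import random

# Transition probability matrix MC1 as printed (rows and columns indexed 0-9 here).
MC1 = [
    [0.031, 0.060, 0.169, 0.308, 0.357, 0.041, 0.001, 0.001, 0.001, 0.001],
    [0.001, 0.011, 0.070, 0.150, 0.239, 0.288, 0.130, 0.070, 0.031, 0.011],
    [0.120, 0.060, 0.100, 0.090, 0.100, 0.090, 0.100, 0.130, 0.130, 0.080],
    [0.001, 0.001, 0.001, 0.001, 0.001, 0.100, 0.704, 0.140, 0.051, 0.001],
    [0.120, 0.060, 0.100, 0.090, 0.100, 0.090, 0.100, 0.130, 0.130, 0.080],
    [0.031, 0.060, 0.169, 0.308, 0.357, 0.041, 0.001, 0.001, 0.001, 0.001],
    [0.011, 0.060, 0.100, 0.140, 0.179, 0.150, 0.130, 0.130, 0.070, 0.021],
    [0.001, 0.001, 0.001, 0.001, 0.051, 0.348, 0.288, 0.179, 0.060, 0.051],
    [0.001, 0.001, 0.001, 0.001, 0.001, 0.100, 0.704, 0.140, 0.051, 0.001],
    [0.001, 0.021, 0.001, 0.060, 0.110, 0.258, 0.179, 0.249, 0.100, 0.021],
]


def simulate_chain(matrix, start, length, rng):
    """Sample a state sequence; row i holds the next-state weights from state i."""
    states = [start]
    for _ in range(length - 1):
        # random.choices accepts unnormalized weights, so rows that do not
        # sum exactly to 1 are renormalized implicitly.
        states.append(rng.choices(range(len(matrix)), weights=matrix[states[-1]])[0])
    return states


seq = simulate_chain(MC1, start=0, length=24, rng=random.Random(42))
print(seq)
```

Passing a seeded `random.Random` instance keeps each synthetic series reproducible.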
Fig. 5. Ideal number of clusters found for Layouts 2 and 3.
Percentage accuracy of clustering results (number of bins = 10, merge-qualifying cut-off = 2; noise added to both observed and prior data sets).
| Noise | 0% | 0% | 0% | 10% | 10% | 10% | 20% | 20% | 20% | 50% | 50% | 50% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.01 | 0.1 | 1 | 0.01 | 0.1 | 1 | 0.01 | 0.1 | 1 | 0.01 | 0.1 | 1 |
| Layout 1 | 98 | 99 | 100 | 90 | 90 | 94 | 91 | 91 | 93 | 75 | 66 | 68 |
| Layout 2 | 92 | 91 | 100 | 92 | 95 | 100 | 93 | 89 | 100 | 75 | 92 | 93 |
| Layout 3 | 98 | 91 | 99 | 97 | 97 | 97 | 95 | 95 | 97 | 89 | 87 | 93 |
Percentage accuracy with different initial binning.
| Noise | 0% | 0% | 0% | 10% | 10% | 10% | 20% | 20% | 20% | 50% | 50% | 50% |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bins | 5 | 10 | 17 | 5 | 10 | 17 | 5 | 10 | 17 | 5 | 10 | 17 |
| Layout 1 | 100 | 100 | 96 | 100 | 94 | 94 | 84 | 93 | 88 | 54 | 68 | 83 |
| Layout 2 | 99 | 100 | 99 | 98 | 100 | 98 | 98 | 100 | 98 | 88 | 93 | 96 |
| Layout 3 | 98 | 99 | 98 | 97 | 97 | 97 | 96 | 97 | 95 | 83 | 93 | 87 |
Fig. 6. Posterior probability and mincombo scores for one test run.
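The posterior probability in Fig. 6 suggests that merge decisions are scored probabilistically. Bayesian Clustering by Dynamics (BCD) clusters Markov chains in a similar spirit. It compares the marginal likelihood of two groups' transition counts under one shared transition matrix against the likelihood under separate matrices. The Python sketch below implements that Dirichlet-multinomial comparison, using log p(n) = Σᵢ [log Γ(kα) − log Γ(kα + nᵢ) + Σⱼ (log Γ(α + nᵢⱼ) − log Γ(α))]. It assumes symmetric Dirichlet(α) row priors and equal prior odds for merging. These choices and the helper names are assumptions. The sketch does not reproduce BBCD's use of background knowledge as a prior, and it does not compute the mincombo score.

```python
from math import exp, lgamma


def log_marginal_likelihood(counts, alpha=1.0):
    """log p(data) for a Markov chain whose rows have independent symmetric
    Dirichlet(alpha) priors, given transition counts counts[i][j]."""
    total = 0.0
    for row in counts:
        k, n = len(row), sum(row)
        total += lgamma(k * alpha) - lgamma(k * alpha + n)
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in row)
    return total


def merge_log_bayes_factor(counts_a, counts_b, alpha=1.0):
    """Positive values favour one shared transition matrix for both groups."""
    merged = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(counts_a, counts_b)]
    return log_marginal_likelihood(merged, alpha) - (
        log_marginal_likelihood(counts_a, alpha) + log_marginal_likelihood(counts_b, alpha)
    )


def merge_posterior(log_bf):
    """Posterior probability of merging under equal prior odds (numerically stable)."""
    if log_bf >= 0:
        return 1.0 / (1.0 + exp(-log_bf))
    return exp(log_bf) / (1.0 + exp(log_bf))


same = merge_log_bayes_factor([[5, 0], [0, 5]], [[5, 0], [0, 5]])
diff = merge_log_bayes_factor([[5, 0], [0, 5]], [[0, 5], [5, 0]])
print(round(same, 2), round(diff, 2))  # 2.37 -8.69
```

Two groups with the same transition pattern get a positive log Bayes factor and are candidates for merging. Groups with opposite patterns get a strongly negative log Bayes factor.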
Percentage accuracy associated with varying merge-qualifying cut-off points.
| merge-qualifying cut-off | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Layout 1 | 98 | 100 | 100 | 100 |
| Layout 2 | 88 | 100 | 98 | 98 |
| Layout 3 | 82 | 99 | 98 | 98 |
Fig. 7. BBCD matrix operation counts.
Fig. 8. BBCD running time.
Comparison of BBCD against random agglomeration.
| | Accuracy (%), BBCD | Accuracy (%), Random agglomeration | F1 score, BBCD | F1 score, Random agglomeration |
|---|---|---|---|---|
| Layout 1 | 100 | 81 | 100 | 77 |
| Layout 2 | 100 | 88 | 98 | 41 |
| Layout 3 | 99 | 79 | 89 | 25 |
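This section does not define the accuracy and F1 measures behind the comparison, or the exact Random Agglomeration procedure. As an illustration, the Python sketch below uses one common pair-counting convention: accuracy as the Rand index, and F1 over pairs of series placed in the same cluster. It pairs this with a hypothetical random-agglomeration baseline that merges randomly chosen clusters until a target number remains. The paper's definitions may differ.

```python
import random
from itertools import combinations


def random_agglomeration(n_items, n_clusters, rng):
    """Hypothetical baseline: start from singletons and merge random pairs of
    clusters until n_clusters remain; returns one cluster label per item."""
    clusters = [[i] for i in range(n_items)]
    while len(clusters) > n_clusters:
        a, b = rng.sample(range(len(clusters)), 2)
        clusters[a].extend(clusters[b])
        del clusters[b]
    labels = [0] * n_items
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def pairwise_scores(true_labels, pred_labels):
    """Pair-counting accuracy (Rand index) and F1; needs at least two items."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


truth = [0, 0, 0, 1, 1, 1]
baseline = random_agglomeration(len(truth), n_clusters=2, rng=random.Random(1))
print(pairwise_scores(truth, truth))  # (1.0, 1.0)
print(pairwise_scores(truth, baseline))
```

The baseline ignores the series themselves. Averaging its scores over many random seeds therefore gives a chance-level reference against which a clustering method's scores can be judged.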
| Subject area | Energy; Engineering; Computer Science |
|---|---|
| More specific subject area | An algorithm that biclusters time-series data structured as Bayesian matrices, which makes the resulting clusters easier to interpret. |
| Method name | Bayesian Biclustering by Dynamics (BBCD) |
| Name and reference of original method | |
| Resource availability | Java software files, user guide, synthetic data and template files have been uploaded alongside this submission. |