Dragutin T Mihailović, Emilija Nikolić-Đorić, Slavica Malinović-Milićević, Vijay P Singh, Anja Mihailović, Tatijana Stošić, Borko Stošić, Nusret Drešković.
Abstract
The purpose of this paper was to choose an appropriate information dissimilarity measure for hierarchical clustering of daily streamflow discharge data from twelve gauging stations on the Brazos River in Texas (USA) for the period 1989–2016. To that end, we selected and compared average-linkage hierarchical clustering based on the compression-based dissimilarity measure (NCD), the permutation distribution dissimilarity measure (PDDM), and the Kolmogorov distance (KD). The algorithm was also compared with K-means clustering based on Kolmogorov complexity (KC), the highest value of the Kolmogorov complexity spectrum (KCM), and the largest Lyapunov exponent (LLE). The agglomerative average-linkage hierarchical algorithm was applied to a dissimilarity matrix based on NCD, PDDM, and KD for daily streamflow. The key findings of this study are that: (i) KD-based clustering is the most suitable of the measures compared; (ii) analysis of variance (ANOVA) shows highly significant differences between the mean values of the four clusters, confirming that the number of clusters was suitably chosen; and (iii) from the clustering we found that the predictability of the Brazos River streamflow data, given by the Lyapunov time (LT) corrected for randomness by the Kolmogorov time (KT), lies in the interval from two to five days.
Keywords: Brazos River; K-means clustering; Kolmogorov complexity-based measures; Kolmogorov time; Lyapunov time; average-linkage clustering hierarchical algorithm; largest Lyapunov exponent; predictability of streamflow time series; streamflow time series
Year: 2019 PMID: 33266929 PMCID: PMC7514696 DOI: 10.3390/e21020215
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Geographical locations of the gauging stations on the Brazos River used in this study.
Basic descriptive statistics of the daily discharge data of the Brazos River for the period 1989–2016 (the number prefixed to each USGS code indicates the order of the station used in this study).
| USGS Code | Station | Mean | Median | Min | Max | IQR | SD |
|---|---|---|---|---|---|---|---|
| 1_08082500 | Seymour | 223.5 | 51.0 | 0.0 | 30,700.0 | 130.0 | 907.9 |
| 2_08088000 | South Bend | 613.0 | 110.0 | 0.0 | 43,800.0 | 320.0 | 2209.8 |
| 3_08088610 | Graford | 623.5 | 109.0 | 4.1 | 43,800.0 | 300.0 | 2306.9 |
| 4_08089000 | Palo Pinto | 723.7 | 133.0 | 8.5 | 39,700.0 | 361.0 | 2557.9 |
| 5_08090800 | Dennis | 974.4 | 195.0 | 0.0 | 79,500.0 | 418.0 | 3600.3 |
| 6_08091000 | Glen Rose | 1078.8 | 86.0 | 1.5 | 82,100.0 | 530.0 | 4093.9 |
| 7_08093100 | Aquilla | 1561.2 | 445.0 | 1.2 | 27,100.0 | 1118.0 | 3687.3 |
| 8_08096500 | Waco | 2456.1 | 695.0 | 0.5 | 44,000.0 | 1775.0 | 5237.7 |
| 9_08098290 | Highbank | 3103.7 | 873.5 | 30.0 | 70,300.0 | 2240.0 | 6148.1 |
| 10_08111500 | Hempstead | 8014.3 | 2520.0 | 58.0 | 137,000.0 | 7650.0 | 12,821.1 |
| 11_08114000 | Richmond | 8523.8 | 2855.0 | 182.0 | 102,000.0 | 8660.0 | 13,232.0 |
| 12_08116650 | Rosharon | 8851.4 | 3060.0 | 27.0 | 109,000.0 | 9080.0 | 13,638.0 |
Figure 2. Frequency counts of daily discharge data for the USGS 08082500 Brazos River station at Seymour, Texas (USA) for the period 1989–2016.
Largest Lyapunov exponent (LLE), Kolmogorov complexity (KC), and the highest value of Kolmogorov complexity spectrum (KCM) of standardized daily discharge data on the Brazos River.
| USGS Code | Station | LLE | KC | KCM |
|---|---|---|---|---|
| 1_08082500 | Seymour | 0.158 | 0.266 | 0.489 |
| 2_08088000 | South Bend | 0.038 | 0.242 | 0.446 |
| 3_08088610 | Graford | 0.394 | 0.474 | 0.682 |
| 4_08089000 | Palo Pinto | 0.032 | 0.371 | 0.658 |
| 5_08090800 | Dennis | 0.042 | 0.311 | 0.510 |
| 6_08091000 | Glen Rose | 0.051 | 0.301 | 0.508 |
| 7_08093100 | Aquilla | 0.055 | 0.352 | 0.581 |
| 8_08096500 | Waco | 0.061 | 0.298 | 0.526 |
| 9_08098290 | Highbank | 0.061 | 0.316 | 0.422 |
| 10_08111500 | Hempstead | 0.027 | 0.218 | 0.285 |
| 11_08114000 | Richmond | 0.014 | 0.201 | 0.260 |
| 12_08116650 | Rosharon | 0.018 | 0.200 | 0.252 |
Figure 3. Dendrogram for hierarchical clustering of daily streamflow based on the applied dissimilarity measure: (a) compression-based dissimilarity measure; (b) permutation distribution dissimilarity measure; and (c) Kolmogorov complexity distance.
Figure 4. Map of hierarchical clustering of daily streamflow based on the applied dissimilarity measure: (a) compression-based dissimilarity measure; (b) permutation distribution dissimilarity measure; and (c) Kolmogorov complexity distance.
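As a rough illustration of the compression-based dissimilarity behind panel (a), the sketch below computes the normalized compression distance with Python's standard `zlib` compressor and fills a pairwise dissimilarity matrix. The compressor, the text serialization of each series (`serialize`), and the toy series are assumptions made for illustration; the source does not state which implementation was used.

```python
import zlib
from itertools import combinations

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compression level 9)."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def serialize(series) -> bytes:
    """Hypothetical encoding: one value per line, one decimal, ASCII text."""
    return "\n".join(f"{v:.1f}" for v in series).encode("ascii")

def ncd_matrix(named_series: dict) -> dict:
    """Dissimilarity matrix keyed by (name_a, name_b).
    Each pair is computed once and mirrored, so the matrix is symmetric."""
    encoded = {name: serialize(s) for name, s in named_series.items()}
    matrix = {(name, name): 0.0 for name in encoded}
    for a, b in combinations(encoded, 2):
        matrix[(a, b)] = matrix[(b, a)] = ncd(encoded[a], encoded[b])
    return matrix

# Toy series for illustration only (not the Brazos River data).
toy = {
    "A": [float(i % 7) for i in range(500)],
    "B": [float(i % 7) for i in range(500)],
    "C": [float((i * i) % 101) for i in range(500)],
}
D = ncd_matrix(toy)
for pair in (("A", "B"), ("A", "C")):
    print(pair, round(D[pair], 3))
```

A matrix of this kind is the input that an agglomerative average-linkage procedure would then merge step by step; identical series (A and B here) should come out much closer to each other than to the differently structured series C.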
Figure 5. 3D scatter plot specified by the vectors KC, KCM, and LLE.
The centroids (means) of the clusters.
| Information Measure | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| KC | 0.289 | 0.474 | 0.362 | 0.206 |
| KCM | 0.484 | 0.682 | 0.620 | 0.266 |
| LLE | 0.069 | 0.394 | 0.044 | 0.020 |
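The centroid table can be cross-checked against the per-station values of KC, KCM, and LLE reported above. The sketch below assigns each station to its nearest centroid and recomputes the cluster means. The unweighted Euclidean distance on the raw (unscaled) values is an assumption, since the source does not state the metric or any scaling used for K-means.

```python
import math

# (KC, KCM, LLE) per station, copied from the table of information measures.
stations = {
    "Seymour": (0.266, 0.489, 0.158),
    "South Bend": (0.242, 0.446, 0.038),
    "Graford": (0.474, 0.682, 0.394),
    "Palo Pinto": (0.371, 0.658, 0.032),
    "Dennis": (0.311, 0.510, 0.042),
    "Glen Rose": (0.301, 0.508, 0.051),
    "Aquilla": (0.352, 0.581, 0.055),
    "Waco": (0.298, 0.526, 0.061),
    "Highbank": (0.316, 0.422, 0.061),
    "Hempstead": (0.218, 0.285, 0.027),
    "Richmond": (0.201, 0.260, 0.014),
    "Rosharon": (0.200, 0.252, 0.018),
}

# (KC, KCM, LLE) per cluster, copied from the centroid table.
centroids = {
    1: (0.289, 0.484, 0.069),
    2: (0.474, 0.682, 0.394),
    3: (0.362, 0.620, 0.044),
    4: (0.206, 0.266, 0.020),
}

def nearest_centroid(point):
    """Cluster id whose centroid is closest in Euclidean distance (assumed metric)."""
    return min(centroids, key=lambda k: math.dist(point, centroids[k]))

assignment = {name: nearest_centroid(p) for name, p in stations.items()}

def cluster_mean(cluster_id):
    """Column-wise mean of the stations assigned to one cluster."""
    members = [stations[n] for n, c in assignment.items() if c == cluster_id]
    return tuple(sum(col) / len(members) for col in zip(*members))

for cid in centroids:
    names = [n for n, c in assignment.items() if c == cid]
    print(cid, names, [round(v, 3) for v in cluster_mean(cid)])
```

Under these assumptions the assignment yields clusters of six, one, two, and three stations, and the recomputed means agree with the tabulated centroids to within rounding.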
Figure 6. Plot of means for all clusters.
Analysis of variance (ANOVA) table for K-means clustering. Symbols: SS, sum of squares; df, degrees of freedom; F, calculated value of the F-test; P, p-value.
| Variable | Between SS | df | Within SS | df | F | P |
|---|---|---|---|---|---|---|
| KC | 0.064679 | 3 | 0.004561 | 8 | 37.81 | 0.00005 |
| KCM | 0.215958 | 3 | 0.011885 | 8 | 48.46 | 0.00002 |
| LLE | 0.112645 | 3 | 0.010489 | 8 | 28.64 | 0.00013 |
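The F values in the table follow from the sums of squares and degrees of freedom as F = (Between SS / between df) / (Within SS / within df). A short check of that arithmetic, using only the tabulated numbers (the variable and function names are illustrative):

```python
# (between SS, between df, within SS, within df, reported F) from the ANOVA table.
anova = {
    "KC": (0.064679, 3, 0.004561, 8, 37.81),
    "KCM": (0.215958, 3, 0.011885, 8, 48.46),
    "LLE": (0.112645, 3, 0.010489, 8, 28.64),
}

def f_statistic(ss_between, df_between, ss_within, df_within):
    """Ratio of the between-cluster to the within-cluster mean square."""
    return (ss_between / df_between) / (ss_within / df_within)

for name, (ssb, dfb, ssw, dfw, reported) in anova.items():
    f = f_statistic(ssb, dfb, ssw, dfw)
    print(f"{name}: F = {f:.2f} (reported {reported})")
```

The recomputed values match the reported ones to within the rounding of the tabulated sums of squares. With 12 stations and 4 clusters, the degrees of freedom are 4 - 1 = 3 between clusters and 12 - 4 = 8 within clusters.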
Figure 7. Predictability of the standardized daily discharge data of the Brazos River, given by the Lyapunov time (LT) corrected for randomness (in days).