Chuanlu Liu¹, Shuliang Wang¹,², Hanning Yuan¹, Yingxu Dang¹, Xiaojia Liu¹
Abstract
Detecting correlations in high-dimensional datasets plays an important role in data mining and knowledge discovery. Although recent work has achieved promising results, detecting multivariable correlations, especially trivariate associations, remains a challenge. For example, the maximal information coefficient (MIC) brings generality and equitability to the detection of bivariate correlations but cannot detect multivariable correlations. To address this problem, we propose the quadratic optimized trivariate information coefficient (QOTIC), which equitably measures the dependence among three variables. Our contributions are threefold: (1) we present a novel quadratic optimization procedure that approximates the correlation with high accuracy; (2) QOTIC exceeds existing methods in generality and equitability, because it uses general test functions and can detect multivariable correlations in datasets of various sample sizes and noise levels; (3) QOTIC achieves both higher accuracy and higher time efficiency than previous methods. Extensive experiments demonstrate the strong performance of QOTIC.
Keywords: correlation; large data; maximal information coefficient (MIC); quadratic optimized trivariate information coefficient (QOTIC); trivariate associations
Year: 2022 PMID: 35408419 PMCID: PMC9003031 DOI: 10.3390/s22072806
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
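The abstract describes QOTIC only at a high level. As a rough illustration of the underlying idea (an MIC-style, grid-based information score extended from two variables to three), the sketch below bins each variable into `k` equal-frequency bins and scales the total correlation H(X) + H(Y) + H(Z) - H(X, Y, Z) to [0, 1]. It is not QOTIC: it uses one fixed grid instead of optimizing the partition, and the function names, the bin count and the normalization are assumptions chosen for illustration. Because H(X, Y, Z) is never smaller than the largest marginal entropy, dividing by the sum of the marginal entropies minus the largest one keeps the score in [0, 1].

```python
import numpy as np


def equal_frequency_bins(values, k):
    """Map each sample to one of k bins that hold (almost) equal numbers of points."""
    values = np.asarray(values, dtype=float)
    ranks = np.argsort(np.argsort(values, kind="stable"), kind="stable")  # 0 .. n-1
    return (ranks * k) // len(values)


def entropy(labels):
    """Plug-in Shannon entropy (in nats) of a discrete labelling."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def normalized_total_correlation(x, y, z, k=4):
    """Total correlation of three variables on one fixed k x k x k grid, scaled to [0, 1].

    Illustrative only (hypothetical helper): MIC-style methods search over many
    grids, whereas this uses a single equal-frequency grid.
    """
    bx, by, bz = (equal_frequency_bins(v, k) for v in (x, y, z))
    hx, hy, hz = entropy(bx), entropy(by), entropy(bz)
    h_joint = entropy(bx * k * k + by * k + bz)  # one integer id per 3-D grid cell
    total_corr = hx + hy + hz - h_joint
    # H(X, Y, Z) >= max marginal entropy, so this bounds total_corr from above.
    bound = hx + hy + hz - max(hx, hy, hz)
    return total_corr / bound if bound > 0 else 0.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.uniform(0.0, 1.0, 1000)
    print(normalized_total_correlation(t, t**2, np.exp(t)))  # monotone relationship
    a, b, c = rng.uniform(size=(3, 1000))
    print(normalized_total_correlation(a, b, c))  # independent variables
```

For a strictly monotone triple such as (t, t², eᵗ) the three binnings coincide and the score is exactly 1; for independent variables it stays close to 0, apart from the small upward bias of plug-in entropy estimates.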
Figure 1. Advantages of trivariate association analysis.
Figure 2. Partition strategies for bivariate and trivariate variables.
Figure 3. Comparison of the equitability of adaptive equipartition, single optimization, and quadratic optimization.
Relationships plotted in Figure 3 (one color per relationship) and the function of the variable on each axis.
| No. | Axis 1 | Axis 2 | Axis 3 |
|---|---|---|---|
| 1 | Linear | Linear | Linear |
| 2 | Linear | Linear × Cosine | Linear × Sine |
| 3 | Linear | Polynomial | Sine + Linear |
| 4 | Linear | Piecewise Linear | Linear |
| 5 | Linear | Cosine | Parabola |
| 6 | Linear | Exponential | Linear |
| 7 | Linear | Sine | Logarithm |
| 8 | Linear | Exponential + Parabola | Cosine + Linear |
| 9 | Linear | Sine | Cosine |
| 10 | Polynomial | Cosine | Sine + Linear |
| 11 | Linear | Polynomial | Polynomial |
| 12 | Linear | Power | Linear |
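The table names the function used on each axis but not the parameter ranges. Assuming each relationship is traced by one shared parameter t (row 2's Linear, Linear × Cosine, Linear × Sine pattern suggests this, since it describes a spiral), a few rows could be sampled as below. The frequencies, ranges, Gaussian noise model and the `sample_relationship` helper are hypothetical choices, not the paper's settings.

```python
import numpy as np

# Hypothetical parametrisations of three table rows: each axis is a function of a
# shared parameter t in [0, 1). Frequencies and ranges are illustrative choices.
RELATIONSHIPS = {
    1: (lambda t: t, lambda t: t, lambda t: t),          # Linear, Linear, Linear
    2: (lambda t: t,
        lambda t: t * np.cos(4 * np.pi * t),             # Linear x Cosine
        lambda t: t * np.sin(4 * np.pi * t)),            # Linear x Sine
    5: (lambda t: t,
        lambda t: np.cos(2 * np.pi * t),                 # Cosine
        lambda t: (2 * t - 1) ** 2),                     # Parabola
}


def sample_relationship(row, n=500, noise=0.0, seed=0):
    """Return an (n, 3) array of points on the chosen relationship plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 1.0, n)
    columns = [f(t) + noise * rng.standard_normal(n) for f in RELATIONSHIPS[row]]
    return np.column_stack(columns)


if __name__ == "__main__":
    spiral = sample_relationship(2)
    noisy_line = sample_relationship(1, n=200, noise=0.1)
    print(spiral.shape, noisy_line.shape)
```

Raising `noise` (or changing `n`) produces the kind of noisier or smaller variants that the abstract's equitability and sample-size claims refer to.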
Figure 4. Comparison of the bias and variance of QOTIC, MTDIC, and TEIC.
Bias and variance of QOTIC, MTDIC, and TEIC.
| Method | Bias (min) | Bias (mean) | Bias (max) | Variance (min) | Variance (mean) | Variance (max) |
|---|---|---|---|---|---|---|
| QOTIC | 0.011 | 0.096 | 0.143 | 0.0 | 0.001 | 0.003 |
| MTDIC | 0.017 | 0.11 | 0.164 | 0.0 | 0.002 | 0.005 |
| TEIC | 0.084 | 0.391 | 0.61 | 0.001 | 0.034 | 0.139 |
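The table does not state how bias and variance were computed. One common convention, assumed here, is to score each relationship over repeated noisy trials, take the bias as the absolute gap between the mean score and a reference value and the variance as the sample variance of the scores, and then report the minimum, mean and maximum across relationships, matching the Min/Mean/Max layout above. The helper names and the toy inputs are hypothetical, not data from the paper.

```python
import numpy as np


def bias_and_variance(scores, reference):
    """Bias = |mean score - reference|; variance = unbiased sample variance."""
    scores = np.asarray(scores, dtype=float)
    return abs(scores.mean() - reference), scores.var(ddof=1)


def summarize(trials_per_relationship, references):
    """Min / mean / max of bias and variance across relationships."""
    stats = np.array([bias_and_variance(s, r)
                      for s, r in zip(trials_per_relationship, references)])
    biases, variances = stats[:, 0], stats[:, 1]
    return {
        "bias": (biases.min(), biases.mean(), biases.max()),
        "variance": (variances.min(), variances.mean(), variances.max()),
    }


if __name__ == "__main__":
    # Toy scores for two relationships (three trials each), not results from the paper.
    print(summarize([[0.9, 1.0, 1.1], [0.5, 0.5, 0.5]], references=[1.0, 0.6]))
```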
Figure 5. Flow chart of the QOTIC method.
Figure 6. Twelve functional relationships used to verify generality.
Generality performance of MTDIC, QOTIC, and ESTIC.
| No. | Function | MTDIC | QOTIC | ESTIC | MTDIC | QOTIC | ESTIC |
|---|---|---|---|---|---|---|---|
| 1 | Linear | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 2 | Exponential | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 3 | Logarithmic | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 4 | Quadratic | 0.94 | 0.96 | 0.96 | 1.00 | 1.00 | 1.00 |
| 5 | Cubic | 0.96 | 0.97 | 0.97 | 0.97 | 0.98 | 0.98 |
| 6 | Sinusoidal low freq. | 0.87 | 0.92 | 0.92 | 0.94 | 0.95 | 0.95 |
| 7 | Sinusoidal high freq. | 0.58 | 0.59 | 0.59 | 0.74 | 0.75 | 0.75 |
| 8 | Circle | 0.49 | 0.50 | 0.50 | 0.55 | 0.56 | 0.56 |
| 9 | Step function | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
| 10 | Two lines | 0.95 | 0.95 | 0.95 | 0.96 | 0.96 | 0.96 |
| 11 | X line | 0.47 | 0.51 | 0.51 | 0.53 | 0.55 | 0.55 |
| 12 | X curve | 0.48 | 0.54 | 0.54 | 0.53 | 0.56 | 0.56 |
Time cost of MTDIC, QOTIC, and ESTIC.
| No. | Function | MTDIC | QOTIC | ESTIC |
|---|---|---|---|---|
| 1 | Linear | 0.06 | 5.13 | 7.13 |
| 2 | Exponential | 0.07 | 5.13 | 7.13 |
| 3 | Logarithmic | 0.09 | 5.13 | 7.13 |
| 4 | Quadratic | 0.36 | 5.13 | 7.13 |
| 5 | Cubic | 0.19 | 5.13 | 7.13 |
| 6 | Sinusoidal low freq. | 0.41 | 5.13 | 7.13 |
| 7 | Sinusoidal high freq. | 0.40 | 5.13 | 7.13 |
| 8 | Circle | 0.34 | 5.13 | 7.13 |
| 9 | Step function | 0.08 | 5.13 | 7.13 |
| 10 | Two lines | 0.10 | 5.13 | 7.13 |
| 11 | X line | 0.39 | 5.13 | 7.13 |
| 12 | X curve | 0.37 | 5.13 | 7.13 |
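Reading the table directly, QOTIC's time cost is the same for every function and lower than ESTIC's. The relative saving works out as follows (the table does not give units):

$$\frac{5.13}{7.13} \approx 0.72, \qquad 1 - 0.72 = 0.28,$$

so QOTIC takes roughly 28% less time than ESTIC on each of the twelve functions.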
Figure 7. Time cost for different sample sizes.
Figure 8. Equitability comparison of six trivariate correlation methods.
Trivariate associations detected by QOTIC and bivariate associations detected by MIC.
| Group | Trivariate association | QOTIC | Bivariate association | MIC |
|---|---|---|---|---|
| 1 | infant deaths / under-five deaths / life expectancy | 0.942 | infant deaths / life expectancy | 0.31 |
|  |  |  | under-five deaths / life expectancy | 0.342 |
|  |  |  | infant deaths / under-five deaths | 0.958 |
| 2 | thinness 1–19 years / thinness five to nine years / life expectancy | 0.857 | thinness 1–19 years / life expectancy | 0.386 |
|  |  |  | thinness five to nine years / life expectancy | 0.384 |
|  |  |  | thinness 1–19 years / thinness five to nine years | 0.912 |
| 3 | polio / diphtheria / life expectancy | 0.777 | polio / life expectancy | 0.298 |
|  |  |  | diphtheria / life expectancy | 0.295 |
|  |  |  | polio / diphtheria | 0.801 |
| 4 | percentage expenditure / GDP / life expectancy | 0.685 | percentage expenditure / life expectancy | 0.31 |
|  |  |  | GDP / life expectancy | 0.377 |
|  |  |  | percentage expenditure / GDP | 0.714 |
| 5 | income composition of resources / schooling / life expectancy | 0.67 | income composition of resources / life expectancy | 0.614 |
|  |  |  | schooling / life expectancy | 0.497 |
|  |  |  | income composition of resources / schooling | 0.72 |
| 6 | adult mortality / percentage expenditure / life expectancy | 0.642 | adult mortality / life expectancy | 0.709 |
|  |  |  | percentage expenditure / life expectancy | 0.31 |
|  |  |  | adult mortality / percentage expenditure | 0.22 |
| 7 | adult mortality / GDP / life expectancy | 0.64 | adult mortality / life expectancy | 0.709 |
|  |  |  | GDP / life expectancy | 0.377 |
|  |  |  | adult mortality / GDP | 0.284 |
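One quick way to read this table is to compare each group's QOTIC score with the strongest single-predictor MIC against life expectancy in the same group. The two coefficients are not guaranteed to share a scale, so this is only a rough reading; the numbers below are copied from the table and the `gain_over_best_pair` helper is hypothetical.

```python
# (QOTIC of the triple, MIC of each predictor with life expectancy), copied from the table.
GROUPS = {
    1: (0.942, [0.31, 0.342]),
    2: (0.857, [0.386, 0.384]),
    3: (0.777, [0.298, 0.295]),
    4: (0.685, [0.31, 0.377]),
    5: (0.67, [0.614, 0.497]),
    6: (0.642, [0.709, 0.31]),
    7: (0.64, [0.709, 0.377]),
}


def gain_over_best_pair(qotic, pair_mics):
    """Difference between the trivariate score and the best bivariate MIC."""
    return qotic - max(pair_mics)


# Rounded to three decimals to match the precision of the table.
gains = {g: round(gain_over_best_pair(q, mics), 3) for g, (q, mics) in GROUPS.items()}
print(gains)
```

On these numbers the triple scores higher than its best single predictor in groups 1 to 5, but not in groups 6 and 7, where adult mortality alone already has an MIC of 0.709 with life expectancy.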
Figure 9. Exploring the GHO dataset for trivariate associations with QOTIC.
Figure 10. Exploring the GHO dataset for bivariate associations with MIC.