Fabian Berns, Jan Hüwel, Christian Beecks
Abstract
Gaussian process models (GPMs) are widely regarded as a prominent tool for learning statistical data models that enable interpolation, regression, and classification. These models are typically instantiated by a Gaussian Process with a zero-mean function and a radial basis covariance function. While these default instantiations yield acceptable analytical quality in terms of model accuracy, GPM inference algorithms automatically search for an application-specific model fitting a particular dataset. State-of-the-art methods for automated inference of GPMs are searching the space of possible models in a rather intricate way and thus result in super-quadratic computation time complexity for model selection and evaluation. Since these properties only enable processing small datasets with low statistical versatility, various methods and algorithms using global as well as local approximations have been proposed for efficient inference of large-scale GPMs. While the latter approximation relies on representing data via local sub-models, global approaches capture data's inherent characteristics by means of an educated sample. In this paper, we investigate the current state-of-the-art in automated model inference for Gaussian processes and outline strengths and shortcomings of the respective approaches. A performance analysis backs our theoretical findings and provides further empirical evidence. It indicates that approximated inference algorithms, especially locally approximating ones, deliver superior runtime performance, while maintaining the quality level of those using non-approximative Gaussian processes.Entities:
Keywords: Gaussian processes; Machine learning; Probabilistic machine learning
Year: 2022 PMID: 35647556 PMCID: PMC9123926 DOI: 10.1007/s42979-022-01186-x
Source DB: PubMed Journal: SN Comput Sci ISSN: 2661-8907
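The default instantiation mentioned in the abstract, a zero-mean Gaussian process with a radial basis covariance function, can be sketched in a few lines. The following minimal example is an illustration only, not the authors' implementation; the function names, the fixed hyperparameters, and the noise level are assumptions made for this sketch. It computes the exact GP posterior used for interpolation and regression:

```python
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Radial basis (squared-exponential) covariance function for 1-D inputs."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance of a zero-mean GP with an RBF kernel."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_test)
    K_ss = rbf_kernel(x_test, x_test)
    # Cholesky factorization: the O(n^3) step behind the cubic cost of exact GPs
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)
    return mean, var

# Interpolate a sine curve from 20 samples
x = np.linspace(0.0, 2.0 * np.pi, 20)
y = np.sin(x)
mean, var = gp_posterior(x, y, np.array([np.pi / 2]))
```

The Cholesky factorization of the n-by-n kernel matrix is what makes exact inference cubic in the dataset size, which is why the approximation schemes compared in this paper exist.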
Major distinctive, conceptual properties of state-of-the-art, automated GPM inference algorithms
| Algorithm | Global approx. | Local approx. | Multidim. input | Max. |
|---|---|---|---|---|
| CKS | | | | |
| ABCD | | | | |
| SKC | | | | |
| LARGe | | | | |
| 3CS | | | | |
| LGI | | | | |
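The model search performed by algorithms in this family can be illustrated with a simplified greedy sketch: starting from a set of base kernels, the current best kernel expression is repeatedly extended by sum or product, and candidates are scored by the GP log marginal likelihood. This is a hypothetical, heavily simplified illustration of CKS-style compositional search (fixed hyperparameters, illustrative base kernels, no hyperparameter optimization), not a reproduction of any of the listed implementations:

```python
import numpy as np

# Illustrative base kernels over 1-D inputs (hyperparameters fixed for brevity)
def rbf(x1, x2):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2)

def lin(x1, x2):
    return x1[:, None] * x2[None, :]

def per(x1, x2, period=1.0):
    d = np.abs(x1[:, None] - x2[None, :])
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2)

BASE = {"RBF": rbf, "LIN": lin, "PER": per}

def log_marginal(K, y, noise=1e-2):
    """Zero-mean GP log marginal likelihood for a covariance matrix K."""
    Ky = K + noise * np.eye(len(y))
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return float(-0.5 * y @ alpha - np.log(np.diag(L)).sum()
                 - 0.5 * len(y) * np.log(2.0 * np.pi))

def greedy_kernel_search(x, y, depth=3):
    """Greedily grow a kernel expression by sum/product with base kernels."""
    name, fn = max(BASE.items(),
                   key=lambda nf: log_marginal(nf[1](x, x), y))
    for _ in range(depth - 1):
        cands = []
        for bn, bf in BASE.items():
            cands.append((f"({name} + {bn})",
                          lambda a, b, bf=bf, fn=fn: fn(a, b) + bf(a, b)))
            cands.append((f"({name} * {bn})",
                          lambda a, b, bf=bf, fn=fn: fn(a, b) * bf(a, b)))
        best = max(cands, key=lambda nf: log_marginal(nf[1](x, x), y))
        if log_marginal(best[1](x, x), y) <= log_marginal(fn(x, x), y):
            break  # no candidate improves on the current expression
        name, fn = best
    return name
```

Each search step evaluates the full n-by-n kernel matrix for every candidate, which is where the super-quadratic cost of non-approximative inference comes from; the approximating algorithms in the table replace this exact evaluation with global or local surrogates.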
Used benchmark datasets
| # | Dataset | Size | Dimensions |
|---|---|---|---|
| 1 | Airline | 144 | 2 |
| 2 | Hardware | 209 | 7 |
| 3 | Solar Irradiance | 391 | 2 |
| 4 | AutoMPG | 392 | 8 |
| 5 | Mauna Loa | 702 | 2 |
| 6 | Energy | 768 | 9 |
| 7 | SML Systemᵃ | 4,137 | 24 |
| 8 | Power Plantᵇ | 9,568 | 5 |
| 9 | GEFComᶜ | 38,064 | 2 |
| 10 | Jena Weatherᵈ | 420,551 | 15 |
| 11 | Household Energyᵉ | 2,075,259 | 9 |
ᵃ Dimension “Carbon dioxide in ppm (room)” was used as uni-dimensional target Y
ᵇ Dimension “Electrical power output ()” was used as uni-dimensional target Y
ᶜ We chose one of the 20 utility zones (i.e., the first one), as the informational content among zones may be considered equivalent [21]
ᵈ Dimension “Air_Temperature” was used as uni-dimensional target Y. We use the recognized interval of this continuously gathered dataset from 2009-01-01 to 2016-12-31 (cf. [11])
ᵉ Dimension “Global_active_power” was used as uni-dimensional target Y
Median accuracy of models resulting from different automatic GPM inference algorithms in terms of RMSE
| Dataset | CKS | ABCD | SKC | 3CS | LARGe | LGI |
|---|---|---|---|---|---|---|
| Airline | 0.0438 | 0.0439 | 0.3350 | 0.0438 | 0.0438 | 0.0565 |
| Solar Irr. | 0.0877 | 0.0892 | 0.2689 | 0.0887 | 0.1133 | 0.1201 |
| Mauna L. | 0.0030 | 0.0034 | 0.4901 | 0.0032 | 0.0033 | 0.0042 |
| SML S. | 0.0090 | 0.0089 | 0.0591 | 0.0089 | 0.0090 | 0.0273 |
| Power Pl. | 0.2240 | 0.2240 | 0.2878 | 0.2252 | 0.2257 | 0.2460 |
| GEFCom | – | – | – | 0.0093 | 0.0094 | 0.0246 |
| Jena W. | – | – | – | 0.0024 | 0.0023 | 0.0035 |
| House. En. | – | – | – | 0.0185 | 0.0186 | 0.0221 |
| Hardware | 0.0352 | – | 0.0848 | – | 0.0066 | 0.0303 |
| Auto MPG | 0.1015 | – | 0.3971 | – | 0.1025 | 0.1239 |
| Energy | 0.0206 | – | 0.4846 | – | 0.0360 | 0.0701 |
| SML S. | 0.0121 | – | 0.0646 | – | 0.0191 | 0.0549 |
| Power Pl. | 0.0615 | – | 0.4980 | – | 0.0555 | 0.0833 |
| Jena W. | – | – | – | – | 0.0007 | 0.0001 |
| House. En. | – | – | – | – | 0.0032 | 0.0032 |
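The RMSE values above fall on a comparable scale across datasets, consistent with targets normalized to unit range before evaluation. As a generic point of reference (a sketch with made-up example numbers, not data from the paper), the metric is computed as:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between predictions and ground truth."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example on illustrative normalized targets
y_true = [0.1, 0.4, 0.35, 0.8]
y_pred = [0.12, 0.38, 0.40, 0.78]
print(round(rmse(y_true, y_pred), 4))  # prints 0.0304
```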
Median efficiency of different automatic GPM inference algorithms in terms of runtime (Format HH:MM:SS)
| Dataset | CKS | ABCD | SKC | 3CS | LARGe | LGI |
|---|---|---|---|---|---|---|
| Airline | 00:00:41 | 00:00:55 | 00:00:26 | 00:00:14 | 00:00:13 | 00:00:02 |
| Solar Irr. | 00:01:14 | 00:00:46 | 00:00:16 | 00:00:11 | 00:00:25 | 00:00:09 |
| Mauna L. | 00:01:15 | 00:03:43 | 00:00:45 | 00:01:07 | 00:00:34 | 00:00:25 |
| SML S. | 03:51:53 | 04:56:29 | 00:35:20 | 00:04:00 | 00:04:50 | 00:01:59 |
| Power Pl. | 36:16:02 | 46:34:21 | 05:38:25 | 00:14:07 | 00:16:55 | 00:06:00 |
| GEFCom | – | – | – | 00:49:23 | 00:42:09 | 00:12:49 |
| Jena W. | – | – | – | 07:43:30 | 05:35:16 | 04:51:56 |
| House. En. | – | – | – | 37:17:50 | 65:34:23 | 20:26:09 |
| Hardware | 00:00:06 | – | 00:00:25 | – | 00:00:16 | 00:00:02 |
| Auto MPG | 00:01:35 | – | 00:00:28 | – | 00:00:34 | 00:00:24 |
| Energy | 00:05:30 | – | 00:01:25 | – | 00:01:00 | 00:00:42 |
| SML S. | 08:14:34 | – | 00:56:45 | – | 00:06:16 | 00:05:59 |
| Power Pl. | 26:35:09 | – | 08:49:59 | – | 00:12:00 | 00:07:41 |
| Jena W. | – | – | – | – | 06:37:40 | 14:54:04 |
| House. En. | – | – | – | – | 25:35:26 | 12:01:54 |
Fig. 1: Distribution of relative runtime for different automatic GPM inference algorithms
Fig. 2: Distribution of model error in terms of RMSE per model resulting from different automatic GPM inference algorithms