
Nonlinear gated experts for time series: discovering regimes and avoiding overfitting.

A S Weigend, M Mangeas, A N Srivastava.

Abstract

In the analysis and prediction of real-world systems, two of the key problems are nonstationarity (often in the form of switching between regimes) and overfitting (particularly serious for noisy processes). This article addresses these problems using gated experts, consisting of a (nonlinear) gating network and several (also nonlinear) competing experts. Each expert learns to predict the conditional mean, and each expert adapts its width to match the noise level in its regime. The gating network learns to predict the probability of each expert, given the input. This article focuses on the case where the gating network bases its decision on information from the inputs. This can be contrasted with hidden Markov models, where the decision is based on the previous state(s) (i.e. on the output of the gating network at the previous time step), as well as with averaging over several predictors. In contrast, gated experts soft-partition the input space, each expert learning to model only its own region. This article discusses the underlying statistical assumptions, derives the weight update rules, and compares the performance of gated experts to standard methods on three time series: (1) a computer-generated series, obtained by randomly switching between two nonlinear processes; (2) a time series from the Santa Fe Time Series Competition (the light intensity of a laser in a chaotic state); and (3) the daily electricity demand of France, a real-world multivariate problem with structure on several time scales. The main results are: (1) the gating network correctly discovers the different regimes of the process; (2) the widths associated with each expert are important for the segmentation task (and they can be used to characterize the sub-processes); and (3) there is less overfitting compared to single networks (homogeneous multilayer perceptrons), since the experts learn to match their variances to the (local) noise levels. This can be viewed as matching the local complexity of the model to the local complexity of the data.
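The architecture the abstract describes can be sketched in a few dozen lines. This is a minimal illustrative toy, not the authors' implementation: it uses linear experts in place of the paper's nonlinear networks, a linear-softmax gate, and EM-style updates, but it keeps the essential ingredients of the paper's model: an input-conditioned gate, per-expert means, and per-expert variances ("widths") matched to the local noise level. All class and function names here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Row-wise softmax."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

class GatedExperts:
    """Toy gated experts: linear experts (the paper uses nonlinear ones),
    each with its own output variance, plus a linear-softmax gating network.
    Trained with EM-style updates: the E-step computes responsibilities from
    the Gaussian likelihoods; the M-step refits each expert by weighted least
    squares, re-estimates its variance, and nudges the gate toward the
    responsibilities."""

    def __init__(self, n_experts, n_inputs):
        self.W = rng.normal(size=(n_experts, n_inputs + 1))  # expert weights (+bias)
        self.V = np.zeros((n_experts, n_inputs + 1))         # gate weights
        self.var = np.ones(n_experts)                        # per-expert noise variance

    def _aug(self, X):
        # append a bias column
        return np.hstack([X, np.ones((len(X), 1))])

    def _resp(self, Xa, y):
        """E-step: posterior probability that each expert produced each point."""
        g = softmax(Xa @ self.V.T)                 # gate probabilities, shape (N, K)
        mu = Xa @ self.W.T                         # expert means, shape (N, K)
        log_p = -0.5 * ((y[:, None] - mu) ** 2 / self.var + np.log(self.var))
        p = g * np.exp(log_p - log_p.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)

    def fit(self, X, y, iters=100, lr=1.0):
        Xa = self._aug(X)
        for _ in range(iters):
            h = self._resp(Xa, y)
            for k in range(len(self.W)):
                # weighted least squares for expert k ...
                w = h[:, k]
                A = Xa * w[:, None]
                self.W[k] = np.linalg.solve(
                    A.T @ Xa + 1e-6 * np.eye(Xa.shape[1]), A.T @ y)
                # ... and its local noise variance (the "width" used
                # for segmentation and overfitting control)
                resid = y - Xa @ self.W[k]
                self.var[k] = max((w * resid ** 2).sum() / w.sum(), 1e-3)
            # fit the gate toward the responsibilities (a few logistic steps)
            for _ in range(10):
                g = softmax(Xa @ self.V.T)
                self.V += lr * (h - g).T @ Xa / len(y)

    def predict(self, X):
        Xa = self._aug(X)
        return (softmax(Xa @ self.V.T) * (Xa @ self.W.T)).sum(axis=1)
```

On a piecewise process whose regime is determined by the input (e.g. two different linear maps on the two halves of the input range, with different noise levels), the gate learns to soft-partition the input space and each expert specializes on one regime, mirroring result (1) of the abstract on a much simpler problem.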

MeSH:

Year:  1995        PMID: 8963468     DOI: 10.1142/s0129065795000251

Source DB:  PubMed          Journal:  Int J Neural Syst        ISSN: 0129-0657            Impact factor:   5.866


  2 in total

1.  Computational approaches to motor learning by imitation.

Authors:  Stefan Schaal; Auke Ijspeert; Aude Billard
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2003-03-29       Impact factor: 6.237

2.  A Generalized Mixture Framework for Multi-label Classification.

Authors:  Charmgil Hong; Iyad Batal; Milos Hauskrecht
Journal:  Proc SIAM Int Conf Data Min       Date:  2015
