Regression shrinkage and neural models in predicting the results of 400-metres hurdles races.

K Przednowek¹, J Iskra², A Maszczyk³, M Nawrocka³.

Abstract

This study presents the application of regression shrinkage and artificial neural networks to predicting the results of 400-metres hurdles races. The regression models predict the results for suggested training loads in a selected three-month training period. The research material consisted of training data of 21 Polish hurdlers from the Polish National Athletics Team Association, all characterized by a high level of performance. Leave-one-out cross-validation was used to assess the predictive ability of the constructed models. The analysis showed that the method with the smallest prediction error was LASSO regression extended by quadratic terms. The optimal model had a prediction error of 0.59 s. In addition, the optimal set of input variables was determined, with 8 of the 27 predictors eliminated. The results justify the use of regression shrinkage for predicting sports outcomes. The resulting model can be used as a tool to assist the coach in planning training loads in a selected training period.

Keywords: 400-metres hurdles; Neural modelling; Predicting in sport; Regression shrinkage

Year: 2016 PMID: 28090147 PMCID: PMC5143778 DOI: 10.5604/20831862.1224463

Source DB: PubMed Journal: Biol Sport ISSN: 0860-021X Impact factor: 2.806

INTRODUCTION

In an era of ubiquitous information technology, coaches and athletes can use advanced mathematical methods to model the training process. These methods include advanced regression models and artificial neural networks (ANN), which are applied to various aspects of the optimization of sports training [1, 2, 3]. One of the most complex athletic events is the 400-metres hurdles, which combines elements of all motor abilities [4]. Research on hurdling is mainly concerned with kinematics [5], the physiology and biochemistry of performance [6, 7] and the impact of external factors (e.g., wind) on 400-metres hurdles results [8]. Training loads, by contrast, are rarely discussed in the context of hurdlers' training. In many sports, the training process is planned on the basis of the coach's practical experience and lacks a scientific basis. It is therefore important to find and verify different ways of optimizing training plans. Advanced computational methods make it possible to model the expected development of an athlete and to generate the optimal training load for achieving a desired result [9, 10, 11]. The selection of a suitable training load, in terms of both the intensity and the volume of work, should also take into account the individual capabilities of the athlete's body and its response to the applied load. If this principle is not followed, the athlete may become overloaded (overtrained) and the further development of their abilities may be inhibited. Many studies describe the use of complex linear and non-linear models (including ANN) in sports training [12]. One practical example is the research by Maszczyk [13], which used linear regression models and ANN to select future competitors in the javelin throw. Many applications concern the planning and optimization of training loads [2, 9, 14]. 
In the paper by Przednowek and Wiktorowicz [10], shrinkage models were used to predict results in race walking. The rising level of achievement in sport creates new opportunities for coaches and athletes through the modelling of sports training. The aim of this study is to examine regression shrinkage models and artificial neural networks in predicting 400-metres hurdles results over a three-month training period. The examination is based on training data from athletes with a very high level of sports ability.

MATERIALS AND METHODS

The study was based on training data of 21 Polish hurdlers who competed between 1989 and 2011 and were distinguished by a high level of performance (400-metres hurdles result: 51.26 ± 1.24 s). The athletes were members of the Polish National Team representing Poland at the Olympic Games and at World and European Championships in the age categories of juniors, colts and seniors. Twenty-eight variables were used to build the models for predicting the 400-metres hurdles result: one dependent variable and 27 independent variables. The dependent variable was the result of the 500 m run (y). The independent variables comprised the athlete's parameters (x1–x3), variables representing training periods (x4–x5), and training loads developing speed (x6–x8), endurance (x9–x12), run endurance (x13–x14), strength (x15–x21), and technique and rhythm (x22–x27). Table 1 presents the variables and their basic statistics.

TABLE 1

Characteristics of the variables used to construct the models.

Variable	Description	Mean	Min	Max	SD	CV (%)
y	Expected result in 500 m (s)	65.2	60.9	71.2	2.1	3.2
x₁	Age (years)	22.3	19.0	27.0	2.0	8.8
x₂	BMI	21.7	19.7	24.1	1.0	4.7
x₃	Current result in 500 m (s)	66.4	61.5	72.1	2.0	3.0
x₄	General preparation period*	-	-	-	-	-
x₅	Special preparation period*	-	-	-	-	-
x₆	Maximal speed (m)	1395	0	4300	799	57.3
x₇	Technical speed (m)	1748	0	7550	1293	74.0
x₈	Technical and speed exercises (m)	1418	0	5100	840	59.2
x₉	Speed endurance (m)	4218	0	93670	7985	189.3
x₁₀	Specific hurdle endurance (m)	4229	0	13700	2304	54.5
x₁₁	Pace runs (m)	54599	0	211400	37070	67.9
x₁₂	Aerobic endurance (m)	121086	4800	442100	75661	62.5
x₁₃	Strength endurance I (m)	8690	0	31300	6806	78.3
x₁₄	Strength endurance II (amount)	1999.8	0	21350	2616	130.8
x₁₅	General strength of lower limbs (kg)	41353	0	216100	35566	86.0
x₁₆	Directed strength of lower limbs (kg)	19460	0	72600	12540	64.4
x₁₇	Specific strength of lower limbs (kg)	13887	0	156650	16096	115.9
x₁₈	Trunk strength (amount)	15480	0	200000	21921.4	141.6
x₁₉	Upper body strength (kg)	1102	0	24960	2121	192.5
x₂₀	Explosive strength of lower limbs (amount)	274.7	0	1203	190.4	69.3
x₂₁	Explosive strength of upper limbs (amount)	147.9	0	520	116.1	78.5
x₂₂	Technical exercises – walking pace (min)	141.6	0	420	109.7	77.5
x₂₃	Technical exercises – running pace (min)	172.9	0	920	135.7	78.4
x₂₄	Runs over 1-3 hurdles (amount)	31.9	0	148	30.3	95.0
x₂₅	Runs over 4-7 hurdles (amount)	56.5	0	188	51.6	91.3
x₂₆	Runs over 8-12 hurdles (amount)	50.5	0	232	52.7	104.3
x₂₇	Hurdle runs in varied rhythm (amount)	285.7	0	1020	208.7	73.0

* In accordance with the rule for introducing qualitative variables, the variable “training period type”, with the values general preparation period, special preparation period and starting period, was replaced with two binary variables, x4 and x5, each taking the value 1 or 0.
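A minimal Python sketch of the coding described in the note, assuming the starting period is the reference level (both indicators equal to 0) and that x4 flags the general and x5 the special preparation period, as the row labels of Table 1 indicate. The dictionary, function name and period strings are hypothetical.

```python
# Hypothetical sketch: code the qualitative "training period type" as two 0/1 inputs.
# Assumption: the starting period is the reference level (x4 = x5 = 0).
PERIOD_CODES = {
    "general preparation": (1, 0),  # x4 = 1, x5 = 0
    "special preparation": (0, 1),  # x4 = 0, x5 = 1
    "starting": (0, 0),             # reference level
}


def encode_period(period):
    """Return the pair (x4, x5) for a training period name."""
    try:
        return PERIOD_CODES[period]
    except KeyError:
        raise ValueError(f"unknown training period: {period!r}") from None
```

Three levels need only two indicators; a third indicator would be a linear combination of the intercept and the other two, which would make the least-squares problem singular.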

The collected data involved 144 training programmes. Each recorded programme was carried out in one of three periods of the annual training cycle (general preparation, special preparation and starting period), each lasting three months. In addition to the training loads, the athlete's parameters and current result are used as model inputs (Fig. 1). The system predicts the 500 m result that the athlete will achieve after completing the proposed training. This allows the coach to observe the possible effects of modifying individual training loads.

FIG. 1

The model of predicting the result for 400-metres hurdles races.

Predicting the 400-metres hurdles result over a training period requires a training indicator, because a track test over the 400-metres hurdles cannot be carried out in every analysed period of the annual cycle (e.g., in winter). Therefore, the result of a 500 m flat run was used as the criterion over the selected periods [15, 16]. The correlation between the 500 m and 400-metres hurdles results during the starting period is very strong (r = 0.84) and statistically significant at α = 0.001 (Fig. 2). This supports the choice of the 500 m result as the dependent variable in the models.
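The correlation check can be sketched as follows. The text does not report the number of paired results behind r = 0.84, so this Python helper only shows how Pearson's r and the usual t statistic for testing ρ = 0 (with n − 2 degrees of freedom) are computed; the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0; compare with Student's t on n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r * r))
```

The t value would be compared with the critical value of Student's t distribution for the chosen α.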

FIG. 2

Correlation between 500 m flat run and 400-metres hurdle races.

Methods of regression shrinkage

When the number of input variables is greater than the number of patterns, or the input variables are correlated, the problem is said to be ill-posed or ill-conditioned. There are many methods for improving the conditioning of the problem; one of them is regression shrinkage [17], which includes ridge, LASSO and elastic net regression. In ridge regression [18], a parameter λ ≥ 0 is selected that determines an additional penalty on the regression coefficients. The larger the value of λ, the greater the penalty imposed on the weights. The solution is obtained by solving:

$$\hat{\mathbf{w}}^{\mathrm{ridge}} = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^{n} \Big( y_i - w_0 - \sum_{j=1}^{p} x_{ij} w_j \Big)^2 + \lambda \sum_{j=1}^{p} w_j^2 \right\}$$

where n is the number of patterns, p is the number of input variables, x_ij is the j-th input of the i-th pattern, w_0 is the intercept and w_j are the weights. In LASSO regression, as in ridge regression, a penalty is added to the quality criterion. The difference is that in ridge regression the penalty is the sum of squares of the weights, whereas in the LASSO model it is the sum of their absolute values [19]. The solution is obtained by solving:

$$\hat{\mathbf{w}}^{\mathrm{LASSO}} = \arg\min_{\mathbf{w}} \left\{ \sum_{i=1}^{n} \Big( y_i - w_0 - \sum_{j=1}^{p} x_{ij} w_j \Big)^2 + \lambda \sum_{j=1}^{p} |w_j| \right\}$$

LASSO regression is implemented with the LARS algorithm [20], in which the penalty is controlled by the parameter s, whose value ranges from 0 to 1. Because the absolute-value penalty can set weights exactly to zero, these methods can also select the optimal input set. Elastic net regression combines the properties of LASSO and ridge regression [21]. It has two parameters (λ1, λ2) that determine the penalties imposed on the model parameters. The problem is solved with the LARS-EN algorithm, a modified LARS algorithm, in which the penalty is controlled by the parameter s (as in LASSO) and the parameter λ.
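The ridge criterion has a closed-form solution. Below is a minimal, dependency-free Python sketch assuming centred data and an unpenalised intercept w0; the paper does not state how the R packages it used scale the inputs, so λ values from this sketch are not directly comparable with those reported in the Results. The helper names `solve` and `ridge_fit` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge regression with an unpenalised intercept; returns (w0, w).

    Centres X and y, solves (Xc^T Xc + lam * I) w = Xc^T yc, then recovers w0.
    """
    n, p = len(X), len(X[0])
    xbar = [sum(row[j] for row in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[row[j] - xbar[j] for j in range(p)] for row in X]
    yc = [v - ybar for v in y]
    A = [[sum(Xc[i][j] * Xc[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    w = solve(A, b)
    w0 = ybar - sum(xbar[j] * w[j] for j in range(p))
    return w0, w
```

With λ = 0 this reduces to ordinary least squares; increasing λ pulls every weight towards zero but, unlike LASSO, never sets one exactly to zero.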

Extensions of the linear model

To obtain better predictions, a model consisting of a linear and a non-linear part was also calculated. The linear part is equal to the best linear regression, and the non-linear part is a quadratic function of selected predictors (x1, x2, x3, the parameters of the athlete). The extended model has the following form:

$$\hat{y} = w_0 + \sum_{j=1}^{27} w_j x_j + \sum_{j=1}^{3} v_j x_j^2$$

where w_0, ..., w_27 are the weights of the linear part and v_1, v_2, v_3 are the weights of the squared terms. All regression models (including their modification and evaluation) were calculated in the R programming language using additional packages [22].
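The extended model can be evaluated as below. This Python sketch covers the prediction formula only: the text does not describe how the weights v_j were estimated, so no fitting procedure is shown, and all names are hypothetical.

```python
def predict_extended(x, w0, w, v, quad_idx):
    """Prediction of a linear model extended with squared terms.

    x        -- input vector (x1..xp stored at indices 0..p-1)
    w0, w    -- intercept and weights of the linear part
    v        -- weights of the squared terms
    quad_idx -- 0-based indices of the inputs that receive a squared term
    """
    if len(v) != len(quad_idx):
        raise ValueError("one weight per squared term is required")
    linear = w0 + sum(wj * xj for wj, xj in zip(w, x))
    quadratic = sum(vk * x[j] ** 2 for vk, j in zip(v, quad_idx))
    return linear + quadratic
```

For the extended LASSO model described in the Results, only x1 and x3 receive squared terms, which corresponds to `quad_idx=[0, 2]`.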

Neural models

This study uses the multilayer perceptron (MLP), the most common type of artificial neural network [23]. It requires iterative learning, which is sometimes time-consuming, but the resulting networks are small in structure, fast and give satisfactory results. The networks were trained with the BFGS (Broyden-Fletcher-Goldfarb-Shanno) algorithm, and exponential (exp) and hyperbolic tangent (tanh) functions were used as the activation functions of the hidden neurons. All analysed networks have a single hidden layer. Calculations were performed in Statistica 10 [24].
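A forward pass of a single-hidden-layer network, such as the 26-1-1 MLPs in Table 2, can be sketched as follows. Two assumptions are made: the output neuron is linear, and "exp" denotes the negative exponential exp(−a). Statistica's exact definitions should be checked in its documentation. BFGS training is not shown, and the function name is hypothetical.

```python
import math


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out, activation="tanh"):
    """Forward pass of an MLP with one hidden layer and a linear output neuron.

    w_hidden   -- one weight vector (len(x) weights) per hidden neuron
    b_hidden   -- one bias per hidden neuron
    w_out      -- one output weight per hidden neuron
    activation -- "tanh" or "exp"; "exp" is assumed here to mean exp(-a)
    """
    act = {"tanh": math.tanh, "exp": lambda a: math.exp(-a)}[activation]
    hidden = [act(b + sum(wi * xi for wi, xi in zip(wv, x)))
              for wv, b in zip(w_hidden, b_hidden)]
    return b_out + sum(wo * h for wo, h in zip(w_out, hidden))
```

With a single hidden neuron the network computes b_out + w_out · f(b + w·x), i.e. a monotone non-linear transformation of one weighted sum of the inputs.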

Selection and validation of models

The models were selected and evaluated using cross-validation (CV) [25]. In this method, the data set is divided into two subsets: a learning set and a testing (validation) set. The first is used to build the model, and the second to evaluate its predictive ability. Of the several types of cross-validation, leave-one-out cross-validation (LOOCV) was chosen in this paper. In LOOCV the test set consists of a single pair (x_i, y_i), and the number of tests equals the number of patterns n. Two errors were calculated during cross-validation:

$$RMSE_{CV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{-i}\right)^2}, \qquad RMSE_{T} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{i}\right)^2}$$

where n is the number of patterns, $\hat{y}_{-i}$ is the output of the model built in the i-th step of cross-validation on the data set without the test pair (x_i, y_i), $\hat{y}_i$ is the output for the i-th pattern of the model built on the full data set, RMSE_CV is the root mean square error of prediction and RMSE_T is the root mean square error of training. In addition to the prediction error, which was the main criterion, the training error was calculated; it describes how well the model fits the data.
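Both error measures can be computed with a generic leave-one-out loop. In this Python sketch, `fit` and `predict` are hypothetical callables standing in for any of the models described above; a one-input least-squares line is included only to make the example runnable.

```python
import math


def loocv_errors(X, y, fit, predict):
    """Return (RMSE_CV, RMSE_T) for a model family.

    fit(X, y) -> model; predict(model, x) -> float.
    RMSE_CV uses n models, each trained without pattern i and tested on it;
    RMSE_T uses one model trained on all n patterns.
    """
    n = len(y)
    sq_cv = 0.0
    for i in range(n):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # drop pattern i
        sq_cv += (y[i] - predict(model, X[i])) ** 2
    full = fit(X, y)
    sq_t = sum((yi - predict(full, xi)) ** 2 for xi, yi in zip(X, y))
    return math.sqrt(sq_cv / n), math.sqrt(sq_t / n)


def fit_simple_ols(X, y):
    """Least-squares line y = a + b*x for single-input patterns [x]."""
    xs = [row[0] for row in X]
    n = len(xs)
    mx, my = sum(xs) / n, sum(y) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(xs, y))
         / sum((u - mx) ** 2 for u in xs))
    return my - b * mx, b


def predict_simple_ols(model, x):
    a, b = model
    return a + b * x[0]
```

For the three points (0, 0), (1, 1), (2, 0) the sketch gives RMSE_CV = √3 ≈ 1.73 and RMSE_T = √(2/9) ≈ 0.47, which illustrates why the prediction error, not the training error, serves as the selection criterion.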

RESULTS

Regression shrinkage

The ordinary least squares (OLS) method gives a prediction error RMSE_CV = 0.72 s and a training error RMSE_T = 0.57 s. The calculated model has the following form:

To reduce the prediction error, non-linear terms in the form of squared functions were added to the OLS regression, and cross-validation was repeated. The extended model has errors RMSE_CV = 0.63 s and RMSE_T = 0.62 s. This suggests that adding non-linear terms can improve the predictive ability of the OLS model, even though the training error increases. The coefficients of the non-linear part are

The first regression shrinkage model is ridge regression. It was calculated to find the value of λ for which the model obtains the smallest prediction error. Ridge regression models were calculated for λ from 0 to 20 in steps of 0.1 (Fig. 3). The best ridge regression was obtained for λ = 3, with RMSE_CV = 0.71 s and RMSE_T = 0.57 s. As with OLS regression, all weights are non-zero, so all input variables contribute to the model output.
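The λ search can be sketched as a grid search with LOOCV. To stay dependency-free, this Python sketch uses a hypothetical one-input ridge model whose weight has the closed form w1 = Sxy / (Sxx + λ); the grid 0.0, 0.1, ..., 20.0 matches the one described in the text. All names are hypothetical.

```python
import math


def ridge_1d(xs, ys, lam):
    """One-input ridge regression with an unpenalised intercept; returns (w0, w1)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    w1 = sxy / (sxx + lam)  # shrunk slope
    return my - w1 * mx, w1


def loocv_rmse(xs, ys, lam):
    """Leave-one-out prediction error (RMSE_CV) of ridge_1d for a given lambda."""
    n = len(xs)
    sq = 0.0
    for i in range(n):
        w0, w1 = ridge_1d(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:], lam)
        sq += (ys[i] - (w0 + w1 * xs[i])) ** 2
    return math.sqrt(sq / n)


def select_lambda(xs, ys, grid=None):
    """Return (best_lambda, best_rmse) over a grid (default 0.0, 0.1, ..., 20.0)."""
    if grid is None:
        grid = [k / 10 for k in range(201)]
    best_rmse, best_lam = min((loocv_rmse(xs, ys, lam), lam) for lam in grid)
    return best_lam, best_rmse
```

Ties in RMSE_CV are resolved towards the smaller λ, because `min` compares the (error, λ) tuples lexicographically.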

FIG. 3

Prediction and training errors for ridge regression; the red line marks the best model, and the inner axis represents the errors.

The ridge regression model obtained for predicting the 400-metres hurdles result has the form:

As with OLS regression, the ridge regression model was also extended with non-linear terms. The extended model has a prediction error RMSE_CV = 0.61 s and a training error RMSE_T = 0.60 s. The weights of the non-linear part are

The next regression shrinkage model is LASSO regression. The model was cross-validated for s from 0 to 1 in steps of 0.01 (Fig. 4). The best LASSO model was obtained for s = 0.76, with RMSE_CV = 0.67 s. The training error is similar to that of ridge regression, RMSE_T = 0.58 s. In addition, the LASSO method eliminated the predictors x2, x5, x8, x11, x15, x16, x23 and x25 (their coefficients equal 0), so the model has the form:
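The elimination of predictors follows from the absolute-value penalty. The paper used LARS; the Python sketch below swaps in cyclic coordinate descent, a different algorithm for the same penalised criterion, in which each weight is updated by soft-thresholding and can therefore become exactly zero. The sketch is parameterised by λ rather than s; assuming the `lars` package's "fraction" convention, s would correspond to ‖w(λ)‖₁ / ‖w_OLS‖₁, but that mapping is an assumption, not something stated in the paper. Function names are hypothetical.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=500, tol=1e-10):
    """LASSO by cyclic coordinate descent (a stand-in for LARS).

    Minimises sum_i (y_i - w0 - x_i.w)^2 + lam * sum_j |w_j| with an
    unpenalised intercept. Returns (w0, w).
    """
    n, p = len(X), len(X[0])
    xbar = [sum(r[j] for r in X) / n for j in range(p)]
    ybar = sum(y) / n
    Xc = [[r[j] - xbar[j] for j in range(p)] for r in X]
    resid = [v - ybar for v in y]  # residuals for w = 0
    z = [sum(r[j] ** 2 for r in Xc) for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant column: its weight stays 0
            # correlation of column j with the partial residual (w_j added back)
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n))
            new = soft_threshold(rho, lam / 2) / z[j]
            delta = new - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    w0 = ybar - sum(xbar[j] * w[j] for j in range(p))
    return w0, w
```

On a toy design where y depends only on the first input, a moderate λ keeps a shrunk first weight and sets the second exactly to zero, while a large λ removes both.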

FIG. 4

Prediction and training errors for LASSO regression; red line marks the best model.

The LASSO model was extended with squared terms of only two variables (x1, x3). Variable x2 was not included because its weight in the linear part equals 0. The extended LASSO model gave errors RMSE_CV = 0.59 s and RMSE_T = 0.59 s. The coefficients of the non-linear part are equal to

Elastic net regression brought no further reduction in the prediction error. The best elastic net model was obtained for the pair of parameters s = 0.76 and λ = 0. The results of cross-validation are shown in Figure 5. Because λ is zero, the model reduces to LASSO regression.
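The reduction to LASSO can be seen from the coordinate-wise update. Assuming the naive elastic-net criterion (residual sum of squares + λ1 Σ|w_j| + λ2 Σ w_j², without the (1 + λ2) rescaling of Zou and Hastie), the minimiser for one weight with the others fixed is S(ρ_j, λ1/2) / (z_j + λ2); with λ2 = 0 this is exactly the LASSO update. Here λ2 plays the role of the parameter λ in the text, and the function names are hypothetical.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def enet_coordinate_update(rho, z_j, lam1, lam2):
    """Minimiser over w_j of the naive elastic-net criterion, other weights fixed.

    rho -- sum_i x_ij * (partial residual excluding input j), centred data
    z_j -- sum_i x_ij ** 2 for the centred column j
    """
    return soft_threshold(rho, lam1 / 2) / (z_j + lam2)


def lasso_coordinate_update(rho, z_j, lam1):
    """The corresponding LASSO update (z_j must be positive)."""
    return soft_threshold(rho, lam1 / 2) / z_j
```

A positive λ2 only enlarges the denominator, adding ridge-type shrinkage on top of the LASSO thresholding.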

FIG. 5

Prediction and training errors for elastic net regression.

Artificial neural networks

The neural network models were cross-validated for 1 to 10 neurons in the hidden layer. Based on the results presented in Figure 6, the optimal structure was one neuron in the hidden layer with an exponential activation function. This network gives a prediction error RMSE_CV = 0.72 s and a training error RMSE_T = 0.56 s.

FIG. 6

Prediction and training errors for MLP with: (a) exp function, (b) tanh function; outer axis represents number of hidden neurons, inner axis represents the errors

To compare the models, the RMSE_CV and RMSE_T errors obtained for each method are summarized in Table 2.

TABLE 2

Summary of results.

Method	RMSE_CV (s)	RMSE_T (s)
OLS (ordinary least squares regression)	0.72	0.57
OLS with nonlinear part	0.63	0.62
Ridge regression (λ = 3)	0.71	0.57
Ridge regression with nonlinear part	0.61	0.60
LASSO regression (s = 0.76)	0.67	0.58
LASSO regression with nonlinear part	0.59	0.59
Elastic net regression (s = 0.76, λ = 0)	0.67	0.58
MLP (tanh) 26-1-1*	0.73	0.56
MLP (exp) 26-1-1*	0.72	0.56

Note: * – network architecture (number of neurons in the following layers: input-hidden-output).
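The ranking in Table 2 can be reproduced directly from its values. This Python snippet sorts the methods by RMSE_CV and computes the relative reduction of the best prediction error with respect to OLS; it uses only numbers from the table, and the helper names are hypothetical.

```python
# (method, RMSE_CV [s], RMSE_T [s]) copied from Table 2
RESULTS = [
    ("OLS", 0.72, 0.57),
    ("OLS with nonlinear part", 0.63, 0.62),
    ("Ridge (lambda = 3)", 0.71, 0.57),
    ("Ridge with nonlinear part", 0.61, 0.60),
    ("LASSO (s = 0.76)", 0.67, 0.58),
    ("LASSO with nonlinear part", 0.59, 0.59),
    ("Elastic net (s = 0.76, lambda = 0)", 0.67, 0.58),
    ("MLP (tanh) 26-1-1", 0.73, 0.56),
    ("MLP (exp) 26-1-1", 0.72, 0.56),
]


def rank_by_cv(results):
    """Sort methods by prediction error RMSE_CV, ascending (stable for ties)."""
    return sorted(results, key=lambda r: r[1])


def relative_reduction(baseline, value):
    """Relative error reduction of `value` with respect to `baseline`."""
    return (baseline - value) / baseline


best = rank_by_cv(RESULTS)[0]
ols_cv = RESULTS[0][1]
print(best[0], f"{relative_reduction(ols_cv, best[1]):.1%}")
# prints: LASSO with nonlinear part 18.1%
```

Ranking by RMSE_T instead would favour the MLPs (0.56 s), which again shows why the prediction error was the main selection criterion.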


DISCUSSION

This article evaluated the effectiveness of regression shrinkage and artificial neural networks in predicting the results of athletes training for the 400-metres hurdles. The best model validated with LOOCV was LASSO regression extended by quadratic terms. This model gives a prediction error RMSE_CV = 0.59 s, which supports the use of this method for the task. In practice, the model allows training loads to be selected optimally and thus supports the achievement of the desired result. From the coach's point of view, predicting outcomes is very important in the sports training process. Using the constructed model, a coach can predict how the training will affect the sports result. The models make predictions from the proposed training, expressed as the sum of the training loads of each training means applied in a given training phase. The results obtained with neural networks, however, were disappointing. Using more than one neuron in the hidden layer causes a rapid increase in the prediction error (Fig. 6), particularly for the exponential activation function (values outside the range 0-10 s are cut off to better illustrate the relationship). The available studies show that neural models are used more often than linear regression for predicting sports results [1, 2, 9, 12, 13], and in most of these works ANN have a smaller prediction error. This is not the case for predicting the 400-metres hurdles result over the selected training period. As numerous studies have shown, the final result is influenced by various factors, including coordination ability, hurdle clearance technique and the specific rhythm of the race [26, 27, 28, 29]. 
All these aspects are important; however, the results obtained in this paper suggest that the development of the sports result does not depend directly on BMI or on the variable representing the special preparation period. The eliminated training means were: technical and speed exercises, pace runs, general strength of lower limbs, directed strength of lower limbs, technical exercises – running pace, and runs over 4-7 hurdles (Table 1). All these training means belong to the “target” groups [26]. The results of the analysis confirm the views of researchers of competitive sport who claim that in highly qualified athletes these exercises should be limited and the focus should be placed on special training. It would be a mistake, however, to conclude that the groups of training means mentioned above should not be used. Iskra [26] drew similar conclusions by examining the correlation between the sports level in the 400-metres hurdles and the size of training loads. He showed that these training means have no, or only a very weak, relationship with the final 400-metres hurdles result at a selected stage of the annual cycle. The exceptions are exercises for the general and directed strength of the lower limbs, for which a statistically significant correlation with the final result was demonstrated, unlike in the present study. In the literature [30, 31], there are other attempts to describe the correlation between training means and the final result in the 400-metres or 110 m hurdles. However, these analyses used data from a record season. Such an analysis can lead to erroneous conclusions, because the sports level of the competitor is often shaped over a long period of time (macrocycle) [32]. 
When the resulting LASSO model is used for prediction, the expected result depends not only on the suggested training loads but also on the current 500 m result, which is itself shaped over a long macrocycle. Therefore, the small error and the design of this solution justify including this model among tools that support the planning of training loads in a selected period of the annual cycle. In practice, it allows the coach to predict the effects of the suggested training and, if they are not satisfactory, to make the necessary corrections before the planned training begins.
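For a linear model, the "what-if" use described above reduces to a simple rule: changing training load x_j by Δ changes the predicted 500 m result by w_j·Δ. This is a minimal Python sketch with hypothetical names; the fitted coefficients are not reproduced in this text, so any weights passed in are placeholders. The same rule holds for the extended LASSO model, because its squared terms involve only the athlete's parameters x1 and x3, not the training loads.

```python
def predict_linear(x, w0, w):
    """Linear-model prediction w0 + sum_j w_j * x_j."""
    return w0 + sum(wj * xj for wj, xj in zip(w, x))


def what_if(x, w0, w, j, delta):
    """Hypothetical helper: predicted result before and after changing input j by delta.

    Returns (before, after, after - before); for a linear model the
    difference equals w[j] * delta.
    """
    x_new = list(x)
    x_new[j] += delta
    before = predict_linear(x, w0, w)
    after = predict_linear(x_new, w0, w)
    return before, after, after - before
```

A negative difference means a faster predicted 500 m time, so the coach can compare several candidate load changes before the training period starts.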

References

1. Adam Maszczyk, Robert Roczniok, Zbigniew Waśkiewicz, Miłosz Czuba, Kazimierz Mikołajec, Adam Zajac, Arkadiusz Stanula. Application of regression and neural models to predict competitive swimming performance. Percept Mot Skills. 2012.

2. Igor Rygula. Artificial neural networks as a tool of modeling of training loads. Conf Proc IEEE Eng Med Biol Soc. 2005.

3. António José Silva, Aldo Manuel Costa, Paulo Moura Oliveira, Victor Machado Reis, José Saavedra, Jurgen Perl, Abel Rouboa, Daniel Almeida Marinho. The use of neural network technology to model swimming performance. J Sports Sci Med. 2007.

4. Mark Pfeiffer, Andreas Hohmann. Applications of neural networks in training science. Hum Mov Sci. 2011.

5. A J Ward-Smith. A mathematical analysis of the bioenergetics of hurdling. J Sports Sci. 1997.

6. B Klapcińska, J Iskra, S Poprzecki, K Grzesiok. The effects of sprint (300 m) running on plasma lactate, uric acid, creatine kinase and lactate dehydrogenase in competitive hurdlers and untrained men. J Sports Med Phys Fitness. 2001.

7. Mike D Quinn. External effects in the 400-m hurdles race. J Appl Biomech. 2010.

8. Carlos Balsalobre-Fernández, Carlos M Tejero-González, Juan Del Campo-Vecino, Dionisio Alonso-Curiel. The effects of a maximal power training cycle on the strength, maximum power, vertical jump height and acceleration of high-level 400-meter hurdlers. J Hum Kinet. 2013.