
Predicting ratings of perceived exertion in youth soccer using decision tree models.

Jakub Marynowicz¹,², Mateusz Lango³, Damian Horna³, Karol Kikut², Marcin Andrzejewski⁴.

Abstract

The purpose of this study was to determine the effectiveness of white-box decision tree models (DTM) for predicting the rating of perceived exertion (RPE). The second aim was to examine the relationship between RPE and external measures of intensity in youth soccer training at the group and individual level. Training load data from 18 youth soccer players were collected during an in-season competition period. A total of 804 training observations were recorded, with a mean of 43 ± 17 sessions per player (range 12–76). External measures of intensity were determined using a 10 Hz GPS and included total distance (TD, m/min), high-speed running distance (HSR, m/min), PlayerLoad (PL, a.u./min), impacts (n/min), distance in acceleration/deceleration (TD ACC/TD DEC, m/min) and the number of accelerations/decelerations (ACC/DEC, n/min). Data were analysed with decision tree models, and both global (population) and individualized models were constructed. Aggregated importance revealed HSR as the strongest predictor of RPE, with a relative importance of 0.61. HSR was the most important predictor of RPE for half of the players. The prediction error of the individualized models (root mean square error [RMSE] 0.755 ± 0.014) was lower than that of the population model (RMSE 1.621 ± 0.001). The findings demonstrate that individual models should be used to assess players' responses to external load. Furthermore, the study demonstrates that DTM are straightforward to interpret and can be visualized. This method can be used to prescribe daily training loads on the basis of predicted, desired player responses (exertion).


Keywords: Fatigue; GPS; RPE; Team sport; Training load; Training monitoring

Year: 2021 PMID: 35309546 PMCID: PMC8919883 DOI: 10.5114/biolsport.2022.103723

Source DB: PubMed Journal: Biol Sport ISSN: 0860-021X Impact factor: 2.806

INTRODUCTION

Athlete monitoring data provide coaching staff with information about desired (fitness) and undesired (fatigue) training outcomes, thus representing how athletes react to training stimuli [1]. To optimize the training process, coaches should understand the dose-response relationship [2]. There is a great deal of evidence that appropriate management of training loads is effective in improving physical performance [3], reducing the risk of injury and illness [4], and minimizing the risk of non-functional overreaching [5]. For training and match loads, it is common to distinguish between external and internal load [6]. External load is the work completed by an athlete, independent of internal responses [7], whereas internal load represents the psychophysiological stress experienced by the player in response to external stimuli. Training load monitoring in team sports is difficult because: a) different exercises have different physiological and mechanical requirements; and b) individual physical and physiological responses to the same external workload can differ [8] owing to individual characteristics of the players, such as physical fitness. Monitoring both internal and external loads is important in soccer because of these individualized responses to the same external load [9]. Furthermore, research shows a discrepancy between the exertion intended and observed by the coach and the exertion perceived by the player [10]. In running-based team sports, one of the most valid and reliable tools for monitoring external training load metrics is the global positioning system (GPS) [11]. Use of GPS is becoming increasingly common because of its practicality [12]. Metrics provided by this system that practitioners can use to analyse the activity profiles of athletes include total distance covered, distance covered in different speed zones, and the number of accelerations and decelerations [13].
Internal training load can be quantified using the rating of perceived exertion (RPE) [14], which is based on the Borg CR10 scale [15] or on Foster's modified version of the CR10 [14]. Previous research has established RPE-based methods as a straightforward, valid measure of internal training load by demonstrating a strong correlation between RPE and objective internal load measures such as heart rate and blood lactate [16]. Furthermore, unlike heart rate, RPE integrates the psychological and physiological load experienced by athletes [17], making it a simple, versatile, and cost-effective method [18]. Knowledge of athletes' responses to external training load is crucial for effectively prescribing and monitoring training loads [19]. It is therefore important to integrate external and internal training load metrics [12]. Monitoring training at youth level is essential, not only to enable players to reach higher performance levels, but also to preserve athletes' health in the long term and consequently avoid early retirement [20]. The relationship between internal and external load measures has previously been studied in adult soccer players at different levels [21-24]. However, all of this research investigated the internal-external training load relationship only at the group level and with linear models. Recently, Bartlett et al. [25] proposed performing this kind of analysis on an individual basis using machine learning techniques (namely artificial neural networks), which appear to be better equipped to predict athletes' responses to external training load metrics accurately. Unfortunately, the neural networks used in their study are black-box machine learning models: unlike linear models, they provide practitioners with no insight beyond the predictions themselves.
Another problem is that some previous research has erroneously used the terms ‘association’ and ‘prediction’ interchangeably [26]. Moreover, the relationship between internal and external measures of training load in youth soccer players has not yet been studied at the individual level. Therefore, the first aim of the present study was to determine the effectiveness of white-box decision tree models for predicting RPE from GPS-derived external measures of intensity, and to visualize such a model. The second aim was to examine the relationship between internal load and external measures of intensity in youth soccer training using machine learning techniques at the group and individual level.

MATERIALS AND METHODS

Participants

Eighteen youth soccer players (age 17.81 ± 0.96 years, height 179.47 ± 4.77 cm, body mass 70.94 ± 4.72 kg) participated in the investigation. More than half of the players were members of their youth national teams. The players competed at the highest level of their under-19 soccer league. None of the players was injured during the investigation. Goalkeepers were excluded because of the different physical demands of their position. Although the data obtained for this analysis are part of the athletes' daily monitoring routine, the study was fully approved by the Health Research Ethics Committee of the institution where the research was conducted before the start of the assessments. All participants or their parents/guardians were informed of the risks and signed an informed consent form before the investigation.

Design

Training load data were collected during the 2018–2019 in-season competition period, which was chosen to minimize variability in physical fitness. Only data from microcycles containing one game were analysed. All analysed training sessions took place at the same time of day. Only field-based soccer sessions with warm-ups performed on the field were included. A typical microcycle during this period included 5–6 field-based sessions, with 24 hours between consecutive training sessions. Only data from players who completed the full session were analysed; individual rehabilitation and individual fitness sessions were excluded. All training sessions within the investigation period were performed on the same surface, an outdoor grass training pitch. During rest periods, players were allowed to drink fluids. A total of 804 training observations were made. The number of sessions recorded per player ranged from 12 to 76 (mean 43 ± 17), the mean duration of a training session was 68 ± 15 minutes, and the mean temperature was 10.2 ± 3.11°C.

Methodology

The players' external load during each training session was monitored using a non-differential 10 Hz global positioning system (GPS) integrated with a 400 Hz triaxial accelerometer and a 10 Hz triaxial magnetometer (PLAYERTEK, Catapult Innovations, Melbourne, Australia). The reliability and validity of these types of GPS devices for use in team sports have been reported [11, 27]. The devices were positioned between the players' scapulae in a tight-fitting vest. Each player wore the same unit throughout the collection period to minimize inter-unit variability [28]. After recording, data were downloaded and analysed using a software package. Internal load was measured using the modified Borg CR-10 scale [14]. Each player's RPE was collected in isolation ~20 minutes after each training session to eliminate the influence of the final part of the session [14] and to minimize the influence of peer pressure [29]. The RPE was obtained by asking each player "How hard was your session?", with 1 being very, very easy and 10 being maximal exertion. All players were fully familiarized with the use of RPE before the beginning of the study.

External measures of intensity

For the purpose of this study, 8 variables were recorded: total distance (TD, m), high-speed running distance (HSR, distance above 19.8 km·h⁻¹, m), PlayerLoad (PL, a.u.), impacts (above 3 g, n), distance in acceleration and in deceleration (above 2 m·s⁻², m), and the number of accelerations and decelerations (ACC/DEC, above 2 m·s⁻², n). All variables were expressed in relative terms (per minute). The speed threshold for HSR was chosen in light of previous research [30-31]. PlayerLoad, calculated automatically using an established algorithm, is derived from the triaxial accelerometer in the GPS unit and represents the sum of accelerations recorded in the anteroposterior, mediolateral, and vertical planes of movement. Research has shown PlayerLoad to be a valid and reliable measure [24, 32]. Player impact data extracted from the triaxial accelerometers capture significant impact events (e.g. collisions) but exclude footsteps during walking or running. An impact was defined as a maximum accelerometer magnitude above 3 g within a 0.1 s period. An acceleration was defined as an increase in speed lasting at least 0.5 s and reaching a maximum acceleration of at least 2 m·s⁻². Decelerations were defined in the same way, as a decrease in speed lasting at least 0.5 s and reaching a maximum deceleration of at least −2 m·s⁻².
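The acceleration and deceleration rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the PLAYERTEK/Catapult algorithm: it assumes a 10 Hz speed trace in m/s, estimates acceleration by first differences, treats an effort as an uninterrupted run of rising (or falling) speed lasting at least 0.5 s whose peak magnitude reaches 2 m·s⁻², and takes "distance in acceleration" to be the distance covered during such runs. The helper names `detect_efforts` and `per_minute` are hypothetical.

```python
import numpy as np


def detect_efforts(speed, fs=10.0, threshold=2.0, min_duration=0.5, sign=1):
    """Count acceleration (sign=1) or deceleration (sign=-1) efforts in a speed trace.

    Assumed rule (a sketch, not the vendor algorithm): an effort is an uninterrupted
    run of intervals in which speed changes in the given direction, lasting at least
    `min_duration` seconds, whose peak |acceleration| reaches `threshold` m/s^2.
    Returns (number_of_efforts, distance_in_metres_covered_during_efforts).
    """
    speed = np.asarray(speed, dtype=float)
    accel = np.diff(speed) * fs                 # m/s^2 for each 1/fs-second interval
    in_direction = sign * accel > 0             # speed rising (sign=1) or falling (sign=-1)
    min_intervals = int(round(min_duration * fs))
    count, distance = 0, 0.0
    i, n = 0, len(accel)
    while i < n:
        if not in_direction[i]:
            i += 1
            continue
        j = i
        while j < n and in_direction[j]:        # extend the run while the direction holds
            j += 1
        if j - i >= min_intervals and np.max(sign * accel[i:j]) >= threshold:
            count += 1
            # trapezoidal distance over the run: mean speed of each interval times 1/fs
            distance += float(np.sum(speed[i:j] + speed[i + 1:j + 1]) / 2.0 / fs)
        i = j
    return count, distance


def per_minute(value, duration_s):
    """Express a session total in relative terms (per minute of session time)."""
    return value / (duration_s / 60.0)


# Synthetic 5 s trace at 10 Hz: jog, 1 s acceleration at 3 m/s^2, cruise, 1 s deceleration
speed = np.concatenate([
    np.full(10, 1.0), np.linspace(1.0, 4.0, 11)[1:],
    np.full(10, 4.0), np.linspace(4.0, 1.0, 11)[1:], np.full(10, 1.0),
])
acc_n, acc_m = detect_efforts(speed, sign=1)
dec_n, dec_m = detect_efforts(speed, sign=-1)
print(acc_n, dec_n, per_minute(acc_m, len(speed) / 10.0))
```

Requiring both a minimum duration and a peak threshold means that brief speed fluctuations (for example, GPS noise lasting one or two samples) are not counted as efforts.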

Statistical analysis

Decision trees were used to analyse the collected data. These are commonly used, non-parametric, non-linear statistical models in machine learning [33]. Tree models offer a number of advantages, including interpretability, automatic capture of interactions between predictors, efficient training and prediction procedures, and accurate modelling. These advantages have led to their widespread use in many areas, for instance medicine [34] and psychology [35]. A more extended discussion, including a comparison with linear models, is given by Breiman [36]. Compared with the neural networks used in related work [25], regression trees offer not only accurate predictions but also the possibility of visualizing the model. This allows subsequent expert analysis, as with linear models. The regression trees were constructed using the CART procedure [37]. Mean squared error (MSE) was selected as the splitting criterion, since it is the most commonly used criterion for regression. The standard procedure for making decision tree models more accurate and interpretable is tree pruning [37]; in this study we used an early stopping criterion on tree depth as a form of pre-pruning. We constructed one global (population) tree model, fitted with observations from all players, and eighteen individualized models, one for each athlete. Feature importance was calculated as the normalized total reduction of MSE contributed by each feature in the regression tree and was estimated with bootstrapping [38]. Variables included in the models were selected by combining expert knowledge [13] regarding their practicality for training planning with the requirement of a variance inflation factor (VIF) < 5 to avoid multicollinearity. Model performance was assessed using the root mean squared error (RMSE).
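RMSE is the square root of the mean of the squared residuals (prediction errors); when the residuals have zero mean, it equals their standard deviation. It is frequently used to measure the error of a model in predicting quantitative data and is expressed in the same units as the outcome (here, RPE points). Statistical analysis was conducted with the scikit-learn library (Python).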
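The comparison between the population model and the individualized models can be sketched as follows. This is a hypothetical illustration, not the authors' validation protocol, which the text does not specify: it holds out 25% of each player's sessions, fits one depth-limited tree on all players and one tree per player, and computes RMSE explicitly as the square root of the mean squared residual. Pooling the individualized predictions into a single RMSE is also an assumption; per-player RMSEs could be averaged instead. The data are synthetic.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean of the squared residuals."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(residuals ** 2)))


def global_vs_individual_rmse(X, y, player, max_depth=3, test_size=0.25, seed=0):
    """Held-out RMSE of one population tree versus one tree per player."""
    X_tr, X_te, y_tr, y_te, p_tr, p_te = train_test_split(
        X, y, player, test_size=test_size, random_state=seed,
        stratify=player)  # keep every player in both the training and test sets
    population = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
    population.fit(X_tr, y_tr)
    pop_rmse = rmse(y_te, population.predict(X_te))

    individual_pred = np.empty(len(y_te))
    for pid in np.unique(p_te):
        train_mask, test_mask = p_tr == pid, p_te == pid
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=seed)
        tree.fit(X_tr[train_mask], y_tr[train_mask])
        individual_pred[test_mask] = tree.predict(X_te[test_mask])
    return pop_rmse, rmse(y_te, individual_pred)


# Hypothetical data: players 0-1 respond to predictor 0, players 2-3 to predictor 1
rng = np.random.default_rng(2)
player = np.repeat(np.arange(4), 100)
X = rng.uniform(size=(400, 2))
driver = np.where(player < 2, X[:, 0], X[:, 1])
y = 3.0 + 4.0 * (driver > 0.5) + 0.2 * rng.normal(size=400)
print(global_vs_individual_rmse(X, y, player))
```

In this toy setting the population tree cannot tell which predictor drives each player's RPE, so its error stays high, whereas each individualized tree can split on the predictor that matters for that player.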

RESULTS

Mean intensity measures are presented in Table 1. Figure 1 shows the distribution of RPE values. Figure 2 shows a decision tree model used to predict RPE, with seven leaf nodes and a depth of 3. Results from the decision tree regression for the entire group (population model) are shown in Figure 3. The aggregated importance of each intensity variable across all players revealed high-speed running distance per minute as the strongest predictor of RPE, with a relative importance of 0.61. The prediction error (RMSE) of the population tree model was 1.621 ± 0.001. Figure 4 shows the normalized importance (%) of each training intensity variable in the individualized model of each player. High-speed running distance per minute was also the strongest predictor of RPE in the individualized models, with the highest importance score for half of the players (9/18). The number of impacts per minute and the number of accelerations per minute were each the strongest RPE predictor for four players, whilst distance per minute was the strongest predictor for only one player (#15). The prediction error of the individualized models was lower than that of the population model (RMSE 0.755 ± 0.014).
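The per-player summaries reported above (which predictor tops each individualized model, and the range of a predictor's importance across players) can be computed from a players × predictors importance matrix. The sketch below is illustrative only: the helper name, the predictor names, and the importance values in the example are hypothetical, not the study's data.

```python
import numpy as np


def summarize_individual_importance(importance, features):
    """Summarize an (n_players, n_features) matrix of normalized importances.

    Returns how many players have each feature as their strongest predictor, and the
    (min, max) importance of each feature across players.
    """
    importance = np.asarray(importance, dtype=float)
    top = importance.argmax(axis=1)                 # index of each player's strongest predictor
    top_counts = {f: int((top == k).sum()) for k, f in enumerate(features)}
    ranges = {f: (float(importance[:, k].min()), float(importance[:, k].max()))
              for k, f in enumerate(features)}
    return top_counts, ranges


features = ["HSR", "ACC", "TD"]                     # hypothetical predictor subset
importance = [[0.6, 0.3, 0.1],                      # hypothetical player-level values
              [0.2, 0.5, 0.3],
              [0.7, 0.2, 0.1]]
counts, ranges = summarize_individual_importance(importance, features)
print(counts)          # {'HSR': 2, 'ACC': 1, 'TD': 0}
print(ranges["HSR"])   # (0.2, 0.7)
```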

TABLE 1

Descriptive statistics of collected data.

Variable	Mean	SD
RPE	4.6	1.9
External intensity measures
Distance (m) per minute	71.7	14.6
PlayerLoad (a.u.) per minute	3.8	0.8
Impacts (n) per minute	2.5	2.0
High-speed running distance (m) per minute	3.0	3.8
Distance in deceleration (m) per minute	3.0	1.0
Distance in acceleration (m) per minute	2.4	0.7
Accelerations (n) per minute	2.3	0.6
Decelerations (n) per minute	2.2	0.6

FIG. 1

Distribution of the rating of perceived exertion (RPE) values.

FIG. 2

Decision tree regression model for RPE.

Abbreviations: ACC, acceleration; HSR, high-speed running distance; TD, total distance; MSE, mean squared error.

FIG. 3

Feature importance in the decision tree regression model constructed for the entire group.

Abbreviations: ACC, acceleration; HSR, high-speed running distance; TD, total distance.

FIG. 4

Normalized importance (%) of each training intensity variable for each player.

Abbreviations: ACC, acceleration; HSR, high-speed running distance; TD, total distance.


DISCUSSION

The relationship between internal load and external measures of training intensity is important for understanding the dose–response nature of youth soccer players' training. Given current knowledge that individual physical and physiological responses to the same external workload differ, group-level analyses are helpful for understanding the overall relationships between external and internal training load, but a more detailed analysis at the individual level should be carried out in order to inform decision making [39]. The first aim of the present study was to determine the effectiveness of decision tree models for predicting RPE from GPS-derived external measures of intensity, and to visualize such a model. The second aim was to examine the relationship between RPE and external measures of intensity in youth soccer training at the group and individual levels. Both aims were addressed with machine learning techniques. As mentioned above, the most common approaches in the sports science literature for quantifying the relationships between external and internal training load are based on traditional statistical methods, namely linear models [9, 22, 24]. The novel method proposed in the present study uses decision tree models to quantify these relationships and to predict RPE from GPS-derived external measures of intensity. The root node of the tree in the present model is a question about high-speed running distance per minute; this is the starting point of the RPE prediction process. In our model, the first question is whether high-speed running distance is above or below 6.2 m per minute. Each internal node poses such a binary question, and the branches leaving it represent the two possible answers, one of which must be followed. The leaves of the tree are the final decisions and give the predicted RPE value.
The RPE value at the final leaf can thus be predicted from external measures of training intensity. The prediction error of every leaf is presented in Figure 2 and expressed as mean squared error (MSE). Notably, the leaf with the lowest predicted RPE in this model has the lowest MSE of all leaves. We concluded that when a player reports a low RPE, this is closely reflected in the external intensity measures, so subsequent training loads can be planned with confidence. MSE is highest for average RPE values (MSE = 3.037 for RPE = 4.98), which could suggest that players find it difficult to rate medium exertion on Borg's scale. The comparison of the prediction error (RMSE) of the tree model for the group (1.621 ± 0.001) and for individual players (0.755 ± 0.014) confirms that individual models should be used to assess players' responses to external load [40]. These results are in line with previous research on Australian football players, which showed that individual artificial neural networks (ANN) predicted RPE from external training load metrics better than the group model [25]. In contrast, in professional soccer, group models predicted RPE with equivalent or superior accuracy compared with individual models [41]. It is worth mentioning that, in Australian rules football, the relationships between RPE and GPS-derived variables were quantified using both external load metrics (total distance, high-speed running distance) and intensity metrics (session distance per minute, HSR as a percentage of distance covered) simultaneously. Similarly, the second study [41] used a set of external load measures together with intensity-related parameters. In the present study, the relationships between RPE and GPS-derived variables were quantified using intensity-related (per-minute) parameters only.
Despite these differences, the RMSE obtained for individual players in this study (0.755 ± 0.014) was markedly lower than that reported in the previously mentioned study on Australian rules football (1.24 ± 0.41). The relationship between internal and external load measures in soccer and rugby league has been investigated using traditional statistical methods, such as Pearson correlation coefficients, multiple regression, and general linear models with partial correlation coefficients [9, 22, 24], all of which are linear methods. Studies on Australian football [25] and soccer [41] used a machine learning approach to predict players' responses from GPS-derived external load measures. The former used ANN, whilst the latter employed ANN together with the least absolute shrinkage and selection operator (LASSO). LASSO is an interpretable regression model, but it is linear and does not take into account that the data were collected within subjects over time. Jaspers et al. [41] identified the difficulty of interpreting the model's results as a clear disadvantage of ANN, which could be an obstacle to applying this information in daily practice. However, the possibility for a model to be visualized and analysed by stakeholders can increase coaching staff ‘buy-in’ and consequently improve confidence in the decisions made [42]. Moreover, recent studies in soccer [40, 43] attempted to predict RPE using both GPS-derived external load indicators and additional variables, each with different machine learning techniques. In their analysis, Geurkink et al. [40] included a large set of predictive indicators: in addition to external load indicators, internal load indicators, individual characteristics, and supplementary variables were used to predict RPE.
The findings of Geurkink et al. [40] show that external load indicators (total distance, total time, and number of sprints) were the strongest individual predictors of RPE, accounting for 61.5% of the normalized importance. A large number of different GPS-derived external load indicators, together with contextual factors, were also used by Rossi et al. [43] to predict RPE. Interestingly, their results show that RPE is affected not only by the workload performed in the current training session but also by cumulative load. Both studies [40, 43] highlight the importance of including a broad spectrum of variables in the prediction model. By contrast, the present study focused on a limited set of external load indicators, which might provide practitioners with a simple tool for understanding the dose-response relationship between external intensity measures and RPE. Moreover, unlike the present study, the research discussed above [40, 43] did not focus on the interpretability of the models used. Quantifying inter-player differences in responses to external measures of intensity was one of the main objectives of this study (Figure 4). Group analysis revealed high-speed running distance per minute as the strongest predictor of RPE, with a relative importance of 61% in the global model (Figure 3), and it was the most important variable in predicting RPE for half of the players. However, at the individual level, high-speed running distance per minute accounted for 7 to 62% of the relative importance. Accelerations per minute and impacts per minute were each the most important variable in predicting RPE for four players, with relative importances of 21% and 12%, respectively, in the global model. In contrast, metres covered per minute, with a relative importance of 5% in the global model, was the most important variable for only one player; at the individual level its relative importance ranged from 7 to 51%.
The obtained results, and the high variability in the importance of the external measures of training intensity, confirm that internal training load results from a combination of the applied external load and several factors, such as the player's individual characteristics, that may modulate the player's response [6]. The significance of individual characteristics and supplementary variables in quantifying internal load has been demonstrated by Geurkink et al. [40]: as predictors of RPE, players' individual characteristics (physiological and personal) and supplementary variables accounted for 4.5% and 33% of the total normalized importance, respectively [40]. Therefore, the same external training load may result in a completely different internal load. In line with our observations on the importance of high-speed running, a systematic review with meta-analysis [44] clearly demonstrated the importance of high-speed running for changes in post-game fatigue-related markers in soccer. Distance covered above 5.5 m/s (19.8 km/h, the HSR threshold in the present study) was the only monitoring variable highly correlated with both biochemical and neuromuscular markers. Each 100 m run above 5.5 m/s increased the activity of creatine kinase, an objective marker of internal training load, by 30% [44].
Our findings demonstrate that high-speed running distance per minute is the strongest predictor of RPE at the group level. These findings are consistent with a number of previous studies that have emphasized the relationship between high-speed running and both RPE and session-RPE (sRPE), calculated by multiplying training duration (minutes) by RPE. A moderate correlation was reported between high-speed running distance and RPE (r = 0.30) in rugby league training, although a different threshold (> 15 km/h) was used in that study [9]. In soccer, Casamichana et al. reported a strong correlation (r = 0.64, p < 0.01) between the frequency of high-speed efforts (≥ 18 km/h) and sRPE training load [21], and Scott et al. [24] reported a moderate correlation (r = 0.43, p < 0.05) between very high-speed (> 19.8 km/h) running distance and sRPE training load. These two studies showed the overall relationships between external and internal training load using sRPE training load, of which training duration is a component. A small within-individual correlation (r = 0.255, p < 0.001) was found between high-speed running distance per minute (> 14.4 km/h) and RPE by Gaudino et al. [22]. These observations underline the importance of high-speed running in the context of perceived exertion. The main limitation of the current study is that the individual characteristics of the players (e.g. level of physical fitness), and their possible influence on internal load, were not taken into account. The relatively small number of training observations is a second limitation. In addition, the present study was unable to quantify the influence of player self-reported measures, such as sleep quality, fatigue, stress, and delayed-onset muscle soreness (DOMS).
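The session-RPE calculation mentioned above is simple arithmetic. The sketch below, with hypothetical session values and a hypothetical helper name, also shows why sRPE-based correlations partly reflect session duration, whereas the per-minute intensity measures used in the present study do not.

```python
def session_rpe(rpe, duration_min):
    """Session-RPE training load (arbitrary units): RPE multiplied by duration in minutes."""
    if not 0 <= rpe <= 10:
        raise ValueError("RPE on the CR-10 scale must lie between 0 and 10")
    return rpe * duration_min


# Two hypothetical sessions with the same perceived intensity but different durations
short_session = session_rpe(5, 45)   # 225 AU
long_session = session_rpe(5, 90)    # 450 AU
print(short_session, long_session)
```

Doubling the duration doubles the sRPE load even though the rated intensity is unchanged, which is why correlations with sRPE mix volume and intensity.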

PRACTICAL IMPLICATIONS

Our results demonstrated that high-speed running distance per minute is the strongest predictor of RPE at the group level. For this reason, practitioners should be aware of the importance of HSR volume and intensity management. Skilful management of HSR volume and intensity will help avoid undesired fatigue during tapering days in a microcycle (e.g. one or two days before the match [MD-1, MD-2]), as well as negative training effects (e.g. injury) in the long term. Knowledge about the strongest predictors (external intensity measures) of RPE at an individual level will allow practitioners to prescribe training sessions to replicate competition exertion (e.g. during MD+1) or avoid undesired fatigue during return to play processes. This novel method for prediction of RPE can be useful for stakeholders (e.g. coaches) because of the possibility of visualization – and hence interpretability – without the need for advanced statistical knowledge. This method can be used to prescribe daily training loads on the basis of predicted, desired players’ responses (exertion).
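One way to apply the final sentence above in practice is to use a fitted tree in reverse: score several candidate session plans and choose the one whose predicted RPE is closest to the target. This is a hypothetical sketch; the candidate plans, the target value, the single-feature model, and the helper `choose_session` are illustrative assumptions, and in practice the model would be the player's individualized tree.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def choose_session(model, candidate_plans, target_rpe):
    """Return (index, predicted RPE) of the plan whose predicted RPE is closest to target_rpe.

    candidate_plans: array of shape (n_plans, n_features) holding planned per-minute
    intensities in the same feature order used to fit the model.
    """
    predicted = model.predict(np.asarray(candidate_plans, dtype=float))
    best = int(np.argmin(np.abs(predicted - target_rpe)))
    return best, float(predicted[best])


# Hypothetical single-feature model: planned HSR (m/min) -> RPE
hsr = np.repeat([0.0, 3.0, 6.0, 9.0], 10).reshape(-1, 1)
rpe = np.repeat([2.0, 4.0, 6.0, 8.0], 10)
model = DecisionTreeRegressor(max_depth=2, random_state=0).fit(hsr, rpe)

plans = [[1.0], [5.0], [10.0]]          # three candidate sessions
print(choose_session(model, plans, target_rpe=6.5))   # (1, 6.0)
```

Because a tree predicts a constant within each leaf, several plans can share the same prediction; a practitioner would then choose among them on other grounds, such as the tactical content of the session.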

CONCLUSIONS

These findings provide further evidence of inter-player differences in responses to external load, which is particularly important for enhancing training prescription and athlete monitoring. Furthermore, knowledge of the individual relationships between external measures of training intensity and RPE helps practitioners to achieve desired training outcomes, both in daily training and in return-to-play processes. Inter-player differences in responses to external load might explain mismatches between coaches' intended and players' perceived exertion.

Conflict of interest

The authors declare that they have no competing interests.

1. Vili Podgorelec, Peter Kokol, Bruno Stiglic, Ivan Rozman. Decision trees: an overview and their use in medicine. J Med Syst. 2002.

2. Matthew C Varley, Ian H Fairweather, Robert J Aughey. Validity and reliability of GPS for measuring instantaneous velocity during acceleration, deceleration, and constant motion. J Sports Sci. 2011.

3. Heart rate and blood lactate correlates of perceived exertion during small-sided soccer games.

4. Monitoring accelerations with GPS in football: time to slow down?

5. Factors influencing perception of effort (session rating of perceived exertion) during elite soccer training.

6. Monitoring Fatigue Status in Elite Team-Sport Athletes: Implications for Practice.

7. Positional Differences in the Most Demanding Passages of Play in Football Competition.

8. Factors affecting perception of effort (session rating of perceived exertion) during rugby league training.

9. Monitoring Players' Readiness Using Predicted Heart-Rate Responses to Soccer Drills.

10. Training Load and Player Monitoring in High-Level Football: Current Practice and Perceptions.