
Development of an expected possession value model to analyse team attacking performances in rugby league.

Thomas Sawczuk1,2,3, Anna Palczewska1, Ben Jones2,3,4,5,6.   

Abstract

This study aimed to evaluate team attacking performances in rugby league via expected possession value (EPV) models. Location data from 59,233 plays in 180 Super League matches across the 2019 Super League season were used. Six EPV models were generated using arbitrary zone sizes (EPV-308 and EPV-77) or aggregated according to the total zone value generated during a match (EPV-37, EPV-19, EPV-13 and EPV-9). Attacking sets were considered as Markov Chains, allowing the value of each zone visited to be estimated based on the outcome of the possession. The Kullback-Leibler Divergence was used to evaluate the reproducibility of the value generated from each zone (the reward distribution) by teams between matches. Decreasing the number of zones improved the reproducibility of reward distributions between matches but reduced the variation in zone values. After six previous matches, the subsequent match's zones had been visited on 95% or more occasions for EPV-19 (95±4%), EPV-13 (100±0%) and EPV-9 (100±0%). The KL Divergence values were infinity (EPV-308), 0.52±0.05 (EPV-77), 0.37±0.03 (EPV-37), 0.20±0.02 (EPV-19), 0.13±0.02 (EPV-13) and 0.10±0.02 (EPV-9). This study supports the use of EPV-19 and EPV-13, but not EPV-9 (too little variation in zone values), to evaluate team attacking performance in rugby league.

Year:  2021        PMID: 34767602      PMCID: PMC8589207          DOI: 10.1371/journal.pone.0259536

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

In recent years, the growing availability of event level data in rugby league has led to an increase in research surrounding the characteristics of match winning performances [1-4]. These studies can broadly be split into two categories based on the inclusion of spatial (i.e. event location) data within their analyses [3], or lack thereof [1, 2, 4]. Although those studies not including spatial data provide valuable insights into potential match winning actions [2] or the classification of player positions [4], they do not account for some of the most valuable contextual information surrounding the location of the events analysed. Incorporating this spatial context into the analysis of a team’s attacking performances could have a significant impact on tactical preparations for future matches and thus provides a valuable avenue for future research in rugby league.

Spatial data has been included in analyses of team and player performances via expected possession value (EPV) models across several sports [3, 5, 6]. EPV models assign a value to every pitch location (or location-action tuple) visited during a match based on the probability of scoring a goal, basket or try from that action/location within a given amount of time. These values can be summed at a match level to quantify attacking performances.

EPV models have been developed using either probabilistic [5, 7] or stochastic [8, 9] methods. Fernandez et al. [5, 7] used probabilistic deep learning methods to establish the EPV in soccer, whereas Routley and Schulte [9] and Liu and Schulte [8] both used stochastic models based upon Markovian principles to evaluate ice hockey player performances via Q-values. The episodic nature of rugby league, whereby teams have a finite six tackle period of possession, barring fouls/errors, ensures it is well suited to stochastic analyses. Despite this, limited research has been conducted in rugby league using stochastic analyses [3].
When considering spatial data within EPV models, it is common to split the pitch into different zones, which pool data together. Discretising the pitch into these zones is computationally efficient and allows for improved generalisability to other samples. Nevertheless, selecting the correct zone system is fraught with difficulty. If the zones are too large, valuable data could be lost, but if they are too small, the results of the analysis will not be generalisable [10]. Two key methods have been used to determine zone sizes within the EPV literature: fixed zone sizes, arbitrarily chosen by the authors [3], and the selection of zones based on likely shooting locations [6, 8]. Given a try can be scored across the width of the pitch, it is not possible to identify specific shooting (or try scoring) locations within rugby league. However, it may be possible to produce a set of zones which are more rugby league specific by aggregating different zones based on their point scoring potential. This aggregated set of zones may prove more adept at evaluating attacking performances within rugby league than the fixed size zones previously considered [3], but to date no study has compared these methods.

Within a model evaluating rugby league attacking performances, one of the most important elements is the reproducibility of performances between fixtures [11]. To understand the reproducibility of attacking performances from the perspective of an EPV model, it is necessary to evaluate the similarity of the total EPV generated during a match by a zone (i.e. the zone’s match EPV) between fixtures. Completing such an analysis with EPV models using different zone sizes (e.g. the previously published fixed zone size [3], a smaller fixed zone size and aggregated zone sizes) would help to identify the most suitable set of zones for rugby league.
The aims of this study were to i) produce six EPV models (two with fixed zone sizes of ~5m x 5m and ~10m x 10m [3], and four with aggregated zones based on differences in the zones’ match EPV of 0.5, 1.0, 1.5 and 2.0 points per match) to quantify expected points scored during rugby league matches from specific locations on the pitch (i.e. attacking performances), ii) compare the reproducibility of match EPV between fixtures for the EPV models, and iii) quantify individual teams’ attacking performances across a season using an EPV model. Six EPV models were chosen to provide a range of zone sizes from small to large, which could be compared with regards to their reproducibility between matches and usefulness in practice.

Methods

Sample

Event level match-play data were obtained from Opta (Stats Perform, London, UK) for all 180 matches of the 2019 Super League season. In total, 59,233 plays were analysed. Within this sample, 1,369 tries were scored (1,013 successful conversions, 356 unsuccessful conversions), 271 penalty goals were attempted (239 successful, 32 unsuccessful) and 89 drop goals were attempted (42 successful, 47 unsuccessful). Prior to analysis, informed consent was obtained and ethics approval was provided by a University sub-ethics committee.

Data pre-processing

Although event level data can include various pieces of information regarding the actions completed and players involved, Opta only includes location data (x and y co-ordinates) for the first action of each play. Consequently, only the location of the first action of each play was used when developing the EPV models. Despite some variation being present in pitch sizes across the Super League, Opta standardises pitch dimensions to 68m x 120m through its coding software, so these dimensions were used for this study.

The two fixed zone size models used zone sizes of 5m x 5m and 10m x 10m [3]. When creating the zones for these models, the opposition try area was removed from the 68m x 120m pitch to leave a 68m x 110m area. This was necessary as the opposition try area is an area of implicitly high value, where a team is likely to attempt to ground the ball for a try before the next play begins. The 68m x 110m area was split into fourteen ~5m columns and twenty-two 5m rows, resulting in 308 ~5m x 5m zones (EPV-308), and seven ~10m columns and eleven 10m rows, resulting in 77 ~10m x 10m zones (EPV-77). In both models, the columns closest to the touchlines were 1m narrower than all other columns. This transformation was necessary as Opta only provides locations to the nearest metre, so splitting the 68m pitch width equally into 4.85m or 9.7m columns would provide no additional detail and would be more difficult to understand in practice.

In preparation for use within the EPV models, match event data were split into attacking sets. An attacking set was coded as a sequence of plays which began when a team obtained possession of the ball and ended when the team lost possession of the ball (i.e. due to an error, handover, field kick, penalty, drop goal or try). The 59,233 plays used in this study were therefore grouped into 10,156 attacking sets (median length 4 plays per attacking set, range 1–26 plays).
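The fixed-grid discretisation described above can be sketched as a small helper that maps a play's start location to a (column, row) zone index. This is an illustrative reconstruction rather than the authors' code: the function name and the convention that the two touchline columns absorb the missing metre on each side (4m / 9m edge columns) are assumptions based on the description above.

```python
def zone_index(x, y, cell=5.0):
    """Map a play's start location (metres) to a (col, row) zone.

    cell=5 gives the 14 x 22 grid of EPV-308; cell=10 gives the
    7 x 11 grid of EPV-77. Assumes the opposition try area has been
    removed, so 0 <= x < 68 and 0 <= y < 110, and that the columns
    nearest each touchline are 1 m narrower than the interior columns.
    """
    edge = cell - 1.0                       # 4 m or 9 m edge columns
    interior = 68.0 - 2 * edge              # width covered by full columns
    if x < edge:
        col = 0                             # near touchline column
    elif x >= 68.0 - edge:
        col = int(interior // cell) + 1     # far touchline column
    else:
        col = 1 + int((x - edge) // cell)   # interior columns
    row = min(int(y // cell), int(110 // cell) - 1)
    return col, row
```

With cell=5 the 68m width resolves to columns 0-13 and rows 0-21; with cell=10, columns 0-6 and rows 0-10.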
Table 1 lists and defines the events that could end an attacking set. For every attacking set, the location at the beginning of each play (as a zone) was used. Therefore, each attacking set consisted of a sequence of zones equal to the number of plays in the possession. To enable the EPV models to calculate zone values, each zone visited was assigned a reward based on the outcome of the play: converted try scored (+6); unconverted try scored (+4); penalty goal scored (+2); drop goal scored (+1); loss of possession or missed goal attempt (0). In plays where none of these events occurred, a reward of 0 was assigned. Each time step within the sequence was assigned a reward, so it was possible for a zone to receive multiple rewards if more than one play began in a given location within the same attacking set.
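The reward coding above can be sketched as follows. The outcome labels are illustrative placeholders, not Opta's actual event codes, and the episode layout (one (zone, reward) pair per play) is an assumed representation for later value estimation.

```python
# Reward scheme from the paper: only point-scoring outcomes earn a
# non-zero reward; any play that merely continues the set earns 0.
REWARDS = {
    "converted_try": 6,
    "unconverted_try": 4,
    "penalty_goal": 2,
    "drop_goal": 1,
}

def build_episode(plays):
    """Turn one attacking set into a (zone, reward) sequence.

    `plays` is a list of (zone, outcome) tuples, where outcome is one
    of the REWARDS keys or None for a play that continues the set.
    """
    return [(zone, REWARDS.get(outcome, 0)) for zone, outcome in plays]
```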
Table 1

List of events that could end an attacking set.

Event            Description
Handover         A completed sixth tackle by the opposition team
Kick at goal     Conversion, penalty goal or drop goal attempt
Foul             Any foul resulting in the opposition team receiving the ball (e.g. conceding a penalty)
Misplaced pass   A pass or tap down which is intercepted by the opposition
Misplaced kick   Any kick not caught by the team in possession, including bombs/grubber kicks and positional kicks
Handling error   Any situation where the ball is lost from the player's possession or dropped (e.g. lost in contact, dropped catch)

NB: Any situation where a pass/kick missed its target player, but was not successfully collected by the opposition resulted in the continuation of the attacking set for the attacking team.

Calculation of EPV-308 and EPV-77 fixed zone size values

To evaluate the EPV for each zone on the pitch within the fixed zone size models, attacking sets were considered as Markov Chains, whereby the location of the ball on the pitch (i.e. its zone) at a given time (i.e. play) within the possession was represented as an event. The value for each zone s, at a given time within the possession t, was defined as:

$$V(s,t) = \mathbb{E}\left[\sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}\right] \quad (1)$$

where V(s, t) is the estimated reward obtained from the zone s at the time t; R_u = R_{t+k+1} is the reward obtained at the time u, which is determined by the end of play u-1 (e.g. if a converted try is scored at play u-1, R_u = 6); γ is the discount factor; and k is a play within the attacking set. Subsequently, the overall return of any zone s after play t across the sample of attacking sets was calculated as:

$$G(s,t) = \sum_{j \in A_{s,t}} \gamma^{\tau_j - t} R_j \quad (2)$$

where G(s, t) refers to the overall return for zone s after play t across the sample of attacking sets; A_{s,t} is the set of attacking sets where the ball is at location s in play t; τ_j is the play number within an attacking set j; and R_j is the reward for the attacking set j. Finally, the expected possession value (EPV) of each zone s after time t was simulated using the Monte Carlo every visit algorithm. The Monte Carlo every visit algorithm calculates the empirical mean of each zone by summing the discounted rewards accumulated by the zone and dividing them by the total number of visits. The algorithm allows every visit to a zone to be valued, which is important within rugby league as there is no guarantee that a play will be able to move between states, as the opposition defence aims to stop progression up the pitch. It is calculated as follows:

$$EPV(s,t) = \frac{G(s,t)}{|A_{s,t}|} \quad (3)$$

where |A_{s,t}| is the total number of visits to zone s at play t.
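A minimal sketch of the every-visit Monte Carlo estimate described above, assuming each attacking set is represented as a sequence of (zone, reward) pairs in which the reward is the one received at the end of the play starting in that zone. The value of the discount factor is an assumption for illustration; the section above does not report the γ used.

```python
from collections import defaultdict

def every_visit_mc(attacking_sets, gamma=0.9):
    """Every-visit Monte Carlo estimate of zone values.

    attacking_sets: list of episodes, each a list of (zone, reward)
    pairs where the reward follows the play that starts in that zone.
    Returns the empirical mean discounted return per zone, i.e. the
    sum of returns over every visit divided by the visit count.
    """
    returns = defaultdict(float)
    visits = defaultdict(int)
    for episode in attacking_sets:
        g = 0.0
        # Walk backwards so each visit's discounted return is O(1):
        # G_t = R_{t+1} + gamma * G_{t+1}
        for zone, reward in reversed(episode):
            g = reward + gamma * g
            returns[zone] += g
            visits[zone] += 1
    return {z: returns[z] / visits[z] for z in returns}
```

Because every visit is counted, a zone entered twice within one attacking set contributes two returns, matching the description above.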

Calculation of aggregated zone values (EPV-37, EPV-19, EPV-13, EPV-9)

To calculate the aggregated set of zones for EPV-37, EPV-19, EPV-13 and EPV-9, the zones from EPV-308 were grouped together or split based upon differences in their match EPV. The match EPV (Gm(s,t)) for zone s in match m was calculated using Eq 2. However, rather than considering the zones’ match EPV individually, they were summed at a column level to provide the column match EPV, and at a row level to provide the row match EPV. This allowed the influence of, for example, starting a play in a wide location to be evaluated globally across the pitch. Visual inspection of the initial column and row values showed that they could be smoothed based on their spatial similarity. The fourteen columns were averaged at the 5m level symmetrically to form seven ~10m columns, whereas the twenty-two 5m rows were aggregated to eleven 10m rows, similar to Kempton et al. [3]. Following this initial aggregation, linear mixed models were used to evaluate whether the columns or rows could be further combined. In separate models, the column match EPV and row match EPV were added as dependent variables, with team and fixture ID added as random effects. To evaluate for differences in their match EPVs, column and row indexes were added to their respective models as categorical fixed effects. Minimal effects testing [12] was used in four separate models to determine whether two columns or rows could be combined against a smallest effect size of interest (SESOI) of 0.5, 1.0, 1.5 and 2.0 units of match EPV respectively. If the difference between two columns or rows was statistically significant (i.e. P < 0.05), they remained separate; otherwise, their column/row match EPVs were averaged and compared to the next column or row’s match EPV. This iterative process was conducted independently for the columns and rows. The columns and rows were then combined to form a grid. All zones were aggregated at a row level between -10m and 10m, rather than splitting them into columns.
This decision was made because the zones within the row were visited infrequently relative to the other zones and so had highly variable zone values. All statistical values obtained within this process are provided in S1 File. At SESOI 0.5, 7 rows and 6 columns were present, resulting in 37 zones for the EPV-37. At SESOI 1.0, 4 rows and 6 columns were present, resulting in 19 zones for the EPV-19. At SESOI 1.5, 4 rows and 4 columns were present, resulting in 13 zones for the EPV-13. At SESOI 2.0, 3 rows and 4 columns were present, resulting in 9 zones for the EPV-9. The aggregated zone values were calculated as a weighted average of the values of the EPV-308 zones they were composed of. Fig 1 highlights this process by depicting the similarities and differences between the EPV-308, EPV-77 and EPV-19 in the 30m closest to the opposition try line.
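The iterative column/row aggregation can be illustrated with a greatly simplified greedy merge. The paper decides merges with minimal-effects tests in linear mixed models; the plain mean-difference rule below is a stand-in for that statistical test, shown only to make the iteration explicit.

```python
def merge_adjacent(values, sesoi):
    """Greedy sketch of the column/row aggregation.

    Walks along the column (or row) match-EPV means and merges the
    next value into the current group when it differs from the group
    mean by less than the SESOI; otherwise a new group is started.
    This substitutes a raw mean-difference threshold for the paper's
    minimal-effects significance test.
    """
    groups = [[values[0]]]
    for v in values[1:]:
        current_mean = sum(groups[-1]) / len(groups[-1])
        if abs(v - current_mean) < sesoi:
            groups[-1].append(v)   # combine with the current group
        else:
            groups.append([v])     # value is distinct; start new group
    return groups
```

Running this with larger SESOI values produces fewer, coarser groups, mirroring the progression from EPV-37 down to EPV-9.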
Fig 1

Depiction of similarities and differences between EPV-308 (A), EPV-77 (B) and EPV-19 (C) in the 30m closest to the opposition try line. Each zone from the EPV-77 and EPV-19 is a weighted average of the EPV-308 zones they are composed of. Dotted line represents the opposition 20m line.

Evaluating the reproducibility of match EPV between fixtures

To evaluate the reproducibility of match EPV between fixtures, the individual zones’ match EPV was compared between fixtures at a team level. The previous 1–10 fixtures were compared against the subsequent fixture. Every possible fixture within the 2019 Super League season was evaluated, resulting in 28 comparisons per team when one previous fixture was considered, through to 19 comparisons per team when ten previous fixtures were considered. The Kullback-Leibler (KL) Divergence [13] was used to calculate the similarity between the reward distribution in the subsequent match i and the reward distribution in the k previous match(es). The reward distribution for zone s in match m_i (P_{Gm_i}(s)) was calculated via the equation:

$$P_{G_{m_i}}(s) = \frac{G_{m_i}(s)}{\sum_{s' \in S} G_{m_i}(s')} \quad (4)$$

The reward distribution for zone s in the k matches prior to match i was calculated as:

$$P_{G_M}(s) = \frac{\sum_{m \in M} G_m(s)}{\sum_{s' \in S} \sum_{m \in M} G_m(s')} \quad (5)$$

where P_{G_M}(s) refers to the reward distribution obtained by zone s across the M fixtures, G_m(s) refers to the match EPV obtained by zone s in match m and S refers to the set of all zones within the EPV model. The KL Divergence is calculated according to the equation:

$$D_{KL}(P \,\|\, Q) = \sum_{s \in S} P(s) \log\left(\frac{P(s)}{Q(s)}\right) \quad (6)$$

where P is the true reward distribution, Q is the approximating reward distribution and S refers to the set of all states within the model. The subsequent match’s reward distribution (P_{Gm_i}(S)) was used as the true reward distribution, with the previous matches’ reward distribution (P_{G_M}(S)) as the approximating distribution. The KL Divergence is a measure used in information theory and provides an understanding of the similarity between two distributions of values. It is an unbounded measure: a value of 0 indicates two distributions are perfectly matched, whereas a value of infinity indicates that there is no relationship between the two distributions. A value of infinity typically occurs when a zone within the approximating distribution (P_{G_M}(S)) has no value (i.e. it has not been visited), but has been visited in the true distribution (P_{Gm_i}(S)).
The percentage of non-infinity values was used to provide an understanding of how many of the subsequent match’s zones were completely visited in the previous matches. The KL Divergence value was used as a measure of similarity between the two reward distributions’ values. All results are provided as mean ± standard deviation across the twelve Super League clubs.
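The KL Divergence between two reward distributions can be sketched as below, including the handling of unvisited zones described above: a zone with value in the true distribution but none in the approximating distribution yields infinity. The list-based layout of the distributions is an assumption for illustration.

```python
import math

def kl_divergence(p, q):
    """KL Divergence D(P || Q) over aligned zone reward distributions.

    p: true distribution (the subsequent match), q: approximating
    distribution (the previous matches). Returns float('inf') when a
    zone visited under P was never visited under Q.
    """
    d = 0.0
    for ps, qs in zip(p, q):
        if ps == 0:
            continue               # 0 * log(0/q) -> 0 by convention
        if qs == 0:
            return float("inf")    # zone unvisited in previous matches
        d += ps * math.log(ps / qs)
    return d
```

Identical distributions give 0, and the value grows as the previous matches' reward distribution diverges from the subsequent match's.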

Quantifying teams’ attacking performances using EPV-19

The attacking performances of individual teams were quantified using z-score analysis. Each team’s reward distribution across the 2019 Super League season was calculated for the EPV-19 via Eq 5. Z-score analysis of the reward distributions was used to calculate a standardised value evaluating how the proportion of match EPV a team obtained from a zone compared with the average across all teams in the Super League. Values of +1 and +2 z-scores were chosen to represent a greater and much greater proportion of match EPV generated by a zone relative to the average team; values of -1 and -2 were used to represent a lower and much lower proportion of match EPV generated. All analyses were conducted using bespoke Python scripts (Python 3.7, Python Software Foundation, Delaware, USA) or via Proc Mixed (SAS University Edition, SAS Institute, Cary, NC).
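The z-score analysis can be sketched as follows, assuming each team's season reward distribution is stored as a list of zone proportions keyed by team; the dictionary layout and function name are assumptions for illustration, and the population standard deviation is used across the twelve clubs.

```python
def zone_zscores(team_dists):
    """Standardise each zone's proportion of match EPV per team.

    team_dists: {team: [p_zone1, ..., p_zoneN]} season reward
    distributions. For each zone, a team's proportion is expressed as
    a z-score relative to the mean and (population) standard deviation
    across all teams, so +1/-1 mark a greater/lower proportion than
    the average team.
    """
    teams = list(team_dists)
    n_zones = len(team_dists[teams[0]])
    z = {t: [] for t in teams}
    for j in range(n_zones):
        col = [team_dists[t][j] for t in teams]
        mean = sum(col) / len(col)
        sd = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        for t in teams:
            z[t].append(0.0 if sd == 0 else (team_dists[t][j] - mean) / sd)
    return z
```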

Results

Calculation of EPV models

Fig 2 illustrates the zone values for all six EPV models (EPV-308, EPV-77, EPV-37, EPV-19, EPV-13 and EPV-9). There is a general trend that the closer the zone is to the opposition try line, the more valuable it is. Similarly, central zones are more valuable than wider zones as indicated by the darker colours in these areas. As the number of zones decreases, there is a smoothing effect, whereby the values of adjacent zones move closer together (indicated by the reduced light-dark contrast between them).
Fig 2

EPV-308 (A), EPV-77 (B), EPV-37 (C), EPV-19 (D), EPV-13 (E) and EPV-9 (F). Lines represent the try line, 20m line and 50m line. All state values are shaded to the same scale.

It is noticeable in all models that the majority of variation in values between zones is present within 30m of the opponent’s try line. There are considerable differences in how each of the aggregated models handles this variation though. EPV-37 has a separate row for each 10m block, EPV-19 and EPV-13 split it into 20-30m from the try line and 0-20m from the try line, whereas EPV-9 only considers the 0-20m area to be different from the rest of the pitch.

Reproducibility of match EPV between fixtures

Table 2 shows the percentage of non-infinity values for all six models after 1–10 previous matches (i.e. the percentage of fixtures where all of the subsequent match’s zones had been visited in the previous matches). For EPV-308, there were only three occasions where this was not 0% (8, 9 and 10 previous matches). There was a consistent increase in the percentage of non-infinity values as the number of previous fixtures increased for EPV-77 and EPV-37, peaking at 77 ± 8% and 97 ± 4% respectively after 10 previous fixtures. For EPV-19, there was a large increase in the percentage of non-infinity values before 6 previous fixtures were considered, after which limited change was observed (95–98% from 6 to 10 fixtures). A similar trend was present for EPV-13 before 3 previous fixtures were considered (96–100% from 3 to 10 fixtures). In EPV-9, 100% of values were non-infinity after only 3 previous fixtures.
Table 2

Percentage of non-infinity zones for each model, providing the percentage of matches where the complete set of the subsequent match’s zones were visited in the previous n matches.

Values are mean (standard deviation) percentage (%) across all clubs.

Number of previous matches

Model      1        2        3        4        5        6        7        8        9        10
EPV-308    0 (0)    0 (0)    0 (0)    0 (0)    0 (0)    0 (0)    0 (0)    2 (2)    4 (4)    5 (5)
EPV-77     0 (0)    2 (2)    13 (6)   26 (7)   40 (9)   51 (12)  63 (9)   69 (8)   72 (8)   77 (8)
EPV-37     1 (2)    19 (7)   48 (8)   67 (11)  77 (8)   85 (9)   89 (9)   92 (6)   95 (5)   97 (4)
EPV-19     23 (8)   59 (6)   76 (6)   88 (8)   91 (5)   95 (4)   96 (3)   96 (3)   98 (3)   98 (3)
EPV-13     61 (9)   89 (3)   96 (4)   98 (4)   99 (2)   100 (0)  100 (0)  100 (0)  100 (0)  100 (0)
EPV-9      86 (5)   99 (2)   100 (0)  100 (0)  100 (0)  100 (0)  100 (0)  100 (0)  100 (0)  100 (0)

Fig 3 shows the KL Divergence for EPV-77, EPV-37, EPV-19, EPV-13 and EPV-9. After 8 (KL Divergence = 1.50 ± 0.19), 9 (KL Divergence = 1.41 ± 0.15) and 10 (KL Divergence = 1.44 ± 0.15) previous matches, the KL Divergence for EPV-308 was greater than any other model. The KL Divergence reduced as more previous matches were considered in all EPV models in Fig 3. For EPV-37, EPV-19, EPV-13 and EPV-9, the majority of this reduction occurred between 1 and 3 previous matches before the values stabilised. For EPV-77, the values stabilised after six previous matches.
Fig 3

KL Divergence values for EPV-77, EPV-37, EPV-19, EPV-13 and EPV-9.

A lower value indicates greater similarity in reward distributions between previous matches and the subsequent match. EPV-308 not included as values could not be calculated for the first seven matches due to no non-infinity values being present. Line provides mean value, shaded area indicates standard deviation.

Fig 4 provides a numbered zone breakdown for the EPV-19. Fig 5 depicts the z-score analysis of each team’s attacking performances across the 2019 Super League season using the EPV-19 model. Relative to the average Super League team, team 4 gained a greater proportion of match EPV from wider areas 10-70m from their try line. Conversely, teams 6 and 8 gained a greater proportion of match EPV attacking centrally (zones 5–7 for team 6, zones 5–6 for team 8). Within 20m of the opposition try line, team 9 gained a lower proportion of match EPV from the widest zone (zone 14), but the spread of their match EPV over the more central areas was much more even than other teams.
Fig 4

EPV-19 zones numbered so they can be distinguished from each other.

Where numbers are repeated, both sides of the pitch make up the same zone (e.g. zone 2 is comprised of the widest ~5m on both sides of pitch, between 10m and 70m from the team’s own try line).

Fig 5

Z-score analysis of teams’ attacking performances in 2019 Super League season.

Numbers 1–19 reflect the zone numbers in Fig 4. A greater value indicates a greater proportion of EPV was obtained from the zone than the average Super League 2019 team. Distances are measured from the team’s own try line.


Discussion

The aims of this study were to i) produce six EPV models (two with fixed zone sizes of ~5m x 5m and ~10m x 10m [3], and four with aggregated zones based on differences in the zones’ match EPV of 0.5, 1.0, 1.5 and 2.0 points per match) analysing attacking performance in rugby league, ii) compare the reproducibility of match EPV between fixtures for the EPV models, and iii) quantify individual teams’ attacking performances across a season using an EPV model. Six EPV models were produced: EPV-308, EPV-77, EPV-37, EPV-19, EPV-13 and EPV-9. The results show that the attacking performances of previous matches, as assessed by the match EPV, were more reproducible in subsequent matches as the number of zones in the model decreased and the number of previous matches used increased. However, as reproducibility increased, the homogeneity of the zone values also increased. The results also showed that z-scores of the reward distribution could be used to identify zones through which teams obtain a greater or lower proportion of their match EPV relative to the average Super League team.

Generation of EPV models

By generating six EPV models, it is possible to compare the value that each model estimates to be generated by possessing the ball in any location on a rugby league pitch. In all six EPV models, zones were more valuable the closer they were to the opposition try line and the more central they were, which aligns with the findings of Kempton et al. [3]. Additionally, in all six models much greater value is generated within 20-30m of the opposition try line, compared with >30m. This finding is similar to previous research within football, which shows that the chance of scoring is significantly reduced to below 7% when shots are taken from outside the 18 yard box [14]. The identification of these zones of value in all six models provides a new method through which individual possessions can be valued. Furthermore, they provide a valuable methodology through which the zones visited in tactical set plays could be measured to establish which play may be most advantageous against a given team. However, it should be noted that within Kempton and colleagues’ [3] model, the increase in value was much more gradual from the team in possession’s 20m line through to the opposition try line than in the models produced here. Whether this is due to the methodological differences (e.g. the different definition of possessions) or a difference in playing style between National Rugby League and Super League teams is unclear.

Reproducibility of attacking performances

A key element of any model evaluating attacking performances is to identify how well it relates to performance in future fixtures. Despite this, few studies have attempted to evaluate this component of their models [15]. This study was the first to evaluate the reproducibility of the EPV models between fixtures via the KL Divergence. The results showed that although the EPV-308, EPV-77 and EPV-37 provide considerably more variability than the EPV-19, EPV-13 or EPV-9 with regard to the values of different zones, they had poor reproducibility between fixtures. This was noticeable in both the percentage of subsequent match zones visited and the similarity in reward distributions between the previous and subsequent matches. The EPV-308, EPV-77 and EPV-37 therefore have limited application in practice.

By contrast, the EPV-19, EPV-13 and EPV-9 all showed excellent reproducibility between fixtures. When six previous matches were considered, all zones visited in the subsequent match had also been visited in the previous matches on 95–100% of occasions for these three models. Furthermore, the reward distributions had low KL Divergence values, indicating that the proportion of points obtained from each zone was also very similar to the subsequent match. Six matches is a relatively small number of matches to consider given the excellent reproducibility shown, suggesting that any of the EPV-19, EPV-13 or EPV-9 models could be used to evaluate team attacking performance in rugby league. However, it is the usefulness of the zones generated that should define which model is used in practice. The EPV-19 and EPV-13 both contain four rows (-10 to 10m, 10-70m, 70-80m, 80-100m), whereas the EPV-9 only contains three rows (-10m to 10m, 10-80m, 80-100m). As five of the six models produced suggest that the value of zones in the 70-80m row can be differentiated from those around it, it is possible that the EPV-9 has oversmoothed the data, reducing its usefulness in practice.
The EPV-19 and EPV-13 models only differ in the manner in which they group the columns along the x-axis. The EPV-19 has more columns (6), separating out the widest and second most central areas of the EPV-13. This results in the EPV-13 having a smoother progression of zone values from wide to central. However, it also results in the value of the zones just outside the posts being much smaller relative to the EPV-19 and EPV-37. Given that the value of the central zones is of utmost importance for the conversion of tries, the EPV-19 may be considered the more useful set of zones, but either model could be used in practice.

This study used the EPV-19 model to quantify the attacking performances of individual teams within the 2019 Super League season. Using z-score analysis of the match EPV, it is clear that team 4 generates a greater proportion of match EPV than the average team from different zones to team 6 when 10m-70m from its own try line (Fig 5). Team 4 gains greater match EPV from wide areas (zones 3 and 4), whereas team 6 gains more match EPV centrally (zones 5–7) relative to other zones. The identification of these zones pre-match could assist teams in their tactical preparations. Furthermore, the figure shows those teams who spread their attack more evenly. For example, from 80-100m, team 9 obtained a small proportion of its match EPV from the widest zone (14) compared with other Super League teams, but generated similar proportions of match EPV across the rest of the zones. It is possible that this ability to generate value close to the average team across the majority of the pitch made the team difficult to defend against, and could explain why they were one of the top points scorers across the season.
The use of this z-score analysis has strong potential as a method through which the areas on a pitch where an opposition team may attack can be highlighted quickly and efficiently, regardless of the EPV model used, enabling tactical preparations for future matches to be tailored to the opposition.

Limitations

The EPV-19 provides an excellent starting point for the time-efficient and easily interpretable tactical analysis of opposition teams in rugby league. However, it is subject to several limitations. The first is the use of only the start location of each play. Although no further information is currently available given the limitations of the data used, the model could be improved if specific actions (e.g. passes, kicks or tackles) and their locations were included, alongside the locations of all players on the pitch. In soccer, for example, every action is location-coded by several providers, so a more complete model could be developed in that domain using a similar event-level-data-only process. A second limitation is that the model does not establish whether being aware of, or attempting to stop, an opposition team from visiting its highest-valued zones during attacking sets is detrimental to that team's ability to win rugby league matches. Future studies could address this by building on our framework and evaluating whether there is a difference in the zones visited when a team wins or loses. Similarly, future studies may wish to identify specific sequences of play, as has recently been done in rugby union [16], as these sequences may also assist tactical preparations for future matches. Finally, our study does not attempt to directly predict future attacking trends. The authors do not consider this a limitation given the variability inherent in predicting single matches, but it should be highlighted. Because the KL Divergence treats the subsequent match as the true distribution, any zone with a value of 0 in the subsequent match contributes nothing to the divergence, irrespective of its value in the approximating distribution.
Consequently, teams could spend time preparing for the opposition to use a zone that featured in previous matches but is not visited in the match in question. Nevertheless, using the EPV-19 model, the team would be prepared for the vast majority of the zones the opponent does visit during attacking sets, based on the previous six matches' performances.
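For reference, the reproducibility measure discussed above can be sketched as follows. This is a minimal illustration, assuming (as stated in the text) that the subsequent match supplies the true distribution P and the aggregated previous matches the approximating distribution Q; the function name and array handling are ours, not the paper's.

```python
import numpy as np

def kl_divergence(p, q):
    """KL Divergence D(P || Q) between two discrete reward distributions.

    p: per-zone reward proportions from the subsequent match (the true
       distribution P).
    q: per-zone reward proportions aggregated over previous matches (the
       approximating distribution Q).

    Zones with p == 0 contribute nothing to the sum, so zones unused in the
    subsequent match are ignored even if the previous matches valued them.
    A zone with p > 0 but q == 0 yields infinity: the previous matches never
    visited a zone that the subsequent match used.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise to probability distributions
    q = q / q.sum()
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

Identical distributions give 0; a zone used in the subsequent match but never in the previous ones gives infinity (the EPV-308 case reported in the abstract); and, as noted in the limitation above, a zone valued in previous matches but unused in the subsequent match is simply ignored.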

Conclusions

In conclusion, this study produced six EPV models that could be used to analyse a team's attacking performance in rugby league. The EPV-19 and EPV-13 both provide a useful understanding of attacking performances, reproducible in subsequent fixtures when six previous matches are evaluated. Furthermore, z-score analysis comparing the proportion of match EPV generated by each zone relative to other teams within the league highlights the zones from which a team gains more value and may provide a method through which the tactical preparation of rugby league teams could be enhanced.

Decision letter (10 Aug 2021)

PONE-D-21-20054: Development of an expected possession value model to analyse team attacking performances in rugby league. PLOS ONE.

Dear Dr. Sawczuk, thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. We therefore invite you to submit a revised version of the manuscript that addresses the points raised during the review process. The reviewers have some concerns about the rigour of the research, which will require particular attention. Please submit your revised manuscript by Sep 24 2021 11:59PM; if you need more time, please contact the journal office at plosone@plos.org. Please include the following items when submitting your revised manuscript: a rebuttal letter that responds to each point raised by the academic editor and reviewer(s), uploaded as a separate file labeled 'Response to Reviewers'; a marked-up copy of your manuscript that highlights changes made to the original version.
You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'; and an unmarked version of your revised paper without tracked changes, labeled 'Manuscript'. We look forward to receiving your revised manuscript. Kind regards, Caroline Sunderland, Academic Editor, PLOS ONE.

Journal requirements: 1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. 2. Please provide additional details regarding participant consent: specify whether consent was informed and what type was obtained (written or verbal); if the need for consent was waived by the ethics committee, please include this information. 3. In your Data Availability statement, you have not specified where the minimal data set underlying the results described in your manuscript can be found. All PLOS journals require that the minimal data set be made fully available, either as Supporting Information files or in a stable public repository. If there are ethical or legal restrictions to sharing your data publicly, please explain these restrictions in detail; note that it is not acceptable for the authors to be the sole named individuals responsible for ensuring data access.

Reviewers' comments

1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions.
Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: No. Reviewer #2: Yes.

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: No. Reviewer #2: Yes.

3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception. Reviewer #1: No. Reviewer #2: No.

4. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: Yes. Reviewer #2: Yes.

5. Review Comments to the Author

Reviewer #1: This study investigated the validity of the evaluation method of team attacking performances in rugby league via expected possession value (EPV) models.
The authors examined three EPV models (two with fixed zone sizes of ~5m x 5m and ~10m x 10m [Kempton et al. 2016, JSS], and one with zones aggregated according to the zones' match EPV) to analyse attacking performance in rugby league. They identified the EPV model which provides the greatest reproducibility of match EPV between fixtures, and quantified individual teams' attacking performances across a season using an EPV model. The motivations in this paper were clear and there were some contributions (investigating the validity of EPV with 59,233 plays in 180 Super League matches), although the novelty was not especially high (this seems to be no problem for this journal). My major concern is that only one spatiotemporal resolution (EPV-19) is proposed and the results cannot conclude that EPV-19 is the best; at least resolutions below 77 and below 19 may be required to conclude this. Furthermore, the interpretation of KL divergence may be wrong and the related parts should be revised (for details, see comment #5). For these reasons (including the comments below), the manuscript may be acceptable to me if the authors can solve these problems.

Specific comments

1. In the introduction, other EPV studies (e.g., soccer and deep learning approaches) should be referred to. For example: [1] J Fernández, L Bornn, D Cervone, Decomposing the Immeasurable Sport: A deep learning expected possession value framework for soccer, MIT SSAC, 2019; [2] J Fernández, L Bornn, D Cervone, A framework for the fine-grained evaluation of the instantaneous expected value of soccer possessions, Machine Learning, 2021.

2. In the introduction, other rugby studies or methods related to movements and scores should be referred to. For example, see the related work in: Rory Bunker, Keisuke Fujii, Hiroyuki Hanada, Ichiro Takeuchi, Supervised sequential pattern mining of event sequences in sport to identify important patterns of play: an application to rugby union, 2020.

3. Above Eq. (3): I am not familiar with the every-visit Monte Carlo algorithm. Please summarise the algorithm and the reason why the authors selected it.

4. P7: the authors used linear mixed models to evaluate whether the columns or rows could be combined. However, no related statistical values were given. I wonder how the areas of EPV-19 were determined based on such an approach. A detailed and clear description of the determination may be required for reproducibility.

5. P9: KL divergence is an asymmetric measure between probability distributions P and Q. In this case, what are P and Q? These should be specified along with the definition of KL divergence (i.e., the equation). By the usual definition, when Q(x) = 0, the KL divergence will be infinite. This does not always mean that P (or Q) is reproducible by Q (or P). In summary, I consider that the authors may misunderstand the interpretation of KL divergence with respect to reproducibility (it measures the dissimilarity between two probability distributions). This point should be revised throughout the manuscript.

6. Results and Discussion: although the analysis using KL divergence may be useful in this study, similarity of the reward distribution between previous and current matches may not in itself extract meaningful information. For example, a coarser resolution (e.g., EPV-5) may yield distributions more similar to those of previous matches, but will provide more general and less meaningful insight. EPV-19 may be an appropriate resolution for obtaining meaningful insight (e.g., Fig 4), but there were no related results and discussion. Such an analysis would enhance the validity of the authors' claims.

Reviewer #2: The authors aim to use expected possession value (EPV) models to assess attacking performance in rugby league. I agree that the event-level and spatial data used in this paper has good value and presents interesting opportunities for AI/statistical methods to improve match analysis in rugby.
Similar datasets have led to some very successful and interesting papers at top AI venues when applied to football/soccer. It would be nice to see a more in-depth introduction and literature review of work in this space and of valuing actions in sports data. This would help make the work more accessible to a wider audience who may not be as familiar with EPV models or rugby league. A comparison of the differences between this work and that in [5] for football would also be of interest.

I think the zone selection criteria the authors have used make sense. However, it would have been nice to have seen a visual representation of these zones in the paper, and potentially an experiment comparing the different approaches to zone selection, the events that occur in those zones, and how this would affect the model performances. In the data processing it would be good to see a full list of events that can end a sequence, and it would then be interesting to explore the impact of negative reward weights for loss of possession at the end of a sequence versus other zero-reward events (penalties, scrums, kicks, etc.).

Looking at the results comparing the EPV models in Figure 2, it seems that EPV-308 (A) may be using zones that are too granular and therefore does not have enough data to fully learn the value of some zones (e.g., a darker zone behind the team's posts). EPV-77 (B) shows better findings with larger zones, and EPV-19 (C) shows the most useful results, although I would expect most coaches to already be aware of the findings. Nonetheless, I think the charts shown in Figure 2 would be extremely valuable for a defensive coach when planning against an opposition, as he/she would be able to understand where attacks are most dangerous. There could be more detail around each of the models; the work in [5], showing an EPV model for football, goes into much more detail around a single EPV model, and I think this paper could benefit from more detail throughout.

I think it is important for the authors and reader to recognise the limitations of their work due to the data, etc. Finally, I think greater discussion of future work and the real-world impact of the study could be beneficial. It would be good if eventually there were a free online source of the data used for a sample of games in rugby league, as this would help with the reproducibility and improvement of the models. Overall, I think the flow of the paper could be improved to help with readability, but the work describes an interesting and sound evaluation of EPV models in rugby league. The paper does report original research and, with a few tweaks, it will make a valid contribution to the base of academic knowledge in sports analytics.

Author response (24 Sep 2021)

On question 3 (data availability), the authors responded: The data used for this study was acquired from a third party, formerly Opta Sports, now Stats Perform. It is available from www.optaprorugby.com. The data was provided under a license agreement with Opta Sports/Stats Perform, and the data is subject to an approved research ethics application from our University.
The terms of our license agreement prevent us from sharing the raw data we used for this analysis. Our ethical approval also prevents us from sharing any data in a way that could be re-identified: the metadata and (fixture/location/action) data itself would allow someone to re-identify fixtures, teams and/or players, breaching the ethics approval given. However, it should be possible to obtain access to the data by contacting Stats Perform (www.statsperform.com/contact/). The authors had no special access privileges and all data is taken from the 2019 season of the Super League. The corresponding author is happy to liaise with any researchers who have queries about how to obtain the data.

Responses to Reviewer #1

Comment 1 (other EPV studies, e.g. the soccer deep learning approaches of Fernández et al., should be referred to in the introduction): Thank you for this comment, we have now added references to these methods in the introduction (INTRODUCTION: PARAGRAPH 2).

Comment 2 (other rugby studies or methods relating movements and scores, e.g. Bunker et al. 2020, should be referred to in the introduction): Thank you for this comment.
The paper mentioned answers a slightly different question to ours, as we are looking at which zones are visited and attempting to find optimal zone sizes rather than at the sequences of actions themselves. However, we do see the relevance of the paper as an alternative method and have therefore made reference to it in the discussion. Indeed, one limitation of our method is that although we now know the value of different zones and where an opposition team is likely to attack from, we do not know the sequence of events through which this happens. This limitation has been added to the discussion, referencing the Bunker et al. study (DISCUSSION – LIMITATIONS).

Comment 3 (summarise the every-visit Monte Carlo algorithm above Eq. (3) and the reason for selecting it): Thank you for this comment, we have added the details requested (METHODS – CALCULATION OF EPV-308 AND EPV-77 FIXED ZONE SIZE VALUES: PARAGRAPH 3).

Comment 4 (no statistical values were reported for the linear mixed models used to decide whether columns or rows could be combined): Thank you for this comment. We have now included every statistical value obtained in the supplementary data for this paper. This information provides significantly more detail as to how the areas were determined (SUPPLEMENTARY DATA 1).

Comment 5 (specify the definition of the KL divergence, including what P and Q are, and revise its interpretation with respect to reproducibility): Thank you for this comment. In this study, we use the subsequent match as the true distribution (P) and the previous 1-10 matches as the approximating distribution (Q). We are only interested in whether Q can approximate P, as we only know Q when preparing for future matches. In line with your comments, we have extended our use of the KL Divergence to make greater use of its similarity interpretation. We still use the term reproducibility as a global term (this can be changed if there is a preference for a different term), but we now consider both the percentage of non-infinity zones and the average value of the KL Divergence across all teams. The percentage of non-infinity zones is useful as a measure of how many of the subsequent match's zones were visited in the previous matches, given that a zone not visited by P (the subsequent match) automatically contributes a value of 0. The KL Divergence value then provides an understanding of how similar the reward distributions are between matches. To this end, as you note, we are looking at how similar the distributions are in terms of their values, rather than merely whether the zones have been visited. We believe our definition and usage now match the more accurate definition you provided. We introduce these elements in the methodology (METHODS – EVALUATING THE REPRODUCIBILITY OF MATCH EPV BETWEEN FIXTURES), provide the percentage of non-infinity values in TABLE 2 and the KL Divergence values in FIGURE 3, and evaluate the results within the discussion (DISCUSSION – REPRODUCIBILITY OF ATTACKING PERFORMANCES).

Comment 6:
Results and Discussion: although the KL divergence analysis may be useful, similarity of reward distributions between previous and current matches may not by itself yield meaningful information; a coarser resolution (e.g., EPV-5) may give more similar distributions but more general, less meaningful insight, and related results and discussion for EPV-19 were missing. Response: Thank you for this comment. We agree that a coarser resolution (e.g. EPV-5) would provide distributions more similar to previous matches, but poor insight in practice. We also agree that more analyses would benefit the study. In the original manuscript, we used a smallest effect size of interest of 1.0 to split zones; we have now added three further state spaces (EPV-37, EPV-13 and EPV-9), which use smallest effect sizes of interest of 0.5, 1.5 and 2.0 respectively. We feel this adds a significant amount to the article and thank you again for the suggestion.

Responses to Reviewer #2

Comment (a more in-depth introduction and literature review of work on valuing actions in sports data would make the work more accessible to a wider audience):
A comparison of the differences between this work and that in [5] for football would also be of interest. Response: Thank you for this comment, we have now added more detail to the introduction, referring to the probabilistic deep learning models used in football versus the stochastic Markovian models used in ice hockey, and explaining why rugby league is better suited to stochastic analyses (INTRODUCTION: PARAGRAPH 2).

Comment (a visual representation of the zones, and an experiment comparing zone selection approaches, would have been nice): Thank you for this comment; we are unclear what is meant by a 'visual representation of these zones'. We believe this may be provided by Figure 2, but have also provided a non-heatmapped version of each model in the supplementary data (SUPPLEMENTARY DATA 1). We agree that the events occurring in any zone could have a large impact on the results. Unfortunately, we are unable to provide the events that occur in specific zones due to limitations within the dataset (only the location of the first event of each play is provided, so we cannot guarantee the location accuracy of any event thereafter). However, we have added three further experiments regarding the mixed model approach to zone selection (using SESOIs of 0.5, 1.0, 1.5 and 2.0, versus only 1.0 before) to provide greater discussion around the models.

Comment (a full list of sequence-ending events, and exploration of negative reward weights, would be interesting): Thank you for this comment, we have now added a full list of events which could end an episode (TABLE 1). We considered negative reward weights for this study but were uncertain how to include them, given that our episodes began when a team obtained possession and ended when it lost possession. We believe negative events could only be included if the episode duration were extended (i.e. to the next points being scored), but this in turn creates problems as the model becomes more like a Markov Game. Such an approach is better suited to larger datasets and action data, neither of which was available for this study. We agree that the use of negative rewards is of great interest, though, and hope to identify an appropriate method of using them, or of accounting for negative actions, in future studies.

Comment (EPV-308 may be too granular, EPV-19 the most useful although coaches may already expect the findings; the Figure 2 charts would be valuable for defensive coaches): Thank you, we agree with all these points. You are correct that most coaches may expect some of the findings, but visualising them simply and quantitatively could provide quick insight, which may help in preparing for future matches.

Comment (more detail around each model, and recognition of the work's data limitations):
Finally, I think greater discussion of future work and the real-world impact of the study could be beneficial.

Thank you for this comment. We have added significant detail on the development of the zones for each of the models in the supplementary material, which we hope helps the reader to better understand how we generated each model (SUPPLEMENTARY DATA 1). We have also added further information regarding limitations (data, lack of predictive work within this study), future work (attempts to improve data collection techniques and provide more meaningful models including action/player location data) and real-world impact (use of EPV-19 to evaluate the attacking performances of opposition teams) throughout the DISCUSSION.

It would be good if eventually there were a free online source of the data used for a sample number of games in rugby league, as this would help with reproducibility and improvement of the models.

Thank you, we agree. Hopefully, given the increasing amounts of free data available in other sports (soccer in particular), sports like rugby (league and union) will follow suit and release data so that models can be improved. Unfortunately, although the data used for this study is accessible, it is behind a paywall, as described in the data availability statement.

Overall, I think the flow of the paper could be improved to help with readability, but the work describes an interesting and sound evaluation of EPV models in rugby league. The paper reports original research and, with a few tweaks, will make a valid contribution to the base of academic knowledge in sports analytics.

Thank you.

Submitted filename: Reviewer response.docx

13 Oct 2021

PONE-D-21-20054R1
Development of an expected possession value model to analyse team attacking performances in rugby league
PLOS ONE

Dear Dr. Sawczuk,

Thank you for submitting your manuscript to PLOS ONE.
After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please address the minor comments, and then the paper will be ready for acceptance.

Please submit your revised manuscript by Nov 27 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols.
Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Caroline Sunderland
Academic Editor
PLOS ONE

Journal Requirements:

Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article's retracted status in the References list and also include a citation and full reference for the retraction notice.

Additional Editor Comments (if provided):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the "Comments to the Author" section, enter your conflict of interest statement in the "Confidential to Editor" section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3.
Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters.)

Reviewer #1: Thanks for your response and revising the manuscript. I consider that the manuscript is ready for publication if the authors revise the following points.

1. L167: EPV-37, EPV-19, EPV-13 and EPV-9, EPV-308 zones -> EPV-37, EPV-19, EPV-13, EPV-9, and EPV-308?

2. The KL divergence formula should be clarified (P and Q are not defined in this paper).

**********

7.
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous, but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

19 Oct 2021

Reviewer #1: Thanks for your response and revising the manuscript. I consider that the manuscript is ready for publication if the authors revise the following points.

Thank you again for your critique during the review period. We are pleased that our changes satisfied your comments.

1. L167: EPV-37, EPV-19, EPV-13 and EPV-9, EPV-308 zones -> EPV-37, EPV-19, EPV-13, EPV-9, and EPV-308?

This sentence has now been clarified: "To calculate the aggregated set of zones for EPV-37, EPV-19, EPV-13 and EPV-9, the zones from EPV-308 were grouped together or split based upon differences in their match EPV."

2.
KL divergence formula should be clarified (P and Q are not defined in this paper).

We have now added the KL Divergence formula to the text, with more description to ensure there is no ambiguity between P and Q, our true and approximating distributions: "The KL Divergence is calculated according to the equation:

D_KL(P || Q) = ∑_{s∈S} P(s) log( P(s) / Q(s) )

where P is the true reward distribution, Q is the approximating reward distribution and S refers to the set of all states within the model. The subsequent match's reward distribution (P_Gmi(S)) was used as the true reward distribution, with the previous matches' reward distribution (P_GM(S)) as the approximating distribution."

Submitted filename: Reviewer response.docx

21 Oct 2021

Development of an expected possession value model to analyse team attacking performances in rugby league
PONE-D-21-20054R2

Dear Dr. Sawczuk,

We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you'll receive an e-mail detailing the required amendments. When these have been addressed, you'll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.

To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact.
If they'll be preparing press materials, please inform our press team as soon as possible (no later than 48 hours after receiving the formal acceptance). Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Caroline Sunderland
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

3 Nov 2021

PONE-D-21-20054R2
Development of an expected possession value model to analyse team attacking performances in rugby league

Dear Dr. Sawczuk:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Caroline Sunderland
Academic Editor
PLOS ONE
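The KL Divergence calculation discussed in the review exchange above can be sketched in a few lines of code. This is an illustrative implementation only: the zone labels and probabilities below are hypothetical, not taken from the paper. It does mirror the behaviour the authors report, in that an approximating distribution which assigns zero probability to a zone visited in the subsequent match yields an infinite divergence (as observed for EPV-308).

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum over states s of P(s) * log(P(s) / Q(s)).

    p, q: dicts mapping zone label -> probability (share of reward).
    P is the true distribution (subsequent match), Q the approximating
    distribution (aggregate of previous matches). Returns math.inf when
    Q assigns zero probability to a zone that P visits.
    """
    total = 0.0
    for zone, p_s in p.items():
        if p_s == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        q_s = q.get(zone, 0.0)
        if q_s == 0.0:
            return math.inf  # zone never seen in the previous matches
        total += p_s * math.log(p_s / q_s)
    return total

# Hypothetical reward distributions over four zones:
p_next = {"A": 0.40, "B": 0.30, "C": 0.20, "D": 0.10}  # subsequent match (true)
q_prev = {"A": 0.35, "B": 0.35, "C": 0.20, "D": 0.10}  # previous matches (approximating)
print(round(kl_divergence(p_next, q_prev), 4))  # → 0.0072
```

Coarser zone aggregations (EPV-19, EPV-13, EPV-9) make zero-count zones rarer, which is why their divergences stay finite and small in the paper's results.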
  6 in total

1.  The expected value of possession in professional rugby league match-play.

Authors:  Thomas Kempton; Nicholas Kennedy; Aaron J Coutts
Journal:  J Sports Sci       Date:  2015-07-20       Impact factor: 3.337

2.  Explaining match outcome and ladder position in the National Rugby League using team performance indicators.

Authors:  Carl T Woods; Wade Sinclair; Sam Robertson
Journal:  J Sci Med Sport       Date:  2017-04-21       Impact factor: 4.319

3.  Match analysis in football: a systematic review.

Authors:  Hugo Sarmento; Rui Marcelino; M Teresa Anguera; Jorge Campaniço; Nuno Matos; José Carlos Leitão
Journal:  J Sports Sci       Date:  2014-05-01       Impact factor: 3.337

4.  Examining the evolution and classification of player position using performance indicators in the National Rugby League during the 2015-2019 seasons.

Authors:  C Wedding; C T Woods; W H Sinclair; M A Gomez; A S Leicht
Journal:  J Sci Med Sport       Date:  2020-02-27       Impact factor: 4.319

5.  Performance consistency of international soccer teams in euro 2012: a time series analysis.

Authors:  Mohsen Shafizadeh; Marc Taylor; Carlos Lago Peñas
Journal:  J Hum Kinet       Date:  2013-10-08       Impact factor: 2.193
