Abstract
In modern sports, strategy and tactics play an important role in determining the outcome of a game. However, many coaches still base their game tactics on experience and intuition. The aim of this study is to predict tactics, namely formation and game style, together with the game outcome, from soccer datasets. We propose a Deep Neural Network (DNN) based on a Multi-Layer Perceptron (MLP), combined with feature engineering, to predict the soccer tactics of teams. Previous works apply simple machine learning techniques, such as Support Vector Machines (SVM) and decision trees, to soccer datasets, but these techniques are often limited in their ability to predict tactics. In this study, we use feature selection, clustering of players into segmented positions, and a Multi-Output model for Soccer (MOS) built on a DNN with wide inputs and residual connections. Feature selection picks the important features from the soccer player dataset. Each position is then segmented by applying clustering to the selected features. The segmented positions and the game appearance dataset serve as the training data for the proposed model. Our model predicts the core elements of soccer tactics: formation, game style, and game outcome. We use wide inputs and embedding layers to learn the sparse, specific rules of the soccer dataset, and residual connections to learn additional information. The MLP layers help the model generalize over the features of the soccer dataset. Experimental results demonstrate the superiority of the proposed model, which obtains significant improvements over the baseline models.
Keywords: Clustering; Feature selection; MLP; Multi-output model; Soccer tactics
Year: 2022 PMID: 35174271 PMCID: PMC8802790 DOI: 10.7717/peerj-cs.853
Source DB: PubMed Journal: PeerJ Comput Sci ISSN: 2376-5992
Abbreviations of the positions.
| Abbreviation | Positions |
|---|---|
| FW | Forward |
| Wing | Winger |
| AMF | Attacking Midfielder |
| CMF | Central Midfielder |
| DMF | Defensive Midfielder |
| WB | Wing Back |
| CB | Center Back |
| GK | Goalkeeper |
Figure 1. Segmented positions by clustering.
Figure 2. Overview of the model training.
Feature description of the soccer player dataset (Lee, Jung & Camacho, 2021).
| Feature name | Feature description | Feature name | Feature description |
|---|---|---|---|
| Apps | Number of appearances | Mins | Minutes played |
| S_Total | Number of shots | S_OutOfBox | Number of shots outside the penalty zone |
| S_SixYardBox | Number of shots in the 6 yard box | S_PenaltyArea | Number of shots in the penalty zone |
| S_OpenPlay | Number of shots in open play situations | S_Counter | Number of shots in counterattack situation |
| S_SetPiece | Number of shots in a set piece situation | PenaltyTaken | Number of penalty kicks obtained |
| S_OffTarget | Number of shots off target | S_OnPost | Number of shots hitting the post |
| S_OnTarget | Number of shots on target | Blocked | Number of blocked shots |
| G_Total | Number of goals | G_SixYardBox | Number of goals scored in the 6 yard box |
| G_PenaltyArea | Number of goals scored in penalty zone | G_OutOfBox | Number of goals scored outside the penalty zone |
| G_OpenPlay | Number of goals scored in open play situations | G_Counter | Number of goals scored in counterattack |
| G_PenaltyScored | Number of penalty kicks scored | G_Own | Number of own goals |
| G_Normal | Number of goals scored in normal situations | T_Dribbles | Number of dribble attempts |
| Successful | Number of successful dribbles | Unsuccessful | Number of unsuccessful dribbles |
| Touch Miss | Number of bad touches | Dispossessed | Number of times dispossessed |
| D_Total | Number of aerial duels | Won | Number of aerial duels won |
| Lost | Number of aerial duels lost | P_Total | Total number of passes |
| AccLB | Number of successful long passes | InAccLB | Number of unsuccessful long passes |
| AccSP | Number of successful short passes | InAccSP | Number of unsuccessful short passes |
| AccCR | Number of successful crosses | InAccCR | Number of unsuccessful crosses |
| AccCrn | Number of successful corner kicks | InAccCrn | Number of unsuccessful corner kicks |
| AccFrK | Number of successful free kicks | InAccFrK | Number of unsuccessful free kicks |
| K_Total | Number of key passes | K_Long | Number of long key passes |
| K_Short | Number of short key passes | K_Cross | Number of cross key passes |
| K_Throughball | Number of through-ball key passes | A_Total | Number of assists |
| A_Cross | Number of cross assists | A_Corner | Number of corner kick assists |
| A_Throughball | Number of through-pass assists | A_Freekick | Number of free kick assists |
| T_Total | Number of tackle attempts | DribbledPast | Number of times dribbled past |
| Tackle | Number of successful tackles | Interception | Number of interceptions |
| Fouled | Number of times fouled | Fouls | Number of fouls committed |
| Offiside | Number of offsides | Clearance | Number of clearances |
| ShotsBlocked | Number of shots blocked | CrossesBlocked | Number of crosses blocked |
| PassesBlocked | Number of passes blocked | Save_Total | Number of saves |
| Save_SixYardBox | Number of saves of shots from inside the 6 yard box | Save_PenaltyArea | Number of saves of shots from the penalty zone |
| Save_OutOfBox | Number of saves of shots from outside the penalty zone | Rating | Average rating |
Features of the game appearance dataset.
| Binary features | Continuous features | Categorical features |
|---|---|---|
| FW0 FW0B FW1 | Ball Possession | Match order |
| AMF0 AMF0B AMF1 AMF1B | | Season |
| Wing0 Wing0B Wing1 Wing1B | | Opposing team |
| CMF0 CMF0B CMF0C CMF1 CMF1B | | Opp level |
| DMF0 DMF1 DMF1B | | |
| WB0 WB0B WB1 WB1B | | |
| CB0 CB0B CB1 CB1B | | |
| GK0 GK1 | | |
Boruta algorithm for feature selection.
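The algorithm listing itself did not survive extraction. As a rough illustration of the idea behind Boruta, the sketch below compares each real feature against shuffled "shadow" copies and keeps the features that beat the best shadow in most rounds. It is a simplified stand-in, not the authors' procedure: Boruta normally uses random-forest importances and a statistical test, whereas this version assumes absolute Pearson correlation as the importance score and a fixed hit-rate cutoff (`min_hit_rate`, a hypothetical parameter) so that it runs with the standard library only. The toy data is invented.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation: a stand-in importance score (assumption)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def shadow_feature_selection(columns, target, n_iter=50, min_hit_rate=0.9, seed=0):
    """Keep features that beat the best shuffled 'shadow' copy in most rounds."""
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_iter):
        # Shadow features: shuffled copies whose link to the target is destroyed.
        shadow_scores = []
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            shadow_scores.append(abs_corr(shadow, target))
        threshold = max(shadow_scores)
        # A real feature scores a "hit" when it outperforms every shadow.
        for name, values in columns.items():
            if abs_corr(values, target) > threshold:
                hits[name] += 1
    return [name for name, count in hits.items() if count / n_iter >= min_hit_rate]


# Hypothetical toy data: 'signal' tracks the label, 'noise' is unrelated to it.
target = [0] * 10 + [1] * 10
columns = {
    "signal": [float(i) for i in range(20)],
    "noise": [float(i % 2) for i in range(20)],
}
selected = shadow_feature_selection(columns, target)
print(selected)
```

In practice the same loop would run once per position over the player statistics, with a tree-based importance in place of the correlation score.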
Soccer player dataset for FW.
| Features | Harry Kane | Roberto Firmino | Jamie Vardy |
|---|---|---|---|
| S_Total | 3.6 | 2.9 | 4.1 |
| S_OutOfBox | 0.7 | 0.8 | 0.9 |
| S_SixYardBox | 0.4 | 0.5 | 0.2 |
| S_PenaltyArea | 1.4 | 2.1 | 1.7 |
| S_OpenPlay | 2.3 | 2.8 | 1.9 |
| Rating | 7.7 | 6.8 | 7.3 |
Segmented positions for FW.
| Players | Harry Kane | Roberto Firmino | Jamie Vardy |
|---|---|---|---|
| Label (Segmented position) | 0 | 1 | 0 |
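To make the segmentation step concrete, the following sketch clusters the three forwards using only the six values shown in the table above. It assumes k-means with two clusters (the source names "clustering" and the elbow method but not the exact algorithm) and a deterministic initialization from the first two players; the real pipeline would use all Boruta-selected features and many more players.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain k-means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Rows: S_Total, S_OutOfBox, S_SixYardBox, S_PenaltyArea, S_OpenPlay, Rating
names = ["Harry Kane", "Roberto Firmino", "Jamie Vardy"]
players = [
    [3.6, 0.7, 0.4, 1.4, 2.3, 7.7],
    [2.9, 0.8, 0.5, 2.1, 2.8, 6.8],
    [4.1, 0.9, 0.2, 1.7, 1.9, 7.3],
]
labels, _ = kmeans(players, k=2)
print(dict(zip(names, labels)))
```

On these three rows the sketch groups Kane with Vardy and leaves Firmino in a separate cluster, which agrees with the labels in the table; the cluster numbering itself is arbitrary.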
Figure 3. Structure of the multi-output model for soccer.
The numbers in the figure give the dimension of each layer, and the colors distinguish the role of each layer: red marks the feed-forward (dense) layers, black the embedding layers, green batch normalization and dropout, and yellow the output layers.
Configuration of our model.
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Embedding size | 16 | Batch size | 24 |
| Dropout | 0.2 | #epochs | 100 |
| | 0.001 | Optimizer | |
| Learning rate | 0.01 | | |
Features selected by the Boruta algorithm (Lee, Jung & Camacho, 2021).
| Position group | Position | Category | Selected features |
|---|---|---|---|
| FW | FW | Offensive | S_Total, S_SixYardBox, S_PenaltyArea, S_OpenPlay, S_SetPiece, S_OffTarget, S_OnTarget, G_Total, G_SixYardBox, G_PenaltyArea, G_OutOfBox, G_OpenPlay, G_SetPiece, G_Normal, T_Dribbles, Unsuccessful, Successful, D_Total, Won |
| | | Passing | P_Total, AccLB, AccSP, InAccSP, K_Total, K_Short |
| | | Defensive | ShotsBlocked |
| MF | AMF | Offensive | S_Total, S_OutOfBox, S_PenaltyArea, S_OpenPlay, Blocked, G_Total, G_PenaltyArea, G_OutOfBox, G_OpenPlay, G_Normal, Successful, D_Total |
| | | Passing | P_Total, AccLB, AccSP, InAccSP, AccCR, InAccCR, InAccCrn, AccFrK, K_Total, K_Long, K_Short, K_Cross, K_Corner, K_Throughball, A_Total, A_Cross, A_Corner |
| | | Defensive | T_Total |
| | CMF | Offensive | S_Total, S_OutOfBox, S_SixYardBox, S_PenaltyArea, S_OpenPlay, S_OffTarget, S_OnTarget, Blocked, G_Total, G_PenaltyArea, G_OutOfBox, G_OpenPlay, G_Normal, Unsuccessful, Successful, T_Dribbles, D_Total, Won, Lost |
| | | Passing | P_Total, AccSP, InAccSP, AccCR, AccCrn, K_Total, K_Short, K_Throughball, A_Total, A_Throughball |
| | | Defensive | T_Total, Interception, PassesBlocked, Tackle |
| | DMF | Offensive | S_Total, S_OutOfBox, S_SixYardBox, S_PenaltyArea, PenaltyTaken, S_OffTarget, S_OnTarget, G_Total, G_PenaltyArea, Successful, Unsuccessful, Dispossessed, D_Total, Won |
| | | Passing | P_Total, AccSP, AccFrK, K_Total, K_Long, K_Short, K_Throughball, A_Total |
| | | Defensive | T_Total, Interception, Tackle |
| | Wing | Offensive | S_Total, S_OutOfBox, S_SixYardBox, S_PenaltyArea, S_OpenPlay, S_OnTarget, Blocked, G_Total, G_PenaltyArea, G_OutOfBox, G_OpenPlay, G_Normal, Unsuccessful, Successful, T_Dribbles, Won |
| | | Passing | P_Total, AccLB, AccSP, InAccSP, AccCR, InAccCR, AccCrn, InAccCrn, InAccFrK, K_Total, K_Long, K_Short, K_Cross, K_Corner, K_Throughball, K_Freekick, A_Cross, A_Total |
| | | Defensive | T_Total, Interception, Fouled, ShotsBlocked, Tackle |
| DF | WB | Offensive | S_Total, S_SixYardBox, S_PenaltyArea, S_OpenPlay, S_SetPiece, S_OnTarget, Blocked, G_Total, G_OutOfBox, G_SetPiece, G_Normal, Successful, T_Dribbles, D_Total, Won, Lost |
| | | Passing | P_Total, AccSP, AccCR, InAccCR, K_Total, K_Long, K_Short, K_Cross, A_Cross, A_Total |
| | | Defensive | T_Total, DribbledPast, Tackle, Interception, Fouled, ShotsBlocked, PassesBlocked |
| | CB | Offensive | S_Total, S_OutOfBox, S_PenaltyArea, S_OpenPlay, S_SetPiece, S_OffTarget, S_OnTarget, G_Total, G_SetPiece, G_Own, G_Normal, Successful, T_Dribbles, D_Total, Won, Lost |
| | | Passing | P_Total, AccSP, K_Total, K_Short |
| | | Defensive | T_Total, Interception, Fouled, Clearance, Tackle |
| GK | GK | Passing | InAccLB, AccSP, AccFrK |
| | | Defensive | Clearance |
| | | Saving | Save_Total, Save_PenaltyArea, Save_OutOfBox |
Number of clusters per position, chosen with the elbow method. The cluster counts were determined empirically.
| Position (# cluster) | Position (# cluster) | Position (# cluster) | Position (# cluster) |
|---|---|---|---|
| FW (2) | AMF (2) | Wing (2) | CMF (2) |
| DMF (2) | WB (2) | CB (2) | GK (2) |
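The elbow method can be sketched as follows: run the clustering for several values of k, record the within-cluster sum of squares (inertia), and pick the k after which further clusters bring little improvement. The code below assumes k-means with a deterministic first-k initialization and uses the largest second difference of the inertia curve as a simple, hypothetical elbow criterion; the six 2-D points are invented and only happen to form two groups.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_inertia(points, k, max_iter=100):
    """Run k-means (first k points as seeds) and return the final inertia."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def elbow(inertias):
    """Return the k with the sharpest bend (largest second difference)."""
    ks = sorted(inertias)
    best_k, best_bend = None, float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = (inertias[prev_k] - inertias[k]) - (inertias[k] - inertias[next_k])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Hypothetical toy data with two visually obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9), (0.8, 1.1), (4.8, 5.1)]
inertias = {k: kmeans_inertia(points, k) for k in range(1, 5)}
best_k = elbow(inertias)
print(inertias, best_k)
```

Reading the elbow off a plot, as the caption's "empirically" suggests, is the more common practice; the second-difference rule is only one way to automate that judgment.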
The performance evaluation metrics.
| Metrics | Description | Formula |
|---|---|---|
| Accuracy | The sum of TP and TN divided by the total population | $\frac{TP + TN}{TP + TN + FP + FN}$ |
| Precision | TP divided by the sum of TP and FP | $\frac{TP}{TP + FP}$ |
| Recall | TP divided by the sum of TP and FN | $\frac{TP}{TP + FN}$ |
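The three definitions in the table translate directly into code. The sketch below works from raw confusion counts for a single class; the counts in the example are invented, and how the per-class values are averaged across the formation, game style, and outcome classes is not stated in the source.

```python
def accuracy(tp, tn, fp, fn):
    """(TP + TN) divided by the whole population."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp, fp):
    """TP divided by everything predicted positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """TP divided by everything that is actually positive."""
    return tp / (tp + fn)


# Hypothetical confusion counts for one class.
tp, tn, fp, fn = 40, 30, 10, 20
print(accuracy(tp, tn, fp, fn), precision(tp, fp), recall(tp, fn))
```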
Performance evaluation of model accuracy from cross-validation methods (baseline, WDL, GRN, our model without feature engineering, our model with feature engineering).
| Method | Model | Formation Acc (%) | Game style Acc (%) | Game outcome Acc (%) |
|---|---|---|---|---|
| Hold-out validation | Baseline | 74.35 | 72.4 | 42.5 |
| | WDL | 91.78 | 82.14 | 57.96 |
| | GRN | 93.54 | 83.65 | 56.64 |
| | Our model (Without F.E.) | 60.58 | 58.1 | 34.2 |
| | Our model (With F.E.) | 93.67 | 83.35 | 57.59 |
| 5-fold cross validation | Baseline | 73.58 | 69.57 | 43.51 |
| | WDL | 90.85 | 81.87 | 56.85 |
| | GRN | 94.84 | 83.62 | 57.78 |
| | Our model (Without F.E.) | 59.34 | 57.76 | 33.1 |
| | Our model (With F.E.) | 94.97 | 83.82 | 57.82 |
| Stratified 5-fold cross validation | Baseline | 71.64 | 70.26 | 42.41 |
| | WDL | 90.69 | 81.23 | 56.94 |
| | GRN | 93.11 | 82.49 | 55.81 |
| | Our model (Without F.E.) | 59.28 | 54.32 | 34.97 |
| | Our model (With F.E.) | 94.85 | 82.97 | 56.16 |
| Repeated random validation | Baseline | 70.86 | 68.38 | 43.76 |
| | WDL | 89.85 | 82.47 | 56.81 |
| | GRN | 93.94 | 82.57 | 56.21 |
| | Our model (Without F.E.) | 57.58 | 53.88 | 33.62 |
| | Our model (With F.E.) | 93.53 | 83.07 | 57.54 |
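Of the four validation schemes in the table, stratified k-fold is the least obvious to implement, so a minimal sketch may help. It splits sample indices into k folds while keeping the class proportions of each fold close to those of the full dataset. The function name, the seed handling, and the toy label counts are all assumptions; the source does not describe its splitting code.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs with balanced class shares."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Hypothetical outcome labels: 10 wins, 5 draws, 5 losses.
labels = ["Win"] * 10 + ["Draw"] * 5 + ["Lose"] * 5
for train, test in stratified_kfold(labels, k=5):
    print(len(train), [labels[i] for i in test])
```

Plain 5-fold cross validation corresponds to dropping the per-class grouping, and hold-out validation to using a single train/test pair.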
Performance evaluation of model precision from 5-fold cross-validation (baseline, WDL, GRN, our model without feature engineering, our model with feature engineering).
| Method | Model | Formation precision (%) | Game style precision (%) | Game outcome precision (%) |
|---|---|---|---|---|
| 5-fold cross validation | Baseline | 72.07 | 70.61 | 45.43 |
| | WDL | 91.81 | 82.97 | 56.07 |
| | GRN | 93.51 | 81.72 | 56.53 |
| | Our model (Without F.E.) | 56.75 | 53.76 | 35.60 |
| | Our model (With F.E.) | 93.84 | 83.35 | 56.26 |
Performance evaluation of model recall from 5-fold cross-validation (baseline, WDL, GRN, our model without feature engineering, our model with feature engineering).
| Method | Model | Formation recall (%) | Game style recall (%) | Game outcome recall (%) |
|---|---|---|---|---|
| 5-fold cross validation | Baseline | 70.09 | 66.37 | 42.93 |
| | WDL | 90.15 | 80.83 | 53.17 |
| | GRN | 90.58 | 80.94 | 54.33 |
| | Our model (Without F.E.) | 57.06 | 56.98 | 32.79 |
| | Our model (With F.E.) | 91.92 | 81.18 | 54.26 |
Figure 4. Confusion matrix of our model from the best performing method (5-fold cross validation) for formation classification.
Figure 5. Confusion matrix of our model from the best performing method (5-fold cross validation) for game style classification.
Figure 6. Confusion matrix of our model from the best performing method (5-fold cross validation) for game outcome classification.
Input dataset for model prediction.
| Features | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| FW0 | 1 | 1 | 1 |
| FW0B | 0 | 0 | 0 |
| FW1 | 0 | 0 | 0 |
| AMF0 | 0 | 0 | 0 |
| AMF0B | 0 | 0 | 0 |
| AMF1 | 1 | 0 | 1 |
| AMF1B | 0 | 0 | 0 |
| Wing0 | 1 | 1 | 1 |
| Wing0B | 1 | 0 | 0 |
| Wing1 | 0 | 1 | 1 |
| Wing1B | 0 | 0 | 0 |
| CMF0 | 0 | 1 | 1 |
| CMF0B | 0 | 0 | 1 |
| CMF0C | 0 | 0 | 0 |
| CMF1 | 0 | 1 | 0 |
| CMF1B | 0 | 0 | 0 |
| DMF0 | 1 | 1 | 0 |
| DMF1 | 1 | 1 | 1 |
| DMF1B | 0 | 0 | 0 |
| WB0 | 1 | 0 | 0 |
| WB0B | 0 | 0 | 0 |
| WB1 | 1 | 0 | 0 |
| WB1B | 0 | 0 | 0 |
| CB0 | 1 | 1 | 1 |
| CB0B | 1 | 1 | 1 |
| CB1 | 0 | 1 | 1 |
| CB1B | 0 | 0 | 0 |
| GK0 | 1 | 1 | 1 |
| GK1 | 0 | 0 | 0 |
| Opp level | High | Mid | Low |
| Match order | 9 round | 32 round | 18 round |
| Season | 2020–2021 | 2020–2021 | 2020–2021 |
| Opposing Team | Manchester City | Everton | Sheffield |
| Ball Possession | 66.1 | 47.2 | 58 |
Figure 7. Visualization of three cases: the matches against Manchester City (Round 9), Everton (Round 32), and Sheffield (Round 18).
Prediction results for three cases: the matches against Manchester City (Round 9), Everton (Round 32), and Sheffield (Round 18).
| Results | Case 1 | Case 2 | Case 3 |
|---|---|---|---|
| Actual formation | 4-2-0-1-3 | 3-2-2-0-3 | 3-1-2-1-3 |
| Actual game style | Offensive | Defensive | Offensive |
| Actual game outcome | Lose | Win | Win |
| Predicted formation | 4-2-0-1-3 | 3-2-2-0-3 | 3-1-2-1-3 |
| Predicted game style | Offensive | Defensive | Offensive |
| Predicted game outcome | Lose | Draw | Win |
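The five-number formation strings can be related back to the binary input features. The source does not state the mapping, so the grouping below is an assumption: defenders (CB and WB), then DMF, CMF, AMF, and finally attackers (FW and Wing), with each active feature counted as one player. Under that assumption the sketch reproduces the actual formation of all three cases from the input table. The regular expression for feature names (position, cluster digit, optional slot letter) is likewise inferred from the names shown.

```python
import re

# Assumed grouping of positions into formation lines (defence first, attack last).
FORMATION_LINES = [("CB", "WB"), ("DMF",), ("CMF",), ("AMF",), ("FW", "Wing")]
# Feature name = position + cluster label (0/1) + optional slot letter (B/C).
FEATURE_PATTERN = re.compile(r"([A-Za-z]+?)([01])([BC]?)")


def formation_from_lineup(active_features):
    """Count active players per position and join the line totals."""
    counts = {}
    for name in active_features:
        match = FEATURE_PATTERN.fullmatch(name)
        if match is None:
            raise ValueError(f"unexpected feature name: {name}")
        position = match.group(1)
        counts[position] = counts.get(position, 0) + 1
    return "-".join(
        str(sum(counts.get(pos, 0) for pos in line)) for line in FORMATION_LINES
    )


# Features set to 1 in the input table, per opponent.
lineups = {
    "Manchester City": ["FW0", "AMF1", "Wing0", "Wing0B", "DMF0", "DMF1",
                        "WB0", "WB1", "CB0", "CB0B", "GK0"],
    "Everton": ["FW0", "Wing0", "Wing1", "CMF0", "CMF1", "DMF0", "DMF1",
                "CB0", "CB0B", "CB1", "GK0"],
    "Sheffield": ["FW0", "AMF1", "Wing0", "Wing1", "CMF0", "CMF0B", "DMF1",
                  "CB0", "CB0B", "CB1", "GK0"],
}
for opponent, lineup in lineups.items():
    print(opponent, formation_from_lineup(lineup))
```

The goalkeeper is parsed but left out of the string, which matches the five-part format used in the results table.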