Shu-Fen Li, Mei-Ling Huang, Yun-Zhi Li.
Abstract
(1) Background and Objective: Major League Baseball (MLB) is one of the most popular international sporting events. Many people are very interested in related activities and are curious about the outcome of the next game. Many factors affect the outcome of a baseball game, which makes it very difficult to predict precisely. At present, related research reports accuracies between 55% and 62% for predicting the outcome of the next game.
Keywords: Major League Baseball (MLB); deep learning; machine learning; model prediction
Year: 2022 PMID: 35205582 PMCID: PMC8871522 DOI: 10.3390/e24020288
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. Flowchart of the study.
Figure 2. Hit-related variables from Baseball-Reference.
Figure 3. Pitcher-related variables from Baseball-Reference.
MLB variables.
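| Variable | Abbreviation | Variable | Abbreviation |
|---|---|---|---|
| Plate appearances | PA | Slugging percentage | SLG |
| At bats | AB | On-base plus slugging | OPS |
| Runs | R | Left on base | LOB |
| Hits | H | Hits | H |
| Home runs | HR | Runs | R |
| Runs batted in | RBI | Bases on balls (walks) | BB |
| Bases on balls (walks) | BB | Strikeouts | SO |
| Strikeouts | SO | Home runs | HR |
| Stolen bases | SB | Earned run average | ERA |
| Caught stealing | CS | At bats | AB |
| Batting average | BA | Walks plus hits per inning pitched | WHIP |
| On-base percentage | OBP | Winning percentage | Win% |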
Accumulation for MLB variables (HOU).
| Year | Game | Variable | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | 1 | 28 | 25 | 2 | … | 3 | 0 | 3 | … | 1.000 | 0 |
| | 2 | 61 | 55 | 2 | … | 10 | 2 | 3 | … | 0.500 | 0 |
| | 3 | 96 | 83 | 3 | … | 21 | 7 | 6 | … | 0.333 | 1 |
| | … | … | … | … | … | … | … | … | … | … | … |
| | 162 | 6073 | 5459 | 729 | … | 1308 | 618 | 423 | … | 0.531 | 1 |
| | 1 | 38 | 34 | 5 | … | 4 | 3 | 4 | … | 1.000 | 0 |
| | … | … | … | … | … | … | … | … | … | … | … |
| | 161 | 6310 | 5540 | 906 | … | 1192 | 632 | 443 | … | 0.656 | 1 |
| | 162 | 6349 | 5573 | 912 | … | 1197 | 635 | 444 | … | 0.658 | 1 |
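The HOU table appears to show how each training row is built:

- The value columns are running totals over the season (28, 61, 96, … in the first column).
- The Win% column matches a running winning percentage (1.000, 0.500 and 0.333 after games 1 to 3).
- The final 0/1 column is consistent with the result of the following game.

The sketch below reproduces that construction under these assumptions. The column names PA, AB and R and the `won` field are hypothetical labels, not confirmed by the source.

```python
def season_rows(games, stats=("PA", "AB", "R")):
    """Turn per-game records into cumulative feature rows.

    Each row holds the running totals of `stats`, the running winning
    percentage, and the result of the *next* game as the label (1 = win,
    0 = loss). The last game in the input has no next game, so its label is None.
    """
    totals = {k: 0 for k in stats}
    wins = 0
    rows = []
    for i, game in enumerate(games):
        for k in stats:
            totals[k] += game[k]
        wins += game["won"]
        next_label = games[i + 1]["won"] if i + 1 < len(games) else None
        # **totals copies the current running totals into a new dict per row
        rows.append({"game": i + 1, **totals,
                     "win_pct": round(wins / (i + 1), 3),
                     "next": next_label})
    return rows


# Hypothetical per-game inputs: the first three games are the differences
# between the first three HOU rows, assuming those columns are PA, AB and R.
games = [
    {"PA": 28, "AB": 25, "R": 2, "won": 1},
    {"PA": 33, "AB": 30, "R": 0, "won": 0},
    {"PA": 35, "AB": 28, "R": 1, "won": 0},
    {"PA": 38, "AB": 34, "R": 4, "won": 1},
]
for row in season_rows(games)[:3]:
    print(row)
```

The label is shifted by one game, so each row holds only information available before the game it predicts.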
Parameter setting for 1DCNN.
| Parameter | Range |
|---|---|
| optimizer | Adam, RMSprop |
| epochs | 50, 100, 300, 500 |
| batch_size | 10, 20, 30 |
Parameter setting for ANN.
| Parameter | Range |
|---|---|
| kernel_initializer | Zeros, RandomNormal, glorot_normal, glorot_uniform, he_normal, uniform, lecun_uniform, he_uniform |
| optimizer | Adam, RMSprop |
| epochs | 50, 100, 300, 500 |
| batch_size | 10, 20, 30 |
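The 1DCNN and ANN tables define full-factorial search grids. The minimal standard-library sketch below expands them into explicit configurations. The dictionary keys mirror the parameter names in the tables, and `expand` is a hypothetical helper, not the authors' code.

```python
from itertools import product

# Search spaces copied from the 1DCNN and ANN parameter tables.
CNN_GRID = {
    "optimizer": ["Adam", "RMSprop"],
    "epochs": [50, 100, 300, 500],
    "batch_size": [10, 20, 30],
}
ANN_GRID = {
    "kernel_initializer": ["Zeros", "RandomNormal", "glorot_normal", "glorot_uniform",
                           "he_normal", "uniform", "lecun_uniform", "he_uniform"],
    **CNN_GRID,  # the ANN grid adds kernel_initializer to the 1DCNN parameters
}


def expand(grid):
    """Return every combination of a grid as a list of dicts (full factorial)."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


cnn_configs = expand(CNN_GRID)
ann_configs = expand(ANN_GRID)
print(len(cnn_configs), len(ann_configs))  # 24 192
```

Suppose each configuration were scored with 5-fold cross-validation, as in the confusion-matrix tables. The search would then need 120 training runs for the 1DCNN and 960 for the ANN.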
Parameter setting for SVM.
| Parameter | Range |
|---|---|
| kernel | Linear, RBF |
| C | 1, 10, 100, 1000 |
| gamma | 0.0001, 0.001, 0.1, 1, 10, 100, 1000 |
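The SVM table lists 2 × 4 × 7 = 56 combinations. However, gamma is the width parameter of the RBF kernel, K(x, y) = exp(-gamma · ‖x − y‖²), and has no effect on a linear kernel. The sketch below assumes scikit-learn-style kernel strings (`"linear"`, `"rbf"`), which the source does not name. It enumerates only the distinct configurations. This pruning is an illustrative design choice, not necessarily what the study did.

```python
from itertools import product

KERNELS = ["linear", "rbf"]
C_VALUES = [1, 10, 100, 1000]
GAMMAS = [0.0001, 0.001, 0.1, 1, 10, 100, 1000]


def svm_grid():
    """Distinct SVM configurations: gamma is only varied for the RBF kernel."""
    configs = []
    for kernel, c in product(KERNELS, C_VALUES):
        if kernel == "rbf":
            # RBF kernel: K(x, y) = exp(-gamma * ||x - y||^2), so gamma matters
            configs.extend({"kernel": kernel, "C": c, "gamma": g} for g in GAMMAS)
        else:
            # a linear kernel has no gamma, so one entry per C is enough
            configs.append({"kernel": kernel, "C": c})
    return configs


print(len(svm_grid()))  # 32 (4 linear + 28 RBF) instead of 56
```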
Parameter setting for LR.
| Parameter | Range |
|---|---|
| penalty | L1, L2 |
| C | 1, 10, 100, 1000 |
| Solvers | Liblinear, newton-cg, lbfgs, sag |
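Suppose the LR search is run with scikit-learn's `LogisticRegression`. That is an assumption: the solver names match that library, but the source does not say. In that library not every penalty/solver pair in the table is valid. `newton-cg`, `lbfgs` and `sag` support only the L2 penalty, while `liblinear` supports both L1 and L2. The sketch below filters the grid accordingly. `SUPPORTED` and `lr_grid` are hypothetical names.

```python
from itertools import product

PENALTIES = ["l1", "l2"]
C_VALUES = [1, 10, 100, 1000]
SOLVERS = ["liblinear", "newton-cg", "lbfgs", "sag"]

# Penalties each solver accepts in scikit-learn's LogisticRegression (assumed library).
SUPPORTED = {
    "liblinear": {"l1", "l2"},
    "newton-cg": {"l2"},
    "lbfgs": {"l2"},
    "sag": {"l2"},
}


def lr_grid():
    """Keep only penalty/solver pairs that the assumed library accepts."""
    return [
        {"penalty": p, "C": c, "solver": s}
        for p, c, s in product(PENALTIES, C_VALUES, SOLVERS)
        if p in SUPPORTED[s]
    ]


print(len(lr_grid()))  # 20 valid combinations out of 32
```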
Binary confusion matrix.
| Actual \ Predicted | Win | Lose |
|---|---|---|
| Win | True Positive (TP) | False Negative (FN) |
| Lose | False Positive (FP) | True Negative (TN) |
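The accuracy reported in the confusion-matrix tables corresponds to (TP + TN) / (TP + FN + FP + TN). The minimal sketch below computes it from the four cells, along with precision and recall (standard metrics that the study does not report). `confusion_metrics` is a hypothetical helper name. With the first fold of the 1DCNN (TEX) table (actual Win: 41 predicted Win, 41 predicted Lose; actual Lose: 33 predicted Win, 47 predicted Lose), it reproduces the reported 54.32%.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall from a binary confusion matrix (Win = positive)."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Fold 1 of the 1DCNN (TEX) table.
m = confusion_metrics(tp=41, fn=41, fp=33, tn=47)
print(f"{100 * m['accuracy']:.2f}")  # 54.32
```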
Confusion matrix for 1DCNN (TEX).
| CV | Actual | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 41 | 41 | 54.32 | 0.5402 |
| | Lose | 33 | 47 | | |
| 2 | Win | 39 | 43 | 54.32 | 0.5402 |
| | Lose | 31 | 49 | | |
| 3 | Win | 41 | 41 | 52.47 | 0.5276 |
| | Lose | 36 | 44 | | |
| 4 | Win | 44 | 38 | 58.64 | 0.5686 |
| | Lose | 29 | 51 | | |
| 5 | Win | 48 | 34 | 55.56 | 0.5502 |
| | Lose | 38 | 42 | | |
| Average | | | | 55.06 (±2.04) | 0.5454 (±0.01) |
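The Average row summarizes the five folds. Recomputing it from the confusion matrices suggests that the ± value is the population standard deviation (dividing by n = 5). That reproduces 55.06 (±2.04), whereas the sample standard deviation would give about ±2.28. The sketch below uses Python's `statistics` module; `FOLDS` and `fold_accuracy` are hypothetical names, and the counts are copied from the 1DCNN (TEX) table.

```python
from statistics import mean, pstdev

# (TP, FN, FP, TN) per fold, read from the 1DCNN (TEX) confusion-matrix table.
FOLDS = [(41, 41, 33, 47), (39, 43, 31, 49), (41, 41, 36, 44),
         (44, 38, 29, 51), (48, 34, 38, 42)]


def fold_accuracy(tp, fn, fp, tn):
    """Accuracy of one fold, in percent."""
    return 100 * (tp + tn) / (tp + fn + fp + tn)


accs = [fold_accuracy(*f) for f in FOLDS]
# pstdev divides by n (population SD); statistics.stdev would divide by n - 1.
print(f"{mean(accs):.2f} (±{pstdev(accs):.2f})")  # 55.06 (±2.04)
```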
Figure 4. Training and testing process for 1DCNN (TEX).
Confusion matrix for ANN before feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 38 | 58 | 50.00 | 0.5407 |
| | Lose | 23 | 43 | | |
| 2 | Win | 32 | 64 | 52.47 | 0.5502 |
| | Lose | 13 | 53 | | |
| 3 | Win | 52 | 44 | 56.79 | 0.5762 |
| | Lose | 26 | 53 | | |
| 4 | Win | 44 | 52 | 53.70 | 0.5691 |
| | Lose | 23 | 43 | | |
| 5 | Win | 21 | 75 | 48.15 | 0.5038 |
| | Lose | 9 | 57 | | |
| Average | | | | 52.22 (±2.99) | 0.5480 (±0.03) |
Figure 5. Training and testing process for ANN (TEX).
Confusion matrix for SVM before feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 39 | 36 | 61.11 | 0.6386 |
| | Lose | 27 | 60 | | |
| 2 | Win | 48 | 29 | 64.81 | 0.6493 |
| | Lose | 28 | 57 | | |
| 3 | Win | 49 | 26 | 63.58 | 0.6466 |
| | Lose | 33 | 54 | | |
| 4 | Win | 51 | 28 | 64.81 | 0.6493 |
| | Lose | 29 | 54 | | |
| 5 | Win | 52 | 31 | 69.75 | 0.6659 |
| | Lose | 18 | 61 | | |
| Average | | | | 64.79 (±2.84) | 0.6500 (±0.01) |
Confusion matrix for LR before feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 44 | 44 | 53.70 | 0.5062 |
| | Lose | 31 | 43 | | |
| 2 | Win | 41 | 43 | 52.47 | 0.5038 |
| | Lose | 34 | 44 | | |
| 3 | Win | 46 | 27 | 56.17 | 0.5185 |
| | Lose | 44 | 45 | | |
| 4 | Win | 49 | 30 | 58.64 | 0.5238 |
| | Lose | 37 | 46 | | |
| 5 | Win | 42 | 39 | 56.79 | 0.5375 |
| | Lose | 31 | 50 | | |
| Average | | | | 55.55 (±2.21) | 0.5180 (±0.01) |
Selected variables (TEX).
| Team | Selected Variables |
|---|---|
| TEX | R( |
Confusion matrix for ANN after feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 45 | 51 | 55.56 | 0.5971 |
| | Lose | 21 | 45 | | |
| 2 | Win | 38 | 58 | 54.94 | 0.5919 |
| | Lose | 15 | 51 | | |
| 3 | Win | 34 | 62 | 49.38 | 0.5038 |
| | Lose | 20 | 46 | | |
| 4 | Win | 32 | 64 | 53.70 | 0.5696 |
| | Lose | 11 | 55 | | |
| 5 | Win | 37 | 59 | 54.94 | 0.5919 |
| | Lose | 14 | 52 | | |
| Average | | | | 53.70 (±2.24) | 0.5709 (±0.03) |
Figure 6. Training and testing process for ANN (TEX).
Confusion matrix for SVM after feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 47 | 30 | 60.49 | 0.6180 |
| | Lose | 34 | 51 | | |
| 2 | Win | 51 | 24 | 67.28 | 0.6313 |
| | Lose | 29 | 58 | | |
| 3 | Win | 50 | 25 | 67.90 | 0.6814 |
| | Lose | 27 | 60 | | |
| 4 | Win | 54 | 26 | 67.90 | 0.6814 |
| | Lose | 26 | 56 | | |
| 5 | Win | 58 | 24 | 66.05 | 0.6427 |
| | Lose | 31 | 49 | | |
| Average | | | | 65.92 (±2.80) | 0.6510 (±0.03) |
Confusion matrix for LR after feature selection (TEX).
| CV | Actual result | Predicted Win | Predicted Lose | Accuracy (%) | AUC |
|---|---|---|---|---|---|
| 1 | Win | 56 | 26 | 57.41 | 0.5618 |
| | Lose | 43 | 37 | | |
| 2 | Win | 43 | 38 | 54.32 | 0.5238 |
| | Lose | 36 | 45 | | |
| 3 | Win | 43 | 38 | 54.32 | 0.5238 |
| | Lose | 36 | 45 | | |
| 4 | Win | 42 | 37 | 56.79 | 0.5586 |
| | Lose | 33 | 50 | | |
| 5 | Win | 49 | 28 | 58.02 | 0.5645 |
| | Lose | 39 | 45 | | |
| Average | | | | 56.17 (±1.56) | 0.5465 (±0.02) |
Prediction accuracies for four models before feature selection.
| Number | Team | 1DCNN (%) | ANN (%) | SVM (%) | LR (%) | p-value |
|---|---|---|---|---|---|---|
| 1 | ATL | 53.09 (±1.91) | 52.72 (±1.59) | 66.66 (±3.12) a | 55.18 (±2.58) | 0.00 ** |
| 2 | CIN | 60.74 (±0.92) b | 55.80 (±0.30) b,c | 61.98 (±3.23) c | 58.37 (±1.57) | 0.00 ** |
| 3 | MIA | 60.62 (±1.06) | 60.62 (±4.54) | 65.06 (±2.33) b | 59.26 (±2.37) b | 0.05 * |
| 4 | NYM | 54.44 (±1.76) | 51.36 (±0.72) | 63.46 (±2.26) a | 53.46 (±3.80) | 0.00 ** |
| 5 | PHI | 58.27 (±2.39) | 57.53 (±1.53) | 63.83 (±4.41) | 59.88 (±2.84) | 0.14 |
| 6 | WSN | 50.62 (±2.38) | 49.38 (±0.00) | 64.44 (±2.09) a | 56.05 (±3.48) a | 0.00 ** |
| 7 | MIL | 52.71 (±1.15) | 56.30 (±2.51) | 62.84 (±2.00) a | 53.33 (±3.95) | 0.00 ** |
| 8 | CHC | 55.31 (±0.84) | 52.47 (±0.00) | 64.20 (±2.07) a | 59.63 (±2.16) a | 0.00 ** |
| 9 | STL | 56.42 (±0.92) | 55.56 (±0.00) | 65.43 (±3.56) a | 54.57 (±3.11) | 0.00 ** |
| 10 | PIT | 57.65 (±1.39) b | 51.24 (±1.53) b | 66.54 (±1.53) a | 54.57 (±3.55) | 0.00 ** |
| 11 | SFG | 51.73 (±1.53) | 50.49 (±0.46) | 65.31 (±3.61) a | 53.21 (±2.35) | 0.00 ** |
| 12 | SDP | 55.80 (±0.84) b | 56.54 (±1.49) c | 66.55 (±3.99) b,c | 58.52 (±2.98) | 0.01 * |
| 13 | LAD | 61.11 (±0.63) | 60.49 (±1.03) | 61.36 (±3.37) | 59.50 (±2.09) | 0.38 |
| 14 | COL | 53.33 (±2.01) | 50.74 (±3.46) a | 62.59 (±3.09) a | 54.08 (±3.41) | 0.01 * |
| 15 | ARI | 56.79 (±2.21) b | 51.36 (±0.91) | 63.95 (±2.12) a | 51.07 (±2.65) b | 0.00 ** |
| 16 | TBR | 54.20 (±2.80) | 51.11 (±0.60) | 63.83 (±2.64) a | 53.83 (±5.14) | 0.00 ** |
| 17 | BOS | 60.86 (±1.08) b | 54.44 (±1.26) b,c | 62.84 (±3.85) c,d | 57.11 (±1.47) d | 0.00 ** |
| 18 | NYY | 59.50 (±1.08) | 58.88 (±0.74) | 65.43 (±1.41) a | 54.32 (±1.79) a | 0.00 ** |
| 19 | TOR | 51.73 (±2.48) | 49.14 (±2.43) b | 63.33 (±2.01) a | 56.29 (±2.48) b | 0.00 ** |
| 20 | BAL | 58.03 (±2.50) | 55.68 (±1.06) b | 62.22 (±3.77) b | 60.62 (±3.59) | 0.03 * |
| 21 | CHW | 52.84 (±1.73) b | 57.28 (±0.91) b | 65.68 (±2.46) a | 54.57 (±3.26) | 0.00 ** |
| 22 | CLE | 51.61 (±1.00) b | 54.44 (±0.25) | 62.96 (±2.40) a | 58.27 (±2.78) b | 0.00 ** |
| 23 | KCR | 55.43 (±1.63) | 55.43 (±1.37) | 64.81 (±2.47) a | 56.30 (±2.80) | 0.00 ** |
| 24 | DET | 54.69 (±1.14) b,c | 60.25 (±1.08) b | 63.95 (±3.96) c | 61.48 (±3.55) | 0.01 * |
| 25 | MIN | 56.17 (±1.10) b | 52.10 (±2.02) b | 63.21 (±3.23) a | 54.20 (±2.85) | 0.00 ** |
| 26 | OAK | 56.29 (±1.26) b | 55.19 (±2.55) c | 66.42 (±1.00) b,c | 59.78 (±3.46) | 0.00 ** |
| 27 | HOU | 57.90 (±0.72) | 57.04 (±0.30) | 63.09 (±3.61) a | 57.28 (±2.90) | 0.01 * |
| 28 | LAA | 52.96 (±0.99) | 52.47 (±1.03) | 66.66 (±2.97) a | 52.47 (±1.51) | 0.00 ** |
| 29 | SEA | 48.40 (±1.49) | 50.49 (±1.43) | 64.20 (±2.53) a | 53.45 (±2.78) | 0.00 ** |
| 30 | TEX | 55.06 (±2.04) | 52.22 (±2.44) | 64.79 (±2.84) a | 55.55 (±2.21) | 0.00 ** |
| Mean | | 55.48 (±3.22) | 54.29 (±3.30) b | 64.25 (±1.47) a | 56.21 (±2.72) b | 0.00 ** |
Note: * p < 0.05; ** p < 0.01; model with superscript a is significantly different from the other three models; models with same superscripts b–d are significantly different from each other.
Prediction accuracies for four models after feature selection.
| Number | Team | 1DCNN (%) | ANN (%) | SVM (%) | LR (%) | p-value |
|---|---|---|---|---|---|---|
| 1 | ATL | 53.09 (±1.91) | 52.59 (±0.99) | 66.05 (±2.47) a | 54.08 (±3.04) | 0.00 ** |
| 2 | CIN | 60.74 (±0.92) | 55.56 (±0.00) | 63.46 (±3.06) a | 60.49 (±2.18) | 0.00 ** |
| 3 | MIA | 60.62 (±1.06) b | 63.46 (±0.25) c | 65.53 (±0.99) b,d | 59.14 (±3.37) c,d | 0.00 ** |
| 4 | NYM | 54.44 (±1.76) | 51.48 (±0.84) | 62.59 (±1.00) a | 52.47 (±3.38) | 0.00 ** |
| 5 | PHI | 58.27 (±2.39) | 57.28 (±0.25) b | 64.20 (±2.84) b,c | 58.15 (±1.89) c | 0.01 * |
| 6 | WSN | 50.62 (±2.38) | 49.38 (±0.00) | 64.32 (±3.48) a | 55.56 (±1.17) a | 0.00 ** |
| 7 | MIL | 52.71 (±1.15) b | 56.79 (±0.68) b | 65.31 (±3.80) a | 55.19 (±3.06) | 0.00 ** |
| 8 | CHC | 55.31 (±0.84) | 52.47 (±0.00) | 66.05 (±2.17) a | 60.74 (±2.80) a | 0.00 ** |
| 9 | STL | 56.42 (±0.92) | 54.82 (±0.46) | 65.93 (±2.42) a | 56.30 (±2.42) | 0.00 ** |
| 10 | PIT | 57.65 (±1.39) b | 49.26 (±1.43) b | 66.91 (±2.39) a | 54.45 (±4.14) | 0.00 ** |
| 11 | SFG | 51.73 (±1.53) | 50.62 (±1.41) | 64.82 (±1.10) a | 51.86 (±2.51) | 0.00 ** |
| 12 | SDP | 55.80 (±0.84) | 57.04 (±1.73) | 67.53 (±1.38) a | 59.26 (±2.90) | 0.00 ** |
| 13 | LAD | 61.11 (±0.63) b | 60.99 (±0.25) c | 66.42 (±1.82) b,c | 60.24 (±3.61) | 0.02 * |
| 14 | COL | 53.33 (±2.01) | 50.62 (±2.10) | 66.43 (±1.92) a | 54.44 (±1.81) | 0.00 ** |
| 15 | ARI | 56.79 (±2.21) | 51.85 (±1.10) | 63.95 (±3.57) a | 48.27 (±3.48) | 0.00 ** |
| 16 | TBR | 54.20 (±2.80) | 50.62 (±1.87) | 62.72 (±3.39) a | 52.10 (±3.26) | 0.00 ** |
| 17 | BOS | 60.86 (±1.08) b | 55.56 (±0.00) b | 65.80 (±3.67) a | 57.41 (±1.74) | 0.00 ** |
| 18 | NYY | 59.50 (±1.08) | 59.51 (±0.74) | 66.44 (±3.91) a | 60.62 (±3.30) | 0.01 * |
| 19 | TOR | 51.73 (±2.48) | 47.53 (±2.62) b | 66.54 (±1.81) a | 54.82 (±3.43) b | 0.00 ** |
| 20 | BAL | 58.03 (±2.50) | 56.54 (±0.74) | 70.74 (±3.88) a | 61.48 (±3.07) | 0.00 ** |
| 21 | CHW | 52.84 (±1.73) b,c | 58.15 (±0.99) b | 62.59 (±2.64) c,d | 56.17 (±1.29) d | 0.00 ** |
| 22 | CLE | 51.61 (±1.00) | 54.07 (±0.50) | 65.18 (±1.49) a | 59.51 (±3.26) a | 0.00 ** |
| 23 | KCR | 55.43 (±1.63) | 56.42 (±1.44) | 66.05 (±1.83) a | 58.03 (±3.96) | 0.00 ** |
| 24 | DET | 54.69 (±1.14) a | 60.99 (±0.25) b | 64.81 (±1.95) b | 62.10 (±2.94) | 0.00 ** |
| 25 | MIN | 56.17 (±1.10) b | 51.73 (±1.58) b | 67.53 (±2.83) a | 54.94 (±5.24) | 0.00 ** |
| 26 | OAK | 56.29 (±1.26) | 56.05 (±2.01) | 66.43 (±4.02) a | 61.11 (±0.78) a | 0.00 ** |
| 27 | HOU | 57.90 (±0.72) | 57.28 (±0.46) | 69.01 (±3.13) a | 60.74 (±2.42) | 0.00 ** |
| 28 | LAA | 52.96 (±0.99) | 52.59 (±1.06) | 67.53 (±4.07) a | 53.70 (±3.84) | 0.00 ** |
| 29 | SEA | 48.40 (±1.49) | 48.89 (±2.32) | 65.55 (±2.15) a | 51.36 (±3.03) | 0.00 ** |
| 30 | TEX | 55.06 (±2.04) | 53.70 (±2.24) b | 65.92 (±2.80) a | 56.17 (±1.56) | 0.00 ** |
| Mean | | 55.48 (±3.22) | 54.47 (±3.91) b | 65.75 (±1.77) a | 56.70 (±3.52) b | 0.00 ** |
Note: * p < 0.05; ** p < 0.01; model with superscript a is significantly different from the other three models; models with same superscripts b–d are significantly different from each other.
Accuracies before and after feature selection.
| Feature Selection | 1DCNN (%) | ANN (%) | SVM (%) | LR (%) | p-value |
|---|---|---|---|---|---|
| No | 55.48 (±3.22) | 54.29 (±3.30) b | 64.25 (±1.47) a | 56.21 (±2.72) b | 0.00 ** |
| Yes | 55.48 (±3.22) | 54.47 (±3.91) b | 65.75 (±1.77) a | 56.70 (±3.52) b | 0.00 ** |
Note: ** p < 0.01; model with superscript a is significantly different from the other three models; models with superscript b are significantly different from each other.
Comparisons with related studies.
| Author | Input Variables | Methods | Accuracy (%) | AUC |
|---|---|---|---|---|
| Jia et al. [ | BA, RBI, OBP, ERA, H, E, and | | 59.60 | - |
| Soto Valero [ | isHomeClub, Log5, PE, WP, RC, HomeWonPrev, VisitorWonPrev, BABIP, FP, PitchERA, OBP, SLG, VisitorLeague, HomeVersusVisitor, Stolen | | 58.92 | - |
| Elfrink [ | AB, AVG, OBP, SLG, OPS, BA/RISP, WHIP, RA | Random forest | 55.52 | - |
| Cui [ | OBP, ISO, FIP, WHIP, K/9, HR/9, K/BB, ELO, rest days between games | | 61.77 | 0.6706 |
| This study | TEX Team | SVM | 65.75 | 0.6501 |
Methods in bold achieved the highest accuracy in each study.