Taegu Kim, Jungsik Hong, Pilsung Kang.
Abstract
Accurate box office forecasting models are developed by considering competition and word-of-mouth (WOM) effects in addition to screening-related information. Nationality, genre, ratings, and distributors of motion pictures running concurrently with the target motion picture are used to describe the competition, whereas the numbers of informative, positive, and negative mentions posted on social network services (SNS) are used to gauge the atmosphere spread by WOM. Among these candidate variables, only significant variables are selected by genetic algorithm (GA), based on which machine learning algorithms are trained to build forecasting models. The forecasts are combined to improve forecasting performance. Experimental results on the Korean film market show that the forecasting accuracy in early screening periods can be significantly improved by considering competition. In addition, WOM has a stronger influence on total box office forecasting. Considering both competition and WOM improves forecasting performance to a larger extent than when only one of them is considered.
Year: 2017 PMID: 28819355 PMCID: PMC5551474 DOI: 10.1155/2017/4315419
Source DB: PubMed Journal: Comput Intell Neurosci
Main contributions of this study.
| Categories | Explanatory variables | Forecasting algorithms | Time horizon |
|---|---|---|---|
| Considering factors | Screening; competition; word-of-mouth (WOM) | | Three forecasting horizons |
| Employed techniques | Genetic algorithm (GA) | Machine learning; forecasting combination | |
| Interpretation | Two scenarios under three forecasting horizons. Scenario 1: competition, WOM, machine learning, and forecasting combination. Scenario 2: WOM, competition, machine learning, and forecasting combination | | |
Figure 1: Research framework for developing box office forecasting models.
Explanatory variable description (X denotes the target motion picture whereas Top1 denotes the motion picture that was ranked first at the box office one day prior to the release day. Top1–5 denotes the top five movies in terms of box office scores one day prior to release).
| Var | Category | Attribute | Description |
|---|---|---|---|
| 1 | - | Index | Motion picture identifier |
| 2 | Screening | | Number of seats for |
| 3 | Competition | | Number of screens for |
| 4 | Competition | | Number of seats for |
| 5 | Competition | Sc_share | Screen share for |
| 6 | Competition | | Number of screens for |
| 7 | Competition | | Number of seats for |
| 8 | Competition | Sc_share | Screen share for |
| 9 | Competition | IN_number | Number of motion pictures among |
| 10 | Competition | IN_screen | Number of total screens on the release day for the motion pictures in |
| 11 | Competition | IN_seat | Number of total seats on the release day for the motion pictures in |
| 12 | Competition | IN_share | Aggregated screen shares on the release day for the motion pictures in |
| 13 | Competition | IG_number | Number of motion pictures among |
| 14 | Competition | IG_screen | Number of total screens on the release day for the motion pictures in |
| 15 | Competition | IG_seat | Number of total seats on the release day for the motion pictures in |
| 16 | Competition | IG_share | Aggregated screen shares on the release day for the motion pictures in |
| 17 | Competition | IR_number | Number of motion pictures among |
| 18 | Competition | IR_screen | Number of total screens on the release day for the motion pictures in |
| 19 | Competition | IR_seat | Number of total seats on the release day for the motion pictures in |
| 20 | Competition | IR_share | Aggregated screen shares on the release day for the motion pictures in |
| 21 | Competition | ID_number | Number of motion pictures among |
| 22 | Competition | ID_screen | Number of total screens on the release day for the motion pictures in |
| 23 | Competition | ID_seat | Number of total seats on the release day for the motion pictures in |
| 24 | Competition | ID_share | Aggregated screen shares on the release day for the motion pictures in |
| 25 | Competition | Avg_age | The average screening days of |
| 26 | Competition | Rank_screen | Rank of |
| 27 | Competition | | |
| 28 | WOM | | Total number of SNS mentions posted between three and two weeks prior to the release |
| 29 | WOM | | Total number of SNS mentions posted between two weeks and one week prior to the release |
| 30 | WOM | | Total number of SNS mentions posted during one week prior to the release |
| 31 | WOM | | Total number of emotional SNS mentions posted between three and two weeks prior to the release |
| 32 | WOM | | Total number of emotional SNS mentions posted between two weeks and one week prior to the release |
| 33 | WOM | | Total number of emotional SNS mentions posted during one week prior to the release |
| 34 | WOM | | Total number of positive SNS mentions posted between three and two weeks prior to the release |
| 35 | WOM | | Total number of positive SNS mentions posted between two weeks and one week prior to the release |
| 36 | WOM | | Total number of positive SNS mentions posted during one week prior to the release |
| 37 | WOM | | Total number of negative SNS mentions posted between three and two weeks prior to the release |
| 38 | WOM | | Total number of negative SNS mentions posted between two weeks and one week prior to the release |
| 39 | WOM | | Total number of negative SNS mentions posted during one week prior to the release |
| 40 | WOM | Tot_SNS | |
| 41 | WOM | Avg_SNS_inc | |
| 42 | WOM | Weekly_SNS_inc | |
| 43 | WOM | Tot_emo | |
| 44 | WOM | Avg_emo_inc | |
| 45 | WOM | Weekly_emo_inc | |
| 46 | WOM | Tot_pos | |
| 47 | WOM | Avg_pos_inc | |
| 48 | WOM | Weekly_pos_inc | |
| 49 | WOM | Tot_neg | |
| 50 | WOM | Avg_neg_inc | |
| 51 | WOM | Weekly_neg_inc | |
Forecasting model configuration.
| Criterion | Identifier | Description |
|---|---|---|
| Target | W1 | Accumulative box office takings in the first week |
| Target | W2 | Accumulative box office takings in the first two weeks |
| Target | T | Accumulative box office takings over the entire screening period |
| Explanatory variables | S | Screening variable only |
| Explanatory variables | SC | Screening + competition variables |
| Explanatory variables | SW | Screening + WOM variables |
| Explanatory variables | SCW | Screening + competition + WOM variables |
| Forecasting algorithm | MLR | Multiple linear regression |
| Forecasting algorithm | SVR | Support vector regression |
| Forecasting algorithm | GPR | Gaussian process regression |
| Forecasting algorithm | | |
| Forecasting algorithm | Comb. | Combining the forecasting results of SVR, GPR, and |
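The "Comb." entry above combines the forecasts of several algorithms, but this record does not say how. The sketch below assumes the simplest scheme, an unweighted mean of the individual forecasts, and scores it with MAPE. The box office figures and model outputs are made-up illustrative numbers, and `combine_mean` is a hypothetical helper name.

```python
from statistics import mean


def mape(actual, forecast):
    """Mean absolute percentage error as a fraction (0.25 means 25%)."""
    return mean(abs(a - f) / abs(a) for a, f in zip(actual, forecast))


def combine_mean(forecasts):
    """Average several models' forecasts movie by movie (assumed scheme)."""
    return [mean(values) for values in zip(*forecasts)]


# Hypothetical first-week takings and two models' forecasts (illustrative only).
actual = [100.0, 200.0, 400.0]
svr_forecast = [120.0, 180.0, 380.0]
gpr_forecast = [90.0, 230.0, 440.0]

combined = combine_mean([svr_forecast, gpr_forecast])

print("SVR MAPE:     ", round(mape(actual, svr_forecast), 4))
print("GPR MAPE:     ", round(mape(actual, gpr_forecast), 4))
print("Combined MAPE:", round(mape(actual, combined), 4))
```

In this toy case the two models err in opposite directions, so their average has a lower MAPE than either one. That is the usual motivation for combining forecasts, though it is not guaranteed on real data.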
Tested combinations of explanatory variables and forecasting algorithms for each target (O marks a tested combination).
| | MLR | SVR | GPR | | Comb. |
|---|---|---|---|---|---|
| S | O | | | | |
| SC | O | | | | |
| SW | O | | | | |
| SCW | O | O | O | O | O |
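The grid above can be enumerated to see how many models are trained in total. This is only a sketch: `ALG4` is a placeholder for the fourth algorithm, whose name did not survive in this record, and `tested_configurations` is a hypothetical helper.

```python
TARGETS = ["W1", "W2", "T"]

# "ALG4" is a placeholder label, not the name used by the authors.
SCW_ALGORITHMS = ["MLR", "SVR", "GPR", "ALG4", "Comb."]


def tested_configurations():
    """List every (target, variable set, algorithm) triple marked in the grid."""
    configs = []
    for target in TARGETS:
        # S, SC, and SW are evaluated with MLR only.
        for variables in ("S", "SC", "SW"):
            configs.append((target, variables, "MLR"))
        # SCW is evaluated with every algorithm and with the combination.
        for algorithm in SCW_ALGORITHMS:
            configs.append((target, "SCW", algorithm))
    return configs


configs = tested_configurations()
print(len(configs), "configurations")
```

Each target therefore has eight configurations: three MLR baselines plus five SCW variants.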
Figure 2: Two scenarios for analyzing forecasting performance improvement.
Selected variables by GA for each forecasting model with MLR as a base learner (all the selected variables are statistically significant at α = 0.1).
| | Model W1 | Model W2 | Model T |
|---|---|---|---|
| Screening + competition (SC) | | | |
| Screening + WOM (SW) | | | |
Number of selected variables by GA in each category for each forecasting model and algorithm.
| Model | Algorithm | Screening | Competition | WOM | Total |
|---|---|---|---|---|---|
| Model W1 | MLR | 1 | 12 | 12 | 25 |
| Model W1 | SVR | 1 | 6 | 11 | 18 |
| Model W1 | GPR | 1 | 16 | 7 | 24 |
| Model W1 | | 1 | 14 | 10 | 25 |
| Model W2 | MLR | 1 | 11 | 13 | 25 |
| Model W2 | SVR | 1 | 9 | 9 | 19 |
| Model W2 | GPR | 1 | 13 | 10 | 24 |
| Model W2 | | 1 | 13 | 12 | 26 |
| Model T | MLR | 1 | 11 | 9 | 21 |
| Model T | SVR | 1 | 6 | 12 | 19 |
| Model T | GPR | 1 | 9 | 11 | 21 |
| Model T | | 1 | 9 | 9 | 19 |
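The counts above come from GA-based variable selection. This record does not give the authors' GA settings, so the following is only a generic sketch of the technique: a binary-mask GA with tournament selection, one-point crossover, bit-flip mutation, and elitism. All parameter values, the function names, and the toy fitness are assumptions. In a real setting the fitness would be something like the negative validation error of a model trained on the selected variables.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def ga_select(n_vars, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Evolve 0/1 masks over n_vars candidates (n_vars >= 2); higher fitness is better."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            cut = rng.randint(1, n_vars - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with probability p_mut per position.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


# Toy fitness (hypothetical): reward three "useful" variables, penalize the rest.
USEFUL = {0, 3, 7}


def toy_fitness(mask):
    return sum(1.0 if i in USEFUL else -0.5 for i, bit in enumerate(mask) if bit)


mask, history = ga_select(n_vars=10, fitness=toy_fitness)
print(mask, history[-1])
```

Because the best mask is copied unchanged into each new generation, the best fitness recorded in `history` never decreases.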
Selected variables by GA for each algorithm and each forecasting model when all screening, competition, and SNS-related variables are considered. The number in each cell indicates whether the corresponding variable is selected for the forecasting model (1: selected, 0: not selected).
| Var | Variable | Model W1 | | | | Model W2 | | | | Model T | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | MLR | SVR | GPR | | MLR | SVR | GPR | | MLR | SVR | GPR | |
| 2 | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| 4 | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 5 | Sc_share | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 6 | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 |
| 7 | | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
| 8 | Sc_share | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 9 | IN_number | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 10 | IN_screen | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 0 |
| 11 | IN_seat | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 12 | IN_share | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | IG_number | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
| 14 | IG_screen | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 |
| 15 | IG_seat | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 16 | IG_share | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 17 | IR_number | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
| 18 | IR_screen | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 19 | IR_seat | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 20 | IR_share | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 0 |
| 21 | ID_number | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| 22 | ID_screen | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 |
| 23 | ID_seat | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 24 | ID_share | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| 25 | Avg_age | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 26 | Rank_screen | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 27 | | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 28 | | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 29 | | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 30 | | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 |
| 31 | | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
| 32 | | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 |
| 33 | | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 34 | | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 |
| 35 | | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 36 | | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 | 0 |
| 37 | | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 38 | | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 39 | | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 0 |
| 40 | Tot_SNS | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| 41 | Avg_SNS_inc | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 42 | Weekly_SNS_inc | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| 43 | Tot_emo | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 |
| 44 | Avg_emo_inc | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 45 | Weekly_emo_inc | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 46 | Tot_pos | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 47 | Avg_pos_inc | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 |
| 48 | Weekly_pos_inc | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 1 |
| 49 | Tot_neg | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 50 | Avg_neg_inc | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 |
| 51 | Weekly_neg_inc | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 |
Forecasting accuracy in terms of MAPE for each forecasting model and algorithm with different explanatory variables. ∗∗∗, ∗∗, and ∗ in the second row of each model denote that the MAPE of the corresponding model is lower than that of (S, MLR) at the significance levels of 0.05, 0.1, and 0.2, respectively. Asterisks in the third, fourth, and fifth rows indicate statistical significance in the same manner against (SC, MLR), (SW, MLR), and (SCW, MLR), respectively.
| Variables | Screening (S) | Screening + competition (SC) | Screening + WOM (SW) | Screening + competition + WOM (SCW) | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| Algorithm | MLR | MLR | MLR | MLR | SVR | GPR | | ML average | Combination |
| Model W1 | 0.8383 | 0.4389 | 0.7482 | 0.3515 | 0.4420 | 0.3503 | 0.3175 | 0.3699 | |
| | | (47.64%) | (10.75%) | (58.07%) | (47.28%) | (58.21%) | (62.13%) | (55.87%) | (62.77%) |
| | | | | (19.92%) | (−0.70%) | (20.19%) | (27.67%) | (15.72%) | (28.89%) |
| | | | | (53.02%) | (40.93%) | (53.18%) | (57.57%) | (50.56%) | (58.29%) |
| | | | | | (−25.75%) | (0.33%) | (9.68%) | (−5.25%) | (11.20%) |
| | | | | | | | | | (15.63%) |
| Model W2 | 0.8391 | 0.6245 | 0.5325 | 0.4616 | 0.3343 | 0.2975 | 0.3731 | 0.3350 | |
| | | (25.58%) | (36.54%) | (44.99%) | (60.16%) | (64.54%) | (55.54%) | (60.08%) | (60.98%) |
| | | | | (26.08%) | (46.46%) | (52.35%) | (40.25%) | (46.36%) | (47.57%) |
| | | | | (13.30%) | (37.21%) | (44.12%) | (29.93%) | (37.09%) | (38.51%) |
| | | | | | (27.58%) | (35.55%) | (19.18%) | (27.44%) | (29.08%) |
| | | | | | | | | | (2.26%) |
| Model T | 0.5501 | 0.4702 | 0.4583 | 0.3681 | 0.2666 | 0.2807 | 0.3123 | 0.2865 | |
| | | (14.54%) | (16.69%) | (33.10%) | (51.54%) | (48.98%) | (43.23%) | (47.92%) | (52.76%) |
| | | | | (21.72%) | (43.30%) | (40.30%) | (33.58%) | (39.06%) | (44.72%) |
| | | | | (19.70%) | (41.83%) | (38.75%) | (31.86%) | (37.48%) | (43.29%) |
| | | | | | (27.56%) | (23.73%) | (15.15%) | (22.15%) | (29.39%) |
| | | | | | | | | | (9.30%) |
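The parenthesized percentages in the table are consistent with a relative MAPE reduction against a baseline, i.e. (baseline − model) / baseline. The sketch below recomputes a few of them from the first block of MAPE values. It assumes the first four values are the MLR results for S, SC, SW, and SCW in that order, and `improvement` is a hypothetical helper name.

```python
from statistics import mean


def improvement(baseline_mape, model_mape):
    """Relative MAPE reduction against a baseline, in percent."""
    return 100.0 * (baseline_mape - model_mape) / baseline_mape


# First block of the table: MLR with S, SC, SW, and SCW (assumed column order).
s_mlr, sc_mlr, sw_mlr, scw_mlr = 0.8383, 0.4389, 0.7482, 0.3515

print(f"SC  vs S: {improvement(s_mlr, sc_mlr):.2f}%")
print(f"SW  vs S: {improvement(s_mlr, sw_mlr):.2f}%")
print(f"SCW vs S: {improvement(s_mlr, scw_mlr):.2f}%")

# The "ML average" column matches the mean of the three preceding MAPE values.
ml_average = mean([0.4420, 0.3503, 0.3175])
print(f"ML average: {ml_average:.4f}")
```

Under this reading, adding competition variables to the screening-only model cuts the first-block MAPE by about 47.6%, while adding WOM variables alone cuts it by about 10.7%.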
Figure 3: Performance improvements in two different scenarios for three forecasting models (MAPE: bar chart, left y-axis; improvement: line graph, right y-axis).