Yi Liao, Yuxuan Peng, Songlin Shi, Victor Shi, Xiaohong Yu.
Abstract
Artificial intelligence has been increasingly employed to improve operations for various firms and industries. In this study, we construct a box office revenue prediction system for a film at its early stage of production, which can help management overcome resource allocation challenges considering the significant investment and risk for the whole film production. In this research, we focus on China's film market, the second-largest box office in the world. Our model is based on data regarding the nature of a film itself without word-of-mouth data from social platforms. Combining extreme gradient boosting, random forest, light gradient boosting machine, k-nearest neighbor algorithm, and stacking model fusion theory, we establish a stacking model for film box office prediction. Our empirical results show that the model exhibits good prediction accuracy, with its 1-Away accuracy being 86.46%. Moreover, our results show that star influence has the strongest predictive power in this model.
© Springer Science+Business Media, LLC, part of Springer Nature 2020.
Keywords: Artificial intelligence; Box office forecast; Film industry; Machine learning; Predictive model; Stacking fusion model
Year: 2020; PMID: 33041415; PMCID: PMC7537957; DOI: 10.1007/s10479-020-03804-4
Source DB: PubMed; Journal: Ann Oper Res; ISSN: 0254-5330; Impact factor: 4.854
Fig. 1 The number of movie-goers in urban cinemas in China (2008–2019)
Fig. 2 Film cycle and prediction type
The contributing factors and their predictive effectiveness for box office at different stages
| Time | Features | Predictive effectiveness |
|---|---|---|
| Pre-production prediction | Uses only the nature of the film itself: release date, type, content, star value, sequel, and duration | Fewer features give lower prediction accuracy, but this is the earliest prediction period, so the results have the highest practical application value |
| Pre-release prediction | Adds social media data, search platform data, etc. to the characteristics of the film itself | Prediction accuracy is higher than in pre-production prediction; it can guide operational decision-making for cinemas but has little value for early investment decisions |
| Post-release prediction | Adds a large amount of theatre data, heat index, and audience comment information to the pre-release features | Contains the most information and gives the best predictive effectiveness, but the application value of the results is very low |
Top 10 box office films in the Chinese market in 2019
| Rank | English Title | Box office (unit: 100 million RMB) | Country of origin | Genre |
|---|---|---|---|---|
| 1 | Ne Zha | 49.34 | China | Comedy/Cartoon |
| 2 | The Wandering Earth | 46.18 | China | Fantasy |
| 3 | Avengers: Endgame | 42.05 | US | Action/Adventure |
| 4 | My People, My Country | 31.46 | China | Drama |
| 5 | The Captain | 28.84 | China | Drama |
| 6 | Crazy Alien | 21.83 | China | Comedy/Fantasy |
| 7 | Pegasus | 17.03 | China | Comedy/Action |
| 8 | The Bravest | 16.76 | China | Drama |
| 9 | Better Days | 15.32 | China | Drama/Romance |
| 10 | Hobbs and Shaw | 14.18 | US | Action/Crime |
Data description
| Variable | Type | Description | Data source |
|---|---|---|---|
| Title | Character string | Title of film | Movie Box Office Database |
| Actor 1/2/3 | Character string | Name of top 3 actors/actresses | Movie Box Office Database; Douban Movie |
| Director | Character string | Name of the main director | Movie Box Office Database |
| Actor 1/2/3 microblog fans | Value | Actor/actress’s number of microblog fans | Microblog |
| Release area | Category | Category including ‘Chinese mainland’, ‘Hong Kong’, ‘Taiwan’ | Movie Box Office Database; Douban Movie |
| Release date | Date | Film release date | Movie Box Office Database; Douban Movie |
| Genre | Category | Including 18 categories | Movie Box Office Database; Douban Movie |
| Actor (director) awards | Character string | Golden Rooster Awards, Golden Horse Awards and Hong Kong Film Awards for actors (directors) | Baidu |
Genre classification
| No. | Genre |
|---|---|
| 1 | Romance |
| 2 | Action |
| 3 | Crime |
| 4 | Thriller |
| 5 | Fantasy |
| 6 | Mystery |
| 7 | Sport |
| 8 | War |
| 9 | Literary adaptation |
| 10 | Adventure |
| 11 | Ancient history |
| 12 | History |
| 13 | Family |
| 14 | Drama |
| 15 | Comedy |
| 16 | Music |
| 17 | Cartoon |
| 18 | Documentary |
The feature set of domestic box office prediction system
| Factor | Feature number | Feature | Data type | Feature description |
|---|---|---|---|---|
| Genre | 1 | Dynamic influence of the genre | Continuous | Average box office of films of this genre in the past year |
| Star value | 2–22 | Dynamic star value | Continuous | Total box office, average box office, highest box office, lowest box office, and number of films performed in or directed, for the director and the top three major actors over the past 10 years; plus the sum of the average box office of the top three actors and the director |
| Star value | 23–30 | Static star value | Continuous | Number of microblog fans of the three actors; number of Golden Rooster Awards, Golden Horse Awards and Hong Kong Film Awards won by the director and the top three major actors; plus the sum of the three actors’ microblog fans |
| Release date | 31 | Film release year | Discrete | Film release year |
| Release date | 32 | Film release schedule | Discrete | Based on archive data, films released during Spring Festival, on National Day, during summer holidays and on normal days are coded 4, 3, 2 and 1, respectively |
| Release area | 33–35 | Release area | Discrete | ‘Mainland’, ‘Hong Kong’ and ‘Taiwan’ region codes |
| Sequel | 36 | Whether a sequel | Discrete | Is a sequel (assigned 1) or is not a sequel (assigned 0) |
| Sequel | 37 | Sequel index | Continuous | Box office income of parent film (unit: 10,000 yuan) |
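The release-date, release-area and sequel rows of the feature table can be illustrated with a minimal encoding sketch. The function name, dictionary keys and schedule labels are hypothetical; the one-hot layout for the three region codes and the value 0.0 for the sequel index of a non-sequel are assumptions, since the table does not spell them out.

```python
# Schedule codes as listed in the feature table (feature 32).
SCHEDULE_CODES = {
    "spring_festival": 4,
    "national_day": 3,
    "summer_holiday": 2,
    "normal": 1,
}
AREAS = ("Mainland", "Hong Kong", "Taiwan")


def encode_release_features(year, schedule, areas, parent_box_office=None):
    """Encode features 31-37 for one film.

    parent_box_office is the parent film's income in units of 10,000 yuan,
    or None when the film is not a sequel.
    """
    features = {
        "release_year": year,
        "release_schedule": SCHEDULE_CODES[schedule],
    }
    # Assumption: one 0/1 indicator per release area (features 33-35).
    for area in AREAS:
        features["area_" + area] = int(area in areas)
    features["is_sequel"] = int(parent_box_office is not None)
    # Assumption: a non-sequel gets a sequel index of 0.0.
    features["sequel_index"] = 0.0 if parent_box_office is None else float(parent_box_office)
    return features
```

For example, `encode_release_features(2019, "spring_festival", {"Mainland"})` yields a schedule code of 4, a Mainland indicator of 1, and a sequel flag of 0.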
Category labels corresponding to film revenue range
| Class | Revenue range (unit: 10,000 yuan) |
|---|---|
| (Blockbuster) G | > 20,000 |
| F | 8000–20,000 |
| E | 2000–8000 |
| D | 800–2000 |
| C | 200–800 |
| B | 50–200 |
| (Flop) A | < 50 |
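The revenue ranges above turn box office prediction into a seven-class problem. A minimal sketch of the labelling step follows; the function name is hypothetical. Class A is strictly below 50 and class G strictly above 20,000, as the table states, while assigning the intermediate boundary values (200, 800, 2000, 8000) to the higher class is an assumption, because the table does not say which side owns them.

```python
from bisect import bisect_right

# Class edges in units of 10,000 yuan, taken from the table.
INNER_EDGES = [50, 200, 800, 2000, 8000]
LOWER_LABELS = "ABCDEF"


def revenue_class(revenue_wan):
    """Map box office revenue (unit: 10,000 yuan) to a class label A-G."""
    if revenue_wan > 20000:
        return "G"  # blockbuster: strictly greater than 20,000
    # Assumption: a value exactly on an inner edge goes to the higher class.
    return LOWER_LABELS[bisect_right(INNER_EDGES, revenue_wan)]
```

As a usage example, a film grossing 49.34 hundred million RMB is 493,400 in units of 10,000 yuan, so `revenue_class(493400)` returns `"G"`.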
Fig. 3 Training method of the stacking model with fivefold sample data set
Fig. 4 Feature contribution analysis of various algorithms: (a) LightGBM algorithm; (b) XGBoost algorithm; (c) RF algorithm
Fig. 5 Box office prediction method based on model fusion in the framework of stacking
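The fivefold stacking scheme named in the captions of Fig. 3 and Fig. 5 can be sketched in plain Python. This is an illustration of the general technique, not the authors' implementation: the paper's base learners are XGBoost, random forest, LightGBM and KNN, whereas the sketch uses two toy stand-ins (a 1-nearest-neighbour rule and a nearest-centroid rule), and the lookup-table meta-learner, the unshuffled folds and all function names are assumptions.

```python
from collections import Counter


def k_fold_indices(n, k=5):
    """Split range(n) into k disjoint folds (deterministic, no shuffling)."""
    return [list(range(n))[i::k] for i in range(k)]


def out_of_fold_predictions(fit, X, y, k=5):
    """Predict every sample with a model trained on the other k-1 folds."""
    oof = [None] * len(X)
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            oof[i] = predict(X[i])
    return oof


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fit_1nn(X, y):
    """Toy base learner: label of the single nearest training sample."""
    return lambda x: y[min(range(len(X)), key=lambda i: sq_dist(X[i], x))]


def fit_centroid(X, y):
    """Toy base learner: label of the nearest class mean."""
    counts, sums = Counter(y), {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for d, value in enumerate(row):
            acc[d] += value
    means = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
    return lambda x: min(means, key=lambda lab: sq_dist(means[lab], x))


def fit_lookup(Z, y):
    """Toy meta-learner: most frequent true label per base-prediction tuple."""
    votes = {}
    for z, label in zip(Z, y):
        votes.setdefault(tuple(z), Counter())[label] += 1
    default = Counter(y).most_common(1)[0][0]

    def predict(z):
        seen = votes.get(tuple(z))
        return seen.most_common(1)[0][0] if seen else default

    return predict


def fit_stacking(base_fits, X, y, k=5):
    # Level 1: out-of-fold predictions become the meta-learner's inputs.
    columns = [out_of_fold_predictions(fit, X, y, k) for fit in base_fits]
    meta = fit_lookup([list(row) for row in zip(*columns)], y)
    # Refit each base learner on all data for use at prediction time.
    bases = [fit(X, y) for fit in base_fits]
    return lambda x: meta([base(x) for base in bases])


# Toy one-feature data set with two of the seven revenue classes.
X = [(float(i),) for i in range(20)]
y = ["A"] * 10 + ["G"] * 10
model = fit_stacking([fit_1nn, fit_centroid], X, y)
```

The point of the out-of-fold step is that the meta-learner never sees a base prediction made by a model that was trained on that same sample, which limits leakage from the first level into the second.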
Comparison of classification accuracy of different models
| Performance evaluation | KNN | Random forest | XGBoost | LightGBM | Stacking model |
|---|---|---|---|---|---|
| Count (Bingo) | 143 | 225 | 226 | 222 | 240 |
| Count (1-Away) | 224 | 287 | 286 | 290 | 300 |
| APHR (Bingo) | 41.21% | 64.84% | 65.13% | 63.98% | 69.16% |
| APHR (1-Away) | 64.55% | 82.71% | 82.42% | 83.57% | 86.46% |
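The APHR (average percent hit rate) figures follow directly from the counts. As a sketch of the definition, with the classes A to G numbered 1 to 7, $y_i$ the true class and $\hat{y}_i$ the predicted class of test film $i$ (notation introduced here, not taken from the source):

$$
\mathrm{APHR}_{\text{Bingo}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[\hat{y}_i = y_i\right],
\qquad
\mathrm{APHR}_{\text{1-Away}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\left[\lvert \hat{y}_i - y_i \rvert \le 1\right]
$$

The test-set size $n = 347$ is not stated in the table; it is inferred from the reported counts and percentages. For the stacking model, $240/347 \approx 69.16\%$ and $300/347 \approx 86.46\%$; for KNN, $143/347 \approx 41.21\%$ and $224/347 \approx 64.55\%$.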
Prediction accuracy matrix of the stacking model for film box office prediction
| Predict | A | B | C | D | E | F | G | APHR (Bingo) (%) | APHR (1-Away) (%) |
|---|---|---|---|---|---|---|---|---|---|
| A | 92 | 15 | 4 | 0 | 1 | 1 | 1 | 80.70 | 93.86 |
| B | 4 | 22 | 6 | 0 | 1 | 0 | 0 | 66.67 | 96.97 |
| C | 1 | 7 | 48 | 3 | 5 | 4 | 0 | 70.59 | 85.29 |
| D | 0 | 0 | 2 | 5 | 1 | 1 | 0 | 55.56 | 88.89 |
| E | 0 | 1 | 1 | 3 | 18 | 6 | 3 | 56.25 | 84.38 |
| F | 0 | 1 | 0 | 1 | 4 | 9 | 2 | 52.94 | 88.24 |
| G | 0 | 1 | 3 | 6 | 11 | 7 | 46 | 62.16 | 71.62 |
| AVG | | | | | | | | 69.16 | 86.46 |
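The percentages in the accuracy matrix can be recomputed from its counts. The sketch below is one way to do so; the function names are hypothetical. Each row's APHR uses that row's total as the denominator, which matches the percentages printed in the table, and the AVG row is the overall hit rate across all 347 films rather than a mean of the row percentages.

```python
LABELS = ["A", "B", "C", "D", "E", "F", "G"]

# Counts copied from the matrix above, one list per table row.
MATRIX = [
    [92, 15, 4, 0, 1, 1, 1],
    [4, 22, 6, 0, 1, 0, 0],
    [1, 7, 48, 3, 5, 4, 0],
    [0, 0, 2, 5, 1, 1, 0],
    [0, 1, 1, 3, 18, 6, 3],
    [0, 1, 0, 1, 4, 9, 2],
    [0, 1, 3, 6, 11, 7, 46],
]


def row_hits(matrix, i, tolerance):
    """Count entries in row i at most `tolerance` classes off the diagonal."""
    return sum(c for j, c in enumerate(matrix[i]) if abs(i - j) <= tolerance)


def per_class_aphr(matrix, tolerance):
    """Hit rate (%) of each row; tolerance 0 is Bingo, 1 is 1-Away."""
    return [
        100.0 * row_hits(matrix, i, tolerance) / sum(row)
        for i, row in enumerate(matrix)
    ]


def overall_aphr(matrix, tolerance):
    """Hit rate (%) over all samples in the matrix."""
    hits = sum(row_hits(matrix, i, tolerance) for i in range(len(matrix)))
    return 100.0 * hits / sum(map(sum, matrix))


bingo = round(overall_aphr(MATRIX, 0), 2)     # 69.16
one_away = round(overall_aphr(MATRIX, 1), 2)  # 86.46
```

The weakest 1-Away row is G at 71.62%: 21 of its 74 films land two or more classes away from G.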
Comparison of model results
| Model | Features | Period | APHR (Bingo) (%) |
|---|---|---|---|
| Stacking model (in this paper) | New features | Pre-production | 69.16 |
| Hybrid model (Delen and Sharda) | Old features | Pre-release | 56.07 |
| MLP model (Quader et al.) | Old features | Pre-release | 58.50 |