Qiujun Lan1, Qingyue Xiong1, Linjie He2, Chaoqun Ma1.
Abstract
Predicting and analyzing investor behavior is of great value to financial institutions. This paper uses survey data from about 9,000 individual investors across China to explore how well decision behaviors can be predicted from demographic characteristics that are relatively easy to obtain. After applying Pearson's chi-squared test, the Spearman rank correlation test, and several data mining methods, we verified that demographic characteristics are closely linked to decision behaviors. Building initial behavioral prediction models from demographic data is therefore an economical and feasible solution for financial organizations, especially when investors' behavioral data are insufficient.
Year: 2018 PMID: 30092101 PMCID: PMC6085059 DOI: 10.1371/journal.pone.0201916
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Geographical distribution of survey respondents.
| Region | Percent | Region | Percent | Region | Percent |
|---|---|---|---|---|---|
| Guangdong | 15.29% | Guangxi | 3.08% | Yunnan | 0.99% |
| Shanghai | 11.06% | Tianjin | 2.98% | Guizhou | 0.52% |
| Shandong | 8.25% | Anhui | 2.87% | Xinjiang | 0.47% |
| Beijing | 7.60% | Liaoning | 2.86% | Neimenggu | 0.45% |
| Zhejiang | 7.14% | Hunan | 2.02% | Hainan | 0.25% |
| Jiangsu | 6.57% | Shanxi | 1.88% | Gansu | 0.22% |
| Hebei | 4.38% | Chongqing | 1.77% | Ningxia | 0.15% |
| Hubei | 3.57% | Jiangxi | 1.76% | Qinghai | 0.04% |
| Fujian | 3.37% | Shanxi | 1.52% | Unknown | 0.22% |
| Sichuan | 3.27% | Heilongjiang | 1.18% | — | — |
| Henan | 3.25% | Jilin | 1.02% | — | — |
Fig 1. Distribution of demographic characteristic variables.
Fig 2. Distribution of investment behaviors.
Variable description.
| Variable | Value codes | Description |
|---|---|---|
| Gender | 1 (Man); 2 (Woman) | |
| Age | 1 (25 down); 2 (26–30); 3 (31–40); 4 (41–50); 5 (51–60); 6 (60 up) | Years |
| Occupation | 1 (Government agencies); 2 (Public institutions); 3 (Corporates); 4 (Self-employed); 5 (Students); 6 (Freelancers); 7 (Retirees); 8 (Other) | |
| Education | 1 (6 down); 2 (6–9); 3 (10–12); 4 (13–15); 5 (16–18); 6 (18 up) | Years of education |
| Knowledge | 1 (Poor); 2 (Moderate); 3 (Good); 4 (Excellent) | Investment knowledge and skill |
| Experience | 1 (2 down); 2 (2–5); 3 (6–10); 4 (11–15); 5 (15 up) | Years engaged in financial investment |
| Income | 1 (2000 down); 2 (2000–5000); 3 (5001–8000); 4 (8001–12000); 5 (12001–16000); 6 (16001–20000); 7 (20000–25000); 8 (25000 up) | Monthly income (yuan) |
| Investment scale | 1 (40% down); 2 (40% up) | Proportion of investment to disposable funds |
| Investment instrument | 1 (Stock); 2 (Other) | Preferred investment instrument |
| Transaction frequency | 1 (Low, ≤10); 2 (High, >10) | Trades per year |
| Decision-making style | 1 (Decisive); 2 (Cautious) | |
| Information channel | 1 (Financial website, official website); 2 (Other) | |
Fig 3. Supervised classification.
Confusion matrix.
| True value | Predicted positive | Predicted negative |
|---|---|---|
| Positive | true positive (tp) | false negative (fn) |
| Negative | false positive (fp) | true negative (tn) |
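The four cells of the confusion matrix are the usual starting point for evaluating a binary classifier. The sketch below computes accuracy, precision, recall, and F1 from them using the standard textbook definitions. The record does not state which of these metrics the paper reports, and the counts used here are illustrative, not taken from the survey.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Guard against empty denominators when a class is never predicted or never occurs.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (assumption, not survey data).
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall are defined here for the positive class; swapping the roles of the two classes gives the corresponding values for the negative class.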
Results of Pearson’s chi-squared test.
| | Scale | Instrument | Frequency | Style | Channel |
|---|---|---|---|---|---|
| Stag1 | 15.00*** | 43.68*** | 82.51*** | 13.54*** | 0.09 |
| Stag2 | 25.72*** | 87.10*** | 111.33*** | 6.58** | 2.26 |
| Stag1 | 101.63*** | 215.93*** | 286.11*** | 56.72*** | 6.68 |
| Stag2 | 91.93*** | 241.60*** | 338.43*** | 76.02*** | 16.51*** |
| Stag1 | 102.28*** | 102.23*** | 187.92*** | 17.55** | 12.05* |
| Stag2 | 80.88*** | 72.19*** | 198.83*** | 9.39 | 24.48*** |
| Stag1 | 5.86 | 19.37*** | 32.84*** | 11.88** | 11.82** |
| Stag2 | 13.20** | 29.04*** | 66.85*** | 25.25*** | 11.30** |
| Stag1 | 188.22*** | 173.23*** | 300.43*** | 64.79*** | 11.04** |
| Stag2 | 189.52*** | 136.19*** | 263.11*** | 80.27*** | 7.58* |
| Stag1 | 298.37*** | 467.21*** | 725.21*** | 40.12*** | 14.32*** |
| Stag2 | 337.71*** | 476.05*** | 965.79*** | 29.16*** | 14.66*** |
| Stag1 | 219.02*** | 228.98*** | 576.24*** | 66.15*** | 11.34 |
| Stag2 | 270.12*** | 251.87*** | 585.49*** | 49.31*** | 9.64 |
Note: a chi-squared statistic is marked with ** when it is significant at the 0.05 level and with *** when it is significant at the 0.01 level.
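For readers who want to see how a statistic of this kind is obtained, the following is a minimal sketch of Pearson's chi-squared statistic for a contingency table of one demographic variable against one behavior variable. It uses only the textbook formula, without a continuity correction, and does not compute a p-value. The counts are invented for illustration, and the source does not say which software the authors used.

```python
def chi_squared_statistic(table):
    """Pearson's chi-squared statistic and degrees of freedom.

    `table` is a list of rows of observed counts (a contingency table).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical 2x2 table: rows = gender, columns = transaction frequency.
observed = [[30, 20],
            [20, 30]]
stat, dof = chi_squared_statistic(observed)
print(f"chi2 = {stat:.2f}, dof = {dof}")
```

The statistic is then compared against the chi-squared distribution with `dof` degrees of freedom to obtain the significance levels marked in the table.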
Spearman rank correlation coefficients.
| | Scale | Frequency | Style |
|---|---|---|---|
| Stag1 | 0.091*** | 0.234*** | -0.088*** |
| Stag2 | 0.086*** | 0.233*** | -0.087*** |
| Stag1 | 0.020*** | 0.091*** | 0.044*** |
| Stag2 | 0.038*** | 0.112*** | 0.056*** |
| Stag1 | 0.195*** | 0.248*** | -0.048*** |
| Stag2 | 0.184*** | 0.175*** | 0.031** |
| Stag1 | 0.281*** | 0.441*** | -0.054*** |
| Stag2 | 0.265*** | 0.447*** | 0.004 |
| Stag1 | 0.232*** | 0.394*** | -0.019 |
| Stag2 | 0.230*** | 0.346*** | 0.013*** |
Note: a correlation coefficient is marked with ** when it is significant at the 0.05 level and with *** when it is significant at the 0.01 level.
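The Spearman coefficient is the Pearson correlation of the ranks of two variables, which makes it suitable for ordinal codes such as those in the variable description table. The sketch below is one plain-Python way to compute it, assigning average ranks to ties. The helper names and the sample codes are illustrative assumptions and are not taken from the paper.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ordinal codes for five respondents.
experience = [1, 2, 3, 4, 5]
frequency = [5, 6, 7, 8, 7]
rho = spearman_rho(experience, frequency)
print(f"rho = {rho:.3f}")
```

This sketch assumes neither variable is constant, since a constant variable has zero rank variance and the coefficient is then undefined.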
Results of data mining models.
| Style 1 | Style 2 | Style Accu | Instrument 1 | Instrument 2 | Instrument Accu | Frequency 1 | Frequency 2 | Frequency Accu | Channel 1 | Channel 2 | Channel Accu | Scale 1 | Scale 2 | Scale Accu |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.45 | 0.69 | 0.63 | 0.68 | 0.63 | 0.65 | 0.67 | 0.72 | 0.70 | 0.62 | 0.48 | 0.56 | 0.58 | 0.67 | 0.61 |
| 0.34 | 0.78 | | 0.51 | 0.77 | | 0.73 | 0.66 | | 0.62 | 0.48 | | 0.76 | 0.47 | |
| 0.39 | 0.73 | | 0.59 | 0.69 | | 0.70 | 0.69 | | 0.62 | 0.48 | | 0.66 | 0.55 | |
| 0.49 | 0.70 | 0.64 | 0.66 | 0.63 | 0.65 | 0.68 | 0.74 | 0.71 | 0.60 | 0.48 | 0.55 | 0.63 | 0.63 | 0.65 |
| 0.36 | 0.80 | | 0.51 | 0.76 | | 0.75 | 0.67 | | 0.61 | 0.47 | | 0.75 | 0.49 | |
| 0.42 | 0.74 | | 0.58 | 0.69 | | 0.72 | 0.70 | | 0.61 | 0.47 | | 0.69 | 0.55 | |
| 0.48 | 0.68 | 0.63 | 0.67 | 0.66 | 0.66 | 0.71 | 0.68 | 0.69 | 0.57 | 0.53 | 0.55 | 0.66 | 0.62 | 0.64 |
| 0.35 | 0.79 | | 0.53 | 0.78 | | 0.72 | 0.67 | | 0.62 | 0.47 | | 0.75 | 0.50 | |
| 0.40 | 0.73 | | 0.59 | 0.71 | | 0.71 | 0.67 | | 0.59 | 0.50 | | 0.70 | 0.55 | |
| 0.57 | 0.58 | 0.58 | 0.67 | 0.63 | 0.65 | 0.66 | 0.76 | 0.71 | 0.56 | 0.47 | 0.52 | 0.64 | 0.62 | 0.63 |
| 0.32 | 0.79 | | 0.51 | 0.77 | | 0.76 | 0.66 | | 0.59 | 0.44 | | 0.75 | 0.49 | |
| 0.41 | 0.67 | | 0.58 | 0.69 | | 0.70 | 0.71 | | 0.57 | 0.45 | | 0.69 | 0.55 | |
| 0.53 | 0.61 | 0.63 | 0.69 | 0.62 | 0.65 | 0.70 | 0.71 | 0.71 | 0.58 | 0.48 | 0.54 | 0.61 | 0.65 | 0.62 |
| 0.55 | 0.59 | | 0.51 | 0.78 | | 0.74 | 0.67 | | 0.60 | 0.45 | | 0.76 | 0.48 | |
| 0.54 | 0.60 | | 0.59 | 0.69 | | 0.72 | 0.69 | | 0.59 | 0.47 | | 0.67 | 0.55 | |
| 0.54 | 0.61 | 0.59 | 0.69 | 0.64 | 0.66 | 0.68 | 0.73 | 0.71 | 0.63 | 0.47 | 0.56 | 0.62 | 0.66 | 0.63 |
| 0.33 | 0.79 | | 0.52 | 0.78 | | 0.74 | 0.67 | | 0.62 | 0.48 | | 0.76 | 0.49 | |
| 0.41 | 0.69 | | 0.60 | 0.70 | | 0.71 | 0.70 | | 0.62 | 0.47 | | 0.68 | 0.56 | |
Importance of DC to IB variables.
| IB variable | Importance of DC variables |
|---|---|
| | Experience (0.49) > Income (0.21) > Occupation (0.10) > Age (0.05) ≥ Gender (0.05) ≥ Knowledge (0.05) ≥ Education (0.05) |
| | Experience (0.59) > Income (0.13) > Occupation (0.09) > Age (0.08) > Gender (0.05) > Education (0.04) > Knowledge (0.02) |
| | Experience (0.59) > Income (0.20) > Knowledge (0.06) > Age (0.05) > Gender (0.04) > Education (0.04) ≥ Occupation (0.03) |
| | Knowledge (0.24) > Income (0.20) > Age (0.16) > Education (0.13) > Occupation (0.09) ≥ Gender (0.09) > Experience (0.08) |
| | Knowledge (0.25) > Experience (0.22) > Occupation (0.12) ≥ Income (0.12) > Age (0.11) ≥ Gender (0.11) > Education (0.07) |
Instances of predictive rule from C&R.
| IB variable | No. | Rule | Confidence |
|---|---|---|---|
| | 1 | | 0.68 |
| | 2 | | 0.69 |
| | 3 | | 0.81 |
| | 4 | | 0.90 |
| | 5 | | 0.83 |
| | 6 | | 0.86 |
| | 7 | | 0.83 |
| | 8 | | 0.70 |
| | 9 | | 0.66 |
| | 10 | | 0.75 |
Benefits of marketing with and without prediction.
| Marketing without prediction | Marketing with prediction | |
|---|---|---|
| Target Number | 100,000 | 25,000 |
| Promotion Cost | 1,000,000 | 250,000 |
| Number of Responders | 35,180×10% + 64,820×1% = 4,166 | 17,250×10%+7,750×1% = 1,802 |
| Revenue | 4,166×250 = 1,041,500 | 1,802×250 = 450,500 |
| Modeling Cost | 0 | 40,000 |
| Profit | 41,500 | 160,500 |
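The arithmetic in the table can be checked with a short sketch. The group names `likely` and `unlikely` are hypothetical labels for the two response groups, which the table does not name. The cost of 10 per contact is inferred from the table (1,000,000 / 100,000), and the response rates (10% and 1%), revenue per responder (250), and modeling cost (40,000) are read directly from it. Responders are truncated to whole numbers, which matches the table's figures.

```python
def campaign_profit(likely, unlikely, modeling_cost=0,
                    cost_per_contact=10, revenue_per_responder=250,
                    likely_rate_pct=10, unlikely_rate_pct=1):
    """Profit of a campaign that contacts two groups with different response rates.

    Rates are integer percentages so the arithmetic stays exact.
    """
    targets = likely + unlikely
    promotion_cost = targets * cost_per_contact
    # Integer division truncates fractional responders (e.g. 1,802.5 -> 1,802).
    responders = (likely * likely_rate_pct + unlikely * unlikely_rate_pct) // 100
    revenue = responders * revenue_per_responder
    profit = revenue - promotion_cost - modeling_cost
    return {"targets": targets, "promotion_cost": promotion_cost,
            "responders": responders, "revenue": revenue, "profit": profit}


# Figures taken from the table.
without_prediction = campaign_profit(likely=35_180, unlikely=64_820)
with_prediction = campaign_profit(likely=17_250, unlikely=7_750,
                                  modeling_cost=40_000)
print(without_prediction["profit"], with_prediction["profit"])
```

Under these figures the predicted campaign contacts a quarter of the targets and reaches fewer responders overall, yet earns more profit because a much larger share of the people it contacts belongs to the high-response group.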