Wonju Seo¹, Namho Kim¹, Sang-Kyu Lee², Sung-Min Park¹.
Abstract
BACKGROUND AND AIMS: Problem gambling among adolescents has recently attracted attention because gambling is easily accessible in online environments and has serious effects on adolescents' lives. We proposed a machine learning-based analysis method for predicting the degree of problem gambling.
Keywords: adolescents; feature engineering; machine learning-based analysis method; problem gambling
Year: 2020 PMID: 33011712 PMCID: PMC8943669 DOI: 10.1556/2006.2020.00063
Source DB: PubMed Journal: J Behav Addict ISSN: 2062-5871 Impact factor: 6.756
Fig. 1. Framework of the proposed machine learning-based analysis method. When training the models, a grid search with 5-fold stratified cross-validation was used to tune the hyperparameters of each model.
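The tuning step in Fig. 1 can be sketched in plain Python as below. This is not the authors' code: the stand-in model is a hypothetical k-nearest-neighbors classifier (the study tuned RF, SVM, ET and RR), the tuned parameter `n_neighbors`, the helper names and the accuracy scoring are assumptions, and the data are synthetic. What it does show is the technique named in the caption: folds are built per class so each keeps the overall class ratio, and every candidate in the grid is scored by its mean over the 5 folds.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that each keep roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's indices round-robin so every fold receives its share.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_X, train_y, x, n_neighbors):
    """Hypothetical stand-in model: majority vote among the n nearest training points."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = [train_y[i] for i in order[:n_neighbors]]
    return max(set(votes), key=votes.count)


def grid_search_cv(X, y, param_grid, k=5, seed=0):
    """Score every candidate by mean accuracy over stratified k-fold CV; return the best."""
    folds = stratified_k_fold(y, k, seed)
    scores = {}
    for n_neighbors in param_grid:
        fold_accuracies = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            train_X = [X[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            correct = sum(
                knn_predict(train_X, train_y, X[i], n_neighbors) == y[i] for i in test_idx
            )
            fold_accuracies.append(correct / len(test_idx))
        scores[n_neighbors] = mean(fold_accuracies)
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic imbalanced data (20% positives), loosely echoing the study's class ratio.
rng = random.Random(42)
X = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(40)]
X += [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(10)]
y = [0] * 40 + [1] * 10
best_k, cv_scores = grid_search_cv(X, y, param_grid=[1, 3, 5])
print(best_k, cv_scores)
```

Stratification matters here because Class 1 is the minority (22.3% of the sampled set): plain random folds could leave a fold with very few positives and make the per-fold scores noisy.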
General characteristics of the study population
| Variables | Sampled set | Class 0 (GPSS/CAGI …) | Class 1 (GPSS/CAGI …) |
| | Frequency(%)/Mean ± 1 SD | Frequency(%)/Mean ± 1 SD | Frequency(%)/Mean ± 1 SD |
| Participants, n | 5,045 | 3,920 | 1,125 |
| Sex | | | |
| Female | 2,467(48.9) | 2,008(51.2) | 459(40.8) |
| Male | 2,578(51.1) | 1,912(48.8) | 666(59.2) |
| Age, years | 15.0 ± 1.4 | 14.9 ± 1.4 | 15.1 ± 1.5 |
| Grade | | | |
| Middle school 1 | 944(18.7) | 758(19.3) | 186(16.5) |
| Middle school 2 | 991(19.6) | 787(20.1) | 204(18.1) |
| Middle school 3 | 1,034(20.5) | 820(20.9) | 214(19.0) |
| High school 1 | 999(19.8) | 785(20.0) | 214(19.0) |
| High school 2 | 1,077(21.3) | 770(19.6) | 307(27.3) |
| Region | | | |
| Capital (Seoul) | 504(10.0) | 421(10.7) | 83(7.4) |
| Metropolitan area | 1,656(32.8) | 1,349(34.4) | 307(27.3) |
| Provinces | 2,885(57.2) | 2,150(54.8) | 735(65.3) |
| Age at gambling onset, years | 12.7 ± 2.4 | 12.8 ± 2.4 | 12.6 ± 2.7 |
| Number of gambling behaviors in the past 3 months | 2.3 ± 1.8 | 2.0 ± 1.5 | 3.3 ± 2.2 |
| | | | |
| | 3,920(77.7) | 3,920(100) | 0(0) |
| | 1,125(22.3) | 0(0) | 1,125(100) |
| | | | |
| No or do not know | 3,883(77.0) | 3,248(82.9) | 635(56.4) |
| Yes | 1,162(23.0) | 672(17.1) | 490(43.6) |
| | | | |
| No or do not know | 4,382(86.9) | 3,462(88.3) | 920(81.8) |
| Yes | 663(13.1) | 458(11.7) | 205(18.2) |
| | | | |
| None | 149(3.0) | 110(2.8) | 39(3.5) |
| Less than $40 | 2,103(41.7) | 1,695(43.2) | 408(36.3) |
| Less than $80 | 1,548(30.7) | 1,227(31.3) | 321(28.5) |
| Approximately $80–$240 | 1,077(21.3) | 791(20.2) | 286(25.4) |
| Approximately $240–$400 | 132(2.6) | 80(2.0) | 52(4.6) |
| Approximately $400–$800 | 26(0.5) | 10(0.3) | 16(1.4) |
| Greater than or equal to $800 | 10(0.2) | 7(0.2) | 3(0.3) |
Amounts were converted from KRW to US dollars ($). Abbreviations: GPSS, Gambling Problem Severity Scale; CAGI, Canadian Adolescent Gambling Inventory.
Fig. 2. Permutation importance of the selected features, calculated with the random forest-based method. Y axis: feature importance. X axis: selected feature number. Blue boxes indicate the mean feature importance; black lines indicate ±1 standard deviation.
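Permutation importance, as plotted in Fig. 2, can be illustrated with the minimal sketch below. It is a generic version of the technique, not the authors' pipeline: the fitted model is a hypothetical rule that only reads feature 0 (standing in for the random forest), the score is assumed to be accuracy, and the data are synthetic. For each feature the column is shuffled several times, and the mean and standard deviation of the resulting drop in score correspond to the box and the ±1 SD line in the figure.

```python
import random
from statistics import mean, pstdev


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """For each feature, return (mean, std) of the accuracy drop after shuffling that column."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    results = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = []
            for row, value in zip(X, column):
                new_row = list(row)
                new_row[j] = value
                X_perm.append(new_row)
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        results.append((mean(drops), pstdev(drops)))
    return results


# Hypothetical fitted model that only looks at feature 0.
def predict(row):
    return int(row[0] > 0.5)


rng = random.Random(7)
X = [(rng.random(), rng.random()) for _ in range(200)]
y = [int(x0 > 0.5) for x0, _ in X]
importances = permutation_importance(predict, X, y)
for j, (m, s) in enumerate(importances):
    print(f"feature {j}: {m:.3f} ± {s:.3f}")
```

Because the toy model ignores feature 1, shuffling it leaves every prediction unchanged and its importance is exactly zero, while shuffling feature 0 causes a large drop. That contrast is what the method uses to rank features.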
Performance metrics of each model on the test set
| Model | AUC | Accuracy (%) | F1 score |
| RF | 0.752 | 71.8 | 0.504 |
| SVM | 0.747 | 71.4 | 0.507 |
| ET | 0.755 | 71.5 | 0.502 |
| RR | 0.753 | 69.9 | 0.495 |
Abbreviations: AUC, area under the curve; ET, extra trees; RR, ridge regression; RF, random forest; SVM, support vector machine.
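The accuracy and F1 columns can be read together with the sketch below, which assumes (the source does not say) that F1 is computed with Class 1 as the positive class. Under that assumption, a trivial classifier that always predicts Class 0 on a set with the sampled class ratio (3,920 vs 1,125) would reach 3,920/5,045 ≈ 77.7% accuracy but an F1 of 0. This is why accuracy near 70% alongside F1 near 0.5 is not contradictory for an imbalanced problem. The helper names are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def f1_score(y_true, y_pred):
    """F1 of the positive class (Class 1); defined as 0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Majority-class baseline on a set with the sampled class ratio (3,920 vs 1,125).
y_true = [0] * 3920 + [1] * 1125
y_majority = [0] * len(y_true)
print(accuracy(y_true, y_majority), f1_score(y_true, y_majority))
```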
Fig. 3. Receiver operating characteristic (ROC) curves of all models: RF (blue), SVM (orange), ET (green), and RR (red).
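An ROC curve such as those in Fig. 3, and the AUC values reported in the metrics table, can be obtained from a model's continuous scores as sketched below. This is a generic illustration under stated assumptions, not the authors' code: the labels and scores are hypothetical, the threshold is swept from the highest score to the lowest (tied scores are consumed together so they yield a single point), and the area is computed with the trapezoidal rule.

```python
def roc_curve(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the decision threshold from high to low."""
    pos = sum(1 for label in y_true if label == 1)
    neg = len(y_true) - pos
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / neg)
        tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2 for k in range(1, len(fpr))
    )


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical model scores
fpr, tpr = roc_curve(labels, scores)
print(list(zip(fpr, tpr)), auc(fpr, tpr))
```

In this toy case three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75. That pairwise reading is a useful way to interpret the study's AUC values of about 0.75 as well.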