Sytske Wiegersma, Maurice Hidajat, Bart Schrieken, Bernard Veldkamp, Miranda Olff.
Abstract
BACKGROUND: Text mining and machine learning are increasingly used in mental health care practice and research, potentially saving time and effort in the diagnosis and monitoring of patients. Previous studies showed that mental disorders can be detected based on text, but they focused on screening for a single predefined disorder instead of multiple disorders simultaneously.
Keywords: automated intake and referral; computerized CBT; mental health disorders; multi-class classification; screening; supervised text classification
Year: 2022 PMID: 35404261 PMCID: PMC9039807 DOI: 10.2196/21111
Source DB: PubMed Journal: JMIR Ment Health ISSN: 2368-7959
Figure 1. Supervised text classification model procedure. In the training phase, the model is trained on labeled feature sets extracted from the input texts. In the prediction phase, the trained model is used to predict labels for new, unlabeled feature sets extracted from the input texts.
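The two phases in this caption can be sketched in a few lines of Python. This is only a minimal illustration: the classifier is a small multinomial naive Bayes chosen as a stand-in (the record does not name the study's classifier here), and the function names and toy texts are hypothetical.

```python
from collections import Counter, defaultdict
import math

def extract_features(text):
    """Turn a text into a bag of lowercase word counts (unigram term frequencies)."""
    return Counter(text.lower().split())

def train(labeled_texts):
    """Training phase: accumulate per-class word counts and class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_texts:
        word_counts[label].update(extract_features(text))
        class_counts[label] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocabulary

def predict(model, text):
    """Prediction phase: return the class with the highest log-probability."""
    word_counts, class_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    features = extract_features(text)
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        class_total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word, count in features.items():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids log(0) for unseen class/word pairs.
            p = (word_counts[label][word] + 1) / (class_total + len(vocabulary))
            score += count * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy data, not taken from the study.
training_data = [
    ("i am afraid of panic attacks", "panic"),
    ("sudden panic and fear in crowds", "panic"),
    ("binge eating and worry about weight", "eating"),
    ("i vomit after eating too much food", "eating"),
]
model = train(training_data)
prediction = predict(model, "fear of another panic attack")
```

Here `train` corresponds to the training phase (labeled feature sets in, model out) and `predict` to the prediction phase (unlabeled feature set in, label out).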
Confusion matrix for the 7-class classifier: comparison of true and predicted class labels for classes A-G.
| True label \ Predicted label | Class A | Class B | Class C | Class D | Class E | Class F | Class G |
| Class A | TP_A | E_A,B | E_A,C | E_A,D | E_A,E | E_A,F | E_A,G |
| Class B | E_B,A | TP_B | E_B,C | E_B,D | E_B,E | E_B,F | E_B,G |
| Class C | E_C,A | E_C,B | TP_C | E_C,D | E_C,E | E_C,F | E_C,G |
| Class D | E_D,A | E_D,B | E_D,C | TP_D | E_D,E | E_D,F | E_D,G |
| Class E | E_E,A | E_E,B | E_E,C | E_E,D | TP_E | E_E,F | E_E,G |
| Class F | E_F,A | E_F,B | E_F,C | E_F,D | E_F,E | TP_F | E_F,G |
| Class G | E_G,A | E_G,B | E_G,C | E_G,D | E_G,E | E_G,F | TP_G |
TP: true positive.
The values on the diagonal (TP) show the correctly predicted class labels. The off-diagonal values (E) show the prediction errors.
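A matrix of this shape can be built by counting (true, predicted) label pairs. The sketch below is illustrative only: the helper name `confusion_matrix` and the three-class toy labels are assumptions, not data from the study.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Return rows of counts: matrix[i][j] = items of class i predicted as class j."""
    pair_counts = Counter(zip(true_labels, predicted_labels))
    return [[pair_counts[(t, p)] for p in classes] for t in classes]

# Hypothetical labels for three classes, for illustration only.
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "B", "C", "C", "C"]
y_pred = ["A", "B", "B", "B", "A", "C", "C"]
matrix = confusion_matrix(y_true, y_pred, classes)

# Diagonal cells are the correct predictions; everything else is an error.
true_positives = [matrix[i][i] for i in range(len(classes))]
errors = sum(sum(row) for row in matrix) - sum(true_positives)
```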
Figure 2. Nested 5-fold cross-validation scheme. The validation strategy consists of an inner and an outer 5-fold cross-validation loop. In the inner loop, an exhaustive parameter grid search is conducted on the development set to select the best combination of parameter settings. The selected model is then tested on the held-out test set from the outer loop to evaluate final model performance. Both loops are iterated 5 times, with each fold used exactly once as the test set (outer loop) or validation set (inner loop).
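The index bookkeeping behind such a nested scheme can be sketched as follows. This is a simplified illustration, not the study's code: the function names are hypothetical, folds are formed by simple interleaving, and there is no shuffling or stratification, which a real run would likely need.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k interleaved folds of near-equal size."""
    return [indices[i::k] for i in range(k)]

def nested_cv_splits(n_samples, k_outer=5, k_inner=5):
    """Yield (outer_fold, test, inner_splits); inner_splits holds (train, validation) pairs."""
    outer_folds = k_fold_indices(list(range(n_samples)), k_outer)
    for o, test in enumerate(outer_folds):
        # Development set: every outer fold except the held-out test fold.
        development = [i for f, fold in enumerate(outer_folds) if f != o for i in fold]
        inner_folds = k_fold_indices(development, k_inner)
        inner_splits = []
        for v, validation in enumerate(inner_folds):
            train = [i for f, fold in enumerate(inner_folds) if f != v for i in fold]
            inner_splits.append((train, validation))
        yield o, test, inner_splits

splits = list(nested_cv_splits(n_samples=25))
```

In each outer iteration, a grid search would fit every parameter combination on `train`, score it on `validation`, and evaluate the selected setting once on `test`.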
Patient and lexical characteristics (N=5863).
| Variable | Addiction (n=197) | Anxiety (n=1100) | Panic (n=1100) | PTSD^a (n=1016) | Somatic (n=1100) | Mood (n=1100) | Eating (n=250) | Total (N=5863) |
| Gender, n (%) | | | | | | | | |
| Female | 18 (9.14) | 362 (32.91) | 394 (35.82) | 498 (49.02) | 500 (45.45) | 265 (24.09) | 180 (72) | 2217 (37.81) |
| Male | 34 (17.26) | 176 (16) | 174 (15.82) | 119 (11.71) | 197 (17.91) | 166 (15.09) | 8 (3.20) | 874 (14.91) |
| Unknown^b | 145 (73.60) | 562 (51.09) | 532 (48.36) | 399 (39.27) | 403 (36.64) | 669 (60.82) | 62 (24.80) | 2772 (47.28) |
| Age (years), mean (SD) | 37.9 (15.0) | 36.5 (14.2) | 36.3 (13.8) | 36.5 (13.1) | 41.2 (11.7) | 39.2 (14.4) | 30.8 (10.0) | 37.7 (13.6) |
| 4DSQ^d score, mean (SD) | | | | | | | | |
| Anxiety | 5.8 (5.3) | 8.0 (5.0) | 11.9 (5.5) | 9.3 (6.3) | 5.8 (4.9) | 6.6 (5.3) | 5.8 (5.6) | 8.1 (5.8) |
| Depression | 3.9 (3.8) | 3.3 (3.1) | 4.1 (3.7) | 4.8 (3.8) | 3.5 (3.1) | 6.3 (3.7) | 4.4 (3.8) | 4.4 (3.7) |
| Distress | 19.0 (8.4) | 19.2 (7.5) | 20.5 (7.6) | 23.6 (6.9) | 21.5 (6.9) | 23.7 (6.8) | 19.1 (8.2) | 21.5 (7.5) |
| Somatization | 10.5 (6.8) | 11.1 (6.6) | 15.3 (6.9) | 14.7 (7.4) | 13.6 (6.7) | 12.6 (6.9) | 12.4 (7.1) | 13.3 (7.1) |
| Level of care, n (%) | | | | | | | | |
| No care | 15 (7.61) | 62 (5.64) | 28 (2.55) | 31 (3.05) | 61 (5.55) | 55 (5) | 13 (5.20) | 265 (4.52) |
| General practice | 46 (23.35) | 198 (18) | 165 (15) | 90 (8.86) | 171 (15.55) | 183 (16.64) | 19 (7.60) | 872 (14.87) |
| Basic: short | 11 (5.58) | 127 (11.55) | 92 (8.36) | 93 (9.15) | 110 (10) | 102 (9.27) | 8 (3.20) | 543 (9.26) |
| Basic: moderate | 4 (2.03) | 90 (8.18) | 69 (6.27) | 41 (4.04) | 84 (7.64) | 34 (3.09) | 7 (2.80) | 329 (5.61) |
| Basic: intensive | 23 (11.68) | 340 (30.91) | 340 (30.91) | 244 (24.02) | 457 (41.55) | 283 (25.73) | 29 (11.60) | 1716 (29.27) |
| Specialist | 98 (49.75) | 283 (25.73) | 406 (36.91) | 517 (50.89) | 217 (19.72) | 443 (40.27) | 174 (69.60) | 2138 (36.47) |
| Lexical characteristics: words (N), mean (SD) | 55.1 (55.0) | 71.7 (69.5) | 68.0 (103.5) | 75.1 (157.0) | 70.9 (74.9) | 65.5 (75.2) | 76.4 (72.4) | 69.9 (98.2) |
^a PTSD: posttraumatic stress disorder.
^b For patients who entered the study through their general practitioner, gender was not registered; gender is therefore unknown for a large group of patients.
^c DIPP: Digitale Indicatiehulp Psychische Problemen (Digital Indication Aid for Mental Health Problems).
^d 4DSQ: Dutch 4D Symptom Questionnaire. For the 4DSQ, trichotomized 5-point scale responses on each subscale are reported (see the study by Terluin et al [27] for the exact scoring method). Scores are considered moderately elevated (>10, >2, >8, >10) or strongly elevated (>20, >5, >12, >20) for distress, depression, anxiety, and somatization, respectively.
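The 4DSQ cut-offs listed in the footnote can be written as a small lookup. This is a hedged sketch: the function name and the "not elevated" label are assumptions, and applying individual-level cut-offs to group means (as done below with the Total column) is for illustration only.

```python
# (moderately elevated, strongly elevated) cut-offs per subscale, from the footnote.
THRESHOLDS = {
    "distress": (10, 20),
    "depression": (2, 5),
    "anxiety": (8, 12),
    "somatization": (10, 20),
}

def elevation(subscale, score):
    """Classify a 4DSQ subscale score using strict 'greater than' cut-offs."""
    moderate, strong = THRESHOLDS[subscale]
    if score > strong:
        return "strongly elevated"
    if score > moderate:
        return "moderately elevated"
    return "not elevated"

# Mean subscale scores for the total sample, copied from the table.
total_means = {"anxiety": 8.1, "depression": 4.4, "distress": 21.5, "somatization": 13.3}
labels = {name: elevation(name, value) for name, value in total_means.items()}
```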
Best parameters selected by exhaustive grid search.
| Parameter | Best value |
| Remove stop words | Yes |
| Minimal x^a | 1 |
| Representation scheme | Unigrams |
| Term weight | Term frequency |
| Select k^b | 470 |
| Regularization parameter | 1 |
^a x: number of documents a feature should be present in.
^b k: number of most informative features selected.
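The feature-extraction settings in this table (stop-word removal, unigrams, raw term frequency, and a minimal document frequency of 1) can be illustrated with a short sketch. The stop-word list, helper names, and example texts are hypothetical; the study's texts are Dutch and its actual stop-word list is not given here. Selecting the k most informative features is left out.

```python
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"i", "the", "a", "and", "of", "to", "am"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words (unigrams only)."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]

def build_vocabulary(texts, min_df=1):
    """Keep unigrams that occur in at least min_df documents."""
    document_frequency = Counter()
    for text in texts:
        document_frequency.update(set(tokenize(text)))
    return sorted(t for t, df in document_frequency.items() if df >= min_df)

def term_frequency_vector(text, vocabulary):
    """Represent a text by the raw count of each vocabulary term."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in vocabulary]

texts = ["I am tired and tired of work", "the fear of panic"]
vocabulary = build_vocabulary(texts, min_df=1)
vectors = [term_frequency_vector(t, vocabulary) for t in texts]
```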
The 50 most informative features (keywords) of the multi-class classifier with the highest chi-square values and significant (P<.05) P values.
| English keyword (Dutch stem) | Chi-square (df) | P value | Addiction^a | Anxiety^a | Eating^a | Mood^a | PTSD^a,b | Panic^a | Somatic^a |
| food (eten) | 437.0 (1) | <.001 | 1 | 18 | | 19 | 20 | 32 | 22 |
| binge (eetbui) | 407.3 (1) | <.001 | 0 | 3 | | 3 | 3 | 0 | 2 |
| fear (angst) | 126.6 (1) | <.001 | 17 | 411 | 25 | 98 | 205 | | 82 |
| eating disorder (eetstoornis) | 100.9 (1) | <.001 | 0 | 1 | | 1 | 3 | 1 | 1 |
| panic attacks (paniekaanvall) | 96.6 (1) | <.001 | 0 | 13 | 2 | 12 | 21 | | 11 |
| to vomit (brak) | 93.1 (1) | <.001 | 0 | 6 | | 0 | 2 | 0 | 4 |
| bulimia (boulimia) | 78.4 (1) | <.001 | 0 | 1 | | 0 | 0 | 0 | 0 |
| eating pattern (eetpatron) | 75.8 (1) | <.001 | 0 | 0 | | 2 | 1 | 0 | 1 |
| weight (gewicht) | 69.9 (1) | <.001 | 0 | 0 | | 4 | 1 | 0 | 3 |
| to throw up (overgev) | 62.2 (1) | <.001 | 2 | 16 | | 0 | 1 | 19 | 4 |
| panic (paniek) | 57.7 (1) | <.001 | 8 | 42 | 4 | 22 | 49 | | 23 |
| eat (eet) | 53.4 (1) | <.001 | 2 | 6 | | 2 | 4 | 7 | 2 |
| drink (drink) | 48.0 (1) | <.001 | | 5 | 2 | 8 | 2 | 9 | 1 |
| eating behavior (eetgedrag) | 44.4 (1) | <.001 | 0 | 0 | | 0 | 0 | 0 | 0 |
| nightmares (nachtmerries) | 42.3 (1) | <.001 | 0 | 7 | 0 | 6 | | 8 | 1 |
| binge (vreetbui) | 40.9 (1) | <.001 | 0 | 0 | | 0 | 0 | 0 | 0 |
| work (werk) | 39.5 (1) | <.001 | 30 | 214 | 26 | 238 | 172 | 232 | |
| past (verled) | 37.4 (1) | <.001 | 5 | 74 | 11 | 65 | | 73 | 47 |
| healthy (gezond) | 36.8 (1) | <.001 | 4 | 21 | | 30 | 17 | 37 | 20 |
| overeating (overet) | 35.6 (1) | <.001 | 0 | 0 | | 0 | 0 | 0 | 0 |
| sense (zin) | 34.5 (1) | <.001 | 20 | 41 | 17 | | 78 | 56 | 103 |
| to lose weight (afvall) | 30.6 (1) | <.001 | 2 | 1 | | 6 | 3 | 3 | 3 |
| eating problems (eetproblem) | 30.3 (1) | <.001 | 0 | 0 | | 0 | 2 | 1 | 2 |
| scared (bang) | 30.1 (1) | <.001 | 13 | 205 | 22 | 65 | 131 | | 54 |
| to attack (aanvall) | 29.5 (1) | <.001 | 2 | 6 | 1 | 8 | 22 | | 7 |
| to compensate (compenser) | 28.3 (1) | <.001 | 0 | 3 | | 0 | 0 | 0 | 0 |
| fat (dik) | 28.2 (1) | <.001 | 0 | 3 | | 2 | 3 | 3 | 1 |
| anxious (angstig) | 27.6 (1) | <.001 | 6 | 152 | 8 | 62 | 102 | | 43 |
| tired (moe) | 27.2 (1) | <.001 | 12 | 66 | 10 | 145 | 88 | 66 | |
| panic attack (paniekaanval) | 27.1 (1) | <.001 | 1 | 2 | 0 | 1 | 3 | | 3 |
| drug (drug) | 26.3 (1) | <.001 | | 5 | 3 | 5 | 4 | 6 | 3 |
| raped (verkracht) | 23.6 (1) | .001 | 1 | 2 | 3 | 0 | | 6 | 4 |
| accident (ongeluk) | 23.0 (1) | .001 | 7 | 26 | 1 | 20 | | 30 | 24 |
| overweight (overgewicht) | 22.9 (1) | .001 | 0 | 1 | | 2 | 1 | 1 | 1 |
| to smoke (blow) | 22.6 (1) | .001 | | 1 | 1 | 0 | 6 | 0 | 0 |
| hyperventilation (hyperventilatie) | 22.5 (1) | .001 | 2 | 3 | 0 | 2 | 4 | | 7 |
| tired (vermoeid) | 22.5 (1) | .001 | 7 | 33 | 4 | 60 | 35 | 38 | |
| alcohol (alcohol) | 22.5 (1) | .001 | | 9 | 5 | 6 | 4 | 6 | 5 |
| abuse (misbruik) | 21.1 (1) | .002 | 5 | 9 | 0 | 6 | | 6 | 4 |
| obsession (obsessie) | 21.1 (1) | .002 | 0 | 2 | | 0 | 0 | 0 | 0 |
| flashback (flashback) | 20.7 (1) | .002 | 2 | 1 | 0 | 4 | | 1 | 0 |
| eating (eten) | 20.2 (1) | .003 | 0 | 0 | | 0 | 0 | 0 | 0 |
| heavy-headed (lustelos) | 20.0 (1) | .003 | 9 | 40 | 6 | | 27 | 23 | 74 |
| control (control) | 19.6 (1) | .003 | 9 | 53 | 49 | 45 | 36 | | 38 |
| ate (geget) | 19.3 (1) | .004 | 0 | 0 | | 1 | 0 | 0 | 0 |
| underweight (ondergewicht) | 18.9 (1) | .004 | 0 | 0 | | 1 | 0 | 0 | 0 |
| nutrition (voeding) | 18.9 (1) | .004 | 0 | 2 | | 1 | 1 | 0 | 0 |
| gloomy (somber) | 18.6 (1) | .005 | 3 | 32 | 6 | | 32 | 40 | 38 |
| normal (normal) | 18.4 (1) | .005 | 8 | 58 | 55 | 44 | 83 | | 63 |
| addictive (verslav) | 17.9 (1) | .007 | | 4 | 4 | 3 | 2 | 1 | 4 |
^a Occurrence frequencies for each feature in each class (disorder).
^b PTSD: posttraumatic stress disorder.
^c The frequency for the class in which it occurs the most is presented in italics.
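One common way to obtain a chi-square score with 1 degree of freedom for a keyword is a 2x2 table of term presence against class membership (one class versus the rest). The sketch below shows that calculation under this assumption; the study's exact computation may differ, and the function name and counts are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 degree of freedom) for a 2x2 contingency table.

    a: in-class documents containing the term
    b: other documents containing the term
    c: in-class documents without the term
    d: other documents without the term
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator if denominator else 0.0

# Hypothetical counts, not taken from the study.
score = chi_square_2x2(a=30, b=10, c=20, d=40)
independent = chi_square_2x2(a=10, b=10, c=10, d=10)
```

A term that is spread evenly over classes scores 0, while a term concentrated in one class scores high, which matches the ranking idea behind the table.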
Performance metrics of the final model: per-class and average performance scores (N=1173).
| Disorder | Patients in test set, n (%) | Precision | Recall | F1 score | Overall accuracy^a |
| Addiction | 40 (3.41) | 0.25 | 0.33 | 0.28 | —^b |
| Anxiety | 220 (18.76) | 0.44 | 0.35 | 0.39 | — |
| Eating | 50 (4.26) | 0.75 | 0.82 | 0.78 | — |
| Mood | 220 (18.76) | 0.44 | 0.50 | 0.47 | — |
| PTSD^c | 203 (17.31) | 0.57 | 0.52 | 0.54 | — |
| Panic | 220 (18.76) | 0.57 | 0.55 | 0.56 | — |
| Somatic | 220 (18.76) | 0.46 | 0.50 | 0.48 | — |
| Weighted average | N/A^d | 0.50 | 0.49 | 0.49 | 0.49 |
^a Accuracy is the overall accuracy of the classifier averaged over all classes.
^b Data not available for separate classes.
^c PTSD: posttraumatic stress disorder.
^d N/A: not applicable.
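The weighted averages in the last row can be reproduced from the per-class rows by weighting each score with the number of patients in the test set. The sketch below does this with the values copied from the table; the score column after recall behaves like an F1 score (the harmonic mean of precision and recall), which the helper `f1` checks. Variable and function names are illustrative.

```python
# (patients in test set, precision, recall, third score column) per class, from the table.
per_class = {
    "Addiction": (40, 0.25, 0.33, 0.28),
    "Anxiety": (220, 0.44, 0.35, 0.39),
    "Eating": (50, 0.75, 0.82, 0.78),
    "Mood": (220, 0.44, 0.50, 0.47),
    "PTSD": (203, 0.57, 0.52, 0.54),
    "Panic": (220, 0.57, 0.55, 0.56),
    "Somatic": (220, 0.46, 0.50, 0.48),
}

def weighted_average(column):
    """Average one score column, weighting each class by its test-set size."""
    total = sum(row[0] for row in per_class.values())
    return sum(row[0] * row[column] for row in per_class.values()) / total

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

weighted_precision = weighted_average(1)
weighted_recall = weighted_average(2)
weighted_f1 = weighted_average(3)
```

Rounded to two decimals, these reproduce the 0.50, 0.49, and 0.49 in the weighted-average row.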
Confusion matrix for the 7-class classifier: absolute and normalized values (%) for the true versus predicted class labels.
| True disorder \ Predicted disorder | Addiction | Anxiety | Eating | Mood | PTSD^a | Panic | Somatic |
| Addiction (N=40), n (%) | | 3 (7.5) | 1 (2.5) | 8 (20) | 3 (7.5) | 3 (7.5) | 9 (22.5) |
| Anxiety (N=220), n (%) | 11 (5) | | 6 (2.7) | 33 (15) | 27 (12.3) | 41 (18.6) | 25 (11.4) |
| Eating (N=50), n (%) | 1 (2) | 1 (2) | | 4 (8) | 1 (2) | 0 (0) | 2 (4) |
| Mood (N=220), n (%) | 11 (5) | 26 (11.8) | 0 (0) | | 14 (6.4) | 10 (4.5) | 49 (22.3) |
| PTSD (N=203), n (%) | 2 (1) | 18 (8.9) | 0 (0) | 36 (17.7) | | 19 (9.4) | 23 (11.3) |
| Panic (N=220), n (%) | 4 (1.8) | 27 (12.3) | 3 (1.4) | 24 (10.9) | 18 (8.2) | | 23 (10.4) |
| Somatic (N=220), n (%) | 10 (4.5) | 23 (10.5) | 4 (1.8) | 37 (16.8) | 16 (7.3) | 19 (8.6) | |
^a PTSD: posttraumatic stress disorder.
^b The diagonal cells show the correctly predicted labels (in italics). The off-diagonal cells show the prediction errors for each class.
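Each row's correctly predicted count equals the row total minus its off-diagonal errors, which gives a quick consistency check of this matrix against the recall and overall accuracy in the performance table. The sketch below performs that check using the row totals and error counts listed above; the derived counts are computed here rather than quoted from the source, and the variable names are illustrative.

```python
CLASSES = ["Addiction", "Anxiety", "Eating", "Mood", "PTSD", "Panic", "Somatic"]
ROW_TOTALS = [40, 220, 50, 220, 203, 220, 220]

# Off-diagonal error counts per true class, in predicted-class order.
# None marks the diagonal cell, which is recomputed from the row total.
ERRORS = [
    [None, 3, 1, 8, 3, 3, 9],
    [11, None, 6, 33, 27, 41, 25],
    [1, 1, None, 4, 1, 0, 2],
    [11, 26, 0, None, 14, 10, 49],
    [2, 18, 0, 36, None, 19, 23],
    [4, 27, 3, 24, 18, None, 23],
    [10, 23, 4, 37, 16, 19, None],
]

# Correct predictions per class: row total minus all errors in that row.
correct = [
    total - sum(v for v in row if v is not None)
    for total, row in zip(ROW_TOTALS, ERRORS)
]
recall = {name: c / t for name, c, t in zip(CLASSES, correct, ROW_TOTALS)}
accuracy = sum(correct) / sum(ROW_TOTALS)
```

The resulting overall accuracy rounds to 0.49, and the per-class recall values agree with the performance table to two decimals (for example 41/50 = 0.82 for Eating).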
Figure 3. Normalized confusion plot. Visual presentation of the true versus predicted class labels. The darker the tone, the higher the proportion in the corresponding cell. PTSD: posttraumatic stress disorder.