Johannes Allgaier, Winfried Schlee, Berthold Langguth, Thomas Probst, Rüdiger Pryss.
Abstract
Tinnitus is an auditory phantom perception in the absence of external sound stimulation. People with tinnitus often report severe constraints in their daily life. Interestingly, there are indications of gender differences between women and men, both in the symptom profile and in the response to specific tinnitus treatments. In this paper, data from the TrackYourTinnitus (TYT) platform were analyzed to investigate whether the gender of users can be predicted. The TYT mobile health crowdsensing platform was developed to demystify the daily and momentary variations of tinnitus symptoms over time. The goal of the presented investigation is a better understanding of gender-related differences in the symptom profiles of TYT users. Based on two TYT questionnaires, four machine-learning classifiers were trained and analyzed. Using the provided daily answers, the gender of TYT users can be predicted with an accuracy of 81.7%. In this context, worries, difficulties in concentration, and irritability towards the family are the three most important characteristics for predicting gender. Note that, in contrast to existing studies on TYT, daily answers to the worst symptom question were investigated in more detail for the first time. The results of this question were found to contribute significantly to the prediction of the gender of TYT users. Overall, our findings indicate gender-related differences in tinnitus and tinnitus-related symptoms. Based on evidence that gender impacts the development of tinnitus, the gathered insights can be considered relevant and justify further investigations in this direction.
Year: 2021 PMID: 34526553 PMCID: PMC8443560 DOI: 10.1038/s41598-021-96731-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Description of the dataframe used for the machine learning approaches.
| Feature | Meaning | Scaling | Implementation | Count | Mean | Std |
|---|---|---|---|---|---|---|
| Question1 | Did you perceive the tinnitus right now? | Binary | YesNoSwitch | 80,969 | 0.76 | 0.43 |
| Question2 | How loud is the tinnitus right now? | Continuous | Slider in range (0,1) | 80,969 | 0.46 | 0.3 |
| Question3 | How stressful is the tinnitus right now? | Continuous | Slider in range (0,1) | 80,969 | 0.36 | 0.28 |
| Question4 | How is your mood right now? | Discrete | SAM from 0 to 1 with step size 0.125 | 80,969 | 0.56 | 0.21 |
| Question5 | How is your arousal right now? | Discrete | SAM from 0 to 1 with step size 0.125 | 80,969 | 0.26 | 0.22 |
| Question6 | Do you feel stressed right now? | Continuous | Slider in range (0,1) | 80,969 | 0.28 | 0.24 |
| Question7 | How much did you concentrate on the things you are doing right now? | Continuous | Slider in range (0,1) | 80,969 | 0.58 | 0.31 |
| Question8_0 | Because of the tinnitus it is hard for me to get to sleep | Binary | YesNoSwitch | 7919 | 0.35 | 0.48 |
| Question8_1 | I am feeling depressed because of the tinnitus | Binary | YesNoSwitch | 10,361 | 0.23 | 0.42 |
| Question8_2 | I find it harder to relax because of the tinnitus | Binary | YesNoSwitch | 13,904 | 0.45 | 0.5 |
| Question8_3 | I don’t have any of these symptoms | NULL | NULL | NULL | NULL | NULL |
| Question8_4 | I have strong worries because of the tinnitus | Binary | YesNoSwitch | 10,839 | 0.27 | 0.45 |
| Question8_5 | Because of the tinnitus it is difficult to follow a conversation, a piece of music or a film | Binary | YesNoSwitch | 11,877 | 0.33 | 0.47 |
| Question8_6 | Because of the tinnitus it is difficult to concentrate | Binary | YesNoSwitch | 8220 | 0.42 | 0.49 |
| Question8_7 | Because of the tinnitus I am more irritable with my family, friends and colleagues | Binary | YesNoSwitch | 3391 | 0.32 | 0.47 |
| Question8_8 | Because of the tinnitus I am more sensitive to environmental noises | Binary | YesNoSwitch | 9179 | 0.09 | 0.29 |
| Gender | 0 = male, 1 = female | Binary | Single Choice | 80,969 | 0.26 | 0.44 |
Note that the count for the questions 8_0, 8_1, ..., 8_8 depends on the number of individuals who selected this answer in the baseline questionnaire. If an individual selected "I don’t have any of these symptoms", no follow-up question appeared, so these values are NULL. SAM = Self Assessment Manikin[65].
Overview of the three Research Questions i-iii, the used classifiers and the results.
| No. | Research question | SVM | Tree | RF | NN | Results |
|---|---|---|---|---|---|---|
| i | Is it generally possible to learn a mapping function from X to y, where X are questions that the user answered daily and y is a binary target representing the gender of a user? | | | | | Precision on average: male 81.5%, female 84.3% |
| ii | Which machine learning model is most suitable for this task and offers high prediction power? | | | | | Mean accuracy on a fivefold cross-validation set: Random Forest classifier (81.7%) |
| iii | Which features have the highest importance for predicting the gender? | | | | | Most important features are q8_4 (worries about the tinnitus) and q8_5 (difficulties in following a conversation) |
SVM = Support Vector Machine, Tree = Decision Tree, RF = Random Forest, NN = Multilayer Perceptron Neural Network. A checkmark means that this classifier has been used to answer the research question.
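The fivefold cross-validation mentioned for research question ii can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: the helper names, the random row-wise split, and the majority-class stand-in model are all assumptions made for illustration, and any classifier exposing comparable `fit` and `predict` callables could be substituted.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_val_accuracy(X, y, fit, predict, k=5, seed=0):
    """Train on k-1 folds, score on the held-out fold, and average the k accuracies."""
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(model, X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k

# Hypothetical stand-in model: always predict the most frequent training label.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)

def predict_majority(model, x):
    return model
```

Reporting the mean over the five held-out folds, rather than a single split, makes the accuracy less dependent on which rows happen to land in the test set.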
Comparison of the four used classifiers in terms of precision per gender and F1-score.
| Classifier | Precision male | Precision female | F1-score |
|---|---|---|---|
| Support Vector Machine | 0.80 | 0.86 | 0.83 |
| Decision Tree | 0.81 | 0.80 | 0.81 |
| Neural Network | 0.82 | 0.83 | 0.83 |
| Random Forest | 0.83 | 0.88 | 0.85 |
Number of examples is denoted by m = 1702. Used features: {q1, q2, ..., q7, q8_5}, test size 20%. Note that the feature labels qx are further explained in Table 3.
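As a reading aid for the metrics in this table, the following sketch computes per-class precision, recall, and F1 from true and predicted labels, using the 0 = male, 1 = female coding from the dataframe description. The labels below are toy values, not study data, and the source does not state how the single F1-score per classifier was aggregated, so the per-class F1 shown here is an assumption.

```python
def precision(y_true, y_pred, label):
    """Share of the predictions of `label` that are correct."""
    truths = [t for t, p in zip(y_true, y_pred) if p == label]
    return sum(t == label for t in truths) / len(truths) if truths else 0.0

def recall(y_true, y_pred, label):
    """Share of the true `label` examples that were predicted as `label`."""
    preds = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in preds) / len(preds) if preds else 0.0

def f1_score(y_true, y_pred, label):
    """Harmonic mean of precision and recall for one class."""
    p = precision(y_true, y_pred, label)
    r = recall(y_true, y_pred, label)
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy labels (0 = male, 1 = female); hypothetical, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 1]
precision_male = precision(y_true, y_pred, 0)
precision_female = precision(y_true, y_pred, 1)
f1_male = f1_score(y_true, y_pred, 0)
```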
Figure 1. Set of hyper-parameters for a grid search to improve the forest’s accuracy. Note that not all hyper-parameters were varied; n_jobs, oob_score, and verbose, for example, were kept fixed. Only hyper-parameters with a higher impact on the accuracy score were varied. The static parameters are nevertheless listed for completeness.
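The split between varied and static hyper-parameters described in this caption can be sketched as follows. The parameter names `n_jobs`, `oob_score`, and `verbose` come from the caption; the varied names and all values are hypothetical placeholders, since the figure's actual grid is not reproduced in this record.

```python
from itertools import product

# Hypothetical varied hyper-parameters and values (the real grid is in Figure 1).
varied = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10],
    "min_samples_leaf": [1, 5],
}
# Static parameters named in the caption; the values are assumptions.
static = {"n_jobs": -1, "oob_score": False, "verbose": 0}

def expand_grid(varied, static):
    """Yield one full parameter dict per combination of the varied values."""
    names = list(varied)
    for values in product(*(varied[name] for name in names)):
        yield {**static, **dict(zip(names, values))}

# Two values for each of three varied parameters give eight candidates.
candidates = list(expand_grid(varied, static))
```

Each candidate would then be scored, for instance by cross-validated accuracy, and the best-scoring combination kept.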
Figure 2. Comparison of three approaches to determine the most important feature for gender prediction. A ranking value of 1 means that this feature is the most important for predicting the gender.
Figure 4. Distribution of the worst symptom grouped by gender in a horizontal stacked plot. Each row of the figure adds up to 100%.
Figure 3. The dashed lines denote the age distribution for all individuals, whereas the solid lines indicate the subset of individuals used for the machine learning calculations. This subset has a size of m = 11,877 and contains 238 + 94 individual users. For all users, m equals 80,969. Note that the high p-values for both groups indicate no significant difference between the age distributions.
Baseline characteristics of the Tinnitus Sample Case History Questionnaire (TSCHQ) for all individuals that filled out at least one follow-up questionnaire.
| Gender | n | Age (std) | Right-handed | Left-handed | Both sides | Existing family history of tinnitus complaints |
|---|---|---|---|---|---|---|
| Male | 1871 | 49.23 (14.40) | 1345 (71%) | 282 (13%) | 244 (16%) | 426 (23%) |
| Female | 875 | 46.04 (14.72) | 650 (75%) | 126 (11%) | 99 (14%) | 235 (27%) |
| Total | 2746 | 48.71 (14.89) | 1994 (73%) | 408 (12%) | 343 (15%) | 661 (24%) |
Individual users that registered for this study, but did not fill out at least one follow-up questionnaire, are not considered in this table.
Figure 5. Number of filled-out daily questionnaires per group (left) and per gender (right). The red dashed line in the right plot indicates the mean value. Most individual users filled out the questionnaire only once. On average, men answered the questionnaire 32 times (± 124 std) and women 24 times (± 82 std), with t(2757) = 2.17 and p < 0.05. Notably, one male user filled out the daily questionnaire 3073 times.
Figure 6. Heatmap for feature-gender cross-correlations. The last column (and the last row) shows the correlation of the whole data set (without equal splits for male and female individuals) with the target gender. Depending on the feature scaling, different correlation measures (Cramér’s V, point-biserial, and Pearson) were used. The matrix reveals, for example, strong positive correlations between stressfulness and loudness of the tinnitus and negative correlations between mood and stressfulness of an individual user. The heatmap was formatted using MS Excel 365. Correlation metrics were calculated using SciPy 1.5.0 within a Python 3.7 environment.
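To make the three correlation measures named in this caption concrete, here is a minimal standard-library sketch. It is not the authors' SciPy code: the function names are illustrative, Cramér’s V is computed from the plain chi-squared statistic without a continuity or bias correction (an assumption, as the source does not say which variant was used), and every input is assumed to take at least two distinct values.

```python
import math

def pearson(x, y):
    """Pearson correlation between two numeric sequences of equal length."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def point_biserial(binary, continuous):
    """Point-biserial correlation: Pearson's r with one variable coded 0/1."""
    return pearson(binary, continuous)

def cramers_v(a, b):
    """Cramér's V for two categorical sequences (uncorrected chi-squared)."""
    n = len(a)
    rows, cols = sorted(set(a)), sorted(set(b))
    observed = {(r, c): 0 for r in rows for c in cols}
    for r, c in zip(a, b):
        observed[(r, c)] += 1
    row_tot = {r: sum(observed[(r, c)] for c in cols) for r in rows}
    col_tot = {c: sum(observed[(r, c)] for r in rows) for c in cols}
    chi2 = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (observed[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(rows), len(cols)) - 1)))
```

Under this scheme, a binary feature such as a yes/no switch paired with the binary gender target would use `cramers_v`, a slider value paired with gender would use `point_biserial`, and two slider values would use `pearson`.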
Figure 7. ROC curve for the compared classifiers. As the decision tree contains only pure subsets, its class probabilities are either 0 or 1. This leads to a triangular ROC curve.
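The triangular shape follows directly from how a ROC curve is built. The sketch below uses toy labels and scores rather than study data, and the function name is illustrative. It thresholds the scores at every distinct value: hard 0/1 scores offer only two thresholds, so the curve has a single interior point between (0, 0) and (1, 1), whereas graded scores produce more points.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points from thresholding at each distinct score.

    Assumes y_true is coded 0/1 and contains both classes.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / neg, tp / pos))
    return points

# Toy data, hypothetical: six examples with hard and graded scores.
y_true = [1, 1, 1, 0, 0, 0]
hard_scores = [1, 1, 0, 0, 1, 0]               # pure leaves: only 0 or 1
soft_scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]   # graded probabilities

tree_curve = roc_points(y_true, hard_scores)   # three points: a triangle
soft_curve = roc_points(y_true, soft_scores)   # one point per distinct score, plus the origin
```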
Listing 1. Hyperparameter set-up for the classifiers used.