Charis Ntakolia, Dimitrios Priftis, Mariana Charakopoulou-Travlou, Ioanna Rannou, Konstantina Magklara, Ioanna Giannopoulou, Konstantinos Kotsis, Aspasia Serdari, Emmanouil Tsalamanios, Aliki Grigoriadou, Konstantina Ladopoulou, Iouliani Koullourou, Neda Sadeghi, Georgia O'Callaghan, Eleni Lazaratou.
Abstract
The global spread of COVID-19 led the World Health Organization to declare a pandemic on 11 March 2020. To slow this spread, countries took strict measures that affected lifestyles and economies. Various studies have focused on identifying COVID-19's impact on the mental health of children and adolescents via traditional statistical approaches. However, a machine learning methodology is needed to explain the main factors that contribute to changes in the mood state of children and adolescents during the first lockdown. Therefore, this study presents an explainable machine learning pipeline focusing on children and adolescents in Greece, where a strict lockdown was imposed. The target group consists of children and adolescents, recruited from child and adolescent mental health services, who presented mental health problems diagnosed before the pandemic. The proposed methodology comprises: (i) data collection via questionnaires; (ii) a clustering process to identify the groups of subjects with amelioration, deterioration, and stability in their mood state; (iii) a feature selection process to identify the most informative features that contribute to mood state prediction; (iv) a decision-making process based on an experimental evaluation among classifiers; (v) calibration of the best-performing model; and (vi) a post hoc interpretation of the features' impact on the best-performing model. The results showed that a blend of heterogeneous features from almost all feature categories is necessary to increase our understanding of the effect of the COVID-19 pandemic on the mood state of children and adolescents.
Keywords: COVID-19 pandemic; children and adolescents; machine learning; model calibration; post hoc explainability
Year: 2022 PMID: 35052311 PMCID: PMC8775664 DOI: 10.3390/healthcare10010149
Source DB: PubMed Journal: Healthcare (Basel) ISSN: 2227-9032
Lockdown policies implemented worldwide adapted from [3].
| Type of Measures | Measures | Explanation |
|---|---|---|
| International Measures | Curfew | The effective date when a country announced a restriction on the movement of individuals within a given time of the day |
| | State of emergency | The effective date when a country announced a state of emergency |
| | Within-country regional lockdown | The effective date when a region within a country announced a total lockdown |
| | Partial selective lockdown | The earliest effective date for the partial restriction of the movement of people, i.e. school closures or limitations regarding the number of gathered people allowed |
| External measures | Selective international border closures | The earliest effective date when a country decided to close its borders with a region or country that has been significantly affected by COVID-19 |
| | Selective border closures | The earliest effective date following the selective international border closure, when a country closed its border to individuals from one or multiple other countries that have been significantly affected by COVID-19 |
| | International lockdown | The effective date when a country banned all flights, rail, and automotive movements internationally |
Summary of studies related to the first COVID-19 outbreak that include children and young adults.
| Study | Country | Population | Target | Method |
|---|---|---|---|---|
| [ | China | 8079 Chinese students aged 12–18 | To identify correlations between sociodemographic features and mental health problems in Chinese adolescents during the outbreak of COVID-19 | Multivariable logistic regression analysis |
| [ | China | 668 Chinese children aged 7–15 | To identify the main factors that contribute to the education and the mental health of Chinese children during COVID-19 | Multiple logistic regression analysis |
| [ | China | 584 youths | To study the effects of COVID-19 on youth mental health | Univariate analysis and univariate logistic regression |
| [ | China | Two cross-sectional studies of 9554 and 3886 participants | To evaluate the factors that contribute to depression and anxiety among Chinese adolescents during COVID-19 | Multivariable logistic regression analyses |
| [ | China | 1,199,320 school-aged children and adolescents | To assess the prevalence and the risk factors associated with self-reported psychological distress | Multivariate logistic regression |
| [ | China | 11,835 Chinese adolescents and young adults (12–29 years) | To identify sleeping problems during COVID-19 | Binomial logistic regression analysis |
| [ | China | 2009 Chinese undergraduate students | To predict anxiety and insomnia during COVID-19 | XGBoost model |
| [ | China | 746,217 Chinese university students | To examine variables associated with mental health problems during COVID-19 | Univariate and hierarchical logistic regression analyses |
| [ | China | 89,588 Chinese university students | To identify the risk factors for anxiety symptoms during COVID-19 | Multivariate logistic regression models |
| [ | China | 933 medical students | To evaluate the impact of COVID-19 on anxiety | Multivariate logistic regression |
| [ | France | 69,054 French university students | To study mental health issues due to COVID-19 | Multivariate logistic regression |
| [ | France | 3671 participants | To identify the risk factors for depression during the COVID-19 pandemic | Multivariate logistic regression |
| [ | Bangladesh | 476 university students | To identify the risk factors for depression due to COVID-19 | Binary logistic regression |
| [ | Bangladesh | 384 parents with at least one child aged 5–15 | To identify mental health disturbances during COVID-19 | Binary logistic regression |
| [ | Canada | 1013 children and adolescents aged 6–18, with or without pre-existing diagnoses | To evaluate the effects on mental health during COVID-19 | Multinomial logistic regression |
| [ | Brazil | 157 girls and 132 boys aged 6–12 | To examine the prevalence of anxiety during COVID-19 | Logistic regression |
| [ | Spain | 523 adolescents (13–17 years) | To examine the association between sociodemographic factors and COVID-19-related variables and their effect on depression, anxiety, and stress | Multivariable logistic regression |
| [ | Australia | Parents of 213 children and adolescents aged 5–17 who have been diagnosed with ADHD | To identify the impact of COVID-19 restrictions | Adjusted logistic regression analyses |
| [ | China | 478 college students after school reopening | To examine the psychological impact of COVID-19 | Multivariate logistic regression |
| [ | Belgium | 2008 young people aged 16–25 | To examine mental distress and its contributing factors | Bivariate and multivariable logistic regression analyses |
| [ | Cross-sectional study | 2787 participants aged 18–85 | To identify predictors of psychological distress during COVID-19 | Random forest machine learning algorithm and regression trees |
| [ | Florida, USA | 280 school-aged children | To examine mental health during COVID-19 | Bivariate analysis and logistic and multinomial logistic regression models |
Figure 1. Machine learning pipeline adopted in this study.
Sociodemographic characteristics of the dataset.
| Sociodemographic Characteristics | Population (%) |
|---|---|
| Age, Mean ± Standard Deviation | 10.7 ± 4.1 |
| Sex | |
| Participant parent | |
| Parent’s ethnicity | 725 (98.2%) |
| Health insurance type | |
| Residential area | |
| Reporting parent’s educational level | |
| Second parent’s educational level | |
| Essential worker (yes): healthcare, delivery worker, store worker, security, building maintenance | 321 (43.5%) |
| Worker in a facility treating COVID-19 (yes) | 105 (14.2%) |
| Job loss during the pandemic (yes) | 38 (5.1%) |
| Limited ability to earn money (yes) | 81 (10.9%) |
Dataset description.
| Category | Features | Description |
|---|---|---|
| Demographics | age_group | Age group of child |
| | gender_child | Gender of child |
| | parent_area_live | Area of residence |
| | gender_parent | Gender of the parent or guardian |
| | parenteducation | Education level of parent or guardian |
| | school_child | School enrolment and attendance |
| | 2w_essential_worker | Whether any adults living with the child are essential workers (health care, delivery services, pharmacies, law enforcement and security, store worker, cleaning services, other) |
| Social life | 3m_outdoors, 2w_outdoors | Days per week the child spent outside the house (parks, outdoor spaces) in 3 months and the past 2 weeks, respectively |
| | 2w_time_outside | Amount of time per week the child spent/dedicated out of the house (e.g., shopping, parks, etc.) |
| | 2w_event_cancellat | How difficult the cancellation of important events in the child’s life (graduation, vacation, Easter recess) was for him/her |
| | 2w_recommendations | Difficulty following recommendations regarding social distancing |
| | 2w_contact_changed | Change in the child’s contact with people outside home relatives compared to before the coronavirus/COVID-19 crisis |
| | 2w_relationships_friends | Change in the quality of the child’s relationships with his/her friends |
| | 3m_soc_media, 2w_soc_media | Time spent using social media (e.g Facetime, Facebook, Instagram, Snapchat, Twitter, Tiktok) for 3 months and the past 2 weeks, respectively |
| Personal life | 2w_positive | Positive changes in the child’s life due to the coronavirus/COVID-19 crisis |
| Family life | Family_impact_any | If any event that affected the family occurred due to COVID-19 |
| | 2w_financial_recod | Financial problems faced by the family due to the coronavirus/COVID-19 crisis |
| | 2w_relationships_family | Changes in the quality of relationships between the child and members of his/her family |
| | 2w_family_events_lost_job, 2w_family_events_loss_earnings | Whether either of the following have happened to the child’s family members because of coronavirus/COVID-19: loss of job, loss of earnings |
| Daily activities | 3m_exercise, 2w_exercise | Days per week the child engaged in exercise (e.g., increased heart rate, breathing) for at least 30 min, for 3 months and the past 2 weeks, respectively |
| | 3m_video_games, 2w_video_games | Time spent playing video games, for 3 months and the past 2 weeks, respectively |
| | 3m_tv, 2w_tv | Time spent watching TV or digital means (e.g., Netflix, Youtube, or web surfing) for 3 months and the past 2 weeks, respectively |
| | 2w_reading | How frequently the child asked questions, read, or talked about coronavirus/COVID-19 |
| Health concerns | 2w_worry_self_infected | Child’s worry about becoming infected |
| | 2w_worry_family_inf | Child’s worry about family members or friends becoming infected |
| | 2w_worry_phys_healt | Worry that physical health will be affected by coronavirus/COVID-19 |
| | 2w_worry_ment_health | Worry that the child’s mental/emotional health will be affected by coronavirus/COVID-19 |
| Behavioral effects | 2w_stress_restrict | Stress caused by the curfew |
| | 2w_stress_family | Stress caused to the child by changes in family contacts |
| | 2w_worry_food_reco | Worry about food in the family running out due to loss of income |
| | 2w_stress_social | Stress caused to the child by changes to his/her social contacts |
| | 2w_living_stability | Child’s concern about the stability of the family’s living situation |
| | 2w_hopeful_end | How hopeful the child is that the coronavirus/COVID-19 crisis will end |
| Sleeping habits | 3m_sleep_hours, 2w_sleep_hours_rec | Average sleep duration on weekdays, for 3 months and the past 2 weeks, respectively |
| | 3m_sleep_time, 2w_sleep_time_reco | Sleep schedule on weekdays, for 3 months and the past 2 weeks, respectively |
| | 3m_sleep_hours_weeke, 2w_sleep_hours_wee | Average sleep duration on weekends, for 3 months and the past 2 weeks, respectively |
| | 3m_sleep_time_weeken, 2w_sleep_time_week | Sleep schedule on weekends, for 3 months and the past 2 weeks, respectively |
| Medical diagnosis/rehabilitation | 2w_child_health_evaluation | Parental evaluation of the child’s overall physical health before the coronavirus/COVID-19 crisis |
| | 2w_mental_health_eval | Parental evaluation of the child’s overall mental/emotional health before the coronavirus/COVID-19 crisis |
| | diagnosis_1_group | Diagnosis defined by the medical expert |
| | Diagnosis_FINAL_groups | Final diagnostic category defined by the medical expert |
| | 2w_symptoms_tot | Symptoms the child had |
| | 2w_all_exposure_tot | Child exposed to someone likely to have coronavirus/COVID-19 |
| | 2w_support_activit | Supports which were in place for the child and have been disrupted |
| | 2w_family_diagnosis | Whether any members of the child’s family have been diagnosed with COVID-19 |
| | 2w_family_events_ho, 2w_family_events_qu, 2w_family_events_di, 2w_family_events_il, 2w_family_events_to | Whether any of the following have happened to the child’s family members because of Coronavirus/COVID-19: Hospitalization, self-quarantine, death, physical illness; and total number of the above family events |
| Mood state | 3m_general_worry | How worried the child generally was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_sadness | How happy versus sad the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_anxiety | How relaxed versus anxious the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_restlessness | How fidgety or restless the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_anhedonia | Ability of the child to enjoy his/her usual activities, 3 months ago and over the past 2 weeks, respectively |
| | 3m_loneliness | How lonely the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_irritability | How irritable or easily angered the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_concentration | How well the child was able to concentrate or focus, 3 months ago and over the past 2 weeks, respectively |
| | 3m_tiredness | How fatigued or tired the child was, 3 months ago and over the past 2 weeks, respectively |
| | 3m_rumination | How often the child was expressing negative thoughts, 3 months ago and over the past 2 weeks, respectively |
Figure 2. Clustering process.
Summary of classifiers.
| Classifier | Description |
|---|---|
| Random Forest | An extended version of a decision tree that predicts the future instances with multiple classifiers, rather than a single classifier, to reach an accurate and correct prediction. RF constructs a large number of decision trees. Each decision tree denotes a class prediction, and the class with the most votes represents the model’s prediction [ |
| Multi-Layer Perceptron | MLP belongs to the category of Artificial Neural Networks (ANN) and is the most common neural network. MLP uses a supervised training procedure to generate a nonlinear prediction model. It consists of an input layer, an output layer, and hidden layers. Thus, MLP is a layered feedforward neural network in which information is transferred unidirectionally from the input layer to the output layer through the hidden layers [
| Extreme Gradient Boosting | XG Boost is an extendible and cutting-edge application of gradient-boosting machines. Gradient boosting is an algorithm in which new models are created to predict the residuals of prior models, and then added together to make the final prediction. It uses a gradient descent algorithm to minimize the loss when adding new models [ |
| Logistic Regression | A mathematical model that describes the relationship of data to a dichotomous dependent variable. The model is based on the logistic function, |
| Support Vector Machine | SVM is a supervised learning model based on the statistical learning framework called VC theory. SVM aims to create a decision boundary, the hyperplane, between two classes, which enables the prediction of labels from one or more feature vectors. The hyperplane is chosen so that its distance to the closest points of each class, called support vectors, is maximized [
| K-Nearest Neighbor | KNN is a non-parametric classification method that tries to classify an unknown sample based on the known classification of its neighbors [ |
| Decision Trees | DTs are sequential models, which logically combine a sequence of simple tests. Each test compares a numeric attribute against a threshold value or a nominal attribute against a set of possible values [ |
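The Logistic Regression entry in the table breaks off before stating the logistic function. For reference only, its standard textbook form is given below; the symbols $z$, $\mathbf{w}$, $\mathbf{x}$, and $b$ are notational assumptions and are not taken from the source.

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^{\top}\mathbf{x} + b
```

Here $\sigma(z)$ maps the linear score $z$ to a value in $(0, 1)$ that is read as the probability of the positive class.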
Figure 3. Evaluation methodology.
Figure 4. Evaluation process of clustering methods.
Parameter settings for clustering methods.
| Clustering Method | Parameter Settings |
|---|---|
| Mini Batch K-Means | 3 classes |
| Spectral Clustering | 3 classes, arpack eigen solver, nearest_neighbors affinity |
| Ward’s Hierarchical Agglomerative Clustering | 3 classes, ward linkage, symmetric connectivity |
| Average Linkage | 3 classes, average linkage, cityblock affinity, symmetric connectivity |
| Birch | 3 classes |
| Jenks | 3 classes, include lowest value |
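The parameter names in the table resemble scikit-learn's clustering API, so the following is a minimal sketch that assumes scikit-learn, not a record of the authors' code. The input is a synthetic one-dimensional mood-change score invented for illustration, and only three of the six methods are shown; the connectivity settings, Spectral Clustering, Average Linkage, and Jenks are left out to keep the sketch short.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, Birch, MiniBatchKMeans

# Synthetic stand-in for the real data: integer scores in [-24, 25] (an assumption).
rng = np.random.default_rng(0)
scores = rng.integers(-24, 26, size=300).reshape(-1, 1).astype(float)

# Three of the tabulated methods, each asked for 3 clusters.
methods = {
    "Mini Batch K-Means": MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0),
    "Ward": AgglomerativeClustering(n_clusters=3, linkage="ward"),
    "Birch": Birch(n_clusters=3),
}

cluster_ranges = {}
for name, model in methods.items():
    labels = model.fit_predict(scores)
    # Summarize each cluster as the (min, max) of the scores assigned to it.
    cluster_ranges[name] = sorted(
        (float(scores[labels == k].min()), float(scores[labels == k].max()))
        for k in np.unique(labels)
    )
    print(name, cluster_ranges[name])
```

Reporting each cluster as a score interval mirrors how the clustering results are tabulated (a "Set" per cluster), which makes it easy to see whether a method produced contiguous ranges.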
Hyperparameter settings for tuning the ML algorithms.
| Classification Model | Hyperparameter Tuning |
|---|---|
| Random Forest | n_estimators = [int(x) for x in np.linspace(start = 10, stop = 500, num = 10)]; max_features = [‘auto’, ‘sqrt’]; max_depth = [int(x) for x in np.linspace(3, 10, num = 1)]; min_samples_split = [3, 4, 5, 6, 7, 10]; min_samples_leaf = [1, 2, 4]; bootstrap = [True, False]. |
| Multi-Layer Perceptron | hidden_layer_sizes = [(2, 5, 10), (5, 10, 20), (10, 20, 50)]; activation = [‘tanh’, ‘relu’]; solver = [‘sgd’, ‘adam’]; alpha = [0.0001, 0.05]; learning_rate = [‘constant’, ‘adaptive’] |
| XG Boost | max_depth = [2, 3, 4, 5, 6, 7, 8]; min_child_weight = [1, 2, 3, 4, 5, 6]; gamma = [0, 0.4, 0.5, 0.6] |
| Logistic Regression | C = [0.001, 0.01, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; warm_start = [True, False]; multi_class = [‘ovr’, ‘multinomial’]; solver = [‘newton-cg’, ‘lbfgs’, ‘sag’, ‘saga’] |
| Support Vector Machine | C = [0.001, 0.01, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]; kernel = [‘linear’, ‘sigmoid’, ‘rbf’, ‘poly’] |
| K-Nearest Neighbor | n_neighbors = [5, 7, 9, 12, 14, 15, 16, 17]; leaf_size = [1, 2, 3, 5]; weights = [‘uniform’, ‘distance’]; algorithm = [‘auto’, ‘ball_tree’, ‘kd_tree’, ‘brute’] |
| Decision Trees | max_features = [‘auto’, ‘sqrt’, ‘log2’]; min_samples_split = [2, 3, 4, 5, 6, 7, 8, 10, 12, 15]; min_samples_leaf = [1, 2, 3, 4, 5, 6, 7, 8, 10] |
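As an illustration of how one row of the table could drive a search, the sketch below runs an exhaustive grid search over the Support Vector Machine values. It assumes scikit-learn's `GridSearchCV`, 3-fold cross-validation, and accuracy scoring; the source does not state the search strategy, and the data here are synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the questionnaire features (an assumption).
X, y = make_classification(
    n_samples=150, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Grid copied from the Support Vector Machine row of the table.
param_grid = {
    "C": [0.001, 0.01, 0.1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "kernel": ["linear", "sigmoid", "rbf", "poly"],
}

# Exhaustive search with 3-fold cross-validation (the fold count is an assumption).
search = GridSearchCV(SVC(), param_grid, scoring="accuracy", cv=3)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The same pattern applies to the other rows by swapping the estimator and its grid.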
Clustering results.
| Clustering Methods | Cluster Information | Cluster 0 | Cluster 1 | Cluster 2 |
|---|---|---|---|---|
| Mini Batch K-Means | Set | [−24, −4] | [−3, 4] | [5, 25] |
| | Number of elements | 144 | 468 | 132 |
| Spectral Clustering | Set | Unable to create continuous sets | | |
| | Number of elements | 485 | 230 | 29 |
| Ward | Set | [−24, −7] | [−6, 1] | [2, 25] |
| | Number of elements | 66 | 418 | 260 |
| Average Linkage | Set | [−24, −7] | [−6, 4] | [5, 25] |
| | Number of elements | 66 | 546 | 132 |
| Birch | Set | [−24, −6] | [−5, 8] | [9, 25] |
| | Number of elements | 80 | 608 | 56 |
| Jenks | Set | [−24, −5] | [−4, 3] | [4, 25] |
| | Number of elements | 106 | 469 | 169 |
Evaluation of clustering methods. The best evaluation score is shown in bold.
| Clustering Method | Silhouette Coefficient | Calinski–Harabasz Index | Davies–Bouldin Index | Cumulative Normalized Score |
|---|---|---|---|---|
| Mini Batch K-Means | 0.55 | 1106.78 | 0.60 | 2.94 |
| Spectral Clustering | 0.12 | 24.95 | 14.79 | 0.00 |
| Ward | 0.54 | 989.18 | 0.58 | 2.80 |
| Average Linkage | 0.57 | 1048.06 | 0.52 | 2.94 |
| Birch | 0.55 | 784.60 | 0.49 | 2.64 |
| Jenks | 0.56 | 1112.73 | 0.58 | **2.96** |
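The source does not state how the cumulative normalized score is computed. One plausible reading, sketched below as an assumption, is min-max normalization of each index across the six methods, with the Davies–Bouldin index inverted because lower values are better, followed by a sum. Applied to the rounded values in the table, this reproduces the reported scores to within about 0.02.

```python
# Scores as reported in the table: (silhouette, Calinski-Harabasz, Davies-Bouldin).
reported = {
    "Mini Batch K-Means": (0.55, 1106.78, 0.60),
    "Spectral Clustering": (0.12, 24.95, 14.79),
    "Ward": (0.54, 989.18, 0.58),
    "Average Linkage": (0.57, 1048.06, 0.52),
    "Birch": (0.55, 784.60, 0.49),
    "Jenks": (0.56, 1112.73, 0.58),
}


def min_max(values, higher_is_better=True):
    """Scale values to [0, 1]; flip the scale when lower raw values are better."""
    lo, hi = min(values), max(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


names = list(reported)
sil, ch, db = (list(col) for col in zip(*reported.values()))

# Assumed scheme: normalize each index, invert Davies-Bouldin, then sum per method.
columns = [min_max(sil), min_max(ch), min_max(db, higher_is_better=False)]
cumulative = {n: sum(col[i] for col in columns) for i, n in enumerate(names)}

for name, score in sorted(cumulative.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.2f}")

best = max(cumulative, key=cumulative.get)
```

Under this scheme Jenks ranks first and Spectral Clustering scores zero, which agrees with the table's ordering.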
Results from feature selection with the categories of the 40 first features.
| Features | Category | Features | Category |
|---|---|---|---|
| 1st feature | Social life | 21st feature | Daily activities |
| 2nd feature | Behavioral effects | 22nd feature | Behavioral effects |
| 3rd feature | Medical diagnosis/rehabilitation | 23rd feature | Behavioral effects |
| 4th feature | Social life | 24th feature | Social life |
| 5th feature | Personal life | 25th feature | Daily activities |
| 6th feature | Medical diagnosis/rehabilitation | 26th feature | Daily activities |
| 7th feature | Demographics | 27th feature | Medical diagnosis/rehabilitation |
| 8th feature | Family life | 28th feature | Demographics |
| 9th feature | Family life | 29th feature | Behavioral effects |
| 10th feature | Social life | 30th feature | Health concerns |
| 11th feature | Social life | 31st feature | Sleeping habits |
| 12th feature | Daily activities | 32nd feature | Social life |
| 13th feature | Daily activities | 33rd feature | Demographics |
| 14th feature | Health concerns | 34th feature | Social life |
| 15th feature | Daily activities | 35th feature | Medical diagnosis/rehabilitation |
| 16th feature | Health concerns | 36th feature | Social life |
| 17th feature | Demographics | 37th feature | Sleeping habits |
| 18th feature | Behavioral effects | 38th feature | Sleeping habits |
| 19th feature | Social life | 39th feature | Sleeping habits |
| 20th feature | Health concerns | 40th feature | Demographics |
Figure 5. Spider plot of the number of features that belong to each feature category for the first 40 features, where the best performance was achieved.
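The per-category counts that such a spider plot displays can be recovered directly from the ranked feature table. The sketch below tallies the categories of the 40 top-ranked features as listed; the variable names are illustrative.

```python
from collections import Counter

# Category of each of the 40 top-ranked features, in rank order (from the table).
ranked_categories = [
    "Social life", "Behavioral effects", "Medical diagnosis/rehabilitation", "Social life",
    "Personal life", "Medical diagnosis/rehabilitation", "Demographics", "Family life",
    "Family life", "Social life", "Social life", "Daily activities",
    "Daily activities", "Health concerns", "Daily activities", "Health concerns",
    "Demographics", "Behavioral effects", "Social life", "Health concerns",
    "Daily activities", "Behavioral effects", "Behavioral effects", "Social life",
    "Daily activities", "Daily activities", "Medical diagnosis/rehabilitation", "Demographics",
    "Behavioral effects", "Health concerns", "Sleeping habits", "Social life",
    "Demographics", "Social life", "Medical diagnosis/rehabilitation", "Social life",
    "Sleeping habits", "Sleeping habits", "Sleeping habits", "Demographics",
]

# Count how many of the selected features fall into each category.
counts = Counter(ranked_categories)
for category, n in counts.most_common():
    print(f"{category}: {n}")
```

Social life contributes the most features (9 of 40), and every category in the ranking appears at least once, consistent with the abstract's point that a blend of feature categories is needed.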
Figure 6. Classification results.
The maximum accuracy achieved from the classification models. The best performance is shown in bold.
| Models | Maximum Accuracy (%) | Number of Features for Maximum Accuracy |
|---|---|---|
| Random Forest | 66.60 | 44 |
| MLP | 57.73 | 58 |
| XG Boost | | |
| Logistic Regression | 55.44 | 50 |
| SVM | 64.05 | 49 |
| KNN | 51.28 | 3 |
| Decision Trees | 53.23 | 5 |
Results after XG Boost classifier calibration with Isotonic Regression and Platt’s methods. The best scores are shown in bold.
| Models | Log-Loss | Accuracy (%) |
|---|---|---|
| XG Boost | 1.195 | 69.47 |
| XG Boost + Isotonic | 0.513 | 72.03 |
| XG Boost + Platt | | |
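To make the comparison in this table concrete, the following is a minimal sketch of calibrating a boosted classifier with the isotonic and sigmoid (Platt) methods and scoring each variant by log-loss and accuracy. It assumes scikit-learn's `CalibratedClassifierCV` and uses `GradientBoostingClassifier` as a stand-in for XG Boost on synthetic data, so the numbers it prints are unrelated to the reported results.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, log_loss
from sklearn.model_selection import train_test_split

# Synthetic 3-class problem (an assumption; not the study's dataset).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def booster():
    # Stand-in for the XG Boost classifier named in the table.
    return GradientBoostingClassifier(n_estimators=50, random_state=0)


candidates = {
    "uncalibrated": booster(),
    "isotonic": CalibratedClassifierCV(booster(), method="isotonic", cv=3),
    "sigmoid (Platt)": CalibratedClassifierCV(booster(), method="sigmoid", cv=3),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)
    # Log-loss judges the predicted probabilities; accuracy judges the hard labels.
    results[name] = (log_loss(y_test, proba), accuracy_score(y_test, model.predict(X_test)))
    print(f"{name}: log-loss={results[name][0]:.3f}, accuracy={results[name][1]:.3f}")
```

Log-loss is the more informative column for calibration, since it penalizes overconfident probabilities even when the predicted label is unchanged.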
Figure 7. Change of predicted probabilities on test samples after calibration with: (a) Isotonic Regression method; (b) Platt’s (sigmoid) method.
Figure 8. Learned calibration map with: (a) Isotonic Regression method; (b) Platt’s (sigmoid) method.
Figure 9. Calibration plot of XG Boost classifier for class 0.
Figure 10. Calibration plot of XG Boost classifier for class 1.
Figure 11. Calibration plot of XG Boost classifier for class 2.
Figure 12. Mean SHAP values.
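SHAP values are based on Shapley values from cooperative game theory. As a rough illustration of what the plotted quantities represent, the sketch below computes exact Shapley values for a toy three-feature function by enumerating feature subsets. The toy model, the all-zero baseline, and the function names are hypothetical, and this brute-force enumeration is not the algorithm a SHAP library would use on a tree ensemble.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x; absent features take the baseline value."""
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real value, the rest are set to the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus one interaction term.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at `x` and at the baseline, and the interaction term is split equally between the two features involved. A mean SHAP plot typically averages the absolute value of such attributions over many samples.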
Figure 13. SHAP values of patients from class 0.
Figure 14. SHAP values of patients from class 1.
Figure 15. SHAP values of patients from class 2.
Figure 16. Mean SHAP values of patients from class 0 and class 1.
Figure 17. Mean SHAP values of patients from class 0 and class 2.
Figure 18. SHAP values of patients from class 1 and class 2.