
Chronic stress in practice assistants: An analytic approach comparing four machine learning classifiers with a standard logistic regression model.

Arezoo Bozorgmehr¹, Anika Thielmann¹,², Birgitta Weltermann¹,².

Abstract

BACKGROUND: Occupational stress is associated with adverse outcomes for medical professionals and patients. In our cross-sectional study with 136 general practices, 26.4% of 550 practice assistants showed high chronic stress. As machine learning strategies offer the opportunity to improve understanding of chronic stress by exploiting complex interactions between variables, we used data from our previous study to derive the best analytic model for chronic stress: four common machine learning (ML) approaches are compared to a classical statistical procedure.
METHODS: We applied four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, and artificial neural network) and logistic regression as the standard approach to analyze factors contributing to chronic stress in practice assistants. Chronic stress had been measured by the standardized, self-administered TICS-SSCS questionnaire. The performance of these models was compared in terms of predictive accuracy based on the area under the receiver operating characteristic curve (AUC), sensitivity, and positive predictive value.
FINDINGS: Compared to the standard logistic regression model (AUC 0.636, 95% CI 0.490-0.674), all machine learning models improved prediction, with absolute AUC gains of +20.8 percentage points for random forest (AUC 0.844, 95% CI 0.684-0.843), +12.4 for artificial neural network (AUC 0.760, 95% CI 0.605-0.777), +15.1 for support vector machine (AUC 0.787, 95% CI 0.634-0.802), and +7.1 for K-nearest neighbours (AUC 0.707, 95% CI 0.556-0.735). As the best prediction model, random forest showed a sensitivity of 99% and a positive predictive value of 79%. Based on the variable frequencies at the decision nodes of the random forest model, the following five work characteristics influence chronic stress: too much work, high demand to concentrate, time pressure, complicated tasks, and insufficient support by practice leaders.
CONCLUSIONS: Regarding chronic stress prediction, machine learning classifiers, especially random forest, provided more accurate prediction compared to classical logistic regression. Interventions to reduce chronic stress in practice personnel should primarily address the identified workplace characteristics.


Year: 2021 PMID: 33945572 PMCID: PMC8096078 DOI: 10.1371/journal.pone.0250842

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

1. Introduction

Occupational stress is an important issue for health care and other workers worldwide [1]. Following the stress models introduced by Selye, Lazarus and others, it has been shown that chronic stress can lead to adverse (mental) health effects such as burnout or depression [2, 3]. Stress can also produce temporary or even permanent alterations in memory [4], cognition [5], arousal/sleep [6, 7], and coping behaviours [8]. In our prior study with 214 general practitioners (GPs) and 550 practice assistants from 136 German general practices, we showed that 19.9% of the male GPs (n = 141), 35.6% of the female GPs (n = 73) and 26.4% of the practice assistants (PrAs) had high chronic stress [9]. Overall, the prevalence of high chronic stress in this workforce was 26.3%, more than twice that of the general population (11%) studied in the representative German Health Interview and Examination Survey for Adults (DEGS1) with more than 7,900 participants [10, 11]. Analyzing various work and (regional) practice characteristics, we showed that only weekly working hours correlated with high chronic stress in GPs and PrAs. However, to develop effective prevention strategies, a more profound understanding of the factors causing and/or contributing to high psychological strain at the individual and group level is needed. As workplaces are typically complex and multifactorial social organizations, appropriate statistical methods are needed to analyse complex associations and cause-effect relationships. Prior studies addressing impaired psychological well-being in primary care workers used standard statistical procedures such as prevalence ratios and logistic regression models to evaluate associations [9, 12, 13].
These statistical approaches usually simplify the complex relationships between the independent variables (features) and the response (dependent) variable: they assume that each independent variable is linked to the outcome by a linear function (in logistic regression, linear on the log-odds scale). This is especially problematic when a dataset contains many non-linear relationships and interactions between independent variables, which make the model more complex [14]. Machine learning (ML) approaches now offer new opportunities to evaluate such complex relationships. Conceptually, ML has the benefit that it efficiently exploits complex and non-linear interactions between variables by minimizing the error between predicted and observed response values, which can improve model accuracy compared to standard approaches [15, 16]. Using a large dataset on practice assistants from our prior study, we aim to develop a better understanding of the workplace factors associated with chronic stress in practice assistants using machine learning. We therefore compare four machine learning classifiers (random forest, support vector machine, K-nearest neighbors, artificial neural network) with a standard logistic regression model, using standard measures of test accuracy, to derive the best prediction model for chronic stress in practice assistants in primary care. Regarding terminology, we would like to point out that we use the term “prediction” as in the context of machine learning: it refers to the output of an algorithm that has been trained on a dataset and is then applied to new data to forecast the likelihood of a particular outcome. In contrast, in epidemiological analyses, a (risk) prediction model refers to a mathematical equation that uses patient characteristics (risk factors) to estimate the probability of a defined outcome prospectively.
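A toy example, not taken from the study data, makes the problem concrete: if the outcome depends only on the interaction of two binary features (an XOR pattern), a logistic model with main effects only cannot do better than a predicted probability of 0.5 for every case, whereas adding the product term x1·x2 lets the same model separate the classes. The sketch fits both models by plain gradient ascent on the log-likelihood in standard-library Python; the data, learning rate, and iteration count are illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.2, n_iter=5000):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    An intercept column is prepended to every row of features.
    """
    X = [[1.0] + list(r) for r in rows]
    w = [0.0] * len(X[0])
    for _ in range(n_iter):
        grad = [0.0] * len(w)
        for x, y in zip(X, labels):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (y - p) * xj  # gradient of the log-likelihood
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return w


def predict_proba(w, row):
    x = [1.0] + list(row)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)))


# Toy XOR data: outcome is 1 when exactly one of the two features is present.
features = [(0, 0), (0, 1), (1, 0), (1, 1)]
outcome = [0, 1, 1, 0]

# Main effects only: log-odds = b0 + b1*x1 + b2*x2
w_main = fit_logistic(features, outcome)
p_main = [predict_proba(w_main, f) for f in features]

# Main effects plus the interaction term x1*x2
with_inter = [(x1, x2, x1 * x2) for x1, x2 in features]
w_inter = fit_logistic(with_inter, outcome)
p_inter = [predict_proba(w_inter, f) for f in with_inter]

print([round(p, 2) for p in p_main])    # [0.5, 0.5, 0.5, 0.5]
print([int(p > 0.5) for p in p_inter])  # [0, 1, 1, 0]
```

In logistic regression such interaction terms have to be specified by the analyst in advance; tree-based methods such as random forest can represent them through successive splits on the raw features.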

2. Methods

2.1 Data source

The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices took part in the survey, which was performed in 2014. The design of the study and key results for the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and trainee practice assistants) have been published [9]. This analysis addresses chronic stress in the 550 PrAs, who form the largest professional group in general practices. We documented that 26.4% of the 550 PrAs had high chronic stress, as did 19.9% of the male (n = 141) and 35.6% of the female (n = 73) GPs [9]. In this workforce, the proportion of workers with high chronic stress was 26.3% (n = 201).

2.2 Ethics statement

Ethical approval for the survey had been obtained from the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (reference number: 13-5536-BO, date of approval: 24/11/2014). All participants had received written information and signed informed consent forms. The principal investigator of the study (B.W.), who is a coauthor of this manuscript, provided the data for this analysis.

2.3 Outcome

The primary outcome is strain due to chronic stress over the past three months. Chronic stress was measured using the German short version of the standardized, validated, self-administered TICS-SSCS questionnaire, which measures strain due to chronic stress over the past three months [17, 18]. It consists of 12 items on 5-point Likert scales (0 = ‘never’ and 4 = ‘very often’). The item values are added to a sum score ranging from 0 to 48, with 0 denoting ‘never stressed’ and 48 ‘very often stressed’; the score reflects subjective strain due to chronic stress [17, 18]. Following the definition of chronic stress in our prior analysis, the TICS scores were dichotomized using the median (TICS = 23) as cut-off (0 = no chronic stress (TICS < 23), 1 = strain due to chronic stress (TICS ≥ 23)).
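The scoring and dichotomization rule can be written as two small functions. This sketch assumes complete responses already coded 0 to 4 (missing values are handled separately); the function names are hypothetical, and the cut-off of 23 is the one stated above.

```python
from typing import Sequence

N_ITEMS = 12               # TICS-SSCS items
MIN_CODE, MAX_CODE = 0, 4  # 0 = 'never' ... 4 = 'very often'
CUTOFF = 23                # score >= 23 -> strain due to chronic stress


def tics_sscs_sum(items: Sequence[int]) -> int:
    """Sum score of the 12 TICS-SSCS items (range 0 to 48)."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(items)}")
    if any(not MIN_CODE <= v <= MAX_CODE for v in items):
        raise ValueError("item responses must be coded 0 to 4")
    return sum(items)


def tics_sscs_status(items: Sequence[int]) -> int:
    """Dichotomized outcome: 1 = strain due to chronic stress, 0 = none."""
    return int(tics_sscs_sum(items) >= CUTOFF)


print(tics_sscs_sum([2] * 12), tics_sscs_status([2] * 12))  # 24 1
print(tics_sscs_status([2] * 11 + [1]))                     # sum 23 -> 1
print(tics_sscs_status([2] * 10 + [1, 1]))                  # sum 22 -> 0
```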

2.4 Socio-demographic and workplace characteristics

A total of 64 sociodemographic and workplace characteristics were used for the analyses. The sociodemographic characteristics included, e.g., age, marital status, and number of persons in the household. Work-related characteristics comprised details on employment (e.g., number of hours per week, work status, employment contract), duties in the practice (e.g., reception, telephone, prescriptions, blood pressure measurement) and subjective perceptions of workload (e.g., self-determination of the sequence of work steps, influence on work assigned, ability to plan the work independently). The standardized ‘short questionnaire for workplace analysis’ (German: Kurzfragebogen zur Arbeitsanalyse (KFZA)) was used to assess workplace characteristics [19]. For details on the work characteristics see Tables 1–3. In line with the TICS instrument, which addresses strain due to chronic stress during the past three months, all workplace characteristics were assessed for the past three months (see Table 4).

Table 1

Sociodemographic characteristics of practice assistants (n = 550) and strain due to chronic stress (measured by the standardized and validated TICS tool): Items and sum scores.

	Participants (N = 550)
Continuous variables	Mean	SD		Range
Age	38	12.61		16–71
Persons in household above age 18	2	1.12		0–6
Persons in household below age 18	1	0.84		0–6
Number of physicians in practice	3	2.16		1–10
Number of practice assistants in practice	8	7.66		0–35
Categorical variables	n		%
Female gender	544		99.3
Marital status
Married	277		50.4
Single	221		40.2
Divorced	45		8.2
Widowed	7		1.3
Number of persons in household	72		13.1
Cares for next of kin	75		13.6
Working hours/week
1–9 hours	12		2.2
10–19 hours	52		9.5
20–29 hours	116		21.1
30–39 hours	221		40.2
40–49 hours	116		21.1
50–59 hours	12		2.2
>60 hours	10		1.8
Working full time	364		66.2
Has open-ended employment contract	466		84.7
Had participated in stress seminar in the past	31		5.6
Had used counseling for stress reduction	50		9.1
High strain due to chronic stress (TICS ≥ 23)	125		22.7

Table 3

Self-assessment of workplace situation (n = 550 practice assistants).

Work aspects	Workplace factor	Mean Score (PrAs)	95% CI
Job content	Versatility	3.6	3.58–3.7
Job content	Completeness of task	3.5	3.41–3.57
Resources	Scope of action	3.4	3.37–3.49
	Social support	4.0	3.98–4.12
	Cooperation	3.6	3.53–3.66
Stressors	Qualitative work demands	2.2	2.14–2.29
	Quantitative work demands	2.9	2.83–3.01
	Work disruptions	2.7	2.67–2.81
	Workplace environment	2.2	2.13–2.3
Organizational culture	Information and participation	3.6	3.57–3.73
Organizational culture	Benefits	2.9*	2.77–2.94

Table 4

Chronic stress of practice assistants: Results of TICS (Trier Inventory for Chronic Stress) (n = 550).

How often in the last 3 months did you experience …	Never	Rarely	Sometimes	Frequently	Very Frequently
	n (%)	n (%)	n (%)	n (%)	n (%)
Fear, something unpleasant might occur	72 (13.1)	213 (38.7)	190 (34.5)	54 (9.8)	21 (3.8)
Lack of recognition for good performance	158 (28.7)	157 (28.5)	121 (22.0)	71 (12.9)	42 (7.6)
Times with too many obligations	38 (6.9)	119 (21.6)	167 (30.4)	157 (28.5)	67 (12.2)
Times when being unable to suppress worrying thoughts	90 (16.4)	174 (31.6)	182 (33.1)	83 (15.1)	21 (3.8)
Work is not appreciated despite doing the best	157 (28.5)	200 (36.4)	116 (21.1)	56 (10.2)	20 (3.6)
Everything is too much	86 (15.7)	174 (31.7)	174 (31.7)	85 (15.5)	30 (5.5)
Times of worry and one cannot stop it	138 (25.1)	186 (33.9)	139 (25.3)	57 (10.4)	29 (5.3)
Times when being unable to perform as expected	120 (21.8)	299 (54.4)	107 (19.5)	19 (3.5)	5 (0.9)
Times in which the responsibility for others is a burden	162 (29.5)	215 (39.1)	123 (22.4)	42 (7.6)	8 (1.5)
Times when the work gets too much	85 (15.5)	205 (37.3)	183 (33.3)	60 (10.9)	17 (3.1)
Fear of not being able to perform the tasks	126 (22.9)	229 (41.6)	137 (24.9)	43 (7.8)	15 (2.7)
Times when being overwhelmed with worries	165 (30.0)	189 (34.4)	128 (23.3)	45 (8.2)	23 (4.2)

*Missing values above 5%

2.5 Statistical analysis

2.5.1 Handling of missing data

The proportion of missing values ranged from 0.2% to 11%. Variables with more than 5% missing data are marked in Tables 1–3. Common imputation methods for supervised learning were applied to handle missing data [20]: the K-nearest neighbors algorithm (k = 10) was used to impute missing values in TICS scores, median imputation for continuous variables, and a separate category ‘unknown’ for categorical variables [20].
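A minimal standard-library sketch of the three imputation rules is shown below. The text gives only the rules and k = 10; the distance definition used here (Euclidean over the columns observed in both rows, with the donors' values averaged) is an assumption and differs in detail from library implementations such as scikit-learn's `KNNImputer`. Function names and data are hypothetical.

```python
import math
from statistics import median
from typing import List, Optional

Row = List[Optional[float]]


def impute_median(values: Row) -> List[float]:
    """Continuous variable: replace None by the median of the observed values."""
    m = median(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def impute_unknown(values: List[Optional[str]]) -> List[str]:
    """Categorical variable: replace None by a separate category 'unknown'."""
    return ["unknown" if v is None else v for v in values]


def knn_impute(rows: List[Row], k: int = 10) -> List[Row]:
    """Fill each missing cell with the mean of that column over the k nearest
    donor rows (Euclidean distance over columns observed in both rows)."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for d, donor in enumerate(rows):
                if d == i or donor[j] is None:
                    continue  # a donor must have the target column observed
                shared = [(a, b) for a, b in zip(row, donor)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    candidates.append((dist, donor[j]))
            candidates.sort(key=lambda t: t[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the cell missing if no donor exists
                out[i][j] = sum(nearest) / len(nearest)
    return out


# Hypothetical TICS item scores for four participants (None = missing)
items = [[1, 2, 3], [1, 2, None], [4, 4, 4], [1, 3, 3]]
print(knn_impute(items, k=2))             # [[1, 2, 3], [1, 2, 3.0], [4, 4, 4], [1, 3, 3]]
print(impute_median([30, None, 40, 50]))  # [30, 40, 40, 50]
print(impute_unknown(["single", None]))   # ['single', 'unknown']
```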

2.5.2 Preparation of datasets for machine learning

After pre-processing, the dataset was split into a ‘training’ and a ‘validation’ dataset to compare the machine learning classifiers. Fig 1 illustrates the study process flow. We used 10-fold cross-validation for the machine learning models to obtain a less biased estimate of their prediction accuracy (see Fig 2). Based on the literature, 10 was chosen as the number of folds, as it balances the time needed to complete the test against the bias and variance associated with the validation process [21-23]. K-fold cross-validation, also called rotation estimation, is used to minimize the bias associated with the random sampling of training and holdout data when comparing the predictive accuracy of two or more machine learning methods. In this method, the complete dataset (D) is randomly split into k mutually exclusive subsets (the folds D1, D2, …, Dk) of approximately equal size. The classification model is trained and tested k times: each time (t ∈ {1, 2, …, k}), it is trained on all folds except Dt and tested on the remaining fold Dt. The cross-validation estimate of the overall accuracy is calculated as the average of the k individual accuracy measures:

$$\mathrm{CVA} = \frac{1}{k}\sum_{i=1}^{k} A_i$$

Fig 1

Machine learning data extraction process flow.

Fig 2

K-Fold cross validation.

Where CVA stands for cross-validation accuracy, k is the number of folds used, and A is the accuracy measure of each fold [21].
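The procedure can be sketched in a few lines of standard-library Python. The model passed in is a hypothetical placeholder (a majority-class rule) used only to show the train/test rotation and the averaging step of the CVA formula; the seed, the dummy feature, and the function names are illustrative assumptions.

```python
import random
from statistics import mean


def k_fold_indices(n, k=10, seed=0):
    """Randomly partition case indices 0..n-1 into k mutually exclusive folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]  # fold sizes differ by at most one


def cross_validation_accuracy(X, y, fit, predict, k=10, seed=0):
    """Train on all folds except D_t, test on D_t, and average the k accuracies."""
    accuracies = []
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return mean(accuracies)  # CVA = (1/k) * sum of A_t


# Hypothetical placeholder model: always predict the training majority class.
def fit_majority(X_train, y_train):
    return max(set(y_train), key=y_train.count)


def predict_majority(model, x):
    return model


# Class sizes as reported in Section 3.2 (425 low strain, 125 high strain)
y = [0] * 425 + [1] * 125
X = [[0.0] for _ in y]
cva = cross_validation_accuracy(X, y, fit_majority, predict_majority, k=10)
print(round(cva, 3))  # 0.773
```

With these class sizes, even this trivial rule reaches a CVA of about 0.77, which illustrates why accuracy alone is a weak yardstick for an imbalanced outcome such as this one.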

2.5.3 Logistic regression as standard statistical procedure

Logistic regression (LR) is a classical statistical modelling procedure for analyzing the relationship between a dichotomous (binary) dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. LR models are frequently applied to exposure-event studies in medical research because they can be used to estimate odds ratios for the model predictors [24]. All variables significant in bivariate analysis were included in the logistic regression model.
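The screening step (bivariate tests first, then a multivariable model with the significant variables) can be sketched as follows. The text does not name the bivariate test; this sketch assumes a Pearson chi-square test on 2×2 tables for binary predictors and α = 0.05, and all counts and variable names are hypothetical.

```python
import math


def two_by_two(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction), p-value, odds ratio.

    a = exposed with high strain,   b = exposed without high strain,
    c = unexposed with high strain, d = unexposed without high strain.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio


def screen(tables, alpha=0.05):
    """Names of binary predictors whose bivariate test has p < alpha."""
    return [name for name, cells in tables.items() if two_by_two(*cells)[1] < alpha]


# Hypothetical 2x2 counts (a, b, c, d) for two binary predictors
tables = {
    "hypothetical_A": (30, 70, 10, 90),
    "hypothetical_B": (20, 80, 22, 78),
}
print(two_by_two(*tables["hypothetical_A"]))
print(screen(tables))  # ['hypothetical_A']
```

For a single binary predictor, the odds ratio of the 2×2 table equals exp(β) of a univariable logistic regression; in the multivariable model, each exp(β) is an odds ratio adjusted for the other predictors.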

2.5.4 Machine learning approaches

1) K-Nearest Neighbors (KNN) classifies an object by a majority vote of its neighbors: the object is assigned to the class most common among its k nearest neighbors (k is a positive integer). If k = 1, the object is simply assigned to the class of its nearest neighbor. KNN is a type of instance-based or lazy learning, in which the function is only approximated locally and all computation is deferred until classification [25, 26]. In this study, we used KNN with k = 10 neighbors, i.e. the ten closest observations in multidimensional space based on the Euclidean distance function. 2) Support Vector Machine (SVM) separates the outcome classes by a hyperplane in multidimensional space, searching for the hyperplane with the maximum margin. SVM generates the hyperplane iteratively to minimize the error. A basic SVM is a non-parametric linear classifier that creates the hyperplane using the Euclidean distance function from the nearest input values to determine the target states. To obtain probability estimates, a logistic regression model is fitted to the output of the support vector machine [25]. In this study, the SVM classifier used an RBF (radial basis function) kernel, a training error of 1.0E-12, and the default hyperplane boundary tolerance of 1.0E-03. To obtain proper probability estimates, we used the option that fits calibration models to the outputs of the SVM. 3) Random Forest (RF) is a collection of decision trees, each grown on a bootstrap sample and using a random subset of the possible predictors at each node. RF is used to reduce the variance associated with single decision trees [27, 28]. In this study, the forest consisted of 1,000 individual trees. A large number of trees increases the predictive accuracy of RF models, and the forest does not require extensive tuning [29].
Because error rates are insensitive to the number of features selected to split each node, we used the default random sample of √n predictors at each node, with n being the total number of predictors under consideration. The predicted probability was derived as the average prediction across all trees. 4) Artificial Neural Network (ANN) is a flexible computational model, consisting of interconnected groups of nodes (neurons), that can express complex non-linear relationships among features. A basic ANN consists of three layers of neurons, i.e. input, hidden, and output layer, which learn from data iteratively through backpropagation. A multilayer perceptron with one hidden layer has an input layer with as many nodes as there are features, the hidden layer, and an output layer [30]. This study used a multilayer perceptron classifier with one hidden layer, a learning rate of 0.3 (with decay), and a momentum of 0.2 for backpropagation. Suitable ranges for these parameters are 0.15–0.8 for the learning rate and 0.1–0.4 for the momentum [30]. The models were developed using Python (version 3.7.3) and the Scikit-Learn library (https://scikit-learn.org/stable/).
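Of the four classifiers described above, KNN is short enough to write out in full. The sketch below, in plain Python rather than a library implementation, shows the Euclidean-distance vote among k neighbours; the neighbour share doubles as a predicted probability, which is what an AUC calculation needs. Function names, the tie rule, and the toy data are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict_proba(train_X, train_y, x, k=10):
    """Share of the k nearest training cases (Euclidean distance) with outcome 1."""
    ranked = sorted(zip((euclidean(xi, x) for xi in train_X), train_y))
    neighbours = [yi for _, yi in ranked[:k]]
    return sum(neighbours) / len(neighbours)


def knn_predict(train_X, train_y, x, k=10):
    """Majority vote; an exact tie (possible with even k) goes to class 1 here."""
    return int(knn_predict_proba(train_X, train_y, x, k) >= 0.5)


# Hypothetical two-feature training data with a binary outcome
train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = [0, 0, 0, 1, 1, 1]
print(knn_predict_proba(train_X, train_y, (0.2, 0.2), k=3))  # 0.0
print(knn_predict(train_X, train_y, (5.5, 5.5), k=3))        # 1
```

In scikit-learn, the corresponding estimator would be `KNeighborsClassifier(n_neighbors=10)`; its default Minkowski metric with p = 2 is the Euclidean distance, and with uniform weights its `predict_proba` returns the same neighbour share.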

3. Results

3.1 Sociodemographic and workplace characteristics of the study population

The dataset comprised results from 550 PrAs in 136 general practices. The vast majority of PrAs were female (98.9%), with a mean age of 38 years (SD 12.6). Regarding marital status, 50.4% (n = 277) of the PrAs were married. On average, they had worked in the current practice for 18.8 years (SD 12.5); 32.5% worked part-time.

3.2. Primary outcome: Strain due to chronic stress

The TICS scores of the population ranged from 0 to 44, with a mean of 17.2 and a median of 17.0. In the total dataset, 22.7% (n = 125) had high strain due to chronic stress versus 77.3% (n = 425) with low strain due to chronic stress. Regarding socio-demographic characteristics, personnel with high strain due to chronic stress showed the following significant differences compared to those with low strain: older PrAs (mean 38.76) vs. younger PrAs (mean 24.36), and unmarried PrAs (29.4%) vs. married PrAs (17%). Caring for next of kin did not differ between groups. No gender-specific analysis was performed because PrAs were predominantly female (98.9%). All regression and machine learning approaches were applied to the dataset restricted to female participants (n = 546).

3.3. Results of four machine learning classifiers

3.3.1 Prediction accuracy

The performance of the machine learning classifiers was assessed on the validation dataset using Harrell’s c-statistic, which corresponds to the total area under the receiver operating characteristic curve (AUC) [31]. The results showed an AUC of 0.844 (95%CI, 0.684–0.843) for RF, 0.760 (95%CI, 0.605–0.777) for ANN, 0.787 (95%CI, 0.634–0.802) for SVM, and 0.707 (95%CI, 0.556–0.735) for KNN.
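For a binary outcome, Harrell's c-statistic is the ROC AUC: the proportion of all (case, non-case) pairs in which the case receives the higher predicted probability, with ties counted as one half. A plain-Python sketch with made-up scores:

```python
def c_statistic(labels, scores):
    """Concordance probability for a binary outcome (equals the ROC AUC).

    labels: 1 = outcome present, 0 = absent; scores: predicted probabilities.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case with and one without the outcome")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0  # case ranked above non-case
            elif p == n:
                concordant += 0.5  # tie counts one half
    return concordant / (len(pos) * len(neg))


# Made-up predictions for two cases and two non-cases
print(c_statistic([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is quadratic in the number of participants, which is unproblematic for a validation set of a few hundred; `sklearn.metrics.roc_auc_score(labels, scores)` returns the same value.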

3.3.2 Classification analysis

The corresponding sensitivity and positive predictive value (PPV) were 99% and 79% for RF, 87% and 85% for ANN, 87% and 86% for SVM, and 99% and 78% for KNN.
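Both measures follow from the confusion-matrix counts in Table 6: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). The check below recomputes them for the four machine learning rows, keeping the table's own assignment of the positive class; the dictionary layout is only an illustration.

```python
# Counts copied from Table 6: (TP, FN, TN, FP), with "positive" as labelled there.
TABLE6 = {
    "Random Forest": (420, 5, 15, 110),
    "K-nearest Neighbours": (421, 4, 6, 119),
    "Support Vector Machine": (369, 56, 66, 59),
    "Artificial Neural Network": (369, 56, 59, 66),
}


def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that the model classifies as positive."""
    return tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Share of positive classifications that are actually positive."""
    return tp / (tp + fp)


for name, (tp, fn, tn, fp) in TABLE6.items():
    print(f"{name}: sensitivity {sensitivity(tp, fn):.3f}, PPV {ppv(tp, fp):.3f}")
```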

3.4. Results of Logistic regression analysis

In bivariate analysis, the following factors were significantly associated with strain: persons in household below age 18, marital status, age, working hours/week, room equipment, work status, and, as duties in practice, performed laboratory work, obtained blood pressure readings, and performed Doppler examination of foot vessels/measured ankle-arm index. The c-statistic for logistic regression was an AUC of 0.636 (95%CI, 0.490–0.674). This model predicted 316 of 425 cases correctly, with a sensitivity of 75% and a positive predictive value (PPV) of 44%.

3.5. Comparison of ML and regression analysis

The prediction accuracy according to discrimination (AUC c-statistic) is shown in Table 5 for all models. All machine learning models achieved improvements compared to the standard logistic regression model, with absolute AUC gains of +20.8 percentage points for RF, +15.1 for SVM, +12.4 for ANN, and +7.1 for KNN. Random forest performed best of the four machine learning classifiers. Compared with the logistic regression baseline model, the RF classifier correctly identified 104 more strain due to chronic stress cases, increasing the sensitivity to 99% and the PPV to 79%. See Table 6 for more details of the machine learning models.
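The absolute changes in AUC reported here and in Table 5 are differences to the logistic regression AUC multiplied by 100, i.e. percentage points rather than relative percentages. A quick arithmetic check with the values from Tables 5 and 6:

```python
# AUC values from Table 5; gains are absolute differences to logistic regression.
AUC = {
    "Logistic Regression": 0.636,
    "K-nearest Neighbours": 0.707,
    "Support Vector Machine": 0.787,
    "Artificial Neural Network": 0.760,
    "Random Forest": 0.844,
}
baseline = AUC["Logistic Regression"]
gains_pp = {
    model: round((auc - baseline) * 100, 1)
    for model, auc in AUC.items()
    if model != "Logistic Regression"
}
print(gains_pp)
# {'K-nearest Neighbours': 7.1, 'Support Vector Machine': 15.1, 'Artificial Neural Network': 12.4, 'Random Forest': 20.8}

# Additional true positives of random forest over logistic regression (Table 6)
extra_tp = 420 - 316
print(extra_tp)  # 104
```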

Table 5

Performance of the machine learning algorithms predicting chronic stress derived from applying training algorithms on the validation dataset.

Higher c-statistics indicate better algorithm discrimination. The baseline (BL) standard logistic regression model is provided for comparison.

Algorithms	AUC c-statistic	95% CI lower limit	95% CI upper limit	Absolute change in AUC (percentage points)
BL: Logistic Regression	0.636	0.490	0.674	[Reference]
ML: K-nearest Neighbours	0.707	0.556	0.735	+7.1%
ML: Support Vector Machine	0.787	0.634	0.802	+15.1%
ML: Artificial Neural Network	0.760	0.605	0.777	+12.4%
ML: Random Forest	0.844	0.684	0.843	+20.8%

Table 6

Full details on classification analysis.

Algorithms	Chronic stress cases correct (True Positive)	Chronic stress cases incorrect (False Negative)	Total chronic stress cases	Non-chronic stress cases correct (True Negative)	Non-chronic stress cases incorrect (False Positive)	Total non-chronic stress cases	Sensitivity (True Positive Rate)	Positive Predictive Value (PPV)
Logistic Regression	316	109	425	68	57	125	0.751	0.440
ML: Random Forest	420	5	425	15	110	125	0.988	0.792
ML: K-nearest Neighbours	421	4	425	6	119	125	0.991	0.780
ML: Support Vector Machine	369	56	425	66	59	125	0.868	0.862
ML: Artificial Neural Network	369	56	425	59	66	125	0.868	0.848


3.6. Variable rankings in machine learning models

Of the four ML approaches used, variable importance can only be determined for the artificial neural network and random forest. The artificial neural network model ranks variables by their overall weighting within the model. Random forest ranks variable importance by how frequently a variable is selected as a decision node across its decision trees. KNN does not provide a method for variable importance or coefficients, and the nonlinear SVM classifier with RBF kernel used here has no variable importance method. For the logistic regression model, variable importance was determined by the coefficient effect size. Factors identified by logistic regression, such as persons in household below age 18, age below 35 years, and insufficient room equipment, were also identified by ANN and RF. The factors ranked highest by both ANN and RF were work-related characteristics such as too much work, high demand to concentrate, time pressure, complicated tasks, and insufficient practice room conditions (see Table 7).
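The selection-frequency ranking can be sketched as below. The exact counting rule is not spelled out in the text; this sketch assumes the frequency is the share of trees in which a variable appears at least once as a decision node, which fits the Table 7 frequencies each lying between 0 and 1 while summing to more than 1. The forest and variable names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def selection_frequency(trees: List[Iterable[str]]) -> Dict[str, float]:
    """Share of trees that use each variable at least once as a decision node.

    Each tree is given as the sequence of variable names at its split nodes.
    """
    counts: Counter = Counter()
    for split_vars in trees:
        counts.update(set(split_vars))  # count each variable once per tree
    n_trees = len(trees)
    return {var: c / n_trees for var, c in counts.most_common()}


# Hypothetical forest of four small trees
forest = [
    ["too_much_work", "time_pressure", "too_much_work"],
    ["high_demand_to_concentrate", "too_much_work"],
    ["time_pressure", "complicated_tasks"],
    ["too_much_work", "complicated_tasks", "age_le_35"],
]
print(selection_frequency(forest))
```

For a fitted scikit-learn `RandomForestClassifier`, the split variables of each tree in `estimators_` can be read from `tree_.feature` (leaf nodes carry a negative value); note that the built-in `feature_importances_` is impurity-based rather than a selection frequency.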

Table 7

The most influential predictor variables associated with chronic stress, listed by coefficient effect size (standard logistic regression), weighting (artificial neural network), and selection frequency (random forest).

Standard model		Machine learning models
Logistic regression	Coefficient	Artificial Neural Network	Weight (%)	Random Forest	Frequency
Obtained blood pressure readings	0.951	Too much work	39.7	Too much work	0.73
Persons in household below age 18	0.349	High demand to concentrate	39.3	High demands to concentrate	0.71
Working hours/week more than 40	0.121	Time pressure	36.7	Time pressure	0.70
Work status	-0.109	Complicated tasks	31.5	Complicated tasks	0.67
Performed laboratory work	0.091	Insufficient practice room conditions	18.1	Age ≤ 35	0.63
Employment contract	0.063	Interrupted during work	14.9	Insufficient support by practice leaders	0.52
Age ≤ 35	0.045	Persons in household below age 18	13.8	Insufficient workplace environment	0.51
Insufficient workplace environment	0.028	Working hours/week more than 40 hours	12.7	Insufficient practice room conditions	0.50
Performed doppler examination of foot vessels/measured ankle-arm index	0.018	Workplace environment	12.3	Holding together well	0.48
Marital status/single	0.006	Number of practitioners in the practice	10.6	Influence on work assigned	0.43

4. Discussion

To the best of our knowledge, this study is the first to use machine learning for a better understanding of stress in primary care practice personnel. Comparing four common machine learning (ML) approaches to a classical statistical procedure, we showed that all four machine learning approaches provided more accurate models for predicting strain due to chronic stress than standard regression analysis. Random forest showed the highest accuracy, with workload, high demand to concentrate, and time pressure being the most important factors associated with chronic stress. These factors have also been identified in other studies of GPs and general practice personnel. Addressing job satisfaction, Harris et al. identified time pressure as the most frequent stressor in a study with 626 Australian practice staff in 96 general practices [12]. Studying 158 Canadian family physicians, Lee et al. identified the following relevant occupational stressors: challenging patients, high workload, time limitations, competency issues, challenges of documentation and practice management, and changing roles within the workplace [13, 32]. Similarly, Hoffmann et al. identified work disruptions as a relevant negative workplace factor in a study with 550 practice assistants [33]. These stressors have been described as contributing to poor physician well-being and to adverse patient outcomes such as low patient satisfaction [34]. The relevance of such chronic psychological burden is tremendous, as physiological responses to stress have been shown to negatively affect, e.g., memory, immune system function, cardiovascular function, and brain electrical activity [35, 36].

4.1 Comparison to other ML analyses

A few other studies from other medical fields have compared standard statistical and ML approaches, with results similar to ours. Machine learning is considered a branch of artificial intelligence that extracts meaningful patterns from data and develops prediction models using several algorithms [37]. ML approaches can integrate many different levels of data to develop new classifications for medical issues such as chronic stress, which can be linked more precisely to interventions for a given individual. Better model accuracy with machine learning was also found in a UK study on cardiovascular risk prediction. Using routine clinical data of 378,256 patients, four machine learning algorithms (random forest, logistic regression, gradient boosting, and neural network) were compared to an established algorithm (American College of Cardiology guidelines) to predict a first cardiovascular event over 10 years [38]. The neural network performed best, improving predictive accuracy by 3.6% compared to the baseline algorithm. Using a dataset of 9,502 heart failure patients with a one-year follow-up, a US study compared four machine learning methods (least absolute shrinkage and selection operator regression, classification and regression trees, random forests, and gradient boosted modeling (GBM)) with logistic regression as a classical statistical procedure to predict four heart failure outcomes. The C statistic results for all outcomes showed that ML methods were better calibrated and that the gradient-boosted (GBM) model was the most consistent ML modeling approach [39]. In the field of oncology, a large American study on breast cancer survival compared two ML algorithms (artificial neural network and decision trees) with classical logistic regression using a dataset of more than 200,000 cases. The decision tree approach was the best predictor with 93.6% prediction accuracy, followed by the artificial neural network with 91.2% and LR with 89.2% [40].
Overall, machine learning approaches yielded more accurate results than classical methods in our and the above-mentioned studies.

4.2 Strengths and limitations

The key strength of this study is the comparison of a range of machine learning approaches in the field of healthcare workers’ well-being. Chronic stress measurement approaches based on self-reported questionnaires [17, 41] are subjective and cannot provide immediate information about the state of a person. Limitations of this study include the rather small sample size and the large number of predictor variables (features), which poses a risk of overfitting [42, 43]; predictive accuracy depends strongly on the amount and quality of the data. Furthermore, our data source contained practice assistants from one German region only, which limits generalizability and requires validation in populations from other countries, where job tasks and challenges might differ. Although the data were collected in 2014, the results still apply to German practices, except that the COVID pandemic has likely increased workload and psychological burden, which we are currently evaluating in an ongoing study [11]. Prospectively, research using continuous stress monitoring and data mining technologies will help to better understand stress patterns and provide deeper insights for possible future interventions.

5. Conclusion

Compared to logistic regression as a classical statistical procedure, this study showed that all machine learning classifiers provided more accurate models for the prediction of chronic stress in practice assistants, with random forest performing best. Identifying chronic stress is important for the well-being and productivity of practice assistants. RF identified prominent predictor variables (features) influencing chronic stress, which should be considered when developing interventions to reduce chronic stress.

23 Dec 2020

PONE-D-20-23593

Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model

PLOS ONE

Dear Dr. Bozorgmehr,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please submit your revised manuscript by Feb 06 2021 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

An unmarked version of your revised paper without tracked changes.
You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,
Alfredo Vellido
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please clarify in your Methods section how the dataset was obtained by the authors of this study, and whether there was any ethical oversight over the data collection for this study. Please state whether or not the authors had access to any identifying information.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.
In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. Please see http://www.bmj.com/content/340/bmj.c181.long for guidelines on how to de-identify and prepare clinical data for publication. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Dear Authors the manuscript entitled "Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model" has several interesting and strategic findings and i certainly used it in the real life. the chronic stress is very important issue in our society for health. thank you for your study. some minor revision is only needed. although the manuscript is too long and the part of method was explained in the details but i do not recommend to summarize because it help some readers who not know intelligent artificial method.
another limitation that you mentioned is that certainly the important features of work in the chronic stress is dependent to several factor of social-psychological-economy- culture. then these factors were important in that community or culture. however the results can give us a closer look at reality. i suggest that the researchers analyse again the classifiers only in the women subjects and clear the men. i think the accuracy of results increases.

Abstract: the background of abstract should be brief and general and the explanation of study add to the method. the names of questionnaires enter in the method.

introduction: change the phrase of chronic strain to chronic stress. strain is force that could cause stress and stress is mental state and used as impairment.

results: the table 6: the stress is add to this topic:Total Non-(stress) Cases =

if possible add the weight to main features listed in the table 7

discussion: as a strategic note you add the some complications due to chronic stress for general health or some studies that measure the effect of stress on brain function or other parts
The impact of stress on body function: A review H Yaribeygi, Y Panahi, H Sahraei, TP Johnston, A Sahebkar EXCLI journal 16, 1057
Support vector machine classification of brain states exposed to social stress test using EEG-based brain network measures S Lotfan, S Shahyad, R Khosrowabadi, A Mohammadi, B Hatef Biocybernetics and Biomedical Engineering 39 (1), 199-213

Reviewer #2: Dear authors, your manuscript is interesting but I need you to answer some questions:

INTRODUCTION
- The introduction is very short. The constructs and concepts necessary to understand the manuscript are not explained.
- Page 4, paragraph 1, lines 47-50: this information should go in the “Method” section.

METHODS
Data source:
- What categories are there among the "general practice personnel"? The authors must describe the sample used.
- What was the target population? How was the sample chosen?
The authors must specify it.

DISCUSSION
Page 21, lines 272-297: The first two paragraphs do not contribute anything new and repeat information about the results.

Limitations
The information is from 2014. There was a global economic crisis that affected working conditions. It should be said since currently, working conditions are not equivalent.

REFERENCES
Many bibliographies are obsolete and some citations are incomplete. The bibliographic citations used are more than 5 years old (57,1%). The authors must update and arrange the bibliography. Too many references do not meet the journal guidelines and that have errors. The authors should review this section.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Boshra Hatef
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org.
Please note that Supporting Information files do not need this step.

16 Feb 2021
Response to Editor
PONE-D-20-23593
Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model
PLOS ONE

Dear Dr. Vellido,

We like to thank you and the reviewers for the very helpful suggestions. Please find enclosed our revision and answers to the open items.

Best regards,
Arezoo Bozorgmehr

Editor comments:

1. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future.

Reply: Not applicable.

2. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf.

Reply: Done.

3. Please clarify in your Methods section how the dataset was obtained by the authors of this study, and whether there was any ethical oversight over the data collection for this study. Please state whether or not the authors had access to any identifying information.

Reply: Thank you, we clarified this. The information regarding to the ethics statement was already included (please see P.7, lines 130-134). We now highlighted this by inserting in a new headline and added requested aspects.

Text: P. 7, lines 131-136 (document of revised Manuscript with track changes): “2.2 Ethics statement: Ethical approval for survey had been obtained from the Ethics Committee of the Medical Faculty of the University of Duisburg-Essen (reference number: 13-5536-BO, date of approval: 24/11/2014). All participants had received written information and signed informed consent forms.
The principal investigator of the study (B.W) and coauthor of this manuscript provided the data for this analysis.”

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

Reply: The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee (ethik@ukbonn.de).

5. In your revised cover letter, please address the following prompts: a. If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially identifying or sensitive patient information) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

Reply: The manuscript’s data cannot be shared publicly because of ethical restrictions as our dataset includes potentially identifying information of personnel in general practices. Data requests may be sent to the institutional ethics committee (ethik@ukbonn.de).

Reviewer reports:

Reviewer #1:

1. although the manuscript is too long and the part of method was explained in the details but i do not recommend to summarize because it help some readers who not know intelligent artificial method.

Reply: Thank you.

2. another limitation that you mentioned is that certainly the important features of work in the chronic stress is dependent to several factor of social-psychological-economy- culture. then these factors were important in that community or culture.
however the results can give us a closer look at reality. i suggest that the researchers analyse again the classifiers only in the women subjects and clear the men. i think the accuracy of results increases.

Reply: Thank you for this suggestion. We fully agree and had used only the dataset with the women for our analyses. This is outlined in the manuscript.

Revised text: Please see section 3.2, line 254-257 (document of revised Manuscript with track changes). “No gender-specific distribution was applied, because PrAs were predominantly female (98.9%). All regression and machine learning approaches were applied to the dataset with female subjects only (n=546).”

3. Abstract: the background of abstract should be brief and general and the explanation of study add to the method. the names of questionnaires enter in the method.

Reply: Thank you for your suggestion, we changed this.

Revised text: Please see abstract (document of revised Manuscript with track changes): “Background Occupational stress is associated with adverse outcomes for medical professionals and patients. In our cross-sectional study with 136 general practices, 26.4% of 550 practice assistants showed high chronic stress. As machine learning strategies offer the opportunity to improve understanding of chronic stress by exploiting complex interactions between variables, we used data from our previous study to derive the best analytic model for chronic stress: four common machine learning (ML) approaches are compared to a classical statistical procedure.”

“Methods We applied four machine learning classifiers (random forest, support vector machine, K-nearest neighbors’, and artificial neural network) and logistic regression as standard approach to analyze factors contributing to chronic stress in practice assistants. Chronic stress had been measured by the standardized, self-administered TICS-SSCS questionnaire.
The performance of these models was compared in terms of predictive accuracy based on the ‘operating area under the curve’ (AUC), sensitivity, and positive predictive value.”

4. Introduction: change the phrase of chronic strain to chronic stress. strain is force that could cause stress and stress is mental state and used as impairment.

Reply: Done

Revised text: Please see page 5, line 104 (document of revised Manuscript with track changes): “… associated with chronic stress in practice assistants …”

5. results: the table 6: the stress is add to this topic:Total Non-(stress) Cases =

Reply: Thank you for your hint, we corrected this.

Revised text: Please see table 6.

6. if possible add the weight to main features listed in the table 7

Reply: We added this information as suggested, thank you.

Revised text: Please see table 7.

7. discussion: as a strategic note you add the some complications due to chronic stress for general health or some studies that measure the effect of stress on brain function or other parts
The impact of stress on body function: A review H Yaribeygi, Y Panahi, H Sahraei, TP Johnston, A Sahebkar EXCLI journal 16, 1057
Support vector machine classification of brain states exposed to social stress test using EEG-based brain network measures S Lotfan, S Shahyad, R Khosrowabadi, A Mohammadi, B Hatef Biocybernetics and Biomedical Engineering 39 (1), 199-213

Reply: Thank you for these interesting articles, which we added to our paper.

Revised text: Please see references on page 22, lines 352-354 (document of revised Manuscript with track changes). “The relevance of such chronic psychological burden is tremendous as it was shown that physiological responses due to stress negatively affect e.g. memory, immune system functions, the function of the cardiovascular system, and brain electric activity [35,36].”

Reviewer #2:

8. INTRODUCTION a) The introduction is very short. The constructs and concepts necessary to understand the manuscript are not explained.
Reply: Thank you, we revised the text profoundly. First, we refer to the construct of stress as developed by Selye and Lazarus. Second, we outline the construct of practices being multi-parameter systems, which affect professionals working there. Third, we outlined the concept of machine learning as analytic strategy more in detail.

Revised text: Please see pages 4-6, lines 59-115 (document of revised Manuscript with track changes): “Occupational stress is an important issue in health care and other workers worldwide [1]. Following stress models introduced by Selye, Lazarus and others, it was shown that chronic stress can lead to adverse (mental) health effects such as burnout or depression [2,3]. Also, stress can produce temporary or even permanent alterations in memory [4], cognition [5], arousal/sleep [6,7], and coping behaviours [8]. In our prior study with 214 general practitioners (GPs) and 550 practice assistants from 136 German general practices, we showed that 19.9% of the male GPs (n = 141), 35.6% of the female GPs (n = 73) and 26.4% of the practice assistants (PrAs) had high chronic stress [9]. Overall, the mean prevalence of high chronic stress was 26.3% in this workforce, which is more than twice as prevalent compared to the general population (11%) studied in the representative German Health Interview and Examination Survey for Adults (DEGS1) with more than 7.900 participants [10,11]. Analyzing for various work and (regional) practice characteristics, we showed that only the weekly working hours correlated with high chronic stress in GPs and PrAs. However, aiming to develop effective prevention strategies, a more profound understanding of factors causing and/or contributing to high psychological strain on an individual and group level is needed. As workplaces typically are complex and multifactorial social organizations, appropriate statistical methods are needed to analyse for complex associations and cause-effect relationships.
Prior studies addressing impaired psychological well-being in primary care workers used standard statistical procedures such as prevalence ratios and logistic regression models to evaluate for associations [9,12,13]. These statistical approaches usually simplify the complex relationships between independent variables (features) and response variable (dependent variable): they assume that each independent variable is linked to the outcome by a linear statistical function. This is especially problematic when datasets with large numbers of non-linear interactions and interaction effects between independent variables occur, which make the model more complex [14]. Nowadays, machine learning (ML) approaches offer new opportunities to evaluate complex relationships. Conceptually, ML has the benefit that it efficiently exploits complex and non-linear interactions between variables by minimizing the error between predicted and observed response variables and improve the accuracy of the models compared to standard approaches [15,16]. By using a large dataset available on practice assistants from our prior study, we aim to develop better understanding workplace factors, associated with chronic stress in practice assistants using machine learning. Thus, we compare four machine learning classifiers (random forest, support vector machine, K-nearest neighbors’, artificial neural network) with a standard logistic regression model using standard measurements to compare test accuracy, i.e. to derive the best prediction model for chronic stress in practice assistants in primary care. Regarding terminology, we like to point out that we use the term “prediction” as used in the context of machine learning: it refers to the output of an algorithm after it has been trained on a dataset and applied to new data to forecast the likelihood of a particular outcome. 
In contrast, in epidemiological analyses, a (risk) prediction model refers to a mathematical equation that uses patient characteristics (risk factors) to estimate the probability of a defined outcome prospectively.”

b) Page 4, paragraph 1, lines 47-50: this information should go in the “Method” section.

Reply: Thank you for your advice. We now clarified that these results stem from our previous publication on chronic stress in GPs and practice assistants in the introduction. In addition, we included this information in the methods section.

Revised text: Please see page 7, lines 127-130 (document of revised Manuscript with track changes). “We documented that 26.4% of the 550 practice assistants (PrAs) had high chronic stress, as well as 19.9% of the male (n = 141) and 35.6% of the female (n = 73) general practitioners (GPs) [9]. In this workforce, the average of workers with high chronic stress was 26.3% (n = 201).”

9. METHODS Data source: a) What categories are there among the "general practice personnel"? The authors must describe the sample used.

Reply: Thank you, we added the information in the introduction and methods section.

Revised text: Please see Methods section, lines 119-127 (document of revised Manuscript with track changes): “The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices had taken part in the survey, which was performed in 2014. The design of the study and key results addressing the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and practice assistants in trainees) are published [9].
This analysis addresses chronic stress in 550 practice assistants (PrAs), which are the largest professional group in general practices.”

b) What was the target population? How was the sample chosen? The authors must specify it.

Reply: Thank you. We clarified this in the introduction and methods section. The target populations were 550 practice assistants (PrAs) from 136 teaching practice network.

Revised text: Please see page 7, lines 119-127 (document of revised Manuscript with track changes). “The dataset used for the analyses was derived from our cross-sectional study addressing stress among general practice personnel (GPs, PrAs), which was performed among general practices belonging to the teaching practice network of the Institute for General Medicine, University Hospital Essen, Essen, Germany. A total of 764 professionals from 136 practices had taken part in the survey, which was performed in 2014. The design of the study and key results addressing the 214 GPs (practice owners and employed physicians) and 550 practice assistants (PrAs) (including medical secretaries and practice assistants in trainees) are published [9]. This analysis addresses chronic stress in 550 practice assistants (PrAs), which are the largest professional group in general practices.”

10. DISCUSSION a) Page 21, lines 272-297: The first two paragraphs do not contribute anything new and repeat information about the results.

Reply: We fully agree, thank you for pointing this out, we revised the text.

Revised text: Please see the pages 21-22, lines 311-354 (document of revised Manuscript with track changes). “To the best of our knowledge, this study is the first to use machine learning for a better understanding of stress in primary care practice personnel.
Comparing four common machine learning (ML) approaches to a classical statistical procedure, we showed that all four machine learning approaches provided more accurate models for the prediction of strain due to chronic stress than as standard regression analysis. Random forest showed the highest accuracy with workload, high demand to concentrate, and time pressure being the most important factors associated with chronic stress. These factors were also identified in other studies in the target populations GPs and GP practice personnel. Addressing job satisfaction, Harris et al. identified time pressure as the most frequent stressor in a study with 626 Australian practice staff in 96 general practices [12]. Studying 158 Canadian family physicians, Lee et al. determined the following occupational stressors as relevant: challenging patients, high workload, time limitations, competency issues, challenges of documentation and practice management and changing roles within the workplace [13,32]. Similarly, Hoffmann et al. showed that the work disruption was a negative relevant workplace factor in study with 550 practice assistants [33]. These stressors are described to influence poor physician well-being and adverse patient outcomes such as low patient satisfaction [34]. The relevance of such chronic psychological burden is tremendous as it was shown that physiological responses due to stress negatively affect e.g. memory, immune system functions, the function of the cardiovascular system, and brain electric activity [35,36].”

11. Limitations a) The information is from 2014. There was a global economic crisis that affected working conditions. It should be said since currently, working conditions are not equivalent.

Reply: The working conditions in German general practices did not change during the last years (except during the current pandemic).
Workplaces are secure, there are no insolvencies of practices, and the income of practices is a mixture of reimbursement by the statutory health insurances and private patients. The migration influx in 2015 led to more patients in the system, but for each practice these were small numbers. Also, the gross national product remained stable for Germany (https://en.wikipedia.org/wiki/Gross_national_income).

Revised text: Please see page 24, lines 399-401 (document of revised Manuscript with track changes). “Although the data collection was conducted in 2014, the results still apply to German practices, except that the COVID pandemic likely increased workload and psychological burden, which we are currently evaluating in an ongoing study [11].”

12. REFERENCES a) Many bibliographies are obsolete and some citations are incomplete. The bibliographic citations used are more than 5 years old (57,1%). The authors must update and arrange the bibliography. Too many references do not meet the journal guidelines and that have errors. The authors should review this section.

Reply: Thank you, we reviewed the literature again and updated references. On the other hand, we continue to refer to important studies in the field even if they are older than 5 years. Now only 32.5% of the quotations are older than 5 years.

Revised text: Please see section references.

Submitted filename: Response to Reviewers.docx

15 Apr 2021
Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model
PONE-D-20-23593R1

Dear Dr. Bozorgmehr,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments.
When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Alfredo Vellido
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data, e.g. participant privacy or use of data from a third party, those must be specified.

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #2: Dear authors, Thanks for your reply. The explanations of the authors are satisfactory. The paper has greatly improved its quality. Congratulations on your work. Best regards

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.
If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #2: No 19 Apr 2021 PONE-D-20-23593R1 Chronic stress in practice assistants: an Analytic approach comparing four machine learning classifiers with a standard logistic regression model Dear Dr. Bozorgmehr: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org. If we can help with anything else, please email us at plosone@plos.org. Thank you for submitting your work to PLOS ONE and supporting open access. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Alfredo Vellido Academic Editor PLOS ONE

Table 2

Practice and workplace characteristics during the past three months (n = 550 practice assistants).

Characteristic	n	%
Practice characteristics
Practice structure
Working in group practice	296	53.8
Working in single physician practice	147	26.7
Working in practice with several locations	50	9.1
Working in practice with an employed physician	39	7.1
Working in privately owned health center	6	1.1
Medical records
Electronic health records (EHR)	348	63.3
Paper and electronic records	187	34.0
Practice services
Emergency home visits	515	93.6
Practice offers regular home visits	511	92.9
Nursing home visits*	508	92.4
Tasks of practice assistants during the past 3 months
Scheduled appointments	518	94.2
Documented in patients' EHR	513	93.3
Prepared prescriptions	504	91.6
Pulled up paper health records or opened electronic patient files	500	90.9
Performed phone service	499	90.7
Worked at reception	486	88.4
Obtained blood pressure readings	461	83.8
Performed ECGs	430	78.2
Prepared practice equipment for the day and switched it off in the evening	414	75.3
Performed laboratory work	393	71.5
Supported the physician during patient consultations	363	66.0
Supported billing of statutory health insurance patients	358	65.1
Performed disease-management examinations	332	60.4
Applied long-term blood pressure devices*	327	59.5
Ordered medical supply	284	51.6
Applied long-term ECG*	247	44.9
Ordered office supply	239	43.5
Performed treadmill testing	237	43.1
Supported billing of private patients*	236	42.9
Performed Doppler examinations of foot vessels/measured the ankle-arm index*	103	18.7

*Missing values above 5%
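The percentage column of Table 2 can be reproduced from the counts alone: each value equals n divided by all 550 practice assistants, rounded to one decimal place. This also holds for the rows marked * (more than 5% missing values), which suggests those percentages are shares of the whole sample rather than of the assistants who answered the item. The following Python sketch is a minimal check of that reading; `ROWS`, `percent_of_sample` and `mismatches` are hypothetical names introduced for illustration, and the row labels are abbreviated.

```python
# Consistency check for Table 2. Assumption (inferred from the arithmetic,
# not stated by the authors): every percentage is n / 550 * 100, rounded to
# one decimal place, including the rows with more than 5% missing values.
N_TOTAL = 550  # practice assistants, from the table caption

# (abbreviated row label, reported n, reported %) transcribed from Table 2;
# the ROWS name and tuple layout are hypothetical, chosen for this sketch.
ROWS = [
    ("Group practice", 296, 53.8),
    ("Single physician practice", 147, 26.7),
    ("Several locations", 50, 9.1),
    ("Employed physician", 39, 7.1),
    ("Privately owned health center", 6, 1.1),
    ("Electronic records", 348, 63.3),
    ("Paper and electronic records", 187, 34.0),
    ("Emergent home visits", 515, 93.6),
    ("Regular home visits", 511, 92.9),
    ("Nursing home visits*", 508, 92.4),
    ("Scheduled appointments", 518, 94.2),
    ("Documented in EHR", 513, 93.3),
    ("Prepared prescriptions", 504, 91.6),
    ("Pulled up health records", 500, 90.9),
    ("Phone service", 499, 90.7),
    ("Reception", 486, 88.4),
    ("Blood pressure readings", 461, 83.8),
    ("ECGs", 430, 78.2),
    ("Practice equipment", 414, 75.3),
    ("Laboratory work", 393, 71.5),
    ("Supported physician in consultations", 363, 66.0),
    ("Billing, statutory insurance", 358, 65.1),
    ("Disease-management examinations", 332, 60.4),
    ("Long-term blood pressure devices*", 327, 59.5),
    ("Ordered medical supply", 284, 51.6),
    ("Long-term ECG*", 247, 44.9),
    ("Ordered office supply", 239, 43.5),
    ("Treadmill testing", 237, 43.1),
    ("Billing, private patients*", 236, 42.9),
    ("Doppler/ankle-arm index*", 103, 18.7),
]


def percent_of_sample(n: int, total: int = N_TOTAL) -> str:
    """Return n as a share of the full sample, formatted to one decimal."""
    return f"{100 * n / total:.1f}"


def mismatches(rows=ROWS):
    """List rows whose reported % differs from n / total after rounding."""
    return [
        (label, n, pct, percent_of_sample(n))
        for label, n, pct in rows
        if percent_of_sample(n) != f"{pct:.1f}"
    ]


if __name__ == "__main__":
    bad = mismatches()
    print(f"{len(ROWS)} rows checked, {len(bad)} mismatches")
    for row in bad:
        print(row)
```

Run as a script, it should report 30 rows checked and no mismatches. Comparing one-decimal strings rather than floats avoids spurious failures from binary rounding; with a denominator of 550 no value lands exactly on a rounding tie.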

