A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran.

Hamed Tavolinejad¹,², Shahin Roshani¹,³, Negar Rezaei¹,⁴, Erfan Ghasemi¹, Moein Yoosefi¹, Nazila Rezaei¹, Azin Ghamari¹, Sarvenaz Shahin¹, Sina Azadnajafabad¹, Mohammad-Reza Malekpour¹, Mohammad-Mahdi Rashidi¹, Farshad Farzadfar¹,⁴.

Abstract

BACKGROUND: The increasing burden of hypertension in low- to middle-income countries necessitates the assessment of care coverage to monitor progress and guide future policies. This study uses an ensemble learning approach to evaluate hypertension care coverage in a nationally representative Iranian survey.
METHODS: The data source was the cross-sectional 2016 Iranian STEPwise approach to risk factor surveillance (STEPs). Hypertension was based on blood pressure ≥140/90 mmHg, reported use of anti-hypertensive medications, or a previous hypertension diagnosis. The four steps of care were screening (irrespective of blood pressure value), diagnosis, treatment, and control. The proportion of patients reaching each step was calculated, and a random forest model was used to identify features associated with progression to each step. After model optimization, the six most important variables at each step were considered to demonstrate population-based marginal effects.
RESULTS: The total number of participants was 30541 (52.3% female, median age: 42 years). Overall, 9420 (30.8%) had hypertension, among which 89.7% had screening, 62.3% received diagnosis, 49.3% were treated, and 7.9% achieved control. The random forest model indicated that younger age, male sex, lower wealth, and being unmarried/divorced were consistently associated with a lower probability of receiving care in different levels. Dyslipidemia was associated with reaching diagnosis and treatment steps; however, patients with other cardiovascular comorbidities were not likely to receive more intensive blood pressure management.
CONCLUSION: Hypertension care was mostly missing the treatment and control stages. The random forest model identified features associated with receiving care, indicating opportunities to improve effective coverage.

Substances:
Antihypertensive Agents

Year: 2022 PMID: 36129936 PMCID: PMC9491523 DOI: 10.1371/journal.pone.0273560

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752

Introduction

Success in controlling communicable diseases, together with population growth and aging, has led to a demographic and epidemiologic shift in low- and middle-income countries (LMICs) [1]. As a result, the health-related burden of non-communicable diseases (NCDs) has become one of the most significant social and economic challenges that LMICs face on the path to sustainable development [1, 2]. The health-care systems of LMICs struggle to make the necessary adaptations, since NCDs require longitudinal, patient-centered, and multilevel care [3]. Hypertension is a leading NCD risk factor that can lead to mortality and morbidity [4], and over the past decades the burden of hypertension has shifted to LMICs, where the prevalence of high blood pressure (BP) has increased [4, 5].

Assessment of health-care system performance and coverage for NCDs is essential to guide public health policies and to succeed in reducing risk factors. Care cascade models are used to assess coverage and gaps in care for chronic infectious diseases such as human immunodeficiency virus and latent tuberculosis infection [6, 7], and a number of studies have performed similar care cascade analyses for NCDs [8, 9]. Thus far, an evaluation of national-level care for NCDs and their risk factors has not been reported from Iran. In this context, we used ensemble learning to evaluate the state of hypertension care. Such methods can be particularly useful for deriving meaningful inferences from large datasets [10], which makes them a reasonable candidate for evaluating NCD care. In this study, we aimed to identify the features associated with receiving appropriate interventions at different stages of hypertension care by applying a random forest model to a large, nationally representative dataset.

Materials and methods

Data source

This analysis was performed on data from the 2016 STEPwise approach to risk factor Surveillance (STEPs) in Iran. STEPs is a population-based, large-scale, cross-sectional study that monitors NCDs based on the STEPs framework developed by the World Health Organization (WHO) [11]. The design of the 2016 STEPs survey is described in detail elsewhere [12]. The STEPs data were deemed appropriate for this analysis since the survey samples a wide variety of communities, encompasses a broad age range, and employs a standardized method.

Ethical considerations

The 2016 STEPs study complied with the latest edition of the Declaration of Helsinki and was approved by the ethics committee of the National Institute for Medical Research Development (NIMAD Approval ID: IR.NIMAD.REC.1394.032). Participants received a detailed explanation of the rationale and objectives of the study and provided written informed consent before inclusion.

Study population

We included patients with hypertension, defined as the presence of any of the following: (I) systolic BP (SBP) ≥140 mmHg or diastolic BP (DBP) ≥90 mmHg; (II) ever using medications for hypertension; or (III) a previous diagnosis of hypertension by a health-care provider (HCP) [13]. The 2016 STEPs anthropometry section required three BP measurements at three-minute intervals. If all three values were available, the mean of the latter two measurements was used. In participants with two readings, the first was discarded and the second measurement entered the dataset. An algorithmic description of the hypertension definition is presented in S1 File/Methods.
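
The inclusion rule above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code (their algorithm is in S1 File/Methods): the function names are hypothetical, and returning `None` when fewer than two readings exist is an assumption, since the text does not describe that case.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_bp(readings: Sequence[float]) -> Optional[float]:
    """Collapse repeated BP readings into one value, following the rule in the text."""
    if len(readings) >= 3:
        # Three readings available: use the mean of the latter two.
        return mean(readings[1:3])
    if len(readings) == 2:
        # Two readings: discard the first, keep the second.
        return readings[1]
    # Assumption: the text does not say how fewer than two readings were handled.
    return None


def is_hypertensive(
    sbp: Optional[float],
    dbp: Optional[float],
    ever_medicated: bool,
    prior_diagnosis: bool,
) -> bool:
    """True if any of criteria (I), (II), or (III) is met."""
    elevated = (sbp is not None and sbp >= 140) or (dbp is not None and dbp >= 90)
    return elevated or ever_medicated or prior_diagnosis
```

For example, a participant with a measured BP of 120/70 mmHg and a previous diagnosis is still included, because criterion (III) alone is sufficient.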

Steps of care

We defined each step by a set of criteria and measured it as the proportion of hypertensive patients who fulfilled those criteria. The first step (designated as “Screening”) determined whether the patient’s BP had ever been measured by an HCP. Fulfillment of screening was based directly on a history of BP measurement by an HCP, irrespective of the presence of hypertension and without considering the measured BP value. The second step (“Diagnosis”) was defined as ever receiving a hypertension diagnosis from an HCP. BP equal to or higher than 140/90 mmHg, as defined in our hypertension population, usually indicates pharmacological treatment; hence the third step (“Treatment”) was ever receiving anti-hypertensive medications. Reaching the fourth step (“Control”) required having SBP<140 mmHg and DBP<90 mmHg [14]. The thresholds for defining hypertension and reaching control were selected to best reflect the state of care in Iran. These conventional cut-offs are widely used in LMICs [15], and enable comparison with other available data from these countries [8, 15]. Essentially, each step was a prerequisite for the next one. We defined the outcome as reaching steps of hypertension care and looked for characteristics associated with each level. Further details of the care cascade definition are available in S1 File.
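
Because each step is a prerequisite for the next, a patient's position in the cascade is the number of consecutive criteria met. The sketch below illustrates that nesting under stated assumptions: the `Patient` fields and function names are hypothetical, and `bp_at_target` is assumed to have been computed beforehand from the Control definition given in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable

STEPS = ("screening", "diagnosis", "treatment", "control")


@dataclass
class Patient:
    bp_ever_measured: bool   # Screening criterion
    diagnosed_by_hcp: bool   # Diagnosis criterion
    ever_medicated: bool     # Treatment criterion
    bp_at_target: bool       # Control criterion, precomputed by the caller


def steps_reached(p: Patient) -> int:
    """Count consecutive steps met; a failed step blocks all later ones."""
    checks = (p.bp_ever_measured, p.diagnosed_by_hcp, p.ever_medicated, p.bp_at_target)
    reached = 0
    for passed in checks:
        if not passed:
            break
        reached += 1
    return reached


def cascade_proportions(patients: Iterable[Patient]) -> Dict[str, float]:
    """Share of hypertensive patients reaching at least each step."""
    depths = [steps_reached(p) for p in patients]
    if not depths:
        raise ValueError("no hypertensive patients supplied")
    return {
        name: sum(d >= i for d in depths) / len(depths)
        for i, name in enumerate(STEPS, start=1)
    }
```

Under this nesting, a patient whose BP is at target but who never received medication counts as diagnosed at most, not as controlled.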

Associated features

We examined the association of receiving care with demographic characteristics (age, sex, marital status), socio-economic features (rural/urban residency, education level, wealth index, occupation, being the head of household, insurance coverage), and comorbidities (history of cardiovascular disease, smoking, dyslipidemia, diabetes mellitus, and body mass index [BMI] category). Education level was based on the number of years spent in school or university and comprised primary schooling, secondary education, and academic education. Wealth index was previously defined for STEPs [16], and is based on quintiles derived from a Principal Component Analysis of the family assets. Insurance coverage levels were based on the different health-insurance plans available in the country. BMI was categorized as underweight (BMI<17.5), normal (17.5≤BMI<25), overweight (25≤BMI<30), obese (30≤BMI<35), and morbidly obese (BMI≥35).
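
The BMI categories can be expressed as a lookup over cut-points. The sketch below is illustrative only; it treats each lower bound as inclusive and uses the Table 1 boundary (<17.5) for underweight, and the names `BMI_CUTS`, `BMI_LABELS`, and `bmi_category` are hypothetical.

```python
from bisect import bisect_right

# Lower bounds (kg/m^2) of the four categories above "underweight".
BMI_CUTS = (17.5, 25.0, 30.0, 35.0)
BMI_LABELS = ("underweight", "normal", "overweight", "obese", "morbidly obese")


def bmi_category(bmi: float) -> str:
    """Map a BMI value to its category; each lower bound is inclusive."""
    # bisect_right counts how many cut-points are <= bmi,
    # which is exactly the index of the matching label.
    return BMI_LABELS[bisect_right(BMI_CUTS, bmi)]
```

With this rule, a BMI of exactly 17.5 falls in the normal category and a BMI of exactly 35 in the highest category.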

Statistical modeling

Random forest is an ensemble model that handles categorical variables without the need to transform them into binary form. When it is well optimized through a proper resampling process, it provides predictive accuracy that is competitive with other algorithms [17]. Moreover, optimization of hyper-parameters is highly efficient in random forests for two reasons: the trees are independent, so the tree-fitting process is easy to parallelize, and out-of-bag validation is available, which makes validation less time consuming. The time and computational power saved can be used to expand the hyper-parameter space searched during tuning and so obtain a better form of the model. Other algorithms might achieve slightly better accuracy for our analysis, but we chose random forest based on the above considerations and the characteristics of our data. Hyper-parameters are involved in various machine learning algorithms to control their complexity. More complex models have less bias but may have too much variability due to overfitting, while less complex models can be too shallow and produce more biased results. Hyper-parameters must therefore be tuned through an appropriate validation process to balance the algorithm's bias-variance trade-off and deliver generalizable results. To control the complexity of our random forest model and avoid overfitting, model performance was evaluated using accuracy as the loss function for combinations of the following hyper-parameters: mtry (number of predictors sampled at each split) = 2, 6, 10; minimum node size = 1, 5, 10; and sample fraction of observations = 25%, 50%, 75%, 100%. Evaluation used an out-of-bag validation procedure with 50 resamples to ensure the generalizability of the validation results. The Gini splitting rule was fixed in the validation process.
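
The tuning grid described above contains 3 × 3 × 4 = 36 hyper-parameter combinations. The sketch below enumerates them and picks the setting with the best mean accuracy over resamples. It is a Python illustration, not the authors' R code: `oob_accuracy` is a hypothetical callable that the reader would supply to fit a forest and return its out-of-bag accuracy for one resample.

```python
from itertools import product
from statistics import mean
from typing import Callable, Dict, List

# Grid values taken from the text.
GRID = {
    "mtry": (2, 6, 10),
    "min_node_size": (1, 5, 10),
    "sample_fraction": (0.25, 0.50, 0.75, 1.00),
}


def candidate_settings(grid: Dict[str, tuple] = GRID) -> List[dict]:
    """Enumerate every combination of the grid values."""
    names = tuple(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


def select_best(
    candidates: List[dict],
    oob_accuracy: Callable[[dict, int], float],  # hypothetical, supplied by the caller
    n_resamples: int = 50,
) -> dict:
    """Return the setting with the highest mean out-of-bag accuracy."""
    def score(setting: dict) -> float:
        return mean(oob_accuracy(setting, seed) for seed in range(n_resamples))

    return max(candidates, key=score)
```

Averaging over resamples, rather than trusting a single fit, is what guards the selected setting against a lucky draw.
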
Pre-processing consisted of removing zero/near-zero variance variables, bag imputation of missing values, and Synthetic Minority Oversampling Technique (SMOTE) sampling [18] to limit the effect of class imbalance in the response variable. These steps were applied within the resampling process to avoid data leakage. After obtaining the final optimized random forest model, specific model-agnostic interpretation tools were used. Permutation-based variable importance [19, 20], which measures the change in the loss function after permutation of the targeted predictor, was used to rank variables so that bigger changes indicate more important variables. Subsequently, ranked variables were divided into quartiles, and partial dependence plots (PDPs) [21, 22] were drawn to demonstrate the population-based marginal effects for the most important quartile (4th quartile) of ranked variables. Importantly, existing interacting/confounding effects were taken into account by the model. Higher-dimension interactions of features were evaluated by partial dependence plots. It should be noted that dashed lines in PDPs do not indicate continuity between levels of categorical variables; they were drawn only to make changes in the PDPs easier to see. All procedures were done using R (Version: 3.6.1) and RStudio (Version: 1.2.1335).
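
Both interpretation tools are model-agnostic: they need only a fitted model's prediction function. The following Python sketch shows the idea under stated assumptions. It is not the authors' R implementation; `predict` and `predict_proba` are hypothetical callables that take one row (a dict of feature values) and return a class label or a probability.

```python
import random
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence


def accuracy(predict: Callable[[dict], Hashable], rows: List[dict], labels: Sequence) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return mean(1.0 if predict(r) == y else 0.0 for r, y in zip(rows, labels))


def permutation_importance(
    predict: Callable[[dict], Hashable],
    rows: List[dict],
    labels: Sequence,
    feature: str,
    n_repeats: int = 10,
    seed: int = 0,
) -> float:
    """Mean drop in accuracy after shuffling one feature's column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        # Copy each row with the shuffled value; other features stay untouched.
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return mean(drops)


def partial_dependence(
    predict_proba: Callable[[dict], float],
    rows: List[dict],
    feature: str,
    levels: Sequence,
) -> Dict:
    """Mean predicted probability when every row is set to each level in turn."""
    return {
        level: mean(predict_proba({**r, feature: level}) for r in rows)
        for level in levels
    }
```

The values returned by `partial_dependence` correspond in spirit to the mean predicted probabilities (MPP) reported for each level of a feature: every patient is counterfactually assigned that level, and the predictions are averaged over the whole population.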

Results

The analysis included 30541 participants (52.3% female; median age: 42). The response rate reached 98.4% in STEPs 2016. According to our definition, 9420 (30.8%) individuals were hypertensive at the time of the survey, among whom 89.7% had ever had BP screening, 62.3% had received appropriate diagnosis, 49.3% had been treated for hypertension, and 7.9% had achieved BP control before the study. The characteristics of participants at each level of the care cascade are summarized in Table 1.
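
The reported percentages can be reproduced from the column totals in Table 1. The sketch below also derives step-to-step retention (the share of patients at one step who reach the next); those ratios are simple arithmetic on the published counts and are not figures reported by the authors. Function names are hypothetical.

```python
from typing import Dict

# Column totals from Table 1.
COUNTS = {
    "hypertensive": 9420,
    "screened": 8451,
    "diagnosed": 5866,
    "treated": 4643,
    "controlled": 747,
}


def cascade_percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of all hypertensive patients reaching each step."""
    base = counts["hypertensive"]
    return {
        step: round(100 * n / base, 1)
        for step, n in counts.items()
        if step != "hypertensive"
    }


def stepwise_retention(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of patients at the previous step who reach the current one."""
    ordered = list(counts.items())
    return {
        cur: round(100 * n / prev_n, 1)
        for (_, prev_n), (cur, n) in zip(ordered, ordered[1:])
    }


print(cascade_percentages(COUNTS))
# {'screened': 89.7, 'diagnosed': 62.3, 'treated': 49.3, 'controlled': 7.9}
print(stepwise_retention(COUNTS))
# {'screened': 89.7, 'diagnosed': 69.4, 'treated': 79.2, 'controlled': 16.1}
```

Viewed step to step, the largest relative loss occurs between treatment and control, where about one in six treated patients reaches the BP target.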

Table 1

Population characteristics in each step of the care cascade.

	Hypertensive patients (n = 9420)	Screened (n = 8451)	Diagnosed (n = 5866)	Treated (n = 4643)	Controlled (n = 747)
Demographic features
Age, years
<45	2245 (23.83%)	1811 (21.43%)	982 (16.74%)	463 (9.97%)	66 (8.84%)
[45,56]	2323 (24.66%)	2077 (24.58%)	1370 (23.35%)	1047 (22.55%)	155 (20.75%)
[56,66]	2327 (24.70%)	2185 (25.85%)	1631 (27.80%)	1403 (30.22%)	219 (29.32%)
≥66	2525 (26.80%)	2378 (28.14%)	1883 (32.10%)	1730 (37.26%)	307 (41.10%)
Female sex	5209 (55.30%)	4832 (57.18%)	3601 (61.39%)	2891 (62.27%)	456 (61.04%)
Marital status
Unmarried	471 (5.03%)	338 (4.02%)	131 (2.25%)	59 (1.28%)	8 (1.08%)
Married	7482 (79.88%)	6739 (80.20%)	4645 (79.72%)	3612 (78.40%)	569 (76.48%)
Divorced/separated	163 (1.74%)	141 (1.68%)	92 (1.58%)	70 (1.52%)	15 (2.02%)
Widow/widower	1251 (13.36%)	1185 (14.10%)	959 (16.46%)	866 (18.80%)	152 (20.43%)
Household head	5101 (54.40%)	4549 (54.07%)	3076 (52.74%)	2495 (54.13%)	407 (54.70%)
Socio-economic features
Area of residence
Urban	6514 (69.15%)	5863 (69.38%)	4082 (69.59%)	3280 (70.64%)	545 (72.96%)
Rural	2906 (30.85%)	2588 (30.62%)	1784 (30.41%)	1363 (29.36%)	202 (27.04%)
Education
Primary schooling	3807 (41.84%)	3504 (42.89%)	2638 (46.68%)	2259 (50.72%)	341 (46.97%)
Secondary education	2932 (32.23%)	2581 (31.60%)	1731 (30.63%)	1306 (29.32%)	236 (32.51%)
Academic education	2359 (25.93%)	2084 (25.51%)	1282 (22.69%)	889 (19.96%)	149 (20.52%)
Wealth index
Very low	2072 (22.45%)	1812 (21.89%)	1312 (22.85%)	1065 (23.47%)	156 (21.34%)
Low	1979 (21.44%)	1743 (21.06%)	1216 (21.18%)	969 (21.36%)	148 (20.25%)
Medium	1829 (19.82%)	1664 (20.10%)	1142 (19.89%)	907 (19.99%)	137 (18.74%)
High	1749 (18.95%)	1579 (19.08%)	1057 (18.41%)	814 (17.94%)	148 (20.25%)
Very high	1601 (17.35%)	1479 (17.87%)	1015 (17.68%)	782 (17.24%)	142 (19.43%)
Occupation
White-collar clerk	546 (5.82%)	490 (5.83%)	304 (5.21%)	208 (4.51%)	26 (3.49%)
Blue-collar worker	348 (3.71%)	275 (3.27%)	147 (2.52%)	100 (2.17%)	17 (2.28%)
Self-employed	1834 (19.56%)	1535 (18.25%)	912 (15.62%)	633 (13.72%)	90 (12.08%)
Volunteer/conscript	69 (0.74%)	58 (0.69%)	29 (0.50%)	20 (0.43%)	4 (0.54%)
Student	99 (1.06%)	71 (0.84%)	36 (0.62%)	10 (0.22%)	2 (0.27%)
Housewife	4631 (49.40%)	4294 (51.06%)	3215 (55.08%)	2613 (56.62%)	398 (53.42%)
Unemployed	615 (6.56%)	533 (6.34%)	381 (6.53%)	316 (6.85%)	72 (9.66%)
Pensioner	1232 (13.14%)	1153 (13.71%)	813 (13.93%)	715 (15.49%)	136 (18.26%)
Insurance coverage
No coverage	572 (6.14%)	459 (5.49%)	281 (4.85%)	203 (4.43%)	33 (4.44%)
Basic package	6320 (67.79%)	5605 (67.02%)	3836 (66.15%)	2949 (64.33%)	445 (59.81%)
Complementary package	2431 (26.08%)	2299 (27.49%)	1682 (29.01%)	1432 (31.24%)	266 (35.75%)
Comorbidities
Cardiovascular disease	328 (3.49%)	319 (3.79%)	279 (4.78%)	268 (5.80%)	67 (8.97%)
Diabetes mellitus	1621 (23.56%)	1554 (24.96%)	1228 (28.26%)	1092 (31.91%)	182 (32.79%)
Smoking	2031 (21.65%)	1815 (21.56%)	1209 (20.72%)	921 (19.96%)	173 (23.16%)
Dyslipidemia	2947 (31.37%)	2854 (33.88%)	2316 (39.65%)	1969 (42.64%)	343 (45.92%)
Body mass index, kg/m²
<17.5	85 (0.93%)	67 (0.82%)	44 (0.78%)	24 (0.54%)	6 (0.84%)
[17.5–25]	2279 (24.95%)	1982 (24.21%)	1304 (23.10%)	982 (22.04%)	189 (26.51%)
[25–30]	3615 (39.57%)	3239 (39.57%)	2204 (39.04%)	1754 (39.36%)	287 (40.25%)
[30–35]	2238 (24.50%)	2059 (25.15%)	1468 (26.01%)	1184 (26.57%)	158 (22.16%)
≥35	918 (10.05%)	839 (10.25%)	625 (11.07%)	512 (11.49%)	73 (10.24%)

Data are reported as number (percentage).

Associated variables were sorted according to their importance in predicting whether patients reached each care step (results of hyper-parameter tuning for model optimization are presented in S1 File/Hyperparameter tuning). The most important features emerging as good classifiers in the care continuum were age, sex, occupation, education, wealth index, marital status, being the head of household, and dyslipidemia.

Age was an important predictor in hypertension care, demonstrating the highest importance in screening, diagnosis, and treatment, and the second highest importance in hypertension control. In each of the four steps, older age was consistently associated with a higher likelihood of reaching the step. The shapes of the PDPs support this interpretation: with increasing age, the mean predicted probabilities (MPP) increased in all steps (Figs 1–4). This association was observed across all age groups and was not limited to the elderly or the very young. Notably, the age disparity in hypertension care, with younger patients being less likely to receive appropriate care, was more pronounced in rural than in urban areas, as the gap between age groups was wider in rural communities for all steps of care (S1-S4 Figs in S1 File).

Another important feature appearing in the top six in all steps of care was sex. Compared to males, females had a higher probability of being screened (MPP: 0.91 versus 0.86), diagnosed (MPP: 0.66 versus 0.56), treated (MPP: 0.52 versus 0.47), and achieving control (MPP: 0.10 versus 0.09) for hypertension.

Fig 1

Importance of population characteristics and comparative probabilities of the top six important classifiers for screening.

Fig 4

Importance of population characteristics and comparative probabilities of the top six important classifiers for control.

The level of education had a varying association with receiving care in different steps. Higher education attainment was associated with a higher likelihood of being screened for hypertension (MPP in ascending order of education attainment: 0.87, 0.88, and 0.89; Fig 1) and achieving BP control (MPP in ascending order of education attainment: 0.09, 0.10, and 0.11; Fig 4). Conversely, a lower level of education was associated with a better chance of being diagnosed (MPP in ascending order of education attainment: 0.62, 0.61, and 0.59; Fig 2) and treated (MPP in ascending order of education attainment: 0.51, 0.48, and 0.46; Fig 3).

Fig 2

Importance of population characteristics and comparative probabilities of the top six important classifiers for diagnosis.

Fig 3

Importance of population characteristics and comparative probabilities of the top six important classifiers for treatment.

Wealth index appeared in the top six associated features at the levels of screening and control. Individuals with very low (MPP = 0.86) and low (MPP = 0.87) wealth indices were less likely to be screened for hypertension. Medium (MPP = 0.90), high (MPP = 0.90), and very high (MPP = 0.90) wealth groups were similar in terms of hypertension screening (Fig 1). For hypertension control, a higher wealth index was associated with a better outcome (MPP in ascending order of wealth index: 0.09, 0.08, 0.08, 0.10, and 0.13; Fig 4). While a higher wealth index generally meant a higher probability of receiving care, wealth showed interactions with education and area of residence. Unlike in better educated individuals, higher wealth did not result in enhanced screening, diagnosis, or treatment among individuals with only primary educational attainment (S5-S12 Figs in S1 File). Among individuals with low wealth indices, those living in rural areas received better care than those in urban communities; however, among individuals with higher levels of wealth, more appropriate care was observed in urban areas (S5-S8 Figs in S1 File). This trend was observed especially for screening and diagnosis. Notably, patients in urban communities had a higher likelihood of being screened, diagnosed, or treated, but a lower rate of achieving control, compared to patients living in rural areas (S1-S8 Figs in S1 File).

Marital status was an important factor in determining whether individuals reached screening and treatment (Figs 1 and 3). Single/unmarried people had by far the lowest probability of being screened (MPP = 0.77) or treated (MPP = 0.36) for hypertension. For both the screening and treatment steps, divorced/separated patients (MPP for screening = 0.84; and treatment = 0.46) appeared as the second most vulnerable group.
On the other hand, married patients (MPP for screening = 0.90; and treatment = 0.50) and widows/widowers (MPP for screening = 0.88; and treatment = 0.57) had a better chance of being screened or treated. Investigating the interactions of marital status with age, sex, area of residence, and wealth showed that while females generally had better outcomes than males, single individuals were an exception: single males were more likely to be appropriately diagnosed and treated than single females (S13-S20 Figs in S1 File). Marital status did not show a consistent interaction with other variables.

Occupation showed a strong, yet heterogeneous, association with reaching steps. For the diagnosis step, volunteers/military conscripts had the lowest probability of receiving care (MPP = 0.53), followed by students (MPP = 0.57), blue-collar workers (MPP = 0.57), and self-employed individuals (MPP = 0.60). Hypertensive pensioners had a higher chance of being diagnosed (MPP = 0.62), while white-collar clerks (MPP = 0.63), unemployed persons (MPP = 0.63), and housewives (MPP = 0.63) were the most likely to receive an appropriate hypertension diagnosis (Fig 2). In the treatment step, blue-collar workers (MPP = 0.43) and volunteers/military conscripts (MPP = 0.43) demonstrated the lowest probability of receiving care. In ascending order, students (MPP = 0.44), self-employed workers (MPP = 0.44), white-collar clerks (MPP = 0.49), unemployed individuals (MPP = 0.51), pensioners (MPP = 0.52), and housewives (MPP = 0.53) had a higher chance of being treated for their diagnosed hypertension (Fig 3). On the other hand, in the BP control step, being self-employed (MPP = 0.09) or a housewife (MPP = 0.09) was associated with the lowest probability of achieving BP targets. The next lowest rates of hypertension control were observed in blue-collar workers (MPP = 0.10) and volunteers/military conscripts (MPP = 0.10).
Pensioners (MPP = 0.11) and white-collar clerks (MPP = 0.13) had a better chance of reaching their BP goal. Control was most successful among students (MPP = 0.16), followed by unemployed individuals (MPP = 0.15; Fig 4).

Being the head of household was another socio-economic feature among the top classifiers in the screening, diagnosis, and control steps (Figs 1, 2 and 4). Heads of households had a higher chance of being screened (MPP: 0.90 versus 0.88). In contrast, being the head of household was associated with a lower likelihood of achieving BP control than other family members (MPP: 0.09 versus 0.10).

The only cardiovascular comorbidity appearing in the top six associated features was dyslipidemia. Among hypertensive patients, the presence of dyslipidemia was associated with a higher chance of receiving the appropriate hypertension diagnosis (MPP: 0.73 versus 0.57) and treatment (MPP: 0.60 versus 0.40; Figs 2 and 3). Investigation of interactions showed that among younger age groups, underweight and normal-weight individuals had a lower probability of being screened, diagnosed, or treated. In the control step, however, a higher BMI was associated with lower achievement of BP targets. Importantly, smoking and diabetes did not show a meaningful association with receiving care in any sub-group of the population (S21-S28 Figs in S1 File).

Discussion

These nationally representative data imply that there is substantial room for improvement in the care coverage for hypertension in Iran. While almost nine out of ten hypertensive patients had had BP screening, fewer than two thirds had received the appropriate diagnosis by an HCP, only half had been treated with anti-hypertensive medications prior to the survey, and about 8% had achieved BP control. Our results on Iranian health-care performance were comparable to those from other LMICs. According to a 2019 analysis of pooled individual-level data from 44 LMICs (not including Iran), 74% of hypertensive patients had received screening, 39% had a prior diagnosis of hypertension, 30% had been treated, and only 10% had proper BP control; however, these numbers varied widely among countries [8]. A recent systematic review of hypertension care in Arab countries concluded that more than 40% of all hypertensive patients were unaware of their condition, while less than 21% were left untreated [23]. Compared to high-income countries, hypertension care coverage in Iran and other LMICs appears to be lower, especially in the control step. A study of nearly half a million individuals from 12 high-income countries showed that the proportion of awareness (defined as having received the diagnosis of hypertension) was 56–87% and 46–84%, treatment was 55–80% and 39–81%, and control was 26–58% and 17–69%, among women and men, respectively [24]. In our study, the high rate of screening seems encouraging, especially when compared to other LMICs [8]; however, screening did not lead to proper diagnosis, treatment, and control. A 2005 study with similar definitions of hypertension, diagnosis, and treatment reported a diagnosis rate of 49.2% and a treatment rate of 35.7% among Iranian individuals with hypertension [25]. In comparison, our data showed better coverage in the diagnosis and treatment steps, which probably indicates improvement in hypertension care between 2005 and 2016.
On the other hand, control rates remained low in our study. Importantly, while hypertension can be controlled by oral medications, myriad other factors influence BP levels, such as dietary habits, physical activity, environmental risk factors like air pollution [26, 27], adherence to medications, and continuation of visits with the same provider [28]. Among the Iranian population, consumption of salt is higher than the recommended amounts [29], and almost half of adults have an insufficient level of physical activity [30]. To some extent, these observations might explain the failure to achieve BP control among Iranians, and they point to a need for population-level strategies and health education to modify lifestyle. To find the characteristics associated with hypertension care coverage, we analyzed the care cascade with ensemble learning, which can offer advantages over conventional regression models [17]. Machine learning methods provide many advantages over conventional statistical models in interpreting large datasets [10]. One feature of the random forest model is that it examines the effects of all variables in the dataset simultaneously in deciding the outcome. Incorporating potential interactions in the model reduces the possibility of confounding among the included variables. We found that among hypertensive patients, younger age, male sex, being unmarried or divorced, lower wealth, and having certain vulnerable occupations were features consistently associated with a lower probability of receiving care. These findings can inform and facilitate future policies to address the existing gaps in hypertension care. By identifying groups who are more likely to be missed at each level, efforts can be made to include more vulnerable individuals in the cascade of care and, ultimately, prevent downstream end-organ damage and cardiovascular events attributable to high BP [31].
According to our results, there should be a particular focus on younger adults with hypertension, among whom high BP was more likely to be missed in all steps of care. Importantly, we observed that a young adult who is not overweight or obese, i.e., not the stereotype of a hypertensive patient, was more likely to be neglected for screening, diagnosis, and treatment of hypertension. This suggests that younger individuals may underestimate the risk associated with hypertension, and that the health-care system may direct fewer resources to NCD prevention in young adults. Moreover, the age-related gap in the hypertension care cascade was wider in rural than in urban communities, which could be due to lower health information access and health literacy in rural areas [32]. The higher probability for women to receive hypertension care was compatible with findings from previous studies [8, 13, 24, 25]. The reasons for this observation are multiple, and may include gender differences in health-care-seeking behaviors [33], or a higher emphasis on BP management resulting from perinatal care. In this study, individuals with lower wealth were less likely to reach higher stages. Although anti-hypertensive drugs are both affordable and accessible in Iran, an analysis of Iranian Food and Drug Administration data from 2002–2011 demonstrated a wealth-related inequality in the use of anti-hypertensive medications among provinces [34]. This evidence may partly explain the important role of wealth index observed in achieving BP control, and underlines the priority of developing accessible prevention strategies in LMICs. Importantly, a high wealth index did not translate into better care among the group with low levels of educational attainment, highlighting the critical and intertwined role of socio-economic features in hypertension care.
Among comorbidities, dyslipidemia was associated with a higher probability of being diagnosed and treated; however, history of cardiovascular disease, diabetes mellitus, smoking, or obesity did not appear among the top classifiers. This observation may be concerning, as it suggests that patients with comorbidities, who are at higher risk, were not prioritized for reaching BP control. Future policies should ensure that higher-risk groups remain in the care cascade for integrated risk factor management. This study provides insight into the current state of hypertension care at the national level. The use of nationally representative data encompassing a broad range of individual characteristics can be regarded as a strength of this study. A central feature is the use of machine learning to evaluate the hypertension care cascade, which can inform future policies by identifying the characteristics that predict being lost to care at different steps. This study has several limitations. First, the cross-sectional design of STEPs limits our evaluation of the care cascade. While other available studies have similarly conducted care cascade analyses on cross-sectional data [8, 9, 13, 24], longitudinal studies can provide more accurate results. Second, we included a previous diagnosis by an HCP in our definition of hypertension. It is possible that some patients who reported a previous hypertension diagnosis, had a BP < 140/90 mmHg, and did not receive treatment were not actually hypertensive. This might have led to the inclusion of normotensive patients and underestimation of treatment and control rates; however, the number of such normotensive patients was expected to be low, and we chose this design to improve the sensitivity for detecting hypertension in the study population.
Third, for a reliable BP reading, measurement should be performed on more than one occasion, and ideally with out-of-office techniques; however, due to limitations in the design of STEPs, we could only use measurements from one patient encounter. Fourth, we selected conventionally used BP thresholds for hypertension. These thresholds are not in complete agreement with the recommendations of the most recent hypertension guidelines [35], and using different thresholds may change the results. Lastly, we could not develop a predictive model for hypertension care based on the available data. Future studies, such as future STEPs surveys in Iran, could be used for this purpose.

Conclusion

Data from the nationally representative Iranian STEPs survey showed that hypertension care in the country is mostly missing hypertensive individuals in the treatment and control stages. A random forest model determined features associated with hypertension care and indicated targets for improvement. The most important observations were that younger adults, especially those living in rural areas or without conventional hypertension risk factors such as obesity, were more likely to miss care cascade steps. Moreover, males generally had a lower state of care compared to females. Other important features associated with lower care coverage were low wealth, unmarried or divorced status, and occupations such as blue-collar work or self-employment. The random forest model is a helpful tool for recognizing patterns of care coverage for NCDs and their risk factors.

3 Dec 2021

PONE-D-21-32708

A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran

PLOS ONE Dear Dr. Farzadfar, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has some merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Specifically, please address all comments made by the reviewers. Be sure to:

Indicate what is the added value of the study and its contribution to the field
Address all concerns in its method (design/analysis)

Please submit your revised manuscript by Jan 17 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Amir Radfar, MD, MPH, MSc, DHSc
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please note that PLOS ONE has specific guidelines on code sharing for submissions in which author-generated code underpins the findings in the manuscript. In these cases, all author-generated code must be made available without restrictions upon publication of the work. Please review our guidelines at https://journals.plos.org/plosone/s/materials-and-software-sharing#loc-sharing-code and ensure that your code is shared in a way that follows best practice and facilitates reproducibility and reuse.

3. We note that the grant information you provided in the ‘Funding Information’ and ‘Financial Disclosure’ sections do not match. When you resubmit, please ensure that you provide the correct grant numbers for the awards you received for your study in the ‘Funding Information’ section.

4. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions. In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers.
For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories. We will update your Data Availability statement on your behalf to reflect the information you provide. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Partly Reviewer #2: No Reviewer #3: Partly ********** 2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No Reviewer #3: N/A ********** 3. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: No Reviewer #3: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. 
Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The publication "A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran" by Tavolinejad et al. tackles an important problem of development and progression of hypertension, moreover it is driven by the low- to middle-income country data. The authors consider hypertension to be defined as blood pressure ≥ 140/90 mmHg, or reported use of anti-hypertensive medications, or a previous hypertension diagnosis. The four steps of care considered for the analysis included screening, diagnosis, treatment, or control. Using machine learning approach (random forest), the authors estimated that from 30.8% of population with hypertension, 89.7% were screened, 62.3% received diagnosis, 49.3% were treated, and 7.9% achieved control. The random forest analysis indicated that younger age, male sex, low wealth, and unmarried/divorced status were associated with a lower probability of receiving care.

As this work represents a high educational value to the populations affected by hypertension as well as may serve a the learning about detection and consequences early, this potential is diminished by several major issues related to the following aspects:

1. Although the statistical data is rich and presented clearly (Table 1), the figures in the main manuscript are not readable. On the other hand, the figures in the supplementary files are clear and of high quality. The analyses are performed with a good technical standard and are described in sufficient detail however I disagree with the usage of the word "novel" toward the random forest application in these population data.

2. The authors used random forest to evaluate the features importance for each of the stages, while the post analyses of random forest include other valuable applications that I highly recommend to the authors. To do that, the authors should move beyond feature importance or probability distribution within. The study will benefit the audience if the authors consider:
- plotting/analyzing the probability data in higher dimension to seek for hidden relationships and patterns between e.g., age, education, status, gender, etc, i.e., patterns not limited to the conclusions that are expected from linear analysis of individual data (young single male with low income having lower care). Using the four figures to support the findings seems as a very good start to more advanced analyses and more informative conclusions. In addition, potential relationships with the biological class of features (BMI, diabetes, smoking) would start showing to play a role in only a particular cohort.
- how do the classifiers cluster (clustering) or relate (PCA) to one another among population?
- building a predictive model from the data, perhaps for the remaining large cohort of the population not associated with hypertension.

3. It is unclear if step "screening" is associated with HP when introduced in the abstract.

4. Lines 90-91: is condition (I) considered when "SBP and DBP" or "SBP or DBP" are present?

5. Line 99: the term "screening" still doesn't explain clearly if the BP measured was ever related to HP.

6. Line 117: it is unclear what authors mean by "Random forest is an ensemble model that handles categorical variables AS THEY ARE"?

7. Lines 118-119: authors should expand on: "...it provides appropriate and competitive predictive accuracy compared to other algorithms". It is recommended that authors make an explicit comparison of their analyses with other machine learning approaches before stating that random forest is the best choice for it.

8. Line 119: definition of "hyper-parameters" should be put early for the readers.

9. Line 131: it is unclear what authors mean by knowledge in: "...tools were used to extract knowledge".

10. Line 161 and further: the class of "white collar" should be specified as either clerks or workers for consistency.

11. Lines 184-185: The sentence "Very low (MPP=0.86) and low (MPP=0.87) wealth indices were associated with a decreased probability of being screened." is unclear.

12. It is suggested that, given the longitudinal data/steps present, the authors should analyze the longitudinal data on additional level; one idea could be to look at the trends in features' importance while moving along the 4 steps. More informative analysis could include population models for the subpopulations pre-classified with the machine learning tools.

In summary, the study should be enhanced by computational analyses and graphical representation of the results, including, yet not limited to those suggested in #2 and #12. If these additional analyses are added, the manuscript could be submitted to the follow-up review.

Reviewer #2: The manuscript presents a Random Forest based machine learning approach to evaluate the state of hypertension care coverage through a population-based data from Iran. Unfortunately, the paper falls short in bringing any fresh insights to the readers. Most of the causes and recommendations made by the authors are already known to the community, and it is unclear what new insight machine learning is offering, if any. Bulk of the paper is dedicated to reporting what the tool running the Random Forest model outputs, without adequate deeper dive.
There is little detail on handling the data itself, as what was done to pre-process, normalize and standardize the data prior to feeding it to the machine learning model. Given the understandable lack of availability of the dataset in public domain, it is useful to create credibility around the analysis through rigorous statistics and figures. The paper lacks to explain as what the predictions mean in context of the data as well as the field itself. On what could be a very valuable dataset to analyse, I am afraid that there is a need for extended analysis and deeper interpretation of results, both in terms of machine learning model as well as for the hypertension care coverage. Reviewer #3: The authors present a very well-written and important paper that helps assess the reasons for the discontinuation in the care pathway through the diagnosis, treatment and control of hypertension in Iran. For that they use the random forest algorithm and conclude that younger adults, men, unmarried or divorced people and those with lower socio-economical status are more at risk to not receive treatment and achieve control of their blood pressure. Overall the paper was clear and presented the findings in an understandable fashion. I have two main suggestions to be discussed with the authors in two different domains. (A) Guideline Selection and Phenotyping Algorithm 1. Firstly, I would like to understand the reason for the selection of the older AHA guideline (2017) since in 2020 there was the release of a new version. The new guideline document can be found in the following weblink: https://www.ahajournals.org/doi/10.1161/HYPERTENSIONAHA.120.15026. In it, there are some different recommendations for the essential and optimal control of blood pressure. 2. Also, according to the author's phenotyping algorithm, hypertensive patients had SBP >= 140 and DBP >= 90 and controlled hypertensive SBP < 130 and DBP < 80. 
This leaves a gap in the participants who had SBP in between 130-139 and 80-89 (the new guideline covers that gap). I suggest that a closer look is taken into that. 3. Again, regarding the guideline, usually blood pressure is measured in more than one encounter or with out-of-office techniques to discard white coat and masked hypertension. I do understand the cross-section nature of the STEPs study and the lack of longitudinal data. However, I believe this should be mentioned in the limitations as well. (B) Algorithm Selection and Hyper-Parameter Tuning 1. The use of the random forest algorithm is quite interesting and it is indeed powerful. However, there is another class of ensemble algorithms that usually outperforms random forest called gradient boosting machines (examples are XGBoost and LightGBM) which generally use decision trees. In that aspect, I would like to hear from the authors the reason for opting for random forest instead of one of those algorithms. 2. Lastly, it was not clear to me how the hyper-parameter tuning was executed. This is a very important step and it is good practice to have a separated validation set for that. It would be beneficial to understand the methodology used by the authors and have it thoroughly explained in the paper or in the supplement for better judgment of the results and also increased reproducibility. ********** 6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. 
Reviewer #1: No Reviewer #2: No Reviewer #3: Yes: Ariane Sasso [NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.] While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step. 5 Mar 2022 Dear Professor Radfar, Academic editor of the PLOS ONE, We are pleased to resubmit our revised manuscript entitled “A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran” (Manuscript ID: PONE-D-21-32708). On behalf of our research team, I would like to thank the respected editorial office of PLOS ONE and the honorable reviewers for the time and effort they took in evaluating our work. We carefully addressed the insightful comments of the reviewers. The following pages include a point-by-point response to the comments with references to the changes made in the manuscript. We made sure to indicate the added value and contribution of this study, which is an in-depth analysis of the state of hypertension care in Iran—a relevant topic considering the paucity of data from LMICs in this regard. Moreover, we addressed the concerns raised about the methodology of this investigation. 
As instructed, we have submitted a revised version with changes marked in highlight and a second copy without any markings. We have applied PLOS ONE style requirements to our files. We declare that this study was funded by Ministry of Health and Medical Education and National Institute for Health Research (grant number: 241-93259), in the financial disclosure and funding information. Regarding our data availability statement, it should be noted that the STEPS 2016 study is a project launched by the Iranian Ministry of Health and Medical Education of Iran (MOHME), which owns the rights to the dataset. However, interested and qualified researchers may contact the Non-Communicable Diseases Research Center (www.ncdrc.net) to access the datasets of the STEPS 2016 study. The aggregated level data are freely accessible via https://vizit.report/en/index.html. The author-generated codes can be uploaded to GitHub and be made freely available, if the journal considers this source suitable. Otherwise, we are ready to upload the code in alternative sources that the journal would suggest. We are looking forward to receiving your kind response and decision. Farshad Farzadfar (MD, MPH, MSc, DSc) Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran Address: Second Floor, No.10, Jalal Al-e-Ahmad Highway, Tehran, Iran Postal Code: 1411713137 Tel: +98-21-88631293 Email: f–farzadfar@tums.ac.ir Submitted filename: Response to Reviewers.docx Click here for additional data file. 7 Jun 2022

PONE-D-21-32708R1

A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran

PLOS ONE Dear Dr. Farzadfar, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Specifically, please address point B from Reviewer #3, which is still missing.

Please submit your revised manuscript by Jul 22 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter. A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'. A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'. An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'. If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols. We look forward to receiving your revised manuscript. Kind regards, Amir Radfar, MD,MPH,MSc,DHSc Academic Editor PLOS ONE Journal Requirements: Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. 
If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. [Note: HTML markup is below. Please do not edit.] Reviewers' comments: Reviewer's Responses to Questions Comments to the Author 1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation. Reviewer #1: All comments have been addressed Reviewer #3: (No Response) ********** 2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented. Reviewer #1: Yes Reviewer #3: Yes ********** 3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #3: Yes ********** 4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. 
participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #3: No ********** 5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #3: Yes ********** 6. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Thanks to the Authors for addressing all the comments and explaining the unclear statements. The article is now recommended for publishing. Reviewer #3: I have one last point that needs addressing. Overall the paper was clear and presented the findings in an understandable fashion. Most of the recommendations from the previous review were followed. 1) Guideline Selection and Phenotyping Algorithm Items A and C were answered satisfactorily. However, point B is still missing: B. Also, according to the author's phenotyping algorithm, hypertensive patients had SBP >= 140 and DBP >= 90 and controlled hypertensive SBP < 130 and DBP < 80. This leaves a gap in the participants who had SBP between 130-139 and 80-89 (the new guideline covers that gap). I suggest that a closer look is taken into that (check lines 91-95). 2) Algorithm Selection and Hyper-Parameter Tuning All the items were answered satisfactorily. In the future, the implementation of algorithms such as XGBoost and LightGBM is currently as easy as RF (there are open-source packages and libraries in Python), and they could potentially be tried out. ********** 7. 
PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: Yes: Ariane Sasso

20 Jul 2022 Dear Professor Radfar, Academic editor of the PLOS ONE, We are pleased to resubmit our revised manuscript entitled “A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran” (Manuscript ID: PONE-D-21-32708R1). On behalf of my colleagues, I would like to thank the esteemed editors and staff of PLOS ONE for consideration of our work. As instructed, we have submitted a revised version with tracked changes and a second copy without any markings. The following pages include a point-by-point response to the comments. In response to the insightful comment of Reviewer #3, we have explained that the selected blood pressure thresholds were used to best reflect the state of care in Iran at the time the STEPs survey was performed. Moreover, these cut-offs are widely used in other low- and middle-income countries (LMICs), and are comparable to other available data from LMICs. We note the trade-off in selecting thresholds, which would lead to underestimation or overestimation of hypertension prevalence, diagnosis, treatment, and control. We have tried to explain our rationale more clearly, and we note in the manuscript that these thresholds may not be in total agreement with the most recent guidelines. We would appreciate any specific recommendation about changing blood pressure thresholds that could improve our paper. Please kindly note a change we have made to the author affiliations in this revision. Furthermore, we have added two references (numbers 15 and 35) to the manuscript. As you instructed, we have checked and ensured that our references list is complete and correct. The journal requirements also stated “If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references”. 
We are not sure if this was an automated message or was in reference to a specific issue with our manuscript. We did not find a retracted article in our reference list, but please kindly let us know if there is a reference we should recheck and address. Please do not hesitate to contact me if we need to provide any additional information about this submission. We are looking forward to receiving your kind response and decision. Farshad Farzadfar (MD, MPH, MSc, DSc) Non-Communicable Diseases Research Center, Endocrinology and Metabolism Population Sciences Institute, Tehran University of Medical Sciences, Tehran, Iran Address: Second Floor, No.10, Jalal Al-e-Ahmad Highway, Tehran, Iran Postal Code: 1411713137 Tel: +98-21-88631293 Email: f–farzadfar@tums.ac.ir Reviewer #1: “Thanks to the Authors for addressing all the comments and explaining the unclear statements. The article is now recommended for publishing.” Response: Dear respected reviewer, we thank you for your time and your constructive comments, which have improved our paper. Reviewer #3: “I have one last point that needs addressing. Overall the paper was clear and presented the findings in an understandable fashion. Most of the recommendations from the previous review were followed. “1) Guideline Selection and Phenotyping Algorithm “Items A and C were answered satisfactorily. However, point B is still missing: “B. Also, according to the author's phenotyping algorithm, hypertensive patients had SBP >= 140 and DBP >= 90 and controlled hypertensive SBP < 130 and DBP < 80. This leaves a gap in the participants who had SBP between 130-139 and 80-89 (the new guideline covers that gap). I suggest that a closer look is taken into that (check lines 91-95).” Response: Dear respected reviewer, we thank you for your precise and constructive comment. We apologize for we did not appropriately address one of your comments in the previous letter. Your point is completely accurate. 
There is a gap in blood pressure values between the definitions of hypertension and control, which can be eliminated by reducing the BP cut-off for defining hypertension/high-normal BP to ≥130/80, or by increasing the target BP for control to <140/90. The main reason we chose the 140/90 threshold for hypertension is that we wanted to evaluate health-care system coverage at the time of the study. We had reservations about using a lower threshold: we were concerned that it would not accurately reflect the state of care and would not provide a reliable estimate of the factors associated with care, as most health-care providers may not have labeled hypertension or considered pharmacological treatment for the group with office measurements of 130-139/80-89 (high-normal BP). Our rationale was that this would underestimate the rates of treatment and control. Furthermore, the conventional cut-off of 140/90 is widely used in other low- and middle-income countries, as demonstrated by the databases used in the following study: https://doi.org/10.1161/CIRCULATIONAHA.120.051620.

Importantly, a BP measurement ≥140/90 was not our only criterion for defining hypertension. We also used a hypertension diagnosis by a health-care provider, or a drug history of BP-lowering medications. Therefore, a patient with BP <140/90 who met one of the above criteria could still be considered part of the hypertensive population. Such patients would then be classified as achieving control if they had fulfilled both the diagnosis and treatment steps and had BP <130/80. As for the control cut-off, we used 130/80 as the target because it is an optimum value for most patients for reducing adverse outcomes, and it is recommended by recent guidelines.

Ultimately, we understand that choosing any cut-off for the definition of hypertension and for control would result in specific limitations, especially since we do not have longitudinal data with multiple measurements (a point that you have thoughtfully mentioned). Thanks to your comment, we have tried to explain our rationale for choosing these numbers more clearly. In brief, these cut-offs were used because they most accurately reflect the state of care in Iran and in most LMICs. Moreover, as several population-based studies evaluating hypertensive patients in LMICs have used similar cut-offs, this approach would increase the comparability of our results with those of others. We have also added a phrase to the limitations to make this clear to readers. We hope these explanations are satisfactory. Please kindly let us know if you think there is another specific approach that would improve this study.

“2) Algorithm Selection and Hyper-Parameter Tuning

“All the items were answered satisfactorily. In the future, the implementation of algorithms such as XGBoost and LightGBM is currently as easy as RF (there are open-source packages and libraries in Python), and they could potentially be tried out.”

Response: Many thanks for your insightful and expert comment. This would be a great direction for future studies.

Submitted filename: Response to Reviewers.docx

11 Aug 2022

A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran

PONE-D-21-32708R2

Dear Dr. Farzadfar,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, and no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Amir Radfar, MD, MPH, MSc, DHSc
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #3: All comments have been addressed
Reviewer #4: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #3: Yes
Reviewer #4: Yes

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #3: Yes
Reviewer #4: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #3: No
Reviewer #4: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #3: Yes
Reviewer #4: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #3: Dear Authors, Thank you very much for the great work and for addressing all the comments in detail. I am sorry this review process took so long, and I am to blame for not meeting the deadlines as I wished. Hopefully, your paper will be published soon. I wish you much success!

Reviewer #4: Thank you for this well written article. I believe this manuscript will add to the body of literature.

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #3: Yes: Ariane Sasso
Reviewer #4: Yes: Irina Filip

12 Sep 2022

PONE-D-21-32708R2

A machine learning approach to evaluate the state of hypertension care coverage: From 2016 STEPs survey in Iran

Dear Dr. Farzadfar:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of Dr. Amir Radfar
Academic Editor
PLOS ONE
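
The hypertension and control definitions discussed in the exchange with Reviewer #3 above can be made concrete with a short sketch. This is only an illustration of the rules as stated in the response letter, not the authors' code. The `Participant` record, its field names, and the helper `in_gap` are hypothetical. The sketch also assumes that "≥140/90" means systolic ≥140 or diastolic ≥90, and that "<130/80" means systolic <130 and diastolic <80; the letter does not spell out either reading.

```python
from dataclasses import dataclass

# Cut-offs quoted in the response letter (mmHg).
HTN_SBP, HTN_DBP = 140, 90          # measured BP at or above this counts as hypertension
CONTROL_SBP, CONTROL_DBP = 130, 80  # measured BP below this meets the control target


@dataclass
class Participant:
    """Hypothetical record layout; not the STEPs survey schema."""
    sbp: float           # systolic blood pressure
    dbp: float           # diastolic blood pressure
    diagnosed: bool      # prior hypertension diagnosis by a health-care provider
    on_medication: bool  # reports taking BP-lowering medication


def has_high_bp(p: Participant) -> bool:
    # Assumption: either component at or above its cut-off is enough.
    return p.sbp >= HTN_SBP or p.dbp >= HTN_DBP


def is_hypertensive(p: Participant) -> bool:
    # Any one of the three criteria places a participant in the hypertensive group.
    return has_high_bp(p) or p.diagnosed or p.on_medication


def is_controlled(p: Participant) -> bool:
    # Control requires the diagnosis and treatment steps plus BP below target.
    below_target = p.sbp < CONTROL_SBP and p.dbp < CONTROL_DBP
    return is_hypertensive(p) and p.diagnosed and p.on_medication and below_target


def in_gap(p: Participant) -> bool:
    # Measured BP is neither high enough to define hypertension
    # nor low enough to meet the control target.
    below_target = p.sbp < CONTROL_SBP and p.dbp < CONTROL_DBP
    return not has_high_bp(p) and not below_target


if __name__ == "__main__":
    examples = {
        "135/85, no diagnosis, no drugs": Participant(135, 85, False, False),
        "135/85, diagnosed and treated": Participant(135, 85, True, True),
        "125/75, diagnosed and treated": Participant(125, 75, True, True),
    }
    for label, p in examples.items():
        print(label, is_hypertensive(p), is_controlled(p), in_gap(p))
```

Under these assumptions, an untreated and undiagnosed participant at 135/85 is not counted as hypertensive, while a diagnosed and treated participant at the same reading is counted as hypertensive but not controlled. That is the asymmetry the reviewer's point B describes.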

30 in total

1. Effectiveness of diabetes and hypertension management by rural primary health-care workers (Behvarz workers) in Iran: a nationally representative observational study.

Authors: Farshad Farzadfar; Christopher J L Murray; Emmanuela Gakidou; Thomas Bossert; Hengameh Namdaritabar; Siamak Alikhani; Ghobad Moradi; Alireza Delavari; Hamidreza Jamshidi; Majid Ezzati
Journal: Lancet Date: 2011-12-09 Impact factor: 79.321

2. Wealth-related Inequality in Utilization of Antihypertensive Medicines in Iran: an Ecological Study on Population Level Data.

Authors: Amir Hashemi-Meshkini; Abbas Kebriaeezadeh; Hamidreza Jamshidi; Ali Akbari-Sari; Ehsan Rezaei-Darzi; Parinaz Mehdipour; Shekoufeh Nikfar; Farshad Farzadfar
Journal: Arch Iran Med Date: 2016-02 Impact factor: 1.354

3. Prevalence, awareness, treatment, and control of hypertension in rural and urban communities in high-, middle-, and low-income countries.

Authors: Clara K Chow; Koon K Teo; Sumathy Rangarajan; Shofiqul Islam; Rajeev Gupta; Alvaro Avezum; Ahmad Bahonar; Jephat Chifamba; Gilles Dagenais; Rafael Diaz; Khawar Kazmi; Fernando Lanas; Li Wei; Patricio Lopez-Jaramillo; Lu Fanghong; Noor Hassim Ismail; Thandi Puoane; Annika Rosengren; Andrzej Szuba; Ahmet Temizhan; Andy Wielgosz; Rita Yusuf; Afzalhussein Yusufali; Martin McKee; Lisheng Liu; Prem Mony; Salim Yusuf
Journal: JAMA Date: 2013-09-04 Impact factor: 56.272

Review 4. The cascade of care in diagnosis and treatment of latent tuberculosis infection: a systematic review and meta-analysis.

Authors: Hannah Alsdurf; Philip C Hill; Alberto Matteelli; Haileyesus Getahun; Dick Menzies
Journal: Lancet Infect Dis Date: 2016-08-10 Impact factor: 25.071

5. The World Health Organization STEPwise Approach to Noncommunicable Disease Risk-Factor Surveillance: Methods, Challenges, and Opportunities.

Authors: Leanne Riley; Regina Guthold; Melanie Cowan; Stefan Savin; Lubna Bhatti; Timothy Armstrong; Ruth Bonita
Journal: Am J Public Health Date: 2016-01 Impact factor: 9.308

6. Global, regional, and national comparative risk assessment of 79 behavioural, environmental and occupational, and metabolic risks or clusters of risks, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015.

Authors:
Journal: Lancet Date: 2016-10-08 Impact factor: 79.321

7. All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously.

Authors: Aaron Fisher; Cynthia Rudin; Francesca Dominici
Journal: J Mach Learn Res Date: 2019 Impact factor: 5.177

Review 8. Machine Learning and Data Mining Methods in Diabetes Research.

Authors: Ioannis Kavakiotis; Olga Tsave; Athanasios Salifoglou; Nicos Maglaveras; Ioannis Vlahavas; Ioanna Chouvarda
Journal: Comput Struct Biotechnol J Date: 2017-01-08 Impact factor: 7.271

9. Long-term and recent trends in hypertension awareness, treatment, and control in 12 high-income countries: an analysis of 123 nationally representative surveys.

Authors:
Journal: Lancet Date: 2019-07-18 Impact factor: 79.321

10. Insulin pen use and diabetes treatment goals: A study from Iran STEPS 2016 survey.

Authors: Hedyeh Ebrahimi; Farhad Pishgar; Moein Yoosefi; Sedighe Moradi; Nazila Rezaei; Shirin Djalalinia; Mitra Modirian; Niloofar Peykari; Shohreh Naderimagham; Rosa Haghshenas; Saral Rahimi; Hamidreza Jamshidi; Alireza Esteghamati; Bagher Larijani; Farshad Farzadfar
Journal: PLoS One Date: 2019-08-28 Impact factor: 3.240