Literature DB >> 35421085

Two-year death prediction models among patients with Chagas Disease using machine learning-based methods.

Ariela Mota Ferreira1, Laércio Ives Santos2, Ester Cerdeira Sabino3, Antonio Luiz Pinho Ribeiro4, Léa Campos de Oliveira-da Silva3, Renata Fiúza Damasceno1, Marcos Flávio Silveira Vasconcelos D'Angelo5, Maria do Carmo Pereira Nunes4, Desirée Sant Ana Haikal1.   

Abstract

Chagas disease (CD) is recognized by the World Health Organization as one of the thirteen most neglected tropical diseases. More than 80% of people affected by CD will not have access to diagnosis and continued treatment, which partly explains the high morbidity and mortality rate. Machine Learning (ML) can identify patterns in data that can be used to increase our understanding of a specific problem or to make predictions about the future. Thus, the aim of this study was to evaluate different ML models for predicting death within two years among patients with CD. ML models were developed using different techniques and configurations. The techniques used were: Random Forests, Adaptive Boosting, Decision Tree, Support Vector Machine, and Artificial Neural Networks. The configurations considered only interview variables, only complementary exam variables, and finally both combined. Data from SaMi-Trop, a cohort study of patients with CD, were analyzed. The predictor variables came from the baseline, and the outcome, death, came from the first follow-up. All models were evaluated in terms of sensitivity, specificity, and G-mean. Among the 1,694 individuals with CD considered, 134 (7.9%) died within two years of follow-up. Using only the predictor variables from the interview, the different techniques achieved a maximum G-mean of 0.64 in predicting death. Using only the variables from complementary exams, the G-mean reached 0.77. In this configuration, the prominence of NT-proBNP was evident: an ML model using only this single variable reached a G-mean of 0.76. The configuration that mixed interview variables and complementary exams achieved a G-mean of 0.75. ML can be a useful tool with the potential to contribute to the management of patients with CD by identifying those with the highest probability of death. Trial Registration: This trial is registered with ClinicalTrials.gov, Trial ID: NCT02646943.


Year:  2022        PMID: 35421085      PMCID: PMC9041770          DOI: 10.1371/journal.pntd.0010356

Source DB:  PubMed          Journal:  PLoS Negl Trop Dis        ISSN: 1935-2727


Introduction

Chagas disease (CD) is recognized by the World Health Organization as one of the thirteen most neglected tropical diseases in the world. It is caused by the protozoan Trypanosoma cruzi (T. cruzi) and remains a public health problem despite the partial control of its transmission [1-3]. Most patients with CD remain in the “chronic indeterminate form”, defined as persistent asymptomatic infection without cardiac or gastrointestinal tract involvement [4]. However, up to 30% of chronically infected people may develop cardiac abnormalities, the most serious complication of CD [5]. This condition is associated with a worse prognosis, with higher mortality rates compared to other causes of heart failure [2,6]. CD is the leading cause of disability-adjusted life years (DALYs) lost among all neglected tropical diseases [7]. Previous studies estimate that more than 80% of people affected by CD in the world will not have access to diagnosis and continued treatment, which partly explains the high morbidity and mortality rate and the social cost of the disease [4,8]. One strategy that can be used to define interventions to reduce the impact of CD is Machine Learning (ML). ML is a field of Artificial Intelligence based on computational algorithms that allow computers to learn directly from data, without being explicitly programmed [9]. ML algorithms analyze large volumes of data represented by many characteristics (predictor variables) in reasonable time and can handle complex relationships in the data, which makes them as accurate as, or more accurate than, human specialists in some situations [10]. Thus, ML can be defined as a set of tools and methods to identify patterns in data. These patterns can be used to increase our understanding of a specific problem, make predictions about the future, and contribute to decision making. We say that the algorithm learns from the data.
Using different configurations, the goal is to find the model that best explains the dataset [11]. However, ML systems, which are commonly complex, need to be explainable so that the solutions suggested by the models can be understood [12]. The use of ML to predict deaths from specific diseases has attracted increasing interest in the scientific literature [12]; however, only one study was located that used ML to predict death from CD. That study investigated whether heart rate variability could predict death using a sample of 150 CD patients. The resulting model, which used electrocardiogram (ECG) variables, achieved a death prediction power of 95% [13]. However, information from complementary exams, such as the ECG, is not easily accessible in many remote regions where CD is endemic, and no previous studies were identified that used ML with easily accessible predictors, such as information from the interview. The aim of this study was to evaluate different ML models for predicting death within two years among patients with CD. Models were tested with three different configurations: only interview variables, only variables from complementary exams, and finally, interview variables and complementary exam variables together.

Method

Ethical approval

Ethical approval was granted by ethics committees (Research Ethics Committee of the Faculty of Medicine of the University of São Paulo, protocol number: 042/2012; Research Ethics Committee of the State University of Montes Claros, protocol number: 2,393,610; and the National Institutional Review Council (CONEP), number: 179,685/2012). All subjects agreed to participate in the study and signed an informed consent form before data collection began.

Studied population

SaMi-Trop is a multicenter study (trial registration number: NCT02646943, ClinicalTrials.gov) designed by scientists from four Brazilian public universities in the states of Minas Gerais and São Paulo, which established a cohort of patients with CD recruited in 21 municipalities endemic for CD in the state of Minas Gerais. In this cohort, patients were selected to participate based on the results of electrocardiogram (ECG) exams; only patients aged 18 years or older who had a cardiac abnormality compatible with CD were considered eligible. Of the 4,689 patients identified, the baseline, which took place in 2013 and 2014, included 2,161 participants. The first follow-up took place from 2015 to 2016, with 1,709 participants remaining and 146 deaths identified, totaling data from 1,855 individuals. Details on the recruitment and methodology of this cohort can be found in a previous study [14]. Of the 1,855 individuals at follow-up, those with negative or indeterminate serology for CD were excluded; thus, all participants included in this study had confirmed serology.

Data collection

Baseline and follow-up visits were carried out in the public health units of Primary Health Care, where participants were interviewed, had blood samples taken, and had ECG tests performed. The interview addressed sociodemographic issues, lifestyle habits, clinical history, CD treatment, physical activity, and quality of life. In the follow-up, among other questions, the reason for the patient's loss to follow-up was also recorded, with the alternatives being death, withdrawal, or not being located. All patients were tested for anti-T. cruzi antibodies using a microparticle chemiluminescent immunoassay. Negative results were reassessed, and negativity was confirmed by two other chemiluminescence immunoassay tests using different antigens. The present investigation was conducted with data from the SaMi-Trop cohort, with the predictor variables coming from the baseline and the outcome coming from the first follow-up.

Outcome

The outcome “Death” was adopted for this study (no vs. yes). Deaths associated with CD are usually due to cardiovascular causes, mainly sudden death or death secondary to heart failure. In the present study, only 4 non-cardiovascular deaths occurred (one accidental death, two due to cancer, and one unspecified death). However, as we were unable to assess the cause of death of each patient, all-cause mortality was defined as the endpoint.

Inclusion and configuration criteria of the models

In this study, the prediction of the outcome “Death” was performed with three different configurations: only with interview variables, only with complementary exam variables, and finally, with interview variables and complementary exam variables simultaneously. In each configuration, additional exclusions occurred due to missing information. Participants with missing data were excluded rather than having their data imputed, because in no configuration did the number of participants with missing data exceed 10% of the total.
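The complete-case exclusion described above can be sketched as follows. This is a minimal illustration in Python (the study itself used Matlab), with a hypothetical column name and toy data; the guard mirrors the stated rule that exclusion was acceptable only because missingness stayed under 10%.

```python
import pandas as pd

def complete_case_filter(df: pd.DataFrame, max_missing_frac: float = 0.10) -> pd.DataFrame:
    """Drop participants with any missing value, but only if the loss stays
    under the 10% threshold; otherwise imputation would be warranted."""
    complete = df.dropna()
    frac_lost = 1 - len(complete) / len(df)
    if frac_lost > max_missing_frac:
        raise ValueError(
            f"{frac_lost:.1%} of rows have missing data; consider imputation instead"
        )
    return complete

# Toy cohort with one hypothetical predictor column
toy = pd.DataFrame({"nt_probnp": [120.0, None, 450.0, 80.0, 310.0, 200.0,
                                  150.0, 90.0, 600.0, 75.0, 130.0]})
filtered = complete_case_filter(toy)
print(len(filtered))  # 10 of 11 rows remain (~9% lost, under the threshold)
```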

Data pre-processing

The predictor variables from interviews and complementary exams initially considered in the analyses, as well as the way in which they were collected and processed, are presented in the supplementary material (S1 Table). Among the predictor variables from the interview, 33 variables were considered; among the predictor variables from complementary exams, 15 variables were considered, including information from the ECG, the test used to assess heart failure (NT-proBNP), and the parasite load of CD (quantitative PCR). To choose the format of the NT-proBNP variable, three measures were tested: numerical, categorized by age [15], and categorized with a cutoff point of ≥ 300 pg/dl [15], the latter being the format with the best predictive power for the outcome.
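The dichotomization at the ≥ 300 cutoff chosen above is trivial to express; a minimal Python sketch (function name and labels are ours, not from the study):

```python
def categorize_nt_probnp(value: float) -> str:
    """Dichotomize NT-proBNP at the >= 300 cut-off adopted in the study:
    'altered' at or above the cut-off, 'normal' below it."""
    return "altered" if value >= 300 else "normal"

print(categorize_nt_probnp(120))  # normal
print(categorize_nt_probnp(450))  # altered
```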

Selection of predictor variables for the model

The selection of variables aimed to reduce the number of variables initially available, preserving the relevant ones and discarding the redundant ones. Variable selection is essential for several reasons: it reduces the number of input variables, saving professional work time, and it reduces model training time and overfitting, resulting in better generalization capability of the models. In this study, we initially selected the 5, 10, or 15 most important predictor variables according to the Random Forests ranking, a strategy based on a previous study [16]. In all experiments, the configuration with 10 variables obtained the best results and was adopted in this study. However, other simulations using predictor variables of specific interest (related to NT-proBNP) are also presented.
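Selecting the top-10 predictors by Random Forests importance can be sketched as below. This is an illustrative Python/scikit-learn version (the study used Matlab); the synthetic data and all parameter values are placeholders, not the cohort or the tuned settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the cohort: 33 predictors, rare positive outcome
X, y = make_classification(n_samples=500, n_features=33, n_informative=8,
                           weights=[0.92, 0.08], random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Rank predictors by impurity-based importance and keep the 10 best
top10 = np.argsort(rf.feature_importances_)[::-1][:10]
X_selected = X[:, top10]
print(X_selected.shape)  # (500, 10)
```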

Machine learning approaches

Five supervised ML techniques were used separately to predict the death of the cohort participants: Random Forests [17], Adaptive Boosting [18], Decision Tree [19], Support Vector Machine [20], and Artificial Neural Networks [21]. For training and evaluation of these models, Matlab software version R2015b was used, with 5-fold cross-validation, of which three folds were used for training (training set), one to adjust the hyperparameters of each technique (validation set), and one to evaluate the performance of the models (test set) (Fig 1). Cross-validation was performed in a stratified manner, i.e., in each of the 5 folds the prevalence of the outcome, relative to the total number of participants, was preserved.
Fig 1

Flowchart of the process of selecting predictor variables based on cross-validation analysis for predicting death among patients with CD within two years of follow-up.
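The stratified 3/1/1 fold rotation described above can be sketched as follows. This is a Python/scikit-learn illustration (the study used Matlab); the data are synthetic, with an outcome prevalence of about 8% to mimic the cohort.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))       # 10 hypothetical predictors
y = np.array([1] * 16 + [0] * 184)   # ~8% deaths, mimicking the cohort prevalence

# Build 5 stratified folds; each fold preserves the outcome prevalence
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
folds = [test_idx for _, test_idx in skf.split(X, y)]

# Rotate the folds: 3 for training, 1 for hyperparameter validation, 1 for testing
for i in range(5):
    test_idx = folds[i]
    val_idx = folds[(i + 1) % 5]
    train_idx = np.concatenate([folds[(i + j) % 5] for j in range(2, 5)])
    print(i, len(train_idx), len(val_idx), len(test_idx), int(y[test_idx].sum()))
```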

Table 1 presents a brief description and the adjusted hyperparameters for each ML technique.
Table 1

Description of Machine Learning (ML) techniques adopted in the study.

ML Technique | Description | Adjusted hyperparameters
Random Forests | Uses an aggregate of decision trees built from randomly selected instances and predictors. Each tree predicts a problem class and the model prediction is determined by majority vote. | Number of trees, number of sampled predictors, fraction of sampled data.
Adaptive Boosting | Uses an aggregate of decision trees built from randomly selected instances and predictors in the first tree. From the second tree onwards, instances are selected with probability proportional to the prediction error of the previous trees. | Number of trees, learning rate.
Decision Tree | Recursively defined structure composed of decision nodes and leaf nodes. A decision node contains a test on some predictor, and each result of that test links to a subtree. A leaf node corresponds to one of the problem classes. | Maximum number of subdivisions.
Support Vector Machine | Searches for a hyperplane that maximizes the distance between instances of two different classes. When the problem has only two predictors, this hyperplane is a line; with n predictors, a hyperplane with n dimensions is needed to fit the data. | Kernel function.
Artificial Neural Networks | Simulates the way a human brain learns through artificial neurons. An artificial neuron takes input information from an external source and combines such inputs with non-linear operations, producing a result based on the assimilated knowledge. | Number of neurons in the hidden layer, number of epochs, and learning rate.
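For readers working in Python, the five techniques in Table 1 have direct scikit-learn counterparts; a minimal sketch follows. The hyperparameter values shown are illustrative placeholders, not the values tuned in the study (which used Matlab).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# scikit-learn counterparts of the five techniques in Table 1;
# hyperparameter values are placeholders, not the tuned ones
models = {
    "Random Forests": RandomForestClassifier(n_estimators=100, max_features="sqrt"),
    "Adaptive Boosting": AdaBoostClassifier(n_estimators=100, learning_rate=0.5),
    "Decision Tree": DecisionTreeClassifier(max_leaf_nodes=20),
    "Support Vector Machine": SVC(kernel="rbf"),
    "Artificial Neural Networks": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500),
}

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
for name, model in models.items():
    model.fit(X, y)
    print(name, "fitted")
```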

Class imbalance problem

In all the experiments performed in this study, the outcome classes were imbalanced: the proportion of participants with a positive outcome (death) was approximately 8%, while the proportion with a negative outcome (non-death) was approximately 92%. Thus, it was necessary to insert a balancing step for the training set. The Synthetic Minority Oversampling Technique (SMOTE) with k = 10 was used to balance the number of instances of the two classes. SMOTE undersamples the majority class and synthesizes new data points in the minority class [22].
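The synthesis step of SMOTE can be sketched in a few lines of NumPy: for each new point, pick a minority sample, one of its k nearest minority neighbours, and interpolate between them. This is a didactic sketch of the core idea, not the library implementation used in practice (e.g., imbalanced-learn's `SMOTE`); function name and data are ours.

```python
import numpy as np

def smote_oversample(X_min, n_new, k=10, rng=None):
    """Minimal SMOTE sketch: each synthetic point lies on the segment
    between a minority sample and one of its k nearest minority neighbours."""
    rng = rng or np.random.default_rng(0)
    # Pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    neighbours = np.argsort(d, axis=1)[:, :k]
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # random position along the segment
        new_points.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(new_points)

rng = np.random.default_rng(1)
X_minority = rng.normal(size=(30, 5))  # e.g. the rare "death" class
X_synth = smote_oversample(X_minority, n_new=70, k=10, rng=rng)
print(X_synth.shape)  # (70, 5)
```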

Model performance evaluation

All models were evaluated by different metrics: sensitivity, specificity, and G-mean. In the context of this study, sensitivity measures the probability of the system predicting a death, given that the death actually occurred. Specificity measures the probability of the system predicting a non-death, given that the non-death actually occurred. The G-mean is the geometric mean of sensitivity and specificity (G-mean = √(sensitivity × specificity)), measuring the balance between the classification performances in the majority and minority classes in problems with class imbalance.
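The three metrics follow directly from the confusion matrix; a small self-contained sketch (toy labels, death coded as 1):

```python
import math

def evaluate(y_true, y_pred):
    """Sensitivity, specificity and their geometric mean (G-mean),
    with death coded as the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)

# Toy predictions: 4 deaths (1) among 10 patients
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sens, spec, gmean = evaluate(y_true, y_pred)
print(round(sens, 2), round(spec, 2), round(gmean, 2))  # 0.75 0.83 0.79
```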

Statistical analysis

To verify whether there were significant differences between the configurations, we performed the Mann-Whitney U test, comparing the predictive results of each metric (sensitivity, specificity, and G-mean) of the models, and considered p-values < 0.05 statistically significant. Comparison groups were defined as follows: interview variables vs. complementary exam variables; complementary exam variables vs. complementary exam variables without NT-proBNP; complementary exam variables vs. the NT-proBNP variable exclusively; and complementary exam variables vs. interview and complementary exam variables simultaneously. For all groups we used the values from all folds and the 5 ML models developed.
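A comparison of this kind can be reproduced with SciPy's implementation of the test. The per-fold G-mean values below are made up for illustration only; in the study, the compared samples would be the metric values across folds and techniques.

```python
from scipy.stats import mannwhitneyu

# Hypothetical per-fold G-mean values for two configurations (illustrative only)
gmean_interview = [0.60, 0.62, 0.59, 0.64, 0.61, 0.63, 0.60, 0.58, 0.62, 0.61]
gmean_exams = [0.75, 0.77, 0.74, 0.76, 0.75, 0.74, 0.77, 0.76, 0.75, 0.74]

stat, p_value = mannwhitneyu(gmean_interview, gmean_exams, alternative="two-sided")
print(p_value < 0.05)  # True: the two configurations differ significantly here
```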

Results

Of the 1,855 individuals considered in the follow-up of the SaMi-Trop cohort, 161 were excluded because they had negative or indeterminate serology for CD. Therefore, this study considered 1,694 individuals, of whom 134 (7.9%) died within two years of follow-up. Fig 2 shows the number of patients included in and excluded from the cohort and of eligible patients in the study and in the models obtained.
Fig 2

Flowchart of patients included and excluded from the cohort and of eligible patients in the study and in the models obtained.

SaMi-Trop Project. Minas Gerais.

Table 2 presents the categorical predictor variables selected among the 10 with the greatest predictive power in one or more configurations adopted in this study (final model), and their association with death. Quantitative PCR and heart rate variability (numerical variables) were also selected among the 10 variables with the greatest predictive power. The mean parasite load (quantitative PCR) in the “non-death” group was 555.40 parasites/mL and in the “death” group it was 666.35 parasites/mL (p = 0.431). The mean heart rate variability in the “non-death” group was 304.55 ms and in the “death” group it was 475.97 ms (p = 0.009).
Table 2

Descriptive and bivariate analysis of categorical predictor variables selected among those with the greatest predictive power, and their association with death in patients with Chagas disease (CD).

Minas Gerais, Brazil (n = 1,694).

Variable | Total n (%) | Death: No n (%) | Death: Yes n (%) | P-value π

Interview variables

Sociodemographic
    Gender
        Male | 562 (33.2) | 503 (89.5) | 59 (10.5) | 0.005
        Female | 1132 (66.8) | 1057 (93.4) | 75 (6.6)
    Literate*
        No | 750 (44.5) | 671 (89.5) | 79 (10.5) | <0.001
        Yes | 937 (55.5) | 884 (94.3) | 53 (5.7)
    Age
        Up to 60 years | 935 (55.2) | 886 (94.8) | 49 (5.2) | <0.001
        Above 61 years | 759 (44.8) | 674 (88.8) | 85 (11.2)
    Self-declared color*
        White | 361 (21.4) | 332 (92.0) | 29 (8.0) | 0.836
        Non-white | 1324 (78.6) | 1222 (92.3) | 102 (7.7)
    Per capita income*
        Greater than R$ 356.33 | 665 (39.7) | 602 (90.5) | 63 (9.5) | 0.033
        Less than R$ 356.32 | 1011 (60.3) | 944 (93.4) | 67 (6.6)
Signs and symptoms reported
    Climb stairs*
        No | 617 (36.7) | 538 (87.2) | 79 (12.8) | <0.001
        Yes | 1065 (63.3) | 1010 (94.8) | 55 (5.2)
    Self-reported ECG irregularity*
        No | 651 (39.2) | 602 (92.5) | 49 (7.5) | 0.559
        Yes | 1009 (60.8) | 925 (91.7) | 84 (8.3)
    Racing heart*
        No | 603 (36.3) | 602 (92.5) | 49 (7.5) | 0.559
        Yes | 1057 (63.7) | 925 (91.7) | 84 (8.3)
Comorbidities reported
    Arterial hypertension
        No | 608 (35.9) | 573 (94.2) | 35 (5.8) | 0.014
        Yes | 1086 (64.1) | 987 (90.9) | 99 (9.1)
    Permanent self-reported pacemaker*
        No | 1559 (93.9) | 1446 (92.8) | 113 (7.2) | <0.001
        Yes | 101 (6.1) | 81 (80.2) | 20 (19.8)

Variables from complementary exams
    Heart rate*
        Normal | 1112 (67.4) | 1024 (92.1) | 88 (7.9) | 0.141
        Below normal (up to 59 bpm) | 509 (30.8) | 474 (93.1) | 35 (6.9)
        Above normal (above 101 bpm) | 30 (1.8) | 25 (83.3) | 5 (16.7)
    Corrected QT interval*
        Normal (up to 440 ms) | 816 (49.4) | 778 (95.3) | 38 (4.7) | <0.001
        Altered (above 441 ms) | 835 (50.6) | 745 (89.2) | 90 (10.8)
    QRS complex duration*
        Normal (up to 120 ms) | 959 (58.1) | 911 (95.0) | 48 (5.0) | <0.001
        Altered (above 121 ms) | 692 (41.9) | 612 (88.4) | 80 (11.6)
    Isolated right bundle branch block plus left anterior fascicular block*
        Negative | 1460 (88.4) | 1352 (92.6) | 108 (7.4) | 0.135
        Positive | 191 (11.6) | 171 (89.5) | 20 (10.5)
    Isolated right bundle branch block*
        Negative | 1315 (79.6) | 1209 (91.9) | 106 (8.1) | 0.355
        Positive | 336 (20.4) | 314 (93.5) | 22 (6.5)
    Pacemaker*
        Absent | 1592 (96.4) | 1482 (93.1) | 110 (6.9) | <0.001
        Present | 59 (3.6) | 41 (69.5) | 18 (30.5)
    Pathological Q waves*
        Negative | 1398 (84.7) | 1308 (93.6) | 90 (6.4) | <0.001
        Positive | 253 (14.9) | 215 (85.0) | 38 (15.0)
    Low QRS complex voltage*
        Negative | 1556 (94.2) | 1444 (92.8) | 112 (7.2) | 0.001
        Positive | 95 (5.8) | 79 (83.2) | 16 (16.8)
    Categorized NT-proBNP*
        Normal (below 300 pg/dl) | 1194 (70.8) | 1166 (97.7) | 28 (2.3) | <0.001
        Altered (≥ 300 pg/dl) | 492 (29.2) | 386 (78.5) | 106 (21.5)

* n varies from 1,694 because of missing information.

π Chi-squared test


Configuration with interview variables

The 10 most important predictors from the interview (according to the Random Forests ranking) were: age, literacy, per capita income, climbing stairs, self-reported ECG irregularities, self-reported skin color, self-reported permanent pacemaker, gender, arterial hypertension, and racing heart. The results of the 5 ML techniques are shown in Fig 3A. In general, the different techniques revealed relatively similar and modest values, with a maximum G-mean of 0.64.
Fig 3

Performance of models in predicting death for patients with CD, within two years, according to each machine learning technique adopted.

Fig 3A: Considering the interview variables. Fig 3B: Considering the variables of complementary exams. Fig 3C: Considering the variables of complementary exams, excluding the categorized NT-proBNP variable. Fig 3D: Considering the variables of complementary exams, considering only the NT-proBNP variable. Fig 3E: Considering the interview variables and complementary exam variables.


Configuration with variables from complementary exams

The 10 most important predictor variables from complementary exams were: categorized NT-proBNP, QRS complex duration, heart rate variability, isolated right bundle branch block, quantitative PCR, corrected QT interval, low QRS complex voltage, heart rate, pathological Q Wave, and isolated right bundle branch block plus left anterior fascicular block. Among them, it was observed that the predictive power of the NT-proBNP variable was greater (Fig 4).
Fig 4

Importance of predictor variables of complementary exams for predicting death in patients with CD, within two years, according to Random Forests ranking.

The results of the 5 ML techniques considering the complementary exam variables are shown in Fig 3B. All techniques presented G-means above 0.74, with a maximum of 0.77. The configuration considering only complementary exam variables presented predictive power superior to that of the configuration with only interview variables, according to the U test (p-value < 0.001 for the three metrics). Due to the high predictive effect of the NT-proBNP variable, two other configurations were tested: 1) excluding the categorized NT-proBNP variable (leaving only QRS complex duration, heart rate variability, isolated right bundle branch block, quantitative PCR, corrected QT interval, low QRS complex voltage, heart rate, pathological Q wave, isolated right bundle branch block plus left anterior fascicular block, and pacemaker); 2) keeping this variable exclusively (Fig 3C and 3D, respectively).

Configuration with variables from complementary exams excluding the categorized NT-proBNP

Excluding the categorized NT-proBNP variable (Fig 3C), the maximum observed G-mean was 0.66. The difference between this configuration (Fig 3C) and the configuration with all complementary exam variables (Fig 3B) was significant according to the U test (p-value < 0.002 for the three metrics), revealing a loss in prediction power with the removal of NT-proBNP.

Configuration with only categorized NT-proBNP

The configuration with NT-proBNP exclusively (Fig 3D) revealed a G-mean of up to 0.76 across all techniques, a value similar to the configuration with all variables from complementary exams. There was no significant difference between these configurations (p-value > 0.05 for the three metrics).

Configuration with interview and complementary exam variables

Using interview and complementary exam variables simultaneously, we tested whether there would be an improvement in the prediction of ML models. The results are shown in Fig 3E. Initially, the 20 most important predictor variables were included (10 from interviews and 10 from complementary exams) according to the Random Forests ranking. However, in the final model, 10 variables were kept: categorized NT-proBNP, QRS complex duration, heart rate variability, age, self-reported skin color, stair climbing, isolated right bundle branch block, quantitative PCR, corrected QT interval, and per capita income. In this configuration (Fig 3E), a G-mean of up to 0.75 was observed. In all techniques, no improvement in prediction was observed after using interview variables and complementary exams simultaneously (p-value > 0.05 for the three metrics). Configuration with only complementary exam variables (Fig 3B) had similar predictive power to the configuration with complementary exam and interview variables (Fig 3E).

Discussion

This study described how ML was able to predict death over a 2-year period in patients with CD, using different techniques and different configurations of predictive models. In general, the prediction of death from CD presented a G-mean value between 0.59 and 0.77, varying according to the techniques and configurations. The five ML techniques adopted showed approximately similar values, and it was not possible to identify the superiority of any one technique over the others. In the configuration using only variables from the interview, the different techniques revealed relatively modest values, with a maximum G-mean of 0.64. The configuration using only variables from complementary exams revealed a G-mean of up to 0.77. In this configuration, the role of NT-proBNP was evident (Fig 4), showing more than twice the importance observed for the other variables in the model. This finding was confirmed when observing the configuration using only NT-proBNP, where this single variable reached a G-mean of 0.76. The configuration with interview variables and complementary exams simultaneously reached a G-mean of up to 0.75. Thus, the three configurations that considered NT-proBNP (complementary exam variables; NT-proBNP only; and interview and complementary exam variables simultaneously) had similar predictive power, superior to that of the two configurations that did not consider NT-proBNP (interview variables, and complementary exam variables excluding NT-proBNP). The role of this variable in the prediction of death is confirmed by the similarity of the predictive power of the configuration that adopted this single variable exclusively with that of other, more complex configurations. The different models obtained allowed us to identify which configurations have the best predictive power.
In addition, the study made it possible to identify, through the use of artificial intelligence, the most important predictors of death from anamnesis, from complementary exams, and from both. In the configuration with only variables from the interview, sociodemographic characteristics and self-reported signs/symptoms remained in the model. In this regard, the high accuracy of self-reported questions for chronic conditions has already been verified [23]. In clinical terms, by adopting only ten interview variables during anamnesis, it would be possible to identify the individuals who would potentially benefit most from earlier interventions, achieving 64% death prediction power in this configuration. Although the predictive power is considered modest, it can still represent a significant clinical impact. This intervention could be useful in practice in places where access to exams and specialized health services is restricted, with a "selective, focused and exclusive" offering [24]. The literature also reports that the groups with greater health needs, such as those with CD, are precisely those that have greater difficulty in accessing and using health services [25]. It is a great challenge for the Brazilian public health service (SUS) to achieve equitable access, as each social segment has different demands produced by social processes of exclusion, not always perceived by the government [24]. Thus, the ML tool could contribute to this need for better management of health system users with CD, using simple and effective data from the anamnesis, especially in remote regions. In the configuration with only variables from complementary exams, ECG, parasite load (quantitative PCR), and NT-proBNP biomarker variables remained.
In this configuration, the prediction power achieved was greater than using only interview variables, highlighting the importance of information from complementary tests for conducting the clinical management of CD. The ECG is the most important complementary exam in the initial evaluation of patients with CD [4]; it is an inexpensive and standardized exam. Detection of T. cruzi parasites in blood by PCR has been used to assess the parasite burden and the effectiveness of CD treatment, and previous studies have demonstrated the role of PCR in predicting CD progression [26,27]. When verifying the importance of complementary exam variables in composing the model, the NT-proBNP biomarker was prominent. Thus, due to this importance, a configuration with only this variable was tested, and the predictive power of this single-variable configuration was similar to that of the configuration with the 10 complementary exam variables (76% vs. 77%). The configuration with complementary exam variables excluding NT-proBNP had significantly lower predictive power (66%). NT-proBNP levels have already been identified as accurate discriminators for the diagnosis of heart failure, as aids in patient risk stratification, and as powerful predictors of death. A previous study found that the discriminatory ability of NT-proBNP to predict mortality (C = 0.69, 95% CI: 0.66, 0.71) is similar to that of an ECG (C = 0.68, 95% CI: 0.65, 0.71) [28]. Confirming this finding, another study that constructed a risk score to predict 2-year mortality in patients with CD included NT-proBNP, which revealed the greatest power of death prediction [29]. Furthermore, this measure is the factor most strongly associated with the occurrence of cardiovascular events in the population with CD [30]. Unfortunately, NT-proBNP is not readily available in most clinical settings involving CD.
Its large-scale use in the Brazilian public health system as a routine test to aid in management is not envisioned at this time, although it is highly desirable, because this exam, performed on a simple blood sample, leads to fewer echocardiograms (-58.2%) and a reduction in the number of hospitalizations (-12.6%). As it is also a strategy with lower final cost and better diagnostic accuracy, there would be no increase in the public health system budget for the diagnosis and treatment of patients with heart failure [31]. Although NT-proBNP is a quantitative marker whose levels are best interpreted as a continuous variable, cut-off points can still be useful in making its application easy for physicians without extensive experience. Thus, a cut-off of 300 pg/ml is proposed to rule out a diagnosis of heart failure, while higher, age-dependent cut-offs are suggested for ruling in [15]. Other models for predicting death in the CD population are cited in previous studies. However, only one study was found that used ML to predict death in CD patients; it included 150 patients, 15 of whom died [13]. Studies that performed this prediction commonly use the methodology of creating risk scores and have developed simple models to predict death, with good clinical relevance and a C-statistic from 0.82 to 0.84. All these models shared information from complementary exams to estimate individual risk [29,32,33]. In endemic areas, CD represents a major cause of death from cardiovascular disease [28]. A meta-analysis identified that CD is associated with high mortality, regardless of clinical condition, with a relative risk (RR) of 1.74 (95% CI 1.49-2.03) and an attributable risk of 42.5% in the exposed group [29]. The region where our study was conducted is characterized by low demographic density, strong social inequality, large distances between municipalities, and extensive rural areas [34].
In addition to these contextual problems, primary health care physicians in the region lack training, specific knowledge, and confidence in the management of patients with CD [35,36]. ML could therefore transform the delivery of health care through new tools and algorithms, enabling a new class of smart digital health interventions. However, more studies are needed to demonstrate the effectiveness of digital interventions that rely on machine learning in real-life healthcare, and more evidence of the clinical utility of ML in health services is needed. Researchers should go beyond retrospectively validating machine learning models, integrating them into properly designed digital health tools and evaluating those tools in rigorous studies carried out in real-life environments [37]. Despite these limitations, ML tools could help expand access to adequate management of CD, as they are automatic, simple, and inexpensive.

Among the strengths of this study, the longitudinal assessment of a large sample of patients with CD who live in endemic areas and small municipalities, far from the large urban centers commonly portrayed in the literature, stands out. That is, the individuals in the investigated sample are typical of populations with CD in endemic areas. But this study is not without limitations. Beyond the practical implementation difficulties already mentioned, there is the way the models were evaluated: cross-validation was used to train and test them. From a methodological point of view, cross-validation allows a robust assessment. However, using two independent data sets (one for training and one for testing) would better establish the models' generalizability and suitability for clinical practice. A second, independent sample should therefore be included for external validation, and implementation studies should be carried out to assess the potential impact of using these models to predict death in patients with CD.
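The evaluation scheme discussed above — an independent, held-out test set rather than cross-validation alone — can be sketched as follows. This is an illustrative sketch on synthetic data with roughly the cohort's class balance; it is not the study's pipeline, and the dataset and variable names are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cohort: 1694 subjects, ~8% positive (death) class
X, y = make_classification(n_samples=1694, weights=[0.92], random_state=0)

# Hold out an independent, stratified test set that the model never sees
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
sens = recall_score(y_te, pred)               # sensitivity
spec = recall_score(y_te, pred, pos_label=0)  # specificity
print(f"held-out G-mean: {np.sqrt(sens * spec):.2f}")
```

External validation would go one step further, replacing the held-out split with data from a second, independently collected cohort.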

Conclusion

This study evaluated the power of different ML configurations and techniques to predict 2-year death in patients with CD. It was possible to develop optimized models, which can contribute to the development of prediction tools. The ML models proved useful, with good power to predict death within two years among patients with CD: the different configurations and techniques achieved 59% to 77% predictive power. Configurations using only interview information have the advantage of being usable in settings with little access to complementary exams, but incorporating variables from complementary exams improved predictive power. A configuration using only NT-proBNP showed prediction capacity similar to the best models combining interview variables and complementary exams; thus, the ML approach confirmed the role of this biomarker in predicting death. ML can be a useful tool with the potential to contribute to the management of patients with CD by identifying those with the highest probability of death. However, models with greater predictive power are still needed, along with external validation on independent data and clinical implementation of this knowledge.
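The 59% to 77% range above came from comparing five techniques under different variable configurations. A hedged sketch of such a comparison, using scikit-learn implementations on synthetic data (not the SaMi-Trop dataset; the hyperparameters are illustrative defaults, not the study's tuned settings):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 1694 subjects, ~8% positive (death) class
X, y = make_classification(n_samples=1694, weights=[0.92], random_state=0)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(random_state=0),
    "Neural Network": MLPClassifier(max_iter=1000, random_state=0),
}

results = {}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=5)   # 5-fold CV, as in the study
    sens = recall_score(y, pred)                  # sensitivity
    spec = recall_score(y, pred, pos_label=0)     # specificity
    results[name] = np.sqrt(sens * spec)          # G-mean
    print(f"{name}: G-mean = {results[name]:.2f}")
```

In the study, a loop like this would be repeated per predictor configuration (interview only, complementary exams only, both), yielding the grid of sensitivity, specificity, and G-mean values reported in the tables.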

Prediction Model Development.

(DOCX)

Predictor variables initially considered in the analyses and how they were handled.

(DOCX)

Database.

(XLS)

S1A: Decision tree created from the interview variables. S1B: Decision tree created from the variables of complementary exams. (TIF)

6 Dec 2021

Dear Ferreira,

Thank you very much for submitting your manuscript "Development of a two-year death prediction model among patients with Chagas Disease using methods based on machine learning" for consideration at PLOS Neglected Tropical Diseases. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. In light of the reviews (below this email), we would like to invite the resubmission of a significantly revised version that takes into account the reviewers' comments. We cannot make any decision about publication until we have seen the revised manuscript and your response to the reviewers' comments. Your revised manuscript is also likely to be sent to reviewers for further evaluation.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments. Please prepare and submit your revised manuscript within 60 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.
Please note that revised manuscripts received after the 60-day due date may require evaluation and peer review similar to newly submitted manuscripts. Thank you again for your submission. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.

Sincerely,

Alberto Novaes Ramos Jr, Associate Editor, PLOS Neglected Tropical Diseases
Bruce Lee, Deputy Editor, PLOS Neglected Tropical Diseases

***********************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance? As you describe the new analyses required for acceptance, please consider the following:

Methods
-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?
-Is the study design appropriate to address the stated objectives?
-Is the population clearly described and appropriate for the hypothesis being tested?
-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?
-Were correct statistical analyses used to support conclusions?
-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1:
-Are the objectives of the study clearly articulated with a clear testable hypothesis stated? Yes
-Is the study design appropriate to address the stated objectives? Yes
-Is the population clearly described and appropriate for the hypothesis being tested? No
-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested? Yes
-Were correct statistical analyses used to support conclusions? Yes. Several minor suggestions, and two major ones that will require additional analysis.
-Are there concerns about ethical or regulatory requirements being met? No concern.

The authors cite another paper initially describing the cohort from which the sample was derived.
However, this forces the reader to go to the original paper to understand key points such as the inclusion and exclusion criteria and the manner in which participants were recruited and included. I recommend that the authors clearly state, at least briefly, the inclusion and exclusion criteria, which diagnostic tests were performed in the Chagas disease diagnostic work-up, and whether participants were included sequentially over a period or randomly from the original cohort.

Another issue in the methods section is the very short follow-up period for Chagas disease patients. It is clear that this study reports data from a partial, not yet completed follow-up, but I could not find any discussion of this short follow-up, or of expectations about the models' behavior with longer follow-up periods, in the discussion section. Usually, Chagas disease patients are followed for 10 or more years for an outcome to be observed frequently, unless they are at a very advanced stage of (possibly heart) disease. It is therefore also important to state whether age or disease severity was considered in the inclusion and exclusion criteria.

The outcome definition is extremely important for appropriate interpretation and applicability of the model. I therefore suggest that the authors clearly state whether death (yes or no) means overall CD death, overall cardiac CD death, all-cause death, stroke-related death, heart failure death, sudden death, some combination of these, or possibly other mechanisms of death. These different mechanisms of death require different prevention and treatment approaches and may drastically change the applicability of the final model.

"In addition to these, in each configuration additional exclusions occurred due to loss of information (missing). Table 1 shows the number of lost participants, the total number of participants included, in addition to the percentage of death verified in each configuration." seems like a result to me.
I suggest moving this piece to the results section. The methods would describe how missing data were checked, how decisions on data imputation were made, and what imputation methods were considered. Additionally, it is not clear what fraction of the excluded participants had the outcome. This could easily be cleared up by rearranging Table 1 or replacing it with an inclusion and exclusion diagram, as recommended in TRIPOD.

There is no comment on data imputation. Usually, missing data are not missing at random, and analyzing complete cases only has the potential to bias predictions, so complete-case analysis is considered bad practice. Although there are several complex data imputation algorithms, such as predictive mean matching and ML models, imputation functions are currently easy to use in several software packages, such as Stata and R. Of course, if the prevalence of missingness is very low, imputation may seem pointless; however, there is no way to know that (that I'm aware of) until imputation is performed and complete-case analysis is compared with imputed-data analysis, and that does not seem to be the case here. As seen in Table 1, two out of three scenarios have a higher prevalence of participants with missing data than of participants with the outcome. If missingness is not at random, i.e., it is strongly related to the outcome, then the number of outcomes could even double if missingness had never occurred in the first place. How many of the participants with missing data have the outcome data missing?

"To choose the NT-proBNP categorization, three measures were tested: numerical, categorized by age [13], and categorized with a cutoff point of ≥ 300 pg/dl, the latter being the format that had the best predictive power for the outcome." Although there is reasoning behind this procedure, this statement is a bit awkward. There is plenty of evidence that categorizing continuous predictors is simply a loss of information and bad practice.
By categorizing a continuous predictor into two classes, the authors reduce the ability of the machine learning techniques to identify complex data patterns, which was mentioned as an advantage in the introduction. As an extreme example, if BNP is an excellent predictor and can replace the whole model with a single decision threshold, why build a machine learning algorithm in the first place? Similar comments apply to all other predictors shown in Table 3, such as PCR and income, and to the outcome representation.

It is not clear, but it seems that the machine learning techniques represent the outcome as classes instead of probabilities, as would be expected in regression modeling. This modeling decision may also affect how the predictors can be combined and related to the outcome. Have the authors tried regression-style ML instead of classification ML? Representing the outcome as classes makes interpretation much easier, but it hides the potential prediction error. If the predictions return probabilities, then discussion of calibration, decision thresholds, and the errors around those thresholds would be necessary.

From the supplementary material it is clear that at least 45 predictors were available. Why did the authors choose to build the models with ten predictors and not 8, 6, or 12? What is the rationale behind this decision?

The following suggestion goes beyond the stated research aim. Instead of just comparing different machine learning techniques with binary outcomes, the authors could compare the very same machine learning techniques with binary and survival outcomes. I believe there is a huge knowledge gap on this topic; it would be closer to the CD death prediction models already available in the literature, and it would give a more elegant solution to missing outcome data by turning those cases into censored participants.
Regarding the models' performance measures, I suggest that the authors show the performance estimates from both the test set and the validation set. It is not clear how the subsamples for cross-validation were defined. Were they split randomly, or was there a geographic or calendar rationale? Additionally, a brief comment or discussion on the choice of performance measures, or alternatives such as the Youden index, accuracy, total error, positive likelihood ratio, and predictive values, would be welcome.

Reviewer #2: The paper presents the results of applying a number of supervised machine learning techniques to the problem of estimating the probability of death in patients diagnosed with Chagas disease. Classical techniques for function approximation, such as neural networks and support vector machines, have been used, together with other standard techniques based on decision trees. The sample used for the analysis, the SaMi-Trop cohort, consists of ~2000 patients, of whom ~7% died within two years. The predictor variables are grouped into three main classes: 1) interview; 2) complementary exams; and 3) interview and complementary exams. For model calibration, a 5-fold cross-validation scheme is selected and the performance using the three classes of predictors evaluated. The models are assessed in terms of three standard figures of merit: 1) sensitivity; 2) specificity; and 3) G-mean. The results of the prediction models are discussed and certain conclusions drawn with respect to the importance of the predictors.

Reviewer #3: The authors present a manuscript of relevant interest. The subject falls within the scope of the journal. Overall, the paper is well written and contains valuable information. The bibliography is pertinent and current. However, the text still needs some improvement and minor repairs.
Excerpts that deserve special attention in terms of writing were marked in yellow. Comments and additional suggestions have been placed in the margin of the text.

--------------------

Results
-Does the analysis presented match the analysis plan?
-Are the results clearly and completely presented?
-Are the figures (tables, images) of sufficient quality for clarity?

Reviewer #1:
-Does the analysis presented match the analysis plan? Yes
-Are the results clearly and completely presented? Yes
-Are the figures (tables, images) of sufficient quality for clarity? Partially

What is the median follow-up time? What is the range of the follow-up time? What fraction of participants was lost to follow-up or had interrupted follow-up? What is the death rate expressed as events divided by person-years (x 100 person-years)? Do participants live mainly in rural or urban areas? This information would help readers understand how the cohort behaves over time and compare it with other Chagas disease cohorts. Is it possible to show an exclusion diagram as a result of the screening and recruiting effort?

Regarding the subsamples for cross-validation, it does not make sense to estimate the outcome prevalence in all samples if they were split randomly; however, as it is not clear how the samples were split, the authors should consider showing the outcome prevalence in all samples.

"In all techniques, no improvement in prediction was observed after using interview variables and complementary exams together. That is, models with only complementary exam variables (Table 04) had similar predictive power to models with complementary exam and interview variables (Table 08)." What is the risk of death for a patient with altered and proBNP? How can a health care provider estimate a risk of death for his/her patient with any of these models?
Reviewer #2: The results are presented and discussed in accordance with the methodology used, and some of its limitations have been highlighted. A number of qualitative conclusions are made, although little to no statistical evidence is provided to support them. Similarly, a number of unclear and subjective statements are made throughout the section regarding the statistical validity, robustness, and ability to generalise of the approach. From this perspective, a moderate use of definitive claims is recommended. The majority of the authors' claims are relevant, but only of a qualitative nature and potentially questionable. While it is true that it may be possible to use machine learning models in clinical practice, it is mandatory that the relevance of the outcomes is properly scrutinised and their use in a real decision-making scenario conscientiously disciplined. I believe that strong statements regarding the validity of these methods and their results can only be made if a rigorous protocol is applied along the data collection and analysis pipeline. I am confident that the authors themselves would appreciate a more rigorous practice and degree of conscientiousness if they were the patients whose future chances of survival are predicted.

Reviewer #3: Some results are not clearly and completely presented. Some tables need minor repairs.

--------------------

Conclusions
-Are the conclusions supported by the data presented?
-Are the limitations of the analysis clearly described?
-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?
-Is public health relevance addressed?

Reviewer #1:
-Are the conclusions supported by the data presented? Partially
-Are the limitations of the analysis clearly described? Yes
-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study? Partially
-Is public health relevance addressed?
No

"It was possible to develop calibrated models, which allows the development of prediction tools. The ML model proved to be useful…" There are no data supporting this conclusion. Neither the methods section nor the results section makes any mention of calibration statistics. Furthermore, it is known that machine learning techniques usually yield badly calibrated models, and several methods have been proposed to recalibrate the estimated probabilities from these techniques. However, it makes more sense to treat calibration as a performance dimension of regression models, where probabilities are the results of interest.

Although the research produced several models and one final model was chosen, the authors did not present a prediction instrument from the final model; therefore, stating that the models are useful is not supported by the results as they stand. I can't see how a reader or a health care provider could use any of these models to make predictions. I would expect a formula or a nomogram allowing users to make predictions; however, these are not available from machine learning techniques due to the "black box" phenomenon. A workaround would be a web calculator. There are many examples of such tools on the Internet; one I find particularly elegant is at the following address: https://breast.predict.nhs.uk/ and, for Chagas disease, at https://shiny.ini.fiocruz.br/pedrobrasil/

Reviewer #2: It is in my opinion difficult to make conclusive statements regarding the appropriateness of the study, as such statements should be based on the statistical properties of the sample and the complexity of the prediction model. Because such information is only partially available or missing altogether, I can only assume that the models have been thoroughly optimized and that the class distribution was preserved in the creation of the 5 folds used for validating the hyperparameters.
These are the minimal and essential requirements when dealing with statistical information: their use is rigorously documented in the rich literature that modern public health has generated, and they should not be neglected but rather integrated to support the use of machine learning techniques. In my opinion, accuracies in the 70% range are not outstanding. However, and perhaps more importantly, they highlight the existence of a fundamental structure in the data which should be further discovered and explored. As it is very uncommon in data analysis that a rigorous statistical treatment is developed, I advise the authors to consider restating the objectives of their work toward the usual practice of replacing explicit statistical information with visual summaries of the data (typically, histograms and scatter plots), and to support their presentation, and the quality of these nevertheless interesting investigations, with highly valuable qualitative information. I encourage the authors to restructure their contribution with such summaries, as they would introduce the reader to the main features of the data used, rather than their questionable predictive power.

Reviewer #3: The discussion and conclusion related to the parasite load (quantitative PCR) should be better founded. The conclusion/discussion about the public health relevance should be reviewed (see comment on line 353).

--------------------

Editorial and Data Presentation Modifications?

Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend "Minor Revision" or "Accept".

Reviewer #1: (No Response)

Reviewer #2: I do not have any comments about this.

Reviewer #3: The authors present a manuscript of relevant interest. The subject falls within the scope of the journal. Overall, the paper is well written and contains valuable information.
The bibliography is pertinent and current. However, the text still needs some improvement and minor repairs. Excerpts that deserve special attention in terms of writing were marked in yellow. Comments and additional suggestions have been placed in the margin of the text.

Minor remarks: please see lines 22, 56, 68, 70, 197, 201, 212, 217, 234, 238, 248, 255, 256, 258, and 354.
Major remarks: please see comments on lines 252, 293-294, 321, and 353.

--------------------

Summary and General Comments

Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: It seems that the authors were eager to develop prediction models with machine learning techniques. From my point of view, there could be at least two main approaches for this research. The first would be to extensively test different ways of applying different machine learning techniques, using Chagas disease as a case-study scenario, and to discuss particular issues of the modeling process for Chagas disease data with different models; here, the model development and how to model would be of major interest. The second approach would be to actually build a prediction instrument for different clinical scenarios in Chagas disease health care settings, so that health care providers could make predictions from these models; here, the applicability of one or more models would be of major interest. Unfortunately, the authors fell in between these approaches, fulfilling neither.
To fulfill the first, the suggestion would be to compare neural networks with a binary outcome against neural networks with a survival outcome, random forests with a binary outcome against random forests with a survival outcome, and so on. To fulfill the second approach, the suggestion would be to construct a prediction instrument from the final model, or from a couple of chosen models, and make it available for users to make predictions. It seems that no reporting guideline was followed, as a few key points that I consider crucial for overall understanding were poorly described. I would personally use TRIPOD (https://www.equator-network.org/reporting-guidelines/tripod-statement/) as a reporting guideline, but there are others that could be as useful (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7538196/).

Abstract: Chagas heart disease has three main mechanisms of death: stroke, sudden death, and heart failure. It is not clear which mechanism of death is used as the outcome. If overall mortality is the outcome, please state so. From the applicability point of view, it is important to know which mechanisms of death are involved in the prediction, as these mechanisms require different treatment approaches and courses of action in prevention. Additionally, overall mortality in Chagas disease may have a substantial contribution from mortality not related to CD, as these patients live many years or even decades with the condition. Therefore, knowledge of this issue is important for appropriate interpretation and applicability of the final model. The main objective of the research is to compare five different machine learning techniques; however, in the abstract the authors do not state which techniques are being tested and compared. Please state which models are being compared.
In the abstract: "Using the predictor variables from the interview, the different techniques achieved a maximum accuracy of 62% in predicting death…". Is accuracy in this statement the same as overall correct classification? In the abstract, the authors state that there are three measures of interest to express the models' performance, and accuracy is not one of them. Please define accuracy or state the models' performance using one of the three measures of interest.

Introduction: The authors state in the introduction that machine learning techniques learn from the data and can make predictions. Although I agree, this also applies to regression models intended for prediction. The main characteristics of machine learning techniques that make them attractive over regression models are their automated procedures and their ability to capture complex patterns in the data, which makes implementing and updating the model in automated computerized systems much easier. However, such an implementation does not seem to be the case here, so additional issues arise. Most if not all ML techniques suffer (more or less) from the "black box" phenomenon. This means that if the data pilot inserts 30 predictors into a potential model, the algorithm shrinks the effect of the variables with no predictive contribution toward zero without actually removing the predictors from the model. If the model is implemented in a way that requires the user to fill in all 30 predictor values, it becomes less and less attractive to users as the number of predictors rises. Most ML techniques are simply not made for removing weak or near-zero-effect predictors, so reducing the number of predictors is a workload balance between the data pilot's effort and the user's effort. This is briefly mentioned in the methods section "Selection of predictor variables for the model", but the authors' priorities around this issue are not clearly stated.
From the performance point of view, there are plenty of studies comparing the performance of regression and machine learning techniques, with conflicting results. Additionally, a recent systematic review points to evidence that there is no clear performance advantage of machine learning techniques over regression models when considering the studies least susceptible to bias. Please take a look at the paper below:

Christodoulou E, Ma J, Collins GS, Steyerberg EW, Verbakel JY, Van Calster B. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. Journal of Clinical Epidemiology. June 2019;110:12-22.

The authors make no comment at all in the introduction about the usefulness or pitfalls of the models for death prediction in the Chagas disease population already available in the literature. I suggest looking for models predicting death in Chagas disease, as I am aware of at least three of them, and commenting on the potential improvement required in the field from this point on.

Methods: "Of the 2,161 participants at baseline, 202 with negative or indeterminate serology for CD were excluded, thus all participants included in this study (n=1,959) had confirmed serology." should be in the "study population" section. It is often confusing to write reports of studies that are actually substudies of a main study. Pay special attention to mentioning which key points apply to the substudy only and which to the main cohort only; I suggest the authors clearly state which methods come from the main cohort and which were applied in this report.

Results: There is no scale value or axis label for the vertical axis in Figure 2. Is it really supposed to be like this, or is something missing?

Discussion: I believe there are two main results to be discussed from this research.
The first is the applicability of the chosen models: when and how these models could be used and, alternatively, when and how they could not be used, or the limitations impairing their use. Additionally, how much better or worse are these models compared to existing models? This discussion is not present. The second main result is only briefly commented on in the discussion. The authors state that BNP alone is the strongest predictor. This is not exactly new, as many researchers have long shown that heart function, expressed as ejection fraction, is the strongest predictor of death in this population. Additionally, BNP's performance in identifying death within two years is similar to many other combinations in the model, and the authors cite a study in which BNP was tested as a prognostic marker and showed performance similar to that observed in this research. So, what are the differences (advantages and disadvantages) between applying BNP as a single test and applying BNP within the machine learning model? Of course, this discussion only makes sense if there is an expectation of using BNP for new patients in the future; if not, it would make no sense to test BNP in a prediction model in the first place. One point that should be in the discussion section is: what courses of action are possible from these model results? What decisions can health care providers make or improve by applying these models? Suppose a physician makes a prediction with one of these models and the model returns that the patient will die in the next two years; what could the physician do with this information? What future research could improve the results shown in this report and the field of Chagas disease mortality prevention? Kind regards, and may the force be with you.

Reviewer #2: I do not recommend the publication of the work in its present form.
While I believe that the data and the study are interesting and should be communicated in some form, I do not believe that the analysis reported in the manuscript is of sufficient rigour. My suggestion to the authors is to reconsider the perspective of the manuscript and present the analysis from the point of view of an exploratory analysis of an interesting cohort, rather than a predictive modelling effort. It is also important that the authors realise that there is a place and role for the use of machine learning approaches in clinical practice. Their use is critical, and it should always be supported by extremely sober statistical evidence: classification errors have a very high cost in clinical practice, and they should not be treated lightly. Failing this evidence, I trust that machine learning is a powerful way to explore the complexity of neglected diseases, and I encourage the authors to restrict its use to that task only.

Reviewer #3: The authors present an original, well-structured article with relevant information. The objectives of the study are clearly articulated, with a clear testable hypothesis stated. The study design is appropriate to address the stated objectives. The population is clearly described. The sample size is sufficient to ensure adequate power to address the hypothesis being tested. There are no concerns about ethical or regulatory requirements.

--------------------

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Pedro Emmanuel Alvarenga Americano do Brasil
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org.

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example, see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols

Submitted filename: PNTD-D-21-01555.pdf

4 Feb 2022
Submitted filename: carta_revisor.docx
25 Mar 2022

Dear Ferreira,

We are pleased to inform you that your manuscript 'Two-year death prediction models among patients with Chagas Disease using machine learning-based methods' has been provisionally accepted for publication in PLOS Neglected Tropical Diseases. Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow up email. A member of our team will be in touch with a set of requests. Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,
Alberto Novaes Ramos Jr, Associate Editor, PLOS Neglected Tropical Diseases
Bruce Lee, Deputy Editor, PLOS Neglected Tropical Diseases

***********************************************************

Reviewer's Responses to Questions

Key Review Criteria Required for Acceptance? As you describe the new analyses required for acceptance, please consider the following:

Methods
-Are the objectives of the study clearly articulated with a clear testable hypothesis stated?
-Is the study design appropriate to address the stated objectives?
-Is the population clearly described and appropriate for the hypothesis being tested?
-Is the sample size sufficient to ensure adequate power to address the hypothesis being tested?
-Were correct statistical analyses used to support conclusions?
-Are there concerns about ethical or regulatory requirements being met?

Reviewer #1: (No Response)
Reviewer #2: (No Response)
Reviewer #3: The authors present a well-structured study, with an adequate presentation of results and a detailed statistical analysis.

**********

Results
-Does the analysis presented match the analysis plan?
-Are the results clearly and completely presented?
-Are the figures (Tables, Images) of sufficient quality for clarity?

Reviewer #1: (No Response)
Reviewer #2: (No Response)
Reviewer #3: The authors present a well-structured study, with an adequate presentation of results and a very detailed statistical analysis.

**********

Conclusions
-Are the conclusions supported by the data presented?
-Are the limitations of analysis clearly described?
-Do the authors discuss how these data can be helpful to advance our understanding of the topic under study?
-Is public health relevance addressed?

Reviewer #1: (No Response)
Reviewer #2: (No Response)
Reviewer #3: Several modifications have been made, with a significant improvement in the quality of the manuscript.

**********

Editorial and Data Presentation Modifications? Use this section for editorial suggestions as well as relatively minor modifications of existing data that would enhance clarity. If the only modifications needed are minor and/or editorial, you may wish to recommend "Minor Revision" or "Accept".

Reviewer #1: (No Response)
Reviewer #2: (No Response)
Reviewer #3: Several modifications have been made, with a significant improvement in the quality of the manuscript.
**********

Summary and General Comments: Use this section to provide overall comments, discuss strengths/weaknesses of the study, novelty, significance, general execution and scholarship. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. If requesting major revision, please articulate the new experiments that are needed.

Reviewer #1: The authors stated that the cohort "had a cardiac abnormality compatible with CD" and was thus considered eligible. However, the follow-up time is not yet discussed. There are two major points: (1) age may be considered a proxy for time of disease, as one may assume that most if not all subjects were infected in childhood; thus, different ages at the beginning of the follow-up may indicate different opportunities for Chagas disease progression, and this should be (and already is) adjusted for accordingly; (2) it is important to have an idea of the cardiac function severity at the beginning of follow-up (LV ejection fraction), through echocardiography data or risk group data (A, B, C or D), as it has consistently been shown to be a strong predictor in past research. This is important because the more severe cases are likely to progress to death faster. Of course, the BNP peptide is an indicator of this phenomenon, but it is not as well established as LV EF, and there is no discussion regarding this correspondence in this report.

Regarding the predictor selection strategy, the authors seem to have chosen fewer predictors, 10, out of the 45 available. ML models usually choose predictors automatically from a full set by shrinking the weight of a weak predictor toward zero and keeping the strongest predictors, no matter how many there are. However, the analyst usually has a low degree of control over this process, often called the black-box phenomenon.
Therefore, the statement that "In all cases the configuration with 10 variables obtained the best results in terms of GMean and therefore was the chosen configuration" is a bit awkward. This statement may be true; however, it implies that there was a prior predictor selection (10 out of 45) before fitting the model. If this is true, what was the rationale for choosing this particular set of ten?

"We believe that it is not necessary to show the performance estimates for this set, as this is not a common practice in this type of study and because we believe that this information will not add new knowledge to the discussion." The authors may choose not to estimate the model performance on all sets; however, I disagree with this statement. By showing that the performance is similar in the validation and test sets, the authors could show that performance carries over to new predictions and that the likelihood of model optimism is low. Additionally, from my point of view, it would not do any harm to add performances from all sets.

Reviewer #2: (No Response)

Reviewer #3: The authors present a manuscript of relevant interest. The subject falls within the scope of the journal. The paper is well written and contains useful information. Several modifications have been made, with a significant improvement in the quality of the manuscript. The bibliography is pertinent and current. Therefore, I recommend the approval of the manuscript for publication in this version.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose "no", your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: Yes: Pedro Emmanuel Alvarenga Americano do Brasil
Reviewer #2: No
Reviewer #3: No

8 Apr 2022

Dear Dr. Ferreira,

We are delighted to inform you that your manuscript, "Two-year death prediction models among patients with Chagas Disease using machine learning-based methods," has been formally accepted for publication in PLOS Neglected Tropical Diseases. We have now passed your article onto the PLOS Production Department, who will complete the rest of the publication process. All authors will receive a confirmation email upon publication.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any scientific or type-setting errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Note: Proofs for Front Matter articles (Editorial, Viewpoint, Symposium, Review, etc.) are generated on a different schedule and may not be made available as quickly.

Soon after your final files are uploaded, the early version of your manuscript will be published online unless you opted out of this process. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Neglected Tropical Diseases.

Best regards,
Shaden Kamhawi, co-Editor-in-Chief, PLOS Neglected Tropical Diseases
Paul Brindley, co-Editor-in-Chief, PLOS Neglected Tropical Diseases
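The correspondence above repeatedly weighs model performance by the G-mean reported in the abstract (the geometric mean of sensitivity and specificity). As a minimal illustrative sketch, not the authors' code: the 134 deaths among 1,694 patients come from the study, while the confusion-matrix splits below are hypothetical.

```python
import math

def g_mean(tp: int, fn: int, tn: int, fp: int) -> float:
    """Geometric mean of sensitivity and specificity from a confusion matrix."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on deaths
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on survivors
    return math.sqrt(sensitivity * specificity)

# A classifier that labels every patient "survives":
# 0 true positives, 134 false negatives, 1560 true negatives, 0 false positives.
naive = g_mean(tp=0, fn=134, tn=1560, fp=0)      # 0.0, despite ~92% accuracy

# A hypothetical model with ~70% sensitivity and 85% specificity.
balanced = g_mean(tp=94, fn=40, tn=1326, fp=234)  # ~0.77
```

Because the outcome is rare (~7.9%), a model that predicts "survives" for everyone reaches ~92% accuracy yet a G-mean of zero, which is why the paper reports G-mean alongside sensitivity and specificity rather than accuracy.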
Related articles: 24 in total

1.  Symbolic features and classification via support vector machine for predicting death in patients with Chagas disease.

Authors:  Cristina C R Sady; Antonio Luiz P Ribeiro
Journal:  Comput Biol Med       Date:  2016-01-22       Impact factor: 4.589

2.  Comparison of estimates for the self-reported chronic conditions among household survey and telephone survey--Campinas (SP), Brazil.

Authors:  Priscila Maria Stolses Bergamo Francisco; Marilisa Berti de Azevedo Barros; Neuber José Segri; Maria Cecília Goi Porto Alves; Chester Luiz Galvão Cesar; Deborah Carvalho Malta
Journal:  Rev Bras Epidemiol       Date:  2011-09

3.  Mapping of capacities for research on health and its social determinants in Brazil.

Authors:  Elis Borde; Marco Akerman; Alberto Pellegrini Filho
Journal:  Cad Saude Publica       Date:  2014-10       Impact factor: 1.632

4.  Burden of Chagas disease in Brazil, 1990-2016: findings from the Global Burden of Disease Study 2016.

Authors:  Francisco Rogerlândio Martins-Melo; Mariângela Carneiro; Antonio Luiz Pinho Ribeiro; Juliana Maria Trindade Bezerra; Guilherme Loureiro Werneck
Journal:  Int J Parasitol       Date:  2019-02-08       Impact factor: 3.981

5.  Development and validation of a risk score for predicting death in Chagas' heart disease.

Authors:  Anis Rassi; Anis Rassi; William C Little; Sérgio S Xavier; Sérgio G Rassi; Alexandre G Rassi; Gustavo G Rassi; Alejandro Hasslocher-Moreno; Andrea S Sousa; Maurício I Scanavacca
Journal:  N Engl J Med       Date:  2006-08-24       Impact factor: 91.245

Review 6.  Chagas Cardiomyopathy: An Update of Current Clinical Knowledge and Management: A Scientific Statement From the American Heart Association.

Authors:  Maria Carmo Pereira Nunes; Andrea Beaton; Harry Acquatella; Caryn Bern; Ann F Bolger; Luis E Echeverría; Walderez O Dutra; Joaquim Gascon; Carlos A Morillo; Jamary Oliveira-Filho; Antonio Luiz Pinho Ribeiro; Jose Antonio Marin-Neto
Journal:  Circulation       Date:  2018-09-18       Impact factor: 29.690

Review 7.  Diagnosis and management of Chagas disease and cardiomyopathy.

Authors:  Antonio L Ribeiro; Maria P Nunes; Mauro M Teixeira; Manoel O C Rocha
Journal:  Nat Rev Cardiol       Date:  2012-07-31       Impact factor: 32.419

Review 8.  Chagas disease: an overview of clinical and epidemiological aspects.

Authors:  Maria Carmo Pereira Nunes; Wistremundo Dones; Carlos A Morillo; Juan Justiniano Encina; Antônio Luiz Ribeiro
Journal:  J Am Coll Cardiol       Date:  2013-06-13       Impact factor: 24.094

9.  Ten-year incidence of Chagas cardiomyopathy among asymptomatic Trypanosoma cruzi-seropositive former blood donors.

Authors:  Ester C Sabino; Antonio L Ribeiro; Vera M C Salemi; Claudia Di Lorenzo Oliveira; Andre P Antunes; Marcia M Menezes; Barbara M Ianni; Luciano Nastari; Fabio Fernandes; Giuseppina M Patavino; Vandana Sachdev; Ligia Capuani; Cesar de Almeida-Neto; Danielle M Carrick; David Wright; Katherine Kavounis; Thelma T Goncalez; Anna Barbara Carneiro-Proietti; Brian Custer; Michael P Busch; Edward L Murphy
Journal:  Circulation       Date:  2013-02-07       Impact factor: 29.690

10.  Challenges in the care of patients with Chagas disease in the Brazilian public health system: A qualitative study with primary health care doctors.

Authors:  Renata Fiúza Damasceno; Ester Cerdeira Sabino; Ariela Mota Ferreira; Antonio Luiz Pinho Ribeiro; Hugo Fonseca Moreira; Thalita Emily Cezário Prates; Cristina Andrade Sampaio; Desirée Sant Ana Haikal
Journal:  PLoS Negl Trop Dis       Date:  2020-11-09
