| Literature DB >> 36240212 |
Sara Domínguez-Rodríguez1, Miquel Serna-Pascual1, Andrea Oletto2, Shaun Barnabas3, Peter Zuidewind3, Els Dobbels3, Siva Danaviah4, Osee Behuhuma4, Maria Grazia Lain5, Paula Vaz5, Sheila Fernández-Luis6,7, Tacilta Nhampossa7, Elisa Lopez-Varela7, Kennedy Otwombe8, Afaaf Liberty8, Avy Violari8, Almoustapha Issiaka Maiga9, Paolo Rossi10, Carlo Giaquinto11, Louise Kuhn12, Pablo Rojo1, Alfredo Tagarro1,13,14.
Abstract
Logistic regression (LR) is the most common prediction model in medicine. In recent years, supervised machine learning (ML) methods have gained popularity, but concerns remain about their utility for small sample sizes. In this study, we compare the performance of seven algorithms in predicting 1-year mortality and clinical progression to AIDS in a small cohort of infants living with HIV from South Africa and Mozambique. The data set (n = 100) was randomly split into a 70% training set and a 30% validation set. Seven algorithms (LR, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Naïve Bayes (NB), Artificial Neural Network (ANN), and Elastic Net) were compared. The predictors were the same across all models and included sociodemographic, virologic, immunologic, and maternal status features. For each model, hyperparameters were tuned by 5-times-repeated 10-fold cross-validation. A confusion matrix was built to assess accuracy, sensitivity, and specificity. RF ranked as the best algorithm in terms of accuracy (82.8%), sensitivity (78%), and AUC (0.73). RF also showed better sensitivity and specificity than the other algorithms in the external validation, as well as the highest AUC. LR performed worse than RF, SVM, or KNN. The outcome of children living with perinatally acquired HIV can be predicted with considerable accuracy using ML algorithms. Better models could help less specialized staff in resource-limited countries to promptly refer children at high risk of clinical progression.
Year: 2022 PMID: 36240212 PMCID: PMC9565414 DOI: 10.1371/journal.pone.0276116
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
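The evaluation pipeline described in the abstract (random 70/30 split, model fitting, and accuracy/sensitivity/specificity from a confusion matrix) can be sketched as follows. This is a minimal illustration on synthetic data, since the cohort itself (n = 100) is not part of this record; the feature count, random seeds, and random-forest settings are assumptions, not the study's.

```python
# Sketch of the study's evaluation protocol on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 12))                        # 100 infants, 12 predictors (assumed)
y = (X[:, 0] + rng.normal(size=100) > 0).astype(int)  # 1 = death/progression

# 70% training / 30% validation split, as in the study
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
pred = clf.predict(X_val)

# Accuracy, sensitivity and specificity read off the confusion matrix
tn, fp, fn, tp = confusion_matrix(y_val, pred).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
auc_val = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
print(round(accuracy, 2), round(sensitivity, 2), round(specificity, 2))
```

The same split and metrics apply to each of the seven algorithms; only the estimator changes.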
Feature distribution according to the different data sets. Values are median [IQR] or n (%). Several variable labels were lost in extraction; their rows are left unlabeled.

| Feature | Training set | Testing set | p-value |
|---|---|---|---|
| | 36.0 [29.6;69] | 41.0 [30;89.1] | 0.316 |
| Sex | | | 0.416 |
| Female | 35 (49.3%) | 11 (37.9%) | |
| Male | 36 (50.7%) | 18 (62.1%) | |
| | -1.46 [-2.62;-0.87] | -1.18 [-2.98;-0.30] | 0.805 |
| | | | 0.140 |
| No | 41 (57.7%) | 22 (75.9%) | |
| Yes | 30 (42.3%) | 7 (24.1%) | |
| | 30.0 [0.00;35.5] | 31.0 [0.00;50.0] | 0.563 |
| | 32.0 [18.5;62.5] | 36.0 [23.0;82.0] | 0.195 |
| ART regimen | | | 0.558 |
| 3TC+ABC+LPVr | 33 (46.5%) | 14 (48.3%) | |
| 3TC+ABC+NVP | 0 (0.00%) | 1 (3.45%) | |
| 3TC+AZT+LPVr | 22 (31.0%) | 8 (27.6%) | |
| 3TC+AZT+NVP | 16 (22.5%) | 6 (20.7%) | |
| | 609715 [36738;2570245] | 226844 [36295;1344319] | 0.350 |
| | 36.9 [29.9;45.2] | 40.0 [28.0;47.0] | 0.587 |
| | | | 1.000 |
| No | 34 (47.9%) | 14 (48.3%) | |
| Yes | 37 (52.1%) | 15 (51.7%) | |
| Maternal adherence | | | 0.589 |
| Poor | 3 (4.23%) | 3 (10.3%) | |
| Intermediate low | 12 (16.9%) | 4 (13.8%) | |
| Intermediate high | 18 (25.4%) | 5 (17.2%) | |
| Good | 38 (53.5%) | 17 (58.6%) | |
ART: Antiretroviral therapy; 3TC: Lamivudine; ABC: Abacavir; LPVr: Lopinavir boosted with ritonavir; NVP: Nevirapine; Maternal severe life events: change in employment, separation or relationship break-up, new partner, loss of home or move, or death in the family; Maternal adherence (Optimal: no ART dose missed; Intermediate low: 10–50% of doses missed; Intermediate high: 50–90%; Good: >90%).
Algorithm tuning parameters.
| Algorithm | Tuning parameter |
|---|---|
| Logistic regression | - |
| Random forest | mtry = 12 |
| Support Vector Machine | C = 8; sigma = 4.69·10⁻¹¹ |
| Naïve Bayes | fL = 0; adjust = 1 |
| K-nearest neighbor | K = 5 |
| Artificial Neural Network | Size = 11; decay = 0.1 |
| GLMNET | Alpha = 0.8; lambda = 0.21 |
Algorithm tuning parameters selected by repeated (5 times) 10-fold cross-validation over a grid. Mtry: number of variables considered for splitting at each tree node in a random forest; C: regularization parameter that controls the trade-off between achieving a low training error and a low testing error; sigma: determines how fast the similarity metric goes to zero as points move further apart; fL: Laplace smoother; adjust: adjusts the bandwidth of the kernel density; K: number of nearest neighbours; size: number of units in the hidden layer; decay: regularization parameter to avoid over-fitting; alpha: regularization mixing parameter; lambda: penalty on the coefficients.
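The tuning procedure above (a grid search scored by 5-times-repeated 10-fold cross-validation on the training set) can be sketched with scikit-learn; the paper's fL/adjust parameter names suggest it used R's caret, so this is an analogue, not the original code. The grid values, data, and scoring metric here are assumptions, except that `max_features` plays the role of mtry for the random forest.

```python
# Sketch: selecting a random-forest mtry (max_features) by 5x-repeated
# 10-fold cross-validation over a grid, on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(70, 20))            # training set only (70 of 100)
y = (X[:, 0] + rng.normal(size=70) > 0).astype(int)

cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=5, random_state=1)
grid = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=1),
    param_grid={"max_features": [2, 6, 12, 20]},  # grid including mtry = 12
    cv=cv,
    scoring="roc_auc",
)
grid.fit(X, y)
print(grid.best_params_)
```

Each candidate value is fitted 50 times (10 folds × 5 repeats), and the value with the best mean cross-validated score is retained, which is how stable hyperparameters can be chosen despite the small sample.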
Fig 1. Algorithm performance in the validation set.
Fig 2. Algorithm receiver operating characteristic (ROC) curves in the validation set.
Fig 3. Probability of death/progression according to each algorithm in the validation set.
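Figures of this kind are typically derived from each model's per-child predicted probability of death/progression, from which the ROC curve is traced. A minimal sketch, using a logistic regression on synthetic stand-in data (the model, data, and split here are assumptions):

```python
# Sketch: per-subject event probabilities and the resulting ROC curve/AUC.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, auc

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

model = LogisticRegression().fit(X[:70], y[:70])
proba = model.predict_proba(X[70:])[:, 1]   # probability of death/progression

fpr, tpr, _ = roc_curve(y[70:], proba)      # points tracing the ROC curve
roc_auc = auc(fpr, tpr)
print(round(roc_auc, 2))
```

Plotting `fpr` against `tpr` gives a curve like those in Fig 2, and the distribution of `proba` by observed outcome corresponds to Fig 3.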