Iain G Johnston, Till Hoffmann, Nick S Jones, Climent Casals-Pascual, Sam F Greenbury, Ornella Cominetti, Muminatou Jallow, Dominic Kwiatkowski, Mauricio Barahona.
Abstract
More than 400,000 deaths from severe malaria (SM) are reported every year, mainly in African children. The diversity of clinical presentations associated with SM indicates important differences in disease pathogenesis that require specific treatment, and this clinical heterogeneity of SM remains poorly understood. Here, we apply tools from machine learning and model-based inference to harness large-scale data and dissect the heterogeneity in patterns of clinical features associated with SM in 2904 Gambian children admitted to hospital with malaria. This quantitative analysis reveals features predicting the severity of individual patient outcomes, and the dynamic pathways of SM progression, notably inferred without requiring longitudinal observations. Bayesian inference of these pathways allows us to assign quantitative mortality risks to individual patients. By independently surveying expert practitioners, we show that this data-driven approach agrees with and expands the current state of knowledge on malaria progression, while simultaneously providing a data-supported framework for predicting clinical risk.
Keywords: Applied mathematics; Developing world; Malaria
Year: 2019 PMID: 31312723 PMCID: PMC6620311 DOI: 10.1038/s41746-019-0140-y
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Mutual information approach to identify features predicting mortality. At each level (horizontal axis), patient data are greedily split into two subsets according to the remaining feature that most strongly predicts mortality. The algorithm stops when no remaining feature is statistically significantly associated with death. The figure shows a tree generated by this algorithm: cerebral malaria (CM), respiratory distress (RD), splenomegaly (SP), abnormal posturing (PO) and transfusion (TF) are selected as informative features. Nodes are shown as pie charts representing the composition of WHO classifications in each cluster. Solid (dashed) edges indicate that the feature was present (absent), and their width is proportional to the number of patients. The vertical axis gives the log odds ratio (LOR) of mortality in each cluster relative to the average mortality. Partition 8 has an infinite LOR because all of its patients survive.
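The greedy splitting procedure described for Fig. 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the significance test is an assumption (a G-test, using the fact that G = 2·n·MI in nats is asymptotically chi-squared with one degree of freedom for a 2×2 table), as are the threshold `alpha = 0.05`, the absence of any multiple-testing correction, and the helper names `mutual_information`, `g_test_p_value`, `log_odds_ratio` and `build_tree`.

```python
import math
from typing import Dict, List


def mutual_information(xs: List[int], ys: List[int]) -> float:
    """Mutual information (in nats) between two binary variables."""
    n = len(xs)
    joint: Dict[tuple, int] = {}
    for x, y in zip(xs, ys):
        joint[(x, y)] = joint.get((x, y), 0) + 1
    px = {v: sum(c for (a, _), c in joint.items() if a == v) / n for v in (0, 1)}
    py = {v: sum(c for (_, b), c in joint.items() if b == v) / n for v in (0, 1)}
    # Only observed (x, y) cells contribute, so every logarithm is finite.
    return sum((c / n) * math.log((c / n) / (px[x] * py[y])) for (x, y), c in joint.items())


def g_test_p_value(mi: float, n: int) -> float:
    """Assumed test: G = 2*n*MI ~ chi-squared(1), so P(G > g) = erfc(sqrt(g / 2))."""
    g = max(2.0 * n * mi, 0.0)  # guard against tiny negative rounding error
    return math.erfc(math.sqrt(g / 2.0))


def log_odds_ratio(p_subset: float, p_overall: float) -> float:
    """Log odds of death in a cluster relative to the overall mortality (0 < p_overall < 1)."""
    if p_subset == 0.0:
        return -math.inf  # nobody in the cluster died
    if p_subset == 1.0:
        return math.inf  # everybody in the cluster died

    def logit(p: float) -> float:
        return math.log(p / (1.0 - p))

    return logit(p_subset) - logit(p_overall)


def build_tree(patients: List[Dict[str, int]], died: List[int],
               features: List[str], overall: float, alpha: float = 0.05) -> dict:
    """Greedily split patients on the most informative remaining binary feature."""
    node = {"n": len(patients), "deaths": sum(died),
            "lor": log_odds_ratio(sum(died) / len(patients), overall)}
    if not features:
        return node
    # Choose the remaining feature with the highest mutual information with death.
    scores = {f: mutual_information([p[f] for p in patients], died) for f in features}
    best = max(scores, key=scores.get)
    p_value = g_test_p_value(scores[best], len(patients))
    if p_value >= alpha:
        return node  # stop: no feature is significantly associated with death
    rest = [f for f in features if f != best]
    for value, branch in ((1, "present"), (0, "absent")):
        idx = [i for i, p in enumerate(patients) if p[best] == value]
        node[branch] = build_tree([patients[i] for i in idx],
                                  [died[i] for i in idx], rest, overall, alpha)
    node["feature"], node["p_value"] = best, p_value
    return node


# Toy data (invented for illustration): "CM" determines death, "SP" is noise.
patients = [{"CM": int(i < 20), "SP": i % 2} for i in range(40)]
died = [p["CM"] for p in patients]
tree = build_tree(patients, died, ["CM", "SP"], overall=sum(died) / len(died))
```

On the toy data the root splits on `CM`, and both children stop immediately because death is constant within each of them; the child with no deaths gets an LOR of minus infinity, mirroring the infinite LOR reported for partition 8.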
Fig. 2 Inferring the pathways of malarial disease progression with HyperTraPS. a The HyperTraPS algorithm (see text) was used to infer the order in which malarial symptoms are likely to be acquired across patients. The horizontal axis records symptoms; the vertical axis records ordering from low (early acquisition) to high (late acquisition). In the lower part of the figure, this ordering axis is grouped into seven broader “ordering windows” to display general trends alongside specific features of the dynamics. The size of a semicircle denotes the posterior probability that a given symptom is acquired at a given ordering in the progression of malaria. Red semicircles are posteriors inferred from patients who died; blue semicircles are posteriors inferred from patients who survived. Highlighted symptoms show a greater Kolmogorov–Smirnov distance between the survival and death posteriors than between either posterior and the uninformative prior, making them potentially diagnostic features. b Posterior distributions on ordering for three features that differentiate between patients who eventually die and those who eventually survive, and for one feature that does not.
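The highlighting criterion in Fig. 2 compares Kolmogorov–Smirnov (KS) distances between discrete posteriors over ordering positions. A minimal sketch follows, assuming each posterior is a list of probabilities over the same ordered positions and that the uninformative prior is uniform over those positions; the function names are illustrative.

```python
from itertools import accumulate
from typing import Sequence


def ks_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """KS distance: largest gap between the two cumulative distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same ordering positions")
    return max(abs(a - b) for a, b in zip(accumulate(p), accumulate(q)))


def is_discriminating(post_survived: Sequence[float], post_died: Sequence[float]) -> bool:
    """True if survival and death posteriors differ more than either differs from the prior."""
    n = len(post_survived)
    prior = [1.0 / n] * n  # assumed uniform (uninformative) prior over orderings
    between = ks_distance(post_survived, post_died)
    return between > max(ks_distance(post_survived, prior),
                         ks_distance(post_died, prior))


# Invented posteriors over four ordering positions.
early = [0.85, 0.05, 0.05, 0.05]  # symptom typically acquired early
late = [0.05, 0.05, 0.05, 0.85]   # symptom typically acquired late
```

Here `is_discriminating(early, late)` is true (a KS distance of about 0.8 against 0.6 to the prior for each), whereas identical posteriors are never flagged.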
Fig. 3 Comparison of inferred disease progression with an expert survey. The horizontal axis gives the mean ordering of each symptom’s acquisition from the HyperTraPS inference; the vertical axis gives the mean ordering of that symptom from a survey of expert opinion (see text). The size of each circle is proportional to how often that feature is “present” when observed in the dataset: small circles denote rarely observed features, large circles commonly observed ones. Vertical error bars show the standard deviation of expert responses for each feature, illustrating the substantial range of opinion among the surveyed experts.
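The quantities plotted in Fig. 3 can be computed roughly as below. This is a sketch under assumptions: ordering positions are indexed from 1, the error bars use the sample standard deviation (the caption does not say which), and the Pearson correlation is offered only as one natural summary of agreement, not a statistic stated in the caption.

```python
import math
import statistics
from typing import Sequence, Tuple


def posterior_mean_ordering(posterior: Sequence[float]) -> float:
    """Mean ordering of a symptom, assuming positions are numbered 1..L."""
    total = sum(posterior)
    return sum((k + 1) * p for k, p in enumerate(posterior)) / total


def expert_summary(responses: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of expert orderings for one symptom."""
    return statistics.mean(responses), statistics.stdev(responses)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between inferred and expert mean orderings."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)
```

For example, a posterior of `[0.5, 0.5, 0.0, 0.0]` has mean ordering 1.5, and two expert responses of 2 and 4 give a mean of 3 with a standard deviation of √2.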
Fig. 4 Prediction and validation of hidden patient symptoms using HyperTraPS. a Rows correspond to an illustrative subset of individual patients; columns give different observed symptoms. Upward triangles denote feature presence, downward triangles denote absence. A random subset of features was artificially hidden, and the prediction algorithm using HyperTraPS posteriors described in the text was then applied to predict the presence or absence of these features from the remaining features (small grey triangles). Blue triangles denote correct predictions; red triangles denote incorrect predictions; large grey triangles mark instances where no strong prediction was available. Overall, 83% of predictions (1104 of 1330) were accurate. b Illustration of predicting likely next steps in disease progression for a given patient. Starting from any given patient state, HyperTraPS posteriors give the probability that each symptom is the next to be acquired by that patient. Circles represent the probability that each symptom will be acquired next in two cases: a patient with no symptoms, which illustrates the agreement with Fig. 2, and a real patient taken from the dataset. In both cases the four most likely next symptoms are listed on the right.
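Panel b of Fig. 4 asks for the probability that each symptom is acquired next from a given state. The sketch below assumes a HyperTraPS-style log-linear parameterisation (not spelled out in this record): an unacquired feature i has weight exp(Π[i][i] + Σ over acquired j of Π[i][j]), weights are normalised over unacquired features, and probabilities are averaged over posterior samples of Π. The matrices, symptom names and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def next_step_probs(pi: Matrix, state: Sequence[int]) -> List[float]:
    """Next-acquisition probabilities from a binary state under one parameter sample."""
    weights = []
    for i, present in enumerate(state):
        if present:
            weights.append(0.0)  # already acquired, cannot be acquired again
        else:
            log_rate = pi[i][i] + sum(pi[i][j] for j, s in enumerate(state) if s)
            weights.append(math.exp(log_rate))
    total = sum(weights)
    return [w / total for w in weights] if total else [0.0] * len(state)


def posterior_next_step(samples: Sequence[Matrix], state: Sequence[int]) -> List[float]:
    """Average next-step probabilities over posterior samples of the parameters."""
    per_sample = [next_step_probs(pi, state) for pi in samples]
    return [sum(col) / len(per_sample) for col in zip(*per_sample)]


def top_k(probs: Sequence[float], names: Sequence[str], k: int = 4) -> List[Tuple[str, float]]:
    """The k most likely next symptoms (ties keep their original order)."""
    ranked = sorted(zip(names, probs), key=lambda t: t[1], reverse=True)
    return [(n, p) for n, p in ranked[:k] if p > 0]


# Two invented posterior samples over four symptoms; in the second, feature 0
# doubles the weight of feature 1.
flat = [[0.0] * 4 for _ in range(4)]
promoted = [row[:] for row in flat]
promoted[1][0] = math.log(2.0)
names = ["CM", "RD", "SP", "PO"]
probs = posterior_next_step([flat, promoted], [1, 0, 0, 0])
```

With feature 0 already present, `probs` is roughly `[0, 0.417, 0.292, 0.292]`, so `top_k(probs, names, k=2)` ranks RD first.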
Fig. 5 Bayesian classification of patient risk. a Pipeline for applying Bayes’ theorem and simulation to the learned dynamics of patients who survived and patients who died, in order to classify the risk of new patients. b A test dataset of 50 patients who died and 50 patients who survived was analysed using posterior distributions for disease progression pathways derived from a separate training dataset. Numbers give the likelihood ratio of a given patient being on a high-risk trajectory versus a low-risk trajectory, which is used to classify patients into high- and low-risk classes. Blue numbers show where this classification agrees with the true patient outcome; red numbers show where it does not; dashes indicate cases where no classification was available. Bars show the proportions of correct (blue) and incorrect (red) classifications. Overall, 81% of classifications (57 of 70) were correct; the false positive rate for identifying high-risk patients (i) is 20%, and the false negative rate (ii) is 6%.
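The likelihood-ratio classification in Fig. 5 can be sketched as below, under several labelled assumptions. The same hypothetical HyperTraPS-style log-linear weights are used for each outcome model. Where the caption describes simulation, this sketch swaps in exact enumeration over subsets, which is feasible only for a handful of features. The abstention rule (`threshold`) and all function names are hypothetical.

```python
import math
from functools import lru_cache
from typing import FrozenSet, Iterable, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def step_prob(pi: Matrix, acquired: FrozenSet[int], i: int) -> float:
    """P(feature i is acquired next | acquired set), assumed log-linear weights."""
    weights = {k: math.exp(pi[k][k] + sum(pi[k][j] for j in acquired))
               for k in range(len(pi)) if k not in acquired}
    return weights[i] / sum(weights.values())


def path_probability(pi: Matrix, observed: Iterable[int]) -> float:
    """Exact probability that a trajectory passes through the observed state."""
    @lru_cache(maxsize=None)
    def reach(state: FrozenSet[int]) -> float:
        if not state:
            return 1.0
        # Sum over which feature was acquired last on the way into `state`.
        return sum(reach(state - {i}) * step_prob(pi, state - {i}, i) for i in state)
    return reach(frozenset(observed))


def likelihood_ratio(pi_high: Matrix, pi_low: Matrix, observed: Iterable[int]) -> float:
    """Likelihood of the observed state under high-risk versus low-risk dynamics."""
    obs = frozenset(observed)
    return path_probability(pi_high, obs) / path_probability(pi_low, obs)


def posterior_high_risk(lr: float, prior_high: float = 0.5) -> float:
    """Bayes' theorem in odds form: posterior odds = LR * prior odds."""
    odds = lr * prior_high / (1.0 - prior_high)
    return odds / (1.0 + odds)


def classify(lr: float, threshold: float = 2.0) -> Optional[str]:
    """Hypothetical rule: abstain when 1/threshold < LR < threshold."""
    if lr >= threshold:
        return "high"
    if lr <= 1.0 / threshold:
        return "low"
    return None  # no classification available


def accuracy(predicted: List[Optional[str]], truth: List[str]) -> float:
    """Fraction correct among patients that received a classification."""
    pairs = [(p, t) for p, t in zip(predicted, truth) if p is not None]
    return sum(p == t for p, t in pairs) / len(pairs)


# Invented two-feature models: high-risk dynamics make feature 1 appear first
# with probability 0.9; low-risk dynamics are indifferent.
pi_high = [[0.0, 0.0], [0.0, math.log(9.0)]]
pi_low = [[0.0, 0.0], [0.0, 0.0]]
lr = likelihood_ratio(pi_high, pi_low, {1})
```

Here `lr` is about 1.8, which falls inside the default abstention band but would be classified as high risk with `threshold=1.5`. Applied to the caption's totals, the same accuracy definition gives 57/70 ≈ 0.81.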