Sam F Greenbury1,2, Kayleigh Ougham3, Jinyi Wu1,2, Cheryl Battersby3, Chris Gale3, Neena Modi3, Elsa D Angelini4,5.
Abstract
We used agnostic, unsupervised machine learning to cluster a large clinical database of information on infants admitted to neonatal units in England. Our aim was to obtain insights into nutritional practice, an area of central importance in newborn care, utilising the UK National Neonatal Research Database (NNRD). We performed clustering on time-series data of daily nutritional intakes for very preterm infants born at a gestational age less than 32 weeks (n = 45,679) over a six-year period. This revealed 46 nutritional clusters heterogeneous in size, showing common interpretable clinical practices alongside rarer approaches. Nutritional clusters with similar admission profiles revealed associations between nutritional practice, geographical location and outcomes. We show how nutritional subgroups may be regarded as distinct interventions and tested for associations with measurable outcomes. We illustrate the potential for identifying relationships between nutritional practice and outcomes with two examples, discharge weight and bronchopulmonary dysplasia (BPD). We identify the well-known effect of formula milk on greater discharge weight as well as support for the plausible, but insufficiently evidenced view that human milk is protective against BPD. Our framework highlights the potential of agnostic machine learning approaches to deliver clinical practice insights and generate hypotheses using routine data.
Year: 2021 PMID: 33785776 PMCID: PMC8009880 DOI: 10.1038/s41598-021-85878-z
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Receipt of each nutritional component stratified by gestational age.
| Gestational age in weeks (number of infants) | MM (%) | HDM (%) | BMF (%) | FM (%) | PN (%) | GE (%) | Survival (%) |
|---|---|---|---|---|---|---|---|
| 22 (55) | 47 | 20 | 22 | 22 | 76 | 95 | 27 |
| 23 (1,217) | 72 | 27 | 39 | 43 | 88 | 95 | 45 |
| 24 (2,277) | 84 | 34 | 53 | 59 | 94 | 97 | 65 |
| 25 (2,607) | 89 | 35 | 62 | 68 | 96 | 96 | 79 |
| 26 (3,304) | 92 | 36 | 63 | 71 | 96 | 96 | 86 |
| 27 (4,239) | 93 | 35 | 61 | 76 | 97 | 95 | 92 |
| 28 (5,599) | 94 | 32 | 57 | 76 | 96 | 95 | 93 |
| 29 (6,555) | 94 | 30 | 48 | 77 | 92 | 93 | 97 |
| 30 (8,503) | 93 | 24 | 39 | 79 | 72 | 93 | 98 |
| 31 (11,323) | 91 | 18 | 27 | 82 | 51 | 94 | 98 |
| Total (45,679) | 91 | 27 | 45 | 76 | 80 | 94 | 92 |
Proportions of infants who received each component on at least one day are reported along with survival rates. Components are abbreviated as MM: Maternal milk, HDM: Human Donor Milk, BMF: Breast Milk Fortifier, FM: Formula Milk, PN: Parenteral Nutrition, GE: Glucose Electrolyte solution.
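The "Total" row can be cross-checked against the per-gestational-age rows with a count-weighted mean. The following is a minimal sketch, assuming only the counts and the MM and survival percentages transcribed from the table above; the function name `pooled_percentage` is illustrative and not from the paper. Because the per-row percentages are already rounded to integers, the pooled value is an approximation of the true cohort figure.

```python
# Rows transcribed from the table above:
# gestational age (weeks) -> (number of infants, MM %, survival %)
rows = {
    22: (55, 47, 27),
    23: (1217, 72, 45),
    24: (2277, 84, 65),
    25: (2607, 89, 79),
    26: (3304, 92, 86),
    27: (4239, 93, 92),
    28: (5599, 94, 93),
    29: (6555, 94, 97),
    30: (8503, 93, 98),
    31: (11323, 91, 98),
}

def pooled_percentage(rows, column):
    """Count-weighted mean of a percentage column (1 = MM, 2 = survival)."""
    total = sum(r[0] for r in rows.values())
    weighted = sum(r[0] * r[column] for r in rows.values())
    return weighted / total

n_total = sum(r[0] for r in rows.values())
mm_total = pooled_percentage(rows, 1)
survival_total = pooled_percentage(rows, 2)
print(n_total, round(mm_total), round(survival_total))
```

The counts sum to the cohort size of 45,679, and the pooled MM and survival percentages round to the 91% and 92% reported in the "Total" row.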
Figure 1. Time curves of the proportion of infants receiving a given nutritional component. Time is expressed as postmenstrual age in weeks. Curves are stratified and colour-coded by infant gestational age (GA) group, and computed on the set of infants still in care at a given postmenstrual age. The vertical grey dashed line indicates 36 weeks postmenstrual age. In (a–f) we show the proportion of infants still in care at a given postmenstrual age who received each of the six components (MM: Maternal Milk, HDM: Human Donor Milk, BMF: Breast Milk Fortifier, FM: Formula Milk, PN: Parenteral Nutrition, GE: Glucose Electrolyte solution). We only considered GA > 22 weeks because, due to the small sample size, values for GA = 22 weeks exhibited too much variation for meaningful interpretation.
Statistics on the within-cluster populations of the 10 most common nutritional patterns and the whole cohort.
| Property type | Variable | Whole cohort | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Cluster 8 | Cluster 9 | Cluster 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cluster size | N | 45,679 | 11,052 | 5,471 | 4,868 | 3,951 | 2,935 | 2,913 | 2,707 | 1,382 | 1,207 | 758 |
| | Proportion | | 24% | 12% | 11% | 9% | 6% | 6% | 6% | 3% | 3% | 2% |
| | Cumulative proportion | | 24% | 36% | 47% | 56% | 62% | 68% | 74% | 77% | 80% | 82% |
| Geographical location variables | North | 30% | 32% | 27% | 29% | 36% | 29% | 27% | 40% | 14% | 17% | 26% |
| | Midlands | 27% | 28% | 30% | 33% | 36% | 18% | 27% | 27% | 10% | 15% | 37% |
| | London | 20% | 19% | 18% | 17% | 11% | 22% | 24% | 16% | 45% | 36% | 27% |
| | South | 23% | 22% | 25% | 20% | 18% | 30% | 23% | 17% | 32% | 32% | 11% |
| Admission variables | Birth Year | 2015 | 2015 | 2015 | 2015 | 2015 | 2015 | 2014 | 2015 | 2015 | 2015 | 2015 |
| | Gestational Age (weeks) | 29 | 30 | 29 | 29 | 28 | 29 | 31 | 25 | 30 | 28 | 28 |
| | BW z-score | −0.23 | −0.09 | −0.22 | −0.31 | −0.41 | −0.36 | 0.27 | −0.35 | −0.24 | −0.37 | −0.20 |
| | Resuscitation | 8% | 6% | 8% | 7% | 9% | 9% | 2% | 17% | 5% | 9% | 12% |
| | Antenatal Steroids | 90% | 89% | 93% | 93% | 91% | 89% | 91% | 85% | 93% | 94% | 88% |
| | Sex (% girls) | 46% | 45% | 46% | 49% | 47% | 47% | 43% | 40% | 47% | 45% | 40% |
| | Apgar 1 min | 6 | 7 | 6 | 6 | 6 | 6 | 8 | 4 | 7 | 6 | 6 |
| | Apgar 5 min | 8 | 9 | 8 | 9 | 8 | 8 | 9 | 7 | 9 | 8 | 8 |
| | Apgar 10 min | 9 | 9 | 9 | 9 | 9 | 9 | 10 | 8 | 9 | 9 | 9 |
| Outcome variables | Mortality | 8% | 1% | 1% | 1% | 2% | 1% | 0% | 76% | 0% | 0% | 27% |
| | Necrotising Enterocolitis (NEC) | 3% | 2% | 0% | 0% | 3% | 2% | 0% | 11% | 0% | 0% | 34% |
| | Bronchopulmonary Dysplasia (BPD) | 27% | 20% | 26% | 30% | 46% | 36% | 2% | 11% | 15% | 36% | 40% |
| | Maternal Milk at Discharge | 54% | 34% | 99% | 90% | 27% | 8% | 93% | 32% | 89% | 98% | 30% |
| | Length of Stay (days) | 49 | 42 | 52 | 54 | 70 | 60 | 30 | 9 | 43 | 65 | 60 |
| | W36 z-score | −1.59 | −1.33 | −1.66 | −1.68 | −1.74 | −1.53 | −1.29 | −1.59 | −1.69 | −1.64 | −1.78 |
| | W36dz | −1.20 | −1.07 | −1.31 | −1.22 | −1.23 | −1.07 | −1.39 | −1.33 | −1.22 | −1.18 | −1.40 |
| Nutritional variables | Maternal Milk | 0.66 | 0.51 | 0.96 | 0.95 | 0.68 | 0.30 | 0.93 | 0.32 | 0.92 | 0.96 | 0.43 |
| | Human Donor Milk | 0.06 | 0.00 | 0.00 | 0.00 | 0.00 | 0.24 | 0.00 | 0.00 | 0.22 | 0.06 | 0.00 |
| | Breast Milk Fortifier | 0.14 | 0.00 | 0.30 | 0.35 | 0.22 | 0.00 | 0.00 | 0.00 | 0.00 | 0.44 | 0.00 |
| | Formula Milk | 0.38 | 0.67 | 0.00 | 0.27 | 0.45 | 0.69 | 0.40 | 0.00 | 0.44 | 0.01 | 0.44 |
| | Parenteral Nutrition | 0.21 | 0.17 | 0.18 | 0.13 | 0.20 | 0.19 | 0.00 | 0.74 | 0.14 | 0.14 | 0.59 |
| | Glucose Electrolyte | 0.17 | 0.12 | 0.10 | 0.10 | 0.12 | 0.11 | 0.17 | 0.57 | 0.11 | 0.08 | 0.49 |
Clusters are ranked according to population size (1 = largest cluster). All variables are reported as median values, except for proportions which are means. Proportions are reported as integer values, therefore 0% corresponds to < 0.5% and the sum across the geographical location variables may not be exactly 100%. Nutritional variables encode the nutritional patterns through the proportion of days a nutritional component was given over the total time in care (between 0 and 1).
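The encoding described above (proportion of days in care on which each component was given) can be sketched as follows. This is a minimal illustration, assuming each infant's record is available as one set of component codes per day; the function name `nutritional_pattern` and the 10-day stay are hypothetical and not taken from the NNRD.

```python
# Component codes as abbreviated in the tables above.
COMPONENTS = ("MM", "HDM", "BMF", "FM", "PN", "GE")

def nutritional_pattern(daily_records):
    """Return {component: proportion of days given}, each value in [0, 1].

    daily_records: list with one set per day in care, holding the
    component codes recorded on that day (assumed input layout).
    """
    n_days = len(daily_records)
    if n_days == 0:
        raise ValueError("infant has no days in care")
    return {
        c: sum(c in day for day in daily_records) / n_days
        for c in COMPONENTS
    }

# Hypothetical 10-day stay: PN with GE, then PN with MM, then fortified MM.
stay = [{"PN", "GE"}] * 2 + [{"PN", "MM"}] * 3 + [{"MM", "BMF"}] * 5
pattern = nutritional_pattern(stay)
print(pattern)
```

Each infant is thus reduced to a six-dimensional vector, which is the form in which the nutritional variables in the table are reported.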
Figure 2. Composition of the 10 most common nutritional patterns, ordered by size from largest to smallest. Nutritional patterns are encoded as the proportion of days over total time in care on which a nutritional component was given (0 to 1). (a) Histograms of the proportion of infants receiving a nutritional component (one per column) over a given proportion of days. Shading = one-dimensional histograms (11 bins, width = 0.1, including 0). Colour intensity = density of infants fed with that proportion over time in care; it is maximal if all infants fall into a single bin. Cross (“x”) = mean proportion of days averaged across all infants in the cluster. (b) Timeline of nutritional component administration over normalised length of stay (LoS; 20 equal bins), computed as average temporal sequences of nutritional events over time in care. The maximal bin value (and colour intensity) is attained if all infants in the cluster received a particular nutritional component every day in that time bin.
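The timeline in panel (b) can be approximated as follows. This is a sketch under stated assumptions: each infant contributes a daily 0/1 sequence for one component, every day that overlaps a bin counts fully towards that bin, and stays shorter than 20 days reuse the same day for adjacent bins. The paper does not specify these details, and the function names are illustrative.

```python
N_BINS = 20  # number of equal bins over the normalised length of stay

def normalise_timeline(daily_flags, n_bins=N_BINS):
    """Resample one infant's daily 0/1 sequence onto n_bins equal bins."""
    n_days = len(daily_flags)
    if n_days == 0:
        raise ValueError("infant has no days in care")
    timeline = []
    for b in range(n_bins):
        first = (b * n_days) // n_bins
        # Integer ceiling of (b + 1) * n_days / n_bins, at least one day.
        last = max(first + 1, -(-((b + 1) * n_days) // n_bins))
        days = daily_flags[first:last]
        timeline.append(sum(days) / len(days))
    return timeline

def cluster_timeline(infants, n_bins=N_BINS):
    """Average the normalised timelines of all infants in a cluster."""
    per_infant = [normalise_timeline(flags, n_bins) for flags in infants]
    return [
        sum(t[b] for t in per_infant) / len(per_infant)
        for b in range(n_bins)
    ]

# Two hypothetical infants, both given PN for their first 10 days.
infant_a = [1] * 10 + [0] * 30  # 40-day stay
infant_b = [1] * 10 + [0] * 10  # 20-day stay
pn_timeline = cluster_timeline([infant_a, infant_b])
print(pn_timeline)
```

A bin reaches 1.0 only when every infant received the component on every day mapped to that bin, which matches the caption's description of the maximal bin value.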
Figure 3. Aggregation of nutritional clusters based on admission variables identifies 9 admission groups to further assess association with geographical location and outcome variables. Hierarchical clustering and PCA were performed on the 24 most populated nutritional clusters (covering 95% of the whole cohort), using admission variables (shown in (a)) to characterise infants in each cluster. Nutritional clusters are indexed according to size rank. The 9 admission groups, defined by visual inspection, are encoded with letters A to I on the hierarchical clustering dendrogram (a) and on the PCA projection plot (b). (a) Admission z-score values (used for clustering), along with outcome variables (z-scores) and nutritional and geographical location variables (mean proportions, Table 2). Colour coding for admission and outcome variables uses green for “favourable” and purple for “unfavourable” values (e.g. a high z-score value is unfavourable for NEC but favourable for MM at discharge). Feature columns in admission, outcome and geographical location variables are also reordered based upon hierarchical clustering. (b) Projection of the nutritional clusters (plotted as points with cluster index as in (a)) along the two main principal components returned by PCA (PC1 and PC2, explaining 94% of the variance). Admission groups A to I from (a) are reproduced on the PCA plot with ellipses whose axes are based on the covariances of the group’s coordinates along PC1 and PC2. (c) Contributions of individual admission variables to the first two principal components (PC1 and PC2). PC1 is driven by gestational age, resuscitation and Apgar scores, while PC2 is driven by gestational age and birth weight z-score.
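The PCA step in panels (b) and (c) can be illustrated with a short NumPy sketch. Everything here is an assumption for illustration: the input is random synthetic data standing in for the 24 clusters by 9 admission z-scores, the SVD-based `pca` helper is not the authors' implementation (the caption does not name the software used), and the hierarchical clustering step is not covered.

```python
import numpy as np

# Synthetic stand-in: 24 nutritional clusters x 9 admission z-scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(24, 9))

def pca(X, n_components=2):
    """PCA via SVD of the column-centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)      # variance ratio per component
    loadings = Vt[:n_components]         # variable contributions, as in (c)
    scores = Xc @ loadings.T             # cluster coordinates, as in (b)
    return scores, loadings, explained[:n_components]

scores, loadings, explained = pca(X)
print(scores.shape, loadings.shape, explained)
```

With real admission z-scores, `explained.sum()` would correspond to the share of variance captured by PC1 and PC2 (94% in the figure), and the rows of `loadings` would show which admission variables drive each component.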
Association between nutritional patterns and 2 clinical outcomes.
| Covariate | Coefficient | 95% confidence interval | p value |
|---|---|---|---|
| (a) W36 z-score (linear regression) | | | |
| Intercept | | | |
| BW z-score | | | < |
| Antenatal Steroids | 0.15 | [−0.07, 0.37] | 0.185 |
| Apgar 1 min | 0.00 | [−0.04, 0.05] | 0.861 |
| IMD decile | −0.01 | [−0.04, 0.02] | 0.588 |
| Smoking in Pregnancy | 0.05 | [−0.09, 0.19] | 0.504 |
| Sex | −0.06 | [−0.18, 0.06] | 0.301 |
| Gestational Age | | | |
| Midlands and east | −0.09 | [−0.27, 0.1] | 0.364 |
| North | −0.09 | [−0.27, 0.09] | 0.306 |
| South | 0.02 | [−0.16, 0.21] | 0.796 |
| Nutritional Treatment | | | |
| Z1 | 0.03 | [−0.09, 0.15] | 0.641 |
| Z2 | −0.04 | [−0.16, 0.07] | 0.482 |
| Z3 | 0.00 | [−0.1, 0.11] | 0.940 |
| (b) BPD (logistic regression) | | | |
| Intercept | −0.15 | [−0.36, 0.05] | 0.143 |
| BW z-score | | | < |
| Antenatal Steroids | 0.27 | [−0.49, 1.03] | 0.491 |
| Apgar 1 min | −0.09 | [−0.24, 0.07] | 0.263 |
| IMD decile | 0.00 | [−0.15, 0.15] | 0.977 |
| Smoking in Pregnancy | 0.05 | [−0.48, 0.58] | 0.862 |
| Sex | | | |
| Gestational Age | | | < |
| Midlands and east | 0.02 | [−0.62, 0.66] | 0.941 |
| North | 0.38 | [−0.27, 1.02] | 0.254 |
| South | 0.42 | [−0.19, 1.04] | 0.176 |
| Nutritional Treatment | | | < |
| Z1 | −0.26 | [−0.66, 0.15] | 0.216 |
| Z2 | 0.21 | [−0.19, 0.61] | 0.309 |
| Z3 | −0.10 | [−0.54, 0.34] | 0.648 |
Regression coefficients, 95% confidence intervals and p values are reported for all tested covariates and for the three latent variables (Z1, Z2 and Z3) of the factor model used for deconfounding. Infants with any missing values for the studied variables were excluded, and London was the reference category for geographical location. (a) Linear regression on weight z-score at 36 weeks postmenstrual age (W36 z-score), comparing cluster 13 (n = 371) with cluster 16 (n = 339). (b) Logistic regression on BPD, comparing cluster 17 (n = 376) with cluster 18 (n = 283). Coefficients in (a) measure the change in the outcome for a one-unit change in the corresponding covariate. Coefficients in (b) measure the change in the log-odds of the outcome for a one-unit change in the corresponding covariate. “Nutritional Treatment” is a binary covariate encoding inclusion in one cluster versus the other. Significant (p < 0.05) covariates are highlighted in bold if they have a negative effect on the outcome variable, and in italic if they have a positive effect, when the associated covariate is quantitatively positive.
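Because the logistic regression coefficients are on the log-odds scale, exponentiating a coefficient and its interval bounds gives an odds ratio with its 95% confidence interval. The sketch below applies this to the "South" row, assuming that row belongs to the BPD model (the second block of rows in the table above); the helper name `odds_ratio` is illustrative.

```python
import math

def odds_ratio(coef, ci):
    """Convert a log-odds coefficient and its CI to the odds-ratio scale."""
    lower, upper = ci
    return math.exp(coef), (math.exp(lower), math.exp(upper))

# "South" versus the London reference category, values from the table.
or_south, (or_low, or_high) = odds_ratio(0.42, (-0.19, 1.04))
print(f"OR = {or_south:.2f}, 95% CI [{or_low:.2f}, {or_high:.2f}]")
```

An interval that contains 1 on the odds-ratio scale corresponds to an interval that contains 0 on the log-odds scale, which is consistent with the non-significant p value of 0.176 reported for this row.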