Tingyu Chen1, Eryu Xia2, Tiange Chen3, Caihong Zeng1, Shaoshan Liang1, Feng Xu1, Yong Qin2, Xiang Li3, Yuan Zhang2, Dandan Liang1, Guotong Xie4, Zhihong Liu5. 1. National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, 210002 Jiangsu, China. 2. IBM Research - China, Beijing, China. 3. Ping An Healthcare Technology, 9F Building B, PingAn IFC, No.1-3 Xinyuan South Road, Beijing 100027, China. 4. Ping An Healthcare Technology, 9F Building B, PingAn IFC, No.1-3 Xinyuan South Road, Beijing 100027, China. Electronic address: xieguotong@pingan.com.cn. 5. National Clinical Research Center of Kidney Diseases, Jinling Hospital, Nanjing University School of Medicine, Nanjing, 210002 Jiangsu, China. Electronic address: liuzhihong@nju.edu.cn.
Abstract
BACKGROUND: Although IgA nephropathy (IgAN), an immune-mediated disease with heterogeneous clinical and pathological phenotypes, is the most common glomerulonephritis worldwide, it remains unclear which IgAN patients benefit from immunosuppression (IS) therapy. METHODS: Clinical and pathological data from 4047 biopsy-proven IgAN patients from 24 renal centres in China were included. The derivation and validation cohorts were composed of 2058 and 1989 patients, respectively. Model-based recursive partitioning, a machine learning approach, was performed to partition patients in the derivation cohort into subgroups with different long-term IS benefits, associated with time to end-stage kidney disease, measured by adjusted Kaplan-Meier estimator and adjusted hazard ratio (HR) using Cox regression. FINDINGS: Three identified subgroups obtained significant IS benefits, with HRs ≤ 1. In patients with serum creatinine ≤ 1·437 mg/dl, the benefits of IS were observed in those with proteinuria > 1·525 g/24h (node 6; HR = 0·50; 95% CI, 0·29 to 0·89; P = 0·02), especially in those with proteinuria > 2·480 g/24h (node 8; HR = 0·23; 95% CI, 0·11 to 0·50; P < 0·001). In patients with serum creatinine > 1·437 mg/dl, those with high proteinuria and crescents benefitted from IS (node 12; HR = 0·29; 95% CI, 0·09 to 0·94; P = 0·04). The treatment benefits were externally validated in the validation cohort. INTERPRETATION: Machine learning could be employed to identify subgroups with different IS benefits. These efforts promote decision-making, assist targeted clinical trial design, and shed light on individualised treatment in IgAN patients. FUNDING: National Key Research and Development Program of China (2016YFC0904103), National Key Technology R&D Program (2015BAI12B02).
Research in context
Evidence before this study
IgA nephropathy is the most common primary glomerulonephritis and an important cause of end-stage kidney disease worldwide, especially in Asian regions. Although it is considered an immune-mediated disease with heterogeneous clinical and pathological phenotypes, which IgA nephropathy patients will benefit from immunosuppression treatment and how to identify these patients remain unclear. Previous clinical trials regarding IgA nephropathy treatment decisions recruited patients by relying on simplistic categories of clinical risk factors rather than a comprehensive consideration of both clinical and pathological characteristics, possibly leading to inaccurate risk stratification and undetermined treatment effectiveness. Some recently developed machine learning methods can learn from data sets and be used to identify patient subgroups with differential responses to treatment.
Added value of this study
A large national collaboration data set comprising 4047 IgA nephropathy patients from 24 renal centres was collected. Subgroups of IgA nephropathy patients with different immunosuppression treatment benefits were identified automatically using a machine learning method based on a broad spectrum of clinical and pathological characteristics. Potential subgroups of patients who benefit from immunosuppression therapy, in terms of reducing decisive real-world kidney outcomes, were identified in 2058 IgA nephropathy patients from 18 renal centres and externally validated in 1989 IgA nephropathy patients from 7 renal centres.
Implications of all the available evidence
Our study suggested that comprehensive consideration of renal function, proteinuria and renal histological characteristics could serve as an indicator for the selection of immunosuppression therapy in IgA nephropathy patients. These efforts promote decision-making, assist future targeted clinical trials, and shed light on individualised treatment in IgA nephropathy patients. We also illustrated the power of machine learning methods in solving medical puzzles, discovering insight from real-world data and facilitating precision medicine in patients with kidney diseases.
Introduction
IgA nephropathy (IgAN) is the most common primary glomerulonephritis worldwide, especially in Asian regions, where IgAN accounts for approximately 50% of all cases [1]. IgAN is characterised by the deposition of IgA in the mesangial area of the glomeruli, accompanied by various clinical manifestations and histopathological lesions. Up to one in four patients suffer from end-stage kidney disease (ESKD) within 20 years from diagnosis and require renal transplantation [2].
Immunosuppression (IS) therapy is a common treatment choice for immune-mediated kidney disease [3]. Although IgAN has been recognised as an immune-mediated disease, the pathogenesis of IgAN has not yet been fully elucidated and the efficacy of IS in patients with IgAN remains controversial [2,4]. Clinical trials have been conducted to address the effects of IS therapy in IgAN patients with “high-risk” factors including renal function, proteinuria and hypertension [5,6]. The STOP-IgAN randomised controlled trial (RCT) illustrated that IS reduced proteinuria but had no effects on the slope of the eGFR decline or ESKD after 3 years [7]. Although the TESTING trial showed that oral corticosteroids had potential renal benefit, definitive conclusions about their efficacy could not be drawn since the trial was terminated early owing to serious side effects [8].
Notably, kidney histological characteristics were not taken into consideration when recruiting the patients into these two trials [9]. Determining the disease status and making treatment decisions based on a simplistic categorisation of clinical risk factors, such as renal function and proteinuria, with no consideration of kidney histological characteristics may have resulted in insufficient and unpredictable results as mentioned in the two trials [7,8,10,11]. Incorporating pathological features into the evaluation indicators may be necessary to group IgAN patients and guide treatment more precisely.
Recent advancements in machine learning provide an approach to identify IgAN patients who may potentially benefit from IS therapy. We can employ machine learning methods to jointly consider clinical and pathological features, model their high-order interactions, and effectively identify subgroups who may potentially benefit from IS therapy. Model-based recursive partitioning [12], a machine learning method used for the automated identification of subpopulations, has recently been applied in medicine for discovering patient subgroups with differential responses to treatment [13,14].
This study is the first to use machine learning to explore more individualised IS treatment selection in IgAN patients. In our study, a large national collaboration data set, which consisted of 4047 IgAN patients from 24 renal centres, was analysed. We successfully identified and externally validated subgroups of patients who may potentially benefit from IS therapy based on a broad spectrum of clinical and pathological features, with direct consideration of ESKD (the decisive kidney outcome), explicit modelling of high-order feature interactions, and automatic selection of cut-off values in IgAN patients. The findings of this study can provide insights into data-informed decision making in treating IgAN patients and the inclusion of targeted patients in future clinical trials, thus shedding light on individualised therapy in IgAN patients.
Materials and methods
Study cohort
The study included the Nanjing cohort (n = 1026), which consisted of patients from 18 centres associated with Jinling Hospital in Nanjing (China) between January 1997 and June 2010; the Chinese Registry of Prognostic Study of IgA Nephropathy (CRPIGA) cohort (n = 2155), which included patients from 7 centres before December 2015; and the single-centre Nanjing Glomerulonephritis Registry (NGR) cohort (n = 2354), which was retrieved consecutively from the Nanjing Glomerulonephritis Registry (Jinling Hospital) between January 2006 and June 2011. The Nanjing and CRPIGA cohorts were collected independently for research purposes, and details have been previously described [15,16]. In the current study, we included patients with biopsy-proven primary IgAN who were 18 years or older with follow-up exceeding 12 months, estimated glomerular filtration rate (eGFR) ≥ 30 mL/min/1·73 m2, and proteinuria ≥ 0·5 g/d. Patients who progressed to ESKD within the first 12 months of follow-up were also included. Patients with secondary causes or those with comorbid conditions were excluded.
The derivation cohort included the Nanjing cohort and NGR cohort I (data from January 2006 to June 2009), and the validation cohort consisted of the CRPIGA cohort and NGR cohort II (data from July 2009 to June 2011). The Department of Nephrology at Ruijin Hospital participated in both the derivation and validation cohorts, and the patients did not overlap.
This study followed the tenets of the Declaration of Helsinki, and was approved by the ethics committee of Jinling Hospital (2010NLY-023), Nanjing, China. Written informed consent was obtained from all study participants.
Definitions of variables
The baseline data, including demographic and disease characteristics, and clinical and pathological variables (Table S1) were collected within 1 month of the renal biopsy. In the derivation cohort, blood pressure, serum creatinine, and proteinuria during follow-up were also recorded. Serum creatinine was measured using enzymatic methods calibrated to the National Institute of Standards and Technology Liquid Chromatography Isotope Dilution Mass Spectrometry method.
The definition of therapy was consistent with previous retrospective studies of IgAN [17,18]. IS therapy (corticosteroids, cyclophosphamide and mycophenolate mofetil) and renin-angiotensin system blockade (RASB; any exposure to angiotensin-converting enzyme inhibitors, angiotensin receptor blockers or both) prior to biopsy were assessed. IS therapy and RASB use at biopsy and during the follow-up time were reported separately. The patients’ treatment began immediately after the biopsy unless there were contraindications or adverse reactions. Patients who received corticosteroids, cyclophosphamide or mycophenolate mofetil were defined as having received standard-of-care IS therapy in this study. Corticosteroids and cyclophosphamide were recommended by the KDIGO guidelines for IgAN [19], and mycophenolate mofetil was reported to lower proteinuria and ameliorate histopathological changes in RCTs, especially in Chinese patients [20,21]. With reference to previous studies [17,18], IS treatment was reported according to the intention-to-treat principle regardless of the actual duration of therapy. Owing to the limitations of the retrospective study design, the dosage of IS therapy was not collected.
The updated Oxford Classification [22] was applied for scoring kidney pathological lesions and was defined as follows: mesangial hypercellularity (M0, mesangial score ≤ 0·5; M1, mesangial score > 0·5), endocapillary hypercellularity (E0, absent; E1, present), segmental glomerulosclerosis (S0, absent; S1, present), tubular atrophy/interstitial fibrosis (T0, ≤ 25%; T1, 26%–50%; T2, > 50%), and cellular/fibrocellular crescents (C0, absent; C1, present in at least 1 glomerulus; C2, in > 25% of glomeruli). In the Nanjing cohort and the NGR cohort, renal biopsies were scored centrally. In the CRPIGA cohort, renal biopsies were scored by local pathologists who were experienced in the classification and were blinded to the other clinical data. The presence of necrosis and the condition of the arterioles were also noted in the derivation analysis for comprehensive multivariate adjustment.
The primary clinical outcome was ESKD, a long-term hard kidney outcome, defined as eGFR < 15 mL/min/1·73 m2 for ≥ 3 months, or initiation of dialysis or transplantation. Eligible patients were followed from the time of biopsy until the earliest of any censoring event (patient left data set or transferred out, death, study end date, or most recent data upload from practice) or an outcome event. We also addressed secondary outcomes including potential surrogate end points for chronic kidney disease (40% decline in eGFR and 30% decline in eGFR) [23] and the most used surrogate outcome in previous retrospective studies of IgAN (50% decline in eGFR) [17].
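The Oxford Classification cut-offs stated in this section can be expressed as a small scoring function. The following is a minimal Python sketch for illustration only; the `BiopsyFindings` container and its field names are hypothetical and not part of the study, and the thresholds are copied from the definitions given here rather than from any pathology software.

```python
from dataclasses import dataclass


@dataclass
class BiopsyFindings:
    """Hypothetical container for the per-biopsy measurements used below."""
    mesangial_score: float         # mean mesangial hypercellularity score
    endocapillary: bool            # endocapillary hypercellularity present
    segmental_sclerosis: bool      # segmental glomerulosclerosis present
    tubular_atrophy_pct: float     # % of cortex with tubular atrophy/interstitial fibrosis
    glomeruli_total: int           # number of glomeruli in the biopsy
    glomeruli_with_crescents: int  # glomeruli with cellular/fibrocellular crescents


def mest_c(b: BiopsyFindings) -> dict:
    """Map raw findings to M, E, S, T and C scores using the cut-offs in the text."""
    m = 1 if b.mesangial_score > 0.5 else 0
    e = 1 if b.endocapillary else 0
    s = 1 if b.segmental_sclerosis else 0

    if b.tubular_atrophy_pct <= 25:
        t = 0
    elif b.tubular_atrophy_pct <= 50:
        t = 1
    else:
        t = 2

    if b.glomeruli_with_crescents == 0:
        c = 0
    elif b.glomeruli_with_crescents / b.glomeruli_total > 0.25:
        c = 2
    else:
        c = 1

    return {"M": m, "E": e, "S": s, "T": t, "C": c}
```

For example, a biopsy with 6 of 20 glomeruli showing crescents would be scored C2 under these rules, whereas 2 of 20 would be scored C1.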
Statistical analysis
Data processing and description
A total of 40 variables (Table S1), including demographic characteristics, disease and treatment characteristics, clinical characteristics, and pathologic variables were included in the derivation dataset. A total of 17 variables were collected for the validation cohort, including the splitting factors identified from the derivation analysis and other characteristics (Table S1). The overall rates of missing data were 0·78% and 2·24% in the derivation and validation data sets, respectively (the rates of missing data for each variable are shown in Table S2a and Table S2b). Missing value imputation was conducted as follows: (1) missing clinical history (including personal disease history, treatment history, and family history) was imputed as ‘not having the history’; (2) missing clinical indicators were imputed with random numbers generated from the distribution; and (3) missing mean arterial pressure was calculated from imputed systolic blood pressure and diastolic blood pressure.
Continuous variables were summarised as the mean (SD) or median (interquartile range [IQR]) and compared using the t-test or Mann–Whitney test. Categorical variables were summarised as numbers (percentages) and compared using the chi-squared test or Fisher's exact test.
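The three imputation rules can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used: the text does not specify which distribution the random numbers were drawn from, so the sketch assumes sampling from the observed values of the same variable, and it assumes the conventional formula MAP = DBP + (SBP − DBP)/3 for mean arterial pressure. All function names are hypothetical.

```python
import random


def impute_history(value):
    """Rule (1): a missing history item is treated as 'not having the history'."""
    return False if value is None else value


def impute_from_observed(values, rng):
    """Rule (2), ASSUMPTION: replace each missing value with a random draw
    from the observed (non-missing) values of the same variable."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def mean_arterial_pressure(sbp, dbp):
    """Rule (3), ASSUMPTION: conventional formula MAP = DBP + (SBP - DBP) / 3."""
    return dbp + (sbp - dbp) / 3.0


# Example usage with a fixed seed so that the draws are reproducible.
rng = random.Random(0)
albumin = impute_from_observed([38.2, None, 41.0, 35.5, None], rng)
map_value = mean_arterial_pressure(sbp=120.0, dbp=80.0)
```

Drawing from the empirical distribution preserves the marginal spread of each variable, at the cost of ignoring correlations between variables; the sensitivity analyses with alternative imputation methods reported in the Results address this kind of concern.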
Benefiting cohort identification
Recursive partitioning is a classical method for multivariate analysis, which, by creating a decision tree, divides a population into subpopulations that have similar values of the response variable. Model-based recursive partitioning [14], a variant of recursive partitioning, is a machine learning method that embeds recursive partitioning into parametric statistical model estimation and variable selection. Model-based recursive partitioning is applicable when the response to a treatment is not homogeneous in the general cohort; it divides patients into groups such that the responses to the treatment are homogeneous within each subcohort and differ between the subcohorts. This method was applied in a previous study to analyse the effect of riluzole on amyotrophic lateral sclerosis [24].
Model-based recursive partitioning was conducted with the following basic steps: (1) fit a model to a dataset (all patients or a ‘node’); (2) test for parameter instability: for each candidate partitioning variable and each of its possible partitioning criteria, test whether splitting the node into two segments would yield a locally well-fitting model in each segment; (3) split the node with respect to the variable and partitioning criteria associated with the highest instability; and (4) repeat the procedure in each of the daughter nodes until the termination condition is met.
In this study, we employed model-based recursive partitioning for patient subgroup identification with a combination of clinical and pathological features. We defined the Cox regression model as the parametric model, the time to the clinical outcome (ESKD) as the response variable, and the IS treatment as the regressor. We defined the other 39 variables (Table S1) as candidate partitioning variables, which were used to partition the dataset. We applied the model-based recursive partitioning implementation in the R package ‘partykit’ [25] using negative log-likelihood as the objective function. The termination condition was set such that the maximum depth of the tree was 4 and the minimum number of patients in a subgroup was 50.
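The fit, split and recurse loop can be illustrated with a deliberately simplified Python sketch. It is not the ‘partykit’ implementation and it departs from the method used in this study in two labelled ways: (i) a constant-hazard (exponential) model with one event rate per treatment arm stands in for the Cox model, because its maximum-likelihood fit has a closed form; and (ii) the split is chosen by an exhaustive search for the largest drop in negative log-likelihood, with a hypothetical `min_gain` threshold standing in for the parameter instability test. The row keys (`treated`, `event`, `time`) and all function names are assumptions made for the example.

```python
import math


def arm_nll(events, person_time):
    """Negative log-likelihood of a constant-hazard model at its MLE (rate = events / time)."""
    if events == 0 or person_time == 0:
        return 0.0
    rate = events / person_time
    return -(events * math.log(rate) - rate * person_time)


def node_fit(rows):
    """Step 1: fit one event rate per treatment arm; return (NLL, log hazard ratio or None)."""
    d, t = [0, 0], [0.0, 0.0]
    for r in rows:
        arm = 1 if r["treated"] else 0
        d[arm] += r["event"]
        t[arm] += r["time"]
    nll = arm_nll(d[0], t[0]) + arm_nll(d[1], t[1])
    if min(d) == 0:
        return nll, None  # hazard ratio not estimable without events in both arms
    return nll, math.log((d[1] / t[1]) / (d[0] / t[0]))


def best_split(rows, variables, min_size):
    """Steps 2-3 (simplified): search every variable and cut-point for the largest NLL drop."""
    parent_nll, _ = node_fit(rows)
    best = None
    for var in variables:
        values = sorted({r[var] for r in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2
            left = [r for r in rows if r[var] <= cut]
            right = [r for r in rows if r[var] > cut]
            if len(left) < min_size or len(right) < min_size:
                continue
            gain = parent_nll - node_fit(left)[0] - node_fit(right)[0]
            if best is None or gain > best["gain"]:
                best = {"var": var, "cut": cut, "gain": gain, "left": left, "right": right}
    return best


def grow(rows, variables, depth=1, max_depth=4, min_size=50, min_gain=3.0):
    """Step 4: recurse until the depth or size limit is hit or no split improves the fit enough."""
    _, log_hr = node_fit(rows)
    node = {"n": len(rows), "log_hr": log_hr, "split": None}
    if depth >= max_depth:
        return node
    split = best_split(rows, variables, min_size)
    if split is None or split["gain"] < min_gain:
        return node
    node["split"] = {"var": split["var"], "cut": split["cut"]}
    node["left"] = grow(split["left"], variables, depth + 1, max_depth, min_size, min_gain)
    node["right"] = grow(split["right"], variables, depth + 1, max_depth, min_size, min_gain)
    return node
```

The defaults `max_depth=4` and `min_size=50` mirror the termination condition stated above. Each returned node carries its size and the within-node log hazard ratio for treatment, which is the quantity whose heterogeneity across nodes the partitioning is meant to expose.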
Evaluation of IS treatment benefits
In each patient subgroup, the long-term benefits of IS treatment associated with reducing the clinical outcome were evaluated using (1) univariate and multivariate Cox regression; and (2) the Kaplan-Meier estimator and log-rank test adjusted with inverse probability of treatment weighting (IPTW). For both evaluation approaches, the propensity score was calculated to measure the probability of treatment and used to reduce the confounding effects [26,27]. The propensity score for each patient was the fitted value from a multivariate logistic regression model using the IS treatment as the dependent variable and the other variables in Table S1 as independent variables [28]. In the univariate regression scenario, a model using the disease outcome as the response variable and IS as the only regressor was fitted. The hazard ratio (HR), 95% confidence interval (CI), and P value associated with the IS variable were reported. In the multivariate regression scenario, the disease outcome was used as the response variable, with IS treatment and the propensity score as the regressors. The adjusted HR, 95% CI, and P value associated with the IS treatment variable were reported. Adjusted Kaplan-Meier estimation and log-rank test were conducted as previously described [29–31] using the R package ‘IPWsurvival’.
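To make the weighting step concrete, the sketch below shows in Python how IPTW weights can be derived from propensity scores and how a Kaplan-Meier curve can be computed from weighted counts. It is an illustrative sketch rather than the ‘IPWsurvival’ code: the propensity scores are taken as given (in the study they are fitted values from a logistic regression), the function names are hypothetical, and the adjusted log-rank test is not reproduced.

```python
def iptw_weights(treated, propensity):
    """Inverse probability of treatment weights: 1/ps if treated, 1/(1 - ps) otherwise."""
    return [1.0 / p if t else 1.0 / (1.0 - p) for t, p in zip(treated, propensity)]


def weighted_kaplan_meier(times, events, weights):
    """Return [(time, survival)] at each distinct event time, using weighted
    numbers of events and of subjects at risk in place of raw counts."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = sum(weights)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        weighted_events = 0.0
        leaving = 0.0  # weighted subjects who leave the risk set at time t
        while i < len(order) and times[order[i]] == t:
            j = order[i]
            leaving += weights[j]
            if events[j]:
                weighted_events += weights[j]
            i += 1
        if weighted_events > 0:
            survival *= 1.0 - weighted_events / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve
```

In use, the curve for each treatment arm would be computed separately from that arm's patients and their weights. With all weights equal to 1 the function reduces to the ordinary Kaplan-Meier estimator, which offers a simple check of the implementation.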
Model validation analysis
External validation was performed to validate the findings from the derivation analysis. The validation cohort was partitioned into separate subsets following the same decision rules identified in the derivation analysis. In each subset, both the unadjusted hazard ratio and adjusted hazard ratio were estimated to evaluate the IS treatment benefits using the same methods as in the derivation analysis.
Further details of the Methods are provided in Item S1 and Item S2.
SPSS 22.0 software (IBM Corporation, Armonk, NY, USA) and R programming software (version 3.4.1) were used for the statistical analysis. All P values were two-tailed, and values ≤ 0·05 were considered statistically significant.
Results
Derivation analysis
There were 3380 patients in the combined Nanjing cohort and NGR cohort I (from January 2006 to June 2009), of whom 2058 met the inclusion criteria and were included in the derivation cohort (Fig. 1), whose characteristics are shown in Table 1. The differences in characteristics between patients treated with IS and those without in the derivation cohort are shown in Table S3.
Fig. 1
Enrolment of patients for the derivation and validation cohorts.
The NGR cohort I included patients retrieved consecutively from the Nanjing Glomerulonephritis Registry from January 2006 to June 2009, and the NGR cohort II included patients from July 2009 to June 2011.
NGR, Nanjing Glomerulonephritis Registry; CRPIGA, Chinese Registry of Prognostic Study of IgA Nephropathy; eGFR, estimated glomerular filtration rate; IgAN, immunoglobulin A nephropathy; PAS, periodic acid–Schiff.
Table 1
Description of the derivation and validation cohorts.
Values are numbers (percentages) unless stated otherwise. SBP, systolic blood pressure; DBP, diastolic blood pressure; MAP, mean arterial pressure; eGFR, estimated glomerular filtration rate; RAS, renin-angiotensin system; ESKD, end-stage kidney disease.
In the derivation cohort, 857 (41·7%) patients had a follow-up less than 5 years, and the median number of follow-up measurements was 13 (IQR, 8 to 20). The slope of the eGFR in the follow-up period was −2·52 (SD, 9·12) mL/min/1·73 m2 per year. Blood pressure was well controlled during the follow-up time; the time-averaged mean arterial pressure value was 94·6 (SD, 9·5) mmHg (126·8 [12·9]/78·3 [8·9] mmHg). The median time-averaged proteinuria was 0·7 (IQR, 0·4 to 1·1) g/24 hr. Overall, the median numbers of serum creatinine, proteinuria, and blood pressure measurements were 13 (IQR, 8 to 20), 13 (8 to 19), and 10 (6 to 16), respectively.
The partitioning results are organised in a partitioning tree in Fig. 2, where each node in the tree represents a patient subgroup satisfying the conditions shown on the branches. The IS benefits for patients in each subgroup, as stratified by the partitioning tree, are shown in Table 2. Six features were selected as splitting factors, including serum creatinine (SCr), urine protein, serum albumin (ALB), hypertension, diastolic blood pressure (DBP), and C in the Oxford Classification, contributing to 15 nodes (including internal nodes and leaf nodes). Five subgroups were identified as having a significant long-term treatment effect at a significance level of 0·05: nodes 6, 8, and 12 had a decreased hazard of ESKD and nodes 11 and 15 had an increased hazard of ESKD.
Fig. 2
Model-based recursive partitioning results.
Partitioning results are organised as a model-based recursive partitioning tree, where upper level nodes are split into child nodes based on certain branching criteria, thus forming subgroups. Summary statistics of the nodes and branching criteria are included in the model-based recursive partitioning tree. Summary statistics of each node are shown in a grey box, which includes the node name, node size (number of patients in the node), and the hazard ratio (HR, with 95% confidence interval) for immunosuppression treatment after adjusting for confounding variables. The partitioning variable for each branching criterion is shown in a white box, with the criteria shown on the line connecting a parent node and its child node.
HR, hazard ratio; Oxford_C, presence of crescent (C1: present in at least 1 glomerulus; C2: present in > 25% of glomeruli); DBP, diastolic blood pressure.
Table 2
Immunosuppression long-term benefits comparison for patients in each subgroup stratified by model-based recursive partitioning in the derivation cohort.
Node | Size (a) | Unadjusted coefficient | Unadjusted HR | Unadjusted 95% CI | Unadjusted P value | Adjusted (b) coefficient | Adjusted (b) HR | Adjusted (b) 95% CI | Adjusted (b) P value
1 | 2058 | 0·06 | 1·06 | 0·79 to 1·42 | 0·69 | −0·15 | 0·86 | 0·62 to 1·20 | 0·37
2 | 1730 | 0·10 | 1·11 | 0·74 to 1·66 | 0·61 | −0·45 | 0·63 | 0·40 to 1·01 | 0·06
3 | 1184 | 0·19 | 1·21 | 0·60 to 2·44 | 0·59 | 0·08 | 1·08 | 0·50 to 2·30 | 0·85
4 | 728 | −0·17 | 0·85 | 0·23 to 3·08 | 0·80 | −0·22 | 0·80 | 0·20 to 3·16 | 0·76
5 | 456 | 0·39 | 1·48 | 0·64 to 3·41 | 0·36 | 0·14 | 1·15 | 0·45 to 2·95 | 0·77
6 | 546 | −0·39 | 0·68 | 0·41 to 1·12 | 0·13 | −0·68 | 0·50 | 0·29 to 0·89 | 0·02
7 | 306 | 0·31 | 1·36 | 0·68 to 2·74 | 0·38 | 0·25 | 1·28 | 0·59 to 2·78 | 0·53
8 | 240 | −1·29 | 0·27 | 0·13 to 0·57 | < 0·001 | −1·46 | 0·23 | 0·11 to 0·50 | < 0·001
9 | 328 | 0·04 | 1·05 | 0·69 to 1·59 | 0·83 | 0·07 | 1·08 | 0·67 to 1·72 | 0·76
10 | 105 | −0·83 | 0·44 | 0·23 to 0·85 | 0·01 | −0·17 | 0·84 | 0·37 to 1·95 | 0·69
11 | 51 | 0·06 | 1·06 | 0·39 to 2·90 | 0·91 | 1·56 | 4·78 | 1·19 to 19·24 | 0·03
12 | 54 | −1·58 | 0·21 | 0·08 to 0·55 | < 0·001 | −1·24 | 0·29 | 0·09 to 0·94 | 0·04
13 | 223 | 0·47 | 1·59 | 0·92 to 2·77 | 0·10 | 0·41 | 1·50 | 0·86 to 2·63 | 0·16
14 | 99 | −0·12 | 0·88 | 0·37 to 2·12 | 0·78 | −0·13 | 0·88 | 0·37 to 2·12 | 0·78
15 | 124 | 1·04 | 2·82 | 1·37 to 5·83 | < 0·001 | 0·94 | 2·56 | 1·23 to 5·33 | 0·01
HR, hazard ratio; CI, confidence interval.
(a) Node size means number of patients in this node.
(b) Adjusted by confounding variables.
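The columns of Table 2 are linked by standard Cox regression relationships: the HR is the exponential of the coefficient, and the 95% CI is symmetric around the coefficient on the log scale. The Python sketch below uses these relationships to cross-check a table row; it is an illustration with hypothetical function names, and because it works from rounded published values, small discrepancies in the last digit are to be expected.

```python
import math


def hazard_ratio(coef):
    """HR is the exponential of the Cox regression coefficient."""
    return math.exp(coef)


def se_from_ci(lower, upper, z_crit=1.96):
    """Back out the standard error of the log HR from a 95% confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z_crit)


def wald_p_value(coef, se):
    """Two-sided P value of the Wald z statistic under a standard normal reference."""
    z = abs(coef) / se
    return math.erfc(z / math.sqrt(2.0))


# Adjusted estimates for node 8 as printed in Table 2.
hr_node8 = hazard_ratio(-1.46)
p_node8 = wald_p_value(-1.46, se_from_ci(0.11, 0.50))
```

Applied to the adjusted estimates for nodes 6 and 8, this check gives HRs and P values in line with the tabulated 0·50 (P = 0·02) and 0·23 (P < 0·001), up to rounding.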
Node 6, consisting of 546 patients, was characterised by having SCr ≤ 1·437 mg/dl and proteinuria > 1·525 g/24 hr. In this group, IS was associated with favourable kidney survival (adjusted HR = 0·50; 95% CI, 0·29 to 0·89; P = 0·02). The benefits of IS were higher when patients had proteinuria > 2·480 g/24 hr (node 8), with an adjusted HR of 0·23 (95% CI, 0·11 to 0·50; P < 0·001). Therefore, the IS benefits increased with a higher level of proteinuria when patients had SCr ≤ 1·437 mg/dl.
Of the patients with SCr > 1·437 mg/dl, IS was associated with better outcome in node 12 (adjusted HR = 0·29; 95% CI, 0·09 to 0·94; P = 0·04), which consisted of 54 patients and was characterised by SCr > 1·437 mg/dl, ALB ≤ 35·95 g/L, and C > 0. Differing from node 12 only by crescent formation, node 11, a subgroup of 51 patients characterised by SCr > 1·437 mg/dl, ALB ≤ 35·95 g/L and C = 0, showed a lack of benefits from IS therapy (adjusted HR = 4·78; 95% CI, 1·19 to 19·24; P = 0·03). Patients with SCr > 1·437 mg/dl, ALB > 35·95 g/L and DBP > 83 mmHg (node 15) also demonstrated a lack of IS benefits (adjusted HR = 2·56; 95% CI, 1·23 to 5·33; P = 0·01). In other nodes in the partitioning tree, the adjusted IS benefits were not significant. Using the IPTW-adjusted Kaplan-Meier estimator, the adjusted survival curve without ESKD during follow-up was also better in the IS treatment group in three benefit nodes (node 6, node 8, and node 12) (Fig. 3) but not in other nodes (Fig. S1).
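The decision rules described above can be written as a short Python function that assigns a patient to a subgroup. This is a hedged sketch, not part of the study software: the function and argument names are hypothetical, only the cut-offs quoted in the text are used, and the placement of nodes 3, 7 and 14 as the complementary branches is inferred from the node sizes in Table 2. The rule that splits node 3 into nodes 4 and 5 is not given in the text, so such patients are returned as node 3.

```python
def assign_subgroup(scr_mg_dl, proteinuria_g_24h, albumin_g_l, crescent_score, dbp_mmhg):
    """Return the node number (as used in the text) for one patient."""
    if scr_mg_dl <= 1.437:
        if proteinuria_g_24h <= 1.525:
            # Node 3 is split further into nodes 4 and 5, but that rule
            # is not spelled out in the text, so stop here.
            return 3
        # Nodes 7 and 8 together make up node 6 (proteinuria > 1.525 g/24 h).
        return 8 if proteinuria_g_24h > 2.480 else 7
    if albumin_g_l <= 35.95:
        # Nodes 11 and 12 differ only by crescent formation (Oxford C score).
        return 12 if crescent_score > 0 else 11
    return 15 if dbp_mmhg > 83 else 14
```

For instance, a patient with SCr 1·0 mg/dl and proteinuria 3·0 g/24 hr is assigned to node 8, and a patient with SCr 2·0 mg/dl, ALB 30 g/L and crescents present is assigned to node 12. The same function can be applied unchanged to a validation data set, which is how a fixed set of rules is carried from derivation to validation.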
Fig. 3
IPTW-adjusted Kaplan-Meier curves without end-stage kidney disease according to immunosuppression treatment in benefit nodes.
Inverse probability of treatment weighting (IPTW)-adjusted Kaplan-Meier curves without end-stage kidney disease according to immunosuppression treatment in benefit nodes including node 6 (a; adjusted log-rank test P = 0·01), node 8 (b; adjusted log-rank test P = 0·007), and node 12 (c; adjusted log-rank test P = 0·007), which were stratified by model-based recursive partitioning; comparisons of curves were conducted by the adjusted log-rank test. The risk table below each figure shows the crude number of patients at risk (without adjustment of weights). IS, immunosuppression.
Using the entire derivation cohort as a control, we compared its clinical and pathological characteristics to those in the subgroups with significant benefit including node 6 (n = 546), node 8 (n = 240), and node 12 (n = 54) (Table S4) and to those in subgroups lacking benefit, including node 11 (n = 51) and node 15 (n = 124) (Table S5).
Sensitivity analyses were performed to assess the robustness of the findings using different imputation methods (Fig. S2 for relative node size, and Fig. S3 and Table S6 for IS treatment benefits), and the results from the benefit nodes were consistent with our findings. The long-term benefits were also internally validated using a bootstrap analysis, where the same treatment benefits were observed in each of the benefit nodes (Fig. S4). We also validated our results in various secondary outcomes (Tables S7–S9): IS treatment reduced the risk of all three secondary outcomes (30%, 40% and 50% decline in eGFR) in the benefit subgroups (nodes 6, 8 and 12), providing indirect evidence for our conclusions.
Validation analysis
There were 3734 patients in the combined CRPIGA cohort and NGR cohort II (from July 2009 to June 2011), of whom 1989 met the inclusion criteria and were included in the external validation cohort (Fig. 1), whose characteristics are shown in Table 1. The differences in characteristics between patients treated with IS and those without in the validation cohort are shown in Table S10.
The external validation results were highly consistent with findings from the derivation analysis. In the validation cohort, significant long-term treatment benefits were observed in the same significant benefit subgroups (Table 3 and Fig. S5). In node 6, which consisted of 569 patients, IS was associated with favourable kidney survival (adjusted HR = 0·44; 95% CI, 0·19 to 0·99; P = 0·04). The benefits of IS were higher in patients in node 8, with an adjusted HR of 0·24 (95% CI, 0·09 to 0·66; P = 0·006). IS was also associated with better outcome in node 12 (adjusted HR = 0·53; 95% CI, 0·32 to 0·90; P = 0·02), which consisted of 75 patients.
Table 3
Immunosuppression long-term benefits validation for patients in each subgroup stratified by model-based recursive partitioning in the validation cohort.
| Node | Size^a | Unadjusted coefficient | Unadjusted HR | Unadjusted 95% CI | Unadjusted P value | Adjusted^b coefficient | Adjusted^b HR | Adjusted^b 95% CI | Adjusted^b P value |
| 1 | 1989 | 0·03 | 1·03 | 0·76 to 1·40 | 0·83 | −0·23 | 0·79 | 0·58 to 1·09 | 0·15 |
| 2 | 1645 | −0·43 | 0·65 | 0·38 to 1·12 | 0·12 | −0·54 | 0·58 | 0·33 to 1·02 | 0·06 |
| 3 | 1076 | −0·24 | 0·78 | 0·37 to 1·68 | 0·53 | −0·16 | 0·85 | 0·39 to 1·87 | 0·69 |
| 4 | 836 | −0·37 | 0·69 | 0·28 to 1·68 | 0·43 | −0·25 | 0·78 | 0·31 to 1·96 | 0·60 |
| 5 | 237 | 0·01 | 1·01 | 0·49 to 2·11 | 0·97 | −0·03 | 0·97 | 0·28 to 3·38 | 0·97 |
| 6 | 569 | −0·83 | 0·43 | 0·20 to 0·95 | 0·04 | −0·83 | 0·44 | 0·19 to 0·99 | 0·04 |
| 7 | 263 | −0·01 | 0·99 | 0·28 to 3·51 | 0·99 | −0·14 | 0·87 | 0·22 to 3·44 | 0·84 |
| 8 | 306 | −1·55 | 0·21 | 0·08 to 0·58 | 0·002 | −1·44 | 0·24 | 0·09 to 0·66 | 0·006 |
| 9 | 344 | 0·16 | 1·18 | 0·80 to 1·73 | 0·41 | 0·02 | 1·02 | 0·69 to 1·51 | 0·93 |
| 10 | 131 | −0·24 | 0·79 | 0·44 to 1·39 | 0·41 | −0·33 | 0·72 | 0·40 to 1·28 | 0·26 |
| 11 | 28 | 0·16 | 1·17 | 0·37 to 3·74 | 0·79 | 0·15 | 1·16 | 0·36 to 3·70 | 0·80 |
| 12 | 75 | −0·58 | 0·56 | 0·32 to 0·97 | 0·04 | −0·63 | 0·53 | 0·32 to 0·90 | 0·02 |
| 13 | 192 | 0·13 | 1·14 | 0·64 to 2·03 | 0·65 | 0·05 | 1·06 | 0·59 to 1·90 | 0·86 |
| 14 | 95 | 0·23 | 1·26 | 0·52 to 3·06 | 0·61 | 0·14 | 1·15 | 0·45 to 2·91 | 0·77 |
| 15 | 97 | 0·09 | 1·1 | 0·50 to 2·42 | 0·82 | 0·07 | 1·07 | 0·48 to 2·38 | 0·87 |
HR, hazard ratio; CI, confidence interval.
a Node size means number of patients in this node.
b Adjusted by confounding variables.
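The columns of Table 3 are linked by the standard Cox regression relations HR = exp(coefficient) and, for a Wald-type interval, 95% CI = exp(coefficient ± 1·96 × SE). The short Python sketch below, using hypothetical helper names, recovers the HR from a reported coefficient and back-calculates an approximate standard error and two-sided Wald P value from a reported CI. It assumes the published intervals are Wald-type; because the published values are rounded, the recovered P values are only approximate.

```python
import math


def hazard_ratio(coef):
    """Hazard ratio implied by a Cox regression coefficient."""
    return math.exp(coef)


def se_from_ci(lower, upper, z=1.96):
    """Approximate standard error of the coefficient from a Wald 95% CI on the HR."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def wald_p_value(coef, se):
    """Two-sided Wald P value under a standard normal reference distribution."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))


# Node 8, adjusted analysis in Table 3: coefficient -1.44, 95% CI 0.09 to 0.66.
hr_node8 = hazard_ratio(-1.44)       # rounds to 0.24, as reported
se_node8 = se_from_ci(0.09, 0.66)    # approximate, limited by rounding of the CI
p_node8 = wald_p_value(-1.44, se_node8)
```

The same check applied to node 12 (adjusted coefficient −0·63) gives an HR that rounds to 0·53, in agreement with the table.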
Discussion
In clinical practice, it remains unclear which IgAN patients should be treated with IS therapy and how to select them rationally and objectively. The treatment strategies for patients with IgAN are mainly based on the categorisation of proteinuria and renal function [19], which has several limitations in this highly heterogeneous disease. A previous study [17] verified that corticosteroid benefits were associated with proteinuria levels and renal function status in IgAN patients, and a recent prospective multi-centre RCT [11] validated that IgAN patients with active proliferative lesions had a good response to IS therapy. Repeat biopsy also revealed that active lesions were reversed after IS therapy. These results suggest that comprehensive consideration of clinical indicators and renal pathological changes may provide better guidance for choosing which IgAN patients to treat with IS therapy.
A large national cohort of 4047 IgAN patients with a broad spectrum of clinical and pathological data was collected in this study. By using model-based recursive partitioning for grouping, using the real-world decisive kidney outcome ESKD as the objective endpoint, and adjusting for a variety of confounding factors in the evaluation of long-term treatment benefits, we identified and externally validated patient subgroups who may benefit from IS therapy. The benefits of IS increased with elevated proteinuria in patients with stable renal function (SCr ≤ 1·437 mg/dl). Of the patients with impaired renal function (SCr > 1·437 mg/dl), those with high proteinuria and crescent formation in their renal tissue obtained significant IS benefits and a favourable kidney outcome.
The machine learning method can automatically divide patients into subgroups with different treatment responses based on relevant characteristics, thus reducing bias due to subjective human experience and resulting in more objective groups.
The merits of our model-based recursive partitioning approach also include: (1) modelling higher-order interactions among covariates in a real-world setting; (2) using a hypothesis-free, data-driven approach to reduce bias and discover novel factors or interactions; and (3) employing statistical tests as partitioning criteria for continuous variables to reduce human influence (and thus minimise the bias introduced) in the selection of cut-off values. Through this data-driven machine learning approach, nephrologists can decide on treatment based on the combination of clinical and pathological features in addition to the categorisation of clinical features [19] and may obtain a more individualised treatment plan for IgAN patients.
Proteinuria is a strong risk factor for poor prognosis that has been widely recognised in IgAN patients [5,32]. The VALIGA study [17] suggested a graded benefit of corticosteroids according to proteinuria level, but the 1–3 g/24 h category remains a “grey zone.” Proteinuria alone is insufficient for grouping patients; thus, many studies incorporated renal function with proteinuria. A secondary investigation of the STOP-IgAN study [33] concluded that corticosteroids reduced proteinuria in IgAN patients with relatively well-preserved eGFR (> 60 ml/min/1·73 m2), whereas IS did not prevent eGFR loss in patients with lower baseline GFR (eGFR ≤ 60 ml/min/1·73 m2), even though they had high proteinuria. This result indicated that renal function status is also one of the important determinants of the response to IS in patients with IgAN. The influence of histological lesions on the above IS effect was hard to assess because these studies did not include renal pathology data. Coppo [34] summarised that IS treatment success occurred mostly in individuals who were highly proteinuric with better renal function. Most of these previous studies grouped patients to evaluate IS response based on cut-off values set empirically.
In this paper, the subgroups that benefitted from IS, in terms of a reduced risk of the real-world decisive kidney outcome (ESKD), and the optimal cut-off values were detected automatically, making the results more robust and objective.
Proteinuria can be caused by active glomerular lesions or chronic lesions, including glomerular sclerosis and tubular atrophy/interstitial fibrosis [35], which are difficult to distinguish, although the degree of proteinuria can provide some clue. In this study, kidney biopsies were scored according to the Oxford Classification [22], and active (cellular and/or fibrocellular) crescents, but not fibrous crescents, were included. Theoretically, active lesions require aggressive IS therapy, but severe chronic lesions do not [20,36,37]. However, doctors cannot identify the specific type of lesion from proteinuria and SCr alone, and the combination with renal pathological features can offer a more comprehensive evaluation, which is extremely important for guiding treatment [4,34]. In our study, nodes 11 and 12 were both proteinuric subgroups with impaired renal function, but they displayed completely opposite responses to IS, which paralleled their different pathological changes. Both subgroups had severe tubulointerstitial lesions, but node 12 had more active lesions (endocapillary hypercellularity and crescent formation) than the entire cohort, and node 12 showed a greater IS therapy benefit than node 11. An international multi-centre study also supported that crescent lesions might be an indicator for IS therapy in patients with IgAN [38]. Therefore, in patients with impaired renal function, in addition to proteinuria and SCr, the histological features of the renal lesions can provide useful information to evaluate the long-term benefits of IS. ALB and DBP were included in many risk prediction models and were reported to be strongly associated with kidney outcomes in patients with IgAN [6,39,40].
In our analysis, ALB and DBP were also identified as variables associated with long-term benefits of IS therapy.
Nodes 6 and 8 had relatively well-preserved renal function and marked proteinuria with advanced proliferative histological lesions (mesangial and endocapillary proliferative lesions and crescent formation). In contrast to the entire cohort, the patients in these two subgroups had less glomerular sclerosis and fewer tubulointerstitial lesions. Therefore, the IS benefits seen in these subgroups were expected.
The strengths of the study include: (1) the large, national and multi-centre IgAN cohort; (2) external validation of the results; (3) the employment of a “hard outcome”, ESKD, as the study endpoint; (4) the first use of a combination of potential clinical and pathological features for patient grouping; (5) automated identification of subgroups benefitting from IS and high-order interactions between features using a more objective machine learning approach; (6) detailed covariate data that enabled comprehensive multivariate adjustment; and (7) multiple sensitivity analyses that support the robustness of the primary results. Additionally, RASBs were applied in the majority of patients in both the IS-treated and untreated groups, which shows the benefit of IS therapy in addition to RASB therapy. This study also had limitations. First, we were prevented from drawing causal conclusions due to the observational nature of this study. Although we adjusted for potential confounders, unobserved confounders may exist; randomised trials are needed to further support our findings.
Second, the conclusions of this study were established on the basis of data acquired from a Chinese population, and the applicability of the results to other ethnic groups and regions still needs to be verified.
This study was the first to identify and externally validate the potential subgroups of IgAN patients benefiting from IS therapy, as indicated by a reduction in a real-world decisive kidney outcome (ESKD), using a machine learning method and comprehensively analysing renal function, proteinuria and histological lesions. The findings of the study should be confirmed in future randomised trials, because retrospective analyses alone are not sufficient to determine the treatment choice for patients with IgAN, and the potential adverse effects of IS should not be ignored. Nevertheless, this study sheds light on individualised therapy in IgAN.
Declaration of Competing Interest
The authors declare that they have no competing interests.