Youzhou Tang, Weiru Zhang, Minfeng Zhu, Li Zheng, Lingli Xie, Zhijiang Yao, Hao Zhang, Dongsheng Cao, Ben Lu.
Abstract
Effective treatment of lupus nephritis and assessment of patient prognosis depend on accurate pathological classification and careful use of acute and chronic pathological indices. Renal biopsy provides the most reliable predictive power. However, clinicians still need auxiliary tools under certain circumstances. Comprehensive statistical analysis of clinical indices may effectively support and supplement biopsy. In this study, 173 patients with lupus nephritis were classified based on histology and scored on acute and chronic indices. These results were compared against machine learning predictions from multiple linear regression and random forest analysis. For three-class random forest analysis, total classification accuracy was 51.3% (class II 53.7%, class III&IV 56.2%, class V 40.1%). For two-class random forest analysis, accuracy reached 56.2% for class II, 63.7% for class III&IV, and 61% for class V. Additionally, machine learning identified the important variables for each class prediction. Multiple linear regression predicted the chronic index (CI) (Q² = 0.746, R² = 0.771) and the acute index (AI) (Q² = 0.516, R² = 0.576), and each variable's importance was calculated in the AI and CI models. Machine learning showed potential for the assessment of lupus nephritis.
Year: 2018 PMID: 29980727 PMCID: PMC6035173 DOI: 10.1038/s41598-018-28611-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Patient characteristics, stratified by histological classification.
| Characteristic | Class II | Class III&IV | Class V | Total (n = 173) |
|---|---|---|---|---|
| Mean age, yr | 27.49 ± 13.26 | 26.38 ± 13.65 | 28.27 ± 13.83 | 27.12 ± 13.5 |
| Female | 84.7# | 67.9 | 75.8 | 75.1 |
| Fever | 23.7 | 25.9 | 24.2 | 24.9 |
| Photosensitivity | 18.6 | 19.8 | 30.3 | 21.4 |
| Psilosis | 20.3 | 25.9 | 9.1 | 20.8 |
| High blood pressure | 11.8* | 29.6* | 21.2 | 22 |
| Arthralgia | 47.5 | 33.3 | 33.3 | 38.2 |
| Edema | 44.1 | 46.9 | 48.5 | 46.2 |
| Erythra | 35.6* | 52.9 | 51.6 | 46.9 |
| OB (+) | 54.2 | 63 | 54.5 | 58.4 |
| Raynaud phenomenon | 6.8 | 11.1 | 9.1 | 9.2 |
| Urinary protein(+) | 83.1 | 85.2 | 87.9 | 85 |
| Urinary erythrocytes(+) | 35.6 | 54.3 | 51.5 | 47.4 |
| WBC count (10⁹/L) | 7.59 ± 3.89 | 6.86 ± 4.02 | 7.94 ± 7.56 | 7.31 ± 4.85 |
| PLT count (10⁹/L) | 219.36 ± 108.22*# | 167.63 ± 74.55* | 178.53 ± 86.1 | 187.35 ± 92 |
| BUN (mmol/L) | 6.79 ± 5.1 | 8.73 ± 7.12 | 7.84 ± 5.49 | 7.9 ± 6.22 |
| Cr (μmol/L) | 104.28 ± 132.81 | 111.54 ± 107.96 | 93.14 ± 61.02 | 105.55 ± 110.04 |
| Uric acid (μmol/L) | 335.31 ± 112.94*# | 411.03 ± 125.4* | 364.12 ± 118.7 | 376.26 ± 133.77 |
| Serum C3 (103 g/l) | 691.19 ± 253.42* | 507.47 ± 301.17*# | 755.85 ± 346.2*※ | 617.5 ± 312.05 |
| SSB(+) | 11.8 | 8.6 | 12.1 | 10.4 |
| SLEDAI | 12.71 ± 5.63 | 14.81 ± 6.71* | 12.61 ± 5.62 | 13.68 ± 6.22 |
| TIL | 3.05 ± 1.33*# | 3.81 ± 1.68 | 3.94 ± 1.64 | 3.58 ± 1.6 |
| NAG (U/L) | 12.7 ± 3.91*# | 16.33 ± 1.77 | 17.55 ± 3.01* | 15.33 ± 3.77 |
| AI | 4.8 ± 2.93 | 7.28 ± 2.62 | 5.73 ± 3.16 | 6.13 ± 3.04 |
| CI | 0.9 ± 1.16 | 1.74 ± 2.09 | 1.79 ± 1.87 | 1.46 ± 1.82 |
| eGFR (ml/min) | 105.49 ± 66.05 | 102.52 ± 73.92 | 98.21 ± 45.35 | 102.71 ± 66.35 |
| Serum C4 (103 g/l) | 145.1 ± 71.45# | 118.03 ± 74.14* | 136.32 ± 72.34 | 130.75 ± 73.51 |
| Sm(+) | 16.9 | 24.7 | 21.2 | 21.3 |
| SSA(+) | 49.2 | 44.4 | 51.5 | 47.4 |
| nRNP(+) | 15.3 | 18.5 | 6.1 | 15 |
| dsDNA(+) | 22*# | 56.8*※ | 18.2* | 37.6 |
| Scl-70(+) | 5.1 | 1.2 | 0 | 2.3 |
| Jo-1(+) | 3.4 | 1.2 | 0 | 1.7 |
The data included 173 samples. According to treatment differences, we defined three clusters of pathological classes: cluster 1 (class II), cluster 2 (class III or IV, including class V combined with III or IV) and cluster 3 (pure class V). Abbreviations: Serum C3: serum complement 3; Serum C4: serum complement 4; dsDNA: double-stranded DNA; SLEDAI: Systemic Lupus Erythematosus Disease Activity Index; Cr: creatinine; SSB: anti-Sjogren syndrome B antibody; eGFR: estimated glomerular filtration rate; nRNP: U1-RNP antibody; OB: stool occult blood; WBC count: white blood cell count; PLT count: platelet count; TIL: tubulointerstitial lesion; NAG: urinary N-acetyl-beta-d-glucosaminidase isoenzyme; AI: renal biopsy acute index; CI: renal biopsy chronic index. Values are expressed as % or mean ± SD. *p < 0.05, compared with the other two groups; #p < 0.05, class II vs class III&IV; ※p < 0.05, class III&IV vs class V; *p < 0.05, class II vs class V (the chi-square test was used for categorical variables and ANOVA for continuous variables).
Figure 1. Random forest algorithms for class prediction and variable selection. (a) Schematic diagram of how the random forest works. The forest consists of 1000 decision trees. In each tree, samples are split into two “leaf nodes” by a randomly chosen variable, and the splitting continues with different randomly chosen variables until all samples are classified into three clusters. (b) The table shows the three-class random forest prediction accuracy overall and for each cluster of classes; the diagram on the right shows the importance of the four main variables in the prediction model: urinary NAG enzyme, Cr, serum C3 and uric acid levels. (c) The table lists the two-class random forest prediction accuracy. Each class has its own panel of specific variables, whose importance is shown in the diagrams.
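As a rough illustration of the three-class analysis in Figure 1, the sketch below fits a 1000-tree random forest with scikit-learn and reports overall accuracy, per-cluster accuracy and variable importance. Everything beyond the tree count is an assumption: the data are synthetic stand-ins for the 173 patients, the feature names are placeholders, and out-of-bag predictions are used for the accuracy estimate, which may differ from the validation scheme used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): 173 rows, 4 placeholder features.
feature_names = ["NAG", "Cr", "C3", "uric_acid"]
X = rng.normal(size=(173, 4))

# Hypothetical cluster labels 1, 2, 3 (class II, III&IV, V), loosely tied to two features.
score = X[:, 0] - X[:, 2] + rng.normal(scale=1.0, size=173)
y = np.digitize(score, np.quantile(score, [1 / 3, 2 / 3])) + 1

# 1000 trees, as in the figure caption; oob_score=True keeps out-of-bag votes.
forest = RandomForestClassifier(n_estimators=1000, oob_score=True, random_state=0)
forest.fit(X, y)

# Out-of-bag prediction: each sample is voted on only by trees that never saw it.
oob_pred = forest.classes_[np.argmax(forest.oob_decision_function_, axis=1)]

total_accuracy = float(np.mean(oob_pred == y))
# Per-cluster accuracy: share of each true cluster that was predicted correctly.
per_class_accuracy = {
    int(c): float(np.mean(oob_pred[y == c] == c)) for c in forest.classes_
}
# Impurity-based importances, normalised by scikit-learn to sum to 1.
importances = dict(zip(feature_names, forest.feature_importances_))

print(total_accuracy, per_class_accuracy, importances)
```

A two-class variant of the same sketch would relabel `y` as "this cluster" versus "all others" before fitting, once per cluster.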
Figure 2. Multiple linear regression algorithms to predict AI and CI. (a) Multiple linear regression models for AI and CI. Using this statistical method, multiple variables were integrated to construct equations for AI and CI. In the multiple linear regression models of AI and CI, variables were coded as follows: sex: 1 (male) or 2 (female); fever: 0 (absent) or 1 (present) (likewise for edema, arthralgia, OB, SSB, and U1-RNP (nRNP)); and urinary erythrocytes (ERY): 0 (none), 1 (+), 2 (++), 3 (+++). Platelet (PLT) count (10⁹/L), urinary NAG enzyme level (U/L) and BUN (mmol/L) were treated as continuous variables. For each equation, n (number of samples), f score and p (number of variables) are listed. (b) Model validation: the diagrams show how well the prediction models fit the biopsy pathological results under five-fold cross validation. (c) Fitting and five-fold cross validation table for predicting AI and CI, showing satisfactory results for the CI model (Q² = 0.746, R² = 0.771) and the AI model (Q² = 0.516, R² = 0.576). Q² reflects the test set (cross-validation) result and R² reflects the model's fit to its own data; for cross validation, the model data were divided into 5 equal parts and 1/5 was used to test the models.
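The difference between R² and Q² in Figure 2 can be made concrete with a small NumPy sketch. It assumes the common definitions R² = 1 − RSS/TSS on the fitted data and Q² = 1 − PRESS/TSS from five-fold held-out predictions; the source does not spell out its exact formulas, and the data and function names here are synthetic and hypothetical.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_ols(X, y):
    """Ordinary least squares with an intercept column prepended."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def r2_and_q2(X, y, n_folds=5, seed=0):
    # R2: fit on all samples, evaluate on the same samples.
    r2 = r_squared(y, predict_ols(fit_ols(X, y), X))
    # Q2: every sample is predicted by a model that never saw it.
    idx = np.random.default_rng(seed).permutation(len(y))
    cv_pred = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        coef = fit_ols(X[train_idx], y[train_idx])
        cv_pred[test_idx] = predict_ols(coef, X[test_idx])
    return r2, r_squared(y, cv_pred)

# Synthetic example (NOT the study data): 173 samples, 3 predictors.
rng = np.random.default_rng(1)
X = rng.normal(size=(173, 3))
y = 2.0 + X @ np.array([1.5, -1.0, 0.5]) + rng.normal(scale=1.0, size=173)
r2, q2 = r2_and_q2(X, y)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

With ordinary least squares, Q² computed this way cannot exceed R², so the gap between the two gives a quick indication of how much the fit depends on the particular samples used.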
Figure 3. Variable importance in AI and CI prediction. Variable importance for AI (a) and CI (b). Based on the models for AI and CI prediction in Fig. 2, the diagrams show each variable’s influence on the models. We used vector cosine similarity to evaluate the relation of SLEDAI to the other variables in the equations, which showed that SLEDAI is a comparatively independent variable in the AI and CI equations.
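For reference, cosine similarity between two variable vectors can be computed as below. This is only a sketch of the general measure: how the vectors were built in the study (for example, whether each variable was centred or scaled first) is not stated, so the mean-centring option and the example values are assumptions.

```python
import numpy as np

def cosine_similarity(u, v, center=False):
    """Cosine of the angle between u and v; values near 0 suggest little linear overlap."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if center:
        # Assumption: centring removes the shared offset of all-positive clinical values.
        u = u - u.mean()
        v = v - v.mean()
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Made-up values for five hypothetical patients (NOT the study data).
sledai = [12, 8, 20, 15, 10]
other_variable = [150, 210, 180, 160, 230]

print(cosine_similarity(sledai, other_variable))
print(cosine_similarity(sledai, other_variable, center=True))
```

Without centring, vectors of all-positive measurements tend to have a cosine close to 1 regardless of their relationship, which is why the centred variant is included for comparison.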