Ambarish M Athavale, Peter D Hart, Mathew Itteera, David Cimbaluk, Tushar Patel, Anas Alabkaa, Jose Arruda, Ashok Singh, Avi Rosenberg, Hemant Kulkarni.
Abstract
Importance: Interstitial fibrosis and tubular atrophy (IFTA) is a strong indicator of decline in kidney function and is measured by histopathological assessment of a kidney biopsy core. At present, no noninvasive test to assess IFTA is available.

Objective: To develop and validate a deep learning (DL) algorithm to quantify IFTA from kidney ultrasonography images.

Design, Setting, and Participants: This was a single-center diagnostic study of consecutive patients who underwent native kidney biopsy at John H. Stroger Jr. Hospital of Cook County, Chicago, Illinois, between January 1, 2014, and December 31, 2018. A DL algorithm was trained, validated, and tested to classify IFTA from kidney ultrasonography images. Of 6135 Crimmins-filtered ultrasonography images, 5523 were used for training (5122 images) and validation (401 images), and 612 were used to test the accuracy of the DL system. Kidney segmentation was performed using the UNet architecture, and classification was performed using a convolutional neural network-based feature extractor and extreme gradient boosting. IFTA scored by a nephropathologist on trichrome-stained kidney biopsy slides was used as the reference standard. IFTA was divided into 4 grades (grade 1, 0%-24%; grade 2, 25%-49%; grade 3, 50%-74%; and grade 4, 75%-100%). Data analysis was performed from December 2019 to May 2020.

Main Outcomes and Measures: Prediction of IFTA grade was measured using precision, recall, accuracy, and F1 score.
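The 4-grade scale and the image split described in the abstract can be written down as a minimal Python sketch. The function name `ifta_grade` and the constant names are hypothetical, and the handling of fractional percentages (for example, 24.5% falling into grade 1) is an assumption, since the abstract lists only whole-number boundaries.

```python
def ifta_grade(percent: float) -> int:
    """Map an IFTA percentage (0-100) to the 4-grade scale from the abstract.

    Assumption: fractional values fall into the lower bin (24.5 -> grade 1).
    """
    if not 0 <= percent <= 100:
        raise ValueError("IFTA percentage must be between 0 and 100")
    # Boundaries: 0-24 -> 1, 25-49 -> 2, 50-74 -> 3, 75-100 -> 4.
    return min(int(percent // 25) + 1, 4)


# Image counts as reported in the abstract.
TOTAL_IMAGES = 6135
TRAIN_IMAGES = 5122
VALIDATION_IMAGES = 401
TEST_IMAGES = 612

# The three subsets should account for every Crimmins-filtered image.
split_is_consistent = TRAIN_IMAGES + VALIDATION_IMAGES + TEST_IMAGES == TOTAL_IMAGES
```

Capping the result at 4 keeps 100% in the top grade rather than creating a fifth bin.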
Year: 2021 PMID: 34028548 PMCID: PMC8144924 DOI: 10.1001/jamanetworkopen.2021.11176
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure. Overall Analysis Pipeline
The entire process was partitioned into 4 main tasks (green boxes): preprocessing of images, segmentation of kidneys in preprocessed images, feature extraction from masked images, and image classification from feature maps. Subtasks within these main tasks are indicated with italic type. In the feature extraction and image classification phase, a test set of 612 images was generated and was never used in any training. This test set was used for a final independent evaluation of the overall analytical pipeline. US indicates ultrasonography; VGG19, Visual Geometry Group 19; XGBoost, extreme gradient boosting.
Characteristics of the Study Participants
| Characteristic | Participants, No. (%) | | | | P value |
|---|---|---|---|---|---|
| | IFTA 0%-24% (n = 159) | IFTA 25%-49% (n = 74) | IFTA 50%-74% (n = 41) | IFTA ≥75% (n = 78) | |
| Age, mean (SD), y | 42.6 (13.4) | 52.9 (13.5) | 51.7 (13.7) | 49.8 (14.5) | <.001 |
| Sex | |||||
| Female | 93 (59.5) | 43 (57.1) | 21 (51.2) | 36 (48.2) | .36 |
| Male | 66 (40.5) | 31 (42.9) | 20 (48.8) | 42 (51.8) | |
| Race/ethnicity | |||||
| White | 70 (44.2) | 21 (31.2) | 10 (24.4) | 29 (38.6) | .16 |
| Black | 56 (35.6) | 39 (50.7) | 21 (51.2) | 37 (47.0) | |
| Asian | 11 (6.8) | 7 (9.1) | 2 (4.9) | 5 (6.0) | |
| Other | 22 (13.5) | 7 (9.1) | 8 (19.5) | 7 (8.4) | |
| Diabetes | 39 (23.9) | 32 (44.1) | 18 (43.9) | 40 (50.6) | <.001 |
| Hypertension | 101 (63.8) | 68 (90.9) | 40 (97.6) | 71 (91.6) | <.001 |
| Body mass index | 29.7 (7.0) | 29.8 (7.0) | 29.2 (7.1) | 29.8 (6.3) | .88 |
| Creatinine, mg/dL | 1.49 (1.84) | 2.21 (1.43) | 2.41 (0.85) | 4.17 (2.18) | <.001 |
| Estimated glomerular filtration rate, mL/min | 87.3 (51.0) | 41.0 (24.3) | 30.8 (11.2) | 19.2 (10.1) | <.001 |
| Proteinuria, g/g creatinine | 4.06 (3.9) | 4.92 (5.23) | 3.65 (2.76) | 4.90 (4.76) | .23 |
| Biopsy diagnosis | |||||
| Lupus nephritis | 62 (39.0) | 8 (10.8) | 3 (7.3) | 8 (10.3) | <.001 |
| Diabetic nephropathy | 4 (2.5) | 19 (25.7) | 10 (24.4) | 24 (30.8) | <.001 |
| Focal segmental glomerulosclerosis | 14 (8.8) | 20 (27.0) | 12 (29.3) | 8 (10.3) | <.001 |
| IgA nephropathy | 19 (11.9) | 6 (8.1) | 4 (9.8) | 10 (12.8) | .78 |
| Membranous glomerulonephritis | 21 (13.2) | 6 (8.1) | 2 (4.9) | 2 (2.6) | .04 |
| Antineutrophil cytoplasmic antibody vasculitis | 4 (2.5) | 4 (5.4) | 1 (2.4) | 9 (11.5) | .02 |
| Hypertensive nephropathy | 1 (0.6) | 1 (1.4) | 2 (4.9) | 9 (11.5) | <.001 |
| Minimal change disease | 11 (6.9) | 2 (2.7) | 0 | 0 | .02 |
| Ultrasonography images, No. | |||||
| Total | 2701 | 1239 | 701 | 1494 | .20 |
| Per patient | 16.99 | 16.74 | 17.13 | 19.15 | .29 |
Abbreviation: IFTA, interstitial fibrosis and tubular atrophy.
SI conversion factor: To convert creatinine to micromoles per liter, multiply by 88.4.
Other includes American Indian or Alaska Native, Native Hawaiian or Pacific Islander, and participants whose race/ethnicity was not indicated in the medical record.
Body mass index is calculated as weight in kilograms divided by height in meters squared.
Agreement Among Pathologists’ Independent Evaluation of IFTA Scores on Randomly Selected Subsample of Histopathology Slides
| Pathologist 1 | Pathologist 2, No. of slides | | | | Total |
|---|---|---|---|---|---|
| | IFTA 0%-24% | IFTA 25%-49% | IFTA 50%-74% | IFTA ≥75% | |
| IFTA 0%-24% | 26 | 4 | 0 | 0 | 30 |
| IFTA 25%-49% | 5 | 12 | 3 | 1 | 21 |
| IFTA 50%-74% | 0 | 5 | 2 | 0 | 7 |
| IFTA ≥75% | 0 | 6 | 3 | 26 | 35 |
| Total | 31 | 27 | 8 | 27 | 93 |
Abbreviation: IFTA, interstitial fibrosis and tubular atrophy.
Weighted Cohen κ = 0.8360 (SE = 0.1026).
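The weighted κ can be recomputed from the agreement table. The sketch below is not the authors' code: the function name `weighted_kappa` is hypothetical, and the choice of quadratic disagreement weights is an assumption, since the table footnote does not name the weighting scheme. With the counts above, quadratic weights appear to reproduce the reported value.

```python
# Rows: pathologist 1; columns: pathologist 2 (grades ordered 0%-24% ... >=75%).
AGREEMENT = [
    [26, 4, 0, 0],
    [5, 12, 3, 1],
    [0, 5, 2, 0],
    [0, 6, 3, 26],
]


def weighted_kappa(table, power=2):
    """Weighted Cohen kappa with disagreement weights |i - j| ** power.

    power=2 gives quadratic weights (assumed here); power=1 gives linear weights.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]

    observed = 0.0  # weighted disagreement actually seen
    expected = 0.0  # weighted disagreement expected by chance from the margins
    for i in range(k):
        for j in range(k):
            weight = abs(i - j) ** power
            observed += weight * table[i][j] / n
            expected += weight * row_totals[i] * col_totals[j] / n**2
    return 1.0 - observed / expected


kappa = weighted_kappa(AGREEMENT)
print(f"quadratic-weighted kappa = {kappa:.4f}")
```

Quadratic weights penalize a two-grade disagreement four times as heavily as a one-grade disagreement, which suits an ordinal scale such as IFTA grade.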
Predictive Performance of the Deep Learning Model to Quantify Interstitial Fibrosis and Tubular Atrophy
| Metric | Point estimate (95% CI) | | |
|---|---|---|---|
| | Validation set (n = 401) | Test set (n = 612) | Patient level (n = 268) |
| Precision | 0.8936 (0.8634-0.9238) | 0.8927 (0.8682-0.9172) | 0.9003 (0.8644-0.9362) |
| Recall | 0.7646 (0.7231-0.8061) | 0.8037 (0.7722-0.8352) | 0.8421 (0.7984-0.8858) |
| Accuracy | 0.8429 (0.8073-0.8785) | 0.8675 (0.8406-0.8944) | 0.8955 (0.8589-0.9321) |
| F1 score | 0.8054 (0.7667-0.8441) | 0.8389 (0.8098-0.8680) | 0.8639 (0.8228-0.9049) |
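The intervals in this table are consistent with a normal-approximation (Wald) interval that uses each column's n as the sample size. This is an inference from the numbers, not a method stated in the text above, and the function name `wald_interval` is hypothetical. A minimal sketch:

```python
from math import sqrt

Z_95 = 1.96  # two-sided 95% normal quantile


def wald_interval(estimate, n, z=Z_95):
    """Normal-approximation CI for a proportion-like metric with sample size n."""
    half_width = z * sqrt(estimate * (1.0 - estimate) / n)
    return estimate - half_width, estimate + half_width


# Validation-set precision from the table: 0.8936 with n = 401.
low, high = wald_interval(0.8936, 401)
print(f"{low:.4f}-{high:.4f}")
```

Applied to the validation-set precision and to the test-set precision (0.8927, n = 612), the sketch matches the tabulated bounds to 4 decimal places.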
Incremental Value of the DL Model to Predict Interstitial Fibrosis and Tubular Atrophy Class at the Level of Individual Patient
| Characteristic | Baseline model | Alternative model |
|---|---|---|
| Covariates | DL-based predictions | DL-based predictions, age, sex, diabetes, hypertension, body mass index, estimated glomerular filtration rate |
| Likelihood ratio χ² (df) | 341.58 (3) | 395.41 (21) |
| Pseudo R² | 0.5044 | 0.5839 |
| Brier score | 0.0676 | 0.0644 |
| Point estimates (95% CI) | ||
| Precision | 0.8798 (0.8409-0.9187) | 0.8880 (0.8502-0.9258) |
| Recall | 0.8135 (0.7669-0.8601) | 0.8435 (0.8000-0.8870) |
| Accuracy | 0.8843 (0.8460-0.9226) | 0.8918 (0.8546-0.9290) |
| F1 score | 0.8354 (0.7910-0.8798) | 0.8607 (0.8192-0.9022) |
Abbreviation: DL, deep learning.
Results are from multinomial logistic regression analyses with ground truth labels as the dependent variable.
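One way to read the incremental value in this table is a likelihood ratio comparison of the two models. The sketch below assumes the baseline model is nested in the alternative model and that both were fitted to the same patients; neither assumption is stated in the table. The helper `chi2_sf_even_df` is a hypothetical name for the closed-form upper tail of a χ² distribution, which is valid only for even degrees of freedom. The last two lines check whether the pseudo R² values behave like McFadden's R² (χ² divided by the null deviance), which is also an assumption.

```python
from math import exp


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable; even df only.

    Uses P(X > x) = exp(-x/2) * sum_{i < df/2} (x/2)**i / i!.
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return exp(-half) * total


# Likelihood ratio chi-square (df) values from the table.
baseline_chi2, baseline_df = 341.58, 3
alternative_chi2, alternative_df = 395.41, 21

delta_chi2 = alternative_chi2 - baseline_chi2  # gain from the added covariates
delta_df = alternative_df - baseline_df
p_value = chi2_sf_even_df(delta_chi2, delta_df)
print(f"delta chi2 = {delta_chi2:.2f} on {delta_df} df, p = {p_value:.2g}")

# If pseudo R2 is McFadden's, chi2 / R2 should give the same null deviance twice.
null_deviance_baseline = baseline_chi2 / 0.5044
null_deviance_alternative = alternative_chi2 / 0.5839
```

Under these assumptions the added clinical covariates improve model fit (difference of 53.83 on 18 df), while the point estimates of precision, recall, accuracy, and F1 score change only modestly.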