Identification of glomerulosclerosis using IBM Watson and shallow neural networks.

Francesco Pesce¹, Federica Albanese², Davide Mallardi², Michele Rossini², Giuseppe Pasculli²,³, Paola Suavo-Bulzis², Antonio Granata⁴, Antonio Brunetti⁵, Giacomo Donato Cascarano⁵, Vitoantonio Bevilacqua⁵, Loreto Gesualdo⁶.

Abstract

BACKGROUND: Advanced stages of different renal diseases feature glomerular sclerosis at the histological level, which is observed by light microscopy on tissue samples obtained by kidney biopsy. Computer-aided diagnosis (CAD) systems leverage the potential of artificial intelligence (AI) in healthcare to support physicians in the diagnostic process.
METHODS: We propose a novel CAD system that processes histological images and discriminates between sclerotic and non-sclerotic glomeruli. To this end, we designed, tested, and compared two artificial neural network (ANN) classifiers. The first is a shallow ANN that classifies hand-crafted features extracted from Regions of Interest (ROIs) by image-processing procedures. The second employs the IBM Watson Visual Recognition System, which uses a deep artificial neural network that takes the images directly as input, so no procedure for describing the images with features has to be designed. The input dataset consisted of 428 sclerotic and 2344 non-sclerotic glomeruli derived from images of kidney biopsies scanned with the Aperio ScanScope System.
RESULTS: Both AI approaches distinguished very accurately between sclerotic and non-sclerotic glomeruli (mean MCC 0.95 and mean Accuracy 0.99). Although the two systems may seem interchangeable, the approach based on feature extraction and classification would allow clinicians to gain information on the most discriminating features. In fact, further procedures could explain the classifier's decision by analysing which subset of features had the greatest impact on it.
CONCLUSIONS: We developed a customizable support system that can facilitate the work of renal pathologists both in clinical and research settings.

Keywords: Artificial intelligence; Glomerulosclerosis; IBM Watson; Renal biopsy

Year: 2022 PMID: 35041197 PMCID: PMC8765108 DOI: 10.1007/s40620-021-01200-0

Source DB: PubMed Journal: J Nephrol ISSN: 1121-8428 Impact factor: 4.393

Introduction

The primary histologic indicators of irreparable renal injury include interstitial fibrosis and tubular atrophy (IFTA) and glomerulosclerosis, which is the final pathological alteration of chronic kidney diseases [1]. Glomerulosclerosis is characterised by the deposition of scar tissue that replaces the renal parenchyma, and renal pathologists quantify it to indicate the presence and extent of renal damage. However, this assessment can vary among pathologists [2-5], and decisions are often based on grading systems that may be applied differently at different institutions. Several previous studies have applied various morphometric methods to improve the reproducibility and accuracy of IFTA assessment [6-9], and Machine Learning (ML) algorithms have already been successfully applied to glomerular segmentation by different research groups [7, 10–13], including a whole-slide classifier that directly replicates a pathologist's assessment of IFTA and glomerulosclerosis on renal biopsy specimens [14]. In this work, we aimed to design a computer-aided diagnosis (CAD) system based on Artificial Intelligence (AI) to detect glomerulosclerosis automatically. Specifically, we designed, developed, tested, and compared two types of classifiers, both based on artificial neural networks (ANNs), i.e. ML algorithms capable of learning tasks from examples of input data and the desired output [15]. The first (feature-based) approach implements a pipeline that extracts features describing the input images and classifies the images based on these features. The second approach employs the IBM Watson Visual Recognition (WVR) framework, which makes decisions directly from the input images without the need to design procedures for extracting hand-crafted features. IBM WVR uses Deep Learning (DL) algorithms to analyse and classify images [16, 17].

Materials and methods

Altogether, 26 kidney biopsies performed between 07/2011 and 02/2015 at the Department of Emergency and Organ Transplantations (DETO) of the Bari University Hospital were used. All kidney biopsies were fixed in formalin, embedded in paraffin and stained with Periodic Acid-Schiff (PAS) [18]. For each biopsy, several 2–3 µm thick sections were cut from different levels of the tissue (at least 3 levels, with 3 sections per level). All biopsies were processed at the same institution, but slides were stained at different times (i.e. whenever an organ donation occurred). Each slide was scanned with an Aperio ScanScope at 20× with a resolution of 0.50 µm/pixel. On each slide, glomeruli were identified and manually annotated by two independent renal pathologists using the Aperio Image Scope tool, and labelled as "sclerotic" or "non-sclerotic" (Fig. 1a). After manual labelling, we developed a MATLAB script to extract the Regions of Interest (ROIs) used in the subsequent stages.
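The authors' MATLAB ROI-extraction script is not reproduced in the text. As a rough illustration only, the Python sketch below reads polygon annotations and crops each glomerulus's bounding box from an image tile held in memory as a NumPy array. It assumes an ImageScope-style XML layout in which each `<Region>` carries its label in a `Text` attribute and lists `<Vertex X=".." Y=".."/>` points in pixel coordinates. The function names `parse_regions` and `crop_roi` and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

import numpy as np

MICRONS_PER_PIXEL = 0.50  # scan resolution reported for the 20x Aperio scans


def parse_regions(xml_text):
    """Return (label, vertices) pairs from an ImageScope-style annotation string.

    Assumed layout: each <Region> has a Text attribute holding the class label
    and <Vertex X=".." Y=".."/> descendants in pixel coordinates.
    """
    root = ET.fromstring(xml_text)
    regions = []
    for region in root.iter("Region"):
        label = region.get("Text", "")
        vertices = np.array(
            [(float(v.get("X")), float(v.get("Y"))) for v in region.iter("Vertex")]
        )
        regions.append((label, vertices))
    return regions


def crop_roi(image, vertices, margin=0):
    """Crop the axis-aligned bounding box of a polygon, clipped to the image."""
    height, width = image.shape[:2]
    x0 = max(int(np.floor(vertices[:, 0].min())) - margin, 0)
    y0 = max(int(np.floor(vertices[:, 1].min())) - margin, 0)
    x1 = min(int(np.ceil(vertices[:, 0].max())) + margin, width)
    y1 = min(int(np.ceil(vertices[:, 1].max())) + margin, height)
    return image[y0:y1, x0:x1]


# Hypothetical example: one triangular "sclerotic" annotation on a blank tile.
xml_text = """<Annotations><Annotation><Regions>
<Region Id="1" Text="sclerotic"><Vertices>
<Vertex X="10" Y="20"/><Vertex X="50" Y="20"/><Vertex X="50" Y="60"/>
</Vertices></Region>
</Regions></Annotation></Annotations>"""

tile = np.zeros((100, 100, 3), dtype=np.uint8)
label, vertices = parse_regions(xml_text)[0]
roi = crop_roi(tile, vertices)
roi_width_um = roi.shape[1] * MICRONS_PER_PIXEL  # 40 px * 0.5 um/px = 20 um
print(label, roi.shape, roi_width_um)  # sclerotic (40, 40, 3) 20.0
```

A whole slide is usually too large to hold in memory, so in practice the crop would be read from the scan with a whole-slide image reader; that part is left out here.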

Fig. 1

a Glomeruli annotation. In the pre-processing stage, glomeruli were manually annotated by two renal pathologists: non-sclerotic glomeruli were marked in green and sclerotic glomeruli in yellow. b (upper right quadrant) Feature-based classification approach. c (lower right quadrant) IBM Watson Visual Recognition workflow.

The final dataset included 2772 glomeruli, 428 sclerotic and 2344 non-sclerotic, a class ratio of about 1:5.5. For clarity, we considered sclerotic glomeruli as the Positive class and non-sclerotic glomeruli as the Negative class. We split the dataset into two parts: a training set (about 80% of the entire dataset) to train the classifiers, and a test set (about 20%) to evaluate classification performance. For the IBM WVR system, 10% of the training samples were randomly withdrawn to create a validation set, which was used to choose the best models to test. Supplementary Tables 1 and 2 show the number of samples in each subset for both approaches. Only the test set was used to assess the final performance, and the same test set was used for both approaches, allowing a direct comparison of the two models.
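The text reports an approximate 80/20 train/test split and, for WVR, a further 10% validation hold-out taken from the training set; it does not say whether the split was stratified by class. The sketch below assumes stratification, which keeps the roughly 1:5.5 class ratio in every subset. The function name and the fixed seed are illustrative, so the resulting subset sizes (for example, 555 test glomeruli) differ from the 579 test glomeruli used in the paper.

```python
import random


def stratified_split(labels, test_fraction=0.2, val_fraction=0.1, seed=0):
    """Split sample indices class by class into train, validation and test lists.

    The test set is drawn first (about `test_fraction` of each class); the
    validation set is then drawn from the remaining training samples (about
    `val_fraction` of them), mirroring the hold-out used for the WVR models.
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test += indices[:n_test]
        remaining = indices[n_test:]
        n_val = round(len(remaining) * val_fraction)
        val += remaining[:n_val]
        train += remaining[n_val:]
    return train, val, test


# Positive class (1) = sclerotic, negative class (0) = non-sclerotic.
labels = [1] * 428 + [0] * 2344
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # 1995 222 555
```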

Feature-based classification approach

The feature-based approach extracts features from the input images through image-processing techniques and then classifies them with a supervised ML algorithm, namely a shallow ANN, to characterise and distinguish sclerotic and non-sclerotic glomeruli. We designed and developed this model in three steps: (i) feature extraction, (ii) feature reduction and (iii) glomeruli classification. The workflow is described in Fig. 1b. For feature extraction, two morphological characteristics related to Bowman's capsule and Bowman's space were extracted after image-processing procedures made necessary by the PAS staining of the images. In addition, 148 textural features based on the well-known multi-radial colour LBP (mrcLBP) and Haralick algorithms were computed. Because of the high dimensionality of these data, the feature space then had to be reduced. Feature reduction keeps the most useful information, namely what contributes most to discrimination, while removing irrelevant or redundant features. To this end, we performed Principal Component Analysis (PCA), which reduced the data considered in the subsequent phase to 95 components explaining 99.9% of the variance of the input data. We then built a shallow ANN with one hidden layer. To select a suitable number of hidden neurons, we trained and cross-validated (tenfold cross-validation) the ANN while varying the number of neurons from 1 to 95 (the number of input features), and chose the configuration with the highest average Matthews Correlation Coefficient (MCC), i.e. the ANN with 27 neurons in the hidden layer.
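The PCA step keeps the smallest number of components whose cumulative explained variance reaches 99.9%. A NumPy sketch of that selection rule follows. It assumes the 150 features (2 morphological plus 148 textural) are already stacked into a samples-by-features matrix, fits the projection on training data only, and does not standardize the features, since the text does not say whether that was done. The function names are hypothetical and the data are synthetic.

```python
import numpy as np


def fit_pca(X_train, variance_to_keep=0.999):
    """Fit PCA and keep the fewest components reaching `variance_to_keep`."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal axes of the centred training data.
    _, singular_values, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative ratio reaches the target, as a count.
    k = int(np.searchsorted(cumulative, variance_to_keep)) + 1
    k = min(k, len(singular_values))  # guard against floating-point round-off
    return mean, Vt[:k], cumulative[k - 1]


def transform(X, mean, components):
    """Project samples (training or test) onto the retained principal axes."""
    return (X - mean) @ components.T


# Synthetic stand-in for a 150-feature matrix driven by only 10 latent factors.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 10)) @ rng.normal(size=(10, 150))
mean, components, kept = fit_pca(X_train)
Z = transform(X_train, mean, components)
print(components.shape[0], Z.shape, round(float(kept), 4))
```

Test samples must be projected with the mean and axes learned on the training set, which is why fitting and transforming are separate functions.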
All configurations shared the following training hyperparameters: weight and bias initialization with the Nguyen-Widrow algorithm; weight and bias updates with the Scaled Conjugate Gradient algorithm; cross-entropy loss; hyperbolic tangent sigmoid transfer function in the hidden layer and softmax in the output layer; training stopped as soon as any of the following criteria was met: 100 training epochs reached, 6 consecutive validation failures, loss equal to 0, or gradient reaching 10⁻⁶. The MATLAB Deep Learning Toolbox™ was used to design, train and validate the ANN architectures. To assess the robustness of the implemented workflow, we performed tenfold cross-validation followed by a hard-voting procedure to classify the test set. Furthermore, we performed ten runs of the classification pipeline to evaluate how performance varied with the data contained in the folds. Specifically, the training dataset was split into ten folds; in turn, nine folds were used to train the network and the remaining fold to validate it. Each test sample was then classified by majority voting among the ten classifiers, i.e. it was assigned the class supported by most of them. In this feature-based approach, we addressed the class imbalance of the dataset with two complementary strategies. First, the MCC was used as the general performance measure for comparing folds. The MCC measures the quality of binary (two-class) classifications using all four entries of the confusion matrix (true and false positives and negatives); it is therefore generally regarded as a balanced measure that can be used even when the classes are of very different sizes [19]. Second, we used the Receiver Operating Characteristic (ROC) curve to choose the classification threshold.
ROC curves plot the True Positive Rate (TPR) against the False Positive Rate (FPR) as the threshold the classifier uses to make its decision is varied. Selecting the most suitable threshold, such as the one yielding the highest Area Under the Curve (AUC), allowed us to reduce the classifier's bias towards the majority class. MATLAB source code is available at the following GitHub repository: https://github.com/LabInfInd/glomerulosclerosis_identification_watson_ann.git.
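The two ensemble-level steps described above, combining the ten fold classifiers by hard (majority) voting and choosing a decision threshold from the ROC curve, can be sketched as follows. The text does not state how 5-5 ties are resolved or which criterion picked the threshold. Because the AUC summarizes the whole curve rather than a single operating point, this sketch swaps in Youden's J statistic (maximizing TPR − FPR), one common criterion, and breaks ties towards the sclerotic class. Both choices are assumptions, and all function names are hypothetical.

```python
from collections import Counter

import numpy as np


def majority_vote(predictions, tie_label=1):
    """Hard voting: predictions[m][i] is classifier m's label for sample i.

    Labels: 1 = sclerotic (positive), 0 = non-sclerotic (negative).
    Ties go to `tie_label` (an assumption; the paper does not specify).
    """
    combined = []
    for votes in zip(*predictions):  # all classifiers' votes for one sample
        ranked = Counter(votes).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            combined.append(tie_label)
        else:
            combined.append(ranked[0][0])
    return combined


def roc_points(y_true, scores):
    """FPR and TPR for every distinct score used as threshold (score >= t is positive)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    thresholds = np.unique(scores)[::-1]  # from strictest to most lenient
    positives = np.sum(y_true == 1)
    negatives = np.sum(y_true == 0)
    tpr = np.array([np.sum((scores >= t) & (y_true == 1)) for t in thresholds]) / positives
    fpr = np.array([np.sum((scores >= t) & (y_true == 0)) for t in thresholds]) / negatives
    return fpr, tpr, thresholds


def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule, starting at (0, 0)."""
    x = np.concatenate([[0.0], fpr])
    y = np.concatenate([[0.0], tpr])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


def youden_threshold(fpr, tpr, thresholds):
    """Threshold maximising Youden's J = TPR - FPR."""
    return float(thresholds[np.argmax(tpr - fpr)])


# Toy example: votes of ten fold classifiers on three test glomeruli.
votes_per_sample = [[1] * 7 + [0] * 3, [1] * 2 + [0] * 8, [1] * 5 + [0] * 5]
predictions = [list(col) for col in zip(*votes_per_sample)]  # one list per classifier
print(majority_vote(predictions))  # [1, 0, 1]; the last sample is a 5-5 tie

# Toy scores for six samples (1 = sclerotic).
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.80, 0.70, 0.90]
fpr, tpr, thresholds = roc_points(y_true, scores)
print(auc_trapezoid(fpr, tpr), youden_threshold(fpr, tpr, thresholds))  # 0.875 0.7
```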

IBM Watson Visual Recognition

Unlike the previous method, IBM WVR uses DL algorithms to analyse and classify images [16, 17]. DL is a branch of ML focused on models with deep architectures, characterised by multiple layers capable of extracting features that describe the input data at higher levels of abstraction, e.g. Convolutional Neural Networks or Deep Neural Networks [20]. Training and using a classification model on the IBM WVR system requires five steps:
1. Prepare training data: sort images into positive or negative images. A set of images related to the classification task has to be collected; to optimize training, the images should have similar size, resolution and colour palette. Two training sets are then created, one with the positive images (containing the features the classifier should recognize) and one with negative examples (without those features). The two training sets should not overlap.
2. Train and create new models: upload examples as training data. The two sets are uploaded to the WVR service available on IBM Cloud, which automatically trains its neural network on the positive and negative examples. At the end of this stage, a custom model is available in the Recognition service.
3. Prepare images: gather images to analyse. Once the model is trained, any set of images can be uploaded to the Recognition service to be classified.
4. Analyse images: use the built-in capabilities or a custom model. The trained custom model classifies each image of the uploaded set.
5. View results: review the insights into your visual content. For each analysed image, the system returns the associated class, a set of information characterising the input image and the features recognized in it.
The workflow is depicted in Fig. 1c.
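Step 1 above amounts to building two disjoint collections of example images. The sketch below packages them into two in-memory zip archives and rejects overlapping sets, checking both file names and file contents. Packaging the examples as one zip per set reflects our understanding of how the WVR service accepted training data, so treat it as an assumption; the upload call itself is deliberately omitted rather than reconstructed from memory, and the file names are hypothetical.

```python
import io
import zipfile


def _zip_bytes(files):
    """Write a {file name: bytes} mapping into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in sorted(files.items()):
            archive.writestr(name, data)
    return buffer.getvalue()


def build_training_zips(positive_files, negative_files):
    """Return (positive_zip, negative_zip) after checking the sets do not overlap."""
    shared_names = set(positive_files) & set(negative_files)
    shared_content = set(positive_files.values()) & set(negative_files.values())
    if shared_names or shared_content:
        raise ValueError("positive and negative example sets must not overlap")
    return _zip_bytes(positive_files), _zip_bytes(negative_files)


# Placeholder bytes stand in for encoded glomerulus images.
positives = {"sclerotic_001.png": b"img-a", "sclerotic_002.png": b"img-b"}
negatives = {"non_sclerotic_001.png": b"img-c"}
pos_zip, neg_zip = build_training_zips(positives, negatives)
print(zipfile.ZipFile(io.BytesIO(pos_zip)).namelist())
# ['sclerotic_001.png', 'sclerotic_002.png']
```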
Thanks to the versatility of IBM's proprietary WVR algorithm, which allows users to train Watson AI on almost any visual content to create custom analysis models, we designed several models by combining the following variables: image colour (PAS staining or grayscale-converted images, to understand how much the classifier relies on colour to discriminate); image size (original or resized to 224 × 224 pixels, following Watson's guidelines); class definition, either binary (only the sclerotic class is defined, with non-sclerotic images provided as negative examples) or multi-class (sclerotic and non-sclerotic glomeruli defined as two separate classes); and the number of images (to balance the number of samples per class, as suggested in the WVR guidelines). Regarding this last point, since the IBM WVR guidelines suggest creating models with an approximately equal number of positive and negative cases in the training set, we applied data augmentation to the sclerotic glomeruli images. Data augmentation increases the number of samples in a dataset by creating synthetic images, in this case by applying image transformations to the available samples. This allowed us to balance the roughly 300 sclerotic and 1600 non-sclerotic images in the training set. In the end, two training datasets were created: the first was obtained by subsampling the majority class, randomly selecting 313 non-sclerotic (negative) glomeruli alongside the 307 sclerotic (positive) images (model 300, no data augmentation needed); the second was generated with data augmentation and contained 1667 negative and 1607 positive images (model 1600).
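The text does not list the image transformations used for augmentation. Assuming flips and 90° rotations (glomerulus sections have no canonical orientation, and these transformations do not interpolate pixels), the sketch below grows the 307 sclerotic training images to the 1607 of model 1600 by sampling synthetic variants without replacement. The function names are hypothetical.

```python
import random

import numpy as np


def dihedral_variants(image):
    """The 7 non-identity rotations/flips of an image (dihedral group of the square)."""
    rotations = [np.rot90(image, k) for k in range(4)]  # 0, 90, 180, 270 degrees
    reflections = [np.fliplr(r) for r in rotations]
    return rotations[1:] + reflections


def augment_to(images, target, seed=0):
    """Return the originals plus randomly sampled variants, `target` images in total."""
    needed = target - len(images)
    if needed <= 0:
        return list(images)
    pool = [variant for image in images for variant in dihedral_variants(image)]
    if needed > len(pool):
        raise ValueError("not enough distinct variants; add more transformations")
    return list(images) + random.Random(seed).sample(pool, needed)


# Tiny asymmetric stand-ins for the 307 sclerotic ROIs.
originals = [np.arange(4).reshape(2, 2) + 10 * i for i in range(307)]
augmented = augment_to(originals, target=1607)
print(len(augmented))  # 1607
```

With 7 variants per image the pool holds 2149 candidates, enough for the 1300 synthetic images needed here.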

Results

Different metrics were used to evaluate the performance of both classifiers: Accuracy, Precision, Recall, Specificity, F1 score and, as already mentioned, the MCC, all computed from the True Positive, True Negative, False Positive and False Negative counts of the Confusion Matrix. Table 1 reports the average and best performance of the feature-based classifier, whose confusion matrices are reported in Supplementary Tables 3a and 3b (for the best and worst cases, respectively). The results show that the feature-based ANN discriminated sclerotic and non-sclerotic glomeruli with high performance (mean MCC = 0.95 and mean Accuracy = 0.99) and low variability (MCC std = 0.01 and Accuracy std < 0.01).
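The metrics listed above follow directly from the four confusion-matrix counts. The sketch below computes them, with the sclerotic glomerulus as the positive class. The example counts (TP = 82, FP = 3, TN = 489, FN = 5) are not published in the text; they are back-calculated so that they reproduce the IBM WVR test results in Table 3 on the 87 sclerotic and 492 non-sclerotic test glomeruli, and serve only as a consistency check.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Metrics from confusion-matrix counts (positive = sclerotic glomerulus)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # sensitivity, true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC uses all four cells, which is why it stays informative under class imbalance.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Counts consistent with the IBM WVR test results (87 sclerotic and 492
# non-sclerotic test glomeruli), back-calculated from the reported metrics.
metrics = binary_metrics(tp=82, fp=3, tn=489, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# accuracy: 0.9862, precision: 0.9647, recall: 0.9425,
# specificity: 0.9939, f1: 0.9535, mcc: 0.9455
```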

Table 1

Performance of the feature-based approach

Metric	Mean ± std	Best
Accuracy	0.9874 ± 0.0018	0.9914
Precision	0.9844 ± 0.0111	1.0000
Recall	0.9310 ± 0.0153	0.9425
MCC	0.9501 ± 0.0074	0.9659
Specificity	0.9974 ± 0.0019	1.0000
F1-score	0.9568 ± 0.0065	0.9659

Average Precision and Recall were 0.98 (± 0.01) and 0.93 (± 0.02), respectively, showing better performance in the identification of non-sclerotic glomeruli (in the best case, all non-sclerotic glomeruli were correctly classified).

For the IBM WVR approach, we created eight balanced classifiers for each of the two training datasets, considering the different images obtained through the processing described in the Methods section, for a total of 16 models. Specifically, for each dataset there were the following models: multi-class, PAS staining and original size; multi-class in grayscale and original size; multi-class resized and PAS staining; multi-class resized in grayscale; binary, PAS staining and original size; binary, PAS staining and resized; binary in grayscale and original size; binary resized in grayscale. A validation test, identical for all 16 models, was carried out to choose the best-performing one; Table 2 reports the results in terms of recall and specificity. Each classifier returns a score between 0 and 1 indicating Watson's confidence that an image belongs to a certain class, and every test used a cut-off of 0.5: if the score was > 0.5, the glomerulus was considered as belonging to the tested class. Based on the models' performance on the validation set (Table 2), we evaluated on the test set only the best-performing model, i.e. the "binary resized in grayscale model".
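The 16 WVR models form a full factorial design over four two-level factors (training dataset, class definition, colour, image size), and each model's confidence score is turned into a decision with the 0.5 cut-off. The enumeration below only makes that design explicit; the labels are descriptive strings chosen for illustration, not WVR parameters.

```python
from itertools import product

DATASETS = ("model 300", "model 1600")
TECHNIQUES = ("binary", "multi-class")
COLOURS = ("PAS staining", "grayscale")
SIZES = ("original size", "resized 224x224")

# 2 x 2 x 2 x 2 = 16 model configurations.
configurations = [
    {"dataset": d, "technique": t, "colour": c, "size": s}
    for d, t, c, s in product(DATASETS, TECHNIQUES, COLOURS, SIZES)
]


def belongs_to_class(score, cutoff=0.5):
    """Apply the cut-off: only a score strictly above 0.5 assigns the tested class."""
    return score > cutoff


print(len(configurations))  # 16
print([belongs_to_class(s) for s in (0.2, 0.5, 0.93)])  # [False, False, True]
```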

Table 2

Performance of IBM Watson on the validation dataset

Model configuration	MODEL 300 Specificity (%)	MODEL 300 Recall (%)	MODEL 1600 Specificity (%)	MODEL 1600 Recall (%)
Multiclass, PAS staining and original size	97.14	94.12	100	97.06
Binary, PAS staining and original size	97.14	97.06	100	97.06
Multiclass in grayscale and original size	97.14	97.06	100	100
Multiclass resized and PAS staining	97.14	91.18	100	97.06
Multiclass resized in grayscale	97.14	97.06	100	94.12
Binary, PAS staining and resized	97.14	97.06	99.46	100
Binary in grayscale and original size	97.14	97.06	100	100
Binary resized in grayscale	97.14	97.06	100	100

The performance obtained on the test set with intermediate augmented datasets is reported in terms of Accuracy, Precision, Recall, MCC, Specificity and F1-score in Supplementary Table 4.

Comparison between IBM Watson Visual Recognition and feature-based model

The test set used to compare the performance of the two classification approaches was the same for both models and consisted of 492 non-sclerotic glomeruli and 87 sclerotic glomeruli. The results of the comparison between IBM WVR and the feature-based model are reported in Table 3, in terms of average performance (± standard deviation).

Table 3

Comparison between IBM WVR and the feature-based model

Metric	IBM Visual Recognition	Feature-based
Precision	0.9647	0.9844 ± 0.0111
Recall	0.9425	0.9310 ± 0.0153
Specificity	0.9939	0.9974 ± 0.0019
Accuracy	0.9862	0.9874 ± 0.0018
MCC	0.9455	0.9501 ± 0.0074
F1	0.9535	0.9568 ± 0.0065

For the IBM WVR classifier, the metrics of the basic 300-image model are reported, to avoid any interference from data augmentation.

Evaluation metrics were good and comparable between the two systems, and both classification approaches reached high levels of performance. IBM WVR showed a higher recall, whereas precision was higher with the feature-based model. Both models, however, performed better in the identification of non-sclerotic glomeruli. Most misclassifications of both classifiers were due to low-quality images caused by technical artefacts, which even renal pathologists misinterpret and commonly discard in clinical practice (Supplementary Fig. 1).

Discussion

Since its advent, AI has been recognized as a valid tool to assist the processing of virtually every data modality and, ultimately, to enhance the human capability of handling and making sense of such data. ML, and more specifically DL, have long been part of our daily routines: computer vision tasks [21] (such as object detection, face recognition, and action and activity recognition), voice recognition on smartphones, and vehicle autopilots [22]. Healthcare, too, has acknowledged the potential of AI to support the most diverse tasks (such as diagnosis, therapeutic strategies and patient management) quickly and cost-effectively. AI can potentially be applied to every medical speciality; imaging has been one of the most prolific fields [23], especially in oncology (e.g. thoracic imaging, breast lesions [24], colonoscopy, brain tumours). IBM WVR has been successfully trained to detect abnormalities and extract textural features of altered lung parenchyma that could be related to specific signatures of COVID-19 [25]. The implementation and development of digital pathology, too, have been driven by progress in ML and DL [26], and several AI systems have already been developed to assist physicians [27, 28]. Through the development of computational image analysis tools for tissue interrogation, AI and ML have brought pathology to the forefront of this process of redefining nephrology [29, 30]. In fact, AI applied to image processing can offer many advantages in terms of accuracy and workload management for renal pathologists, and may also help discover novel biomarkers in research settings [31-35]. Furthermore, the recent gathering of the Banff Digital Pathology Working Group demonstrates the strong interest in AI and will help advance the use of such techniques in specific renal pathology fields [36] (e.g. renal transplantation).
For the automated analysis of kidney images, we propose a system that focuses on glomerulosclerosis. We tested two different ANN approaches, and both classifiers performed well in recognising glomerulosclerosis and discriminating between normal and sclerotic glomeruli. Although recent literature has shown that DL methodologies can outperform traditional ML approaches [37-39], our results show that, in this case, the performance of the feature-based approach remains comparable to that of the IBM WVR system, and that this approach may offer some advantages. In particular, describing the regions with discriminative features, together with the pipeline implemented to design the classifier, made the feature-based model quite robust. Furthermore, shallow ANNs seem to be more precise and more flexible, since the algorithm can be customized according to the number and quality of the desired features. These advantages make the feature-based model potentially adaptable to other fields of application and levels of complexity. Another interesting finding of this study is that even a general-purpose visual analytics tool such as IBM WVR produced very accurate results, although particular care had to be devoted to preparing the training set. Despite the user-friendly interface of IBM WVR, choosing the best-performing model was not straightforward, and up to 16 models derived from different combinations of key input parameters were prepared and tested. Namely, we varied the colour (the original histological staining or greyscale), the size (original or resized as suggested by the IBM WVR guidelines), the class definition ("binary", when only one class is defined, e.g. the sclerotic glomerulus versus anything else, or "multi-class", when sclerotic and non-sclerotic glomeruli are both separate classes to be recognized), and the number of images (since roughly 50/50 positive and negative examples are recommended).
This last parameter was particularly challenging. To balance the dataset we used data augmentation, but this technique did not yield a linear improvement in performance, as shown by the MCC profile across the different models (Supplementary Table 4). To the best of our knowledge, this is the first study exploiting and comparing the IBM WVR system for this particular task. Future extensions of the system will include the recognition of intermediate sclerotic lesions and of other renal compartments, and the automatic annotation of glomeruli, since different ANNs have already been created for this purpose [40, 41].

Electronic supplementary material: Supplementary file 1 (DOCX, 501 KB).

References

1. Reproducibility of the Banff schema in reporting protocol biopsies of stable renal allografts.

Authors: James Gough; David Rush; John Jeffery; Peter Nickerson; Rachel McKenna; Kim Solez; Kiril Trpkov
Journal: Nephrol Dial Transplant Date: 2002-06 Impact factor: 5.992

2. Region-Based Convolutional Neural Nets for Localization of Glomeruli in Trichrome-Stained Whole Kidney Sections.

Authors: John D Bukowy; Alex Dayton; Dustin Cloutier; Anna D Manis; Alexander Staruschenko; Julian H Lombard; Leah C Solberg Woods; Daniel A Beard; Allen W Cowley
Journal: J Am Soc Nephrol Date: 2018-06-19 Impact factor: 10.121

3. Morphometric and visual evaluation of fibrosis in renal biopsies.

Authors: Alton B Farris; Catherine D Adams; Nicole Brousaides; Patricia A Della Pelle; A Bernard Collins; Ellie Moradi; R Neal Smith; Paul C Grimm; Robert B Colvin
Journal: J Am Soc Nephrol Date: 2010-11-29 Impact factor: 10.121

4. Artificial intelligence in radiology (Review).

Authors: Ahmed Hosny; Chintan Parmar; John Quackenbush; Lawrence H Schwartz; Hugo J W L Aerts
Journal: Nat Rev Cancer Date: 2018-08 Impact factor: 60.716

5. Automated Computational Detection of Interstitial Fibrosis, Tubular Atrophy, and Glomerulosclerosis.

Authors: Brandon Ginley; Kuang-Yu Jen; Seung Seok Han; Luís Rodrigues; Sanjay Jain; Agnes B Fogo; Jonathan Zuckerman; Vighnesh Walavalkar; Jeffrey C Miecznikowski; Yumeng Wen; Felicia Yen; Donghwan Yun; Kyung Chul Moon; Avi Rosenberg; Chirag Parikh; Pinaki Sarder
Journal: J Am Soc Nephrol Date: 2021-02-23 Impact factor: 10.121

6. Artificial intelligence and machine learning in nephropathology (Review).

Authors: Jan U Becker; David Mayerich; Meghana Padmanabhan; Jonathan Barratt; Angela Ernst; Peter Boor; Pietro A Cicalese; Chandra Mohan; Hien V Nguyen; Badrinath Roysam
Journal: Kidney Int Date: 2020-04-01 Impact factor: 10.612

7. Computational Segmentation and Classification of Diabetic Glomerulosclerosis.

Authors: Brandon Ginley; Brendon Lutnick; Kuang-Yu Jen; Agnes B Fogo; Sanjay Jain; Avi Rosenberg; Vighnesh Walavalkar; Gregory Wilding; John E Tomaszewski; Rabi Yacoub; Giovanni Maria Rossi; Pinaki Sarder
Journal: J Am Soc Nephrol Date: 2019-09-05 Impact factor: 14.978

8. Segmentation of Glomeruli Within Trichrome Images Using Deep Learning.

Authors: Shruti Kannan; Laura A Morgan; Benjamin Liang; McKenzie G Cheung; Christopher Q Lin; Dan Mun; Ralph G Nader; Mostafa E Belghasem; Joel M Henderson; Jean M Francis; Vipul C Chitalia; Vijaya B Kolachalama
Journal: Kidney Int Rep Date: 2019-04-15

9. Promises of Big Data and Artificial Intelligence in Nephrology and Transplantation.

Authors: Charat Thongprayoon; Wisit Kaewput; Karthik Kovvuru; Panupong Hansrivijit; Swetha R Kanduri; Tarun Bathini; Api Chewcharat; Napat Leeaphorn; Maria L Gonzalez-Suarez; Wisit Cheungpasitporn
Journal: J Clin Med Date: 2020-04-13 Impact factor: 4.241

10. Digital pathology and computational image analysis in nephropathology (Review).

Authors: Laura Barisoni; Kyle J Lafata; Stephen M Hewitt; Anant Madabhushi; Ulysses G J Balis
Journal: Nat Rev Nephrol Date: 2020-08-26 Impact factor: 28.314
