Multivariable Diagnostic Prediction Model to Detect Hormone Secretion Profile From T2W MRI Radiomics with Artificial Neural Networks in Pituitary Adenomas.

Begumhan Baysal¹, Mehmet Bilgin Eser¹, Mahmut Bilal Dogan¹, Muhammet Arif Kursun¹.

Abstract

Objective: This study aims to develop neural networks to detect hormone secretion profiles in pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.
Methods: This retrospective model-development study included a cohort of patients with pituitary adenomas (n=130) from January 2015 to January 2020 in one tertiary center. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. Three observers segmented lesions on coronal T2 weighted MRI, and interrater agreement was evaluated using the Dice coefficient. The predictors were radiomics features (n=851). Feature selection was based on the intraclass correlation coefficient, coefficient of variation, variance inflation factor, and LASSO regression analysis. The outcomes were 7 hormone secretion profiles [non-functioning pituitary adenoma, growth hormone-secreting adenomas, prolactinomas, adrenocorticotropic hormone-secreting adenomas, pluri-hormonal secreting adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas, and thyroid-stimulating hormone adenomas]. A multivariable diagnostic prediction model was developed with artificial neural networks (ANN) for the 7 outcomes. ANN performance was reported as the area under the receiver operating characteristic curve (AUC) and was considered successful if the AUC was >0.85 and the p-value was <0.01.
Results: The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). The AUC values for the other five ANNs were >0.85, with p values <0.001.
Conclusions: This study was successful in training neural networks that could differentiate the hormone secretion profile of pituitary adenomas.
© Copyright Istanbul Medeniyet University Faculty of Medicine.

Keywords: Pituitary adenoma; artificial intelligence; machine learning; magnetic resonance imaging; radiomics

Year: 2022 PMID: 35306784 PMCID: PMC8939455 DOI: 10.4274/MMJ.galenos.2022.58538

Source DB: PubMed Journal: Medeni Med J ISSN: 2149-4606

INTRODUCTION

Pituitary adenoma is the second most common primary central nervous system tumor and constitutes approximately 14% of intracranial masses[1,2,3]. Pituitary adenomas are classified by size as microadenomas (<1 cm), macroadenomas (≥1 cm), and giant adenomas (>4 cm). Additionally, 54-62% of pituitary adenomas are actively hormone-secreting tumors [growth hormone-secreting adenomas (GHSA), prolactinomas, adrenocorticotropic hormone-secreting adenomas (ACTHSA), pluri-hormonal adenomas (PHA), follicle-stimulating hormone and luteinizing hormone-secreting adenomas (FSH&LHSA), and thyroid-stimulating hormone adenomas (TSHA)], and 38-46% are non-functioning[4,5,6]. Although the hormone secretion profile can be determined from plasma hormone concentrations, many pituitary adenomas are now detected incidentally because of the increasing use of radiological imaging[7]. For these tumors, it may be possible to estimate the hormone secretion profile at the time of imaging by exploiting the tumor heterogeneity visible on the images[8]. Radiomics is a quantitative approach that extracts a large number of features from medical images and enables the development of diagnostic tools[8]. Its success in determining tumor subtypes has been studied and confirmed in other tumors[9,10]. In addition, a limited number of recent studies developed radiomics-based models to predict tumor consistency in patients with pituitary adenoma[11,12,13,14,15]. In prior radiomics studies, the stability of radiomics features was evaluated only as interrater agreement with the intraclass correlation coefficient[11], which may be inadequate to establish feature stability. A recent statement proposed that stable features should also have high precision and accuracy[16]. Therefore, building diagnostic models on stable radiomics features may improve reproducibility, precision, and accuracy.
This study aims to develop neural networks to detect hormone secretion profiles in pituitary adenomas based on T2 weighted magnetic resonance imaging (MRI) radiomics.

MATERIALS and METHODS

Ethical Considerations

This retrospective model-development study was approved by the Istanbul Medeniyet University Goztepe Prof. Dr. Suleyman Yalcin Training and Research Hospital Clinical Research Ethics Committee (decision no: 2020/0304, date: 05.18.2020), and the requirement for written informed consent was waived. The study was reported according to the STARD 2015 statement, and the white papers and statements of multiple societies were followed[17,18,19,20,21]. The study scored 18/36 on the radiomics quality score[17].

Study Population and Data Collection

This model-development study was carried out in a single tertiary-care center. Of the patients documented between January 2015 and January 2020, 130 who met the inclusion criteria were included in the study. The inclusion criteria were: 1. an available MRI examination of the patient including T2W sequences; 2. image quality sufficient to allow segmentation. Patients diagnosed in our center but imaged at another center were excluded. The MRI protocol is described in Table 1.

Table 1

Magnetic resonance imaging protocol for pituitary gland used in the study.

Predictors: Analysis of the T2W Images

Three radiologists with 8 years, 3 years, and 1 year of experience performed segmentation using 3D Slicer software, version 4.10.2 (https://www.slicer.org). Segmentation was performed volumetrically on T2W images. The 851 radiomics features used as the predictors of this study were extracted with PyRadiomics (version 2.2.0). All feature classes in this module (shape, first-order, and higher-order) were selected. Resampling was applied, normalization was enabled, and wavelet-based filters were activated (Figure 1).
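One PyRadiomics configuration consistent with this description is sketched below. It is an illustration under stated assumptions, not the authors' parameter file: the resampling spacing, interpolator, and bin width are not reported in the text and are placeholders here. The reported count of 851 features matches enabling all feature classes on the original image plus the eight first-level wavelet decompositions of a 3D volume: 14 shape features are computed once, and 93 first-order and texture features are computed for each of the 9 image types, giving 14 + 9 × 93 = 851. The per-class counts are assumed from the PyRadiomics 2.x feature documentation and should be checked against the installed version.

```python
# Hedged sketch of a PyRadiomics setup consistent with the text: all feature
# classes, normalization, resampling, and wavelet filters enabled.
# ASSUMPTION: spacing, interpolator and binWidth are illustrative values that
# the article does not report.

# Non-deprecated feature counts per class (assumed from PyRadiomics 2.x docs).
FEATURES_PER_CLASS = {
    "firstorder": 18,
    "glcm": 24,
    "glrlm": 16,
    "glszm": 16,
    "gldm": 14,
    "ngtdm": 5,
}
SHAPE_FEATURES = 14          # 3D shape features, computed once on the mask
WAVELET_DECOMPOSITIONS = 8   # LLL, LLH, ..., HHH for a 3D volume


def expected_feature_count(use_wavelet=True):
    """Shape features once, plus first-order/texture features per image type."""
    image_types = 1 + (WAVELET_DECOMPOSITIONS if use_wavelet else 0)
    return SHAPE_FEATURES + image_types * sum(FEATURES_PER_CLASS.values())


SETTINGS = {
    "normalize": True,                   # stated in the text
    "resampledPixelSpacing": [1, 1, 1],  # ASSUMPTION: value not reported
    "interpolator": "sitkBSpline",       # ASSUMPTION
    "binWidth": 25,                      # ASSUMPTION (PyRadiomics default)
}


def extract_features(image_path, mask_path):
    """Run PyRadiomics on one T2W volume and its segmentation mask.

    The import is local so the rest of this sketch runs without PyRadiomics.
    """
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor(**SETTINGS)
    extractor.enableAllFeatures()
    extractor.enableImageTypes(Original={}, Wavelet={})
    result = extractor.execute(image_path, mask_path)
    # Drop the "diagnostics_*" metadata entries and keep numeric features only.
    return {k: float(v) for k, v in result.items()
            if not k.startswith("diagnostics_")}
```

Calling `extract_features` on a T2W volume and its mask (for example, files exported from 3D Slicer) requires PyRadiomics to be installed; `expected_feature_count` runs without it.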

Figure 1

Pipeline of the study.

GH: Growth hormone, ACTH: Adrenocorticotropic hormone, TSH: Thyroid-stimulating hormone, FSH/LH: Follicle-stimulating hormone/luteinizing hormone, PRL: Prolactin

Outcomes

Outcomes were identified as 7 hormone secretion profiles [non-functioning pituitary adenoma (n=19), GHSA (n=21), prolactinomas (n=64), ACTHSA (n=6), PHA (n=6), FSH&LHSA (n=8), and TSHA (n=6)].
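The seven outcomes were modeled with seven neural networks, and the results describe each one as distinguishing a profile from the other adenomas (for example, prolactinomas versus all others). Under that one-vs-rest reading, which is an assumption of this sketch, the reported counts translate into very different positive-class prevalences. The short example below computes them from the counts in the text and checks that they add up to the 130 included patients.

```python
# Reported outcome counts (from the text). The one-vs-rest framing is an
# assumption based on "distinguishing prolactinomas from other adenomas".
COUNTS = {
    "NFPA": 19,
    "GHSA": 21,
    "Prolactinoma": 64,
    "ACTHSA": 6,
    "PHA": 6,
    "FSH&LHSA": 8,
    "TSHA": 6,
}


def one_vs_rest_prevalence(counts):
    """Fraction of positive cases when each profile is compared with all others."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, p in one_vs_rest_prevalence(COUNTS).items():
        print(f"{name:<13} positives={COUNTS[name]:>3}  prevalence={p:.1%}")
```

Prolactinomas give a nearly balanced problem (64/130, about 49%), whereas ACTHSA, PHA, and TSHA each make up 6/130 (about 4.6%) of cases, in line with the imbalance noted in the Discussion.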

Feature Stability Analyses: Interobserver Agreement Evaluation and Coefficient of Variation Analysis

Interobserver agreement was assessed separately for segmentations and for radiomics features. For segmentations, the Dice similarity coefficient was used to measure interobserver reliability; for radiomics features, the intraclass correlation coefficient (ICC - 3,k) with a two-way random effects model and absolute agreement was used[22]. Features with an ICC >0.75 were carried forward to the coefficient of variation (CoV) analysis, and those with a CoV >15% were eliminated[16]. The predictor features that passed the CoV analysis were subjected to Spearman’s correlation (SC) analysis, and the resulting correlation matrices were used for the variance inflation factor (VIF) analysis.
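The three quantities used in this step can be sketched in plain Python: the Dice similarity coefficient between two binary masks flattened to 0/1 voxel lists, an average-measures ICC from a subjects × raters table of one feature, and the coefficient of variation with the 15% cut-off. The text labels the ICC as ICC(3,k) but describes a two-way random-effects, absolute-agreement model; the sketch implements the model described in words, which McGraw and Wong denote ICC(A,k) and Shrout and Fleiss denote ICC(2,k). Which measurements are pooled for the CoV is not specified, so `coefficient_of_variation` simply takes a list of values supplied by the caller. All function names are hypothetical.

```python
import statistics


def dice(mask_a, mask_b):
    """Dice = 2|A and B| / (|A| + |B|) for equal-length 0/1 voxel sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if total == 0 else 2.0 * intersection / total


def icc_a_k(table):
    """Two-way, absolute-agreement, average-measures ICC (McGraw & Wong ICC(A,k)).

    table: n subjects (rows) x k raters (columns) values of one radiomics feature.
    """
    n, k = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (ms_cols - ms_error) / n)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean."""
    mean = statistics.fmean(values)
    return float("inf") if mean == 0 else statistics.stdev(values) / abs(mean)


def stable_features(icc_by_feature, cov_by_feature, icc_min=0.75, cov_max=0.15):
    """Keep features with ICC > 0.75 whose CoV does not exceed 15%."""
    return [name for name, icc in icc_by_feature.items()
            if icc > icc_min and cov_by_feature[name] <= cov_max]
```

A rater who is consistently offset from another (for example, always one unit higher) still yields a consistency ICC of 1, but the absolute-agreement form penalizes the offset; with the table `[[1, 2], [2, 3], [3, 4]]` it gives 0.8.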

Feature Selection Analyses: Collinearity-multicollinearity Evaluation and Least Absolute Shrinkage and Selection Operator Regression

VIF analysis was performed to reduce collinearity and multicollinearity, using VIF = 1/(1 − R²), where R² is the coefficient of determination obtained by regressing a feature on the other features. Features with a VIF above 10 were eliminated[23]; in this elimination process, the features with smaller CoV were preserved. Next, the relation between the retained features and the outcomes was evaluated using SC analysis (p<0.01). Features were then selected with least absolute shrinkage and selection operator (LASSO) regression with L1 regularization. Random sampling and 5-fold cross-validation were used in fitting the LASSO model.
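The VIF rule can be illustrated with a small plain-Python sketch. For feature j, VIF = 1/(1 − R²_j); for standardized features this equals the j-th diagonal element of the inverse of the correlation matrix, which is what `vifs_from_correlation` uses instead of fitting one regression per feature. The elimination loop is an assumption about how "features with smaller CoV were preserved" was applied: while any VIF exceeds 10, it drops the feature with the largest CoV among those above the threshold and recomputes. The article built its correlation matrices from Spearman’s correlation; passing such a matrix gives a rank-based analogue of the classical (Pearson-based) VIF.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (plain Python, small matrices)."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect collinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs_from_correlation(corr):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    inv = invert(corr)
    return [inv[j][j] for j in range(len(corr))]


def eliminate_collinear(names, corr, cov, threshold=10.0):
    """Iteratively drop features while any VIF exceeds the threshold.

    ASSUMPTION: among the features above the threshold, the one with the
    largest CoV is dropped, so that features with smaller CoV are preserved.
    """
    keep = list(range(len(names)))
    while len(keep) > 1:
        sub = [[corr[i][j] for j in keep] for i in keep]
        vifs = vifs_from_correlation(sub)
        high = [keep[k] for k, v in enumerate(vifs) if v > threshold]
        if not high:
            break
        keep.remove(max(high, key=lambda i: cov[i]))
    return [names[i] for i in keep]
```

With two features correlated at r = 0.96, both VIFs are about 12.8, so the one with the larger CoV is removed and the remaining features fall back to a VIF of 1.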

Structuring Artificial Neural Networks

Multilayer perceptron and radial basis function networks were selected for training. The software automatically determined the number of layers, the number of neurons, the error function, the hidden activation function, and the output activation function of these models. For each training session, the software used a random number generator to assign 70% of the patients to the training set, 15% to the test set, and 15% to the validation (hold-out) set. These subsets had similar distributions of predictors and outcomes. Hyperparameter tuning was performed with the “early stopping” algorithm, which trains the neural network on the “training” set and evaluates it on the “test” set at the end of each epoch. Training continues as long as the error decreases in both sets and is terminated when the error starts to increase in the “test” set. Finally, network performance is measured on the validation (hold-out) set.
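The data split and the early-stopping rule can be sketched as below. Statistica's internal procedure is not documented in the text, so this is a generic reconstruction: the split is stratified by outcome (one way to obtain subsets with similar outcome distributions), and the training loop stops as soon as the test-set error rises. `run_epoch` and `test_error` are hypothetical callbacks standing in for a real network.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=None):
    """Randomly assign indices to train/test/hold-out sets class by class,
    so each subset has a similar outcome distribution (stratification is an
    assumption about how the "similar distribution" was achieved)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test, holdout = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * fractions[0])
        n_test = round(len(indices) * fractions[1])
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        holdout += indices[n_train + n_test:]  # remainder goes to the hold-out set
    return train, test, holdout


def train_with_early_stopping(run_epoch, test_error, max_epochs=1000):
    """Train until the error on the test set starts to increase.

    run_epoch(): performs one pass over the training set (hypothetical callback).
    test_error(): returns the current test-set error (hypothetical callback).
    Returns the number of epochs kept and the best test error.
    """
    best = test_error()
    kept_epochs = 0
    for epoch in range(1, max_epochs + 1):
        run_epoch()
        err = test_error()
        if err > best:
            break  # test error rose; a real trainer would restore the best weights
        best, kept_epochs = err, epoch
    return kept_epochs, best
```

With the rounding used here, a profile with six cases, such as ACTHSA, contributes four patients to training and one each to the test and hold-out sets.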

Statistical Analysis

Statistical analyses and neural network development were performed using TIBCO Statistica version 13.0.5 (TIBCO Software, Palo Alto, CA). The neural networks with the highest diagnostic accuracy are reported with the area under the receiver operating characteristic curve (AUC) and 95% confidence intervals. In receiver operating characteristic analysis, a neural network was considered a validated classifier if its AUC was >0.85 and its p-value was <0.01[16].
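The ROC quantities and the study's success rule can be sketched in plain Python. The AUC is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties count one half), and the confidence interval uses the Hanley and McNeil (1982) standard error with a normal approximation. Statistica may compute the interval and the p-value differently, so this is an illustration rather than a reproduction of the reported values; the p-value is taken as an input.

```python
import math


def auc_mann_whitney(pos_scores, neg_scores):
    """AUC = P(score of a positive > score of a negative), ties counted as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for the AUC using the Hanley & McNeil standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc) + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    se = math.sqrt(max(var, 0.0))
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


def sensitivity_specificity(pos_scores, neg_scores, threshold):
    """Positives at or above the threshold, negatives below it."""
    sens = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    spec = sum(s < threshold for s in neg_scores) / len(neg_scores)
    return sens, spec


def meets_study_criterion(auc, p_value):
    """The study's success rule: AUC > 0.85 and p < 0.01."""
    return auc > 0.85 and p_value < 0.01
```

Under this rule, the prolactinoma network (AUC=0.95, p<0.001) qualifies, whereas the PHA network (AUC=0.74) does not despite its small p-value.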

RESULTS

Patient Characteristics

This study included 130 consecutive patients with pituitary adenoma. The mean age was 46.49±13.69 years, and 76/130 (58.46%) were women. All patients were Caucasian. A full summary of the clinicopathologic characteristics of the patients is presented in Table 2.

Table 2

Characteristics of the participants.

Model Development and Specification

The interobserver median Dice coefficients for segmentation were 0.84 [interquartile range (IQR): 0.06] between observers 1 and 2, 0.84 (IQR: 0.17) between observers 1 and 3, and 0.79 (IQR: 0.20) between observers 2 and 3. ICC analysis eliminated 204 features (ICC <0.75), CoV analysis eliminated 552 features (CoV >0.15), and VIF analysis eliminated another 44 features because of collinearity. Thus, most of the radiomics features were unstable (n=800, 94%). The stable predictors (n=51) and all outcomes were entered into the correlation analysis, and correlation matrices were created to evaluate the unadjusted relation between each candidate predictor and the outcomes (Figure 2). In this analysis, all SC coefficients were below 0.30, and only five predictors had p<0.01 (Figure 3). Finally, LASSO regression was used for regularization, and the most relevant predictors were selected for neural network training.
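The stability funnel leaves 851 − 204 − 552 − 44 = 51 predictors (800 of 851, about 94%, removed), and LASSO then keeps only predictors with non-zero coefficients. A minimal cyclic coordinate-descent LASSO, minimizing (1/2n)‖y − Xβ‖² + λ‖β‖₁ on centred and standardized data, is sketched below. It is a generic textbook algorithm, not the Statistica implementation used in the study, and in practice λ would be chosen by cross-validation (the text mentions 5-fold cross-validation); the function names are hypothetical.

```python
STABLE_PREDICTORS = 851 - 204 - 552 - 44  # counts reported in the text


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X: n rows of p predictor values (assumed centred/standardized);
    y: n outcome values (assumed centred). Returns the p coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # r = y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation with the partial residual (feature j's own term added back).
            rho = sum(X[i][j] * residual[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def selected(names, beta):
    """Predictors with non-zero coefficients are the ones LASSO keeps."""
    return [name for name, b in zip(names, beta) if b != 0.0]
```

On two orthogonal predictors with true effects 2.0 and 0.1, a penalty of λ = 0.5 shrinks the first coefficient to 1.5 and sets the second exactly to zero, which is how weak predictors drop out of the selected set.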

Figure 2

Correlation matrix between predictors and outcomes.

GH: Growth hormone, PRL: Prolactin, ACTH: Adrenocorticotropic hormone, FSH/LH: Follicle-stimulating hormone/luteinizing hormone, TSH: Thyroid-stimulating hormone

Figure 3

Heatmap of the predictors. Each predictor is coded with a variable number; the list of variables is available in a supplemental file. The heatmap, based on Spearman rank correlation analysis, shows the highly collinear variables that were eliminated by VIF analysis.

VIF: Variance inflation factor

Diagnostic Prediction Model Results

The performance of the ANN distinguishing prolactinomas from other adenomas was validated (AUC=0.95, p<0.001, sensitivity: 91%, and specificity: 98%). The model distinguishing PHA had the lowest AUC (AUC=0.74 and p<0.001). Results of seven neural networks are presented in Table 3.

Table 3

Neural network performance results.

DISCUSSION

The main finding of this study was that prolactinomas, which accounted for about half of the included patients, were predicted with high accuracy on the basis of heterogeneity in T2W MRI images. In contrast, the model distinguishing PHA had the lowest AUC. The difficulty in distinguishing these tumors, which contain more than one cell population, suggests that the results are not random but related to tissue heterogeneity. Studies on the classification of pituitary adenomas from MRI images are limited[11,12,13,14,15]. Four of these studies investigated tumor consistency as assessed at surgical excision of the adenomas[11,13,14,15]. In a study of 89 macroadenomas, Cuocolo et al.[11] predicted the outcomes of 28 patients in the test group; only two soft tumors were misclassified as fibrous, and all fibrous tumors were correctly classified. Fan et al.[14] reported that adding clinical data such as age, sex, and hormone levels to the model improved its accuracy. These results suggest that patients who may require repeat surgery can be identified by imaging at an early phase of the disease. This information can increase the surgeon’s confidence in surgical planning and help reduce residual tumor and recurrence rates. A second benefit is that patients can be informed about their tumor’s consistency and the possible need for repeat surgery in the future. In another study, Peng et al.[12] used T1W, contrast-enhanced T1W, and T2W MRI images with three different machine learning algorithms to predict three immunohistochemical classes of pituitary adenomas preoperatively. They observed that the T2W radiomics-based model had the highest accuracy and that the best classifier was the support vector machine. Considering these results, we conducted our study with T2W radiomics features and neural networks. Currently, radiomics studies are facing a reproducibility crisis.
Therefore, the European Society of Radiology (ESR) recently published a statement on the validation of imaging biomarkers, including radiomics[16]. Cuocolo et al.[11] and Zeynalova et al.[13] evaluated the reproducibility of radiomics features using the ICC and included features with ICC >0.75 and ICC >0.90, respectively. Peng et al.[12], Fan et al.[14], and Rui et al.[15] did not evaluate the reproducibility of radiomics features. In this study, we followed the ESR statement to evaluate feature stability; accordingly, we eliminated high-variance features using CoV and highly collinear features using VIF analysis[16]. Although Cuocolo et al.[11] did not treat variance and collinearity as stability criteria, they also eliminated such features, as we did. The incidence of incidental adenomas is increasing owing to the increasing frequency of imaging[7]. Detecting the secretion profile and consistency of these lesions at the time of imaging can help accelerate patient management. Because several studies have already addressed tumor stiffness and consistency, we focused on the secretion profile in this study[11,13,14,15]. We hypothesized that the cells determining the secretion profile could be detected by quantitative analysis, and we consider that predicting PHA with the lowest accuracy and prolactinomas with the highest accuracy supports this hypothesis. Because each pluri-hormonal tumor contains different proportions of different secretory cell types, imaging-based profiling is limited, whereas profiling of a tumor containing a single cell type, such as a prolactinoma, is successful. This study had several limitations. First, prolactinomas were found in about half of the patients, so the prolactinoma network was trained on a balanced class distribution, whereas the other networks were not.
Second, the ground truth was plasma hormone levels, because our patient population consisted of patients admitted to the endocrinology outpatient clinic. Third, the study was conducted at a single center. Nevertheless, the radiomics features were subjected to rigorous stability analyses to increase reproducibility and precision, and internal validation methods were used in training the neural networks to increase accuracy.

CONCLUSIONS

In the future, this and previous studies will accumulate into a larger, interconnected body of evidence, allowing much more quantitative data to be obtained from patients than is currently possible. Until then, we need to increase our quantitative data and rigorously test the reproducibility, precision, and accuracy of our imaging biomarkers. This study shows that the ANN distinguishes prolactinomas from other pituitary adenomas with an AUC of 0.95.

REFERENCES (23 in total)

1. Quality of science and reporting of radiomics in oncologic studies: room for improvement according to radiomics quality score and TRIPOD statement.

Authors: Ji Eun Park; Donghyun Kim; Ho Sung Kim; Seo Young Park; Jung Youn Kim; Se Jin Cho; Jae Ho Shin; Jeong Hoon Kim
Journal: Eur Radiol Date: 2019-07-26

2. MR textural analysis on contrast enhanced 3D-SPACE images in assessment of consistency of pituitary macroadenoma.

Authors: Wenting Rui; Yue Wu; Zengyi Ma; Yongfei Wang; Yin Wang; Xiao Xu; Junhai Zhang; Zhenwei Yao
Journal: Eur J Radiol Date: 2018-12-06

3. CBTRUS statistical report: primary brain and central nervous system tumors diagnosed in the United States in 2005-2009.

Authors: Therese A Dolecek; Jennifer M Propp; Nancy E Stroup; Carol Kruchko
Journal: Neuro Oncol Date: 2012-11

4. Ethics of Artificial Intelligence in Radiology: Summary of the Joint European and North American Multisociety Statement.

Authors: J Raymond Geis; Adrian P Brady; Carol C Wu; Jack Spencer; Erik Ranschaert; Jacob L Jaremko; Steve G Langer; Andrea Borondy Kitts; Judy Birch; William F Shields; Robert van den Hoven van Genderen; Elmar Kotter; Judy Wawira Gichoya; Tessa S Cook; Matthew B Morgan; An Tang; Nabile M Safdar; Marc Kohli
Journal: Radiology Date: 2019-10-01

5. Incidental pituitary adenomas.

Authors: Walavan Sivakumar; Roukoz Chamoun; Vinh Nguyen; William T Couldwell
Journal: Neurosurg Focus Date: 2011-12

6. Radiomics: extracting more information from medical images using advanced feature analysis.

Authors: Philippe Lambin; Emmanuel Rios-Velazquez; Ralph Leijenaar; Sara Carvalho; Ruud G P M van Stiphout; Patrick Granton; Catharina M L Zegers; Robert Gillies; Ronald Boellard; André Dekker; Hugo J W L Aerts
Journal: Eur J Cancer Date: 2012-01-16

7. Development and Validation of a Radiomics Nomogram for Preoperative Prediction of Lymph Node Metastasis in Colorectal Cancer.

Authors: Yan-Qi Huang; Chang-Hong Liang; Lan He; Jie Tian; Cui-Shan Liang; Xin Chen; Ze-Lan Ma; Zai-Yi Liu
Journal: J Clin Oncol Date: 2016-05-02

8. What the radiologist should know about artificial intelligence - an ESR white paper.

Authors: European Society of Radiology (ESR)
Journal: Insights Imaging Date: 2019-04-04

9. Preoperative Noninvasive Radiomics Approach Predicts Tumor Consistency in Patients With Acromegaly: Development and Multicenter Prospective Validation.

Authors: Yanghua Fan; Min Hua; Anna Mou; Miaojing Wu; Xiaohai Liu; Xinjie Bao; Renzhi Wang; Ming Feng
Journal: Front Endocrinol (Lausanne) Date: 2019-06-28

10. ESR Statement on the Validation of Imaging Biomarkers.

Authors: European Society of Radiology (ESR)
Journal: Insights Imaging Date: 2020-06-04