
Image-driven classification of functioning and nonfunctioning pituitary adenoma by deep convolutional neural networks.

Hongyu Li¹,², Qi Zhao¹, Yihua Zhang³, Ke Sai⁴, Lunshan Xu³, Yonggao Mou⁴, Yubin Xie¹, Jian Ren¹, Xiaobing Jiang¹,⁴,⁵.

Abstract

The secreting function of pituitary adenomas (PAs) plays a critical role in determining treatment strategies. However, magnetic resonance imaging (MRI) analysis of pituitary adenomas is labor intensive and highly variable among radiologists. In this work, we applied convolutional neural networks (CNNs) to build segmentation and classification models that help distinguish functioning pituitary adenomas from nonfunctioning subtypes, using 3D MRI images from 185 patients with PAs (two centers). Specifically, the classification model adopts transfer learning and uses the pre-trained segmentation model to extract deep features from conventional MRI images. Both the segmentation and classification models performed well in two internal validation datasets and an external testing dataset (segmentation model: Dice score = 0.8188, 0.8091 and 0.8093, respectively; classification model: AUROC = 0.8063, 0.7881 and 0.8478, respectively). In addition, the classification model incorporates an attention mechanism for better model interpretation. Taken together, this work provides the first deep learning-based tumor region segmentation and classification models for PAs, enabling early diagnosis and subtyping of PAs from MRI images.


Keywords: Deep learning; MRI; Pituitary adenomas

Year: 2021 PMID: 34136106 PMCID: PMC8178077 DOI: 10.1016/j.csbj.2021.05.023

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Pituitary adenomas (PAs) account for approximately 15% of all intracranial tumors [1]. Recently, the prevalence of PAs has increased to 115 cases per 100,000. The increase is probably due to the rising use of diagnostic medical imaging and enhanced awareness [2]. Clinically, PAs may cause considerable mortality due to their mass effects and the hypersecretion of one or more pituitary hormones [3]. Nonfunctioning pituitary adenomas (NFPAs) are usually associated with mass effects, including headache, visual defects, and the development of hypopituitarism. Secretory adenomas produce one or more pituitary hormones such as prolactin, growth hormone (GH), adrenocorticotropic hormone (ACTH), and thyroid-stimulating hormone (TSH), causing characteristic clinical symptoms, including loss of libido, hyperthyroidism, acromegaly and Cushing’s syndrome [2]. Treatment is often determined by the size of the lesion and the status of the secreting function. Therefore, early detection and management of pituitary adenomas are important for improving the prognosis of patients with PAs. Contrast-enhanced magnetic resonance imaging (MRI) is the mainstream method for evaluating the location and size of PAs. The GrowCut algorithm is freely available as a module for the medical image computing platform 3D Slicer [4] and has been used in a recent study to segment PAs on MRI [5]. Assessment of pituitary hormones is another essential factor in determining the types of adenomas and their treatment modalities. However, analysis of MRI images is labor intensive and highly variable among radiologists. In addition, clinical testing of all pituitary hormones is usually time consuming and costly, and remains unavailable in many local medical centers. Therefore, an artificial intelligence system could help radiologists obtain more reliable PA diagnoses from conventional MRI images, and thus save time and cost.
Over the past decades, deep architectures [6] have garnered a great amount of attention in various fields due to their representational power. Deep learning methods, especially convolutional neural networks, have shown great potential in medical tasks such as cancer classification, tumor segmentation and survival prediction [7], [8], [9], [10]. Furthermore, reports have demonstrated that a computer-aided diagnosis (CAD) system can accurately diagnose PAs from MRI images [11]. From the LeNet architecture [12] to residual-style networks [13], [14], [15], network architectures have become deeper and wider to obtain richer representations. On large labelled datasets such as ImageNet [16], [17] and Microsoft COCO [18], CNNs have shown good performance in different computer vision tasks. However, CNNs cannot be trained efficiently from scratch on medical images because the datasets are small. In the small-dataset scenario, an effective way to apply CNNs to medical image classification is transfer learning [19]. Transfer learning is a deep learning approach in which a model trained for one task is used as the starting point for training a model on a similar task. Fine-tuning a pre-trained network is usually more computationally efficient than training a network from scratch. Transfer learning techniques have been successful in several medical applications, such as the diagnosis of Alzheimer’s disease [20], magnetic resonance (MR) image segmentation [21] and microscopy image analysis [22]. Previous studies have used two types of transfer learning approaches: (i) using off-the-shelf CNN models trained on a large dataset of natural images as feature extractors and training a separate learning method for classification [23], [24], [25], [26]; and (ii) fine-tuning pre-trained CNNs on a medical image database [27], [28].
Although increasing the depth and width of a network architecture can help improve model performance, such models tend to produce many redundant features and are difficult to converge. Many researchers have therefore investigated a different aspect of architecture design, termed attention. The significance of attention mechanisms has been studied extensively in the literature [29], [30]. Introducing attention mechanisms into the network architecture is more computationally efficient than making the network deeper and wider. In this study, we employed a 3D deep learning algorithm to generate three fully automated segmentation models based on conventional MRI images. We then used transfer learning to obtain feature representations from the trained segmentation models and thereby enhance classification accuracy. Our method differs from approach (i) discussed above. We did not adopt transfer learning from natural 2D images, because we wanted to make full use of the 3D contextual information of MRI images. Instead, we used pre-trained segmentation CNN models, which were trained on a relatively large number of 3D MRI patches, to initialize the classification models and then fine-tuned the classification models. In addition, we adopted an attention module to automatically refine the extracted features, so that the classification model focuses on features that carry significant information.

Materials and methods

Study cohorts

From January 2017 to March 2021, patients with PAs surgically treated in Sun Yat-sen University Cancer Center (SYSUCC) were retrospectively reviewed. The following inclusion criteria were used: (1) pathological confirmation of a pituitary adenoma, (2) a completed evaluation of pituitary hormones, and (3) availability of four contrast MRI data (T1-weighted image, contrast-enhanced T1-weighted image, T2-weighted image, and T2-weighted FLAIR image). All the procedures in the current study were approved by the ethics committee of SYSUCC. Written informed consent was obtained from all the patients. A total of 168 patients from SYSUCC were enrolled and divided into training (n = 100), internal validation 1 (n = 44) and validation 2 (n = 24) sets. For external independent testing, a total of 17 patients from Daping Hospital, Army Medical University who were treated surgically from January 2018 to December 2019 were recruited as a testing set (Fig. 1).

Fig. 1

Schematic overview of the study.

Clinical and laboratory evaluation

Demographic and clinical-pathological variables (see Sup. Table 1), including age, sex, pathological diagnoses, hormone levels and tumor size were collected using an electronic medical record system. Venous blood samples were collected in the morning after a 12-hour fast. Hormone assays including prolactin, cortisol, TSH, free T4 (fT4), GH, luteinizing hormone (LH), follicle-stimulating hormone (FSH), estradiol and total testosterone were measured with methods described previously [31]. The diagnosis of a pituitary adenoma was based on physical examination, contrast magnetic resonance imaging, hormone assay, and pathological observation. Accordingly, PAs were further classified into functioning and nonfunctioning adenomas based on their hormone-secreting status [2], [32]. The diagnosis of prolactinoma was confirmed by constantly increasing prolactin levels (>100 ng/ml). The diagnosis of a GH secreting adenoma was based on an increased GH level, a lack of GH suppression under 1 ng/ml during an oral glucose load (75 g), and an IGF-1 level over reference for sex and age [33]. The invasiveness of the tumor was evaluated according to the Knosp’s classification [34].

MRI data collection and process

MRI was performed on a 1.5 or 3.0 T system. Four sequences, T1W, contrast-enhanced T1W (T1CE), T2W and T2W-FLAIR, were acquired for further analysis. The MRI images in the training, validation and testing datasets were preprocessed as follows: (a) DICOM files were converted to NIfTI format. (b) All MRI volumes were rigidly registered to the same T2 anatomic template and resampled to 1 mm voxel resolution with the Oxford Center for Functional MRI of the Brain’s (FMRIB) Linear Image Registration Tool (FLIRT) [35], [36] from the FMRIB Software Library [37], [38], [39]. (c) The volumes of all modalities were skull-stripped using the Brain Extraction Tool (BET) [40]. (d) N4BiasCorrection [41] was applied to all MRI volumes to remove field inhomogeneity. (e) Intensities were rescaled to the range [0, 1]. The tumor contours for the 185 subjects were manually labeled and validated by two neurosurgeons (XB Jiang and YH Zhang). All segmentation was performed using the MITK software [42] and took about 30 min per subject. To deal with ambiguities in the definition of tumor contours, all subjects were labeled by both neurosurgeons and the results were subsequently fused to obtain a single consensus tumor contour for each subject. To take advantage of the 3D contextual information of the MRI, we randomly extracted patches from the preprocessed T1W, T2W, FLAIR and T1CE images on the axial, sagittal and coronal views as the input images for the segmentation task. Four cropped patches (96 × 96 × 96 voxels), one from each of the four sequences, were randomly extracted for each subject on the fly during every training iteration. In total, each segmentation model therefore used 4 × 30,000 patches for training. Because normal tissues contribute little to classifying the secreting types of PAs, the classification model was trained only with patches (96 × 96 × 96 voxels) extracted around the tumor center, which was calculated from the tumor mask. In total, 4 × 3,000 patches were therefore used to train each classification model.
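The last preprocessing step and the tumor-centered cropping described above can be sketched in a few lines. This is only an illustrative sketch, not the pipeline used in the study: the function names are hypothetical, the tumor center is assumed to be the centroid of the mask voxels, and the patch is assumed to be shifted inward when the tumor lies near the volume border (the text does not say how borders were handled).

```python
def rescale_unit(values):
    """Min-max rescale a flat list of intensities to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant image has no contrast; map everything to 0.
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def tumor_center(mask_voxels):
    """Rounded centroid of the tumor voxel coordinates (z, y, x).

    Assumption: 'tumor center' means the centroid of the mask.
    """
    n = len(mask_voxels)
    return tuple(round(sum(v[axis] for v in mask_voxels) / n) for axis in range(3))


def patch_bounds(center, volume_shape, patch_size=96):
    """(start, stop) indices per axis for a cubic patch around `center`.

    The patch is shifted so that it stays inside the volume (an assumption).
    """
    bounds = []
    for c, dim in zip(center, volume_shape):
        if dim < patch_size:
            raise ValueError("volume is smaller than the patch along one axis")
        start = c - patch_size // 2
        start = max(0, min(start, dim - patch_size))
        bounds.append((start, start + patch_size))
    return bounds


# Hypothetical volume shape and tumor location, for illustration only.
shape = (182, 218, 182)
center = tumor_center([(100, 100, 170), (102, 102, 172)])
print(center, patch_bounds(center, shape))
```

Clamping the start index keeps every patch at exactly 96 voxels per side, which a fixed-input network requires; zero-padding the volume would be an equally plausible alternative.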

Deep learning model design

CNN for PAs segmentation

Our segmentation network was based on the Residual Unet architecture (Fig. 2A). Like a standard Unet [43], it consists of an analysis path (encoder part) and a synthesis path (decoder part). The network had 35 convolutional layers and was made of the following blocks: Resblock, Conv3D for down-sampling and Deconv3D for up-sampling. Each Resblock used in this study, as shown in Sup. Fig. 1, consisted of a shortcut and a few stacked layers: the convolutional layers and the parametric rectified linear unit (PReLU) layers. The analysis path consisted of repeated Resblocks, each followed by a Conv3D block that performed a convolution with stride 2 in each dimension for down-sampling. In the synthesis path, each repeated Resblock was followed by a Deconv3D block that performed a transposed convolution with stride 2 in each dimension to up-sample the feature map. Shortcut connections from layers of equal resolution in the analysis path provided the essential high-resolution features to the synthesis path. In the last layer, a convolution reduced the number of output channels to the number of labels, which is 2 in our case. The segmentation model outputs a voxel-wise mask of the input image, in which 1 denotes tumor tissue and 0 denotes normal tissue. The segmentation model contains a total of 76,967,968 learnable parameters.
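To make the effect of the stride-2 blocks concrete, the sketch below tracks the spatial size of a 96-voxel patch through the encoder and back up one level. It uses the standard output-size formulas for strided and transposed convolutions. The kernel size (3), padding (1), output padding (1) and the number of down-sampling steps (4) are assumptions chosen for illustration; the text above does not state them.

```python
def conv_output_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after a strided convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def transposed_conv_output_size(size, kernel=3, stride=2, padding=1, output_padding=1):
    """Spatial size after a transposed convolution (standard formula)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def encoder_sizes(patch_size=96, n_downsamples=4):
    """Per-level spatial size along one axis; n_downsamples is an assumed depth."""
    sizes = [patch_size]
    for _ in range(n_downsamples):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(encoder_sizes())                  # size along one axis at each encoder level
print(transposed_conv_output_size(48))  # one up-sampling step restores 96
```

Because each stride-2 step halves the size exactly and each transposed step doubles it, encoder and decoder feature maps at the same level have matching shapes, which is what allows the shortcut connections between them.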

Fig. 2

Segmentation and classification network overview. (A) Segmentation network architecture. (B) Classification network architecture. (C) Convolutional block attention module used in the classification network.

CNN classifier for secreting function type prediction

Transfer learning aims to transfer knowledge between related source and target domains [44]. Transfer learning methods can be divided into instance-transfer, feature-transfer, parameter-transfer and relational-knowledge-transfer approaches. Previous approaches [23], [24] focused on feature transfer between datasets under different tasks, even from nonmedical datasets. Here, we did not adopt transfer learning from natural 2D images, because we wanted to make full use of the third dimension of MRI images. Instead, we assumed that the source task (segmentation) and the target task (classification) shared some parameters or prior distributions of the models’ hyper-parameters. Therefore, we transferred the learned weights from the segmentation models to train the classification models. Our proposed classification model, as shown in Fig. 2B, used the trained analysis path (encoder) of the segmentation model to extract features from the MRI images. Combined with an attention module, our model learned to suppress irrelevant regions in an input image while highlighting salient features. The attention module used in our network (Fig. 2C) is a Convolutional Block Attention Module (CBAM) [45]. The input feature was refined with an attention mask generated by CBAM. The weights of the analysis path in the classification network were transferred from the trained segmentation models, and the decoder was modified for the classification task: the synthesis path and final layer of the segmentation network were removed and replaced by a 3D average pooling layer and a fully connected layer followed by a softmax layer with an output size of two. The classification model predicts the probability that a patient has a functioning pituitary adenoma (FPA). The classification model has a total of 38,497,810 learnable parameters.
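The classification head described above (average pooling, a fully connected layer, then a two-way softmax) can be illustrated with plain Python. This is a toy sketch under stated assumptions: each encoder feature map is flattened to a list of values, the weights are made-up numbers, and index 1 of the output is assumed to be the FPA class. It omits the encoder and the CBAM module.

```python
import math


def global_average_pool(feature_maps):
    """Average each channel's (flattened) 3D feature map to a single value."""
    return [sum(channel) / len(channel) for channel in feature_maps]


def dense(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(feature_maps, weights, bias):
    """Return [P(NFPA), P(FPA)]; the class order is an assumption."""
    return softmax(dense(global_average_pool(feature_maps), weights, bias))


# Two hypothetical channels and hypothetical head weights.
features = [[1.0, 3.0], [2.0, 2.0]]
probs = classify(features, weights=[[1.0, 0.0], [0.0, 1.0]], bias=[0.0, 0.0])
print(probs)
```

Pooling each channel to one number makes the head independent of the spatial size of the encoder output, so the same fully connected layer works for any patch size the encoder accepts.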

Multi-view model combination

To make full use of 3D contextual information, both segmentation and classification models were trained on extracted axial, sagittal, and coronal images, respectively. In the validation and testing procedure, predictions for segmentation and classification on different views were combined to obtain the final predictions as combined model prediction results. At test time, for each segmentation network structure, the corresponding versions of trained models were used to obtain a segmentation result from these three views, and these softmax outputs were averaged to obtain a single fused result. The classification network structure was tested similarly.
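The fusion step amounts to averaging the softmax outputs of the three view-specific models. A minimal sketch, assuming each model returns a probability list of the same length (for segmentation the same average would be applied voxel by voxel); the probabilities below are made up for illustration:

```python
def fuse_views(view_probs):
    """Average class probabilities across view-specific models."""
    n_views = len(view_probs)
    n_classes = len(view_probs[0])
    return [sum(p[c] for p in view_probs) / n_views for c in range(n_classes)]


# Hypothetical [P(class 0), P(class 1)] outputs from the three views.
axial, sagittal, coronal = [0.2, 0.8], [0.4, 0.6], [0.6, 0.4]
fused = fuse_views([axial, sagittal, coronal])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

Averaging probabilities rather than hard labels lets a confident view outweigh two uncertain ones, which majority voting cannot do.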

Experimental setup

The segmentation and classification networks were implemented with the TensorFlow library and the NiftyNet platform in Python [46], [47] and were trained on an NVIDIA RTX 2080Ti GPU. The main hyper-parameters of the two architectures are shown in Sup. Table 2. For the classification task, a two-phase training was used. In the first phase, the network was trained on training patches for 2500 iterations with all layers fixed except the attention module and the fully connected layers. In the second phase, the network was trained on training patches for another 500 iterations with all layers trainable. For the segmentation task, three networks with the same architecture (Fig. 2A) were trained separately on MRI patches extracted on the axial, sagittal and coronal views. Similarly, three classification networks with the same architecture (Fig. 2B) were trained separately on the axial, sagittal and coronal views, each initialized by transferring the encoder part of the corresponding trained segmentation network.
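The two-phase schedule can be written down as a small lookup from iteration number to trainable layer groups. The sketch below is framework-agnostic and only illustrative: the group names are hypothetical labels, not identifiers from TensorFlow or NiftyNet, and iterations are assumed to be counted from zero.

```python
PHASE1_ITERS = 2500  # encoder frozen
PHASE2_ITERS = 500   # everything trainable
LAYER_GROUPS = ("encoder", "attention", "fully_connected")  # hypothetical names


def trainable_groups(iteration):
    """Return the set of layer groups updated at a given (0-based) iteration."""
    if not 0 <= iteration < PHASE1_ITERS + PHASE2_ITERS:
        raise ValueError("iteration outside the training schedule")
    if iteration < PHASE1_ITERS:
        # Phase 1: only the newly added layers learn; transferred weights stay fixed.
        return {"attention", "fully_connected"}
    # Phase 2: fine-tune the whole network.
    return set(LAYER_GROUPS)


print(trainable_groups(0), trainable_groups(2500))
```

Freezing the transferred encoder first keeps the randomly initialized attention and fully connected layers from pushing large, noisy gradients into weights that already encode useful features.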

Statistical analysis

Statistical analysis for demographic variables was performed using chi-square tests for categorical data and one-way ANOVA for continuous data. For the segmentation model, the experimental results were evaluated with two main metrics, namely the Dice similarity coefficient (DSC) and the Hausdorff distance. For tumor regions, we obtained a binary map with algorithmic predictions $P \in \{0, 1\}$ and the experts’ consensus truth $T \in \{0, 1\}$, and we calculated the Dice score, which is defined as:

$$\mathrm{Dice}(P, T) = \frac{2\,|P_1 \wedge T_1|}{|P_1| + |T_1|}$$

where $P_1$ and $T_1$ denote the sets of voxels with $P = 1$ and $T = 1$, respectively. For surface distance evaluation, we calculated the Hausdorff distance. For two point sets $X$ and $Y$, the one-sided HD from $X$ to $Y$ is defined as:

$$hd(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert_2$$

And similarly, for $hd(Y, X)$:

$$hd(Y, X) = \max_{y \in Y} \min_{x \in X} \lVert x - y \rVert_2$$

Finally, the Hausdorff distance is defined as:

$$HD(X, Y) = \max\{hd(X, Y),\ hd(Y, X)\}$$

For the classification model, the classification performance was evaluated by generating receiver operating characteristic (ROC) and precision-recall (PR) curves. The AUROC values of different models were compared by DeLong’s method [48].
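Both segmentation metrics are easy to compute on small examples when each mask is represented as a set of voxel coordinates. The sketch below is a brute-force illustration, not the evaluation code of the study: it assumes 1 mm isotropic voxels (so distances are in mm), and it measures the Hausdorff distance over whatever points it is given, whereas an evaluation would typically pass surface points only.

```python
import math


def dice(pred, truth):
    """Dice score between two sets of voxel coordinates."""
    if not pred and not truth:
        return 1.0  # convention for two empty masks
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def directed_hausdorff(xs, ys):
    """One-sided distance: worst-case nearest-neighbour distance from xs to ys."""
    return max(min(math.dist(x, y) for y in ys) for x in xs)


def hausdorff(xs, ys):
    """Symmetric Hausdorff distance."""
    return max(directed_hausdorff(xs, ys), directed_hausdorff(ys, xs))


# Toy masks, for illustration only.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 0), (0, 0, 1), (0, 0, 3)}
print(dice(pred, truth), hausdorff(pred, truth))
```

In this toy case the two one-sided distances differ (1 from `pred` to `truth`, 2 in the other direction), which is why the symmetric definition takes the larger of the two. The brute-force loop is quadratic in the number of points, so real volumes would need a spatial index or a distance transform.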

Results

Patient characteristics

The flow diagram of this study is shown in Fig. 1. A total of 185 patients were included. As shown in Sup. Table 1, no significant differences in sex, age, Knosp’s Grade, tumor type, tumor volume and diameter were observed among the training, validation 1, validation 2 and the external testing datasets.

Model construction for PAs segmentation from MRI images

We randomly selected one patient’s MRI images from the testing dataset and visualized the segmentation results (Sup. Fig. 2). The MRI scan slices on the axial, sagittal and coronal views are shown in Sup. Fig. 2A, 2B and 2C, respectively. In these example segmentations, our model showed promising performance on 3D MRI slices. Table 1 presents the quantitative evaluation in validation dataset 1. The axial, sagittal and coronal models achieved average Dice scores of 0.7942, 0.8024 and 0.8082 for the whole tumor. Using the multi-model ensemble method, the multi-view combined model achieved the best performance in validation dataset 1 (average Dice score of 0.8188) and outperformed the GrowCut algorithm (average Dice score of 0.7014). To further evaluate our proposed model, we collected another 24 samples as validation dataset 2. As shown in Sup. Table 3, our proposed segmentation model again outperformed the GrowCut algorithm (average Dice score close to 0.810 versus 0.689, and average Hausdorff distance close to 5.352 mm versus 33.605 mm). In the testing dataset, the axial, sagittal and coronal models achieved similar Dice scores for the whole tumor (Table 2). Again, the combined model performed best, with an average Dice score of 0.8093, and outperformed the GrowCut algorithm (average Dice score of 0.6893). These results demonstrate the potential of our segmentation models in 3D MRI segmentation tasks.

Table 1

Dice and Hausdorff measurements between the proposed method and GrowCut algorithm in validation dataset 1. Bold numbers indicate the best performance values on Dice and Hausdorff measurements.

View	Dice_mean	Dice_std	Hausdorff_mean (mm)	Hausdorff_std (mm)
Axial	0.7942	0.0895	7.9551	6.2622
Sagittal	0.8024	0.1134	7.984	8.6931
Coronal	0.8082	0.0828	7.177	3.8330
Combined	0.8188	0.0763	6.4735	3.3578
GrowCut	0.7014	0.0595	27.607	6.7506

Table 2

Dice and Hausdorff measurements between the proposed method and GrowCut algorithm in testing dataset. Bold numbers indicate the best performance values on Dice and Hausdorff measurements.

View	Dice_mean	Dice_std	Hausdorff_mean (mm)	Hausdorff_std (mm)
Axial	0.7652	0.1159	11.2054	6.5137
Sagittal	0.7792	0.0991	11.2809	8.7102
Coronal	0.7646	0.1169	12.7353	9.0150
Combined	0.8093	0.0769	9.3599	5.4566
GrowCut	0.6893	0.0653	28.2917	6.6768


Classification model for predicting functioning and nonfunctioning PAs

Transfer learning and an attention module were applied to explore feature representations. The attention-based model (Att model) was assessed by comparison with two baseline models: a model trained from random initialization (RI model) and a model trained with transfer learning only (TF model). The RI and TF models shared the same architecture, which differed from our proposed attention-based classification network only in that the attention module was removed. To evaluate the prediction performance of the proposed classification model, we performed 4-fold cross-validation in the training dataset on the axial, sagittal and coronal views by randomly shuffling the dataset and distributing the samples into 4 groups (75 samples for training and 25 samples for in-training validation in each fold). After cross-validation, the validation and testing datasets were used to compare our proposed model with the two baseline models. Fig. 3(A-D) shows the mean AUC values and ROC curves of the RI, TF and Att models trained on the different plane views. Fig. 3E presents the AUROC comparison for the RI, TF and Att models on the axial, sagittal, coronal and combined views. Sup. Table 4 presents the quantitative AUROC comparison for the RI, TF and Att models on the different plane views. Under 4-fold cross-validation, the Att models trained on the axial, sagittal and coronal plane views each achieved an area under the ROC curve close to 0.79. The multi-view combined Att model achieved the best performance (AUC = 0.801; 95% CI, 0.738–0.855) and was significantly better (P < 0.0001) than the combined RI (AUC = 0.709; 95% CI, 0.639–0.772) and the combined TF (AUC = 0.713; 95% CI, 0.643–0.776) models.
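The 4-fold split of the 100 training patients (75 for training and 25 for in-training validation per fold) can be sketched as follows. This is an illustrative sketch only: the seed, the function name and the absence of stratification by tumor type are assumptions, and the split is assumed to be made per patient so that patches from one patient never appear on both sides of a fold.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, val_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal groups
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


for fold, (train, val) in enumerate(k_fold_splits(100, 4)):
    print(fold, len(train), len(val))
```

Every patient lands in the validation group of exactly one fold, so the cross-validated AUROC uses each training patient once as held-out data.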

Fig. 3

ROC analysis under 4-fold cross-validation. (A) The mean ROC curves of the RI, TF and Att models trained on the axial view. (B) The mean ROC curves of the RI, TF and Att models trained on the sagittal view. (C) The mean ROC curves of the RI, TF and Att models trained on the coronal view. (D) The mean ROC curves of the multi-view combined RI, TF and Att models. (E) Comparison of averaged AUROC under 4-fold cross-validation for the RI, TF and Att models on the axial, sagittal, coronal and combined views.

To check the robustness of our proposed model, we performed 10-fold cross-validation in the training dataset on the axial, sagittal and coronal views. Sup. Table 5 presents the quantitative AUROC comparison for the RI, TF and Att models on the different plane views. Under 10-fold cross-validation, the multi-view combined Att model achieved the best performance (AUC = 0.792; 95% CI, 0.726–0.849) and was significantly better than the combined RI (P = 0.0122, AUC = 0.724; 95% CI, 0.653–0.788) and the combined TF (P = 0.0307, AUC = 0.745; 95% CI, 0.675–0.807) models. These results suggest that our proposed classification model is reliable. To rigorously evaluate the prediction performance and generalizability of our proposed classification model, we next compared the combined Att, RI and TF models in the validation and testing datasets. Fig. 4 presents the ROC and PR curves for the combined RI, TF and Att models, together with the confusion matrix and diagnostic performance of the combined Att model, in validation dataset 1 and the testing dataset. Supplementary Tables 6–7 present the quantitative ROC analysis and comparisons in validation dataset 1 and the testing dataset. The combined Att model achieved the best performance in both validation dataset 1 (AUROC = 0.8063; 95% CI, 0.708–0.883) and the testing dataset (AUROC = 0.8478; 95% CI, 0.725–0.947). The performance of the combined Att model in validation dataset 1 and the testing dataset was comparable with its performance in the training dataset (AUC = 0.801; 95% CI, 0.738–0.855). In validation dataset 2, our proposed model achieved an accuracy of 0.7083 with a Youden’s index of 0.1667 (Sup. Tables 8-9).
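For reference, accuracy and Youden’s index are both simple functions of a confusion matrix, with Youden’s index defined as sensitivity + specificity − 1. The sketch below shows the arithmetic. The counts are hypothetical and are not taken from the study’s confusion matrices (those are given in the figures and supplementary tables), and FPA is assumed to be the positive class.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Youden's index from raw counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "youden_index": sensitivity + specificity - 1,
    }


# Hypothetical counts for illustration only (not study data).
print(diagnostic_metrics(tp=8, fn=2, tn=6, fp=4))
```

Unlike accuracy, Youden’s index does not depend on the ratio of positive to negative cases, so on an imbalanced dataset a model can reach a fairly high accuracy while its Youden’s index stays low.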

Fig. 4

Evaluation of the classification model in the validation and testing datasets. The ROC curves of the multi-view combined RI, TF and Att models and the confusion matrix for the multi-view combined Att model in (A) validation dataset 1 and (B) the testing dataset. Precision-recall curves of the multi-view combined RI, TF and Att models in (C) validation dataset 1 and (D) the testing dataset. (E) The diagnostic performance of the multi-view combined Att model in validation dataset 1 and the testing dataset.

To assess classification performance within subgroups defined by clinical characteristics, we ran the model on a combined dataset sub-grouped by gender and age. A total of 85 patients (27 FPA and 58 NFPA) were included; 37 of them were female, and the median age was 48. As shown in Sup. Tables 10-11, the proposed classification model achieved similar performance across subgroups (AUROC = 0.7937 in the female subgroup, 0.7929 in the male subgroup, 0.8108 in the older subgroup, and 0.7976 in the younger subgroup).

Model interpretation with the attention mask

Models trained with an attention module can learn to suppress irrelevant regions in the input MRI images while highlighting the salient features. To determine how our proposed models identify the tumor region in the MRI images, attention maps were generated from contrast-enhanced T1W (T1CE) scans to show where and on what the models focus. Fig. 5 shows the T1CE scan and its corresponding attention map for a patient with NFPA (Fig. 5(A-B)) and a patient with FPA (Fig. 5(C-D)). The degree of the attention weights is marked with different colors, where red represents the most attention paid by the model. The heterogeneous colors distributed over the T1CE image indicate that the model trained with an attention mechanism pays different amounts of attention to different regions. As shown in Fig. 5, the color distribution differs between functioning and nonfunctioning pituitary adenomas: the tumor region is marked in deep red for the nonfunctioning adenoma but in light red for the functioning one. Additionally, the tissues with the highest signal on T1CE, including the cavernous and basal sinuses, were marked in red, and the normal brain tissues were marked in light colors. Intriguingly, regions with the lowest signal on T1CE, such as the basal cisterns and the fourth ventricle, were also marked in red.

Fig. 5

(A) Original contrast enhanced T1w (T1CE) image for the patient with NFPA. (B) Attention mask of the same T1CE image for the patient with NFPA. (C) Original contrast enhanced T1w (T1CE) image for the patient with FPA. (D) Attention mask of the same T1CE image for the patient with FPA. Basal cisterns and the fourth ventricle with low signals were marked as red in attention mask. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Discussion

MRI is generally preferred over CT for the diagnosis of PAs because of its superior definition of small lesions in the pituitary sella and its improved anatomic definition before surgery. At present, surgery is the standard first-line therapy for the treatment of patients with non-functioning PAs. Therefore, the accurate tumor contours segmentation and precise classification of secreting function types for PAs are crucial steps in surgical and treatment planning. With the recent progress of AI algorithm on MRI images, we expected a similar application that provides a more efficient manner to diagnosis and classify PAs directly with MRI images. Eventually, we developed two CNN models for PAs segmentation and classification of functioning and nonfunctioning PAs based on conventional T1W, T2W, Flair and T1CE MRI sequences. As a result, the combined segmentation model achieved 0.8188, 0.8091 and 0.8093 Dice scores for the whole tumor in two validation datasets and a testing dataset by using multi-model ensemble methods. For segmentation tasks, we found several factors that may explain the high performance achieved by our segmentation model. First, the use of a 3D convolutional neural network compared to a 2D convolutional neural network could make full use of the third MRI image dimension. Additionally, the Resblock used in the segmentation model allows us to build a deeper network and take advantage of the deep neural network’s powerful representational ability. Finally, the 3D segmentation models were trained on image patches extracted on axial, sagittal and coronal views. The results in Table 1, Table 2 revealed that joint use of the three models’ predictions achieved substantially improved performance after using any one model prediction in the validation and testing datasets. Moreover, the inference time was approximately 32.8 s per patient (using one RTX 2080Ti GPU). 
A previous study has shown that there was reduction of intra-observer variation (by 36.4%), reduction of interobserver variation by 54.5%, and time savings of 39.4% with automated segmentation model assistance for nasopharyngeal carcinoma [49]. Due to its short inference time and the accuracy of tumor segmentation, the automated segmentation model could be used as a PA computer-aided diagnosis tool for radiologists. Additionally, the combined classification model also achieved a high predicting performance with accuracies of 72.7% (AUROC = 0.8063) in validation dataset 1, 70.8% (AUROC = 0.7881) in validation dataset 2 and 82.3% (AUROC = 0.8478) in the testing dataset. During this task, we mainly investigated whether the integration of the transfer-learning method as well as the attention mechanism could substantially improve the overall performance. Four-fold cross-validation shows that transfer-learning based models (TF and Att models) achieved higher AUC than random initial models, while the attention-based models (Att models) also obtained a higher AUC value than transfer-learning models themselves (TF models). Further comparisons between TF, RI and Att models in training, validation and testing datasets proved our assumption that the source task (segmentation task) and the target task (classification task) shared certain parameters and similar prior distributions with the hyper-parameters of the models. Moreover, comparing to make network architecture deeper and wider, attention mechanisms used in classification models aim to improve classification performance without increasing in models’ complexity and computation. The generated attention masks allow the proposed Att model to concentrate adaptively on the abnormal regions. Experimental results in the training, validation and testing datasets (as shown in Sup. Tables 4-6) demonstrated that the attention module in our transfer learning-based models plays a critical role. 
The benefit of the attention module comes from encoding a top-down attention mechanism into a bottom-up top-down feedforward convolutional structure in the classification model, so that it can learn the specialized features of the input MRI images. Similarly, by using a multi-model ensemble method, the combined Att model was more robust and achieved the best performance. Model comparisons in the validation and testing datasets demonstrated that introducing the attention module enables the models to perform better. By visualizing the attention masks, we found that our models pay more attention to some regions with the lowest signal. It is unclear why these areas attract the model's attention, and much more work is warranted to investigate the underlying mechanisms and their clinical significance. Through the attention masks, the model may be picking up some unique features from the MRI, and may thus help radiologists to differentiate functioning from nonfunctioning PAs preoperatively. As far as we know, there is no theoretical basis for the classification of PAs based on MRI. However, some studies have explored the correlation between MRI and pathological features. For example, Peng et al. proposed a machine learning model that can predict the immunohistochemical classification of PAs with an MR-based radiomic analysis [50]. Similarly, diffusion-weighted imaging (DWI) MRI was reported to differentiate functional types of pituitary macro-adenomas in a small set of patients [51]. These data indicate that an MRI-based deep convolutional neural network has the potential to aid in classifying the functioning status of PAs from preoperative MRI. The main limitation of this study is the sample size, which left few micro-adenomas for building the segmentation models and could cause poor generalization to unseen micro-adenoma data. In addition, the number of ACTH patients collected was relatively small compared with GH and PRL patients, which could limit the accuracy of our classification model for ACTH patients.
Our future efforts will include more data from these patient groups as well as a larger sample size to further improve the accuracy of the models. Moreover, we did not evaluate the ability of the segmentation model to analyze tumor constituents. The constituents of PAs are a critical factor in surgical planning and should be addressed in future work; much more work is therefore encouraged to investigate the role of deep learning in predicting the tumor constituent. Finally, as only newly diagnosed and surgically treated PAs were recruited for the analysis, this model applies only to patients with primary PAs who should be treated surgically, and not to recurrent PAs or those treated medically. Collectively, this research provides the first computer-based approach to predicting subtypes of PAs from conventional 3D MRI, and the models showed good performance in the testing set. Our models have the potential to be used more widely as a practical tool to support early diagnosis and treatment planning for PAs.

Financial disclosure

The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Funding

This work was supported by grants from the National Natural Science Foundation of China (Grant Nos. 81702479, 31471252, 31771462); National Key R&D Program of China (Grant No. 2017YFA0106700); Program for Guangdong Introducing Innovative and Entrepreneurial Teams (Grant No. 2017ZT07S096); Guangdong Basic and Applied Basic Research Foundation (Grant No. 2020A1515010280); Science and Technology Program of Jiangmen, China (Grant No. 2018630100110019805); Guangdong Natural Science Foundation (Grant No. 2018A030313323); and Fundamental Research Funds for the Central Universities (SYSU: 18ykpy34).

CRediT authorship contribution statement

Hongyu Li: Data curation, Formal analysis, Investigation, Methodology, Software, Validation. Qi Zhao: Conceptualization, Formal analysis, Software, Supervision, Validation, Writing - review & editing. Yihua Zhang: Resources, Validation, Writing - original draft, Writing - review & editing. Ke Sai: . Lunshan Xu: Data curation. Yonggao Mou: Resources. Yubin Xie: Formal analysis, Funding acquisition, Software, Supervision. Jian Ren: Conceptualization, Methodology, Project administration, Resources, Writing - review & editing. Xiaobing Jiang: Conceptualization, Data curation, Funding acquisition, Investigation, Project administration, Resources, Supervision, Writing - original draft, Writing - review & editing.

3 in total

1. Generating novel pituitary datasets from open-source imaging data and deep volumetric segmentation.

Authors: Rachel Gologorsky; Edward Harake; Grace von Oiste; Mustafa Nasir-Moin; William Couldwell; Eric Oermann; Todd Hollon
Journal: Pituitary Date: 2022-08-09 Impact factor: 3.599

2. A Convolutional Neural Network Model for Detecting Sellar Floor Destruction of Pituitary Adenoma on Magnetic Resonance Imaging Scans.

Authors: Tianshun Feng; Yi Fang; Zhijie Pei; Ziqi Li; Hongjie Chen; Pengwei Hou; Liangfeng Wei; Renzhi Wang; Shousen Wang
Journal: Front Neurosci Date: 2022-07-04 Impact factor: 5.152

3. Deep Learning for Prediction of Progression and Recurrence in Nonfunctioning Pituitary Macroadenomas: Combination of Clinical and MRI Features.

Authors: Yan-Jen Chen; Hsun-Ping Hsieh; Kuo-Chuan Hung; Yun-Ju Shih; Sher-Wei Lim; Yu-Ting Kuo; Jeon-Hor Chen; Ching-Chung Ko
Journal: Front Oncol Date: 2022-04-20 Impact factor: 5.738
