Literature DB >> 33035963

A Bayesian 2D functional linear model for gray-level co-occurrence matrices in texture analysis of lower grade gliomas.

Thierry Chekouo¹, Shariq Mohammed², Arvind Rao³.

Abstract

In cancer radiomics, textural features evaluated from image intensity-derived gray-level co-occurrence matrices (GLCMs) have been studied to evaluate gray-level spatial dependence within the regions of interest in the brain. Most of these analysis work with summary statistics (or texture-based features) constructed using the GLCM entries, and potentially overlook other structural properties in the GLCM. In our proposed Bayesian framework, we treat each GLCM as a realization of a two-dimensional stochastic functional process observed with error at discrete time points. The latent process is then combined with the outcome model to evaluate the prediction performance. We use simulation studies to assess the performance of our method and apply it to data collected from individuals with lower grade gliomas. We found our approach to outperform competing methods that use only summary statistics to predict isocitrate dehydrogenase (IDH) mutation status.

Entities: Chemical Disease Gene Species

Keywords: 2D functional data; Bayesian variable selection; GLCM; Imaging sequences; Lower grade glioma; Texture analysis

Mesh：

Year: 2020 PMID： 33035963 PMCID： PMC7554657 DOI： 10.1016/j.nicl.2020.102437

Source DB: PubMed Journal: Neuroimage Clin ISSN： 2213-1582 Impact factor: 4.881

Introduction

In oncology, characterization of the tumor phenotype is a challenge owing to the heterogeneous tumor micro-environments and diverse clinical pathways. Cancer radiomics has emerged as an area to address such challenges. It deals with quantifying the intrinsic patterns in gray-level intensity patterns, texture, enhancement, morphology and shape of the tumor tissues. Recent advances include different types of quantitative assessments that are facilitated through dimension reduction by constructing features from delineated areas (or regions of interest). These features could potentially guide the understanding of the underlying composition of tissues which are suspected to have malignant pathology. Consequently, these image-derived features are expected to capture the underlying patterns indistinguishable to the human eye and facilitate down-stream analyses. There has been increased interest in understanding the associations between clinical features, molecular pathology and textural features based on the imaging data (Zulpe and Pawar, 2012, Kumar et al., 2015, Narang et al., 2017). Specifically, textural features evaluated using gray-level co-occurrence matrices (GLCMs) have been used to study gray-level spatial dependence within the delineated regions of interest. GLCMs are matrices defined over an image domain with entries as counts describing the spatial distribution of co-occurring gray-scale values (Haralick et al., 1973). Most of the analyses involving GLCMs work with the texture-based features (or summary statistics) rather than evaluating the GLCM in its entirety. In this paper, we propose a Bayesian probabilistic modeling framework to identify the associations between a phenotype and the GLCMs constructed based on Magnetic Resonance Imaging (MRI) scans. We specifically focus on lower grade gliomas (LGG), which is a specific type of brain cancer. The proposed framework facilitates modeling the GLCM within a two-dimensional functional space to capture the spatial dependencies in the gray-levels, thereby extending traditional analysis based on summary statistics or textural features. This modeling can be utilized under a classification or regression setting corresponding to any relevant clinical phenotype.

Lower grade gliomas

Lower grade glioma is a neoplastic disease that develops in the glial cells of the brain. In general, gliomas are graded from I to IV according to the World Health Organization criteria. Tumors grades II and III are usually classified as LGG, as opposed to the more common higher grade tumors, also known as glioblastoma multiforme. We primarily work with The Cancer Genome Atlas (TCGA) (Tomczak et al., 2015) cohort for LGG data which involves 108 cases with the tumor histology as astrocytoma, oligodendroglioma, or oligoastrocytoma. This histologic data and the MRI acquisition protocols can be found in the publicly available data files of patients (Pedano et al., 2016). Patients in this cohort had undergone routine MRI prior to surgery and treatment. These MRI images are acquired in four different sequences: Pre-contrast T1-weighted (T1W), post-gadolinium T1 contrast enhanced (T1CE), T2-weighted (T2W), and T2 fluid-attenuated inversion recovery (FLAIR). We show all four MRI sequences corresponding to an axial slice for an LGG patient in Figs. 1(a-d). The patient image data, the tumor segmentation labels and the corresponding clinical data were extracted from The Cancer Imaging Archive (TCIA) (Clark et al., 2013, Kuthuru et al., 2018) and TCGA. The tumor area is shown as a delineated region on the image for all four sequences in Figs. 1(a-d). However, the whole brain MRIs are three-dimensional data objects with stacked axial layers corresponding to each subject. Voxel intensity values are sensitive to the MRI scanner configuration and are difficult to interpret and compare either between study visits within a single subject, or across different subjects. This emphasizes the necessity of pre-processing in terms of intensity value normalization, which we address by performing a biologically motivated normalization technique using the R package WhiteStripe (Shinohara et al., 2014). For consistency across all the subjects and scans, the normalization was implemented under the same default settings in the R package WhiteStripe.

Fig. 1

Axial slice of a brain MRI for a LGG subject shown for four imaging sequences T1W, T1CE, T2W and FLAIR. The segmented tumor region is shown with the (red) boundary overlaid on the images. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Molecular Sub-types in LGG

Several molecular alterations have been shown to be associated with the overall survival of the LGG patients. These include isocitrate dehydrogenase (IDH1/2) mutation status (collectively referred to as IDH mutations), 1p19q chromosomal arm codeletion, MGMT promoter methylation status, and TP53 mutation (Leu et al., 2013, Moritz-Gasser and Herbet, 2013, Pignatti et al., 2002). Multiple molecular sub-types of LGG have been identified (Eckel-Passow et al., 2015, Network, 2015), which predominantly include IDH wild-type, IDH mutant with 1p19q codeleted, and IDH mutant with 1p19q noncodeleted groups. These groupings were used to show that LGGs with IDH wild-type have molecular characteristics and behavior similar to glioblastoma and have been associated with shorter survival. Specifically in the case of glioblastoma, studies also indicate that patients with IDH1 mutation have better prognosis compared to the patients with wild-type (Hartmann et al., 2010). Since the IDH status is such an important aspect in case of gliomas, in this paper we aim to assess the associations between the imaging measurements and the IDH status in the case of LGGs. Prediction of molecular status (e.g. IDH status) in gliomas using radiomic features is an extensively studied problem in the field of radiomics (Chaddad et al., 2015, Hsieh et al., 2017, Li et al., 2017, Chang et al., 2018, Jakola et al., 2018, Li et al., 2018, Rathore et al., 2018, Tian et al., 2018, Han et al., 2019, Liu et al., 2019).

Gray-level co-occurrence matrices

A predominant approach in image texture analysis to evaluate gray-level spatial dependencies is through the analysis of GLCM. A GLCM is a matrix defined over an image domain with its entries computed as counts, describing the spatial distribution of co-occurring gray-scale values in the image voxels at a given spatial offset and orientation/angle (Haralick et al., 1973). The entry in a GLCM is the total number of pixel pairs that have gray-levels j and k in the image in a specified direction and distance. Consider a two-dimensional image (e.g. an axial slice from the MRI), the -th entry in the corresponding GLCM represents the number of instances that pixel pairs with gray-levels j and k are neighbors either horizontally, vertically or diagonally. The GLCM entries can be computed by traversing pixel-by-pixel across the whole image or region of interest. That is, for a given pixel (with gray-level j), we consider its neighbors at and at a distance one (immediate neighboring pixel). For each of these four neighbors, if its gray-level is k then the -th entry of the GLCM is incremented by one, and this leads to a non-symmetric GLCM. However, adding one to both -th and -th entries creates a symmetric GLCM. Symmetric GLCMs can also be constructed by looking at all eight directions for each pixel. For a three-dimensional image the number of directions increase to thirteen as we have an additional dimension. GLCMs constructed by considering all the directions are symmetric and rotation invariant. GLCMs can also be constructed by looking for pixel pairs at different distances/scales and not necessarily constrained to immediate voxel neighbors. For each patient and imaging sequence, a GLCM is computed from the three-dimensional tumor volume observed in the corresponding sequence. We consider symmetric as well as non-symmetric GLCMs from T gray-level bins. The GLCM object thus constructed provides a huge dimension reduction mapping from the image space to the data lattice. We choose which is a common choice in applications involving GLCMs. However, we would like to emphasize that our approach in this paper can be readily employed to various choices of the gray-levels. The GLCMs also facilitate further statistical modeling and analysis in diagnostic oncology, which is otherwise hindered due to irregular shape and sizes of the tumor in the original image space. Current modeling approaches using GLCM focus on deriving textural features, which are summary statistics constructed using the GLCM entries (Hitam et al., 2003, Lee et al., 2015, Ghosh et al., 2015). Many studies have also reported high correlation between the derived textural features which leads to over-fitting (Albregtsen et al., 2008, Zhang et al., 2008, Kassner and Thornhill, 2010). These textural features also potentially overlook other structural properties in the GLCM. Some of the most commonly used textural features based on the GLCM are contrast, correlation, energy, entropy and homogeneity (definitions below). These features are interpreted in their conventional sense as summary statistics and are referred to as the Haralick features.where is the -th entry in the GLCM, and and are the marginal means of row j and column k, respectively. Similarly, and are the marginal standard deviations of row j and column k, respectively.

Statistical framework

In this work, we propose a novel Bayesian approach using a generalized two-dimensional (2D) functional linear models to model the IDH status using the GLCMs. Our framework builds nonlinear association models between scalar responses and functional predictors. To utilize the gray-level spatial structure in the GLCM, we use it as our fundamental data object rather than any texture features derived from them. Li et al., 2019 is one of the very few studies that uses the whole GLCM (instead of the textural features) to classify adrenal lesions through spatial Bayesian modeling. The data object in their study is a structured lattice whereas we consider the GLCM as a two-dimensional functional object. We treat the GLCM counts as 2D functional data for each imaging sequence, and account for both symmetric and non-symmetric GLCMs. Our method is part of the functional prediction regression (FPR) family of models which relate a non-functional response (e.g. IDH status) to functional predictors (e.g. GLCMs). There is an extensive literature on FPR models (Baladandayuthapan et al., 2015, James, 2002, Ramsay and Dalzell, 1991, Morris, 2015, Vannucci et al., 2003, Vannucci et al., 2005, Ferraty and Vieu, 2003, Cardot et al., 1999, Cardot et al., 2003, Ferraty and Vieu, 2002, Muller, 2005). A nice review on functional regression can be found in Morris, 2015. Most of previous approaches to modeling functional regression effects have focused on one-dimensional smoothing, but fewer studies have focused on multidimensional smoothing used mostly as the tensor product of basis of each dimension (Eilers and Marx, 2003, Marx et al., 2011, Marx and Eilers, 2005). Inspired by the work of Chen et al., 2017, the representation of our two-dimensional functional data is based on a tensor product of eigenfunctions of marginal kernels. As this representation has a simple interpretation, we newly adopt it in the context of Bayesian FPR models. Moreover, we also account for measurement error in the GLCMs, and provide regularization using mixture prior distributions. The manuscript is organized as follows. In Section 2, we illustrate our proposed Bayesian modeling approach. In Section 3, we discuss posterior inference and show how our modeling framework can be used to predict the IDH status. In Section 4, we present results from simulation studies in which we found our approach to outperform competing methods that use summary statistics. In Section 5, we discuss the results of our inference on the data set of TCGA LGG patients. We conclude in Section 6 with a discussion and some directions for future work.

A Bayesian 2D functional linear model

In this study, we consider imaging and clinical data from n TCGA LGG patients who had all undergone routine MRI prior to surgery and treatment. For the ith patient, we observe if IDH is mutant and 0 otherwise. Moreover, for each patient i, we observe (-) GLCM count for imaging sequence with gray-levels and for every , where T is the total number of gray-level intensities. For both simulated and LGG data, we used the Z-score standardization for for each entry and imaging sequence r. In our LGG case study, we consider gray-levels and imaging sequences: T1W, T1CE, T2W and FLAIR as described in Section 1.2.

Theoretical motivation

Let be the mean of the process for all where . Under fairly general assumptions, Chen et al., 2017 showed that the two dimensional process can be written aswhere the ’s, and ’s, are respectively the eigenfunctions of the marginal kernels and is the covariance function of , and is the principal component score associated with the pair of eigenfunctions such that for and for . The representation of in Eq. (1) can be seen as an extension of the Karhunen-Loève expansion (Loève, 1963), and is well suited for situations where the two arguments of play symmetric roles. From Eq. (1), we can show that ’s, and ’s, are respectively eigenfunctions of and , where and are the marginal functions of (see Theorem 6.1 of the Supplementary Materials). Hence, those eigenfunctions can be estimated after performing standard functional principal component analysis on the marginal stochastic processes and .

Model formulation

We treat the non-symmetric GLCM counts from each sequence r as 2D functional data indexed by pairs of gray-levels. We then model the (-) GLCM counts as followswhere is a mean-zero white-noise process with variance . Hence we assume that for subject i, the GLCM counts are noisy realizations of the true latent process at intensity level . When GLCMs are symmetric, we then assume that , and . Hence the indices of the double summation in Eq. (3) would be restricted to belong to the set and the off-diagonal terms ’s () are multiplied by two. We use () to denote a continuous clinical outcome for patient i. To illustrate the main ideas, we describe the simplest case with continuous outcome data and extend the method for binary outcome later. In order to relate imaging with clinical outcome, we assume a functional linear regression model with a scalar response defined aswhere is the continuous functional regression coefficient associated with GLCM for imaging sequence r, and is a Q-dimensional vector that encodes for nonfunctional fixed effects (confounders such as demographic variables: gender, age, etc) with as the associated regression coefficients. Here we take the first component of to be 1 to include the intercept term. Using Eq. (3), Eq. (4) can be written aswhere Under the assumption that is defined in the tensor Hilbert space (Takesaki, 2001, Bratteli and Robinson, 2003), can be written as a linear combination of the orthonormal system using the representation The basis functions and are not known, hence we use functional principal component (FPC) analysis to construct their estimators (Ramsay and Silverman, 2005, Hall et al., 2006, Yao et al., 2005) by using respectively the marginal functions and . Once the orthonormal basis coefficients or the FPC scores have been estimated, we can reduce (5) by applying truncated approximations in (3), which giveswhere and are the truncation parameters for the predictors from the rth imaging sequence. Again, when GLCMs are symmetric, and . Hence, the indices of the double summation k and l in Eq. (8) belong now to and the off-diagonal regression coefficients () are also multiplied by two. For both simulated and observed data, we used a fraction of variance explained (FVE) of at least 95% in order to estimate and . Alternatively, we may formulate it as a model selection procedure and choose them by using some model selection criterion, such as the deviance information criterion (DIC). For the case of binary response (IDH status) we consider a probit model that linearly relates GLCMs to the binary response. That is, we adopt a data augmentation approach (Albert and Chib, 1993) to define our predictive model as in (8) whereand if subject i has a mutant IDH and if subject i has a wild type IDH. In this context, the variance of the latent continuous variable is (Albert and Chib, 1993).

Prior Distributions

Variable selection is an important problem in functional regression model and we aim to identify both imaging sequences and their corresponding components that contribute the most to the prediction of the response. As commonly done in Bayesian variable selection, we define the indicator vector where if component for imaging sequence r is selected and 0 otherwise. We then define adaptively regularized representations of the effect functions by placing a mixture prior on , the coefficient effect of latent scores for every . In Eq. (10), is the Dirac delta function concentrated at 0, and denotes the probability density function of a normal distribution with mean and variance . It follows from prior (10) that if and if . Let us define and assume that for every (), and . The shrinkage hyper-parameter in Eq. (10) is defined as (, fixed) such as the covariance matrix of is diagonally dominant (and symmetric), hence, it is guaranteed to be positive definite. Instead of defining a mixture prior for the coefficient effects ’s (see Eq. 10), we can simply assume that follows i.e. we do not impose sparsity on the set of components. This approach can be valid when the number of components is less than the number of subjects but the computational cost of its MCMC algorithm is higher than the sparsity approach. We made a comparison of the two approaches (sparsity and no sparsity) in terms of computational cost and component selection in the Supplementary Materials (see Section 7). We have also added the option to run the non-sparsity approach to our MCMC algorithm. Under our model formulation, controls the overall correlation between the regression coefficient functions and if both imaging sequences r and are associated with . Otherwise, there is no correlation. This assumption is motivated by the fact that the imaging sequences provide information about the same region and also seem to be positively correlated, hence we would expect their effects to be correlated as well. Moreover, in the case where one imaging sequence is associated and the other are not, the method can learn from the data and impose a partial correlation, thereby enabling the effects of other imaging sequences. Fig. 1 of the Supplementary Materials presents empirical correlations between sequences of the observed GLCMs. It shows that most of the empirical correlations are positive. For example, we observe stronger pairwise correlations between FLAIR and T2W, possibly as the tumor region is indicated by a mass-like hyperintense signal in both sequences (Bruno et al., 2019). We assume independent binomial prior distributions with probability for the binary parameter . We also impose gamma prior distributions on the covariance with shape parameter a and rate parameter b. If the outcome is binary, it is assumed that the variance of the error term (or the latent variable ), is set to 1 (Albert and Chib, 1993). Otherwise, if it’s continuous, we assume a conjugate inverse gamma prior for with parameters and . The latent scores ’s are assumed to be normally distributed , as is commonly assumed in functional principal component analysis (Crainiceanu and Goldsmith, 2010), where follows an inverse gamma distribution with parameters and , and represents the eigenvalues for each imaging sequence r. We finally impose a conjugate inverse gamma prior for with parameters and , and independent normal for components of the coefficient vector for nonfunctional features.

Posterior inference

For posterior inference, our primary interest is in the estimation of the selection indicator and regression effects of the latent scores ’s. We implement a Markov Chain Monte Carlo (MCMC) algorithm for posterior inference that employs partially collapsed Gibbs sampling (van and Park, 2008) to sample and the regression coefficients . In fact, for each sequence r, to sample the binary matrix given (), we integrate out first . Then, we sample the regression coefficients from their full conditional distributions. In Section 8 of the Supplementary Materials, we present the full conditionals to update all parameters. We outline the steps of our MCMC algorithm below. Sample from its full conditionals as defined in Eq. (5) of the Supplementary Materials. Sample from its full conditionals as defined in Eq. (6) of the Supplementary Materials. Sample from its posterior full conditional using a Metropolis-Hasting step as decribed in Section 8 (Supplementary Materials). Sample the parameters , from their full conditional distributions as defined respectively in Eqs. (7)–(10) of the Supplementary Materials. When the response is binary, sample as a truncated normal distribution as described in the Supplementary Materials (see Section 8). Sample from its full conditionals as defined in Eq. (11) of the Supplementary Materials.

Prediction

We use an M-fold cross-validation where the n samples are partitioned into M subsets. We follow a similar procedure adopted in Chekouo et al., 2017 for survival time outcome. We set for our LGG case study, and for each sample , we denote with the partition that contains i (validation set). Let be the binary outcome vector in the partition , and be the outcome from the remaining partitions (training set). The cross-validation density for the i-th individual is estimated via model averaging; following Gelfand and Dey, 1994, Lamnisos et al., 2012, Chekouo et al., 2017, we use as the importance density of the target distribution , where . The predictive probability can then be written aswhere , and G is the number of distinct models ’s obtained from the MCMC samples. We approximate the densitieswhere is the cumulative distribution function of the standard normal distribution, is the posterior mode of obtained from Eq. (2), is the posterior mean determined by averaging sample values of from each iteration is the posterior mode of , and if and 0 otherwise. Hence, the inclusion probability for subject can be estimated as

Simulation study

To demonstrate the performance of our 2D functional linear model on the GLCM data we conduct a simulation study under different scenarios. Note that in our real data analysis GLCMs are generated from the pixel values of the original tumors in the MRI scans of patients with low-grade gliomas. However, for simulation purposes we focus on simulating the GLCMs directly instead of trying to create them from simulated MRI scans. Each GLCM is a matrix where the -th entry corresponds to the GLCM count generated to represent the neighborhood pattern of gray-level j and gray-level k. For each subject we want to generate four GLCMs corresponding to four imaging sequences. Let be the GLCM corresponding to the imaging sequence r and be its -th entry for sequence r. Let be a matrix which specifies the underlying mean structure of the pattern in the GLCM. The entries of are generated in a deterministic way to include different patterns for the GLCM. We determine the value of each by corresponding it to the probability of a certain region under a bivariate normal distribution. Steps (7)-(8) in Algorithm 1 specify the exact computations to generate symmetric GLCMs. Note that Q is constructed to be a symmetric matrix to generate a symmetric GLCM. As alluded to previously, we observe correlation between the GLCMs across imaging sequences, which will be incorporated as correlation between the entries for all . We incorporate this correlation through the error term , where and is a pre-specified correlation structure with being the noise. For the simulations in this section, we consider to be the empirical correlations between the -th GLCM entries across all four sequences based on all the subjects from our real data. We now compute the logarithm of entries of as , where w is an integer sampled randomly from a discrete uniform distribution such that . Here w accommodates for the varying tumor volume (total number of tumor pixels) for each subject. We choose and based on the real data. The inclusion of additive noise in -scale could also be interpreted as multiplicative noise for the original GLCM count . The values will be rounded to the nearest integer to generate integer-valued GLCM counts. Algorithm 1 summarizes the procedure to simultaneously generate correlated GLCMs for multiple imaging sequences. Note that first we generate randomly from a , which indicates the category label for subject i. Based on this label we assign the mean parameter and variance parameter to specify the underlying patterns in the GLCM.To create the different simulation scenarios, we fix the number of gray-levels as . To construct the GLCMs for the subjects with , we consider the mean parameter as and covariance matrix as for all . Similarly, for the subjects with , we consider the mean parameter as and covariance matrix for all . Let be a correlation matrix. We assume the same covariance structure to construct GLCMs of all four sequences for the subjects in the same category, that is, for some and . We broadly divide our simulation scenarios into three cases with respect to different mean patterns in the GLCM between all four imaging sequences. These scenarios are trying to capture various degrees of co-occurrence for different gray-levels. For example, higher values close to the diagonal reflects strong co-occurrence of voxel values indicating spatial homogeneity, whereas, higher values away from the diagonal reflect less co-occurrence indicating spatial heterogeneity. Table 1 shows the specific choices of and for all for different simulation scenarios we consider. These choices are structured so that two of the four imaging sequences have similar underlying patterns across both the response categories. That is, only GLCMs with sequences and are associated with the response for each scenario (except case C). Fig. 2, Fig. 3, Fig. 4 show the generated GLCMs for a subject with for cases and C, respectively.

Table 1

Choice of the mean parameter for the bivariate normal distribution used to generate to create simulation cases and C.

Case	{m1(1),m1(2),m1(3),m1(4)}	{m0(1),m0(2),m0(3),m0(4)}
A	{(1,7),(3,5),(5,3),(7,1)}	{(1,7),(2,6),(4,4),(7,1)}
B	{(1,7),(1,4),(1,1),(5,2)}	{(1,7),(4,4),(6,2),(5,2)}
C	{(4,4),(4,4),(4,4),(4,4)}	{(4,4),(4,4),(4,4),(4,4)}

Fig. 2

Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is shifted on the diagonal. In this case, we have and .

Fig. 3

Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is shifted away from the diagonal. In this case, we have and .

Fig. 4

Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is the same. In this case, we have .

Choice of the mean parameter for the bivariate normal distribution used to generate to create simulation cases and C. Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is shifted on the diagonal. In this case, we have and . Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is shifted away from the diagonal. In this case, we have and . Case (Counts) of symmetric GLCM for a subject with where the pattern between imaging sequences is the same. In this case, we have . For each of the three cases and C we further consider two scenarios through varying covariance structure of the bivariate normals. We consider the first scenario with and the second scenario with and for each of the three cases. We also consider two possible values for the noise: (a) and (b) . In summary, we include the characteristics of GLCMs from MRI data such as (i) varying tumor volume (total number of tumor pixels) between subjects using the parameter w, (ii) dependence between the GLCM entries from the four imaging sequences, and (iii) various spatial heterogeneity/homogeneity patterns using the entries of . We have also generated similarly non-symmetric GLCMs with the same scenarios. The algorithm to generate them is presented in Section 2 of the Supplementary Materials. Hyperparameter Settings: We set the hyperparameters as follows. A vague prior is assigned to the coefficient of nonfunctional effects by specifying the hyperparameter as a large value. Specifically, we set . We also set , the hyperparameter of the regression effects which allows the covariance matrix of to be positive definite. We set the hyperparameters of the prior distributions of as and . We set a non-informative prior on by specifying its hyperparameters as . Finally, we set , the hyperparameter of and the prior probability to select a specific component. A sensitivity analysis with different choices of , and h shows that the posterior inference is quite robust to the specification of these hyperparameters (details in Supplementary Material, Section 4).

Simulation results

To highlight the comparison of using symmetric and non-symmetric GLCM matrices, we defined two Bayesian methods for GLCM matrices: Bayes2Dsym is our Bayesian approach for 2D functional data that accounts only for symmetric GLCM, while Bayes2Dnosym can account for non-symmetric GLCMs. Both methods were run for 80,000 MCMC iterations with a burn-in period of 10,000 iterations. To assess the convergence of the MCMC algorithm on simulated data, we ran two MCMC chains for each case with randomly chosen starting points. Correlations between the marginal posterior probabilities of were greater than 0.98 indicating good concordance between the two chains. Additional MCMC diagnostic checks can be found in the Supplementary Materials Section 5. We further compared our methods with three alternative approaches that use summary statistics obtained from GLCM as covariates (i.e. the 13 Haralick features described in Section 1.2): the penalized L1 lasso binomial regression of Friedman et al., 2010 implemented in the R package glmnet, the SCAD penalty for support vector machine (SVM) of Zhang et al., 2005 implemented in the R package penalizedSVM, and the Random Forest of Breiman, 2001 implemented in the R package randomForest. We applied those methods with a total of covariate features. We computed the AUC to assess the prediction performance, and predictive probabilities were computed as described in Section 3.1 for . Fig. 5 shows violin plots with density plots on each side, and sample means (marked as a diamond in the middle of plots) of AUCs. It shows the prediction performance for all competing methods across 12 scenarios: Bayes2Dnosym always performs best (much higher AUC), and Bayes2Dsym is the second best. This actually shows that the use of the raw GLCM (i.e without symmetrizing them) gives better predictive performance in particular when the noise in the GLCMs is large. Moreover, the prediction performance is worst when using summary statistics with the competing methods. However, all methods perform similarly in Case B under the scenario of different covariance structure between the two groups of responses (0 and 1), and . We have also compared our methods with the three alternative methods that use the whole symmetric and non-symmetric GLCMs (not summary statistics) as covariates. We still outperform the competing methods (see Figs. 11 and 12 in Section 9 of the Supplementary Materials for details). Finally, we have also run our two methods when i.e. without prior correlation between marginal effects of imaging sequences on the response . Results presented in Supplementary Material (Section 3) show that they are not significantly different from those with but seem slightly inferior.

Fig. 5

Simulation results using whole symmetric and non-symmetric GLCMs with our methods (Bayes2Dsym and Bayes2Dnosym), but competing methods (RF, glmnet and SVM) are fitted using (symmetric) GLCM-derived Haralick summary features. AUC were computed (in %) to evaluate the prediction performance. SC stands for same covariance i.e , and DC stands for different covariances i.e. and . Fig. 6 shows plots for marginal posterior probabilities (MPP) of and for a scenario. They show that all MPPs of components from FLAIR () and T1CE () sequences are less than 0.5 but for sequences T1W () and T2W (), some are larger than 0.5. This is aligned with our simulated data where sequences FLAIR and T1CE are not associated with the binary response as they were generated with the same means and covariances between the two groups of responses (i.e and ).

Fig. 6

Marginal posterior probabilities of on a simulated data scenario: Case A, and .

Application to TCGA LGG Data

In this section, we summarize results from our methods applied to the TCGA LGG dataset described in Section 1.1, which consists of TCGA LGG patients whose MRI scans were available for all imaging sequences (T1W, T1CE, T2W and FLAIR). To assess the utility of the whole GLCM in terms of prediction on the LGG data, we fitted the following models: (i) our models Bayes2Dnosym and Bayes2Dsym by using the non-symmetric and symmetric GLCM matrices respectively and (ii) competing methods as described in Section 4.1 using summary statistics as covariates. We adjusted all the models with clinical data such as age and gender. To assess the convergence of the Bayes2Dnosym and Bayes2Dsym models, we fitted 10 MCMC chains with different starting points. We assessed the convergence of binary variables by evaluating the correlation coefficients between their marginal posterior probabilities. These indicated good concordance between the 10 chains, with all correlations for each imaging sequence (figures in Supplementary Material, Section 5). Additional MCMC diagnostics checks for continuous parameters (in particular ) can be found in Supplementary Material. We run our methods with the same hyperparameter settings as described in Section 4. Figure 7(b) shows violin plots of AUCs for predictive performances of the aforementioned models. It shows that our two methods (Bayes2Dsym and Bayes2Dnosym) perform much better than the competing methods as they have higher AUCs. Moreover, our method Bayes2Dnosym which accounts for non-symmetric GLCMs has the highest predictive performance. Our methods outperform the competing methods, when the competing methods use the whole GLCM data as covariates, showing the importance of modeling the GLCM data as 2D objects. Fig. 8.

Fig. 7

Fig. 8

Comparison of predictive performance on the LGG data over 10 partitions of the sample data. competing methods used the whole GLCM data as covariates. (A) Bayes2Dnosym; (B) Bayes2Dsym; (C) glmnet; (D) RF and (E) SVM.

Comparison of predictive performance on the LGG data over 10 partitions of the sample data. Symmetric and non-symmetric GLCMs are used with our methods Bayes2Dsym and Bayes2Dnosym respectively, but competing methods (RF, glmnet and SVM) are fitted using GLCM-derived haralick features (computed from symmetric GLCMs). (A) Bayes2Dnosym; (B) Bayes2Dsym; (C) glmnet; (D) RF and (E) SVM. Comparison of predictive performance on the LGG data over 10 partitions of the sample data. competing methods used the whole GLCM data as covariates. (A) Bayes2Dnosym; (B) Bayes2Dsym; (C) glmnet; (D) RF and (E) SVM. Figs. 9(a)–(d) show marginal posterior probabilities of component inclusion for each imaging sequence . We can observe that only the imaging sequence T1CE does not have high probabilities which would confirm that sequences T1W, T2W and FLAIR are potential biomarkers for IDH status in LGG disease (Patel et al., 2017). From the four sequences of the MR imaging in lower grade gliomas, it is known that the tumor region could be slightly hypointense compared to white matter in T1W, whereas usually there is no enhancement observed in T1CE. However, with the T2W and FLAIR sequences, the tumor regions are usually observed with a mass-like hyperintense signal. This confirms with our findings as the marginal posterior probabilities corresponding to components from T1W, T2W and FLAIR are clearly higher as shown in Fig. 9, Fig. 9(d). This indicates that the differences in signals from these sequences is contributing towards better predictive capability of the model for the IDH status. For diffuse-astrocytomas, the tumor region in T1CE does not usually show enhancement except rare instances of small ill-defined areas (Bruno et al., 2019). In oligodendrogliomas contrast enhancement is shown in about 50% of the cases but is not a reliable indicator (Frank, 2019). The lack of enhancement could potentially contribute towards the GLCM not being able to consistently capture the information about the spatial heterogeneity. This could be one of the possible reasons for not seeing significant associations with the GLCMs from T1CE.

Fig. 9

Marginal posterior probabilities of obtained from the method Bayes2Dnosym.

Marginal posterior probabilities of obtained from the method Bayes2Dnosym. Figs. 10(a)–(d) show estimated functional effects for the non-symmetric case, where we see that the effect sizes corresponding to the GLCM for T1CE are lower in magnitude in comparison to the other three imaging sequences. In case of T1W since the tumor region is usually observed to be hypointense compared to the white matter, we expect to see more voxels with intensities in the middle of the gray-scale. This fact is captured from our analysis as we see that the functional effect sizes are high in magnitude for the cells corresponding to middle region on the gray-scale. However, in T2 and FLAIR sequences, the tumor region in mostly observed to be hyperintense and we see more voxels with intensities on the higher end of the gray-scale. This aspect is clearly captured as we see higher magnitudes for the functional effect sizes corresponding to the higher end of the gray-scale. Similarly, in the symmetric case we see from Figs. 11(a)–(d) that the functional effect sizes for T1CE are too small in magnitude compared to the other three imaging sequences. For T1W images we see that the effect sizes are higher for regions in the middle of the gray-scale consistent with the non-symmetric case. For T2W and FLAIR sequences we see higher magnitudes on the higher-end of the gray-scale as the tumor region is observed to be hyperintense in the image which is also consistent with the non-symmetric case.

Fig. 10

Functional regression effects of GLCMs for imaging sequence , based on non-symmetric matrices.

Fig. 11

Functional regression effects of GLCMs for imaging sequence , based on symmetric GLCM matrices.

Functional regression effects of GLCMs for imaging sequence , based on non-symmetric matrices. Functional regression effects of GLCMs for imaging sequence , based on symmetric GLCM matrices.

Conclusion

In this manuscript, we have proposed a Bayesian predictive model to accurately predict the IDH status for LGG patients based on GLCM matrices obtained from multiple imaging sequences. Our Bayesian approach has several innovative characteristics: (i) it employs novel image-dependent variable selection priors, leveraging correlations between imaging sequences through priors imposed on regression coefficients, (ii) it explicitly accounts for the structure of the whole GLCM matrices rather than restricting its use to some derived summary statistics, (iii) it accounts for both symmetric and non-symmetric GLCM matrices, and (iv) it treats any sequence r of GLCM counts as 2D functional data, indexed by pairwise gray-level intensities. We elaborate on these features below. The performance of our method was evaluated on simulated data and on a data set collected from TCGA LGG patients. Both studies show that using the whole GLCM matrices enhances prediction performances. While GLCM-derived features are usually computed on symmetric matrices as symmetric GLCMs are rotation and direction invariant, our method with non-symmetric GLCMs performs slightly better than symmetric GLCMs in terms of prediction performance. Prediction of IDH mutation status (and other molecular status) in gliomas using radiomic features is an extensively studied problem in the field of radiomics. The first step in these methods is feature construction from one or more MRI sequences, followed by either (a) standard statistical modeling (e.g. logistic regression, random forest) using those features (Chaddad et al., 2015, Hsieh et al., 2017, Jakola et al., 2018, Li et al., 2018, Tian et al., 2018, Han et al., 2019, Liu et al., 2019), or (b) deep learning-based approaches (Li et al., 2017, Chang et al., 2018). Some of the existing literature for IDH prediction in gliomas, using either texture features or morphological features or both, report AUCs greater than . However, one of the main drawbacks of these approaches is the subjectivity in the choice of what features to include, and how many of them to include. The choice of these features is also reliant upon the pre-processing steps involved during the feature selection. However, our proposed method addresses the issue of subjectivity in the choice of features, and is an end-to-end approach that takes as input the MRI scans from four imaging sequences to compute the corresponding GLCMs. These GLCMs are directly used as data objects in the modeling such that there is no subjectivity in feature construction. We build a classification model for the IDH status by directly using the GLCMs from the four MRI sequences as predictors and treating them as 2D functional data objects. Our approach aims to capture the entire information in the GLCM instead of only using summary statistics constructed from the GLCM as predictors. We developed a rigorous functional data based approach, that respects the underlying geometry of the space of GLCMs, to predict the IDH status. Our approach provides an alternative way that complements the existing approaches which use texture/morphological features to predict the molecular status in LGGs. Our Bayesian hierarchical model can be generalized and extended in different directions. One can incorporate molecular data features (e.g. RNA-seq and proteomic data) as potential response variables. This may also require the incorporation of the dependence between GLCM and molecular data as similarly implemented in Chekouo et al., 2016. Although our motivating example was LGGs, our framework could readily be applied to gliomas in general, and other types of cancer where GLCMs can be constructed from imaging data.

CRediT authorship contribution statement

Thierry Chekouo: Methodology, Software, Formal analysis, Resources, Data curation, Writing - original draft, Funding acquisition. Shariq Mohammed: Conceptualization, Software, Formal analysis, Data curation, Writing - original draft, Funding acquisition. Arvind Rao: Conceptualization, Data curation, Formal analysis, Writing - review & editing, Resources, Project administration, Funding acquisition.

Algorithm 1 Simulating symmetric GLCMs for four imaging sequences

1: Set (m(r),Σ(r))=(m1(r),Σ1(r)) or (m0(r),Σ0(r)) based on y=1 or 0, respectively.

2: Let w be an integer sampled randomly from a discrete uniform distribution U{v1,v2}.

3: for

j∈1,…,T

4: for

k∈1,…,j

5: Let ∊jk=∊kj=(∊jk(1),…,∊jk(4))⊤, where ∊jk∼N4(0,σ∊2Δjk).

6: for

r∈1,…,4

7: S← square given by (k-1,T-j),(k,T-j),(k,T-j+1),(k-1,T-j+1).

8: qjk(r)=qkj(r)=P(X∈S), where X∼N2(m(r),Σ(r)).

9: Compute the GLCM entry as gjk(r)=wexp(log(qjk(r))+∊jk(r)).

30 in total

Review 1. Texture analysis: a review of neurologic MR imaging applications.

Authors: A Kassner; R E Thornhill
Journal: AJNR Am J Neuroradiol Date: 2010-04-15 Impact factor: 3.825

2. Texture analysis of multiple sclerosis: a comparative study.

Authors: Jing Zhang; Longzheng Tong; Lei Wang; Ning Li
Journal: Magn Reson Imaging Date: 2008-05-29 Impact factor: 2.546

3. MRI texture analysis based on 3D tumor measurement reflects the IDH1 mutations in gliomas - A preliminary study.

Authors: Liang Han; Siyu Wang; Yanwei Miao; Huicong Shen; Yan Guo; Lizhi Xie; Yuqing Shang; Junyi Dong; Xiaoxin Li; Weiwei Wang; Qingwei Song
Journal: Eur J Radiol Date: 2019-01-24 Impact factor: 3.528

4. Quantitative texture analysis in the prediction of IDH status in low-grade gliomas.

Authors: Asgeir Store Jakola; Yi-Hua Zhang; Anne J Skjulsvik; Ole Solheim; Hans Kristian Bø; Erik Magnus Berntsen; Ingerid Reinertsen; Sasha Gulati; Petter Förander; Torkel B Brismar
Journal: Clin Neurol Neurosurg Date: 2017-12-05 Impact factor: 1.876

5. Regularization Paths for Generalized Linear Models via Coordinate Descent.

Authors: Jerome Friedman; Trevor Hastie; Rob Tibshirani
Journal: J Stat Softw Date: 2010 Impact factor: 6.440

6. A Bayesian integrative approach for multi-platform genomic data: A kidney cancer case study.

Authors: Thierry Chekouo; Francesco C Stingo; James D Doecke; Kim-Anh Do
Journal: Biometrics Date: 2016-09-26 Impact factor: 2.571

7. Deep Learning based Radiomics (DLR) and its usage in noninvasive IDH1 prediction for low grade glioma.

Authors: Zeju Li; Yuanyuan Wang; Jinhua Yu; Yi Guo; Wei Cao
Journal: Sci Rep Date: 2017-07-14 Impact factor: 4.379

8. IDH mutation-specific radiomic signature in lower-grade gliomas.

Authors: Xing Liu; Yiming Li; Shaowu Li; Xing Fan; Zhiyan Sun; Zhengyi Yang; Kai Wang; Zhong Zhang; Tao Jiang; Yong Liu; Lei Wang; Yinyan Wang
Journal: Aging (Albany NY) Date: 2019-01-29 Impact factor: 5.682

9. IDH/MGMT-driven molecular classification of low-grade glioma is a strong predictor for long-term survival.

Authors: Severina Leu; Stefanie von Felten; Stephan Frank; Erik Vassella; Istvan Vajtai; Elisabeth Taylor; Marianne Schulz; Gregor Hutter; Jürgen Hench; Philippe Schucht; Jean-Louis Boulay; Luigi Mariani
Journal: Neuro Oncol Date: 2013-02-13 Impact factor: 13.029

10. A Visually Interpretable, Dictionary-Based Approach to Imaging-Genomic Modeling, With Low-Grade Glioma as a Case Study.

Authors: Srikanth Kuthuru; William Deaderick; Harrison Bai; Chang Su; Tiep Vu; Vishal Monga; Arvind Rao
Journal: Cancer Inform Date: 2018-10-05

2 in total

1. Exploration of an Integrative Prognostic Model of Radiogenomics Features With Underlying Gene Expression Patterns in Clear Cell Renal Cell Carcinoma.

Authors: Yeqian Huang; Hao Zeng; Linyan Chen; Yuling Luo; Xuelei Ma; Ye Zhao
Journal: Front Oncol Date: 2021-03-08 Impact factor: 6.244

2. A Radiomics Model for Predicting Early Recurrence in Grade II Gliomas Based on Preoperative Multiparametric Magnetic Resonance Imaging.

Authors: Zhen-Hua Wang; Xin-Lan Xiao; Zhao-Tao Zhang; Keng He; Feng Hu
Journal: Front Oncol Date: 2021-09-02 Impact factor: 6.244

2 in total