Literature DB >> 35574500

ordinalbayes: Fitting Ordinal Bayesian Regression Models to High-Dimensional Data Using R.

Kellie J Archer1, Anna Eames Seffernick1, Shuai Sun1, Yiran Zhang2.   

Abstract

The stage of cancer is a discrete ordinal response that indicates the aggressiveness of disease and is often used by physicians to determine the type and intensity of treatment to be administered. For example, the FIGO stage in cervical cancer is based on the size and depth of the tumor as well as the level of spread. It may be of clinical relevance to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. High-throughput RNA-Seq data and corresponding clinical data (including stage) for cervical cancer patients have been made available through The Cancer Genome Atlas Project (TCGA). We recently described penalized Bayesian ordinal response models that can be used for variable selection for over-parameterized datasets, such as the TCGA-CESC dataset. Herein, we describe our ordinalbayes R package, available from the Comprehensive R Archive Network (CRAN), which enhances the runjags R package by enabling users to easily fit cumulative logit models when the outcome is ordinal and the number of predictors exceeds the sample size, P > N, such as for TCGA and other high-throughput genomic data. We demonstrate the use of this package by applying it to the TCGA cervical cancer dataset. Our ordinalbayes package can be used to fit models to high-dimensional datasets, and it effectively performs variable selection.

Entities:  

Keywords:  LASSO; cumulative logit; penalized models; spike-and-slab; variable inclusion indicators

Year:  2022        PMID: 35574500      PMCID: PMC9097970          DOI: 10.3390/stats5020021

Source DB:  PubMed          Journal:  Stats (Basel)        ISSN: 2571-905X


Introduction

Despite the advent of HPV vaccinations and effective screening programs, globally, cervical cancer is the fourth most common cancer among women [1]. The estimated number of new cases in 2020 is 604,127 with 341,831 deaths [2]. The stage of cervical cancer, as outlined in the International Federation of Gynecology and Obstetrics (FIGO) guidelines, is based on physical examinations, endoscopic procedures, and imaging. Specifically, the FIGO stage is based on the size and depth of the tumor as well as the level of spread [3]. It is important that the stage, a discrete ordinal response, be correct as it is used to guide treatment planning, counsel patients with respect to prognosis, and to determine whether the patient meets eligibility criteria for available clinical trials or other research studies [4,5]. Unfortunately, there is still debate as to whether surgical or non-invasive radiological modalities for identifying parametrial and lymph node involvement is preferred when staging a patient [4]. Thus, it is clinically relevant to identify molecular features from high-throughput genomic assays that are associated with the stage of cervical cancer to elucidate pathways related to tumor aggressiveness, identify improved molecular features that may be useful for staging, and identify therapeutic targets. Penalized frequentist models have been widely applied when analyzing high-dimensional data. Such models were initially described for linear [6] and logistic [7] regression and subsequently for ordinal response models [8-10]. However, when applying penalized frequentist models, the penalty parameter, or vector of parameters in the case of elastic net, must be selected by the analyst. As a result, the coefficient estimates from the resulting model are conditional on that penalty parameter. For that reason, penalized Bayesian models were developed for the linear [11-14] and logistic [15-17] regression settings. We also recently described penalized Bayesian models for the ordinal response setting [18] and demonstrated that our penalized Bayesian cumulative logit model has improved variable selection performance when compared to penalized frequentist cumulative logit models [19]. Herein, we describe our ordinalbayes R package, which enhances the runjags R package [20] by enabling users to easily fit penalized Bayesian cumulative logit models. The ordinalbayes function can be used to fit LASSO, normal spike-and-slab, double exponential spike-and-slab, and regression-based variable inclusion indicator Bayesian models. Variable selection can be performed using the Bayes factor or using the posterior distributions of the variable inclusion indicators directly. In the following sections, we describe our implementation and describe the syntax required for each of our Bayesian models. We then illustrate the functions in the ordinalbayes R package using two examples where we were interested in identifying transcripts important to predicting the FIGO stage in cervical cancer patients using high-throughput gene expression data. A small example is provided in Appendix A.

Materials and Methods

Ordinal Bayesian Models and R Syntax

We previously described four penalized cumulative logit Bayesian models that can be fit when the covariate space is high-dimensional [18]. This includes a regression-based variable inclusion indicator ordinal model, a LASSO ordinal model, a normal spike-and-slab ordinal model, and a double exponential spike-and-slab ordinal model. To introduce our penalized cumulative logit Bayesian models, we let Y1, … ,Y represent the ordinal responses for n subjects, which can take on one of 1, … , K ordinal response levels, with K representing the number of ordinal levels. Let = (x, x, … , x)′ represent the vector of covariates for subject i, where P represents the number of predictors. When assuming proportional odds, the effect of each covariate is constant across all ordinal response levels such that the slope for the ordinal responses are parallel. For each ordinal level k = 1, 2, … , K − 1, let = (β1, β2 … , β)′ denote a vector of unknown regression coefficients. The cumulative logit model is where Pr(Y ≤ k|) is the cumulative probability of the event Y ≤ k given . The thresholds differentiate between the K ordinal levels and must satisfy the constraint −∞ = α0 < α1 <α2 < ··· < α <α = ∞. Herein, we describe our ordinalbayes package that enhances the functionality of the runjags package by providing functions specific to fitting these four penalized ordinal Bayesian models and extracting results of interest. We also provide an overview of each model. Tables summarizing the package functions and syntax appears in Appendix C. The primary function for model fitting in the ordinalbayes package is ordinalbayes. The function arguments are This function accepts a model formula that specifies the ordinal response on the left-hand side of the equation and any unpenalized predictor variable(s) on the right-hand side of the equation. Unpenalized predictors are variables such as age that we include in the model without applying any shrinkage of their corresponding parameter estimates. When unpenalized predictors are included as covariates in the model, the user can specify the variance associated with the corresponding model parameters (default coerce.var = 10). If no unpenalized predictor variables are included, the model formula should be y ~ 1 (representing the intercept). The user can subset the data.frame prior to model fitting, for example, subset=(race ==“white”). To specify the penalized covariates in the model, the user should pass the data.frame to the x parameter, indicating the relevant columns of covariates. By default, the penalized covariates are centered (center = TRUE) and scaled (scale = TRUE). The selected parameters are initialized prior to updating through MCMC. For one chain, the k − 1 ordinal thresholds, α, are initialized to the logit of the cumulative response probabilities, which is equivalent to the estimated k − 1 thresholds in an intercept-only model For multiple chains, initial values for the α terms for chains beyond the first chain are sampled from a Normal(0, 0.5) distribution and then sorted to impose the α1 < ··· <α order restriction. Within the MCMC, the α terms are sampled from a Normal , and users can adjust the variance by specifying alpha.var (default 10 such that the precision is 0.10). All penalized coefficients (β for j = 1, … , P) are initialized to zero. Other relevant parameters common to all model types include: nChains, the number of parallel chains for the model (default 3); adaptSteps, the number of iterations for adaptation (default 5000); burnInSteps, the number of iterations of the Markov chain to run (default 5000); numSavedSteps, the number of saved steps per chain (default 9999); and thinSteps, the thinning interval for monitors (default 3). Provided the user will be running the model on a machine with multiple processors, the computational speed can be improved by running the chains in parallel by specifying parallel = TRUE. When parallel = TRUE, runjags executes the MCMC sampling using nChains parallel processors. To ensure the user can obtain reproducible results, seed accepts an integer that is used to set the random seed. The output from JAGS can be suppressed by specifying quiet = TRUE. The user can fit one of four available Bayesian models. A list of the parameters the user can set for all four models is provided in Table A1. Following Section 2.1.1, which describes applying ordinalbayes to Bioconductor objects, each of the four models is described along with the relevant arguments that must be specified by the user. A list of the parameters the user needs to set for each specific model is provided in Table A2.
Table A1.

ordinalbayes parameters available for all models.

ParameterDescription and Default Values

alpha.var Variance for αk in the MCMC chain (default 10)
coerce.var Variance associated with any unpenalized predictors in the MCMC chain (default 10)
adaptSteps Number of iterations for adaptation (default 5000)
burnInSteps Number of iterations of the Markov chain to run (default 5000)
nChains Number of parallel chains to run (default 3)
numSavedSteps Number of saved steps for each chain (default 9999)
thinSteps The thinning interval for monitors (default 3)
parallel Run the MCMC on multiple processors (default TRUE)
model Specify which penalized ordinal model to fit (default regressvi)
center If TRUE (default), center the variables to be penalized in the model
scale If TRUE (default), scale the variables to be penalized in the model
seed An integer value for the random seed to ensure reproducibility
quiet If TRUE, suppress output of JAGS (or rjags) when updating models (default FALSE)
Table A2.

ordinalbayes parameters for each penalized ordinal Bayesian model.

ModelParameters in Ordinalbayes Call to SpecifyDescription

lassoa, bThe penalty parameter λ ∼Gamma(a, b) (default a = 0.1, b = 0.1)

normalss sigma2.0 The variance for the spike (set to some small positive value, e.g., 0.01)
sigma2.1 The variance for the slab (set to some large positive value, e.g., 10)
gamma.ind=“fixed”, pi.fixed Use a constant prior for πj of pi.fixed (default 0.05)
gamma.ind=“random”, c.gamma, d.gamma Use a random prior for πj ∼Beta(c.gamma, d.gamma), for example, c.gamma = 0.01, d.gamma = 0.19.

dess a, b The penalty parameter λ ∼Gamma(a, b) (default a = 0.1, b = 0.1)
lambda0 The parameter value for the spike, e.g., lambda0 = 20
gamma.ind=“fixed”, pi.fixed Use a constant prior for πj of pi.fixed (default 0.05)
gamma.ind=“random”, c.gamma, d.gamma Use a random prior for πj ∼Beta(c.gamma, d.gamma), for example, c.gamma = 0.01, d.gamma = 0.19.

regressvi a, b The penalty parameter λ ∼Gamma(a, b) (default a = 0.1, b = 0.1)
gamma.ind=“fixed”, pi.fixed Use a constant prior for πj of pi.fixed (default 0.05)
gamma.ind=“random”, c.gamma, d.gamma Use a random prior for πj ∼Beta(c.gamma, d.gamma), for example, c.gamma = 0.01, d.gamma = 0.19.

Use with Bioconductor Objects: SummarizedExperiment and ExpressionSet

When analyzing data processed using the DESeq2 Bioconductor package, the genomic feature object is of class DESeqTransform, which is a SummarizedExperiment, and therefore, the phenotypic data are accessed using the colData extractor function. When analyzing data processed using packages that structure the genomic feature object as a Biobase ExpressionSet, the phenotypic data are accessed using the pData extractor function. Therefore, in the ordinalbayes call, data should be either a colData() or pData() call to the genomic feature object. Again, the ordinalbayes function accepts a model formula that specifies the ordinal response on the left-hand side of the equation and any unpenalized predictor variable(s) from the phenotypic dataset on the right-hand side of the equation. If no unpenalized predictor variables are included, the model formula should be y ~ 1 (representing the intercept). When specifying the penalized covariates in the model, the user should pass to the x parameter the appropriate call for extracting the genomic feature data from the object. For SummarizedExperiment objects, the genomic features to be penalized are accessed using the assay() extractor function. For ExpressionSet objects, the genomic features to be penalized are accessed using the exprs() extractor function. The user can also pass a matrix to x; however, the user needs to carefully verify that the observations in the x matrix are appropriately aligned to the phenotypic data. Note that the number of rows in both data and x should be the same, such that the transpose of assay or exprs should be supplied to x.

Regression-Based Variable Inclusion Indicator Ordinal Model

By default, the model that is fit is the regression-based variable inclusion indicator Bayesian model, specified by model = “regressvi”. This model takes the form and assumes the penalized coefficients are from a Laplace (or double exponential) distribution with parameter λ and that λ is from a Gamma distribution with parameters a and b. Based on our extensive simulations [19], model performance is not affected by choices of a and b, so we provide defaults of 0.1 for both. The variable inclusion indicator γ is assumed to follow a Bernoulli distribution with parameter π. The user can use either a fixed constant prior (default) or a random prior. When using a fixed constant prior, the user must specify both gamma.ind=“fixed” and set pi.fixed to some constant in the (0, 1) interval (default is 0.05). Alternatively, a random prior for π is acheived by specifying both gamma.ind=“random” and parameter values (c.gamma and d.gamma) for the Beta distribution. Values of c.gamma and d.gamma should be selected such that the mean of the Beta distribution for the variable inclusion indicators corresponds to the anticipated proportion of covariates truly associated with the ordinal response, given by c/(c + d), while considering that the variance is given by If unpenalized coefficients are included in the model, their coefficients are ζ ~ Normal .

Lasso Ordinal Model

The LASSO Bayesian ordinal model can be fit by specifying model=“lasso”. This model assumes the penalized coefficients β for j = 1, … , P are from independent Laplace (or double exponential) distributions with parameter λ and that λ is from a Gamma distribution with parameters a and b. As previously mentioned, model performance is not affected by choices of a and b, so we provide defaults of 0.1 for both. If unpenalized coefficients are included in the model, their coefficients are .

Normal Spike-and-Slab Ordinal Model

The normal spike-and-slab Bayesian ordinal model can be fit by specifying model=“normalss”. This model is given by When fitting this model, the user is required to specify the variance for the spike by setting sigma2.0 to a small positive value (e.g., 0.01) and variance for the slab by setting sigma2.1 to a large positive value (e.g., 10). As with the regression-based variable inclusion indicator Bayesian model, the variable inclusion indicator γ is assumed to follow a Bernoulli distribution with parameter π. The user can use either a fixed constant prior (default) or a random prior. When using a fixed constant prior, the user must specify both gamma.ind=“fixed” and set pi.fixed to some constant in the (0, 1) interval (default is 0.05). Alternatively, a random prior for π is acheived by specifying both gamma.ind=“random” and parameter values (c.gamma and d.gamma) for the Beta distribution. If unpenalized coefficients are included in the model, their coefficients are .

Double Exponential Spike-and-Slab Ordinal Model

The double exponential spike-and-slab ordinal model can be fit by specifying model=“dess” and is given by When fitting this model the user is required to specify the parameter for the spike (λ0) using lambda0, which should be a large positive value (e.g., 20), while the slab is taken to be a double exponential distribution with parameter λ where that λ is from a Gamma distribution with parameters a and b. As with the regression-based variable inclusion indicator and Normal spike-and-slab models, the variable inclusion indicator γ is assumed to follow a Bernoulli distribution with parameter π. The user can use either a fixed constant prior (default) or a random prior. When using a fixed constant prior, the user must specify both gamma.ind=“fixed” and set pi.fixed to some constant in the (0, 1) interval (default is 0.05). Alternatively, a random prior for π is achieved by specifying both gamma.ind=“random” and parameter values (c.gamma and d.gamma) for the Beta distribution. If unpenalized coefficients are included in the model, their coefficients are .

Other Package Functions

The ordinalbayes function yields an object of class ordinalbayes. Generic functions have been specifically tailored to extract meaningful results from the resulting MCMC chain. The print function returns several summaries from the MCMC output for each parameter monitored, including: the 95th lower confidence limit for the highest posterior density (HPD) credible interval (Lower95), the median value (Median), the 95th upper confidence limit for the HPD credible interval (Upper95), the mean value (Mean), the sample standard deviation (SD), the mode of the variable (Mode), the Monte Carlo standard error (MCerr,) percent of SD due to MCMC (MC%ofSD), effective sample size (SSeff), autocorrelation at a lag of 30 (AC.30), and the potential scale reduction factor (psrf). The plot function provides a trace of the sampled output and optionally the density estimate for each variable in the chain. This function additionally adds the appropriate beta and gamma labels for each penalized variable name. When identifying important covariates, the regression-based variable inclusion indicator, normal spike-and-slab, and double exponential spike-and-slab Bayesian ordinal models all incorporate a variable inclusion indicator, γ, in the model. Variable selection can be based on whether the posterior mean of γ exceeds a pre-specified threshold. Alternatively, we can use the Bayes factor to test the hypotheses H0 : γ = 0 versus H : γ = 1, where the null hypothesis is rejected for feature j if the Bayes factor exceeds a pre-specified threshold. For the LASSO, normal spike-and-slab, and double exponential spike-and-slab Bayesian ordinal models, the Bayes factor can be used to test an interval null hypothesis H0 : |β| ≤ ϵ versus H : |β| > ϵ, where ϵ is a small positive value that is close to 0. For the regression-based variable inclusion indicator Bayesian ordinal model, the Bayes factor can be used to test H0 : |γβ| ≤ ϵ versus H : |γβ| > ϵ. Note that for the Bayesian LASSO, no variable inclusion indicators are incorporated, so variable selection can only be performed using the Bayes factor for β. The summary function requires an ordinalbayes object, and the user can specify epsilon (default 0.1) for testing the null hypothesis that H0 : |β| ≤ ϵ. The output from summary is a list containing the following components: alphamatrix, the MCMC output for the threshold parameters; betamatrix, the MCMC output for the penalized parameters; zetamatrix, The MCMC output for the unpenalized parameters (if included); gammamatrix, the MCMC output for the variable inclusion parameters (not available when model = “lasso”); gammamean, the posterior mean of the variable inclusion indicators (not available when model = “lasso”); gamma.BayesFactor, Bayes factor for the variable inclusion indicators (not available when model = “lasso”); Beta.BayesFactor, Bayes factor for the penalized parameters; and lambdamatrix, the MCMC output for the penalty parameter (not available when model=“normalss”). The coef function also accepts an ordinalbayes object and returns a function (default is method=mean) of the posterior distribution of the penalized parameter estimates and variable inclusion indicators. The predict function accepts an ordinalbayes object and optionally allows the user to specify new data for unpenalized predictors and the penalized predictors by invoking neww = and newx =, respectively. If neww and newx are not supplied, the original data are used for prediction. The model.select parameter allows the user to obtain model predictions through one of three different methods. When model.select = “average” (default), the mean coefficient values over the MCMC chain are used to estimate fitted probabilities; the predicted response is attaining the maximum fitted probability. When model.select = “median”, the median coefficient values over the MCMC chain are used to estimate fitted probabilities; the predicted ordinal response is attaining the maximum fitted probability. When model.select = “max.predicted.class”, each step in the chain is used to calculate fitted probabilities and the ordinal response, then the final predicted ordinal response is taken as that ordinal response level that is most frequently predicted. The function fitted is synonymous with predict.

Analysis of Cervical Cancer Dataset

We downloaded the transcript-level HTSeq count data for the 309 subjects from the The Cancer Genome Atlas Cervical Squamous Cell Carcinoma and Endocervical Adenocarcinoma (TCGA-CESC) project [21] having transcriptome profiling performed using the TCGAbiolinks Bioconductor package [22]. We then restricted attention to the 253 cervical cancer subjects with a primary diagnosis of squamous cell carcinoma. Subsequently, we removed one subject whose sample was FFPE preserved, one subject with metastatic disease, two subjects who contributed only solid normal tissue, and seven subjects lacking FIGO stage. This left 242 subjects in Stage I (N = 124), II (N = 61), and III-IV (N = 57). Using the DESeq2 Bioconductor package [23], we performed differential expression analysis using the stage as the independent predictor in the negative binomial model. We then applied the regularized log transformation to robustly transform the count data to a log2 scale to stabilize the variance and then filtered the resulting dataset to retain transcripts that had a mean expression > 0.5 and FDR< 0.10 from the stage I versus stages III/IV contrast. We fit a regression-based variable inclusion indicator Bayesian ordinal model using a Beta(0.01, 0.19) hyperprior for the π using the runjags package to run three parallel chains with 5000 burn-in, 5000 tuning steps, and thinned to keep every third step in the sampling process to reduce auto-correlation in our posterior samples, and we kept 9999 saved steps per chain. Convergence was assessed using Gelman and Rubin’s potential scale reduction factor (PSRF).

Results

There were 1137 transcripts that were differentially expressed at a Benjamini–Hochberg FDR< 0.05 and 2009 transcripts that were differentially expressed at a Benjamini-Hochberg FDR< 0.10 when examining the contrast between stage I and stages III/IV. These 2009 transcripts were retained for Bayesian modeling. Forty transcripts had a Bayes factor > 4 when testing H0 : |γβ| ≤ 0.1 versus H : |γβ| > 0.1. Forty-one transcripts had a Bayes factor > 4 when testing H0 : γ = 0 versus H : γ = 1 (Table 1). Notably, the features were the same with the exception that Bayes factor testing γ = 0 additionally identified ENSG00000115548 (Gene symbol KDM3A).
Table 1.

Transcripts significant from the regression-based variable inclusion indicator Bayesian ordinal model when testing H0 : γ = 0 versus H : γ = 1 using the Bayes factor and a threshold of 4. Annotation information obtained on 28 February 2022 from https://www.ncbi.nlm.nih.gov/gene, https://www.genecards.org, and https://lncipedia.org.

Ensemble IDGene SymbolChr γ¯

ENSG00000076344 RGS11 160.179
ENSG00000077274 CAPN6 X0.264
ENSG00000101888 NXT2 X0.194
ENSG00000115548 KDM3A 20.174
ENSG00000122884 P4HA1 100.186
ENSG00000125430 HS3ST3B1 170.286
ENSG00000131370 SH3BP5 30.175
ENSG00000135443 KRT85 120.334
ENSG00000136457 CHAD 170.179
ENSG00000138398 PPIG 20.240
ENSG00000150636 CCDC102B 180.281
ENSG00000161277 THAP8 190.283
ENSG00000163510 CWC22 20.301
ENSG00000164485 IL22RA2 60.196
ENSG00000164651 SP8 70.231
ENSG00000166091 CMTM5 140.215
ENSG00000166342 NETO1 180.197
ENSG00000171121 KCNMB3 30.186
ENSG00000177173Pseudogene, parent NAP1L4P110.258
ENSG00000180229 HERC2P3 150.196
ENSG00000188817 SNTN 30.236
ENSG00000197360 ZNF98 190.214
ENSG00000203601LINC0097010.316
ENSG00000225449 RAB6C-AS1 20.235
ENSG00000230201Pseudogene, parent ATP6V0CP1170.286
ENSG00000233996Pseudogene, parent KDM3AP120.248
ENSG00000236138 DUX4L26 30.247
ENSG00000236819LINC01563170.311
ENSG00000250602lnc-ALDH7A1-150.246
ENSG00000253923Pseudogene, parent HSPE180.302
ENSG00000256980 KHDC1L 60.207
ENSG00000259083lnc-TRAPPC6B-1140.263
ENSG00000259134LINC00924150.352
ENSG00000260484lnc-OPRK1-280.263
ENSG00000263612lnc-ZNF517-480.228
ENSG00000264049MIR4737170.266
ENSG00000264954 PRR29-AS1 170.221
ENSG00000265579lnc-CBLN2-1180.227
ENSG00000271711Pseudogene, parent SAP3030.264
ENSG00000272071lnc-PAPD7-250.279
ENSG00000276517Lnc-TTC27-920.221
Many genes listed in Table 1 are relevant to cervical cancer, related cancers of the female reproductive system, or cancer in general. For example, in a tissue-based study, CAPN6 was not detected in normal cervical squamous epithelium, but its expression was observed in low-grade and increased further in high-grade squamous cervical intraepithelial lesions [24]. KDM3A is an epigenetic regulator that has been found to be highly expressed in cervical cancer tissues and involved in cervical cancer progression [25]. P4HA1 was included in a five-gene signature to predict cervical cancer prognosis [26]. A previous study suggested that CMTM5 is a tumor suppressor that is frequently methylated and thus loses function in cancer [27], including cervical cancer [28]. RAB6C has been shown to be aberrantly methylated in cervical cancer compared to normal tissues [29]. ALDH7A1 was among 30 genes that demonstrated a dose–response pattern with NNK, a tobacco carcinogen, in cervical cancer samples [30], implicating tobacco may be a causative factor in cervical cancer development in addition to HPV infection. Other genes, while not yet described in cervical cancer, have been found to be prognostic in ovarian cancer (RGS11 [31], CHAD and CBLN2 [32], NETO1 [33], HSPE1 [34], and BIRC6, which Lnc-TTC27–9 is intronic to [35]). The expression of SH3BP5 is reduced in ovarian cancer samples compared to normal tissue and that silencing of Sab protein expression may lead to chemo-resistance [36]. The expression of SNTN has high discriminatory power to differentiate between normal tissue, serous borderline ovarian tumors, and serous ovarian carcinoma [37]. IL22RA2 is highly expressed in various tissues, including those in the female reproductive system [38]. With respect to genes associated with other cancers, NXT2 was among 12 genes used to define prognostic risk groups in melanoma [39]. A review article described that the aberrant expression of HS3ST3B1 is observed in many cancers, and the authors posited that HS3ST3B1 may act as a tumor-promoting enzyme [40]. The expression of KRT85 was found to be associated with overall survival in subjects with colon cancer [41]. When using the fitted model using the 2009 transcripts, only 16.9% of subjects were misclassified, with all misclassifications in Stage II. However, when fitting a parsimonious model including only the 41 transcripts in Table 1, the misclassification rate decreased to 11.6%. For evaluating the effectiveness of this multi-category classification, we evaluated the hypervolume under ROC manifold [42,43], which was 0.865 (95% CI: 0.800, 0.914) for the 41 transcript model, indicating good discrimination among the three stages.

Discussion

The ordinalbayes package is based on runjags and enables the user to easily fit penalized ordinal Bayesian cumulative logit models to high-dimensional datasets. The package includes methods for monitoring the mixing of chains (plot) and convergence (print). It also includes a summary function that permits the user to estimate the Bayes factor for testing an interval null hypothesis for β and for testing the null that γ = 0 to assist the user with variable selection. The coef function uses the posterior distribution to return summary estimates of the penalized β and the γ indicators. The predict (or equivalently, fitted) function can be used to obtain the estimated ordinal response probabilities as well as the predicted ordinal response level for each observation. When applied to The Cancer Genome Atlas cervical cancer dataset, predictive performance was excellent. When restricting attention to only the 41 transcripts with a Bayes factor > 4, predictive performance yielded an overall misclassification error of 11.6%, though the misclassification error increased from 0% for Stage I and III/VI in the full model to 3.2% and 14.0%, respectively, in the reduced model. Interestingly, transcripts that were identified have known associations with cervical cancer, cancers of the female reproductive system, and other cancer in general. The syntax we used to analyze this dataset appears in the Appendix B.
123
1120152
243213
301442
123
1124280
20200
301357
123
112091
24457
30749
  39 in total

1.  Evaluating classification accuracy for modern learning approaches.

Authors:  Jialiang Li; Ming Gao; Ralph D'Agostino
Journal:  Stat Med       Date:  2019-01-30       Impact factor: 2.373

Review 2.  Cervical cancer worldwide.

Authors:  Mailinh Vu; Jim Yu; Olutosin A Awolude; Linus Chuang
Journal:  Curr Probl Cancer       Date:  2018-06-25       Impact factor: 3.187

3.  ESMO-ESGO consensus conference recommendations on ovarian cancer: pathology and molecular biology, early and advanced stages, borderline tumours and recurrent disease†.

Authors:  N Colombo; C Sessa; A du Bois; J Ledermann; W G McCluggage; I McNeish; P Morice; S Pignata; I Ray-Coquard; I Vergote; T Baert; I Belaroussi; A Dashora; S Olbrecht; F Planchamp; D Querleu
Journal:  Ann Oncol       Date:  2019-05-01       Impact factor: 32.976

4.  Gene expression profiles in HPV-immortalized human cervical cells treated with the nicotine-derived carcinogen 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone.

Authors:  Bogdan Prokopczyk; Indu Sinha; Neil Trushin; Willard M Freeman; Karam El-Bayoumy
Journal:  Chem Biol Interact       Date:  2008-11-06       Impact factor: 5.192

5.  Evaluation of logistic Bayesian LASSO for identifying association with rare haplotypes.

Authors:  Swati Biswas; Charalampos Papachristou
Journal:  BMC Proc       Date:  2014-06-17

6.  Logistic Bayesian LASSO for genetic association analysis of data from complex sampling designs.

Authors:  Yuan Zhang; Jonathan N Hofmann; Mark P Purdue; Shili Lin; Swati Biswas
Journal:  J Hum Genet       Date:  2017-04-20       Impact factor: 3.172

7.  Integrated genomic and molecular characterization of cervical cancer.

Authors: 
Journal:  Nature       Date:  2017-01-23       Impact factor: 49.962

8.  Heparanase gene haplotype (CGC) is associated with stage of disease in patients with ovarian carcinoma.

Authors:  Shirley Ralph; Paul E C Brenchley; Angela Summers; Daniela D Rosa; Ric Swindell; Gordon C Jayson
Journal:  Cancer Sci       Date:  2007-04-05       Impact factor: 6.716

9.  TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data.

Authors:  Antonio Colaprico; Tiago C Silva; Catharina Olsen; Luciano Garofano; Claudia Cava; Davide Garolini; Thais S Sabedot; Tathiane M Malta; Stefano M Pagnotta; Isabella Castiglioni; Michele Ceccarelli; Gianluca Bontempi; Houtan Noushmehr
Journal:  Nucleic Acids Res       Date:  2015-12-23       Impact factor: 16.971

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.