Behrouz Madahian, Sujoy Roy, Dale Bowman, Lih Y Deng, Ramin Homayouni.
Abstract
BACKGROUND: The dimension and complexity of high-throughput gene expression data create many challenges for downstream analysis. Several approaches exist to reduce the number of variables with respect to small sample sizes. In this study, we utilized the Generalized Double Pareto (GDP) prior to induce sparsity in a Bayesian Generalized Linear Model (GLM) setting. The approach was evaluated using a publicly available microarray dataset containing 99 samples corresponding to four different prostate cancer subtypes.
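As a rough illustration of how a GDP prior induces sparsity, the sketch below evaluates the GDP density and draws from it through its normal scale-mixture representation. It assumes the standard parameterization from the GDP shrinkage literature, with scale `xi` and shape `alpha`. The abstract does not state the hyperparameters or the exact hierarchy used in the study, so the function names and default values here are hypothetical.

```python
import numpy as np

def gdp_pdf(theta, xi=1.0, alpha=1.0):
    """Density of the generalized double Pareto distribution GDP(xi, alpha).

    Assumed form: f(theta) = 1/(2*xi) * (1 + |theta| / (alpha*xi)) ** -(alpha + 1).
    It has a sharp peak at zero (shrinks small effects) and heavy
    polynomial tails (leaves large effects nearly unshrunk).
    """
    theta = np.asarray(theta, dtype=float)
    return (1.0 / (2.0 * xi)) * (1.0 + np.abs(theta) / (alpha * xi)) ** (-(alpha + 1.0))

def sample_gdp(size, xi=1.0, alpha=1.0, rng=None):
    """Draw from GDP(xi, alpha) via the scale mixture:

        lam   ~ Gamma(shape=alpha, rate=eta), with eta = alpha * xi
        tau   ~ Exponential(rate=lam**2 / 2)
        theta ~ Normal(0, variance=tau)
    """
    rng = np.random.default_rng(rng)
    eta = alpha * xi
    # NumPy parameterizes gamma and exponential by scale = 1 / rate.
    lam = rng.gamma(shape=alpha, scale=1.0 / eta, size=size)
    tau = rng.exponential(scale=2.0 / lam**2, size=size)
    return rng.normal(loc=0.0, scale=np.sqrt(tau), size=size)

if __name__ == "__main__":
    draws = sample_gdp(100_000, rng=0)
    # For xi = alpha = 1 the theoretical median of |theta| is 1.
    print("empirical P(|theta| <= 1):", np.mean(np.abs(draws) <= 1.0))
```

The mixture form is what makes Gibbs sampling convenient for priors of this kind: conditional on `tau`, each coefficient has a normal prior, so the conditional updates stay in standard families.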
Year: 2015 PMID: 26423345 PMCID: PMC4597416 DOI: 10.1186/1471-2105-16-S13-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flow chart of the Gibbs sampling procedure for SBGG. Here j = 1, 2, ..., p; r = 1, 2, ..., n; and s = 2, 3, ..., k, where n is the number of samples, p is the number of covariates in the model, and k is the number of categories of the response variable.
Figure 2. Posterior mean of θ. The x-axis represents the list of 398 differentially expressed genes obtained after Benjamini and Hochberg FDR correction of the results of single-gene analysis using classical multi-category logistic regression. The y-axis represents the posterior mean of θ associated with each gene. While some signals are shrunk toward zero, others stand out, and these turn out to be biologically more relevant to prostate cancer progression subtypes.
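The Benjamini and Hochberg correction mentioned in this caption can be sketched as the usual step-up procedure on per-gene p-values. This is a minimal illustration, not the authors' pipeline. The function name and the `fdr=0.05` level are assumptions, since the record does not state the threshold that yielded the 398 genes.

```python
import numpy as np

def benjamini_hochberg(pvalues, fdr=0.05):
    """Return a boolean mask of hypotheses rejected at the given FDR level.

    Step-up rule: sort the m p-values, find the largest rank k with
    p_(k) <= fdr * k / m, and reject the k smallest p-values.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices of p-values, ascending
    thresholds = fdr * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])      # largest rank meeting its threshold
        reject[order[: k + 1]] = True          # reject everything up to that rank
    return reject

if __name__ == "__main__":
    # Hypothetical p-values, one per gene.
    mask = benjamini_hochberg([0.01, 0.03, 0.035, 0.5])
    print(mask)
```

Note the step-up behaviour: a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.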
Overall average accuracy and associated standard deviations (in parentheses) of SBGG, SBGDE, SVM and Random Forest models using 10 and 50 marker genes
| Model | P-10 | P-50 |
|---|---|---|
| SBGG | 82.5 (6.8) | 94.9 (3.08) |
| SBGDE | 80.4 (6.2) | 82.3 (6.4) |
| SVM | 53.6 (5.7) | 67 (4.9) |
| Random Forest | 83 (5.2) | 84.6 (3.5) |
Average classification accuracy and associated standard deviations (in parentheses) of prostate cancer subtypes in the test group using SBGG, SBGDE, SVM and Random Forest models with 50 marker genes
| Sample Type | SBGG | SBGDE | SVM | Random Forest |
|---|---|---|---|---|
| Benign | 95.4 (3.07) | 99.6 (1.9) | 90.1 (1.7) | 96.8 (1.3) |
| PIN | 80.6 (0.08) | 53.4 (1.4) | 38.2 (8.2) | 52 (1.1) |
| PCA | 98.9 (1.9) | 65.4 (7.2) | 45.8 (6.2) | 84.8 (5.4) |
| MET | 96.8 (4.6) | 95.4 (6.3) | 81.8 (1.6) | 83.6 (7.09) |
Average classification accuracy and associated standard deviations (in parentheses) of prostate cancer subtypes in the test group using SBGG, SBGDE, SVM and Random Forest models with 10 marker genes
| Sample Type | SBGG | SBGDE | SVM | Random Forest |
|---|---|---|---|---|
| Benign | 89.4 (6.1) | 95.1 (6) | 84.4 (5.3) | 91.1 (4.5) |
| PIN | 62.5 (1.6) | 61.7 (2.8) | 9 (7.2) | 61.4 (1.9) |
| PCA | 98.7 (0.7) | 86.9 (1.1) | 37.4 (9) | 86.7 (2.1) |
| MET | 59.4 (2.06) | 56 (3.2) | 55.3 (1.2) | 82.8 (7.3) |
Figure 3. Accuracy plot of the four models using different numbers of genes for classification of prostate cancer subtypes. The accuracy values are the average classification accuracy across 50 runs, and the vertical lines show the associated standard deviations.
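The averaging described in this caption amounts to computing one accuracy per run and then summarizing across runs. The sketch below shows that calculation under stated assumptions: the helper name is hypothetical, and the sample standard deviation (`ddof=1`) is a guess, since the record does not say which estimator the authors used.

```python
import numpy as np

def summarize_runs(y_true_runs, y_pred_runs):
    """Mean and standard deviation of accuracy (in percent) across runs.

    Each element of y_true_runs / y_pred_runs holds the true and predicted
    class labels for the test samples of one run.
    """
    accs = np.array([
        np.mean(np.asarray(t) == np.asarray(p))   # fraction correct in this run
        for t, p in zip(y_true_runs, y_pred_runs)
    ]) * 100.0
    # ddof=1 gives the sample standard deviation (an assumption, see above).
    return accs.mean(), accs.std(ddof=1)

if __name__ == "__main__":
    # Two hypothetical runs over four test samples each.
    truth = [["Benign", "PIN", "PCA", "MET"]] * 2
    preds = [["Benign", "PIN", "PCA", "Benign"],
             ["Benign", "PIN", "PCA", "MET"]]
    mean_acc, sd_acc = summarize_runs(truth, preds)
    print(f"{mean_acc:.1f} ({sd_acc:.1f})")
```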
Literature-based functional cohesion p-values (LPv) and associated standard deviations (in parentheses) of the top 100 genes obtained from SBGG, SBGDE, logistic regression, and Random Forest models
| Model | LPv |
|---|---|
| SBGG | 2.0E-4 (1.7E-5) |
| SBGDE | 0.007 (0.001) |
| Ordinal Logistic Regression | 0.047 |
| Random Forest | 0.131 (0.07) |