Amanda Elswick Gentry, Colleen K Jackson-Cook, Debra E Lyon, Kellie J Archer.
Abstract
The pathological description of the stage of a tumor is an important clinical designation and is considered, like many other forms of biomedical data, an ordinal outcome. Currently, statistical methods for predicting an ordinal outcome using clinical, demographic, and high-dimensional correlated features are lacking. In this paper, we propose a method that fits an ordinal response model to predict an ordinal outcome for high-dimensional covariate spaces. Our method penalizes some covariates (high-throughput genomic features) without penalizing others (such as demographic and/or clinical covariates). We demonstrate the application of our method to predict the stage of breast cancer. In our model, breast cancer subtype is a nonpenalized predictor, and CpG site methylation values from the Illumina Human Methylation 450K assay are penalized predictors. The method has been made available in the ordinalgmifs package in the R programming environment.
Keywords: cancer; methylation; ordinal response; penalized regression
Year: 2015 PMID: 26052223 PMCID: PMC4447150 DOI: 10.4137/CIN.S17277
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
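The abstract describes an ordinal response model in which some predictors are penalized and others are not. As a minimal sketch of how such a model turns a linear predictor into stage probabilities, the Python below evaluates a cumulative logit model with four ordered classes. It assumes the parameterization P(Y ≤ j) = logistic(α_j + η), which is only one of several conventions in use, and every numeric value (thresholds, coefficients, covariates) is hypothetical. It does not reproduce the ordinalgmifs fitting algorithm; it only shows where the nonpenalized and penalized coefficients enter the model.

```python
import math

def logistic(t):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-t))

def class_probabilities(alphas, z, theta, x, beta):
    """Class probabilities under a cumulative logit model.

    alphas: increasing thresholds, one fewer than the number of classes
    z, theta: nonpenalized covariates and their coefficients
    x, beta: penalized covariates and their coefficients
    Assumed parameterization: P(Y <= j) = logistic(alpha_j + eta).
    """
    eta = sum(zi * ti for zi, ti in zip(z, theta))
    eta += sum(xi * bi for xi, bi in zip(x, beta))
    cumulative = [logistic(a + eta) for a in alphas] + [1.0]
    # Successive differences of cumulative probabilities give class probabilities.
    probs = [cumulative[0]]
    for j in range(1, len(cumulative)):
        probs.append(cumulative[j] - cumulative[j - 1])
    return probs

# Hypothetical values for illustration only.
alphas = [-1.0, 0.5, 2.0]        # thresholds for stages I, IIA, IIB (IIIA is the remainder)
z = [1.0, 0.0, 0.0]              # e.g. dummy-coded subtype (nonpenalized)
theta = [0.4, -0.2, 0.1]
x = [0.8, 0.3]                   # e.g. two methylation values (penalized)
beta = [-0.5, 0.0]               # a penalized fit leaves many coefficients at zero

probs = class_probabilities(alphas, z, theta, x, beta)
for stage, p in zip(["I", "IIA", "IIB", "IIIA"], probs):
    print(f"{stage}: {p:.3f}")
```

Because the thresholds are increasing, the cumulative probabilities are increasing as well, so each class probability is positive and the four probabilities sum to one.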
Criteria for breast cancer subtype classification.
| SUBTYPE | DEFINITION |
|---|---|
| Luminal A | ER+ and/or PR+, HER2− |
| Luminal B | ER+ and/or PR+, HER2+ |
| Triple negative | ER−, PR−, HER2− |
| HER2 Type | ER−, PR−, HER2+ |
Notes: Breast cancer subtype classification typically considers proportion of tumor cells positive for the Ki67 protein. This measurement was not collected in our study and therefore could not be used for classification.
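The classification criteria in the table above can be written as a small decision rule. The following Python sketch assumes each receptor status has already been reduced to a boolean; the function name and argument names are hypothetical, and Ki67 is ignored because the note states it was not collected.

```python
def classify_subtype(er_positive, pr_positive, her2_positive):
    """Map ER, PR, and HER2 status to the subtype labels used in the table."""
    hormone_receptor_positive = er_positive or pr_positive  # "ER+ and/or PR+"
    if hormone_receptor_positive:
        return "Luminal B" if her2_positive else "Luminal A"
    # Both ER and PR are negative from here on.
    return "HER2 type" if her2_positive else "Triple negative"

print(classify_subtype(True, False, False))
print(classify_subtype(False, False, True))
```

The four outcomes cover every combination of the three statuses, so each tumor receives exactly one label.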
Figure 1. Boxplot of mean β-values by percent GC content across all samples, for type I probes.
Figure 2. Boxplot of mean β-values by percent GC content across all samples, for type II probes.
Demographic characteristics by stage of cancer. The medians are reported for continuous variables (age and BMI) and the frequencies are reported for categorical variables (race, smoking status, and prior surgery).
| STAGE | I | IIA | IIB | IIIA | TOTAL |
|---|---|---|---|---|---|
| Age (median) | 55 | 48 | 56 | 49 | 53 |
| BMI (median) | 29.58 | 25.79 | 31.01 | 29.25 | 28.34 |
| Race (Black) | 5/21 | 10/29 | 6/15 | 0/8 | 21/73 |
| Currently smoking (Y) | 3/21 | 5/29 | 6/15 | 1/8 | 15/73 |
| Prior surgery (Y) | 21/21 | 26/29 | 12/15 | 7/8 | 66/73 |
Frequencies of breast cancer subtype by stage of cancer.
| STAGE | I | IIA | IIB | IIIA | TOTAL |
|---|---|---|---|---|---|
| Luminal A | 7 | 16 | 7 | 7 | 37 |
| Luminal B | 2 | 3 | 3 | 0 | 8 |
| Triple negative | 11 | 7 | 2 | 1 | 21 |
| HER2 type | 1 | 3 | 3 | 0 | 7 |
LRT and resulting P-values from univariate cumulative logit models predicting stage of cancer.
| | INTERCEPTS ONLY | AGE | BMI | RACE | CURRENTLY SMOKING | PRIOR SURGERY | SUBTYPE |
|---|---|---|---|---|---|---|---|
| Deviance | 188.72 | 187.57 | 188.70 | 188.69 | 187.56 | 185.70 | 179.91 |
| LRT statistic | | 1.15 | 0.02 | 0.03 | 1.16 | 3.02 | 8.81 |
| P-value | | 0.2834 | 0.8787 | 0.8611 | 0.2821 | 0.0824 | 0.0319 |
Note: The LRT statistic is the likelihood ratio test statistic, equal to the intercepts-only deviance minus the deviance of the model with the covariate.
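The likelihood ratio statistics in this table can be recomputed from the reported deviances as the intercepts-only deviance minus each model's deviance, and the P-values follow from a chi-square reference distribution. The Python sketch below does this with the standard library only. The degrees of freedom are an assumption not stated in the source: 1 for each single-parameter covariate and 3 for the four-level subtype factor. Because the deviances are rounded to two decimals, the recomputed P-values should be expected to match the table only approximately.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail chi-square probability, closed forms for df = 1 and df = 3."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 3:
        return (math.erfc(math.sqrt(x / 2.0))
                + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))
    raise ValueError("only df = 1 and df = 3 are implemented in this sketch")

null_deviance = 188.72  # intercepts-only model, from the table

# Deviances from the table; degrees of freedom are assumed, not reported.
models = {
    "Age": (187.57, 1),
    "BMI": (188.70, 1),
    "Race": (188.69, 1),
    "Currently smoking": (187.56, 1),
    "Prior surgery": (185.70, 1),
    "Subtype": (179.91, 3),
}

lrt = {}
p_value = {}
for name, (deviance, df) in models.items():
    lrt[name] = round(null_deviance - deviance, 2)
    p_value[name] = chi2_upper_tail(lrt[name], df)
    print(f"{name}: LRT = {lrt[name]:.2f}, df = {df}, P = {p_value[name]:.4f}")
```

Under these assumptions, subtype is the only covariate whose P-value falls below 0.05, which is consistent with the abstract's choice of subtype as the nonpenalized predictor.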
AIC-selected CpG sites listed with their chromosome, position, and associated UCSC ref genes, where appropriate.
| CPG SITE | CHROMOSOME | LOCATION (START) | LOCATION (END) | UCSC REF GENE |
|---|---|---|---|---|
| cg01393985 | 6 | 89927651 | 89927700 | |
| cg02873991 | 12 | 25151263 | 25151312 | |
| cg02990147 | X | 24329623 | 24329672 | |
| cg03478356 | 9 | 45726913 | 45726962 | |
| cg03604519 | X | 70150242 | 70150291 | |
| cg03642328 | 11 | 69624925 | 69624974 | |
| cg04315214 | 1 | 2043799 | 2043848 | |
| cg05898699 | 18 | 15197299 | 15197348 | |
| cg06159404 | 10 | 43846376 | 43846425 | |
| cg06618740 | 1 | 1100126 | 1100175 | |
| cg07068358 | 16 | 25879737 | 25879786 | |
| cg07078747 | 12 | 34177660 | 34177709 | |
| cg07850592 | 1 | 231299396 | 231299445 | |
| cg08314875 | Y | 15015601 | 15015650 | |
| cg08407901 | 21 | 43989901 | 43989950 | |
| cg08615372 | 19 | 18699234 | 18699283 | |
| cg08833952 | 22 | 22469409 | 22469458 | |
| cg09667394 | 1 | 78011748 | 78011797 | |
| cg10139947 | 2 | 105274650 | 105274699 | |
| cg10467557 | 13 | 21893614 | 21893663 | |
| cg12386614 | 1 | 33608005 | 33608054 | |
| cg12440927 | 7 | 157791673 | 157791722 | |
| cg13033971 | 13 | 46291925 | 46291974 | |
| cg14468658 | 5 | 140723461 | 140723510 | |
| cg14884760 | 22 | 50164389 | 50164438 | |
| cg16807687 | 10 | 85973970 | 85974019 | |
| cg19009644 | 3 | 10553211 | 10553260 | |
| cg19149522 | 7 | 6616375 | 6616424 | |
| cg19893664 | 14 | 105619634 | 105619683 | |
| cg20418394 | 10 | 72254335 | 72254384 | |
| cg21156276 | 9 | 4491869 | 4491918 | |
| cg24493834 | 6 | 129250963 | 129251012 | |
| cg25099892 | 13 | 113313857 | 113313906 | |
| cg26479305 | 12 | 52470979 | 52471028 | |
| cg27161197 | 12 | 47224649 | 47224698 |
Figure 3. Boxplot of β-values for CpG site cg19149522 (ZDHHC4), for all subjects, by stage of cancer.
Figure 4. Boxplot of β-values for CpG site cg16807687 (PCDH21), for all subjects, by stage of cancer.
Cross-tabulation of the observed (rows) versus predicted (columns) class for the AIC and the fully converged models.
| AIC | I | IIA | IIB | IIIA | CONVERGED | I | IIA | IIB | IIIA |
|---|---|---|---|---|---|---|---|---|---|
| I | 20 | 1 | 0 | 0 | I | 21 | 0 | 0 | 0 |
| IIA | 0 | 29 | 0 | 0 | IIA | 0 | 29 | 0 | 0 |
| IIB | 0 | 4 | 11 | 0 | IIB | 0 | 0 | 15 | 0 |
| IIIA | 0 | 0 | 6 | 2 | IIIA | 0 | 0 | 0 | 8 |
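The cross-tabulation can be summarized by the proportion of subjects on the diagonal and by how far the misclassified subjects fall from their observed stage. The Python sketch below computes both directly from the counts in the table; the helper names are hypothetical. The table does not say whether the predictions were made on held-out data, so these figures only summarize the table and should not be read as estimates of out-of-sample performance.

```python
stages = ["I", "IIA", "IIB", "IIIA"]

# Rows are observed stage, columns are predicted stage (counts from the table).
aic_table = [
    [20, 1, 0, 0],
    [0, 29, 0, 0],
    [0, 4, 11, 0],
    [0, 0, 6, 2],
]
converged_table = [
    [21, 0, 0, 0],
    [0, 29, 0, 0],
    [0, 0, 15, 0],
    [0, 0, 0, 8],
]

def total_count(table):
    return sum(sum(row) for row in table)

def correct_count(table):
    return sum(table[i][i] for i in range(len(table)))

def max_stage_distance(table):
    """Largest |observed - predicted| index gap among misclassified subjects."""
    gaps = [abs(i - j)
            for i, row in enumerate(table)
            for j, n in enumerate(row)
            if n > 0 and i != j]
    return max(gaps) if gaps else 0

for label, table in [("AIC", aic_table), ("Converged", converged_table)]:
    n, k = total_count(table), correct_count(table)
    print(f"{label}: {k}/{n} correct ({k / n:.3f}), "
          f"largest stage gap among errors = {max_stage_distance(table)}")
```

For the AIC-selected model, every misclassified subject is assigned to a stage adjacent to the observed one, which is the kind of error an ordinal model is expected to favor over distant misclassifications.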