Erin A Salinas, Marina D Miller, Andreea M Newtson, Deepti Sharma, Megan E McDonald, Matthew E Keeney, Brian J Smith, David P Bender, Michael J Goodheart, Kristina W Thiel, Eric J Devor, Kimberly K Leslie, Jesus Gonzalez Bosquet.
Abstract
The utility of comprehensive surgical staging in patients with low risk disease has been questioned, so a reliable means of determining risk would be useful. The aim of our study was to create the best performing prediction model to classify endometrioid endometrial cancer (EEC) patients into low or high risk using a combination of molecular and clinical-pathological variables. We then validated these models with publicly available datasets. Analyses between low and high risk EEC were performed using clinical and pathological data, gene and miRNA expression data, gene copy number variation and somatic mutation data. Variables were selected for inclusion in the prediction model of risk using cross-validation analysis; prediction models were then constructed using these variables. Model performance was assessed by area under the curve (AUC). Prediction models were validated using appropriate datasets in The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. A prediction model with only clinical variables achieved an AUC of 88%. Integrating clinical and molecular data improved prediction performance up to 97%. The best prediction models included clinical, miRNA expression and/or somatic mutation data, and stratified pre-operative risk in EEC patients. Integrating molecular and clinical data improved the performance of prediction models to over 95%, resulting in potentially useful clinical tests.
Keywords: clinical outcomes; endometrial cancer; high risk; integration of data; prediction models
Year: 2019 PMID: 30857319 PMCID: PMC6429416 DOI: 10.3390/ijms20051205
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Flow chart of patients included in the University of Iowa (UI) endometrial cancer study cohort (CHA = complex endometrial hyperplasia with atypia). In this dataset, 126 patients had endometrial cancer of the endometrioid type. Only 62 had sufficient quantity and quality of purified RNA for RNA sequencing.
Patient clinical and pathological characteristics. Univariate analysis with logistic regression was used to assess differences between the low and high risk groups. * denotes statistically significant differences between low and high risk patients.
| Category | Clinical/Pathological Variable | Low Risk (N = 70) | High Risk (N = 56) | p-value |
|---|---|---|---|---|
| Preoperative characteristics | Age (mean) | 58.7 | 64.8 | 0.003 * |
| | BMI (mean) | 38.5 | 32.6 | <0.001 * |
| | Charlson Morbidity Index (mean) | 4.7 | 5 | 0.012 * |
| | Grade | | | <0.001 * |
| | Grade 1 | 38 | 7 | |
| | Grade 2 | 21 | 27 | |
| | Grade 3 | 8 | 22 | |
| Postoperative characteristics | Invasion (mean) | 19 | 62 | <0.001 * |
| | 2009 FIGO Stage | | | 0.991 |
| | Stage I | 70 | 23 | |
| | Stage II | - | 7 | |
| | Stage III | - | 20 | |
| | Stage IV | - | 6 | |
| | Lymph nodes (% positive) | 0 (0%) | 13 (27%) | 0.987 |
| | Peritoneal Cytology (% positive) | 2 (3%) | 31 (56%) | 0.011 * |
| | Lymphovascular involvement | 2 (3%) | 10 (19%) | <0.001 * |
| | ER (% positive) | 38 (93%) | 31 (78%) | 0.066 |
| | PR (% positive) | 38 (93%) | 30 (75%) | 0.040 * |
| | Postoperative complications (% positive) | 12 (17%) | 17 (32%) | 0.056 |
| | LOS (mean days) | 3.3 | 6.1 | 0.002 * |
| | Adjuvant Treatment (yes) (% positive) | 8 (11%) | 39 (74%) | <0.001 * |
| Outcomes | 5-year Survival (%) | 98% | 75% | <0.001 * |
| | Recurrence (% positive) | 2 (3%) | 19 (37%) | <0.001 * |
| | Death due to disease (% positive) | 1 (1%) | 15 (30%) | 0.001 * |
Figure 2. Survival analysis for UI EEC patients. (A) Survival curves for UI EEC patients with clinical data stratified by risk. Two low risk patients and one high risk patient had no survival information; (B) Independent variables associated with survival in the multivariate analysis for UI EEC patients. Ref: reference value; PA LN: Para-aortic lymph nodes; LR: low risk. High risk patients have almost 10 times greater risk of dying from endometrial cancer relative to low risk patients.
Figure 3. Survival analysis for TCGA endometrial cancer patients. Survival curves for TCGA EEC patients with clinical data stratified by risk. Survival for both UI and TCGA patients was similar when stratified by risk level.
Figure 4. Selection of molecular variables for prediction model analyses for both groups, low risk (LR (N = 36), in orange) and high risk (HR (N = 26), in purple). Variables that passed a cut-off p-value < 0.05 in a univariate linear model and were present in each fold of the k-fold cross-validation were selected. In (A), (B) and (C), patients are on the X axis and molecular variables are on the Y axis. (A) Heatmap of expression of 255 selected genes out of a total of 26,336 genes. Normalized gene expression is represented in a red-green scheme from lower to higher expression, respectively; (B) Heatmap of 55 selected miRNAs out of a total of 1916 miRNAs. Normalized miRNA expression is represented in a blue-orange scheme from lower to higher expression, respectively; (C) Heatmap of 398 selected somatic mutations out of a total of 12,340 mutations. Each somatic mutation for each patient is represented in purple. Grey represents non-mutated genes; (D) Manhattan plot of 846 selected loci with copy number variation out of a total of 26,720 loci. The Y axis represents how many times the locus was involved in the prediction process with cross-validation (k-fold with 25 replications). The X axis represents the chromosomal location. The horizontal red line denotes 25 replications. See Variable Selection in Section 4 for more details.
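The selection rule stated in this caption (keep a variable only if it passes p < 0.05 in every cross-validation fold) amounts to a set intersection across folds. The following is a minimal sketch of that idea, not the study's actual pipeline: it assumes per-fold p-values have already been computed, and the function name `select_stable_variables` and the toy variable names are hypothetical.

```python
from typing import Dict, List, Set


def select_stable_variables(
    fold_pvalues: List[Dict[str, float]], alpha: float = 0.05
) -> Set[str]:
    """Return the variables whose p-value is below alpha in every fold."""
    if not fold_pvalues:
        return set()
    # Variables passing the cut-off within each individual fold.
    passing = [
        {name for name, p in fold.items() if p < alpha} for fold in fold_pvalues
    ]
    # Keep only the variables that pass in all folds.
    return set.intersection(*passing)


# Hypothetical p-values for three variables across three folds.
folds = [
    {"var_a": 0.01, "var_b": 0.20, "var_c": 0.03},
    {"var_a": 0.02, "var_b": 0.01, "var_c": 0.04},
    {"var_a": 0.04, "var_b": 0.03, "var_c": 0.06},
]
stable = select_stable_variables(folds)
print(sorted(stable))
```

In this toy input only `var_a` survives, because `var_b` fails in the first fold and `var_c` fails in the third.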
Table 2. Prediction models for levels of risk using diverse clinical, pathological and molecular data. Models that included only 1 data class used all variables selected with the cross-validation analysis (input variables: 17 clinical features, 255 mRNAs, 55 miRNAs, 398 somatic mutations and 846 CNVs). The Lasso analysis selected only the most informative resulting variables for the prediction process: 7 clinical features, 38 mRNAs, 28 miRNAs, 35 somatic mutations, and 65 CNVs. For prediction analysis using more than one data class, we used the resulting variables. Model performances were measured by AUC and their 95% confidence interval (CI). Models with the best performance are marked with *. # denotes number of variables.
| Model | Data class | # Input variables | # Resulting variables | AUC | 95% CI |
|---|---|---|---|---|---|
| M1-A | Clinical | 17 | 7 | 0.88 | 0.84, 0.92 |
| M1-B | mRNAs | 255 | 38 | 0.79 | 0.73, 0.85 |
| M1-C | miRNAs | 55 | 28 | 0.84 | 0.76, 0.93 |
| M1-D | Mutations | 398 | 35 | 0.68 | 0.63, 0.73 |
| M1-E | CNVs | 846 | 65 | 0.67 | 0.56, 0.77 |
| M2-A | mRNAs | 7 + 38 | 37 | 0.93 | 0.90, 0.96 |
| * M2-B | miRNAs | 7 + 28 | 24 | 0.97 | 0.96, 0.99 |
| * M2-C | Mutations | 7 + 35 | 35 | 1 | 1, 1 |
| M2-D | CNVs | 7 + 65 | 61 | 0.92 | 0.89, 0.94 |
| M3-A | mRNAs + miRNAs | 7 + 38 + 28 | 37 | 0.83 | 0.74, 0.91 |
| M3-B | Mutations + CNVs | 7 + 35 + 65 | 48 | 0.94 | 0.91, 0.97 |
| M3-C | mRNAs + Mutations | 7 + 38 + 35 | 41 | 0.95 | 0.92, 0.98 |
| M3-D | miRNAs + Mutations | 7 + 28 + 35 | 36 | 0.94 | 0.91, 0.97 |
| M3-E | miRNAs + CNVs | 7 + 28 + 65 | 46 | 0.86 | 0.81, 0.91 |
| M3-F | mRNAs + CNVs | 7 + 38 + 65 | 44 | 0.93 | 0.91, 0.95 |
| M4-A | mRNAs + miRNAs + Mutations | 7 + 38 + 28 + 35 | 42 | 0.94 | 0.91, 0.96 |
| M4-B | mRNAs + miRNAs + CNVs | 7 + 38 + 28 + 65 | 40 | 0.91 | 0.88, 0.93 |
| M4-C | mRNAs + Mutations + CNVs | 7 + 38 + 35 + 65 | 42 | 0.91 | 0.88, 0.95 |
| M4-D | miRNAs + Mutations + CNVs | 7 + 28 + 35 + 65 | 53 | 0.88 | 0.84, 0.92 |
| M5-A | mRNAs + miRNAs + Mutations + CNVs | 7 + 38 + 28 + 35 + 65 | 47 | 0.89 | 0.86, 0.92 |
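Since every model in the table is compared by AUC, it may help to recall what that number measures. The sketch below computes AUC as the fraction of (high risk, low risk) patient pairs in which the high risk patient receives the higher score, counting ties as one half. It is an illustration only: the function name `auc` and the toy scores are hypothetical, and the source does not state how its AUC values or confidence intervals were computed.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a high risk case outscores a low risk case.

    labels: 1 = high risk, 0 = low risk. Tied scores count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for three high risk and three low risk patients.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]
print(auc(toy_scores, toy_labels))
```

Here 8 of the 9 pairs are ordered correctly, so the toy AUC is 8/9. An AUC of 1, as reported for M2-C, means every high risk case outscored every low risk case in the data used for evaluation.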
Figure 5. Prediction models with the highest performances. Curves of performance of the models based on AUC. On top of the graphic: number of variables included in the model. On Y-axis: AUC value. On X-axis: lambda value. (A) Model including clinical and miRNA variables; (B) Model including clinical variables and somatic mutations.
External replication of prediction models for levels of risk. In the analysis of TCGA and GEO datasets, we used resulting variables from UI analyses of 1 data class or type; these results are included for comparison in this table and denoted as “UI model #”. The definitions for input variables and resulting variables are the same as in Table 2. In most cases, variables resulting from the UI analyses were not available in external sets (marked by *). Model performances were measured by AUC and their 95% confidence interval (CI).
| Model | Data class | # Input variables | # Resulting variables | AUC | 95% CI |
|---|---|---|---|---|---|
| TCGA model M1-A | Clinical | 2 * | 2 | 0.75 | 0.73, 0.78 |
| GEO model M1-A | Clinical | 2 * | 2 | 0.84 | 0.79, 0.89 |
| TCGA model M1-B | mRNAs | 36 * | 23 | 0.60 | 0.57, 0.63 |
| GEO model M1-B | mRNAs | 14 * | 5 | 0.60 | 0.53, 0.68 |
| TCGA model M1-C | miRNAs | 28 | 4 | 0.57 | 0.54, 0.60 |
| TCGA model M1-D | Mutations | 34 * | 18 | 0.59 | 0.57, 0.62 |
| TCGA model M1-E | CNVs | 65 | 2 | 0.63 | 0.59, 0.67 |
| TCGA model M2-A | mRNAs | 2 + 36 * | 15 | 0.75 | 0.72, 0.78 |
| GEO model M2-A | mRNAs | 2 + 14 * | 2 | 0.92 | 0.90, 0.95 |
| TCGA model M2-B | miRNAs | 2 + 28 * | 3 | 0.75 | 0.72, 0.77 |
| TCGA model M2-C | Mutations | 2 + 34 * | 30 | 0.75 | 0.73, 0.77 |
| TCGA model M2-D | CNVs | 2 + 65 * | 3 | 0.75 | 0.71, 0.79 |
| TCGA model M3-A | mRNAs + miRNAs | 2 + 36 + 28 * | 4 | 0.75 | 0.72, 0.78 |
| TCGA model M3-B | Mutations + CNVs | 2 + 34 + 65 * | 24 | 0.78 | 0.75, 0.80 |
| TCGA model M3-C | mRNAs + Mutations | 2 + 36 + 34 * | 2 | 0.74 | 0.71, 0.77 |
| TCGA model M3-D | miRNAs + Mutations | 2 + 28 + 34 * | 2 | 0.74 | 0.72, 0.75 |
| TCGA model M3-E | miRNAs + CNVs | 2 + 28 + 65 * | 5 | 0.76 | 0.73, 0.79 |
| TCGA model M3-F | mRNAs + CNVs | 2 + 36 + 65 * | 2 | 0.75 | 0.72, 0.78 |
| TCGA model M4-A | mRNAs + miRNAs + Mutations | 2 + 36 + 28 + 34 * | 2 | 0.74 | 0.71, 0.77 |
| TCGA model M4-B | mRNAs + miRNAs + CNVs | 2 + 36 + 28 + 65 * | 2 | 0.76 | 0.73, 0.79 |
| TCGA model M4-C | mRNAs + Mutations + CNVs | 2 + 36 + 34 + 65 * | 10 | 0.75 | 0.73, 0.78 |
| TCGA model M4-D | miRNAs + Mutations + CNVs | 2 + 28 + 34 + 65 * | 9 | 0.77 | 0.74, 0.80 |
| TCGA model M5-A | mRNAs + miRNAs + Mutations + CNVs | 2 + 36 + 28 + 34 + 65 * | 8 | 0.76 | 0.73, 0.78 |
Validation of prediction models using data from the TCGA EEC dataset. As described in the Methods section, threshold cut-off values were selected to attain a sensitivity of around 90%; specificity, predictive values and accuracy are reported at that cut-off. The goal was to create models that would capture at least 90% of the high risk cases while ruling out most low risk ones. * Recurrence probability scale: 1/(exp(-score) + 1), where score is the resulting value of the prediction model on a log scale.
| | Model M2-B | | Model M2-C | | Model M3-C | | Model M3-D | |
|---|---|---|---|---|---|---|---|---|
| Recurrence probability scale * | Cut-off = 0.5004 | | Cut-off = 0.4984 | | Cut-off = 0.7309 | | Cut-off = 0.5151 | |
| | Value | 95% CI | Value | 95% CI | Value | 95% CI | Value | 95% CI |
| Sensitivity | 90% | 85%, 94% | 90% | 86%, 94% | 90% | 82%, 98% | 90% | 86%, 94% |
| Specificity | 38% | 31%, 44% | 16% | 8%, 26% | 10% | 1%, 23% | 30% | 23%, 37% |
| Positive Predictive Value (PPV) | 56% | 51%, 61% | 49% | 47%, 52% | 13% | 12%, 15% | 53% | 50%, 57% |
| Negative Predictive Value (NPV) | 79% | 70%, 84% | 64% | 47%, 74% | 87% | 45%, 94% | 76% | 66%, 81% |
| Accuracy | 62% | 54%, 68% | 51% | 47%, 56% | 20% | 13%, 32% | 58% | 52%, 63% |
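The recurrence probability scale and the metrics in the table above can be illustrated with a short sketch. It applies the stated transform 1/(exp(-score) + 1) and then tabulates sensitivity, specificity, PPV, NPV and accuracy at a given cut-off. Several things here are assumptions rather than details from the source: the function names, the toy scores and labels, and the convention that a patient is called high risk when the probability is greater than or equal to the cut-off.

```python
import math
from typing import Dict, Sequence


def recurrence_probability(score: float) -> float:
    """Map a model score to the probability scale: 1 / (exp(-score) + 1)."""
    return 1.0 / (math.exp(-score) + 1.0)


def _ratio(num: int, den: int) -> float:
    """Safe division; returns NaN when the denominator is zero."""
    return num / den if den else float("nan")


def metrics_at_cutoff(
    probs: Sequence[float], labels: Sequence[int], cutoff: float
) -> Dict[str, float]:
    """Classify as high risk when prob >= cutoff (assumed convention).

    labels: 1 = high risk, 0 = low risk.
    """
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= cutoff and y == 1)
    fp = sum(1 for p, y in pairs if p >= cutoff and y == 0)
    fn = sum(1 for p, y in pairs if p < cutoff and y == 1)
    tn = sum(1 for p, y in pairs if p < cutoff and y == 0)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


# Hypothetical scores; the cut-off 0.5004 is the one reported for model M2-B.
toy_scores = [2.0, 0.5, 0.1, -0.2, -1.0, 0.3]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [recurrence_probability(s) for s in toy_scores]
m = metrics_at_cutoff(toy_probs, toy_labels, cutoff=0.5004)
print(m)
```

Lowering the cut-off raises sensitivity at the cost of specificity, which is the trade-off visible in the table: fixing sensitivity near 90% leaves specificity between 10% and 38% in the TCGA data.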
Variables included in the best performing prediction models. Performance of these models was measured by AUC, sensitivity, specificity, and positive and negative predictive values in both the UI (testing) and TCGA (validation) datasets. * Weights for clinical variables were calculated as the exponential of the estimate in the Lasso regression model. The score of a variable is then calculated by multiplying the weight of the variable by its value. For example, for age, the score is the weight of age, 1.03, times the age in years; for grade, the score is the weight, 12.99, 1.48, or 1.71, times the numerical grade of the tumor (weights differ among the various models). ** Details of individual weights for miRNA expression are in Table A1. # Details of individual weights for somatic mutations are in Table A2. ## Details of individual weights for gene expression are in Table A3.
| Prediction Model | M2-B | M2-C | M3-C | M3-D |
|---|---|---|---|---|
| Weight of clinical variables * | | | | |
| Age | 1.03 | - | - | - |
| History of other cancers | 0.93 | - | - | - |
| Grade | 12.99 | 1.27 | 1.01 | 1.48 |
| BMI | - | 0.99 | - | - |
| miRNAs (log2 transformed and normalized miRNA expression **) | MIR125B1, MIR181A1, MIR181A2HG, MIR188, MIR301B, MIR30B, MIR3142, MIR345, MIR3690, MIR4269, MIR4307, MIR4463, MIR492, MIR5692A1, MIR578, MIR601, MIR633, MIR6503, MIR6769A, MIR6820 | | | MIR125B1, MIR181A1, MIR181A2HG, MIR188, MIR30B, MIR3690, MIR4269, MIR4307, MIR633, MIR876 |
| Somatic mutations (number of mutations per gene and person #) | | | | |
| Gene expression (log2 transformed and normalized gene expression ##) | | | | |
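The footnote on clinical weights describes two steps: a weight is the exponential of a Lasso estimate, and a variable's score is that weight times the variable's value. A minimal sketch of both steps follows, using the age weight of 1.03 reported for model M2-B. The function names and the 60-year-old patient are hypothetical, the estimate is back-computed from the published weight purely for illustration, and how per-variable scores are combined into a final model score is not stated in this section, so it is not shown.

```python
import math


def weight_from_estimate(estimate: float) -> float:
    """Weight of a variable = exponential of its Lasso regression estimate."""
    return math.exp(estimate)


def variable_score(weight: float, value: float) -> float:
    """Score of a variable = its weight multiplied by its value."""
    return weight * value


# Weight of age in model M2-B, as reported in the table.
age_weight = 1.03
# Back-computed estimate (illustration only; not a value from the source).
age_estimate = math.log(age_weight)

print(round(weight_from_estimate(age_estimate), 2))
# Score contribution of age for a hypothetical 60-year-old patient.
age_score = variable_score(age_weight, 60)
print(round(age_score, 1))
```

A weight above 1 corresponds to a positive Lasso estimate and a weight below 1 to a negative one, which is why weights such as 0.93 for history of other cancers still carry information.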
Table A1. Details of individual weights for miRNAs. For individual scores, multiply the log2 transformed and normalized miRNA expression value by the weight in the table. Each model has different weights.
| miRNA | Weight (M2-B) | Weight (M3-D) |
|---|---|---|
| MIR125B1 | 3.01 | 1.07 |
| MIR181A1 | 1.67 | 1.27 |
| MIR181A2HG | 1.53 | 1.07 |
| MIR188 | 1.86 | 1.31 |
| MIR301B | 1.98 | - |
| MIR30B | 0.35 | 0.61 |
| MIR3142 | 12.08 | - |
| MIR345 | 0.38 | - |
| MIR3690 | 1.36 | 1.15 |
| MIR4269 | 2.09 | 1.58 |
| MIR4307 | 14.25 | 1.19 |
| MIR4463 | 8.68 | - |
| MIR492 | 0.95 | - |
| MIR5692A1 | 0.41 | - |
| MIR578 | 2.01 | - |
| MIR601 | 2.59 | - |
| MIR633 | 14.71 | 1.20 |
| MIR6503 | 0.21 | - |
| MIR6769A | 1.51 | - |
| MIR6820 | 0.29 | - |
| MIR876 | 0.22 | 0.97 |
Table A2. Details of individual weights for somatic mutations. For individual scores, multiply the number of mutations per gene and person by the weight in the table. Each model has different weights.
| Weights for Each Somatic Mutation Included in the Model | |||
|---|---|---|---|
| M2-C | M3-C | M3-D | |
|
| 1.55 | 1.1 | |
|
| 83.96 | 35.65 | |
|
| 3.9 | 4.57 | 3.2 |
|
| 2.58 | 2.87 | |
|
| 13.02 | 1.05 | 10.85 |
|
| 0.98 | 0.88 | |
|
| 0.94 | 0.93 | |
|
| 4.23 | 6.64 | |
|
| 3.22 | ||
|
| 0.71 | 0.61 | 0.74 |
|
| 12.51 | 4.28 | |
|
| 1.01 | ||
|
| 1.42 | 1.34 | |
|
| 4.12 | 4.29 | 2.81 |
|
| 1.56 | ||
|
| 1.13 | ||
|
| 1.34 | 1.04 | |
|
| 3.63 | 4.41 | |
|
| 1.33 | 2.06 | |
|
| 0.93 | 0.94 | |
|
| 0.9 | 0.86 | |
|
| 2.38 | 1.95 | |
|
| 17.57 | 5.83 | 12.5 |
|
| 0.86 | ||
|
| 4.34 | 2.33 | |
|
| 1.81 | 1.45 | 1.62 |
|
| 2.07 | 3.15 | |
|
| 1.17 | ||
|
| 2.67 | 2.55 | 2.41 |
|
| 5.45 | 4.91 | 6.33 |
|
| 1.37 | ||
|
| 1.41 | 1.32 | 1.52 |
|
| 1.74 | ||
|
| 3.65 | 3.47 | 2.78 |
Table A3. Details of individual weights for gene expression. For individual scores, multiply the log2 transformed and normalized gene expression by the weight in the table. Each model has different weights.
| Gene | Weight (M3-C) |
|---|---|
| | 1.04 |
| | 1.19 |
| | 1.18 |
| | 1.33 |
| | 0.98 |
| | 1.12 |
| | 1.83 |
| | 1.07 |
| | 1.27 |
| | 1.05 |
| | 0.82 |
| | 1.19 |
| | 1.28 |
| | 1.71 |
| | 0.55 |
| | 2.01 |
| | 0.58 |
| | 0.57 |
| | 1.33 |
| | 0.65 |
| | 0.95 |
| | 1.21 |
| | 1.45 |
| | 1.16 |
| | 1.41 |
| | 1.48 |
| | 1.18 |
| | 0.66 |
TCGA Patient clinical and pathological characteristics (N = 400). Univariate analysis with logistic regression was used to assess differences between both groups.
| Category | Variable | Low Risk (N = 206) | High Risk (N = 194) | p-value |
|---|---|---|---|---|
| Preoperative characteristics | Age (mean) | 60 | 62 | <0.001 |
| | BMI (mean) | 36.1 | 33.6 | 0.064 |
| | Grade | | | <0.001 |
| | Grade 1 | 80 | 17 | |
| | Grade 2 | 57 | 59 | |
| | Grade 3 | 69 | 118 | |
| Postoperative characteristics | Myometrial invasion | | | 0.984 |
| | <50% | 204 | 54 | |
| | >50% | 0 | 17 | |
| | 2009 FIGO Stage | | | 0.984 |
| | Stage I | 204 | 71 | |
| | Stage II | 0 | 33 | |
| | Stage III | 0 | 70 | |
| | Stage IV | 0 | 13 | |
| | Lymph nodes (positive) | 0 (0%) | 40 (27%) | <0.001 |
| | Peritoneal Cytology (positive) | 3 (2%) | 24 (16%) | <0.001 |
Patient clinical and pathological characteristics for GEO17025 patients (N = 71). Univariate analysis with logistic regression was used to assess differences between both groups.
| Category | Variable | Low Risk (N = 49) | High Risk (N = 22) | p-value |
|---|---|---|---|---|
| Preoperative characteristics | Age (mean) | 58 | 67 | 0.002 |
| | Grade | | | <0.001 |
| | Grade 1 | 26 | 0 | |
| | Grade 2 | 17 | 13 | |
| | Grade 3 | 6 | 9 | |