Joshua J Levy, Carly A Bobak, Mustafa Nasir-Moin, Eren M Veziroglu, Scott M Palisoul, Rachael E Barney, Lucas A Salas, Brock C Christensen, Gregory J Tsongalis, Louis J Vaickus.
Abstract
Spatially resolved characterization of the transcriptome and proteome promises to provide further clarity on cancer pathogenesis and etiology, which may inform future clinical practice through classifier development for clinical outcomes. However, batch effects may obscure the ability of machine learning methods to derive complex associations within spatial omics data. Profiling thirty-five stage three colon cancer patients using the GeoMX Digital Spatial Profiler, we found that mixed-effects machine learning (MEML) methods may help overcome significant batch effects and communicate key and complex disease associations from spatial information. These results point to further exploration and application of MEML methods within the spatial omics algorithm development life cycle for clinical deployment.
Year: 2022 PMID: 34890147 PMCID: PMC8669762
Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928
Figure 1: Overview of Experiment:
Paired H&E and IF stains (PanCK stain, green; CD45 stain, pink) used to help select ROI (black squares) contained in macro-architectural contexts (outlined: blue for intra-tumoral, red for tumor interface, green for away from tumor) for: A) Non-metastatic patient, B) Metastatic patient; C) Tree boosting methods combined with mixed effects modeling to adjust for D) patient/batch-level effects (e.g., interactions within nested observations, continuous scale) to yield E) disease associated interactions; SHAP dependence plots show how one predictor (x-axis) covaries with another (color) and how this affects predictor importance (y-axis)
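
Panel C combines tree boosting with mixed effects to absorb patient/batch-level variation. One common way to write such a model is sketched below, as a generic random-intercept formulation and not necessarily the specification used in the paper. The symbols $F$, $b_j$, $\sigma_b^2$ and the indices $i$ (ROI) and $j$ (patient) are notation introduced here for illustration.

```latex
\[
  \operatorname{logit}\, P\!\left(y_{ij} = 1 \mid x_{ij}, b_j\right)
    = F(x_{ij}) + b_j,
  \qquad
  b_j \sim \mathcal{N}\!\left(0, \sigma_b^2\right)
\]
% F      : nonlinear fixed-effects function, e.g. a boosted tree ensemble
% x_{ij} : predictor values (e.g. protein markers) for ROI i of patient j
% b_j    : patient-level random intercept shared by all ROIs of patient j
```

Under this reading, ROIs from the same patient share $b_j$, so systematic patient or batch shifts are attributed to the random effect rather than to the predictors in $F$.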
Bootstrapped model performance comparison, using C-Statistic/Area Under the Curve (AUC) and Log-Loss
Model groups: Fixed Effects; MEML; Bayesian Generalized Linear Mixed Models. Values are AUC ± SE.
| Task | RF | XGBoost | GPBoost | GPBoost-Coords | SP-BART | BGLMM | BGLMM-Int | BGLMM-GP |
|---|---|---|---|---|---|---|---|---|
| | 0.759±0.01 | 0.747±0.01 | 0.752±0.01 | 0.747±0.01 | 0.772±0.01 | 0.778±0.01 | | 0.777±0.01 |
| | 0.781±0.009 | 0.773±0.01 | 0.782±0.009 | 0.78±0.009 | 0.784±0.009 | 0.788±0.009 | | 0.786±0.01 |
| | 0.909±0.006 | 0.951±0.004 | | n/a | 0.897±0.006 | 0.852±0.008 | 0.896±0.006 | n/a |
| | 0.834±0.014 | 0.849±0.014 | | n/a | 0.867±0.012 | 0.848±0.013 | 0.877±0.012 | n/a |
| | 0.881±0.013 | 0.866±0.013 | | n/a | 0.874±0.012 | 0.836±0.014 | 0.881±0.012 | n/a |
| | 0.827±0.014 | 0.842±0.014 | | n/a | 0.887±0.011 | 0.885±0.012 | 0.885±0.011 | n/a |
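
The performance comparison reports a bootstrapped C-statistic (AUC) with a standard error, alongside log-loss. A minimal sketch of how such estimates can be computed is given below, using only the Python standard library. The resampling unit (individual observations), the number of resamples, and the toy labels and probabilities are assumptions made for illustration; the record does not state the scheme the authors used.

```python
import math
import random


def auc(y_true, y_score):
    """C-statistic: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative Bernoulli log-likelihood, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def bootstrap(metric, y_true, y_prob, n_boot=1000, seed=0):
    """Return (mean, standard error) of `metric` over observation-level bootstrap resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples with a single class, where AUC is undefined
        ps = [y_prob[i] for i in idx]
        stats.append(metric(ys, ps))
    mean = sum(stats) / n_boot
    # The bootstrap standard error is the standard deviation of the resampled statistic.
    se = math.sqrt(sum((s - mean) ** 2 for s in stats) / (n_boot - 1))
    return mean, se


# Toy labels and predicted probabilities (hypothetical, not from the study).
y = [0, 0, 1, 1, 0, 1, 0, 1]
p = [0.2, 0.4, 0.7, 0.9, 0.3, 0.6, 0.55, 0.45]

mean_auc, se_auc = bootstrap(auc, y, p, n_boot=500)
mean_ll, se_ll = bootstrap(log_loss, y, p, n_boot=500)
print(f"AUC      {mean_auc:.3f} ± {se_auc:.3f}")
print(f"Log-loss {mean_ll:.3f} ± {se_ll:.3f}")
```

With nested data such as multiple ROIs per patient, resampling whole patients instead of single observations is a frequent alternative, since it keeps correlated observations together.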
Figure 2: Macro Predictions:
A) SHAP summary plot of top GPBoost terms; each point represents the value of a specific predictor for one ROI; color indicates the predictor's value; a positive SHAP value (x-axis) indicates how strongly that value is associated with the tumor interface; B) GPBoost SHAP partial dependency of CD68 interaction with PanCK; C) Posterior distributions of unpenalized BGLMM-Int predictors (thick and thin bars represent 50% and 90% credible intervals respectively)
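
The SHAP values in these plots attribute each prediction to individual predictors. To make the idea concrete, the sketch below computes exact Shapley values for a tiny hand-written model by enumerating all predictor subsets. This is a toy illustration under stated assumptions, not the tree-based SHAP procedure applied to GPBoost: the model `toy_model`, the three unnamed features, and the all-zero baseline are hypothetical, and "absent" predictors are simply set to their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing absent features by baseline values."""
    n = len(x)

    def value(subset):
        # Evaluate f with features in `subset` taken from x and the rest from baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: one additive term plus an interaction between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output at `x` and at the baseline, and the interaction term is split evenly between the two features that form it. This is why dependence plots colored by a second predictor are useful for spotting interactions.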
Top terms from BGLMM-Int models; GPBoost-extracted interactions are emphasized in bold
Top 7 Ranked BGLMM-Int Terms
| Macro: OOS | Macro: WS | METS: Overall | METS: Intra | METS: Inter | METS: Away |
|---|---|---|---|---|---|
| PanCk | PanCk |
| Beta-2-microglobulin (B2M) |
| CD25 |
| CTLA4 | CD127 | CD14 | CD45 | CD11c | CD44 |
|
| CTLA4 | Age | CD8 | CTLA4 | FAP-α |
| CD34 |
| CD68 | FOXP3 | CD14 | Fibronectin |
|
| CD44 | CD11c |
| GZMB | CD68 |
| CD68 |
|
| Age | CD68 | Age |
| CD27 | CD68 | Ki-67 |
|
|
|
| CD56 |
| Age | CD163 | ||
| CD34 | CD8 |
| PanCk | ||
|
| Beta-2-microglobulin (B2M) | CD66b |
| ||
| CD20 |
| ||||
|
| |||||
| Fibronectin | |||||
| Sex | |||||
Figure 3: METS Predictions:
A) SHAP summary plot of top GPBoost terms; each point represents the value of a specific predictor for one ROI; color indicates the predictor's value; a positive SHAP value (x-axis) indicates how strongly that value is associated with the tumor interface; B) GPBoost SHAP partial dependency of CD20 interaction with age; C) Posterior distributions of unpenalized BGLMM-Int predictors (thick and thin bars represent 50% and 90% credible intervals respectively)