Mike J Mason, Carolina Schinke, Christine L P Eng, Fadi Towfic, Fred Gruber, Andrew Dervan, Douglas Bassett, Jonathan Goke, Brian A Walker, Anjan Thakurta, Justin Guinney, Brian S White, Aditya Pratapa, Yuanfang Guan, Hongjie Chen, Yi Cui, Bailiang Li, Thomas Yu, Elias Chaibub Neto, Konstantinos Mavrommatis, Maria Ortiz, Valeriy Lyzogubov, Kamlesh Bisht, Hongyue Y Dai, Frank Schmitz, Erin Flynt, Samuel A Danziger, Alexander Ratushny, William S Dalton, Hartmut Goldschmidt, Herve Avet-Loiseau, Mehmet Samur, Boris Hayete, Pieter Sonneveld, Kenneth H Shain, Nikhil Munshi, Daniel Auclair, Dirk Hose, Gareth Morgan, Matthew Trotter.
Abstract
While the past decade has seen meaningful improvements in clinical outcomes for multiple myeloma patients, a subset of patients does not benefit from current therapeutics for unclear reasons. Many gene expression-based models of risk have been developed, but each model uses a different combination of genes and often involves assaying many genes, making them difficult to implement. We organized the Multiple Myeloma DREAM Challenge, a crowdsourced effort to develop models of rapid progression in newly diagnosed myeloma patients and to benchmark these against previously published models. This effort led to more robust predictors and found that incorporating specific demographic and clinical features improved gene expression-based models of high risk. Furthermore, post-challenge analysis identified a novel expression-based risk marker, PHF19, which has recently been found to have an important biological role in multiple myeloma. Lastly, we show that a simple four-feature predictor composed of age, ISS, and expression of PHF19 and MMSET performs similarly to more complex models that include many more gene expression features.
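The abstract describes benchmarking risk models against progression, and Fig. 3 refers to PFS concordance index-based effect sizes. As a minimal sketch of how a concordance index scores a risk predictor against censored PFS, here is Harrell's C in plain Python. It is an illustration under stated assumptions, not the Challenge's scoring code: the exact metric and tie handling used in the Challenge are not given in this record, the function name is hypothetical, pairs with tied follow-up times are simply skipped, and the data are toy values.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data (sketch).

    times: follow-up time per patient (e.g. PFS in months)
    events: 1 if progression was observed, 0 if censored
    risk_scores: higher value = predicted higher risk (shorter PFS)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Comparable only if the shorter time is an observed event;
        # pairs with tied times are skipped in this simplified sketch.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # earlier progression got the higher risk score
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half-concordant
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (not from the study): four patients, the third one censored.
pfs_months = [5.0, 12.0, 20.0, 30.0]
progressed = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.4, 0.1]
print(harrell_c_index(pfs_months, progressed, risk))  # 1.0
```

A value of 1.0 means every comparable pair is ranked correctly (higher score, earlier progression); a constant score gives 0.5, the level expected from random ranking.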
Year: 2020 PMID: 32060406 PMCID: PMC7326699 DOI: 10.1038/s41375-020-0742-z
Source DB: PubMed Journal: Leukemia ISSN: 0887-6924 Impact factor: 11.528
Data set descriptions.
| Cohort | Study | EGA/GEO/Clinical trial ID | Median PFS (months) | Data type | ISS stage 1 (fraction) | ISS stage 2 (fraction) | ISS stage 3 (fraction) | N |
|---|---|---|---|---|---|---|---|---|
| Training | Masaryk | E-MTAB-4032 | 11.35 | Expression array | 0.27 | 0.31 | 0.42 | 147 |
| Training | MAQC-II | GSE24080 | 25.47 | Expression array | 0.53 | 0.26 | 0.21 | 559 |
| Training | MMRF IA9 | NCT01454297 | 12.59 | RNA-seq | 0.35 | 0.37 | 0.28 | 636 |
| Training | HOVON65/GMMG-HD4 | GSE19784 | 18.3 | Expression array | 0.43 | 0.27 | 0.31 | 282 |
| Total training | | | | | | | | 1624 |
| Validation | MRC-IX | GSE15695 | * | Expression array | * | * | * | 241 |
| Validation | Heidelberg | E-MTAB-372 | * | Expression array | * | * | * | 215 |
| Validation | Moffitt | | * | RNA-seq | * | * | * | 74 |
| Validation | DFCI | NCT01191060 | * | RNA-seq | * | * | * | 293 |
| Total validation | | | | | | | | 823 |
*Clinical data withheld at the data provider's request.
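As a quick consistency check on the data set table, the cohort totals can be recomputed from the per-study counts, and an N-weighted ISS stage mix can be derived for the pooled training cohorts. The pooled mix is a derived illustration, not a figure reported by the study, and it is only approximate because the per-cohort fractions are rounded to two decimals (the HOVON65/GMMG-HD4 fractions sum to 1.01). Variable names are hypothetical.

```python
# Per-cohort sample counts (N) and ISS stage 1/2/3 fractions, from the data set table.
training = {
    "Masaryk": (147, (0.27, 0.31, 0.42)),
    "MAQC-II": (559, (0.53, 0.26, 0.21)),
    "MMRF IA9": (636, (0.35, 0.37, 0.28)),
    "HOVON65/GMMG-HD4": (282, (0.43, 0.27, 0.31)),
}
# ISS fractions for validation cohorts were withheld, so only N is listed.
validation_n = {"MRC-IX": 241, "Heidelberg": 215, "Moffitt": 74, "DFCI": 293}

total_training = sum(n for n, _ in training.values())
total_validation = sum(validation_n.values())

# N-weighted average of each ISS stage fraction across the training cohorts.
pooled_iss = [
    sum(n * fracs[stage] for n, fracs in training.values()) / total_training
    for stage in range(3)
]

print(total_training, total_validation)  # 1624 823
print([round(p, 2) for p in pooled_iss])  # [0.42, 0.31, 0.27]
```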
Fig. 1 Challenge model submission architecture: training datasets are fully available to Challenge participants (left), while blinded validation datasets are sequestered in the cloud (right).
Containerized models are submitted to the cloud and run on the training datasets, and their risk predictions are scored.
Comparator models.
| Model | Reference | Features |
|---|---|---|
| Baseline | Baseline | Age and/or ISS |
| UAMS-70 | Shaughnessy et al. | Gene expression signature composed of 70 genes |
| EMC-92 | Kuiper et al. | Gene expression signature composed of 92 genes |
| UAMS-70 extended | This manuscript | UAMS-70 with age and/or ISS |
| EMC-92 extended | This manuscript | EMC-92 with age and/or ISS |
| REFS | This manuscript | Gene expression and clinical features |
Fig. 2 Challenge performance.
a Box plots show distributions of bootstrapped model performances for each team. Comparator models are shown with text marked in blue for baseline models, green for published models, and red for published models extended to include clinical features. The dashed red line indicates the median of the best-performing comparator model. Bar plots to the right show the tie-breaking metric, wBAC, for each model. Among statistically tied models, GIS had the highest wBAC and was declared the top performer. An asterisk indicates an internal collaborator’s comparator model. b Kaplan–Meier curves of the UAMS-70 comparator model with and without age and ISS added.
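Panel b shows Kaplan–Meier curves. The caption does not state how patients were stratified (for example, into predicted high- and low-risk groups), so the sketch below only shows the product-limit estimator itself on toy data. The function name and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: return [(t, S(t))] at each distinct event time.

    times: follow-up times; events: 1 = progression observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group every subject sharing this time (both events and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censoring at t leave the risk set afterwards.
        n_at_risk -= removed
    return curve


# Toy PFS data (months), not from the study.
curve = kaplan_meier([3, 5, 5, 8, 10, 12], [1, 1, 0, 1, 0, 1])
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects still count as at risk at their censoring time and are removed afterwards, which is the usual convention for ties between events and censorings.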
Fig. 3 PHF19 compared with other myeloma classifiers and features.
a Two-dimensional histogram of PFS concordance index-based univariate effect sizes (z) in the training and validation cohorts; colors represent the number of genes in each hexagonal bin. PHF19 and well-known myeloma genes are labeled. b PHF19 and MMSET expression in relation to t(4;14). c A simple four-feature model performs as well as UAMS-70 combined with age and ISS.
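Panel c reports that a four-feature model (age, ISS, and PHF19 and MMSET expression) performs as well as UAMS-70 combined with age and ISS. The caption gives neither the model form nor its coefficients, so the following is only a hypothetical sketch: each feature is z-scored across the cohort and summed with placeholder weights of 1.0, ISS is treated as a numeric stage (1 to 3), and higher PHF19 and MMSET expression are assumed to indicate higher risk. In a real analysis the weights would be estimated on training data, for example with a Cox proportional hazards model.

```python
from statistics import mean, pstdev

FEATURES = ("age", "iss", "phf19", "mmset")
# Hypothetical placeholder weights; the study's fitted coefficients are not given here.
WEIGHTS = {"age": 1.0, "iss": 1.0, "phf19": 1.0, "mmset": 1.0}


def standardize(values):
    """Z-score a list of values using the population standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def four_feature_scores(patients, weights=WEIGHTS):
    """Weighted sum of the four standardized features for each patient."""
    columns = {f: standardize([p[f] for p in patients]) for f in FEATURES}
    return [
        sum(weights[f] * columns[f][i] for f in FEATURES)
        for i in range(len(patients))
    ]


# Toy cohort (not study data); expression values are in arbitrary units.
cohort = [
    {"age": 55, "iss": 1, "phf19": 4.0, "mmset": 2.0},
    {"age": 62, "iss": 2, "phf19": 5.5, "mmset": 2.5},
    {"age": 71, "iss": 3, "phf19": 7.0, "mmset": 6.0},
]
scores = four_feature_scores(cohort)
print([round(s, 2) for s in scores])
```

Because each standardized feature has mean zero across the cohort, the scores are centered at zero whatever the weights; only their ranking matters for concordance-based evaluation.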
Fig. 4 PHF19 knockdown leads to decreased cell proliferation: knockdown of PHF19 was performed in the JJN3 and ARP1 MM cell lines using inducible shRNA.
a PHF19 knockdown, relative to the scrambled shRNA control, was confirmed by qRT-PCR and b western blotting. c, d Cell proliferation was significantly decreased in MM cells with PHF19 knockdown compared with the scrambled control in the JJN3 and ARP1 cell lines.
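Panel a reports knockdown relative to a scrambled shRNA control without describing how the qRT-PCR data were quantified. One common approach, assumed here rather than taken from the study, is the 2^-ΔΔCt method: the target gene is normalized to a reference gene within each condition, and the two conditions are then compared. The function name, the reference gene, and the Ct values are hypothetical.

```python
def relative_expression_ddct(ct_target_kd, ct_ref_kd, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene (knockdown vs control) by the 2^-ddCt method.

    Assumes roughly 100% amplification efficiency for target and reference genes.
    """
    d_ct_kd = ct_target_kd - ct_ref_kd        # normalize knockdown sample to reference gene
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl  # normalize control sample to reference gene
    dd_ct = d_ct_kd - d_ct_ctrl
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: PHF19 in PHF19-shRNA cells vs scrambled shRNA control,
# each normalized to an assumed housekeeping reference gene.
fold = relative_expression_ddct(26.0, 18.0, 24.0, 18.0)
print(fold)  # 0.25
```

Here a fold change of 0.25 means the knockdown cells retain about a quarter of the control PHF19 mRNA level.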