Felix Y Feng, Osama Mohamad, Andre Esteva, Jean Feng, Douwe van der Wal, Shih-Cheng Huang, Jeffry P Simko, Sandy DeVries, Emmalyn Chen, Edward M Schaeffer, Todd M Morgan, Yilun Sun, Amirata Ghorbani, Nikhil Naik, Dhruv Nathawani, Richard Socher, Jeff M Michalski, Mack Roach, Thomas M Pisansky, Jedidiah M Monson, Farah Naz, James Wallace, Michelle J Ferguson, Jean-Paul Bahary, James Zou, Matthew Lungren, Serena Yeung, Ashley E Ross, Howard M Sandler, Phuoc T Tran, Daniel E Spratt, Stephanie Pugh.
Abstract
Prostate cancer is the most frequent cancer in men and a leading cause of cancer death. Determining a patient's optimal therapy is challenging: oncologists must select the therapy with the highest likelihood of success and the lowest likelihood of toxicity. International standards for prognostication rely on non-specific and semi-quantitative tools, commonly leading to over- and under-treatment. Tissue-based molecular biomarkers have attempted to address this, but most have limited validation in prospective randomized trials and expensive processing costs, posing substantial barriers to widespread adoption. There remains a significant need for accurate and scalable tools to support therapy personalization. Here we demonstrate prostate cancer therapy personalization by predicting long-term, clinically relevant outcomes using a multimodal deep learning architecture, with models trained on clinical data and digital histopathology from prostate biopsies. We train and validate models using five phase III randomized trials conducted across hundreds of clinical centers. Histopathological data were available for 5654 of 7764 randomized patients (71%) with a median follow-up of 11.4 years. Compared to the most common risk-stratification tool, the risk groups developed by the National Comprehensive Cancer Network (NCCN), our models have superior discriminatory performance across all endpoints, ranging from 9.2% to 14.6% relative improvement in a held-out validation set. This artificial intelligence-based tool improves prognostication over standard tools and allows oncologists to computationally predict the likeliest outcomes of specific patients to determine optimal treatment. Outfitted with digital scanners and internet access, any clinic could offer such capabilities, enabling global access to therapy personalization.
Year: 2022 PMID: 35676445 PMCID: PMC9177850 DOI: 10.1038/s41746-022-00613-w
Source DB: PubMed Journal: NPJ Digit Med ISSN: 2398-6352
Fig. 1 Multimodal deep learning system and dataset.
a The multimodal architecture is composed of two parts: a tower stack to parse a variable number of digital histopathology slides and another tower stack to merge the resultant features and predict binary outcomes. b The training of the self-supervised model of the image tower stack.
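The two-part design in panel a can be illustrated with a minimal sketch. It assumes, purely for illustration, that a patient's variable number of image feature vectors is collapsed by mean pooling and that the merged features are scored by a logistic function; the caption does not specify either choice, and `mean_pool`, `predict_risk`, the weights, and the toy values are all hypothetical.

```python
import math


def mean_pool(vectors):
    """Collapse a variable number of equal-length feature vectors into one."""
    if not vectors:
        raise ValueError("patient has no image features")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def predict_risk(image_features, clinical_features, weights, bias):
    """Merge pooled image features with clinical variables and score them.

    The logistic scoring head is an assumption made for this sketch,
    not the downstream model described in the source.
    """
    fused = mean_pool(image_features) + list(clinical_features)
    if len(fused) != len(weights):
        raise ValueError("weight vector does not match fused feature length")
    logit = bias + sum(w * x for w, x in zip(weights, fused))
    return 1.0 / (1.0 + math.exp(-logit))


# Two hypothetical patients with different numbers of image feature vectors.
patient_a = [[0.2, 0.4], [0.6, 0.0]]
patient_b = [[1.0, 1.0], [0.0, 0.0], [0.5, 0.5]]
clinical = [1.0]            # one made-up clinical variable
weights = [1.0, -1.0, 0.5]  # made-up weights: two image dims + one clinical
for name, feats in (("A", patient_a), ("B", patient_b)):
    print(name, round(predict_risk(feats, clinical, weights, bias=0.0), 3))
```

Pooling before merging is what lets one fixed-size scoring step handle patients with any number of slides; other aggregation schemes would serve the same purpose.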
Clinicopathologic and trial characteristics.
| | RTOG-9202 | RTOG-9408 | RTOG-9413 | RTOG-9910 | RTOG-0126 | Combined |
|---|---|---|---|---|---|---|
| Number of Patients | 1180 | 1719 | 695 | 976 | 1084 | 5654 |
| | 1004 | 1312 | 481 | 769 | 937 | 4503 |
| | 18 | 48 | 23 | 23 | 0 | 112 |
| | 147 | 334 | 173 | 166 | 112 | 932 |
| | 4 | 12 | 1 | 11 | 12 | 40 |
| | 1 | 10 | 11 | 6 | 9 | 37 |
| | 6 | 3 | 6 | 1 | 14 | 30 |
| Number of Pathology Slides | 3188 | 5472 | 2104 | 3075 | 2365 | 16204 |
| Number of Clinical Variables | 53 | 69 | 71 | 60 | 62 | − |
| Therapy Randomization | RTS vs. RTL | RT vs. RTS | RTS 2 × 2 (a) | RTS vs. RTM | RT vs. RT + | − |
| Patient Risk Groups | Inter., High | Low, Inter., High | Inter., High | Inter., High | Inter. | Low, Inter., High |
| Primary Endpoint | Disease-free Survival | Overall Survival | Progression-free Survival | Prostate Cancer-specific Mortality | Overall Survival | − |
| Median Follow-up for Censored Patients (Years) | 17.4 | 15.1 | 13.7 | 9.3 | 13.2 | 11.4 |
| No. Patients Died | 944 | 1154 | 504 | 297 | 505 | 3404 |
| Trial Accrual Dates | 1992–1995 | 1994–2001 | 1995–1999 | 2000–2004 | 2002–2008 | 1992–2008 |
The column ‘Combined’ shows the characteristics of the final dataset with all five trials used for training and validation.
(a) RTOG-9413 randomized patients in a 2 × 2 fashion testing the effect of timing of ADT (before and during RT vs. starting after RT) and field size (prostate only vs. full pelvic RT).
New acronyms: radiotherapy plus short/medium/long-term hormone therapy (RTS/RTM/RTL).
Fig. 2 Pathologist interpretation of self-supervised model tissue clusters.
The self-supervised model in the multimodal model was trained to identify whether or not augmented versions of small patches of tissue came from the same original patch, without ever seeing clinical data labels. After training, each image patch in the dataset of 10.05 M image patches was fed through this model to extract a 128-dimensional feature vector, and the UMAP algorithm[27] was used to cluster and visualize the resultant vectors. A pathologist was then asked to interpret the 20 image patches closest to each of the 25 cluster centroids—the descriptions are shown next to the insets. For clarity, we only highlight 6 clusters (colored), and show the remaining clusters in gray. See Supplementary Fig. 2 for full pathologist annotation.
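The selection of patches for pathologist review can be sketched as follows. This is a minimal illustration that takes cluster assignments as given and assumes Euclidean distance to the mean vector of each cluster; the caption does not state how centroids were computed or in which space distances were measured. The function names and the toy 2-dimensional data are hypothetical stand-ins for the 128-dimensional features.

```python
import math


def centroid(vectors):
    """Mean vector of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_to_centroids(features, cluster_ids, k):
    """For each cluster, return the indices of its k members closest to the centroid."""
    members = {}
    for index, cluster in enumerate(cluster_ids):
        members.setdefault(cluster, []).append(index)

    nearest = {}
    for cluster, indices in members.items():
        center = centroid([features[i] for i in indices])
        # Euclidean distance is an assumption of this sketch.
        ranked = sorted(indices, key=lambda i: math.dist(features[i], center))
        nearest[cluster] = ranked[:k]
    return nearest


# Toy data: five patches in two clusters.
features = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
cluster_ids = [0, 0, 0, 1, 1]
print(nearest_to_centroids(features, cluster_ids, k=2))
```

With the values from the caption, `k` would be 20 and there would be 25 clusters.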
Fig. 3 Comparison of the multimodal deep learning system to NCCN risk groups across trials and outcomes.
a Performance results, reported as the area under the curve (AUC) of time-dependent receiver operating characteristics, for the MMAI (blue bars) vs. NCCN (gray bars) models, including 95% confidence intervals and two-sided p-values. Comparisons were made at the 5-year and 10-year time points on the following binary outcomes: distant metastasis (DM), biochemical failure (BF), prostate cancer-specific survival (PCSS), and overall survival (OS). b Summary table of the relative improvement of the MMAI model over the NCCN model across the various outcomes, broken down by performance on the data from each trial in the validation set. Relative improvement is given by (AUC_MMAI − AUC_NCCN)/AUC_NCCN. c Ablation study showing model performance when trained on a sequentially decreasing set of data inputs, including the pathology images only (path), pathology images + NCCN variables (path + NCCN), and pathology images + NCCN variables + age + Gleason primary + Gleason secondary (path + NCCN + 3). d–h Performance comparison on the individual clinical trial subsets of the validation set; together, these five comprise the entire validation set shown in (a).
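The relative-improvement formula in panel b can be worked through with a short sketch. The `binary_auc` helper below is a simplification: it computes an ordinary pairwise AUC for a binary outcome and ignores censoring, whereas the figure reports time-dependent AUCs. The scores and labels are invented solely to show the arithmetic.

```python
from itertools import product


def binary_auc(scores, labels):
    """Pairwise AUC: fraction of (event, non-event) pairs ranked correctly.

    Ties count as half. Censoring is ignored, unlike a time-dependent AUC.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0
        for e, n in product(events, non_events)
    )
    return wins / (len(events) * len(non_events))


def relative_improvement(auc_model, auc_baseline):
    """(AUC_model - AUC_baseline) / AUC_baseline, as defined in the caption."""
    if auc_baseline <= 0:
        raise ValueError("baseline AUC must be positive")
    return (auc_model - auc_baseline) / auc_baseline


# Invented risk scores for four patients; 1 marks an observed event.
labels = [1, 1, 0, 0]
model_scores = [0.9, 0.4, 0.5, 0.1]
baseline_scores = [0.6, 0.2, 0.5, 0.3]

auc_model = binary_auc(model_scores, labels)
auc_baseline = binary_auc(baseline_scores, labels)
print(auc_model, auc_baseline, relative_improvement(auc_model, auc_baseline))
```

Because the improvement is relative to the baseline AUC, the same absolute AUC gain produces a larger percentage when the baseline is weaker.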