Jasleen K Grewal1, Basile Tessier-Cloutier2, Martin Jones1, Sitanshu Gakkhar1, Yussanne Ma1, Richard Moore1, Andrew J Mungall1, Yongjun Zhao1, Michael D Taylor3, Karen Gelmon4, Howard Lim4, Daniel Renouf4, Janessa Laskin4, Marco Marra1,5, Stephen Yip2, Steven J M Jones1.
Abstract
Importance: A molecular diagnostic method that incorporates information about the transcriptional status of all genes across multiple tissue types can strengthen confidence in cancer diagnosis.
Objective: To determine the practical use of a whole transcriptome-based pan-cancer method in diagnosing primary and metastatic cancers and resolving complex diagnoses.
Design, Setting, and Participants: This cross-sectional diagnostic study assessed Supervised Cancer Origin Prediction Using Expression (SCOPE), a machine learning method using whole-transcriptome RNA sequencing data. Training was performed on publicly available primary cancer data sets, including The Cancer Genome Atlas. Testing was performed retrospectively on untreated primary cancers and treated metastases from volunteer adult patients at BC Cancer in Vancouver, British Columbia, from January 1, 2013, to March 31, 2016. Training spanned 10 822 samples and 66 output classes representing untreated primary cancers (n = 40) and adjacent normal tissues (n = 26). SCOPE's performance was demonstrated on 211 untreated primary mesothelioma cancers and 201 treatment-resistant metastatic cancers. Finally, SCOPE was used to identify the putative site of origin in 15 cases with initial presentation as cancers with unknown primary of origin.
Year: 2019 PMID: 31026023 PMCID: PMC6487574 DOI: 10.1001/jamanetworkopen.2019.2597
Source DB: PubMed Journal: JAMA Netw Open ISSN: 2574-3805
Figure 1. Results From Algorithm and Feature Selection Experiments and Performance of Supervised Cancer Origin Prediction Using Expression (SCOPE) on The Cancer Genome Atlas (TCGA) Held-Out Set
A, Feature selection does not improve pan-cancer classification, with the single neural network having a higher performance than neural networks trained on biologically or statistically relevant gene subsets. B, Performance of the single neural network on the held-out set is higher than that of other algorithms. C, Validation of SCOPE on the TCGA held-out set shows high discriminatory power among most cancer types. Incorrect predictions for more than 10% of samples belonging to a given cancer type are shown by curved directed edges. Curve width shows the relative fraction of samples in the misprediction set. Mispredictions occur among cancer types with the same organ system of origin. Colored points with bars represent mean F1 score and SD spread for the corresponding category. Individual gray points in each category indicate the held-out performance of a single neural network that is part of the ensemble (n = 5). AC indicates adenocarcinoma; CESC-AC, cervical/endocervical adenocarcinoma; CESC-SCC, cervical/endocervical squamous cell carcinoma; DLBC, diffuse large B-cell lymphoma; NCI-DLBC, National Cancer Institute’s cohort of DLBC; PCPG, pheochromocytoma/paraganglioma; SCC, squamous cell carcinoma; TFRI-GBM, Terry Fox Research Institute’s cohort of non–cell line glioblastoma; UCEC, uterine corpus endometrial carcinoma.
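The per-category summary in panel C (mean F1 score with SD spread over the 5 networks of the ensemble) can be illustrated with a minimal sketch. The F1 values below are hypothetical, the function name `summarize_f1` is ours, and the use of the sample SD is an assumption, since the caption does not say which SD estimator was used.

```python
from statistics import mean, stdev


def summarize_f1(per_network_f1):
    """Return (mean, sample SD) of the F1 scores of the ensemble members."""
    return mean(per_network_f1), stdev(per_network_f1)


# Hypothetical held-out F1 scores for one cancer type, one value per network (n = 5).
example_f1 = [0.90, 0.92, 0.94, 0.96, 0.98]

avg_f1, sd_f1 = summarize_f1(example_f1)
print(f"mean F1 = {avg_f1:.2f}, SD = {sd_f1:.3f}")
```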
Performance of SCOPE on Genentech Cohort of Primary Mesotheliomas
| Mesothelioma Subtype | Total Cases With Subtype, No. | Precision | Recall | F1 Score | Predicted Type | Predicted Cases, No. |
|---|---|---|---|---|---|---|
| Biphasic epithelioidlike | 72 | 1 | 1 | 1 | Epithelioid mesothelioma | 72 |
| Epithelioid | 54 | 1 | 0.98 | 0.99 | Epithelioid mesothelioma | 53 |
| Sarcomatoid | 29 | NA | NA | NA | Sarcomatoid mesothelioma | 18 |
| | | NA | NA | NA | Epithelioid mesothelioma | 5 |
| | | NA | NA | NA | Sarcoma | 4 |
| | | NA | NA | NA | Other | 2 |
| Biphasic sarcomalike | 56 | NA | NA | NA | Epithelioid mesothelioma | 38 |
| | | NA | NA | NA | Sarcomatoid mesothelioma | 17 |
| | | NA | NA | NA | Other | 1 |
Abbreviations: NA, not applicable; SCOPE, Supervised Cancer Origin Prediction Using Expression.
Data from Bueno et al,[25] 2016.
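As a worked check of the Epithelioid row, the reported recall (0.98) and the third metric value (0.99, consistent with an F1 score) can be reproduced from the counts in the table. This is a minimal sketch using the standard definitions of recall and F1; the function names are ours and are not taken from the study.

```python
def recall(true_positives, total_cases):
    """Fraction of cases of a class that were predicted as that class."""
    return true_positives / total_cases


def f1_score(precision, recall_value):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall_value == 0:
        return 0.0
    return 2 * precision * recall_value / (precision + recall_value)


# Epithelioid row of the table: 53 of 54 cases predicted as epithelioid
# mesothelioma, with a reported precision of 1.
epi_recall = recall(53, 54)
epi_f1 = f1_score(1.0, epi_recall)
print(round(epi_recall, 2), round(epi_f1, 2))  # 0.98 0.99
```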
Figure 2. Performance of Supervised Cancer Origin Prediction Using Expression (SCOPE) on External Metastatic Cohort
A, SCOPE has improved performance compared with a baseline linear comparator trained on a statistically filtered feature subset. Numbers in parentheses indicate sample size. B, Two-sided t tests show a significant association of tumor content with general diagnosis as organ system, for biopsies sampled from the site of metastasis. C, Two-sided t tests show no effect of tumor content on misclassification to organ system, for biopsies sampled from the cancer’s site of origin. B and C, Box plots illustrate the median (center line in box) with the lower and upper hinges indicating the 25th and 75th percentiles, respectively. The upper whisker shows the largest value at most 1.5 times the interquartile range (IQR) from the hinge, and the lower whisker shows the smallest value at most 1.5 times the IQR from the hinge. The IQR is calculated as the distance between the first and third quartiles. Data points outside these ranges are plotted as individual points. AC indicates adenocarcinoma; CESC-AC, cervical/endocervical adenocarcinoma; CNS group, lower-grade glioma, glioblastoma multiforme; GEJ group, esophageal AC, esophageal SCC, stomach AC, liver hepatocellular carcinoma, papillary kidney carcinoma; LNG group, lung AC, lung squamous cell carcinoma (SCC); MISC group, prostate AC, testicular germ cell tumor, CESC-AC, subcutaneous melanoma, diffuse large B-cell lymphoma, follicular lymphoma, thymoma, adrenocortical carcinoma; UCEC, uterine corpus endometrial carcinoma.
a P = .03.
b P < .001.
c P = .79.
d P = .70.
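The box plot construction described in the caption (hinges at the 25th and 75th percentiles, whiskers reaching the most extreme values within 1.5 times the IQR of the hinges, and remaining points drawn individually) can be sketched as follows. The tumor content values are hypothetical, the function name `box_plot_stats` is ours, and the quartile interpolation method is an assumption, since plotting tools differ slightly in how they compute percentiles.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Compute hinges, whiskers, and outliers as described in the caption."""
    data = sorted(values)
    # Quartiles by linear interpolation (one common convention; an assumption).
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        # Whiskers end at the most extreme observed values inside the fences.
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        # Everything beyond the fences is plotted as an individual point.
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


# Hypothetical tumor content percentages for one group of biopsies.
tumor_content = [10, 40, 45, 50, 55, 60, 65, 70, 98]
stats = box_plot_stats(tumor_content)
print(stats)
```

With these example values the fences fall at 15 and 95, so the whiskers stop at 40 and 70 and the values 10 and 98 are drawn as individual points.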
Performance on the Metastatic Cohort
| Diagnosed Type | Total Cases, No. | TPR | FPR | TP | TN | FP | FN | Precision | Recall | F1 Score | Cases Predicted as Diagnosis, No. | Cases Predicted as Biopsy Site, No. | Cases Predicted as Organ System, No. | Cases Predicted as Other, No. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Mesothelioma | 1 | 1.00 | 0.00 | 1 | 130 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1 | 0 | 0 | 0 |
| Colorectal AC | 21 | 0.81 | 0.00 | 17 | 114 | 0 | 4 | 1.00 | 0.81 | 0.89 | 17 | 1 | 2 | 1 |
| UCEC | 5 | 0.40 | 0.00 | 2 | 129 | 0 | 3 | 1.00 | 0.40 | 0.57 | 2 | 0 | 1 | 2 |
| Uterine carcinosarcoma | 4 | 0.25 | 0.00 | 1 | 130 | 0 | 3 | 1.00 | 0.25 | 0.40 | 1 | 0 | 2 | 1 |
| Breast carcinoma | 65 | 0.97 | 0.03 | 63 | 68 | 2 | 2 | 0.97 | 0.97 | 0.97 | 63 | 1 | 0 | 1 |
| LNG_group | 14 | 1.00 | 0.01 | 14 | 117 | 1 | 0 | 0.93 | 1.00 | 0.97 | 14 | 0 | 0 | 0 |
| Sarcoma | 17 | 0.53 | 0.01 | 9 | 122 | 1 | 8 | 0.90 | 0.53 | 0.67 | 9 | 1 | 0 | 7 |
| Ovarian carcinoma | 7 | 0.86 | 0.01 | 6 | 160 | 1 | 1 | 0.86 | 0.86 | 0.86 | 6 | 0 | 0 | 1 |
| Pancreatic AC | 9 | 0.33 | 0.01 | 3 | 158 | 1 | 6 | 0.75 | 0.33 | 0.46 | 3 | 1 | 4 | 1 |
| MISC_group | 9 | 0.88 | 0.00 | 8 | 125 | 1 | 1 | 0.73 | 0.88 | 0.77 | 8 | 0 | 0 | 1 |
| Cholangio-carcinoma | 5 | 0.80 | 0.02 | 4 | 127 | 2 | 1 | 0.67 | 0.80 | 0.73 | 4 | 0 | 1 | 0 |
| GEJ_group | 11 | 0.29 | 0.01 | 3 | 151 | 5 | 8 | 0.31 | 0.29 | 0.26 | 3 | 3 | 1 | 4 |
| CNS_group | 6 | 1.00 | 0.00 | 6 | 23 | 0 | 0 | 1.00 | 1.00 | 1.00 | 6 | 0 | 0 | 0 |
| Breast carcinoma | 4 | 1.00 | 0.00 | 4 | 25 | 0 | 0 | 1.00 | 1.00 | 1.00 | 4 | 0 | 0 | 0 |
| Colorectal AC | 1 | 1.00 | 0.00 | 1 | 28 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1 | 0 | 0 | 0 |
| GEJ_group | 1 | 1.00 | 0.00 | 1 | 28 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1 | 0 | 0 | 0 |
| MISC_group | 2 | 1.00 | 0.00 | 2 | 27 | 0 | 0 | 1.00 | 1.00 | 1.00 | 2 | 0 | 0 | 0 |
| Pancreatic AC | 2 | 1.00 | 0.00 | 2 | 27 | 0 | 0 | 1.00 | 1.00 | 1.00 | 2 | 0 | 0 | 0 |
| Uterine carcinosarcoma | 1 | 1.00 | 0.00 | 1 | 28 | 0 | 0 | 1.00 | 1.00 | 1.00 | 1 | 0 | 0 | 0 |
| Sarcoma | 6 | 0.83 | 0.00 | 5 | 24 | 0 | 1 | 1.00 | 0.83 | 0.91 | 5 | 0 | 0 | 1 |
| Mesothelioma | 4 | 0.75 | 0.00 | 3 | 26 | 0 | 1 | 1.00 | 0.75 | 0.86 | 3 | 0 | 0 | 1 |
| LNG_group | 5 | 0.88 | 0.02 | 4 | 23 | 1 | 1 | 0.75 | 0.88 | 0.76 | 4 | 0 | 1 | 0 |
| UCEC | 1 | 0.00 | 0.00 | 0 | 29 | 0 | 1 | 0.00 | 0.00 | 0.00 | 0 | 0 | 1 | 0 |
| Total | 201 | 0.76 | 0.005 | 160 | 3128 | 19 | 41 | 0.86 | 0.76 | 0.79 | 160 | 7 | 13 | 21 |
Abbreviations: AC, adenocarcinoma; CESC-AC, cervical/endocervical adenocarcinoma; FN, false-negative count; FP, false-positive count; FPR, false-positive rate; SCC, squamous cell carcinoma; TN, true-negative count; TP, true-positive count; TPR, true-positive rate; UCEC, uterine corpus endometrial carcinoma.
CNS_group includes lower-grade glioma and glioblastoma multiforme. LNG_group includes lung AC and lung SCC. GEJ_group includes esophageal AC, esophageal SCC, stomach AC, liver hepatocellular carcinoma, and papillary kidney carcinoma. MISC_group includes prostate AC, testicular germ cell tumor, CESC-AC, subcutaneous melanoma, diffuse large B-cell lymphoma, follicular lymphoma, thymoma, and adrenocortical carcinoma.
Precision, as indicated, is equivalent to class-specific accuracy.
Predicted cases are counted by whether the predicted cancer type matched the pathology diagnosis (diagnosis), was the same as the tissue type of the biopsy site (biopsy site), matched a cancer type with the same organ system of origin (organ system), or did not match any of the above (other).
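The rate columns and the four prediction categories of this table can be illustrated with a short sketch. The rate formulas are the standard definitions of TPR, FPR, and precision. The `ORGAN_SYSTEM` mapping, the function names, and the order in which the categories are checked are illustrative assumptions and do not reproduce the study's actual groupings.

```python
def confusion_rates(tp, tn, fp, fn):
    """Standard rates from confusion counts (0.0 when a denominator is zero)."""
    return {
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Hypothetical organ-system lookup, for illustration only.
ORGAN_SYSTEM = {
    "UCEC": "gynecologic",
    "Uterine carcinosarcoma": "gynecologic",
    "Ovarian carcinoma": "gynecologic",
    "Colorectal AC": "gastrointestinal",
    "Pancreatic AC": "gastrointestinal",
}


def categorize_prediction(predicted, diagnosed, biopsy_site_type):
    """Assign a prediction to one of the four categories used in the table.

    Checks are applied in the order the table footnote lists them (an assumption).
    """
    if predicted == diagnosed:
        return "diagnosis"
    if predicted == biopsy_site_type:
        return "biopsy site"
    system = ORGAN_SYSTEM.get(predicted)
    if system is not None and system == ORGAN_SYSTEM.get(diagnosed):
        return "organ system"
    return "other"


# First Breast carcinoma row of the table: TP 63, TN 68, FP 2, FN 2.
breast = confusion_rates(tp=63, tn=68, fp=2, fn=2)
print({name: round(value, 2) for name, value in breast.items()})

# Hypothetical case: diagnosed UCEC, predicted as uterine carcinosarcoma.
print(categorize_prediction("Uterine carcinosarcoma", "UCEC", "Liver"))
```

For that Breast carcinoma row the sketch gives a TPR of 0.97, an FPR of 0.03, and a precision of 0.97, matching the tabulated values.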
Figure 3. Supervised Cancer Origin Prediction Using Expression (SCOPE) Prediction and Putative Primary for Cancers of Unknown Primary Site
A, Confusion matrix of predictions for primaries of unknown origin. The size of the circles represents relative number of samples in each category. B, Case count for cancers of unknown primary of origin by category. Correct predictions are shown in light blue; incorrect predictions, dark blue.