Maxim Barenboim, Michal Kovac, Baptiste Ameline, David T W Jones, Olaf Witt, Stefan Bielack, Stefan Burdach, Daniel Baumhoer, Michaela Nathrath.
Abstract
Although osteosarcoma (OS) is a rare cancer, it is the most common primary malignant bone tumor in children and adolescents. BRCAness is a phenotypic trait of tumors with a defect in homologous recombination repair that resemble tumors with BRCA1/2 inactivation; it renders these tumors sensitive to poly(ADP-ribose) polymerase inhibitors (PARPi). Recently, OS was shown to exhibit molecular features of BRCAness. Our goal was to develop a method that complements existing genomic methods and aids clinical decision making on administering PARPi to OS patients. OS samples with DNA-methylation data (n = 41) were divided into BRCAness-positive and BRCAness-negative groups based on their degree of genomic instability. Methylation probes were ranked by decreasing variance difference between the two groups, and the top 2000 probes were used to train and cross-validate a random forest classifier. Two-thirds of the available OS RNA-Seq samples (n = 17), taken from the top and bottom of the sample list ranked by genome instability score, were subjected to differential expression analysis and, subsequently, to gene set enrichment analysis (GSEA). The combined accuracy of the trained random forest was 85%, and the average area under the ROC curve (AUC) was 0.95. There were 449 upregulated and 1,079 downregulated genes in the BRCAness-positive group (FDR < 0.05). GSEA of upregulated genes detected enrichment of DNA replication, mismatch repair and homologous recombination signatures (FWER < 0.05). Validation of the BRCAness classifier on an independent OS set (n = 20), collected later in the course of the study, showed an AUC of 0.87 and an accuracy of 90%. The GSEA signatures computed for this test set matched those observed in the training set. In conclusion, we developed a new classifier based on DNA-methylation patterns that detects BRCAness in OS samples with high accuracy, and GSEA identified genome instability signatures.
Machine-learning and gene expression approaches add new epigenomic and transcriptomic aspects to the established genomic methods for evaluating BRCAness in osteosarcoma and can be extended to other cancers characterized by genome instability.
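The probe-ranking step can be sketched as follows. The abstract does not define the "variance difference" precisely, so this sketch assumes a hypothetical Δ-variance: the variance of a probe across all samples minus the mean of its two within-group variances. It is large when a probe separates the groups and can be negative when it does not. The names `beta` (probes × samples) and `is_positive` are illustrative, not taken from the authors' code.

```python
import numpy as np

def rank_probes_by_delta_variance(beta, is_positive, top_k=2000):
    """Return the indices of the top_k probes by decreasing Δ-variance, plus all Δ-variances.

    beta: array of shape (n_probes, n_samples) with methylation values.
    is_positive: boolean array of length n_samples (True = BRCAness-positive).
    """
    beta = np.asarray(beta, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    # Hypothetical Δ-variance: total variance minus the mean within-group variance.
    total_var = beta.var(axis=1, ddof=1)
    within_var = 0.5 * (beta[:, is_positive].var(axis=1, ddof=1)
                        + beta[:, ~is_positive].var(axis=1, ddof=1))
    delta = total_var - within_var
    # Sort by decreasing Δ-variance; "stable" keeps the input order for ties.
    order = np.argsort(-delta, kind="stable")
    return order[:top_k], delta
```

With the real data, `beta` would hold the 41 training samples and `top_k=2000` would select the probes passed on to the random forest.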
Year: 2021 PMID: 34762643 PMCID: PMC8584788 DOI: 10.1371/journal.pcbi.1009562
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. ROC curves for each cross-validation fold (dashed black lines) and the mean ROC curve (solid black line) for random forest models fitted (A) to 43 samples and (B) to 41 samples, after removal of two samples (HD_00J and I047_005). The dashed diagonal line represents AUC = 0.5, i.e. random classification. (C) ROC curve for the independent validation test set (n = 20; AUC = 0.87).
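The AUC values in panels (A) to (C) have a simple interpretation: the AUC equals the probability that a randomly chosen BRCAness-positive sample receives a higher classifier score than a randomly chosen BRCAness-negative one, with ties counted as one half. A minimal pure-Python sketch of this pairwise (Mann-Whitney) formulation follows; `roc_auc` and its arguments are illustrative and not the authors' code.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs that are ranked correctly.

    labels: iterable of booleans (True = BRCAness-positive).
    scores: classifier scores, e.g. random forest vote fractions for "Yes".
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0   # positive ranked above negative
            elif p == n:
                correct += 0.5   # ties count as half
    return correct / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to the dashed diagonal (random classification). The quadratic pairwise loop is adequate for cohorts of a few dozen samples such as these.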
Confusion matrices of the random forest models for each cross-validation fold and combined over all five folds (rows: actual class; columns: predicted class).

| RF model | Actual | Fold 1: No | Fold 1: Yes | Fold 2: No | Fold 2: Yes | Fold 3: No | Fold 3: Yes | Fold 4: No | Fold 4: Yes | Fold 5: No | Fold 5: Yes | All folds: No | All folds: Yes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 samples (HD_00J and I047_005 removed)ᵃ | No | 2 | 1 | 2 | 1 | 2 | 1 | 4 | 0 | 3 | 0 | 13 (81%) | 3 (19%) |
| | Yes | 1 | 4 | 2 | 3 | 0 | 5 | 0 | 5 | 0 | 5 | 3 (12%) | 22 (88%) |
| 43 samples (all)ᵇ | No | 1 | 2 | 3 | 0 | 3 | 0 | 2 | 2 | 1 | 3 | 10 (59%) | 7 (41%) |
| | Yes | 0 | 5 | 1 | 4 | 1 | 4 | 1 | 5 | 0 | 5 | 3 (12%) | 23 (88%) |

ᵃ Accuracy = (22 + 13)/(22 + 13 + 3 + 3) × 100 = 85% (samples I047_005 and HD_00J removed).
ᵇ Accuracy = (23 + 10)/(23 + 10 + 3 + 7) × 100 = 77%.
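The combined counts and footnote accuracies can be reproduced by summing the per-fold matrices. The sketch below hard-codes the per-fold counts from the cross-validation table (rows: actual No/Yes; columns: predicted No/Yes); the `Confusion` tuple and the helper names are illustrative.

```python
from typing import NamedTuple

class Confusion(NamedTuple):
    tn: int  # actual No, predicted No
    fp: int  # actual No, predicted Yes
    fn: int  # actual Yes, predicted No
    tp: int  # actual Yes, predicted Yes

def pool(folds):
    """Sum confusion matrices element-wise across cross-validation folds."""
    return Confusion(*(sum(getattr(f, name) for f in folds) for name in Confusion._fields))

def metrics(c):
    """Accuracy, sensitivity and specificity of a 2x2 confusion matrix."""
    total = c.tn + c.fp + c.fn + c.tp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "sensitivity": c.tp / (c.tp + c.fn),  # share of BRCAness-positive samples called Yes
        "specificity": c.tn / (c.tn + c.fp),  # share of BRCAness-negative samples called No
    }

# Per-fold counts from the table: 41-sample model (a) and 43-sample model (b).
FOLDS_41 = [Confusion(2, 1, 1, 4), Confusion(2, 1, 2, 3), Confusion(2, 1, 0, 5),
            Confusion(4, 0, 0, 5), Confusion(3, 0, 0, 5)]
FOLDS_43 = [Confusion(1, 2, 0, 5), Confusion(3, 0, 1, 4), Confusion(3, 0, 1, 4),
            Confusion(2, 2, 1, 5), Confusion(1, 3, 0, 5)]

if __name__ == "__main__":
    for name, folds in [("41 samples", FOLDS_41), ("43 samples", FOLDS_43)]:
        pooled = pool(folds)
        print(name, pooled, f"accuracy = {metrics(pooled)['accuracy']:.0%}")
```

Applying the same `metrics` function to the independent test set counts, `Confusion(tn=5, fp=1, fn=1, tp=13)`, gives the 90% accuracy, 93% sensitivity and 83% specificity stated for that set.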
Fig 2. Initial unsupervised hierarchical clustering of the 2000 probes (A) in all 43 available samples and (C) in the 41-sample set after removal of two samples (HD_00J and I047_005). Note that these two samples were clustered incorrectly. The lower part of heatmap (A) shows probes with very low intensities across all samples, which yield negative Δ-variances. Brown band: BRCAness-positive samples; blue band: BRCAness-negative samples. (B) Bar plot of the number of positive Δ-variances recomputed after excluding each sample in turn from the 43-sample set. Red: BRCAness-negative samples; green: BRCAness-positive samples; y-axis: number of positive Δ-variances. Removing HD_00J and I047_005 gives the largest increases in the number of positive Δ-variances.
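Panel (B) can be sketched as a leave-one-out loop: drop each sample in turn, recompute the Δ-variance of every probe, and count how many are positive. The caption does not define Δ-variance, so this sketch assumes it is the total variance minus the mean within-group variance; the function names are hypothetical. Each group must keep at least two samples after a removal so that the within-group variances are defined.

```python
import numpy as np

def delta_variance(beta, is_positive):
    """Hypothetical Δ-variance per probe: total variance minus mean within-group variance."""
    total = beta.var(axis=1, ddof=1)
    within = 0.5 * (beta[:, is_positive].var(axis=1, ddof=1)
                    + beta[:, ~is_positive].var(axis=1, ddof=1))
    return total - within

def positive_delta_counts_leave_one_out(beta, is_positive):
    """For each sample j, count the probes with positive Δ-variance after excluding j."""
    beta = np.asarray(beta, dtype=float)
    is_positive = np.asarray(is_positive, dtype=bool)
    n_samples = beta.shape[1]
    counts = np.empty(n_samples, dtype=int)
    for j in range(n_samples):
        keep = np.arange(n_samples) != j  # boolean mask that excludes sample j
        deltas = delta_variance(beta[:, keep], is_positive[keep])
        counts[j] = int(np.count_nonzero(deltas > 0))
    return counts
```

A sample whose exclusion raises the count well above the others, as HD_00J and I047_005 do in panel (B), is a candidate for a sample whose group label fits its methylation profile poorly.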
Fig 3. Unsupervised hierarchical clustering of samples and probes in the 41-sample set.
(A) The 246 probes identified as important non-correlated predictors, combined across the five cross-validation folds, and (B) the 4052 probes correlated with (and including) these 246 probes discriminate between BRCAness-positive and BRCAness-negative samples. BRCAness-positive and BRCAness-negative samples are indicated by brown and blue horizontal bars at the top, respectively. Note that for the final RF model, used for validation with the independent test set (n = 20), all 41 samples were used as a single training set; this set yielded 54 uncorrelated probes out of the 2000 probes.
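The split between the 246 non-correlated predictors in (A) and the 4052 correlated probes in (B) suggests a correlation filter over importance-ranked probes. The caption gives neither the procedure nor a threshold, so the following is only a sketch of one common approach: keep probes greedily in order of random forest importance, skip any probe whose absolute Pearson correlation with an already kept probe exceeds a hypothetical cutoff `r_max`, then collect every probe correlated with the kept set. Constant probes must be removed first, because their correlations are undefined.

```python
import numpy as np

def select_uncorrelated(beta, importance, r_max=0.8):
    """Greedily keep probes by decreasing importance, skipping highly correlated ones.

    beta: (n_probes, n_samples) methylation matrix without constant probes.
    importance: per-probe importance scores, e.g. from a random forest.
    r_max: hypothetical cutoff on the absolute Pearson correlation.
    """
    corr = np.corrcoef(np.asarray(beta, dtype=float))  # probe-by-probe correlations
    kept = []
    for i in np.argsort(-np.asarray(importance, dtype=float), kind="stable"):
        if all(abs(corr[i, k]) <= r_max for k in kept):
            kept.append(int(i))
    return kept

def correlated_probes(beta, kept, r_min=0.8):
    """Indices of probes with |r| >= r_min to at least one kept probe (kept probes included)."""
    corr = np.corrcoef(np.asarray(beta, dtype=float))
    return np.flatnonzero((np.abs(corr[:, kept]) >= r_min).any(axis=1))
```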
Fig 4. Dendrograms from hierarchical cluster analysis of (A) 20 RNA-Seq samples selected from the top and bottom of the OS sample list ranked by PGC score (two-thirds of the samples) and (B) the 17 remaining RNA-Seq samples used for differential expression (DE) analysis and GSEA. BRCAness-positive samples are shown in red and BRCAness-negative samples in cyan. Leaves representing excluded samples are marked with 'x'. The red and cyan boxes mark the two groups obtained by cutting the dendrogram at a single height; these groups were used in the subsequent DE analysis and GSEA. Merge heights were computed with Ward's method.
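Ward's method merges, at each step, the two clusters whose union increases the total within-cluster sum of squares the least; for clusters A and B that increase is |A||B|/(|A| + |B|) times the squared distance between their centroids. The naive sketch below applies this rule to the rows of a samples × genes matrix and stops when two groups remain, mirroring the two-group cut in the figure. In practice one would typically call `scipy.cluster.hierarchy.linkage(X, method="ward")` followed by `fcluster(Z, t=2, criterion="maxclust")`; the `ward_cut` name here is hypothetical.

```python
import numpy as np

def ward_cut(X, n_clusters=2):
    """Naive agglomerative clustering with Ward's criterion.

    X: (n_samples, n_features) matrix, e.g. expression profiles per sample.
    Returns an integer cluster label per sample once n_clusters groups remain.
    """
    X = np.asarray(X, dtype=float)
    clusters = {i: [i] for i in range(len(X))}  # cluster id -> member row indices
    while len(clusters) > n_clusters:
        ids = list(clusters)
        best = None
        for a_pos, a in enumerate(ids):
            for b in ids[a_pos + 1:]:
                na, nb = len(clusters[a]), len(clusters[b])
                diff = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                # Increase in within-cluster sum of squares if a and b were merged.
                cost = na * nb / (na + nb) * float(diff @ diff)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters.pop(b)
    labels = np.empty(len(X), dtype=int)
    for label, members in enumerate(clusters.values()):
        labels[members] = label
    return labels
```

Because Ward merge costs never decrease from one step to the next, stopping at two clusters gives the same partition as cutting the dendrogram at the height that yields two groups.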
GSEA enrichment results for KEGG gene sets.

| Direction | NAME | SIZE | ES | NES | NOM p-val | FDR q-val | FWER p-val |
|---|---|---|---|---|---|---|---|
| Positive enrichment score | KEGG_DNA_REPLICATION | 30 | 0.639 | 0.639 | 0.000 | 0.000 | 0.000 |
| | KEGG_MISMATCH_REPAIR | 22 | 0.490 | 0.490 | 0.000 | 0.001 | 0.003 |
| | KEGG_HOMOLOGOUS_RECOMBINATION | 26 | 0.482 | 0.482 | 0.000 | 0.001 | 0.004 |
| | KEGG_ONE_CARBON_POOL_BY_FOLATE | 17 | 0.427 | 0.427 | 0.001 | 0.004 | 0.023 |
| | KEGG_RNA_POLYMERASE | 28 | 0.426 | 0.426 | 0.000 | 0.003 | 0.023 |
| Negative enrichment score | KEGG_ASTHMA | 23 | 0.705 | -0.705 | 0.000 | 0.000 | 0.000 |
| | KEGG_GRAFT_VERSUS_HOST_DISEASE | 36 | 0.674 | -0.674 | 0.000 | 0.000 | 0.000 |
| | KEGG_ALLOGRAFT_REJECTION | 33 | 0.669 | -0.669 | 0.000 | 0.000 | 0.000 |
| | KEGG_TYPE_I_DIABETES_MELLITUS | 39 | 0.602 | -0.602 | 0.000 | 0.000 | 0.000 |
| | KEGG_AUTOIMMUNE_THYROID_DISEASE | 33 | 0.578 | -0.578 | 0.000 | 0.000 | 0.000 |
| | KEGG_HEMATOPOIETIC_CELL_LINEAGE | 80 | 0.549 | -0.549 | 0.000 | 0.000 | 0.000 |
| | KEGG_RENIN_ANGIOTENSIN_SYSTEM | 16 | 0.547 | -0.547 | 0.000 | 0.000 | 0.000 |
| | KEGG_INTESTINAL_IMMUNE_NETWORK_FO | 43 | 0.492 | -0.492 | 0.000 | 0.001 | 0.003 |
| | KEGG_COMPLEMENT_AND_COAGULATION_ | 61 | 0.487 | -0.487 | 0.000 | 0.001 | 0.004 |
| | KEGG_CELL_ADHESION_MOLECULES_CAMS | 122 | 0.468 | -0.468 | 0.000 | 0.001 | 0.005 |
| | KEGG_PRIMARY_IMMUNODEFICIENCY | 34 | 0.453 | -0.453 | 0.000 | 0.001 | 0.009 |
| | KEGG_LINOLEIC_ACID_METABOLISM | 22 | 0.428 | -0.428 | 0.001 | 0.002 | 0.021 |
| | KEGG_LEISHMANIA_INFECTION | 69 | 0.427 | -0.427 | 0.000 | 0.002 | 0.023 |
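The ES column is GSEA's enrichment score. Walking down the gene list ranked by differential expression, a running sum steps up at each gene in the set (weighted by the magnitude of its ranking statistic) and down at each gene outside it; ES is the maximum deviation of this running sum from zero. A minimal sketch of that running sum follows, assuming the weighted scheme with exponent p = 1. NES and the p-values additionally require permutation-based normalization, which is not shown, and the function name is illustrative.

```python
def enrichment_score(ranked_genes, ranking_stats, gene_set, p=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes: gene names sorted from most up- to most down-regulated.
    ranking_stats: the corresponding ranking statistics, in the same order.
    gene_set: set of gene names belonging to the pathway.
    p: weight exponent (assumed 1, i.e. the weighted scheme).
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must overlap the list without covering all of it")
    hit_weight = sum(abs(s) ** p for s, h in zip(ranking_stats, hits) if h)
    running, es = 0.0, 0.0
    for s, h in zip(ranking_stats, hits):
        if h:
            running += abs(s) ** p / hit_weight  # step up at a gene in the set
        else:
            running -= 1.0 / n_misses            # step down at a gene outside it
        if abs(running) > abs(es):
            es = running                         # keep the largest deviation from zero
    return es
```

A set concentrated at the top of the list gives an ES near +1, and one concentrated at the bottom gives an ES near -1.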
Test set confusion matrix (n = 20)ᵃ.

| Actual | Predicted: No | Predicted: Yes |
|---|---|---|
| No | 5 (83%) | 1 (17%) |
| Yes | 1 (7%) | 13 (93%) |

ᵃ Accuracy = (5 + 13)/20 × 100 = 90%; Sensitivity = 13/(13 + 1) × 100 = 93%; Specificity = 5/(5 + 1) × 100 = 83%.