Jun Kang, Ahwon Lee, Youn Soo Lee.
Abstract
Breast cancers with PIK3CA mutations can be treated with PIK3CA inhibitors in the hormone receptor-positive, HER2-negative subtype. We applied a supervised elastic net penalized logistic regression model to predict PIK3CA mutations from gene expression data, using the TCGA pan-cancer dataset for predictive modeling. Approximately 10,000 cases had both PIK3CA mutation and mRNA expression data available. In 10-fold cross-validation, the model with λ = 0.01 and α = 1.0 (ridge regression) showed the best performance in terms of the area under the receiver operating characteristic curve (AUROC). The final model was fitted with the selected hyper-parameters on the entire training set. The training set AUROC was 0.93, and the test set AUROC was 0.84. The area under the precision-recall curve (AUPR) was 0.66 for the training set and 0.39 for the test set. Cancer types were the most important predictors. Among the gene expression predictors, insulin-like growth factor 1 receptor (IGF1R) and phosphatase and tensin homolog (PTEN) were the most significant genes. Our study suggests that predicting genomic alterations from gene expression data is feasible, with good predictive performance.
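The workflow described in the abstract (elastic net penalized logistic regression, hyper-parameters chosen by 10-fold cross-validated AUROC, a final fit on the whole training set, then evaluation on a held-out test set) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the data are synthetic stand-ins for expression features, the solver is plain proximal gradient descent, the function names are hypothetical, and the grid is arbitrary apart from including λ = 0.01. The penalty is written in the common glmnet-style form λ(α‖w‖₁ + (1 − α)/2 ‖w‖₂²), where α = 1 is a pure L1 penalty and α = 0 a pure L2 penalty; the abstract's labeling of α may follow a different convention.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, X):
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


def fit(X, y, lam, alpha, lr=0.5, n_iter=60):
    """Proximal gradient descent on mean log-loss plus
    lam * (alpha * |w|_1 + (1 - alpha) / 2 * |w|_2^2)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        err = [pi - yi for pi, yi in zip(predict(w, b, X), y)]
        b -= lr * sum(err) / n  # the intercept is not penalized
        for j in range(p):
            grad = sum(e * row[j] for e, row in zip(err, X)) / n
            step = w[j] - lr * (grad + lam * (1 - alpha) * w[j])
            # Soft-thresholding is the proximal step for the L1 term.
            w[j] = math.copysign(max(abs(step) - lr * lam * alpha, 0.0), step)
    return w, b


def auroc(y, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k):
    """Deal negatives, then positives, round-robin into k folds."""
    folds = [[] for _ in range(k)]
    ordered = sorted(range(len(y)), key=lambda i: y[i])
    for position, i in enumerate(ordered):
        folds[position % k].append(i)
    return folds


def cv_auroc(X, y, lam, alpha, k=10):
    """Mean held-out AUROC over k stratified folds."""
    scores = []
    for fold in stratified_folds(y, k):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        w, b = fit([X[i] for i in train], [y[i] for i in train], lam, alpha)
        held_scores = predict(w, b, [X[i] for i in fold])
        scores.append(auroc([y[i] for i in fold], held_scores))
    return sum(scores) / k


# Synthetic data: two informative features and four noise features.
rng = random.Random(0)
n, p = 240, 6
true_w = [2.0, -1.5, 0.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1 if rng.random() < sigmoid(-2.0 + sum(c * x for c, x in zip(true_w, row)))
     else 0 for row in X]
X_train, y_train, X_test, y_test = X[:180], y[:180], X[180:], y[180:]

# Select (lam, alpha) by cross-validated AUROC, then refit on all training data.
grid = [(lam, alpha) for lam in (0.01, 0.1) for alpha in (0.0, 1.0)]
best_lam, best_alpha = max(grid, key=lambda g: cv_auroc(X_train, y_train, *g))
w, b = fit(X_train, y_train, best_lam, best_alpha)
train_auc = auroc(y_train, predict(w, b, X_train))
test_auc = auroc(y_test, predict(w, b, X_test))
print(f"lambda={best_lam}, alpha={best_alpha}")
print(f"training AUROC={train_auc:.2f}, test AUROC={test_auc:.2f}")
```

As in the abstract, the training AUROC is computed on the same cases used to fit the final model, so only the test AUROC reflects performance on unseen cases.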
Year: 2020 PMID: 33166334 PMCID: PMC7652327 DOI: 10.1371/journal.pone.0241514
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Prevalence rate of PIK3CA mutations across cancer types.
Cancer type abbreviations are explained in the S1 Appendix.
Fig 2. Summary of modeling results.
(A) Left: receiver operating characteristic (ROC) curves. Right: precision-recall (PR) curves of the training set and test set. The horizontal green line is the PIK3CA mutation rate (0.11). (B) Correlation between the training set and test set values of the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) across cancer types. The gray band is the 95% confidence interval. Abbreviations are explained in the S1 Appendix. (C) Correlations of the PIK3CA mutation rate with the AUROC and the AUPR.
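A horizontal line at the mutation rate is the usual chance-level reference for a PR curve: a classifier that ranks cases at random has an expected precision equal to the positive-class prevalence at every recall level, so its AUPR is close to that prevalence. The following is a minimal sketch on synthetic labels. The prevalence of 0.11 is taken from the caption; the sample size, seed, and the function name `average_precision` are illustrative assumptions, and AUPR is approximated here by the step-wise average-precision sum.

```python
import random


def average_precision(y_true, scores):
    """Mean of the precision values at the rank of each positive case."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits


rng = random.Random(0)
n, prevalence = 20000, 0.11
y = [1 if rng.random() < prevalence else 0 for _ in range(n)]
random_scores = [rng.random() for _ in range(n)]  # scores carry no signal

observed_rate = sum(y) / n
baseline_aupr = average_precision(y, random_scores)
print(f"observed positive rate={observed_rate:.3f}")
print(f"AUPR of random ranking={baseline_aupr:.3f}")
```

Against this reference, an AUPR is best judged by how far it sits above the mutation rate, not by its distance from 1.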
Fig 3. Coefficient model.
(A) The 30 highest mRNA coefficients. (B) Cancer type coefficients. Cancer type abbreviations are explained in the S1 Appendix.