Prediction of PIK3CA mutations from cancer gene expression data.

Abstract

Breast cancers with PIK3CA mutations can be treated with PIK3CA inhibitors in hormone receptor-positive, HER2-negative subtypes. We applied a supervised elastic net penalized logistic regression model to predict PIK3CA mutations from gene expression data, using the TCGA pan-cancer dataset. Approximately 10,000 cases were available with both PIK3CA mutation and mRNA expression data. In 10-fold cross-validation, the model with λ = 0.01 and α = 1.0 (ridge regression) showed the best performance in terms of area under the receiver operating characteristic curve (AUROC). The final model was developed with the selected hyper-parameters using the entire training set. The training set AUROC was 0.93, and the test set AUROC was 0.84. The area under the precision-recall curve (AUPR) was 0.66 for the training set and 0.39 for the test set. Cancer types were the most important predictors. Insulin-like growth factor 1 receptor (IGF1R) and phosphatase and tensin homolog (PTEN) were the most significant gene expression predictors. Our study suggests that predicting genomic alterations from gene expression data is possible, with good outcomes.

Year: 2020 PMID: 33166334 PMCID: PMC7652327 DOI: 10.1371/journal.pone.0241514

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Targeted therapy has become a standard treatment for many cancer patients. However, the approach requires a test for a specific cancer genomic alteration before patients can be treated. Several direct genomic alteration tests have been developed and have proven their clinical utility [1, 2]. Machine learning approaches can also be applied to detect genomic alterations. Machine learning algorithms can build prediction models from a large number of predictors, such as radiomic features [3], pathology images [4] or gene expression data [5]. Because most direct genomic tests are more specific and sensitive than predictive models, machine learning approaches may have limited roles in clinical practice. They are nevertheless well suited to situations where direct tests are unavailable or fail. RAS pathway activation has been predicted from gene expression data [5]. The authors used data from The Cancer Genome Atlas (TCGA) with a supervised elastic net penalized logistic regression classifier trained by stochastic gradient descent. Their model achieved an area under the receiver operating characteristic (AUROC) curve of 0.84 and an area under the precision-recall (AUPR) curve of 0.63. Importantly, these authors suggested that their approach could be applied to other genomic alterations.

Breast cancers with PIK3CA mutations can be treated with PIK3CA inhibitors in hormone receptor-positive, HER2-negative subtypes [6]. The PIK3CA mutation is the second most common driver mutation after TP53, and is most frequently detected in endometrial carcinoma (45%), followed by breast invasive carcinoma (24%), cervical squamous cell carcinoma and endocervical adenocarcinoma (20%), and colon adenocarcinoma (16%) [7]. PIK3CA encodes the p110α catalytic subunit of phosphatidylinositol 3′-kinase (PI3K). PI3K is a lipid kinase that phosphorylates phosphatidylinositol 4,5-bisphosphate (PIP2) to generate phosphatidylinositol 3,4,5-trisphosphate (PIP3). The phosphatase and tensin homolog (PTEN) opposes PI3K by converting PIP3 back to PIP2. PIP3 is a second messenger that activates protein kinase B (AKT), a serine/threonine-specific protein kinase. AKT inhibits apoptosis and promotes cell proliferation [8].

We applied a supervised elastic net penalized logistic regression model to predict PIK3CA mutations. We wanted to ascertain whether this prediction modeling approach could be applied not only to RAS pathway activation but also to PIK3CA mutations. The purpose of this study is to investigate the PIK3CA mutation prediction performance of machine learning models.

Materials and methods

Dataset

We used the TCGA pan-cancer dataset. TCGA archives exome sequencing, gene expression, DNA methylation, protein expression, and clinical data from more than 10,000 cancer samples across 33 common cancer types. The TCGA dataset is publicly available. PIK3CA mutation data were extracted using the cgdsr R package [9]. Gene expression data were downloaded from the National Cancer Institute (NCI)'s Genomic Data Commons (GDC) website, which archives TCGA data (https://gdc.cancer.gov/about-data/publications/pancanatlas). Gene expression in the TCGA pan-cancer dataset is batch-corrected and normalized. The target variable was PIK3CA mutation status. PIK3CA status was considered positive when the case had any of the following PIK3CA variants: C420R, E542K, E545A, E545D, E545G, E545K, Q546E, Q546R, H1047L, H1047R, or H1047Y. These are the target variants of the Therascreen PIK3CA RGQ PCR Kit (Qiagen, Hilden, Germany). The United States Food and Drug Administration approved this kit as a companion diagnostic test for treatment with a PIK3CA inhibitor.
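The labeling rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text: the function name, the input format (a list of amino-acid change strings per case), and the handling of a "p." prefix are assumptions, not the authors' cgdsr-based code.

```python
# Target variants listed in the text (Therascreen PIK3CA RGQ PCR Kit targets).
PIK3CA_TARGET_VARIANTS = frozenset({
    "C420R", "E542K", "E545A", "E545D", "E545G", "E545K",
    "Q546E", "Q546R", "H1047L", "H1047R", "H1047Y",
})


def pik3ca_status(protein_changes):
    """Return 1 if any PIK3CA protein change of a case is a target variant, else 0.

    `protein_changes` is a hypothetical input format: an iterable of
    amino-acid change strings for one case, e.g. ["p.H1047R"] or ["E545K"].
    """
    for change in protein_changes:
        normalized = change.strip().upper()
        # Accept both "p.H1047R" and "H1047R" spellings (assumed convention).
        if normalized.startswith("P."):
            normalized = normalized[2:]
        if normalized in PIK3CA_TARGET_VARIANTS:
            return 1
    return 0
```

Under this rule, a case carrying only a PIK3CA variant outside the listed set is labeled negative, as is a case with no PIK3CA variant at all.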

Modeling process

To narrow down potential predictors, genes with a large median absolute deviation (above the third quartile) were selected. Thirty-three cancer type dummy variables were included as predictor variables. We assigned three-quarters of the dataset to the training set and one-quarter to the test set. Yeo-Johnson transformation was performed to correct skewness. Centering and scaling were also performed. All preprocessing was performed using the recipes R package [10]. Penalized logistic regression was applied to prediction modeling. Ten-fold cross-validation with target variable stratification was performed over the hyper-parameter grid λ ∈ {10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 10⁰} and α ∈ {0.0, 0.25, 0.5, 0.75, 1.0}. Lambda (λ) is a penalty scaling parameter and alpha (α) is a mixing parameter of the penalty function [11].
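To make the roles of λ and α concrete, the sketch below enumerates the hyper-parameter grid and evaluates an elastic net penalty term. It assumes the glmnet-style parameterization, λ[(1 − α)/2 · ‖β‖₂² + α · ‖β‖₁], in which α = 0 gives the ridge (L2) penalty and α = 1 gives the lasso (L1) penalty. The source text does not show its own penalty formula, so its convention may differ, and the function name is hypothetical.

```python
from itertools import product


def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty in the glmnet-style parameterization (an assumption):
    lam * ((1 - alpha) / 2 * ||beta||_2^2 + alpha * ||beta||_1).
    """
    l1_norm = sum(abs(b) for b in beta)
    l2_norm_squared = sum(b * b for b in beta)
    return lam * ((1.0 - alpha) * 0.5 * l2_norm_squared + alpha * l1_norm)


# The grid described in the text: six lambda values times five alpha values.
LAMBDAS = [10.0 ** k for k in range(-5, 1)]   # 1e-5 ... 1e0
ALPHAS = [0.0, 0.25, 0.5, 0.75, 1.0]
GRID = list(product(LAMBDAS, ALPHAS))         # 30 candidate (lambda, alpha) pairs
```

Each of the 30 candidate pairs would be scored by 10-fold cross-validation, and the pair with the best mean AUROC kept for the final fit.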

Assessing model performance

Model performance was evaluated using the AUROC and the AUPR. The AUPR is more informative than the AUROC for imbalanced datasets [12]. Modeling and performance assessment were performed with the tidymodels R package [13].
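As a rough illustration of the two metrics, the following pure-Python sketch computes the AUROC as the probability that a randomly chosen positive case scores higher than a randomly chosen negative case, and approximates the AUPR by average precision. These are simplified estimators chosen for readability, not the tidymodels implementations, and the function names are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def average_precision(labels, scores):
    """Step-wise approximation of the area under the precision-recall curve.

    Precision is averaged over the ranks at which a positive case is found.
    Tied scores are not treated specially in this sketch.
    """
    n_positive = sum(labels)
    if n_positive == 0:
        raise ValueError("need at least one positive case")
    order = sorted(range(len(labels)), key=lambda i: scores[i], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / n_positive
```

For a classifier that ranks cases at random, the AUROC is expected to be about 0.5 regardless of class balance, whereas the AUPR is expected to be close to the prevalence of the positive class. This is why the AUPR is read against a prevalence baseline in imbalanced data.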

Results

Dataset summary

A total of 10,845 cases were available with both PIK3CA mutation and mRNA expression data. Of 20,502 genes, 5,128 were included in the modeling process after filtering by median absolute deviation, as described in the modeling process method. The overall prevalence of PIK3CA mutation was 0.11. The prevalence varied across cancer types, with a median of 0.03 (range 0–0.33) (Fig 1).
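The median absolute deviation filter can be sketched as follows. Keeping genes whose MAD exceeds the third quartile of all per-gene MADs retains roughly one quarter of the genes, which is in line with 5,128 of 20,502. The data layout (a dictionary mapping gene names to expression values) and the quartile method are assumptions made for illustration; the authors' R code may compute the quartile differently.

```python
from statistics import median, quantiles


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median([abs(v - center) for v in values])


def select_high_mad_genes(expression):
    """Keep genes whose MAD is above the third quartile of all per-gene MADs.

    `expression` is a hypothetical layout: dict of gene name -> list of
    expression values across samples.
    """
    mads = {gene: mad(values) for gene, values in expression.items()}
    third_quartile = quantiles(list(mads.values()), n=4)[2]
    return [gene for gene, value in mads.items() if value > third_quartile]
```

R's `mad()` multiplies by a constant (1.4826) by default. That scaling changes the MAD values but not their ranking, so the set of selected genes is the same either way.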

Fig 1

Prevalence rate of PIK3CA mutations across cancer types.

Cancer type abbreviations are explained in the S1 Appendix.

Selecting model and performance estimation

For 10-fold cross-validation, the model with λ = 0.01 and α = 1.0 (ridge regression) showed the best performance in terms of AUROC (S1 Fig). The final model was trained with the selected hyper-parameters with the entire training set. The training set AUROC was 0.93 and the test set AUROC was 0.84. The AUPR of the training set was 0.66 and the test set AUPR was 0.39 (Fig 2A).

Fig 2

Summary of modeling results.

(A) Left: receiver operating characteristic (ROC) curve. Right: precision-recall (PR) curve of the training set and test set. The horizontal green line is the PIK3CA mutation rate (0.11). (B) Correlation between the training set and test set for the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPR) among cancer types. The gray band is the 95% confidence interval. Abbreviations are explained in the S1 Appendix. (C) Correlations of the PIK3CA mutation rate with the AUROC and the AUPR.

Performance of each cancer type

Because PIK3CA mutation prevalence varied across cancer types, the performance for each cancer type was investigated. In this sub-analysis, the AUROC and AUPR were positively correlated between the training set and test set (Fig 2B). The AUPR was high in cancer types with high PIK3CA mutation rates, such as colon, breast, and uterine cancers. The AUROC did not correlate with the PIK3CA mutation rate of each cancer type (Fig 2C).

Important predictors

The top 30 important predictors are shown (Fig 3). The coefficient is the parameter of a predictor and represents the effect of that predictor on the prediction. Insulin-like growth factor 1 receptor (IGF1R) mRNA expression was the strongest negative predictor, and PTEN was the strongest positive predictor. Both IGF1R and PTEN are key players in the tyrosine kinase pathway [8, 14]. The cancer types were important predictors. The strongest predictors included the cancer types uterine carcinosarcoma (UCS), bladder urothelial carcinoma (BLCA), pancreatic adenocarcinoma (PAAD), and lymphoid neoplasm diffuse large B-cell lymphoma (DLBC).

Fig 3

Coefficient model.

(A) Top 30 high mRNA coefficients. (B) Cancer type coefficients. Cancer type abbreviations are explained in the S1 Appendix.

Discussion

Our model showed good performance in predicting PIK3CA mutations across cancer types. Our data suggest that the supervised elastic net penalized logistic regression model can be applied not only to RAS pathway activation but also to other genomic alterations. Both RAS pathway activation and PIK3CA mutations are key, common cancer genomic alterations. Because they exert a significant effect on gene expression in cancer cells, prediction from gene expression data can perform well. However, the model cannot be expected to generalize to genomic alterations that have only a weak effect on gene expression.

Prediction modeling from the TCGA pan-cancer dataset is limited by data preprocessing. The gene expression data are processed by between-sample normalization to remove batch effects. If a model has been trained on data normalized in this way, a new sample cannot be processed with exactly the normalization that was applied to the training set. A processing method that is independent of the dataset is therefore needed before gene expression data from new samples can be applied to the prediction model.

Our PIK3CA prediction model was similar to the RAS activation prediction model in terms of AUROC (0.84). However, the AUPR of our model was lower than that of the RAS activation model (0.39 versus 0.63). The lower AUPR may be explained by an imbalanced dataset with a low prevalence of PIK3CA mutations [5]. The RAS activation model was trained only on cancer types with a RAS activation prevalence above 0.05, to avoid the imbalanced classification problem, whereas we included all cancer types in our modeling process. The lower prevalence of the target variable meant that our dataset had a lower AUPR baseline. In the sub-analysis by cancer type, the cancer types with higher PIK3CA mutation rates showed better AUPRs.

Our model included cancer types as predictors, and they were stronger predictors than gene expression. The varying prevalence of PIK3CA mutations across cancer types may explain the strong predictive power of cancer types. Some significant gene expression predictors were closely related to the PTEN-PI3K pathway. IGF1R and PTEN were the strongest gene expression predictors, with negative and positive coefficients, respectively. IGF1R is a tyrosine kinase receptor that activates PI3K [14], and PTEN regulates PIP3 by dephosphorylating it [8].

Several studies have attempted to predict genomic alterations from gene expression data [15, 16]. One study investigated PIK3CA mutation prediction using a gene expression signature, defined as a sum of the average of the logarithmic gene expression. That model showed good performance, with an AUROC of 0.71 in an independent test set [15, 16]. Another study predicted copy number alterations from gene expression using a multinomial logistic regression model with least absolute shrinkage and selection operator (LASSO) penalization [17]. The prediction of the 1p/19q codeletion was very good, with an AUROC of 0.997, and gene-level predictions were good, with an AUROC of 0.75 [17]. A logistic regression model has also been used to predict MYCN Proto-Oncogene, BHLH Transcription Factor (MYCN) gene amplification in neuroblastoma [18].

The clinical utility of PIK3CA mutation prediction from mRNA expression is unclear because most direct genomic tests are more specific and sensitive than predictive models. Our prediction model is not immediately applicable to a cancer patient for the detection of PIK3CA mutations. It is not yet known how such a model will be used, but establishing how well mutations can be predicted from gene expression data could help advance machine learning toward being useful in patient treatment. Our study suggests that the prediction of genomic alterations from gene expression data is possible, with good performance. However, improved performance is required for clinical tests, and the generation and processing of gene expression data also need to be standardized.
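The preprocessing limitation discussed above can be illustrated with a small sketch. Steps such as centering and scaling can be "frozen": their parameters are estimated once on the training set and then reused unchanged for any new sample. The class below shows this for centering and scaling only. It is an assumption-laden illustration with hypothetical names, and it does not address the upstream between-sample batch correction of the TCGA data, which is the part that cannot be reproduced for a single new sample.

```python
from statistics import mean, stdev


class FrozenScaler:
    """Center and scale genes using parameters learned on the training set only."""

    def fit(self, train_columns):
        """`train_columns`: dict of gene name -> list of training-set values
        (hypothetical layout; at least two values per gene)."""
        self.params = {}
        for gene, values in train_columns.items():
            spread = stdev(values)
            # Guard against constant genes to avoid division by zero.
            self.params[gene] = (mean(values), spread if spread > 0 else 1.0)
        return self

    def transform_sample(self, sample):
        """Apply the stored training-set parameters to one new sample
        (dict of gene name -> value); nothing is re-estimated here."""
        return {
            gene: (sample[gene] - center) / spread
            for gene, (center, spread) in self.params.items()
        }
```

The design point is that `transform_sample` never looks at other new samples, so a single patient sample can be processed on its own. A dataset-independent expression pipeline would need the same property at every step, including normalization.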

S1 Appendix. Abbreviations of cancer types. (PDF)

Hyperparameter tuning and performance assessment in 10-fold cross-validation resampling.

The x-axis is the penalty scaling parameter λ ∈ {10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹, 10⁰}; color is the mixing hyperparameter of the penalty function, α ∈ {0.0, 0.25, 0.5, 0.75, 1.0}. The y-axis is the estimated area under the receiver operating characteristic curve (AUROC) from 10-fold cross-validation resampling. (TIF)

Peer review history

Decision letter, 26 Aug 2020 (PONE-D-20-22669)

Dear Dr. Lee, Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE's publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please respond to the comments item-by-item to satisfy the reviewers' concerns. Please carefully edit the MS for English corrections and typos. Please submit your revised manuscript by Oct 10 2020 11:59PM. Kind regards, Nandini Dey, MS., Ph.D, Academic Editor, PLOS ONE

Reviewers' responses to questions

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Partly. Reviewer #2: Yes.
2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: I Don't Know. Reviewer #2: Yes.
3. Have the authors made all data underlying the findings in their manuscript fully available? Reviewer #1: Yes. Reviewer #2: Yes.
4. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: Yes. Reviewer #2: Yes.
5. Review comments to the author

Reviewer #1: This is a regulated regression analysis to find out the most significant variable(s) and that is kept in the final model. This is a very concise article and lack of detail methodology. It has been known for quite some times that PIK3CA GOF mutation is very common (in fact next to TP53) in solid tumors. It is also known that PIK3CA is very much related to PTEN and IGF1R signaling. The finding is not new and the rationale for this article is not very clear. More importantly, this type of article is not suitable for PLOS ONE audience. Authors may consider to some bio-informatics or bio-statistics journal.

Reviewer #2: The authors present a succinct study on the prediction of PIK3CA mutations from gene expression data. This study applies an elastic net penalized logistic regression classifier to the cancer genome atlas (TCGA) pan-cancer gene expression dataset, a method that was previously established for detecting RAS pathway activation. The methods used and the results presented in the figures appear to be appropriate for the work performed. Both the AUROC and AUPRC demonstrate predictive performance well above baseline. Limitations of the approach used were also appropriately discussed. It may be questionable why PIK3CA mutation prediction from mRNA expression is useful when targeted sequencing panels can assay these mutations directly, but it has been proposed elsewhere that clinical transcriptomics may add important functional or phenotypic information. Overall this work demonstrates that machine learning approaches can predict PIK3CA mutation status from gene expression data with a reasonably good level of performance.

6. Do you want your identity to be public for this peer review? Reviewer #1: No. Reviewer #2: No.

Author response, 9 Sep 2020

Reviewer #1, Comment 1: This is a regulated regression analysis to find out the most significant variable(s) and that is kept in the final model. This is a very concise article and lack of detail methodology.

Response: Thank you for your constructive feedback considering the lack of detailed methodology. TCGA is a widely used public data of cancer genomics. The detail of the TCGA pan-cancer data is described in the reference. For the method of prediction modeling, we tried to follow guidelines for developing and reporting machine learning predictive models in biomedical research. We add a supplementary figure to help understand hyperparameter tuning. Luo, Wei, Dinh Phung, Truyen Tran, Sunil Gupta, Santu Rana, Chandan Karmakar, Alistair Shilton, et al. "Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research: A Multidisciplinary View." Journal of Medical Internet Research 18, no. 12 (2016): e323. https://doi.org/10.2196/jmir.5870.

Reviewer #1, Comment 2: It has been known for quite some times that PIK3CA GOF mutation is very common (in fact next to TP53) in solid tumors. It is also known that PIK3CA is very much related to PTEN and IGF1R signaling. The finding is not new and the rationale for this article is not very clear.

Response: Thank you for your opinion. Our study aims to build the PIK3CA mutation prediction model not to search for important variables. Because the penalized logistic regression model is highly interpretable, we were able to find significant variables like IGF1R and PTEN. But this findings of significant variables is not the primary purpose of this study. The purpose of this study is to investigate the PIK3CA mutation prediction performance of machine learning models. The purpose of the study is further described in the manuscript.

Reviewer #1, Comment 3: More importantly, this type of article is not suitable for PLOS ONE audience. Authors may consider to some bio-informatics or bio-statistics journal.

Response: Thank you for your suggestion. Since machine learning modeling is complex and has begun to be widely used relatively recently, the audience may lack an understanding of detailed methods. However, our study used a widely used data set (TCGA) and modeling framework (R tidymodels package). We believe our research will benefit audiences interested in applying machine learning to patient care. We also believe that publishers targeting a broad audience are publishing predictive model studies using machine learning.

Reviewer #2, Comment 1: The authors present a succinct study on the prediction of PIK3CA mutations from gene expression data. This study applies an elastic net penalized logistic regression classifier to the cancer genome atlas (TCGA) pan-cancer gene expression dataset, a method that was previously established for detecting RAS pathway activation. The methods used and the results presented in the figures appear to be appropriate for the work performed. Both the AUROC and AUPRC demonstrate predictive performance well above baseline. Limitations of the approach used were also appropriately discussed.

Response: Thank you for your opinion.

Reviewer #2, Comment 2: It may be questionable why PIK3CA mutation prediction from mRNA expression is useful when targeted sequencing panels can assay these mutations directly, but it has been proposed elsewhere that clinical transcriptomics may add important functional or phenotypic information.

Response: As you pointed out, the clinical utility of PIK3CA mutation prediction from mRNA expression is unclear because most direct genomic tests are more specific and sensitive than predictive models. Our prediction model is not an application that is immediately applicable to a cancer patient for the detection of PIK3CA mutation. It is not known how it will be used, but finding out the mutation prediction performance using gene expression data could play a role in advancing machine learning to be helpful in patient treatment. We discussed further the limitations of this study in the manuscript.

Reviewer #2, Comment 3: Overall this work demonstrates that machine learning approaches can predict PIK3CA mutation status from gene expression data with a reasonably good level of performance.

Response: Thank you for your opinion.

Submitted filename: Response-to-Reviewers.pdf

Decision letter, 16 Oct 2020 (PONE-D-20-22669R1)

Dear Dr. Lee, We're pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements. Kind regards, Nandini Dey, MS., Ph.D, Academic Editor, PLOS ONE

Reviewers' responses to questions

1. Have the authors adequately addressed the comments raised in a previous round of review? Reviewer #1: All comments have been addressed. Reviewer #2: All comments have been addressed.
2. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: No. Reviewer #2: Yes.
3. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: N/A. Reviewer #2: Yes.
4. Have the authors made all data underlying the findings in their manuscript fully available? Reviewer #1: Yes. Reviewer #2: Yes.
5. Is the manuscript presented in an intelligible fashion and written in standard English? Reviewer #1: Yes. Reviewer #2: Yes.
6. Review comments to the author

Reviewer #1: As a reviewer I am not satisfied the overall approach of the MS and I am also not satisfied after the revision too.

Reviewer #2: In my original review I questioned the utility of PIK3CA mutation prediction from mRNA data. The authors have added text to the discussion section that has addressed this comment appropriately.

7. Do you want your identity to be public for this peer review? Reviewer #1: No. Reviewer #2: No.

Acceptance letter, 23 Oct 2020 (PONE-D-20-22669R1)

Dear Dr. Lee: I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department. Kind regards, PLOS ONE Editorial Office Staff on behalf of Dr. Nandini Dey, Academic Editor, PLOS ONE
