Libo Yang (1), Bo Fu (2), Yan Li (2), Yueping Liu (3), Wenting Huang (4), Sha Feng (5), Lin Xiao (6), Linyong Sun (7), Ling Deng (6), Xinyi Zheng (6), Feng Ye (8), Hong Bu (1).
1. Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China; Key Laboratory of Transplant Engineering and Immunology, Ministry of Health, West China Hospital, Sichuan University, Chengdu, China; Department of Pathology, West China Hospital, Sichuan University, Chengdu, China.
2. Big Data Research Center, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China.
3. Department of Pathology, the Fourth Hospital of Hebei Medical University, Shijiazhuang, China.
4. National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Beijing 100021, China.
5. National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Science and Peking Union Medical College, Shenzhen 518116, China.
6. Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China; Key Laboratory of Transplant Engineering and Immunology, Ministry of Health, West China Hospital, Sichuan University, Chengdu, China.
7. Department of Pathology, West China Hospital, Sichuan University, Chengdu, China.
8. Laboratory of Pathology, West China Hospital, Sichuan University, Chengdu, China; Key Laboratory of Transplant Engineering and Immunology, Ministry of Health, West China Hospital, Sichuan University, Chengdu, China. Electronic address: fengye@scu.edu.cn.
Abstract
BACKGROUND AND OBJECTIVE: Chemotherapy benefits many breast cancer patients, but some patients do not respond to it. Pathologic complete response (pCR) is an indicator of good response to neoadjuvant chemotherapy (NAC). In this study, we aimed to develop a method to predict pCR before NAC. METHODS: We retrospectively collected 287 stage II-III breast cancer cases and assigned each either to a training set (N = 197) or to a test set (N = 90). Fourteen candidate genes were selected from four public microarray data sets. After comparing candidate algorithms and selecting the best-performing one, we built a prediction model from the expression of these fourteen candidate genes and three reference genes, measured by TaqMan probe-based quantitative polymerase chain reaction. RESULTS: The Naive Bayes algorithm had a higher predictive value than the random forest, support vector machine (SVM), and k-nearest neighbor (kNN) algorithms (P < 0.05). This 17-gene prediction model was strongly and positively associated with pCR (odds ratio, 8.914; 95% confidence interval, 4.430-17.934; P < 0.001). Using this model, the enrolled patients were classified into sensitive (SE) and insensitive (INS) groups. The pCR rates differed significantly between the SE and INS groups (42.3% vs. 7.6%, P < 0.001). The sensitivity and specificity of this prediction model were 84.5% and 62.0%, respectively. CONCLUSIONS: As an alternative to whole transcriptome-based technologies, expression of a panel of tens of essential genes, implemented in a machine learning model, has predictive potential for chemosensitivity in breast cancers.
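
The reported figures can be cross-checked against each other. For a two-group classification, the odds ratio equals the odds of sensitivity times the odds of specificity, and it also equals the ratio of the pCR odds in the two predicted groups. The sketch below assumes that the abstract's odds ratio is this 2x2-table odds ratio; the abstract does not state how it was estimated, and the inputs are the rounded percentages given above, so only approximate agreement should be expected. The function names are illustrative and do not come from the study.

```python
def odds(p: float) -> float:
    """Convert a probability strictly between 0 and 1 into odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def odds_ratio_from_sens_spec(sensitivity: float, specificity: float) -> float:
    """Odds ratio of a 2x2 table expressed through sensitivity and specificity."""
    return odds(sensitivity) * odds(specificity)


def odds_ratio_from_group_rates(rate_in_se: float, rate_in_ins: float) -> float:
    """Odds ratio from the pCR rate in the SE group versus the INS group."""
    return odds(rate_in_se) / odds(rate_in_ins)


# Rounded values taken from the abstract.
REPORTED_OR = 8.914
or_sens_spec = odds_ratio_from_sens_spec(0.845, 0.620)
or_group_rates = odds_ratio_from_group_rates(0.423, 0.076)

print(f"reported odds ratio:           {REPORTED_OR:.3f}")
print(f"from sensitivity/specificity:  {or_sens_spec:.3f}")
print(f"from SE vs. INS pCR rates:     {or_group_rates:.3f}")
```

Both derived values come out close to 8.9, which suggests the reported sensitivity, specificity, group pCR rates, and odds ratio describe the same underlying classification table; the small differences are consistent with rounding of the published percentages.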