Ruixi Li (1,2,3), Bo Liao (1,2,3), Bo Wang (4,5), Chan Dai (4,5), Xin Liang (1,2,3), Geng Tian (4,5), Fangxiang Wu (1,2,3,6)
1. School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.
2. Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China.
3. Key Laboratory of Data Science and Intelligence Education (Hainan Normal University), Ministry of Education, Haikou 571158, China.
4. Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao 266000, China.
5. Geneis (Beijing) Co., Ltd., Beijing 100102, China.
6. Division of Biomedical Engineering, Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada.
Abstract
BACKGROUND: Cancer of unknown primary (CUP) is a malignant tumor that is histologically diagnosed as metastatic carcinoma but whose tissue of origin cannot be identified. CUP accounts for roughly 5% of all cancers. CUP is traditionally treated with broad-spectrum chemotherapy, yet the prognosis remains relatively poor. Accurately inferring the tissue of origin of CUP is therefore of clinical importance. METHODS: We developed a gradient boosting framework to trace the tissue of origin of 20 types of solid tumors. Specifically, we downloaded expression profiles of 20,501 genes for 7713 samples from The Cancer Genome Atlas (TCGA) and used them as the training data set. As an independent data set, we also downloaded RNA-seq data for 79 tumor samples of known origin, spanning 6 cancer types, from the Gene Expression Omnibus (GEO). RESULTS: A total of 400 genes were selected to train a gradient boosting model that identifies the primary site of a tumor. The overall 10-fold cross-validation accuracy of our method was 96.1% across the 20 cancer types, and accuracy on the independent data set reached 83.5%. CONCLUSION: Our gradient boosting framework accurately identified tumor tissue of origin on both the training data and the independent test data, and may therefore be of practical use.
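The METHODS and RESULTS describe a two-stage pipeline: select a subset of genes, then fit a multiclass gradient boosting classifier and score it by 10-fold cross-validation. The sketch below illustrates that shape with scikit-learn under stated assumptions. It ranks genes with a one-way ANOVA F-test (`f_classif`), which is an assumption since the abstract does not say how the 400 genes were chosen, and it uses scikit-learn's `GradientBoostingClassifier`, which is not necessarily the implementation or configuration the authors used. The function names are hypothetical, and the demo data are synthetic, not TCGA expression profiles.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline


def build_tissue_of_origin_model(n_genes=400, n_estimators=100, random_state=0):
    """Hypothetical helper: gene selection followed by gradient boosting.

    ASSUMPTION: genes are ranked by a one-way ANOVA F-test (f_classif);
    the abstract does not state the selection criterion.
    """
    return Pipeline([
        # Keep the n_genes genes with the highest F-statistic across tumor types.
        ("select_genes", SelectKBest(score_func=f_classif, k=n_genes)),
        # Multiclass gradient boosting (one regression tree per class per stage).
        ("gbm", GradientBoostingClassifier(n_estimators=n_estimators,
                                           random_state=random_state)),
    ])


def cross_validated_accuracy(X, y, n_genes=400, n_folds=10,
                             n_estimators=100, random_state=0):
    """Hypothetical helper: mean stratified k-fold accuracy of the pipeline.

    Because selection lives inside the Pipeline, it is refit on each
    training fold and never sees the held-out samples.
    """
    model = build_tissue_of_origin_model(n_genes=n_genes,
                                         n_estimators=n_estimators,
                                         random_state=random_state)
    cv = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=random_state)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return float(np.mean(scores))


if __name__ == "__main__":
    # Synthetic stand-in for a samples x genes expression matrix with
    # tumor-type labels; NOT real TCGA or GEO data.
    X, y = make_classification(n_samples=300, n_features=1000, n_informative=40,
                               n_classes=5, n_clusters_per_class=1, random_state=0)
    acc = cross_validated_accuracy(X, y, n_genes=100, n_folds=5, n_estimators=50)
    print(f"mean CV accuracy: {acc:.3f}")
```

Putting gene selection inside the pipeline is the key design choice here. If genes were ranked on the full data set before cross-validation, information from the held-out folds would leak into feature selection and inflate the accuracy estimate.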