Literature DB >> 32509741

TOOme: A Novel Computational Framework to Infer Cancer Tissue-of-Origin by Integrating Both Gene Mutation and Expression.

Binsheng He¹, Jidong Lang², Bo Wang², Xiaojun Liu², Qingqing Lu², Jianjun He¹, Wei Gao³, Pingping Bing¹, Geng Tian², Jialiang Yang².

Abstract

Metastatic cancers require further diagnosis to determine their primary tumor sites. However, the tissue-of-origin for around 5% tumors could not be identified by routine medical diagnosis according to a statistics in the United States. With the development of machine learning techniques and the accumulation of big cancer data from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO), it is now feasible to predict cancer tissue-of-origin by computational tools. Metastatic tumor inherits characteristics from its tissue-of-origin, and both gene expression profile and somatic mutation have tissue specificity. Thus, we developed a computational framework to infer tumor tissue-of-origin by integrating both gene mutation and expression (TOOme). Specifically, we first perform feature selection on both gene expressions and mutations by a random forest method. The selected features are then used to build up a multi-label classification model to infer cancer tissue-of-origin. We adopt a few popular multiple-label classification methods, which are compared by the 10-fold cross validation process. We applied TOOme to the TCGA data containing 7,008 non-metastatic samples across 20 solid tumors. Seventy four genes by gene expression profile and six genes by gene mutation are selected by the random forest process, which can be divided into two categories: (1) cancer type specific genes and (2) those expressed or mutated in several cancers with different levels of expression or mutation rates. Function analysis indicates that the selected genes are significantly enriched in gland development, urogenital system development, hormone metabolic process, thyroid hormone generation prostate hormone generation and so on. According to the multiple-label classification method, random forest performs the best with a 10-fold cross-validation prediction accuracy of 96%. We also use the 19 metastatic samples from TCGA and 256 cancer samples downloaded from GEO as independent testing data, for which TOOme achieves a prediction accuracy of 89%. The cross-validation validation accuracy is better than those using gene expression (i.e., 95%) and gene mutation (53%) alone. In conclusion, TOOme provides a quick yet accurate alternative to traditional medical methods in inferring cancer tissue-of-origin. In addition, the methods combining somatic mutation and gene expressions outperform those using gene expression or mutation alone.

Entities: Chemical Disease Gene Mutation Species

Keywords: cross-validation; gene expression; random forest; somatic mutation; tissue-of-origin

Year: 2020 PMID： 32509741 PMCID： PMC7248358 DOI： 10.3389/fbioe.2020.00394

Source DB: PubMed Journal: Front Bioeng Biotechnol ISSN： 2296-4185

Introduction

Metastatic cancer is a common clinical challenge for limited evidence to determine its primary origin. Patients with carcinoma of unknown primary (CUP) account for about 5% of total cancer patients (Shaw et al., 2007). CUP are usually heterogeneous, and can lead to dilemmas in diagnosing and treatment since the original tumor site is unknown (Rizwan and Zulfiqar, 2010). Clinically, CUP patients are generally treated with non-selective empirical chemotherapy, which usually leads to low survival rates (Kurahashi et al., 2013). Thus, identifying cancer tissue-of-origin (TOO) is critical in improving the treatment of cancer patients and extending their surviving time (Hudis, 2007; Varadhachary et al., 2008; Hyphantis et al., 2013). There are several ancillary examinations in CUP identification, among which immunohistochemistry (IHC) is an important one. However, this method relies on the experiences of pathologists and is labor-intensive. As a result, it is inaccurate in most of the times (Huebner et al., 2007; Voigt, 2008; Centeno et al., 2010; Kandalaft and Gown, 2015; Janick et al., 2018). Positron emission tomography (PET) and computed tomography (CT) are also commonly used in the identification of CUP (Fencl et al., 2007; Kwee et al., 2010; Fu et al., 2019). The detection rate of conventional radiological imaging on primary carcinoma reach 20–27%, and that of PET reach 24–40% (Ambrosini et al., 2006). The detection accuracy of PET/CT is awfully low that it rarely brings help to identify the primary origin. Obstacles in image technology cause much difficulty of effective use of relative Carcinoma image to help tracing cancer tissue origin. Molecular profiling of tissue-specific genes is also being used in CUP work-up. Quantities of large-scale profiles of different tumors have been used for diagnose. Molecular profiling is as well as or better than IHC, in terms of poorly differentiated or undifferentiated tumors (Oien and Dennis, 2012). Therefore, making use of molecular profiling has become a popular way for diagnosis of unknown origin. Comprehensive molecular profiles displayed in The Cancer Genome Atlas (TCGA) including copy number variation, somatic mutation, gene expression, microRNA expression, DNA methylation, and protein expression, are used to identifying human tumor types (Li et al., 2017). By analysis of tumor types from data of methylation and copy number variation, tissue of origin and molecular classification can be revealed (Hoadley et al., 2014). The methylation profile of metastasis in a meningeal melanocytic tumor is similar to that of primary tumor, and it is suggest that particular copy number variations may be associated with metastatic behavior (Küsters-Vandevelde et al., 2017). Methylation and copy number variation are DNA-level molecular profiling, which brought great help to identify tumor origins. The copy number profile and gain or loss in specific chromosome regions have been researched by hybridization and cytogenetic-based methods (Baudis, 2007; Beroukhim et al., 2007). An IDH1 somatic mutation in genomic profiling was revealed to bring great benefit to the diagnosis of cholangiocarcinoma and trace the primary origin in a malignancy (Sheffield et al., 2016). Marquard et al. (2016) obtained classification accuracy of 69% and 85% on 6 and 10 primary sited with somatic mutation, respectively, based on PM and CN classifier (classifiers with both point mutations and copy number aberrations) with cross-validation. Mutation of tumor-specific enrichment in certain genes, has been utilized to infer tumor localization, and Dietlein and Eschner (2014) developed a tool with mutation spectra to infer cancer origins with a prediction specificity of 79% (Lawrence et al., 2014). As a DNA-level molecular profiling, SNP, that is somatic mutation, can be used as a very useful tool to infer the tissue of origins. A lot of RNA-level gene expression profile have been explored to identify the cancer tissue of origin (Erlander et al., 2004; Qu et al., 2007; Gross-Goupil et al., 2012; Greco, 2013; Hainsworth et al., 2013). Erlander et al. (2011) have demonstrated that the gene expression value of samples detected in metastatic tumor is similar to that in the original tumor under condition of CUP. Centeno et al. (2010) developed a hybrid model by integrating expression profiling and IHC for microRNA-based qRT-PCR test on identification of cancer tissue origin, with 85% of the cases correctly identified (Rosenwald et al., 2010). Bloom et al. (2004) utilized artificial neural networks (ANNs) to predict the unknown cancer tissue origin with mean accuracy of 83–88% in different platforms. Numerous researches have utilized molecular profiles, such as copy number variation, somatic mutation, gene expression, and so on for predicting cancer tissue origin. However, the accuracy of prediction was not satisfying. Identifying cancer tissue origin by combining somatic mutation and gene expression profiling on DNA level and RNA level, respectively, is first proposed in this study. Firstly, we obtained the data of somatic mutation and gene expression profiling from International Cancer Genome Consortium (ICGC) Database. Machine learning methods can help to improve the performance on prediction of cancer tissue origin. We aim to obtain better performance in predicting cancer tissue origin, by the combination of somatic mutation and gene expression profiling, based on random forest. Machine learning algorithm, such as logistic regression can be used to select gene (Kao et al., 2006). However, random forest algorithm (Sandri and Zuccolotto, 2006) was chosen as the gene selection algorithm in this study due to its advantage, good robustness and easy to use. Finally, we used random forest algorithm for classification of cancers. Experiment results showed that higher accuracy can be obtained by using the method proposed in this study.

Materials and Methods

Gene Expression Data

Gene expression profile was downloaded from ICGC Database version release-26[1]. Each gene is named by Gene Symbol ID. The value of gene expression in each labeled sample is normalized by TPM. After deduplication, samples were extracted for combination with SNP samples.

Somatic Mutation Data

The somatic mutation data was downloaded from ICGC Database version release-28[2]. Each gene is named by Ensembl Gene ID. For Gene Symbol ID is most widely used in paper, the Ensembl Gene ID of gene name in somatic mutation data was converted to Gene Symbol ID. The samples are deduplicated according to information of ICGC-donor-ID, chromosome, and locus in chromosome and gene-affected. Each sample was labeled by its type of cancer.

Data Combination

The gene expression and somatic mutation data were merged into one feature matrix. For labeled samples with gene expression array data only involves in 21 cancer types, and samples with Skin Cutaneous Melanoma (SKCM) were removed for it contributes to the major metastasis cancers. The sample with somatic mutation data whose label was not included in these 20 cancer types was removed. Then, the shared sample data was chosen, therefore the samples data after filtering is obtained from 20 different cancer types. An M∗N matrix was generated, where M and N represents the number of sample and gene, respectively.

Gene Selection

Because gene sequencing and mutation detection are costly and time consuming, a scale reduction of gene number is necessary. There are many feature selection algorithms, like Lasso, PCA (Malhi and Gao, 2005; Muthukrishnan and Rohini, 2016) and etc. The Random forest (Breiman, 2001; Sandri and Zuccolotto, 2006) was a supervised learning algorithm, which is an ensemble learning algorithm based on decision tree and was used to select genes. Best performance was obtained by using 80 selected genes. genes were used in a tree, where n represents the number of genes. At the process of splitting node, Gini index was used, which is calculated by formula: Where p represents the weight referring to frequencies of cancers in a node, k represents the number of cancers and p represents the weight of the kth cancer. The variable importance measures of ith gene in node m, that is the Gini index variation after splitting of node m, is calculated by formula: Where m is a node in M, which is a set of nodes, represents variable importance measures of ith gene in node m, the GI represents the Gini index before splitting, GI and GI represents the Gini index of two new node after splitting, respectively. The importance of the ith gene, in the tth tree is calculated by formula: Where represents the importance of the ith gene in the tth tree. If the set of trees is T, the importance of the ith gene in all the tree is calculated by formula: Where is the importance of the ith gene in all trees. We sorted the importance scores of all genes, then the top H genes were selected, where H is the variable number of genes that can be set to find the best result.

Multi-Classifier Random Forest

The random forest is actually a special method of bagging that using the decision tree as a model in bagging (Breiman, 2001; Meyer et al., 2019). First, the bootstrap method is used to generate m training sets, which is a set of samples. Then, each training set is used to construct a tree. genes are used in a tree, where n represents the number of selected genes. When splitting a node, not all the genes are used to optimize the metric Gini index used in this study, a part of genes is randomly extracted instead. An optimal solution can be found among the extracted genes, and applied to node splitting. Leaf node in the tree records which gene is used to determine the cancer type, and each leaf node represents the last judged cancer type. The predicted cancer type is given by maximum votes from decision tree.

Statistical Analysis

The metric of precision, recall and F1 score were used to evaluate the performance of the model. True-positive, false-positive, true-negative and false-negative are abbreviated as TP, FP, TN, and FN, respectively. Precision is calculated by (TP)/(TP + FP), which indicates the ability of classifier to differentiate positive from negative cases. Recall is calculated by (TP)/(TP + FN), which indicates the ability of classifier to recognize all positive cases. The F1 score is calculated by (2*recall*precision)/(recall + precision). Each individual cancer type is calculated by these metrics, and the cohort metric adopt the mean report. The entire cohort is calculated by accuracy, reported as (TP + TN)/(totalcases). Ten times 10-fold cross validation is used to obtain the metric report, whose average is treated as the result metric.

Gene Annotation

The functions annotation of specific gene set was given. Geno ontology (Ye et al., 2006; Waardenberg et al., 2016) was used as enrichment analysis database. Gene clustering and visualization was realized by R package cluaterProfiler and gogadget (Yu et al., 2012; Nota, 2016).

Results

The Workflow of TOOme

The complete workflow of prediction on cancer tissue origin is shown in Figure 1. The process can be split into three steps. At the first step, we download the raw data from ICGC Database, and extracted the effective information to obtain preliminary data of somatic mutation and gene expression profiling. At the second step, we filtered the data of somatic mutation and gene expression profiling, respectively. Then, samples with both somatic mutation data and gene expression proofing were used to form feature matrix. As a result, the generated feature matrix was used for gene selection. At the third step, most of the samples were utilized to train the model with 10-time 10 folds cross validation by using random forest classification algorithm. We carried out numerous experiments to evaluate the performance of the proposed method.

FIGURE 1

The complete workflow of prediction on cancer tissue origin.

Data Used in This Study

We used ICGC version 26 and 28 databases, with Gene expression profile and somatic mutation information to classify tumor samples. The allele mutation in somatic mutation data can be A/G, C/T, C/A, and etc. For it is hard to distinguish mutation types with limited relative information and tools, we consider all kinds of allele mutation as gene mutation and count the number of gene mutation of each sample. Different from somatic mutation data, gene expression profile array data is directly used. The sample distribution of each cancer is showed in Table 1, where samples suffer from BRCA are much more than from other cancers. Considerable prediction results can be obtained by our model. The precision, recall and F1 score, showed in Table 2, reach 99.86%, 99.47% and 99.67%, respectively.

TABLE 1

Sample distribution of each cancer from ICGC database.

Available cancer types	Abbreviation	Samples
		Amount	Percentage
Bladder urothelial carcinoma	BLCA	294	4.20%
Breast invasive carcinoma	BRCA	970	13.84%
Cervical squamous cell carcinoma and endocervical adenocarcinoma	CESC	241	3.44%
Colon adenocarcinoma	COAD	390	5.57%
Glioblastoma multiforme	GBM	148	2.11%
Head and neck squamous cell carcinoma	HNSC	460	6.56%
Kidney renal clear cell carcinoma	KIRC	345	4.92%
Kidney renal papillary cell carcinoma	KIRP	216	3.08%
Acute myeloid leukemia	LAML	121	1.73%
Brain lower grade glioma	LGG	433	6.18%
Liver hepatocellular carcinoma	LIHC	282	4.02%
Lung adenocarcinoma	LUAD	475	6.78%
Lung squamous cell carcinoma	LUSC	411	5.87%
Ovarian serous cystadenocarcinoma	OV	185	2.64%
Pancreatic adenocarcinoma	PAAD	134	1.91%
Prostate adenocarcinoma	PRAD	374	5.34%
Rectum adenocarcinoma	READ	137	1.95%
Stomach adenocarcinoma	STAD	412	5.88%
Thyroid carcinoma	THCA	486	6.93%
Uterine corpus endometrial carcinoma	UCEC	494	7.05%
Total		7008	100%

TABLE 2

Performance of classification of combination of somatic mutation and gene expression by using 80 genes.

Cancer type	Precision	Recall	F1-score	Support	Specificity
BLCA	0.8906	0.9354	0.9124	294.0000	0.9950
BRCA	0.9987	0.9947	0.9967	970.0000	0.9998
CESC	0.9148	0.8859	0.9001	241.0000	0.9971
COAD	0.7548	0.9644	0.8468	390.0000	0.9815
GBM	0.9940	1.0000	0.9970	148.0000	0.9999
HNSC	0.9916	1.0000	0.9958	460.0000	0.9994
KIRC	0.9850	0.9516	0.9680	345.0000	0.9992
KIRP	0.9344	0.9630	0.9485	216.0000	0.9979
LAML	1.0000	1.0000	1.0000	121.0000	1.0000
LGG	0.9926	0.9977	0.9952	433.0000	0.9995
LIHC	0.9925	0.9844	0.9884	282.0000	0.9997
LUAD	0.9358	0.9448	0.9403	475.0000	0.9953
LUSC	0.9408	0.9000	0.9199	411.0000	0.9965
OV	1.0000	0.9946	0.9973	185.0000	1.0000
PAAD	0.9378	0.9552	0.9464	134.0000	0.9988
PRAD	0.9973	1.0000	0.9987	374.0000	0.9998
READ	0.7569	0.1591	0.2627	137.0000	0.9990
STAD	0.9947	0.9976	0.9961	412.0000	0.9997
THCA	1.0000	0.9979	0.9990	486.0000	1.0000
UCEC	0.9673	0.9816	0.9744	494.0000	0.9975
Accuracy	0.9577	0.9577	0.9577	0.0000

Sample distribution of each cancer from ICGC database. Performance of classification of combination of somatic mutation and gene expression by using 80 genes. In this study, there are 371 samples with metastasis, where 352 samples are SKCM. To avoid unbalanced distribution of samples, we removed all the SKCM samples with metastasis. Only 19 samples with metastasis were used as test dataset.

Performance Evaluation

The classification accuracies obtained by using data of somatic mutation, gene expression profiling and both of them, under condition of using different number of genes, have been compared in Figure 2. Motivated by Ma et al. (2006) that five genes can be used to solve a 32-type classification problem, five was chosen as the minimum number of genes. For gene sequencing and mutation detection are costly and time consuming, 120 was chosen as the maximum number of genes. A lot of experiments have been done using the prepared data between the interval from 5 to 120. For using small number of genes did not obtain satisfying classification performance, the interval between number of genes was set to 10 or even larger until the number of genes equals to 50. Then the interval was set to 5 for fine tuning, based on small fluctuation by changed number of genes.

FIGURE 2

The classification accuracy of using somatic mutation, gene expression and combination of somatic mutation and gene expression, respectively.

The classification accuracy of using somatic mutation, gene expression and combination of somatic mutation and gene expression, respectively. Results with 10-time 10 folds cross validation on training dataset are shown in Figure 2 that accuracy of using data of both somatic mutation and gene expression profiling is always higher than that of only using one of it. The best result of them are 95.77%, 53.51%, and 89.28%, obtained by using 80, 120, and 105 genes, respectively. Results shows that gene expression can make much contribution to obtain higher accuracy than data of somatic mutation. However, a combination of them achieved best classification performance. As for the test dataset, we conducted experiments by using the chosen 80 genes in training model. The overall classification accuracy is 89.47%. Table 3 shows the prediction probabilities of each sample on each cancer. The value on the table highlighted by color of green, yellow, and pink presents high, middle, and low probabilities, respectively, of predicting a sample to a cancer type. We obtained considerable prediction accuracy on sample with BRCA and THCA. Each sample was correctly predicted to the same as the true label. A sample whose true label is CESC was predicted to UCEC. A sample whose true label is BRCA was predicted to LGG with a terrible probability 1.65%. In this condition, we considered that little error on classification is tolerable.

TABLE 3

Prediction probabilities of each samples on each cancer.

Mean Value of Gene Expression and Somatic Mutations on Each Cancer

We plotted the heatmap of mean value of gene expression and somatic mutations on each cancer. In Figure 3, the rows represent 74 genes of gene expression and columns denote the cancers. In Figure 4, the rows represent six genes of somatic mutation and columns represent the cancers. The mean value of gene expression and somatic mutation on a logarithmic scale was plotted with relative color. A color bar was used to display the value difference. Cancers that fell into cluster at horizontal axis had a similar value between gene expression or mutation number. The genes were also clustered at vertical axis based on the similarity between cancers.

FIGURE 3

Heatmap of mean value of gene expression on each cancer.

FIGURE 4

Heatmap of mean value of somatic mutations on each cancer.

Heatmap of mean value of gene expression on each cancer. Heatmap of mean value of somatic mutations on each cancer.

Discussion

Data of somatic mutation and gene expression profiling can be used to identify the primary site of tumors. However, it was the first time to identify the cancer tissue origin by using both data of somatic mutation and gene expression profiling. We carried out experiments by using 7008 samples with combination of data of somatic and gene expression profiling among 20 cancers. By comparing the performance of them, we obtained highest accuracy by leveraging both of the data of somatic mutation and gene expression profiling. The primary analysis tool we used was random forest (Breiman, 2001; Sandri and Zuccolotto, 2006), a machine learning algorithm that can be used for gene selection and tumor classification. We chose top-rank 80 genes, where 6 genes and 74 genes are corresponding to mutation and expression, respectively, for classification. Therefore, it showed that data of somatic mutation performs worse than gene expression profiling on prediction of cancer tissue origin. Our method obtained 96% overall accuracy on the training dataset. The performance is maintained considerably on the external cohorts, and the overall accuracy on sample with metastatic disease is 89%. Our model cannot provide good performance on physiologically proximal cancers, such as uterine corpus endometrial carcinoma and cervical squamous cell carcinoma and endocervical adenocarcinoma. The endometrial and ovarian endometrioid carcinomas evolve from similar precursor endometrial epithelial cells; many researches are involved in the molecular pathogenesis of the endometrial and ovarian endometrioid carcinomas (McConechy et al., 2014). We studied the role that gene plays in cellular component, biological process and molecular function. Figure 5 shows the top-rank 80 genes selected by random forest algorithm. The selected genes were enriched in hormone metabolic process, tissue and organ development and hormone-mediated signaling pathway, specifically in gland development, urogenital system development, hormone metabolic process, morphogenesis of a branching epithelium, morphogenesis of a branching structure, endocrine system development, branching morphogenesis of an epithelial tube, thyroid hormone metabolic process, thyroid hormone generation and prostate gland development. For example, APC plays a significant role in discovering pathogenesis of soft tissue tumors (Kuhnen et al., 2000). Birnbaum et al. (2012) investigated what role the APC gene play in colorectal cancer, at the investigation of 183 colon adenocarcinomas, point mutations were found in 73% of cases. We obtained the similar conclusion that mutation of APC gene may be the important impact of colorectal cancer, as heatmap shown in Figure 4 that the mean number of APC gene mutation in colorectal cancer is more than that in other cancers except rectum adenocarcinoma. It can be explained that they are two physiologically proximal cancers. Mutation in IDH1 gene can reduce cell survival, proliferation and invasion of human glioma (Cui et al., 2016). Mutation in IDH1 gene is an oncogenic driver in a majority of lower-grade gliomas and have an impact on brain lower grade glioma with different genetic pathway (Ohno et al., 2013; Pieper et al., 2014; Ohka et al., 2017). The same conclusion was acquired in Figure 4 that the mean number of IDH1 gene mutation in Brain lower grade glioma is more than that in other cancers.

FIGURE 5

Selected top-rank 80 genes enriched in cellular component, biological process and molecular function.

Selected top-rank 80 genes enriched in cellular component, biological process and molecular function. ACPP gene plays a vital key in prostate adenocarcinoma (Maatman et al., 1984; Drago et al., 1989; Vihko et al., 2005). From the heatmap, it is clear that the level of ACPP gene expression in prostate adenocarcinoma is higher than that in other cancers. The expression levels of TG were found to be altered in all kinds of thyroid carcinomas (Makhlouf et al., 2016). From Figure 3, we obtained similar results that the level of TG gene expression in thyroid carcinomas is higher than that in other cancers. Molecular profiling of tissue-specific genes can be utilized to identify the primary site of tumor. Combination of data of somatic mutation and gene expression profiling were first proposed in this study to predict the primary origin. We obtained considerable prediction performance, and therefore this research can bring great help to the identification of cancer tissue origin. However, we did not carry out research to discover the relationship between data of gene expression and somatic mutation. Our method cannot classify physiologically proximal cancers yet. And it is also a future work to employing other machine learning algorithms that can improve the classification performance.

Conclusion

Identification of cancer tissue origin is a challenging work recently and in the future. With a lot of molecular profiling available, we can make use of them alone and combine some of them to improve performance of identification primary site of tumor. Machine learning algorithm is also an effective tool to help classifying the cancers. The prediction performance can be tremendously affected by the number of features used. In this study, we used both molecular data of somatic mutation and gene expression profiling to generate a feature matrix. Then the optimal number of genes was obtained and the data was trained, based on random forest algorithm. The performance of using our method was also compared to only by using data of somatic mutation or gene expression profiling. Our method achieved highest accuracy. Experiment results shows that our method can be an effective tool for primary origin tracing.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://dcc.icgc.org/releases/release_26/, https://dcc.icgc.org/releases/release_28/.

Author Contributions

JY, GT, and PB conceived the concept of the work. BH, XL, BW, and JL performed the experiments. BH and XL wrote the manuscript. QL, WG, and JH reviewed the manuscript. All authors approved the final version of this manuscript.

Conflict of Interest

BW, JL, XL, QL, GT, and JY were employed by the company Geneis Beijing Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

26 in total

1. Evaluating the Microsatellite Instability of Colorectal Cancer Based on Multimodal Deep Learning Integrating Histopathological and Molecular Data.

Authors: Wenjing Qiu; Jiasheng Yang; Bing Wang; Min Yang; Geng Tian; Peizhen Wang; Jialiang Yang
Journal: Front Oncol Date: 2022-07-05 Impact factor: 5.738

2. Informative SNP Selection Based on a Fuzzy Clustering and Improved Binary Particle Swarm Optimization Algorithm.

Authors: Zejun Li; Li Ang; Wei Shi; Ning Xin; Min Chen; Hua Tang
Journal: Comput Math Methods Med Date: 2022-06-16 Impact factor: 2.809

3. A Framework to Predict the Molecular Classification and Prognosis of Breast Cancer Patients and Characterize the Landscape of Immune Cell Infiltration.

Authors: Kun Zheng; Zhiyong Luo; Yilu Zhou; Lili Zhang; Yali Wang; Xiuqiong Chen; Shuo Yao; Huihua Xiong; Xianglin Yuan; Yanmei Zou; Yihua Wang; Hua Xiong
Journal: Comput Math Methods Med Date: 2022-06-07 Impact factor: 2.809

Review 4. Application of Circulating Tumor DNA as a Biomarker for Non-Small Cell Lung Cancer.

Authors: Jialiang Yang; Yan Hui; Yanxiang Zhang; Minghui Zhang; Binbin Ji; Geng Tian; Yangqiang Guo; Min Tang; Lianxing Li; Bella Guo; Tonghui Ma
Journal: Front Oncol Date: 2021-08-05 Impact factor: 6.244

5. Mining mutation contexts across the cancer genome to map tumor site of origin.

Authors: Saptarshi Chakraborty; Axel Martin; Zoe Guan; Colin B Begg; Ronglai Shen
Journal: Nat Commun Date: 2021-05-24 Impact factor: 14.919

6. Gene-Based Testing of Interactions Using XGBoost in Genome-Wide Association Studies.

Authors: Yingjie Guo; Chenxi Wu; Zhian Yuan; Yansu Wang; Zhen Liang; Yang Wang; Yi Zhang; Lei Xu
Journal: Front Cell Dev Biol Date: 2021-12-16

7. Development and Validation of a Nomogram for the Prediction of Hospital Mortality of Patients With Encephalopathy Caused by Microbial Infection: A Retrospective Cohort Study.

Authors: Lina Zhao; Yun Li; Yunying Wang; Qian Gao; Zengzheng Ge; Xibo Sun; Yi Li
Journal: Front Microbiol Date: 2021-08-19 Impact factor: 5.640

Review 8. Genomic Variation Prediction: A Summary From Different Views.

Authors: Xiuchun Lin
Journal: Front Cell Dev Biol Date: 2021-11-25

Review 9. RUNX3 Expression Level Is Correlated with the Clinical and Pathological Characteristics in Endometrial Cancer: A Systematic Review and Meta-analysis.

Authors: Zhen Liu; Zhi-Pan Hong; Shu-Xue Xi
Journal: Biomed Res Int Date: 2021-07-14 Impact factor: 3.411

10. A Computational Framework to Identify Transcriptional and Network Differences between Hepatocellular Carcinoma and Normal Liver Tissue and Their Applications in Repositioning Drugs.

Authors: Aimin Hu; Zheng Wei; Zuxiang Zheng; Bichao Luo; Jieming Yi; Xinhong Zhou; Changjiang Zeng
Journal: Biomed Res Int Date: 2021-09-23 Impact factor: 3.411