Yimei Jiang1, Xiaowei Yan1, Kun Liu1, Yiqing Shi1, Changgang Wang1, Jiele Hu1, You Li1, Qinghua Wu1, Ming Xiang2, Ren Zhao3.
Abstract
BACKGROUND: In recent years, the differences between left-sided colon cancer (LCC) and right-sided colon cancer (RCC) have received increasing attention due to the clinicopathological variation between them. However, some of these differences have remained unclear and conflicting results have been reported.
Keywords: Gene expression; Left-sided colon cancer; Machine learning; Mutations; Right-sided colon cancer
Year: 2020 PMID: 33076847 PMCID: PMC7574488 DOI: 10.1186/s12885-020-07507-8
Source DB: PubMed Journal: BMC Cancer ISSN: 1471-2407 Impact factor: 4.430
Fig. 1 Feature selection and classification model based on mutations between LCC and RCC. a AUC scores of 100 times 10-fold cross-validation using different feature numbers. b The ROC curve of the classification model on the test dataset. c The importance score of the 30 mutation features
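As a rough illustration of the evaluation named in panels a and b, the sketch below computes AUC as the fraction of (positive, negative) sample pairs that a classifier ranks correctly, and generates the train/test splits for repeated k-fold cross-validation. The function names `auc` and `repeated_kfold` are hypothetical, the splits are not stratified, and nothing here is claimed to match the authors' actual pipeline.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (train, test) index lists for `repeats` rounds of k-fold CV."""
    rng = random.Random(seed)
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        for fold in range(k):
            test = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test
```

In use, a model would be fitted on each `train` split and scored on the matching `test` split, and the resulting AUC values averaged for each candidate number of features.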
Information on 30 mutation features
| Mutation<sup>a</sup> | avsnp150<sup>b</sup> | Gene.refGene<sup>c</sup> | Weight<sup>d</sup> | Samples in LCC, n (%)<sup>e</sup> | Samples in RCC, n (%)<sup>f</sup> | P-value<sup>g</sup> |
|---|---|---|---|---|---|---|
| chr7_140753336_140753336_A_T | rs113488022 | BRAF | 0.12 | 2 (1.8) | 35 (20.5) | 4.72E-15 |
| chr12_25245347_25245347_C_T | rs112445441 | KRAS | 0.09 | 5 (4.5) | 20 (11.7) | 4.07E-08 |
| chr5_112839942_112839942_C_T | rs121913332 | APC | 0.09 | 1 (0.9) | 18 (10.5) | 2.93E-07 |
| chr12_25245350_25245350_C_G | rs121913529 | KRAS | 0.09 | 1 (0.9) | 6 (3.5) | 0.02 |
| chr12_25245350_25245350_C_T | rs121913529 | KRAS | 0.07 | 8 (7.1) | 26 (15.2) | 8.74E-11 |
| chr17_7674220_7674220_C_T | rs11540652 | TP53 | 0.06 | 10 (8.9) | 2 (1.2) | 0.56 |
| chr7_135929761_135929762_AT_A | . | LUZP6;MTPN | 0.06 | 4 (3.6) | 2 (1.2) | 0.56 |
| chr3_179234297_179234297_A_G | rs121913279 | PIK3CA | 0.05 | 5 (4.5) | 7 (4.1) | 0.01 |
| chr12_25245350_25245350_C_A | rs121913529 | KRAS | 0.04 | 11 (9.8) | 13 (7.6) | 3.46E-05 |
| chr3_179218294_179218294_G_A | rs121913273 | PIK3CA | 0.04 | 4 (3.6) | 5 (2.9) | 0.04 |
| chr12_25225628_25225628_C_T | rs121913527 | KRAS | 0.03 | 2 (1.8) | 6 (3.5) | 0.02 |
| chr5_112780895_112780895_C_T | rs587781392 | APC | 0.03 | 4 (3.6) | 4 (2.3) | 0.08 |
| chr5_78039082_78039083_GT_G | . | AP3B1 | 0.03 | 2 (1.8) | 10 (5.8) | 5.37E-4 |
| chr15_23567535_23567536_CT_C | . | MKRN3 | 0.02 | 1 (0.9) | 15 (8.8) | 5.29E-06 |
| chr17_7673802_7673802_C_T | rs28934576 | TP53 | 0.02 | 5 (4.5) | 6 (3.5) | 0.02 |
| chr3_46375665_46375666_TG_T | rs939905165 | LOC102724297 | 0.02 | 0 (0) | 15 (8.8) | 5.29E-06 |
| chr1_244056271_244056272_GA_G | rs972665297 | ZBTB18 | 0.02 | 2 (1.8) | 17 (9.9) | 7.77E-07 |
| chr7_1747914_1747915_TA_T | . | ELFN1 | 0.02 | 2 (1.8) | 8 (4.7) | 3.08E-3 |
| chr12_109581434_109581435_GC_G | . | MVK | 0.02 | 0 (0) | 11 (6.4) | 2.16E-4 |
| chr4_105242265_105242266_CT_C | . | TET2-AS1 | 0.02 | 1 (0.9) | 8 (4.7) | 3.08E-3 |
| chr8_13568071_13568072_CT_C | rs1014242184 | C8orf48 | 0.01 | 1 (0.9) | 14 (8.2) | 1.36E-05 |
| chr17_58357799_58357800_AC_A | rs781215815 | RNF43 | 0.01 | 0 (0) | 17 (9.9) | 7.77E-07 |
| chr2_68464196_68464197_AT_A | . | FBXO48 | 0.01 | 0 (0) | 9 (5.3) | 1.29E-3 |
| chr4_154609909_154609910_GT_G | . | FGG | 0.01 | 0 (0) | 10 (5.8) | 5.32E-4 |
| chr17_7675088_7675088_C_T | rs28934578 | TP53 | 0.01 | 10 (8.9) | 13 (7.6) | 3.46E-05 |
| chr2_147926116_147926117_TA_T | rs764719749 | ACVR2A | 0.01 | 1 (0.9) | 11 (6.4) | 2.16E-4 |
| chr5_112838220_112838220_C_T | rs121913333 | APC | 0.01 | 3 (2.7) | 7 (4.1) | 7.22E-3 |
| chr13_108232109_108232110_CA_C | rs977361714 | ABHD13 | 0.003 | 1 (0.9) | 9 (5.3) | 1.29E-3 |
| chr4_44698597_44698598_GA_G | . | GUF1 | 0.003 | 1 (0.9) | 11 (6.4) | 2.16E-4 |
| chr6_98837428_98837429_CT_C | rs898072886 | POU3F2 | 0.003 | 1 (0.9) | 10 (5.8) | 5.32E-4 |
<sup>a</sup> Position of variants. For example, chr7_140753336_140753336_A_T represents base A being replaced by T at position 140,753,336 of chromosome 7
<sup>b</sup> The annotation of variants with dbSNP identifiers by ANNOVAR
<sup>c</sup> The annotated genes of the variants by ANNOVAR
<sup>d</sup> The weights (importance) of the mutation features for the classification model
<sup>e</sup> The number of samples (percent of samples) with the variants among LCC samples
<sup>f</sup> The number of samples (percent of samples) with the variants among RCC samples
<sup>g</sup> The P-value from Fisher’s exact test for each variant
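The variant identifier format and the per-variant test described in the footnotes can be sketched in plain Python as follows. The helper names `parse_variant_id` and `fisher_exact_two_sided` are hypothetical, and the group sizes (112 LCC and 171 RCC samples) are an assumption inferred from the counts and percentages in the table, not values stated in this record. The sketch shows only the form of a two-sided Fisher's exact test on a 2x2 table and makes no claim to reproduce the P-values reported above.

```python
from math import comb


def parse_variant_id(variant_id):
    """Split an ID such as 'chr7_140753336_140753336_A_T' into its fields."""
    chrom, start, end, ref, alt = variant_id.split("_")
    return {"chrom": chrom, "start": int(start), "end": int(end),
            "ref": ref, "alt": alt}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so floating-point ties are counted.
    total = sum(p for p in map(prob, range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Assumed group sizes, inferred from the table (e.g. 2/112 = 1.8%, 35/171 = 20.5%).
N_LCC, N_RCC = 112, 171

variant = parse_variant_id("chr7_140753336_140753336_A_T")
# Rows: LCC, RCC; columns: samples with the variant, samples without it.
p_value = fisher_exact_two_sided(2, N_LCC - 2, 35, N_RCC - 35)
```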
Fig. 2 Heatmaps of the selected mutations and gene expression data. a Information on the 30 mutations in LCC and RCC samples. Red represents a mutation being present in the sample, while blue represents no corresponding mutation in the sample. b Gene expression of 17 DEGs in LCC and RCC samples. Color represents log10(FPKM + 1)
Fig. 3 Volcano plot and MA plot for DEGs. Red dots represent upregulated genes in RCC compared with the level in LCC, while green dots represent downregulated genes
Fig. 4 Annotation results of DEGs in KEGG (a) and GO analyses (b). Only genes that were upregulated in RCC compared with the level in LCC were enriched in KEGG pathways and GO analyses (adjusted P-value < 0.05)
Fig. 5 Feature selection and classification model based on DEGs between LCC and RCC. a AUC scores of 100 times 10-fold cross-validation using different feature numbers. b The importance score of the 30 mutation features. c The ROC curve of the classification model on the test set. d Boxplot of the top four genes with the highest importance score in LCC and RCC
Fig. 6 Network of all of the DEGs and genes with the selected 30 mutations (produced with Cytoscape version 3.7.1). Circle nodes represent DEGs, while triangles represent mutated genes. Nodes with a light yellow color represent genes with mutations, dark turquoise represents downregulated DEGs, while dark orange represents upregulated DEGs. The line color represents the score of the connection between two nodes, ranging from 0.4 to 0.99. Node size represents the degree of the node: the larger the node size, the higher the degree of the node
Fig. 7 The correlation network of 30 mutations and 17 DEGs calculated by a logistic regression model (FDR < 0.05; produced with Cytoscape version 3.7.1). Red nodes represent mutations: mutation1 is rs113488022 (BRAF V600E mutation), and mutation2 is a mutation in the 3′-UTR of ELFN1 (chr7_1747914_1747915_TA_T, which denotes the chromosome, the position, and the mutated base). Blue nodes represent DEGs
The correlations of rs113488022 with DEGs
| Mutation | Gene | Coefficient<sup>a</sup> | FDR<sup>a</sup> |
|---|---|---|---|
| rs113488022 | ULBP2 | 0.31 | 0.01 |
| | CA8 | 0.18 | 0.04 |
| | HOXC6 | 0.68 | 0.002 |
| | AFAP1-AS1 | 0.13 | 0.001 |
<sup>a</sup> The coefficients and adjusted P-values (FDR) of the correlations of rs113488022 with genes from the logistic regression model
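The FDR column reports P-values adjusted for multiple testing. This record does not name the adjustment procedure, so the sketch below assumes the Benjamini-Hochberg method, a common choice. The function name `benjamini_hochberg` and the raw P-values in the example are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P-values for four mutation-gene tests.
raw = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw)
significant = [q < 0.05 for q in fdr]
```

With a threshold such as FDR < 0.05, only the tests whose adjusted value falls below the cutoff would be kept as edges in a correlation network like the one in Fig. 7.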