Faddy Kamel, Nathalie Schneider, Pasha Nisar, Mikhail Soloviev.
Abstract
Traditional approaches to genome-wide marker discovery often follow a common top-down strategy, in which a large-scale 'omics' investigation is followed by analysis of the functional pathways involved in order to narrow down the list of putative biomarkers, to deconvolute gene expression networks, or to gain insight into the genetic alterations observed in cancer. We set out to investigate whether a reverse approach would allow full or partial reconstruction of the transcriptional programs and biological pathways specific to a given cancer, and whether the full or a substantially expanded list of putative markers could thus be identified by starting with partial knowledge of a few disease-specific markers. To this end, we used 10 well-documented differentially expressed markers of colorectal cancer (CRC), analyzed their transcription factor networks and biological pathways, and predicted the existence of 193 new putative markers. Notably, applying the same procedure to a validation set of 10 other, completely different known CRC markers resulted in a very similar set of 143 predicted markers. Of these, 138 were identical to those found using the training set, confirming our main hypothesis that a much-expanded set of disease markers can be predicted by starting with just a small subset of validated markers. Further to this, we validated the expression of 42 out of 138 top-ranked predicted markers experimentally using qPCR in surgically removed CRC tissues. We showed that 41 out of the 42 mRNAs tested have significantly altered expression levels in surgically excised CRC tissues. Of the markers tested, 36 have previously been reported to be associated with aspects of CRC, whilst only limited published evidence exists for another three genes (BCL2, PDGFRB and TSC2), and no published evidence directly linking the genes to CRC was found for CCNA1, SHC1 and TGFB3.
Whilst we used CRC to test and validate our marker discovery strategy, the reported procedures apply more generally to cancer marker discovery.
Keywords: biomarkers; cancer detection; cancer screening; colorectal cancer; marker discovery; qPCR
Year: 2022 PMID: 35681633 PMCID: PMC9179423 DOI: 10.3390/cancers14112654
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.575
Figure 1. A summary of the methodology used to expand the range of molecular biomarkers of colorectal cancer (CRC). Panel (A): A training set of 10 known CRC markers is used to interrogate transcription factor (TF) databases and, separately, functional pathway databases. Panel (B): The procedure is validated using a set of 10 completely different markers and the same procedure as in (A). Panel (C): The virtually identical set of novel markers identified in (A) and (B) is further ranked using gene co-expression information to prioritize the most likely putative markers for further validation using quantitative PCR analysis of surgically resected CRC tissues.
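The agreement between the two predicted marker sets can be quantified directly from the counts given in the abstract (193 markers from the training set, 143 from the validation set, 138 shared). The following is a minimal sketch, not part of the published analysis; the function name `overlap_metrics` and the choice of metrics (Jaccard index and overlap coefficient) are assumptions made for illustration.

```python
def overlap_metrics(n_a: int, n_b: int, n_shared: int) -> dict:
    """Summarise agreement between two sets from their sizes and shared count.

    Hypothetical helper for illustration; the source reports only the counts.
    """
    if n_shared > min(n_a, n_b):
        raise ValueError("shared count cannot exceed the smaller set size")
    union = n_a + n_b - n_shared
    return {
        # Shared items as a fraction of all distinct items in either set.
        "jaccard": n_shared / union,
        # Shared items as a fraction of the smaller set.
        "overlap_coefficient": n_shared / min(n_a, n_b),
        "fraction_of_a": n_shared / n_a,
        "fraction_of_b": n_shared / n_b,
    }


# Counts taken from the abstract: training-derived, validation-derived, shared.
metrics = overlap_metrics(193, 143, 138)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, the overlap coefficient describes how much of the smaller (validation-derived) set is recovered by the training-derived set, while the Jaccard index also penalizes the markers found by only one of the two runs.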
Figure 2. A hypothetical map illustrating TF network hierarchy. A master transcription factor is highlighted with a green background. Other transcription factors are highlighted using blue-filled shapes. A red gene depicts a ‘seed’ marker gene known to be involved in a disease (CRC in our case). The red dashed lines with arrows indicate the approach to discovering potentially co-regulated genes (blue) that share the same upstream transcription factor(s). Identification of a master transcription factor (purple dashed line) may lead to the discovery of other relevant TFs and of other genes (purple).
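The idea in this figure, moving from a seed marker up to its transcription factors and back down to other genes they regulate, can be sketched as a simple lookup. This is an illustrative sketch only: the TF-to-target map below is hypothetical toy data with placeholder names (`TF_A`, `SEED_1`, `GENE_X`, and so on), and the function `candidate_coregulated` is an assumed name, not the authors' implementation or a real database query.

```python
from collections import defaultdict

# Hypothetical toy data: each TF maps to the set of genes it regulates.
TF_TARGETS = {
    "TF_A": {"SEED_1", "GENE_X", "GENE_Y"},
    "TF_B": {"SEED_1", "SEED_2", "GENE_Y"},
    "TF_C": {"GENE_Z"},  # regulates no seed gene, so it is ignored
}


def candidate_coregulated(tf_targets, seeds):
    """Return non-seed genes sharing an upstream TF with any seed gene.

    Candidates are sorted by the number of supporting TFs (descending),
    then alphabetically by gene name.
    """
    seeds = set(seeds)
    support = defaultdict(set)
    for tf, targets in tf_targets.items():
        if targets & seeds:  # this TF regulates at least one seed marker
            for gene in targets - seeds:
                support[gene].add(tf)
    return sorted(support.items(), key=lambda kv: (-len(kv[1]), kv[0]))


candidates = candidate_coregulated(TF_TARGETS, {"SEED_1", "SEED_2"})
for gene, tfs in candidates:
    print(gene, sorted(tfs))
```

Ranking by the number of shared TFs is one simple design choice for this sketch; the source ranks candidates using gene co-expression information instead.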
Figure 3. A pathway involved in proteoglycans in cancer (colorectal cancer, Homo sapiens). Reprinted with permission from [33].
Figure 4. Gene co-expression network for KRAS as depicted using data from the Gene Expression Omnibus on the GeneMania platform. The differing thickness in lines relates to the strength of co-expression (thicker lines show stronger co-expression). Yellow lines denote physical interactions, blue lines denote co-expression, orange lines predict co-expression, blue lines co-localization, burgundy genetic interactions and black denotes similar pathways.
Known CRC biomarkers used in this study.
| Training Set | Validation Set |
|---|---|
| BAG1 | BAX |
| BCL2 | CDH1 |
| CDKN1A | CDKN1B |
| CXCR4 | EGFR |
| ERBB2 | ESR1 |
| KRAS | MKI67 |
| PIK3CA | PLAU |
| PTEN | TERT |
| TGFBR2 | TP53 |
| TYMS | VEGF |
Top 44 of the predicted genes/proteins of significance to CRC.
| Predicted Genes | Main Functions |
|---|---|
| CDK2, CDK4, CDK6, CDKN1A, CDKN1B, CCNA1, CCND1, CCND2, CCND3, CCNE1 | Regulation of cell cycle |
| EGF, EGFR, FGFR1, HRAS, KDR *, KRAS, PIK3CA, PIK3R3, TGFB1, TGFB3, TGFBR2 | Cell growth, proliferation, differentiation or embryogenesis, wound healing |
| BAD, BAX, BCL2, BCL2L1, BID | Regulation of apoptosis and cell death |
| CREB1, E2F3, SMAD4, STAT1 | Transcription factors |
| CDH1, CTNNB1, FN1 | Cell adhesion, motility and/or shape |
| PDGFC, PDGFRB, VEGFC | Growth factors and their receptors |
| MAPK3, SHC1, IRS2 | Cellular signaling, signal transduction |
| PTEN *, TSC2 | Tumor suppressor genes |
| MMP2 | Extracellular metalloproteinase |
| TLR2 | Immune system regulation |
| MDM2 | Ubiquitin-protein ligase |
* KDR and PTEN genes were excluded from further qPCR analysis.
Figure 5. Experimental validation of differential expression of the predicted CRC marker genes in three patients using qPCR. The expression values are shown on a log2 scale. Panel (A): A moderately differentiated adenocarcinoma of the sigmoid colon, T2N0M0, EMVI negative. Panel (B): A moderately differentiated adenocarcinoma of the sigmoid colon, T4N2M0, EMVI positive. Panel (C): A moderately differentiated adenocarcinoma of the caecum, T3N1M0, EMVI negative. All amplifications were performed in triplicate. Confidence intervals (p = 0.05) are shown as black error bars. Expression values were normalized to the endogenous levels of three reference RNAs (GAPDH mRNA, 18S and 28S rRNAs). Significantly upregulated genes are labelled in red and significantly downregulated genes are shown in blue.
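To illustrate how a tumor-versus-normal expression ratio normalized to several reference RNAs might be computed and placed on a log2 scale, here is a minimal sketch. It assumes the widely used 2^(-ΔΔCt) method with 100% amplification efficiency and averages the reference Ct values; the caption does not state which calculation the authors used, so this is an assumption. The Ct values in the usage example are hypothetical, and `relative_expression` is an illustrative name.

```python
import math
from statistics import mean


def relative_expression(ct_target_tumor, ct_refs_tumor,
                        ct_target_normal, ct_refs_normal):
    """Return (ratio, log2 ratio) of tumor vs. normal expression.

    Assumes the 2^(-ddCt) method and 100% PCR efficiency (an assumption,
    not confirmed by the source). Reference Cts are averaged, which
    corresponds to the geometric mean of the reference quantities.
    """
    d_ct_tumor = ct_target_tumor - mean(ct_refs_tumor)
    d_ct_normal = ct_target_normal - mean(ct_refs_normal)
    dd_ct = d_ct_tumor - d_ct_normal
    ratio = 2.0 ** (-dd_ct)
    return ratio, math.log2(ratio)


# Hypothetical Ct values: one target gene and three reference RNAs per tissue.
ratio, log2_ratio = relative_expression(
    ct_target_tumor=24.0, ct_refs_tumor=[18.0, 10.0, 11.0],
    ct_target_normal=26.0, ct_refs_normal=[18.0, 10.0, 11.0],
)
print(ratio, log2_ratio)
```

In this toy case the target crosses the threshold two cycles earlier in tumor than in normal tissue while the references are unchanged, which corresponds to a four-fold higher level under the stated assumptions.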
Expression of the 42 selected mRNAs tested in the excised CRC tissues.
| Gene ¹ | Patient 1 expression ² | Patient 1 change | Patient 1 p | Patient 2 expression ² | Patient 2 change | Patient 2 p | Patient 3 expression ² | Patient 3 change | Patient 3 p |
|---|---|---|---|---|---|---|---|---|---|
| CCNA1 | 0.890 | | 0.4366 | 2.343 | ↑↑ | 0.0046 | 0.380 | ↓↓ | 0.0088 |
| CCND1 | 1.423 | | 0.1843 | 3.433 | ↑↑ | 0.0031 | 5.627 | ↑↑↑ | 0.0001 |
| CCND2 | 1.093 | | 0.7726 | 1.060 | | 0.9889 | 5.253 | ↑↑↑ | 0.0003 |
| CCND3 | 0.320 | ↓↓ | 0.0260 | 1.043 | | 0.7505 | 9.560 | ↑↑↑↑ | 0.0001 |
| CCNE1 | 1.050 | | 0.9274 | 3.413 | ↑↑ | 0.0036 | 0.800 | | 0.0602 |
| CDK2 | 3.480 | ↑↑ | 0.0116 | 4.187 | ↑↑↑ | 0.0002 | 2.527 | ↑↑ | 0.0014 |
| CDK4 | 2.360 | ↑↑ | 0.0017 | 3.170 | ↑↑ | 0.0015 | 2.380 | ↑↑ | 0.0056 |
| CDK6 | 2.330 | ↑↑ | 0.0048 | 3.870 | ↑↑ | 0.0019 | 6.717 | ↑↑↑ | 0.0001 |
| CDKN1A | 1.060 | | 0.6550 | 1.250 | | 0.2477 | 1.333 | | 0.1679 |
| CDKN1B | 1.707 | ↑ | 0.0163 | 1.177 | | 0.3277 | 5.533 | ↑↑↑ | 0.0001 |
| EGF | 1.160 | | 0.6528 | 4.600 | ↑↑↑ | 0.0013 | 9.340 | ↑↑↑↑ | 0.0002 |
| EGFR | 0.390 | ↓↓ | 0.0216 | 1.563 | ↑ | 0.0047 | 3.617 | ↑↑ | 0.0001 |
| FGFR1 | 2.690 | ↑↑ | 0.0002 | 0.713 | | 0.0767 | 3.490 | ↑↑ | 0.0017 |
| HRAS | 0.280 | ↓↓ | 0.0144 | 1.157 | | 0.3894 | 1.560 | ↑ | 0.0151 |
| KRAS | 0.373 | ↓↓ | 0.0097 | 1.230 | | 0.3994 | 8.313 | ↑↑↑↑ | 0.0002 |
| PIK3CA | 0.297 | | 0.0576 | 1.557 | ↑ | 0.0178 | 9.830 | ↑↑↑↑ | 0.0006 |
| PIK3R3 | 5.847 | ↑↑↑ | 0.0008 | 1.130 | | 0.1943 | 10.26 | ↑↑↑↑ | 0.0005 |
| TGFB1 | 1.617 | | 0.0728 | 1.917 | ↑ | 0.0000 | 0.143 | ↓↓↓ | 0.0060 |
| TGFB3 | 0.807 | | 0.2360 | 1.203 | | 0.8063 | 2.830 | ↑↑ | 0.0165 |
| TGFBR2 | 0.913 | | 0.2790 | 0.567 | ↓ | 0.0114 | 5.737 | ↑↑↑ | 0.0008 |
| BAD | 1.670 | ↑ | 0.0052 | 10.07 | ↑↑↑↑ | 0.0000 | 0.210 | ↓↓↓ | 0.0135 |
| BAX | 0.870 | | 0.4985 | 4.483 | ↑↑↑ | 0.0098 | 0.387 | ↓↓ | 0.0049 |
| BCL2 | 0.543 | | 0.1271 | 0.210 | ↓↓↓ | 0.0184 | 1.253 | ↑ | 0.0331 |
| BCL2L1 | 2.430 | ↑↑ | 0.0101 | 8.863 | ↑↑↑↑ | 0.0000 | 1.097 | | 0.3578 |
| BID | 7.500 | ↑↑↑ | 0.0002 | 6.157 | ↑↑↑ | 0.0001 | 7.463 | ↑↑↑ | 0.0002 |
| CREB1 | 1.563 | | 0.1042 | 2.363 | ↑↑ | 0.0008 | 13.94 | ↑↑↑↑ | 0.0000 |
| E2F3 | 2.600 | ↑↑ | 0.0071 | 1.100 | | 0.7703 | 4.443 | ↑↑↑ | 0.0012 |
| SMAD4 | 0.507 | ↓ | 0.0035 | 1.083 | | 0.4171 | 0.123 | ↓↓↓↓ | 0.0023 |
| STAT1 | 1.097 | | 0.4855 | 1.523 | ↑ | 0.0456 | 2.790 | ↑↑ | 0.0370 |
| CDH1 | 0.993 | | 0.7830 | 1.267 | | 0.1972 | 6.590 | ↑↑↑ | 0.0001 |
| CTNNB1 | 2.403 | ↑↑ | 0.0008 | 0.853 | | 0.1334 | 10.96 | ↑↑↑↑ | 0.0001 |
| FN1 | 3.673 | ↑↑ | 0.0039 | 0.843 | | 0.1443 | 2.403 | ↑↑ | 0.0005 |
| PDGFC | 0.600 | ↓ | 0.0110 | 3.247 | ↑↑ | 0.0016 | 2.813 | ↑↑ | 0.0033 |
| PDGFRB | 3.350 | ↑↑ | 0.0028 | 4.733 | ↑↑↑ | 0.0002 | 3.010 | ↑↑ | 0.0069 |
| VEGFC | 1.583 | ↑ | 0.0249 | 2.990 | ↑↑ | 0.0007 | 0.237 | ↓↓↓ | 0.0212 |
| MAPK3 | 0.617 | | 0.0947 | 2.313 | ↑↑ | 0.0006 | 4.903 | ↑↑↑ | 0.0044 |
| SHC1 | 2.607 | ↑↑ | 0.0062 | 1.250 | ↑ | 0.0030 | 12.06 | ↑↑↑↑ | 0.0002 |
| IRS2 | 15.74 | ↑↑↑↑ | 0.0001 | 4.457 | ↑↑↑ | 0.0001 | 4.300 | ↑↑↑ | 0.0003 |
| TSC2 | 0.840 | | 0.2096 | 2.453 | ↑↑ | 0.0034 | 4.810 | ↑↑↑ | 0.0012 |
| MMP2 | 1.210 | | 0.4724 | 2.573 | ↑↑ | 0.0057 | 1.217 | ↑ | 0.0097 |
| TLR2 | 3.913 | ↑↑ | 0.0045 | 9.013 | ↑↑↑↑ | 0.0000 | 0.927 | | 0.6464 |
| MDM2 | 1.300 | | 0.6030 | 1.427 | | 0.1993 | 7.947 | ↑↑↑ | 0.0001 |
¹ The genes are arranged according to their main functions or the known molecular phenomena in which they are involved. ² Averaged gene expression ratios (tumor vs. matching normal colon, n = 3). Arrows pointing up mark significantly upregulated mRNAs and arrows pointing down mark significantly downregulated mRNAs (p < 0.05). The number of arrows indicates the degree of differential expression: one arrow indicates a <2-fold difference, two arrows a 2- to 4-fold difference, three arrows a 4- to 8-fold difference, and four arrows an over 8-fold difference, all at p < 0.05.
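The arrow notation described in the footnote can be expressed as a small function. This is a sketch under stated assumptions: for ratios below 1 the fold difference is taken as the reciprocal of the ratio (consistent with the table, e.g. 0.380 is marked ↓↓), and values falling exactly on a boundary (2, 4 or 8) are assigned to the higher category, which the footnote does not specify. The function name `arrows` is illustrative.

```python
def arrows(ratio: float, p: float, alpha: float = 0.05) -> str:
    """Map a tumor/normal expression ratio and p-value to arrow notation.

    Boundary handling (fold of exactly 2, 4 or 8 goes to the higher
    category) is an assumption; the footnote does not specify it.
    """
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    if p >= alpha:
        return ""  # not significant: no arrows shown
    # Fold difference is symmetric: 0.25 and 4.0 are both 4-fold changes.
    fold = ratio if ratio >= 1 else 1 / ratio
    symbol = "↑" if ratio >= 1 else "↓"
    if fold < 2:
        count = 1
    elif fold < 4:
        count = 2
    elif fold < 8:
        count = 3
    else:
        count = 4
    return symbol * count


# Values taken from the table rows for CCNA1 and SMAD4.
print(repr(arrows(0.890, 0.4366)))  # CCNA1, Patient 1
print(repr(arrows(2.343, 0.0046)))  # CCNA1, Patient 2
print(repr(arrows(0.123, 0.0023)))  # SMAD4, Patient 3
```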