Efstathios-Iason Vlachavas, Eleftherios Pilalis, Olga Papadodima, Dirk Koczan, Stefan Willis, Sven Klippel, Caixia Cheng, Leyun Pan, Christos Sachpekidis, Alexandros Pintzas, Vasilis Gregoriou, Antonia Dimitrakopoulou-Strauss, Aristotelis Chatziioannou.
Abstract
PURPOSE: Transcriptomic profiling has enabled a finer genomic characterization of several cancers, among them colorectal cancer (CRC), through the derivation of genes with an enhanced causal role and of informative gene sets. However, identifying small gene signatures that can serve as potential biomarkers in CRC remains challenging, mainly because of the great genetic heterogeneity of the disease.
Keywords: 18F-FDG PET; ACADM, Acyl-Coenzyme A Dehydrogenase; AUC, Area Under the Curve; CCT7, Chaperonin Containing TCP1 Subunit 7; CD44, CD44 Molecule (Indian Blood Group); CRC, Colorectal cancer; Colorectal cancer; DE, Differentially Expressed; FD, Fractal Dimension; FDG, F-18-Fluorodeoxyglucose; GDC, Genomics Data Commons; GEO, Gene Expression Omnibus; GSTP1, Glutathione S-Transferase Pi 1; KIT, Proto-Oncogene Receptor Tyrosine Kinase; Lasso, least absolute shrinkage and selection operator; MFA, Multiple Factor Analysis; Microarray analysis; PCs, Principal Components; PET, Positron Emission Tomography; ROC, Receiver-operator Characteristic curve; Radiogenomics; SUV, Standardized Uptake Value; TCGA; TCGA-COAD, The Cancer Genome Atlas-Colon Adenocarcinoma; Translational bioinformatics
Year: 2019 PMID: 30809322 PMCID: PMC6374701 DOI: 10.1016/j.csbj.2019.01.007
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Basic clinicopathological characteristics of the 30 patients in the two microarray datasets. Overall, the patients represent a relatively homogeneous cohort comprising only primary colorectal adenocarcinomas.
| Patient Characteristics | | Microarray platform: hgu133a | Microarray platform: hgu133plus2 |
|---|---|---|---|
| Number of patients | | 13 | 17 |
| Sex | F | 7 | 8 |
| | M | 6 | 9 |
| Tumor Stage | T1 | 1 | 1 |
| | T2 | – | 6 |
| | T3 | 10 | 10 |
| | T4 | 2 | – |
| Lymph Node Status | N0 | 7 | 12 |
| | N1 | 2 | 1 |
| | N2 | 4 | 4 |
| Synchronous distant metastasis (M) | | 4 | 2 |
| Anatomic Location | Right-sided | 7 | 8 |
| | Left-sided | 6 | 9 |
| Mean Age (range) | | 70 (58–81) | 64 (51–83) |
Fig. 1. Schematic presentation of the two-tissue compartment model. Cplasma represents the tracer concentration in blood, C1 the unbound (non-metabolized) tracer in tissue, and C2 the metabolized tracer in tissue. In the case of FDG, k1 is the transport rate of the tracer from blood to tissue, k2 the transport rate back to blood, k3 the phosphorylation rate, and k4 the dephosphorylation rate.
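The compartment scheme in Fig. 1 is commonly written as two coupled rate equations, dC1/dt = k1·Cplasma − (k2 + k3)·C1 + k4·C2 and dC2/dt = k3·C1 − k4·C2. The following is a minimal sketch of that textbook form, not the kinetic-modelling software used in the study: the function name, the forward-Euler integration, the zero initial conditions and the rate values in the usage lines are all assumptions chosen for illustration.

```python
def simulate_two_tissue(c_plasma, k1, k2, k3, k4, dt):
    """Forward-Euler integration of the two-tissue compartment model.

    c_plasma : plasma tracer concentrations sampled every `dt` time units
    k1..k4   : rate constants as defined in the Fig. 1 caption
    Returns (c1, c2), the unbound and metabolized tissue curves.
    Assumption: both tissue compartments start empty.
    """
    c1, c2 = [0.0], [0.0]
    for cp in c_plasma[:-1]:
        # Unbound tracer: inflow from blood, outflow to blood,
        # loss to phosphorylation, gain from dephosphorylation.
        dc1 = k1 * cp - (k2 + k3) * c1[-1] + k4 * c2[-1]
        # Metabolized tracer: gain from phosphorylation, loss to dephosphorylation.
        dc2 = k3 * c1[-1] - k4 * c2[-1]
        c1.append(c1[-1] + dt * dc1)
        c2.append(c2[-1] + dt * dc2)
    return c1, c2


# Illustrative values only: constant plasma input, arbitrary rate constants.
plasma = [1.0] * 10001
c1, c2 = simulate_two_tissue(plasma, k1=0.1, k2=0.2, k3=0.05, k4=0.05, dt=0.1)
```

With a constant plasma input the curves settle at C1 = k1·Cplasma/k2 and C2 = k3·C1/k4, which gives a quick sanity check on the signs of the four terms.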
Fig. 2. Computational workflow applied for the integrative analysis of the two microarray datasets. The complete analysis was performed in the R statistical software/Bioconductor (R versions 3.2.2, 3.3.1 & 3.5.0). Details of the framework are described in the Supplementary Legend of Fig. 2.
Fig. 5. Grouped dot plots of the average cross-validation resampling results (representative example using a specific random seed) for the three groups of variables: only the 94 hub genes, only the 8 PET variables, and the combination of both. The performance measures are ROC, sensitivity and specificity, estimated by 10-fold cross-validation repeated 10 times, for 10 different random seeds. Using the total merged microarray dataset as the training set, the composite signature yields overall better performance measures than either the PET variables or the genes alone.
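The resampling scheme named in the Fig. 5 caption (10-fold cross-validation repeated 10 times) can be illustrated by generating the train/test index splits alone. This is a hedged sketch in plain Python, not the authors' R code: the function names, the stratification by class, the seed handling and the toy labels (30 cancer and 30 control samples) are assumptions made for the example.

```python
import random


def stratified_folds(labels, k, seed):
    """Assign sample indices to k folds, keeping class proportions balanced."""
    rng = random.Random(seed)
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def repeated_cv_splits(labels, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    for r in range(repeats):
        # A different seed per repeat gives a different partition each time.
        folds = stratified_folds(labels, k, seed + r)
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            yield r, i, train_idx, test_idx


# Toy labels (assumption): 30 cancer samples and 30 paired controls.
labels = ["cancer"] * 30 + ["normal"] * 30
splits = list(repeated_cv_splits(labels, k=10, repeats=10, seed=0))
```

Each of the 100 resulting splits would be used to fit a classifier on `train_idx` and score it on `test_idx`; averaging ROC, sensitivity and specificity over the splits gives one summary value per variable group.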
Fig. 3. A. Top-ranked GO Biological Processes resulting from the application of semantic analysis to the 1760 genes found to be differentially expressed between cancer and normal tissue. GO terms identified as significantly enriched are grouped according to their biological relevance (horizontal axis). The vertical axis depicts the number of relevant genes. B. Common and unique DE genes found by the three statistical comparisons: all 30 cancer samples versus their paired controls (‘total’ comparison), 24 cancer samples from patients without distant metastases versus their adjacent paired controls (‘non-metastatic’ comparison), and 6 cancer samples from patients with synchronous distant metastases versus their respective controls (‘metastatic’ comparison).
Fig. 4. A. Correlation plot (FactoShiny R package) combining the 94 selected linker genes (marked in green) with the 8 PET variables (marked in red). On the first PC (Dim1), the genes correlate significantly with four of the PET variables (SUV, k3, FD, INF), whereas the other four (k4, VB, k1, k2) are largely orthogonal (independent) and contribute additional information to the composite dataset (Dim2). B. Projection of the samples onto the first two PCs, based on both PET and gene features (R package factoextra). The first PC (Dim1), spanned by the 94 genes and four of the PET variables, clearly separates cancer samples from the adjacent control ones. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 6. Comparative heatmaps of hierarchical clustering (distance: Pearson, linkage: average, scaled log2 normalized intensity values) comparing different disease states of the microarray dataset, produced with the R package ComplexHeatmap [22]. A. Initial composite signature. B. Lasso-optimized, compact composite signature of 7 genes and 5 PET variables. The compact composite signature preserves the discrimination potential of the initial signature.
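The clustering settings quoted in the heatmap caption (Pearson distance, average linkage) can be sketched in a few lines. This is an illustrative plain-Python version, not the ComplexHeatmap implementation: it assumes the Pearson distance is defined as 1 − r, stops merging at a requested number of clusters rather than building a full dendrogram, and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson_distance(x, y):
    """Distance 1 - r, where r is the Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering with average linkage on Pearson distance."""
    n = len(profiles)
    dist = {(i, j): pearson_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(a, b):
        # Average of all pairwise distances between members of a and b.
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Find and merge the closest pair of clusters.
        _, p, q = min((linkage(clusters[p], clusters[q]), p, q)
                      for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        merged = clusters[p] + clusters[q]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)


# Toy expression profiles (assumption): two rising and two falling patterns.
profiles = [[1, 2, 3, 4], [2, 4, 6, 8.5], [4, 3, 2, 1], [8, 6, 4, 1.5]]
groups = average_linkage(profiles, n_clusters=2)
```

Because the distance depends on correlation rather than magnitude, profiles with the same shape cluster together even when their absolute intensities differ, which is why it is a common choice for expression heatmaps.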
Fig. 7. Heatmap of the 22-gene signature in the TCGA RNA-Seq dataset (456 cancer samples, 41 normal samples; R package ComplexHeatmap). Samples with relatively high expression of a given gene are marked in red and samples with relatively low expression are marked in blue. The gene set achieves an overall good separation of normal and cancer samples. Samples and genes have been reordered by hierarchical clustering (average linkage, Pearson distance). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 8. Kaplan-Meier plot of overall survival estimates, comparing the survival outcomes of the two patient clusters defined by the expression of the 6 genes of Group2 in the TCGA-COAD dataset (log-rank p-value; R package survminer).
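For readers unfamiliar with the estimator behind such plots, the Kaplan-Meier survival curve multiplies, at each observed death time, the previous survival probability by (1 − deaths / number at risk). The following is a minimal sketch of that product-limit rule; it is not the survminer code used for the figure, and the function name and the toy follow-up data are assumptions.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Everyone with follow-up ending at t leaves the risk set afterwards.
        at_risk -= leaving
    return curve


# Toy data (assumption): five patients, two of them censored.
curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 1, 0])
```

Censored patients lower the number at risk without producing a step in the curve; comparing two such curves with a log-rank test yields the p-value reported in the caption.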
Fig. 9. Kaplan-Meier plots of overall survival, examining the prognosis of the four final patient clusters defined by the 22-gene signature in the TCGA-COAD dataset, based only on the available cancer samples. Panel A illustrates the overall survival estimates for the groups “HighGroup1.HighGroup2” and “HighGroup1.LowGroup2”, whereas panel B compares “HighGroup1.HighGroup2” and “LowGroup1.LowGroup2” (log-rank p-value; R package survminer).