Yasir Rahmatallah1, Magomed Khaidakov2,3, Keith K Lai4, Hannah E Goyne5, Laura W Lamps5, Curt H Hagedorn2,3, Galina Glazko6.
Abstract
BACKGROUND: Sessile serrated adenomas/polyps are distinguished from hyperplastic colonic polyps subjectively by their endoscopic appearance and histological morphology. However, hyperplastic and sessile serrated polyps can have overlapping morphological features, resulting in sessile serrated polyps being diagnosed as hyperplastic. While sessile serrated polyps can progress into colon cancer, hyperplastic polyps have virtually no risk for colon cancer. Objective measures that differentiate these types of polyps would improve cancer prevention and treatment outcomes.
Keywords: Cantelli’s inequality; Feature selection; Formalin-fixed paraffin-embedded; Hyperplastic polyps; Microarrays; Molecular signature; RNA-seq; Sessile serrated adenoma/polyps; Shrunken centroid classifier; Summary metric
Year: 2017 PMID: 29284484 PMCID: PMC5745747 DOI: 10.1186/s12920-017-0317-7
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 Examples illustrating the new feature selection step. a The fold change in both platforms was larger than the within-phenotype variability and the correlation coefficient between platforms (ρ) was high; b when phenotypic labels in part a were randomly resampled, the fold change in both platforms became negligible as compared to the within-phenotype variability and the correlation coefficient between platforms (ρ) became low. c The fold change in both platforms was smaller than the within-phenotype variability and the correlation coefficient between platforms (ρ) was low; d when phenotypic labels in part c were randomly resampled, the correlation coefficient (ρ) was low
Fig. 2 Venn diagram summarizing the DE genes in three comparisons
Fig. 3 Principal component analysis (PCA) scatter plots. a SSA/P and HP samples are not well-separated when all the expressed genes are considered; b control right (CR) and control left (CL) samples are well-separated when all the expressed genes are considered; c SSA/P and HP samples are well-separated when only the genes differentially expressed between SSA/Ps and both HPs and CRs, with the exclusion of genes DE between CR and CL, are considered (139 genes); d CR and CL samples are well-separated when only the 139 genes in (c) are considered
Fig. 4 Heatmap of RNA-seq expression data. Hierarchical clustering of CR (green), HP (yellow) and SSA/P (blue) biopsies (columns) and differentially expressed genes (rows). Only genes that were expressed at the same level in HP and CR samples but significantly up- or down-regulated in SSA/Ps are shown. Down-regulated and up-regulated genes in SSA/Ps are indicated in blue and orange colors, respectively. The log2(SSA/P / HP) is shown on the left side of gene names
Up-regulated pathways (GO categories)
| Category | Pathway | FDR^a |
|---|---|---|
| Cell adhesion | CALCIUM INDEPENDENT CELL CELL ADHESION | 0.022 |
| | CELL SUBSTRATE ADHERENS JUNCTION | 0.042 |
| Cell growth and death | CELL STRUCTURE DISASSEMBLY DURING APOPTOSIS | 0.033 |
| | POSITIVE REGULATION OF CELL PROLIFERATION | 0.033 |
| Immune system | INFLAMMATORY RESPONSE | 0.033 |
| | IMMUNOLOGICAL SYNAPSE | 0.045 |
| Signal transduction | POSITIVE REGULATION OF SECRETION | 0.045 |
| | G PROTEIN COUPLED RECEPTOR PROTEIN SIGNALING | 0.042 |
| | SECOND MESSENGER MEDIATED SIGNALING | 0.045 |
| Metabolism | AROMATIC COMPOUND METABOLIC PROCESS | 0.022 |
| | HETEROCYCLE METABOLIC PROCESS | 0.022 |
| Differentiation | CELLULAR MORPHOGENESIS DURING DIFFERENTIATION | 0.045 |
| Cellular component organization | EXTRACELLULAR STRUCTURE ORGANIZATION AND BIOGENESIS | 0.042 |
| Neuron development | AXONOGENESIS | 0.042 |
| | NEURITE DEVELOPMENT | 0.045 |

^a FDR: false discovery rate
Fig. 5 MST2 of the ‘Golgi stack’ gene set from the C5 collection of MSigDB. This gene set was detected by GSNCA (P < 0.05) in both comparisons: HPs versus SSA/Ps and CRs versus SSA/Ps
Performance of the nearest shrunken centroid classifier (SCC) in classifying independent SSA/P and HP microarray samples using three signatures
| Platforms | Concordant genes | Signature size | Signature | Illum.^a | Affy.^b |
|---|---|---|---|---|---|
| Training: RNA-seq | C4BPA,CEMIP,CHGA,CLDN1,CPE,DPP10,FSIP2,GRAMD1B,GRIN2D,IL2RG,KIZ,KLK7,MEGF6,MYCN,NTRK2,PLA2G16,RAMP1,SBSPON,SEMG1,SLC7A9,SPIRE1,TM4SF4 | 18 | C4BPA,CHGA,CLDN1,CPE,DPP10,GRAMD1B,GRIN2D,KIZ,KLK7,MEGF6,MYCN,NTRK2,PLA2G16,SBSPON,SEMG1,SLC7A9,SPIRE1,TM4SF4 | 0 | – |
| Training: RNA-seq | CLDN1,FOXD1,IDO1,IL2RG,KIZ,LMO4,MEGF6,NTRK2,PIK3R3,PLA2G16,PRUNE2,PTAFR,SBSPON,SEMG1,SLC7A9,SPIRE1,TACSTD2,TPD52L1,TRIB2,ZIC2 | 16 | CLDN1,FOXD1,KIZ,MEGF6,NTRK2,PIK3R3,PLA2G16,PRUNE2,PTAFR,SBSPON,SEMG1,SLC7A9,SPIRE1,TACSTD2,TPD52L1,TRIB2 | – | 3 |
| Training: RNA-seq | CHFR,CHGA,CLDN1,IL2RG,KIZ,MEGF6,NTRK2,PLA2G16,PTAFR,SBSPON,SEMG1,SLC7A9,SPIRE1,TACSTD2,VSIG1,ZIC2 | 13 | CHFR,CHGA,CLDN1,KIZ,MEGF6,NTRK2,PLA2G16,PTAFR,SBSPON,SEMG1,SLC7A9,SPIRE1,TACSTD2 | 0 | 3 |
^a Illumina microarrays
^b Affymetrix HGU133plus2 microarrays
Genes included in the smallest (13-gene) expression signature of SSA/Ps
| Gene | log2FC | FC | Description |
|---|---|---|---|
| SLC7A9 | 3.22 | 9.34 | Solute carrier family 7 member 9 |
| SEMG1 | 2.95 | 7.72 | Semenogelin I |
| MEGF6 | 2.66 | 6.34 | Multiple EGF like domains 6 |
| TACSTD2 | 1.93 | 3.82 | Tumor-associated calcium signal transducer 2 |
| CLDN1 | 1.85 | 3.59 | Claudin 1 |
| SBSPON | 1.23 | 2.35 | Somatomedin B and thrombospondin type 1 domain containing |
| PLA2G16 | 1.18 | 2.27 | Phospholipase A2 group XVI |
| PTAFR | 1.08 | 2.11 | Platelet activating factor receptor |
| KIZ | 0.98 | 1.98 | Kizuna centrosomal protein |
| SPIRE1 | 0.82 | 1.76 | Spire type actin nucleation factor 1 |
| CHFR | −0.62 | 0.65 | Checkpoint with forkhead and ring finger domains, E3 ubiquitin protein ligase |
| CHGA | −1.63 | 0.32 | Chromogranin A |
| NTRK2 | −2.32 | 0.20 | Neurotrophic tyrosine kinase, receptor, type 2 |
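
The log2FC and FC columns in the table above are related by FC = 2^log2FC. The short Python sketch below shows that conversion for a few rows copied from the table. The function name `fold_change` and the `examples` dictionary are illustrative choices, not part of the original analysis. Because the log2FC column is rounded to two decimals, recomputed FC values can differ from the tabulated ones in the second decimal place.

```python
def fold_change(log2_fc: float) -> float:
    """Convert a log2 fold change into a linear fold change (FC = 2 ** log2FC)."""
    return 2.0 ** log2_fc


# A few (gene, log2FC) pairs copied from the table above.
examples = {"SLC7A9": 3.22, "CLDN1": 1.85, "CHFR": -0.62, "NTRK2": -2.32}

for gene, log2_fc in examples.items():
    # FC > 1 means higher expression in SSA/Ps, FC < 1 means lower expression.
    print(f"{gene}: log2FC={log2_fc:+.2f} -> FC={fold_change(log2_fc):.2f}")
```

A negative log2FC therefore corresponds to an FC between 0 and 1, as in the CHFR, CHGA and NTRK2 rows.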
Fig. 6 The probability of an assigned SSA/P (HP) class is the cumulative distribution function CDF(SM) (1 − CDF(SM)) of the empirical distribution of SM after standardization. The empirical approach can also be replaced by the normal approximation of SM. Since both approaches have limitations, the Cantelli lower bound (CLB) is used as a conservative probability assignment for the SM score
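
The conservative assignment described in this caption can be illustrated with Cantelli's inequality. For a standardized score z (mean 0, variance 1), the inequality gives CDF(z) ≥ z²/(1 + z²) when z > 0 and 1 − CDF(z) ≥ z²/(1 + z²) when z < 0, whatever the underlying distribution. The Python sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, the score is assumed to be already standardized, and a positive z is assumed to point to SSA/P, as the caption implies. It is not the authors' implementation.

```python
import math


def cantelli_lower_bound(z: float) -> float:
    """Distribution-free lower bound z^2 / (1 + z^2) from Cantelli's inequality.

    For a standardized score z it bounds CDF(z) from below when z > 0
    and 1 - CDF(z) from below when z < 0.
    """
    return z * z / (1.0 + z * z)


def normal_cdf(z: float) -> float:
    """Standard normal CDF, i.e. the normal approximation named in the caption."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conservative_call(z: float) -> tuple:
    """Return (assumed class label, Cantelli lower bound) for a standardized score.

    Assumption: positive z favours SSA/P, non-positive z favours HP.
    At z = 0 the bound is 0, so the call carries no confidence.
    """
    label = "SSA/P" if z > 0 else "HP"
    return label, cantelli_lower_bound(z)


for z in (-2.0, -1.0, 0.5, 2.0):
    label, bound = conservative_call(z)
    # Probability of the assigned class under the normal approximation.
    approx = normal_cdf(z) if z > 0 else 1.0 - normal_cdf(z)
    print(f"z={z:+.1f}: {label}, Cantelli bound={bound:.3f}, normal approx={approx:.3f}")
```

Because the bound holds for any distribution with finite variance, it never exceeds the probability given by the normal approximation, which is why it serves as the conservative choice.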