Abstract
BACKGROUND: Feature selection in multi-omics data analysis remains challenging owing to the size of omics datasets, which comprise approximately [Formula: see text]-[Formula: see text] features. In particular, how best to weight individual omics datasets is unclear, and the approach adopted has substantial consequences for feature selection. In this study, we extended a recently proposed kernel tensor decomposition (KTD)-based unsupervised feature extraction (FE) method to integrate multi-omics datasets obtained from common samples in a weight-free manner.
Keywords: Feature selection; Kernel trick; Multiomics; Tensor decomposition
Year: 2022 PMID: 35209912 PMCID: PMC8876179 DOI: 10.1186/s12920-022-01181-4
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Confusion matrix when applying KTD-based unsupervised FE to a synthetic dataset ()
| | Adjusted | Adjusted |
|---|---|---|
| | 7.06 | 2.94 |
| | 0.04 | 989.96 |
Fig. 1 Schematic representation of HBV vaccination data analysis. Analysis starts from the center, moves to the right, comes back to the center, and then moves to the left. The cyan rectangle annotated as “methylation” is , the yellow rectangle annotated as “gene” is , the green rectangle annotated as “WBC” is , and the magenta rectangle annotated as “Plasma” is . The four tilted cubes to the right of these four rectangles are , whose correspondence with is indicated by the same color. The tilted cubes colored by layers to the right of the four tilted cubes represent the bundle of . The right-most figure with a blue cube annotated as “G” at the center corresponds to TD shown in Eq. (16). The four colored rectangles to the left of the four colored and annotated rectangles represent the singular-value vectors computed by Eq. (17). Genes are selected from these singular-value vectors using P values computed by Eq. (18). For methylation, transcription factors (TFs) are further selected by Enrichr using the selected genes (Table 3). The selected genes and TFs are then uploaded to Enrichr to validate the biological reliability (the left-most figure with color gradation)
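The pipeline in Fig. 1 can be illustrated with a small sketch. This is not the authors' implementation: Eqs. (16)-(18) are not reproduced in this text, so the code assumes linear kernels, a sample-mode HOSVD obtained from the SVD of the mode-1 unfolding, projection of one sample singular vector back onto the features, chi-squared P values with one degree of freedom, and Benjamini-Hochberg adjustment. All function names and the toy data are hypothetical.

```python
import math
import numpy as np

def linear_kernel(x):
    """x has shape (features, samples); returns the samples x samples kernel."""
    return x.T @ x

def sample_singular_vectors(kernels):
    """Stack M x M kernels into an M x M x K tensor and return the left
    singular vectors of its mode-1 unfolding (the sample-mode HOSVD factor)."""
    tensor = np.stack(kernels, axis=2)
    unfolded = tensor.reshape(tensor.shape[0], -1)
    u, _, _ = np.linalg.svd(unfolded, full_matrices=False)
    return u

def feature_p_values(x, u_col):
    """Project features onto one sample singular vector and assign P values,
    assuming (hypothetically) that the standardized scores are Gaussian."""
    score = x @ u_col
    z = score / score.std()
    # Survival function of a chi-squared variable (1 d.o.f.) evaluated at z**2.
    return np.vectorize(math.erfc)(np.abs(z) / math.sqrt(2.0))

def bh_adjust(p):
    """Benjamini-Hochberg adjusted P values."""
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

# Toy data (hypothetical): two omics layers measured on the same 12 samples;
# the first 10 features of each layer follow a shared two-group pattern.
rng = np.random.default_rng(0)
m = 12
pattern = np.repeat([1.0, -1.0], m // 2)
omics = []
for n_features in (1000, 500):
    x = rng.normal(size=(n_features, m))
    x[:10] += 4.0 * pattern
    omics.append(x)

u = sample_singular_vectors([linear_kernel(x) for x in omics])
selected = [
    np.flatnonzero(bh_adjust(feature_p_values(x, u[:, 0])) < 0.01)
    for x in omics
]
print([idx.tolist() for idx in selected])
```

Because every kernel is samples x samples regardless of how many features each layer has, the layers can be stacked without choosing per-layer weights, which is the sense in which a sketch like this is weight-free.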
TFs enriched in the “ChEA 2016” Enrichr category (adjusted P values ) when 1335 genes associated with 2077 methylation probes selected by KTD-based unsupervised FE were considered (the full list is available in Additional file 2: Data S1)
| TFs | ZNF217, TCF4, STAT3, SMARCD1, WT1, FOXA2, PAX3-FKHR, SMAD4, SMAD3, SOX9, TFAP2C, YAP1, AR, SOX2, CTNNB1, VDR, PIAS1, TEAD4, MITF, HNF4A, SUZ12 |
Fig. 2 Left: , middle: , right: when HOSVD is applied to a linear kernel computed using HBV vaccine data. The Pearson correlation coefficient between and is ()
Fig. 3 Boxplot of when HOSVD is applied to a linear kernel computed using kidney cancer data. Left: TCGA, , right: GEO, . P values are based on the t test applied to . T: tumors, N: normal kidney samples
Confusion matrix when linear regression, lasso, and random forest (rf) were applied to the synthetic dataset ()
| | Linear regression: Adjusted | Linear regression: Adjusted | Lasso: Selected | Lasso: Not selected | Rf: Selected | Rf: Not selected |
|---|---|---|---|---|---|---|
| | 0.07 | 19.93 | 4.62 | 15.383 | 17.55 (5.82) | 2.45 (14.18) |
| | 0.03 | 979.97 | 2.12 | 977.88 | 495.43 (14.18) | 484.57 (965.82) |
| | 0.07 | 19.93 | 4.70 | 15.30 | 17.69 (5.67) | 2.31 (14.33) |
| Other than above | 0.01 | 979.99 | 2.27 | 977.73 | 494.70 (14.33) | 485.30 (965.67) |
| | 0.09 | 19.91 | 4.55 | 15.45 | 17.71 (5.46) | 2.29 (14.54) |
| Other than above | 0.01 | 979.99 | 2.12 | 977.78 | 496.68 (14.54) | 483.32 (965.46) |
For rf, the results obtained when the top-ranked features with the largest absolute importance were selected are also shown in parentheses
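To compare the three methods on a common scale, the averaged counts can be converted into recall and precision. The sketch below rests on an assumption, because the row labels of the table did not survive extraction: it treats the first data row as the features that should be selected and the second data row as all remaining features. It also treats the first linear regression column as the selected one. The helper name is hypothetical and the values are copied from the first pair of data rows.

```python
def recall_precision(tp, fn, fp):
    """Recall and precision from averaged confusion-matrix counts."""
    return tp / (tp + fn), tp / (tp + fp)

# (true positives, false negatives, false positives) per method,
# ASSUMING row 1 = relevant features and row 2 = all other features.
first_block = {
    "Linear regression": (0.07, 19.93, 0.03),
    "Lasso": (4.62, 15.383, 2.12),
    "Rf": (17.55, 2.45, 495.43),
    "Rf (top-ranked)": (5.82, 14.18, 14.18),
}

metrics = {name: recall_precision(*counts) for name, counts in first_block.items()}
for name, (recall, precision) in metrics.items():
    print(f"{name}: recall={recall:.3f}, precision={precision:.3f}")
```

Under these assumptions, rf without a cutoff recovers most of the relevant features but at low precision, while linear regression selects almost nothing.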
Eight genes associated with 11 probes identified as DEGs when gene expression profiles were considered
| Gene symbols | S100A9, CD74, HBA1, ACTB, HBB, HBA2, MALAT1, COX1 |
Proteins identified as DEGs when gene expression profiles in the proteome were considered
| WBC | HIST1H2BJ, HIST2H2BF, HIST1H2BG, HIST1H2BB, HIST1H2BD, ACTG1, HIST1H2BL, HIST1H2BN, PFN1, HIST1H2BK, HIST3H2BB, ACTB, HBB, HBA2, HIST1H2BA, HIST1H2BI, HIST1H2BC, HIST1H2BO, HIST2H2BE, HIST1H2BM, HBA1, HIST1H2BF, HIST1H2BE, HIST1H2BH |
| Plasma | FGA, HP, GSN, ALB, FGG, IGLL5, APOA1, SERPINA1, ORM1, TF, GC, CP, C4A, CSF3R, A2M, HPX, HRG, A1BG, CFH, APOB, C3, CLEC14A |
Confusion matrix of selected mRNAs between TCGA and GEO datasets
| | | GEO | |
|---|---|---|---|
| TCGA | | 17269 | 101 |
| | | 65 | 5 |
, Odds ratio: 13.13
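The overlap statistics for this 2 x 2 table can be checked with a short sketch. It assumes that the cell with 5 counts mRNAs selected in both datasets and the cell with 17269 counts mRNAs selected in neither, since the row and column labels are missing from this text. The function names are hypothetical, and the one-sided Fisher exact test is one plausible choice of test, not necessarily the one used in the original analysis.

```python
from fractions import Fraction
from math import comb

def sample_odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) of the 2 x 2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def fisher_one_sided(a, b, c, d):
    """P(X >= d) under the hypergeometric null with all margins fixed."""
    n_total = a + b + c + d
    row = c + d   # assumed: mRNAs selected in one dataset
    col = b + d   # assumed: mRNAs selected in the other dataset
    denom = comb(n_total, row)
    tail = sum(
        comb(col, k) * comb(n_total - col, row - k)
        for k in range(d, min(row, col) + 1)
    )
    return float(Fraction(tail, denom))

odds = sample_odds_ratio(17269, 101, 65, 5)
p_value = fisher_one_sided(17269, 101, 65, 5)
print(round(odds, 2), p_value)
```

The sample odds ratio comes out at about 13.15, slightly above the reported 13.13. A gap of that size would be consistent with a different estimator, such as the conditional maximum likelihood estimate that some Fisher test implementations report, but that is an assumption and is not stated in the source.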
Fig. 4 Scatter plot of kernels between messenger RNA (mRNA) and microRNA (miRNA). Upper: TCGA (Pearson correlation coefficient ), Lower: GEO (Pearson correlation coefficient )
Correlation between HBV vaccine experiment kernels. Upper triangle: Pearson correlation, lower triangle: P values
| – | Gene expression | WBC | Plasma |
|---|---|---|---|
| Gene expression | – | 0.1279405 | 0.2192163 |
| WBC | | – | 0.4384998 |
| Plasma | | | – |
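One way to obtain a kernel-to-kernel correlation of the kind tabulated above is to flatten two sample-by-sample kernels and correlate their entries. The following sketch assumes linear kernels computed over the same samples and uses every entry of each kernel; whether diagonal or duplicated symmetric entries were excluded in the original analysis is not stated in this text. The function name and the toy data are hypothetical.

```python
import numpy as np

def kernel_correlation(x_a, x_b):
    """Pearson correlation between the entries of two linear kernels.
    x_a and x_b have shape (features, samples) and share the same samples."""
    k_a = x_a.T @ x_a
    k_b = x_b.T @ x_b
    return np.corrcoef(k_a.ravel(), k_b.ravel())[0, 1]

# Toy data (hypothetical): two layers with different feature counts
# measured on the same 12 samples.
rng = np.random.default_rng(1)
m = 12
x_a = rng.normal(size=(300, m))
x_b = rng.normal(size=(200, m))

r_ab = kernel_correlation(x_a, x_b)
print(r_ab)
```

Diagonal entries of a linear kernel are typically much larger than off-diagonal ones, so including them can dominate the correlation. Restricting the comparison to off-diagonal entries is a possible variant.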