SungHwan Kim, Jae-Hwan Jhong, JungJun Lee, Ja-Yong Koo.
Abstract
BACKGROUND: High-throughput microarray and sequencing data are now widely used to monitor biomarkers and biological processes related to many diseases. In this setting, the support vector machine (SVM) has been widely and successfully applied to gene selection. Despite the strengths of SVMs, analysis of a single small or mid-sized dataset inevitably suffers from low reproducibility and low statistical power. To address this problem, we propose a meta-analytic support vector machine (Meta-SVM) that can accommodate multiple omics data, making it possible to detect consensus genes associated with diseases across studies.
Keywords: Data integration; Meta-analysis; Support vector machine; TCGA
Year: 2017 PMID: 28149325 PMCID: PMC5270233 DOI: 10.1186/s13040-017-0126-8
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Shown are the results of experimental studies to compare the meta-logistic model with the meta-analytic SVM
| Variance | Meta-SVM: Sensitivity (SE) | Meta-SVM: Specificity (SE) | Meta-SVM: Youden | Meta-logistic regression: Sensitivity (SE) | Meta-logistic regression: Specificity (SE) | Meta-logistic regression: Youden |
|---|---|---|---|---|---|---|
| No inclusion of random study | | | | | | |
| 0.1 | 0.828 (0.001) | 0.9843 (0) | 1.812 | 0.1073 (0) | 1 (0) | 1.107 |
| 0.3 | 0.8127 (0.002) | 0.8707 (0.001) | 1.683 | 0.2087 (0.001) | 0.996 (0) | 1.205 |
| 0.5 | 0.76 (0.002) | 0.867 (0.001) | 1.627 | 0.2633 (0.001) | 0.9123 (0.001) | 1.176 |
| Inclusion of one random study | | | | | | |
| 0.1 | 0.8007 (0.011) | 0.9837 (0.002) | 1.784 | 0.102 (0.004) | 0.997 (0.001) | 1.099 |
| 0.3 | 0.6673 (0.013) | 0.8497 (0.009) | 1.517 | 0.2113 (0.009) | 0.966 (0.005) | 1.177 |
| 0.5 | 0.6013 (0.017) | 0.852 (0.009) | 1.453 | 0.2527 (0.01) | 0.8667 (0.008) | 1.119 |
| Inclusion of two random studies | | | | | | |
| 0.1 | 0.624 (0.016) | 0.9737 (0.009) | 1.598 | 0.0847 (0.005) | 0.994 (0.001) | 1.079 |
| 0.3 | 0.51 (0.016) | 0.8433 (0.006) | 1.353 | 0.1727 (0.011) | 0.9317 (0.005) | 1.104 |
| 0.5 | 0.4167 (0.012) | 0.85 (0.009) | 1.267 | 0.256 (0.012) | 0.8193 (0.006) | 1.075 |
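
The Youden column in the table above equals sensitivity plus specificity (for example, 0.828 + 0.9843 ≈ 1.812), whereas the conventional Youden's J statistic subtracts 1 from that sum. A minimal sketch of the relation follows; the function names `youden_sum` and `youden_j` are hypothetical and chosen only for illustration.

```python
def youden_sum(sensitivity, specificity):
    """Value reported in the table's Youden column: sensitivity + specificity."""
    return sensitivity + specificity


def youden_j(sensitivity, specificity):
    """Conventional Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Values copied from the "No inclusion of random study" block, variance 0.1.
meta_svm = youden_sum(0.828, 0.9843)
meta_logistic = youden_sum(0.1073, 1.0)
print(round(meta_svm, 3), round(meta_logistic, 3))  # 1.812 1.107
```

Because the two definitions differ only by a constant, the ranking of the methods is the same under either one.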
Shown are the brief descriptions of the eight microarray datasets of disease-related binary phenotypes (e.g., case and control). All datasets are publicly available
| Name | Study | Type | # of samples | Control | Case | Reference |
|---|---|---|---|---|---|---|
| TCGA | breast cancer | mRNA | 300 | 234 (ER+) | 66 (ER-) | The Cancer Genome Atlas (TCGA) |
| TCGA | breast cancer | Methylation | 300 | 234 (ER+) | 66 (ER-) | The Cancer Genome Atlas (TCGA) |
| TCGA | breast cancer | CNV | 300 | 234 (ER+) | 66 (ER-) | The Cancer Genome Atlas (TCGA) |
| KangA (batch 1) | IPF | mRNA | 63 | 11 | 52 | Kang |
| KangB (batch 2) | IPF | mRNA | 96 | 21 | 75 | Kang et al. (2012), GSE47460 |
| Konishi | IPF | mRNA | 38 | 15 | 23 | Konishi et al. (2009), GSE10667 |
| Pardo | IPF | mRNA | 24 | 11 | 13 | Pardo et al. (2005), GSE2052 |
Features selected by the Meta-SVM from the multiple omics data
| Four studies of lung disease (IPF) |
| C20orf114 MMP7 CXCL14 AGER TMEM100 THY1 CXCL2 HSD17B6 CCL18 CPA3 GEM |
| LEPREL1 ANXA3 CYP1B1 LRRC32 EMP2 FHL2 ADM C7 ITGA7 IGFBP2 BACE2 FKBP11 |
| RGS5 FCGR3A SRPX FBLN2 HPCAL1 SOX4 CD248 CLDN5 LTBP1 ALOX5AP |
| Three multi-omics data of breast cancer (TCGA) |
| ABCC11 ABCC8 ACOX2 CAMP CST9L GRPR LAMP3 LCN2 LTF |
| MUCL1 NME5 THRSP VTCN1 |
| - This gene set is significantly enriched in the ABC transporters (KEGG) |
| (FDR adjusted |
An algorithm for the meta-analytic SVM via Newton’s method
| Step 1: For 1≤ |
| Step 2: Set |
|
|
| for 0≤ |
| Step 3: Update |
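
As a rough illustration of what a Newton-type SVM solver looks like, the sketch below fits a single-study linear SVM with a squared hinge loss and a ridge penalty. This is a swapped-in, simplified objective: it is not the Meta-SVM objective and not the algorithm tabulated above, whose update formulas are not reproduced here. The function name `newton_l2_svm`, the parameter `lam`, the omission of an intercept, the backtracking step, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def newton_l2_svm(X, y, lam=1.0, max_iter=50, tol=1e-8):
    """Newton iterations for 0.5*lam*||w||^2 + 0.5*sum(max(0, 1 - y*Xw)^2).

    Illustrative only: squared hinge loss, ridge penalty, no intercept.
    """
    n, p = X.shape
    w = np.zeros(p)

    def objective(v):
        slack = np.maximum(0.0, 1.0 - y * (X @ v))
        return 0.5 * lam * (v @ v) + 0.5 * np.sum(slack ** 2)

    for _ in range(max_iter):
        margin = y * (X @ w)
        active = margin < 1.0                    # samples violating the margin
        Xa, ya = X[active], y[active]
        grad = lam * w - Xa.T @ (ya * (1.0 - margin[active]))
        if np.linalg.norm(grad) < tol:
            break
        hess = lam * np.eye(p) + Xa.T @ Xa       # generalized Hessian
        step = np.linalg.solve(hess, grad)
        # Backtracking: halve the step until the objective does not increase.
        t, f0 = 1.0, objective(w)
        while objective(w - t * step) > f0 and t > 1e-10:
            t *= 0.5
        w = w - t * step
    return w


# Synthetic, linearly separable data (assumed for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([2.0, -1.0, 0.0, 0.0, 0.0])
y = np.where(X @ w_true >= 0, 1.0, -1.0)

w_hat = newton_l2_svm(X, y, lam=1.0)
accuracy = np.mean(np.where(X @ w_hat >= 0, 1.0, -1.0) == y)
print(w_hat.shape, accuracy)
```

The squared hinge is used here because it is differentiable, so the gradient and a generalized Hessian are available in closed form; the ridge penalty keeps the Hessian positive definite, which makes each linear solve well posed.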