Chun-Pei Cheng, Christopher DeBoever, Kelly A Frazer, Yu-Cheng Liu, Vincent S Tseng.
Abstract
BACKGROUND: Human disease often arises as a consequence of alterations in a set of associated genes rather than alterations to a set of unassociated individual genes. Most previous microarray-based meta-analyses identified disease-associated genes or biomarkers independent of genetic interactions. Therefore, in this study, we present the first meta-analysis method capable of taking gene combination effects into account to efficiently identify associated biomarkers (ABs) across different microarray platforms.
Year: 2014 PMID: 24909518 PMCID: PMC4068973 DOI: 10.1186/1471-2105-15-173
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Characteristics of microarray datasets used in this study
| Disease | Dataset | GEO accession | Platform | N/T | A/D | Genes | Sequence length (Avg ± SD) | Country |
|---|---|---|---|---|---|---|---|---|
| ESCC | 1-1 | GSE23400 | Affymetrix HG-U133A | 53/53 | 20,133/22,283 | 12,633 | 250 ± 22† | China |
| ESCC | 1-2 | GSE23400 | Affymetrix HG-U133B | 51/51 | 14,110/22,477 | 9,256 | 250 ± 22† | China |
| ESCC | 1-3 | GSE20347 | Affymetrix HG-U133A_2 | 17/17 | 20,133/22,277 | 12,633 | 250 ± 22† | China |
| ESCC | 1-4 | GSE29001 | Affymetrix HG-U133A_2 | 12/12 | 20,133/22,277 | 12,633 | 250 ± 22† | China |
| HCC | 2-1 | GSE14520 | Affymetrix HG-U133A_2 | 19/22 | 20,133/22,277 | 12,633 | 250 ± 22† | China |
| HCC | 2-2 | GSE14520 | Affymetrix HT_HG-U133A | 210/225 | 20,429/22,277 | 12,743 | 440 ± 105† | China |
| HCC | 2-3 | GSE17856 | Agilent 014850 | 44/43 | 20,772/25,073 | 14,312 | 60 ± 0†† | Japan |
ESCC: esophageal squamous cell carcinoma; HCC: hepatocellular carcinoma; N: # of normal samples; T: # of tumor samples; A: # of available probes matched with distinguishable gene IDs in a platform; D: # of downloaded probes contained in a platform; Avg: average; SD: standard deviation; †: Affymetrix probe set-matched target sequence; ††: Agilent spotted sequence.
Figure 1. Comparisons of paired probe sequences in two input sets. A) Average of maximum/mean/minimum similarity scores among different probe sequence sets in the ESCC set. B) Average similarity as a function of maximum/average/minimum scores among different probe sequence sets in the HCC set. C) Distributions of the most similar paired probe sequences (Max group in panel A). D) Distributions of the most similar paired probe sequences (Max group in panel B). Intra: different probes matched with the same gene ID; Inter: different probes matched with different gene IDs.
Example of sequence similarity matrix
| Platform | Probe | Gene | PF1-P1 (G1) | PF1-P2 (G1) | PF1-P3 (G3) | PF2-P1 (G1) | PF2-P2 (G2) | PF3-P1 (G2) | PF3-P2 (G3) |
|---|---|---|---|---|---|---|---|---|---|
| PF1 | P1 | G1 | 1.0 | 0.9 | 0.5 | 0.8 | 0.3 | 0.4 | 0.1 |
| PF1 | P2 | G1 | | 1.0 | 0.4 | 0.8 | 0.2 | 0.3 | 0.2 |
| PF1 | P3 | G3 | | | 1.0 | 0.4 | 0.1 | 0.5 | 0.9 |
| PF2 | P1 | G1 | | | | 1.0 | 0.0 | 0.5 | 0.3 |
| PF2 | P2 | G2 | | | | | 1.0 | 0.8 | 0.4 |
| PF3 | P1 | G2 | | | | | | 1.0 | 0.5 |
| PF3 | P2 | G3 | | | | | | | 1.0 |
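
The example matrix can be used to illustrate the Intra/Inter grouping defined in the Figure 1 caption. The sketch below is one plausible reading, not the authors' implementation: it assumes that every off-diagonal probe pair is compared, that a pair is "intra" when both probes map to the same gene ID and "inter" otherwise, and that the maximum, mean and minimum are taken over each group as a whole. The names `group_scores` and `summarize` are hypothetical.

```python
from statistics import mean

# Probes as (platform, probe, gene), in the row order of the example matrix.
probes = [
    ("PF1", "P1", "G1"), ("PF1", "P2", "G1"), ("PF1", "P3", "G3"),
    ("PF2", "P1", "G1"), ("PF2", "P2", "G2"),
    ("PF3", "P1", "G2"), ("PF3", "P2", "G3"),
]

# Upper triangle of the example matrix, row by row, diagonal included.
upper = [
    [1.0, 0.9, 0.5, 0.8, 0.3, 0.4, 0.1],
    [1.0, 0.4, 0.8, 0.2, 0.3, 0.2],
    [1.0, 0.4, 0.1, 0.5, 0.9],
    [1.0, 0.0, 0.5, 0.3],
    [1.0, 0.8, 0.4],
    [1.0, 0.5],
    [1.0],
]


def group_scores(probes, upper):
    """Split off-diagonal similarity scores into intra- and inter-gene pairs."""
    groups = {"intra": [], "inter": []}
    for i, row in enumerate(upper):
        # row[0] is the diagonal (self-similarity), so it is skipped.
        for offset, score in enumerate(row[1:], start=1):
            j = i + offset
            same_gene = probes[i][2] == probes[j][2]
            groups["intra" if same_gene else "inter"].append(score)
    return groups


def summarize(scores):
    """Return the maximum, mean and minimum of a list of scores."""
    return {"max": max(scores), "mean": mean(scores), "min": min(scores)}


groups = group_scores(probes, upper)
summary = {name: summarize(scores) for name, scores in groups.items()}
```

Under these assumptions, the five intra-gene pairs of the example all score at least 0.8, while no inter-gene pair scores above 0.5.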
Figure 2. Algorithm of identification.
Figure 3. A genetic algorithm selection for improvement.
Figure 4. Examination of improved c-LMs using different TLCs. A) c-LMs derived from the ESCC input set. B) c-LMs derived from the HCC input set. Error bars indicate standard error of the means. TLC: threshold of LLVs.
Figure 5. Testing improved c-LMs built from different numbers of training datasets. A) Accuracy as a function of various ESCC training datasets, grouped by gene numbers. B) Accuracy as a function of various HCC training datasets, grouped by gene numbers. Error bars indicate standard error of the means. TrDS: number of training datasets.
Figure 6. Scalability of improved c-LMs. A) Accuracy as a function of various ESCC training datasets, grouped by gene numbers. B) Accuracy as a function of various HCC training datasets, grouped by gene numbers. C) Average running times of panels A and B. Error bars indicate standard error of the means. Improved: improved c-LMs; Defective: c-LMs obtained by removing, in turn, one of the genes contained in an improved c-LM.
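
The "Defective" construction in the Figure 6 caption (removing one gene at a time from an improved c-LM) can be sketched as follows. This is only an illustration of the leave-one-gene-out idea: the function name `defective_sets` is hypothetical, and the three genes are taken from the ESCC list purely as placeholders, with no claim that they share a c-LM in the study.

```python
def defective_sets(genes):
    """Yield each gene list obtained by dropping exactly one gene, in turn."""
    for i in range(len(genes)):
        yield genes[:i] + genes[i + 1:]


# Placeholder gene combination (assumption, for illustration only).
improved = ["COL3A1", "COL1A2", "FNDC3B"]
defective = list(defective_sets(improved))
```

Each defective set would then be evaluated in the same way as the improved one, so that any drop in accuracy can be attributed to the removed gene.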
Figure 7. Enrichment analysis of cancer-related GO terms. A) Performing the test in the ESCC set. B) Performing the test in the HCC set. ABs: associated biomarkers; Random: randomly selected genes from array platforms. Error bars indicate standard error of the means.
Figure 8. Number of shortest paths among genes in the network. A) Number of shortest paths as a function of different lengths for the ESCC input set. B) Number of shortest paths as a function of different lengths for the HCC input set. ABs: associated biomarkers; Random: randomly selected genes from array platforms.
Figure 9. Distance of shortest paths among genes in the network. The distance of shortest paths among genes for the ESCC and HCC input sets. ABs: associated biomarkers; Random: randomly selected genes from array platforms.
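
The quantities plotted in Figures 8 and 9 can be approximated with a breadth-first search over an interaction network. The sketch below is a minimal illustration under stated assumptions: the network is unweighted and undirected, "number of shortest paths of length k" is read as the number of gene pairs whose shortest-path distance is k, and unreachable pairs are ignored. The toy network and the names `bfs_distances`, `path_length_histogram` and `mean_distance` are hypothetical and do not come from the study.

```python
from collections import Counter, deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path distance (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def path_length_histogram(graph, genes):
    """Count gene pairs by shortest-path length; unreachable pairs are skipped."""
    dist_from = {gene: bfs_distances(graph, gene) for gene in genes}
    hist = Counter()
    for a, b in combinations(genes, 2):
        d = dist_from[a].get(b)
        if d is not None:
            hist[d] += 1
    return hist


def mean_distance(hist):
    """Average shortest-path length over all connected pairs in the histogram."""
    total_pairs = sum(hist.values())
    return sum(d * n for d, n in hist.items()) / total_pairs


# Toy undirected network as adjacency lists (assumption, for illustration only).
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}

hist = path_length_histogram(toy_network, ["A", "C", "D", "E"])
avg = mean_distance(hist)
```

Running the same two functions on a set of ABs and on an equally sized random gene set would give the kind of comparison the captions describe.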
List of ABs observed in multiple improved c-LMs
| Disease | ABs (N) |
|---|---|
| ESCC | COL3A1(10), COL1A2(9), FNDC3B(8), ID4(7), PMEPA1(7), BID(6), CDH11(4), COL11A1(4), ETV5(4), AGRN*(3), DHRS7(3), MMP11(3), TOM1L2(3), VPS8(3), ASXL1(2), C9orf100*(2), CHEK1(2), ECT2(2), FST(2), HMGB3(2), IGF2BP2(2), KIF14(2), KIRREL(2), NETO2(2), PGM5(2), PKD1*(2), PSME4(2), RPS20*(2), RUNX1*(2), STYXL1(2), TCFL5(2), TOB1(2), TSGA14(2) |
| HCC | PPIA(14), CXCL14(8), CAP2(6), CDKN2A(6), ECM1(6), APBA2BP(5), CLEC4M(5), ACLY(4), CXCL12(4), CYP1A2(4), SAC3D1(4), COX8A(3), FOS(3), SHFM1(3), SNRPE(3), SUB1(3), YWHAQ(3), AKR1C3(2), ATP6V0E1(2), BMI1(2), CELSR3*(2), CUL4B(2), DR1(2), FASTK(2), FHIT(2), FLAD1(2), GABRP(2), GM2A(2), GTPBP9(2), HBB(2), ITGA6(2), LOC390998*(2), MEA1(2), MRPS35(2), NPM1(2), PCDHGC3*(2), PDCD5(2), PPP2R5A(2), PRKDC(2), REEP6*(2), SNW1(2), STAB2(2), TMED9(2), TNS1(2), VAMP4(2), XLKD1(2), ZRF1(2) |
ESCC: esophageal squamous cell carcinoma; HCC: hepatocellular carcinoma; N: number of appearances (given in parentheses); *: genes not common across the set of array platforms.