Yi-Pin Lai, Liang-Bo Wang, Wei-An Wang, Liang-Chuan Lai, Mong-Hsun Tsai, Tzu-Pin Lu, Eric Y Chuang.
Abstract
BACKGROUND: With advances in high-throughput technologies, researchers can simultaneously investigate gene expression and copy number alteration (CNA) data from individual patients at lower cost. Traditional methods analyze each type of data separately and integrate the results using Venn diagrams. Challenges arise, however, when the results are irreproducible and inconsistent across multiple platforms. One possible approach to these issues is to analyze gene expression profiles and CNAs from the same individual concurrently.
Keywords: Copy number alteration; Gene expression; R/Bioconductor
Year: 2017 PMID: 28088185 PMCID: PMC5237550 DOI: 10.1186/s12859-016-1438-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The overall workflow of iGC. Two parameters are defined: the minimum CN change required to classify a sample into the G (gain) or L (loss) group, and the minimum proportion of samples in a population that must show a CNA
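The two parameters in the Fig. 1 caption can be illustrated with a minimal sketch. This is not the iGC API: the function names, the symmetric threshold, and the assumption that CN values are ratios centered on zero are all hypothetical choices made for illustration.

```python
from typing import Dict, List


def classify_samples(cn_values: List[float], cn_threshold: float) -> List[str]:
    """Label each sample for one gene as G (gain), L (loss), or N (neutral).

    Assumption: CN values are centered on 0, and the same threshold
    magnitude is used for gains and losses.
    """
    labels = []
    for cn in cn_values:
        if cn >= cn_threshold:
            labels.append("G")
        elif cn <= -cn_threshold:
            labels.append("L")
        else:
            labels.append("N")
    return labels


def cna_proportions(labels: List[str]) -> Dict[str, float]:
    """Return the fraction of samples labelled G and L."""
    n = len(labels)
    return {"G": labels.count("G") / n, "L": labels.count("L") / n}


def passes_proportion_filter(labels: List[str], min_proportion: float) -> bool:
    """Keep a gene only if enough samples show a gain or enough show a loss."""
    props = cna_proportions(labels)
    return props["G"] >= min_proportion or props["L"] >= min_proportion


# Hypothetical CN values for one gene across five samples.
example_labels = classify_samples([0.5, 0.1, -0.4, 0.0, 0.6], cn_threshold=0.3)
print(example_labels)                                 # ['G', 'N', 'L', 'N', 'G']
print(passes_proportion_filter(example_labels, 0.3))  # True (2 of 5 samples gained)
```

Under these assumptions, raising either parameter makes the gene filter stricter: a larger CN threshold moves samples into the N group, and a larger minimum proportion removes genes altered in only a few samples.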
The performance of the iGC and SIM packages in different scenarios
| Scenario | Gene number | Sample size | iGC sensitivity | iGC specificity | SIM sensitivity | SIM specificity |
|---|---|---|---|---|---|---|
| 1 | 100 | 50 | 0.6293 ± 0.075 | 0.8764 ± 0.025 | 0.2582 ± 0.1118 | 0.7527 ± 0.0373 |
| 2 | 100 | 100 | 0.7283 ± 0.0651 | 0.9094 ± 0.0217 | 0.3503 ± 0.0817 | 0.7834 ± 0.0272 |
| 3 | 100 | 200 | 0.807 ± 0.0562 | 0.9357 ± 0.0187 | 0.3766 ± 0.0834 | 0.7922 ± 0.0278 |
| 4 | 100 | 300 | 0.8436 ± 0.0531 | 0.9479 ± 0.0177 | 0.3982 ± 0.0831 | 0.7994 ± 0.0277 |
| 5 | 300 | 50 | 0.6326 ± 0.0426 | 0.8775 ± 0.0142 | 0.2058 ± 0.0592 | 0.7353 ± 0.0197 |
| 6 | 300 | 100 | 0.7287 ± 0.0372 | 0.9096 ± 0.0124 | 0.2735 ± 0.0475 | 0.7578 ± 0.0158 |
| 7 | 300 | 200 | 0.8053 ± 0.0328 | 0.9351 ± 0.0109 | 0.2553 ± 0.0454 | 0.7518 ± 0.0151 |
| 8 | 300 | 300 | 0.8415 ± 0.0313 | 0.9472 ± 0.0104 | 0.2431 ± 0.0425 | 0.7477 ± 0.0142 |
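For readers unfamiliar with the two metrics in the table above, the following sketch shows the standard definitions, sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). It assumes a simulation in which the truly CN-driven genes are known; the function name and the toy labels are hypothetical and are not taken from the study.

```python
from typing import List, Tuple


def sensitivity_specificity(truth: List[bool], predicted: List[bool]) -> Tuple[float, float]:
    """Compute sensitivity and specificity from per-gene true and predicted labels.

    truth[i] is True if gene i is truly CN-driven in the simulation;
    predicted[i] is True if the method reported gene i as significant.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true positive gene and one true negative gene")
    return tp / (tp + fn), tn / (tn + fp)


# Toy example with five genes (hypothetical labels).
truth = [True, True, True, False, False]
predicted = [True, False, True, False, True]
sens, spec = sensitivity_specificity(truth, predicted)
print(round(sens, 4), round(spec, 4))  # 0.6667 0.5
```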
Fig. 2 The distributions of p-values obtained from the iGC and SIM packages under different scenarios. Four sample sizes (N = 50, 100, 200 and 300) and two gene numbers (N = 100 for (a) and (b); N = 300 for (c) and (d)) were simulated. Four groups with different Pearson correlation coefficients between CN and GE are shown in different colors: red, r = 0; blue, r = 0-0.3; green, r = 0.3-0.7; orange, r = 0.7-1. Each group contains the same number of genes, and the four groups are sorted by Pearson correlation coefficient
The top three significant genes with copy number gain or loss in the TCGA dataset
| Genes | GE mean gain | GE mean loss | GE mean neutral | GE mean diff. | CNA prop. gain | CNA prop. loss | P-value | FDRᵃ |
|---|---|---|---|---|---|---|---|---|
| GNPAT (G) | 0.372 | −1.048 | −0.200 | 0.601 | 0.558 | 0.015 | 3.01E-61 | 3.43E-58 |
| SETDB1 (G) | 0.505 | NA | −0.056 | 0.562 | 0.556 | 0 | 4.55E-58 | 2.60E-55 |
| ANGEL2 (G) | 0.588 | −0.664 | 0.032 | 0.577 | 0.549 | 0.013 | 1.30E-55 | 4.93E-53 |
| GSTM1 (L) | 0.588 | −0.961 | 0.026 | −1.281 | 0.310 | 0.409 | 1.02E-33 | 5.10E-31 |
| TOX (L) | −2.599 | −3.237 | −2.455 | −0.777 | 0.023 | 0.337 | 1.64E-19 | 4.10E-17 |
| LYN (L) | 0.224 | −0.134 | 0.279 | −0.410 | 0.034 | 0.314 | 4.16E-19 | 6.92E-17 |
GE gene expression, CNA copy number alteration, Diff difference, Prop proportion, FDR false discovery rate, NA not available
ᵃ Genes were ordered based on the FDR values
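The record does not say which procedure produced the FDR column. As one common possibility, the sketch below applies the Benjamini-Hochberg adjustment to a list of p-values; treating the table's FDR as Benjamini-Hochberg is an assumption, and the function name and input values are hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the original input order.

    Assumption: this is one standard FDR procedure, not necessarily the one
    used to produce the FDR values in the table.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
print([round(q, 4) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.5])])
# [0.04, 0.0533, 0.0533, 0.5]
```

An adjusted value is never smaller than the raw p-value it came from, which matches the pattern in the table, where each FDR value is larger than the value in the preceding column.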
Fig. 3 Pearson correlation coefficients between GE and CN in the TCGA breast cancer dataset in (a) a Gaussian density plot and (b) a boxplot. Four conditions were evaluated: I) the whole set of genes on the microarray, II) the subset of genes located in the CNA regions, III) the genes identified by the Venn diagram method, and IV) the genes identified by the iGC package (* P < 0.001)
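The per-gene quantity summarized in this figure is the Pearson correlation coefficient between GE and CN across samples. A minimal sketch of that calculation follows, using the standard formula; the function name and the toy values are hypothetical and do not come from the TCGA data.

```python
import math
from typing import List


def pearson_r(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy CN and GE values for one gene across four samples (hypothetical).
cn = [1.0, 2.0, 3.0, 4.0]
ge = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(cn, ge))  # 1.0
```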
Fig. 4 The correlation between GE and CN for the gene GSTM1 in the TCGA breast cancer dataset, presented as (a) a scatter plot and (b) a boxplot. L, CN loss; N, no gain or loss in CN; G, CN gain
Fig. 5 Pearson correlation coefficients between GE and CN in the lung adenocarcinoma dataset in (a) a Gaussian density plot and (b) a boxplot. Three conditions were evaluated: I) the whole set of genes on the microarray, II) the subset of genes located in the CNA regions, and III) the genes identified by the iGC package (* P < 0.001). Conditions IV and V were split from condition III, where IV) contained genes with positive correlations between GE and CNA and V) contained genes with negative correlations
The three most significant genes with copy number gain or loss in the lung adenocarcinoma dataset
| Genes | GE mean gain | GE mean loss | GE mean neutral | GE mean diff. | CNA prop. gain | CNA prop. loss | P-value | FDRᵃ |
|---|---|---|---|---|---|---|---|---|
| EIF1AX (G) | 8.798 | 9.029 | 8.060 | 0.731 | 0.275 | 0.005 | 4.23E-21 | 1.21E-18 |
| RAP2C (G) | 7.599 | NA | 7.093 | 0.505 | 0.285 | 0.000 | 3.33E-12 | 4.78E-10 |
| ALAS2 (G) | 5.765 | NA | 6.213 | −0.448 | 0.347 | 0.000 | 1.64E-11 | 1.18E-09 |
| RPS4Y1 (L) | NA | 6.507 | 9.472 | −2.965 | 0.000 | 0.383 | 6.16E-28 | 1.54E-26 |
| TTTY15 (L) | 5.988 | 4.463 | 4.955 | −0.510 | 0.010 | 0.420 | 1.51E-17 | 1.88E-16 |
| PRKY (L) | 6.408 | 4.800 | 5.154 | −0.363 | 0.005 | 0.358 | 7.81E-17 | 6.51E-16 |
GE gene expression, CNA copy number alteration, Diff difference, Prop proportion, FDR false discovery rate, NA not available
ᵃ Genes were ordered based on the FDR values