Robin Li (1,2), Xiao Lin (1), Haijiang Geng (1), Zhihui Li (1), Jiabing Li (1), Tao Lu (1,2), Fangrong Yan (1,2).
Abstract
BACKGROUND: Personalized cancer treatments depend on determining a patient's genetic status against known genetic profiles for which targeted treatments exist. Such genetic profiles must be scientifically validated before they are applied to the general patient population. Reproducibility of the findings that support these profiles is a fundamental challenge in validation studies. The percentage of overlapping genes (POG) criterion and its derivative methods produce unstable and misleading results. Furthermore, in a complex disease, comparisons between different tumor subtypes can produce high POG scores that do not reflect functional consistency.
Keywords: cancer genomics; gene expression; overlapping genes; pagerank; reproducibility
Year: 2015 PMID: 26556852 PMCID: PMC4792587 DOI: 10.18632/oncotarget.5987
Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553
POG scores of 12 experiments using 8 datasets
| Experiment (GPL6244) | GEO accession | POG12 | POG21 | Experiment (GPL570) | GEO accession | POG12 | POG21 |
|---|---|---|---|---|---|---|---|
| Experiment 1.1 | 1. GSE36295 | 0.32 | 0.36 | Experiment 2.1 | 1. GSE18842 | 0.39 | 0.38 |
| Experiment 1.2 | 1. GSE25401 | 0.08 | 0.09 | Experiment 2.2 | 1. GSE30999 | 0.06 | 0.06 |
| Experiment 1.3 | 1. GSE25401 | 0.09 | 0.08 | Experiment 2.3 | 1. GSE18842 | 0.06 | 0.07 |
| Experiment 1.4 | 1. GSE25401 | 0.10 | 0.10 | Experiment 2.4 | 1. GSE19804 | 0.04 | 0.04 |
| Experiment 1.5 | 1. GSE28686 | 0.06 | 0.05 | Experiment 2.5 | 1. GSE18842 | 0.12 | 0.12 |
| Experiment 1.6 | 1. GSE28686 | 0.06 | 0.06 | Experiment 2.6 | 1. GSE19804 | 0.04 | 0.04 |
POG12 score: reproducibility of the differentially expressed gene (DEG) list detected in dataset 2 when evaluated in dataset 1.
POG21 score: reproducibility of the DEG list detected in dataset 1 when evaluated in dataset 2.
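The POG scores above can be illustrated with a small function. This is a minimal sketch that assumes a common directional definition: the share of genes in one DEG list that also appear in the other list with the same direction of change, divided by the length of the list being evaluated. The source does not give the formula, so the function name `pog`, the gene names, and the denominator convention are all assumptions.

```python
# Directional percentage of overlapping genes (POG): a minimal sketch.
# Assumption: a DEG list maps gene -> direction (+1 up, -1 down) and the
# denominator is the size of the list being evaluated (hypothetical convention).
from typing import Dict


def pog(source: Dict[str, int], target: Dict[str, int]) -> float:
    """Fraction of genes in `source` found in `target` with the same direction."""
    if not source:
        return 0.0
    shared = sum(1 for gene, sign in source.items() if target.get(gene) == sign)
    return shared / len(source)


if __name__ == "__main__":
    # Hypothetical DEG lists; gene names and directions are made up.
    list1 = {"TP53": 1, "EGFR": 1, "MYC": -1, "KRAS": 1}
    list2 = {"TP53": 1, "EGFR": -1, "MYC": -1}
    print(pog(list1, list2))  # 0.5: TP53 and MYC agree, EGFR flips, KRAS absent
    print(pog(list2, list1))  # about 0.667: 2 of 3 genes agree
```

Swapping the arguments gives two asymmetric scores, which is why each experiment reports two POG values; which call order corresponds to the POG12 and POG21 labels depends on the convention adopted.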
P-value of rank-sum of overlapping genes (PRSOG) for all 12 experiments at correlation coefficient 0.7
| Experiment | Mean POG | RSOG | RSOG distribution mean | RSOG distribution SD | PRSOG | Experiment | Mean POG | RSOG | RSOG distribution mean | RSOG distribution SD | PRSOG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1.1 | 0.34 | 0.26 | 0.21 | 0.0070 | 1.11 × 10⁻¹⁶ | 2.1 | 0.38 | 0.23 | 0.24 | 0.0067 | 0.88 |
| 1.2 | 0.085 | 0.047 | 0.046 | 0.0009 | 0.19 | 2.2 | 0.060 | 0.030 | 0.033 | 0.0022 | 0.84 |
| 1.3 | 0.085 | 0.045 | 0.046 | 0.0005 | 0.94 | 2.3 | 0.065 | 0.031 | 0.034 | 0.0024 | 0.83 |
| 1.4 | 0.100 | 0.051 | 0.053 | 0.0004 | 0.99 | 2.4 | 0.040 | 0.019 | 0.019 | 0.0020 | 0.58 |
| 1.5 | 0.055 | 0.030 | 0.029 | 0.0027 | 0.47 | 2.5 | 0.120 | 0.053 | 0.063 | 0.0035 | 0.99 |
| 1.6 | 0.060 | 0.027 | 0.030 | 0.0031 | 0.81 | 2.6 | 0.040 | 0.015 | 0.019 | 0.0020 | 0.95 |
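Each PRSOG compares the observed RSOG with the mean and SD of its resampled distribution. The source does not say exactly how the p-value is derived from the resamples, so the sketch below assumes the resampled distribution is approximately normal and converts the three numbers into an upper-tail p-value. The function name `prsog_normal` is hypothetical, and because the tabulated inputs are rounded, recomputed values will only roughly match the PRSOG column.

```python
# Upper-tail p-value of an observed RSOG under a normal approximation.
# Assumption (not stated in the source): the resampled RSOG distribution is
# summarized by its mean and SD and treated as normal.
import math


def prsog_normal(rsog, null_mean, null_sd):
    """P(X >= rsog) for X ~ Normal(null_mean, null_sd**2)."""
    z = (rsog - null_mean) / null_sd
    # 0.5 * erfc(z / sqrt(2)) keeps precision far into the upper tail,
    # whereas computing 1 - Phi(z) loses it once Phi(z) is close to 1.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Rounded values from rows 1.1 and 2.1 of the table above.
    print(prsog_normal(0.26, 0.21, 0.0070))  # far below 0.05: z is about 7.1
    print(prsog_normal(0.23, 0.24, 0.0067))  # above 0.5: RSOG is below the null mean
```

In double precision, 1 − Φ(z) can only return 0 or a multiple of 2⁻⁵³ ≈ 1.11 × 10⁻¹⁶ once Φ(z) is close to 1. If the extreme entries such as 0, 1.11 × 10⁻¹⁶ and 2.22 × 10⁻¹⁶ were computed that way, they are best read as "below about 10⁻¹⁵" rather than as exact values.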
P-value of rank-sum of overlapping genes (PRSOG) for experiment 1.1 (breast cancer, left) and experiment 2.1 (different subtypes of non-small cell lung cancer, right) at different correlation coefficients
| Correlation coefficient | Mean POG | RSOG | RSOG distribution mean | RSOG distribution SD | PRSOG | Correlation coefficient | Mean POG | RSOG | RSOG distribution mean | RSOG distribution SD | PRSOG |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.34 | 0.24 | 0.21 | 0.0046 | 2.22 × 10⁻¹⁶ | 0.5 | 0.38 | 0.24 | 0.24 | 0.0043 | 0.81 |
| 0.6 | 0.34 | 0.26 | 0.21 | 0.0056 | 0 | 0.6 | 0.38 | 0.24 | 0.24 | 0.0053 | 0.79 |
| 0.7 | 0.34 | 0.26 | 0.21 | 0.0070 | 1.11 × 10⁻¹⁶ | 0.7 | 0.38 | 0.23 | 0.24 | 0.0067 | 0.88 |
| 0.8 | 0.34 | 0.28 | 0.21 | 0.0097 | 3.32 × 10⁻¹³ | 0.8 | 0.38 | 0.22 | 0.24 | 0.0093 | 0.95 |
| 0.9 | 0.34 | 0.32 | 0.21 | 0.0271 | 1.05 × 10⁻⁵ | 0.9 | 0.38 | 0.18 | 0.24 | 0.0268 | 0.98 |
Figure 1. Effect of the correlation coefficient on RSOG, the mean and standard deviation of the RSOG distribution, and PRSOG
The x-axis is the correlation coefficient from 0.5 to 0.9 in increments of 0.1; the y-axis is the RSOG, the mean or standard deviation of the RSOG distribution, or the PRSOG of the 12 experiments.
Results of fitting power-law, log-normal, and exponential distributions with correlation coefficient 0.7
| Experiment | Power-law Xmin | Power-law parameter | Power-law K-S p | Log-normal Xmin | Log-normal parameter 1 | Log-normal parameter 2 | Log-normal K-S p | Exponential Xmin | Exponential parameter | Exponential K-S p |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.1 | 0.00141 | 65.0 | 0.067 | 0.00134 | −6.587 | 0.0247 | 0.037 | 0.00144 | 0.000742 | 0.667 |
| 1.2 | 0.00066 | 192.0 | 0.054 | 0.00066 | −7.320 | 0.0083 | 0.032 | 0.00069 | 0.000625 | 0.667 |
| 1.3 | 0.00059 | 526.0 | 0.093 | 0.00058 | −7.439 | 0.0051 | 0.084 | 0.00061 | 0.000577 | 0.667 |
| 1.4 | 0.00062 | 620.0 | 0.122 | 0.00062 | −7.381 | 0.0041 | 0.101 | 0.00063 | 0.000615 | 0.667 |
| 1.5 | 0.00115 | 99.9 | 0.086 | 0.00093 | −6.870 | 0.0700 | 0.053 | 0.00118 | 0.000648 | 0.667 |
| 1.6 | 0.00141 | 65.0 | 0.067 | 0.00134 | −6.587 | 0.0247 | 0.037 | 0.00144 | 0.000742 | 0.667 |
| 2.1 | 0.00141 | 19.0 | 0.082 | 0.00125 | −6.631 | 0.0962 | 0.029 | 0.00162 | 0.000693 | 0.667 |
| 2.2 | 0.00099 | 53.4 | 0.075 | 0.00097 | −6.922 | 0.0268 | 0.029 | 0.00105 | 0.000619 | 0.667 |
| 2.3 | 0.00089 | 177.0 | 0.056 | 0.00087 | −7.033 | 0.0117 | 0.048 | 0.00092 | 0.000683 | 0.667 |
| 2.4 | 0.00095 | 141.0 | 0.062 | 0.00094 | −6.963 | 0.0114 | 0.041 | 0.00098 | 0.000664 | 0.667 |
| 2.5 | 0.00121 | 36.2 | 0.089 | 0.00113 | −6.750 | 0.0500 | 0.038 | 0.00129 | 0.000631 | 0.667 |
| 2.6 | 0.00117 | 47.5 | 0.065 | 0.00097 | −6.859 | 0.0726 | 0.0009 | 0.00123 | 0.000607 | 0.667 |
K-S p: p-value of the Kolmogorov-Smirnov test, which compares a sample with a reference probability distribution (or two samples with each other);
K-S p-value significant at the 0.05 level;
K-S p-value significant at the 0.01 level.
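The K-S columns measure how well each fitted model matches the empirical distribution of values above Xmin. The sketch below assumes continuous maximum-likelihood fits in the style of Clauset, Shalizi and Newman (2009), a fixed Xmin, and a shifted exponential. These parameterizations may differ from those behind the tabulated "parameter" columns. It computes only the K-S distance D; the p-values in the table would additionally need, for example, a bootstrap over synthetic samples drawn from the fitted model. All function names are hypothetical.

```python
# K-S distance between a tail sample (x >= xmin) and fitted continuous models.
# Assumptions: xmin is given, not estimated; MLE fits follow Clauset et al.
# (2009); no p-value is computed.
import math


def ks_distance(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d


def fit_exponential(tail, xmin):
    """MLE rate for p(x) = lam * exp(-lam * (x - xmin)), x >= xmin."""
    lam = len(tail) / sum(x - xmin for x in tail)
    return lam, (lambda x: 1.0 - math.exp(-lam * (x - xmin)))


def fit_power_law(tail, xmin):
    """MLE exponent for p(x) proportional to x**(-alpha), x >= xmin."""
    alpha = 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)
    return alpha, (lambda x: 1.0 - (x / xmin) ** (1.0 - alpha))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    xmin = 1e-3
    # Synthetic exponential tail; real input would be PageRank scores >= xmin.
    tail = [xmin + rng.expovariate(1500.0) for _ in range(2000)]
    for name, fit in [("exponential", fit_exponential), ("power-law", fit_power_law)]:
        param, cdf = fit(tail, xmin)
        print(name, round(param, 3), round(ks_distance(tail, cdf), 4))
```

On this synthetic sample the exponential fit gives the smaller D, as expected. A smaller D corresponds to a larger K-S p-value.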
Figure 2. Mean and standard deviation of p-values over 10,000 resampling procedures of 20 genes selected randomly from experiments 1.1 and 2.1 (correlation coefficient 0.7)
Figure 3. Components of overlapping genes in the 12 experiments (correlation coefficient 0.7), comparing experiments 1.1 and 2.1
Summary of the datasets obtained from GEO
| GEO ID | Disease | Tissue | Sample size | Platform ID |
|---|---|---|---|---|
| GSE36295 | Breast cancer | Breast tissues | 53 | GPL6244 |
| GSE39004 | Breast cancer | Breast tissues | 180 | GPL6244 |
| GSE25401 | Human obesity | Adipose tissue | 56 | GPL6244 |
| GSE28686 | Illicit methcathinone | Blood tissue | 40 | GPL6244 |
| GSE18842 | Lung cancer | Lung tissue | 91 | GPL570 |
| GSE19804 | Lung cancer | Lung tissue | 120 | GPL570 |
| GSE30999 | Psoriasis | Skin biopsy | 170 | GPL570 |
| GSE19743 | Burn injury | Blood sample | 177 | GPL570 |
Figure 4. The PRSOG process in non-small cell lung cancer
a. The two lung images represent two studies of non-small cell lung cancer by different labs; experiment 2.1 assesses the reproducibility of these two studies. b. The RNA expression data of both studies were generated on the same platform to ensure the same gene background. c. Significant genes (empty circles) in each dataset are identified with SAM (Significance Analysis of Microarrays), with the number of significant genes fixed at 1,000. d. Blue circles mark reproducible genes, i.e., significant genes found in both studies. e. A network of this gene pool is built from correlation coefficients. f. Every gene is ranked by PageRank; a warmer color indicates a more important role in the network. g. Given k overlapping genes in the gene pool, k genes are repeatedly resampled from the gene pool to build the distribution of rank sums of k genes, and the p-value of the rank sum of the k overlapping genes is calculated from that distribution. h. Genes in the gene pool are classified into three groups by rank value.
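Steps d to g of this process can be sketched end to end in plain Python, taking the two significant-gene lists as input. The source does not specify how edges are defined from the correlation coefficient, the PageRank damping factor, or the exact p-value formula. The choices below are assumptions: an edge wherever |r| ≥ `threshold`, damping 0.85, and an empirical upper tail with a +1 correction. The helper names `pearson`, `correlation_network`, `pagerank` and `prsog` are hypothetical.

```python
# PRSOG pipeline sketch (Figure 4, steps d-g). Hypothetical helper names.
# Assumptions: edges join genes with |Pearson r| >= threshold, PageRank uses
# damping 0.85, and the p-value is an empirical upper tail with a +1 correction.
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length expression vectors."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va > 0 and vb > 0 else 0.0


def correlation_network(expr, threshold):
    """Undirected graph as adjacency sets: gene -> genes with |r| >= threshold."""
    genes = list(expr)
    adj = {g: set() for g in genes}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                adj[g].add(h)
                adj[h].add(g)
    return adj


def pagerank(adj, damping=0.85, tol=1e-12, max_iter=1000):
    """Power-iteration PageRank; isolated genes spread their score uniformly."""
    n = len(adj)
    pr = {g: 1.0 / n for g in adj}
    for _ in range(max_iter):
        dangling = sum(pr[g] for g in adj if not adj[g])
        new = {g: (1 - damping + damping * dangling) / n for g in adj}
        for g, nbrs in adj.items():
            for h in nbrs:
                new[h] += damping * pr[g] / len(nbrs)
        delta = sum(abs(new[g] - pr[g]) for g in adj)
        pr = new
        if delta < tol:
            break
    return pr


def prsog(expr, deg1, deg2, threshold=0.7, n_resamples=10_000, seed=0):
    """Return (RSOG, p-value) for the genes shared by two DEG lists."""
    pool = sorted(set(deg1) | set(deg2))  # gene pool
    overlap = set(deg1) & set(deg2)  # reproducible genes (step d)
    net = correlation_network({g: expr[g] for g in pool}, threshold)  # step e
    rank = pagerank(net)  # step f
    rsog = sum(rank[g] for g in overlap)  # rank sum of overlapping genes
    rng = random.Random(seed)
    k = len(overlap)
    hits = sum(  # step g: resample k genes from the pool
        sum(rank[g] for g in rng.sample(pool, k)) >= rsog
        for _ in range(n_resamples)
    )
    return rsog, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    base = [rng.gauss(0, 1) for _ in range(30)]
    # Toy data: five co-expressed "hub" genes plus 15 noise genes (hypothetical).
    expr = {f"hub{i}": [x + rng.gauss(0, 0.3) for x in base] for i in range(5)}
    expr.update({f"g{i}": [rng.gauss(0, 1) for _ in range(30)] for i in range(15)})
    deg1 = [f"hub{i}" for i in range(5)] + [f"g{i}" for i in range(8)]
    deg2 = [f"hub{i}" for i in range(5)] + [f"g{i}" for i in range(8, 15)]
    print(prsog(expr, deg1, deg2, n_resamples=2000))
```

With this toy input the five co-expressed genes form a clique and collect most of the PageRank mass, so the RSOG is large and the p-value is small. Under the edge assumption above, `threshold` plays the role of the correlation coefficient varied in the tables. Raising it prunes edges and changes the PageRank scores.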
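Definitions of power-law, exponential and log-normal distributions, each of the form $p(x) = C f(x)$ for $x \ge x_{\min}$
| Name | $f(x)$ | $C$ |
|---|---|---|
| Power-law | $x^{-\alpha}$ | $(\alpha - 1)\, x_{\min}^{\alpha - 1}$ |
| Exponential | $e^{-\lambda x}$ | $\lambda e^{\lambda x_{\min}}$ |
| Log-normal | $\frac{1}{x} \exp\left[-\frac{(\ln x - \mu)^2}{2\sigma^2}\right]$ | $\sqrt{\frac{2}{\pi\sigma^2}} \left[\operatorname{erfc}\left(\frac{\ln x_{\min} - \mu}{\sqrt{2}\,\sigma}\right)\right]^{-1}$ |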
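The normalizing constant C makes each density integrate to 1 over x ≥ Xmin. The sketch below checks this numerically. It assumes the standard continuous forms used by Clauset, Shalizi and Newman (2009); the function names and example parameters are hypothetical.

```python
# Normalized densities p(x) = C * f(x) on x >= xmin, with a numerical check
# that each integrates to 1. Assumes the standard continuous forms (Clauset,
# Shalizi and Newman, 2009); function names and parameters are hypothetical.
import math


def power_law_pdf(x, xmin, alpha):
    # C = (alpha - 1) * xmin**(alpha - 1), f(x) = x**(-alpha); needs alpha > 1
    return (alpha - 1) * xmin ** (alpha - 1) * x ** (-alpha)


def exponential_pdf(x, xmin, lam):
    # Equals lam * e^(lam*xmin) * e^(-lam*x), rewritten so e^(lam*xmin)
    # cannot overflow on its own
    return lam * math.exp(-lam * (x - xmin))


def lognormal_pdf(x, xmin, mu, sigma):
    # C = sqrt(2 / (pi sigma^2)) / erfc((ln xmin - mu) / (sqrt(2) sigma))
    c = math.sqrt(2 / (math.pi * sigma ** 2)) / math.erfc(
        (math.log(xmin) - mu) / (math.sqrt(2) * sigma)
    )
    return c / x * math.exp(-((math.log(x) - mu) ** 2) / (2 * sigma ** 2))


def integrate_log_grid(pdf, xmin, xmax, n=100_000):
    """Trapezoid rule for the integral of pdf(x) dx over [xmin, xmax],
    taken in t = ln x so that dx = x dt."""
    a, b = math.log(xmin), math.log(xmax)
    h = (b - a) / n
    total = 0.0
    for i in range(n + 1):
        x = math.exp(a + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * pdf(x) * x
    return total * h


if __name__ == "__main__":
    xmin = 1.0
    densities = {
        "power-law": lambda x: power_law_pdf(x, xmin, alpha=2.5),
        "exponential": lambda x: exponential_pdf(x, xmin, lam=2.0),
        "log-normal": lambda x: lognormal_pdf(x, xmin, mu=0.0, sigma=1.0),
    }
    for name, pdf in densities.items():
        # Each value should be close to 1 (the mass beyond xmax is negligible).
        print(name, round(integrate_log_grid(pdf, xmin, 1e6), 6))
```

Integrating in ln x rather than x keeps the grid fine near Xmin, where all three densities are largest, while still reaching far into the power-law tail.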