Inna Chervoneva, Yanyan Li, Stephanie Schulz, Sean Croker, Chantell Wilson, Scott A Waldman, Terry Hyslop.
Abstract
BACKGROUND: Normalization in real-time qRT-PCR is necessary to compensate for experimental variation. A popular normalization strategy employs reference gene(s), which may introduce additional variability into normalized expression levels due to innate variation (between tissues, individuals, etc.). To minimize this innate variability, multiple reference genes are used. Current methods of selecting reference genes assume that their innate variation is independent. This assumption is not always justified, which may lead to selecting a suboptimal set of reference genes.
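As a minimal illustration of reference-gene normalization, the sketch below divides a target gene's expression level by a normalizing factor taken as the geometric mean of the reference genes' levels in the same sample. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import math

def normalizing_factor(ref_levels):
    """Geometric mean of the reference-gene expression levels of one sample."""
    # The log of a geometric mean is the arithmetic mean of the logs.
    return math.exp(sum(math.log(x) for x in ref_levels) / len(ref_levels))

def normalize(target_level, ref_levels):
    """Target expression level relative to the normalizing factor."""
    return target_level / normalizing_factor(ref_levels)

# Hypothetical sample: two reference genes at levels 2 and 8, target at 12.
nf = normalizing_factor([2.0, 8.0])
normalized = normalize(12.0, [2.0, 8.0])
```

Any variability in the normalizing factor carries over into the normalized value, which is why the variance of the log normalizing factor is the quantity tabulated throughout.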
Year: 2010 PMID: 20470420 PMCID: PMC2889935 DOI: 10.1186/1471-2105-11-253
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Breast tumor data: 95% UCL vs. the average overall rank of the normalizing factors. Each point represents one of the possible 63 = 2^6 - 1 gene subsets. Different colors are used for the subsets with different numbers of genes included. The x-coordinate is the average overall rank of the corresponding normalizing factor variance. The y-coordinate is the upper 95% confidence limit (95% UCL) for the standard deviation of the log normalizing factor. The red dot, which is closest to the lower left corner, represents the optimal (in the sense of criteria A and C) combination of two genes, ACTB and SF3A1.
Figure 2. Neuroblastoma data: 95% UCL vs. the average overall rank of the normalizing factors. Each point represents one of the possible 1023 = 2^10 - 1 gene subsets. Different colors are used for the subsets with different numbers of genes included. The x-coordinate is the average overall rank of the corresponding normalizing factor variance. The y-coordinate is the upper 95% confidence limit (95% UCL) for the standard deviation of the log normalizing factor. Only sets with average rank less than 200 are shown on the plot.
Figure 3. Blood data: 95% UCL vs. the average overall rank of the normalizing factors (Ct numbers). Each point represents one of the possible 31 = 2^5 - 1 gene subsets. Different colors are used for the subsets with different numbers of genes included. The x-coordinate is the average overall rank of the corresponding normalizing factor variance. The y-coordinate is the upper 95% confidence limit (95% UCL) for the standard deviation of the log normalizing factor.
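The figures plot one point per non-empty subset of the candidate genes (for example, 63 subsets for six genes). A rough sketch of such an exhaustive search follows: it enumerates the subsets and computes, for each, the sample variance across samples of the log geometric mean. The toy data, the function names, and the use of a plain sample variance in place of the paper's model-based estimate are assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean, variance

def nonempty_subsets(genes):
    """Yield every non-empty subset of `genes` as a tuple."""
    for size in range(1, len(genes) + 1):
        yield from combinations(genes, size)

def log_gm_variance(log_expr, subset):
    """Sample variance, across samples, of the log geometric mean of `subset`.

    `log_expr` maps gene name -> list of log expression values (one per sample).
    The log geometric mean is the arithmetic mean of the log values.
    """
    n_samples = len(next(iter(log_expr.values())))
    log_gm = [mean(log_expr[g][i] for g in subset) for i in range(n_samples)]
    return variance(log_gm)

# Hypothetical toy data: 3 genes, 5 samples. Genes A and B vary in opposite directions.
toy = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9],
    "B": [2.0, 1.6, 2.4, 1.8, 2.2],
    "C": [0.5, 1.5, 0.2, 1.9, 0.9],
}
ranked = sorted(nonempty_subsets(list(toy)), key=lambda s: log_gm_variance(toy, s))
best = ranked[0]
```

In this toy data the pair (A, B) has a lower variance than either gene alone because the two genes are negatively correlated, which is the kind of dependence an independence assumption would miss.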
Pearson correlation matrix of the residuals from model (8) fitted to the data from 80 breast tumor samples
| | | ACTB | GAPDH | MRPL | PSMC4 | PUM |
|---|---|---|---|---|---|---|
| GAPDH | Coeff.¹ | -0.112 | | | | |
| | p-value | 0.324 | | | | |
| MRPL | Coeff.¹ | | 0.021 | | | |
| | p-value | <.0001 | 0.851 | | | |
| PSMC4 | Coeff.¹ | | -0.108 | 0.014 | | |
| | p-value | 0.028 | 0.340 | 0.903 | | |
| PUM | Coeff.¹ | 0.077 | | | | |
| | p-value | 0.496 | 0.001 | <.0001 | <.0001 | |
| SF3A1 | Coeff.¹ | 0.147 | -0.086 | | | 0.160 |
| | p-value | 0.194 | 0.447 | <.0001 | 0.005 | 0.156 |
¹ Pearson correlation coefficient, with the p-value for testing that it is zero
Breast tumor data: Top ranked by set size bootstrap 95% upper confidence limit (UCL) for the variance and standard deviation of the log geometric mean (GM).
| Set Size(*) | ACTB | GAPDH | MRPL19 | PSMC4 | PUM | SF3A1 | 95% UCL Var(GM) | 95% UCL StdDev(GM) |
|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0.407 | 0.638 |
| 3 | 1 | 0 | 0 | 0 | 1 | 1 | 0.356 | 0.596 |
| 4 | 1 | 1 | 0 | 0 | 1 | 1 | 0.398 | 0.631 |
| 5 | 1 | 1 | 0 | 1 | 1 | 1 | 0.429 | 0.655 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 | 0.465 | 0.682 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
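One way to obtain an upper confidence limit of this kind is a percentile bootstrap over samples, sketched below. The resampling scheme, the percentile method, the function name, and the toy data are all assumptions; the paper's exact bootstrap procedure is not described in this section. The standard deviation limit is taken as the square root of the variance limit, which is consistent with the two UCL columns in the table.

```python
import math
import random
from statistics import mean, variance

def bootstrap_ucl_variance(log_expr, subset, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap upper confidence limit for Var(log GM) of `subset`."""
    rng = random.Random(seed)
    n_samples = len(next(iter(log_expr.values())))
    # Log geometric mean per sample = mean of the subset genes' log values.
    log_gm = [mean(log_expr[g][i] for g in subset) for i in range(n_samples)]
    boot_vars = []
    for _ in range(n_boot):
        # Resample the samples with replacement and recompute the variance.
        resample = [log_gm[rng.randrange(n_samples)] for _ in range(n_samples)]
        boot_vars.append(variance(resample))
    boot_vars.sort()
    return boot_vars[min(n_boot - 1, int(level * n_boot))]

# Hypothetical toy data (not from the paper): 2 genes, 5 samples.
toy = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9],
    "B": [2.0, 1.6, 2.4, 1.8, 2.2],
}
ucl_var = bootstrap_ucl_variance(toy, ("A", "B"))
ucl_sd = math.sqrt(ucl_var)
```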
Breast tumor data: Ten gene subsets with the smallest mean overall ranks of the variance of the log geometric mean (GM).
| Set Size(*) | ACTB | GAPDH | MRPL19 | PSMC4 | PUM | SF3A1 | Mean rank of Var(GM) |
|---|---|---|---|---|---|---|---|
| 3 | 1 | 0 | 0 | 0 | 1 | 1 | 2.0 |
| 2 | 1 | 0 | 0 | 0 | 1 | 0 | 3.2 |
| 2 | 0 | 0 | 0 | 0 | 1 | 1 | 4.7 |
| 4 | 1 | 0 | 0 | 1 | 1 | 1 | 6.1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 6.4 |
| 3 | 1 | 0 | 0 | 1 | 0 | 1 | 8.1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 1 | 8.8 |
| 4 | 1 | 0 | 1 | 0 | 1 | 1 | 9.2 |
| 4 | 1 | 1 | 0 | 0 | 1 | 1 | 10.3 |
| 3 | 1 | 0 | 1 | 0 | 0 | 1 | 10.6 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
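A mean overall rank of this kind could be computed by ranking all subsets within each bootstrap replicate and averaging the ranks over replicates. The sketch below does that on toy data; the resampling scheme, the tie-breaking by enumeration order, and all names are assumptions rather than the paper's procedure.

```python
import random
from itertools import combinations
from statistics import mean, variance

def mean_ranks(log_expr, n_boot=200, seed=0):
    """Average rank (1 = smallest variance) of each gene subset over bootstrap replicates."""
    rng = random.Random(seed)
    genes = list(log_expr)
    n_samples = len(log_expr[genes[0]])
    subsets = [s for k in range(1, len(genes) + 1) for s in combinations(genes, k)]
    rank_sums = {s: 0 for s in subsets}
    for _ in range(n_boot):
        # Resample the samples with replacement.
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]
        var_by_subset = {
            s: variance([mean(log_expr[g][i] for g in s) for i in idx])
            for s in subsets
        }
        # Stable sort: ties keep enumeration order (an arbitrary choice).
        ordered = sorted(subsets, key=var_by_subset.get)
        for rank, s in enumerate(ordered, start=1):
            rank_sums[s] += rank
    return {s: rank_sums[s] / n_boot for s in subsets}

# Hypothetical toy data (not from the paper): 3 genes, 5 samples.
toy = {
    "A": [1.0, 1.2, 0.8, 1.1, 0.9],
    "B": [2.0, 1.6, 2.4, 1.8, 2.2],
    "C": [0.5, 1.5, 0.2, 1.9, 0.9],
}
ranks = mean_ranks(toy)
```

Because every replicate assigns the ranks 1 through 7 exactly once for three genes, the mean ranks always sum to 28, which gives a quick sanity check on the bookkeeping.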
Breast tumor data: Variability of log geometric means based on optimal gene subsets identified by various methods
| Set Size | Method | Optimal set | Variance logGM | Std Dev logGM |
|---|---|---|---|---|
| 2 | Szabo et al | MRPL19, PUM1 | 0.517 | 0.719 |
| 2 | Vandes. et al | MRPL19, PSMC4 | 0.629 | 0.793 |
| 2 | New | ACTB, SF3A1 | | |
| 3 | Szabo et al | MRPL19, PUM1, PSMC4 | 0.531 | 0.729 |
| 3 | Vandes. et al | MRPL19, PUM1, PSMC4 | 0.531 | 0.729 |
| 3 | New | ACTB, SF3A1, PUM1 | 0.327 | 0.572 |
| 4 | Szabo et al¹ | MRPL19, PUM1, PSMC4, SF3A1 | 0.464 | 0.681 |
| 4 | New | ACTB, SF3A1, PUM1, GAPDH | 0.369 | 0.607 |
¹ Same results using either the method of Vandesompele et al [4] or Szabo et al [9]
Neuroblastoma data: Top ranked by set size bootstrap 95% upper confidence limit (UCL) for the variance and standard deviation of the log geometric mean (GM).
| Set Size(*) | AC | B2M | GA | HM | HP | RP | SD | TBP | UBC | YW | 95% UCL Var(GM) | 95% UCL StdDev(GM) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.458 | 0.677 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.340 | 0.583 |
| 3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.340 | 0.583 |
| 4 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0.303 | 0.550 |
| 5 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0.299 | 0.547 |
| 6 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0.298 | 0.546 |
| 7 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0.303 | 0.550 |
| 8 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0.317 | 0.563 |
| 9 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0.334 | 0.578 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0.353 | 0.594 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
AC - ACTB; GA - GAPDH; HM - HMBS; HP - HPRT1; RP - RPL13A; SD - SDHA; YW - YWHAZ
Neuroblastoma data: Ten gene subsets with the smallest mean overall ranks of the variance of the log geometric mean (GM).
| Set Size(*) | AC | B2M | GA | HM | HP | RP | SD | TBP | UBC | YW | Mean rank of Var(GM) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 52.1 |
| 7 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 59.9 |
| 5 | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 63.8 |
| 6 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 76.7 |
| 6 | 1 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 87.5 |
| 5 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 92.2 |
| 6 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 93.3 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 95.9 |
| 7 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 96.4 |
| 7 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 103.0 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
AC - ACTB; GA - GAPDH; HM - HMBS; HP - HPRT1; RP - RPL13A; SD - SDHA; YW - YWHAZ
Neuroblastoma data: Variability of log geometric means based on optimal gene subsets identified by various methods
| Set Size | Method | Optimal set | Variance logGM | Std Dev logGM |
|---|---|---|---|---|
| 2 | Vandes. et al | GAPDH, HPRT | 0.327 | 0.572 |
| 2 | Szabo et al | GAPDH, SDHA | 0.374 | 0.612 |
| 2 | New | GAPDH, YWHAZ | 0.250 | 0.500 |
| 3 | Old¹ | GAPDH, HPRT, SDHA | 0.348 | 0.590 |
| 3 | New | ACTB, GAPDH, YWHAZ | 0.255 | 0.505 |
| 4 | Old¹ | GAPDH, HPRT, SDHA, UBC | 0.361 | 0.601 |
| 4 | New | ACTB, B2M, GAPDH, TBP | 0.231 | |
| 5 | Old¹ | GAPDH, HPRT, SDHA, UBC, HMBS | 0.358 | 0.598 |
| 5 | New | ACTB, B2M, GAPDH, HPRT1, RPL13A | 0.224 | |
| 6 | Old¹ | GAPDH, HPRT, SDHA, UBC, HMBS, YWHAZ | 0.319 | 0.565 |
| 6 | New | ACTB, GAPDH, B2M, HPRT1, TBP, YWHAZ | 0.227 | |
¹ Same results using either the method of Vandesompele et al [4] or Szabo et al [9]
Blood data: Top ranked by set size bootstrap 95% upper confidence limit (UCL) for the variance and standard deviation of the log geometric mean (GM) based on log transformed relative expression levels.
| Set Size(*) | ACTB | GAPDH | HPRT1 | PPIB | TFRC | 95% UCL Var(GM) | 95% UCL StdDev(GM) |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 | 0 | 1.09 | |
| 2 | 0 | 1 | 0 | 0 | 1 | 1.25 | 1.12 |
| 3 | 0 | 1 | 1 | 0 | 1 | 1.57 | 1.25 |
| 4 | 0 | 1 | 1 | 1 | 1 | 1.77 | 1.33 |
| 5 | 1 | 1 | 1 | 1 | 1 | 2.06 | 1.43 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
Blood data: Top ranked by set size bootstrap 95% upper confidence limit (UCL) for the variance and standard deviation of the log geometric mean (GM) based on Ct numbers.
| Set Size(*) | ACTB | GAPDH | HPRT1 | PPIB | TFRC | 95% UCL Var(GM) | 95% UCL StdDev(GM) |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 0 | 0 | 6.22 | 2.49 |
| 2 | 0 | 1 | 0 | 0 | 1 | 6.06 | 2.46 |
| 3 | 0 | 1 | 1 | 0 | 1 | 6.66 | 2.58 |
| 4 | 1 | 1 | 1 | 0 | 1 | 7.29 | 2.70 |
| 5 | 1 | 1 | 1 | 1 | 1 | 7.91 | 2.81 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
Blood data: Ten gene subsets with the smallest mean overall ranks of the variance of the log geometric mean (GM) based on log transformed relative expression levels.
| Set Size(*) | ACTB | GAPDH | HPRT1 | PPIB | TFRC | Mean rank of Var(GM) |
|---|---|---|---|---|---|---|
| 2 | 0 | 1 | 0 | 0 | 1 | 1.5 |
| 1 | 0 | 1 | 0 | 0 | 0 | 2.8 |
| 3 | 0 | 1 | 1 | 0 | 1 | 3.7 |
| 3 | 1 | 1 | 0 | 0 | 1 | 5.3 |
| 2 | 0 | 1 | 1 | 0 | 0 | 5.7 |
| 1 | 0 | 0 | 0 | 0 | 1 | 7.0 |
| 4 | 1 | 1 | 1 | 0 | 1 | 8.1 |
| 3 | 0 | 1 | 0 | 1 | 1 | 8.2 |
| 2 | 1 | 1 | 0 | 0 | 0 | 8.8 |
| 4 | 0 | 1 | 1 | 1 | 1 | 10.7 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
Blood data: Ten gene subsets with the smallest mean overall ranks of the variance of the log geometric mean (GM) based on Ct numbers.
| Set Size(*) | ACTB | GAPDH | HPRT1 | PPIB | TFRC | Mean rank of Var(GM) |
|---|---|---|---|---|---|---|
| 2 | 0 | 1 | 0 | 0 | 1 | 1.7 |
| 1 | 0 | 1 | 0 | 0 | 0 | 2.1 |
| 3 | 0 | 1 | 1 | 0 | 1 | 3.5 |
| 2 | 0 | 1 | 1 | 0 | 0 | 4.4 |
| 1 | 0 | 0 | 0 | 0 | 1 | 5.3 |
| 3 | 0 | 1 | 0 | 1 | 1 | 5.6 |
| 4 | 0 | 1 | 1 | 1 | 1 | 7.3 |
| 2 | 0 | 1 | 0 | 1 | 0 | 8.3 |
| 3 | 0 | 1 | 1 | 1 | 0 | 9.6 |
| 2 | 0 | 0 | 1 | 0 | 1 | 10.4 |
(*) in the column with gene name, 1 indicates that the corresponding gene is included in the subset and 0 that it is not included.
Figure 4. Blood data: 95% UCL vs. the average overall rank of the normalizing factors (expression levels). Each point represents one of the possible 31 = 2^5 - 1 gene subsets. Different colors are used for the subsets with different numbers of genes included. The x-coordinate is the average overall rank of the corresponding normalizing factor variance. The y-coordinate is the upper 95% confidence limit (95% UCL) for the standard deviation of the log normalizing factor.
Blood data: Variability of log geometric means based on optimal gene subsets identified by various methods
| Set Size | Method | Optimal set | Variance logGM | Std Dev logGM |
|---|---|---|---|---|
| 2 | Szabo et al | TFRC, GAPDH | 0.98 | 0.99 |
| 2 | Vandes. et al | TFRC, HPRT | 1.47 | 1.21 |
| 2 | New | TFRC, GAPDH | 0.99 | |
| 3 | Szabo et al | TFRC, GAPDH, PPIB | 1.26 | 1.12 |
| 3 | Vandes. et al | TFRC, GAPDH, PPIB | 1.62 | 1.27 |
| 3 | New | TFRC, GAPDH, HPRT | 1.16 | 1.08 |
| 4 | All methods | GAPDH, PPIB, TFRC, HPRT | 1.34 | 1.16 |
Design of the simulation study
| Scenario | Std Dev | Correlation Matrix of R | | | | | Total Covariance Matrix V | | | | | No Genes | True | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 0.30 | 1 | 0 | 0 | 0 | 0 | 0.25 | 0.16 | 0.16 | 0.16 | 0.16 | 1 | 0.250 | 0.250 |
| Uncorrelated R | 0.35 | 0 | 1 | 0 | 0 | 0 | 0.16 | 0.28 | 0.16 | 0.16 | 0.16 | | | |
| Sample Random | 0.80 | 0 | 0 | 1 | 0 | 0 | 0.16 | 0.16 | 0.80 | 0.16 | 0.16 | 3 | 0.255 | 0.148 |
| Effect Var = 0.16 | 0.90 | 0 | 0 | 0 | 1 | 0 | 0.16 | 0.16 | 0.16 | 0.97 | 0.16 | 4 | 0.264 | 0.144 |
| | 1.00 | 0 | 0 | 0 | 0 | 1 | 0.16 | 0.16 | 0.16 | 0.16 | 1.16 | 5 | 0.267 | 0.139 |
| | 0.60 | 1 | 0.2 | 0.2 | 0.2 | 0.2 | 0.38 | 0.10 | 0.11 | 0.15 | 0.16 | 1 | 0.380 | 0.380 |
| Corr Coef = 0.2 | 0.70 | 0.2 | 1 | 0.2 | 0.2 | 0.2 | 0.10 | 0.51 | 0.13 | 0.17 | 0.19 | 2 | 0.275 | 0.223 |
| Sample Random | 0.75 | 0.2 | 0.2 | 1 | 0.2 | 0.2 | 0.11 | 0.13 | 0.58 | 0.19 | 0.20 | | | |
| Effect Var = 0.02 | 1.10 | 0.2 | 0.2 | 0.2 | 1 | 0.2 | 0.15 | 0.17 | 0.19 | 1.23 | 0.28 | 4 | 0.275 | 0.169 |
| | 1.20 | 0.2 | 0.2 | 0.2 | 0.2 | 1 | 0.16 | 0.19 | 0.20 | 0.28 | 1.46 | 5 | 0.301 | 0.167 |
| | 0.42 | 1 | -0.2 | -0.2 | 0.2 | 0.2 | 0.28 | 0.06 | 0.06 | 0.15 | 0.15 | 1 | 0.276 | 0.276 |
| Corr Coef = ±0.2 | 0.45 | -0.2 | 1 | -0.2 | 0.2 | 0.2 | 0.06 | 0.30 | 0.06 | 0.15 | 0.15 | 2 | 0.176 | 0.145 |
| Sample Random | 0.48 | -0.2 | -0.2 | 1 | 0.2 | 0.2 | 0.06 | 0.06 | 0.33 | 0.16 | 0.16 | | | 0.101 |
| Effect Var = 0.1 | 0.60 | 0.2 | 0.2 | 0.2 | 1 | 0.2 | 0.15 | 0.15 | 0.16 | 0.46 | 0.17 | 4 | 0.166 | 0.086 |
| | 0.60 | 0.2 | 0.2 | 0.2 | 0.2 | 1 | 0.15 | 0.15 | 0.16 | 0.17 | 0.46 | 5 | 0.175 | |
| | 0.30 | 1 | -0.4 | 0.0 | 0.0 | 0.0 | 0.25 | 0.11 | 0.16 | 0.16 | 0.16 | 1 | 0.250 | 0.090 |
| Corr Coef = ±0.4 | 0.40 | -0.4 | 1 | 0.0 | 0.0 | 0.0 | 0.11 | 0.32 | 0.16 | 0.16 | 0.16 | | | |
| Sample Random | 0.60 | 0.0 | 0.0 | 1 | 0.0 | 0.0 | 0.16 | 0.16 | 0.52 | 0.16 | 0.16 | 3 | 0.217 | 0.068 |
| Effect Var = 0.16 | 0.70 | 0.0 | 0.0 | 0.0 | 1 | 0.4 | 0.16 | 0.16 | 0.16 | 0.65 | 0.38 | 4 | 0.223 | 0.069 |
| | 0.80 | 0.0 | 0.0 | 0.0 | 0.4 | 1 | 0.16 | 0.16 | 0.16 | 0.38 | 0.80 | 5 | 0.244 | 0.070 |
| | 0.40 | 1 | 0.4 | 0.4 | 0.4 | 0.4 | 0.26 | 0.18 | 0.21 | 0.23 | 0.24 | 1 | 0.260 | 0.260 |
| Corr Coef = 0.4 | 0.50 | 0.4 | 1 | 0.4 | 0.4 | 0.4 | 0.18 | 0.35 | 0.24 | 0.26 | 0.28 | | | 0.153 |
| Sample Random | 0.70 | 0.4 | 0.4 | 1 | 0.4 | 0.4 | 0.21 | 0.24 | 0.59 | 0.32 | 0.35 | 3 | 0.274 | 0.133 |
| Effect Var = 0.1 | 0.80 | 0.4 | 0.4 | 0.4 | 1 | 0.4 | 0.23 | 0.26 | 0.32 | 0.74 | 0.39 | 4 | 0.302 | 0.121 |
| | 0.90 | 0.4 | 0.4 | 0.4 | 0.4 | 1 | 0.24 | 0.28 | 0.35 | 0.39 | 0.91 | 5 | 0.331 | |
¹ NF - normalizing factor
² Assuming Szabo et al [9] model
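The tabulated values appear consistent with the following reading: each entry of the total covariance matrix V is the covariance implied by the standard deviations and the correlation matrix of R, plus the random sample effect variance, and the "True" value for k genes is the variance of the average of the first k log expression levels, that is, the sum of the leading k-by-k block of V divided by k^2. The sketch below reproduces a few table entries under that interpretation; the interpretation and the function names are assumptions, not statements from the paper.

```python
def total_covariance(std_devs, corr, sample_effect_var):
    """V[i][j] = corr[i][j] * sd_i * sd_j + sample effect variance (assumed structure)."""
    k = len(std_devs)
    return [
        [corr[i][j] * std_devs[i] * std_devs[j] + sample_effect_var for j in range(k)]
        for i in range(k)
    ]

def var_log_gm(cov, k):
    """Variance of the mean of the first k log expression levels: 1' V 1 / k^2."""
    return sum(cov[i][j] for i in range(k) for j in range(k)) / k ** 2

def constant_corr(k, rho):
    """k-by-k correlation matrix with 1 on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]

# "Uncorrelated R" scenario: sample effect variance 0.16.
v_uncorr = total_covariance([0.30, 0.35, 0.80, 0.90, 1.00], constant_corr(5, 0.0), 0.16)
# "Corr Coef = 0.4" scenario: sample effect variance 0.1.
v_corr04 = total_covariance([0.40, 0.50, 0.70, 0.80, 0.90], constant_corr(5, 0.4), 0.10)

true_var_uncorr_3 = var_log_gm(v_uncorr, 3)  # table lists 0.255
true_var_corr04_5 = var_log_gm(v_corr04, 5)  # table lists 0.331
```

Under this reading, adding a gene lowers the variance of the normalizing factor only when its own variance and its covariances with the genes already included are small enough, so the best subset is not necessarily the largest one.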
Results of the simulation study
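| Scenario | No of samples | Sensitivity to optimal subset: Criterion (A)¹ | Sensitivity to optimal subset: Criterion (C)² | Sensitivity to optimal subset: Szabo et al [9]³ |
|---|---|---|---|---|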
| Uncorrelated R | 25 | 43.00 | 41.75 | 60.25 |
| Sample Random | 40 | 53.50 | 53.25 | 73.50 |
| Effect Var = 0.16 | 80 | 81.25 | 81.50 | 86.25 |
| All Corr Coef = 0.2 | 25 | 34.25 | 36.50 | 38.75 |
| Sample Random | 40 | 53.25 | 55.25 | 46.25 |
| Effect Var = 0.02 | 80 | 75.75 | 74.50 | 57.25 |
| Corr Coef = ±0.2 | 25 | 48.50 | 55.00 | 0.00 |
| Sample Random | 40 | 68.00 | 72.75 | 0.25 |
| Effect Var = 0.1 | 80 | 91.50 | 93.50 | 0.00 |
| Corr Coef = ±0.4 | 25 | 36.50 | 31.50 | 8.50 |
| Sample Random | 40 | 49.25 | 42.75 | 7.25 |
| Effect Var = 0.16 | 80 | 63.50 | 60.25 | 3.75 |
| All Corr Coef = 0.4 | 25 | 37.5 | 40.8 | 23.3 |
| Sample Random | 40 | 49.8 | 51.3 | 21.0 |
| Effect Var = 0.1 | 80 | 68.0 | 71.5 | 21.3 |
¹ Criterion (A) (minimum 95% upper confidence limit for standard deviation of the normalizing factor)
² Criterion (C) (minimum average rank of the normalizing factor variance)
³ Minimum standard deviation of the normalizing factor variance as in Szabo et al [9]