Lesley Jones, Darlene R Goldstein, Gareth Hughes, Andrew D Strand, Francois Collin, Stephen B Dunnett, Charles Kooperberg, Aaron Aragaki, James M Olson, Sarah J Augood, Richard L M Faull, Ruth Luthi-Carter, Valentina Moskvina, Angela K Hodges.
Abstract
BACKGROUND: Gene expression microarray experiments are expensive to conduct, and guidelines for acceptable quality control at intermediate steps before and after the samples are hybridised to chips are vague. We conducted an experiment hybridising RNA from human brain to 117 U133A Affymetrix GeneChips and used these data to explore the relationship between 4 pre-chip variables and 22 post-chip outcomes and quality control measures.
Year: 2006 PMID: 16623940 PMCID: PMC1524996 DOI: 10.1186/1471-2105-7-211
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Total RNA gel images from the Bioanalyser (Agilent). Representative total RNA samples of varying quality, as assessed objectively by RIN and by subjective assessment. Selected corresponding pre- and post-chip variable assessments are also shown. Samples identified as outliers on post-chip quality control measures but not excluded are labelled (*); outliers excluded from expression analysis are labelled (**). Sample from case with prolonged agonal state (†). Sample not run on arrays due to poor quality total RNA (‡). n.d., value not determined. CB, cerebellum; CN, caudate nucleus; MC, motor cortex.
Pre- and post-chip variables assessed in the current study on U133A arrays, with indications of the recommended acceptable range (where available) and summary statistics determined for samples in the current study. * None of the chips had significant spatial artifacts of hybridisation (e.g. scratches, uneven hybridisation) as assessed by visual inspection of the chip following hybridisation and by assessment of array and single outlier status across the chips. MAD is the median absolute deviation.
| Variable | Variable (abbreviated) | Recommended acceptable range | Summary statistics (current experiment) ||||
| | | | Median | MAD | Range (min) | Range (max) |
| Pre-chip variables | ||||||
| Post mortem interval (hours) | PMI | empirical | 12 | 3 | 4 | 29 |
| Subjective RNA quality (4-point scale) | SUBQUAL | empirical | 3 | 1 | 1 | 4 |
| RNA Integrity Number (RIN, 0 to 10) | RIN | empirical | 7.8 | 0.9 | 4.3 | 9.5 |
| Adjusted cRNA yield (μg) | YIELD | >15 μg | 57 | 7 | 30 | 83 |
| Post-chip variables: assessed within MAS 5.0 (Affymetrix) | ||||||
| Background | BG | < 100 | 101.3 | 11.99 | 72.32 | 438.41 |
| Noise (Raw Q) | RAWQ | 1.5 – 3.0 | 3.73 | 0.35 | 2.83 | 9.52 |
| Noise | NOISE | empirical | 5.76 | 0.82 | 3.85 | 38.2 |
| %P | PC_PRESENT | 25 – 50% | 47 | 2 | 24 | 53 |
| Scaling Factor | SF | < 10 | 2.18 | 0.385 | 1.08 | 7.68 |
| 3'/5' ratio of β-Actin | B_ACTIN35 | < 3 | 1.6 | 0.26 | 0.98 | 55.87 |
| 3'/5' ratio of GAPDH | GAPDH35 | < 1.25 | 1.2 | 0.16 | 0.77 | 31.62 |
| Post-chip variables: assessed within dChip [9] | ||||||
| PM/MM difference array outlier algorithm* | DCHIP_AR_OUTLIER | < 5% | 0.37 | 0.2 | 0.04 | 47.59 |
| PM/MM difference single outlier algorithm* | DCHIP_SING_OUTLIER | < 5% | 0.09 | 0.03 | 0.04 | 2.3 |
| %P | DCHIP_PCCALL | 25 – 50% | 58.4 | 1.6 | 36.2 | 63.4 |
| Median intensity | MEDINT | empirical | 358 | 43.5 | 231 | 1620 |
| Post-chip variables: assessed within Affy, Bioconductor, developed by Collin, 2005 [13] | ||||||
| Median NUSE | MED_NUSE | < 1.05 | 1.01 | 0.01 | 0.98 | 1.26 |
| IQR.LR1 | IQR_LR1 | empirical | 0.23 | 0.03 | 0.15 | 1.04 |
| B.LR1 | B_LR1 | empirical | 0.005 | 0.02 | -0.11 | 0.24 |
| IQRplusAbsB.LR1 | IQRplusAbsB_LR1 | empirical | 0.24 | 0.04 | 0.15 | 1.13 |
| CV.LR1 | CV_LR1 | empirical | 6.41 | 3.9 | 0.01 | 35.91 |
| IQR.LR2 | IQR_LR2 | empirical | 0.18 | 0.03 | 0.11 | 0.98 |
| B.LR2 | B_LR2 | empirical | 0 | 0.01 | -0.1 | 0.22 |
| IQRplusAbsB.LR2 | IQRplusAbsB_LR2 | empirical | 0.2 | 0.035 | 0.12 | 1.02 |
| CV.LR2 | CV_LR2 | empirical | 3.97 | 3.225 | 0 | 33.8 |
| Post-chip variables: assessed within Affy, Bioconductor [13] | ||||||
| RNA degradation plot (gradient) | RNADEG_SLOPE | empirical | 2.27 | 0.27 | 0.86 | 3.96 |
| RNA degradation plot (P-value from the linear regression) | PVAL_SLOPE | empirical | 2.67e-8 | 2.60e-8 | 4.21e-12 | 0.0049 |
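The Median and MAD columns above are robust summaries of each variable. A minimal sketch of how such values can be computed, assuming the unscaled MAD (the median of absolute deviations from the median; the source does not state whether a scaling constant was applied). For illustration only, the input is the RIN column of the ten flagged samples listed in the outlier table, not the full 117-chip data set, so the result will not reproduce the row for RIN above.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation (assumed definition)."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


# RIN values of the ten flagged samples (illustrative subset only)
rin_flagged = [7.4, 8.1, 7.4, 8.1, 7.6, 7.1, 9.1, 4.3, 6.5, 6.6]

rin_median = median(rin_flagged)
rin_mad = mad(rin_flagged)
print(f"median = {rin_median:.1f}, MAD = {rin_mad:.1f}")
```

Because both statistics depend on ranks rather than on magnitudes, a single extreme chip (for example the maximum B_ACTIN35 of 55.87) has little influence on them, which is presumably why they are reported alongside the range.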
Correlations between pre-chip variables. Matrix of correlations (above the main diagonal) and p-values for difference from 0 (below the main diagonal) between pre-chip variables.
| | YIELD | PMI | RIN | SUBQUAL |
| YIELD | * | -0.22 | 0.29 | 0.34 |
| PMI | 0.018 | * | -0.19 | -0.21 |
| RIN | 0.002 | 0.043 | * | 0.71 |
| SUBQUAL | 0.0003 | 0.025 | 0.000001 | * |
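Each cell pair in the matrix above is a correlation coefficient and the p-value for testing whether it differs from 0. The sketch below shows one way to obtain such a pair. It rests on assumptions: the source does not say which coefficient or test was used, so a Pearson correlation and a normal-approximation p-value via the Fisher z-transform are used here, and the data are the RIN and SUBQUAL values of the ten flagged samples from the outlier table rather than all samples.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_p(r, n):
    """Approximate two-sided p-value for H0: correlation = 0 (Fisher z)."""
    z = atanh(r) * sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Illustrative subset: the ten flagged samples only
rin = [7.4, 8.1, 7.4, 8.1, 7.6, 7.1, 9.1, 4.3, 6.5, 6.6]
subqual = [3, 2, 3, 4, 3, 3, 4, 1, 1, 1]

r = pearson(rin, subqual)
p = fisher_p(r, len(rin))
print(f"r = {r:.2f}, approximate p = {p:.3f}")
```

SUBQUAL is an ordinal 4-point scale, so a rank-based coefficient would be an equally defensible choice; the Pearson form is used here only because it needs no extra machinery.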
Pre- and post-chip variables for samples flagged as potential outliers on at least one post-chip quality control measure. Samples were identified as outliers as judged empirically within the experiment and detailed in the results. Samples (*) were excluded from expression analysis. Samples from cases with prolonged agonal state (†). CB, cerebellum; CN, caudate nucleus; MC, motor cortex.
| Pre-chip variables | |||||||||
| Sample | PMI | SUBQUAL | RIN | YIELD | |||||
| H104 CB | 14 | 3 | 7.4 | 77 | |||||
| H122 CB*† | 9 | 2 | 8.1 | 68 | |||||
| H85 CB*† | 10 | 3 | 7.4 | 65 | |||||
| HC79 CB*† | 4 | 4 | 8.1 | 44 | |||||
| H123 CN* | 7.5 | 3 | 7.6 | 66 | |||||
| HC61 CN | 6 | 3 | 7.1 | 59 | |||||
| H131 MC* | 13 | 4 | 9.1 | 48 | |||||
| HC71 MC* | 5 | 1 | 4.3 | 36 | |||||
| HC52 MC | 23 | 1 | 6.5 | 56 | |||||
| HC55 MC | 20 | 1 | 6.6 | 44 | |||||
| Post-chip variables: assessed within MAS 5.0 (Affymetrix) | |||||||||
| Sample | BG | RAWQ | NOISE | PC_PRESENT | SF | B_ACTIN35 | GAPDH35 | ||
| H104 CB | 271 | 7.5 | 13.9 | 42 | 2.2 | 1.7 | 1.2 | ||
| H122 CB*† | 438 | 9.5 | 34.8 | 36 | 1.9 | 1.7 | 1.0 | ||
| H85 CB*† | 92 | 3.6 | 5.7 | 49 | 2.1 | 1.8 | 1.1 | ||
| HC79 CB*† | 87 | 3.4 | 5.5 | 45 | 1.7 | 55.9 | 31.6 | ||
| H123 CN* | 103 | 3.7 | 7.6 | 30 | 4.1 | 2.4 | 0.8 | ||
| HC61 CN | 119 | 4.0 | 5.2 | 43 | 4.0 | 1.5 | 1.2 | ||
| H131 MC* | 250 | 8.4 | 38.2 | 24 | 1.1 | 1.6 | 1.1 | ||
| HC71 MC* | 107 | 3.6 | 4.7 | 26 | 7.7 | 4.0 | 4.1 | ||
| HC52 MC | 122 | 4.2 | 6.0 | 39 | 3.6 | 3.9 | 2.1 | ||
| HC55 MC | 98 | 3.4 | 4.7 | 41 | 3.4 | 2.6 | 2.0 | ||
| Post-chip variables: assessed within dChip (Li and Wong, 2001) | |||||||||
| Sample | DCHIP_AR_OUTLIER | DCHIP_SING_OUTLIER | DCHIP_PCCALL | MEDINT | |||||
| H104 CB | 2.71 | 0.61 | 55 | 597 | |||||
| H122 CB*† | 14.72 | 1.85 | 41 | 1199 | |||||
| H85 CB*† | 6.92 | 0.40 | 61 | 358 | |||||
| HC79 CB*† | 12.75 | 0.38 | 55 | 327 | |||||
| H123 CN* | 30.48 | 1.72 | 42 | 369 | |||||
| HC61 CN | 4.05 | 0.58 | 57 | 287 | |||||
| H131 MC* | 47.59 | 2.30 | 36 | 1620 | |||||
| HC71 MC* | 19.66 | 1.23 | 39 | 266 | |||||
| HC52 MC | 1.70 | 0.20 | 52 | 331 | |||||
| HC55 MC | 3.19 | 0.29 | 53 | 294 | |||||
| Post-chip variables: assessed within Bioconductor packages affy, affyPLM and specialised code [15, 22] | |||||||||
| Sample | MED_NUSE | IQR_LR1 | B_LR1 | IQRplusAbsB_LR1 | CV_LR1 | IQR_LR2 | B_LR2 | IQRplusAbsB_LR2 | CV_LR2 |
| H104 CB | 1.06 | 0.27 | -0.048 | 0.32 | 17.3 | 0.27 | -0.029 | 0.30 | 10.7 |
| H122 CB*† | 1.24 | 0.66 | -0.105 | 0.76 | 16.0 | 0.64 | -0.098 | 0.74 | 15.2 |
| H85 CB*† | 1.08 | 0.42 | 0.008 | 0.43 | 1.9 | 0.38 | 0.014 | 0.40 | 3.7 |
| HC79 CB*† | 1.12 | 0.56 | -0.001 | 0.56 | 0.2 | 0.56 | 0.000 | 0.56 | 0.1 |
| H123 CN* | 1.26 | 0.67 | 0.241 | 0.91 | 35.9 | 0.65 | 0.220 | 0.87 | 33.8 |
| HC61 CN | 1.06 | 0.30 | 0.025 | 0.33 | 8.4 | 0.25 | 0.000 | 0.25 | 0.0 |
| H131 MC* | 1.26 | 1.04 | -0.050 | 1.09 | 4.8 | 0.98 | -0.040 | 1.02 | 4.1 |
| HC71 MC* | 1.2 | 0.96 | 0.165 | 1.13 | 17.2 | 0.87 | 0.150 | 1.02 | 17.3 |
| HC52 MC | 1.06 | 0.43 | 0.063 | 0.49 | 14.5 | 0.34 | 0.039 | 0.38 | 11.3 |
| HC55 MC | 1.07 | 0.42 | 0.039 | 0.46 | 9.4 | 0.34 | 0.015 | 0.35 | 4.4 |
| Post-chip variables: assessed within affy, Bioconductor | |||||||||
| Sample | RNADEG_SLOPE | PVAL_SLOPE | |||||||
| H104 CB | 2.40 | 3.8e-9 | |||||||
| H122 CB*† | 0.86 | 4.9e-3 | |||||||
| H85 CB*† | 2.22 | 1.2e-7 | |||||||
| HC79 CB*† | 3.96 | 1.0e-9 | |||||||
| H123 CN* | 1.61 | 3.2e-7 | |||||||
| HC61 CN | 2.40 | 9.7e-9 | |||||||
| H131 MC* | 1.17 | 2.2e-5 | |||||||
| HC71 MC* | 2.67 | 6.0e-10 | |||||||
| HC52 MC | 3.61 | 4.2e-12 | |||||||
| HC55 MC | 3.20 | 3.2e-11 | |||||||
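The recommended acceptable ranges listed with the MAS 5.0 variables can be applied mechanically to the flagged samples. The following is a sketch of such a check, not the procedure used in the study, where outliers were judged empirically within the experiment. The ranges and values are copied from the tables above for four of the ten flagged samples; the names `LIMITS`, `SAMPLES` and `out_of_range` are hypothetical, and treating "<" limits as strict and ranges as inclusive is an assumption.

```python
# Recommended acceptable ranges for MAS 5.0 measures (from the variables table)
LIMITS = {
    "BG": lambda v: v < 100,
    "RAWQ": lambda v: 1.5 <= v <= 3.0,
    "PC_PRESENT": lambda v: 25 <= v <= 50,
    "SF": lambda v: v < 10,
    "B_ACTIN35": lambda v: v < 3,
    "GAPDH35": lambda v: v < 1.25,
}

# MAS 5.0 values for four flagged samples (from the outlier table)
SAMPLES = {
    "H104 CB": {"BG": 271, "RAWQ": 7.5, "PC_PRESENT": 42, "SF": 2.2,
                "B_ACTIN35": 1.7, "GAPDH35": 1.2},
    "HC79 CB": {"BG": 87, "RAWQ": 3.4, "PC_PRESENT": 45, "SF": 1.7,
                "B_ACTIN35": 55.9, "GAPDH35": 31.6},
    "H131 MC": {"BG": 250, "RAWQ": 8.4, "PC_PRESENT": 24, "SF": 1.1,
                "B_ACTIN35": 1.6, "GAPDH35": 1.1},
    "HC71 MC": {"BG": 107, "RAWQ": 3.6, "PC_PRESENT": 26, "SF": 7.7,
                "B_ACTIN35": 4.0, "GAPDH35": 4.1},
}


def out_of_range(values):
    """Return the measures whose value falls outside the recommended range."""
    return [name for name, ok in LIMITS.items() if not ok(values[name])]


for sample, values in SAMPLES.items():
    print(sample, out_of_range(values))
```

Fixed thresholds are a blunt instrument here: the median RAWQ across the experiment (3.73) is itself above the recommended upper bound of 3.0, so this check flags RAWQ on chips that were otherwise unremarkable.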
Four main principal components explain ~75% of the variance in post-GeneChip QC measures
| Component | % variance explained | Cumulative variance explained |
| 1 | 31.6 | 31.6 |
| 2 | 17.8 | 49.4 |
| 3 | 13.4 | 62.8 |
| 4 | 12.1 | 74.9 |
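The cumulative column is the running sum of the per-component percentages, which can be checked directly. A small sketch using only the four values in the table above:

```python
from itertools import accumulate

# Percentage of variance explained by components 1 to 4 (from the table)
explained = [31.6, 17.8, 13.4, 12.1]

# Running total, rounded to one decimal to suppress floating-point noise
cumulative = [round(total, 1) for total in accumulate(explained)]
print(cumulative)
```

The final entry is the "~75%" quoted in the table caption.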
Figure 2. Pairwise scatterplots for the first four principal components. Outlier chips (Table 3) are represented by blue triangles.
Rotated component matrix for post-chip variables (U133A arrays, all samples)
| QC measure | Component | |||
| 1 | 2 | 3 | 4 | |
| IQR_LR1 | 0.934 | |||
| IQR_LR2 | 0.928 | |||
| IQRplusAbsB_LR1 | 0.936 | |||
| IQRplusAbsB_LR2 | 0.938 | |||
| DCHIP_AR_OUTLIER | 0.771 | |||
| MED_NUSE | 0.769 | |||
| DCHIP_SING_OUTLIER | 0.557 | |||
| CV_LR1 | 0.418 | |||
| CV_LR2 | 0.327 | |||
| B_LR1 | 0.905 | |||
| B_LR2 | 0.855 | |||
| SF | 0.734 | |||
| DCHIP_PCCALL | -0.848 | |||
| PC_PRESENT | -0.866 | |||
| NOISE | 0.959 | |||
| RAWQ | 0.934 | |||
| BG | 0.928 | |||
| MEDINT | 0.874 | |||
| B_ACTIN35 | 0.960 | |||
| GAPDH35 | 0.946 | |||
| PVAL_SLOPE | 0.789 | |||
| RNADEG_SLOPE | 0.427 | |||
Values less than 0.4 are not shown unless they are the largest for the corresponding variable
Figure 3. Pairwise scatterplots of PC1 vs IQR_LR1. Outlier chips (Table 3) are represented by blue triangles.
Canonical correlation analysis to explore the relationships between two sets: a set of 4 pre-chip and a set of 22 post-chip variables
| | Canonical Correlation | Approximate Standard Error | Significance |
| 1 | 0.838 | 0.028 | < 0.0001 |
| 2 | 0.593 | 0.061 | 0.012 |
| 3 | 0.500 | 0.072 | 0.154 |
| 4 | 0.454 | 0.075 | 0.263 |
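A squared canonical correlation gives the proportion of variance shared by the corresponding pair of canonical variates, which helps in reading the table above. The sketch below squares the four reported correlations and keeps the pairs significant at a conventional 0.05 level; the 0.05 cut-off is an assumption for illustration, and the "< 0.0001" entry is stored as its upper bound.

```python
# (pair index, canonical correlation, significance) from the table above
pairs = [
    (1, 0.838, 0.0001),  # reported as "< 0.0001"; upper bound stored
    (2, 0.593, 0.012),
    (3, 0.500, 0.154),
    (4, 0.454, 0.263),
]

ALPHA = 0.05  # assumed significance level

# Shared variance between each pair of canonical variates
shared = {idx: round(r ** 2, 3) for idx, r, _ in pairs}
significant = [idx for idx, _, p in pairs if p < ALPHA]

print(shared)
print(significant)
```

On this reading only the first two pairs are retained, which matches the later tables reporting loadings for the first and second canonical correlations only.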
Canonical factor loadings of the pre-chip variables corresponding to the first and the second canonical correlations
| Pre-chip QC | Canonical variables | |
| | Pre-chip1 | Pre-chip2 |
| YIELD | -0.526 | 0.363 |
| PMI | 0.140 | 0.112 |
| RIN | -0.934 | 0.099 |
| SUBQUAL | -0.847 | -0.521 |
Canonical factor loadings of the post-chip variables corresponding to the first and the second canonical correlations
| Post-chip QC | Canonical variables | |
| Post-chip1 | Post-chip2 | |
| RNADEG_SLOPE | 0.845 | |
| B_ACTIN35 | 0.532 | 0.280 |
| GAPDH35 | 0.513 | 0.280 |
| SF | 0.497 | |
| B_LR1 | 0.364 | |
| B_LR2 | 0.345 | |
| IQR_LR1 | 0.289 | |
| IQR_LR2 | 0.366 | |
| IQRplusAbsB_LR1 | 0.288 | |
| IQRplusAbsB_LR2 | 0.343 | |
| MED_NUSE | 0.337 | |
| DCHIP_AR_OUTLIER | 0.337 | |
| DCHIP_SING_OUTLIER | 0.313 | |
| DCHIP_PCCALL | -0.381 | |
| MEDINT | -0.377 | |
| PC_PRESENT | -0.339 | |
| CV_LR1 | ||
| CV_LR2 | 0.376 | |
| BG | 0.336 | |
| NOISE | -0.284 | 0.275 |
| RAWQ | -0.287 | 0.303 |
| PVAL_SLOPE | 0.538 | |
Values less than 0.25 are not shown
Figure 4. Pairwise scatterplots showing RIN plotted against (A) B_ACTIN35, (B) GAPDH35 and (C) SF. Outlier chips (Table 3) are represented by blue triangles.
Figure 5. The effect of including poor quality chips in analyses on the ability to detect differential gene expression. Fewer differentially expressed genes are detected when comparing male and female motor cortex if a chip that failed QC is included in the analysis: at least 50% fewer probe sets are detected as differentially expressed at each of the two p-value thresholds (t-test, nominal unadjusted p-values). Bad (B) indicates comparisons where one chip that failed QC (HC71: female) was included in the analysis; Good (G) indicates comparisons where all chips passed QC. Samples were matched for age. The effect is most marked with very small chip numbers and diminishes gradually as chip numbers increase.
The number of samples included at various steps in the process from total RNA to analysis of expression and in the current analysis of quality control measures
| 45 | 45 | 44 | 134 | |
| 39 | 41 | 37 | 117 | |
| 36 | 41 | 35 | 112 | |
| 36 | 38 | 35 | 117 |