Adam L Asare, Zhong Gao, Vincent J Carey, Richard Wang, Vicki Seyfert-Margolis.
Abstract
MOTIVATION: As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression followed by parametric multivariate outlier testing.
Year: 2008 PMID: 19015138 PMCID: PMC2638936 DOI: 10.1093/bioinformatics/btn591
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Application of multivariate outlier detection to negative and positive controls derived from MAQC and Affymetrix spike-in series, the latter with digital contamination
Negative controls

| Source | No. of chips | Flagged by PMVO-raw | Flagged by PMVO-PC | Flagged by MDQC-PC |
|---|---|---|---|---|
| Affy. MAQC | 120 | (34,34,23) | (0,0,0) | (9,3,1) |
| Illu. MAQC | 19 | (0,0,0) | (0,0,0) | (3,1,0) |

Digitally contaminated arrays

| Source | No. of chips | Contaminated | Flagged by PMVO-PC | Flagged by MDQC |
|---|---|---|---|---|
| Affy. spike-in | 12 | – | none | 2,8,10 |
| | | 1 | 1 | 1,8 |
| | | 1,2 | 1,2,8 | 1,2 |
| | | 1,2,11 | 1,2,8,11 | 8,10 |
For negative controls, table cells give number of arrays flagged at α=0.10, 0.05, 0.01.
For positive controls, cell entries give indices of arrays contaminated or identified by various algorithms. Method labels are: PMVO-raw, for parametric multivariate outlier detection applied to raw QA features; PMVO-PC, for PMVO applied with dimension reduction to first three principal components; MDQC, for Mahalanobis distance-based algorithm of Cohen Freue et al. (2007) with the MCD estimator of covariance, applied to raw QC features; and MDQC-PC, for MDQC with the S-estimator of covariance applied on PC1–PC3 of QC features.
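The positive-control rows can be tallied mechanically. The following is a minimal sketch, not part of the published analysis: it transcribes the contaminated and flagged array indices from the spike-in rows of the table and counts hits, misses and false alarms per method. The names `ROWS`, `score` and `summarize` are hypothetical and introduced here only for illustration.

```python
# Array indices transcribed from the "Digitally contaminated arrays" rows of the table.
# ROWS, score and summarize are illustrative names, not from the paper or arrayMvout.
ROWS = [
    {"contaminated": set(), "PMVO-PC": set(), "MDQC": {2, 8, 10}},
    {"contaminated": {1}, "PMVO-PC": {1}, "MDQC": {1, 8}},
    {"contaminated": {1, 2}, "PMVO-PC": {1, 2, 8}, "MDQC": {1, 2}},
    {"contaminated": {1, 2, 11}, "PMVO-PC": {1, 2, 8, 11}, "MDQC": {8, 10}},
]


def score(truth, flagged):
    """Compare the set of truly contaminated arrays with the set a method flagged."""
    return {
        "hits": len(truth & flagged),          # contaminated and flagged
        "misses": len(truth - flagged),        # contaminated but not flagged
        "false_alarms": len(flagged - truth),  # flagged but not contaminated
    }


def summarize(rows, method):
    """Sum the per-row scores for one method over all contamination scenarios."""
    totals = {"hits": 0, "misses": 0, "false_alarms": 0}
    for row in rows:
        for key, value in score(row["contaminated"], row[method]).items():
            totals[key] += value
    return totals


if __name__ == "__main__":
    for method in ("PMVO-PC", "MDQC"):
        print(method, summarize(ROWS, method))
```

With the values as transcribed, PMVO-PC flags all six contaminated arrays across the four scenarios and raises two false alarms (array 8 in both cases), while MDQC on raw features flags three of the six and raises six false alarms.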
Fig. 1. Composite of four types of digital contamination applied to raw Affymetrix intensity data. The three circular subregions are, counterclockwise from upper left, low constant, variable and high constant blobs; the rectangular region on the right has inflated variance.
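As a rough illustration of the kind of contamination the Fig. 1 caption describes, the sketch below overwrites three circular regions and perturbs one rectangular region of a 2-D intensity array. Everything specific here is an assumption made for the example: the function name `contaminate`, the blob positions and radius, the use of the 1% and 99% quantiles as the low and high constants, uniform random values for the variable blob, and additive Gaussian noise for the inflated-variance rectangle. The source does not state how the authors generated their contamination.

```python
import numpy as np


def contaminate(intensity, seed=0):
    """Return a contaminated copy of a 2-D intensity array and the region masks.

    All geometry and magnitudes below are hypothetical choices for illustration.
    """
    rng = np.random.default_rng(seed)
    out = np.array(intensity, dtype=float)  # np.array copies, so the input is untouched
    rows, cols = out.shape
    rr, cc = np.ogrid[:rows, :cols]
    radius = min(rows, cols) / 8.0
    sd = float(np.std(out))
    low, high = np.quantile(out, [0.01, 0.99])

    def disc(frac_r, frac_c):
        # Boolean mask of a circle centred at the given fractional position.
        return (rr - frac_r * rows) ** 2 + (cc - frac_c * cols) ** 2 <= radius ** 2

    masks = {
        "low_constant": disc(0.25, 0.25),   # upper left
        "variable": disc(0.75, 0.25),       # lower left
        "high_constant": disc(0.75, 0.55),  # lower middle
    }
    rect = np.zeros(out.shape, dtype=bool)
    rect[rows // 5 : 4 * rows // 5, 3 * cols // 4 : 19 * cols // 20] = True
    masks["inflated_variance"] = rect       # right-hand rectangle

    out[masks["low_constant"]] = low
    out[masks["high_constant"]] = high
    out[masks["variable"]] = rng.uniform(low, high, size=int(masks["variable"].sum()))
    # Zero-mean noise raises the spread in the rectangle without shifting its mean much.
    out[rect] += rng.normal(0.0, 2.0 * sd, size=int(rect.sum()))
    return out, masks


if __name__ == "__main__":
    chip = np.random.default_rng(1).lognormal(mean=7.0, sigma=1.0, size=(100, 100))
    dirty, masks = contaminate(chip)
    print({name: int(mask.sum()) for name, mask in masks.items()})
```

The additive noise can push some intensities below zero; a real pipeline would need to decide how to handle that before computing QA metrics.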
Fig. 2. Parallel coordinate plots are a common way of visualizing multivariate data measured on different scales, which facilitates detection of outliers. Our QA approach flagged 18 of the 507 microarrays as aberrant (highlighted in red). It selects samples as problematic when one or more indicators appear outlying, by reducing the dimensionality of the data via PCA and applying a sequential Wilks's multivariate outlier test at α=0.01. The approach provides greater consistency in designating problematic arrays through a statistical framework that does not rely on arbitrary cutoffs for any individual indicator.
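The combination of PCA and a Wilks-type outlier test named in the Fig. 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the arrayMvout implementation. QA features are standardized before PCA (an assumption), the first three principal component scores are kept, and the most extreme array is tested with Wilks's scatter-ratio statistic using a Bonferroni bound. The loop then removes flagged arrays one at a time, which is a plain forward variant and may differ from the sequential procedure the authors used. The function names and the hand-rolled `beta_cdf` are hypothetical helpers written to keep the example dependent on NumPy only.

```python
import math

import numpy as np


def beta_cdf(x, a, b, steps=20001):
    """Regularized incomplete beta I_x(a, b) by trapezoidal integration (assumes a >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    t = np.linspace(0.0, x, steps)
    y = t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)
    integral = (t[1] - t[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return min(1.0, float(integral) * math.exp(log_norm))


def pc_scores(features, n_components=3):
    """Standardize columns (an assumption) and project onto the leading PCs."""
    x = np.asarray(features, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    return z @ vt[:n_components].T


def wilks_outlier_test(scores):
    """Return (index, Bonferroni p-value) for the most extreme row of `scores`."""
    x = np.asarray(scores, dtype=float)
    n, p = x.shape
    centered = x - x.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(x, rowvar=False)))
    d2 = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)  # squared Mahalanobis
    w = 1.0 - n * d2 / (n - 1) ** 2  # Wilks's scatter ratio; small means outlying
    i = int(np.argmin(w))
    # Under multivariate normality each w follows Beta((n - p - 1) / 2, p / 2).
    p_value = min(1.0, n * beta_cdf(float(w[i]), (n - p - 1) / 2.0, p / 2.0))
    return i, p_value


def flag_outliers(features, alpha=0.01, n_components=3, max_outliers=10):
    """Forward sequential flagging on PC scores computed once from all arrays."""
    scores = pc_scores(features, n_components)
    remaining = list(range(len(scores)))
    flagged = []
    while len(flagged) < max_outliers and len(remaining) > n_components + 3:
        i, p_value = wilks_outlier_test(scores[remaining])
        if p_value > alpha:
            break
        flagged.append(remaining.pop(i))
    return sorted(flagged)


if __name__ == "__main__":
    qa = np.random.default_rng(0).normal(size=(60, 6))  # simulated QA metrics
    qa[7] += 12.0                                       # one grossly aberrant array
    print(flag_outliers(qa))
```

The principal components are computed once, with any outliers included; recomputing them after each removal is another reasonable design, and which one matches the published method is not stated in this record.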
Fig. 3. PCA was applied to gene expression estimates for all genes in two clinical trials. (A) The outlier detection approach described was applied to 204 arrays from a ragweed allergy study and identified five outlier samples. These microarrays are highlighted in red in the PCA 1 versus PCA 3 plot for gene expression to show how the outlier samples detected by the system relate to the actual gene expression estimates per array. Points A, B and C have problematic NUSE values. Points D and E have abnormally high GAPDH and HSAC07 ratios. The location of these arrays in the gene expression PCA suggests that QA problems may contribute to deterioration of overall expression. (B) A kidney transplant trial with 42 arrays, of which three were detected as outliers. The three arrays are highlighted in red in the PCA 1 versus PCA 2 gene expression plot. Points F, G and H have abnormal NUSE, GAPDH and HSAC07 ratios. Again, the samples flagged by the QA approach appear to have gene expression estimates that differ from those of the majority of other arrays. The arrayMvout package includes a map, fig3map, from records in the ITN QA metrics matrix to the samples labeled A–H in these figures.
Fig. 4. Statistical power calculations. (A) Ragweed allergy study, showing improved EDR upon removal of the arrays flagged as Points B and C in Figure 3A, in a comparison of two time points of interest. (B) Kidney transplant study, showing the effect of removing the two arrays flagged as F and G in Figure 3B in a differential expression comparison of two treatment cohorts.