Matthew N McCall, Rafael A Irizarry.
Abstract
BACKGROUND: A novel method of microarray preprocessing, Frozen Robust Multi-array Analysis (fRMA), has recently been developed. This algorithm allows the user to preprocess arrays individually while retaining the advantages of multi-array preprocessing methods. The frozen parameter estimates required by this algorithm are generated using a large database of publicly available arrays. Curation of such a database and creation of the frozen parameter estimates is time-consuming; therefore, fRMA has only been implemented on the most widely used Affymetrix platforms.
Year: 2011 PMID: 21923903 PMCID: PMC3180392 DOI: 10.1186/1471-2105-12-369
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Figure 1. Diagram of a typical workflow. A typical workflow using the affy, frma, and frmaTools packages to obtain fRMA gene expression estimates in the form of an ExpressionSet or frmaExpressionSet object. Above each arrow is the function used to transform one object into the next. The dashed lines divide the figure by the package used.
Training data size affects fRMA reproducibility
| Number of batches (rows) / batch size (columns) | | | | | | |
|---|---|---|---|---|---|---|
| 5 | 0.7585 (0.1574) | 0.7573 (0.1437) | 0.8515 (0.1435) | 0.5812 (0.1231) | 0.4439 (0.0916) | |
| 10 | 0.6795 (0.0799) | 0.7173 (0.0858) | 0.6563 (0.0641) | 0.4506 (0.1073) | 0.3878 (0.0901) | |
| 20 | 0.5696 (0.0654) | 0.4691 (0.0523) | 0.5180 (0.0506) | 0.4551 (0.0561) | 0.3299 (0.0629) | |
| 30 | 0.4429 (0.0491) | 0.3884 (0.0482) | 0.3387 (0.0380) | 0.3697 (0.0537) | 0.3036 (0.0440) | |
| 40 | 0.3290 (0.0450) | 0.3700 (0.0488) | 0.2598 (0.0368) | 0.2642 (0.0303) | ||
| 50 | 0.3093 (0.0424) | 0.3107 (0.0339) | 0.2307 (0.0291) | |||
| 60 | 0.2661 (0.0374) | 0.2454 (0.0322) | 0.1955 (0.0261) | |||
| 70 | 0.2529 (0.0322) | 0.2286 (0.0295) | 0.2089 (0.0261) | |||
| 80 | 0.2256 (0.0281) | 0.2098 (0.0285) | 0.1616 (0.0259) | |||
| 90 | 0.1922 (0.0274) | 0.2058 (0.0248) | 0.1566 (0.0163) | |||
| 100 | 0.1891 (0.0277) | 0.1976 (0.0261) | 0.1128 (0.0167) | |||
Median and IQR of the across-replicate median absolute deviations (MAD) for different batch sizes (columns) and number of batches (rows) used to train the fRMA algorithm. The median provides an estimate of the typical MAD; the IQR provides an estimate of the variability seen in MADs across replicates.
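The summary statistic reported in each cell of the table can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one MAD is computed per probeset across replicate training runs, that the MAD is unscaled (no 1.4826 consistency constant), and that the IQR uses inclusive quartiles. The names `mad`, `summarize_mads`, and the input layout are hypothetical.

```python
from statistics import median, quantiles


def mad(values):
    """Unscaled median absolute deviation of a sequence of numbers."""
    center = median(values)
    return median(abs(v - center) for v in values)


def summarize_mads(replicate_estimates):
    """Summarize across-replicate variability.

    replicate_estimates: one list per probeset, holding that probeset's
    expression estimate from each replicate (hypothetical layout).
    Returns (median of the MADs, IQR of the MADs).
    """
    mads = [mad(row) for row in replicate_estimates]
    q1, _, q3 = quantiles(mads, n=4, method="inclusive")
    return median(mads), q3 - q1


if __name__ == "__main__":
    # Five toy probesets, three replicates each.
    toy = [[0, 1, 2], [0, 2, 4], [0, 3, 6], [0, 4, 8], [0, 5, 10]]
    typical, spread = summarize_mads(toy)
    print(f"median MAD = {typical}, IQR of MADs = {spread}")
```

A smaller median indicates more reproducible estimates, and a smaller IQR indicates that reproducibility is more uniform, which matches the downward trend in the table as the number of batches grows.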
Figure 2. fRMA can mimic RMA. Distribution of difference in expression estimates for 22283 probesets across 200 breast tumor arrays (GSE11121) between 3 different fRMA implementations and RMA. The 3 fRMA implementations are as follows: (1) the default fRMA implementation (dotted line), (2) fRMA trained on a balanced random sample from the arrays being analyzed (dashed line), and (3) fRMA trained on a balanced random sample from the arrays being analyzed and using the same reference distribution for quantile normalization as RMA.
Comparison of RMA and fRMA based on bias, precision, and overall performance
| Preprocessing | Slope (SD) | Null SD | Null 99.5% | SNR | POT | |
|---|---|---|---|---|---|---|
| RMA | 0.14 (0.54) | 0.47 | 1.46 | 0.30 | 0.01 | |
| fRMA | 0.20 (0.48) | 0.40 | 1.24 | 0.50 | 0.02 | |
| RMA | 0.69 (0.55) | 0.39 | 1.26 | 1.77 | 0.15 | |
| fRMA | 0.75 (0.49) | 0.34 | 1.11 | 2.21 | 0.23 | |
| RMA | 0.61 (0.42) | 0.33 | 0.99 | 1.85 | 0.18 | |
| fRMA | 0.60 (0.36) | 0.28 | 0.86 | 2.14 | 0.23 | |
Comparison of RMA and fRMA trained on the modified spike-in data. For three intensity strata, we report assessments of accuracy (column 1), precision (columns 2 & 3), and overall performance (columns 4 & 5).
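The numbers in the table are consistent with the SNR column being the slope divided by the null SD. That relationship is inferred from the reported values, not stated in this record, so the check below is only a sketch. The tuples are transcribed from the table, and small discrepancies are expected because the published figures are rounded.

```python
# (method, slope, null SD, reported SNR), transcribed from the table above.
ROWS = [
    ("RMA", 0.14, 0.47, 0.30),
    ("fRMA", 0.20, 0.40, 0.50),
    ("RMA", 0.69, 0.39, 1.77),
    ("fRMA", 0.75, 0.34, 2.21),
    ("RMA", 0.61, 0.33, 1.85),
    ("fRMA", 0.60, 0.28, 2.14),
]


def snr(slope, null_sd):
    """Signal-to-noise ratio under the assumed definition slope / null SD."""
    return slope / null_sd


if __name__ == "__main__":
    for method, slope, null_sd, reported in ROWS:
        print(f"{method:5s} computed={snr(slope, null_sd):.2f} reported={reported:.2f}")
```

Under this reading, fRMA's higher SNR in each stratum comes from a smaller null SD combined with a similar or larger slope.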
Figure 3. fRMA can be used with incrementally growing datasets. Distribution of difference in expression estimates for 22283 probesets across 200 breast tumor arrays (GSE11121) when preprocessing with fRMA trained using the first N batches and fRMA trained using all batches.
Figure 4. Using the same normalization vector reduces bias. Same as Figure 3 except that the same normalization vector is used for all of the implementations. This suggests that differences in the center of the distributions are primarily due to differences in the reference distribution used in quantile normalization and that differences in the spread of the distributions are primarily due to estimation of the probe effects.
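The role of the reference distribution in the Figure 4 caption can be illustrated with a minimal sketch of quantile normalization against a fixed ("frozen") reference vector. This is a simplified illustration, not the frma package's implementation: the function name is hypothetical, the reference is assumed to have the same length as the array, and ties are broken by position instead of being averaged.

```python
def quantile_normalize_to_reference(intensities, reference):
    """Replace each intensity with the reference value of the same rank.

    intensities: probe intensities from a single array.
    reference:   fixed reference distribution of the same length.
    """
    if len(intensities) != len(reference):
        raise ValueError("intensities and reference must have the same length")
    ref_sorted = sorted(reference)
    # Indices of the intensities, ordered from smallest to largest value.
    order = sorted(range(len(intensities)), key=intensities.__getitem__)
    normalized = [0.0] * len(intensities)
    for rank, idx in enumerate(order):
        normalized[idx] = ref_sorted[rank]
    return normalized


if __name__ == "__main__":
    reference = [10.0, 20.0, 30.0]
    print(quantile_normalize_to_reference([5.0, 1.0, 3.0], reference))
    print(quantile_normalize_to_reference([900.0, 700.0, 800.0], reference))
```

Every array normalized against the same reference ends up with an identical distribution of values, so two preprocessing runs that share a reference cannot differ in the center of their normalized intensities. Swapping in a different reference shifts all arrays together.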