Chao Lu.
Abstract
BACKGROUND: Normalization is an important step for microarray data analysis to minimize biological and technical variations. Choosing a suitable approach can be critical. The default method in GeneChip expression microarray uses a constant factor, the scaling factor (SF), for every gene on an array. The SF is obtained from a trimmed average signal of the array after excluding the 2% of the probe sets with the highest and the lowest values.
Year: 2004 PMID: 15283861 PMCID: PMC509236 DOI: 10.1186/1471-2105-5-103
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of signal data in 76 rat genome U34A GeneChip microarrays.
| | Lowest | Highest | Mean | SD | CV (%) |
| Total signal | 832,561.4 | 3,161,392.7 | 2,039,655.7 | 526,295.0 | 25.80% |
| Sum of signals used for SF | 524,513.7 | 1,986,236.9 | 1,212,296.5 | 336,138.0 | 27.73% |
| Trimmed total | 308,047.7 | 1,240,257.3 | 827,359.1 | 215,325.1 | 26.03% |
| Mean signal | 94.6 | 359.3 | 231.0 | 59.8 | 25.80% |
| Median of signals | 17.8 | 54.8 | 35.7 | 8.7 | 24.41% |
| Mean of log signals | 4.3 | 5.8 | 5.1 | 0.4 | 7.17% |
| Trimmed percentage | 34.4 | 54.1 | 40.7 | 4.4 | 10.70% |
"Total signal" is the sum of all signals on each array. "Sum of signals used for SF" is the sum of the signals that remain after trimming and are used to calculate the SF. "Trimmed total" is the sum of the signals of the 2% of probe sets with the highest signals on the array. "Mean of log signals" is the mean of the log2-transformed signals. "Trimmed percentage" = (Trimmed total/Total signal) × 100%. See also Methods. "Lowest" and "Highest" are the lowest and highest values in each category among the 76 chips. The mean, standard deviation (SD) and coefficient of variation (CV) across the chips are also given.
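The quantities defined in the note above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the GeneChip software's implementation: the function names `summarize_array` and `cv_percent`, the dictionary keys, and the target intensity of 500 are hypothetical choices made here. The trimming follows the abstract (2% of probe sets removed at each end), and the SF is taken as the target intensity divided by the trimmed mean signal.

```python
import numpy as np


def summarize_array(signals, trim_fraction=0.02, target=500.0):
    """Per-array summary quantities, following the definitions in the table note.

    `target` is an assumed target intensity; the source does not state its value.
    Signals are assumed to be strictly positive so that log2 is defined.
    """
    s = np.sort(np.asarray(signals, dtype=float))
    k = int(s.size * trim_fraction)      # number of probe sets trimmed at each end
    kept = s[k:s.size - k]               # signals used to calculate the SF
    top = s[s.size - k:]                 # the trimmed probe sets with the highest signals
    total = s.sum()
    return {
        "total_signal": total,
        "sum_used_for_sf": kept.sum(),
        "trimmed_total": top.sum(),
        "trimmed_percentage": top.sum() / total * 100.0,
        "mean_signal": s.mean(),
        "median_signal": float(np.median(s)),
        "mean_log_signal": float(np.log2(s).mean()),
        "scaling_factor": target / kept.mean(),  # SF = target / trimmed mean
    }


def cv_percent(mean, sd):
    """Coefficient of variation, expressed as a percentage of the mean."""
    return sd / mean * 100.0
```

As a check against the table, `cv_percent(2039655.7, 526295.0)` evaluates to about 25.80, which matches the CV reported for the "Total signal" row.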
Figure 1. (A) Comparison among different normalization factors. NFLogMean (x-axis) is plotted against SF (red open triangle) and NFMedian (black closed circle). The correlation between NFLogMean and NFMedian is higher (R = 0.971) than that between NFLogMean and SF (R = 0.918). (B) The NF score, NFscore, for SF (red open triangle), NFMedian (blue open diamond) and NFLogMean (black closed circle) is expressed as a function of the respective 'true NF'. NFTrimLogMean is not shown, to simplify the graph, since it is similar to NFLogMean. See also Methods.