Takashi Yoneya, Tatsuya Miyazawa.
Abstract
An enormous amount of microarray data has been collected and accumulated in public repositories. Although some depositions include both raw and processed data, a significant number include only processed data. To combine multiple datasets for a specific purpose, the data should be adjusted prior to use to remove bias between the datasets. We focused on a GeneChip platform and the pre-processing method RMA, and examined simple quantile correction as a post-processing method for integration. Integration of data pre-processed by RMA was evaluated using artificial spike-in datasets and real microarray datasets of atopic dermatitis and lung cancer. Studies using the spike-in datasets show that quantile correction for data integration reduces data quality to some extent, but that the reduction should be within an acceptable level. Studies using the real datasets show that quantile correction significantly reduces the bias. These results show that quantile correction is useful for integrating multiple datasets processed by RMA, and they encourage effective use of public microarray data.
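The abstract does not spell out the correction procedure. As a minimal sketch, assuming that "simple quantile correction" behaves like standard quantile normalization applied across all arrays of the merged, RMA-processed datasets (log2 values on a common set of probe sets), the idea can be illustrated as follows. The function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by original order as a simplification. This is an illustration of the general technique, not the authors' implementation.

```python
from statistics import mean


def quantile_normalize(columns):
    """Quantile-normalize arrays (hypothetical helper, illustration only).

    `columns` is a list of arrays, each a list of log2 expression values
    for the same probe sets in the same order. Every value is replaced by
    the mean, across arrays, of the values at the same rank, so all arrays
    end up sharing one distribution while keeping their within-array ranks.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of probe sets")

    # Reference distribution: mean of the k-th smallest value across arrays.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(vals) for vals in zip(*sorted_cols)]

    normalized = []
    for col in columns:
        # Rank the probe sets of this array; ties keep their original order
        # (a simplification; averaging tied ranks is a common alternative).
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized


# Two arrays from one (hypothetical) dataset and one array from another
# dataset whose values sit on a shifted scale.
merged = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [9.0, 7.0, 8.0]]
corrected = quantile_normalize(merged)
```

After correction, every array has the same sorted values (the reference distribution), which removes the overall shift between the two toy datasets while preserving each array's ordering of probe sets.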
Keywords: GeneChip; RMA; data integration; microarray; quantile correction
Year: 2011 PMID: 21383905 PMCID: PMC3044426 DOI: 10.6026/97320630005382
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Expression patterns of disease-specific and housekeeping genes. The distributions of CCL18, SERPINB3, ACTB and GAPDH in the AD (atopic dermatitis) dataset are shown in (A), and the distributions of SPP1, CD24, ACTB and GAPDH in the LC (lung cancer) dataset are shown in (B). S1 of the LC dataset is not shown because the raw data of GSE3268 are not available. (A) G1: lesional skin samples of patients (GSE6710); G2: non-lesional skin samples of patients (GSE6710); G3: normal skin samples of healthy donors (GSE5667); G4: non-lesional skin samples of patients (GSE5667); G5: lesional skin samples of patients (GSE5667). (B) G1: normal samples of patients (GSE7670); G2: tumor samples of patients (GSE7670); G3: tumor samples of patients (GSE6253); G4: normal samples of patients (GSE3268); G5: tumor samples of patients (GSE3268).