Ming Li, Yalu Wen, Qing Lu, Wenjiang J Fu.
Abstract
Oligonucleotide microarrays are commonly adopted for detecting and quantifying the abundance of molecules in biological samples. Analysis of microarray data starts with recording and interpreting hybridization signals from CEL images. However, many CEL images are blemished by noise from various sources, observed as "bright spots", "dark clouds", "shadowy circles", and similar artifacts. It is crucial that these image defects are correctly identified and properly processed. Existing approaches mainly focus on detecting defect areas and removing the affected intensities. In this article, we propose using a mixed effect model to impute the affected intensities instead. The proposed imputation procedure is a single-array-based approach that does not require any biological replicates or between-array normalization. We further examine its performance using Affymetrix high-density SNP arrays. The results show that this imputation procedure significantly reduces genotyping error rates. We also discuss the adjustments necessary for its potential extension to other oligonucleotide microarrays, such as those used for gene expression profiling. The R source code implementing the approach is freely available upon request.
Year: 2013 PMID: 23505547 PMCID: PMC3591399 DOI: 10.1371/journal.pone.0058677
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Twenty-five-mer oligonucleotides that are perfectly matched or mismatched to the target sequence with SNP allele A or B.
Figure 2. CEL images before and after imputation.
Left: before; right: after.
Figure 3. Allelic copy number abundance before and after imputation.
Left: before; right: after. Red: genotype AA; blue: genotype BB; green: genotype AB.
Genotyping error before and after imputation.
| Sample ID | # of SNPs affected | Avg. # of probes affected per SNP | Error rate before imputation | Error rate after imputation |
| NA12812 | 20075 | 5.67 | 2.54% | 0.45% |
| NA10835 | 20925 | 3.76 | 1.56% | 0.32% |
| NA12239 | 12758 | 3.34 | 1.05% | 0.26% |
| NA12144 | 7330 | 3.22 | 0.95% | 0.24% |
| NA12005 | 4404 | 3.01 | 0.83% | 0.65% |
| NA12056 | 7490 | 3.28 | 0.87% | 0.85% |
| NA12146 | 7657 | 3.16 | 0.73% | 0.25% |
| NA12155 | 9394 | 3.18 | 0.71% | 0.71% |
| NA07056 | 2026 | 2.94 | 0.70% | 0.54% |
| NA12236 | 8071 | 3.19 | 0.66% | 0.65% |
| NA12813 | 4788 | 3.15 | 0.61% | 0.25% |
| NA10863 | 5050 | 3.07 | 0.56% | 0.55% |
Genotyping error rates stratified by number of affected probes.
| # of affected probes | 0 | 1–4 (<10%) | 5–8 (10%–20%) | >8 (>20%) |
| # of SNPs | 415417 (84.5%) | 65458 (13.3%) | 8313 (1.7%) | 2688 (0.5%) |
| # of errors before imputation | 1871 | 2165 | 504 | 247 |
| Error rates before imputation | 0.45% | 3.33% | 6.06% | 9.19% |
| # of errors after imputation | 1871 | 379 | 74 | 19 |
| Error rates after imputation | 0.45% | 0.58% | 0.89% | 0.71% |
| Fold change in error rate (before/after) | 1 | 5.7 | 6.8 | 13 |
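The error rates and fold changes in the stratified table above can be recomputed from its raw counts. The following is a minimal sketch in Python, not the authors' R code: the variable and function names are illustrative assumptions, and the fold change is taken to be the number of errors before imputation divided by the number after, which is equivalent to the ratio of the two error rates within a stratum.

```python
# Counts copied from the stratified table above, one entry per stratum
# of "number of affected probes".
strata = ["0", "1-4", "5-8", ">8"]
n_snps = [415417, 65458, 8313, 2688]
errors_before = [1871, 2165, 504, 247]
errors_after = [1871, 379, 74, 19]


def error_rate(errors, total):
    """Return the genotyping error rate as a percentage."""
    return 100.0 * errors / total


# Assumed definition: fold change = errors before / errors after.
# Within a stratum the SNP count cancels, so this equals the rate ratio.
fold_changes = [b / a for b, a in zip(errors_before, errors_after)]

for label, n, before, after, fold in zip(
    strata, n_snps, errors_before, errors_after, fold_changes
):
    print(
        f"{label:>4} affected probes: "
        f"{error_rate(before, n):.2f}% -> {error_rate(after, n):.2f}% "
        f"(fold change {fold:.1f})"
    )
```

Values recomputed this way may differ from the printed table in the last digit, since the published rates were presumably rounded independently.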