Xiao-Qin Xia, Zhenyu Jia, Steffen Porwollik, Fred Long, Claudia Hoemme, Kai Ye, Carsten Müller-Tidow, Michael McClelland, Yipeng Wang.
Abstract
Most current microarray oligonucleotide probe design strategies are based on probe design factors (PDFs), which include probe hybridization free energy (PHFE), probe minimum folding energy (PMFE), dimer score, hairpin score, homology score and complexity score. The impact of these PDFs on probe performance was evaluated using four sets of array comparative genomic hybridization (aCGH) data, which covered two array manufacturing methods and the genomes of two species. Since most of the hybridizing DNA is equimolar in CGH data, such data are ideal for testing the general hybridization properties of almost all candidate oligonucleotides. In all our data sets, PDFs related to probe secondary structure (PMFE, hairpin score and dimer score) are the most significant factors linearly correlated with probe hybridization intensities. PHFE, homology score and complexity score correlate significantly with probe specificity, but in a non-linear fashion. We developed a new PDF, pseudo probe binding energy (PPBE), by iteratively fitting dinucleotide positional weights and dinucleotide stacking energies until the average residual sum of squares (ARSS) of the model was minimized. PPBE showed a better correlation with probe sensitivity and a better specificity than all other PDFs, although training data are required to construct a PPBE model before new oligonucleotide probes can be designed. The physical properties measured by PPBE are as yet unknown but include a platform-dependent component. A practical way to use these PDFs for probe design is to set cutoff thresholds that filter out poor-quality probes. Programs and correlation parameters from this study are freely available to facilitate the design of DNA microarray oligonucleotide probes.
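The threshold-based filtering recommended above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the `Probe` record, the choice of three secondary-structure PDFs, the cutoff values and the sign conventions (more negative PMFE and larger hairpin/dimer scores taken to mean stronger unwanted structure) are all hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Hypothetical record holding one candidate probe and a few of its PDF values."""
    name: str
    pmfe: float           # minimum folding energy; assumed: more negative = more stable self-structure
    hairpin_score: float  # assumed: larger = stronger hairpin
    dimer_score: float    # assumed: larger = stronger self-dimer


# Hypothetical cutoffs; in practice they would be chosen from training data.
CUTOFFS = {"pmfe_min": -2.0, "hairpin_max": 5.0, "dimer_max": 8.0}


def passes_filters(p: Probe, cutoffs: dict = CUTOFFS) -> bool:
    """Return True only if the probe satisfies every threshold."""
    return (
        p.pmfe >= cutoffs["pmfe_min"]
        and p.hairpin_score <= cutoffs["hairpin_max"]
        and p.dimer_score <= cutoffs["dimer_max"]
    )


def filter_probes(probes):
    """Keep the candidates that pass all cutoffs, preserving input order."""
    return [p for p in probes if passes_filters(p)]


if __name__ == "__main__":
    candidates = [
        Probe("p1", pmfe=-0.8, hairpin_score=2.0, dimer_score=4.0),
        Probe("p2", pmfe=-5.1, hairpin_score=3.0, dimer_score=4.0),  # folds too stably
        Probe("p3", pmfe=-1.2, hairpin_score=7.5, dimer_score=3.0),  # hairpin too strong
    ]
    print([p.name for p in filter_probes(candidates)])  # ['p1']
```

Hard cutoffs keep the design step simple and transparent; a weighted score combining several PDFs would be an alternative, but the abstract only proposes thresholds.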
Year: 2010 PMID: 20236987 PMCID: PMC2887943 DOI: 10.1093/nar/gkq039
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Array CGH data sets used in this study
| Data Set | Microarray platform | Sample | Manufacturer | Designer | Number of probes | Probe length (bases) | Role of data set in the analysis | Number of samples |
|---|---|---|---|---|---|---|---|---|
| 1 | NimbleGen HG18 whole genome CGH Array | Normal human male genomic DNA | NimbleGen Inc. | NimbleGen Inc. | 137 280 | 50 | Sensitivity | 6 |
| 2 | NimbleGen Human Promoter Array (custom design) | Human prostate cell line (PC3M, 267B1) genomic DNA | NimbleGen Inc. | authors | 220 475 | 50 | Sensitivity | 4 |
| 3 | NimbleGen | | NimbleGen Inc. | authors | 288 238 | 50 | Sensitivity, specificity | 4 |
| 4 | In-house Spotted Human Promoter Array (custom design) | Normal human lung tissue genomic DNA | authors | authors | 11 653 | 50 | Sensitivity, reproducibility | 205 |
Figure 1. ARSS, positional weights and pseudo dinucleotide stacking energies of the PPBE model for data set 1. (A) Convergence of the PPBE model after three cycles of iterative fitting of both the positional weights and the pseudo dinucleotide stacking energies (six fitting steps in total); (B) plot of positional weights; (C) comparison of traditional and pseudo dinucleotide stacking energies.
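The alternating fit described for the PPBE model can be sketched as alternating least squares. The model form below is an assumption, not the published one: PPBE for probe j is taken to be the sum over positions i of `w[i] * e[d[j, i]]`, where `d[j, i]` is one of 16 dinucleotide codes, and intensity is modelled as PPBE plus an intercept. With the energies `e` fixed the model is linear in the weights `w`, and vice versa, so each step is an ordinary least-squares solve and the ARSS (taken here as RSS / n) cannot increase. All function names are hypothetical.

```python
import numpy as np

BASES = "ACGT"


def dinucleotide_indices(seqs):
    """Encode equal-length probes as an (n_probes, L-1) array of dinucleotide codes 0..15."""
    code = {b: i for i, b in enumerate(BASES)}
    arr = np.array([[code[b] for b in s] for s in seqs])  # fails on unequal-length probes
    return arr[:, :-1] * 4 + arr[:, 1:]


def pseudo_binding_energy(d, w, e):
    """Hypothetical PPBE form: sum over positions i of w[i] * e[dinucleotide at i]."""
    return (e[d] * w).sum(axis=1)


def fit_ppbe(seqs, y, n_steps=6, seed=0):
    """Alternate between refitting w (e fixed) and e (w fixed), each with an intercept.
    n_steps=6 corresponds to three rounds over both parameter blocks."""
    d = dinucleotide_indices(seqs)
    y = np.asarray(y, dtype=float)
    n, m = d.shape
    rows = np.arange(n)
    ones = np.ones((n, 1))
    w = np.ones(m)                                   # positional weights
    e = np.random.default_rng(seed).normal(size=16)  # pseudo stacking energies
    c = 0.0                                          # intercept
    arss_history = []
    for step in range(n_steps):
        if step % 2 == 0:
            # Energies fixed: column i holds the energy of the dinucleotide at position i.
            coef, *_ = np.linalg.lstsq(np.hstack([e[d], ones]), y, rcond=None)
            w, c = coef[:-1], coef[-1]
        else:
            # Weights fixed: column k sums the weights of positions holding dinucleotide k.
            X = np.zeros((n, 16))
            for i in range(m):
                X[rows, d[:, i]] += w[i]
            coef, *_ = np.linalg.lstsq(np.hstack([X, ones]), y, rcond=None)
            e, c = coef[:-1], coef[-1]
        resid = y - (pseudo_binding_energy(d, w, e) + c)
        arss_history.append(float(np.mean(resid ** 2)))  # ARSS taken as RSS / n
    return w, e, c, arss_history
```

Because `w` and `e` enter the model as a product, they are only identified up to a common scale factor; `lstsq` also tolerates the collinearity between the energy columns and the intercept in the second step by returning a minimum-norm solution.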
Figure 2. Box plots (black lines) show the correlation of individual PDFs with observed oligonucleotide probe hybridization intensities for data set 1. The density curves (red lines), computed using kernel density estimates, show the distribution of each PDF. The secondary y-axis represents the density of the different PDFs.
Average residual sum of squares (ARSS) and correlation coefficients (r) of simple linear models relating individual PDFs to probe hybridization intensities
| PDF | Data Set 1 r | Data Set 1 ARSS | Data Set 2 r | Data Set 2 ARSS | Data Set 3 r | Data Set 3 ARSS | Data Set 4 r | Data Set 4 ARSS |
|---|---|---|---|---|---|---|---|---|
| PHFE | 0.11 | 0.168 | 0.03 | 0.504 | 0.03 | 0.460 | 0.13 | 1.668 |
| PMFE | 0.29 | 0.156 | 0.27 | 0.468 | 0.32 | 0.414 | 0.28 | 1.568 |
| HairpinScore | 0.21 | 0.162 | 0.22 | 0.479 | 0.20 | 0.442 | 0.21 | 1.621 |
| DimerScore | 0.19 | 0.164 | 0.23 | 0.478 | 0.17 | 0.448 | 0.15 | 1.660 |
| ComplexityScore-2B | 0.08 | 0.169 | 0.05 | 0.503 | 0.02 | 0.461 | 0.09 | 1.684 |
| ComplexityScore-5B | 0.04 | 0.170 | 0.11 | 0.498 | 0.01 | 0.461 | 0.02 | 1.698 |
| ComplexityScore-8B | 0.01 | 0.170 | 0.15 | 0.493 | 0.01 | 0.461 | 0.12 | 1.675 |
| ComplexityScore-11B | 0.01 | 0.170 | 0.10 | 0.498 | 0.02 | 0.461 | 0.10 | 1.683 |
| BlastScore | 0.02 | 0.170 | 0.11 | 0.498 | 0.01 | 0.461 | 0.18 | 1.641 |
| PPBE | 0.36 | 0.148 | 0.30 | 0.460 | 0.65 | 0.269 | 0.48 | 1.301 |
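The two statistics reported per PDF can be computed from a simple linear regression of intensity on the PDF value. The sketch below assumes ARSS means the residual sum of squares divided by the number of probes (RSS / n); under that assumption ARSS equals var(y) × (1 − r²), which explains why ARSS falls as r rises within each data set. For example, in data set 1 the PDFs with r ≈ 0 have ARSS ≈ 0.170, and 0.170 × (1 − 0.36²) ≈ 0.148, matching the PPBE entry. The function name is hypothetical.

```python
import numpy as np


def simple_linear_fit(x, y):
    """Regress y on x with an intercept; return (r, arss), where ARSS is taken as RSS / n."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)    # ordinary least-squares line
    resid = y - (slope * x + intercept)
    r = np.corrcoef(x, y)[0, 1]               # Pearson correlation coefficient
    return float(r), float(np.mean(resid ** 2))
```

Because the identity ties ARSS to r for a single predictor, the two columns rank the PDFs identically within a data set; ARSS becomes more informative when comparing multivariate models, as in the later figures.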
Figure 3. Relative ARSS of different models for different data sets.
Figure 4. Comparison of ARSS for within-data-set validation of the multivariate models without (W/O) or with (W.) PPBE.
Figure 5. Correlation of probe hybridization intensity with probe specificity and reproducibility. (A) Correlation of probe hybridization intensity with probe specificity (observed log2-transformed ratio) for data set 3; the gray line indicates no change. (B) Correlation of oligonucleotide probe hybridization intensity with probe reproducibility, represented as the coefficient of variation (CV), for data set 4.
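The two per-probe quantities plotted against intensity in Figure 5 can be computed as follows. This is a minimal sketch under stated assumptions: intensities are arranged as a probes × samples matrix, CV uses the sample standard deviation (ddof=1) of untransformed intensities, and the specificity ratio is a plain test/reference ratio. The captions do not specify these details, and the function names are hypothetical.

```python
import numpy as np


def coefficient_of_variation(intensities):
    """Per-probe CV = SD / mean across replicate arrays (rows = probes, columns = samples)."""
    a = np.asarray(intensities, dtype=float)
    return a.std(axis=1, ddof=1) / a.mean(axis=1)


def log2_ratio(test, reference):
    """Per-probe log2(test / reference); a value of 0 corresponds to no change."""
    return np.log2(np.asarray(test, dtype=float) / np.asarray(reference, dtype=float))
```

A lower CV means a more reproducible probe, and a log2 ratio closer to the value expected from the known copy-number difference means a more specific probe.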