Sophie Lemoine, Florence Combes, Stéphane Le Crom.
Abstract
The increase in feature resolution and the availability of multipack formats from microarray providers have opened the way to various custom genomic applications. However, oligonucleotide design and selection remain a bottleneck of the microarray workflow. Several tools are available to perform this work, and choosing the best one is not an easy task. Here we review the oligonucleotide design field to help users make their choice. We first performed a comparative evaluation of the available solutions based on a set of criteria including ease of installation, user-friendliness, and the number of parameters and settings available. In a second step, we submitted two real cases to a selection of programs. Finally, we used a set of tests for the in silico benchmark of the oligo sets obtained from each type of software. We show that the design software must be selected according to the goal of the scientist, depending on factors such as the organism used, the number of probes required and their localization on the target sequence. The present work provides keys to the choice of the most relevant software, according to the various parameters we tested.
Year: 2009 PMID: 19208645 PMCID: PMC2665234 DOI: 10.1093/nar/gkp053
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Description of the parameters involved in the oligonucleotide specificity and the way they are calculated
| Design program | Cross-hybridization assessment | Management of low complexity region | Design orientation | Number of probes designed by gene |
|---|---|---|---|---|
| ArrayOligoSelector | Blast and thermodynamic calculations | Filter out probes using a lossless compression score | Probes ranking according to the 3′ end distance | Chosen by the user |
| CommOligo | Thermodynamic calculations and Kane's specifications | Masking sequences with nucleotide repeats | Design starting from the 3′- or 5′-end | Chosen by the user |
| GoArrays | Blast and Kane's specifications | Filter out probes with prohibited sequences | Read input sequences from the 3′ end | Chosen by the user |
| HPD | Multiple alignment and hierarchical clustering (ClustalW) | No masking or filtering | No localization specified | All probes reaching selection criteria |
| Mprime | Blast (wublast) | No masking or filtering | Probes weighted towards 3′-end (optional region specification) | Chosen by the user |
| OliD | Blast | Filter out probes with trinucleotide repeats | Preference given to the 3′-end proximity | Chosen by the user |
| OligoArray | Blast and thermodynamic calculations | Masking sequences with nucleotide repeats | Distance to the 3′-end specified by the user (max 1500) | Chosen by the user |
| Oligodb | Blast | Masking sequences using DUST with BLAST | No localization specified | Chosen by the user |
| OligoFaktory | Blast | Masking sequences using DUST with BLAST | Design starting from the 3′- or 5′-end | Chosen by the user (up to 3) |
| OligoPicker | Blast and repetitive sequence stretches | Masking sequences using DUST with BLAST | Design chosen for the 3′- or the 5′-end | Chosen by the user (up to 5) |
| OligoWiz | Blast and thermodynamic calculations | Filter out probes using a low complexity score calculation | Localization score based on centre, 5′- or 3′-distance | All probes reaching selection criteria |
| Oliz | Blast and Kane's specifications | No masking or filtering | Probes designed in the 3′-UTR | All probes reaching selection criteria |
| Osprey | Position Specific Scoring Matrix and Gribskov profiles | Masking sequences with nucleotide repeats | 5′- or 3′-bias as the last constraint | All probes reaching selection criteria |
| PICKY | Suffix array approach using Kane's specifications and thermodynamic calculations | Masking sequence with a low complexity using suffix array structure | No localization specified | Chosen by the user (up to 5) |
| PROBEmer | Suffix array approach | No masking or filtering | Optional range of positions | All probes reaching selection criteria |
| Probesel | Suffix array approach and thermodynamic calculation | No masking or filtering | No localization specified | One probe by target gene |
| ProbeSelect | Suffix array approach and thermodynamic calculation | Filter out probes with nucleotide repeats | No localization specified | Chosen by the user |
| ROSO | Blast | Filter out probes with nucleotide repeats | Optional probe localization in a 3′ or 5′ range | Chosen by the user |
| SEPON | Blast | No masking or filtering | Penalty given based on 3′ distance | Chosen by the user |
| YODA | Custom sequence similarity search tool (SeqMatch) | Filter out probes with prohibited sequences | Selection of 5′-end, 3′-end, centre or all probes | All probes reaching selection criteria |
Cross-hybridisation and low complexity directly affect the specificity of the oligonucleotide, whereas design orientation and the number of probes designed per sequence affect specificity indirectly.
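The table lists several strategies for handling low complexity regions. As an illustration only, the following Python sketch shows two of these ideas in their simplest form: a compression-based score (in the spirit of the lossless compression score listed for ArrayOligoSelector) and a check for consecutive trinucleotide repeats (as listed for OliD). The function names, the use of `zlib` and the default thresholds are assumptions made for this example; they are not taken from either program.

```python
import re
import zlib


def compression_score(probe: str) -> int:
    """Size in bytes of the zlib-compressed probe.

    Assumption for this sketch: a smaller compressed size indicates a more
    repetitive (lower complexity) sequence. Real tools may use another
    compressor and another normalisation.
    """
    return len(zlib.compress(probe.upper().encode("ascii"), 9))


def has_trinucleotide_repeat(probe: str, min_copies: int = 4) -> bool:
    """Return True if any 3-mer occurs at least `min_copies` times in a row."""
    # One captured 3-mer followed by (min_copies - 1) or more copies of itself.
    pattern = r"([ACGT]{3})\1{%d,}" % (min_copies - 1)
    return re.search(pattern, probe.upper()) is not None


def passes_low_complexity_filter(probe: str, min_score: int,
                                 min_copies: int = 4) -> bool:
    """Keep a probe only if it compresses poorly and has no long 3-mer repeat.

    `min_score` is a hypothetical, user-chosen threshold in bytes.
    """
    return (compression_score(probe) >= min_score
            and not has_trinucleotide_repeat(probe, min_copies))
```

In practice the compression threshold would have to be calibrated on probes of the chosen length, since the compressed size depends on the sequence length as well as on its complexity.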
Description of the parameters involved in the oligonucleotide sensitivity and the way they are calculated
| Design program | Oligo length | Tm | GC percent | Secondary structure assessment |
|---|---|---|---|---|
| ArrayOligoSelector | Fixed by the user | Not a selection criterion | Filter out probes using user fixed threshold | Evaluated by self-complementarity |
| CommOligo | Fixed by the user from 10 to 128 mer | Range chosen by the user, | Masking sequences using user defined range | Evaluated by self-complementarity |
| GoArrays | Fixed by the user | Range chosen by the user, | Not a selection criterion | Using Mfold |
| HPD | Fixed by the user | Not a selection criterion | Filter out probes using user fixed threshold | Evaluated using self-folding energy (hairpin) |
| Mprime | Fixed by the user | Range chosen by the user, | Filter out probes out of the user defined range | With scoring calculation from Kampke |
| OliD | Fixed by the user | Not available | Filter out probes using user fixed threshold | Using Mfold |
| OligoArray | Range chosen by the user from 15 to 75 mer | Range chosen by the user, | Filter out probes out of the user defined range | Using a custom module similar to Mfold |
| Oligodb | Fixed by the user from 20 to 100 mer | Not a selection criterion, | Not a selection criterion | Using Mfold |
| OligoFaktory | Range chosen by the user | Range chosen by the user | Not a selection criterion | Not a selection criterion |
| OligoPicker | Fixed by the user from 20 to 100 mer | Range chosen by the user, | Not a selection criterion | Evaluated by self-complementarity |
| OligoWiz | Range chosen by the user, | Range chosen by the user, | Not a selection criterion | Evaluated using a folding energy algorithm |
| Oliz | Fixed at 50 mer | Fixed range around 76°C, | Range fixed between 45% and 50% | Not a selection criterion |
| Osprey | Range chosen by the user from 10 to 90 mer | Range chosen by the user, | Not a selection criterion | With dimer and hairpin free energy calculation |
| PICKY | Range chosen by the user | Included in the cross-hybridization screening | | |
| PROBEmer | Fixed by the user | Range chosen by the user | Filter out probes out of the user defined range | Evaluated by self-complementarity as in Primer3 |
| Probesel | Range chosen by the user | Not a selection criterion | | Using Mfold |
| ProbeSelect | Fixed by the user | Not a selection criterion, | Not a selection criterion | Evaluated by self-complementarity |
| ROSO | Fixed by the user | Range chosen by the user, | Preferred range between 40% and 65% | Hairpin and homoduplex free energy calculation |
| SEPON | Fixed by the user | | Using a penalty if GC content is far from 40% to 60% | Using Mfold |
| YODA | Fixed by the user | Range chosen by the user, | Filter out probes out of the user defined range | Evaluated by self-complementarity |
Tm and GC percent measurements are directly linked to the strength of the interaction between the probe and its target. Secondary structure indirectly influences sensitivity by interfering with this interaction. NN stands for the Nearest Neighbour thermodynamic model.
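To make the two direct sensitivity parameters concrete, here is a minimal Python sketch that computes the GC percent of a probe and a rough Tm estimate, then applies a user-defined range as many of the programs in the table do. The Tm function uses a widely quoted empirical approximation based on GC content, length and sodium concentration; it is not the Nearest Neighbour model and is not the calculation of any specific program listed above. Function names and the default salt concentration are assumptions made for this example.

```python
import math


def gc_percent(probe: str) -> float:
    """Percentage of G and C bases in the probe."""
    seq = probe.upper()
    if not seq:
        raise ValueError("empty probe")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def approx_tm(probe: str, na_molar: float = 0.05) -> float:
    """Rough Tm in degrees Celsius.

    Empirical approximation (not a nearest-neighbour calculation):
    Tm = 81.5 + 16.6 * log10([Na+]) + 0.41 * %GC - 675 / N
    where N is the probe length and [Na+] is in mol/L (assumed default 0.05).
    """
    n = len(probe)
    return (81.5 + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent(probe) - 675.0 / n)


def in_user_range(value: float, low: float, high: float) -> bool:
    """True if the value lies within the user-defined closed range."""
    return low <= value <= high
```

A probe would then be kept only if, for example, `in_user_range(gc_percent(p), 40.0, 60.0)` holds, with the bounds chosen by the user.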
Description of the availability and flexibility of the oligonucleotide design software
| Design program | Organism | Specificity bank | Accessibility (free for academics) | User interface | Programming language | Running time |
|---|---|---|---|---|---|---|
| ArrayOligoSelector | No limitation | Fasta file | Downloadable from a website | Command line (L) | Python | 52 min |
| CommOligo | No limitation | Fasta file | Available upon request | Standalone GUI (W) | C++ | 1156 min |
| GoArrays | No limitation | Fasta file | Downloadable from a website | Standalone GUI (L) | Java | ND |
| HPD | Not available | Not available | Downloadable from a website | Standalone GUI (W) | Object Pascal | ND |
| Mprime | Rat, mouse, human, drosophila and zebrafish | RefSeq database for the organism | Web interface only | Web interface | C++ and Perl | 31 min |
| OliD | No limitation | Genome sequence | Available upon request | Command line (L) | Python | ND |
| OligoArray | No limitation | Fasta file | Downloadable from a website | Command line (L) and Standalone GUI (L, W, M) | Java | 141 min |
| Oligodb | Human | All human cDNA transcripts in ENSEMBL | Web interface only | Web interface | Unknown | ND |
| OligoFaktory | All organisms with an NCBI database | A predefined or custom NCBI database | Web interface and downloadable program | Web interface and MacOS standalone GUI | Unknown | 43 min |
| OligoPicker | No limitation | Fasta file | Downloadable from a website | Command line (L) | Perl | 30 min |
| OligoWiz | All organisms found on the server | Located on the server | Client program downloadable | Client GUI (L, W, M) | Java client, Perl server | 26 min |
| Oliz | No limitation | Not available | Downloadable from a website | Command line (L) | Perl | ND |
| Osprey | No limitation | Fasta file | Web interface and source available upon request | Web interface | C and Perl | ND |
| PICKY | No limitation | Fasta file | Available upon request | Standalone GUI (L, W, M) | C++ | 11 min |
| PROBEmer | No limitation | Fasta file | Web interface only | Web interface | C | ND |
| Probesel | No limitation | Same as target sequences | Available upon request | Command line (L) | C++ | ND |
| ProbeSelect | No limitation | Proprietary format | Available upon request | Command line (L) | C++ | ND |
| ROSO | No limitation | Fasta file | Web interface and standalone program available upon request | Web interface and standalone GUI (W) | C | 418 min |
| SEPON | All organisms with an EST collection | A source EST collection | Available upon request | Command line (L) | Perl | ND |
| YODA | No limitation | Fasta file | Downloadable from a website | Command line (L, W, M) | Java | 3 min |
User interfaces are available for Linux (L), Windows (W) or MacOS (M). Running time was estimated in minutes on a 3 GHz Pentium 4 desktop computer with 2 GB of memory. ND, not determined.
Properties of each oligonucleotide set created for the custom mouse array
| Property | ArrayOligoSelector | CommOligo | Mprime | OligoArray | OligoFaktory | OligoPicker | OligoWiz | PICKY | ROSO | YODA |
|---|---|---|---|---|---|---|---|---|---|---|
| Probe number | 1421 | 1392 | 1299 | 1383 | 580 | 1256 | 1421 | 1042 | 1163 | 1420 |
| Probe size | 50.0 ± 0 | 50.0 ± 0 | 50.0 ± 0 | 50.2 ± 0.46 | 51.5 ± 0.74 | 50.0 ± 0 | 49.9 ± 3.71 | 51.0 ± 0.84 | 50.0 ± 0 | 50.0 ± 0 |
| Specificity (%) | 94.23 | 81.82 | 78.06 | 82.21 | 80.34 | 98.89 | 83.11 | 98.46 | 78.59 | 81.83 |
Probe size is given as mean ± standard deviation. Specificity is calculated by counting the number of unique hits found with an identity ≥75% all along the probe.
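The specificity measure can be sketched as follows. This Python example rests on one interpretation of the footnote, stated here as an assumption: a probe is counted as specific when exactly one distinct target reaches ≥75% identity over the full probe length, and the specificity of a set is the percentage of such probes. The data layout (a dictionary of hits per probe) and all names are hypothetical and do not reproduce the authors' pipeline.

```python
from typing import Dict, List, Tuple

# (target id, percent identity over the aligned region, aligned length in bases)
Hit = Tuple[str, float, int]


def full_length_identity(hit: Hit, probe_length: int) -> float:
    """Identity rescaled to the whole probe, in percent.

    Assumption: identity "all along the probe" is approximated as
    matched bases divided by the probe length.
    """
    _, identity, aligned = hit
    return identity * aligned / probe_length


def specificity_percent(hits_by_probe: Dict[str, List[Hit]],
                        probe_lengths: Dict[str, int],
                        threshold: float = 75.0) -> float:
    """Percentage of probes with exactly one distinct target above threshold."""
    if not hits_by_probe:
        raise ValueError("no probes given")
    specific = 0
    for probe_id, hits in hits_by_probe.items():
        length = probe_lengths[probe_id]
        targets = {h[0] for h in hits
                   if full_length_identity(h, length) >= threshold}
        if len(targets) == 1:
            specific += 1
    return 100.0 * specific / len(hits_by_probe)
```

Under this interpretation, a short local match with high identity does not count against a probe, because its identity is diluted when rescaled to the full probe length.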
Figure 1. Comparison of the sensitivity of the oligonucleotides designed for the custom mouse array. For each oligonucleotide set created, we plot the distribution over all oligonucleotides in the set of Tm (A) and of the free energies of the most probable secondary structure (B). The name of the software used for design is displayed on the x-axis. AOS stands for ArrayOligoSelector.
Figure 2. Evaluation of custom oligonucleotide specificity. (A) Duplex free energies between oligonucleotides and their best off-target hit. (B) Distribution of the distance between the 5′ end of the oligonucleotide and the 3′ end of the target gene sequence for each designed oligoset. The name of the software used for design is displayed on the x-axis. AOS stands for ArrayOligoSelector.
Figure 3. Comparison of the sensitivity of the oligonucleotides designed for the tiling array. For each oligonucleotide set created, we plot the distribution over all oligonucleotides in the set of Tm (A), GC percent (B) and free energies of the most probable secondary structure (C).
Figure 4. Evaluation of tiling oligonucleotide specificity. (A) Distribution of the distance in base pairs between consecutive oligonucleotides on the tiling path. (B) Distribution of the number of oligonucleotides per transcript. (C) Distribution of the number of BLAST hits per oligonucleotide using the parameters described in the ‘Material and methods’ section. The y-axis is log scaled. To display these distributions clearly, we removed all oligonucleotides with only one hit.