Ping Gong, Mehdi Pirooznia, Xin Guan, Edward J Perkins.
Abstract
High-density oligonucleotide probe arrays have become an increasingly important tool in genomics studies. In organisms with an incomplete genome sequence, one strategy for oligo probe design is to reduce the number of unique probes that target every non-redundant transcript through bioinformatic analysis and experimental testing. Here we adopted this strategy in making oligo probes for the earthworm Eisenia fetida, a species for which we have sequenced transcriptome-scale expressed sequence tags (ESTs). Our objectives were to identify unique transcripts as targets, to select an optimal and non-redundant oligo probe for each of these target ESTs, and to annotate the selected target sequences. We developed a streamlined and easy-to-follow approach to the design, validation and annotation of species-specific array probes. Four 244K-format oligo arrays were designed using eArray and hybridized to a pooled E. fetida cRNA sample. We identified 63,541 probes with unsaturated signal intensities consistently above the background level. Target transcripts of these probes were annotated using several sequence alignment algorithms. Significant hits were obtained for 37,439 (59%) probed targets. We validated and made publicly available 63.5K oligo probes so that the earthworm research community can use them to pursue ecological, toxicological, and other functional genomics questions. Our approach is efficient, cost-effective and robust because it (1) does not require a major genomics core facility; (2) allows new probes to be easily added and old probes modified or eliminated when new sequence information becomes available; (3) is not bioinformatics-intensive upfront but does provide opportunities for more in-depth annotation of the biological functions of target genes; and (4) if desired, allows EST orthologs to the UniGene clusters of a reference genome to be identified and selected in order to improve the target-gene specificity of the designed probes. This approach is particularly applicable to organisms with a wealth of EST sequences but an unfinished genome.
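The probe-selection objective stated in the abstract (one optimal, non-redundant probe per target EST) can be illustrated with a minimal greedy sketch in Python. It assumes each target already has a list of candidate probe sequences with a numeric quality score; the input format, the scores and the function name are hypothetical, and eArray's own "Best probe" ranking is not reproduced here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: target EST ID -> list of (probe_sequence, score).
# Scores are placeholders; eArray's internal probe-ranking rules are not modeled.
Candidates = Dict[str, List[Tuple[str, float]]]


def pick_best_unique_probes(candidates: Candidates) -> Dict[str, str]:
    """Greedily give each target its highest-scoring probe whose sequence
    has not already been assigned to another target."""
    used: Set[str] = set()
    chosen: Dict[str, str] = {}
    for target in sorted(candidates):  # fixed order keeps results reproducible
        ranked = sorted(candidates[target], key=lambda p: p[1], reverse=True)
        for seq, _score in ranked:
            if seq not in used:
                used.add(seq)
                chosen[target] = seq
                break  # exactly one probe per target
    return chosen


if __name__ == "__main__":
    demo = {
        "EST_001": [("ACGT" * 15, 0.92), ("TTGA" * 15, 0.81)],
        "EST_002": [("ACGT" * 15, 0.95), ("GGCA" * 15, 0.60)],
    }
    # EST_001 keeps the shared ACGT... probe, so EST_002 falls back to GGCA...
    print(pick_best_unique_probes(demo))
```

Because the choice is greedy, the processing order decides which target keeps a probe sequence shared by several targets, and a target whose candidates are all taken receives no probe.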
Year: 2010 PMID: 21170345 PMCID: PMC2999564 DOI: 10.1371/journal.pone.0014266
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Oligo array design process.
Our proposed easy-to-follow approach to the design, validation and annotation of microarray oligo probes, particularly for organisms that lack a sequenced genome but have transcriptome-scale EST sequences generated with deep sequencing and Sanger sequencing technologies. The SSH (suppression subtractive hybridization) step is optional, as many researchers construct normalized cDNA libraries for cloning and Sanger sequencing.
Completeness test of two 454 sequence assemblies using BLASTN (a).

| Assembly | Unique sequences | Significant hits (…) | | |
|---|---|---|---|---|
| Newbler | | | | |
| Singleton | 157,070 | 493,800 | 2,109 | 8,818 |
| Contig | 31,114 | 32,713 | 579 | 2,419 |
| Total | 188,184 | 526,513 | 2,688 | 11,237 |
| SeqMan | | | | |
| Singleton | 129,486 | 44,111 | 917 | 2,414 |
| Contig | 63,602 | 44,979 | 339 | 1,489 |
| Total | 193,088 | 89,090 | 1,256 | 3,903 |

(a) Each unique 454 singleton or contig was aligned against all other unique sequences within the same assembly.
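The within-assembly self-comparison behind the completeness test can be sketched as below. This assumes BLAST+ tabular output (`-outfmt 6`) from aligning an assembly against itself; the E-value cutoff of 1e-5 is a placeholder, since the significance threshold in the table header is truncated in this record, and counting queries with at least one non-self hit is only one plausible way to summarize such a search.

```python
import csv
from typing import Iterable, Set

# Column index of the E-value in BLAST+ tabular output (-outfmt 6), whose
# default columns are: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
EVALUE_COL = 10


def queries_with_nonself_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return IDs of query sequences that align to at least one *other*
    sequence of the same assembly with E-value <= max_evalue.
    Self-hits (a query aligned to itself) are ignored."""
    redundant: Set[str] = set()
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        if query != subject and float(row[EVALUE_COL]) <= max_evalue:
            redundant.add(query)
    return redundant
```

For example, `with open("newbler_self.tsv") as fh: ids = queries_with_nonself_hits(fh)` would collect the sequences that are not unique within a hypothetical self-search file; `len(ids)` then gives one possible count for the table.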
Correctness test of two assemblies using BLASTN (a).

| Assembly | Unique sequences | | | | |
|---|---|---|---|---|---|
| Newbler | | | | | |
| Singleton | 157,070 | 3,207 | 5,306 | 400 | 344 |
| Contig | 31,114 | 1,713 | 3,613 | 534 | 548 |
| Total | 188,184 | 4,920 | 8,919 | 934 | 892 |
| SeqMan | | | | | |
| Singleton | 129,486 | 1,766 | 3,005 | 129 | 207 |
| Contig | 63,602 | 2,813 | 5,677 | 593 | 1,250 |
| Total | 193,088 | 4,579 | 8,682 | 722 | 1,457 |

(a) Each unique 454 singleton or contig was aligned against all available Sanger sequences of Eisenia fetida (EF) (2,231 SSH + 104 GenBank dbEST), Eisenia andrei (EA) and Lumbricus spp. (LS) (1,108 EA + 17,225 LS). If one unique 454 sequence hit more than one Sanger sequence, only the most significant hit was counted.
Full length (≥100 bases) refers to the length of the subject or the query, whichever is shorter.
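The footnote's rule of counting only the most significant Sanger hit per 454 sequence can be sketched as follows. The `Hit` record and both functions are hypothetical; query and subject lengths are assumed to be available (BLAST+ can report them with `-outfmt "6 std qlen slen"`), and `is_full_length` encodes only one reading of the full-length footnote.

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    query: str      # 454 singleton or contig ID
    subject: str    # Sanger sequence ID
    aln_len: int    # alignment length (bases)
    evalue: float
    bitscore: float
    qlen: int       # query length (assumed to be reported by the search)
    slen: int       # subject length


def best_hit_per_query(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep only the most significant hit per query: lowest E-value,
    with ties broken by the higher bit score."""
    best: Dict[str, Hit] = {}
    for h in hits:
        cur = best.get(h.query)
        if cur is None or (h.evalue, -h.bitscore) < (cur.evalue, -cur.bitscore):
            best[h.query] = h
    return best


def is_full_length(h: Hit, min_len: int = 100) -> bool:
    """Assumed interpretation: the reference length is the shorter of query
    and subject, it must be >= min_len bases, and the alignment must span it."""
    ref_len = min(h.qlen, h.slen)
    return ref_len >= min_len and h.aln_len >= ref_len
```

Selecting one hit per query before counting prevents a 454 sequence that matches several Sanger ESTs from being counted more than once.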
Design of four 244K-oligo probe test arrays using Agilent's eArray.

| Target sequence source | Target sequences | Target length (bases) | Probe length (mer) | Probes per target | Design method | Non-redundant probes within group: TA-1 (sense) | TA-2 (antisense) | TA-3 (sense) | TA-4 (antisense) |
|---|---|---|---|---|---|---|---|---|---|
| GenBank dbEST | 104 | >300 | 60 | 4 | Best distribution | 281 | 280 | 281 | 280 |
| SSH libraries | 3144 | varied | 60 | 2 | Best distribution | 5412 | 5489 | 5412 | 5489 |
| 454 SeqMan-Singleton | 129486 | 40–278 | 60 | 1 | Best probe | (96430) | (96339) | (96430) | (96339) |
| 454 SeqMan-Contig1 | 40222 | <150 | 60 | 1 | Best probe | (33309) | (33293) | (33309) | (33293) |
| 454 SeqMan-Contig2 | 18129 | 150–300 | 60 | 2 | Best distribution | 31392 | 31305 | 31392 | 31305 |
| 454 SeqMan-Contig3 | 5251 | >300 | 60 | 4 | Best distribution | 19502 | 19509 | 19502 | 19509 |
| All above | 196336 | varied | 60 | 1 | Best probe | 155244 | 155213 | 155244 | 155213 |
| 454 Newbler-Contig | 31114 | varied | 60 | 1 | Best probe | 30684 | 30682 | 30684 | 30682 |
| Short unique | 26302 | 40–59 | 40 | 1 | Best probe | (23941) | (23907) | 23941 | 23907 |
| Total number of redundant probes among groups | | | | | | 27245 | 28722 | 27245 | 28722 |
| Total number of non-redundant probes in the final test array design | | | | | | 215270 | 213756 | 239211 | 237663 |
| Total number of redundant probes included in the final test array design | | | | | | 26129 | 27643 | 2188 | 3736 |
| Total number of unique and redundant probes in the final test array design | | | | | | 241399 | 241399 | 241399 | 241399 |
| Test array ID | | | | | | TA-1 | TA-2 | TA-3 | TA-4 |
| Configuration number of Agilent custom gene expression array in 1x244K format (catalog no. G4502A) | | | | | | 20022 | 20023 | 20024 | 20025 |

A probe group is defined as the collection of probes designed for a specific source of target sequences (e.g., SSH libraries, 454 Newbler-Contig). Redundant probes were removed within each probe group. Numbers in parentheses are probes excluded from the array design, either because they are already included in the “All above” group or because they are short 40-mer probes, which were excluded from TA-1 and TA-2.
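The totals in the design table are internally consistent: for each test array, the non-redundant probes of the included groups, minus the probes shared among groups, give the number of non-redundant probes in the design, and adding the redundant probes included in the design gives 241,399 features. The snippet below reproduces that bookkeeping using only numbers copied from the table; it checks the arithmetic and does not model how eArray detects redundancy.

```python
from typing import Dict, List

TOTAL_FEATURES = 241_399  # unique + redundant probes per test array (from the table)

# Non-redundant probes within each group actually placed on the array
# (parenthesized counts omitted; the 40-mer "Short unique" group only on TA-3/TA-4).
GROUPS: Dict[str, List[int]] = {
    "TA-1": [281, 5412, 31392, 19502, 155244, 30684],
    "TA-2": [280, 5489, 31305, 19509, 155213, 30682],
    "TA-3": [281, 5412, 31392, 19502, 155244, 30684, 23941],
    "TA-4": [280, 5489, 31305, 19509, 155213, 30682, 23907],
}
REDUNDANT_AMONG_GROUPS = {"TA-1": 27245, "TA-2": 28722, "TA-3": 27245, "TA-4": 28722}
REDUNDANT_INCLUDED = {"TA-1": 26129, "TA-2": 27643, "TA-3": 2188, "TA-4": 3736}


def non_redundant_total(array_id: str) -> int:
    """Sum of per-group probes minus the probes shared between groups."""
    return sum(GROUPS[array_id]) - REDUNDANT_AMONG_GROUPS[array_id]


for array_id in GROUPS:
    nr = non_redundant_total(array_id)
    print(array_id, nr, nr + REDUNDANT_INCLUDED[array_id])
# TA-1 215270 241399
# TA-2 213756 241399
# TA-3 239211 241399
# TA-4 237663 241399
```

The same check also shows why TA-3 and TA-4 carry far fewer redundant probes: the 40-mer group fills most of the space that redundant 60-mers occupy on TA-1 and TA-2.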
Figure 2. Hybridization results of 60-mer probes.
Hybridization results of eArray-designed sense and antisense 60-mer probes, showing the number and percentage of probes with 0 to 8 positive measurements. The total numbers of sense and antisense probes were 217,458 (215,270 unique + 2,188 redundant) and 217,492 (213,756 + 3,736), respectively. Each probe was measured 8 times, i.e., on four arrays (2 arrays per design) at two PMT gain settings (400 and 500). Results for the additional redundant probes included in test array designs TA-1 and TA-2 are not shown because, disregarding their repeats, they were measured only 4 times.
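The per-probe tally in Figure 2 (how many of the eight measurements were positive) can be sketched as below. The input structure and the definition of a positive call are assumptions for illustration; the caption does not specify the criterion.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical input: probe ID -> 8 positive/negative calls
# (4 arrays x 2 PMT gain settings). What counts as "positive" (e.g. above
# background and unsaturated) is an assumption, not taken from the paper.
Calls = Dict[str, List[bool]]


def positive_count_histogram(calls: Calls, n_measurements: int = 8) -> Dict[int, int]:
    """Return {k: number of probes with exactly k positive calls} for k = 0..n."""
    for probe_id, flags in calls.items():
        if len(flags) != n_measurements:
            raise ValueError(f"{probe_id}: expected {n_measurements} calls, got {len(flags)}")
    counts = Counter(sum(flags) for flags in calls.values())  # True counts as 1
    return {k: counts.get(k, 0) for k in range(n_measurements + 1)}


def percentages(hist: Dict[int, int]) -> Dict[int, float]:
    """Convert probe counts to percentages of all probes."""
    total = sum(hist.values())
    if total == 0:
        return {k: 0.0 for k in hist}
    return {k: 100.0 * v / total for k, v in hist.items()}
```

Requiring exactly eight calls per probe mirrors the caption's reason for leaving out the extra redundant probes of TA-1 and TA-2, which have only four independent measurements.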
Comparison of several representative approaches to transcriptome-scale oligo array probe design, validation and annotation based on species-specific EST sequence information.

| Organism (common name) | EST # & sequencing technology | Assembler & unique target # | Probe design program | Probe # & length (mer) | Validation | Target EST annotation | Ref. |
|---|---|---|---|---|---|---|---|
| Eisenia fetida (earthworm) | 566K, 454 & Sanger | Newbler/SeqMan, N/A | eArray | See … | Array | BLASTN, BLASTX, InterProScan, PIPA | This study |
| | 10K, Sanger | Phrap/Consed, 5K | Tethys | 5K, 50–60 | Array | BLAST2GO | |
| | 59K, Sanger | Cap3, 20K | eArray | 39K, 60 | Array & qRT-PCR | BLASTN, BLASTX | |
| | >30K, Sanger | Redundant, 4K | Affymetrix | 4K, 25 | Array | BLAST | |
| | 67K, Sanger | Whole genome, varied | Picky | 43K, 50–70 | Array | TIGR V5 gene model | |
| | 486K, Sanger | Shotgun genome, 22K | eArray | 22K, 60 | Array | BLASTN, CLUSTALW | |
| | 27K, Sanger | Cap3, 10K | eArray | 15K, 60 | Array | BLASTN, BLASTX | |
| | 227K, Sanger | EST-Ferret, 57K | Array Designer | 25K, 65 | Array | Gene Index, BLASTX | |
| | 210K, Sanger | 15K Unigene | GoArrays | 22K, 40 | Array | Annot8r | |

N/A: not available.
Varied depending on the assembler or estimator used.