Alexander C Cambon, Abdelnaby Khalyfa, Nigel G F Cooper, Caryn M Thompson.
Abstract
BACKGROUND: Microarrays have been used extensively to analyze the expression profiles of thousands of genes in parallel. Most of the widely used methods for analyzing Affymetrix GeneChip microarray data, including RMA, GCRMA and the Model-Based Expression Index (MBEI), summarize probe signal intensity data to generate a single measure of expression for each transcript on the array. In contrast, other methods are applied directly to probe intensities, removing the need for a summarization step.
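The summarization step can be illustrated with Tukey's median polish, which RMA uses to collapse a probes-by-arrays matrix of log2 intensities into one expression value per array. The sketch below is a minimal NumPy version that follows the sweep order of R's `stats::medpolish` (the `eps=0.01` and `max_iter=10` defaults are assumed from that function); it is not the affy or preprocessCore implementation and omits background correction and normalization. The function names are hypothetical.

```python
import numpy as np


def median_polish(y, eps=0.01, max_iter=10):
    """Tukey median polish of a (probes x arrays) matrix of log2 intensities.

    Returns (overall, probe_effects, array_effects, residuals) such that
    y[i, j] == overall + probe_effects[i] + array_effects[j] + residuals[i, j].
    """
    resid = np.array(y, dtype=float)  # copy, so the caller's data is untouched
    n_probes, n_arrays = resid.shape
    overall = 0.0
    probe_eff = np.zeros(n_probes)
    array_eff = np.zeros(n_arrays)
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        row_med = np.median(resid, axis=1)
        resid -= row_med[:, None]
        probe_eff += row_med
        # Move the median array effect into the overall term.
        delta = np.median(array_eff)
        array_eff -= delta
        overall += delta
        # Sweep column (array) medians out of the residuals.
        col_med = np.median(resid, axis=0)
        resid -= col_med[None, :]
        array_eff += col_med
        # Move the median probe effect into the overall term.
        delta = np.median(probe_eff)
        probe_eff -= delta
        overall += delta
        # Stop once the total absolute residual has stabilized.
        new_sum = np.abs(resid).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, probe_eff, array_eff, resid


def summarize_probe_set(log2_intensities):
    """RMA-style expression estimate for each array: overall + array effect."""
    overall, _, array_eff, _ = median_polish(log2_intensities)
    return overall + array_eff
```

For an 11-probe set measured on four arrays, `summarize_probe_set` returns four log2 expression values, one per array; probe-specific effects are absorbed into `probe_effects` rather than into the estimates.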
Year: 2007 PMID: 17480226 PMCID: PMC1884176 DOI: 10.1186/1471-2105-8-146
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Comparisons of probe-level patterns across replicates and treatments for selected genes. Each line color (in Panels A, B, and C) represents the probe-level pattern on a specific array. Blue and black lines show probe-level patterns on arrays from time = 0; red and orange lines show probe-level patterns on arrays from time = 96 hours. The first row shows probe-level patterns for the most up-regulated gene by fold change (Hspb1), the second row for a randomly selected gene, and the third row for the most down-regulated gene by fold change. The x-axis of each plot in Panels A, B, and C is probe number (1 through 11). Panel A shows log2 raw probe intensities by probe number. Panel B shows log2 array-centered intensities by probe number. Panel C shows log2 quantile-normalized, MAS-background-corrected probe intensities. Panel D shows summary gene expression estimates obtained by median polish; the summarization is identical to RMA except that MAS background correction is used instead of RMA background correction.
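The transformations behind Panels B and C can be sketched as follows. Quantile normalization gives every array the same intensity distribution by replacing each value with the mean, across arrays, of the values that share its rank. This record does not define "array-centered", so `array_center` assumes it means subtracting each array's median log2 intensity. MAS background correction is not implemented here (the input is assumed to be already background-corrected), ties are broken by sort order rather than averaged, and all function names are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Quantile-normalize the columns (arrays) of a probes x arrays matrix."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0, kind="stable")  # row indices per array, smallest first
    reference = np.sort(x, axis=0).mean(axis=1)  # mean of the k-th smallest values across arrays
    out = np.empty_like(x)
    for j in range(x.shape[1]):
        # The k-th smallest value on array j is replaced by reference[k].
        out[order[:, j], j] = reference
    return out


def array_center(log2_x):
    """Assumed definition: subtract each array's median log2 intensity."""
    log2_x = np.asarray(log2_x, dtype=float)
    return log2_x - np.median(log2_x, axis=0, keepdims=True)


def panel_c_values(background_corrected):
    """Quantile-normalize, then take log2 (background correction assumed done)."""
    return np.log2(quantile_normalize(background_corrected))
```

Because normalization only reassigns values by rank, the ordering of probes within each array is unchanged; what changes is the spread, which is why patterns in Panel C line up across arrays more closely than in Panel A.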
Figure 2. Effect of middle base pair and probe number at the gross array level. The top-left panel (A) shows box plots by probe number (1–11) for all transcripts on array C; a slight increasing trend is visible. The top-right panel (B) shows box plots of the probes on array C by the middle base pair (A, C, G, or T) of the probe. The comparative levels are consistent with those described by Naef and Magnasco [27]. The panels in the second row (C and D) show box plots of log2 probe intensities categorized by both probe number and middle base pair for all probes on array 3. The slight increasing trend at the array level is too weak to be detected in the plots of individual probe sets (Figure 1).
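Grouping probes by middle base, as in Panels B to D, only requires the base at the center of each 25-mer probe (position 13). The helpers below are hypothetical and are not code from the paper; the two example sequences are the first two probe sequences in the BLAST results table, and the log2 values passed to `median_by_group` would come from the arrays themselves.

```python
from collections import defaultdict
from statistics import median


def middle_base(probe_seq):
    """Return the central base of an odd-length probe (position 13 of a 25-mer)."""
    if len(probe_seq) % 2 == 0:
        raise ValueError("even-length probe has no single middle base")
    return probe_seq[len(probe_seq) // 2].upper()


def median_by_group(probe_seqs, probe_numbers, log2_values):
    """Median log2 intensity for each (probe number, middle base) category."""
    groups = defaultdict(list)
    for seq, num, value in zip(probe_seqs, probe_numbers, log2_values):
        groups[(num, middle_base(seq))].append(value)
    return {key: median(vals) for key, vals in groups.items()}


print(middle_base("GGCAACTCAGCAGCGGTGTCTCAGA"))  # G
print(middle_base("TCAGAGATCCGACAGACGGCCGATC"))  # C
```

Box plots like those in Panels C and D draw the full distribution within each category rather than only the median computed here.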
Figure 3. Scatter plots of log. The top row shows scatter plots for all four arrays (A, B, C, D) for Hsbp1, the top up-regulated gene. The bottom row shows all four arrays for gene Id2, the top down-regulated gene. Affinities explain only a small part of the variation between probes at the gene level. Affinities were calculated using the default method in the Bioconductor package gcrma, version 2.20. Affinities for perfect match probes are shown. Pearson correlation coefficients range from 0.08 to 0.76 across the eight scatter plots. A scatter plot of affinities vs. log2 probe intensities (not shown) for all probes on array C is similar to the corresponding diagram in Wu et al. [5].
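The Pearson coefficients quoted for Figure 3 measure the linear association between probe affinity and log2 intensity within one probe set on one array. Squaring r gives the share of variance a least-squares line explains, so r = 0.76 corresponds to about 58% and r = 0.08 to under 1%, which is the sense in which affinities explain only a small part of the probe-to-probe variation. A minimal sketch with hypothetical helper names (the affinities themselves would come from gcrma, not from this code):

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def variance_explained(r):
    """Fraction of variance explained by a least-squares line (r squared)."""
    return r * r


print(round(variance_explained(0.76), 2))  # 0.58
print(round(variance_explained(0.08), 4))  # 0.0064
```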
Table 1. BLAST results for the most highly up- and down-regulated genes by fold change
| GGCAACTCAGCAGCGGTGTCTCAGA | 1 | 309–333 | -0.867 | -0.771 | -0.952 | -0.860 | 0.953 | 0 | 5 | 1 | 33 |
| TCAGAGATCCGACAGACGGCCGATC | 2 | 329–353 | -1.560 | -1.322 | -0.890 | -0.646 | 0.123 | 1 | 0 | 1 | 2 |
| GAGGAGCTCACAGTTAAGACCAAGG | 3 | 392–416 | -1.583 | -1.287 | -0.933 | -0.991 | -0.781 | 0 | 0 | 1 | 28 |
| GATGAACATGGCTACATCTCTCGGT | 4 | 458–482 | 0.862 | 0.429 | 4.728 | 4.665 | 0.198 | 1 | 0 | 3 | 28 |
| AAGCAGTCACACAATCAGCGGAGAT | 5 | 585–609 | -1.310 | -1.430 | -1.198 | -0.879 | -0.500 | 1 | 6 | 3 | 10 |
| GGAGATCACCATTCCGGTCACTTTC | 6 | 604–628 | 0.105 | 0.119 | 3.522 | 3.191 | 1.606 | 1 | 3 | 3 | 6 |
| TTCGAGGCCCGTGCCCAAATTGGAG | 7 | 626–650 | 2.880 | 2.293 | 3.521 | 2.828 | 2.127 | 1 | 5 | 3 | 5 |
| TGGAGGCCCAGAGTCGGAACAGTCT | 8 | 646–670 | -0.974 | -0.827 | -0.993 | -0.841 | 0.412 | 1 | 4 | 3 | 16 |
| GTCTGGAGCCAAGTAGAAGCCTTCA | 9 | 667–691 | -0.215 | -0.384 | 3.274 | 3.027 | -0.843 | 1 | 12 | 3 | 33 |
| TAGAAGCCTTCAGCTTGCTACCCAT | 10 | 680–704 | 0.970 | 0.812 | 0.711 | 0.546 | 1.685 | 1 | 0 | 3 | 21 |
| TCCCTCTCTGTCAATCTGATATGCT | 11 | 727–745 | 0.303 | 0.069 | 2.940 | 2.345 | 1.162 | 0 | NA | 0 | 19 |
| TGGACGACCCGATGAGTCTGCTCTA | 1 | 144–168 | 1.507 | 0.623 | 0.342 | 0.258 | 1.1012 | 1 | 1 | 4 | 3 |
| ACAACATGAACGACTGCTACTCCAA | 2 | 168–192 | 1.861 | 1.423 | 0.630 | 0.399 | -0.082 | 1 | 10 | 4 | 9 |
| GCTACTCCAAGCTCAAGGAACTGGT | 3 | 183–207 | 0.437 | 0.156 | -0.616 | -0.511 | -0.124 | 1 | 0 | 4 | 45 |
| ATCCTGCAGCACGTCATCGATTATA | 4 | 248–272 | 2.003 | 1.848 | 0.710 | 0.615 | 1.358 | 1 | 5 | 4 | 17 |
| TTATATCTTGGACCTGCAGATCGCC | 5 | 268–292 | 0.688 | 0.442 | -0.215 | -0.221 | 0.6038 | 1 | 0 | 4 | 37 |
| TGAACACGGACATCAGCATCCTGTC | 6 | 375–399 | 0.806 | 0.775 | -0.041 | -0.034 | 0.3126 | 1 | 8 | 4 | 32 |
| ATCCTGTCCTTGCAGGCGTCTGAAT | 7 | 392–416 | 0.741 | 0.822 | 0.310 | 0.826 | 2.1972 | 1 | 4 | 3 | 15 |
| GAATTCCCTTCTGAGCTTATGTCGA | 8 | 413–437 | 2.422 | 2.189 | 1.162 | 0.656 | 1.4553 | 1 | 0 | 3 | 41 |
| TTCTCTTTTTCTTTTGCACAACAAG | 9 | 518–542 | 0.375 | -0.217 | -0.369 | -0.691 | 1.1409 | 1 | 0 | 3 | 97 |
| TGTTATCAACCATTTCACCAGGAGA | 10 | 587–608 | 0.434 | 0.713 | -0.350 | -0.557 | 0.0674 | 1 | 0 | 3 | 40 |
| GGCCTGGACTGTGATAACCGTTATT | 11 | 683–707 | 2.214 | 1.663 | 1.130 | 0.615 | -0.061 | 1 | NA | 3 | 19 |
The BLAST search was restricted to the organism Rattus norvegicus; however, some results are duplicated because some of the databases searched by BLAST contain multiple entries. Note that not all probes for Hsbp1 match the target gene perfectly.
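Overlap between neighboring probes follows directly from their target positions. Assuming the position ranges in the table are 1-based and inclusive (an assumption, since the table's header row is not preserved in this record), the sketch below recomputes each probe's overlap with the next one. For the first probe set it gives 5, 0, 0, 0, 6, 3, 5, 4, 12, 0, matching the values in the table's third-from-last column. The helper names are hypothetical.

```python
def overlap(range_a, range_b):
    """Number of shared bases between two 1-based, inclusive (start, end) ranges."""
    (start_a, end_a), (start_b, end_b) = range_a, range_b
    return max(0, min(end_a, end_b) - max(start_a, start_b) + 1)


def overlaps_with_next(ranges):
    """Overlap of each probe with the following probe (ranges sorted by start)."""
    return [overlap(a, b) for a, b in zip(ranges, ranges[1:])]


# Target positions of the 11 probes in the first probe set of the table.
first_set = [(309, 333), (329, 353), (392, 416), (458, 482), (585, 609),
             (604, 628), (626, 650), (646, 670), (667, 691), (680, 704),
             (727, 745)]
print(overlaps_with_next(first_set))  # [5, 0, 0, 0, 6, 3, 5, 4, 12, 0]
```

The last probe has no successor, which is consistent with the NA entry for probe 11 in that column.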
Figure 4. The ADAPT tool. The ADAPT tool shows the positions of probes within the most highly differentially expressed up-regulated gene and the most highly differentially expressed down-regulated gene by fold change. Used with the RefSeq database, the ADAPT tool confirms the BLASTN results (Table 1) that probes 1, 3, and 11 for gene Hsbp1 do not have perfect sequence matches with the gene. The diagram also shows the extent of overlap among probes 5 through 10 for Hsbp1 (A). The ADAPT tool used with the Ensembl database gives different results for Hsbp1 than when used with the RefSeq database; however, the results still show that probe 11 does not have a perfect sequence match with the gene (B). For gene Id2, the ADAPT tool shows that all probes have perfect sequence matches with the gene whether the RefSeq or the Ensembl database is used (C and D).
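Whether a probe has a perfect sequence match can be checked with an exact substring search against a target sequence. This is only a sketch: BLAST and ADAPT search databases and also report partial alignments, which a substring test cannot do. The strand on which probe sequences are reported depends on the annotation source, so the hypothetical helpers below search for both the probe and its reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def perfect_match_starts(probe, target):
    """1-based start positions where the probe or its reverse complement occurs exactly."""
    target = target.upper()
    hits = []
    # A set avoids counting a palindromic probe twice.
    for query in {probe.upper(), reverse_complement(probe)}:
        pos = target.find(query)
        while pos != -1:
            hits.append(pos + 1)
            pos = target.find(query, pos + 1)  # allow overlapping occurrences
    return sorted(hits)


def has_perfect_match(probe, target):
    """True if the probe occurs exactly, on either strand, in the target."""
    return bool(perfect_match_starts(probe, target))
```

A probe that fails this test, such as probe 11 of Hsbp1, may still align partially, which is why the table reports partial homologies separately.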
Figure 5. Histograms of BLAST results for eight selected genes. BLAST searches were run on four up-regulated and four down-regulated genes. The histogram on the left (A) shows the amount of probe overlap for all probes in the eight probe sets: slightly more than half of the probes had no overlap, and the rest overlapped by between 1 and 13 bases. The histogram on the right (B) shows the number of partial homologies for each probe. The median number of partial homologies over all 88 probes submitted to BLAST was 17, and the first and third quartiles were 11 and 28, respectively. The search was restricted to the organism Rattus norvegicus.
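The Figure 5 summaries (a histogram of integer overlaps, and the median and quartiles of the partial-homology counts) are simple to reproduce once per-probe values are available. The sketch uses NumPy's default linear-interpolation percentiles, which agree with R's default `quantile` (type 7); other quartile conventions can give slightly different first and third quartiles. The function names are hypothetical.

```python
import numpy as np


def overlap_histogram(overlaps):
    """Count probes with 0, 1, 2, ... bases of overlap."""
    return np.bincount(np.asarray(overlaps, dtype=int))


def quartile_summary(values):
    """First quartile, median, and third quartile (linear interpolation)."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return float(q1), float(med), float(q3)


def fraction_without_overlap(overlaps):
    """Share of probes whose overlap with the next probe is zero."""
    overlaps = np.asarray(overlaps)
    return float(np.mean(overlaps == 0))
```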