Maria A Stalteri, Andrew P Harrison.
Abstract
BACKGROUND: Affymetrix GeneChip technology enables the parallel observation of tens of thousands of genes. It is important that the probe set annotations are reliable so that biological inferences can be made about genes which undergo differential expression. Probe sets representing the same gene might be expected to show similar fold changes/z-scores; however, this is in fact not the case.
Year: 2007 PMID: 17224057 PMCID: PMC1784106 DOI: 10.1186/1471-2105-8-13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Chromosomal alignment of target and consensus sequences for Surf4 probe sets on the MOE430A array.
| Probe set | Target sequence | Consensus sequence |
|---|---|---|
| 1453117_at | chr2:27,188,365 – 27,188,758(+) | chr2:27,185,730 – 27,189,203(+) |
| 1433609_s_at | chr2:27,188,260 – 27,189,050(+) | chr2:27,188,216 – 27,189,206(+) |
| 1455822_x_at | chr2:27,189,082 – 27,189,245(-) | chr2:27,189,060 – 27,189,354(-) |
| 1416213_x_at | chr2:27,189,911 – 27,190,390(-) | chr2:27,189,060 – 27,202,949(-) |
| 1448255_a_at | chr2:27,189,344 – 27,189,640(-) | chr2:27,189,061 – 27,202,949(-) |
| 1434589_x_at | chr2:27,189,129 – 27,189,301(-) | chr2:27,189,062 – 27,189,367(-) |
| 1436797_a_at | chr2:27,189,904 – 27,190,264(-) | chr2:27,189,247 – 27,190,320(-) |
| 1427285_s_at | chr19:5,657,821 – 5,658,039(-) | chr19:5,657,707 – 5,660,083(-) |
Chromosomal coordinates are from the UCSC October 2003 mouse assembly [10].
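The coordinates in this table can be checked mechanically. The following is a minimal sketch, not the authors' procedure: it parses the coordinate strings, confirms that each target sequence lies inside its consensus sequence, and groups the probe sets by chromosome and strand. The function names (`parse`, `target_within_consensus`, `group_by_locus`) are hypothetical and introduced only for illustration.

```python
import re
from collections import defaultdict

# Target and consensus alignments copied from the table above.
ALIGNMENTS = {
    "1453117_at":   ("chr2:27,188,365 – 27,188,758(+)", "chr2:27,185,730 – 27,189,203(+)"),
    "1433609_s_at": ("chr2:27,188,260 – 27,189,050(+)", "chr2:27,188,216 – 27,189,206(+)"),
    "1455822_x_at": ("chr2:27,189,082 – 27,189,245(-)", "chr2:27,189,060 – 27,189,354(-)"),
    "1416213_x_at": ("chr2:27,189,911 – 27,190,390(-)", "chr2:27,189,060 – 27,202,949(-)"),
    "1448255_a_at": ("chr2:27,189,344 – 27,189,640(-)", "chr2:27,189,061 – 27,202,949(-)"),
    "1434589_x_at": ("chr2:27,189,129 – 27,189,301(-)", "chr2:27,189,062 – 27,189,367(-)"),
    "1436797_a_at": ("chr2:27,189,904 – 27,190,264(-)", "chr2:27,189,247 – 27,190,320(-)"),
    "1427285_s_at": ("chr19:5,657,821 – 5,658,039(-)", "chr19:5,657,707 – 5,660,083(-)"),
}

# Accepts an en dash or a hyphen between start and end, with optional spaces.
COORD = re.compile(r"(chr\w+):\s*([\d,]+)\s*[–-]\s*([\d,]+)\(([+-])\)")


def parse(text):
    """Return (chromosome, start, end, strand) from a coordinate string."""
    match = COORD.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised coordinate string: {text!r}")
    chrom, start, end, strand = match.groups()
    return chrom, int(start.replace(",", "")), int(end.replace(",", "")), strand


def target_within_consensus(target, consensus):
    """True if the target interval lies inside the consensus interval on the same strand."""
    t_chrom, t_start, t_end, t_strand = parse(target)
    c_chrom, c_start, c_end, c_strand = parse(consensus)
    same_locus = (t_chrom, t_strand) == (c_chrom, c_strand)
    return same_locus and c_start <= t_start and t_end <= c_end


def group_by_locus(alignments):
    """Group probe set IDs by (chromosome, strand) of their target sequence."""
    groups = defaultdict(list)
    for probe_set, (target, _consensus) in alignments.items():
        chrom, _start, _end, strand = parse(target)
        groups[(chrom, strand)].append(probe_set)
    return dict(groups)


if __name__ == "__main__":
    for (chrom, strand), probe_sets in group_by_locus(ALIGNMENTS).items():
        print(chrom, strand, probe_sets)
```

Grouping by strand separates the two forward-strand probe sets on chromosome 2 from the five reverse-strand ones, and isolates the single chromosome 19 probe set.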
Gene symbol, LocusID/GeneID, and UniGene cluster ID for the 8 Surf4 probe sets from six successive releases of the Bioconductor moe430a annotation package [9].
| 1416213_x_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1436797_a_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1448255_a_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1434589_x_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1455822_x_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1433609_s_at | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 | Surf4 |
| | 20932 | 20932 | 20932 | 20932 | 20932 | 20932 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 | Mm.300594 |
| 1453117_at | Surf4 | Surf4 | Surf4 | NA | Surf2 | Surf2 |
| | 20932 | 20932 | 20932 | NA | 20931 | 20931 |
| | Mm.196863 | Mm.300594 | Mm.300594 | NA | Mm.6874/Mm.300594 | Mm.6874/Mm.300594 |
| 1427285_s_at | Surf4 | Surf4 | Surf4 | Ramp2 | Malat1 | Malat1 |
| | 20932 | 20932 | 20932 | 54409 | 72289 | 72289 |
| | Mm.196863 | Mm.300594 | Mm.300594 | Mm.298256/Mm.358667 | Mm.298256 | Mm.298256 |
Figure 1. Screen shot of the UCSC Genome Browser on October 2003 mouse assembly, showing Surf2 and Surf4 on chromosome 2 [10]. The black or blue boxes represent exons. The arrows in the intronic regions indicate the direction of transcription.
Figure 2. Mapping of individual probes in probe sets 1416213_x_at (blue lines), 1436797_a_at (red lines), 1448255_a_at (orange lines), 1434589_x_at (dark blue lines) and 1455822_x_at (pink lines) to the 3' UTR of Surf4 exon 6. The arrow shows the direction of transcription. Probe sequences were obtained from Affymetrix [7]. Genomic sequence of mouse chromosome 2 was from the UCSC May 2004 assembly [10]. The alignment of Surf4 transcript sequences to the genomic sequence was obtained from UCSC Genome Bioinformatics [10]. Chromosomal coordinates of individual probes were obtained from Ensembl version 28 [15].
Figure 3. Mapping of individual probes in probe sets 1433609_s_at (blue lines) and 1453117_at (red lines) to mouse genomic sequence. The arrow shows the direction of transcription. The diagram shows exon 5 and exon 6 of Surf2, with the 3' UTR as described by Williams and Fried [17]. The shaded box represents the alternative splice acceptor site for exon 6. Probe sequences were obtained from Affymetrix [7]. Genomic sequence of mouse chromosome 2 was from the UCSC October 2003 assembly [10]. The alignment of Surf2 transcript sequences to the genomic sequence was obtained from UCSC Genome Bioinformatics [10].
Figure 4. a) The positive correlation (0.77) between 1455822_x_at (red) and 1416213_x_at (blue). b) The negative correlation (-0.62) between 1433609_s_at (blue) and 1455822_x_at (red).
Figure 5. A greyscale matrix of correlation values for all the pairs of probe sets assigned to Surf4. The probe set order is 1453117_at and 1433609_s_at (Surf2); 1455822_x_at, 1416213_x_at, 1448255_a_at, 1434589_x_at and 1436797_a_at (Surf4); and 1427285_s_at (chromosome 19). Positive correlations are light grey to white and negative correlations are dark grey to black.
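A correlation matrix of the kind shown in Figure 5 could be built along the following lines. This is a minimal sketch that assumes the Pearson correlation coefficient, which the captions do not state. The expression profiles are made-up toy values, the profile names are hypothetical, and the linear grey-level mapping is just one way to render negative values dark and positive values light.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles, order):
    """Square matrix of pairwise correlations, rows and columns in `order`."""
    return [[pearson(profiles[a], profiles[b]) for b in order] for a in order]


def to_grey(r):
    """Map r in [-1, 1] to a grey level in [0, 1], where 0 is black and 1 is white."""
    return (r + 1.0) / 2.0


# Hypothetical toy profiles (one value per array); not data from the paper.
TOY_PROFILES = {
    "probe_set_a": [1.0, 2.0, 3.0, 4.0],
    "probe_set_b": [2.0, 4.0, 6.0, 8.0],
    "probe_set_c": [4.0, 3.0, 2.0, 1.0],
}

if __name__ == "__main__":
    order = ["probe_set_a", "probe_set_b", "probe_set_c"]
    for row in correlation_matrix(TOY_PROFILES, order):
        print(["%.2f" % to_grey(r) for r in row])
```

Under this mapping, the correlations quoted in Figure 4 (0.77 and -0.62) would fall at grey levels of about 0.89 and 0.19 respectively.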
LocusIDs and probe sets corresponding to the ten largest z-score variances for the LTP data set.
| 18.4 | 24241 | Calca/CGRP | 1369116_a_at | -6.714 | A | alt. splicing |
| | | | 1369117_at | 0.164 | A | |
| | | | 1370775_a_at | -7.701 | A | |
| 15.8 | 59329 | Snf1lk | 1368596_at | 0.947 | A | alt. poly(A) |
| | | | 1368597_at | -4.67 | A | |
| 12.9 | 25513 | Pik3r1 | 1370114_a_at | 2.598 | A | error |
| | | | 1371776_at | -2.479 | E | |
| 12.6 | 316085 | RGD1307844_predicted | 1373605_at | 0.712 | A | indeterminate/error |
| | | | 1373920_at | -4.315 | E | |
| 12.1 | 170796 | Grin3b | 1387559_at | -3.852 | A | error |
| | | | 1388905_at | 1.058 | A | |
| 11.0 | 56827 | Cacna1i | 1369211_at | -4.51 | A | alt. splicing |
| | | | 1370641_s_at | 0.173 | A | |
| 9.8 | 29415 | Edg5 | 1367920_at | -4.581 | A | alt. poly(A) |
| | | | 1386989_at | -0.157 | A | |
| 9.1 | 192181 | Podxl | 1369895_s_at | 1.951 | A | alt. poly(A) |
| | | | 1387933_s_at | -2.313 | A | |
| 9.0 | 311846 | Lrrc8_predicted | 1374296_at | -0.822 | E | error? |
| | | | 1382920_at | -5.071 | A | 1374296_at beyond 3' end of RefSeq |
| 8.6 | 114124 | Akap1 | 1369069_at | 1.653 | A | alt. splicing |
| | | | 1388070_a_at | -2.494 | A | |
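The variances in the first column can be reproduced from the z-scores listed for each LocusID. The sketch below assumes the sample variance (n − 1 denominator), which the table does not state but which matches the first three rows. The function names are hypothetical and only those three LocusIDs are included.

```python
from statistics import variance

# z-scores per LocusID, copied from the first three entries of the table above.
LTP_Z_SCORES = {
    24241: [-6.714, 0.164, -7.701],  # Calca/CGRP
    59329: [0.947, -4.67],           # Snf1lk
    25513: [2.598, -2.479],          # Pik3r1
}


def zscore_variance(z_scores):
    """Sample variance (n - 1 denominator) of the z-scores for one LocusID."""
    return variance(z_scores)


def rank_by_variance(z_by_locus):
    """LocusIDs sorted from largest to smallest z-score variance."""
    return sorted(
        z_by_locus,
        key=lambda locus: zscore_variance(z_by_locus[locus]),
        reverse=True,
    )


if __name__ == "__main__":
    for locus in rank_by_variance(LTP_Z_SCORES):
        print(locus, round(zscore_variance(LTP_Z_SCORES[locus]), 1))
```

A large variance flags a LocusID whose probe sets disagree, which is what prompts the manual checks summarised in the last column (alternative splicing, alternative poly(A) sites, or annotation errors).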
Z-scores and gene localization of probe sets mapping to the 3 top ranked LocusIDs for the LTP data set.
| 24241 | 1369116_a_at | -6.714 | exon 2, exon 3 |
| | 1369117_at | 0.164 | exon 4 |
| | 1370775_a_at | -7.701 | exon 5, exon 6 |
| 59329 | 1368596_at | 0.947 | exon 15, exon 16 |
| | 1368597_at | -4.67 | exon 14 |
| 25513 | 1370114_a_at | 2.598 | chr2, reverse strand |
| | 1371776_at | -2.479 | chr2, forward strand |
Alignment of the target sequences to the UCSC June 2003 rat genome assembly was performed using BLAT [19].
Figure 6. Alignment of the mRNAs for calcitonin (red) and CGRP (blue) to the genomic sequence (black line). The GenBank [28] accession numbers are given in parentheses. The coloured bars represent exons, with the narrower sections representing untranslated regions. The thin lines connecting the exons represent spliced out intronic regions. Genomic sequence was from the UCSC June 2003 rat assembly [10], and the alignments of the mRNAs to the genomic sequence were obtained from UCSC Genome Bioinformatics [10].
Figure 7. Mapping of individual probes in probe sets 1369117_at (red, grey lines), 1370775_a_at (blue lines) and 1369116_a_at (turquoise lines) to genomic sequence of the rat calcitonin alpha/CGRP gene. The arrow shows the direction of transcription. The introns are represented by black dotted lines and are not drawn to scale. The fifth probe in the 1369117_at probe set (grey line) does not match the genomic sequence perfectly: there is a single base mismatch between the probe sequence and the genomic sequence. Genomic sequence was from the UCSC June 2003 rat assembly [10].
Figure 8. Screen shot of the Affymetrix probe set display tool showing the consensus sequence for probe sets 1368596_at and 1368597_at (blue bar) [29]. The arrow at the left shows the 5' to 3' direction along the consensus sequence. The pink bars represent the 1368597_at probes, and the green bars the 1368596_at probes. The blue triangles above the consensus sequence show poly(A) sites.
Figure 9. Alignment of the target sequences for probe sets 1370114_a_at and 1371776_at to the genomic sequence of rat chromosome 2. The arrowheads show the direction of transcription. Introns are not shown, and only the 3' end of the Pik3r1 sequence is shown. Target sequences were obtained from Affymetrix [7]. Transcript sequences were obtained from GenBank [28]. Genomic sequence of rat chromosome 2 and genomic alignments of the transcripts to the UCSC June 2003 rat assembly were obtained from UCSC Genome Bioinformatics [10].
LocusIDs and probe sets corresponding to the ten largest z-score variances for the NGF data set.
| 103.8 | 25513 | Pik3r1 | 1370114_a_at | -8.577 | A | error |
| | | | 1371776_at | 5.83 | E | |
| 33.4 | 363087 | RGD1560364_predicted | 1376129_at | 3.837 | E | error, opposite strands |
| | | | 1380038_at | -4.34 | B | |
| 17.0 | 298530 | RGD1308209_predicted | 1370876_at | 1.788 | C | error, different chromosomes |
| | | | 1376841_at | -4.046 | E | |
| 13.6 | 24737 | RT1-Aw2 (MHC Class I antigen) | 1369110_x_at | 1.447 | A | |
| | | | 1370428_x_at | 2.057 | A | |
| | | | 1370429_at | 3.397 | A | |
| | | | 1370463_x_at | -4.724 | - | |
| | | | 1370972_x_at | 0.573 | - | |
| | | | 1371078_at | 3.449 | B | |
| | | | 1371111_at | 0.515 | - | |
| | | | 1371119_at | 0.796 | - | |
| | | | 1371171_at | -0.398 | A | |
| | | | 1371209_at | 7.035 | - | |
| | | | 1371210_s_at | 1.582 | A | |
| | | | 1371213_at | 1.965 | - | |
| | | | 1388071_x_at | 4.308 | A | |
| | | | 1388202_at | -7.565 | A | |
| | | | 1388203_x_at | -6.808 | B | |
| | | | 1388236_x_at | -0.588 | - | |
| | | | 1388254_a_at | -0.57 | - | |
| | | | 1388255_x_at | 4.886 | - | |
| | | | 1388256_at | -0.335 | - | |
| | | | 1388694_at | 5.334 | - | |
| | | | 1389734_x_at | 0.967 | B | |
| 13.2 | 116561 | Cltb | 1367907_a_at | 1.71 | A | error, 1375324_at, wrong strand |
| | | | 1375324_at | -3.437 | E | |
| 13.0 | 171454 | Btbd14b | 1371826_at | 2.191 | E | 1371826_at, 1.5 kb beyond 3' end of RefSeq sequence |
| | | | 1387443_at | 7.287 | A | |
| 12.9 | 24225 | Bdnf | 1368677_at | 6.115 | A | alt. poly(A) sites; |
| | | | 1368678_at | 1.034 | A | 1368678_at last 4 probes don't align to genomic sequence |
| 12.2 | 311846 | Lrrc8_predicted | 1374296_at | 1.116 | E | error |
| | | | 1382920_at | 6.06 | A | |
| 12.1 | 64160 | Basp1 | 1369310_at | 3.803 | A | 1375143_at – error, wrong strand; |
| | | | 1375143_at | 9.602 | E | |
| | | | 1398350_at | 3.391 | C | 1398350_at – beyond 3' end of RefSeq |
| 11.8 | 25601 | Oprm1 | 1369109_at | 4.617 | A | alt. splicing |
| | | | 1387461_at | -0.235 | A | |
LocusIDs and probe sets corresponding to the ten largest z-score variances for the kidney data set.
| 39.7 | 290851 | Vps36_predicted | 1373423_at | 8.39 | E | 1373423_at, consensus sequence 2.0 kb downstream from 3' end of Ensembl prediction for Vps36 |
| | | | 1389191_at | -0.52 | B | |
| 37.5 | 79131 | Fabp3 | 1367660_at | -0.84 | A | 1376522_at maps to intron |
| | | | 1376522_at | 7.82 | E | |
| 32.4 | 29223 | Ak3l1 | 1371824_at | -0.51 | B | 1371824_at, 9 probes are beyond 3' end of RefSeq sequence |
| | | | 1398285_at | -8.55 | A | |
| 29.0 | 293949 | RGD1310475_predicted | 1375560_at | -3.29 | A | error, opposite strands; 1389839_at, wrong strand; |
| | | | 1389839_at | 4.33 | E | |
| 23.4 | 171522 | Cyp2d22 | 1370329_at | 8.72 | A | alt. poly(A); |
| | | | 1387913_at | 1.87 | A (9/11) | 1387913_at only 9 probes align to the RefSeq transcript |
| 18.4 | 289696 | LOC289696 | 1371189_x_at | 0.50 | B | error; |
| | | | 1371190_at | 7.97 | B | 1371190_at is on opposite strand from the other 2; consensus sequences align to 5 places in genome |
| | | | 1388244_s_at | 0.58 | A | |
| 16.9 | 25715 | Slc11a2 | 1367877_at | -0.49 | B | alt. splicing |
| | | | 1388059_a_at | -6.31 | A | |
| 15.5 | 171099 | Rasd2 | 1370372_at | 4.27 | A | alt. poly(A) |
| | | | 1370373_at | -1.29 | A | |
| 14.7 | 287005 | Camk2n1 | 1370853_at | 4.31 | A (9/11) | error; |
| | | | 1374307_at | -0.37 | E | 1389876_at, wrong strand; 1374307_at, beyond 3' end of RefSeq; 1370853_at, only 9 probes align to the RefSeq transcript |
| | | | 1389876_at | -3.28 | E | |
| 14.6 | 25197 | St6gal1 | 1370714_a_at | -6.54 | A | alt. poly(A) |
| | | | 1370907_at | -1.13 | A | |
The ranks of the LocusIDs with the ten highest z-score variances for the LTP data set compared against their ranks in the NGF experiment and the kidney experiment.
| LocusID | LTP rank | NGF rank | Kidney rank |
|---|---|---|---|
| 24241 | 1 | 22 | 1509 |
| 59329 | 2 | 1357 | 1544 |
| 25513 | 3 | 1 | 89 |
| 316085 | 4 | 1009 | 187 |
| 170796 | 5 | 137 | 878 |
| 56827 | 6 | 1675 | 1448 |
| 29415 | 7 | 727 | 928 |
| 192181 | 8 | 418 | 741 |
| 311846 | 9 | 8 | 40 |
| 114124 | 10 | 181 | 1573 |