Jeremy Harbig, Robert Sprinkle, Steven A Enkemann.
Abstract
One of the biggest problems facing microarray experiments is the difficulty of translating results into other microarray formats or comparing microarray results to other biochemical methods. We believe that this is largely the result of poor gene identification. We re-identified the probesets on the Affymetrix U133 plus 2.0 GeneChip array. This identification was based on the sequence of the probes and the sequence of the human genome. Using the BLAST program, we matched probes with documented and postulated human transcripts. This resulted in the redefinition of approximately 37% of the probes on the U133 plus 2.0 array. This updated identification specifically points out where the identification is complicated by cross-hybridization from splice variants or closely related genes. More than 5000 probesets detect multiple transcripts and therefore the exact protein affected cannot be readily concluded from the performance of one probeset alone. This makes naming difficult and impacts any downstream analysis such as associating gene ontologies, mapping affected pathways or simply validating expression changes. We have now automated the sequence-based identification and can more appropriately annotate any array where the sequence on each spot is known.
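The sequence-based scoring idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: BLAST is replaced by a naive longest-exact-match search, the transcripts are hypothetical toy strings rather than human sequences, and all function names are invented for illustration. It assumes probes and transcripts are given in the same orientation, and it averages the number of matching bases (out of 25) over the probes of a probeset.

```python
# Naive stand-in for BLAST: longest exact (ungapped) run of a probe in a transcript.
def longest_match(probe: str, transcript: str) -> int:
    best = 0
    for start in range(len(probe)):
        # Only test substrings longer than the best run found so far.
        for end in range(start + best + 1, len(probe) + 1):
            if probe[start:end] in transcript:
                best = end - start
            else:
                break  # a longer substring from this start cannot match either
    return best


def average_score(probes: list[str], transcript: str) -> float:
    """Mean number of matching bases per probe (25.0 means every probe matches fully)."""
    return sum(longest_match(p, transcript) for p in probes) / len(probes)


def best_transcript(probes: list[str], transcripts: dict[str, str]):
    """Return the best-scoring transcript name and the full score table."""
    scores = {name: average_score(probes, seq) for name, seq in transcripts.items()}
    return max(scores, key=scores.get), scores


# Hypothetical 60-base toy transcript; not a real human sequence.
TOY_A = "ATGGCTCAGTCCGATAAGCTTGGACCTGAATTCGTCAGGATCCTTAGCAAGTCGCATGCA"
# Same toy transcript with a single substitution at position 20 (T -> C).
TOY_B = TOY_A[:20] + "C" + TOY_A[21:]
# Three 25-mer "probes" cut from TOY_A (a real probeset has 11 probes).
PROBES = [TOY_A[0:25], TOY_A[15:40], TOY_A[35:60]]

best, scores = best_transcript(PROBES, {"toy_A": TOY_A, "toy_B": TOY_B})
print(best, scores)
```

A probeset whose best average falls well below 25, or whose top score is shared by several transcripts, would be a candidate for the kind of manual review the abstract describes.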
Year: 2005 PMID: 15722477 PMCID: PMC549426 DOI: 10.1093/nar/gni027
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Probesets that detect members of the interferon alpha gene family
| Probeset ID | Probe sequence | Probe location X | Probe location Y | Probeset member | Best match (score) | Sequence-based ID | Affymetrix reference sequence | Affymetrix ID |
|---|---|---|---|---|---|---|---|---|
| 211405_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 514 | 123 | Probe 10 | NM_021268 (24.1) | Interferon, alpha 17 | NM_002170 | Interferon, alpha 8 |
| 207964_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 515 | 123 | Probe 6 | NM_021068 (25.0) | Interferon, alpha 4 | NM_021068 | Interferon, alpha 4 |
| 208182_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 516 | 123 | Probe 3 | NM_002172 (25.0) | Interferon, alpha 14 | NM_002171 | Interferon, alpha 10 |
| 208259_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 517 | 123 | Probe 8 | NM_021057 (25.0) | Interferon, alpha 7 | NM_002175 | Interferon, alpha 21 |
| 211145_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 518 | 123 | Probe 9 | NM_002175 (23.6) | Interferon, alpha 21 | NM_002175 | Interferon, alpha 21 |
| 208344_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 519 | 123 | Probe 10 | NM_006900 (25.0) | Interferon, alpha 13 | NM_024013 | Interferon, alpha 1 |
| 208448_x_at | AGAAATACAGCCCTTGTGCCTGGGA | 520 | 123 | Probe 3 | NM_002173 (25.0) | Interferon, alpha 16 | NM_002171 | Interferon, alpha 10 |
| 208261_x_at | AGGAAATACAGCCCTTGTGCCTGGG | 17 | 79 | Probe 3 | NM_002171 (25.0) | Interferon, alpha 10 | NM_002171 | Interferon, alpha 10 |
| 208548_at | AGAGAAAAAGTACAGCCCTTGTGCC | 248 | 113 | Probe 10 | NM_021002 (25.0) | Interferon, alpha 6 | NM_000605 | Interferon, alpha 2 |
| 207932_at | No overlapping probe | | | | NM_002170 (25.0) | Interferon, alpha 8 | NM_002170 | Interferon, alpha 8 |
| 208375_at | No overlapping probe | | | | NM_024013 (25.0) | Interferon, alpha 1 | NM_024013 | Interferon, alpha 1 |
| 211338_at | No overlapping probe | | | | NM_000605 (23.6) | Interferon, alpha 2 | NM_000605 | Interferon, alpha 2 |
| 214569_at | No overlapping probe | | | | V00541 (25.0) | Interferon, alpha 5 | NM_002169 | Interferon, alpha 5 |
Indicated are single probes from several probesets that are highly similar or identical, their location on the array, and the probe number within the 11-probe probeset. Also indicated is the gene most similar to the probes in each probeset; the score is the average number of matching nucleotides, out of a possible 25, across the 11 probes. The last two columns contain the Affymetrix reference sequence and the gene name indicated at the NetAffx website.
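The two observations in this table (several probesets carrying the same probe, and sequence-based names that disagree with the Affymetrix names) can be checked mechanically. The sketch below is illustrative only: the rows are transcribed from the table above, while the tuple layout and function names are assumptions introduced here.

```python
from collections import defaultdict

# (probeset ID, probe sequence or None, sequence-based ID, Affymetrix ID),
# transcribed from the table above. None means "No overlapping probe".
ROWS = [
    ("211405_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 17", "Interferon, alpha 8"),
    ("207964_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 4", "Interferon, alpha 4"),
    ("208182_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 14", "Interferon, alpha 10"),
    ("208259_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 7", "Interferon, alpha 21"),
    ("211145_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 21", "Interferon, alpha 21"),
    ("208344_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 13", "Interferon, alpha 1"),
    ("208448_x_at", "AGAAATACAGCCCTTGTGCCTGGGA", "Interferon, alpha 16", "Interferon, alpha 10"),
    ("208261_x_at", "AGGAAATACAGCCCTTGTGCCTGGG", "Interferon, alpha 10", "Interferon, alpha 10"),
    ("208548_at", "AGAGAAAAAGTACAGCCCTTGTGCC", "Interferon, alpha 6", "Interferon, alpha 2"),
    ("207932_at", None, "Interferon, alpha 8", "Interferon, alpha 8"),
    ("208375_at", None, "Interferon, alpha 1", "Interferon, alpha 1"),
    ("211338_at", None, "Interferon, alpha 2", "Interferon, alpha 2"),
    ("214569_at", None, "Interferon, alpha 5", "Interferon, alpha 5"),
]


def shared_probes(rows):
    """Map each probe sequence to the probesets that contain it (two or more only)."""
    groups = defaultdict(list)
    for probeset, seq, _, _ in rows:
        if seq is not None:
            groups[seq].append(probeset)
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}


def discordant(rows):
    """Probesets whose sequence-based name differs from the Affymetrix name."""
    return [probeset for probeset, _, seq_id, affy_id in rows if seq_id != affy_id]


shared = shared_probes(ROWS)
mismatched = discordant(ROWS)
for seq, ids in shared.items():
    print(seq, "appears in", len(ids), "probesets")
print(len(mismatched), "of", len(ROWS), "probesets have discordant names")
```

Exact string equality is the simplest possible comparison; near-identical probes, such as the one in 208261_x_at that is shifted by one base, would need a substring or alignment check instead.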
Probesets that do not detect human genes
| Probeset ID | U133 plus 2.0 library description | NetAFFX identification | Blast identification | Manual identification |
|---|---|---|---|---|
| 214089_at | Mitogen-activated protein kinase kinase kinase kinase 3 | Ribosomal protein S8 | No match found | Detects no gene |
| 214379_at | Heterogeneous nuclear ribonucleoprotein D-like | Heterogeneous nuclear ribonucleoprotein D-like | No match found | Detects no gene |
| 214689_at | Pregnancy-associated plasma protein-E | Placenta-specific 3 | No match found | Detects no gene |
| 214935_at | Hypothetical protein | Nucleoporin 62 kDa | No match found | Probes 9–11 weakly hybridize to nucleoporin 62 kDa transcripts |
| 217680_x_at | EST | Transcribed sequence with strong similarity to 60S ribosomal protein L10 | No match found | Some probes hybridize to ribosomal protein L10 and similar transcripts (best match LOC284393) |
| 217712_at | Moderately similar to ALU8 | Transcribed sequence with weak similarity to cytokine receptor-like factor 2 | No match found | The gene is not yet defined. The probes recognize a sequence repeated 6 times on the X chromosome. |
| 222181_at | CCR4-NOT transcription complex, subunit 2 | CCR4-NOT transcription complex, subunit 2 | No match found | Probe 5 recognizes CCR4-NOT transcription complex, subunit 2. Probes 2–4 also bind weakly. |
| 220932_at | Hypothetical protein | No ID | No match found | Detects no gene |
| 211371_at | MAP kinase kinase MEK5c | Mitogen-activated protein kinase kinase 5 | U71088 | Bovine growth hormone poly A sequence engineered into commercial cloning vectors |
| 222227_at | Zinc finger protein 236 | Zinc finger protein 236 | AK000847 | SV40 poly A sequence engineered into commercial cloning vectors |
| 221106_at | HBOIT for potent brain type organic ion transporter | Solute carrier family 22 (organic cation transporter), member 17 | No match found | Rattus norvegicus solute carrier family 22 |
| 214019_at | BCL1 mRNA encoding cyclin | Cyclin D1 (PRAD1: parathyroid adenomatosis 1) | Z23022 | Mildly repetitive endogenous retroviral like element (ERVK) |
Shown is a comparison of the identifications from the original definition of the probeset, the current definition available at Affymetrix, our definition based on the sequence of the probes and a manual identification intended to define where the original sequence came from.
Characteristics of probes within the probeset 217680_x_at
| Probe number | Similarity to ribosomal protein L10 | PM value | MM value |
|---|---|---|---|
| 1 | No match | 452 | 131 |
| 2 | No match | 346 | 702 |
| 3 | No match | 31 | 62 |
| 4 | No match | 52 | 99 |
| 5 | No match | 21 | 24 |
| 6 | No match | 185 | 195 |
| 7 | 13 base match | 375 | 326 |
| 8 | 16 base match | 524 | 525 |
| 9 | 20 base match | 6727 | 2134 |
| 10 | 23 base match | 24927 | 2607 |
| 11 | 25 base match | 41487 | 18066 |
The last five probes detect the ribosomal protein L10 with increasing affinity. The probe values for the perfect match probes (PM) and the mismatch probes (MM) are the average of 25 independent chip measurements.
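The relationship between sequence similarity and signal in this table can be made explicit with a short calculation. The values below are transcribed from the table; treating "No match" as 0 matching bases is an assumption made here for the arithmetic, and the function names are illustrative.

```python
# (probe number, bases matching ribosomal protein L10, PM value, MM value),
# transcribed from the table above. "No match" is encoded as 0 (an assumption).
PROBE_DATA = [
    (1, 0, 452, 131),
    (2, 0, 346, 702),
    (3, 0, 31, 62),
    (4, 0, 52, 99),
    (5, 0, 21, 24),
    (6, 0, 185, 195),
    (7, 13, 375, 326),
    (8, 16, 524, 525),
    (9, 20, 6727, 2134),
    (10, 23, 24927, 2607),
    (11, 25, 41487, 18066),
]


def average_match(probes):
    """Average number of matching bases per probe across the probeset."""
    return sum(match for _, match, _, _ in probes) / len(probes)


def pm_minus_mm(probes):
    """Perfect-match minus mismatch signal for each probe number."""
    return {number: pm - mm for number, _, pm, mm in probes}


avg = average_match(PROBE_DATA)
diffs = pm_minus_mm(PROBE_DATA)
print(f"average match: {avg:.2f} of 25 bases")
for number, match, _, _ in PROBE_DATA:
    print(f"probe {number:2d}: {match:2d} matching bases, PM - MM = {diffs[number]}")
```

Under these assumptions the average match is far below 25, even though the three best-matching probes dominate the total signal, which is the pattern the caption describes.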
Figure 1. The signal captured by some probesets on the U133A array from 100 RNA samples collected from various tissues. Probeset 202029_x_at detects the expression of ribosomal protein L38. The other three probesets were designed to the complementary strand of the intended reference gene. Probeset 202028_s_at detects sequences complementary to the ribosomal protein L38. The plots for probesets 213619_at and 216868_s_at illustrate the difference between a probeset that detects a transcript and a probeset that does not detect a transcript. Although each plot is represented against a different scale, the relative expression levels are directly comparable.
Representative annotation of several Affymetrix probesets from the U133 plus 2.0 GeneChip
| Probeset ID | Best matches (average score) | Multiple genes | Splice variants | Related probesets | Unigene number | Gene name | Gene symbol | Entrez Gene ID |
|---|---|---|---|---|---|---|---|---|
| 201002_s_at | NM_021988 (25.0) | √ | √ | 201003_x_at 208270_s_at | Hs.381025 | Ubiquitin-conjugating enzyme E2 variant 1, transcript variant 1 | UBE2V1 | 7335 |
| | NM_199144 (25.0) | | | 201001_s_at 210886_x_at | Hs.381025 | Ubiquitin-conjugating enzyme E2 variant 1, transcript variant 2 | UBE2V1 | 7335 |
| | NM_022442 (25.0) | | | 210241_s_at 216315_x_at | Hs.381025 | Ubiquitin-conjugating enzyme E2 variant 1, transcript variant 3 | UBE2V1 | 7335 |
| | NM_003349 (25.0) | | | | Hs.381025 | Ubiquitin-conjugating enzyme E2 Kua-UEV isoform 2 | Kua-UEV | 387522 |
| | NM_199203 (25.0) | | | | Hs.381025 | Ubiquitin-conjugating enzyme E2 Kua-UEV isoform 1 | Kua-UEV | 387522 |
| 202039_at | NM_004740 (25.0) | √ | √ | | Hs.354085 | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | 9220 |
| | NM_078471 (25.0) | | | | Hs.354085 | Myosin XVIIIA | MYO18A | 399687 |
| 206900_x_at | NM_021047 (25.0) | √ | | 221625_at | Hs.407162 | Zinc finger protein 253 | ZNF253 | 56242 |
| | | | | hum_alu_at | | | | |
| | | | | 206572_x_at | | | | |
| | | | | 217547_x_at | | | | |
| | | | | 215532_x_at | | | | |
| | | | | 221626_at | | | | |
| | | | | 215758_x_at | | | | |
| 203639_s_at | NM_000141 (25.0) | | √ | AFFX-hum_alu_at | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 1 | FGFR2 | 2663 |
| | NM_022969 (25.0) | | | 211401_s_at 203638_s_at | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 2 | FGFR2 | 2663 |
| | NM_022970 (25.0) | | | 208225_at 208228_s_at | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 3 | FGFR2 | 2663 |
| | NM_022972 (25.0) | | | 208234_x_at | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 5 | FGFR2 | 2663 |
| | NM_022975 (25.0) | | | | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 8 | FGFR2 | 2663 |
| | NM_023028 (25.0) | | | | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 10 | FGFR2 | 2663 |
| | NM_023029 (25.0) | | | | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 11 | FGFR2 | 2663 |
| | NM_023030 (25.0) | | | | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 12 | FGFR2 | 2663 |
| | NM_023031 (25.0) | | | | Hs.404081 | Fibroblast growth factor receptor 2, transcript variant 13 | FGFR2 | 2663 |
| 1007_s_at | NM_001954 (25.0) | | √ | 207169_x_at 210749_x_at | Hs.423573 | Discoidin domain receptor family, member 1, transcript variant 2 | DDR1 | 780 |
| | NM_013993 (25.0) | | | 208779_x_at | Hs.423573 | Discoidin domain receptor family, member 1, transcript variant 1 | DDR1 | 780 |
| | NM_013994 (25.0) | | | | Hs.423573 | Discoidin domain receptor family, member 1, transcript variant 3 | DDR1 | 780 |
| 217547_x_at | NM_007153 (5.83) | √ | | | Hs.515712 | Zinc finger protein 208 | ZNF208 | 7757 |
| 211610_at | No best match | | | | Hs.534315 | Caution, check this probeset carefully. This probeset may detect an unusual splice variant, alternate termination site, or extended transcript of core promoter element binding protein. It is also a chimeric probeset with some of the probes detecting a locus 1.7 Mb away on chromosome 10 | COPEB | 1316 |