Wenhong Fan, Najma Khalid, Andrew R Hallahan, James M Olson, Lue Ping Zhao.
Abstract
BACKGROUND: Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip, which uses multiple oligonucleotide probes (i.e. a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with the abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip gene expression array data.
Year: 2006 PMID: 16603076 PMCID: PMC1502129 DOI: 10.1186/1742-4682-3-19
Source DB: PubMed Journal: Theor Biol Med Model ISSN: 1742-4682 Impact factor: 2.432
Figure 1. (A) Multiple probes are used to quantify the expression value for a gene in GeneChip® technology. Currently the probe design has a 3' bias, i.e. probes are selected from the sequence at the 3' end of the gene. In the Hu6800 array, twenty probes are used for a single gene. (B) Intensities of the twenty probes are plotted for both tissues 1 and 2. (C) The twenty probes are clustered into three groups based on the similarity of probe intensity and probe adjacency. Each cluster, called a pseudo-exon in this paper, represents a sub-region of the gene.
Figure 2. Histogram of the Z-scores for all 10,838 pseudo-exons obtained in the comparison of normal cerebellum samples with medulloblastomas.
Alternatively spliced genes selected by our method: comparison of non-metastatic medulloblastomas with metastatic medulloblastomas
| Affymetrix Probe Set ID | Gene Symbol | Number of Affymetrix Probes in the Predicted Pseudo-exon | Nucleotide Positions of Predicted Pseudo-exon in the Gene | Mean Difference | Standard Error | Z-score | Description of the Genes |
| M81882_at | GAD2 | 4 | (2135–2285) | -1.28 | 0.20 | -6.45 | glutamate decarboxylase 2 (pancreatic islets and brain, 65 kDa) |
| M13955_at | KRT7 | 5 | (1402–1474) | -0.63 | 0.12 | -5.23 | keratin 7 |
| U17327_at | NOS1 | 7 | (6805–7003) | -0.66 | 0.13 | -5.19 | nitric oxide synthase 1 (neuronal) |
| X14329_at | CPN1 | 4 | (1569–1665) | -0.62 | 0.12 | -5.18 | carboxypeptidase N, polypeptide 1, 50 kD |
| M89470_s_at | PAX2 | 6 | (2855–2972) | -0.92 | 0.19 | -4.91 | paired box gene 2 |
| L14542_at | KLRC3 | 5 | (916–1006) | -1.18 | 0.24 | -4.91 | killer cell lectin-like receptor subfamily C, member 3 |
| X76648_at | GLRX | 3 | (704–776) | -1.35 | 0.28 | -4.86 | glutaredoxin (thioltransferase) |
| U82987_at | BBC3 | 3 | (1578–1638) | 2.25 | 0.32 | 6.98 | BCL2 binding component 3 |
| U01102_at | SCGB1A1 | 2 | (409–439) | 1.42 | 0.25 | 5.62 | secretoglobin, family 1A, member 1 (uteroglobin) |
| M28219_at | LDLR | 15 | (67–277) | 0.77 | 0.14 | 5.42 | low density lipoprotein receptor (familial hypercholesterolemia) |
| X68194_at | SYPL | 5 | (1915–2089) | 1.67 | 0.31 | 5.42 | synaptophysin-like protein |
| U85267_at | DSCR1 | 10 | (64–169) | 1.20 | 0.24 | 5.08 | Down syndrome critical region gene 1 |
| L36051_at | THPO | 6 | (1647–1809) | 1.05 | 0.21 | 4.96 | thrombopoietin (myeloproliferative leukemia virus oncogene ligand, megakaryocyte growth and development factor) |
Number of Affymetrix Probes in the Predicted Pseudo-exon: number of probes contained in a predicted alternatively spliced pseudo-exon. Nucleotide Positions of Predicted Pseudo-exon in the Gene: nucleotide positions of the pseudo-exon, counted from the beginning of the gene in which it resides. Mean Difference: mean difference of the expression values between the two tissue types being compared for each predicted pseudo-exon, from the t-test in STEP 2. Standard Error: the standard error calculated in the same t-test. Z-score: the ratio of the mean difference to the standard error (noise), a measure of the significance of the difference between the two tissues being compared. The sign of the Z-score indicates the direction of the difference: a negative Z-score means lower expression in metastatic medulloblastomas than in non-metastatic medulloblastomas, and vice versa for a positive Z-score.
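The Z-score defined in this footnote can be illustrated with a minimal Python sketch. It assumes an unpooled (Welch-style) standard error, which may differ from the exact t-test used in STEP 2; the function name `pseudo_exon_z` and the expression values are hypothetical and serve only to show the arithmetic and the sign convention (metastatic minus non-metastatic).

```python
import math
from statistics import mean, variance


def pseudo_exon_z(non_metastatic, metastatic):
    """Return (mean difference, standard error, Z-score) for one pseudo-exon.

    The difference is metastatic minus non-metastatic, so a negative
    Z-score means lower expression in the metastatic group.
    Assumption: unpooled (Welch-style) standard error.
    """
    diff = mean(metastatic) - mean(non_metastatic)
    se = math.sqrt(
        variance(non_metastatic) / len(non_metastatic)
        + variance(metastatic) / len(metastatic)
    )
    return diff, se, diff / se


# Hypothetical log-scale expression values for a single pseudo-exon.
non_met = [8.1, 7.9, 8.4, 8.0]
met = [6.9, 7.2, 6.6, 7.0]
diff, se, z = pseudo_exon_z(non_met, met)
print(f"mean difference = {diff:.3f}, standard error = {se:.3f}, Z = {z:.2f}")
```

Because the table reports the mean difference and standard error rounded to two decimals, dividing the tabulated values reproduces the tabulated Z-scores only approximately.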
Alternatively spliced genes inferred by applying Hu's method to our dataset: comparison of normal cerebellum with medulloblastoma samples
| Affy Probe Set ID | Gene Symbol | Number of Affymetrix Probes in the Predicted Pseudo-exon | Nucleotide Positions of Predicted Pseudo-exon in the Gene | Description of the Genes |
| X51362_s_at | DRD2 | 2 | (2541–2574) | dopamine receptor D2 |
| M15517_cds5_at | TTR | 3 | (155–197) | transthyretin (prealbumin, amyloidosis type I) |
| Y10141_s_at | SLC6A3 | 2 | (96–125) | solute carrier family 6 (neurotransmitter transporter, dopamine), member 3 |
| Z14982_rna1_at | PSMB8 | 2 | (820–850) | proteasome (prosome, macropain) subunit, beta type, 8 (large multifunctional protease 7) |
| X69654_at | RPS26 | 2 | (9–35) | ribosomal protein S26 |
| U63842_at | NEUROG1 | 2 | (834–891) | neurogenin 1 |
| M97815_at | CRABP2 | 2 | (524–554) | cellular retinoic acid binding protein 2 |
| D00017_at | ANXA2 | 2 | (1229–1265) | annexin A2 |
| U13021_s_at | CASP2 | 3 | (844–913) | caspase 2, apoptosis-related cysteine protease (neural precursor cell expressed, developmentally down-regulated 2) |
| U30999_at | ALCAM | 2 | (373–403) | activated leukocyte cell adhesion molecule |
| X04828_at | GNAI2 | 3 | (1668–1701) | guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 2 |
| U14971_at | RPS9 | 2 | (319–373) | ribosomal protein S9 |
| U79299_at | OLFM1 | 2 | (1342–1372) | olfactomedin 1 |
| L20298_at | CBFB | 2 | (2298–2334) | core-binding factor, beta subunit |
| X93017_at | SLC8A3 | 2 | (1725–1821) | solute carrier family 8 (sodium-calcium exchanger), member 3 |
| M17886_at | RPLP1 | 2 | (127–163) | ribosomal protein, large, P1 |
| D16480_at | HADHA | 2 | (2335–2365) | hydroxyacyl-Coenzyme A dehydrogenase/3-ketoacyl-Coenzyme A thiolase/enoyl-Coenzyme A hydratase (trifunctional protein), alpha subunit |
| D38305_at | TOB1 | 2 | (707–749) | transducer of ERBB2, 1 |
| U32519_at | G3BP | 2 | (1534–1564) | Ras-GTPase-activating protein SH3-domain-binding protein |
| U07919_at | ALDH1A3 | 3 | (3363–3411) | aldehyde dehydrogenase 1 family, member A3 |
| U29953_rna1_at | SERPINF1 | 2 | (1288–1324) | serine (or cysteine) proteinase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1 |
| D55716_at | MCM7 | 2 | (2288–2396) | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) |
| J05448_at | POLR2C | 2 | (1575–1605) | polymerase (RNA) II (DNA directed) polypeptide C, 33 kDa |
| U46570_at | TTC1 | 2 | (1226–1262) | tetratricopeptide repeat domain 1 |
| D87119_at | TRB2 | 2 | (4022–4136) | tribbles homolog 2 |
| X69910_at | CKAP4 | 2 | (2543–2573) | cytoskeleton-associated protein 4 |
| U50078_at | HERC1 | 2 | (14885–14915) | hect (homologous to the E6-AP (UBE3A) carboxyl terminus) domain and RCC1 (CHC1)-like domain (RLD) 1 |
| J04164_at | IFITM1 | 2 | (798–828) | interferon induced transmembrane protein 1 (9–27) |
| AFFX-HUMRGE/M10098_3_at | N/A | 2 | (1562–1613) | N/A |
| HG2788-HT2896_at | N/A | 2 | (N/A-N/A) | N/A |
| HG2994-HT4850_s_at | N/A | 2 | (N/A-N/A) | N/A |
Number of Affymetrix Probes in the Predicted Pseudo-exon: number of probes contained in a predicted alternatively spliced pseudo-exon. Nucleotide Positions of Predicted Pseudo-exon in the Gene: nucleotide positions of the pseudo-exon, counted from the beginning of the gene in which it resides.
Overlap of the genes predicted by our method and by Hu's method for the comparison of normal cerebellum and medulloblastoma samples
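| Affy Probe Set ID | Gene Symbol | Number of Affymetrix Probes in the Predicted Pseudo-exon (Ours) | Number of Affymetrix Probes in the Predicted Pseudo-exon (Hu's) | Nucleotide Positions of Predicted Pseudo-exon in the Gene (Ours) | Nucleotide Positions of Predicted Pseudo-exon in the Gene (Hu's) | Descriptions of the Genes |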
| X04828_at* | GNAI2 | 3 | 3 | (1668–1701) | (1668–1701) | guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 2 |
| U14971_at* | RPS9 | 19 | 2 | (103–685) | (319–373) | ribosomal protein S9 |
| U29953_rna1_at* | SERPINF1 | 13 | 2 | (1288–1492) | (1288–1324) | serine (or cysteine) proteinase inhibitor, clade F (alpha-2 antiplasmin, pigment epithelium derived factor), member 1 |
| D87119_at* | TRB2 | 13 | 2 | (3824–4184) | (4022–4136) | tribbles homolog 2 |
| X69910_at | CKAP4 | 5 | 2 | (2789–2891) | (2543–2573) | cytoskeleton-associated protein 4 |
| U30999_at | ALCAM | 16 | 2 | (25–337) | (373–403) | activated leukocyte cell adhesion molecule |
| D55716_at | MCM7 | 8 | 2 | (1952–2096) | (2288–2396) | MCM7 minichromosome maintenance deficient 7 (S. cerevisiae) |
* Consistent alternative splice sites between the two methods.
Comparison of the results from our approach and those from Hu's using different R thresholds when normal cerebellum samples are compared with medulloblastomas
| R used | Number of Genes Found in Hu's Approach | Number of Overlap Between Hu's and Our 577 Genes | Percentage of the overlapping genes based on number of genes found in Hu's method | Percentage of the overlapping genes based on our 577 selected genes |
| 4 | 324 | 69 | 21% | 11.9% |
| 6 | 103 | 28 | 27% | 4.9% |
| 8 | 53 | 14 | 26% | 2.4% |
| 10 | 31 | 7 | 23% | 1.2% |
Genes found by Hu's method using different R thresholds are compared with each other. A larger R value represents a more stringent selection criterion. Genes found using smaller R values always include those found using larger R values, i.e. the list of 324 genes contains the list of 103 genes, etc. Genes obtained from Hu's method are also compared with the 577 genes from our approach. The numbers of overlapping genes for the different R values are presented in the third column. Similarly, the overlapping genes for smaller R values contain those for larger R values, i.e. the list of 69 genes contains the list of 28 genes, etc.
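The two percentage columns of this table follow directly from the gene counts. The short Python sketch below recomputes them from the tabulated numbers; the function name `overlap_percentages` is hypothetical, and only the counts shown in the table are used.

```python
def overlap_percentages(n_hu, n_overlap, n_ours=577):
    """Return the overlap as a percentage of Hu's gene list and of our gene list."""
    return 100.0 * n_overlap / n_hu, 100.0 * n_overlap / n_ours


# R threshold -> (genes found by Hu's approach, genes overlapping with our 577 genes)
table = {4: (324, 69), 6: (103, 28), 8: (53, 14), 10: (31, 7)}

for r, (n_hu, n_overlap) in table.items():
    pct_hu, pct_ours = overlap_percentages(n_hu, n_overlap)
    print(f"R = {r}: {pct_hu:.1f}% of Hu's genes, {pct_ours:.2f}% of our 577 genes")
```

For R = 4 the second ratio is 69/577, or about 11.96%, so the tabulated 11.9% appears to be truncated rather than rounded.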