
SNPinProbe_1.0: a database for filtering out probes in the Affymetrix GeneChip human exon 1.0 ST array potentially affected by SNPs.

Shiwei Duan, Wei Zhang, Wasim Kamel Bleibel, Nancy Jean Cox, M Eileen Dolan.

Abstract

The Affymetrix GeneChip® Human Exon 1.0 ST array (exon array) is designed to measure both gene-level and exon-level expression in human samples. The exon array contains approximately 1.4 million probesets comprising approximately 5.4 million probes, and it profiles over 17,000 well-annotated gene transcripts in the human genome. Like all expression arrays, the exon array is vulnerable to SNPs within probes, because these SNPs can affect probe hybridization and thus produce misleading expression values. In some cases, this can cause dramatic fluctuations in exon-level expression. We therefore performed a genome-wide search for SNPs within the regions that hybridize to probes, evaluating approximately 18 million SNPs in dbSNP (Build 129) against about 5.4 million probes on the exon array. We identified 597,068 probes within 350,382 probesets that hybridize to regions containing SNPs. These affected probes and/or probesets can be filtered out during data processing, thereby controlling for potential false expression phenotypes when using this exon array. AVAILABILITY: http://cid-fb2a64e541add2be.skydrive.live.com/browse.aspx/Affy%7C_HuEx%7C_1.0ST?uc=2.

Keywords:  Affymetrix GeneChip® human exon 1.0 ST array; SNP; database; human genome; probes

Year:  2008        PMID: 18841244      PMCID: PMC2561168          DOI: 10.6026/97320630002469

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

High-throughput gene expression microarrays make it possible to profile thousands of genes in a single analysis. The Affymetrix GeneChip® Human Exon 1.0 ST array was designed to detect novel exons, spliced exons or sub-exons of a gene in human samples [1]. The exon array uses over 5.4 million probes representing about 1.4 million probesets, which were designed from the genomic regions of known genes and from regions that may harbor hypothetical genes. Unlike other arrays, including the Affymetrix Human Genome Focus® array and the U95® and U133® series arrays, the probes on the exon array are designed to cover the whole gene region rather than only the 3′-untranslated regions [1]. In addition, gene structures are represented by probesets, with each probeset on the exon array consisting of up to 4 perfect match probes that target a region of the exon. This differs from previous Affymetrix gene expression arrays, which contain paired sets of perfect match and mismatch oligonucleotides tiled onto the microarray to account for nonspecific hybridization [1,2]. However, studies have shown that SNPs within probes can affect hybridization on the 3′ expression arrays [3] as well as on the exon arrays [4-6]. Because the human exon array carries 5.4 million probes, more of its probes hybridize to regions containing SNPs, and the effect can be dramatic when evaluating exon-level expression. SNPs in probe-covered regions have been shown to affect the hybridization efficiency of some probes, which can create false relationships between SNP genotypes and the gene expression levels that the probes represent [4-6]. Furthermore, differences in the hybridization of certain probes among individuals may not reflect true expression differences in the regions the probes represent, but may instead be due to genotype differences at common SNPs inside the hybridized sequences of the probes [3-6].
Quality control should therefore include identifying the probes that contain SNPs so that the affected probes can be filtered out prior to expression analysis, thereby controlling the confounding effects that these SNPs can cause [5,7,8].

Methodology

Dataset

The dataset [9] contains the probes whose hybridization regions contain SNPs listed in the dbSNP database (version 129, genome build 36, April 2008) [10].

Development

The genomic positions (build 36) of over 18 million SNPs were retrieved from the dbSNP database (version 129). The sequences of over 5.4 million probes and over 1.4 million probesets were downloaded from the Affymetrix website [11]. Because the probesets are annotated with build 36 genomic regions while the probes are still annotated with the older build 34 regions, a local BLAT [12] alignment between the probes and their probesets was performed to update the genomic regions covered by the probes. A genome-wide search was then performed between the ~18 million SNPs and the over 5.4 million probes to identify the probes affected by SNPs.
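The search step described above amounts to asking, for each probe, whether any SNP position falls inside the probe's genomic interval. The following is a minimal sketch of one way to do this, not the authors' implementation. The function names, the toy coordinates, and the convention of closed intervals on a shared coordinate system are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


def index_snps(snps):
    """Group SNP positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in snps:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def probes_with_snps(probes, snp_index):
    """Return IDs of probes whose closed interval [start, end] holds a SNP."""
    affected = []
    for probe_id, chrom, start, end in probes:
        positions = snp_index.get(chrom, [])
        # Index of the first SNP located at or after the probe start.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] <= end:
            affected.append(probe_id)
    return affected


# Hypothetical toy data: (chromosome, position) and (id, chromosome, start, end).
snps = [("chr1", 105), ("chr1", 500), ("chr2", 42)]
probes = [
    ("p1", "chr1", 100, 124),  # contains the SNP at 105
    ("p2", "chr1", 200, 224),  # no SNP inside
    ("p3", "chr2", 18, 42),    # SNP sits on the last base
    ("p4", "chr3", 10, 34),    # chromosome without any SNPs
]
affected = probes_with_snps(probes, index_snps(snps))
print(affected)
```

Sorting the SNP positions once per chromosome keeps each probe lookup logarithmic in the number of SNPs, which matters when millions of probes are checked against millions of SNPs.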

Database content

This database [9] lists 597,068 probes within 350,382 probesets that are affected by known SNPs in dbSNP (version 129).
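To put these counts in perspective, the short calculation below relates them to the array totals. It is only a rough sketch: the totals of 5.4 million probes and 1.4 million probesets are the approximate figures quoted in this article, so the resulting percentages are approximate as well, and the variable names are illustrative.

```python
# Counts reported for the database.
affected_probes = 597_068
affected_probesets = 350_382

# Approximate array totals quoted in the article (assumed exact here).
total_probes = 5_400_000
total_probesets = 1_400_000

probe_fraction = affected_probes / total_probes
probeset_fraction = affected_probesets / total_probesets

print(f"Probes affected:    {probe_fraction:.1%}")
print(f"Probesets affected: {probeset_fraction:.1%}")
```

Under these approximate totals, roughly one probe in nine and one probeset in four contain at least one known SNP.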

Database usage

The user can download the list of affected probes and probesets [9] and then apply it to filter out the affected probes using Affymetrix Power Tools (version 1.8.6) (Figure 1). This free software can exclude a known set of probes from analysis. The affected probes are removed through its highly experimental workflow, which combines the apt-probeset-summarize program with the --kill-list option [13]. The resulting probeset intensities are summarized solely from the probes not affected by SNPs, and the generated expression data can then be used for routine expression analysis.

Figure 1: The process of filtering out probes affected by internal SNPs.
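The effect of a kill list can be illustrated with a small sketch in which affected probes are dropped before each probeset is summarized. This is not how Affymetrix Power Tools works internally: the median is used here only as a simple stand-in for the real summarization algorithms, and the function name, data layout, and intensity values are all hypothetical.

```python
from statistics import median


def summarize_probesets(probe_intensities, probe_to_probeset, affected_probes):
    """Summarize each probeset from its probes that are not on the kill list.

    The median is a stand-in summary chosen for illustration only.
    Probesets whose probes are all affected are omitted from the result.
    """
    kept = {}
    for probe_id, value in probe_intensities.items():
        if probe_id in affected_probes:
            continue  # skip probes flagged as containing a SNP
        kept.setdefault(probe_to_probeset[probe_id], []).append(value)
    return {probeset: median(values) for probeset, values in kept.items()}


# Hypothetical log-scale intensities for six probes in two probesets.
probe_intensities = {"a": 8.0, "b": 8.2, "c": 3.1, "d": 8.4, "e": 5.0, "f": 5.5}
probe_to_probeset = {"a": "ps1", "b": "ps1", "c": "ps1", "d": "ps1",
                     "e": "ps2", "f": "ps2"}
affected_probes = {"c", "e", "f"}  # the kill list

summary = summarize_probesets(probe_intensities, probe_to_probeset, affected_probes)
print(summary)
```

In this toy case the low value from probe "c" no longer pulls down the estimate for "ps1", while "ps2" drops out entirely because every one of its probes is on the kill list. Probesets that lose all their probes in this way need separate handling in a real analysis.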

Caveats

A total of 111,685 probes (2% of all probes) could not be mapped in the BLAT step, possibly because they are background controls. These probes are also included in the database [9].
References

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  BLAT--the BLAST-like alignment tool.

Authors:  W James Kent
Journal:  Genome Res       Date:  2002-04       Impact factor: 9.043

3.  Evaluation of genetic variation contributing to differences in gene expression between populations.

Authors:  Wei Zhang; Shiwei Duan; Emily O Kistner; Wasim K Bleibel; R Stephanie Huang; Tyson A Clark; Tina X Chen; Anthony C Schweitzer; John E Blume; Nancy J Cox; M Eileen Dolan
Journal:  Am J Hum Genet       Date:  2008-02-28       Impact factor: 11.025

4.  Coding SNPs included in exon arrays for the study of psychiatric disorders.

Authors:  A Sequeira; F Meng; B Rollins; R M Myers; E G Jones; S J Watson; H Akil; A F Schatzberg; J Barchas; W E Bunney; M P Vawter
Journal:  Mol Psychiatry       Date:  2008-04       Impact factor: 15.992

5.  Genetic architecture of transcript-level variation in humans.

Authors:  Shiwei Duan; R Stephanie Huang; Wei Zhang; Wasim K Bleibel; Cheryl A Roe; Tyson A Clark; Tina X Chen; Anthony C Schweitzer; John E Blume; Nancy J Cox; M Eileen Dolan
Journal:  Am J Hum Genet       Date:  2008-04-24       Impact factor: 11.025

6.  SNPs on chips: the hidden genetic code in expression arrays.

Authors:  Elzbieta Sliwerska; Fan Meng; Terence P Speed; Edward G Jones; William E Bunney; Huda Akil; Stanley J Watson; Margit Burmeister
Journal:  Biol Psychiatry       Date:  2006-05-11       Impact factor: 13.382

7.  On the challenges of the HapMap resource.

Authors:  Wei Zhang; M Eileen Dolan
Journal:  Bioinformation       Date:  2008-01-11

8.  Sequence polymorphisms cause many false cis eQTLs.

Authors:  Rudi Alberts; Peter Terpstra; Yang Li; Rainer Breitling; Jan-Peter Nap; Ritsert C Jansen
Journal:  PLoS One       Date:  2007-07-18       Impact factor: 3.240
