GeneRegionScan: a Bioconductor package for probe-level analysis of specific, small regions of the genome.

Lasse Folkersen, Diego Diez, Craig E Wheelock, Jesper Z Haeggström, Susumu Goto, Per Eriksson, Anders Gabrielsen.

Abstract

SUMMARY: Whole-genome microarrays allow us to interrogate the entire transcriptome of a cell. Affymetrix microarrays are constructed using several probes that match different regions of a gene, and a summarization step reduces this complexity to a single value representing the expression level of the gene, or of an exon in the case of exon arrays. However, this simplification eliminates information that might be useful when focusing on specific genes of interest. To address this limitation, we present a software package for the R platform that allows detailed analysis of expression at the probe level. The package matches the probe sequences against a target gene sequence (either mRNA or DNA) and shows the expression level of each probe along the gene. It also features functions that fit a linear regression based on several genetic models, enabling study of the relationship between gene expression and genotype.
AVAILABILITY AND IMPLEMENTATION: The software is implemented as a platform-independent R package available through the Bioconductor repository at http://www.bioconductor.org/. It is licensed under GPL 2.0.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year:  2009        PMID: 19398447      PMCID: PMC2712334          DOI: 10.1093/bioinformatics/btp279

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Expression arrays enable us to interrogate the transcription level of all known genes in a single hybridization experiment. A feature of the Affymetrix GeneChip platform is that several probes represent one gene, allowing a detailed investigation of the expression pattern along the gene sequence. However, this complexity can be detrimental when we want to analyze thousands of genes simultaneously. For that reason, traditional tools for the analysis of expression microarrays take a general approach, providing summarized values of all the probes in a probe set [e.g. RMA (Bolstad et al., 2003), MAS5 and PLIER]. Although this approach is necessary in many cases, it excludes information that can be critical for the correct interpretation of some experimental results. For example, a non-trivial gene structure, perhaps with newly uncovered transcript variants, may have rendered the established probe set structure obsolete. Individual probe sequences might also be rendered useless by updates to the reference sequence. In other cases, polymorphisms in the gene sequence alter the apparent expression level of some probes. Finally, sometimes two or more probe sets map to the same gene, giving contradictory information that can be difficult to resolve using a summarizing approach. Without further investigation, the biological relevance of such results is uncertain. Examples of these situations are provided in the Supplementary Material. To resolve these situations, the best approach is to use all available data and to visualize the expression level and genomic location of individual probes. This is true for both traditional 3′ IVT arrays and exon arrays. The information needed to perform this analysis is available (in the CEL files), but at present there is no easy way to visualize and analyze it.
Herein we present GeneRegionScan, a software package for the statistical platform R, which provides the means to extract and visualize information about individual probes in an automated fashion. The primary goal of the presented package is to facilitate the analysis of alternative splicing in the broadest sense. Since alternative splicing can be mediated by local SNPs (Kwan et al., 2008), a specific goal of the package is to assist in the analysis of the relationship between expression levels and SNP genotype. To exemplify its utility, we investigated the effect of genotype on the expression of a set of leukotriene pathway genes, which are of specific interest in cardiovascular disease.

2 DESCRIPTION

To demonstrate the package functionality, we obtained previously published data for 171 lymphoblastoid cell line samples from 57 individuals from the HapMap CEU population (HapMap, 2003). Expression studies based on these data have been published (Kwan et al., 2008) using the Affymetrix Human Exon ST 1.0 arrays [available through GEO (Edgar et al., 2002) accession number GSE9372]. In that work, Kwan and co-workers explored the association between genotypic differences and expression values for the entire genome. Leukotriene pathway genes are of interest because they are involved in the inflammatory response, which is a central part of the pathophysiology of cardiovascular disease. The relation between expression levels and a set of SNPs in the leukotriene pathway genes has recently been shown to be directly associated with ischemic stroke (Bevan et al., 2008). In this example, we combined the knowledge of genotype effects seen in the leukotriene cascade genes with the GSE9372 expression data and the HapMap genotype data for these genes, to perform an analysis with GeneRegionScan. The ALOX5AP gene encodes the 5-lipoxygenase-activating protein which, together with 5-lipoxygenase, is required for leukotriene synthesis and is therefore a vital component of the inflammatory response. We investigated all the ALOX5AP SNPs used in Bevan et al. and applied the same 0-1-2, 0-0-1 and 0-1-1 genetic models. 0-1-2 is a codominant model (three genotype groups per SNP separately) in which the heterozygote is valued as 1, and the homozygotes as 0 and 2, respectively. 0-0-1 and 0-1-1 are recessive and dominant models, which essentially compare the heterozygote samples grouped with one homozygote type against the samples of the other homozygote type. A linear regression was fitted to the defined models to test the relation between specific genotype models and expression values. Further description of this algorithm can be found in the Supplementary Materials, as well as in the software documentation.
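The three genetic models and the regression step can be illustrated with a minimal sketch. This is plain Python, not the package's R implementation; the function names, the choice of which allele is counted, and the intensity values are assumptions made up for illustration, and the significance test on the fitted slope is omitted.

```python
from statistics import mean

# Numeric coding of the genotype under each model named in the text.
# Keys are the number of copies (0, 1 or 2) of one chosen allele; which
# allele is counted is an assumption of this sketch.
MODELS = {
    "0-1-2": {0: 0, 1: 1, 2: 2},  # codominant: three separate groups
    "0-0-1": {0: 0, 1: 0, 2: 1},  # recessive: heterozygote grouped with 0
    "0-1-1": {0: 0, 1: 1, 2: 1},  # dominant: heterozygote grouped with 2
}


def encode(allele_counts, model):
    """Translate per-sample allele counts into the predictor for a model."""
    return [MODELS[model][count] for count in allele_counts]


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all samples fall into a single genotype group")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical data for one probe: six samples, two per genotype.
allele_counts = [0, 0, 1, 1, 2, 2]
intensity = [8.0, 8.2, 8.1, 7.9, 6.9, 7.1]

slope, intercept = fit_line(encode(allele_counts, "0-0-1"), intensity)
print(f"recessive model: slope={slope:.2f}, intercept={intercept:.2f}")
```

With a 0/1 predictor, as in the recessive and dominant models, the fitted slope is simply the difference between the two group means, which is why these models amount to a two-group comparison.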
The most interesting result was SNP rs3885907, also referred to as FL10 by Bevan et al., which showed a highly significant expression change when comparing the risk allele AA samples with the heterozygote and the non-risk allele CC samples (Fig. 1). In addition, this SNP was also found to confer a 1.473-fold increased risk of ischemic stroke (Bevan et al., 2008). Figure 1 shows that: (i) possessing two copies of the risk allele A results in a decreased intensity for all probes across the entire ALOX5AP gene, and (ii) this effect is not mediated through alternative splicing, since all probes matching the mRNA show the same trend. These results suggest that the SNP rs3885907 or a linked SNP confers a mechanism for controlling the expression level of the ALOX5AP transcript, which could have implications in the development of cardiovascular disease.
Fig. 1.

Effect of the rs3885907 SNP on the expression level of ALOX5AP. Visualization of the expression levels of all probes with sequences mapping to the current ALOX5AP sequence. The data have been stratified by risk allele (AA) or non-risk allele (CC and heterozygote) of rs3885907. The y-axis indicates the intensity values of each probe. The x-axis shows the length of the ALOX5AP gene, measured in base pairs. Vertical bars show the span between the 25% and 75% quartiles of samples that are either AA (blue) or CC/heterozygote (red). Gray and black circles highlight probes that have a significant relation to the genotype of the SNP. The significance of this relation has been calculated using a linear additive model as implemented in R. The exon structure of ALOX5AP is shown along the x-axis. It has been created using the exonStructure function based on sequence data from the UCSC genome browser. Expression data were taken from 171 samples in the Gene Expression Omnibus (GEO) dataset GSE9372. Genotype data were downloaded from the HapMap project.
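The vertical bars described in the caption summarize each probe by the quartile span of its intensities within a genotype group. A minimal Python sketch of that summary follows; the function names, probe identifier and intensity values are hypothetical, and the quartile interpolation method is an assumption rather than the package's documented behavior.

```python
from statistics import quantiles


def quartile_span(values):
    """Return the (25%, 75%) quartiles of a list of intensities."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3


def spans_by_group(probe_intensities, groups):
    """Map each probe to {group label: (q1, q3)}.

    probe_intensities: {probe_id: [one intensity per sample]}
    groups: genotype group label of each sample, in the same order
    """
    result = {}
    for probe_id, values in probe_intensities.items():
        per_group = {}
        for label in sorted(set(groups)):
            subset = [v for v, g in zip(values, groups) if g == label]
            per_group[label] = quartile_span(subset)
        result[probe_id] = per_group
    return result


# Hypothetical data: one probe, five samples in each genotype group.
groups = ["AA"] * 5 + ["CC/het"] * 5
probes = {"probe_1": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]}

spans = spans_by_group(probes, groups)
print(spans["probe_1"])
```

Each resulting pair gives the lower and upper end of one vertical bar, to be drawn at the position where the probe matches the gene sequence.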

3 CONCLUSION

Herein, we present a software package that enables fine-grained probe-level analysis on a gene-by-gene level. Individual probes are matched against the gene sequence, and the probe intensities are plotted. The per-probe approach gives a more versatile tool to investigate different transcript variants and discover all information available about differences across the length of each transcript. Package utility was demonstrated by analyzing data from lymphoblastoid cells, which evidenced an association between a SNP that is associated with a higher risk of myocardial infarction and the expression level of ALOX5AP.
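The probe-matching step can be sketched as an exact string search of each probe sequence against the target sequence. The Python below is an illustration only: the function names and the toy sequences are hypothetical, and both the restriction to exact matches and the fallback to the reverse complement are assumptions of this sketch, not a description of how GeneRegionScan performs the matching.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(target, probe):
    """Return 1-based start positions of every exact match of probe."""
    positions = []
    start = target.find(probe)
    while start != -1:
        positions.append(start + 1)
        start = target.find(probe, start + 1)
    return positions


def map_probes(target, probes):
    """Map probe IDs to match positions; unmatched probes are dropped.

    The probe is searched as given first, then as its reverse complement
    (an assumption of this sketch).
    """
    target = target.upper()
    hits = {}
    for probe_id, seq in probes.items():
        seq = seq.upper()
        found = find_all(target, seq) or find_all(target, reverse_complement(seq))
        if found:
            hits[probe_id] = found
    return hits


# Toy sequences for illustration; real probes are longer oligonucleotides.
target = "ATGGCCATTGTAATGGGCCGC"
probes = {"p1": "GCCATT", "p2": "CCCATT", "p3": "TTTTTT"}

hits = map_probes(target, probes)
print(hits)
```

The resulting positions give the x-axis coordinate at which each probe's intensities would be plotted along the gene.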
REFERENCES

1.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01

2.  The International HapMap Project.

Authors:  The International HapMap Consortium
Journal:  Nature       Date:  2003-12-18

3.  A comparison of normalization methods for high density oligonucleotide array data based on variance and bias.

Authors:  B M Bolstad; R A Irizarry; M Astrand; T P Speed
Journal:  Bioinformatics       Date:  2003-01-22

4.  Genome-wide analysis of transcript isoform variation in humans.

Authors:  Tony Kwan; David Benovoy; Christel Dias; Scott Gurd; Cathy Provencher; Patrick Beaulieu; Thomas J Hudson; Rob Sladek; Jacek Majewski
Journal:  Nat Genet       Date:  2008-01-13

5.  Genetic variation in members of the leukotriene biosynthesis pathway confer an increased risk of ischemic stroke: a replication study in two independent populations.

Authors:  Steve Bevan; Martin Dichgans; H Erich Wiechmann; Andreas Gschwendtner; Thomas Meitinger; Hugh S Markus
Journal:  Stroke       Date:  2008-03-06
