
GEPS: the Gene Expression Pattern Scanner.

Yu-Peng Wang, Liang Liang, Bu-Cong Han, Yu Quan, Xiao Wang, Tao Tao, Zhi-Liang Ji.

Abstract

Gene Expression Pattern Scanner (GEPS) is a web-based server that provides interactive pattern analysis of user-submitted microarray data to facilitate their further interpretation. Putative gene expression patterns such as correlated expression, similar expression and specific expression are determined globally and systematically using geometric comparison and correlation analysis methods. These patterns can be visualized as linear plots with quantitative measures. User-defined threshold values can be set to customize the pattern search results. For better understanding of gene expression, patterns derived from 329,205 non-redundant gene expression records from the GNF SymAtlas and the Gene Expression Omnibus are also provided. These profiles cover 24,277 human genes in 79 tissues, 32,905 mouse genes in 61 tissues and 4201 rat genes in 44 tissues. GEPS is available at http://bioinf.xmu.edu.cn/software/geps/geps.php.

Year:  2006        PMID: 16845057      PMCID: PMC1538815          DOI: 10.1093/nar/gkl067

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Microarray technologies are widely used to identify gene expression patterns associated with physiological or pathological states on a genome scale (1,2). With their rapidly increasing use in the study of gene function, transcriptional regulation, disease etiology and drug development (1–4), a significant challenge has emerged: how to manage the overwhelming amount of transcription data generated by individual microarray experiments. Since inferring the function of genes from direct observation or simple statistical analysis of their expression profiles is both unreliable and arduous, bioinformatics tools have been developed to facilitate data analysis and interpretation (5–9). In many cases, gene annotation is assigned automatically by clustering-based programs such as GEPAS (7), DNMAD (5), MIDAW (9) and GEMS (6). Such assignments of gene function are made by discovering coherent expression patterns. Apart from clustering-based methods, some integrative systems employ various analysis tools, such as principal component analysis, supervised classification including feature selection and cross-validation, and multi-factorial ANOVA, to provide a wide range of data analyses (7,8,10,11). High-level interpretation of data by mapping expression profiles onto currently available regulatory, metabolic and cellular pathways has also been reported (4). The interpretation of microarray data depends on successful selection of consensus gene expression patterns such as correlated expression, differential expression and specific expression. These patterns are normally determined by mining gene expression profiles using the different algorithms described above. Gene Expression Pattern Scanner (GEPS) is one such platform, built primarily on systematic and global analysis of gene expression patterns.
One advantage of GEPS is that putative gene expression patterns are identified by comparing the global behavior of gene expression profiles, so the derived patterns may more properly reflect the true behavior of gene expression. Another advantage is that the relationships of a gene with others can optionally be listed in descending order of the respective measures, which enables systematic study of gene expression at both quantitative and qualitative levels. Moreover, besides the user-submitted data, a number of public gene expression datasets are provided to facilitate better understanding of gene expression behavior.

METHODS

The data of GEPS

GEPS allows users to submit their own normalized gene expression datasets to the system through an underlying dynamic CGI program. The data can be uploaded to the remote server as a tab-delimited plain text file (‘.txt’ or Gene Expression Omnibus, GEO, ‘.soft’ format), or as a compressed ‘.gz’ file when internet traffic is a problem. The format of the dataset is similar to the format commonly used for gene expression datasets. The first column, entitled ‘ID_REF’, contains the unique ID of each gene or probeset, which is also used for browsing the analysis results. The second column, entitled ‘IDENTIFIER’, is the description (e.g. gene name) of each gene or probeset. The following columns are the expression data. The first row contains the names of the columns, while the other rows hold the expression data, one row per probeset. Null values and spaces are not allowed in any value of the data; they should be replaced by ‘0’ and underscore ‘_’, respectively. Within a row, consecutive columns with the same name are merged and represented by the average of their values during data analysis. In addition to user-submitted data, GEPS also provides pre-scanned patterns of public datasets for better understanding of gene expression. The public datasets come from two important gene expression repositories: the GNF Atlas (12) and the GEO (13). Currently, 19 distinct datasets comprising 329,205 non-redundant gene expression profiles from GNF SymAtlas and GEO are deposited in GEPS, covering about 24,277 human genes in over 79 tissues, about 32,905 mouse genes in over 61 tissues and about 4201 rat genes in 44 tissues.
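
The input layout described above can be illustrated with a minimal Python sketch. It is not the GEPS server code: the function name `parse_expression_table`, the probe IDs, gene names and values are all hypothetical, and the first two columns are read by position rather than by header name. As a simplification, it averages all columns that share a name, whether or not they are adjacent.

```python
import csv
import io


def parse_expression_table(text):
    """Parse a tab-delimited expression table.

    Returns {probe_id: (description, {sample_name: value})}.
    Columns sharing a sample name are merged by averaging (a simplification
    of the rule in the text, which speaks of consecutive columns).
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    sample_names = header[2:]  # everything after the ID and description columns
    profiles = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        probe_id, description = row[0], row[1]
        grouped = {}
        for name, raw in zip(sample_names, row[2:]):
            grouped.setdefault(name, []).append(float(raw))
        merged = {name: sum(vals) / len(vals) for name, vals in grouped.items()}
        profiles[probe_id] = (description, merged)
    return profiles


# Hypothetical two-probe dataset with a duplicated "liver" column.
example = (
    "ID_REF\tIDENTIFIER\tliver\tliver\tbrain\n"
    "probe_1\tGENE_A\t10\t14\t3\n"
    "probe_2\tGENE_B\t0\t2\t8\n"
)
profiles = parse_expression_table(example)
print(profiles["probe_1"])
```

Here the two `liver` columns of `probe_1` collapse into a single averaged value before any pattern measure is computed.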

The scanning of gene expression patterns

To initiate pattern scanning, each gene expression profile is transformed into a vector X:

$$X = (x_1, x_2, \ldots, x_n)$$

where $x_i$ is the gene expression level in tissue, time point or other condition $i$, and $n$ is the number of tissues or time slots. Pattern scanning uses three measures: the similarity measure (SM), correlation analysis and the specificity measure (SPM). The SM evaluates the geometric similarity between two gene expression profiles in high-dimensional vector space and is given by

$$\mathrm{SM} = \cos\theta = \frac{X \cdot Y}{|X|\,|Y|} = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$

where θ is the angle between the two vectors X and Y, and |X| and |Y| are the lengths of X and Y, respectively. SM ranges from 0 to 1. The correlation of two profiles X and Y is indicated by the coefficient r, given by

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}$$

where $\bar{x}$ and $\bar{y}$ are the means of the gene expression levels in X and Y. r ranges from −1 to 1. SPM is calculated to assess the specificity or abundance of gene expression in a tissue and is given by

$$\mathrm{SPM} = \cos\alpha = \frac{x_i}{|X|}$$

where α is the angle between the vector X and the axis of sample $i$ (either tissue or time) in high-dimensional sample space, $x_i$ is the expression level in sample $i$, and |X| is the length of vector X.
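
As a minimal sketch of the three measures, the Python below reads SM as the cosine of the angle between two profiles, r as the Pearson correlation coefficient, and SPM as the cosine of the angle between a profile and one sample axis (that is, x_i divided by |X|). These readings are inferred from the definitions in the text, and the function names and example values are hypothetical; the actual GEPS implementation may differ.

```python
import math


def norm(x):
    """Euclidean length |X| of a profile."""
    return math.sqrt(sum(v * v for v in x))


def similarity_measure(x, y):
    """SM: cosine of the angle between profiles x and y."""
    return sum(a * b for a, b in zip(x, y)) / (norm(x) * norm(y))


def correlation(x, y):
    """r: Pearson correlation, i.e. the cosine of the mean-centred profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    return sum(a * b for a, b in zip(dx, dy)) / (norm(dx) * norm(dy))


def specificity_measure(x, i):
    """SPM: cosine of the angle between profile x and the axis of sample i."""
    return x[i] / norm(x)


# Hypothetical profiles over three samples; y is x scaled by two.
x = [3.0, 4.0, 0.0]
y = [6.0, 8.0, 0.0]
print(similarity_measure(x, y))
print(correlation(x, y))
print(specificity_measure(x, 1))
```

An all-zero profile makes SM and SPM undefined, and a constant profile makes r undefined, because the denominator is zero in each case. A real pipeline would need to filter or flag such rows before scoring.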

The interpretation of GEPS

For better interpretation of the biological knowledge hidden in the vast volume of data, a gene expression profile can be treated as a distribution curve (a vector during calculation) with respect to tissues, time or other conditions. Comparing two distributions helps to identify gene expression patterns globally. Geometric comparison (SM) indicates how similar two distribution curves are. A value of SM close to 1 means high similarity of the two distributions. This suggests that the two genes may have similar expression patterns regardless of their expression levels, and it may be further interpreted that the genes likely play similar roles in biological processes. However, similar expression patterns do not mean that the two genes are related. Correlation analysis is therefore used to tell whether the expression of two genes is correlated. A value of the correlation coefficient r close to 1 or −1 indicates high statistical correlation of the two distributions, corresponding biologically to co-expression (close to 1) or inverse-expression (close to −1). Such correlation further suggests that the two genes may interact with each other or encode functionally associated proteins (14,15). Tissue-specific expression is very helpful for understanding the physiological behavior of a gene. In many cases, the uncertainty about tissue-specific genes is due to the lack of a quantitative measure. In this study, SPM is used to indicate how specifically (a value close to 1) a gene is expressed in one tissue compared with the others. This measure can also be used to differentiate the expression of genes under varied conditions.
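
The distinction between similarity and correlation can be seen in a small numerical sketch. The two profiles below are invented for illustration, and SM and r are computed as the cosine similarity and the Pearson coefficient, which is an assumed reading of the measures. Both hypothetical genes are expressed at nearly the same level in every sample, so SM is close to 1, yet their small fluctuations move in opposite directions, so r is −1.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two vectors (assumed form of SM)."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def pearson(x, y):
    """Pearson correlation: cosine of the mean-centred vectors (assumed form of r)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])


# Hypothetical expression levels of two genes over four samples.
gene_a = [10.0, 11.0, 10.0, 11.0]
gene_b = [11.0, 10.0, 11.0, 10.0]

sm = cosine(gene_a, gene_b)
r = pearson(gene_a, gene_b)
print(round(sm, 4), round(r, 4))  # SM is about 0.995 while r is -1
```

Taken alone, the SM value would suggest two closely matching profiles. Only the correlation reveals the inverse relationship, which is why the two measures are reported side by side.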

Access of GEPS

GEPS can be freely accessed at http://bioinf.xmu.edu.cn/software/geps/geps.php. To initiate the interactive data analysis, the user is required either to provide a previously assigned 6-digit file ID or to upload a new dataset to the GEPS server (Figure 1). For newly submitted data, the user is also asked to select a data type, either count value or log ratio, to continue the analysis. An interactive search interface is generated once the data are successfully uploaded, and a unique 6-digit ID is assigned to the user for future access (Figure 2). GEPS provides three main ways to query data: Search patterns for genes, Compare genes and Search specific-expression genes in samples. Through the ‘Search patterns’ form, the user can search the expression patterns of a designated gene (represented by the probeset_ID in column ‘ID_REF’) or of several genes at a time. Flexible threshold values for the different measures allow the query to be personalized. Probesets satisfying the query criteria are listed separately in three sections: co-expression, inverse-expression and similar expression (Figure 3). Through the ‘Compare genes’ form, the user can compare the expression patterns of multiple genes simultaneously. The comparison results are shown in a matrix and differentiated by color (Figure 4). Through the ‘Search specific-expression genes’ form, the user can browse genes that are specifically expressed in designated samples (e.g. tissues or conditions). Probesets satisfying the query criteria are listed in descending order of SPM. In all cases, clicking on a probeset_ID leads the user to the detailed information page, where the analysis results are summarized and visualized in charts (Figure 5). Comments on the results are also made according to the following rules: a value of SM >0.80 or >0.95 is interpreted as medium similar expression or highly similar expression, respectively, in this study.
A value of the correlation coefficient r greater than 0.75 or 0.90 is considered medium or high co-expression, respectively, and a value less than −0.60 or −0.80 is considered medium or high inverse-expression, respectively. A value of SPM >0.90 or >0.99 is taken as highly abundant expression or specific expression, respectively.
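
The commenting rules can be sketched as simple threshold functions. The cut-offs are the ones quoted in the text. The function names, the use of strict inequalities, the reading of each lower cut-off as the start of the "medium" band, and the `None` result for values below every cut-off are assumptions made for illustration.

```python
def label_similarity(sm):
    """Comment on a similarity measure (SM) value."""
    if sm > 0.95:
        return "highly similar expression"
    if sm > 0.80:
        return "medium similar expression"
    return None  # below both cut-offs: no comment


def label_correlation(r):
    """Comment on a correlation coefficient r."""
    if r > 0.90:
        return "highly co-expression"
    if r > 0.75:
        return "medium co-expression"
    if r < -0.80:
        return "highly inverse-expression"
    if r < -0.60:
        return "medium inverse-expression"
    return None


def label_specificity(spm):
    """Comment on a specificity measure (SPM) value."""
    if spm > 0.99:
        return "specific expression"
    if spm > 0.90:
        return "highly abundant expression"
    return None


for value in (0.97, 0.85, 0.50):
    print(value, label_similarity(value))
```

Note that the cut-offs for inverse-expression (−0.60 and −0.80) are less strict than those for co-expression (0.75 and 0.90), so the two directions are not treated symmetrically.
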

Figure 1. The homepage of GEPS.

Figure 2. The interactive search interface.

Figure 3. The result page of pattern search by genes.

Figure 4. The result page of gene comparisons.

Figure 5. The detailed information page.

CONCLUDING REMARKS

GEPS is a user-friendly platform for statistical analysis of gene expression patterns. The service is real-time and interactive, allowing users to submit data to the remote server and manage the analysis results locally. The introduction of a series of measures enables users to assess the analysis results quantitatively, and a preliminary interpretation of the data is given on that basis. The results are also visualized in compact curve charts for easier understanding and interpretation. Efforts are continuing to improve the service in such aspects as the identification of local patterns, systematic analysis of relationships between genes and better biological interpretation of the data.
  15 in total

1.  Endothelial cell diversity revealed by global expression profiling.

Authors:  Jen-Tsan Chi; Howard Y Chang; Guttorm Haraldsen; Frode L Jahnsen; Olga G Troyanskaya; Dustin S Chang; Zhen Wang; Stanley G Rockson; Matt van de Rijn; David Botstein; Patrick O Brown
Journal:  Proc Natl Acad Sci U S A       Date:  2003-09-08       Impact factor: 11.205

2.  DNMAD: web-based diagnosis and normalization for microarray data.

Authors:  Juan M Vaquerizas; Joaquín Dopazo; Ramón Díaz-Uriarte
Journal:  Bioinformatics       Date:  2004-07-09       Impact factor: 6.937

Review 3.  Mapping of genetic and epigenetic regulatory networks using microarrays.

Authors:  Bas van Steensel
Journal:  Nat Genet       Date:  2005-06       Impact factor: 38.330

4.  Large-scale analysis of the human and mouse transcriptomes.

Authors:  Andrew I Su; Michael P Cooke; Keith A Ching; Yaron Hakak; John R Walker; Tim Wiltshire; Anthony P Orth; Raquel G Vega; Lisa M Sapinoso; Aziz Moqrich; Ardem Patapoutian; Garret M Hampton; Peter G Schultz; John B Hogenesch
Journal:  Proc Natl Acad Sci U S A       Date:  2002-03-19       Impact factor: 11.205

5.  A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae.

Authors:  A Grigoriev
Journal:  Nucleic Acids Res       Date:  2001-09-01       Impact factor: 16.971

Review 6.  Molecular portraits and the family tree of cancer.

Authors:  Christine H Chung; Philip S Bernard; Charles M Perou
Journal:  Nat Genet       Date:  2002-12       Impact factor: 38.330

7.  EXPANDER--an integrative program suite for microarray data analysis.

Authors:  Ron Shamir; Adi Maron-Katz; Amos Tanay; Chaim Linhart; Israel Steinfeld; Roded Sharan; Yosef Shiloh; Ran Elkon
Journal:  BMC Bioinformatics       Date:  2005-09-21       Impact factor: 3.169

8.  GEMS: a web server for biclustering analysis of expression data.

Authors:  Chang-Jiun Wu; Simon Kasif
Journal:  Nucleic Acids Res       Date:  2005-07-01       Impact factor: 16.971

9.  GECKO: a complete large-scale gene expression analysis platform.

Authors:  Joachim Theilhaber; Anatoly Ulyanov; Anish Malanthara; Jack Cole; Dapeng Xu; Robert Nahf; Michael Heuer; Christoph Brockel; Steven Bushnell
Journal:  BMC Bioinformatics       Date:  2004-12-10       Impact factor: 3.169

10.  NCBI GEO: mining millions of expression profiles--database and tools.

Authors:  Tanya Barrett; Tugba O Suzek; Dennis B Troup; Stephen E Wilhite; Wing-Chi Ngau; Pierre Ledoux; Dmitry Rudnev; Alex E Lash; Wataru Fujibuchi; Ron Edgar
Journal:  Nucleic Acids Res       Date:  2005-01-01       Impact factor: 16.971
