
PaGeFinder: quantitative identification of spatiotemporal pattern genes.

Jian-Bo Pan, Shi-Chang Hu, Hao Wang, Quan Zou, Zhi-Liang Ji.

Abstract

Pattern Gene Finder (PaGeFinder) is a web-based server for on-line detection of gene expression patterns from serial transcriptomic data generated by high-throughput technologies such as microarray or next-generation sequencing. Three parameters, the specificity measure, the dispersion measure and the contribution measure, were introduced and implemented in PaGeFinder to support quantitative and interactive identification of pattern genes such as housekeeping genes, specific (selective) genes and repressed genes. In addition to the on-line computation service, PaGeFinder provides downloadable Java programs for local detection of gene expression patterns. AVAILABILITY: http://bioinf.xmu.edu.cn:8080/PaGeFinder/index.jsp

Year:  2012        PMID: 22492640      PMCID: PMC3356841          DOI: 10.1093/bioinformatics/bts169

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Gene expression varies extensively across tissues, developmental stages, physiological conditions and individuals (Lage et al.). This spatiotemporal variation is believed to be linked to gene function and pathology. High-throughput technologies such as microarray and next-generation sequencing (NGS) have made it easier to monitor differential gene expression simultaneously on a large scale, and patterns can be detected in the resulting large volumes of data. Currently, three kinds of pattern genes have received general attention: housekeeping genes, specific (selective) genes and repressed genes. Housekeeping genes are generally defined as genes that are expressed ubiquitously in all conditions; they have long been adopted as molecular markers for qualitative or semi-quantitative measurement of gene expression levels (Warrington et al.). Specific (selective) genes are genes whose expression is enriched in one or several conditions, e.g. tissues or developmental stages (Liang et al.). Conversely, some genes are expressed in almost all conditions except one or several. These genes are known as repressed genes or ‘disallowed genes’ (Thorrez et al.) and can be regarded as exceptions to housekeeping genes. The spatiotemporal preference of pattern genes carries crucial information about what the genes do and how they work together to execute particular physiological functions. Traditionally, pattern genes were detected by molecular techniques such as RT-PCR and in situ hybridization. However, because of the limitations of these techniques, many pattern genes identified in this way were later found to be misclassified once more samples were included. This problem has been substantially reduced by the availability of large-scale datasets generated by microarray, SAGE or NGS.
Various methods have previously been applied to these high-throughput data to detect pattern genes, including expression cutoffs, relative fraction (Chang et al.; Liu et al.) and learning algorithms such as the Naive Bayes classifier (De Ferrari and Aitken, 2006) and SVM (Dong et al.). Some are simple but qualitative (e.g. cutoff); some are quantitative but insensitive (e.g. relative fraction); others are powerful but unstable and hard to implement (e.g. learning algorithms). In this study, we therefore introduce three novel parameters as quantitative indicators to describe and automatically identify pattern genes from serial transcriptomic data. An on-line server was also constructed to provide a dynamic analysis service.

2 METHODS

2.1 SPM and identification of specific gene

To quantitatively estimate the relative expression specificity of a gene in a sample, the specificity measure (SPM) was introduced as follows. Each gene expression profile was first transformed into a vector $X$:

$$X = (x_1, x_2, \ldots, x_{n-1}, x_n)$$

where $n$ is the number of samples in the profile. At the same time, a vector $X_i$ was created to represent the gene expression in sample $i$:

$$X_i = (0, 0, \ldots, x_i, \ldots, 0, 0)$$

The SPM of a gene in sample $i$ was then determined by calculating the cosine of the angle $\theta$ between vectors $X_i$ and $X$ in high-dimensional feature space:

$$\mathrm{SPM}_i = \cos\theta = \frac{X_i \cdot X}{|X_i|\,|X|}$$

where $|X_i|$ and $|X|$ are the lengths of vectors $X_i$ and $X$, respectively. The value of SPM ranges from 0 to 1.0. An SPM value close to 1.0 indicates that expression in the designated sample (vector $X_i$) makes the major contribution to the gene's expression across all samples (vector $X$). The higher the SPM value (e.g. SPM ≥ 0.9), the more specific the gene expression is to that sample.
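
As an illustration of the definition above, the following minimal Python sketch computes the SPM of a gene in every sample directly from the cosine formula. The function name `spm_profile` and the example values are hypothetical and are not part of PaGeFinder. Expression values are assumed to be non-negative, and a sample without expression is assigned an SPM of 0 by convention here.

```python
import math


def spm_profile(expression):
    """Return the SPM of one gene in each sample (hypothetical helper)."""
    n = len(expression)
    norm_x = math.sqrt(sum(x * x for x in expression))  # |X|
    spms = []
    for i, x_i in enumerate(expression):
        # X_i keeps only the expression in sample i; all other entries are 0.
        vec_i = [x_i if j == i else 0.0 for j in range(n)]
        norm_i = math.sqrt(sum(v * v for v in vec_i))  # |X_i|
        if norm_x == 0 or norm_i == 0:
            spms.append(0.0)  # assumption: no expression means SPM = 0
            continue
        dot = sum(a * b for a, b in zip(vec_i, expression))  # X_i . X
        spms.append(dot / (norm_i * norm_x))
    return spms


profile = [3.0, 4.0, 0.0]  # made-up expression values in three samples
print(spm_profile(profile))
```

Because the vector for sample i has a single non-zero component, the cosine reduces to x_i / |X| for non-negative data, so the squared SPMs of one profile sum to 1.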

2.2 DPM and identification of housekeeping gene and repressed gene

To evaluate the variability of a gene expression profile, the dispersion measure (DPM) was introduced as follows. The gene expression profile ($X$) was first converted to its corresponding SPM profile ($X_{\mathrm{SPM}}$):

$$X_{\mathrm{SPM}} = (\mathrm{SPM}_1, \mathrm{SPM}_2, \ldots, \mathrm{SPM}_n)$$

The DPM was then determined by

$$\mathrm{DPM} = \sqrt{\frac{\sum_{i=1}^{n} \left(\mathrm{SPM}_i - \overline{\mathrm{SPM}}\right)^2}{n-1}} \cdot \sqrt{n}$$

where $n$ is the number of samples and $\overline{\mathrm{SPM}}$ is the mean of the SPMs in a gene expression profile. Unlike conventional SD analysis, DPM is independent of expression level and sample number because it is scaled to the range 0–1.0 as above. DPM therefore makes variability comparable between profiles or datasets. A DPM value close to 0 suggests that the gene is expressed equally across samples. DPM can thus serve as a good indicator for the quantitative description and identification of ‘strict’ housekeeping genes, which have nearly constant expression in all samples (e.g. DPM ≤ 0.3). Repressed genes, as exceptions to housekeeping genes, are detected by verifying that the gene is expressed in all samples except one.
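
The behaviour of the DPM at its two extremes can be checked with a short Python sketch. It assumes that the DPM is the sample standard deviation of the SPM profile multiplied by the square root of n, a form that produces the 0–1.0 range described above. The helper names and the example profiles are hypothetical, and expression values are assumed to be non-negative.

```python
import math


def spm_profile(expression):
    """SPM of one gene in each sample: x_i / |X| for non-negative data."""
    norm = math.sqrt(sum(x * x for x in expression))
    if norm == 0:
        return [0.0] * len(expression)
    return [x / norm for x in expression]


def dpm(expression):
    """Dispersion measure of one profile (assumed form: SD of SPMs * sqrt(n))."""
    spms = spm_profile(expression)
    n = len(spms)
    if n < 2:
        raise ValueError("at least two samples are needed")
    mean = sum(spms) / n
    variance = sum((s - mean) ** 2 for s in spms) / (n - 1)
    return math.sqrt(variance) * math.sqrt(n)


# Made-up profiles: constant expression versus expression in a single sample.
print(dpm([5.0, 5.0, 5.0, 5.0]))  # constant profile, DPM at the lower bound
print(dpm([7.0, 0.0, 0.0, 0.0]))  # single-sample profile, DPM at the upper bound
```

A constant profile gives identical SPMs and hence a DPM of 0, while a profile expressed in a single sample reaches the upper bound of 1, whatever the expression level or the number of samples.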

2.3 CTM and identification of selective gene

The contribution measure (CTM) is a statistical parameter developed to measure the enrichment of gene expression in several samples. The CTM was calculated by

$$\mathrm{CTM} = \sqrt{\sum_{i=1}^{k} \mathrm{SPM}_i^2}$$

where $k$ is the number of selected samples. In this study, tissue-selective genes were defined as genes whose expression is enriched in a limited number of samples (e.g. 2 ≤ k ≤ 4), both in each selected sample individually (SPM ≥ 0.4) and in the selected samples together (e.g. CTM ≥ 0.9 and DPM ≥ 0.9).
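
The selection rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the CTM is taken as the square root of the summed squared SPMs of the selected samples, the DPM as the sample standard deviation of the SPMs multiplied by the square root of n, and the selected samples as those whose SPM reaches the per-sample cutoff. All function names, parameter names and example values are hypothetical.

```python
import math


def spm_profile(expression):
    """SPM of one gene in each sample: x_i / |X| for non-negative data."""
    norm = math.sqrt(sum(x * x for x in expression))
    if norm == 0:
        return [0.0] * len(expression)
    return [x / norm for x in expression]


def dpm(spms):
    """Assumed DPM form: sample SD of the SPM profile times sqrt(n)."""
    n = len(spms)
    mean = sum(spms) / n
    return math.sqrt(sum((s - mean) ** 2 for s in spms) / (n - 1)) * math.sqrt(n)


def ctm(spms, selected):
    """Assumed CTM form: joint contribution of the selected sample indices."""
    return math.sqrt(sum(spms[i] ** 2 for i in selected))


def selective_samples(expression, spm_cutoff=0.4, ctm_cutoff=0.9,
                      dpm_cutoff=0.9, min_k=2, max_k=4):
    """Return (sample indices, CTM) if the gene looks selective, else None."""
    spms = spm_profile(expression)
    selected = [i for i, s in enumerate(spms) if s >= spm_cutoff]
    if not (min_k <= len(selected) <= max_k):
        return None
    value = ctm(spms, selected)
    if value >= ctm_cutoff and dpm(spms) >= dpm_cutoff:
        return selected, value
    return None


# Made-up profile over ten samples, expressed in the first two only.
print(selective_samples([10.0, 10.0] + [0.0] * 8))
```

The cutoffs are exposed as parameters because the text presents them as examples rather than fixed values.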

3 ACCESS OF PaGeFinder

PaGeFinder can be freely accessed at http://bioinf.xmu.edu.cn:8080/PaGeFinder/index.jsp. To start an interactive analysis, the user uploads a local pre-processed gene expression dataset to the remote server. The dataset should be prepared in tab-delimited format as follows: the first row contains the column titles; the first column contains unique gene identifiers (normally probeset IDs or gene symbols), which are used to query or browse the analysis results; the remaining rows and columns hold the expression data of the samples. Currently, the server accepts only a tab-delimited plain text file or its compressed ‘.zip’ file, and the file cannot exceed 10 Mbits in size. After a successful upload, a data validation function checks for missing data or improper values. If the dataset passes validation, the server reports the numbers of valid rows (genes) and columns (samples); otherwise, it displays error messages. Once the file has been uploaded and validated, the user is asked for an optional expression cutoff value, which serves as an indicator of gene absence/presence for further data normalization. Clicking the ‘continue’ button leads to the query page, where analysis results can be downloaded from the top right corner of the page or browsed in three different ways: Gene Search, Pattern Gene Search and Pattern Search. When a designated gene is queried via the ‘Gene Search’ form, its normalized expression profile, SPM distribution and DPM evaluation are shown. The ‘Pattern Gene Search’ form lets the user retrieve specific genes, housekeeping genes, selective genes and repressed genes through four independent sub-forms. A query starts by setting cutoffs for detecting these four kinds of pattern genes. Default cutoffs are preset for convenience; however, the user can freely customize the results by modifying the cutoff values in the respective forms.
A submitted query returns a sorted list of genes (identifiers) that satisfy the query criteria. Clicking an identifier leads to a detailed information page, where the gene patterns are shown as charts and as quantitative measures. The ‘Pattern Search’ form provides two global analyses of gene expression patterns, similarity analysis and correlation analysis, which were previously implemented in the GEPS server (Wang et al.). For large datasets, PaGeFinder also provides downloadable Java programs for local analysis. Currently, three standalone Java programs are available, for SPM/DPM calculation, similarity calculation and correlation calculation.
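
Before uploading, a dataset can be checked locally against the tab-delimited layout described in this section. The Python sketch below is a hypothetical pre-check, not the server's own validation routine: it assumes that every expression value must be a finite number, that gene identifiers must be unique and non-empty, and that every row must have as many fields as the title row. The function name and the example data are made up for illustration.

```python
import csv
import io
import math


def validate_dataset(text):
    """Parse tab-delimited text; return (sample names, {gene id: values})."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if len(rows) < 2:
        raise ValueError("need a title row and at least one gene row")
    header = rows[0]
    samples = header[1:]
    if not samples:
        raise ValueError("the title row must name at least one sample")
    profiles = {}
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(f"row {line_no}: expected {len(header)} fields")
        gene_id = row[0].strip()
        if not gene_id or gene_id in profiles:
            raise ValueError(f"row {line_no}: missing or duplicate identifier")
        try:
            values = [float(field) for field in row[1:]]
        except ValueError:
            raise ValueError(f"row {line_no}: non-numeric value") from None
        if not all(math.isfinite(v) for v in values):
            raise ValueError(f"row {line_no}: non-finite value")
        profiles[gene_id] = values
    return samples, profiles


# Made-up two-gene, two-sample dataset.
example = "gene\tliver\tbrain\nG1\t1.5\t0\nG2\t3\t4\n"
samples, profiles = validate_dataset(example)
print(len(profiles), "genes x", len(samples), "samples")
```

Reporting the counts of genes and samples mirrors the summary that the server is described as returning after a successful validation.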

4 CONCLUSION

In summary, the three novel parameters introduced in PaGeFinder provide an easier, more sensitive and more robust way to quantitatively detect gene expression patterns than current methods such as cutoff and relative fraction. PaGeFinder is particularly useful for a dynamic and global understanding of gene functions under serial spatiotemporal conditions. Moreover, it can also be applied to mining other high-throughput data on proteins, metabolites or other molecular systems.

Funding: Fundamental Research Funds for the Central Universities (#2010121084) and Natural Science Foundation of China (NSFC#30873159).

Conflict of Interest: none declared.
REFERENCES

1.  Comparison of human adult and fetal expression and identification of 535 housekeeping/maintenance genes.

Authors:  J A Warrington; A Nair; M Mahadevappa; M Tsyganskaya
Journal:  Physiol Genomics       Date:  2000-04-27       Impact factor: 3.107

2.  Tissue-specific disallowance of housekeeping genes: the other face of cell differentiation.

Authors:  Lieven Thorrez; Ilaria Laudadio; Katrijn Van Deun; Roel Quintens; Nico Hendrickx; Mikaela Granvik; Katleen Lemaire; Anica Schraenen; Leentje Van Lommel; Stefan Lehnert; Cristina Aguayo-Mazzucato; Rui Cheng-Xue; Patrick Gilon; Iven Van Mechelen; Susan Bonner-Weir; Frédéric Lemaigre; Frans Schuit
Journal:  Genome Res       Date:  2010-11-18       Impact factor: 9.043

3.  Detecting and profiling tissue-selective genes.

Authors:  Shuang Liang; Yizheng Li; Xiaobing Be; Steve Howes; Wei Liu
Journal:  Physiol Genomics       Date:  2006-05-09       Impact factor: 3.107

4.  A large-scale analysis of tissue-specific pathology and gene expression of human disease genes and complexes.

Authors:  Kasper Lage; Niclas Tue Hansen; E Olof Karlberg; Aron C Eklund; Francisco S Roque; Patricia K Donahoe; Zoltan Szallasi; Thomas Skøt Jensen; Søren Brunak
Journal:  Proc Natl Acad Sci U S A       Date:  2008-12-22       Impact factor: 11.205

5.  Identification of human housekeeping genes and tissue-selective genes by microarray meta-analysis.

Authors:  Cheng-Wei Chang; Wei-Chung Cheng; Chaang-Ray Chen; Wun-Yi Shu; Min-Lung Tsai; Ching-Lung Huang; Ian C Hsu
Journal:  PLoS One       Date:  2011-07-27       Impact factor: 3.240

6.  Predicting housekeeping genes based on Fourier analysis.

Authors:  Bo Dong; Peng Zhang; Xiaowei Chen; Li Liu; Yunfei Wang; Shunmin He; Runsheng Chen
Journal:  PLoS One       Date:  2011-06-08       Impact factor: 3.240

7.  GEPS: the Gene Expression Pattern Scanner.

Authors:  Yu-Peng Wang; Liang Liang; Bu-Cong Han; Yu Quan; Xiao Wang; Tao Tao; Zhi-Liang Ji
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

8.  Mining housekeeping genes with a Naive Bayes classifier.

Authors:  Luna De Ferrari; Stuart Aitken
Journal:  BMC Genomics       Date:  2006-10-30       Impact factor: 3.969

9.  TiGER: a database for tissue-specific gene expression and regulation.

Authors:  Xiong Liu; Xueping Yu; Donald J Zack; Heng Zhu; Jiang Qian
Journal:  BMC Bioinformatics       Date:  2008-06-09       Impact factor: 3.169
