Extract-SAGE: an integrated platform for cross-analysis and GA-based selection of SAGE data.

Cheng-Hong Yang, Tsung-Mu Shih, Yu-Chen Hung, Hsueh-Wei Chang, Li-Yeh Chuang.

Abstract

Serial analysis of gene expression (SAGE) is a powerful technique for quantifying gene expression. The huge amount of tag data in the SAGE libraries of samples is difficult to analyze with current SAGE analysis tools. Data are often not presented in a biologically meaningful way for cross-analysis and comparison, which limits their application. Hence, an integrated software platform that can perform such a complex task is required. Here, we implement set theory for cross-analyzing gene expression data among different SAGE libraries of tissue sources; up- or down-regulated tissue-specific tags can be identified computationally. Extract-SAGE employs a genetic algorithm (GA) to reduce the number of genes among the SAGE libraries. Its representative tag mining will facilitate the discovery of candidate genes with discriminating gene expression. AVAILABILITY: This software and user manual are freely available at ftp://sage@bio.kuas.edu.tw/Extract-SAGE.zip.

Keywords:  SAGE; genetic algorithm; set theory; software

Year:  2009        PMID: 19293993      PMCID: PMC2655045          DOI: 10.6026/97320630003291

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

Serial analysis of gene expression (SAGE) is a technique that allows global profiling of gene expression in a genome without a priori knowledge [1]. The SAGE technique enables biologists to identify a series of short sequences (SAGE tags), together with the count of each tag, as the gene expression profile of a cell or tissue type. The tags are collected in a SAGE library, and the count of each tag represents the expression level of its corresponding gene. Recently, many public gene expression profile platforms have been developed for SAGE analysis. However, most of these platforms are restricted to pairwise comparison of two groups, and the displayed results are often lengthy and poorly ranked [2,3]. It is therefore necessary to extract, filter and arrange the useful information in a way suited to gene expression profiling, especially for multiple SAGE libraries covering many biological samples. In this study, we construct a cross-analysis method with visualized output for SAGE data analysis, along with retrieval of the corresponding information between SAGE tags and genes. A genetic algorithm (GA) is introduced to facilitate the analysis and improve the accuracy of the SAGE data available to biologists, thus avoiding manual browsing and comparison of the original SAGE data.

Methodology

Implementation

Extract‐SAGE is programmed in Java [4] and is compatible with many computer platforms. We analyzed 327 Homo sapiens SAGE samples of various types from NCBI SAGEmap [5], e.g. brain, kidney, breast, ovary, and colon, amongst others. The tag to gene data were taken from SAGEmap [5] for the restriction enzymes NlaIII and Sau3A. A filtering process based on set theory [6] was implemented to extract significant tags from the gene expression data and discard insignificant ones. A GA was used to implement the feature selection process, and the K‐nearest neighbor (KNN) method was used to evaluate the classification accuracy [7].
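The set-theory filtering step can be illustrated with a minimal Python sketch. It is not the Extract‐SAGE implementation (which is written in Java); the toy libraries, the `min_count` cutoff and the function names are assumptions made for illustration only. Each library is reduced to the set of tags it expresses, and group-specific tags are obtained by intersection and difference.

```python
# Hypothetical toy libraries: tag sequence -> raw count (not real SAGEmap data).
libraries = {
    "brain_1": {"AAAAAAAAAA": 12, "CCCCCCCCCC": 7, "GGGGGGGGGG": 1},
    "brain_2": {"AAAAAAAAAA": 9, "CCCCCCCCCC": 5, "TTTTTTTTTT": 8},
    "kidney_1": {"CCCCCCCCCC": 6, "TTTTTTTTTT": 4, "GGGGGGGGGG": 3},
}


def expressed_tags(library, min_count=2):
    """Return the set of tags whose count reaches min_count (an assumed cutoff)."""
    return {tag for tag, count in library.items() if count >= min_count}


def group_specific_tags(libraries, group, others, min_count=2):
    """Tags expressed in every library of `group` and in no library of `others`."""
    in_all = set.intersection(
        *(expressed_tags(libraries[name], min_count) for name in group)
    )
    in_any_other = set().union(
        *(expressed_tags(libraries[name], min_count) for name in others)
    )
    return in_all - in_any_other


# Tags found in both brain libraries but not in the kidney library.
brain_specific = group_specific_tags(libraries, ["brain_1", "brain_2"], ["kidney_1"])

# Tags expressed in every library (candidate "common" tags).
common = set.intersection(*(expressed_tags(lib) for lib in libraries.values()))
```

With these toy counts, `brain_specific` contains only `AAAAAAAAAA` and `common` contains only `CCCCCCCCCC`.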

Software description

Figure 1 shows the three functions provided by Extract‐SAGE: 1) cross‐analysis, 2) tag to gene, and 3) reducing‐analysis (using GA). The “cross‐analysis” function extracts significant genes according to user-defined operation conditions and difference factors between the samples or sample groups of interest. Results are output in two forms, tabular and graphic. Both contain the tag expression (tags per million, tpm) of each group, and both can be sorted by the expression in a selected group or by the expression difference between two selected groups. The graphic output shows the tag counts of the various samples in gradient colors, which is convenient for selecting gene candidates of interest. Tags with high or low expression (tpm) are easy to identify, and a set of key tags of curative or pathogenic genes is also provided. Users can submit a tag sequence with the “tag to gene” function to retrieve the corresponding information between tags and genes.
Figure 1. Screenshot of Extract‐SAGE. (A) The main window. Demonstration of (B) cross-analysis result, (C) tag to gene results, and (D) extract result using GA.
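The tpm-based ranking described for the cross-analysis output can be sketched as follows. This is an illustrative Python example, not the tool's own code: the function names, the use of a group mean, and the toy counts are assumptions. Raw counts are scaled to tags per million within each library, and tags are then sorted by the expression difference between two groups.

```python
def to_tpm(library):
    """Scale raw tag counts to tags per million (tpm) within one library."""
    total = sum(library.values())
    return {tag: count * 1_000_000 / total for tag, count in library.items()}


def group_mean_tpm(tpm_libraries, tag):
    """Mean tpm of `tag` over a group; a tag absent from a library counts as 0."""
    return sum(lib.get(tag, 0.0) for lib in tpm_libraries) / len(tpm_libraries)


def rank_by_difference(group_a, group_b):
    """Return (tag, mean tpm A, mean tpm B) rows, largest A minus B first."""
    a = [to_tpm(lib) for lib in group_a]
    b = [to_tpm(lib) for lib in group_b]
    tags = set().union(*a, *b)  # every tag seen in either group
    rows = [(tag, group_mean_tpm(a, tag), group_mean_tpm(b, tag)) for tag in tags]
    # Sort by descending difference; ties are broken by tag sequence.
    return sorted(rows, key=lambda row: (-(row[1] - row[2]), row[0]))


# Hypothetical toy libraries (tag -> raw count), one library per group.
tumor = [{"AAAAAAAAAA": 30, "CCCCCCCCCC": 10}]
normal = [{"AAAAAAAAAA": 10, "CCCCCCCCCC": 10, "GGGGGGGGGG": 20}]

ranked = rank_by_difference(tumor, normal)
for tag, tpm_a, tpm_b in ranked:
    print(tag, tpm_a, tpm_b, tpm_a - tpm_b)
```

Tags at the top of the list are candidates for up-regulation in the first group and tags at the bottom for down-regulation; any significance cutoff would have to be chosen separately.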

Relevant genes can be extracted from a large set of output genes with the “reducing‐analysis” function. After the sample data are input in a defined format, the GA function lets the user assign a class label, e.g. case or control, to each sample, and the representative tags are output together with an accuracy evaluation. Setting a larger population and a higher number of generations (GA parameters) results in higher performance (higher accuracy and fewer genes).
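A GA-based tag reduction evaluated with KNN can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the algorithm as implemented in Extract‐SAGE: the bit-mask encoding, tournament selection, one-point crossover, per-tag penalty, leave-one-out evaluation and all parameter values are choices made here for the example, and the data are invented.

```python
import random


def knn_loocv_accuracy(samples, labels, mask, k=1):
    """Leave-one-out accuracy of a k-nearest-neighbor vote on the masked tags."""
    selected = [i for i, keep in enumerate(mask) if keep]
    if not selected:
        return 0.0  # no tags selected, nothing to classify with
    correct = 0
    for i, query in enumerate(samples):
        # Squared Euclidean distance to every other sample over selected tags.
        neighbors = sorted(
            (sum((query[t] - other[t]) ** 2 for t in selected), j)
            for j, other in enumerate(samples) if j != i
        )[:k]
        votes = [labels[j] for _, j in neighbors]
        predicted = max(sorted(set(votes)), key=votes.count)
        correct += predicted == labels[i]
    return correct / len(samples)


def fitness(samples, labels, mask, penalty=0.01):
    """Accuracy minus an assumed small penalty per selected tag."""
    return knn_loocv_accuracy(samples, labels, mask) - penalty * sum(mask)


def select_tags(samples, labels, population_size=20, generations=30,
                mutation_rate=0.1, seed=0):
    """Evolve bit masks over tags and return the best mask found."""
    rng = random.Random(seed)
    n_tags = len(samples[0])

    def score(mask):
        return fitness(samples, labels, mask)

    # Start from the all-tags mask plus random masks.
    population = [[1] * n_tags] + [
        [rng.randint(0, 1) for _ in range(n_tags)]
        for _ in range(population_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        next_gen = population[:2]  # elitism: the two best masks survive
        while len(next_gen) < population_size:
            parent1 = max(rng.sample(population, 3), key=score)  # tournament
            parent2 = max(rng.sample(population, 3), key=score)
            cut = rng.randrange(1, n_tags)  # one-point crossover
            child = parent1[:cut] + parent2[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


# Hypothetical toy data: rows are samples, columns are tag expression values.
samples = [
    [90, 5, 3, 7], [85, 6, 9, 2], [95, 4, 1, 8],  # cases
    [10, 5, 8, 3], [12, 6, 2, 9], [8, 4, 7, 1],   # controls
]
labels = ["case"] * 3 + ["control"] * 3

best_mask = select_tags(samples, labels)
print(best_mask, knn_loocv_accuracy(samples, labels, best_mask))
```

Because the best masks are carried over unchanged, the returned mask never scores worse than using all tags. Which tags are dropped depends on the random seed and on the assumed penalty, so results on real data would need independent validation.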

Concluding remarks

Extract‐SAGE constitutes a novel, effective and accurate SAGE analysis platform for comparison of multiple libraries. Common or tissue‐ and cancer‐specific biomarkers can easily be mined in silico.
References

1.  SAGEmap: a public gene expression resource.

Authors:  A E Lash; C M Tolstoshev; L Wagner; G D Schuler; R L Strausberg; G J Riggins; S F Altschul
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

2.  SAGE Genie: a suite with panoramic view of gene expression.

Authors:  Peng Liang
Journal:  Proc Natl Acad Sci U S A       Date:  2002-08-23       Impact factor: 11.205

3.  Serial analysis of gene expression.

Authors:  V E Velculescu; L Zhang; B Vogelstein; K W Kinzler
Journal:  Science       Date:  1995-10-20       Impact factor: 47.728

4.  The Mouse SAGE Site: database of public mouse SAGE libraries.

Authors:  Petr Divina; Jirí Forejt
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

