David Murray, Peter Doran, Padraic MacMathuna, Alan C Moss.
Abstract
Efforts aimed at deciphering the molecular basis of complex disease are underpinned by the availability of high throughput strategies for the identification of biomolecules that drive the disease process. The completion of the human genome-sequencing project, coupled to major technological developments, has afforded investigators myriad opportunities for multidimensional analysis of biological systems. Nowhere has this research explosion been more evident than in the field of transcriptomics. Affordable access to the technology that supports such investigations has led to a significant increase in the amount of data generated. As most biological distinctions are now observed at a genomic level, a large amount of expression information is now openly available via public databases. Furthermore, numerous computation-based methods have been developed to harness the power of these data. In this review we provide a brief overview of in silico methodologies for the analysis of differential gene expression, such as Serial Analysis of Gene Expression and Digital Differential Display. The performance of these strategies, at both an operational and a result/output level, is assessed and compared. The key considerations that must be made when completing an in silico expression analysis are also presented as a roadmap to assist biologists. Furthermore, to highlight the importance of these in silico methodologies in contemporary biomedical research, examples of current studies using these approaches are discussed. The overriding goal of this review is to present the scientific community with a critical overview of these strategies, so that they can be effectively added to the tool box of biomedical researchers focused on identifying the molecular mechanisms of disease.
Year: 2007 PMID: 17683638 PMCID: PMC1964762 DOI: 10.1186/1476-4598-6-50
Source DB: PubMed Journal: Mol Cancer ISSN: 1476-4598 Impact factor: 27.401
Figure 1. An overview of the SAGE process. The SAGE method for the comprehensive analysis of gene expression patterns consists of the following steps: 1. SAGE tags containing sufficient information to uniquely identify a transcript are isolated by amplification; 2. Tags are then linked and sequenced; 3. The resulting sequence data are analyzed to identify each gene expressed in the sample and the levels at which each gene is expressed; 4. This information forms a library that can be used to compare gene expression between tissues or cell types. For a review see [14].
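Steps 3 and 4 of the SAGE process can be illustrated with a minimal Python sketch. It assumes the tags have already been extracted from the sequence data; the tag sequences, gene names, the `TAG_TO_GENE` lookup and both function names are hypothetical and stand in for a curated tag-to-gene mapping.

```python
from collections import Counter

# Hypothetical tag-to-gene lookup; a real analysis would use a curated
# mapping rather than these made-up tags and gene names.
TAG_TO_GENE = {
    "AAAAAAAAAA": "GENE_A",
    "CCCCCCCCCC": "GENE_B",
}


def build_library(tags, tag_to_gene):
    """Count sequenced tags and assign each count to a gene (step 3)."""
    library = Counter()
    for tag, n in Counter(tags).items():
        # Tags without a known gene are pooled under "unmapped".
        library[tag_to_gene.get(tag, "unmapped")] += n
    return library


def compare_libraries(lib_a, lib_b):
    """Pair up the counts for every gene seen in either library (step 4)."""
    genes = set(lib_a) | set(lib_b)
    return {g: (lib_a.get(g, 0), lib_b.get(g, 0)) for g in sorted(genes)}


# Toy input: tags as they might be read from two sequenced samples.
tumour = build_library(
    ["AAAAAAAAAA"] * 3 + ["CCCCCCCCCC", "GGGGGGGGGG"], TAG_TO_GENE
)
normal = build_library(["AAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCC"], TAG_TO_GENE)
comparison = compare_libraries(tumour, normal)
```

Each entry of `comparison` holds the raw tag counts for one gene in the two libraries. Raw counts of this kind are the starting point for the statistical comparisons performed by the online tools.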
Figure 2. A typical output from the CGAP XProfiler online tool. In this example, bulk breast cancer tissue is compared with normal tissue. This sample comparison was made on 21-October-2006.
Figure 3. Typical DDD output. Following the selection of pools (A and B) for comparison, statistically significant differences are represented. Each line represents a gene. For each gene, the numbers give the number of times that gene is represented in each pool. The p value for the difference is presented below those numbers. Information on the gene, including its name, abbreviated title and UniGene number, is also presented.
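The p value reported for each gene can be illustrated with Fisher's exact test on a 2x2 table of EST counts. The sketch below is a generic two-sided implementation using only the standard library. It is not DDD's own code, and the pool sizes and gene counts are invented for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical example: a gene seen 12 times among 10,000 ESTs in pool A
# and 2 times among 12,000 ESTs in pool B.
hits_a, total_a = 12, 10_000
hits_b, total_b = 2, 12_000
p_value = fisher_exact_two_sided(
    hits_a, total_a - hits_a, hits_b, total_b - hits_b
)
```

The table rows are the two pools and the columns are "ESTs for this gene" and "all other ESTs", so the test asks whether the gene's share of ESTs differs between pools.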
Summary of in silico gene expression tools
| Resource | Application | Web Address |
|---|---|---|
| CGAP | Online genetics resource for cancer researchers, including online analytical tools. | |
| DDD | Online EST comparison. | |
| DGED | Online identification of significantly different gene expression. | |
| GenBank | DNA, RNA & protein sequence database. | |
| SAGEmap | Resource for the analysis of SAGE data. | |
| UniGene | A database of the transcriptome. Organises transcripts into specific clusters. | |
| XProfiler | Compares gene expression between two pools of libraries. | |
A comparison of the strengths and weaknesses of in silico gene expression mining tools
| Tool | Strengths | Weaknesses |
|---|---|---|
| DDD | Size of EST databases in UniGene; conservative test (Fisher's exact test) used to determine significance; absolute and relative counts given | Libraries with low EST counts excluded from analysis; limited number of "normal tissue" libraries |
| DGED | Statistical parameters can be varied; results linked to tissue microarray data; ability to select origin/type of tissue (e.g. microdissected); genes with low abundance included | Comparison based on odds ratio |
| SAGEmap | Wide variety in the source of SAGE data available; accounts for differences in sample size between groups | Exclusion of tags with low counts |
| XProfiler | Ability to compare groups and pools of libraries; outputs genes as unique/non-unique and known/unknown; ability to select origin/type of tissue (e.g. microdissected) | Exclusion of tags with low counts |
| Common to all tools | Freely available via internet; unbiased view of transcriptome | Reliability of initial sequencing experiments; limited background knowledge of original tissues; significant false positive rate/false negative rate unknown |
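Two entries in the comparison above, the odds-ratio basis of DGED and SAGEmap's adjustment for differences in sample size, can be made concrete with a short Python sketch. Both functions are generic textbook calculations offered as an assumption about the kind of arithmetic involved. They are not the exact formulas used by either tool, and all counts are invented.

```python
def tags_per_million(count, library_size):
    """Scale a raw tag count to a common library size of one million tags."""
    return count * 1_000_000 / library_size


def odds_ratio(count_a, total_a, count_b, total_b):
    """Generic odds ratio for a gene's tags in pool A relative to pool B."""
    odds_a = count_a / (total_a - count_a)
    if count_b == 0:
        # The ratio is undefined when the gene is absent from pool B;
        # report infinity rather than dividing by zero.
        return float("inf")
    odds_b = count_b / (total_b - count_b)
    return odds_a / odds_b


# Hypothetical counts: the same gene in two libraries of different size.
tpm_a = tags_per_million(50, 100_000)   # 50 tags in a 100,000-tag library
tpm_b = tags_per_million(30, 20_000)    # 30 tags in a 20,000-tag library
ratio = odds_ratio(50, 100_000, 30, 20_000)
```

Although pool A has the larger raw count here, scaling to a common library size shows the gene is proportionally more abundant in pool B, which is why raw counts from libraries of different sizes should not be compared directly.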