Matthew N McCall, Karan Uppal, Harris A Jaffee, Michael J Zilliox, Rafael A Irizarry.
Abstract
Various databases have harnessed the wealth of publicly available microarray data to address biological questions ranging from across-tissue differential expression to homologous gene expression. Despite their practical value, these databases rely on relative measures of expression and are unable to address the most fundamental question: which genes are expressed in a given cell type. The Gene Expression Barcode is the first database to provide reliable absolute measures of expression for most annotated genes for 131 human and 89 mouse tissue types, including diseased tissue. This is made possible by a novel algorithm that leverages information from the GEO and ArrayExpress public repositories to build statistical models that permit converting data from a single microarray into expressed/unexpressed calls for each gene. For selected platforms, users may upload data and obtain results in a matter of seconds. The raw data, curated annotation, and code used to create our resource are also available at http://rafalab.jhsph.edu/barcode.
Year: 2011 PMID: 21177656 PMCID: PMC3013751 DOI: 10.1093/nar/gkq1259
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Histograms of reported expression measurements and barcode standardized values. (A) Reported expression values for two genes. Values from all samples in the barcode database are shown. The red tick marks on the x-axis represent values from yeast samples, which are not expected to hybridize. Each gene has a single mode with a long right tail. We assume values near the mode correspond to the gene being silenced and values well above the mode correspond to the gene being expressed. However, these two genes clearly have different modes. If we were to use the background distribution of the first gene (PEG3) to estimate whether the second gene (SFN) is expressed, SFN would appear to be expressed in nearly every tissue. (B) Values standardized with the barcode approach. Notice that the mode of each distribution is now approximately zero and the yeast samples are clustered near zero. The dashed lines represent a possible threshold to convert the barcode standardized measurements into a gene expression barcode.
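The per-gene standardization described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' estimator: it assumes the unexpressed mode can be approximated by the densest histogram bin, that the background spread can be estimated from values at or below that mode, and that a fixed cutoff on the standardized scale yields the 0/1 barcode. The function names, the bin width, the cutoff of 5, and the toy values are all hypothetical.

```python
from collections import Counter
from math import sqrt


def estimate_mode(values, bin_width=0.25):
    """Approximate the mode as the midpoint of the fullest fixed-width bin."""
    bins = Counter(int(v // bin_width) for v in values)
    densest_bin = max(bins, key=bins.get)
    return (densest_bin + 0.5) * bin_width


def standardize_gene(values, bin_width=0.25):
    """Center one gene's values on its mode and scale by the background spread.

    The spread is estimated only from values at or below the mode, on the
    assumption that those values come from samples where the gene is silent.
    """
    mode = estimate_mode(values, bin_width)
    left = [v for v in values if v <= mode]
    if len(left) < 2:
        raise ValueError("too few values at or below the mode")
    sd = sqrt(sum((v - mode) ** 2 for v in left) / len(left))
    if sd == 0:
        raise ValueError("background spread is zero")
    return [(v - mode) / sd for v in values]


def to_barcode(z_values, threshold=5.0):
    """Call a gene expressed (1) when its standardized value exceeds the cutoff."""
    return [1 if z > threshold else 0 for z in z_values]


# Toy log-scale values, invented for illustration: two genes whose
# background modes differ by about 3 units.
gene_a = [2.8, 2.9, 3.0, 3.0, 3.1, 3.1, 3.2, 3.3, 7.5, 9.0]
gene_b = [5.8, 5.9, 6.0, 6.0, 6.1, 6.1, 6.2, 6.3, 10.5, 12.0]

for name, values in (("gene_a", gene_a), ("gene_b", gene_b)):
    print(name, to_barcode(standardize_gene(values)))
```

In this toy example a single raw cutoff chosen from `gene_a`'s background would call every `gene_b` sample expressed, whereas the standardized values let one cutoff serve both genes, which is the point the caption makes with PEG3 and SFN.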
Functional annotation clustering using DAVID
| CD4+ T cells: GO term | ES | Cerebellum: GO term | ES | Liver: GO term | ES | Skeletal muscle: GO term | ES |
|---|---|---|---|---|---|---|---|
| RNA metabolic process | 12.4 | Synaptic transmission | 9.7 | Cellular ketone metabolic process | 26.2 | Muscle contraction | 15.8 |
| Cellular macromolecule catabolic process | 8 | Transport | 9.5 | Monocarboxylic acid metabolic process | 16 | Muscle organ development | 9.1 |
| Cellular protein metabolic process | 7.6 | Neurogenesis | 7.4 | Organic acid catabolic process | 15.7 | Striated muscle tissue development | 7.1 |
| Apoptosis | 7.2 | Nervous system development | 7.3 | Steroid metabolic process | 11.4 | Energy derivation by oxidation of organic compounds | 5.9 |
| Lymphocyte activation | 6.2 | Cytoskeleton organization | 6 | Wound healing | 10.7 | Anatomical structure development | 4.5 |
The transcriptomes of four tissues were clustered using DAVID. The gene ontology (GO) term with the lowest P-value is shown to represent each cluster. ES, enrichment score.
Comparison to other tools
| Method | Tissue | Expressed | FP, % |
|---|---|---|---|
| Barcode | Kidney | 761 | 13 |
| TiGER | Kidney | 320 | 13 |
| EBI | Kidney | 245 | 14 |
| Barcode | Liver | 695 | 21 |
| TiGER | Liver | 295 | 41 |
| Bodymap | Liver | 36 | 25 |
For the competing methods, we determined genes that were up-regulated in kidney and liver as compared to other tissues. For the barcode, we simply obtained genes called expressed. We then compared these gene lists to second-generation sequencing (sec-gen) data to determine the false positive rate. The barcode finds the greatest number of expressed genes in both tissues (Column 3) while maintaining a false positive rate no higher than that of any competing method (Column 4). Note that the EBI tool does not provide information for liver, and Bodymap does not provide it for kidney.
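The "FP, %" column can be read as the share of genes called expressed that the sequencing reference does not support. The sketch below assumes exactly that definition, which the source does not spell out; the function name and the toy gene identifiers are hypothetical.

```python
def false_positive_percent(called_expressed, reference_expressed):
    """Percentage of called genes that are absent from the reference set.

    Assumed definition: a false positive is a gene called expressed by the
    method but not detected as expressed in the sequencing reference.
    """
    called = set(called_expressed)
    if not called:
        raise ValueError("no genes were called expressed")
    unsupported = called - set(reference_expressed)
    return 100.0 * len(unsupported) / len(called)


# Toy identifiers, invented for illustration.
called = ["g1", "g2", "g3", "g4"]
reference = ["g1", "g2", "g3", "g9"]
print(false_positive_percent(called, reference))
```

Under this reading the denominator is the number of genes a method calls, so a method that calls more genes is not penalized for list length alone.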
Figure 2. Hierarchical clustering of human and mouse tissue samples using orthologous genes. These are based on (A) average expression microarray measurements and (B) tissue specific transcriptomes based on averaged barcodes. The same genes were used in (A) and (B).
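One plausible reading of "averaged barcodes" is the per-gene fraction of a tissue's samples in which the gene is called expressed. The sketch below assumes that reading and uses Euclidean distance between tissue profiles as the input to hierarchical clustering; the source specifies neither the averaging rule nor the distance, and the function name and toy barcodes are hypothetical.

```python
from math import dist


def average_barcode(sample_barcodes):
    """Per-gene fraction of samples in which the gene is called expressed.

    Each element of sample_barcodes is one sample's 0/1 barcode, with genes
    in the same order across samples.
    """
    n_samples = len(sample_barcodes)
    if n_samples == 0:
        raise ValueError("at least one sample is required")
    return [sum(gene_calls) / n_samples for gene_calls in zip(*sample_barcodes)]


# Toy barcodes over three orthologous genes, invented for illustration.
human_liver = average_barcode([[1, 0, 1], [1, 0, 0]])
mouse_liver = average_barcode([[1, 0, 1], [1, 0, 1]])
mouse_brain = average_barcode([[0, 1, 0], [0, 1, 0]])

# Pairwise distances like these would feed a hierarchical clustering routine.
print(dist(human_liver, mouse_liver), dist(human_liver, mouse_brain))
```

Because each averaged barcode lies between 0 and 1 for every gene, no single highly expressed gene can dominate the distance the way it can with raw intensities.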