Jenny Russ, Matthias E Futschik.
Abstract
BACKGROUND: Human tissues display a remarkable diversity in structure and function. To understand how such diversity emerges from the same DNA, systematic measurements of gene expression across different tissues in the human body are essential. Several recent studies have addressed this formidable task using microarray technologies. These large tissue expression data sets provide an important basis for biomedical research. However, it is well known that microarray data can be compromised by high noise levels and various experimental artefacts. Critical comparison of different data sets can help to reveal such errors and to avoid pitfalls in their application.
Year: 2010 PMID: 20465848 PMCID: PMC2885367 DOI: 10.1186/1471-2164-11-305
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Table 1. Summary of analyzed tissue expression data sets.
| Data set | Publication | Technology | Number of genes | Number of samples | Number of tissues |
|---|---|---|---|---|---|
| Rosetta1 | Johnson et al., Science 2003 | Agilent oligonucleotide exon microarrays | 9,394 | 50 | 50 |
| Rosetta2 | Schadt et al., Genome Biology 2004 | Agilent oligonucleotide microarrays | 13,367 | 54 | 54 |
| Stanford | Shyamsundar et al., Genome Biology 2005 | Dual-channel cDNA microarrays | 13,984 | 115 | 35 |
| Geneatlas | Su et al., PNAS 2004 | Affymetrix HG-U133A and GNF1H arrays | 16,499 | 158 | 79 |
The number of genes refers to the probes on the arrays that could be mapped to corresponding Entrez Gene identifiers.
Table 2. Gene-wise correlation of correlations.
| | Rosetta1 | Rosetta2 | Geneatlas | Stanford |
|---|---|---|---|---|
| Rosetta1 (random) | 0.0026 | 0.0033 | 0.0036 | 0.0044 |
| Rosetta2 (random) | 0.0033 | 0.0021 | 0.0028 | 0.0037 |
| Geneatlas (random) | 0.0036 | 0.0028 | 0.0036 | 0.0051 |
| Stanford (random) | 0.0044 | 0.0037 | 0.0051 | 0.0048 |
Correlation of gene-wise correlations between the four data sets and between corresponding randomized gene expression matrices.
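The correlation of correlations can be illustrated with a minimal sketch. It assumes each data set is a mapping from gene ID to a list of expression values, restricts both data sets to their common genes, and uses Pearson correlation at both levels; the function names and the input layout are hypothetical, and the authors' exact procedure may differ.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def gene_correlations(expr):
    """All pairwise gene-gene correlations within one data set.

    expr maps gene ID -> list of expression values across samples.
    Pairs are taken in sorted gene order so that the vectors from
    different data sets line up element by element.
    """
    genes = sorted(expr)
    return [pearson(expr[g], expr[h]) for g, h in combinations(genes, 2)]

def correlation_of_correlations(expr_a, expr_b):
    """Correlate the gene-gene correlation profiles of two data sets."""
    common = set(expr_a) & set(expr_b)
    a = {g: expr_a[g] for g in common}
    b = {g: expr_b[g] for g in common}
    return pearson(gene_correlations(a), gene_correlations(b))
```

The two data sets do not need the same number of samples, because only the gene-gene correlation vectors are compared. A randomized control in the spirit of the table could be obtained by shuffling the gene labels of one input before calling `correlation_of_correlations`.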
Figure 1. Cluster image map of the gene-based correlation of correlations matrix. Hierarchical clustering was performed to assess the pair-wise similarities of the data sets. The numerical correlation of correlations values (from Table 2) are represented according to the displayed colour-bar.
Figure 2. Comparison of gene expression in brain and non-brain tissues. Differential expression between brain and non-brain tissues was assessed by performing a gene-wise unpaired Student's t-test. To compare the results from different data sets, t-scores derived from each data set were plotted versus those from the other data sets for the corresponding genes. Additionally, the Pearson correlation coefficient is given.
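A gene-wise unpaired Student's t-test of this kind can be sketched as follows. The sketch assumes the classical pooled-variance form of the statistic and a hypothetical input layout (gene ID mapped to expression values, plus a flag per sample marking brain tissue); it is an illustration, not the authors' code.

```python
from math import sqrt
from statistics import mean, variance

def student_t(group_a, group_b):
    """Unpaired Student's t statistic with pooled (equal) variance.

    Each group needs at least two values so its variance is defined.
    """
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / na + 1 / nb))

def brain_t_scores(expr, is_brain):
    """t-score per gene; positive values mean higher expression in brain.

    expr maps gene ID -> list of expression values, one per sample.
    is_brain is a list of booleans flagging the brain samples.
    """
    scores = {}
    for gene, values in expr.items():
        brain = [v for v, b in zip(values, is_brain) if b]
        other = [v for v, b in zip(values, is_brain) if not b]
        scores[gene] = student_t(brain, other)
    return scores
```

The t-scores from two data sets can then be matched by gene ID and compared with a Pearson correlation, as in the figure.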
Figure 3. Distribution of PEM scores for liver tissue. The displayed distributions (shown in red) are based on the scores calculated for liver tissue in the compared data sets. To determine the significance of PEM scores, background distributions (shown in black) were generated. The threshold for PEM scores corresponding to FDR < 0.25 is shown. The displayed distributions are based on Gaussian kernel estimates.
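The PEM score is not defined in this record. As an assumption, the sketch below uses one common formulation of the preferential expression measure, the base-10 logarithm of observed over expected expression, where the expected value distributes a gene's total signal across tissues in proportion to each tissue's share of the overall signal. The function name and input layout are hypothetical, and the authors' definition may differ.

```python
from math import log10

def pem_scores(expr):
    """PEM = log10(observed / expected) for every gene and tissue.

    expr maps gene ID -> {tissue: non-negative intensity}.
    Assumed formulation: expected = gene total * tissue total / grand total.
    Zero intensities are skipped because their logarithm is undefined.
    """
    tissue_total = {}
    for values in expr.values():
        for tissue, v in values.items():
            tissue_total[tissue] = tissue_total.get(tissue, 0.0) + v
    grand_total = sum(tissue_total.values())

    scores = {}
    for gene, values in expr.items():
        gene_total = sum(values.values())
        scores[gene] = {
            tissue: log10(v / (gene_total * tissue_total[tissue] / grand_total))
            for tissue, v in values.items()
            if v > 0
        }
    return scores
```

Under this formulation a positive score means the gene is expressed above expectation in that tissue, which matches the use of positive PEM scores to call over-expression.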
Figure 4. Distribution of MAX scores for brain tissue. The displayed distributions (shown in red) are based on the scores derived for brain tissue in the compared data sets. To determine the significance of MAX scores, background distributions (shown in black) were generated. The threshold for MAX scores corresponding to FDR < 0.25 is shown. The displayed distributions are based on Gaussian kernel estimates.
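How a score threshold can be derived from a background distribution is sketched below. The sketch assumes an empirical FDR estimate, namely the fraction of background scores passing a threshold divided by the fraction of observed scores passing it; this record does not say how the background was generated or how the FDR was estimated, so both function names and the estimator are assumptions.

```python
def empirical_fdr(observed, background, threshold):
    """Estimated FDR when calling all scores >= threshold significant.

    Assumed estimator: (background fraction passing) / (observed fraction
    passing), capped at 1. Returns 0.0 if no observed score passes,
    since there are then no discoveries at all.
    """
    obs_frac = sum(s >= threshold for s in observed) / len(observed)
    bg_frac = sum(s >= threshold for s in background) / len(background)
    if obs_frac == 0:
        return 0.0
    return min(1.0, bg_frac / obs_frac)

def fdr_threshold(observed, background, max_fdr=0.25):
    """Lowest observed score at which the estimated FDR first drops below max_fdr.

    Returns None if no observed score reaches the requested FDR.
    """
    for t in sorted(set(observed)):
        if empirical_fdr(observed, background, t) < max_fdr:
            return t
    return None
```

The same routine applies to PEM and MAX scores alike, since it only needs one list of observed scores and one list of background scores for a tissue.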
Figure 5. Number of tissue-specifically over-expressed genes in the single data sets. For each data set and tissue type, genes were identified as specifically over-expressed if the corresponding PEM score is positive and achieves FDR < 0.25. Note that tissue-specifically over-expressed genes could not be identified for several tissue classes in the Rosetta1, Rosetta2 and Stanford data sets due to missing replicates. Only genes included in all four data sets were considered here.
Figure 6. Concordance of assayed and uniquely over-expressed genes. Gene lists were derived for all four experiments and examined for common genes. The concordance of all assayed genes in the different microarray experiments is shown on the left side. The concordance of uniquely over-expressed genes (with MAX value > 0 and FDR < 0.25) in adrenal gland, brain, kidney, liver, and lung is depicted on the right side. The largest overlap was detected between Rosetta1 and Rosetta2, which share on average 21% of the detected genes. In contrast, Stanford and Rosetta2 display the smallest overlap, sharing only 14% of the detected genes.
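The pairwise concordance of gene lists can be sketched as below. The caption does not state how the shared percentage is normalised, so the sketch assumes the Jaccard index (shared genes divided by the union of both lists); the function name and input layout are hypothetical.

```python
from itertools import combinations

def pairwise_overlap(gene_lists):
    """Jaccard overlap (shared / union) for every pair of gene lists.

    gene_lists maps data set name -> iterable of gene IDs.
    Returns a dict keyed by (name_a, name_b) in sorted name order.
    """
    overlap = {}
    for a, b in combinations(sorted(gene_lists), 2):
        set_a, set_b = set(gene_lists[a]), set(gene_lists[b])
        union = set_a | set_b
        overlap[(a, b)] = len(set_a & set_b) / len(union) if union else 0.0
    return overlap
```

Averaging these values over the five tissues named in the caption would give a per-pair figure comparable to the percentages quoted there, provided the same normalisation was used.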
Table 3. Top 20 brain-specific genes.
| Entrez GeneID | Consolidated Rank | Rosetta1 MAX | Rosetta2 MAX | Stanford MAX | Geneatlas MAX | Locuslink Symbol | Description |
|---|---|---|---|---|---|---|---|
| 2670 | 1 | 5.25 | 4.90 | 5.02 | 3.48 | GFAP | Glial fibrillary acidic protein, a major intermediate filament protein of astrocytes |
| 11075 | 2 | 4.57 | 4.45 | 4.53 | 5.04 | STMN2 | Stathmin-like 2, a neuronal growth-associated protein |
| 2596 | 3 | 6.38 | 5.69 | 4.16 | 2.75 | GAP43 | Growth associated protein 43, regulates growth of axons during development and regeneration |
| 4747 | 4 | 4.86 | 3.47 | 3.33 | 3.75 | NEFL | Neurofilament, light polypeptide, a major constituent of the axoskeleton |
| 5354 | 5 | 5.13 | 2.72 | 4.30 | 4.45 | PLP1 | Proteolipid protein 1, predominant myelin protein present in CNS |
| 5375 | 6 | 4.47 | 4.95 | 2.97 | 2.09 | PMP2 | Peripheral myelin protein 2 |
| 9568 | 7 | 3.41 | 3.89 | 4.25 | 2.19 | GABBR2 | Gamma-aminobutyric acid (GABA) B receptor 2 |
| 9118 | 8 | 4.06 | 4.02 | 3.24 | 1.79 | INA | Internexin neuronal intermediate filament protein alpha |
| 1759 | 9 | 3.24 | 3.35 | 2.93 | 2.69 | DNM1 | Dynamin 1, involved in clathrin-mediated endocytosis |
| 6456 | 10 | 3.57 | 3.45 | 2.46 | 2.51 | SH3GL2 | SH3-domain GRB2-like 2, Endophilin 1, mediator of synaptic vesicle formation |
| 6616 | 11 | 5.54 | 3.25 | 1.52 | 4.17 | SNAP25 | Synaptosomal-associated protein 25 kDa, a SNARE protein required for neuronal exocytosis |
| 29114 | 12 | 4.10 | 3.09 | 3.80 | 2.00 | NP22 | Neural protein 22 |
| 4155 | 13 | 5.10 | 3.46 | 1.30 | 4.79 | MBP | Myelin basic protein, major constituent of the myelin sheath of oligodendrocytes and Schwann cells |
| 11076 | 14 | 2.67 | 3.43 | 2.84 | 2.27 | TPPP | Tubulin polymerization promoting protein |
| 6285 | 15 | 2.87 | 2.21 | 3.54 | 3.64 | S100B | S100 calcium binding protein B, glial-derived protein serving as neurotrophic factor and neuronal survival protein |
| 4741 | 16 | 2.97 | 2.78 | 2.55 | 2.52 | NEFM | Neurofilament, medium polypeptide 150 kDa |
| 1463 | 17 | 3.25 | 5.22 | 1.54 | 1.97 | NCAN | Neurocan, involved in the modulation of cell adhesion and migration |
| 3797 | 18 | 3.86 | 2.98 | 2.56 | 1.56 | KIF3C | Kinesin family member 3C, a neuron-specific kinesin |
| 29106 | 19 | 4.83 | 2.81 | 2.50 | 1.51 | SCG3 | Secretogranin III, a neuroendocrine secretory protein |
| 81551 | 20 | 3.44 | 2.41 | 1.85 | 2.67 | STMN4 | Stathmin-like 4, regulation of the microtubule cytoskeleton |
The top twenty genes based on the integrative MAX scoring are displayed. Besides the consolidated rank, the MAX values for the single data sets are also presented.
Table 4. Top 20 liver-specific genes.
| Entrez GeneID | Consolidated Rank | Rosetta1 MAX | Rosetta2 MAX | Stanford MAX | Geneatlas MAX | Locuslink Symbol | Description |
|---|---|---|---|---|---|---|---|
| 3263 | 1 | 5.25 | 4.90 | 5.02 | 3.48 | HPX | Hemopexin, heme-binding plasma protein synthesized by the liver |
| 3053 | 2 | 4.57 | 4.45 | 4.53 | 5.04 | SERPIND1 | Serpin peptidase inhibitor, clade D, member 1, cofactor of heparin in plasma |
| 6580 | 3 | 6.38 | 5.69 | 4.16 | 2.75 | SLC22A1 | Solute carrier family 22 member 1, main organic cation uptake system in hepatocytes |
| 462 | 4 | 4.86 | 3.47 | 3.33 | 3.75 | SERPINC1 | Serpin peptidase inhibitor, clade C (antithrombin), member 1 |
| 8608 | 5 | 5.13 | 2.72 | 4.30 | 4.45 | RDH16 | Retinol dehydrogenase 16, involved in lipid metabolism in liver |
| 344 | 6 | 4.47 | 4.95 | 2.97 | 2.09 | APOC2 | Apolipoprotein C-II, component of very low density lipoprotein |
| 1571 | 7 | 3.41 | 3.89 | 4.25 | 2.19 | CYP2E1 | Cytochrome P450, family 2, subfamily E, polypeptide 1, cytochrome oxidase system |
| 6906 | 8 | 4.06 | 4.02 | 3.24 | 1.79 | SERPINA7 | Serpin peptidase inhibitor, clade A (antitrypsin), member 7 |
| 1559 | 9 | 3.24 | 3.35 | 2.93 | 2.69 | CYP2C9 | Cytochrome P450, family 2, subfamily C, polypeptide 9 |
| 1551 | 10 | 3.57 | 3.45 | 2.46 | 2.51 | CYP3A7 | Cytochrome P450, family 3, subfamily A, polypeptide 7 |
| 732 | 11 | 5.54 | 3.25 | 1.52 | 4.17 | C8B | Complement component 8, beta polypeptide |
| 731 | 12 | 4.10 | 3.09 | 3.80 | 2.00 | C8A | Complement component 8, alpha polypeptide |
| 7448 | 13 | 5.10 | 3.46 | 1.30 | 4.79 | VTN | Vitronectin, plasma protein promoting cell adhesion |
| 350 | 14 | 2.67 | 3.43 | 2.84 | 2.27 | APOH | Apolipoprotein H (beta-2-glycoprotein I) |
| 1373 | 15 | 2.87 | 2.21 | 3.54 | 3.64 | CPS1 | Carbamoyl-phosphate synthetase 1, enzyme in the hepatic urea cycle |
| 1361 | 16 | 2.97 | 2.78 | 2.55 | 2.52 | CPB2 | Carboxypeptidase B2, plasma protein regulating fibrinolysis |
| 3273 | 17 | 3.25 | 5.22 | 1.54 | 1.97 | HRG | Histidine-rich glycoprotein, plasma protein |
| 338 | 18 | 3.86 | 2.98 | 2.56 | 1.56 | APOB | Apolipoprotein B, isoform apoB-100, exclusively synthesized in the liver |
| 2168 | 19 | 4.83 | 2.81 | 2.50 | 1.51 | FABP1 | Fatty acid binding protein 1, found in the liver |
| 10998 | 20 | 3.44 | 2.41 | 1.85 | 2.67 | SLC27A5 | Solute carrier family 27 (fatty acid transporter), member 5, involved in lipid synthesis |
The top twenty genes based on the integrative MAX scoring are displayed.
Figure 7. Spearman correlation of MAX scores for brain tissue. To assess the reliability of the consolidated gene lists, we performed a cross-validation. The number of genes in each data set was reduced to the genes found in all four data sets with a positive MAX score in brain tissue. The diagram shows the average Spearman correlation of each data set vs. the other data sets and of three consolidated data sets vs. the data set that was left out.
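The leave-one-out comparison can be sketched as follows. The consolidation step is not described in this record, so the sketch assumes, purely for illustration, that three data sets are consolidated by averaging their per-gene ranks; the Spearman correlation itself is computed as the Pearson correlation of ranks with ties sharing their average rank. All function names are hypothetical.

```python
from math import sqrt

def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    return cov / sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))

def leave_one_out(scores):
    """Correlate each data set with the mean-rank consensus of the others.

    scores maps data set name -> list of scores in a shared gene order.
    The mean-rank consensus is an assumed consolidation, not the paper's.
    """
    result = {}
    for held_out in scores:
        others = [ranks(s) for name, s in scores.items() if name != held_out]
        consensus = [sum(col) / len(col) for col in zip(*others)]
        result[held_out] = spearman(consensus, scores[held_out])
    return result
```

For example, `leave_one_out` could be called with the four MAX columns of the brain table, restricted to genes present in all four data sets, to obtain one correlation per held-out data set.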
Figure 8. Cluster image map for the GO analysis of the consolidated gene lists and of the Geneatlas data set. Genes of the consolidated lists and of the lists derived solely from the Geneatlas data were mapped to the biological processes to which the genes are assigned in Gene Ontology (GO). The significance of enrichment in informative GO categories was derived by using Fisher's exact test and adjusted for multiple testing. Hierarchical clustering was subsequently performed based on the derived false discovery rates (FDR). The cluster image maps display the FDR of the GO enrichment according to the colour-bar at the bottom.
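A GO enrichment test of this kind can be sketched as below. The sketch assumes a one-sided Fisher's exact test for over-representation, computed from the hypergeometric distribution, and the Benjamini-Hochberg procedure for the multiple-testing adjustment; the caption names Fisher's exact test and FDR but not the sidedness or the adjustment method, so those choices and the function names are assumptions.

```python
from math import comb

def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact test p-value for over-representation.

    k: genes of the list annotated to the GO category
    n: size of the gene list
    K: background genes annotated to the category
    N: size of the background
    Returns P(X >= k) under the hypergeometric distribution.
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (FDR), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, i in enumerate(order):
        rank = m - position  # 1-based rank of this p-value, smallest = 1
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Running `fisher_enrichment_p` once per GO category and passing the resulting p-values to `benjamini_hochberg` yields one FDR value per category, which is the kind of quantity the cluster image maps display.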