Isaac S Kohane, Vladimir I Valtchinov.
Abstract
MOTIVATION: We investigate and quantify the generalizability of the white blood cell (WBC) transcriptome to the general, multiorgan transcriptome. We use data from NCBI's Gene Expression Omnibus (GEO) public repository to define two datasets for comparison: the WBC set and the OO (Other Organ) set.
Year: 2012 PMID: 22219206 PMCID: PMC3288749 DOI: 10.1093/bioinformatics/btr713
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. A diagram representing the workflow of the analysis. (i) ‘Data Sets Definition’ includes bulk download of the GEO archive files for the GPL96 platform, parsing out all GSM sample files with a similar probeset-averaged expression matrix, detecting and removing outlier genes, and constructing the two sets for comparison, WBC and OO. (ii) ‘Quantify Similarity between sets’ includes: using the GO mapping of the most correlated (or most highly expressed) genes, as defined by their FDR q-value thresholds, to quantify change across sets (the null hypothesis H0 is that the overlap between corresponding sets from WBC and OO occurs by chance); using a linear model with general least squares, and PCA, to quantify the relationships between OO and WBC across expression and correlation profiles, respectively; and defining two lists of ‘most-changing’ and ‘least-changing’ genes from WBC to OO across expression profiles. (iii) In ‘Ad-hoc Analyses’, we first construct the RelNets of the 200 most correlated WBC genes and track the rank changes of their corresponding OO pairs. Next, we examine the human homologs of the transcription factors (TFs) from the Mahoney Atlas as expressed in WBC. We then examine how the most-changing and least-changing genes from step (ii) are represented in the list of human housekeeping genes. Finally, we run a GO enrichment analysis of the ‘least-changing’ genes with respect to ‘tissue of expression’ in DAVID.
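Step (ii) selects genes by FDR q-value thresholds, and the fit statistics below are computed over an FDR q < 0.05 truncation. The exact procedure is given in Section 2; as a minimal sketch, assuming Benjamini-Hochberg q-values (our assumption, not stated in the caption), the selection could look like this. The names `bh_qvalues` and `select_significant` and the p-values are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg q-values, returned in the input order.

    Assumption: FDR q-values come from the Benjamini-Hochberg step-up
    procedure; the paper's own procedure is described in its Section 2.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0  # q-values are capped at 1
    # Walk from the largest p-value down so that q is monotone in p.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def select_significant(pvalues, alpha=0.05):
    """Indices whose q-value is below alpha (e.g. the q < 0.05 cut-off)."""
    return [i for i, q in enumerate(bh_qvalues(pvalues)) if q < alpha]


# Hypothetical p-values for five gene pairs.
p = [0.001, 0.02, 0.06, 0.3, 0.008]
print(bh_qvalues(p))
print(select_significant(p))  # [0, 1, 4]
```

The backward pass with a running minimum is what makes the q-values monotone in p; without it, a larger p-value could receive a smaller q-value than a smaller one.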
Fig. 2. Distribution of solid tissues in the samples of the OO set. A total of 1436 GSM sample annotations were parsed out and reviewed, together with 27 samples from the original 2006 download that are no longer available in GEO.
Representation of GO categories in the 1000 most expressed (‘E’, in last column) or most correlated (‘C’) genes (Cell, cellular component; MF, molecular function; BP, biological process)
| GO category | Intersection | WBC | OO | E/C |
|---|---|---|---|---|
| Cell | 116 | 53 | 65 | C |
| MF | 258 | 179 | 238 | C |
| BP | 280 | 220 | 233 | C |
| Cell | 144 | 18 | 87 | E |
| MF | 303 | 246 | 233 | E |
| BP | 322 | 267 | 257 | E |
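The null hypothesis H0 in Fig. 1 is that the overlap between corresponding WBC and OO sets arises by chance. One common way to score such an overlap is a one-sided hypergeometric test; the sketch below assumes that test (the caption does not name one), and the function names and the universe size in the usage lines are hypothetical.

```python
from math import comb


def overlap_pvalue(n_universe, n_a, n_b, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    X is the overlap between a fixed set of n_a items and a random set of
    n_b items drawn without replacement from n_universe items, i.e. the
    overlap expected if H0 (overlap by chance) holds.
    """
    total = comb(n_universe, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_universe - n_a, n_b - k)
        for k in range(n_overlap, min(n_a, n_b) + 1)
    )
    return tail / total


def expected_overlap(n_universe, n_a, n_b):
    """Mean overlap under H0."""
    return n_a * n_b / n_universe


# Hypothetical numbers: 1000 GO categories in the universe, 100 in each set.
print(expected_overlap(1000, 100, 100))  # 10.0
print(overlap_pvalue(1000, 100, 100, 30))
```

With an expected overlap of 10 under H0, an observed overlap of 30 gives a very small p-value, which is the kind of evidence against chance overlap that the comparison is looking for.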
Linear least squares fit coefficients computed for (i) the expression profiles; (ii) the cross-correlations using MIC, fitted by linear least squares and by PCA with two PCs over the FDR q < 0.05-truncated phase space of (WBC, OO) correlation pairs (∼250 million ‘points’ in the complete phase space); and (iii) the same as (ii) for Pearson's correlation coefficient
| Linear least squares fitting | |
|---|---|
| Expression profiles | (A) |
| Residual SE | 40.47 on 21981 DF |
| Multiple R² | 0.5487 |
| F-statistic | 2.672e+04 on 1 and 21981 DF |
| p-value | <2.2e-16 |
| Correlation, MIC | (A) |
| Residual SE | 0.3749 on 30617 DF |
| Multiple R² | 0.6169 |
| Adjusted R² | 0.6169 |
| F-statistic | 4.931e+04 on 1 and 30617 DF |
| p-value | <2.2e-16 |
| PCA | PC1: 0.926; PC2: 0.07403 |
| Correlation, Pearson | (A) |
| Residual SE | 5.2748 on 60188 DF |
| Multiple R² | 0.4751 |
| Adjusted R² | 0.4751 |
| F-statistic | 1.721e+05 on 1 and 60188 DF |
| p-value | <2.2e-16 |
| PCA | PC1: 0.8314; PC2: 0.1686 |
See Section 2 for the exact algorithmic steps. Significance level is encoded as ‘***’ for p < 0.0001.
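The table reports summary statistics of a one-predictor linear fit plus a two-component PCA of the (WBC, OO) pairs. To show how these quantities relate, the pure-Python sketch below (function names and data are hypothetical) computes the residual SE, multiple and adjusted R², the F-statistic on 1 and n - 2 DF, and the PC1/PC2 shares. Two assumptions are labeled here: that OO values are regressed on WBC values, and that the PCA numbers are proportions of variance (each pair in the table sums to 1). For a single predictor, F = R²/(1 - R²) · (n - 2).

```python
from math import sqrt


def ols_summary(x, y):
    """Fit y = a + b * x by ordinary least squares (one predictor).

    Returns the residual standard error on n - 2 DF, multiple and
    adjusted R^2, and the F-statistic on 1 and n - 2 DF.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    r2 = 1 - rss / syy
    return {
        "intercept": intercept,
        "slope": slope,
        "residual_se": sqrt(rss / df),
        "df": df,
        "r2": r2,
        "adj_r2": 1 - (1 - r2) * (n - 1) / df,
        # Explained sum of squares (1 DF) over the residual mean square.
        "f_stat": (syy - rss) / (rss / df),
    }


def pca2_variance_ratio(x, y):
    """Share of variance carried by PC1 and PC2 of the 2-D cloud (x, y).

    Uses the closed-form eigenvalues of the 2 x 2 sample covariance matrix.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    cyy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    cxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    half_trace = (cxx + cyy) / 2
    disc = sqrt(((cxx - cyy) / 2) ** 2 + cxy ** 2)
    lam1, lam2 = half_trace + disc, half_trace - disc
    return lam1 / (lam1 + lam2), lam2 / (lam1 + lam2)


# Hypothetical (WBC, OO) value pairs; x = WBC, y = OO is an assumed direction.
wbc = [1.0, 2.0, 3.0, 4.0, 5.0]
oo = [2.1, 3.9, 6.2, 7.8, 10.0]
print(ols_summary(wbc, oo))
print(pca2_variance_ratio(wbc, oo))
```

A PC1 share close to 1 means the (WBC, OO) cloud is nearly one-dimensional, i.e. the two sets largely agree along a single axis.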
Fig. 3. Distribution of GO categories in the 1000 most correlated genes in WBC and in OO.
GO categories that change most from WBC to OO (numbers in parentheses give the WBC → OO change)
| GO category | 1000 most correlated genes | 1000 most expressed genes |
|---|---|---|
| Cell | Ribosome (21 → 121); Cytosolic large ribosomal subunit (sensu Eukaryota) (1 → 34) | Mitochondrion (72 → 129); Extracellular matrix (sensu Metazoa) (7 → 37) |
| MF | Nucleic acid binding (130 → 56); Structural constituent of ribosome (34 → 178) | Transmembrane receptor activity (9 → 1); Sugar binding (16 → 3) |
| BP | Nuclear mRNA splicing, via spliceosome (47 → 9); Protein biosynthesis (62 → 193) | Nucleosome assembly (3 → 14); Sensory perception of sound (12 → 2) |
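The table does not state how ‘most changing’ is scored. One simple, hypothetical criterion is the absolute log2 ratio of the OO and WBC numbers with a pseudocount of 1; the sketch below applies it to the most-correlated-genes entries of the table. The function names and the pseudocount are assumptions for illustration, not the authors' method.

```python
from math import log2

# (GO category, WBC number, OO number) for the 1000 most correlated genes,
# as listed in the table of most changing GO categories.
CORRELATED = [
    ("Ribosome", 21, 121),
    ("Cytosolic large ribosomal subunit (sensu Eukaryota)", 1, 34),
    ("Nucleic acid binding", 130, 56),
    ("Structural constituent of ribosome", 34, 178),
    ("Nuclear mRNA splicing, via spliceosome", 47, 9),
    ("Protein biosynthesis", 62, 193),
]


def log2_change(wbc_count, oo_count, pseudocount=1):
    """Hypothetical score: log2 of the OO/WBC ratio.

    The pseudocount keeps the ratio finite when a number is zero.
    """
    return log2((oo_count + pseudocount) / (wbc_count + pseudocount))


def rank_by_change(rows):
    """Order categories by the magnitude of their WBC -> OO change."""
    return sorted(rows, key=lambda r: abs(log2_change(r[1], r[2])), reverse=True)


for name, wbc, oo in rank_by_change(CORRELATED):
    print(f"{log2_change(wbc, oo):+.2f}  {name}")
```

A log scale treats a tenfold gain and a tenfold loss symmetrically, so categories that shrink from WBC to OO (such as nuclear mRNA splicing) rank alongside those that grow.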