John P Fahrenbach, Jorge Andrade, Elizabeth M McNally.
Abstract
BACKGROUND: Meta-analysis of gene expression array databases has the potential to reveal information about gene function. Gene-gene interactions may be inferred from gene expression information, but such meta-analysis is often limited to a single microarray platform. To address this limitation, we developed a gene-centered approach to analyze differential expression across thousands of gene expression experiments and created the CO-Regulation Database (CORD) to determine which genes are correlated with a queried gene.
Year: 2014 PMID: 24599084 PMCID: PMC3944024 DOI: 10.1371/journal.pone.0090408
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Construction of the CORD database using a gene-centric approach.
A) A part of the microarray study E-MEXP-3167 from the ArrayExpress database. B) The samples were grouped together by either the Individual or Grouped Factor Method, and all the groups were compared to one another. C) The differentially expressed genes for each comparison were determined. Genes with multiple probes were reduced to one entry by averaging the fold change across those probes. If the probes for a gene were differentially regulated in opposite directions, the gene was removed from the list of differentially expressed genes.
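The probe-to-gene reduction described in panel C can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes fold changes are stored as signed log2 values (so the sign gives the direction of regulation), and the function name `collapse_probes` and the example numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def collapse_probes(probe_fold_changes):
    """Collapse probe-level fold changes to one entry per gene.

    probe_fold_changes: iterable of (gene_symbol, log2_fold_change) pairs for
    probes called differentially expressed in a single comparison.
    Assumption: fold changes are signed log2 values.
    """
    by_gene = defaultdict(list)
    for gene, log_fc in probe_fold_changes:
        by_gene[gene].append(log_fc)

    collapsed = {}
    for gene, values in by_gene.items():
        has_up = any(v > 0 for v in values)
        has_down = any(v < 0 for v in values)
        if has_up and has_down:
            # Probes disagree on direction: drop the gene from the list.
            continue
        # Otherwise average the fold change over all probes for the gene.
        collapsed[gene] = mean(values)
    return collapsed


# Hypothetical probe-level values, for illustration only.
probes = [("VIM", 1.5), ("VIM", 2.5), ("MYOG", 1.0), ("MYOG", -0.5), ("AXL", -2.0)]
result = collapse_probes(probes)
```

Here `VIM` is averaged over its two probes, `AXL` keeps its single value, and `MYOG` is dropped because its probes point in opposite directions.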
Figure 2. Determination of co-regulated genes.
A) The list of co-regulated genes was determined for each gene using the Individual and Grouped Factor Methods. The two gene lists were then compared by determining the % overlap (similarity) of the lists for the top 10 to top 1000 most correlated genes. The % overlap reached a plateau at 47%. B) The first derivative of the % overlap vs. the gene list size shows that, on average, the lists are no longer similar beyond the top 400 genes. C, D) This analysis was repeated for randomly generated gene lists and showed no change in the rate of % overlap vs. gene list size. E) To determine how using the Individual or Grouped Factor Method affected gene-gene correlation coefficients, we analyzed the ratio of the correlation coefficients for each gene-gene pair. A histogram of these data shows that, on average, the Grouped Factor Method yielded higher correlation coefficients.
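The list comparison in panels A and B can be sketched as below. The caption does not give the exact overlap formula, so this assumes % overlap is the number of genes shared by the two top-N lists divided by N, and it approximates the first derivative with a finite difference. The function names and toy gene lists are hypothetical.

```python
def percent_overlap(ranked_a, ranked_b, top_n):
    """Percent of genes shared by the top_n entries of two ranked lists.

    Assumption: overlap is defined as |top_a & top_b| / top_n.
    """
    top_a = set(ranked_a[:top_n])
    top_b = set(ranked_b[:top_n])
    return 100.0 * len(top_a & top_b) / top_n


def overlap_curve(ranked_a, ranked_b, sizes):
    """% overlap evaluated at each list size in sizes."""
    return [percent_overlap(ranked_a, ranked_b, n) for n in sizes]


def first_difference(sizes, values):
    """Finite-difference approximation of d(% overlap) / d(list size)."""
    return [
        (values[i + 1] - values[i]) / (sizes[i + 1] - sizes[i])
        for i in range(len(values) - 1)
    ]


# Toy ranked lists standing in for the two methods' output for one gene.
individual = ["g1", "g2", "g3", "g4", "g5", "g6"]
grouped = ["g2", "g1", "g7", "g3", "g8", "g4"]
sizes = [2, 4, 6]

curve = overlap_curve(individual, grouped, sizes)
slope = first_difference(sizes, curve)
```

In practice `sizes` would span the top 10 to top 1000 genes, and the curve would be averaged over all queried genes before differentiating.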
Figure 3. CORD results for CDKN2A.
A) CDKN2A, which encodes p16, plays a significant role in the cell cycle by regulating the initiation of DNA replication. A simplified diagram shows select genes that play a major role in the cell cycle. By determining the genes co-regulated with CDKN2A, CORD identifies many genes known to play major roles in the cell cycle (bolded text). B) The CDKN2A-correlated genes were over-represented in several KEGG pathways related to cancer and the cell cycle, including “DNA replication”, “p53 signaling”, and “cell cycle.”
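Pathway over-representation of the kind reported in panel B is often assessed with a one-sided hypergeometric test. The caption does not say which test was used, so the sketch below is an assumption rather than the authors' method; the function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, list_size, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    universe_size: number of genes that could appear in the list
    pathway_size:  number of those genes annotated to the pathway
    list_size:     number of correlated genes being tested
    hits:          number of correlated genes that fall in the pathway
    """
    max_hits = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


# Toy numbers: 20 genes in the universe, 5 in the pathway,
# 4 correlated genes of which 3 are pathway members.
p_value = hypergeom_enrichment_p(20, 5, 4, 3)
```

A real analysis would repeat this for every KEGG pathway and correct the resulting p-values for multiple testing.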
Figure 4. CORD results for VIM.
A) The epithelial-to-mesenchymal transition (EMT) and the mesenchymal-to-epithelial transition are important oncogenic pathways in which vimentin (VIM) plays a central role. Twelve of the 20 genes most correlated with VIM affect EMT. B) These transitions depend heavily on cell adhesion. The VIM-correlated genes were over-represented in several KEGG pathways related to cell adhesion and cancer, such as “ECM-receptor interaction”, “Focal adhesion”, and “Pathways in cancer.”
Top 20 genes co-expressed with vimentin (VIM), as identified by CORD.
| Gene symbol | Description | EMT | References |
|---|---|---|---|
| ANXA1 | annexin A1 | x | |
| S100A6 | S100 calcium binding protein A6 | x | |
| ANXA2 | annexin A2 | x | |
| CAPG | capping protein (actin filament), gelsolin-like | x | |
| LGALS1 | lectin, galactoside-binding, soluble, 1 | | |
| SPARC | secreted protein, acidic, cysteine-rich (osteonectin) | x | |
| S100A4 | S100 calcium binding protein A4 | x | |
| EMP3 | epithelial membrane protein 3 | x | |
| AXL | AXL receptor tyrosine kinase | x | |
| SERPINH1 | serpin peptidase inhibitor, clade H, member 1 | | |
| TIMP1 | TIMP metallopeptidase inhibitor 1 | x | |
| COL1A1 | collagen, type I, alpha 1 | x | |
| | calponin 2 | | |
| LGALS3 | lectin, galactoside-binding, soluble, 3 | | |
| FSTL1 | follistatin-like 1 | x | |
| GPX8 | glutathione peroxidase 8 (putative) | | |
| RBMS1 | RNA binding motif, single stranded interacting protein 1 | | |
| COL5A2 | collagen, type V, alpha 2 | | |
| NOTCH2 | notch 2 | x | |
| CMTM3 | CKLF-like MARVEL transmembrane domain containing 3 | | |
Figure 5. CORD results for MYOD1.
A) The differentiation of muscle stem cells (satellite cells) to myoblasts and ultimately to skeletal muscle is under the control of muscle regulatory factors, including the transcription factor MyoD. CORD output for MYOD1 demonstrates co-expression of other muscle regulatory factors, such as myogenin (MYOG), and of many genes implicated in muscle differentiation. B) The MYOD1-correlated genes were over-represented in several KEGG pathways relating to muscle, such as “Cardiac muscle contraction” and “Dilated cardiomyopathy.”