Herty Liany, Jagath C Rajapakse, R Krishna Murthy Karuturi.
Abstract
BACKGROUND: Differential co-expression (DCX) signifies a change in the degree of co-expression of a set of genes among different biological conditions. It has been used to identify differential co-expression networks or interactomes. Many algorithms have been developed for single-factor differential co-expression analysis and applied in a variety of studies. However, in many studies the samples are characterized by multiple factors, such as genetic markers, clinical variables and treatments, and no algorithm or methodology is available for multi-factor analysis of differential co-expression.
Keywords: Differential co-expression; Gene expression; Multi-factor analysis; MultiDCoX
Year: 2017 PMID: 29297310 PMCID: PMC5751780 DOI: 10.1186/s12859-017-1963-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Differential co-expression. A gene set is co-expressed in normal samples but not in disease samples.
Fig. 2 Illustration of $A_{mn}(I)$ for co-expression and non-co-expression. $A_{mn}(I)$ tends to be higher for tighter co-expression of a gene set, while it is close to 0 when there is no co-expression, as illustrated by the boxplots for the presence and absence of co-expression of gene sets.
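The caption does not spell out how $A_{mn}(I)$ is computed. As an illustration only, the sketch below assumes it is the product of the z-scored expression values of genes $m$ and $n$ in sample $I$. Averaged over samples, that product equals the Pearson correlation, so it is large for a tightly co-expressed gene set and near 0 otherwise. The function name and the simulated data are hypothetical, and the exact MultiDCoX definition may differ.

```python
import numpy as np


def pairwise_coexpression_scores(expr: np.ndarray) -> np.ndarray:
    """Return A[m, n, i] = z_m(i) * z_n(i) for a (genes x samples) matrix.

    ASSUMPTION: z-score products are used for illustration; this is not
    necessarily the A_mn(I) definition used by MultiDCoX.
    """
    z = (expr - expr.mean(axis=1, keepdims=True)) / expr.std(axis=1, keepdims=True)
    return z[:, None, :] * z[None, :, :]  # shape: genes x genes x samples


rng = np.random.default_rng(0)
n_samples = 200
driver = rng.normal(size=n_samples)
sets = {
    # five genes sharing a common driver plus small noise -> co-expressed
    "co-expressed": driver + 0.2 * rng.normal(size=(5, n_samples)),
    # five independent genes -> no co-expression
    "independent": rng.normal(size=(5, n_samples)),
}

mean_scores = {}
for label, expr in sets.items():
    A = pairwise_coexpression_scores(expr)
    off_diagonal = ~np.eye(expr.shape[0], dtype=bool)   # skip m == n pairs
    mean_scores[label] = float(A[off_diagonal].mean())  # average over pairs and samples
    print(f"{label}: mean score = {mean_scores[label]:.2f}")
```

Under this assumption the co-expressed set averages close to 1 and the independent set close to 0, which matches the qualitative behaviour described in the caption.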
Fig. 3 Flowchart of the MultiDCoX algorithm. It captures all four steps of the algorithm, which are applied to a dataset repeatedly until no additional DCX gene set is identified.
Fig. 4 Illustration of the selection of significance thresholds for coefficients. Density plots of all coefficients (on the simulation data) resulting from MultiDCoX model fitting for varying numbers of samples per stratum. The thresholds are chosen as the first valleys on either side of the central peak.
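The valley rule in the caption can be sketched as follows. This is a minimal illustration, assuming the central peak is the global maximum of a Gaussian kernel density estimate of the fitted coefficients. The bandwidth, grid and simulated coefficients are hypothetical choices, not values from the paper.

```python
from typing import Tuple

import numpy as np


def gaussian_kde(values: np.ndarray, grid: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gaussian kernel density estimate of `values`, evaluated at each point of `grid`."""
    u = (grid[:, None] - values[None, :]) / bandwidth
    return np.exp(-0.5 * u**2).sum(axis=1) / (values.size * bandwidth * np.sqrt(2 * np.pi))


def valley_thresholds(grid: np.ndarray, density: np.ndarray) -> Tuple[float, float]:
    """First local minimum of `density` on each side of its highest peak.

    ASSUMPTION: the highest peak is the central (null) peak around 0.
    """
    peak = int(np.argmax(density))
    left = peak
    while left > 0 and density[left - 1] < density[left]:
        left -= 1   # walk left while the density keeps falling
    right = peak
    while right < density.size - 1 and density[right + 1] < density[right]:
        right += 1  # walk right while the density keeps falling
    return float(grid[left]), float(grid[right])


# Hypothetical coefficients: a large null bulk near 0 plus two smaller DCX groups.
rng = np.random.default_rng(1)
coefs = np.concatenate([
    rng.normal(0.0, 0.2, size=5000),
    rng.normal(-1.5, 0.2, size=300),
    rng.normal(1.5, 0.2, size=300),
])
grid = np.linspace(-3.0, 3.0, 601)
lower, upper = valley_thresholds(grid, gaussian_kde(coefs, grid, bandwidth=0.1))
print(f"coefficients below {lower:.2f} or above {upper:.2f} are treated as significant")
```

Walking downhill from the peak stops at the first upturn, so a noisy density with small wiggles can yield thresholds that are too tight; a smoother bandwidth trades that risk against blurring the valleys.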
Fig. 5 Simulation results. The simulations were carried out with 5, 10 and 20 samples per stratum. Set 1 represents a gene set simulated to be co-expressed only in samples with B1 = −1, while Set 2 represents a gene set simulated to be co-expressed for B1 = 1 and B2 = 1. (a) FDR, (b) FNR, (c) failure rate of identifying DCX gene sets, (d) failure rate of identifying the DCX profile of DCX gene sets, and (e) FPR of DCX gene sets (non-DCX gene sets).
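The caption names FDR, FNR and FPR without defining them. Assuming the standard confusion-matrix definitions applied to gene membership (the paper may count genes or gene sets differently), they could be computed as below. The gene names and sets are made up for illustration.

```python
from typing import Dict, Set


def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when the denominator is empty."""
    return num / den if den else 0.0


def error_rates(predicted: Set[str], truth: Set[str], universe: Set[str]) -> Dict[str, float]:
    """Standard FDR, FNR and FPR for a predicted DCX gene set against the simulated truth."""
    tp = len(predicted & truth)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(universe - predicted - truth)
    return {
        "FDR": _ratio(fp, fp + tp),  # share of reported genes that are not truly DCX
        "FNR": _ratio(fn, fn + tp),  # share of true DCX genes that were missed
        "FPR": _ratio(fp, fp + tn),  # share of non-DCX genes wrongly reported
    }


universe = {f"g{i}" for i in range(20)}
truth = {f"g{i}" for i in range(5)}           # hypothetical simulated DCX gene set
predicted = {"g0", "g1", "g2", "g3", "g10"}   # hypothetical algorithm output
rates = error_rates(predicted, truth, universe)
print(rates)
```

Here four of five reported genes are correct and one true gene is missed, giving FDR = FNR = 0.2 and FPR = 1/15.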
Table 1 A gene set differentially co-expressed by p53 mutational status only (p-value = 2.75E-231, coefficient = 1.137); the other co-factors are insignificant (coefficient/p-value of 0.087/0.114 for ER and −0.063/0.028 for Grade). Co-expression of the set occurs in p53-mutated tumors only. ER-dependent differential expression, ER binding sites and p53 binding sites are also given for each gene in the set.
| No. | Gene | ER (DE) | ER Binding Site | p53 Binding Site | Gene Description |
|---|---|---|---|---|---|
| 1. | GFRA1 | Yes (up) | Yes (dist = 58.5 kb) | No | TGF-beta related neurotrophic factor receptor |
| 2. | FOXA1 | No | Yes (dist = 4.79 kb) | No | Forkhead box protein A1 |
| 3. | GATA3 | No | Yes (dist = 30.33 kb) | Yes | GATA binding protein 3 |
| 4. | SPDEF | No | Yes (dist = 1.15 kb) | No | SAM pointed domain containing ets transcription factor |
| 5. | ESR1 | Yes (up) | Yes (dist = 32.24 kb) | Yes | Estrogen receptor 1 |
| 6. | GAMT | No | dist >100 kb | Yes | guanidinoacetate N-methyltransferase |
| 7. | TOX3 | No | dist >100 kb | No | TOX high mobility group box family member 3 |
| 8. | AGR3 | Yes (up) | Yes (dist = 54.06 kb) | No | anterior gradient 3 homolog |
| 9. | SDR16C5 | No | dist >100 kb | No | Short-chain dehydrogenase/reductase family 16C member 5 |
| 10. | PIP | No | dist >100 kb | No | prolactin-induced protein |
| 11. | CYP2B7P1 | No | dist >100 kb | No | cytochrome P450, family 2, subfamily B, polypeptide 7 pseudogene 1 |
| 12. | SYTL5 | Yes (up) | Yes (dist = 94.21 kb) | No | synaptotagmin-like protein 5 |
| 13. | MKX | No | Yes (dist = 35.21 kb) | No | mohawk homeobox |
| 14. | REEP6 | No | dist >100 kb | No | receptor accessory protein 6 |
| 15. | AGR2 | Yes (up) | Yes (dist = 2.15 kb) | No | anterior gradient 2 homolog (Xenopus laevis) |
| 16. | ANKRD30A | No | dist >100 kb | No | ankyrin repeat domain 30A |
| 17. | CA12 | Yes (up) | Yes (dist = 56.69 kb) | Yes | Carbonate dehydratase XII |
| 18. | SCGB2A1 | No | dist >100 kb | No | secretoglobin, family 2A, member 1 |
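The coefficients reported for this gene set describe how strongly each factor shifts its co-expression. As a hedged sketch only, assume a per-sample co-expression score for the gene set is regressed on factors coded as ±1 (mirroring B1 = ±1 in the simulations). Per-factor coefficients can then be obtained by ordinary least squares. p-values are omitted, and the scores are synthetic, built so that only p53 status matters. This is not presented as the MultiDCoX fitting procedure.

```python
from itertools import product
from typing import Dict

import numpy as np


def fit_factor_coefficients(score: np.ndarray, factors: Dict[str, np.ndarray]) -> Dict[str, float]:
    """OLS fit of score ~ intercept + sum_k beta_k * factor_k (factors coded -1/+1)."""
    names = list(factors)
    design = np.column_stack([np.ones_like(score)] + [factors[n] for n in names])
    beta, *_ = np.linalg.lstsq(design, score, rcond=None)
    return {"intercept": float(beta[0]), **{n: float(b) for n, b in zip(names, beta[1:])}}


# Hypothetical full-factorial design: every combination of ER, p53 and Grade (+1/-1).
levels = np.array(list(product([-1.0, 1.0], repeat=3)))
factors = {"ER": levels[:, 0], "p53": levels[:, 1], "Grade": levels[:, 2]}

# Synthetic score: co-expression rises only with p53 mutation (+1), as for this gene set.
score = 0.3 + 1.1 * factors["p53"]

fits = fit_factor_coefficients(score, factors)
print({k: round(v, 3) for k, v in fits.items()})
```

With real data the score would be noisy, and standard errors would also be needed to obtain p-values such as those in the caption.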
Fig. 6 Co-expression plot of set 1 (Table 1) by p53 status in the breast cancer data. (a) Co-expression of gene set 1 (18 genes) across p53-mutant (p53+) tumor samples; the gray line indicates the mean expression of gene set 1. (b) Gene set 1 shows no co-expression in p53 wild-type (p53-) samples; the gray line indicates the mean expression of gene set 1.
Table 2 A gene set differentially co-expressed by ER status only (p-value = 1.34E-252, coefficient = −1.117); the other co-factors are insignificant (coefficient/p-value of 0.294/1.05E-51 for p53 and 0.095/9.33E-09 for Grade). Co-expression of the set occurs in ER-negative tumors only. ER-dependent differential expression, ER binding sites and p53 binding sites are also given for the gene set.
| No. | Gene | ER (DE) | ER Binding Site | Gene Description |
|---|---|---|---|---|
| 1. | BRCA2 | Yes (up) | dist >100 kb | breast cancer 2, early onset |
| 2. | ABCC3 | Yes (down) | Yes (dist = 20.96 kb) | ATP-binding cassette, sub-family C |
| 3. | ITGB6 | Yes (down) | dist >100 kb | integrin, beta 6 |
| 4. | ABCC11 | No | Yes (dist = 68.96 kb) | ATP-binding cassette, sub-family C (CFTR/MRP), member 11 |
| 5. | SNED1 | No | Yes (dist = 94.62 kb) | Insulin-responsive sequence DNA-binding protein 1 |
| 6. | NQO1 | Yes (down) | Yes (dist = 32.63 kb) | NAD(P)H dehydrogenase, quinone 1 |
| 7. | LOC254057 | No | NA | uncharacterized LOC254057 |
| 8. | SPDEF | No | Yes (dist = 1.159 kb) | SAM pointed domain containing ets transcription factor |
| 9. | FABP4 | No | Yes (dist = 1.159 kb) | fatty acid binding protein 4, adipocyte |
| 10. | CEACAM6 |  | Yes (dist = 19.05 kb) | carcinoembryonic antigen-related cell adhesion molecule 6 |
| 11. | DUSP4 | No | Yes (dist = 19.138 kb) | dual specificity phosphatase 4 |
| 12. | SERHL2 | No | Yes (dist = 32.63 kb) | serine hydrolase-like 2 |
| 13. | RBP4 | No | Yes (dist = 20.489 kb) | retinol binding protein 4, plasma |
| 14. | PTK6 |  | dist >100 kb | PTK6 protein tyrosine kinase 6 |
| 15. | TMC5 | No | dist >100 kb | transmembrane channel-like 5 |
| 16. | EEF1A2 | No | dist >100 kb | eukaryotic translation elongation factor 1 alpha 2 |
| 17. | CLIC3 |  | Yes (dist = 0.317 kb) | chloride intracellular channel 3 |
| 18. | LBP | No | dist >100 kb | lipopolysaccharide binding protein |
| 19. | MMP1 | No | dist >100 kb | matrix metallopeptidase 1 (interstitial collagenase) |
| 20. | FAM5C | No | dist >100 kb | family with sequence similarity 5, member C |
| 21. | AGR2 | Yes (up) | Yes (dist = 2.154 kb) | anterior gradient 2 homolog (Xenopus laevis) |
Fig. 7 Co-expression plot of set 2 (Table 2) by ER status in the breast cancer data. (a) Co-expression of gene set 2 (21 genes) in ER-negative tumor samples; the gray line indicates the sample-wise mean expression of the gene set. (b) Gene set 2 shows no co-expression in ER-positive tumor samples; the gray line indicates the mean expression.
Examples of gene sets whose co-expression is influenced by more than one factor. (1) The gene set in the 1st row, containing CXCL13 and MMP1, is differentially co-expressed by the ER and Grade covariates. Its co-expression occurs only in ER-positive tumors and higher-grade tumors (referred to as Grade+); joint occurrence of ER+ and Grade 3 results in higher co-expression. (2) The gene set in the 2nd row is differentially co-expressed by all covariates. Its co-expression occurs in ER-positive, p53-negative and higher-grade (Grade+) tumors; joint occurrence of ER+, p53- and Grade 3 results in higher co-expression. (3) The gene set in the 3rd row is differentially co-expressed by the ER and p53 covariates. Its co-expression occurs in ER-negative and p53-positive tumors; joint occurrence of ER- and p53+ results in higher co-expression.
| Co-expression | Genes | ER coefficient | ER p-value | p53 coefficient | p53 p-value | Grade coefficient | Grade p-value |
|---|---|---|---|---|---|---|---|
| ER+ & Grade+ | HORMAD1, SCGB1D2, ABCB1, IGHM, CXCL13, FAM20B, IGK, CCL18, LOC100291464, FCRL5, IGHA1, LOC100293440, IGL, IGLV1-44, IGH, IGKV4-1, IGHD, LOC100130100, FABP7, NKG7, MMP1, PIGR, LOC652493 | 0.592 | 3.64E-05 | −0.343 | 0.000 | 1.525 | 5.05E-24 |
| ER+ & p53- & Grade+ | CLEC3A, MUC5B, RAD51C, CYP2A6, CHGB, CARTPT, GRIA2, INSM1, NTS, PCSK1 | 0.662 | 3.16E-05 | −0.818 | 4.14E-17 | 1.124 | 1.77E-40 |
| ER- & p53+ | FMO5, VGLL1, FABP7, GABRP, PKP1, TFCP2L1, NRTN, KRT15, PTX3, KRT16, MIA, CTAG1A, ELF5, HORMAD1, C8orf46, FAM150B | −0.906 | 7.80E-14 | 0.844 | 8.84E-23 | 0.127 | 0.305 |
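The co-expression profiles in the first column follow the signs of the coefficients: a positive coefficient corresponds to co-expression at the "+" level of a factor and a negative one to the "-" level. The sketch below turns (coefficient, p-value) pairs into such labels. It assumes a factor counts only when its p-value is below a cut-off and its coefficient lies outside lower/upper coefficient thresholds. The cut-offs used here (±0.5 and 0.05) are hypothetical placeholders chosen so that the rule reproduces the profiles in this table; they are not values reported by the paper.

```python
from typing import Dict, Tuple

# Hypothetical cut-offs (placeholders, not values from the paper).
LOWER_COEF = -0.5
UPPER_COEF = 0.5
ALPHA = 0.05


def dcx_profile(fits: Dict[str, Tuple[float, float]],
                lower: float = LOWER_COEF,
                upper: float = UPPER_COEF,
                alpha: float = ALPHA) -> str:
    """Map {factor: (coefficient, p-value)} to a profile label such as 'ER+ & Grade+'."""
    parts = []
    for factor, (coef, pval) in fits.items():   # keeps the column order of the table
        if pval >= alpha:
            continue                              # not statistically significant
        if coef > upper:
            parts.append(f"{factor}+")            # co-expression at the +1 level
        elif coef < lower:
            parts.append(f"{factor}-")            # co-expression at the -1 level
    return " & ".join(parts) if parts else "no DCX"


# Coefficients and p-values copied from the three rows of the table above.
table_rows = [
    {"ER": (0.592, 3.64e-05), "p53": (-0.343, 0.0), "Grade": (1.525, 5.05e-24)},
    {"ER": (0.662, 3.16e-05), "p53": (-0.818, 4.14e-17), "Grade": (1.124, 1.77e-40)},
    {"ER": (-0.906, 7.80e-14), "p53": (0.844, 8.84e-23), "Grade": (0.127, 0.305)},
]
for row in table_rows:
    print(dcx_profile(row))
```

The three rows print "ER+ & Grade+", "ER+ & p53- & Grade+" and "ER- & p53+". A p-value cut-off alone would not reproduce the first row: its p53 coefficient (−0.343) has p = 0.000 yet is not part of the profile, which is consistent with a coefficient threshold also being applied.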