Yuna Blum, Guillaume Le Mignon, Sandrine Lagarrigue, David Causeur.
Abstract
BACKGROUND: Microarray technology allows the simultaneous analysis of thousands of genes within a single experiment. Significance analyses of transcriptomic data ignore the gene dependence structure. This leads to correlation among test statistics which affects a strong control of the false discovery proportion. A recent method called FAMT allows capturing the gene dependence into factors in order to improve high-dimensional multiple testing procedures. In the subsequent analyses aiming at a functional characterization of the differentially expressed genes, our study shows how these factors can be used both to identify the components of expression heterogeneity and to give more insight into the underlying biological processes.
Year: 2010 PMID: 20598132 PMCID: PMC2911460 DOI: 10.1186/1471-2105-11-368
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Structure of the illustrative data sets. Representation of three illustrative studies consisting of 1000 genes on 20 arrays divided between two groups. (A) Case 1: one independent grouping variable with red and green levels affecting all genes. (B) Case 2: one independent grouping variable with red and green levels affecting a gene set. (C) Case 3: two independent grouping variables, one with red and green levels and one with blue and orange levels, each affecting a different gene set.
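A data set with the layout of case 3 can be sketched as follows. This is only an illustration under stated assumptions: the gene-set size, the effect size, the Gaussian noise and the function name `simulate_case3` are hypothetical and are not taken from the study, and the second grouping variable is a random permutation of the first, which makes it only approximately independent.

```python
import numpy as np

def simulate_case3(n_genes=1000, n_arrays=20, set_size=200, effect=2.0, seed=0):
    """Simulate an arrays x genes matrix with two grouping variables.

    set_size and effect are illustrative assumptions, not values from the study.
    """
    rng = np.random.default_rng(seed)
    # First grouping variable: 10 arrays per level (e.g. red / green).
    g1 = np.repeat([0, 1], n_arrays // 2)
    # Second grouping variable (e.g. blue / orange): a random reshuffle of the first.
    g2 = rng.permutation(g1)
    # Baseline expression: independent standard normal noise.
    expr = rng.normal(size=(n_arrays, n_genes))
    # Each grouping variable shifts its own, disjoint gene set.
    set1 = np.arange(0, set_size)
    set2 = np.arange(set_size, 2 * set_size)
    expr[np.ix_(g1 == 1, set1)] += effect
    expr[np.ix_(g2 == 1, set2)] += effect
    return expr, g1, g2, set1, set2

expr, g1, g2, set1, set2 = simulate_case3()
print(expr.shape)
```

Case 2 corresponds to dropping the second shift, and case 1 to applying the first shift to every gene.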
Figure 2. Illustrative data sets: individuals and genes representations. Individuals and genes are represented using, respectively, the Z matrix of factors and the B matrix of loadings found by FAMT. Individuals and genes are colored according to the independent variable that affects them. (A) Representations corresponding to case 1, (B) case 2 and (C) case 3.
Enrichment tests for the list of 287 genes and 688 genes
| LIST OF 287 GENES | |||||
|---|---|---|---|---|---|
| GOID | GO Term | Size | Count | Pvalue | HGNC ID |
| GO.0006470 | | 56 | 5 | 0.015 | ACP1, PTPN14, PTPRE, PTP4A3, PTPN6 |
| GO.0006725 | | 38 | 4 | 0.017 | PPME1, GART, MOCS1, ALDH6A1 |
| GO.0007259 | | 9 | 2 | 0.022 | SOCS1, STAMBP |
| GO.0043543 | protein amino acid acylation | 9 | 2 | 0.022 | NULL, ZDHHC17 |
| GO.0044259 | multicellular macromolecule metabolic process | 10 | 2 | 0.027 | ACE2, SERPINH1 |
| GO.0008033 | tRNA processing | 26 | 3 | 0.0296 | TSEN15, FARS2, NSUN2 |
| GO.0033002 | muscle cell proliferation | 11 | 2 | 0.032 | NOX1, BMP10 |
| GO.0050730 | regulation of peptidyl tyrosine phosphorylation | 12 | 2 | 0.038 | SOCS1, EGFR |
| Kegg ID | Kegg pathway | Size | Count | Pvalue | HGNC ID |
| map04320 | | 9 | 3 | 2.38E-03 | EGFR, SPIRE1, ETS1 |
| LIST OF 688 GENES | |||||
| GOID | GO Term | Size | Count | Pvalue | HGNC ID |
| GO.0006470 | | 56 | 10 | 1.80E-03 | ACP1, PPM1E, PTPN14, PTPRE, PTP4A3, PPM1G, PTPRU, PPP3CB, PPM1L, PTPRF |
| GO.0046483 | heterocycle metabolic process | 33 | 7 | 3.21E-03 | AMBP, GART, P4HA2, HMOX2, AFMID, MTHFS, ALDH6A1 |
| GO.0051186 | cofactor metabolic process | 64 | 10 | 4.97E-03 | AMBP, TXNRD3, NOX1, HMOX2, AFMID, GGT7, MTHFS, MOCS1, HMGCS1, ACO2 |
| GO.0016202 | regulation of striated muscle development | 15 | 4 | 0.011 | MBNL3, LEF1, NRG1, BMP4 |
| GO.0007259 | | 9 | 3 | 0.014 | SOCS1, HCLS1, STAMBP |
| GO.0040011 | locomotion | 111 | 13 | 0.017 | PRKG1, EDNRB, ACE2, NOX1, EGFR, NRG1, BMP10, ARAP3, JPH3, VHL, VAX1, DAB1, LAMA2 |
| GO.0001932 | regulation of protein amino acid phosphorylation | 26 | 5 | 0.019 | PDGFA, SOCS1, HCLS1, EGFR, BMP4 |
| GO.0048585 | negative regulation of response to stimulus | 10 | 3 | 0.020 | AMBP, PPP3CB, FABP7 |
| GO.0006534 | cysteine metabolic process | 4 | 2 | 0.021 | CBS, CDO1 |
| GO.0002274 | myeloid leukocyte activation | 11 | 3 | 0.026 | IRF4, LCP2, NDRG1 |
| GO.0006725 | | 38 | 6 | 0.026 | PPME1, GART, AFMID, MTHFS, MOCS1, ALDH6A1 |
| GO.0007185 | transmembrane receptor tyrosine phosphatase signaling | 5 | 2 | 0.033 | PTPRE, PTPRF |
| GO.0007271 | synaptic transmission cholinergic | 5 | 2 | 0.033 | CHRNA4, LAMA2 |
| GO.0000097 | sulfur amino acid biosynthetic process | 5 | 2 | 0.033 | CBS, CDO1 |
| GO.0006700 | | 5 | 2 | 0.033 | STAR, CYP17A1 |
| GO.0006787 | porphyrin catabolic process | 5 | 2 | 0.033 | AMBP, HMOX2 |
| GO.0001764 | neuron migration | 12 | 3 | 0.033 | PRKG1, VAX1, DAB1 |
| GO.0030509 | BMP signaling pathway | 21 | 4 | 0.036 | SOSTDC1, BMP10, MSX2, BMP4 |
| GO.0045321 | leukocyte activation | 64 | 8 | 0.040 | SWAP70, CHRNA4, FKBP1B, IRF4, LCP2, PPP3CB, NDRG1, SFRS17A |
| GO.0006790 | sulfur metabolic process | 32 | 5 | 0.043 | CBS, CDO1, TXNRD3, GGT7, CHST1 |
| GO.0018193 | peptidyl amino acid modification | 43 | 6 | 0.045 | PDGFA, SOCS1, P4HA2, HCLS1, EGFR, MAP2 |
| GO.0008211 | glucocorticoid metabolic process | 6 | 2 | 0.048 | STAR, CYP17A1 |
| GO.0006769 | nicotinamide metabolic process | 6 | 2 | 0.048 | NOX1, AFMID |
| GO.0030111 | regulation of Wnt receptor signaling pathway | 14 | 3 | 0.050 | SENP2, LEF1, SENP2 |
| Kegg ID | Kegg pathway | Size | Count | Pvalue | HGNC ID |
| map00630 | Glyoxylate and dicarboxylate metabolism | 9 | 4 | 1.87E-03 | GLYCTK, HYI, AFMID, ACO2 |
| map00140 | | 6 | 3 | 5.11E-03 | |
| map04320 | | 9 | 3 | 0.018 | EGFR, SPIRE1, ETS1 |
| map04012 | ErbB signaling pathway | 35 | 6 | 0.026 | PIK3R5, PLCG1, PAK3, EGFR, NRG1, PTK2 |
Enrichment tests were performed using an R program (see Methods section) with GO BP terms and Kegg pathways. The tests were done on the list of 287 genes found using the classical approach and the list of 688 genes found by FAMT. For each enriched term, the identifier (ID), the biological term, the size of the whole list of genes related to the term (size), the number of genes in the sub-list related to the term (count), the pvalue of the test and the HGNC Hugo abbreviations of the related genes are given. Italic terms are those which are present in both lists.
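The size and count columns are the inputs of a standard over-representation test. The sketch below uses the upper tail of the hypergeometric distribution, which is a common choice for this kind of test. The excerpt does not state which test the R program implements, so this is an assumption, and the function name `enrichment_pvalue` is hypothetical. The size of the annotated gene universe is also not given here, so the p-values in the table cannot be reproduced from this snippet alone.

```python
from math import comb

def enrichment_pvalue(universe, term_size, list_size, count):
    """Upper-tail hypergeometric probability P(X >= count).

    universe  : number of annotated genes considered (not given in this excerpt)
    term_size : genes annotated to the term in the universe ("size")
    list_size : genes in the tested list (e.g. 287 or 688)
    count     : genes of the list annotated to the term ("count")
    """
    upper = min(term_size, list_size)
    total = comb(universe, list_size)
    tail = sum(
        comb(term_size, k) * comb(universe - term_size, list_size - k)
        for k in range(count, upper + 1)
    )
    return tail / total

# Toy example with small numbers: 5 of 5 list genes fall in a 5-gene term
# drawn from a 10-gene universe.
print(enrichment_pvalue(universe=10, term_size=5, list_size=5, count=5))
```

With real data, `universe` would be the number of genes on the array carrying at least one annotation of the relevant type, which is an input the analyst has to supply.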
Figure 3. Principal component analysis: individuals representation. Birds are colored according to their weight split into 3 classes: lean (green), intermediate (red) and fat (black). (A) PCA generated with the raw expression of the 688 differentially expressed genes. (B) PCA generated with the factor-adjusted expression of the 688 differentially expressed genes.
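The comparison between raw and factor-adjusted expression can be sketched as follows, assuming that the factor-adjusted data are obtained by subtracting the factor component Z B^T from the expression matrix. This is an assumption based on the Z (factors) and B (loadings) matrices mentioned in this record. The sketch takes Z and B as given, does not implement the FAMT estimation itself, and uses hypothetical function names.

```python
import numpy as np

def factor_adjust(expr, Z, B):
    """Subtract the factor component Z B^T from an arrays x genes matrix.

    expr : (n_arrays, n_genes) expression matrix
    Z    : (n_arrays, n_factors) factor scores, assumed already estimated
    B    : (n_genes, n_factors) loadings, assumed already estimated
    """
    return expr - Z @ B.T

def pca_scores(expr, n_components=2):
    """Coordinates of the individuals on the first principal components."""
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix: rows of U scaled by singular values are the scores.
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

# Toy data: noise plus a two-factor dependence structure.
rng = np.random.default_rng(1)
noise = rng.normal(size=(20, 50))
Z = rng.normal(size=(20, 2))
B = rng.normal(size=(50, 2))
raw = noise + Z @ B.T

scores_raw = pca_scores(raw)
scores_adjusted = pca_scores(factor_adjust(raw, Z, B))
print(scores_raw.shape, scores_adjusted.shape)
```

Plotting the two sets of scores side by side, colored by weight class, would give the kind of comparison shown in panels (A) and (B).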
Figure 4. eQTL mapping for DHCR7 on chromosome 5. The LRT curves for the gene DHCR7 are shown in blue (solid line for the factor-adjusted analysis, dotted line for the raw analysis). The LRT curve for the AF trait is shown in red. The solid curves reveal a QTL/eQTL for the two traits in the same region, around 175 cM on chromosome GGA5. This colocalization is not revealed by the raw analysis. The 5% significance level is represented by the horizontal green line. Genetic distance (cM) and likelihood ratio (LR) are shown on the X-axis and Y-axis, respectively.
Figure 5. AF data set: individuals representation for each factor. Individuals are represented on the factor using the B matrix of loadings found by FAMT. The individuals are colored according to the variable "hatch", which has 4 levels.
Description of the factors extracted from the raw data and from the hatch and weight adjusted data
| Factors extracted from the raw AF expression dataset | |||||||
|---|---|---|---|---|---|---|---|
| Factor 1 | 0.139 | 0.129 | 0.074 | 0.179 | |||
| Factor 2 | 0.074 | 0.913 | 0.041 | 0.857 | |||
| Factor 3 | 0.848 | 0.489 | 0.716 | 0.376 | |||
| Factor 4 | 0.127 | 0.959 | 0.707 | 0.167 | |||
| Factor 5 | 0.435 | 0.217 | 0.884 | 0.529 | |||
| Factor 6 | 0.946 | 0.412 | 0.615 | 0.876 | |||
| Factors extracted from the hatch and weight adjusted AF expression dataset | |||||||
| Factor 1 | 0.219 | 0.156 | 0.078 | 0.209 | |||
| Factor 2 | 0.052 | 0.841 | 0.036 | 0.814 | |||
| Factor 3 | 0.049 | 0.819 | 0.569 | 0.554 | 0.16 | ||
| Factor 4 | 0.178 | 0.031 | 0.869 | 0.897 | 0.885 | ||
| Factor 5 | 0.949 | 0.727 | 0.647 | 0.291 | |||
The p-value is given for each association test (see Methods section). Considering a threshold of 1%, the significant p-values are in bold.
Biological terms characterizing factor 1 and 2
| FACTOR1 | ||||
|---|---|---|---|---|
| GO ID | GO Term | Size | Count | Pvalue |
| GO.0007051 | spindle organization | 6 | 2 | 8.20E-03 |
| GO.0050931 | pigment cell differentiation | 7 | 2 | 0.011 |
| GO.0000279 | M phase of meiotic cell cycle | 79 | 6 | 0.012 |
| GO.0000079 | regulation of cyclin dependent protein kinase activity | 9 | 2 | 0.019 |
| GO.0016570 | histone modification | 13 | 2 | 0.038 |
| GO.0015698 | inorganic anion transport | 53 | 4 | 0.039 |
| GO.0007156 | homophilic cell adhesion | 32 | 3 | 0.041 |
| Kegg ID | Kegg pathway | Size | Count | Pvalue |
| map05216 | Thyroid cancer | 11 | 2 | 0.020 |
| map05130 | Pathogenic Escherichia coli infection | 12 | 2 | 0.024 |
| map04520 | Adherens junction | 31 | 3 | 0.024 |
| FACTOR2 | ||||
| GO ID | GO Term | Size | Count | Pvalue |
| GO.0006195 | purine nucleotide catabolic process | 5 | 3 | 2.23E-05 |
| GO.0030168 | platelet activation | 7 | 3 | 7.65E-05 |
| GO.0007051 | spindle organization | 6 | 2 | 2.55E-03 |
| GO.0007596 | blood coagulation | 24 | 3 | 3.76E-03 |
| GO.0030336 | negative regulation of cell migration | 12 | 2 | 0.011 |
| GO.0032879 | regulation of localization | 103 | 5 | 0.012 |
| GO.0001775 | cell activation | 74 | 4 | 0.016 |
| GO.0001890 | placenta development | 18 | 2 | 0.023 |
| GO.0017038 | protein import | 49 | 3 | 0.027 |
| GO.0006403 | RNA localization | 22 | 2 | 0.034 |
| GO.0006816 | calcium ion transport | 56 | 3 | 0.038 |
| Kegg ID | Kegg pathway | Size | Count | Pvalue |
| map00230 | Purine metabolism | 64 | 4 | 0.013 |
Enrichment tests were performed on the genes contributing the most to the construction of factors 1 and 2, using GO BP terms and Kegg pathways. For each enriched term, the identifier (ID), the biological term, the size of the whole list of genes related to the term (size), the number of genes in the sub-list related to the term (count) and the p-value of the test are given.
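One simple way to pick the genes that contribute most to a factor is to rank them by the absolute value of their loading on that factor. The excerpt does not state the selection criterion actually used, so the ranking rule, the cut-off `k`, the function name and the gene names below are all hypothetical.

```python
def top_contributing_genes(loadings, gene_names, k=3):
    """Return the k genes with the largest absolute loading on one factor.

    loadings   : one column of the loadings matrix, one value per gene
    gene_names : gene identifiers in the same order as loadings
    """
    ranked = sorted(
        zip(gene_names, loadings),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]

# Toy example with made-up gene names and loadings.
print(top_contributing_genes([0.1, -0.9, 0.5], ["geneA", "geneB", "geneC"], k=2))
```

The resulting sub-list is what would then be passed to an enrichment test against GO BP terms and Kegg pathways.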