Paolo Martini, Davide Risso, Gabriele Sales, Chiara Romualdi, Gerolamo Lanfranchi, Stefano Cagnin.
Abstract
BACKGROUND: Over the last decades, microarray technology has spread widely, leading to a dramatic increase in publicly available datasets. The first statistical tools focused on identifying significantly differentially expressed genes. Later, researchers moved toward the systematic integration of gene expression profiles with additional biological information, such as chromosomal location, ontological annotations or sequence features. Analyzing gene expression in relation to the physical location of genes on chromosomes allows the identification of transcriptionally imbalanced regions, while Gene Set Analysis focuses on detecting coordinated changes in transcriptional levels among sets of biologically related genes. In this field, meta-analysis makes it possible to compare different studies that address the same biological question, and thus to fully exploit public gene expression datasets.
Year: 2011 PMID: 21481242 PMCID: PMC3094239 DOI: 10.1186/1471-2105-12-92
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Enlargement of chromosomal regions related to the leukaemia phenotype. Details of imbalanced regions calculated by STEPath chromosome mapping. The blue line represents the chromosome profile; red and light green bars represent gene statistic values (d-score). A. Enlargement of the region of chromosome 11 containing the MLL gene (gene highlighted by the circle). B. Enlargement of the region between 20 and 32 Mbp of chromosome 7. This region corresponds to the location of the HOX gene cluster (cluster highlighted by the circle). C. Enlargement of the region between 51 and 75 Mbp of chromosome 2 corresponding to the MEIS1 window (gene highlighted by the circle). D. Enlargement of the region of chromosome 15 containing the NG2 gene (gene highlighted by the circle).
Figure 2. Comparison among LAP, MACAT and STEPath. Comparison of imbalanced regions on chromosomes 2, 7, 11, and 15 identified by LAP, MACAT and STEPath. The LAP procedure fails to identify as imbalanced the MLL region on chromosome 11, the HOX gene cluster on chromosome 7, and the region containing CSPG4 (also known as NG2) on chromosome 15, but it does identify the MEIS1 region on chromosome 2. MACAT also fails to detect the MLL region on chromosome 11.
Comparison of GSEA approaches
| Rank | STEPath | STEPath - no correction |
|---|---|---|
| 1 | BioCarta;Erythropoietin mediated neuroprotection through NF-kB | BioCarta;Erythropoietin mediated neuroprotection through NF-kB |
| 2 | SuperArray;Homeobox (HOX) Genes | BioCarta;Induction of apoptosis through DR3 and DR4/5 Death Receptors |
| 3 | BioCarta;The IGF-1 Receptor and Longevity | BioCarta;Roles of β-arrestin-dependent Recruitment of Src Kinases in GPCR Signaling |
| 4 | BioCarta;Induction of apoptosis through DR3 and DR4/5 Death Receptors | SuperArray;Homeobox (HOX) Genes |
| 5 | BioCarta;IL12 and Stat4 Dependent Signaling Pathway in Th1 Development | BioCarta;HIV-I Nef negative effector of Fas and TNF |
| 6 | BioCarta;HIV-I Nef negative effector of Fas and TNF | TCA Cycle;Metabolic Process |
| 7 | BioCarta;Roles of β-arrestin-dependent Recruitment of Src Kinases in GPCR Signaling | hsa00310;Lysine degradation |
| 8 | TCA Cycle;Metabolic Process | hsa03018;RNA degradation |
| 9 | hsa00310;Lysine degradation | hsa05014;Amyotrophic lateral sclerosis (ALS) |
| 10 | hsa03018;RNA degradation | SuperArray;Stress/Toxicity PathwayFinder |

| Rank | GSEA - limma | sigPathway |
|---|---|---|
| 1 | B Cell Receptor Signaling Pathway;Cellular Process | BioCarta;Caspase Cascade in Apoptosis |
| 2 | hsa03018;RNA degradation | KEGG:03050;Proteasome |
| 3 | SuperArray;G-Proteins/Signaling Molecules | KEGG:04130;SNARE interactions in vesicular transport |
| 4 | TNF-alpha/NF-kB Signaling Pathway;Cellular Process | SuperArray;Heat Shock Proteins |
| 5 | hsa00510;N-Glycan biosynthesis | Proteasome Degradation;Physiological Process |
| 6 | hsa04142;Lysosome | KEGG:00380;Tryptophan metabolism |
| 7 | SuperArray;Autophagy | BioCarta;FAS signaling pathway (CD95) |
| 8 | BioCarta;Erk and PI-3 Kinase Are Necessary for Collagen Binding in Corneal Epithelia | KEGG:04612;Antigen processing and presentation |
| 9 | Translation Factors;Cellular Process | KEGG:03020;RNA polymerase |
| 10 | BioCyc;glyoxylate cycle II | KEGG:00020;Citrate cycle (TCA cycle) |
Rank comparison of the tested GSA methods for the 10 most up-regulated gene sets. STEPath is the only procedure that identified the activated HOX gene set, and it ranked this gene set highest when using expression values corrected by the chromosome profile.
GSEA approach results running limma GSEA with and without chromosome profile correction
| Rank | GSEA - limma | q-value | GSEA - limma - corrected | q-value |
|---|---|---|---|---|
| 1 | B Cell Receptor Signaling Pathway;Cellular Process | 1.62E-05 | B Cell Receptor Signaling Pathway;Cellular Process | 2.68E-05 |
| 2 | hsa03018;RNA degradation | 1.44E-03 | SuperArray;G-Proteins/Signaling Molecules | 1.33E-03 |
| 3 | SuperArray;G-Proteins/Signaling Molecules | 1.53E-03 | hsa03018;RNA degradation | 2.31E-03 |
| 4 | TNF-alpha/NF-kB Signaling Pathway;Cellular Process | 3.79E-03 | TNF-alpha/NF-kB Signaling Pathway;Cellular Process | 2.93E-03 |
| 5 | hsa00510;N-Glycan biosynthesis | 4.20E-03 | Translation Factors;Cellular Process | 5.30E-03 |
| 6 | hsa04142;Lysosome | 8.78E-03 | hsa00510;N-Glycan biosynthesis | 5.40E-03 |
| 7 | SuperArray;Autophagy | 8.95E-03 | hsa04142;Lysosome | 6.40E-03 |
| 8 | BioCarta;Erk and PI-3 Kinase Are Necessary for Collagen Binding in Corneal Epithelia | 9.36E-03 | SuperArray;Autophagy | 8.21E-03 |
| 9 | Translation Factors;Cellular Process | 9.38E-03 | hsa05110;Vibrio cholerae infection | 9.42E-03 |
| 10 | BioCyc;glyoxylate cycle II | 1.06E-02 | BioCarta;Erk and PI-3 Kinase Are Necessary for Collagen Binding in Corneal Epithelia | 9.42E-03 |
The limma GSEA algorithm was run with and without the chromosome profile correction. The significance of the differentially expressed gene sets increases compared with the results obtained without the correction, suggesting that the correction targets disease-related genes.
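The rank changes between the two columns of the table above can be tabulated with a short script. This is only an illustrative sketch: the two lists are copied from the table, while the helper name `rank_shifts` is hypothetical and not part of limma or STEPath.

```python
# Top-10 gene sets copied from the table above.
uncorrected = [
    "B Cell Receptor Signaling Pathway;Cellular Process",
    "hsa03018;RNA degradation",
    "SuperArray;G-Proteins/Signaling Molecules",
    "TNF-alpha/NF-kB Signaling Pathway;Cellular Process",
    "hsa00510;N-Glycan biosynthesis",
    "hsa04142;Lysosome",
    "SuperArray;Autophagy",
    "BioCarta;Erk and PI-3 Kinase Are Necessary for Collagen Binding in Corneal Epithelia",
    "Translation Factors;Cellular Process",
    "BioCyc;glyoxylate cycle II",
]
corrected = [
    "B Cell Receptor Signaling Pathway;Cellular Process",
    "SuperArray;G-Proteins/Signaling Molecules",
    "hsa03018;RNA degradation",
    "TNF-alpha/NF-kB Signaling Pathway;Cellular Process",
    "Translation Factors;Cellular Process",
    "hsa00510;N-Glycan biosynthesis",
    "hsa04142;Lysosome",
    "SuperArray;Autophagy",
    "hsa05110;Vibrio cholerae infection",
    "BioCarta;Erk and PI-3 Kinase Are Necessary for Collagen Binding in Corneal Epithelia",
]


def rank_shifts(before, after):
    """Map each gene set in `before` to (rank_before, rank_after).

    rank_after is None when the gene set is absent from `after`.
    Ranks are 1-based, matching the table.
    """
    after_rank = {name: i + 1 for i, name in enumerate(after)}
    return {name: (i + 1, after_rank.get(name)) for i, name in enumerate(before)}


shifts = rank_shifts(uncorrected, corrected)
for name, (old, new) in shifts.items():
    print(f"{old:>2} -> {new if new is not None else 'out of top 10'}: {name}")
```

Only the ranks are compared here; the q-values in the table would need a separate comparison.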
Details of muscle disease dataset.
| Disease | Number of samples | Case study | Series ID | Platform | Description |
|---|---|---|---|---|---|
| LGMD2A | 10 | L/S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, LIMB-GIRDLE, TYPE 2A (calpainopathy) |
| nLGMD2A | 10 | L/S | GSE11681 | HGU133A | MUSCULAR DYSTROPHY, LIMB-GIRDLE, TYPE 2A (calpainopathy) |
| LGMD2B | 10 | L/S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, LIMB-GIRDLE, TYPE 2B (Dysferlinopathy, Miyoshi distal myopathy) |
| LGMD2I | 7 | L/S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, LIMB-GIRDLE, TYPE 2I |
| BMD | 5 | S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, BECKER TYPE |
| DMD | 10 | S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, PSEUDOHYPERTROPHIC PROGRESSIVE, DUCHENNE TYPE |
| FSHD | 14 | S | GSE3307 | HGU133A | MUSCULAR DYSTROPHY, FACIOSCAPULOHUMERAL |
| AQM | 5 | S | GSE3307 | HGU133A | ACUTE QUADRIPLEGIC MYOPATHY |
| SPG4 | 4 | S | GSE3307 | HGU133A | SPASTIC PARAPLEGIA 4, AUTOSOMAL DOMINANT |
| ALS | 9 | S | GSE3307 | HGU133A | AMYOTROPHIC LATERAL SCLEROSIS 1 |
| X_EDMD | 4 | S | GSE3307 | HGU133A | EMERY-DREIFUSS MUSCULAR DYSTROPHY, 1 (X-linked) |
| AD_EDMD | 4 | S | GSE3307 | HGU133A | EMERY-DREIFUSS MUSCULAR DYSTROPHY, AUTOSOMAL DOMINANT |
| AbNORM | 11 | L/S | GSE3307 | HGU133A | NORMAL |
| Ctrl | 10 | L/S | GSE11681 | HGU133A | NORMAL |
General information about muscular disease meta-dataset. S: Skeletal Muscular disease dataset; L: LGMD analysis.
Figure 3. LGMD analysis workflow cartoon. 1) Independent application of STEPath to the N datasets considered (e.g., for the analysis of LGMDs: LGMD2A, LGMD2B and LGMD2I from GSE3307, and nLGMD2A from GSE11681). 2) Selection of the Main Gene set Signature (MGS) from the GSE3307 dataset. Gene sets were selected if their expression value was higher or lower (for up- or down-regulated regions, respectively) than the average expression of all significant gene sets. 3) Extraction of the MGS expression values from all datasets considered. 4) Matrix construction. 5) Normalization and cluster analysis.
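Steps 2 to 5 of this workflow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the scores are made-up toy values, every gene set in the reference samples is assumed to be significant, the selection thresholds are read as the mean of the positive and the mean of the negative reference scores, normalization is a per-sample z-score, and a nearest-neighbour lookup on Euclidean distance stands in for the full cluster analysis. The names `select_mgs`, `zscore` and `nearest` are hypothetical.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical gene set scores per sample group (toy values, not from the paper).
scores = {
    "LGMD2A":  {"gs1": 3.0, "gs2": 0.5, "gs3": -2.5, "gs4": -0.4},
    "LGMD2B":  {"gs1": 2.0, "gs2": 0.6, "gs3": -0.5, "gs4": -3.0},
    "nLGMD2A": {"gs1": 2.8, "gs2": 0.4, "gs3": -2.0, "gs4": -0.2},
}
reference = ["LGMD2A", "LGMD2B"]  # groups from the reference dataset


def select_mgs(scores, reference):
    """Step 2: keep gene sets that, in at least one reference group, lie
    above the mean positive score or below the mean negative score."""
    pooled = [v for s in reference for v in scores[s].values()]
    up_cut = mean(v for v in pooled if v > 0)
    down_cut = mean(v for v in pooled if v < 0)
    gene_sets = sorted(scores[reference[0]])
    return [
        g for g in gene_sets
        if any(scores[s][g] > up_cut or scores[s][g] < down_cut for s in reference)
    ]


def zscore(column):
    """Step 5 (normalization): centre and scale one sample's MGS values."""
    m, sd = mean(column), pstdev(column)
    return [(v - m) / sd for v in column]


mgs = select_mgs(scores, reference)

# Steps 3 and 4: extract MGS values from every group and build the matrix
# (stored here as one normalized column per group).
profiles = {s: zscore([scores[s][g] for g in mgs]) for s in scores}


def nearest(sample):
    """Step 5 (simplified): closest other group by Euclidean distance."""
    others = [s for s in profiles if s != sample]
    return min(others, key=lambda s: dist(profiles[sample], profiles[s]))


print(mgs)
print(nearest("nLGMD2A"))
```

With these toy values the group from the second dataset pairs with the same disease in the reference dataset. A real analysis would pass the normalized matrix to a hierarchical clustering routine instead of the nearest-neighbour shortcut.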
Figure 4. Cluster analysis for the LGMD expression meta-dataset. A. Unsupervised cluster analysis of gene expression data from the LGMD meta-dataset. Samples of each dataset are grouped separately: the GSE3307 dataset on the left branch and the GSE11681 dataset on the right. B. Cluster tree from unsupervised cluster analysis of the Main Gene set Signature matrix of the LGMD dataset: segregation is driven by disease type and not by dataset. C. Unsupervised cluster analysis of gene set scores calculated with the STEPath algorithm. LGMD2A samples from the two different datasets cluster together. D. Unsupervised cluster analysis of gene set scores calculated with the sigPathway algorithm (the score used was NTk, defined as the gene-based normalized statistic obtained by permuting genes). E. Unsupervised cluster analysis of gene set scores calculated with the GSEA limma algorithm. Clustering based on gene set scores from either sigPathway or limma failed to group the LGMD2A samples from the two different datasets.
Figure 5. Heat map of STEPath matrix. Heat map of the STEPath-based signature matrix for the muscular disease dataset. Red and green boxes mark up-regulated and down-regulated gene sets, respectively. Grey boxes mark gene sets that have no up- or down-regulated elements.
Figure 6. Heat map of PTM analysis. Heat map of the PTM analysis searching for gene sets with marked up-regulation in LGMD2A pathologies with respect to other gene sets.