Kévin Rue-Albrecht<sup>1,2</sup>, Paul A McGettigan<sup>1,3</sup>, Belinda Hernández<sup>4,5</sup>, Nicolas C Nalpas<sup>1,6</sup>, David A Magee<sup>1</sup>, Andrew C Parnell<sup>4</sup>, Stephen V Gordon<sup>7,5</sup>, David E MacHugh<sup>8,9</sup>.
Abstract
BACKGROUND: Identification of gene expression profiles that differentiate experimental groups is critical for the discovery and analysis of key molecular pathways, and for the selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
Keywords: Classification; Functional genomics; Gene expression; Gene ontology; Microarray; RNA-sequencing; Supervised learning
Year: 2016 PMID: 26968614 PMCID: PMC4788925 DOI: 10.1186/s12859-016-0971-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of the GOexpress workflow. A typical GOexpress analysis takes as input: an ExpressionSet from the Biobase Bioconductor package containing normalised microarray or RNA-seq expression data; the name of an experimental factor present in the phenoData slot of the ExpressionSet; and annotations for the features and for the GO terms (or other functional classes) considered. The GO_analyse function calculates scores and ranks for the individual genes and GO terms. Optionally, the pValue_GO function randomly permutes the gene features to estimate the probability that each GO term ranks (or scores) higher by chance. Finally, various functions allow visualisation of gene expression profiles by gene and by GO term, and export of the calculated statistics to text files.
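The permutation step can be illustrated with a short Python sketch. For illustration only, it assumes that a GO term's score is the mean score of its annotated genes, and that permuting gene features amounts to drawing random gene sets of the same size from all scored genes. The helper names `go_term_score` and `permutation_p_value`, the add-one correction and the toy scores are hypothetical and are not taken from the GOexpress source.

```python
import random
from statistics import mean


def go_term_score(gene_scores, genes):
    """Summarise a GO term as the mean score of its genes (assumed statistic)."""
    return mean(gene_scores[g] for g in genes)


def permutation_p_value(gene_scores, term_genes, n_perm=1000, seed=0):
    """Hypothetical helper: estimate how often a random gene set of the same
    size as the GO term scores at least as high as the GO term itself."""
    rng = random.Random(seed)
    observed = go_term_score(gene_scores, term_genes)
    all_genes = list(gene_scores)
    hits = 0
    for _ in range(n_perm):
        # Draw genes without replacement, i.e. reshuffle gene labels.
        random_set = rng.sample(all_genes, len(term_genes))
        if go_term_score(gene_scores, random_set) >= observed:
            hits += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy gene scores (hypothetical values).
scores = {"G1": 0.9, "G2": 0.8, "G3": 0.1, "G4": 0.05, "G5": 0.02, "G6": 0.01}
print(permutation_p_value(scores, ["G1", "G2"]))
```

With the add-one correction, the smallest attainable estimate is `1 / (n_perm + 1)`. The number of permutations therefore bounds the resolution of the p-value.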
Fig. 2 Ranking of filtered GO terms by summarisation of gene ranks. The rank of each gene feature is shown on the left, and the average rank of each GO term (the average rank of all annotated genes) on the right. The ranks of all genes associated with the 1st- and 55th-ranked GO terms are shown, after filtering to retain only molecular function GO terms associated with at least 15 genes in the annotations. Notably, eight and 13 genes associated with the GO terms chemokine activity and kinase binding, respectively, are absent from the sample ExpressionSet and are ranked last.
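The summarisation shown in Fig. 2 can be sketched in Python as follows. The sketch assumes that genes annotated to a GO term but absent from the dataset receive the worst possible rank, defined here as one more than the number of ranked genes. As the caption states, the size filter counts all annotated genes. `rank_go_terms` and its parameters are hypothetical names, not the GOexpress API.

```python
from statistics import mean


def rank_go_terms(gene_ranks, go_annotations, min_genes=15):
    """Hypothetical helper: rank GO terms by the average rank of their genes.

    gene_ranks: dict gene -> rank (1 = best) for genes present in the data.
    go_annotations: dict GO term -> set of annotated genes.
    """
    # Assumption: annotated genes missing from the data get the worst rank.
    worst_rank = len(gene_ranks) + 1
    averages = {}
    for term, genes in go_annotations.items():
        if len(genes) < min_genes:
            continue  # the size filter counts all annotated genes
        averages[term] = mean(gene_ranks.get(g, worst_rank) for g in genes)
    # A lower average rank means a better-ranked GO term.
    return sorted(averages.items(), key=lambda item: item[1])


ranks = {"a": 1, "b": 2, "c": 3, "d": 4}
annotations = {"T1": {"a", "b"}, "T2": {"c", "d", "x"}, "T3": {"a"}}
print(rank_go_terms(ranks, annotations, min_genes=2))
```

In the toy example, gene "x" is annotated to T2 but missing from the data. It therefore pulls T2 down, just as missing chemokine genes sit at the bottom of Fig. 2.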
Fig. 3 Expression profiles for the top-ranked microarray probeset that best clusters treatment groups. The expression_plot and expression_profiles visualisation functions summarise expression levels of CCL5 by sample group (a) or by individual sample series (b) for each experimental infection (green: uninfected MDM; purple: M. avium subspecies paratuberculosis; orange: M. bovis BCG; yellow: M. bovis).
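One-way ANOVA is one of the two gene-scoring methods listed for GOexpress in the comparison table below. Its F-statistic grows as between-group differences dominate within-group variability, which is one way to quantify how well a gene "clusters treatment groups". The minimal Python sketch below assumes one expression value per sample, that the F-statistic itself is used as the score, and non-zero within-group variance. `anova_f` and `rank_genes` are hypothetical helpers, and the random forest option is not reproduced.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F-statistic for one gene.

    groups: dict group label -> list of expression values.
    F = between-group mean square / within-group mean square.
    """
    values = [x for xs in groups.values() for x in xs]
    n, k = len(values), len(groups)
    grand_mean = mean(values)
    group_means = {label: mean(xs) for label, xs in groups.items()}
    ss_between = sum(len(xs) * (group_means[label] - grand_mean) ** 2
                     for label, xs in groups.items())
    ss_within = sum((x - group_means[label]) ** 2
                    for label, xs in groups.items() for x in xs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression, labels):
    """Hypothetical helper: order genes by decreasing F-statistic.

    expression: dict gene -> values (one per sample); labels: group of each sample.
    """
    scores = {}
    for gene, values in expression.items():
        groups = {}
        for value, label in zip(values, labels):
            groups.setdefault(label, []).append(value)
        scores[gene] = anova_f(groups)
    return sorted(scores, key=scores.get, reverse=True)


labels = ["uninfected"] * 3 + ["infected"] * 3
expression = {"sep": [1, 2, 3, 10, 11, 12], "flat": [1, 5, 3, 2, 4, 6]}
print(rank_genes(expression, labels))  # "sep" separates the groups better
```

Because the F-statistic handles any number of groups, the same score applies unchanged to the four infection groups shown in the figure.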
Fig. 4 Heat map and hierarchical clustering of treatment groups using expression data from genes associated with the top-ranked GO term. The heatmap_GO visualisation function summarises expression levels for all genes that are present in the ExpressionSet and associated with chemokine activity (GO:0008009). Green: uninfected MDM; purple: M. avium subspecies paratuberculosis; orange: M. bovis BCG; yellow: M. bovis.
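The hierarchical clustering behind such a heat map can be illustrated with a small agglomerative sketch in Python. It assumes complete linkage on Euclidean distance, which are the defaults of R's `hclust` and `dist`. The linkage and distance actually used by heatmap_GO are not stated here. `complete_linkage` and the toy sample profiles are hypothetical.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def complete_linkage(profiles):
    """Agglomerative clustering of samples with complete linkage.

    profiles: dict sample -> expression values (one per gene of the GO term).
    Returns merges as (cluster_a, cluster_b, height) in merge order.
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []

    def distance(a, b):
        # Complete linkage: the largest pairwise sample distance.
        return max(euclidean(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: distance(*pair))
        merges.append((sorted(a), sorted(b), distance(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


profiles = {"U1": [1.0, 1.0], "U2": [1.1, 1.0], "I1": [5.0, 5.0], "I2": [5.0, 5.2]}
for merge in complete_linkage(profiles):
    print(merge)
```

The merge order and heights are what a dendrogram on the heat map margin draws. Samples from the same treatment group merge first when the GO term's genes separate the groups well.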
Feature-level statistics for the microarray probesets associated with the top-ranked GO term
| Probeset | Score | Rank | Description |
|---|---|---|---|
| Bt.552.1.S1_at | 0.356 | 1 | Bos taurus chemokine (C-C motif) ligand 5 (CCL5), mRNA. [Source:RefSeq mRNA;Acc:NM_175827] |
| Bt.28088.1.S1_at | 0.121 | 33 | Bos taurus chemokine (C-X-C motif) ligand 13 (CXCL13), mRNA. [Source:RefSeq mRNA;Acc:NM_001015576] |
| Bt.22009.1.S1_at | 0.114 | 38 | Bos taurus chemokine (C-X-C motif) ligand 16 (CXCL16), mRNA. [Source:RefSeq mRNA;Acc:NM_001046095] |
| Bt.9560.1.S1_at | 0.097 | 57 | Bos taurus chemokine (C-C motif) ligand 20 (CCL20), mRNA. [Source:RefSeq mRNA;Acc:NM_174263] |
| Bt.23093.1.S1_at | 0.049 | 130 | Bos taurus chemokine (C-X-C motif) ligand 3 (CXCL3), mRNA. [Source:RefSeq mRNA;Acc:NM_001046513] |
| Bt.611.1.S1_at | 0.037 | 166 | Bos taurus chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) (GRO1), mRNA. [Source:RefSeq mRNA;Acc:NM_175700] |
| Bt.611.1.S1_x_at | 0.033 | 192 | Bos taurus chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) (GRO1), mRNA. [Source:RefSeq mRNA;Acc:NM_175700] |
| Bt.9504.1.A1_at | 0.028 | 244 | Bos taurus chemokine (C-C motif) ligand 4 (CCL4), mRNA. [Source:RefSeq mRNA;Acc:NM_001075147] |
| Bt.611.1.S2_at | 0.025 | 277 | Bos taurus chemokine (C-X-C motif) ligand 1 (melanoma growth stimulating activity, alpha) (GRO1), mRNA. [Source:RefSeq mRNA;Acc:NM_175700] |
| Bt.22009.2.S1_a_at | 0.023 | 298 | Bos taurus chemokine (C-X-C motif) ligand 16 (CXCL16), mRNA. [Source:RefSeq mRNA;Acc:NM_001046095] |
| Bt.21088.1.S1_at | 0.023 | 301 | Bos taurus chemokine (C-C motif) ligand 22 (CCL22), mRNA. [Source:RefSeq mRNA;Acc:NM_001099162] |
| Bt.14087.1.A1_at | 0.023 | 307 | Uncharacterized protein [Source:UniProtKB/TrEMBL;Acc:E1BGB8] |
| Bt.2408.1.S1_at | 0.013 | 677 | Bos taurus chemokine (C-C motif) ligand 2 (CCL2), mRNA. [Source:RefSeq mRNA;Acc:NM_174006] |
| Bt.154.1.S1_at | 0.011 | 844 | Bos taurus chemokine (C-C motif) ligand 8 (CCL8), mRNA. [Source:RefSeq mRNA;Acc:NM_174007] |
| Bt.8144.1.S1_at | 0.006 | 1679 | Bos taurus chemokine (C motif) ligand 1 (XCL1), mRNA. [Source:RefSeq mRNA;Acc:NM_175716] |
| Bt.7165.1.S1_at | 0.003 | 3007 | Bos taurus chemokine (C-X-C motif) ligand 6 (granulocyte chemotactic protein 2) (CXCL6), mRNA. [Source:RefSeq mRNA;Acc:NM_174300] |
| Bt.610.1.A1_at | 0.002 | 4360 | Bos taurus chemokine (C-X-C motif) ligand 2 (CXCL2), mRNA. [Source:RefSeq mRNA;Acc:NM_174299] |
| Bt.9974.1.S1_at | 0.001 | 4805 | Bos taurus chemokine (C-C motif) ligand 3 (CCL3), mRNA. [Source:RefSeq mRNA;Acc:NM_174511] |
| Bt.9974.1.S1_a_at | 0.000 | 5086 | Bos taurus chemokine (C-C motif) ligand 3 (CCL3), mRNA. [Source:RefSeq mRNA;Acc:NM_174511] |
| Bt.6556.1.S1_at | 0.000 | 5086 | Bos taurus regakine 1 (LOC504773), mRNA. [Source:RefSeq mRNA;Acc:NM_001034220] |
| Bt.21950.1.S1_at | 0.000 | 5086 | chemokine (C-C motif) ligand 16 [Source:HGNC Symbol;Acc:HGNC:10614] |
| Bt.21950.1.S1_s_at | 0.000 | 5086 | chemokine (C-C motif) ligand 16 [Source:HGNC Symbol;Acc:HGNC:10614] |
| Bt.20673.1.A1_at | 0.000 | 5086 | chemokine (C-C motif) ligand 1 [Source:HGNC Symbol;Acc:HGNC:10609] |
| Bt.2408.1.S1_s_at | 0.000 | 5086 | Bos taurus chemokine (C-C motif) ligand 2 (CCL2), mRNA. [Source:RefSeq mRNA;Acc:NM_174006] |
| Bt.155.1.S1_at | 0.000 | 5086 | Bos taurus interleukin 8 (IL8), mRNA. [Source:RefSeq mRNA;Acc:NM_173925] |
| Bt.11581.1.S1_at | 0.000 | 5086 | Bos taurus platelet factor 4 (PF4), mRNA. [Source:RefSeq mRNA;Acc:NM_001101062] |
| Bt.16966.1.S1_at | 0.000 | 5086 | Bos taurus chemokine (C-X-C motif) ligand 10 (CXCL10), mRNA. [Source:RefSeq mRNA;Acc:NM_001046551] |
The table_genes function was used to export results for the top-ranked GO term, chemokine activity (GO:0008009).
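In the table above, all probesets with a score of 0.000 share rank 5086. This is consistent with "minimum" (competition) ranking of ties, as produced for instance by R's `rank(-score, ties.method = "min")`. Whether GOexpress applies exactly this rule is an assumption here. The hypothetical `competition_rank` helper below is a Python sketch of that tie rule only.

```python
def competition_rank(scores):
    """Rank features by decreasing score; tied scores share the minimum rank.

    scores: dict feature -> score. Returns dict feature -> rank (1 = best).
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    ranks = {}
    for position, feature in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and scores[feature] == scores[previous]:
            # Tie with the previous feature: reuse its rank.
            ranks[feature] = ranks[previous]
        else:
            ranks[feature] = position
    return ranks


toy_scores = {"p1": 0.356, "p2": 0.121, "p3": 0.0, "p4": 0.0, "p5": 0.0}
print(competition_rank(toy_scores))
```

Note that equal displayed scores do not imply a tie. In the table, probesets shown as 0.023 receive ranks 298, 301 and 307, so their unrounded scores must differ.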
Fig. 5 Screenshot of a sample R/Shiny application built on GOexpress results. Users may run the web application from GitHub (https://github.com/kevinrue/shiny-MDM), as shown in the main text, or from the ZIP archive provided in Additional file 4.
Comparative table detailing features of different GO analysis software tools
| Software | Multiple organisms | Custom annotations | Platform | Statistical method | Visualisation | Flexible threshold | Multi-level factors | Environment | Application |
|---|---|---|---|---|---|---|---|---|---|
| GOexpress (2015) | Yes | Yes | Microarray; RNA-seq | Gene permutation; RF/one-way ANOVA | Gene expression; GO | Yes | Yes | R/Bioconductor; Web-app (R/Shiny) | Ranking and visualisation of genes and GO terms with expression levels that best classify multiple experimental groups |
| MLseq (2014) | No | No | RNA-seq | Choose from one of several algorithms (SVM, bagSVM, RF, CART) | No | No | Yes | R/Bioconductor | Application of several ML methods to RNA-seq data (using a read count table) |
| seqGSEA (2014) | Yes | Yes | RNA-seq | Subject permutation; uses a statistic based on the negative binomial distribution to find differentially spliced genes between two groups | Gene ranking; Gene set ranking | No | No | R/Bioconductor | Gene set enrichment analysis of high-throughput RNA-seq data by integrating differential expression and splicing |
| GOseq (2010) | Yes | Yes | RNA-seq | Probability weighting function (PWF); resampling; Wallenius distribution or random sampling to choose a null distribution for detecting under- and over-representation of GO categories | No | No | No | R/Bioconductor | Detection of GO and/or other user-defined categories that are over- or under-represented in RNA-seq data |
| GOrilla (2009) | Yes | No | Microarray; RNA-seq | Exact mHG | GO (enrichment) | Yes | No | Web-based | Identification and visualisation of enriched GO terms in ranked lists of genes |
| GOstats (2007) | Yes | Yes | Microarray | Hypergeometric test | Gene ontology (enrichment) | Yes | No | R/Bioconductor | Tools for interacting with GO and microarray data, including basic manipulation tools for graphs, hypothesis testing and other simple calculations |
| STEM (2006) | Yes | Yes | Microarray | STEM clustering (assignment to a predefined set of model profiles) | Gene expression cluster visualisation; integration with GO (enrichment) | Yes | No | Java | Clustering, comparison, and visualisation of short time-series gene expression data from microarray experiments (~8 time points or fewer) |
| GSA (2007) | No | Yes | Microarray | Maxmean | GO (enrichment) | Yes | Yes | R/CRAN | Identification of gene sets in which most genes either positively or negatively correlate, in a coordinated manner, with higher values of the phenotype |
Abbreviations: RF random forest, ANOVA analysis of variance, SVM support vector machines, bagSVM bagging support vector machines, CART classification and regression trees, ML machine learning, mHG minimum hypergeometric
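Two of the statistical methods in the table can be sketched compactly in Python: the hypergeometric over-representation test listed for GOstats, and the maxmean gene-set statistic listed for GSA. The sketch is illustrative only and is not either package's implementation. It omits GSA's restandardisation and permutation-based significance, as well as any GO-graph-aware adjustments. `hypergeometric_upper_tail` and `maxmean` are hypothetical names, and `math.comb` requires Python 3.8 or later.

```python
from math import comb
from statistics import mean


def hypergeometric_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric: N genes in the universe, K annotated
    to the GO term, n genes in the list of interest, k of them annotated."""
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return numerator / comb(N, n)


def maxmean(z_scores):
    """Maxmean statistic for one gene set: compare the mean of the positive
    parts with the mean of the negative parts and keep the larger, signed."""
    s_plus = mean(max(z, 0.0) for z in z_scores)
    s_minus = mean(max(-z, 0.0) for z in z_scores)
    return s_plus if s_plus > s_minus else -s_minus


print(hypergeometric_upper_tail(N=10, K=3, n=2, k=2))  # 3/45, about 0.0667
print(maxmean([1.0, 2.0, -0.5]))
```

Both statistics work on a predefined gene list or gene-level scores for a two-sided contrast. This contrasts with the multi-level factor scoring (random forest or one-way ANOVA) listed for GOexpress.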