Aurélien Brionne, Amélie Juanchich, Christelle Hennequet-Antier.
Abstract
The main objective of the ViSEAGO package is to mine data on biological functions and to establish links between the genes involved in a study. We developed ViSEAGO in R to facilitate functional Gene Ontology (GO) analysis of complex experimental designs with multiple comparisons of interest. It allows large-scale datasets to be studied together and GO profiles to be visualized in order to capture biological knowledge. The acronym stands for three major concepts of the analysis: Visualization, Semantic similarity and Enrichment Analysis of Gene Ontology. It provides access to the latest GO annotations, which are retrieved from one of the NCBI EntrezGene, Ensembl or Uniprot databases for several species. Using available R packages and novel developments, ViSEAGO extends classical functional GO analysis to focus on functional coherence by aggregating closely related biological themes while studying multiple datasets at once. It provides both a synthetic and a detailed view through interactive functionalities that respect the GO graph structure and ensure the functional coherence supplied by semantic similarity. ViSEAGO has been successfully applied to several datasets from different species with a variety of biological questions. Results can be easily shared between bioinformaticians and biologists, enhancing reporting capabilities while maintaining reproducibility. ViSEAGO is publicly available at https://bioconductor.org/packages/ViSEAGO.
Keywords: Annotation; Cluster analysis; Enrichment test; Functional genomics; Gene ontology; Semantic similarity; Visualization
Year: 2019 PMID: 31406507 PMCID: PMC6685253 DOI: 10.1186/s13040-019-0204-1
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Fig. 1 Database impacts on GO annotation. Bar plot of the number of GO annotations available in the Molecular Function, Biological Process and Cellular Component categories for protein-coding genes in two major databases (NCBI, Ensembl), for three gold-standard models (Human, Mouse, Zebrafish) and seven livestock animals (Chicken, Cow, Pig, Rabbit, Salmon, Sheep and Trout). Computational (blue) and Experimental (orange) evidence are represented.
Description of functionalities supported by different tools focused on biological interpretation from GO annotation
| Tool | Interface, Language | GO annotation database | Input identifiers for GO annotation | Enrichment test | Semantic similarity (SS) between GO terms | SS between sets of GO terms | Visualization | Multiple lists | Graph interactivity |
|---|---|---|---|---|---|---|---|---|---|
| David | Web | Fixed release, Ensembl, Entrez, Uniprot | GeneID, Ensembl gene ID, Affymetrix probes, Illumina ID, Agilent ID (lists of > 3000 identifiers are not allowed) | Fisher Exact (EASE) | No (uses gene Kappa similarity) | No | summary table, bar plot, Gene-term 2D view, clustering (centered on genes) | Yes | No |
| ClusterProfiler Bioconductor | R | Current release, Bioconductor databases | GeneID, Ensembl gene ID, can be converted in the module | Hypergeometric | IC-based, Graph-based (computed) | Yes | summary table, bar plot, dot plot, enrichment map, network | Yes | No |
| gProfiler | R, Web | Fixed release, Ensembl supported species | mixed types of gene identifiers converted to Ensembl gene ID | Hypergeometric | No | No | tree-like list of enriched GO terms, summary table | Yes | No |
| REVIGO | Web | Fixed release, UniProt, several supported species | No | No | IC-based (computed for visualization) | No | summary table, scatter plot, interactive map, TreeMap, export R plot (centered on GO terms) | No | Yes |
| ViSEAGO Bioconductor | R | Old and current release, Ensembl, Entrez, Uniprot, Bioconductor databases | GeneID, Ensembl gene ID, Uniprot ACC | Fisher Exact | IC-based, Graph-based (computed for visualization) | Yes | summary table, bar plot, upset, MDS plot, clustering (centered on GO terms) | Yes | Yes |
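
The Fisher exact and hypergeometric tests listed in the Enrichment test column can be illustrated with a minimal Python sketch. It assumes a one-sided over-representation test for a single GO term and also computes the -log10(p-value) score of the kind shown in the heatmaps. The function name and all counts are hypothetical, and the code is not taken from any of the tools in the table.

```python
from math import comb, log10


def enrichment_pvalue(hits, list_size, annotated, background):
    """One-sided Fisher exact (hypergeometric upper-tail) p-value.

    hits       -- genes of interest annotated to the GO term
    list_size  -- total genes of interest
    annotated  -- background genes annotated to the GO term
    background -- total background genes
    """
    upper = min(list_size, annotated)
    total = comb(background, list_size)
    # Sum the probabilities of observing `hits` or more annotated genes.
    tail = sum(
        comb(annotated, k) * comb(background - annotated, list_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 8 of 100 genes of interest carry a term that
# annotates 40 of 10000 background genes.
p = enrichment_pvalue(hits=8, list_size=100, annotated=40, background=10000)
score = -log10(p)  # larger score means stronger enrichment
print(f"p-value = {p:.3g}, -log10(p) = {score:.2f}")
```

Exact integer binomial coefficients keep the sum free of rounding error until the final division, which is convenient for the very small p-values typical of enrichment analyses.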
Fig. 2 Illustrated ViSEAGO package. A complete ViSEAGO analysis is presented, from annotation of lists of features and enrichment tests to organization and visualization of GO terms using semantic similarity. Text in italics illustrates ViSEAGO features using the case 1 study
Fig. 3 Visualization of ViSEAGO’s functional analysis from mouse RNA-seq with three different transcriptomic datasets. Clustering heatmap plot that combines a dendrogram based on Wang’s semantic similarity distance and the ward.D2 aggregation criterion, a heatmap of -log10(p-value) from functional enrichment tests and information content (IC). The focus is on cholesterol biosynthetic process, a major pathway involved in the study
Fig. 4 Visualization of ViSEAGO’s functional analysis from chicken RNA-seq with seven different transcriptomic datasets. (a) Upset plot representing overlaps between lists of enriched GO terms. (b) Clustering heatmap plot combining a dendrogram based on Wang’s semantic similarity distance and ward.D2 aggregation criterion, a heatmap of -log10(p-value) from functional enrichment test(s) of the seven lists of genes and information content (IC)
Fig. 5 Visualization of ViSEAGO’s functional analysis from cattle with three MeDiP datasets. (a) Clustering heatmap plot combining a dendrogram based on Wang’s semantic similarity distance and ward.D2 aggregation criterion, a heatmap of -log10(p-value) from functional enrichment tests, and information content (IC). (b) MDS plot based on BMA distance representing the proximities of groups obtained by cutting the dendrogram in (a). Dot size depends on the number of GO terms within each cluster. (c) Heatmap plot of functional sets of GO terms combining a description of the first common GO ancestor of each set of GO terms, a heatmap with the number of GO terms in each set, a dendrogram based on BMA semantic similarity distance and ward.D2 aggregation criterion
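
The BMA distance mentioned in the Fig. 5 caption compares two sets of GO terms. The sketch below assumes the common best-match average definition (the mean of each term's best similarity to the other set) and a distance of 1 minus similarity; ViSEAGO's own computation may differ in detail. The term labels and pairwise similarity values are hypothetical.

```python
def bma_similarity(sim, set_a, set_b):
    """Best-match average similarity between two sets of terms.

    sim[a][b] holds the pairwise similarity of term a (from set_a)
    and term b (from set_b), with values in [0, 1].
    """
    # Best match in set_b for every term of set_a, and vice versa.
    best_a = [max(sim[a][b] for b in set_b) for a in set_a]
    best_b = [max(sim[a][b] for a in set_a) for b in set_b]
    return (sum(best_a) + sum(best_b)) / (len(set_a) + len(set_b))


# Hypothetical pairwise similarities between two small clusters of terms.
sim = {
    "t1": {"t3": 0.8, "t4": 0.2},
    "t2": {"t3": 0.4, "t4": 0.6},
}
cluster_a = ["t1", "t2"]
cluster_b = ["t3", "t4"]

similarity = bma_similarity(sim, cluster_a, cluster_b)
distance = 1.0 - similarity  # distance assumed to be 1 - similarity
print(f"BMA similarity = {similarity:.2f}, distance = {distance:.2f}")
```

A matrix of such distances between all pairs of clusters is the kind of input that MDS or hierarchical clustering with a Ward criterion would take.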