Shani Ben-Ari Fuchs, Iris Lieder, Gil Stelzer, Yaron Mazor, Ella Buzhor, Sergey Kaplan, Yoel Bogoch, Inbar Plaschkes, Alina Shitrit, Noa Rappaport, Asher Kohn, Ron Edgar, Liraz Shenhav, Marilyn Safran, Doron Lancet, Yaron Guan-Golan, David Warshawsky, Ronit Shtrichman.
Abstract
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," however, a crucial missing step in the current era is an understanding of the biological and clinical contexts associated with the data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface, and proprietary unified data. GeneAnalytics employs LifeMap Sciences' GeneCards suite, including GeneCards®, the human gene database; MalaCards, the human diseases database; and PathCards, the biological pathways database. Expression-based analysis in GeneAnalytics relies on LifeMap Discovery®, the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling an advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as its visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analysis and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics, and others yet to emerge on the postgenomics horizon.
Year: 2016 PMID: 26983021 PMCID: PMC4799705 DOI: 10.1089/omi.2015.0168
Source DB: PubMed Journal: OMICS ISSN: 1536-2310

GeneAnalytics structure. GeneAnalytics is powered by GeneCards, LifeMap Discovery, MalaCards, and PathCards, which integrate >100 data sources. These databases contain annotated gene lists for tissues and cells, diseases, pathways, compounds, and GO terms. GeneAnalytics compares the user's gene set to these compendia in search of the best matches. The output contains the best matched gene lists, scored and subdivided into their biological categories such as diseases or pathways. In the figure, each output category and its respective data source are marked with the same color.
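
The matching step described in this caption can be illustrated with a generic over-representation test. The following is a minimal sketch only: it uses a plain hypergeometric test rather than GeneAnalytics' own evidence-based scoring, and the function names, the compendium layout, the background size, and the toy gene lists are all assumptions made for illustration.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when drawing n genes from N, of which K belong to the list."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def rank_gene_lists(query, compendium, background_size):
    """Score each annotated gene list against the query and group by category.

    `compendium` is a hypothetical list of (category, name, genes) tuples.
    Lists with no overlap are dropped; the rest are sorted by p-value.
    """
    query = set(query)
    results = []
    for category, name, genes in compendium:
        overlap = query & set(genes)
        if not overlap:
            continue
        p = hypergeom_sf(len(overlap), background_size, len(genes), len(query))
        results.append(
            {"category": category, "name": name,
             "matched": sorted(overlap), "p_value": p}
        )
    results.sort(key=lambda r: r["p_value"])
    grouped = {}
    for r in results:
        grouped.setdefault(r["category"], []).append(r)
    return grouped


# Toy data, not taken from any of the databases named above.
toy_compendium = [
    ("pathways", "Phototransduction", ["RHO", "PDE6B", "GNAT1", "CNGA1"]),
    ("diseases", "Retinitis pigmentosa", ["RHO", "PDE6B", "USH2A"]),
    ("pathways", "Glycolysis", ["HK1", "PKM"]),
]
grouped = rank_gene_lists(["RHO", "PDE6B", "GNAT1"], toy_compendium, 20000)
```

Grouping the ranked hits by category mirrors the output described above, where the best-matched gene lists are subdivided into categories such as diseases or pathways.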

The gene set input. (A) The input page is used to insert and identify the query gene list. 1) The identification process requires species indication in order to identify the gene symbols and their orthologs. GeneAnalytics identifies only official human and mouse gene symbols. 2) The genes can be inserted by typing/pasting gene symbols in the input window or by uploading a file containing the gene list. Typing a gene name in the search box initiates an autocomplete tool that includes only official gene symbols. The identification process yields two lists: (B) “Ready for analysis” gene list, which includes identified gene symbols, their full name, and all available aliases/synonyms, and (C) “Unidentified genes” list, which includes genes that were not recognized as official human or mouse gene symbols. These gene names can be manually corrected by running a search in GeneCards or by using the autocomplete option.
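
The split into "Ready for analysis" and "Unidentified genes" can be sketched as a lookup against a table of official symbols. This is an illustrative sketch under stated assumptions: `identify_genes` and the `official` dictionary layout are hypothetical, and the real tool additionally handles species selection and ortholog mapping, which are omitted here.

```python
def identify_genes(raw_symbols, official):
    """Partition input strings into identified and unidentified gene symbols.

    `official` is a hypothetical mapping from official symbol to a record
    with its full name and aliases. Only exact matches to official symbols
    count as identified, so an alias typed by the user stays unidentified.
    """
    ready, unidentified, seen = [], [], set()
    for raw in raw_symbols:
        symbol = raw.strip()
        if not symbol or symbol in seen:
            continue  # skip blank entries and duplicates
        seen.add(symbol)
        record = official.get(symbol)
        if record is None:
            unidentified.append(symbol)
        else:
            ready.append({"symbol": symbol, **record})
    return ready, unidentified


# Toy symbol table for illustration only.
official = {
    "TP53": {"name": "tumor protein p53", "aliases": ["p53", "LFS1"]},
    "BRCA1": {"name": "BRCA1 DNA repair associated", "aliases": ["RNF53"]},
}
ready, unidentified = identify_genes(["TP53", " BRCA1 ", "p53", "TP53", ""], official)
```

Here `p53` ends up in the unidentified list because it is an alias rather than an official symbol, which is the case the caption says can be corrected manually.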
GeneAnalytics Data Sources and Statistics
| Analysis type | Category | Data source | Entities | Genes |
| --- | --- | --- | --- | --- |
| Expression | Normal tissues and cells | LifeMap Discovery | 3,346 | 17,512 |
| Expression | Diseased tissues and cells (a) | LifeMap Discovery (via MalaCards) | 96 | 6,963 |
| Function | Disease | MalaCards | 12,085 | 22,280 |
| Function | Pathways | PathCards | 1,073 SuperPaths (unification of 3,215 pathways) | 11,479 |
| Function | GO biological process | GeneCards | 9,436 | 14,907 |
| Function | GO molecular function | GeneCards | 3,509 | 15,624 |
| Function | Compounds | GeneCards | 19,961 (unification of 44,942 compounds) | 8,434 |
Data sources and statistics for each result category, based on the type of analysis.
(a) The expression data in diseased tissues and cells are available in the disease category.
LifeMap Discovery (LMD) Entities Used in GeneAnalytics Matching Analysis in Tissues & Cells Category
| Entity | Source of expression data | Example | Description |
| --- | --- | --- | --- |
| Organ; Tissue | High throughput gene expression comparisons | Heart | These entities contain a list of genes that have been found to be expressed in whole-tissue samples. |
| Anatomical compartment | High throughput gene expression comparisons | Renal collecting duct system | These entities describe specific temporospatial regions within an organ/tissue. |
| In vivo cell; In vitro cell (cultured stem, progenitor, and primary cell); Protocol-derived cell; Cell family | Data manually curated from the scientific literature | Inner cell mass (ICM) cells | |
| Large Scale Data Set sample cards | Large Scale Data Sets | GUDMAP: Ovary | These entities contain the gene list for each Large Scale Data Sets sample. |
The entities available in the LMD database with gene expression information and an example for each.

Tissues and Cells results. (A) The analyzed genes are the queried genes that were identified and included in the analysis. The “Notes” indicate genes in the query that were found to be abundant or are defined as housekeeping genes in humans. These genes receive lower scores in the Tissues and Cells matching analysis. (B) The filters panel allows filtering for genes specifically expressed according to Tissue/system, In vivo/In vitro, ‘Expressed in’ (cells, anatomical compartments, organs and tissues, and/or high throughput comparisons and large-scale dataset samples), and Prenatal/Postnatal. (C) The detailed results table presents all entities in which at least one of the analyzed genes is expressed, along with links to their cards in LMD. (D) A link to the list of the matched genes and additional information about them (for example, for “Mature Rod Cells”). (E) The list of matched genes linked to the specific entity in LMD (here, “Mature Rod Cells”).
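
The effect of down-weighting abundant or housekeeping genes can be sketched with a simple weighted overlap. This is a hedged illustration, not the GeneAnalytics matching algorithm: the function `matching_score`, the `penalty` value of 0.25, and the toy gene sets are assumptions chosen only to show the idea.

```python
def matching_score(query, entity_genes, housekeeping, penalty=0.25):
    """Weighted fraction of query genes expressed in an entity.

    Each query gene has weight 1.0, except genes flagged as abundant or
    housekeeping, which get the (assumed) lower weight `penalty`.
    """
    weights = {g: (penalty if g in housekeeping else 1.0) for g in set(query)}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    matched = sum(w for g, w in weights.items() if g in entity_genes)
    return matched / total


# Toy example: one specific gene and one housekeeping gene in the query.
query = ["RHO", "ACTB"]
housekeeping = {"ACTB"}
score_specific = matching_score(query, {"RHO"}, housekeeping)
score_housekeeping = matching_score(query, {"ACTB"}, housekeeping)
```

Under these assumed weights, an entity matched only through the housekeeping gene scores lower than one matched through the specific gene, which is the behavior the caption describes.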

GeneAnalytics Disease results. (A) The disease filter enables filtration of results by gene–disease associations and disease categories obtained from the MalaCards database. (B) The detailed results table presents diseases matched to the queried gene set. Each disease is linked to its card in MalaCards. (C) Clicking on the number of matched genes opens a list of the matched genes and associated information. (D) Differentially expressed genes (‘expression’), and (E) disease-related genes in their respective sections in a disease card in MalaCards. Both sections serve as evidence for each matched disease in the GeneAnalytics disease category.
Disease–Gene Associations from Manually Curated Genetic Sources
| Association type | Sources |
| --- | --- |
| Causative mutation | ClinVar, OMIM, Orphanet |
| Risk factor | ClinVar, OMIM, Orphanet |
| Resistant factor | ClinVar, OMIM |
| Genetic tests | GeneTests |
| Drug response | ClinVar |
| Structural gene variation | OMIM, Orphanet |
| Unconfirmed association | OMIM, Orphanet |
See the Supplementary S3 Appendix for additional details.

GeneAnalytics Pathways results. (A) The pathway filters panel enables filtering of results according to their data sources. (B) The detailed results table includes all of the matched SuperPaths, presented in descending order of score, with links to the related card in PathCards. (C) Each SuperPath includes one or more pathways from different sources. Clicking on the plus sign reveals the names of the individual pathways that make up the SuperPath, with links to the pathway page in the original data source.
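
The relationship between SuperPaths and their constituent pathways, together with the source filter and score ordering, can be sketched as a small data structure. This is an illustrative sketch only: the class names, the `filter_and_rank` helper, the source labels, and the scores are hypothetical and do not reflect the PathCards schema.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    name: str
    source: str  # name of the original data source


@dataclass
class SuperPath:
    name: str
    score: float
    pathways: list = field(default_factory=list)  # one or more Pathway objects


def filter_and_rank(superpaths, allowed_sources=None):
    """Keep SuperPaths with at least one pathway from an allowed source,
    then sort them by descending score. No filter is applied when
    `allowed_sources` is None."""
    kept = [
        sp for sp in superpaths
        if allowed_sources is None
        or any(p.source in allowed_sources for p in sp.pathways)
    ]
    return sorted(kept, key=lambda sp: sp.score, reverse=True)


# Toy SuperPaths with placeholder source names.
superpaths = [
    SuperPath("SuperPath A", 12.5, [Pathway("A1", "SourceX"), Pathway("A2", "SourceY")]),
    SuperPath("SuperPath B", 30.1, [Pathway("B1", "SourceY")]),
    SuperPath("SuperPath C", 7.0, [Pathway("C1", "SourceZ")]),
]
ranked_all = filter_and_rank(superpaths)
ranked_x = filter_and_rank(superpaths, allowed_sources={"SourceX"})
```

Expanding a SuperPath in the results table corresponds to listing its `pathways` attribute, each entry carrying the name of its original source.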