Ramona Britto, Olivier Sallou, Olivier Collin, Grégoire Michaux, Michael Primig, Frédéric Chalmel.
Abstract
We present the gene prioritization system (GPSy), a cross-species tool that facilitates the arduous but critical task of prioritizing genes for follow-up functional analyses. GPSy's modular design with regard to species, data sets and scoring strategies enables users to formulate queries in a highly flexible manner. Currently, the system encompasses 20 topics related to conserved biological processes, including male gamete development, which is discussed in this article. The web server-based tool is freely available at http://gpsy.genouest.org.
Year: 2012 PMID: 22570409 PMCID: PMC3394256 DOI: 10.1093/nar/gks380
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Framework for the prioritization of candidate genes. Panels (A) and (B) describe the steps involved in pre-processing and querying, respectively. (A) Lane 1 (Data categories and modules) gives a non-exhaustive list of modules in the four categories (Sequence, Expression, Annotation and Association) that were collected and curated from different species to drive gene prioritization. Lane 2 outlines the scoring strategies, one for each module. Lane 3 depicts the species-wise ranking process that follows the scoring of individual genes. H, M, F, W and Y indicate the ranked lists for human, mouse, fly, worm and yeast, respectively. (B) The server accepts as input a gene list from any one of the 45 species (human, in the displayed example). Genes in the input list are mapped onto pre-computed ranked lists for the selected species (Lane 4) and an intra-module rank is generated (Lane 5). Lane 6 (WS; Weight Scheme) highlights the weight applied to each module. Lanes 7 and 8 describe the final step in gene prioritization: calculation of an inter-module weighted average rank for each gene. The output is the prioritized input list.
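The querying steps in panel (B) of Figure 1 (intra-module ranking, module weights, inter-module weighted average rank) can be illustrated with a minimal Python sketch. This is not GPSy's implementation: the function names, the tie handling (tied genes share the mean of their positions), the rule that a gene missing from a module is simply left out of the average for that module, and the toy scores are all assumptions made for illustration.

```python
from typing import Dict, List


def intra_module_ranks(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank genes within one module; the highest score gets rank 1.

    Assumption: tied genes share the mean of the positions they occupy.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j][1] == ordered[i][1]:
            j += 1
        shared = (i + 1 + j) / 2  # mean of 1-based positions i+1 .. j
        for gene, _ in ordered[i:j]:
            ranks[gene] = shared
        i = j
    return ranks


def weighted_average_rank(
    module_ranks: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-module ranks into one weighted average rank per gene.

    Assumption: modules lacking a gene do not contribute to its average,
    and modules without an explicit weight default to 1.0.
    """
    genes = set().union(*[r.keys() for r in module_ranks.values()])
    combined: Dict[str, float] = {}
    for gene in genes:
        num = 0.0
        den = 0.0
        for module, ranks in module_ranks.items():
            if gene in ranks:
                w = weights.get(module, 1.0)
                num += w * ranks[gene]
                den += w
        if den > 0:
            combined[gene] = num / den
    return combined


def prioritize(
    module_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> List[str]:
    """Return genes ordered from best (lowest average rank) to worst."""
    module_ranks = {m: intra_module_ranks(s) for m, s in module_scores.items()}
    avg = weighted_average_rank(module_ranks, weights)
    return sorted(avg, key=lambda g: (avg[g], g))


# Toy input: two hypothetical modules scoring three hypothetical genes.
toy_scores = {
    "expression": {"geneA": 0.9, "geneB": 0.4, "geneC": 0.1},
    "annotation": {"geneA": 1.0, "geneB": 3.0, "geneC": 2.0},
}
toy_weights = {"expression": 3.0, "annotation": 1.0}
print(prioritize(toy_scores, toy_weights))
```

With these toy weights the expression module dominates, so geneA, which ranks first there but last in the annotation module, still ends up with the best average rank.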
Figure 2. Gene ranking and RNAi phenotypes. (A) The most relevant phenotypes are plotted for each gene in the prioritized candidate list (from the 1st to the 56th, x-axis). Phenotype classes are indicated on the y-axis: RP = reproduction-associated phenotype; LP = lethal phenotype; OP = other phenotype; None = no observable phenotype. Official gene symbols are displayed for all genes. (B) Receiver operating characteristic (ROC) curves for: (i) the candidate gene set (n = 56 genes) versus the C. elegans negative reference set (NRS; n = 1000; blue curve); (ii) the RP gene set (n = 23) versus the NRS (red); (iii) the RP versus non-RP sets (union of LP, OP and None phenotypes; n = 33; green). The corresponding area under the ROC curve (AUC) values are indicated. Note the significant improvement in AUC value between (ii) and (i). The AUC value for (iii) is significantly non-random. (C) ROC curves for the discrimination of the C. elegans RP (n = 23) versus non-RP sets (n = 33) using GPSy (default settings, solid blue line), GPSy (C. elegans data only, dashed blue line), Endeavour (red) and Génie (green).
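The AUC values referred to in Figure 2 summarize how well a ranking separates one gene set from another. The record does not say how the authors computed them; the sketch below uses the standard pairwise (Mann-Whitney) formulation, in which the AUC is the fraction of positive/negative pairs where the positive gene has the better (lower) rank, with ties counted as half. The function name and the toy ranks are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def auc_from_ranks(
    positive_ranks: Sequence[float],
    negative_ranks: Sequence[float],
) -> float:
    """AUC for separating positives from negatives by rank (lower = better).

    Counts, over all positive/negative pairs, how often the positive gene
    is ranked ahead of the negative one; a tie contributes 0.5.
    """
    if not positive_ranks or not negative_ranks:
        raise ValueError("both rank lists must be non-empty")
    wins = 0.0
    for p in positive_ranks:
        for n in negative_ranks:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_ranks) * len(negative_ranks))


# Toy ranks only: a perfect separation and a random-looking one.
print(auc_from_ranks([1, 2], [3, 4]))
print(auc_from_ranks([1, 4], [2, 3]))
```

An AUC near 1 means the positive set sits almost entirely at the top of the ranked list, while a value near 0.5 is what a random ordering would give.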