Patrice Godard, Matthew Page.
Abstract
BACKGROUND: Bridging genotype and phenotype is a fundamental biomedical challenge that underlies more effective target discovery and patient-tailored therapy. Approaches that can flexibly and intuitively integrate known gene-phenotype associations in the context of molecular signaling networks are vital to effectively prioritize and biologically interpret genes underlying disease traits of interest.
Keywords: Biological networks; Disease-gene association; Genetics; Phenotype; Semantic similarity
Year: 2016 PMID: 27923364 PMCID: PMC5142268 DOI: 10.1186/s12859-016-1401-2
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of PCAN to related methods
| Software | Application | Approach | Description | Availability |
|---|---|---|---|---|
| PCAN | Gene-phenotype exploration | Indirect, phenotype-based | Implements a readily interpreted, statistical definition of phenotype consensus for configurable lists of mechanistically related genes. Can be used for gene prioritisation and also for versatile, trait-level exploration of gene-phenotype relationships within pathways and biological networks. | R package |
| CSI-OMIM | Disease diagnosis | Direct, phenotype-based | Improved phenotype searching of NLP-processed OMIM Clinical Synopsis descriptions. Phrases are tagged with ontological terms (MeSH, UMLS) and clustered into groups of synonymous expressions. | Website |
| Phenomizer | Disease diagnosis | Direct, phenotype-based | Improved phenotype searching using semantic similarity methods based on HPO annotations for rare diseases. | Website |
| PhenoDigm | Disease-gene prioritisation | Direct, phenotype-based | Gene prioritisation based on phenotype comparison across model organisms. Model organism trait ontologies (e.g. HPO and MPO) are cross-linked and semantic similarity is computed using the OWLSim algorithm. | Website |
| ExomeWalker | Disease-gene prioritisation | Indirect, phenotype-based | Performs a random walk of the STRING protein-interaction network, seeded with genes linked to diseases with a high semantic similarity to the disorder under investigation. Genes are prioritised based on the random walk score and variant-level criteria combined using a linear model. | Website and command line (via Exomiser) |
| Syndrome to Gene | Disease-gene prioritisation | Indirect, ontology-based | Uses CSI-OMIM to identify genes that cause similar diseases. Quantifies gene-relatedness by comparing information vectors derived from 18 source databases using a Jaccard similarity coefficient. Genes are prioritised if they are related to genes that cause similar phenotypes. | Website |
| OVA | Variant prioritisation | Indirect, ontology-based | Generates extensive, gene-level, multi-ontology annotation profiles for candidate variants and a query phenotype. Direct gene annotations are supplemented with inferred annotations from model organism orthologues and network neighbours. Annotation vectors are compared by computing domain-specific semantic similarities and combined using a Random Forest model to rank variants. | Website |
| Exomiser | Variant prioritisation | Pipeline | Variant ranking is based on both variant-level properties (allele frequency, pathogenicity) and gene-level semantic similarities for directly linked human diseases and model organism phenotypes, as well as network proximity to similar phenotypes using ExomeWalker. | Website and command line |
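The Jaccard similarity coefficient mentioned for Syndrome to Gene can be illustrated with a minimal sketch. The annotation labels below are hypothetical and the function is a generic set-based Jaccard coefficient, not code taken from any of the tools in the table.

```python
def jaccard(annotations_a, annotations_b):
    """Jaccard coefficient: size of the intersection divided by size of the union."""
    a, b = set(annotations_a), set(annotations_b)
    union = a | b
    # Two empty annotation sets are treated as unrelated (an assumption).
    return len(a & b) / len(union) if union else 0.0


# Hypothetical annotation sets for two genes (labels are illustrative only).
gene_x = {"GO:1", "pathway:P1", "domain:D1"}
gene_y = {"GO:1", "pathway:P1", "domain:D2", "tissue:T1"}

relatedness = jaccard(gene_x, gene_y)
print(relatedness)
```

Two genes that share most of their annotations score close to 1, whereas genes with disjoint annotations score 0.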
Fig. 1 PCAN workflow. The typical PCAN workflow followed to assess the relationship between a candidate gene and a disease of interest, based on genes mechanistically related to the candidate (from pathways or protein-protein interaction networks) and the Mendelian disorders they are known to cause. Green boxes indicate user-provided inputs to the method
Fig. 2 PCAN prior knowledge resources. Public resources used to link genes to phenotypic abnormalities based on the genetic diseases each gene causes. The HPO phenotype annotation resource (build #1039) was used to link HP terms to OMIM disorders and ClinVar (version of May 2015) was used to retrieve genes that cause OMIM disorders. Total counts of each distinct entity type in the resultant gene-trait resource are provided
Fig. 3 Assessing a gene's relevance for a condition by applying a pathway consensus approach. (a) Each gene that is known to be involved in at least one genetic disorder is associated with the corresponding HP terms. These HP terms are compared to those related to the disease of interest by computing a symmetric semantic similarity score. (b) The scores of all genes related to the gene of interest are compared to the scores for all known Mendelian disease genes. Here the gene candidate is in yellow and its direct neighbors are in blue. Nodes outlined in red correspond to genes with a high semantic similarity score for the disease under focus
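The two steps in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PCAN implementation: the toy ontology, the information-content values, the Resnik-style term similarity, the best-match-average symmetrisation and the rank-based comparison are all hypothetical choices made for the example.

```python
from statistics import mean

# Hypothetical toy ontology: each term maps to its ancestors (itself included).
ANCESTORS = {
    "HP:A": {"HP:A", "HP:root"},
    "HP:B": {"HP:B", "HP:A", "HP:root"},
    "HP:C": {"HP:C", "HP:root"},
    "HP:root": {"HP:root"},
}
# Hypothetical information content (IC) per term; the root carries none.
IC = {"HP:root": 0.0, "HP:A": 1.0, "HP:B": 2.5, "HP:C": 1.8}


def term_similarity(t1, t2):
    """Resnik-style similarity: IC of the most informative common ancestor."""
    common = ANCESTORS[t1] & ANCESTORS[t2]
    return max((IC[t] for t in common), default=0.0)


def symmetric_similarity(terms_a, terms_b):
    """Average of the two directional best-match means (assumed symmetrisation)."""
    if not terms_a or not terms_b:
        return 0.0
    a_to_b = mean(max(term_similarity(a, b) for b in terms_b) for a in terms_a)
    b_to_a = mean(max(term_similarity(a, b) for a in terms_a) for b in terms_b)
    return (a_to_b + b_to_a) / 2


def consensus_shift(related_scores, background_scores):
    """Fraction of (related, background) pairs won by the related gene; ties count half."""
    wins = sum(
        1.0 if r > b else 0.5 if r == b else 0.0
        for r in related_scores
        for b in background_scores
    )
    return wins / (len(related_scores) * len(background_scores))


# Step (a): score every disease gene against the HP terms of the disease of interest.
disease_terms = ["HP:B", "HP:C"]
gene_terms = {"g1": ["HP:B"], "g2": ["HP:A"], "g3": ["HP:C"], "g4": ["HP:root"]}
scores = {g: symmetric_similarity(t, disease_terms) for g, t in gene_terms.items()}

# Step (b): compare the candidate's related genes with all scored disease genes.
related = [scores["g1"], scores["g3"]]
background = list(scores.values())
print(scores, consensus_shift(related, background))
```

A value of `consensus_shift` well above 0.5 indicates that the genes related to the candidate tend to score higher than disease genes in general. A formal rank-sum test could replace this statistic where a p-value is needed.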
Fig. 4 Comparing the genes belonging to the “Anchoring of the basal body to the plasma membrane” pathway to the HP terms related to Joubert syndrome. (a) Distribution of symmetric semantic similarity scores of genes for the 8 HP terms related to Joubert syndrome. The red bars correspond to the distribution of the scores of genes belonging to the pathway of interest. The grey bars correspond to the distribution of the scores for all the other genes. (The density of scores equal to 0 is truncated; its actual value is 12.8.) (b) Symmetric semantic similarity scores of genes belonging to the pathway of interest. The gene candidate, CC2D2A, is highlighted. Dashed red lines show the value of three specific quantiles: 50, 75 and 95%. (c) Heatmap showing the best semantic similarity between each gene in the pathway of interest (columns) and each HP term under focus (rows). The red intensity of each square corresponds to the highest semantic similarity score between the HP term of interest and the gene-associated HP terms (white: 0 and red: 5.2). The gene candidate, CC2D2A, is highlighted. In panels (b) and (c), only the top 10 genes are shown. Additional file 3: Figure S1 shows results for all the genes in the pathway
Performance of the pathway consensus approach depending on the prior knowledge used to identify genes related to the candidate under focus
| Type of knowledge | Resource | Number of results | Potential | AUC | Median rank |
|---|---|---|---|---|---|
| Pathways | MetaBase | 2355 | 52% | 74% | 19% |
| Pathways | Reactome | 2669 | 59% | 73% | 20% |
| Neighbors | MetaBase | 4367 | 96% | 68% | 22% |
| Neighbors | MetaBase HQa | 3623 | 80% | 73% | 16% |
| Neighbors | STRING | 3705 | 81% | 71% | 19% |
| Neighbors | STRING HQa | 3247 | 71% | 74% | 14% |
| Upstream neighbors | MetaBase | 4362 | 96% | 65% | 27% |
| Upstream neighbors | MetaBase HQa | 3399 | 75% | 70% | 20% |
| Upstream neighbors | STRING | 2158 | 47% | 73% | 16% |
| Upstream neighbors | STRING HQa | 1825 | 40% | 75% | 14% |
| Downstream neighbors | MetaBase | 3352 | 74% | 74% | 16% |
| Downstream neighbors | MetaBase HQa | 2600 | 57% | 74% | 15% |
| Downstream neighbors | STRING | 2069 | 45% | 73% | 18% |
| Downstream neighbors | STRING HQa | 1722 | 38% | 74% | 15% |
| Pathways + Neighbors | MetaBase + MetaBase HQa | 3746 | 82% | 75% | 15% |
| Pathways + Neighbors | Reactome + MetaBase HQa | 3861 | 85% | 75% | 16% |
| Pathways + Neighbors | MetaBase + STRING HQa | 3515 | 77% | 76% | 14% |
| Pathways + Neighbors | Reactome + STRING HQa | 3617 | 80% | 76% | 15% |
a HQ corresponds to the high-quality network as described in the Materials and Methods
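To make the column headings more concrete, the sketch below shows one way that figures such as "Potential" and "Median rank" could be derived from a benchmark. The definitions are assumptions made for illustration, since this section does not define the columns: potential is taken as the share of benchmark cases for which a result could be computed, and the rank of each true gene is expressed as a percentage of the candidates scored. The AUC column is not reproduced here, and the toy scores are hypothetical.

```python
from statistics import median


def relative_rank(true_score, all_scores):
    """Rank of the true gene (1 = best) as a percentage of the number of candidates."""
    better = sum(1 for s in all_scores if s > true_score)
    return 100.0 * (better + 1) / len(all_scores)


def summarise(cases, n_possible):
    """Summarise a benchmark; `cases` holds (true gene score, all candidate scores)."""
    ranks = [relative_rank(true, scores) for true, scores in cases]
    return {
        "potential_pct": 100.0 * len(cases) / n_possible,  # assumed definition
        "median_rank_pct": median(ranks),
    }


# Hypothetical benchmark: 3 of 4 known gene-disease pairs could be scored.
cases = [
    (0.9, [0.9, 0.5, 0.1, 0.3]),  # true gene ranked 1st of 4
    (0.2, [0.8, 0.2, 0.6, 0.4]),  # true gene ranked 4th of 4
    (0.5, [0.7, 0.5, 0.1, 0.2]),  # true gene ranked 2nd of 4
]
summary = summarise(cases, n_possible=4)
print(summary)
```

Under these assumed definitions, a lower median rank percentage means that the true gene tends to sit nearer the top of the candidate list.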