Yuval Itan, Mark Mazel, Benjamin Mazel, Avinash Abhyankar, Patrick Nitschke, Lluis Quintana-Murci, Stephanie Boisson-Dupuis, Bertrand Boisson, Laurent Abel, Shen-Ying Zhang, Jean-Laurent Casanova.
Abstract
BACKGROUND: Identifying the genotypes underlying human disease phenotypes is a fundamental step in human genetics and medicine. High-throughput genomic technologies provide thousands of genetic variants per individual. The causal genes of a specific phenotype are usually expected to be functionally close to each other. Under this hypothesis, candidate genes are selected from high-throughput data on the basis of their biological proximity to core genes (genes already known to be responsible for the phenotype). There is currently no effective gene-centric online interface for this purpose.
Year: 2014 PMID: 24694260 PMCID: PMC4051124 DOI: 10.1186/1471-2164-15-256
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Schematic representation of the generation, data structure and workflow of the HGCS. (1) Extraction of all direct human protein-protein binding interactions and their confidence scores from STRING. (2) Inversion of the confidence scores into direct biological distances and generation of a genome-wide weighted human network. (3) Generation, for each human gene, of a gene-specific connectome: the set of all other human genes ranked by their biological proximity to that gene. (4) Generation of a MySQL table from all human gene-specific connectomes. (5) Extraction, from Ensembl BioMart, of all human protein IDs, gene IDs, and their corresponding conventional and full names. (6, 7) Generation of a MySQL table of all alternative gene names for each human gene. (8, 9) Establishment of the full set of query gene names by identifying missing genes through their alternative name aliases, and extraction of the target genes from the connectomes of the core genes. (10) Sorting of the target genes according to user-defined metrics, either by relatedness to any of the core genes or separately by core gene. The screen output can then be downloaded as a tab-separated text file.
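Steps (2) and (3) amount to a shortest-path computation over a network whose edge weights are inverted confidence scores. The following is a minimal Python sketch of that idea, not the HGCS implementation. It assumes confidence scores rescaled to (0, 1], a direct distance of 1/score per edge, Dijkstra's algorithm for the genome-wide distances, and degrees of separation counted as the number of edges on the chosen shortest path (ties in distance broken by fewer edges). The function names `build_network` and `connectome` are hypothetical.

```python
import heapq
from collections import defaultdict


def build_network(interactions):
    """Build an undirected weighted graph from (gene_a, gene_b, confidence) tuples.

    Assumption: confidence lies in (0, 1] and the direct distance is 1 / confidence.
    """
    graph = defaultdict(dict)
    for a, b, score in interactions:
        if score <= 0:
            continue  # a zero score has no finite inverse, so the edge is skipped
        dist = 1.0 / score
        # if a pair is listed more than once, keep the shortest direct edge
        if dist < graph[a].get(b, float("inf")):
            graph[a][b] = dist
            graph[b][a] = dist
    return graph


def connectome(graph, source):
    """Return [(gene, distance, degrees_of_separation), ...] for every gene
    reachable from source, sorted by shortest-path distance (Dijkstra)."""
    # best[gene] = (distance, hops); tuples compare distance first, then hops
    best = {source: (0.0, 0)}
    heap = [(0.0, 0, source)]
    while heap:
        dist, hops, gene = heapq.heappop(heap)
        if (dist, hops) > best[gene]:
            continue  # stale heap entry: a shorter path was already found
        for neighbor, weight in graph.get(gene, {}).items():
            candidate = (dist + weight, hops + 1)
            if neighbor not in best or candidate < best[neighbor]:
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate[0], candidate[1], neighbor))
    ranked = [(g, d, h) for g, (d, h) in best.items() if g != source]
    ranked.sort(key=lambda row: (row[1], row[0]))
    return ranked
```

For example, with edges A-B (0.5), B-C (0.8), A-C (0.4) and C-D (1.0), the connectome of A ranks B (distance 2.0), then C (2.5, via the direct edge rather than via B at 3.25), then D (3.5, two degrees of separation). Storing one such ranked list per gene is what step (4) would load into a table.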
Figure 2. The HGCS interface. (A) The two boxes contain the list of genes to prioritize or analyze (obtained from any high-throughput experiment after filtering, or any user-defined list of candidate genes) and the core genes (known to be associated with the phenotype) used for ranking. A scroll box lets the user choose the ranking metric (distance, p-value, or best reciprocal p-value) and whether to rank the results globally or separately by core gene. (B) The output is a table of genes ranked with respect to the core genes, which can be downloaded as a tab-separated text file. For each pair of core and target genes, it reports the HGC-predicted biological distance, the rank of the target gene in the connectome of the core gene, the ratios of the biological distance to the genome-wide median and mean biological distances to the core gene, the sphere of the target gene around the core gene, the degrees of separation between the two genes, and the full gene name.
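The ranking and tab-separated output described for panel (B) can be illustrated with a minimal Python sketch. It assumes precomputed connectomes given as lists of (gene, distance, degrees_of_separation) tuples sorted by distance. Several details are assumptions rather than the HGCS definitions: the p-value is taken as the target's rank divided by the connectome size, global ranking keeps only each target's closest core gene, and the best reciprocal p-value, mean-distance ratio and sphere are omitted. The names `prioritize` and `to_tsv` are hypothetical.

```python
import csv
import io
from statistics import median


def prioritize(targets, core_connectomes, by_core=False):
    """Rank candidate (target) genes against core genes.

    core_connectomes maps each core gene to a list of
    (gene, distance, degrees_of_separation) tuples sorted by distance.
    Returns rows (core, target, distance, rank, p_value, median_ratio, hops).
    """
    rows = []
    for core, conn in core_connectomes.items():
        if not conn:
            continue  # core gene has no connectome: nothing to rank against
        med = median(d for _, d, _ in conn)
        # 1-based position of each gene in this core gene's connectome
        position = {g: (i + 1, d, h) for i, (g, d, h) in enumerate(conn)}
        for target in targets:
            if target not in position:
                continue  # target not reachable from this core gene
            rank, d, h = position[target]
            p_value = rank / len(conn)  # assumption: rank-based empirical p-value
            rows.append((core, target, d, rank, p_value, d / med, h))
    if by_core:
        # separate ranking for each core gene, closest targets first
        rows.sort(key=lambda r: (r[0], r[2]))
        return rows
    # global ranking (assumption): keep each target's closest core gene only
    closest = {}
    for r in rows:
        if r[1] not in closest or r[2] < closest[r[1]][2]:
            closest[r[1]] = r
    return sorted(closest.values(), key=lambda r: r[2])


def to_tsv(rows):
    """Serialize ranked rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["core_gene", "target_gene", "distance", "rank",
                     "p_value", "median_ratio", "degrees_of_separation"])
    writer.writerows(rows)
    return buf.getvalue()
```

Keeping the per-core rows before collapsing them means the same data serves both display modes; only the final sort differs.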