Gil Stelzer1,2, Inbar Plaschkes2, Danit Oz-Levi1, Anna Alkelai1, Tsviya Olender1, Shahar Zimmerman1, Michal Twik1, Frida Belinky3, Simon Fishilevich1, Ron Nudel1, Yaron Guan-Golan4, David Warshawsky4, Dvir Dahary2,5, Asher Kohn4, Yaron Mazor2, Sergey Kaplan2, Tsippi Iny Stein1, Hagit N Baris6,7, Noa Rappaport1, Marilyn Safran1, Doron Lancet8.
Abstract
BACKGROUND: Next generation sequencing (NGS) provides a key technology for deciphering the genetic underpinnings of human diseases. Typical NGS analyses of a patient reveal tens of thousands of non-reference coding variants, but only one or very few are expected to be significant for the relevant disorder. In a filtering stage, one employs family segregation, rarity in the population, predicted protein impact and evolutionary conservation as a means of shortening the variant list. However, narrowing down further towards culprit disease genes usually entails a laborious search for gene-phenotype relationships across numerous separate databases. Thus, a major challenge is to transition from the few hundred shortlisted genes to the most viable disease-causing candidates.
Keywords: Gene prioritization; Guilt by association; Next generation sequencing analysis; Phenotype interpretation; Phenotyping; Variant selection
Year: 2016 PMID: 27357693 PMCID: PMC4928145 DOI: 10.1186/s12864-016-2722-2
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 VarElect information flow. Variant-containing genes are mapped by VarElect, in this example, to two OR-ed phenotype keywords. In the direct mode, Gene A is directly related to Phenotype 1. In the indirect mode, Gene D is related to both Phenotype 1 and Phenotype 2. The three implicating genes R, S and T generate gene-gene-phenotype connections, as shown via three GeneCards sections
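The distinction between the two modes in Fig. 1 can be illustrated with a minimal sketch. It assumes a toy in-memory annotation table: the dictionaries `GENE_PHENOTYPES` and `GENE_LINKS`, the function names, and the assignment of phenotypes to genes R, S and T are all hypothetical, and the sketch performs plain set matching rather than VarElect's actual relevance scoring.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy annotations (NOT VarElect data): which phenotype keywords
# are mentioned in each gene's annotation.
GENE_PHENOTYPES: Dict[str, Set[str]] = {
    "A": {"phenotype1"},
    "R": {"phenotype1"},
    "S": {"phenotype2"},
    "T": {"phenotype1", "phenotype2"},
}

# Hypothetical gene-gene links: gene D is connected to implicating genes R, S, T.
GENE_LINKS: Dict[str, Set[str]] = {"D": {"R", "S", "T"}}


def direct_hits(genes: Iterable[str], keywords: Set[str]) -> Dict[str, List[str]]:
    """Genes whose own annotation matches any of the OR-ed keywords."""
    hits: Dict[str, List[str]] = {}
    for gene in genes:
        matched = GENE_PHENOTYPES.get(gene, set()) & keywords
        if matched:
            hits[gene] = sorted(matched)
    return hits


def indirect_hits(
    genes: Iterable[str], keywords: Set[str]
) -> Dict[str, Dict[str, List[str]]]:
    """Genes connected to the keywords through an implicating gene."""
    hits: Dict[str, Dict[str, List[str]]] = {}
    for gene in genes:
        via: Dict[str, List[str]] = {}
        for partner in GENE_LINKS.get(gene, set()):
            matched = GENE_PHENOTYPES.get(partner, set()) & keywords
            if matched:
                via[partner] = sorted(matched)
        if via:
            hits[gene] = via
    return hits


query_genes = ["A", "D"]
query_keywords = {"phenotype1", "phenotype2"}  # OR-ed phenotype keywords
direct = direct_hits(query_genes, query_keywords)
indirect = indirect_hits(query_genes, query_keywords)
```

In this toy setup, Gene A is found in direct mode only, while Gene D is found only through its implicating genes, mirroring the two paths in the figure.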
Fig. 2 VarElect interface and direct mode example. a The user enters two types of input: gene symbol list and phenotype expression. A Symbolize button activates a process whereby correct symbols are assigned. The phenotype phrase undergoes automatic syntactic validation, whereby success is indicated by a green ✓ check mark. Example cases at top enable initial familiarization with the VarElect tool. “My Analyses” stores previous analyses that may be accessed by the “Open” button. b Top, “identified” genes for which correct official symbols have been entered and are ready for analysis. Symbols may be removed at will from the input list using the X icon. Bottom, the “unidentified” tab displays cases which require symbol resolution. Suggested symbols may be selected from the drop down menu or alternate symbols may be searched for in GeneCards with the magnifying glass or entered using the edit icon. c Output results presented in two tabs for direct and indirect modes. Each result row displays a gene along with its basic information, such as the gene description, category and the phenotype relevance score, also indicated by the size and color of the bar. “Matched Phenotypes” and “Matched Phenotype Count” columns appear for multi-term search phrases. Results may be sorted by any of the table columns but are ranked by the relevance score by default. Gene symbols link to the relevant GeneCard, the orange icon to the variants section of the same gene and the yellow icon to a MalaCards search for the gene symbol. Entering strings within the filter text box searches for them within any of the result rows. The question marks in each column provide additional information. d An expanded row displays a “MiniCard” with snippets from various webcard sections, which place the exhibited evidence in context, highlighting the search words that received hits for one of the genes. External links enable further scrutiny of the information in the data sources from which it was received
Fig. 3 VarElect information display and indirect mode. a The query information section (top) displays the phenotype phrase, the fact that no direct hits were found and in a separate tab - the genes found in indirect mode. Links for un-hit genes and the entire submitted symbolized gene list are also provided. The latter may be reanalyzed by changing the phenotype phrase. b One of the implicated genes expanded to reveal its (top 5) implicating genes, along with their phenotype scores. c Implicating genes expanded to reveal two types of “MiniCards”. First - “Gene to Gene Relation”, i.e. evidence showing how the implicated gene TLN1 appears in the GeneCard for the implicating gene IL6. Second, “Gene to Phenotypes/Keywords Relation” shows how the submitted phenotype keywords appear in different sections of the same implicating gene’s webcard. The provided external links enable further examination of the evidence in the source data itself
Fig. 4 Performance comparison of VarElect to competing tools. a Phenolyzer: Thirty-four queries were submitted for analysis in both VarElect and Phenolyzer (Methods). Each query is based on real results and includes a disease-causing gene spiked into background genes, accompanied by real disease-related symptoms that were recorded from the relevant patients (Table S1). The rank of the probe gene is color coded, bottom to top: top 10, top 11–20, top 21–100, below top 100, not found to be connected to the search terms. b Exomiser: Ten queries were submitted to both VarElect and Exomiser (Methods). The rank of the probe gene resulting from each query in Exomiser was recorded from 2 different score types: 1) Exomiser gene phenotype score, which is solely derived from gene to phenotype relevance; and 2) Exomiser general score, combining variant severity score and gene phenotype score. Ranking results are shown with binning as in Fig. 4a. c Ingenuity: Queries as in Fig. 4b were submitted to Ingenuity Variant Analysis, and the appearance of the spiked gene in the resultant gene list was examined. For this comparison, Ingenuity 0-hop mode was considered comparable to VarElect direct mode, and Ingenuity 1-hop mode was considered comparable to VarElect indirect mode. d Phevor: Queries as in Fig. 4b were submitted to Phevor2 in combination with VAAST3 filtering (Methods). Ranking results are shown with binning as in Fig. 4a
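The rank binning described for Fig. 4a can be written down as a short sketch. The bin boundaries are taken from the caption; the function names, the use of `None` for a probe gene that was not connected to the search terms, and the example ranks are assumptions made for illustration and are not data from the study.

```python
from collections import Counter
from typing import Iterable, Optional


def bin_rank(rank: Optional[int]) -> str:
    """Assign a probe-gene rank to one of the bins named in the Fig. 4a caption.

    `None` is this sketch's (hypothetical) convention for a gene that was not
    found to be connected to the search terms.
    """
    if rank is None:
        return "not found"
    if rank < 1:
        raise ValueError("ranks are 1-based positions in the prioritized list")
    if rank <= 10:
        return "top 10"
    if rank <= 20:
        return "top 11-20"
    if rank <= 100:
        return "top 21-100"
    return "below top 100"


def summarize(ranks: Iterable[Optional[int]]) -> Counter:
    """Count how many queries fall into each bin."""
    return Counter(bin_rank(rank) for rank in ranks)


# Made-up ranks for five hypothetical queries (not results from the paper).
example_counts = summarize([1, 12, None, 150, 3])
```

Counting queries per bin in this way yields the stacked categories that such a comparison plots for each tool.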
Comparative analysis for four diseases with additional phenotype interpretation tools
| Target Gene | VarElect (Direct or Indirect) | Phenolyzer | Exomiser | Phevor (Omicia) | Ingenuity (Direct or Indirect) |
|---|---|---|---|---|---|
| TTC37 | 1 | 3 | 1 | 1 | 77 |
| MYBPC | 1 | 2 | 1 | 1 | 1 |
| TLN1 | 1 | 0 | 19 | 12 | 58a |
| NDUFAF2 | 1 | 1 | 1 | ND | 10 |
All numbers indicate position in the prioritized list. For Ingenuity, numbers indicate the length of the unprioritized list in which the target gene appears. 0 = not detected; Direct = direct mode for VarElect, 0-hop for Ingenuity; Indirect = indirect mode for VarElect, >0-hop for Ingenuity; a = 2-hop; ND = not done for technical reasons. Diseases associated with the listed gene may be viewed in the “Clinical cases solved by VarElect” section