Veronica Andric, Esha Joshi, Roozbeh Manshaei, Sean DeLong, John B A Okello, Priya Dhir, Cherith Somerville, Kirsten M Farncombe, Kelsey Kalbfleisch, Rebekah K Jobling, Stephen W Scherer, Raymond H Kim, S Mohsen Hosseini.
Abstract
BACKGROUND: Variant interpretation is the main bottleneck in medical genomic sequencing efforts. It usually involves genome analysts manually searching through a multitude of independent databases, often with the aid of several, mostly independent, computational tools. To streamline variant interpretation, we developed the GeneTerpret platform, which collates data from current interpretation tools and databases and applies a phenotype-driven query to categorize the variants identified in the genome(s). The platform assigns quantitative validity scores to genes by querying and assembling genotype-phenotype data, sequence homology, molecular interactions, expression data, and animal models. It also uses the American College of Medical Genetics and Genomics (ACMG) criteria to categorize variants into five tiers of pathogenicity. The final output is a prioritized list of potentially causal variants/genes.
Keywords: Bioinformatic application; Causative variants; Disease gene validity; Gene prioritization; Genome interpretation; Genomic variants; Genotype–phenotype correlation; Variant pathogenicity
Year: 2022 PMID: 35180879 PMCID: PMC8857790 DOI: 10.1186/s12920-022-01166-3
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 GeneTerpret workflow. The figure depicts the modules, the databases that feed them, the accepted inputs, and the flow of information through the workflow. The main modules feed into the gene validity, VIP, and causality modules. Three sets of modules are available within GeneTerpret for gene validity exploration: (1) ExPhenosion module: accepts a phenotype as input, with a customizable number of superclasses to walk up, and outputs the connected phenotypes and their associated genes. This module works independently of the other modules, extracting the phenotypes connected to the selected phenotype so that the analyst can explore the genes associated with related phenotypes; (2) CanGene modules: generate a list of candidate genes by compiling various types of evidence. The cross-species (zebrafish and mouse) modules accept one or more diseases by MONDO ID as input and, by checking the related databases, generate a list of genes whose orthologues are associated with a similar disease in animal models. The Homology and Protein–Protein Interaction modules accept a list of known genes for a phenotype (if available); the Homology module returns the genes homologous (paralogues) to those in the known gene list, and the Protein–Protein Interaction module similarly generates a list of genes that interact with the known disease genes. The analyst can select the number of interaction neighborhood levels (level-1, level-2, etc.) for this interpretation. The Gene Expression module accepts a list of relevant tissues as input and outputs the genes expressed in the selected tissues, based on an expression cut-off threshold set by the analyst; (3) KING module: accepts one or more diseases (MONDO IDs) as input and outputs a list of genes associated with those diseases, based on evidence obtained from the Orphanet, OMIM, ClinVar, and MedGen databases.
The validity module accepts as input the gene lists generated by the CanGene and KING modules, together with an ANNOVAR-annotated VCF file or the output of the VIP module. Its output is the VCF file with validity scores added. The VIP module was developed based on the ACMG guidelines [16]; it annotates the variants with pathogenicity terms (PVS1, PS1, etc.) and justifies the assigned terms. The causality module integrates the output of the validity and VIP modules and ranks the variants based on the number of pieces of evidence extracted from the validity modules and the pathogenicity terms from VIP. Simultaneously, an interactive graphical representation of the variants is generated, which allows the analyst to select the desired variants using a LASSO filter.
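The ranking step of the causality module can be illustrated with a minimal Python sketch. The caption does not give the exact ranking rule, so the ordering used here (pathogenicity tier first, then the count of supporting evidence sources) is an assumption, and the `Variant` class, `rank_variants` function, and all identifiers are hypothetical placeholders rather than GeneTerpret's own code or data.

```python
from dataclasses import dataclass, field

# Five ACMG tiers, ordered from most to least pathogenic.
TIER_ORDER = [
    "Pathogenic",
    "Likely pathogenic",
    "Uncertain significance",
    "Likely benign",
    "Benign",
]

@dataclass
class Variant:
    variant_id: str                              # hypothetical identifier
    gene: str                                    # hypothetical gene symbol
    tier: str                                    # one of TIER_ORDER
    evidence: set = field(default_factory=set)   # modules supporting the gene

def rank_variants(variants):
    """Assumed rule: most pathogenic tier first, then most evidence sources."""
    return sorted(
        variants,
        key=lambda v: (TIER_ORDER.index(v.tier), -len(v.evidence)),
    )

# Placeholder records, not real results.
variants = [
    Variant("var_A", "GENE1", "Uncertain significance", {"KING", "Homology", "Mouse"}),
    Variant("var_B", "GENE2", "Likely pathogenic", {"Gene Expression"}),
    Variant("var_C", "GENE3", "Likely pathogenic", {"KING", "Protein-Protein interaction"}),
]
ranked = rank_variants(variants)
ranked_ids = [v.variant_id for v in ranked]
```

Variants that share both the tier and the evidence count keep their input order, because Python's sort is stable. This is one simple way to model groups of equally ranked variants.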
Databases used by operational modules in the GeneTerpret Platform
| Module | Task | Databases |
|---|---|---|
| ExPhenosion | Identifies genes associated with the selected phenotype(s) and its/their superclass phenotypes | Human Phenotype Ontology (HPO); Medical Subject Headings (MeSH) |
| CanGene | | |
| Cross-species: Mouse | Identifies candidate genes that cause a “similar” phenotype in a mouse model | Mouse Genome Informatics (MGI) |
| Cross-species: Zebrafish | Identifies candidate genes that cause a “similar” phenotype in the zebrafish model | The Monarch Initiative |
| Homology | Identifies candidate genes homologous to known disease genes for a phenotype | Ensembl |
| Protein–Protein interaction | Identifies candidate genes/proteins that physically interact with known disease genes/proteins based on human studies | The Biological General Repository for Interaction Datasets (BioGRID) |
| Gene Expression | Identifies candidate genes expressed in the affected tissue | EMBL-EBI Expression Atlas |
| Known INvolved Genes (KING) | Identifies known genes for a selected phenotype | Online Mendelian Inheritance in Man (OMIM); Orphanet; NCBI MedGen; NCBI ClinVar |
| Gene Validity | Calculates validity scores for each gene by examining the strength of the evidence supporting a gene-disease relationship obtained from the above modules | N/A |
| Variant Interpretation Program (VIP) | Classifies variants based on their pathogenicity following the criteria proposed by the American College of Medical Genetics and Genomics (ACMG) | ClinGen Dosage Sensitivity Map; DECIPHER haploinsufficiency predictions; ExAC pLI score; ClinVar; The NHGRI-EBI Catalog of published GWAS; Pfam clans; Weil et al. 2017 [ |
| Causality | Graphical visualization of the distribution of prioritized variants across the five classifications of pathogenicity | |
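The level-based expansion used by the Protein–Protein interaction module (level-1, level-2, and so on) can be sketched as a breadth-first traversal. This is only an illustrative sketch: the function name `neighbours_within`, the toy interaction map, and the placeholder gene names are assumptions, not BioGRID data or the platform's implementation.

```python
from collections import deque

def neighbours_within(graph, seeds, max_level):
    """Return {node: level} for nodes within max_level edges of any seed."""
    level = {s: 0 for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if level[node] == max_level:
            continue  # do not expand beyond the requested level
        for nxt in graph.get(node, ()):
            if nxt not in level:
                level[nxt] = level[node] + 1
                queue.append(nxt)
    return level

# Hypothetical undirected interaction map (placeholder names, not real data).
ppi = {
    "KNOWN1": ["CAND_A", "CAND_B"],
    "CAND_A": ["KNOWN1", "CAND_C"],
    "CAND_B": ["KNOWN1"],
    "CAND_C": ["CAND_A", "CAND_D"],
    "CAND_D": ["CAND_C"],
}

# Candidate genes exclude the seed (known) genes themselves.
level1 = {g for g, lv in neighbours_within(ppi, ["KNOWN1"], 1).items() if lv > 0}
level2 = {g for g, lv in neighbours_within(ppi, ["KNOWN1"], 2).items() if lv > 0}
```

The same traversal would apply to the ExPhenosion superclass walk if `graph` mapped each phenotype term to its parent terms and `max_level` were the number of superclasses to walk up.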
VIP Interpretation of all variants from DECIPHER
| Clinical significance | VIP (Automated Pathogenicity Identifier module) | DECIPHER (Manual Pathogenicity Identifier) | Concordant |
|---|---|---|---|
| Benign | 14 (0.1%) | 23 (0.3%) | 0 (0%) |
| Likely Benign | 81 (0.9%) | 211 (2.4%) | 9 (0.2%) |
| Uncertain significance | 3633 (42.2%) | 3329 (38.6%) | 2202 (48.9%) |
| Likely pathogenic | 2692 (31.3%) | 2508 (29.1%) | 1055 (23.4%) |
| Pathogenic | 2190 (25.4%) | 2539 (29.5%) | 1240 (27.5%) |
| Sum of five tiers | 8610 | 8610 | 4506 |
| Benign or likely benign | 95 (1.1%) | 234 (2.7%) | 9 (0.2%) |
| Pathogenic or likely pathogenic | 4882 (56.7%) | 5047 (58.6%) | 3764 (83.5%) |
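The totals in the table can be checked with a few lines of Python. The counts below are copied from the table; the overall five-tier agreement rate is not stated in the table and is derived here by simple arithmetic, so treat it as an illustration rather than a reported result.

```python
# Counts copied from the table, in the order:
# Benign, Likely benign, Uncertain significance, Likely pathogenic, Pathogenic.
vip = [14, 81, 3633, 2692, 2190]
decipher = [23, 211, 3329, 2508, 2539]
concordant = [0, 9, 2202, 1055, 1240]

total_vip = sum(vip)
total_decipher = sum(decipher)
n_concordant = sum(concordant)

# Share of all variants given the same tier by VIP and by DECIPHER
# (derived value, not reported in the table).
overall_agreement_pct = 100 * n_concordant / total_vip

print(f"variants: {total_vip}, concordant: {n_concordant}, "
      f"agreement: {overall_agreement_pct:.1f}%")
```

The merged "Pathogenic or likely pathogenic" concordant count (3764) is larger than the sum of the two separate tiers (1055 + 1240 = 2295), presumably because a variant called pathogenic by one method and likely pathogenic by the other counts as concordant once the tiers are merged.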
Fig. 2 Graphical representation of the results from the analysis of internal datasets by GeneTerpret and by manual interpretation. (A) The top hundred ranked variants from the family-based analysis of ten families. Red highlights the variant of interest (VOI) selected by a human analyst, as published previously [21]. The boxes group variants that GeneTerpret ranked equally (same pathogenicity and validity terms). (B) The cohort-based results for 20 unrelated probands with “Tetralogy of Fallot”. The top hundred ranked variants are plotted as circles from top to bottom. The only five VOIs selected by a human genome analyst, in five patients from this cohort [20], are highlighted, with a different colour for each patient's VOI. For comparison, the individual analyses of the genomes of the five probands with VOIs are also plotted using the same colour-coding. For instance, purple represents the VOI for patient TOF53 (one of the probands in the cohort): this variant is ranked 44 in the cohort-based analysis and 8 in the singleton-based analysis by GeneTerpret.