Laurens Wiel, Coos Baakman, Daan Gilissen, Joris A Veltman, Gerrit Vriend, Christian Gilissen.
Abstract
The growing availability of human genetic variation has given rise to novel methods of measuring genetic tolerance that better interpret variants of unknown significance. We recently developed a concept based on protein domain homology in the human genome to improve variant interpretation. For this purpose, we mapped population variation from the Exome Aggregation Consortium (ExAC) and pathogenic mutations from the Human Gene Mutation Database (HGMD) onto Pfam protein domains. The aggregation of these variation data across homologous domains into meta-domains allowed us to generate amino acid resolution of genetic intolerance profiles for human protein domains. Here, we developed MetaDome, a fast and easy-to-use web server that visualizes meta-domain information and gene-wide profiles of genetic tolerance. We updated the underlying data of MetaDome to contain information from 56,319 human transcripts, 71,419 protein domains, 12,164,292 genetic variants from gnomAD, and 34,076 pathogenic mutations from ClinVar. MetaDome allows researchers to easily investigate their variants of interest for the presence or absence of variation at corresponding positions within homologous domains. We illustrate the added value of MetaDome by an example that highlights how it may help in the interpretation of variants of unknown significance. The MetaDome web server is freely accessible at https://stuart.radboudumc.nl/metadome.
Keywords: ClinVar; Pfam; genetic tolerance; genetic variation; gnomAD; meta-domains; pathogenicity; protein domain homology; web server
Year: 2019 PMID: 31116477 PMCID: PMC6772141 DOI: 10.1002/humu.23798
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
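The aggregation of variants across homologous domains into meta-domains, as described in the abstract, can be illustrated with a minimal sketch. It assumes each variant has already been mapped to a Pfam family and a consensus position in that family's alignment. The record type `AlignedVariant`, the function `aggregate_meta_domain`, the placeholder genes `GENE_A` and `GENE_B`, and the toy records are all hypothetical and are not taken from MetaDome's code or data.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class AlignedVariant(NamedTuple):
    """Hypothetical record: one missense variant mapped onto a domain alignment."""
    pfam_id: str        # Pfam family, e.g. "PF00069"
    consensus_pos: int  # consensus position in the family alignment
    gene: str           # gene in which the variant was observed
    source: str         # "gnomAD" (population) or "ClinVar" (pathogenic)


def aggregate_meta_domain(
    variants: List[AlignedVariant],
) -> Dict[Tuple[str, int], Dict[str, int]]:
    """Count variants per (family, consensus position), split by source."""
    counts: Dict[Tuple[str, int], Dict[str, int]] = defaultdict(
        lambda: {"gnomAD": 0, "ClinVar": 0}
    )
    for v in variants:
        if v.source not in ("gnomAD", "ClinVar"):
            raise ValueError(f"unknown variant source: {v.source}")
        counts[(v.pfam_id, v.consensus_pos)][v.source] += 1
    return dict(counts)


# Toy records for illustration only; these are not real variant data.
toy_variants = [
    AlignedVariant("PF00069", 10, "CDK13", "ClinVar"),
    AlignedVariant("PF00069", 10, "GENE_A", "ClinVar"),
    AlignedVariant("PF00069", 10, "GENE_B", "gnomAD"),
    AlignedVariant("PF00069", 11, "GENE_A", "gnomAD"),
]

profile = aggregate_meta_domain(toy_variants)
for (family, position), per_source in sorted(profile.items()):
    print(family, position, per_source)
```

Keying the counts on the pair of family and consensus position is what lets a variant in one gene inform the same aligned position in every homologous domain.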
Table 1. Statistics on the number of entries present in GENCODE, Swiss‐Prot, and our mapping database
| Database | What | Number of entries |
|---|---|---|
| GENCODE | Protein‐coding genes | 20,345 |
| MetaDome | Protein‐coding genes | 19,728 |
| GENCODE | Protein‐coding transcripts | 57,005 |
| MetaDome | Protein‐coding transcripts | 56,319 |
| Swiss‐Prot | Canonical and isoform protein sequences | 591,556 |
| Swiss‐Prot | Human canonical and isoform protein sequences | 42,130 |
| MetaDome | Gene translations identically mapped to a canonical or isoform protein sequence | 42,116 |
| MetaDome | Canonical and isoform protein sequences | 33,492 |
| MetaDome | Pfam protein domain regions | 71,419 |
| MetaDome | Unique Pfam protein domain families | 5,948 |
| MetaDome | Unique Pfam protein domain families with two or more within‐human occurrences | 3,334 |
| MetaDome | Chromosome to protein position mappings | 70,261,143 |
| MetaDome | Unique chromosome positions | 32,595,355 |
| MetaDome | Unique residues (as part of a protein) | 19,226,961 |
| MetaDome | Unique protein sequences with at least one Pfam domain annotated | 30,406 |
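To put the counts in the table above in proportion, the following short sketch computes a few ratios from the table's own numbers. The interpretation of each ratio as a "coverage" figure and the dictionary keys are assumptions made for illustration; the source reports only the raw counts.

```python
# Counts copied from the table above.
gencode_genes = 20_345
metadome_genes = 19_728
gencode_transcripts = 57_005
metadome_transcripts = 56_319
pfam_families = 5_948
pfam_families_multi = 3_334  # families with two or more within-human occurrences

# Percentages derived from those counts (interpretive labels are assumptions).
coverage = {
    "genes_mapped_pct": 100 * metadome_genes / gencode_genes,
    "transcripts_mapped_pct": 100 * metadome_transcripts / gencode_transcripts,
    "families_with_homologs_pct": 100 * pfam_families_multi / pfam_families,
}

for name, pct in coverage.items():
    print(f"{name}: {pct:.1f}%")
```

Only Pfam families that occur at least twice within the human genome have homologous occurrences to aggregate over, which is why the last ratio is of interest.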
Figure 1. MetaDome web server result for the gene CDK13. The result provided by the MetaDome web server for the analysis of gene CDK13 with transcript ENST00000181839.4, as entered in (1). In (2), additional information shows that the translation of this transcript corresponds to Swiss‐Prot protein Q14004; various alternative visualizations can also be selected here. The visualization starts by default in the “meta‐domain landscape,” a mode selectable in the graph control in (2). The landscapes are visualized in (3); in the meta‐domain landscape, the domain regions are annotated with bar plots of the missense variation counts found in homologous domains. The schematic protein representation, located at (4), is selectable per position, and the domains are presented as purple blocks. Selected positions are highlighted in green. The “Zoom‐in” section at (5) features a selectable grayed‐out copy of the schematic protein representation that can zoom in on any part of the protein. Any selected positions appear in the list of selected positions in (6), where more information can be obtained by clicking on one of these positions. The functionality of each component is described in detail in Table 2.
Table 2. Descriptions of the various functionalities on the MetaDome result page
| Component | Functionality |
|---|---|
| Gene and transcript input field (Figure …) | Input of gene of interest |
| | Retrieving transcripts for gene of interest |
| | Selecting a transcript |
| | Starting the analysis for selected transcript |
| Graph control field (Figure …) | Toggling between different landscape representations |
| | Reset the zoom on the landscape |
| | Reset the web‐page |
| | Toggle ClinVar variants to be displayed in the schematic protein |
| | Download the visual representation |
| Landscape view (Figure …) | Displays the meta‐domain landscape |
| | Displays the tolerance landscape |
| Schematic protein (Figure …) | Displays a schematic representation of the gene's protein with Pfam protein domains annotated |
| | Hovering over a position displays positional information |
| | Clicking on a position highlights the position and adds the position to the list of “Selected Positions” |
| | Controls the zooming of particular parts of the protein (Figure …) |
| Selected positions (Figure …) | Displays any positions selected in the schematic protein |
| | Displays per selected position: if that position is part of a Pfam protein domain, any known gnomAD or ClinVar variants present at this position, and any variants that are homologously related to this position |
| | Provides more detailed information as a pop‐up when clicking on one of the positions in this list |
Figure 2. Examples of a MetaDome analysis for the gene CDK13. (a) The tolerance landscape depicts a missense over synonymous ratio calculated as a sliding window over the entirety of the protein (Methods: Computing genetic tolerance and generating a tolerance landscape). The missense and synonymous variants are annotated from the gnomAD dataset, and the landscape indicates regions that are intolerant to missense variation. In this CDK13 tolerance landscape, the Pkinase Pfam protein domain (PF00069), shown in purple, is clearly intolerant compared with other parts of the protein. The red bars in the schematic protein representation correspond to pathogenic ClinVar variants found in this gene and in homologous protein domains. All of these variants fall within the intolerant region of the landscape. (b) A zoom‐in on the meta‐domain landscape for CDK13. The Pkinase Pfam protein domain (PF00069) is located between protein positions 707 and 998 and is annotated as a purple box in the schematic protein representation. The meta‐domain landscape displays a deep annotation of the protein domain: the green (gnomAD) and red (ClinVar) bars correspond to the number of missense variants found at aligned homologous positions. Unaligned positions are annotated as black bars. All of this information is displayed upon hovering over these elements. (c) The positional information provides a detailed overview of a position from the “Selected Positions” list, especially if that position is aligned to domain homologs. Here, for position p.Gly714, (1) shows the positional details for this specific protein position and (2) shows any known pathogenic information for this position: there are two known pathogenic missense variants at this position. In (3), meta‐domain information is displayed: p.Gly714 is aligned to consensus position 10 in the Pkinase Pfam protein domain and related to 329 other codons. This consensus position has an alignment coverage of 93.5% for the meta‐domain MSA. Four pathogenic variants are also found in ClinVar at corresponding homologous positions, as can be seen in (4), and (5) gives an overview of all corresponding variants found in gnomAD. MSA, multiple sequence alignment.
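The sliding-window missense over synonymous ratio described for panel (a) can be sketched roughly as follows. This is a simplified illustration, not MetaDome's implementation: the Methods section is not reproduced here, so the window size of 21 residues, the pseudocount, and the use of raw per-residue counts without any background correction are all assumptions, and the function name `tolerance_landscape` is hypothetical.

```python
from typing import List


def tolerance_landscape(
    missense: List[int],
    synonymous: List[int],
    window: int = 21,          # assumed window size, not taken from the source
    pseudocount: float = 1.0,  # assumed smoothing to avoid division by zero
) -> List[float]:
    """Return one missense/synonymous ratio per residue position.

    missense[i] and synonymous[i] are variant counts at residue i.
    Each score sums the counts in a window centred on the residue;
    the window is truncated at the ends of the protein.
    """
    if len(missense) != len(synonymous):
        raise ValueError("count lists must have the same length")
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")

    half = window // 2
    n = len(missense)
    scores = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        m = sum(missense[lo:hi])
        s = sum(synonymous[lo:hi])
        scores.append((m + pseudocount) / (s + pseudocount))
    return scores


# Toy counts for illustration only: a missense-depleted start, then a tolerant stretch.
toy_missense = [0, 0, 0, 4, 4, 4]
toy_synonymous = [2, 2, 2, 2, 2, 2]
scores = tolerance_landscape(toy_missense, toy_synonymous, window=3)
print([round(x, 2) for x in scores])
```

Lower scores mark windows with fewer missense variants relative to synonymous ones, which is the pattern the caption describes for the Pkinase domain.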