Young-In Chi, Timothy J Stodola, Thiago M De Assuncao, Elise N Leverence, Brian C Smith, Brian F Volkman, Angela J Mathison, Gwen Lomberk, Michael T Zimmermann, Raul Urrutia.
Abstract
The histone demethylase KDM6A has recently attracted significant attention because its mutations are associated with a rare congenital disorder (Kabuki syndrome) and various types of human cancers. However, which KDM6A mutations are deleterious to the enzyme, and the mechanisms underlying their dysfunction, remain to be fully understood. Here, we report the results of a multi-tiered approach that evaluates the impact of 197 KDM6A somatic mutations by combining conventional genomics data with computational biophysics. This comprehensive approach incorporates multiple scores derived from alterations in protein sequence, structure, and molecular dynamics. Using this method, we classify the KDM6A mutations into 136 damaging variants (69.0%), 32 tolerated variants (16.2%), and 29 variants of uncertain significance (VUS, 14.7%), which is a marked improvement over the previous classification based on conventional tools (over 40% VUS). We further classify the damaging variants into 15 structural variants (SV), 88 dynamic variants (DV), and 33 structural and dynamic variants (SDV). Comparison with the variant scoring methods used in current clinical diagnosis guidelines demonstrates that our approach provides a more comprehensive evaluation of damaging potential and reveals mechanisms of dysfunction. Thus, these results should be taken into consideration in the clinical assessment of the damaging potential of each mutation, as they provide hypotheses for experimental validation and critical information for the development of mutant-specific drugs to fight diseases caused by KDM6A dysfunction.
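The class proportions quoted in the abstract can be re-derived from the reported counts. The short sketch below is only an arithmetic check on those published numbers; the variable and function names are illustrative and do not come from the study.

```python
# Variant counts as reported in the abstract.
TOTAL = 197
classes = {"damaging": 136, "tolerated": 32, "VUS": 29}
damaging_subgroups = {"SV": 15, "DV": 88, "SDV": 33}


def percent(count, total):
    """Share of `total` as a percentage, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {name: percent(n, TOTAL) for name, n in classes.items()}

# The three classes partition all 197 variants, and the three
# subgroups partition the 136 damaging variants.
assert sum(classes.values()) == TOTAL
assert sum(damaging_subgroups.values()) == classes["damaging"]

print(shares)  # {'damaging': 69.0, 'tolerated': 16.2, 'VUS': 14.7}
```

The rounded shares sum to 99.9% rather than 100% purely because of rounding to one decimal place.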
Keywords: 2OG, 2-oxoglutarate; COSMIC, Catalog of somatic mutations in cancer; Cancer; DV, Dynamics variants; Epigenetic regulator; Genomic variation; HAT, Hydrogen atom transfer; HMT, Histone methyltransferase; Histone demethylase; JmjC, Jumonji C domain; KDM6A; KDM6A, Histone lysine(K)-specific demethylase 6A; Kabuki syndrome; MD, Molecular dynamics; Molecular dynamics; Mutational impact analysis; PDB, Protein data bank; Protein structure; RMSD, Root mean square deviation; RMSF, Root mean square fluctuation; Rg, Radius of gyration; SASA, Solvent-accessible surface area; SDV, Structural & dynamics variants; SNP, Single nucleotide polymorphism; SV, Structural variants; TCGA, The Cancer Genome Atlas; TPR, Tetratricopeptide repeat; VUS, Variant of uncertain (unknown) significance; dbSNP, Single nucleotide polymorphism database; gnomAD, genome aggregation database
Year: 2022 PMID: 35615018 PMCID: PMC9111933 DOI: 10.1016/j.csbj.2022.04.028
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. KDM6A tissue-specific expression profile and cancer somatic mutation types. (A) KDM6A gene expression profile across all tumor samples and paired normal tissues. These data were extracted from the Genotype-Tissue Expression (GTEx) and Gene Expression Profiling Interactive Analysis (GEPIA) portals. TPM: Transcripts Per Million. (B) Primary tissue types associated with KDM6A cancer mutations. A wide range of cancer types are observed with unequal prevalence. Although the figure was prepared with the TCGA data, similar distribution patterns of cancer types are observed in the COSMIC database. (C) Mutation types of KDM6A cancer somatic variants.
Fig. 2. Pre-classification of cancer-associated missense mutations and protein architecture reveal a diffuse landscape. (A) Sub-domain structure and distribution of KDM6A missense mutations within the catalytic domain. The number of independent samples across the TCGA, COSMIC, or ClinVar databases harboring each missense mutation reveals that mutations are spread throughout the sequence and that three mutations (L1100P, R1111C, and R1255W) have a high number of incidents. The impact predictions made by the genomics tools SNPs&Go, MutPred2, PolyPhen2, and Rhapsody are shown by small bars in the order of effect (damaging, red, to tolerated, white). (B) Mapping of cancer-associated missense variants onto the KDM6A molecular structure. The catalytic domain is divided into the JmjC domain (blue), flanked by two additional sub-domains (helical domain: magenta; zinc-binding domain: green), and a long flexible linker (yellow). The bound substrate is shown as ball-and-sticks while the catalytic domain is shown as ribbons. The color codes are identical to the ones used in Fig. 2A. (C) Venn diagram of the damaging variants predicted by each prediction tool, using the threshold value suggested by each program. The number of damaging variants predicted by each program is indicated in parentheses. 47.7% of the variants (94 of 197) share consensus damaging predictions while 40.1% (79 of 197) have conflicting predictions. The consensus tolerated variants (12.2%, 24 of 197) are not shown in this diagram. (D-F) Zoomed views of the key functional regions of KDM6A: the active site (D), the substrate binding interface (E), and the zinc ion binding site (F). Several key interaction residues (pink) are mutated in cancer patients and were used as additional damaging controls in the study. Within the active site, two non-natural damaging control residues (H1146 and E1148) are also labeled in red. H3 histone peptide residues (orange) are shown as sticks.
(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
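The three pre-classification groups in Fig. 2C (consensus damaging, conflicting, consensus tolerated) can be illustrated with a small sketch. It assumes that "consensus" means all four tools agree, which the legend implies but does not state outright. The variant names and calls are placeholders, not data from the study.

```python
from collections import Counter

# Hypothetical per-tool calls (True = predicted damaging at that tool's own
# suggested threshold). Placeholder values for illustration only.
calls = {
    "variant_A": {"SNPs&Go": True, "MutPred2": True, "PolyPhen2": True, "Rhapsody": True},
    "variant_B": {"SNPs&Go": False, "MutPred2": True, "PolyPhen2": True, "Rhapsody": False},
    "variant_C": {"SNPs&Go": False, "MutPred2": False, "PolyPhen2": False, "Rhapsody": False},
}


def preclassify(tool_calls):
    """Assumed rule: unanimous calls form a consensus, anything else conflicts."""
    votes = list(tool_calls.values())
    if all(votes):
        return "consensus damaging"
    if not any(votes):
        return "consensus tolerated"
    return "conflicting"


groups = {name: preclassify(c) for name, c in calls.items()}
counts = Counter(groups.values())

for name, group in groups.items():
    print(name, "->", group)
```

Under this rule the reported group sizes (94, 79, and 24) account for all 197 variants, which is consistent with the three groups being mutually exclusive.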
Fig. 3. Initial scoring and identification of congruent and more functionally relevant MD-based metrics by the cross-correlation matrix of the impact scores. (A) Heatmap of the initial raw scores by individual metrics. Because the original scores have different units and ranges in each column, they have simply been ranked so that they are equally scaled. Damaging scores are indicated by the intensity of red color. The controls are listed in order: two well-known damaging mutants followed by one benign mutant. The twelve key functional disruptors are mutations located directly at key functional residues. The contrast between the key functional disruptors and the gnomAD general population references is quite noticeable for sequence- and structure-based scores, but only for some selected MD-based scores that were identified by the subsequent congruency analysis. (B) Inter-relationship or dependence among protein sequence, structure, and dynamics for proper function. (C) Cross-correlation matrix of the scores from a comprehensive assessment. Among the MD-based scores, time-dependent substrate interaction-zinc ligation energy and RMSF (indicated by arrows) initially stand out as notably congruent with other sequence- and structure-based scores. (D) Cross-correlation matrix of the scores from the metrics finally chosen for meta-score calculations, which are concordant and functionally relevant and thus have been evolutionarily conserved.
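The rank-based scaling in panel A and the cross-correlation matrices in panels C and D can be sketched as follows. This is a minimal illustration under stated assumptions: the legend does not say which correlation coefficient was used, so the sketch computes Pearson correlation on ranks (that is, Spearman correlation). The metric names and score values are hypothetical, and higher is assumed to mean more damaging.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Hypothetical raw scores for five variants; units differ per metric.
scores = {
    "sequence_score": [0.91, 0.15, 0.62, 0.40, 0.77],
    "structure_score": [3.2, 0.1, 1.8, 0.9, 2.5],
    "md_rmsf": [1.9, 0.6, 0.7, 1.1, 1.4],
}

# Ranking puts every metric on the same 1..n scale.
ranked = {metric: average_ranks(v) for metric, v in scores.items()}
metrics = list(ranked)
corr = {a: {b: pearson(ranked[a], ranked[b]) for b in metrics} for a in metrics}

for a in metrics:
    print(a, [round(corr[a][b], 2) for b in metrics])
```

Metrics whose rows correlate strongly with the sequence- and structure-based scores would be the "congruent" ones in this toy setting; how the study itself selected metrics is described only qualitatively in the legend.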
Fig. 4. Mapping of reclassified KDM6A cancer variants and comparison of conventional prediction tools to our comprehensive ‘molecular fitness’ assessment. (A-B) Damaging variants (A) and tolerated variants (B) based on meta-scoring reveal that the damaging variants (red) are concentrated near the active site (the jellyroll fold of the JmjC domain) and the substrate binding interface, while the predicted tolerated variants (teal) are mostly found at the fringe of the catalytic domain. (C) Venn diagram of the tolerated (TV: green) and sub-grouped damaging variants, namely structural (SV: orange), dynamics (DV: pink), and structural & dynamics variants (SDV: red), based on our meta-scoring of all 197 variants. (D) Mapping of the sub-grouped damaging variants. The color codes are identical to the ones used in Fig. 4C. The most damaging variants (SDV) are all concentrated in the JmjC and the zinc-binding domains. (E) Comparison of conventional (sequence-based) prediction tools and comprehensive ‘molecular fitness’ (structural and dynamics-based) assessment for each pre-classified group. For each of the three pre-classification categories, a pie chart shows the damaging versus tolerated split under our new classification. The inset bar chart shows the balance between our three damaging categories. Overall, the comprehensive assessments are in good agreement with the pre-classifications, but provide information of more specific mechanistic value. Confirmed damaging variants among the consensus damaging group by pre-classification (left chart) are altered in structure, dynamics, or both, while confirmed damaging variants among the consensus tolerated group (right) primarily affect protein dynamics. These types of mechanism-based interpretations should help resolve the conflicting variants (middle). The number of variants in each group is indicated in parentheses.
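The sub-grouping of damaging variants in Fig. 4C can be read as a combination of two yes/no findings: whether a variant is flagged by structure-based scoring and whether it is flagged by dynamics-based scoring. The sketch below illustrates only that labeling logic. The function name and boolean inputs are hypothetical, and the study's actual meta-score thresholds are not given in this record.

```python
def subgroup(structural_hit, dynamics_hit):
    """Label a variant from two hypothetical flags (not the study's thresholds)."""
    if structural_hit and dynamics_hit:
        return "SDV"  # structural & dynamics variant
    if structural_hit:
        return "SV"   # structural variant
    if dynamics_hit:
        return "DV"   # dynamics variant
    return "not flagged as damaging"


# Enumerate all four flag combinations.
for s in (True, False):
    for d in (True, False):
        print(f"structural={s!s:5} dynamics={d!s:5} -> {subgroup(s, d)}")
```

A variant flagged by neither criterion is left unlabeled here, because the legend does not describe how tolerated variants and VUS are separated.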