
Protein structure aids predicting functional perturbation of missense variants in SCN5A and KCNQ1.

Brett M Kroncke¹, Jeffrey Mendenhall²,³, Derek K Smith⁴, Charles R Sanders³,⁵, John A Capra⁶,⁷, Alfred L George⁸, Jeffrey D Blume⁴, Jens Meiler²,³,⁹, Dan M Roden¹,⁷,⁹.

Abstract

Rare variants in the cardiac potassium channel KV7.1 (KCNQ1) and sodium channel NaV1.5 (SCN5A) are implicated in genetic disorders of heart rhythm, including congenital long QT and Brugada syndromes (LQTS, BrS), but also occur in reference populations. We previously reported two sets of NaV1.5 (n = 356) and KV7.1 (n = 144) variants with in vitro characterized channel currents gathered from the literature. Here we investigated the ability to predict commonly reported NaV1.5 and KV7.1 variant functional perturbations by leveraging diverse features, including the variant classifiers PROVEAN, PolyPhen-2, and SIFT; evolutionary rate and BLAST position-specific scoring matrices (PSSM); and structure-based features, including "functional density", a measure of the density of pathogenic variants near the residue of interest. Structure-based functional densities were the most significant features for predicting NaV1.5 peak current (adj. R2 = 0.27) and KV7.1 + KCNE1 half-maximal voltage of activation (adj. R2 = 0.29). Additionally, structure-based functional density values improve loss-of-function classification of SCN5A variants, with an ROC-AUC of 0.78 compared with 0.69 for other predictive classifiers (two-sided DeLong test p = .01). These results suggest that structural data can inform predictions of the effects of uncharacterized SCN5A and KCNQ1 variants and provide a deeper understanding of their burden on carriers.


Keywords: And protein function; Function prediction; KCNQ1; Protein structure; SCN5A

Year: 2019 PMID: 30828412 PMCID: PMC6383132 DOI: 10.1016/j.csbj.2019.01.008

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271

Introduction

Of an estimated 20,000 nonsynonymous single nucleotide polymorphisms (nsSNPs) in each individual's protein-coding genome, approximately 10 are presently predicted to be clinically actionable [26]. nsSNPs in KCNQ1 (encoding the KV7.1 channel protein, which complexes with KCNE1 to generate the slow cardiac repolarizing potassium current, IKs) and SCN5A (encoding the NaV1.5 channel protein, which generates the cardiac depolarizing sodium current, INa) are associated with heritable diseases of the heart [4,36,37,38,49], including dilated cardiomyopathy [14,28], cardiac conduction disease [6,29], short QT syndrome [13], sick sinus syndrome [15], types 1 and 3 congenital long QT syndrome (LQTS) [7,18,30,38], and Brugada syndrome (BrS) [5]. However, rare nsSNPs in SCN5A and KCNQ1 collectively occur in ~2% of the population, more often than the rare arrhythmia disorders associated with these genes, suggesting that many of them play only limited roles in disease. Determining the significance and effect size of these nsSNPs will become increasingly important as more people undergo genome or exome sequencing [3,27]. Models used to predict the effects of these nsSNPs are most commonly trained on information-poor binary labels (disease-inducing vs. benign), and binary classification discards information. Moreover, the disease-inducing vs. benign distinction ignores penetrance and the underlying molecular phenotype (or potentially multiple overlapping molecular phenotypes) that may be most informative for therapy. A striking example involves patients presenting with type 3 long QT syndrome due to a gain-of-function SCN5A variant that also impairs trafficking of the encoded channel, NaV1.5. Targeting this gain of function with the antiarrhythmic drug mexiletine can increase cell surface expression of the mutant channel, with the unintended consequence of exaggerating the long QT phenotype [34,35,46].
Using literature datasets we recently curated for both IKs [25,47] and INa [22], we test the hypothesis that incorporating variant-specific functional features from KCNQ1 and SCN5A nsSNPs, together with structure-based features, into prediction models will improve our ability to predict whether previously uncharacterized nsSNPs alter channel currents. Secondary structural elements are independent predictors of deleterious variants in SCN5A and can improve current prediction models [20], suggesting the potential utility of structure-based approaches. In fact, across proteins, the highest densities of disease-associated variants fall largely in structured, functional segments, where the structure and function of these molecules are compromised in the disease state [23,50]. Here, we generated a set of models that predict variant-specific INa and IKs current phenotypes. Identifying the variant-specific functional perturbation will give geneticists and physicians an additional tool to determine whether variants are likely disease-causing and to stratify more accurately the risk that carriers who present without a phenotype will eventually develop channelopathy-based heart disease.

Methods

Quantified functional parameters of KCNQ1 and SCN5A chosen for analysis

For INa, we analyzed peak current, steady-state V1/2 of activation and inactivation, late (persistent) current, and recovery from inactivation [49]. For IKs, we analyzed peak current, V1/2 of activation, and activation and deactivation time constants [19]. We selected these functional features because they are the parameters most consistently reported in the literature. We included functional data for KV7.1 variants only when the protocol involved homotetrameric mutant KV7.1 coexpressed with KCNE1, since this protocol was the most commonly reported. Details of how each dataset was collected are given in the original papers [22,25,48]. Briefly, all variants were normalized to WT measurements reported in the same publication, i.e. peak current (mutant)/peak current (WT), or V1/2 activation (mutant) − V1/2 activation (WT), etc. Most SCN5A variants were characterized by heterologous expression in human embryonic kidney cells (291 of 356 total), so we used only patch-clamp data obtained in human embryonic kidney cells when available. For KCNQ1-KCNE1, most variants were characterized in CHO cells (79 of 165 total). Where multiple articles reported functional characterization of the same variant in the same cell system, we averaged the individual parameters.
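The normalization and averaging steps described above can be sketched as follows. This is a minimal illustration, not the authors' curation code: the record layout (variant, cell line, parameter, mutant value, WT value), the parameter names, and the function names are hypothetical, and each mutant/WT pair is assumed to come from the same publication.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical grouping: parameters expressed relative to WT (%WT)
# versus parameters expressed as a voltage shift from WT (mV).
RATIO_PARAMS = {"peak_current", "late_current", "recovery", "tau_act", "tau_deact"}
SHIFT_PARAMS = {"v_half_act", "v_half_inact"}


def normalize(param, mutant, wt):
    """Normalize one mutant measurement to the WT value from the same publication."""
    if param in RATIO_PARAMS:
        return 100.0 * mutant / wt  # percent of WT
    if param in SHIFT_PARAMS:
        return mutant - wt  # shift from WT in mV
    raise ValueError(f"unknown parameter: {param}")


def aggregate(records, cell_line):
    """Average normalized values per (variant, parameter) within one cell system.

    records: iterable of (variant, cell, param, mutant_value, wt_value) tuples.
    """
    values = defaultdict(list)
    for variant, cell, param, mutant, wt in records:
        if cell == cell_line:  # keep one cell system only
            values[(variant, param)].append(normalize(param, mutant, wt))
    return {key: mean(vals) for key, vals in values.items()}
```

Ratio parameters come out in %WT and voltage parameters as mV shifts, matching the units used in Table 1.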

Generating structural models of KV7.1 (KCNQ1)

No experimental structure of the transmembrane domains of human KV7.1 exists, so we generated models from the recently released Xenopus structure, which has a closed pore and an open voltage sensor, and the human sequence NP_000209.2, which shares 91% identity with it [45]. We used comparative modeling within the RosettaScripts utility in Rosetta 3.8 to build KV7.1 [44]. We rebuilt loops on KV7.1 monomers and then rebuilt the functional homotetramer with symmetry, generating 1000 models. Most of the best-scoring structures had reasonable Cα RMSDs of 1 to 3 Å. We selected the best-scoring model for subsequent analysis. We built models both with and without human calmodulin (CaM) bound; because no significant differences were observed in structure-based features, we used KV7.1 with CaM bound for the analysis presented here.

Generating structural models of NaV1.5

We generated two human NaV1.5 structural models from the human sequence NP_000326.2, one based on the American cockroach sodium channel NaVPaS structure [41] (45% identity) and one based on the electric eel NaV1.4 structure [52] (67% identity). Models of NaV1.5 were refined by rebuilding small, unstructured segments with the same established protocols used for KV7.1, generating 1000 models. Most of the best-scoring structures had reasonable Cα RMSDs of 2 to 4 Å. We selected the best-scoring model for subsequent analysis. Structure-based features calculated from the two models performed very similarly. Because models based on the NaVPaS structure allow more variants to be included in the analysis, we report here features calculated from those models.

Summary of predictive features

Our objective was to predict variant-specific functional perturbations for the cardiac ion channels KV7.1 + KCNE1 (IKs) and NaV1.5 (INa). We used the variant classifiers PROVEAN [9], PolyPhen-2 [1], and SIFT [24]; the sequence alignment-based rate of evolution [32]; mutation rates derived from BLAST position-specific scoring matrices (PSSM) and the Point Accepted Mutation (PAM) matrix score [39]; and several structure-based features, including burial propensities (how often a given residue type is found in the protein interior), neighbor counts (number of neighboring residues), neighbor identities (propensity of neighboring residues to be close in space), and what we term functional density (a k-nearest-neighbors-inspired metric to estimate functional perturbation). These predictive features are described below and summarized in Table S1. As the higher off-diagonal R2 values show, the predictive classifiers were modestly degenerate (correlated with one another); the functional density weights alone, i.e. the local enrichment of functionally characterized variants, were more degenerate (described below; Figs. S1 and S2).
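The degeneracy mentioned above is read from off-diagonal R2 values, i.e. squared Pearson correlations between pairs of features. A minimal standard-library sketch, with hypothetical feature names and values (the supplementary figures may have been produced differently):

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pairwise_r2(features):
    """Off-diagonal R^2 for every pair of named feature columns.

    features: dict mapping feature name -> list of per-variant values.
    """
    return {
        (a, b): pearson_r(features[a], features[b]) ** 2
        for a, b in combinations(sorted(features), 2)
    }
```

An R2 near 1 for a pair means the two features carry nearly the same information, so a penalized model will tend to keep only one of them.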

Calculating structure-derived features

NeighborCount is derived from the number of nearest neighbors weighted by distance and within 11.4 Å of the residue of interest, a cutoff found to be optimized to predict protein structure [12]. NeighborVector is a variation of neighbor density, scaled by how evenly distributed the nearest neighbor residues are to the residue of interest. Amino acid neighbor count (aaneigh) and amino acid neighbor vector (aaneighvector) are analogous to NeighborCount and NeighborVector, respectively, modified to account for amino acid-specific propensities for a given degree of burial [12,51]. NeighborCount, NeighborVector, aaneigh, aaneighvector predictive features were generated using the BioChemical Library (BCL) and the structures described above (for more detail see [12,51]).
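The general idea behind NeighborCount and NeighborVector can be sketched as below. This is not the BCL implementation: the atom used to represent each residue, the 4.0 Å lower bound, and the cosine taper between 4.0 Å and the 11.4 Å cutoff are assumptions made for illustration. NeighborVector is approximated here as the length of the weight-averaged unit vectors pointing to neighbors, which is near 0 when neighbors surround a residue evenly and near 1 when they all lie on one side.

```python
import math

LOWER, UPPER = 4.0, 11.4  # Å; 11.4 Å cutoff from the text, 4.0 Å lower bound assumed


def weight(d, lower=LOWER, upper=UPPER):
    """Assumed distance weight: 1 within `lower`, 0 beyond `upper`, cosine taper between."""
    if d <= lower:
        return 1.0
    if d >= upper:
        return 0.0
    return 0.5 * (math.cos(math.pi * (d - lower) / (upper - lower)) + 1.0)


def neighbor_count(center, others):
    """Distance-weighted count of neighbors (`others` excludes the residue itself)."""
    return sum(weight(math.dist(center, p)) for p in others)


def neighbor_vector(center, others):
    """Norm of the weight-averaged unit vectors from `center` to its neighbors."""
    total_w, vec = 0.0, [0.0, 0.0, 0.0]
    for p in others:
        d = math.dist(center, p)
        w = weight(d)
        if w == 0.0 or d == 0.0:
            continue  # outside cutoff, or same position (no direction)
        total_w += w
        for k in range(3):
            vec[k] += w * (p[k] - center[k]) / d
    if total_w == 0.0:
        return 0.0
    return math.sqrt(sum(v * v for v in vec)) / total_w
```

The aaneigh and aaneighvector variants described above would additionally rescale these values by amino acid-specific burial propensities, which this sketch omits.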

Generating a structure-based functional density predictor

In addition to the structure-based features described above, we used both the structural models and the variant-specific functional datasets for IKs and INa to estimate the “functional density”. Using an approach akin to k-nearest neighbors, we calculated functional density by averaging the functional perturbations of variants near the variant of interest, weighted by the inverse of their distance from the variant of interest. This feature therefore depends on how many functionally perturbed variants lie near the variant of interest: regions of three-dimensional space dense with functionally perturbed variants (“hotspots”) yield a more perturbed prediction. However, all functionally characterized variants contribute to this parameter; we did not use a cutoff to determine whether to include a variant in this analysis. Functional density is calculated as follows: where ρj is the functional density of the jth residue for the xth functional parameter, Δfunctionx,i is the change in functional parameter x for the ith variant, and di,j is the distance between the centers of mass of residues i and j. The sum over i includes other variants at residue j, but only those with a different amino-acid substitution, i.e. mutation(i) ≠ mutation(j). A graphical representation is shown in Fig. S3. The distribution of neighboring residues is similar between KV7.1 and NaV1.5, with a first shell of contacting residues at ~6 Å and a second shell at ~11 Å (Fig. S4). Additionally, we calculated the functional density weights alone (the same equation, but with ∆function = 1) to test whether the signal derived from functional densities could be attributed to protein region bias in the variants that have been functionally characterized.
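Because the weighting is described above only in words, the following is a minimal sketch of functional density under stated assumptions, not the authors' exact formula. It assumes an unnormalized inverse-distance-weighted sum, ρ = Σ_i Δfunction_i / d_ij, since a normalized weighted average would be constant when Δfunction = 1 and the weights-only control would then carry no information. It also assumes a hypothetical 1 Å distance floor (MIN_DIST) so that other substitutions at the same residue, whose center-of-mass distance is 0, do not cause division by zero. Names are hypothetical.

```python
import math

MIN_DIST = 1.0  # Å; hypothetical floor for variants at the same residue (distance 0)


def functional_density(target, variants, coords, unit_weights=False):
    """Assumed form: inverse-distance-weighted sum of perturbations around `target`.

    target:       (residue, mutation) of the variant of interest
    variants:     dict mapping (residue, mutation) -> delta_function
    coords:       dict mapping residue -> (x, y, z) center of mass
    unit_weights: if True, use delta_function = 1 (the "weights only" control)
    """
    res_j, _ = target
    rho = 0.0
    for (res_i, mut_i), delta in variants.items():
        if (res_i, mut_i) == target:
            continue  # exclude the variant of interest itself
        d = max(math.dist(coords[res_i], coords[res_j]), MIN_DIST)
        rho += (1.0 if unit_weights else delta) / d
    return rho
```

When this feature is used inside cross-validation, `variants` would hold only the training-fold variants, so that a held-out variant's own measurement never contributes to its feature value.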

Variant-specific INa and IKs functional perturbation predictive models

Because the number of features in our dataset was large relative to the number of variants, we used regularization to fit predictive models. We used a fully relaxed LASSO penalty, which has good overall predictive performance [16]. Prediction models were 10-fold cross-validated. After feature selection, the relaxed generalized linear model was bootstrapped (1000 replicates) to obtain bootstrap percentile intervals for quantities of interest. We report the adjusted coefficient of determination, adj. R2, with 95% confidence intervals as a measure of the overall predictive performance of the relaxed LASSO model. We focused on models for which LASSO shrinkage retained at least one significant predictive feature and the lower bound of the naïve 95% confidence interval for adj. R2 was >0.10. Relatively few models met these minimum criteria. Because the functional density features were calculated from the data themselves, we additionally subjected the fully relaxed LASSO to an outer 10-fold cross-validation procedure that included the functional density construction step. This accounts for any variability or overfitting that might result from using data-determined functional covariates.
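A fully relaxed LASSO fits an ordinary LASSO to choose which features have nonzero coefficients and then refits those features by unpenalized least squares, so the final coefficients are not shrunk. The pure-Python sketch below illustrates that two-stage idea; it is not the authors' software, which the text does not name. For brevity it uses a fixed, hypothetical penalty `lam` instead of choosing it by 10-fold cross-validation, and it does not standardize features.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used in LASSO coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent: min (1/2n)||y - Xb||^2 + lam*||b||_1.

    The intercept is handled by centering X and y.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[k] for row in X) / n for k in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[k] - x_means[k] for k in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual while all coefficients are 0
    col_ss = [sum(r[k] ** 2 for r in Xc) / n for k in range(p)]
    beta = [0.0] * p
    for _ in range(n_iter):
        for k in range(p):
            if col_ss[k] == 0.0:
                continue  # constant feature stays at zero
            # covariance of feature k with the partial residual that excludes feature k
            rho = sum(Xc[i][k] * (resid[i] + Xc[i][k] * beta[k]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_ss[k]
            delta = new - beta[k]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][k] * delta
                beta[k] = new
    return beta


def ols(X, y):
    """Unpenalized least squares with intercept (Gauss-Jordan on the normal equations)."""
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [[sum(a[r] * a[c] for a in A) for c in range(p)] + [sum(a[r] * v for a, v in zip(A, y))]
         for r in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[r][p] / M[r][r] for r in range(p)]  # [intercept, b_1, ..., b_k]


def relaxed_lasso(X, y, lam):
    """Fully relaxed LASSO: LASSO selects features, OLS refits them without penalty."""
    beta = lasso_cd(X, y, lam)
    active = [k for k, b in enumerate(beta) if abs(b) > 1e-12]
    X_active = [[row[k] for k in active] for row in X]
    return active, ols(X_active, y)
```

With a large enough `lam`, every coefficient shrinks to zero and the refit is intercept-only, which is the behavior the Results describe for folds in which no feature survived.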

Loss-of-function classification of INa and IKs with and without structure-based features

To estimate the impact of functional densities on loss-of-function classification, we further classified variants as loss-of-function by degree of functional perturbation: for INa, <50% of WT peak current [22]; for IKs, <50% of WT peak current or a >10 mV positive shift in V1/2 activation [25]. We used the commonly available sequence-based variant classifiers PolyPhen-2, PROVEAN, BLAST-PSSM, and rate of evolution individually, all combined, and all combined with peak current functional density in a logistic regression model. We generated 95% confidence intervals on the AUCs of the candidate models by bootstrapping with 2000 replicates and used a two-sided DeLong test to evaluate the significance of differences between ROC curves.
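The loss-of-function thresholds above and the ROC AUC used to score each classifier can be sketched as follows. The AUC is computed as the Mann-Whitney probability that a randomly chosen loss-of-function variant receives a higher score than a randomly chosen non-loss-of-function variant (ties count one half), which equals the area under the empirical ROC curve. Function names are hypothetical; `scores` would come from a classifier such as the logistic regression models described above.

```python
def is_lof_ina(peak_pct_wt):
    """INa loss of function: peak current below 50% of WT."""
    return peak_pct_wt < 50.0


def is_lof_iks(peak_pct_wt, v_half_shift_mv):
    """IKs loss of function: peak < 50% of WT, or V1/2 activation shifted > +10 mV."""
    return peak_pct_wt < 50.0 or v_half_shift_mv > 10.0


def roc_auc(scores, labels):
    """ROC AUC as P(score of a LOF variant > score of a non-LOF variant); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```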

Results

Ion channel missense variants have diverse effects on current

Histograms of all functional parameters analyzed are shown in Figs. 1 and 2, with summary statistics in Table 1. For homotetrameric KV7.1 variants, the distribution of IKs peak current is skewed towards 0% of WT, likely reflecting literature bias. The distribution of INa peak current is bimodal, with centers at 0% (complete loss of function) and 100% (WT). IKs V1/2 activation is also skewed towards positive shifts, whereas INa V1/2 activation is more evenly distributed about 0 mV. INa late current is skewed towards higher values. The time constants for IKs activation and deactivation and INa recovery from inactivation cluster around WT but span very wide ranges, with a few points at extremely long characteristic times.

Fig. 1

Histogram distributions of all functional parameters for KV7.1 + KCNE1 (IKs) analyzed in this paper. All values are referenced to WT which is either 100% or 0 mV.

Fig. 2

Histogram distributions of all functional parameters for NaV1.5 (INa) analyzed in this paper. All values are referenced to WT which is either 100% or 0 mV.

Table 1

Summary statistics of functional parameters.

NaV1.5	# of variants	Median [1st Q, 3rd Q]	WT
Peak current	162	82 [36, 100] (%WT)	100%
Late current	61	253 [122, 474] (%WT)	100%
V1/2 activation	163	0.00 [−1.63, 3.09] (mV)	0 mV
V1/2 inactivation	141	0.00 [−4.00, 3.44] (mV)	0 mV
Recovery from inactivation	85	98 [76, 138] (%WT)	100%

KV7.1 + KCNE1	# of variants	Median [1st Q, 3rd Q]	WT
IKs peak current	142	17 [0, 59] (%WT)	100%
V1/2 activation	93	6.40 [0.00, 23.80] (mV)	0 mV
τ activation	58	106 [94, 150] (%WT)	100%
τ deactivation	57	87 [70, 115] (%WT)	100%


Models can significantly predict INa and IKs peak current but rely on different predictive features

Using a linear model, we could predict peak current, a proxy for overall channel function, for both IKs and INa (lower bounds of the 95% CI for adj. R2 of 0.14 and 0.18, respectively; Table 2 and Fig. 3). Interestingly, sequence-based predictors, especially BLAST-PSSM, had the most significant association with IKs peak current (Table S2, Fig. S5) but were less important for predicting INa peak current (Table S3, Fig. S6). Conversely, functional density for peak current provided most of the signal for INa but did not contribute meaningfully to IKs peak current prediction. This suggests that INa peak current, unlike IKs peak current, has a spatial dependence not captured by other published predictive models. The difference may be due in part to the comparatively large fraction of reported SCN5A variants (relative to KCNQ1) that do not perturb peak current yet are still associated with cardiac disease, such as LQT3 variants that increase late current without changing peak current. BLAST-PSSM is sensitive to the evolutionary fitness of residue changes, which may depend more uniformly on peak current for KCNQ1 and more heterogeneously for SCN5A. Alternatively, the spatial distribution of IKs peak current perturbations may be more heterogeneous than that of INa. The functional density weight, a measure of the number of functionally characterized variants near a residue of interest, was selected out of the IKs peak current model but not out of the INa model, suggesting a modest sampling bias towards regions of NaV1.5 sensitive to peak current perturbation.

Table 2

Summary statistics of predictive models.

Functional parameter	Adj. R2† [95% CI†; CV‡]
IKs peak current	0.24 [0.14–0.46; 0.24]
IKs V1/2 activation	0.29 [0.12–0.48; 0.23]
NaV1.5 peak current	0.27 [0.18–0.45; 0.23]
NaV1.5 V1/2 inactivation	0.16 [0.08–0.34; 0.05]

CI: confidence interval.

CV: 10-fold cross-validation.
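The adjusted R2 in Table 2 penalizes the ordinary R2 for the number of predictors, adj. R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), and the confidence intervals are bootstrap percentile intervals. A minimal sketch of both pieces follows. The function names are hypothetical, and in the procedure described in the Methods the relaxed model would be refit inside each resample, which here is left to the user-supplied `statistic` function.

```python
import random


def r_squared(y, y_hat):
    """Coefficient of determination of predictions y_hat for observations y."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def bootstrap_percentile_ci(data, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Resample `data` with replacement n_boot times, apply `statistic` to each
    resample, and return the alpha/2 and 1 - alpha/2 empirical quantiles."""
    rng = random.Random(seed)
    stats = sorted(statistic(rng.choices(data, k=len(data))) for _ in range(n_boot))
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For the R2 of a fitted model, `data` would be a list of (observed, predicted) pairs or, more faithfully, the raw variant records with the model refit inside `statistic`.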

Fig. 3

Experimental vs. predicted functional parameters for the subset of functional parameters with significant predictive models (Table 2). Plots of experimental IKs peak current, IKs V1/2 activation, and NaV1.5 (INa) peak current vs. predictions from a linear regression. The resulting models explain 0.24, 0.29, and 0.27 of the variance in IKs peak current, IKs V1/2 activation, and INa peak current, respectively.


Models can predict steady-state IKs V1/2 activation but not INa V1/2 activation or inactivation

We were able to build a significant predictive model of IKs V1/2 activation. However, no model could reliably predict INa V1/2 activation or inactivation. The variance explained for IKs V1/2 activation is relatively high, 0.29, with a 95% confidence interval lower bound of 0.12 (Table 2). The functional density feature had a significant p-value, suggesting that regions influencing V1/2 activation are localized in three-dimensional space (Table S2, Fig. S7).

Most INa and IKs functional parameters cannot be reliably predicted

Most IKs and INa functional parameters assessed could not be predicted with stable fully relaxed LASSO-regularized linear models whose adj. R2 had a 95% confidence interval lower bound >0.10. For many of these functional parameters, at least one of the 10 cross-validation folds yielded an intercept-only model, i.e. the β coefficients for all input features shrank to 0. For some functional parameters, such as the time constants for IKs activation and deactivation and INa late current and recovery from inactivation, the smaller numbers of characterized variants and relatively low dispersion of values (Table 1, Figs. 1 and 2) mean that the data themselves limit prediction. Alternatively, or in addition, our chosen feature set may contain little information relevant to predicting these values; this is likely the case for INa V1/2 activation and inactivation, which may be undersampled for the functional density analysis.

Structural features improve INa but not IKs loss-of-function classification

For comparison with published variant classifiers predicting binary functional perturbation of these two channels [22,25], we calculated receiver operating characteristic (ROC) curves for models trained using only published classifiers as features and for models additionally trained with structure-based features. We classified SCN5A and KCNQ1 variants as loss-of-function (LOF) or not using the criteria described in the Methods and calculated the ability of several variant classifiers to classify LOF variants correctly. The resulting areas under the curve (AUCs) of logistic models trained to predict KCNQ1 LOF were (AUC; [95% CI]): PolyPhen-2 (0.81; [0.74–0.92]), rate of evolution (0.77; [0.67–0.87]), BLAST-PSSM (0.84; [0.76–0.92]), PROVEAN (0.83; [0.75–0.91]), all published variant classifiers (0.86; [0.78–0.94]), and all published variant classifiers with functional density for peak current (0.87; [0.79–0.94]). Most variant classifiers performed reasonably well, and adding structural information did not meaningfully improve classification for this task. In contrast, the AUCs of logistic models trained to predict SCN5A LOF were: PolyPhen-2 (0.60; [0.51–0.68]), rate of evolution (0.51; [0.42–0.60]), BLAST-PSSM (0.61; [0.52–0.69]), PROVEAN (0.66; [0.57–0.75]), SIFT (0.53; [0.48–0.58]), all published variant classifiers (0.69; [0.60–0.77]), and all published variant classifiers with functional density for peak current (0.78; [0.70–0.85]). This improvement in classifying SCN5A LOF variants when functional density for peak current is added (0.69 without vs. 0.78 with, p = .01) suggests that structure-based features contribute information not contained in the other predictive features (Fig. S8), an observation gaining appreciation elsewhere [42,43].
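The significance of the SCN5A improvement (0.69 vs. 0.78) comes from a paired comparison, because both models score the same variants. The sketch below implements DeLong's test for two correlated AUCs using per-variant placement values (structural components); it is an illustrative standard-library version with hypothetical names, not the code used for the reported p-value.

```python
import math


def _structural_components(scores, labels):
    """DeLong placement values: V10 for each positive, V01 for each negative."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]

    def psi(x, y):  # 1 if the positive outranks the negative, 1/2 for ties
        return 1.0 if x > y else (0.5 if x == y else 0.0)

    v10 = [sum(psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    """Sample covariance with an n - 1 denominator."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test comparing two correlated ROC AUCs on the same variants."""
    a10, a01 = _structural_components(scores_a, labels)
    b10, b01 = _structural_components(scores_b, labels)
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    m, n = len(a10), len(a01)  # numbers of positives and negatives
    var = ((_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m
           + (_cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)) / n)
    z = (auc_a - auc_b) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return auc_a, auc_b, z, p_value
```

The covariance terms are what distinguish this from comparing two independent AUCs: when the two classifiers rank the same variants similarly, the variance of their difference shrinks and smaller AUC gaps become detectable.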

Discussion

A limited number of IKs and INa functional parameters can be predicted reliably

Most IKs and INa functional parameters analyzed could not be predicted reliably: IKs time constants of activation and deactivation; and INa V1/2 activation/inactivation, recovery from inactivation, and late current. However, three important functional parameters could be predicted: IKs peak current, IKs V1/2 activation, and INa peak current. In two of these models, IKs V1/2 activation and INa peak current, the functional density features have the greatest predictive value, indicating three-dimensional enrichment of protein regions that influence these functional parameters (Tables S2 and S3, Figs. S6 and S7).

Functional density suggests regions in three-dimensional space are enriched for influence on IKs V1/2 activation and INa peak current

“Functional density” is a measure of how dense pathogenic variants are near the residue of interest, i.e. whether the residue lies near a “hotspot” that influences a particular function. Given the influence of functional density in predicting IKs V1/2 activation and INa peak current, both parameters likely have a spatial component. As can be seen in Figs. 4 and 5, there are regions (black circles) where variants with a large influence on IKs V1/2 activation and INa peak current are localized. Not surprisingly, the greatest perturbations in IKs V1/2 activation lie in regions of the channel known to be functionally critical: the selectivity filter, the voltage-sensing helix of the voltage sensing domain, and the constriction point in the middle of the pore, as we have seen previously [25]. The S6 helix in KV7.1 influences activation in part through its intrinsic flexibility, a property necessary for activation [40]. The S0 helix has been found to stabilize the voltage sensing domain [17]. The S4 helix is canonically responsible for voltage-dependent activation [8,11,31]. Interestingly, the variants most disruptive to INa peak current are located in the extracellular region of the channel, mostly near the selectivity filter. The pore region of voltage-gated sodium channels is canonically responsible for Na+ conduction [2] and is also enriched for BrS1 variants, a NaV1.5 loss-of-function disorder [21,22]. These data suggest the utility of combining structural models with previously determined functional perturbation datasets to predict functional disruption by previously uncharacterized channel variants.

Fig. 4

Structural model of KV7.1 with colored spheres at Cα positions of variants with V1/2 activation data available. Colors indicate the degree of perturbation from WT KV7.1, with darker colors indicating variants with more positive shifts in V1/2 activation. Selection criteria are displayed in the inset. Several regions of apparent enrichment are highlighted by circles. The tetrameric structure gives the appearance of a greater number of functionally characterized variants.

Fig. 5

Structural model of NaV1.5 with colored spheres at Cα positions of variants with peak current data available. Colors indicate the degree of perturbation from WT NaV1.5, with darker colors indicating variants with less peak current. Selection criteria are displayed in the inset. A single extracellular region shows apparent enrichment and is circled. Although more variants have been functionally characterized for NaV1.5, KV7.1 appears to have more because of its homotetrameric structure.

Challenging regions to predict

To identify potential commonalities among the variants most challenging to predict, we identified the five least congruent predictions at each extreme (predictions much greater than and much less than experiment) for IKs peak current, IKs V1/2 activation, and INa peak current (Figs. S8–S10). All of these variants but one occur in the transmembrane region and on structured segments, not on flexible loops or linkers. Regions challenging for both IKs peak current and V1/2 activation prediction include the extracellular half of the voltage sensing domain, especially the S3 and S4 helices, and the interface between the pore loop helix and helices S5 and S6. The S3 and S4 helices of the voltage sensing domain undergo large conformational changes in response to voltage [8,11,31], which are not captured by the static structure used in this analysis. However, the presence of predictions both greater than and less than experiment within these two segments suggests that changes in function in these regions are heterogeneous, possibly because individual residues in these regions have special roles in voltage-gated activation. Interestingly, several of the challenging IKs peak current variants are located on the S0 helix of KV7.1. We previously observed an anomalous sensitivity to expression level in the S0 helix and suggested that the protein is stabilized by intramolecular interactions between the S0 helix and the rest of the voltage sensing domain [17]. Challenging variants for INa peak current are more evenly distributed throughout the protein (Fig. S10).
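Selecting the least congruent predictions at both extremes amounts to ranking variants by residual (prediction minus experiment) and taking each end of the ranking. A minimal sketch with hypothetical names:

```python
def least_congruent(predicted, observed, k=5):
    """Return the k most over-predicted and the k most under-predicted variants.

    predicted, observed: dicts mapping variant -> value (same keys).
    """
    residuals = {v: predicted[v] - observed[v] for v in observed}
    ranked = sorted(residuals, key=residuals.get)  # most negative residual first
    under = ranked[:k]        # predictions far below experiment
    over = ranked[::-1][:k]   # predictions far above experiment
    return over, under
```

If fewer than 2k variants are available, the two lists can overlap; with the dataset sizes in Table 1 that is not a concern for k = 5.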

Classification of loss-of-function KCNQ1 and SCN5A variants

Classification of variants inherently reduces the richness of available data, in our case the continuous functional perturbation induced by variants in SCN5A and KCNQ1. However, to assess how much structure-based features contribute to predicting variant loss-of-function classification, we built logistic models trained on variants classified as loss-of-function or not loss-of-function. For INa, structure-based features improve the AUC (Fig. S11); for IKs there is no significant improvement. This is consistent with our previous KCNQ1 work showing that sequence- and evolution-based features (BLAST-PSSM and residue rate of evolution) yield a competent classification model, and suggests that alternative features will be needed to further improve prediction for KCNQ1 variants [25]. For SCN5A, structure-based features improve the classification of loss-of-function variants from an AUC of 0.69 to 0.78 (p = .01).

Recent interest in predicting functional perturbation

Recently, Clerx et al. attempted to predict the classification of functionally compromised INa for many of the functional parameters we report here [10]. The authors report modest classification ability for INa late current and V1/2 activation/inactivation, with better performance in predicting complete loss of function. We too find limited ability to predict most functional perturbations; however, we found significant, quantitative correlations between predicted and experimental INa peak current, and we challenge the use of functional classification in favor of quantitative perturbation prediction. Interestingly, the authors also noted difficulty in predicting late current, which we recapitulate here, suggesting that this parameter is a more challenging target. Furthermore, here we put forward functional density, a feature based on knowledge of the three-dimensional structure, and demonstrate its utility in predicting variant phenotype.

Application to variant annotation

The field is still evolving in how to incorporate in silico predictions and experimental functional data quantitatively [33]. We suggest the model presented here could be useful in a pipeline whose first-pass filter aims to detect pathogenic variants. Our previous publication suggested that loss-of-function variants produce non-negligible penetrance when INa peak current is 50% of WT or less. We suggest this implies that our predictions would need to explain >50% of the variance in the experimental data, so that the probability that a variant predicted to be WT-like actually has <50% peak current is very low. Explaining around 0.2 of the variance in the relevant IKs and INa functional parameters, as shown here, is significant; however, further improvement is needed before the predictive models will be useful for classifying variants for clinical use.
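The reasoning behind the >50% variance-explained target can be illustrated with a toy calculation. Assume (our assumption, not the authors') that experimental values scatter around predictions with Gaussian residuals whose standard deviation is the unexplained share of the total spread, SD_total × sqrt(1 - R2). The probability that a variant predicted at 100% of WT actually falls below 50% then shrinks as R2 grows. Any total SD passed to this sketch is hypothetical, not taken from the datasets.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def prob_below_threshold(predicted, threshold, sd_total, r2):
    """P(observed < threshold | prediction), assuming Gaussian residuals with
    SD equal to the unexplained part of the total SD: sd_total * sqrt(1 - r2)."""
    sd_resid = sd_total * math.sqrt(1.0 - r2)
    return normal_cdf((threshold - predicted) / sd_resid)


# Hypothetical spread of 40 %WT; compare the R2 reported here with higher targets.
for r2 in (0.27, 0.5, 0.8):
    print(r2, prob_below_threshold(100.0, 50.0, 40.0, r2))
```

Under these assumptions the misclassification probability falls steadily as R2 rises, which is the intuition behind requiring substantially more variance explained before clinical use.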

Limitations

The dataset used was limited by those variants available in the literature, which are biased towards functionally perturbed variants. We chose to analyze IKs generated with homozygous KV7.1 variants (co-expressed with KCNE1) because this configuration is reported most consistently in the literature. In a majority of cases, KV7.1 variants are heterozygous in individuals. Furthermore, we have begun to investigate the influence of variant-specific functional perturbation on clinical presentation [22], but the exact relationship is complicated (notably including β-adrenergic regulation for IKs) and warrants further investigation. Another limitation is that the structural models are imperfect estimates of the functional state they represent and are also only representative of a single functional state in channels known to have at least two functional states. Models reflecting greater conformational diversity may be another source for improved features.

Conclusions

We have derived predictive features from three-dimensional structures of NaV1.5 and KV7.1 and have demonstrated that these features improve our ability to predict variant-induced functional perturbations in each channel. These predictive features are based on recognizing that residue positions for pathogenic variants are likely to be clustered in three-dimensional space in proximity to other pathogenic residues. On this basis, we can account for approximately 0.2 of the variance in IKs peak current, IKs V1/2 activation, and INa peak current. For IKs V1/2 activation and INa peak current, structure-based features contribute meaningfully to the predictive model, in a way not recapitulated by commonly used sequence-based or evolutionary features or genetic variant classifiers. For predicting variant-induced loss of function, structure-based features contribute meaningfully for INa but not for IKs.

Funding

This work was supported by National Institutes of Health grants K99HL135442 to B.M.K.; R35GM127087 to J.A.C.; HL122010 to A.L.G., C.R.S., and J.M.; and P50GM115305 to D.M.R.

References

1. SNPs, protein structure, and disease.

Authors: Z Wang; J Moult
Journal: Hum Mutat; Date: 2001-04

2. Potassium channel structures.

Authors: Senyon Choe
Journal: Nat Rev Neurosci; Date: 2002-02

3. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.

Authors: Tal Pupko; Rachel E Bell; Itay Mayrose; Fabian Glaser; Nir Ben-Tal
Journal: Bioinformatics; Date: 2002

4. Compound heterozygosity for mutations (W156X and R225W) in SCN5A associated with severe cardiac conduction disturbances and degenerative changes in the conduction system.

Authors: Connie R Bezzina; Martin B Rook; W Antoinette Groenewegen; Lucas J Herfst; Allard C van der Wal; Jan Lam; Habo J Jongsma; Arthur A M Wilde; Marcel M A M Mannens
Journal: Circ Res; Date: 2003-02-07

5. A single Na(+) channel mutation causing both long-QT and Brugada syndromes.

Authors: C Bezzina; M W Veldkamp; M P van Den Berg; A V Postma; M B Rook; J W Viersma; I M van Langen; G Tan-Sindhunata; M T Bink-Boelkens; A H van Der Hout; M M Mannens; A A Wilde
Journal: Circ Res; Date: 1999-12

6. Differential roles of S6 domain hinges in the gating of KCNQ potassium channels.

Authors: Guiscard Seebohm; Nathalie Strutz-Seebohm; Oana N Ureche; Ravshan Baltaev; Angelika Lampert; Ganna Kornichuk; Kaichiro Kamiya; Thomas V Wuttke; Holger Lerche; Michael C Sanguinetti; Florian Lang
Journal: Biophys J; Date: 2005-12-02

7. The KCNQ1 potassium channel: from gene to physiological function.

Authors: Thomas Jespersen; Morten Grunnet; Søren-Peter Olesen
Journal: Physiology (Bethesda); Date: 2005-12

8. Abrupt rate accelerations or premature beats cause life-threatening arrhythmias in mice with long-QT3 syndrome.

Authors: D Nuyens; M Stengl; S Dugarmaa; T Rossenbacker; V Compernolle; Y Rudy; J F Smits; W Flameng; C E Clancy; L Moons; M A Vos; M Dewerchin; K Benndorf; D Collen; E Carmeliet; P Carmeliet
Journal: Nat Med; Date: 2001-09

9. A novel SCN5A arrhythmia mutation, M1766L, with expression defect rescued by mexiletine.

Authors: Carmen R Valdivia; Michael J Ackerman; David J Tester; Tomoyuki Wada; Jorge McCormack; Bin Ye; Jonathan C Makielski
Journal: Cardiovasc Res; Date: 2002-08-01

10. Short QT Syndrome: a familial cause of sudden death.

Authors: Fiorenzo Gaita; Carla Giustetto; Francesca Bianchi; Christian Wolpert; Rainer Schimpf; Riccardo Riccardi; Stefano Grossi; Elena Richiardi; Martin Borggrefe
Journal: Circulation; Date: 2003-08-18
