Literature DB >> 33815689

Improved sequence-based prediction of interaction sites in α-helical transmembrane proteins by deep learning.

Jianfeng Sun, Dmitrij Frishman.

Abstract

Interactions between transmembrane (TM) proteins are fundamental for a wide spectrum of cellular functions, but precise molecular details of these interactions remain largely unknown due to the scarcity of experimentally determined three-dimensional complex structures. Computational techniques are therefore required for a large-scale annotation of interaction sites in TM proteins. Here, we present a novel deep-learning approach, DeepTMInter, for sequence-based prediction of interaction sites in α-helical TM proteins based on their topological, physicochemical, and evolutionary properties. Using a combination of ultra-deep residual neural networks with a stacked generalization ensemble technique, DeepTMInter significantly outperforms existing methods, achieving AUC/AUCPR values of 0.689/0.598. Across the main functional families of human transmembrane proteins, the percentage of amino acid sites predicted to be involved in interactions typically ranges between 10% and 25%, and up to 30% in ion channels. DeepTMInter is available as a standalone package at https://github.com/2003100127/deeptminter. The training and benchmarking datasets are available at https://data.mendeley.com/datasets/2t8kgwzp35.
© 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.

Keywords:  Deep learning; Molecular evolution; Protein function; Protein structure; Protein-protein interactions; Sequence annotation

Year:  2021        PMID: 33815689      PMCID: PMC7985279          DOI: 10.1016/j.csbj.2021.03.005

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Transmembrane (TM) proteins are crucial for a large variety of biological functions [1], acting as transporters, channels, receptors and enzymes [2], and many of these functions are mediated by protein–protein interactions (PPI). For example, a large-scale affinity purification assay of yeast proteins [3] yielded 13,343 interactions between 2875 proteins, of which 1726 interactions involved at least one TM protein. The membrane-linked interactome of Arabidopsis thaliana was reported to involve 12,102 interactions [4]. According to the BioGRID database [5], TM proteins are involved in nearly a quarter of all confirmed human interactions, and an even higher percentage (almost 40%) was identified based on the most recent human interactome map [6]. Precise molecular details of the majority of membrane-bound protein complexes remain poorly characterized due to the scarcity of high-resolution structures. Although TM proteins constitute 25–30% of cellular proteomes [7], they account for less than 2% of all experimentally determined structures in the PDB and, based on the UniProt annotation [8], we estimate that less than 15% of all human protein complexes with a known structure involve TM proteins. Thus, knowledge about the location of protein binding sites in the amino acid sequences of TM proteins remains limited. There are several computational tools for predicting protein interaction sites, including DLPred [9] and DELPHI [10], but, to our knowledge, only two methods have been specifically trained on membrane proteins. One of them, described in [11] and trained on 128 protein chains, relies on amino acid composition and evolutionary conservation, while a more recent tool developed in our group (MBPred [12]) and trained on 171 chains also leverages co-evolutionary information. Both methods are implemented using the same conventional machine learning approach, random forest [13].
The recent advent of deep learning techniques has led to a surge in prediction performance in many areas of structural bioinformatics and has reduced the dependence on subjective selection of informative features by domain experts [14]. Among these deep-learning architectures, deep residual neural networks (ResNets), which allow for fast training with a very large number of layers [15], have enabled considerable progress in predicting secondary structures [16], residue contacts [17], [18] as well as 3D protein structures [19]. An additional path to enhancing accuracy lies in the application of deep learning model ensembles, as highlighted in a recent review [20]. Exponential growth in the number of known amino acid sequences makes it possible to obtain deep multiple sequence alignments (MSAs) required for efficient training of deep learning models for a growing number of protein families. Here, we present a fully automated deep-learning tool, DeepTMInter, to predict interaction sites in transmembrane proteins. The method exploits an ultra-deep ResNet containing 27 residual units, 58 convolutional layers and 1 dense layer, followed by stacked generalization [21] for variance error reduction. DeepTMInter was trained on a dataset of 301 chains from 241 unique transmembrane protein assemblies using a wide range of features, including a non-redundant set of physicochemical scales, protein topology, as well as evolutionary and co-evolutionary characteristics [22]. The model parameters (>2 million weights) were determined based on a 5-fold cross validation by the stratified-shuffle method [23]. On an independent dataset of 30 chains (of which more than one third were released between 2019 and 2020) and on two previously released datasets totaling 137 chains, DeepTMInter significantly outperformed MBPred and DELPHI both within structure-derived and predicted cytoplasmic, transmembrane, and extracellular regions as well as in full amino acid sequences.
Furthermore, by reducing MSA sizes using HHfilter [24] at a 90% sequence-identity threshold we were able to speed up the method by an order of magnitude without a loss in accuracy.

Materials and methods

Datasets of transmembrane proteins with known 3D structure

We obtained from the PDBTM database (version: July 2020) [25] a dataset of 3090 three-dimensional structures of α-helical TM proteins determined at better than 3.5 Å resolution (Fig. 1a). Their biological oligomer structures were generated using the TMDET algorithm [26], [27] based on the PDB BIOMATRIX records. Upon removing structures with non-biological contacts and those with fewer than two chains we were left with 2073 PDB files containing TM protein complexes. Subsequently, a TM protein chain in any of the 2073 complexes was retained only if it possessed at least one residue contact with any other chain in the same complex, defined based on a minimal distance of less than 6 Å between any two non-hydrogen atoms (see Section 2.2 for detailed information about the interaction site definition). This procedure resulted in 10,194 unique protein chains.
Fig. 1

Flowchart of our method to predict interaction sites in TM proteins. (a), (b), and (c) schematically illustrate the dataset generation, input features, and prediction process, respectively.

For comparison purposes we also used two additional datasets described in our previous work [12]. Briefly, the CompData dataset (101 TM protein chains, Supplementary Table S1) was derived by imposing a less than 30% sequence identity cutoff on a dataset of 267 TM protein chains benchmarked by Bordner [11]. The TestData dataset (Supplementary Table S2) contains a non-redundant (sequence identity <30%) set of 36 protein chains deposited in the PDBTM database between June 2015 and June 2017 and used to test our previously developed MBPred method [12]. The structures of 81 and 35 chains in the CompData and TestData datasets, respectively, were determined at better than 3.5 Å resolution. Upon removing these 116 chains from the collection of 10,194 chains described above, we were left with 10,078 chains. Following the common practice in structural bioinformatics [28], [29], [30], we subjected this dataset to a stringent redundancy reduction procedure by imposing the requirement that no sequence pair shares a sequence identity above 25%. The resulting 331 chains were then randomly split into a training dataset (301 protein chains, dubbed TrainData) and an independent dataset (30 protein chains, dubbed IndepData) (Supplementary Tables S3 and S4). A detailed description of the interaction partners of the training and testing protein chains is given in Supplementary Tables S31-S34.

Definition of interaction sites

Prediction of interaction sites is a class-imbalanced problem as the interacting (minority) class is strongly under-represented compared to the non-interacting (majority) class. As discussed in our earlier publication [12], this problem can be partially alleviated by defining amino acid residue contacts based on a somewhat larger distance threshold, which will result in more residues being assigned to the interacting class. For this reason, out of several alternative residue contact definitions, we selected the one proposed by Hamp and Rost [31], which is based on the distance between any two non-hydrogen atoms of less than 6 Å.
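As a minimal sketch, the Hamp–Rost contact criterion above can be expressed in pure Python (`in_contact` and the toy coordinates are hypothetical helpers; in practice the heavy-atom coordinates would be parsed from a PDB file):

```python
def min_dist(res_a, res_b):
    """Minimal Euclidean distance between any two atoms of two residues,
    each given as a list of (x, y, z) heavy-atom coordinates."""
    return min(
        sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
        for p in res_a for q in res_b
    )

def in_contact(res_a, res_b, cutoff=6.0):
    """Two residues are in contact if the minimal distance between any
    two of their non-hydrogen atoms is below the cutoff (6 A by default)."""
    return min_dist(res_a, res_b) < cutoff

# toy example: two residues represented by a few heavy atoms each
res1 = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
res2 = [[6.0, 0.0, 0.0], [9.0, 0.0, 0.0]]
```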

Protein topology

Extracellular (Extra), transmembrane (TM), and cytoplasmic (Cyto) segments were determined exactly as in our previous publication [12]. First, positions of TM regions were defined according to PDBTM. Since PDBTM does not always contain information about the localization of extramembraneous regions (inside or outside), we used Phobius [32] predictions to verify sequence topology. A non-TM segment as defined by PDBTM was confirmed as cytoplasmic if the overlap between this segment and the cytoplasmic region predicted by Phobius was larger than the overlap between this segment and the predicted extracellular region. The same approach was used for extracellular regions. A combination (Combined) of the three segment types above was also used in benchmarking the performance of predictors. The sizes of the three kinds of regions in the training and test datasets are presented in Supplementary Table S17.

Multiple sequence alignments

We generated multiple sequence alignments (MSAs) for each TM protein sequence by running HHblits searches [24] against the latest release of the specially prepared Uniclust30 database. Additionally, we utilized the Uniprot20 database, which has been used in several recent contact prediction projects [33], [34], [35]. We found no significant difference in the prediction performance of models trained using either of these two databases. All parameters used to run HHblits in this work followed the recommendations of Seemayer et al. [36]. In order to keep the CPU and memory requirements for calculating features at a manageable level, HHfilter [24] was applied to keep only sequences sharing <90% sequence identity, which resulted in a significant reduction of MSA depth. In Section 3.1.2 we discuss the influence of the filtered MSAs on prediction performance.

Input features

For each amino acid and each MSA position we generated a series of sequence-based, physicochemical, and evolutionary characteristics (Fig. 1b), including amino acid representation, amino acid physicochemical scales, amino acid composition, MSA evolutionary profile, Shannon entropy, evolutionary conservation, relative position, protein topology, and residue coevolution.

Amino acid representation

Amino acids in each sequence position were encoded by the one-hot representation. A boolean vector of length 20 was used to indicate the presence (1) or absence (0) of the amino acid X, where X is one of the 20 amino acid symbols arranged in alphabetical order: A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, and Y.
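The encoding above can be sketched in a few lines of Python (a minimal illustration; `AA_ORDER` and `one_hot` are our names, not part of the DeepTMInter package):

```python
AA_ORDER = "ACDEFGHIKLMNPQRSTVWY"  # the 20 amino acid symbols in alphabetical order

def one_hot(residue):
    """Boolean vector of length 20: 1 at the residue's position, 0 elsewhere."""
    vec = [0] * 20
    vec[AA_ORDER.index(residue)] = 1
    return vec
```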

Amino acid physicochemical scales

The AAanalysis tool (Breimann et al., manuscript in preparation) was used to generate a representative set of amino acid physicochemical scales from the 565 redundant scales curated in the AAindex database [37] and a further 69 scales compiled from other references. The redundancy of the scales was reduced by applying 2-centroid k-means clustering with a Pearson correlation cutoff of 0.5. The resulting set of non-redundant scales was further clustered into 33 groups, and in each group the scale with the highest Pearson correlation with the centroid of that group was chosen as representative. The final representative dataset contains 34 scales falling into the following 7 categories: conformation (16), polarity (5), energy (8), composition (1), accessible surface (1), shape (1), structure-activity (2) (Supplementary Table S5). Each physicochemical scale was rescaled to the range [0,1]. For comparison purposes we also used several other widely used amino acid physicochemical scales (Supplementary Fig. S6).

Amino acid composition

Amino acid composition of each protein was represented by a vector of length 20 containing the relative frequency of each amino acid.
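A sketch of this feature (the helper name `composition` is ours):

```python
from collections import Counter

def composition(seq):
    """Length-20 vector of relative amino acid frequencies in the sequence."""
    counts = Counter(seq)
    return [counts.get(aa, 0) / len(seq) for aa in "ACDEFGHIKLMNPQRSTVWY"]
```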

MSA evolutionary profile

The evolutionary profile for each symbol Y of the 21 symbols (20 amino acids and one gap symbol) at MSA column i was calculated as

P_i(Y) = f_i(Y) / f(Y)

where f_i(Y) is the relative frequency of the symbol Y in the MSA column i and f(Y) is the relative frequency of Y in the whole MSA.
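A minimal sketch of this profile, assuming the column-to-background frequency ratio described above (`column_profile` and `SYMBOLS` are our names, and the exact functional form of the profile is an assumption):

```python
SYMBOLS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus the gap symbol

def column_profile(msa, i):
    """Profile for MSA column i: the relative frequency of each symbol in
    the column divided by its relative frequency in the whole alignment."""
    col = [seq[i] for seq in msa]
    n_col = len(col)
    n_all = sum(len(seq) for seq in msa)
    profile = []
    for y in SYMBOLS:
        f_col = col.count(y) / n_col
        f_msa = sum(seq.count(y) for seq in msa) / n_all
        profile.append(f_col / f_msa if f_msa > 0 else 0.0)
    return profile
```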

Shannon entropy and evolutionary conservation

Shannon entropy for each MSA column i was computed as

H_i = - Σ_Y f_i(Y) log2 f_i(Y)

where the sum runs over n = 21 symbols (20 amino acids and one gap symbol) and f_i(Y) is the relative frequency of each symbol Y at MSA column i. Lower values of Shannon entropy correspond to higher conservation. Entropy values were therefore transformed in such a way that higher values correspond to a stronger evolutionary conservation C:

C_i = c - H_i

where c is a constant.
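The two formulas can be sketched as follows (the choice c = log2 21, the maximum attainable entropy over 21 symbols, is our assumption for the constant; function names are ours):

```python
import math

def shannon_entropy(column):
    """Shannon entropy of one MSA column over 21 symbols (20 aa + gap)."""
    n = len(column)
    h = 0.0
    for y in set(column):
        f = column.count(y) / n
        h -= f * math.log2(f)
    return h

def conservation(column, c=math.log2(21)):
    """Transformed entropy: higher values = stronger conservation."""
    return c - shannon_entropy(column)
```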

Relative sequence position

Relative sequence position was computed by normalizing the actual position i by the protein length L: r_i = i / L.

Protein topology

For each amino acid position i we generated a boolean vector of length 3 containing a one-hot representation of three topological regions: cytoplasm, transmembrane helix, or extracellular region.

Residue coevolution

The likelihood of two amino acid residues to be in contact can be measured by the evolutionary coupling (EC) values predicted by evolutionary coupling analysis (ECA) methods [38]. In order to quantify the likelihood of a given residue to be involved in a contact, the evolutionary coupling ratio (ECR) has been proposed [22]:

ECR_i = S_i / (S / L)

where S_i is the sum of all EC values involving the residue at position i and S is the sum of the EC values of all residues in a protein of length L, so that the denominator is the average per-residue EC sum. In order to reduce the variance error, we employed ECA tools to generate three types of EC values: mutual information and EVfold scores (generated by FreeContact, ftp://rostlab.org/free/) and Gaussian DCA scores. The ECR feature is thus represented by a vector of length 3.
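A sketch of the ECR computation, assuming a symmetric L x L matrix of EC values and normalization by the average per-residue EC sum (both the helper name and the exact normalization are assumptions):

```python
def ecr(ec, i):
    """Evolutionary coupling ratio for residue i: its summed EC values
    relative to the average per-residue EC sum over the whole protein.
    `ec` is a symmetric L x L matrix of evolutionary coupling values."""
    L = len(ec)
    s_i = sum(ec[i])                      # EC values involving residue i
    s_all = sum(sum(row) for row in ec)   # EC values over all residues
    return s_i / (s_all / L)
```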

The deep learning approach

Sequence window size and feature vector dimension

The choice of the sequence window size is crucial for optimizing the training speed and also because it determines the dimension of the feature vector. For each window centered around a given sequence position we tested three different setups: (i) a comparatively large window size of 9, (ii) a comparatively small window size of 3, and (iii) a combination of the two window sizes, 9 and 3, depending on the particular group of features being used. Upon conducting extensive computational experiments (data not shown) we found setup (iii) to deliver the best results in terms of the number of epochs required for training. We finally chose the window size of 9 for three features - physicochemical scales, evolutionary profile, and residue coevolution - while for all other features we used windows of length 3. This choice resulted in a feature vector of length 660 (Supplementary Table S7).
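Window extraction can be sketched as follows (zero padding beyond the sequence termini and the helper name `window` are our assumptions):

```python
def window(values, i, size):
    """Per-residue feature values in a window of `size` positions centered
    at position i, padding with 0 beyond the sequence termini."""
    half = size // 2
    return [values[j] if 0 <= j < len(values) else 0
            for j in range(i - half, i + half + 1)]
```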

Residual neural network architecture

We developed a deep learning architecture based on a residual neural network (ResNet) for predicting interacting amino acid residues in transmembrane proteins (Fig. 2a). For each amino acid position, the 660-dimensional feature vector (Section 2.6.1) was reshaped into a 26 × 26 matrix (i.e. with 676 elements, of which 16 are zero padding), which was batch-normalized in order to speed up the training process [39]. The matrix was fed into an initial convolutional layer with 64 filters using stride 1, resulting in 64 separate output matrices with the same dimension 26 × 26 as the input matrix. In line with previous work [18], [33], [40], the same number of filters (64) was used in all other convolutional layers within the proposed architecture. The l × l values contained in each filter are the actual parameters to be learned during training. Since the computational cost of applying the convolution operation is directly proportional to the number of parameters, we used relatively small filters of size 3 × 3, initialized with randomly generated normally distributed values.
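The reshaping step can be sketched in pure Python (`to_matrix` is a hypothetical helper):

```python
def to_matrix(features, side=26):
    """Zero-pad a per-residue feature vector to side*side elements and
    reshape it into a square matrix (660 -> 26 x 26 with 16 zeros of padding)."""
    padded = list(features) + [0.0] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]
```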
Fig. 2

Overview of the supervised learning process of DeepTMInter. (a) Layout of the deep ResNet architecture to predict interacting amino acid residues in TM proteins. (b) Using stacked generalization to further enhance the prediction performance of the ResNet models. Rounded boxes in light grey: deep learning or machine learning methods; rounded boxes in dark grey: respective obtained models.

At the core of our ResNet architecture are 27 residual units (also called residual blocks), all with the same structure (Fig. 2a). For comparison, the numbers of residual units in some recently published ResNet architectures were 9 [41], 18 [40], and 22 [34]. We found that increasing the number of residual units even further led to significantly higher requirements for computing power and GPU memory, whereas a deep architecture with 27 residual units could be trained in an acceptable running time. Each residual unit consists of 2 repeated basic structures, each comprising batch normalization [39], a rectified linear unit (ReLU) [42], and a convolutional layer. This design of the residual unit, already used in our previous work [17], accelerates the optimization and helps avoid overfitting of the ResNet architecture [43]. ReLU [42], which makes input data non-negative [44], can alleviate the vanishing gradient problem [44] and further improve the performance of ResNet [42]. In order to reduce data dimensionality and the computational cost, we inserted additional blocks, each comprising batch normalization and a convolutional layer with stride 2 (Fig. 2a), evenly distributed across the architecture: in the beginning (between residual units 1 and 2), in the middle (between residual units 13 and 14), and at the end (between residual units 26 and 27). It has become a common practice to apply dimensionality reduction subsequent to a group of convolutional layers [45], [46], [47], [48].
As a result, the original matrix dimension is progressively reduced from 26 × 26 to 13 × 13, then to 7 × 7, and finally to 4 × 4. The last residual unit is connected to a dense layer with 1024 neurons, followed by a 2-way softmax function which outputs for each amino acid the probabilities p0 and p1 (where p0 + p1 = 1) of it being involved in an interaction (corresponding to the label [0, 1] if p1 > p0) or not (corresponding to the label [1, 0] if p0 > p1). We implemented the deep ResNet architecture depicted in Fig. 2a using Google's TensorFlow library (version 1.12.0) in the Python programming language.
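The 26 → 13 → 7 → 4 progression follows if each stride-2 convolution uses 'same' padding, mapping a spatial size s to ceil(s / 2); this padding scheme is our assumption:

```python
import math

def reduced_sizes(start, n_reductions):
    """Spatial sizes after successive stride-2 convolutions with 'same'
    padding: each reduction maps s -> ceil(s / 2)."""
    sizes = [start]
    for _ in range(n_reductions):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes
```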

Settings for training the ResNet architecture

Adam [49], a computationally efficient optimization algorithm, was employed to train the ResNet architecture with a learning rate of 0.001 and a batch size of 100 training samples. The cross-entropy objective function [50] was used to quantitatively measure the difference between the actual and predicted labels (see Section 2.6.2).
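The objective can be sketched as a per-batch mean cross entropy over the 2-way softmax outputs (a minimal illustration; names are ours):

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean cross entropy between one-hot labels and predicted 2-way
    softmax probabilities, e.g. label [0, 1] for an interacting residue."""
    total = 0.0
    for label, prob in zip(labels, probs):
        total -= sum(t * math.log(p + eps) for t, p in zip(label, prob))
    return total / len(labels)
```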

Cross validation of the ResNet architecture

We categorized protein chains in the TrainData dataset into 5 classes according to their length: <200 (104 chains), 200–400 (125 chains), 401–600 (51 chains), 601–800 (16 chains), and >800 (4 chains) (Supplementary Table S8). 5-fold stratified-shuffle cross validation [23] was employed to evenly allocate protein chains of different length classes for training and validation at each iteration (Supplementary Fig. S1). Protein chains of each class were then randomly selected for training and validation according to the ratio 4:1 at each of the 5 iterations. Specifically, at each iteration, 240 chains were used for training and 60 chains for validation.
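The stratified-shuffle allocation can be sketched as follows (toy class sizes; `stratified_split` is a hypothetical helper, not the scikit-learn implementation):

```python
import random

def stratified_split(chains_by_class, train_ratio=0.8, seed=0):
    """Randomly split chains into training and validation sets while
    preserving the proportion of each length class (stratified shuffling)."""
    rng = random.Random(seed)
    train, valid = [], []
    for chains in chains_by_class.values():
        shuffled = chains[:]
        rng.shuffle(shuffled)
        k = round(len(shuffled) * train_ratio)
        train.extend(shuffled[:k])
        valid.extend(shuffled[k:])
    return train, valid

# toy length classes mimicking the idea of Supplementary Table S8
classes = {"<200": list(range(10)), "200-400": list(range(10, 25))}
```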

Avoiding over-training

Over-training is detrimental to the performance of models on unseen validation data [51], [52], [53], even if they achieve ideal performance on training data [54] (Supplementary Fig. S2). In order to avoid over-training, the early stopping strategy [55] was adopted, which involves aborting the training when the performance on validation data begins to worsen [56]. At each round of cross-validation, the model was chosen from the range of training epochs over which the performance on validation data remained optimal [54], [55].
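Early stopping can be sketched with a patience-based variant (the patience value and the function name are our assumptions):

```python
def best_epoch(validation_errors, patience=5):
    """Early stopping: return the epoch with the lowest validation error,
    aborting once the error has not improved for `patience` epochs."""
    best, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(validation_errors):
        if err < best_err:
            best, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best
```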

Stacked generalization

Stacked generalization [21], an approach for implementing model ensembles [57], was used to minimize the generalization errors of the models trained with the ResNet architecture [58] (Fig. 2b). The combined output, constructed by merging the outputs (by column) of the five models generated by 5 rounds of training on the full TrainData dataset (see Section 3.1.1), served as input for a multi-layer perceptron (MLP) [59] and a Gaussian naive Bayes (GNB) classifier [60]. The output of the MLP and GNB models was then fitted by logistic regression. The resulting final model was used to report and evaluate the performance of the interaction site predictor described in this paper. The MLP, GNB, and logistic regression models were implemented using the scikit-learn package.
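The stacking step can be sketched with a single logistic-regression meta-learner standing in for the full MLP + GNB + logistic regression ensemble (a deliberate simplification; all names are ours):

```python
import math

def merge_columns(model_outputs):
    """Column-wise merge: one row per residue, one column per base model."""
    return [list(row) for row in zip(*model_outputs)]

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Tiny SGD-trained logistic regression fitted on the merged
    base-model outputs; returns a prediction function."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, xi)) + b)))
            g = p - yi  # gradient of the cross-entropy loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return lambda x: 1.0 / (1.0 + math.exp(-(sum(wj * xj for wj, xj in zip(w, x)) + b)))
```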

Evaluation criteria

The overall performance of DeepTMInter was evaluated based on two threshold-free [61], [62] measures: the area under the ROC (Receiver Operating Characteristic) curve (AUC) and the area under the Precision-Recall curve (AUCPR) [63]. In addition, we also employed single-threshold performance measures [61] computed from the counts of TP (true positive), FP (false positive), TN (true negative) and FN (false negative): the number of interacting residues predicted as interacting, the number of non-interacting residues predicted as interacting, the number of non-interacting residues predicted as non-interacting, and the number of interacting residues predicted as non-interacting, respectively. For each protein of length L in the three test datasets we evaluated predictions based on the top-ranked L/5 interaction sites (L-based performance evaluation is the most commonly adopted in structural bioinformatics [18], [29], [33]).
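The threshold-free AUC and the top-L/5 single-threshold measures can be sketched as follows (rank-based AUC via the Mann-Whitney statistic; helper names are ours):

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) statistic."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_recall(scores, labels, top):
    """Single-threshold precision/recall over the `top` highest-scoring
    sites (e.g. top = L // 5 for a protein of length L)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    tp = sum(l for _, l in ranked[:top])   # interacting predicted interacting
    fn = sum(l for _, l in ranked[top:])   # interacting predicted non-interacting
    return tp / top, tp / (tp + fn) if tp + fn else 0.0
```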

Comparison with other methods

We compared the performance of DeepTMInter on the three test datasets (see Section 2.1) with the MBPred algorithm previously developed in our group [12]. The standalone MBPred suite contains four individual predictors - MBPredTM, MBPredCyto, MBPredExtra, and MBPredAll - trained on transmembrane, cytoplasmic, and extracellular regions as well as on full-length TM protein sequences, respectively. Additionally, we also compared DeepTMInter with MBPredCombined, which combines MBPredTM, MBPredCyto, and MBPredExtra predictions. Another method we compared DeepTMInter with is DELPHI [10], which employs a hybrid convolutional and recurrent neural network architecture for predicting interaction sites in proteins of any type, both soluble and transmembrane.

Human transmembrane proteins

We obtained 5178 human protein sequences with at least one annotated transmembrane region from the UniProtKB/Swiss-Prot database [8]. Interaction sites of these proteins were predicted by DeepTMInter. Topologies of the human transmembrane proteins were annotated according to UniProt. We retained for further analysis 5051 human transmembrane proteins with an MSA depth in the range between 20 and 10,000 after filtering (see Section 2.4 for alignment generation). Thus, shallow MSAs providing insufficient evolutionary information as well as excessively deep MSAs imposing excessively high CPU requirements were excluded. These proteins were classified into eight major functional classes - G-protein-coupled receptor (GPCR), catalytic receptor, ligand-gated ion channel (LGIC), voltage-gated ion channel (VGIC), other ion channel, transporter, enzyme, and other protein target - according to the expert-curated "Guide to PHARMACOLOGY" database (GtoPdb) [64], [65].

Protein-protein interaction databases

For human transmembrane proteins we obtained 76,584 unique pairs of interacting proteins from the high-quality expert-curated HuRI-Union resource [6], which represents the union of the HI-union and Lit-BM databases. The 64,006 binary protein interactions (PPIs) in the HI-union database were systematically identified by the yeast two-hybrid (Y2H) assay, while the Lit-BM database comprises a collection of 13,441 high-confidence binary PPIs from the literature. Interaction partners for the proteins in our three test datasets (TestData, CompData, and IndepData) were obtained by merging 1,858,173 binary interactions from the BioGRID database (version 3.5.188) [5] and 1,063,382 binary interactions from the IntAct database (version: 4.1.25) [66] (Supplementary Figs. S3 and S4). A mapping between the PDB codes and UniProt IDs of the proteins was obtained with PyPDB [67].

Results

Prediction performance of DeepTMInter

In addition to performing a 5-fold cross-validation procedure, we also conducted 5 rounds of training on the full TrainData dataset in order to eliminate the influence of random factors, such as the initialization parameters of the residual neural network, on the prediction performance (see 2.6.4, 2.6.6). Note that no protein chain in the TrainData dataset shares more than 25% sequence identity with any protein from the IndepData dataset (see Section 2.1), on which our method was assessed. As seen in Fig. 3, the best performance on validation data is achieved in the vicinity of epoch 60, and the corresponding models were chosen for final assessment according to the early stopping strategy. Overall, the performance of models trained on the full TrainData dataset is significantly better in terms of mean AUC values, AUCPR values, and the cross-entropy error function than that of models trained on TrainData subsets in the course of cross validation (Fig. 3a-c, Supplementary Tables S9 and S10). The ~20% increase in the number of protein chains between each cross-validation subset and the full training set and the concomitant increase in the number of interaction sites (from ~82,000 for each cross-validation subset to 102,685, see Supplementary Table S8) lead to a surge in prediction performance. We therefore settled on the models trained on the full TrainData dataset, which were further used to construct the final ensemble model referred to as DeepTMInter (see Section 2.6.6). The application of stacked generalization to build the final ensemble model increases the AUC by approximately 0.5-3% compared to the separate models across the different regions (Supplementary Table S10).
Fig. 3

Performance of our method on the IndepData dataset based on the 80% subsets and the full TrainData dataset. (a), (b), and (c) show AUC, AUCPR, and cross-entropy error values over 100 training epochs. The errors in (c) measure the difference between actual and predicted labels of interaction sites using the cross-entropy objective function (see Section 2.6.3). Blue lines and red dots represent the mean AUC, AUCPR, and error values produced by the models trained over 5 rounds on the full training set or the models trained on the 80% subsets in the course of 5-fold cross validation, respectively. For each red dot the upper and lower bounds correspond to the maximum and minimum values produced by 5 cross-validation models at each of the 100 epochs, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Influence of MSA depth on prediction performance

MSA depth is a major factor determining the CPU and memory requirements for feature generation. We therefore evaluated the performance of our method on full MSAs (this model is referred to as DeepTMInter-Unfiltered) and on shallower MSAs filtered by HHfilter (see Section 2.4) (referred to as DeepTMInter). The performance of these two models is in general comparable (Fig. 4a-c), with DeepTMInter even outperforming DeepTMInter-Unfiltered in some cases across the four types of either structure-derived or Phobius-predicted regions (Cyto, TMH, Extra, and Combined) (Fig. 4e and f). For example, in the structure-derived Extra region the AUC and AUCPR values of DeepTMInter (0.688 and 0.458, respectively) are significantly higher than the values achieved by DeepTMInter-Unfiltered (0.656 and 0.425, respectively). Thus, significantly reducing the alignment depth (Fig. 4d) allowed us to speed up our method without sacrificing prediction performance. We also examined how the prediction performance is influenced by the MSA quality, quantified by the number of effective sequences (Neff) [68], [69], computed using ConKit [70] with an 80% sequence-identity threshold [68] (Fig. 4g and h). According to the linear regression analysis there is overall a statistically significant correlation between the MSA quality and the prediction performance on all test datasets, as evidenced by low p-values (e.g., 1.415e-05 on TestData and 2.084e-04 on IndepData), although the R-squared values are moderate (e.g., 0.42 on TestData and 0.38 on IndepData).
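Neff with an 80% identity threshold can be sketched as follows (a common down-weighting scheme, not necessarily ConKit's exact implementation; the helper name is ours):

```python
def neff(msa, identity_threshold=0.8):
    """Number of effective sequences: each sequence is down-weighted by the
    number of alignment members sharing >= 80% sequence identity with it."""
    def identity(a, b):
        return sum(x == y for x, y in zip(a, b)) / len(a)
    total = 0.0
    for a in msa:
        cluster = sum(identity(a, b) >= identity_threshold for b in msa)
        total += 1.0 / cluster
    return total
```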
Fig. 4

Performance comparison of DeepTMInter and DeepTMInter-Unfiltered on the IndepData dataset. (a), (b), and (c) show mean AUC, AUCPR, and cross-entropy error values with (solid line) and without (dashed line) HHfilter over 100 training epochs produced by the models trained over 5 rounds on the full training set. (d) presents the number of homologous sequences in MSAs generated with and without using HHfilter, with the mean values of 39,553 and 12,663, respectively. (e) and (f) show the AUC and AUCPR values produced by the final ensemble models (stacked generalization, see Section 2.6.6) on four types of structure-derived and Phobius-predicted regions (Cyto, TMH, Extra, and Combined) (Supplementary Tables S11). (g) shows the distribution of Neff values for the MSAs of proteins in the three test datasets. (h) shows the correlation between Neff and the prediction performance measured as AUC on the three test datasets. Each dot represents a protein.


Selection of amino acid physicochemical scales

To investigate how the choice of amino acid physicochemical scales influences model performance, two groups of scales were prepared: one generated by the AAanalysis tool and the other collected manually from the literature (see Methods, Section 2.5.2). Our final model, DeepTMInter, was trained with the scales generated by the AAanalysis tool together with all the other features. For comparison, the model trained with the literature-derived scales and all the other features is referred to as DeepTMInter-Lit. Overall, DeepTMInter performs better than DeepTMInter-Lit in terms of AUC values on all test datasets (Fig. 5). For instance, using Phobius-predicted combined regions on the IndepData dataset, DeepTMInter achieves an AUC of 0.690 (AUCPR = 0.599) compared to 0.676 (0.595) for DeepTMInter-Lit (Supplementary Tables S10 and S12). We assume that this gain in performance stems from the fact that the AAanalysis tool selects a representative scale of each kind and thus significantly reduces data redundancy, which is detrimental to learning algorithms [71], [72].
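One common way to reduce redundancy among amino acid scales is to keep a single representative from each group of highly correlated scales. The sketch below illustrates that idea with a greedy correlation filter; the threshold, the greedy strategy, and the toy scale values are assumptions for illustration, not AAanalysis's actual algorithm.

```python
# Illustrative redundancy reduction: greedily keep a scale only if it is not
# strongly correlated (|Pearson r| >= cutoff) with any already-kept scale.
import numpy as np

def select_representative_scales(scales: dict[str, list[float]],
                                 corr_cutoff: float = 0.9) -> list[str]:
    selected: list[str] = []
    for name, values in scales.items():
        vec = np.asarray(values, dtype=float)
        redundant = any(
            abs(np.corrcoef(vec, np.asarray(scales[kept], dtype=float))[0, 1])
            >= corr_cutoff
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected

# Toy scales over five residues: two near-duplicate hydrophobicity scales
# and an uncorrelated volume scale.
scales = {
    "hydrophobicity_A": [1.8, 2.5, -3.5, -3.5, 2.8],
    "hydrophobicity_B": [1.9, 2.4, -3.4, -3.6, 2.7],   # near-duplicate of A
    "volume":           [88.6, 108.5, 111.1, 114.1, 189.9],
}
print(select_representative_scales(scales))  # → ['hydrophobicity_A', 'volume']
```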
Fig. 5

Performance comparison of DeepTMInter and DeepTMInter-Lit based on the IndepData dataset. (a), (b), and (c) show mean AUC, AUCPR, and cross-entropy error values over 100 training epochs produced by the DeepTMInter (solid line) and DeepTMInter-Lit (dashed line) models trained over 5 rounds on the full training set. (d) shows the AUC and AUCPR values produced by the final ensemble models (stacked generalization, see Section 2.6.6) on four types of structure-derived and Phobius-predicted regions (Cyto, TMH, Extra, and Combined). (e) and (f) display the distribution of MCC and recall values of protein chains.


Performance comparison of DeepTMInter with other methods

We compared the prediction performance of DeepTMInter with the four underlying predictors (MBPredTM, MBPredCyto, MBPredExtra, and MBPredAll) in the MBPred suite, with the ensemble predictor MBPredCombined, as well as with DELPHI (see Section 2.8). Owing to the early stopping strategy adopted to prevent overtraining (see Section 2.6.5), our trained model achieved high prediction performance not only on the two previous test datasets (TestData and CompData) (Table 1, Table 2), but also on the independent test dataset (IndepData) (Fig. 6, Fig. 7, Table 3, and Supplementary Tables S13-S14).
Table 1

Performance gauged by mean precision (P), recall (R), specificity (SP), accuracy (ACC), MCC, F1-score (F1), AUC, and AUCPR (see Section 2.7) using structure-derived Cyto, TMH, Extra, and Combined regions on the TestData dataset.

(Metric columns P, R, SP, ACC, MCC, and F1 are reported twice: first at the L/5 threshold, then at the L/10 threshold; AUC and AUCPR are threshold-free.)

| Predictor | Region | P | R | SP | ACC | MCC | F1 | P | R | SP | ACC | MCC | F1 | AUC | AUCPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MBPredCyto | Cyto | 0.651 | 0.231 | 0.817 | 0.439 | 0.077 | 0.298 | 0.613 | 0.113 | 0.913 | 0.413 | 0.038 | 0.172 | 0.760 | 0.539 |
| MBPredTM | Cyto | 0.593 | 0.215 | 0.806 | 0.422 | 0.008 | 0.276 | 0.540 | 0.109 | 0.907 | 0.397 | 0.001 | 0.157 | 0.612 | 0.488 |
| MBPredExtra | Cyto | 0.645 | 0.219 | 0.811 | 0.440 | 0.068 | 0.295 | 0.566 | 0.093 | 0.905 | 0.403 | 0.001 | 0.153 | 0.581 | 0.444 |
| MBPredAll | Cyto | 0.657 | 0.227 | 0.814 | 0.449 | 0.075 | 0.293 | 0.619 | 0.114 | 0.910 | 0.420 | 0.031 | 0.172 | 0.745 | 0.511 |
| DELPHI | Cyto | 0.663 | 0.256 | 0.838 | 0.485 | 0.107 | 0.325 | 0.689 | 0.148 | 0.924 | 0.460 | 0.113 | 0.212 | 0.720 | 0.575 |
| DeepTMInter | Cyto | 0.743 | 0.322 | 0.871 | 0.481 | 0.202 | 0.376 | 0.733 | 0.197 | 0.944 | 0.440 | 0.158 | 0.250 | 0.807 | 0.744 |
| MBPredCyto | TMH | 0.574 | 0.224 | 0.816 | 0.490 | 0.059 | 0.293 | 0.574 | 0.119 | 0.909 | 0.480 | 0.040 | 0.184 | 0.605 | 0.449 |
| MBPredTM | TMH | 0.620 | 0.248 | 0.837 | 0.509 | 0.106 | 0.324 | 0.665 | 0.141 | 0.926 | 0.500 | 0.105 | 0.219 | 0.753 | 0.507 |
| MBPredExtra | TMH | 0.572 | 0.210 | 0.814 | 0.489 | 0.051 | 0.284 | 0.592 | 0.120 | 0.911 | 0.484 | 0.051 | 0.189 | 0.581 | 0.410 |
| MBPredAll | TMH | 0.645 | 0.267 | 0.841 | 0.532 | 0.150 | 0.343 | 0.637 | 0.137 | 0.922 | 0.510 | 0.095 | 0.213 | 0.724 | 0.471 |
| DELPHI | TMH | 0.584 | 0.248 | 0.836 | 0.535 | 0.096 | 0.320 | 0.654 | 0.147 | 0.930 | 0.535 | 0.121 | 0.227 | 0.707 | 0.569 |
| DeepTMInter | TMH | 0.734 | 0.350 | 0.890 | 0.554 | 0.240 | 0.429 | 0.775 | 0.205 | 0.958 | 0.523 | 0.200 | 0.303 | 0.820 | 0.721 |
| MBPredCyto | Extra | 0.624 | 0.205 | 0.804 | 0.447 | 0.025 | 0.288 | 0.587 | 0.101 | 0.900 | 0.424 | 0.009 | 0.165 | 0.626 | 0.528 |
| MBPredTM | Extra | 0.572 | 0.184 | 0.797 | 0.418 | 0.001 | 0.258 | 0.516 | 0.085 | 0.891 | 0.408 | 0.001 | 0.141 | 0.594 | 0.453 |
| MBPredExtra | Extra | 0.660 | 0.228 | 0.809 | 0.463 | 0.072 | 0.312 | 0.658 | 0.112 | 0.910 | 0.437 | 0.061 | 0.184 | 0.685 | 0.497 |
| MBPredAll | Extra | 0.611 | 0.222 | 0.814 | 0.442 | 0.023 | 0.290 | 0.625 | 0.117 | 0.910 | 0.439 | 0.047 | 0.187 | 0.672 | 0.518 |
| DELPHI | Extra | 0.591 | 0.230 | 0.828 | 0.479 | 0.054 | 0.304 | 0.648 | 0.134 | 0.919 | 0.470 | 0.071 | 0.209 | 0.644 | 0.591 |
| DeepTMInter | Extra | 0.763 | 0.325 | 0.867 | 0.504 | 0.194 | 0.405 | 0.778 | 0.181 | 0.938 | 0.460 | 0.161 | 0.265 | 0.738 | 0.685 |
| MBPredAll | Combined | 0.654 | 0.245 | 0.829 | 0.493 | 0.111 | 0.329 | 0.646 | 0.120 | 0.916 | 0.458 | 0.065 | 0.190 | 0.721 | 0.497 |
| MBPredCombined | Combined | 0.665 | 0.242 | 0.827 | 0.492 | 0.118 | 0.331 | 0.662 | 0.119 | 0.914 | 0.457 | 0.075 | 0.191 | 0.732 | 0.517 |
| DELPHI | Combined | 0.616 | 0.251 | 0.839 | 0.513 | 0.094 | 0.331 | 0.653 | 0.137 | 0.926 | 0.495 | 0.094 | 0.213 | 0.695 | 0.572 |
| DeepTMInter | Combined | 0.759 | 0.344 | 0.888 | 0.530 | 0.234 | 0.425 | 0.815 | 0.202 | 0.961 | 0.488 | 0.205 | 0.294 | 0.793 | 0.718 |
Table 2

Performance gauged by mean precision (P), recall (R), specificity (SP), accuracy (ACC), MCC, F1-score (F1), AUC, and AUCPR (see Section 2.7) using structure-derived Cyto, TMH, Extra, and Combined regions on the CompData dataset.

(Metric columns P, R, SP, ACC, MCC, and F1 are reported twice: first at the L/5 threshold, then at the L/10 threshold; AUC and AUCPR are threshold-free.)

| Predictor | Region | P | R | SP | ACC | MCC | F1 | P | R | SP | ACC | MCC | F1 | AUC | AUCPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MBPredCyto | Cyto | 0.727 | 0.236 | 0.838 | 0.432 | 0.082 | 0.337 | 0.693 | 0.115 | 0.924 | 0.385 | 0.066 | 0.190 | 0.618 | 0.622 |
| MBPredTM | Cyto | 0.674 | 0.215 | 0.809 | 0.410 | 0.020 | 0.309 | 0.616 | 0.102 | 0.907 | 0.371 | 0.008 | 0.168 | 0.571 | 0.552 |
| MBPredExtra | Cyto | 0.681 | 0.216 | 0.821 | 0.411 | 0.030 | 0.308 | 0.663 | 0.112 | 0.914 | 0.379 | 0.048 | 0.181 | 0.585 | 0.570 |
| MBPredAll | Cyto | 0.764 | 0.261 | 0.855 | 0.454 | 0.139 | 0.363 | 0.728 | 0.132 | 0.935 | 0.400 | 0.103 | 0.210 | 0.656 | 0.664 |
| DELPHI | Cyto | 0.754 | 0.248 | 0.870 | 0.463 | 0.136 | 0.358 | 0.751 | 0.129 | 0.942 | 0.414 | 0.111 | 0.212 | 0.735 | 0.717 |
| DeepTMInter | Cyto | 0.804 | 0.289 | 0.900 | 0.461 | 0.174 | 0.387 | 0.764 | 0.156 | 0.957 | 0.400 | 0.129 | 0.229 | 0.807 | 0.779 |
| MBPredCyto | TMH | 0.598 | 0.227 | 0.818 | 0.505 | 0.074 | 0.307 | 0.594 | 0.120 | 0.911 | 0.486 | 0.049 | 0.188 | 0.578 | 0.452 |
| MBPredTM | TMH | 0.663 | 0.269 | 0.850 | 0.531 | 0.147 | 0.354 | 0.715 | 0.149 | 0.931 | 0.507 | 0.133 | 0.232 | 0.650 | 0.558 |
| MBPredExtra | TMH | 0.574 | 0.209 | 0.812 | 0.495 | 0.047 | 0.289 | 0.570 | 0.108 | 0.906 | 0.481 | 0.030 | 0.173 | 0.576 | 0.444 |
| MBPredAll | TMH | 0.662 | 0.270 | 0.848 | 0.535 | 0.150 | 0.355 | 0.667 | 0.144 | 0.931 | 0.505 | 0.106 | 0.222 | 0.669 | 0.566 |
| DELPHI | TMH | 0.664 | 0.285 | 0.853 | 0.553 | 0.171 | 0.365 | 0.699 | 0.159 | 0.933 | 0.528 | 0.146 | 0.237 | 0.715 | 0.571 |
| DeepTMInter | TMH | 0.719 | 0.319 | 0.889 | 0.552 | 0.211 | 0.403 | 0.757 | 0.181 | 0.955 | 0.517 | 0.174 | 0.267 | 0.803 | 0.716 |
| MBPredCyto | Extra | 0.647 | 0.216 | 0.823 | 0.472 | 0.075 | 0.308 | 0.617 | 0.107 | 0.913 | 0.435 | 0.040 | 0.176 | 0.591 | 0.515 |
| MBPredTM | Extra | 0.589 | 0.194 | 0.801 | 0.450 | 0.016 | 0.277 | 0.545 | 0.094 | 0.899 | 0.422 | −0.007 | 0.154 | 0.545 | 0.448 |
| MBPredExtra | Extra | 0.696 | 0.243 | 0.863 | 0.492 | 0.133 | 0.341 | 0.703 | 0.131 | 0.943 | 0.453 | 0.110 | 0.212 | 0.643 | 0.586 |
| MBPredAll | Extra | 0.681 | 0.244 | 0.854 | 0.496 | 0.124 | 0.339 | 0.701 | 0.135 | 0.937 | 0.463 | 0.110 | 0.216 | 0.635 | 0.581 |
| DELPHI | Extra | 0.664 | 0.244 | 0.861 | 0.499 | 0.110 | 0.338 | 0.700 | 0.135 | 0.936 | 0.474 | 0.110 | 0.216 | 0.706 | 0.626 |
| DeepTMInter | Extra | 0.784 | 0.315 | 0.895 | 0.527 | 0.234 | 0.415 | 0.792 | 0.181 | 0.959 | 0.474 | 0.187 | 0.272 | 0.777 | 0.712 |
| MBPredAll | Combined | 0.687 | 0.260 | 0.856 | 0.509 | 0.134 | 0.360 | 0.707 | 0.136 | 0.936 | 0.475 | 0.104 | 0.220 | 0.651 | 0.604 |
| MBPredCombined | Combined | 0.679 | 0.260 | 0.855 | 0.510 | 0.129 | 0.359 | 0.717 | 0.141 | 0.938 | 0.481 | 0.114 | 0.227 | 0.635 | 0.589 |
| DELPHI | Combined | 0.698 | 0.275 | 0.868 | 0.538 | 0.169 | 0.379 | 0.729 | 0.145 | 0.939 | 0.505 | 0.135 | 0.235 | 0.718 | 0.633 |
| DeepTMInter | Combined | 0.773 | 0.320 | 0.903 | 0.544 | 0.229 | 0.426 | 0.799 | 0.174 | 0.960 | 0.494 | 0.177 | 0.270 | 0.796 | 0.738 |
Fig. 6

Performance comparison of DeepTMInter with MBPred and DELPHI. (a), (b), and (c) show the ROC curves on the TestData, CompData, and IndepData datasets, respectively, while (d), (e), and (f) show the Precision-Recall curves on the TestData, CompData, and IndepData datasets, respectively.

Fig. 7

Comparison of JSCs (Jaccard similarity coefficients) between DeepTMInter and the three specialized MBPred predictors (MBPredTM, MBPredCyto, and MBPredExtra) on the TestData dataset. Each dot corresponds to one protein chain.

Table 3

Performance gauged by mean precision (P), recall (R), specificity (SP), accuracy (ACC), MCC, F1-score (F1), AUC, and AUCPR (see Section 2.7) using structure-derived Cyto, TMH, Extra, and Combined regions on the IndepData dataset.

(Metric columns P, R, SP, ACC, MCC, and F1 are reported twice: first at the L/5 threshold, then at the L/10 threshold; AUC and AUCPR are threshold-free.)

| Predictor | Region | P | R | SP | ACC | MCC | F1 | P | R | SP | ACC | MCC | F1 | AUC | AUCPR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MBPredCyto | Cyto | 0.612 | 0.230 | 0.853 | 0.506 | 0.104 | 0.322 | 0.608 | 0.113 | 0.932 | 0.479 | 0.070 | 0.185 | 0.624 | 0.581 |
| MBPredTM | Cyto | 0.538 | 0.206 | 0.824 | 0.476 | 0.027 | 0.285 | 0.569 | 0.108 | 0.919 | 0.470 | 0.040 | 0.176 | 0.568 | 0.522 |
| MBPredExtra | Cyto | 0.524 | 0.205 | 0.819 | 0.471 | 0.019 | 0.278 | 0.554 | 0.111 | 0.919 | 0.467 | 0.037 | 0.174 | 0.591 | 0.540 |
| MBPredAll | Cyto | 0.659 | 0.246 | 0.868 | 0.519 | 0.153 | 0.347 | 0.641 | 0.118 | 0.936 | 0.480 | 0.090 | 0.195 | 0.648 | 0.622 |
| DELPHI | Cyto | 0.616 | 0.223 | 0.844 | 0.507 | 0.110 | 0.317 | 0.614 | 0.107 | 0.920 | 0.478 | 0.069 | 0.178 | 0.632 | 0.600 |
| DeepTMInter | Cyto | 0.633 | 0.269 | 0.855 | 0.514 | 0.135 | 0.352 | 0.687 | 0.147 | 0.937 | 0.493 | 0.129 | 0.228 | 0.689 | 0.657 |
| MBPredCyto | TMH | 0.588 | 0.206 | 0.809 | 0.498 | 0.058 | 0.295 | 0.603 | 0.106 | 0.907 | 0.483 | 0.048 | 0.177 | 0.566 | 0.482 |
| MBPredTM | TMH | 0.635 | 0.242 | 0.837 | 0.515 | 0.090 | 0.335 | 0.652 | 0.124 | 0.921 | 0.493 | 0.073 | 0.204 | 0.603 | 0.513 |
| MBPredExtra | TMH | 0.594 | 0.211 | 0.815 | 0.500 | 0.063 | 0.299 | 0.622 | 0.113 | 0.908 | 0.488 | 0.061 | 0.185 | 0.548 | 0.476 |
| MBPredAll | TMH | 0.649 | 0.240 | 0.836 | 0.524 | 0.125 | 0.336 | 0.644 | 0.116 | 0.915 | 0.494 | 0.080 | 0.192 | 0.593 | 0.502 |
| DELPHI | TMH | 0.634 | 0.234 | 0.829 | 0.516 | 0.106 | 0.329 | 0.617 | 0.115 | 0.913 | 0.486 | 0.059 | 0.188 | 0.596 | 0.533 |
| DeepTMInter | TMH | 0.635 | 0.246 | 0.834 | 0.515 | 0.109 | 0.331 | 0.676 | 0.131 | 0.919 | 0.499 | 0.107 | 0.209 | 0.661 | 0.603 |
| MBPredCyto | Extra | 0.514 | 0.216 | 0.803 | 0.555 | 0.074 | 0.288 | 0.444 | 0.093 | 0.898 | 0.545 | 0.031 | 0.148 | 0.571 | 0.355 |
| MBPredTM | Extra | 0.518 | 0.238 | 0.807 | 0.556 | 0.085 | 0.301 | 0.471 | 0.112 | 0.909 | 0.551 | 0.055 | 0.170 | 0.581 | 0.386 |
| MBPredExtra | Extra | 0.463 | 0.204 | 0.804 | 0.539 | 0.029 | 0.265 | 0.465 | 0.106 | 0.902 | 0.543 | 0.042 | 0.163 | 0.519 | 0.312 |
| MBPredAll | Extra | 0.520 | 0.232 | 0.807 | 0.556 | 0.075 | 0.299 | 0.508 | 0.130 | 0.905 | 0.553 | 0.065 | 0.192 | 0.571 | 0.367 |
| DELPHI | Extra | 0.560 | 0.269 | 0.822 | 0.576 | 0.122 | 0.342 | 0.543 | 0.131 | 0.915 | 0.563 | 0.105 | 0.202 | 0.703 | 0.476 |
| DeepTMInter | Extra | 0.558 | 0.259 | 0.820 | 0.573 | 0.133 | 0.328 | 0.503 | 0.112 | 0.913 | 0.555 | 0.075 | 0.175 | 0.688 | 0.458 |
| MBPredAll | Combined | 0.608 | 0.249 | 0.840 | 0.571 | 0.139 | 0.344 | 0.597 | 0.122 | 0.918 | 0.548 | 0.084 | 0.200 | 0.610 | 0.514 |
| MBPredCombined | Combined | 0.573 | 0.240 | 0.834 | 0.557 | 0.104 | 0.325 | 0.573 | 0.118 | 0.916 | 0.542 | 0.068 | 0.190 | 0.589 | 0.493 |
| DELPHI | Combined | 0.578 | 0.238 | 0.831 | 0.559 | 0.109 | 0.325 | 0.603 | 0.126 | 0.919 | 0.548 | 0.091 | 0.202 | 0.642 | 0.541 |
| DeepTMInter | Combined | 0.612 | 0.276 | 0.846 | 0.573 | 0.151 | 0.360 | 0.658 | 0.151 | 0.928 | 0.559 | 0.135 | 0.233 | 0.689 | 0.598 |

Performance comparison using threshold-free measures

On all test datasets the AUC and AUCPR performance of predictors was benchmarked using the Cyto, TMH, Extra, and Combined regions either defined according to PDBTM or predicted by Phobius. For the three specialized predictors (MBPredCyto, MBPredTM, and MBPredExtra), we calculated the AUC and AUCPR values not only for their specialized regions but also for the regions on which they were not trained (Table 3 and Supplementary Tables S13 and S14). Overall, DeepTMInter shows a significant improvement in terms of AUC and AUCPR performance compared to the other two state-of-the-art methods. For example, on the IndepData dataset our method achieves an AUC of 0.661 (AUCPR = 0.603) using structure-derived TMH regions, distinctly higher than the 0.603 (0.513) of MBPredTM and 0.596 (0.533) of DELPHI. Note that DELPHI achieves the best AUC (0.703) and AUCPR (0.476) performance using the Extra region. Interestingly, on the TestData and CompData datasets, DeepTMInter also significantly outperforms all other methods in the Extra regions (Table 1, Table 2). To investigate the weaker performance of our method in this region on the IndepData dataset, we derived from protein structures the numbers of interaction and non-interaction sites (Supplementary Table S18) and found that, unlike the roughly balanced ratios of interaction vs. non-interaction sites in the Cyto (1199 vs. 1338) and TMH (766 vs. 984) regions, sites in the Extra region are clearly biased towards the non-interaction class, with a ratio of 516 vs. 1210. We therefore speculate that this imbalance makes it harder to distinguish interaction sites from the overrepresented non-interaction sites. By contrast, DELPHI, which was trained on non-TM proteins, still performs well in this non-TM region [10]. MBPredCyto, MBPredTM, and MBPredExtra have been reported to perform best in predicting interacting amino acid residues located in their respective regions (Cyto, TMH, and Extra) on the TestData dataset [12].
Indeed, we found that this also holds on the CompData and IndepData datasets. For example, on the CompData dataset, among all specialized predictors in the MBPred suite, MBPredCyto, MBPredTM, and MBPredExtra yield the highest AUC values of 0.618, 0.650, and 0.643 in their respective regions (AUCPR = 0.622, 0.558, and 0.586). Fig. 6 shows the ROC and Precision-Recall curves of the predictors using their specialized structure-derived regions on all test datasets. DeepTMInter is clearly superior to the MBPred suite and DELPHI, achieving the highest AUC (0.793, 0.796, and 0.689) (Fig. 6a-c) and AUCPR values (0.718, 0.738, and 0.598) (Fig. 6d-f). Overall, the MBPred suite is on average comparable to DELPHI in terms of AUC across the three test datasets, but worse than DELPHI in terms of AUCPR (Fig. 6). On all test datasets the MBPred predictors exhibit comparable performance in predicting interaction sites located in the regions they are specifically trained on. For example, on the IndepData dataset MBPredCyto and MBPredTM produce similar ROC curves, corresponding to AUC values of 0.624 and 0.603, respectively.
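The threshold-free measures used above can be sketched in a few lines. The implementations below are the generic textbook definitions (ROC AUC via pairwise ranking, AUCPR as average precision), restricted to one region via a mask; they are not the paper's evaluation code, and the labels, scores, and mask are toy values.

```python
# ROC AUC: fraction of (positive, negative) pairs ranked correctly
# (ties count as half).
def roc_auc(y, s):
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# AUCPR approximated as average precision over the ranked list.
def average_precision(y, s):
    ranked = sorted(zip(s, y), key=lambda t: -t[0])
    tp, ap = 0, 0.0
    for rank, (_, yi) in enumerate(ranked, start=1):
        if yi == 1:
            tp += 1
            ap += tp / rank
    return ap / tp

labels = [1, 0, 1, 0, 0, 1, 0, 1]           # 1 = interaction site
scores = [0.9, 0.2, 0.7, 0.65, 0.1, 0.8, 0.3, 0.6]
tmh_mask = [True, True, True, True, False, False, True, True]

# Evaluate only positions inside the (toy) TMH region.
y = [l for l, m in zip(labels, tmh_mask) if m]
s = [p for p, m in zip(scores, tmh_mask) if m]
print(round(roc_auc(y, s), 3), round(average_precision(y, s), 3))  # → 0.889 0.917
```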

Performance comparison using single-threshold measures

For proteins in all test datasets, mean precision, recall, F1-score, and MCC were calculated using the entire combined structure-derived regions (Table 1, Table 2, Table 3). Based on these performance measures, DeepTMInter is also clearly ahead of the MBPred suite and DELPHI. For example, on the CompData dataset DeepTMInter achieved the highest precision (0.783), recall (0.324), F1-score (0.432), and MCC (0.239). In addition, the JSC (Jaccard similarity coefficient, see Section 2.7) was used to assess the similarity between the set of actual labels and the set of predicted labels for sites in TM proteins [73], [74]; a high JSC is indicative of high predictor performance. For the three structure-derived regions (Cyto, TMH, and Extra) on the TestData dataset, we compared the JSCs of DeepTMInter to those of MBPredTM, MBPredCyto, and MBPredExtra, respectively, for each individual protein. As seen in Fig. 7, DeepTMInter consistently outperforms the three underlying MBPred predictors on all 36 proteins and in all three structure-derived regions (Cyto, TMH, and Extra), with mean JSCs of 0.258, 0.298, and 0.274 (averaged over all proteins in that dataset) compared to 0.182, 0.199, and 0.194 for MBPredCyto, MBPredTM, and MBPredExtra, as well as 0.205, 0.197, and 0.185 for DELPHI. These mean JSC values indicate that the set of labels predicted by DeepTMInter agrees more strongly with the experimentally determined sites (the set of actual labels) than those produced by MBPred. We also evaluated the JSC performance on the IndepData dataset, where DeepTMInter achieves 0.220, 0.204, and 0.221 on the Cyto, TMH, and Combined regions, compared to 0.196, 0.208, and 0.198 for DELPHI, and 0.199, 0.207, and 0.162 for MBPredCyto, MBPredTM, and MBPredExtra in their specialized regions.
In the Extra region, however, DELPHI achieves a significantly better JSC of 0.226 compared to 0.179 for DeepTMInter.
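The per-protein JSC used above reduces to a simple set operation: the overlap between predicted and actual interaction positions divided by their union. A minimal sketch, with toy residue positions rather than the paper's data:

```python
# Jaccard similarity coefficient between the set of actual and the set of
# predicted interaction-site positions: |A ∩ B| / |A ∪ B|.
def jsc(actual: set[int], predicted: set[int]) -> float:
    union = actual | predicted
    if not union:
        return 1.0  # both sets empty: degenerate perfect agreement
    return len(actual & predicted) / len(union)

actual = {12, 13, 14, 40, 41, 77}     # structure-derived interaction positions
predicted = {13, 14, 15, 40, 77, 90}  # positions predicted as interacting
print(round(jsc(actual, predicted), 3))  # → 0.5 (4 shared / 8 in union)
```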

Exposure states of falsely classified amino acid positions

To understand how falsely predicted interaction sites correlate with interface properties, we calculated the occurrence of misclassified amino acid positions in exposed and buried regions of structures (Fig. 8). For each amino acid, the relative solvent accessibility (RSA) [75] was calculated from known 3D structures using the DSSP software [76]. The RSA threshold for classifying residues as exposed or buried was set to 0.1, as suggested in [77]. We found that in terms of RSA values the interaction sites predicted by DeepTMInter closely follow the distribution of the interaction sites derived from 3D structures (Fig. 8a and b). Both actual and predicted interaction sites are overwhelmingly located in the solvent- (or lipid-) accessible regions of proteins. While false negative (FN) predictions (assigning established interaction sites as non-interacting) are clear errors, false positive (FP) predictions might point to unknown interaction sites that have yet to be experimentally confirmed. Overall, falsely predicted positions, both FN and FP, tend to reside in exposed areas of the structure (Fig. 8c and d).
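The exposed/buried call described above normalizes DSSP's absolute accessibility by a per-residue maximum and applies the 0.1 cutoff. A minimal sketch, assuming theoretical maximum-ASA values in the style of Tien et al. (only a subset shown); the paper may use a different normalization table.

```python
# RSA = absolute accessibility (Å², from DSSP) / maximum ASA of that residue
# type; residues with RSA >= 0.1 are classified as exposed, else buried.
MAX_ASA = {"A": 129.0, "L": 201.0, "R": 274.0, "G": 104.0}  # illustrative subset

def classify(residue: str, dssp_acc: float, threshold: float = 0.1) -> str:
    rsa = dssp_acc / MAX_ASA[residue]
    return "exposed" if rsa >= threshold else "buried"

print(classify("A", 64.0))   # RSA ≈ 0.50 → exposed
print(classify("L", 10.0))   # RSA ≈ 0.05 → buried
```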
Fig. 8

Relative solvent accessibility of interaction and non-interaction sites. (a) and (b) show the RSA density of actual and predicted interaction and non-interaction sites, respectively. (c) shows the RSA density of FN and FP predictions. The densities in (a)-(c) are calculated by using the Gaussian kernel density estimate (Gaussian KDE) from the Seaborn library [78]. (d) shows the percentage of buried and exposed sites among the false predictions.


Feature importance analysis

We performed an importance analysis of the topological, physicochemical, and evolutionary features on our full training and 5-iteration cross-validation datasets (Fig. 9). Since we are not aware of any measures able to deduce feature importance from the training phase of a deep-learning model [79], we followed common practice [12], [79], [80] and quantified feature importance by the mean decrease of impurity (MDI) inferred from XGBoost [79], [81]. MDI reflects how much each feature contributes when XGBoost recursively splits the data to classify sites as interacting or non-interacting; in each split, the features that best improve the classification performance are identified. For visualization purposes we evaluated the mean (Fig. 9a) and the maximal (Fig. 9b) MDI values of the different feature categories within sliding windows (see reference [79]). Amino acid composition makes the strongest contribution to interaction site classification among all features. The EVfold co-evolution feature remains very powerful for distinguishing between interaction and non-interaction sites (it was also ranked best in our previous analysis for MBPred [12]), while the Gaussian DCA co-evolution feature makes a moderate contribution. This finding is in line with previous reports that residues involved in binding sites are collectively under functional constraints and are therefore subject to co-evolution [82]. Features representing amino acid types and their physicochemical properties are of only moderate importance (Fig. 9).
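MDI-based importance is exposed by gradient-boosted tree libraries as `feature_importances_`. The sketch below uses scikit-learn's gradient boosting as a stand-in for XGBoost, with random toy features (one of which drives the label) instead of the paper's topological, physicochemical, and evolutionary descriptors.

```python
# Toy MDI feature-importance analysis with a gradient-boosted tree ensemble.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 400
coevolution = rng.normal(size=n)   # stand-in for a co-evolution feature
conservation = rng.normal(size=n)  # stand-in for a conservation feature
noise = rng.normal(size=n)         # irrelevant feature

# Make the binary label depend mostly on the "coevolution" feature.
y = (coevolution + 0.3 * conservation + 0.1 * rng.normal(size=n) > 0).astype(int)
X = np.column_stack([coevolution, conservation, noise])

model = GradientBoostingClassifier(random_state=0).fit(X, y)
# feature_importances_ holds the normalized mean decrease of impurity (MDI).
for name, mdi in zip(["coevolution", "conservation", "noise"],
                     model.feature_importances_):
    print(f"{name}: {mdi:.2f}")
```

As expected, the feature that determines the label receives by far the largest MDI share.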
Fig. 9

Feature importance measured by mean decrease of impurity (MDI). The mean (a) and the max (b) MDI values of 11 unique features are achieved over the full training and the 5-iteration cross-validation datasets, respectively.


Performance evaluation using different residue contact definitions

To evaluate how the performance of DeepTMInter in predicting interaction sites depends on the choice of a particular residue contact definition, the AUC and AUCPR values were calculated on the TestData, CompData, and IndepData datasets and compared using the BordInter [11], FuchsInter [83], and RostInter [31] residue contact definitions (Fig. 10a-c and Supplementary Table S15). Note that the sites in the full protein sequences (the Combined region) predicted by DeepTMInter were used in the calculation of these two measures. The numbers of interacting (NI) and non-interacting (NNI) amino acid residues were derived from experimental 3D structures (Fig. 10a-c and Supplementary Table S16). As expected, the number of residue contacts increases progressively with the spatial distance cutoff defined by BordInter (4 Å), FuchsInter (5.5 Å), and RostInter (6 Å). The RostInter definition leads to the highest AUC and AUCPR values on the three test datasets. For example, on the CompData dataset AUC values of 0.762, 0.790, and 0.796 (AUCPR = 0.527, 0.690, and 0.738) were obtained using the BordInter, FuchsInter, and RostInter definitions, respectively. A higher distance threshold (RostInter) also results in more residues labeled as interacting, thus partially alleviating the imbalance between the two residue classes (interacting and non-interacting), while the low distance threshold of BordInter impairs performance by increasing the imbalance, as discussed in [12] (see also Supplementary Table S16). Using the most stringent contact definition (BordInter), DeepTMInter remains the most accurate method on structure-derived Combined regions (Fig. 10d-i). For example, it achieves an AUC of 0.675 (AUCPR = 0.448) compared to 0.639 (0.402) for DELPHI on the IndepData dataset. In the Extra region of the IndepData dataset, DELPHI has the highest AUC of 0.700 (AUCPR = 0.338) compared to 0.668 (AUCPR = 0.303) for DeepTMInter.
Note that the distance threshold used for training DELPHI (the sum of the van der Waals radii of two atoms plus 0.5 Å [10]) is in most cases smaller than the BordInter, FuchsInter, and RostInter distance thresholds.
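All three contact definitions reduce to the same distance test with different cutoffs. The sketch below labels a residue as an interaction site if any of its heavy atoms lies within the cutoff of any heavy atom in a different chain; the atom-level criterion and the toy coordinates are illustrative assumptions, as the exact atom sets used by each definition vary between the cited works.

```python
# Distance-based interaction labeling under the three cutoffs discussed
# above: BordInter 4.0 Å, FuchsInter 5.5 Å, RostInter 6.0 Å.
from math import dist

def interacting(res_atoms, partner_atoms, cutoff):
    """res_atoms / partner_atoms: lists of (x, y, z) heavy-atom coordinates
    for one residue and for all atoms of a different chain."""
    return any(dist(a, b) <= cutoff for a in res_atoms for b in partner_atoms)

# Toy geometry: the residue's closest atom sits 4.5 Å from the partner chain.
res = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
partner = [(6.0, 0.0, 0.0)]

for name, cutoff in [("BordInter", 4.0), ("FuchsInter", 5.5), ("RostInter", 6.0)]:
    print(name, interacting(res, partner, cutoff))
# → BordInter False, FuchsInter True, RostInter True: looser cutoffs label
#   more residues as interacting, consistent with the trend reported above.
```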
Fig. 10

Statistics and performance comparison using the BordInter, FuchsInter, and RostInter residue contact definitions on the TestData (a), CompData (b), and IndepData (c) datasets. Left side: NI - number of interacting amino acid residues; NNI - number of non-interacting amino acid residues. Right side: AUC and AUCPR. (d), (e), and (f) show the ROC curves on the TestData, CompData, and IndepData datasets, respectively, based on the BordInter interaction definition, while (g), (h), and (i) show the Precision-Recall curves on the TestData, CompData, and IndepData datasets, respectively. Similar to Fig. 6, the ROC and Precision-Recall curves of the three separate predictors (MBPredCyto, MBPredTM, and MBPredExtra) are evaluated using the predictions for the respective regions they were trained on.


Evolutionary conservation of interaction sites

We compared the conservation scores (ranging from 0 to 1) of interaction and non-interaction sites in the Cyto, TMH, Extra, and Combined regions (Fig. 11), disregarding alignment columns with more than 50% gaps. In line with [11] and our own previous work [12], interaction sites both derived from structures (Fig. 11a) and predicted by DeepTMInter (Fig. 11b) are significantly more evolutionarily conserved than non-interaction sites in all four regions (Cyto, TMH, Extra, and Combined). The Combined region (i.e., the full sequence) displays the most statistically significant difference between interaction and non-interaction sites, both derived from structures (p-value 2.39e−24, t-Test) and predicted by DeepTMInter (p-value 8.05e−37, t-Test). Interaction sites in the transmembrane domains, while still more conserved than positions not involved in interactions, show a less pronounced difference (p-value 5.32e−06) due to the degenerate amino acid composition and hence stronger overall conservation of hydrophobic, lipid-immersed sequence segments [84], [85].
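A column-wise conservation score with the 50% gap filter described above can be sketched as follows. The scoring scheme here (frequency of the most common residue in a column) is a simple stand-in chosen for illustration, not necessarily the conservation measure used in the paper.

```python
# Per-column conservation: frequency of the most common residue among the
# non-gap characters, with columns containing >50% gaps discarded.
def conservation_scores(msa: list[str], max_gap_fraction: float = 0.5):
    scores = {}
    for col in range(len(msa[0])):
        column = [seq[col] for seq in msa]
        if column.count("-") / len(column) > max_gap_fraction:
            continue  # disregard heavily gapped alignment columns
        residues = [c for c in column if c != "-"]
        top = max(residues.count(r) for r in set(residues))
        scores[col] = top / len(residues)
    return scores

# Column 3 carries 75% gaps and is skipped; column 0 is fully conserved.
msa = ["ACD-", "ACE-", "ACD-", "AQDF"]
print(conservation_scores(msa))  # → {0: 1.0, 1: 0.75, 2: 0.75}
```

Given such per-site scores, the interaction vs. non-interaction comparison reported above is then a two-sample t-test between the two groups of scores.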
Fig. 11

Evolutionary conservation of interaction and non-interaction sites derived from structures (a) and predicted by DeepTMInter (b) across all three test datasets (TestData, CompData, and IndepData) in the Cyto, TMH, Extra, and Combined (the full protein sequence) regions. Statistical significance of the difference between interaction and non-interaction sites is inferred by p-values obtained by t-Test.


Family-specific analysis of interaction sites and network connectivity in human transmembrane proteins

We investigated the relationship between the biological activities of proteins and their interaction patterns by analyzing the average percentage of interaction sites in the eight major membrane protein classes (see Section 2.9) curated in the GtoPdb database (Fig. 12a and b, and Supplementary Tables S19-S26 and Fig. S14). Overall, ligand-gated, voltage-gated, and other ion channels have the largest percentage of their amino acid sequence involved in interactions (21.6%, 22.9%, and 32.4%, respectively) (Fig. 12a). Within the “other ion channel” class (Supplementary Figs. S5 and S13), Orai channels [86] as well as connexins and pannexins [87] stand out as having the highest proportion of interaction sites. Similarly, among the 20 sub-families within the “other protein target” class, other pattern recognition receptors, sigma receptors, and the abscisic acid receptor complex all have more than 40% of their sequence predicted to mediate interactions (Supplementary Fig. S9). Proteins in these three sub-families play an important role in signaling pathways [88], [89], [90]. Among the eight functional groups, the “other ion channel” class harbors the highest percentages of interaction sites in the TMH and Extra regions (37.6% and 35.9%, respectively) (Fig. 12b and Supplementary Figs. S6-S8), while in the “Enzyme” and “other protein target” classes interaction sites predominantly reside in the Cyto region (Fig. 12b and Supplementary Figs. S10-S12).
Fig. 12

Percentage of amino acids predicted to be involved in interactions in the complete protein sequences (a) and in the TMH, Cyto, and Extra regions (b).

In most of the families the percentage of amino acids involved in interactions lies within the range of 10%-25%, while the “catalytic receptors” and “other ion channel” classes are characterized by a wider range of 10%-50% (Fig. 13). On average, human transmembrane proteins interact with 10.5 protein partners according to the HuRI-Union database (see Section 2.10). We found no clear dependence (low R² values) of the number of interaction partners on the number of amino acid positions predicted to be involved in interactions, both across all human transmembrane proteins (Fig. 13a) and in each of the eight major functional families (Fig. 13b-i). Such a dependence might be obscured by the fact that the same binding site often mediates interactions with a large number of partners [91].
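The dependence check described above amounts to an ordinary least-squares fit and its R². A minimal pure-Python sketch with toy data points (not the paper's values), where a weak association yields a correspondingly low R²:

```python
# R² of a simple linear regression of y on x, computed from the squared
# Pearson correlation: R² = sxy² / (sxx * syy).
def r_squared(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

pct_sites = [12.0, 18.5, 25.0, 14.2, 30.1]  # % residues predicted interacting
n_partners = [4, 30, 7, 19, 11]             # interaction partners per protein
print(round(r_squared(pct_sites, n_partners), 3))
```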
Fig. 13

Number of interaction partners versus the percentage of amino acids involved in interactions in all human transmembrane proteins (a) and in separate functional families (b)-(i).


Case studies

Human cardiac voltage-gated sodium channel

The human cardiac voltage-gated sodium channel Nav1.5, encoded by the gene SCN5A, is expressed in the cardiomyocyte sarcolemma, where it carries the inward depolarizing current during phase 0 of the cardiac action potential [92]. According to the BioGRID database, Nav1.5 has 22 interaction partners, and we identified an additional 21 interaction partners based on literature analysis (Supplementary Table S27) [93], [94], [95], [96], [97], [98], [99]. Only a small number of interaction sites have been described in these studies, most of which are situated in the C-terminal portion of Nav1.5 (positions 1773–2016, indicated by blue triangles in Fig. 14a) [100], and only a few appear in the N-terminal portion [101], [102], [103]. The tail of the C-terminal domain is involved in inactivation gating [104], and mutations in this area cause diseases such as the long QT and Brugada syndromes [105]. One ten-residue peptide (positions 2007–2016) [106] and four motifs - PY (1974–1976) [107], extended PY (1974–1980) [108], SXV (2014–2016) [106], and IQ (1901–1927) [105] - localized in the tail of the C-terminal portion frequently interact with other proteins (Supplementary Table S27). DeepTMInter predicts 17.10% of the amino acid sites in Nav1.5 to be involved in interactions (Supplementary Table S28), and these predicted interaction sites occur more frequently in the C-terminal part of the Nav1.5 sequence (Fig. 14a). DeepTMInter was able to capture some of the experimentally established interfaces, including 9 out of 10 residue positions mediating the interaction with syntrophin [106], 5 out of 7 residue positions in the extended PY motif, and 8 positions within the IQ motif involved in the interaction with calmodulin (Fig. 14b-d) [100], [105], [109]. Interactions between Nav1.5 and other proteins also occur via the same interfaces; e.g., three proteins encoded by the genes NEDD4, NEDD4L, and WWP2 interact with the PY motif of Nav1.5 (see Supplementary Table S27).
Fig. 14

Overview of interaction sites in Nav1.5. (a) Comparison of known interaction sites reported in the literature with DeepTMInter predictions. (b) The Nav1.5 fragment colored in white (PDB code: 4jq0, chain D) interacts through its IQ motif with the C- and N-lobes of calmodulin (PDB code: 4jq0, chain A), colored in cyan. The known and predicted interaction surfaces of Nav1.5 are colored in red (c) and blue (d), respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Human calcium homeostasis modulator-2

The human calcium homeostasis modulator-2 (CALHM2), whose 3D structure (PDB code: 6uiw) was released in 2019, is a voltage-gated, Ca2+-inhibited non-selective channel controlling ATP release [110]. The biological assembly of the human CALHM2 channel (Fig. 15a) consists of 11 subunits arranged in a circular fashion. We used the CALHM2 chain A from the IndepData dataset to compare the performance of three interaction site prediction methods: DeepTMInter, MBPred, and DELPHI. Overall, the predictions generated by DeepTMInter (Fig. 15b) correspond better to the experimentally confirmed interfaces than the MBPred and DELPHI results (Figs. 15c and 15d). DeepTMInter achieves the highest AUC value of 0.691 (AUCPR = 0.660), compared to 0.615 (0.603) for MBPred and 0.599 (0.587) for DELPHI. Predictions for two additional proteins, from the TestData and CompData datasets, are shown in Supplementary Figs. S15 and S16, respectively.
Fig. 15

Interaction interfaces of the human CALHM2 chain A. (a) Amino acid positions involved in protein interactions derived from the experimental 3D structure according to the RostInter definition (see Methods) are shown in blue. (b), (c), and (d) Interaction probabilities (left) and binary predictions of interacting (blue) vs. non-interacting residues (right) generated by DeepTMInter, MBPred, and DELPHI, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)


Conclusions

Here, we have introduced DeepTMInter, a sequence-based predictor of interaction sites in transmembrane proteins. Methodologically, we capitalized on advances in deep residual neural networks (ResNets), supplemented by stacked generalization for further performance gains. On an independent dataset DeepTMInter achieved AUC and AUCPR values of 0.689 and 0.598, respectively, significantly outperforming MBPred, previously developed by our group, as well as the most recent method, DELPHI. Across the eight major functional families of transmembrane proteins, the percentage of amino acids predicted to reside in interaction sites varies approximately between 10% and 30%, with ion channels exhibiting the largest proportion of sequence involved in interactions.

CRediT authorship contribution statement

Jianfeng Sun: Investigation, Data curation, Conceptualization, Methodology, Formal analysis, Visualization, Software, Validation, Writing - original draft. Dmitrij Frishman: Supervision, Investigation, Data curation, Conceptualization, Methodology, Validation, Project administration, Writing - review & editing.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.