Literature DB >> 24014171

Ordered disorder of the astrocytic dystrophin-associated protein complex in the norm and pathology.

Insung Na1, Derek Redmon, Markus Kopa, Yiru Qin, Bin Xue, Vladimir N Uversky.   

Abstract

The abundance and potential functional roles of intrinsically disordered regions in aquaporin-4, Kir4.1, a dystrophin isoforms Dp71, α-1 syntrophin, and α-dystrobrevin; i.e., proteins constituting the functional core of the astrocytic dystrophin-associated protein complex (DAPC), are analyzed by a wealth of computational tools. The correlation between protein intrinsic disorder, single nucleotide polymorphisms (SNPs) and protein function is also studied together with the peculiarities of structural and functional conservation of these proteins. Our study revealed that the DAPC members are typical hybrid proteins that contain both ordered and intrinsically disordered regions. Both ordered and disordered regions are important for the stabilization of this complex. Many disordered binding regions of these five proteins are highly conserved among vertebrates. Conserved eukaryotic linear motifs and molecular recognition features found in the disordered regions of five protein constituting DAPC likely enhance protein-protein interactions that are required for the cellular functions of this complex. Curiously, the disorder-based binding regions are rarely affected by SNPs suggesting that these regions are crucial for the biological functions of their corresponding proteins.

Entities:  

Mesh:

Substances:

Year:  2013        PMID: 24014171      PMCID: PMC3754965          DOI: 10.1371/journal.pone.0073476

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

It is recognized now that many biologically active proteins, known as intrinsically disordered proteins (IDPs), lack stable tertiary and/or secondary structure under physiological conditions in vitro [1]–[34]. They are highly abundant in nature, with ∼25–30% of eukaryotic proteins being mostly disordered, and with >50% of eukaryotic proteins and >70% of signaling proteins having long IDP regions (IDPRs) [35]–[39]. IDPs possess remarkable structural heterogeneity, ranging from completely structure-less, coil-like conformational ensembles to compact molten globule-like structural ensembles [33], [40]–[42]. Furthermore, disorder can affect proteins to a different degree, and some proteins are disordered as a whole, whereas other proteins possess a mosaic or hybrid structure containing both ordered and disordered regions [33], [40]–[43]. Functional repertoire of IDPs/IDPRs is very broad and complements functions of ordered proteins and domains. Disorder-based functions may arise from the specific disorder form, from inter-conversion of disordered forms, or from transitions between disordered and ordered conformations [3], [4], [9], [10], [33]. Many IDPs/IDPRs possess an exceptional binding promiscuity often associated with the ability to fold in a template dependent manner, where a single IDPR can bind to multiple partners gaining very different structures in the bound state [28], [44]. Often, IDPs are involved in regulation, signaling and control pathways, where binding to multiple partners and high-specificity/low-affinity interactions play a crucial role and where IDPs/IDPRs play different roles in regulation of the function of their binding partners and in promotion of the assembly of supra-molecular complexes [1], [3], [5]–[7], [14], [15], [19], [24]–[28], [33]. IDPs and IDPRs are the key players in various protein-protein interaction networks, being especially abundant among hub proteins and their binding partners [14], [45]–[49]. Furthermore, regions of pre-mRNA which undergo alternative splicing commonly encode for the IDRs [50]. This association of alternative splicing and intrinsic disorder helps proteins to avoid folding difficulties and provides a novel mechanism for developing tissue-specific protein interaction networks [32], [50]. Since the absence of rigid structure in IDPs is encoded in the specific features of their amino acid sequences [2], [4], [9], [10], [33], [51], multiple computational tools were elaborated for evaluation of the abundance of intrinsic disorder in proteins and proteomes, for the analysis of the peculiarities of disorder distribution within a given protein, and for finding disorder-based functional sites [16], [52]–[61]. Multifactorial computational studies indicated the abundance and functional importance of intrinsic disorder in various proteinaceous machines, such as nucleosome [62], spliceosome [63], [64], ribosome [65], nuclear pore [66]–[68], the mediator complex [69], and many transcription-related complexes [70]. Computational analysis of the transmembrane proteins revealed that the intracellular regions of single-path proteins are heavily enriched in disorder [71], [72], that the cytoplasmic signaling domains of various cell receptors are frequently disordered [73], that many transmembrane and peripheral membrane proteins contain disorder-based binding sites known as molecular recognition features [74], that the majority of human plasma membrane proteins contain long disordered regions [75], and that the IDPRs from helical bundle integral membrane proteins, those from β-barrel integral membrane proteins, and IDPRs from water soluble proteins all exhibit statistically distinct amino acid compositional biases [76]. Although the multifarious functional roles of disorder in nuclear pore were reported [67], [68], [77], no other membrane-associated protein complexes were subjected to the detailed analysis focused at protein intrinsic disorder. To fill this gap, we report here a computational study on the abundance and roles of intrinsic disorder in five proteins (aquaporin-4, Kir4.1, Dp71, α-1 syntrophin, and α-dystrobrevin) that constitutes the membrane-bound astrocytic dystrophin-associated protein complex (DAPC). The major function of dystrophin is to anchor the extracellular matrix to the cytoskeleton via the F-actin. In a classic view, dystrophin of skeletal and cardiac muscle associates with various proteins to form the dystrophin-associated protein complex (DAPC) [78]. However, DAPC is also positioned in the extracellular membrane of the astrocytic endfeet abutting the blood vessels [79], and it can be found at the neuromuscular junction (NMJ) and at a variety of synapses in the peripheral and central nervous systems where it has a structural function in stabilizing the sarcolemma [80]. The core of the astrocytic DAPC is composed of seven proteins (see Figure 1), an extracellular peripheral glycoprotein α-dystroglycan, a transmembrane protein β- dystroglycan, two specific transmembrane channels, aquaporin-4 (AQP-4) and Kir4.1, and three cytosolically located proteins, such as a fifth isoform of dystrophin (Dp71), α-1 syntrophin (SNTA1), and α-dystrobrevin (DTN-A, also known as dystrophin-related protein 3)) [78], [81]. The heterodimer of the α- and β-dystroglycans is the central DAPC components, where the α-dystroglycan interacts with the laminins in the basal lamina, and the transmembrane β-dystroglycan is involved in connecting the extracellular matrix to the cytoskeleton. The AQP-4 is an extremely important channel maintaining the osmotic balance of the blood-brain barrier (BBB) [82], whereas Kir4.1 acts as an inwardly rectifying K+ channel that has a role in potassium buffering [83]. The PDZ and SU domains of the α-1 syntrophin and other syntrophins are responsible for a set of the cross-protein contacts, interacting with AQP-4, Kir4.1, Dp71 and dystrobrevin-α [78]. A dystrophin isoform Dp71 is reported as an anchoring protein of AQP-4 and Kir4.1 [84], [85]. α-Dystrobrevin is another DAPC component which connects proteins related to the complex in a fashion similar to Dp71 [78]. The structural and functional relationship between each of the members of the DAPC has been known for some time [78], [81] (Figure 1). Since this study is focused on AQP-4, Kir4.1, Dp71, α-1 syntrophin, and α-dystrobrevin, these important DAPC components are briefly introduced below.
Figure 1

Schematic representation of the astrocytic DAPC analyzed in this study.

The major focus of our work was dystrophin (Dp71) and the Dp71-associated proteins AQP-4, Kir4.1, α-1 syntrophin (α-Snt), and α-dystrobrevin.

Schematic representation of the astrocytic DAPC analyzed in this study.

The major focus of our work was dystrophin (Dp71) and the Dp71-associated proteins AQP-4, Kir4.1, α-1 syntrophin (α-Snt), and α-dystrobrevin. The AQP-4 protein is associated with a number of human diseases, but most importantly with Neuromyelitits Optica (NMO). Also referred to as Devic's disease, NMO is an inflammatory demyelinating disease that selectively affects optic nerves and spinal cord. In 2004, an Anti-AQP-4 antibody was discovered that had a significant impact on the diagnosis and understanding of the molecular NMO mechanisms. The discovery of an NMO disease-specific antibody, NMO-IgG [86], was fueled by the observation that immunoglobulin and complement deposition in active lesions followed a distinct rim and rosette vasculocentric pattern, suggesting an antibody-mediated mechanism of disease [86]. Later, the AQP-4 protein was identified as the NMO-IgG target [87]. Potassium channels play an important role in the DAPC signal transduction pathways suggesting that AQP-4 and potassium channels, such as Kir4.1, might work together [83]. Here, AQP-4 likely acts in concert with potassium and bicarbonate channels to regulate water dynamics in the central nervous system (CNS) between the brain, blood, and cerebrospinal fluid (CSF). Overall, this cooperation plays an integral role in maintaining both brain fluid volume and ion homeostasis [88], [89]. The clinical manifestations of NMO are comparable to those found in AQP-4 knockout mice, which implicates a correlation between AQP-4 and Kir4.1. In support of this implication is the subcellular co-localization of AQP-4 with the inwardly rectifying potassium channel Kir4.1 [90]. These results support the hypothesis that AQP-4 and Kir4.1 interact with each other [88]–[90]. The ATP-sensitive, inwardly rectifying potassium channel Kir4.1 is a protein that is encoded by the KCNJ-10 gene in humans [91]. It has the tendency to allow a greater extent of potassium to flow into the cell, rather than out. Potassium channels are found in various parts of the human body, but most notably in cardiac, liver, endothelial and neuronal cells. By carefully modulating the net potassium currents, Kir4.1 helps to maintain a resting membrane potential. The basic building blocks of the Kir4.1 channel are two transmembrane helices with cytoplasmic N- and C-termini and an extracellular loop which folds back to form the pore-lining ion selectivity filter [92]. Mutations of Kir4.1 can cause a number of symptoms including Epilepsy, Ataxia, Sensorineural deafness and Tubulopathy; these are collectively known East syndrome [93]. The next protein in the study is dystrophin, which is a product of the Duchenne Muscular Dystrophy (DMD) gene. More specifically we looked at the fifth isoform of dystrophin, known as Dp71 that constitutes the C-terminal domain of the dystrophin protein. DMD being one of the longest human genes, with a total length of ∼2.2 megabases, is located on the X chromosome. In its mutant form, it is responsible for various myopathies [94]–[96], including Duchenne muscular dystrophy [97], Becker muscular dystrophy [98], X-linked dilated cardiomyopathy [99], and sporadic dilated cardiomyopathy [100]. Dystrophin is normally present within the sarcolemma of the skeletal muscle as part of a large protein complex which forms a linkage between the cytoskeleton, the sarcolemma, and the extracellular matrix [101]. The DMD gene exhibits complex transcriptional regulation and drives the synthesis of a variety of dystrophin isoforms through utilization of different promoters. Full-length dystrophin (427 kDa) is derived from three independent promoters, located at the 5′-end of the DMD gene, that regulate its spatiotemporal expression in muscles, brain structures, and cell types [102]–[104]. Another important DAPC protein is α-1 syntrophin. It is encoded by the SNTA1 gene and acts as an adapter between the AQP-4 and Dp71 proteins by binding via its PDZ domain to the AQP-4 C-terminus. The PDZ domain of α-1 syntrophin, which is the most abundant gene product in the heart, has been reported to bind to the C-terminal domain of the cardiac voltage-gated sodium channels (SkM2) causing alterations of the ion channel activity which causes long QT syndrome [105], [106]. Long QT syndrome (LQTS) is an inborn, abnormal heart rhythm condition characterized by a delayed repolarization of the cardiac muscle, and can lead to dangerous episodes of arrhythmias, cardiac arrest, or even sudden death. LQTS can arise from mutations of one of several associated genes [105], and is inherited in either an autosomal dominant or autosomal recessive manner. The last protein considered in this study is dystrobrevin, which can bind to other molecules in the multi-protein DAPC. Although there are two isoforms dystrobrevin are known, the α- and the β-dystrobrevins, the focus of our research was on the α-isoform because of its interaction with syntrophin and Dp71. In fact, based on the yeast two-hybrid (Y2H) analysis it has been concluded that α-dystrobrevin is involved in the direct heterodimerization with the dystrophin [107]. This is based on findings that suggest the C-terminus of dystrobrevin binds to the C-terminus of dystrophin. The gene that codes for α-dystrobrevin is called DTNA. Mutations in DTNA are known to cause the left ventricular non-compaction type 1 (LVNC1) disease, which is defined by the presence of poor systolic function, and is sometimes associated with other cardiac abnormalities including atrial or ventricular septal defects [108]. In the overall analysis of the DAPC, we found that multiple proteins play significant roles in the formation of the complete, functional complex. Although many of their roles have yet to be elucidated, the physical interactions between these proteins have been studied in detail. Among other analyses, we looked for SNPs in regions that were specifically involved in the interactions between the DAPC proteins. We also looked at the entire amino acid sequences of these proteins to see where the most frequent single nucleotide polymorphisms (SNPs) occurred. In this way we wanted to connect the presence and occurrence of specific SNPs with the destabilization of the DAPC and also to find a potential correlation between such DAPC destabilization and the presence and/or duration of the epileptic seizures in humans. Through our study we have come to hypothesize that the amino acid composition, as well as mutations and variations within each protein's amino acid sequence, affects the DAPC stability and therefore function. To verify this hypothesis, we identified the conserved amino acid sequences within the binding sites of the protein-protein interactions, as well as any mutations, single nucleotide polymorphisms (SNPs), and intrinsically disordered protein regions (IDPRs) related to the DAPC proteins. According to an earlier study emphasizing that the conserved motifs in IDPR have serious impacts on protein function [109], we performed IDPR analysis of the important binding sites in each protein in order to find the conserved amino acid motifs. We also analyzed all available disease mutation and SNP data to cross check the effects that these types of variations have on the DAPC functions.

Results and Discussion

Intrinsic Disorder and Its Conservation within the DAPC Members

To understand the peculiarities of intrinsic disorder predispositions of AQP-4, Kir4.1, Dp71, α-1 syntrophin, and α-dystrobrevin, we investigated the distributions of the predicted disorder propensity (in a form of the plots showing the per-residue PONDR-FIT scores produced by the PONDR-FIT disorder prediction algorithm) in these five DAPC proteins from selected vertebrates, such as mammals (Homo sapiens and Mus musculus), bird (Gallus gallus), reptile (Anolis carolinensis), amphibian (Xenopus laevis), and fish (Brachydanio rerio). Table 1 lists the analyzed proteins together with their corresponding UniProt IDs (www.UniProt.org). Figure 2 represents the results of this analysis and shows that all five proteins from all the organisms analyzed in this study are typical hybrid proteins [43]; i.e., proteins possessing both ordered domains and variously disordered regions. Figure 2 shows that AQP-4 and Kir4.1 have similar disorder profiles with areas of high disorder at the N- and C-termini, and with the majority of sequences having disorder values between the 0.1 and 0.5, although in both proteins, there are a few areas which approach the 0.5 disorder threshold. The low abundance of disorder in AQP-4 and Kir4.1 is rather typical for the transmembrane proteins [76]. Also, disordered tails of transmembrane channels are common points of channel regulation and interaction with other proteins. The disorder profile of Dp71 is quite different from the two transmembrane proteins. Though its N- and C-termini are highly disordered, as seems to be a common feature for all five proteins of interest, the Dp71 protein seems to be evenly split between ordered and disordered domains, with the first half of the protein falling in the 0.1–0.3 disorder range, while the second half falling in the 0.7–0.9 disorder range. Of course, the protein has little “spikes” of disorder in its area of order, and order in its area of disorder. Finally, the disorder profiles of α-1 syntrophin and α-dystrobrevin possess somewhat more sporadic appearance containing multiple clusters of order and disorder.
Table 1

Members of the dystrophin-associated protein complex (DAPC) analyzed in this study.

Homo sapiens Mus musculus Gallus gallus Anolis carolinensis Xenopus laevis Brachydanio rerio
Aquaporin-4P55087P55088F1NJZ6* G1KAC7* F6Z564* F1R274*
Kir4.1P78508Q9JM63F1P0R9* G1KIK1* F6PWL6* E7FD27*
Dp71P11532-5A2A9Z1* F1NS97* G1KGB9* F7BS74* F1QDK4*
α-1 syntrophinQ13424Q61234Q76EY8* G1KHX2* F7BBF2* Q6GMG2*
α-dystrobrevinQ9Y4J8Q9D2N4E1C4Z7* H9GC72* F7BTY8* A2CJ03*

Unreviewed protein identifier.

Figure 2

Abundance of intrinsic disorder in proteins from six vertebrate DAPCs.

A. AQP-4; B. Kir4.1; C. Dp71; D. α-1 syntrophin; and E. α-dystrobrevin. PONDR-FIT scores are shown for corresponding proteins from Homo sapiens (black lines), Mus musculus (red lines), Gallus gallus (green lines), Anolis carolinensis (yellow lines), Xenopus laevis (blue lines), and Brachydanio rerio (pink lines). Disorder profiles were manually aligned by visual inspection to ensure matching of the most characteristic features. The number of gaps introduced in affected proteins during these visual alignments was kept to a minimum. Plot F represents a PONDR-FIT plot for human α-1 syntrophin. Domain structure of this protein is also indicated in relation to the disorder profile. Pink shadow around PONDR-FIT curve represents distribution of errors in the evaluation of the PONDR-FIT scores.

Abundance of intrinsic disorder in proteins from six vertebrate DAPCs.

A. AQP-4; B. Kir4.1; C. Dp71; D. α-1 syntrophin; and E. α-dystrobrevin. PONDR-FIT scores are shown for corresponding proteins from Homo sapiens (black lines), Mus musculus (red lines), Gallus gallus (green lines), Anolis carolinensis (yellow lines), Xenopus laevis (blue lines), and Brachydanio rerio (pink lines). Disorder profiles were manually aligned by visual inspection to ensure matching of the most characteristic features. The number of gaps introduced in affected proteins during these visual alignments was kept to a minimum. Plot F represents a PONDR-FIT plot for human α-1 syntrophin. Domain structure of this protein is also indicated in relation to the disorder profile. Pink shadow around PONDR-FIT curve represents distribution of errors in the evaluation of the PONDR-FIT scores. Unreviewed protein identifier. Looking at each plot globally, trends of order and disorder can still be seen. Here, the amount of predicted disorder among the various DAPC proteins increases in the following order: AQP-4<Kir4.1<α-1 syntrophin <Dp71<α-dystrobrevin. Figure 3 further illustrates this conclusion by showing mean disorder scores evaluated for all analyzed proteins. Another interesting observation is that for any given DAPC member, the overall appearance of the profiles of predicted disorder is rather conserved among various vertebrate species (see Figure 2). This finding suggests that both ordered and intrinsically disordered regions might play important roles in the functionality of the corresponding proteins.
Figure 3

Mean disorder scores evaluated for five DAPC proteins from six vertebrate species: Homo sapiens (black bars), Mus musculus (red bars), Gallus gallus (green bars), Anolis carolinensis (yellow bars), Xenopus laevis (blue bars), and Brachydanio rerio (pink bars).

Error bars represent the corresponding standard errors.

Mean disorder scores evaluated for five DAPC proteins from six vertebrate species: Homo sapiens (black bars), Mus musculus (red bars), Gallus gallus (green bars), Anolis carolinensis (yellow bars), Xenopus laevis (blue bars), and Brachydanio rerio (pink bars).

Error bars represent the corresponding standard errors.

Structural Characterization of the DAPC Proteins

Figure 4 summarizes currently available information on the 3-D structures of proteins from the DAPC. Figure 4A represents side and top views of the structurally characterized domain of AQP-4 corresponding to the central 32–254 region, which is predicted to be mostly ordered (see Figure 2A). Despite the fact that the central region of Kir4.1 is expected to be mostly ordered (see Figure 2B), no structural information is available for this protein as of yet. Structure of a part of the C-terminally located fragment of dystrophin (residues 3046–3306) that roughly corresponds to the N-terminal half of Dp71 is shown in Figure 4B. Dp71 (622 residues) is the fifth isoforms of the human dystrophin produced by the alternative splicing. Dp71 differs from the canonical full-length protein (residues 1–3685) by lacking the first 3068 residues, missing the residues 3409–3421, having the residues KVPYYIN (3069–3075) changed to MREQLKG, and with the 3673–3685 region (residues RNTPGKPMREDTM) being changed and extended to HNVGSLFHMADDLGRAMESLVSVMTDEEGAE. The crystallized fragment of human dystrophin includes the WW-domain (residues 3055–3088) and is clearly predicted to be mostly ordered (see region corresponding to the first 300 residues in Figure 2C).
Figure 4

Structural information on some members of the dystrophin-associated protein complex: A. Crystal structure of the human AQP-4 fragment (PDB ID: 3GD8, residues 32–254); B. Crystal structure of the human dystrophin fragment (PDB ID: 1EG3, residues 3046–3306).

C. NMR solution structure of the mouse α-1 syntrophin fragment (PDB ID: 1Z87, residues 2–264 that correspond to the PHN–PDZ–PHC module). D. NMR solution structure of the mouse α-1 syntrophin fragment (PDB ID: 2ADZ, residues 2–80/165–264 that correspond to the PHN-‘L’-PHC construct). Ten representative members of the conformational ensemble are shown by chains of different color. E. Crystal structure of a complex (PDB ID: 1QAV) between the PDZ domain of mouse α-1 syntrophin (residues 77–164, shown as a colored chain) and the Neuronal nitric oxide synthase, nNOS (shown as gray surface). F. NMR solution structure of the fragment of human α-dystrobrevin (PDB ID: 2E5R, residues 237–292 that correspond to the ZZ-domain). Ten representative members of the conformational ensemble are shown by chains of different color.

Structural information on some members of the dystrophin-associated protein complex: A. Crystal structure of the human AQP-4 fragment (PDB ID: 3GD8, residues 32–254); B. Crystal structure of the human dystrophin fragment (PDB ID: 1EG3, residues 3046–3306).

C. NMR solution structure of the mouse α-1 syntrophin fragment (PDB ID: 1Z87, residues 2–264 that correspond to the PHN–PDZ–PHC module). D. NMR solution structure of the mouse α-1 syntrophin fragment (PDB ID: 2ADZ, residues 2–80/165–264 that correspond to the PHN-‘L’-PHC construct). Ten representative members of the conformational ensemble are shown by chains of different color. E. Crystal structure of a complex (PDB ID: 1QAV) between the PDZ domain of mouse α-1 syntrophin (residues 77–164, shown as a colored chain) and the Neuronal nitric oxide synthase, nNOS (shown as gray surface). F. NMR solution structure of the fragment of human α-dystrobrevin (PDB ID: 2E5R, residues 237–292 that correspond to the ZZ-domain). Ten representative members of the conformational ensemble are shown by chains of different color. Although structural information on human α-1 syntrophin is not available as of yet, the solution structure of the N-terminal domain of its mouse counterpart has been determined (see Figure 4C). It is known that all five members (α, β1, β2, γ1, and γ2) of the syntrophin family share the same domain organization: an N-terminal split pleckstrin homology (PH) domain containing an embedded PDZ domain (PHN–PDZ–PHC), a central PH domain, and a C-terminal syntrophin unique domain (SU) [110]–[114]. In a mouth α-1 syntrophin split PH-domain, the PHN half is composed of three β-strands (β1–β3), and the PHC half contains the remaining four β-strands (β4–β7) and the C-terminal α-helix. A well-folded PDZ domain and two long linkers are inserted at the β3/β4-loop of the PH domain. Figure 4C shows a solution structure of one of the members of the conformational ensemble of the PHN–PDZ–PHC module. Here, both PDZ and split PH-domains are well-folded and separated by highly flexible linkers [114]. This shows that PHN and PHC fragments fold together to form a canonical PH-domain structure containing seven β-strands and one C-terminal α-helix, and that PDZ-domian does not interfere with this folding process. Interestingly, the individual PHN and PHC fragments are completely disordered in isolation but fold into the canonical PH-domain, being mixed together [114]. When the PDZ-domain is taken out the PHN–PDZ–PHC module and substituted by an eight-residue peptide linker (‘L’), the resultant PHN-‘L’-PHC construct is able to fold into a structure indistinguishable from that of the joined PHN–PHC domain within the PHN–PDZ–PHC module (see Figure 4D) [114]. An interesting feature of this structure is the presence of a highly flexible β3/β4 loop. This structure provides further support to the notion that long flexible linkers are needed to ensure sufficient separation of the PDZ-domain from the halves of the split PH-domain thereby guarantying their ability to interact and mutually fold. Finally, several crystal and NMR structures are available for the isolated PDZ-domain and for a complex of this domain with specific binding partners. Figure 4F shows one of these complexes were the PDZ-domain of the mouse α-1 syntrophin was co-crystallized with the neuronal nitric oxide synthase (nNOS) [115]. Importantly, these ordered α-1 syntrophin domains mostly match regions of predicted order in this protein whereas noticeable regions of predicted disorder corresponds to the flexible linkers connecting PHN to PDZ and PDZ to PHC (see Figure 2F). As far as human α-dystrobrevin is concerned, this protein contains several functional domains, such as a region of interaction with the melanoma-associated antigen E1 (MAGEE1, residues 1–288) that also contains a zinc finger domain of ZZ-type (residues 237–284), a syntrophin-binding region (residues 400–450) and a potential coiled-coil region (residues 46–556). The high abundance of intrinsic disorder within the C-terminal half of α-dystrobrevin (Figure 2E) suggests that defining crystal structure of this protein could be a challenge. In agreement with this hypothesis, the structural information is only available for the zinc finger domain (residues 237–292, see Figure 4F). Interestingly, although this region is predicted to be disordered (see Figure 2E), it folds as a result of binding of two Zn2+ ions (Figure 4F), as typically the case for many other zinc-binding proteins and zinc-finger domains [21]–[23]. Both, MAGEE1-interacting region and syntrophin-binding region are predicted to be mostly ordered (see Figure 2E).

Intrinsic Disorder and Sequence Conservation Analysis of the Intersubunit Binding Sites

Since DAPC is formed and stabilized by a number of intersubunit interactions, the corresponding binding sites were next found based on the previously published experimental studies (see Table 2). At the next step, the found intersubunit binding sites of the DAPC members were subjected to the sequence conservation analysis which revealed that these binding sites are well conserved in six vertebrate species, two mammals (Homo sapiens and Mus musculus), bird (Gallus gallus), reptile (Anolis carolinensis), amphibian (Xenopus laevis), and fish (Brachydanio rerio).
Table 2

Intersubunit binding regions found in the DAPC members.

No.A: BBinding Site of ABinding Site of BReference
1AQP-4: α-1 syntrophin319–323PDZ (87–170) [90], [126], UniProt
2Kir4.1: α-1 syntrophin375–379PDZ (87–170) [90], UniProt
3α-1 syntrophin: Dp71SU (449–505)‘Syntrophin Domain’ (362–412) [118], [119], UniProt
4α-1 syntrophin: α-dystrobrevin‘Dystrobrevin Domain’ (408–426)‘Dp71 Domain’ (378–450) [121], [122], UniProt
5α-dystrobrevin: Dp71‘Syntrophin Domain’ (460–500)‘Dystrobrevin Domain’ (420–460) [107]
C-terminal sequences of AQP-4 and Kir4.1 each contains short highly conserved regions (the 319–323 fragment in human AQP-4 (residues VLSSV) and the residues RISNV at the position 377–381 in human Kir4.1. These sequences are responsible for the AQP-4 and Kir4.1 interaction with the α-1 syntrophin PDZ-domain. Figure 2 shows that the PDZ-domain binding sites of both channels are located within their disordered C-terminal tails. As it was mentioned above, human and mouse α-1 syntrophins have similar domain organization. The human protein consists of an N-terminal split PH domain (residues 6–86 and 171–268) containing an embedded PDZ domain (residues 87–170), a central PH domain (residues 293–401), and a C-terminal SU-domain (residues 449–505). The PDZ domains of α-1 syntrophin (residues 87–170 in human protein) are known to bind to the last three or four amino acids of ion channels and receptor proteins [116], [117]. Evolutionary analysis revealed that the PDZ-domain of α-1 syntrophin as well as the PDZ-binding motifs of AQP-4 and Kir4.1 are highly conserved among vertebrates (see Figure 5).
Figure 5

Sequence conservation of the PDZ-binding motifs in AQP-4 (A) and Kir4.1 (B), and of the α-1 syntrophin PDZ-domain (C).

Sequence conservation of the α-1 syntrophin SU domain (D) and of the α-1 syntrophin binding domain of Dp71 (E). In each plot, the position of the corresponding binding motifs and functional domains are highlighted as a colored rectangle.

Sequence conservation of the PDZ-binding motifs in AQP-4 (A) and Kir4.1 (B), and of the α-1 syntrophin PDZ-domain (C).

Sequence conservation of the α-1 syntrophin SU domain (D) and of the α-1 syntrophin binding domain of Dp71 (E). In each plot, the position of the corresponding binding motifs and functional domains are highlighted as a colored rectangle. The SU-domain of α-1 syntrophin (residues 449–505 in human protein) is known to interact with specific segments of the Dp71 and α-dystrobrevin proteins [116], [118], [119]. This SU domainis mostly conserved among the vertebrates (Figure 5D). Significant portion of the SU-domain, including its calmodulin-binding subdomain (residues 481–503 in human protein) [120], is predicted to be disordered (see Figure 2F). The portion of the molecule involved in binding to the α-1 syntrophin SU-domain spans amino acids 362–412. This interactive Dp71 segment, being mostly disordered (see Figure 2C), is highly conserved among vertebrates (Figure 5E). The interacting regions between α-1 syntrophin (residues 408–416) and α-dystrobrevin (residues 378–450) are also known [121], [122]. Figure 2 suggests that although the α-1 syntrophin-interacting domain of α-dystrobrevin is located within the predominantly ordered region, the α-dystrobrevin-binding motif of α-1 syntrophin is predicted to possess noticeable conformational flexibility (as indicated by its disorder scores noticeably deviating from 0.0). Curiously, conservation analysis revealed that the α-dystrobrevin-binding motif of α-1 syntrophin possesses minimal conservation among different species (Figure 6A), whereas the mostly disordered α-1 syntrophin-interacting domain of α-dystrobrevin showed more cross-species conservation (Figure 6B). Figure 2 shows that the interacting regions between the Dp71 (residues 420–460) and α-dystrobrevin (residues 460–500) are predicted to be highly disordered. Despite this fact, these intersubunit binding regions were shown to be highly conserved in the vertebrate species (see Figures 6B and 6C). Finally, since the Dp71 is one of the alternatively spliced isoforms of a canonical dystrophin, we analyzed the inter-isoform conservation of the dystrophin domains interacting with the SU-domain of α-1 syntrophin and with α-dystrobrevin and showed that these two binding domains are highly conserved among all the dystrophin DMD isoforms (Figure 6D).
Figure 6

Analysis of sequence conservation of the inter-subunit interaction sites in the DAPC.

In each plot, the position of the corresponding binding motifs and functional domains are highlighted as a colored rectangle. A. Conservation of the α-dystrobrevin-interacting site of α-1 syntrophin. B. Multi-domain sequence conservation analysis of α-dystrobrevin (conservation of the domains interacting with α-1 syntrophin and Dp71). C. Conservation of the α-dystrobrevin-interacting domain of Dp71. D. Multi-domain sequence conservation analysis of Dp71 (conservation of the domains interacting with the α-1 syntrophin SU-domain and α-dystrobrevin).

Analysis of sequence conservation of the inter-subunit interaction sites in the DAPC.

In each plot, the position of the corresponding binding motifs and functional domains are highlighted as a colored rectangle. A. Conservation of the α-dystrobrevin-interacting site of α-1 syntrophin. B. Multi-domain sequence conservation analysis of α-dystrobrevin (conservation of the domains interacting with α-1 syntrophin and Dp71). C. Conservation of the α-dystrobrevin-interacting domain of Dp71. D. Multi-domain sequence conservation analysis of Dp71 (conservation of the domains interacting with the α-1 syntrophin SU-domain and α-dystrobrevin). Therefore, these analyses revealed that the intersubunit binding regions of the DAPC proteins are highly conserved. Importantly, our analysis also showed that in each interacting pair analyzed, at least one binding region involved in such intersubunit contacts was predicted to be intrinsically disordered. These results indicated that the DAPC proteins utilize highly conserved IDPRs for their intersubunit communications, thereby emphasizing the functional importance of intrinsic disorder in assembly of this important complex.

Predicted Intrinsic Disorder-Based Binding Sites in the DAPC proteins

We also analyzed the coincidence of known binding sites of various DAPC proteins with the potential binding sites predicted by two principally different computational tools for finding the disorder-based interaction features, the ANCHOR (http://anchor.enzim.hu/) [123] and the MoRFpred (http://biomine-ws.ece.ualberta.ca/MoRFpred/ index.html) [124]. Results of this analysis are summarized in Table 3 which clearly shows that the majority of known binding sites are predicted by at least one of these computational tools. This finding provides further support to the idea that intrinsic disorder is important for the DAPC formation and stabilization, since many binding motifs of the DAPC proteins are located within the disordered regions. Also, it is obvious that there is some kind of disorder complementarity among binding sites, since when one interacting protein displays interaction-prone regions of intrinsic disorder, the binding site of its binding partner typically does not contain disorder. In this way, one protein acts as a donor, offering its intrinsically disordered region for binding, while the other protein acts as an acceptor, offering its more rigid region to complement the binding interaction.
Table 3

Comparison of the ANCHOR and MoRFpred results with known binding site.

No.ProteinANCHORMoRFpredAlready known
1AQP-4None319–323319–323
2Kir4.1None375–379375–379
3α-1 syntrophin113–119, 129–135, 163–170None87–170, 449–505, 408–426
4Dp71362–401, 409–412, 420–460387–395, 423–430, 442–449362–412, 420–460
5α-dystrobrevin379–396, 406–416, 437–445378–384, 434–442, 466–475, 477–480378–450, 460–500

Intrinsic Disorder and Diseases-Causing Mutations in the DAPC proteins

Next we analyzed the human DAPC proteins for the presence of mutations known to cause diseases. This analysis did not find reported mutations in the AQP-4 protein that were known to lead to disease phenotypes. However, twelve allelic variants were found in Kir4.1. Analysis of these mutations revealed that half of the allelic variants affected arginine (see Figure 7). The complete list of substitutions found in Kir4.1 and the positions at which they occurred are shown in Table 4. It is important to note that no disease-promoting mutations were found in the known binding site of Kir4.1 (amino acids 375–379).
Figure 7

Frequencies of disease-related mutations in Kir4.1.

Table 4

Disease-related amino acid substitutions in Kir4.1.

OriginalSubstitutionPosition
ArgPro65
ArgCys65
PheLeu75
GlyArg77
ProLeu121
CysArg140
ThrIle164
AlaVal167
ArgHis194
ArgTer199
ArgCys297
ArgCys348
Eleven disease-related mutations were identified in the Dp71-encoding gene DMD. In most cases, these mutations result in Duchenne muscular dystrophy (see Table 5). Six out of the eleven mutations were frame shifts downstream from either a Lysine or Leucine residue, causing the protein to be truncated. In addition, point mutations, which caused arginine to change into a stop codon, were observed three times at positions 122, 302 and 313. Interestingly, there were three frame-shift mutations, in the α-1 syntrophin binding site of Dp71, at positions 363, 404, and 411. Mutations of two of these positions (Leu363 and Leu404) lead to the frame shifts resulting in the Duchenne muscular dystrophy, whereas the mutation of Ala411 was another frame-shift mutation that caused Becker muscular dystrophy.
Table 5

Disease-related amino acid substitutions in Dp71.

OriginalEffectAA Position
IleFrame shift89
ArgTermination122
ArgTermination122
CysSubstitution to Tyr272
ArgTermination302
LysFrame shift306
ArgTermination313
LeuFrame shift363
LeuFrame shift404
AlaFrame shift411
LeuFrame shift414
The only disease-related mutation in the α-1 syntrophin is an A390V mutation which is linked to long QT syndrome 12. However, this mutation is positioned outside the PDZ- and SU-domains as well as outside the region that interacts with α-dystrobrevin. Finally, only one allelic variant (P121L) is known for α-dystrobrevin, which is also located outside any relevant binding sites. Because, intrinsic disorder is an important function-related feature, we analyzed whether the known SNPs can cause significant changes in the intrinsic disorder pattern of whole protein. To this end, a paired T-test between the wild type protein and each variant containing a single SNP was performed. Here, we analyzed the statistical significance of the single SNP effect on the disorder probabilities of proteins evaluated by PONDR-FIT, PONDR® VLXT, and PONDR® VSL2. Here, a paired T-test was applied to a pair of averaged per residue disorder scores calculated for the wild-type protein and variant and significance level of 0.05 was used to determine the statistical significance of the effect of a given SNP. Among numerous SNPs reported for each protein, only SNPs affecting binding sites and disease causing SNPs were considered. The analysis of SNPs affecting residues in the binding sites or binding domains revealed that the majority of such mutations caused the related protein to be less disordered than the corresponding wild type protein (i.e., the majority of SNPs resulted in some decrease in the overall disorder score of a corresponding protein) (Table 6). Here, among 40, 39 and 39 SNPs analyzed by PONDR-FIT, PONDR® VLXT, and PONDR® VSL2, there were 24, 29, and 25 SNPs, respectively, which were predicted to make whole protein to be less disordered. Similarly, the majority of disease-causing SNPs tend to change each protein to be less disordered than the corresponding wild type protein (see Table 7). Again, among 15-15-15 SNPs checked by PONDR-FIT, PONDR® VLXT, and PONDR® VSL2 algorithms, 7, 13 and 6 SNPs were predicted to decrease the mean disorder score of a corresponding protein. Importantly, there was no intersection between the SNPs in binding sites and disease causing SNPs. To further substantiate our conclusions, we also analyzed the effect of disease-causing SNPs located within the functional motifs and functional domains that are listed for the DACP members in the ELM (http://elm.eu.org/) and pFam (http://pfam.sanger.ac.uk/) databases. The corresponding data are listed in Table 8.
Table 6

Paired T-test P-values of the SNPs in the intersubunit binding sites.

ProteinSNPPONDR-FIT (effect)PONDR-VLXT (effect)VSL2 (effect)
AQP-4V319I1.72e-2 (Less ID)1.62 e-3 (Less ID)1.51e-9 (Less ID)
Kir4.1R375H7.88e-7 (Less ID)1.26e-3 (Less ID)1.90e-8 (Less ID)
Dp71N399K8.74e-9 (Less ID)1.47e-6 (Less ID)9.21e-8 (Less ID)
Q400R7.57e-11 (Less ID)4.76e-6 (Less ID)2.18e-11 (Less ID)
I428V1.99e-7 (Less ID)4.98e-7 (More ID)2.84e-11 (Less IDP)
R445H1.65e-6 (Less ID)3.98e-6 (Less ID)4.41e-12 (Less ID)
R445L2.81e-10 (Less ID)4.32e-6 (Less ID)3.05e-13 (Less ID)
R445P4.17e-4 (Less ID)NoneNone
S456T1.08e-7 (Less ID)1.88e-5 (Less ID)1.51e-14 (Less ID)
S456Y6.17e-12 (Less ID)6.18e-5 (Less ID)1.98e-15 (Less ID)
L458V3.31e-12 (Less ID)3.33e-5 (More ID)1.31e-15 (Less ID)
α-1 syntrophinR106Q4.75e-4 (Less ID)5.68e-6 (Less ID)4.90e-1 (Less ID)
I114F1.13e-7 (Less ID)2.05e-5 (Less ID)8.81e-9 (Less ID)
K116E1.24e-8 (More ID)8.99e-6 (More ID)1.63e-4 (Less ID)
A122T3.63e-3 (Less ID)4.92e-6 (Less ID)7.71e-4 (More ID)
Q125H1.61e-1 (Not significant)5.44e-5 (Less IDP)1.07e-1 (Not significant)
F130L1.11e-1 (Not significant)2.21e-6 (More IDP)5.04e-7 (More ID)
S144A6.13e-8 (Less ID)4.87e-7 (Less ID)2.53e-8 (Less ID)
T147N1.32e-4 (More ID)5.62e-7 (Less ID)3.32e-3 (More ID)
V154I4.64e-6 (Less ID)2.21e-5 (Less ID)6.81e-11 (Less ID)
R419C3.82e-6 (Less ID)1.87e-4 (Less ID)7.86e-6 (Less ID)
E451K2.43e-3 (Less ID)2.14e-6 (Less ID)1.68e-3 (More ID)
G460S2.96e-6 (More ID)5.08e-7 (More ID)1.42e-6 (More ID)
L463R2.48e-4 (More ID)9.35e-5 (Less ID)1.17e-5 (More ID)
S481L1.36e-2 (More ID)5.70e-5 (Less ID)1.07e-4 (Less ID)
S495-1.13e-4 (More ID)2.50e-1 (Not significant)1.31e-2 (More ID)
S495L3.93e-1 (Not significant)1.95e-3 (Less ID)1.34e-4 (Less ID)
α-dystrobrevinA383S6.72e-10 (More ID)3.82e-5 (More ID)3.36e-9 (More ID)
N397D1.02e-5 (More ID)1.95e-1 (Not significant)1.59e-8 (More ID)
A403V9.21e-12 (Less ID)8.18e-7 (Less ID)6.58e-9 (Less ID)
H406Y2.33e-7 (Less ID)3.53e-5 (Less ID)9.25e-10 (Less ID)
G410R1.33e-9 (More ID)2.68e-5 (Less ID)4.59e-10 (More ID)
Y412S1.28e-13 (More ID)1.76e-7 (More ID)6.58e-12 (More ID)
R417W6.83e-15 (Less ID)2.80e-6 (Less ID)2.31e-9 (Less ID)
C422R3.79e-13 (More ID)1.95e-2 (More ID)2.55e-9 (More ID)
E425G4.21e-10 (Less ID)1.03e-5 (Less ID)1.63e-9 (Less ID)
A445T1.07e-4 (More ID)2.84e-6 (Less ID)1.93e-10 (Less ID)
S448F1.62e-13 (Less ID)6.80e-7 (Less ID)5.48e-11 (Less ID)
D467N1.97e-1 (Not significant)5.96e-4 (Less ID)9.20e-7 (More ID)
R494W6.21e-11 (Less ID)4.17e-7 (Less ID)3.68e-13 (Less ID)
Table 7

Paired T-test P-values of the disease-causing SNPs.

ProteinSNPPONDR-FIT (effect)PONDR-VLXT (effect)VSL2 (effect)
AQP-4None
Kir4.1R65P8.64e-8 (More ID)6.45e-4 (Less ID)1.44e-3 (More ID)
G77R1.11e-8 (Less ID)7.97e-5 (More ID)2.83e-1 (Not significant)
C140R1.77e-1 (Not significant)3.66e-7 (More ID)2.18e-4 (More ID)
T164I2.30e-4 (More ID)1.34e-4 (Less ID)5.96e-4 (Less ID)
A167V4.41e-3 (Less ID)4.89e-5 (Less ID)2.14e-4 (Less ID)
P194H1.95e-1 (Not significant)4.65e-5 (Less ID)3.82e-3 (More ID)
R199-1.07e-3 (More ID)2.85e-3 (Less ID)1.54e-1 (Not significant)
R297C3.20e-8 (Less ID)3.23e-6 (Less ID)8.30e-5 (Less ID)
R348C5.50e-7 (Less ID)4.41e-7 (Less ID)9.84e-10 (Less ID)
Dp71R122-2.83e-4 (Less ID)5.25e-5 (Less ID)2.33e-1 (Not significant)
C272Y2.08e-7 (More ID)1.76e-4 (Less ID)3.29e-5 (More ID)
R302-1.78e-3 (More ID)5.16e-3 (Less ID)3.07e-1 (Not significant)
R313-5.06e-3 (More ID)3.55e-2 (Less ID)3.33e-1 (Not significant)
α-1 syntrophinA390V2.15e-4 (Less ID)2.68e-7 (Less ID)6.17e-7 (Less ID)
α-dystrobrevinP121L3.55e-3 (Less ID)2.80e-4 (Less ID)5.71e-4 (Less ID)
Table 8

Functional motifs and functional domains of the DAPC members affected by the disease causing SNPs.

ProteinSNPpFamELMP-valueIDPRTendencyDisease
Kir4.1R65PIRKTRG_LysEnd_APsAcLL_1 LIG_PP18.64e-8×More IDSeizure
G77RMOD_GSK3_1TRG_PEX_21.11e-11×Less IDSeizure
C140RNone1.77e-1×Not significantSeizure
T164INone2.30e-4×More IDSeizure
A167VNone4.41e-3×Less IDSeizure
P194HNone1.95e-1×Not significantDeafness
R199-None1.07e-3×More IDSeizure
R297CNone3.20e-8×Less IDSeizure
R348CMOD_PKA_2 MOD_PLK5.50e-7×Less IDDeafness
Dp71R122-EF-Hand 2LIG_FHA_1 LIG_MAPK_12.83e-4×Less IDMuscle Dystrophy
C272YZZNone2.08e-7×More IDMuscle Dystrophy
R302-pFam-B _2847None1.78e-3×More IDMuscle Dystrophy
R313-pFam-B _2847CLV_PCSK_FUR_15.06e-3×More IDMuscle Dystrophy
α-1 syntrophinA390VPH, pFam-B_796 & _18143None2.15e-4×Less IDLong QT Syndrome 12
α-dystrobrevinP121LEF-Hand 2 SporeIII_AENone3.55e-3×Less IDLeft Ventricular Noncompaction 1
Our analysis revealed that although all five proteins contain multiple SNPs, disorder-based binding regions of these DAPC members are rarely affected by mutations. This observation suggests that the functional versatility of IDPRs in AQP-4, Kir4.1, Dp71, α-1 syntrophin, and α-dystrobrevin precludes these regions from being mutated, hence showing least number of mutations in them. However, it is also likely that mutations of individual residues within the functional IDPRs of these proteins are well tolerated, since the evolutionary pressure may have shifted to maintaining global biophysical properties and structural malleability of the IDPRs to safeguard the critical protein functions [125].

Visual Analyses of the Effects of SNPs on Protein Intrinsic Disorder Propensity

Going beyond the disease related mutations, further analysis was performed by looking at all available SNPs for each of the five proteins from the dystrophin-associated protein complex. As stated above, the Ensemble Genome Browser (http://useast.ensemble.org) was used to search for all SNPs related to the proteins in question. The total variation analysis of each gene showed that there were 2,775 SNPs for genes encoding for AQP-4, 730 SNPs for KCNJ-10 (Kir4.1), and 832, 2,563, and 2,427 SNPs for α-1 syntrophin, Dp71, and α-dystrobrevin, respectively. Only SNPs that corresponded to the actual amino acid substitutions in corresponding proteins were analyzed. Thus, any nonsense, non-coding, splice region, synonymous, 3′ or 5′ UTR variants, or any other type of variant that did not pertain to actual amino acid changes were omitted as this data related to areas of each protein transcriptome that would not impact the amino acid sequences of the resulting protein forms. There were 242, 67, 81, 69, and 115 amino acid substitutions-inducing SNPs in AQP-4, Kir4.1, α-1 syntrophin, Dp71, and α-dystrobrevin, respectively. The analyzed variations were broken down into an overall distribution of substitutions affecting polar and non-polar residues in each protein. Figure 8 shows which amino acids dominate the total polar and non-polar substitutions for each protein and gives insights into the likelihood of SNPs and related disorders being linked to specific amino acids, as some are much more common than others. These distributions also help to conceptualize the potential effects of substitutions and make a prediction on whether a given mutation can affect protein folding and functionality. For example, the addition of a polar or charged residue to a protein that is mostly non-polar and hydrophobic (such as AQP-4 or Kir4.1) could have drastic impacts on the ability of this protein to maintain both stability and function and therefore to interact with the conjugate proteins in the complex. It is this exact occurrence that is the focus of our study.
Figure 8

Distribution of SNPs affecting polar and non-polar residues in AQP-4 (A), KCNJ-10 (Kir4.1, B), α1-syntrophin (C), Dp71 (D), and α-dystrobrevin (E).

Figure 9 represents a series of scatter diagrams that show the frequencies of substitutions within the corresponding protein sequences. Here, a high frequency number means that a certain position within the protein has a high level of SNP occurrence. These diagrams give an idea of where in the sequence certain variations are prone to happen. One should keep in mind though that multiple variations at a certain point in a protein can generate different residues. Areas of absence or abundance of points in these diagrams is a nice clue that can be linked to some other sequence specific features, such as distribution of intrinsic disorder propensities, thereby giving more insights in context of the overall stability/structure of the proteins of interest.
Figure 9

Scatter plots showing SNP distributions within the amino acid sequences of AQP-4 (A), Kir4.1 (B), α1-syntrophin (C), Dp71 (D), and α-dystrobrevin (E).

SNPs happening in known binding regions are indicated by different colors that match colors in corresponding tables (see Tables 9-11).

Scatter plots showing SNP distributions within the amino acid sequences of AQP-4 (A), Kir4.1 (B), α1-syntrophin (C), Dp71 (D), and α-dystrobrevin (E).

SNPs happening in known binding regions are indicated by different colors that match colors in corresponding tables (see Tables 9-11).
Table 9

Binding site SNPs in α-1 syntrophin.

RegionSNP IDSNP TypeVariationAA Position
PDZ Domain (87–170, red)rs75025585MissenseR/Q106
TMP_ESP_20_32026803MissenseI/F114
TMP_ESP_20_32026797MissenseK/E116
rs151113230MissenseA/T122
rs201421292MissenseQ/H125
rs199964677MissenseF/L130
rs142978180MissenseS/A144
rs141724500MissenseT/N147
rs200241523MissenseV/I154
rs200647905SynonymousL164
Dystrobrevin Domain (408–416, green)rs142580715SynonymousV410
SU Domain (449–505, blue)rs116747979SynonymousF450
rs148604302MissenseE/K451
rs146134721SynonymousD459
TMP_ESP_20_31996554MissenseG/S460
rs188835994MissenseL/R463
rs201485963SynonymousH480
TMP_ESP_20_31996389MissenseS/L481
TMP_ESP_20_31996346SynonymousS495
rs144006909MissenseS/L495
rs144006909Stop gainedS/*495
rs34901081SynonymousA496
Table 11

Binding site SNPs in α-dystrobrevin.

RegionSNP IDSNP TypeVariationAA Position
Syntrophin Domain (378–450, green)rs192673085SynonymousL378
rs141881401MissenseA/S383
rs150679265MissenseN/D397
TMP_ESP_18_32418744MissenseA/V403
rs139872140MissenseH/Y406
TMP_ESP_18_32418763SynonymousI409
TMP_ESP_18_32418764MissenseG/R410
rs186573363MissenseY/S412
rs199867593MissenseR/W417
COS122724MissenseR/W417
TMP_ESP_18_32418800MissenseC/R422
TMP_ESP_18_32428268MissenseE/G425
rs144776465SynonymousA441
COSM221429MissenseA/T445
COSM221430MissenseA/T445
rs147541731MissenseS/F448
rs202088347SynonymousS450
Dp-71 Domain (460–500, blue)rs145061501SynonymousI466
rs144880521MissenseD/N467
rs149071180SynonymousL495
Specific SNPs are highlighted in each of these scatter plots to denote those that were present in specific binding regions. Different colors were used in order to distinguish between multiple binding regions present on a protein. All the different SNPs in binding regions of these proteins are given in Tables 9–11 and are color coded to indicate the regions in which they occur. Note, that a plot for the Kir4.1 protein has two colors to highlight four specific SNP's. While two SNPs (blue highlights, positions 375 and 376, Figure 9B) are associated with the actual binding region of this protein, the other two (red highlights, positions 26 and 199, Figure 9B) are SNPs resulting in gaining the stop-codons that lead to protein truncations. Therefore, these SNPs were considered important and included in the scatter plot for Kir4.1. The latter three scatter plots have many more SNPs associated with binding regions. However, as stated earlier no SNP within the binding regions of any of these proteins had known links to epileptic seizure phenotypes. Only Long QT Syndrome and multiple phenotypes of muscular dystrophy have been linked to mutations/variants in these binding regions. The results above show that only one SNP (rs200498749) was present in the AQP-4 binding region, at position 319. This was a missense variant that caused the amino acid change from valine to isoleucine. This mutation is located at the C-terminal end of the protein which is responsible for interacting with α-1 syntrophin. As stated earlier, this SNP is not known to be associated with any disease. The same was true for our analysis of Kir4.1 which showed an SNP at position 376, which is in the C-terminally located binding region of this protein. However, as with AQP-4, this SNP is not associated with any disease phenotypes. These two proteins are thought to be directly responsible for certain seizure phenotypes because of the role they play in the DAPC and cell homeostasis. However, the scatter plots show that SNPs are frequent outside of their short binding regions. This could mean that although the biding regions are primarily responsible for these proteins ability to interact, they have little to do with sustaining stability and overall protein functionality. The scatter plots for α-1 syntrophin, Dp71 and α-dystrobrevin show a very different variation content as compared to those of the Kir4.1 and AQP-4 proteins. These plots emphasize the large difference in SNP frequency, especially in the binding regions of interest. As Figure 9 shows, the large majority of variations only have a frequency of one. This frequency forms the linear clustering that can be seen across the bottom of each respective scatter plot. Tables 9–11 give even more detail on these proteins, listing the ID, type, actual variation and amino acid position of each SNP as it relates to the particular binding site. It is important to note that a single amino acid position can be affected by multiple SNP types. This is best represented in the SU-domain of α-1 syntrophin where position 495 can have three different variations; a missense variant, a synonymous variant and a stop-codon variant. Although each of these SNPs can occur at the same location, they have drastically different impacts on the protein. The synonymous variant does not induce amino acid change, while the variant resulting in the stop-codon cuts off the last 11 amino acids, all of which are part of the SU-domain. As this domain is the region responsible for α-1 syntrophin interaction with both Dp71 and α-dystrobrevin, this SNP could have a significant impact on the DAPC and its overall stability/function. Therefore, the scatter plots shown in Figure 9 and the associated data help to uncover meaningful results from the SNP data as well as aid in the visualization of where these variations are physically occurring within each proteins amino acid sequence. Further analysis was performed on the available SNPs for each of these five proteins in order to determine the impact that specific variations had on protein intrinsic disorder propensity that can be translated to the potential effects of amino acid variations on protein stability and functionality. At this stage we got rid of any identical SNPs which were repeated multiple times at a specific location. However, those SNPs which occurred at the same location but resulted in different amino acids were included. This allowed us to trim down the total variants of each protein to just those that were unique and which directly impacted the amino acid sequences. This pre-filtering generated 52, 56, 57, 56, and 80 unique SNPs for AQP-4, KCNJ-10 (Kir4.1), α-1 syntrophin, Dp71, and α-dystrobrevin, respectively that were used in all subsequent analyses. Figure 10 is a visual representation of data on the effect of SNPs on the mean intrinsic disorder score of a given protein. The statistical significance of the effects of corresponding mutations on protein's disorder score was already discussed above. In each plot of the Figure 10, disorder scores are shown on the Y-axis, whereas X-axis represents the SNP numbers. Note that these SNP numbers are used as identifiers of given SNPs and are not related to the SNP positions within the protein sequence. Figure 10 shows that some proteins have very little variation associated with their SNPs while others have a wide range of differences. Additionally, proteins differ from each other not only by the number of SNPs affecting their mean disorder scores, but also by the amplitudes of these changes. Therefore, these plots provide a simple illustrative mean to discern which proteins are more likely to be effected by changes in their amino acid sequences. For example, the comparison of plots corresponding to AQP-4 and α-dystrobrevin shows that AQP-4 has only a few spots of extreme deviation from the wild type values, whereas α-dystrobrevin has numerous spots that differ from the wild type disorder value. This suggests that the disorder propensity of AQP-4 is much less affected by most variations. We believe that these types of graphs are helpful in that they give a clear picture of those SNPs which do or do not affect the disorder status of a protein. In this way, the number of unique SNPs which have to be focused on for further analysis can be trimmed down. Also, SNPs that do affect the mean disorder level can be ranked based on the severity of their effects (the scale of deviation from the mean disorder scores of wild type proteins), which provides another way to focus on specific variations.
Figure 10

Analyzing the effect of SNPs on the fraction of disordered residues in a target protein: AQP-4 (A), Kir4.1 (B), Dp71 (C), α1-syntrophin (D), and α-dystrobrevin (E).

For a given protein, faction of disordered residues was determined as a relative content of residues with the disorder score above the 0.5 threshold. Note that numbers on the X-axis correspond to the identification numbers of SNPs and not to their positions within the protein sequence.

Analyzing the effect of SNPs on the fraction of disordered residues in a target protein: AQP-4 (A), Kir4.1 (B), Dp71 (C), α1-syntrophin (D), and α-dystrobrevin (E).

For a given protein, faction of disordered residues was determined as a relative content of residues with the disorder score above the 0.5 threshold. Note that numbers on the X-axis correspond to the identification numbers of SNPs and not to their positions within the protein sequence. Another way of showing the effects of SNPs on the protein's disorder propensity is a plot where the per-residue disorder scores of the SNP-produced variant are correlated with the per-residue disorder scores of the corresponding wild type protein (see Figure 11). In corresponding plots, the X-axis shows the per-residue disorder scores for the wild type proteins, whereas the Y-axis shows the position matched disorder scores of the mutant protein. Obviously, when mutations do not affect the protein's disorder score, the corresponding dependence is described as a straight line following the diagonal of a given plot, whereas any deviation from this diagonal straight line is a representation of an effect of an SNP on protein's disorder propensity. Here, the positive deviations (i.e., moving plots above the diagonal) reflect the SNP-induced increase of intrinsic disorder propensity in a given protein, whereas the negative deviations (i.e., those moving plots below the diagonal) denote the SNP-promoted decrease in protein's disorder level. Obviously, the severity of the effect of an SNP on protein disorder propensity can be evaluated by the magnitude of the corresponding line deviation from the diagonal.
Figure 11

Correlation between the per-residue disorder scores of the wild type proteins and SNP-produced variants of AQP-4 (A), Kir4.1 (B), α1-syntrophin (C), Dp71 (D), and α-dystrobrevin (E).

These graphs are generated by plotting the per-residue disorder scores of SNP-produced variants versus the per-residue disorder scores of corresponding wild type proteins. Each line in these plots corresponds to the pre-residue disorder scores correlation evaluated for one SNP-produced variant.

Correlation between the per-residue disorder scores of the wild type proteins and SNP-produced variants of AQP-4 (A), Kir4.1 (B), α1-syntrophin (C), Dp71 (D), and α-dystrobrevin (E).

These graphs are generated by plotting the per-residue disorder scores of SNP-produced variants versus the per-residue disorder scores of corresponding wild type proteins. Each line in these plots corresponds to the pre-residue disorder scores correlation evaluated for one SNP-produced variant. Figure 11A represents the corresponding disorder correlation graph for AQP-4 that has only a few SNPs that significantly shift the disorder line away from the diagonal. The three that stand out the most are SNPs at positions 136 (Valine-Phenylalanine), 182 (Arginine-Tryptophan), and 260 (Arginine-Cysteine). Figure 11A shows that these three variants pushed the protein towards more order. Most changes in disorder propensity happen in a window of about 15–30 amino acids surrounding the SNP. The range seems to be dependent on how different the amino acid is from the wild type and the characteristics of the residues surrounding it. Figure 11B shows that Kir4.1 has five SNPs causing severe shifts in protein's order/disorder propensity. These are SNPs at positions 18 (Arginine-Tryptophan), 26 (Arginine-Stop Codon), 36 (Arginine-Cytosine), 171 (Arginine-Glutamine), and 181 (Phenylalanine-Leucine). The stop codon at position was the most severe as it truncated the protein to a length of only 25 residues and caused noticeable increase in protein's disorder propensity. The SNP at position 181 also increased disorder noticeably. The other three SNPs, 18, 36 and 171, all resulted in increased order in a protein. α-1-Syntrophin was mostly unaffected by a majority of SNP's. However, Figure 11C shows that there is a general shift below the diagonal which would denote more order in variants produced by SNPs. The SNPs at positions 189 (Serine-Leucine), 207 (Arginine-Tryptophan), and 495 (Serine-Leucine) were the most order-promoting, whereas the SNPs affecting residues 442 (Arginine-Stop Codon) and 495 (Serine-Stop Codon) both truncated the protein and caused significant increases in disorder. Figure 11D illustrates that the SNP-induced variability of the Dp71 disorder propensity is rather high. Four SNPs, 122 (Arginine-Stop Codon), 242 (Glutamine-Stop Codon), 302 (Arginine-Stop Codon) and 313 (Arginine-Stop Codon), incorporated stop codons and resulted in the great increase in the disorder propensity. SNPs that occur at positions 7 (Glycine-Serine), 64 (Alanine-Aspartic Acid), 583 (Asparagine-Lysine), and 583 (Asparagine-Serine) all promote a little more disorder, whereas SNPs at positions 196 (Arginine-Tryptophan), 399 (Asparagine-Lysine), 485 (Arginine-Cysteine), and 600 (Methionine-Valine) all resulted in increased order. Finally, Figure 11E shows that α-dystrobrevin is impacted the most by a majority of SNPs. However, this effect could partly be attributed to the fact that α-dystrobrevin has the largest number of unique SNPs. Standouts include SNPs at positions 31 (Arginine-Glutamine), 121 (Proline-Leucine), 308 (Serine-Isoleucine), 448 (Serine-Phenylalanine), 448 (Serine-Phenylalanine), 494 (Arginine-Tryptophan), 519 (Arginine-Tryptophan) and 691 (Arginine-Cysteine), which all result in an increased order. Alternatively, SNPs affecting positions 9 (Glycine-Arginine), 246 (Cysteine-Serine), 412 (Tyrosine-Serine) and 422 (Cysteine-Arginine) all cause somewhat increased disorder, whereas an SNP at position 676 (Arginine-Stop Codon) is the most detrimental as it cut off the last 69 amino acids and greatly increases disorder. Concluding, presented in this study several types of visual analyses constitute a practical tool in determining which SNPs change the intrinsic disorder predisposition in a target protein. They also allow for visual representation of the severity of the resulting changes. Thus, for a given protein, the analysis of a few hundred SNPs can be reduced to a small subset comprising of SNPs that possess the most detrimental effects on the protein intrinsic disorder propensity. Since the correlation between the peculiarities of intrinsic disorder profiles and functionality is established for several proteins, we believe that the described analyses represent a useful addition to the arsenal of tools for computational analysis of disorder-based protein functions.

Materials and Methods

Conserved Sequence Analysis

Proteins from mammals (Homo sapiens and Mus musculus), fish (Brachydanio rerio), amphibian (Xenopus laevis), bird (Gallus gallus), and reptile (Anolis carolinensis) analyzed in this study are listed in Table 1 together with their corresponding UniProt IDs (www.UniProt.org). Alignments of different sequences performed to find the conservation levels were done using the alignment tools available at the UniProt website (www.UniProt.org), Clustal O 1.1.0 (http://www.clustal.org/omega/). Each binding site was determined based on the analysis of the available literature data [90], [107], [118], [119], [121], [122], [126] and the UniProt database (www.UniProt.org).

Mutation Analysis

Information on the mutations in AQP-4, Kir4.1, Dp71, α-1 syntrophin, and α-dystrobrevin associated with various diseases was extracted from the corresponding articles (which have been referenced in the introduction and discussion sections). The National Center for Biotechnology Information (www.NCBI.nlh.nih.gov) was used to look up for related publications and served as a platform when searching for specific information related to the DAPC proteins. Databases, including OMIM and DMDM, were consulted to find allelic variants within the amino acid sequences of each protein. For the dystrophin protein, allelic variants were only available for its entire DMD gene, but not for the fifth isoform, Dp71, which we were interested in. However, the UniProt website provided the information necessary to convert DMD gene to Dp71. A special protocol was developed that allowed us to interpret sequence information from the DMD and change it to the correct sequence numbering when converting from one isoform to the other. In this way it was possible to find allelic variants related to the Dp71 isoform.

Single Nucleotide Polymorphisms Analysis

The Single Nucleotide Polymorphisms (SNPs) for each protein in the DAPC were identified using the Ensemble Genome Browser website (http://useast.ensembl.org). Each DAPC protein and all related variations, mutations and SNPs were found within the corresponding genome browser. Genetic variations for each protein analyzed were presented by the website as tables, with the total number of variants sub-divided into categories such as missense variants, stop gain/loss variants, synonymous variants, splice region variants, etc. These tables were downloaded for each of the proteins in the complex. Every type of variation, for each respective protein, was compiled into an excel spreadsheet, where it was sorted according to the location of the amino acid substitutions. In this regard, every SNP that had an assigned/known AA variant was analyzed. Variations that did not correspond to specific locations in a proteins amino acid (AA) sequence were discarded, as these did not give any relevant SNP data. In most cases, a majority of the variation table was composed of variants that did not correspond to specific AA locations, such as intron variants and downstream un-translated region (UTR) variants. The AA substitutions were separated according to whether they were affecting polar or non-polar residues and then summed. The corresponding data are listed in Table 12.
Table 12

SNPs causing amino acid substitutions in the DAPC proteins.

AQP-4PolarArgAsnAspCysGluGlnHisLysSerThrStop gained
Counts3861231721201311520
Non-polarAlaGlyIleLeuMetPheProTrpTyrVal
Counts34173949169186432
Kir4.1PolarArgAsnAspCysGluGlnHisLysSerThrStop gained
Counts2147510963652
Non-polarAlaGlyIleLeuMetPheProTrpTyrVal
Counts5528138218
α-1 SyntrophinPolarArgAsnAspCysGluGlnHisLysSerThrStop gained
Counts1336567451592
Non-polarAlaGlyIleLeuMetPheProTrpTyrVal
Counts158512245327
Dp71PolarArgAsnAspCysGluGlnHisLysSerThrStop gained
Counts1985555571350
Non-polarAlaGlyIleLeuMetPheProTrpTyrVal
Counts74512126126
α-DystrobrevinPolarArgAsnAspCysGluGlnHisLysSerThrStop gained
Counts318548129215170
Non-polarAlaGlyIleLeuMetPheProTrpTyrVal
Counts171151372135311
Disorder predictions were performed for each of the wild type and mutant proteins by creating complete, unique amino acid sequences based on each SNP. We did this by taking the wild type sequences of each respective protein and substituting a single unique variant at its appropriate position to create a new sequence that differed from the wild type by a single amino acid. Therefore, each SNP had a complete amino acid sequence associated with it that could be used for disorder prediction analysis and which could then be compared to the wild type sequences. To see the effect of SNPs on protein disorder characteristics and to check the statistical meaning of disease-causing SNPs a paired T-test between the wild type protein and each variant was performed. In the paired T-test, significance level of 0.05 was utilized. For each protein, the disorder probability values were obtained from disorder predictor algorithms, such as PONDR-FIT [127], PONDR® VLXT [128], and PONDR® VSL2 [129]. Also, to verify the effect of disease-causing variants, their locations within the functional motifs or protein domains were checked through ELM (http://elm.eu.org/) and pFam (http://pfam.sanger.ac.uk/) databases.

Binding Site Prediction

To predict the potential binding sites in each protein, the ANCHOR database (http://anchor.enzim.hu/) [123] and the MoRFpred computational tool (http://biomine-ws.ece.ualberta.ca/MoRFpred/ index.html) were used [124].
Table 10

Binding site SNP in Dp71.

RegionSNP IDSNP TypeVariationAA Position
Syntrophin Domain (362–412, blue)rs190867451SynonymousC395
TMP_ESP_X_31187673MissenseN/K399
TMP_ESP_X_31187671MissenseQ/R400
Dystrobrevin Domain (420–460) greenrs149723217MissenseI/V428
rs139547504MissenseR/H445
COSM216862MissenseR/L445
COSM216863MissenseR/H445
rs150957972MissenseS/T456
rs145123497MissenseS/Y456
TMP_ESP_X_31165574MissenseL/V458
rs72466538SynonymousP459
  128 in total

1.  Unexpected modes of PDZ domain scaffolding revealed by structure of nNOS-syntrophin complex.

Authors:  B J Hillier; K S Christopherson; K E Prehoda; D S Bredt; W A Lim
Journal:  Science       Date:  1999-04-30       Impact factor: 47.728

Review 2.  Protein folding revisited. A polypeptide chain at the folding-misfolding-nonfolding cross-roads: which way to go?

Authors:  V N Uversky
Journal:  Cell Mol Life Sci       Date:  2003-09       Impact factor: 9.261

3.  Intrinsic protein disorder and protein-protein interactions.

Authors:  Wei-Lun Hsu; Christopher Oldfield; Jingwei Meng; Fei Huang; Bin Xue; Vladimir N Uversky; Pedro Romero; A Keith Dunker
Journal:  Pac Symp Biocomput       Date:  2012

Review 4.  Dystroglycan: an extracellular matrix receptor linked to the cytoskeleton.

Authors:  M D Henry; K P Campbell
Journal:  Curr Opin Cell Biol       Date:  1996-10       Impact factor: 8.382

Review 5.  Flexible nets. The roles of intrinsic disorder in protein interaction networks.

Authors:  A Keith Dunker; Marc S Cortese; Pedro Romero; Lilia M Iakoucheva; Vladimir N Uversky
Journal:  FEBS J       Date:  2005-10       Impact factor: 5.542

Review 6.  Structural disorder throws new light on moonlighting.

Authors:  Peter Tompa; Csilla Szász; László Buday
Journal:  Trends Biochem Sci       Date:  2005-09       Impact factor: 13.807

Review 7.  Role of dystrophin and utrophin for assembly and function of the dystrophin glycoprotein complex in non-muscle tissue.

Authors:  T Haenggi; J-M Fritschy
Journal:  Cell Mol Life Sci       Date:  2006-07       Impact factor: 9.261

8.  Identification of alpha-syntrophin binding to syntrophin triplet, dystrophin, and utrophin.

Authors:  B Yang; D Jung; J A Rafael; J S Chamberlain; K P Campbell
Journal:  J Biol Chem       Date:  1995-03-10       Impact factor: 5.157

Review 9.  Intrinsically disordered proteins in human diseases: introducing the D2 concept.

Authors:  Vladimir N Uversky; Christopher J Oldfield; A Keith Dunker
Journal:  Annu Rev Biophys       Date:  2008       Impact factor: 12.981

10.  Microlesions and polymorphisms in the Duchenne/Becker muscular dystrophy gene.

Authors:  F Rininsland; J Reiss
Journal:  Hum Genet       Date:  1994-08       Impact factor: 4.132

View more
  4 in total

Review 1.  Multi-functionality of proteins involved in GPCR and G protein signaling: making sense of structure-function continuum with intrinsic disorder-based proteoforms.

Authors:  Alexander V Fonin; April L Darling; Irina M Kuznetsova; Konstantin K Turoverov; Vladimir N Uversky
Journal:  Cell Mol Life Sci       Date:  2019-08-19       Impact factor: 9.261

Review 2.  p53 Proteoforms and Intrinsic Disorder: An Illustration of the Protein Structure-Function Continuum Concept.

Authors:  Vladimir N Uversky
Journal:  Int J Mol Sci       Date:  2016-11-10       Impact factor: 5.923

Review 3.  Life in Phases: Intra- and Inter- Molecular Phase Transitions in Protein Solutions.

Authors:  Vladimir N Uversky; Alexei V Finkelstein
Journal:  Biomolecules       Date:  2019-12-08

Review 4.  New technologies to analyse protein function: an intrinsic disorder perspective.

Authors:  Vladimir N Uversky
Journal:  F1000Res       Date:  2020-02-10
  4 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.