Hedi Hegyi, Lajos Kalmar, Tamas Horvath, Peter Tompa.
Abstract
According to current estimates, ∼95% of multi-exonic human protein-coding genes undergo alternative splicing (AS). However, of the 4000 human proteins in PDB, only 14 have structures of at least two alternative isoforms. Surveying these structural isoforms revealed that the maximum insertion accommodated by an isoform of a fully ordered protein domain was 5 amino acids; other instances of domain changes involved intrinsic structural disorder. After collecting 505 minor isoforms of human proteins with evidence for their existence, we analyzed their length, protein disorder and exposed hydrophobic surface. We found that strict rules govern the selection of alternative splice variants, aimed at preserving the integrity of globular domains: alternative splice sites (i) tend to avoid globular domains, or (ii) affect them only marginally, or (iii) tend to coincide with a location where the exposed hydrophobic surface is minimal, or (iv) fall where the protein is disordered. We also observed an inverse correlation between the domain fraction lost and the full length of the minor isoform containing the domain, possibly indicating a buffering effect of the isoform protein that counteracts the domain truncation. These observations provide the basis for a method (currently under development) to predict the viability of splice variants.
Year: 2010 PMID: 20972208 PMCID: PMC3045584 DOI: 10.1093/nar/gkq843
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Human alternatively spliced proteins in Swissprot with at least two isoforms in PDB (FGFR2_HUMAN has three isoforms in PDB)
| Swissprot ID | Isoform numbers | PDB ID1 | PDB ID2 | Alt splic type | Difference | Description |
|---|---|---|---|---|---|---|
| CHKA_HUMAN | Isoforms 1,2 | 3f2r_A | 2cko_A | Insertion* | 18 | Longer isoform 3f2r has an 18-amino-acid insertion that is disordered. In 2cko, 7 residues around the site of insertion are also disordered |
| CRK_HUMAN | Isoforms 1,2 | 2eyz_A | 2eyy_A | C-term + domain | 96 | Longer isoform 2eyz has three SH3 domains whereas shorter 2eyy has only two. The shorter version is oncogenic |
| CSF3_HUMAN | Isoforms 1,2 | 1gnc_A | 1pgr_A | Insertion* | 3 | Longer isoform 1gnc_A has a 3 amino acid insertion, missing from structure (i.e. disordered) |
| EDA_HUMAN | Isoforms 1,3 | 1rj7_A | 1rj8_A | Insertion | 2 | 1rj7_A has a 2-amino-acid insertion compared to 1rj8_A, structure determined. Distinct receptor specificities |
| FGFR2_HUMAN | Isoforms 19,14 | 1nun_B | 3dar_A | C-term + domain | 120 | 1nun_B consists of two Ig-like domains, 3dar_A has only one |
| FGFR2_HUMAN | Isoforms 1,19 | 1djs_A | 1nun_B | Insertion | 2 | 1djs_A has a 2-amino-acid insertion compared to 1nun_B, both structured |
| GHR_HUMAN | Isoforms 1,4 | 1axi_B | 2aew_A | Insertion* | 27 | 1axi_B is 27 amino acids longer at the N-terminus, disordered |
| GNAS2_HUMAN | Isoforms 1,2 | 1azs_C | 1cul_C | Insertion* | 14 | 1azs_C has a 14-amino-acid insertion compared to 1cul_C, disordered; 1cul_C has 5 disordered amino acids around the site of insertion |
| KHK_HUMAN | Isoforms 1,2 | 2hqq_A | 3b3l_A | Alt exon | 44 | Same length substitution (alternative exons), both structures are fully ordered |
| MK08_HUMAN | Isoforms 1,3 | 3elj_A | 1ukh_A | Alt exon | 12 | Same length substitution (alternative exons), both structures are fully ordered |
| NRX1A_HUMAN | Isoforms 1,2 | 2r1b_A | 3bod_A | Insertion* | 30 | 2r1b_A has a 30-amino-acid insertion, half of which is ordered and forms a long, protruding helix |
| PTN13_HUMAN | Isoforms 1,4 | 1q7x_A | 3pdz_A | Insertion | 5 | 1q7x_A has a 5-amino-acid insertion, fully ordered; they have a different affinity to tumor suppressor protein APC |
| RAC1_HUMAN | Isoforms 1,2 | 1ryf_A | 1i4t_D | Insertion* | 19 | 1ryf_A has a 19-amino-acid insertion, disordered, as are the preceding 15 amino acids; the insertion induces a conformational change, PMID:14625275 |
| ST2B1_HUMAN | Isoforms 1,2 | 1q1z_A | 1q1q_A | Alt N-term | 10 | 1q1q_A and 1q1z_A both form helical structures in their N-termini upon pregnenolone binding, have different specificity |
| UAP1_HUMAN | Isoforms 1,2 | 1jvd_A | 1jv1_A | Insertion* | 17 | 1jvd_A has a 17-amino-acid insertion, fully disordered. 1jv1_A is structured at the place of insertion |
The isoform numbers and the PDB identifiers representing the two isoforms are listed. The type of AS and the number of residues by which the two PDB structures differ from each other (Difference) are indicated. An asterisk next to the AS type indicates that intrinsic disorder was also observed in one or both isoforms.
Figure 5. CHASA-diff analysis of 1c47 phosphoglucomutase-1 with SCOP domain boundaries indicated by vertical lines. The domain boundaries separating the four domains in the PDB structure coincide with minima in CHASA-diff, indicating relatively small hydrophobic surface areas.
Figure 1. ‘Retained portion’ of the truncated domains (i.e. the remaining part divided by the full length of the domain) versus their ‘relative length’ with respect to the containing minor isoform (i.e. the remaining part of the domain divided by the full length of the protein). (A) Each truncated domain is indicated by a dot, shown for the ‘verified’ group. (B) Same data as in (A), with the population of the four quadrants indicated as percentages. (C) Data for the ‘named’ group, same representation as in (B). (D) Data for the total number of alternative splice variants in Swissprot.
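The two ratios plotted in Figure 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the 0.5 cut used to split the plot into quadrants are assumptions, since the legend does not state where the quadrant boundaries lie. The example numbers are taken from the BIRC5_HUMAN row of the truncated-domain table (SCOP domain residues 1 to 70 of a 111-residue domain, isoform length 137).

```python
from collections import Counter

def domain_ratios(remaining_len, domain_len, isoform_len):
    """Return (retained portion, relative length) for one truncated domain.

    retained portion = remaining part / full length of the domain
    relative length  = remaining part / full length of the minor isoform
    """
    retained = remaining_len / domain_len
    relative = remaining_len / isoform_len
    return retained, relative

def quadrant_percentages(points, cut=0.5):
    """Percentage of points in each quadrant of the (retained, relative) plane.

    `cut` is a hypothetical threshold; keys are (retained level, relative level).
    """
    counts = Counter()
    for retained, relative in points:
        key = ("high" if retained >= cut else "low",
               "high" if relative >= cut else "low")
        counts[key] += 1
    total = len(points)
    return {key: 100.0 * n / total for key, n in counts.items()}

# BIRC5_HUMAN isoform 3: 70 residues remain of a 111-residue domain
# in a 137-residue isoform.
retained, relative = domain_ratios(70, 111, 137)
print(round(retained, 2), round(relative, 2))
```

Under this reading, the second ratio is the same quantity as the TrDom/Swlen column of the truncated-domain table.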
Figure 2. Percentage distribution of relative domain size of domains truncated by AS in the ‘named’/all Swissprot/randomized Swissprot sets. Bin size increment is 0.1.
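A percentage distribution with a bin increment of 0.1, as described for Figure 2, could be computed along these lines. The function name and the handling of the value 1.0 (placed in the last bin) are assumptions made for illustration; the legend does not specify how bin edges were treated.

```python
def percentage_histogram(values, n_bins=10):
    """Percentage of values in each of n_bins equal-width bins over [0, 1].

    With n_bins=10 the bin width is 0.1. A value of exactly 1.0 is assigned
    to the last bin (an assumed convention).
    """
    counts = [0] * n_bins
    for v in values:
        idx = min(int(v * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [100.0 * c / total for c in counts]

# Hypothetical relative domain sizes, for illustration only.
print(percentage_histogram([0.05, 0.12, 0.15, 1.0]))
```

Running the same function on the named, full and randomized sets would give three directly comparable distributions.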
Figure 3. Frequency distribution of percentage disorder in protein regions deleted (A), substituted (B) or disrupted by an insertion (C) by AS in the ‘named’ group/all of Swissprot/randomized Swissprot. The three groups were significantly different from one another, as established by χ²-tests (P < 0.05), for deletions (A) and substitutions (B) but not for insertions (C), owing to the small sample size of the named group. For further details see ‘Results’ section.
Figure 4. Structural isoforms of alternatively spliced human proteins. (A) Structural alignment of the major and minor isoform in the PDZ2 domain of the protein tyrosine phosphatase PTN13_HUMAN. The two splice variants have a different affinity to the tumor suppressor protein APC (35). Isoform 4 has a 5-residue-long insertion (colored bright red) compared to the main isoform. This was the longest insertion in a splice variant in PDB where both variants are ordered at the alternative splice site. (B) The structural alignment of the two isoforms of the ketohexokinase KHK_HUMAN (PDB codes: 2hqqA, 3b3lA). The minor isoform contains a 44-residue substitution, which represents a paralogous local exon duplication (indicated with solid blue and red colors in 2hqqA and 3b3lA, respectively). AS of the KHK gene selects either one or the other of two adjacent, homologous 135-bp exons.
Truncated PDB chain/SCOP domain matches with a human Swissprot alternative splice variant and acceptable truncation points based on CHASA-diff analysis
| Swissprot splice variant | PDB chain | PDB beg | PDB end | Sw beg | Sw end | Percentage ID | len PDB | len Sw | Potential breakpoints | Chasa | TrDom/Swlen | SCOP domain | SCOP domain beg | SCOP domain end | SCOP domain len |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| O15392-3\|BIRC5_HUMAN | 1xoxA | 1 | 76 | 1 | 76 | 96 | 117 | 137 | 107,117, | + | 0.51 | d1xoxa1 | 1 | 70 | 111 |
| O60869-2\|EDF1_HUMAN | 1x57A | 8 | 66 | 71 | 129 | 100 | 91 | 139 | | + | 0.42 | d1x57a1 | 1 | 59 | 78 |
| P06733-2\|ENOA_HUMAN | 1pdyA | 94 | 433 | 1 | 338 | 75 | 434 | 341 | 433,233, | – | 0.12 | d1pdya2 | 93 | 137 | 139 |
| P37231-2\|PPARG_HUMAN | 1hraA | 8 | 79 | 109 | 179 | 63 | 80 | 475 | 80,30, | + | 0.15 | d1hraa_ | 8 | 79 | 80 |
| P51991-2\|ROA3_HUMAN | 1x4bA | 22 | 109 | 7 | 94 | 76 | 116 | 356 | 110,100, | + | 0.25 | d1x4ba1 | 15 | 102 | 103 |
| P55957-4\|BID_HUMAN | 2bidA | 99 | 197 | 1 | 99 | 100 | 197 | 99 | 197, | + | 1.00 | d2bida_ | 99 | 197 | 197 |
| P62993-2\|GRB2_HUMAN | 1gbrA | 10 | 68 | 1 | 59 | 100 | 74 | 176 | | + | 0.34 | d1gbra_ | 10 | 68 | 74 |
| P62993-2\|GRB2_HUMAN | 1fhsA | 50 | 112 | 60 | 122 | 100 | 112 | 176 | 102,112, | + | 0.36 | d1fhsa_ | 50 | 112 | 112 |
| Q01167-2\|FOXK2_HUMAN | 1d5vA | 4 | 84 | 258 | 338 | 61 | 94 | 614 | 94,89, | + | 0.13 | d1d5va_ | 4 | 84 | 94 |
| Q07820-2\|MCL1_HUMAN | 1wsxA | 6 | 65 | 171 | 230 | 85 | 162 | 271 | 152,162, | + | 0.22 | d1wsxa_ | 6 | 65 | 162 |
| Q9NR12-2\|PDLI7_HUMAN | 1wf7A | 14 | 100 | 11 | 91 | 62 | 103 | 423 | 93,98,103, | + | 0.21 | d1wf7a_ | 14 | 100 | 103 |
| Q9NR12-3\|PDLI7_HUMAN | 1wf7A | 14 | 100 | 11 | 91 | 62 | 103 | 153 | 93,98,103, | + | 0.57 | d1wf7a_ | 14 | 100 | 103 |
The table contains the following columns: Swissprot splice variant, accession number and identifier of splice variant; PDB chain, name and chain of matched PDB entry; PDB beg, PDB end, Sw beg, Sw end: beginning and end of a Blastp match between the sequences of the PDB chain and splice variant, respectively; Percentage ID, proportion of matching residues; len PDB, length of PDB chain in residues; len Sw, length of Swissprot splice variant; Potential breakpoints, positions in the PDB chain where a truncation would be permitted by CHASA-diff, based on the exposed hydrophobicity value differences from an ideal value for an intact globular domain of that size (see text for more details), bold residue number is the closest to the actual truncation in the splice variant in question; Chasa, + if there is an acceptable minimum in the vicinity of the breakpoint (within 13 residues in the sequence) as determined by CHASA-diff, – otherwise; TrDom/Swlen, truncated domain length divided by the total length of the Swissprot splice variant; SCOP domain, matching SCOP domain identifier; SCOP domain beg, end and len, beginning, end of match and length of SCOP domain, respectively.
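The rule behind the Chasa column can be sketched as a simple distance check. This is an illustrative reading of the legend, not the authors' implementation: the function name is hypothetical, the comparison is assumed to be inclusive (a distance of exactly 13 residues counts as acceptable), and an ASCII hyphen stands in for the minus sign used in the table. The two examples come from the Figure 6 legend: for 2bidA a truncation is permitted at residue 87 and the actual splice is at residue 99; for 1pdyA the listed breakpoints (433, 233) are far from the truncation at residue 94.

```python
def chasa_flag(permitted_breakpoints, truncation_pos, window=13):
    """Return "+" if any CHASA-diff-permitted breakpoint lies within
    `window` residues of the actual truncation position, "-" otherwise.

    The inclusive comparison (<=) is an assumption.
    """
    near = any(abs(bp - truncation_pos) <= window
               for bp in permitted_breakpoints)
    return "+" if near else "-"

# 2bidA: permitted at 87, actual splice at 99 (12 residues apart).
print(chasa_flag([87], 99))
# 1pdyA: listed breakpoints 433 and 233, truncation at residue 94.
print(chasa_flag([433, 233], 94))
```

A row with no listed breakpoints would return "-" under this sketch, so it cannot by itself reproduce rows whose breakpoint cell is empty but which are flagged "+".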
Figure 6. Selected truncated domains from Table 2, PDB structures and CHASA-diff hydrophobic values. The structural portions shown in gray are removed by AS. In the CHASA-diff graphs the position of truncation is indicated by a vertical line. (A) 1x57A. AS removes residues 67–91, which are disordered, so no hydrophobicity issues arise in this region. CHASA-diff usually predicts negative values for disordered segments. (B) 1pdyA. This was the only match between a verified splice variant and a PDB structure (with 75% sequence identity between isoform 2 of ENOA_HUMAN and 1pdyA) where the calculated exposed hydrophobic surface (CHASA difference between the actual and theoretical values) at the position of truncation at residue 94 was ‘prohibitively’ high. Interestingly, Pfam predicts a coiled-coil structure for this region over a 20-residue stretch. (C) 2bidA. Our method allows a truncation at residue 87; the actual splice is at residue 99, which is within the accepted vicinity, set at 13 residues.