Hedi Hegyi, László Buday, Peter Tompa.
Abstract
Chromosomal translocations, which often generate chimeric proteins by fusing segments of two distinct genes, represent the single major genetic aberration leading to cancer. We suggest that the unifying theme of these events is a high level of intrinsic structural disorder, enabling fusion proteins to evade cellular surveillance mechanisms that eliminate misfolded proteins. Predictions in 406 translocation-related human proteins show that they are significantly enriched in disorder (43.3% vs. 20.7% in all human proteins), they have fewer Pfam domains, and their translocation breakpoints tend to avoid domain splitting. The vicinity of the breakpoint is significantly more disordered than the rest of these already highly disordered fusion proteins. In the unlikely event of domain splitting in fusion, it usually spares much of the domain or splits at locations where the newly exposed hydrophobic surface area approximates that of an intact domain. The mechanisms of action of fusion proteins suggest that in most cases their structural disorder is also essential to the acquired oncogenic function, enabling the long-range structural communication of remote binding and/or catalytic elements. In this respect, there are three major mechanisms that contribute to generating an oncogenic signal: (i) a phosphorylation site and a tyrosine-kinase domain are fused, and structural disorder of the intervening region enables intramolecular phosphorylation (e.g., BCR-ABL); (ii) a dimerisation domain fuses with a tyrosine kinase domain and disorder enables the two subunits within the homodimer to engage in permanent intermolecular phosphorylations (e.g., TFG-ALK); (iii) the fusion of a DNA-binding element to a transactivator domain results in an aberrant transcription factor that causes severe misregulation of transcription (e.g., EWS-ATF). Our findings also suggest novel strategies of intervention against the ensuing neoplastic transformations.
Year: 2009 PMID: 19888473 PMCID: PMC2768585 DOI: 10.1371/journal.pcbi.1000552
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Length of proteins and genes involved in translocation.
(A) Ratio of all human and translocating proteins as a function of protein length, shown in increments of 100 amino acids. The data were fitted with a power function. (B) The percentage distribution of the ratio of protein and gene length for translocating partner proteins (cyan) and all human proteins (magenta). Both functions show a linear tendency when represented on a logarithmic scale (as shown here), which is characteristic of a power-law function. The explicit trendline is also shown for both sets of proteins.
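The link between a power function and a straight line on a logarithmic scale can be illustrated with a minimal sketch. It assumes the fit takes the form y = a·x^b and estimates a and b by ordinary least squares on the log-transformed values. The function name `fit_power_law` and the data points are hypothetical and synthetic; they are not taken from the figure, and the caption does not state which fitting procedure was actually used.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = a * x**b by least squares on (log x, log y).

    A power law becomes a straight line in log-log space:
    log y = log a + b * log x, so b is the slope and log a the intercept.
    """
    pts = [(math.log(x), math.log(y)) for x, y in zip(xs, ys) if x > 0 and y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two points with positive x and y")
    mean_lx = sum(lx for lx, _ in pts) / n
    mean_ly = sum(ly for _, ly in pts) / n
    sxx = sum((lx - mean_lx) ** 2 for lx, _ in pts)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((lx - mean_lx) * (ly - mean_ly) for lx, ly in pts)
    b = sxy / sxx
    a = math.exp(mean_ly - b * mean_lx)
    return a, b

# Synthetic points lying exactly on y = 3 * x**-1.5 (illustration only).
lengths = [100, 200, 300, 400, 500]
ratios = [3 * x ** -1.5 for x in lengths]
a, b = fit_power_law(lengths, ratios)
print(f"a = {a:.3f}, b = {b:.3f}")
```

On real binned data the points scatter around the line, so the recovered exponent is only an estimate.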
Figure 2. Structural disorder in proteins involved in translocation.
Distribution of percent intrinsic disorder for proteins involved in chromosomal translocation with (255, blue diamonds) and without (151, orange triangles) a known breakpoint, and for all human proteins in Swissprot (18,609, magenta squares). Bin size is 10%. Each of the 406 translocating proteins represents a different gene; the longest known protein isoform was chosen for each. The gray dotted line shows the disorder distribution of 500 randomly selected sets of 255 human Swissprot proteins whose lengths match those of the 255 translocating proteins with breakpoints.
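A distribution of this kind could be computed along the following lines. This is a sketch under stated assumptions: a protein's percent disorder is taken as the share of residues whose per-residue score exceeds 0.5 (the cutoff quoted in the Figure 5 caption), and proteins are then counted in 10% bins. The function names and the toy scores are hypothetical; real scores would come from a disorder predictor such as IUPred.

```python
def percent_disorder(scores, threshold=0.5):
    """Percentage of residues whose per-residue disorder score exceeds threshold."""
    if not scores:
        raise ValueError("empty score list")
    return 100.0 * sum(s > threshold for s in scores) / len(scores)

def bin_distribution(percentages, bin_size=10):
    """Percentage of proteins falling into each disorder bin of width bin_size."""
    if not percentages:
        raise ValueError("no proteins given")
    n_bins = 100 // bin_size
    counts = [0] * n_bins
    for p in percentages:
        # A protein that is 100% disordered is placed in the last bin.
        idx = min(int(p // bin_size), n_bins - 1)
        counts[idx] += 1
    return [100.0 * c / len(percentages) for c in counts]

# Toy per-residue scores for three made-up proteins (not real predictions).
toy_scores = {
    "protA": [0.1, 0.2, 0.7, 0.9],   # 2 of 4 residues above 0.5
    "protB": [0.8, 0.9, 0.6, 0.7],   # fully disordered
    "protC": [0.1, 0.1, 0.2, 0.3],   # fully ordered
}
per_protein = [percent_disorder(s) for s in toy_scores.values()]
print(bin_distribution(per_protein))
```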
Figure 3. Structural disorder in translocation proteins and other proteins.
(A) Structural disorder in translocation proteins was predicted separately for the protein providing the N-terminal (blue) and C-terminal (magenta) segment of the fusion protein generated by the translocation event. Error bars represent SDs. (B) Mean disorder values for translocating proteins (tp), N-, and C-terminal segments (tp-N-seg, tp-C-seg) and fusion products (fusp) were predicted with IUPred. For reference, the mean disorder of proteins in the PDB database (pdb40, 40% non-redundant in sequence similarity), all human proteins in Swissprot (SwissProt) and experimentally determined disordered proteins/segments in DisProt (DisProt), are also shown. Error bars represent SDs.
Truncated domains in the fusion proteins.
| fusion protein | fplen | bp | Pfid | Dlen | Dbeg | Dend | N/C | Dfract | IUleft | IUright | Pfam Desc | Mode of Survival? |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | 1966 |  | PF10390 | 237 | 4 | 237 |  | 0.99 |  |  | RNA pol II ef | no PDB |
|  | 488 |  | PF04692 | 76 | 2 | 76 |  | 0.99 |  |  | Platelet gf | almost full domain |
|  | 553 |  | PF02312 | 171 | 1 | 165 |  | 0.96 |  |  | Core bind f α | almost full domain |
|  | 806 |  | PF00853 | 135 | 1 | 130 |  | 0.96 |  |  | Runt domain | almost full domain |
|  | 811 |  | PF00023 | 33 | 3 | 33 |  | 0.94 |  |  | Ankyrin | short repeats |
|  | 784 |  | PF00261 | 237 | 1 | 210 |  | 0.89 |  |  | Tropomyosin | coiled-coil |
|  | 390 |  | PF00076 | 70 | 1 | 60 |  | 0.86 |  |  | RNA rec motif | 2dgs_A termini IUP (11, 14 aa) |
|  | 298 |  | PF00715 | 144 | 1 | 123 |  | 0.85 |  |  | Interleukin 2 | artefact |
|  | 1653 |  | PF02198 | 87 | 14 | 87 |  | 0.85 |  |  | SAM domain | 1x66_A N-terminal 18 aa IUP |
|  | 422 |  | PF04621 | 333 | 61 | 333 |  | 0.82 |  |  | PEA3 ETS tf N | no PDB |
|  | 1462 |  | PF00270 | 171 | 1 | 133 |  | 0.78 |  |  | DEAD b h-ase |  |
|  | 2599 |  | PF00628 | 53 | 13 | 53 |  | 0.77 |  |  | PHD-finger |  |
|  | 825 |  | PF03792 | 200 | 56 | 200 |  | 0.73 |  |  | PBC domain | no PDB |
|  | 1212 |  | PF00637 | 140 | 1 | 101 |  | 0.72 |  |  | Clathrin rep | elongated coil of α-helices |
|  | 2308 |  | PF05110 | 1201 | 338 | 1201 |  | 0.72 |  |  | AF-4 oncoprot | no PDB |
|  | 1184 |  | PF05110 | 1205 | 341 | 1205 |  | 0.72 |  |  | AF-4 oncoprot | no PDB |
|  | 2207 |  | PF01576 | 858 | 1 | 605 |  | 0.71 |  |  | Myosin tail | no PDB |
|  | 1317 |  | PF09311 | 196 | 1 | 127 |  | 0.65 |  |  | Rab5 binding | coiled-coil |
|  | 563 |  | PF03066 | 199 | 1 | 121 |  | 0.61 |  |  | Nucleoplasmin | nucleophosmin 2p1b_H, 122 aa |
|  | 2089 |  | PF00092 | 174 | 1 | 100 |  | 0.57 |  |  | VWA |  |
|  | 488 |  | PF01391 | 60 | 1 | 34 |  | 0.57 |  |  | Collagen | coiled-coil, repeat |
|  | 622 |  | PF07714 | 261 |  |  |  | 0.55 |  |  | Tyr kinase |  |
|  | 1910 |  | PF00250 | 98 | 50 | 98 |  | 0.51 |  |  | Fork head |  |
|  | 1005 |  | PF00769 | 240 | 1 | 111 |  | 0.46 |  |  | Ezrin |  |
|  | 553 |  | PF01576 | 858 | 512 | 858 |  | 0.40 |  |  | Myosin tail | no PDB |
|  | 2599 |  | PF00069 | 288 | 1 | 114 |  | 0.40 |  |  | Protein kinase |  |
|  | 1571 |  | PF05337 | 268 | 181 | 268 |  | 0.33 |  |  | Mphag CSF1 | no PDB |
|  | 792 |  | PF01808 | 328 | 1 | 96 |  | 0.29 |  |  | AIC/IMPCHase | 1p4r_A 200(scop) |
|  | 1156 |  | PF04597 | 429 | 1 | 59 |  | 0.14 |  |  | Ribophorin I | no PDB |
|  | 1803 |  | PF07651 | 267 | 237 | 267 |  | 0.12 |  |  | ANTH domain |  |
Nontrivial cases of fusion proteins are shown in which the breakpoint falls into a Pfam domain. The abbreviated column identifiers are as follows: Pfid, Pfam identifier; fplen, fusion protein length; bp, breakpoint; Dlen, domain length; Dbeg, Dend, domain match beginning and end, respectively; N/C, the retained half of the truncated domain; Dfract, the retained fraction of the truncated domain; IUleft, IUright, the predicted disorder for the truncated domain and its “mirror” (same number of amino acids as in the truncated domain) on the opposite side of the breakpoint. In the IUleft/IUright columns the value for the truncated domain is italicized, whereas the disorder value for its “mirror” is shown in bold. The last column shows possible strategies by which the truncated domains may avoid elimination by the proteasomal degradation system [49]. “No PDB” indicates the lack of any PDB structure associated with the protein family in question, which, together with high predicted disorder values, raises the suspicion that the domain is intrinsically disordered. When a PDB code is shown with a list of numbers, the numbers indicate positions in the actual domains that are presumably indifferent to truncation, based on the exposed hydrophobic surface of the truncated domain (as shown in detail in Figure 4).
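The Dfract values in the table appear consistent with a simple ratio of the matched domain span to the full domain length. The sketch below assumes Dfract = (Dend − Dbeg + 1) / Dlen. This formula is inferred from the tabulated numbers and is not stated in the table note, and the function name `retained_fraction` is hypothetical.

```python
def retained_fraction(dlen, dbeg, dend):
    """Fraction of a domain model retained in the fusion protein.

    Assumes 1-based, inclusive match coordinates (dbeg..dend) on a
    domain model of length dlen.
    """
    if not (1 <= dbeg <= dend <= dlen):
        raise ValueError("expected 1 <= dbeg <= dend <= dlen")
    return (dend - dbeg + 1) / dlen

# (Pfid, Dlen, Dbeg, Dend, Dfract as listed in the table)
rows = [
    ("PF10390", 237, 4, 237, 0.99),
    ("PF02312", 171, 1, 165, 0.96),
    ("PF01576", 858, 512, 858, 0.40),
    ("PF07651", 267, 237, 267, 0.12),
]
for pfid, dlen, dbeg, dend, listed in rows:
    computed = round(retained_fraction(dlen, dbeg, dend), 2)
    print(pfid, computed, listed)
```

For these four rows the rounded result agrees with the listed Dfract; other rows may differ in the last digit depending on how the original values were rounded.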
Figure 4. Effect of truncation on the accessible hydrophobic surface of a kinase domain.
(A) Theoretical and actual values for the accessible nonpolar surface area (Anp) of a cyclin-dependent kinase. The C-terminus of the protein structure was gradually truncated and the actual values of Anp (filled magenta circles) for the truncated fragments were determined with the CHASA program [22]. They coincide with the theoretical values for an intact domain of the same size (continuous blue line) around residue 90. (B) Structure of the cyclin-dependent kinase (PDB code 1g3n, chain A). The C-terminal portion missing due to the translocation is colored grey.
Disorder in oncogenic function of fusion proteins.
| Fusion protein (breakpoint 1/breakpoint 2) | GenBank/Swissprot id (length) | Elements of oncogenic function in the fusion proteins | Distance/disorder between oncogenic elements | Reference |
|---|---|---|---|---|
|  | AAB60388 (1271) | Oligomerization domain (BCR, 1–79) | 562/355 |  |
|  | ABL1_HUMAN (1130) | Tyr kinase (ABL, 642–893) |  |  |
|  | NPM_HUMAN (294) | Oligomerisation domain (NPM, 1–117) | 60/2 |  |
|  | ALK_HUMAN (1620) | Tyr kinase (ALK, 176–443) |  |  |
|  | EMAL4_HUMAN (981) | Basic dimerisation domain (EML4, 31–140) | 415/89 |  |
|  |  | Tyr kinase (ALK, 555–822) |  |  |
|  | ALK_HUMAN (1620) | HELP/WD domains (EML4, 223–298, 299–327) | 228/12 |  |
|  |  | Tyr kinase (ALK, 555–822) |  |  |
|  | NP_003283.2 (2363) | Leu-zipper (TPR, 75–99, 117–141) | 147/26 |  |
|  | MET_HUMAN (1390) | Tyr-kinase (MET, 287–546) |  |  |
|  | NP_006061 (400) | Coiled-coil dimerisation domain (TFG, 93–124) | 175/138 |  |
|  | ALK_HUMAN (1620) | Tyr-kinase (ALK, 299–566) |  |  |
|  | NP_001978.1 (452) | Sam dimerisation domain (TEL, 38–124) | 558/203 |  |
|  | JAK2_HUMAN (1154) | Tyr-kinase (JAK2, 682–956) |  |  |
|  | NP_005234.1 (656) | EAD (EWS, 1–86) | 280/265 |  |
|  | NP_005162.1 (271) | Leu-zipper (ATF1, 366–425) |  |  |
|  | HRX_HUMAN (3969) | AT-hooks (MLL, 169–180, 217–227, 301–309) | 2403/1620 |  |
|  | NP_004371.2 (2442) | HAT domain (CBP, 2412–2649) |  |  |
|  | HRX_HUMAN (3969) | AT-hooks (MLL, 169–180, 217–227, 301–309) | 1436/1270 |  |
|  | NP_005925.2 (559) | Trans-activator helix (ENL, 1745–1829) |  |  |
We collected functional information for 9 fusion proteins, which suggests that disorder of the region between the newly joined functional elements contributes to oncogenic function. The table shows the distance between the two elements required for the oncogenic function and the length of disorder separating them.
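The two numbers in the distance/disorder column could be derived roughly as follows. This is a sketch only: it assumes the distance is the start of the second element minus the end of the first, and that disorder is counted as the number of intervening residues scoring above 0.5. The distance convention reproduces some tabulated values (for example 175 for the TFG coiled-coil ending at 124 and the ALK kinase starting at 299) but not all of them, so the authors' exact counting may differ. The function names and the score list are hypothetical.

```python
def element_distance(first_end, second_start):
    """Distance between two elements, as second_start - first_end (assumed convention)."""
    if second_start <= first_end:
        raise ValueError("second element must start after the first one ends")
    return second_start - first_end

def disordered_in_linker(scores, first_end, second_start, threshold=0.5):
    """Count residues strictly between the two elements that score above threshold.

    Positions are 1-based; scores[i - 1] is the score of residue i.
    """
    if not (1 <= first_end < second_start <= len(scores)):
        raise ValueError("element positions fall outside the sequence")
    linker = scores[first_end:second_start - 1]  # residues first_end+1 .. second_start-1
    return sum(s > threshold for s in linker)

# Distance for the TFG-ALK row: coiled-coil ends at 124, kinase starts at 299.
print(element_distance(124, 299))

# Toy 15-residue score profile (made up): ordered, disordered, ordered.
toy = [0.2] * 5 + [0.8] * 4 + [0.3] * 6
print(element_distance(5, 11), disordered_in_linker(toy, 5, 11))
```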
Figure 5. Predicted disorder and domain structure of select fusion proteins.
Disorder predicted by the IUPred algorithm and domain structure identified by Pfam are shown for BCR-ABL, TFG-ALK, and EWS-ATF1; values above 0.5 are considered disordered. The position of the breakpoint is marked by a vertical line in the disorder plot, whereas the elements critical for the oncogenic function are shown as colored rectangles (arrowhead for breakpoint) on the domain models below. The critical elements are connected by long segments of structural disorder.