Iga Korneta, Janusz M Bujnicki.
Abstract
The spliceosome is a molecular machine that excises introns from eukaryotic pre-mRNAs. In human cells, this macromolecular complex comprises five RNAs and over one hundred proteins. In recent years, many spliceosomal proteins have been found to exhibit intrinsic disorder, that is, to lack a stable native three-dimensional structure in solution. Building on the existing body of proteomic, structural and functional data, we have carried out a systematic bioinformatics analysis of intrinsic disorder in the proteome of the human spliceosome. We discovered that almost half of the combined sequence of proteins abundant in the spliceosome is predicted to be intrinsically disordered, at least when the individual proteins are considered in isolation. The distribution of intrinsic order and disorder throughout the spliceosome is uneven and is related to the various functions that intrinsically disordered regions of spliceosomal proteins perform in the complex. In particular, proteins involved in the secondary functions of the spliceosome, such as mRNA recognition, intron/exon definition, and spliceosomal assembly and dynamics, are more disordered than proteins directly involved in assisting splicing catalysis. Conserved disordered regions in spliceosomal proteins are evolutionarily younger and less widespread than ordered domains of essential spliceosomal proteins at the core of the spliceosome, suggesting that disordered regions were added to a preexistent ordered functional core. Finally, the spliceosomal proteome contains a much higher amount of intrinsic disorder predicted to lack secondary structure than the proteome of the ribosome, another large RNP machine. This result is consistent with the different functions currently attributed to proteins in these two complexes.
Year: 2012 PMID: 22912569 PMCID: PMC3415423 DOI: 10.1371/journal.pcbi.1002641
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Intrinsic disorder content of the various groups of core spliceosome proteins.
Values for all proteins of the snRNP subunits of the major spliceosome (“snRNP proteins, major spl.”) and for all proteins of the major spliceosome (“all proteins, major spl.”) are marked in deeper shades. The orange line indicates per-protein means (the disorder fraction was calculated for each protein first, and these fractions were then averaged), while the green line indicates per-residue means (the number of all disordered residues in a protein group divided by the total length of the proteins in the group). Per-residue means are indicated above the line. Spliceosome protein groups are ordered according to per-residue means.
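The two averaging schemes described in this caption can give quite different values when protein lengths vary. The following is a minimal sketch of both calculations; the function names and the toy group are illustrative assumptions and do not come from the original analysis.

```python
from typing import List, Tuple

# Each protein is represented as (number of disordered residues, total length).
ProteinCounts = Tuple[int, int]


def per_protein_mean(group: List[ProteinCounts]) -> float:
    """Average of the individual disorder fractions (one value per protein)."""
    fractions = [disordered / length for disordered, length in group]
    return sum(fractions) / len(fractions)


def per_residue_mean(group: List[ProteinCounts]) -> float:
    """All disordered residues in the group divided by the combined length."""
    total_disordered = sum(disordered for disordered, _ in group)
    total_length = sum(length for _, length in group)
    return total_disordered / total_length


# Hypothetical toy group: a short, mostly disordered protein
# and a long, mostly ordered one.
toy_group = [(90, 100), (100, 1000)]

print(f"per-protein mean: {per_protein_mean(toy_group):.3f}")
print(f"per-residue mean: {per_residue_mean(toy_group):.3f}")
```

In this toy group the per-protein mean is dominated by the short disordered protein, while the per-residue mean is dominated by the long ordered one, which is why the caption reports both.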
Figure 2. Types of disorder in core spliceosomal proteins.
Compositionally biased disorder (Y-axis) vs. disorder with SS (X-axis). Datapoints are colored according to predicted total per-residue disorder content. Groups of all proteins of the major spliceosome and all proteins of the snRNP subunits of the major spliceosome are indicated in bold.
Figure 3. Disorder in core vs. non-abundant spliceosome proteins.
Blue bars indicate values of intrinsic disorder content for core proteins, green bars for both core and additional spliceosome proteins. The blue and green lines indicate means for the given protein groups, calculated per-residue. Values for all core (blue) and all (green) proteins associated with the major spliceosome are shown in deeper shades.
Post-translational modifications in 252 spliceosome proteins.
| Modification | Structural order | Disorder with SS | RS-like | Poly-P/Q | hnRNP-like G-rich | Noncharged | Charged | Other disorder | Total | Percent |
| Phosphorylation (*) | 158 | 326 | 572 | 137 | 82 | 43 | 49 | 412 | 1779 | 82.6% |
| Lysine N-acetylation | 127 | 30 | 12 | 4 | 6 | 0 | 3 | 27 | 209 | 9.7% |
| Other N-acetylation (**) | 14 | 20 | 1 | 0 | 1 | 2 | 2 | 44 | 84 | 3.9% |
| Arginine methylations (***) | 5 | 2 | 13 | 4 | 42 | 2 | 0 | 6 | 74 | 3.4% |
| Lysine methylations (****) | 3 | 0 | 2 | 0 | 0 | 0 | 0 | 1 | 6 | 0.3% |
| Cysteine methyl ester | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.0% |
(*) S, T and Y phosphorylation.
(**) N-terminal acetylation of MGASTV.
(***) Includes the keywords “dimethylarginine”, “asymmetric dimethylarginine”, “omega-N-methylarginine”.
(****) Includes the keywords “N6-methyllysine”, “N6, N6-dimethyllysine”, “N6, N6, N6-trimethyllysine”.
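The "Percent" column of the table above is each modification's share of all annotated modification sites. As a minimal sketch of that arithmetic, using only the "Total" column (the dictionary and function names are illustrative assumptions):

```python
# Totals per modification type, copied from the "Total" column of the table above.
totals = {
    "Phosphorylation": 1779,
    "Lysine N-acetylation": 209,
    "Other N-acetylation": 84,
    "Arginine methylations": 74,
    "Lysine methylations": 6,
    "Cysteine methyl ester": 1,
}


def percent_share(counts: dict) -> dict:
    """Return each count as a percentage of the grand total, rounded to 0.1."""
    grand_total = sum(counts.values())
    return {name: round(100 * n / grand_total, 1) for name, n in counts.items()}


shares = percent_share(totals)
for name, share in shares.items():
    print(f"{name}: {share}%")
```

The same approach applies to individual columns, for example to compare how many phosphorylation sites fall in regions of structural order versus the disorder classes.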
Regions predicted to be disordered but found to be ordered in experimentally solved complexes of spliceosomal proteins.
| Region name | Type | Protein | Residues | Protein group | Partner (*) | Predicted ordered/disordered status in isolation | Structure | Reference |
| N-U1snRNP70_N | MoRF | U1-70K | 8–22 | U1 snRNP | U1-C (zf-U1) | disordered, next to ordered helix | 3CW1 | |
| C-U1snRNP70_N | short, RNA-binding | U1-70K | 63–89 | U1 snRNP | U1 snRNA | disordered | 3CW1 | |
| ULM (**) | MoRF | SF3b155 | 333–342 | U2, SF3B | SPF45 (UHM) | disordered | 2PEH | |
| ULM | MoRF | U2AF65 | 90–112 | U2 snRNP-related | U2AF35 (UHM) | disordered | 1JMT | |
| ULM | MoRF | SF1 | 13–25 | A-complex (***) | U2AF65 (UHM) | disordered | 1O0P | |
| SF3b1 | MoRF | SF3b155 | 377–415 | U2, SF3B | SF3b14a/p14 (RRM) | partially ordered | 2F9D | |
| SF3a60_bindingd | Domain-length | SF3a60 | 71–106 | U2, SF3A | SF3a120 (Surp) | partially ordered | 2DT7 | |
| PRP4 | Domain-length | U4/U6-60K | 107–137 | U4/U6 di-snRNP | U4/U6-20K | partially ordered | 1MZW | |
| PRP4 (****) | Domain-length | Prp18 | 77–115 | step 2 factors | | ordered | 2DK4 | |
| Btz | Domain-length | MLN51 | 169–196, 215–230 | EJC | EIF4A3 | disordered, next to ordered helix | 2J0S | |
(*) Domain names in brackets.
(**) ULMs correspond to the ELM motif LIG_ULM_U2AF65_1, defined by the pattern [KR]{1,4}[KR]-x{0,1}-[KR]W-x{0,1}.
(***) Non-abundant A-complex protein.
(****) The PRP4 region of Prp18 is ordered and its structure in isolation was solved. It is included in the table since the PRP4 region of U4/U6-60K is predicted to be partially disordered.
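The ULM pattern quoted in footnote (**) can be applied to a sequence as a regular expression. The sketch below is one possible translation into Python syntax, assuming that `x` means any residue and that the hyphens are only separators; the function name and the test sequences are invented for illustration and are not real protein sequences.

```python
import re
from typing import List, Tuple

# Assumed translation of [KR]{1,4}[KR]-x{0,1}-[KR]W-x{0,1}:
# "x" becomes "." (any residue) and the hyphen separators are dropped.
ULM_PATTERN = re.compile(r"[KR]{1,4}[KR].{0,1}[KR]W.{0,1}")


def find_ulm(sequence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, matched text) for each non-overlapping hit.

    Positions are 0-based and end-exclusive, as in Python slicing.
    """
    return [(m.start(), m.end(), m.group()) for m in ULM_PATTERN.finditer(sequence)]


# Invented toy sequences, used only to exercise the pattern.
print(find_ulm("AAKRKRWDD"))  # contains a basic stretch followed by W
print(find_ulm("AAKAWDD"))    # no run of basic residues before W
```

A hit from such a scan only indicates that the sequence motif is present; whether the region actually binds a UHM partner requires the structural or experimental evidence listed in the table.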
“Most highly disordered” proteins in the spliceosomal proteome.
| Abundance | Protein | Disorder fraction | PFAM domains | Group |
| Abundant | SPF30 | 80.3% | SMN | U2 snRNP-related |
| Abundant | U4/U6.U5-110K | 87.9% | SART-1 | U4/U6.U5 tri-snRNP |
| Abundant | U4/U6.U5-27K | 76.8% | DUF1777 | U4/U6.U5 tri-snRNP |
| Abundant | CCAP2 | 78.2% | Cwf_Cwc_15 | hPrp19/CDC5L |
| Abundant | TRAP150 | 100.0% | | A-complex |
| Abundant | MFAP1 | 79.3% | MFAP1_C | B-complex |
| Abundant | RED | 79.5% | RED_N, RED_C | B-complex |
| Abundant | MGC23918 | 100.0% | cwf18 | B-act complex |
| Abundant | HSPC220 | 84.8% | Hep_59 | C-complex |
| Abundant | GCIP p29 | 93.0% | SYF2 | C-complex |
| Non-abundant | U11/U12-59K | 91.1% | | U11/U12 |
| Non-abundant | Npw38BP | 93.8% | Wbp11 | hPrp19/CDC5L |
| Non-abundant | MLN51 | 100.0% | Btz | EJC |
| Non-abundant | pinin | 92.3% | Pinin_SDK_N, Pinin_SDK_memA | EJC |
| Non-abundant | MGC13125 | 93.5% | Bud13 | RES |
| Non-abundant | C19orf43 | 88.6% | | A-complex |
| Non-abundant | FLJ10154 | 100.0% | | A-complex |
| Non-abundant | CCDC55 | 100.0% | DUF2040 | B-complex |
| Non-abundant | CCDC49 | 100.0% | CWC25 | B-complex |
| Non-abundant | PRCC | 100.0% | PRCC_Cterm | B-act complex |
| Non-abundant | DGCR14 | 86.1% | Es2 | C-complex |
| Non-abundant | DKFZP586O0120 | 100.0% | DUF1754 | C-complex |
| Non-abundant | FLJ22626 | 100.0% | SynMuv_product | C-complex |
| Non-abundant | LENG1 | 100.0% | Cir_N | C-complex |
| Non-abundant | BCLAF1 | 100.0% | | pre-mRNA/mRNA-binding |
Entries in this table simultaneously fulfill two conditions: they have a predicted disorder content >75%, and they do not contain any PFAM domains that correspond to ordered structural domains.
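The two selection conditions can be expressed as a simple filter. The sketch below is illustrative only: the `Protein` record, the `ORDERED_PFAM` set, and the two `toy_*` entries are assumptions made for the example, and the classification of PFAM domains as ordered or disordered used in the original analysis is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Protein:
    name: str
    disorder_fraction: float  # predicted disorder, 0.0 to 1.0
    pfam_domains: List[str] = field(default_factory=list)


# Assumption: a stand-in list of PFAM domains treated as ordered structural domains.
ORDERED_PFAM: Set[str] = {"RRM_1"}


def most_highly_disordered(
    proteins: Iterable[Protein],
    ordered_domains: Set[str] = ORDERED_PFAM,
    threshold: float = 0.75,
) -> List[str]:
    """Keep proteins with disorder > threshold and no ordered PFAM domain."""
    return [
        p.name
        for p in proteins
        if p.disorder_fraction > threshold
        and not any(d in ordered_domains for d in p.pfam_domains)
    ]


candidates = [
    Protein("SPF30", 0.803, ["SMN"]),             # values from the table above
    Protein("TRAP150", 1.0),                      # no PFAM domain listed
    Protein("toy_rrm_protein", 0.90, ["RRM_1"]),  # hypothetical: has an ordered domain
    Protein("toy_ordered_protein", 0.30),         # hypothetical: below the threshold
]

print(most_highly_disordered(candidates))
```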
Statistics of conserved ordered and disordered PFAM domains.
| Domain set | Ordered domains, all proteins | Ordered domains, abundant proteins | Ordered domains, U4/U6.U5 tri-snRNP (*) | Disordered domains, all proteins | Disordered domains, abundant proteins | Disordered domains, U4/U6.U5 tri-snRNP |
| all domains | 124 | 86 | 29 | 46 | 24 | 5 |
| domains found in LECA | 121 | 86 | 29 | 36 | 22 | 5 |
| domains found in prokaryotes (**) | 47 (37.9%) | 34 (39.5%) | 19 (65.5%) | 1 (0.0%) | 0 (0.0%) | 0 (0.0%) |
(*) Including the LSM domain present in Sm and Lsm proteins.
(**) In >100 copies.
Features of intrinsic disorder in E. coli and human ribosomes and human major spliceosome snRNP subunits.
| Feature | Ribosome, E. coli | Ribosome, human | Major spliceosome, snRNP subunits, human |
| Number of proteins | 54 | 80 | 45 |
| Maximum protein length (aa) | 557 (S1) | 427 (L4) | 2335 (U5-220K/hPrp8) |
| Mean protein length (aa) | 132 | 170 | 453 |
| Fraction of predicted disorder (% of the combined lengths of proteins) | 37.7% | 47.0% | 34.1% |
| Number of proteins with at least one IDR ≥30 residues | 28 | 61 | 28 |
| Number of proteins with at least one IDR ≥70 residues | 1 | 19 | 23 |
| Mean IDR length (aa) | 28 | 39 | 93 |
| Fraction of predicted disordered residues with secondary structure (% predicted disorder) | 66.6% | 64.0% | 41.9% |
| Number of non-PSE IDRs ≥70 residues | 0 | 3 | 15 |
| Fraction of predicted disordered residues found in the crystal structure of the complex (% of predicted disorder) | 98.9% | — | <10% (U1 snRNP) |
| Minimal and maximal fractions of predicted disordered residues for individual subunits | 34.8% (small subunit) - 40.0% (large subunit) | 39.1% (small subunit) - 52.2% (large subunit) | 20.1% (U5 snRNP) - 65.5% (U1 snRNP) |
| Maximum RNA length (nt) | 2904 (23S) | 5070 (28S) | 188 (U2 snRNA)(*) |
| RNA fraction of total weight (% total weight) | 65.2% | 60.3% | 8.2% |
(*) Saccharomyces cerevisiae U1 snRNA is 570 nt long, while its U2 snRNA is 1172 nt long. Such exceptional lengths are restricted to the genus Saccharomyces.
Features of different IDR classes in the 130 spliceosomal proteins.
| IDR class | Description | Number of regions | Mean length | Compositional bias |
| disorder with SS | contains secondary structure | 95 (predicted to contain coiled coils), 115 (other types) | 64 aa (predicted to contain coiled coils), 55 aa (other types) | RKDE with additional MQW (predicted to contain coiled coils), no rule (other types) |
| compositionally biased, RS-like | biased towards arginine and serine residues | 35 | 65 aa | RS |
| compositionally biased, polyP/Q | noncharged with poly P/Q (P/Q(n), n≥3) repeats | 17 | 138 aa | PQMGVWA |
| compositionally biased, hnRNP G-rich | contains RGG and related repeats ([RSY]GG, R[AGT][AGTFIVR]) (*) | 4 (hnRNP proteins), 10 (other proteins) | 145 aa (hnRNP proteins), 56 aa (other proteins) | GRY |
| compositionally biased, noncharged | biased towards noncharged residues | 16 | 45 aa | PQMGVWA |
| compositionally biased, charged | biased towards charged residues | 9 | 57 aa | RKDE |
(*) [72]: XGG, where X is aromatic or long aliphatic; arginine methylation data: R[AGT][AGTFIVR].
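The repeat definitions in the table above are written as residue patterns, so they can be counted in a sequence with regular expressions. The following is a minimal sketch under stated assumptions: the pattern names are invented, the poly P/Q pattern assumes that "P/Q(n), n≥3" means a run of at least three identical P or Q residues, and the toy sequence is not a real protein. It counts motif occurrences only and does not reproduce the compositional-bias classification of the original analysis.

```python
import re
from typing import Dict

# Patterns transcribed from the table; the names are illustrative.
MOTIFS: Dict[str, "re.Pattern[str]"] = {
    "rgg_like": re.compile(r"[RSY]GG"),
    "arg_methylation_like": re.compile(r"R[AGT][AGTFIVR]"),
    # Assumption: homopolymeric runs of P or of Q, at least three residues long.
    "poly_pq": re.compile(r"P{3,}|Q{3,}"),
}


def count_motifs(sequence: str) -> Dict[str, int]:
    """Count non-overlapping occurrences of each motif in the sequence."""
    return {name: len(pattern.findall(sequence)) for name, pattern in MOTIFS.items()}


# Invented toy sequence used only to exercise the patterns.
toy_sequence = "RGGSGGYGGAAPPPQQAQQQQ"
print(count_motifs(toy_sequence))
```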