Iga Korneta, Marcin Magnus, Janusz M Bujnicki.
Abstract
In this work, we describe the results of a comprehensive structural bioinformatics analysis of the spliceosomal proteome. We used fold recognition analysis to complement prior data on the ordered domains of 252 human splicing proteins. Examples of newly identified domains include a PWI domain in the U5 snRNP protein 200K (hBrr2, residues 258-338), while examples of previously known domains with a newly determined fold include the DUF1115 domain of the U4/U6 di-snRNP protein 90K (hPrp3, residues 540-683). We also established a non-redundant set of experimental models of spliceosomal proteins, as well as constructed in silico models for regions without an experimental structure. The combined set of structural models is available for download. Altogether, over 90% of the ordered regions of the spliceosomal proteome can be represented structurally with a high degree of confidence. We analyzed the reduced spliceosomal proteome of the intron-poor organism Giardia lamblia, and as a result, we proposed a candidate set of ordered structural regions necessary for a functional spliceosome. The results of this work will aid experimental and structural analyses of the spliceosomal proteins and complexes, and can serve as a starting point for multiscale modeling of the structure of the entire spliceosome.
Year: 2012 PMID: 22573172 PMCID: PMC3424538 DOI: 10.1093/nar/gks347
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Rules for selecting and producing structural representations of protein regions. From left to right, structural representations decrease in average confidence.
Statistics of structural domains detected in the human spliceosomal proteome
| Feature | Major spliceosome snRNP | All proteins |
|---|---|---|
| Number of proteins | 45 | 252 |
| Number of residues | 20 390 | 133 040 |
| Number of ordered residues | 13 427 | 63 242 |
| Number of ordered structural domains | 80 | 465 |
| Number of suspected ordered structural domains | 7 | 25 |
| Number of domains predicted to be disordered, but found to be ordered in experimentally determined structures | 3 | 9 |
| Fraction of ordered residues covered by ordered structural domains (%) | 89.6 | 90.3 |
| Fraction of total number of residues covered by ordered and disordered structural domains (%) | 61.0 | 43.4 |
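The counts in the table above also give the share of residues that lie in predicted ordered regions. The short Python sketch below derives it from the "Number of residues" and "Number of ordered residues" rows. The dictionary layout and the function name `ordered_fraction` are illustrative choices, not part of the original analysis, and the resulting percentages are derived values that the table itself does not report.

```python
# Residue counts copied from the table above.
# The data layout and names are illustrative assumptions.
TABLE = {
    "Major spliceosome snRNP": {"residues": 20_390, "ordered_residues": 13_427},
    "All proteins": {"residues": 133_040, "ordered_residues": 63_242},
}


def ordered_fraction(residues: int, ordered_residues: int) -> float:
    """Return the percentage of residues that are in ordered regions."""
    return 100.0 * ordered_residues / residues


fractions = {name: ordered_fraction(**row) for name, row in TABLE.items()}

for name, pct in fractions.items():
    print(f"{name}: {pct:.1f}% of residues are ordered")
```

Under these counts, roughly two-thirds of snRNP residues are ordered, against just under half for the full set of 252 proteins.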
Statistics of ordered structural domains of the human spliceosome according to the SCOP classification
| SCOP ID | Description | Number of domains |
|---|---|---|
| a | All α | 79 |
| b | All β | 83 |
| c | α and β (a/b) | 53 |
| d | α and β (a + b) | 159 |
| e | Multi-domain (α and β) | 1 |
| g | Small | 49 |
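As a quick reading aid for the SCOP table above, the following Python sketch totals the listed domains and computes each class's share. The variable names are illustrative, and the shares are computed only over the six classes listed here, not over all ordered domains in the proteome.

```python
# Domain counts per SCOP class, copied from the table above.
SCOP_COUNTS = {"a": 79, "b": 83, "c": 53, "d": 159, "e": 1, "g": 49}

# Total number of domains assigned to one of the listed classes.
total = sum(SCOP_COUNTS.values())

# Percentage share of each class among the listed domains.
shares = {cls: 100.0 * n / total for cls, n in SCOP_COUNTS.items()}

# The class with the most domains.
largest = max(SCOP_COUNTS, key=SCOP_COUNTS.get)

print(f"total classified domains: {total}")
for cls, pct in shares.items():
    print(f"class {cls}: {pct:.1f}%")
print(f"largest class: {largest}")
```

The α + β class (d) is the largest group, with 159 of the 424 listed domains.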
Common types of ordered structural domains in the human spliceosomal proteome
| Domain type | Example PFAM domains | Number of copies | Examples of proteins |
|---|---|---|---|
| Small RNA-binding domains | RRM_1 | ≥201 | U1-A, U1-70K, U1-C |
| Small protein disorder-binding domains | WW, FHA, FF, GYF, SMN, SH3_1 | ≥24 | FBP11, U5-52K (CD2BP2) |
| Repeat-based protein-binding domains | Arm, TPR/HAT, HEAT, LRR_4, WD40 repeats | ≥28 | U4/U6-60K (hPrp4), U5-102K (hPrp6), SF3b155, U2-A' |
| Ubiquitin-related domains | Ubiquitin, U-box, zf-UBP, UCH, Rtf2, zf-C3HC4, ZZ, DWNN, RWD, JAB + PROCT | ≥19 | SF3a120, U4/U6.U5-65K, RNF113A |
| Heat shock-related | DnaJ, HSP70, HSP20, CS | ≥6 | CCAP1 |
| Proline isomerase | Pro_isomerase | 8 | U4/U6-20K (PPIH) |
| Stable helicase architectures | DEAD + Helicase_C, DEAD + Helicase_C + HA2 + OB_NTP_bind, (DEAD + Helicase_C + Sec63) × 2, Upf1p-like | ≥19 | hPrp43 (DHX15), U5-200K (hBrr2), KIAA0560 (AQR) |
| Small domains that act as ligands | U1snRNP70_N, SF3b1, PRP4, SF3a60_bindingd | ≥6 | SF3b155, U4/U6-60K (hPrp4) |
| Sm/Lsm domains | LSM | 14 | Sm, Lsm proteins |
(a) Some RRM domains bind peptide ligands (66).
(b) The Surp domain is predicted to bind RNA. However, in the only structure of a Surp domain in complex (PDB ID: 2DT7), the Surp domain binds a peptide ligand.
(c) Some zf-C2H2 domains mediate protein binding.
Structural representations of regions of proteins of the human spliceosomal proteome
| Feature | Major spliceosome snRNP | All proteins |
|---|---|---|
| Number of proteins | 45 | 252 |
| Number of residues | 20 390 | 133 040 |
| Number of ordered residues | 13 427 | 63 242 |
| Number of non-redundant experimental models | 20 | 104 |
| Number of non-redundant X-ray models | 11 | 43 |
| Mean resolution of X-ray models (Å) | 2.20 | 2.08 |
| Number of non-redundant NMR models | 9 | 61 |
| Number of non-redundant theoretical models | 49 | 297 |
| Number of non-redundant comparative models | 37 | 255 |
| Number of non-redundant de novo models | 13 | 43 |
| Total number of non-redundant representations | 139 | 803 |
| Number of experimental models containing residues of more than one splicing protein (X-ray/NMR) | 9 (8/1) | 13 (11/2) |
| Total fraction of structural order covered (%) | 91.2 | 92.7 |
| Total fraction of combined protein sequence covered (%) | 64.3 | 48.7 |
Figure 2. Coverage of structural order and disorder with different types of structural models. The values displayed on the graph are the number of residues covered by a given type of structural model, followed by the percentage value.
Predicted quality of models of regions of human spliceosomal proteins
| Feature | X-ray | NMR | Comparative | De novo |
|---|---|---|---|---|
| | Mean (SD) | Mean (SD) | Mean (SD) | Mean (SD) |
| Number of models | 43 | 61 | 255 | 43 |
| Predicted RMSD (MetaMQAPII) | 1.90 (0.84) | 3.85 (1.82) | 4.53 (1.96) | 4.02 (1.50) |
| Predicted GDT_TS (MetaMQAPII) | 78.56 (12.78) | 55.94 (19.45) | 47.28 (21.35) | 45.59 (15.85) |
| QMEAN total score | 0.805 (0.087) | 0.744 (0.110) | 0.585 (0.164) | 0.562 (0.132) |
| QMEAN Z-score | 0.42 (0.87) | 0.08 (0.86) | −1.30 (1.43) | −1.42 (1.33) |
Figure 3. Models of regions of human splicing proteins divided by quality. This bubble graph displays the numbers of models of different types that belong to different classes of quality. Mean length_comp is the mean length of a comparative model of a given quality class.
Ubiquitin-related regions in the spliceosomal proteome
| Type of domain | SCOP ID | PFAM ID | Protein | Protein region | Protein group |
|---|---|---|---|---|---|
| Ubiquitin | d.15.1 | Ubiquitin | SF3a120 | 689,785 | U2 snRNP |
| | d.15.1 | Ubiquitin | U11/U12-25K (C16orf33) | 41,132 | U11/U12 di-snRNP |
| | d.15.1 | SAP18 | SAP18 | 18,140 | EJC |
| | d.15.1 | Ubiquitin | UBL5 | 1,73 | B complex |
| | d.15.1 | | FLJ35382 (C1orf55) | 7,74 | C complex |
| | d.15.1 | XAP5 | XAP-5 (FAM50A) | 197,283 | C complex |
| DWNN | d.15.2 | DWNN | RBQ-1 | 3,77 | Miscellaneous |
| RING zinc finger/U-box | g.44.1 | zf-UBP | U4/U6.U5-65K (USP39) | 97,200 | U4/U6.U5 tri-snRNP |
| | g.44.1 | U-box | hPRP19 | 1,60 | hPrp19/CDC5L |
| | g.44.1 | Rtf2 | Cyp-60 | 36,94 | B-act complex |
| | g.44.1 | Rtf2 | Cyp-60 | 101,161 | B-act complex |
| | g.44.1 | zf-C3HC4 | RNF113A | 256,319 | B-act complex |
| | g.44.1 | Rtf2 | NOSIP | 33,79 | C complex |
| | g.44.1 | Rtf2 | NOSIP | 217,286 | C complex |
| | g.44.1 | DUF572 (ZZ) | CCDC130 | 43,117 | C complex |
| | g.44.1 | U-box | RBQ-1 | 258,312 | Miscellaneous |
| UCH | d.3.1 | UCH | U4/U6.U5-65K (USP39) | 220,556 | U4/U6.U5 tri-snRNP |
| UBC-like (RWD) | d.20.1 | | THOC5 | 468,640 | TREX |
| JAB1/MPN | c.97.3 | JAB+PROCT | U5-220K (Prp8) | 2064,2335 | U5 snRNP |
(a) Abundant protein.
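The "Protein region" column in the table above gives each region as a pair of residue numbers separated by a comma. The Python sketch below turns such an entry into a region length. It assumes the two numbers are the first and last residue of the region and that both ends are inclusive; the source does not state this convention explicitly, and the helper names `parse_region` and `region_length` are hypothetical.

```python
from typing import Tuple


def parse_region(region: str) -> Tuple[int, int]:
    """Split a 'start,end' entry into integer residue positions.

    Assumption: the entry lists the first and last residue of the region.
    """
    start_text, end_text = region.split(",")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid region: {region!r}")
    return start, end


def region_length(region: str) -> int:
    """Number of residues in the region, counting both ends (assumed inclusive)."""
    start, end = parse_region(region)
    return end - start + 1


# Entries copied from the 'Protein region' column of the table above.
examples = {
    "SF3a120 (Ubiquitin)": "689,785",
    "U4/U6.U5-65K (UCH)": "220,556",
    "U5-220K (JAB+PROCT)": "2064,2335",
}

lengths = {name: region_length(region) for name, region in examples.items()}
for name, length in lengths.items():
    print(f"{name}: {length} residues")
```

Under the inclusive-ends assumption, the ubiquitin-like region of SF3a120 spans 97 residues and the UCH region of U4/U6.U5-65K spans 337.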
Figure 4. Ubiquitin-related structural regions of human splicing proteins. (A) Ubiquitin-fold region of protein FLJ35382 (C1orf55; residues 1–80). Predicted RMSD 3.5 Å, QMEAN Z-score −1.33. (B) RWD-like region of protein THOC5 (residues 458–641). Predicted RMSD 3.9 Å, QMEAN Z-score −1.85.
Figure 5. Architecture of the conserved middle region of protein SF3a120 (residues 217–530). (A) Alignment of the residues of a zinc-finger domain in the middle part of SF3a120 (residues 407–435). The ‘g.37.1’ annotation row displays residues predicted to form a part of a g.37.1 (zf-C2H2) zinc finger. The ‘jnetpred SF3a120’ annotation row displays predicted secondary structure elements of the human SF3a120 (ovals represent α-helices, while arrows represent β-strands). (B) Architecture of the middle region of SF3a120; disordered linkers denoted as ‘IDR linker’ (intrinsically disordered region-linker). (C) Model of the middle region.
Zinc-finger domains flanked by or embedded in predicted disordered regions
| PFAM domain | Protein | Protein group | Region | SCOP superfamily ID | PFAM domain of template | SCOP description | Confidence | Region-superfamily similarity |
|---|---|---|---|---|---|---|---|---|
| PRP21_like_P | SF3a120 | U2 snRNP SF3A | 406,435 | g.37.1 | zf-U11-48K | β–β–α zinc fingers | High | High |
| LUC7 | LUC7B1 | A complex | 30,74 | g.66.1 | zf-CCCH | CCCH zinc finger | High | High |
| LUC7 | LUC7B1 | A complex | 186,232 | g.37.1 | zf-C2H2_jaz | β–β–α zinc fingers | High | High |
| DUF572 | CCDC130 | C complex | 43,117 | g.44.1 | ZZ | RING/U-box | High | High |
| Rtf2 | NOSIP | C complex | 33,79 | g.44.1 | RING | RING/U-box | High | High |
| Rtf2 | NOSIP | C complex | 217,286 | g.44.1 | zf-C3HC4 | RING/U-box | High | High |
| Fra10Ac1 | Fra10Ac1 | C complex | 166,220 | d.325.1 | Ribosomal_L28 | L28p-like | Low | Low |
| ARS2 | ASR2B | pre-mRNA/mRNA-binding | 714,738 | g.37.1 | zf-C2H2 | β–β–α zinc fingers | High | High |
(a) Abundant protein.
(b) Alternative templates: FYVE, fn1.
Figure 6. BLUF-like region of protein U4/U6-90K (hPrp3) (domain DUF1115, residues 540–683). The position of the conserved residue W604 is displayed. Predicted RMSD 3.7 Å, QMEAN Z-score −3.06.
Figure 7. PWI-like regions of splicing helicases. (A) hPrp22 (DHX8; residues 1–120 shown, but domain may end at residue 92). Predicted RMSD 2.4 Å, QMEAN Z-score −2.76. (B) hPrp2 (DHX16; residues 1–95). Predicted RMSD 5.8 Å, QMEAN Z-score −2.19. (C) U5-200K (hBrr2; residues 259–338). Predicted RMSD 3.8 Å, QMEAN Z-score −0.79.
Figure 8. The PWI domain and PWI-like regions in splicing helicases. In all alignments, the ‘PWI’ annotation row displays the residues of the PWI motif conserved in a given protein. The ‘jnetpred (…)’ annotation row displays secondary structure elements predicted in the relevant human proteins (ovals represent α-helices, while arrows represent β-strands). Vertical lines indicate hidden columns (inserted residues present in only one or two sequences in the alignment). (A) Alignment of a ‘canonical’ PWI domain from protein SRm160. The ‘PDB ID: 1mp1’ annotation row displays the actual secondary structure elements found in the structure of the PWI domain of the human protein SRm160. (B) PWI-like region from protein hPrp22 (DHX8). The ‘disorder’ annotation row displays the position of a disordered region in the hPrp22 protein. (C) PWI-like region from protein hPrp2 (DHX16). (D) PWI-like region from protein U5-200K (hBrr2).
Figure 9. Other previously uncharacterized structural regions of the spliceosomal proteome. (A) The C-terminus of protein KIAA0560 (AQR), structurally similar to protein Upf1p (residues 453–1485). RMSD 3.3 Å, QMEAN Z-score −4.97. (B) Dsrm-like region of protein TFIP11 (residues 701–838). Predicted RMSD 4.5 Å, QMEAN Z-score −2.28. (C) The G-patch domain of LUCA15 (residues 741–815). Predicted RMSD 3.0 Å, QMEAN Z-score −1.22. (D) HTH-like region of protein hnRNP R (residues 23–92). Predicted RMSD 1.3 Å, QMEAN Z-score 0.12.
New types of predicted structural regions in the human spliceosomal proteome that can be classified into known superfamilies
| PFAM domain | Protein | Protein group | Region | SCOP superfamily ID | PFAM domain of template | SCOP description | Confidence | Region-superfamily similarity |
|---|---|---|---|---|---|---|---|---|
| | KIAA0560 (A) | hPrp19/CDC5L-related | 1,452 | a.118.1 | Arm repeats | ARM repeat | Medium | Medium |
| | KIAA0560 (A) | hPrp19/CDC5L-related | 453,1348 | | Upf1p | | High | High |
| | TFIP11 | B-complex | 771,837 | d.50.1 | dsrm | dsRNA-binding domain-like | Medium | High |
| G-patch | LUCA15 (A) | A-complex | 741,815 | d.50.1 | dsrm | dsRNA-binding domain-like | Medium | High |
| | hnRNP R | hnRNP | 28,92 | a.4.14 | KorB (clan HTH) | KorB DNA-binding domain-like | Medium | High |
| DUF2414 | ELG | pre-mRNA/mRNA-binding | 124,182 | d.58.7 | RNA_bind | RNA-binding domain, RBD | High | High |
| DUF1604 | Q9BRR8 | C-complex | 28,53 | b.34.2 | SH3_1 | SH3-domain | High | High |
| CTK3 | SR140 | U2 snRNP-related | 534,680 | a.118.9 | DUF618 | ENTH/VHS domain | High | High |
| Slu7 | hSlu7 (A) | step 2 factors | 424,457 | | BTK motif | | Low | High |
| PRP38 | hPrp38 (A) | B-complex | 26,206 | a.96.1 | HhH-GPD | DNA-glycosylase | Low | Medium |
| | TRAP150 (A) | A-complex | 861,934 | | Btz | | High | High |
| | BCLAF1 | pre-mRNA/mRNA-binding | 827,899 | | Btz | | High | High |
| DZF | NFAR | A-complex | 82,177 | d.218.1 | NTP_transf_2 | Nucleotidyl transferase | High | High |
| DZF | NFAR | A-complex | 194,325 | a.160.1 | OAS1_C | PAP/OAS1 substrate-binding domain | High | High |
(a) Protein.
(b) Highly scored alternative template TcpQ (bacterial).
(c) De novo model, highly scored, structural similarity only (1DI2_B).
(d) De novo model, highly scored, structural similarity only (1R71_A).
(e) Short; BTK motif always found C-terminal to PH domains, which is not found in Slu7.
(f) Alternative templates: HTH motifs.
(g) Predicted disordered region.
(h) DZF is a member of clan NTP_transf.
Human spliceosomal proteins with potential G. lamblia homologs, and these potential homologs
| Protein group | Human protein | GI of G. lamblia homolog | Human protein architecture | G. lamblia homolog architecture |
|---|---|---|---|---|
| Sm | Sm-B/B′ | 159117899 | LSM + | LSM |
| Sm | Sm-D1 | 159116502 | LSM + | LSM |
| Sm | Sm-D2 | 159111944 | LSM | LSM |
| Sm | Sm-D3 | 159107430 | LSM + | LSM |
| Sm | Sm-E | 159110758 | LSM | LSM |
| Sm | Sm-F | 159114826 | LSM | LSM |
| Lsm | Lsm2 | 159109501 | LSM | LSM |
| Lsm | Lsm3 | 159118879 | LSM | LSM |
| Lsm | Lsm4 | 159110729 | LSM + | LSM |
| U1 snRNP/U2 snRNP | U1-A/U2-B″ | 253745584 | (RRM_1) × 2 | RRM_1 |
| U1 snRNP | U1-C | 308158556 | zf-U1 + | zf-U1 |
| U2 snRNP | U2-A′ | 159115402 | (LRR_4) × 2 | (LRR_4) × 2 |
| U2 snRNP | SF3a66 | 159112716 | PRP4 + zf-met + b.15.1 | zf-met + b.15.1 |
| U2 snRNP | SF3a60 | 159115731 | SF3a60_bindingd + SAP + g.37.1 + g.37.1 | zf-met (g.37.1) + g.37.1 |
| U2 snRNP | SF3b155 | 253747536 | a.118.1 repeats | |
| U2 snRNP | SF3b145 | 159118535 | SAP + | DUF382 + PSP |
| U2 snRNP | SF3b130 | 308162520 | WD40 repeats + CPSF_A | CPSF_A |
| U2 snRNP | SF3b49 | 159117358 | (RRM_1) × 2 + | (RRM_1) × 2 |
| U2 snRNP | PHF5A | 159114698 | PHF5 | PHF5 |
| U2 snRNP-related | U2AF35 | 159112951 | zf-CCCH + RRM_1 + zf-CCCH + | zf-CCCH + RRM_1 + zf-CCCH |
| U4/U6 di-snRNP | NHP2L1 | 159112698 | Ribosomal_L7Ae | Ribosomal_L7Ae |
| U4/U6 di-snRNP | NHP2L1 | 159111753 | Ribosomal_L7Ae | Ribosomal_L7Ae |
| U5 snRNP | U5-15K | 159116909 | DIM1 | DIM1 |
| U5 snRNP | U5-200K | 159109491 | a.188.1 + (DEAD + Helicase_C + Sec63) × 2 | DEAD + Helicase_C + Sec63 |
| U5 snRNP | U5-220K | 159109144 | PRO8NT + PROCN + RRM_4 + U5_2-snRNA_bdg + U6-snRNA_bdg + PRP8_domainIV + c.97.3 (JAB + PROCT) | PRO8NT + PROCN + RRM_4 + U5_2-snRNA_bdg + U6-snRNA_bdg + PRP8_domainIV + d.15.3 |
| U2 snRNP-related | hPrp43 (DHX15) | | | |
| B-act complex | hPrp2 (DHX16) | | a.188.1 + DEAD + Helicase_C + HA2 + OB_NTP_bind | |
| step 2 factors | hPrp22 (DHX8) | | a.188.1 + | |
| step 2 factors | hPrp16 (DHX38) | | | |
| | | 159108899 | | ATP11 + DEAD + Helicase_C + HA2 |
| | | 159113861 | | DEAD + Helicase_C + HA2 + OB_NTP_bind |
| | | 159117264 | | DEAD + Helicase_C + HA2 |
| B complex | hPrp38A | 159116389 | PRP38 + | PRP38 |
| B-act complex | RNF113A | 159114937 | zf-CCCH + zf-C3HC4 | zf-CCCH |
| hPrp19/CDC5L | CCAP2 | 159115167 | Cwf_Cwc_15 | |
| EJC | EIF4A3 | 159117719 | DEAD + Helicase_C | DEAD + Helicase_C |
Only abundant human splicing proteins with homologs in G. lamblia are shown. Predicted disordered regions with an independent function are included in italics. Ordered structural regions are usually described with their PFAM domains; SCOP IDs are used if the structural region does not correspond to a PFAM domain.
(a) Only in G. lamblia P15.
(b) SAP domain insertion is limited to animals and plants.
(c) Similarity to human SF3b155 only in C-terminal region (human SF3b155: 998–1304).
(d) Only in G. lamblia P15; WD40 repeat-like domain may be found via FR.
(e) May not participate in splicing (other possible human homologs: ribosomal protein L7, 15.5K).
(f) Ubiquitin-like fold (d.15) found in protein instead of c.97.3 domain.
(g) The human splicing helicases hPrp43, hPrp2, hPrp22 and hPrp16 and potential G. lamblia homologs cannot be unequivocally assigned to one another.
(h) OB_NTP_bind found via FR.
(i) May not participate in splicing (other possible human homolog: initiation factor EIF4A).