Matteo Floris, Massimiliano Orsini, Thangavel Alphonse Thanaraj.
Abstract
BACKGROUND: Mammalian genes are often alternatively spliced; the resulting alternative transcripts often encode protein isoforms that differ in amino acid sequence. Changes among the protein isoforms can alter the cellular properties of proteins. The effect can range from a subtle modulation to a complete loss of function.
Year: 2008 PMID: 18831736 PMCID: PMC2573899 DOI: 10.1186/1471-2164-9-453
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Definitions of splice-mediated changes in the annotation for PRINTS fingerprints and Pfam domains among protein isoforms.
Figure 2. Flow of data (on genes and protein isoforms) through the methodological steps adopted to derive Set D, which was used for the characterizations. The numbers given in red correspond to the ASD data set, and those given in print colour correspond to the Vega data set. Set D contains 44.7% (33.4% in the case of ASD) of the genes from the start-up data set; the (PRINTS and Pfam) annotated protein isoforms and the unannotated protein isoforms form 41% and 7% (27.5% and 8.6% in the case of ASD), respectively, of the isoforms from the start-up data set.
Classifications of fingerprints involved in insertion/deletion and truncation events.
| Top-level PRINTS classifications of fingerprints | No. of observed events* affecting | |||
| the whole fingerprint& | some of the constituent motifs of the fingerprint$ | |||
| Type A – insertion/ | Type C – insertion/ | Type B – truncation | Type D – fingerprint is | |
| Receptors | ||||
| Enzymes: Hydrolases | 11 ( | 4 ( | ||
| 13 ( | 2 ( | 19 | ||
| Transport proteins: others | ||||
| Enzymes: Oxidoreductases | 17 (18%) | 39 ( | ||
| 11 ( | 40 ( | |||
| Enzymes: Transferases | 13 ( | 3 ( | 3 ( | |
| 17 (18%) | 5 ( | 7 ( | ||
| Structural proteins | ||||
| 13 ( | 6 ( | |||
| RNA- or DNA-associated proteins | 3 ( | 67 ( | ||
| 2 ( | 44 ( | 10 | ||
| PRINTS 'Domains' signatures | 21 ( | |||
| 4 ( | 17 ( | 3 | ||
| Cytokines and growth factors | 4 ( | 2 ( | 17 ( | 4 ( |
| 6 (18%) | 3 ( | 22 ( | 2 ( | |
| Protein secretion and chaperones | 0 ( | 2 (14%) | 8 ( | 4 (28%) |
| 0 ( | 6 (26%) | 11 ( | 6 (26%) | |
*, Events of the same insertion/deletion type involving fingerprint members of a PRINTS classification that are observed among multiple pairs of isoforms from a gene are counted as one. For every type of insertion/deletion event, the observed numbers involving the five top-scoring PRINTS fingerprint classifications are underlined. The values in brackets give the fraction of events involving the PRINTS classification that are of the given insertion/deletion type: values ≥ 35% are in bold; values ≤ 15% are in italics. Values for the Vega data set are given in line 1 and values for the ASD data set in line 2 of every row.
&, A fingerprint seen as either 'complete' (with all the constituent motifs) or 'partial' (with only some of the constituent motifs) in an isoform is deleted in the other isoform.
$, Some of the constituent motifs of a fingerprint ('complete' or 'partial') in an isoform are deleted in the other isoform.
Pfam domains that are frequently truncated among protein isoforms.
| Domain | No. of genes that encode the domain | Count of unique | Variation in lengths of | |
| Minimal length | Maximal length | |||
| Pkinase | 116 (77%) | 52 | 23 | 572 |
| 3 (8%) | 5 | 199 | 486 | |
| C1-set | 47 (43%) | |||
| Not seen in ASD | 8 | 30 | 89 | |
| Ras | 38 (92%) | |||
| Not seen in ASD | 25 | 192 | ||
| MHC_I | 35 (70%) | 7 | 36 | 178 |
| 3 (60%) | 5 | 91 | 178 | |
| Trypsin | 29 (60%) | 32 | ||
| 20 (66%) | 104 | |||
| ABC_tran | 29 (80%) | 15 | 55 | |
| 11 (68%) | 9 | 80 | 199 | |
| Filament | 26 (92%) | 23 | 34 | 452 |
| 4 (100%) | 6 | 142 | 400 | |
| PH | 24 (25%) | 15 | 26 | 241 |
| 6 (15%) | 5 | 84 | 134 | |
| MFS_1 | 23 (92%) | 33 | 82 | 537 |
| 6 (54%) | 7 | 326 | 426 | |
| Serpin | 22 (100%) | 30 | 31 | 424 |
| 10 (100%) | 13 | 140 | 378 | |
| P450 | 22 (100%) | 22 | 86 | 463 |
| 13 (72%) | 20 | 187 | 486 | |
| Proteasome | 22 (100%) | 11 | 29 | 191 |
| 9 (100%) | 8 | 117 | 187 | |
| 7tm_1 | 22 (70%) | 24 | 25 | 459 |
| 16 (48%) | 17 | 149 | 388 |
| Ion_trans | 21 (63%) | 18 | 27 | 280 |
| 1 (10%) | 2 | 208 | 220 | |
| RRM_1 | 20 (46%) | 9 | 23 | 86 |
| 9 (34%) | 4 | 46 | 72 | |
| DEAD | 19 (67%) | 17 | 27 | 188 |
| 9 (56%) | 9 | 96 | 180 | |
| Pkinase_Tyr | 19 (38%) | |||
| Not seen in ASD | 17 | 50 | 301 | |
| Collagen | 19 (37%) | |||
| Not seen in ASD | 5 | 28 | 59 | |
| Tubulin | 18 (100%) | 13 | 46 | 227 |
| 2 (50%) | 5 | 113 | 227 | |
| I-set | 18 (37%) | |||
| Not seen in ASD | 10 | 22 | 99 | |
| Helicase_C | 18 (35%) | 7 | 41 | 91 |
| 3 (12%) | 3 | 55 | 76 | |
| Mito_carr | 17 (73%) | 14 | 26 | 146 |
| 13 (65%) | 10 | 50 | 136 | |
| UQ_con | 7 (87%) | 8 | 28 | 144 |
| 12 (100%) | 11 | 69 | 157 | |
$, The number of genes that encode the domain and in which the domain undergoes a truncation event among the protein isoforms. The value in brackets is the percentage of the genes encoding the domain in which the truncation is seen. Values for the Vega data set are shown in line 1, and values for the ASD data set are shown in line 2.
@, The observed lengths were grouped so that any two lengths that differ by 5 or fewer amino acids are considered as one unique length.
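The grouping of truncated-domain lengths described in the footnote can be illustrated with a minimal sketch. The footnote does not say how chains of nearby lengths (for example 10, 15, 20) were resolved, so the code below assumes a greedy rule in which each group is anchored on its smallest member; the function name `count_unique_lengths` and the `tolerance` parameter are hypothetical and are not taken from the paper.

```python
from typing import Iterable, List


def group_lengths(lengths: Iterable[int], tolerance: int = 5) -> List[List[int]]:
    """Group domain lengths so that members of a group lie within
    `tolerance` residues of the group's smallest member.

    Assumption: greedy, anchor-based grouping over the sorted lengths.
    The source only states that lengths differing by 5 or fewer amino
    acids count as one unique length.
    """
    groups: List[List[int]] = []
    for length in sorted(lengths):
        if groups and length - groups[-1][0] <= tolerance:
            # Close enough to the anchor of the current group: same unique length.
            groups[-1].append(length)
        else:
            # Too far from the current anchor: start a new unique length.
            groups.append([length])
    return groups


def count_unique_lengths(lengths: Iterable[int], tolerance: int = 5) -> int:
    """Number of unique lengths after grouping."""
    return len(group_lengths(lengths, tolerance))


if __name__ == "__main__":
    observed = [23, 25, 28, 40, 44, 100]  # made-up lengths, for illustration only
    print(group_lengths(observed))
    print(count_unique_lengths(observed))
```

Anchoring on the smallest member keeps every group at most 5 residues wide; a chained rule (comparing each length with its immediate predecessor) would instead merge long runs of closely spaced lengths into a single group, so the two choices can give different counts.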
Top 20 Pfam domains that are often inserted or deleted among protein isoforms*.
| No. of genes$ | ||||
| zf-C2H2 | 104 (75%) | Zinc finger, C2H2 type | Zinc ion binding | Nucleic Acid binding |
| 61 (48%) | ||||
| PH | 59 (62%) | pleckstrin homology | Intracellular signaling/constituent of cytoskeleton | |
| 22 (44%) | ||||
| Ank | 54 (80%) | Ankyrin repeat | Protein-protein interaction | |
| 25 (34%) | ||||
| ig | 51 (82%) | Immunoglobulin family | Domains for cell surface recognition. | |
| Not seen in ASD | ||||
| fn3 | 46 (77%) | Fibronectin type III domain | Multi-domain glycoproteins. | |
| 6 (28%) | ||||
| SPRY | 46 (77%) | SPIa and the Ryanodine receptor | ||
| 3 (75%) | ||||
| Collagen | 45 (88%) | Collagen triple helix repeat | Phosphate transport | Extracellular structural proteins |
| 8 (66%) | ||||
| zf-C3HC4 | 44 (61%) | Zinc finger, C3HC4 type (RING finger) | Protein binding, zinc ion binding | Key role in ubiquitination pathway. |
| 7 (50%) | ||||
| Pkinase | 44 (29%) | Protein kinase domain | ATP binding, protein kinase activity, protein amino acid phosphorylation | |
| 1 (2%) | ||||
| PDZ | 43 (66%) | PDZ domain | Protein binding | Signaling |
| 18 (42%) | ||||
| KRAB | 42 (46%) | Kruppel-associated box present in proteins containing C2H2 fingers. | Nucleic acid binding, intracellular, DNA-dependent regulation of transcription | Protein-protein interactions |
| 34 (49%) | ||||
| C1-set | 41 (37%) | Immunoglobulin C1-set domain | Cell-cell recognition, cell-surface receptors, muscle structure, immune system. | |
| 1 (50%) | ||||
| WD40 | 40 (83%) | WD or beta-transducin repeats | Signal transduction, transcription regulation, cell cycle control, apoptosis. | |
| 35 (38%) | ||||
| EGF | 40 (83%) | Epidermal growth factor – like domain | Found in extracellular domain. | |
| 5 (18%) | ||||
| SH3_1 | 40 (51%) | Src homology 3 | Signal transduction related to cytoskeletal organisation. | |
| 4 (8%) | ||||
| Sushi | 39 (97%) | Complement control protein (CCP) modules, or short consensus repeats (SCR). | Complement and adhesion | |
| 17 (73%) | ||||
| Helicase_C | 32 (62%) | Helicase conserved C-terminal domain | Nucleic acid binding | Helicase |
| 12 (44%) | ||||
| I-set | 32 (66%) | Immunoglobulin I-set domain | Cell-cell recognition, cell-surface receptors, muscle structure, immune system | |
| Not present in ASD | ||||
| RRM_1 | 31 (72%) | RNA recognition motif | Nucleic Acid binding | RNA binding |
| 21 (46%) | ||||
| C2 | 27 (61%) | Ca2+-dependent membrane-targeting module | Signal transduction/membrane trafficking | |
| 16 (53%) | ||||
| LIM | 19 (82%) | LIM domain (Binding protein) | Zinc ion binding | Interface for protein-protein interaction |
| 20 (74%) | ||||
| Mito_carr | 21 (91%) | Mitochondrial carrier | Transport, binding, membrane | |
| 19 (86%) | ||||
| CH | 16 (53%) | Calponin homology domain | Cytoskeletal/signal transduction | |
| 12 (66%) | ||||
| Hormone_receptor | 9 (32%) | Ligand-binding domain of nuclear hormone receptor | Transcription factor; regulation of transcription | Hormone binding |
| 12 (60%) | ||||
| Trypsin | 20 (41%) | Trypsin | Proteolysis | Proteolytic enzyme |
| 11 (30%) |
$, The number of genes in which the domain is seen to undergo an insertion/deletion event. The value in brackets is the percentage of the genes containing the domain in which the domain undergoes insertion/deletion.
*, Line 1 gives values from Vega data set and line 2 gives values from ASD data set.
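The bracketed percentages in the Pfam tables can be illustrated with a minimal sketch. It assumes that the denominator is the first number in the "No. of Genes that encode" column of the gene and events distribution table, and that percentages are truncated to whole numbers; neither is stated in the source, but under these assumptions the Vega values printed for zf-C2H2 and Pkinase are reproduced. The function name `percent_of_genes` is hypothetical.

```python
def percent_of_genes(genes_with_event: int, genes_with_domain: int) -> int:
    """Whole-number percentage of domain-encoding genes that show an event.

    Assumption: the value is truncated (not rounded) to an integer,
    which is what reproduces the printed values in the examples below.
    """
    if genes_with_domain <= 0:
        raise ValueError("genes_with_domain must be positive")
    if not 0 <= genes_with_event <= genes_with_domain:
        raise ValueError("genes_with_event must lie between 0 and genes_with_domain")
    # Integer arithmetic avoids floating-point surprises near whole percentages.
    return (100 * genes_with_event) // genes_with_domain


if __name__ == "__main__":
    # Vega values as printed in the tables; denominators are assumed (see lead-in).
    print(percent_of_genes(104, 138))  # zf-C2H2, insertion/deletion -> 75
    print(percent_of_genes(44, 149))   # Pkinase, insertion/deletion -> 29
    print(percent_of_genes(116, 149))  # Pkinase, truncation -> 77
```

Conventional rounding would give 30% rather than the printed 29% for Pkinase insertion/deletion, which is why truncation is assumed here.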
Unique pairs of alternating Pfam domains
| Domain 1 | Domain 2 | Description of Domain 1 | Description of Domain 2 |
| Hormone_receptor | zf-C4 | Ligand-binding domain of nuclear hormone receptor. Steroid hormone receptor activity; transcription factor activity. DNA-dependent regulation of transcription. | Zinc finger C4 type. Found in steroid/thyroid hormone receptors; transcription factor activity. Regulation of transcription. |
| KRAB | zf-C2H2 | Kruppel-associated box. Nucleic acid binding; DNA-dependent regulation of transcription. | Zinc finger. Nucleic acid binding. |
| SCAN | zf-C2H2 | SCAN domain (named after SRE-ZBP, CTfin51, AW-1 and Number 18 cDNA). Found in several zf-C2H2 proteins. DNA dependent regulation of transcription. | Zinc finger, C2H2 type. Zinc ion binding; nucleic acid binding. |
| Mito_carr | efhand | Mitochondrial carrier. Transport | EF hand. Calcium ion binding. Signaling. Buffering/transport. |
| CH | Plectin | Calponin homology domain. Actin-binding family. Cytoskeletal/signal transduction | Plectin repeat. Found in Plakin proteins. Plasma and nuclear membranes. |
| Sushi | CUB | Sushi domain (SCR repeat). Complement control protein (CCP) modules, or short consensus repeats (SCR). Complement and adhesion. | Structural motif in extracellular and plasma membrane-associated proteins. |
| RGS | PDZ | Regulator of G protein signaling domain. | PDZ domain. Protein binding. Signaling |
| C2 | PDZ | Ca2+-dependent membrane-targeting module. Signal transduction/membrane trafficking | PDZ domain. Protein binding. Signaling |
| Collagen | EMI | Collagen triple helix repeat. Phosphate transport. Extracellular structural proteins | Found in extracellular proteins. |
| Nebulin | LIM | Nebulin repeat. Found in the thin filaments of striated vertebrate muscle. Actin-binding protein. | LIM domain (Binding protein). Zinc ion binding. Interface for protein-protein interaction |
| PH | Pkinase_Tyr | Pleckstrin homology. Intracellular signaling/constituent of cytoskeleton. Pkinase_Tyr proteins are supposed to contain PH domains. | Protein tyrosine kinase. Mediates the response to external stimuli. |
| Tubulin-binding | MAP2_projctn | Tau and MAP protein. Tubulin-binding repeat. | MAP domain (MHC class II analogue protein) |
| FHA | BRCT | Forkhead-associated domain. Phosphopeptide binding motif | BRCA1 C terminus domain. Phospho-protein binding protein. |
| NTP_transf_2 | PAP_RNA-bind | Nucleotidyltransferase domain. | Poly(A) polymerase predicted RNA binding domain. Polynucleotide adenyltransferase activity. |
| Ion_trans | Ion_trans_2 | Ion transport protein | Ion channel. Both are of same clan. |
| Orn_Arg_deC_N | Orn_DAP_Arg_deC | Pyridoxal-dependent decarboxylase, pyridoxal binding domain. Catalytic activity | Pyridoxal-dependent decarboxylase, C-terminal sheet domain. Catalytic activity. |
| MAM | ig | Adhesive function. Cellular component: membrane | Immunoglobulin domain |
| ig | I-set | Immunoglobulin | Immunoglobulin intermediate. Both are of same clan. |
| I-set | V-set | Immunoglobulin I-set (intermediate) domain. I-set and V-set are of same clan. | Immunoglobulin V-set (variable) domain. |
| EGF_CA | EGF | Calcium binding EGF domain. | EGF-like protein. Both are of same clan. |
| Hydrolase | E1-E2_ATPase | Haloacid dehalogenase-like hydrolase. Catalytic activity. Metabolic process | Hydrolase activity. ATP binding. |
| Radical_SAM | Mob_synth_C | Catalytic activity; iron-sulfur cluster binding. | Molybdenum cofactor synthesis C. iron, sulfur cluster binding |
| Aconitase | Aconitase_C | Aconitase hydratase. Lyase activity. | Aconitase hydratase. Hydro-lyase activity. |
| Filament | Filament_head | Intermediate filament protein. Structural molecule activity | Head region of intermediate filaments. |
| CNH | Pkinase | Citron and Citron kinase. Small GTPase regulator activity. | Protein kinase activity. ATP binding. |
| PSI | Sema | Plexin repeat. Membrane. Receptor activity | Semaphorins. Secreted and transmembrane proteins. |
| GTP_EFTU_D2 | GTP_EFTU | Elongation factor | GTP binding. Elongation factor |
| PARP | WWE | Poly(ADP-ribose) polymerase. Catalyses covalent attachment of ADP-ribose to DNA binding proteins | Mediates protein-protein interactions in ubiquitin and ADP ribose conjugation system. |
| Sushi | An_peroxidase | Complement control protein (CCP) modules, or short consensus repeats (SCR). Complement and adhesion | Animal haem peroxidase. Peroxidase activity. |
| PH | Oxysterol_BP | pleckstrin homology. Intracellular signaling/constituent of cytoskeleton | Oxysterol binding protein. Steroid metabolic process |
| Ank | KH_1 | Ankyrin repeat. Protein-protein interaction | K homology domain. RNA binding |
| GON | TSP_1 | Proteinaceous extracellular matrix. Zinc ion binding. Metalloendopeptidase activity. | Thrombospondin type 1 domain. Cell adhesion |
| Thioredoxin | DnaJ | Participates in redox reactions. | Heat shock protein binding |
| Collagen | EMI | Collagen triple helix repeat. Phosphate transport process. Connective tissue structures. | EMI domain. Participates in multimerization |
| HECT | RCC1 | HECT domain (ubiquitin-transferase). Homologous to the E6-AP Carboxyl Terminus. Ubiquitin-protein ligase; protein modification process. | Regulator of chromosome condensation. Acts as a guanine-nucleotide dissociation stimulator (GDS) |
Pfam domains and the events they undergo: distribution of genes and events$
| Percent fraction of genes that show | Percent fraction of events as per | ||||
| Pfam domain | No. of Genes that encode | insertion/deletion | Truncation | insertion/deletion | Truncation |
| Pkinase | 149 (139) | 29% | 27% | ||
| 48 (4) | 2% | 6% | 25% | 75% | |
| zf-C2H2 | 138 (106) | 1% | 1% | ||
| 127 (61) | 48% | 0% | 0% | ||
| C1-set | 108 (86) | 37% | 43% | 46% | 53% |
| 2 (1) | 50% | 0% | 100% | 0% | |
| PH | 94 (73) | 25% | 28% | ||
| 50 (24) | 44% | 12% | 21% | ||
| Ank | 67 (55) | 7% | 8% | ||
| 72 (25) | 34% | 5% | 13% | ||
| ig | 62 (52) | 8% | 8% | ||
| Not seen in ASD | |||||
| fn3 | 59 (49) | 30% | 28% | ||
| 21 (6) | 28% | 4% | 85% | 14% | |
| SPRY | 59 (48) | 5% | 6% | ||
| 4 (3) | 75% | 0% | 100% | 0% | |
| Trypsin | 47 (47) | 42% | 38% | ||
| 36 (29) | 30% | 35% | |||
| PDZ | 65 (46) | 27% | 29% | ||
| 42 (20) | 42% | 11% | 21% | ||
| zf-C3HC4 | 72 (46) | 4% | 6% | ||
| 14 (7) | 50% | 0% | 100% | 0% | |
| Collagen | 51 (45) | 37% | 29% | ||
| 12 (8) | 66% | 0% | 100% | 0% | |
| KRAB | 90 (43) | 46% | 2% | 4% | |
| 69 (34) | 49% | 1% | 2% | ||
| SH3_1 | 77 (43) | 51% | 9% | 14% | |
| 45 (4) | 8% | 0% | 100% | 0% | |
| WD40 | 48 (43) | 18% | 18% | ||
| 90 (35) | 38% | 2% | 5% | ||
| Sushi | 40 (40) | 27% | 22% | ||
| 23 (17) | 4% | 5% | |||
| EGF | 48 (40) | 8% | 9% | ||
| 27 (5) | 18% | 0% | 100% | 0% | |
| Ras | 41 (39) | 4% | 5% | ||
| Not present in ASD | |||||
| Helicase_C | 51 (38) | 35% | 36% | ||
| 27 (13) | 44% | 11% | 20% | ||
| RRM_1 | 43 (38) | 46% | 39% | ||
| 45 (24) | 46% | 20% | 30% | ||
| MHC_I | 50 (35) | 12% | 14% | ||
| 5 (3) | 0% | 60% | 0% | 100% | |
| ABC_tran | 36 (33) | 50% | 38% | ||
| 18 (13) | 38% | 38% | |||
| zf-B_box | 47 (33) | 46% | 25% | 35% | |
| 14 (3) | 21% | 0% | 100% | 0% | |
| C2 | 44 (32) | 36% | 37% | ||
| 30 (17) | 53% | 10% | 15% | ||
| LIM | 23 (22) | 39% | 32% | ||
| 27 (20) | 14% | 16% | |||
| Mito_carr | 23 (22) | 55% | 44% | ||
| 22 (19) | 59% | 59% | 40% | ||
$, Line 1 gives the values from Vega data set and Line 2 gives the values from ASD data set.
Domains that undergo insertion/deletion events in a higher fraction of genes (containing the specific domain) than truncation events are: zf-C2H2, PH, Ank, SPRY, KRAB, WD40, Sushi, and EGF. Domains that undergo truncation events in a higher fraction of genes (containing the specific domain) than insertion/deletion events are: Trypsin, Ras, MHC_I, and ABC_tran. Since Swap events were seen in only a few instances, they were not considered in deriving this table.
Figure 3. Illustration of a typical result page from the web interface of the SpliVaP database. The data shown are for protein isoforms from the PEPD gene. The reported changes in Pfam domains between the two isoforms SP1 and SP4 (which are hyperlinked to splice patterns in the ASD database) are an insertion/deletion and a truncation. Associations to a template structure entry in MSD and to a related genetic disorder entry in OMIM are shown and are hyperlinked.