Pier L Martelli, Mattia D'Antonio, Paola Bonizzoni, Tiziana Castrignanò, Anna M D'Erchia, Paolo D'Onorio De Meo, Piero Fariselli, Michele Finelli, Flavio Licciulli, Marina Mangiulli, Flavio Mignone, Giulio Pavesi, Ernesto Picardi, Raffaella Rizzi, Ivan Rossi, Alessio Valletti, Andrea Zauli, Federico Zambelli, Rita Casadio, Graziano Pesole.
Abstract
Alternative splicing is emerging as a major mechanism for expanding transcriptome and proteome diversity, particularly in humans and other vertebrates. However, the proportion of alternative transcripts and proteins actually endowed with functional activity is currently highly debated. We present here a new release of ASPicDB, which now provides a unique annotation resource of human protein variants generated by alternative splicing. A total of 256,939 protein variants from 17,191 multi-exon genes have been extensively annotated with state-of-the-art machine learning tools, providing information on the protein type (globular or transmembrane), localization, and the presence of PFAM domains, signal peptides, GPI-anchor propeptides, and transmembrane and coiled-coil segments. Furthermore, full-length variants can now be specifically selected based on the annotation of CAGE tags and polyA signals and/or polyA sites, which mark transcription initiation and termination sites, respectively. Retrieval can be carried out at the gene, transcript, exon, protein or splice site level, allowing the selection of data sets that fulfil one or more features set by the user. The retrieval interface also enables the selection of protein variants showing specific differences in the annotated features. ASPicDB is available at http://www.caspur.it/ASPicDB/.
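The abstract describes retrieval as selecting records that satisfy one or more user-chosen features, where a variant counts as full-length when its 5' end is supported by a CAGE tag and its 3' end by a polyA signal or site. A minimal sketch of that selection logic over an in-memory list follows. The `ProteinVariant` record, its field names and the `select`/`with_domain` helpers are hypothetical illustrations, not ASPicDB's actual schema or query interface.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProteinVariant:
    """Hypothetical record for one alternative protein (field names are illustrative)."""
    gene: str
    variant_id: str
    protein_type: str                    # "globular" or "transmembrane"
    localization: Optional[str] = None   # predicted compartment, globular proteins only
    pfam_domains: set[str] = field(default_factory=set)
    has_signal_peptide: bool = False
    has_cage_tag: bool = False           # transcription start supported by a CAGE tag
    has_polya_evidence: bool = False     # polyA signal and/or polyA site annotated

    @property
    def full_length(self) -> bool:
        # Both transcript ends supported: CAGE tag at the 5' end, polyA evidence at the 3' end.
        return self.has_cage_tag and self.has_polya_evidence


def select(variants, **criteria):
    """Keep variants whose attributes equal every given criterion (logical AND)."""
    return [v for v in variants
            if all(getattr(v, name) == value for name, value in criteria.items())]


def with_domain(variants, domain):
    """Keep variants carrying the given PFAM domain."""
    return [v for v in variants if domain in v.pfam_domains]


# Toy data, invented for illustration only.
variants = [
    ProteinVariant("GENE1", "PR1", "globular", "nucleus", {"CUB"},
                   has_cage_tag=True, has_polya_evidence=True),
    ProteinVariant("GENE1", "PR2", "transmembrane", pfam_domains={"CUB", "Sushi"},
                   has_cage_tag=True),
    ProteinVariant("GENE2", "PR1", "globular", "cytoplasm", has_signal_peptide=True),
]

print([(v.gene, v.variant_id) for v in select(variants, full_length=True)])
# [('GENE1', 'PR1')]
```

Criteria combine with AND, so `select(variants, protein_type="globular", has_signal_peptide=True)` returns only the `GENE2` variant; chaining `with_domain` on the result adds a domain filter.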
Year: 2010 PMID: 21051348 PMCID: PMC3013677 DOI: 10.1093/nar/gkq1073
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Pipeline for the annotation of alternative transcripts.
Statistics of the ASPicDB content (v2.0, August 2010)
| ASPicDB v2.0 | |
|---|---|
| Genes | 17 191 |
| Transcripts | 319 092 |
| Proteins | 256 939 |
| Exons | 390 886 |
| Splicing sites | 351 345 |
| U2 class | 302 164 |
| U12 class | 1712 |
| Splicing events | 233 717 |
The numbers of splicing sites belonging to the U2 and U12 classes, and the number of splicing events, are also reported.
Annotation of human variants based on similarity and PFAM searches
| Sequence repository | No. of proteins | No. of genes |
|---|---|---|
| UniProtKB/SwissProt, % | | |
| | 239 814 (93) | 17 054 (99) |
| Identical, % | 42 601 | 13 043 (76) |
| PDB | | |
| | 137 528 (54) | 11 062 (64) |
| Identical, % | 1079 (0.4) | 316 |
| PFAM | | |
| All matches, % | 183 483 (71) | 14 205 (83) |
| Complete matches, % | 46 630 | 5621 |
The percentages are computed with respect to 256 939 protein variants and 17 191 genes.
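Because every percentage in these tables is a count divided by one of two fixed totals, any value lost from a cell can be recomputed from the counts. A minimal sketch follows; the `percent` helper is hypothetical, and since the tables do not state their rounding convention, the derived values are printed to one decimal rather than presented as the published figures.

```python
TOTAL_PROTEINS = 256_939  # protein variants in ASPicDB v2.0
TOTAL_GENES = 17_191      # multi-exon genes in ASPicDB v2.0


def percent(count: int, total: int) -> float:
    """Percentage of `count` relative to `total`."""
    return 100.0 * count / total


# Rows of the similarity/PFAM table whose percentages are not shown: (label, proteins, genes)
rows = [
    ("UniProtKB/SwissProt, identical", 42_601, 13_043),
    ("PDB, identical", 1_079, 316),
    ("PFAM, complete matches", 46_630, 5_621),
]

for label, n_proteins, n_genes in rows:
    print(f"{label}: {percent(n_proteins, TOTAL_PROTEINS):.1f}% of proteins, "
          f"{percent(n_genes, TOTAL_GENES):.1f}% of genes")
```

The same helper applies to the machine learning and differences tables, which use the same two totals.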
Machine learning-based predictions for the human proteins deposited in ASPicDB
| Annotation | No. of proteins | No. of genes |
|---|---|---|
| Type | | |
| Globular, % | 210 608 (82) | 15 513 (90) |
| Transmembrane, % | 41 561 | 5439 |
| Localization (globular proteins) | | |
| Secretory pathway, % | 31 917 | 7348 (43) |
| Cytoplasm, % | 90 046 | 10 327 (60) |
| Nucleus, % | 69 167 | 8183 (48) |
| Mitochondrion, % | 19 478 | 4698 |
| Domains | | |
| Signal peptide, % | 30 508 | 5153 |
| GPI-anchor propeptide, % | 1673 (0.7) | 629 |
| Coiled-coil segments, % | 3423 (1.3) | 497 (2.8) |
The percentages are computed with respect to 256 939 protein variants and 17 191 genes.
Differences among alternative proteins encoded by the same human gene
| Annotation | No. of genes |
|---|---|
| Type (globular/transmembrane) | 3817 |
| Subcellular localization (globular proteins) | 9593 (56) |
| Presence of signal peptide | 3939 |
| Presence of GPI-anchor propeptide | 591 (3.4) |
| Presence of coiled-coil domains | 464 (2.7) |
| Number of transmembrane helices | 2140 |
| PFAM models (all matches) | 6575 |
The percentages are computed with respect to 17 191 genes.
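The table above counts genes whose alternative proteins disagree on a given annotation. A minimal sketch of such a count follows, assuming a gene "differs" for a feature when its variants carry more than one distinct value of it; the `genes_with_differences` helper and the annotation keys are hypothetical, and the paper may define some differences (for example, between PFAM models) more precisely. Because the localization row covers globular proteins only, the sketch filters to globular variants before comparing localizations.

```python
from collections import defaultdict


def genes_with_differences(variants, feature):
    """Return genes whose protein variants do not all share the same value of `feature`.

    `variants` is an iterable of (gene, annotations) pairs, where `annotations`
    is a dict such as {"type": "globular", "localization": "nucleus"}.
    """
    values_by_gene = defaultdict(set)
    for gene, annotations in variants:
        values_by_gene[gene].add(annotations.get(feature))
    # More than one distinct value means the variants differ for this feature.
    return {gene for gene, values in values_by_gene.items() if len(values) > 1}


# Toy data, invented for illustration only.
variants = [
    ("GENE_A", {"type": "globular", "localization": "nucleus", "tm_helices": 0}),
    ("GENE_A", {"type": "globular", "localization": "cytoplasm", "tm_helices": 0}),
    ("GENE_B", {"type": "transmembrane", "localization": None, "tm_helices": 2}),
    ("GENE_B", {"type": "transmembrane", "localization": None, "tm_helices": 1}),
]

globular = [(g, a) for g, a in variants if a["type"] == "globular"]
print(sorted(genes_with_differences(globular, "localization")))  # ['GENE_A']
print(sorted(genes_with_differences(variants, "type")))          # []
print(sorted(genes_with_differences(variants, "tm_helices")))    # ['GENE_B']
```

Dividing the size of each returned set by the total number of genes gives the percentages used in the table. Values must be hashable, so a set of PFAM models would be stored as a `frozenset`.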
Figure 2. ‘Predicted proteins’ panel for gene CSMD3 (CUB and Sushi multiple domains 3). The gene is predicted to encode 12 transcripts and 7 different protein sequences. Variants labeled PR1, PR2 and PR3 are identical to the isoforms reported in the CSMD3_HUMAN entry of UniProtKB/SwissProt. Two more variants are reported in that entry, although they lack experimental annotation. Several repetitions of Sushi and CUB domains are predicted with PFAM (25) and represented with symbols indicating whether the model is completely or partially mapped to the sequence. The two transmembrane helices are predicted with ENSEMBLE (35).
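The caption distinguishes PFAM models mapped completely versus partially onto a sequence. One common way to draw that distinction, assumed here rather than taken from the paper, is to check whether the alignment spans the whole profile HMM, from its first to its last position. The `PfamHit` record, the `tolerance` parameter and the model lengths below are hypothetical illustrations, not real PFAM values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfamHit:
    """Hypothetical PFAM hit; HMM coordinates are 1-based and inclusive."""
    model: str
    model_length: int  # length of the profile HMM (illustrative values below)
    hmm_from: int      # first HMM position aligned to the protein
    hmm_to: int        # last HMM position aligned to the protein


def is_complete(hit: PfamHit, tolerance: int = 0) -> bool:
    """True if the alignment covers the whole model, allowing `tolerance`
    unaligned HMM positions at each end."""
    return hit.hmm_from <= 1 + tolerance and hit.hmm_to >= hit.model_length - tolerance


hits = [PfamHit("CUB", 110, 1, 110), PfamHit("Sushi", 57, 5, 57)]
for h in hits:
    print(h.model, "complete" if is_complete(h) else "partial")
# CUB complete
# Sushi partial
```

With `tolerance=5` the Sushi hit above would also count as complete; whether and how ASPicDB tolerates short unaligned model ends is not stated in this excerpt.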