Shani T Gal-Oz, Nimrod Haiat, Dana Eliyahu, Guy Shani, Tal Shay.
Abstract
Alternative splicing results in multiple transcripts of the same gene, possibly encoding different protein isoforms with different domains. Whereas it is possible to manually determine the effect of alternative splicing on the domain composition for a single event, the process requires the tedious integration of several data sources; it is error-prone and not feasible for genome-wide characterization of domains affected by differential splicing. To fulfill the need for an automated solution, we developed the Domain Change Presenter (DoChaP, https://dochap.bgu.ac.il/), a web server for the visualization of exon-domain associations. DoChaP visualizes all transcripts of a given gene, the encoded proteins and their domains, and enables a comparison between the transcripts and between their protein products. The colors and organization make the structural effects of alternative splicing events on proteins easy to identify. To enable the study of the conservation of exon structure, alternative splicing, and the effect of alternative splicing on protein domains, DoChaP also provides a two-species comparison of exon-domain associations. DoChaP thus provides a unique and easy-to-use visualization of exon-domain association and conservation, and will facilitate the study of the structural effects of alternative splicing in health and disease.
Year: 2021 PMID: 33988713 PMCID: PMC8262731 DOI: 10.1093/nar/gkab357
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
DoChaP data sources.
| Source | Download path | File/table names | Obtained information |
|---|---|---|---|
| | | GFF file | RefSeq transcript, gene, CDS and exon annotations |
| | | All GPFF files per species | RefSeq protein and protein domain data |
| | | GFF3 file | Ensembl transcript, gene, CDS and exon annotations |
| | | XML query templates are available in the GitHub repository | Ensembl protein domain data |
| | | XML query templates are available in the GitHub repository | Ensembl Compara orthology information |
| | | Entries_table | Connection between domain accessions from different sources |
| | | Gene2ensembl | Mapping from RefSeq to Ensembl identifiers |
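As an illustration of how exon annotations can be read from the GFF/GFF3 files listed above, the following is a minimal sketch, not DoChaP's own loader. It assumes only the standard GFF3 layout (nine tab-separated columns, with `key=value` attributes in the last column); the function name `parse_gff3_exons` and the example records are hypothetical.

```python
from collections import defaultdict


def parse_gff3_exons(lines):
    """Collect sorted (start, end, strand) exon intervals per parent transcript ID."""
    exons = defaultdict(list)
    for line in lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # GFF3 records have nine columns; column 3 holds the feature type.
        if len(cols) != 9 or cols[2] != "exon":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attrs = dict(
            field.split("=", 1) for field in cols[8].split(";") if "=" in field
        )
        parent = attrs.get("Parent")
        if parent is None:
            continue
        # An exon shared by several transcripts lists comma-separated parents.
        for transcript_id in parent.split(","):
            exons[transcript_id].append((int(cols[3]), int(cols[4]), cols[6]))
    for intervals in exons.values():
        intervals.sort()
    return dict(exons)


# Hypothetical records, made up for illustration only.
example = [
    "##gff-version 3",
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tID=e1;Parent=tx1",
    "chr1\tdemo\texon\t300\t450\t.\t+\t.\tID=e2;Parent=tx1,tx2",
    "chr1\tdemo\tCDS\t120\t200\t.\t+\t0\tID=c1;Parent=tx1",
]
exon_table = parse_gff3_exons(example)
```

Here `exon_table` maps each transcript ID to its exon intervals; the CDS record is ignored because only `exon` features are collected.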
DoChaP database content for five species (as of February 2021)
| Species | Gene IDs | Transcript-isoform pairs | Unique exons | Protein domain types |
|---|---|---|---|---|
| | 20 272 | 177 531 | 425 709 | 11 167 |
| | 23 161 | 157 623 | 390 326 | 10 676 |
| | 22 063 | 74 270 | 275 677 | 7206 |
| | 29 633 | 71 205 | 361 335 | 9831 |
| | 21 683 | 44 877 | 248 980 | 7759 |
| Total | 116 812 | 525 506 | 1 702 027 | 46 639 |
a Only includes RefSeq data because the current Ensembl version is older than the current RefSeq version.
Figure 1. Sample output of DoChaP for the human breast cancer susceptibility gene BRCA1. (A) Left, genomic visualization of the transcripts in their genomic context (genomic region, genomic range and strand are shown at the top). The sliding scale and zoom buttons control the genomic region displayed. The genomic range shown is chr17:43 127 790–43 044 294, which is the genomic region coding for the displayed transcripts of the gene BRCA1. Right, mRNA and protein domain composition for each transcript. Different colors represent different exons and are consistent across all the visualizations of the same gene. Domains are shown as circular shapes and are colored according to the exons that encode them. In BRCA1, the top transcript encodes four domains. The BRCT domain exists in all three isoforms shown. The x-axis is the position in the protein and coding region, as shown at the top. The sliding scale and zoom buttons control the transcript and protein region shown, and double-clicking an exon of a transcript in the genomic visualization zooms in to the relevant region of the corresponding transcript in the mRNA and protein domain composition visualization. (B) In the second transcript, the third exon (E3) is skipped and therefore the RING protein domain (‘RING-HC_BRCA1’) is missing. (C) The third protein isoform does not include the serine-rich domain (‘BRCT_assoc’), encoded by exon 10 (E10), as its associated transcript has a shorter exon 10 (E10117) due to an alternative 5′ splice site event. For the sake of simplicity, only three representative transcripts of BRCA1 are shown.
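
The exon-domain association shown in the figure can be illustrated with a small coordinate calculation. The sketch below is not DoChaP's implementation. It assumes each exon is described by the number of coding nucleotides it contributes, in transcript order, and that a domain is given as 1-based inclusive amino-acid positions. The function names, exon lengths and domain coordinates are hypothetical toy values, not BRCA1 data.

```python
def exons_encoding_domain(exon_cds_lengths, domain_start_aa, domain_end_aa):
    """Return the names of exons whose coding bases overlap a protein domain.

    exon_cds_lengths: list of (exon_name, coding_length_nt) in transcript order.
    domain_start_aa, domain_end_aa: 1-based inclusive amino-acid positions.
    """
    # Amino acid i is encoded by coding nucleotides 3*(i-1)+1 .. 3*i.
    nt_start = 3 * (domain_start_aa - 1) + 1
    nt_end = 3 * domain_end_aa
    hits = []
    offset = 0
    for name, length in exon_cds_lengths:
        exon_start, exon_end = offset + 1, offset + length
        # Standard closed-interval overlap test on coding coordinates.
        if length > 0 and exon_start <= nt_end and exon_end >= nt_start:
            hits.append(name)
        offset += length
    return hits


def missing_domain_exons(domain_exons, other_transcript):
    """List the domain-encoding exons that are absent from another transcript."""
    present = {name for name, _ in other_transcript}
    return [name for name in domain_exons if name not in present]


# Toy transcripts (hypothetical): the second one skips exon E3.
full_transcript = [("E1", 90), ("E2", 60), ("E3", 120), ("E4", 150)]
skipped_transcript = [("E1", 90), ("E2", 60), ("E4", 150)]

# A toy domain at amino acids 40-80 spans coding nucleotides 118-240.
domain_exons = exons_encoding_domain(full_transcript, 40, 80)
lost = missing_domain_exons(domain_exons, skipped_transcript)
```

With these toy values the domain is encoded by E2 and E3, and E3 is reported as missing from the transcript that skips it, which mirrors the kind of exon-skipping effect described for panel (B).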