Pora Kim, Mengyuan Yang, Ke Yiya, Weiling Zhao, Xiaobo Zhou.
Abstract
Exon skipping (ES) is reported to be the most common alternative splicing event. By causing loss of functional domains/sites or shifting the open reading frame (ORF), ES leads to a variety of human diseases, and ES events are considered therapeutic targets. To date, systematic and intensive annotations of ES events based on the skipped exon units in cancer and normal tissues are not available. Here, we built ExonSkipDB, an ES annotation database available at https://ccsm.uth.edu/ExonSkipDB/, aiming to provide a resource and reference for functional annotation of ES events in multiple cancer types and tissues, and to help identify therapeutically targetable genes at the level of individual exon units. We collected 14 272 genes that have 90 616 and 89 845 ES events across 33 cancer types and 31 normal tissues from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx), respectively. For the ES events, we performed multiple functional annotations. These include ORF assignment of exon-skipped transcripts, studies of protein functional features lost due to ES events, and studies of ES events associated with mutations and methylation based on multi-omics evidence. ExonSkipDB will be a unique resource for the cancer and drug research communities to identify therapeutically targetable exon skipping events.
Year: 2020 PMID: 31642488 PMCID: PMC7145592 DOI: 10.1093/nar/gkz917
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Overview of ExonSkipDB. (A) The genomic structures of exon skipping events of TCGA and GTEx across the reference gene model. (B) Abundance of isoforms and PSI values of individual exon skipping events across TCGA and GTEx. (C) The protein functional features based on the canonical protein sequence. The grey shaded regions correspond to individual exon skipping events. (D) The analyzed ORF information of individual exon skipping events based on the canonical transcript sequence. (E) RNA-seq evidence for the mutation-associated exon skipping event. Consistent evidence from the depth of coverage, differential PSI values between mutated and non-mutated samples, and sashimi plots identified 136 exon skipping events that are associated with mutations.
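
The PSI (percent spliced in) values mentioned in panels (B) and (E) can be illustrated with a minimal sketch. It assumes the commonly used junction-read definition, PSI = inclusion / (inclusion + exclusion); the source does not state which formula or read-count threshold ExonSkipDB uses, so the function name `psi`, its parameters and the `min_reads` cutoff are all hypothetical.

```python
from typing import Optional


def psi(inc_up: int, inc_down: int, exc: int, min_reads: int = 10) -> Optional[float]:
    """Percent spliced in for one skipped-exon event in one sample.

    inc_up   : reads spanning the upstream-exon -> skipped-exon junction
    inc_down : reads spanning the skipped-exon -> downstream-exon junction
    exc      : reads spanning the upstream-exon -> downstream-exon junction
    min_reads: hypothetical coverage cutoff below which PSI is not reported
    """
    # Two junctions support inclusion but only one supports skipping,
    # so the inclusion evidence is averaged over its two junctions.
    inclusion = (inc_up + inc_down) / 2.0
    total = inclusion + exc
    if total < min_reads:
        return None  # too little evidence to estimate PSI
    return inclusion / total


if __name__ == "__main__":
    print(psi(30, 30, 10))
    print(psi(1, 1, 1))
```

A PSI near 1 means the exon is almost always included; a PSI near 0 means it is almost always skipped.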
Number of in-frame ES event genes per lost protein functional features
| Subsection | Content | # in-frame exon skipped genes |
|---|---|---|
| Molecule processing | Initiator methionine | 13 |
| Molecule processing | Signal peptide | 93 |
| Molecule processing | Transit peptide | 31 |
| Molecule processing | Propeptide | 45 |
| Molecule processing | Chain | 7044 |
| Molecule processing | Peptide | 8 |
| Regions | Topological domain | 1243 |
| Regions | Transmembrane | 847 |
| Regions | Intramembrane | 21 |
| Regions | Domain | 2427 |
| Regions | Repeat | 596 |
| Regions | Calcium binding | 25 |
| Regions | Zinc finger | 193 |
| Regions | DNA binding | 21 |
| Regions | Nucleotide binding | 269 |
| Regions | Region | 1055 |
| Regions | Coiled coil | 458 |
| Regions | Motif | 192 |
| Regions | Compositional bias | 497 |
| Sites | Active site | 240 |
| Sites | Metal binding | 237 |
| Sites | Binding site | 378 |
| Sites | Site | 142 |
| Amino acid modifications | Modified residue | 1374 |
| Amino acid modifications | Lipidation | 18 |
| Amino acid modifications | Glycosylation | 624 |
| Amino acid modifications | Disulfide bond | 647 |
| Amino acid modifications | Cross-link | 234 |
| Natural variations | Alternative sequence | 3123 |
| Natural variations | Natural variant | 2080 |
| Experimental info | Mutagenesis | 665 |
| Experimental info | Sequence conflict | 1482 |
| Secondary structure | Helix | 1750 |
| Secondary structure | Turn | 952 |
| Secondary structure | Beta strand | 1623 |
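
The idea behind this table, in-frame skipped exons and the protein features they remove, can be sketched as follows. This is an illustration under stated assumptions, not the ExonSkipDB pipeline: it assumes the skipped exon lies entirely within the coding sequence, treats "in-frame" as "skipped length divisible by 3", and uses a hypothetical `Feature` record with 1-based, inclusive amino-acid coordinates.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Feature:
    """Hypothetical protein feature record (1-based, inclusive positions)."""
    category: str  # e.g. "Domain", "Active site"
    start: int
    end: int


def is_in_frame(exon_start: int, exon_end: int) -> bool:
    """True if skipping a fully coding exon keeps the downstream reading frame.

    Coordinates are 1-based and inclusive, so length = end - start + 1.
    Simplification: a stop codon created at the new exon-exon junction
    is not checked here.
    """
    return (exon_end - exon_start + 1) % 3 == 0


def lost_features(features: Iterable[Feature], aa_start: int, aa_end: int) -> List[str]:
    """Sorted, de-duplicated categories of features overlapping the skipped residues."""
    return sorted(
        {f.category for f in features if f.start <= aa_end and f.end >= aa_start}
    )


if __name__ == "__main__":
    feats = [
        Feature("Domain", 10, 80),
        Feature("Active site", 95, 95),
        Feature("Zinc finger", 120, 150),
    ]
    print(is_in_frame(101, 250))          # a 150-nt exon
    print(lost_features(feats, 50, 100))  # residues 50-100 are removed
```

Counting the distinct genes per returned category would give a tally of the same shape as the table above.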
Figure 2. Overrepresentation enrichment analysis of ES genes per lost protein feature category. For each protein functional feature category lost in ES genes due to exon skipping, we investigated the biological processes enriched among those genes. The category abbreviations, in alphabetical order, are: Active site (AS), Binding site (BS), Coiled coil (CC), Compositional bias (CB), Cross-link (CL), Disulfide bond (DB), Glycosylation (G), Metal binding (MB), Motif (M), Mutagenesis (MUT), Nucleotide binding (NB), Propeptide (PP), Repeat (R), Signal peptide (SP), Site (S), Transit peptide (TP) and Zinc finger (ZF). In the top left panel, SP, TP and PP belong to UniProt's protein feature subsection ‘Molecule processing’, and R, ZF, NB, CC, M and CB belong to the ‘Regions’ subsection. In the top right panel, AS, MB, BS and S belong to the ‘Sites’ subsection. G, DB and CL belong to the ‘Amino acid modifications’ subsection. Lastly, Mutagenesis (MUT) belongs to the ‘Experimental info’ subsection.
Exon skipping events associated with mutations
| Gene | Cancer type | ESID | Gene | Cancer type | ESID | Gene | Cancer type | ESID |
|---|---|---|---|---|---|---|---|---|
| AASS | BRCA | exon_skip_479254 | HNRNPC | READ | exon_skip_111139 | RASA1 | HNSC | exon_skip_436351 |
| ACADVL | HNSC | exon_skip_148534 | IRF3 | SKCM | exon_skip_320801 | RB1 | LGG | exon_skip_100319 |
| ACE2 | LIHC | exon_skip_514046 | IRF3 | LIHC | exon_skip_320801 | RB1 | LUSC | exon_skip_100319 |
| ACSL1 | BRCA | exon_skip_433327 | KCTD10 | STAD | exon_skip_96034 | RB1 | SKCM | exon_skip_100319 |
| ACTR5 | HNSC | exon_skip_351842 | KDM4C | LIHC | exon_skip_495073 | RB1 | SARC | exon_skip_100320 |
| AK9 | SKCM | exon_skip_461739 | LAMTOR4 | CESC | exon_skip_468895 | RB1 | SKCM | exon_skip_100321 |
| AP1G2 | KIRP | exon_skip_111822 | LPCAT1 | COAD | exon_skip_440961 | RBBP7 | BRCA | exon_skip_514095 |
| AP1G2 | UCS | exon_skip_111839 | LRRC37B | TGCT | exon_skip_150743 | RBL2 | LIHC | exon_skip_136487 |
| ARAP1 | HNSC | exon_skip_75409 | LRRC49 | LUSC | exon_skip_123285 | RBM10 | LUAD | exon_skip_509946 |
| ARAP2 | STAD | exon_skip_428975 | LZTFL1 | SKCM | exon_skip_382808 | RBM11 | CESC | exon_skip_358911 |
| ARHGAP4 | CESC | exon_skip_517379 | MAOB | LIHC | exon_skip_514541 | RBM23 | KIRP | exon_skip_111436 |
| ARSA | ESCA | exon_skip_370946 | MAP4K4 | SKCM | exon_skip_328236 | RBM27 | SKCM | exon_skip_438439 |
| ATF7IP | LUAD | exon_skip_80383 | MCTP2 | SKCM | exon_skip_125176 | RWDD2B | LUAD | exon_skip_361480 |
| ATXN7L1 | BRCA | exon_skip_478799 | MET | LUAD | exon_skip_470683 | SAAL1 | LIHC | exon_skip_69749 |
| AXIN1 | LIHC | exon_skip_140168 | MFN1 | BRCA | exon_skip_379593 | SAFB | LUSC | exon_skip_301206 |
| BAP1 | UVM | exon_skip_384821 | MFSD11 | BRCA | exon_skip_156579 | SCRIB | STAD | exon_skip_493826 |
| BCHE | LIHC | exon_skip_389735 | MIB2 | LIHC | exon_skip_292 | SEMA6C | LUAD | exon_skip_30610 |
| BRCC3 | READ | exon_skip_513411 | MPP1 | LIHC | exon_skip_517757 | SIK3 | BRCA | exon_skip_77623 |
| BSDC1 | SKCM | exon_skip_24351 | MPP1 | LIHC | exon_skip_517758 | SLC44A1 | LUSC | exon_skip_497929 |
| BSDC1 | SKCM | exon_skip_24356 | MSH2 | STAD | exon_skip_325448 | SLC6A9 | BRCA | exon_skip_25867 |
| C16orf70 | OV | exon_skip_137511 | MSI2 | SARC | exon_skip_154718 | SLCO2A1 | LUAD | exon_skip_388682 |
| C20orf96 | LGG | exon_skip_354191 | MYO9B | LUAD | exon_skip_303297 | SMARCA1 | KIRP | exon_skip_516648 |
| CCDC125 | SKCM | exon_skip_442509 | NF1 | PCPG | exon_skip_150648 | SMARCA2 | KIRC | exon_skip_494778 |
| CCDC90B | HNSC | exon_skip_76335 | NF2 | CHOL | exon_skip_364330 | SMARCC1 | HNSC | exon_skip_383264 |
| CD44 | SKCM | exon_skip_57862 | NSFL1C | LUAD | exon_skip_354315 | SNRNP200 | KIRP | exon_skip_341775 |
| CDH1 | BRCA | exon_skip_138094 | OCIAD1 | LGG | exon_skip_423495 | SPCS1 | HNSC | exon_skip_375123 |
| CHD3 | SKCM | exon_skip_148936 | OCRL | LUSC | exon_skip_512323 | STAG2 | GBM | exon_skip_512303 |
| CHEK2 | LUSC | exon_skip_368202 | ODF2 | PAAD | exon_skip_499697 | SUPT3H | SKCM | exon_skip_459890 |
| CNOT6 | LUAD | exon_skip_440379 | PACRGL | LUSC | exon_skip_422774 | TBL1X | UCEC | exon_skip_509098 |
| COL13A1 | LUAD | exon_skip_42139 | PEA15 | LGG | exon_skip_12856 | TBX3 | STAD | exon_skip_96977 |
| COL7A1 | LUSC | exon_skip_383673 | PELI2 | LUSC | exon_skip_106790 | TCTN2 | THCA | exon_skip_88383 |
| COPS3 | SKCM | exon_skip_287002 | PGLYRP2 | LIHC | exon_skip_316122 | TEP1 | LUAD | exon_skip_110964 |
| CREBBP | CESC | exon_skip_141389 | PHKA2 | LUAD | exon_skip_514168 | TERF1 | OV | exon_skip_484059 |
| CTCF | BRCA | exon_skip_137805 | PHKA2 | BRCA | exon_skip_514195 | TNFRSF10A | PAAD | exon_skip_488782 |
| DIAPH3 | SKCM | exon_skip_103986 | PIK3R1 | SKCM | exon_skip_435166 | TP53 | OV | exon_skip_286364 |
| DLG3 | STAD | exon_skip_511069 | PIK3R1 | COAD | exon_skip_435167 | TP53 | OV | exon_skip_286376 |
| DSG2 | LUAD | exon_skip_296226 | PLEC | LUSC | exon_skip_494000 | TP53 | UCS | exon_skip_286379 |
| EEA1 | STAD | exon_skip_95294 | PLEC | SKCM | exon_skip_494000 | TP53 | BRCA | exon_skip_286384 |
| EPB41L4A | LUAD | exon_skip_443476 | PLOD3 | LUSC | exon_skip_478451 | TP53 | KICH | exon_skip_286388 |
| FARP2 | ESCA | exon_skip_336006 | PPP6C | SKCM | exon_skip_506902 | TRAPPC2L | BRCA | exon_skip_139393 |
| FAS | DLBC | exon_skip_43777 | PROM1 | BRCA | exon_skip_428630 | TULP4 | BRCA | exon_skip_455061 |
| FN1 | SKCM | exon_skip_346530 | PTCH1 | ESCA | exon_skip_505102 | UBE2D3 | LIHC | exon_skip_431479 |
| GCSAM | DLBC | exon_skip_386471 | PTEN | COAD | exon_skip_43702 | UBE2E1 | CESC | exon_skip_372157 |
| GLRX3 | LUAD | exon_skip_46116 | PTEN | CESC | exon_skip_43707 | USP40 | LUAD | exon_skip_347441 |
| GNB5 | LUAD | exon_skip_127162 | PTEN | SKCM | exon_skip_43707 | YIF1A | STAD | exon_skip_74538 |
| GPR143 | SKCM | exon_skip_513862 | PTEN | BRCA | exon_skip_43714 | YLPM1 | SKCM | exon_skip_108215 |
| HAUS6 | STAD | exon_skip_502793 | RABL3 | LUSC | exon_skip_386919 | ZNF512B | LUAD | exon_skip_358876 |
Figure 3. Splice-site mutation-associated exon skipping event in RB1. Among 10 recurrent ES events, RB1 showed consistent expression change and differential PSI values relative to non-mutated samples in multiple cancer types. (A) Mutations associated with the exon skipping event (ESID: exon_skip_100319) in RB1 and their location in the gene structure. (B) RNA-seq evidence for the mutation-associated exon skipping event: consistent evidence from the depth of coverage, differential PSI values between mutated and non-mutated samples, and sashimi plots. (C) The analyzed ORF information of individual exon skipping events based on the canonical transcript sequence. (D) The protein functional features based on the canonical protein sequence.
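
The "differential PSI values between mutated and non-mutated samples" in panel (B) can be illustrated with a small sketch. It assumes a simple difference of group means; the source does not say which statistic or test the authors applied, and the function name `delta_psi`, the sample identifiers and the PSI values below are invented for illustration only.

```python
from statistics import mean
from typing import Dict, Set


def delta_psi(psi_by_sample: Dict[str, float], mutated: Set[str]) -> float:
    """Mean PSI of mutated samples minus mean PSI of non-mutated samples.

    A strongly negative value means the exon is included less often
    (skipped more often) in samples carrying the mutation.
    """
    mut = [v for s, v in psi_by_sample.items() if s in mutated]
    wt = [v for s, v in psi_by_sample.items() if s not in mutated]
    if not mut or not wt:
        raise ValueError("both groups need at least one sample with a PSI value")
    return mean(mut) - mean(wt)


if __name__ == "__main__":
    # Invented PSI values for one event; s1 and s2 carry a splice-site mutation.
    psi_values = {"s1": 0.2, "s2": 0.3, "s3": 0.9, "s4": 0.95, "s5": 0.85}
    print(round(delta_psi(psi_values, {"s1", "s2"}), 3))
```

In practice such a difference would be paired with a significance test and with the coverage and sashimi-plot evidence the caption describes.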