Yosuke Ito, Hideya Kawaji, Yasuhisa Terao, Shohei Noma, Michihira Tagami, Emiko Yoshida, Yoshihide Hayashizaki, Masayoshi Itoh.
Abstract
Gene expression is controlled at the transcriptional and post-transcriptional levels. The TACC2 gene is known to be associated with tumors, but the control of its expression is unclear. We have reported that activity of the intronic promoter p10 of TACC2 in primary lesions of endometrial cancer is indicative of lymph node metastasis among a low-risk patient group. Here, we analyze the intronic promoter-derived isoforms in JHUEM-1 endometrial cancer cells and in primary tissues of endometrial cancer and normal endometrium. Full-length cDNA amplicons are produced by long-range PCR and subjected to nanopore sequencing followed by computational error correction. We identify 16 stable, 4 variable, and 9 rare exons, including 3 novel exons that were validated independently. All variable and rare exons reside N-terminal to the TACC domain and contribute to isoform variety. We identify 240 high-confidence isoforms, each supported by more than 20 reads. The large number of isoforms produced from one minor promoter indicates post-transcriptional complexity coupled with transcription at the TACC2 locus in cancer and normal cells.
Year: 2021 PMID: 33931666 PMCID: PMC8087818 DOI: 10.1038/s41598-021-88018-9
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The intronic promoter p10 of the TACC2 gene is active in the endometrial cancer cell line JHUEM-1. (A) Genomic view of the transcriptome (RNA-seq and CAGE) of the TACC2 locus with known isoforms. The turquoise background indicates the p10 region; pink and red bars indicate signal intensities of RNA-seq and CAGE. Black and blue boxes indicate exons of gene models that were previously reported and are included in the RefSeq database. (B) Close-up view of the intronic promoter region. (C) Schematic representation of our experiments.
Figure 2. Sequence error correction and identified exons. (A) Schematic representation of sequence error correction. Individual lines with the same color indicate the same isoform, where “x” indicates sequence errors. (B) Sequence matches and mismatches in the alignment before and after sequence error correction, shown in IGV. The pink and blue regions indicate forward and reverse aligned blocks between the reads and the genome. A minor exon, part of the Gencode gene model ENST00000491540, is shown (referred to as rex04). (C) The 29 identified exons: rex01 to rex29. Novel exons are colored in red.
Representative exon boundaries identified by long-read sequencing.
| Start | End | Name | Length (bp) | Symmetric | Frequency | State | Position in a gene model or reported isoform |
|---|---|---|---|---|---|---|---|
| 123,779,335 | 123,779,396 | rex01 (Ex1-2) | 61 | No | Stable | Known | Yoshida et al., 1st exon (p10@TACC2) |
| 123,781,451 | 123,781,529 | rex02 (Ex2) | 78 | Yes | Stable | Known | RefSeq Gene NM_206862.3 2nd exon |
| 123,792,608 | 123,792,668 | rex03 (Ex2-3) | 60 | Yes | Rare | Known | Yoshida et al. short variant 4, 3rd exon; Ensembl ENST00000491540, 2nd exon |
| 123,794,962 | 123,795,075 | rex04 (Ex2-3) | 113 | No | Rare | Novel | NA |
| 123,809,952 | 123,810,065 | rex05 (Ex3) | 113 | No | Stable | Known | RefSeq Gene NM_206862.3 3rd exon |
| 123,822,912 | 123,823,005 | rex06 (Ex3-4) | 93 | Yes | Rare | Novel | NA |
| 123,838,491 | 123,839,151 | rex07 (Ex3-4) | 660 | Yes | Rare | Partially known | Ensembl ENST00000498721, 4th exon |
| 123,842,161 | 123,847,474 | rex08 (Ex4) | 5313 | Yes | Variable | Known | RefSeq Gene NM_206862.3 4th exon |
| 123,847,992 | 123,848,106 | rex09 (Ex5) | 114 | Yes | Variable | Known | RefSeq Gene NM_206862.3 5th exon |
| 123,892,123 | 123,892,249 | rex10 (Ex6) | 126 | Yes | Variable | Known | RefSeq Gene NM_206862.3 6th exon |
| 123,903,086 | 123,903,221 | rex11 (Ex7) | 135 | Yes | Variable | Known | RefSeq Gene NM_206862.3 7th exon |
| 123,943,555 | 123,943,720 | rex12 (Ex7-8) | 165 | Yes | Rare | Novel | NA |
| 123,954,554 | 123,954,691 | rex13 (Ex8) | 137 | No | Stable | Known | RefSeq Gene NM_206862.3 8th exon |
| 123,962,081 | 123,962,141 | rex14 (Ex8-9) | 60 | Yes | Rare | Partially known | SIB Gene, HTR001818.10.1839.19, 8th exon |
| 123,969,388 | 123,969,484 | rex15 (Ex8-9) | 96 | Yes | Rare | Known | RefSeq Gene NM_001291879.1 |
| 123,969,911 | 123,971,223 | rex16 (Ex9) | 1312 | No | Stable | Known | RefSeq Gene NM_206862.3 9th exon |
| 123,972,856 | 123,972,892 | rex17 (Ex9-10) | 36 | Yes | Rare | Known | Yoshida et al. long variant 6, 10th exon; UCSC gene, variant 7, 4th exon |
| 123,974,905 | 123,974,966 | rex18 (Ex10) | 61 | No | Stable | Known | RefSeq Gene NM_206862.3 10th exon |
| 123,976,141 | 123,976,343 | rex19 (Ex11) | 202 | No | Stable | Known | RefSeq Gene NM_206862.3 11th exon |
| 123,984,240 | 123,984,302 | rex20 (Ex12) | 62 | No | Stable | Known | RefSeq Gene NM_206862.3 12th exon |
| 123,985,880 | 123,985,996 | rex21 (Ex13) | 116 | No | Stable | Known | RefSeq Gene NM_206862.3 13th exon |
| 123,987,351 | 123,987,523 | rex22 (Ex14) | 172 | No | Stable | Known | RefSeq Gene NM_206862.3 14th exon |
| 123,988,860 | 123,989,001 | rex23 (Ex15) | 141 | Yes | Rare | Known | RefSeq Gene NM_206862.3 15th exon |
| 123,996,909 | 123,997,053 | rex24 (Ex17) | 144 | Yes | Stable | Known | RefSeq Gene NM_206862.3 17th exon |
| 123,997,475 | 123,997,552 | rex25 (Ex18) | 77 | No | Stable | Known | RefSeq Gene NM_206862.3 18th exon |
| 124,001,472 | 124,001,516 | rex26 (Ex19) | 44 | No | Stable | Known | RefSeq Gene NM_206862.3 19th exon |
| 124,008,157 | 124,008,318 | rex27 (Ex20) | 161 | No | Stable | Known | RefSeq Gene NM_206862.3 20th exon |
| 124,008,564 | 124,008,671 | rex28 (Ex21) | 107 | No | Stable | Known | RefSeq Gene NM_206862.3 21st exon |
| 124,009,058 | 124,009,089 | rex29 (Ex22) | 31 | No | Stable | Known | RefSeq Gene NM_206862.3 22nd exon |
Genomic coordinates (start, end) on chromosome 10 are based on the GRCh37 human genome assembly (hg19). Exon names are prefixed by “rex” (e.g., rex01), with the exon order from a previous study[6] indicated in parentheses (e.g., Ex2 indicates the second exon, and Ex2-3 indicates the intron between the second and third exons). When the number of nucleotides in an exon is a multiple of 3, the exon is referred to as symmetric and could be skipped without a change in the reading frame.
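The length and symmetry columns can be recomputed from the coordinates. The following is a minimal sketch, assuming that length is taken as end minus start (which reproduces the tabulated values); the helper name `exon_summary` is hypothetical and not part of the original analysis.

```python
from __future__ import annotations


def exon_summary(start: int, end: int) -> tuple[int, bool]:
    """Return (length in bp, is_symmetric) for one exon.

    Assumption: length = end - start, which matches the table values.
    An exon is symmetric when its length is a multiple of 3.
    """
    length = end - start
    return length, length % 3 == 0


# Coordinates copied from the table above (chromosome 10, hg19).
exons = {
    "rex01": (123_779_335, 123_779_396),
    "rex02": (123_781_451, 123_781_529),
    "rex08": (123_842_161, 123_847_474),
}

for name, (start, end) in exons.items():
    length, symmetric = exon_summary(start, end)
    print(name, length, "Yes" if symmetric else "No")
```

For rex01 this gives 61 bp (not symmetric), and for rex02 and rex08 it gives 78 bp and 5313 bp (both symmetric), in agreement with the table.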
Variations of exon boundaries.
| Representative exon | Representative length (bp) | Representative symmetric | Variation pattern | Length change (bp) | Variation symmetric |
|---|---|---|---|---|---|
| rex07 (Ex3-4) | 660 | Yes | L0:R-465 | −465 | Yes |
| rex08 (Ex4) | 5313 | Yes | L-4412:R0 | −4412 | No |
| | | | L-4463:R0 | −4463 | No |
| | | | L0:R-2445 | −2445 | Yes |
| | | | L0:R-2656 | −2656 | No |
| | | | L0:R-4536 | −4536 | Yes |
| | | | L0:R-5231 | −5231 | No |
| rex09 (Ex5) | 114 | Yes | L0:R-19 | −19 | No |
| rex11 (Ex7) | 135 | Yes | L-11:R0 | −11 | No |
| | | | L-47:R0 | −47 | No |
| rex16 (Ex9) | 1312 | No | L0:R12 | 12 | Yes |
| rex17 (Ex9-10) | 36 | Yes | L0:R4 | 4 | No |
| rex18 (Ex10) | 61 | No | L-12:R0 | −12 | Yes |
| rex19 (Ex11) | 202 | No | L-13:R0 | −13 | No |
| | | | L0:R93 | 93 | Yes |
| rex21 (Ex13) | 116 | No | L-3:R0 | −3 | Yes |
| rex23 (Ex15) | 141 | Yes | L-4:R0 | −4 | No |
| | | | L0:R4 | 4 | No |
| rex24 (Ex17) | 144 | Yes | L9:R0 | 9 | Yes |
| rex25 (Ex18) | 77 | No | L0:R26 | 26 | No |
| rex26 (Ex19) | 44 | No | L0:R4 | 4 | No |
Patterns of variations are written as “L value 1:R value 2”, where value 1 is the change at the 5′-end (“L”) and value 2 is the change at the 3′-end (“R”). A positive value indicates exon extension, and a negative value indicates exon truncation. For example, L-4412:R0 indicates that the 5′-end is truncated by 4412 bp and the 3′-end is unchanged.
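The pattern notation can be decoded mechanically. The sketch below is illustrative only: the function names `parse_variation` and `apply_variation` are hypothetical, and it assumes that the “Symmetric” flag of a variation refers to the length change being a multiple of 3 (consistent with the rex16 row, where a non-symmetric exon carries a symmetric 12 bp extension).

```python
from __future__ import annotations

import re

# "L<5'-end change>:R<3'-end change>", e.g. "L-4412:R0" or "L0:R12".
_PATTERN = re.compile(r"^L(-?\d+):R(-?\d+)$")


def parse_variation(pattern: str) -> tuple[int, int]:
    """Split a pattern into (5'-end change, 3'-end change) in bp."""
    # Accept the typographic minus sign (U+2212) as well as "-".
    match = _PATTERN.match(pattern.replace("\u2212", "-").strip())
    if match is None:
        raise ValueError(f"unrecognised variation pattern: {pattern!r}")
    return int(match.group(1)), int(match.group(2))


def apply_variation(length: int, pattern: str) -> tuple[int, int, bool]:
    """Return (new length, length change, change is a multiple of 3)."""
    left, right = parse_variation(pattern)
    change = left + right  # positive = extension, negative = truncation
    return length + change, change, change % 3 == 0


# rex08 (5313 bp) truncated by 4412 bp at the 5'-end.
print(apply_variation(5313, "L-4412:R0"))
# rex16 (1312 bp) extended by 12 bp at the 3'-end.
print(apply_variation(1312, "L0:R12"))
```

Under these assumptions, L-4412:R0 shortens rex08 from 5313 bp to 901 bp with a non-symmetric change, whereas L0:R12 lengthens rex16 to 1324 bp with a symmetric change.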
Figure 3. Frequencies of exon–exon junctions of the three novel exons. Black boxes, known exons; red boxes, novel exons. The connections between exons are depicted as arches; frequencies are shown by arch width and numbers.
Figure 4. PCR amplification of novel exons. (A) Schematic representation of PCR amplicons. (B) Electrophoresis of the PCR amplicons. PC and NC indicate the synthesized positive and negative controls shown in Table S6. (a)–(c) indicate the amplicons in (A). The full-length image of the gel is included as Fig. S11.
Figure 5. Isoforms transcribed from the TACC2 intronic promoter. Exon structures of 157 isoforms, of which the 157th is the first to contain the validated novel exon rex12, are indicated by gray or pink boxes in the main (bottom left) panel. Isoform frequencies are shown as box plots in the middle panel, where frequency is calculated by dividing the number of corrected reads corresponding to the isoform by the total number of corrected reads per profile. The isoforms are ordered from the top by their total number of corrected reads, that is, the amount of supporting evidence. The cumulative distribution is shown in the right panel.
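The frequency and cumulative distribution described in this legend can be illustrated for a single profile. This is a minimal sketch with made-up read counts; the isoform labels, the counts, and the function names are hypothetical and do not come from the study.

```python
from __future__ import annotations


def isoform_frequencies(read_counts: dict[str, int]) -> dict[str, float]:
    """Frequency = corrected reads of an isoform / total corrected reads."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("profile contains no corrected reads")
    return {isoform: n / total for isoform, n in read_counts.items()}


def cumulative_distribution(read_counts: dict[str, int]) -> list[tuple[str, float]]:
    """Order isoforms by read support (highest first) and accumulate."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("profile contains no corrected reads")
    ordered = sorted(read_counts.items(), key=lambda item: item[1], reverse=True)
    running = 0
    result = []
    for isoform, n in ordered:
        running += n
        result.append((isoform, running / total))
    return result


# Hypothetical corrected-read counts for one profile.
profile = {"isoform_B": 30, "isoform_A": 50, "isoform_C": 20}
print(isoform_frequencies(profile))
print(cumulative_distribution(profile))
```

In the figure, the ordering uses read totals across profiles, whereas this sketch ranks isoforms within one profile for simplicity.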