Zhuyi Xue, René L Warren, Ewan A Gibb, Daniel MacMillan, Johnathan Wong, Readman Chiu, S Austin Hammond, Chen Yang, Ka Ming Nip, Catherine A Ennis, Abigail Hahn, Sheila Reynolds, Inanc Birol.
Abstract
BACKGROUND: Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe.
Keywords: 3’ UTR; Alternative polyadenylation; Cancer; Cleavage site; Cloud computing; RNA-Seq; The Cancer Genome Atlas; Trans-ABySS; de novo assembly
Year: 2018 PMID: 30005633 PMCID: PMC6045855 DOI: 10.1186/s12864-018-4903-7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Cleavage site predictions. (a) Schematic diagram of the CS prediction pipeline. See Additional file 1: Figure S3A for a description of the CS post-processing step. (b) Count of gene types. (c) Count of TCGA RNA-Seq samples across 33 cancer types (sorted in decreasing order of normal and tumor samples). Sufficient normal: ≥15 samples. Alphabetically, ACC: adrenocortical carcinoma; BLCA: bladder urothelial carcinoma; BRCA: breast invasive carcinoma; CESC: cervical squamous cell carcinoma and endocervical adenocarcinoma; CHOL: cholangiocarcinoma; COAD: colon adenocarcinoma; DLBC: lymphoid neoplasm diffuse large B-cell lymphoma; ESCA: esophageal carcinoma; GBM: glioblastoma multiforme; HNSC: head and neck squamous cell carcinoma; KICH: kidney chromophobe; KIRC: kidney renal clear cell carcinoma; KIRP: kidney renal papillary cell carcinoma; LAML: acute myeloid leukemia; LGG: brain lower grade glioma; LIHC: liver hepatocellular carcinoma; LUAD: lung adenocarcinoma; LUSC: lung squamous cell carcinoma; MESO: mesothelioma; OV: ovarian serous cystadenocarcinoma; PAAD: pancreatic adenocarcinoma; PCPG: pheochromocytoma and paraganglioma; PRAD: prostate adenocarcinoma; READ: rectum adenocarcinoma; SARC: sarcoma; SKCM: skin cutaneous melanoma; STAD: stomach adenocarcinoma; TGCT: testicular germ cell tumors; THCA: thyroid carcinoma; THYM: thymoma; UCEC: uterine corpus endometrioid carcinoma; UCS: uterine carcinosarcoma; UVM: uveal melanoma. (d, e) Validation of our pipeline for predicting CSs. (d) Distribution of the distances between predicted and the closest annotated CSs. (e) Distribution of the distances between a predicted CS and the PAS hexamer motif found within 50 bp upstream. A high-resolution version of this figure is available for download in Additional file 5.
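The measurement in Fig. 1e (distance from a predicted CS to a PAS hexamer within 50 bp upstream) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The motif list (the canonical AATAAA and the common variant ATTAAA), the function name `nearest_upstream_pas`, and the distance convention (bases from the hexamer's first base to the CS) are all assumptions made for illustration; the caption does not specify them.

```python
from typing import Optional, Sequence, Tuple

# Assumption: only the canonical hexamer and its most common variant are searched.
# The caption does not list the motifs actually used.
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def nearest_upstream_pas(
    seq: str,
    cs: int,
    window: int = 50,
    motifs: Sequence[str] = PAS_HEXAMERS,
) -> Optional[Tuple[str, int]]:
    """Return (motif, distance) for the PAS hexamer closest to the cleavage site.

    seq    : transcript-strand sequence, 5' to 3' (hypothetical input format)
    cs     : 0-based index such that seq[:cs] lies upstream of the cleavage site
    window : number of upstream bases to search (50 bp, as in the caption)

    The distance is counted from the first base of the hexamer to the CS.
    Returns None when no motif lies entirely within the window.
    """
    start = max(0, cs - window)
    upstream = seq[start:cs].upper()
    best: Optional[Tuple[str, int]] = None
    for motif in motifs:
        pos = upstream.rfind(motif)  # rightmost hit is the one closest to the CS
        if pos == -1:
            continue
        distance = cs - (start + pos)
        if best is None or distance < best[1]:
            best = (motif, distance)
    return best
```

Collecting the returned distance over all predicted CSs would give a distribution of the kind shown in panel (e), under the assumptions stated above.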
Fig. 2 Selected events of tumor-specific APA regulations that indicate clear 3’ UTR length modulations in cancer. (a) FGF2 in LUAD, a 3’ UTR shortening event. (b) CCNE1 in LUAD, a 3’ UTR lengthening event. (c, d) RNF43 in KIRC (3’ UTR shortening) and UCEC (lengthening). (a-d) Inside each left-hand panel, each group of bars represents the frequency of a specific CS in normal (blue) and tumor (red) samples. Bar groups are ordered by corresponding CS genomic coordinates. The text box shows the number of normal (N) and tumor (T) samples that were used for frequency calculation. The label box color indicates the trend of 3’ UTR length modulation in cancer. At the top, we indicate the number of cancer types with recurrent tumor-specific APA regulations. For example, “4 cancers” means that besides LUAD, tumor-specific APA regulation of FGF2 is also observed in three other cancer types with consistent patterns (see text and Additional file 3: Figure S4 for details). Inside each right-hand panel, the diagram depicts the 3’ end region of each gene with 3’ UTR models directly below the genome axis. The axis direction (right/left) indicates the relative DNA strand (plus/minus); the axis coordinates are offset by that of the gene’s first stop codon. On the axis, arcs show the relationship between CSs and stop codons based solely on annotation. Below the axis, vertical arrows indicate the positions of predicted CSs. Annotated and predicted CSs match well, but they are not expected to overlap exactly. An arrow pointing upwards (downwards) represents an increase (decrease) in frequency from normal to tumor. Arrow height represents the difference (Δ) of the increase/decrease. Bars and arrows of insignificant difference are colored gray. For clarity, CSs that have frequencies lower than 5% in both normal and tumor samples and that do not undergo any significant change in any cancer type considered herein are not shown. For a comprehensive view of all CSs with distribution of gene expression levels, see Additional file 3: Figure S4. A high-resolution version of this figure is available for download in Additional file 5.
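The bar and arrow quantities described in the Fig. 2 legend (per-CS frequency in normal and tumor samples, and their difference Δ) can be sketched as follows. This is a hedged illustration only. It assumes that the frequency of a CS is the fraction of samples in a group in which that CS is detected, and it applies only the 5% display threshold; the significance test mentioned in the legend is not specified here and is therefore left out. The function names and the input format (one set of CS coordinates per sample) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def cs_frequency(samples: List[Set[int]], cs: int) -> float:
    """Fraction of samples in which the cleavage site `cs` is detected.

    Assumption: each sample is represented as the set of CS coordinates
    predicted in it (a hypothetical input format).
    """
    if not samples:
        return 0.0
    return sum(cs in s for s in samples) / len(samples)


def cs_deltas(
    normal: List[Set[int]],
    tumor: List[Set[int]],
    min_freq: float = 0.05,
) -> Dict[int, Tuple[float, float, float]]:
    """Map each CS to (freq_normal, freq_tumor, delta), ordered by coordinate.

    delta = freq_tumor - freq_normal, so a positive value corresponds to an
    upward arrow (increase from normal to tumor). CSs below `min_freq` in
    both groups are dropped; significance testing is not implemented here.
    """
    all_cs = sorted(set().union(*normal, *tumor))
    result: Dict[int, Tuple[float, float, float]] = {}
    for cs in all_cs:
        f_n = cs_frequency(normal, cs)
        f_t = cs_frequency(tumor, cs)
        if f_n < min_freq and f_t < min_freq:
            continue
        result[cs] = (f_n, f_t, f_t - f_n)
    return result
```

In a full analysis, the sample counts N and T shown in each text box would be `len(normal)` and `len(tumor)`, and each Δ would be paired with a significance test before bars are colored or grayed out.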
Fig. 3 Selected events of tumor-specific APA regulations that do not fit the 3’ UTR length modulation paradigm. (a, b) CDKN2A in KIRC and HNSC. (c) EZH2 in LUAD. (d) PTCH1 in BRCA. (a-d) The legend of Fig. 2 applies. In addition, when the 3’ UTR length change is too complex to be resolved into a shortening or lengthening trend, the corresponding text box is left uncolored. NMD-related transcript elements are colored in cyan. An orange arrow indicates that a predicted CS with a significant frequency change is mapped to multiple stop codons, with its associated 3’ UTR length being ambiguous. A high-resolution version of this figure is available for download in Additional file 5.
Fig. 4 Trends of 3’ UTR length modulation across all 77 tumor-specific APA events. The numbers of annotated stop codons and CSs per gene are shown in parentheses. For example, AKT2 (8, 9) means the gene has eight annotated stop codons and nine annotated CSs.