Abstract
Eukaryotic polycistronic transcription units are rare, and only a few examples are known, most of them discovered serendipitously. We claim that a nonsense-mediated mRNA decay (NMD)-immune structure is a common characteristic of polycistronic transcripts, and that this immunity is an emergent property derived from all functional CDSs. The human RefSeq transcriptome was computationally screened for transcripts that are capable of eliciting NMD and that contain one or more additional ORFs potentially capable of rescuing the transcript from NMD. Transcripts were further analyzed using domain-based strategies to estimate the potential of each candidate ORF to encode a functional protein. On this basis, we predict the existence of forty-nine novel polycistronic transcripts. Experimental verification was carried out using two different types of analysis. First, five Gene Expression Omnibus (GEO) datasets from published NMD-inhibition studies were used to explore whether a given mRNA is indeed insensitive to NMD. All known bicistronic transcripts, and eleven of the twelve predicted genes that were analyzed, displayed NMD insensitivity under various NMD inhibitors. For three genes, a mixed expression pattern was observed, with NMD sensitivity in some cell types and insensitivity in others. Second, we used published global translation initiation sequencing data from HEK293 cells to verify the existence of translation initiation sites in our predicted polycistronic genes. In five of our genes, the predicted rescuing uORFs are indeed identified as translation initiation sites, and in two additional genes, one of the two predicted rescuing uORFs is verified. These results validate our computational analysis and reinforce the possibility that NMD-immune architecture is a parameter by which polycistronic genes can be identified.
Moreover, we present evidence for NMD-mediated regulation controlling the production of one or more proteins encoded in the polycistronic transcript.
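The screen described above depends on two decisions: whether a stop codon elicits NMD, and whether another ORF could rescue the transcript. The abstract does not state the exact criteria, so the following Python sketch assumes the widely used rule that a termination codon lying more than 50 nt upstream of the last exon–exon junction triggers NMD. It treats a transcript as a candidate when its annotated CDS elicits NMD but at least one additional ORF terminates beyond that boundary. The function names and the 50 nt threshold are illustrative assumptions, not the authors' pipeline.

```python
from typing import Iterable, List

# Assumed threshold (the common "50-55 nt rule"); not taken from the paper.
NMD_DISTANCE_NT = 50


def elicits_nmd(stop_codon_end: int, exon_ends: List[int],
                threshold: int = NMD_DISTANCE_NT) -> bool:
    """Return True if a stop codon is predicted to trigger NMD.

    stop_codon_end: 1-based transcript coordinate of the last stop-codon nt.
    exon_ends: 1-based coordinates of the last nt of each exon, in order;
        every entry except the final one marks an exon-exon junction.
    """
    junctions = exon_ends[:-1]
    if not junctions:
        # No exon-exon junction, so junction-dependent NMD cannot apply.
        return False
    return junctions[-1] - stop_codon_end > threshold


def is_candidate_polycistronic(cds_stop_end: int,
                               other_orf_stop_ends: Iterable[int],
                               exon_ends: List[int]) -> bool:
    """Hypothetical screen: the annotated CDS elicits NMD, but some other ORF
    terminates late enough that it would not, and so might rescue the mRNA."""
    if not elicits_nmd(cds_stop_end, exon_ends):
        return False
    return any(not elicits_nmd(s, exon_ends) for s in other_orf_stop_ends)
```

For example, with exons ending at 200, 450 and 900, a CDS stopping at 300 lies 150 nt upstream of the last junction and would elicit NMD, while an extra ORF stopping at 700 would not, so the transcript would be flagged.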
Year: 2014 PMID: 24621851 PMCID: PMC3951408 DOI: 10.1371/journal.pone.0091535
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Architecture of known human polycistronic transcripts.
Exon junctions are highlighted in bold, and the coordinates of uncovered exon junctions are indicated in bold; annotated CDSs are shown in turquoise and ORFs in purple. CDS, ORF and transcript coordinates are indicated.
Novel bicistronic transcript candidates identified by 3′ UTR analysis of penultimate or upstream NMD-eliciting transcripts.
| Gene Symbol | Gene Name | Transcript GI | Predicted functional ORF position | Kozak Sequence | InterProScan | BlastP |
| C20orf203 | Chromosome 20 open reading frame 203 | 292658848 | 1876..2109 | | signal peptide 1–19; PTHR12138, family-not-named domain 11–49 | No |
| NAT15 | N-acetyltransferase 15 (GCN5-related, putative) | 134254454 | 1165..1716 | | signal peptide 1–23 | 95% identity with hypothetical protein LOC100609520 [Pan troglodytes] 183 a.a.; 86% identity with hypothetical protein LOC100443079 [Pongo abelii] 223 a.a. |
| | | 134254439 | 1115..1666 | | | |
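The Kozak Sequence column assesses the start-codon context of each predicted ORF. As a hedged illustration, the sketch below applies the common two-position rule (a purine at -3 and a G at +4, counting the A of ATG as +1) and labels a context strong, adequate or weak. The function name and the three-level labels are assumptions; the scoring used in the paper is not given here.

```python
PURINES = {"A", "G"}


def kozak_context(seq: str, atg_index: int) -> str:
    """Classify the context of the ATG that starts at 0-based atg_index.

    Two-position rule (assumed): purine at -3 and G at +4.
    Both present -> "strong", one -> "adequate", neither -> "weak".
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # Positions outside the sequence count as non-matching.
    minus3 = seq[atg_index - 3] if atg_index >= 3 else ""
    plus4 = seq[atg_index + 3] if atg_index + 3 < len(seq) else ""
    hits = (minus3 in PURINES) + (plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[hits]
```

For instance, `kozak_context("GCCACCATGG", 6)` finds A at -3 and G at +4 and returns `"strong"`.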
Exon position of the annotated ATG in human RefSeq transcripts.
| Exon no. | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |
| No. of Transcripts | 17715 | 8908 | 2369 | 687 | 217 | 77 | 35 | 15 | 6 | 1 | 1 | 1 | 1 | 2 |
| % of Total | 59.0 | 29.7 | 7.9 | 2.3 | 0.72 | 0.26 | 0.12 | 0.050 | 0.020 | 0.003 | 0.003 | 0.003 | 0.003 | 0.007 |
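As a consistency check, the "% of Total" row can be recomputed from the transcript counts, which sum to 30,035 transcripts; exons 1 and 2 together account for about 88.6% of annotated start codons. A minimal Python sketch, with the counts copied from the table:

```python
# Transcript counts per exon of the annotated ATG, copied from the table above.
ATG_EXON_COUNTS = {
    1: 17715, 2: 8908, 3: 2369, 4: 687, 5: 217, 6: 77, 7: 35,
    8: 15, 9: 6, 10: 1, 11: 1, 12: 1, 13: 1, 14: 2,
}


def percent_of_total(counts: dict) -> dict:
    """Return each exon's share of all transcripts, as a percentage."""
    total = sum(counts.values())
    return {exon: 100.0 * n / total for exon, n in counts.items()}


if __name__ == "__main__":
    shares = percent_of_total(ATG_EXON_COUNTS)
    for exon, share in shares.items():
        print(f"exon {exon}: {share:.3f}%")
    print(f"exons 1-2 combined: {shares[1] + shares[2]:.1f}%")
```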
Novel human polycistronic transcript candidates identified by 5′ UTR analysis.
| GeneID | Gene Symbol | Gene Name |
| 80823 | BHLHB9 | basic helix-loop-helix domain containing, class B, 9 |
| 6046 | BRD2 | bromodomain containing 2 |
| 84798 | C19orf48 | chromosome 19 open reading frame 48 |
| 9139 | CBFA2T2 | core-binding factor, runt domain, alpha subunit 2; translocated to, 2 |
| 966 | CD59 | CD59 molecule, complement regulatory protein |
| 9425 | CDYL | chromodomain protein, Y-like |
| 56616 | DIABLO | diablo, IAP-binding mitochondrial protein |
| 405754 | ERVFRD-1 | endogenous retrovirus group FRD, member 1 |
| 57579 | FAM135A | family with sequence similarity 135, member A |
| 391059 | FRRS1 | ferric-chelate reductase 1 |
| 81491 | GPR63 | G protein-coupled receptor 63 |
| 3146 | HMGB1 | high mobility group box 1 |
| 3781 | KCNN2 | potassium intermediate/small conductance calcium-activated channel, subfamily N, member 2 |
| 401052 | LOC401052 | hypothetical LOC401052 |
| 8195 | MKKS | McKusick-Kaufman syndrome |
| 318 | NUDT2 | nudix (nucleoside diphosphate linked moiety X)-type motif 2 |
| 5569 | PKIA | protein kinase (cAMP-dependent, catalytic) inhibitor alpha |
| 11272 | PRR4 | proline rich 4 (lacrimal) |
| 80758 | PRR7 | proline rich 7 |
| 5724 | PTAFR | platelet-activating factor receptor |
| 494115 | RBMXL1 | RNA binding motif protein, X-linked-like 1 |
| 5265 | SERPINA1 | serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 |
| 6579 | SLCO1A2 | solute carrier organic anion transporter family, member 1A2 |
| 441273 | SPDYE2 | speedy homolog E2 (Xenopus laevis) |
| 1E+08 | SPDYE2L | WBSCR19-like protein 3 |
| 442578 | STAG3L3 | stromal antigen 3-like 3 |
| 51807 | TUBA8 | tubulin, alpha 8 |
| 347736 | TXNDC6 | thioredoxin domain containing 6 |
| 9724 | UTP14C | UTP14, U3 small nucleolar ribonucleoprotein, homolog C (yeast) |
| 9189 | ZBED1 | zinc finger, BED-type containing 1 |
| 9189 | ZBED1 | zinc finger, BED-type containing 1 |
| 51351 | ZNF117 | zinc finger protein 117 |
| 8187 | ZNF239 | zinc finger protein 239 |
| 339324 | ZNF260 | zinc finger protein 260 |
| 353274 | ZNF445 | zinc finger protein 445 |
| 55769 | ZNF83 | zinc finger protein 83 |
| 162962 | ZNF836 | zinc finger protein 836 |
| 284371 | ZNF841 | zinc finger protein 841 |
Novel polycistronic transcript candidates, sorted alphabetically by gene symbol. Documented genes are highlighted in bold.
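The 5′ UTR screen looks for upstream ORFs (uORFs) that could act as rescuing ORFs. A minimal Python sketch of the scanning step follows. It assumes an mRNA string, a 1-based CDS start, ATG-only initiation and the three standard stop codons, and it reports 1-based inclusive ranges that include the stop codon. Whether the paper's coordinates include the stop codon is not stated, and `find_uorfs` is a hypothetical helper.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_uorfs(mrna: str, cds_start: int,
               min_codons: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) 1-based inclusive coordinates of ATG-initiated ORFs
    that begin in the 5' UTR (before cds_start) and end at an in-frame stop.

    ORFs with no in-frame stop codon before the end of the mRNA are skipped.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for i in range(cds_start - 1):  # 0-based positions inside the 5' UTR
        if seq[i:i + 3] != "ATG":
            continue
        # Walk codon by codon until the first in-frame stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                if (j - i) // 3 >= min_codons:  # sense codons incl. the ATG
                    orfs.append((i + 1, j + 3))
                break
    return orfs
```

For example, for the toy mRNA `"CCATGAAATAGCC" + "ATGGCCTAA"` with the CDS starting at position 14, the sketch reports a single uORF at 3..11.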
NMD sensitivity status of human bicistronic genes in published NMD-inhibition experiments.
| GEO dataset / cell type | Citation | NMD-inhibition method | Gene symbol | Probe ID | Type of transcripts identified | NMD sensitivity |
| GSE1703 HeLa cells | Mendell, J.T. et al., Nat Genet 36, 1073–1078 (2004); PMID: 15448691 | RENT1 siRNA | GDF1-LASS1 | 887_at | bicistronic transcripts (NM_0212673; NM_001492) | NMD insensitive |
| | | | | 888_s_at | monocistronic variant (NM_198207) | NMD insensitive |
| | | | SNRPN-SNURF | 34842_at | both monocistronic and bicistronic variants | NMD insensitive |
| GSE16170 HeLa cells | Choe, J. et al., EMBO Rep 11(5):380–386 (2010); PMID: 20395958 | Ago2 siRNA; UPF1 and Ago2 siRNA | SNRPN-SNURF | ILMN_1656537 | both monocistronic and bicistronic variants | NMD insensitive |
| | | | MTPN-LUZP6 | ILMN_2180682 | bicistronic transcript (NM_145808) | NMD insensitive |
| GSE20491 clear cell renal cell carcinoma | Duns, G. et al., Cancer Res 70(11):4287–4291 (2010); PMID: 20501857 | Emetine or caffeine inhibition | SNRPN-SNURF | ILMN_1660000 | bicistronic transcript (NM_005678) | NMD insensitive |
| | | | MTPN-LUZP6 | ILMN_218068, ILMN_1791478 | bicistronic transcript (NM_145808) | NMD insensitive |
| GSE24204 prostate cancer | Mattila, H., University of Tampere, Finland (unpublished) | Emetine inhibition | GDF1-LASS1 | 25143 | two bicistronic transcripts (NM_0212673; NM_001492) | NMD insensitive |
| | | | MFRP-C1QTNF5 | 37231, 20996 | bicistronic transcripts (NM_031433; NM_015645) | NMD insensitive |
| | | | MTPN-LUZP6 | 4388, 23064, 41236 | bicistronic transcript (NM_145808) | NMD insensitive |
| GSE29788 head and neck cell lines | Sharma, S. et al., Mol Cancer Ther 10(9):1751–1759 (2011); PMID: 21764905 | Emetine inhibition | SNRPN-SNURF | 201522_x_at, 206042_x_at | both monocistronic and bicistronic variants | NMD insensitive |
NMD sensitivity status of human polycistronic predicted genes in published NMD-inhibition experiments.
| GEO dataset / cell type | Citation | NMD-inhibition method | Gene symbol | Probe ID | Type of transcripts identified | NMD sensitivity |
| GSE1703 HeLa cells | Mendell, J.T. et al., Nat Genet 36, 1073–1078 (2004); PMID: 15448691 | RENT1 siRNA | ZNF117 | 36783_f_at | NM_015852 | NMD insensitive |
| | | | UTP14C | 39405_at | UTP14C (chr13) and UTP14 (chrX) genes | NMD insensitive |
| GSE16170 HeLa cells | Choe, J. et al., EMBO Rep 11(5):380–386 (2010); PMID: 20395958 | Ago2 siRNA; UPF1 and Ago2 siRNA | HMGB1 | ILMN_2231242 | NM_002128 | NMD insensitive (both with Ago2 siRNA and with UPF1 + Ago2 siRNAs) |
| | | | UTP14C | ILMN_1686645 | NM_021645 | |
| | | | FRRS1 | ILMN_2214734 | NM_001013660 | |
| | | | LOC401052 | ILMN_1791423 | NM_001008737 | |
| | | | MGC119295 | ILMN_2144654 | NM_001031618 | |
| | | | LOC442578 | ILMN_1791375 | NM_001013739 | |
| GSE20491 clear cell renal cell carcinoma | Duns, G. et al., Cancer Res 70(11):4287–4291 (2010); PMID: 20501857 | Emetine or caffeine inhibition | HMGB1 | ILMN_223124; ILMN_1791466 | NM_002128 | NMD insensitive (both emetine and caffeine; in 10 cell lines) |
| | | | UTP14C | ILMN_1686645 | NM_021645 | NMD insensitive (both emetine and caffeine; in 10 cell lines) |
| GSE24204 prostate cancer (healthy and cancerous cells) | Mattila, H., University of Tampere, Finland (unpublished) | Emetine inhibition | C20orf203 | 27463 | AK091025 | NMD insensitive |
| | | | HMGB1 | 27795, 2170, 7063, 8395 | NM_002128 | NMD insensitive |
| | | | UTP14C | 32662 | NM_021645 | |
| | | | ZNF841 | 39976 | NM_001136499 | |
| | | | TXNDC6 | 7699, 11753, 4719 | NM_178130 | NMD insensitive |
| | | | FRRS1 | 31823 | NM_001013660 | NMD insensitive in healthy cells; |
| | | | LOC401052 | 13485 | NM_001008737 | NMD insensitive |
| | | | ERVFRD-1 | 14886 | NM_207582 | NMD insensitive |
| | | | STAG3L3 | 3563 | NM_001013739 | NMD insensitive |
| GSE29788 head and neck cell lines | Sharma, S. et al., Mol Cancer Ther 10(9):1751–1759 (2011); PMID: 21764905 | Emetine inhibition | HMGB1 | 200679_x_at; 200680_x_at | NM_002128 | NMD insensitive |
| | | | | 214938_x_at | NM_002128 and AF283771 (antisense transcript) | |
| | | | UTP14C | 203614_at | NM_021645 | NMD insensitive |
| | | | ZNF117 | 207117_at; 207605_x_at | NM_015852 | NMD insensitive |
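The NMD sensitivity calls in these tables rest on a simple expectation: NMD targets accumulate when NMD is inhibited (by RENT1/UPF1 knockdown, emetine or caffeine), whereas NMD-insensitive transcripts do not. The criterion actually applied to the GEO data is not described here, so the sketch below uses a hypothetical twofold threshold (log2 fold change ≥ 1) on mean, linear-scale expression values. Both the threshold and the function are assumptions.

```python
import math
from statistics import mean
from typing import Sequence


def classify_nmd_sensitivity(control: Sequence[float],
                             inhibited: Sequence[float],
                             min_log2_fc: float = 1.0) -> str:
    """Hypothetical rule: call a transcript NMD sensitive if its mean
    expression rises by at least min_log2_fc (log2 scale) under NMD
    inhibition. Inputs are replicate values on a linear (not log) scale."""
    log2_fc = math.log2(mean(inhibited) / mean(control))
    return "NMD sensitive" if log2_fc >= min_log2_fc else "NMD insensitive"
```

A probe rising from a mean of 110 to 280 (about 2.5-fold) would be called sensitive, while one moving from 110 to 115 would be called insensitive.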
Human polycistronic transcripts found in the Lee et al. TIS dataset: novel polycistronic transcript candidates found in the Lee et al. translation initiation site (TIS) dataset with an exact match in both ORF start position and length; missing rescuing ORFs are shown in brackets.
| GeneID | Gene Symbol | RefSeq Accession (GI) | Predicted ORF | ORF size (nt) | Line No. in | Line No. in Lee et al. |
| 3953 | LEPR | NM_001003680 (310923183) | 74..184 | 111 | 22 | 9113 |
| | | NM_002303 (310923184) | | | 23 | 9114 |
| | | NM_001003679 (310923185) | | | 24 | 9112 |
| 8195 | MKKS | NM_018848 (25914751) | 261..452 | 192 | 28 | 19993 |
| 9189 | ZBED1 | NM_004729 (57165426) | 43..165 | 123 | 45 | 22240 |
| | | NM_001171136 (283806700) | 43..168 | 126 | 46 | 22242 |
| 80823 | BHLHB9 | NM_030639 (216547631) | 101..211 | 111 | 4 | 19752 |
| | | NM_001142528 (216547671) | 101..226 | 126 | 5 | 19742 |
| 494115 | RBMXL1 | NM_001162536 (242247050) | 378..548 | 171 | 35 | 10363 |
| 84798 | C19orf48 | NM_199249 (40548381) | [139..243] 337..378 | 42 | 7 | 21003 |
| 339324 | ZNF260 | NM_001166036 (260436927) | 201..299 [390..485] | 99 | 49 | 8477 |
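Every ORF size in this table equals end − start + 1 of the predicted 1-based inclusive coordinates (for example 74..184 gives 111 nt), and every size is a multiple of 3. The exact-match criterion in the caption can therefore be sketched as a lookup on (accession, start, length). In the Python below, `exact_tis_matches` is a hypothetical helper, and the TIS record used in the usage note is a toy stand-in. The file format of the Lee et al. dataset is not described here.

```python
from typing import Dict, List, Set, Tuple

Orf = Tuple[str, int, int]  # (accession, start, end), 1-based inclusive


def orf_length(start: int, end: int) -> int:
    """ORF length in nt from 1-based inclusive coordinates."""
    return end - start + 1


def exact_tis_matches(predicted: Dict[str, List[Tuple[int, int]]],
                      tis: Set[Tuple[str, int, int]]
                      ) -> Tuple[List[Orf], List[Orf]]:
    """Split predicted ORFs into those whose (accession, start, length)
    appears in the TIS set and those that do not (shown in brackets above)."""
    found: List[Orf] = []
    missing: List[Orf] = []
    for acc, orfs in predicted.items():
        for start, end in orfs:
            key = (acc, start, orf_length(start, end))
            (found if key in tis else missing).append((acc, start, end))
    return found, missing
```

With the C19orf48 coordinates and a toy TIS set containing only `("NM_199249", 337, 42)`, the 337..378 ORF is reported as found and 139..243 as missing, mirroring the bracketed entry in the table.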