Chun-Hsi Chen, Ben-Yang Liao, Feng-Chi Chen.
Abstract
BACKGROUND: Small insertions and deletions ("indels" with size ≤ 100 bp) whose lengths are not multiples of three (non-3n) are strongly constrained and depleted in protein-coding sequences. Such a constraint has never been reported in noncoding genomic regions. In the 5' untranslated regions (5'UTRs) of mammalian genomes, upstream start codons (uAUGs) and upstream open reading frames (uORFs) can regulate protein translation. Non-3n indels in uORFs can potentially disrupt the functions of these regulatory elements. We thus hypothesize that natural selection disfavors non-3n indels in 5'UTRs when these regulatory elements are present.
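The two notions the hypothesis rests on, a non-3n indel and a uAUG sitting upstream of the main start codon, can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and are not taken from the paper; the sketch only assumes that the 5'UTR is given as a string ending immediately before the main start codon.

```python
def is_non_3n(indel_length):
    """Return True when the indel length is not a multiple of three."""
    if indel_length <= 0:
        raise ValueError("indel length must be a positive integer")
    return indel_length % 3 != 0


def find_uaugs(utr5):
    """Return the 0-based positions of every AUG (ATG) in a 5'UTR sequence."""
    seq = utr5.upper().replace("U", "T")
    return [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]


def frame_relative_to_tis(uaug_pos, utr5_length):
    """Reading frame of a uAUG relative to the main start codon (0 = in frame).

    Assumes the main start codon begins at index utr5_length, i.e. directly
    after the last base of the 5'UTR.
    """
    return (utr5_length - uaug_pos) % 3


def frame_after_indel(uaug_pos, utr5_length, indel_length, is_insertion=True):
    """Frame of the uAUG relative to the main start codon after an indel
    that falls between the two."""
    shift = indel_length if is_insertion else -indel_length
    return (utr5_length + shift - uaug_pos) % 3


# Toy example (hypothetical sequence): two uAUGs in an 11-nt 5'UTR.
utr = "GGAUGCCAUGG"
positions = find_uaugs(utr)
frames = [frame_relative_to_tis(p, len(utr)) for p in positions]
```

A 3n indel between a uAUG and the main start codon leaves `frame_relative_to_tis` unchanged, whereas a non-3n indel changes it, which is the sense in which such indels could alter the relationship between an upstream element and the main coding sequence.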
Year: 2011 PMID: 21726469 PMCID: PMC3146882 DOI: 10.1186/1471-2148-11-192
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Figure 1. Classification of upstream open reading frames (uORFs) and potential effects of non-3n indels when they occur between uAUGs and TISs. (A) The open circle represents the 5'-cap structure of the transcript. The solid- and dashed-line open boxes represent the main coding sequences and the uORFs, respectively. The open and solid inverted triangles indicate the locations of the uAUGs and the translation initiation sites (TISs), respectively. The open and solid triangles indicate the locations of the stop codons of the uORFs and of the main coding sequences, respectively. (B) The symbols "○", "×", and "?" indicate that the protein isoform or protein expression is affected, is not affected, or is uncertain, respectively.
Transcripts of human-mouse orthologous genes analyzed in this study
| Category | Type | Randomly-selected 5'UTR (human) | Randomly-selected 5'UTR (mouse) | Longest 5'UTR (human) | Longest 5'UTR (mouse) | Pure 5'UTR (human) | Pure 5'UTR (mouse) |
|---|---|---|---|---|---|---|---|
| Without uORF | G0 | 3,265 (54.0%) | 3,560 (58.9%) | 2,701 (46.6%) | 3,153 (54.4%) | 3,144 (55.2%) | 3,368 (59.1%) |
| Without uORF | Ga | 73 (1.2%) | 61 (1.0%) | 99 (1.7%) | 76 (1.3%) | 38 (0.7%) | 40 (0.7%) |
| Single uORF type | Gs | 1,558 (25.8%) | 1,456 (24.1%) | 1,564 (27.0%) | 1,487 (25.6%) | 1,638 (28.7%) | 1,523 (26.7%) |
| Single uORF type | Gv | 401 (6.6%) | 385 (6.4%) | 356 (6.1%) | 333 (5.7%) | 380 (6.7%) | 345 (6.1%) |
| Multiple types of uORF | | 749 (12.4%) | 584 (9.7%) | 1,080 (18.6%) | 751 (12.9%) | 500 (8.8%) | 424 (7.4%) |
| Total | | 6,046 | 6,046 | 5,800 | 5,800 | 5,700 | 5,700 |
a G0, Ga, Gs, and Gv indicate transcripts without uAUGs, with AISs, with SuAUGs, and with VuAUGs, respectively.
b For each gene, only one transcript is selected (a randomly selected transcript, or the one that has the longest 5'UTR or pure 5'UTR).
Figure 2. Distributions of ISI values of (from left to right) (G. The numbers in parentheses following G0 indicate the median distances of the uAUGs from the 5' cap, expressed as a percentage of 5'UTR length, in the non-G0 transcripts. These proportions of length determine which G0 distributions are used in the comparisons. The P values of pair-wise differences (calculated using the Mann-Whitney U test) are shown at the top. The symbols "*", "**", and "***" represent 0.01 ≦ P < 0.05, 0.001 ≦ P < 0.01, and P < 0.001, respectively.
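The significance symbols used in this caption map onto P value intervals as in the following minimal Python sketch. The function name is hypothetical; the thresholds are the ones stated in the caption, and P values of 0.05 or above are assumed to carry no symbol.

```python
def significance_symbol(p):
    """Map a P value to the symbol convention stated in the caption."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie in [0, 1]")
    if p < 0.001:
        return "***"   # P < 0.001
    if p < 0.01:
        return "**"    # 0.001 <= P < 0.01
    if p < 0.05:
        return "*"     # 0.01 <= P < 0.05
    return ""          # assumption: not significant, no symbol


symbols = [significance_symbol(p) for p in (0.0004, 0.001, 0.03, 0.05)]
```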
Figure 3. Distributions of the . Re-sampling with replacement was performed 1,000 times for each transcript subgroup to derive the P value distributions. The P values were estimated using the Mann-Whitney U test. The dashed line indicates P = 0.05 (or -log10(P) = 1.301). "Fsig" indicates the fraction of P values in the distribution that are smaller than 0.05. The statistical significance of the difference between the P value distributions of different transcript subgroups was estimated using the Kolmogorov-Smirnov test and is shown in the upper right corner of each panel.
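The re-sampling procedure described in this caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the Mann-Whitney U test is computed here with the tie-corrected normal approximation (two-sided, no continuity correction), the two input groups are hypothetical stand-ins for the values being compared, and the function names are invented for the example.

```python
import math
import random


def mann_whitney_p(x, y):
    """Two-sided Mann-Whitney U P value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                              # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n = n1 + n2
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:                             # all values identical
        return 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return math.erfc(abs(z) / math.sqrt(2.0))


def bootstrap_fsig(x, y, n_resamples=1000, alpha=0.05, seed=0):
    """Re-sample both groups with replacement and return (P values, Fsig)."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n_resamples):
        xs = rng.choices(x, k=len(x))          # sampling with replacement
        ys = rng.choices(y, k=len(y))
        p_values.append(mann_whitney_p(xs, ys))
    fsig = sum(p < alpha for p in p_values) / n_resamples
    return p_values, fsig


# The dashed line in the figure: -log10(0.05) is approximately 1.301.
threshold = -math.log10(0.05)

# Hypothetical, clearly separated groups, used only to exercise the code.
group_a = list(range(0, 30))
group_b = list(range(100, 130))
p_values, fsig = bootstrap_fsig(group_a, group_b)
```

Plotting `-math.log10(p)` for each entry of `p_values` would give a distribution comparable in form to the panels described above, with `fsig` corresponding to the share of values lying beyond the dashed line.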