Alexander Churbanov, Igor B Rogozin, Vladimir N Babenko, Hesham Ali, Eugene V Koonin.
Abstract
By comparing sequences of human, mouse and rat orthologous genes, we show that in 5'-untranslated regions (5'-UTRs) of mammalian cDNAs but not in 3'-UTRs or coding sequences, AUG is conserved to a significantly greater extent than any of the other 63 nt triplets. This effect is likely to reflect, primarily, bona fide evolutionary conservation, rather than cDNA annotation artifacts, because the excess of conserved upstream AUGs (uAUGs) is seen in 5'-UTRs containing stop codons in-frame with the start AUG and many of the conserved AUGs are found in different frames, consistent with the location in authentic non-coding sequences. Altogether, conserved uAUGs are present in at least 20-30% of mammalian genes. Qualitatively similar results were obtained by comparison of orthologous genes from different species of the yeast genus Saccharomyces. Together with the observation that mammalian and yeast 5'-UTRs are significantly depleted in overall AUG content, these findings suggest that AUG triplets in 5'-UTRs are subject to the pressure of purifying selection in two opposite directions: the uAUGs that have no specific function tend to be deleterious and get eliminated during evolution, whereas those uAUGs that do serve a function are conserved. Most probably, the principal role of the conserved uAUGs is attenuation of translation at the initiation stage, which is often additionally regulated by alternative splicing in the mammalian 5'-UTRs. Consistent with this hypothesis, we found that open reading frames starting from conserved uAUGs are significantly shorter than those starting from non-conserved uAUGs, possibly, owing to selection for optimization of the level of attenuation.
Year: 2005 PMID: 16186132 PMCID: PMC1236974 DOI: 10.1093/nar/gki847
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Plots of nucleotide triplet conservation in mammalian cDNAs. (A) Human–mouse 5′-UTRs. (B) Mouse–rat 5′-UTRs. (C) Human–mouse CDS. (D) Mouse–rat CDS. (E) Human–mouse 3′-UTR. (F) Mouse–rat 3′-UTR.
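The three-way tally used in the conservation tables (triplet present in one species only, in both, or in the other only) can be illustrated with a minimal sketch. It assumes a gapped pairwise alignment given as two equal-length strings and treats a triplet as conserved only when it occupies the same three alignment columns in both sequences. The function name `classify_triplet` and the toy sequences are hypothetical; the record does not describe the authors' actual alignment or counting procedure.

```python
def classify_triplet(aln_a, aln_b, triplet="AUG"):
    """Tally alignment positions where `triplet` occurs in A only, in both, or in B only.

    Assumption: both inputs are rows of one pairwise alignment ('-' marks a gap).
    A window containing a gap can never equal the triplet, so it is never counted.
    """
    counts = {"a_only": 0, "conserved": 0, "b_only": 0}
    for i in range(min(len(aln_a), len(aln_b)) - 2):
        in_a = aln_a[i:i + 3] == triplet
        in_b = aln_b[i:i + 3] == triplet
        if in_a and in_b:
            counts["conserved"] += 1
        elif in_a:
            counts["a_only"] += 1
        elif in_b:
            counts["b_only"] += 1
    return counts


# Hypothetical toy alignment: the first AUG is shared, the second is in A only.
toy_a = "GCAUGCC-AUGA"
toy_b = "GCAUGCCGAUCA"
print(classify_triplet(toy_a, toy_b))
```

Running the same function with each of the shuffled triplets (AGU, GUA, GAU, UAG, UGA) would give the control counts against which AUG is compared.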
Preferential conservation of uAUGs in orthologous human and mouse 5′-UTRs
| Trinucleotide | Triplet present in human but not in mouse (%) | Triplet conserved in human and mouse (%) | Triplet present in mouse but not in human (%) |
|---|---|---|---|
| AUG | 8 | 85 | 7 |
| AGU | 19 | 61 | 20 |
| GUA | 24 | 54 | 23 |
| GAU | 17 | 68 | 15 |
| UAG | 21 | 59 | 20 |
| UGA | 16 | 69 | 15 |
Fisher's exact test: 1266 conserved uAUGs, 218 non-conserved uAUGs, 5690 conserved AGU/GUA/GAU/UAG/UGAs, 3219 non-conserved AGU/GUA/GAU/UAG/UGAs; P = 1.4 × 10⁻⁶⁶.
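The 2 × 2 comparison behind this footnote can be sketched with the standard library alone. The sketch below computes a one-sided Fisher's exact test by summing hypergeometric probabilities in log space. Whether the reported P value is one-sided or two-sided is not stated in this record, so the one-sided choice here is an assumption and the result need not match the published figure exactly. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the table [[a, b], [c, d]] with all margins fixed.

    X is the top-left cell count under the hypergeometric null hypothesis.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denom = log_comb(n, col1)
    return sum(
        exp(log_comb(row1, k) + log_comb(n - row1, col1 - k) - log_denom)
        for k in range(a, min(row1, col1) + 1)
    )


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a*d)/(b*c) of the 2 x 2 table."""
    return (a * d) / (b * c)


# Counts from the footnote: conserved / non-conserved uAUGs versus
# conserved / non-conserved shuffled triplets (human-mouse 5'-UTRs).
table = (1266, 218, 5690, 3219)
print("conserved fraction, uAUG:    ", round(1266 / (1266 + 218), 3))
print("conserved fraction, shuffled:", round(5690 / (5690 + 3219), 3))
print("odds ratio:", round(odds_ratio(*table), 2))
print("one-sided P:", fisher_one_sided(*table))
```

The conserved fractions (about 0.85 for AUG against about 0.64 for the pooled shuffled triplets) agree with the percentages in the table, and the odds ratio of roughly 3.3 summarizes the size of the effect independently of the P value.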
Preferential conservation of uAUGs in mouse and rat orthologous 5′-UTRs
| Trinucleotide | Triplet present in mouse but not in rat (%) | Triplet conserved in mouse and rat (%) | Triplet present in rat but not in mouse (%) |
|---|---|---|---|
| AUG | 9 | 81 | 10 |
| AGU | 17 | 65 | 18 |
| GUA | 20 | 57 | 23 |
| GAU | 15 | 69 | 16 |
| UAG | 18 | 62 | 20 |
| UGA | 13 | 72 | 15 |
Fisher's exact test: 1539 conserved uAUGs, 359 non-conserved uAUGs, 11 218 conserved AGU/GUA/GAU/UAG/UGAs, 5703 non-conserved AGU/GUA/GAU/UAG/UGAs; P = 2.9 × 10⁻⁴².
Figure 2. A mammalian 5′-UTR with conserved uAUGs and an in-frame stop codon. The alignment of the 5′-UTRs of human and mouse α1 type III collagen proprotein (GI numbers: 15 149 480 and 33 859 525, respectively). uAUGs are colored yellow, the collagen start codon is colored green, and the open reading frames are colored gray. The protein-coding region cannot be extended in the 5′ direction because of the presence of a conserved in-frame UGA stop codon (dark gray).
Preferential conservation of uAUGs in human–mouse stop-codon-bounded 5′-UTR alignments (5′-UTR<stop)
| Trinucleotide | Triplet present in human, but not in mouse (%) | Triplet conserved in human and mouse (%) | Triplet present in mouse, but not in human (%) |
|---|---|---|---|
| AUG | 9 | 84 | 7 |
| AGU | 17 | 69 | 14 |
| GUA | 20 | 62 | 18 |
| GAU | 17 | 70 | 13 |
| UAG | 15 | 70 | 15 |
| UGA | 16 | 72 | 12 |
Fisher's exact test: 255 conserved uAUGs, 49 non-conserved uAUGs, 1124 conserved AGU/GUA/GAU/UAG/UGAs, 491 non-conserved AGU/GUA/GAU/UAG/UGAs; P = 1.52 × 10⁻⁷.
Preferential conservation of uAUGs in mouse–rat stop-codon-bounded 5′-UTR alignments (5′-UTR<stop)
| Trinucleotide | Triplet present in mouse, but not in rat (%) | Triplet conserved in mouse and rat (%) | Triplet present in rat, but not in mouse (%) |
|---|---|---|---|
| AUG | 11 | 80 | 9 |
| AGU | 15 | 67 | 18 |
| GUA | 18 | 57 | 25 |
| GAU | 16 | 68 | 16 |
| UAG | 17 | 62 | 21 |
| UGA | 14 | 70 | 16 |
Fisher's exact test: 499 conserved uAUGs, 124 non-conserved uAUGs, 3262 conserved AGU/GUA/GAU/UAG/UGAs, 1675 non-conserved AGU/GUA/GAU/UAG/UGAs; P = 3.53 × 10⁻¹³.
Phase distribution of conserved uAUGs in alignments of mammalian 5′-UTRs
| | Phase 0 | Phase 1 | Phase 2 |
|---|---|---|---|
| Human/mouse | | | |
| Phase 0 | 221 | 60 | 72 |
| Phase 1 | 94 | 282 | 94 |
| Phase 2 | 54 | 89 | 350 |
| Mouse/rat | | | |
| Phase 0 | 267 | 88 | 100 |
| Phase 1 | 66 | 350 | 93 |
| Phase 2 | 81 | 99 | 395 |
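One way to read the phase table is to ask what share of conserved uAUGs sit in the same phase in both species, which corresponds to the diagonal of each 3 × 3 block. The sketch below does that arithmetic on the counts above. It assumes that rows give the phase in one species of the pair and columns the phase in the other; the record does not say which species is which, and the function name is hypothetical.

```python
# Counts copied from the table; rows and columns are phases 0, 1, 2.
human_mouse = [
    [221, 60, 72],
    [94, 282, 94],
    [54, 89, 350],
]
mouse_rat = [
    [267, 88, 100],
    [66, 350, 93],
    [81, 99, 395],
]


def same_phase_fraction(matrix):
    """Fraction of all counts that lie on the diagonal (same phase in both species)."""
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return diagonal / total


for label, matrix in (("human/mouse", human_mouse), ("mouse/rat", mouse_rat)):
    print(label, round(same_phase_fraction(matrix), 3))
```

Under this reading, roughly two-thirds of the conserved uAUGs keep the same phase and about one-third do not, in line with the abstract's remark that many conserved AUGs are found in different frames.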
Phase distribution of conserved uAUGs in stop-codon-bounded alignments of mammalian 5′-UTRs (5′-UTR<stop)
| | Phase 0 | Phase 1 | Phase 2 |
|---|---|---|---|
| Human/mouse | | | |
| Phase 0 | 50 | 23 | 34 |
| Phase 1 | 36 | 42 | 22 |
| Phase 2 | 30 | 32 | 36 |
| Mouse/rat | | | |
| Phase 0 | 64 | 60 | 56 |
| Phase 1 | 59 | 65 | 54 |
| Phase 2 | 58 | 37 | 72 |
Figure 3. Length distributions of human uORFs starting with conserved uAUGs, non-conserved uAUGs and pseudo-ORFs starting with uGAUs. The ORF length is represented by bins, each including 10 codons (i.e. bin 1 includes ORFs from 0 to 10 codons, bin 2 ORFs from 11 to 20 codons and so on).
Preferential conservation of uAUGs in yeast orthologous 5′-UTRs
| Trinucleotide | Conserved triplets (%) | Non-conserved triplets (%) |
|---|---|---|
| AUG | 30 | 70 |
| AUG (5′-UTR<stop) | 28 | 72 |
| AUG (5′-UTR<stop + cDNA) | 36 | 64 |
| AGU | 9 | 91 |
| GUA | 7 | 93 |
| GAU | 9 | 91 |
| UAG | 10 | 90 |
| UGA | 10 | 90 |
Triplets were considered when present in the aligned sequences from all four yeast species. The AUG data set consisted of alignments of genomic sequences located upstream (50 nt) of sAUGs. The 5′-UTR
Expected and observed numbers of uAUG triplets and shuffled triplets per 1000 nt in mammalian and yeast 5′-UTRs
| Species, triplets | Expected | Observed |
|---|---|---|
| Human uAUGs | 12.6 | 7.4 |
| Human AGU/GUA/GAU/UAG/UGAs | 63.0 | 47.7 |
| Mouse uAUGs | 12.6 | 6.9 |
| Mouse AGU/GUA/GAU/UAG/UGAs | 63.0 | 48.2 |
| Rat uAUGs | 12.7 | 7.6 |
| Rat AGU/GUA/GAU/UAG/UGAs | 63.5 | 49.4 |
| Yeast uAUGs | 17.7 | 10.6 |
| Yeast AGU/GUA/GAU/UAG/UGAs | 88.5 | 73.6 |
The statistical significance of the differences between expected and observed frequencies of uAUG and shuffled triplets was estimated using the χ² test (2 × 2 tables). In all cases, the difference for uAUGs was significantly greater than that for the combined shuffled triplets (P < 0.001).
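The depletion described here can be made concrete by dividing each observed density by its expected density. The sketch below does only that arithmetic on the per-1000-nt values from the table. It does not reproduce the χ² test, which would need the raw triplet counts that this record does not give. The dictionary layout and the helper name are hypothetical.

```python
# (expected, observed) triplets per 1000 nt, copied from the table.
densities = {
    "human": {"uAUG": (12.6, 7.4), "shuffled": (63.0, 47.7)},
    "mouse": {"uAUG": (12.6, 6.9), "shuffled": (63.0, 48.2)},
    "rat": {"uAUG": (12.7, 7.6), "shuffled": (63.5, 49.4)},
    "yeast": {"uAUG": (17.7, 10.6), "shuffled": (88.5, 73.6)},
}


def observed_over_expected(pair):
    """Ratio observed/expected; values below 1 indicate depletion."""
    expected, observed = pair
    return observed / expected


for species, rows in densities.items():
    uaug = observed_over_expected(rows["uAUG"])
    shuffled = observed_over_expected(rows["shuffled"])
    print(f"{species}: uAUG {uaug:.2f}, shuffled {shuffled:.2f}")
```

In every species the ratio for uAUGs (about 0.55 to 0.60) is lower than the ratio for the pooled shuffled triplets (about 0.76 to 0.83), which is the pattern the footnote reports as significant.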