Mark L Crowe, Xue-Qing Wang, Joseph A Rothnagel.
Abstract
BACKGROUND: Approximately 40% of mammalian mRNA sequences contain AUG trinucleotides upstream of the main coding sequence, with a quarter of these AUGs demarcating open reading frames of 20 or more codons. In order to investigate whether these open reading frames may encode functional peptides, we have carried out a comparative genomic analysis of human and mouse mRNA 'untranslated regions' using sequences from the RefSeq mRNA sequence database.
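The two quantities the abstract counts, upstream AUGs (uAUGs) anywhere in the 5' UTR and the subset that open reading frames of 20 or more codons, can be illustrated with a simple scan. This is a minimal sketch, not the authors' pipeline: it assumes that a uORF must end at an in-frame stop codon inside the UTR (frames running into the main coding sequence are skipped) and that length is counted from the AUG up to, but not including, the stop codon. The function names are hypothetical.

```python
# Hypothetical helpers; a sketch under stated assumptions, not the paper's method.
# Assumptions: input is a 5' UTR (RNA or DNA alphabet); a uORF starts at an AUG
# and must end at an in-frame stop codon inside the UTR; ORF length counts codons
# from the AUG up to, but not including, the stop codon.

STOP_CODONS = {"UAA", "UAG", "UGA"}


def normalise(seq: str) -> str:
    """Upper-case the sequence and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def find_uaugs(utr: str) -> list:
    """Return 0-based start positions of every AUG trinucleotide in the UTR."""
    utr = normalise(utr)
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "AUG"]


def find_uorfs(utr: str, min_codons: int = 20) -> list:
    """Return (start, end) pairs for uORFs of at least `min_codons` codons.

    `end` is the index just past the stop codon. Reading frames that run off
    the 3' end of the UTR without meeting a stop codon are skipped.
    """
    utr = normalise(utr)
    orfs = []
    for start in find_uaugs(utr):
        # Walk codon by codon in this AUG's frame until the first stop codon.
        for pos in range(start, len(utr) - 2, 3):
            if utr[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                break
    return orfs


if __name__ == "__main__":
    # Synthetic 5' UTR: AUG + 19 GCU codons (20 codons) followed by a UAA stop.
    utr = "GCC" + "AUG" + "GCU" * 19 + "UAA" + "CC"
    print(find_uaugs(utr))   # [3]
    print(find_uorfs(utr))   # [(3, 66)]
```

Every AUG is reported by `find_uaugs`, while `find_uorfs` keeps only those that open a frame reaching the length threshold, mirroring the abstract's distinction between all uAUGs and the quarter that demarcate longer ORFs.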
Year: 2006 PMID: 16438715 PMCID: PMC1402274 DOI: 10.1186/1471-2164-7-16
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Frequency of upstream AUGs and ORFs.
| Measure (flowchart stage) | Human | Mouse |
|---|---|---|
| Number of mRNA sequences in initial dataset (1) | 16504 | 11291 |
| Number of mRNA sequences containing ≥ 1 uAUG (2) | 9531 | 6352 |
| Total number of uAUGs (3) | 35599 | 24308 |
| Number of mRNA sequences containing ≥ 1 uORF (4) | 4557 | 2820 |
| Total number of uORFs (5) | 8216 | 5487 |
| Number of mRNA sequences containing ≥ 1 uORF after removal of duplicates (6) | 3924 | 2795 |
| Total number of uORFs after removal of duplicates (7) | 7138 | 5430 |
| Number of mRNA sequences containing ≥ 1 uORF after removal of BLAST matches (8) | 3650 | 2678 |
| Total number of uORFs after removal of BLAST matches (9) | 6454 | 5089 |
Frequency of upstream AUGs and ORFs in human and mouse RefSeq mRNA sequences. Initial dataset refers to all sequences of > 60 nucleotides annotated as 5' UTRs. Numbers in parentheses indicate the corresponding stage in the filtering flowchart (figure 1).
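A few ratios make the upstream table easier to read. The sketch below transcribes the human and mouse counts keyed by flowchart stage and derives some proportions. It also checks an interpretive point: at stages (4)/(5), (6)/(7) and (8)/(9) the total number of uORFs is smaller than twice the number of uORF-containing mRNAs, which is only possible if those rows count mRNAs with at least one uORF rather than more than one. The dictionary layout and function names are illustrative.

```python
# Counts transcribed from the upstream AUG/ORF table, keyed by flowchart stage (1)-(9).
UPSTREAM = {
    "human": {1: 16504, 2: 9531, 3: 35599, 4: 4557, 5: 8216,
              6: 3924, 7: 7138, 8: 3650, 9: 6454},
    "mouse": {1: 11291, 2: 6352, 3: 24308, 4: 2820, 5: 5487,
              6: 2795, 7: 5430, 8: 2678, 9: 5089},
}

# (mRNA-count stage, ORF-count stage) pairs describing the same filtering step.
ROW_PAIRS = [(4, 5), (6, 7), (8, 9)]


def summarise(c: dict) -> dict:
    """Derived proportions for one species."""
    return {
        "fraction_of_mrnas_with_uaug": c[2] / c[1],
        "uaugs_per_uaug_containing_mrna": c[3] / c[2],
        "uorfs_per_uorf_containing_mrna": c[5] / c[4],
        "fraction_of_uorfs_surviving_filters": c[9] / c[5],
    }


def rows_mean_at_least_one(c: dict) -> bool:
    """True if total ORFs < 2 x ORF-containing mRNAs at every stage pair,
    which rules out reading the mRNA rows as 'more than one ORF'."""
    return all(c[orfs] < 2 * c[mrnas] for mrnas, orfs in ROW_PAIRS)


for species, counts in UPSTREAM.items():
    stats = summarise(counts)
    print(species, {k: round(v, 2) for k, v in stats.items()},
          rows_mean_at_least_one(counts))
```

With these counts, roughly 58% of the human and 56% of the mouse 5' UTRs in the initial dataset contain at least one uAUG.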
Frequency of downstream AUGs and ORFs.
| Measure (flowchart stage) | Human | Mouse |
|---|---|---|
| Number of mRNA sequences in initial dataset (1) | 21597 | 14790 |
| Number of mRNA sequences containing ≥ 1 dAUG (2) | 19853 | 14327 |
| Total number of dAUGs (3) | 352301 | 229945 |
| Number of mRNA sequences containing ≥ 1 dORF (4) | 16965 | 11598 |
| Total number of dORFs (5) | 85876 | 55815 |
| Number of mRNA sequences containing ≥ 1 dORF after removal of duplicates (6) | 14252 | 11377 |
| Total number of dORFs after removal of duplicates (7) | 69202 | 54403 |
| Number of mRNA sequences containing ≥ 1 dORF after removal of BLAST matches (8) | 13899 | 11299 |
| Total number of dORFs after removal of BLAST matches (9) | 65258 | 52786 |
Frequency of downstream AUGs and ORFs (dAUGs and dORFs) in human and mouse RefSeq mRNA sequences. Initial dataset refers to all sequences of > 60 nucleotides annotated as 3' UTRs. Numbers in parentheses indicate the corresponding stage in the filtering flowchart (figure 1).
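The downstream table can be read step by step through the flowchart. This sketch, with illustrative names, computes the fraction of dORFs kept by each filter (duplicate removal, then removal of BLAST matches) and the mean number of dAUGs per dAUG-containing mRNA.

```python
# Counts transcribed from the downstream AUG/ORF table, keyed by flowchart stage (1)-(9).
DOWNSTREAM = {
    "human": {1: 21597, 2: 19853, 3: 352301, 4: 16965, 5: 85876,
              6: 14252, 7: 69202, 8: 13899, 9: 65258},
    "mouse": {1: 14790, 2: 14327, 3: 229945, 4: 11598, 5: 55815,
              6: 11377, 7: 54403, 8: 11299, 9: 52786},
}


def step_retention(c: dict, stages=(5, 7, 9)) -> list:
    """Fraction of ORFs kept by each consecutive filter: (5)->(7) is duplicate
    removal, (7)->(9) is removal of BLAST matches."""
    return [c[after] / c[before] for before, after in zip(stages, stages[1:])]


for species, counts in DOWNSTREAM.items():
    kept = [round(x, 3) for x in step_retention(counts)]
    daug_density = counts[3] / counts[2]
    print(f"{species}: retention per step {kept}, "
          f"{daug_density:.1f} dAUGs per dAUG-containing mRNA")
```

With these counts, duplicate removal discards about 19% of human dORFs but under 3% of mouse dORFs, while the BLAST filter removes a few percent in each species.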
Comparison of the frequency of AUGs in different contexts.
| Context of AUG present in: | Human optimal (%) | Human strong (%) | Human weak (%) | Mouse optimal (%) | Mouse strong (%) | Mouse weak (%) |
|---|---|---|---|---|---|---|
| All main CDS | 39.8 | 51.4 | 8.8 | 38.7 | 52.6 | 8.6 |
| Conserved uORFs | 24.5 | 44.6 | 30.9 | 24.0 | 42.6 | 30.4 |
| Non-conserved uORFs | 16.5 | 48.8 | 34.7 | 17.1 | 47.5 | 35.3 |
| Main CDS of uORF genes | 33.6 | 52.4 | 13.9 | 32.7 | 53.7 | 13.6 |
| Short uORFs | 12.3 | 51.2 | 36.5 | 12.6 | 50.0 | 37.4 |
| Theoretical uAUG* | 15.4 | 50.4 | 34.2 | 14.8 | 50.2 | 34.9 |
| Conserved dORFs | 11.8 | 47.4 | 40.8 | 11.0 | 45.2 | 40.8 |
| Non-conserved dORFs | 11.4 | 49.0 | 39.6 | 12.0 | 48.0 | 40.1 |
| Main CDS of dORF genes | 39.2 | 51.6 | 9.2 | 38.7 | 52.7 | 8.5 |
| Short dORFs | 8.5 | 48.6 | 42.9 | 8.9 | 47.9 | 43.2 |
| Theoretical dAUG* | 10.6 | 49.7 | 39.7 | 10.6 | 49.6 | 39.6 |
Comparison of the frequency of AUGs in different sequence contexts in uORFs, dORFs and main ORFs. All values are expressed as percentages of the total number of AUGs in that class of ORF.
*Theoretical: the distribution of AUG contexts expected if the nucleotides at the two critical positions were chosen at random according to the nucleotide composition of all UTRs in that category (i.e. 5' UTR composition for upstream AUGs, 3' UTR composition for downstream AUGs).
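The footnoted "theoretical" rows follow from treating the two critical positions as independent draws from the UTR base composition. The sketch below assumes the standard Kozak classification, which this excerpt does not spell out: a purine (A or G) at position −3 (three nucleotides upstream of the A of AUG) and a G at +4 (immediately after the G) give an optimal context, exactly one of the two gives a strong context, and neither gives a weak one. The composition in the example is made up for illustration and is not taken from the paper.

```python
import math


def theoretical_contexts(composition: dict) -> dict:
    """Expected fractions of AUGs in each context class under random sequence.

    `composition` maps each base (A, C, G, U) to its frequency in the UTR set.
    Assumed (Kozak) classes: optimal = purine at -3 and G at +4,
    strong = exactly one of the two, weak = neither.
    """
    if not math.isclose(sum(composition.values()), 1.0):
        raise ValueError("base frequencies must sum to 1")
    p = composition["A"] + composition["G"]  # P(purine at -3)
    q = composition["G"]                     # P(G at +4)
    return {
        "optimal": p * q,
        "strong": p * (1 - q) + (1 - p) * q,
        "weak": (1 - p) * (1 - q),
    }


# Hypothetical 5' UTR composition, for illustration only.
example = {"A": 0.25, "C": 0.30, "G": 0.30, "U": 0.15}
print({k: round(100 * v, 1) for k, v in theoretical_contexts(example).items()})
```

For instance, a purine frequency of 0.55 and a G frequency of 0.30 give 16.5% optimal, 52.0% strong and 31.5% weak; because the three classes partition all outcomes, the expected fractions always sum to 1.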
Figure 2. Proportion of AUG codons in optimal, strong or weak sequence contexts for main coding regions, uORFs conserved between species, and non-conserved uORFs.
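Proportions like those plotted in Figure 2 can be tallied by classifying each AUG from its flanking nucleotides. This is a minimal sketch under the same assumed Kozak scheme (purine at −3, G at +4, counting the A of AUG as +1); the function names and toy sequences are hypothetical.

```python
from collections import Counter
from typing import Optional


def classify_context(seq: str, aug_pos: int) -> Optional[str]:
    """Classify the AUG starting at 0-based `aug_pos` as optimal/strong/weak.

    Assumed scheme: the A of AUG is +1, so -3 is seq[aug_pos - 3] and +4 is
    seq[aug_pos + 3]. Returns None if there is no AUG at `aug_pos` or either
    flanking position falls outside the sequence.
    """
    seq = seq.upper().replace("T", "U")
    if aug_pos < 3 or aug_pos + 3 >= len(seq) or seq[aug_pos:aug_pos + 3] != "AUG":
        return None
    # Count how many of the two assumed critical positions match (0, 1 or 2).
    hits = (seq[aug_pos - 3] in "AG") + (seq[aug_pos + 3] == "G")
    return {2: "optimal", 1: "strong", 0: "weak"}[hits]


def context_proportions(sites: list) -> dict:
    """Percentage of classifiable (seq, aug_pos) sites in each context class."""
    counts = Counter(c for c in (classify_context(s, p) for s, p in sites) if c)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {k: 100 * counts[k] / total for k in ("optimal", "strong", "weak")}


# Toy sites: one optimal, one strong, one weak, one unclassifiable.
toy = [("GCCACCAUGG", 6), ("GCCUCCAUGG", 6), ("GCCUCCAUGC", 6), ("AUGG", 0)]
print(context_proportions(toy))
```

Sites whose flanking positions are unavailable are excluded from the denominator rather than counted as weak, which is one reasonable choice among several.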
Figure 1. Flowchart of the steps used to identify conserved upstream and downstream ORFs. Numbers in parentheses indicate the corresponding counts in Tables 1 and 2.
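Each pair of rows in the two count tables reports the same two quantities after a filtering step: how many mRNAs still carry at least one ORF, and how many ORFs remain. The sketch below illustrates one such step. The duplicate criterion is an assumption (identical ORF nucleotide sequences, keeping the first occurrence), since the excerpt does not define it, and the BLAST-based filter is not reproduced. Record IDs and sequences are hypothetical.

```python
def remove_duplicate_orfs(orfs: list) -> list:
    """Keep the first (mrna_id, orf_seq) record for each distinct ORF sequence.

    Assumed duplicate criterion: identical nucleotide sequence.
    """
    seen = set()
    kept = []
    for mrna_id, orf_seq in orfs:
        if orf_seq not in seen:
            seen.add(orf_seq)
            kept.append((mrna_id, orf_seq))
    return kept


def stage_counts(orfs: list) -> tuple:
    """(mRNAs with at least one ORF, total ORFs), mirroring each row pair in the tables."""
    return len({mrna_id for mrna_id, _ in orfs}), len(orfs)


# Hypothetical records: mRNA_2 carries an exact copy of mRNA_1's ORF.
records = [
    ("mRNA_1", "AUGAAACCCUAA"),
    ("mRNA_2", "AUGAAACCCUAA"),
    ("mRNA_2", "AUGGGGUUUUGA"),
    ("mRNA_3", "AUGCCCAAAUAG"),
]
print(stage_counts(records))                         # (3, 4)
print(stage_counts(remove_duplicate_orfs(records)))  # (3, 3)
```

As in the tables, a filter can lower the ORF total without lowering the number of mRNAs, because an mRNA drops out only when all of its ORFs are removed.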