Ivaylo P Ivanov, Andrew E Firth, Audrey M Michel, John F Atkins, Pavel V Baranov.
Abstract
In eukaryotes, it is generally assumed that translation initiation occurs at the AUG codon closest to the messenger RNA 5' cap. However, in certain cases, initiation can occur at codons differing from AUG by a single nucleotide, especially the codons CUG, UUG, GUG, ACG, AUA and AUU. While non-AUG initiation has been experimentally verified for a handful of human genes, the full extent to which this phenomenon is utilized (both for increased coding capacity and potentially also for novel regulatory mechanisms) remains unclear. To address this issue, and hence to improve the quality of existing coding sequence annotations, we developed a methodology based on phylogenetic analysis of predicted 5' untranslated regions from orthologous genes. We use evolutionary signatures of protein-coding sequences as an indicator of translation initiation upstream of annotated coding sequences. Our search identified novel conserved potential non-AUG-initiated N-terminal extensions in 42 human genes including VANGL2, FGFR1, KCNN4, TRPV6, HDGF, CITED2, EIF4G3 and NTF3, and also affirmed the conservation of known non-AUG-initiated extensions in 17 other genes. In several instances, we have been able to obtain independent experimental evidence of the expression of non-AUG-initiated products from the previously published literature and ribosome profiling data.
Year: 2011 PMID: 21266472 PMCID: PMC3105428 DOI: 10.1093/nar/gkr007
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 2. Pipeline of RefSeq mRNA analysis for the identification of conserved 5′ CDS extensions (P5ECs). White boxes indicate annotated CDSs. Black boxes correspond to 5′ in-frame codon extensions up to the closest in-frame stop codon. Xs correspond to the deleted regions of human–mouse alignments prior to Ka/Ks analysis.
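The extension step described in this caption (walking upstream of the annotated start codon, in frame, until the closest in-frame stop codon) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names `upstream_extension` and `candidate_starts`, the 0-based `cds_start` coordinate and the example sequence are all assumptions made for the sketch. The near-cognate codons are the six named in the abstract, written as DNA.

```python
# Minimal sketch (assumed names and conventions, not the published pipeline).
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Non-AUG initiators listed in the abstract (CUG, UUG, GUG, ACG, AUA, AUU) as DNA.
NEAR_COGNATE = {"CTG", "TTG", "GTG", "ACG", "ATA", "ATT"}


def upstream_extension(mrna, cds_start):
    """Return (ext_start, codons) for the in-frame region upstream of cds_start.

    cds_start is the 0-based index of the first nucleotide of the annotated
    start codon (an assumed convention). The walk goes back one codon at a
    time and halts at the closest in-frame stop codon or at the 5' end.
    """
    seq = mrna.upper().replace("U", "T")
    pos = cds_start
    while pos - 3 >= 0 and seq[pos - 3:pos] not in STOP_CODONS:
        pos -= 3
    codons = [seq[i:i + 3] for i in range(pos, cds_start, 3)]
    return pos, codons


def candidate_starts(mrna, cds_start):
    """List (position, codon) for near-cognate codons inside the extension."""
    ext_start, codons = upstream_extension(mrna, cds_start)
    return [(ext_start + 3 * i, codon)
            for i, codon in enumerate(codons)
            if codon in NEAR_COGNATE]


# Hypothetical toy transcript: G | TAA | CTG GCC ACC | ATG GCC
toy = "GTAACTGGCCACCATGGCC"
print(upstream_extension(toy, 13))
print(candidate_starts(toy, 13))
```

The sketch only enumerates candidate initiators. Deciding which of them is used would need the conservation evidence described in the paper.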
Ranking of newly identified and known non-AUG-initiated N-terminal extensions
Column 1, combined ranking; column 2, GenBank accession number; column 3, gene name; column 4, predicted non-AUG initiation codon in human (see Supplementary Dataset 1 for full details); column 5, length in codons of predicted N-terminal extension in human; column 6, MLOGD score (negative values are shown in red and indicate candidates subject to weak or no purifying selection); column 7, P-value for MLOGD score based on randomizations; column 8, Ka/Ks ratio (with gap-containing columns removed from the alignment); column 9, P-value for Ka/Ks based on randomizations; column 10, BLAST bits score measured on the alignment of the human and mouse extensions; column 11, probability of the ORF of the extension being preserved by chance if non-coding. Note that the MLOGD score scales with alignment length and divergence and the BLAST bits score scales with length. The final ranking is based on the average of rankings by the individual scores (MLOGD, Ka/Ks and BLAST bits). Columns 12 and 13 give information on the number of mRNA fragments protected by ribosomes for extensions (column 12) and annotated CDSs (column 13). The absolute number of footprints and density of footprints are separated by a backslash. Density is calculated as the absolute number divided by the length of the mRNA fragment (extension or CDS). ‘ND’, not detected. A) Ranking of the 42 newly identified non-AUG extensions.
GenBank accession numbers highlighted in green represent extensions that are conserved beyond mammals and for which all available sequences from vertebrates appear to utilize non-AUG initiation codon(s); accession numbers highlighted in light blue represent extensions that are conserved beyond mammals and for which some or all non-mammalian sequences appear to utilize AUG instead of non-AUG initiation; accession numbers highlighted in magenta represent extensions that are initiated by AUG codons in at least some mammals; accession numbers highlighted in yellow represent extensions that are conserved only in mammals (in some cases only eutherian mammals) and which are never initiated by AUG codons. B) Ranking of previously reported non-AUG extensions.
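The two calculations named in the table legend, the combined ranking (average of the ranks by MLOGD, Ka/Ks and BLAST bits) and the footprint density (footprint count divided by fragment length), can be sketched as below. Several details are assumptions made for the illustration and are not stated in the legend: higher MLOGD and BLAST bits are treated as better, lower Ka/Ks is treated as better, tied values share the best rank, and the candidate names and scores are invented.

```python
# Minimal sketch of rank averaging; score directions and tie handling are assumptions.

def rank(values, higher_is_better):
    """Rank 1 is best; tied values share the best available rank (assumed)."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 for v in values]


def combined_ranking(candidates):
    """Sort candidates by the mean of their MLOGD, Ka/Ks and BLAST-bits ranks.

    candidates: list of dicts with keys 'name', 'mlogd', 'ka_ks', 'bits'.
    Returns a list of (name, mean_rank), best first.
    """
    r_mlogd = rank([c["mlogd"] for c in candidates], higher_is_better=True)
    r_kaks = rank([c["ka_ks"] for c in candidates], higher_is_better=False)
    r_bits = rank([c["bits"] for c in candidates], higher_is_better=True)
    scored = [(c["name"], (a + b + d) / 3)
              for c, a, b, d in zip(candidates, r_mlogd, r_kaks, r_bits)]
    return sorted(scored, key=lambda item: item[1])


def footprint_density(footprints, fragment_length):
    """Absolute footprint count divided by the length of the mRNA fragment."""
    return footprints / fragment_length


# Invented example values, for illustration only.
example = [
    {"name": "A", "mlogd": 10.0, "ka_ks": 0.2, "bits": 80},
    {"name": "B", "mlogd": 25.0, "ka_ks": 0.1, "bits": 60},
    {"name": "C", "mlogd": -1.0, "ka_ks": 0.9, "bits": 30},
]
print(combined_ranking(example))
print(footprint_density(12, 60))
```

Averaging ranks rather than raw scores keeps any one metric from dominating. This matters here because, as the legend notes, the MLOGD and BLAST bits scores scale with length.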
Figure 1. Five known molecular mechanisms responsible for the initiation of translation upstream of the first 5′ in-frame AUG codon. mRNAs are shown as horizontal lines. Dark grey boxes represent annotated CDS regions. Light grey boxes represent extensions of CDSs upstream of annotated AUG codons up to the closest in-frame stop codon. Black boxes denoted as P5EC represent upstream regions where codons in-frame with annotated CDSs evolve under purifying selection. Diagonal stripes are used to denote alternatively spliced exons.
Figure 3. Histogram of Ka/Ks values for mRNA sequences with known 5′ extensions. White bars represent mRNAs for which alternative transcripts with extended CDSs are known and therefore corresponding extensions are known to be translated in alternative transcripts. Sequences of these extensions are expected to evolve as protein coding sequences and were used as an internal control in this study. Black bars represent the remaining mRNAs for which it is not known whether alternative mRNA isoforms exist. Curves indicate the number of genes (y-axis) with Ka/Ks below a particular value (x-axis).
Figure 4. Scatter plots of Ka/Ks ratios for the alignments of the sequences corresponding to P5ECs from different mRNAs (y-axis) in relation to the level of protein identity (bottom panels), and the lengths of P5ECs (top panels). The right-hand panels correspond to mRNAs for which transcript variants with 5′-extended CDSs are known. The left panels correspond to the remaining mRNAs.
Figure 5. Boxplots of non-AUG CDS extension length distributions for previously known cases and those identified in this study.
Figure 6. Weblogo representation of the region surrounding the known and putative conserved non-AUG initiation sites in humans. Numbering is relative to the first nucleotide of the start codon. (A) Representation for the 42 sequences with newly identified extensions. (B) Representation for the 17 sequences with previously identified and conserved extensions. (C) Representation of all AUG start sites of humans [the frequencies for nucleotide occurrence at each position for the human mRNAs were obtained from the Transterm database (73)].
Figure 7. Plots showing density of mRNA fragments protected by ribosomes for NM_004494 and NM_001010858. The position of the annotated AUG codon was taken as zero; relative coordinates of stop codons and predicted non-AUG initiators are indicated. Regions corresponding to annotated CDSs are highlighted in dark grey; regions corresponding to non-AUG-initiated extensions are highlighted in light grey. The presence of ribosomal footprints in the region of an extension indicates that the initiation of translation takes place upstream of the annotated CDS.