Michael K Tran, Carolyn J Schultz, Ute Baumann.
Abstract
BACKGROUND: Upstream open reading frames (uORFs) can down-regulate the translation of the main open reading frame (mORF) through two broad mechanisms: ribosomal stalling and reduced reinitiation efficiency. In distantly related plants, such as rice and Arabidopsis, conserved uORFs have been found to be rare, with approximately 100 loci in these transcriptomes. It is unclear how prevalent conserved uORFs are in closely related plants.
Year: 2008 PMID: 18667093 PMCID: PMC2527020 DOI: 10.1186/1471-2164-9-361
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the uORFSCAN pipeline. The pipeline consists of four steps: (1) identifying putative orthologues using a modified reciprocal best hit (RBH) method; (2) clustering orthologues according to how many cereal species they are found in; (3) using the uORFSCAN program to find conserved uORFs with a comparative approach; and (4) manual curation of the predicted conserved cereal and Arabidopsis uORFs.
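Steps 1 and 2 of the pipeline can be sketched in Python. This is a minimal illustration, not uORFSCAN itself: the paper uses a modified reciprocal best hit method whose modification is not described here, so the code implements only the plain RBH rule. The input format (query, subject, bit score tuples), the function names and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def best_hits(hits):
    """Map each query ID to its highest-scoring subject ID.

    hits: iterable of (query_id, subject_id, bit_score) tuples, e.g. parsed
    from tabular BLAST output (hypothetical input format).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_to_b, b_to_a):
    """Plain RBH: keep (a, b) only if b is a's best hit and a is b's best hit."""
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


def cluster_by_species_count(orthologues):
    """Group rice genes by how many cereal species (rice included) they occur in.

    orthologues: {species_name: {rice_id: orthologue_id}} built from RBH pairs.
    """
    species_per_gene = defaultdict(set)
    for species, mapping in orthologues.items():
        for rice_id in mapping:
            species_per_gene[rice_id].add(species)
    clusters = defaultdict(list)
    for rice_id, species in species_per_gene.items():
        clusters[len(species) + 1].append(rice_id)  # +1 counts rice itself
    return {n: sorted(ids) for n, ids in clusters.items()}


# Hypothetical rice-vs-wheat and wheat-vs-rice hits
rice_vs_wheat = [("r1", "w1", 250.0), ("r1", "w2", 90.0), ("r2", "w2", 180.0)]
wheat_vs_rice = [("w1", "r1", 245.0), ("w2", "r1", 100.0), ("w2", "r2", 175.0)]
pairs = reciprocal_best_hits(best_hits(rice_vs_wheat), best_hits(wheat_vs_rice))
print(pairs)  # [('r1', 'w1'), ('r2', 'w2')]
print(cluster_by_species_count({"wheat": dict(pairs), "barley": {"r1": "b1"}}))
# {3: ['r1'], 2: ['r2']}
```

A gene clustered under 5 would correspond to the "5 out of 5" dataset, i.e. an orthologue found in all four other cereals.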
uORFs predicted by uORFSCAN that are conserved in 5 out of 5 cereals (rice, wheat, barley, maize and sorghum)
| Rice identifier | Rice 5'-UTR^a | Wheat identifier | Wheat 5'-UTR^a | Barley identifier | Barley 5'-UTR^a | Maize identifier | Maize 5'-UTR^a | Sorghum identifier | Sorghum 5'-UTR^a | Avg. A.A. similarity (%) | Putative function^b |
|---|---|---|---|---|---|---|---|---|---|---|---|
| AK106095 | 131_9_17 | TC265929 | 113_9_16 | TC148181 | 67_9_16 | TC288369 | 131_9_17 | TC102998 | 149_9_17 | 100 | Oxidoreductase |
| AK103391^c | 205_75_74 | TC269775 | 251_75_62 | TC134190 | 204_75_62 | TC294011 | 215_75_75 | TC103599 | 106_75_378 | 88 | Trehalose-6-phosphate phosphatase |
| AK100589^d,e | 240_9_334 | TC264559 | 201_9_317 | TC130707 | 228_9_318 | TC292591 | 286_9_320 | TC91317 | 260_9_329 | 50 | S-adenosylmethionine decarboxylase (AdoMetDC) |
| | 248_156_179 | | 209_150_168 | | 236_150_169 | | 294_153_168 | | 268_153_177 | 90 | |
| | 296_108_179 | | 254_105_168 | | 281_105_169 | | 336_111_168 | | 310_111_177 | 92 | |
| AK073303 | 67_9_142 | TC237149 | 75_9_113 | TC132556 | 81_9_139 | TC305609 | 127_9_69 | TC102988 | 222_9_69 | 50 | Alkaline phytoceramidase |
| | 135_9_74 | | 75_9_113 | | 81_9_139 | | 127_9_69 | | 222_9_69 | 50 | |
| AK072868^e | 249_27_248 | TC247418 | 258_27_266 | TC139536 | 298_27_272 | TC306591 | 444_27_564 | TC102544 | 331_27_265 | 11 | CBL-interacting protein kinase |
| | 259_195_70 | | 268_198_85 | | 308_198_91 | | 260_192_583 | | 341_195_87 | 29 | |
| | 269_39_216 | | 278_39_234 | | 318_39_240 | | 768_39_228 | | 351_39_233 | 8 | |
| | 338_90_96 | | 347_93_111 | | 387_93_117 | | 576_93_366 | | 420_90_113 | 10 | |
| | 392_36_96 | | 404_36_111 | | 444_36_117 | | 633_36_366 | | 474_36_113 | 8 | |
| AK072649 | 100_192_117 | TC236348 | 79_192_117 | TC133316 | 76_192_93 | TC305793 | 180_192_116 | TC93140 | 168_192_116 | 78 | Ribosomal protein S6 kinase |
| AK066145 | 178_12_58 | TC266262 | 149_12_73 | TC134484 | 154_12_231 | TC286452 | 224_12_70 | TC94546 | 187_12_69 | 33 | Ubiquitin-fold protein |
| AK064792 | 276_15_187 | TC267323 | 254_15_188 | TC132983 | 253_15_-9 | TC306152 | 263_15_170 | TC107743 | 230_15_150 | 87 | F9L1.29 protein |
| AK060523 | 173_123_185 | TC235416 | 201_126_157 | TC148319 | 211_120_163 | TC305149 | 255_129_195 | TC103609 | 240_129_212 | 58 | Ankyrin-3 |
a Pre-uORF distance_uORF length_intercistronic distance (nt).
b Functional annotation based on "The UniProt Knowledgebase (UniProt)" database.
c One of several genes (identifiers) that are in multiple tables because different conserved uORFs were identified in the different datasets.
d Previously reported upstream open reading frames (see 'Introduction' section on AdoMetDC).
e Contains one or more nested uORFs.
Ribosomal RNA (rRNA) genes have been removed.
See Table 2 for the criteria used to verify the rice uORFs conserved in 5 out of 5 cereals.
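The 5'-UTR entries use the pre-uORF distance_uORF length_intercistronic distance notation of footnote a. Read that way, the three fields of each uORF sum to the offset of the main ORF start codon given in the start-context comparison table (for example 131 + 9 + 17 = 157 for AK106095). The Python sketch below parses the notation and tests for nesting. The class and function names are hypothetical; treating a uORF as nested when it starts downstream of another, ends within it and shares its reading frame, and reading a negative intercistronic distance (such as 253_15_-9) as overlap with the main ORF, are assumptions.

```python
import re
from dataclasses import dataclass

# Leading "pre_length_inter" triple; trailing footnote letters (e.g. "c,d") are ignored.
_FIELD = re.compile(r"(-?\d+)_(-?\d+)_(-?\d+)")


@dataclass(frozen=True)
class UORF:
    pre: int     # pre-uORF distance (nt before the uORF start codon)
    length: int  # uORF length (nt)
    inter: int   # intercistronic distance (nt from uORF end to main ORF start)

    @classmethod
    def parse(cls, field: str) -> "UORF":
        match = _FIELD.match(field.strip())
        if match is None:
            raise ValueError(f"not a pre_length_inter field: {field!r}")
        return cls(*(int(group) for group in match.groups()))

    @property
    def end(self) -> int:
        """Offset just past the uORF's last nucleotide."""
        return self.pre + self.length

    @property
    def main_orf_start(self) -> int:
        """Main ORF start offset implied by the three fields."""
        return self.pre + self.length + self.inter

    @property
    def overlaps_main_orf(self) -> bool:
        # Assumption: a negative intercistronic distance means the uORF runs past the main ORF start.
        return self.inter < 0


def is_nested(outer: UORF, inner: UORF) -> bool:
    """inner starts downstream of outer, ends within it, and is in the same frame."""
    return (
        outer.pre < inner.pre
        and inner.end <= outer.end
        and (inner.pre - outer.pre) % 3 == 0
    )


# AK100589 rice uORFs from the "5 out of 5" table
uorfs = [UORF.parse(f) for f in ("240_9_334", "248_156_179", "296_108_179")]
print({u.main_orf_start for u in uorfs})  # {583}
print(is_nested(uorfs[1], uorfs[2]))      # True
print(UORF.parse("253_15_-9").overlaps_main_orf)  # True
```

All three AK100589 uORFs point to the same main ORF start (583), which matches the main ORF entry 583_1197 in the start-context table; a mismatch would flag a transcription or parsing error.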
Criteria for verifying rice uORFs conserved in 5 out of 5 cereals
Columns marked ^c report the alignment of uORFSCAN-identified main proteins with UniProt proteins.

| Accession | FL-cDNA^a | Upstream & in-frame stop codon | Agreement with genome annotation^b | UniProt protein length (AA)^c | Align length (AA)^c | Identities (%)^c | Expect^c | Annotation^c | GO classification^c | uORF valid |
|---|---|---|---|---|---|---|---|---|---|---|
| AK106095 | Yes | Yes | Yes | 392 | 392 | 100 | 2.2e-217 | Oxidoreductase | | Yes |
| AK103391 | Yes | Yes | Yes | 371 | 371 | 100 | 3.4e-194 | Trehalose-6-phosphate phosphatase | | Yes |
| AK100589 | Yes | Yes | Yes | 398 | 398 | 100 | 1.1e-215 | AdoMetDC | | Yes |
| AK073303 | Yes | Yes | Yes | 257 | 257 | 100 | 1.6e-141 | Acyl-CoA independent ceramide synthase | | Yes |
| AK072868 | Yes | Yes | Yes | 443 | 443 | 100 | 3.6e-238 | Uncharacterized protein (probable CBL-interacting serine/threonine-protein kinase 15) | | Yes |
| AK072649 | Yes | Yes | Yes | 480 | 488 | 76 | 9.6e-199 | Ribosomal protein S6 kinase | | Yes |
| AK066145 | No | Yes | Yes | 119 | 119 | 100 | 1.3e-59 | Membrane-anchored ubiquitin-fold protein | | Yes |
| AK064792 | Yes | Yes | 197^d | 109 | 108 | 57 | 8.4e-26 | F9L1.29 protein | Not available | Yes |
| AK060523 | Yes | Yes | Yes | 166 | 166 | 100 | 1.9e-88 | Uncharacterized protein (probable ankyrin-3) | | Yes |
a The rice cDNA was used in a blastn search against the "NCBI EST_Others" database (rice) to find longer 5' ESTs.
b The rice cDNA was used in a blastn search against the "TIGR Rice Genome Annotation DB: Coding Sequences" database to verify the cDNA ORF.
c The rice cDNA was translated in the same frame as the main open reading frame identified by uORFSCAN (including translation upstream of the predicted start methionine). The resulting protein sequence was used in a blastp search against "The UniProt Knowledgebase (UniProt)" database.
d The genome annotation for the CDS is longer by the indicated number of base pairs.
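The "Upstream & in-frame stop codon" criterion can be checked by stepping back through the cDNA one codon at a time from the main ORF start, staying in the main ORF's reading frame (footnote c describes translating upstream of the predicted start methionine). This is a minimal sketch, not the authors' code; the function name is hypothetical, and it assumes the main ORF start is given as the 0-based offset of the A of its ATG.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_upstream_inframe_stop(cdna: str, main_orf_start: int) -> bool:
    """Return True if a stop codon occurs upstream of the main ORF start codon
    in the same reading frame.

    main_orf_start: 0-based offset of the A of the main ORF's ATG (assumed input).
    """
    seq = cdna.upper()
    if seq[main_orf_start:main_orf_start + 3] != "ATG":
        raise ValueError("main_orf_start does not point at an ATG")
    # Step back one codon at a time, staying in the main ORF's frame.
    for i in range(main_orf_start - 3, -1, -3):
        if seq[i:i + 3] in STOP_CODONS:
            return True
    return False


print(has_upstream_inframe_stop("TAGCCCATGAAA", 6))  # True: TAG is in frame
print(has_upstream_inframe_stop("ATAGCCATGAAA", 6))  # False: TAG is out of frame
```

An in-frame stop upstream means the protein cannot extend further toward the 5' end, which supports the predicted start codon and, with it, the placement of the uORFs in the 5'-UTR.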
Rice uORFs predicted by uORFSCAN that are conserved in Arabidopsis
| Rice identifier | Rice 5'-UTR^a | Arabidopsis identifier | Arabidopsis 5'-UTR^a | Avg. A.A. similarity (%) | Putative function^b |
|---|---|---|---|---|---|
| AK101100 | 142_12_21^c,d | AT1G51690.1 | 555_12_1160 | 33 | Protein phosphatase 2A |
| AK066952 | 365_66_182 | AT3G13225.1 | 364_63_431 | 27 | WW domain containing protein |
| | 368_63_182^e | | 364_63_431 | 29 | |
| | 503_51_59 | | 553_51_254 | 1 | |
| AK119592 | 304_90_148^c,d | AT3G01470.1 | 162_87_120 | 36 | Homeodomain leucine zipper protein |
| AK100589 | 248_156_179^c,d | AT3G02470.3 | 222_156_154 | 82 | S-Adenosylmethionine decarboxylase |
| AK103391 | 176_30_148^c,d,f | AT4G22590.1 | 254_30_137 | 44 | Trehalose-6-phosphate phosphatase |
| | 205_75_74^g | | 283_75_63^g | 71 | |
| AK069534 | 813_9_432 | AT4G12770.1 | 41_9_108 | 50 | Auxilin-like protein |
| AK069526 | 214_126_544^c | AT4G19110.2 | 255_126_527 | 44 | GAMYB-binding protein |
| | 690_9_185^c | | 603_9_296 | 50 | |
| | 820_36_28 | | 398_36_474 | 17 | |
| AK072868 | 338_90_96^c,d | AT5G58380.1 | 11_87_295 | 17 | CBL-interacting protein kinase |
| AK060523 | 173_123_185^c | AT5G07840.1 | 289_117_250 | 36 | Ankyrin-3 |
| | | | 313_93_250^e | 44 | |
| | 206_90_185^e | | 313_93_250^e | 33 | |
| AK067412 | 222_84_49^c,h | AT5G50180.1 | 357_84_79 | 4 | Protein kinase ATN1 |
| AK102277 | 228_117_150^c | AT1G68550.1 | 309_96_95 | 21 | Hypothetical protein |
a Pre-uORF distance_uORF length_intercistronic distance.
b Functional annotation based on "The UniProt Knowledgebase (UniProt)" database.
c Rice uORF is conserved in at least two orthologous cereal and Arabidopsis genes.
d Rich in serine (at least 20%).
e Nested uORF.
f One of several genes (identifiers) that are in multiple tables because different conserved uORFs were identified in the different datasets.
g Overlapping uORF.
h Rich in arginine (at least 20%).
Ribosomal RNA (rRNA) genes have been removed.
Rows in italics are false-positive predictions (see Table 4, "Criteria for verifying rice uORFs that are conserved in Arabidopsis").
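Footnotes d and h flag uORF peptides rich in serine or arginine (at least 20%). The Python sketch below translates a uORF with the standard genetic code and tests that threshold. Whether the initiator methionine counts toward the peptide length is not stated in the source; here it is counted (an assumption), translation stops at the first stop codon, and all names and the example sequence are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate an ORF from its first codon, stopping at the first stop codon."""
    seq = orf.upper()
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons containing N etc.
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


def is_rich_in(peptide: str, residue: str, threshold: float = 0.20) -> bool:
    """True if `residue` makes up at least `threshold` of the peptide (Met included)."""
    if not peptide:
        return False
    return peptide.count(residue) / len(peptide) >= threshold


uorf_peptide = translate("ATGTCTTCATCGAGCTAA")  # hypothetical uORF sequence
print(uorf_peptide)                   # MSSSS
print(is_rich_in(uorf_peptide, "S"))  # True
print(is_rich_in(uorf_peptide, "R"))  # False
```

For very short uORFs almost any residue can pass a 20% cut-off, so the flag is most informative for the longer conserved peptides.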
Criteria for verifying rice uORFs that are conserved in Arabidopsis
Columns marked ^c report the alignment of uORFSCAN-identified main proteins with UniProt proteins.

| Accession | FL-cDNA^a | Upstream & in-frame stop codon | Agreement with genome annotation^b | UniProt protein length (AA)^c | Align length (AA)^c | Identities (%)^c | Expect^c | Annotation^c | GO classification^c | uORF valid |
|---|---|---|---|---|---|---|---|---|---|---|
| AK101100 | Yes | Yes | Yes | 525 | 525 | 100 | 5.0e-287 | Protein phosphatase 2A | | Yes |
| AK066952 | Yes | Yes | Yes | 860 | 694 | 99 | 0 | WW domain containing protein | Not available | Yes^d |
| AK119592 | Yes | Yes | Yes | 343 | 343 | 100 | 6.8e-187 | Homeodomain leucine zipper protein | | Yes |
| AK100589 | Yes | Yes | Yes | 398 | 398 | 100 | 1.1e-215 | S-Adenosylmethionine decarboxylase | | Yes |
| AK103391 | Yes | Yes | Yes | 371 | 371 | 100 | 3.3e-194 | Trehalose-6-phosphate phosphatase | | Yes |
| AK069534 | Yes | Yes | 1066^e | 485 | 413 | 61 | 7.6e-117 | Auxilin-like protein | Not available | Yes^f |
| AK069526 | Yes | Yes | Yes | 483 | 483 | 83 | 5.8e-256 | GAMYB-binding protein | | Yes |
| AK072868 | Yes | Yes | Yes | 443 | 443 | 100 | 3.5e-238 | CBL-interacting kinase 15 | | Yes |
| AK060523 | No | Yes | Yes | 166 | 166 | 99 | 8.2e-88 | Ankyrin-3 | | Yes |
| AK067412 | Yes | Yes | Yes | 353 | 353 | 72 | 1.2e-136 | Protein kinase ATN1 | | Yes |
| AK102277 | Yes | Yes | Yes | 338 | 338 | 99 | 4.9e-179 | Hypothetical protein | Not available | Yes |
| AK100332 | Yes | Yes | 4092^e | 2192 | 872 | 30 | 5.3e-28 | Helicase | | No^g |
| AK059639 | No | Yes | Yes | 154 | 154 | 100 | 2.6e-77 | 40S ribosomal s15 protein | | No^h |
a The rice cDNA was used in a blastn search against the "NCBI EST_Others" database (rice) to find longer 5' ESTs.
b The rice cDNA was used in a blastn search against the "TIGR Rice Genome Annotation DB: Coding Sequences" database to verify the cDNA ORF.
c The rice cDNA was translated in the same frame as the main open reading frame identified by uORFSCAN (including translation upstream of the predicted start methionine). The resulting protein sequence was used in a blastp search against "The UniProt Knowledgebase (UniProt)" database.
d The protein data suggest that the main open reading frame predicted by uORFSCAN extends further upstream; because the extension does not overlap the predicted uORFs, the uORFs are still valid.
e The genome annotation for the CDS is longer by the indicated number of base pairs.
f A shorter protein was identified, but it does not overlap the predicted uORFs, so the uORFs are still valid.
g A longer protein was identified, indicating that the main open reading frame extends further upstream and overlaps the predicted uORFs; the uORFs are therefore not valid.
h Possibly not functional because the pre-uORF distance is less than the 20 nucleotides thought to be required for translation initiation.
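Footnotes d, f, g and h above amount to two checks on each predicted uORF: whether a main ORF start revised on protein evidence overlaps the uORF, and whether the pre-uORF distance reaches the 20 nt thought to be needed for initiation. A minimal sketch with hypothetical names and example values; it assumes positions are nucleotide offsets from the 5' end of the transcript, as in the pre_length_intercistronic notation.

```python
MIN_PRE_UORF_NT = 20  # leader length thought to be required for initiation (footnote h)


def uorf_flags(pre: int, length: int, main_orf_start: int) -> list:
    """Return reasons to doubt a predicted uORF (an empty list means no objection).

    pre, length: uORF fields from the pre_length_intercistronic notation (nt).
    main_orf_start: offset of the main ORF start codon, possibly revised
    upstream after comparison with protein data (footnotes d, f and g).
    """
    flags = []
    # The main ORF runs downstream from its start, so it overlaps the uORF
    # whenever it begins before the uORF's last nucleotide.
    if main_orf_start < pre + length:
        flags.append("main ORF overlaps uORF")
    if pre < MIN_PRE_UORF_NT:
        flags.append("pre-uORF distance below 20 nt")
    return flags


print(uorf_flags(142, 12, 175))  # []
print(uorf_flags(142, 12, 100))  # ['main ORF overlaps uORF']
print(uorf_flags(15, 9, 200))    # ['pre-uORF distance below 20 nt']
```

An upstream extension that stays downstream of the uORF (footnotes d and f) leaves the list empty, while one reaching into the uORF (footnote g) produces the overlap flag.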
Figure 2. The position of conserved uORFs within their 5'-UTRs. The figure shows rice uORFs conserved in four other cereals and in Arabidopsis.
Figure 3. A frequency distribution of the length (nt) of rice uORFs conserved in four other cereals and in Arabidopsis.
Comparison of conserved cereal uORFs and their main ORF start context
In five cereals

| Identifier | uORF1 | uORF2 | uORF3 | uORF4 | uORF5 | Main ORF |
|---|---|---|---|---|---|---|
| AK106095 | 131_9_17^a CCGATGC^b | | | | | 157_1179 CCCATGG |
| AK103391 | 205_75_74 TTGATGA | | | | | 354_1116 CAAATGG |
| AK100589 | 240_9_334 TGGATGT | 248_156_179 CTAATGG | 296_108_179 TTGATGT | | | 583_1197 CCAATGG |
| AK073303 | 67_9_142 TCCATGC | 135_9_74 CTCATGA | | | | 218_774 AGCATGG |
| AK072868 | 249_27_248 GGAATGC | 259_195_70 AAGATGT | 269_39_216 TGCATGC | 338_90_96 TTCATGA | 392_36_96 ACTATGG | 524_1332 GTGATGG |
| AK072649 | 100_192_117 CTCATGA | | | | | 409_1443 AAGATGG |
| AK066145 | 178_12_58 GCTATGG | | | | | 248_360 GAGATGG |
| AK064792 | 276_15_187 CGGATGC | | | | | 478_330 GGAATGG |
| AK060523 | 173_123_185 ACTATGG | | | | | 481_501 CGGATGG |

In rice and Arabidopsis

| Identifier | uORF1 | uORF2 | uORF3 | uORF4 | uORF5 | Main ORF |
|---|---|---|---|---|---|---|
| AK101100 | 142_12_21 GCCATGG | | | | | 175_1578 AAGATGG |
| AK066952 | 365_66_182 CCAATGA | 368_63_182 ATGATGA | 503_51_59 CTGATGA | | | 613_2085 GGGATGC |
| AK119592 | 304_90_148 CCGATGA | | | | | 542_1032 GCGATGG |
| AK100589 | 248_156_179 CTAATGG | | | | | 583_1197 CCAATGG |
| AK103391 | 176_30_148 AACATGA | 205_75_74 TTGATGA | | | | 354_1116 CAAATGG |
| AK069534 | 813_9_432 TCGATGA | | | | | 1254_1602 GAGATGC |
| AK069526 | 214_126_544 GATATGG | 690_9_185 TTGATGG | 820_36_28 CATATGA | | | 884_1455 AAAATGG |
| AK072868 | 338_90_96 TTCATGA | | | | | 524_1332 GTGATGG |
| AK060523 | 173_123_185 ACTATGG | 206_90_185 CCGATGC | | | | 481_501 CGGATGG |
| AK067412 | 222_84_49 CTGATGC | | | | | 355_1059 GGGATGG |
| AK102277 | 228_117_150 TCTATGC | | | | | 495_1017 GAAATGG |
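a Pre-uORF distance_uORF length_intercistronic distance (nt); for the main ORF, pre-ORF distance_ORF length (nt).
b Sequence context of the uORF or main ORF start codon from position -3 to +4 (the A of the ATG is +1).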
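Footnote b gives each start codon's context from -3 to +4. A common way to summarize such contexts is the Kozak-based rule: a purine (A or G) at -3 and a G at +4 make a "strong" context, one of the two an "adequate" one, and neither a "weak" one. The source does not say it uses this scheme. The Python sketch below applies it as an assumption, with a hypothetical function name, to a few contexts copied from the table.

```python
PURINES = {"A", "G"}


def classify_start_context(context: str) -> str:
    """Classify a 7-nt start context covering -3..+4 (A of ATG = +1).

    Assumed Kozak-style rule: purine at -3 and G at +4 -> "strong";
    exactly one of them -> "adequate"; neither -> "weak".
    """
    ctx = context.upper()
    if len(ctx) != 7 or ctx[3:6] != "ATG":
        raise ValueError(f"expected 7 nt with ATG at +1..+3, got {context!r}")
    purine_at_minus3 = ctx[0] in PURINES
    g_at_plus4 = ctx[6] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


# Contexts copied from the start-context table above
for label, ctx in [("AK106095 uORF1", "CCGATGC"),
                   ("AK106095 main ORF", "CCCATGG"),
                   ("AK072649 main ORF", "AAGATGG")]:
    print(label, classify_start_context(ctx))
# AK106095 uORF1 weak
# AK106095 main ORF adequate
# AK072649 main ORF strong
```

Because the string positions map directly onto -3..+4, index 0 is the -3 base and index 6 the +4 base; other scoring schemes (for example, position weight matrices) would need the same indexing.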