Barnali N Chaudhuri, Todd O Yeates.
Abstract
In several natural settings, the standard genetic code is expanded to incorporate two additional amino acids with distinct functionality, selenocysteine and pyrrolysine. These rare amino acids can be overlooked inadvertently, however, as they arise by recoding at certain stop codons. We report a method for such recoding prediction from genomic data, using read-through similarity evaluation. A survey across a set of microbial genomes identifies almost all the known cases as well as a number of novel candidate proteins.
Year: 2005 PMID: 16168086 PMCID: PMC1242214 DOI: 10.1186/gb-2005-6-9-r79
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Schematic representation of the selenocysteine insertion machinery and the selenoprotein detection scheme. (a) A cartoon diagram of selenocysteine incorporation during protein translation inside the cell. The selenocysteine-specific elongation factor (SelB; pink) is shown interacting with the selenocysteine insertion sequence (SECIS) hairpin element in the mRNA and tRNA-sec (SelC). The anticodon of SelC tRNA interacts with and recognizes the 'UGA' codon. The ribosome and other components of the translational machinery are omitted for clarity. (b) Schematic representation of the 'read-through similarity analysis' approach. The top BLAST hit is shown in blue. The window lengths used for the BLAST search and read-through similarity evaluation are marked in the drawing. (c) A flow chart describing how the different components of the predictive scheme are combined for selenoprotein prediction. ORF, open reading frame.
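The read-through step in panel (b) can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how an in-frame stop codon (TGA in DNA terms) splits a reading frame into an upstream stretch, which would be used for the homology search, and a downstream read-through window, which would then be scored for similarity against the top hit. The function name `read_through_windows` and the `window_codons` parameter are hypothetical, and the window lengths actually used in the paper are not stated in this record.

```python
STOPS = {"TAA", "TAG", "TGA"}

def read_through_windows(seq, recoded_stop="TGA", window_codons=3):
    """Scan frame 0 of `seq` for `recoded_stop` codons.

    Returns a list of (stop_offset, upstream, downstream) tuples:
    upstream   = codons between the previous stop (or sequence start) and this stop
    downstream = codons after this stop, up to the next stop of any kind,
                 capped at `window_codons` codons (an illustrative cap only)
    """
    # Split frame 0 into complete codons, remembering each nucleotide offset.
    codon_list = [(i, seq[i:i + 3]) for i in range(0, len(seq) - 2, 3)]
    results = []
    for idx, (offset, codon) in enumerate(codon_list):
        if codon != recoded_stop:
            continue
        # Walk backwards to collect the upstream stretch.
        upstream = []
        j = idx - 1
        while j >= 0 and codon_list[j][1] not in STOPS:
            upstream.append(codon_list[j][1])
            j -= 1
        upstream.reverse()
        # Walk forwards to collect the read-through window.
        downstream = []
        k = idx + 1
        while (k < len(codon_list)
               and codon_list[k][1] not in STOPS
               and len(downstream) < window_codons):
            downstream.append(codon_list[k][1])
            k += 1
        results.append((offset, "".join(upstream), "".join(downstream)))
    return results

if __name__ == "__main__":
    # Toy frame: ATG GCC TGA GGT CAT TAA
    print(read_through_windows("ATGGCCTGAGGTCATTAA"))
```

In a real pipeline the two stretches would be translated and compared against database hits with an external tool such as BLAST; that step is deliberately left out here. Passing `recoded_stop="TAG"` gives the analogous split for pyrrolysine candidates.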
A list of predicted selenoproteins encoded by UGA read-through
| Accession ID | Organism | Computationally identified selenoproteins* annotated by their homologs |
| AE000657 | 1. gi|12515210|gb|AAG56295.1|AE005358_3 formate dehydrogenase-N, nitrate-inducible, alpha subunit [ | |
| 2. gi|51589698|emb|CAH21328.1| selenide, water dikinase [ | ||
| AE017125 | 1. gi|27362035|gb|AAO10941.1|AE016805_198 formate dehydrogenase, alpha subunit [ | |
| 2. gi|46914191|emb|CAG20971.1| putative selenophosphate synthase [ | ||
| AE017143 | 1. gi|26108424|gb|AAN80626.1|AE016761_201 selenide, water dikinase [ | |
| AE004439 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE005674 | ||
| 3. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 4. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 5. gi|3868721|gb|AAD13462.1| selenopolypeptide subunit of formate dehydrogenase H; formate dehydrogenase H, selenopolypeptide subunit [ | ||
| AE014073 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| 3. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 4. gi|3868721|gb|AAD13462.1| selenopolypeptide subunit of formate dehydrogenase H; formate dehydrogenase H, selenopolypeptide subunit [ | ||
| AE006469 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE008691 | 1. gi|41816370|gb|AAS11237.1| glycine reductase complex selenoprotein GrdA [ | |
| 2. gi|51857693|dbj|BAD41851.1| glycine reductase complex selenoprotein B [ | ||
| 3. gi|46914191|emb|CAG20971.1| putative selenophosphate synthase [ | ||
| AE014075 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| 2. gi|56130341|gb|AAV79847.1| formate dehydrogenase H [ | ||
| 3. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| BA000007 | 1. gi|56130341|gb|AAV79847.1| formate dehydrogenase H [ | |
| 2. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 3. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| U00096 | ||
| 2. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 3. gi|56130341|gb|AAV79847.1| formate dehydrogenase H [ | ||
| 4. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| AE014299 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE015451 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE004091 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE016958 | ||
| 2. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| AE017042 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE009952 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AL590842 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| AE017180 | 1. gi|19918170|gb|AAM07420.1| 4-carboxymuconolactone decarboxylase [ | |
| 2. gi|21956737|gb|AAM83670.1|AE013608_5 glutaredoxin 3 [ | ||
| 3. gi|37201109|dbj|BAC96933.1| thiol-disulfide isomerase and thioredoxins [ | ||
| 4. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 5. gi|34105000|gb|AAQ61356.1| conserved hypothetical protein [ | ||
| 6. gi|46914191|emb|CAG20971.1| Putative selenophosphate synthase [ | ||
| 7. gi|32448022|emb|CAD77542.1| peroxiredoxin [ | ||
| 8. gi|29605647|dbj|BAC69712.1| hypothetical protein [ | ||
| 9. gi|34482757|emb|CAE09757.1| sulfur transferase precursor [ | ||
| AE017226 | 1. gi|51857694|dbj|BAD41852.1| glycine reductase complex selenoprotein A [ | |
| 2. gi|51857693|dbj|BAD41851.1| glycine reductase complex selenoprotein B [ | ||
| 3. gi|56380162|dbj|BAD76070.1| glutathione peroxidase [ | ||
| 4. gi|51857693|dbj|BAD41851.1| glycine reductase complex selenoprotein B [ | ||
| 5. gi|26108424|gb|AAN80626.1|AE016761_201 selenide, water dikinase [ | ||
| 6. gi|52209545|emb|CAH35498.1| thioredoxin 1 [ | ||
| AL111168 | 1. gi|27362035|gb|AAO10941.1|AE016805_198 formate dehydrogenase, alpha subunit [ | |
| 2. gi|54018125|dbj|BAD59495.1| hypothetical protein [ | ||
| AL513382 | 1. gi|3868721|gb|AAD13462.1| selenopolypeptide subunit of formate dehydrogenase H; formate dehydrogenase H, selenopolypeptide subunit [ | |
| 2. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| AE006468 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| 2. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | ||
| 3. gi|3868721|gb|AAD13462.1| selenopolypeptide subunit of formate dehydrogenase H; formate dehydrogenase H, selenopolypeptide subunit [ | ||
| BA000016 | 1. gi|28202985|gb|AAO35429.1| conserved protein [ | |
| 2. gi|46914191|emb|CAG20971.1| putative selenophosphate synthase [ | ||
| BX470251 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase alpha subunit [ | |
| BX571656 | 1. gi|27362035|gb|AAO10941.1|AE016805_198 formate dehydrogenase, alpha subunit [ | |
| L42023 | 1. gi|2983532|gb|AAC07107.1| formate dehydrogenase, alpha subunit [ | |
| 2. gi|26108424|gb|AAN80626.1|AE016761_201 selenide, water dikinase [ | ||
| CR354531 | ||
| CR354532 | 1. gi|41816370|gb|AAS11237.1| glycine reductase complex selenoprotein GrdA [ | |
| 2. gi|51589698|emb|CAH21328.1| selenide, water dikinase [ | ||
| 3. gi|41816370|gb|AAS11237.1| glycine reductase complex selenoprotein GrdA [ | ||
| 4. gi|41818450|gb|AAS12639.1| glycine reductase complex selenoprotein GrdB2 [ | ||
| AE009439 | 1. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | |
| 2. gi|57160335|dbj|BAD86265.1| probable formate dehydrogenase, alpha subunit [ | ||
| 3. gi|33566318|emb|CAE37231.1| putative iron-sulfur binding protein [ | ||
| 4. gi|44921146|emb|CAF30381.1| heterodisulfide reductase, subunit A [ | ||
| 5. gi|44921142|emb|CAF30377.1| coenzyme F420-non-reducing hydrogenase, subunit delta [ | ||
| 6. gi|45047811|emb|CAF30938.1| coenzyme F420-reducing hydrogenase subunit alpha [ | ||
| 7. gi|39576202|emb|CAE80367.1| selenide, water dikinase [ | ||
| L77117 | 1. gi|44921146|emb|CAF30381.1| heterodisulfide reductase subunit A [ | |
| 2. gi|45047811|emb|CAF30938.1| coenzyme F420-reducing hydrogenase subunit alpha [ | ||
| 3. gi|50875900|emb|CAG35740.2| methyl-viologen-reducing hydrogenase, delta subunit [ | ||
| 4. gi|2622240|gb|AAB85625.1| methyl viologen-reducing hydrogenase, delta subunit [ | ||
| 5. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | ||
| 6. gi|26108424|gb|AAN80626.1|AE016761_201 selenide, water dikinase [ | ||
| 7. gi|53758707|gb|AAU92998.1| HesB/YadR/YfhF family protein [ | ||
| 8. gi|45047727|emb|CAF30854.1| formate dehydrogenase, alpha subunit [ | ||
| BX950229 | 1. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | |
| 2. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | ||
| 3. gi|2622240|gb|AAB85625.1| methyl viologen-reducing hydrogenase, delta subunit [ | ||
| 4. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | ||
| 5. gi|2622673|gb|AAB86026.1| formate dehydrogenase, alpha subunit homolog [ | ||
| 6. gi|19886593|gb|AAM01482.1| Heterodisulfide reductase, subunit A, polyferredoxin [ |
Organism names, National Center for Biotechnology Information accession numbers for the genomes and the top PSI-BLAST hit(s) from our database are shown. Seven novel candidate selenoproteins are shown in bold type. *Each entry corresponds to a computationally identified read-through protein in the organism indicated to the left. FASTA files for these recoded protein sequences are provided in Additional file 2. For each recoded protein, the GI number and the functional annotation for a homologous protein are given.
Methyltransferases predicted to encode pyrrolysine by UAG read-through in a set of methanogenic archaea
| Organism | Computationally identified pyrrolysine-proteins* annotated by their homologs |
| 1. gi|56678713|gb|AAV95379.1| trimethylamine methyltransferase family protein [ | |
| 2. gi|14247242|dbj|BAB57633.1| menaquinone biosynthesis methyltransferase [ | |
| 3. gi|36785418|emb|CAE14364.1| protein methyltranferase [ | |
| 4. gi|56679325|gb|AAV95991.1| trimethylamine methyltransferase family protein [ | |
| 5. gi|20904823|gb|AAM30145.1| SAM-dependent methyltransferases [ | |
| 6. gi|56312282|emb|CAI06927.1| predicted methyltransferase [ | |
| 7. gi|45047608|emb|CAF30735.1| generic methyltransferase [ | |
| 8. gi|20905508|gb|AAM30766.1| methylcobalamin: Coenzyme M methyltransferase [ | |
| 9. Predicted ORF monomethylamine methyltransferase [ | |
| 10. Predicted ORF monomethylamine methyltransferase [ | |
| 11. Predicted ORF dimethylamine methyltransferase [ | |
| 12. Predicted ORF dimethylamine methyltransferase [ | |
| 13. Predicted ORF dimethylamine methyltransferase [ | |
| 1. gi|19914316|gb|AAM03972.1| trimethylamine methyltransferase [ | |
| 2. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 3. gi|19914753|gb|AAM04365.1| trimethylamine methyltransferase [ | |
| 4. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ | |
| 5. gi|19914755|gb|AAM04366.1| dimethylamine methyltransferase [ | |
| 6. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 7. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ | |
| 1. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 2. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ | |
| 3. gi|19914316|gb|AAM03972.1| trimethylamine methyltransferase [ | |
| 4. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 5. gi|19914334|gb|AAM03988.1| protein-L-isoaspartate (D-aspartate) O-methyltransferase [ | |
| 6. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ | |
| 7. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ | |
| 1. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 2. gi|19914753|gb|AAM04365.1| trimethylamine methyltransferase [ | |
| 3. gi|5458504|emb|CAB49992.1| methlytransferase, putative [ | |
| 4. gi|5458504|emb|CAB49992.1| methlytransferase, putative [ | |
| 5. gi|19914320|gb|AAM03976.1| dimethylamine methyltransferase [ | |
| 6. gi|19914753|gb|AAM04365.1| trimethylamine methyltransferase [ | |
| 7. gi|19913899|gb|AAM03597.1| monomethylamine methyltransferase [ |
*Each entry corresponds to a computationally identified read-through protein in the organism indicated to the left. FASTA files for these recoded protein sequences are provided in the Additional data files. For each recoded protein, the GI number and the functional annotation for a homologous protein are given. †These open reading frames (ORFs) in M. acetivorans were predicted during a repeat search using a BLAST database containing putative methylamine methyltransferase ORFs in M. mazei as identified by our method. Although the M. acetivorans genome was annotated for several pyrrolysine-containing methylamine methyltransferases, this was not the case with the M. mazei genome. Thus, several methyltransferases that are specific to these Methanosarcina species could not be detected in our original calculation due to the lack of read-through homologs. Such repeat searches were not performed for the two unfinished genomes.
Figure 2. An overview of the predicted selenoproteome. (a) A Venn diagram representation of the overlap between the known selenoproteins in the RECODE database (bold line) and the results of our prediction method (plain line) over the same set of organisms as included in RECODE. (b) A pie chart illustrating the types of selenoproteins in our predicted dataset. The dataset was divided into the following groups: formate dehydrogenase (FDH) family enzymes; archaeal methanogenesis selenoproteins (excluding the FDH family); selenophosphate synthetase (SelD); other known selenoproteins (for example, thioredoxin, hesB); glycine reductase genes (GRD); and new candidate selenoproteins. (c) A section of the multiple sequence alignments (MSA) of the newly predicted candidate selenoprotein from P. profundum with its four homologs found in our database. Note the alignment of putative selenocysteine (U denotes selenocysteine) with cysteine residues in the MSA. (d) The MSA of a selenoprotein formylmethanofuran dehydrogenase from M. maripaludis in which the recoded selenocysteine aligns with a set of conserved aspartate residues rather than the cysteine residues. The MSA illustrations were prepared using ALSCRIPT [39].
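The column check described in panels (c) and (d), asking which residues in the homologs line up with the recoded position, can be sketched as follows. This is an illustrative helper under stated assumptions, not part of the published method: the name `residues_aligned_with` is hypothetical, the input is assumed to be pre-aligned sequences of equal length, and the toy sequences are invented for the example.

```python
from collections import Counter

def residues_aligned_with(query, homologs, symbol="U"):
    """Tally homolog residues in every alignment column where `query` has `symbol`.

    `query` and each entry of `homologs` must be rows of the same alignment
    (equal length, '-' for gaps). Returns {column_index: Counter of residues}.
    """
    if any(len(h) != len(query) for h in homologs):
        raise ValueError("all aligned sequences must have the same length")
    tallies = {}
    for col, residue in enumerate(query):
        if residue == symbol:
            tallies[col] = Counter(h[col] for h in homologs)
    return tallies

if __name__ == "__main__":
    # Invented toy alignment: the recoded residue sits in column 2.
    query = "GAUGK"
    homologs = ["GACGK", "GACGR", "GASGK", "GA-GK"]
    print(residues_aligned_with(query, homologs))
```

A column dominated by cysteine would match the pattern noted in panel (c), whereas a column of aspartate would correspond to the case in panel (d). Setting `symbol="X"` applies the same tally to presumed pyrrolysine positions.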
Figure 3. Representatives of the putative selenocysteine insertion sequence (SECIS) hairpin elements in various genomes as identified by the present study. (a) The SECIS elements from the genes coding for the following proteins from P. profundum: 1, glycine reductase GrdA; 2, glycine reductase GrdB2; 3, glycine reductase GrdA; 4, selenophosphate synthetase (SelD); 5, a hypothetical protein. (b) The SECIS elements from the genes coding for the following proteins from E. coli: 1, formate dehydrogenase; 2, formate dehydrogenase-N; 3, formate dehydrogenase-O.
Figure 4. Sections of the multiple sequence alignments of the putative pyrrolysine-containing proteins. (a) A protein known to use UAG read-through, methylamine methyltransferase from M. acetivorans. (b) A putative methyltransferase from M. burtonii. (c) A predicted read-through ORF homologous to a cobalamin biosynthesis protein CobN (gi|20906100|gb|AAM31298.1|, Methanosarcina mazei Goe1) from M. acetivorans. Note the alignment of presumed pyrrolysine residues (denoted as X) with various amino acids.