Masashi Fujita, Hisaaki Mihara, Susumu Goto, Nobuyoshi Esaki, Minoru Kanehisa.
Abstract
BACKGROUND: Selenocysteine and pyrrolysine are the 21st and 22nd amino acids, and both are genetically encoded by stop codons. Because a number of microbial genomes have now been completely sequenced, it is tempting to ask whether a 23rd amino acid remains undiscovered in these genomes. A recent computational study addressed this question and reported that no tRNA gene for an unknown amino acid was found in the available genome sequences. However, the performance of the tRNA prediction program on an unknown tRNA family, which may have an atypical sequence and structure, is unclear, so that result remains inconclusive. A protein-level study would therefore provide independent insight into a possible novel amino acid.
Year: 2007 PMID: 17597547 PMCID: PMC1914089 DOI: 10.1186/1471-2105-8-225
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Basic ideas of the prediction method. (a) Schematic illustration of an interrupted ORF (iORF). (b) Readthrough genes can be distinguished from two adjacent genes on the basis of BLAST search results. Boxes denote iORFs, and × indicates the inframe stop codon. Shaded regions represent actual protein-coding regions. If an iORF encodes a readthrough protein, BLAST hits from other organisms will cover the inframe stop codon. In contrast, if the iORF consists of two adjacent genes, many hits that do not cover the inframe stop codon will be found.
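The decision rule in panel (b) can be sketched as counting how many BLAST hits span the inframe stop codon. This is a minimal illustration under stated assumptions, not the authors' pipeline: hits are assumed to be already projected onto iORF amino-acid coordinates (1-based, inclusive), and the `Hit` class, the function names, and the `min_fraction` cut-off of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """One BLAST hit projected onto iORF coordinates (hypothetical; 1-based, inclusive, amino acids)."""
    start: int
    end: int


def covers(hit: Hit, stop_pos: int) -> bool:
    # A hit "covers" the inframe stop codon if its aligned query span includes that position
    return hit.start <= stop_pos <= hit.end


def classify_iorf(hits: List[Hit], stop_pos: int, min_fraction: float = 0.5) -> str:
    """Label an iORF as a readthrough candidate or as two adjacent genes.

    min_fraction is an assumed cut-off for illustration, not a value from the paper.
    """
    if not hits:
        return "undetermined"
    n_cover = sum(covers(h, stop_pos) for h in hits)
    fraction = n_cover / len(hits)
    return "readthrough" if fraction >= min_fraction else "adjacent genes"


if __name__ == "__main__":
    # Two of three hypothetical hits span position 100, so the iORF looks like a readthrough gene
    print(classify_iorf([Hit(10, 200), Hit(50, 150), Hit(5, 90)], stop_pos=100))
```

A fraction rather than a raw count keeps the rule insensitive to how many homologs a family happens to have; the real method may weigh hits differently.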
Figure 2. A flowchart of the prediction procedure. Several steps are omitted for simplicity; a detailed explanation is given in the text.
Predicted clusters of readthrough proteins
| Cluster description | Inframe stop codon | Cluster size | Example organism (locus) |
|---|---|---|---|
| Formate dehydrogenase α subunit | TGA | 45 | Escherichia coli (b1474) |
| Selenide water dikinase | TGA | 12 | Haemophilus influenzae (HI0200m) |
| Glycine reductase complex selenoprotein A | TGA | 6 | Treponema denticola (TDE0745) |
| Glycine reductase complex selenoprotein B | TGA | 6 | Treponema denticola (TDE0078) |
| Heterodisulfide reductase subunit A | TGA | 6 | Methanococcus jannaschii (MJ1190m) |
| Coenzyme F420-reducing hydrogenase δ subunit | TGA | 5 | Methanococcus jannaschii (MJ1190a) |
| Formylmethanofuran dehydrogenase subunit B | TGA | 4 | Methanococcus jannaschii (MJ1194m) |
| Glutaredoxin-like | TGA | 3 | Carboxydothermus hydrogenoformans (CHY_0740) |
| Thioredoxin | TGA | 3 | Geobacter sulfurreducens (GSU3446) |
| Coenzyme F420-reducing hydrogenase α subunit | TGA | 3 | Methanococcus jannaschii (MJ0029) |
| HesB family | TGA | 3 | Desulfovibrio vulgaris (DVU_1382) |
| HesB family | TGA | 2 | Methanococcus maripaludis (MMP0252 + upstream) |
| Fe-S oxidoreductase | TGA | 2 | Desulfotalea psychrophila (DP1009) |
| DsbA-like | TGA | 2 | Desulfovibrio desulfuricans (Dde_1263 + upstream) |
| Periplasmic [NiFeSe] hydrogenase large subunit | TGA | 2 | Desulfovibrio vulgaris (DVU_1918) |
| Monomethylamine methyltransferase | TAG | 7 | Methanosarcina acetivorans (MA0144) |
| Dimethylamine methyltransferase | TAG | 7 | Methanosarcina acetivorans (MA0532) |
| Trimethylamine methyltransferase | TAG | 6 | Methanosarcina acetivorans (MA0528) |
| Transcriptional regulator, TetR family | TAG | 2 | Methanosarcina acetivorans (MA2902) |
| Cytochrome c family protein | TGA | 2 | Geobacter sulfurreducens (GSU2937 + GSU2936) |
| Hypothetical protein | TAG | 2 | Geobacter sulfurreducens (GSU2293 + downstream) |
A plus sign in a locus indicates that the genomic coordinates of the iORF correspond to a concatenation of two genes or regions. For example, "GSU2293 + downstream" means that the iORF consists of the gene GSU2293 and its downstream sequence. The HesB family was not merged into a single cluster because its sequences were too short and divergent.
Figure 3. Selenoprotein families that we failed to detect because the locations of their stop codons are not conserved. The selenocysteine residues of the peroxiredoxin-like protein families are part of homologous redox motifs (TXXU and UXXC), but their positions differ between the two families. Columns are colored according to sequence conservation. Selenocysteine residues are shown in red, and the other residues of the redox motifs are shown in yellow. Prx, peroxiredoxin; TPO, thiol:protein disulphide oxidoreductase; Adeh, Anaeromyxobacter dehalogenans; Gmet, Geobacter metallireducens; Gsul, G. sulfurreducens; Dpsy, Desulfotalea psychrophila. The alignments were computed using ClustalW, and the figures were generated using Jalview.
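To make the motif notation concrete, the sketch below locates TXXU and UXXC motifs in one-letter protein sequences, where U is selenocysteine and X is any residue. It is an illustration only, assuming plain ungapped sequences; the regular-expression approach and the function name `find_redox_motifs` are hypothetical and are not the procedure used in the paper. Comparing the reported positions across members of two families shows the kind of positional shift the caption describes.

```python
import re
from typing import List, Tuple

# U = selenocysteine; "." stands for the X (any residue) in the motif names.
# Lookaheads let overlapping motifs (e.g. TXXU immediately followed by UXXC) both be reported.
MOTIFS = {
    "TXXU": re.compile(r"(?=(T..U))"),
    "UXXC": re.compile(r"(?=(U..C))"),
}


def find_redox_motifs(seq: str) -> List[Tuple[str, int]]:
    """Return (motif name, 1-based start position) for every motif occurrence in seq."""
    hits = []
    for name, pattern in MOTIFS.items():
        for m in pattern.finditer(seq):
            hits.append((name, m.start() + 1))
    return sorted(hits, key=lambda h: h[1])


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the paper's alignments
    print(find_redox_motifs("MKTAGUPPUGHC"))
```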
Figure 4. Multiple sequence alignments of novel candidate proteins. (a) A selenoprotein candidate from Geobacter sulfurreducens and its homologs. The possible selenocysteine residues are shown in red, and putative heme-binding motifs are underlined. Note that sequence conservation near the selenocysteine is comparable to that of the N-terminal cytochrome domain. The protein Dpro_2 contains an additional inframe stop codon (TAG) at column 189, which is either a sequencing error or a sign of a pseudogene. Gsul, G. sulfurreducens; Gmet, G. metallireducens; Gura, G. uraniumreducens; Gfrc, Geobacter sp. FRC-32; Dace, Desulfuromonas acetoxidans; Dpro, Delta proteobacterium MLMS-1. (b) Hypothetical proteins from Geobacter species. The inframe stop codons (TAG) are shown in red. This cluster is probably an artifact of the close phylogenetic relationship among these species.
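Positions such as "column 189" refer to alignment columns, which include gaps, so they are not residue numbers within an individual protein. A minimal sketch of the conversion, assuming a gapped one-letter alignment row that uses "-" for gaps and "*" for a translated stop codon (both conventions are assumptions here, and the function names are hypothetical):

```python
from typing import List, Optional


def column_to_residue(aligned_seq: str, column: int) -> Optional[int]:
    """Map a 1-based alignment column to the 1-based residue index in the ungapped sequence.

    Returns None when this sequence has a gap at that column.
    """
    if not 1 <= column <= len(aligned_seq):
        raise ValueError("column out of range")
    if aligned_seq[column - 1] == "-":
        return None
    # Count non-gap characters up to and including the column; "*" counts as a residue position
    return sum(1 for ch in aligned_seq[:column] if ch != "-")


def find_stop_columns(aligned_seq: str) -> List[int]:
    """Return the 1-based alignment columns that hold an inframe stop ("*")."""
    return [i + 1 for i, ch in enumerate(aligned_seq) if ch == "*"]


if __name__ == "__main__":
    row = "MK--T*A-C"  # hypothetical aligned row, not from Figure 4
    for col in find_stop_columns(row):
        print(col, column_to_residue(row, col))
```

Because gaps are skipped, the residue number is never larger than the column number; for a row like Dpro_2 it would generally be smaller than 189 whenever gaps precede that column.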
Figure 5. Discrepancies in stop codon usage between inframe and C-terminal stop codons. The inframe stop codon usage is taken from the pre-filtering clusters, and the C-terminal usage is computed from the annotated proteins of each organism. Circle colors: red, organisms with pyrrolysine; blue, selenocysteine; yellow, both pyrrolysine and selenocysteine; white, neither pyrrolysine nor selenocysteine. The organisms are ordered by their discrepancy scores. The discrepancy score is the negative logarithm of the p-value from Fisher's exact test. The dotted line indicates the 0.05 significance level after correction for multiple testing.
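The caption defines the discrepancy score as the negative logarithm of a Fisher's exact test p-value but does not state the table layout, the logarithm base, or the correction method. The sketch below therefore assumes a 2×2 table (rows: inframe vs C-terminal stop codons; columns: count of one codon, such as TGA, vs the other two stop codons), a base-10 logarithm, and a Bonferroni correction; all three choices and all function names are assumptions. The two-sided p-value sums the hypergeometric probabilities of every table with the same margins that is no more likely than the observed one, which is the usual definition of the two-sided test.

```python
import math
from typing import Sequence


def _table_prob(a: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric probability of a 2x2 table whose top-left cell is a, given fixed margins
    return math.comb(col1, a) * math.comb(n - col1, row1 - a) / math.comb(n, row1)


def fisher_exact_two_sided(table: Sequence[Sequence[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    p_obs = _table_prob(a, row1, col1, n)
    p = 0.0
    for x in range(lo, hi + 1):
        px = _table_prob(x, row1, col1, n)
        # Small relative tolerance so tables tied with the observed one are not lost to rounding
        if px <= p_obs * (1 + 1e-7):
            p += px
    return min(1.0, p)


def discrepancy_score(p_value: float) -> float:
    # Assumes a base-10 logarithm; the caption only says "negative logarithm"
    return -math.log10(p_value)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    # Score above which a result remains significant under an (assumed) Bonferroni correction
    return -math.log10(alpha / n_tests)


if __name__ == "__main__":
    # Hypothetical counts: [[TGA, other stops] at inframe sites, [TGA, other stops] at C-termini]
    p = fisher_exact_two_sided([[3, 1], [1, 3]])
    print(p, discrepancy_score(p), bonferroni_threshold(0.05, 10))
```

An organism would then lie above the dotted line when its score exceeds the threshold computed for the number of organisms tested.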