Christian Theil Have, Sine Zambach, Henning Christiansen.
Abstract
BACKGROUND: In certain organisms and under certain circumstances, pyrrolysine (the 22nd amino acid) is encoded by the amber stop codon, UAG. The circumstances driving pyrrolysine translation are not well understood. A predicted mRNA structure in the region downstream of UAG has been suggested to be involved, but the structure does not seem to be present in all pyrrolysine-incorporating genes.
Year: 2013 PMID: 23557142 PMCID: PMC3639795 DOI: 10.1186/1471-2105-14-118
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Illustration of the pipeline for identifying pyrrolysine-containing genes. The process extracts iORFs, which are then clustered using BLAST. Finally, the clusters are ranked according to several features.
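As a rough illustration of the extraction step in this pipeline, the sketch below scans the forward strand of a DNA sequence for reading frames that start at ATG, read through exactly one in-frame TAG (the DNA form of UAG), and end at TAA or TGA. It assumes that an iORF is an open reading frame containing an in-frame UAG. That definition, the function name `find_amber_orfs`, the `min_codons` cutoff, and the restriction to one strand and a single amber codon are assumptions made for illustration, not the authors' implementation. The BLAST clustering and ranking steps are not shown.

```python
# Hypothetical sketch: find forward-strand ORFs that read through one in-frame TAG.
STOPS = {"TAA", "TGA"}   # treated as true stops in this sketch
AMBER = "TAG"            # DNA form of the amber codon UAG
START = "ATG"


def find_amber_orfs(seq, min_codons=10):
    """Return (start, end, amber_pos) tuples, 0-based with exclusive end.

    Only ORFs with exactly one in-frame TAG are reported; the ORF length
    in codons (including start and stop) must be at least min_codons.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        ambers = []
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None:
                if codon == START:
                    start = pos
                    ambers = []
            elif codon == AMBER:
                ambers.append(pos)       # read through, remember the position
            elif codon in STOPS:
                n_codons = (pos + 3 - start) // 3
                if len(ambers) == 1 and n_codons >= min_codons:
                    hits.append((start, pos + 3, ambers[0]))
                start = None             # look for the next start codon
    return hits


# Toy sequence (not real data): ATG, five GCT, TAG, five GCT, TAA.
toy = "ATG" + "GCT" * 5 + "TAG" + "GCT" * 5 + "TAA"
print(find_amber_orfs(toy))  # [(0, 39, 18)]
```

A real extraction would also need to handle the reverse strand, alternative start codons, and the case where TAG is the genuine stop; those are left out to keep the sketch short.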
Table of potential Pyl-coding organisms and their pyrrolysine connection
| Methanosarcina acetivorans | Archaea (Methanosarcinaceae) | Yes | Yes | Yes | Yes | Yes |
| Methanococcoides burtonii | Archaea (Methanosarcinaceae) | Yes | Yes | Yes | Yes | Yes |
| Methanosarcina barkeri | Archaea (Methanosarcinaceae) | Yes | Yes | Yes | Yes | Yes |
| Methanosarcina mazei | Archaea (Methanosarcinaceae) | Yes | Yes | Yes | Yes | Yes |
| Methanohalophilus mahii | Archaea (Methanosarcinaceae) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Methanohalobium evestigatum | Archaea (Methanosarcinaceae) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Methanosarcina thermophila | Archaea (Methanosarcinaceae) | Yes | Verified by Infernal | Yes | No | Yes |
| Methanosalsum zhilinae | Archaea (Methanosarcinaceae) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Desulfitobacterium hafniense | Bacteria (Clostridia) | Yes | Yes | Yes | Yes | Yes |
| Desulfobacterium autotrophicum | Bacteria (Deltaproteobacteria) | Yes | Yes | Yes | Yes | Yes |
| Desulfotomaculum acetoxidans | Bacteria (Clostridia) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Bilophila wadsworthia | Bacteria (Deltaproteobacteria) | Yes | - | Yes | No | Yes |
| Acetohalobium arabaticum | Bacteria (Clostridia) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Thermincola potens | Bacteria (Clostridia) | Yes | Verified by Infernal | Yes | Yes | Yes |
| Desulfosporosinus orientis | Bacteria (Clostridia) | - | Verified by Infernal | Yes | Yes | Yes |
| Desulfotomaculum gibsoniae | Bacteria (Clostridia) | - | - | Yes | No | Yes |
| Desulfosporosinus meridiei | Bacteria (Clostridia) | - | - | Yes | No | Yes |
| Geodermatophilus obscurus | Bacteria (Actinobacteria) | - | Verified by Infernal* | - | No | - |
The organisms marked in bold are used in further analyses. We exclude the organisms for which a whole genome sequence is not currently available. We furthermore exclude organisms that do not have both a transfer RNA whose anticodon decodes UAG (tRNApyl) and a corresponding tRNA synthetase (pylS).
* No UAG-decoding anticodon.
Figure 2. Correlations between cluster features. Each cluster is represented as a point in each of the panels. Each panel shows the correlation between a pair of feature values. Known pyrrolysine gene clusters are marked with colors: mttB (red), mtbB (orange), mtmB (yellow), transposase1 (green), transposase2 (blue) and TetR (purple). Unknown clusters are white. Feature combinations with discriminatory potential display significant separation between the bulk of white clusters (the majority of which are false positives) and the colored clusters (true positives). A separation is apparent in panel 4.3, which shows the combination of the f and f features.
Cluster ranking
| 11 | 9 | 10 | 2 | 1 | 2 | - | |
| 32.2 | 18.8 | 25.2 | 5.4 | 0.0 | 18.0 | - | |
| 19.0 | 37.6 | 22.5 | 41.3 | 42.4 | 23.2 | - | |
| 111.8 | 107.1 | 95.2 | 103.2 | 56.0 | 58.3 | - | |
| 1016.6 | 992.1 | 605.4 | 293.2 | 325.7 | 297.5 | - | |
| 495 | 325.4 | 773.2 | 1150 | 1149.7 | 391 | - | |
| 90.6 | 96 | 93.7 | 98.4 | 100 | 91.0 | - | |
| 18 | 18 | 19 | 16 | 3 | 2 | - | |
| 1 | 3 | 2 | 84 | 508 | 85 | ||
| 32 | 222 | 87 | 639 | 890 | 243 | 0.13 | |
| 895 | 344 | 858 | 196 | 147 | 840 | 0.72 | |
| 22 | 26 | 38 | 31 | 80 | 76 | ||
| 6 | 7 | 13 | 46 | 32 | 41 | ||
| 6 | 90 | 18 | 602 | 888 | 255 | 0.064 | |
| 793 | 385 | 596 | 252 | 99 | 762 | 0.51 | |
| 8 | 9 | 6 | 10 | 305 | 498 | ||
| 4 | 5 | 10 | 16 | 18 | 19 | ||
| 211 | 6 | 132 | 504 | 876 | 417 | 0.13 | |
| rank(regression) | 2 | 3 | 4 | 15 | 23 | 19 |
Raw feature values and ranking of known true positive clusters based on single features and combined ranking using regression weights. p-values for significant features (p > 0.05) are marked in bold.
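The two kinds of ranking described in this caption can be sketched as follows: each cluster is ranked on a single feature, and also on a score formed as a weighted sum of its features. The function names, the linear form of the combined score, the assumption that higher values are better, and the toy numbers are all illustrative assumptions; the regression model and weights actually used are not given in this section.

```python
# Hypothetical sketch of single-feature ranking versus a weighted combination.

def ranks(scores, higher_is_better=True):
    """Return 1-based ranks; ties are broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i],
                   reverse=higher_is_better)
    out = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out


def combined_scores(features, weights, intercept=0.0):
    """Linear combination of feature values, one score per cluster."""
    return [intercept + sum(w * x for w, x in zip(weights, row))
            for row in features]


# Toy data (not taken from the table): three clusters, two features each.
toy_features = [[1.0, 0.0],
                [0.0, 1.0],
                [1.0, 1.0]]
toy_weights = [2.0, 1.0]  # assumed weights, for illustration only

first_feature = [row[0] for row in toy_features]
print(ranks(first_feature))                                # [1, 3, 2]
print(ranks(combined_scores(toy_features, toy_weights)))   # [2, 3, 1]
```

The toy example shows why a combined ranking can differ from any single-feature ranking: the third cluster ties on the first feature but moves to the top once the second feature contributes to the score.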
Figure 3. Structural clustering of PYLIS regions from iORFs in the clusters for known pyrrolysine-incorporating genes. The figure shows that there are certain structural groupings which roughly correspond to gene families. Almost all transposase genes are in the green cluster (26 sequences), but the cluster also contains sequences from the mtmB (5 sequences), mttB (2 sequences) and mtbB (2 sequences) families. This is also a quite diverse cluster in terms of maximal distance between elements. The orange cluster (12 sequences) predominantly contains mttB genes (8 sequences), but also includes the TetR genes (2 sequences), a single transposase1 gene and a single mtmB gene. The purple cluster (9 sequences) is a mix of mttB genes (4 sequences) and mtmB genes (4 sequences), but also includes a single spurious transposase gene. The red cluster (12 sequences) is predominantly mtmB (8 sequences), but also includes two mttB genes and two mtbB genes. The blue cluster (15 sequences) mostly contains mtbB genes (13 sequences) and includes a distant sub-cluster with an mttB gene and a transposase1 gene.
Figure 4. Predicted consensus structures for each of the known pyrrolysine-incorporating clusters. The consensus sequences are written using IUPAC ambiguity codes. The consensus structures are predicted through the WAR web service [22], which integrates a variety of methods for structure prediction and alignment [23-36]. For each structure, the average pairwise identity, free energy, covariance, base pair probability and percentage of canonical base pairs are reported. The consensus structures suggested are not necessarily the correct structures, but reasonable (usually conservative) approximations. The different methods integrated in WAR predict slightly different structures, but the similarities between these are reflected in the consensus structure. It is clear that there are several different structures, but also that mttB does not seem to have any significant structure.
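To make the IUPAC notation concrete, the sketch below derives a consensus string from a set of aligned RNA sequences by mapping the set of bases seen in each column to its ambiguity code. This only illustrates the notation and is not how the WAR web service computes its consensus; the function name `iupac_consensus`, the choice to ignore gaps when any base is present, and the toy alignment are assumptions.

```python
# Hypothetical sketch: column-wise consensus using IUPAC nucleotide ambiguity codes.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("U"): "U",
    frozenset("AG"): "R", frozenset("CU"): "Y",
    frozenset("CG"): "S", frozenset("AU"): "W",
    frozenset("GU"): "K", frozenset("AC"): "M",
    frozenset("CGU"): "B", frozenset("AGU"): "D",
    frozenset("ACU"): "H", frozenset("ACG"): "V",
    frozenset("ACGU"): "N",
}


def iupac_consensus(aligned):
    """Return one IUPAC code per alignment column ('-' for all-gap columns)."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must have equal length")
    out = []
    for column in zip(*aligned):
        # Normalise DNA 'T' to RNA 'U'; drop gaps and unknown characters.
        bases = frozenset(
            b for b in (c.upper().replace("T", "U") for c in column)
            if b in "ACGU"
        )
        out.append(IUPAC.get(bases, "-"))
    return "".join(out)


# Toy alignment (not real data).
print(iupac_consensus(["ACGU", "ACGA", "GCGC"]))  # RCGH
```

In the toy output, R stands for A or G and H for A, C or U, so a single consensus line summarizes the variation across all sequences in a cluster.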