Timothy J D Goodwin, Margaret I Butler, Russell T M Poulter.
Abstract
BACKGROUND: Inteins are self-splicing protein elements. They are translated as inserts within host proteins that excise themselves and ligate the flanking portions of the host protein (exteins) with a peptide bond. They are encoded as in-frame insertions within the genes for the host proteins. Inteins are found in all three domains of life and in viruses, but have a very sporadic distribution. Only a small number of intein coding sequences have been identified in eukaryotic nuclear genes, and all of these are from ascomycete or basidiomycete fungi.
Year: 2006 PMID: 17069655 PMCID: PMC1635734 DOI: 10.1186/1741-7007-4-38
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Newly described inteins from the second largest subunit of RNA polymerases.
| Intein | Organism | Taxonomic group | Allele | Size |
| Pno RPA2 | | Ascomycota | RPA2-a | 456 |
| Cre RPB2 | | Green alga | RPB2-a | 431 |
| Cst RPB2 | | Chytrid | RPB2-b | 362 |
| Sas RPB2 | | Zygomycota | RPB2-b | 354 |
| Bde RPB2 | | Chytrid | RPB2-c | 488 |
| Ddi RPC2 | | Amoebozoa | RPC2-a | 464 |
| PrV RPO | | Stramenopile NCLDV? | RPO-a | incomplete |
| Unnamed | Unclassified Sargasso Sea | unknown | RPO-a | incomplete |
| EhV RPO | | Haptophyte NCLDV | RPO-a | incomplete |
*No intein is present at the allelic site in another Emiliania huxleyi virus isolate, Emiliania huxleyi virus 86.
Intein size is expressed as amino-acid residue number. NCLDV indicates a member (or putative member) of the nucleocytoplasmic large DNA virus group.
Figure 1. Intein insertions into eukaryotic and viral RNA polymerases. Alignments of intein/extein borders for the eight inteins in the six RNA polymerase intein insertion sites. RNA polymerase sequences are taken from accession data as described in Methods. The unclassified Sargasso Sea sequence is from GenBank accession AACY01369547; the E. huxleyi virus 163 sequence is from GenBank accession DQ127798. The dashes represent missing data.
Figure 2. Phylogenetic distance tree of RNA polymerases. RNA polymerase sequences are taken from accession data as described in the Methods section. The unrooted tree was constructed by the neighbour-joining method using PAUP*4b10 [52] and the default settings. Numbers on the branches indicate the percentages of bootstrap support indicated by a heuristic search with 100 random addition replicates and the tree-bisection-reconnection branch-swapping algorithm. All bootstrap values > 50 have been reported except where they occur within the three well-supported RNA polymerase I (rpo1), RNA polymerase II (rpo2) and RNA polymerase III (rpo3) groups. RNA polymerases that contain an intein are indicated by asterisks (*); strains of the E. huxleyi virus are polymorphic for the presence of an intein in RNA polymerase (*/-). The alignment used is available as supplementary data (additional file 1).
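As a rough illustration of the ideas behind the bootstrap percentages in Figure 2, the sketch below computes a simple pairwise distance matrix and draws one bootstrap pseudo-replicate by resampling alignment columns with replacement. It is a minimal sketch, not PAUP*'s implementation: the p-distance measure, the function names and the toy sequences are all assumptions made for illustration, and the tree-building step itself is omitted.

```python
import random


def p_distance(a, b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict of aligned sequences (name -> sequence)."""
    names = list(seqs)
    return {(i, j): p_distance(seqs[i], seqs[j]) for i in names for j in names}


def bootstrap_replicate(seqs, rng):
    """Resample alignment columns with replacement to form one pseudo-replicate."""
    length = len(next(iter(seqs.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(s[c] for c in cols) for name, s in seqs.items()}


# Hypothetical toy alignment, not data from the paper.
toy = {"A": "MKTAYIAK", "B": "MKTAYLAK", "C": "MRSAYLGK"}
d = distance_matrix(toy)
rep = bootstrap_replicate(toy, random.Random(0))
```

In a full analysis, a tree would be built from each pseudo-replicate, and the support for a branch would be the percentage of replicate trees that contain it.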
The unclassified sequence from the Sargasso Sea is unlikely to represent a fragment of a viral genome. TBLASTN searches were conducted at NCBI using as a query the 59 residues from the Sargasso Sea sequence (Accession AACY01369547) that formed the putative C-extein. These 59 residues are encoded on the complementary strand, from base 556 to base 732. Each search was restricted to one of the six groups outlined below.
| Accession | Sequences producing significant alignments | E value |
| Fungi | | |
| gb | | 4 × 10⁻⁹ |
| gb | | 7 × 10⁻⁹ |
| gb | | 1 × 10⁻⁸ |
| gb | | 1 × 10⁻⁸ |
| gb | | 2 × 10⁻⁸ |
| Metazoa | | |
| gb | | 5 × 10⁻⁷ |
| dbj | | 1 × 10⁻⁶ |
| gb | | 2 × 10⁻⁶ |
| gb | | 2 × 10⁻⁶ |
| gb | | 2 × 10⁻⁶ |
| Plantae | | |
| emb | MGU565937 | 9 × 10⁻⁷ |
| gb | | 1 × 10⁻⁶ |
| gb | | 1 × 10⁻⁶ |
| gb | | 1 × 10⁻⁶ |
| gb | | 1 × 10⁻⁶ |
| emb | GSP566358 | 1 × 10⁻⁶ |
| Archaea | | |
| gb | | 2 × 10⁻⁶ |
| gb | | 2 × 10⁻⁶ |
| gb | | 3 × 10⁻⁶ |
| gb | | 4 × 10⁻⁶ |
| emb | | 5 × 10⁻⁶ |
| emb | | 9 × 10⁻⁶ |
| Viruses | | |
| gb | Tiger frog virus, complete genome | 2 × 10⁻⁴ |
| gb | Frog virus 3, complete genome | 4 × 10⁻⁴ |
| gb | Grouper iridovirus, complete genome | 0.001 |
| gb | | 0.001 |
| gb | | 0.001 |
| gb | | 0.001 |
| emb | | 0.003 |
| Eubacteria | | |
| No significant similarity found. | | |
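The coordinates quoted for the TBLASTN query can be checked with a little arithmetic: bases 556 to 732 span 177 nucleotides, which is exactly 59 codons. The sketch below makes that check explicit and shows how a complementary-strand region could be pulled out of a contig. It assumes 1-based inclusive coordinates (the usual GenBank convention); the function names and the toy contig are hypothetical and do not come from the paper.

```python
def codon_count(start, end):
    """Number of whole codons in the 1-based inclusive span start..end."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"span of {length} bases is not a whole number of codons")
    return length // 3


# Translation table for complementing DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def minus_strand_region(contig, start, end):
    """Coding-sense sequence of a complementary-strand feature.

    start and end are 1-based inclusive positions on the forward strand
    (an assumption about the coordinate convention).
    """
    return reverse_complement(contig[start - 1:end])


# Coordinates from the text: base 556 to base 732 on the complementary strand.
n_residues = codon_count(556, 732)

# Hypothetical toy contig, used only to show the slicing.
toy_region = minus_strand_region("AAATGCCC", 3, 6)
```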
Figure 3. Profile of RNA polymerase alignment showing high level of conservation at intein insertion sites. The plot was generated from an alignment of multiple eukaryotic RNA polymerase I, II and III sequences using the PLOTSIMILARITY program of the GCG package of sequence analysis programs [49]. Intein location positions are as follows: Bde RPB2-c 843; Ddi RPC2 853; Pno RPA2 1195; P. ramorum virus, Sargasso Sea isolate, E. huxleyi virus 163 1516; Sas RPB2-b, Cst RPB2-b 1664; Cre RPB2-a 1696. Intron locations are indicated by a short vertical line at their insertion site and are as follows: D. discoideum 1–111, 2–184; P. nodorum 5–301, 17–1192; B. dendrobatidis 13–826, 21–1357; C. reinhardtii 3–222, 4–295, 6–335, 7–376, 8–439, 9–545, 10–657, 11–709, 12–797, 14–867, 15–998, 16–1093, 18–1217, 19–1287, 20–1346, 22–1401, 23–1511, 24–1587, 25–1688, 26–1817.
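A conservation profile of the kind shown in Figure 3 can be approximated by scoring each alignment column and smoothing the scores with a sliding window. The sketch below is a simplified stand-in, not the GCG PLOTSIMILARITY algorithm: it scores columns by plain identity to the most common residue rather than with a substitution matrix, and the window size, function names and toy alignment are assumptions chosen for illustration.

```python
def column_identity(column):
    """Fraction of non-gap residues that match the most common residue in the column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    top = max(residues.count(r) for r in set(residues))
    return top / len(residues)


def similarity_profile(alignment, window=3):
    """Running average of per-column identity over a centred window.

    Windows are truncated at the ends of the alignment.
    """
    length = len(alignment[0])
    scores = [column_identity([s[i] for s in alignment]) for i in range(length)]
    half = window // 2
    profile = []
    for i in range(length):
        lo, hi = max(0, i - half), min(length, i + half + 1)
        profile.append(sum(scores[lo:hi]) / (hi - lo))
    return profile


# Hypothetical toy alignment, not data from the paper.
toy = ["GDKFASRH", "GDKFSSRH", "GDKFATRH"]
profile = similarity_profile(toy, window=3)
```

Positions where the smoothed score stays near 1.0 correspond to highly conserved regions; marking known insertion sites on such a profile shows whether they fall within conserved stretches.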
Figure 4. The position of the six RNA polymerase intein insertion sites mapped onto the crystal structure of the RNA polymerase II of S. cerevisiae. Representation of S. cerevisiae RNA polymerase II (PDB: 1I3Q); the second largest subunit is coloured dark grey (other subunits, including the largest, are coloured light brown). Top: surface view showing the position of the cleft formed by the two largest subunits and the position of the four intein insertion sites (indicated by different colours) on the surface of the cleft, near the active site/"wall" region. Lower: three surface views from a different orientation; the middle image has all of the subunits other than the second largest subunit as a semi-transparent surface so that the position of the two intein insertion sites on the interface between RPB1 and RPB2 (red and blue regions) can be seen. The lowest image is of RPB2 only, but with the intein insertion sites labelled with the names of inteins inserted at these positions in some homologues.
Figure 5. Phylogenetic tree of intein splicing domains. This unrooted distance tree was constructed by the neighbour-joining method using PAUP*4b10 [52] with the default settings. Numbers on the branches indicate the percentages of bootstrap support derived from a heuristic search with 100 random addition replicates; this search included a tree-bisection-reconnection branch-swapping algorithm. All bootstrap values > 50 have been reported except in cases where allelic inteins fall within a well-supported (95–100% bootstrap) group, when some values > 50 have been omitted for reasons of space. Inteins encoded by nuclear genes are highlighted with an asterisk. The alignment used is available as supplementary data (additional file 5).