Motofumi Saito (1,2), Asako Sato (1), Shohei Nagata (1,2), Satoshi Tamaki (1), Masaru Tomita (1,2,3), Haruo Suzuki (1,3), Akio Kanai (1,2,3)
Abstract
Clp1, a polyribonucleotide 5′-hydroxyl kinase in eukaryotes, is involved in pre-tRNA splicing and mRNA 3′-end formation. Enzymes similar in amino acid sequence to Clp1, namely Nol9 and Grc3, are present in some eukaryotes and are involved in pre-rRNA processing. However, our knowledge of how these Clp1 family proteins evolved and diversified is limited. We conducted a large-scale molecular evolutionary analysis of the Clp1 family proteins in all living organisms for which protein sequences are available in public databases. The phylogenetic distribution and frequencies of the Clp1 family proteins were investigated in complete genomes of Bacteria, Archaea, and Eukarya. In total, 3,557 Clp1 family proteins were detected across the three domains of life. Many were from Archaea and Eukarya, but a few were found in a restricted set of phylogenetically diverse bacterial species. The domain structures of the Clp1 family proteins also differed among the three domains of life. Although the proteins were, on average, 555 amino acids long (range, 196–2,728), 122 large proteins with >1,000 amino acids were detected in eukaryotes. These novel proteins contain the conserved Clp1 polynucleotide kinase domain together with various other functional domains. Of these proteins, >80% were from Fungi or Protostomia. The polyribonucleotide kinase activity of Thermus scotoductus Clp1 (Ts-Clp1) was characterized experimentally. Ts-Clp1 preferentially phosphorylates single-stranded RNA oligonucleotides (Km for ATP, 2.5 µM) and, at higher enzyme concentrations, also phosphorylates single-stranded DNA. We propose a comprehensive assessment of the diversification of the Clp1 family proteins and the molecular evolution of their functional domains.
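The abstract reports a Km for ATP of 2.5 µM but no Vmax, so the reported value is easiest to interpret as a fraction of maximal rate. The sketch below is a minimal illustration, assuming Ts-Clp1 follows simple (hyperbolic) Michaelis–Menten kinetics with respect to ATP while the RNA substrate is held constant; the function names are hypothetical and not taken from the study.

```python
# ASSUMPTION: simple Michaelis-Menten kinetics in ATP; only Km comes from the abstract.
KM_ATP_UM = 2.5  # Km of Ts-Clp1 for ATP, in micromolar


def fraction_of_vmax(s_um: float, km_um: float = KM_ATP_UM) -> float:
    """Michaelis-Menten rate as a fraction of Vmax: v / Vmax = [S] / (Km + [S])."""
    if s_um < 0:
        raise ValueError("substrate concentration must be non-negative")
    return s_um / (km_um + s_um)


def conc_for_fraction(f: float, km_um: float = KM_ATP_UM) -> float:
    """Invert the equation: [S] = f * Km / (1 - f), valid for 0 <= f < 1."""
    if not 0 <= f < 1:
        raise ValueError("fraction must lie in [0, 1)")
    return f * km_um / (1 - f)


if __name__ == "__main__":
    for atp in (0.25, 2.5, 25.0):
        print(f"[ATP] = {atp:6.2f} uM -> v/Vmax = {fraction_of_vmax(atp):.3f}")
    print(f"[ATP] for 90% of Vmax: {conc_for_fraction(0.9):.1f} uM")
```

Under this assumption, ATP at Km gives half-maximal rate, ATP at 10 × Km (25 µM) gives about 91% of Vmax, and 90% saturation needs about 22.5 µM ATP.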
Keywords: comprehensive identification; experimental verification; large protein; molecular evolution; multidomain protein; protein family
Year: 2019 PMID: 31513263 PMCID: PMC6777427 DOI: 10.1093/gbe/evz195
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Data Sets Used in This Study
(A) Numbers of species and coding sequences (CDSs) in the three domains of life in the UniProtKB database.

| | Eukarya | Bacteria | Archaea | Others | Total |
|---|---|---|---|---|---|
| Species | 1,259,362 | 486,584 | 13,015 | 193,814 | 1,952,775 |
| CDS | 32,407,431 | 96,898,645 | 2,980,127 | 5,485,853 | 137,772,065 |
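The per-domain columns can be summed and compared with the Total column, and expressed as shares of each row. A minimal sketch with the values transcribed from the table; the constant and function names are hypothetical.

```python
# Values transcribed from the UniProtKB table (column order: Eukarya, Bacteria, Archaea, Others).
DOMAINS = ("Eukarya", "Bacteria", "Archaea", "Others")
SPECIES = (1_259_362, 486_584, 13_015, 193_814)
CDS = (32_407_431, 96_898_645, 2_980_127, 5_485_853)
REPORTED_TOTALS = {"Species": 1_952_775, "CDS": 137_772_065}


def shares(counts):
    """Fraction of the recomputed row total contributed by each domain."""
    total = sum(counts)
    return {domain: n / total for domain, n in zip(DOMAINS, counts)}


if __name__ == "__main__":
    for row, counts in (("Species", SPECIES), ("CDS", CDS)):
        print(f"{row}: recomputed total {sum(counts):,} (reported {REPORTED_TOTALS[row]:,})")
        for domain, share in shares(counts).items():
            print(f"  {domain:<8} {share:6.1%}")
```

Bacterial CDSs account for roughly 70% of all CDSs although bacterial entries are only about a quarter of the species. With the values as transcribed, the species row sums to the reported 1,952,775, while the CDS row sums to 137,772,056 rather than the 137,772,065 listed.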
Distribution of Clp1 Family Proteins in 72 Representative Species with Complete Genome Sequences
| Domain | Kingdom | Species | Clp1 | Nol9 | Grc3 | Othersᵃ |
|---|---|---|---|---|---|---|
| (A) | | | | | | |
| Eukarya | Metazoa | | 1 | 1 | — | 1 |
| | | | 1 | 1 | — | — |
| | | | 1 | 1 | — | 1 |
| | | | 1 | 1 | — | — |
| | | | 1 | 1 | — | — |
| | | | 3 | 1 | — | — |
| | | | 2 | 1 | — | — |
| | | | 1 | 1 | — | — |
| | | | 1 | 1 | — | — |
| | | | 1 | 1 | — | — |
| | | | 1 | — | — | 1 |
| | Fungi | | 1 | — | 1 | — |
| | | | 1 | — | 1 | — |
| | | | 1 | — | — | 1 |
| | Plantae | | 1 | — | — | — |
| | | | 3 | 1 | — | 2 |
| | | | 2 | — | — | 4 |
| | Amoebozoa | | 1 | 1 | — | — |
| Archaea | Euryarchaeota | | 1 | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | 1 |
| | | | 1 | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | 1 |
| | | | 1 | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
| | | | 1 | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
| | Crenarchaeota | | — | — | — | 2 |
| | | | — | — | — | 2 |
| | | | — | — | — | 1 |
| | | | — | — | — | — |
| | Nanoarchaeota | | — | — | — | — |
| (B) | | | | | | |
| Bacteria | Firmicutes | | — | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
| | Planctomycetes | | — | — | — | — |
| | | | — | — | — | — |
| | Spirochaetes | | — | — | — | — |
| | | | — | — | — | — |
| | Actinobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | Fibrobacteres | | — | — | — | — |
| | Chlorobi | | — | — | — | — |
| | Bacteroidetes | | — | — | — | — |
| | | | — | — | — | — |
| | Chlamydiae | | — | — | — | — |
| | Fusobacteria | | — | — | — | — |
| | Thermotogae | | — | — | — | — |
| | Aquificae | | — | — | — | — |
| | Chloroflexi | | — | — | — | — |
| | Deinococcales | | — | — | — | 1 |
| | | | — | — | — | — |
| | Cyanobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | Acidobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | δ-Proteobacteria | | 1 | — | — | — |
| | | | — | — | — | 1 |
| | ɛ-Proteobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | α-Proteobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | β-Proteobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | γ-Proteobacteria | | — | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
| | | | — | — | — | — |
Note.—The numbers of Clp1 or Clp1-related proteins in representative complete genomes of (A) Eukarya (18 species), Archaea (18 species), and (B) Bacteria (36 species) are shown.
ᵃ“Others” includes proteins annotated as “GTPase,” “translation factor GUF1,” or “uncharacterized protein” based on their domain similarities.
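Tallying the table per domain makes the contrast in the abstract concrete: every eukaryote listed carries at least one Clp1 family protein, whereas only a few bacteria do. A minimal sketch with the counts transcribed from the table, entering “—” as 0 and omitting all-dash rows (they add nothing to the totals); the names are hypothetical.

```python
from typing import Dict, List, Tuple

Counts = Tuple[int, int, int, int]  # (Clp1, Nol9, Grc3, Others); "—" entered as 0
COLUMNS = ("Clp1", "Nol9", "Grc3", "Others")

# domain -> (number of species in the table, rows with at least one nonzero count)
TABLE: Dict[str, Tuple[int, List[Counts]]] = {
    "Eukarya": (18, [
        # Metazoa
        (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 1), (1, 1, 0, 0), (1, 1, 0, 0), (3, 1, 0, 0),
        (2, 1, 0, 0), (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 0, 0), (1, 0, 0, 1),
        # Fungi
        (1, 0, 1, 0), (1, 0, 1, 0), (1, 0, 0, 1),
        # Plantae
        (1, 0, 0, 0), (3, 1, 0, 2), (2, 0, 0, 4),
        # Amoebozoa
        (1, 1, 0, 0),
    ]),
    "Archaea": (18, [
        # Euryarchaeota
        (1, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (1, 0, 0, 0),
        # Crenarchaeota
        (0, 0, 0, 2), (0, 0, 0, 2), (0, 0, 0, 1),
    ]),
    "Bacteria": (36, [
        (0, 0, 0, 1),                  # Deinococcales
        (1, 0, 0, 0), (0, 0, 0, 1),    # delta-Proteobacteria
    ]),
}


def summarize(table: Dict[str, Tuple[int, List[Counts]]]) -> Dict[str, Tuple[Dict[str, int], int, int]]:
    """Per domain: column totals, species carrying >= 1 protein, and species listed."""
    summary = {}
    for domain, (n_species, rows) in table.items():
        totals = [sum(column) for column in zip(*rows)]
        carriers = sum(1 for row in rows if any(row))
        summary[domain] = (dict(zip(COLUMNS, totals)), carriers, n_species)
    return summary


if __name__ == "__main__":
    for domain, (totals, carriers, n) in summarize(TABLE).items():
        print(f"{domain}: {totals}; species with >= 1 protein: {carriers}/{n}")
```

With these transcribed counts, all 18 eukaryotes, 9 of 18 archaea, and 3 of 36 bacteria carry at least one Clp1 or Clp1-related protein.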
Fig. 1.—Phylogeny and domain structure of Clp1 family proteins in the three domains of life. Phylogenetic trees were constructed from the amino acid sequences of 14 selected Clp1 family proteins using the maximum likelihood method with 1,000 bootstrap replicates. Numbers on the branches indicate bootstrap values. The scale bar under the tree indicates the number of amino acid substitutions per site. Because there is no suitable outgroup for the three domains of life, midpoint rooting was used. Proteins that were experimentally characterized are marked with an asterisk. In this study, we demonstrated the polynucleotide kinase activity of a bacterial protein of previously unknown function from Thermus scotoductus (UniProt AC: E8PQM6). Domains were visualized with DoMosaics and are defined as follows: Clp1_eN, Clp1 N-terminal domain in eukaryotes; Nol9_eN, Nol9 N-terminal domain in eukaryotes; Clp1_P, polynucleotide kinase domain; Clp1_eC, Clp1 C-terminal domain in eukaryotes; Clp1_aC, Clp1 C-terminal domain in archaea; Nol9_eC, Nol9 C-terminal domain in eukaryotes. § N-terminal lengths in prokaryotic Clp1 family proteins are classified into two groups: short (S), <60 aa; and long (L), >61 aa (see also supplementary figs. S2 and S3, Supplementary Material online). N/A, not applicable. The scale bar beneath the domain illustration shows the amino acid (aa) length of each protein. The organisms are Pyrococcus furiosus, Pyrococcus horikoshii, Methanocaldococcus jannaschii, Spirochaetes bacterium, Desulfovibrio africanus, T. scotoductus, Homo sapiens, Bos taurus, Schizosaccharomyces pombe, and Grifola frondosa.
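Midpoint rooting places the root halfway along the longest tip-to-tip path of an unrooted tree, which is why it can be used when no outgroup exists. The caption does not say which software performed the rooting, so the sketch below is not the study's implementation: it is a minimal illustration that assumes the tree is already built (for example by maximum likelihood) and is given as hypothetical weighted edges, with branch lengths in substitutions per site.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Tree = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, branch length)]


def from_edges(edges: List[Tuple[str, str, float]]) -> Tree:
    """Build an undirected (unrooted) tree from (node, node, branch length) triples."""
    tree: Tree = {}
    for u, v, length in edges:
        tree.setdefault(u, []).append((v, length))
        tree.setdefault(v, []).append((u, length))
    return tree


def distances_from(tree: Tree, start: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Path length from `start` to every node, plus parent pointers back toward `start`."""
    dist: Dict[str, float] = {start: 0.0}
    parent: Dict[str, str] = {}
    stack = [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node]:
            if nbr not in dist:  # paths in a tree are unique, so the first visit is final
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    return dist, parent


def midpoint_root(tree: Tree) -> Tuple[str, str, float]:
    """Return (u, v, offset): the root sits on edge u-v, `offset` length units from u."""
    leaves = [n for n, nbrs in tree.items() if len(nbrs) == 1]
    # The two most distant leaves define the longest tip-to-tip path.
    a, b = max(combinations(leaves, 2),
               key=lambda pair: distances_from(tree, pair[0])[0][pair[1]])
    dist, parent = distances_from(tree, a)
    half = dist[b] / 2
    # Walk from b back toward a until the next step would cross the halfway point.
    node = b
    while dist[parent[node]] > half:
        node = parent[node]
    u = parent[node]
    return u, node, half - dist[u]
```

For an unrooted three-leaf tree with branches X–A = 1, X–B = 2, and X–C = 6, the longest path runs from B to C (8 units), so `midpoint_root` places the root 2 units from X along the X–C branch.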
Fig. 2.—Scattered and restricted distribution of Clp1-related proteins on the bacterial phylogenetic tree. The numbers of bacterial Clp1-related proteins were mapped onto a bacterial phylogenetic tree of 578 species with complete genome sequences (Wu and Eisen 2008). For each phylum, the ratio of the number of Clp1-related proteins to the total number of species used in this study (see supplementary table S5, Supplementary Material online) is shown, with the calculated value in parentheses. Phyla in which species possessed Clp1-related proteins are underlined in red.
Fig. 3.—Large proteins with the Clp1 polynucleotide kinase domain. Representative examples of large proteins that contain the Clp1 polynucleotide kinase domain and other functional domains. The scale bar represents 100 amino acid (aa) residues. Each domain is shown schematically as a box and numbered according to its frequency of appearance. See the figure 1 legend for the definitions of domains Clp1_eN, Nol9_eN, Clp1_P, Clp1_eC, Clp1_aC, and Nol9_eC. Other functional domains are as defined in the Pfam database (https://pfam.xfam.org/; last accessed September 17, 2019). The number of proteins with a similar domain structure is shown as “number ×”; for example, “3×” means three proteins with a similar domain structure.
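The “number ×” labels amount to grouping proteins by domain architecture. The sketch below assumes that “similar domain structure” means an identical N- to C-terminal list of domain names and that “large” means >1,000 aa, as in the abstract. The protein records, including the accessions and the placeholder domain “DomainX”, are hypothetical toy inputs, not data from the study.

```python
from collections import Counter
from typing import List, NamedTuple, Tuple


class Protein(NamedTuple):
    accession: str             # hypothetical identifier
    length: int                # amino acids
    domains: Tuple[str, ...]   # N- to C-terminal order


def large_architectures(proteins: List[Protein], min_length: int = 1000) -> Counter:
    """Count domain architectures among proteins longer than `min_length` aa
    that carry the Clp1 polynucleotide kinase domain (Clp1_P)."""
    return Counter(p.domains for p in proteins
                   if p.length > min_length and "Clp1_P" in p.domains)


def label(architecture: Tuple[str, ...], n: int) -> str:
    """Render a group in the figure's "number ×" style."""
    return f"{n}× " + "-".join(architecture)


# Toy input: three large proteins share one architecture, one differs, one is too short.
TOY = [
    Protein("P1", 1450, ("Clp1_eN", "Clp1_P", "Clp1_eC", "DomainX")),
    Protein("P2", 1320, ("Clp1_eN", "Clp1_P", "Clp1_eC", "DomainX")),
    Protein("P3", 1210, ("Clp1_eN", "Clp1_P", "Clp1_eC", "DomainX")),
    Protein("P4", 1105, ("Nol9_eN", "Clp1_P", "Nol9_eC")),
    Protein("P5", 430, ("Clp1_eN", "Clp1_P", "Clp1_eC")),
]

if __name__ == "__main__":
    for architecture, n in large_architectures(TOY).most_common():
        print(label(architecture, n))
```

A stricter or looser notion of similarity (for example, ignoring domain order) would change only the key passed to `Counter`.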
Fig. 4.—Schematic representation of bacterial Clp1 and its conserved motifs. (A) Three Clp1 proteins from human (Homo sapiens), bacteria (Thermus scotoductus), and archaea (Pyrococcus furiosus) are shown as bars. Numbers below the proteins indicate amino acid positions. Percentage identities (similarities) of specific regions among the Clp1 proteins are indicated. The polynucleotide kinase domain of each protein is shown in green. The four conserved functional motifs in the polynucleotide kinase domain are indicated with triangles (see text for details). (B) Amino acid sequences of the conserved motifs in the polynucleotide kinase domain were aligned with MAFFT, and the alignments were visualized with Jalview. Identical amino acid residues are shown in blue and partly conserved residues in light blue. Amino acid positions, counted from the initiator methionine (Met), are shown to the left of each line. In the consensus sequence line, “x” and “h” denote any amino acid residue and a hydrophobic amino acid residue, respectively. See the figure 1 legend for the five organisms used here.
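A consensus line written with “x” (any residue) and “h” (hydrophobic residue) maps directly onto a regular expression, which makes it easy to scan a sequence for a motif and report positions counted from the first Met. A minimal sketch: the hydrophobic residue set is an assumption (tools differ), and the consensus “GxxGKh” and the sequence used below are hypothetical, not the motifs shown in the figure.

```python
import re
from typing import List

# ASSUMPTION: the residue set behind "h" varies between tools; this set is illustrative.
HYDROPHOBIC = "AVLIMFWC"


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus such as 'GxxGKh' into a regular expression."""
    parts = []
    for symbol in consensus:
        if symbol == "x":
            parts.append(".")                   # any residue
        elif symbol == "h":
            parts.append(f"[{HYDROPHOBIC}]")    # any residue in the hydrophobic set
        else:
            parts.append(re.escape(symbol))     # literal residue
    return "".join(parts)


def find_motif(sequence: str, consensus: str) -> List[int]:
    """1-based start positions of every match, counted from the first residue (Met).
    The lookahead lets overlapping matches be reported as well."""
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [m.start() + 1 for m in pattern.finditer(sequence.upper())]
```

For example, `find_motif("MKGSAGKVLPGTTGKD", "GxxGKh")` returns `[3]`: the second candidate, GTTGKD at position 11, is rejected because D is not in the assumed hydrophobic set.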
Fig. 5.—Verification of the polynucleotide kinase activity of bacterial Clp1. (A) Purified recombinant Thermus scotoductus Clp1 (Ts-Clp1) was separated by 10–20% SDS-PAGE and stained with Coomassie Brilliant Blue. An arrowhead indicates the position of purified Ts-Clp1. (B–G) Characterization of the polynucleotide kinase activity of Ts-Clp1. In the standard reaction, a 3′-fluorescein amidite (FAM)-labeled oligoribonucleotide was incubated with purified Ts-Clp1 protein (0.5 μg/ml) and 10 mM MgCl2 at 60 °C for 15 min. The products were separated by 15% polyacrylamide gel electrophoresis containing 8 M urea. The effects of the nucleoside triphosphate (B–C), type of nucleic acid (D–F), and temperature (G) on the polynucleotide kinase activity were examined. The substrates used are indicated at the top of each column. Pre-incu: the reaction mixture was preincubated at 90 °C before the enzyme was added. See also supplementary figure S10, Supplementary Material online.