Kun-Lung Li, Keisuke Nakashima, Jun Inoue, Noriyuki Satoh.
Abstract
Horizontal gene transfer (HGT) is the movement of genetic material between different species. Although HGT is less frequent in eukaryotes than in bacteria, several instances of HGT have apparently shaped animal evolution. One well-known example is the tunicate cellulose synthase gene, CesA, in which a gene, probably transferred from bacteria, greatly impacted tunicate evolution. A Glycosyl Hydrolase Family 6 (GH6) hydrolase-like domain exists at the C-terminus of tunicate CesA, but not in cellulose synthases of other organisms. The recent discovery of another GH6 hydrolase-like gene (GH6-1) in tunicate genomes further raises the question of how tunicates acquired GH6. To examine the probable origin of these genes, we analyzed the phylogenetic relationship of GH6 proteins in tunicates and other organisms. Our analyses show that tunicate GH6s, the GH6-1 gene, and the GH6 part of the CesA gene, form two independent, monophyletic gene groups. We also compared their sequence signatures and exon splice sites. All tunicate species examined have shared splice sites in GH6-containing genes, implying ancient intron acquisitions. It is likely that the tunicate CesA and GH6-1 genes existed in the common ancestor of all extant tunicates.
Keywords: Glycosyl Hydrolase Family 6; Tunicates; cellulose synthase; horizontal gene transfer; intron gain
Year: 2020 PMID: 32823766 PMCID: PMC7464555 DOI: 10.3390/genes11080937
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Hypotheses on the origin of tunicate GH6 domain-containing genes. Three scenarios have been proposed to explain the existence of two GH6 domain-containing genes in extant tunicate genomes.
Tunicate GH6-containing genes or gene models and related genes analyzed in this study.
| Species | Domain Content | Short Name of the Gene Used in this Manuscript * | Source Database | Accession/ID of Gene, Transcript, or Protein | Note |
|---|---|---|---|---|---|
| | GH6 | | GenBank | XM_002119543.4/XP_002119579.1 | |
| | CesA+GH6 | | GenBank | NM_001047983.1/BAD10864.1 | As reported in [ |
| | | | | | |
| | GH6 | | GenBank (Transcriptome) | GGEI01013363.1 | |
| | CesA+GH6 | | GenBank | AY504665.1/AAR89623.1 | As reported in [ |
| | | | | | |
| | GH6 | | GenBank (Transcriptome) | GFCC01117283.1 | Possible lineage-specific duplication |
| | GH6 | | GenBank (Transcriptome) | GFCC01119318.1 | No possible catalytic Asp; possible lineage-specific duplication |
| | CesA+GH6 | | GenBank (Transcriptome) | GFCC01072613.1 | |
| | | | | | |
| | GH6 | | Aniseed database | Moocci.CG.ELv1_2.S285391.g07021.01.t | |
| | CesA+GH6 | | Aniseed database | Moocci.CG.ELv1_2.S469068.g15915.01.t | Short GH6 part |
| | GH6 | | Aniseed database | Moocci.CG.ELv1_2.S469068.g15914.01.t | Very short |
| | GH6 | | Aniseed database | Moocci.CG.ELv1_2.S469068.g15913.01.t | |
| | | | | | |
| | GH6 | | Aniseed database | Moocul.CG.ELv1_2.S112948.g12660.01.t | |
| | CesA+GH6 | | Aniseed database | Moocul.CG.ELv1_2.S71617.g04842.01.t | Rhodopsin-like GPCR domain at upstream part |
| | GH6 | | Aniseed database | Moocul.CG.ELv1_2.S69739.g04625.01.t | |
| | | | | | |
| | GH6 | | | g9326 | |
| | GH6 | | | g61144 | Short, similar to BscGH6-1 |
| | GH6 | | | g44331 | Similar to BscCesAbGH6 (89.6% identity in the matching 222 AA region) |
| | GH6 | | | g45080 | Similar to BscCesAaGH6 |
| | | | | | |
| | GH6 | | Aniseed database | Boleac.CG.SB_v3.S133.g02304.01.t | |
| | CesA+GH6 | | Aniseed database | Boleac.CG.SB_v3.S157.g03251.01.t | |
| | | | | | |
| | GH6 | | OikoBase/GenBank | GSOIDT00010490001/CBY09680.1 | |
| | GH6 | | OikoBase/GenBank | GSOIDT00021901001/CBY33927.1 | 98% identical to OdiGH6 |
| | CesA+GH6 | | GenBank | AB543593.1/BAJ65326.1 | As reported in [ |
| | CesA+GH6 | | GenBank | AB543594.1/BAJ65327.1 | As reported in [ |
* Gene names were assigned after considering phylogenetic information examined in this study and in that by Inoue et al. [14].
GH6 proteins in different taxa.
| Taxa | GH6 presence? |
|---|---|
| Bacteria | |
| Archaea | Not yet observed |
| Eukaryota > Opisthokonta > Metazoa > tunicates | |
| Metazoa, except tunicate | No? Contamination? *1 |
| Fungi | |
| Opisthokonta, except Metazoa and fungi | Not yet observed |
| Viridiplantae | No? Contamination? *2 |
| SAR-Stramenopiles | |
| SAR-Alveolate | |
| SAR-Rhizaria | Not yet observed |
| Haptista | |
| Rhodophyta | |
| Other eukaryotes | Not yet observed |
*1: A GH6 protein in the Lucilia cuprina (a dipteran) genome project, XP_023300643.1, was very similar to bacterial GH6 proteins and was located on a genomic scaffold that contained other probable bacterial genes. *2: A GH6 protein found in the Gossypium hirsutum (upland cotton) genome project, XP_016733546.1, was highly similar to bacterial GH6 proteins and was located on a genomic scaffold that contained other probable bacterial genes. These two cases were the only hits containing GH6 domains in their respective searches, and we treated both as bacterial contaminants.
Figure 2. Phylogenetic trees of GH6-containing proteins constructed by Bayesian inference (A) and maximum likelihood (B). All tunicate sequences formed a cluster. The cluster was further divided into two subclusters of CesA-GH6 domains and GH6-1 proteins. However, the clustering of tunicate GH6 sequences with GH6 proteins of other organisms was not well-supported. Rooting was arbitrary in both panels. Numbers next to internal nodes or branches represent posterior probabilities (in panel A) or bootstrap support (in panel B) of the neighboring branch. The same trimmed multiple sequence alignment was used as input for both analyses. Bayesian inference was performed with MrBayes using a mixed substitution model (aamodelpr = mixed). The analysis was terminated after 2,500,000 generations because the standard deviation of split frequencies remained stable at 0.126917 after generation 1,830,000, although the analysis could not reach ideal convergence owing to short sequence lengths and divergent data. The maximum likelihood analysis was performed with RAxML-HPC BlackBox on CIPRES Science Gateway. The WAG amino acid substitution model with empirical base frequencies was selected and bootstrapping was automatically stopped after 804 cycles. The first part of each sequence name indicates its source organism category, in alphabetical order: a, Actinobacteria, excluding Streptomyces; b, Bacteria excluding Actinobacteria; f, fungi; s, genus Streptomyces; T, tunicates. Fully-expanded trees are shown as supplementary figures. Scales represent expected changes per site.
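The prefix scheme in the Figure 2 legend can be applied programmatically when tallying tip labels from the trees. The following is a minimal sketch, assuming that the category is encoded by the first character of each label; the example labels and the function names are hypothetical and are not taken from the study's tree files.

```python
from collections import Counter

# Prefix-to-category mapping copied from the Figure 2 legend.
PREFIX_CATEGORY = {
    "a": "Actinobacteria, excluding Streptomyces",
    "b": "Bacteria excluding Actinobacteria",
    "f": "fungi",
    "s": "genus Streptomyces",
    "T": "tunicates",
}


def categorize(label: str) -> str:
    """Return the category encoded by the first character of a tip label.

    Assumption: the prefix is exactly one character long.
    """
    if not label:
        raise ValueError("empty label")
    try:
        return PREFIX_CATEGORY[label[0]]
    except KeyError:
        raise ValueError(f"unknown prefix in label: {label!r}") from None


def count_by_category(labels):
    """Count how many tip labels fall into each source-organism category."""
    return dict(Counter(categorize(label) for label in labels))


# Hypothetical tip labels, for illustration only.
example_labels = ["T_seq1", "T_seq2", "f_seq3", "s_seq4"]
print(count_by_category(example_labels))
```

The lookup is case-sensitive because the legend distinguishes the uppercase T for tunicates from the lowercase prefixes.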
Figure 3. Amino acid conservation of tunicate GH6-domain-containing proteins. (A) GH6-1 proteins from ascidians and the GH6-1a from Salpa thompsoni have aspartic acids that correspond to the catalytic center of fungal Cel6A protein; however, another S. thompsoni GH6-1 protein (SthGH6-1b), an Oikopleura GH6-1 protein, and tunicate CesA proteins show other amino acids at this site. Similar amino acids under the BLOSUM62 matrix are color-shaded. HjeCel6A: H. jecorina Exoglucanase 2, UniProtKB P07987.1. (B–C) Sequence logos of Glycosyl Hydrolases Family 6 Signature 1 (PROSITE entry PS00655, panel B) and Glycosyl Hydrolases Family 6 Signature 2 (PROSITE entry PS00656, panel C), showing the amino acid frequency of each site.
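The per-site amino acid frequencies that a sequence logo displays can be computed directly from an alignment. The sketch below assumes equal-length aligned sequences with "-" as the gap character. The toy alignment is invented for illustration and is not data from the study. Many logo tools also scale letter heights by information content, which this sketch does not attempt.

```python
from collections import Counter


def site_frequencies(aligned_seqs, gap="-"):
    """Return one {residue: frequency} dict per alignment column.

    Gaps are ignored, so frequencies in a column sum to 1 over the
    non-gap residues (an all-gap column yields an empty dict).
    """
    if not aligned_seqs:
        raise ValueError("no sequences given")
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    freqs = []
    for column in zip(*aligned_seqs):
        residues = [r for r in column if r != gap]
        counts = Counter(residues)
        total = len(residues)
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs


# Toy alignment (hypothetical), three columns wide.
toy_alignment = ["DAG", "DSG", "NAG", "D-G"]
for position, column_freqs in enumerate(site_frequencies(toy_alignment), start=1):
    print(position, column_freqs)
```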
Splice site matches of tunicate GH6-1 proteins.
| | | Splice site Cin217 | Splice site Cin256 | Splice site Cin316 |
|---|---|---|---|---|
| CinGH6-1 | 3 | V217, +2 | G256, +1 | K316, +3 |
| CsaGH6-1 | 3 | V223, +2 | G262, +1 | K322, +3 |
| SthGH6-1a | 6 | E229, +2 | G268, +1 | P328, +3 |
| SthGH6-1b | 5 | K230, +2 | G269, +1 | K329, +3 |
| MoxGH6-1 | 3 | R222, +2 | G260, +1 | A320, +3 |
| MocGH6-1 | 2 | n.s.*1 (R222) | G260, +1 | A320, +3 |
| BscGH6-1 | 5 | K335, +2 | G373, +1 | A433, +3 |
| BleGH6-1 | 4 | K229, +2 | n.s.*2 (G285) | A345, +3 |
| OdiGH6-1 | 6 | n.s.*1 (N244) | n.s.*2 (G282) | n.s.*1 (K343) |
| OdiCesA1*3 | 8 | n.s.*1 (R1001) | n.s.*2 (G1040) | R1100*3, frame +2 |
All matching splice sites found in this study are C-terminal to the possible catalytic center: positions 178–187 in C. intestinalis type A GH6-1. *1: No splice (n.s.) site at the aligned amino acid and the amino acid is not conserved; *2: No splice (n.s.) site at the aligned amino acid, although this position encodes a conserved glycine; *3: The splice site OdiCesA1-R1100 could be aligned with splice site Cin316 of GH6-1 proteins at the amino acid level, but there is a one-nucleotide position difference and it may not represent a shared splice site.
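The cell notation in the splice site table can be parsed to compare sites across genes. The sketch below assumes that a cell such as "V217, +2" gives the aligned residue, its position, and a +n value, which footnote *3 refers to as the frame, and that "n.s." cells carry only the aligned residue. The `SpliceEntry` type and function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class SpliceEntry(NamedTuple):
    residue: str
    position: int
    frame: Optional[int]  # None when the cell reports no splice site (n.s.)


# Cells with a splice site, e.g. "V217, +2" or "R1100*3, frame +2".
_SPLICED = re.compile(r"^([A-Z])(\d+)(?:\*\d+)?,\s*(?:frame\s*)?\+(\d)$")
# Cells without a splice site, e.g. "n.s.*1 (R222)".
_NOT_SPLICED = re.compile(r"^n\.s\.\*\d+\s*\(([A-Z])(\d+)\)$")


def parse_entry(text: str) -> SpliceEntry:
    """Parse one table cell into residue, position, and +n value."""
    text = text.strip()
    match = _SPLICED.match(text)
    if match:
        return SpliceEntry(match.group(1), int(match.group(2)), int(match.group(3)))
    match = _NOT_SPLICED.match(text)
    if match:
        return SpliceEntry(match.group(1), int(match.group(2)), None)
    raise ValueError(f"unrecognized cell: {text!r}")


def frames(cells):
    """Return the set of +n values among cells that have a splice site."""
    parsed = (parse_entry(cell) for cell in cells)
    return {entry.frame for entry in parsed if entry.frame is not None}


# The Cin316 column, copied from the table in row order.
cin316 = [
    "K316, +3", "K322, +3", "P328, +3", "K329, +3", "A320, +3",
    "A320, +3", "A433, +3", "A345, +3", "n.s.*1 (K343)", "R1100*3, frame +2",
]
print(frames(cin316))
```

Dropping the final OdiCesA1 cell leaves a single +n value in this column, which matches the one-nucleotide difference that footnote *3 describes.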