Lenka Kerényiová, Štefan Janeček
Abstract
The family GH126 is a family of glycoside hydrolases established in 2011. In the CAZy database it officially contains ~1000 sequences, all originating from the bacterial phylum Firmicutes. Two members, the proteins CPF_2247 from Clostridium perfringens and PssZ from Listeria monocytogenes, have been characterized as a probable α-amylase and an exopolysaccharide-specific glycosidase, respectively; their three-dimensional structures have also been solved, showing a catalytic (α/α)6-barrel fold. Previously, a detailed in silico analysis identified seven conserved sequence regions (CSRs) for the family and elucidated the basic evolutionary relationships among its members. The present study continues that work with two particular aims: (1) to find out whether the taxonomic coverage of family GH126 extends beyond the Firmicutes and, if so, to identify those out-of-Firmicutes proteins and place them in the context of the family; and (2) to identify family members whose polypeptide chains carry N- and/or C-terminal extensions in addition to the catalytic (α/α)6-barrel domain, and to characterize these extra domains bioinformatically. The main results can be summarized as follows: (1) 17 bacterial proteins retrieved by BLAST searches outside the Firmicutes (especially from the phyla Proteobacteria, Actinobacteria and Bacteroidetes) have been found and convincingly suggested as new family GH126 members; and (2) a thioredoxin-like fold and various leucine-rich repeat motifs, identified by Phyre2 structure homology modelling, have been recognized as the extra domains occurring most frequently in the N-terminal extensions of family GH126 members with a modular organization.
Keywords: Bacterial members out-of-firmicutes; Family GH126; In silico analysis; Leucine-rich repeat motif; Sequence-structural features; Thioredoxin-like fold
Year: 2020 PMID: 32953382 PMCID: PMC7479077 DOI: 10.1007/s13205-020-02415-x
Source DB: PubMed Journal: 3 Biotech ISSN: 2190-5738 Impact factor: 2.406
Table 1. Seventeen hypothetical proteins from outside the Firmicutes with clear similarity to family GH126
| No.ᵃ | Organism | Phylum | GenBankᵇ | UniProtᶜ | Length |
|---|---|---|---|---|---|
| 1 | Bacterium BCRC 81127 | Unclassified | WP_135371658.1 | UPI00107F4117 | 379 |
| 2 | Bacterium BCRC 81129 | Unclassified | WP_135367822.1 | UPI00107F44C3 | 350 |
| 3 | Bacterium 42_11 | Unclassified | KUK13779.1 | A0A117KYT7 | 365 |
| 4 | Bacteroides xylanolyticusᵉ | Bacteroidetes | WP_104434259.1 | UPI000CEC40E9 | 538 |
| 5 | | Proteobacteria | OGQ30614.1 | UPI0008C880CE | 416 |
| 6 | | Proteobacteria | OGQ58036.1 | A0A1F9IPT0 | 437 |
| 7 | | Actinobacteria | CPW32488.1 | UPI0001A5C03B | 388 |
| 8 | Pseudomonas sp. GW457-E7 | Proteobacteria | PNB55453.1 | A0A2N8FV32 | 131ᶠ |
| 9 | | Bacteroidetes | SJN19201.1 | UPI00032F5CEA | 388 |
| 10 | | Synergistetes | HDQ93145.1 | –ᵈ | 370 |
| 11 | | Chlamydiae | SHE13947.1 | UPI000A27BFEE | 364 |
| 12 | | Proteobacteria | OON71423.1 | UPI00016B383E | 361 |
| 13 | | Actinobacteria | SLB95965.1 | UPI0009C51C47 | 358 |
| 14 | | Proteobacteria | RJO68936.1 | A0A3A4K738 | 387 |
| 15 | | Proteobacteria | WP_047792160.1 | UPI0006492D13 | 361 |
| 16 | Salmonella enterica | Proteobacteria | EAU0476096.1 | –ᵈ | 281ᶠ |
| 17 | Salmonella enterica | Proteobacteria | EAQ6393019.1 | –ᵈ | 285ᶠ |
ᵃ Proteins 1–10 were retrieved only by the BLAST search using the CPF_2247 protein as the query; proteins 11–15 were retrieved by BLAST searches using both the CPF_2247 and PssZ proteins as queries; proteins 16–17 were retrieved only by the BLAST search using the PssZ protein as the query. For all 17 proteins, the E-values from all BLAST searches ranged from 6e−35 to 4e−06, which was considered satisfactory.
ᵇ The accession numbers from the GenBank database
ᶜ The accession numbers from the UniProt database (UniParc identifiers start with “UPI”)
ᵈ The UniProt accession number is not yet available
ᵉ The protein from Bacteroides xylanolyticus contains an N-terminal extension (residues 1–154) adopting the thioredoxin-like fold
ᶠ Fragment: the sequence does not contain the entire catalytic (α/α)6-barrel domain characteristic of family GH126, which typically covers the seven conserved sequence regions
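Footnote a of Table 1 sorts the hits by which query retrieved them. A minimal Python sketch of that bookkeeping is shown below. It assumes each search's hits are recorded as a set of Table 1 row numbers, and the helper `partition_hits` is hypothetical, not part of the study's workflow. Plain set operations then recover the three groups.

```python
def partition_hits(cpf_hits: set[int], pssz_hits: set[int]) -> dict[str, set[int]]:
    """Split BLAST hits by which query (or queries) retrieved them."""
    return {
        "CPF_2247 only": cpf_hits - pssz_hits,   # retrieved by CPF_2247 alone
        "both queries": cpf_hits & pssz_hits,    # retrieved by both queries
        "PssZ only": pssz_hits - cpf_hits,       # retrieved by PssZ alone
    }

# Table 1 row numbers, grouped as described in footnote a.
CPF_HITS = set(range(1, 16))    # rows 1-15 were retrieved with CPF_2247
PSSZ_HITS = set(range(11, 18))  # rows 11-17 were retrieved with PssZ

groups = partition_hits(CPF_HITS, PSSZ_HITS)
for label, rows in groups.items():
    # min/max printing is fine here because every group is a contiguous run of rows
    print(f"{label}: rows {min(rows)}-{max(rows)} ({len(rows)} proteins)")
```

The three groups (10, 5 and 2 proteins) add up to the 17 proteins listed in the table.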
Ten family GH126 proteins possessing an N- or C-terminal extension
| No.ᵃ | Organism | GenBankᵇ | Length | Extensionᶜ | GH126ᵈ | Motifᵉ | Template (PDB)ᶠ | CDDᵍ | Pfamʰ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Bacillus velezensis | QHK13041.1 | 637 | 458–636 | 43–353 | GGDEF | Signalling protein from Caulobacter vibrioides (1W25) | + + | + + |
| 2 | | QJU43754.1 | 521 | 36–202 | 207–520 | Trx-like | Protein DipZ from | + + | + + |
| 3 | | APF21752.1 | 521 | 53–197 | 199–520 | Trx-like | Protein DipZ from | + + | + + |
| 4 | Clostridium butyricum | AXB84457.1 | 526 | 41–207 | 212–525 | Trx-like | Protein DipZ from Mycobacterium tuberculosis (2HYX) | + + | + + |
| 5 | Heliorestis convoluta | QGG46501.1 | 523 | ~1–150 | 166–520 | – | No relevant homologous structure found | + + | DUF |
| 6 | | QGG60425.1 | 776 | 178–294 | 450–774 | LRR | Internalin K from | + | |
| 7 | Lactobacillus brevis | AYM02277.1 | 1399 | 37–759 | 1051–1390 | LRR | Ser/Thr-protein kinase from Arabidopsis thaliana (6S6Q) | + + | + + |
| 8 | | ALO03904.1 | 658 | 47–151 | 317–658 | LRR | Internalin K from | + | |
| 9 | | AVK64614.1 | 658 | 47–151 | 322–658 | LRR | Internalin K from | + | |
| 10 | Bacteroides xylanolyticus | WP_104434259.1 | 538 | 36–209 | 213–532 | Trx-like | Protein DipZ from Mycobacterium tuberculosis (2HYX) | + + | + + |
ᵃ Proteins Nos. 1–9 were taken directly from family GH126 in the CAZy database; all originate from the phylum Firmicutes. Protein No. 1 is expected to belong to the group of the PssZ protein from L. monocytogenes, whereas proteins Nos. 2–9 are expected to belong to the group of the CPF_2247 amylolytic enzyme from C. perfringens (for details, see Kerenyiova and Janecek 2020). Note that protein No. 5 from Heliorestis convoluta exhibits features of both groups. Protein No. 10 was retrieved by the BLAST search (cf. Table 1)
ᵇ The accession numbers from the GenBank database
ᶜ The borders of individual extensions were determined on the basis of (1) sequence alignment with family GH126 members without any extension (mainly the two members with solved tertiary structures, CPF_2247 and PssZ) and (2) structure homology modelling results obtained with the Phyre2 server
ᵈ The approximate borders of the family GH126 (α/α)6-barrel, estimated from the Phyre2 results obtained with the CPF_2247 amylolytic enzyme (3REN) as template
ᵉ Motif abbreviations: GGDEF, a diguanylate cyclase domain with the GGDEF region; Trx-like, thioredoxin-like fold; LRR, leucine-rich repeat
ᶠ One of a few closely related best structural templates used for homology modelling by the Phyre2 server (PDB code in parentheses)
ᵍ,ʰ Searches of the CDD and Pfam databases using the entire amino acid sequence. The sign “+ +” means that the homology modelling results were confirmed. The sign “+” also denotes confirmation, but in that case only the first 300 residues from the N-terminal end were used for the search. For protein No. 5, DUF denotes an archaeal domain of unknown function, DUF373 (predicted to be an integral membrane protein with six transmembrane regions); although shown here, it is considered irrelevant because it spans only the short region of residues 42–83
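The Extension and GH126 columns of the table above fix which end of the chain carries the extra domain. As a hedged illustration, the sketch below classifies each extension by comparing its borders with those of the catalytic barrel, and it counts residues with inclusive numbering. The helper names are hypothetical and do not describe the study's own procedure. Protein No. 5 is left out because its extension border is only approximate.

```python
# (GenBank accession, extension borders, GH126 barrel borders) from the table above.
# Protein No. 5 (QGG46501.1) is omitted: its extension border (~1-150) is approximate.
PROTEINS = [
    ("QHK13041.1", (458, 636), (43, 353)),
    ("QJU43754.1", (36, 202), (207, 520)),
    ("APF21752.1", (53, 197), (199, 520)),
    ("AXB84457.1", (41, 207), (212, 525)),
    ("QGG60425.1", (178, 294), (450, 774)),
    ("AYM02277.1", (37, 759), (1051, 1390)),
    ("ALO03904.1", (47, 151), (317, 658)),
    ("AVK64614.1", (47, 151), (322, 658)),
    ("WP_104434259.1", (36, 209), (213, 532)),
]

def classify_extension(ext: tuple[int, int], barrel: tuple[int, int]) -> str:
    """Report on which side of the catalytic barrel an extension lies."""
    if ext[1] < barrel[0]:
        return "N-terminal"
    if ext[0] > barrel[1]:
        return "C-terminal"
    raise ValueError("extension overlaps the catalytic barrel")

def span_length(borders: tuple[int, int]) -> int:
    """Number of residues in an inclusive range such as 458-636."""
    start, end = borders
    return end - start + 1

for accession, ext, barrel in PROTEINS:
    print(f"{accession}: {classify_extension(ext, barrel)} extension, {span_length(ext)} residues")
```

For protein No. 1 this yields a C-terminal extension of 179 residues, which matches the 179 Cα atoms superimposed in Fig. 3a. All other listed extensions come out as N-terminal.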
Fig. 1 Sequence alignment of potential members of family GH126 originating outside the phylum Firmicutes with the two best-studied family representatives. The seventeen putative family members (cf. Table 1) are shown in green, while the two representatives of family GH126, the CPF_2247 amylolytic enzyme from C. perfringens and the PssZ protein from L. monocytogenes, are coloured red and blue, respectively. Note that the N-terminal extension (residues 1–154) of the protein from Bacteroides xylanolyticus has been cut off, and that the protein from Pseudomonas sp. GW457-E7 and the two proteins from Salmonella enterica are fragments of 131, 281 and 285 residues, respectively. The seven conserved sequence regions characteristic of family GH126 (Kerenyiova and Janecek 2020) are boxed and indicated above the alignment. The two potential catalytic residues (Glu84 in CSR-1 and Asp136 in CSR-3; CPF_2247 numbering) and the potentially functional aromatic residue Tyr194 in CSR-5 are italicized. Identical positions and conservative substitutions are marked under the alignment by asterisks and by dots/colons, respectively. Colour code for selected residues: W, yellow; F and Y, blue; V, L and I, green; D and E, red; R and K, cyan; H, brown; C, magenta; G and P, black
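The conservation line under the alignment in Fig. 1 follows the familiar Clustal convention. The caption does not name the alignment program, so the following Python sketch illustrates that convention and should not be read as the authors' procedure. It assumes the ClustalX-style strong (`:`) and weak (`.`) residue groups and leaves gapped columns unmarked. The toy sequences are made up.

```python
# ClustalX-style residue groups (assumed; the caption does not name the program).
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
               "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column: str) -> str:
    """Conservation symbol for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # assumption: gapped columns stay unmarked
    if len(residues) == 1:
        return "*"                      # identical position
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"                      # conservative substitution, strong group
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."                      # conservative substitution, weak group
    return " "

def conservation_line(aligned: list[str]) -> str:
    """Build the symbol line printed under a multiple sequence alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return "".join(column_symbol("".join(col)) for col in zip(*aligned))

# Made-up toy alignment, not taken from Fig. 1.
toy = ["GEWAD", "GEYSD", "GDFGE"]
print(conservation_line(toy))  # *::.:
```

In the toy example, the G column is identical (`*`), the D/E and F/Y/W columns fall into strong groups (`:`), and the A/S/G column only fits a weak group (`.`).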
Fig. 2 Evolutionary tree of family GH126. The tree comprises 117 unique non-redundant sequences of family GH126 (all from Firmicutes) and 17 additional potential family members originating outside the phylum Firmicutes. The tree is based on an alignment of complete sequences (for details, see Fig. S1). The two large evolutionary groups identified previously (Kerenyiova and Janecek 2020), represented by the CPF_2247 amylolytic enzyme from C. perfringens (48 members; red) and the PssZ protein from L. monocytogenes (69 members; blue), are supplemented by the additional out-of-Firmicutes sequences coloured green. Each protein is labelled by the name of the organism and its GenBank accession number. Four proteins whose N-terminal extensions were cut off before alignment are marked by an asterisk, with the length of the extension given in parentheses. Bootstrap values (not shown, for clarity) were ≥ 50% for more than 83% of interior branches
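The bootstrap statement in Fig. 2 can be turned into a rough count. Assume, though the caption does not say so, that the tree is unrooted and fully bifurcating. A tree with n leaves then has n − 3 interior branches, so the 117 + 17 = 134 sequences give 131 interior branches, and "more than 83%" means at least 109 of them had ≥ 50% support. The Python sketch below makes that arithmetic explicit. `supported_fraction` is a hypothetical helper for a list of bootstrap percentages and uses no data from the study.

```python
import math

def interior_branch_count(n_leaves: int) -> int:
    """Interior branches of an unrooted, fully bifurcating tree (assumption)."""
    if n_leaves < 3:
        raise ValueError("an unrooted tree needs at least three leaves")
    return n_leaves - 3

def supported_fraction(bootstraps: list[float], threshold: float = 50.0) -> float:
    """Fraction of interior branches whose bootstrap value (%) is >= threshold."""
    if not bootstraps:
        raise ValueError("no bootstrap values given")
    return sum(b >= threshold for b in bootstraps) / len(bootstraps)

def min_branches_above(fraction: float, n_branches: int) -> int:
    """Smallest branch count k with k / n_branches strictly greater than fraction."""
    return math.floor(fraction * n_branches) + 1

n_leaves = 117 + 17                            # Firmicutes + out-of-Firmicutes sequences
n_interior = interior_branch_count(n_leaves)   # 131 under the bifurcation assumption
print(n_leaves, n_interior, min_branches_above(0.83, n_interior))  # 134 131 109
```

If the published tree contains polytomies, it has fewer interior branches and the 109 figure no longer applies.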
Fig. 3 Structural models of terminal extra domains of family GH126 members. a Model of the C-terminal extension of the protein from Bacillus velezensis (GenBank accession No.: QHK13041.1; residues S458–E636; red) superimposed on the corresponding part of a signalling protein from Caulobacter vibrioides (PDB code: 1W25; residues L261–K442; yellow); b model of the N-terminal extension of the protein from Clostridium butyricum (AXB84457.1; residues I41–S207; red) superimposed on the thioredoxin-like fold of the protein Rv2874 from Mycobacterium tuberculosis (2HYX; residues I376–K545; yellow); c model of the N-terminal extension of the protein from Lactobacillus brevis (AYM02277.1; residues S37–G759; red) superimposed on the leucine-rich-repeat domain of the Ser/Thr-protein kinase from Arabidopsis thaliana (6S6Q; residues T29–N859; yellow); and d model of the N-terminal extension of the protein from Bacteroides xylanolyticus (WP_104434259.1; residues N37–E209; green) superimposed on the thioredoxin-like fold of the protein Rv2874 from Mycobacterium tuberculosis (2HYX; residues E366–N542; yellow). The superimposed parts cover: a 179 Cα atoms with an RMSD of 0.24 Å; b 162 Cα atoms with an RMSD of 0.50 Å; c 676 Cα atoms with an RMSD of 0.59 Å; and d 170 Cα atoms with an RMSD of 0.57 Å. Note that all templates are coloured yellow, whereas the models are shown in red (a, b and c) or green (d), depending on whether or not the protein has already been classified in family GH126
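The RMSD values in Fig. 3 summarize how closely each model's Cα atoms match the template after superposition: RMSD = sqrt((1/N) Σ ‖xᵢ − yᵢ‖²) over the N matched atom pairs. The Python sketch below computes only that final step. It assumes the coordinates have already been paired and superimposed, because the caption does not name the superposition tool. The optimal-rotation step (for example, the Kabsch algorithm) is not shown. The coordinates in the example are invented.

```python
import math

Coord = tuple[float, float, float]

def ca_rmsd(model: list[Coord], template: list[Coord]) -> float:
    """RMSD (Å) between paired C-alpha atoms that are ALREADY superimposed."""
    if not model or len(model) != len(template):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, template)   # one matched atom pair per step
        for a, b in zip(p, q)              # x, y and z components
    )
    return math.sqrt(squared / len(model))

# Invented coordinates: every model atom sits 0.5 Å from its template partner.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (7.6, 0.5, 0.0)]
print(round(ca_rmsd(model, template), 3))  # 0.5
```

Because the deviations are squared before averaging, a few badly placed atoms raise the RMSD more than many slightly shifted ones. This matters when comparing a 179-atom superposition with a 676-atom one.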