Anna Lenart, Małgorzata Dudkiewicz, Marcin Grynberg, Krzysztof Pawłowski.
Abstract
The zinc-dependent metalloproteases with the His-Glu-x-x-His (HExxH) active site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions, and are found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families with a majority of members possessing a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of the zincin fold. Thus, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains with the HExxH motif strictly conserved, and those with sporadic occurrences, intermediate families are identified that contain some members with a conserved HExxH motif, but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggest an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
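The motif survey described in the abstract can be illustrated with a minimal sketch. It assumes that a plain regular-expression scan over one-letter amino acid sequences is an adequate stand-in for the search; the paper's actual pipeline is not specified here, and the function name `find_hexxh` and the toy sequences are hypothetical.

```python
import re

# H-E-x-x-H: His, Glu, any two residues, His (one-letter amino acid codes).
# The lookahead lets finditer report overlapping occurrences as well.
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")


def find_hexxh(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched pentapeptide) for each HExxH hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in HEXXH.finditer(seq)]


# Hypothetical toy sequences, not taken from the paper.
toy_sequences = {
    "intact_site": "MKTAAHELGHALGLSH",
    "substituted_site": "MKTAAQELGHALGLSH",  # His -> Gln at the first position
}

for name, seq in toy_sequences.items():
    print(name, find_hexxh(seq))
```

A sequence whose first histidine is substituted produces no hit, which is how a scan of this kind would separate intact from substituted active sites within a family.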
Year: 2013 PMID: 23671590 PMCID: PMC3650047 DOI: 10.1371/journal.pone.0062272
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Histogram.
Pfam protein domains binned by the percentage of family members that possess the HExxH motif. Only domains with more than 50 occurrences of the motif are shown, plus all Peptidase domains with at least one occurrence.
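The binning behind a histogram of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the 10% bin width, the family names, and the member counts are all hypothetical, and the caption's inclusion filter (more than 50 motif occurrences) is not applied.

```python
from collections import Counter


def percent_with_motif(members_with_motif: int, total_members: int) -> float:
    """Percentage of family members that carry the motif."""
    return 100.0 * members_with_motif / total_members


def bin_label(percent: float, width: int = 10) -> str:
    """Map a percentage to a bin label; exactly 100% falls into the top bin."""
    lower = min(int(percent // width) * width, 100 - width)
    return f"{lower}-{lower + width}%"


# Hypothetical counts: family -> (members with motif, total members).
counts = {
    "famA": (72, 100),
    "famB": (95, 100),
    "famC": (50, 50),
    "famD": (3, 200),
}

histogram = Counter(
    bin_label(percent_with_motif(with_motif, total))
    for with_motif, total in counts.values()
)
print(histogram)
```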
Structure predictions for domain families with a majority of members possessing the HExxH motif.
| Query | % of domain family members possessing the HExxH motif | FFAS score | % sequence identity | Top FFAS hit | First prediction of a zincin-like structure |
| CLCA_N (PF08434) Calcium-activated chloride channel | 72 | −6.3 | 11 | d1kufa_ d.92.1.9 (A:) Snake venom metalloprotease [Trimeresurus mucrosquamatus], atrolysin E | |
| SprA-related (PF12118) SprA-related family | 95 | −7.7 | 11 | PF05569.4; Q8RPJ4_DESHA/7–280; BlaR1 peptidase M56 | this article |
| FA_desaturase (PF00487) Fatty acid desaturase | 37 | −8.4 | 16 | d1k7ia2 d.92.1.6 (A:18–258) Metalloprotease [Erwinia chrysanthemi] | this article |
| Metallopep (PF12044) Putative peptidase family | 87 | −12.3 | 20 | d1c7ka_ d.92.1.1 (A:) Zinc protease [Streptomyces caespitosus] | Pfam annotation |
| MtfA (PF06167) Phosphoenolpyruvate:glucose-phosphotransferase regulator | 100 | −43.9 | 11 | d1j7na2 d.92.1.14 (A:551–773) Anthrax toxin lethal factor, N- and C-terminal domains [Bacillus anthracis] | |
| DUF2248 (PF10005) Uncharacterized protein conserved in bacteria | 100 | −13.9 | 10 | d1j7na2 d.92.1.14 (A:551–773) Anthrax toxin lethal factor, N- and C-terminal domains [Bacillus anthracis] | this article |
| DUF2265 (PF10023) Predicted aminopeptidase | 100 | −7.7 | 11 | d3b7sa3 d.92.1.13 (A:209–460) Leukotriene A4 hydrolase catalytic domain [Homo sapiens] | Pfam annotation |
| DUF462 (PF04315) Protein of unknown function | 100 | −7 | 13 | d1j7na2 d.92.1.14 (A:551–773) Anthrax toxin lethal factor, N- and C-terminal domains [Bacillus anthracis] | this article |
| DUF922 (PF06037) Bacterial protein of unknown function | 100 | −7.4 | 10 | d1kjpa d.92.1.2 (A:) Thermolysin [Bacillus thermoproteolyticus] | this article |
| DUF3267 (PF11667) Protein of unknown function | 83 | −10.8 | 12 | d1asta_ d.92.1.8 (A:) Astacin [Astacus astacus] | this article |
| DUF1025 (PF06262) Domain of unknown function | 76 | −43.4 | 34 | d3e11a1 d.92.1.17 (A:1–113) Uncharacterized protein Acel_2062 [Acidothermus cellulolyticus] | this article |
| DUF2342 (PF10103) Uncharacterized conserved protein | 43 | −93.8 | 19 | d3cmna1 d.92.1.16 (A:43–391) Uncharacterized protein Caur0242 [Chloroflexus aurantiacus] | this article |
Figure 2. Sequence logos of substituted and conserved active site motifs in selected zincin-like families.
Figure 3. Phylogenetic tree (ANCESCON, see Methods) of selected representatives of the CLCA_N domain.
Locations of proteins with substituted and correct active site motifs are shown, together with the predicted active sites of ancestral sequences.
Habitats and lifestyles of bacteria and archaea possessing CLCA proteins.
| Species, strain | Phylum | Environment | Lifestyle | Oxygen requirement | Energy source | Thermophilic |
| | γ-proteobacteria | aquatic | free living | facultative anaerobic | chemoautotroph | − |
| | | aquatic/sewage | free living | anaerobic | chemoautotroph | − |
| | | aquatic/hot springs | free living | facultative aerobic | photoautotroph | + |
| | α-proteobacteria | aquatic | free living | aerobic | chemoautotroph | − |
| | δ-proteobacteria | aquatic/marine sediments | free living | anaerobic | chemoautotroph | − |
| | | aquatic | free living | aerobic | chemotroph | − |
| | δ-proteobacteria | marine/aquatic | free living | aerobic | chemoheterotroph | − |
| | | aquatic | free living | aerobic | phototroph | + |
| | γ-proteobacteria | aquatic | free living | facultative anaerobic | heterotroph | mesophile (30–40°C) |
| | | aquatic | free living | facultative anaerobic | chemoautotroph | hyperthermophile |
Figure 4. Tree of Life, i.e. representative species tree (adapted from iTOL [77]), with approximate locations of CLCA protein-possessing organisms shown.
Schematic diagrams of domain architectures are also shown: red, CLCA_N; green, von Willebrand factor type A; magenta, DUF1973; blue, fibronectin type III; black, other.
Figure 5. Protein domain architectures (Pfam) of selected CLCA proteins.
Figure 6. Multiple sequence alignments of representative CLCA_N sequences (top) and representative CLCA_X sequences (bottom).
Only regions around the predicted HExxH active site are shown, together with predicted secondary structures (jnetpred). Full versions of the alignments are shown in Figs. S2 and S4, respectively.
Figure 7. HHalign alignment between HHsenser-generated profiles of CLCA_N and CLCA_X protease domains.
Figure 8. CLANS sequence similarity network for zincin-like proteins.
Pfam clan Peptidase_MA and other zincin-like proteins are included (see Methods). Four different BLAST E-value thresholds were used for CLANS clustering. Pfam clan Peptidase_MA proteins: Matrixin (Peptidase_M10), green; Peptidase_M1, magenta; Reprolysin, cyan; Zn_peptidase_2, brown; others in the Peptidase_MA clan, black. Proteins not included in the Pfam clan Peptidase_MA: CLCA_N, red; CLCA_X, orange; others outside the Peptidase_MA clan, blue. A) Relations with significance of P-value below 0.1, B) P-value below 0.01, C) P-value below 1E-5 and D) P-value below 1E-10.
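The effect of tightening the significance threshold across panels A to D can be illustrated with a small sketch. It is not the CLANS algorithm, which computes a force-directed layout; it only assumes that pairwise relations are kept when their P-value is below the threshold and then groups connected sequences with a union-find structure. The sequence names and P-values are hypothetical.

```python
def components(nodes, edges, threshold):
    """Group nodes connected by pairwise relations with P-value below threshold."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b, p_value in edges:
        if p_value < threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sequences and pairwise P-values, not taken from the paper.
nodes = ["CLCA_N_1", "CLCA_N_2", "CLCA_X_1", "M10_1"]
edges = [
    ("CLCA_N_1", "CLCA_N_2", 1e-30),
    ("CLCA_N_2", "CLCA_X_1", 1e-7),
    ("CLCA_X_1", "M10_1", 0.05),
]

for threshold in (0.1, 0.01, 1e-5, 1e-10):
    print(threshold, components(nodes, edges, threshold))
```

With these toy values the single cluster seen at the loosest threshold splits progressively, which mirrors how weaker relations drop out of the stricter panels.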